Merge lets you combine datasets into a new output dataset while preserving as much safe metadata as possible.
It is designed to be conservative. Source datasets are not silently overwritten.
When to Merge
Use merge when you want to combine:
- related dataset files
- cleaned subsets
- work from different sessions
- imported data and native RoleThread data
- separate scene or tag groups
If you only need to export a filtered slice, use Export instead. Merge is for creating a new combined dataset.
Merge Workflow
A normal merge workflow:
- Choose the source dataset files.
- Choose the output path.
- Decide whether to shuffle the merged entries.
- Run the merge.
- Review the output dataset.
- Run Validation on the merge output.
- Use Insights to check structure, metadata, and duplicates.
Validation after merge is mostly a review and cleanup step. Merge is designed to preserve safe structure and metadata, but source datasets may bring in older formats, imported tags, duplicate content, or sidecar conflicts that deserve a review pass.
New Output Identity
A merged dataset receives a fresh dataset UUID.
This matters because the merge output is a new dataset, even when it was built from existing sources.
Entries that survive the merge keep stable entry identity where appropriate, but the merged dataset itself gets its own identity.
This prevents the output from pretending to be one of the sources.
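As a rough illustration of the identity rule, the sketch below builds a merged dataset that gets a newly generated dataset UUID while each entry keeps the UUID it arrived with. The dictionary layout and the names `build_merged_dataset`, `dataset_uuid`, `entries`, and `uuid` are assumptions made for this example, not RoleThread's actual file format or API.

```python
import uuid


def build_merged_dataset(sources):
    """Combine entries from source datasets into a new dataset dict (hypothetical layout)."""
    merged_entries = []
    for source in sources:
        # Entries are copied with their existing entry UUIDs untouched.
        merged_entries.extend(dict(entry) for entry in source["entries"])
    return {
        # The output never reuses a source dataset's UUID.
        "dataset_uuid": str(uuid.uuid4()),
        "entries": merged_entries,
    }


source_a = {"dataset_uuid": "aaaa", "entries": [{"uuid": "e1", "text": "Hello"}]}
source_b = {"dataset_uuid": "bbbb", "entries": [{"uuid": "e2", "text": "World"}]}
merged = build_merged_dataset([source_a, source_b])
```

Here `merged["dataset_uuid"]` differs from both source UUIDs, while the entries still carry `e1` and `e2`.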
Duplicate Handling
Merge uses deterministic duplicate handling.
When entries are considered duplicates, the first one encountered wins: its content and entry UUID are the ones saved.
This means merge results are predictable. Source order matters.
First-Wins Content Policy
First-wins means the first duplicate entry keeps the canonical content.
Later duplicates do not replace the survivor's message text or UUID. This avoids unstable merge results where later files unexpectedly rewrite earlier entries.
If you want a later version to win, place that dataset earlier in the merge order or edit the output after merging.
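The first-wins rule can be sketched in a few lines. This example assumes two entries are duplicates when their message text matches exactly, which may differ from how RoleThread actually compares entries; the function name `merge_first_wins` and the `uuid` and `text` fields are hypothetical.

```python
def merge_first_wins(sources):
    """Keep the first entry seen for each piece of content (assumed key: exact text)."""
    survivors = {}  # text -> surviving entry; dicts preserve insertion order
    for entries in sources:
        for entry in entries:
            if entry["text"] not in survivors:
                survivors[entry["text"]] = entry
            # Later duplicates never replace the survivor's text or UUID.
    return list(survivors.values())


older = [{"uuid": "e1", "text": "Hi there"}]
newer = [{"uuid": "e9", "text": "Hi there"}, {"uuid": "e10", "text": "Bye"}]

older_first = [e["uuid"] for e in merge_first_wins([older, newer])]
newer_first = [e["uuid"] for e in merge_first_wins([newer, older])]
```

With `older` listed first, the surviving duplicate is `e1`; with `newer` listed first, it is `e9`. The same inputs in the same order always produce the same result.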
Tag Merging
Duplicate entries may carry different tags.
RoleThread preserves useful organization by merging and deduplicating tags from duplicate entries into the surviving entry where safe.
This avoids losing metadata just because the content was deduplicated.
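A minimal sketch of tag merging, under the same assumption that duplicates are entries with identical text. The first entry survives, and tags from later duplicates are added to it without repeats. The "where safe" checks RoleThread applies are not modeled here, and `merge_with_tags` and the field names are hypothetical.

```python
def merge_with_tags(entries):
    """First-wins dedup that unions tags from duplicates into the survivor."""
    survivors = {}  # text -> surviving entry
    for entry in entries:
        key = entry["text"]
        if key not in survivors:
            # Copy the entry so the source data is not modified.
            survivors[key] = {**entry, "tags": list(entry["tags"])}
            continue
        kept = survivors[key]
        for tag in entry["tags"]:
            if tag not in kept["tags"]:  # deduplicate while keeping order
                kept["tags"].append(tag)
    return list(survivors.values())


entries = [
    {"uuid": "e1", "text": "Hi", "tags": ["greeting"]},
    {"uuid": "e2", "text": "Hi", "tags": ["greeting", "scene-1"]},
    {"uuid": "e3", "text": "Bye", "tags": []},
]
result = merge_with_tags(entries)
```

The surviving `Hi` entry keeps UUID `e1` and ends up with the tags `greeting` and `scene-1`.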
Sidecar Import During Merge
Merge can inspect sibling sidecars near source datasets.
Safe metadata may be imported, including:
- tag categories
- tags and aliases
- archived/imported tag metadata
- character definitions
- character mappings
- system prompt templates
Conflicts are handled conservatively. RoleThread avoids overwriting existing registry meaning without a clear reason.
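One way to picture conservative conflict handling is a registry merge that adds new definitions, keeps existing ones, and reports disagreements instead of resolving them silently. This is an assumed policy for illustration only; the text above does not specify how RoleThread resolves each conflict, and `import_sidecar` and the registry shape are hypothetical.

```python
def import_sidecar(registry, sidecar):
    """Add sidecar definitions without overwriting existing ones; report conflicts."""
    merged = dict(registry)
    conflicts = []
    for name, definition in sidecar.items():
        if name not in merged:
            merged[name] = definition  # new name: safe to import
        elif merged[name] != definition:
            conflicts.append(name)  # existing meaning is kept as-is
    return merged, conflicts


registry = {"mood": {"color": "blue"}}
sidecar = {"mood": {"color": "red"}, "scene": {"color": "green"}}
merged_registry, conflicts = import_sidecar(registry, sidecar)
```

In this run the new `scene` category is imported, the existing `mood` definition stays blue, and `mood` is listed as a conflict for later review.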
Character Mapping Preservation
Character mappings are preserved only for entries that survive the merge.
If a duplicate entry is discarded, its mappings are not copied as orphan metadata. Even when tags from that duplicate are merged into the survivor, the entry content and character mappings remain those of the surviving entry.
This keeps mappings tied to real entries.
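The rule amounts to filtering mappings by the set of surviving entry UUIDs. The sketch below assumes mappings are keyed by entry UUID; that layout and the name `keep_surviving_mappings` are hypothetical.

```python
def keep_surviving_mappings(mappings, surviving_entries):
    """Drop character mappings whose entry did not survive the merge."""
    surviving_ids = {entry["uuid"] for entry in surviving_entries}
    return {
        entry_id: mapping
        for entry_id, mapping in mappings.items()
        if entry_id in surviving_ids
    }


survivors = [{"uuid": "e1", "text": "Hi"}]
mappings = {
    "e1": {"speaker": "Alice"},
    "e2": {"speaker": "Bob"},  # e2 was a discarded duplicate of e1
}
kept = keep_surviving_mappings(mappings, survivors)
```

Only the mapping for `e1` remains; the mapping for the discarded `e2` is not carried over as orphan metadata.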
Source Dataset Safety
Merge writes a new output dataset.
It does not silently overwrite source datasets. You should still choose your output path carefully, but the merge workflow is built around producing a separate result.
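If you script around merges yourself, a simple precaution is to confirm that the output path does not resolve to any source file before starting. The sketch below is an independent check you could run, not part of RoleThread; the function name and file names are hypothetical.

```python
from pathlib import Path


def check_output_path(source_paths, output_path):
    """Raise ValueError if the output path points at one of the source files."""
    output = Path(output_path).resolve()
    for source in source_paths:
        # resolve() normalizes relative paths and ".." segments before comparing.
        if Path(source).resolve() == output:
            raise ValueError(f"Output path matches source dataset: {source}")
    return output


safe_output = check_output_path(["session_a.json", "session_b.json"], "merged.json")
```

Passing a source file as the output path, for example `check_output_path(["session_a.json"], "./session_a.json")`, raises `ValueError`.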
After Merge
After a merge, run:
- Validation to catch structural or imported-data issues
- Insights to inspect quality, depth, duplication, and metadata coverage
- Manage Dataset filters to review imported tags or untagged entries
Common Mistake
Mistake: Assuming merge edits the source datasets.
Better mental model: Merge creates a new output dataset with its own dataset UUID. The sources are inputs, not the final working file.
Practical Tip
For predictable duplicate handling, put your preferred source file first.