Add keep_tokens_separator as alternative for keep_tokens #975
Merged
kohya-ss merged 3 commits into kohya-ss:dev from Linaqruf:dev on Dec 12, 2023
Contributor (Author):
Btw, I forgot to thank @KohakuBlueleaf for the idea. Without his idea for a shuffle separator, I would probably have added a new key for keep_tokens in the JSON file. ✌️
kohya-ss added a commit that referenced this pull request on Dec 12, 2023
Owner:
Thank you for this! I noticed a problem after merging and modified it.
nana0304 pushed a commit to nana0304/sd-scripts that referenced this pull request on Jun 4, 2025: Add keep_tokens_separator as alternative for keep_tokens
nana0304 pushed a commit to nana0304/sd-scripts that referenced this pull request on Jun 4, 2025

Hi, great job as always.
I propose adding this feature; it's inspired by NovelAI tagging. They train their model by putting some important tags at the head of the tag list and shuffling the rest.
Got this from their docs:
And this is also confirmed by finetunej.

We also know that some Danbooru images have more than one tag in tag_character_string and tag_copyright_string, and some of them have both 1boy, 1girl in one picture, so using keep_tokens alone is not effective to 'mimic' NovelAI tagging.

The keep_tokens_separator is proposed so that a different number of tokens can be kept from being shuffled in each caption.

For example:
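As a rough illustration of the idea (a sketch, not the code in this PR): the separator string `|||`, the sample captions, and both function names below are assumptions made up for this example. The first function keeps a fixed number of leading tags, which is how keep_tokens is described above; the second keeps everything before the separator, so the number of fixed tags can differ per caption. What happens when a caption has no separator is also an assumption here (nothing is kept fixed).

```python
import random


def shuffle_with_keep_tokens(caption, keep_tokens, rng, delimiter=","):
    """Keep the first `keep_tokens` tags in place and shuffle the rest."""
    tags = [t.strip() for t in caption.split(delimiter) if t.strip()]
    fixed, flexible = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(flexible)
    return ", ".join(fixed + flexible)


def shuffle_with_separator(caption, separator, rng, delimiter=","):
    """Keep every tag before `separator` in place and shuffle the rest."""
    if separator in caption:
        head, tail = caption.split(separator, 1)
    else:
        # Assumption for this sketch: without a separator nothing is fixed.
        head, tail = "", caption
    fixed = [t.strip() for t in head.split(delimiter) if t.strip()]
    flexible = [t.strip() for t in tail.split(delimiter) if t.strip()]
    rng.shuffle(flexible)
    return ", ".join(fixed + flexible)


# Hypothetical captions: the number of "important" tags differs per image,
# so a single keep_tokens value cannot fit both.
captions = [
    "1girl, hatsune miku, vocaloid ||| smile, stage, microphone",
    "1boy, 1girl, kagamine len, kagamine rin, vocaloid ||| singing, outdoors",
]

rng = random.Random(0)
for caption in captions:
    print(shuffle_with_separator(caption, "|||", rng))
```

With a fixed keep_tokens of 3, the second caption would have its character and copyright tags shuffled along with the general tags; with the separator, each caption marks its own boundary.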
I haven't tested it for fine-tuning, but I trained some LoRAs with this separator:
link to model | link to datasets (5.65 GB)
Thank you!