Add keep_tokens_separator as alternative for keep_tokens #975

Merged
kohya-ss merged 3 commits into kohya-ss:dev from Linaqruf:dev on Dec 12, 2023

@Linaqruf
Contributor

Hi, great job as always.

I propose adding this feature; it's inspired by NovelAI tagging. They train their model by putting some important tags at the head of the caption and shuffling the rest.

Got this from their docs:

1boy, 1girl, characters, series, everything else in any order

And this is also confirmed by finetunej.
image

We also know that some Danbooru images have more than one tag in tag_character_string and tag_copyright_string, and some have both 1boy and 1girl in one picture. The number of important tags at the head therefore varies from caption to caption, so a fixed keep_tokens count alone is not enough to 'mimic' NovelAI tagging.
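
To make the limitation concrete, here is a minimal sketch of fixed-count shuffling. It is not the sd-scripts code; the function name `shuffle_with_keep_tokens` and the shortened captions are made up for illustration, and tags are assumed to be comma-separated.

```python
import random


def shuffle_with_keep_tokens(caption: str, keep_tokens: int, rng: random.Random) -> str:
    """Keep the first `keep_tokens` comma-separated tags in place and shuffle the rest.

    Hypothetical helper for illustration only, not the sd-scripts implementation.
    """
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, flex = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(flex)
    return ", ".join(fixed + flex)


# Shortened versions of the two captions discussed in this PR.
solo = "1girl, frieren, sousou no frieren, cyan yu, elf, green eyes, grey hair"
duo = (
    "1boy, 1girl, linie (sousou no frieren), lugner (sousou no frieren), "
    "sousou no frieren, fujimoto kouki, blonde hair, horns"
)

rng = random.Random(0)
# keep_tokens=4 covers the whole 4-tag head of `solo` ...
print(shuffle_with_keep_tokens(solo, 4, rng))
# ... but the head of `duo` has 6 tags, so the series and artist tags
# fall into the shuffled part and can end up anywhere.
print(shuffle_with_keep_tokens(duo, 4, rng))
```

No single keep_tokens value fits both captions: 4 is too small for the second one, and 6 would pin two ordinary tags in the first.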

The keep_tokens_separator is proposed so that the number of tokens kept from being shuffled can differ from caption to caption: everything before the separator stays in place.

For example:

keep_tokens_separator = "|||"
  • caption 1
1girl, frieren, sousou no frieren, cyan yu, ||| rating: general, black footwear, black pantyhose, blue flower, boots, capelet, closed mouth, earrings, elf, flower, green eyes, grey hair, hugging own legs, jewelry, long hair, long sleeves, looking at viewer, pantyhose, pointy ears, sidelocks, simple background, sitting, solo, thick eyebrows, twintails, white background, white capelet, absurdres, highres, medium quality
  • caption 2
1boy, 1girl, linie (sousou no frieren), lugner (sousou no frieren), sousou no frieren, fujimoto kouki, ||| rating: general, arms at sides, arms behind back, black coat, blonde hair, boots, brown dress, brown ribbon, closed mouth, coat, demon boy, demon girl, demon horns, dress, facing away, flower, from behind, hair ribbon, horns, long hair, long sleeves, orange hair, outdoors, own hands together, pink flower, puffy sleeves, ribbon, sky, standing, tree, twintails, white footwear, wide sleeves, commentary request, highres, official art, promotional art, normal quality
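
The behaviour shown in the two captions above can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the function name `shuffle_with_separator` is hypothetical, the separator is assumed to be dropped from the resulting caption, and a caption without the separator is assumed to be shuffled entirely.

```python
import random


def shuffle_with_separator(caption: str, separator: str, rng: random.Random) -> str:
    """Keep tags before `separator` in order and shuffle the tags after it.

    Hypothetical helper for illustration only. Assumptions: the separator is
    removed from the output, and captions without it are shuffled as a whole.
    """
    if separator in caption:
        head, tail = caption.split(separator, 1)
    else:
        head, tail = "", caption
    fixed = [t.strip() for t in head.split(",") if t.strip()]
    flex = [t.strip() for t in tail.split(",") if t.strip()]
    rng.shuffle(flex)
    return ", ".join(fixed + flex)


caption_1 = "1girl, frieren, sousou no frieren, cyan yu, ||| rating: general, boots, elf"
caption_2 = (
    "1boy, 1girl, linie (sousou no frieren), lugner (sousou no frieren), "
    "sousou no frieren, fujimoto kouki, ||| rating: general, horns, tree"
)

rng = random.Random(0)
# The fixed head is 4 tags long in caption_1 and 6 tags long in caption_2;
# the separator position decides this per caption.
print(shuffle_with_separator(caption_1, "|||", rng))
print(shuffle_with_separator(caption_2, "|||", rng))
```

Because the split point travels with each caption, the dataset author decides per image how many tags stay at the head.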

I haven't tested this for fine-tuning, but I have trained some LoRAs with this separator:

link to model | link to datasets (5.65gb)

Image 1 Image 2

Thank you!

@Linaqruf
Contributor Author

Linaqruf commented Dec 1, 2023

Btw, I forgot to thank @KohakuBlueleaf for the idea. Without his suggestion of a shuffle separator, I probably would have added a new key for keep_tokens in the JSON file instead. ✌️

image

@kohya-ss kohya-ss merged commit 034a49c into kohya-ss:dev Dec 12, 2023
@kohya-ss
Owner

Thank you for this! I noticed a problem after merging and modified it.

nana0304 pushed a commit to nana0304/sd-scripts that referenced this pull request Jun 4, 2025
Add keep_tokens_separator as alternative for keep_tokens