Standard training data

The training data was generated by running scripts/01-generate-synthetic-training-data.py and scripts/02-split-generated-data.py on a list of common English words, available here.

Generating your own training data

To generate your own dataset, create a training file and a validation file. Each line in both files follows a simple tab-separated format:

<CHARACTER SEQUENCE><TAB><TYPE><TAB><SUBTYPE>

Example

ngnix	STRING	PROGRAM
Y29tbWl4dHVyZQ==	HASH	PASSWORD
b3d2cf2ec3894374b37d1b79edd57ad4	HASH	API_KEY
9c795829-75bc-4596-87d3-3508372bbf5f	HASH	API_KEY
licenser	STRING	WORD

NOTE: There are no predefined values for type and subtype.
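
As a rough illustration of the format above, the following Python sketch formats, parses, and splits rows into a training and a validation set. It is not the code in the repository's scripts; the function names, the 80/20 split ratio, and the output file names mentioned below are assumptions made for this example.

```python
import random


def format_row(sequence, type_, subtype):
    """Join one example into '<CHARACTER SEQUENCE><TAB><TYPE><TAB><SUBTYPE>'."""
    fields = (sequence, type_, subtype)
    for field in fields:
        # A tab or newline inside a field would break the line format.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    return "\t".join(fields)


def parse_row(line):
    """Split one line back into (sequence, type, subtype)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        raise ValueError(f"expected 3 tab-separated fields, got {len(parts)}")
    return tuple(parts)


def split_rows(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows and return (training_rows, validation_rows).

    The 0.2 default is an assumption for this sketch, not a project setting.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def write_rows(rows, path):
    """Write rows to `path`, one tab-separated example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(format_row(*row) + "\n")
```

For instance, `train, validation = split_rows(rows)` followed by `write_rows(train, "train.tsv")` and `write_rows(validation, "validation.tsv")` would produce the two files; the `.tsv` names are placeholders, so use whatever paths your training setup expects.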