hello!
this is the data that gets compressed and compiled into the word models that compromise uses to understand text.
there are some things to note:
- run `npm run pack` after making a change, to see changes appear.
- lexicon words are lowercased and compressed with efrt, so some characters are reserved: `[0-9,;!:|¦]`
- be careful adding ambiguous words: 'ray' should not be a #Person, it's a better fit for `./switches/person-date.js`
- many word-lists have conjugations automatically applied to them: #Singular words are pluralized, etc.
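
as a rough sketch of the reserved-character note above, here is one way to check new words before adding them to a word-list. `checkWords` is a hypothetical helper, not part of compromise or efrt, and it only assumes the reserved set listed above:

```javascript
// hypothetical helper, not part of compromise or efrt
// reserved characters, as listed above: digits and , ; ! : | ¦
const reserved = /[0-9,;!:|¦]/

// split candidate words into ones that look safe to add, and ones that don't
const checkWords = function (words) {
  const ok = []
  const bad = []
  words.forEach(str => {
    // lexicon words are lowercased
    const word = str.toLowerCase().trim()
    if (word === '' || reserved.test(word)) {
      bad.push(str) // keep the original, so it's easy to find
    } else {
      ok.push(word)
    }
  })
  return { ok, bad }
}

const res = checkWords(['Toronto', 'route 66', 'half:time', 'swim'])
console.log(res.ok) // [ 'toronto', 'swim' ]
console.log(res.bad) // [ 'route 66', 'half:time' ]
```

this only catches reserved characters. it can't tell you whether a word is ambiguous, that part still needs a human.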
the lexicon output data can be found in ./src/2-two/preTagger/model/lexicon/_data.js
and the word-conjugation data can be found in ./src/2-two/preTagger/model/models/_data.js
for more information, see the compromise-lexicon docs.