This project is under active development; please stay tuned for updates. More documentation and examples are coming.
This library connects to Stanford CoreNLP either via HTTP or by spawning processes. HTTP is the preferred method, since it requires CoreNLP to initialize just once to serve many requests, and it avoids the extra I/O of the CLI method, which needs to write temporary files to run.
Install the library via npm:

```bash
npm i --save corenlp
```

To download Stanford CoreNLP via npm, run this command from your own project after having installed this library:

```bash
npm explore corenlp -- npm run corenlp:download
```

Once downloaded, you can easily start the server by running:

```bash
npm explore corenlp -- npm run corenlp:server
```

Alternatively, you can manually download the package from Stanford's CoreNLP download section at https://stanfordnlp.github.io/CoreNLP/download.html. In addition to the full package, you may also want to download other language models (see more on that page).
```bash
# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
```

CoreNLP connects by default via StanfordCoreNLPServer, using port 9000. You can also opt to set up the connection differently:
```js
import CoreNLP, { Properties, Pipeline, ConnectorServer } from 'corenlp';
const connector = new ConnectorServer({ dsn: 'http://localhost:9000' });
const props = new Properties({
annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English', connector);
```

By default, CoreNLP expects the StanfordCoreNLP package to be placed (unzipped) inside the path `${YOUR_NPM_PROJECT_ROOT}/corenlp/`. You can also opt to set up the CLI interface differently:

```js
import CoreNLP, { Properties, Pipeline, ConnectorCli } from 'corenlp';
const connector = new ConnectorCli({
classPath: 'corenlp/stanford-corenlp-full-2017-06-09/*', // specify the paths relative to your npm project root
mainClass: 'edu.stanford.nlp.pipeline.StanfordCoreNLP', // optional
props: 'StanfordCoreNLP-spanish.properties', // optional
});
const props = new Properties({
annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English', connector);
```

```js
// ... initialize pipeline first (see above)
const sent = new CoreNLP.simple.Sentence('Hello world');
pipeline.annotate(sent)
.then(sent => {
console.log(sent.words());
})
.catch(err => {
console.log('err', err);
});
```

NOTE 1: The examples below assume that StanfordCoreNLP is running on port 9000.
NOTE 2: The examples below assume ES6 syntax. If you use require, import the library as follows: `var CoreNLP = require('corenlp').default;`
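For reference, the same imports in CommonJS style look like this (a minimal sketch, assuming the named exports mirror the ES6 imports used throughout this document):

```js
// CommonJS style imports (ES6 `import` is used in the rest of the examples)
var corenlp = require('corenlp');
var CoreNLP = corenlp.default;
var Properties = corenlp.Properties;
var Pipeline = corenlp.Pipeline;
```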
```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';
const props = new Properties({
annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English'); // uses ConnectorServer by default
const sent = new CoreNLP.simple.Sentence('The little dog runs so fast.');
pipeline.annotate(sent)
.then(sent => {
console.log('parse', sent.parse());
console.log(CoreNLP.util.Tree.fromSentence(sent).dump());
})
.catch(err => {
console.log('err', err);
});
```

- setProperty(name, value)
Property setter
- getProperty(name, default) ⇒
* Property getter
- getProperties() ⇒
Object Returns an Object map of the given properties
- toJson() ⇒
Object Returns a JSON object of the given properties
- toPropertiesFileContent() ⇒
string Returns a properties file-like string of the given properties
- get() ⇒
Promise.<Object>
- get(config, [utility]) ⇒
Promise.<Object>
- text() ⇒
string Get a string representation of the raw text
- setLanguageISO() ⇒
string Sets the language ISO (given by the pipeline during the annotation process) This is solely to keep track of the language chosen for further analysis
- getLanguageISO() ⇒
string Retrieves the language ISO
- addAnnotator(annotator)
Marks an annotator as a met dependency
- addAnnotators(annotators)
Marks multiple annotators as met dependencies
- removeAnnotator(annotator)
Unmarks an annotator as a met dependency
- hasAnnotator(annotator) ⇒
boolean Tells you if an annotator is a met dependency
- hasAnyAnnotator(annotators) ⇒
boolean Tells you if at least one of a list of annotators is a met dependency
- toString() ⇒
string Get a string representation
- equalsTo(annotator) ⇒
boolean Defines whether a given annotator is the same as current, using shallow compare. This is useful for a Document or Sentence to validate if the minimum of annotators required were already applied to them. Allows at the same time the users to instantiate new annotators and configure them as needed.
- options() ⇒
Object Get an Object key-value representation of the annotator's options (excluding prefix)
- option(key, [value]) ⇒
string Get/Set an option value
- dependencies() ⇒
Array.<Annotator> Get a list of annotators dependencies
- pipeline() ⇒
Array.<string> Get a list of the annotator's dependencies, followed by this annotator, all as a list of strings. This is useful to fulfill the annotators param in CoreNLP API properties.
- pipelineOptions() ⇒
Array.<string> Get an object of all the Annotator options, including the current annotator and all its dependencies, prefixed by the annotator names. This is useful to fulfill the options params in CoreNLP API properties.
- toString() ⇒
string Get a string representation
- sentences() ⇒
Array.<Sentence> Get a list of sentences
- sentence(index) ⇒
Sentence Get the sentence for a given index
- coref() ⇒
Promise.<DeterministicCorefAnnotator> TODO requirements: tokenize, ssplit, pos, lemma, ner, parse https://stanfordnlp.github.io/CoreNLP/dcoref.html
- fromJSON(data) ⇒
Document Update an instance of Document with data provided by a JSON
- fromJSON(data) ⇒
Document Get an instance of Document from a given JSON
- groups() ⇒
Array.<ExpressionSentenceMatchGroup> Returns the main and labeled groups as a list of ExpressionSentenceMatchGroup
- group(label) ⇒
ExpressionSentenceMatchGroup Nodes in a matched expression can be named; we call them groups here, and the labels are the names of the nodes.
- labels() ⇒
Array.<string> Labels are aliases you can add to a group match expression; for example, in Semgrex you can do {ner:/PERSON/=good_guy}, where "good_guy" would be the label, and internally it will come as $good_guy as a member of ExpressionSentenceMatchGroup.
- fromJson(data) ⇒
ExpressionSentenceMatch Update an instance of ExpressionSentenceMatch with data provided by a JSON
- fromJson(data) ⇒
ExpressionSentenceMatch Get an instance of ExpressionSentenceMatch from a given JSON
- matches() ⇒
Array.<ExpressionSentenceMatch> Retrieves all the contained ExpressionSentenceMatch instances
- match(index) ⇒
ExpressionSentenceMatch Retrieves an ExpressionSentenceMatch at the index specified
- mergeTokensFromSentence() ⇒
ExpressionSentence The Expression / ExpressionSentence objects come from outside the standard CoreNLP pipelines. This means that neither TokensRegex, Semgrex nor Tregex will tag the nodes with POS, lemma, NER or any other annotation data. It is sometimes useful to not only get the matching groups, but also the annotated tokens for each word in the match group.
- fromJson(data) ⇒
ExpressionSentenceJSON Update an instance of ExpressionSentence with data provided by a JSON
- fromJson(data) ⇒
ExpressionSentence Get an instance of ExpressionSentence from a given JSON of sentence matches
- toString() ⇒
string Get a string representation
- pattern() ⇒
string Get the pattern
- sentences() ⇒
Array.<ExpressionSentence> Get a list of sentences
- sentence(index) ⇒
ExpressionSentence Get the sentence for a given index
- mergeTokensFromDocument(document) ⇒
Expression Hydrate the Expression instance with Token objects from an annotated Document
- fromJson(data) ⇒
Expression Update an instance of Expression with data provided by a JSON
- fromJson(data) ⇒
Expression Get an instance of Expression from a given JSON
- toString() ⇒
string Get a string representation
- fromJSON(data) ⇒
Governor Get an instance of Governor from a given JSON
- toString() ⇒
string Get a string representation
- index() ⇒
number Get the index relative to the parent document
- parse() ⇒
string Get a string representation of the parse tree structure
- words() ⇒
Array.<string> Get an array of string representations of the sentence words
- word(index) ⇒
string Get a string representation of the Nth word of the sentence
- posTags() ⇒
Array.<string> Get string representations of the part-of-speech tags of the sentence tokens
- posTag(index) ⇒
string Get a string representation of the part-of-speech tag of the Nth token of the sentence
- lemmas() ⇒
Array.<string> Get string representations of the lemmas of the sentence tokens
- lemma(index) ⇒
string Get a string representation of the lemma of the Nth token of the sentence
- nerTags() ⇒
Array.<string> Get string representations of the NER tags of the sentence tokens
- nerTag(index) ⇒
string Get a string representation of the NER tag of the Nth token of the sentence
- governors() ⇒
Array.<Governor> Get a list of annotated governors by the dependency-parser
- governor() ⇒
Governor Get the N-th annotated governor by the dependency-parser annotator
- tokens() ⇒
Array.<Token> Get an array of token representations of the sentence words
- token() ⇒
Token Get the Nth token of the sentence
- toJSON() ⇒
SentenceJSON The arrow function data => Sentence.fromJSON(data).toJSON() is idempotent under shallow comparison (not by reference). The JSON respects the same structure as expected by {@see Sentence#fromJSON}.
- fromJSON(data, [isSentence]) ⇒
Sentence Update an instance of Sentence with data provided by a JSON
- fromJSON(data, [isSentence]) ⇒
Sentence Get an instance of Sentence from a given JSON
- toString() ⇒
string Get a string representation
- index() ⇒
number Get the index number assigned by StanfordCoreNLP. This index is relative to the sentence the token belongs to, and is 1-based (a positive integer). This number is useful to match tokens within a sentence for depparse, coreference, etc.
- word() ⇒
string Get the original word
- originalText() ⇒
string Get the original text
- characterOffsetBegin() ⇒
number A 0-based index of the word's initial character within the sentence
- characterOffsetEnd() ⇒
number Get the characterOffsetEnd relative to the parent sentence A 0-based index of the word's ending character within the sentence
- before() ⇒
string Get the before string relative to the container sentence
- after() ⇒
string Get the after string relative to the container sentence
- lemma() ⇒
string Get the annotated lemma
- pos() ⇒
string Get the annotated part-of-speech for the current token
- posInfo() ⇒
PosInfo Get additional metadata about the POS annotation NOTE: Do not use this method other than just for study or analysis purposes.
- ner() ⇒
string Get the annotated named-entity for the current token
- toJSON() ⇒
TokenJSON The arrow function data => Token.fromJSON(data).toJSON() is idempotent under shallow comparison (not by reference). The JSON respects the same structure as expected by {@see Token#fromJSON}.
- fromJSON(data) ⇒
Token Get an instance of Token from a given JSON
- dump() ⇒
string Get a Tree string representation for debugging purposes
- visitDeepFirst()
Performs a depth-first search, calling a visitor for each node
- visitDeepFirstRight()
Performs a depth-first search, calling a visitor for each node, from right to left
- visitLeaves()
Performs a depth-first search, calling a visitor only over leaves
- fromSentence(sentence, [doubleLink]) ⇒
Tree
- fromString(str, [doubleLink]) ⇒
Tree
- DocumentJSON
The CoreNLP API JSON structure representing a document
- ExpressionSentenceMatchGroup
- ExpressionSentenceMatchJSON
An ExpressionSentenceMatch of either TokensRegex, Semgrex or Tregex.
- ExpressionJSON
The CoreNLP API JSON structure representing an expression. This expression structure can be found as the output of TokensRegex, Semgrex and Tregex.
- GovernorJSON
The CoreNLP API JSON structure representing a governor
- SentenceJSON
The CoreNLP API JSON structure representing a sentence
- TokenJSON
The CoreNLP API JSON structure representing a token
- PosInfo
PosInfo does not come as part of the CoreNLP. It is an indexed reference of POS tags by language provided by this library. It's only helpful for analysis and study. The data was collected from different documentation resources on the Web. The PosInfo may vary depending on the POS annotation types used, for example, CoreNLP for Spanish uses custom POS tags developed by Stanford, but this can also be changed to Universal Dependencies, which uses different tags.
- DeterministicCorefAnnotator
TODO ?? ⇐
Annotator Class representing a DeterministicCorefAnnotator.
- DependencyParseAnnotator
Hydrates {@link Sentence.governors()} ⇐
Annotator Class representing a DependencyParseAnnotator.
- MorphaAnnotator
Hydrates {@link Token.lemma()} ⇐
Annotator Class representing a MorphaAnnotator.
- NERClassifierCombiner
Hydrates {@link Token.ner()} ⇐
Annotator Class representing an NERClassifierCombiner.
- ParserAnnotator
Hydrates {@link Sentence.parse()} ⇐
Annotator Class representing a ParserAnnotator.
- POSTaggerAnnotator
Hydrates {@link Token.pos()} ⇐
Annotator Class representing a POSTaggerAnnotator.
- RegexNERAnnotator
TODO ?? ⇐
Annotator Class representing a RegexNERAnnotator.
- RelationExtractorAnnotator
TODO ?? ⇐
Annotator Class representing a RelationExtractorAnnotator.
- WordsToSentenceAnnotator
Combines multiple {@link Token}s into sentences ⇐
Annotator Class representing a WordsToSentenceAnnotator.
- TokenizerAnnotator
Identifies {@link Token}s ⇐
Annotator Class representing a TokenizerAnnotator.
Property setter
Kind: global function
| Param | Type | Description |
|---|---|---|
| name | string | the property name |
| value | * | the property value |
Property getter
Kind: global function
Returns: * - value - the property value
| Param | Type | Description |
|---|---|---|
| name | string | the property name |
| default | * | the default value to return if not set |
Returns an Object map of the given properties
Kind: global function
Returns: Object - properties - the properties object
Returns a JSON object of the given properties
Kind: global function
Returns: Object - json - the properties object
Returns a properties file-like string of the given properties
Kind: global function
Returns: string - properties - the properties file content
Kind: global function
| Param | Type | Description |
|---|---|---|
| config | Object | |
| config.annotators | Array.<string> | The list of annotators that defines the pipeline |
| config.text | string | The text to run the pipeline against |
| config.options | Object | Additional options (properties) for the pipeline |
| config.language | string | Language full name in CamelCase (e.g. Spanish) |
| [utility] | '' \| 'tokensregex' \| 'semgrex' \| 'tregex' | Name of the utility to use. NOTE: most of the utilities receive properties; these should be passed via the options param |
Get a string representation of the raw text
Kind: global function
Returns: string - text
Sets the language ISO (given by the pipeline during the annotation process) This is solely to keep track of the language chosen for further analysis
Kind: global function
Returns: string - text
Retrieves the language ISO
Kind: global function
Returns: string - text
Marks an annotator as a met dependency
Kind: global function
| Param | Type |
|---|---|
| annotator | Annotator | function |
Marks multiple annotators as met dependencies
Kind: global function
| Param | Type |
|---|---|
| annotators | Array.<(Annotator|function())> |
Unmarks an annotator as a met dependency
Kind: global function
| Param | Type |
|---|---|
| annotator | Annotator | function |
Tells you if an annotator is a met dependency
Kind: global function
Returns: boolean - hasAnnotator
| Param | Type |
|---|---|
| annotator | Annotator | function |
Tells you if at least one of a list of annotators is a met dependency
Kind: global function
Returns: boolean - hasAnyAnnotator
| Param | Type |
|---|---|
| annotators | Array.<(Annotator|function())> |
Get a string representation
Kind: global function
Returns: string - annotator
Defines whether a given annotator is the same as current, using shallow compare. This is useful for a Document or Sentence to validate if the minimum of annotators required were already applied to them. Allows at the same time the users to instantiate new annotators and configure them as needed.
Kind: global function
| Param | Type |
|---|---|
| annotator | Annotator |
Get an Object key-value representation of the annotator's options (excluding prefix)
Kind: global function
Returns: Object - options
Get/Set an option value
Kind: global function
Returns: string - value
| Param | Type | Default |
|---|---|---|
| key | string | |
| [value] | string \| boolean | null |
Get a list of annotators dependencies
Kind: global function
Returns: Array.<Annotator> - dependencies
Get a list of the annotator's dependencies, followed by this annotator, all as a list of strings.
This is useful to fulfill the annotators param in CoreNLP API properties.
Kind: global function
Returns: Array.<string> - pipeline
Get an object of all the Annotator options, including the current annotator and all its dependencies, prefixed by the annotator names. This is useful to fulfill the options params in CoreNLP API properties.
Kind: global function
Returns: Array.<string> - pipelineOptions
Get a string representation
Kind: global function
Returns: string - document
Get a list of sentences
Kind: global function
Returns: Array.<Sentence> - sentences - The document sentences
Get the sentence for a given index
Kind: global function
Returns: Sentence - sentence - The document sentences
| Param | Type | Description |
|---|---|---|
| index | number | The position of the sentence to get |
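For example, a whole document can be annotated and its sentences iterated as follows (a brief sketch assuming a server-backed pipeline as in the quick-start example):

```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';

const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner,parse' });
const pipeline = new Pipeline(props, 'English'); // uses ConnectorServer by default
const doc = new CoreNLP.simple.Document('The dog barked. The cat ignored it.');

pipeline.annotate(doc)
  .then(doc => {
    doc.sentences().forEach((sentence, i) => {
      console.log(i, sentence.toString());
    });
    console.log(doc.sentence(0).words()); // tokens of the first sentence
  })
  .catch(err => console.log('err', err));
```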
TODO requirements: tokenize, ssplit, pos, lemma, ner, parse https://stanfordnlp.github.io/CoreNLP/dcoref.html
Kind: global function
Returns: Promise.<DeterministicCorefAnnotator> - dcoref
Update an instance of Document with data provided by a JSON
Kind: global function
Returns: Document - document - The current document instance
| Param | Type | Description |
|---|---|---|
| data | DocumentJSON | The document data, as returned by CoreNLP API service |
Get an instance of Document from a given JSON
Kind: global function
Returns: Document - document - A new Document instance
| Param | Type | Description |
|---|---|---|
| data | DocumentJSON | The document data, as returned by CoreNLP API service |
groups() ⇒ Array.<ExpressionSentenceMatchGroup>
Returns the main and labeled groups as a list of ExpressionSentenceMatchGroup
Kind: global function
Returns: Array.<ExpressionSentenceMatchGroup> - groups
group(label) ⇒ ExpressionSentenceMatchGroup
Nodes in a matched expression can be named; we call them groups here, and the labels are the names of the nodes.
Kind: global function
Returns: ExpressionSentenceMatchGroup - group
See: https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html#Naming_nodes
| Param | Type | Description |
|---|---|---|
| label | string | The label name, not prefixed with $ |
Labels are aliases you can add to a group match expression; for example, in Semgrex you can do {ner:/PERSON/=good_guy}, where "good_guy" would be the label, and internally it will come as $good_guy as a member of ExpressionSentenceMatchGroup.
Kind: global function
Returns: Array.<string> - labels
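To make groups and labels concrete, here is a hedged sketch using a Semgrex pattern with named nodes; the Expression(text, pattern) constructor and the annotateSemgrex helper on Pipeline are assumptions here, and the pos and depparse annotators must be part of the pipeline:

```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';

// hypothetical sketch: Expression(text, pattern) and pipeline.annotateSemgrex are assumed
const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner,depparse' });
const pipeline = new Pipeline(props, 'English');
const expression = new CoreNLP.simple.Expression(
  'John Snow eats snow.',
  '{pos:/VB.*/}=action >nsubj {}=subject');

pipeline.annotateSemgrex(expression)
  .then(expression => {
    expression.sentence(0).matches().forEach(match => {
      console.log(match.labels());          // e.g. [ 'action', 'subject' ]
      console.log(match.group('subject'));  // labeled group, looked up without the $ prefix
    });
  })
  .catch(err => console.log('err', err));
```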
Update an instance of ExpressionSentenceMatch with data provided by a JSON
Kind: global function
Returns: ExpressionSentenceMatch - expression - The current match instance
| Param | Type | Description |
|---|---|---|
| data | ExpressionSentenceMatchJSON | The match data, as returned by CoreNLP API service |
Get an instance of ExpressionSentenceMatch from a given JSON
Kind: global function
Returns: ExpressionSentenceMatch - match - A new ExpressionSentenceMatch instance
| Param | Type | Description |
|---|---|---|
| data | ExpressionSentenceMatchJSON | The match data, as returned by CoreNLP API service |
Retrieves all the contained ExpressionSentenceMatch instances
Kind: global function
Returns: Array.<ExpressionSentenceMatch> - matches
Retrieves an ExpressionSentenceMatch at the index specified
Kind: global function
Returns: ExpressionSentenceMatch - match
| Param | Type |
|---|---|
| index | number |
The Expression / ExpressionSentence objects come from outside the standard CoreNLP pipelines.
This means that neither TokensRegex, Semgrex nor Tregex will tag the nodes with POS,
lemma, NER or any other annotation data. It is sometimes useful to not only get the
matching groups, but also the annotated tokens for each word in the match group.
Kind: global function
Returns: ExpressionSentence - instance - The current instance
Update an instance of ExpressionSentence with data provided by a JSON
Kind: global function
Returns: ExpressionSentenceJSON - sentence - The current sentence instance
| Param | Type | Description |
|---|---|---|
| data | ExpressionSentenceJSON | The expression data, as returned by CoreNLP API service |
Get an instance of ExpressionSentence from a given JSON of sentence matches
Kind: global function
Returns: ExpressionSentence - sentence - A new ExpressionSentence instance
| Param | Type | Description |
|---|---|---|
| data | ExpressionSentenceJSON | The sentence data, as returned by CoreNLP API service |
Get a string representation
Kind: global function
Returns: string - expression
Get the pattern
Kind: global function
Returns: string - pattern - The expression pattern
Get a list of sentences
Kind: global function
Returns: Array.<ExpressionSentence> - sentences - The expression sentences
Get the sentence for a given index
Kind: global function
Returns: ExpressionSentence - sentence - An expression sentence
| Param | Type | Description |
|---|---|---|
| index | number | The position of the sentence to get |
Hydrate the Expression instance with Token objects from an annotated Document
Kind: global function
Returns: Expression - expression - The current expression instance
See: ExpressionSentence#mergeTokensFromSentence
| Param | Type | Description |
|---|---|---|
| document | Document | An annotated document from where to extract the tokens |
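A sketch of how this hydration can be used (the Expression(text, pattern) constructor and the annotateTokensRegex helper on Pipeline are assumptions here):

```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';

// hypothetical sketch: Expression(text, pattern) and pipeline.annotateTokensRegex are assumed
const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner' });
const pipeline = new Pipeline(props, 'English');
const text = 'Colorless green ideas sleep furiously.';
const doc = new CoreNLP.simple.Document(text);
const expression = new CoreNLP.simple.Expression(text, '[{word:/green/}] [{word:/ideas/}]');

Promise.all([pipeline.annotate(doc), pipeline.annotateTokensRegex(expression)])
  .then(([doc, expression]) => {
    expression.mergeTokensFromDocument(doc); // hydrate the match groups with annotated tokens
    expression.sentence(0).matches().forEach(match => {
      match.groups().forEach(group => {
        // group.token is only present after merging with an annotated document
        console.log(group.label, group.token && group.token.pos());
      });
    });
  })
  .catch(err => console.log('err', err));
```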
Update an instance of Expression with data provided by a JSON
Kind: global function
Returns: Expression - expression - The current expression instance
| Param | Type | Description |
|---|---|---|
| data | ExpressionJSON | The expression data, as returned by CoreNLP API service |
Get an instance of Expression from a given JSON
Kind: global function
Returns: Expression - expression - A new Expression instance
| Param | Type | Description |
|---|---|---|
| data | ExpressionJSON | The expression data, as returned by CoreNLP API service |
Get a string representation
Kind: global function
Returns: string - governor
Get an instance of Governor from a given JSON
Kind: global function
Returns: Governor - governor - A new Governor instance
Todo
- It is not possible to properly generate a Governor from a GovernorJSON; the Governor requires references to the Token instances in order to work
| Param | Type | Description |
|---|---|---|
| data | GovernorJSON | The token data, as returned by CoreNLP API service |
Get a string representation
Kind: global function
Returns: string - sentence
Get the index relative to the parent document
Kind: global function
Returns: number - index
Get a string representation of the parse tree structure
Kind: global function
Returns: string - parse
Get an array of string representations of the sentence words
Kind: global function
Returns: Array.<string> - words
Throws:
Error in case the required annotator was not applied to the sentence
Requires: {@link TokenizerAnnotator}
Get a string representation of the Nth word of the sentence
Kind: global function
Returns: string - word
Throws:
Error in case the required annotator was not applied to the sentence. Error in case the token for the given index does not exist.
Requires: {@link TokenizerAnnotator}
| Param | Type | Description |
|---|---|---|
| index | number | 0-based index as they are arranged naturally |
Get string representations of the part-of-speech tags of the sentence tokens
Kind: global function
Returns: Array.<string> - posTags
Get a string representation of the part-of-speech tag of the Nth token of the sentence
Kind: global function
Returns: string - posTag
Throws:
Error in case the token for the given index does not exist
| Param | Type | Description |
|---|---|---|
| index | number | 0-based index as they are arranged naturally |
Get string representations of the lemmas of the sentence tokens
Kind: global function
Returns: Array.<string> - lemmas
Get a string representation of the lemma of the Nth token of the sentence
Kind: global function
Returns: string - lemma
Throws:
Error in case the token for the given index does not exist
| Param | Type | Description |
|---|---|---|
| index | number | 0-based index as they are arranged naturally |
Get string representations of the NER tags of the sentence tokens
Kind: global function
Returns: Array.<string> - nerTags
Get a string representation of the NER tag of the Nth token of the sentence
Kind: global function
Returns: string - nerTag
Throws:
Error in case the token for the given index does not exist
| Param | Type | Description |
|---|---|---|
| index | number | 0-based index as they are arranged naturally |
Get a list of annotated governors by the dependency-parser
Kind: global function
Returns: Array.<Governor> - governors
Throws:
Error in case the required annotator was not applied to the sentence
Requires: {@link DependencyParseAnnotator}
Get the N-th annotated governor by the dependency-parser annotator
Kind: global function
Returns: Governor - governor
Throws:
Error in case the required annotator was not applied to the sentence
Requires: {@link DependencyParseAnnotator}
Get an array of token representations of the sentence words
Kind: global function
Returns: Array.<Token> - tokens
Throws:
Error in case the required annotator was not applied to the sentence
Requires: {@link TokenizerAnnotator}
Get the Nth token of the sentence
Kind: global function
Returns: Token - token
Throws:
Error in case the required annotator was not applied to the sentence
Requires: {@link TokenizerAnnotator}
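Putting the sentence-level accessors together, a short sketch (depparse is added to the annotators here only so that governors() has data to return):

```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';

const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner,parse,depparse' });
const pipeline = new Pipeline(props, 'English'); // uses ConnectorServer by default
const sent = new CoreNLP.simple.Sentence('The little dog runs so fast.');

pipeline.annotate(sent)
  .then(sent => {
    console.log(sent.words());    // [ 'The', 'little', 'dog', ... ]
    console.log(sent.posTags());  // [ 'DT', 'JJ', 'NN', ... ]
    console.log(sent.lemmas());   // [ 'the', 'little', 'dog', ... ]
    console.log(sent.nerTags());  // [ 'O', 'O', 'O', ... ]
    console.log(sent.posTag(2), sent.lemma(2), sent.nerTag(2)); // third token (0-based index)
    sent.governors().forEach(governor => console.log(governor.toString())); // needs depparse
  })
  .catch(err => console.log('err', err));
```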
toJSON() ⇒ SentenceJSON
The following arrow function data => Sentence.fromJSON(data).toJSON() is idempotent, if
considering shallow comparison, not by reference.
This JSON respects the same structure as it expects from {@see Sentence#fromJSON}.
Kind: global function
Returns: SentenceJSON - data
Update an instance of Sentence with data provided by a JSON
Kind: global function
Returns: Sentence - sentence - The current sentence instance
| Param | Type | Default | Description |
|---|---|---|---|
| data | SentenceJSON | | The document data, as returned by CoreNLP API service |
| [isSentence] | boolean | false | Indicate if the given data represents just the sentence or a full document with just a sentence inside |
Get an instance of Sentence from a given JSON
Kind: global function
Returns: Sentence - document - A new Sentence instance
| Param | Type | Default | Description |
|---|---|---|---|
| data | SentenceJSON | | The document data, as returned by CoreNLP API service |
| [isSentence] | boolean | false | Indicate if the given data represents just the sentence or a full document |
Get a string representation
Kind: global function
Returns: string - token
Get the index number assigned by StanfordCoreNLP.
This index is relative to the sentence it belongs to, and is 1-based (a positive integer).
This number is useful to match tokens within a sentence for depparse, coreference, etc.
Kind: global function
Returns: number - index
Get the original word
Kind: global function
Returns: string - word
Get the original text
Kind: global function
Returns: string - originalText
A 0-based index of the word's initial character within the sentence
Kind: global function
Returns: number - characterOffsetBegin
Get the characterOffsetEnd relative to the parent sentence A 0-based index of the word's ending character within the sentence
Kind: global function
Returns: number - characterOffsetEnd
Get the before string relative to the container sentence
Kind: global function
Returns: string - before
Get the after string relative to the container sentence
Kind: global function
Returns: string - after
Get the annotated lemma
Kind: global function
Returns: string - lemma
Get the annotated part-of-speech for the current token
Kind: global function
Returns: string - pos
posInfo() ⇒ PosInfo
Get additional metadata about the POS annotation NOTE: Do not use this method other than just for study or analysis purposes.
Kind: global function
Returns: PosInfo - posInfo
See: PosInfo for more details
Get the annotated named-entity for the current token
Kind: global function
Returns: string - ner
toJSON() ⇒ TokenJSON
The following arrow function data => Token.fromJSON(data).toJSON() is idempotent, if
considering shallow comparison, not by reference.
This JSON respects the same structure as it expects from {@see Token#fromJSON}.
Kind: global function
Returns: TokenJSON - data
Get an instance of Token from a given JSON
Kind: global function
Returns: Token - token - A new Token instance
| Param | Type | Description |
|---|---|---|
| data | TokenJSON | The token data, as returned by CoreNLP API service |
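For token-level access, an annotated sentence can also be walked token by token, using only the accessors documented above:

```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';

const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner' });
const pipeline = new Pipeline(props, 'English');
const sent = new CoreNLP.simple.Sentence('Hello world');

pipeline.annotate(sent)
  .then(sent => {
    sent.tokens().forEach(token => {
      console.log(
        token.index(),                // 1-based index within the sentence
        token.word(),
        token.pos(),
        token.lemma(),
        token.ner(),
        token.characterOffsetBegin(), // 0-based character offsets within the sentence
        token.characterOffsetEnd());
    });
    console.log(sent.tokens()[0].toJSON()); // raw CoreNLP token structure of the first token
  })
  .catch(err => console.log('err', err));
```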
Get a Tree string representation for debugging purposes
Kind: global function
Returns: string - tree
Performs a depth-first search, calling a visitor for each node
Kind: global function
See: DFS
Performs a depth-first search, calling a visitor for each node, from right to left
Kind: global function
See: DFS
Performs a depth-first search, calling a visitor only over leaves
Kind: global function
See: DFS
Kind: global function
Returns: Tree - tree
| Param | Type | Default | Description |
|---|---|---|---|
| sentence | Sentence | | |
| [doubleLink] | boolean | false | whether the child nodes should have a reference to their parent or not - this allows the use of Node.parent() |
Kind: global function
Returns: Tree - tree
| Param | Type | Default | Description |
|---|---|---|---|
| str | string | | |
| [doubleLink] | boolean | false | whether the child nodes should have a reference to their parent or not - this allows the use of Node.parent() |
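As an illustration of the Tree helpers, the following sketch builds a Tree from a parsed sentence and visits its leaves (the exact shape of the node passed to the visitor is not documented here, so it is simply logged):

```js
import CoreNLP, { Properties, Pipeline } from 'corenlp';

const props = new Properties({ annotators: 'tokenize,ssplit,pos,lemma,ner,parse' });
const pipeline = new Pipeline(props, 'English');
const sent = new CoreNLP.simple.Sentence('The little dog runs so fast.');

pipeline.annotate(sent)
  .then(sent => {
    const tree = CoreNLP.util.Tree.fromSentence(sent);
    console.log(tree.dump()); // string representation of the parse tree, for debugging
    tree.visitLeaves(node => {
      // each leaf of the parse tree is passed to the visitor
      console.log(node);
    });
  })
  .catch(err => console.log('err', err));
```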
The CoreNLP API JSON structure representing a document
Kind: global typedef
Properties
| Name | Type |
|---|---|
| index | number |
| sentences | Array.<Sentence> |
Kind: global typedef
Properties
| Name | Type | Description |
|---|---|---|
| label | string | group label |
| begin | number | 0-based index of the matched group, relative to the given text |
| end | number | 0-based index of the matched group, relative to the given text |
| token | Token | only given if aggregated with an annotated Sentence or Document |
| $[label] | ExpressionSentenceMatchGroup | other groups inside |
An ExpressionSentenceMatch of either TokensRegex, Semgrex or Tregex.
Kind: global typedef
Properties
| Name | Type | Description |
|---|---|---|
| begin | number | word begin position, starting from zero |
| end | number | word end position, starting from zero (no match ends at 0) |
| text | string | matched text |
| $[label] | string | any label, as defined in the expression pattern |
The CoreNLP API JSON structure representing an expression.
This expression structure can be found as the output of TokensRegex,
Semgrex and Tregex.
Kind: global typedef
Properties
| Name | Type |
|---|---|
| index | number |
| sentences | Array.<Array.<ExpressionSentenceMatch>> |
The CoreNLP API JSON structure representing a governor
Kind: global typedef
Properties
| Name | Type |
|---|---|
| dep | string |
| governor | number |
| governorGloss | string |
| dependent | number |
| dependentGloss | string |
The CoreNLP API JSON structure representing a sentence
Kind: global typedef
Properties
| Name | Type | Description |
|---|---|---|
| index | number | 1-based index, as they come indexed by StanfordCoreNLP |
| tokens | Array.<Token> | |
The CoreNLP API JSON structure representing a token
Kind: global typedef
Properties
| Name | Type |
|---|---|
| index | number |
| word | string |
| originalText | string |
| characterOffsetBegin | number |
| characterOffsetEnd | number |
| before | string |
| after | string |
PosInfo does not come as part of the CoreNLP. It is an indexed reference of POS tags by language provided by this library. It's only helpful for analysis and study. The data was collected from different documentation resources on the Web. The PosInfo may vary depending on the POS annotation types used, for example, CoreNLP for Spanish uses custom POS tags developed by Stanford, but this can also be changed to Universal Dependencies, which uses different tags.
Kind: global typedef
Properties
| Name | Type |
|---|---|
| group | string |
| tag | string |
| examples | Array.<string> |
TODO ?? ⇐ Annotator
Class representing a DeterministicCorefAnnotator.
Kind: global external
Extends: Annotator
See: DeterministicCorefAnnotator
Hydrates {@link Sentence.governors()} ⇐ Annotator
Class representing a DependencyParseAnnotator.
Kind: global external
Extends: Annotator
See: DependencyParseAnnotator
Hydrates {@link Token.lemma()} ⇐ Annotator
Class representing a MorphaAnnotator.
Kind: global external
Extends: Annotator
See: MorphaAnnotator
Hydrates {@link Token.ner()} ⇐ Annotator
Class representing an NERClassifierCombiner.
Kind: global external
Extends: Annotator
See: NERClassifierCombiner
Hydrates {@link Sentence.parse()} ⇐ Annotator
Class representing a ParserAnnotator.
Kind: global external
Extends: Annotator
See: ParserAnnotator
Hydrates {@link Token.pos()} ⇐ Annotator
Class representing a POSTaggerAnnotator.
Kind: global external
Extends: Annotator
See: POSTaggerAnnotator
TODO ?? ⇐ Annotator
Class representing a RegexNERAnnotator.
Kind: global external
Extends: Annotator
See: RegexNERAnnotator
TODO ?? ⇐ Annotator
Class representing a RelationExtractorAnnotator.
Kind: global external
Extends: Annotator
See: RelationExtractorAnnotator
Combines multiple {@link Token}s into sentences ⇐ Annotator
Class representing a WordsToSentenceAnnotator.
Kind: global external
Extends: Annotator
See: WordsToSentenceAnnotator
Identifies {@link Token}s ⇐ Annotator
Class representing a TokenizerAnnotator.
Kind: global external
Extends: Annotator
See: TokenizerAnnotator
We will update this section soon. In the meantime, you can browse the project codebase and read the JSDoc references.
In summary, this Node.js library aims to replicate the CoreNLP Simple Java interface, but in JavaScript. There are some minor differences, however; for example, the need to call applyAnnotator asynchronously.
Properties
Pipeline
Service
ConnectorServer # https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
ConnectorCli # https://stanfordnlp.github.io/CoreNLP/cmdline.html
CoreNLP
simple # https://stanfordnlp.github.io/CoreNLP/simple.html
Annotable
Annotator
Document
Sentence
Token
annotator # https://stanfordnlp.github.io/CoreNLP/annotators.html
TokenizerAnnotator # https://stanfordnlp.github.io/CoreNLP/tokenize.html
WordsToSentenceAnnotator # https://stanfordnlp.github.io/CoreNLP/ssplit.html
POSTaggerAnnotator # https://stanfordnlp.github.io/CoreNLP/pos.html
MorphaAnnotator # https://stanfordnlp.github.io/CoreNLP/lemma.html
NERClassifierCombiner # https://stanfordnlp.github.io/CoreNLP/ner.html
ParserAnnotator # https://stanfordnlp.github.io/CoreNLP/parse.html
DependencyParseAnnotator # https://stanfordnlp.github.io/CoreNLP/depparse.html
RelationExtractorAnnotator # https://stanfordnlp.github.io/CoreNLP/relation.html
DeterministicCorefAnnotator # https://stanfordnlp.github.io/CoreNLP/coref.html
util
Tree # http://www.cs.cornell.edu/courses/cs474/2004fa/lec1.pdf

© 2017 Gerardo Bort <[email protected]> under GPL-3.0 Licence. Documented by jsdoc-to-markdown.
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.