
gerardobort/node-corenlp

 
 


NodeJS CoreNLP Library


This project is under active development; please stay tuned for updates. More documentation and examples are coming.

This library connects to Stanford CoreNLP either via HTTP or by spawning processes. HTTP is the preferred method: CoreNLP has to initialize only once to serve many requests, and it avoids the extra I/O of the CLI method, which needs to write temporary files to run.

Setup

1. Install the package:

npm i --save corenlp

2. Download Stanford CoreNLP

Via npm, run this command from your own project after having installed this library:

npm explore corenlp -- npm run corenlp:download

Once downloaded, you can start the server by running:

npm explore corenlp -- npm run corenlp:server

Alternatively, you can manually download the package from the Stanford CoreNLP download section at https://stanfordnlp.github.io/CoreNLP/download.html. Apart from the full package, you may also want to download other language models (see that page for details).

3. Configure Stanford CoreNLP

3.1. Using StanfordCoreNLPServer

# Run the server using all jars in the current directory (e.g., the CoreNLP home directory)
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

By default, CoreNLP connects via StanfordCoreNLPServer on port 9000. You can also set up the connection differently:

import CoreNLP, { Properties, Pipeline, ConnectorServer } from 'corenlp';

const connector = new ConnectorServer({ dsn: 'http://localhost:9000' });
const props = new Properties({
  annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English', connector);
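Under the hood, the CoreNLP server speaks plain HTTP: the text goes in the POST body and the annotation properties go JSON-encoded in the `properties` query parameter. The sketch below builds such a request by hand to show what a server connector has to send. `buildServerRequest` is a hypothetical helper for illustration, not part of this library, and the commented `fetch` call assumes Node 18+ with a server listening on port 9000.

```javascript
// Hypothetical helper (not part of the corenlp package): builds the URL and
// fetch options for a raw request to a StanfordCoreNLPServer instance.
function buildServerRequest(dsn, properties, text) {
  // The server reads annotation settings from the JSON-encoded `properties`
  // query parameter; JSON output is requested explicitly.
  const query = encodeURIComponent(
    JSON.stringify({ ...properties, outputFormat: 'json' }),
  );
  return {
    url: `${dsn.replace(/\/+$/, '')}/?properties=${query}`,
    options: { method: 'POST', body: text }, // the raw text is the body
  };
}

// Usage (requires a running server; Node 18+ provides a global fetch):
// const { url, options } = buildServerRequest('http://localhost:9000',
//   { annotators: 'tokenize,ssplit,pos' }, 'Hello world');
// const json = await (await fetch(url, options)).json();
```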

3.2. Use CoreNLP via CLI

By default, CoreNLP expects the StanfordCoreNLP package to be unzipped inside ${YOUR_NPM_PROJECT_ROOT}/corenlp/. You can also set up the CLI connector differently:

import CoreNLP, { Properties, Pipeline, ConnectorCli } from 'corenlp';

const connector = new ConnectorCli({
  classPath: 'corenlp/stanford-corenlp-full-2017-06-09/*', // specify the paths relative to your npm project root
  mainClass: 'edu.stanford.nlp.pipeline.StanfordCoreNLP', // optional
  props: 'StanfordCoreNLP-spanish.properties', // optional
});
const props = new Properties({
  annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English', connector);
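To make the CLI option more concrete, the sketch below assembles a `java` argument list of the kind a CLI connector has to spawn. It uses real StanfordCoreNLP command-line flags (`-props`, `-annotators`, `-file`, `-outputFormat`, `-outputDirectory`), but `buildCliArgs` is a hypothetical helper: the actual ConnectorCli may build its command differently.

```javascript
// Hypothetical sketch (not ConnectorCli's source): builds the arguments for
// `java` to run the StanfordCoreNLP main class on a temporary input file.
function buildCliArgs(
  { classPath, mainClass = 'edu.stanford.nlp.pipeline.StanfordCoreNLP', props },
  annotators,
  inputFile,
  outputDir,
) {
  const args = ['-mx4g', '-cp', classPath, mainClass];
  if (props) args.push('-props', props); // optional language properties file
  args.push(
    '-annotators', annotators,
    '-file', inputFile, // the text must be written to a temporary file first
    '-outputFormat', 'json',
    '-outputDirectory', outputDir,
  );
  return args;
}

// Usage with Node's child_process (not executed here):
// const { spawn } = require('child_process');
// spawn('java', buildCliArgs({ classPath: 'corenlp/stanford-corenlp-full-2017-06-09/*' },
//   'tokenize,ssplit,pos', '/tmp/input.txt', '/tmp'));
```

Because the arguments are passed as an array, no shell is involved; Java itself expands the trailing `*` wildcard in the class path.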

4. Usage

// ... initialize pipeline first (see above)

const sent = new CoreNLP.simple.Sentence('Hello world');
pipeline.annotate(sent)
  .then(sent => {
    console.log(sent.words());
  })
  .catch(err => {
    console.log('err', err);
  });

Examples

NOTE 1: The examples below assume that StanfordCoreNLP is running on port 9000.
NOTE 2: The examples below use ES6 syntax. If you use require, import the library as follows: var CoreNLP = require('corenlp').default;

English

import CoreNLP, { Properties, Pipeline } from 'corenlp';

const props = new Properties({
  annotators: 'tokenize,ssplit,pos,lemma,ner,parse',
});
const pipeline = new Pipeline(props, 'English'); // uses ConnectorServer by default

const sent = new CoreNLP.simple.Sentence('The little dog runs so fast.');
pipeline.annotate(sent)
  .then(sent => {
    console.log('parse', sent.parse());
    console.log(CoreNLP.util.Tree.fromSentence(sent).dump());
  })
  .catch(err => {
    console.log('err', err);
  });

API Reference

Functions

setProperty(name, value)

Property setter

getProperty(name, default) ⇒ *

Property getter

getProperties() ⇒ Object

Returns an Object map of the given properties

toJson() ⇒ Object

Returns a JSON object of the given properties

toPropertiessFileContent() ⇒ string

Returns a properties file-like string of the given properties

get() ⇒ Promise.<Object>
get(config, [utility]) ⇒ Promise.<Object>
text() ⇒ string

Get a string representation of the raw text

setLanguageISO() ⇒ string

Sets the language ISO (given by the pipeline during the annotation process). This is solely to keep track of the language chosen for further analysis.

getLanguageISO() ⇒ string

Retrieves the language ISO

addAnnotator(annotator)

Marks an annotator as a met dependency

addAnnotators(annotators)

Marks multiple annotators as met dependencies

removeAnnotator(annotator)

Unmarks an annotator as a met dependency

hasAnnotator(annotator) ⇒ boolean

Tells you if an annotator is a met dependency

hasAnyAnnotator(annotators) ⇒ boolean

Tells you if at least one of a list of annotators is a met dependency

toString() ⇒ string

Get a string representation

equalsTo(annotator) ⇒ boolean

Defines whether a given annotator is the same as the current one, using shallow comparison. This is useful for a Document or Sentence to validate whether the minimum required annotators were already applied to them. At the same time, it allows users to instantiate new annotators and configure them as needed.

options() ⇒ Object

Get an Object key-value representation of the annotator's options (excluding prefix)

option(key, [value]) ⇒ string

Get/Set an option value

dependencies() ⇒ Array.<Annotator>

Get a list of annotator dependencies

pipeline() ⇒ Array.<string>

Get a list of the annotator dependencies followed by this annotator, all as a list of strings. This is useful to fill the annotators param in CoreNLP API properties.

pipelineOptions() ⇒ Array.<string>

Get an object of all the Annotator options, including the current annotator and all its dependencies, prefixed by the annotator names. This is useful to fill the options params in CoreNLP API properties.

toString() ⇒ string

Get a string representation

sentences() ⇒ Array.<Sentence>

Get a list of sentences

sentence(index) ⇒ Sentence

Get the sentence for a given index

coref() ⇒ Promise.<DeterministicCorefAnnotator>

TODO requirements: tokenize, ssplit, pos, lemma, ner, parse https://stanfordnlp.github.io/CoreNLP/dcoref.html

fromJSON(data) ⇒ Document

Update an instance of Document with data provided by a JSON

fromJSON(data) ⇒ Document

Get an instance of Document from a given JSON

groups() ⇒ Array.<ExpressionSentenceMatchGroup>

Returns the main and labeled groups as a list of ExpressionSentenceMatchGroup

group(label) ⇒ ExpressionSentenceMatchGroup

Nodes in a matched expression can be named; we call them groups here, and the labels are the names of the nodes.

labels() ⇒ Array.<string>

Labels are those aliases you can add to a group match expression. For example, in Semgrex you can write {ner:/PERSON/=good_guy}, where "good_guy" is the label; internally it comes as $good_guy, a member of ExpressionSentenceMatchGroup.

fromJson(data) ⇒ ExpressionSentenceMatch

Update an instance of ExpressionSentenceMatch with data provided by a JSON

fromJson(data) ⇒ ExpressionSentenceMatch

Get an instance of ExpressionSentenceMatch from a given JSON

matches() ⇒ Array.<ExpressionSentenceMatch>

Retrieves all the contained ExpressionSentenceMatch instances

match(index) ⇒ ExpressionSentenceMatch

Retrieves an ExpressionSentenceMatch at the index specified

mergeTokensFromSentence() ⇒ ExpressionSentence

The Expression / ExpressionSentence objects come from outside the standard CoreNLP pipelines. This means that neither TokensRegex, Semgrex nor Tregex will tag the nodes with POS, lemma, NER or any other annotation data. It is sometimes useful to get, apart from the matching groups, the annotated tokens for each word in the match group.

fromJson(data) ⇒ ExpressionSentenceJSON

Update an instance of ExpressionSentence with data provided by a JSON

fromJson(data) ⇒ ExpressionSentence

Get an instance of ExpressionSentence from a given JSON of sentence matches

toString() ⇒ string

Get a string representation

pattern() ⇒ string

Get the pattern

sentences() ⇒ Array.<ExpressionSentence>

Get a list of sentences

sentence(index) ⇒ ExpressionSentence

Get the sentence for a given index

mergeTokensFromDocument(document) ⇒ Expression

Hydrate the Expression instance with Token objects from an annotated Document

fromJson(data) ⇒ Expression

Update an instance of Expression with data provided by a JSON

fromJson(data) ⇒ Expression

Get an instance of Expression from a given JSON

toString() ⇒ string

Get a string representation

fromJSON(data) ⇒ Governor

Get an instance of Governor from a given JSON

toString() ⇒ string

Get a string representation

index() ⇒ number

Get the index relative to the parent document

parse() ⇒ string

Get a string representation of the parse tree structure

words() ⇒ Array.<string>

Get an array of string representations of the sentence words

word(index) ⇒ string

Get a string representation of the Nth word of the sentence

posTags() ⇒ Array.<string>

Get string representations of the part-of-speech tags of the sentence tokens

posTag(index) ⇒ string

Get a string representation of the Nth token's part-of-speech tag

lemmas() ⇒ Array.<string>

Get string representations of the lemmas of the sentence tokens

lemma(index) ⇒ string

Get a string representation of the Nth token's lemma

nerTags() ⇒ Array.<string>

Get string representations of the NER tags of the sentence tokens

nerTag(index) ⇒ string

Get a string representation of the Nth token's NER tag

governors() ⇒ Array.<Governor>

Get a list of the governors annotated by the dependency parser

governor() ⇒ Governor

Get the Nth governor annotated by the dependency parser annotator

tokens() ⇒ Array.<Token>

Get an array of token representations of the sentence words

token() ⇒ Token

Get the Nth token of the sentence

toJSON() ⇒ SentenceJSON

The arrow function data => Sentence.fromJSON(data).toJSON() is idempotent when compared shallowly, not by reference. The returned JSON respects the same structure that Sentence#fromJSON expects.

fromJSON(data, [isSentence]) ⇒ Sentence

Update an instance of Sentence with data provided by a JSON

fromJSON(data, [isSentence]) ⇒ Sentence

Get an instance of Sentence from a given JSON

toString() ⇒ string

Get a string representation

index() ⇒ number

Get the index number assigned by StanfordCoreNLP. This index is relative to the sentence the token belongs to and is 1-based (a positive integer). This number is useful to match tokens within a sentence for depparse, coreference, etc.

word() ⇒ string

Get the original word

originalText() ⇒ string

Get the original text

characterOffsetBegin() ⇒ number

A 0-based index of the word's initial character within the sentence

characterOffsetEnd() ⇒ number

Get the characterOffsetEnd relative to the parent sentence: a 0-based index just past the word's last character within the sentence

before() ⇒ string

Get the before string relative to the container sentence

after() ⇒ string

Get the after string relative to the container sentence

lemma() ⇒ string

Get the annotated lemma

pos() ⇒ string

Get the annotated part-of-speech for the current token

posInfo() ⇒ PosInfo

Get additional metadata about the POS annotation. NOTE: Use this method only for study or analysis purposes.

ner() ⇒ string

Get the annotated named-entity for the current token

toJSON() ⇒ TokenJSON

The arrow function data => Token.fromJSON(data).toJSON() is idempotent when compared shallowly, not by reference. The returned JSON respects the same structure that Token#fromJSON expects.

fromJSON(data) ⇒ Token

Get an instance of Token from a given JSON

dump() ⇒ string

Get a Tree string representation for debugging purposes

visitDeepFirst()

Performs a depth-first search, calling a visitor for each node

visitDeepFirstRight()

Performs a depth-first search, calling a visitor for each node, from right to left

visitLeaves()

Performs a depth-first search, calling a visitor only on leaves

fromSentence(sentence, [doubleLink]) ⇒ Tree
fromString(str, [doubleLink]) ⇒ Tree

Typedefs

DocumentJSON

The CoreNLP API JSON structure representing a document

ExpressionSentenceMatchGroup
ExpressionSentenceMatchJSON

An ExpressionSentenceMatch of either TokensRegex, Semgrex or Tregex.

ExpressionJSON

The CoreNLP API JSON structure representing an expression. This expression structure can be found as the output of TokensRegex, Semgrex and Tregex.

GovernorJSON

The CoreNLP API JSON structure representing a governor

SentenceJSON

The CoreNLP API JSON structure representing a sentence

TokenJSON

The CoreNLP API JSON structure representing a token

PosInfo

PosInfo is not part of CoreNLP. It is an indexed reference of POS tags by language, provided by this library, and it is only helpful for analysis and study. The data was collected from different documentation resources on the Web. The PosInfo may vary depending on the POS annotation scheme used: for example, CoreNLP for Spanish uses custom POS tags developed by Stanford, but this can be changed to Universal Dependencies, which uses different tags.

External

DeterministicCorefAnnotator ⇐ Annotator

Class representing a DeterministicCorefAnnotator. Hydrates: TODO

DependencyParseAnnotator ⇐ Annotator

Class representing a DependencyParseAnnotator. Hydrates Sentence.governors().

MorphaAnnotator ⇐ Annotator

Class representing a MorphaAnnotator. Hydrates Token.lemma().

NERClassifierCombiner ⇐ Annotator

Class representing a NERClassifierCombiner. Hydrates Token.ner().

ParserAnnotator ⇐ Annotator

Class representing a ParserAnnotator. Hydrates Sentence.parse().

POSTaggerAnnotator ⇐ Annotator

Class representing a POSTaggerAnnotator. Hydrates Token.pos().

RegexNERAnnotator ⇐ Annotator

Class representing a RegexNERAnnotator. Hydrates: TODO

RelationExtractorAnnotator ⇐ Annotator

Class representing a RelationExtractorAnnotator. Hydrates: TODO

WordsToSentenceAnnotator ⇐ Annotator

Class representing a WordsToSentenceAnnotator. Combines multiple Tokens into sentences.

TokenizerAnnotator ⇐ Annotator

Class representing a TokenizerAnnotator. Identifies Tokens.

setProperty(name, value)

Property setter

Kind: global function

Param Type Description
name string the property name
value * the property value

getProperty(name, default) ⇒ *

Property getter

Kind: global function
Returns: * - value - the property value

Param Type Description
name string the property name
default * the default value to return if not set

getProperties() ⇒ Object

Returns an Object map of the given properties

Kind: global function
Returns: Object - properties - the properties object

toJson() ⇒ Object

Returns a JSON object of the given properties

Kind: global function
Returns: Object - json - the properties object

toPropertiessFileContent() ⇒ string

Returns a properties file-like string of the given properties

Kind: global function
Returns: string - properties - the properties content
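The property methods above amount to a small key-value store plus a serializer. The class below is a hypothetical stand-in written for illustration, not the library's source; its serializer is named `toPropertiesFileContent` here, while the documented method is spelled `toPropertiessFileContent`.

```javascript
// Hypothetical stand-in for the documented Properties methods; not the
// library's source. Values are stored in a plain object.
class PropertiesSketch {
  constructor(initial = {}) {
    this.props = { ...initial };
  }

  // Property setter
  setProperty(name, value) {
    this.props[name] = value;
  }

  // Property getter with a fallback for names that were never set
  // (own properties only, so inherited names like "toString" are ignored)
  getProperty(name, defaultValue) {
    return Object.prototype.hasOwnProperty.call(this.props, name)
      ? this.props[name]
      : defaultValue;
  }

  // Shallow copy, so callers cannot mutate the internal map
  getProperties() {
    return { ...this.props };
  }

  // Java .properties style: one `key = value` pair per line
  toPropertiesFileContent() {
    return Object.entries(this.props)
      .map(([key, value]) => `${key} = ${value}`)
      .join('\n');
  }
}
```

Content in this format can be saved to a file and handed to CoreNLP's `-props` command-line flag.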

get() ⇒ Promise.<Object>

Kind: global function

get(config, [utility]) ⇒ Promise.<Object>

Kind: global function

Param Type Description
config Object
config.annotators Array.<string> The list of annotators that defines the pipeline
config.text string The text to run the pipeline against
config.options Object Additional options (properties) for the pipeline
config.language string Language full name in CamelCase (e.g. Spanish)
[utility] '' | 'tokensregex' | 'semgrex' | 'tregex' Name of the utility to use. NOTE: most of the utilities receive properties; these should be passed via the options param
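The `utility` values line up with the CoreNLP server's endpoints: `/` for the annotation pipeline and `/tokensregex`, `/semgrex`, `/tregex` for pattern matching, each of which takes a `pattern` query parameter. The sketch below maps the `get()` config onto such a URL. `buildUtilityUrl` is hypothetical, and reading the pattern from `config.options.pattern` is an assumption made for this example, not necessarily how the library passes it.

```javascript
// Hypothetical sketch: maps the documented `utility` argument to the matching
// StanfordCoreNLPServer endpoint. Taking the pattern from
// `config.options.pattern` is an assumption made for this example.
const UTILITY_PATHS = {
  '': '/',
  tokensregex: '/tokensregex',
  semgrex: '/semgrex',
  tregex: '/tregex',
};

function buildUtilityUrl(dsn, config, utility = '') {
  if (!Object.prototype.hasOwnProperty.call(UTILITY_PATHS, utility)) {
    throw new Error(`Unknown utility: ${utility}`);
  }
  // Split the pattern off; everything else becomes a pipeline property
  const { pattern, ...options } = config.options || {};
  const properties = {
    annotators: config.annotators.join(','),
    outputFormat: 'json',
    ...options,
  };
  const params = new URLSearchParams({ properties: JSON.stringify(properties) });
  if (utility && pattern) params.set('pattern', pattern);
  return `${dsn}${UTILITY_PATHS[utility]}?${params}`;
}
```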

text() ⇒ string

Get a string representation of the raw text

Kind: global function
Returns: string - text

setLanguageISO() ⇒ string

Sets the language ISO (given by the pipeline during the annotation process). This is solely to keep track of the language chosen for further analysis.

Kind: global function
Returns: string - text

getLanguageISO() ⇒ string

Retrieves the language ISO

Kind: global function
Returns: string - text

addAnnotator(annotator)

Marks an annotator as a met dependency

Kind: global function

Param Type
annotator Annotator | function

addAnnotators(annotators)

Marks multiple annotators as met dependencies

Kind: global function

Param Type
annotators Array.<(Annotator|function())>

removeAnnotator(annotator)

Unmarks an annotator as a met dependency

Kind: global function

Param Type
annotator Annotator | function

hasAnnotator(annotator) ⇒ boolean

Tells you if an annotator is a met dependency

Kind: global function
Returns: boolean - hasAnnotator

Param Type
annotator Annotator | function

hasAnyAnnotator(annotators) ⇒ boolean

Tells you if at least one of a list of annotators is a met dependency

Kind: global function
Returns: boolean - hasAnyAnnotator

Param Type
annotators Array.<(Annotator|function())>
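The "met dependency" methods above behave like set operations. A minimal sketch, assuming annotators are identified by their CoreNLP names (e.g. `'tokenize'`); the documented API also accepts Annotator instances or classes, which this hypothetical `MetDependencies` class does not model.

```javascript
// Hypothetical sketch of met-dependency bookkeeping, keyed by annotator name.
class MetDependencies {
  constructor() {
    this.met = new Set();
  }

  addAnnotator(name) {
    this.met.add(name);
  }

  addAnnotators(names) {
    names.forEach((name) => this.addAnnotator(name));
  }

  removeAnnotator(name) {
    this.met.delete(name);
  }

  hasAnnotator(name) {
    return this.met.has(name);
  }

  // True as soon as one of the given annotators has been applied
  hasAnyAnnotator(names) {
    return names.some((name) => this.met.has(name));
  }
}
```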

toString() ⇒ string

Get a string representation

Kind: global function
Returns: string - annotator

equalsTo(annotator) ⇒ boolean

Defines whether a given annotator is the same as the current one, using shallow comparison. This is useful for a Document or Sentence to validate whether the minimum required annotators were already applied to them. At the same time, it allows users to instantiate new annotators and configure them as needed.

Kind: global function

Param Type
annotator Annotator
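"Shallow comparison" here means comparing the annotator's identity and its options one level deep. A minimal sketch, assuming an annotator is represented as a plain `{ name, options }` object (a hypothetical shape, not the library's class); `ner.useSUTime` is used only as a sample option.

```javascript
// Same keys and strictly equal values; nested objects are compared by reference
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k])
  );
}

// Hypothetical equality: same annotator name and shallowly equal options
function annotatorEquals(x, y) {
  return x.name === y.name && shallowEqual(x.options, y.options);
}
```

Two separately constructed annotators with the same configuration compare as equal, which is what lets users instantiate fresh annotators without breaking dependency checks.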

options() ⇒ Object

Get an Object key-value representation of the annotator's options (excluding prefix)

Kind: global function
Returns: Object - options

option(key, [value]) ⇒ string

Get/Set an option value

Kind: global function
Returns: string - value

Param Type Default
key string
[value] string | boolean null

dependencies() ⇒ Array.<Annotator>

Get a list of annotator dependencies

Kind: global function
Returns: Array.<Annotator> - dependencies

pipeline() ⇒ Array.<string>

Get a list of the annotator dependencies followed by this annotator, all as a list of strings. This is useful to fill the annotators param in CoreNLP API properties.

Kind: global function
Returns: Array.<string> - pipeline

pipelineOptions() ⇒ Array.<string>

Get an object of all the Annotator options, including the current annotator and all its dependencies, prefixed by the annotator names. This is useful to fill the options params in CoreNLP API properties.

Kind: global function
Returns: Array.<string> - pipelineOptions
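Both `pipeline()` and `pipelineOptions()` walk the annotator dependency graph. The sketch below shows one way to do that: a post-order depth-first traversal, which lists every dependency before the annotators that need it and skips ones already visited. It assumes a hypothetical `{ name, options, dependencies }` shape and an acyclic graph. Following the description above, `pipelineOptions` returns an object here, although the documented return type says `Array.<string>`.

```javascript
// Post-order DFS: dependencies first, each annotator listed once
function pipeline(annotator, seen = new Set(), out = []) {
  if (seen.has(annotator.name)) return out;
  seen.add(annotator.name);
  for (const dep of annotator.dependencies) pipeline(dep, seen, out);
  out.push(annotator.name);
  return out;
}

// Same traversal, collecting options prefixed with the annotator name
function pipelineOptions(annotator, seen = new Set(), out = {}) {
  if (seen.has(annotator.name)) return out;
  seen.add(annotator.name);
  for (const dep of annotator.dependencies) pipelineOptions(dep, seen, out);
  for (const [key, value] of Object.entries(annotator.options)) {
    out[`${annotator.name}.${key}`] = value;
  }
  return out;
}

// Hypothetical annotator graph for illustration
const tokenize = { name: 'tokenize', options: { language: 'en' }, dependencies: [] };
const ssplit = { name: 'ssplit', options: {}, dependencies: [tokenize] };
const pos = { name: 'pos', options: {}, dependencies: [tokenize, ssplit] };
const lemma = { name: 'lemma', options: {}, dependencies: [tokenize, ssplit, pos] };
```

Joining `pipeline(lemma)` with commas yields a value suitable for the `annotators` property.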

toString() ⇒ string

Get a string representation

Kind: global function
Returns: string - document

sentences() ⇒ Array.<Sentence>

Get a list of sentences

Kind: global function
Returns: Array.<Sentence> - sentences - The document sentences

sentence(index) ⇒ Sentence

Get the sentence for a given index

Kind: global function
Returns: Sentence - sentence - The document sentence at the given index

Param Type Description
index number The position of the sentence to get

coref() ⇒ Promise.<DeterministicCorefAnnotator>

TODO requirements: tokenize, ssplit, pos, lemma, ner, parse https://stanfordnlp.github.io/CoreNLP/dcoref.html

Kind: global function
Returns: Promise.<DeterministicCorefAnnotator> - dcoref

fromJSON(data) ⇒ Document

Update an instance of Document with data provided by a JSON

Kind: global function
Returns: Document - document - The current document instance

Param Type Description
data DocumentJSON The document data, as returned by CoreNLP API service

fromJSON(data) ⇒ Document

Get an instance of Document from a given JSON

Kind: global function
Returns: Document - document - A new Document instance

Param Type Description
data DocumentJSON The document data, as returned by CoreNLP API service
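For orientation, a document returned by the CoreNLP server (with `outputFormat=json`) is a list of sentences, each holding a list of tokens. The sample below is trimmed and illustrative, and the helpers are hypothetical, not this library's API. It also highlights an easy pitfall: the accessors documented below take 0-based positions, while each token's own `index` field is 1-based.

```javascript
// Trimmed, illustrative response shape; real responses include more fields
// per sentence and per token.
const sampleDocumentJson = {
  sentences: [
    { tokens: [{ index: 1, word: 'Hello' }, { index: 2, word: 'world' }] },
    { tokens: [{ index: 1, word: 'Bye' }, { index: 2, word: '.' }] },
  ],
};

// Words of every sentence, in document order
function wordsBySentence(doc) {
  return doc.sentences.map((sentence) => sentence.tokens.map((t) => t.word));
}

// 0-based lookup, mirroring the documented Sentence#word(index) contract,
// including an error when no token exists at that position
function wordAt(sentenceJson, index) {
  const token = sentenceJson.tokens[index];
  if (!token) throw new Error(`No token at index ${index}`);
  return token.word;
}
```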

groups() ⇒ Array.<ExpressionSentenceMatchGroup>

Returns the main and labeled groups as a list of ExpressionSentenceMatchGroup

Kind: global function
Returns: Array.<ExpressionSentenceMatchGroup> - groups

group(label) ⇒ ExpressionSentenceMatchGroup

Nodes in a matched expression can be named; we call them groups here, and the labels are the names of the nodes.

Kind: global function
Returns: ExpressionSentenceMatchGroup - group
See: https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html#Naming_nodes

Param Type Description
label string The label name, not prefixed with $

labels() ⇒ Array.<string>

Labels are those aliases you can add to a group match expression. For example, in Semgrex you can write {ner:/PERSON/=good_guy}, where "good_guy" is the label; internally it comes as $good_guy, a member of ExpressionSentenceMatchGroup.

Kind: global function
Returns: Array.<string> - labels
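Following the `$label` convention described above, labels can be recovered from a raw match by looking for `$`-prefixed keys. The match object below is illustrative for the pattern {ner:/PERSON/=good_guy}; its exact field set is an assumption based on the typedefs in this document, and `labels`/`group` here are hypothetical helpers rather than the library's methods.

```javascript
// Illustrative match for the Semgrex pattern {ner:/PERSON/=good_guy}
const sampleMatch = {
  text: 'John',
  begin: 0,
  end: 1,
  $good_guy: { text: 'John', begin: 0, end: 1 },
};

// Labels are the `$`-prefixed keys, returned without the prefix
function labels(match) {
  return Object.keys(match)
    .filter((key) => key.startsWith('$'))
    .map((key) => key.slice(1));
}

// Look a group up by its label (not prefixed with $)
function group(match, label) {
  const found = match[`$${label}`];
  if (!found) throw new Error(`No group labeled "${label}"`);
  return { label, ...found };
}
```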

fromJson(data) ⇒ ExpressionSentenceMatch

Update an instance of ExpressionSentenceMatch with data provided by a JSON

Kind: global function
Returns: ExpressionSentenceMatch - expression - The current match instance

Param Type Description
data ExpressionSentenceMatchJSON The match data, as returned by CoreNLP API service

fromJson(data) ⇒ ExpressionSentenceMatch

Get an instance of ExpressionSentenceMatch from a given JSON

Kind: global function
Returns: ExpressionSentenceMatch - match - A new ExpressionSentenceMatch instance

Param Type Description
data ExpressionSentenceMatchJSON The match data, as returned by CoreNLP API service

matches() ⇒ Array.<ExpressionSentenceMatch>

Retrieves all the contained ExpressionSentenceMatch instances

Kind: global function
Returns: Array.<ExpressionSentenceMatch> - matches

match(index) ⇒ ExpressionSentenceMatch

Retrieves an ExpressionSentenceMatch at the index specified

Kind: global function
Returns: ExpressionSentenceMatch - match

Param Type
index number

mergeTokensFromSentence() ⇒ ExpressionSentence

The Expression / ExpressionSentence objects come from outside the standard CoreNLP pipelines. This means that neither TokensRegex, Semgrex nor Tregex will tag the nodes with POS, lemma, NER or any other annotation data. It is sometimes useful to get, apart from the matching groups, the annotated tokens for each word in the match group.

Kind: global function
Returns: ExpressionSentence - instance - The current instance
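Since `begin` and `end` are word positions starting from zero (and, per the ExpressionSentenceMatchJSON typedef, no match ends at 0, which suggests `end` is exclusive), merging annotated tokens into a match boils down to slicing the sentence's token list. A minimal sketch under that assumption; `mergeTokens` is hypothetical, returns a new object rather than mutating the match, and is not the library's implementation.

```javascript
// Hypothetical sketch: attach the annotated tokens covered by the match and
// by each labeled ($-prefixed) group. Assumes begin inclusive, end exclusive.
function mergeTokens(match, sentenceTokens) {
  const withTokens = (g) => ({ ...g, tokens: sentenceTokens.slice(g.begin, g.end) });
  const merged = withTokens(match);
  for (const key of Object.keys(match)) {
    if (key.startsWith('$')) merged[key] = withTokens(match[key]); // labeled groups
  }
  return merged;
}
```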

fromJson(data) ⇒ ExpressionSentenceJSON

Update an instance of ExpressionSentence with data provided by a JSON

Kind: global function
Returns: ExpressionSentenceJSON - sentence - The current sentence instance

Param Type Description
data ExpressionSentenceJSON The expression data, as returned by CoreNLP API service

fromJson(data) ⇒ ExpressionSentence

Get an instance of ExpressionSentence from a given JSON of sentence matches

Kind: global function
Returns: ExpressionSentence - sentence - A new ExpressionSentence instance

Param Type Description
data ExpressionSentenceJSON The sentence data, as returned by CoreNLP API service

toString() ⇒ string

Get a string representation

Kind: global function
Returns: string - expression

pattern() ⇒ string

Get the pattern

Kind: global function
Returns: string - pattern - The expression pattern

sentences() ⇒ Array.<ExpressionSentence>

Get a list of sentences

Kind: global function
Returns: Array.<ExpressionSentence> - sentences - The expression sentences

sentence(index) ⇒ ExpressionSentence

Get the sentence for a given index

Kind: global function
Returns: ExpressionSentence - sentence - An expression sentence

Param Type Description
index number The position of the sentence to get

mergeTokensFromDocument(document) ⇒ Expression

Hydrate the Expression instance with Token objects from an annotated Document

Kind: global function
Returns: Expression - expression - The current expression instance
See: ExpressionSentence#mergeTokensFromSentence

Param Type Description
document Document An annotated document from where to extract the tokens

fromJson(data) ⇒ Expression

Update an instance of Expression with data provided by a JSON

Kind: global function
Returns: Expression - expression - The current expression instance

Param Type Description
data ExpressionJSON The expression data, as returned by CoreNLP API service

fromJson(data) ⇒ Expression

Get an instance of Expression from a given JSON

Kind: global function
Returns: Expression - expression - A new Expression instance

Param Type Description
data ExpressionJSON The expression data, as returned by CoreNLP API service

toString() ⇒ string

Get a string representation

Kind: global function
Returns: string - governor

fromJSON(data) ⇒ Governor

Get an instance of Governor from a given JSON

Kind: global function
Returns: Governor - governor - A new Governor instance
Todo

  • It is not possible to properly generate a Governor from a GovernorJSON, because the Governor requires references to the Token instances in order to work
Param Type Description
data GovernorJSON The governor data, as returned by CoreNLP API service
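The Todo above comes from the fact that a GovernorJSON only stores token positions (`governor`, `dependent`) plus glosses, so building a Governor means resolving those positions against the sentence's tokens. The sketch below illustrates that resolution on hand-written data for "dogs run"; `resolveGovernors` is hypothetical. Token indices are 1-based, and governor 0 denotes CoreNLP's artificial ROOT node, which has no token.

```javascript
// Illustrative data in the TokenJSON / GovernorJSON shapes
const tokensJson = [{ index: 1, word: 'dogs' }, { index: 2, word: 'run' }];
const dependenciesJson = [
  { dep: 'ROOT', governor: 0, governorGloss: 'ROOT', dependent: 2, dependentGloss: 'run' },
  { dep: 'nsubj', governor: 2, governorGloss: 'run', dependent: 1, dependentGloss: 'dogs' },
];

// Replace 1-based token indices with token objects; ROOT becomes null
function resolveGovernors(dependencies, tokens) {
  const byIndex = new Map(tokens.map((t) => [t.index, t]));
  return dependencies.map((d) => ({
    dep: d.dep,
    governor: d.governor === 0 ? null : byIndex.get(d.governor),
    dependent: byIndex.get(d.dependent),
  }));
}
```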

toString() ⇒ string

Get a string representation

Kind: global function
Returns: string - sentence

index() ⇒ number

Get the index relative to the parent document

Kind: global function
Returns: number - index

parse() ⇒ string

Get a string representation of the parse tree structure

Kind: global function
Returns: string - parse

words() ⇒ Array.<string>

Get an array of string representations of the sentence words

Kind: global function
Returns: Array.<string> - words
Throws:

  • Error in case the required annotator was not applied to the sentence

Requires: {@link TokenizerAnnotator}

word(index) ⇒ string

Get a string representation of the Nth word of the sentence

Kind: global function
Returns: string - word
Throws:

  • Error in case the required annotator was not applied to the sentence
  • Error in case the token for the given index does not exist

Requires: {@link TokenizerAnnotator}

Param Type Description
index number 0-based index as they are arranged naturally

posTags() ⇒ Array.<string>

Get string representations of the part-of-speech tags of the sentence tokens

Kind: global function
Returns: Array.<string> - posTags

posTag(index) ⇒ string

Get a string representation of the Nth token's part-of-speech tag

Kind: global function
Returns: string - posTag
Throws:

  • Error in case the token for the given index does not exist
Param Type Description
index number 0-based index as they are arranged naturally

lemmas() ⇒ Array.<string>

Get string representations of the lemmas of the sentence tokens

Kind: global function
Returns: Array.<string> - lemmas

lemma(index) ⇒ string

Get a string representation of the Nth token's lemma

Kind: global function
Returns: string - lemma
Throws:

  • Error in case the token for the given index does not exist
Param Type Description
index number 0-based index as they are arranged naturally

nerTags() ⇒ Array.<string>

Get string representations of the NER tags of the sentence tokens

Kind: global function
Returns: Array.<string> - nerTags

nerTag(index) ⇒ string

Get a string representation of the Nth token's NER tag

Kind: global function
Returns: string - nerTag
Throws:

  • Error in case the token for the given index does not exist
Param Type Description
index number 0-based index as they are arranged naturally

governors() ⇒ Array.<Governor>

Get a list of the governors annotated by the dependency parser

Kind: global function
Returns: Array.<Governor> - governors
Throws:

  • Error in case the required annotator was not applied to the sentence

Requires: {@link DependencyParseAnnotator}

governor() ⇒ Governor

Get the Nth governor annotated by the dependency parser annotator

Kind: global function
Returns: Governor - governor
Throws:

  • Error in case the required annotator was not applied to the sentence

Requires: {@link DependencyParseAnnotator}

tokens() ⇒ Array.<Token>

Get an array of token representations of the sentence words

Kind: global function
Returns: Array.<Token> - tokens
Throws:

  • Error in case the required annotator was not applied to the sentence

Requires: {@link TokenizerAnnotator}

token() ⇒ Token

Get the Nth token of the sentence

Kind: global function
Returns: Token - token
Throws:

  • Error in case the required annotator was not applied to the sentence

Requires: {@link TokenizerAnnotator}

toJSON() ⇒ SentenceJSON

The arrow function data => Sentence.fromJSON(data).toJSON() is idempotent when compared shallowly, not by reference. The returned JSON respects the same structure that Sentence#fromJSON expects.

Kind: global function
Returns: SentenceJSON - data

fromJSON(data, [isSentence]) ⇒ Sentence

Update an instance of Sentence with data provided by a JSON

Kind: global function
Returns: Sentence - sentence - The current sentence instance

Param Type Default Description
data SentenceJSON The document data, as returned by CoreNLP API service
[isSentence] boolean false Indicate if the given data represents just the sentence or a full document with just a sentence inside

fromJSON(data, [isSentence]) ⇒ Sentence

Get an instance of Sentence from a given JSON

Kind: global function
Returns: Sentence - sentence - A new Sentence instance

Param Type Default Description
data SentenceJSON The document data, as returned by CoreNLP API service
[isSentence] boolean false Indicate if the given data represents just the sentence or a full document

toString() ⇒ string

Get a string representation

Kind: global function
Returns: string - token

index() ⇒ number

Get the index number assigned by StanfordCoreNLP. This index is relative to the sentence the token belongs to and is 1-based (a positive integer). This number is useful to match tokens within a sentence for depparse, coreference, etc.

Kind: global function
Returns: number - index

word() ⇒ string

Get the original word

Kind: global function
Returns: string - word

originalText() ⇒ string

Get the original text

Kind: global function
Returns: string - originalText

characterOffsetBegin() ⇒ number

A 0-based index of the word's initial character within the sentence

Kind: global function
Returns: number - characterOffsetBegin

characterOffsetEnd() ⇒ number

Get the characterOffsetEnd relative to the parent sentence: a 0-based index just past the word's last character within the sentence

Kind: global function
Returns: number - characterOffsetEnd

before() ⇒ string

Get the before string relative to the container sentence

Kind: global function
Returns: string - before

after() ⇒ string

Get the after string relative to the container sentence

Kind: global function
Returns: string - after
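The offset and whitespace accessors above fit together: slicing the input with `characterOffsetBegin`/`characterOffsetEnd` (end-exclusive, as in CoreNLP's JSON output) yields each token's `originalText`, and concatenating `originalText` plus `after` for every token, after the first token's `before`, rebuilds the input. The token data below is hand-written for illustration and the helper names are hypothetical.

```javascript
// Illustrative token data for "The dog runs." using CoreNLP's field names
const text = 'The dog runs.';
const tokens = [
  { originalText: 'The', characterOffsetBegin: 0, characterOffsetEnd: 3, before: '', after: ' ' },
  { originalText: 'dog', characterOffsetBegin: 4, characterOffsetEnd: 7, before: ' ', after: ' ' },
  { originalText: 'runs', characterOffsetBegin: 8, characterOffsetEnd: 12, before: ' ', after: '' },
  { originalText: '.', characterOffsetBegin: 12, characterOffsetEnd: 13, before: '', after: '' },
];

// Every token's offsets should slice its original text out of the input
function offsetsMatch(input, toks) {
  return toks.every(
    (t) => input.slice(t.characterOffsetBegin, t.characterOffsetEnd) === t.originalText,
  );
}

// Rebuild the input from the tokens and their surrounding whitespace
function reconstructText(toks) {
  if (toks.length === 0) return '';
  return toks[0].before + toks.map((t) => t.originalText + t.after).join('');
}
```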

lemma() ⇒ string

Get the annotated lemma

Kind: global function
Returns: string - lemma

pos() ⇒ string

Get the annotated part-of-speech for the current token

Kind: global function
Returns: string - pos

posInfo() ⇒ PosInfo

Get additional metadata about the POS annotation. NOTE: Use this method only for study or analysis purposes.

Kind: global function
Returns: PosInfo - posInfo
See: PosInfo for more details

ner() ⇒ string

Get the annotated named-entity for the current token

Kind: global function
Returns: string - ner

toJSON() ⇒ TokenJSON

The arrow function data => Token.fromJSON(data).toJSON() is idempotent when compared shallowly, not by reference. The returned JSON respects the same structure that Token#fromJSON expects.

Kind: global function
Returns: TokenJSON - data
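The round-trip guarantee for toJSON() can be illustrated with a tiny class: feeding JSON through `fromJSON` and back out through `toJSON` gives an object with the same keys and values but a different reference. `TokenSketch` is hypothetical and much simpler than the library's Token.

```javascript
// Hypothetical minimal class illustrating the fromJSON/toJSON round trip
class TokenSketch {
  static fromJSON(data) {
    const token = new TokenSketch();
    token.data = { ...data }; // keep a copy, not the caller's object
    return token;
  }

  word() {
    return this.data.word;
  }

  toJSON() {
    return { ...this.data };
  }
}

// Same keys and strictly equal values; references may differ
function shallowEqual(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k])
  );
}
```

Because the method is named `toJSON`, `JSON.stringify(token)` picks it up automatically.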

fromJSON(data) ⇒ Token

Get an instance of Token from a given JSON

Kind: global function
Returns: Token - token - A new Token instance

Param Type Description
data TokenJSON The token data, as returned by CoreNLP API service

dump() ⇒ string

Get a Tree string representation for debugging purposes

Kind: global function
Returns: string - tree

visitDeepFirst()

Performs a depth-first search, calling a visitor for each node

Kind: global function
See: DFS

visitDeepFirstRight()

Performs a depth-first search, calling a visitor for each node, from right to left

Kind: global function
See: DFS

visitLeaves()

Performs a depth-first search, calling a visitor only on leaves

Kind: global function
See: DFS

fromSentence(sentence, [doubleLink]) ⇒ Tree

Kind: global function
Returns: Tree - tree

Param Type Default Description
sentence Sentence
[doubleLink] boolean false whether the child nodes should have a reference to their parent or not - this allows the use of Node.parent()

fromString(str, [doubleLink]) ⇒ Tree

Kind: global function
Returns: Tree - tree

Param Type Default Description
str string
[doubleLink] boolean false whether the child nodes should have a reference to their parent or not - this allows the use of Node.parent()
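To show what `fromString`, the depth-first visitors and `doubleLink` do conceptually, here is a self-contained sketch that parses a bracketed parse string into `{ label, children }` nodes and traverses it. It is a hypothetical re-implementation, not the library's Tree class. It assumes CoreNLP-style input with a labeled root such as `(ROOT ...)`, and the parse string in the usage is hand-written for the English example sentence, not actual CoreNLP output.

```javascript
// Hypothetical sketch: parse a bracketed tree string into { label, children }
function treeFromString(str, doubleLink = false) {
  const tokens = str.match(/\(|\)|[^\s()]+/g) || [];
  let pos = 0;

  function readNode(parent) {
    if (tokens[pos] !== '(') throw new Error(`Expected "(" at token ${pos}`);
    pos += 1;
    const node = { label: tokens[pos], children: [] };
    pos += 1;
    if (doubleLink) node.parent = parent; // enables upward navigation
    while (tokens[pos] !== ')') {
      if (pos >= tokens.length) throw new Error('Unbalanced parentheses');
      if (tokens[pos] === '(') {
        node.children.push(readNode(node));
      } else {
        const leaf = { label: tokens[pos], children: [] }; // a word
        pos += 1;
        if (doubleLink) leaf.parent = node;
        node.children.push(leaf);
      }
    }
    pos += 1; // consume ")"
    return node;
  }

  return readNode(null);
}

// Pre-order depth-first traversal; rightToLeft visits children in reverse
function visitDepthFirst(node, visitor, rightToLeft = false) {
  visitor(node);
  const children = rightToLeft ? [...node.children].reverse() : node.children;
  for (const child of children) visitDepthFirst(child, visitor, rightToLeft);
}

// Collect the leaf labels (the words), left to right
function leaves(root) {
  const words = [];
  visitDepthFirst(root, (node) => {
    if (node.children.length === 0) words.push(node.label);
  });
  return words;
}

// Usage:
// const tree = treeFromString('(ROOT (S (NP (DT The) (JJ little) (NN dog)) ' +
//   '(VP (VBZ runs) (ADVP (RB so) (RB fast))) (. .)))', true);
// leaves(tree).join(' '); // 'The little dog runs so fast .'
```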

DocumentJSON

The CoreNLP API JSON structure representing a document

Kind: global typedef
Properties

Name Type
index number
sentences Array.<Sentence>

ExpressionSentenceMatchGroup

Kind: global typedef
Properties

Name Type Description
label string group label
begin number 0-based index of the matched group, relative to the given text
end number 0-based index of the matched group, relative to the given text
token Token only given if aggregated with an annotated Sentence or Document
$[label] ExpressionSentenceMatchGroup other groups inside

ExpressionSentenceMatchJSON

An ExpressionSentenceMatch from TokensRegex, Semgrex, or Tregex.

Kind: global typedef
Properties

Name Type Description
begin number word begin position, starting from zero
end number word end position, starting from zero (no match ends at 0)
text string matched text
$[label] string any label, as defined in the expression pattern

ExpressionJSON

The CoreNLP API JSON structure representing an expression. This structure is the output of TokensRegex, Semgrex and Tregex.

Kind: global typedef
Properties

Name Type
index number
sentences Array.<Array.<ExpressionSentenceMatch>>
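Because sentences is an array of arrays of matches, consuming an expression result usually means flattening it. A minimal sketch, assuming the shape documented by the typedefs above (named groups stored under "$"-prefixed keys); listMatches and the sample data are hypothetical.

```javascript
// Flattens an ExpressionJSON-shaped object into one record per match.
// Assumes: `sentences` holds one array of matches per sentence, and
// named groups are stored under "$"-prefixed keys of each match.
function listMatches(expression) {
  const results = [];
  expression.sentences.forEach((matches, sentenceIndex) => {
    for (const match of matches) {
      const groups = Object.keys(match).filter((key) => key.startsWith('$'));
      results.push({
        sentenceIndex,
        text: match.text,
        span: [match.begin, match.end],
        groups,
      });
    }
  });
  return results;
}

// Hypothetical TokensRegex-style result for two sentences; only the first matches.
const expression = {
  index: 0,
  sentences: [
    [{ begin: 0, end: 2, text: 'Barack Obama', $name: { label: 'name', begin: 0, end: 2 } }],
    [],
  ],
};
const matches = listMatches(expression);
```

Keeping sentenceIndex in each record preserves which sentence a match came from, which is lost if the nested arrays are simply concatenated.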

GovernorJSON

The CoreNLP API JSON structure representing a governor

Kind: global typedef
Properties

Name Type
dep string
governor number
governorGloss string
dependent number
dependentGloss string
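Each GovernorJSON entry is one dependency arc. The sketch below renders entries in the conventional dep(governor-i, dependent-j) notation and indexes dependents by governor. It assumes, as in CoreNLP output, that governor 0 with gloss "ROOT" marks the sentence root; the helper names and the sample arcs for "John loves Mary" are hypothetical.

```javascript
// Renders one arc as dep(governorGloss-governor, dependentGloss-dependent).
function formatDependency({ dep, governor, governorGloss, dependent, dependentGloss }) {
  return `${dep}(${governorGloss}-${governor}, ${dependentGloss}-${dependent})`;
}

// Maps each governor index to the list of its dependent indices.
function dependentsByGovernor(governors) {
  const index = new Map();
  for (const g of governors) {
    if (!index.has(g.governor)) index.set(g.governor, []);
    index.get(g.governor).push(g.dependent);
  }
  return index;
}

// Hypothetical arcs for "John loves Mary" (token indices are 1-based; 0 is ROOT).
const governors = [
  { dep: 'ROOT', governor: 0, governorGloss: 'ROOT', dependent: 2, dependentGloss: 'loves' },
  { dep: 'nsubj', governor: 2, governorGloss: 'loves', dependent: 1, dependentGloss: 'John' },
  { dep: 'dobj', governor: 2, governorGloss: 'loves', dependent: 3, dependentGloss: 'Mary' },
];
const lines = governors.map(formatDependency);
const index = dependentsByGovernor(governors);
```

Here lines[1] is "nsubj(loves-2, John-1)", and index.get(2) lists the two dependents of "loves".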

SentenceJSON

The CoreNLP API JSON structure representing a sentence

Kind: global typedef
Properties

Name Type Description
index number 1-based index, as they come indexed by StanfordCoreNLP
tokens Array.<Token>

TokenJSON

The CoreNLP API JSON structure representing a token

Kind: global typedef
Properties

Name Type
index number
word string
originalText string
characterOffsetBegin number
characterOffsetEnd number
before string
after string
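The before and after fields hold the whitespace surrounding each token, so the original text can be rebuilt from the tokens alone, and the character offsets can be checked against it. A minimal sketch, assuming the offsets line up with JavaScript string indices (true for plain ASCII text such as this example); the helper names and sample tokens are hypothetical.

```javascript
// Rebuilds the original text: leading whitespace of the first token,
// then each token's original text followed by its trailing whitespace.
function reconstructText(tokens) {
  if (tokens.length === 0) return '';
  return tokens[0].before + tokens.map((t) => t.originalText + t.after).join('');
}

// Checks that each token's offsets point back at its originalText.
function offsetsAreConsistent(text, tokens) {
  return tokens.every(
    (t) => text.slice(t.characterOffsetBegin, t.characterOffsetEnd) === t.originalText,
  );
}

// Hypothetical tokens for "Dogs bark." (word may be normalized; originalText is not).
const tokens = [
  { index: 1, word: 'Dogs', originalText: 'Dogs', characterOffsetBegin: 0, characterOffsetEnd: 4, before: '', after: ' ' },
  { index: 2, word: 'bark', originalText: 'bark', characterOffsetBegin: 5, characterOffsetEnd: 9, before: ' ', after: '' },
  { index: 3, word: '.', originalText: '.', characterOffsetBegin: 9, characterOffsetEnd: 10, before: '', after: '' },
];
const text = reconstructText(tokens);
```

Only the first token's before is used, because each token's after is the same whitespace as the next token's before; using both would duplicate it.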

PosInfo

PosInfo is not part of CoreNLP. It is an index of POS tags by language, provided by this library, and it is only helpful for analysis and study. The data was collected from various documentation resources on the Web. PosInfo may vary depending on the POS tag set in use: for example, CoreNLP for Spanish uses custom POS tags developed by Stanford, but this can be changed to Universal Dependencies, which uses different tags.

Kind: global typedef
Properties

Name Type
group string
tag string
examples Array.<string>
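To illustrate the PosInfo shape, the sketch below indexes a small, hand-written excerpt of English Penn Treebank tags by tag and describes a tag. The entries and the group names are hypothetical examples; the data shipped with this library may differ.

```javascript
// Hypothetical, hand-written excerpt of PosInfo-style entries for English.
const posInfoEn = [
  { group: 'noun', tag: 'NN', examples: ['dog', 'idea'] },
  { group: 'noun', tag: 'NNS', examples: ['dogs', 'ideas'] },
  { group: 'verb', tag: 'VBZ', examples: ['runs', 'is'] },
  { group: 'determiner', tag: 'DT', examples: ['the', 'a'] },
];

// Index entries by tag for constant-time lookup.
const byTag = new Map(posInfoEn.map((info) => [info.tag, info]));

// Returns a short human-readable description of a tag.
function describeTag(tag) {
  const info = byTag.get(tag);
  return info ? `${tag} (${info.group}), e.g. ${info.examples.join(', ')}` : `${tag} (unknown)`;
}
```

For instance, describeTag('VBZ') gives "VBZ (verb), e.g. runs, is". Because tag sets differ by language and annotation scheme, a separate table per language is kept rather than one merged map.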

DeterministicCorefAnnotator

Class representing a DeterministicCorefAnnotator. The data it hydrates is still TODO.

Kind: global external
Extends: Annotator
See: DeterministicCorefAnnotator

DependencyParseAnnotator

Class representing a DependencyParseAnnotator. Hydrates Sentence.governors().

Kind: global external
Extends: Annotator
See: DependencyParseAnnotator

MorphaAnnotator

Class representing a MorphaAnnotator. Hydrates Token.lemma().

Kind: global external
Extends: Annotator
See: MorphaAnnotator

NERClassifierCombiner

Class representing an NERClassifierCombiner. Hydrates Token.ner().

Kind: global external
Extends: Annotator
See: NERClassifierCombiner

ParserAnnotator

Class representing a ParserAnnotator. Hydrates Token.parse().

Kind: global external
Extends: Annotator
See: ParserAnnotator

POSTaggerAnnotator

Class representing a POSTaggerAnnotator. Hydrates Token.pos().

Kind: global external
Extends: Annotator
See: POSTaggerAnnotator

RegexNERAnnotator

Class representing a RegexNERAnnotator. The data it hydrates is still TODO.

Kind: global external
Extends: Annotator
See: RegexNERAnnotator

RelationExtractorAnnotator

Class representing a RelationExtractorAnnotator. The data it hydrates is still TODO.

Kind: global external
Extends: Annotator
See: RelationExtractorAnnotator

WordsToSentenceAnnotator

Class representing a WordsToSentenceAnnotator. Combines multiple Tokens into sentences.

Kind: global external
Extends: Annotator
See: WordsToSentenceAnnotator

TokenizerAnnotator

Class representing a TokenizerAnnotator. Identifies Tokens.

Kind: global external
Extends: Annotator
See: TokenizerAnnotator
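The annotators above correspond to the names used in the annotators property (tokenize for TokenizerAnnotator, ssplit for WordsToSentenceAnnotator, pos, lemma, ner, parse, depparse), and CoreNLP expects prerequisites to appear earlier in that list. The sketch below checks a list such as the one in the setup example. The REQUIRES table is an approximation distilled from the CoreNLP annotator pages and should be verified there; checkAnnotators is a hypothetical helper, not part of this library.

```javascript
// Approximate prerequisites, keyed by annotator name (assumption: verify against
// the CoreNLP annotator documentation for the version in use).
const REQUIRES = {
  tokenize: [],                               // TokenizerAnnotator
  ssplit: ['tokenize'],                       // WordsToSentenceAnnotator
  pos: ['tokenize', 'ssplit'],                // POSTaggerAnnotator
  lemma: ['tokenize', 'ssplit', 'pos'],       // MorphaAnnotator
  ner: ['tokenize', 'ssplit', 'pos', 'lemma'], // NERClassifierCombiner
  parse: ['tokenize', 'ssplit'],              // ParserAnnotator
  depparse: ['tokenize', 'ssplit', 'pos'],    // DependencyParseAnnotator
};

// Returns a list of problems found in a comma-separated annotators string,
// such as an annotator listed before one of its prerequisites.
function checkAnnotators(list) {
  const names = list.split(',').map((s) => s.trim()).filter(Boolean);
  const problems = [];
  names.forEach((name, i) => {
    const requires = REQUIRES[name];
    if (!requires) {
      problems.push(`${name}: not in this table`);
      return;
    }
    for (const dep of requires) {
      const at = names.indexOf(dep);
      if (at === -1 || at > i) problems.push(`${name}: needs ${dep} earlier in the list`);
    }
  });
  return problems;
}
```

For example, checkAnnotators('tokenize,ssplit,pos,lemma,ner,parse') returns an empty list, while checkAnnotators('tokenize,pos') reports that pos needs ssplit. An empty result only means the list is consistent with this table, not that the server will accept it.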

External Reference

We will update this section soon. In the meantime, you can browse the project codebase and read the JSDoc references. In summary, this NodeJS library aims to replicate the CoreNLP Simple Java interface in JavaScript. There are some minor differences, however: for example, applyAnnotator must be called asynchronously.

Properties
Pipeline
Service
ConnectorServer                   # https://stanfordnlp.github.io/CoreNLP/corenlp-server.html
ConnectorCli                      # https://stanfordnlp.github.io/CoreNLP/cmdline.html
CoreNLP
  simple                          # https://stanfordnlp.github.io/CoreNLP/simple.html
    Annotable
    Annotator
    Document
    Sentence
    Token
    annotator                     # https://stanfordnlp.github.io/CoreNLP/annotators.html
      TokenizerAnnotator          # https://stanfordnlp.github.io/CoreNLP/tokenize.html
      WordsToSentenceAnnotator    # https://stanfordnlp.github.io/CoreNLP/ssplit.html
      POSTaggerAnnotator          # https://stanfordnlp.github.io/CoreNLP/pos.html
      MorphaAnnotator             # https://stanfordnlp.github.io/CoreNLP/lemma.html
      NERClassifierCombiner       # https://stanfordnlp.github.io/CoreNLP/ner.html
      ParserAnnotator             # https://stanfordnlp.github.io/CoreNLP/parse.html
      DependencyParseAnnotator    # https://stanfordnlp.github.io/CoreNLP/depparse.html
      RelationExtractorAnnotator  # https://stanfordnlp.github.io/CoreNLP/relation.html
      DeterministicCorefAnnotator # https://stanfordnlp.github.io/CoreNLP/coref.html
  util
    Tree                          # http://www.cs.cornell.edu/courses/cs474/2004fa/lec1.pdf

© 2017 Gerardo Bort under GPL-3.0 Licence. Documented by jsdoc-to-markdown.

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.
