A curated list of resources dedicated to Natural Language Processing
Maintainers - Keon Kim
Please feel free to send pull requests or email Keon Kim ([email protected]) to add links.
- TensorFlow Tutorial on Seq2Seq Models
- Stanford's Coursera Course on NLP from the basics
- Intro to Natural Language Processing on Coursera by U of Michigan
- Intro to Artificial Intelligence course on Udacity, which also covers NLP
- Pre-trained word embeddings for WSJ corpus by Koc AI-Lab
- Word2vec by Mikolov
- HLBL language model by Turian
- Real-valued vector "embeddings" by Dhillon
- Improving Word Representations Via Global Context And Multiple Word Prototypes by Huang
- Dependency based word embeddings
- Global Vectors for Word Representation
- TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
- Node.js and JavaScript - Node.js Libraries for NLP
  - Twitter-text - A JavaScript implementation of Twitter's text processing library
  - NLP.js - NLP utilities in JavaScript and CoffeeScript
  - Knwl.js - A Natural Language Processor in JS
  - Retext - Extensible system for analyzing and manipulating natural language
  - TextProcessing - Sentiment analysis, stemming and lemmatization, part-of-speech tagging and chunking, phrase extraction and named entity recognition.
  - NLP Compromise - Natural language processing in the browser
  - Natural - General natural language facilities for Node
- Python - Python NLP Libraries
- C++ - C++ Libraries
  - MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction
  - CRF++ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
  - CRFsuite - CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
  - BLLIP Parser - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
  - colibri-core - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
  - ucto - Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
  - libfolia - C++ library for the FoLiA format
  - frog - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
  - MeTA - MeTA: ModErn Text Analysis is a C++ Data Sciences Toolkit that facilitates mining big text data.
  - Mecab (Japanese)
  - Mecab (Korean)
- Java - Java NLP Libraries
- Clojure
  - Clojure-openNLP - Natural Language Processing in Clojure (opennlp)
  - Inflections-clj - Rails-like inflection library for Clojure and ClojureScript
- Deep Learning for Web Search and Natural Language Processing
- Probabilistic topic models
- Natural language processing: an introduction
- A unified architecture for natural language processing: Deep neural networks with multitask learning
- A Critical Review of Recurrent Neural Networks for Sequence Learning
- Deep parsing in Watson
- Online named entity recognition method for microtexts in social networking services: A case study of Twitter
- word2vec - on creating vectors to represent language, useful for RNN inputs
- sense2vec - on word sense disambiguation
- Infinite Dimensional Word Embeddings - new
- Skip Thought Vectors - sentence representation method
- Adaptive skip-gram - similar approach, with adaptive properties
- Neural autoencoder for paragraphs and documents - LSTM representation
- LSTM over tree structures
- Sequence to Sequence Learning - word vectors for machine translation
- Teaching Machines to Read and Comprehend - DeepMind paper
- Efficient Estimation of Word Representations in Vector Space
- Improving distributional similarity with lessons learned from word embeddings
- Low-Dimensional Embeddings of Logic
- Tutorial on Markov Logic Networks (based on this paper)
- Markov Logic Networks for Natural Language Question Answering
- Distant Supervision for Cancer Pathway Extraction From Text
- Privee: An Architecture for Automatically Analyzing Web Privacy Policies
- A Neural Probabilistic Language Model
- Template-Based Information Extraction without the Templates
- Retrofitting word vectors to semantic lexicons
- Unsupervised Learning of the Morphology of a Natural Language
- Natural Language Processing (Almost) from Scratch
- Computational Grounded Cognition: a new alliance between grounded cognition and computational modelling
- Learning the Structure of Biomedical Relation Extractions
- Relation extraction with matrix factorization and universal schemas
- A survey of named entity recognition and classification
- Benchmarking the extraction and disambiguation of named entities on the semantic web
- Knowledge base population: Successful approaches and challenges
- SpeedRead: A fast named entity recognition Pipeline
- Cross-lingual Pseudo-Projected Expectation Regularization for Weakly Supervised Learning
- Generating Chinese Named Entity Data from a Parallel Corpus
- IXA pipeline: Efficient and Ready to Use Multilingual NLP tools
- The Unreasonable Effectiveness of Recurrent Neural Networks
- Statistical Language Models based on Neural Networks
- Slides from Google Talk
- Word2Vec
- Relation Extraction with Matrix Factorization and Universal Schemas
- Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors
- Presentation slides for MLN tutorial
- Presentation slides for QA applications of MLNs
- Presentation slides
- Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
- Blog Post on Deep Learning, NLP, and Representations
- Blog Post on NLP Tutorial
- Natural Language Processing Blog by Hal Daumé III
- POS TAGGERS
- NER
- ETC
Parts of these lists are from