This repository was archived by the owner on Feb 19, 2020. It is now read-only.
Merged
75 commits
ebb3998
Reasonable first crack at neural epic with rule features and position…
gregdurrett Dec 16, 2014
a2f65eb
Additional speed optimizations, first groundwork for the caching firs…
gregdurrett Dec 19, 2014
cb4fe4e
First stuff for caching input layer
gregdurrett Dec 19, 2014
8c96e38
Finished caching implementation of the first layer
gregdurrett Dec 19, 2014
fa38219
Changed random initialization
gregdurrett Dec 21, 2014
4f9719b
Minor tweaks
gregdurrett Dec 25, 2014
7e168d7
Revamped how NNs are instantiated, can now load multiple input vector…
gregdurrett Dec 26, 2014
a85d0d9
Changed how weights are initialized to make backprop into embeddings …
gregdurrett Dec 28, 2014
5cbeff3
First crack at corefdense, additional improvements to the neural pars…
gregdurrett Jan 5, 2015
6ee25e5
More improvements: corefdense stuff, can do fully linear network, etc.
gregdurrett Jan 11, 2015
68bdf2f
Added support for multiple parallel layers
gregdurrett Jan 12, 2015
eec6834
Added support for dependencies
gregdurrett Jan 13, 2015
4aeb231
Various changes
gregdurrett Jan 22, 2015
c3250a5
Small bug fix so it compiles
gregdurrett Jan 22, 2015
9b8aba9
Testing optimization stuff
gregdurrett Jan 25, 2015
faf9269
More optimization playing around
gregdurrett Jan 26, 2015
9523bc2
Adadelta implementation
gregdurrett Jan 27, 2015
04896ad
First crack at NER
gregdurrett Jan 28, 2015
8eeed33
Small updates to dense NER
gregdurrett Jan 28, 2015
39beb0e
Parallel training
gregdurrett Jan 29, 2015
5ed3c63
Added LowRankQuadraticTransform
gregdurrett Jan 29, 2015
6fd0bdf
Added pruning to NER, other changes
gregdurrett Feb 4, 2015
5de8091
Overhauled how nonlinear transforms are handled
gregdurrett Feb 5, 2015
9fba451
Changed how the output layer is handled to allow for LRQT and output …
gregdurrett Feb 5, 2015
9a55ee8
Minor changes
gregdurrett Feb 6, 2015
6a2291d
Separated layer extraction for train and for test to allow dropout to…
gregdurrett Feb 6, 2015
f88adc1
Changes to NER
gregdurrett Feb 6, 2015
f63db36
Changes to set up for dropout in parser
gregdurrett Feb 7, 2015
78eabe9
Embedding clipping, dropout everywhere, etc.
gregdurrett Feb 8, 2015
a79c86e
Before tetra messing
gregdurrett Feb 8, 2015
c5d7986
Random changes
gregdurrett Feb 9, 2015
a7d83c3
Fixed output embedding to be fast in parser
gregdurrett Feb 10, 2015
c23e7b8
Flexibility in input words to the network
gregdurrett Feb 10, 2015
6549e4b
Ensembling
gregdurrett Feb 11, 2015
01eaa88
Expanded set of possible surface features
gregdurrett Feb 12, 2015
933cc97
Added ability to decouple parameters for unaries, binaries, and spans
gregdurrett Feb 13, 2015
b734995
Bug fix
gregdurrett Feb 13, 2015
8bfcd2a
SGD with momentum now supported
gregdurrett Feb 14, 2015
03044e1
Made things serializable, also fixed a bug where SGD wasn't being used
gregdurrett Feb 14, 2015
ff6e16a
Fixed some vector loading stuff
gregdurrett Feb 16, 2015
434b858
Added batch normalization support
gregdurrett Feb 17, 2015
f849f33
Improved output embedding initialization
gregdurrett Feb 17, 2015
1dd7fa9
Fixed initializer for oe
gregdurrett Feb 17, 2015
93f6257
Allowed for lowercased word vectors
gregdurrett Feb 18, 2015
5759b64
Added a way to get a subset of gradient features, also improvements t…
gregdurrett Feb 18, 2015
e3d5705
Added checking coverage of vectors
gregdurrett Feb 20, 2015
756857e
Combo adagrad and skipdep converter implementation
gregdurrett Feb 20, 2015
610bf51
Fixed SpikdepConverter; THIS SHOULD'VE BEEN COMMITTED BEFORE ACL
gregdurrett Mar 7, 2015
331e343
Modernization of neural architecture for coref
gregdurrett Mar 13, 2015
30d506a
Improved coref neural model
gregdurrett Mar 18, 2015
0b23d75
Some camera-ready verification fixes and experiments
gregdurrett May 5, 2015
10605c1
Removed unused classes from the dense package
gregdurrett May 5, 2015
f63bb58
Deleted unused class
gregdurrett May 5, 2015
e124fe4
Moved Word2Vec and Word2VecUtils
gregdurrett May 5, 2015
7f75d11
Moved to dense so we can exclude corefdense
gregdurrett May 5, 2015
3ec611b
Other changes in the move
gregdurrett May 5, 2015
56e4f79
Added NeuralParserTrainer so we can remove casts from ParserTrainer
gregdurrett May 5, 2015
f741202
Camera changes
gregdurrett May 11, 2015
2cd2e23
Improvements to neural coref
gregdurrett May 20, 2015
e7565e9
First crack at sparse net features
gregdurrett May 25, 2015
1411c97
Last of the neural coref stuff, beginning of refactoring
gregdurrett Jun 14, 2015
21894ea
A few more refactoring fixes
gregdurrett Jun 15, 2015
7fc3285
Updating to latest version of epic; minor modifications to trace remo…
gregdurrett Jun 18, 2015
7db377a
Removed batch normalization stuff
gregdurrett Jun 19, 2015
516f590
Mostly reverted ParserTrainer to what it was before neural epic came …
gregdurrett Jun 19, 2015
4a3d1bb
Latest epic commits
gregdurrett Jun 23, 2015
e20db9a
Moved PNMFactory to a better-named file
gregdurrett Jun 23, 2015
3ad6453
More renaming
gregdurrett Jun 23, 2015
6e8ae37
Renaming of files and refactoring so things have better names
gregdurrett Jun 23, 2015
c5d681d
Removed spurious import
gregdurrett Jun 23, 2015
e53a9d3
Readme for the neural stuff
gregdurrett Jun 23, 2015
e264cdd
Some cleanups to the PR
gregdurrett Jun 23, 2015
e70fc0b
Applying dlwh's changes, other removals of commented-out code and one…
gregdurrett Jul 2, 2015
37d2832
Straggler changes
gregdurrett Jul 2, 2015
08a873a
Modified README to point to the downloadable model
gregdurrett Jul 10, 2015
2 changes: 1 addition & 1 deletion .gitignore
@@ -16,7 +16,7 @@ tmp/
.idea*
.scratch/
java.hprof.txt

*.bbl
*.blg
*.aux
/bin/
106 changes: 106 additions & 0 deletions README-NEURAL.md
@@ -0,0 +1,106 @@
The neural CRF parser is a high-performing constituency parser.



## Preamble

The neural CRF parser is described in:

"Neural CRF Parsing" Greg Durrett and Dan Klein. ACL 2015.

It is an extension of the span parser described in

"Less Grammar, More Features" David Hall, Greg Durrett, and Dan Klein. ACL 2014.

and is based on the Epic parsing framework. See https://github.com/dlwh/epic
for more documentation about the span parser and the Epic framework.
See http://www.eecs.berkeley.edu/~gdurrett/ for papers and BibTeX.

Questions? Bugs? Email me at [email protected]



## Setup

You need three things to run the neural CRF parser:

1) The compiled .jar; run ```sbt assembly``` to produce this

2) A treebank: the Penn Treebank or one of the SPMRL treebanks

3) Some sort of word vectors. These can either be in the .bin format
of Mikolov et al. (2013) or the .txt format of Bansal et al. (ACL 2014). For
English, the best performance comes from using Bansal et al.'s vectors:

http://ttic.uchicago.edu/~mbansal/codedata/dependencyEmbeddings-skipdep.zip

For other languages, you can train suitable vectors on monolingual data using
```word2vec``` with the following arguments:

-cbow 0 -size 100 -window 1 -sample 1e-4 -threads 8 -binary 0 -iter 15

These are mildly tuned, and using a small window size is important, but other
settings are likely to work well too.
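
For reference, a complete invocation might look like the following; the corpus and output file names are placeholders, not files shipped with Epic:

word2vec -train corpus.txt -output vectors.txt -cbow 0 -size 100 -window 1 -sample 1e-4 -threads 8 -binary 0 -iter 15

(```-binary 0``` makes word2vec write the vectors as text rather than in .bin format.)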




## Usage

To run the parser on new text (tokenized, one-sentence-per-line), use the following command:
Owner review comment: ParseText actually tokenizes and sentence-segments by default. To get the behavior described above (input already tokenized, one sentence per line), add the flags "--tokenizer whitespace --sentences newline".


java -Xmx4g -cp path/to/assembly.jar epic.parser.ParseText --model neuralcrf.parser --nthreads 8 [files]

To reproduce the results in the neural CRF paper, run the following command
(note that you need to fill in paths for -cp, --treebank.path, and --word2vecPath):

Owner review comment: 47g lol

java -Xmx47g -cp path/to/assembly.jar epic.parser.models.NeuralParserTrainer \
--cache.path constraints.cache \
--opt.useStochastic \
--treebank.path path/to/wsj/ \
--evalOnTest \
--includeDevInTrain \
--trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator \
--ann.0 epic.trees.annotations.FilterAnnotations \
--ann.1 epic.trees.annotations.ForgetHeadTag \
--ann.2 epic.trees.annotations.Markovize \
--ann.2.horizontal 0 \
--ann.2.vertical 0 \
--modelFactory epic.parser.models.PositionalNeuralModelFactory \
--opt.batchSize 200 \
--word2vecPath path/to/skipdep_embeddings.txt \
--threads 8

To run on the SPMRL treebanks, modify the arguments to the command above as follows:

1) Add the following arguments (replace ${LANG} as appropriate):

--treebankType spmrl \
--binarization head \
--supervisedHeadFinderPtbPath path/to/gold/ptb/train/train.${LANG}.gold.ptb \
--supervisedHeadFinderConllPath path/to/gold/conll/train/train.${LANG}.gold.conll \
--ann.3 epic.trees.annotations.SplitPunct

2) Modify --treebank.path to point to the X_SPMRL/gold/ptb directory.

Options to configure the neural network and training are largely defined in
```epic.parser.models.PositionalNeuralModel```.

### Miscellaneous Notes

To run on the development set, simply remove ```evalOnTest``` and
```includeDevInTrain``` from the arguments.

Note that you should use the official version of ```evalb``` on the output
files (gold and guess) rather than relying on the native scorer in the Epic
parser. For SPMRL, you should use the version distributed with the shared
task.

Also note that the X-bar grammar and coarse pruning masks (constraints) are
cached between runs in the same directory, which speeds up training and testing
time considerably as generating the masks is time-consuming.
Owner review comment: While you're at it, might add a note that multiple runs can't be from the same directory.






1 change: 1 addition & 0 deletions build.sbt
@@ -102,6 +102,7 @@ mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
{
case PathList("org", "w3c", "dom", _) => MergeStrategy.first
case PathList("javax", "xml", "stream", _ *) => MergeStrategy.first
case PathList("scala", "xml", _ *) => MergeStrategy.first
case PathList("org", "cyberneko", "html", _ *) => MergeStrategy.first
case x => old(x)
}
71 changes: 71 additions & 0 deletions src/main/scala/epic/dense/AdadeltaGradientDescentDVD.scala
@@ -0,0 +1,71 @@
package epic.dense

import breeze.linalg._
import breeze.numerics._
import breeze.optimize.StochasticDiffFunction
import breeze.optimize.StochasticGradientDescent


class AdadeltaGradientDescentDVD(maxIter: Int,
rho: Double = 0.95,
tolerance: Double = 1E-5,
improvementTolerance: Double = 1E-4,
minImprovementWindow: Int = 50)
extends StochasticGradientDescent[DenseVector[Double]](1.0, maxIter, tolerance, improvementTolerance, minImprovementWindow) {

val delta = 1E-4
val epsilon = 1e-6
import vspace._

case class History(squaredGradientsHistory: DenseVector[Double], squaredUpdatesHistory: DenseVector[Double])
override def initialHistory(f: StochasticDiffFunction[DenseVector[Double]],init: DenseVector[Double]) = {
History(DenseVector(Array.tabulate(init.size)(i => 1e-6)), DenseVector(Array.tabulate(init.size)(i => 1e-6)))
}

override def updateHistory(newX: DenseVector[Double], newGrad: DenseVector[Double], newValue: Double, f: StochasticDiffFunction[DenseVector[Double]], oldState: State) = {
val oldHistory = oldState.history
// This is correct; the new gradient gets incorporated during the next round of takeStep,
// so this computation should lag by one
val newG = (oldState.grad :* oldState.grad) * (1 - rho)
axpy(rho, oldHistory.squaredGradientsHistory, newG)
val deltaX = newX - oldState.x
val newU = deltaX :* deltaX * (1 - rho);
axpy(rho, oldHistory.squaredUpdatesHistory, newU)
new History(newG, newU)
// val oldHistory = oldState.history
// val newG = (oldState.grad :* oldState.grad)
// val maxAge = 1000.0
// if(oldState.iter > maxAge) {
// newG *= 1/maxAge
// axpy((maxAge - 1)/maxAge, oldHistory.sumOfSquaredGradients, newG)
// } else {
// newG += oldHistory.sumOfSquaredGradients
// }
// new History(newG)
}

override protected def takeStep(state: State, dir: DenseVector[Double], stepSize: Double) = {
// gradient sum needs to
import state._
// Need to pre-emptively update the gradient since the history only has it through the
// last timestep
val rmsGt = sqrt((state.history.squaredGradientsHistory * rho) :+ ((state.grad :* state.grad) * (1-rho)) :+ epsilon)
val rmsDeltaXtm1 = sqrt(state.history.squaredUpdatesHistory :+ epsilon)
val step = dir :* rmsDeltaXtm1 :/ rmsGt
val newX = x
axpy(1.0, step, newX)
newX
}

override def determineStepSize(state: State, f: StochasticDiffFunction[DenseVector[Double]], dir: DenseVector[Double]) = {
defaultStepSize // pegged to 1.0 for this method
}

override protected def adjust(newX: DenseVector[Double], newGrad: DenseVector[Double], newVal: Double) = {
newVal -> newGrad
// val av = newVal + (newX dot newX) * regularizationConstant / 2.0
// val ag = newGrad + newX * regularizationConstant
// (av -> ag)
}

}
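
For reference, the class above implements the standard Adadelta update (Zeiler, 2012). Writing $g_t$ for the stochastic gradient, $\rho$ for the decay rate, and $\epsilon$ for the smoothing constant (1e-6 in the code), the two accumulators kept in `History` and the step taken in `takeStep` correspond to

$$E[g^2]_t = \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2$$
$$\Delta x_t = -\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \odot g_t$$
$$E[\Delta x^2]_t = \rho\, E[\Delta x^2]_{t-1} + (1-\rho)\, \Delta x_t^2$$
$$x_{t+1} = x_t + \Delta x_t$$

with all operations taken elementwise. In the code, `takeStep` folds the current gradient into $E[g^2]_t$ eagerly (the `rmsGt` line), while `updateHistory` lags one step behind, as its comment notes.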
108 changes: 108 additions & 0 deletions src/main/scala/epic/dense/AffineOutputTransform.scala
@@ -0,0 +1,108 @@
package epic.dense

import breeze.linalg._
import breeze.linalg.operators.OpMulMatrix
import epic.features.SegmentedIndex
import epic.framework.Feature

import scala.runtime.ScalaRunTime
import scala.util.Random

/**
* Used at the output layer when we're only going to need some of the possible ouputs;
* it exposes the penultimate layer and then the Layer allows you to pass the results
* from that back in (caching it elsewhere) and only compute certain cells in the
* output layer (activationsFromPenultimateDot).
*/
case class AffineOutputTransform[FV](numOutputs: Int, numInputs: Int, innerTransform: Transform[FV, DenseVector[Double]], includeBias: Boolean = true) extends OutputTransform[FV, DenseVector[Double]] {


val index = SegmentedIndex(new AffineTransform.Index(numOutputs, numInputs, includeBias), innerTransform.index)

def extractLayerAndPenultimateLayer(weights: DenseVector[Double], forTrain: Boolean) = {
val mat = weights(0 until (numOutputs * numInputs)).asDenseMatrix.reshape(numOutputs, numInputs, view = View.Require)
val bias = if(includeBias) {
weights(numOutputs * numInputs until index.componentOffset(1))
} else {
DenseVector.zeros[Double](numOutputs)
}
val inner = innerTransform.extractLayer(weights(index.componentOffset(1) to -1), forTrain)
new OutputLayer(mat, bias, inner) -> inner
}

/**
* N.B. Initialized to zero because this should *only* be used at the output layer, where
* zero initialization is appropriate
*/
def initialWeightVector(initWeightsScale: Double, rng: Random, outputLayer: Boolean, spec: String) = {
require(outputLayer)
DenseVector.vertcat(DenseVector.zeros(index.indices(0).size), innerTransform.initialWeightVector(initWeightsScale, rng, false, spec))
}

def clipHiddenWeightVectors(weights: DenseVector[Double], norm: Double, outputLayer: Boolean) {
innerTransform.clipHiddenWeightVectors(weights(index.componentOffset(1) to -1), norm, false)
}

def getInterestingWeightIndicesForGradientCheck(offset: Int): Seq[Int] = {
(offset until offset + Math.min(10, index.indices(0).size)) ++ innerTransform.getInterestingWeightIndicesForGradientCheck(offset + index.indices(0).size)
}

case class OutputLayer(weights: DenseMatrix[Double], bias: DenseVector[Double], innerLayer: innerTransform.Layer) extends OutputTransform.OutputLayer[FV,DenseVector[Double]] {
override val index = AffineOutputTransform.this.index

val weightst = weights.t
// val weightst = weights.t.copy
// Owner review comment: this was originally added for better memory locality since mul transpose is pretty slow (but .copy makes it no longer transposed)



def activations(fv: FV) = {
val out = weights * innerLayer.activations(fv) += bias
out
}

def activationsDot(fv: FV, sparseIdx: Int) = {
activationsFromPenultimateDot(innerLayer.activations(fv), sparseIdx)
}

def activationsDot(fv: FV, sparseIndices: Array[Int]) = {
activationsFromPenultimateDot(innerLayer.activations(fv), sparseIndices)
}

def activationsFromPenultimateDot(innerLayerActivations: DenseVector[Double], sparseIdx: Int) = {
weights(sparseIdx, ::) * innerLayerActivations + bias(sparseIdx)
}

def tallyDerivative(deriv: DenseVector[Double], _scale: =>Vector[Double], fv: FV) = {
val scale = _scale
val matDeriv = deriv(0 until (numOutputs * numInputs)).asDenseMatrix.reshape(numOutputs, numInputs, view = View.Require)
val biasDeriv = if(includeBias) {
deriv(numOutputs * numInputs until index.componentOffset(1))
} else {
DenseVector.zeros[Double](numOutputs)
}

// whole function is f(mat * inner(fv) + bias)
// scale(i) pushes in (f'(mat * inner(v) + bias))(i)
val innerAct = innerLayer.activations(fv)
// d/d(weights(::, i)) == scale(i) * innerAct
for (i <- 0 until weights.rows) {
val a: Double = scale(i)
if(a != 0.0) {
axpy(a, innerAct, matDeriv.t(::, i))
// so d/dbias(i) = scale(i)
biasDeriv(i) += a
}
}

// biasDeriv += scale

// scale is f'(mat * inner(v) + bias)
// d/dv is mat.t * f'(mat * inner(v) + bias)

innerLayer.tallyDerivative(deriv(index.componentOffset(1) to -1), weightst * scale, fv)
}

def applyBatchNormalization(inputs: scala.collection.GenTraversable[FV]) = innerLayer.applyBatchNormalization(inputs)

}

}
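
To make the caching pattern described in the class comment concrete, here is a minimal hypothetical sketch of the intended call sequence. The names `transform`, `weights`, `fv`, and `outputIdx` are stand-ins supplied by the caller, not identifiers defined in this diff:

```scala
import breeze.linalg.DenseVector

// Hypothetical usage sketch for AffineOutputTransform, using only the methods shown above.

// Extract the output layer together with its (penultimate) inner layer.
val (outputLayer, penultimateLayer) =
  transform.extractLayerAndPenultimateLayer(weights, forTrain = false)

// Compute the penultimate activations once per input and cache them on the caller's side.
val penultimate: DenseVector[Double] = penultimateLayer.activations(fv)

// Score only the output cells that are actually needed, instead of materializing
// the full output vector with activations(fv).
val cellScore: Double = outputLayer.activationsFromPenultimateDot(penultimate, outputIdx)
```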