This repository was archived by the owner on Feb 19, 2020. It is now read-only.
Neural epic #33
Merged
75 commits
ebb3998 gregdurrett: Reasonable first crack at neural epic with rule features and position…
a2f65eb gregdurrett: Additional speed optimizations, first groundwork for the caching firs…
cb4fe4e gregdurrett: First stuff for caching input layer
8c96e38 gregdurrett: Finished caching implementation of the first layer
fa38219 gregdurrett: Changed random initialization
4f9719b gregdurrett: Minor tweaks
7e168d7 gregdurrett: Revamped how NNs are instantiated, can now load multiple input vector…
a85d0d9 gregdurrett: Changed how weights are initialized to make backprop into embeddings …
5cbeff3 gregdurrett: First crack at corefdense, additional improvements to the neural pars…
6ee25e5 gregdurrett: More improvements: corefdense stuff, can do fully linear network, etc.
68bdf2f gregdurrett: Added support for multiple parallel layers
eec6834 gregdurrett: Added support for dependencies
4aeb231 gregdurrett: Various changes
c3250a5 gregdurrett: Small bug fix so it compiles
9b8aba9 gregdurrett: Testing optimization stuff
faf9269 gregdurrett: More optimization playing around
9523bc2 gregdurrett: Adadelta implementation
04896ad gregdurrett: First crack at NER
8eeed33 gregdurrett: Small updates to dense NER
39beb0e gregdurrett: Parallel training
5ed3c63 gregdurrett: Added LowRankQuadraticTransform
6fd0bdf gregdurrett: Added pruning to NER, other changes
5de8091 gregdurrett: Overhauled how nonlinear transforms are handled
9fba451 gregdurrett: Changed how the output layer is handled to allow for LRQT and output …
9a55ee8 gregdurrett: Minor changes
6a2291d gregdurrett: Separated layer extraction for train and for test to allow dropout to…
f88adc1 gregdurrett: Changes to NER
f63db36 gregdurrett: Changes to set up for dropout in parser
78eabe9 gregdurrett: Embedding clipping, dropout everywhere, etc.
a79c86e gregdurrett: Before tetra messing
c5d7986 gregdurrett: Random changes
a7d83c3 gregdurrett: Fixed output embedding to be fast in parser
c23e7b8 gregdurrett: Flexibility in input words to the network
6549e4b gregdurrett: Ensembling
01eaa88 gregdurrett: Expanded set of possible surface features
933cc97 gregdurrett: Added ability to decouple parameters for unaries, binaries, and spans
b734995 gregdurrett: Bug fix
8bfcd2a gregdurrett: SGD with momentum now supported
03044e1 gregdurrett: Made things serializable, also fixed a bug where SGD wasn't being used
ff6e16a gregdurrett: Fixed some vector loading stuff
434b858 gregdurrett: Added batch normalization support
f849f33 gregdurrett: Improved output embedding initialization
1dd7fa9 gregdurrett: Fixed initializer for oe
93f6257 gregdurrett: Allowed for lowercased word vectors
5759b64 gregdurrett: Added a way to get a subset of gradient features, also improvements t…
e3d5705 gregdurrett: Added checking coverage of vectors
756857e gregdurrett: Combo adagrad and skipdep converter implementation
610bf51 gregdurrett: Fixed SpikdepConverter; THIS SHOULD'VE BEEN COMMITTED BEFORE ACL
331e343 gregdurrett: Modernization of neural architecture for coref
30d506a gregdurrett: Improved coref neural model
0b23d75 gregdurrett: Some camera-ready verification fixes and experiments
10605c1 gregdurrett: Removed unused classes from the dense package
f63bb58 gregdurrett: Deleted unused class
e124fe4 gregdurrett: Moved Word2Vec and Word2VecUtils
7f75d11 gregdurrett: Moved to dense so we can exclude corefdense
3ec611b gregdurrett: Other changes in the move
56e4f79 gregdurrett: Added NeuralParserTrainer so we can remove casts from ParserTrainer
f741202 gregdurrett: Camera changes
2cd2e23 gregdurrett: Improvements to neural coref
e7565e9 gregdurrett: First crack at sparse net features
1411c97 gregdurrett: Last of the neural coref stuff, beginning of refactoring
21894ea gregdurrett: A few more refactoring fixes
7fc3285 gregdurrett: Updating to latest version of epic; minor modifications to trace remo…
7db377a gregdurrett: Removed batch normalization stuff
516f590 gregdurrett: Mostly reverted ParserTrainer to what it was before neural epic came …
4a3d1bb gregdurrett: Latest epic commits
e20db9a gregdurrett: Moved PNMFactory to a better-named file
3ad6453 gregdurrett: More renaming
6e8ae37 gregdurrett: Renaming of files and refactoring so things have better names
c5d681d gregdurrett: Removed spurious import
e53a9d3 gregdurrett: Readme for the neural stuff
e264cdd gregdurrett: Some cleanups to the PR
e70fc0b gregdurrett: Applying dlwh's changes, other removals of commented-out code and one…
37d2832 gregdurrett: Straggler changes
08a873a gregdurrett: Modified README to point to the downloadable model
```diff
@@ -16,7 +16,7 @@ tmp/
 .idea*
 .scratch/
 java.hprof.txt
 *.bbl
 *.blg
 *.aux
 /bin/
```
New file (+106 lines), the README for the neural parser:
The neural CRF parser is a high-performing constituency parser.

## Preamble

The neural CRF parser is described in:

"Neural CRF Parsing" Greg Durrett and Dan Klein. ACL 2015.

It is an extension of the span parser described in

"Less Grammar, More Features" David Hall, Greg Durrett, and Dan Klein. ACL 2014.

and is based on the Epic parsing framework. See https://github.com/dlwh/epic
for more documentation about the span parser and the Epic framework.
See http://www.eecs.berkeley.edu/~gdurrett/ for papers and BibTeX.

Questions? Bugs? Email me at [email protected]

## Setup

You need three things to run the neural CRF parser:

1) The compiled .jar; run ```sbt assembly``` to produce it.

2) A treebank: the Penn Treebank or one of the SPMRL treebanks.

3) Word vectors. These can be either in the .bin format of Mikolov et al.
(2013) or the .txt format of Bansal et al. (ACL 2014). For English, the best
performance comes from using Bansal et al.'s vectors:

http://ttic.uchicago.edu/~mbansal/codedata/dependencyEmbeddings-skipdep.zip
For other languages, you can train suitable vectors on monolingual data using
```word2vec``` with the following arguments:

    -cbow 0 -size 100 -window 1 -sample 1e-4 -threads 8 -binary 0 -iter 15

These settings are only mildly tuned; using a small window size is important,
but other settings are likely to work well too.
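For concreteness, a full invocation might look like the sketch below, where
```mono.txt``` and ```vectors.txt``` are placeholder input/output paths (the
flags are exactly those listed above):

    ./word2vec -train mono.txt -output vectors.txt -cbow 0 -size 100 -window 1 -sample 1e-4 -threads 8 -binary 0 -iter 15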
## Usage

To run the parser on new text (tokenized, one sentence per line), use the
following command:

    java -Xmx4g -cp path/to/assembly.jar epic.parser.ParseText --model neuralcrf.parser --nthreads 8 [files]

To reproduce the results in the neural CRF paper, run the following command
(note that you need to fill in paths for -cp, --treebank.path, and
--word2vecPath):
    java -Xmx47g -cp path/to/assembly.jar epic.parser.models.NeuralParserTrainer \
      --cache.path constraints.cache \
      --opt.useStochastic \
      --treebank.path path/to/wsj/ \
      --evalOnTest \
      --includeDevInTrain \
      --trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator \
      --ann.0 epic.trees.annotations.FilterAnnotations \
      --ann.1 epic.trees.annotations.ForgetHeadTag \
      --ann.2 epic.trees.annotations.Markovize \
      --ann.2.horizontal 0 \
      --ann.2.vertical 0 \
      --modelFactory epic.parser.models.PositionalNeuralModelFactory \
      --opt.batchSize 200 \
      --word2vecPath path/to/skipdep_embeddings.txt \
      --threads 8

> **Owner** (inline comment on the `-Xmx47g` flag): 47g lol
To run on the SPMRL treebanks, modify the arguments to the command above as
follows (a combined example is sketched below):

1) Add the following arguments (replace ${LANG}$ as appropriate):

    --treebankType spmrl \
    --binarization head \
    --supervisedHeadFinderPtbPath path/to/gold/ptb/train/train.${LANG}.gold.ptb \
    --supervisedHeadFinderConllPath path/to/gold/conll/train/train.${LANG}.gold.conll \
    --ann.3 epic.trees.annotations.SplitPunct

2) Modify --treebank.path to point to the X_SPMRL/gold/ptb directory.
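Putting 1) and 2) together, a full SPMRL training run might look like the
following sketch, where FRENCH is a hypothetical choice of ${LANG}$ and all
paths are placeholders:

    java -Xmx47g -cp path/to/assembly.jar epic.parser.models.NeuralParserTrainer \
      --cache.path constraints.cache \
      --opt.useStochastic \
      --treebank.path path/to/FRENCH_SPMRL/gold/ptb \
      --treebankType spmrl \
      --binarization head \
      --supervisedHeadFinderPtbPath path/to/gold/ptb/train/train.FRENCH.gold.ptb \
      --supervisedHeadFinderConllPath path/to/gold/conll/train/train.FRENCH.gold.conll \
      --evalOnTest \
      --includeDevInTrain \
      --trainer.modelFactory.annotator epic.trees.annotations.PipelineAnnotator \
      --ann.0 epic.trees.annotations.FilterAnnotations \
      --ann.1 epic.trees.annotations.ForgetHeadTag \
      --ann.2 epic.trees.annotations.Markovize \
      --ann.2.horizontal 0 \
      --ann.2.vertical 0 \
      --ann.3 epic.trees.annotations.SplitPunct \
      --modelFactory epic.parser.models.PositionalNeuralModelFactory \
      --opt.batchSize 200 \
      --word2vecPath path/to/french_vectors.txt \
      --threads 8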
Options to configure the neural network and training are largely defined in
```epic.parser.models.PositionalNeuralModel```.
### Miscellaneous Notes

To run on the development set, simply remove ```evalOnTest``` and
```includeDevInTrain``` from the arguments.

Note that you should use the official version of ```evalb``` on the output
files (gold and guess) rather than relying on the native scorer in the Epic
parser. For SPMRL, you should use the version distributed with the shared
task.

Also note that the X-bar grammar and coarse pruning masks (constraints) are
cached between runs in the same directory, which reduces training and testing
time considerably, since generating the masks is time-consuming.

> **Owner** (review comment): while you're at it, might add a note that multiple runs can't be from the same directory.
src/main/scala/epic/dense/AdadeltaGradientDescentDVD.scala (new file, 71 additions, 0 deletions):
```scala
package epic.dense

import breeze.linalg._
import breeze.numerics._
import breeze.optimize.StochasticDiffFunction
import breeze.optimize.StochasticGradientDescent

class AdadeltaGradientDescentDVD(maxIter: Int,
                                 rho: Double = 0.95,
                                 tolerance: Double = 1E-5,
                                 improvementTolerance: Double = 1E-4,
                                 minImprovementWindow: Int = 50)
    extends StochasticGradientDescent[DenseVector[Double]](1.0, maxIter, tolerance, improvementTolerance, minImprovementWindow) {

  val delta = 1E-4
  val epsilon = 1e-6
  import vspace._

  case class History(squaredGradientsHistory: DenseVector[Double], squaredUpdatesHistory: DenseVector[Double])

  override def initialHistory(f: StochasticDiffFunction[DenseVector[Double]], init: DenseVector[Double]) = {
    History(DenseVector(Array.tabulate(init.size)(i => 1e-6)), DenseVector(Array.tabulate(init.size)(i => 1e-6)))
  }

  override def updateHistory(newX: DenseVector[Double], newGrad: DenseVector[Double], newValue: Double, f: StochasticDiffFunction[DenseVector[Double]], oldState: State) = {
    val oldHistory = oldState.history
    // This is correct; the new gradient gets incorporated during the next round of takeStep,
    // so this computation should lag by one
    val newG = (oldState.grad :* oldState.grad) * (1 - rho)
    axpy(rho, oldHistory.squaredGradientsHistory, newG)
    val deltaX = newX - oldState.x
    val newU = deltaX :* deltaX * (1 - rho)
    axpy(rho, oldHistory.squaredUpdatesHistory, newU)
    new History(newG, newU)
  }

  override protected def takeStep(state: State, dir: DenseVector[Double], stepSize: Double) = {
    import state._
    // Need to pre-emptively update the gradient since the history only has it through the
    // last timestep
    val rmsGt = sqrt((state.history.squaredGradientsHistory * rho) :+ ((state.grad :* state.grad) * (1 - rho)) :+ epsilon)
    val rmsDeltaXtm1 = sqrt(state.history.squaredUpdatesHistory :+ epsilon)
    val step = dir :* rmsDeltaXtm1 :/ rmsGt
    val newX = x
    axpy(1.0, step, newX)
    newX
  }

  override def determineStepSize(state: State, f: StochasticDiffFunction[DenseVector[Double]], dir: DenseVector[Double]) = {
    defaultStepSize // pegged to 1.0 for this method
  }

  override protected def adjust(newX: DenseVector[Double], newGrad: DenseVector[Double], newVal: Double) = {
    newVal -> newGrad
  }
}
```
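For reference, the class above implements the Adadelta update rule of Zeiler
(2012). Below is a minimal, self-contained scalar sketch of one step, assuming
the same ```rho``` and ```epsilon``` defaults as the fields above (an
illustration, not part of the PR):

```scala
object AdadeltaSketch {
  // Per-parameter state: current value plus the two decaying accumulators.
  case class State(x: Double, sqGrad: Double, sqUpd: Double)

  // One Adadelta step:
  //   E[g^2]_t  = rho * E[g^2]_{t-1}  + (1 - rho) * g_t^2
  //   dx_t      = -(sqrt(E[dx^2]_{t-1} + eps) / sqrt(E[g^2]_t + eps)) * g_t
  //   E[dx^2]_t = rho * E[dx^2]_{t-1} + (1 - rho) * dx_t^2
  def step(s: State, grad: Double, rho: Double = 0.95, eps: Double = 1e-6): State = {
    val sqGrad = rho * s.sqGrad + (1 - rho) * grad * grad
    val dx = -(math.sqrt(s.sqUpd + eps) / math.sqrt(sqGrad + eps)) * grad
    val sqUpd = rho * s.sqUpd + (1 - rho) * dx * dx
    State(s.x + dx, sqGrad, sqUpd)
  }
}
```

Note that the class keeps the gradient accumulator one step behind (see the
comment in ```updateHistory```), so ```takeStep``` folds the current gradient
into ```rmsGt``` itself before scaling the direction.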
New file (+108 lines), defining ```epic.dense.AffineOutputTransform```:
```scala
package epic.dense

import breeze.linalg._
import breeze.linalg.operators.OpMulMatrix
import epic.features.SegmentedIndex
import epic.framework.Feature

import scala.runtime.ScalaRunTime
import scala.util.Random

/**
 * Used at the output layer when we're only going to need some of the possible outputs;
 * it exposes the penultimate layer and then the Layer allows you to pass the results
 * from that back in (caching it elsewhere) and only compute certain cells in the
 * output layer (activationsFromPenultimateDot).
 */
case class AffineOutputTransform[FV](numOutputs: Int, numInputs: Int, innerTransform: Transform[FV, DenseVector[Double]], includeBias: Boolean = true) extends OutputTransform[FV, DenseVector[Double]] {

  val index = SegmentedIndex(new AffineTransform.Index(numOutputs, numInputs, includeBias), innerTransform.index)

  def extractLayerAndPenultimateLayer(weights: DenseVector[Double], forTrain: Boolean) = {
    val mat = weights(0 until (numOutputs * numInputs)).asDenseMatrix.reshape(numOutputs, numInputs, view = View.Require)
    val bias = if (includeBias) {
      weights(numOutputs * numInputs until index.componentOffset(1))
    } else {
      DenseVector.zeros[Double](numOutputs)
    }
    val inner = innerTransform.extractLayer(weights(index.componentOffset(1) to -1), forTrain)
    new OutputLayer(mat, bias, inner) -> inner
  }

  /**
   * N.B. Initialized to zero because this should *only* be used at the output layer, where
   * zero initialization is appropriate
   */
  def initialWeightVector(initWeightsScale: Double, rng: Random, outputLayer: Boolean, spec: String) = {
    require(outputLayer)
    DenseVector.vertcat(DenseVector.zeros(index.indices(0).size), innerTransform.initialWeightVector(initWeightsScale, rng, false, spec))
  }

  def clipHiddenWeightVectors(weights: DenseVector[Double], norm: Double, outputLayer: Boolean) {
    innerTransform.clipHiddenWeightVectors(weights(index.componentOffset(1) to -1), norm, false)
  }

  def getInterestingWeightIndicesForGradientCheck(offset: Int): Seq[Int] = {
    (offset until offset + Math.min(10, index.indices(0).size)) ++ innerTransform.getInterestingWeightIndicesForGradientCheck(offset + index.indices(0).size)
  }

  case class OutputLayer(weights: DenseMatrix[Double], bias: DenseVector[Double], innerLayer: innerTransform.Layer) extends OutputTransform.OutputLayer[FV, DenseVector[Double]] {
    override val index = AffineOutputTransform.this.index

    val weightst = weights.t
    // val weightst = weights.t.copy
```

> **Owner** (inline comment on `weightst`): this was originally added for better memory locality since mul transpose is pretty slow (but .copy makes it no longer transposed)

```scala
    def activations(fv: FV) = {
      val out = (weights * innerLayer.activations(fv)) += bias
      out
    }

    def activationsDot(fv: FV, sparseIdx: Int) = {
      activationsFromPenultimateDot(innerLayer.activations(fv), sparseIdx)
    }

    def activationsDot(fv: FV, sparseIndices: Array[Int]) = {
      activationsFromPenultimateDot(innerLayer.activations(fv), sparseIndices)
    }

    def activationsFromPenultimateDot(innerLayerActivations: DenseVector[Double], sparseIdx: Int) = {
      weights(sparseIdx, ::) * innerLayerActivations + bias(sparseIdx)
    }

    def tallyDerivative(deriv: DenseVector[Double], _scale: => Vector[Double], fv: FV) = {
      val scale = _scale
      val matDeriv = deriv(0 until (numOutputs * numInputs)).asDenseMatrix.reshape(numOutputs, numInputs, view = View.Require)
      val biasDeriv = if (includeBias) {
        deriv(numOutputs * numInputs until index.componentOffset(1))
      } else {
        DenseVector.zeros[Double](numOutputs)
      }

      // whole function is f(mat * inner(fv) + bias)
      // scale(i) pushes in (f'(mat * inner(v) + bias))(i)
      val innerAct = innerLayer.activations(fv)
      // d/d(weights(::, i)) == scale(i) * innerAct
      for (i <- 0 until weights.rows) {
        val a: Double = scale(i)
        if (a != 0.0) {
          axpy(a, innerAct, matDeriv.t(::, i))
          // so d/dbias(i) = scale(i)
          biasDeriv(i) += a
        }
      }

      // scale is f'(mat * inner(v) + bias)
      // d/dv is mat.t * f'(mat * inner(v) + bias)
      innerLayer.tallyDerivative(deriv(index.componentOffset(1) to -1), weightst * scale, fv)
    }

    def applyBatchNormalization(inputs: scala.collection.GenTraversable[FV]) = innerLayer.applyBatchNormalization(inputs)
  }
}
```
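To make the caching pattern from the doc comment concrete, a hypothetical
caller might look like the sketch below. Here ```transform```, ```weights```,
and ```fv``` are assumed to come from the surrounding model code, and the
sparse indices are made up; only ```extractLayerAndPenultimateLayer``` and
```activationsFromPenultimateDot``` are from the class above:

```scala
// Sketch: compute the penultimate activations once, then score only the
// output cells we actually need instead of the full numOutputs-dim output.
val (outputLayer, innerLayer) = transform.extractLayerAndPenultimateLayer(weights, forTrain = false)
val penultimate = innerLayer.activations(fv)  // expensive shared computation, done once
val wanted = Array(3, 17, 42)                 // hypothetical sparse output indices
val scores = wanted.map(i => outputLayer.activationsFromPenultimateDot(penultimate, i))
```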
> **Review comment** (on the ```epic.parser.ParseText``` usage above): ParseText actually tokenizes and sentence-segments by default. To get the one-sentence-per-line behavior described in the README, add the flags "--tokenizer whitespace --sentences newline".
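Applying that suggestion to the earlier example, a run on pre-tokenized,
one-sentence-per-line input would look like this sketch (flags taken from the
comment above; paths as in the README):

    java -Xmx4g -cp path/to/assembly.jar epic.parser.ParseText --model neuralcrf.parser --tokenizer whitespace --sentences newline --nthreads 8 [files]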