ProPPR: PROGRAMMING WITH PERSONALIZED PAGERANK
==============================================
This is a Java package for using graph walk algorithms to perform inference tasks over local groundings of first-order logic programs. The package uses parallelization to speed up processing substantially, making it practical even for large databases.
Contents:
1. Build
2. Run
   2.0. Overview of Java main() classes
   2.1. Experiment pipeline
        2.1.0. Experiment: full pipeline
        2.1.1. ExampleCooker: Grounding or "Cooking"
        2.1.2. Trainer: SGD learning
        2.1.3. Tester: Evaluating results
   2.2. Utilities
        2.2.0. QueryAnswerer: Answering queries against a program
        2.2.1. Prompt: Interact with ProPPR machinery in a shell
3. File Formats
1. BUILD
========
ProPPR $ ant clean build
2. RUN
======
For all run phases, control logging output using conf/log4j.properties.
2.0. RUN: JAVA MAIN CLASSES
===========================
edu.cmu.ml.praprolog.Experiment - Full pipeline: ground, train, test.
edu.cmu.ml.praprolog.ExampleCooker - for grounding (used by grounding.sh)
edu.cmu.ml.praprolog.trove.Trainer - single-threaded parameter-learning over labeled examples
edu.cmu.ml.praprolog.trove.MultithreadedTrainer - multithreaded parameter-learning
- If the trove *Trainer classes give you trouble, there are less memory-efficient but potentially easier-to-debug versions that use string keys:
edu.cmu.ml.praprolog.Trainer
edu.cmu.ml.praprolog.MultithreadedTrainer
edu.cmu.ml.praprolog.Tester - single-threaded evaluation of trained parameters over labeled examples
edu.cmu.ml.praprolog.prove.Prompt - interactive prompt for exploring state trees in grounding proofs - good for debugging rulefiles
edu.cmu.ml.praprolog.QueryAnswerer - compute query solutions, using trained or untrained parameters
2.1. RUN: EXPERIMENT PIPELINE
=============================
2.1.0. RUN: EXPERIMENT PIPELINE: EXPERIMENT
===========================================
Use the file formats below to create files that specify your program's
rules and fact DBs, and optionally a graph. Create training and testing data
files of positive and negative examples for queries on the
program. Details on file format are below. Place these files in their
own directory.
ProPPR $ ls my_program
facts.facts graph.graph rules.rules train.data test.data
Use the provided script 'compile.sh' to convert these files into
ProPPR-readable formats:
ProPPR $ scripts/compile.sh my_program
compiling my_program/rules.rules to my_program/rules.crules
parsing my_program/rules.rules
Converting my_program/facts.facts
Now you can run the full pipeline on the resulting files. The compile script
generates the --programFiles argument for you:
ProPPR $ export PROGRAMFILES=`cat my_program/programFiles.arg`
ProPPR $ java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.Experiment \
--programFiles ${PROGRAMFILES%:} \
--train my_program/toytrain.data --output my_program/train.cooked \
--test my_program/toytest.data --params my_program/params.wts
INFO [Component] Loading from file 'my_program/rules.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/facts.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/graph.graph' with alpha=0.0 ...
INFO [Experiment] Cooking training examples from my_program/toytrain.data...
INFO [Experiment] Training model parameters...
INFO [Trainer] Importing cooked examples from my_program/train.cooked
INFO [Trainer] Imported 11 examples
INFO [Trainer] Training on cooked examples...
INFO [Trainer] epoch 1 ...
INFO [Trainer] epoch 2 ...
INFO [Trainer] epoch 3 ...
INFO [Trainer] epoch 4 ...
INFO [Trainer] epoch 5 ...
INFO [Trainer] epoch 6 ...
INFO [Trainer] epoch 7 ...
INFO [Trainer] epoch 8 ...
INFO [Trainer] epoch 9 ...
INFO [Trainer] epoch 10 ...
INFO [Trainer] epoch 11 ...
INFO [Trainer] epoch 12 ...
INFO [Trainer] epoch 13 ...
INFO [Trainer] epoch 14 ...
INFO [Trainer] epoch 15 ...
INFO [Trainer] epoch 16 ...
INFO [Trainer] epoch 17 ...
INFO [Trainer] epoch 18 ...
INFO [Trainer] epoch 19 ...
INFO [Trainer] epoch 20 ...
INFO [Trainer] Finished in 1816 ms
INFO [Experiment] Saving parameters to my_program/params.wts...
INFO [Experiment] Testing on my_program/toytest.data...
INFO [Tester] pairTotal 7.0 pairErrors 0.0 errorRate 0.0 map 1.0
result= running time 2836
result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0
To see the full list of options (including prover type, number of threads,
and loss output), run Experiment with no arguments.
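If you drive the pipeline from a script, the command above can be assembled
programmatically. The following is a minimal Python sketch, not part of
ProPPR: the function name experiment_command and its parameters are
hypothetical, and the only flags used are the ones shown in the example
above. It mirrors the shell expansion ${PROGRAMFILES%:} by dropping a single
trailing colon from the contents of programFiles.arg.

```python
def experiment_command(program_files_arg, train, cooked, test, params):
    """Build the argv list for the Experiment run shown in the README.

    program_files_arg is the text read from my_program/programFiles.arg.
    (Hypothetical helper; only flags that appear in the README are used.)
    """
    program_files = program_files_arg.strip()
    # Equivalent of ${PROGRAMFILES%:} : remove one trailing colon, if any.
    if program_files.endswith(":"):
        program_files = program_files[:-1]
    return [
        "java", "-cp", ".:bin/:lib/*:conf/",
        "edu.cmu.ml.praprolog.Experiment",
        "--programFiles", program_files,
        "--train", train, "--output", cooked,
        "--test", test, "--params", params,
    ]
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the
ProPPR directory; passing a list avoids shell expansion of the lib/* entry,
which java expands itself.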
2.1.1. RUN: EXPERIMENT PIPELINE: EXAMPLECOOKER
==============================================
Use the provided script 'ground.sh' to ground ("cook") a data file.
Use the file formats below to create files that specify your program's
rules and fact DBs, and optionally a graph. Create a training data
file of positive and negative examples for queries on the
program. Details on file format are below. Place these files in their
own directory.
ProPPR $ ls my_program
facts.facts graph.graph rules.rules train.data
Run the grounding script on the directory, and specify the training data file explicitly:
ProPPR $ scripts/ground.sh my_program my_program/train.data
--- Compiling my_program first:
scripts/compile.sh my_program
compiling my_program/rules.rules to my_program/rules.crules
parsing my_program/rules.rules
Converting my_program/facts.facts
--- Cooking my_program/train.data:
java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.ExampleCooker --programFiles my_program/rules.crules:my_program/facts.cfacts:my_program/graph.graph --data my_program/train.data --output my_program/train.cooked
INFO [Component] Loading from file 'my_program/rules.crules'...
INFO [Component] Loading from file 'my_program/facts.cfacts'...
INFO [Component] Loading from file 'my_program/graph.graph'...
171
Done.
--- Done.
ProPPR $ ls my_program/
programFiles.arg rules.crules rules.rules facts.cfacts facts.facts train.cooked train.data graph.graph
You can pass additional arguments to the cooker if you wish to
specify prover or multithreading settings:
[--prover { ppr | dpr[:eps] | tr[:depth] }]
(default prover is ppr)
(default dpr epsilon=0.0001)
(default tr depth=5)
[--threads NUMBER]
use multithreading (default implementation is single-threaded)
ProPPR $ scripts/ground.sh my_program my_program/toytrain.data --prover dpr:0.00001 --threads 3
--- Compiling my_program first:
scripts/compile.sh my_program
compiling my_program/textcat.rules to my_program/textcat.crules
parsing my_program/textcat.rules
Converting my_program/toylabels.facts
--- Cooking my_program/toytrain.data:
java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.ExampleCooker --programFiles my_program/textcat.crules:my_program/toylabels.cfacts:my_program/toywords.graph --data my_program/toytrain.data --output my_program/toytrain.cooked --prover dpr:0.00001 --threads 3
INFO [Component] Loading from file 'my_program/textcat.crules'...
INFO [Component] Loading from file 'my_program/toylabels.cfacts'...
INFO [Component] Loading from file 'my_program/toywords.graph'...
1124
Done.
--- Done.
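The --prover values accepted above follow a simple name[:argument] pattern.
As an illustration only (this is not ProPPR's own option parser, and the
function name parse_prover is hypothetical), a Python sketch that applies the
documented defaults might look like this:

```python
def parse_prover(spec="ppr"):
    """Split a --prover value such as 'dpr:0.00001' into (name, argument).

    Defaults follow the README: ppr takes no argument, dpr epsilon=0.0001,
    tr depth=5. Hypothetical helper, not ProPPR's actual parser.
    """
    name, _, arg = spec.partition(":")
    if name == "ppr":
        if arg:
            raise ValueError("ppr takes no argument: %r" % spec)
        return ("ppr", None)
    if name == "dpr":
        return ("dpr", float(arg) if arg else 0.0001)
    if name == "tr":
        return ("tr", int(arg) if arg else 5)
    raise ValueError("unknown prover: %r" % spec)
```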
To reset your project directory to just the original rule, fact,
graph, and data files, use the script:
ProPPR $ scripts/clean.sh my_program/
ProPPR $ ls my_program
facts.facts graph.graph rules.rules train.data
2.1.2. RUN: EXPERIMENT PIPELINE: TRAINER
========================================
Given a cooked file we can learn the parameter weights that best
support the training examples:
ProPPR $ java edu.cmu.ml.praprolog.trove.Trainer my_program/toytrain.cooked my_program/toytrain.params
You can specify additional options if you like:
--epochs {int} Number of epochs (default 5)
--traceLosses Turn on traceLosses (default off)
NB: example count for losses is sum(x.length() for x in examples)
and won't match `wc -l cookedExampleFile`
For multithreaded learning, use:
ProPPR $ java edu.cmu.ml.praprolog.trove.MultithreadedTrainer my_program/toytrain.cooked my_program/toytrain.params
With the same options as above, plus:
--threads {int} Number of threads (default 4)
--rr Use round-robin scheduling (default:queue) (RECOMMENDED)
2.1.3. RUN: EXPERIMENT PIPELINE: TESTER
=======================================
Given a cooked file of testing examples and the model parameters output from
Trainer, we can get the MAP of the trained model over the test data.
If you are running this step you should already have a compiled program
my_program, likely cooked training data, and the trained parameters:
ProPPR $ ls my_program/
facts.cfacts facts.facts graph.graph params.wts programFiles.arg
rules.crules rules.rules test.data train.cooked train.data
Run the tester:
ProPPR $ export PROGRAMFILES=`cat my_program/programFiles.arg`
ProPPR $ java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.Tester \
--programFiles ${PROGRAMFILES%:} \
--test my_program/test.data --params my_program/params.wts
INFO [Tester] flags: 0x59
INFO [Component] Loading from file 'my_program/rules.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/facts.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/graph.graph' with alpha=0.0 ...
INFO [Tester] Testing on my_program/test.data...
INFO [Tester] pairTotal 7.0 pairErrors 0.0 errorRate 0.0 map 1.0
result= running time 113
result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0
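This README does not define the 'map' value that Tester reports. Assuming it
is the standard mean average precision over queries (an assumption, not
confirmed here), the Python sketch below shows how that quantity is usually
computed from ranked solutions. The function names are hypothetical, and each
query is represented as a list of booleans, best-ranked solution first, where
True marks a positive example.

```python
def average_precision(ranked_labels):
    """Average precision for one query.

    ranked_labels: booleans ordered from best-scored to worst-scored
    solution; True means the solution is a positive example.
    """
    hits = 0
    total = 0.0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


def mean_average_precision(per_query_labels):
    """Mean of average_precision over all queries (0.0 if there are none)."""
    if not per_query_labels:
        return 0.0
    scores = [average_precision(labels) for labels in per_query_labels]
    return sum(scores) / len(scores)
```

Under this definition, a value of 1.0 means that every positive example is
ranked above every negative example for each query.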
2.2. RUN: UTILITIES
===================
2.2.0. RUN: UTILITIES: QUERYANSWERER
====================================
If you want to use a program to answer a series of queries, you can
use the QueryAnswerer class. If you are running this step you should
already have a compiled program and a file containing a list of
queries, one per line. Each query is a single goal.
ProPPR $ cat testcases/family.queries
sim(william,X)
sim(rachel,X)
ProPPR $ java edu.cmu.ml.praprolog.QueryAnswerer \
--programFiles testcases/family.cfacts:testcases/family.crules \
--queries testcases/family.queries --output answers.txt
INFO [Component] Loading from file 'testcases/family.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'testcases/family.crules' with alpha=0.0 ...
ProPPR $ cat answers.txt
# proved sim(william,-1) 47 msec
1 0.8838968104504825 -1=c[william]
2 0.035512510088781264 -1=c[lottie]
3 0.035512510088781264 -1=c[rachel]
4 0.035512510088781264 -1=c[sarah]
5 0.002391414820793351 -1=c[poppy]
6 0.0017935611155950133 -1=c[lucas]
7 0.0017935611155950133 -1=c[charlotte]
8 0.0017935611155950133 -1=c[caroline]
9 0.0017935611155950133 -1=c[elizabeth]
# proved sim(rachel,-1) 18 msec
1 0.9094251636624519 -1=c[rachel]
2 0.0452874181687741 -1=c[caroline]
3 0.0452874181687741 -1=c[elizabeth]
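To post-process the output file, you can group the ranked solutions by query.
The Python sketch below is based only on the sample output above: it assumes
that each block starts with a '# proved <query> <N> msec' line and that each
solution line holds a rank, a score, and a variable binding separated by
whitespace. The function name parse_answers is hypothetical.

```python
import re

# Header lines look like: "# proved sim(william,-1) 47 msec"
HEADER = re.compile(r"^# proved (\S+)\s+(\d+) msec$")


def parse_answers(lines):
    """Return {query: [(rank, score, binding), ...]} from answers.txt lines."""
    results = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = results.setdefault(header.group(1), [])
            continue
        if current is None:
            raise ValueError("solution line before any query header: %r" % line)
        # Split into at most three fields so the binding is kept whole.
        rank, score, binding = line.split(None, 2)
        current.append((int(rank), float(score), binding))
    return results
```

For example, parse_answers(open("answers.txt")) gives a dictionary keyed by
the proved query string, such as 'sim(william,-1)'.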
2.2.1. RUN: UTILITIES: PROMPT
=============================
An interactive prompt can be useful while debugging logic program issues, because you can examine a single query in detail. If you are running this step you should already have a compiled program.
Starting up the prompt:
"""
ProPPR $ java -cp conf/:bin/:lib/* edu.cmu.ml.praprolog.prove.Prompt --programFiles ${PROGRAMFILES%:}
Starting up beanshell...
prv set: edu.cmu.ml.praprolog.prove.TracingDfsProver@57fdc2d
INFO [Component] Loading from file 'kbp_prototype/doc.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'kbp_prototype/kb.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'kbp_prototype/lp_predicate_SF_ENG_001-50doc.graph' with alpha=0.0 ...
lp set: edu.cmu.ml.praprolog.prove.LogicProgram@2225a091
Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.
BeanShell 2.0b4 - by Pat Niemeyer (pat@pat.net)
bsh %
"""
When it starts up, Prompt instantiates the logic program named on the command line as 'lp', and a default prover as 'prv'. The default prover prints a depth-first-search-style proof of a query (default maximum depth is 5). You can specify a different prover on the command line if you wish. For information on built-in commands and interpreter syntax, type 'help();':
"""
bsh % help();
This is a beanshell, a command-line interpreter for java. A full beanshell manual is available at <http://www.beanshell.org/manual/contents.html>.
Type java statements and expressions at the prompt. Don't forget semicolons.
Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.
'show();' will toggle automatic printing of the results of expressions. Otherwise you must use 'print( expr );' to see results.
'javap( x );' will list the fields and methods available on an object. Be warned; beanshell has trouble locating methods that are only defined on the superclass.
'[sol = ]run(prover,logicprogram,"functor(arg,arg,...,arg)")' will prove the associated state.
'pretty(sol)' will list solutions first, then intermediate states in descending weight order.
bsh %
"""
3. FILE FORMATS
===============
****** File format: *.rules
Example:
predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y) #r.
related(W,Y) :- # w(W,Y).
Grammar:
line= lhs ':-' rhs ('#' featureList)? '.'
lhs= goal
rhs=
|= goal (',' goal)*
featureList=
|= goal (',' goal)*
goal= functor
|= functor '(' argList ')'
argList= constantArgList
|= variableArgList
|= constantArgList ',' variableArgList
constantArgList= constantArg (',' constantArg)*
variableArgList= variableArg (',' variableArg)*
constantArg= [a-z][a-zA-Z0-9]*
variableArg= [A-Z][a-zA-Z0-9]*
functor= [a-z][a-zA-Z0-9]*
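As a quick way to check goals against the grammar above before running the
compile script, the following Python sketch validates a single goal and
returns its functor and arguments. It is an illustration written from the
grammar as stated here, not ProPPR's own parser, and the name parse_goal is
hypothetical. Like the grammar, it requires constants to precede variables in
the argument list.

```python
import re

GOAL = re.compile(r"^([a-z][a-zA-Z0-9]*)(?:\(([^()]*)\))?$")
CONSTANT = re.compile(r"^[a-z][a-zA-Z0-9]*$")
VARIABLE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")


def parse_goal(text):
    """Return (functor, [args]) for a goal, or raise ValueError."""
    match = GOAL.match(text.strip())
    if not match:
        raise ValueError("not a goal: %r" % text)
    functor, arg_text = match.group(1), match.group(2)
    if arg_text is None:
        return functor, []  # bare functor, no argument list
    args = [arg.strip() for arg in arg_text.split(",")]
    seen_variable = False
    for arg in args:
        if VARIABLE.match(arg):
            seen_variable = True
        elif CONSTANT.match(arg):
            if seen_variable:
                # argList puts constantArgList before variableArgList
                raise ValueError("constant after variable in %r" % text)
        else:
            raise ValueError("bad argument %r in %r" % (arg, text))
    return functor, args
```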
****** File format: *.facts
Example:
isLabel(pos)
isLabel(neg)
Grammar:
line= goal
****** File format: *.graph
Example:
hasWord bk punk
hasWord bk queen
hasWord bk barbie
hasWord bk and
hasWord bk ken
hasWord rb a
hasWord rb little
hasWord rb red
hasWord rb bike
hasWord mv a
hasWord mv big
hasWord mv 7-seater
hasWord mv minivan
hasWord mv with
hasWord mv an
hasWord mv automatic
hasWord mv transmission
hasWord hs a
hasWord hs big
hasWord hs house
hasWord hs in
hasWord hs the
hasWord hs suburbs
hasWord hs with
hasWord hs crushing
hasWord hs mortgage
Grammar:
line= edge '\t' sourcenode '\t' destnode
edge= functor
sourcenode,destnode= constantArg
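Because each line is three tab-separated fields, a .graph file is easy to
load for inspection. The Python sketch below (the function name read_graph is
hypothetical and not part of ProPPR) builds a nested mapping from edge label
to source node to the list of destination nodes, and rejects lines that do
not have exactly three fields.

```python
from collections import defaultdict


def read_graph(lines):
    """Return {edge: {sourcenode: [destnode, ...]}} from .graph lines."""
    graph = defaultdict(lambda: defaultdict(list))
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                "line %d: expected 3 tab-separated fields, got %d"
                % (number, len(fields)))
        edge, source, dest = fields
        graph[edge][source].append(dest)
    return graph
```

With the example file above, read_graph(open("my_program/graph.graph"))["hasWord"]["rb"]
would list the four words attached to the 'rb' document node.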
****** File format: *.data
Example:
predict(bk,Y) -predict(bk,neg) +predict(bk,pos)
predict(rb,Y) -predict(rb,neg) +predict(rb,pos)
predict(mv,Y) +predict(mv,neg) -predict(mv,pos)
predict(hs,Y) +predict(hs,neg) -predict(hs,pos)
Grammar:
line= query '\t' exampleList
query= goal
exampleList= example ('\t' example)*
example= positiveExample
|= negativeExample
positiveExample= '+' goal
negativeExample= '-' goal
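A .data line can be split into its query and its labeled examples with a few
lines of code. The Python sketch below follows the grammar above literally
(tab-separated fields, each example prefixed with '+' or '-'); the function
name parse_data_line is hypothetical, and the goals themselves are returned
as unparsed strings.

```python
def parse_data_line(line):
    """Return (query, positives, negatives) for one .data line."""
    fields = line.rstrip("\r\n").split("\t")
    query, examples = fields[0], fields[1:]
    if not query or not examples:
        raise ValueError("expected a query and at least one example: %r" % line)
    positives, negatives = [], []
    for example in examples:
        if example.startswith("+"):
            positives.append(example[1:])
        elif example.startswith("-"):
            negatives.append(example[1:])
        else:
            raise ValueError("example must start with '+' or '-': %r" % example)
    return query, positives, negatives
```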