Converts ontologies represented in OWL RDF/XML to Solr and Neo4j databases.
Start with a config JSON file that lists the ontologies you want to load. You can get the OBO Foundry config into a file called `foundry.json` like so (make sure you have `yq` installed):
```bash
curl "https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/_config.yml" \
    | yq eval -j - > foundry.json
```
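The exact fields `rdf2json` reads are not spelled out here, but after the `yq` conversion the config is a JSON object with an `ontologies` array whose entries carry fields such as `id`, `title`, and `ontology_purl` (these field names come from the OBO Foundry registry; this trimmed example is illustrative only):

```json
{
  "ontologies": [
    {
      "id": "bfo",
      "title": "Basic Formal Ontology",
      "ontology_purl": "http://purl.obolibrary.org/obo/bfo.owl"
    }
  ]
}
```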
Use `rdf2json` to download all the OWL files, resolve imports, and export JSON files:
```bash
java -jar rdf2json/target/rdf2json-1.0-SNAPSHOT.jar --config file://$(pwd)/foundry.json --output foundry_out.json
```
Now (after about 15 minutes) you should have a huge file called `foundry_out.json` that contains not only the original config for each ontology loaded from `foundry.json`, but also the ontologies themselves, represented in an intermediate JSON format. (Note: the intermediate JSON format is an application format specific to this tool; it is not standardised and is subject to change.)
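The intermediate format is undocumented and subject to change, but to give a flavour of what the file holds, per-ontology entries along these lines would be unsurprising (this sketch is entirely hypothetical, not the actual schema):

```json
{
  "ontologies": [
    {
      "id": "bfo",
      "ontology_purl": "http://purl.obolibrary.org/obo/bfo.owl",
      "classes": [
        { "iri": "http://purl.obolibrary.org/obo/BFO_0000001", "label": "entity" }
      ]
    }
  ]
}
```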
You can now convert this huge JSON file to CSV files ready for Neo4j, using `ols_json2neo`:
```bash
rm -rf output_csv && mkdir output_csv
ols_json2neo --input foundry_out.json --outDir output_csv --manifest linker_manifest.json
```
Now (after 5-10 minutes) you should have a directory full of CSV files, formatted especially for Neo4j. You can load them using `neo4j-admin database import full`, but you'd need to provide the filename of every single CSV file on the command line, which is boring, so this repo includes a script called `make_csv_import_cmd.sh` that generates the command line for you:
```bash
neo4j-admin database import full \
    --ignore-empty-strings=true \
    --legacy-style-quoting=false \
    --multiline-fields=true \
    --array-delimiter="|" \
    $(./make_csv_import_cmd.sh)
```
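If you're curious what `make_csv_import_cmd.sh` has to produce, a minimal sketch looks like this: `neo4j-admin database import full` accepts repeated `--nodes=` and `--relationships=` flags, one per CSV file. The `*_nodes.csv` / `*_edges.csv` naming below is an assumption about `ols_json2neo`'s output, not a documented contract:

```bash
#!/bin/sh
# Hypothetical sketch of make_csv_import_cmd.sh: turn the CSV files in a
# directory into --nodes/--relationships flags for neo4j-admin.
# NOTE: the *_nodes.csv / *_edges.csv filename patterns are assumptions.
build_import_args() {
  dir="$1"
  args=""
  for f in "$dir"/*_nodes.csv; do
    [ -e "$f" ] || continue
    args="$args --nodes=$f"
  done
  for f in "$dir"/*_edges.csv; do
    [ -e "$f" ] || continue
    args="$args --relationships=$f"
  done
  printf '%s\n' "$args"
}

build_import_args "${1:-output_csv}"
```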
Now you should have a Neo4j database ready to start!
Similarly to how the Neo4j CSV files were generated, you can generate JSON files ready for uploading to Solr using `ols_json2solr`:
```bash
rm -rf output_json && mkdir output_json
ols_json2solr --input foundry_out.json --outDir output_json
```
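One way to get the generated JSON into Solr is to POST each file to a core's JSON update endpoint (`/update` with `Content-Type: application/json` is standard Solr). The sketch below only builds the `curl` command rather than running it; the Solr URL, the core name `ols`, and the file name are assumptions about your deployment:

```bash
#!/bin/sh
# Hedged sketch: print the curl command that would post one generated JSON
# file to Solr's JSON update endpoint. SOLR_URL and the default core name
# "ols" are assumptions -- use whatever your Solr instance is configured with.
solr_post_cmd() {
  file="$1"
  core="${2:-ols}"
  url="${SOLR_URL:-http://localhost:8983/solr}"
  printf "curl '%s/%s/update?commit=true' -H 'Content-Type: application/json' --data-binary @%s\n" \
    "$url" "$core" "$file"
}

solr_post_cmd foundry_docs.json
```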
Each rdf2json process writes a .status.json file alongside its output JSON file. These status files can be collected and processed by the reporting service to generate a consolidated loading report and optionally send notifications.
See the reporting module README for more details on how the reporting system works.