Converts ontologies represented in OWL RDF/XML to Solr and Neo4j databases.
Start with a config JSON file that lists the ontologies you want to load. You can get the OBO Foundry config into a file called `foundry.json` like so (make sure you have `yq` installed):
```bash
curl "https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/_config.yml" \
    | yq eval -j - > foundry.json
```
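The exact fields `rdf2json` reads are not spelled out here, but after the `yq` conversion the config is a JSON object with an `ontologies` array whose entries carry fields such as `id`, `title`, and `ontology_purl` (these field names come from the OBO Foundry registry; this trimmed example is illustrative only):

```json
{
  "ontologies": [
    {
      "id": "bfo",
      "title": "Basic Formal Ontology",
      "ontology_purl": "http://purl.obolibrary.org/obo/bfo.owl"
    }
  ]
}
```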
Use `rdf2json` to download all the OWL files, resolve imports, and export JSON files:
```bash
java -jar rdf2json/target/rdf2json-1.0-SNAPSHOT.jar --config file://$(pwd)/foundry.json --output foundry_out.json
```
Now (after about 15 minutes) you should have a huge file called `foundry_out.json` that contains not only the original config for each ontology loaded from `foundry.json`, but also the ontologies themselves, represented in an intermediate JSON format. (Note: the intermediate JSON format is an application format specific to this tool; it is not standardised and is subject to change.)
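The intermediate format is undocumented and subject to change, but to give a flavour of what the file holds, per-ontology entries along these lines would be unsurprising (this sketch is entirely hypothetical, not the actual schema):

```json
{
  "ontologies": [
    {
      "id": "bfo",
      "ontology_purl": "http://purl.obolibrary.org/obo/bfo.owl",
      "classes": [
        { "iri": "http://purl.obolibrary.org/obo/BFO_0000001", "label": "entity" }
      ]
    }
  ]
}
```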
You can now convert this huge JSON file to CSV files ready for Neo4j, using `ols_json2neo`:
```bash
rm -rf output_csv && mkdir output_csv
ols_json2neo --input foundry_out.json --outDir output_csv --manifest linker_manifest.json
```
Now (after 5-10 minutes) you should have a directory full of CSV files, formatted especially for Neo4j. You can load them using `neo4j-admin database import full`, but you'd need to provide the filename of every single CSV file on the command line, which is boring, so this repo includes a script called `make_csv_import_cmd.sh` that generates the command line for you:
```bash
neo4j-admin database import full \
    --ignore-empty-strings=true \
    --legacy-style-quoting=false \
    --multiline-fields=true \
    --array-delimiter="|" \
    $(./make_csv_import_cmd.sh)
```
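If you're curious what `make_csv_import_cmd.sh` has to produce, a minimal sketch looks like this: `neo4j-admin database import full` accepts repeated `--nodes=` and `--relationships=` flags, one per CSV file. The `*_nodes.csv` / `*_edges.csv` naming below is an assumption about `ols_json2neo`'s output, not a documented contract:

```bash
#!/bin/sh
# Hypothetical sketch of make_csv_import_cmd.sh: turn the CSV files in a
# directory into --nodes/--relationships flags for neo4j-admin.
# NOTE: the *_nodes.csv / *_edges.csv filename patterns are assumptions.
build_import_args() {
  dir="$1"
  args=""
  for f in "$dir"/*_nodes.csv; do
    [ -e "$f" ] || continue
    args="$args --nodes=$f"
  done
  for f in "$dir"/*_edges.csv; do
    [ -e "$f" ] || continue
    args="$args --relationships=$f"
  done
  printf '%s\n' "$args"
}

build_import_args "${1:-output_csv}"
```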
Now you should have a Neo4j database ready to start!
Similarly to how the Neo4j CSV files were generated, you can generate JSON files ready for uploading to Solr using `ols_json2solr`:
```bash
rm -rf output_json && mkdir output_json
ols_json2solr --input foundry_out.json --outDir output_json
```
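One way to get the generated JSON into Solr is to POST each file to a core's JSON update endpoint (`/update` with `Content-Type: application/json` is standard Solr). The sketch below only builds the `curl` command rather than running it; the Solr URL, the core name `ols`, and the file name are assumptions about your deployment:

```bash
#!/bin/sh
# Hedged sketch: print the curl command that would post one generated JSON
# file to Solr's JSON update endpoint. SOLR_URL and the default core name
# "ols" are assumptions -- use whatever your Solr instance is configured with.
solr_post_cmd() {
  file="$1"
  core="${2:-ols}"
  url="${SOLR_URL:-http://localhost:8983/solr}"
  printf "curl '%s/%s/update?commit=true' -H 'Content-Type: application/json' --data-binary @%s\n" \
    "$url" "$core" "$file"
}

solr_post_cmd foundry_docs.json
```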
Each rdf2json process writes a .status.json file alongside its output JSON file. These status files can be collected and processed by the reporting service to generate a consolidated loading report and optionally send notifications.
See the reporting module README for more details on how the reporting system works.