Each of Figgy's servers has SMB mounts to a variety of places on the Isilon, all
under /mnt/hydra_sources. They are the following:
- archives
  - Contains source TIFFs for digitized finding aids.
- ingest_scratch
  - Working directory for Figgy developers to move content to in order to bulk ingest.
- maplab
- pudl
  - Contains source TIFFs for content found in PUDL. Primarily used for migrating to Figgy.
- studio
  - Old studio staging server - not currently used for ingest.
- studio_new
  - New studio staging server - the digitization studio saves images here for ingest into Figgy.
If you need to ingest content, the first step is to make sure the data is in one
of these shares. If it's new content, rather than being migrated, it should
ideally be in ingest_scratch. If you try to ingest from a location not in
these shares, background jobs will run on other servers which will be unable to
access the files. Currently content can be ingested in three
forms:
Items coming from PUDL consist of a METS file as well as a series of images. The METS files have links to the source TIFFs, and the mounts should be set up appropriately. You can see an example METS file here: https://github.com/pulibrary/figgy/blob/186e13415d94223909800e651cf62d090ec1dcfd/spec/fixtures/mets/pudl0001-4609321-s42.mets. To ingest a METS file, do the following:
ssh deploy@lib-proc6
cd /opt/figgy/current
RAILS_ENV=production FILE=/mnt/hydra_sources/ingest_scratch/x/my_file.mets bundle exec rake import:mets
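Because background jobs run on other servers that can only read files under /mnt/hydra_sources, it can help to check a path before starting an ingest. The following is a minimal sketch and not part of Figgy: the function name `in_shared_mount` is hypothetical, and it only compares the path prefix (it does not resolve symlinks or `..` components).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Figgy).
# Succeeds only when the given path starts with the shared mount root.
HYDRA_ROOT="/mnt/hydra_sources"

in_shared_mount() {
  case "$1" in
    "$HYDRA_ROOT"/?*) return 0 ;;  # something below the mount root
    *) return 1 ;;                 # anywhere else, or the root itself
  esac
}

# Example use before running rake import:mets:
#   in_shared_mount "$FILE" || echo "not visible to other servers: $FILE" >&2
```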
There is currently no rake task to import bagged materials; one will be coming
soon. Currently it's assumed that bagged materials sit in a consistent location,
which will not be in /mnt/hydra_sources, since they're used for recovery of
exported materials rather than for ingest.
A directory structure can be set up to ingest an item. The directory structure should look like this:
- 123456 #(bib-id)
  - 01.tif
  - 02.tif
For a Multi-volume Work it should look like the following:
- 123456 #(bib-id)
  - Volume 1
    - 01.tif
  - Volume 2
    - 02.tif
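The multi-volume layout above can be staged with a few lines of shell. This is only a sketch under assumptions: `stage_volume` is a hypothetical helper, not a Figgy tool, and the source file paths in the example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Figgy): copy image files into
# <base>/<bib-id>/<volume>/, creating directories as needed.
stage_volume() {
  local base="$1" bib_id="$2" volume="$3"
  shift 3
  local dest="$base/$bib_id/$volume"
  mkdir -p "$dest" && cp -- "$@" "$dest/"
}

# Example (source paths are placeholders):
#   stage_volume /mnt/hydra_sources/ingest_scratch/x 123456 "Volume 1" scans/01.tif
#   stage_volume /mnt/hydra_sources/ingest_scratch/x 123456 "Volume 2" scans/02.tif
```

Quoting the volume name matters here because names such as "Volume 1" contain a space.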
To ingest the directory, do the following:
ssh deploy@lib-proc6
cd /opt/figgy/current
RAILS_ENV=production bundle exec rake bulk:ingest DIR=/mnt/hydra_sources/ingest_scratch/x/123456 COLL=collid LOCAL_ID=local_id REPLACES=replaces
collid should be the identifier in Figgy of the collection you'd like to add
the item to. Optional.
local_id is a string representing some local identifier. Optional.
replaces is a string representing an old identifier of the item. Optional.
There is currently no way to pre-assign an ARK (the identifier field in Figgy)
to an item being bulk ingested.
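Since COLL, LOCAL_ID, and REPLACES are optional, the command can be assembled with only the values that apply. The sketch below assumes that leaving a variable unset is how an optional value is omitted. `bulk_ingest_cmd` is a hypothetical helper that only prints the command, and it assumes the values contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a bulk:ingest command, adding the optional
# variables only when a value is supplied. Assumes values contain no spaces.
bulk_ingest_cmd() {
  local dir="$1" coll="${2:-}" local_id="${3:-}" replaces="${4:-}"
  local cmd="RAILS_ENV=production bundle exec rake bulk:ingest DIR=$dir"
  if [ -n "$coll" ]; then cmd="$cmd COLL=$coll"; fi
  if [ -n "$local_id" ]; then cmd="$cmd LOCAL_ID=$local_id"; fi
  if [ -n "$replaces" ]; then cmd="$cmd REPLACES=$replaces"; fi
  printf '%s\n' "$cmd"
}

# Example: print the command for a directory and collection only.
#   bulk_ingest_cmd /mnt/hydra_sources/ingest_scratch/x/123456 collid
```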
Sometimes the simple-tiles library and the related simpler-tiles gem need to be reinstalled on Figgy machines. Here are the steps:
- SSH into the machine as pulsys:
ssh pulsys@lib-proc2
- Run the following commands:
$ sudo su deploy
$ cd /opt/figgy/current
$ bundle exec gem uninstall simpler-tiles
$ exit
$ sudo rm /usr/local/lib/libsimple-tiles.so
$ curl -L https://github.com/propublica/simple-tiles/archive/v0.6.0.tar.gz | tar -xz && cd simple-tiles-0.6.0
$ ./configure && make
$ sudo make install
$ cd .. && sudo rm -r simple-tiles-0.6.0/
- Redeploy Figgy to the server using Capistrano or Pulbot.
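After the install step, it may be worth confirming that the shared library is back in place. The sketch below assumes `make install` writes the library to the same path that the `rm` step removed it from. `check_simple_tiles` is a hypothetical helper, not part of Figgy or simple-tiles.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of Figgy): report whether the shared library
# exists at the path used in the steps above, or at a path given as $1.
check_simple_tiles() {
  local lib="${1:-/usr/local/lib/libsimple-tiles.so}"
  if [ -e "$lib" ]; then
    echo "found: $lib"
  else
    echo "missing: $lib" >&2
    return 1
  fi
}

# Example:
#   check_simple_tiles
```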
To export a large number of files related to a single object, use the figgy:export:files Rake
task instead of manually downloading each file. The Rake task will export all files (including
files attached to child objects of multi-volume works) to a directory named after the object's
Source Metadata Identifier (or Figgy ID for objects without one). For example, the Figgy object
https://figgy.princeton.edu/catalog/5a6e59c2-8b8d-4a70-bc6c-cad38e781636 has the Source Metadata
Identifier C1384_c0289 (the Source Metadata Identifier is displayed in the staff metadata
display in Figgy). To export this object, log in to one of the lib-proc machines as the deploy
user and run the Rake task:
cd /opt/figgy/current
bundle exec rake figgy:export:files ID=5a6e59c2-8b8d-4a70-bc6c-cad38e781636 FIGGY_EXPORT_BASE=/mnt/hydra_sources/ingest_scratch/export
The ID variable specifies the object to export, and the FIGGY_EXPORT_BASE variable overrides
the default export location to use Isilon-mounted storage to avoid filling up the local disk. This
exports the files to a directory named after the Source Metadata Identifier, in this case in
/mnt/hydra_sources/ingest_scratch/export/C1384_c0289, with a subdirectory for each child volume,
containing the files attached to that child.
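Before transferring an export anywhere, a quick file count per child volume can confirm that the export looks complete. This is a minimal sketch. `summarize_export` is a hypothetical helper, not part of Figgy, and it uses only `find` and `wc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of regular files under each
# immediate subdirectory of an export directory, then the overall total.
summarize_export() {
  local dir="$1" sub
  for sub in "$dir"/*/; do
    [ -d "$sub" ] || continue   # skip when there are no subdirectories
    printf '%s\t%s\n' "$(find "$sub" -type f | wc -l | tr -d '[:space:]')" "$sub"
  done
  printf '%s\ttotal\n' "$(find "$dir" -type f | wc -l | tr -d '[:space:]')"
}

# Example:
#   summarize_export /mnt/hydra_sources/ingest_scratch/export/C1384_c0289
```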
It seems possible to mount Google Drive directly on a server, but we have not configured that. So transferring a large number of files requires downloading the files and then uploading them to Google Drive. rsync is a good tool for downloading a large number of files because it can sync whole directory trees and can resume transfers. To download the object above, I used this command:
rsync -vaz lib-proc7.princeton.edu:/mnt/hydra_sources/ingest_scratch/export/C1384_c0289 .
Uploading to Google Drive works pretty well even for large volumes of files, and uploading 100GB in this case completed in a few hours with no issues.
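If a download is interrupted, rerunning the same rsync command skips files that already arrived intact. The sketch below wraps the command in a hypothetical function, `pull_export`, and adds rsync's `--partial` option so that a partially transferred file is kept for the next run. The wrapper and its argument order are illustrative assumptions, not an established tool.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the rsync command above.
# --partial keeps partially transferred files so a rerun can continue them.
pull_export() {
  local host="$1" export_dir="$2" dest="${3:-.}"
  rsync -vaz --partial "$host:$export_dir" "$dest"
}

# Example; rerun the same line after an interruption:
#   pull_export lib-proc7.princeton.edu /mnt/hydra_sources/ingest_scratch/export/C1384_c0289 .
```

Adding `--dry-run` to the rsync call lists what would be transferred without copying anything, which is a cheap way to preview a large download.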