This repository was archived by the owner on Jun 15, 2023. It is now read-only.

Home

Chris Hager edited this page Nov 2, 2015 · 5 revisions

Welcome to the pdfx wiki!

Discussions

https://news.ycombinator.com/item?id=10452048

Various

https://en.wikipedia.org/wiki/Digital_object_identifier

(Possibly) Useful Tools / Libraries

http://blog.matt-swain.com/post/25650072381/a-lightweight-xmp-parser-for-extracting-pdf
https://github.com/ckreibich/scholar.py (A parser for Google Scholar, written in Python)
https://code.google.com/p/pdfmeat/ (PDF MEtadata Acquisition Tool (aka pdftobibtex/pdf2bibtex))
https://code.google.com/p/pdfssa4met/ (PDF Structure and Syntactic Analysis for Metadata Extraction and Tagging)
https://github.com/CrossRef/pdfextract (A tool and library that can extract various areas of text from a PDF, especially a scholarly article PDF) [ruby]
https://github.com/ContentMine/quickscrape [nodejs]