mhkeller/pdf-diff

 
 

pdf-diff

Finds differences between two PDF documents:

  1. Compares the text layers of two PDF documents and outputs the bounding boxes of changed text in JSON.
  2. Rasterizes the changed pages in the PDFs to a PNG and draws red outlines around changed text.
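As a rough illustration of step 1, here is a minimal sketch in Python using only the standard library. It assumes each PDF's text layer has already been extracted into a list of words with bounding boxes. The `Word` record, the `changed_boxes` function, and the output field names are hypothetical and are not necessarily the JSON format that index.js emits.

```python
import json
from difflib import SequenceMatcher
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical record: one word from a PDF text layer."""
    text: str
    page: int
    x: float
    y: float
    width: float
    height: float


def changed_boxes(before, after):
    """Return the bounding boxes of words that differ between two word lists."""
    matcher = SequenceMatcher(
        a=[w.text for w in before],
        b=[w.text for w in after],
        autojunk=False,
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Deleted or replaced words are reported against the "before" document.
        for w in before[i1:i2]:
            changes.append({"side": "before", **w._asdict()})
        # Inserted or replaced words are reported against the "after" document.
        for w in after[j1:j2]:
            changes.append({"side": "after", **w._asdict()})
    return changes


# Made-up sample data, for illustration only.
before = [Word("Hello", 1, 10, 10, 30, 12), Word("world", 1, 45, 10, 30, 12)]
after = [Word("Hello", 1, 10, 10, 30, 12), Word("there", 1, 45, 10, 28, 12)]
print(json.dumps(changed_boxes(before, after), indent=2))
```

Diffing on word text and carrying the boxes along keeps the comparison independent of layout, so a word that only moved on the page would not be flagged in this sketch.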

Example Image Output

I started this project in node.js, but I couldn't figure out how to do the rendering part quickly there, so I switched to Python for that step, where I already had some similar code lying around.

Installation

# for the comparison tool

npm install

git clone https://github.com/mozilla/pdf.js
cd pdf.js
node make singlefile
cd ..

# for the renderer

sudo pip3 install pillow

Running

Compute the changes (writes a JSON file):

node index.js before.pdf after.pdf | grep -v "^Warning:" > changes.json

(Unfortunately the pdf.js library prints warnings on STDOUT, so we have to filter those out.)
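Where grep is not available, the same filtering could be done in Python. This is only a sketch: `parse_changes` is a hypothetical helper, the sample string is made up, and the real output of index.js may be structured differently.

```python
import json


def parse_changes(raw_output):
    """Drop lines starting with "Warning:" and parse the remainder as JSON."""
    kept = [
        line for line in raw_output.splitlines()
        if not line.startswith("Warning:")
    ]
    return json.loads("\n".join(kept))


# Made-up sample output; the real JSON structure may differ.
sample = 'Warning: something\n[{"page": 1}]\nWarning: another\n'
print(parse_changes(sample))
```

In practice the input string could come from `subprocess.run(["node", "index.js", "before.pdf", "after.pdf"], capture_output=True, text=True).stdout`.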

Render the changes (turns the PDFs + JSON file into a big PNG image):

python3 render.py < changes.json > test.png

