Finds differences between two PDF documents:
- Compares the text layers of two PDF documents and outputs the bounding boxes of changed text as JSON.
- Rasterizes the changed pages in the PDFs to a PNG and draws red outlines around changed text.
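The text-layer comparison boils down to diffing two sequences of words and reporting the boxes of the words that don't match. Below is a minimal sketch using Python's `difflib`. It assumes each word has already been extracted along with a page number and a bounding box. The `Word` record and the `changed_boxes` function are hypothetical. `index.js` uses pdf.js and may diff the text differently.

```python
import difflib
from typing import NamedTuple


class Word(NamedTuple):
    # Hypothetical record for one word of a PDF text layer:
    # the word, its page index, and a bounding box (x, y, width, height).
    text: str
    page: int
    box: tuple


def changed_boxes(before, after):
    """Return (side, page, box) for every word that differs between the two documents."""
    matcher = difflib.SequenceMatcher(
        a=[w.text for w in before], b=[w.text for w in after], autojunk=False
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Deleted or replaced words are outlined in the "before" document;
        # inserted or replacement words are outlined in the "after" document.
        for w in before[i1:i2]:
            changes.append(("before", w.page, w.box))
        for w in after[j1:j2]:
            changes.append(("after", w.page, w.box))
    return changes
```

Only the text is compared. A word that merely moved on the page (same text, new box) is not reported as a change.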
Unfortunately, although I started this project in node.js, I couldn't figure out how to do the rendering part quickly in node.js, so I switched to Python, where I already had some similar code lying around.
```bash
# for the comparison tool
npm install
git clone https://github.com/mozilla/pdf.js
cd pdf.js
node make singlefile
cd ..

# for the renderer
sudo pip3 install pillow
```
```bash
# Compute the changes.
# Unfortunately, pdf.js prints warnings on STDOUT, so we have
# to filter those out.
node index.js before.pdf after.pdf | grep -v "^Warning:" > changes.json
```
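If you'd rather not depend on `grep`, you can do the same filtering in Python before parsing the JSON. This is a small sketch, and `load_changes` is a hypothetical helper, not part of `render.py`. Like the `grep -v "^Warning:"` above, it assumes every pdf.js warning occupies its own line beginning with `Warning:`.

```python
import json


def load_changes(lines):
    """Parse the comparison output, skipping pdf.js warning lines mixed into STDOUT."""
    # Same rule as `grep -v "^Warning:"`: drop any line that begins with "Warning:".
    kept = [line for line in lines if not line.startswith("Warning:")]
    return json.loads("".join(kept))
```

For example, a script could call `load_changes(sys.stdin)` on the raw output of `node index.js before.pdf after.pdf` piped straight into it.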
```bash
# Render the changes.
python3 render.py < changes.json > test.png
```
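The rendering step has to map each changed box onto the rasterized page and outline it in red. Here is a stdlib-only sketch of that mapping. It assumes boxes arrive as `(x, y, width, height)` in PDF points, with the origin at the bottom-left of the page. The real `changes.json` layout may differ. Both function names are hypothetical, and the pixel grid is a plain list of rows of RGB tuples rather than a real image.

```python
def pdf_box_to_pixels(box, page_height_pt, dpi=72):
    """Convert (x, y, width, height) in PDF points (origin bottom-left)
    to inclusive (left, top, right, bottom) pixel coordinates (origin top-left)."""
    scale = dpi / 72.0  # one PDF point is 1/72 inch
    x, y, w, h = box
    left = round(x * scale)
    right = round((x + w) * scale)
    # Flip the y axis: PDF y grows upward, image rows grow downward.
    top = round((page_height_pt - (y + h)) * scale)
    bottom = round((page_height_pt - y) * scale)
    return left, top, right, bottom


def draw_outline(pixels, rect, color=(255, 0, 0)):
    """Draw a 1-pixel outline into pixels[row][col], clipped to the image bounds."""
    left, top, right, bottom = rect
    height, width = len(pixels), len(pixels[0])
    # Top and bottom edges.
    for col in range(max(left, 0), min(right, width - 1) + 1):
        for row in (top, bottom):
            if 0 <= row < height:
                pixels[row][col] = color
    # Left and right edges.
    for row in range(max(top, 0), min(bottom, height - 1) + 1):
        for col in (left, right):
            if 0 <= col < width:
                pixels[row][col] = color
```

With Pillow installed, the drawing half would typically be a single call, `ImageDraw.Draw(image).rectangle(rect, outline="red")`, applied to the rasterized page image.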