The DocTamper dataset is now available at BaiduDrive and Google Drive (part1 and part2).
The DocTamper dataset is only available for non-commercial use. You can request the password by sending an email from an educational email address to [email protected], explaining your intended purpose.
To visualize the images and their corresponding ground truths from the provided .mdb files, run a command such as "python vizlmdb.py --input DocTamperV1-FCD --i 0".
The official implementation of the paper "Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution" is in the "models" directory.
The release of the training code is delayed at the request of my supervisor and the cooperating enterprise that purchased it. My training pipeline for the DocTamper dataset and the IoU metric are heavily borrowed from a well-known project in this area; the results of the paper can be easily reproduced with it, you just need to adjust the loss functions and the learning-rate decay curve. I also used its augmentation pipeline, except for RandomBrightnessContrast, ShiftScaleRotate, and CoarseDropout.
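For reference, a pixel-level IoU over binary tamper masks can be computed as below. This is a generic sketch of the common definition, not necessarily the exact metric implementation borrowed for the paper:

```python
import numpy as np


def pixel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between a predicted and a ground-truth binary mask (1 = tampered pixel).

    Returns 1.0 when both masks are empty (a common convention; the
    project's actual metric code may handle this case differently).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union else 1.0
```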
Open-source schedule:
1. Inference models and code: June 2023.
2. Training code: TBD.
3. Data synthesis code: within 2024.
For any questions about this work, please contact [email protected].
Dear researcher,
Thank you for your attention. The password for the dataset is IntSig_DLVC_411
After extracting the .rar archives, the resulting .mdb files should be opened with Python scripts:
To visualize the images and their corresponding ground truths from the provided .mdb files, you can run a command like "python vizlmdb.py --input DocTamperV1-FCD --i 0". The vizlmdb.py script is at https://github.com/qcf-568/DocTamper/blob/main/vizlmdb.py
A dataloader for training or inference can be referred to at Line 43~Line 107 of https://github.com/qcf-568/DocTamper/blob/main/models/eval_dtd.py
To run eval_dtd.py, you can refer to this Colab notebook: https://colab.research.google.com/drive/1rWaSKy2Rsy5welyvj6FbzF01o2zv8ips?usp=sharing
Kind Regards,
Chenfan Qu