This repository is a de-bloated fork of the original Indic NLP Library that integrates the UrduHack submodule and Indic NLP Resources directly. This makes it possible to use Urdu normalization and tokenization without installing urduhack and indic_nlp_resources separately, which can sometimes cause problems because urduhack is TensorFlow-based. This repository is mainly created and maintained for IndicTrans2 and IndicTransTokenizer.
For any queries, please get in touch with the original authors/maintainers of the respective libraries:
- Indic NLP Library: anoopkunchukuttan
- Indic NLP Resources: anoopkunchukuttan
- UrduHack: UrduHack
To install, clone the repository and install it in editable mode:
git clone https://github.com/VarunGumma/indic_nlp_library.git
cd indic_nlp_library
pip install --editable ./
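After the editable install, a quick sanity check is to confirm that the package can be found on the Python path. The snippet below is a minimal sketch that assumes the top-level package is named `indicnlp`, as in the upstream Indic NLP Library; the helper `is_installed` is hypothetical and not part of this repository.

```python
import importlib.util


def is_installed(package: str = "indicnlp") -> bool:
    """Return True if `package` (assumed to be `indicnlp`) is importable."""
    # find_spec returns None when no top-level module/package of that name exists.
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("indicnlp importable:", is_installed())
```

Using `find_spec` checks availability without actually executing the package's import-time code.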
Changes relative to the original Indic NLP Library:
- Integrated urduhack directly into the repository.
- Renamed the master branch to main.
- Integrated indic_nlp_resources directly into the repository.
- De-bloated the repository.