Translating HuggingFace Daily Papers with InternLM
This project automatically downloads and processes HuggingFace daily paper data and translates it into multiple languages using the InternLM large language model. The project runs automatically every day to ensure timely retrieval and translation of the latest papers.
- Translation Model: InternLM-3
- Developer: Shanghai AI Laboratory
- Version: internlm3-latest
- Features:
  - Powerful multilingual translation capabilities
  - Accurate understanding and translation of academic texts
  - Real-time translation via API
- Automatic download of HuggingFace daily paper data
- Support for downloading historical data from specific dates
- Beijing time (UTC+8) as the default timezone
- Complete activity logging
- JSON format paper metadata storage
- Translation of English papers to multiple languages using InternLM-3:
  - Japanese
  - Korean
  - Spanish
  - French
- Automated workflow:
  - Daily automatic download of latest papers
  - Automatic multilingual translation
  - Automatic repository updates
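
The date handling described above (Beijing time by default, an optional specific date) can be sketched as follows. This is a minimal illustration, not the project's actual code: `resolve_target_date` and `parse_args` are hypothetical names, and only the `--date` flag is taken from the usage shown in this README.

```python
import argparse
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 all year (no daylight saving time).
BEIJING_TZ = timezone(timedelta(hours=8))


def resolve_target_date(date_arg=None, now=None):
    """Return the target date as a YYYY-MM-DD string.

    date_arg: value of --date, or None to use today's date in Beijing time.
    now: optional timezone-aware datetime, injectable for testing.
    """
    if date_arg is not None:
        # Round-trip through strptime to reject malformed dates (raises ValueError).
        return datetime.strptime(date_arg, "%Y-%m-%d").strftime("%Y-%m-%d")
    current = now if now is not None else datetime.now(timezone.utc)
    return current.astimezone(BEIJING_TZ).strftime("%Y-%m-%d")


def parse_args(argv=None):
    """Parse the --date flag (hypothetical parser; only --date comes from this README)."""
    parser = argparse.ArgumentParser(description="Download HuggingFace daily papers")
    parser.add_argument("--date", default=None, help="Date in YYYY-MM-DD format")
    return parser.parse_args(argv)
```

Resolving the date in UTC+8 matters near midnight: a run at 20:00 UTC on one day already belongs to the next calendar day in Beijing.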
- Clone the repository:
git clone https://github.com/yourusername/hf-daily-paper-newsletter-multilingual.git
cd hf-daily-paper-newsletter-multilingual
- Install dependencies:
pip install -r requirements.txt
- Download the latest papers:
python download_papers.py
- Download papers for a specific date:
python download_papers.py --date 2024-03-20
- Translate papers. First obtain an InternLM API key, then run:
python translate_papers.py --date 2024-03-20 --api_key your_api_key_here

The project is configured with two GitHub Actions workflows:
- daily-paper-download.yml: Automatically downloads the latest papers at 9:00 AM Beijing time
- daily-paper-translate.yml: Automatically translates the papers after the download completes

To enable automatic translation, set INTERNLM_API_KEY in the repository's Secrets.
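
One way a script could accept the key both from the command line and from the workflow secret is sketched below. This is an assumption for illustration: the README only confirms the `--api_key` flag and the `INTERNLM_API_KEY` secret name, and whether `translate_papers.py` itself falls back to an environment variable is not stated. `resolve_api_key` is a hypothetical helper.

```python
import argparse
import os


def resolve_api_key(argv=None, env=None):
    """Return the API key from --api_key, falling back to INTERNLM_API_KEY.

    argv: argument list (defaults to sys.argv[1:] when None).
    env: mapping of environment variables, injectable for testing.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Translate daily papers")
    parser.add_argument("--date", default=None, help="Date in YYYY-MM-DD format")
    parser.add_argument("--api_key", default=None, help="InternLM API key")
    args = parser.parse_args(argv)

    # Assumed precedence: an explicit flag wins over the environment variable.
    key = args.api_key or env.get("INTERNLM_API_KEY")
    if not key:
        # SystemExit with a message prints it to stderr and exits with code 1.
        raise SystemExit("No API key: pass --api_key or set INTERNLM_API_KEY")
    return key
```

Reading the key from the environment keeps the secret out of command lines recorded in workflow logs.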
- Original English paper data is stored in the Paper_metadata_download directory
- Translated papers are stored in the Translated_papers directory, organized by language code:
  - ja/: Japanese translations
  - ko/: Korean translations
  - es/: Spanish translations
  - fr/: French translations
- All files are saved in JSON format with names in YYYY-MM-DD.json format
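
The layout above maps directly to a pair of path helpers. The sketch below is illustrative only: the function names are hypothetical, and `translate` stands in for whatever call the project makes to the InternLM API, which is not shown in this README.

```python
import json
from pathlib import Path

# Language codes listed in this README.
LANGUAGES = ("ja", "ko", "es", "fr")


def source_path(root, date):
    """Path of the original English metadata for a YYYY-MM-DD date."""
    return Path(root) / "Paper_metadata_download" / f"{date}.json"


def translated_path(root, lang, date):
    """Path of the translated file for one language code and date."""
    return Path(root) / "Translated_papers" / lang / f"{date}.json"


def translate_day(root, date, translate):
    """Translate one day's papers into every language and write the results.

    translate(papers, lang) is a hypothetical callable wrapping the
    translation API; it must return JSON-serializable data.
    """
    papers = json.loads(source_path(root, date).read_text(encoding="utf-8"))
    written = []
    for lang in LANGUAGES:
        out = translated_path(root, lang, date)
        out.parent.mkdir(parents=True, exist_ok=True)
        # ensure_ascii=False keeps Japanese and Korean text readable in the file.
        text = json.dumps(translate(papers, lang), ensure_ascii=False, indent=2)
        out.write_text(text, encoding="utf-8")
        written.append(out)
    return written
```

Passing the translator in as a parameter keeps the file handling testable without an API key.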
- Success: exit code 0
- Error: exit code 1
- No data: exit code 0 (with warning in log)
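
The exit-code convention above can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: `run` and `fetch` are hypothetical names, and the logger name is invented for the example.

```python
import logging

logger = logging.getLogger("daily_papers")  # hypothetical logger name


def run(fetch):
    """Run one download step and return the process exit code.

    fetch is a hypothetical callable returning a list of papers.
    Returns 0 on success, 0 with a logged warning when there is no data,
    and 1 when fetch raises an exception.
    """
    try:
        papers = fetch()
    except Exception:
        # Log the traceback, then report failure through the exit code.
        logger.exception("Download failed")
        return 1
    if not papers:
        logger.warning("No papers found; nothing to save")
        return 0
    logger.info("Fetched %d papers", len(papers))
    return 0
```

A script would typically end with `sys.exit(run(fetch))`. Returning 0 for the no-data case keeps a scheduled workflow from being marked as failed on days when no papers are published.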
- InternLM - For providing powerful translation capabilities
- HuggingFace - For providing daily paper data