Implementation of "TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning".
This repo fine-tunes an LLM to be more truth-seeking using a GRPO-style (Group Relative Policy Optimization) trainer.
📄 Paper: https://arxiv.org/pdf/2509.25760
pip install packaging
pip install ninja
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install vllm==0.8.5
pip install onnxruntime-gpu
pip install datasets
pip install transformers
pip install python-dotenv
pip install uuid
pip install openai
pip install math-verify
pip install jsonlines
pip install tqdm
pip install pandas
pip install wandb
cd src
pip install -e . --user
pip install -U deepspeed
cd ..Once dependencies are installed, launch training with:
bash ./src/scripts/grpo.sh- More detailed training dynamics will be provided soon
- Momprehensive evaluation results will also be provided