Stars
Modern RL Post-training Infrastructure: Optimized for NVIDIA/AMD GPUs with a focus on vLLM and DeepSpeed integration, CUDA/ROCm/Triton kernels, and transparent hardware-aware scaling.
An AI skill pack for value investing, capital allocation, and behavioral discipline, distilled from Warren Buffett's 60+ years of shareholder letters.
Jobs scraper library for LinkedIn, Indeed, Glassdoor, Google, ZipRecruiter & more
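Berkshire for the AI era: a value-investing research framework built on Claude Code. Combines the methodologies of four masters (Buffett, Munger, Duan Yongping, and Li Lu) with parallel multi-agent research.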
A curated collection of papers and resources on On-Policy Distillation for Large Language Models.
The next employee you distill doesn't have to be a coworker. Distill how anyone thinks: their mental models, decision heuristics, and expressive DNA.
RLAnything (ICML 2026) & AutoTool (ICML 2026), DemyAgent: Open-Source RL for LLMs and Agentic Scenarios
OpenClaw-RL: Train any agent simply by talking
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.
AI agents running research on single-GPU nanochat training automatically
A Foundation Model for Generalist Gaming Agents
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based reasoning MLLMs here!
Awesome Deep Learning papers for industrial Search, Recommendation and Advertisement. They focus on Embedding, Matching, Pre-Ranking, Ranking, Post Ranking, Relevance, LLM and RL. Please cite our p…
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
slime is an LLM post-training framework for RL Scaling.
Search-R1: An efficient, scalable RL training framework, based on veRL, for LLMs that interleave reasoning with search engine calls
A curated list of reinforcement learning (RL) for agents.
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"
A curated guide to Generative Engine Optimization (GEO) resources: guides, tools & research to boost visibility in AI-powered search engines.
Awesome list for research on GEO (Generative Engine Optimization)
A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)
Tracks recent arXiv research at the intersection of LLMs and RL for control. PRs adding good papers are welcome.
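微舆: a multi-agent public-opinion analysis assistant that anyone can use. It breaks out of information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.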
