Stars
General plug-and-play inference library for Recursive Language Models (RLMs), supporting various sandboxes.
Ring attention implementation with flash attention
Inspect: A framework for large language model evaluations
Thermodynamic Hypergraphical Model Library in JAX
Super basic implementation (gist-like) of RLMs with REPL environments.
Mocked Single-Page-Applications for Evals & RL
Post-training with Tinker
Automated LLM evaluation suite for medical tasks
Lightly-reviewed collection of community environments
SkyRL: A Modular Full-stack RL Library for LLMs
⚔️ OpenHands PR Arena ⚔️ is a platform for evaluating and benchmarking agentic coding assistants through paired pull request (PR) generations.
A benchmark for LLMs on complicated tasks in the terminal
Code for fine-tuning LLMs with GRPO for Rust programming, using cargo as feedback
main-horse / hnet-old
Forked from goombalab/hnet
H-Net Dynamic Hierarchical Architecture
Background coding agent and real-time web interface
Real-time terminal monitor for InfiniBand networks - htop for high-speed interconnects
Estimate the throughput of OAI compatible servers
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
A collection of formalized statements of conjectures in Lean.






