Stars
Implementation code of Reasoning Test-time Compute with CVAE
Code for GFlowNet-DPO (Direct Preference Optimization), EMNLP 2024 Main
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
This is the code for Optimistic Multi-Agent Policy Gradient.
Lecture slides for the MARL book (www.marl-book.com)
Official code repo for the MARL book (www.marl-book.com)
Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022
Running a MAPPO model in Tiny-Hanabi using the on-policy environment
PyTorch implementation of Soft Actor-Critic (SAC)
PyTorch implementations of deep reinforcement learning algorithms and environments
Running a SPARTA model in Tiny-Hanabi using the OpenSpiel environment
This is the official implementation of Multi-Agent PPO (MAPPO).
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
[ALA 2022] Analyzing the Deep RL algorithms SPG, VPG, PPO on the cooperative card game Hanabi.
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
BenchMARL is a library for benchmarking Multi-Agent Reinforcement Learning (MARL). BenchMARL allows quickly comparing different MARL algorithms, tasks, and models while being systematically ground…
Official PyTorch implementation of AlberDICE
A collection of environments and reference agents for planning and reinforcement learning research in partially observable, multi-agent environments.
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKT…
Repo for the Deep Reinforcement Learning Nanodegree program

