Stars
PyTorch implementations of deep reinforcement learning algorithms and environments
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKT…
Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
This is the official implementation of Multi-Agent PPO (MAPPO).
Environment generation code for the paper "Emergent Tool Use From Multi-Agent Autocurricula"
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
Scalable Multi-Agent RL Training School for Autonomous Driving
Really Fast End-to-End JAX RL Implementations
A suite of test scenarios for multi-agent reinforcement learning.
An extension of the PyMARL codebase that includes additional algorithms and environment support
Official code repo for the MARL book (www.marl-book.com)
BenchMARL is a library for benchmarking Multi-Agent Reinforcement Learning (MARL). BenchMARL allows users to quickly compare different MARL algorithms, tasks, and models while being systematically ground…
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the pur…
Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022
PyTorch solutions for UC Berkeley's CS285 assignments
Code accompanying the paper "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning" (NeurIPS 2021 Spotlight, https://arxiv.org/abs/2106.03400)
Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning
Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method
ICLR 2021: "Monte-Carlo Planning and Learning with Language Action Value Estimates"
A collection of environments and reference agents for planning and reinforcement learning research in partially observable, multi-agent environments.
Official PyTorch implementation of AlberDICE
Code for GFlowNet-DPO (Direct Preference Optimization) EMNLP 2024 Main

