dematsunaga
Python 1 Updated Mar 24, 2026
Python 1 Updated Jan 21, 2026

Implementation code of Reasoning Test-time Compute with CVAE

Python 12 7 Updated Dec 17, 2025
Python 13 9 Updated Dec 17, 2025
Python 14 10 Updated Dec 17, 2025
Python 12 10 Updated Dec 17, 2025
Python 15 13 Updated Dec 10, 2025

Code for GFlowNet-DPO (Direct Preference Optimization) EMNLP 2024 Main

Python 19 8 Updated Feb 22, 2026

Multi-Agent Reinforcement Learning in LLMs

Python 1 Updated Oct 24, 2025

A Framework for LLM-based Multi-Agent Reinforced Training and Inference

Python 468 48 Updated Feb 19, 2026

This is the code for Optimistic Multi-Agent Policy Gradient.

Python 12 1 Updated Sep 5, 2024

Lecture slides for the MARL book (www.marl-book.com)

TeX 162 35 Updated May 14, 2025

Official code repo for the MARL book (www.marl-book.com)

Python 621 104 Updated Mar 30, 2025

Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022

Python 344 48 Updated Aug 22, 2024

Running the MAPPO model in Tiny-Hanabi using an on-policy environment

Python 1 Updated Jan 20, 2025

PyTorch implementation of Soft Actor-Critic (SAC)

Jupyter Notebook 594 110 Updated Dec 5, 2021
Python 14 3 Updated Mar 5, 2023

PyTorch implementations of deep reinforcement learning algorithms and environments

Python 5,929 1,211 Updated Jul 25, 2024

Running the SPARTA model in Tiny-Hanabi using the OpenSpiel environment

Python 1 Updated Nov 23, 2024

This is the official implementation of Multi-Agent PPO (MAPPO).

Python 1,941 374 Updated Jul 18, 2024

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

C++ 5,104 1,108 Updated Mar 26, 2026

[ALA 2022] Analyzing the Deep RL algorithms SPG, VPG, PPO on the cooperative card game Hanabi.

Python 5 3 Updated Mar 17, 2022
Python 6 1 Updated Nov 1, 2022

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)

Python 1,556 188 Updated Mar 28, 2026

Monte Carlo tree search in JAX

Python 2,603 207 Updated Sep 2, 2025

BenchMARL is a library for benchmarking Multi-Agent Reinforcement Learning (MARL). BenchMARL lets you quickly compare different MARL algorithms, tasks, and models while being systematically ground…

Python 593 120 Updated Feb 7, 2026

Official PyTorch implementation of AlberDICE

Python 23 14 Updated Dec 8, 2023

A collection of environments and reference agents for planning and reinforcement learning research in partially observable, multi-agent environments.

Python 30 7 Updated Jun 2, 2025

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKT…

Python 3,887 842 Updated May 29, 2022

Repo for the Deep Reinforcement Learning Nanodegree program

Jupyter Notebook 5,153 2,371 Updated Nov 16, 2023