πŸ“„ Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Minju Seo, Jinheon Baek†, Seongyun Lee, and Sung Ju Hwang† († denotes equal advising)
International Conference on Learning Representations (ICLR), 2026
πŸ“„ Read the paper

PaperCoder Overview

PaperCoder is the multi-agent LLM system introduced in Paper2Code, designed to transform a paper into a code repository. It follows a three-stage pipeline: planning, analysis, and code generation, each handled by specialized agents. Our method outperforms strong baselines on both Paper2Code and PaperBench and produces faithful, high-quality implementations.
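The three-stage pipeline described above can be sketched as a simple chain of agent calls. This is an illustrative outline only, assuming one function per stage; the function names and return shapes here are hypothetical and are not part of the actual PaperCoder codebase.

```python
# Hypothetical sketch of the planning -> analysis -> coding pipeline.
# Stage names follow the paper; all identifiers below are illustrative.

def plan(paper_text: str) -> dict:
    """Planning agent: draft a repository blueprint from the paper."""
    return {"files": ["main.py", "model.py"], "config": {"lr": 1e-4}}

def analyze(paper_text: str, plan_out: dict) -> dict:
    """Analysis agent: derive file-level implementation details."""
    return {f: f"implementation notes for {f}" for f in plan_out["files"]}

def generate(plan_out: dict, analysis_out: dict) -> dict:
    """Coding agent: emit code for each planned file."""
    return {f: f"# code for {f}\n" for f in analysis_out}

def paper2code(paper_text: str) -> dict:
    plan_out = plan(paper_text)
    analysis_out = analyze(paper_text, plan_out)
    return generate(plan_out, analysis_out)
```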


πŸ—ΊοΈ Table of Contents


⚑ Quick Start

Using OpenAI API

  • πŸ’΅ Estimated cost for using o3-mini: $0.50–$0.70
```bash
pip install openai

export OPENAI_API_KEY="<OPENAI_API_KEY>"

cd scripts
bash run.sh
```

Using Open Source Models with vLLM

  • If you encounter any issues installing vLLM, please refer to the official vLLM repository.
  • The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.
```bash
pip install vllm

cd scripts
bash run_llm.sh
```

Output Folder Structure (Only Important Files)

```text
outputs
β”œβ”€β”€ Transformer
β”‚   β”œβ”€β”€ analyzing_artifacts
β”‚   β”œβ”€β”€ coding_artifacts
β”‚   └── planning_artifacts
└── Transformer_repo   # Final output repository
```

πŸ“š Detailed Setup Instructions

πŸ› οΈ Environment Setup

  • πŸ’‘ To use the o3-mini version, make sure you have the latest openai package installed.
  • We recommend using a Python virtual environment before installing dependencies.
  • πŸ“¦ Install only what you need:
    • For OpenAI API, install openai.
    • For open-source models, install vllm.
    • If you encounter any issues installing vLLM, please refer to the official vLLM repository.
```bash
pip install openai
pip install vllm
```
  • Or install all dependencies at once:
```bash
pip install -r requirements.txt
```

πŸ“„ (Optional) Convert PDF to JSON

The following process describes how to convert a paper PDF into JSON format.
If you have access to the LaTeX source and plan to use it with PaperCoder, you may skip this step and proceed to πŸš€ Running PaperCoder.
Note: In our experiments, we converted all paper PDFs to JSON format.

  1. Clone the s2orc-doc2json repository to convert your PDF file into a structured JSON format.
    (For detailed configuration, please refer to the official repository.)
```bash
git clone https://github.com/allenai/s2orc-doc2json.git
```
  2. Run the PDF processing service.
```bash
cd ./s2orc-doc2json/grobid-0.7.3
./gradlew run
```
  3. Convert your PDF into JSON format.
```bash
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py \
    -i ${PDF_PATH} \
    -t ./s2orc-doc2json/temp_dir/ \
    -o ./s2orc-doc2json/output_dir/paper_coder
```
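After conversion, a quick sanity check that the JSON loads and looks like a parsed paper can save a failed pipeline run later. This is a minimal sketch: the exact key names in the output depend on the s2orc-doc2json version, so the snippet only inspects top-level keys rather than assuming a schema.

```python
# Minimal sanity check on a converted paper JSON (schema-agnostic sketch).
import json
from pathlib import Path

def load_paper_json(path: str) -> dict:
    """Load the s2orc-doc2json output and list its top-level keys."""
    paper = json.loads(Path(path).read_text())
    print(sorted(paper.keys()))  # e.g. title, abstract, body sections
    return paper
```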

πŸš€ Running PaperCoder

  • Note: The following command runs the example paper (Attention Is All You Need).
    If you want to run PaperCoder on your own paper, modify the environment variables accordingly.

Using OpenAI API

  • πŸ’΅ Estimated cost for using o3-mini: $0.50–$0.70
```bash
# Using the PDF-based JSON format of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"

cd scripts
bash run.sh
```
```bash
# Using the LaTeX source of the paper
export OPENAI_API_KEY="<OPENAI_API_KEY>"

cd scripts
bash run_latex.sh
```

Using Open Source Models with vLLM

  • The default model is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.
```bash
# Using the PDF-based JSON format of the paper
cd scripts
bash run_llm.sh
```
```bash
# Using the LaTeX source of the paper
cd scripts
bash run_latex_llm.sh
```

πŸ“¦ Paper2Code Benchmark Datasets

  • Huggingface dataset: paper2code

  • You can find the description of the Paper2Code benchmark dataset in data/paper2code.

  • For more details, refer to Section 4.1 "Paper2Code Benchmark" in the paper.


πŸ“Š Model-based Evaluation of Repositories Generated by PaperCoder

  • We evaluate repository quality using a model-based approach, supporting both reference-based and reference-free settings.
    The model critiques key implementation components, assigns severity levels, and generates a 1–5 correctness score averaged over 8 samples using o3-mini-high.

  • For more details, please refer to Section 4.3.1 (Paper2Code Benchmark) of the paper.

  • Note: The following examples evaluate the sample repository (Transformer_repo).
    Please modify the relevant paths and arguments if you wish to evaluate a different repository.
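The aggregation step described above (a 1–5 correctness score averaged over 8 judge samples, reported alongside a valid-sample count) can be sketched as follows. This is a hedged illustration of the scoring scheme, not the actual `eval.py` API; the function and field names are hypothetical.

```python
# Illustrative aggregation of judge samples into the reported score.
# Each sample is assumed to carry a 1-5 score and a validity flag
# (invalid = unparsable judge output); names are hypothetical.

def aggregate_scores(samples: list[dict]) -> tuple[float, int]:
    valid = [s["score"] for s in samples
             if s.get("valid") and 1 <= s["score"] <= 5]
    avg = sum(valid) / len(valid) if valid else 0.0
    return avg, len(valid)
```

With 8 samples, the evaluation summary's "Valid: 8/8" line corresponds to the second return value.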

πŸ› οΈ Environment Setup

```bash
pip install tiktoken
export OPENAI_API_KEY="<OPENAI_API_KEY>"
```

πŸ“ Reference-free Evaluation

  • target_repo_dir is the generated repository.
```bash
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --eval_result_dir ../results \
    --eval_type ref_free \
    --generated_n 8 \
    --papercoder
```

πŸ“ Reference-based Evaluation

  • target_repo_dir is the generated repository.
  • gold_repo_dir should point to the official repository (e.g., author-released code).
```bash
cd codes/
python eval.py \
    --paper_name Transformer \
    --pdf_json_path ../examples/Transformer_cleaned.json \
    --data_dir ../data \
    --output_dir ../outputs/Transformer \
    --target_repo_dir ../outputs/Transformer_repo \
    --gold_repo_dir ../examples/Transformer_gold_repo \
    --eval_result_dir ../results \
    --eval_type ref_based \
    --generated_n 8 \
    --papercoder
```

πŸ“„ Example Output

```text
========================================
🌟 Evaluation Summary 🌟
πŸ“„ Paper name: Transformer
πŸ§ͺ Evaluation type: ref_based
πŸ“ Target repo directory: ../outputs/Transformer_repo
πŸ“Š Evaluation result:
        πŸ“ˆ Score: 4.5000
        βœ… Valid: 8/8
========================================
🌟 Usage Summary 🌟
[Evaluation] Transformer - ref_based
πŸ› οΈ Model: o3-mini
πŸ“₯ Input tokens: 44318 (Cost: $0.04874980)
πŸ“¦ Cached input tokens: 0 (Cost: $0.00000000)
πŸ“€ Output tokens: 26310 (Cost: $0.11576400)
πŸ’΅ Current total cost: $0.16451380
πŸͺ™ Accumulated total cost so far: $0.16451380
========================================
```
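The cost lines in the usage summary follow directly from the token counts and per-token rates. The sketch below reproduces that arithmetic using the per-million-token rates implied by the summary's own numbers ($1.10/M input, $4.40/M output); the actual rates charged depend on your OpenAI pricing tier.

```python
# Cost arithmetic behind the usage summary, using rates implied by
# the summary itself (assumed, not read from the repository).
INPUT_RATE = 1.10 / 1_000_000   # USD per input token
OUTPUT_RATE = 4.40 / 1_000_000  # USD per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one evaluation run (no cached-input discount)."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
```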
