
Paper2Code: Automatically Converting Machine Learning Papers into Runnable Code

General Introduction

Paper2Code is an open source project that addresses the lack of code implementations for many machine learning papers. Its multi-agent Large Language Model (LLM) system, PaperCoder, automatically transforms scientific papers into runnable code repositories. The system follows a three-phase process of planning, analysis and code generation, with each phase handled by a specialized agent, to produce code implementations that are faithful to the paper. Using the well-known "Attention Is All You Need" paper as an example, the project demonstrates how a paper can be turned into Transformer model code. It accepts papers in PDF and LaTeX formats and is aimed at machine learning researchers, developers and students. Paper2Code performs well on the PaperBench benchmark, and its code is publicly available on GitHub, making it easy to install and use.

Feature List

  • Automatically converts machine learning papers into executable code repositories.
  • Accepts papers in PDF or LaTeX format; PDF papers are first converted into structured JSON data.
  • Provides a three-phase processing flow: planning, analysis and code generation.
  • Generates a complete code repository, including the system architecture, dependencies and configuration files.
  • Supports reference-based and reference-free code quality evaluation on a 1-5 scale.
  • Provides a sample script that generates Transformer code for the "Attention Is All You Need" paper.
  • Open source and free, so users can modify the code and contribute to it.

Usage Guide

Installation process

To use Paper2Code, you need to install the necessary dependencies and configure your environment. Below are the detailed installation steps:

  1. Clone the repository
    Run the following command in the terminal to clone the Paper2Code repository locally:

    git clone https://github.com/going-doer/Paper2Code.git
    cd Paper2Code
  2. Install the dependencies
    Install the Python dependencies, including the openai and tiktoken libraries:

    pip install openai tiktoken
    

    If you want to use open-source models through vLLM, install vLLM by following the official repository (https://github.com/vllm-project/vllm).

  3. Set the OpenAI API key
    After obtaining the OpenAI API key, configure the environment variables:

    export OPENAI_API_KEY="your-api-key"
    

    On Windows (Command Prompt), run:

    set OPENAI_API_KEY=your-api-key
    
  4. Install the PDF conversion tool
    Paper2Code converts PDF papers into JSON using s2orc-doc2json. Clone the s2orc-doc2json repository:

    git clone https://github.com/allenai/s2orc-doc2json.git
    

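    Start a GROBID server as described in the s2orc-doc2json README (its grobid2json converter depends on one), then run the conversion script, replacing <PDF_PATH> with the path to your paper: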

    mkdir -p ./s2orc-doc2json/output_dir/paper_coder
    python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py -i <PDF_PATH> -t ./s2orc-doc2json/temp_dir/ -o ./s2orc-doc2json/output_dir/paper_coder
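Before running any scripts, a short Python check can confirm that the setup steps above worked: the API key is set, the openai and tiktoken packages are importable, and s2orc-doc2json has been cloned. This is a minimal sketch, not part of Paper2Code; check_setup is a hypothetical helper, and it assumes you run it from the Paper2Code directory, where the clone command above places s2orc-doc2json.

```python
import importlib.util
import os


def check_setup(env=None, repo_root="."):
    """Return a list of setup problems; an empty list means everything was found."""
    env = os.environ if env is None else env
    problems = []
    # The OpenAI-based scripts read the key from the OPENAI_API_KEY environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # The Python dependencies installed with pip above.
    for module in ("openai", "tiktoken"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python package '{module}' is not installed")
    # Assumption: the PDF converter is cloned inside the Paper2Code directory.
    if not os.path.isdir(os.path.join(repo_root, "s2orc-doc2json")):
        problems.append("s2orc-doc2json has not been cloned")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("-", issue)
    print("Setup looks OK" if not issues else f"{len(issues)} problem(s) found")
```

The check only looks for the key's presence; it does not call the API, so an expired or invalid key will still pass.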
    

Usage

Paper2Code provides several scripts for processing papers in PDF and LaTeX formats, as described below.

Run the sample script

Paper2Code includes a sample script that generates Transformer code for the "Attention Is All You Need" paper. Change into the scripts directory and run it:

cd scripts
bash run.sh

The outputs are saved under the outputs directory:

  • outputs/Transformer/planning_artifacts: system architecture and dependency files.
  • outputs/Transformer/analyzing_artifacts: analysis of the paper's implementation details.
  • outputs/Transformer/coding_artifacts: the generated code files.
  • outputs/Transformer_repo: the final code repository.
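To confirm that a run finished, a small Python check can look for the expected output folders. This is a hypothetical helper (expected_outputs and missing_outputs are not part of Paper2Code). The layout it assumes, with the final repository at outputs/<paper>_repo next to outputs/<paper>, follows the --output_dir and --target_repo_dir paths used in the evaluation command below.

```python
from pathlib import Path

# Intermediate artifact folders produced by the three phases.
ARTIFACT_DIRS = ("planning_artifacts", "analyzing_artifacts", "coding_artifacts")


def expected_outputs(outputs_dir, paper_name="Transformer"):
    """Return the paths a finished run is assumed to produce."""
    root = Path(outputs_dir)
    paths = [root / paper_name / name for name in ARTIFACT_DIRS]
    # Assumption: the final repository sits next to the artifact folder.
    paths.append(root / f"{paper_name}_repo")
    return paths


def missing_outputs(outputs_dir, paper_name="Transformer"):
    """Return the expected output directories that do not exist yet."""
    return [p for p in expected_outputs(outputs_dir, paper_name) if not p.is_dir()]


if __name__ == "__main__":
    # Run from the scripts directory; the outputs directory is one level up.
    for path in missing_outputs("../outputs"):
        print("missing:", path)
```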

Processing your own paper

To convert your own paper to code, prepare it as a PDF or LaTeX file and update the paper name and path variables in the script accordingly. For example, for a PDF paper:

export OPENAI_API_KEY="your-api-key"
cd scripts
bash run.sh

For LaTeX format, run:

bash run_latex.sh

To use an open-source model served through vLLM instead of the OpenAI API, run:

bash run_llm.sh  # PDF format
bash run_latex_llm.sh  # LaTeX format
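The four entry points differ only by paper format and model backend, so choosing one can be expressed as a simple lookup. pick_script is a hypothetical helper, not part of Paper2Code; it assumes the *_llm.sh variants are the ones for non-OpenAI models, such as those served through vLLM.

```python
def pick_script(paper_format, use_vllm=False):
    """Return the scripts/ entry point for a paper format and model backend."""
    scripts = {
        ("pdf", False): "run.sh",             # PDF input, OpenAI API
        ("latex", False): "run_latex.sh",     # LaTeX input, OpenAI API
        ("pdf", True): "run_llm.sh",          # PDF input, other LLM (e.g. via vLLM)
        ("latex", True): "run_latex_llm.sh",  # LaTeX input, other LLM
    }
    key = (paper_format.lower(), use_vllm)
    if key not in scripts:
        raise ValueError(f"unsupported paper format: {paper_format!r}")
    return scripts[key]
```

For example, subprocess.run(["bash", pick_script("latex")], cwd="scripts", check=True) would launch run_latex.sh from the repository root.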

Evaluating code quality

Paper2Code supports both reference-free and reference-based evaluation of code quality. To run a reference-free evaluation:

cd codes
python eval.py \
--paper_name Transformer \
--pdf_json_path ../examples/Transformer_cleaned.json \
--data_dir ../data \
--output_dir ../outputs/Transformer \
--target_repo_dir ../outputs/Transformer_repo \
--eval_result_dir ../results \
--eval_type ref_free \
--generated_n 8 \
--papercoder

For reference-based evaluation, set --eval_type to ref_based and specify the path to the gold (reference) repository:

--eval_type ref_based \
--gold_repo_dir ../examples/Transformer_gold_repo
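Because the reference-free and reference-based runs differ only in a couple of flags, the full command can be assembled in Python. eval_command is a hypothetical helper: the flag names are copied from the commands above, but the path pattern (for example ../examples/<paper>_cleaned.json) is generalized from the single Transformer example and may not match your own files. It assumes the command is run from the codes directory.

```python
import shlex


def eval_command(paper_name, eval_type="ref_free", gold_repo_dir=None, generated_n=8):
    """Build the argument list for codes/eval.py, mirroring the Transformer example."""
    if eval_type not in ("ref_free", "ref_based"):
        raise ValueError("eval_type must be 'ref_free' or 'ref_based'")
    args = [
        "python", "eval.py",
        "--paper_name", paper_name,
        # Assumed naming pattern, generalized from the Transformer example.
        "--pdf_json_path", f"../examples/{paper_name}_cleaned.json",
        "--data_dir", "../data",
        "--output_dir", f"../outputs/{paper_name}",
        "--target_repo_dir", f"../outputs/{paper_name}_repo",
        "--eval_result_dir", "../results",
        "--eval_type", eval_type,
        "--generated_n", str(generated_n),
        "--papercoder",
    ]
    if eval_type == "ref_based":
        # Reference-based evaluation also needs the gold repository.
        if gold_repo_dir is None:
            raise ValueError("ref_based evaluation requires gold_repo_dir")
        args += ["--gold_repo_dir", gold_repo_dir]
    return args


if __name__ == "__main__":
    cmd = eval_command("Transformer", "ref_based", "../examples/Transformer_gold_repo")
    print(shlex.join(cmd))  # a shell-ready version of the command
```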

The evaluation results, including a correctness score on a 1-5 scale, are saved in the results directory.
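The source does not say how --generated_n 8 is used. Assuming it means that eight judgments are sampled and their 1-5 scores averaged into one result, the aggregation step could look like the sketch below. aggregate_scores is hypothetical and does not read Paper2Code's actual result files.

```python
from statistics import mean


def aggregate_scores(scores):
    """Average several 1-5 correctness judgments into one score (assumed scheme)."""
    if not scores:
        raise ValueError("need at least one score")
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score out of range 1-5: {s}")
    return round(mean(scores), 2)
```

Averaging several samples smooths out the variance of a single LLM judgment, at the cost of more API calls per evaluation.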

Key Features

  • Multi-agent collaboration: a planning agent designs the code architecture, an analysis agent extracts implementation details from the paper, and a coding agent writes modular code. The whole process runs automatically, with no manual intervention required.
  • High-quality code: the generated code aims to be faithful to the paper and includes dependency management and configuration files, providing a solid starting point for further development.
  • Flexible input: papers in both PDF and LaTeX format are accepted, covering a wide range of user needs.
  • Evaluation tools: automated evaluation scripts quantify code correctness and help users verify implementation quality.
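The three-agent workflow described above can be sketched as a chain of model calls. This is a minimal illustration under stated assumptions, not PaperCoder's implementation: run_pipeline, PipelineResult and the prompt wording are hypothetical, the model is any function from prompt text to response text, and the file list is passed in rather than produced by the planning phase as a fuller implementation would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "model" here is any function mapping a prompt string to a response string.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    plan: str
    analyses: Dict[str, str] = field(default_factory=dict)
    code: Dict[str, str] = field(default_factory=dict)


def run_pipeline(paper_text: str, files: List[str], llm: LLM) -> PipelineResult:
    # Phase 1 (planning): design the overall architecture from the paper.
    plan = llm(f"PLAN the repository for this paper:\n{paper_text}")
    result = PipelineResult(plan=plan)
    # Phase 2 (analysis): extract implementation details for each planned file.
    for name in files:
        result.analyses[name] = llm(f"ANALYZE {name} given plan:\n{plan}")
    # Phase 3 (coding): write each file, seeing the plan, its analysis and earlier files.
    for name in files:
        written = "\n".join(result.code.values())
        result.code[name] = llm(
            f"CODE {name}\nPlan:\n{plan}\n"
            f"Analysis:\n{result.analyses[name]}\nWritten so far:\n{written}"
        )
    return result
```

In practice, llm would wrap a call to the OpenAI API or a vLLM-served model; a stub function is enough to trace the order of calls. Feeding earlier files into later coding prompts is one way to keep the generated modules consistent with each other.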

Caveats

  • Make sure your OpenAI API key is valid. Running the pipeline with the o3-mini model costs an estimated $0.50-0.70.
  • When converting a PDF, check the JSON output for completeness to avoid formatting errors.
  • To process your own paper, adjust the paths and parameters in the script; see README.md for details.
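A quick completeness check on the converted JSON (written to ./s2orc-doc2json/output_dir/paper_coder by the conversion command) can catch truncated or empty output before it reaches the pipeline. check_paper_data and check_paper_json are hypothetical helpers; the key names (title, pdf_parse, body_text, text) are assumptions based on the S2ORC JSON format that s2orc-doc2json targets, and the five-paragraph minimum is an arbitrary heuristic.

```python
import json


def check_paper_data(data, min_paragraphs=5):
    """Return warnings for parsed s2orc-doc2json output that looks incomplete."""
    warnings = []
    if not data.get("title"):
        warnings.append("missing title")
    # Assumed S2ORC layout: paragraphs live in pdf_parse.body_text, each with a "text" field.
    body = (data.get("pdf_parse") or {}).get("body_text") or []
    if len(body) < min_paragraphs:  # heuristic threshold, tune per paper
        warnings.append(f"only {len(body)} body paragraphs extracted")
    empty = sum(1 for para in body if not para.get("text", "").strip())
    if empty:
        warnings.append(f"{empty} empty paragraph(s)")
    return warnings


def check_paper_json(path, min_paragraphs=5):
    """Load a converted paper; json.load raises json.JSONDecodeError if the file is truncated."""
    with open(path, encoding="utf-8") as f:
        return check_paper_data(json.load(f), min_paragraphs)
```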

Application Scenarios

  1. Academic research
    Researchers can quickly turn new papers into code to validate algorithms and save time on manual coding. For example, machine learning researchers can run the code generated by Paper2Code directly to test the performance of the models described in a paper.
  2. Education
    With Paper2Code, students can convert classic papers (e.g., the Transformer paper) into code to understand the model's implementation details more deeply and support their study of deep learning principles.
  3. Prototyping
    Developers can quickly build machine learning prototypes from the generated code repository, shortening the development cycle, which suits fast-iterating commercial projects.

QA

  1. What paper formats does Paper2Code support?
    It supports machine learning papers in PDF and LaTeX formats. PDF papers must first be converted to JSON, while LaTeX files can be processed directly.
  2. What is the quality of the generated code?
    The code goes through a three-phase process of planning, analysis and generation so that it stays faithful to the paper. The evaluation tool provides a correctness score from 1 to 5 to help assess the quality of the output.
  3. Do I need to pay to run Paper2Code?
    Using the OpenAI API incurs fees; running the pipeline with the o3-mini model costs about $0.50-$0.70. The project itself is free.
  4. How do I process my own paper?
    Prepare the PDF or LaTeX file, configure the environment variables, adjust the paths and parameters in the script, and run run.sh or run_latex.sh.
May not be reproduced without permission: Chief AI Sharing Circle » Paper2Code: Automatically Converting Machine Learning Papers into Runnable Code