General Introduction
Paper2Code is an open-source project that addresses the lack of code implementations for machine learning papers. It automatically transforms scientific papers into runnable code repositories using PaperCoder, a multi-agent Large Language Model (LLM) system. The system follows a three-phase process of planning, analysis, and code generation, with each phase handled by a specialized agent, to produce code implementations that are faithful to the paper. The project uses the well-known "Attention Is All You Need" paper as an example, demonstrating the full path from paper to Transformer model code. It accepts papers in PDF and LaTeX formats and is aimed at machine learning researchers, developers, and students. Paper2Code performs well on the PaperBench benchmark, and the code is publicly available on GitHub, making it easy to install and use.
Feature List
- Automatically converts machine learning papers into executable code repositories.
- Accepts papers in PDF and LaTeX formats; PDF input is first converted to structured JSON data.
- Provides a three-phase processing flow of planning, analysis, and code generation.
- Generates a complete code repository, including system architecture, dependencies, and configuration files.
- Supports reference-based and reference-free code quality evaluation on a scale of 1-5.
- Provides sample scripts to quickly run the Transformer code for the "Attention Is All You Need" paper.
- Open source and free, allowing users to modify and contribute to the code.
Usage Guide
Installation
To use Paper2Code, you need to install the necessary dependencies and configure your environment. Below are the detailed installation steps:
- Clone the repository
Run the following commands in the terminal to clone the Paper2Code repository locally:
git clone https://github.com/going-doer/Paper2Code.git
cd Paper2Code
- Install dependencies
Install the Python dependencies, including the openai and tiktoken libraries:
pip install openai tiktoken
If you want to run models through vLLM, install it by following the official vLLM repository (https://github.com/vllm-project/vllm).
- Set the OpenAI API key
After obtaining an OpenAI API key, set it as an environment variable:
export OPENAI_API_KEY="your-api-key"
Windows users run:
set OPENAI_API_KEY=your-api-key
- Install the PDF conversion tool
Paper2Code converts PDF papers to JSON format before processing them. Clone the s2orc-doc2json repository:
git clone https://github.com/allenai/s2orc-doc2json.git
Run the PDF conversion script:
mkdir -p ./s2orc-doc2json/output_dir/paper_coder
python ./s2orc-doc2json/doc2json/grobid2json/process_pdf.py -i <PDF_PATH> -t ./s2orc-doc2json/temp_dir/ -o ./s2orc-doc2json/output_dir/paper_coder
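Before moving on, it can help to confirm the two prerequisites from the installation steps: the API key is visible to the process, and the conversion step produced at least one JSON file. The following is a minimal sketch, not part of Paper2Code; the function name preflight is hypothetical, and the directory is the output path used in the conversion command.

```python
import os
from pathlib import Path


def preflight(env, json_dir):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration; Paper2Code does not ship it.
    """
    problems = []
    # The key is expected in the OPENAI_API_KEY environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # The conversion step should have written at least one JSON file.
    if not any(Path(json_dir).glob("*.json")):
        problems.append(f"no JSON files found in {json_dir}")
    return problems


if __name__ == "__main__":
    out_dir = "./s2orc-doc2json/output_dir/paper_coder"
    for problem in preflight(os.environ, out_dir):
        print("WARNING:", problem)
```

Passing the environment as an argument, rather than reading os.environ inside the function, keeps the check easy to exercise without touching the real shell environment.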
Usage
Paper2Code provides separate scripts for processing papers in PDF and LaTeX formats. The steps are as follows:
Run the sample script
Paper2Code includes a sample script that generates Transformer code for the "Attention Is All You Need" paper. Go to the scripts directory and run it:
cd scripts
bash run.sh
The output is saved in the outputs directory, which contains:
- Transformer/planning_artifacts: system architecture and dependency files.
- Transformer/analyzing_artifacts: detailed analysis of the paper's implementation.
- Transformer/coding_artifacts: the generated code files.
- Transformer_repo: the final code repository.
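After a run, a quick check of which artifact directories exist can catch a partially completed pipeline. The sketch below is illustrative only: the helper name missing_artifacts is hypothetical, and the relative paths are assumptions based on the listing above and on the paths used in the evaluation command, so adjust them if your run places the directories elsewhere.

```python
from pathlib import Path

# Assumed layout relative to the outputs directory; adjust if yours differs.
EXPECTED = (
    "Transformer/planning_artifacts",
    "Transformer/analyzing_artifacts",
    "Transformer/coding_artifacts",
    "Transformer_repo",
)


def missing_artifacts(outputs_root, expected=EXPECTED):
    """Return the expected directories that do not exist under outputs_root."""
    root = Path(outputs_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Run from the repository root after the sample script has finished.
    for rel in missing_artifacts("outputs"):
        print("missing:", rel)
```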
Processing your own paper
To convert your own paper to code, prepare a file in PDF or LaTeX format and set the environment variables. For example, for PDF format:
export OPENAI_API_KEY="your-api-key"
cd scripts
bash run.sh
For LaTeX format, run:
bash run_latex.sh
To use other large language models, run:
bash run_llm.sh # PDF format
bash run_latex_llm.sh # LaTeX format
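The four scripts above differ along two axes: the paper format and whether the OpenAI API or another LLM is used. As a small sketch of that mapping (the function pick_script and its parameter names are hypothetical; only the script file names come from the text above):

```python
def pick_script(paper_format, use_openai=True):
    """Map a paper format and backend choice to the matching script name."""
    scripts = {
        ("pdf", True): "run.sh",
        ("latex", True): "run_latex.sh",
        ("pdf", False): "run_llm.sh",
        ("latex", False): "run_latex_llm.sh",
    }
    key = (paper_format.lower(), use_openai)
    if key not in scripts:
        raise ValueError(f"unsupported paper format: {paper_format!r}")
    return scripts[key]


if __name__ == "__main__":
    print(pick_script("pdf"))                      # script for PDF with OpenAI
    print(pick_script("LaTeX", use_openai=False))  # script for LaTeX with another LLM
```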
Evaluating code quality
Paper2Code supports both reference-based and reference-free code quality evaluation. Run the evaluation script:
cd codes
python eval.py \
--paper_name Transformer \
--pdf_json_path ../examples/Transformer_cleaned.json \
--data_dir ../data \
--output_dir ../outputs/Transformer \
--target_repo_dir ../outputs/Transformer_repo \
--eval_result_dir ../results \
--eval_type ref_free \
--generated_n 8 \
--papercoder
Reference-based evaluation changes the evaluation type and additionally requires the path to the gold (reference) repository:
--eval_type ref_based \
--gold_repo_dir ../examples/Transformer_gold_repo
Evaluation results include a correctness score from 1 to 5 and are saved in the results directory.
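The two evaluation modes share most arguments and differ only in --eval_type and the extra --gold_repo_dir flag. The sketch below assembles the argument list for either mode; the helper build_eval_command is hypothetical, while the flags and paths are copied from the commands above.

```python
import shlex


def build_eval_command(eval_type="ref_free", gold_repo_dir=None):
    """Assemble the eval.py argument list for the Transformer example."""
    if eval_type not in ("ref_free", "ref_based"):
        raise ValueError(f"unknown eval_type: {eval_type!r}")
    cmd = [
        "python", "eval.py",
        "--paper_name", "Transformer",
        "--pdf_json_path", "../examples/Transformer_cleaned.json",
        "--data_dir", "../data",
        "--output_dir", "../outputs/Transformer",
        "--target_repo_dir", "../outputs/Transformer_repo",
        "--eval_result_dir", "../results",
        "--eval_type", eval_type,
        "--generated_n", "8",
        "--papercoder",
    ]
    if eval_type == "ref_based":
        # Reference-based evaluation needs a gold repository to compare against.
        if not gold_repo_dir:
            raise ValueError("ref_based evaluation requires gold_repo_dir")
        cmd += ["--gold_repo_dir", gold_repo_dir]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_eval_command("ref_based", "../examples/Transformer_gold_repo")))
```

To execute the command rather than print it, the list can be passed to subprocess.run(cmd, check=True) from inside the codes directory.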
Key Features
- Multi-agent collaboration: A planning agent designs the code architecture, an analysis agent extracts implementation details from the paper, and a generation agent writes modular code. No manual intervention is needed; the system completes the whole process automatically.
- High-quality code: The generated code is faithful to the paper, includes dependency management and configuration files, and is suitable for production environments.
- Flexible input: Supports both PDF and LaTeX formats to accommodate different users' needs.
- Evaluation tools: Provides automated evaluation scripts that quantify code correctness and help users verify implementation quality.
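To make the data flow between the three phases concrete, here is a toy sketch in which each phase consumes the output of the previous one. It is not PaperCoder's implementation: every name is hypothetical, and the stage bodies are placeholders standing in for LLM calls.

```python
from dataclasses import dataclass, field


@dataclass
class Artifacts:
    """Hypothetical container for what each phase produces."""
    plan: list = field(default_factory=list)       # file names to create
    analysis: dict = field(default_factory=dict)   # per-file implementation notes
    code: dict = field(default_factory=dict)       # per-file source text


def plan_stage(paper_text):
    # Placeholder: a real planning agent would derive the layout from the paper.
    return ["model.py", "train.py"]


def analyze_stage(paper_text, plan):
    # Placeholder: a real analysis agent would extract details for each file.
    return {name: f"notes for {name}" for name in plan}


def code_stage(plan, analysis):
    # Placeholder: a real generation agent would write code guided by the notes.
    return {name: f"# TODO: implement ({analysis[name]})\n" for name in plan}


def run_pipeline(paper_text):
    """Chain the phases: planning, then analysis, then code generation."""
    plan = plan_stage(paper_text)
    analysis = analyze_stage(paper_text, plan)
    code = code_stage(plan, analysis)
    return Artifacts(plan, analysis, code)


if __name__ == "__main__":
    result = run_pipeline("paper text goes here")
    for name, source in result.code.items():
        print(name, "->", source.strip())
```

The point of the sketch is the ordering: the plan fixes the set of files, the analysis is keyed by that plan, and code generation reads both.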
Notes
- Make sure the OpenAI API key is valid. Running with the o3-mini model costs an estimated $0.50-0.70.
- When converting a PDF, check the JSON output for completeness to avoid formatting errors.
- Processing a custom paper requires adjusting the paths and parameters in the script; see README.md.
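The completeness check on the converted JSON can be partly automated. The sketch below reports which required fields are missing or empty; the helper names are hypothetical, and the default key names ("title", "abstract", "body_text") are assumptions for illustration, so replace them with the keys that actually appear in your converted file.

```python
import json


def empty_fields(doc, required=("title", "abstract", "body_text")):
    """Return the required keys that are missing or empty in a parsed JSON dict.

    The default key names are assumptions, not a documented schema.
    """
    return [key for key in required if not doc.get(key)]


def check_file(path, required=("title", "abstract", "body_text")):
    """Load a JSON file and report its missing or empty required fields."""
    with open(path, encoding="utf-8") as fh:
        return empty_fields(json.load(fh), required)


if __name__ == "__main__":
    sample = {"title": "Attention Is All You Need", "abstract": "", "body_text": ["..."]}
    print(empty_fields(sample))
```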
Application Scenarios
- Academic research
Researchers can quickly turn new papers into code to validate algorithms and save time on manual coding. For example, machine learning researchers can run the code generated by Paper2Code directly to test the performance of the models described in a paper.
- Education and learning
Students can use Paper2Code to convert classic papers (e.g. Transformer) into code, gaining a deeper understanding of model implementation details and of deep learning principles.
- Prototyping
Developers can quickly build machine learning prototypes from the generated code repository, shortening the development cycle, which suits fast-iterating commercial projects.
QA
- What paper formats does Paper2Code support?
It supports machine learning papers in PDF and LaTeX formats. PDFs must first be converted to JSON; LaTeX can be processed directly.
- What is the quality of the generated code?
The code is produced through a three-phase process of planning, analysis, and generation so that it stays faithful to the paper. The evaluation tool assigns a correctness score from 1 to 5 to help verify output quality.
- Do I need to pay to run Paper2Code?
Using the OpenAI API incurs a fee: running with the o3-mini model costs about $0.50-$0.70. Everything else is free.
- How do I process my own paper?
Prepare the PDF or LaTeX file, configure the environment variables, and run the run.sh or run_latex.sh script after adjusting the paths and parameters.