Paper2Any
English | Chinese
Focused on multimodal paper workflows: from paper PDFs, screenshots, or text to one-click generation of model diagrams, technical roadmaps, experimental plots, and slide decks
| Universal File Support | AI-Powered Generation | Custom Styling | Lightning Speed |
Table of Contents
- News
- Core Features
- Showcase
- Drawio
- Quick Start
- Project Structure
- Roadmap
- Contributing
News
Tip
[NEW] 2026-02-02 * Paper2Rebuttal
Added rebuttal drafting support with structured response guidance and image-aware revision prompts.
Tip
[NEW] 2026-01-28 * Drawio Update
Added Drawio support for visual diagram creation and showcase-ready outputs in the workflow.
KB updates in one line: multi-file PPT generation with doc convert/merge, optional image injection, and embedding-assisted retrieval.
Tip
[NEW] 2026-01-25 * New Features
Added AI-assisted outline editing, three-layer model configuration system for flexible model selection, and user points management with daily quota allocation.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
[NEW] 2026-01-20 * Bug Fixes
Fixed bugs in experimental plot generation (image/text) and resolved the missing historical files issue.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
[NEW] 2026-01-14 * Feature Updates & Backend Architecture Upgrade
- Feature Updates: Added Image2PPT, optimized Paper2Figure interaction, and improved PDF2PPT effects.
- Standardized API: Refactored backend interfaces with RESTful /api/v1/structure, removing obsolete endpoints for better maintainability.
- Dynamic Configuration: Supported dynamic model selection (e.g., GPT-4o, Qwen-VL) via API parameters, eliminating hardcoded model dependencies.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/
- 2025-12-12 * Paper2Figure Web public beta is live
- 2025-10-01 * Released the first version 0.1.0
Core Features
From paper PDFs / images / text to editable scientific figures, slide decks, video scripts, academic posters, and other multimodal content in one click.
Paper2Any currently includes the following sub-capabilities:
- Paper2Figure - Editable Scientific Figures: Model architecture diagrams, technical roadmaps (PPT + SVG), and experimental plots with editable PPTX output.
- Paper2Diagram / Image2Drawio - Editable Diagrams: Generate draw.io diagrams from paper/text or images, with drawio/png/svg export and chat-based edits.
- Paper2PPT - Editable Slide Decks: Paper/text/topic to PPT, long-doc support, and built-in table/figure extraction.
- Paper2Rebuttal: Draft structured rebuttals and revision responses with claims-to-evidence grounding.
- PDF2PPT - Layout-Preserving Conversion: Accurate layout retention when converting PDFs to editable PPTX.
- Image2PPT - Image to Slides: Convert images or screenshots into structured slides.
- PPTPolish - Smart Beautification: AI-based layout optimization and style transfer.
- Paper2Video: Generate video scripts and narration assets.
- Paper2Technical: Produce technical reports and method summaries.
- Knowledge Base (KB): Ingest/embedding, semantic search, and KB-driven PPT/podcast/mindmap generation.
Showcase
Drawio
Diagram generation (mindmap / flowchart / ER ...)
Model diagrams from PDF or text (research figure generation)
Paper2Rebuttal: Rebuttal Drafting
Paper2Figure: Scientific Figure Generation
Paper2PPT: Paper to Presentation
PPT Generation Demo
Paper / Text / Topic to PPT
Long Document Support (40+ Slides)
PPT Smart Beautification
PDF2PPT: Layout-Preserving Conversion
Quick Start
Requirements
Docker (Recommended) -- Deployment & Updates
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Configure environment variables
cp fastapi_app/.env.example fastapi_app/.env
cp frontend-workflow/.env.example frontend-workflow/.env
Required configuration:
fastapi_app/.env (backend):
DEFAULT_LLM_API_URL=https://api.openai.com/v1/
# Optional: Supabase (skip for no auth -- core features still work)
# SUPABASE_URL=https://your-project-id.supabase.co
# SUPABASE_ANON_KEY=your_supabase_anon_key
frontend-workflow/.env (frontend):
VITE_DEFAULT_LLM_API_URL=https://api.openai.com/v1
VITE_LLM_API_URLS=https://api.openai.com/v1
# Optional: Supabase (keep consistent with backend)
# VITE_SUPABASE_URL=https://your-project-id.supabase.co
# VITE_SUPABASE_ANON_KEY=your_supabase_anon_key
# 3. Build and start services
docker compose up -d --build
Open:
- Frontend: http://localhost:3000
- Backend health: http://localhost:8000/health
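Once the containers are up, reachability can also be verified programmatically. A minimal Python sketch using only the standard library (the helper name `check_health` is illustrative, not part of the codebase):

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers HTTP 200, False if unreachable or erroring."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Example: probe the backend health endpoint started by docker compose.
backend_ok = check_health("http://localhost:8000/health")
```

This returns False rather than raising when the service is down, which makes it convenient for wait-until-ready loops.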
GPU services note: Docker only starts the frontend and backend. No GPU model services are included.
- Paper2PPT, Paper2Figure, Knowledge Base, etc. only need LLM APIs and work out of the box.
- PDF2PPT, Image2PPT, Image2Drawio require the SAM3 segmentation service (needs GPU), deployed separately:
# On a machine with a GPU
python -m dataflow_agent.toolkits.model_servers.sam3_server \
  --port 8001 --checkpoint models/sam3/sam3.pt \
  --bpe models/sam3/bpe_simple_vocab_16e6.txt.gz --device cuda

Then add to fastapi_app/.env:

SAM3_SERVER_URLS=http://GPU_MACHINE_IP:8001

See the "Advanced: Local Model Server Load Balancing" section below for details.
Modify & update:
- After changing code or .env, rebuild: docker compose up -d --build
- Pull latest code and rebuild: git pull && docker compose up -d --build
Common commands:
- View logs: docker compose logs -f
- Stop services: docker compose down
Notes:
- The first build may take a while (system deps + Python deps).
- Frontend env is baked at build time (compose build args). If you change it, rebuild with docker compose up -d --build.
- Outputs and models are mounted to the host (./outputs, ./models) for persistence.
Linux Installation
We recommend using Conda to create an isolated environment (Python 3.11).
1. Create Environment & Install Base Dependencies
conda create -n paper2any python=3.11 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-base.txt
# 3. Install in editable (dev) mode
pip install -e .
2. Install Paper2Any-specific Dependencies (Required)
Paper2Any involves LaTeX rendering, vector-graphics processing, and PPT/PDF conversion, which require extra dependencies:
# 1. Paper-specific dependencies (falls back to the backup list on failure)
pip install -r requirements-paper.txt || pip install -r requirements-paper-backup.txt
# 2. LaTeX engine (tectonic) - recommended via conda
conda install -c conda-forge tectonic -y
# 3. Resolve doclayout_yolo dependency conflicts (Important)
pip install doclayout_yolo --no-deps
# 4. System dependencies (Ubuntu example)
sudo apt-get update
sudo apt-get install -y inkscape libreoffice poppler-utils wkhtmltopdf
3. Environment Variables
export DF_API_URL=xxx # Optional: if you need a third-party API gateway
export MINERU_DEVICES="0,1,2,3" # Optional: MinerU task GPU resource pool
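MINERU_DEVICES is a comma-separated list of GPU ids. A small sketch of how such a pool could be parsed on the Python side (the helper name `parse_devices` is illustrative, not part of the codebase):

```python
import os


def parse_devices(spec: str) -> list[int]:
    """Parse a comma-separated GPU id list like "0,1,2,3" into integers."""
    return [int(x) for x in spec.split(",") if x.strip()]


# Example: read the pool from the environment, defaulting to GPU 0.
pool = parse_devices(os.environ.get("MINERU_DEVICES", "0"))
```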
Tip
For detailed configuration guide, see Configuration Guide for step-by-step instructions on configuring models, environment variables, and starting services.
4. Configure Environment Files (Optional)
Click to expand: Detailed .env Configuration Guide
Paper2Any uses two .env files for configuration. Both are optional - you can run the application without them using default settings.
Step 1: Copy Example Files
cp fastapi_app/.env.example fastapi_app/.env
# Copy frontend environment file
cp frontend-workflow/.env.example frontend-workflow/.env
Step 2: Backend Configuration (fastapi_app/.env)
Supabase (Optional) - Only needed if you want user authentication and cloud storage:
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_supabase_anon_key
Model Configuration - Customize which models to use for different workflows:
DEFAULT_LLM_API_URL=http://123.129.219.111:3000/v1/
# Workflow-level defaults
PAPER2PPT_DEFAULT_MODEL=gpt-5.1
PAPER2PPT_DEFAULT_IMAGE_MODEL=gemini-3-pro-image-preview
PDF2PPT_DEFAULT_MODEL=gpt-4o
# ... see .env.example for full list
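These workflow-level defaults sit between a per-request model choice and a global fallback. A minimal sketch of that precedence (the function name `resolve_model` and the fallback value are illustrative assumptions, not the project's actual implementation):

```python
import os


def resolve_model(request_model, workflow_env_var: str, fallback: str = "gpt-4o") -> str:
    """Request-level choice wins, then the workflow default from .env, then a global fallback."""
    return request_model or os.environ.get(workflow_env_var) or fallback


# Example: with PAPER2PPT_DEFAULT_MODEL=gpt-5.1 set, a request without an
# explicit model gets the workflow default.
os.environ["PAPER2PPT_DEFAULT_MODEL"] = "gpt-5.1"
chosen = resolve_model(None, "PAPER2PPT_DEFAULT_MODEL")
```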
Step 3: Frontend Configuration (frontend-workflow/.env)
LLM Provider Configuration - Controls the API endpoint dropdown in the UI:
VITE_DEFAULT_LLM_API_URL=https://api.apiyi.com/v1
# Available API URLs in the dropdown (comma-separated)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://b.apiyi.com:16888/v1,http://123.129.219.111:3000/v1
What happens when you modify VITE_LLM_API_URLS:
- The frontend will display a dropdown menu with all URLs you specify
- Users can select different API endpoints without manually typing URLs
- Useful for switching between OpenAI, local models, or custom API gateways
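The dropdown is essentially an ordered, de-duplicated list built from these two variables. A Python sketch of how such a list could be assembled (`build_dropdown` is illustrative; the actual frontend logic lives in TypeScript):

```python
def build_dropdown(urls_csv: str, default_url: str) -> list[str]:
    """Split the comma-separated list, ensure the default is present, drop duplicates in order."""
    urls = [u.strip() for u in urls_csv.split(",") if u.strip()]
    if default_url and default_url not in urls:
        urls.insert(0, default_url)
    seen, ordered = set(), []
    for u in urls:
        if u not in seen:
            seen.add(u)
            ordered.append(u)
    return ordered
```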
Supabase (Optional) - Uncomment these lines if you want user authentication:
VITE_SUPABASE_URL=https://your-project-id.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_JWT_SECRET=your-jwt-secret
Running Without Supabase
If you skip Supabase configuration:
- All core features work normally
- CLI scripts work without any configuration
- No user authentication or quotas
- No cloud file storage
Note
Quick Start: You can skip the .env configuration entirely and use CLI scripts directly with the --api-key parameter. See the CLI Scripts section below.
Advanced Configuration: Local Model Service Load Balancing
If you are deploying in a high-concurrency local environment, you can use script/start_model_servers.sh to start a local model service cluster (MinerU / SAM / OCR).
Script location: /DataFlow-Agent/script/start_model_servers.sh
Main configuration items:
- MinerU (PDF Parsing)
  - MINERU_MODEL_PATH: model path (default: models/MinerU2.5-2509-1.2B)
  - MINERU_GPU_UTIL: GPU memory utilization (default: 0.2)
  - Instance configuration: by default, 4 instances each are started on GPU 0 and GPU 4 (8 in total), ports 8011-8018.
  - Load balancer: port 8010, dispatches requests automatically.
- SAM3 (Segment Anything Model 3)
  - Instance configuration: by default, one instance per configured GPU, ports starting from 8021.
  - Model assets: default paths are ./models/sam3/sam3.pt and ./models/sam3/bpe_simple_vocab_16e6.txt.gz.
  - Load balancer: port 8020.
- OCR (PaddleOCR)
  - Config: runs on CPU, uses uvicorn's worker mechanism (4 workers by default).
  - Port: 8003.
Before using the script, adjust gpu_id and the number of instances to match your actual GPU count and memory.
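One simple way such a balancer could dispatch requests across the instance ports is round-robin. A minimal sketch (the helper `make_dispatcher` is illustrative, not the script's actual implementation):

```python
import itertools


def make_dispatcher(urls: list[str]):
    """Return a callable that yields server URLs in round-robin order."""
    ring = itertools.cycle(urls)
    return lambda: next(ring)


# Example: the default SAM3 instance ports start from 8021.
next_server = make_dispatcher([f"http://127.0.0.1:{p}" for p in (8021, 8022)])
```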
To migrate the SAM3 assets into this repository, run:
# or: bash script/setup_sam3_assets.sh copy
For a local one-command development test on a single GPU (SAM3 + backend + frontend), run:
Windows Installation
Note
We currently recommend trying Paper2Any on Linux / WSL. If you need to deploy on native Windows, please follow the steps below.
1. Create Environment & Install Base Dependencies
conda create -n paper2any python=3.12 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-win-base.txt
# 3. Install in editable (dev) mode
pip install -e .
2. Install Paper2Any-specific Dependencies (Recommended)
Paper2Any involves LaTeX rendering and vector graphics processing, which require extra dependencies (see requirements-paper.txt):
pip install -r requirements-paper.txt
# tectonic: LaTeX engine (recommended via conda)
conda install -c conda-forge tectonic -y
Install Inkscape (SVG/Vector Graphics Processing | Recommended/Required)
- Download and install (Windows 64-bit MSI): Inkscape Download
- Add the Inkscape executable directory to the system environment variable Path (example):
C:\Program Files\Inkscape\bin\
Tip
After configuring the Path, it is recommended to reopen the terminal (or restart VS Code / PowerShell) to ensure the environment variables take effect.
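Whether the new terminal can actually see Inkscape is easy to verify with the standard library (a small sketch; the helper name `tool_on_path` is illustrative):

```python
import shutil


def tool_on_path(name: str) -> bool:
    """True if the executable is resolvable via the current PATH."""
    return shutil.which(name) is not None


# Example: after updating Path and reopening the terminal,
# tool_on_path("inkscape") should report whether Inkscape is visible.
inkscape_visible = tool_on_path("inkscape")
```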
Install Windows Build of vLLM (Optional | For Local Inference Acceleration)
Release page: vllm-windows releases
Recommended version: 0.11.0
Important
Please make sure the .whl matches your current environment:
- Python: cp312 (Python 3.12)
- Platform: win_amd64
- CUDA: cu124 (must match your local CUDA / driver)
Launch Application
Paper2Any - Paper Workflow Web Frontend (Recommended)
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000
# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev
Configure Frontend Proxy
Modify server.proxy in frontend-workflow/vite.config.ts:
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000,
    open: true,
    allowedHosts: true,
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:8000', // FastAPI backend address
        changeOrigin: true,
      },
    },
  },
})
Visit http://localhost:3000.
Windows: Load MinerU Pre-trained Model
vllm serve opendatalab/MinerU2.5-2509-1.2B `
--host 127.0.0.1 `
--port 8010 `
--logits-processors mineru_vl_utils:MinerULogitsProcessor `
--gpu-memory-utilization 0.6 `
--trust-remote-code `
--enforce-eager
CLI Scripts (Command-Line Interface)
Paper2Any provides standalone CLI scripts that accept command-line parameters for direct workflow execution without requiring the web frontend/backend.
Environment Variables
Configure API access via environment variables (optional):
export DF_API_KEY=sk-xxx # API key
export DF_MODEL=gpt-4o # Default model
Available CLI Scripts
1. Paper2Figure CLI - Generate scientific figures (3 types)
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type model_arch \
--api-key sk-xxx
# Generate technical roadmap from text
python script/run_paper2figure_cli.py \
--input "Transformer architecture with attention mechanism" \
--input-type TEXT \
--graph-type tech_route
# Generate experimental data visualization
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type exp_data
Graph types: model_arch (model architecture), tech_route (technical roadmap), exp_data (experimental plots)
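The three graph types can also be driven programmatically, e.g. when batching over many papers. A sketch that builds the CLI argument vector (`figure_cli_cmd` is a hypothetical helper wrapping the script shown above, not part of the codebase):

```python
GRAPH_TYPES = {"model_arch", "tech_route", "exp_data"}


def figure_cli_cmd(input_path: str, graph_type: str, api_key=None) -> list[str]:
    """Build the argument vector for run_paper2figure_cli.py."""
    if graph_type not in GRAPH_TYPES:
        raise ValueError(f"unknown graph type: {graph_type}")
    cmd = ["python", "script/run_paper2figure_cli.py",
           "--input", input_path, "--graph-type", graph_type]
    if api_key:
        cmd += ["--api-key", api_key]
    return cmd
```

The resulting list can be passed directly to subprocess.run for each paper in a batch.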
2. Paper2PPT CLI - Convert papers to PPT presentations
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--api-key sk-xxx \
--page-count 15
# With custom style
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--style "Academic style; English; Modern design" \
--language en
3. PDF2PPT CLI - One-click PDF to editable PPT
python script/run_pdf2ppt_cli.py --input slides.pdf
# With AI enhancement
python script/run_pdf2ppt_cli.py \
--input slides.pdf \
--use-ai-edit \
--api-key sk-xxx
4. Image2PPT CLI - Convert images to editable PPT
python script/run_image2ppt_cli.py --input screenshot.png
# With AI enhancement
python script/run_image2ppt_cli.py \
--input diagram.jpg \
--use-ai-edit \
--api-key sk-xxx
5. PPT2Polish CLI - Beautify existing PPT files
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Academic style, clean and elegant" \
--api-key sk-xxx
# With reference image for style consistency
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Modern minimalist style" \
--ref-img reference_style.png \
--api-key sk-xxx
Note
System Requirements for PPT2Polish:
- LibreOffice: sudo apt-get install libreoffice (Ubuntu/Debian)
- pdf2image: pip install pdf2image
- poppler-utils: sudo apt-get install poppler-utils
Common Options
All CLI scripts support these common options:
- --api-url URL - LLM API URL (default: from DF_API_URL env var)
- --api-key KEY - API key (default: from DF_API_KEY env var)
- --model NAME - Text model name (default: varies by script)
- --output-dir DIR - Custom output directory (default: outputs/cli/{script_name}/{timestamp})
- --help - Show detailed help message
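A sketch of how these shared flags could be wired up with argparse (`add_common_options` is illustrative; the real scripts may structure this differently):

```python
import argparse
import os


def add_common_options(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Attach the options shared by all Paper2Any CLI scripts."""
    parser.add_argument("--api-url", default=os.environ.get("DF_API_URL"),
                        help="LLM API URL (default: DF_API_URL env var)")
    parser.add_argument("--api-key", default=os.environ.get("DF_API_KEY"),
                        help="API key (default: DF_API_KEY env var)")
    parser.add_argument("--model", default=None, help="Text model name")
    parser.add_argument("--output-dir", default=None, help="Custom output directory")
    return parser


# Example: explicit flags override the environment defaults.
args = add_common_options(argparse.ArgumentParser()).parse_args(
    ["--api-key", "sk-xxx", "--model", "gpt-4o"])
```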
For complete parameter documentation, run any script with --help.
Project Structure
Paper2Any/
+-- dataflow_agent/ # Core codebase
| +-- agentroles/ # Agent definitions
| | +-- paper2any_agents/ # Paper2Any-specific agents
| +-- workflow/ # Workflow definitions
| +-- promptstemplates/ # Prompt templates
| +-- toolkits/ # Toolkits (drawing, PPT generation, etc.)
+-- fastapi_app/ # Backend API service
+-- frontend-workflow/ # Frontend web interface
+-- static/ # Static assets
+-- script/ # Script tools
+-- tests/ # Test cases
Roadmap
Contributing
We welcome all forms of contribution!
License
This project is licensed under Apache License 2.0.