Paper2Any
English | Chinese
Focused on multimodal paper workflows: from paper PDFs, screenshots, or text to one-click generation of model diagrams, technical roadmaps, experimental plots, and slide decks
| Universal File Support | AI-Powered Generation | Custom Styling | Lightning Speed |
Table of Contents
- News
- Core Features
- Showcase
- Drawio
- Quick Start
- Project Structure
- Roadmap
- Contributing
News
Tip
[NEW] 2026-02-02 * Paper2Rebuttal
Added rebuttal drafting support with structured response guidance and image-aware revision prompts.
Tip
[NEW] 2026-01-28 * Drawio Update
Added Drawio support for visual diagram creation and showcase-ready outputs in the workflow.
KB updates in one line: multi-file PPT generation with doc convert/merge, optional image injection, and embedding-assisted retrieval.
Tip
[NEW] 2026-01-25 * New Features
Added AI-assisted outline editing, three-layer model configuration system for flexible model selection, and user points management with daily quota allocation.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
[NEW] 2026-01-20 * Bug Fixes
Fixed bugs in experimental plot generation (image/text) and resolved the missing historical files issue.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/
Tip
[NEW] 2026-01-14 * Feature Updates & Backend Architecture Upgrade
- Feature Updates: Added Image2PPT, optimized Paper2Figure interaction, and improved PDF2PPT effects.
- Standardized API: Refactored backend interfaces with RESTful /api/v1/structure, removing obsolete endpoints for better maintainability.
- Dynamic Configuration: Supported dynamic model selection (e.g., GPT-4o, Qwen-VL) via API parameters, eliminating hardcoded model dependencies.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/
- 2025-12-12 * Paper2Figure Web public beta is live
- 2025-10-01 * Released the first version 0.1.0
Core Features
From paper PDFs / images / text to editable scientific figures, slide decks, video scripts, academic posters, and other multimodal content in one click.
Paper2Any currently includes the following sub-capabilities:
- Paper2Figure - Editable Scientific Figures: Model architecture diagrams, technical roadmaps (PPT + SVG), and experimental plots with editable PPTX output.
- Paper2Diagram / Image2Drawio - Editable Diagrams: Generate draw.io diagrams from paper/text or images, with drawio/png/svg export and chat-based edits.
- Paper2PPT - Editable Slide Decks: Paper/text/topic to PPT, long-doc support, and built-in table/figure extraction.
- Paper2Rebuttal: Draft structured rebuttals and revision responses with claims-to-evidence grounding.
- PDF2PPT - Layout-Preserving Conversion: Accurate layout retention when converting PDFs to editable PPTX.
- Image2PPT - Image to Slides: Convert images or screenshots into structured slides.
- PPTPolish - Smart Beautification: AI-based layout optimization and style transfer.
- Paper2Video: Generate video scripts and narration assets.
- Paper2Technical: Produce technical reports and method summaries.
- Knowledge Base (KB): Ingest/embedding, semantic search, and KB-driven PPT/podcast/mindmap generation.
Showcase
Drawio
Diagram generation (mindmap / flowchart / ER ...)
Model diagrams from PDF or text (research figure generation)
Paper2Rebuttal: Rebuttal Drafting
Paper2Figure: Scientific Figure Generation
Paper2PPT: Paper to Presentation
PPT Generation Demo
Paper / Text / Topic to PPT
Long Document Support (40+ Slides)
PPT Smart Beautification
PDF2PPT: Layout-Preserving Conversion
Quick Start
Requirements
Docker (Recommended) -- Deployment & Updates
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Configure environment variables
cp fastapi_app/.env.example fastapi_app/.env
cp frontend-workflow/.env.example frontend-workflow/.env
Required configuration:
fastapi_app/.env (backend):
DEFAULT_LLM_API_URL=https://api.openai.com/v1/
# Optional: Supabase (skip for no auth -- core features still work)
# SUPABASE_URL=https://your-project-id.supabase.co
# SUPABASE_ANON_KEY=your_supabase_anon_key
frontend-workflow/.env (frontend):
VITE_DEFAULT_LLM_API_URL=https://api.openai.com/v1
VITE_LLM_API_URLS=https://api.openai.com/v1
# Optional: Supabase (keep consistent with backend)
# VITE_SUPABASE_URL=https://your-project-id.supabase.co
# VITE_SUPABASE_ANON_KEY=your_supabase_anon_key
# 3. Build and start services
docker compose up -d --build
Open:
- Frontend: http://localhost:3000
- Backend health: http://localhost:8000/health
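Once the containers are up, reachability can also be verified programmatically. A minimal Python sketch using only the standard library (the helper name `check_health` is illustrative, not part of the codebase):

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers HTTP 200, False if unreachable or erroring."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Example: probe the backend health endpoint started by docker compose.
backend_ok = check_health("http://localhost:8000/health")
```

This returns False rather than raising when the service is down, which makes it convenient for wait-until-ready loops.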
GPU services note: Docker only starts the frontend and backend. No GPU model services are included.
- Paper2PPT, Paper2Figure, Knowledge Base, etc. only need LLM APIs and work out of the box.
- PDF2PPT, Image2PPT, Image2Drawio require the SAM3 segmentation service (needs GPU), deployed separately:
# On a machine with a GPU
python -m dataflow_agent.toolkits.model_servers.sam3_server \
  --port 8001 --checkpoint models/sam3/sam3.pt \
  --bpe models/sam3/bpe_simple_vocab_16e6.txt.gz --device cuda

Then add to fastapi_app/.env:

SAM3_SERVER_URLS=http://GPU_MACHINE_IP:8001

See the "Advanced: Local Model Server Load Balancing" section below for details.
Modify & update:
- After changing code or .env, rebuild: docker compose up -d --build
- Pull latest code and rebuild: git pull && docker compose up -d --build
Common commands:
- View logs: docker compose logs -f
- Stop services: docker compose down
Notes:
- The first build may take a while (system deps + Python deps).
- Frontend env is baked at build time (compose build args). If you change it, rebuild with docker compose up -d --build.
- Outputs and models are mounted to the host (./outputs, ./models) for persistence.
Linux Installation
We recommend using Conda to create an isolated environment (Python 3.11).
1. Create Environment & Install Base Dependencies
conda create -n paper2any python=3.11 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-base.txt
# 3. Install in editable (dev) mode
pip install -e .
2. Install Paper2Any-specific Dependencies (Required)
Paper2Any involves LaTeX rendering, vector-graphics processing, and PPT/PDF conversion, which require extra dependencies:
# 1. Paper-specific dependencies (falls back to the backup list on failure)
pip install -r requirements-paper.txt || pip install -r requirements-paper-backup.txt
# 2. LaTeX engine (tectonic) - recommended via conda
conda install -c conda-forge tectonic -y
# 3. Resolve doclayout_yolo dependency conflicts (Important)
pip install doclayout_yolo --no-deps
# 4. System dependencies (Ubuntu example)
sudo apt-get update
sudo apt-get install -y inkscape libreoffice poppler-utils wkhtmltopdf
3. Environment Variables
export DF_API_URL=xxx # Optional: if you need a third-party API gateway
export MINERU_DEVICES="0,1,2,3" # Optional: MinerU task GPU resource pool
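MINERU_DEVICES is a comma-separated list of GPU ids. A small sketch of how such a pool could be parsed on the Python side (the helper name `parse_devices` is illustrative, not part of the codebase):

```python
import os


def parse_devices(spec: str) -> list[int]:
    """Parse a comma-separated GPU id list like "0,1,2,3" into integers."""
    return [int(x) for x in spec.split(",") if x.strip()]


# Example: read the pool from the environment, defaulting to GPU 0.
pool = parse_devices(os.environ.get("MINERU_DEVICES", "0"))
```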
Tip
For detailed configuration guide, see Configuration Guide for step-by-step instructions on configuring models, environment variables, and starting services.
4. Configure Environment Files (Optional)
Click to expand: Detailed .env Configuration Guide
Paper2Any uses two .env files for configuration. Both are optional - you can run the application without them using default settings.
Step 1: Copy Example Files
cp fastapi_app/.env.example fastapi_app/.env
# Copy frontend environment file
cp frontend-workflow/.env.example frontend-workflow/.env
Step 2: Backend Configuration (fastapi_app/.env)
Supabase (Optional) - Only needed if you want user authentication and cloud storage:
SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_supabase_anon_key
Model Configuration - Customize which models to use for different workflows:
DEFAULT_LLM_API_URL=http://123.129.219.111:3000/v1/
# Workflow-level defaults
PAPER2PPT_DEFAULT_MODEL=gpt-5.1
PAPER2PPT_DEFAULT_IMAGE_MODEL=gemini-3-pro-image-preview
PDF2PPT_DEFAULT_MODEL=gpt-4o
# ... see .env.example for full list
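These workflow-level defaults sit between a per-request model choice and a global fallback. A minimal sketch of that precedence (the function name `resolve_model` and the fallback value are illustrative assumptions, not the project's actual implementation):

```python
import os


def resolve_model(request_model, workflow_env_var: str, fallback: str = "gpt-4o") -> str:
    """Request-level choice wins, then the workflow default from .env, then a global fallback."""
    return request_model or os.environ.get(workflow_env_var) or fallback


# Example: with PAPER2PPT_DEFAULT_MODEL=gpt-5.1 set, a request without an
# explicit model gets the workflow default.
os.environ["PAPER2PPT_DEFAULT_MODEL"] = "gpt-5.1"
chosen = resolve_model(None, "PAPER2PPT_DEFAULT_MODEL")
```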
Step 3: Frontend Configuration (frontend-workflow/.env)
LLM Provider Configuration - Controls the API endpoint dropdown in the UI:
VITE_DEFAULT_LLM_API_URL=https://api.apiyi.com/v1
# Available API URLs in the dropdown (comma-separated)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://b.apiyi.com:16888/v1,http://123.129.219.111:3000/v1
What happens when you modify VITE_LLM_API_URLS:
- The frontend will display a dropdown menu with all URLs you specify
- Users can select different API endpoints without manually typing URLs
- Useful for switching between OpenAI, local models, or custom API gateways
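The dropdown is essentially an ordered, de-duplicated list built from these two variables. A Python sketch of how such a list could be assembled (`build_dropdown` is illustrative; the actual frontend logic lives in TypeScript):

```python
def build_dropdown(urls_csv: str, default_url: str) -> list[str]:
    """Split the comma-separated list, ensure the default is present, drop duplicates in order."""
    urls = [u.strip() for u in urls_csv.split(",") if u.strip()]
    if default_url and default_url not in urls:
        urls.insert(0, default_url)
    seen, ordered = set(), []
    for u in urls:
        if u not in seen:
            seen.add(u)
            ordered.append(u)
    return ordered
```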
Supabase (Optional) - Uncomment these lines if you want user authentication:
VITE_SUPABASE_URL=https://your-project-id.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_JWT_SECRET=your-jwt-secret
Running Without Supabase
If you skip Supabase configuration:
- All core features work normally
- CLI scripts work without any configuration
- No user authentication or quotas
- No cloud file storage
Note
Quick Start: You can skip the .env configuration entirely and use CLI scripts directly with the --api-key parameter. See the CLI Scripts section below.
Advanced Configuration: Local Model Service Load Balancing
If you are deploying in a high-concurrency local environment, you can use script/start_model_servers.sh to start a local model service cluster (MinerU / SAM / OCR).
Script location: /DataFlow-Agent/script/start_model_servers.sh
Main configuration items:
- MinerU (PDF Parsing)
  - MINERU_MODEL_PATH: model path (default: models/MinerU2.5-2509-1.2B)
  - MINERU_GPU_UTIL: GPU memory utilization (default: 0.2)
  - Instance configuration: by default, 4 instances each are started on GPU 0 and GPU 4 (8 in total), ports 8011-8018.
  - Load balancer: port 8010, dispatches requests automatically.
- SAM3 (Segment Anything Model 3)
  - Instance configuration: by default, one instance per configured GPU, ports starting from 8021.
  - Model assets: default paths are ./models/sam3/sam3.pt and ./models/sam3/bpe_simple_vocab_16e6.txt.gz.
  - Load balancer: port 8020.
- OCR (PaddleOCR)
  - Config: runs on CPU, uses uvicorn's worker mechanism (4 workers by default).
  - Port: 8003.
Before using the script, adjust gpu_id and the number of instances to match your actual GPU count and memory.
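One simple way such a balancer could dispatch requests across the instance ports is round-robin. A minimal sketch (the helper `make_dispatcher` is illustrative, not the script's actual implementation):

```python
import itertools


def make_dispatcher(urls: list[str]):
    """Return a callable that yields server URLs in round-robin order."""
    ring = itertools.cycle(urls)
    return lambda: next(ring)


# Example: the default SAM3 instance ports start from 8021.
next_server = make_dispatcher([f"http://127.0.0.1:{p}" for p in (8021, 8022)])
```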
To migrate the SAM3 assets into this repository, run:
# or: bash script/setup_sam3_assets.sh copy
For a local one-command development test on a single GPU (SAM3 + backend + frontend), run:
Windows Installation
Note
We currently recommend trying Paper2Any on Linux / WSL. If you need to deploy on native Windows, please follow the steps below.
1. Create Environment & Install Base Dependencies
conda create -n paper2any python=3.12 -y
conda activate paper2any
# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any
# 2. Install base dependencies
pip install -r requirements-win-base.txt
# 3. Install in editable (dev) mode
pip install -e .
2. Install Paper2Any-specific Dependencies (Recommended)
Paper2Any involves LaTeX rendering and vector graphics processing, which require extra dependencies (see requirements-paper.txt):
pip install -r requirements-paper.txt
# tectonic: LaTeX engine (recommended via conda)
conda install -c conda-forge tectonic -y
Install Inkscape (SVG/Vector Graphics Processing | Recommended/Required)
- Download and install (Windows 64-bit MSI): Inkscape Download
- Add the Inkscape executable directory to the system environment variable Path (example):
C:\Program Files\Inkscape\bin\
Tip
After configuring the Path, it is recommended to reopen the terminal (or restart VS Code / PowerShell) to ensure the environment variables take effect.
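Whether the new terminal can actually see Inkscape is easy to verify with the standard library (a small sketch; the helper name `tool_on_path` is illustrative):

```python
import shutil


def tool_on_path(name: str) -> bool:
    """True if the executable is resolvable via the current PATH."""
    return shutil.which(name) is not None


# Example: after updating Path and reopening the terminal,
# tool_on_path("inkscape") should report whether Inkscape is visible.
inkscape_visible = tool_on_path("inkscape")
```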
Install Windows Build of vLLM (Optional | For Local Inference Acceleration)
Release page: vllm-windows releases
Recommended version: 0.11.0
Important
Please make sure the .whl matches your current environment:
- Python: cp312 (Python 3.12)
- Platform: win_amd64
- CUDA: cu124 (must match your local CUDA / driver)
Launch Application
Paper2Any - Paper Workflow Web Frontend (Recommended)
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000
# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev
Configure Frontend Proxy
Modify server.proxy in frontend-workflow/vite.config.ts:
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000,
    open: true,
    allowedHosts: true,
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:8000', // FastAPI backend address
        changeOrigin: true,
      },
    },
  },
})
Visit http://localhost:3000.
Windows: Load MinerU Pre-trained Model
vllm serve opendatalab/MinerU2.5-2509-1.2B `
--host 127.0.0.1 `
--port 8010 `
--logits-processors mineru_vl_utils:MinerULogitsProcessor `
--gpu-memory-utilization 0.6 `
--trust-remote-code `
--enforce-eager
CLI Scripts (Command-Line Interface)
Paper2Any provides standalone CLI scripts that accept command-line parameters for direct workflow execution without requiring the web frontend/backend.
Environment Variables
Configure API access via environment variables (optional):
export DF_API_KEY=sk-xxx # API key
export DF_MODEL=gpt-4o # Default model
Available CLI Scripts
1. Paper2Figure CLI - Generate scientific figures (3 types)
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type model_arch \
--api-key sk-xxx
# Generate technical roadmap from text
python script/run_paper2figure_cli.py \
--input "Transformer architecture with attention mechanism" \
--input-type TEXT \
--graph-type tech_route
# Generate experimental data visualization
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type exp_data
Graph types: model_arch (model architecture), tech_route (technical roadmap), exp_data (experimental plots)
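The three graph types can also be driven programmatically, e.g. when batching over many papers. A sketch that builds the CLI argument vector (`figure_cli_cmd` is a hypothetical helper wrapping the script shown above, not part of the codebase):

```python
GRAPH_TYPES = {"model_arch", "tech_route", "exp_data"}


def figure_cli_cmd(input_path: str, graph_type: str, api_key=None) -> list[str]:
    """Build the argument vector for run_paper2figure_cli.py."""
    if graph_type not in GRAPH_TYPES:
        raise ValueError(f"unknown graph type: {graph_type}")
    cmd = ["python", "script/run_paper2figure_cli.py",
           "--input", input_path, "--graph-type", graph_type]
    if api_key:
        cmd += ["--api-key", api_key]
    return cmd
```

The resulting list can be passed directly to subprocess.run for each paper in a batch.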
2. Paper2PPT CLI - Convert papers to PPT presentations
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--api-key sk-xxx \
--page-count 15
# With custom style
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--style "Academic style; English; Modern design" \
--language en
3. PDF2PPT CLI - One-click PDF to editable PPT
python script/run_pdf2ppt_cli.py --input slides.pdf
# With AI enhancement
python script/run_pdf2ppt_cli.py \
--input slides.pdf \
--use-ai-edit \
--api-key sk-xxx
4. Image2PPT CLI - Convert images to editable PPT
python script/run_image2ppt_cli.py --input screenshot.png
# With AI enhancement
python script/run_image2ppt_cli.py \
--input diagram.jpg \
--use-ai-edit \
--api-key sk-xxx
5. PPT2Polish CLI - Beautify existing PPT files
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Academic style, clean and elegant" \
--api-key sk-xxx
# With reference image for style consistency
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Modern minimalist style" \
--ref-img reference_style.png \
--api-key sk-xxx
Note
System Requirements for PPT2Polish:
- LibreOffice: sudo apt-get install libreoffice (Ubuntu/Debian)
- pdf2image: pip install pdf2image
- poppler-utils: sudo apt-get install poppler-utils
Common Options
All CLI scripts support these common options:
- --api-url URL - LLM API URL (default: from DF_API_URL env var)
- --api-key KEY - API key (default: from DF_API_KEY env var)
- --model NAME - Text model name (default: varies by script)
- --output-dir DIR - Custom output directory (default: outputs/cli/{script_name}/{timestamp})
- --help - Show detailed help message
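A sketch of how these shared flags could be wired up with argparse (`add_common_options` is illustrative; the real scripts may structure this differently):

```python
import argparse
import os


def add_common_options(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Attach the options shared by all Paper2Any CLI scripts."""
    parser.add_argument("--api-url", default=os.environ.get("DF_API_URL"),
                        help="LLM API URL (default: DF_API_URL env var)")
    parser.add_argument("--api-key", default=os.environ.get("DF_API_KEY"),
                        help="API key (default: DF_API_KEY env var)")
    parser.add_argument("--model", default=None, help="Text model name")
    parser.add_argument("--output-dir", default=None, help="Custom output directory")
    return parser


# Example: explicit flags override the environment defaults.
args = add_common_options(argparse.ArgumentParser()).parse_args(
    ["--api-key", "sk-xxx", "--model", "gpt-4o"])
```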
For complete parameter documentation, run any script with --help.
Project Structure
Paper2Any/
+-- dataflow_agent/ # Core codebase
| +-- agentroles/ # Agent definitions
| | +-- paper2any_agents/ # Paper2Any-specific agents
| +-- workflow/ # Workflow definitions
| +-- promptstemplates/ # Prompt templates
| +-- toolkits/ # Toolkits (drawing, PPT generation, etc.)
+-- fastapi_app/ # Backend API service
+-- frontend-workflow/ # Frontend web interface
+-- static/ # Static assets
+-- script/ # Script tools
+-- tests/ # Test cases
Roadmap
Contributing
We welcome all forms of contribution!
License
This project is licensed under Apache License 2.0.