Paper2Any

English | 中文

Focus on paper multimodal workflows: from paper PDFs/screenshots/text to one-click generation of model diagrams, technical roadmaps, experimental plots, and slide decks

Universal File Support · AI-Powered Generation · Custom Styling · Lightning Speed




Table of Contents

  • News
  • Core Features
  • Showcase
  • Drawio
  • Quick Start
  • Project Structure
  • Roadmap
  • Contributing

News

Tip

[NEW] 2026-02-02 * Paper2Rebuttal
Added rebuttal drafting support with structured response guidance and image-aware revision prompts.

Tip

[NEW] 2026-01-28 * Drawio Update
Added Drawio support for visual diagram creation and showcase-ready outputs in the workflow.
KB updates in brief: multi-file PPT generation with document conversion/merging, optional image injection, and embedding-assisted retrieval.

Tip

[NEW] 2026-01-25 * New Features
Added AI-assisted outline editing, three-layer model configuration system for flexible model selection, and user points management with daily quota allocation.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/

Tip

[NEW] 2026-01-20 * Bug Fixes
Fixed bugs in experimental plot generation (image/text) and resolved the missing historical files issue.
Online Demo: http://dcai-paper2any.nas.cpolar.cn/

Tip

[NEW] 2026-01-14 * Feature Updates & Backend Architecture Upgrade

  1. Feature Updates: Added Image2PPT, optimized Paper2Figure interaction, and improved PDF2PPT effects.
  2. Standardized API: Refactored backend interfaces with RESTful /api/v1/ structure, removing obsolete endpoints for better maintainability.
  3. Dynamic Configuration: Supported dynamic model selection (e.g., GPT-4o, Qwen-VL) via API parameters, eliminating hardcoded model dependencies.
    Online Demo: http://dcai-paper2any.nas.cpolar.cn/
  • 2025-12-12 * Paper2Figure Web public beta is live
  • 2025-10-01 * Released the first version 0.1.0

Core Features

From paper PDFs / images / text to editable scientific figures, slide decks, video scripts, academic posters, and other multimodal content in one click.

Paper2Any currently includes the following sub-capabilities:

  • Paper2Figure - Editable Scientific Figures: Model architecture diagrams, technical roadmaps (PPT + SVG), and experimental plots with editable PPTX output.
  • Paper2Diagram / Image2Drawio - Editable Diagrams: Generate draw.io diagrams from paper/text or images, with drawio/png/svg export and chat-based edits.
  • Paper2PPT - Editable Slide Decks: Paper/text/topic to PPT, long-doc support, and built-in table/figure extraction.
  • Paper2Rebuttal: Draft structured rebuttals and revision responses with claims-to-evidence grounding.
  • PDF2PPT - Layout-Preserving Conversion: Accurate layout retention when converting PDF to editable PPTX.
  • Image2PPT - Image to Slides: Convert images or screenshots into structured slides.
  • PPTPolish - Smart Beautification: AI-based layout optimization and style transfer.
  • Paper2Video: Generate video scripts and narration assets.
  • Paper2Technical: Produce technical reports and method summaries.
  • Knowledge Base (KB): Ingest/embedding, semantic search, and KB-driven PPT/podcast/mindmap generation.

Showcase

Drawio



Diagram generation (mindmap / flowchart / ER ...)




Model diagrams from PDF or text (research figure generation)




Image to editable DrawIO diagram


Paper2Rebuttal: Rebuttal Drafting



Rebuttal drafting and revision support

Paper2Figure: Scientific Figure Generation



Model Architecture Diagram Generation







Technical Roadmap Generation




Experimental Plot Generation (Multiple Styles)


Paper2PPT: Paper to Presentation



PPT Generation Demo

Paper / Text / Topic to PPT




Long Document Support (40+ Slides)




Intelligent Table Extraction & Insertion




AI-Assisted Outline Editing




Version History Management


PPT Smart Beautification




AI-based Layout Optimization & Style Transfer

PDF2PPT: Layout-Preserving Conversion



Intelligent Cutout & Layout Preservation

Image2PPT

Quick Start

Requirements

Docker (Recommended) -- Deployment & Updates
# 1. Clone
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any

# 2. Configure environment variables
cp fastapi_app/.env.example fastapi_app/.env
cp frontend-workflow/.env.example frontend-workflow/.env

Required configuration:

fastapi_app/.env (backend):

# Required: Your LLM API URL (replace with your own)
DEFAULT_LLM_API_URL=https://api.openai.com/v1/
# Optional: Supabase (skip for no auth -- core features still work)
# SUPABASE_URL=https://your-project-id.supabase.co
# SUPABASE_ANON_KEY=your_supabase_anon_key

frontend-workflow/.env (frontend):

# Required: LLM API URLs available in the UI dropdown (comma separated)
VITE_DEFAULT_LLM_API_URL=https://api.openai.com/v1
VITE_LLM_API_URLS=https://api.openai.com/v1
# Optional: Supabase (keep consistent with backend)
# VITE_SUPABASE_URL=https://your-project-id.supabase.co
# VITE_SUPABASE_ANON_KEY=your_supabase_anon_key
# 3. Build + run
docker compose up -d --build

Open the frontend in your browser once the containers are up.

GPU services note: Docker only starts the frontend and backend. No GPU model services are included.

  • Paper2PPT, Paper2Figure, Knowledge Base, etc. only need LLM APIs and work out of the box.
  • PDF2PPT, Image2PPT, Image2Drawio require the SAM3 segmentation service (needs GPU), deployed separately:
    # On a machine with GPU
    python -m dataflow_agent.toolkits.model_servers.sam3_server \
    --port 8001 --checkpoint models/sam3/sam3.pt \
    --bpe models/sam3/bpe_simple_vocab_16e6.txt.gz --device cuda
    Then add to fastapi_app/.env: SAM3_SERVER_URLS=http://GPU_MACHINE_IP:8001

See the "Advanced: Local Model Server Load Balancing" section below for details.

Modify & update:

  • After changing code or .env, rebuild: docker compose up -d --build
  • Pull latest code and rebuild:
    • git pull
    • docker compose up -d --build

Common commands:

  • View logs: docker compose logs -f
  • Stop services: docker compose down

Notes:

  • The first build may take a while (system deps + Python deps).
  • Frontend env is baked at build time (compose build args). If you change it, rebuild with docker compose up -d --build.
  • Outputs/models are mounted to the host (./outputs, ./models) for persistence.

Linux Installation

We recommend using Conda to create an isolated environment (Python 3.11).

1. Create Environment & Install Base Dependencies

# 0. Create and activate a conda environment
conda create -n paper2any python=3.11 -y
conda activate paper2any

# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any

# 2. Install base dependencies
pip install -r requirements-base.txt

# 3. Install in editable (dev) mode
pip install -e .

2. Install Paper2Any-specific Dependencies (Required)

Paper2Any involves LaTeX rendering, vector graphics processing, and PPT/PDF conversion, which require extra dependencies:

# 1. Python dependencies
pip install -r requirements-paper.txt || pip install -r requirements-paper-backup.txt

# 2. LaTeX engine (tectonic) - recommended via conda
conda install -c conda-forge tectonic -y

# 3. Resolve doclayout_yolo dependency conflicts (Important)
pip install doclayout_yolo --no-deps

# 4. System dependencies (Ubuntu example)
sudo apt-get update
sudo apt-get install -y inkscape libreoffice poppler-utils wkhtmltopdf

3. Environment Variables

export DF_API_KEY=your_api_key_here
export DF_API_URL=xxx # Optional: if you need a third-party API gateway
export MINERU_DEVICES="0,1,2,3" # Optional: MinerU task GPU resource pool
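These variables can be read from Python with fallbacks. A minimal sketch (variable names from above; the defaults and dict shape are illustrative assumptions, not the project's actual config loader):

```python
import os

def get_api_config():
    """Read Paper2Any API settings from the environment, with illustrative fallbacks."""
    return {
        "api_key": os.environ.get("DF_API_KEY", ""),  # required in practice
        "api_url": os.environ.get("DF_API_URL", "https://api.openai.com/v1"),  # optional gateway
        "mineru_devices": os.environ.get("MINERU_DEVICES", "0").split(","),  # GPU pool
    }
```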

Tip

For detailed configuration guide, see Configuration Guide for step-by-step instructions on configuring models, environment variables, and starting services.

4. Configure Environment Files (Optional)

Click to expand: Detailed .env Configuration Guide

Paper2Any uses two .env files for configuration. Both are optional - you can run the application without them using default settings.

Step 1: Copy Example Files
# Copy backend environment file
cp fastapi_app/.env.example fastapi_app/.env

# Copy frontend environment file
cp frontend-workflow/.env.example frontend-workflow/.env
Step 2: Backend Configuration (fastapi_app/.env)

Supabase (Optional) - Only needed if you want user authentication and cloud storage:

SUPABASE_URL=https://your-project-id.supabase.co
SUPABASE_ANON_KEY=your_supabase_anon_key

Model Configuration - Customize which models to use for different workflows:

# Default LLM API URL
DEFAULT_LLM_API_URL=http://123.129.219.111:3000/v1/

# Workflow-level defaults
PAPER2PPT_DEFAULT_MODEL=gpt-5.1
PAPER2PPT_DEFAULT_IMAGE_MODEL=gemini-3-pro-image-preview
PDF2PPT_DEFAULT_MODEL=gpt-4o
# ... see .env.example for full list
Step 3: Frontend Configuration (frontend-workflow/.env)

LLM Provider Configuration - Controls the API endpoint dropdown in the UI:

# Default API URL shown in the UI
VITE_DEFAULT_LLM_API_URL=https://api.apiyi.com/v1

# Available API URLs in the dropdown (comma-separated)
VITE_LLM_API_URLS=https://api.apiyi.com/v1,http://b.apiyi.com:16888/v1,http://123.129.219.111:3000/v1

What happens when you modify VITE_LLM_API_URLS:

  • The frontend will display a dropdown menu with all URLs you specify
  • Users can select different API endpoints without manually typing URLs
  • Useful for switching between OpenAI, local models, or custom API gateways
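Presumably the frontend splits this value on commas; a minimal sketch of that parsing (illustrative, not the project's actual code):

```python
def parse_llm_api_urls(raw):
    """Split a comma-separated URL list, dropping whitespace and empty entries."""
    return [u.strip() for u in raw.split(",") if u.strip()]

urls = parse_llm_api_urls("https://api.openai.com/v1, http://localhost:3000/v1")
# urls == ["https://api.openai.com/v1", "http://localhost:3000/v1"]
```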

Supabase (Optional) - Uncomment these lines if you want user authentication:

VITE_SUPABASE_URL=https://your-project.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
SUPABASE_JWT_SECRET=your-jwt-secret
Running Without Supabase

If you skip Supabase configuration:

  • All core features work normally
  • CLI scripts work without any configuration
  • No user authentication or quotas
  • No cloud file storage

Note

Quick Start: You can skip the .env configuration entirely and use CLI scripts directly with --api-key parameter. See CLI Scripts section below.


Advanced Configuration: Local Model Service Load Balancing

If you are deploying in a high-concurrency local environment, you can use script/start_model_servers.sh to start a local model service cluster (MinerU / SAM / OCR).

Script location: /DataFlow-Agent/script/start_model_servers.sh

Main configuration items:

  • MinerU (PDF Parsing)

    • MINERU_MODEL_PATH: Model path (default models/MinerU2.5-2509-1.2B)
    • MINERU_GPU_UTIL: GPU memory utilization (default 0.2)
    • Instance configuration: By default, 4 instances are started on each of GPU 0 and GPU 4 (8 in total), on ports 8011-8018.
    • Load Balancer: Port 8010, automatically dispatches requests.
  • SAM3 (Segment Anything Model 3)

    • Instance configuration: By default, one instance per configured GPU, ports start from 8021.
    • Model assets: default paths are ./models/sam3/sam3.pt and ./models/sam3/bpe_simple_vocab_16e6.txt.gz.
    • Load Balancer: Port 8020.
  • OCR (PaddleOCR)

    • Config: Runs on CPU, uses uvicorn's worker mechanism (4 workers by default).
    • Port: 8003.

Before using, please modify gpu_id and the number of instances in the script according to your actual GPU count and memory.
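The load balancers above automatically dispatch requests across instances. As a conceptual illustration only (not the project's actual dispatcher), round-robin selection over a list of instance URLs can be sketched as:

```python
import itertools

def make_round_robin(urls):
    """Return a picker that cycles through server URLs in order."""
    cycle = itertools.cycle(urls)
    return lambda: next(cycle)

# Hypothetical instance URLs, mirroring the SAM3 ports described above
servers = ["http://10.0.0.1:8021", "http://10.0.0.1:8022"]
pick = make_round_robin(servers)  # each call returns the next server in turn
```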

For SAM3 assets migration into this repository, run:

bash script/setup_sam3_assets.sh link
# or: bash script/setup_sam3_assets.sh copy

For local one-command development test on a single GPU (SAM3 + backend + frontend), run:

bash script/start_local_sam3_dev.sh

Windows Installation

Note

We currently recommend trying Paper2Any on Linux / WSL. If you need to deploy on native Windows, please follow the steps below.

1. Create Environment & Install Base Dependencies

# 0. Create and activate a conda environment
conda create -n paper2any python=3.12 -y
conda activate paper2any

# 1. Clone repository
git clone https://github.com/OpenDCAI/Paper2Any.git
cd Paper2Any

# 2. Install base dependencies
pip install -r requirements-win-base.txt

# 3. Install in editable (dev) mode
pip install -e .

2. Install Paper2Any-specific Dependencies (Recommended)

Paper2Any involves LaTeX rendering and vector graphics processing, which require extra dependencies (see requirements-paper.txt):

# Python dependencies
pip install -r requirements-paper.txt

# tectonic: LaTeX engine (recommended via conda)
conda install -c conda-forge tectonic -y

Install Inkscape (SVG/Vector Graphics Processing | Recommended/Required)

  1. Download and install (Windows 64-bit MSI): Inkscape Download
  2. Add the Inkscape executable directory to the system environment variable Path (example): C:\Program Files\Inkscape\bin\

Tip

After configuring the Path, it is recommended to reopen the terminal (or restart VS Code / PowerShell) to ensure the environment variables take effect.

Install Windows Build of vLLM (Optional | For Local Inference Acceleration)

Release page: vllm-windows releases
Recommended version: 0.11.0

pip install vllm-0.11.0+cu124-cp312-cp312-win_amd64.whl

Important

Please make sure the .whl matches your current environment:

  • Python: cp312 (Python 3.12)
  • Platform: win_amd64
  • CUDA: cu124 (must match your local CUDA / driver)
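You can sanity-check a wheel filename against your interpreter before installing. A small sketch based on the standard wheel naming convention (`{dist}-{version}-{python}-{abi}-{platform}.whl`); this is a generic helper, not a vLLM-specific tool:

```python
import sys

def parse_wheel_tags(filename):
    """Extract the (python, abi, platform) tags from a wheel filename."""
    stem = filename[:-len(".whl")]
    parts = stem.split("-")
    return parts[-3], parts[-2], parts[-1]

def wheel_matches_interpreter(filename):
    """True if the wheel's python tag matches the running CPython version."""
    py_tag, _, _ = parse_wheel_tags(filename)
    return py_tag == f"cp{sys.version_info.major}{sys.version_info.minor}"
```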

Launch Application

Paper2Any - Paper Workflow Web Frontend (Recommended)

# Start backend API
cd fastapi_app
uvicorn main:app --host 0.0.0.0 --port 8000

# Start frontend (new terminal)
cd frontend-workflow
npm install
npm run dev

Configure Frontend Proxy

Modify server.proxy in frontend-workflow/vite.config.ts:

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000,
    open: true,
    allowedHosts: true,
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:8000', // FastAPI backend address
        changeOrigin: true,
      },
    },
  },
})

Visit http://localhost:3000.

Windows: Load MinerU Pre-trained Model

# Start in PowerShell
vllm serve opendatalab/MinerU2.5-2509-1.2B `
--host 127.0.0.1 `
--port 8010 `
--logits-processors mineru_vl_utils:MinerULogitsProcessor `
--gpu-memory-utilization 0.6 `
--trust-remote-code `
--enforce-eager



CLI Scripts (Command-Line Interface)

Paper2Any provides standalone CLI scripts that accept command-line parameters for direct workflow execution without requiring the web frontend/backend.

Environment Variables

Configure API access via environment variables (optional):

export DF_API_URL=https://api.openai.com/v1 # LLM API URL
export DF_API_KEY=sk-xxx # API key
export DF_MODEL=gpt-4o # Default model

Available CLI Scripts

1. Paper2Figure CLI - Generate scientific figures (3 types)

# Generate model architecture diagram from PDF
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type model_arch \
--api-key sk-xxx

# Generate technical roadmap from text
python script/run_paper2figure_cli.py \
--input "Transformer architecture with attention mechanism" \
--input-type TEXT \
--graph-type tech_route

# Generate experimental data visualization
python script/run_paper2figure_cli.py \
--input paper.pdf \
--graph-type exp_data

Graph types: model_arch (model architecture), tech_route (technical roadmap), exp_data (experimental plots)

2. Paper2PPT CLI - Convert papers to PPT presentations

# Basic usage
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--api-key sk-xxx \
--page-count 15

# With custom style
python script/run_paper2ppt_cli.py \
--input paper.pdf \
--style "Academic style; English; Modern design" \
--language en

3. PDF2PPT CLI - One-click PDF to editable PPT

# Basic conversion (no AI enhancement)
python script/run_pdf2ppt_cli.py --input slides.pdf

# With AI enhancement
python script/run_pdf2ppt_cli.py \
--input slides.pdf \
--use-ai-edit \
--api-key sk-xxx

4. Image2PPT CLI - Convert images to editable PPT

# Basic conversion
python script/run_image2ppt_cli.py --input screenshot.png

# With AI enhancement
python script/run_image2ppt_cli.py \
--input diagram.jpg \
--use-ai-edit \
--api-key sk-xxx

5. PPT2Polish CLI - Beautify existing PPT files

# Basic beautification
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Academic style, clean and elegant" \
--api-key sk-xxx

# With reference image for style consistency
python script/run_ppt2polish_cli.py \
--input old_presentation.pptx \
--style "Modern minimalist style" \
--ref-img reference_style.png \
--api-key sk-xxx

Note

System Requirements for PPT2Polish:

  • LibreOffice: sudo apt-get install libreoffice (Ubuntu/Debian)
  • pdf2image: pip install pdf2image
  • poppler-utils: sudo apt-get install poppler-utils

Common Options

All CLI scripts support these common options:

  • --api-url URL - LLM API URL (default: from DF_API_URL env var)
  • --api-key KEY - API key (default: from DF_API_KEY env var)
  • --model NAME - Text model name (default: varies by script)
  • --output-dir DIR - Custom output directory (default: outputs/cli/{script_name}/{timestamp})
  • --help - Show detailed help message

For complete parameter documentation, run any script with --help:

python script/run_paper2figure_cli.py --help
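For batch runs over a folder of PDFs, the CLI can be driven from Python. A hypothetical wrapper (script path and flags taken from the examples above; the helper names are our own):

```python
import subprocess
from pathlib import Path

def build_figure_cmd(pdf, graph_type="model_arch", api_key=None):
    """Build the run_paper2figure_cli.py invocation for one input (flags as documented above)."""
    cmd = ["python", "script/run_paper2figure_cli.py",
           "--input", str(pdf), "--graph-type", graph_type]
    if api_key:
        cmd += ["--api-key", api_key]
    return cmd

def run_batch(pdf_dir, **kwargs):
    """Run the CLI once per PDF in a directory, stopping on the first failure."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        subprocess.run(build_figure_cmd(pdf, **kwargs), check=True)
```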

Project Structure

Paper2Any/
+-- dataflow_agent/ # Core codebase
| +-- agentroles/ # Agent definitions
| | +-- paper2any_agents/ # Paper2Any-specific agents
| +-- workflow/ # Workflow definitions
| +-- promptstemplates/ # Prompt templates
| +-- toolkits/ # Toolkits (drawing, PPT generation, etc.)
+-- fastapi_app/ # Backend API service
+-- frontend-workflow/ # Frontend web interface
+-- static/ # Static assets
+-- script/ # Script tools
+-- tests/ # Test cases

Roadmap

  • Paper2Figure - Editable Scientific Figures
  • Paper2Diagram - Drawio Diagrams
  • Paper2PPT - Editable Slide Decks
  • PDF2PPT - Layout-Preserving Conversion
  • Image2PPT - Image to Slides
  • PPTPolish - Smart Beautification
  • Knowledge Base - KB Workflows
  • Paper2Video - Video Script Generation

Contributing

We welcome all forms of contribution!


License

This project is licensed under Apache License 2.0.


If this project helps you, please give us a Star!



Scan to join the community WeChat group

Made with ❤️ by the OpenDCAI Team
