DeepPrune: Parallel Scaling without Inter-trace Redundancy

Paper | Project Page | Model | Data

Efficient reasoning at scale by pruning redundant reasoning traces--without sacrificing accuracy.


Overview

Large language models (LLMs) often generate multiple reasoning traces in parallel to improve answer reliability. However, these traces frequently exhibit severe inter-trace redundancy, leading to wasted computation and inflated inference costs.

DeepPrune addresses this by learning to identify and prune semantically redundant traces before full execution--enabling cost-effective parallel reasoning while preserving performance.

More details can be found on our website.

Results

See our paper and project page for detailed results.


Dependencies

git clone https://github.com/THU-KEG/DeepPrune.git
cd DeepPrune
pip install -r requirements.txt

We use LLaMA-Factory for model fine-tuning and inference; the version we used, modified to support Focal Loss, is provided in the Llama-Factory folder. If you want to clone LLaMA-Factory yourself, please refer to the GitHub issue for the required changes.

We use Qwen/Qwen3-4B-Instruct-2507 as the backbone LLM for DeepPrune. You can download it from https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507, or use another open-source LLM.
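
For reference, the backbone can be loaded with Hugging Face Transformers. This is a standard loading sketch, independent of the repo's own scripts:

# Standard Transformers loading sketch for the backbone model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)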


Dataset

The dataset provided here is incomplete due to its size. Please refer to https://huggingface.co/datasets/THU-KEG/DeepPrune for the full dataset.

To understand how to use the dataset, please refer to DeepPrune_data/README.md.
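
For a quick start, the full dataset can be loaded with the Hugging Face datasets library. This is an illustrative sketch; the actual split and field names are documented in DeepPrune_data/README.md:

# Illustrative loading sketch; see DeepPrune_data/README.md for the schema.
from datasets import load_dataset

ds = load_dataset("THU-KEG/DeepPrune")
print(ds)  # inspect the available splits and columns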


Preliminaries

To understand the motivation behind DeepPrune, explore the preliminary analysis in:

Preliminaries/Preliminary experiment.ipynb

This notebook includes:

  • Distribution of answer agreement:
    Most trace pairs yield the same answer, revealing significant redundancy in parallel reasoning.

  • ROC curves for redundancy detection:

    • Sentence-BERT (shallow similarity): AUROC = 0.58, limited discriminative power.
    • Qwen3-4B-Instruct (zero-shot LLM comparison): AUROC = 0.66, a moderate improvement but still suboptimal.
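
For intuition, the shallow-similarity baseline amounts to scoring each trace pair by embedding cosine similarity and measuring AUROC against the answer-equivalence labels. A minimal sketch, where the encoder choice and function names are illustrative rather than the notebook's exact code:

# Minimal sketch of the Sentence-BERT baseline (illustrative names).
from sentence_transformers import SentenceTransformer, util
from sklearn.metrics import roc_auc_score

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def shallow_similarity_auroc(pairs, labels):
    """pairs: list of (trace_a, trace_b); labels: 1 if same final answer."""
    scores = []
    for a, b in pairs:
        emb = encoder.encode([a, b], convert_to_tensor=True)
        scores.append(util.cos_sim(emb[0], emb[1]).item())
    return roc_auc_score(labels, scores)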

To reproduce the zero-shot Qwen3-4B-Instruct results:

  1. Prepare the evaluation dataset using DeepPrune/Offline/Ablation_Study.ipynb
  2. Run Preliminaries/zero_shot_exp.py

DeepPrune Pipeline

Prerequisites

  • Install Llama-Factory
  • Patch required: modify the codebase to support Focal Loss (see the GitHub issue for guidance; a minimal sketch of the loss follows).
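
For orientation, the standard focal loss has the following shape. This is a minimal PyTorch sketch of the textbook formulation, not the exact patch applied to LLaMA-Factory:

# Minimal focal-loss sketch (textbook formulation, not the exact patch).
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Down-weight easy examples: loss = alpha * (1 - p_t)^gamma * CE."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # model probability of the true class
    return (alpha * (1.0 - p_t) ** gamma * ce).mean()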

1 Prepare Finetuning Dataset

Generate the supervised training data for DeepPrune:

jupyter notebook DeepPrune/finetuning/build_finetune_dataset.ipynb

This constructs pairwise trace comparisons labeled by answer equivalence.
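
Conceptually, each supervised example pairs two traces for the same question and labels them by whether they reach the same final answer. A hypothetical sketch, with field names that are illustrative rather than the notebook's exact schema:

# Hypothetical sketch of the pairwise labeling step (illustrative schema).
from itertools import combinations

def build_pairs(traces, answers):
    """Label a pair "yes" if both traces reach the same final answer."""
    records = []
    for i, j in combinations(range(len(traces)), 2):
        records.append({
            "instruction": "Do these two reasoning traces reach the same final answer?",
            "input": f"Trace A:\n{traces[i]}\n\nTrace B:\n{traces[j]}",
            "output": "yes" if answers[i] == answers[j] else "no",
        })
    return records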


2 Offline Training

Train the DeepPrune model using supervised fine-tuning:

  • Config: DeepPrune/Offline/Qwen3_full_sft.yaml
  • Framework: Llama-Factory
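
With LLaMA-Factory installed, training is typically launched through its standard CLI (assuming the usual llamafactory-cli entry point):

llamafactory-cli train DeepPrune/Offline/Qwen3_full_sft.yaml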

After training:

  1. Generate test data: DeepPrune/Offline/Ablation_Study.ipynb
  2. Evaluate performance:
    python DeepPrune/Offline/test_model_performance_parallel.py
  3. Visualize results: DeepPrune/Offline/check_model_output.ipynb

Expect significant gains over shallow similarity baselines (AUROC > 0.83 in our experiments).


3 Online Pruning

Deploy DeepPrune for real-time trace pruning during inference:

  1. Establish baselines:
    Run DeepPrune/Online/check_pass_k.ipynb to compute:

    • pass@1: Accuracy with single trace
    • cons@512: Consensus accuracy with 512 traces
  2. Apply DeepPrune:

    python DeepPrune/Online/greedy_cluster_threshold.py

    This performs greedy clustering of traces using DeepPrune's similarity scores and prunes redundant ones; a minimal sketch follows this list.

  3. Trade-off control:
    Adjust the similarity threshold to balance:

    • Cost reduction (fewer traces executed)
    • Performance retention (maintained consensus accuracy)
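
As a rough sketch of the idea (illustrative names; see greedy_cluster_threshold.py for the actual implementation): each trace is compared against the representatives of existing clusters, pruned when DeepPrune's predicted same-answer probability exceeds the threshold, and the surviving traces vote on the final answer.

# Illustrative sketch of greedy threshold clustering plus consensus voting;
# `same_answer_prob` stands in for DeepPrune's pairwise judge model.
from collections import Counter

def greedy_cluster(traces, same_answer_prob, threshold=0.5):
    """Keep one representative per cluster; prune a trace if its predicted
    same-answer probability with an existing representative >= threshold."""
    reps = []
    for trace in traces:
        if not any(same_answer_prob(trace, rep) >= threshold for rep in reps):
            reps.append(trace)
    return reps

def consensus_answer(final_answers):
    """Majority vote over the final answers of the retained traces."""
    return Counter(final_answers).most_common(1)[0][0]

In this sketch, lowering the threshold prunes more aggressively (fewer traces executed), while raising it retains more traces and protects consensus accuracy.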

Acknowledgement

This code repository is built upon LLaMA-Factory, vLLM, DeepScaleR, and DeepConf.

Thanks for their great work!


Citation

If you use DeepPrune in your research, please cite our work:

@article{tu2025deepprune,
  title={DeepPrune: Parallel Scaling without Inter-trace Redundancy},
  author={Tu, Shangqing and Li, Yaxuan and Bai, Yushi and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2510.08483},
  year={2025}
}


License

MIT license
