DeepPrune: Parallel Scaling without Inter-trace Redundancy
Paper: https://arxiv.org/abs/2510.08483
Efficient reasoning at scale by pruning redundant reasoning traces, without sacrificing accuracy.
Overview
Large language models (LLMs) often generate multiple reasoning traces in parallel to improve answer reliability. However, these traces frequently exhibit severe inter-trace redundancy, leading to wasted computation and inflated inference costs.
DeepPrune addresses this by learning to identify and prune semantically redundant traces before full execution, enabling cost-effective parallel reasoning while preserving performance.
More details can be found on our project website.
Results
Dependencies
cd DeepPrune
pip install -r requirements.txt
We use Llama-Factory for model fine-tuning and inference, and we provide the version we used in the Llama-Factory folder, modified to support Focal Loss. If you want to clone LLaMA-Factory yourself, please refer to the GitHub issue for the required changes.
We use Qwen/Qwen3-4B-Instruct-2507 as the backbone LLM for DeepPrune. You can download it from https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507, or use another open-source LLM instead.
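If helpful, the backbone can also be fetched programmatically. Below is a minimal sketch using huggingface_hub; any download method works, and the resulting local path can then be passed to your fine-tuning or inference config.

```python
from huggingface_hub import snapshot_download

# Download the backbone into the local Hugging Face cache.
# Swap in another open-source LLM's repo id if you prefer a different backbone.
local_path = snapshot_download("Qwen/Qwen3-4B-Instruct-2507")
print(local_path)
```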
Dataset
The dataset provided here is incomplete due to its size. Please refer to https://huggingface.co/datasets/THU-KEG/DeepPrune for the full dataset.
To understand how to use the dataset, please refer to DeepPrune_data/README.md.
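To pull the full dataset from the Hub programmatically, a minimal sketch with the datasets library is shown below; the available splits and column names are defined by the released configuration, so consult the dataset card and DeepPrune_data/README.md for the actual schema.

```python
from datasets import load_dataset

# Fetch the full DeepPrune dataset from the Hugging Face Hub.
# Splits and columns are defined by the dataset card, not assumed here.
ds = load_dataset("THU-KEG/DeepPrune")
print(ds)
```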
Preliminaries
To understand the motivation behind DeepPrune, explore the preliminary analysis in:
Preliminaries/Preliminary experiment.ipynb
This notebook includes:
- Distribution of answer agreement: most trace pairs yield the same answer, revealing significant redundancy in parallel reasoning.
- ROC curves for redundancy detection:
  - Sentence-BERT (shallow similarity): AUROC = 0.58, limited discriminative power (a rough reproduction sketch is shown after this list).
  - Qwen3-4B-Instruct (zero-shot LLM comparison): AUROC = 0.66, a moderate improvement, but still suboptimal.
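To make the Sentence-BERT baseline concrete, here is a rough, self-contained sketch (not the notebook's exact code) that scores trace pairs by embedding cosine similarity and computes AUROC against same-answer labels. The model name and the (trace_a, trace_b, label) data layout are illustrative assumptions.

```python
from sentence_transformers import SentenceTransformer, util
from sklearn.metrics import roc_auc_score

def sbert_auroc(pairs, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """AUROC of cosine similarity used as a redundancy score.

    `pairs` is a list of (trace_a, trace_b, label) tuples, where label = 1
    if both traces reach the same final answer. The layout is illustrative.
    """
    model = SentenceTransformer(model_name)
    scores, labels = [], []
    for trace_a, trace_b, label in pairs:
        emb = model.encode([trace_a, trace_b], convert_to_tensor=True)
        scores.append(util.cos_sim(emb[0], emb[1]).item())
        labels.append(label)
    return roc_auc_score(labels, scores)
```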
To reproduce the zero-shot Qwen3-4B-Instruct results:
- Prepare the evaluation dataset using DeepPrune/Offline/Ablation_Study.ipynb
- Run Preliminaries/zero_shot_exp.py (a simplified sketch of the zero-shot comparison is shown below)
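For intuition, the zero-shot comparison can be approximated as below. This is a hypothetical sketch, not the repo's script: the actual prompt, decoding settings, and scoring in Preliminaries/zero_shot_exp.py differ (for example, the script scores pairs to compute AUROC, while this returns a binary verdict).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")

def zero_shot_same_answer(trace_a: str, trace_b: str) -> bool:
    """Ask the backbone LLM, zero-shot, whether two partial reasoning traces
    will reach the same final answer. Prompt wording is illustrative."""
    messages = [{
        "role": "user",
        "content": (
            "Below are two partial reasoning traces for the same problem.\n\n"
            f"Trace A:\n{trace_a}\n\nTrace B:\n{trace_b}\n\n"
            "Will they arrive at the same final answer? Answer 'yes' or 'no' only."
        ),
    }]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=4, do_sample=False)
    reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return reply.strip().lower().startswith("yes")
```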
DeepPrune Pipeline
Prerequisites
- Install Llama-Factory.
- Patch required: modify the codebase to support Focal Loss (see the GitHub issue for guidance). A generic Focal Loss sketch is shown below.
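The actual patch lives in the bundled Llama-Factory copy. For reference only, a generic PyTorch sketch of focal loss for a binary same-answer/different-answer label looks like this; alpha and gamma are illustrative defaults rather than the repo's settings, and the repo may apply the loss at the token level instead.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Focal loss for binary labels (1 = traces agree, 0 = they differ).

    Down-weights easy examples so training focuses on hard, ambiguous pairs.
    """
    probs = torch.sigmoid(logits)
    # p_t: probability the model assigns to the true class.
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```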
1 Prepare Finetuning Dataset
Generate the supervised training data for DeepPrune. This step constructs pairwise trace comparisons labeled by answer equivalence, i.e. whether two reasoning traces for the same question arrive at the same final answer. An illustrative example of one such pair is shown below.
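For illustration only, one such pairwise example might look like the following; the field names are assumptions, and the actual schema is documented in DeepPrune_data/README.md.

```python
# Hypothetical layout of a single pairwise training example.
example = {
    "question": "What is 17 * 24?",
    "trace_a": "17 * 20 = 340 and 17 * 4 = 68, so 340 + 68 = 408.",
    "trace_b": "17 * 24 = 17 * 25 - 17 = 425 - 17 = 408.",
    "label": 1,  # 1: both traces reach the same final answer, 0: they differ
}
```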
2 Offline Training
Train the DeepPrune model using supervised fine-tuning:
- Config: DeepPrune/Offline/Qwen3_full_sft.yaml
- Framework: Llama-Factory
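Assuming the bundled copy exposes the standard LLaMA-Factory CLI, training with that config is typically launched as follows; the exact invocation for the patched version may differ.

```bash
llamafactory-cli train DeepPrune/Offline/Qwen3_full_sft.yaml
```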
After training:
- Generate test data: DeepPrune/Offline/Ablation_Study.ipynb
- Evaluate performance: python DeepPrune/Offline/test_model_performance_parallel.py
- Visualize results: DeepPrune/Offline/check_model_output.ipynb
Expect significant gains over shallow similarity baselines (AUROC > 0.83 in our experiments).
3 Online Pruning
Deploy DeepPrune for real-time trace pruning during inference:
- Establish baselines: run DeepPrune/Online/check_pass_k.ipynb to compute:
  - pass@1: accuracy with a single trace
  - cons@512: consensus accuracy with 512 traces
- Apply DeepPrune: run python DeepPrune/Online/greedy_cluster_threshold.py, which performs greedy clustering of traces using DeepPrune's similarity scores and prunes redundant ones (a simplified sketch of this procedure is shown below).
- Trade-off control: adjust the similarity threshold to balance cost reduction (fewer traces executed) against performance retention (maintained consensus accuracy).
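For intuition, the sketch below shows a hypothetical version of threshold-based greedy clustering followed by a consensus vote. `judge_same_answer` stands in for DeepPrune's learned pairwise judge (not the repo's actual API), the default threshold is illustrative, and weighting the vote by cluster size is one plausible policy rather than necessarily the repo's.

```python
from collections import Counter

def greedy_cluster(traces, judge_same_answer, threshold=0.5):
    """Greedily assign each trace to the first cluster whose representative
    the judge scores as 'same answer' with probability >= threshold.

    judge_same_answer(a, b) is a hypothetical stand-in for DeepPrune's
    learned judge; it should return P(a and b reach the same final answer).
    """
    clusters = []  # each cluster is a list of traces; clusters[i][0] is its representative
    for trace in traces:
        for cluster in clusters:
            if judge_same_answer(cluster[0], trace) >= threshold:
                cluster.append(trace)   # judged redundant: prune instead of finishing it
                break
        else:
            clusters.append([trace])    # novel reasoning path: keep and execute fully
    return clusters

def consensus_answer(representative_answers, cluster_sizes):
    """Majority vote over surviving traces, weighting each representative
    by the number of traces its cluster absorbed."""
    votes = Counter()
    for answer, size in zip(representative_answers, cluster_sizes):
        votes[answer] += size
    return votes.most_common(1)[0][0]
```

In this sketch, lowering the threshold merges more aggressively (more pruning, lower cost), while raising it keeps more traces (higher retained consensus accuracy).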
Acknowledgement
This code repository is developed based on Llama-Factory, vllm, DeepScaleR and DeepConf.
Thanks for their great work!
Citation
If you use DeepPrune in your research, please cite our work:
@article{tu2025deepprune,
title={DeepPrune: Parallel Scaling without Inter-trace Redundancy},
  author={Shangqing Tu and Yaxuan Li and Yushi Bai and Lei Hou and Juanzi Li},
journal={arXiv preprint arXiv:2510.08483},
year={2025}
}