3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement (CVPR 2025)

3DEnhancer employs a multi-view diffusion model to enhance multi-view images, thereby improving the 3D models reconstructed from them.

For more visual results, check out our project page.

Introducing 3DEnhancer

Despite advances in neural rendering, due to the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models, view synthesis and 3D model generation are restricted to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance coarse 3D inputs while preserving multi-view consistency. Our method includes a pose-aware encoder and a diffusion-based denoiser to refine low-quality multi-view images, along with data augmentation and a multi-view attention module with epipolar aggregation to maintain consistent, high-quality 3D outputs across views. Unlike existing video-based approaches, our model supports seamless multi-view enhancement with improved coherence across diverse viewing angles. Extensive evaluations show that 3DEnhancer significantly outperforms existing methods, boosting both multi-view enhancement and per-instance 3D optimization tasks.
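
The cross-view coupling described above can be pictured as attention that runs jointly over tokens from all views. Below is a minimal PyTorch sketch of that idea; the class name, tensor shapes, and the omission of pose conditioning and epipolar aggregation are simplifications for illustration, not the repository's actual implementation:

# Joint multi-view attention sketch: tokens from every view attend to
# tokens from all views at once, which is what ties the views together.
import torch
import torch.nn as nn

class MultiViewAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, views, tokens, dim) latent features, one row per view
        b, v, t, d = x.shape
        tokens = x.reshape(b, v * t, d)  # concatenate views into one sequence
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, v, t, d)

x = torch.randn(1, 4, 64, 128)  # e.g. 4 views, 64 latent tokens each
print(MultiViewAttention(128)(x).shape)  # torch.Size([1, 4, 64, 128])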

News

  • [2025/03/08] Our inference code and Gradio demo are released.
  • [2024/12/25] Our paper and project page are now live. Merry Christmas!

Installation

  1. Clone Repo

    git clone --recurse-submodules https://github.com/Luo-Yihang/3DEnhancer
    cd 3DEnhancer
  2. Create Conda Environment

    conda create -n 3denhancer python=3.10 -y
    conda activate 3denhancer
  3. Install Python Dependencies

    Important: Install Torch and xformers matching your CUDA version (a quick sanity check follows this list). For example, for Torch 2.1.0 + CUDA 11.8:

    # Install Torch and Xformers
    pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
    pip install -U xformers --index-url https://download.pytorch.org/whl/cu118

    # Install other dependencies
    pip install -r requirements.txt
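
As a quick optional sanity check, confirm that Torch sees the GPU and that xformers imports cleanly (the version in the comment matches the cu118 example above):

# Optional environment sanity check
import torch
import xformers

print(torch.__version__)  # e.g. 2.1.0+cu118
print(torch.cuda.is_available())  # should print True on a CUDA machine
print(xformers.__version__)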

Pretrained Weights

Download the pretrained model from Hugging Face and place it under pretrained_models/3DEnhancer:

mkdir -p pretrained_models/3DEnhancer
wget -P pretrained_models/3DEnhancer https://huggingface.co/Luo-Yihang/3DEnhancer/resolve/main/model.safetensors
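
As an optional integrity check, you can enumerate the tensors stored in the downloaded checkpoint with the standard safetensors API (install safetensors with pip if it is not already pulled in by the requirements):

# Optional integrity check: list the tensors stored in the checkpoint
from safetensors import safe_open

path = "pretrained_models/3DEnhancer/model.safetensors"
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())
print(f"{len(keys)} tensors, first few: {keys[:3]}")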

Inference

The code has been tested on NVIDIA A100 and V100 GPUs. An NVIDIA GPU with at least 18GB of memory is required.

We provide example inputs in assets/examples/mv_lq, where each subfolder contains four sequential multi-view images. Run inference on a set of multi-view images with a prompt that matches their content and a chosen noise_level. For example:

python inference.py \
--input_folder assets/examples/mv_lq/vase \
--output_folder results/vase \
--prompt "vase" \
--noise_level 0

For more options, refer to inference.py.
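
To enhance all of the provided examples in one pass, a small driver like the sketch below replays the command above for each subfolder; it assumes, as in the vase example, that the folder name doubles as a usable prompt:

# Hypothetical batch driver; flag names are taken from the command above
import subprocess
from pathlib import Path

for folder in sorted(Path("assets/examples/mv_lq").iterdir()):
    if not folder.is_dir():
        continue
    subprocess.run([
        "python", "inference.py",
        "--input_folder", str(folder),
        "--output_folder", f"results/{folder.name}",
        "--prompt", folder.name,
        "--noise_level", "0",
    ], check=True)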

Demo

The script app.py provides a simple web demo for generating and enhancing multi-view images, as well as reconstructing 3D models using LGM.

Install the modified Gaussian splatting rasterizer (with depth and alpha rendering) required by LGM:

git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
pip install ./diff-gaussian-rasterization
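
If the build succeeds, the extension should import cleanly; the symbols below are the ones diff-gaussian-rasterization conventionally exports, so treat them as an assumption if this fork's layout differs:

# Optional import check for the rasterizer built above
from diff_gaussian_rasterization import GaussianRasterizationSettings, GaussianRasterizer
print("diff-gaussian-rasterization imported OK")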

Download the LGM pretrained weights from Hugging Face and place them under pretrained_models/LGM:

mkdir -p pretrained_models/LGM
wget -P pretrained_models/LGM https://huggingface.co/ashawkey/LGM/resolve/main/model_fp16_fixrot.safetensors

After installing the dependencies, start the demo with:

python app.py

The web demo is also available on Hugging Face Spaces!

TODO

  • ✅ Release paper and project page.
  • ✅ Release inference code.
  • ✅ Release Gradio demo.

License

This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.

Citation

If you find our code or paper helpful, please consider citing:

@article{luo20243denhancer,
  title={3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement},
  author={Yihang Luo and Shangchen Zhou and Yushi Lan and Xingang Pan and Chen Change Loy},
  journal={arXiv preprint arXiv:2412.18565},
  year={2024},
}

Contact

If you have any questions, please feel free to reach us at luo_yihang@outlook.com.
