CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes
by Tim Broedermann, Christos Sakaridis, Yuqian Fu, and Luc Van Gool
News:
- [2026-01-26] We are happy to announce DGFusion, which adds local depth guidance to CAFuser. DGFusion was accepted to IEEE Robotics and Automation Letters.
- [2025-04-11] Code and models are released.
- [2025-01-11] We are happy to announce that CAFuser was accepted to IEEE Robotics and Automation Letters.
Overview
This repository contains the official code for the RA-L 2025 paper CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes. CAFuser is a condition-aware multimodal fusion architecture designed for robust semantic perception in autonomous driving. It employs a Condition Token (CT) that dynamically guides the fusion of multiple sensor modalities to optimize performance across diverse scenarios. To train the CT, a verbo-visual contrastive loss aligns it with semantic environmental descriptors, enabling direct prediction from RGB features. The Condition-Aware Fusion module uses the CT to adaptively fuse sensor data based on the environmental context. Further, CAFuser introduces modality-specific feature adapters that align inputs from different sensors into a shared latent space, integrating all modalities into a single shared backbone without loss of performance. CAFuser ranks first on the public MUSES benchmarks, achieving 59.7 PQ for multimodal panoptic segmentation and 78.2 mIoU for multimodal semantic segmentation, and also sets the new state of the art on DeLiVER.
Contents
- Installation
- Prepare Datasets
- Training
- Evaluation
- Citation
- Acknowledgments
Installation
- We use Python 3.9.18, PyTorch 2.3.1, and CUDA 11.8.
- We use Detectron2-v0.6.
- For complete installation instructions, please see INSTALL.md.
Prepare Datasets
- CAFuser supports two datasets: MUSES and DeLiVER. The datasets are assumed to exist in a directory specified by the environment variable
DETECTRON2_DATASETS. Under this directory, detectron2 will look for the datasets in the structure described below.
$DETECTRON2_DATASETS/
muses/
deliver/
- You can set the location of the builtin datasets via `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.
- For more details on how to prepare the datasets, please see detectron2's documentation.
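As a minimal sketch (the root path here is a placeholder; substitute your own), the dataset root can be prepared as:

```shell
# Hypothetical dataset root; substitute a location of your choice.
export DETECTRON2_DATASETS=/tmp/datasets
# Create the two dataset folders detectron2 will look for.
mkdir -p "$DETECTRON2_DATASETS/muses" "$DETECTRON2_DATASETS/deliver"
```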
MUSES dataset structure:
You need to download the following packages from the MUSES dataset:
- RGB_Frame_Camera_trainvaltest
- Panoptic_Annotations_trainval
- Semantic_Annotations_trainval
- Event_Camera_trainvaltest
- Lidar_trainvaltest
- Radar_trainvaltest
- GNSS_trainvaltest
and place them in the following structure:
$DETECTRON2_DATASETS/
muses/
calib.json
gt_panoptic/
frame_camera/
lidar/
radar/
event_camera/
gnss/
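As an optional sanity check (a sketch; the entry names are taken from the structure listing above, and the script only reports which expected entries are missing), you can verify the MUSES layout with:

```shell
# Report which of the expected MUSES entries are missing under the dataset root.
MUSES_DIR="${DETECTRON2_DATASETS:-./datasets}/muses"
missing=0
for entry in calib.json gt_panoptic frame_camera lidar radar event_camera gnss; do
  [ -e "$MUSES_DIR/$entry" ] || { echo "missing: $entry"; missing=$((missing + 1)); }
done
echo "$missing of 7 expected entries missing"
```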
DeLiVER dataset structure:
You can download the DeLiVER dataset from the following link and place it in the following structure:
$DETECTRON2_DATASETS/
deliver/
semantic/
img/
lidar/
event/
hha/
Training
- We train CAFuser using 4 NVIDIA TITAN RTX GPUs with 24GB memory each.
- For further details on training, please see the OneFormer Getting Started guide and Getting Started with Detectron2.
- We provide a script, `train_net.py`, that is made to train all the configs provided in CAFuser.
- We further provide a template script, `slurm_train.sh`, to run the training on a SLURM cluster. Adjust the scripts at all places marked with `TODO` to your needs.
Download the Swin-Tiny weights into the `pretrained` folder and convert them to a Detectron2-compatible format:
wget -P pretrained https://github.com/SwinTransformer/storage/releases/download/v1.0.8/swin_tiny_patch4_window7_224_22k.pth
python tools/convert-pretrained-model-to-d2.py pretrained/swin_tiny_patch4_window7_224_22k.pth pretrained/swin_tiny_patch4_window7_224_22k.pkl
rm pretrained/swin_tiny_patch4_window7_224_22k.pth
Training CAFuser on the MUSES dataset:
python train_net.py \
    --num-gpus 4 \
    --config-file configs/muses/swin/cafuser_swin_tiny_bs8_180k_muses_clre.yaml \
    OUTPUT_DIR output/cafuser_swin_tiny_bs8_180k_muses_clre \
    WANDB.NAME cafuser_swin_tiny_bs8_180k_muses_clre
Training CAFuser on the DeLiVER dataset:
python train_net.py \
    --num-gpus 4 \
    --config-file configs/deliver/swin/cafuser_swin_tiny_bs8_200k_deliver_clde.yaml \
    OUTPUT_DIR output/cafuser_swin_tiny_bs8_200k_deliver_clde \
    WANDB.NAME cafuser_swin_tiny_bs8_200k_deliver_clde
Evaluation
- We provide pre-trained weights for CAFuser on both the MUSES and DeLiVER datasets, covering the standard CAFuser (CA2) as well as the CAFuser-CAA variant.
- To evaluate a model's performance, use:
MUSES (on the validation set):
python train_net.py \
    --config-file configs/muses/swin/cafuser_swin_tiny_bs8_180k_muses_clre.yaml \
    --eval-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
    DATASETS.TEST_PANOPTIC "('muses_panoptic_val',)" \
    MODEL.TEST.PANOPTIC_ON True MODEL.TEST.SEMANTIC_ON True
Predict on the test set to upload to the MUSES benchmark for both semantic and panoptic segmentation:
python train_net.py \
    --config-file configs/muses/swin/cafuser_swin_tiny_bs8_180k_muses_clre.yaml \
    --inference-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
    OUTPUT_DIR output/cafuser_swin_tiny_bs8_180k_muses_clre \
    DATASETS.TEST_PANOPTIC "('muses_panoptic_test',)" \
    MODEL.TEST.PANOPTIC_ON True MODEL.TEST.SEMANTIC_ON True
This will create folders for the semantic and panoptic predictions under OUTPUT_DIR (e.g. output/cafuser_swin_tiny_bs8_180k_muses_clre/inference/...).
- For the panoptic predictions, you can zip the `labelIds` folder under the `panoptic` folder and upload it to the MUSES benchmark.
- For the semantic predictions, you can zip the `labelTrainIds` folder under the `semantic` folder and upload it to the MUSES benchmark.
For better visualization, you can additionally set `MODEL.TEST.SAVE_PREDICTIONS.CITYSCAPES_COLORS True` to get extra folders with the predictions in the Cityscapes colors.
DeLiVER on the test set:
python train_net.py \
    --config-file configs/deliver/swin/cafuser_swin_tiny_bs8_200k_deliver_clde.yaml \
    --eval-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
    DATASETS.TEST_SEMANTIC "('deliver_semantic_test',)"
Replace `deliver_semantic_test` with `deliver_semantic_val` to evaluate on the validation set.
Results
We provide the following results for the MUSES dataset, with the test scores taken from the official MUSES benchmark:
| Method | Backbone | PQ-val | mIoU-val | PQ-test | mIoU-test | config | Checkpoint |
|---|---|---|---|---|---|---|---|
| CAFuser | Swin-T | 59.26 | 78.71 | 59.7 | 78.2 | config | model |
| CAFuser-CAA | Swin-T | 59.35 | 79.04 | 59.5 | 78.43 | config | model |
We provide the following results for the DeLiVER dataset:
| Method | Backbone | mIoU-val | mIoU-test | config | Checkpoint |
|---|---|---|---|---|---|
| CAFuser | Swin-T | 68.12 | 55.80 | config | model |
| CAFuser-CAA | Swin-T | 68.79 | 55.38 | config | model |
Citation
If you find this project useful in your research, please consider citing:
@article{broedermann2024cafuser,
author={Br{\"o}dermann, Tim and Sakaridis, Christos and Fu, Yuqian and Van Gool, Luc},
journal={IEEE Robotics and Automation Letters},
title={CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes},
year={2025},
volume={10},
number={4},
pages={3134-3141},
keywords={Sensors;Sensor fusion;Semantics;Adaptation models;Cameras;Feature extraction;Laser radar;Radar;Meteorology;Semantic segmentation;Sensor fusion;semantic scene understanding;computer vision for transportation;deep learning for visual perception;multimodal semantic perception},
doi={10.1109/LRA.2025.3536218}}
Acknowledgments
This project builds on open-source projects such as Detectron2 and OneFormer. We thank their authors for making the source code publicly available.
This work was supported by the ETH Future Computing Laboratory (EFCL), financed by a donation from Huawei Technologies.