Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Official codebase for SLAVC.

SLAVC is a new approach to weakly-supervised visual sound source localization that identifies negatives and mitigates significant overfitting problems.

A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo, Pedro Morgado
NeurIPS 2022.

Environment

To set up the environment, run:

pip install -r requirements.txt

Datasets

Flickr-SoundNet

Data can be downloaded from Learning to localize sound sources

VGG-Sound Source

Data can be downloaded from Localizing Visual Sounds the Hard Way

Extended Flickr-SoundNet

Data can be downloaded from Extended-Flickr-SoundNet

Extended VGG-Sound Source

Data can be downloaded from Extended-VGG-Sound Source

Model Zoo

We release the SLAVC model pre-trained on VGG-Sound 144k data, along with scripts for reproducing results on the Extended Flickr-SoundNet and Extended VGG-Sound Source benchmarks.

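| Method | Train Set | Test Set | AP | max-F1 | Precision | url | Train | Test |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SLAVC | VGG-Sound 144k | Extended Flickr-SoundNet | 51.63 | 59.10 | 83.60 | model | script | script |
| SLAVC | VGG-Sound 144k | Extended VGG-SS | 32.95 | 40.00 | 37.79 | model | script | script |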
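
The table reports AP and max-F1 without showing how they are computed. As a rough illustration only, the sketch below shows one common way to obtain both from per-sample confidence scores and binary correctness labels by sweeping a threshold down the ranked list. The function names and the input format are assumptions for illustration, not taken from this repository, and the evaluation in test.py may define positives and thresholds differently.

```python
from typing import List, Tuple


def precision_recall_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (precision, recall) at every cut-off of the score-ranked list.

    `labels[i]` is 1 if sample i counts as a correct positive, else 0
    (hypothetical input format; ties in `scores` are not treated specially).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank samples from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    true_pos = 0
    for rank, i in enumerate(order, start=1):
        true_pos += labels[i]
        points.append((true_pos / rank, true_pos / total_pos))
    return points


def max_f1(scores: List[float], labels: List[int]) -> float:
    """Largest F1 score over all cut-offs of the ranked list."""
    best = 0.0
    for precision, recall in precision_recall_points(scores, labels):
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Mean of the precision values at the ranks where a positive appears."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = precision_recall_points(scores, labels)
    hits = [points[rank][0] for rank, i in enumerate(order) if labels[i] == 1]
    return sum(hits) / len(hits)


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6]  # made-up confidences
    toy_labels = [1, 0, 1, 0]          # made-up correctness labels
    print(f"AP = {average_precision(toy_scores, toy_labels):.4f}")
    print(f"max-F1 = {max_f1(toy_scores, toy_labels):.4f}")
```

The toy inputs are invented purely to exercise the functions; they are unrelated to the numbers in the table.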

Train

To train an SLAVC model, run:

python train.py --multiprocessing_distributed \
--train_data_path /path/to/VGGSound-all/ \
--test_data_path /path/to/Flickr-SoundNet/ \
--test_gt_path /path/to/Flickr-SoundNet/Annotations/ \
--experiment_name vggss144k_slavc \
--model 'slavc' \
--trainset 'vggss_144k' \
--testset 'flickr' \
--epochs 20 \
--batch_size 128 \
--init_lr 0.0001 \
--use_momentum --use_mom_eval \
--m_img 0.999 --m_aud 0.999 \
--dropout_img 0.9 --dropout_aud 0
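
The flags `--use_momentum`, `--m_img 0.999` and `--m_aud 0.999` suggest that momentum (exponential moving average) copies of the image and audio encoders are maintained during training. The update rule is not shown here, so the following is only a minimal sketch of a standard EMA update under that assumption. `momentum_update` and the dict-of-lists parameter format are hypothetical stand-ins, not code from train.py.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def momentum_update(target: Params, online: Params, m: float) -> None:
    """Update `target` in place: target <- m * target + (1 - m) * online.

    `m` plays the role assumed for --m_img / --m_aud; with m = 0.999 the
    target moves only 0.1% of the way towards the online weights per step.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    for name, online_values in online.items():
        target[name] = [
            m * t + (1.0 - m) * o
            for t, o in zip(target[name], online_values)
        ]


if __name__ == "__main__":
    # Toy parameters standing in for encoder weights (assumption).
    online_encoder = {"w": [0.0, 4.0]}
    target_encoder = {"w": [1.0, 2.0]}
    for _ in range(3):
        momentum_update(target_encoder, online_encoder, m=0.999)
    print(target_encoder)
```

A momentum value close to 1 makes the target copy change slowly, which is the usual motivation for this kind of update.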

Test

For testing and visualization, run:

python test.py --test_data_path /path/to/Extended-VGGSound-test/ \
--model_dir checkpoints \
--experiment_name vggss144k_slavc \
--testset 'vggss_plus_silent' \
--alpha 0.9 \
--relative_prediction

Citation

If you find this repository useful, please cite our paper:

@inproceedings{mo2022SLAVC,
title={A Closer Look at Weakly-Supervised Audio-Visual Source Localization},
author={Mo, Shentong and Morgado, Pedro},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}

License

Apache-2.0 license
