stoneMo / SLAVC Public

Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)

License

Apache-2.0 license

Folders and files

fonts
images
metadata
scripts
LICENSE
README.md
audio_io.py
datasets.py
model.py
requirements.txt
test.py
train.py
utils.py

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

Official codebase for SLAVC.

SLAVC is a new approach to weakly-supervised visual sound source localization that identifies negatives and addresses significant overfitting problems.

A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo, Pedro Morgado
NeurIPS 2022.

Environment

To set up the environment, run:

pip install -r requirements.txt
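After installation, it can be useful to confirm that every package named in requirements.txt is actually present. The following is a minimal sketch using only the Python standard library; the helper names `requirement_name` and `missing_requirements` are hypothetical and are not part of this repository, and the parsing only handles plain `name` or `name<specifier>` lines.

```python
import re
from importlib import metadata


def requirement_name(line):
    """Return the package name from one requirements.txt line, or None."""
    line = line.split("#", 1)[0].strip()  # drop comments and whitespace
    if not line or line.startswith("-"):  # skip blank lines and pip options
        return None
    # The name ends at the first version specifier, extra, marker, or space.
    return re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]


def missing_requirements(lines):
    """List the named packages that are not installed in this environment."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

To use it, pass the open file: `missing_requirements(open("requirements.txt"))` returns an empty list when every listed package is found. Version constraints are not checked, only presence.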

Datasets

Flickr-SoundNet

Data can be downloaded from Learning to localize sound sources

VGG-Sound Source

Data can be downloaded from Localizing Visual Sounds the Hard Way

Extended Flickr-SoundNet

Data can be downloaded from Extended-Flickr-SoundNet

Extended VGG-Sound Source

Data can be downloaded from Extended-VGG-Sound Source

Model Zoo

We release the SLAVC model pre-trained on VGG-Sound 144k data, along with scripts for reproducing results on the Extended Flickr-SoundNet and Extended VGG-Sound Source benchmarks.

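| Method | Train Set | Test Set | AP | max-F1 | Precision | URL | Train Script | Test Script |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SLAVC | VGG-Sound 144k | Extended Flickr-SoundNet | 51.63 | 59.10 | 83.60 | model | script | script |
| SLAVC | VGG-Sound 144k | Extended VGG-SS | 32.95 | 40.00 | 37.79 | model | script | script |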
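For readers unfamiliar with the max-F1 column, the sketch below shows one common way such a number is obtained: sweep a confidence threshold over the predictions and keep the best F1. This is a generic illustration, not the evaluation code in this repository; the function name `max_f1` and the assumption that each prediction carries a confidence score and a binary correctness label are mine.

```python
def max_f1(scores, labels):
    """Best F1 over all thresholds.

    A sample is predicted positive when its score is at or above the
    threshold; labels are 1 for true positives and 0 otherwise.
    """
    total_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best, tp = 0.0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        # Tied scores share a threshold, so evaluate only at the end of a tie.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        predicted = i + 1
        precision = tp / predicted
        recall = tp / total_pos if total_pos else 0.0
        if precision + recall > 0:
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
    return best
```

The table reports values as percentages, so a result from this sketch would be multiplied by 100 for comparison.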

Train

To train an SLAVC model, run:

python train.py --multiprocessing_distributed \
    --train_data_path /path/to/VGGSound-all/ \
    --test_data_path /path/to/Flickr-SoundNet/ \
    --test_gt_path /path/to/Flickr-SoundNet/Annotations/ \
    --experiment_name vggss144k_slavc \
    --model 'slavc' \
    --trainset 'vggss_144k' \
    --testset 'flickr' \
    --epochs 20 \
    --batch_size 128 \
    --init_lr 0.0001 \
    --use_momentum --use_mom_eval \
    --m_img 0.999 --m_aud 0.999 \
    --dropout_img 0.9 --dropout_aud 0
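The `--use_momentum`, `--m_img`, and `--m_aud` flags suggest that the image and audio encoders each keep a slowly updated momentum copy, as in other momentum-encoder methods. That reading is an assumption based on the flag names alone; the sketch below shows the standard exponential-moving-average update on plain lists of floats and is not taken from train.py. The function name `momentum_update` is hypothetical.

```python
def momentum_update(target, online, m):
    """Return new target weights: m * target + (1 - m) * online."""
    return [m * t + (1.0 - m) * o for t, o in zip(target, online)]


# With m = 0.999, the target moves only 0.1% of the way toward the
# online weights on each step, so it changes slowly and smoothly.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = momentum_update(target, online, m=0.999)
```

Under this interpretation, a coefficient closer to 1 gives a more stable but slower-moving target encoder.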

Test

To test and visualize results, run:

python test.py --test_data_path /path/to/Extended-VGGSound-test/ \
    --model_dir checkpoints \
    --experiment_name vggss144k_slavc \
    --testset 'vggss_plus_silent' \
    --alpha 0.9 \
    --relative_prediction
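When evaluating several checkpoints or test sets, it may be convenient to assemble this command programmatically. The helper below is a hypothetical sketch (it is not part of the repository) that uses only the flags shown in the command above, with the same default values.

```python
def build_test_command(test_data_path, experiment_name, testset,
                       alpha=0.9, model_dir="checkpoints",
                       relative_prediction=True):
    """Build the test.py argument list using only the flags shown above."""
    cmd = [
        "python", "test.py",
        "--test_data_path", test_data_path,
        "--model_dir", model_dir,
        "--experiment_name", experiment_name,
        "--testset", testset,
        "--alpha", str(alpha),
    ]
    if relative_prediction:
        cmd.append("--relative_prediction")  # boolean flag, no value
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list rather than a single string avoids shell quoting issues with paths that contain spaces.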

Citation

If you find this repository useful, please cite our paper:

@inproceedings{mo2022SLAVC,
  title={A Closer Look at Weakly-Supervised Audio-Visual Source Localization},
  author={Mo, Shentong and Morgado, Pedro},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
