
DEXA

Code for DEXA: Deep Encoders with Auxiliary Parameters for Extreme Classification [1]

Setting up

Expected directory structure

Download data for DEXA

* Download the raw data (zipped file) from The XML repository [5].
* Extract the zipped file into the data directory.
* The following files should be available in /data/ (create empty filter files if unavailable):
    - trn.json.gz
    - tst.json.gz
    - lbl.json.gz
    - filter_labels_test.txt
    - filter_labels_train.txt
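The check described above can be sketched in Python. This is a minimal illustration, not part of the repository: the helper name `prepare_raw_dir` is hypothetical, and the filter file names follow the `filter_labels_test.txt` spelling used in the Run DEXA section.

```python
import tempfile
from pathlib import Path

# Raw files the README lists as required, and filter files that may be empty.
REQUIRED_FILES = ("trn.json.gz", "tst.json.gz", "lbl.json.gz")
FILTER_FILES = ("filter_labels_test.txt", "filter_labels_train.txt")


def prepare_raw_dir(data_dir):
    """Return the required files missing from data_dir; create empty filter files."""
    data_dir = Path(data_dir)
    missing = [name for name in REQUIRED_FILES if not (data_dir / name).exists()]
    for name in FILTER_FILES:
        path = data_dir / name
        if not path.exists():
            path.touch()  # an empty file stands in for "no filter available"
    return missing


# Usage on a throwaway directory (hypothetical layout, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "trn.json.gz").touch()
    print(prepare_raw_dir(tmp))  # lists tst.json.gz and lbl.json.gz as missing
```

Point the helper at the dataset directory before tokenizing; a non-empty return value means the download or extraction step is incomplete.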

Example use cases

A single learner

Extract and tokenize data as follows.

./prepare_data.sh LF-AmazonTitles-131K 32

The algorithm can be run as follows. A JSON file (e.g., config/DEXA/LF-AmazonTitles-131K.json) specifies the architecture and other arguments. Refer to the full documentation below for more details.

./run_main.sh 0 DEXA LF-AmazonTitles-131K 0 108

Full Documentation

Tokenize the data

./prepare_data.sh

* dataset
    - Name of the dataset.
    - The tokenizer expects the following files in /data/
        - trn.json.gz
        - tst.json.gz
        - lbl.json.gz
    - It will dump the following six tokenized files
        - trn_doc_input_ids.npy
        - trn_doc_attention_mask.npy
        - tst_doc_input_ids.npy
        - tst_doc_attention_mask.npy
        - lbl_input_ids.npy
        - lbl_attention_mask.npy
* seq-len
    - Sequence length of text to consider while tokenizing
        - 32 for titles datasets
        - 256 for Wikipedia
        - 128 for other full-text datasets
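To make the seq-len argument concrete, the sketch below shows how a fixed sequence length turns a variable-length token list into an `input_ids` row and an `attention_mask` row of equal length. It is an illustration under stated assumptions, not the repository's tokenizer: both function names are hypothetical, matching on the dataset name is a heuristic, the padding id 0 is assumed, and the token ids are made up.

```python
def default_seq_len(dataset):
    """Pick a sequence length from the dataset name (heuristic, assumed here)."""
    name = dataset.lower()
    if "titles" in name:
        return 32    # titles datasets
    if "wiki" in name:
        return 256   # full-text Wikipedia datasets
    return 128       # other full-text datasets


def pad_or_truncate(token_ids, seq_len, pad_id=0):
    """Return (input_ids, attention_mask), each exactly seq_len long."""
    kept = list(token_ids[:seq_len])          # truncate long inputs
    n_pad = seq_len - len(kept)               # pad short inputs
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


seq_len = default_seq_len("LF-AmazonTitles-131K")
ids, mask = pad_or_truncate([101, 2023, 2003, 102], seq_len)  # made-up token ids
print(len(ids), len(mask), sum(mask))  # 32 32 4
```

Stacking one such row per document gives arrays of shape (number of documents, seq-len), which is the layout the `*_input_ids.npy` and `*_attention_mask.npy` file names suggest.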

Run DEXA

./run_main.sh

* gpu_id
    - Run the program on this GPU.
* type
    - DEXA builds upon NGAME [2], SiameseXML [3] and DeepXML [4] for training. An encoder is trained in M-I and the classifier is trained in M-IV.
    - DEXA: The intermediate representation is not fine-tuned while training the classifier (more scalable; suitable for large datasets).
* dataset
    - Name of the dataset.
    - DEXA expects the following files in /data/
        - trn_doc_input_ids.npy
        - trn_doc_attention_mask.npy
        - trn_X_Y.txt
        - tst_doc_input_ids.npy
        - tst_doc_attention_mask.npy
        - tst_X_Y.txt
        - lbl_input_ids.npy
        - lbl_attention_mask.npy
        - filter_labels_test.txt (put an empty file or set it to null in the config when unavailable)
* version
    - Different runs can be managed by version and seed.
    - Models and results are stored with this argument.
* seed
    - Seed value as used by NumPy and PyTorch.
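The positional arguments are easy to mix up, so a small wrapper that names them can help when scripting several runs. The sketch below is hypothetical (the function `build_run_command` is not part of the repository) and assumes the arguments are passed in the order listed above, which matches the example command in the "A single learner" section.

```python
def build_run_command(gpu_id, model_type, dataset, version, seed):
    """Assemble the run_main.sh argument list in the documented order."""
    return [
        "./run_main.sh",
        str(gpu_id),     # GPU to run on
        model_type,      # e.g. "DEXA"
        dataset,         # dataset name
        str(version),    # run identifier; models and results are stored under it
        str(seed),       # seed for NumPy and PyTorch
    ]


cmd = build_run_command(
    gpu_id=0, model_type="DEXA", dataset="LF-AmazonTitles-131K", version=0, seed=108
)
print(" ".join(cmd))  # ./run_main.sh 0 DEXA LF-AmazonTitles-131K 0 108
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.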

Cite as

@InProceedings{Dahiya23b,
    author = "Dahiya, K. and Yadav, S. and Sondhi, S. and Saini, D. and Mehta, S. and Jiao, J. and Agarwal, S. and Kar, P. and Varma, M.",
    title = "Deep encoders with auxiliary parameters for extreme classification",
    booktitle = "KDD",
    month = "August",
    year = "2023"
}

References

[1] K. Dahiya, S. Yadav, S. Sondhi, D. Saini, S. Mehta, J. Jiao, S. Agarwal, P. Kar and M. Varma. Deep encoders with auxiliary parameters for extreme classification. In KDD, Long Beach (CA), August 2023.

[2] K. Dahiya, N. Gupta, D. Saini, A. Soni, Y. Wang, K. Dave, J. Jiao, K. Gururaj, P. Dey, A. Singh, D. Hada, V. Jain, B. Paliwal, A. Mittal, S. Mehta, R. Ramjee, S. Agarwal, P. Kar and M. Varma. NGAME: Negative mining-aware mini-batching for extreme classification. In WSDM, Singapore, March 2023.

[3] K. Dahiya, A. Agarwal, D. Saini, K. Gururaj, J. Jiao, A. Singh, S. Agarwal, P. Kar and M. Varma. SiameseXML: Siamese networks meet extreme classifiers with 100M labels. In ICML, July 2021.

[4] K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal and M. Varma. DeepXML: A deep extreme multi-label learning framework applied to short text documents. In WSDM, 2021.

[5] The Extreme Classification Repository: http://manikvarma.org/downloads/XC/XMLRepository.html

[6] pyxclib: https://github.com/kunaldahiya/pyxclib
