About SubModLib
SubModLib is an easy-to-use, efficient and scalable Python library for submodular optimization with a C++ optimization engine. SubModLib finds applications in summarization, data subset selection, hyperparameter tuning, efficient training and more. Through a rich API, it offers a great deal of flexibility in how it can be used.
Please check out our latest arXiv preprint: https://arxiv.org/abs/2202.10680
Salient Features
- Rich suite of functions for a wide variety of subset selection tasks:
  - regular set (submodular) functions
  - submodular mutual information functions
  - conditional gain functions
  - conditional mutual information functions
- Supports different types of optimizers:
  - naive greedy
  - lazy (accelerated) greedy
  - stochastic (random) greedy
  - lazier than lazy greedy
- Combines the best of Python's ease of use and C++'s efficiency
- Rich API which gives a variety of options to the user. See this notebook for an example of different usage patterns
- Decoupled function and optimizer paradigm makes it suitable for a wide variety of tasks
- Comprehensive documentation (available here)
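The naive and lazy (accelerated) greedy optimizers listed above select the same elements on a submodular function (up to tie-breaking); lazy greedy simply avoids most re-evaluations. Because marginal gains of a submodular function can only shrink as the selection grows, a previously computed gain is an upper bound on the current one, so a candidate whose refreshed gain still sits on top of a max-heap can be accepted without re-checking the others. The sketch below illustrates this idea in plain Python on a toy facility-location function. It is not SubModLib's C++ implementation, and `facility_location`, `lazy_greedy` and the similarity matrix are hypothetical.

```python
import heapq
from typing import List, Sequence, Set, Tuple

# Hypothetical toy example of lazy greedy; not SubModLib's implementation.


def facility_location(sim: Sequence[Sequence[float]], subset: Set[int]) -> float:
    """f(A) = sum over ground-set items i of max_{j in A} sim[i][j]; f({}) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def lazy_greedy(sim: Sequence[Sequence[float]], budget: int) -> Tuple[List[Tuple[int, float]], int]:
    """Return ((element, gain) picks in selection order, number of gain evaluations)."""
    n = len(sim)
    selected: Set[int] = set()
    current = 0.0  # f(selected), tracked incrementally
    picks: List[Tuple[int, float]] = []
    # Heap entries: (-gain, element, round in which the gain was computed).
    # heapq is a min-heap, so gains are negated to pop the largest first.
    heap = [(-facility_location(sim, {e}), e, 0) for e in range(n)]
    evaluations = n
    heapq.heapify(heap)
    while heap and len(picks) < budget:
        neg_gain, e, stamp = heapq.heappop(heap)
        if stamp == len(picks):
            # Fresh gain that is >= every (possibly stale) upper bound left: take it.
            selected.add(e)
            current += -neg_gain
            picks.append((e, -neg_gain))
        else:
            # Stale upper bound: recompute against the current selection and push back.
            gain = facility_location(sim, selected | {e}) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, e, len(picks)))
    return picks, evaluations


# Hypothetical symmetric similarity matrix over 5 items.
sim = [
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.6, 0.4],
    [0.0, 0.0, 0.6, 1.0, 0.1],
    [0.0, 0.3, 0.4, 0.1, 1.0],
]
picks, evaluations = lazy_greedy(sim, budget=3)
print(picks)
# Naive greedy would need 5 + 4 + 3 = 12 gain evaluations for this budget.
print(evaluations)
```

Naive greedy recomputes every remaining candidate's gain in every round; on larger ground sets the lazy variant typically skips most of those recomputations.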
Google Colab Notebooks Demonstrating the power of SubModLib and sample usage
- Modelling capabilities of regular submodular functions
- Modelling capabilities of submodular mutual information (SMI) functions
- Modelling capabilities of conditional gain (CG) functions
- Modelling capabilities of conditional mutual information (CMI) functions
- This notebook contains a quantitative analysis of the performance of different functions and the role of parameterization in aspects such as query-coverage, query-relevance, privacy-irrelevance and diversity for different SMI, CG and CMI functions, as observed on a synthetically generated dataset.
- This notebook contains a similar analysis on the ImageNet dataset.
- For a more detailed discussion on all possible usage patterns, please see Different Options of Usage
Setup
Alternative 1
$ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib
Alternative 2 (if you need to build the documentation locally or run the tests)
$ git clone https://github.com/decile-team/submodlib.git
$ cd submodlib
$ pip install .
- Latest documentation is available at readthedocs. However, if you need to build the documentation locally, follow these steps:
$ pip install -U sphinx
$ pip install sphinxcontrib-bibtex
$ pip install sphinx-rtd-theme
$ cd docs
$ make clean html
- To run the tests, follow these steps:
$ pip install pytest
$ pytest  # runs ALL tests
$ pytest -m <marker> --verbose --disable-warnings -rA  # runs only the tests selected by <marker>; possible markers are listed in pyproject.toml
Usage
It is very easy to get started with submodlib. Using a submodular function in submodlib essentially boils down to just two steps:
- instantiate the corresponding function object
- invoke the desired method on the created object
The most frequently used methods are:
- f.evaluate() - takes a subset and returns the score of the subset as computed by the function f
- f.marginalGain() - takes a subset and an element and returns the marginal gain of adding the element to the subset, as computed by f
- f.maximize() - takes a budget and an optimizer and returns the subset obtained by (approximately) maximizing f with that optimizer
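To make the first two methods concrete, the sketch below implements the quantities behind evaluate() and marginalGain() for a toy facility-location function in plain Python. It does not call SubModLib, and `facility_location`, `marginal_gain` and the similarity matrix are hypothetical. The marginal gain of adding an element j to a subset A is f(A ∪ {j}) - f(A), and for a submodular f it can only shrink as A grows (diminishing returns).

```python
from typing import Sequence, Set

# Hypothetical plain-Python stand-ins for f.evaluate() and f.marginalGain();
# not SubModLib's implementation.


def facility_location(sim: Sequence[Sequence[float]], subset: Set[int]) -> float:
    """Score of a subset: f(A) = sum_i max_{j in A} sim[i][j], with f({}) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def marginal_gain(sim: Sequence[Sequence[float]], subset: Set[int], element: int) -> float:
    """Gain of adding `element` to `subset`: f(A | {element}) - f(A)."""
    return facility_location(sim, subset | {element}) - facility_location(sim, subset)


# Hypothetical symmetric similarity matrix over 5 items.
sim = [
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.6, 0.4],
    [0.0, 0.0, 0.6, 1.0, 0.1],
    [0.0, 0.3, 0.4, 0.1, 1.0],
]

print(facility_location(sim, {1}))      # score of the subset {1}
print(marginal_gain(sim, {1}, 2))       # gain of adding item 2 to {1}
print(marginal_gain(sim, {1, 4}, 2))    # smaller gain once item 4 is also selected
```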
For example,
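import numpy as np
from submodlib import FacilityLocationFunction
groundData = np.random.rand(43, 2)  # illustrative ground set: 43 random 2-D points, one per row
objFL = FacilityLocationFunction(n=43, data=groundData, mode="dense", metric="euclidean")
greedyList = objFL.maximize(budget=10, optimizer='NaiveGreedy')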
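For intuition about what optimizer='NaiveGreedy' does with a budget, here is a minimal plain-Python sketch of naive greedy on a toy facility-location function: in each round it recomputes the marginal gain of every remaining candidate and adds the best one. It returns (element, gain) pairs in selection order; treat that format as an assumption for illustration and check the SubModLib documentation for the exact structure of greedyList. `naive_greedy`, `facility_location` and the matrix are hypothetical and do not call SubModLib.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical illustration of naive greedy; not SubModLib's implementation.


def facility_location(sim: Sequence[Sequence[float]], subset: Set[int]) -> float:
    """f(A) = sum_i max_{j in A} sim[i][j], with f({}) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def naive_greedy(sim: Sequence[Sequence[float]], budget: int) -> List[Tuple[int, float]]:
    """Add, `budget` times, the candidate with the largest marginal gain."""
    n = len(sim)
    selected: Set[int] = set()
    current = 0.0  # f(selected), tracked incrementally
    picks: List[Tuple[int, float]] = []
    for _ in range(min(budget, n)):
        best, best_gain = -1, float("-inf")
        for e in range(n):
            if e in selected:
                continue
            # Every remaining candidate is re-evaluated in every round.
            gain = facility_location(sim, selected | {e}) - current
            if gain > best_gain:  # strict ">" keeps the lowest index on ties
                best, best_gain = e, gain
        selected.add(best)
        current += best_gain
        picks.append((best, best_gain))
    return picks


# Hypothetical symmetric similarity matrix over 5 items.
sim = [
    [1.0, 0.8, 0.0, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.6, 0.4],
    [0.0, 0.0, 0.6, 1.0, 0.1],
    [0.0, 0.3, 0.4, 0.1, 1.0],
]
picks = naive_greedy(sim, budget=3)
print(picks)                  # (element, gain) pairs in selection order
print([e for e, _ in picks])  # just the selected indices
```

Summing the gains recovers f of the selected set, which makes a quick sanity check; for a submodular f the recorded gains are also non-increasing.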