Over-Parametrized
Repository for my final project for the Biological and Artificial Intelligence class (Neuro 140) at Harvard, taught by Professor Gabriel Kreiman.
Modern machine learning lives in the over-parametrized regime
Modern architectures, such as deep neural networks, typically have many more free parameters than the examples used to train them. Classical statistics predicts that they should over-fit the training set and generalise poorly on a test sample. Why this is not the case is still an open question, which I investigate here from the perspective of kernel machines.
Conda environment
The simplest way to run this code is to create a Conda environment from the environment file, by simply running `conda env create -f environment.yml`.
If you are not familiar with Conda environments, you can have a look here to learn more! They are extremely powerful!
The condition number for random matrices
The first part of the project studies the behaviour of the condition number for linear systems and kernel machines. In the Condition number notebook, you can compute the condition number of random matrices associated with linear systems and kernel machines, varying the number of parameters and the dimensionality of the data. The condition number follows a double descent shape, with a peak when the number of training examples (or conditions, for a linear system) approaches the dimensionality of the data.
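As a minimal sketch of this idea (not the notebook's actual code), the peak near the interpolation threshold can be reproduced with plain NumPy; the matrix sizes and trial counts below are illustrative choices of mine:

```python
import numpy as np

def mean_condition_number(n, d, trials=20, seed=0):
    """Average condition number of random n x d Gaussian matrices."""
    rng = np.random.default_rng(seed)
    return np.mean([np.linalg.cond(rng.standard_normal((n, d)))
                    for _ in range(trials)])

if __name__ == "__main__":
    n = 100  # number of training examples (rows / conditions)
    for d in (50, 90, 100, 110, 200):  # data dimensionality
        # The average condition number peaks sharply when d is close to n.
        print(f"d = {d:3d}  mean condition number = {mean_condition_number(n, d):.1f}")
```

The matrix is worst-conditioned when it is (nearly) square, i.e. when the number of conditions approaches the dimensionality, and improves again on both sides of that threshold.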
Classification of hand-written MNIST digits with support-vector machines
To observe the effect of over-parametrization, I looked for the double descent pattern in the test error of a classifier trained to distinguish the well-known MNIST handwritten digits.
To run the MNIST SVM notebook, you will need to download the MNIST train and test datasets in CSV format from the Kaggle website and place them in a folder called data.
The notebook lets you change many different parameters and produce test-error curves as a function of the number of training examples. The function analytic_pipeline_v can be called with an array of parameters to vary, while all the others are kept at their default values.
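A hedged sketch of this kind of pipeline (the notebook's actual loading code and parameters may differ; the file paths and the helper names below are my assumptions — Kaggle's MNIST CSVs have a header row and put the digit label in the first column):

```python
import numpy as np
from sklearn.svm import LinearSVC

def load_mnist_csv(path):
    """Load a Kaggle-style MNIST CSV: header row, label in column 0."""
    data = np.loadtxt(path, delimiter=",", skiprows=1)
    y = data[:, 0].astype(int)
    X = data[:, 1:] / 255.0  # scale pixel values to [0, 1]
    return X, y

def svm_test_error(X_train, y_train, X_test, y_test, C=1.0):
    """Train a linear SVM and return its error rate on the test set."""
    clf = LinearSVC(C=C, max_iter=5000).fit(X_train, y_train)
    return 1.0 - clf.score(X_test, y_test)
```

Sweeping the number of training rows passed to `svm_test_error` (e.g. by slicing `X_train[:n]`) yields a test-error curve as a function of training-set size.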
Kernel ridge-less regression with kernel machines, implemented with a closed-form expression
Another study implements kernel ridgeless regression (i.e. without regularisation) using a closed-form expression for kernel machines. The classes in the main class definition file can be customised to handle different types of data. I adapted them to handle both synthetic data, where a clear double descent pattern emerges, and MNIST handwritten digits.
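As an illustration of the closed-form ridgeless solution (a sketch, not the repository's class definitions), the minimum-norm interpolant is alpha = K⁺ y, computed with the pseudoinverse; the RBF kernel and the function names below are my own choices:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_ridgeless(X, y, gamma=1.0):
    """Closed-form ridgeless fit: alpha = K^+ y (no regularisation term).

    The pseudoinverse gives the minimum-norm solution even when the
    kernel matrix K is singular, which is the interpolating regime."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.pinv(K) @ y

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the kernel machine f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

When K is full rank, this solution interpolates the training targets exactly; adding a ridge term would replace the pseudoinverse with the inverse of K + λI.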
Reach out to me to collaborate and contribute!
I'd love to share this project and work together with other people!