MLFF
Repository for training, testing and developing machine learned force fields using the SO3krates transformer [1, 2].
Installation
We assume you have already set up a virtual environment with Python version >= 3.9. To ensure compatibility
with CUDA, jax/jaxlib have to be installed manually. Therefore, before you install MLFF, run one of the following
commands (depending on your CUDA version)
pip install --upgrade pip
# CUDA 12 installation
# Note: wheels only available on linux.
pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# CUDA 11 installation
# Note: wheels only available on linux.
pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
For details, check the official JAX
repository.
Next clone the mlff repository by running
git clone https://github.com/thorben-frank/mlff.git
Now do
cd mlff
pip install -e .
which completes the installation and installs remaining dependencies.
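A quick way to confirm that the manual jax/jaxlib installation and the editable install succeeded is to ask Python which packages it can find. The following is a minimal sketch; the helper name `check_installed` is only illustrative and not part of mlff.

```python
import importlib.util


def check_installed(names):
    """Return a dict mapping each top-level package name to True/False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_installed(["jax", "jaxlib", "mlff"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")

    if status["jax"]:
        import jax

        # On a working CUDA installation this list should contain GPU devices.
        print(jax.devices())
```

If `jax.devices()` only lists CPU devices, the CUDA-enabled wheel was most likely not picked up.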
Weights and Biases
If you do not have a Weights and Biases account already, you can create one here. After installing
mlff run
wandb login
and log in with your account.
Quickstart
In the following, we give a quick start on how to train, evaluate and run an MD simulation with the
SO3krates model.
Training
Train your first SO3krates model by running
train_so3krates --data_file data.xyz --n_train 1000 --n_valid 100 --wandb_init project=so3krates,name=first_run
The --data_file can be in any format readable by the ase.io.read method. In case the minimum image convention
should be applied, add --mic to the command. The model parameters will be saved by default to module/. Another
directory can be specified using --ckpt_dir $CKPT_DIR, which will save the model parameters to $CKPT_DIR/.
More details on training can be found in the detailed training section below.
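To illustrate what the --mic flag refers to, here is a minimal, generic sketch of the minimum image convention using NumPy. It is not mlff's implementation: the function name `minimum_image` is hypothetical, the lattice vectors are assumed to be stored row-wise, and the rounding approach shown is only exact for orthorhombic (or mildly skewed) cells.

```python
import numpy as np


def minimum_image(displacement, cell, pbc=(True, True, True)):
    """Wrap displacement vectors to their nearest periodic image.

    displacement: array of shape (..., 3), Cartesian vectors in Angstrom.
    cell: array of shape (3, 3), lattice vectors stored row-wise.
    pbc: which of the three lattice directions are periodic.
    """
    displacement = np.asarray(displacement, dtype=float)
    cell = np.asarray(cell, dtype=float)
    # Cartesian -> fractional coordinates (rows of `cell` are lattice vectors).
    frac = displacement @ np.linalg.inv(cell)
    # Shift periodic directions by whole lattice vectors into [-0.5, 0.5].
    frac = frac - np.round(frac) * np.asarray(pbc, dtype=float)
    # Fractional -> Cartesian coordinates.
    return frac @ cell


cell = 10.0 * np.eye(3)  # toy cubic box with 10 Angstrom edges
print(minimum_image([9.0, 0.0, 0.0], cell))
```

In the toy box above, a displacement of 9 Angstrom along x is mapped to its nearest image at roughly -1 Angstrom.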
Evaluation
After training, change into the model directory and run the evaluate command, e.g.
cd module
evaluate
When your data is not in eV and Angstrom, add the --units keyword (e.g. --units energy='kcal/mol',force='kcal/(mol*Ang)'
if the energy in your data is in kcal/mol). The reported metrics are then in eV and Angstrom.
ASE Calculator
Before you can use the calculator, make sure you install the glp
package by cloning the glp repository and installing it
git clone git@github.com:sirmarcel/glp.git
cd glp
pip install .
After training you can create an ASE Calculator from the trained model via
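import numpy as np
# NOTE: import path is an assumption; adjust it to where mlffCalculator lives in your mlff version.
from mlff.md.calculator import mlffCalculator

calculator = mlffCalculator.create_from_ckpt_dir(
    'path_to_ckpt_dir',  # directory where e.g. hyperparameters.json is saved.
    dtype=np.float32
)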
Molecular Dynamics
WARNING: MD is currently being rewritten, so do not expect it to work.
You can use the mdx package which is the mlff internal MD package, fully relying on jax and thus fully
optimized for XLA compilation on GPU.
First, let's create a relaxed structure using the LBFGS optimizer
run_relaxation --qn_max_steps 1000 --qn_tol 0.0001 --use_mdx
which will save the relaxed geometry to relaxed_structure.h5. Next, convert the .h5 file to an
xyz file, by running
trajectory_to_xyz --trajectory relaxed_structure.h5 --output relaxed_structure.xyz
We now run an MD with the relaxed structure as start geometry
run_md --start_geometry relaxed_structure.xyz --thermostat velocity_verlet --temperature_init 600 --time_step 0.5 --total_time 1 --use_mdx
Temperature is in Kelvin, time step in femtoseconds and total time in nanoseconds. It will save a trajectory.h5
file to the current working directory.
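Since the time step and the total time use different units, it can help to check how many integration steps a run implies. A small sketch of the arithmetic (the function name is illustrative and not part of mlff):

```python
def number_of_md_steps(total_time_ns, time_step_fs):
    """Number of integration steps for a total time in ns and a time step in fs."""
    femtoseconds_per_nanosecond = 1_000_000
    return round(total_time_ns * femtoseconds_per_nanosecond / time_step_fs)


# Values from the run_md command above: 1 ns total time, 0.5 fs time step.
print(number_of_md_steps(total_time_ns=1, time_step_fs=0.5))  # 2000000
```

The command above therefore corresponds to two million integration steps.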
Analysis
After the MD is finished, you can work with trajectory.h5 directly, e.g. using a Jupyter notebook and h5py.
Alternatively, you can run
trajectory_to_xyz --trajectory trajectory.h5 --output trajectory.xyz
which will create an xyz file. The resulting xyz file can be used as input to the
MDAnalysis python package, which provides a broad range of functions
to analyse the MD simulations. The central Universe object can be created easily as
import MDAnalysis as mda

# Load MD simulation results from xyz
u = mda.Universe('trajectory.xyz')
Deep Dive
In the quickstart section we went through a few basic steps that allow you to train and validate a SO3krates model
and to run an MD simulation. If you want to learn more about each of the steps, check the following sections.
Training
Let's start from the training command already shown in the quickstart section
train_so3krates --ckpt_dir first_module --data_file atoms.xyz --n_train 1000 --n_valid 100
for which all data in atoms.xyz is loaded and split into 1000 data points for training and
100 data points for validation. The remaining n_test = n_tot - 1000 - 100 data points are held back for testing
the potential after training. The validation data points are used to determine the best performing model during training,
for which the parameters are saved to first_module/checkpoint_loss_XXX, where XXX denotes the training step at which the best performing
model was found. We will show later how to load the checkpoint such that one can use the trained potential directly in
Python.
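The split can be pictured with the following NumPy sketch. It assumes a random permutation of the data indices and uses a hypothetical helper `split_indices`; whether and how mlff shuffles or seeds the split is not stated here, so treat this as an illustration of the bookkeeping only.

```python
import numpy as np


def split_indices(n_tot, n_train, n_valid, seed=0):
    """Split range(n_tot) into disjoint training, validation and test indices."""
    if n_train + n_valid > n_tot:
        raise ValueError("n_train + n_valid exceeds the number of data points")
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(n_tot)
    train = permutation[:n_train]
    valid = permutation[n_train:n_train + n_valid]
    test = permutation[n_train + n_valid:]  # n_test = n_tot - n_train - n_valid
    return train, valid, test


# Toy data set size of 5000, chosen only for illustration.
train, valid, test = split_indices(n_tot=5000, n_train=1000, n_valid=100)
print(len(train), len(valid), len(test))  # 1000 100 3900
```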
Input Data Files
mlff can deal with any input file that can be read by the ase.io.read method.
Furthermore, --data_file also accepts *.npz files. *.npz files allow storing numpy.ndarray objects under different keys.
Thus, mlff needs to "know" under which key to find e.g. the positions, the forces and so on. By default, mlff
assumes the following relations between property and key
{
    atomic_position: R,  # shape: (n_data, n, 3)
    atomic_type: z,  # shape: (n_data, n) or (n)
    energy: E,  # shape: (n_data, 1)
    force: F,  # shape: (n_data, n, 3)
    # in case mic should be applied (via --mic keyword)
    unit_cell: unit_cell,  # shape: (n_data, 3, 3), lattice vectors are row-wise
    pbc: pbc  # shape: (n_data, 3)
}
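As an illustration of this default convention, the following sketch writes a toy *.npz file with numpy.savez and prints the stored keys and shapes. The file name toy_data.npz, the array sizes and the random values are placeholders chosen for this example only.

```python
import numpy as np

n_data, n_atoms = 10, 3  # toy sizes, chosen only for illustration
rng = np.random.default_rng(0)

np.savez(
    "toy_data.npz",
    R=rng.normal(size=(n_data, n_atoms, 3)),  # atomic positions
    z=np.array([8, 1, 1]),                    # atomic types, shared by all frames
    E=rng.normal(size=(n_data, 1)),           # energies
    F=rng.normal(size=(n_data, n_atoms, 3)),  # forces
)

# Reload the file and list every key together with the shape of its array.
data = np.load("toy_data.npz")
for key in data.files:
    print(key, data[key].shape)
```

For periodic systems, the unit_cell and pbc arrays would be added under their respective keys in the same way.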
If you have an *.npz file which uses a different convention, you can customize the property keys
via
train_so3krates --ckpt_dir second_module --data_file file.npz --n_train 1000 --n_valid 100 --prop_keys atomic_position=pos,atomic_type=numbers
The above example assumes that the properties energy and force are still found under the keys E and F,
respectively, but atomic_position and atomic_type are found under pos and numbers.
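The override logic can be sketched as follows: start from the default mapping and replace only the entries given on the command line. This is an illustrative parser with hypothetical names (`DEFAULT_PROP_KEYS`, `parse_prop_keys`), not the one used inside mlff.

```python
DEFAULT_PROP_KEYS = {
    "atomic_position": "R",
    "atomic_type": "z",
    "energy": "E",
    "force": "F",
    "unit_cell": "unit_cell",
    "pbc": "pbc",
}


def parse_prop_keys(argument, defaults=DEFAULT_PROP_KEYS):
    """Merge 'prop=key,prop=key' overrides into a copy of the defaults."""
    prop_keys = dict(defaults)
    for item in argument.split(","):
        prop, sep, key = item.partition("=")
        # Reject items without '=' and properties that are not known.
        if not sep or prop not in prop_keys:
            raise ValueError(f"cannot interpret {item!r}")
        prop_keys[prop] = key
    return prop_keys


prop_keys = parse_prop_keys("atomic_position=pos,atomic_type=numbers")
print(prop_keys["atomic_position"], prop_keys["energy"])  # pos E
```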
Units
By default, mlff assumes the ASE default units, which are eV for energy and Angstrom for coordinates. Some data
sets, however, differ from this convention, e.g. the MD17 or the MD22 data set. You can download the corresponding
*.npz files here (Note that the *.xyz files provided there are not formatted in a
way that allows reading them via the ase.io.read method). For both data sets the energy is in kcal/mol, such that
the forces are in kcal/(mol*Ang). You can either pre-process the data yourself by applying the proper conversion
factors and pass the converted data directly into the train_so3krates command, or you can specify the units manually by
train_so3krates --ckpt_dir dha_module --data_file dha.npz --n_train 1000 --n_valid 500 --units energy='kcal/mol',force='kcal/(mol*Ang)'
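For the first option, pre-processing the data yourself, the conversion could look like the following sketch. The conversion factor is derived from the SI-defined Avogadro constant and elementary charge together with the thermochemical kilocalorie; the helper name `to_ase_units` is hypothetical, and the result may differ in the last digits from the constants shipped with ASE.

```python
import numpy as np

AVOGADRO = 6.02214076e23             # 1/mol (exact in the SI)
ELEMENTARY_CHARGE = 1.602176634e-19  # J per eV (exact in the SI)
JOULE_PER_KCAL = 4184.0              # thermochemical kilocalorie

# 1 kcal/mol expressed in eV (about 0.04336 eV).
KCAL_PER_MOL_IN_EV = JOULE_PER_KCAL / (AVOGADRO * ELEMENTARY_CHARGE)


def to_ase_units(energy_kcal_mol, force_kcal_mol_ang):
    """Convert energies to eV and forces to eV/Angstrom."""
    energy_ev = np.asarray(energy_kcal_mol) * KCAL_PER_MOL_IN_EV
    force_ev_ang = np.asarray(force_kcal_mol_ang) * KCAL_PER_MOL_IN_EV
    return energy_ev, force_ev_ang
```

Because lengths are already in Angstrom, energies and forces are scaled by the same factor.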