Bohrium
Bohrium -- AI-accelerated computational chemistry for materials and molecular property prediction at the intersection of quantum mechanics and machine learning.
Topics: academic-search * active-inference * cognitive-architecture * generative-models * knowledge-graph * latent-diffusion * neuromorphic-computing * paper-recommendation * scholar-network
Overview
Bohrium (named after element 107, Bh) is a computational chemistry platform that applies machine learning to accelerate quantum chemical calculations, an area often referred to as quantum machine learning (QML) for chemistry or ML potentials. It implements and benchmarks ML potential energy surfaces (ML-PES) that learn to predict molecular energies and forces at near-DFT accuracy from atomistic descriptors, with inference orders of magnitude faster than ab initio calculations.
The core workflow follows the standard QML pipeline: generate a training dataset of molecular geometries with DFT-computed energies, forces, and dipole moments (or use an existing dataset such as QM9, ANI-1, or COMP6); featurise each molecule using MBTR, SOAP, or SchNet-style graph representations; train an energy model (Gaussian Process Regression, kernel ridge regression, or a message-passing neural network); evaluate on held-out test sets using MAE in kcal/mol (energy) and kcal/mol/Å (forces); and deploy for molecular dynamics or property screening.
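The train-and-evaluate steps of that pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: the "dataset" is a toy diatomic whose Morse curve stands in for DFT labels, the descriptor is just the bond length, and the kernel ridge regression is written by hand in NumPy. All function names are hypothetical.

```python
import numpy as np

def reference_energy(r):
    """Toy label in kcal/mol (Morse curve) standing in for a DFT energy."""
    return 100.0 * (1.0 - np.exp(-1.5 * (r - 1.0))) ** 2

def rbf_kernel(x1, x2, sigma=0.2):
    """Gaussian kernel between two sets of scalar descriptors."""
    return np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2.0 * sigma**2))

def krr_fit(x_train, y_train, lam=1e-6):
    """Solve (K + lam*I) alpha = y for the regression weights."""
    k = rbf_kernel(x_train, x_train)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)

def krr_predict(x_new, x_train, alpha):
    """Predict energies as a kernel-weighted sum over training points."""
    return rbf_kernel(x_new, x_train) @ alpha

# 1. "Dataset": bond lengths in Angstrom with reference energies
rng = np.random.default_rng(0)
r_all = rng.uniform(0.7, 2.5, size=200)
e_all = reference_energy(r_all)

# 2. Featurise: the bond length itself is the descriptor in this toy case
# 3. Train on 150 samples, hold out 50
x_train, y_train = r_all[:150], e_all[:150]
x_test, y_test = r_all[150:], e_all[150:]
alpha = krr_fit(x_train, y_train)

# 4. Evaluate with MAE in kcal/mol on the held-out set
mae = float(np.mean(np.abs(krr_predict(x_test, x_train, alpha) - y_test)))
print(f"held-out MAE: {mae:.4f} kcal/mol")
```

For real molecules the scalar descriptor would be replaced by a SOAP or MBTR vector and the kernel by one defined on those vectors; the fit and the MAE evaluation keep the same shape.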
The platform also includes Δ-ML (delta machine learning): training an ML model to predict the correction from a cheap low-level method (semi-empirical PM7) to an expensive high-level method (DFT in the bundled pipeline; the same scheme applies to CCSD(T) references), so that near-high-level accuracy becomes available at semi-empirical cost. Δ-ML is among the most practically useful ideas in computational chemistry, and Bohrium provides an implementation and a benchmark against QM9 reference data.
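The delta-learning idea can be illustrated on synthetic data. In the sketch below, which is an assumption-laden toy rather than the project's pipeline, a harmonic curve plays the role of the cheap low-level method and a Morse curve plays the role of the expensive high-level method. A hand-written NumPy kernel ridge model is fitted to the difference between the two, and the high-level energy is then estimated as the low-level energy plus the learned correction. All names are hypothetical.

```python
import numpy as np

def e_high(r):
    """Toy 'high-level' energy: Morse curve (kcal/mol)."""
    return 100.0 * (1.0 - np.exp(-1.5 * (r - 1.0))) ** 2

def e_low(r):
    """Toy 'low-level' energy: harmonic approximation to the same well."""
    return 100.0 * (1.5 * (r - 1.0)) ** 2

def rbf_kernel(x1, x2, sigma=0.3):
    return np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2.0 * sigma**2))

def fit_delta_model(r_train, lam=1e-6):
    """Fit kernel ridge weights to the correction E_high - E_low."""
    delta = e_high(r_train) - e_low(r_train)
    k = rbf_kernel(r_train, r_train)
    return np.linalg.solve(k + lam * np.eye(len(r_train)), delta)

def predict_high(r_new, r_train, alpha):
    """Estimate E_high as E_low plus the learned correction."""
    return e_low(r_new) + rbf_kernel(r_new, r_train) @ alpha

# Only 12 "expensive" reference points are used for training
r_train = np.linspace(0.8, 1.6, 12)
alpha = fit_delta_model(r_train)

r_test = np.linspace(0.82, 1.58, 50)
mae_low = float(np.mean(np.abs(e_low(r_test) - e_high(r_test))))
mae_delta = float(np.mean(np.abs(predict_high(r_test, r_train, alpha) - e_high(r_test))))
print(f"low-level MAE: {mae_low:.3f}  delta-corrected MAE: {mae_delta:.5f} (kcal/mol)")
```

The correction is usually smoother and smaller in magnitude than the total energy, which is why a model trained on the difference can get by with few high-level reference calculations.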
Motivation
DFT calculations that take hours per molecule create a fundamental bottleneck in drug discovery, materials design, and atmospheric chemistry. ML-accelerated potentials that achieve DFT accuracy in milliseconds per molecule have the potential to transform computational chemistry from a tool used on hundreds of candidates to one applied to millions. Bohrium was built to make those ML potential methodologies accessible, reproducible, and benchmarkable.
Architecture
Molecular Structure (xyz / SMILES / InChI)
|
Preprocessing (ase: geometry + neighbour list)
|
Featurisation:
+-- SOAP (Smooth Overlap of Atomic Positions)
+-- MBTR (Many-Body Tensor Representation)
+-- SchNet graph (atom types + interatomic distances)
|
Energy Predictor:
+-- GPR (Gaussian Process Regression, exact/sparse)
+-- KRR (Kernel Ridge Regression, Matern/RBF)
+-- SchNet / DimeNet / PaiNN (MPNN architectures)
|
Energy E(R) + Forces F = -∇E(R)
|
Δ-ML: E_HL ≈ E_LL + ΔE_ML(R)
Features
ML Potential Energy Surface Training
Train energy models on DFT datasets with GPR, KRR, or MPNN architectures -- producing models that predict energies and forces at DFT accuracy in milliseconds per evaluation.
SOAP and MBTR Featurisation
dscribe-based SOAP and MBTR descriptor computation with configurable hyperparameters, supporting elements from H to Rn and periodic/non-periodic systems.
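To show what a descriptor has to achieve, here is a small sketch of the Coulomb matrix eigenvalue spectrum, written directly in NumPy rather than through dscribe. It is a stand-in chosen for brevity, not SOAP or MBTR, and the water geometry is approximate and only for illustration. The point it demonstrates is that the descriptor does not change when the molecule is rotated, translated, or has its atoms reordered.

```python
import numpy as np

def coulomb_matrix_eigenvalues(numbers, positions):
    """Sorted eigenvalue spectrum of the Coulomb matrix.

    Off-diagonal: Z_i * Z_j / |R_i - R_j|; diagonal: 0.5 * Z_i ** 2.4.
    Distances are used in whatever length unit the input has.
    """
    z = np.asarray(numbers, dtype=float)
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder, diagonal is overwritten below
    cm = np.outer(z, z) / dist
    np.fill_diagonal(cm, 0.5 * z**2.4)
    return np.sort(np.linalg.eigvalsh(cm))[::-1]

# Approximate water geometry (O, H, H), illustrative values only
numbers = [8, 1, 1]
positions = np.array([[0.000, 0.000, 0.117],
                      [0.000, 0.757, -0.469],
                      [0.000, -0.757, -0.469]])

# Rotate about z, translate, and reorder the atoms
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = positions @ rot.T + np.array([1.0, -2.0, 0.5])
perm = [2, 0, 1]

d_ref = coulomb_matrix_eigenvalues(numbers, positions)
d_moved = coulomb_matrix_eigenvalues([numbers[i] for i in perm], moved[perm])

# A genuinely different geometry (one O-H bond stretched) changes the descriptor
stretched = positions.copy()
stretched[1] += np.array([0.0, 0.3, -0.2])
d_stretched = coulomb_matrix_eigenvalues(numbers, stretched)

print("invariant under rotation/translation/permutation:", np.allclose(d_ref, d_moved))
```

SOAP and MBTR provide the same invariances with a richer, smoother description of the local environment, which is why they are preferred for energy and force models.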
SchNet / PaiNN MPNN Models
Message-passing neural network implementations for end-to-end energy prediction directly from atomic numbers and positions, with equivariant force computation via automatic differentiation.
Δ-ML Implementation
Delta machine learning pipeline: compute PM7 energies, train a correction model towards DFT, and evaluate how closely the corrected energies reach DFT accuracy at PM7 cost -- benchmarked against the QM9 and ANI-1 datasets.
QM9 and ANI-1 Benchmarks
Pre-configured benchmark pipelines for the QM9 (134,000 small organic molecules, 13 DFT properties) and ANI-1 (20M off-equilibrium configurations) datasets with reference MAE values.
Molecular Dynamics Integration
ML potential deployment in the ASE MD engine: NVE/NVT/NPT molecular dynamics at ML-potential speed, with energy conservation monitoring and trajectory analysis tools.
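Energy conservation monitoring in an NVE run can be sketched without ASE. The example below is an illustration under simplifying assumptions: a one-dimensional Morse dimer in arbitrary units replaces the ML potential, and a hand-written velocity Verlet loop replaces the ASE integrator. The quantity to watch is the relative deviation of the total energy from its initial value.

```python
import numpy as np

def morse(r, d_e=4.0, a=1.5, r_e=1.0):
    """Energy and force along the bond for a Morse dimer (arbitrary units)."""
    x = np.exp(-a * (r - r_e))
    energy = d_e * (1.0 - x) ** 2
    force = -2.0 * d_e * a * (1.0 - x) * x  # F = -dE/dr
    return energy, force

def run_nve(r0=1.2, v0=0.0, mass=1.0, dt=0.005, steps=2000):
    """Velocity Verlet integration; returns total energy at every step."""
    r, v = r0, v0
    e_pot, f = morse(r)
    totals = [e_pot + 0.5 * mass * v**2]
    for _ in range(steps):
        v += 0.5 * dt * f / mass      # half kick
        r += dt * v                   # drift
        e_pot, f = morse(r)           # new force at the updated position
        v += 0.5 * dt * f / mass      # second half kick
        totals.append(e_pot + 0.5 * mass * v**2)
    return np.array(totals)

totals = run_nve()
drift = float(np.max(np.abs(totals - totals[0])) / abs(totals[0]))
print(f"max relative energy deviation: {drift:.2e}")
```

A deviation that grows steadily over time, rather than oscillating within a small band, usually points to a time step that is too large or to forces that are not consistent with the energy.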
Property Prediction Dashboard
Streamlit interface for predicting multiple molecular properties (dipole moment, polarisability, HOMO-LUMO gap, heat capacity) from SMILES or xyz input with the trained models.
Active Learning Loop
Uncertainty-guided active learning: identify structures where the ML model is uncertain, queue them for DFT calculation, add the results to the training set, and retrain -- iteratively improving the model at minimal DFT cost.
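One common way to realise such a loop is to use the predictive variance of a Gaussian process as the uncertainty signal. The sketch below assumes that choice; the source does not say which uncertainty estimate Bohrium uses. A toy Morse curve stands in for the DFT call, the candidate pool is a grid of bond lengths, and all names are hypothetical.

```python
import numpy as np

def reference_energy(r):
    """Stand-in for an expensive DFT call (toy Morse curve, kcal/mol)."""
    return 100.0 * (1.0 - np.exp(-1.5 * (r - 1.0))) ** 2

def rbf(x1, x2, sigma=0.3):
    return np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2.0 * sigma**2))

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with unit prior variance."""
    k_tt = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.clip(var, 0.0, None)

def active_learning(pool, labelled, n_queries):
    """Greedily label the pool point with the largest predictive variance."""
    labelled = list(labelled)
    max_var_history = []
    for _ in range(n_queries):
        x_train = pool[labelled]
        _, var = gp_posterior(x_train, reference_energy(x_train), pool)
        var[labelled] = -1.0           # never re-select a labelled point
        pick = int(np.argmax(var))
        max_var_history.append(float(var[pick]))
        labelled.append(pick)          # "run DFT" and add to the training set
    return labelled, max_var_history

pool = np.linspace(0.7, 2.5, 200)
labelled, history = active_learning(pool, labelled=[0, 199], n_queries=8)

# Retrain on everything labelled so far and check the fit over the whole pool
mean, _ = gp_posterior(pool[labelled], reference_energy(pool[labelled]), pool)
mae = float(np.mean(np.abs(mean - reference_energy(pool))))
print(f"labelled points: {len(labelled)}, pool MAE: {mae:.4f} kcal/mol")
```

With a fixed kernel the GP variance depends only on which geometries have been labelled, so this loop spreads reference calculations over poorly covered regions. For neural models, disagreement within an ensemble is a frequently used alternative.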
Tech Stack
| Library / Tool | Role | Why This Choice |
|---|---|---|
| ASE | Atomic simulation | Structure I/O, MD engine, calculator interface |
| dscribe | Featurisation | SOAP, MBTR, Coulomb Matrix descriptors |
| PyTorch Geometric | MPNN models | SchNet, PaiNN graph neural network training |
| scikit-learn | Classical ML models | GPR, KRR with precomputed kernel matrices |
| RDKit | SMILES processing | Molecular structure generation and manipulation |
| pandas / NumPy | Dataset management | QM9/ANI dataset loading and preprocessing |
| Streamlit | Property predictor UI | Interactive molecular property prediction interface |
Getting Started
Prerequisites
- Python 3.9+
- A virtual environment manager (`venv`, `conda`, or equivalent)
- Configuration values as listed in the Configuration section
Installation
cd Bohrium
python -m venv venv && source venv/bin/activate
pip install ase dscribe torch torch-geometric scikit-learn rdkit pandas numpy streamlit
# Optional: GPyTorch for exact/sparse GP regression
# pip install gpytorch
# Download QM9 dataset
python download_qm9.py --output data/qm9/
# Train energy model
python train.py --dataset qm9 --model schnet --target energy --epochs 100
# Launch property predictor
streamlit run app.py
Usage
# Train a SchNet model on the HOMO-LUMO gap
python train.py --dataset qm9 --model schnet --target gap --epochs 100
# Predict property for a molecule
python predict.py --smiles 'c1ccccc1' --property gap --model checkpoints/best.pt
# Run molecular dynamics
python md_simulation.py --structure benzene.xyz --potential ml_potential.pt \
--ensemble NVT --temperature 300 --steps 10000
# Δ-ML training
python delta_ml.py --low_level pm7 --high_level dft --dataset qm9 --target energy
Configuration
| Variable | Default | Description |
|---|---|---|
| `MODEL` | `schnet` | ML potential model: gpr, krr, schnet, painn |
| `DATASET` | `qm9` | Training dataset: qm9, ani1, custom |
| `TARGET_PROPERTY` | `energy` | Prediction target: energy, forces, dipole, gap |
| `LEARNING_RATE` | `0.001` | Neural model learning rate |
| `CUTOFF_RADIUS` | `6.0` | Atomic neighbour cutoff in Angstroms |
Copy `.env.example` to `.env` and populate the required values before running.
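How these variables might be read and validated is sketched below. The variable names, defaults, and allowed values come from the table above; the `Settings` class and `load_settings` function are hypothetical and only illustrate one way to do it with the standard library. Loading the `.env` file into the process environment is left to whatever tool the project uses.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table."""
    model: str = "schnet"
    dataset: str = "qm9"
    target_property: str = "energy"
    learning_rate: float = 0.001
    cutoff_radius: float = 6.0  # Angstroms

ALLOWED = {
    "model": {"gpr", "krr", "schnet", "painn"},
    "dataset": {"qm9", "ani1", "custom"},
    "target_property": {"energy", "forces", "dipole", "gap"},
}

def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default) and validate it."""
    env = os.environ if env is None else env
    defaults = Settings()
    settings = Settings(
        model=env.get("MODEL", defaults.model),
        dataset=env.get("DATASET", defaults.dataset),
        target_property=env.get("TARGET_PROPERTY", defaults.target_property),
        learning_rate=float(env.get("LEARNING_RATE", defaults.learning_rate)),
        cutoff_radius=float(env.get("CUTOFF_RADIUS", defaults.cutoff_radius)),
    )
    for name, allowed in ALLOWED.items():
        value = getattr(settings, name)
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return settings

# Example: override two values, keep the remaining defaults
example = load_settings({"MODEL": "painn", "CUTOFF_RADIUS": "5.0"})
print(example)
```

Validating at load time means a typo such as an unknown model name fails immediately instead of partway through a training run.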
Project Structure
Bohrium/
+-- README.md
+-- requirements.txt
+-- ReSeArcH.py
+-- ...
Roadmap
- NequIP and MACE equivariant MPNN implementation for force accuracy improvement
- Transfer learning from pre-trained universal potentials (MACE-MP-0, CHGNet) for new chemistries
- Reaction path optimisation: transition state search using ML potential for barrier height prediction
- Crystal structure prediction: evolutionary algorithm using ML potential for free energy minimisation
- Cloud deployment: REST API for on-demand molecular property prediction
Contributing
Contributions, issues, and suggestions are welcome.
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-idea`
- Commit your changes: `git commit -m 'feat: add your idea'`
- Push to your branch: `git push origin feature/your-idea`
- Open a Pull Request with a clear description
Please follow conventional commit messages and add documentation for new features.
Notes
ML potentials trained on QM9 are valid only for small organic molecules containing C, H, O, N, and F with up to 9 heavy atoms. Extrapolation to larger molecules or different chemistries requires retraining or fine-tuning. Forces computed via automatic differentiation (backpropagation through the energy model) are exact gradients of the predicted energy, not finite-difference approximations.
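The difference between an exact gradient and a finite-difference estimate can be made concrete with a toy model. The sketch below does not use automatic differentiation: it substitutes a hand-derived analytic gradient of a Morse pair potential for the autodiff step, then compares it with central finite differences and checks that the forces rotate with the molecule. The potential, its parameters, and the geometry are illustrative assumptions.

```python
import numpy as np

def energy_and_forces(positions, d_e=4.0, a=1.5, r_e=1.0):
    """Sum of Morse pair energies and the analytic forces F_i = -dE/dR_i."""
    pos = np.asarray(positions, dtype=float)
    energy = 0.0
    forces = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            rij = pos[i] - pos[j]
            r = np.linalg.norm(rij)
            x = np.exp(-a * (r - r_e))
            energy += d_e * (1.0 - x) ** 2
            de_dr = 2.0 * d_e * a * (1.0 - x) * x
            forces[i] -= de_dr * rij / r   # chain rule: dr/dR_i = rij / r
            forces[j] += de_dr * rij / r   # equal and opposite on atom j
    return energy, forces

def finite_difference_forces(positions, h=1e-5):
    """Central-difference estimate of the forces, one coordinate at a time."""
    pos = np.asarray(positions, dtype=float)
    forces = np.zeros_like(pos)
    for idx in np.ndindex(pos.shape):
        plus, minus = pos.copy(), pos.copy()
        plus[idx] += h
        minus[idx] -= h
        e_plus = energy_and_forces(plus)[0]
        e_minus = energy_and_forces(minus)[0]
        forces[idx] = -(e_plus - e_minus) / (2.0 * h)
    return forces

positions = np.array([[0.0, 0.0, 0.0],
                      [1.1, 0.0, 0.0],
                      [0.3, 0.9, 0.2]])
_, forces = energy_and_forces(positions)
fd_forces = finite_difference_forces(positions)

# Rotating the molecule should rotate the forces in the same way
theta = 0.4
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
_, rotated_forces = energy_and_forces(positions @ rot.T)

print("max |analytic - finite difference|:", float(np.max(np.abs(forces - fd_forces))))
```

The finite-difference route needs two energy evaluations per coordinate and carries a step-size-dependent error, while a gradient obtained analytically or by backpropagation costs roughly one extra pass and is consistent with the energy by construction.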
Author
Devanik Debnath
B.Tech, Electronics & Communication Engineering
National Institute of Technology Agartala
License
This project is open source and available under the MIT License.
Built with curiosity, depth, and care -- because good projects deserve good documentation.