SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"), low-bit (<=2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, reg...
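To illustrate the QAT idea these toolkits implement, below is a minimal PyTorch sketch of symmetric 8-bit fake quantization with a straight-through estimator; the class names and the per-tensor absmax scaling are illustrative assumptions, not micronet's actual API.

import torch

# Symmetric per-tensor fake quantization with a straight-through estimator (STE):
# forward rounds to the integer grid and dequantizes, backward passes gradients unchanged.
class FakeQuantSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1                      # e.g. 127 for signed 8-bit
        scale = x.abs().max().clamp(min=1e-8) / qmax        # absmax scaling (assumption)
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale                                    # dequantized ("fake") value

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                            # STE: identity gradient

class QATLinear(torch.nn.Linear):
    """Linear layer that fake-quantizes weights and activations during training."""
    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight, 8)
        x_q = FakeQuantSTE.apply(x, 8)
        return torch.nn.functional.linear(x_q, w_q, self.bias)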
Neural Network Compression Framework for enhanced OpenVINO™ inference
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
YOLO model compression and multi-dataset training
Tutorial notebooks for hls4ml
A model compression and acceleration toolbox based on pytorch.
BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture
An automated model structure analysis and modification toolkit for PyTorch models, including a model compression algorithm library that automatically analyzes model structure
Quantized LLM training in pure CUDA/C++.
Enhancing LLMs with LoRA
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
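For reference, post-training quantization with the TensorFlow Lite converter (as covered in such notebooks) typically looks like the sketch below; the toy Keras model is a placeholder, and only dynamic-range quantization is shown (full-integer quantization additionally requires a representative dataset).

import tensorflow as tf

# Placeholder Keras model; any trained tf.keras.Model works the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Convert to TFLite with dynamic-range quantization enabled.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)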
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
Notes on quantization in neural networks
FrostNet: Towards Quantization-Aware Network Architecture Search
OpenVINO Training Extensions Object Detection
Quantization Aware Training
Quantization-aware training with spiking neural networks
Train neural networks with joint quantization and pruning on both weights and activations using any pytorch modules
FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch
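A minimal sketch of the learned-step-size idea behind LSQ/LSQ+ is shown below: the quantizer step size is a trainable parameter updated through a straight-through rounding pass and the usual gradient-scale trick; module and function names are illustrative, not this repository's API.

import torch
import torch.nn as nn

def grad_scale(x, scale):
    # Forward returns x; backward multiplies the gradient by `scale`.
    return (x - x * scale).detach() + x * scale

def round_pass(x):
    # Forward rounds; backward is the identity (straight-through estimator).
    return (x.round() - x).detach() + x

class LSQFakeQuantize(nn.Module):
    """Fake quantizer with a learned step size, in the spirit of LSQ/LSQ+."""
    def __init__(self, num_bits=8, signed=True):
        super().__init__()
        self.qn = -(2 ** (num_bits - 1)) if signed else 0
        self.qp = 2 ** (num_bits - 1) - 1 if signed else 2 ** num_bits - 1
        self.step = nn.Parameter(torch.tensor(1.0))          # learned step size s

    def forward(self, x):
        g = 1.0 / float((self.qp * x.numel()) ** 0.5)        # LSQ gradient scale
        s = grad_scale(self.step, g)
        x_q = torch.clamp(x / s, self.qn, self.qp)
        return round_pass(x_q) * s                           # fake-quantized output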