Repositories: lite.ai.toolkit | Awesome-LLM-Inference | LeetCUDA
ffpa-attn | HGEMM | flux-faster | Awesome-DiT-Inference
RVM-Inference | lihang-notes (PDF, 200 pages) | torchlm
Contact: qyjdef@163.com | GitHub: DefTruth | Zhihu: DefTruth
A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.
A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.
A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.
SGLang is a fast serving framework for large language models and vision language models.
Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models