xlite-dev · GitHub

I love open source, bro, and I think you do too.

Pinned

LeetCUDA Public

LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.

A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.

A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

Python 5k 347

A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.

Python 525 26

torchlm Public

An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.

Python 268 27

ffpa-attn Public

FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x speedup vs SDPA EA.

Cuda 255 13
