xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

Pinned

  1. LeetCUDA (Public)

    LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.

    Cuda · 9.8k stars · 979 forks

  2. lite.ai.toolkit (Public)

    A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.

    C++ · 4.4k stars · 775 forks

  3. Awesome-LLM-Inference (Public)

    A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

    Python · 5k stars · 347 forks

  4. Awesome-DiT-Inference (Public)

    A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.

    Python · 525 stars · 26 forks

  5. torchlm (Public)

    An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.

    Python · 268 stars · 27 forks

  6. ffpa-attn (Public)

    FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x vs SDPA EA.

    Cuda · 255 stars · 13 forks

Repositories

Showing 10 of 56 repositories
