llm-d

llm-d enables high performance distributed inference in production on Kubernetes

Welcome to llm-d: a Kubernetes-native high-performance distributed LLM inference framework

llm-d is a Kubernetes-native high-performance distributed LLM inference framework that provides the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d offers modular solutions for distributed inference, with features such as KV-cache-aware routing and disaggregated serving.

Quick Start Guide

New to llm-d? Here's how to get started:

  1. Join our Slack - Get your invite and visit llm-d.slack.com
  2. Explore our code - GitHub Organization
  3. Join a meeting - Add calendar
  4. Pick your area - Browse Special Interest Groups

Key Resources

Communication Channels

Regular Meetings

All meetings are open to the public!

  • Weekly Standup: Every Wednesday at 12:30pm ET - Project updates and open discussion
  • SIG Meetings: Various times throughout the week - See SIG details for schedules

Join to participate, ask questions, or just listen and learn!

Special Interest Groups (SIGs)

Want to dive deeper into specific areas? Our Special Interest Groups are focused teams working on different aspects of llm-d:

  • Inference Scheduler - Intelligent request routing and load balancing
  • Benchmarking - Performance testing and optimization
  • PD-Disaggregation - Prefill/decode separation patterns
  • KV-Disaggregation - KV caching and distributed storage
  • Installation - Kubernetes integration and deployment
  • Autoscaling - Traffic-aware autoscaling and resource management
  • Observability - Monitoring, logging, and metrics

View more SIG details.

How to Contribute

Getting Involved

Contributing Code

  1. Read Guidelines: Review our Code of Conduct and contribution process
  2. Sign Commits: All commits require DCO sign-off (git commit -s)

Ways to Contribute

  • Bug fixes and small features - Submit PRs directly to component repos
  • New features with APIs - Require project proposals
  • Documentation - Help improve guides and examples
  • Testing & Benchmarking - Contribute to our test coverage
  • Experimental features - Start in llm-d-incubation org

Security & Safety

Connect With Us

Follow llm-d across social platforms for updates, discussions, and community highlights:

Need Help?

Questions? Ideas? Just want to chat? We're here to help! The llm-d community team is friendly and responsive.


License: Apache 2.0

Pinned Loading

  1. llm-d (Public): Achieve state-of-the-art inference performance with modern accelerators on Kubernetes (Shell, 2.5k stars, 325 forks)

  2. llm-d-inference-scheduler (Public): Inference scheduler for llm-d (Go, 131 stars, 127 forks)

  3. llm-d-kv-cache (Public): Distributed KV cache scheduling & offloading libraries (Go, 103 stars, 88 forks)

  4. llm-d-benchmark (Public): llm-d benchmark scripts and tooling (Python, 47 stars, 52 forks)
