llm-d

llm-d enables high performance distributed inference in production on Kubernetes

Welcome to llm-d: a Kubernetes-native high-performance distributed LLM inference framework

llm-d is a Kubernetes-native high-performance distributed LLM inference framework that provides the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d offers modular solutions for distributed inference, with features such as KV-cache-aware routing and disaggregated serving.

Quick Start Guide

New to llm-d? Here's how to get started:

  1. Join our Slack - Get your invite and visit llm-d.slack.com
  2. Explore our code - GitHub Organization
  3. Join a meeting - Add calendar
  4. Pick your area - Browse Special Interest Groups

Key Resources

Communication Channels

Regular Meetings

All meetings are open to the public!

  • Weekly Standup: Every Wednesday at 12:30pm ET - Project updates and open discussion
  • SIG Meetings: Various times throughout the week - See SIG details for schedules

Join to participate, ask questions, or just listen and learn!

Special Interest Groups (SIGs)

Want to dive deeper into specific areas? Our Special Interest Groups are focused teams working on different aspects of llm-d:

  • Inference Scheduler - Intelligent request routing and load balancing
  • Benchmarking - Performance testing and optimization
  • PD-Disaggregation - Prefill/decode separation patterns
  • KV-Disaggregation - KV caching and distributed storage
  • Installation - Kubernetes integration and deployment
  • Autoscaling - Traffic-aware autoscaling and resource management
  • Observability - Monitoring, logging, and metrics

View more SIG details.

How to Contribute

Getting Involved

Contributing Code

  1. Read Guidelines: Review our Code of Conduct and contribution process
  2. Sign Commits: All commits require DCO sign-off (git commit -s)

Ways to Contribute

  • Bug fixes and small features - Submit PRs directly to component repos
  • New features with APIs - Require project proposals
  • Documentation - Help improve guides and examples
  • Testing & Benchmarking - Contribute to our test coverage
  • Experimental features - Start in llm-d-incubation org

Security & Safety

Connect With Us

Follow llm-d across social platforms for updates, discussions, and community highlights:

Need Help?

Questions? Ideas? Just want to chat? We're here to help! The llm-d community team is friendly and responsive.


License: Apache 2.0

Pinned Loading

  1. llm-d (Public): Achieve state-of-the-art inference performance with modern accelerators on Kubernetes (Shell, 2.5k stars, 325 forks)

  2. llm-d-inference-scheduler (Public): Inference scheduler for llm-d (Go, 131 stars, 127 forks)

  3. llm-d-kv-cache (Public): Distributed KV cache scheduling & offloading libraries (Go, 103 stars, 88 forks)

  4. llm-d-benchmark (Public): llm-d benchmark scripts and tooling (Python, 47 stars, 52 forks)
