llm-d incubation

Incubating components of llm-d, a Kubernetes-native high-performance distributed LLM inference framework

Popular repositories

llm-d-infra Public

llm-d Helm charts and deployment examples

Go Template, 50 stars, 55 forks

llm-d-modelservice Public

Helm charts for deploying models with llm-d

Go Template, 29 stars, 53 forks

llm-d-fast-model-actuation Public

Kubernetes controllers for fast model actuation using vLLM sleep/wake and launcher-based model swapping

Go, 9 stars, 12 forks

batch-gateway Public

The batch gateway is an llm-d implementation of the OpenAI batch inference API

Go, 7 stars, 12 forks

secure-inference Public

Go, 3 stars, 3 forks

ig-wva Public

Workload Variant Autoscaler is a service that computes the cost-optimal provisioning of heterogeneous accelerators for inference workloads with varying request-latency objectives

Jupyter Notebook, 2 stars, 2 forks