Running large language models on a single GPU for throughput-oriented scenarios.
Updated Oct 28, 2024 - Python
A thread-safe queue, faster and more resource-efficient than Go's native channels
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
atomate2 is a library of computational materials science workflows
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
Realtime distributed messaging platform built using Go and React (Fullstack)
One database for likes, views, follows -- pre-computed, served in real-time
A snakemake-based workflow for FEP and MM(PB/GB)SA calculations with GROMACS
Extremely lightweight UART library for AVR 8-bit microcontrollers
A low-latency LRU-approximation cache in C++ using the CLOCK second-chance algorithm, with multi-level cache support. Up to 2.5 billion lookups per second.
An Intuitive, Lightweight, High Performance Full Stack Java Web Framework.
Automated Blood Vasculature Analysis of 3D Light-Sheet Image Volumes
A reactive driver for Aeron transport (https://github.com/real-logic/aeron)
A Simple Way of Creating Job Workflows in Go running in Processes, Containers, Tasks, Pods, or Jobs
A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models
Full automation of relative protein-ligand binding free energy calculations in GROMACS
Window-Based Hybrid CPU/GPU Stream Processing Engine
The Zavolab Automated RNA-seq Pipeline
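Among the entries above, the C++ cache project names a concrete, compact technique: CLOCK second-chance eviction. A minimal single-level sketch of that policy follows (class and method names are illustrative, not the repository's actual API; assumes capacity > 0):

```cpp
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Illustrative CLOCK (second-chance) cache sketch -- not the
// repository's actual implementation. Each slot carries a reference
// bit; the "hand" sweeps the slots, giving referenced entries a
// second chance before eviction.
class ClockCache {
public:
    explicit ClockCache(std::size_t capacity)  // capacity must be > 0
        : slots_(capacity), hand_(0) {}

    // Look up a key, setting its reference bit on a hit.
    std::optional<int> get(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        slots_[it->second].referenced = true;
        return slots_[it->second].value;
    }

    // Insert or update a key, evicting via the clock hand when full.
    void put(int key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            slots_[it->second].value = value;
            slots_[it->second].referenced = true;
            return;
        }
        // Sweep: clear reference bits until a victim slot is found
        // whose bit is already cleared (the "second chance").
        while (slots_[hand_].occupied && slots_[hand_].referenced) {
            slots_[hand_].referenced = false;
            hand_ = (hand_ + 1) % slots_.size();
        }
        if (slots_[hand_].occupied) index_.erase(slots_[hand_].key);
        slots_[hand_] = {key, value, true, true};
        index_[key] = hand_;
        hand_ = (hand_ + 1) % slots_.size();
    }

private:
    struct Slot {
        int key = 0;
        int value = 0;
        bool referenced = false;
        bool occupied = false;
    };
    std::vector<Slot> slots_;
    std::unordered_map<int, std::size_t> index_;  // key -> slot position
    std::size_t hand_;  // clock hand position
};
```

The reference bit is why CLOCK only approximates LRU: it records *whether* a slot was touched since the hand last passed, not *when*, which trades exact recency ordering for O(1) updates with no list maintenance.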