# stopwatch
A simple solution for benchmarking vLLM, SGLang, and TensorRT-LLM on Modal.
## Setup

### Install dependencies
## Run a benchmark
To run a single benchmark, use the `provision-and-benchmark` command, which provisions an LLM server, benchmarks it, and saves the results to a local file.
For example, to run a synchronous (one request after another) benchmark with vLLM and save the results to `results.json`:

```bash
MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_SERVER_TYPE=vllm
OUTPUT_PATH=results.json

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH
```
Or, to run a fixed-rate (e.g. 5 requests per second) multi-GPU benchmark with SGLang:

```bash
GPU_TYPE=H100
GPU_COUNT=2
LLM_SERVER_TYPE=sglang
RATE_TYPE=constant
REQUESTS_PER_SECOND=5

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --gpu "$GPU_TYPE:$GPU_COUNT" --rate-type $RATE_TYPE --rate $REQUESTS_PER_SECOND --llm-server-config "{\"extra_args\": [\"--tp-size\", \"$GPU_COUNT\"]}"
```
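Hand-escaping the `--llm-server-config` JSON inside a shell string is error-prone. As a minimal sketch (assuming Python 3 is available; the GPU count of 2 is a hypothetical example value), the config string can be built and shell-quoted programmatically instead:

```python
import json
import shlex

# Hypothetical value; match whatever you pass via --gpu "H100:<count>".
gpu_count = 2

# Build the JSON for --llm-server-config without escaping quotes by hand.
config = json.dumps({"extra_args": ["--tp-size", str(gpu_count)]})

# shlex.quote wraps the string so the shell passes it through unmodified.
print("--llm-server-config", shlex.quote(config))
```

The printed fragment can then be pasted into (or substituted into) the `stopwatch provision-and-benchmark` invocation above.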
Or, to run a throughput (as many requests as the server can handle) test with TensorRT-LLM:

```bash
LLM_SERVER_TYPE=tensorrt-llm
RATE_TYPE=throughput

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --rate-type $RATE_TYPE
```
## Run the profiler
To profile a server with the PyTorch profiler, use the following command (only vLLM and SGLang are currently supported):

```bash
MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_SERVER_TYPE=vllm
NUM_REQUESTS=10
OUTPUT_PATH=trace.json.gz

stopwatch profile $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --num-requests $NUM_REQUESTS
```
Once profiling is done, the trace will be saved to `trace.json.gz`, which you can open and visualize at https://ui.perfetto.dev.

Keep in mind that generated traces can get very large, so it is recommended to send only a few requests while profiling.
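PyTorch profiler traces follow the Chrome trace format, so they can also be inspected programmatically rather than only in Perfetto. A minimal sketch, using a tiny hypothetical trace written locally in place of a real profiler output (the `traceEvents`, `ph`, `ts`, and `dur` fields are standard Chrome-trace fields, not Stopwatch-specific):

```python
import gzip
import json

# Tiny hypothetical trace in Chrome trace format, standing in for the
# real trace.json.gz produced by `stopwatch profile`.
sample = {
    "traceEvents": [
        {"name": "forward", "ph": "X", "ts": 0, "dur": 120},
        {"name": "decode", "ph": "X", "ts": 120, "dur": 80},
    ]
}
with gzip.open("trace.json.gz", "wt") as f:
    json.dump(sample, f)

# Reading a trace back: events live under the "traceEvents" key, and
# "dur" holds each complete ("X") event's duration in microseconds.
with gzip.open("trace.json.gz", "rt") as f:
    trace = json.load(f)

events = trace["traceEvents"]
print(len(events), "events,", sum(e["dur"] for e in events), "us total")
```

This kind of quick summary is useful for sanity-checking a trace's size and contents before uploading it to the Perfetto UI.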
## Run tests
Before committing any changes, you should make sure that your changes don't break any core functionality in Stopwatch. You may verify this with:
### Lint
To make sure that any code changes are compliant with our linting rules, you can run `ruff` with:
## Contributing
We welcome contributions, including those that add tuned benchmarks to our collection. See the CONTRIBUTING file and the Getting Started document for more details on contributing to Stopwatch.
## License
Stopwatch is available under the MIT license. See the LICENSE file for more details.