# stopwatch
A simple solution for benchmarking vLLM, SGLang, and TensorRT-LLM on Modal.
## Setup

### Install dependencies
## Run a benchmark
To run a single benchmark, use the `provision-and-benchmark` command, which provisions an LLM server, benchmarks it, and saves the results to a local file.
For example, to run a synchronous (one request after another) benchmark with vLLM and save the results to `results.json`:

```bash
MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_SERVER_TYPE=vllm
OUTPUT_PATH=results.json

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH
```
Or, to run a fixed-rate (e.g. 5 requests per second) multi-GPU benchmark with SGLang:

```bash
GPU_TYPE=H100
GPU_COUNT=2
LLM_SERVER_TYPE=sglang
RATE_TYPE=constant
REQUESTS_PER_SECOND=5

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --gpu "$GPU_TYPE:$GPU_COUNT" --rate-type $RATE_TYPE --rate $REQUESTS_PER_SECOND --llm-server-config "{\"extra_args\": [\"--tp-size\", \"$GPU_COUNT\"]}"
```
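Hand-escaping the `--llm-server-config` JSON inside a shell string is error-prone. As a minimal sketch (assuming Python 3 is available; the GPU count of 2 is a hypothetical example value), the config string can be built and shell-quoted programmatically instead:

```python
import json
import shlex

# Hypothetical value; match whatever you pass via --gpu "H100:<count>".
gpu_count = 2

# Build the JSON for --llm-server-config without escaping quotes by hand.
config = json.dumps({"extra_args": ["--tp-size", str(gpu_count)]})

# shlex.quote wraps the string so the shell passes it through unmodified.
print("--llm-server-config", shlex.quote(config))
```

The printed fragment can then be pasted into (or substituted into) the `stopwatch provision-and-benchmark` invocation above.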
Or, to run a throughput (as many requests as the server can handle) test with TensorRT-LLM:

```bash
LLM_SERVER_TYPE=tensorrt-llm
RATE_TYPE=throughput

stopwatch provision-and-benchmark $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --rate-type $RATE_TYPE
```
## Run the profiler
To profile a server with the PyTorch profiler, use the following command (only vLLM and SGLang are currently supported):

```bash
MODEL=meta-llama/Llama-3.1-8B-Instruct
LLM_SERVER_TYPE=vllm
NUM_REQUESTS=10
OUTPUT_PATH=trace.json.gz

stopwatch profile $MODEL $LLM_SERVER_TYPE --output-path $OUTPUT_PATH --num-requests $NUM_REQUESTS
```
Once profiling is done, the trace will be saved to `trace.json.gz`, which you can open and visualize at https://ui.perfetto.dev.

Keep in mind that generated traces can get very large, so it is recommended to send only a few requests while profiling.
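PyTorch profiler traces follow the Chrome trace format, so they can also be inspected programmatically rather than only in Perfetto. A minimal sketch, using a tiny hypothetical trace written locally in place of a real profiler output (the `traceEvents`, `ph`, `ts`, and `dur` fields are standard Chrome-trace fields, not Stopwatch-specific):

```python
import gzip
import json

# Tiny hypothetical trace in Chrome trace format, standing in for the
# real trace.json.gz produced by `stopwatch profile`.
sample = {
    "traceEvents": [
        {"name": "forward", "ph": "X", "ts": 0, "dur": 120},
        {"name": "decode", "ph": "X", "ts": 120, "dur": 80},
    ]
}
with gzip.open("trace.json.gz", "wt") as f:
    json.dump(sample, f)

# Reading a trace back: events live under the "traceEvents" key, and
# "dur" holds each complete ("X") event's duration in microseconds.
with gzip.open("trace.json.gz", "rt") as f:
    trace = json.load(f)

events = trace["traceEvents"]
print(len(events), "events,", sum(e["dur"] for e in events), "us total")
```

This kind of quick summary is useful for sanity-checking a trace's size and contents before uploading it to the Perfetto UI.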
## Run tests
Before committing any changes, you should make sure that your changes don't break any core functionality in Stopwatch. You may verify this with:
### Lint
To make sure that any code changes are compliant with our linting rules, you can run `ruff` with:
## Contributing
We welcome contributions, including those that add tuned benchmarks to our collection. See the CONTRIBUTING file and the Getting Started document for more details on contributing to Stopwatch.
## License
Stopwatch is available under the MIT license. See the LICENSE file for more details.