grigio/opencode-benchmark-dashboard

Repository files navigation

opencode-benchmark-dashboard

Benchmark system for testing opencode with various LLM models, measuring speed (latency) and correctness (accuracy).

Why?

  • The best tradeoff depends on your use case and your hardware
  • Accuracy vs. speed: reasoning effort, tok/s, and quantization all matter. Some small LLMs can fix themselves using tools, while a nominally fast LLM can end up slow because it wastes too many tokens on reasoning. Just test them in real-world scenarios.

Quick Start

```sh
# Install dependencies
bun install

# Fill prompts/ and prompt-answers/ with your test cases, e.g. CODING-my-single-test.txt
# Check that `~/.config/opencode/opencode.json` lists your OpenAI-compatible models.

# Generate the answers with a specific model. Use `opencode models` to list the available ones.
bun run answer -m "opencode/minimax-m2.5-free"
# bun run answer -m "opencode/minimax-m2.5-free" -t CODING-my-single-test

# Generate the evaluations with a specific model.
bun run evaluate -m "opencode/minimax-m2.5-free"
# bun run evaluate -m "opencode/minimax-m2.5-free" -t CODING-my-single-test

# Open the dashboard at http://localhost:3000
bun run dashboard
```
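The steps above can be sketched as a minimal test-case pair. Only the `prompts/` and `prompt-answers/` directories and the `CODING-my-single-test.txt` naming come from this README; the file contents below, and the assumption that `prompt-answers/` holds the reference answer the evaluator compares against, are illustrative:

```sh
# Hypothetical minimal test case: one prompt plus its reference answer,
# using the same base filename in both directories.
mkdir -p prompts prompt-answers

# The prompt given to the model under test
cat > prompts/CODING-my-single-test.txt <<'EOF'
Write a function that returns the sum of a list of integers.
EOF

# The reference answer the evaluation step scores against (assumed convention)
cat > prompt-answers/CODING-my-single-test.txt <<'EOF'
def sum_list(xs):
    return sum(xs)
EOF
```

With such a pair in place, the `-t CODING-my-single-test` flag shown above should restrict `bun run answer` and `bun run evaluate` to just this case.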

Requirements

  • Bun runtime
  • opencode CLI installed and in PATH
  • Models pre-configured in ~/.config/opencode/opencode.json
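As a rough illustration of the last requirement, an OpenAI-compatible provider entry in `~/.config/opencode/opencode.json` might look like the sketch below. The exact keys (`provider`, `npm`, `options.baseURL`, `models`) and the `local` provider name are assumptions here, not taken from this README; check the opencode configuration documentation for the real schema:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "local": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:8080/v1"
      },
      "models": {
        "minimax-m2.5-free": {}
      }
    }
  }
}
```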
