evals
Here are 169 public repositories matching this topic...
AI Observability & Evaluation
Updated Mar 19, 2026 - Jupyter Notebook
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, Agno, OpenAI Agents SDK, LangChain, AutoGen, AG2, and CamelAI.
Updated Oct 30, 2025 - Python
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.
Updated Mar 19, 2026 - Python
AI observability platform for production LLM and agent systems.
Updated Mar 17, 2026 - Python
Evaluation and Tracking for LLM Experiments and AI Agents
Updated Mar 18, 2026 - Python
Laminar - open-source observability platform purpose-built for AI agents. YC S24.
Updated Mar 19, 2026 - TypeScript
Open-source, production-ready customer service with built-in evals and monitoring
Updated Jan 12, 2026 - TypeScript
RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL
Updated Mar 17, 2026 - Python
Harbor is a framework for running agent evaluations and creating and using RL environments.
Updated Mar 18, 2026 - Python
Test Generation for Prompts
Updated Mar 18, 2026 - TeX
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Updated Jul 12, 2025 - Jupyter Notebook
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Updated Feb 15, 2026 - TypeScript
Evalica, your favourite evaluation toolkit
Updated Mar 10, 2026 - Python
Agent ensembles to design, generate, and select the best code for every task.
Updated Mar 19, 2026 - TypeScript
AI system design guide for engineers building production AI systems and evals.
Updated Mar 1, 2026
AgentEval is the comprehensive .NET toolkit for AI agent evaluation (tool usage validation, RAG quality metrics, stochastic evaluation, and model comparison), built first for Microsoft Agent Framework (MAF) and Microsoft.Extensions.AI. What RAGAS, PromptFoo, and DeepEval do for Python, AgentEval does for .NET.
Updated Mar 15, 2026 - C#