AlpacaEval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/tatsu-lab/a...
ANN-Benchmarks is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/erikber...
ARES is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stanford-futureda...
BEIR is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/beir-cellar/beir....
C-Eval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/hkust-nlp/ceval...
Code Generation LM Evaluation Harness is a project in the Evaluation and Monitoring category. ![](https://img.shields....
COMET is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Unbabel/COMET.sv...
Deepchecks is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/deepchecks/...
DeepEval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/confident-ai/...
DomainBed is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/facebookrese...
EvalAI is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Cloud-CV/EvalAI...
Evalchemy is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlfoundation...
EvalPlus is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/evalplus/eval...
Evals is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/evals.svg...
EvalScope is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/modelscope/e...
Evaluate is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/huggingface/e...
GAOKAO-Bench is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/OpenLMLab...
guidellm is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/vllm-project/...
Helicone is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Helicone/heli...
HumanEval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/human...
Inspect is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/UKGovernmentBE...
JiWER is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/jitsi/jiwer.svg?...
Laminar is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/lmnr-ai/lmnr.s...
LangTest is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/JohnSnowLabs/...
Language Model Evaluation Harness is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/g...
LLMPerf is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/ray-project/ll...
lmms-eval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/EvolvingLMMs...
Massive Text Embedding Benchmark is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/gi...
Melting Pot is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/google-dee...
Meta-World is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Farama-Foun...
mir_eval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mir-evaluatio...
MLPerf Inference is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlcom...
NannyML is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/NannyML/nannym...
OGB is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/snap-stanford/ogb....
Ollama Grid Search is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/dez...
OpenCompass is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compa...
Overcooked-AI is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HumanCom...
Prometheus-Eval is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/promet...
PromptBench is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/microsoft/...
RagaAI Catalyst is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/raga-a...
RewardBench is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/allenai/re...
RLBench is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stepjam/RLBenc...
SimplerEnv is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/simpler-env...
Speech-to-Text Benchmark is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/sta...
SwanLab is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/SwanHubX/SwanL...
TorchBench is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/pytorch/ben...
TruLens is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/truera/trulens...
TrustLLM is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HowieHwong/Tr...
VBench is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Vchitect/VBench...
VLMEvalKit is a project in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compas...