Evaluation and Monitoring

Showing 51 agents

AlpacaEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/tatsu-lab/alpaca_eval…

ANN-Benchmarks is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/erikbern/ann-benc…

ARES is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stanford-futuredata/ARES.sv…

BEIR is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/beir-cellar/beir.svg?cacheS…

C-Eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/hkust-nlp/ceval.svg?cache…

Code Generation LM Evaluation Harness is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/…

COMET is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Unbabel/COMET.svg?cacheSec…

Deepchecks is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/deepchecks/deepchecks…

DeepEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/confident-ai/deepeval.s…

DomainBed is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/facebookresearch/Domai…

EvalAI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Cloud-CV/EvalAI.svg?cache…

Evalchemy is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlfoundations/evalchem…

EvalPlus is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/evalplus/evalplus.svg?c…

Evals is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/evals.svg?cacheSeco…

EvalScope is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/modelscope/evalscope.s…

Evaluate is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/huggingface/evaluate.sv…

Future AGI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/future-agi/future-agi…

GAOKAO-Bench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/OpenLMLab/GAOKAO-Be…

guidellm is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/vllm-project/guidellm.s…

Helicone is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Helicone/helicone.svg?c…

HumanEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/human-eval.svg?…

Inspect is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/UKGovernmentBEIS/inspect…

JiWER is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/jitsi/jiwer.svg?cacheSecon…

Laminar is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/lmnr-ai/lmnr.svg?cacheSe…

LangTest is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/JohnSnowLabs/langtest.s…

Language Model Evaluation Harness is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/star…

LLMPerf is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/ray-project/llmperf.svg?…

lmms-eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/EvolvingLMMs-Lab/lmms-…

Massive Text Embedding Benchmark is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars…

Melting Pot is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/google-deepmind/melt…

Meta-World is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Farama-Foundation/Met…

mir_eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mir-evaluation/mir_eval…

MLPerf Inference is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlcommons/infer…

NannyML is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/NannyML/nannyml.svg?cach…

OGB is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/snap-stanford/ogb.svg?cacheS…

Ollama Grid Search is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/dezoito/ollam…

OpenCompass is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compass/OpenCom…

Overcooked-AI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HumanCompatibleAI/…

Prometheus-Eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/prometheus-eval/…

PromptBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/microsoft/promptbenc…

RagaAI Catalyst is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/raga-ai-hub/Raga…

RewardBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/allenai/reward-bench…

RLBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stepjam/RLBench.svg?cach…

SimplerEnv is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/simpler-env/SimplerEn…

Speech-to-Text Benchmark is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Picovoi…

SwanLab is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/SwanHubX/SwanLab.svg?cac…

TorchBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/pytorch/benchmark.svg…

TruLens is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/truera/trulens.svg?cache…

TrustLLM is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HowieHwong/TrustLLM.svg…

VBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Vchitect/VBench.svg?cache…

VLMEvalKit is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compass/VLMEvalK…
