Evaluation and Monitoring AI Agents

AlpacaEval

Evaluation and Monitoring

OSS

AlpacaEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/tatsu-lab/alpaca_eval…

ANN-Benchmarks

Evaluation and Monitoring

OSS

ANN-Benchmarks is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/erikbern/ann-benc…

ARES

Evaluation and Monitoring

OSS

ARES is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stanford-futuredata/ARES.sv…

BEIR

Evaluation and Monitoring

OSS

BEIR is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/beir-cellar/beir.svg?cacheS…

C-Eval

Evaluation and Monitoring

OSS

C-Eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/hkust-nlp/ceval.svg?cache…

Code Generation LM Evaluation Harness

Evaluation and Monitoring

OSS

Code Generation LM Evaluation Harness is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/…

COMET

Evaluation and Monitoring

OSS

COMET is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Unbabel/COMET.svg?cacheSec…

Deepchecks

Evaluation and Monitoring

OSS

Deepchecks is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/deepchecks/deepchecks…

DeepEval

Evaluation and Monitoring

OSS

DeepEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/confident-ai/deepeval.s…

DomainBed

Evaluation and Monitoring

OSS

DomainBed is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/facebookresearch/Domai…

EvalAI

Evaluation and Monitoring

OSS

EvalAI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Cloud-CV/EvalAI.svg?cache…

Evalchemy

Evaluation and Monitoring

OSS

Evalchemy is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlfoundations/evalchem…

EvalPlus

Evaluation and Monitoring

OSS

EvalPlus is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/evalplus/evalplus.svg?c…

Evals

Evaluation and Monitoring

OSS

Evals is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/evals.svg?cacheSeco…

EvalScope

Evaluation and Monitoring

OSS

EvalScope is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/modelscope/evalscope.s…

Evaluate

Evaluation and Monitoring

OSS

Evaluate is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/huggingface/evaluate.sv…

Future AGI

Evaluation and Monitoring

OSS

Future AGI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/future-agi/future-agi…

GAOKAO-Bench

Evaluation and Monitoring

OSS

GAOKAO-Bench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/OpenLMLab/GAOKAO-Be…

guidellm

Evaluation and Monitoring

OSS

guidellm is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/vllm-project/guidellm.s…

Helicone

Evaluation and Monitoring

OSS

Helicone is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Helicone/helicone.svg?c…

HumanEval

Evaluation and Monitoring

OSS

HumanEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/human-eval.svg?…

Inspect

Evaluation and Monitoring

OSS

Inspect is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/UKGovernmentBEIS/inspect…

JiWER

Evaluation and Monitoring

OSS

JiWER is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/jitsi/jiwer.svg?cacheSecon…

Laminar

Evaluation and Monitoring

OSS

Laminar is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/lmnr-ai/lmnr.svg?cacheSe…

LangTest

Evaluation and Monitoring

OSS

LangTest is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/JohnSnowLabs/langtest.s…

Language Model Evaluation Harness

Evaluation and Monitoring

OSS

Language Model Evaluation Harness is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/star…

LLMPerf

Evaluation and Monitoring

OSS

LLMPerf is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/ray-project/llmperf.svg?…

lmms-eval

Evaluation and Monitoring

OSS

lmms-eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/EvolvingLMMs-Lab/lmms-…

Massive Text Embedding Benchmark

Evaluation and Monitoring

OSS

Massive Text Embedding Benchmark is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars…

Melting Pot

Evaluation and Monitoring

OSS

Melting Pot is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/google-deepmind/melt…

Meta-World

Evaluation and Monitoring

OSS

Meta-World is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Farama-Foundation/Met…

mir_eval

Evaluation and Monitoring

OSS

mir_eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mir-evaluation/mir_eval…

MLPerf Inference

Evaluation and Monitoring

OSS

MLPerf Inference is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlcommons/infer…

NannyML

Evaluation and Monitoring

OSS

NannyML is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/NannyML/nannyml.svg?cach…

OGB

Evaluation and Monitoring

OSS

OGB is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/snap-stanford/ogb.svg?cacheS…

Ollama Grid Search

Evaluation and Monitoring

OSS

Ollama Grid Search is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/dezoito/ollam…

OpenCompass

Evaluation and Monitoring

OSS

OpenCompass is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compass/OpenCom…

Overcooked-AI

Evaluation and Monitoring

OSS

Overcooked-AI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HumanCompatibleAI/…

Prometheus-Eval

Evaluation and Monitoring

OSS

Prometheus-Eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/prometheus-eval/…

PromptBench

Evaluation and Monitoring

OSS

PromptBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/microsoft/promptbenc…

RagaAI Catalyst

Evaluation and Monitoring

OSS

RagaAI Catalyst is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/raga-ai-hub/Raga…

RewardBench

Evaluation and Monitoring

OSS

RewardBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/allenai/reward-bench…

RLBench

Evaluation and Monitoring

OSS

RLBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stepjam/RLBench.svg?cach…

SimplerEnv

Evaluation and Monitoring

OSS

SimplerEnv is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/simpler-env/SimplerEn…

Speech-to-Text Benchmark

Evaluation and Monitoring

OSS

Speech-to-Text Benchmark is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Picovoi…

SwanLab

Evaluation and Monitoring

OSS

SwanLab is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/SwanHubX/SwanLab.svg?cac…

TorchBench

Evaluation and Monitoring

OSS

TorchBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/pytorch/benchmark.svg…

TruLens

Evaluation and Monitoring

OSS

TruLens is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/truera/trulens.svg?cache…

TrustLLM

Evaluation and Monitoring

OSS

TrustLLM is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HowieHwong/TrustLLM.svg…

VBench

Evaluation and Monitoring

OSS

VBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Vchitect/VBench.svg?cache…

VLMEvalKit

Evaluation and Monitoring

OSS

VLMEvalKit is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compass/VLMEvalK…
