Evaluation and Monitoring AI Agents

AlpacaEval

OSS

Evaluation and Monitoring

AlpacaEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/tatsu-lab/a...

ANN-Benchmarks

OSS

Evaluation and Monitoring

ANN-Benchmarks is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/erikber...

ARES

OSS

Evaluation and Monitoring

ARES is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stanford-futureda...

BEIR

OSS

Evaluation and Monitoring

BEIR is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/beir-cellar/beir....

C-Eval

OSS

Evaluation and Monitoring

C-Eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/hkust-nlp/ceval...

Code Generation LM Evaluation Harness

OSS

Evaluation and Monitoring

Code Generation LM Evaluation Harness is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields....

COMET

OSS

Evaluation and Monitoring

COMET is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Unbabel/COMET.sv...

Deepchecks

OSS

Evaluation and Monitoring

Deepchecks is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/deepchecks/...

DeepEval

OSS

Evaluation and Monitoring

DeepEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/confident-ai/...

DomainBed

OSS

Evaluation and Monitoring

DomainBed is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/facebookrese...

EvalAI

OSS

Evaluation and Monitoring

EvalAI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Cloud-CV/EvalAI...

Evalchemy

OSS

Evaluation and Monitoring

Evalchemy is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlfoundation...

EvalPlus

OSS

Evaluation and Monitoring

EvalPlus is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/evalplus/eval...

Evals

OSS

Evaluation and Monitoring

Evals is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/evals.svg...

EvalScope

OSS

Evaluation and Monitoring

EvalScope is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/modelscope/e...

Evaluate

OSS

Evaluation and Monitoring

Evaluate is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/huggingface/e...

GAOKAO-Bench

OSS

Evaluation and Monitoring

GAOKAO-Bench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/OpenLMLab...

guidellm

OSS

Evaluation and Monitoring

guidellm is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/vllm-project/...

Helicone

OSS

Evaluation and Monitoring

Helicone is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Helicone/heli...

HumanEval

OSS

Evaluation and Monitoring

HumanEval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/openai/human...

Inspect

OSS

Evaluation and Monitoring

Inspect is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/UKGovernmentBE...

JiWER

OSS

Evaluation and Monitoring

JiWER is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/jitsi/jiwer.svg?...

Laminar

OSS

Evaluation and Monitoring

Laminar is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/lmnr-ai/lmnr.s...

LangTest

OSS

Evaluation and Monitoring

LangTest is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/JohnSnowLabs/...

Language Model Evaluation Harness

OSS

Evaluation and Monitoring

Language Model Evaluation Harness is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/g...

LLMPerf

OSS

Evaluation and Monitoring

LLMPerf is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/ray-project/ll...

lmms-eval

OSS

Evaluation and Monitoring

lmms-eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/EvolvingLMMs...

Massive Text Embedding Benchmark

OSS

Evaluation and Monitoring

Massive Text Embedding Benchmark is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/gi...

Melting Pot

OSS

Evaluation and Monitoring

Melting Pot is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/google-dee...

Meta-World

OSS

Evaluation and Monitoring

Meta-World is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Farama-Foun...

mir_eval

OSS

Evaluation and Monitoring

mir_eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mir-evaluatio...

MLPerf Inference

OSS

Evaluation and Monitoring

MLPerf Inference is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/mlcom...

NannyML

OSS

Evaluation and Monitoring

NannyML is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/NannyML/nannym...

OGB

OSS

Evaluation and Monitoring

OGB is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/snap-stanford/ogb....

Ollama Grid Search

OSS

Evaluation and Monitoring

Ollama Grid Search is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/dez...

OpenCompass

OSS

Evaluation and Monitoring

OpenCompass is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compa...

Overcooked-AI

OSS

Evaluation and Monitoring

Overcooked-AI is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HumanCom...

Prometheus-Eval

OSS

Evaluation and Monitoring

Prometheus-Eval is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/promet...

PromptBench

OSS

Evaluation and Monitoring

PromptBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/microsoft/...

RagaAI Catalyst

OSS

Evaluation and Monitoring

RagaAI Catalyst is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/raga-a...

RewardBench

OSS

Evaluation and Monitoring

RewardBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/allenai/re...

RLBench

OSS

Evaluation and Monitoring

RLBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/stepjam/RLBenc...

SimplerEnv

OSS

Evaluation and Monitoring

SimplerEnv is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/simpler-env...

Speech-to-Text Benchmark

OSS

Evaluation and Monitoring

Speech-to-Text Benchmark is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/sta...

SwanLab

OSS

Evaluation and Monitoring

SwanLab is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/SwanHubX/SwanL...

TorchBench

OSS

Evaluation and Monitoring

TorchBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/pytorch/ben...

TruLens

OSS

Evaluation and Monitoring

TruLens is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/truera/trulens...

TrustLLM

OSS

Evaluation and Monitoring

TrustLLM is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/HowieHwong/Tr...

VBench

OSS

Evaluation and Monitoring

VBench is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/Vchitect/VBench...

VLMEvalKit

OSS

Evaluation and Monitoring

VLMEvalKit is an AI agent in the Evaluation and Monitoring category. ![](https://img.shields.io/github/stars/open-compas...
