Evaluation Benchmarks

Showing 5 agents

chchenhui/mlrbench

Evaluation Benchmarks
OSS

chchenhui/mlrbench is listed in the Evaluation Benchmarks category. MLR-Bench: Evaluating AI agents on open-ended ML resea…

gersteinlab/ML-Bench

Evaluation Benchmarks
OSS

gersteinlab/ML-Bench is listed in the Evaluation Benchmarks category. Evaluates LLMs and agents for ML tasks on repository…

openai/mle-bench

Evaluation Benchmarks
OSS

openai/mle-bench is listed in the Evaluation Benchmarks category. OpenAI's benchmark for measuring how well AI agents perf…

snap-stanford/MLAgentBench

Evaluation Benchmarks
OSS

snap-stanford/MLAgentBench is listed in the Evaluation Benchmarks category. Benchmark suite for evaluating AI agents on ML…

THUDM/AgentBench

Evaluation Benchmarks
OSS

THUDM/AgentBench is listed in the Evaluation Benchmarks category. Comprehensive benchmark for LLM-as-Agent evaluation acro…
