chchenhui/mlrbench is listed in the Evaluation Benchmarks category. MLR-Bench: Evaluating AI agents on open-ende...
gersteinlab/ML-Bench is listed in the Evaluation Benchmarks category. Evaluates LLMs and agents for ML tasks on ...
openai/mle-bench (OSS) is listed in the Evaluation Benchmarks category. OpenAI's benchmark for measuring how well AI a...
snap-stanford/MLAgentBench is listed in the Evaluation Benchmarks category. Benchmark suite for evaluating AI ag...
THUDM/AgentBench (OSS) is listed in the Evaluation Benchmarks category. Comprehensive benchmark for LLM-as-Agent evalu...