chchenhui/mlrbench is a benchmark listed in the Evaluation Benchmarks category. MLR-Bench: Evaluating AI agents on open-ended ML resea…
gersteinlab/ML-Bench is a benchmark listed in the Evaluation Benchmarks category. Evaluates LLMs and agents for ML tasks on repository…
openai/mle-bench is a benchmark listed in the Evaluation Benchmarks category. OpenAI's benchmark for measuring how well AI agents perf…
snap-stanford/MLAgentBench is a benchmark listed in the Evaluation Benchmarks category. Benchmark suite for evaluating AI agents on ML…
THUDM/AgentBench is a benchmark listed in the Evaluation Benchmarks category. Comprehensive benchmark for LLM-as-Agent evaluation acro…