Giskard is an AI agent in the LLM Evaluation category. Testing & evaluation library for LLM applications, in particular ...
HELM is an AI agent in the LLM Evaluation category. Holistic Evaluation of Language Models (HELM), a framework to increa...
instruct-eval is an AI agent in the LLM Evaluation category. This repository contains code to quantitatively evaluate in...
LangSmith is an AI agent in the LLM Evaluation category. A unified platform from the LangChain framework for: evaluation, co...
lighteval is an AI agent in the LLM Evaluation category. A lightweight LLM evaluation suite that Hugging Face has been u...
lm-evaluation-harness is an AI agent in the LLM Evaluation category. A framework for few-shot evaluation of language mod...
MixEval is an AI agent in the LLM Evaluation category. A reliable click-and-go evaluation suite compatible with both ope...
OLMO-eval is an AI agent in the LLM Evaluation category. A repository for evaluating open language models.
Ragas is an AI agent in the LLM Evaluation category. A framework that helps you evaluate your Retrieval Augmented Genera...
simple-evals is an AI agent in the LLM Evaluation category. Eval tools by OpenAI.
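At their core, the frameworks listed above automate the same loop: run a model over a labeled dataset, score each output with a metric, and aggregate the results. The sketch below illustrates that loop with an exact-match metric; it is a generic illustration, not the API of any tool above, and every name in it (`Example`, `evaluate`, `exact_match`, the toy model) is hypothetical.

```python
# Minimal sketch of the scoring loop that LLM evaluation harnesses
# automate. All names here are hypothetical illustrations.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    prompt: str
    answer: str


def exact_match(prediction: str, reference: str) -> bool:
    # Normalize whitespace and case before comparing.
    return prediction.strip().lower() == reference.strip().lower()


def evaluate(model: Callable[[str], str], dataset: List[Example]) -> float:
    # Run the model on every prompt and report mean exact-match accuracy.
    correct = sum(exact_match(model(ex.prompt), ex.answer) for ex in dataset)
    return correct / len(dataset)


# Toy "model" that looks up canned answers, for illustration only.
dataset = [Example("2+2=", "4"), Example("Capital of France?", "Paris")]
model = lambda prompt: {"2+2=": "4", "Capital of France?": "paris"}.get(prompt, "")
print(evaluate(model, dataset))  # → 1.0
```

Real harnesses differ mainly in what they add around this loop: prompt templating and few-shot formatting (lm-evaluation-harness, lighteval), domain-specific metrics such as retrieval faithfulness (Ragas), or tracing and a hosted UI (LangSmith).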