LLM Evaluation

Showing 10 agents

OSS

Giskard is an AI agent in the LLM Evaluation category. Testing & evaluation library for LLM applications, in particular RAGs.

OSS

HELM is an AI agent in the LLM Evaluation category. Holistic Evaluation of Language Models (HELM), a framework to increase the tra…


instruct-eval

LLM Evaluation
OSS

instruct-eval is an AI agent in the LLM Evaluation category. This repository contains code to quantitatively evaluate instruction-…


LangSmith is an AI agent in the LLM Evaluation category. A unified platform from the LangChain framework for: evaluation, collaboratio…

OSS

lighteval is an AI agent in the LLM Evaluation category. A lightweight LLM evaluation suite that Hugging Face has been using inter…


lm-evaluation-harness

LLM Evaluation
OSS

lm-evaluation-harness is an AI agent in the LLM Evaluation category. A framework for few-shot evaluation of language models.

OSS

MixEval is an AI agent in the LLM Evaluation category. A reliable click-and-go evaluation suite compatible with both open-source a…

OSS

OLMO-eval is an AI agent in the LLM Evaluation category. A repository for evaluating open language models.

OSS

Ragas is an AI agent in the LLM Evaluation category. A framework that helps you evaluate your Retrieval Augmented Generation (RAG)…


simple-evals

LLM Evaluation
OSS

simple-evals is an AI agent in the LLM Evaluation category. Eval tools by OpenAI.
