LLM Evaluation AI Agents

Giskard

OSS

Giskard is an AI agent in the LLM Evaluation category. It is a testing and evaluation library for LLM applications, in particular RAG systems.

HELM

LLM Evaluation

OSS

HELM is an AI agent in the LLM Evaluation category. Holistic Evaluation of Language Models (HELM), a framework to increase the tra…

instruct-eval

LLM Evaluation

OSS

instruct-eval is an AI agent in the LLM Evaluation category. This repository contains code to quantitatively evaluate instruction-…

LangSmith

LLM Evaluation

LangSmith is an AI agent in the LLM Evaluation category. A unified platform from the LangChain framework for evaluation, collaboratio…

lighteval

LLM Evaluation

OSS

lighteval is an AI agent in the LLM Evaluation category. A lightweight LLM evaluation suite that Hugging Face has been using inter…

lm-evaluation-harness

LLM Evaluation

OSS

lm-evaluation-harness is an AI agent in the LLM Evaluation category. A framework for few-shot evaluation of language models.

MixEval

LLM Evaluation

OSS

MixEval is an AI agent in the LLM Evaluation category. A reliable click-and-go evaluation suite compatible with both open-source a…

OLMO-eval

LLM Evaluation

OSS

OLMO-eval is an AI agent in the LLM Evaluation category. A repository for evaluating open language models.

Ragas

LLM Evaluation

OSS

Ragas is an AI agent in the LLM Evaluation category. A framework that helps you evaluate your Retrieval Augmented Generation (RAG)…

simple-evals

LLM Evaluation

OSS

simple-evals is an AI agent in the LLM Evaluation category. Eval tools by OpenAI.
