Giskard is an AI agent in the LLM Evaluation category. Testing & evaluation library for LLM applications, in particular RAG systems.
HELM is an AI agent in the LLM Evaluation category. Holistic Evaluation of Language Models (HELM), a framework to increase the tra…
instruct-eval is an AI agent in the LLM Evaluation category. This repository contains code to quantitatively evaluate instruction-…
LangSmith is an AI agent in the LLM Evaluation category. A unified platform from the LangChain framework for evaluation, collaboratio…
lighteval is an AI agent in the LLM Evaluation category. A lightweight LLM evaluation suite that Hugging Face has been using inter…
lm-evaluation-harness is an AI agent in the LLM Evaluation category. A framework for few-shot evaluation of language models.
MixEval is an AI agent in the LLM Evaluation category. A reliable click-and-go evaluation suite compatible with both open-source a…
OLMO-eval is an AI agent in the LLM Evaluation category. A repository for evaluating open language models.
Ragas is an AI agent in the LLM Evaluation category. A framework that helps you evaluate your Retrieval Augmented Generation (RAG)…
simple-evals is an AI agent in the LLM Evaluation category. Eval tools by OpenAI.