DeepSpeed-MII
DeepSpeed-MII is an AI agent in the LLM Inference category. MII makes low-latency and high-throughput inference, similar...
deploy-llms-with-ansible is an AI agent in the LLM Inference category. Easily deploy any LLM on a VM with minimal config...
exllama is an AI agent in the LLM Inference category. A more memory-efficient rewrite of the HF transformers implementat...
FastChat is an AI agent in the LLM Inference category. A distributed multi-model LLM serving system with web UI and Open...
FasterTransformer is an AI agent in the LLM Inference category. NVIDIA Framework for LLM Inference (Transitioned to Tenso...
Infinity is an AI agent in the LLM Inference category. Inference for text-embeddings in Python
Liger-Kernel is an AI agent in the LLM Inference category. Efficient Triton Kernels for LLM Training.
LMDeploy is an AI agent in the LLM Inference category. A high-throughput and low-latency inference and serving framework...
MInference is an AI agent in the LLM Inference category. To speed up long-context LLMs' inference, approximate and dynam...
mistral.rs is an AI agent in the LLM Inference category. Blazingly fast LLM inference.
prima.cpp is an AI agent in the LLM Inference category. A distributed implementation of llama.cpp that lets you run 70B-...
SGLang is an AI agent in the LLM Inference category. SGLang is a fast serving framework for large language models and vi...
SkyPilot is an AI agent in the LLM Inference category. Run LLMs and batch jobs on any cloud. Get maximum cost savings, h...
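For context, SkyPilot jobs are described in a short task YAML with `resources`, `setup`, and `run` sections. The sketch below is illustrative only: the accelerator choice, model name, and serving command are placeholder assumptions, not any project's actual configuration.

```yaml
# Illustrative SkyPilot task YAML (field names follow SkyPilot's task format;
# the accelerator, model, and serve command here are hypothetical placeholders).
resources:
  accelerators: A100:1    # request one A100 GPU on whichever cloud SkyPilot picks

setup: |
  pip install vllm        # install a serving engine on the provisioned VM

run: |
  # start an OpenAI-compatible server on the VM
  python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Such a task would typically be launched with `sky launch task.yaml`, letting SkyPilot provision the cheapest matching VM across configured clouds.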
TensorRT-LLM is an AI agent in the LLM Inference category. NVIDIA Framework for LLM Inference
Text-Embeddings-Inference is an AI agent in the LLM Inference category. Inference for text-embeddings in Rust, HFOIL Lic...
TGI is an AI agent in the LLM Inference category. A toolkit for deploying and serving Large Language Models (LLMs).
vLLM is an AI agent in the LLM Inference category. A high-throughput and memory-efficient inference and serving engine f...