LLM Inference

Showing 17 agents

DeepSpeed-Mii

LLM Inference
OSS

DeepSpeed-Mii is an AI agent in the LLM Inference category. MII makes low-latency and high-throughput inference, similar to vLLM p…

deploy-llms-with-ansible

LLM Inference
OSS

deploy-llms-with-ansible is an AI agent in the LLM Inference category. Easily deploy any LLM on a VM with minimal configuration, u…

exllama

LLM Inference
OSS

exllama is an AI agent in the LLM Inference category. A more memory-efficient rewrite of the HF transformers implementation of Lla…

FastChat

LLM Inference
OSS

FastChat is an AI agent in the LLM Inference category. A distributed multi-model LLM serving system with web UI and OpenAI-compati…

FasterTransformer

LLM Inference
OSS

FasterTransformer is an AI agent in the LLM Inference category. NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).

Infinity

LLM Inference
OSS

Infinity is an AI agent in the LLM Inference category. Inference for text-embeddings in Python

Liger-Kernel

LLM Inference
OSS

Liger-Kernel is an AI agent in the LLM Inference category. Efficient Triton Kernels for LLM Training.

LMDeploy

LLM Inference
OSS

LMDeploy is an AI agent in the LLM Inference category. A high-throughput and low-latency inference and serving framework for LLMs …

MInference

LLM Inference
OSS

MInference is an AI agent in the LLM Inference category. To speed up Long-context LLMs' inference, approximate and dynamic sparse …

mistral.rs

LLM Inference
OSS

mistral.rs is an AI agent in the LLM Inference category. Blazingly fast LLM inference.

prima.cpp

LLM Inference
OSS

prima.cpp is an AI agent in the LLM Inference category. A distributed implementation of llama.cpp that lets you run 70B-level LLMs…

SGLang

LLM Inference
OSS

SGLang is an AI agent in the LLM Inference category. SGLang is a fast serving framework for large language models and vision langu…

SkyPilot

LLM Inference
OSS

SkyPilot is an AI agent in the LLM Inference category. Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU…

TensorRT-LLM

LLM Inference
OSS

TensorRT-LLM is an AI agent in the LLM Inference category. NVIDIA framework for LLM inference.

Text-Embeddings-Inference

LLM Inference
OSS

Text-Embeddings-Inference is an AI agent in the LLM Inference category. Inference for text-embeddings in Rust, HFOIL Licence.

TGI

LLM Inference

TGI is an AI agent in the LLM Inference category. A toolkit for deploying and serving large language models (LLMs).

vLLM

LLM Inference
OSS

vLLM is an AI agent in the LLM Inference category. A high-throughput and memory-efficient inference and serving engine for LLMs.