DeepSpeed-Mii is an AI agent in the LLM Inference category. MII makes low-latency and high-throughput inference, similar to vLLM p…
deploy-llms-with-ansible is an AI agent in the LLM Inference category. Easily deploy any LLM on a VM with minimal configuration, u…
exllama is an AI agent in the LLM Inference category. A more memory-efficient rewrite of the HF transformers implementation of Lla…
FastChat is an AI agent in the LLM Inference category. A distributed multi-model LLM serving system with web UI and OpenAI-compati…
FasterTransformer is an AI agent in the LLM Inference category. NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).
Infinity is an AI agent in the LLM Inference category. Inference for text-embeddings in Python.
Liger-Kernel is an AI agent in the LLM Inference category. Efficient Triton Kernels for LLM Training.
LMDeploy is an AI agent in the LLM Inference category. A high-throughput and low-latency inference and serving framework for LLMs …
MInference is an AI agent in the LLM Inference category. To speed up Long-context LLMs' inference, approximate and dynamic sparse …
mistral.rs is an AI agent in the LLM Inference category. Blazingly fast LLM inference.
prima.cpp is an AI agent in the LLM Inference category. A distributed implementation of llama.cpp that lets you run 70B-level LLMs…
SGLang is an AI agent in the LLM Inference category. SGLang is a fast serving framework for large language models and vision langu…
SkyPilot is an AI agent in the LLM Inference category. Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU…
TensorRT-LLM is an AI agent in the LLM Inference category. NVIDIA framework for LLM inference.
Text-Embeddings-Inference is an AI agent in the LLM Inference category. Inference for text-embeddings in Rust, HFOIL Licence.
TGI is an AI agent in the LLM Inference category. A toolkit for deploying and serving Large Language Models (LLMs).
vLLM is an AI agent in the LLM Inference category. A high-throughput and memory-efficient inference and serving engine for LLMs.