DeepSpeed-MII
DeepSpeed-MII is an AI agent in the LLM Inference category. MII makes low-latency and high-throughput inference, similar...
deploy-llms-with-ansible is an AI agent in the LLM Inference category. Easily deploy any LLM on a VM with minimal config...
exllama is an AI agent in the LLM Inference category. A more memory-efficient rewrite of the HF transformers implementat...
FastChat is an AI agent in the LLM Inference category. A distributed multi-model LLM serving system with web UI and Open...
FasterTransformer is an AI agent in the LLM Inference category. NVIDIA Framework for LLM Inference (Transitioned to Tenso...
Infinity is an AI agent in the LLM Inference category. Inference for text-embeddings in Python
Liger-Kernel is an AI agent in the LLM Inference category. Efficient Triton Kernels for LLM Training.
LMDeploy is an AI agent in the LLM Inference category. A high-throughput and low-latency inference and serving framework...
MInference is an AI agent in the LLM Inference category. To speed up long-context LLMs' inference, approximate and dynam...
mistral.rs is an AI agent in the LLM Inference category. Blazingly fast LLM inference.
prima.cpp is an AI agent in the LLM Inference category. A distributed implementation of llama.cpp that lets you run 70B-...
SGLang is an AI agent in the LLM Inference category. SGLang is a fast serving framework for large language models and vi...
SkyPilot is an AI agent in the LLM Inference category. Run LLMs and batch jobs on any cloud. Get maximum cost savings, h...
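For context, SkyPilot jobs are described in a short task YAML with `resources`, `setup`, and `run` sections. The sketch below is illustrative only: the accelerator choice, model name, and serving command are placeholder assumptions, not any project's actual configuration.

```yaml
# Illustrative SkyPilot task YAML (field names follow SkyPilot's task format;
# the accelerator, model, and serve command here are hypothetical placeholders).
resources:
  accelerators: A100:1    # request one A100 GPU on whichever cloud SkyPilot picks

setup: |
  pip install vllm        # install a serving engine on the provisioned VM

run: |
  # start an OpenAI-compatible server on the VM
  python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```

Such a task would typically be launched with `sky launch task.yaml`, letting SkyPilot provision the cheapest matching VM across configured clouds.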
TensorRT-LLM is an AI agent in the LLM Inference category. NVIDIA Framework for LLM Inference
Text-Embeddings-Inference is an AI agent in the LLM Inference category. Inference for text-embeddings in Rust, HFOIL Lic...
TGI is an AI agent in the LLM Inference category. A toolkit for deploying and serving Large Language Models (LLMs).
vLLM is an AI agent in the LLM Inference category. A high-throughput and memory-efficient inference and serving engine f...