LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
Updated Apr 23, 2026 - Python
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
Turn any computer or edge device into a command center for your computer vision projects.
The simplest way to serve AI/ML models in production
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.
This is a repository for a no-code object detection inference API using YOLOv4 and YOLOv3 with OpenCV.
This is a repository for an object detection inference API using the Tensorflow framework.
Deploy DL/ML inference pipelines with minimal extra code.
[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server - multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.
Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)
Inference Server Implementation from Scratch for Machine Learning Models
Roboflow's inference server to analyze video streams. This project extracts insights from video frames at defined intervals and generates informative visualizations and CSV outputs.
Session Based Real-time Hotel Recommendation Web Application
A networked inference server for Whisper speech recognition
Vision and vision-multi-modal components for geniusrise framework
Serve PyTorch inference requests using batching with Redis for faster performance.
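The entry above pairs Redis with request batching; a minimal sketch of that pattern, not the repository's actual code. The queue name, request fields, and `BatchWorker` class are all hypothetical, and any object exposing `lpop` (such as a `redis-py` client) can serve as the queue backend.

```python
# Illustrative sketch: batching inference requests through a Redis list
# so a worker can process them in groups. Names are hypothetical.
import json


class BatchWorker:
    """Pops up to `max_batch` queued requests and runs them together."""

    def __init__(self, client, queue="inference:requests", max_batch=8):
        self.client = client        # any object with lpop(key) -> bytes/str/None
        self.queue = queue
        self.max_batch = max_batch

    def drain_batch(self):
        # Collect whatever is queued, up to the batch-size cap.
        batch = []
        for _ in range(self.max_batch):
            raw = self.client.lpop(self.queue)
            if raw is None:
                break
            batch.append(json.loads(raw))
        return batch

    def run_once(self, model):
        batch = self.drain_batch()
        if not batch:
            return []
        # One forward pass over the whole batch amortizes per-call overhead.
        inputs = [req["input"] for req in batch]
        return model(inputs)
```

In practice the worker would loop with a blocking pop (e.g. `blpop` with a timeout) and write results back keyed by request id; the sketch keeps only the batching step.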
🤖 Optimize LLM inference on Mac with continuous batching and SSD caching managed from your menu bar for efficient performance.
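Two entries in this list mention continuous batching. A minimal sketch of the idea, under assumptions: new requests join the in-flight batch as soon as earlier ones finish, rather than waiting for the whole batch to drain. The `step` callback and request fields here are illustrative, not any listed project's API.

```python
# Hypothetical sketch of continuous batching for LLM decoding:
# requests are admitted into the active batch at every decode step.
from collections import deque


def continuous_batching(requests, step, max_batch=4):
    """requests: iterable of per-request dicts.
    step(active) decodes one token for every active request and
    returns the subset of requests that are now finished."""
    pending = deque(requests)
    active, results = [], []
    while pending or active:
        # Refill free batch slots immediately (the "continuous" part).
        while pending and len(active) < max_batch:
            active.append(pending.popleft())
        for req in step(active):
            active.remove(req)
            results.append(req)
    return results
```

The design point is throughput: because slots are refilled per step, short generations free capacity for queued requests instead of idling until the longest sequence in the batch completes.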
Text components powering LLMs & SLMs for geniusrise framework