streamingllm

Here are 2 public repositories matching this topic:

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

chatbot stable-diffusion large-language-model chatpdf llm-inference smoothquant 4-bits speculative-decoding llm-cpu streamingllm attention-sink intel-optimized-llamacpp neural-chat

Updated Nov 25, 2023
C++

DefTruth / Awesome-LLM-Inference

💻A small Collection for Awesome LLM Inference [Papers|Blogs|Docs] with codes, contains TensorRT-LLM, streaming-llm, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

flash-attention flash-attention-2 smooth-quant tensorrt-llm paged-attention streaming-llm streamingllm flash-decoding llm-fp4

Updated Nov 25, 2023
