Trending

See what the GitHub community is most excited about this week.

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 27,772 stars · 3,210 forks

54 stars this week

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 2,787 stars · 182 forks

24 stars this week

NVIDIA / cuopt

GPU accelerated decision optimization

Cuda · 453 stars · 79 forks

9 stars this week

thu-ml / SageAttention

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized attention that achieves a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

Cuda · 2,492 stars · 235 forks

35 stars this week

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,586 stars · 946 forks

17 stars this week

rapidsai / cuvs

cuVS - a library for vector search and clustering on the GPU

Cuda · 536 stars · 130 forks

6 stars this week

Infatoshi / cuda-course

Cuda · 1,496 stars · 273 forks

24 stars this week

siboehm / SGEMM_CUDA

Fast CUDA matrix multiplication from scratch

Cuda · 886 stars · 130 forks

17 stars this week

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 3,864 stars · 532 forks

36 stars this week

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda · 734 stars · 90 forks

5 stars this week

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda · 1,790 stars · 461 forks

1 star this week

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,782 stars · 710 forks

14 stars this week

NVlabs / instant-ngp

Instant neural graphics primitives: lightning fast NeRF and more

Cuda · 16,974 stars · 2,014 forks

13 stars this week