Trending

See what the GitHub community is most excited about this week.

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda, 3,168 stars, 241 forks

34 stars this week

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda, 28,939 stars, 3,395 forks

46 stars this week

NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda, 1,820 stars, 464 forks

3 stars this week

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda, 6,182 stars, 824 forks

11 stars this week

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda, 724 stars, 155 forks

3 stars this week

NVIDIA / cuopt

GPU accelerated decision optimization

Cuda, 715 stars, 127 forks

8 stars this week