nvidia-cuda

Bug summary
There is evidence that sub_group::get_group_id() does not return the same value as threadIdx.x / warpSize (assuming a 1D kernel), which is the expected result on CUDA. We should check the implementation of this function. It relies on bit-manipulation tricks, and presumably the optimization went too far...
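As a rough illustration of the expected relationship, the following host-side C++ sketch computes the reference value (linear local id divided by warp size) and compares it against a shift-based shortcut. The function names are hypothetical and the shift is only an assumed example of a bit-manipulation shortcut; this is not the hipSYCL implementation, which is not shown in the report.

```cpp
#include <cstdint>

// Reference value for a 1D kernel: the sub-group (warp) index is the
// linear local thread id divided by the warp size.
// Hypothetical helper, not part of any SYCL or CUDA API.
inline std::uint32_t expected_sub_group_id(std::uint32_t local_linear_id,
                                           std::uint32_t warp_size) {
    return local_linear_id / warp_size;
}

// Assumed shift-based shortcut: only equivalent to the division above
// when warp_size == (1u << log2_warp_size).
inline std::uint32_t shifted_sub_group_id(std::uint32_t local_linear_id,
                                          std::uint32_t log2_warp_size) {
    return local_linear_id >> log2_warp_size;
}

// Returns the first local id at which the two computations disagree,
// or block_size if they agree for every thread in the block.
inline std::uint32_t first_mismatch(std::uint32_t block_size,
                                    std::uint32_t warp_size,
                                    std::uint32_t log2_warp_size) {
    for (std::uint32_t i = 0; i < block_size; ++i) {
        if (expected_sub_group_id(i, warp_size) !=
            shifted_sub_group_id(i, log2_warp_size)) {
            return i;
        }
    }
    return block_size;
}
```

A check of this kind could be used to validate values written out by a device kernel: if the shift amount does not match the actual warp size, the two results diverge partway through the first warp.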

To Reproduce
Compare sub_group{}.get_group_id() or `sub

Here are 101 public repositories matching this topic...

illuhad / hipSYCL

Incorrect results for sub_group::get_group_id() on CUDA

Remove references to deprecated hcc

Switch to using hidden friends

enfiskutensykkel / ssd-gpu-dma

genn-team / genn

Feature tests for additional input variables

Accessing queued pre and postsynaptic weight update model variables

more meaningful assertions?

nathtest / Tutorial-Ubuntu-18.04-Install-Nvidia-driver-and-CUDA-and-CUDNN-and-build-Tensorflow-for-gpu

MysterionRise / mavenized-jcuda

m1k1o / go-transcode

codingCoffee / fahclient

dkozlov / ansible-nvidia

stovorov / ConjugateGradients

hoanglehaithanh / NVIDIA-DeepStreamSDK

mattdean1 / cuda

neurite / debian-setup

Rtoax / 2D3D-TI-FD-RTM-cuda

Rtoax / VTI-FD-CUDA-GTK

fjramireg / StiffMa

karthikeyann / cuda-calculator

iliul / ffmpeg-gpu

JetBrains-Research / cuBool

blitzingeagle / VehicleSuperRes

sudonitin / smart-surveillance-system-for-museum

gabrielkirsten / cnn_keras

tandav / ultrasonic-stethoscope

DumaxFr / ccminer

whitelok / cuDNN-convolution2D-invoke-demo

e-ago / hpgmg-cuda-async

m1k1o / hls-restream

sandialabs / p3a

Beuth-Erdelt / prometheus_nvlink_exporter

pyhf / cuda-images

whitelok / cuDNN-convolution3D-invoke-demo
