gpgpu

Our users are often confused when programs such as zip2john produce very large (sometimes multi-gigabyte) output. Maybe we should identify these programs and enhance them to print a message to stderr explaining that very large output is normal, either always or only when the output size exceeds a threshold (e.g., 1 million bytes?).
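The threshold variant of this idea could look roughly like the sketch below. It is written in Python purely for illustration (the *2john tools themselves are not structured this way), and `NoticeWriter`, `SIZE_NOTICE_THRESHOLD`, and the wording of the message are all hypothetical names and text, not anything taken from the John the Ripper source.

```python
import io
import sys

# Hypothetical threshold, taken from the "1 million bytes" figure floated above.
SIZE_NOTICE_THRESHOLD = 1_000_000


class NoticeWriter:
    """Wrap a binary output stream and print a one-time notice on a
    separate stream once the total bytes written exceed a threshold."""

    def __init__(self, out, err=sys.stderr, threshold=SIZE_NOTICE_THRESHOLD):
        self.out = out
        self.err = err
        self.threshold = threshold
        self.written = 0
        self.noticed = False

    def write(self, data: bytes) -> int:
        n = self.out.write(data)
        self.written += n
        # Emit the notice only once, the first time the threshold is crossed.
        if not self.noticed and self.written > self.threshold:
            self.noticed = True
            self.err.write(
                "Note: the output is large; this is normal for this kind of input.\n"
            )
        return n
```

Because the notice goes to the error stream, redirecting standard output to a file would still capture only the hash data.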

The current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.

This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
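One way to picture the single-call approach is sketched below for the 1D case. This is not ArrayFire code: `join_single_pass` is a hypothetical helper, and the loop over output indices stands in for what would be one thread per element in a real kernel. The idea shown is that the end offsets of all inputs are computed once, so every output index can find its source array without a separate launch per input.

```python
from bisect import bisect_right
from itertools import accumulate


def join_single_pass(arrays):
    """Concatenate 1D sequences in one pass over the output indices."""
    lengths = [len(a) for a in arrays]
    ends = list(accumulate(lengths))  # exclusive end offset of each input
    total = ends[-1] if ends else 0
    out = [None] * total
    for i in range(total):  # in a GPU kernel, each i would be one thread
        src = bisect_right(ends, i)  # first input whose end offset is > i
        start = ends[src] - lengths[src]  # where that input begins in the output
        out[i] = arrays[src][i - start]
    return out
```

For example, `join_single_pass([[1, 2], [], [3, 4, 5]])` returns `[1, 2, 3, 4, 5]`; empty inputs are skipped naturally because no output index maps to them.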

In order to test manually altered IR, it would be nice to have a --skip-compilation flag for futhark test, just like we do for futhark bench.

Open issue to openly discuss potential ideas or improvements, whether on documentation, interfaces, examples, bug fixes, etc.

Add Javadoc to document the examples in TornadoVM.

This affects the packages under the examples module:

https://github.com/beehive-lab/TornadoVM/tree/master/examples/src/main/java/uk/ac/manchester/tornado/examples

The documentation is at the class level and will describe how each example uses the TornadoVM API. It will also explain how to run the example.

Bug summary
There is evidence that sub_group::get_group_id() does not return the same value as threadIdx.x / warpSize (assuming a 1D kernel), as would be expected on CUDA. We should check the implementation of this function. Our implementation performs bit-manipulation magic; presumably the optimization went too far...

To Reproduce
Compare sub_group{}.get_group_id() or `sub
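The expected relationship from the bug summary can be written down as a small host-side reference check. The sketch below is in Python and assumes the values reported by each thread have already been copied back into a list indexed by `threadIdx.x`; `expected_sub_group_id`, `find_mismatches`, and the default warp size of 32 are illustrative assumptions, not part of hipSYCL.

```python
WARP_SIZE = 32  # assumption: the warp size of the CUDA device under test


def expected_sub_group_id(thread_idx_x, warp_size=WARP_SIZE):
    """Reference value for a 1D kernel: threadIdx.x / warpSize (integer division)."""
    return thread_idx_x // warp_size


def find_mismatches(reported, warp_size=WARP_SIZE):
    """Return (thread, got, expected) for every thread whose reported
    sub-group id differs from the reference value."""
    return [
        (t, got, expected_sub_group_id(t, warp_size))
        for t, got in enumerate(reported)
        if got != expected_sub_group_id(t, warp_size)
    ]
```

An empty result from `find_mismatches` would mean the reported ids agree with the reference for every thread in the block.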

Here are 386 public repositories matching this topic...

gpujs / gpu.js

hashcat / hashcat

taskflow / taskflow

openwall / john

arrayfire / arrayfire

dfranx / SHADERed

turbo / js

diku-dk / futhark

calebwin / emu

alexsosn / iOS_ML

Sergio0694 / ComputeSharp

boostorg / compute

MetalPetal / MetalPetal

uncomplicate / neanderthal

mratsim / Arraymancer

intel / compute-runtime

ddemidov / vexcl

Erkaman / vulkan_minimal_compute

stotko / stdgpu

KomputeProject / kompute

arrayfire / arrayfire-rust

beehive-lab / TornadoVM

e-ago / bitcracker

cogciprocate / ocl

Erkaman / regl-cnn

m4rs-mt / ILGPU

illuhad / hipSYCL

ddemidov / amgcl

termoshtt / accel

Polytonic / Chlorine
