cuda

PR #6447 adds a public API to get the maximum number of registers per thread (numba.cuda.Dispatcher.get_regs_per_thread()). There are other attributes that would be nice to provide: shared memory per block, local memory per thread, constant memory usage, and maximum block size.

These are all available in the FuncAttr named tuple: https://github.com/numba/numba/blob/master/numba/cuda/cudadrv/drive

Problem:
catboost version: 0.23.2
Operating System: all
Tutorial: https://github.com/catboost/tutorials/blob/master/custom_loss/custom_metric_tutorial.md

It is impossible to use a custom metric (C++).

Code example

from catboost import CatBoost
train_data = [[1, 4, 5, 6],

Spark is really inconsistent in how it handles some values like -0.0 vs 0.0 and the various NaN values that are possible. I don't expect cuDF to be aware of any of this, but I would like the ability to work around it in some cases by treating the floating point value as if it were just a bunch of bits. To me logical_cast feels like the right place to do this, but floating point values are
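The idea of treating a floating point value "as if it were just a bunch of bits" can be sketched on the host in plain Python. This is only an illustration using the standard `struct` module, not cuDF's `logical_cast`; the helper names `float_to_bits` and `bits_to_float` are made up for this example.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer (no numeric conversion)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a 64-bit float."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


# 0.0 and -0.0 compare equal as floats but differ in the sign bit.
pos_zero_bits = float_to_bits(0.0)
neg_zero_bits = float_to_bits(-0.0)

# Two quiet NaNs that differ only in their payload bits (hypothetical sample values).
nan_a = bits_to_float(0x7FF8000000000000)
nan_b = bits_to_float(0x7FF8000000000001)

# As floats, neither NaN equals anything (not even itself);
# as integers, the two bit patterns are ordinary, distinct, comparable values.
nan_bits_differ = float_to_bits(nan_a) != float_to_bits(nan_b)
```

Once the values are viewed as integers, equality, hashing and grouping behave deterministically, which is what makes a bit-level view useful for working around inconsistent -0.0 and NaN handling.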

Current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.

This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
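As a rough host-side analogy (plain Python lists, not ArrayFire's actual backend code), the difference between joining through repeated pairwise calls and joining in one pass looks like this. The function names are hypothetical and exist only for this sketch.

```python
from typing import List, Sequence


def join_pairwise(arrays: Sequence[Sequence[float]]) -> List[float]:
    """Join inputs two at a time: every step allocates a new result and re-copies it."""
    if not arrays:
        return []
    result = list(arrays[0])
    for arr in arrays[1:]:
        result = result + list(arr)  # one "call" per input, growing copy each time
    return result


def join_single_pass(arrays: Sequence[Sequence[float]]) -> List[float]:
    """Join all inputs at once: compute output offsets first, then copy each input once."""
    offsets = []
    total = 0
    for arr in arrays:
        offsets.append(total)
        total += len(arr)

    out: List[float] = [0.0] * total  # single allocation for the whole result
    for offset, arr in zip(offsets, arrays):
        out[offset:offset + len(arr)] = arr  # each element is written exactly once
    return out
```

In a GPU version, the offset table would presumably be passed to a single kernel so that each thread can work out which input its output element comes from; that part is an assumption and is not shown here.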

The names map and input are mistakenly swapped. Judging by the Preconditions paragraph, I suppose they have to be exchanged, because there is no problem when map and result coincide (in the current context).

Is your feature request related to a problem? Please describe.
While porting some code from SKL to cuML, I have noticed the following:

SKL:
from sklearn.model_selection import train_test_split
cuML:
from cuml.preprocessing.model_selection import train_test_split

If I try to do from cuml.model_selection import train_test_split, the following error is displayed:
`ModuleNotFoundE
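Until the import paths match, ported code could fall back through several candidate module paths. The helper below is a generic sketch built on the standard `importlib` module; `import_first` is a hypothetical name, and the module paths in the usage comment are the ones quoted above.

```python
import importlib
from typing import Any, Iterable


def import_first(name: str, module_paths: Iterable[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    paths = list(module_paths)
    last_error = None
    for path in paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Module is missing or does not expose the attribute: try the next candidate.
            last_error = exc
    raise ImportError(f"{name!r} not found in any of {paths}") from last_error


# Usage sketch (requires cuML and/or scikit-learn to be installed):
# train_test_split = import_first(
#     "train_test_split",
#     ["cuml.model_selection",
#      "cuml.preprocessing.model_selection",
#      "sklearn.model_selection"],
# )
```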

I often use -v just to see that something is going on, but a progress bar (enabled by default) would serve the same purpose and be more concise.

We can just factor out the code from futhark bench for this.

Thank you for this fantastic work!

Would it be possible for the fit_transform() method to return the KL divergence of the run?

Thx!
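For reference, the quantity in question is the Kullback-Leibler divergence that t-SNE minimizes between the high-dimensional similarities P and the low-dimensional similarities Q: KL(P || Q) = sum of p * log(p / q) over all pairs. A minimal plain-Python sketch of that formula follows; it is not tsne-cuda's code, and `kl_divergence` is a hypothetical helper name.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Compute KL(P || Q) = sum_i p_i * log(p_i / q_i) for two discrete distributions.

    Terms with p_i == 0 contribute nothing, following the usual convention.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / q_i)
    return total
```

A lower value means the embedding's similarities match the original ones more closely, which is why the final value is a convenient way to compare runs.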

Here are 2,561 public repositories matching this topic...

NVIDIA / nvidia-docker

kaldi-asr / kaldi

hashcat / hashcat

numba / numba

catboost / catboost

chainer / chainer

cupy / cupy

taskflow / taskflow

intel-isl / Open3D

hybridgroup / gocv

rapidsai / cudf

arrayfire / arrayfire

NVIDIA / thrust

uber / aresdb

ROCm-Developer-Tools / HIP

rapidsai / cuml

dmlc / nnvm

Celtoys / Remotery

NVIDIA / libcudacxx

diku-dk / futhark

graphistry / pygraphistry

AlexiaJM / Deep-learning-with-cats

mp3guy / ElasticFusion

QuantScientist / Deep-Learning-Boot-Camp

Xtra-Computing / thundersvm

CannyLab / tsne-cuda

inducer / pycuda

sniklaus / 3d-ken-burns

NVIDIA / cutlass

NVIDIA / MinkowskiEngine
