Popular repositories
-
Forked from NVIDIA/thrust
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL).
C++
-
Forked from tensorflow/tensorflow
Open source software library for numerical computation using data flow graphs.
C++
1,376 contributions in the last year
Contribution activity
October 2020
- malfet/pytorch-ci-hud JavaScript
Created a pull request in pytorch/pytorch.github.io that received 11 comments
- Fix out-of-bounds access for caching allocator calls
- Test pybind-2.6.0rc2
- Make setup.py python 2 friendly
- Remove Python version upper boundary check
- [caffe2] Add operator schema for FP16SparseNorm
- Fix JIT test config
- qnnpack quantized activations: fix memory format issues
- Improve error checking of Storage._writeFile.
- annotate torch.autograd.* modules
- Doc note update for complex autograd (CP-45270)
- Mark top 3 slowest tests as slow
- Embed callgrind headers [CherryPick of #45914]
- Do not rebase select nightly builds on top of master
- Add `[zc]heevd` to the list of MKL symbols exported from torch_cpu
- Embed callgrind headers
- Add torch::cuda::nccl::all2all
- Cleanup nccl.cpp
- Refactor computeLRWorkDim
- Test torch.svd using complex float and double numbers (take 2)
- Add LazyNVRTC
- Enable python code coverage on windows
- fix type check for torch.quantization._numeric_suite
- install lcov in Docker image if coverage is specified
- [ONNX] Update ONNX doc for indexing export
- Workaround for bug in DistributedDataParallel
- [v1.7 patch] Add warning on ProcessGroup and ProcessGroup::Work APIs
- Remove Python version upper boundary check
- Add CUDA 11.1 docker build
- Performance fix for torch.cat operator on ROCm (#46097)
- [v1.7] Rocm skip test cases (#45782)
- Add warning on ProcessGroup and ProcessGroup::Work APIs
- Complex Autograd Doc fix
- Cherrypick smooth l1 loss fixes
- [v1.7] Update allowlist back compat date for min_values / max_values.
- [NNC] Fix two bugs in Cuda Half support
- [dist_optim] serialize compilation when creating dist_optim (#45871)
- Performance fix for torch.cat operator on ROCm
- make a way to disable callgrind
- trying to make pow work for tensor raised to the power of a scalar
- fix test_serialization not working with Windows.
- including tensorexpr tests in CI for all configs
- [v1.7 patch] Prioritize raising error message about unused parameters when rebuild_buckets fails
- [Release/1.7] [ONNX] Improve error handling for adaptive_pool
- [v1.7 cherry-pick] [JIT] Dict Bug Fixes
- [v1.7 patch] Disallow creation of ProcessGroupNCCL without GPUs. (#45…
Created an issue in pytorch/pytorch that received 4 comments
TCPStoreTest.test_numkeys_delkeys takes 5+ min to finish
See report from https://app.circleci.com/pipelines/github/pytorch/pytorch/224304/workflows/2571002d-2075-4e53-877b-4bac675037de/jobs/8062979
Most o…
- Call to `torch.cuda.memory_stats` before allocating any tensors on GPU causes SEGFAULT
- TestXNNPACKConv1dTransformPass.test_conv1d_with_relu_fc takes 2+ min to finish
- TestDataLoader.test_proper_exit takes 2.5min to finish
- CPP test errors are not reported as failures
- DISABLED test_functional_debug (quantization.test_quantize_fx.TestQuantizeFx)