Remove SchedulingParams variants of ThreadPool::TryParallelFor #5050
bbb9d92 into master
40 checks passed
Windows CPU CI Pipeline (build_x64_no_contrib_ops debug): succeeded
Windows CPU CI Pipeline (build_x64_no_contrib_ops release): succeeded
orttraining-linux-gpu-ci-pipeline (Onnxruntime_Linux_GPU_Training Debug): succeeded
orttraining-linux-gpu-ci-pipeline (Onnxruntime_Linux_GPU_Training Release): succeeded
orttraining-win-ci-pipeline (Win_CPU_Training RelWithDebInfo): succeeded
orttraining-win-gpu-ci-pipeline (Win_GPU_Training RelWithDebInfo): succeeded
tlh20 commented Sep 3, 2020
Description: Simplify the set of parallel loop variants implemented in the thread pool by removing the ones based on SchedulingParams. Two variants remain: one that expresses the per-iteration cost as a simple double, and one that takes a TensorOpCost struct.
Motivation and Context
The parallel loop variants taking a SchedulingParams were essentially unused, yet supporting them added complexity when writing and testing other changes to the thread pool implementation. Their only use was in the gelu.cc microbenchmark, which used SchedulingParams to force a fixed chunk size of 4096 elements for work distribution. I updated the microbenchmark to perform this chunking explicitly, following the code in onnxruntime/contrib_ops/cpu/bert/bias_gelu.cc.
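Explicit fixed-size chunking can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from gelu.cc or bias_gelu.cc. `RunTasks` and `GeluChunked` are hypothetical names. `RunTasks` uses plain `std::thread` workers rather than the ONNX Runtime ThreadPool API, and the erf form of GELU serves only as a stand-in element-wise workload. The idea is that the caller computes the number of 4096-element chunks itself and hands each task exactly one chunk, instead of asking the pool to choose a block size.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (NOT the ONNX Runtime ThreadPool API): runs
// task(0) .. task(num_tasks - 1) on a small set of std::thread workers
// that claim task indices from a shared atomic counter.
void RunTasks(std::ptrdiff_t num_tasks,
              const std::function<void(std::ptrdiff_t)>& task) {
  const unsigned hw = std::max(1u, std::thread::hardware_concurrency());
  const std::ptrdiff_t num_workers =
      std::min<std::ptrdiff_t>(num_tasks, static_cast<std::ptrdiff_t>(hw));
  std::atomic<std::ptrdiff_t> next{0};
  std::vector<std::thread> workers;
  for (std::ptrdiff_t w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] {
      // Each worker keeps claiming the next unprocessed task index.
      for (std::ptrdiff_t t = next.fetch_add(1); t < num_tasks;
           t = next.fetch_add(1)) {
        task(t);
      }
    });
  }
  for (auto& th : workers) th.join();
}

// Hypothetical kernel: applies GELU (erf form) element-wise, splitting the
// input into fixed-size chunks so that each task owns exactly one chunk.
void GeluChunked(std::vector<float>& data) {
  constexpr std::ptrdiff_t kChunkSize = 4096;  // fixed chunk size per task
  const auto n = static_cast<std::ptrdiff_t>(data.size());
  // ceil(n / kChunkSize) tasks; the last chunk may be shorter.
  const std::ptrdiff_t num_tasks = (n + kChunkSize - 1) / kChunkSize;

  RunTasks(num_tasks, [&](std::ptrdiff_t task_idx) {
    const std::ptrdiff_t begin = task_idx * kChunkSize;
    const std::ptrdiff_t end = std::min(begin + kChunkSize, n);
    assert(begin < end);  // every task receives a non-empty range
    for (std::ptrdiff_t i = begin; i < end; ++i) {
      const float x = data[static_cast<std::size_t>(i)];
      data[static_cast<std::size_t>(i)] =
          0.5f * x * (1.0f + std::erf(x * 0.70710678f));
    }
  });
}
```

For example, `std::vector<float> v(10000, 1.0f); GeluChunked(v);` splits the 10,000 elements into three tasks of 4096, 4096 and 1808 elements. Because the chunk size is fixed by the caller, the work distribution no longer depends on any scheduling hint passed to the pool, which is what makes a dedicated SchedulingParams overload unnecessary for this use.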