Add the framework to support prepack #4413
Conversation
onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.h
Changes look good overall. As discussed offline, can you add a session option to turn this on/off?
Status SessionState::PrePackConstantTensors() {
  // calculate the use count of each value
  std::unordered_map<std::string, size_t> node_arg_use_count;
  for (auto& node : GetGraphViewer().Nodes()) {
pranavsharma
Jul 2, 2020
Contributor
Can this be const auto&? Explicitly call out const based on coding guidelines. "Qualify usages of 'auto' with 'const', '*', '&' and '&&' where applicable to more clearly express the intent".
Check other loops as well below.
@@ -99,6 +99,8 @@ Status FinalizeSessionState(SessionState& session_state,
  session_state.CleanInitializedTensorsFromGraph();

  ORT_RETURN_IF_ERROR(session_state.CreateKernels(kernel_registry_manager));
  ORT_RETURN_IF_ERROR(session_state.PrePackConstantTensors());
pranavsharma
Jul 2, 2020
Contributor
Can we call this PrePackInitializedTensors to keep the naming consistent with SaveInitializedTensors and CleanInitializedTensorsFromGraph?
Also, please fill in the PR template (description and motivation/context) appropriately.
for (auto& node : GetGraphViewer().Nodes()) {
  auto kernel = GetMutableKernel(node.Index());
  int input_idx = 0;
if (is_packed && node_arg_use_count.count(input_name) && node_arg_use_count[input_name] == 1) {
  // release the constant initialized tensor
  constant_initialized_tensors_[ort_value_idx] = OrtValue();
}
Overall approach looks nice and clean to me.
}

for (auto& node : GetGraphViewer().Nodes()) {
  auto kernel = GetMutableKernel(node.Index());
skottmckay
Jul 3, 2020
Contributor
Does this work correctly if an initializer from the main graph is packed in a subgraph and becomes unused? It possibly does, via the reference counting in OrtValue.
// }
// return Status::OK();
// }
skottmckay
Jul 3, 2020
Contributor
nit: probably need a more complete example showing that the kernel needs to do an allocation and have a member that owns that buffer.
Force-pushed from e18e6aa to a22f6c7.
@@ -857,6 +857,10 @@ struct OrtApi {
  * \param index index of string tensor element to fill
  */
  ORT_API2_STATUS(FillStringTensorElement, _Inout_ OrtValue* value, _In_ const char* s, size_t index);

  // Control pre-packing of initialized constant tensors
  ORT_API2_STATUS(EnablePrePacking, _Inout_ OrtSessionOptions* options);
pranavsharma
Aug 6, 2020
Contributor
You might be able to achieve this without adding these 2 APIs once this PR is merged.
gwang-msft
Aug 7, 2020
Contributor
You may merge first, and I can move the pre-packing option into the session config map in my change.
const size_t packed_b_size = MlasGemmPackBSize(N, K, b_is_signed);
if (packed_b_size == 0) {
  return;
}
auto alloc = Info().GetAllocator(0, OrtMemTypeDefault);
@@ -81,5 +81,8 @@ struct SessionOptions {

  // Deterministic compute is likely not as performant. This option defaults to false.
  bool use_deterministic_compute = false;

  // Control the pre-packing of initialized constant tensors
  bool use_prepacking = true;
Merged b22091d into master.
yufenglee commented Jul 2, 2020 (edited)
Description:
Set up the framework to support pre-packing.
Motivation and Context
Pre-packing an initialized constant tensor can improve performance for some operators, for example, pre-packing the constant matrix B for MatMul, or the weights for GRU and LSTM. However, the current mechanism to support pre-packing has a drawback: it introduces memory overhead.
This PR introduces a method to remove that memory overhead. It adds a virtual function, PrePack, that allows an OpKernel to pre-pack its tensors. The function is invoked during SessionState initialization. If no other op uses an initialized constant tensor after packing, the tensor is released.