
Add the framework to support prepack #4413

Merged — 9 commits merged into master from roli/pre-packing-memory on Aug 7, 2020

Conversation

yufenglee (Member) commented Jul 2, 2020:

Description:
Set up the framework to support prepacking.

Motivation and Context:
Prepacking an initialized constant tensor can improve performance for some operators, for example prepacking the constant matrix B for MatMul, or the weights for GRU and LSTM. However, our current mechanism for prepacking has a drawback: it introduces memory overhead.
This PR introduces a way to remove that overhead. It adds a virtual function, PrePack, that allows an OpKernel to prepack its tensors. The function is invoked during the SessionState initialization stage. If no other op uses an initialized constant tensor after it has been packed, the tensor is released.

@yufenglee yufenglee requested review from skottmckay, pranavsharma and tracysh Jul 2, 2020
@yufenglee yufenglee requested a review from microsoft/onnxruntime as a code owner Jul 2, 2020
pranavsharma (Contributor) left a comment:

Changes look good overall. As discussed offline, can you add a session option to turn this on/off?

Status SessionState::PrePackConstantTensors() {
  // calculate the use count of each value
  std::unordered_map<std::string, size_t> node_arg_use_count;
  for (auto& node : GetGraphViewer().Nodes()) {

pranavsharma (Contributor) commented Jul 2, 2020:

Can this be const auto&? Explicitly call out const per the coding guidelines: "Qualify usages of 'auto' with 'const', '*', '&' and '&&' where applicable to more clearly express the intent."
Check the other loops below as well.

@@ -99,6 +99,8 @@ Status FinalizeSessionState(SessionState& session_state,
session_state.CleanInitializedTensorsFromGraph();

ORT_RETURN_IF_ERROR(session_state.CreateKernels(kernel_registry_manager));
ORT_RETURN_IF_ERROR(session_state.PrePackConstantTensors());

pranavsharma (Contributor) commented Jul 2, 2020:

Can we call this PrePackInitializedTensors to keep the naming consistent with SaveInitializedTensors and CleanInitializedTensorsFromGraph?

pranavsharma (Contributor) commented Jul 2, 2020:

Also, please fill in the PR template (description and motivation/context) appropriately.


for (auto& node : GetGraphViewer().Nodes()) {
  auto kernel = GetMutableKernel(node.Index());
  int input_idx = 0;

skottmckay (Contributor) commented Jul 3, 2020:

Can delay this until later.

if (is_packed && node_arg_use_count.count(input_name) && node_arg_use_count[input_name] == 1) {
  // release the constant initialized tensor
  constant_initialized_tensors_[ort_value_idx] = OrtValue();
}

skottmckay (Contributor) commented Jul 3, 2020:

Would it be better to remove it from the map completely?

skottmckay (Contributor) commented Jul 3, 2020:

Overall approach looks nice and clean to me.

}

for (auto& node : GetGraphViewer().Nodes()) {
  auto kernel = GetMutableKernel(node.Index());

skottmckay (Contributor) commented Jul 3, 2020:

Does this work correctly if an initializer from the main graph is packed in a subgraph and becomes unused? It possibly does, via the reference counting in OrtValue.

// }
// return Status::OK();
// }

skottmckay (Contributor) commented Jul 3, 2020:

nit: we probably need a more complete example showing that the kernel needs to do an allocation and keep a member that owns that buffer.

@yufenglee yufenglee force-pushed the roli/pre-packing-memory branch 2 times, most recently from e18e6aa to a22f6c7 Jul 28, 2020
@yufenglee yufenglee changed the title [WIP] add the framework to support prepack Add the framework to support prepack Jul 28, 2020
@yufenglee yufenglee force-pushed the roli/pre-packing-memory branch from 91f7920 to 4243a9b Aug 4, 2020
yufenglee added 2 commits Aug 5, 2020
@@ -857,6 +857,10 @@ struct OrtApi {
* \param index index of string tensor element to fill
*/
ORT_API2_STATUS(FillStringTensorElement, _Inout_ OrtValue* value, _In_ const char* s, size_t index);

// Control pre-packing of initialized constant tensors
ORT_API2_STATUS(EnablePrePacking, _Inout_ OrtSessionOptions* options);

pranavsharma (Contributor) commented Aug 6, 2020:

You might be able to achieve this without adding these 2 APIs once this PR is merged.

gwang-msft (Contributor) commented Aug 7, 2020:

You may merge first, and I can move the prepacking option into the session config map in my change.



const size_t packed_b_size = MlasGemmPackBSize(N, K, b_is_signed);
if (packed_b_size == 0) {
  return;
}
auto alloc = Info().GetAllocator(0, OrtMemTypeDefault);

pranavsharma (Contributor) commented Aug 6, 2020:

nit: auto& alloc

@@ -81,5 +81,8 @@ struct SessionOptions {

// Deterministic compute is likely not as performant. This option is default to false.
bool use_deterministic_compute = false;

// Control the pre-packing of initialized constant tensors
bool use_prepacking = true;

pranavsharma (Contributor) commented Aug 7, 2020:

nit: could be called use_weight_prepacking

@yufenglee yufenglee merged commit b22091d into master Aug 7, 2020
40 checks passed:
- Android CI Pipeline: Build #20200805.22 succeeded
- Linux CPU CI Pipeline: Build #20200805.21 succeeded
- Linux CPU x64 NoContribops CI Pipeline: Build #20200805.21 succeeded
- Linux GPU CI Pipeline: Build #20200805.21 succeeded
- Linux GPU TensorRT CI Pipeline: Build #20200805.23 succeeded
- Linux OpenVINO CI Pipeline: Build #20200805.19 succeeded
- MacOS CI Pipeline: Build #20200805.22 succeeded
- MacOS NoContribops CI Pipeline: Build #20200805.22 succeeded
- Windows CPU CI Pipeline: Build #20200805.21 succeeded (build debug/release, build_x64_no_contrib_ops debug/release, x86_build debug/release, x86_no_contrib_ops debug/release all succeeded)
- Windows GPU CI Pipeline: Build #20200805.21 succeeded (build debug and build release succeeded)
- Windows GPU TensorRT CI Pipeline: Build #20200805.22 succeeded (build debug and build release succeeded)
- centos7_cpu: Build #20200805.22 succeeded (linux_centos_ci Debug and Release succeeded)
- license/cla: All CLA requirements met
- orttraining-linux-ci-pipeline: Build #20200805.21 succeeded (Onnxruntime_Linux_CPU_Training Debug and Release succeeded)
- orttraining-linux-gpu-ci-pipeline: Build #20200805.21 had test failures (Onnxruntime_Linux_GPU_Training Debug and Release succeeded)
- orttraining-mac-ci-pipeline: Build #20200805.22 succeeded
- orttraining-win-ci-pipeline: Build #20200805.22 succeeded (Win_CPU_Training Debug and RelWithDebInfo succeeded)
- orttraining-win-gpu-ci-pipeline: Build #20200805.21 succeeded (Win_GPU_Training Debug and RelWithDebInfo succeeded)
@yufenglee yufenglee deleted the roli/pre-packing-memory branch Aug 7, 2020