
Single Operator Execution Interface #4453

Status: Open. Wants to merge 23 commits into base: master.

Conversation

@orausch (Contributor) commented Jul 8, 2020

Description: This PR adds an interface to the C ABI that enables the execution of single ONNX nodes without the overhead of graph construction and memory allocation.

Motivation and Context
The alternative way to execute single operators/nodes is to create an ONNX graph containing a single node only. However, this (understandably) adds a lot of overhead, as can be seen in the plot below.

[Plot: execution-time overhead of running a single-node ONNX graph vs. the proposed single-operator interface]

Here is an example of how the API can be used (UPDATE: new API for adding attributes):

// Setup the context
OrtExecutableKernelContext* kernel_context;
CheckStatus(OrtApi->CreateExecutableKernelContext("MyConv", "Conv", &kernel_context));

// Add parameter X
CheckStatus(OrtApi->ExecutableKernelContext_AddInput(kernel_context, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT));
// Add parameter W
CheckStatus(OrtApi->ExecutableKernelContext_AddInput(kernel_context, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT));
// Add parameter Y
CheckStatus(OrtApi->ExecutableKernelContext_AddOutput(kernel_context, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT));

// Setup attributes
CheckStatus(OrtApi->ExecutableKernelContext_AddAttributeString(kernel_context, "auto_pad", "NOTSET"));
CheckStatus(OrtApi->ExecutableKernelContext_AddAttributeInt(kernel_context, "group", 1));
{
	// Setup attribute strides
	int64_t values[2];
	values[0] = 2;
	values[1] = 2;

	CheckStatus(OrtApi->ExecutableKernelContext_AddAttributeInts(kernel_context, "strides", values, 2));
}
// Create the executable kernel
OrtExecutableKernel* kernel;
CheckStatus(OrtApi->CreateExecutableKernel(__ort_session, kernel_context, /*provider_index=*/0, &kernel));

// Execute the kernel
CheckStatus(OrtApi->ExecutableKernel_SetInput(kernel, 0, ort_value_input_X));
CheckStatus(OrtApi->ExecutableKernel_SetInput(kernel, 1, ort_value_input_W));
CheckStatus(OrtApi->ExecutableKernel_SetOutput(kernel, 0, ort_value_output_Y));
CheckStatus(OrtApi->ExecutableKernel_Compute(kernel));

It has been tested with the CPU and CUDA execution providers.

orausch added 4 commits May 15, 2020
The current design implements a new ExecutionFrame that is used to
execute the op. This is less than optimal; I will attempt to change this
in the future.

The API will also have to be extended to add other providers than CPU.
With this change, the ExecutableKernelContextImpl is initialized at kernel
creation, and not at compute time, which should remove some overhead. This
allows multiple calls with different data to be made using the same kernel.

Furthermore, the main graph of the different op kernels is now shared through
OrtKernelSession.
* Reuse providers across Kernels
* Support CUDA providers
@orausch requested a review from microsoft/onnxruntime as a code owner Jul 8, 2020
@microsoft-cla bot commented Jul 8, 2020

CLA assistant check
All CLA requirements met.

@Craigacp (Contributor) commented Jul 8, 2020

This looks interesting to expose into the Java API, but is there a reason why the input and output arguments are specified separately from the call to compute? In session.run they are supplied to that call, and I feel like that maps a little more naturally.

@orausch (Contributor, Author) commented Jul 8, 2020

> This looks interesting to expose into the Java API, but is there a reason why the input and output arguments are specified separately from the call to compute? In session.run they are supplied to that call, and I feel like that maps a little more naturally.

I'm not particularly tied to this exact API; if exposed to the Java or Python API, it could be implemented similarly to session.run. The current option was just easy to implement, and it has the performance advantage of not creating any intermediate data structures.

@liqunfu added the api:C/C++ label Aug 3, 2020
@hariharans29 (Member) commented Aug 13, 2020

/azp run Linux CPU CI Pipeline,Linux CPU x64 NoContribops CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,MacOS CI Pipeline,MacOS NoContribops CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline

@azure-pipelines bot commented Aug 13, 2020

Pull request contains merge conflicts.
@hariharans29 (Member) commented Aug 13, 2020

/azp run orttraining-linux-ci-pipeline,orttraining-mac-ci-pipeline,orttraining-linux-gpu-ci-pipeline,centos7_cpu,Linux OpenVINO CI Pipeline

@azure-pipelines bot commented Aug 13, 2020

Pull request contains merge conflicts.
orausch added 2 commits Aug 17, 2020
# Conflicts:
#	include/onnxruntime/core/session/onnxruntime_c_api.h
#	onnxruntime/core/session/onnxruntime_c_api.cc
#	onnxruntime/core/session/ort_apis.h
@faxu (Contributor) commented Aug 17, 2020

/azp run Windows GPU CI Pipeline, WIndows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline

@faxu (Contributor) commented Aug 17, 2020

/azp run Linux CPU CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline

@azure-pipelines bot commented Aug 17, 2020

Azure Pipelines successfully started running 5 pipeline(s).
@azure-pipelines bot commented Aug 17, 2020

Azure Pipelines successfully started running 8 pipeline(s).
@faxu (Contributor) commented Aug 18, 2020

/azp run Windows GPU CI Pipeline, WIndows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline

@faxu (Contributor) commented Aug 18, 2020

/azp run Linux CPU CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline

@azure-pipelines bot commented Aug 18, 2020

Azure Pipelines successfully started running 5 pipeline(s).
@azure-pipelines bot commented Aug 18, 2020

Azure Pipelines successfully started running 8 pipeline(s).
@faxu (Contributor) commented Aug 19, 2020

@orausch are you planning more updates? Let me know when the CIs are ready to be run.

orausch added 2 commits Aug 26, 2020
This fixes some issues with node schema resolution
@orausch (Contributor, Author) commented Aug 26, 2020

@faxu, after the latest changes I believe that the PR should pass the CI (I managed to build it on Linux and Windows using build.sh and build.bat).

@faxu (Contributor) commented Aug 26, 2020

/azp run Windows GPU CI Pipeline, WIndows GPU TensorRT CI Pipeline, centos7_cpu, centos7_cpu (linux_centos_ci Debug), centos7_cpu (linux_centos_ci Release), orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline

@faxu (Contributor) commented Aug 26, 2020

/azp run Linux CPU CI Pipeline, Linux CPU x64 NoContribops CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, MacOS NoContribops CI Pipeline, Windows CPU CI Pipeline

@azure-pipelines bot commented Aug 26, 2020

Azure Pipelines successfully started running 5 pipeline(s).
@azure-pipelines bot commented Aug 26, 2020

Azure Pipelines successfully started running 8 pipeline(s).
@jywu-msft (Member) commented Aug 31, 2020

This seems pretty useful. +@RyanUnderhill @pranavsharma

@jywu-msft requested a review from RyanUnderhill Aug 31, 2020
@codemzs (Member) commented Sep 21, 2020

Hi @jywu-msft / @orausch, either we get traction on this PR or we close it. Can you please drive this to closure? It has been outstanding for a while.

@orausch (Contributor, Author) commented Sep 22, 2020

Thanks for following up @codemzs. I think a good next step would be to get a review in from someone on the ORT team.

Let me know if there is any other way I can help drive this forward.

@codemzs (Member) commented Sep 22, 2020

@orausch I believe Pranav from ORT team will be looking at this.
