
rfc: introduce dropout primitive #760

Open: wants to merge 1 commit into base: rfcs from piotrchmiel:dropout
Conversation

@piotrchmiel (Contributor) commented Jun 19, 2020

Rendered document

This RFC proposes the introduction of a dropout primitive in oneDNN. The full motivation is described in the document.

@emfomenk emfomenk added the RFC label Jun 19, 2020
@piotrchmiel piotrchmiel force-pushed the piotrchmiel:dropout branch from c3a114b to 614b651 Jun 19, 2020
@mgouicem (Contributor) left a comment

I agree that a standalone primitive is tempting, but what about making dropout an attribute (either a post-op or a standalone attribute)? In that case, we could still support standalone dropout through eltwise (in a first step, I guess), or fuse it with other operations (in a later step).
The reason is that if we only provide a standalone dropout primitive, it limits the usage of some primitives, typically RNNs. For those, dropout is applied between layers and/or between iterations, so if we only have a primitive, we prevent layer/iteration fusion.
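To make the eltwise framing concrete, here is a NumPy sketch of inverted dropout semantics (each element zeroed with probability p, survivors rescaled by 1/(1 - p) so the expectation is unchanged). This illustrates the operation under discussion; it is not oneDNN code, and the function names are invented for the example.

```python
import numpy as np

def dropout_forward(x, p, seed, training=True):
    """Inverted dropout sketch: zero each element with probability p
    and rescale survivors by 1/(1 - p).  Returns the output and the
    mask, since backward needs the same mask."""
    if not training or p == 0.0:
        return x, np.ones_like(x)
    rng = np.random.default_rng(seed)
    mask = (rng.random(x.shape) >= p).astype(x.dtype) / (1.0 - p)
    return x * mask, mask

def dropout_backward(dy, mask):
    # Backward is a second elementwise multiply with the forward mask.
    return dy * mask
```

The backward pass reuses the forward mask, so the mask (or a seed that regenerates it) is the only thing that has to cross the forward/backward boundary.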

- User should pass the seed in execute instead of the primitive descriptor (in the
initial draft, the seed was passed in the primitive descriptor)
- User passes the seed in execute using DNNL_ARG_SEED. The primitive returns the
end state of the RNG engine in the DNNL_ARG_SEED memory. In next calls of 'execute'

@mgouicem (Contributor), Jun 19, 2020

Isn't it enough to let the user pass a seed at each call to the execute function? If they need a different seed, they can use any RNG on their side to generate it.

@piotrchmiel (Author, Contributor), Jun 22, 2020

Based on the statement above, the user is allowed to pass a new seed. There are two use cases:

  • the user wants to run dropout with a completely new seed: by specifying DNNL_ARG_SEED, the user can pass whatever seed they want to execute on each run;
  • the user wants the next execution to use the RNG starting from the end state of the previous execution: we allow that by returning the end seed.

In the end, the user always decides which seed is passed to the execution.
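The two use cases can be sketched with an LCG-style generator (the MCG31m1 parameters are an assumption based on this thread). The function and buffer names below are hypothetical, not the oneDNN API; the point is the contract: the caller passes a seed buffer to each execute call, and the primitive writes the RNG end state back into it.

```python
A, M = 1132489760, 2**31 - 1  # assumed MCG31m1 parameters: x' = A * x mod M

def execute_dropout(seed_buf, n_draws):
    """Hypothetical stateless 'execute': reads the seed from seed_buf,
    consumes n_draws random values, and writes the RNG end state back
    into seed_buf, mimicking the DNNL_ARG_SEED contract described above."""
    s = seed_buf[0]
    for _ in range(n_draws):
        s = (A * s) % M
    seed_buf[0] = s  # end state returned through the seed argument

# Use case 2: chain executions by feeding the end state back in.
seed = [12345]
execute_dropout(seed, 10)
execute_dropout(seed, 10)  # continues the stream from the previous end state
```

Because the end state lives in user-owned memory rather than inside the primitive, two chained calls of n draws each land on the same state as a single call of 2n draws, while the primitive itself stays stateless.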

@mgouicem (Contributor), Jun 22, 2020

I am not sure about the second use case, since it would introduce primitive state, which can cause a few problems, especially with the primitive cache. I would rather force the user to pass a seed, and default to some fixed value if the seed is not provided.

@piotrchmiel (Author, Contributor), Jun 23, 2020

  1. In our proposal, the user always must pass DNNL_ARG_SEED.
  2. The primitive does not store any state in its memory between executions. Let's consider the proposed MCG31m1. The starting seed for each thread (based on the passed initial seed) is computed as described in the document's MCG31m1 vectorised algorithm. We also want to compute the "end seed" and return it to the user the same way we return tensors. The user decides which seed to pass for the next execution: they can take the returned value and pass it as DNNL_ARG_SEED, or generate a new custom value. Why do you think this would affect the primitive cache?
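The per-thread starting seeds in an MCG31m1-style vectorised scheme can be computed with a skip-ahead, i.e. modular exponentiation of the multiplier. A Python sketch, under the assumption that the generator is x' = A * x mod M with the usual MCG31m1 constants (the function names are illustrative, not oneDNN code):

```python
A, M = 1132489760, 2**31 - 1  # assumed MCG31m1 constants

def skip_ahead(seed, n):
    # x_n = A^n * x_0 mod M: jump n steps in O(log n) via pow().
    return (pow(A, n, M) * seed) % M

def thread_start_seeds(seed, num_threads, draws_per_thread):
    # Thread t starts draws_per_thread * t steps into the single stream,
    # so no thread needs state from any other thread.
    return [skip_ahead(seed, t * draws_per_thread) for t in range(num_threads)]
```

Under this scheme the "end seed" returned to the user is just skip_ahead(seed, total_draws), which is why no state has to live inside the primitive between executions.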
@piotrchmiel piotrchmiel force-pushed the piotrchmiel:dropout branch from 614b651 to d5769bf Jun 22, 2020
@piotrchmiel (Contributor, Author) commented Jun 22, 2020

@mgouicem Just to confirm a few things:

  1. Your idea is to fuse dropout into RNN primitives (https://oneapi-src.github.io/oneDNN/group__cpp__api__rnn.html) as a post-op or attribute, correct?
  2. Currently, frameworks implement dropout as two elementwise (binary) operations. As far as I can see (https://oneapi-src.github.io/oneDNN/v2.0-beta03/dev_guide_attributes_post_ops.html), post-ops are not defined for RNN primitives. You mentioned "then this limits the usage of some primitives, typically RNNs". I just want you to confirm the statement: "Currently RNNs are also limited because binary is not fused with them"?
@mgouicem (Contributor) commented Jun 22, 2020

@piotrchmiel,
Sorry, I should have been clearer on the dropout-attribute idea.
Some primitives need to apply dropout between two computations internally (e.g. if we fuse layers/iterations in RNN, we might have to apply dropout between layers/iterations inside the primitive execution).
If we offer only a dropout primitive, a user can no longer fuse layers/iterations in the RNN primitive if they need to apply dropout.
So we will eventually need an attribute to support it for RNN (or any other primitive where applying dropout would prevent fusion). Note that it is not a post-op per se, since for RNN it happens in the middle of the computation, not at the end.
Now, the suggestion is: why not introduce such an attribute right now, so that we have only one mechanism to support dropout instead of two (a standalone primitive plus an attribute)?

@piotrchmiel (Contributor, Author) commented Jun 23, 2020

@mgouicem OK, so if I understand correctly, you're proposing an attribute instead of post-ops because you want to enable fusion not only at the end of primitive execution, but also in the middle where applicable.

  1. How is it handled now? For example, if you need dropout in the middle of RNN primitive execution, how do you deal with that? Do you simply not use the RNN primitives?
  2. Why do you think a standalone primitive is obsolete rather than complementary to the attribute approach? An attribute will require injecting a dropout implementation into every primitive where the user wants to apply it: as you mentioned, sometimes in the middle, sometimes at the end. An attribute allows the best possible fusion, but it requires implementing injection inside every primitive where we identify that users may want dropout. What if a user wants to use dropout somewhere other than the places we identified? A standalone primitive would allow that.