Adding TensorFlow support to the Machine Learning overview page #22949
Conversation
R: @rezarokni

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control.

From the diff under review:

> - RunInference API is available to Beam Java SDK 2.41.0 and later through Apache Beam's [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). Please see [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java) for the Java wrapper transform to use and please see [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/RunInferenceTransformTest.java) for some example pipelines.
> + The RunInference API is available with the Beam Java SDK versions 2.41.0 and later through Apache Beam's [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). For information about the Java wrapper transform, see [RunInference.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java). For example pipelines, see [RunInferenceTransformTest.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/RunInferenceTransformTest.java).
>
> ## TensorFlow support
@ryanthompson591 could you please provide a canonical example here that includes the imports needed?
I'm not sure what exactly you want as a canonical example. Do you mean sample code that would go into this doc, or something else?
Yes, something very simple that shows what imports are needed, etc. It doesn't need to show how to build a model, just enough to help folks not have to dig around to find the dependencies, since this handler is a little different from the norm.
I added a section with TFX imports (below). Please review to make sure it's accurate.
Note we can add any extra information in a separate PR.
Excellent suggestion. I made these changes.
@ryanthompson591 can you test that this works?
Did we do this?
> file_pattern=predict_values_five_times_table)

`predict_values_five_times_table` and `save_model_dir_multiply` are not defined in this snippet, so it's somewhat confusing.
I agree that it's confusing; I think we're planning to add the notebook as soon as we have it ready.
I think it's OK to add this code, and it looks like it works. However, it won't work as is, since it relies on `save_model_dir_multiply` existing and being set. Also, `predict_values_five_times_table` doesn't exist.
The other snippets just work as is, but this one takes some setup. I'm fine with this code, but IMO referencing a notebook or something that works makes more sense.
> For more information, see [run_inference.py](https://github.com/tensorflow/tfx-bsl/blob/d1fca25e5eeaac9ef0111ec13e7634df836f36f6/tfx_bsl/public/beam/run_inference.py) in the TensorFlow GitHub repository.
>
> ```
> tf_handler = CreateModelHandler(inference_spec_type)
> ```
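(For context: `inference_spec_type` in this snippet is a tfx-bsl proto describing the saved model. A minimal construction sketch, assuming a hypothetical SavedModel path `/tmp/saved_model`; the proto names come from snippets later in this thread:)

```python
from tfx_bsl.public.proto import model_spec_pb2

# Point the spec at an exported TensorFlow SavedModel.
# The path is a hypothetical stand-in, for illustration only.
saved_model_spec = model_spec_pb2.SavedModelSpec(model_path='/tmp/saved_model')
inference_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)
```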
In this example, it may make sense to add some information about importing the correct libs from tfx-bsl.
Probably the best way to get this done accurately would be to make a simple notebook and then import the code from that here.
I can try to make some time this week to do that. It's probably possible to just modify some code in the notebook you created that showed the tfx-bsl interface, and replace that with the Beam interface.
I added an example below based on the email thread. Should I link to the notebook in this section so that customers can import the code from it?
We will need to link to the official notebook on GitHub. But we can do this as a separate PR, as that will take a while to get done.
> ```
> A Beam RunInference ModelHandler for TensorFlow
> ```
>
> Next, in your pipeline, import the required modules:
Are these modules or libraries?
I think it's fine to call them modules.
In general, a Python module is encapsulated in a single file, whereas a library is the entire suite.
So in our case, tfx-bsl would be a library, and `tfx_bsl.public.beam.CreateModelHandler` would be a module.
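To make the distinction concrete, here is how the pieces from this thread line up (the import path matches the snippet added later in the page):

```python
# tfx-bsl is the library (installed with `pip install tfx-bsl`).
# tfx_bsl.public.beam.run_inference is a module within that library.
# CreateModelHandler is a function defined in that module.
from tfx_bsl.public.beam.run_inference import CreateModelHandler
```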
Technically, these are modules and classes, but importing is something that's done all the time, before doing anything else. See the suggestion above for a possible narrative.
I'm using the example above, but I also think it's possibly helpful to tell users exactly which modules are required.
> `beam.run_inference(tf_handler)`
>
> `# keyed`
> `beam.run_inference(beam.ml.inference.KeyedHandler(tf_handler))`
@ryanthompson591: this seems broken. I think it will be removed from this PR, but we should fix the source:

- It should be `KeyedModelHandler`.
- I think there is mixing of `tfx_bsl.beam` and `apache_beam` (imported as `beam`). I can imagine this will cause confusion.
- nit: in `inferece_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)` there is a typo in `inference`.
@tvalentyn is right, it should be `KeyedModelHandler`.
This might be more complete (and I ran this through a Colab notebook):

    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.base import KeyedModelHandler
    from tfx_bsl.public.beam.run_inference import CreateModelHandler

    # Assumes inference_spec_type was built earlier, as in the snippets above.
    tf_handler = CreateModelHandler(inference_spec_type)

    # unkeyed
    RunInference(tf_handler)

    # keyed
    RunInference(KeyedModelHandler(tf_handler))
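For the record, an end-to-end sketch along the lines of the notebook being discussed might look like the following. It assumes `save_model_dir_multiply` points at an exported TensorFlow SavedModel and `predict_values_five_times_table` is a TFRecord file pattern of `tf.train.Example` protos; neither is defined here, so treat this as a sketch rather than a runnable snippet:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from tfx_bsl.public import tfxio
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2

# Describe the SavedModel to tfx-bsl and build the model handler.
saved_model_spec = model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)
inference_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)
model_handler = CreateModelHandler(inference_spec_type)

# Source of serialized tf.train.Example records to run inference on.
tfexample_beam_record = tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with beam.Pipeline() as pipeline:
    _ = (pipeline
         | tfexample_beam_record.RawRecordBeamSource()
         | RunInference(model_handler)
         | beam.Map(print))
```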
I updated the examples, switched to KeyedModelHandler, and fixed the typos (hoping I didn't add new ones).
> To use TensorFlow with the RunInference API, you need to create a model handler from within `tfx_bsl`, import the required modules, and add the necessary code to your pipeline.
>
> First, create a model handler from within `tfx_bsl`. The model handler can be keyed or unkeyed.
The model handler that is created from within tfx-bsl is always unkeyed. In order to make a keyed model handler, you would need to wrap it in the keyed model handler (which would take the tfx-bsl model handler as a parameter), e.g.:

`beam.run_inference(beam.ml.inference.KeyedHandler(tf_handler))`

If you were unsure whether your data is keyed, you could also use the maybe_keyed handler.
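(As the next comments point out, the actual class name is `KeyedModelHandler`, importable from `apache_beam.ml.inference.base`. A corrected sketch of the wrapping, assuming `tf_handler` was created as above:)

```python
from apache_beam.ml.inference.base import KeyedModelHandler
from apache_beam.ml.inference.base import RunInference

# Elements are (key, example) tuples; keys pass through unchanged.
RunInference(KeyedModelHandler(tf_handler))
```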
I added this information to the page.
> The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
>
> ```
> beam.run_inference(beam.ml.inference.KeyedModelHandler(tf_handler))
> ```
More accurately, it should be `beam.ml.inference.base.KeyedModelHandler`. Or better:

    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.base import KeyedModelHandler

    RunInference(KeyedModelHandler(tf_handler))

Also, there is no `beam.run_inference`. I might ultimately be to blame for this typo.
I updated this example.
> ```
> )
> ```
>
> First, within `tfx_bsl`, create a model handler. For more information, see [run_inference.py](https://github.com/tensorflow/tfx-bsl/blob/d1fca25e5eeaac9ef0111ec13e7634df836f36f6/tfx_bsl/public/beam/run_inference.py) in the TensorFlow GitHub repository.
- I don't quite follow the "within `tfx_bsl`" statement. We use a function from a library. Doing something *within* a library sounds strange.
- The link to d1fca25e5eeaac9ef0111ec13e7634df836f36f6 points to an old revision with buggy comments. Ideally we would refer to a recent file or, better, their documentation. I found https://www.tensorflow.org/tfx/tfx_bsl/api_docs/python/tfx_bsl/public/beam/run_inference but it is not updated... @ryanthompson591 @rezarokni do you know if TFX plans to update that page? Can we file a bug?
- Our options: link to master (bleeding edge), https://github.com/tensorflow/tfx-bsl/blob/master/tfx_bsl/public/beam/run_inference.py; or the current revision where the bug is fixed (https://github.com/tensorflow/tfx-bsl/blob/fbd5195b260ef816d49f50a2d8b1aa4ff1362e71/tfx_bsl/public/beam/run_inference.py#L257); or (my preference) remove lines 200-210 entirely, since this narrative repeats line 176.
I'll remove the lines. We can add content back when we have a better link.
> ```
> RunInference(KeyedModelHandler(tf_handler))
> ```
>
> The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
Suggested change:

> - The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
> + The model handler that is created with `CreateModelHandler()` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
Optional: regarding "is always unkeyed": if there is a description of keyed versus unkeyed model handlers in the Beam docs, we could link it here.
Adding a link
> If you are unsure if your data is keyed, you can also use the `maybe_keyed` handler.
>
> Next, import the required modules:
Module imports come first, so this is out of place. Again, this was mentioned in the example, and it's somewhat self-evident.
Removed
> ```
> from tfx_bsl.public.beam.run_inference import CreateModelHandler
> ```
>
> Finally, add the code to your pipeline. This example shows a pipeline that uses a model that multiplies by five.
> This example shows a pipeline that uses a model that multiplies by five.

This again repeats the code above. Also, same concern: is this `save_model_dir_multiply` model part of our examples? If so, we could perhaps define the variable more precisely... or, if there is/will be a notebook, we can mention that this is a sketch and say something like: see ... for a complete example.
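(For illustration only, defining the missing pieces could look something like this sketch. The toy model and path are hypothetical stand-ins for whatever the eventual notebook defines, and the real notebook would also need to export a serving signature that parses serialized `tf.train.Example` inputs for tfx-bsl to call:)

```python
import numpy as np
import tensorflow as tf

# Toy training data: the model should learn y = 5 * x.
x = np.arange(0, 100, dtype=np.float32)
y = x * 5

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x, y, epochs=500, verbose=0)

# Hypothetical path; the docs snippet assumes this variable is already set.
save_model_dir_multiply = '/tmp/multiply_model'
model.save(save_model_dir_multiply)
```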
I'm removing all examples below the first one. I think we're waiting for the notebook and plan to add a link to it when it's available.
OK, I think this is good enough as a first step, and we can iterate. I can merge if there are no additional concerns or feedback from @ryanthompson591.
> ```
> RunInference(KeyedModelHandler(tf_handler))
> ```
>
> If you are unsure if your data is keyed, you can also use the `maybe_keyed` handler.
Change `maybe_keyed` to `MaybeKeyedModelHandler`:

> `class MaybeKeyedModelHandler(Generic[KeyT, ExampleT, PredictionT, ModelT],`
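(A usage sketch with the corrected name, assuming `tf_handler` from earlier; `MaybeKeyedModelHandler` accepts either bare examples or `(key, example)` tuples:)

```python
from apache_beam.ml.inference.base import MaybeKeyedModelHandler
from apache_beam.ml.inference.base import RunInference

# Works whether or not the input PCollection elements are keyed.
RunInference(MaybeKeyedModelHandler(tf_handler))
```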
Thanks, @ryanthompson591, I can commit these changes to this PR.
Done.
Thanks everyone!
Adding a section about TensorFlow support to the Machine Learning overview page.
I also cleaned up some of the text in the overview page.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Choose reviewer(s) and mention them in a comment (`R: @username`).
- Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- Update `CHANGES.md` with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.