
Adding TensorFlow support to the Machine Learning overview page #22949

Merged
13 commits merged into apache:master on Sep 2, 2022

Conversation

@rszper (Contributor) commented Aug 29, 2022

Adding a section about TensorFlow support to the Machine Learning overview page.

I also cleaned up some of the text in the overview page.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions tests status (on master branch): Build python source distribution and wheels, Python tests, Java tests, Go tests.

See CI.md for more information about GitHub Actions CI.

@rszper (Contributor, Author) commented Aug 29, 2022

R: @rezarokni
R: @tvalentyn

@github-actions (bot) commented Aug 29, 2022

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

- RunInference API is available to Beam Java SDK 2.41.0 and later through Apache Beam's [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). Please see [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java) for the Java wrapper transform to use and please see [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/RunInferenceTransformTest.java) for some example pipelines.
+ The RunInference API is available with the Beam Java SDK versions 2.41.0 and later through Apache Beam's [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). For information about the Java wrapper transform, see [RunInference.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java). For example pipelines, see [RunInferenceTransformTest.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/RunInferenceTransformTest.java).

## TensorFlow support
@rezarokni (Contributor) commented Aug 30, 2022

@ryanthompson591 could you please provide a canonical example here that includes the imports needed?

@ryanthompson591 (Contributor) commented Aug 30, 2022

I'm not sure what exactly you want as a canonical example. Do you mean sample code that would go into this doc, or something else?

@rezarokni (Contributor) commented Aug 30, 2022

Yes, something very simple that shows which imports are needed. It doesn't need to show how to build a model, just enough to help folks avoid digging around to find the dependencies, since this handler is a little different from the norm.

@rszper (Contributor, Author) commented Aug 30, 2022

I added a section with the TFX imports (below). Please review to make sure it's accurate.

@rezarokni (Contributor) commented Aug 31, 2022

Note we can add any extra information in a separate PR.

@rszper (Contributor, Author) commented Aug 31, 2022

Excellent suggestion. I made these changes.

@tvalentyn (Contributor) commented Sep 2, 2022

> @ryanthompson591 can you test that this works?

Did we do this?

@tvalentyn (Contributor) commented Sep 2, 2022

> `file_pattern=predict_values_five_times_table)`

`predict_values_five_times_table` and `save_model_dir_multiply` are not defined in this snippet, so it's somewhat confusing.

@rszper (Contributor, Author) commented Sep 2, 2022

I agree that it's confusing; I think we're planning to add the notebook as soon as we have it ready.

@ryanthompson591 (Contributor) commented Sep 2, 2022

I think it's OK to add this code, and it looks like it works. However, it won't work as is, since it relies on `save_model_dir_multiply` existing and being set. Also, `predict_values_five_times_table` doesn't exist.

Other snippets just work as is, but this one takes some setup. I'm fine with this code, but IMO referencing a notebook or something that works makes more sense.


For more information, see [run_inference.py](https://github.com/tensorflow/tfx-bsl/blob/d1fca25e5eeaac9ef0111ec13e7634df836f36f6/tfx_bsl/public/beam/run_inference.py) in the TensorFlow GitHub repository.

```
tf_handler = CreateModelHandler(inference_spec_type)
```
@ryanthompson591 (Contributor) commented Aug 30, 2022

In this example it may make sense to add some information about importing the correct libs from tfx-bsl.

Probably the best way to get this done accurately would be to make a simple notebook and then import the code from that here.

I can try to make some time this week to do that. It's probably possible to just modify the code in the notebook you created that showed the tfx-bsl interface and replace it with the Beam interface.
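For orientation, a minimal sketch of the imports under discussion. The paths follow `run_inference.py` linked above; treat the `model_spec_pb2` path as an assumption about the `tfx-bsl` public API:

```
import apache_beam as beam

# RunInference comes from the Beam SDK itself ...
from apache_beam.ml.inference.base import RunInference

# ... while the TensorFlow model handler and its proto config come from the
# tfx-bsl library, not from Beam.
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2
```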

@rszper (Contributor, Author) commented Aug 30, 2022

I added an example below based on the email thread. Should I link to this notebook in this section so that customers can import the code from it?

@rezarokni (Contributor) commented Aug 31, 2022

We will need to link to the official notebook on GitHub, but we can do that in a separate PR, since it will take a while to get done.

```
A Beam RunInference ModelHandler for TensorFlow
```

Next, in your pipeline, import the required modules:
@rszper (Contributor, Author) commented Aug 30, 2022

Are these modules or libraries?

@ryanthompson591 (Contributor) commented Aug 31, 2022

I think it's fine to call them modules.

In general, a Python module is encapsulated in a single file, whereas a library is the entire suite.

So in our case, `tfx-bsl` would be a library, and `tfx_bsl.public.beam.CreateModelHandler` would be a module.
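To make the distinction concrete, a minimal sketch (assuming `tfx-bsl` is installed):

```
import tfx_bsl                                   # library: the whole installed package
from tfx_bsl.public.beam import run_inference    # module: the single file run_inference.py
from tfx_bsl.public.beam.run_inference import CreateModelHandler  # a name defined in that module
```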

@tvalentyn (Contributor) commented Aug 31, 2022

Technically these are modules and classes, but importing is something that's done all the time, before doing anything else. See the suggestion above for a possible narrative.

@rszper (Contributor, Author) commented Aug 31, 2022

I'm using the example above, but I also think it could be helpful to tell users exactly which modules are required.

```
beam.run_inference(tf_handler)

# keyed
beam.run_inference(beam.ml.inference.KeyedHandler(tf_handler))
```
@tvalentyn (Contributor) commented Aug 31, 2022

@ryanthompson591: this seems broken. I think it will be removed from this PR, but we should fix the source:

  • It should be KeyedModelHandler.
  • I think there is mixing of tfx_bsl.beam and apache_beam (imported as beam). I can imagine this will cause confusion.
  • nit: in `inferece_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)` there is a typo in "inference".

@ryanthompson591 (Contributor) commented Aug 31, 2022

@tvalentyn is right, it should be KeyedModelHandler.

This might be more complete (and I ran this through a Colab notebook).

    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.base import KeyedModelHandler

    tf_handler = CreateModelHandler(inference_spec_type)
    # unkeyed
    RunInference(tf_handler)

    # keyed
    RunInference(KeyedModelHandler(tf_handler))

@rszper (Contributor, Author) commented Aug 31, 2022

I updated the examples, switched to KeyedModelHandler, and fixed the typos (hoping I didn't add new ones).


To use TensorFlow with the RunInference API, you need to create a model handler from within `tfx_bsl`, import the required modules, and add the necessary code to your pipeline.

First, create a model handler from within `tfx_bsl`. The model handler can be keyed or unkeyed.
@ryanthompson591 (Contributor) commented Aug 31, 2022

The model handler that is created from within tfx-bsl is always unkeyed. In order to make a keyed model handler, you would need to wrap it in the keyed model handler (which would take the tfx-bsl model handler as a parameter).

e.g. `beam.run_inference(beam.ml.inference.KeyedHandler(tf_handler))`

If you were unsure whether your data is keyed, you could also use the maybe_keyed handler.

@rszper (Contributor, Author) commented Aug 31, 2022

I added this information to the page.
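Pulling together the corrections that appear further down this thread, a minimal sketch of the unkeyed and keyed forms (assuming `tf_handler` was created with `CreateModelHandler` as in the snippet above):

```
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.base import KeyedModelHandler

# Unkeyed: elements are plain examples.
RunInference(tf_handler)

# Keyed: elements are (key, example) tuples; KeyedModelHandler wraps the
# unkeyed handler so each key is passed through alongside its prediction.
RunInference(KeyedModelHandler(tf_handler))
```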


The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:

```
beam.run_inference(beam.ml.inference.KeyedModelHandler(tf_handler))
```
@ryanthompson591 (Contributor) commented Sep 2, 2022

It more accurately should be `beam.ml.inference.base.KeyedModelHandler`.

Or better:

    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.base import KeyedModelHandler
    RunInference(KeyedModelHandler(tf_handler))

Also, there is no `beam.run_inference`. This typo might ultimately be my fault.

@rszper (Contributor, Author) commented Sep 2, 2022

I updated this example.


First, within `tfx_bsl`, create a model handler. For more information, see [run_inference.py](https://github.com/tensorflow/tfx-bsl/blob/d1fca25e5eeaac9ef0111ec13e7634df836f36f6/tfx_bsl/public/beam/run_inference.py) in the TensorFlow GitHub repository.
@tvalentyn (Contributor) commented Sep 2, 2022

@rszper (Contributor, Author) commented Sep 2, 2022

I'll remove the lines. We can add content back when we have a better link.

```
RunInference(KeyedModelHandler(tf_handler))
```

The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
@tvalentyn (Contributor) commented Sep 2, 2022

Suggested change:

- The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
+ The model handler that is created with `CreateModelHandler()` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:

@tvalentyn (Contributor) commented Sep 2, 2022

Optional:

> is always unkeyed

If there is a description in the Beam docs re: keyed versus unkeyed, we could link to it here.

@rszper (Contributor, Author) commented Sep 2, 2022

Adding a link


If you are unsure if your data is keyed, you can also use the `maybe_keyed` handler.

Next, import the required modules:
@tvalentyn (Contributor) commented Sep 2, 2022

Module imports come first, so this is out of place. Again, this was mentioned in the example, and it's somewhat self-evident.

@rszper (Contributor, Author) commented Sep 2, 2022

Removed


```
from tfx_bsl.public.beam.run_inference import CreateModelHandler
```

Finally, add the code to your pipeline. This example shows a pipeline that uses a model that multiplies by five.
@tvalentyn (Contributor) commented Sep 2, 2022

> This example shows a pipeline that uses a model that multiplies by five.

This again repeats the code above. Also, same concern: is this model, `save_model_dir_multiply`, part of our examples? If so, we could perhaps define the variable more precisely... or, if there is or will be a notebook, we can mention that this is a sketch and say something like: see ... for a complete example.

@rszper (Contributor, Author) commented Sep 2, 2022

I'm removing all examples below the first one. I think we're waiting for the notebook and plan to add a link to it when it's available.
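For reference, a self-contained sketch of the kind of example discussed in this thread. The model directory, feature name, and input values are hypothetical placeholders (standing in for the undefined `save_model_dir_multiply` and `predict_values_five_times_table`), and exactly which input type the handler expects (serialized records vs. `tf.train.Example` protos) can depend on the `tfx-bsl` version:

```
import apache_beam as beam
import tensorflow as tf
from apache_beam.ml.inference.base import RunInference
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2

# Hypothetical stand-in for save_model_dir_multiply: a directory holding a
# SavedModel that multiplies its input by five.
save_model_dir_multiply = '/tmp/multiply_five_model'

saved_model_spec = model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)
inference_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)
tf_handler = CreateModelHandler(inference_spec_type)

def to_serialized_example(value):
    # Serialized tf.train.Example records stand in for reading TFRecords with
    # file_pattern=predict_values_five_times_table; 'x' is an assumed feature
    # name that must match the model's serving signature.
    example = tf.train.Example(features=tf.train.Features(feature={
        'x': tf.train.Feature(float_list=tf.train.FloatList(value=[value]))}))
    return example.SerializeToString()

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([1.0, 2.0, 3.0])
         | beam.Map(to_serialized_example)
         | RunInference(tf_handler)  # emits tfx-bsl prediction results
         | beam.Map(print))
```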

@tvalentyn (Contributor) commented Sep 2, 2022

OK, I think this is good enough as a first step, and we can iterate. I can merge if there are no additional concerns or feedback from @ryanthompson591.

@ryanthompson591 (Contributor) left a comment

Couple small things.


```
RunInference(KeyedModelHandler(tf_handler))
```

If you are unsure if your data is keyed, you can also use the `maybe_keyed` handler.
@ryanthompson591 (Contributor) commented Sep 2, 2022

Change `maybe_keyed` to `MaybeKeyedModelHandler`:

`class MaybeKeyedModelHandler(Generic[KeyT, ExampleT, PredictionT, ModelT],`
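A minimal sketch of that corrected form (assuming the `tf_handler` from the snippets above):

```
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.base import MaybeKeyedModelHandler

# Accepts either bare examples or (key, example) tuples.
RunInference(MaybeKeyedModelHandler(tf_handler))
```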

@tvalentyn (Contributor) commented Sep 2, 2022

Thanks, @ryanthompson591, I can commit these changes to this PR.

@tvalentyn (Contributor) commented Sep 2, 2022

> I can commit these changes to this PR.

Done.

@tvalentyn merged commit 31561e2 into apache:master on Sep 2, 2022. 6 checks passed.
@tvalentyn (Contributor) commented Sep 2, 2022

Thanks everyone!
