
Adding TensorFlow support to the Machine Learning overview page #22949

Merged
13 commits merged into apache:master on Sep 2, 2022

Conversation

@rszper (Contributor) commented Aug 29, 2022

Adding a section about TensorFlow support to the Machine Learning overview page.

I also cleaned up some of the text in the overview page.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions tests status (on master branch): Build python source distribution and wheels, Python tests, Java tests, Go tests.

See CI.md for more information about GitHub Actions CI.

@rszper (Contributor, Author) commented Aug 29, 2022

R: @rezarokni
R: @tvalentyn

@github-actions (bot) commented Aug 29, 2022

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

- RunInference API is available to Beam Java SDK 2.41.0 and later through Apache Beam's [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). Please see [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java) for the Java wrapper transform to use and please see [here](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/RunInferenceTransformTest.java) for some example pipelines.
+ The RunInference API is available with the Beam Java SDK versions 2.41.0 and later through Apache Beam's [Multi-language Pipelines framework](https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines). For information about the Java wrapper transform, see [RunInference.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/main/java/org/apache/beam/sdk/extensions/python/transforms/RunInference.java). For example pipelines, see [RunInferenceTransformTest.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/python/src/test/java/org/apache/beam/sdk/extensions/python/transforms/RunInferenceTransformTest.java).

## TensorFlow support
@rezarokni (Contributor) commented Aug 30, 2022

@ryanthompson591 could you please provide a canonical example here that includes the imports needed?

@ryanthompson591 (Contributor) commented Aug 30, 2022

I'm not sure what exactly you want as a canonical example. Do you mean sample code that would go into this doc, or something else?

@rezarokni (Contributor) commented Aug 30, 2022

Yes, something very simple that shows which imports are needed. It doesn't need to show how to build a model, just enough to help folks avoid digging around to find the dependencies, since this handler is a little different from the norm.

@rszper (Contributor, Author) commented Aug 30, 2022

I added a section with the TFX imports (below). Please review to make sure it's accurate.

@rezarokni (Contributor) commented Aug 31, 2022

Note we can add any extra information in a separate PR.

@rszper (Contributor, Author) commented Aug 31, 2022

Excellent suggestion. I made these changes.

@tvalentyn (Contributor) commented Sep 2, 2022

> @ryanthompson591 can you test that this works?

Did we do this?

@tvalentyn (Contributor) commented Sep 2, 2022

> `file_pattern=predict_values_five_times_table)`

`predict_values_five_times_table` and `save_model_dir_multiply` are not defined in this snippet, so it's somewhat confusing.

@rszper (Contributor, Author) commented Sep 2, 2022

I agree that it's confusing; I think we're planning to add the notebook as soon as we have it ready.

@ryanthompson591 (Contributor) commented Sep 2, 2022

I think it's OK to add this code, and it looks like it works. However, it won't work as is, since it relies on `save_model_dir_multiply` existing and being set. Also, `predict_values_five_times_table` doesn't exist.

Other snippets just work as is, but this one takes some setup. I'm fine with this code, but IMO referencing a notebook or something that works makes more sense.


For more information, see [run_inference.py](https://github.com/tensorflow/tfx-bsl/blob/d1fca25e5eeaac9ef0111ec13e7634df836f36f6/tfx_bsl/public/beam/run_inference.py) in the TensorFlow GitHub repository.

```
tf_handler = CreateModelHandler(inference_spec_type)
```
@ryanthompson591 (Contributor) commented Aug 30, 2022

In this example it may make sense to add some information about importing the correct libs from tfx-bsl.

Probably the best way to get this done accurately would be to make a simple notebook and then import the code from that here.

I can try to make some time this week to do that. It's probably possible to just modify the code in the notebook you created that showed the tfx-bsl interface and replace it with the Beam interface.
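For orientation, a minimal sketch of the imports under discussion. The paths follow `run_inference.py` linked above; treat the `model_spec_pb2` path as an assumption about the `tfx-bsl` public API:

```
import apache_beam as beam

# RunInference comes from the Beam SDK itself ...
from apache_beam.ml.inference.base import RunInference

# ... while the TensorFlow model handler and its proto config come from the
# tfx-bsl library, not from Beam.
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2
```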

@rszper (Contributor, Author) commented Aug 30, 2022

I added an example below based on the email thread. Should I link to this notebook in this section so that customers can import the code from it?

@rezarokni (Contributor) commented Aug 31, 2022

We will need to link to the official notebook on GitHub, but we can do that in a separate PR, since it will take a while to get done.

```
A Beam RunInference ModelHandler for TensorFlow
```

Next, in your pipeline, import the required modules:
@rszper (Contributor, Author) commented Aug 30, 2022

Are these modules or libraries?

@ryanthompson591 (Contributor) commented Aug 31, 2022

I think it's fine to call them modules.

In general, a Python module is encapsulated in a single file, whereas a library is the entire suite.

So in our case, `tfx-bsl` would be a library, and `tfx_bsl.public.beam.CreateModelHandler` would be a module.
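To make the distinction concrete, a minimal sketch (assuming `tfx-bsl` is installed):

```
import tfx_bsl                                   # library: the whole installed package
from tfx_bsl.public.beam import run_inference    # module: the single file run_inference.py
from tfx_bsl.public.beam.run_inference import CreateModelHandler  # a name defined in that module
```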

@tvalentyn (Contributor) commented Aug 31, 2022

Technically these are modules and classes, but importing is something that's done all the time, before doing anything else. See the suggestion above for a possible narrative.

@rszper (Contributor, Author) commented Aug 31, 2022

I'm using the example above, but I also think it could be helpful to tell users exactly which modules are required.

```
beam.run_inference(tf_handler)

# keyed
beam.run_inference(beam.ml.inference.KeyedHandler(tf_handler))
```
@tvalentyn (Contributor) commented Aug 31, 2022

@ryanthompson591: this seems broken. I think it will be removed from this PR, but we should fix the source:

  • It should be KeyedModelHandler.
  • I think there is mixing of tfx_bsl.beam and apache_beam (imported as beam). I can imagine this will cause confusion.
  • nit: in `inferece_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)` there is a typo in "inference".

@ryanthompson591 (Contributor) commented Aug 31, 2022

@tvalentyn is right, it should be KeyedModelHandler.

This might be more complete (and I ran this through a Colab notebook).

    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.base import KeyedModelHandler

    tf_handler = CreateModelHandler(inference_spec_type)
    # unkeyed
    RunInference(tf_handler)

    # keyed
    RunInference(KeyedModelHandler(tf_handler))

@rszper (Contributor, Author) commented Aug 31, 2022

I updated the examples, switched to KeyedModelHandler, and fixed the typos (hoping I didn't add new ones).


To use TensorFlow with the RunInference API, you need to create a model handler from within `tfx_bsl`, import the required modules, and add the necessary code to your pipeline.

First, create a model handler from within `tfx_bsl`. The model handler can be keyed or unkeyed.
@ryanthompson591 (Contributor) commented Aug 31, 2022

The model handler that is created from within tfx-bsl is always unkeyed. In order to make a keyed model handler, you would need to wrap it in the keyed model handler (which would take the tfx-bsl model handler as a parameter).

e.g. `beam.run_inference(beam.ml.inference.KeyedHandler(tf_handler))`

If you were unsure whether your data is keyed, you could also use the maybe_keyed handler.

@rszper (Contributor, Author) commented Aug 31, 2022

I added this information to the page.
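Pulling together the corrections that appear further down this thread, a minimal sketch of the unkeyed and keyed forms (assuming `tf_handler` was created with `CreateModelHandler` as in the snippet above):

```
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.base import KeyedModelHandler

# Unkeyed: elements are plain examples.
RunInference(tf_handler)

# Keyed: elements are (key, example) tuples; KeyedModelHandler wraps the
# unkeyed handler so each key is passed through alongside its prediction.
RunInference(KeyedModelHandler(tf_handler))
```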


The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:

```
beam.run_inference(beam.ml.inference.KeyedModelHandler(tf_handler))
```
@ryanthompson591 (Contributor) commented Sep 2, 2022

It more accurately should be `beam.ml.inference.base.KeyedModelHandler`.

Or better:

    from apache_beam.ml.inference.base import RunInference
    from apache_beam.ml.inference.base import KeyedModelHandler
    RunInference(KeyedModelHandler(tf_handler))

Also, there is no `beam.run_inference`. This typo might ultimately be my fault.

@rszper (Contributor, Author) commented Sep 2, 2022

I updated this example.


First, within `tfx_bsl`, create a model handler. For more information, see [run_inference.py](https://github.com/tensorflow/tfx-bsl/blob/d1fca25e5eeaac9ef0111ec13e7634df836f36f6/tfx_bsl/public/beam/run_inference.py) in the TensorFlow GitHub repository.
@tvalentyn (Contributor) commented Sep 2, 2022

@rszper (Contributor, Author) commented Sep 2, 2022

I'll remove the lines. We can add content back when we have a better link.

```
RunInference(KeyedModelHandler(tf_handler))
```

The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
@tvalentyn (Contributor) commented Sep 2, 2022

Suggested change:

- The model handler that is created from within `tfx-bsl` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:
+ The model handler that is created with `CreateModelHandler()` is always unkeyed. To make a keyed model handler, wrap the unkeyed model handler in the keyed model handler, which would then take the `tfx-bsl` model handler as a parameter. For example:

@tvalentyn (Contributor) commented Sep 2, 2022

Optional:

> is always unkeyed

If there is a description in the Beam docs re: keyed versus unkeyed, we could link to it here.

@rszper (Contributor, Author) commented Sep 2, 2022

Adding a link


If you are unsure if your data is keyed, you can also use the `maybe_keyed` handler.

Next, import the required modules:
@tvalentyn (Contributor) commented Sep 2, 2022

Module imports come first, so this is out of place. Again, this was mentioned in the example, and it's somewhat self-evident.

@rszper (Contributor, Author) commented Sep 2, 2022

Removed


```
from tfx_bsl.public.beam.run_inference import CreateModelHandler
```

Finally, add the code to your pipeline. This example shows a pipeline that uses a model that multiplies by five.
@tvalentyn (Contributor) commented Sep 2, 2022

> This example shows a pipeline that uses a model that multiplies by five.

This again repeats the code above. Also, same concern: is this model, `save_model_dir_multiply`, part of our examples? If so, we could perhaps define the variable more precisely... or, if there is or will be a notebook, we can mention that this is a sketch and say something like: see ... for a complete example.

@rszper (Contributor, Author) commented Sep 2, 2022

I'm removing all examples below the first one. I think we're waiting for the notebook and plan to add a link to it when it's available.
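For reference, a self-contained sketch of the kind of example discussed in this thread. The model directory, feature name, and input values are hypothetical placeholders (standing in for the undefined `save_model_dir_multiply` and `predict_values_five_times_table`), and exactly which input type the handler expects (serialized records vs. `tf.train.Example` protos) can depend on the `tfx-bsl` version:

```
import apache_beam as beam
import tensorflow as tf
from apache_beam.ml.inference.base import RunInference
from tfx_bsl.public.beam.run_inference import CreateModelHandler
from tfx_bsl.public.proto import model_spec_pb2

# Hypothetical stand-in for save_model_dir_multiply: a directory holding a
# SavedModel that multiplies its input by five.
save_model_dir_multiply = '/tmp/multiply_five_model'

saved_model_spec = model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)
inference_spec_type = model_spec_pb2.InferenceSpecType(saved_model_spec=saved_model_spec)
tf_handler = CreateModelHandler(inference_spec_type)

def to_serialized_example(value):
    # Serialized tf.train.Example records stand in for reading TFRecords with
    # file_pattern=predict_values_five_times_table; 'x' is an assumed feature
    # name that must match the model's serving signature.
    example = tf.train.Example(features=tf.train.Features(feature={
        'x': tf.train.Feature(float_list=tf.train.FloatList(value=[value]))}))
    return example.SerializeToString()

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([1.0, 2.0, 3.0])
         | beam.Map(to_serialized_example)
         | RunInference(tf_handler)  # emits tfx-bsl prediction results
         | beam.Map(print))
```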

@tvalentyn (Contributor) commented Sep 2, 2022

OK, I think this is good enough as a first step, and we can iterate. I can merge if there are no additional concerns or feedback from @ryanthompson591.

@ryanthompson591 (Contributor) left a comment

Couple small things.


```
RunInference(KeyedModelHandler(tf_handler))
```

If you are unsure if your data is keyed, you can also use the `maybe_keyed` handler.
@ryanthompson591 (Contributor) commented Sep 2, 2022

Change `maybe_keyed` to `MaybeKeyedModelHandler`:

`class MaybeKeyedModelHandler(Generic[KeyT, ExampleT, PredictionT, ModelT],`
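A minimal sketch of that corrected form (assuming the `tf_handler` from the snippets above):

```
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.base import MaybeKeyedModelHandler

# Accepts either bare examples or (key, example) tuples.
RunInference(MaybeKeyedModelHandler(tf_handler))
```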

@tvalentyn (Contributor) commented Sep 2, 2022

Thanks, @ryanthompson591, I can commit these changes to this PR.

@tvalentyn (Contributor) commented Sep 2, 2022

> I can commit these changes to this PR.

Done.

@tvalentyn merged commit 31561e2 into apache:master on Sep 2, 2022. 6 checks passed.
@tvalentyn (Contributor) commented Sep 2, 2022

Thanks everyone!
