df: types: dataflow: Call update any time any subproperty of flow is modified #1308

Open
pdxjohnny opened this issue Feb 20, 2022 · 0 comments


pdxjohnny commented Feb 20, 2022

Pain Point

Currently, when a user modifies dataflow.flow, the dataflow is not updated unless dataflow.update() is called explicitly.

For example:

from dffml.df.types import DataFlow, InputFlow
from dffml.operation.io import AcceptUserInput, print_output
from dffml.operation.mapping import create_mapping
from dffml.operation.model import ModelPredictConfig, model_predict
from dffml.operation.preprocess import literal_eval

# This Dataflow takes input from stdio using the `AcceptUserInput`
# operation. The string input, which corresponds to the feature `Years`,
# is converted to `int`/`float` by the `literal_eval` operation.
# The `create_mapping` operation creates a mapping using the numeric output
# of `literal_eval`, e.g. {"Years": 34}.
# The mapping is then fed to the `model_predict` operation, which uses the
# `slr` model trained above (`slr_model`, not shown here) to make a
# prediction. The prediction is then printed to stdout using the
# `print_output` operation.
dataflow = DataFlow(
    operations={
        "get_user_input": AcceptUserInput,
        "literal_eval_input": literal_eval,
        "create_feature_map": create_mapping,
        "predict_using_model": model_predict,
        "print_predictions": print_output,
    },
    configs={"predict_using_model": ModelPredictConfig(model=slr_model)},
)
dataflow.flow.update(
    {
        "literal_eval_input": InputFlow(
            inputs={"str_to_eval": [{"get_user_input": "InputData"}]}
        ),
        "create_feature_map": InputFlow(
            inputs={
                "key": ["seed.Years"],
                "value": [{"literal_eval_input": "str_after_eval"}],
            }
        ),
        "predict_using_model": InputFlow(
            inputs={"features": [{"create_feature_map": "mapping"}]}
        ),
        "print_predictions": InputFlow(
            inputs={"data": [{"predict_using_model": "prediction"}]}
        ),
    }
)
# Without this explicit call the changes made to dataflow.flow are not picked up
dataflow.update()

Proposed Solution

Ideally, whenever flow or any subproperty of it (all the way down the nested objects) is modified, we would automatically call dataflow.update() so the user doesn't have to remember to call it.
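A minimal sketch of the shallow version of this, assuming flow behaves like a plain dict keyed by operation instance name. TrackedDict and ToyDataFlow are hypothetical names, not DFFML classes, and update() here only recomputes a sorted list of operation names as a stand-in for the real DataFlow.update(). Every common dict mutator is overridden to call back into the owner, so both reassigning flow and calling flow.update(...) trigger update().

```python
from typing import Any, Callable, Dict, Optional


class TrackedDict(dict):
    """Hypothetical dict that calls ``on_change`` after every in-place mutation.

    Only the common mutators are overridden; ``|=`` (``__ior__``) is omitted
    for brevity.
    """

    def __init__(self, *args: Any, on_change: Callable[[], None], **kwargs: Any):
        super().__init__(*args, **kwargs)
        self._on_change = on_change

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._on_change()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._on_change()

    def update(self, *args, **kwargs):
        # dict.update() does not route through __setitem__, so notify here too
        super().update(*args, **kwargs)
        self._on_change()

    def setdefault(self, key, default=None):
        if key in self:
            return self[key]
        self[key] = default  # __setitem__ sends the notification
        return default

    def pop(self, key, *default):
        present = key in self
        value = super().pop(key, *default)
        if present:  # nothing changed if the key was missing
            self._on_change()
        return value

    def popitem(self):
        item = super().popitem()
        self._on_change()
        return item

    def clear(self):
        super().clear()
        self._on_change()


class ToyDataFlow:
    """Hypothetical stand-in for DataFlow with an auto-updating ``flow``."""

    def __init__(self, flow: Optional[Dict[str, Any]] = None):
        self.update_calls = 0
        self.flow = flow or {}

    @property
    def flow(self) -> TrackedDict:
        return self._flow

    @flow.setter
    def flow(self, value: Dict[str, Any]) -> None:
        # Reassigning flow wraps the new dict and triggers an update
        self._flow = TrackedDict(value, on_change=self.update)
        self.update()

    def update(self) -> None:
        # Stand-in for DataFlow.update(): recompute state derived from flow
        self.update_calls += 1
        self.operation_names = sorted(self._flow)


dataflow = ToyDataFlow()
dataflow.flow.update({"literal_eval_input": {}, "create_feature_map": {}})
print(dataflow.operation_names)  # ['create_feature_map', 'literal_eval_input']
```

This covers top-level changes like the dataflow.flow.update(...) call in the example above, but mutating something nested inside an InputFlow (for example appending to one of its inputs lists) would still go unnoticed, which is the harder part of this issue.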

The following code from the config code paths might be helpful. There we have leveraged getters and setters to do validation on properties. We could possibly take a similar approach with flow, but we might have to ensure that each object within flow has a reference to the dataflow it belongs to. If an object is instantiated on its own, as in the example above, we might be able to detect it when it is added to a parent, walk up to the flow object, and pass a reference to the dataflow down to the new object. The object could then call update() whenever it is modified later, and update() would also be called for the addition itself.

From dffml/dffml/base.py, lines 366 to 451 at commit 8c87efa:

import dataclasses

# Excerpt: config_ensure_immutable_init, within_method, _fromdict,
# config_asdict, config_add_mutable_callback and
# config_no_enforce_immutable are defined or imported elsewhere in
# dffml/base.py.


def config_make_getter(key):
    """
    Create a getter function for use with :py:func:`property` on config objects.
    """

    def getter_mutable(self):
        if key not in self._data:
            raise AttributeError(key)
        return self._data[key]

    return getter_mutable


class ImmutableConfigPropertyError(Exception):
    """
    Raised when config property was changed but was not marked as mutable.
    """


class NoMutableCallbacksError(Exception):
    """
    Raised when a config property is mutated but there are no mutable callbacks
    present to handle its update.
    """


def config_make_setter(key, immutable):
    """
    Create a setter function for use with :py:func:`property` on config objects.
    """

    def setter_immutable(self, value):
        config_ensure_immutable_init(self)
        # Reach into caller's stack frame to check if we are in the
        # __init__ function of the dataclass. If we are in the __init__
        # method we should not enforce immutability. Set max_depth to 4 in
        # case of __post_init__. No point in searching farther.
        if within_method(self, "__init__", max_depth=4):
            # Mutate without checks if we are within the __init__ code of
            # the class. Then bail out, we're done here.
            self._data[key] = value
            return
        # Raise if the property is immutable and we're in enforcing mode
        if self._enforce_immutable:
            if immutable:
                raise ImmutableConfigPropertyError(
                    f"Attempted to mutate immutable property {self.__class__.__qualname__}.{key}"
                )
            # Ensure we have callbacks if we're mutating
            if not self._mutable_callbacks:
                raise NoMutableCallbacksError(
                    "Config instance has no mutable_callbacks registered but a mutable property was updated"
                )
        # Call callbacks to notify we've mutated
        for func in self._mutable_callbacks:
            func(key, value)
        # Mutate property
        self._data[key] = value

    return setter_immutable


def _config(datacls):
    datacls._fromdict = classmethod(_fromdict)
    datacls._replace = lambda self, *args, **kwargs: dataclasses.replace(
        self, *args, **kwargs
    )
    datacls._asdict = config_asdict
    datacls.add_mutable_callback = config_add_mutable_callback
    datacls.no_enforce_immutable = config_no_enforce_immutable
    for field in dataclasses.fields(datacls):
        # Make deleter None so it raises AttributeError: can't delete attribute
        setattr(
            datacls,
            field.name,
            property(
                config_make_getter(field.name),
                # The setter takes an "immutable" flag, so invert the
                # field's "mutable" metadata
                config_make_setter(
                    field.name, not field.metadata.get("mutable", False)
                ),
                None,
            ),
        )
    return datacls
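The property setters above only intercept assignment to an attribute, not in-place mutation of the value stored there. A minimal sketch of the reference push-down idea for flow, assuming flow is built from plain dicts and lists: every container added anywhere under flow is wrapped and handed a reference to the owning dataflow, so a mutation at any depth calls update(). OwnedDict, OwnedList, adopt() and ToyDataFlow are hypothetical names, not DFFML APIs, and real InputFlow objects would additionally need something like a __setattr__ hook.

```python
from typing import Any, Iterable


def adopt(value: Any, owner: "ToyDataFlow") -> Any:
    """Wrap dicts and lists (recursively) so they hold a reference to owner.

    Containers are copied while being wrapped, so later changes to the
    caller's original dict or list are not tracked.
    """
    if isinstance(value, dict):
        return OwnedDict(owner, value)
    if isinstance(value, list):
        return OwnedList(owner, value)
    return value  # scalars and other objects are stored as-is


class OwnedDict(dict):
    def __init__(self, owner: "ToyDataFlow", data: Any = ()):
        super().__init__()
        self._owner = owner
        # Populate without notifying; each child gets a reference to owner
        for key, value in dict(data).items():
            super().__setitem__(key, adopt(value, owner))

    def __setitem__(self, key, value):
        super().__setitem__(key, adopt(value, self._owner))
        self._owner.update()

    def __delitem__(self, key):
        super().__delitem__(key)
        self._owner.update()

    def update(self, *args, **kwargs):
        for key, value in dict(*args, **kwargs).items():
            super().__setitem__(key, adopt(value, self._owner))
        self._owner.update()


class OwnedList(list):
    # Only append, extend and item assignment are tracked here;
    # insert, pop, remove, del, sort, etc. would need the same treatment.
    def __init__(self, owner: "ToyDataFlow", data: Iterable[Any] = ()):
        super().__init__(adopt(value, owner) for value in data)
        self._owner = owner

    def append(self, value):
        super().append(adopt(value, self._owner))
        self._owner.update()

    def extend(self, values):
        super().extend(adopt(value, self._owner) for value in values)
        self._owner.update()

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = [adopt(item, self._owner) for item in value]
        else:
            value = adopt(value, self._owner)
        super().__setitem__(index, value)
        self._owner.update()


class ToyDataFlow:
    """Hypothetical stand-in for DataFlow."""

    def __init__(self):
        self.update_calls = 0
        self._flow = OwnedDict(self)

    @property
    def flow(self) -> OwnedDict:
        return self._flow

    @flow.setter
    def flow(self, value) -> None:
        self._flow = OwnedDict(self, value)
        self.update()

    def update(self) -> None:
        # Stand-in for DataFlow.update()
        self.update_calls += 1


dataflow = ToyDataFlow()
dataflow.flow["create_feature_map"] = {"inputs": {"key": []}}
# A mutation three levels below flow still reaches dataflow.update()
dataflow.flow["create_feature_map"]["inputs"]["key"].append("seed.Years")
print(dataflow.update_calls)  # 2
```

Copying on adoption is only one way to push the reference down. It keeps the sketch simple, but a container the user built separately is detached once it is added: only the copy inside flow is tracked.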
