We are considering integrating Apache Spark into our calculation process, where we originally planned to use Apache Oozie with standard MapReduce (MR) or map-only (MO) jobs.
After some research, several questions remain:
- Is it possible to orchestrate an Apache Spark process using Apache Oozie? If yes, how?
- Is Oozie still necessary, or could Spark handle the orchestration by itself? (Unification seems to be one of Spark's main design goals.)
Please consider the following scenarios when answering:
- executing a workflow every 4 hours
- executing a workflow whenever specific data becomes available
- triggering a workflow and configuring it with parameters
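For context, here is the kind of thing we have in mind for the first and third scenarios: a minimal Oozie coordinator sketch that runs a workflow every 4 hours and passes a parameter to it. All paths, dates, and property names below are placeholders, not our actual setup:

```xml
<!-- Runs the workflow at ${app-path} every 4 hours between start and end. -->
<coordinator-app name="calc-every-4h" frequency="${coord:hours(4)}"
                 start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- Placeholder path to the workflow definition on HDFS -->
      <app-path>${nameNode}/user/${user.name}/apps/calc-wf</app-path>
      <configuration>
        <!-- Example of passing a parameter into the workflow -->
        <property>
          <name>inputDir</name>
          <value>${nameNode}/data/input</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The second scenario (triggering on data availability) would presumably use coordinator `<datasets>` and `<input-events>` instead of a pure time-based frequency, but we are unsure whether this is the recommended approach with Spark jobs.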
Thanks in advance for your answers.