I'm building a website which provides some information to the visitors. This information is aggregated in the background by polling a couple external APIs every 5 seconds. The way I have it working now is that I use APScheduler jobs. I initially preferred APScheduler because it makes the whole system more easy to port (since I don't need to set cron jobs on the new machine). I start the polling functions as follows:
from apscheduler.scheduler import Scheduler
@app.before_first_request
def initialize():
apsched = Scheduler()
apsched.start()
apsched.add_interval_job(checkFirstAPI, seconds=5)
apsched.add_interval_job(checkSecondAPI, seconds=5)
apsched.add_interval_job(checkThirdAPI, seconds=5)
This kinda works, but there's some trouble with it:
- For starters, this means that the interval-jobs are running outside of the Flask context. So far this hasn't been much of a problem, but when calling an endpoint fails I want the system to send me an email (saying "hey calling API X failed"). Because it doesn't run within the Flask context however, it complaints that flask-mail cannot be executed (
RuntimeError('working outside of application context')
). - Secondly, I wonder how this is going to behave when I don't use the Flask built-in debug server anymore, but a production server with lets say 4 workers. Will it start every job four times then?
All in all I feel that there should be a better way of running these recurring tasks, but I'm unsure how. Does anybody out there have an interesting solution to this problem? All tips are welcome!
[EDIT] I've just been reading about Celery with its schedules. Although I don't really see how Celery is different from APScheduler and whether it could thus solve my two points, I wonder if anyone reading this thinks that I should investigate more in Celery?