Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I'm building a website which provides some information to the visitors. This information is aggregated in the background by polling a couple external APIs every 5 seconds. The way I have it working now is that I use APScheduler jobs. I initially preferred APScheduler because it makes the whole system more easy to port (since I don't need to set cron jobs on the new machine). I start the polling functions as follows:

from apscheduler.scheduler import Scheduler

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()

    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)

This kinda works, but there's some trouble with it:

  1. For starters, this means that the interval-jobs are running outside of the Flask context. So far this hasn't been much of a problem, but when calling an endpoint fails I want the system to send me an email (saying "hey calling API X failed"). Because it doesn't run within the Flask context however, it complaints that flask-mail cannot be executed (RuntimeError('working outside of application context')).
  2. Secondly, I wonder how this is going to behave when I don't use the Flask built-in debug server anymore, but a production server with lets say 4 workers. Will it start every job four times then?

All in all I feel that there should be a better way of running these recurring tasks, but I'm unsure how. Does anybody out there have an interesting solution to this problem? All tips are welcome!

[EDIT] I've just been reading about Celery with its schedules. Although I don't really see how Celery is different from APScheduler and whether it could thus solve my two points, I wonder if anyone reading this thinks that I should investigate more in Celery?

share|improve this question

1 Answer 1

up vote 2 down vote accepted

(1)

You can use the app.app_context() context manager to set the application context. I imagine usage would go something like this:

from apscheduler.scheduler import Scheduler

def checkSecondApi():
    with app.app_context():
        # Do whatever you were doing to check the second API

@app.before_first_request
def initialize():
    apsched = Scheduler()
    apsched.start()

    apsched.add_interval_job(checkFirstAPI, seconds=5)
    apsched.add_interval_job(checkSecondAPI, seconds=5)
    apsched.add_interval_job(checkThirdAPI, seconds=5)

Alternatively, you could use a decorator

def with_application_context(app):
    def inner(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with app.app_context():
                return func(*args, **kwargs)
        return wrapper
    return inner

@with_application_context(app)
def checkFirstAPI():
    # Check the first API as before

(2)

Yes it will still work. The sole (significant) difference is that your application will not be communicating directly with the world; it will be going through a reverse proxy or something via fastcgi/uwsgi/whatever. The only concern is that if you have multiple instances of the app starting, then multiple schedulers will be created. To manage this, I would suggest you move your backend tasks out of the Flask application, and use a tool designed for running tasks regularly (i.e. Celery). The downside to this is that you won't be able to use things like Flask-Mail, but imo, it's not too good to be so closely tied to the Flask ecosystem; what are you gaining by using Flask-Mail over a standard, non Flask, mail library?

Also, breaking up your application makes it much easier to scale up individual components as the capacity is required, compared to having one monolithic web application.

share|improve this answer
    
Thank you for your elaborate answer. One last thing; since APScheduler works perfectly well, what do you think would be the advantage of using Celery over APScheduler? What would be your pick and why? –  kramer65 Sep 3 at 9:18
    
Well Celery offers you a lot more than simple scheduling. With that said, I don't know much about APScheduler, and reading the docs now, it looks perfectly fine. –  BluePeppers Sep 3 at 9:20

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.