I have an Apache web server on which I have set up a website using Flask and mod_wsgi. I am having a couple of issues which may or may not be related.
With every call to a certain page (which runs a function performing heavy computation that takes over 2 seconds), memory use increases by about 20 MB. The server starts out with about 350 MB consumed by everything on the machine, out of a total of 3,620 MB shown in htop. After I reload this page many times, the total memory used eventually tops out around 2,400 MB and mostly stops increasing. Once it reaches that level, I haven't been able to push it into swap even after hundreds of page reloads. Is this by design in Flask, Apache, or Python? If some kind of caching mechanism were at work, I wouldn't expect memory to keep accumulating when the same URL is requested every time. If I restart Apache, the memory is released.
Sometimes calls to this page result in the called functions erroring out, even though they are all read-only calls (nothing is written to disk) and the query string is identical on every request.
I have another page (calling another function that does much less computation) which, when requested concurrently with other pages on the web server, randomly errors out or returns an unexpected result (an image).
Could issues 2 and 3 be related to issue 1? Could issues 2 and 3 be caused by bad programming on my part, or by bad memory in the machine? I can reproduce the randomness by loading the same URL in about 40 Firefox tabs and then choosing the "reload all tabs" option, or programmatically as shown below.
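A throwaway script along these lines (the URL is a placeholder for the real page) should exercise the same concurrent access as the 40 browser tabs:

import threading
import urllib2

URL = 'http://localhost/mypage?param=1'  # placeholder for the real page

def Fetch():
    # each thread requests the same URL, like one reloading browser tab
    try:
        print len(urllib2.urlopen(URL).read())
    except Exception as e:
        print 'error:', e

ThreadList = [threading.Thread(target=Fetch) for i in range(40)]
for t in ThreadList:
    t.start()
for t in ThreadList:
    t.join()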
What more information should be provided to get a better answer?
I have tried placing
import gc
gc.collect()
into my code.
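For context, here is a minimal sketch of how such a call can be wired in (collect_garbage is just an illustrative name, and app is assumed to be the Flask application object):

import gc
from flask import Flask

app = Flask(__name__)

@app.teardown_request
def collect_garbage(exception):
    # force a garbage collection pass after every request; this only
    # helps if the leaked objects are actually unreachable by then
    gc.collect()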
I do have
WSGIDaemonProcess website user=www-data group=www-data processes=2 threads=2 home=/var/www/website
WSGIScriptAlias / /var/www/website/website.wsgi
<Directory /var/www/website>
WSGIProcessGroup website
WSGIScriptReloading On
WSGIApplicationGroup %{GLOBAL}
Order deny,allow
Allow from all
</Directory>
in my /etc/apache2/sites-available/default file. It doesn't seem like the memory should grow that much if only a total of 4 threads are being created, should it?
UPDATE
If I set processes=1 threads=4, the seemingly random issues occur every time two requests arrive at once. If I set processes=4 threads=1, the seemingly random issues don't happen. The rise in memory still occurs, though, and it will now climb all the way to the system's maximum RAM and start swapping.
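For what it's worth, mod_wsgi's maximum-requests option should recycle each daemon process after a fixed number of requests, which would at least bound the growth even if it doesn't explain it (the value 1000 here is arbitrary):

WSGIDaemonProcess website user=www-data group=www-data processes=4 threads=1 maximum-requests=1000 home=/var/www/website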
UPDATE
Although I haven't gotten this runaway RAM consumption issue resolved, I didn't have problems for several months with my previous application. Apparently it wasn't very popular, and after several days or so Apache may have been clearing out the RAM automatically or something.
Now I've made another application, which is fairly unrelated to the previous one. The previous application generated roughly 1-megapixel images using matplotlib. The new application generates both 20-megapixel and 1-megapixel images using matplotlib, and the problem is monumentally larger when the 20-megapixel images are generated. After the entire swap space fills up, something seems to get killed, and things run at a decent speed for a while as long as some RAM and swap space remain available, but everything is much slower once the RAM is consumed. Here are the processes running; I don't think there are any extra zombie processes.
$ ps -ef|grep apache
root 3753 1 0 03:45 ? 00:00:02 /usr/sbin/apache2 -k start
www-data 3756 3753 0 03:45 ? 00:00:00 /usr/sbin/apache2 -k start
www-data 3759 3753 0 03:45 ? 00:02:06 /usr/sbin/apache2 -k start
www-data 3762 3753 0 03:45 ? 00:00:01 /usr/sbin/apache2 -k start
www-data 3763 3753 0 03:45 ? 00:00:01 /usr/sbin/apache2 -k start
test 4644 4591 0 12:27 pts/1 00:00:00 tail -f /var/log/apache2/access.log
www-data 4894 3753 0 21:34 ? 00:00:37 /usr/sbin/apache2 -k start
www-data 4917 3753 2 22:33 ? 00:00:36 /usr/sbin/apache2 -k start
www-data 4980 3753 1 22:46 ? 00:00:12 /usr/sbin/apache2 -k start
I am a little confused, though, when I look at htop, because it shows a lot more processes than top or ps do.
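If I understand correctly, htop lists each thread on its own line by default, so the extra entries are probably the daemon threads rather than additional processes; something like this should confirm it by listing threads explicitly:

$ ps -eLf | grep apache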
UPDATE
I have figured out that the memory leak is due to matplotlib (or the way I am using it), not Flask or Apache, so problems 2 and 3 from my original post are indeed a separate issue from problem 1. Below is a basic function I wrote to eliminate/reproduce the problem interactively in IPython.
def BigComputation():
    import cStringIO
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt
    # A larger figure size causes more RAM to be used when savefig is run.
    # This function also uses some RAM that is never released automatically
    # if plt.close('all') is never run, but it is a small amount,
    # so it is hard to tell unless BigComputation is run thousands of times.
    TheFigure = plt.figure(figsize=(250, 8))
    file_output = cStringIO.StringIO()
    # Causes lots of RAM to be used, and it is never released automatically.
    TheFigure.savefig(file_output)
    # Releases all the RAM that is never released automatically.
    plt.close('all')
    return None
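To watch the leak, I call the function in a loop and print the process's peak resident set size via the standard resource module (ru_maxrss is reported in kilobytes on Linux); with plt.close('all') commented out, the number climbs steadily:

import resource

def PeakRSS():
    # peak resident set size of this process (kilobytes on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

for i in range(100):
    BigComputation()
    print i, PeakRSS()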
The trick to getting rid of the RAM leak is to run
plt.close('all')
within BigComputation(); otherwise, BigComputation() just keeps accumulating RAM every time it is called. I don't know whether I am using matplotlib inappropriately or just have bad coding technique, but I would really expect that once BigComputation() returns, all its memory would be released except for global objects and whatever it returns. It seems like pyplot must be holding references to the figures in some global state, because I have no idea what those objects are named.
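One workaround that should sidestep the question entirely is matplotlib's object-oriented API, which creates the figure without registering it in pyplot's global figure registry, so there is nothing left to close (BigComputationOO is just my name for this hypothetical variant; I haven't benchmarked it):

def BigComputationOO():
    import cStringIO
    from matplotlib.figure import Figure
    from matplotlib.backends.backend_agg import FigureCanvasAgg
    # The figure is created directly, not through plt.figure(), so
    # pyplot's global figure registry never holds a reference to it.
    TheFigure = Figure(figsize=(250, 8))
    FigureCanvasAgg(TheFigure)  # attaches an Agg canvas to the figure
    file_output = cStringIO.StringIO()
    TheFigure.savefig(file_output)
    return None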
I guess where my question stands now is: why do I need plt.close('all')? I also need to try Graham Dumpleton's suggestions to further diagnose my Apache configuration and see why I need to set threads=1 to make the random errors go away.