Tell me more ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

UPDATE: I didn't write the library pasted below I just make use of it. I'm unable to paste the part I wrote here because the message exceeds 30k characters when I try to do so.

I have been working on my webapp and this a simple came into life as I needed to scratch my own itch. It basically enables you to store models and query results into datastore, memcache or local instance storage using a basic API.

I haven't published any open source work before, but as this thing helped me keep the lines of code and CPU usage low, I thought it would help others too.

How can I make it any better? I'm planning to write unit tests, but I'm not experienced with it, so any suggestions on which libraries to use etc. is welcome.

"""
Author: Juan Pablo Guereca

Module which implements a per GAE instance data cache, similar to what you can achieve with APC in PHP instances.

Each GAE instance caches the global scope, keeping the state of every variable on the global scope. 
You can go farther and cache other things, creating a caching layer for each GAE instance, and it's really fast because
there is no network transfer like in memcache. Moreover GAE doesn't charge for using it and it can save you many memcache
and db requests. 

Not everything are upsides. You can not use it on every case because: 

- There's no way to know if you have set or deleted a key in all the GAE instances that your app is using. Everything you do
  with Cachepy happens in the instance of the current request and you have N instances, be aware of that.
- The only way to be sure you have flushed all the GAE instances caches is doing a code upload, no code change required. 
- The memory available depends on each GAE instance and your app. I've been able to set a 60 millions characters string which
  is like 57 MB at least. You can cache somethings but not everything. 
"""

import time
import logging
import os

CACHE = {}
STATS_HITS = 0
STATS_MISSES = 0
STATS_KEYS_COUNT = 0

""" Flag to deactivate it on local environment. """
ACTIVE = True#False if os.environ.get('SERVER_SOFTWARE').startswith('Devel') else True

""" 
None means forever.
Value in seconds.
"""
DEFAULT_CACHING_TIME = None

URL_KEY = 'URL_%s'

"""
Curious thing: A dictionary in the global scope can be referenced and changed inside a function without using the global statement,
but it can not be redefined.
"""

def get( key ):
    """ Gets the data associated to the key or a None """
    if ACTIVE is False:
        return None

    global CACHE, STATS_MISSES, STATS_HITS

    """ Return a key stored in the python instance cache or a None if it has expired or it doesn't exist """
    if key not in CACHE:
        STATS_MISSES += 1
        return None

    value, expiry = CACHE[key]
    current_timestamp = time.time()
    if expiry == None or current_timestamp < expiry:
        STATS_HITS += 1
        return value
    else:
        STATS_MISSES += 1
        delete( key )
        return None

def set( key, value, expiry = DEFAULT_CACHING_TIME ):
    """
    Sets a key in the current instance
    key, value, expiry seconds till it expires 
    """
    if ACTIVE is False:
        return None

    global CACHE, STATS_KEYS_COUNT
    if key not in CACHE:
        STATS_KEYS_COUNT += 1
    if expiry != None:
        expiry = time.time() + int( expiry )

    try:
        CACHE[key] = ( value, expiry )
    except MemoryError:
        """ It doesn't seems to catch the exception, something in the GAE's python runtime probably """
        logging.info( "%s memory error setting key '%s'" % ( __name__, key ) )

def delete( key ):
    """ 
    Deletes the key stored in the cache of the current instance, not all the instances.
    There's no reason to use it except for debugging when developing, use expiry when setting a value instead.
    """
    global CACHE, STATS_KEYS_COUNT
    if key in CACHE:
        STATS_KEYS_COUNT -= 1
        del CACHE[key]

def dump():
    """
    Returns the cache dictionary with all the data of the current instance, not all the instances.
    There's no reason to use it except for debugging when developing.
    """
    global CACHE
    return CACHE

def flush():
    """
    Resets the cache of the current instance, not all the instances.
    There's no reason to use it except for debugging when developing.
    """
    global CACHE, STATS_KEYS_COUNT
    CACHE = {}
    STATS_KEYS_COUNT = 0

def stats():
    """ Return the hits and misses stats, the number of keys and the cache memory address of the current instance, not all the instances."""
    global CACHE, STATS_MISSES, STATS_HITS, STATS_KEYS_COUNT
    memory_address = "0x" + str("%X" % id( CACHE )).zfill(16)
    return {'cache_memory_address': memory_address,
            'hits': STATS_HITS,
            'misses': STATS_MISSES ,
            'keys_count': STATS_KEYS_COUNT,
            }

def cacheit( keyformat, expiry=DEFAULT_CACHING_TIME ):
    """ Decorator to memoize functions in the current instance cache, not all the instances. """
    def decorator( fxn ):
        def wrapper( *args, **kwargs ):
            key = keyformat % args[:keyformat.count('%')]
            data = get( key )
            if data is None:
                data = fxn( *args, **kwargs )
                set( key, data, expiry )
            return data
        return wrapper
    return decorator
share|improve this question
5  
Welcome to code review! If you read the FAQ you'll learn that any code that you want reviewed should really be pasted into your question. Its also geared towards smaller pieces of code (classes/functions) rather then even a small library like yours. You'll probably have better success if you show specific pieces of code you want feedback on. – Winston Ewert Apr 11 '11 at 21:57
On a side note, I like the app --- I got distracted from looking at its technical merits by my interest in the content! – JasonFruit Apr 12 '11 at 13:10
Thanks for your interest, I'm not the author of cachepy but I guess I may have to rewrite it based on your suggestions. Also I refrained from pasting the whole library code here as it is about 1000 loc (400 loc comments), to keep the question less intimidating. – Can Bascil Apr 13 '11 at 12:35

2 Answers

up vote 1 down vote accepted

You are using a collection of global variables. I think the code would be cleaner if you moved all of those variables into a class. That would eliminate all of the global statements, and it would increase the flexibility of how you can use such the code.

if ACTIVE is False:

There are many false objects besides False in python. You shouldn't assume that a given value is exactly False. Just use if not ACTIVE:

   if key not in CACHE:
        STATS_MISSES += 1
        return None

    value, expiry = CACHE[key]

This code looks up key in CACHE twice. Its better to try to access the object and catch an exception or use .get and pass a default value.

if expiry != None:

Using is None is better, IMO. Its a little bit faster and None is a singleton.

try:
    CACHE[key] = ( value, expiry )
except MemoryError:
    """ It doesn't seems to catch the exception, something in the GAE's python runtime probably """
    logging.info( "%s memory error setting key '%s'" % ( __name__, key ) )

I don't know GAE, but if you are getting MemoryErrors it probably means you have more serious problems.

key = keyformat % args[:keyformat.count('%')]

Converting you arguments into a string seems inefficient. Why not use a tuple of the arguments?

I try to avoid global variables, so I'd probably have cacheit create a Cache object for every function and attach it to the function as .cache attribute.

share|improve this answer
Regarding MemoryError, I haven't really come upon that while looking at the application logs and I probably store about 800K model entities in local cache (per day). If a local instance of application engine exceeds 187 MB, then it is shut down and new ones are restarted. A restart due to bloated local instances occurs about 3 or 4 times a day. – Can Bascil Apr 13 '11 at 12:46
"""
Curious thing: A dictionary in the global scope can be referenced and changed inside a function without using the global statement, but it can not be redefined.
"""

It's default behavior for global variable. When you try to redefine global variable inside some function without global keyword then python interpreter creates local variable with the same name.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.