I have an array containing 5 other arrays, each of which consists of 2-value arrays (first a date, then a count):

[ 
  [ 
    [dt, cnt], [dt, cnt], .... 
  ], 
  [ 
    [dt, cnt], [dt, cnt], .... 
  ], 
  [ 
    [dt, cnt], [dt, cnt], .... 
  ], 
  [ 
    [dt, cnt], [dt, cnt], .... 
  ], 
  [ 
    [dt, cnt], [dt, cnt], .... 
  ], 
]

The data holds click statistics for different websites. It needs to be converted into a form from which Google visualisations can draw a consistent chart. So the conversion is done in Python first, and the result is then converted to JSON to pass to the Google libs.

My current code is this:

# determine min date
mindate = datetime.date.max
for dataSet in sets:
    if (dataSet[0][0] < mindate):
        mindate = dataSet[0][0];
# fill a dictionary with all dates involved
datedict = {}
for dat in daterange(mindate, today):
    datedict[dat] = [dat];
# fill dictionary with rest of data
arrlen = 2
for dataSet in sets:
    # first the values
    for value in dataSet:
        datedict[value[0]] = datedict[value[0]] + [value[1]];
    # don't forget the missing values (use 0)
    for dat in daterange(mindate, today):
        if len(datedict[dat]) < arrlen:
            datedict[dat] = datedict[dat] + [0]
    arrlen = arrlen + 1
# convert to an array
datearr = []
for dat in daterange(mindate, today):
    datearr = datearr + [datedict[dat]]
# convert to json
result = json.dumps(datearr, cls=DateTimeEncoder)

(The DateTimeEncoder just gives a JavaScript-friendly datetime in the JSON)
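
For readers who want to run the snippets on this page, here is a minimal sketch of what such an encoder might look like. Only the class name and the "new Date(...)" output format come from the question; the body is an assumption, written for Python 3, and the asker's real class may differ.

```python
import datetime
import json


class DateTimeEncoder(json.JSONEncoder):
    """Hypothetical stand-in for the encoder mentioned in the question."""

    def default(self, obj):
        # json only calls default() for objects it cannot serialize itself.
        if isinstance(obj, datetime.date):
            # JavaScript's Date constructor counts months from 0.
            return "new Date(%d, %d, %d)" % (obj.year, obj.month - 1, obj.day)
        return super().default(obj)


encoded = json.dumps([datetime.date(2008, 8, 27), 0, 5371], cls=DateTimeEncoder)
print(encoded)  # ["new Date(2008, 7, 27)", 0, 5371]
```

Note that the date ends up as a JSON string, so the client still has to turn it into a real Date object.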

The output looks like this:

[
    ["new Date(2008, 7, 27)", 0, 5371, 1042, 69, 0], 
    ["new Date(2008, 7, 28)", 0, 5665, 1100, 89, 0], 
    ...
]

This is one of my first Python adventures, so I expect this piece of code can be improved upon easily. I want this to be shorter and more elegant. Show me the awesomeness of Python because I'm still a bit disappointed.

I'm using Django, by the way.


List comprehensions are definitely the way to go here. They allow you to say what you mean without getting mired in the details of looping.

To find the first date, make use of the built-in min() function.

defaultdict is useful for handling lookups where there may be missing keys. For example, defaultdict(int) gives you a dictionary that reports 0 whenever you look up a date that it doesn't know about.

I assume that each of the lists within sets represents the data for one of your sites. I recommend using site_data as a more meaningful alternative to dataSet.

from collections import defaultdict
import json

def first_date(series):
    return series[0][0]

start_date = min(first_date(site_data) for site_data in sets)

# counts_by_site_and_date is just a "more convenient" version of sets, where
# the [dt, cnt] pairs have been transformed into a dictionary in which you can
# look up the count by date.
counts_by_site_and_date = [defaultdict(int, site_data) for site_data in sets]

result = json.dumps([
    [date] + [site_counts[date] for site_counts in counts_by_site_and_date]
        for date in daterange(start_date, today)
], cls=DateTimeEncoder)
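
To try the approach above without the real data, the following sketch runs it on made-up numbers. The sample sets, the value of today, and the daterange helper (modelled on the generator the asker posts in the comments further down, which excludes the end date) are all assumptions; the JSON step is left out so that no custom encoder is needed.

```python
import datetime
from collections import defaultdict


def daterange(start_date, end_date):
    # Assumed helper: yields every date from start_date up to, but not
    # including, end_date.
    for n in range((end_date - start_date).days):
        yield start_date + datetime.timedelta(n)


d = datetime.date
# Made-up sample data: two sites, each a list of [date, count] pairs.
sets = [
    [[d(2015, 1, 1), 5], [d(2015, 1, 3), 7]],
    [[d(2015, 1, 2), 2]],
]
today = d(2015, 1, 4)

start_date = min(site_data[0][0] for site_data in sets)
counts_by_site_and_date = [defaultdict(int, site_data) for site_data in sets]
rows = [
    [date] + [site_counts[date] for site_counts in counts_by_site_and_date]
    for date in daterange(start_date, today)
]
# rows holds one [date, site 1 count, site 2 count] list per day,
# with 0 wherever a site has no entry for that day.
```

One thing to be aware of: looking up a missing key in a defaultdict also stores that key with the default value. That is harmless here, but the dictionaries do grow as the rows are built.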
Thanks! I really like the defaultdict. Didn't know that one yet. – Jacco Jan 21 '15 at 22:11

Trailing semicolons

In many places there is a ; at the end of the line. These are completely unnecessary in Python and you should remove them.

Appending to arrays

In several places you are appending values to lists in a strange way, for example:

    datedict[value[0]] = datedict[value[0]] + [value[1]];

The common and shorter way to do this is using .append():

    datedict[value[0]].append(value[1])
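
The difference is more than cosmetic. A small sketch with made-up names: concatenating with + builds a new list every time, while .append() changes the existing list in place.

```python
counts = [1, 2]
alias = counts          # second name for the same list object

counts = counts + [3]   # builds a new list; alias still refers to the old one
assert alias == [1, 2]
assert counts == [1, 2, 3]

counts.append(4)        # mutates the list in place, no copy is made
assert counts == [1, 2, 3, 4]
```

Inside a loop, the copying version does more work on each pass as the list grows, which is one more reason to prefer .append().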

Augmented assignment operator +=

Instead of:

l = l + 1

This is shorter and better:

l += 1

Use list comprehensions

Instead of:

datearr = []
for dat in daterange(mindate, today):
    datearr = datearr + [datedict[dat]]

You can use a list comprehension, a powerful feature of Python:

datearr = [datedict[dat] for dat in daterange(mindate, today)]

In Python 3.3 or earlier you can get the mindate using min() with a generator expression:

mindate = datetime.date.max
if sets:
    mindate = min(d[0][0] for d in sets)

For Python 3.4+ we can pass a default value to min() in case the iterable passed to it is empty:

mindate = min((d[0][0] for d in sets), default=datetime.date.max)
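
A quick check of both behaviours, as a sketch on made-up data. The last line goes one step beyond the snippets above and is only a suggestion: it also skips sites whose list is empty, because d[0][0] would raise IndexError for those.

```python
import datetime

sets = []  # no sites at all

# Without default=, min() raises ValueError on an empty iterable.
try:
    min(d[0][0] for d in sets)
    raised = False
except ValueError:
    raised = True

# With default= (Python 3.4+), the fallback value is returned instead.
mindate = min((d[0][0] for d in sets), default=datetime.date.max)

# Suggested extra guard: ignore sites that have no data points.
sets_with_gap = [[], [[datetime.date(2015, 1, 2), 4]]]
mindate_gap = min((d[0][0] for d in sets_with_gap if d),
                  default=datetime.date.max)
```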

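Since the result of daterange is used several times, it is better to compute it once and keep it in a variable. If daterange is a generator, as it turns out to be in the comments below, wrap it in list() so that it can be iterated more than once:

dates_list = list(daterange(mindate, today))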

When initializing datedict, it is better to start every entry as [0, 0, 0, 0, 0], or [0]*len(sets), because that makes the inner loop in your next piece of code (the one that fills in missing values) unnecessary. You can build the dict with a dict comprehension:

datedict = {dat: [0]*len(sets) for dat in dates_list}

For Python 2.6 or earlier use dict() with a generator expression:

datedict = dict((dat, [0]*len(sets)) for dat in dates_list)
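
It matters that the list of zeros is created inside the comprehension. A short sketch with placeholder keys shows why the seemingly equivalent dict.fromkeys() would not work here: it stores the same list object under every key.

```python
keys = ["a", "b"]

# Comprehension: [0] * 2 is evaluated once per key, giving separate lists.
separate = {k: [0] * 2 for k in keys}
separate["a"][0] += 1

# dict.fromkeys: the single list passed in is shared by every key.
shared = dict.fromkeys(keys, [0] * 2)
shared["a"][0] += 1

print(separate)  # {'a': [1, 0], 'b': [0, 0]}
print(shared)    # {'a': [1, 0], 'b': [1, 0]}
```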

Now update the entries whose values are known. The trick is to use enumerate to get the index of the current row in sets, so that the count can be written at that index:

for i, d in enumerate(sets):
    # use tuple unpacking
    for dat, count in d:
        datedict[dat][i] += count

Lastly, we can create the desired datearr using a list comprehension:

datearr = [[dat] + datedict[dat] for dat in dates_list]
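
Put together, the steps of this answer can be sketched as follows. The sample data, the value of today, and the daterange helper (assumed to be a generator that excludes the end date) are made up for illustration.

```python
import datetime


def daterange(start_date, end_date):
    # Assumed helper: a generator over [start_date, end_date).
    for n in range((end_date - start_date).days):
        yield start_date + datetime.timedelta(n)


d = datetime.date
# Made-up sample data: two sites, each a list of [date, count] pairs.
sets = [
    [[d(2015, 1, 1), 5], [d(2015, 1, 3), 7]],
    [[d(2015, 1, 2), 2]],
]
today = d(2015, 1, 4)

mindate = min(site[0][0] for site in sets)
# list() so the dates can be iterated more than once.
dates_list = list(daterange(mindate, today))

# One zero per site for every date, then overwrite the known counts.
datedict = {dat: [0] * len(sets) for dat in dates_list}
for i, site in enumerate(sets):
    for dat, count in site:
        datedict[dat][i] += count

datearr = [[dat] + datedict[dat] for dat in dates_list]
```

With a daterange like this one, a data point dated today or later has no key in datedict and would raise KeyError, so it may be worth checking whether the end date should be included.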
    
Great, that step works now, but I think the last step renders the array empty. (I checked that the datedict is correct before that step) – Jacco Jan 21 '15 at 22:01
    
@Jacco I guess daterange is a generator then? – Ashwini Chaudhary Jan 21 '15 at 22:02
    
def daterange(start_date, end_date): for n in range(int ((end_date - start_date).days)): yield start_date + datetime.timedelta(n) – Jacco Jan 21 '15 at 22:05
    
So, yes :-) I fixed this by doing: dates_list = list(daterange(mindate, today)) – Jacco Jan 21 '15 at 22:07
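
The effect discussed in these comments can be reproduced in a few lines. This sketch uses the daterange from the comment above with made-up dates: a generator can only be consumed once, so a second loop over the same object sees nothing.

```python
import datetime


def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + datetime.timedelta(n)


start, end = datetime.date(2015, 1, 1), datetime.date(2015, 1, 4)

gen = daterange(start, end)
first_pass = [dat for dat in gen]   # consumes the generator
second_pass = [dat for dat in gen]  # nothing left to yield

# Materializing the dates once gives a list that can be reused freely.
dates_list = list(daterange(start, end))
reused = [dat for dat in dates_list] + [dat for dat in dates_list]
```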
