I am trying to parse a JSON file with Python. Every time I run my script, the output is [] and I am very confused as to why. Is this even a proper way to parse JSON in Python?

Here is my code:

import sys
import simplejson
import difflib

filename = sys.argv[1]

data = []

f = file('output.json', "r")
lines = f.readlines()
for line in lines:
        try:
            loadLines = simplejson.loads(line)

            data.append( loadLines['executionTime'])

        except ValueError:
            pass


print data  
@MattBall it has nothing to do with the size of the file. –  JBernardo Jun 26 '13 at 4:07
 
any chance you guys can help? –  Barnaby Jun 26 '13 at 4:09
 
@JBernardo indeed, though the title implied that the size was the problem. –  Matt Ball Jun 26 '13 at 12:44

1 Answer


My best guess is that no single line is valid JSON on its own. In that case loads raises ValueError for every line, the except clause silently swallows it, and data.append(...) is never reached, so data stays empty.
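A minimal sketch of that failure mode, assuming the file holds pretty-printed JSON spread over several lines. The sample text and the failures counter are made up for illustration, and the standard-library json module stands in for simplejson:

```python
import json

# Hypothetical sample: one JSON object pretty-printed over three lines.
text = '{\n    "executionTime": "2013-06-26 04:07:00"\n}\n'

data = []
failures = 0
for line in text.splitlines():
    try:
        # Each line is only a fragment of the object, so this raises.
        data.append(json.loads(line)["executionTime"])
    except ValueError:
        # Counting instead of passing makes the silent failures visible.
        failures += 1

print(data, failures)
```

Counting or logging the exceptions instead of using a bare pass is a cheap way to confirm whether this is what is happening with the real file.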

If the entire file is a JSON array like this:

[
    {
        "direction": "left",
        "time": 1
    },
    {
        "direction": "right",
        "time": 2
    }
]

Then you can simply use something like:

import json

with open('output.json', 'r') as f:
    data = json.load(f)

If, however, it is a bunch of JSON items at the top level, not enclosed within a JSON object or array, like this:

{
    "direction": "left",
    "time": 1
}
{
    "direction": "right",
    "time": 2
}

then you'll have to go with a different approach: decoding items one-by-one. Unfortunately, we can't stream the data, so we'll first have to load all the data in at once:

with open('output.json', 'r') as f:
    json_data = f.read()

To parse a single item, we use raw_decode, which is a method of JSONDecoder. That means we need to make a JSONDecoder:

decoder = json.JSONDecoder()

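Then we just go along, stripping any whitespace on the left side of the string, checking to make sure we still have items, and parsing an item. raw_decode returns the parsed item together with the index in the string where that item ended, so we slice the string at that index to get whatever is left:

data = []
while json_data.strip():  # while there's still non-whitespace...
    # strip off whitespace on the left side of the string
    json_data = json_data.lstrip()
    # parse one item; raw_decode also returns the index where it ended
    item, end = decoder.raw_decode(json_data)
    # keep only the unparsed remainder
    json_data = json_data[end:]
    # ...and then append that item to our list
    data.append(item)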
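The same idea can be packaged as a generator. This is only a sketch: the name iter_json_values is hypothetical, and it relies on the optional start-index argument that CPython's raw_decode accepts as a second positional parameter, which avoids re-slicing the string after every item:

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        item, pos = decoder.raw_decode(text, pos)
        yield item

# Usage with two concatenated objects, as in the second example above.
sample = '{\n    "direction": "left",\n    "time": 1\n}\n{\n    "direction": "right",\n    "time": 2\n}\n'
items = list(iter_json_values(sample))
print(items)
```

If the start-index argument is a concern, slicing the remainder off after each call works just as well, at the cost of copying the string each time.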

If you're doing lots of data collection like this, it might be worthwhile to store it in a database. Something simple like SQLite will do just fine. A database will make it easier to do aggregate statistics in an efficient way. (That's what they're designed for!) It would probably also make it faster to access the data if you're doing it frequently rather than parsing JSON a lot.
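As a rough sketch of what that could look like with the standard-library sqlite3 module: the table name moves, its columns, and the sample records are all assumptions modeled on the examples above, not the real bike-station data.

```python
import sqlite3

# Hypothetical records, shaped like the example items above.
items = [
    {"direction": "left", "time": 1},
    {"direction": "right", "time": 2},
]

# ":memory:" keeps the sketch self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moves (direction TEXT, time INTEGER)")

# Named placeholders are filled from the keys of each dict.
conn.executemany(
    "INSERT INTO moves (direction, time) VALUES (:direction, :time)",
    items,
)
conn.commit()

# Aggregates are computed by the database rather than in Python.
count, avg_time = conn.execute(
    "SELECT COUNT(*), AVG(time) FROM moves"
).fetchone()
print(count, avg_time)
conn.close()
```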

 
Could you explain what this means? "If, however, it is a bunch of JSON items at the top level, not enclosed within a JSON object or array, then you'll have to go with a different approach." I made the JSON file by outputting this link bikenyc.com/stations/json every minute with a python script and terminal –  Barnaby Jun 26 '13 at 4:13
 
@user1887261: I've added an example to show the difference. Note that the first one is surrounded by [ and ] and has commas separating the individual items, whereas the latter has neither. –  icktoofay Jun 26 '13 at 4:16
 
Ahh okay, the edit makes much more sense now! thank you. So my JSON is formatted like the bottom bit of code. where do i begin formulating a different approach based on this? –  Barnaby Jun 26 '13 at 4:17
 
@user1887261: That's a little tricky. I think the simplest way would be to read the whole file's data into memory as a string and then use raw_decode repeatedly. Read the documentation for more information, but it will try to parse one item and will return to you the item and the index where that item ended, so you can slice off the data that's left. Simply repeat that process until you've read all the items and there's nothing left. The question that someone linked to as a “possible duplicate” may also yield some possible answers. –  icktoofay Jun 26 '13 at 4:21
 
awesome! thanks! so i'm assuming there would have been a better way to pull the data initially...this may sound stupid, but can i just add square brackets to the file manually? –  Barnaby Jun 26 '13 at 4:23