
I am fetching a .js file from a remote site that contains data I want to process as JSON using the simplejson library on my Google App Engine site. The .js file looks like this:

var txns = [
    { apples: '100', oranges: '20', type: 'SELL'}, 
    { apples: '200', oranges: '10', type: 'BUY'}]

I have no control over the format of this file. What I did at first just to hack through it was to chop the "var txns = " bit off of the string and then do a series of .replace(old, new, [count]) on the string until it looked like standard JSON:

cleanJSON = malformedJSON.replace("'", '"').replace('apples:', '"apples":').replace('oranges:', '"oranges":').replace('type:', '"type":').replace('{', '{"transaction":{').replace('}', '}}')

So that it now looks like:

[{ "transaction" : { "apples": "100", "oranges": "20", "type": "SELL"} }, 
 { "transaction" : { "apples": "200", "oranges": "10", "type": "BUY"} }]

How would you tackle this formatting issue? Is there a known way (library, script) to format a JavaScript array into JSON notation?


5 Answers

up vote 3 down vote accepted

It's not too difficult to write your own little parser for that using PyParsing.

import json
from pyparsing import *

data = """var txns = [
   { apples: '100', oranges: '20', type: 'SELL'}, 
   { apples: '200', oranges: '10', type: 'BUY'}]"""


def js_grammar():
    key = Word(alphas).setResultsName("key")
    value = QuotedString("'").setResultsName("value")
    pair = Group(key + Literal(":").suppress() + value)
    object_ = nestedExpr("{", "}", delimitedList(pair, ","))
    array = nestedExpr("[", "]", delimitedList(object_, ","))
    return array + StringEnd()

JS_GRAMMAR = js_grammar()

def parse(js):
    return JS_GRAMMAR.parseString(js[len("var txns = "):])[0]

def to_dict(object_):
    return dict((p.key, p.value) for p in object_)

result = [
    {"transaction": to_dict(object_)}
    for object_ in parse(data)]
print(json.dumps(result))

This is going to print

[{"transaction": {"type": "SELL", "apples": "100", "oranges": "20"}},
 {"transaction": {"type": "BUY", "apples": "200", "oranges": "10"}}]

You can also add the assignment to the grammar itself. That said, since off-the-shelf parsers already exist for this, you are better off using one of those.

Thanks for the reference to pyparsing...this will come in handy in the future. Not sure which answer to accept yet. – Greg Jul 17 '09 at 17:19
One of the details I left out was that one of the fields in the array is a beast of characters that make yaml choke (:, ', "). I'll need to suppress them and I think this solution will let me do that. – Greg Jul 17 '09 at 17:21
This is pretty nice, although it doesn't handle the true, false, and null keywords, or Unicode escapes (not sure if they will ever pop up). – Kiv Jul 17 '09 at 17:34
It's not too difficult to extend the range of allowable values and add a little dictionary with builtins like those. I mostly wrote up that answer to advertise PyParsing:) – Torsten Marek Jul 17 '09 at 17:46
up vote 3 down vote

I would use the YAML parser, as it's better at most things like this. It also comes with GAE, since it is used for the config files. JSON is a subset of YAML.

All you have to do is strip off "var txns =", and YAML should do the rest.

import yaml

string = """[{ apples: '100', oranges: '20', type: 'SELL'}, 
             { apples: '200', oranges: '10', type: 'BUY'}]"""

# safe_load only builds plain Python types, which is what you want for remote input
txns = yaml.safe_load(string)

print(txns)

This gives you:

[{'type': 'SELL', 'apples': '100', 'oranges': '20'},
 {'type': 'BUY', 'apples': '200', 'oranges': '10'}]

Once loaded, you can always dump it back out as JSON.
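
For the dump step, here is a minimal sketch. It assumes the loaded data is the list shown above, hard-coded here as `loaded` so the snippet runs without the YAML library. The `wrap_transactions` helper is a hypothetical name, added only to reproduce the "transaction" wrapper from the question. On an older App Engine runtime, `simplejson.dumps` should be usable in place of `json.dumps`.

```python
import json

# Stand-in for the list produced by the YAML load step above.
loaded = [
    {"apples": "100", "oranges": "20", "type": "SELL"},
    {"apples": "200", "oranges": "10", "type": "BUY"},
]


def wrap_transactions(records):
    """Wrap each record in a {"transaction": ...} object, as in the question."""
    return [{"transaction": record} for record in records]


# sort_keys=True makes the key order in the output deterministic.
clean_json = json.dumps(wrap_transactions(loaded), sort_keys=True)
print(clean_json)
```

Parsing `clean_json` again with `json.loads` gives back the wrapped list, so the result is safe to hand to a template or another service.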

Cool, I wasn't aware of the yaml lib. I am having some difficulty with one of the fields now ... it has some spurious characters that I have to suppress. I might need to go with the pyparsing solution given that issue. – Greg Jul 17 '09 at 17:20
Greg, what issue are you having? – Nosredna Jul 17 '09 at 17:25
Basically I have some values with slashes, escapes, colons and apostrophes like 'oranges:with:1,ripe:\"yes\"' that just make it hard to sweep the text to do the original parsing. – Greg Jul 17 '09 at 19:08
up vote 0 down vote

If you know that's what it's always going to look like, you could do a regex to find unquoted space-delimited text that ends with a colon and surround it with quotes.

I'm always worried about unexpected input with a regex like that, though. How do you know the remote source won't change what you get?
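
A rough sketch of that regex idea, using only the standard library. It rests on a few assumptions: keys are bare identifiers, values are single-quoted strings, and the only escapes inside them are simple ones such as \' or \". The function name `js_array_to_json` is made up for illustration. Matching quoted strings in the same pattern as the keys means a colon inside a value is not mistaken for a key.

```python
import json
import re

# Either a single-quoted string (group 1) or a bare key followed by a colon (group 2).
TOKEN = re.compile(r"'((?:[^'\\]|\\.)*)'|([A-Za-z_]\w*)\s*:")


def js_array_to_json(js_source):
    """Turn "var name = [ {key: 'value', ...}, ... ]" into a JSON string."""
    # Drop everything before the first "[" (the "var txns = " part)
    # and any trailing semicolon.
    body = js_source[js_source.index("["):].rstrip().rstrip(";")

    def fix(match):
        if match.group(1) is not None:
            # Assumption: only simple escapes occur, so dropping the backslash
            # is enough. This would be wrong for escapes such as \n.
            text = re.sub(r"\\(.)", r"\1", match.group(1))
            return json.dumps(text)  # re-quote as a proper JSON string
        return '"%s":' % match.group(2)  # quote the bare key

    return TOKEN.sub(fix, body)


data = """var txns = [
    { apples: '100', oranges: '20', type: 'SELL'},
    { apples: '200', oranges: '10', type: 'BUY'}]"""

txns = json.loads(js_array_to_json(data))
print(txns)
```

If the remote format changes, `json.loads` raises an error rather than returning bad data, which at least makes the breakage visible.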

I don't know if/when it changes ... it's a fragile solution to be sure and I'll have error handling to report when it changes. But this is just for a hobby app. :) – Greg Jul 17 '09 at 16:50
up vote 0 down vote

You could create an intermediate page containing a Javascript script that just loads the remote one and dumps it to JSON. Then Python can make requests to your intermediate page and get out nice JSON.

I'd prefer to keep it to one hop because I'll be doing this with several files ... so multiply each extra hop by at least six for now. – Greg Jul 17 '09 at 17:23
You can bundle all your requests into one request to the intermediate page, so this only actually adds one hop total. – Kiv Jul 17 '09 at 17:28
Good point - duh. :) – Greg Jul 17 '09 at 19:09
up vote -1 down vote

http://www.devpro.it/JSON/files/JSON-js.html

I gave this a quick glance and it seems to be JavaScript. I am in a python script (using urlfetch) when I bring in the .js file to be processed and I want to render a python list or dict out to my Django template ... I'm not sure how this library can help me? – Greg Jul 17 '09 at 16:53
