I have json mapping under properties key as below in a file Customer.json,

{
    "customer": {
        "properties": {
            "customerId": {
                "type": "string",
                "index": "not_analyzed"
            },
            "name": {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}

Which I want to convert into following with key and displayName duplicated and type from above mapping,

field(key: 'customerId',     displayName: 'customerId', type: 'String')
field(key: 'name',           displayName: 'name',       type: 'String')

I hit and tried bash + python as below assuming it fetched customer key first and loops inside properties ,

$ cat Customer.json | python -c 'import sys; import simplejson as json; \
print "\n".join( [i["properties"] for i in json.loads( sys.stdin.read() )["customer"]] )'

Traceback (most recent call last):
File "<string>", line 2, in <module>
TypeError: string indices must be integers, not str

I'm open to other solutions as well.

share|improve this question
up vote 1 down vote accepted

Parsing such structured data is best done using a dedicated parser, just as you are doing. However, in this particular case, it is simple enough that you could do:

$ grep -B 1 '"type":' Customer.json | tr $'"' $"'" | sed 's/[:,]//g' | 
    awk '{print "field(key: "$1",\tdisplayName: "$1",\t type: "$NF")"}' RS="--" 

Which returns:

field(key: 'customerId',    displayName: 'customerId',   type: 'string')
field(key: 'name',  displayName: 'name',     type: 'string')
share|improve this answer

The error seems clear to me, the variable "i" is a string as the for loop iterates over the value of "customer" dict/mapping. That value is a dict/mapping itself and iterating over it will get you the succesion of the list of keys (i.e. ["properties"].

cat Customer.json |  python -c 'import sys; import simplejson as json; \
print "\n".join( [i for i in json.loads( sys.stdin.read() )["customer"]["properties"] ] )'

will give you

 customerid
 name

and the following should get you closer to your goal:

cat Customer.json |  python -c 'import sys; import simplejson as json; \
print "\n".join( ["{} {}".format(k, v) for k,v in json.loads( sys.stdin.read() )["customer"]["properties"].iteritems() ] )'

which gives:

customerId {'index': 'not_analyzed', 'type': 'string'}
name {'index': 'not_analyzed', 'type': 'string'}

From there I would recommend you actually make the python in a script. You will have to decide on how to get from string to String and do the additional formatting. A multiliner is always more easy to debug (case proven by your Question) and maintainable and gives more meaningful (line numbers) error messages.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.