
So here is the standard way to read a JSON file in Python:

import json
from pprint import pprint

with open('ig001.json') as data_file:    
    data = json.load(data_file)

pprint(data)

However, the JSON file I want to read has multiple JSON objects in it, so it looks something like:

[{},{}.... ]

[{},{}.... ]

Here this represents two JSON objects, and each {} inside them contains a bunch of key:value pairs.

So when I try to read this with the standard code above, I get this error:

Traceback (most recent call last):
  File "jsonformatter.py", line 5, in <module>
    data = json.load(data_file)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 3889 column 2 - line 719307 column 2 (char 164691 - 30776399)

Line 3889 is where the first JSON object ends and the next one begins; the line itself looks like "][".
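The "Extra data" error can be reproduced without the real file. `json.load` and `json.loads` expect exactly one top-level JSON value; once the first array has been parsed completely, anything left over other than whitespace is reported as extra data. A minimal sketch, using a made-up two-array string in place of the file; `describe_parse_error` is a hypothetical helper written only for this illustration:

```python
import json

def describe_parse_error(text):
    """Return the decoder's error message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError on Python 3
        return str(exc)
    return None

# Hypothetical stand-in for the file contents: two top-level arrays glued together.
print(describe_parse_error('[{"id": 1}, {"id": 2}][{"id": 3}]'))

# A single top-level array parses fine, so this prints None.
print(describe_parse_error('[{"id": 1}, {"id": 2}]'))
```

The decoder only complains after it has finished the first array, which is why the reported position lines up with the "][" boundary.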

Any ideas on how to fix this would be appreciated, thanks!

Can you post your json file? – Manish Gupta Apr 8 at 6:09
    
You mean multiple JSON arrays, right? – magni- Apr 8 at 6:15
    
Please provide more data on the JSON file that you have. – Annapoornima Koppad May 13 at 3:12

Without a link to your JSON file, I'm going to have to make some assumptions:

  1. Top-level JSON arrays are not each on their own line (since the first parsing error is on line 3889), so we can't simply read the file line by line and parse each line separately.
  2. This is the only type of invalid JSON present in the file.

To fix this:

import json

# read the whole file into one string (we're going to need the entire file in memory)
with open('ig001.json') as data_file:
    raw_data = data_file.read()

# 1. replace instances of `][` with `]<SPLIT>[`
# (`<SPLIT>` needs to be something that is not present anywhere in the file to begin with)
tweaked_data = raw_data.replace('][', ']<SPLIT>[')

# 2. split the string into a list of strings, using the chosen split indicator
split_data = tweaked_data.split('<SPLIT>')

# 3. load each string individually
parsed_data = [json.loads(bit_of_data) for bit_of_data in split_data]

(pardon the horrible variable names)
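A different technique, not the one used above, avoids editing the text at all: `json.JSONDecoder.raw_decode` parses one JSON value and also returns the index where that value ended, so you can walk through the file one top-level value at a time. Because of that, it does not care whether the arrays are separated by "][", a newline, or other whitespace, and a "][" inside a string value is harmless. A minimal sketch, assuming (like the approach above) that the whole file fits in memory; `load_concatenated_json` is a hypothetical helper name, and passing the start index as `raw_decode`'s second argument relies on CPython's `idx` parameter, which the official docs do not list:

```python
import json

JSON_WHITESPACE = ' \t\n\r'  # the whitespace characters JSON allows between tokens

def load_concatenated_json(text):
    """Parse several top-level JSON values written back to back in `text`.

    Returns a list with one parsed value per top-level value.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos >= length:
            break
        # Parse one value starting at `pos`; `pos` becomes the index just past it.
        # (Relies on CPython's undocumented `idx` argument to raw_decode.)
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values

print(load_concatenated_json('[{"a": 1}][{"b": 2}]'))
# [[{'a': 1}], [{'b': 2}]]
print(load_concatenated_json('[{"a": 1}]\n[{"note": "x][y"}]'))
# [[{'a': 1}], [{'note': 'x][y'}]]
```

With the file from the question, you would call `load_concatenated_json(data_file.read())` inside the same `with open('ig001.json') as data_file:` block. Any genuinely invalid JSON still raises `ValueError` at the offending position, so assumption 2 above still applies.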
