I'm trying to parse a huge JSON file (1.9 GB), so I split it into chunks of 10 MB (190 files).
To ease the process, I load them 80 files at a time and put their lines into a list.
I use this to iterate through the 80 files:
    for root, dirs, filenames in os.walk(path):
        for f in filenames:
            # function below
And this is the function that reads each file, using the corrected path:
    dat = 'C:/Users/User/My Lab/Python/scripts/thesis/data_extractor/review/{file}'.format(file=f)
    with open(dat) as data_file:
        for item in data_file:
            if len(item) > 1:
                dict_review.append(item)
After that is done, I iterate over the list and parse each row using json.loads:
    data = None
    for row in dict_review:
        data = json.loads(row, 'utf-8')
And that's where the exception happens:
    Unexpected error: <type 'exceptions.TypeError'>
    Reason: expected string or buffer
I tried casting the row to a string with str(row), but it still raises the same exception.
I wonder what I did wrong. Thanks!
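For reference, the collect-then-parse steps above can be folded into a single pass, so the raw lines never have to sit in a list. This is only a sketch: it assumes each chunk holds one JSON document per line (which the line-by-line loop above implies), and iter_records, the chunk file names and the demo data are made up for illustration.

```python
import glob
import json
import os
import tempfile

def iter_records(pattern):
    """Yield one parsed JSON object per non-blank line of every matching file."""
    for filename in sorted(glob.iglob(pattern)):
        with open(filename) as data_file:
            for line in data_file:
                line = line.strip()
                if line:  # skip blank lines (replaces the len(item) > 1 check)
                    yield json.loads(line)

# Demo on two throwaway chunk files (hypothetical data, not the real reviews).
demo_dir = tempfile.mkdtemp()
for name, lines in [('chunk_000.json', ['{"stars": 5}', '']),
                    ('chunk_001.json', ['{"stars": 3}'])]:
    with open(os.path.join(demo_dir, name), 'w') as out:
        out.write('\n'.join(lines) + '\n')

records = list(iter_records(os.path.join(demo_dir, '*.json')))
print(records)
```

Because iter_records is a generator, only one line is held in memory at a time, so loading the files in batches of 80 should not be necessary with this approach.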
SOLVED:
It was my mistake. The JSON was actually parsed properly; the problem was in the next step, where I try to remove all the funny characters with a regex. I changed
    re.sub(r'[^\w]', ' ',data['votes'])
to
    re.sub(r'[^\w]', ' ',str(data['votes']))
because I need to cast the object to a string first.
Thanks!
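To illustrate why the cast helps: re.sub only accepts text, so a parsed JSON value that is a dict or a number triggers the TypeError. A minimal sketch, assuming votes is a nested object (the post does not show its real structure); clean_field is a hypothetical helper name and the row is invented.

```python
import json
import re

def clean_field(value):
    """Replace every non-word character in value with a space."""
    # re.sub needs text, so convert dicts, ints, etc. to a string first.
    return re.sub(r'[^\w]', ' ', str(value))

# Hypothetical record; the real shape of 'votes' is an assumption.
row = '{"votes": {"funny": 0, "useful": 2}}'
data = json.loads(row)

try:
    re.sub(r'[^\w]', ' ', data['votes'])  # votes is a dict, not a string
    raised = False
except TypeError:
    raised = True

cleaned = clean_field(data['votes'])
print(raised, repr(cleaned))
```

Note that str() on a dict produces its Python representation, so the cleaned text contains the keys as well as the values. If only the values matter, it may be better to clean each value separately.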
Use glob and for filename in glob.iglob("/some/path/*.ext"), for example.
Where does dict_review come from, and what type is it? Is it a dict or a list? And what happens between the second and third code blocks?
Put print('row=%r' % row) before the json.loads... line. I'm sure you will be surprised.
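Along the same lines, printing the full traceback instead of just the exception type shows which call actually failed. A rough sketch, assuming the parse and the regex cleanup run inside the same try block as in the question; process and the sample row are hypothetical.

```python
import json
import re
import traceback

def process(row):
    """Parse one line, then clean its 'votes' field without the str() cast."""
    data = json.loads(row)
    return re.sub(r'[^\w]', ' ', data['votes'])

row = '{"votes": {"funny": 1}}'  # hypothetical row
report = ''
try:
    process(row)
except TypeError:
    print('row=%r' % row)            # the row itself is a perfectly good string
    report = traceback.format_exc()  # names the frame and line that raised
    print(report)
```

The traceback points into process at the re.sub call rather than at json.loads, which is the clue that the parsing was fine all along.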