Code Review Stack Exchange is a question and answer site for peer programmer code reviews.

I have multiple (1000+) JSON files each of which contain a JSON array. I want to merge all these files into a single file.

I came up with the following, which reads each of those files and creates a new object with all the contents. I then write this new object into a new file.

Is this approach efficient? Is there a better way to do so?

import json

# file_list is assumed to be defined earlier as a list of file paths
head = []
with open("result.json", "w") as outfile:
    for f in file_list:
        with open(f, 'rb') as infile:
            file_data = json.load(infile)
            head += file_data
    json.dump(head, outfile)
Do you need a multithreaded solution? –  ipoteka Apr 18 at 0:16
My concern with this approach is that you load the entire array into memory in the head variable. If you have a lot of data in those 1000+ files, you might start running out of memory. –  alexwlchan Apr 18 at 14:00
Cheap-and-nasty approach: read each file, then immediately append the contents onto the result file, minus the opening and closing array markers ([]). Add commas and array markers as appropriate between items. Then you only have one file in memory at a time. –  alexwlchan Apr 18 at 14:01
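A minimal sketch of that cheap-and-nasty approach might look like the following; `append_merge` is an illustrative name, not from the original post, and it assumes every input file contains exactly one top-level JSON array:

```python
import json


def append_merge(file_list, out_path):
    """Append each file's array contents to out_path as raw text,
    stripping each file's outer [ and ] and inserting commas,
    so only one file's text is in memory at a time."""
    with open(out_path, "w") as outfile:
        outfile.write("[")
        first = True
        for path in file_list:
            with open(path) as infile:
                body = infile.read().strip()
                # drop the outer array markers; assumes one array per file
                if body.startswith("[") and body.endswith("]"):
                    body = body[1:-1].strip()
                if not body:
                    continue  # skip empty arrays
                if not first:
                    outfile.write(",")
                outfile.write(body)
                first = False
        outfile.write("]")
```

Because the file contents are copied as text rather than re-parsed, this also skips the cost of decoding and re-encoding the JSON.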

1 Answer

  1. First off, if you want reusability, turn this into a function. The function should take its inputs as arguments (e.g. the list of input files and the output path).
  2. Secondly, instead of accumulating all of the JSON data in a variable before writing, I'd recommend writing the contents of each file directly to the merged file as you go. This will help prevent memory issues.
  3. Finally, I just have a few nitpicky tips on your variable naming. Preferably, head should have a name more along the lines of merged_files, and you shouldn't use f as a loop variable. Something like json_file would be better.
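Putting the three suggestions together might look something like this sketch; `merge_json_files` and the parameter names are illustrative choices, not from the original post:

```python
import json


def merge_json_files(file_list, output_path):
    """Merge the JSON arrays in file_list into one array at output_path.

    Each input file is parsed and its items are written to the output
    immediately, so at most one file's data is held in memory at a time.
    """
    with open(output_path, "w") as merged_file:
        merged_file.write("[")
        first = True
        for json_path in file_list:
            with open(json_path, "rb") as json_file:
                for item in json.load(json_file):
                    if not first:
                        merged_file.write(", ")
                    json.dump(item, merged_file)
                    first = False
        merged_file.write("]")
```

Unlike a pure text-append approach, parsing each file with json.load also validates the input as it goes.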
