I am working on a machine learning algorithm with scikit-learn in Python, but this time the data are stored as MongoDB documents. I would like to pull my data into a pandas DataFrame. Here is an example of one of the documents:
{
    "_id" : ObjectId("552b9525359c6a09f061cb53"),
    "Interrupt" : true,
    "Url" : "Coco_mademoiselle.jpg",
    "Target" : {
        "FemaleInPercent" : 100,
        "MaleInPercent" : 0,
        "AgeProperties" : 6
    },
    "MaxDisplayTime" : 7,
    "MinDisplayTime" : 2,
    "MediaType" : 0,
    "IsLocked" : false,
    "FaceTagged" : [
        {
            "FaceId" : 36,
            "GenderConfidence" : -0.1731295609721586,
            "Age" : 23,
            "TotalAttention" : 14.92099999999997,
            "AttentionInsideThisContent" : 2.273999999999992,
            "Gender" : "Unknown",
            "AngleYaw" : [
                0
            ],
            "XPos" : [
                0.07704142996903575
            ],
            "YPos" : [
                0.7182761555157026
            ],
            "Distance" : [
                0.7223960254002589
            ]
        },
        {
            "FaceId" : 37,
            "GenderConfidence" : 0.3932732620245187,
            "Age" : 51,
            "TotalAttention" : 14.92099999999997,
            "AttentionInsideThisContent" : 2.273999999999992,
            "Gender" : "Female",
            "AngleYaw" : [
                0
            ],
            "XPos" : [
                0.9852976840852283
            ],
            "YPos" : [
                -0.9149562017596122
            ],
            "Distance" : [
                1.344602683844596
            ]
        }
    ],
    "PanelId" : "PANEL_1",
    "ScenarioId" : "Scenario-1",
    "StartTime" : ISODate("2015-04-13T10:06:22.622Z"),
    "EndTime" : ISODate("2015-04-13T10:06:29.640Z")
}
I used this function to load my data into a pandas DataFrame, but I have some issues with the embedded documents and the arrays of documents:
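import pandas as pd

def read_mongo(db, collection, query=None, host='localhost', port=27017, username=None, password=None, no_id=True):
    """ Read from Mongo and Store into DataFrame """
    # `collection` must expose .find() (e.g. a pymongo Collection object);
    # db and the connection parameters are not used in this version.
    # Make a query to the specific DB and Collection
    cursor = collection.find(query or {})
    # Expand the cursor and construct the DataFrame
    df = pd.DataFrame(list(cursor))
    # Delete the _id column (it is absent when the query returns no documents)
    if no_id and '_id' in df.columns:
        del df['_id']
    return df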
In the end, I get a DataFrame in which a single column holds all the FaceTagged information lumped together:
data.FaceTagged.to_frame()
FaceTagged
0 [{u'Distance': [0.871754460354], u'XPos': [0.7...
1 [{u'Distance': [0.845591660012], u'XPos': [0.6...
2 [{u'Distance': [1.01813052012], u'XPos': [-0.7...
Each row holds the whole FaceTagged value of one document in a single cell, even though FaceTagged is an array of documents and each of those documents contains several fields.
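To make the goal concrete, here is a minimal sketch of the shape I am after: one row per face, with a few parent fields repeated next to it. It assumes `pandas.json_normalize` (available in pandas 1.0 and later) and uses a small hand-written list `docs` with abridged values in place of a real cursor; the names `docs` and `faces` are only for illustration, and the sketch assumes every document has a FaceTagged array.

```python
import pandas as pd

# Hand-written stand-in for list(collection.find(query)); values are abridged.
docs = [
    {
        "Url": "Coco_mademoiselle.jpg",
        "PanelId": "PANEL_1",
        "Target": {"FemaleInPercent": 100, "MaleInPercent": 0, "AgeProperties": 6},
        "FaceTagged": [
            {"FaceId": 36, "Age": 23, "Gender": "Unknown", "XPos": [0.077]},
            {"FaceId": 37, "Age": 51, "Gender": "Female", "XPos": [0.985]},
        ],
    },
]

# One row per element of FaceTagged; the fields listed in `meta` are copied
# from the parent document onto every face row.
faces = pd.json_normalize(
    docs,
    record_path="FaceTagged",
    meta=["Url", "PanelId", ["Target", "FemaleInPercent"]],
)

# XPos is still a one-element list per row; unwrap it into a plain number.
faces["XPos"] = faces["XPos"].map(lambda v: v[0] if v else None)

print(faces)
```

With real data, `docs` would be the list built from the cursor inside `read_mongo`. The nested meta path shows up as a dotted column name (`Target.FemaleInPercent`), and the other single-element arrays (YPos, Distance, AngleYaw) could be unwrapped the same way as XPos.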
Has anyone dealt with this before?