Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community.

Creating Pandas DataFrame on Python From a MongoDB Document containing Embedded documents


I am working on a machine learning algorithm with Python's scikit-learn, but this time the data are stored as MongoDB documents. I would like to pull the data into a pandas DataFrame. Here is an example of one of the documents:

{
    "_id" : ObjectId("552b9525359c6a09f061cb53"),
    "Interrupt" : true,
    "Url" : "Coco_mademoiselle.jpg",
    "Target" : {
        "FemaleInPercent" : 100,
        "MaleInPercent" : 0,
        "AgeProperties" : 6
    },
    "MaxDisplayTime" : 7,
    "MinDisplayTime" : 2,
    "MediaType" : 0,
    "IsLocked" : false,
    "FaceTagged" : [ 
        {
            "FaceId" : 36,
            "GenderConfidence" : -0.1731295609721586,
            "Age" : 23,
            "TotalAttention" : 14.92099999999997,
            "AttentionInsideThisContent" : 2.273999999999992,
            "Gender" : "Unknown",
            "AngleYaw" : [ 
                0
            ],
            "XPos" : [ 
                0.07704142996903575
            ],
            "YPos" : [ 
                0.7182761555157026
            ],
            "Distance" : [ 
                0.7223960254002589
            ]
        }, 
        {
            "FaceId" : 37,
            "GenderConfidence" : 0.3932732620245187,
            "Age" : 51,
            "TotalAttention" : 14.92099999999997,
            "AttentionInsideThisContent" : 2.273999999999992,
            "Gender" : "Female",
            "AngleYaw" : [ 
                0
            ],
            "XPos" : [ 
                0.9852976840852283
            ],
            "YPos" : [ 
                -0.9149562017596122
            ],
            "Distance" : [ 
                1.344602683844596
            ]
        }
    ],
    "PanelId" : "PANEL_1",
    "ScenarioId" : "Scenario-1",
    "StartTime" : ISODate("2015-04-13T10:06:22.622Z"),
    "EndTime" : ISODate("2015-04-13T10:06:29.640Z")
}
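To see which parts of this document are the problem, here is a minimal sketch using pandas' `pd.json_normalize` (pandas 1.0 or later assumed). The document is a simplified copy of the one above: `ObjectId`/`ISODate` are replaced by plain strings and `FaceTagged` is trimmed to one face, purely for the demo. The embedded sub-document `Target` is expanded into dotted columns such as `Target.FemaleInPercent`, while the array `FaceTagged` stays as one list-valued column.

```python
import pandas as pd

# Simplified copy of the document above (assumption: ObjectId/ISODate
# replaced by strings, FaceTagged trimmed to a single face).
doc = {
    "_id": "552b9525359c6a09f061cb53",
    "Url": "Coco_mademoiselle.jpg",
    "Target": {"FemaleInPercent": 100, "MaleInPercent": 0, "AgeProperties": 6},
    "FaceTagged": [
        {"FaceId": 36, "Age": 23, "Gender": "Unknown", "XPos": [0.07704142996903575]},
    ],
    "PanelId": "PANEL_1",
}

# Embedded sub-documents (dicts) are flattened into dotted column names...
flat = pd.json_normalize([doc])
print(sorted(flat.columns))

# ...but an array of sub-documents is kept as a single list-valued cell.
print(type(flat.loc[0, "FaceTagged"]))
```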

I used the following function to load the data into a pandas DataFrame, but I run into problems with the embedded documents and the array of documents:

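import pandas as pd
from pymongo import MongoClient

def read_mongo(db, collection, query=None, host='localhost', port=27017, username=None, password=None, no_id=True):
    """ Read from Mongo and Store into DataFrame """

    # Connect to the server (username/password are optional)
    client = MongoClient(host=host, port=port, username=username, password=password)

    # Make a query to the specific DB and Collection (db and collection are names)
    cursor = client[db][collection].find(query or {})

    # Expand the cursor and construct the DataFrame
    df = pd.DataFrame(list(cursor))
    client.close()

    # Delete the _id
    if no_id and '_id' in df:
        del df['_id']

    return df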
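One possible alternative to `pd.DataFrame(list(cursor))` is to let `pd.json_normalize` unpack the array directly, using `record_path` to produce one row per element of `FaceTagged` and `meta` to copy parent fields onto each of those rows. This is a sketch under assumptions: `faces_to_frame` and `META_FIELDS` are hypothetical names, the chosen meta fields are only for illustration, and every document is assumed to contain a `FaceTagged` array (a document without it would raise a `KeyError`). In practice `docs` could be `list(collection.find(query))` from pymongo.

```python
import pandas as pd

# Parent fields to repeat on every face row (illustrative choice; a nested
# path is given as a list and becomes the column "Target.FemaleInPercent").
META_FIELDS = ["Url", "PanelId", "ScenarioId", ["Target", "FemaleInPercent"]]


def faces_to_frame(docs):
    """Hypothetical helper: one row per element of each document's FaceTagged array."""
    return pd.json_normalize(
        list(docs),
        record_path="FaceTagged",  # explode the array of sub-documents
        meta=META_FIELDS,          # copy the parent fields onto each face row
        errors="ignore",           # meta keys missing from a document become NaN
    )


# Usage with plain dicts standing in for the cursor's documents.
docs = [
    {
        "Url": "Coco_mademoiselle.jpg",
        "PanelId": "PANEL_1",
        "ScenarioId": "Scenario-1",
        "Target": {"FemaleInPercent": 100},
        "FaceTagged": [{"FaceId": 36, "Age": 23}, {"FaceId": 37, "Age": 51}],
    },
]
faces = faces_to_frame(docs)
print(faces)
```

The single-element lists inside each face (`AngleYaw`, `XPos`, ...) are still lists after this step; they can be unwrapped afterwards with `.str[0]` on those columns.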

In the end, I get a DataFrame in which all of the FaceTagged information is packed into a single column:

data.FaceTagged.to_frame()
                                           FaceTagged
0   [{u'Distance': [0.871754460354], u'XPos': [0.7...
1   [{u'Distance': [0.845591660012], u'XPos': [0.6...
2   [{u'Distance': [1.01813052012], u'XPos': [-0.7...

Each row contains all the fields of one document, but FaceTagged is an array of documents, each with several fields of its own, and the whole array ends up in a single cell instead of being split into columns.
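If the DataFrame has already been built this way, a minimal sketch of one way to flatten the `FaceTagged` column after the fact: `DataFrame.explode` gives one row per face (pandas 1.1 or later assumed for `ignore_index`), `pd.json_normalize` turns each face dict into columns, and `.str[0]` unwraps the single-element lists. The helper name `flatten_face_tagged` and the `Face.` column prefix are hypothetical choices, and the list of single-element fields is taken from the example document.

```python
import pandas as pd

# Fields that hold a one-element list in the example document.
SINGLE_ELEMENT_LISTS = ["AngleYaw", "XPos", "YPos", "Distance"]


def flatten_face_tagged(data):
    """Hypothetical helper: one row per face, each face field in its own column."""
    # 1. One row per element of FaceTagged; parent columns are repeated.
    #    An empty array becomes a NaN row, which is mapped to {} below.
    exploded = data.explode("FaceTagged", ignore_index=True)
    face_dicts = [f if isinstance(f, dict) else {} for f in exploded["FaceTagged"]]

    # 2. Turn each face dict into columns, prefixed to avoid name clashes.
    faces = pd.json_normalize(face_dicts).add_prefix("Face.")

    # 3. Unwrap one-element lists such as [0.077...] into plain values.
    for name in SINGLE_ELEMENT_LISTS:
        col = "Face." + name
        if col in faces:
            faces[col] = faces[col].str[0]

    # 4. Put parent fields and face fields side by side (both use a 0..n-1 index).
    return pd.concat([exploded.drop(columns="FaceTagged"), faces], axis=1)


# Usage on a tiny frame shaped like the one shown above.
data = pd.DataFrame({
    "PanelId": ["PANEL_1"],
    "FaceTagged": [[{"FaceId": 36, "XPos": [0.077]}, {"FaceId": 37, "XPos": [0.985]}]],
})
print(flatten_face_tagged(data))
```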

Has anyone dealt with this? How can I get the fields of the embedded FaceTagged documents into their own columns?

edited Apr 24 '15 at 11:36

asked Apr 24 '15 at 9:42

MabroukAljane

"I have some issues" - can you be specific, please. What error messages are thrown, or what design problems are you suffering? – Michael Green Apr 24 '15 at 10:36

I just updated the text of my question. The problem is related to the embedded documents stored in the array "FaceTagged". I hope it's clear now. – MabroukAljane Apr 24 '15 at 11:38
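Another option is to do the flattening on the MongoDB side with the aggregation operators `$unwind` (one output document per array element) and `$project`. The sketch below only builds the pipeline as a Python list; `face_pipeline` is a hypothetical helper, the projected fields are an illustrative selection from the example document, and running it assumes a live server, e.g. `pd.DataFrame(list(collection.aggregate(face_pipeline())))` with pymongo. `$arrayElemAt` (MongoDB 3.2 or later) pulls the single value out of the one-element `XPos` list.

```python
def face_pipeline(match=None):
    """Hypothetical helper: aggregation pipeline emitting one document per face."""
    pipeline = []
    if match:
        # Optional filter applied before unwinding.
        pipeline.append({"$match": match})
    pipeline += [
        # One output document per element of the FaceTagged array.
        {"$unwind": "$FaceTagged"},
        # Lift selected parent and face fields to the top level.
        {"$project": {
            "_id": 0,
            "PanelId": 1,
            "ScenarioId": 1,
            "FemaleInPercent": "$Target.FemaleInPercent",
            "FaceId": "$FaceTagged.FaceId",
            "Age": "$FaceTagged.Age",
            "Gender": "$FaceTagged.Gender",
            "XPos": {"$arrayElemAt": ["$FaceTagged.XPos", 0]},
        }},
    ]
    return pipeline


print(face_pipeline({"PanelId": "PANEL_1"}))
```

Note that `$unwind` drops documents whose `FaceTagged` array is empty or missing unless `preserveNullAndEmptyArrays` is set.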


Browse other questions tagged mongodb nosql python or ask your own question.


asked	1 year ago
viewed	1112 times
