I've got a little bit of a tricky question here regarding converting JSON strings into Python data dictionaries for analysis in Pandas. I've read a bunch of other questions on this but none seem to work for my case.

Previously, I was simply using CSVs (and Pandas' read_csv function) to perform my analysis, but now I've moved to pulling data directly from PostgreSQL.

I have no problem using SQLAlchemy to connect to my engine and run my queries. My whole script runs the same as it did when I was pulling the data from CSVs. That is, until it gets to the part where I'm trying to convert one of the columns (namely, the 'config' column in the sample text below) from JSON into a Python dictionary. The ultimate goal of converting it into a dict is to be able to count the number of responses under the "options" field within the "config" column.

df = pd.read_sql_query('SELECT questions.id, config from questions ', engine)

df = df['config'].apply(json.loads)

df = pd.DataFrame(df.tolist())

df['num_options'] = np.array([len(row) for row in df.options])

When I run this, I get the error "TypeError: expected string or buffer". I tried converting the data in the 'config' column to string from object, but that didn't do the trick (I get another error, something like "ValueError: Expecting property name...").

If it helps, here's a snipped of data from one cell in the 'config' column (the code should return the result '6' for this snipped since there are 6 options):

{"graph_by":"series","options":["Strongbow Case Card/Price Card","Strongbow Case Stacker","Strongbow Pole Topper","Strongbow Base wrap","Other Strongbow POS","None"]}

My guess is that SQLAlchemy does something weird to JSON strings when it pulls them from the database? Something that doesn't happen when I'm just pulling CSVs from the database?

share|improve this question
up vote 0 down vote accepted

In recent Psycopg versions the Postgresql json(b) adaption to Python is transparent. Psycopg is the default SQLAlchemy driver for Postgresql

df = df['config']['options']

From the Psycopg manual:

Psycopg can adapt Python objects to and from the PostgreSQL json and jsonb types. With PostgreSQL 9.2 and following versions adaptation is available out-of-the-box. To use JSON data with previous database versions (either with the 9.1 json extension, but even if you want to convert text fields to JSON) you can use the register_json() function.

share|improve this answer
    
Sorry for the delay in getting back to you but this worked perfectly, thanks so much for sharing! – Rafi Sands Sep 8 '15 at 17:05

Just sqlalchemy query:

q = session.query(
    Question.id,
    func.jsonb_array_length(Question.config["options"]).label("len")
)

Pure sql and pandas' read_sql_query:

sql = """\
SELECT  questions.id,
        jsonb_array_length(questions.config -> 'options') as len
FROM    questions
"""
df = pd.read_sql_query(sql, engine)

Combine both (my favourite):

# take `q` from the above
df = pd.read_sql(q.statement, q.session.bind)
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.