Postgres + Sqlalchemy double counting results

Ask Question

Asked 8 years, 8 months ago

Modified 8 years, 8 months ago

Viewed 263 times

I am encountering duplicate results in my sqlalchemy+postgres query. I have a model structured like below:

class Audio(db.Model):
    __tablename__ = "audio"

    id = db.Column(db.Integer, primary_key=True)
    file_location = db.Column(db.String(), unique=True)
    upload_time = db.Column(ArrowType, default=arrow.utcnow())
    keyword = db.Column(JSON)
    transcript = db.Column(JSON)
    diarization = db.Column(JSON)

    user_id = db.Column(db.Integer, db.ForeignKey('user.id', onupdate='cascade', ondelete='cascade'))
    company_id = db.Column(db.Integer, db.ForeignKey('company.id', onupdate='cascade', ondelete='cascade'))
    client_id = db.Column(db.Integer, db.ForeignKey('client.id', onupdate='cascade', ondelete='cascade'))

    def __init__(self, file_location, upload_time, keyword, transcript, diarization, user_id, company_id, client_id):
        self.file_location = file_location
        self.upload_time = upload_time
        self.keyword = keyword
        self.transcript = transcript
        self.diarization = diarization
        self.user_id = user_id
        self.company_id = company_id
        self.client_id = client_id

    def __repr__(self):
        return '<Audio ID %r>' % self.id

And under transcript, the input is a json object:

{"transcript": [
    {"p": 0, "s": 0, "e": 320, "c": 0.545, "w": "This"}, 
    {"p": 1, "s": 320, "e": 620, "c": 0.825, "w": "call"}, 
    {"p": 2, "s": 620, "e": 780, "c": 0.909, "w": "is"}, 
    {"p": 3, "s": 780, "e": 1010, "c": 0.853, "w": "being"}
    ...
    ]}

I am trying to filter entries based on the value from "w" and query its corresponding Audio.id and "p". I have tried the following:

  transcript_subquery = s.query(func.json_array_elements(Audio.transcript['transcript']).label('transcript')).subquery()
    temp = transcript_subquery.c.transcript.op('->>')('w').cast(String)
    temp1 = transcript_subquery.c.transcript.op('->>')('s').cast(Integer)
    query = s.query(temp1, Audio.id).filter(temp.ilike("all"))

The problem that I have is that right now I only have 5 data entries and one of the data entry's transcript field is blank. However, I got a result that contains the following:

(665610, 5), (736413, 5), (907230, 5), (942340, 5), (1020852, 5), (1023942, 5), (1037101, 5), (1078521, 5), (1105581, 5), (1117551, 5), (1372730, 5), (1501960, 5), (1508410, 5)

This is really odd to be given that the entry where Audio.id =5 doesn't even have transcript json object yet it produces such result. Moreover, all the entries, each has 112 results. And I suspect that 112 is the total number that the word appears and it somewhat is double counting everything for each Audio ID. And I don't really know how to fix my sqlalchemy query.

asked Dec 8, 2016 at 23:33

YAL

7012 gold badges8 silver badges22 bronze badges

You're implicitly cross joining your subquery and the audio table in the code you've presented. In other words every row from your subquery will be joined to every row from audio, since you have no predicates that would make it an inner join.

Ilja Everilä
– Ilja Everilä

12/09/2016 09:02:30
Commented Dec 9, 2016 at 9:02
@IljaEverilä Is there a way to resolve this problem? I saw this post saying there there's no better way to do this in sqlalchemy: stackoverflow.com/questions/14813516/…

YAL
– YAL

12/09/2016 18:59:37
Commented Dec 9, 2016 at 18:59
1

That question has nothing to do with this. You want to avoid cross joins, not to explicitly use cross joins. Your problem is that you're doing SELECT (foo.transcript->>'s')::int, audio.id FROM (SELECT ... AS transcript FROM audio) foo, audio when it appears that you actually want to do SELECT (foo.transcript->>'s')::int, foo.id FROM (SELECT ... AS transcript, id FROM audio) foo instead.

univerio
– univerio

12/09/2016 22:18:26
Commented Dec 9, 2016 at 22:18

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

Postgres + Sqlalchemy double counting results

0

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Linked