Sign up ×
Stack Overflow is a community of 4.7 million programmers, just like you, helping each other. Join them, it only takes a minute:

I am trying to download a very large set of data from remote Amazon RedShift server (a Postgresql database). The data is about user visiting log. Since the data is very large. I extract the ids of users who visit the website with a specified time period, and then recursively extract their logs.

The code is as below.

static Connection getUserLogConn() throws SQLException, ClassNotFoundException {
        System.out.println("-------- PostgreSQL "
                + "JDBC Connection Testing ------------");

        Class.forName("org.postgresql.Driver");
        Connection connection = null;
        connection = DriverManager.getConnection("<address>", "<username>", "<password>");
        return connection;
    }

    static LinkedList<String> extractAllUIDsFromRemote( Connection connection ) throws SQLException, UnknownHostException {
        LinkedList<String> allUIDs = new LinkedList<String>();
        String query = "SELECT distinct uid " + 
                       fromStr +
                       " WHERE ts >= " + startTime;
        if(!endTime.equals(""))
            query += " AND ts < " + endTime;

        System.out.println("Sent SQL to RedShift: " + query);

        // ***Below statement is where the exception occurs ***
        ResultSet rs_uid = connection.createStatement().executeQuery( query ); 

        System.out.println( "Received all UIDs successfully" );

        int n = 0;
        while( rs_uid.next() ) {
            // The cursor points to a row in the result
            n++;
            String uid = rs_uid.getString( "uid" );
            allUIDs.add(uid);
        }
        System.out.println( n + " docs are retrieved." );

        return allUIDs;
    }    


    static void queryIndividualUserLog( Connection connection, LinkedList<String> uids ) throws SQLException, UnknownHostException {
        MongoDBManager db = new MongoDBManager( database, "FreqUserLog" );
        db.createIndex("uid");
        db.createIndex("url");

        StringBuffer sb = new StringBuffer();
        int i = 0;
        for( String uid : uids ) {
            sb.append( "uid='" + uid + "'" );
            // Compose SQL query every 10000 users
            if( ( i != 0 && i % 10000 == 0 ) || i == uids.size() - 1 ){
                System.out.println("Processing up to User " + i);
                String query = "SELECT * " + 
                               fromStr +
                               " WHERE " + sb.toString() +
                               " AND ts >= " + startTime;
                if(!endTime.equals(""))
                    query += " AND ts < " + endTime;

                System.out.println("Sent SQL to RedShift for retrieving individual users' logs");
                **ResultSet rs_log = connection.createStatement().executeQuery( query );** // This step takes time to wait for the response from RedShift
                System.out.println( "Received individual users' logs succesfully" );

                while( rs_log.next() ) {
                    db.insertOneLog( rs_log ); // one log = one doc, i.e. one row
                }
                System.out.println( "Have written to DB." );
                sb = new StringBuffer();
            }
            else {
                sb.append( " OR " );
            }   
            i++;
        }
        System.out.println(uids.size() + " user's log are stored into DB");
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException, UnknownHostException {

        Connection connection = getUserLogConn();
        if(connection != null) {
            System.out.println( "Connect succesfully" );


        /** Extract all users' UIDs, and store them in FreqUserIDs collection */
            LinkedList<String> allUIDs = extractAllUIDsFromRemote( connection );

        /** Query all records of freq users from RedShift, and store them in FreqUserLog collection */
            queryIndividualUserLog( connection, allUIDs );

        connection.close();
    }

However, the problem is that sometimes an exception is thrown out. The statement under the "***" comment in the code is where the problem happens.

org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:218)
at org.postgresql.jdbc2.AbstractAbstractJdbcAbstractedly5statement.excute(AbstractJdbcabstractedly5statement.java:561)
...
Caused by java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:194)
org.postgresql.core.PGStream.Receive(PGStream.read)

Since I can not access to the remote Postgresql server, I do not have the database log. I googled this problem. Many of related issues are about "Connection reset by peer", not "connection reset" here. Someone says that "connect reset" means the connection is closed at this side, i.e. my side. But I do not know why this happens and how to fix it. Thanks.

UPDATE: My guess is that the query process usually takes too long time because the data is too large. So the problem keeps waiting for the response from RedShift. In this case, my program closes the connection due to timeout. I do not know if this is the truth... If so, is there any better solution? (I mean, better than decreasing the number of users that inquiry each time. Right now the number is 10000).

share|improve this question
    
I think the message would be more expressive if it was due to too long duration. Did you have a try with another program like pgAdmin or SQuirreL in order to see if it's reproducible ? – Fabien Thouraud Jun 3 at 21:16

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Browse other questions tagged or ask your own question.