
I am attempting to speed up the processing of 5000 rows received from the database, each of which I render as a Crystal Report, export to a byte stream, and save in a database table.

Currently I am using Parallel.For on a DataTable. This starts 40 parallel tasks, each of which sequentially processes 125 (i.e. 5000/40) of the 5000 rows.

It takes approximately 5 minutes to process 5000 records at the moment. Is there any way I can speed this up?

DataTable dtBills = new DataTable("dtBills"); // I used an OdbcDataAdapter to fill this with 5000 records

private void ParallelProcess()
{
    int numbills = 5000;
    int numprocesses = 40;
    int numsequential = numbills / numprocesses; // rows each task handles sequentially

    int p = 0;

    // Never start more tasks than there are bills.
    if (numbills < numprocesses)
        p = numbills;
    else
        p = numprocesses;

    Parallel.For(1, p + 1, new ParallelOptions { MaxDegreeOfParallelism = 40 }, i =>
    {
        SequentialProcess(numsequential, i);
    });
}

private void SequentialProcess(int batch, int num)
{
    ReportDocument cryRpt = new ReportDocument();
    cryRpt.Load(Application.StartupPath + "\\" + "Bill.rpt");

    // Take this task's slice of the shared DataTable: skip the batches
    // belonging to earlier tasks, then take this task's rows.
    foreach (DataRow drow in dtBills.Rows.Cast<System.Data.DataRow>().Skip((num - 1) * batch).Take(batch).CopyToDataTable().Rows)
    {
        cryRpt.SetParameterValue(.....); // here I set Crystal Report parameter values

        Stream stream = cryRpt.ExportToStream(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat);
        byte[] contents = new byte[stream.Length];
        stream.Read(contents, 0, (int)stream.Length);
        stream.Close();

        if (!string.IsNullOrEmpty(sconnstr))
        {
            using (SqlConnection sconn = new SqlConnection(sconnstr))
            {
                String qry = @"INSERT INTO .....";
                using (SqlCommand cmd = new SqlCommand(qry, sconn))
                {
                    cmd.CommandType = CommandType.Text;
                    cmd.Parameters.AddWithValue(.....); // set values for the insert here, one of which is contents
                    cmd.Connection = sconn;

                    sconn.Open();
                    cmd.ExecuteNonQuery();
                    sconn.Close();
                }
            }
        }
    }

    cryRpt.Close();
    cryRpt.Dispose();
}
    
I am concerned with your hard-coded use of 40 processes. Can you be sure that Parallel.For will always create the maximum of 40 processes, regardless of hardware? – Nick Udell Nov 27 '14 at 13:35
    
Hi Nick. I am really not sure. The current server I am testing on has 2 quad-core Intel Xeon X5355 processors. I was playing around with the numbers and 40 seemed stable. When I go to 50 processes I sometimes get a "load report failed" error. – CoDeGiRl Nov 28 '14 at 12:55
    
My concern is if you were to ever execute this code on something that, for some reason or another, could not produce 40 processes for your program, your results would be incomplete. I wonder if it might be simpler to create one asynchronous call per task, instead of batching them? You could use Parallel LINQ, Tasks or the Thread Pool itself to ensure you have them evenly distributed across all available processes. The downside is the increased overhead of creating all of these threads, so it depends how intensive the sequential process is. The more intensive, the more efficient it will be. – Nick Udell Nov 28 '14 at 14:17
@NickUdell I think the results would still be complete. Parallel.For might not use MaxDegreeOfParallelism threads, but it will always perform all p iterations. – svick Nov 29 '14 at 19:12

The first thing here: you shouldn't be reading the results directly from the DataTable; use the CreateDataReader method to create a DataReader from it.

The difference is that a DataTable will check back on the database server on each iteration, while the DataReader will get a snapshot of your data and not check back any more. That saves a lot of time when dealing with a large number of records.

In the same vein, don't execute each INSERT directly; build the batch in memory, in an array or something similar (I think there is also something in ADO.NET that manages it in memory and commits later, but I don't remember its name; see the sketch below). If you stop dealing with the database one row at a time and work in chunks, the speed of your code will improve.
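A minimal sketch of both ideas. The table and column names (BillReports, BillId, Pdf) are placeholders, and the batch-commit class alluded to above is presumably SqlBulkCopy:

// Sketch only: iterate via a DataTableReader instead of the DataTable,
// buffer the generated PDFs in memory, and commit them in one batch.
// Table/column names (BillReports, BillId, Pdf) are placeholders.
DataTable buffer = new DataTable("buffer");
buffer.Columns.Add("BillId", typeof(int));
buffer.Columns.Add("Pdf", typeof(byte[]));

using (DataTableReader reader = dtBills.CreateDataReader())
{
    while (reader.Read())
    {
        // ... render the report for this row and export it to contents ...
        // buffer.Rows.Add(reader.GetInt32(0), contents);
    }
}

// One round trip for the whole batch instead of 5000 single INSERTs.
using (SqlConnection sconn = new SqlConnection(sconnstr))
using (SqlBulkCopy bulk = new SqlBulkCopy(sconn))
{
    bulk.DestinationTableName = "BillReports";
    bulk.ColumnMappings.Add("BillId", "BillId");
    bulk.ColumnMappings.Add("Pdf", "Pdf");
    sconn.Open();
    bulk.WriteToServer(buffer);
}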

"The difference is that a DataTable will check back on the database server on each iteration" is not true as a DataTable is disconnected from the database. – Heslacher Nov 27 '14 at 14:37
    
@Heslacher That disconnected bit is not totally true, because a DataTable still has events for RowChanged, ColumnChanged and TableCleared, and allows inserting Columns and Rows, while a DataReader has no events; it is a read-only data structure. – fernando.reyes Nov 27 '14 at 14:52
    
But the DataTable does not check back on the database server on each iteration. – Heslacher Nov 27 '14 at 14:54
    
I am also not sure what you mean by "The difference is that a DataTable will check back on the database server on each iteration". However, I do agree with you on the inserts, so I will try to bulk insert the data back to the database. Thanks for taking the time to assist. – CoDeGiRl Nov 27 '14 at 16:15
    
Question though: do you all think another DataTable consisting of 5000 records, each holding a byte[] and other data, is feasible? I already have one DataTable with 5000 records. – CoDeGiRl Nov 27 '14 at 16:22

I would like to point out some other things too:

  • MaxDegreeOfParallelism sets a maximum on the number of concurrent tasks; it is a limiter and does not guarantee what you want, which is exactly 40 tasks. In my experience, it is best to just let the .NET runtime decide how many tasks to run concurrently based on the underlying hardware. See ParallelOptions.MaxDegreeOfParallelism for more info.
  • I assume the "open connection, execute, close connection" pattern is not used only here, so why not create a reusable general-purpose function that accepts a query, a connection string and parameters? (See the sketch after this list.)
  • You could wrap the report instance in a using block. What happens if something goes wrong in the foreach loop and an exception is thrown? The report is never disposed.
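A rough sketch of the last two points; the helper's name and signature are illustrative, not from the original code:

// Reusable "open, execute, close" helper; name and signature are
// illustrative only.
private static void ExecuteNonQuery(string connstr, string qry, params SqlParameter[] args)
{
    using (SqlConnection sconn = new SqlConnection(connstr))
    using (SqlCommand cmd = new SqlCommand(qry, sconn))
    {
        cmd.CommandType = CommandType.Text;
        cmd.Parameters.AddRange(args);
        sconn.Open();
        cmd.ExecuteNonQuery();
    } // connection and command are disposed even if ExecuteNonQuery throws
}

// ReportDocument implements IDisposable, so the report can live in a
// using block and will be disposed even if the loop throws:
private void SequentialProcess(int batch, int num)
{
    using (ReportDocument cryRpt = new ReportDocument())
    {
        cryRpt.Load(Application.StartupPath + "\\" + "Bill.rpt");
        // ... per-row export, then ExecuteNonQuery(sconnstr, qry, ...) ...
    }
}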
