
This is sample code; when it runs, it uses only about 46% of CPU time.

I'm using code based on this sample to process large TSV files (100 GB to 500 GB).
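In the real program the rows come from the TSV file rather than from a generator; roughly like the sketch below (the path and the body of ForAll are placeholders, not my actual code):

// Requires: System.IO, System.Linq
// File.ReadLines enumerates the file lazily, one line at a time,
// so even a 500 GB TSV is never loaded into memory at once.
File.ReadLines(@"input.tsv")                     // placeholder path
    .AsParallel()
    .Select(line => string.Join("\t", line.Split('\t')))
    .ForAll(processed => { /* real per-line work goes here */ });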

I tried using a BlockingCollection, but performance didn't improve.
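The BlockingCollection variant I tried looked roughly like this (a minimal sketch; the method name, bounded capacity, and consumer count are illustrative, not my exact values):

// Producer/consumer sketch: one producer feeds lines into a bounded
// BlockingCollection, several consumers split and rejoin them.
// Requires: System, System.Collections.Concurrent, System.Collections.Generic,
//           System.Linq, System.Threading.Tasks
private static void ProcessWithBlockingCollection(IEnumerable<string> lines)
{
    using (var queue = new BlockingCollection<string>(boundedCapacity: 1024))
    {
        var producer = Task.Run(() =>
        {
            foreach (var line in lines)
            {
                queue.Add(line);
            }
            queue.CompleteAdding();
        });

        var consumers = Enumerable.Range(0, Environment.ProcessorCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (var line in queue.GetConsumingEnumerable())
                {
                    // Same trivial work as in the sample: split and rejoin.
                    var parts = line.Split('\t');
                    var rejoined = string.Join("\t", parts);
                }
            }))
            .ToArray();

        producer.Wait();
        Task.WaitAll(consumers);
    }
}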

How can I improve the performance of this sample?

private static IEnumerable<Tuple<string, int>> GetEnumerator(string s, int count)
{
    for (var i = 0; i < count; i++)
    {
        yield return new Tuple<string, int>(s, i);
    }
}

private static void Test()
{
    var columns = 10000;
    var rows = 50000;
    var cols = new List<string>(columns);

    for (var i = 0; i < columns; i++)
    {
        cols.Add(i.ToString());
    }

    var line = string.Join("\t", cols);
    Func<Tuple<string, int>, string> action = li =>
    {
        var sl2 = li.Item1.Split('\t');
        return string.Join("\t", sl2);
    };
    var dt = DateTime.Now;
    GetEnumerator(line, rows).AsParallel().Select(action).ForAll(lline => {});
    Console.WriteLine("Time taken {0}", DateTime.Now - dt);
}

closed as off-topic by svick, Jeff Vanzella, Jamal, Lstor, Brian Reichle Aug 23 '13 at 7:55


This question appears to be off-topic because it is about example code, not about reviewing real code. –  svick Aug 22 '13 at 22:56

2 Answers

It doesn't make much sense to try to improve the performance of samples.

Both your action and the ForAll() delegate are very unrealistic, and having 100% CPU utilization isn't a reasonable goal either.

In the real world, you write code that actually does something useful, and only when it isn't fast enough do you try to optimize it. CPU utilization is a metric that might help you find a faster solution, but it isn't a goal by itself.

    
I have no problems with the other parts of my program; cutting out all the other parts didn't decrease execution time by more than 15%. –  Vlad M Aug 23 '13 at 6:29
    
@VladM Are you really saying that var sl2 = li.Item1.Split('\t'); return string.Join("\t", sl2); is a good representation of your real code? –  svick Aug 23 '13 at 11:07
    
Yes, I have different code variations for processing the split data, but they take less than 15% of the total execution time. –  Vlad M Aug 23 '13 at 12:43

I would have left this as a comment, but I can't due to low rep. I just wanted to say that:

  1. Don't use DateTime for benchmarking; use the Stopwatch class, which is made for that (see the sketch after this list).
  2. How complex are your strings? As far as I know, Parallel LINQ pays off when each item needs non-trivial computation; using it on something like "test\ttest" may be slower than sequential processing (or about the same, since PLINQ itself is optimized and falls back to sequential execution for some kinds of query patterns).
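For example, a minimal sketch of that measuring pattern (the data here is a small synthetic stand-in, and the split/join is the trivial workload from your sample):

// Requires: System, System.Diagnostics, System.Linq
var data = Enumerable.Range(0, 50000)
    .Select(r => string.Join("\t", Enumerable.Range(0, 100)))
    .ToList();

// Stopwatch is a monotonic, high-resolution timer; DateTime.Now is neither.
var sw = Stopwatch.StartNew();
data.AsParallel()
    .Select(line => string.Join("\t", line.Split('\t')))
    .ForAll(x => { });
sw.Stop();
Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
foreach (var line in data)
{
    var rejoined = string.Join("\t", line.Split('\t'));
}
sw.Stop();
Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

If the parallel and sequential timings come out close, the per-item work is probably too cheap for PLINQ to pay off.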
    
As you can see above, I test this on a 10k-column data row. I can't see any search in my sample. –  Vlad M Aug 22 '13 at 15:49
