
This is sample code; when it runs, it uses only about 46% of CPU time.

I'm using code based on this sample to process large TSV files (100 GB to 500 GB).
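In the real program the rows come from the TSV file rather than from a generator; roughly like the sketch below (the path and the body of ForAll are placeholders, not my actual code):

// Requires: System.IO, System.Linq
// File.ReadLines enumerates the file lazily, one line at a time,
// so even a 500 GB TSV is never loaded into memory at once.
File.ReadLines(@"input.tsv")                     // placeholder path
    .AsParallel()
    .Select(line => string.Join("\t", line.Split('\t')))
    .ForAll(processed => { /* real per-line work goes here */ });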

I tried using a BlockingCollection, but performance didn't improve.
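The BlockingCollection variant I tried looked roughly like this (a minimal sketch; the method name, bounded capacity, and consumer count are illustrative, not my exact values):

// Producer/consumer sketch: one producer feeds lines into a bounded
// BlockingCollection, several consumers split and rejoin them.
// Requires: System, System.Collections.Concurrent, System.Collections.Generic,
//           System.Linq, System.Threading.Tasks
private static void ProcessWithBlockingCollection(IEnumerable<string> lines)
{
    using (var queue = new BlockingCollection<string>(boundedCapacity: 1024))
    {
        var producer = Task.Run(() =>
        {
            foreach (var line in lines)
            {
                queue.Add(line);
            }
            queue.CompleteAdding();
        });

        var consumers = Enumerable.Range(0, Environment.ProcessorCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (var line in queue.GetConsumingEnumerable())
                {
                    // Same trivial work as in the sample: split and rejoin.
                    var parts = line.Split('\t');
                    var rejoined = string.Join("\t", parts);
                }
            }))
            .ToArray();

        producer.Wait();
        Task.WaitAll(consumers);
    }
}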

How can I improve the performance of this sample?

private static IEnumerable<Tuple<string, int>> GetEnumerator(string s, int count)
{
    for (var i = 0; i < count; i++)
    {
        yield return new Tuple<string, int>(s, i);
    }
}

private static void Test()
{
    var columns = 10000;
    var rows = 50000;
    var cols = new List<string>(columns);

    for (var i = 0; i < columns; i++)
    {
        cols.Add(i.ToString());
    }

    var line = string.Join("\t", cols);
    Func<Tuple<string, int>, string> action = li =>
    {
        var sl2 = li.Item1.Split('\t');
        return string.Join("\t", sl2);
    };
    var dt = DateTime.Now;
    GetEnumerator(line, rows).AsParallel().Select(action).ForAll(lline => {});
    Console.WriteLine("Time taken {0}", DateTime.Now - dt);
}

closed as off-topic by svick, Jeff Vanzella, Jamal, Lstor, Brian Reichle Aug 23 '13 at 7:55


This question appears to be off-topic because it is about example code, not about reviewing real code. –  svick Aug 22 '13 at 22:56

2 Answers

It doesn't make much sense to try to improve the performance of samples.

Both your action and the ForAll() delegate are very unrealistic, and having 100% CPU utilization isn't a reasonable goal either.

In the real world, you write code that actually does something useful, and only when it isn't fast enough do you try to optimize it. CPU utilization is a metric that might help you find a faster solution, but it isn't a goal by itself.

    
I have no problems with the other parts of my program; cutting out all the other parts didn't decrease execution time by more than 15%. –  Vlad M Aug 23 '13 at 6:29
    
@VladM Are you really saying that var sl2 = li.Item1.Split('\t'); return string.Join("\t", sl2); is a good representation of your real code? –  svick Aug 23 '13 at 11:07
    
Yes, I have different code variations for processing the split data, but they take less than 15% of the total execution time. –  Vlad M Aug 23 '13 at 12:43

I would have left this as a comment, but I can't due to low rep. I just wanted to say that:

  1. Don't use DateTime for benchmarking; use the Stopwatch class, which is made for that (see the sketch after this list).
  2. How complex are your strings? As far as I know, Parallel LINQ pays off when each item needs non-trivial computation; using it on something like "test\ttest" may be slower than sequential processing (or about the same, since PLINQ itself is optimized and falls back to sequential execution for some kinds of query patterns).
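For example, a minimal sketch of that measuring pattern (the data here is a small synthetic stand-in, and the split/join is the trivial workload from your sample):

// Requires: System, System.Diagnostics, System.Linq
var data = Enumerable.Range(0, 50000)
    .Select(r => string.Join("\t", Enumerable.Range(0, 100)))
    .ToList();

// Stopwatch is a monotonic, high-resolution timer; DateTime.Now is neither.
var sw = Stopwatch.StartNew();
data.AsParallel()
    .Select(line => string.Join("\t", line.Split('\t')))
    .ForAll(x => { });
sw.Stop();
Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
foreach (var line in data)
{
    var rejoined = string.Join("\t", line.Split('\t'));
}
sw.Stop();
Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

If the parallel and sequential timings come out close, the per-item work is probably too cheap for PLINQ to pay off.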
    
As you can see above, I test this on a 10k-column data row. I can't see any search in my sample. –  Vlad M Aug 22 '13 at 15:49
