I have a text file with 100000 pairs: word and frequency.
aba 20
baba 30
...
I parse this file and put the words in
Dictionary<string,double> dictionary;
And I want to execute some search + order logic in the following code:
Parallel.For(0, 15000, body =>
{
tempInputWord = Console.ReadLine();//from file source
var adviceWords = dictionary
.Where(p => p.Key.StartsWith(tempInputWord, StringComparison.Ordinal))
.OrderByDescending(ks => ks.Value)
.ThenBy(ks => ks.Key)
.Select(s => s.Key)
.Take(10).ToList();
//some output
}
The problem: This code must run in less than 10 seconds.
On my computer (core i5 2400, 8gb RAM) with Parallel.For() - about 37sec without Parallel - about 126 sec.
Can you give me some advice how to increase performance?
UPDATE 1: I change code with @CodesInChaos and some other advices
for (int i = 0; i < userWordQuantity; i++)
{
var searchWord =//take data from file(or other sources)
int startIndex = kvList.BinarySearch(new KeyValuePair<string, int>(searchWord, 0), comparer);
if (startIndex < 0)
startIndex = ~startIndex;
//32sec/19sec(parallel)
var adviceWords = kvList
//.AsParallel()
.Skip(startIndex)
.TakeWhile(p => p.Key.StartsWith(searchWord, StringComparison.Ordinal))
.OrderByDescending(ks => ks.Value)
.ThenBy(ks => ks.Key, StringComparer.Ordinal)//add ordinal
.Select(s => s.Key)
.Take(10)
.ToList();
//91sec
//var adviceWords = dictionary
// .Where(p => p.Key.StartsWith(searchWord, StringComparison.Ordinal))
// .OrderByDescending(ks => ks.Value)
// .ThenBy(ks => ks.Key,StringComparer.Ordinal)//add ordinal
// .Take(10)
// .ToList();
...
}
public class MyComparer : Comparer<KeyValuePair<string, int>>
{
public override int Compare(KeyValuePair<string, int> x, KeyValuePair<string, int> y)
{
return x.Key.CompareTo(y.Key);
}
}
Now in best case I have:
32 sec (19 sec for parallel)
UPDATE 2: I try to use Trie (library from here https://trienet.codeplex.com/)
Simplified, with only search, without "OrderBy().ThenBy()"
var trie = new Trie<KeyValuePair<string, int>>();
foreach (var item in dictionary)
{
trie.Add(item.Key, new KeyValuePair<string,int>(item.Key,item.Value));
}
for (int i = 0; i < userWordQuantity; i++)
{
var searchWord =//take data from file(or other sources)
var adviceWords = trie
.Retrieve(searchWord)
.ToList();
}
60 sec
If I use previous variant(simplified)
var kvList = dictionary.OrderBy(ks => ks.Key,StringComparer.Ordinal).ToList();
var comparer = new MyComparer();
for (int i = 0; i < userWordQuantity; i++)
{
var searchWord =//take data from file(or other sources)
int startIndex = kvList.BinarySearch(new KeyValuePair<string, int>(searchWord, 0), comparer);
if (startIndex < 0)
startIndex = ~startIndex;
var test1 = kvList
.Skip(startIndex)
.TakeWhile(p => p.Key.StartsWith(searchWord, StringComparison.Ordinal))
.ToList();
}
15.5 sec