- It's not mandatory but it's a widely accepted guideline. Break it only if you have a good reason to do so.
- No, they don't. Class should be
internal
(default) and static
(because it has not instance methods). Main()
method should be private to make clear that it won't be called outside this class but it's not mandatory (someone prefers public
because it's logically called from outside).
For the other points things are more complex that a paragraph...
You can create a generator to read from standard input (see very last paragraph for a discussion about this):
private static IEnumerable<string> ReadAllInputs()
{
while (true)
{
string line = Console.ReadLine();
if (line == null)
break;
yield return line;
}
}
You do not need ToLower()
on input, I do not even start to mention all the problems of case insensitive comparison if not done properly: a lower case string may be different from an upper case string (even in number of characters) but they may be compared equal. Don't put yourself in trouble and let the framework do the work for you:
var counts = new SortedDictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
Note that you do not need a SortedDictionary
here, ordering may be done (in this case) when printing output. If performance matters then you should profile by yourself.
When you increment word count you also increment local variable count
, not a big performance hit (and compiler may be smart enough to elide local variable) but it's about code clarity. You can do:
counts[word] = count + 1;
Or simply:
++counts[word];
Note that this innocent code ++counts[word]
(as well as original counts[word] = ++count
) when word
doesn't exist in dictionary (TryGetValue
returns false
) will perform another lookup vanishing part of the benefit of TryGetValue
for the first insertion! You have to explicitly call Add()
.
If you want you can refactor foreach
loop to print results with LINQ, not a big gain (IMO), let's see how:
Console.WriteLine(String.Join(Environment.NewLine,
counts.Select(x => $"{x.Key} {x.Value}")));
If we put everything together with then have something like this:
namespace Test {
static class TraditionalWordCountApp {
static void Main(string[] args) {
var counts = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
var wordRegex = new Regex(@"(?i)[\p{L}']+");
foreach (var match in ReadAllInputs().SelectMany(x => wordRegex.Match(x))) {
if (counts.TryGetValue(match.Value, out count))
counts[word] = count + 1;
else
counts.Add(match.Value, 1);
}
Console.WriteLine(String.Join(Environment.NewLine,
counts.OrderBy(x => x.Key).Select(x => $"{x.Key} {x.Value}")));
}
static IEnumerable<string> ReadAllInputs() {
while (true) { // Yes, in C# this is "safe"
string line = Console.ReadLine();
if (line == null)
break;
// No need to match any regex against an empty line
if (line.Length == 0)
continue;
yield return line;
}
}
}
}
All these done you may think that...there should be another way to do it. You can use GroupBy()
. Profile both versions to see which one has better performance (and the one you feel more easy to understand):
var counts = ReadAllInputs()
.SelectMany(x => regex.Match(x))
.GroupBy(x => x.Value, StringComparer.CurrentCultureIgnoreCase)
.Select(x => new { Word = x.Key, Count = x.Count() })
.OrderBy(x => x.Word);
Then you print it with:
Console.WriteLine(String.Join(Environment.NewLine,
counts.Select(x => $"{x.Word} {x.Count}")));
One final note about word counting. It may apply or not to your case but don't forget that word is a concept with different meanings in different languages. Topic is vast but try to paste a - for example - Japanese text in Microsoft Word and check what the word count is: number of characters (another vague concept). If you have to deal with international text you may need to write specific code for different cultures.
Comments. Few notes collected from comments.
Sometimes you see this kind of code:
while (Console.In.Peek() != -1)
yield return Console.ReadLine();
It works only if console input is redirected, when input is line buffered then it'll end immediately after first line. Nice and short but simply it doesn't work. If you really really love LINQ and you can't live without one-line functions then you may do this:
static IEnumerable<string> ReadAllInputs() {
return Enumerable.Range(0, Int32.MaxValue)
.Select(_ => Console.ReadLine())
.TakeWhile(x => x != null);
}
Can we make our reader more "generic"? The obvious idea is to use it with files. Maybe yes, maybe not. First of all a console application can have piped input (from file or from another application) then we may not need this feature. If we do then we should introduce a separate class InputReader
, implementation will be different for ConsoleInputReader
, FileInputReader
and HttpInputReader
(for example); one function with a TextReader
parameter will probably be the worst compromise for all our use-cases. What I'm suggesting is just to don't introduce complexity until you need it...