Tell me more ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I have created a quick console application, which creates 10000 younger people and 10000 older people and adds them to two separate lists. I then perform some queries to obtain information based on personalities.

class Program
{
    static void Main(string[] args)
    {
        private Random random = new Random();
        private List<Person> youngerPersons = new List<Person>();
        private List<Person> olderPersons = new List<Person>();

        private long samePersonalityMatches = 0;

        for (int i = 0; i < 10000; i++)
        {
            youngerPersons.Add(new Person(RandomString(10), DateTime.Now.ToString(), RandomString(4), random.Next(10, 50),(Person.PersonalityType)random.Next(0, 4), i));
        }

        for (int i = 0; i < 10000; i++)
        {
            olderPersons.Add(new Person(RandomString(10), DateTime.Now.ToString(), RandomString(4), random.Next(51, 120),(Person.PersonalityType)random.Next(0, 4), i));
        }

        //Example query 1
        foreach (Person person in youngerPersons.Where(w => w.Id > 4567 && w.Age > 70))
        {
            Console.WriteLine(person.Id);
        }

        //Example query  2
        foreach (Person person in youngerPersons)
        {
            foreach (Person olderPerson in olderPersons)
            {
                if (person.Personality == olderPerson.Personality)
                {
                    samePersonalityMatches++;
                }
            }
        }

        Console.WriteLine("Number of matches: " + samePersonalityMatches);
    }

    private static Random random = new Random((int)DateTime.Now.Ticks);

    private static string RandomString(int size)
    {
        StringBuilder builder = new StringBuilder();
        char ch;
        for (int i = 0; i < size; i++)
        {
            ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
            builder.Append(ch);
        }

        return builder.ToString();
    }
}

internal class Person
{
    public enum PersonalityType
    {
        Funny = 0,
        Boring = 1, 
        Serious = 2,
        Grumpy = 3, 
        Normal = 4
    }

    public Person(string name, string dateofbirth, string nickname, int age, PersonalityType personalityType, int id)
    {
        this.Name = name;
        this.Dateofbirth = dateofbirth;
        this.Nickname = nickname;
        this.Age = age;
        this.Id = id;
        this.Personality = personalityType;
    }

    public string Name { get; set; }

    public string Dateofbirth { get; set; }

    public string Nickname { get; set; }

    public int Age { get; set; }

    public int Id { get; set; }

    public PersonalityType Personality { get; set; }
}

Basically I would like to understand best practices to get the most performance out of both examples queries I put in the code. I have read some performance related material on using intersect, but i'm not sure which and when is best to use to get most performance. The lists are a bit OTT(size wise), but it made example two more interesting to run.

share|improve this question

3 Answers

up vote 2 down vote accepted

One is fine, very close to optimal and I'd leave it like you have it (remember, programmer time is more expensive than machine time).

For two you can do a lot better. You're walking the olderPersons list too many times, let's see if we can get it down to one traversal.

Dictionary<Personality, int> dict =
    youngerPersons.GroupBy(p => p.Personality)
                  .ToDictionary(g => g.Key, g => g.Count());
long samePersonalityMatches =
    olderPersons.Select(
                     q => dict.ContainsKey(q.Personality) ?
                     dict[q.Personality] : 0
                )
                .Sum();

And then, once we see this, we should say to ourselves, hey, this looks like a hash join! Then, we can write it more clearly:

long samePersonalityMatches = 
    youngerPersons.Join(
                      olderPersons,
                      p => p.Personality,
                      q => q.Personality,
                      (p, q) => null
                  )
                  .Count();

Any time you see the pattern nested loop, match over outer, inner you should be thinking of a join!

share|improve this answer
A join does the same thing and is a lot simpler... – Thomas Levesque 21 hours ago
1  
A join is the same thing. I am being pedagogic. – Jason 21 hours ago

The first example query is fine, there's probably nothing you can do to improve its performance.

In the second query, you could use a join:

samePersonalityMatches =
    youngerPersons.Join(olderPersons,
                        p => p.Personality,
                        p => p.Personality,
                        (left, right) => null) // the result doesn't really matter,
                                               // since we just want the count
                  .Count();

Or if you prefer the query syntax:

samePersonalityMatches =
    (from y in youngerPersons
     join o in olderPersons
         on y.Personality equals o.Personality
     select null).Count();

A join enumerates each list only once, so the complexity is O(M + N), instead of O(M * N)

share|improve this answer

The specific answers here from Jason and Thomas assume (quite reasonably given the way your question was phrased) that you don't know the questions before you have the data.

Just thought I'd point out, that if you do know the questions you are going to ask your data, you can keep running aggregates as you add young and old people to your lists so the answer is always ready.

This is a similar idea to the motivation for opting to build particular indexes in a database.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.