Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

Suppose a series of objects (presented here as tuples):

"a" | 1
"a" | 2
"b" | 3
"b" | 4
"a" | 5

There is no built in function (that I know of) to group by the first columns's sequence, that is, all the "a"'s in a row, then the "b"'s, then the one "a" alone. So that the groups become: {1,2},{3,4},{5} and not {1,2,5},{3,4}.

So I wrote this, which I'm submitting for review. I emulate all 8 variants of GroupBy which I present here as the two main variants (with and without result selector):

public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>
    (this TSource[] source,
     Func<TSource, TKey> keySelector,
     Func<TSource, TElement> elementSelector,
     IEqualityComparer<TKey> comparer)
{
    var newElement = source.Select(keySelector).ToArray().MakeSequentialKey(comparer).Zip(
        source.Select(elementSelector),
        (x, y) => new Tuple<int, TElement>(x, y));

    var groupElement = newElement.GroupBy(t => t.Item1, t => t.Item2);

    var newKey = source.Select(keySelector).ToArray().MakeSequentialKey(comparer).Zip(
        source.Select(keySelector),
        (x, y) => new Tuple<int, TKey>(x, y));

    var groupKey = newKey.GroupBy(t => t.Item1, t => t.Item2);

    return groupKey.Zip(groupElement, 
        (key,element) => new Grouping<TKey,TElement>(key.First(),element));
}

public static IEnumerable<TResult> GroupBySequence<TSource, TKey, TElement, TResult>
    (this TSource[] source,
     Func<TSource, TKey> keySelector,
     Func<TSource, TElement> elementSelector,
     Func<TKey, IEnumerable<TElement>, TResult> resultSelector,
     IEqualityComparer<TKey> comparer)
{
    return source.GroupBySequence(keySelector, 
        elementSelector, comparer).Select(x => resultSelector(x.Key, x));
}

Helper methods:

//Performs an operation over each consecutive item. Here used for determining equality.
public static IEnumerable<TResult> WithNext<T, TResult>
    (this T[] source, Func<T, T, TResult> operation)
{
    return source.Zip(source.Skip(1), operation);
}

//Makes the unique key
public static IEnumerable<int> MakeSequentialKey<T>
    (this T[] source, IEqualityComparer<T> comparer)
{
    if (source.Length == 0)
        return Enumerable.Empty<int>();

    return (new[] { 0 })
        .Concat(source.ToArray().WithNext<T, int>((x, y) => comparer.Equals(x, y) ? 0 : 1))
        .ToArray()
        .RunningSum();
}

//Sum of all previous elements up to each item of an array
public static IEnumerable<int> RunningSum(this int[] source)
{
    int cumul = 0;
    foreach (int i in source)
        yield return cumul += i;
}

And the Grouping class, which is pretty much a straightforward implementation of IGrouping:

public class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
    TKey key;
    IEnumerable<TElement> elements;

    public Grouping(TKey key, IEnumerable<TElement> elements)
    {
        this.key = key;
        this.elements = elements;
    }

    public TKey Key { get { return key; } }

    public IEnumerator<TElement> GetEnumerator()
    {
        return elements.GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return elements.GetEnumerator();
    }
}

Anticipated questions:

  • What is your general approach here? Generate a really unique key from the given key, group the elements AND the key with that, and either reform new groups with Grouping or apply result to it, so that the original key type is still used.
  • Why extent T[] and not IEnumerable<T>? Because usage like this means the elements are ordered. It would not make sense to use GroupBySequence over a Dictionary or a HashSet which both implement IEnumerable, because these two collections have AFAIK no notion of order. If there s a better or clearer way to indicate this, I don't know it.

I'm looking for criticism, suggestions on clarity and best practices. Thank you for your time.

share|improve this question
1  
"Pre-anticipated"? When would you ever anticipate anything except before it happens? ;) – Guffa Dec 3 '11 at 23:22
    
Oops, pleonasm... – MPelletier Dec 4 '11 at 17:45
    
If {1,2},{3,4},{5} is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed? – Steven Jeuris Dec 8 '11 at 14:28
    
@StevenJeuris My example is simplified, I was looking at making something generic and portable for various usage. I added no complexity that was not already present in GroupBy. – MPelletier Dec 8 '11 at 14:31
1  
@StevenJeuris Done: programmers.stackexchange.com/questions/124128/… – MPelletier Dec 9 '11 at 1:44
up vote 4 down vote accepted

As in my comment:

If {1,2},{3,4},{5} is your desired outcome, I don't understand all the complexity you added. Can't you just write a simple loop through the items which yields a result every time a group is passed?

public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>(
    this TSource[] source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    IEqualityComparer<TKey> comparer)
{
    if (source.Length == 0)
    {
        yield break;
    }

    TKey currentKey = keySelector(source.First());
    var foundItems = new List<TElement>();
    foreach (var item in source)
    {
        TKey key = keySelector(item);

        if (!comparer.Equals(currentKey, key))
        {                    
            yield return new Grouping<TKey, TElement>(currentKey, foundItems);
            currentKey = key;
            foundItems = new List<TElement>();
        }

        foundItems.Add(elementSelector(item));
    }

    if (foundItems.Count > 0)
    {
        yield return new Grouping<TKey, TElement>(currentKey, foundItems);
    }
}
share|improve this answer
    
I changed foundItems.Clear() to foundItems = new List<TElement>() since it can cause problems otherwise. – Steven Jeuris Dec 8 '11 at 17:50
    
Hmm, I guess the separate classes were really unnecessary. I should have seen that. – Jeff Mercado Dec 9 '11 at 0:06
    
@JeffMercado: While looking through your solution I was wondering whether it was somehow more reusable. You might be onto something of splitting the behavior of 'consecutive equal objects'. Combining this with Zip could result in concise syntax which is conceptually separated from the grouping. Didn't have time to attempt this. :) – Steven Jeuris Dec 9 '11 at 10:15

Here's another approach (less LINQ, slightly more code):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace testGrouping
{
    static class GroupBySequenceExtension
    {
        internal class Grouping<TKey, TVal> : IGrouping<TKey, TVal>
        {            
            public TKey Key { get; set; }
            public IEnumerable<TVal> Items { get; set; }
            public IEnumerator<TVal> GetEnumerator()
            {
                return Items.GetEnumerator();
            }
            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                return Items.GetEnumerator();
            }            
        }

        public static IEnumerable<IGrouping<TKey, TElement>> GroupBySequence<TSource, TKey, TElement>
            (this IEnumerable<TSource> source, 
                Func<TSource, TKey> keySelector,
                Func<TSource, TElement> elementSelector,
                IEqualityComparer<TKey> keyComparer)
        {
            TKey lastKey = default(TKey);
            bool atFirst = true;
            List<TElement> items = new List<TElement>();

            foreach (var item in source)
            {
                var key = keySelector(item);
                var element = elementSelector(item);
                if (atFirst)
                {
                    lastKey = key;
                    atFirst = false;
                }

                if (keyComparer.Equals(key, lastKey))
                {
                    items.Add(element);
                }
                else
                {
                    yield return new Grouping<TKey, TElement>
                    {
                        Key = lastKey, 
                        Items = items
                    };
                    items = new List<TElement>();
                    items.Add(element);
                }

                lastKey = key;
            }

            if (items.Count > 0)
            {
                yield return new Grouping<TKey, TElement>
                {
                    Key = lastKey,
                    Items = items
                };
            }
        }
    }
}
share|improve this answer
    
Although my code has less duplicate lines and doesn't rely on default(), this answer seems to carry the same idea of an easier solution than that of the OP. +1 for that. – Steven Jeuris Dec 8 '11 at 15:59

For this, I would not use LINQ at all here. There unfortunately aren't any methods available to make this task easier. In fact, I would say trying to use what currently is available makes it more complicated and inefficient than it has to. As you can see by all the helper methods and whatnot you had to add, you can see how complicated it can be.

Just make it work for IEnumerable<TSource>, there's no sense in restricting it to arrays. Sure HashSets and Dictionaries don't have a notion of ordering, but how else would you make this accessible to other "ordered" enumerables? You'd have to add overloads for each type you want to support since there aren't any interfaces implemented that has this distinction. It would be easier to let it work for all enumerables and leave when to use it up to the user of your code.

I actually have some code that does something like this. You only really need to keep track of the previous key that was added. If the current key matches the previous one, stick it in the same group, otherwise create a new one.

public static partial class EnumerableEx
{
    public static IEnumerable<IGrouping<TKey, TSource>> GroupByConsecutive<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        return GroupByConsecutive(source, keySelector, null as IEqualityComparer<TKey>);
    }

    public static IEnumerable<IGrouping<TKey, TSource>> GroupByConsecutive<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey> keyComparer)
    {
        return GroupByConsecutive(source, keySelector, Functions.Identity<TSource>, keyComparer);
    }

    public static IEnumerable<IGrouping<TKey, TElement>> GroupByConsecutive<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector)
    {
        return GroupByConsecutive(source, keySelector, elementSelector, null as IEqualityComparer<TKey>);
    }

    public static IEnumerable<IGrouping<TKey, TElement>> GroupByConsecutive<TSource, TKey, TElement>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TElement> elementSelector,
        IEqualityComparer<TKey> keyComparer)
    {
        return ConsecutiveGrouper<TKey, TElement>.Create(source, keySelector, elementSelector, keyComparer as IEqualityComparer<TKey>);
    }

    internal class ConsecutiveGrouper<TKey, TElement> : IEnumerable<IGrouping<TKey, TElement>>
    {
        internal static ConsecutiveGrouper<TKey, TElement> Create<TSource, TKey, TElement>(
            IEnumerable<TSource> source,
            Func<TSource, TKey> keySelector,
            Func<TSource, TElement> elementSelector,
            IEqualityComparer<TKey> keyComparer)
        {
            source.ThrowIfNull("source");
            keySelector.ThrowIfNull("keySelector");
            elementSelector.ThrowIfNull("elementSelector");

            var grouper = new ConsecutiveGrouper<TKey, TElement>(keyComparer);
            foreach (var item in source)
            {
                grouper.NextGroup(keySelector(item)).Add(elementSelector(item));
            }
            return grouper;
        }

        private ConsecutiveGrouper(IEqualityComparer<TKey> keyComparer)
        {
            this._keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
            this._groupings = new List<Grouping>();
            this._lastGrouping = null;
        }
        private IEqualityComparer<TKey> _keyComparer;
        private List<Grouping> _groupings;
        private Grouping _lastGrouping;

        private Grouping NextGroup(TKey key)
        {
            if (_lastGrouping == null)
            {
                _lastGrouping = new Grouping(key);
                _groupings.Add(_lastGrouping);
            }
            else if (!_keyComparer.Equals(_lastGrouping.Key, key))
            {
                _lastGrouping = new Grouping(key);
                _groupings.Add(_lastGrouping);
            }
            return _lastGrouping;
        }

        public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator()
        {
            return _groupings.GetEnumerator();
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        class Grouping : IGrouping<TKey, TElement>
        {
            internal Grouping(TKey key)
            {
                this.Key = key;
                this._elements = new List<TElement>();
            }
            public TKey Key { get; private set; }
            private List<TElement> _elements;

            internal void Add(TElement element)
            {
                _elements.Add(element);
            }

            public IEnumerator<TElement> GetEnumerator()
            {
                return _elements.GetEnumerator();
            }

            System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }
    }
}

p.s., Python's itertools.groupby() iterator has the same semantics that you want as far as I can tell. You might want to give a look at the equivalent implementation.

share|improve this answer
    
I'm not sure I agree that helper functions are a sign of complexity. I might as well have coded the helper stuff in the functions themselves. I had to restrain myself not to make RunningSum take a lambda as a parameter in case one would like to use it for something else. Some languages allow many variations of Running functions. Ultimately I thought it didn't make it clear though. – MPelletier Dec 7 '11 at 14:40
    
Also, what does null as IEqualityComparer<TKey> do? I've never seen that before. – MPelletier Dec 7 '11 at 14:41
    
My point about the helper functions is that you needed the helper functions to make using the LINQ functions work. If you had used a more direct approach using a different algorithm, you could have avoided them all together. The null as IEqualityComparer<TKey> is just me being explicit about what I'm passing the null as. It's just a simple cast as if I did (IEqualityComparer<TKey>)null. It's expecting an IEqualityComparer<TKey> there but I'm passing in null. It's helpful for distinguishing between intended calls to certain function overloads. – Jeff Mercado Dec 7 '11 at 19:55
    
I see! I knew it was a cast, I just wasn't sure what it was doing there. Thought maybe the compiler would redefine it to the type's default. So why get rid of the comparer? – MPelletier Dec 8 '11 at 2:26
    
Written that way, I was trying to minimize the amount of repetition in the code. A common pattern when it comes to providing an optional comparer is to use null to signify that the default comparer is to be used. In this case, the comparer is set in the constructor for ConsecutiveGrouper. It is cleaner this way than to have that repeated in every overload that required it. – Jeff Mercado Dec 8 '11 at 4:02

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.