Jacob Carpenter’s Weblog

January 7, 2010

Reading large xml files

Filed under: csharp, extension methods — Jacob @ 12:16 am

I’m a huge fan of System.Xml.Linq or “LINQ to XML”. However, some documents really are just too large to efficiently process with an in-memory representation like XDocument. For such documents, we need to consume the xml with a streaming XmlReader instead.

As much as I love System.Xml.Linq, that’s how much I hate XmlReader. I don’t know why it is, but every time I have to use an XmlReader, I have to go back to the documentation. And working with an XmlReader rarely feels fun.

At work (by the way, we’re hiring all kinds of developers), we’ve written some really nice code to make reading xml easier. But I’m not at work, and I wanted to process a large set of xml data—namely, the Project Gutenberg catalog in RDF/XML format. So I came up with a simple, efficient solution that I want to share.

The Project Gutenberg catalog data looks something like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:cc="http://web.resource.org/cc/"
         xmlns:pgterms="http://www.gutenberg.org/rdfterms/">

    <cc:Work rdf:about="">
        <cc:license rdf:resource="http://creativecommons.org/licenses/GPL/2.0/" />
    </cc:Work>

    <cc:License rdf:about="http://creativecommons.org/licenses/GPL/2.0/">
        <!-- cc:license children omitted -->
    </cc:License>

    <rdf:Description rdf:about="">
        <dc:created>
            <dcterms:W3CDTF>
                <rdf:value>2010-01-05</rdf:value>
            </dcterms:W3CDTF>
        </dc:created>
    </rdf:Description>

    <pgterms:etext rdf:ID="etext14624">
        <dc:publisher>&pg;</dc:publisher>
        <dc:title rdf:parseType="Literal">Santa Claus's Partner</dc:title>
        <dc:creator rdf:parseType="Literal">Page, Thomas Nelson, 1853-1922</dc:creator>
        <pgterms:friendlytitle rdf:parseType="Literal">Santa Claus's Partner by Thomas Nelson Page</pgterms:friendlytitle>
        <dc:language><dcterms:ISO639-2><rdf:value>en</rdf:value></dcterms:ISO639-2></dc:language>
        <dc:subject><dcterms:LCSH><rdf:value>Christmas stories</rdf:value></dcterms:LCSH></dc:subject>
        <dc:subject><dcterms:LCC><rdf:value>PZ</rdf:value></dcterms:LCC></dc:subject>
        <dc:created><dcterms:W3CDTF><rdf:value>2005-01-06</rdf:value></dcterms:W3CDTF></dc:created>
        <dc:rights rdf:resource="&lic;" />
    </pgterms:etext>

    <!-- etc. -->

</rdf:RDF>

Let’s first look at the wrong way to read this data:

static void Main()
{
    XNamespace nsGutenbergTerms = "http://www.gutenberg.org/rdfterms/";
    XNamespace nsRdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    XDocument doc = XDocument.Load("catalog.rdf");
    foreach (XElement etext in doc.Root.Elements(nsGutenbergTerms + "etext"))
    {
        string id = (string) etext.Attribute(nsRdf + "ID");
        string title = (string) etext.Element(nsGutenbergTerms + "friendlytitle");

        Console.WriteLine("{0}: {1}", id, title);
    }
}

A couple of problems:

  1. speed—the program sits around for 5 seconds or so before outputting anything, while it loads the 128MB xml file into memory.
  2. memory usage—loading the 128MB file pushes the memory usage from 10,328K to 731,832K (as reported in task manager). I don’t want to read too much into that value, but we can certainly agree that loading the whole file into memory at once isn’t optimal.

This is the worst of both worlds: the program is slower than it needs to be, and it uses more memory than it should.

… but did I mention that I love LINQ to XML? Processing each etext element as an XElement instance is really convenient.

Ideally, we would want to combine the efficiency of reading the large xml file with an XmlReader with the convenience of handling each etext element as an XElement instance.

Cue Patrick Stewart saying, “Make it so”:

static void Main()
{
    XNamespace nsGutenbergTerms = "http://www.gutenberg.org/rdfterms/";
    XNamespace nsRdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    using (XmlReader reader = XmlReader.Create("catalog.rdf",
        new XmlReaderSettings { ProhibitDtd = false }))
    {
        // move the reader to the start of the content and read the root element's start tag
        //   that is, the reader is positioned at the first child of the root element
        reader.MoveToContent();
        reader.ReadStartElement("RDF", nsRdf.NamespaceName);

        foreach (XElement etext in reader.ReadElements(nsGutenbergTerms + "etext"))
        {
            string id = (string) etext.Attribute(nsRdf + "ID");
            string title = (string) etext.Element(nsGutenbergTerms + "friendlytitle");

            Console.WriteLine("{0}: {1}", id, title);
        }
    }
}

Apart from noticing the similarity between this and the previous code block, the most interesting part of this code is the ReadElements extension method.

Before calling ReadElements, the code positions the reader on the first child of the root element. Then, ReadElements is called with an XName referring to the etext element. All of the etext elements are returned as a sequence.

This is exactly what I want: the program starts processing etext elements nearly instantly, and the memory utilization is barely noticeable.

Let’s look at the implementation of ReadElements:

/// <summary>
/// Returns a sequence of <see cref="XElement">XElements</see> corresponding to the currently
/// positioned element and all following sibling elements which match the specified name.
/// </summary>
/// <param name="reader">The xml reader positioned at the desired hierarchy level.</param>
/// <param name="elementName">An <see cref="XName"/> representing the name of the desired element.</param>
/// <returns>A sequence of <see cref="XElement">XElements</see>.</returns>
/// <remarks>At the end of the sequence, the reader will be positioned on the end tag of the parent element.</remarks>
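public static IEnumerable<XElement> ReadElements(this XmlReader reader, XName elementName)
{
    // visit the current node and its following siblings; stop at the parent's end tag or the end of the input
    while (reader.ReadState == ReadState.Interactive && reader.NodeType != XmlNodeType.EndElement)
    {
        // compare LocalName, not Name: Name includes the prefix (e.g. "pgterms:etext")
        if (reader.NodeType == XmlNodeType.Element &&
            reader.LocalName == elementName.LocalName && reader.NamespaceURI == elementName.NamespaceName)
            yield return (XElement) XNode.ReadFrom(reader); // leaves the reader just past the element
        else
            reader.Skip(); // step over non-matching elements (and their children), whitespace, comments
    }
}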

The documentation comments should be pretty self-explanatory, but it’s probably important to call attention to the side effects: ReadElements expects an intentionally positioned xml reader. Once ReadElements is done returning XElements, the reader will be positioned at the end element of the initially positioned element’s parent.

I should also point out it would be trivial to add an overload of ReadElements that didn’t take an XName and simply returned a sequence of the initially positioned element and all of its following siblings. But I don’t need that method yet, so I didn’t write it.
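
For illustration, here is a minimal sketch of what such an overload could look like. The class name XmlReaderSketch and the tiny in-memory document are assumptions made for the example, not code from my project. It walks the sibling nodes by hand instead of calling ReadToNextSibling, so that elements with no whitespace between them are not skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class XmlReaderSketch
{
    // Hypothetical overload: yields the currently positioned element and every
    // following sibling element, whatever its name.
    public static IEnumerable<XElement> ReadElements(this XmlReader reader)
    {
        // Stop at the parent's end tag or at the end of the input.
        while (reader.ReadState == ReadState.Interactive && reader.NodeType != XmlNodeType.EndElement)
        {
            if (reader.NodeType == XmlNodeType.Element)
                yield return (XElement) XNode.ReadFrom(reader); // leaves the reader just past the element
            else
                reader.Read(); // step over whitespace, comments, text, etc.
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const string xml = "<root><a>1</a><b>2</b> <!-- note --> <c/></root>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            reader.ReadStartElement("root");

            foreach (XElement element in reader.ReadElements())
                Console.WriteLine(element.Name);   // prints a, b, c (one per line)

            Console.WriteLine(reader.NodeType);    // prints EndElement
        }
    }
}
```

The last line shows the side effect described above: once the sequence is exhausted, the reader sits on the end tag of the parent element.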

ReadElements will certainly allow me to process this large xml file more efficiently and easily than exclusively using either an XDocument or an XmlReader. Hopefully this method will be helpful to some of you, too.

April 23, 2008

C# abuse of the day: SwitchOnType

Filed under: csharp, extension methods — Jacob @ 5:30 pm

Today I encountered a situation where I wanted to switch based on a type. Maybe I stayed up a little too late reading Foundations of F# last night.

While this is certainly no pattern matching, it didn’t seem like terrible C#:

DefinitionBase definitionBase = /*...*/;

var targetProperty = definitionBase.SwitchOnType(
        (ColumnDefinition col) => ColumnDefinition.WidthProperty,
        (RowDefinition row) => RowDefinition.HeightProperty);

Note that the lambdas require type decoration (you really don’t want to explicitly declare the generic parameters on this method).

Here’s the implementation (taking two Func projections—feel free to overload to your heart’s content):

public static TResult SwitchOnType<T, T1, T2, TResult>(this T source,
    Func<T1, TResult> act1, Func<T2, TResult> act2)
{
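    // cast via object: C# does not allow a direct cast between unrelated type parameters
    if (source is T1)
        return act1((T1) (object) source);

    if (source is T2)
        return act2((T2) (object) source);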

    throw new InvalidOperationException("No matching delegate found");
}

As you can see from the implementation, the method returns the result of the first delegate for which source can be converted into a parameter.

For a default case, add a final delegate that takes object.
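
A small sketch of that default case, using only the two-delegate method. The Describe helper and the SwitchOnTypeSketch class are made up for the example. Because the delegates are tried in order, the object lambda only runs when the more specific one does not match.

```csharp
using System;

public static class SwitchOnTypeSketch
{
    // Same shape as the two-delegate method; the casts go through object
    // because C# does not allow a direct cast between unrelated type parameters.
    public static TResult SwitchOnType<T, T1, T2, TResult>(this T source,
        Func<T1, TResult> act1, Func<T2, TResult> act2)
    {
        if (source is T1)
            return act1((T1) (object) source);

        if (source is T2)
            return act2((T2) (object) source);

        throw new InvalidOperationException("No matching delegate found");
    }

    // Hypothetical helper: one specific case, then a catch-all taking object.
    public static string Describe(object value)
    {
        return value.SwitchOnType(
            (string s) => "string of length " + s.Length,
            (object o) => "something else: " + o);   // default case
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SwitchOnTypeSketch.Describe("abc")); // string of length 3
        Console.WriteLine(SwitchOnTypeSketch.Describe(42));    // something else: 42
    }
}
```

Note that a null source matches neither delegate (the is operator is false for null), so the method still throws in that case.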

April 16, 2008

PC#1: A solution

Filed under: challenge, csharp, extension methods, LINQ — Jacob @ 12:21 pm

So, when I initially posed the programming challenge #1 I stated:

… since I intended to output HTML, ASP.NET seemed a logical choice. But I was amazed at the amount of code required for such a seemingly simple task (not to mention how ugly code containing <% and %> is!).

Well, it turns out, using plain old C# with a little LINQ to XML functional construction made my solution a lot nicer.

Prerequisites

I created a few DateTimeExtensions to enhance readability, though I could have easily inlined the implementation of each of those methods without any LOC impact.

public static class DateTimeExtensions
{
    public static DateTime ToFirstDayOfMonth(this DateTime dt)
    {
        return new DateTime(dt.Year, dt.Month, 1);
    }
    public static DateTime ToLastDayOfMonth(this DateTime dt)
    {
        return new DateTime(dt.Year, dt.Month, DateTime.DaysInMonth(dt.Year, dt.Month));
    }
    public static DateTime ToFirstDayOfWeek(this DateTime dt)
    {
        return dt.AddDays(-((int) dt.DayOfWeek));
    }
    public static DateTime ToLastDayOfWeek(this DateTime dt)
    {
        return dt.AddDays(6 - ((int) dt.DayOfWeek));
    }
}
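
To make the helpers concrete, here is a worked example for one fixed date. The date (April 16, 2008) is chosen arbitrarily for illustration, and the helpers are repeated so the sketch compiles on its own. The week is assumed to start on Sunday, which is what the DayOfWeek arithmetic implies.

```csharp
using System;
using System.Globalization;

public static class DateTimeExtensions
{
    public static DateTime ToFirstDayOfMonth(this DateTime dt)
    {
        return new DateTime(dt.Year, dt.Month, 1);
    }
    public static DateTime ToLastDayOfMonth(this DateTime dt)
    {
        return new DateTime(dt.Year, dt.Month, DateTime.DaysInMonth(dt.Year, dt.Month));
    }
    public static DateTime ToFirstDayOfWeek(this DateTime dt)
    {
        return dt.AddDays(-((int) dt.DayOfWeek));   // DayOfWeek.Sunday == 0
    }
    public static DateTime ToLastDayOfWeek(this DateTime dt)
    {
        return dt.AddDays(6 - ((int) dt.DayOfWeek));
    }
}

public static class Program
{
    public static void Main()
    {
        DateTime day = new DateTime(2008, 4, 16);

        // First and last cells of the calendar grid for April 2008.
        DateTime startCalendar = day.ToFirstDayOfMonth().ToFirstDayOfWeek();
        DateTime endCalendar = day.ToLastDayOfMonth().ToLastDayOfWeek();

        Console.WriteLine(startCalendar.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2008-03-30
        Console.WriteLine(endCalendar.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));   // 2008-05-03
        Console.WriteLine((endCalendar - startCalendar).Days + 1);                             // 35
    }
}
```

The grid therefore spans 35 days, or five full weeks, which is why the rows can be produced by cutting the day sequence into groups of seven.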

I also relied on the Slice extension method I’ve previously blogged about.

Solution

static void Main(string[] args)
{
    DateTime today = DateTime.Today;
    DateTime firstDayOfMonth = today.ToFirstDayOfMonth();
    DateTime startCalendar = firstDayOfMonth.ToFirstDayOfWeek();
    DateTime lastDayOfMonth = today.ToLastDayOfMonth();
    DateTime endCalendar = lastDayOfMonth.ToLastDayOfWeek();

    var calendarPrefix =
        from day in Enumerable.Range(startCalendar.Day, (firstDayOfMonth - startCalendar).Days)
        select new XElement("td", new XAttribute("class", "prevMonth"), day);
    var calendarMonth =
        from day in Enumerable.Range(1, lastDayOfMonth.Day)
        select new XElement("td", day == today.Day ? new XAttribute("class", "today") : null, day);
    var calendarSuffix =
        from day in Enumerable.Range(1, (endCalendar - lastDayOfMonth).Days)
        select new XElement("td", new XAttribute("class", "nextMonth"), day);

    var calendar = calendarPrefix.Concat(calendarMonth).Concat(calendarSuffix);

    var table = new XElement("table",
        new XElement("thead",
            new XElement("tr",
                from offset in Enumerable.Range(0, 7)
                select new XElement("th", startCalendar.AddDays(offset).ToString("ddd")))),
        new XElement("tbody",
            from week in calendar.Slice(7)
            select new XElement("tr", week)));

    Console.WriteLine(table);
}

I’d love to see more ways to solve this. If you’ve got a simpler or more beautiful implementation in your favorite programming language/web application framework, let me know in the comments of the original post.

April 4, 2008

Euler 14

Filed under: csharp, Euler, extension methods, LINQ, Ruby — Jacob @ 12:41 pm

When I read Dustin Campbell’s latest post, I couldn’t help but feel a bit like Steve Carell in this clip from The Office. While his solution is an admirably close port of the original F# solution, it makes me feel a little bit yucky.

Of course, it’s completely hypocritical of me to say so, since I’ve abused C# to make it exhibit F#-like behavior in the past.

But Project Euler invites elegantly simple solutions (like the original F#). Different languages have different idioms, and a literal port typically doesn’t exhibit the same beauty as the original.

If I were solving Project Euler 14 in C# (with “elegance and brevity in mind”), my code would look more like:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Euler14
{
    class Program
    {
        static void Main(string[] args)
        {
            var iterativeSequences = from start in 1.To(1000000L)
                select new
                {
                    Start = start,
                    Length = SequenceUtility.Generate(start,
                        n => n % 2 == 0 ? n / 2 : 3 * n + 1,
                        n => n == 1).Count()
                };

            Stopwatch sw = Stopwatch.StartNew();

            var longestSequence = iterativeSequences.Aggregate(
                (longest, current) => current.Length > longest.Length ? current : longest
            );

            sw.Stop();

            Console.WriteLine("Longest sequence starts with {0:#,#} (found in {1:#,#.000} seconds)",
                longestSequence.Start, (float) sw.ElapsedTicks / (float) Stopwatch.Frequency);
        }
    }

    public static class SequenceUtility
    {
        // Can also overload To by changing the end value's type;
        // example: "int excludedEnd" returns "IEnumerable<int>"
        public static IEnumerable<long> To(this int start, long excludedEnd)
        {
            for (long i = start; i < excludedEnd; i++)
                yield return i;
        }

        public static IEnumerable<T> Generate<T>(T first, Func<T, T> getNext, Func<T, bool> isLast)
        {
            T value = first;
            yield return value;

            while (!isLast(value))
            {
                value = getNext(value);
                yield return value;
            }
        }
    }
}

Which runs in an acceptable ~5 seconds on my machine.
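
To see what Generate produces for a single starting value, here is a small sketch that prints one chain. The Collatz helper name and the SequenceSketch class are assumptions for the example; the starting value 13 is the one used in the Project Euler problem statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceSketch
{
    public static IEnumerable<T> Generate<T>(T first, Func<T, T> getNext, Func<T, bool> isLast)
    {
        T value = first;
        yield return value;

        while (!isLast(value))
        {
            value = getNext(value);
            yield return value;
        }
    }

    // Hypothetical helper wrapping the two lambdas used in the query.
    public static IEnumerable<long> Collatz(long start)
    {
        return Generate(start, n => n % 2 == 0 ? n / 2 : 3 * n + 1, n => n == 1);
    }
}

public static class Program
{
    public static void Main()
    {
        long[] chain = SequenceSketch.Collatz(13).ToArray();

        // 13 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
        Console.WriteLine(string.Join(" -> ", chain.Select(n => n.ToString()).ToArray()));
        Console.WriteLine(chain.Length); // 10
    }
}
```

Because Generate yields the terminating value as well, Count() on the sequence is the full chain length including both the start and the final 1.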

[Okay. You caught me: I stole the idea for that To extension method from Ruby’s upto. I’m a huge hypocrite and take back everything I said before.

Do invest time learning the idioms of other programming languages, and try applying them to your native language. You may discover something beautiful, after all.]

March 13, 2008

Dictionary To Anonymous Type

Filed under: csharp, extension methods, LINQ — Jacob @ 5:34 pm

There’s some buzz about how cool it is to initialize a Dictionary from an anonymous type instance. Roy Osherove recently wrote about it, though he attributes the technique to the ASP.NET MVC framework. Alex Henderson (whose blog I just subscribed to) also came up with an inspiring use of Lambda expressions to initialize Dictionaries (don’t miss the related posts at the bottom).

But I haven’t seen anyone do the reverse: initialize an anonymous type instance from a Dictionary.

Until now.

Prerequisites

public static class DictionaryUtility
{
    public static TValue GetValueOrDefault<TKey, TValue>(this IDictionary<TKey, TValue> dict, TKey key)
    {
        TValue result;
        dict.TryGetValue(key, out result);
        return result;
    }
}

Show me the code!

public static class AnonymousTypeUtility
{
    public static T ToAnonymousType<T, TValue>(this IDictionary<string, TValue> dict, T anonymousPrototype)
    {
        // get the sole constructor
        var ctor = anonymousPrototype.GetType().GetConstructors().Single();

        // conveniently named constructor parameters make this all possible...
        var args = from p in ctor.GetParameters()
            let val = dict.GetValueOrDefault(p.Name)
            select val != null && p.ParameterType.IsAssignableFrom(val.GetType()) ? (object) val : null;

        return (T) ctor.Invoke(args.ToArray());
    }
}

Notice anonymousPrototype. This is a technique called casting by example, coined by Mads Torgersen (of the C# team) in the comments of this post.

Since you can’t ever explicitly refer to the type of an anonymous type, you have to provide an example instance. Using the default keyword (default(T)), we can strongly type the properties of our prototype object without a bunch of null casting.
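
Stripped of the dictionary, casting by example can be sketched in a few lines. CastByExample and CastSketch are illustrative names, not part of any library. The example argument is never read; it only lets the compiler infer T.

```csharp
using System;

public static class CastSketch
{
    // The example argument exists purely for type inference.
    public static T CastByExample<T>(this object value, T example)
    {
        return (T) value;
    }
}

public static class Program
{
    public static void Main()
    {
        // Pretend the static type was lost, e.g. after a trip through an object-typed API.
        object boxed = new { Name = "Jacob", Age = 26 };

        // Within one assembly, anonymous types with the same property names and types
        // (in the same order) are the same type, so the cast succeeds.
        var typed = boxed.CastByExample(new { Name = default(string), Age = default(int) });

        Console.WriteLine(typed.Name + " is " + typed.Age); // Jacob is 26
    }
}
```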

Here’s some sample code to get you going:

class Program
{
    static void Main(string[] args)
    {
        var dict = new Dictionary<string, object> {
            { "Name", "Jacob" },
            { "Age", 26 },
            { "FavoriteColors", new[] { ConsoleColor.Blue, ConsoleColor.Green } },
        };

        var person = dict.ToAnonymousType(
            new
            {
                Name = default(string),
                Age = default(int),
                FavoriteColors = default(IEnumerable<ConsoleColor>),
                Birthday = default(DateTime?),
            });

        Console.WriteLine(person);
        foreach (var color in person.FavoriteColors)
            Console.WriteLine(color);
    }
}

And thanks to anonymous types overriding ToString(), our program reasonably outputs:

{ Name = Jacob, Age = 26, FavoriteColors = System.ConsoleColor[], Birthday =  }
Blue
Green

Notice that the types don’t even need to exactly match! The dictionary’s “FavoriteColors” value is a ConsoleColor[]. But the anonymous type has an IEnumerable<ConsoleColor> property.
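
The matching rule comes from Type.IsAssignableFrom. Here is a quick sketch of what it does and does not accept. CanAssign is a hypothetical helper that mirrors the check in ToAnonymousType.

```csharp
using System;
using System.Collections.Generic;

public static class AssignabilitySketch
{
    // Mirrors the test used when picking constructor arguments.
    public static bool CanAssign(Type target, object value)
    {
        return value != null && target.IsAssignableFrom(value.GetType());
    }
}

public static class Program
{
    public static void Main()
    {
        // Arrays implement IEnumerable<T>, so this is accepted.
        Console.WriteLine(AssignabilitySketch.CanAssign(
            typeof(IEnumerable<ConsoleColor>), new[] { ConsoleColor.Blue })); // True

        // Everything is assignable to object.
        Console.WriteLine(AssignabilitySketch.CanAssign(typeof(object), "text")); // True

        // Numeric conversions are not considered: a boxed int is not a long.
        Console.WriteLine(AssignabilitySketch.CanAssign(typeof(long), 26)); // False
    }
}
```

So a prototype property declared as long would not pick up an int value from the dictionary; the check rejects it and the value is not passed through.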

Enjoy!

February 4, 2008

DisposeAfter

Filed under: csharp, extension methods — Jacob @ 9:17 am

My development team at work has recently started a technical blog: http://code.logos.com.

I just contributed my first (real) post: DisposeAfter.

Enjoy.

January 2, 2008

C# abuse of the day: Functional library implemented with lambdas

Filed under: csharp, extension methods, functional programming — Jacob @ 6:09 pm

With all the cool kids writing about F# and functional programming, I started thinking about a possible blog post.

One of my goals was to use lambda syntax to express the functional method implementations. To my eyes, lambdas are great at succinctly expressing higher-order functions. And using the => operator multiple times in a single line rocks! Without thinking about it too hard, I figured I could use static readonly fields to accomplish this goal.

Once I started writing the example code, though, I ran into a bit of a hitch with the generic parameters for the fields’ Func types.

Joseph Albahari (or perhaps his brother and coauthor, Ben) puts it well in C# 3.0 In a Nutshell [which, incidentally, is proving to be a great C# book] (pg. 99):

Generic parameters can be introduced in the declaration of classes, structs, interfaces, delegates […], and methods. Other constructs such as properties [or fields] cannot introduce a generic parameter, but can use one.

Meaning, if I want to declare a field that contains a generic parameter, that generic parameter has to be declared by the containing type.

Specifically:

public static class Functional
{
    public static readonly Func<Func<X, Y, Z>, Func<X, Func<Y, Z>>> Curry =
        fn => x => y => fn(x, y);
}

Won’t compile. Instead you’ll get a few of these:

error CS0246: The type or namespace name 'X' could not be found (are you missing a using directive or an assembly reference?)

You’d need to modify the class definition like so:

public static class Functional<X, Y, Z>

Now we could add the parameters to our Functional class, but then the calling code would be hideous:

Func<int, int, int> add = (x, y) => x + y;
Func<int, Func<int, int>> addCurried = Functional<int, int, int>.Curry(add);

I mean, I know this is C#, but that is just way too much type decoration. Especially since the three type arguments to Functional should all be inferable.

Ideally, the calling code should be an extension method:

Func<int, int, int> add = (x, y) => x + y;
Func<int, Func<int, int>> addCurried = add.Curry();

And then it dawned on me: we can define generic extension methods on a static FunctionalEx class and delegate the implementation to a nested generic class (with generic fields).

That is, we can hide the ugly syntax of invoking a delegate field of a generic class, while utilizing the ugly syntax of implementing our functional methods using lambdas!

public static class FunctionalEx
{
    public static Func<T1, Func<T2, TResult>> Curry<T1, T2, TResult>(this Func<T1, T2, TResult> fn)
    {
        return Implementation<T1, T2, TResult>.curry(fn);
    }

    public static Func<T2, T1, TResult> Flip<T1, T2, TResult>(this Func<T1, T2, TResult> fn)
    {
        return Implementation<T1, T2, TResult>.flip(fn);
    }

    private static class Implementation<X, Y, Z>
    {
        public static readonly Func<Func<X, Y, Z>, Func<X, Func<Y, Z>>> curry = 
            fn => x => y => fn(x, y);
        
        public static readonly Func<Func<X, Y, Z>, Func<Y, X, Z>> flip =
            fn => (y, x) => fn(x, y);
    }
}

Notice how the Curry extension method is implemented by the curry field of the nested generic Implementation class. Also notice how [un-?]readable the lambda implementation is. (Seriously though, if you look at the flip lambda long enough, it should start to make sense.)

Here’s some (far from practical) sample calling code to help:

class Program
{
    static void Main(string[] args)
    {
        Func<int, int, int> add = (x, y) => x + y;
        Func<int, Func<int, int>> addCurried = add.Curry();
        Func<int, int> increment = addCurried(1);

        Func<int, int, int> subtract = (x, y) => x - y;
        Func<int, Func<int, int>> subtractFlipped = subtract.Flip().Curry();
        Func<int, int> decrement = subtractFlipped(1);        

        Console.WriteLine("Expected: {0}; Actual {1}", 5, add(2, 3));
        Console.WriteLine("Expected: {0}; Actual {1}", 7, increment(6));

        Console.WriteLine("Expected: {0}; Actual {1}", 6, subtract(9, 3));
        Console.WriteLine("Expected: {0}; Actual {1}", 4, decrement(5));
    }
}

The output of which is:

Expected: 5; Actual 5
Expected: 7; Actual 7
Expected: 6; Actual 6
Expected: 4; Actual 4

I never did get around to writing that post on functional programming, but I now know how I’m going to implement the library if I do.

If you want to read more on currying in C#, I’d recommend Dustin’s post. In fact I should probably skip writing any further posts on functional programming with C#, since he’s got that topic pretty thoroughly covered.

November 24, 2007

Another set of extension methods

Filed under: csharp, extension methods, Ruby — Jacob @ 3:16 pm

In addition to the interesting diversions we’ve taken, I do want to continue presenting potentially useful code samples too. So here’s a fresh set of Ruby-inspired extensions:

public static IEnumerable<IndexValuePair<T>> WithIndex<T>(this IEnumerable<T> source)
{
    int position = 0;
    foreach (T value in source)
        yield return new IndexValuePair<T>(position++, value);
}    

public static void Each<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (T item in source)
        action(item);
}

public static void EachWithIndex<T>(this IEnumerable<T> source, Action<T, int> action)
{
    Each(WithIndex(source), pair => action(pair.Value, pair.Index));
}

I’ll include the mundane definition of IndexValuePair<T> (along with parameter validation) at the end of this post. But spend some time looking at these very simple methods.

diagram 1

Notice how once we’ve defined WithIndex and Each, we can combine them to define EachWithIndex. When we chain Each to the result of WithIndex, the type of the Action must be converted accordingly:

diagram 2

This is easily accomplished by the statement:

pair => action(pair.Value, pair.Index)

So with the added definition of IndexValuePair<T> and parameter validation, we add to our collection extensions the following:

using System;
using System.Collections.Generic;

public static class CollectionEx
{
    public static void Each<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (action == null)
            throw new ArgumentNullException("action");

        foreach (T item in source)
            action(item);
    }

    public static void EachWithIndex<T>(this IEnumerable<T> source, Action<T, int> action)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (action == null)
            throw new ArgumentNullException("action");

        Each(WithIndexIterator(source), pair => action(pair.Value, pair.Index));
    }

    public static IEnumerable<IndexValuePair<T>> WithIndex<T>(this IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        return WithIndexIterator(source);
    }

    private static IEnumerable<IndexValuePair<T>> WithIndexIterator<T>(IEnumerable<T> source)
    {
        int position = 0;
        foreach (T value in source)
            yield return new IndexValuePair<T>(position++, value);
    }
}

public struct IndexValuePair<T>
{
    public IndexValuePair(int index, T value)
    {
        m_index = index;
        m_value = value;
    }

    public int Index
    {
        get { return m_index; }
    }
    public T Value
    {
        get { return m_value; }
    }

    readonly int m_index;
    readonly T m_value;
}
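
A short usage sketch follows. The definitions are condensed (no parameter validation, public fields instead of properties) so that the example compiles on its own. CollectionSketch and the sample data are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public struct IndexValuePair<T>
{
    public readonly int Index;
    public readonly T Value;

    public IndexValuePair(int index, T value)
    {
        Index = index;
        Value = value;
    }
}

public static class CollectionSketch
{
    public static IEnumerable<IndexValuePair<T>> WithIndex<T>(this IEnumerable<T> source)
    {
        int position = 0;
        foreach (T value in source)
            yield return new IndexValuePair<T>(position++, value);
    }

    public static void Each<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
            action(item);
    }

    public static void EachWithIndex<T>(this IEnumerable<T> source, Action<T, int> action)
    {
        // Adapt the two-argument action to the one-argument shape Each expects.
        source.WithIndex().Each(pair => action(pair.Value, pair.Index));
    }
}

public static class Program
{
    public static void Main()
    {
        string[] languages = { "C#", "Ruby", "F#" };

        // Prints "0: C#", "1: Ruby", "2: F#" on separate lines.
        languages.EachWithIndex((name, index) => Console.WriteLine("{0}: {1}", index, name));
    }
}
```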

November 16, 2007

Ruby inspired extension method

Filed under: csharp, extension methods, Ruby — Jacob @ 12:00 am

Reading Dustin Campbell’s latest post reminded me that I really like Ruby’s Enumerable mixin.

One of the compelling methods in that type is each_slice (and the related enum_slice). The each/enum distinction to a C# developer can be understood as the distinction between a void method that takes a delegate, and an iterator method (a method that uses yield return) that returns an IEnumerable.

With the advent of C# 3.0 and the built-in Enumerable extension methods, returning an IEnumerable is a pretty powerful construct—developers aren’t limited to just foreach-ing over the results anymore.

So here’s a C# Slice extension method that is roughly the equivalent of Ruby’s enum_slice method:

using System;
using System.Collections.Generic;

namespace RubyInspiredExtensions
{
    public static class CollectionEx
    {
        /// <summary>
        /// Iterates the specified sequence returning arrays of each slice of <paramref name="size"/> elements.
        /// The last array may contain fewer than <paramref name="size"/> elements.
        /// </summary>
        /// <typeparam name="T">The sequence element type.</typeparam>
        /// <param name="sequence">The source sequence.</param>
        /// <param name="size">The desired slice size.</param>
        /// <returns>A sequence of arrays containing the elements from the specified sequence.</returns>
        public static IEnumerable<T[]> Slice<T>(this IEnumerable<T> sequence, int size)
        {
            // validate arguments
            if (sequence == null)
                throw new ArgumentNullException("sequence");
            if (size <= 0)
                throw new ArgumentOutOfRangeException("size");

            // return lazily evaluated iterator
            return SliceIterator(sequence, size);
        }

        // SliceIterator: iterator implementation of Slice
        private static IEnumerable<T[]> SliceIterator<T>(IEnumerable<T> sequence, int size)
        {
            // prepare the result array
            int position = 0;
            T[] resultArr = new T[size];

            foreach (T item in sequence)
            {
                // NOTE: performing the following test at the beginning of the loop ensures that we do not needlessly
                // create an empty result array when the element count is an exact multiple of size [(sequence.Count() % size) == 0]
                if (position == size)
                {
                    // full result array; return to caller
                    yield return resultArr;

                    // create a new result array and reset position
                    resultArr = new T[size];
                    position = 0;
                }

                // store the current element in the result array
                resultArr[position++] = item;
            }

            // no elements in source sequence
            if (position == 0)
                yield break;

            // resize partial final slice
            if (position < size)
                Array.Resize(ref resultArr, position);

            // return final slice
            yield return resultArr;
        }
    }
}
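
As a quick illustration of the behavior, here is a usage sketch. To keep it self-contained it uses a condensed stand-in for Slice (List-based, no argument validation) that is assumed to behave the same way for valid input; SliceSketch is a made-up class name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SliceSketch
{
    // Condensed stand-in: same output as the array-based iterator for size > 0.
    public static IEnumerable<T[]> Slice<T>(this IEnumerable<T> sequence, int size)
    {
        List<T> slice = new List<T>(size);
        foreach (T item in sequence)
        {
            slice.Add(item);
            if (slice.Count == size)
            {
                yield return slice.ToArray();
                slice.Clear();
            }
        }

        // partial final slice, if any
        if (slice.Count > 0)
            yield return slice.ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        // Seven elements in slices of three: 1,2,3 then 4,5,6 then 7.
        foreach (int[] slice in Enumerable.Range(1, 7).Slice(3))
            Console.WriteLine(string.Join(",", slice.Select(n => n.ToString()).ToArray()));
    }
}
```

Since the result is an IEnumerable, it also composes with the other LINQ operators, for example .Slice(3).Select(s => s.Sum()).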
