Jacob Carpenter’s Weblog

November 16, 2007

Unnecessary MoveNext: do we care?

Filed under: csharp — Jacob @ 4:59 pm

In the last post we looked at MoveNext call counts when iterating the IEnumerables returned by Take/Skip. Now we’re going to attempt to answer the question: do we care?

Let’s step back a little bit and strip this code down to the bare essentials:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static IEnumerable<int> NumbersOneToTen()
    {
        yield return 1;
        yield return 2;
        yield return 3;
        yield return 4;
        yield return 5;
        yield return 6;
        yield return 7;
        yield return 8;
        yield return 9;
        yield return 10;
    }

    static void Main()
    {
        using (IEnumerator<int> enumerator = NumbersOneToTen()
            .GetEnumerator())
        {
            enumerator.MoveNext();
        }
    }
}

Here, we have a console application that merely calls MoveNext on the enumerator returned by NumbersOneToTen. We’re controlling the enumeration explicitly to avoid any ambiguity introduced by using foreach; in particular, it ensures that the Current property of the enumerator is never accessed.
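
As a reminder, a foreach over the same sequence compiles down to roughly the following shape (this is the standard expansion, not code from the post), which is why it always reads Current:

// foreach (int n in NumbersOneToTen()) { ... } expands to roughly:
using (IEnumerator<int> e = NumbersOneToTen().GetEnumerator())
{
    while (e.MoveNext())
    {
        int n = e.Current; // foreach reads Current for every element
        // ...loop body...
    }
}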

I’ve set a breakpoint on the line containing MoveNext with the intent to hit F11 (Step Into). [It probably bears mentioning that I’m using Visual Studio 2008 Beta 2.]

When you run that (F5), you’ll see that pressing F11 once steps into the NumbersOneToTen method, pressing it a second time moves to the line with the first yield return, and pressing it a third time returns control back to our Main method (with the enumerator positioned on the first item).

The only potentially surprising thing here is that the call to NumbersOneToTen() itself never executes the method body; the body only runs once we start calling MoveNext. But that’s the beauty of deferred evaluation.
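
If you’d like to see that deferral without the debugger, here’s a minimal sketch (the Numbers method and the messages are just illustrative scaffolding, not code from the last post) that makes it visible with a side effect:

static IEnumerable<int> Numbers()
{
    Console.WriteLine("iterator body started"); // only runs once MoveNext is called
    yield return 1;
}

static void Main()
{
    IEnumerable<int> numbers = Numbers(); // prints nothing; the body hasn't run yet
    Console.WriteLine("before MoveNext");

    using (IEnumerator<int> e = numbers.GetEnumerator())
    {
        e.MoveNext(); // only now does "iterator body started" appear
    }
}

Running that prints “before MoveNext” first, then “iterator body started”.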

So, what are some things we could do with the sequence that would make calling MoveNext more interesting?

One interesting thing we can do is project each element of the sequence into a different element. In .NET 2.0, this operation was called ConvertAll. In .NET 3.5, we use the Select extension method.

Let’s modify our Main method a bit:

using (var enumerator = NumbersOneToTen().Select(n => "Number " + n)
    .GetEnumerator())

Here, we’re projecting all of the numbers from 1..10 to the strings “Number 1” through “Number 10”. Now let’s step through our code again: F11 once steps into NumbersOneToTen; F11 moves to the line with the first yield return; F11 steps into the lambda passed to Select.

I’m not emphasizing that last step just because it’s a miraculous feat of the debugger that it can step into embedded statements.

I’m emphasizing it because we haven’t accessed Current at all. We’ve only asked to advance the enumerator. Yet every time we call MoveNext, it appears our projection is evaluated. I was actually surprised by that fact.
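
We can confirm that outside the debugger by counting selector invocations. Here’s a quick sketch (the counter is my own scaffolding); drop this in as the Main method alongside NumbersOneToTen:

static void Main()
{
    int projections = 0;

    var projected = NumbersOneToTen().Select(n =>
    {
        projections++; // runs every time the selector is invoked
        return "Number " + n;
    });

    using (var enumerator = projected.GetEnumerator())
    {
        enumerator.MoveNext(); // advance once; never touch Current
        enumerator.MoveNext(); // advance again
    }

    Console.WriteLine(projections); // prints 2: the selector ran on each MoveNext
}

Even though Current is never read, the selector runs once for every successful MoveNext.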

So, given the example from the last post, if we were to naively introduce our simple projection:

foreach (var batch in NumbersOneToTen().Select(n => "Number " + n).Slice(3))
    foreach (var item in batch)
        Console.WriteLine(item);

We would end up doing 50 unnecessary string concatenations. It’s also not hard to conceive of a projection more expensive than string concatenation.
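
To make that concrete, here’s a counting version of the naive loop. Since the actual Slice lives in the last post, I’m using a hypothetical Take/Skip-based stand-in here; the exact count depends on the real implementation, but the pattern of redundant work is the same:

using System;
using System.Collections.Generic;
using System.Linq;

static class SliceExtensions
{
    // Hypothetical stand-in for a Take/Skip-based Slice (the real one is in the
    // previous post). Every Skip/Take pair re-enumerates the source from the start.
    public static IEnumerable<IEnumerable<T>> Slice<T>(this IEnumerable<T> source, int size)
    {
        for (int i = 0; ; i += size)
        {
            IEnumerable<T> batch = source.Skip(i).Take(size);
            if (!batch.Any()) // probing for the end enumerates the source yet again
                yield break;
            yield return batch; // the batch is deferred; consuming it re-enumerates too
        }
    }
}

class Program
{
    static IEnumerable<int> NumbersOneToTen()
    {
        for (int i = 1; i <= 10; i++)
            yield return i;
    }

    static void Main()
    {
        int projections = 0;

        var projected = NumbersOneToTen().Select(n =>
        {
            projections++; // counts every evaluation of the projection
            return "Number " + n;
        });

        foreach (var batch in projected.Slice(3))
            foreach (var item in batch)
                Console.WriteLine(item);

        // Only 10 projections are needed to print 10 strings, but the repeated
        // Skip/Take enumeration evaluates the selector far more often than that.
        Console.WriteLine(projections);
    }
}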

But we can improve that, right? Certainly the same result is achieved if we project the items after splitting them into batches:

foreach (var batch in NumbersOneToTen().Slice(3))
    foreach (var item in batch.Select(n => "Number " + n))
        Console.WriteLine(item);

There! Now we’re no longer unnecessarily projecting elements when splitting into batches; we only execute the projection as we consume the batch’s items.
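
Applying the same counter to the rearranged loop bears that out. With the hypothetical Slice sketch above, swapping the body of Main for this yields a count of exactly 10, one projection per item we actually print:

int projections = 0;

foreach (var batch in NumbersOneToTen().Slice(3))
    foreach (var item in batch.Select(n =>
    {
        projections++; // runs only for items we actually consume
        return "Number " + n;
    }))
        Console.WriteLine(item);

Console.WriteLine(projections); // 10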

But projection isn’t the only operation we can perform on sequences. Filtering is pretty useful, too. In .NET 2.0, we used FindAll; in 3.5, we use Where:

var evens = NumbersOneToTen().Where(n => n % 2 == 0);

Here, we’re filtering the sequence of numbers from 1..10 to a sequence containing only the even numbers.
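
The same MoveNext behavior applies here: the predicate runs once per source element as the enumerator advances, whether or not we ever read Current. A quick counting sketch (again, the counter is my own scaffolding), enumerating the whole filtered sequence:

static void Main()
{
    int tests = 0;

    var evens = NumbersOneToTen().Where(n =>
    {
        tests++; // counts every predicate evaluation
        return n % 2 == 0;
    });

    foreach (var n in evens)
        Console.WriteLine(n); // 2, 4, 6, 8, 10

    Console.WriteLine(tests); // 10: every source element was tested once
}

Re-enumerate evens, as a Take/Skip-based Slice would, and those ten evaluations repeat each time.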

Unfortunately, this is also an example of a case where we can’t move the sequence operation into the item-consuming loop. If we filter the outer sequence (prior to slicing), we get two slices: { 2, 4, 6 }, { 8, 10 }. However, if we filter at the item-consuming level, we get four slices, none of which are the desired size (3): { 2 }, { 4, 6 }, { 8 }, { 10 }.

So, given a Take/Skip-based implementation of Slice with a filtered sequence, we’re stuck with 50 unnecessary predicate evaluations.

That’s enough to convince me that Take/Skip should be avoided in this context, despite the elegance of the proposed implementation.
