Today I encountered the strangest .NET Framework design decision I’ve seen in recent times:
HashSet<T>’s GetEnumerator method returns a public struct HashSet<T>.Enumerator.
Let’s count how many Framework Design Guidelines this violates:
1. Avoid publicly exposed nested types.
- violation: duh.
2. It is immutable.
- violation: calling
MoveNextmutates the enumerator object.
3. It will not have to be boxed frequently.
- violation: passing a
HashSet<T>as a parameter to a method that acceptsIEnumerable<T>(Linq, anyone?) will hide the class’GetEnumeratormethod. Therefore, any calls toGetEnumeratorcall the interface method which requires boxing theHashSet<T>.Enumeratorto return anIEnumerator<T>.
4. [Any others you see? Leave a comment.]
I really want to hear the arguments in favor of the shipping design.
Advertisement
You hadn’t noticed that List.GetEnumerator(), Dictionary.GetEnumerator(), LinkedList.GetEnumerator(), Queue.GetEnumerator(), SortedDictionary.GetEnumerator(), SortedList.GetEnumerator() and Stack.GetEnumerator() do precisely the same thing?
I’m pretty certain that this is for performance. If you create a sample that performs a foreach over a HashSet, and look at the generated IL, you’ll find that each call to Enumerator.Current and Enumerator.MoveNext() are generated as .call instructions rather than .callvirts.
The vast majority of code that calls these GetEnumerator() methods is generated by foreach statements within a method body. So, this improves the performs of foreach in the majority of cases. If the object happens to be cast as an IEnumerable, only one box occurs to cast the Enumerator struct to an IEnumerator. IMO, that’s a pretty reasonable trade off.
BTW, I don’t have verification yet that this is why a struct is used, but that’s my gut feeling.
Comment by Dustin Campbell — July 16, 2008 @ 7:13 pm
Heh; obviously I hadn’t noticed that, yet.
Your rationale makes sense, but are .callvirts really that much more expensive than .calls?
Eric Gunnerson recently posted about non-virtual methods still getting .callvirt instructions and made the perf cost sound minimal.
I hope the performance gains are actually there, to justify the strange design.
Comment by Jacob — July 17, 2008 @ 9:00 am
I agree with Dustin that it seems perf related, but I bet it’s more about avoiding heap alloc and GC pressure.
Comment by Joe Cheng [MSFT] — July 28, 2008 @ 12:37 am
Jacob,
It’s not a weird design at all. As mentioned above, having iterators as structs allows the compiler to put less pressure on GC by creating value type iterators on the stack (but in that case they should be declared as public structs, to avoid boxing).
But the iterator will be boxed if casted to an interface, right? So, the compiler does a trick there: if iterating via a foreach loop (90% of cases, I bet) it uses duck typing and does NOT cast the iterator to IEnumerator but calls MoveNext and Current directly (thus call instead of callvirt).
Cheers,
Andrew
PS. btw: “yield return” generates a private nested class, not a public nested struct
Comment by Andrew — August 19, 2008 @ 12:09 pm