Foreach, Garbage, and the CLR Profiler
I've been doing a lot of thinking lately about foreach, and in what situations it may or may not create garbage. There have been some posts on the forums asking about it, and I recently found out that for some types, the enumerator that is created is a value type, not a reference type. So, foreach loops over those types shouldn't make garbage at all. Why does this matter? In an XNA Framework forum thread, Stephen Strychak, an XNA Framework Developer, explains:
"...the performance of your game will degrade if you allow garbage to be generated during gameplay. The problem again is not the for or foreach. The problem is that the garbage collector on the Xbox 360 is less efficient than the one on Windows, and doing a collection can take a long time. When you have many objects in your game, it can take longer than 1/60th of a second to do a collection. That means that you will drop frames. The more garbage you generate, the more frequently the GC will have to do a collection, and the more frequently your game will drop frames.
So the bottom line is that you should avoid generating garbage, and using for instead of foreach is one way to do that."
Shawn posted this in a follow up, which is an important point:
"...bear in mind that just because foreach creates garbage, this does not neccessarily mean you should avoid using it! I think people often get too hung up on optimisation. Fast code is good, but readable code is more important. If your program is already running fast enough, it's sometimes ok to do things the easy way, even if you know that is less efficient than it could be..."
I agree with Shawn: when I write code, I want it to be maintainable, easy to read, and easy to modify. But still, it's good to know what things may have a potential speed impact. And plus, since I'm a nerd, I wanted to look into this: the whole situation seemed like a lot of special cases and conjecture: you will make garbage if you go over this collection, but you won't if you go over that collection (but only if x is true). I wanted to get the whole story. Plus, this was an ideal reason to learn to use the Windows and Xbox 360 CLR profiling tools, which I've put off doing for some time. So, I figured I'd do just that: do a foreach loop over a bunch of collections, use the profilers to see if I'm making garbage, and hopefully we'd all learn something in the process.
I started with Windows, so the first thing I needed was the Windows CLR Profiler. I needed to learn how to use it, too. It comes with a pretty hefty document explaining it, but I found this link on msdn tv, where a friendly man gave me an overview. It's about half an hour long, and was worth watching, I think.
After watching the video, I felt fairly confident that I would be able to get the information I needed. It was time to start mucking around, to see what I can find out. In C# express, I created a new console app, and filled it in with what is quite possibly the world's dumbest code ever:
class Program
{
class GameEntity
{
public void Update()
{
}
}
static GameEntity[] entities = new GameEntity[100];
static Program()
{
for (int i = 0; i < entities.Length; i++)
{
entities[i] = new GameEntity();
}
}
static void Main(string[] args)
{
byte[] byteArray = new byte[1];
for (int i = 0; i < entities.Length; i++)
{
entities[i].Update();
}
}
}
I wanted to do this for a quick sanity check. If I had any idea what I was doing, when I use the CLR profiler on this app, I should see Main making one allocation, the 1 byte array. So I built it, and ran the CLR Profiler on it. Once it was done, I pulled up the allocation graph using by clicking this button:
My first look at the allocation graph was fairly terrifying. It's full of colored lines and boxes for methods I'm fairly sure I didn't write.
After fiddling for a minute, it turned out that as always, right clicking is the answer. I did "Find routine" and looked for Main, which gives me this view. Much better.
I've got no idea why my 1 byte array actually took up 13 bytes, but I'm not particularly worried about it. (I'm curious to know, however, so if anyone out there in readerland has any ideas, send them my way.)
As my next sanity check, I changed my program from an array to a Collection<GameEntity>, and changed Main to do a simple foreach over the Collection:
static void Main(string[] args)
{
byte[] byteArray = new byte[1];
foreach (GameEntity e in entities)
{
e.Update();
}
}
Under the hood, that should create an enumerator on the managed heap, which should show up in the profiler. So, following the exact same steps in the CLR profiler, I get this display:
There's that pesky enumerator right there, just as we expected!
Now, here comes the tricky part. Rumor has it that many of the types in System.Collections.Generic will not allocate an enumerator when using foreach. List's GetEnumerator, for example, returns a struct, which will just sit on the stack. Look for yourself with .NET Reflector, if you don't believe me. To prove to myself that a foreach over a List doesn't cause any heap allocations, I changed entities to be a List, did the exact same foreach loop, and ran the profiler. No enumerator! I tested a few other types as well, LinkedList and Queue, with the same result. Foreach loops over most of the types in System.Collections.Generic do not generate garbage. The most prominent exception is Collection. Collection<T> cannot declare its enumerator type as a struct, because Collection<T> is designed to be easily overridden.
However, there is definitely a caveat to the above. Foreach loops over Lists can still generate garbage. Take this code, for example:
const int NumEntities = 100;
static List<GameEntity> list = new List<GameEntity>();
static Program()
{
for (int i = 0; i < NumEntities; i++)
{
list.Add(new GameEntity());
}
}
static void Main(string[] args)
{
UpdateIEnumerable(list);
}
private static void UpdateIEnumerable(IEnumerable<GameEntity> enumerable)
{
foreach (GameEntity e in enumerable)
{
e.Update();
}
}
this will create garbage. Even though we're still doing a foreach over a List, when the list is cast to an interface, the value type enumerator must be boxed, and placed on the heap.
So, as far as I know, here's the final word:
When doing a foreach over a Collection<T> an enumerator WILL be allocated.
When doing a foreach over most other collections, including as arrays, lists, queues, linked lists, and others:
if the collections are used explicitly, an enumerator will NOT be allocated.
if they are cast to interfaces, an enumerator WILL be allocated.
Hope this helps some people, and maybe you're a little less terrified of the CLR profiler. Next time, I'd like to do the same thing for the .NET CF on the Xbox 360, or maybe figure out how to use the CLR profiler to track down the methods that are doing the most allocations. I have a pretty terrible track record with "Next times", though, so don't keep your fingers crossed.
Comments
Anonymous
March 16, 2007
Nice read. One question: which would be the results if you use, say, a "List<IGameEntity> entities" collection? With something like: foreach (IGameEntity e in entities) { e.Update(); }Anonymous
March 16, 2007
Very interesting! Don't forget to investigate the "yield" key word :-)Anonymous
March 16, 2007
Oh, and I forgot to mention: You are missing sign-in/log-out links on your blog. Since you have annonymous comments dissabled, I had to go to another MSDN blog to log in, then come back in order to post a comment.Anonymous
March 16, 2007
Nice article. It seems foreach() isnt as bad as people have been making it sound :)Anonymous
March 17, 2007
Here we go with another Weekly Roundup from the XNA World. 05-03-2007 Code Release George has postedAnonymous
March 17, 2007
It's been years since I look at anything on the CLR level, but every object in the CLR requires 12 bytes for the object header (syncblock and other stuff?). This is in addition to any actual data the object has.Anonymous
March 19, 2007
Yeah. The one byte array will need some extra data for CLR infrastructure. This is true for any reference type or boxed value type, but not for value types that are living on the stack. I don't know precisely what that header contains, but if I had to guess I'd say maybe a four byte vtable pointer (which identifies the type of the object, and contains the indirection pointers for any virtual methods it may contain), plus a 4 byte array size counter, plus maybe 4 bytes for the garbage collector to tag which objects it has marked, implement write barriers and so on? (I'd guess the GC would pack quite a lot of data into an integer bitfield).Anonymous
March 19, 2007
Performance never ceases to be a fascinating topic. This forum thread inspired Eli to write about foreachAnonymous
March 25, 2007
Ok here we go with this weeks update… It has been a relatively quite week in the world of XNA; this mustAnonymous
April 18, 2007
As promised, here's a bit more about foreach and garbage. Last time I wrote about the Windows DesktopAnonymous
July 02, 2007
In my previous post I described how to tell if your Xbox garbage collection is taking too long. We sawAnonymous
August 07, 2007
事实上对Array进行foreach的时候根本不会调用GetEnumerator()方法, 也就是被编译器给内联了. 而对于List<>, 虽然没有编译器内联GetEnumerator()Anonymous
February 03, 2009
This blog post inspired me to dig around a bit more into the internals of List<T> and Collection<T>. What's interesting is that all of the .NET enumerator types are structs - none are classes. What's really happening here is boxing, not allocation. Internally, instantating a new Collection<T> "wraps" a List<T> internally. The instance is placed into a variable typed as IList<T>. When Collection<T>.GetEnumerator() is called, a "List<T>.Enumerator" struct is returned. However, because GetEnumerator() returns IEnumerator<T>, the struct is boxed - resulting in the allocation. Apparently the C# foreach keyword has a bit of "magic" behavior - it doesn't entirely depend on IEnumerable. If a type has a public "subtype" named "Enumerator", it instances that instead - no boxing! Because Collection<T> wraps IList<T>, it can't provide it's own public Enumerator subtype. Foreach is forced to use the IEnumerable interface, which boxes the struct into IEnumerator<T>. It also explains why casting List<T> to IList<T> causes the same behavior - the public Enumerator subtype is no longer 'visible' to foreach. I ran into a problem because I wanted to expose Collections of objects as properties in my classes instead of List - mainly due to class design guidelines and the fact I wanted to "hook" into SetItem/ClearItems/RemoveItem/InsertItem. This was for my "ScreenManager.Screens" property. On ScreenManager.Draw I was iterating over the collection, causing an allocation for each frame. I didn't want to rip out the Collection, either. There's a nice solution. Add a normal list to the class: "List<Screen> screenList" and wrap it with the collection: "Collection<Screen> screenCollection = new Collection<Screen>(screenList)". Iterate over the screenList in the Draw method - no allocation will be performed since it can directly access the private List<Screen> and it's Enumerator sub-struct. The collection is directly exposed as the property, providing a clean interface for externally managing the screens. The only gotcha here is that modifications to the underlying List<Screen> type won't call the Collection's InsertItem/SetItem/ClearItems/RemoveItem protected virtual methods. The final pattern looks something like this:Create a List<T> type in a class, make it private
Do not expose the List<T> type (keep it private)
Create a Collection<T> type, and pass the List<T> instance to it's constructor
Iterate using foreach (myListVariable) within the class to avoid boxing the iterator
Do not use List<T> for modifying the list - use Collection<T> instead
Expose Collection<T> as a public/protected property (as needed) to provide a clean interface to the collection. (see http://blogs.msdn.com/fxcop/archive/2006/04/27/585476.aspx) Overkill for most situations, but useful in the "tight loop" game render-type code - even on the Windows platform.
Anonymous
February 13, 2009
@ShadowChaser: How sure are you about that "magic" behavior of the C# foreach keyword and the instantiation logic you describe for a "subtype" named "Enumerator"? I would think that the compiler can figure this out by just looking at the implementation of the GetEnumerator method of the enumerable object over which the foreach keyword needs to iterate. For example, the return type of Collection<T>.GetEnumerator is the generic IEnumerator<T> interface, which is also the case for IList<T>.GetEnumerator, while the "magic" List<T>.GetEnumerator returns List<T>.Enumerator (but this nested struct in turn also implements the generic IEnumerator<T> interface, such that List<T>.GetEnumerator is still a valid implementation of the IEnumerable<T>.GetEnumerator method). I find it hard to believe that the C# compiler would insist on you having to define a nested type called "Enumerator" for this optimization to work, but rather just insist on a specific value-type implementation of the IEnumerable<T> interface instead of just returning the basic interface itself from the enumerable's GetEnumerator method.Anonymous
May 17, 2009
关于C#中for和foreach孰优孰劣的争论似乎从来就没有停止过,各种谣言和真相混杂在一起,似乎变成了一个很复杂的问题.....Anonymous
February 05, 2011
The comment has been removed