Extending the World

When people think of C# 3.0 and Linq, they commonly think of queries and databases.  The phenomenal work of the Linq to SQL guys provides ample reason to think of it this way; nevertheless, C# 3.0 and Linq are really much, much more.  I have discussed a number of things that can be done with lambdas, expression trees, and queries, and I will continue to do so, but I want to pause and discuss a little gem in C# 3.0 that is often overlooked.  This new language feature has fundamentally changed both the way that I work in C# and my view of the world.  I've been using it a lot without ever explicitly drawing attention to it.  At least one reader noticed it and the possibilities it opens up, and at least a couple of readers want an expanded version of it without even knowing it.

So what is the feature?  It's extension methods.

At first glance they don't look very special.  I mean really, all they are is one extra token in the definition of a static method inside of a static class.

static class Foo {

  public static Bar Baz(this Qux Quux) { ...

But as is usually the case, it's the semantics that are more interesting than the particular syntax.

The first argument of an extension method (the argument marked with this) is the implicit receiver of the method.  The extension method appears to be an instance method on the receiver but it is not.  Therefore, it cannot access private or protected members of the receiver.
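One consequence of the receiver being an ordinary argument is worth seeing in code. The sketch below is only an illustration: IsBlank and StringChecks are hypothetical names that I made up for it. Because the call is really a static call, it can be made even when the receiver is null, which a true instance method call could not survive.

```csharp
using System;

static class StringChecks
{
    // Hypothetical helper, invented for this illustration.
    // It only touches the public surface of string.
    public static bool IsBlank(this string s)
    {
        return s == null || s.Trim().Length == 0;
    }
}

static class Program
{
    static void Main()
    {
        string missing = null;

        // A real instance method call on null would throw NullReferenceException.
        // This compiles to StringChecks.IsBlank(missing), so the null is just
        // passed along as the first argument.
        Console.WriteLine(missing.IsBlank());
        Console.WriteLine("   ".IsBlank());
        Console.WriteLine("text".IsBlank());
    }
}
```

Whether accepting a null receiver is good style is a separate design question; the point here is only that the dot syntax is sugar over a static call.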

For example, let's say that I detested the fact that the framework doesn't have a ToInt method defined on string.  Now, I can just provide my own:

public static int ToInt(this string s)
{
    return int.Parse(s);
}

And I can then call it as:

"5".ToInt()

The compiler transforms the call into:

ToInt("5")

Notice how it turns the call inside out.  So if I chain three extension methods A, B, and C:

x.A().B().C()

The calls get turned into

C(B(A(x)))
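To make the rewrite concrete, here is a minimal sketch that writes the same chain both ways. ToInt is the method from above; Twice and Describe are hypothetical helpers that exist only for this illustration.

```csharp
using System;

static class ChainExtensions
{
    public static int ToInt(this string s)
    {
        return int.Parse(s);
    }

    // Hypothetical helper for the illustration.
    public static int Twice(this int n)
    {
        return n * 2;
    }

    // Hypothetical helper for the illustration.
    public static string Describe(this int n)
    {
        return "value: " + n;
    }
}

static class Program
{
    static void Main()
    {
        // Extension syntax reads left to right.
        string chained = "5".ToInt().Twice().Describe();

        // The equivalent static calls nest inside out.
        string nested = ChainExtensions.Describe(
            ChainExtensions.Twice(
                ChainExtensions.ToInt("5")));

        Console.WriteLine(chained);           // value: 10
        Console.WriteLine(chained == nested); // True
    }
}
```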

While all of this explains how extension methods work, it doesn't explain why they are so cool.

A few months back, I was reading various online content related to C# 3.0.  I wanted to get a feel for what customers were feeling and incorporate it as much as possible into the product.  In the process, I came across an interesting post, Why learning Haskell/Python makes you a worse programmer.  The author argues that learning a language like Python or Haskell can make things more difficult for you if your day job is programming in a language like C#.

I sympathize with what the author has to say and have had to spend enough time programming in languages that I didn't like that I think that I understand the pain.

That said, I hope that the author (and others who feel like him) will be pleasantly surprised by C# 3.0.  For example, let's look at his example of painful programming:

"I have a list of Foo objects, each having a Description() method that returns a string. I need to concatenate all the non-empty descriptions, inserting newlines between them."

In Python, he says that he would write:

"\n".join(foo.description() for foo in mylist if foo.description() != "")

In Haskell, his solution looks like:

concat $ List.intersperse "\n" $ filter (/= "") $ map description mylist

These both look like reasonable code and I rather like them.  Fortunately, you can express them in C# 3.0.  Here is the code that looks like the Python solution.

"\n".Join(from x in mylist where x.Description != "" select x.Description)

And here is the code that is closer to his Haskell solution:

mylist.Where(x => x.Description != "").Select(x => x.Description).Intersperse("\n").Concat();

At this point, some will protest that there is no Join instance method on string and there is no Intersperse defined on IEnumerable<T>.  And for that matter, how can you define a method on an interface in the first place?  Of course, extension methods are the answer to all of these questions.

public static string Join(this string s, IEnumerable<string> ss)
{
    return string.Join(s, ss.ToArray());
}

public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
{
    bool first = true;
    foreach (var item in sequence)
    {
        if (first)
            first = false;
        else
            yield return value;
        yield return item;
    }
}
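The Haskell-style pipeline also ends in a parameterless Concat() call that is not shown above. The sketch below assumes it is one more extension method that glues a sequence of strings together; that definition, the Foo class with a Description property, and the sample data are my guesses for illustration rather than code from the original listing. Intersperse is repeated so that the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Assumed shape of the Foo objects from the quoted problem statement.
class Foo
{
    public string Description { get; set; }
}

static class SequenceExtensions
{
    // Same Intersperse as above.
    public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
    {
        bool first = true;
        foreach (var item in sequence)
        {
            if (first)
                first = false;
            else
                yield return value;
            yield return item;
        }
    }

    // Assumed helper: concatenate every string in the sequence.
    // It takes no extra arguments, so it does not clash with
    // Enumerable.Concat, which requires a second sequence.
    public static string Concat(this IEnumerable<string> strings)
    {
        var builder = new StringBuilder();
        foreach (var s in strings)
            builder.Append(s);
        return builder.ToString();
    }
}

static class Program
{
    static void Main()
    {
        var mylist = new List<Foo>
        {
            new Foo { Description = "first" },
            new Foo { Description = "" },
            new Foo { Description = "second" }
        };

        string result = mylist
            .Where(x => x.Description != "")
            .Select(x => x.Description)
            .Intersperse("\n")
            .Concat();

        Console.WriteLine(result == "first\nsecond"); // True
    }
}
```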

It is as if these methods were defined on the receiver to begin with.  At this point the realization sets in: a whole new mode of development has been opened up.

Typically, for a given problem, a programmer is accustomed to building up a solution until it finally meets the requirements.  Now it is possible to extend the world to meet the solution instead of only building up until we get to it.  If a library doesn't provide what you need, just extend the library to meet your needs.

I find myself switching between the two modes frequently: building up some functionality here and extending some there.  In fact, these days I find that I often start with extension methods and then, when certain patterns begin to emerge, I factor those into classes.

It also makes some interesting styles of programming easier.  I am sure it has some name, but since I don't know what it is I'll call it data interface programming.  First we declare an immutable interface that includes only data elements.

interface ICustomer
{
    string Name { get; }
    int ID { get; }
}

Then, we declare an inaccessible implementation of ICustomer that allows customers to be created through a factory that only exposes the immutable version.

class Factory
{
    class Customer : ICustomer
    {
        public string Name { get; set; }
        public int ID { get; set; }
    }

    public static ICustomer CreateCustomer(int id, string name)
    {
        return new Customer { ID = id, Name = name };
    }
}

Then we can declare behavior through extension methods.

public static string GetAlias(this ICustomer customer)
{
    return customer.Name + customer.ID.ToString();
}

And finally, we can use the behavior.

var customer = Factory.CreateCustomer(4, "wes");
Console.WriteLine(customer.GetAlias());

All of this may seem like a roundabout way to declare an immutable abstract base class with various derived classes.  But there is a fundamental difference: the interface and behavior can change depending upon which extension methods are in scope.  So one part of the program or system can treat them one way and another can have an entirely different view of things.
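Here is a rough sketch of what "depending upon which extension methods are in scope" can look like. The namespaces (Model, Billing, Support, BillingApp, SupportApp) and the second GetAlias format are hypothetical, chosen only to show two parts of a program seeing different behavior on the same ICustomer.

```csharp
using System;

namespace Model
{
    public interface ICustomer
    {
        string Name { get; }
        int ID { get; }
    }

    public static class Factory
    {
        private class Customer : ICustomer
        {
            public string Name { get; set; }
            public int ID { get; set; }
        }

        public static ICustomer CreateCustomer(int id, string name)
        {
            return new Customer { ID = id, Name = name };
        }
    }
}

namespace Billing
{
    using Model;

    public static class CustomerExtensions
    {
        // One view of the customer.
        public static string GetAlias(this ICustomer customer)
        {
            return customer.Name + customer.ID;
        }
    }
}

namespace Support
{
    using Model;

    public static class CustomerExtensions
    {
        // A different view of the same data (hypothetical format).
        public static string GetAlias(this ICustomer customer)
        {
            return "#" + customer.ID + " (" + customer.Name + ")";
        }
    }
}

namespace BillingApp
{
    using Model;
    using Billing; // brings Billing's GetAlias into scope

    public static class Report
    {
        public static string Describe(ICustomer c) { return c.GetAlias(); }
    }
}

namespace SupportApp
{
    using Model;
    using Support; // brings Support's GetAlias into scope

    public static class Ticket
    {
        public static string Describe(ICustomer c) { return c.GetAlias(); }
    }
}

static class Program
{
    static void Main()
    {
        var customer = Model.Factory.CreateCustomer(4, "wes");
        Console.WriteLine(BillingApp.Report.Describe(customer)); // wes4
        Console.WriteLine(SupportApp.Ticket.Describe(customer)); // #4 (wes)
    }
}
```

Importing both namespaces into the same scope would make the call c.GetAlias() ambiguous, which is one reason to keep each set of extension methods in its own namespace.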

Of course, what I really want to be able to do (and we don't do it yet) is something like:

var customer = new ICustomer { ID = 4, Name = "wes" };
Console.WriteLine(customer.GetAlias());

And then I skip the whole Factory thing altogether.  The customer is immutable and the definition of the type is short and sweet.  All of the work is done by the compiler, which incidentally doesn't need the factory because it can name mangle the implementation class and provide customized constructors automatically.  But I digress; hopefully we can do something like that in the future.

Of course, extension methods don't make the traditional techniques inapplicable; they are still as useful as ever.  As with all design considerations, there are trade-offs involved.  Care must be taken to manage extension methods so that chaos doesn't ensue, but when they are used appropriately they are fantastically useful.

As I have been writing C# code, I have accumulated a library of useful extension methods, and I encourage you to do the same so that the ideas you think of roll naturally off your fingertips.

Comments

  • Anonymous
    March 09, 2007
    I think I'm missing the point.  How does this approach improve on traditional approaches to polymorphism?  An example would be helpful.

  • Anonymous
    March 09, 2007
    Awesome! You know what would be cool....a website where you can share extension methods. I bet you'd see a ton of stuff for math, image manipulations, string & regex.

  • Anonymous
    March 09, 2007
    Extension methods are really quite nice. It's interesting that you used Join() as an example, since that's the first extension method I wrote (and probably the one that's proved most broadly useful thus far). Here's a version that skips the array translation:

        static string Join<T>(this IEnumerable<T> value,
                              string separator,
                              Converter<T, string> converter) {
            StringBuilder joined = new StringBuilder(128);
            IEnumerator<T> enumerator = value.GetEnumerator();
            if (enumerator.MoveNext()) {
                for (;;) {
                    joined.Append(converter(enumerator.Current));
                    if (enumerator.MoveNext()) {
                        joined.Append(separator);
                    } else {
                        break;
                    }
                }
            }
            return joined.ToString();
        }

    This particular overload also contains a third parameter (second in extension method syntax) which I've found to really expand the utility of the method. For example, in Web projects you often have to encode data, e.g.:

        Response.Write(listOfStrings.Join(", ", HttpUtility.HtmlEncode));

    It's also an efficient way of joining things of the wrong type:

        Response.Write(new int[] { 1, 2, 3 }.Join(", ", Convert.ToString));

    Or the really wrong type (thanks lambda syntax!):

        Response.Write(GetUserObjects().Join(", ", u => HttpUtility.HtmlEncode(u.FullName)));

    Here are a few other Ruby inspired methods: http://derekslager.com/blog/posts/2006/10/channeling-ruby-in-csharp-3.ashx

  • Anonymous
    March 09, 2007
    Surely this also means that you can write more static side-effect free code but treat it like an instance method. Having originally started programming in imperative/ OO languages and moving rapidly towards functional programming I have recently come unstuck about whether to make methods instance methods or static. This answers the problem in one step - make them static - ergo side-effect free and stateless - and treat them like an instance method - more easily readable dot notation.

  • Anonymous
    March 09, 2007
    Me too. The whole static vs. instance thing is causing me no end of pain as I try and fold (sic) the functional stuff into my OO background. I'd really appreciate any guidance from folks who've been using C# 3 long enough to have written some medium to large code bases in it already.

  • Anonymous
    March 09, 2007
    I've been a Boo fanatic for quite a while. It's a Python-inspired CLR language, and it has had extension methods for a while.  I love them so much!!!  I have a stdlib of string and IEnumerable extensions (see: map) that I use everywhere. Of course, F# came out a little while later, and it has almost all of the IEnumerable extensions I'd been so diligently re-creating.

  • Anonymous
    March 09, 2007
    Extension methods also allow for some other unusual programming styles. I read a paper called "First-class relationships in an object-oriented language", in which the authors propose a new language that supports their thesis. Using extension methods, no such language is needed. See http://blog.lab49.com/?p=237 for more details.

  • Anonymous
    March 09, 2007
    - I fully agree about your use of extension methods, but I guess that as they give more flexibility to the developer, discipline is needed. One can chase after extension methods and quickly end up in a mess of packages and classes. Having said that, I quite like your implementation. I think extension methods can be most handy for expression builders and fluent interfaces. I mean, as most readers said, side-effect free helper functions that can be used in a fluent-interface style.
    - I really moaned a lot about anonymous classes, and I would really love to see them in C# 3.0 (I don't want to wait for another version).
    - I posted on my blog about extension methods (as a reflection on Martin Fowler's Fluent Interfaces post): "OBSEV:: Fluent Interface and c# 3.0 Extension Methods : The flexibility of dynamic typing with the powerfull AutoCompletion". I guess it is worth a read: http://sadekdrobi.com/?p=22
    - I like statically typed languages, and extension methods offer me some flexibility I really lacked before. By the way, it might be a good idea to have something like a namespace interface, where we can switch implementations for extension methods, kind of like AOP :p . Anyway, that's an idea to suggest to the research guys :)

  • Anonymous
    March 10, 2007
    Damien: That is an awesome post.  I hadn't thought of pulling cyclical dependencies out and using extension methods to manage them.  Very nice indeed. I agree about the packaging thing.  Use them with caution.  Possibly include each set of related extension methods in its own namespace.  We are thinking about things (post-Orcas) that would extend and improve the situation.

    Sadek: Interesting stuff.

  • Anonymous
    March 11, 2007
    Post-Orcas it will no doubt be too late to fix the extension-method packaging problem. It's a wart, and better to fix it now before it's set in stone. Too many warts are accumulating in C# as it is.

  • Anonymous
    March 12, 2007
    Thanks for the link Wes. BTW I've tried walking a tree with extension methods and LINQ and wondered if you could see a better way... wouldn't surprise me!

  • Anonymous
    March 12, 2007
    Hi Wes,

    I've been playing around a lot with both C# 3 and C# 2, trying to push the whole functional thing as far as makes sense in each respective iteration of the language (I mean, no one's suggesting throwing out the OO baby with the bath water, right?). In so doing I'm finding that, in terms of supporting functional composition (or 'pipelining'), it may be better to trend towards returning an empty rather than a null collection. So empty means 'nothing', and null is not used, in favor of an exception being thrown.

    Given that the cost of a managed new is on average considerably lower than an unmanaged malloc, I find I'm less nervous about writing code like this than I might be in, say, C++. Added to which, these empty collections, when spun up during a pipeline, usually have very short life spans since they're almost always either return values or parameters (occasionally locals), but not member variables.

    I was wondering what your thoughts were on the subject. Clearly, like anything, it could be taken too far, for example if some strange custom container had a computationally expensive default constructor (hard to imagine really in the general case). I'm aware that the empty vs. null debate is not a new one. I'm just interested to know if writing in an FP style favors one approach over another. Perhaps some of the other FP vets could offer up their experience on the subject?

    Kind regards,
    tom

  • Anonymous
    March 13, 2007
    Alex: I like it.  You'll want to check out my Linq perf post when I finish it for some details that are related to your implementation.

    Tom: I agree.  Do not throw the OO baby out! I like the idea of returning empty collections, especially since all empty collections are created equal (of a given type).  So you really don't even need to new them up very often (though they are relatively cheap, as you indicate). Personally, I really like removing null as much as possible (without doing it for its own sake).  So one class that I often use is IOptional<T>, which is similar to Nullable<T> but for reference types.

  • Anonymous
    March 14, 2007
    It would be nice if we could define static functions in a namespace without having to have a useless wrapping class. Perhaps every namespace should have a hidden static class that holds the free functions in that namespace. Importing the namespace would be equivalent to importing the functions defined in that hidden class. You could still package up functions in a static class, but that static class would need to be imported explicitly for those functions to be available as free functions or extension methods. By free function, I mean a function that can be called without a qualifying class prepended.

  • Anonymous
    March 14, 2007
    C# has been a class-based language since the get-go, so it's hard to imagine it changing to the degree that we'd be able to write free functions in it. One alternative might be a 'with' ('using' is overloaded enough) style keyword that would open up a static class into the current scope and allow you to omit the '<ClassName>.' from a function invocation. Of course, you can always point a Func<...> at a member function and then reference it using just the variable name, which gets you closer to what you want today (i.e. it works in both C# 2 and 3).

  • Anonymous
    March 15, 2007
    Tom & Damien: We have considered it several times and it is certainly a possibility for the post Orcas timeframe.  I would love some way to do this.

  • Anonymous
    March 19, 2007
    Why not just:

        public static IEnumerable<T> Intersperse<T>(this IEnumerable<T> sequence, T value)
        {
            yield return sequence.First();
            foreach (var item in sequence.Skip(1))
            {
                yield return value;
                yield return item;
            }
        }

    ?

  • Anonymous
    March 19, 2007
    That works very well except if sequence doesn't contain anything.  Which can be solved if you add an if statement before yielding the first item.

  • Anonymous
    March 21, 2007
    Last week the Microsoft MVPs converged on Redmond from all corners of the globe. It was a great occasion

  • Anonymous
    April 05, 2007
    This is a good explanation of how to write an extension method. One thing: you can write the above code without Intersperse and Join, still in an FP style:

        mylist.Where(x => x.Description != "").Select(x => x.Description).Aggregate("", (s, i) => s + i + "\n");

    If you don't like all of the short-lived string objects on the heap, this works too:

        mylist.Where(x => x.Description != "").Select(x => x.Description).Aggregate(new StringBuilder(), (s, i) => s.Append(i).Append("\n"), s => s.ToString());

  • Anonymous
    April 06, 2007
    I was reading a post by yet another Microsoft guru, Wes Dyer. In it, he presented an application

  • Anonymous
    November 07, 2007
    This would be closer to the Haskell version:

        mylist.Select(x => x.Description).Where(d => d != "").Intersperse("\n").Concat();