Thursday, November 19, 2009

Contextual Keywords in C#

Contextual Keywords in C#

A contextual keyword is used to provide a specific meaning in the code, but it is not a reserved word in C#. This will give some understanding on different contextual keywords and make you comfortable while using those...

yield:
Used in an iterator block to return a value to the enumerator object or to signal the end of iteration.


Use of this construct is vital for LINQ. Yield return allows one enumerable function to be implemented in terms of another. It allows us to write functions that return collections that exhibit lazy behavior. This allows LINQ and LINQ to XML to delay execution of queries until the latest possible moment. it allows queries to be implemented in such a way that LINQ and LINQ to XML do not need to assemble massive intermediate results of queries. Without the avoidance of intermediate results of queries, the system would rapidly become unwieldy and unworkable.

The following two small programs demonstrate the difference in implementing a collection via the IEnumerable interface, and using yield return in an iterator block.

With this first example, you can see that there is a lot of plumbing that you have to write. You have to implement a class that derives from IEnumerable, and another class that derives from IEnumerator. The GetEnumerator() method in MyListOfStrings returns an instance of the class that derives from IEnumerator. But the end result is that you can iterate through the collection using foreach.

public class MyListOfStrings : IEnumerable
{
private string[] _strings;
public MyListOfStrings(string[] sArray)
{
_strings = new string[sArray.Length];

for (int i = 0; i < sArray.Length; i++)
{
_strings[i] = sArray[i];
}
}

public IEnumerator GetEnumerator()
{
return new StringEnum(_strings);
}
}

public class StringEnum : IEnumerator
{
public string[] _strings;

// Enumerators are positioned before the first element
// until the first MoveNext() call.
int position = -1;

public StringEnum(string[] list)
{
_strings = list;
}

public bool MoveNext()
{
position++;
return (position < _strings.Length);
}

public void Reset()
{
position = -1;
}

public object Current
{
get
{
try
{
Console.WriteLine("about to return {0}", _strings[position]);
return _strings[position];
}
catch (IndexOutOfRangeException)
{
throw new InvalidOperationException();
}
}
}
}

class Program
{
static void Main(string[] args)
{
string[] sa = new[] {
"aaa",
"bbb",
"ccc"
};

MyListOfStrings p = new MyListOfStrings(sa);

foreach (string s in p)
Console.WriteLine(s);
}
}

Using the yield return keywords, the equivalent in functionality is as follows. This code is attached to this page:

class Program
{
public static IEnumerable MyListOfStrings(string[] sa)
{
foreach (var s in sa)
{
Console.WriteLine("about to yield return");
yield return s;
}
}

static void Main(string[] args)
{
string[] sa = new[] {
"aaa",
"bbb",
"ccc"
};

foreach (string s in MyListOfStrings(sa))
Console.WriteLine(s);
}
}

As you can see, this is significantly easier.

This isn't as magic as it looks. When you use the yield contextual keyword, what happens is that the compiler automatically generates an enumerator class that keeps the current state of the iteration. This class has four potential states: before, running, suspended, and after. This class has Reset and MoveNext methods, and a Current property. When you iterate through a collection that is implemented using yield return, you are moving from item to item in the enumerator using the MoveNext method. The implementation of iterator blocks is fairly involved. A technical discussion of iterator blocks can be found in the C# specifications.

Yield return is very important when implementing our own query operators (which we will want to do sometimes).

There is no counterpart to the yield keyword in Visual Basic 9.0, so if you are implementing a query operator in Visual Basic 9.0, you must use the approach where you implement IEnumerable and IEnumerator.

One of the important design philosophies about the LINQ and LINQ to XML technologies is that they should not break existing programs. Adding new keywords will break existing programs if the programs happen to use the keyword in a context that would be invalid. Therefore, some keywords are added to the language as contextual keywords. This means that when the keyword is encountered at specific places in the program, it is interpreted as a keyword, whereas when the keyword is encountered elsewhere, it may be interpreted as an identifier. Yield is one of these keywords. When it is encountered before a return or break keyword, it is interpreted by the compiler as appropriate, and the new semantics are applied. If the program was written in C# 1.0 or 1.1, and it contained an identifier named yield, then the identifier continues to be parsed correctly by the compiler, and the program is not made invalid by the language extensions.

Wednesday, November 18, 2009

Extension Methods

Extension methods are special methods that, while they are not part of a data type, you can call them as though they were part of the data type. Extension methods are a feature of C# 3.0.

Writing Extension Methods
To write an extension method, you do the following:

· Define a public static class.
· Define a public static method in the class where the first argument is the data type for which you want the extension method.
· Use the this keyword on the first argument to your public static method. The this keyword denotes the method as an extension method.

This special syntax is actually simply a natural extension of what you do normally when you want to make a utility method for a class that you don't want to (or can't) extend. A common pattern when you want to write a utility method is to define a static method that takes an instance of the class as the first parameter. This is exactly what an extension method is - the only difference is that you can then write a call to the method as though it were part of the class, and the compiler will find and bind to the extension method.

Note that the following class does not define a PrintIt method:

public class MyClass
{
public int IntField;
public MyClass(int arg)
{
IntField = arg;
}
}

This static class defines an extension method for the MyClass class:

public static class MyClassExtension
{
public static void PrintIt(this MyClass arg)
{
Console.WriteLine("IntField:{0}", arg.IntField);
}
}

You can now call the PrintIt method as though it were part of the class.

MyClass mc = new MyClass(10);
mc.PrintIt();

When you run this code, it outputs:

IntField:10

But the really cool thing is that you can write extension methods for a type that in and of itself, can't contain methods. The most important example of this is that you can write extension methods for interfaces. You could also write an extension method for an abstract class.

You can define extension methods for parameterized types as well as non-parameterized types. The standard query operators are almost all extension methods that are defined on IEnumerable.

Note that you can define your own extension methods for IEnumerable. When there isn't a standard query operator that does exactly what you want, you can write your own extension method. This is a powerful technique that adds expressiveness to your code.

When you are writing pure functional code, extension methods are important. There are times that writing extension methods on IEnumerable is the way that you want do things. We'll be using this technique sometimes when writing FP code.

Extension Methods are Vital for LINQ
Extension methods are an integral part of LINQ. Consider the following code that contains a LINQ query expression:

int[] source = new[] { 3, 6, 4, 8, 9, 5, 3, 1, 7, 0 };

IEnumerable query =
from i in source
where i >= 5
select i * 2;

foreach (var i in query)
Console.WriteLine(i);

This code is functionally identical to this:

int[] source = new[] { 3, 6, 4, 8, 9, 5, 3, 1, 7, 0 };

IEnumerable query =
source.Where(i => i >= 5)
.Select(i => i * 2);

foreach (var i in query)
Console.WriteLine(i);

In fact, you can see that there is a direct translation from the query expression to the same query that is expressed in method notation. I believe that in early versions of the C# 3.0 compiler, this translation was implemented almost like a macro. But in any case, queries that are implemented via method syntax rely, of course, on the extension methods that are included with the .NET framework, and hence query expressions also rely on them. It is the ability to write extension methods for generic interfaces that enables queries.

Lambda Expressions

A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.
A lambda anonymous methods, that allows you to declare your method code inline instead of with a delegate function.

All lambda expressions use the lambda operator =>, which is read as "goes to". The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression or statement block. The lambda expression x => x * x is read "x goes to x times x."

(int x) => x + 1 // explicitly typed parameter
(y,z) => return y * z; // implicitly typed parameter

Note:
1. Lambdas are not allowed on the left side of the is or as operator.

To show lambda expressions in context, consider the problem where you have an array with 10 digits in it, and you want to filter for all digits greater than 5. In this case, you can use the Where extension method, passing a lambda expression as an argument to the Where method:

int[] source = new[] { 3, 8, 4, 6, 1, 7, 9, 2, 4, 8 };

foreach (int i in source.Where(x => x > 5))
Console.WriteLine(i);

First, a quick review of delegates:

Defining, Creating, and Using a Delegate
In C#, a delegate is a data structure that refers to either a static method, or an object and an instance method of its class. When you initialize a delegate, you initialize it with either a static method, or a class instance and an instance method.

The following code shows the definition of a delegate and a method that can be used to initialize the delegate:

// Defines a delegate that takes an int and returns an int
public delegate int ChangeInt(int x);

// Define a method to which the delegate can point
static public int DoubleIt(int x)
{
return x * 2;
}

Now, you can create and initialize an instance of the delegate, and then call it:

ChangeInt myDelegate = new ChangeInt(DelegateSample.DoubleIt);
Console.WriteLine("{0}", myDelegate(5));

This, as you would expect, writes 10 to the console.

Using an Anonymous Method
With C# 2.0, anonymous methods allow you to write a method and initialize a delegate in place:

ChangeInt myDelegate = new ChangeInt(
delegate(int x)
{
return x * 2;
}
);
Console.WriteLine("{0}", myDelegate(5));

Using a Lambda Expression
With Lambda expressions, the syntax gets even terser:

ChangeInt myDelegate = x => x * 2;
Console.WriteLine("{0}", myDelegate(5));

This lambda expression is an anonymous method that takes one argument x, and returns x * 2. In this case, the type of x and the type that the lambda returns are inferred from the type of the delegate to which the lambda is assigned.

If you wanted to, you could have specified the type of the argument, as follows:

ChangeInt myDelegate = (int x) => x * 2;
Console.WriteLine("{0}", myDelegate(5));

Using a Lambda with Two Arguments
When using the Standard Query Operators, on occasion, you need to write a lambda expression that takes two arguments.

If you have a delegate that takes two arguments:

// Defines a delegate that takes two ints and returns an int
public delegate int MultiplyInts(int arg, int arg2);

You can declare and initialize a delegate:

MultiplyInts myDelegate = (a, b) => a * b;
Console.WriteLine("{0}", myDelegate(5, 2));

Statement Lambda Expressions
You can write a more complicated lambda expression using statements, enclosing the statements in braces. If you use this syntax, you must use the return statement, unless the lambda returns void:

int[] source = new[] { 3, 8, 4, 6, 1, 7, 9, 2, 4, 8 };

foreach (int i in source.Where(
x =>
{
if (x <= 3)
return true;
else if (x >= 7)
return true;
return false;
}
))
Console.WriteLine(i);

Sometimes developers wonder how to pronounce the => token.

If the lambda expression is a predicate, expressing some condition: c => c.State == "WA" then the => can be spoken as "such that". In this example, you could say "c such that c dot state equals Washington". If the lambda expression is a projection, returning a new type: c => new XElement("CustomerID", c.CustomerID); then the => can be spoken as "becomes". In the above example, you could say "c becomes new XElement with a name of CustomerID and its value is c dot CustomerID". Or "maps to", or "evaluate to", as suggested in the comments below. But most often, I just say "arrow". J

A quick note: predicates are simply boolean expressions that are passed to some method that will use the boolean expression to filter something. A lambda expression used for projection takes one type, and returns a different type. More on both of these concepts later.

The Func Delegate Types
The framework defines a number of parameterized delegate types:

public delegate TR Func();
public delegate TR Func(T0 a0);
public delegate TR Func(T0 a0, T1 a1);
public delegate TR Func(T0 a0, T1 a1, T2 a2);
public delegate TR Func(T0 a0, T1 a1, T2 a2, T3 a3);

In the above delegate types, notice that if there is only one type parameter, it is the return type of the delegate. If there are two type parameters, the first type parameter is the type of the one and only argument, and the second type is the return type of the delegate, and so on. Many of the standard query operators (which are just methods that you call) take as an argument a delegate of one of these types.

Expression Trees
Lambda expressions can also be used as expression trees. This is an interesting topic, but is not part of this discussion on writing pure functional transformations.