July 24, 2007

3 months, 3 LINQ presentations

That's right: in the past 3 months I have given the same basic presentation three times. I don't know if that counts as a groove or a rut. But there are some nice things about doing that. For one, I don't need a new PowerPoint deck. More importantly, I get more questions, and those make me question things more.

Starting off, I'm no LINQ expert. Yet I have to distill it for the group. Luckily LINQ is an easy sell. There is something there for everyone in LINQ. But when it gets right down to it, what is LINQ about? I put it like this: FOR loops are evil and LINQ is the cure.

FOR loops are not run-for-cover-and-grab-a-Bible evil; they are more of a general GOTO-type evil. It isn't as if GOTO is evil in itself; in some languages GOTO is a required statement. But like all inherently benign language constructs, in the wrong hands it can go really badly.

For myself, I've seen some of the worst code in my life nestled in for loops. And even worse, most people don't even know it. How would they? You could say that they just don't know any better. But in reality, there often isn't a better way.

What is the FOR loop but a structured GOTO? Really, that is it. And it isn't a very thick abstraction. If you don't believe me, go check out assembly language. Same with WHILE.

Next, what are you doing in the FOR loop (looping through a list -- DUH)? Sorry, I need a better question: what are you trying to accomplish with the loop? Now, look at the loop, and how easy is it to figure that out after the fact?

There is a reason people don't like assembly anymore: it is too hard to understand after the fact AND it is too hard to write in the first place. There are too many moving parts. Even adding two numbers (registers) is a multi-step operation. Things you do in loops have many of the same qualities.

Here is an example: find the largest value in an array of integers.

First the array:
int[] i = new int[] { 1, 2, 3, 4, 5, 6, 7 };

Here is what you write in C#:

int iMax = i[0];
foreach (int j in i)
{
    if (iMax < j)
        iMax = j;
}

Here is what you would write thanks to LINQ (and Extension Methods):

int iMax = i.Max();

How many ways are there for the first code to go wrong? There are 6 lines, and 4 of them have code. There is one obvious bug in the code anyway: what if the list has no items? You will get an index out of bounds error (an IndexOutOfRangeException) right there on i[0]. And there are many more ways to write this code incorrectly. This is also a simplistic example, so imagine how bad this can get when writing real code.

In the second example, I can't find one, beyond the fact that Max() also throws (an InvalidOperationException) when the list is empty. Plus, there is very little chance that you, or someone else, will not understand what the code is doing.
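To make the comparison concrete, here is a minimal sketch that puts both versions side by side in one small program. The class name MaxDemo and the MaxOrDefault helper are my own inventions for illustration; they are not part of LINQ. The helper shows just one possible way to deal with the empty-list case, assuming a caller-supplied fallback value is acceptable.

```csharp
using System;
using System.Linq;

public static class MaxDemo
{
    // Hand-written loop from the post.
    // Throws IndexOutOfRangeException on an empty array (values[0]).
    public static int LoopMax(int[] values)
    {
        int max = values[0];
        foreach (int v in values)
        {
            if (max < v)
                max = v;
        }
        return max;
    }

    // LINQ version. Max() throws InvalidOperationException on an empty sequence.
    public static int LinqMax(int[] values)
    {
        return values.Max();
    }

    // Hypothetical helper (not a LINQ method): returns a fallback for an empty array.
    public static int MaxOrDefault(int[] values, int fallback)
    {
        return values.Length == 0 ? fallback : values.Max();
    }

    public static void Main()
    {
        int[] i = new int[] { 1, 2, 3, 4, 5, 6, 7 };
        Console.WriteLine(LoopMax(i));                    // 7
        Console.WriteLine(LinqMax(i));                    // 7
        Console.WriteLine(MaxOrDefault(new int[0], -1));  // -1
    }
}
```

Either way you still have to decide what "largest of nothing" means, but with Max() that decision is the only thing left to get wrong.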

Now this is simplistic, which is bad because it hides the true power underneath. There is more to LINQ than a grab bag of small statistical functions (e.g. Sum, Min, Max, Count). Add in a complex object and the Where method and we begin to see it.

Visualize a customer object. It will have properties like FirstName, LastName, Address, City, State, etc. This is in a CSV file that is coming from Sales and Marketing.

First part: load the CSV into your program. No problem, we have all done that from time to time. Now find me all of the people in Idaho. Crap.

Not in LINQ. If you loaded your data into a list (List<Customer> list) you would write code like this:

var idahoCustomers = from c in list
                     where c.State == "ID"
                     select c;

Want that in Lambda?

var idahoCustomers = list.Where(c => c.State == "ID");
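Here is a rough, self-contained sketch of both forms running against the same data. The Customer class, the IdahoDemo class, and the sample rows are all made up for illustration; the in-memory list simply stands in for whatever you loaded from the CSV.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical customer type; only the properties needed here.
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string State { get; set; }
}

public static class IdahoDemo
{
    // Stand-in for rows loaded from the CSV file (invented sample data).
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { FirstName = "Ann", LastName = "Lee", State = "ID" },
            new Customer { FirstName = "Bob", LastName = "Ray", State = "WA" },
            new Customer { FirstName = "Cy",  LastName = "Fox", State = "ID" }
        };
    }

    // The "from ... where ... select" form.
    public static List<Customer> QuerySyntax(List<Customer> list)
    {
        var idahoCustomers = from c in list
                             where c.State == "ID"
                             select c;
        return idahoCustomers.ToList();
    }

    // The same filter written with a lambda.
    public static List<Customer> LambdaSyntax(List<Customer> list)
    {
        return list.Where(c => c.State == "ID").ToList();
    }

    public static void Main()
    {
        List<Customer> list = SampleCustomers();
        Console.WriteLine(QuerySyntax(list).Count);   // 2
        Console.WriteLine(LambdaSyntax(list).Count);  // 2
    }
}
```

Note the == in the where clause. A single = is an assignment and will not compile there.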

Something you should know about now: there are two ways of doing the same thing, and you should probably know both. First is LINQ (query syntax). If you see "from blah blah where blah blah select blah blah" -- you are looking at LINQ. If you see a "=>" you are looking at Lambda (method calls that take lambda expressions).

Personally, I love Lambda more than LINQ. Lambda can do everything LINQ can do, plus everything else. Another way of saying that is "LINQ is a subset of Lambda."

Anyway, this post could run on and on about the wonders of LINQ and Lambda -- but there are plenty of other people doing that. Hopefully, you have already read some of that. What I want to finish off with is a few suggestions for anyone looking to get a grip on all of this.

First, there are a lot of new things to learn these days. WPF, WF, WCF, LINQ, Lambda, etc. Is this different? Yes it is. You need to learn LINQ. Personally I will be asking interview questions based on LINQ in the future.

Second, considering you have limited time, what should you concentrate on? My answer is Lambda and Extension Methods. The more you learn about Extension Methods the more you will be able to do with Lambda. (Warning: if you are going to learn Extension Methods, you should probably learn about Predicates as well.)
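To show how these pieces fit together, here is a small sketch of a home-grown extension method that takes a predicate. CountWhereNot is a name I made up for this example; it is not part of LINQ. Where, Count, and List<T>.FindAll are the real framework methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Hypothetical extension method: counts the items that FAIL the test.
    // The "this" on the first parameter is what makes it an extension method.
    public static int CountWhereNot<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        int count = 0;
        foreach (T item in source)
        {
            if (!predicate(item))
                count++;
        }
        return count;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7 };

        // A predicate is just a function from an item to true/false.
        Func<int, bool> isEven = n => n % 2 == 0;

        Console.WriteLine(numbers.Where(isEven).Count());  // 3
        Console.WriteLine(numbers.CountWhereNot(isEven));  // 4

        // Predicate<T> is the older delegate type used by List<T> methods such as FindAll.
        Predicate<int> isBig = n => n > 5;
        Console.WriteLine(new List<int>(numbers).FindAll(isBig).Count);  // 2
    }
}
```

Once you see that Where is itself just an extension method on IEnumerable<T> that takes a predicate, writing your own stops feeling like magic.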

And some words of warning. Watch your return value types. You will see a lot of IQueryable, IEnumerable, and other strange interfaces as return types. These will often be hidden in 'var's. Be warned, each has its own capabilities, and you should know how to convert between them.

For example, on a List object you get the ForEach method (it is a regular method on List, not an extension method). You don't get that with IEnumerable or IQueryable. But you can get there by calling ToList() on either of those object types.
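A quick sketch of that conversion, with a made-up ToListDemo class and a toy filter. The point is only that the result of Where is an IEnumerable<int>, and ToList() is what gets you to ForEach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListDemo
{
    public static List<int> DoubledEvens(IEnumerable<int> source)
    {
        // Where returns IEnumerable<int>, which has no ForEach method.
        IEnumerable<int> evens = source.Where(n => n % 2 == 0);

        List<int> results = new List<int>();

        // ToList() copies the sequence into a List<int>, which does have ForEach.
        evens.ToList().ForEach(n => results.Add(n * 2));
        return results;
    }

    public static void Main()
    {
        foreach (int n in DoubledEvens(new[] { 1, 2, 3, 4 }))
            Console.WriteLine(n);  // prints 4, then 8
    }
}
```

Keep in mind that ToList() runs the query and copies the results right then, so it is not free on a large sequence.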

Finally: measure. Grab a profiler and run with it. Just like FOR saved you from the ugliness of GOTO, LINQ saves you from the complexity of FOR. But it doesn't get you away from the costs. There will still be times when it is better to write the loop yourself. A good profiler will tell you when.
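If you don't have a profiler handy, a Stopwatch gives you a crude first look. This is only a sketch: MeasureDemo and the summing workload are invented for illustration, a single timed run is noisy, and a real profiler will tell you far more. I'm deliberately not quoting timings, because they depend on your machine and build.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class MeasureDemo
{
    // Hand-written loop.
    public static long LoopSum(int[] values)
    {
        long total = 0;
        for (int k = 0; k < values.Length; k++)
            total += values[k];
        return total;
    }

    // LINQ version; the selector widens each int to long to avoid overflow.
    public static long LinqSum(int[] values)
    {
        return values.Sum(v => (long)v);
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 1000000).ToArray();

        Stopwatch sw = Stopwatch.StartNew();
        long a = LoopSum(data);
        sw.Stop();
        Console.WriteLine("loop: " + a + " in " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        long b = LinqSum(data);
        sw.Stop();
        Console.WriteLine("linq: " + b + " in " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Both methods should print the same sum; only the elapsed times are interesting, and only after you have run them more than once.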



Oh, one final note: I'm using Visual Studio.NET 2008 Beta 1 like everyone else. All code samples are subject to change when Beta 2 releases this week.
