The pitfalls of LINQ deferred execution

Let’s face it, we all love the simplicity of Linq. The fluent syntax, the easy to read – almost sql-like – syntax. However, there are som pitfalls that I’ve seen colleagues fall into unknowingly. One of them is what is called  the deferred execution.

By design, you don’t execute a Linq command, you only specify it. The execution is not performed until the result is required. Hence deferred.

 

Take a look at the following code


//Prepare test data. Could be a set returned
//from a database query
var list = new List<int>();
for (int i = 0; i < 1000000; i++)
{
list.Add(i);
}
//Filter out a small subset of the data [zip code, annual income]
var listSmall = list.Where(i => i > 100000 && i < 150000);
//Now use the small subset and loop through it
//E.g. examine the first 1000 rows
var result = new List<int>();
for (int i = 0; i < 1000; i++)
{
int _i = listSmall.Where(o => o == 100100 + i).Single();
result.Add(_i);
}

view raw

gistfile1.cs

hosted with ❤ by GitHub

Albeit a bit contrieved it is not an unusual pattern. I have a large list that i narrow down to a subset that I would like to work on (zip codes, gender) and then I examine this subset by looping through it.

On my machine this took 15000 ms (I’ve removed the Stopwatch stuff for clarity). This is not reasonable even though we have 1,000,000 records.

 

The reason is that listSmall is not a list (yet)! It is just a defined query. So, every time we execute

listSmall.Where(o => o == 100100 + i).Single()

we are, in fact, executing

list.Where(i => i > 100000 && i < 150000).Where(o => o == 100100 + i).Single()

So, instead of going through 50,000 records a 1000 times, we are searching 1,000,000 records! Not what we intended indeed. The way to solve this is to force Linq to execute the initial filter. The easiest way to do this is to simply append ToList() at the end. Like so:

var listSmall = list.Where(i => i > 100000 && i < 150000).ToList();

Now, the code runs in 400 ms. That’s what I call improvement.

 

Another scenario that has it cause in the same Linq feature. Inspect the following code


//Prepare test data. Could be a set returned
//from a database query
var list = new List<int>();
for (int i = 0; i < 1000000; i++)
{
list.Add(i);
}
//Prepare a list to hold the results
//E.g list of all users born a certain year
var listOfInts = new List<IEnumerable<int>>();
for (int i = 0; i < 10; i++)
{
//Select from the large list all users
//that satisfy the criteria
listOfInts.Add(list.Where(a => a == i));
}
//Now, loop through all years and select the
//first user for every year
foreach(var l in listOfInts)
{
Console.WriteLine(l.First());
}

view raw

gistfile1.cs

hosted with ❤ by GitHub

Here we have a recordset that contains a lot of users and I want to select and group all users from specific years. I loop through the set and select the users. I then store the result in an array. Imagine my surprise when I later loop through my yearbook and see that all users are from the same year. What happended?

Well, since I didn’t, actually, retreive the users in the first loop but only specified the query, when I finally did execute the query the loop is done and i == 0. Through closure, i is visible to my query snippet and is used to select users but – by then – it is 10.

 

The solution is once again to force execution by appending ToList() to the where statement at row 15.

Happy LINQing.

Advertisement

One thought on “The pitfalls of LINQ deferred execution

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s