Ruminations of J.net idle rants and ramblings of a code monkey

Linq Performance Part II - Filtering

Linq | Performance

Continuing on the previous topic of Linq Performance … I’m now doing something a bit more interesting than just a “Select From”. All of the key conditions (machines, specs, methodology, blah blah blah) remain the same; no changes at all there. However, I’ll be digging around in filtering this time, comparing filtering between ADO.NET, Linq to SQL and, just for giggles, Linq to Objects and Linq to ADO.NET. Based on the previous results, I’m not using constructors for the custom classes, but rather property binding. The performance of the full property binding (rather than fields) is good and, let’s be honest here, that’s how you should be doing it anyway.

First, an overview of the different types of filters that I’m going to be running:

Find By First Letter: This will do a search/filter for Persons by the first letter of their first name … a LIKE query. Rather than getting the database all optimized and the query results cached, I select the first letter randomly from a cached copy of the table, but this is not included in the results. Yes, the query plan will be cached, but that’s normal and a part of the overall performance system that we want to test anyway.

Find By Non Key: This does a search/filter for Persons by First Name and Last Name. This uses an equality operator and will (most likely, though I didn’t check) return a single row. As before, the First Name/Last Name combination comes from a cached copy of the table and the values are randomly selected. As with the previous test, the query plan is cached and, again, that’s a normal thing.

Find Key: The last test does a search for a row by the primary key value. This does return a single row in all cases. The key to search for is randomly selected from a cached copy of the table.

For all of the tests, actual, valid values were used – hence the random selection from a cached copy of the table. Originally, this was not the case, but I quickly found that, in particular, the Linq tests that returned a single item would throw an exception if nothing was found – though this is likely because I used to First() method on the query return (the exception said that the list was empty). This would not have been an issue if I didn’t call this method and, instead, enumerated over the collection of 0 or 1 with the return.

For each of the test batches, five different methodologies were used.

Data View: This uses an ADO.NET DataView on an existing DataTable to do the filtering. The creation and filling of the table was not included in the test result. This is a method that you would use for cached data and tests the filtering capabilities of the DataView on its own.

DataSet FIlter: This uses the Filter() method to retrieve a subset of the rows. As with the previous, the table that is used comes prefilled.

Linq Detached: Essentially, this is Linq to Objects. The results come from the database and are then detached from the database, putting the results into a generic List<> class. As with the previous, creating and filling the list is not included in the results.

Linq To ADO: For something different, this filters a DataTable using Linq. Again, this is something that you’d do with a cache. And, yet again (I’m beginning to feel like a broken record here), the filling of the DataTable that is used for this is not included in the results.

Linq To Sql: This uses pure Linq to Sql, retrieving the results from the database and then returning the results. In this case, the cost of actually hitting the database is included in the results. As you can, I’m sure, imagine, this is the only test where the query plan caching made any difference at all; the rest of the tests were working on data in memory.

I did not include results where a DataSet returns results directly from the database; the performance characteristics of this with respect to the Linq To Sql tests would be the same as in the previous selection tests.

So, without further ado, the results:

Test BatchData ViewDataSet FilterLinq DetachedLinq to ADOLinq to Sql
Find By First Letter25.68723.67949.84436.97928.084
Find By Non Key34.516138.7829.06627.02012.787
Find Key17.1150.1626.2007.0299.064
Average25.77354.20821.70323.67616.645

image

I have to say, I found the results quite interesting. There are some pretty wide variations in the methods, depending on what you are doing. I was also surprised to see that the Find By First Letter had the worst performance for Linq Detached … this was not what I was expecting and not something that I had seen in previous test runs on a different machine (but that was also testing against a Debug build rather than a Release build). The average time for the DataSet Filter was very highly impacted by the Find By Non Key batch … this is just really bad with DataSets. Find Key for the dataset was very fast though … so much so that you can’t even see the bar in the chart; this is due to the indexing of the primary key by the DataSet. Linq Detached was hurt by the Find By First Letter batch; my theory is that this is due to string operations, which have always been a little on the ugly side. Other than that, the find performance of Linq to Objects was quite good and finding by key and by non-key fields were little different – and this difference would, again, most likely be due to the string comparison vs. integer comparisons.