Ruminations of J.net
idle rants and ramblings of a code monkey

Linq Performance Part II - Filtering

Linq | Performance
Continuing on the previous topic of Linq Performance … I’m now doing something a bit more interesting than just a “Select From”. All of the key conditions (machines, specs, methodology, blah blah blah) remain the same; no changes at all there. However, I’ll be digging around in filtering this time, comparing filtering between ADO.NET, Linq to SQL and, just for giggles, Linq to Objects and Linq to ADO.NET. Based on the previous results, I’m not using constructors for the custom classes, but rather property binding. The performance of the full property binding (rather than fields) is good and, let’s be honest here, that’s how you should be doing it anyway. First, an overview of the different types of filters that I’m going to be running: Find By First Letter: This will do a search/filter for Persons by the first letter of their first name … a LIKE query. Rather than getting the database all optimized and the query results cached, I select the first letter randomly from a cached copy of the table, but this is not included in the results. Yes, the query plan will be cached, but that’s normal and a part of the overall performance system that we want to test anyway. Find By Non Key: This does a search/filter for Persons by First Name and Last Name. This uses an equality operator and will (most likely, though I didn’t check) return a single row. As before, the First Name/Last Name combination comes from a cached copy of the table and the values are randomly selected. As with the previous test, the query plan is cached and, again, that’s a normal thing. Find Key: The last test does a search for a row by the primary key value. This does return a single row in all cases. The key to search for is randomly selected from a cached copy of the table. For all of the tests, actual, valid values were used – hence the random selection from a cached copy of the table. Originally, this was not the case, but I quickly found that, in particular, the Linq tests that returned a single item would throw an exception if nothing was found – though this is likely because I used to First() method on the query return (the exception said that the list was empty). This would not have been an issue if I didn’t call this method and, instead, enumerated over the collection of 0 or 1 with the return. For each of the test batches, five different methodologies were used. Data View: This uses an ADO.NET DataView on an existing DataTable to do the filtering. The creation and filling of the table was not included in the test result. This is a method that you would use for cached data and tests the filtering capabilities of the DataView on its own. DataSet FIlter: This uses the Filter() method to retrieve a subset of the rows. As with the previous, the table that is used comes prefilled. Linq Detached: Essentially, this is Linq to Objects. The results come from the database and are then detached from the database, putting the results into a generic List<> class. As with the previous, creating and filling the list is not included in the results. Linq To ADO: For something different, this filters a DataTable using Linq. Again, this is something that you’d do with a cache. And, yet again (I’m beginning to feel like a broken record here), the filling of the DataTable that is used for this is not included in the results. Linq To Sql: This uses pure Linq to Sql, retrieving the results from the database and then returning the results. In this case, the cost of actually hitting the database is included in the results. 
As you can, I’m sure, imagine, this is the only test where the query plan caching made any difference at all; the rest of the tests were working on data in memory. I did not include results where a DataSet returns results directly from the database; the performance characteristics of this with respect to the Linq To Sql tests would be the same as in the previous selection tests. So, without further ado, the results: Test Batch Data View DataSet Filter Linq Detached Linq to ADO Linq to Sql Find By First Letter 25.687 23.679 49.844 36.979 28.084 Find By Non Key 34.516 138.782 9.066 27.020 12.787 Find Key 17.115 0.162 6.200 7.029 9.064 Average 25.773 54.208 21.703 23.676 16.645 I have to say, I found the results quite interesting. There are some pretty wide variations in the methods, depending on what you are doing. I was also surprised to see that the Find By First Letter had the worst performance for Linq Detached … this was not what I was expecting and not something that I had seen in previous test runs on a different machine (but that was also testing against a Debug build rather than a Release build). The average time for the DataSet Filter was very highly impacted by the Find By Non Key batch … this is just really bad with DataSets. Find Key for the dataset was very fast though … so much so that you can’t even see the bar in the chart; this is due to the indexing of the primary key by the DataSet. Linq Detached was hurt by the Find By First Letter batch; my theory is that this is due to string operations, which have always been a little on the ugly side. Other than that, the find performance of Linq to Objects was quite good and finding by key and by non-key fields were little different – and this difference would, again, most likely be due to the string comparison vs. integer comparisons.

Linq Performance - Part I

.NET Stuff | Linq | Performance
Well, it’s been a while since I did my initial review of some simple Linq performance tests. Since then, I’ve done a bit more testing of Linq performance and I’d like to share that. The results are enlightening, to say the least. I did this because I’ve gotten a lot of questions regarding the performance of Linq and, in particular, Linq to Sql – something that is common whenever there is a new data-oriented API. Now, let me also say that performance isn’t the only consideration … there are also considerations of functionality and ease of use, as well as the overall functionality of the API and its applicability to a wide variety of scenarios. I used the same methodology that I detailed in this previous post. Now, all of the tests were against the AdventureWorks sample database’s Person.Contact table with some 20,000 rows. Not the largest table in the world, but it’s also a good deal larger that the much-beloved Northwind database. I also decided to re-run all of the tests a second time on my home PC (rather than my laptop) as the client and one of my test servers as the database server. The specs are as follows: Client DB Server AMD Athlon 64 X2 4400+ AMD Athlon 64 X2 4200+ 4 GB RAM 2 GB RAM Vista SP1 x64 Windows Server 2008 Standard x64 Visual Studio 2008 SP1 Sql Server 2008 x64 So, with that out of the way, let’s discuss the first test. Simple Query This is a simple “SELECT * FROM Person.Contact” query … nothing special or funky. From there, as with all of the tests, I loop through the results and assign them to temporary, local variables. An overview of the tests is below: DataReaderIndex Uses a data reader and access the values using the strongly-typed GetXXX methods (i.e. GetString(int ordinal)). With this set, the ordinal is looked up using GetOrdinal before entering the loop to go over the resultset. This is my preferred method of using a DataReader. int firstName = rdr.GetOrdinal("FirstName"); int lastName = rdr.GetOrdinal("LastName"); while (rdr.Read()) { string fullName = rdr.GetString(firstName) + rdr.GetString(lastName); } rdr.Close(); DataReaderHardCodedIndex This is the same as TestDataReaderIndex with the exception that the ordinal is not looked up before entering the loop to go over the resultset but is hard-coded into the application. while (rdr.Read()) { string fullName = rdr.GetString(0) + rdr.GetString(1); } rdr.Close(); DataReaderNoIndex Again, using a reader, but not using the strongly-typed GetXXX methods. Instead, this is using the indexer property, getting the data using the column name as an object. This is how I see a lot of folks using Data Readers. while (rdr.Read()) { string fullName = (string)rdr["FirstName"] + (string)rdr["LastName"]; } rdr.Close(); LinqAnonType Uses Linq with an anonymous type var contactNames = from c in dc.Contacts select new { c.FirstName, c.LastName }; foreach (var contactName in contactNames) { string fullName = contactName.FirstName + contactName.LastName; } LinqClass_Field Again, uses Linq but this time it’s using a custom type. In this class the values are stored in public fields, rather than variables. IQueryable<AdvWorksName> contactNames = from c in dc.Contacts select new AdvWorksName() {FirstName= c.FirstName, LastName= c.LastName }; foreach (var contactName in contactNames) { string fullName = contactName.FirstName + contactName.LastName; } DataSet This final test uses an untyped dataset. We won’t be doing a variation with a strongly-typed dataset for the select because they are significantly slower than untyped datasets. 
Also, the remoting format for the dataset is set to binary, which will help improve the performance of the dataset, especially as we get more records.

    DataSet ds = new DataSet();
    ds.RemotingFormat = SerializationFormat.Binary;
    SqlDataAdapter adp = new SqlDataAdapter(cmd);
    adp.Fill(ds);
    foreach (DataRow dr in ds.Tables[0].Rows)
    {
        string fullName = dr.Field<String>("FirstName") + dr.Field<String>("LastName");
    }
    cnct.Close();

LinqClass_Prop: This uses a custom Linq class with properties for the values.

    IQueryable<AdvWorksNameProps> contactNames = from c in dc.Persons
                                                 select new AdvWorksNameProps() { FirstName = c.FirstName, LastName = c.LastName };
    foreach (var contactName in contactNames)
    {
        string fullName = contactName.FirstName + contactName.LastName;
    }

LinqClass_Ctor: This uses the same Linq class as above but initializes it by calling the constructor rather than binding to the properties.

    IQueryable<AdvWorksNameProps> contactNames = from c in dc.Persons
                                                 select new AdvWorksNameProps(c.FirstName, c.LastName);
    foreach (var contactName in contactNames)
    {
        string fullName = contactName.FirstName + contactName.LastName;
    }

If you are wondering why the different "flavors" of Linq ... it's because, when I first started re-running these tests for the blog, I got some strange differences that I hadn't seen before between (what is now) LinqAnonType and LinqClass_Field. On examination, I found that these things made a difference, and I wanted a more rounded picture of what we were looking at here ... so I added a couple of tests.

And the results ...

Method                     Average
LinqClass_Field            277.61
DataReaderIndex            283.43
DataReaderHardCodedIndex   291.17
LinqClass_Prop             310.76
DataSet                    323.71
LinqAnonType               329.26
LinqClass_Ctor             370.20
DataReaderNoIndex          401.63

These results are actually quite different from what I saw when I ran the tests on a single machine ... which is quite interesting and somewhat surprising to me. Linq still does very well when compared to DataReaders ... depending on exactly how you implement the class. I didn't expect that the version using the constructor would turn out to be the one with the worst performance ... and I'm not really sure what to make of that. I was surprised to see the DataSet do so well ... it didn't in previous tests, but in those cases I also didn't change the remoting format to binary; this has a huge impact on load performance, especially as the datasets get larger (XML gets pretty expensive when it starts getting big). I've got more tests but, due to the sheer length of this post, I'm going to post them separately.
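For reference, the post never shows the custom classes behind the LinqClass_* tests. A minimal sketch of what they presumably look like; the class names come from the test code above, but the exact member layout is my assumption:

    // LinqClass_Field: values stored in public fields.
    public class AdvWorksName
    {
        public string FirstName;
        public string LastName;
    }

    // LinqClass_Prop and LinqClass_Ctor: the same shape, but with properties
    // plus a constructor so both initialization styles can be tested.
    public class AdvWorksNameProps
    {
        public AdvWorksNameProps() { }  // used by the property-binding test

        public AdvWorksNameProps(string firstName, string lastName)  // used by the _Ctor test
        {
            FirstName = firstName;
            LastName = lastName;
        }

        public string FirstName { get; set; }
        public string LastName { get; set; }
    }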

Cool way to do ASP.NET Caching with Linq

.NET Stuff | Linq | Web (and ASP.NET) Stuff
OK, well, I think it's cool (and since the mind is its own place ...). I've been a big fan of ASP.net's cache API since I found out it way back in the 1.0 beta. It certainly solves something that was problematic in ASP "Classic" in a clean, elegant and darn easy to use way. Unfortunately, not a lot of folks seem to know about it. So I'll start with a little overview of ASP.net caching. As the name implies, it's a cache that sits server side. All of the relevant, .Net-supplied classes are in the System.Web.Caching namespace and the class representing the cache itself is System.Web.Caching.Cache. You can access it from the current HttpContext (which you'll see). The management of the cache is handled completely by ASP.net ... you just have to add objects to it and then read from it. When you add to the cache, you can set options like dependencies, expiration, priority and a delegate to call when the item is removed from the cache. Dependencies are interesting ... they will automatically invalidate (and remove) the cache item based on notification from the dependency. ASP.net 1.x had only 1 cache dependency class (System.Web.Caching.CacheDependency) that allowed you to have a dependency on a file, another cache item, and array of them or another CacheDependency. Framework 2.0 introduced System.Web.Caching.SqlCacheDependency for database dependencies and System.Web.Caching.AggregateCacheDependency for multiple, related dependencies. With the AggregateCacheDependency, if one of the dependencies changes, it item is invalidated and tossed from the cache. Framework 2.0 also (finally) "unsealed" the CacheDependency class, so you could create your own cache dependencies. With expiration, you can have an absolute expiration (specific time) or a sliding expiration (TimeSpan after last access). Priority plays into the clean-up algorithm; the Cache will remove items that haven't expired if the cache taking up too much memory/resources. Items with a lower priority are evicted first. Do yourself a favor and make sure that you keep your cache items reasonable. Your AppDomain will thank you for it. ASP.net also provides page and partial-page caching mechanisms. That, however, is out of our scope here. For the adventurous among that don't know what that is ... So ... the cache ... mmmmm ... yummy ... gooooood. It's golly-gee-gosh-darn useful for items that you need on the site, but don't change often. Those pesky drop-down lookup lists that come from the database are begging to be cached. It takes a load off the database and is a good way to help scalability - at the cost of server memory, of course. (There ain't no free lunch.) Still, I'm a big fan of appropriate caching. So ... what's the technique I mentioned that this post is title after? Well, it's actually quite simple. It allows you to have 1 single common method to add and retrieve items from the cache ... any Linq item, in fact. You don't need to know anything about the cache ... just the type that you want and the DataContext that it comes from. And yes, it's one method to rule them all, suing generics (generics are kewl!) and the Black Voodoo Majick goo. From there, you can either call it directly from a page or (my preferred method) write a one-line method that acts as a wrapper. The returned objects are detached from the DataContext before they are handed back (so the DataContext doesn't need to be kept open all) and returned as a generic list object. 
The cache items are keyed by the type name of the DataContext and the object/table, so it's actually possible to have the same LinqToSql object come from two different DataContexts and cache both of them. While you can load up the cache on application start-up, I don't like doing that ... it really is a killer for the app start time. I like to lazy load on demand. (And I don't wanna hear any comments about the lazy.) Here's the C# code:

    /// <summary>
    /// Handles retrieving and populating Linq objects in the ASP.NET cache
    /// </summary>
    /// <typeparam name="LinqContext">The DataContext that the object will be retrieved from.</typeparam>
    /// <typeparam name="LinqObject">The object that will be returned to be cached as a collection.</typeparam>
    /// <returns>Generic list with the objects</returns>
    public static List<LinqObject> GetCacheItem<LinqContext, LinqObject>()
        where LinqObject : class
        where LinqContext : System.Data.Linq.DataContext, new()
    {
        //Build the cache item name. Tied to the context and the object.
        string cacheItemName = typeof(LinqObject).ToString() + "_" + typeof(LinqContext).ToString();
        //Check to see if the items are in the cache.
        List<LinqObject> cacheItems = HttpContext.Current.Cache[cacheItemName] as List<LinqObject>;
        if (cacheItems == null)
        {
            //It's not in the cache -or- is the wrong type.
            //Create a new list.
            cacheItems = new List<LinqObject>();
            //Create the context in a using{} block to ensure cleanup.
            using (LinqContext dc = new LinqContext())
            {
                try
                {
                    //Get the table with the object from the data context.
                    System.Data.Linq.Table<LinqObject> table = dc.GetTable<LinqObject>();
                    //Add to the generic list. Detaches from the data context.
                    cacheItems.AddRange(table);
                    //Add to the cache. No absolute expiration and a 60 minute sliding expiration.
                    HttpContext.Current.Cache.Add(cacheItemName, cacheItems, null,
                        System.Web.Caching.Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(60),
                        System.Web.Caching.CacheItemPriority.Normal, null);
                }
                catch (Exception ex)
                {
                    //Something bad happened.
                    throw new ApplicationException("Could not retrieve the requested cache object", ex);
                }
            }
        }
        //return ...
        return cacheItems;
    }

And in VB (see, I am multi-lingual!) ...

    ''' <summary>
    ''' Handles retrieving and populating Linq objects in the ASP.NET cache
    ''' </summary>
    ''' <typeparam name="LinqContext">The DataContext that the object will be retrieved from.</typeparam>
    ''' <typeparam name="LinqObject">The object that will be returned to be cached as a collection.</typeparam>
    ''' <returns>Generic list with the objects</returns>
    Public Shared Function GetCacheItem(Of LinqContext As {DataContext, New}, LinqObject As Class)() As List(Of LinqObject)
        Dim cacheItems As List(Of LinqObject)
        'Build the cache item name. Tied to the context and the object.
        Dim cacheItemName As String = GetType(LinqObject).ToString() + "_" + GetType(LinqContext).ToString()
        'Check to see if the items are in the cache.
        Dim cacheObject As Object = HttpContext.Current.Cache(cacheItemName)
        'Check to make sure it's the correct type (guarding against Nothing first).
        If cacheObject IsNot Nothing AndAlso TypeOf cacheObject Is List(Of LinqObject) Then
            cacheItems = CType(cacheObject, List(Of LinqObject))
        End If
        If cacheItems Is Nothing Then
            'It's not in the cache -or- is the wrong type.
            'Create a new list.
            cacheItems = New List(Of LinqObject)()
            'Create the context in a Using block to ensure cleanup.
            Using dc As LinqContext = New LinqContext()
                Try
                    'Get the table with the object from the data context.
                    Dim table As Linq.Table(Of LinqObject) = dc.GetTable(Of LinqObject)()
                    'Add to the generic list. Detaches from the data context.
                    cacheItems.AddRange(table)
                    'Add to the cache. No absolute expiration and a 60 minute sliding expiration.
                    HttpContext.Current.Cache.Add(cacheItemName, cacheItems, Nothing, _
                        Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(60), _
                        CacheItemPriority.Normal, Nothing)
                Catch ex As Exception
                    'Something bad happened.
                    Throw New ApplicationException("Could not retrieve the requested cache object", ex)
                End Try
            End Using
        End If
        'return ...
        Return cacheItems
    End Function

The comments, I think, pretty much say it all. It is a static method (and the class is a static class) because it's not using any private fields (variables). This helps performance a little bit and, really, there is no reason to instantiate a class if it's not using any state. Also, note the generic constraints - these are actually necessary and make sure that we aren't handed something funky that won't work. These constraints are checked and enforced by the compiler.

Using this to retrieve cache items is now quite trivial. The next example shows a wrapper for an item from the AdventureWorks database. I made it a property, but it could just as easily be a method. We won't get into choosing one over the other; that gets religious.

    public static List<StateProvince> StateProvinceList
    {
        get { return GetCacheItem<AdvWorksDataContext, StateProvince>(); }
    }

And VB ...

    Public ReadOnly Property StateProvinceList() As List(Of StateProvince)
        Get
            Return GetCacheItem(Of AdvWorksDataContext, StateProvince)()
        End Get
    End Property

Isn't that simple? Now, if you only have one DataContext type, you can safely code that type into the method instead of taking it as a generic. However, looking at this, you have to admit ... you can use this to handle the cache in any ASP.net project where you are using Linq. I think it's gonna go into my personal shared library of tricks.

As I think you can tell, I'm feeling a little snarky. It's Friday afternoon, so I have an excuse. BTW ... bonus points to whoever can send me an email naming the lit reference (and finishing it!) in this entry. Umm, no, it isn't Lord of the Rings.

Austin Code Camp Stuff ...

.NET Stuff | Linq | Performance | User Groups
I promised that I'd make the materials from my talk at the Austin Code Camp available for download. I've finally gotten it compressed and uploaded. It's 111 MB so be forewarned. Since I used WinRar (and that's not as ubiquitous as zip formats), I've made is a self-extracting archive. You'll need Visual Studio 2008 Team Edition for Software Developers (at least) to read all of the performance results. But I do have an Excel spreadsheet with the pertinent data.

Thoughts on Linq vs ADO.NET - Simple Query

.NET Stuff | Linq | Performance
I had a little discussion today with an old buddy of mine this morning. I won't mention his name (didn't ask him for permission to) but those of you in Houston probably remember him ... he used to be a Microsoft guy and is probably one of the best developers in town. I have a world of respect for him and his opinion. So ... it started with this ... he was surprised by the "do you think a user will notice 300 ms".  Of course, that's a loaded question. They won't. But his point was this: 300 ms isn't a lot of time for a user, but under a heavy load, it an be a lot of time for the server. Yes, it can be ... if you have a heavy load. I won't give a blow-by-blow account of the conversation (I can't remember it line for line anyway), but it was certainly interesting. One thing that we both agreed on that is important for web developers to understand is this: performance is not equal to scalability. They are related. But they are not the same. It is possible (and I've seen it) to create a web app that is really fast for a single user, but dies when you get a few users. Not only have I seen it, but (to be honest here), I've done it ... though, in my defense, it was my first ASP "Classic" application some 10 or 11 years ago; I was enamored with sessions at the time. This was also the days when ADO "Classic" was new and RDO was the more commonly used API. And ... if you are a developer and haven't done something like that ... well, you're either really lucky or you're just not being honest. With that out of the way ... I'd like to give my viewpoint on this: Data Readers are still the fastest way to get data for a single pass. If it's one-time-use data that is just thrown away, it's still the way to go. No question. (At least, IMHO). But there's a lot of data out there that isn't a single pass and then toss ... it may be something that you keep around for a while as the user is working on it (which you often see in a Smart Client application) or is shared among multiple users (such as a lookup field that is consistent ... or pretty much consistent ... across all users). In both of these cases, you will need to have an object that can be held in memory and accessed multiple times. If you are doing a Smart Client application, it also needs to be scrollable. Data Readers don't provide this. So ... if you are doing these types of things, the extra 300 ms is actually well worth it, In a web application, you'll scale a lot better (memory is a lot faster than a database query and it keeps load off the database server for little stuff) by caching common lookup lists in the global ASP.NET Cache. One thing that I find interesting ... the LinqDataSource in ASP.NET doesn't have an EnableCaching property like the SqlDataSource. It does, however, have a property StoreOriginalValuesInViewState.  Hmmm ... curious. Storing this in ViewState can have its benefits ... it's a per-page, per-user quasi-cache ... but at the cost of additional data going over the wire (which might be somewhat painful over a 28.8 modem ... yes, some folks still use those). That said, ViewState is compressed to minimize the wire hit and can be signed to prevent tampering. But ... the EnableCaching puts the resulting DataSet (it won't work in DataReader mode) into the global ASP.NET cache ... which, again, is good for things like lookups that really don't change very often, if at all.  For the Smart Client application ... well, DataReaders have limited use there anyway due to the respective natures of DataReaders and Smart Client apps.  
Granted, you can use a DataReader and then manually add the results to the control that you want them displayed in ... but that can be a lot of code (yeah, ComboBoxes are pretty simple, but a DataGrid ... or a grid of any sort?). One thing that struck me is the coding involved with master/child displays in Smart Client applications. There are two ways that you can do this in ADO.NET: you can get all the parents and children in one shot and load 'em into a DataSet (or object structure), or you can retrieve the children "on demand" (as the user requests them). Each method has its benefits, but I'd typically lean toward on-demand access, especially if we are looking at a lot of data. This involves writing code to deal with the switching of focus in the parent record and then filling the child. Not something that's all that difficult, but it is still more stuff to write and maintain. With Linq to Sql, this can be configured with the DeferredLoadingEnabled property of the DataContext, and it will do the on-demand retrieval for you, depending on the value of this property (settable at runtime ... you won't see it in the property sheet in the DataContext designer). There's a quick sketch of this at the end of the post.

There was also some discussion about using Linq vs. rich data objects. This ... hmmm ... well, I'll just give my perspective. This is certainly possible with Linq, though certainly not with anonymous types (see http://blog.microsoft-j.net/2008/04/15/LinqAndAnonymousTypes.aspx for a discussion of them). But ... the Linq to Sql classes are generated as partial classes, so you can add to them to your heart's delight, as well as add methods that hit stored procs that aren't directly tied to a data class. Additionally, you can certainly use Linq to Sql with existing (or new) rich data classes that you create independently of your data access and then fill from the results of your query. As for the performance of these ... well, at the moment I don't have any numbers, but I'd venture to guess that it would be comparable to anonymous types.

Performance aside, one thing that you also need to consider when looking to use Linq in your projects is the other benefits that Linq brings to the table. Things like the ease of sorting and filtering the objects returned by Linq to Sql (or Linq to XML, for that matter) using Linq to Objects. There is also the (way cool, IMHO) feature that lets you merge data from two different data sources (i.e. Linq to Sql and Linq to XML) into a single collection of objects or a single object hierarchy. The additional capabilities of one methodology over another are often overlooked when writing ASP.NET applications ... it's simply easier to look at the raw, single-user, single-page performance without thinking about the data in the holistic context of the overall application. This is, however, somewhat myopic; you need to keep the overall application context in mind when making technology and architecture decisions. With this in mind ... hmmm ... off to do a bit more testing. Not sure if I'll do updates first or Linq sorting and filtering vs. DataViews.
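Here's that deferred loading sketch. The Contact/SalesOrderHeaders association is what the designer would generate from the AdventureWorks foreign keys, so treat the entity names as my assumptions:

    using (AdvWorksDataContext dc = new AdvWorksDataContext())
    {
        // True is the default: child rows are fetched on first access.
        // Set it to false and the child EntitySet stays empty unless preloaded.
        dc.DeferredLoadingEnabled = true;

        Contact parent = dc.Contacts.First();

        // Touching the children here fires the on-demand child query for you ...
        // no hand-written "focus changed, now fill the child" code.
        int orderCount = parent.SalesOrderHeaders.Count;
    }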

Linq vs. ADO.NET - Simple Query

.NET Stuff | Performance | Linq
In my last blog post, I took a look at how Linq handles anonymous types. I also promised to do some performance comparisons between Linq and traditional ADO.NET code. Believe it or not, creating a "fair" test is not as easy as one would think, especially when data access is involved. Due to the nature of connection pooling, whichever method is first to be tested gets hit with the cost of creating the connection ... which skews the test. Yeah, I'm sure this is out there in the blogosphere, but I do like to do these things myself. Call it the Not-Invented-Here syndrome. This particular test set is for a very simple query. I created a set of 4 methods to test for performance within a standard Windows Console Application, which should give an overall comparison of data access. All tests used the AdventureWorks sample database, with the statement (or its Linq equivalent) Select FirstName, LastName From Person.Contact. This is about as simple a query as you can get. From there, each method concatenated the two field results into a single string value ... The Linq test used an anonymous type going against a data class created with the Data Class designer. Data Reader Test 1 (DataReaderIndex) used the strongly-typed DataReader.GetString(index) ... and I did cheat a little with this one by hardcoding the index rather than looking it up before entering the loop (though this is how I'd do it in the "real world"). In previous tests that I've done, I've found that this gives about 10-20% better performance than DataReader[columnName].ToString() ... though that does include the "lookup" that I mentioned previously. Data Reader Test 2 represents the more common pattern that I've seen out there ... using DataReader[columnName].ToString(). Now, I'm not sure which of these methods Data Binding uses and, honestly, that's not in the test ... though, now that I think of it, it may be a good thing to test as well. Finally, I included a test for DataSets (TestDataSet) ... using an untyped DataSet. I've found (again, from previous tests) that this performs far better than a typed DataSet ... the typed DataSet gets hit (hard) by the creation/initialization costs. Before running any tests, I included a method called InitializeConnectionPool, which creates and opens a connection, creates a command with the Sql statement (to cache the access plan), calls ExecuteNonQuery and then exits. This is not included in the results, but is a key part of making sure that the test is as fair as possible. Additionally, all of the tests access the connection string in the same way ... using the application properties. In looking at the code generated by the LinqToSql class, this is how they get the connection string. This ensures that the connection string for all methods is the same, which means that the connection pools will be the same. To actually do the test, I called each method a total of 30 times from the applications Main, each function in the same loop. This would help to eliminate any variances. After running each test, I also called GC.Collect() to eliminate, as much as possible, the cost of garbage collection from the results.  I also closed all unnecessary processes and refrained from doing anything else to ensure that all possible CPU and memory resources were allocated to the test. One thing that I've noticed from time to time is that it seems to matter the order in which functions are called, so I made a total of 4 runs, each with a different function first. 
For each run, I tossed out the min and max values and then averaged the rest: (total - min - max) / (numCalls - 2). This gave me a "normalized" value that, I hoped, would provide a fair, apples-to-apples comparison. Each method had a set of 4 values, each from 30 calls, 28 of which were actually included in the normalized value. I then took the average of the 4 values. I know that sounds like an overly complex methodology ... and I agree ... but I've seen some weird things go on and some pretty inconsistent results. That said, in looking at the results, there was not a lot of difference between the 4 runs, which makes me feel pretty good about the whole thing. So ... without further ado ... the results (values are in milliseconds):

Method                  Normalized Average
TestDataReaderIndex     56.64767857
TestLinq                75.57098214
TestDataSet             117.2503571
TestDataReaderNoIndex   358.751875

Now, I have to say, I was somewhat surprised by the TestDataReaderNoIndex results ... previous tests that I had done didn't show such a big difference between it and TestDataReaderIndex ... though I wonder if that has something to do with the way I did this test: hardcoding the indexes into TestDataReaderIndex. I'm not surprised that TestDataReaderIndex turned out to be the fastest. DataReaders have been, and still are, the absolute fastest way to get data from the database ... that is, if you access them using integer indexes. However, TestLinq didn't come that far behind and was certainly more performant than the untyped DataSet.

So ... let's think about this for a second. The Linq collection that is returned is more like a DataSet than a DataReader. DataReaders are forward-only, read-only, server-side cursors. Use them once and kiss them goodbye. Both the Linq collection and the DataSet allow random access and are restartable ... and they are both updatable as well. I've had a lot of folks ask about the performance of Linq, and now I can, without question and with all confidence, tell them that the performance is quite good. Still, let's be honest ... the difference between the fastest and the slowest is a mere 300 ms. Do you really think users will notice this?

UPDATE: You can download the code and the tests that I used for this at https://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=jdotnet&ReleaseId=948. If you get different results, I'd be interested to hear about it. Even more, I'd be interested in the methodology that you used to create the report.
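As a footnote to the methodology: the normalization step is simple enough to show in code. This helper is my own sketch, not code from the downloadable test project:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class TestStats
    {
        // (total - min - max) / (numCalls - 2): toss out the best and worst
        // runs, then average what's left.
        public static double NormalizedAverage(IList<double> runsMs)
        {
            if (runsMs.Count < 3)
                throw new ArgumentException("Need at least three runs to discard min and max.");
            return (runsMs.Sum() - runsMs.Min() - runsMs.Max()) / (runsMs.Count - 2);
        }
    }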

Linq and Anonymous Types

.NET Stuff | Linq | Performance
I've been playing with Linq quite a bit recently. I have to say ... it's some cool stuff and revolutionizes data access on the .Net platform. One of the things in Linq that I'm really fascinated with is anonymous types. These classes are created based on a Linq statement and only have the properties that you specified. They're nicely type-safe and work with IntelliSense. Beauty and goodness. Now, for a time, I just played with them and used them without much thought about what's going on behind the scenes. But ... my curiosity got the better of me and I decided to dig a bit and see what's going on. And the best way to do this? Lutz Roeder's Reflector of course! So first ... the code. Not much, pretty simple. using (DataClasses1DataContext dc = new DataClasses1DataContext()) { var contactNames = from c in dc.Contacts select new { c.FirstName, c.LastName }; foreach(var contactName in contactNames) { Console.WriteLine(contactName.FirstName + contactName.LastName); } } I could have made it even simpler ... remove the foreach loop. But that let's me know that all's well. So ... what happens with the anonymous type? It's actually compiled in the assembly. Yup, that's right ... it's a compiled class, just like a case that you create. But there is some black voodoo majik going on and, I'm certain, some significant compiler changes to make this happen. Here's the raw IL generated for the class (with attributes): .class private auto ansi sealed beforefieldinit lt;<FirstName>j__TPar, <LastName>j__TPar> extends [mscorlib]System.Object { .custom instance void [mscorlib]System.Diagnostics.DebuggerDisplayAttribute::.ctor(string) =          { string('\\{ FirstName = {FirstName}, LastName = {LastName} }') Type=string('<Anonymous Type>') } .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() And here's the C# version if the IL: [DebuggerDisplay(@"\{ FirstName = {FirstName}, LastName = {LastName} }", Type="<Anonymous Type>"), CompilerGenerated] internal sealed class <>f__AnonymousType0<<FirstName>j__TPar, <LastName>j__TPar>   If you had any doubt at all, the "CompilerGenerated" attribute pretty much says it all. All of the references to the anonymous type in the code are replaced by this class in the compiled IL. And the return value from the query? It's a generic class: [mscorlib]System.Collections.Generic.IEnumerable`1<class <>f__AnonymousType0`2<string, string>>. Pretty cool, eh? Now I'm off to dig into the performance of these beasties when compared to a DataReader and a DataSet. Early results look promising, but I've got some work to do to make sure it's a valid and fair comparison.