Ruminations of J.net idle rants and ramblings of a code monkey

ADO.NET: What to use when?

.NET Stuff

This comes from a young lady's question last night at the Houston .Net User's Group. She's at HCC, taking C# courses and felt that it really wasn't clear when to use DataReaders vs. DataTables vs. DataSets. So ... I'm going to put a couple of thoughts down here on that topic, in the hope that she'll be reading it.

DataReader

These are provider specific classes (i.e. SqlDataReader, OracleDataReader, etc.) that give you access to a read-only, forward-only, server-side cursor. It's a firehose cursor. It's not scrollable, it's not updatable and the current record location is tracked by the server, not the client. The classes will be found in the namespaces for the provider (i.e. System.Data.SqlClient) and will implement the System.Data.IDataReader interface. You get a DataReader by calling ExecuteReader on the provider's DataCommand class (i.e. SqlDataCommand).

When to use

DataReaders provide the most performant way to read through a set of data returned from a query, by far. But they also provide the least additional functionality. In general, you want to use then when you want to do a single pass through a resultset and then don't need it. They are typically perfect for ASP.NET applications when you are doing databinding directly to resultsets where you won't need the data again and are going to toss it out after the page renders. If you are using a SqlDataSource, you can set the DataSourceMode property to DataReader to force the data source to use a reader (the default is DataSet).  Also, if you are using an object structure that you build from query results, you should use a DataReader to populate the object hierarchy. They will not, however, work with databinding in Windows Forms application ... this inherently needs to be scrollable and that's something that a DataReader won't do.
One thing to consider ... even if you are doing a single pass, you may not always want to use a DataReader. If you are doing a non-trivial amount of additional processing on each record as it is read, you may want to consider a DataSet because the DataReader will keep a connection open on the database server and consume server resources. Where is that line? Hmmm ... it depends.
Make sure that you dispose of your DataReaders (the do implement IDisposable). I will typically nest DataReaders in a Using block to make sure that they get disposed properly. I also open my DataReaders using the CommandBehavior.CloseConnection option.

Untyped DataTable/DataSet

DataTables and DataSets are not dependant on the ADO.NET provider but are common classes. Both DataSet and DataTable are found in the System.Data namespace and a DataSet is a container for multiple DataTables and adds things like parent/child relationships to multiple DataTables (so we'll focus mainly on DataTables but also discuss where DataSets come into play). DataTables use client-side cursors (like the old ADO Classic cursor location adUseClient) ... the data is retrieved from the server and then the connection is closed, a process that is done by the provider-specific DataAdapter. The DataAdapter opens a DataReader to do the population so, as you can guess, DataTables are slower than DataReaders. They hold no locks and they don't detect updates on the server. Unlike DataReaders, DataTables are updatable and scrollable.

When to use

First, if you are doing databinding directly to resultsets in WindowsForms applications, you'll need to use DataTables (at least). In web applications, these are very appropriate for data that you cache ... things like lookup lists, for example. Since these resultsets change very infrequently, caching the results can really help with scalability and performance (though be careful ... you probably won't want to cache a 5000 row DataTable). For a one-time, non-scrollable read, it really doesn't make any sense at all to use a DataTable in most circumstances, though (unless, as mentioned above, you are doing a non-trivial amount of processing on the data as it is read). There is a monkey wrench to throw out here ... if you have a high-volume, data-driven site, you may well want to do some testing with DataReaders vs. DataTables as your most important limiting factor may well be server load. In some scenarios, DataTables, because of the relatively quick open/close semantics the adapter's Fill method, can actually scale better though they don't perform as well. That's something that you need to determine by understanding what your scalability requirements are and where your critical bottlenecks are. In most scenarios, however, DataReaders will be a better choice. Just to muddy it up a hair, you can also get a DataReader from a DataTable (by calling CreateDataReader()), but this isn't the same as the DataReader mentioned above. It's not a server-side cursor. It is, however, forward-only and read-only and can enumerate through the rows in a DataTable faster than looping over the Rows collection in a For Each loop.
DataTables allow you to add client-side columns to tables (which can be handy at times); these client-side columns can be completely custom information that the application populates for its own nefarious purposes or it can be a column based on a calculation of other columns. Since they are updatable, they can also be useful when data needs to be updated. When you use a DataTable for this purpose, the DataAdapter does the updates using the associated DataCommands for Insert, Update and Delete operations. Since a DataTable is disconnected, it won't inherently detect if there are concurrency conflicts ... you need to do that in your insert/update/delete commands. Handling these concurrency conflicts is a topic all in itself as there is no one "right" way to do it (how you do it depends on requirements and goals).
DataTables also work with the System.Data.DataView class, allowing you to sort and filter the DataTable without making a round trip to the database. This is especially useful for caching scenarios.
DataTables and DataSets are also serializable (unlike DataReaders) which means that you can easily pass them across the wire from a web server to a client application or persist them to disk.

Typed DataTable/DataSet

These are custom DataTables/DataSets that inherit from the base classes (above) and add strongly typed properties, methods and classes to the experience. Visual Studio has a designer that enables you to create these in a visual manner.

When to use

Typed DataTables and DataSets provide the best design-time experience ... you'll get intellisense, compile-time type-checking for your fields, key (both primary and foreign) enforcement and methods for navigating the relationships between tables. Yes, you can add relationships and keys to tables in an untyped DataSet, but you have to write the code for it. With the typed DataTables/DataSets, it's already in there (well, as long as you define them in the designer, that is). The custom columns mentioned above can also be defined in the DataSet designer as can additional Fill/Get methods for the DataAdapter to use. From a RAD perspective, typed DataSets are the way to go ... all of the stuff that you have to wire up with the untyped versions is done for you by the designer. Like their untyped progenitors, they can be serialized but this time they have strong typing when serializing.
That said, typed DataTables and DataSets are the least performant, primarily due to instantiation cost (all of the fields, relationships, etc. are created in generated code when you instantiate one of these puppies). But, realistically, in many scenarios, this hit is worth the benefit to speed application development ... it's one of those things where you need to balance your requirements and goals. Typically, I'll tend to lean towards the RAD beneifts of typed datasets and use untyped only when there is no need at all for things like updates and strong typing (they do exist).

Regardless of which method that you use, I have to say that I do love the way it is virtually impossible to get yourself into an endless loop with ADO.NET (you can, but you have to work at it). With DataReaders, you loop while(rdr.Read()). For DataTables it's foreach(DataRow rw in tbl.Rows) and DataViews is foreach(DataRowView vw in view). Very nice. If you've ever done ADO "Classic", you've forgotten to call MoveNext and had to get medieval on the process (and that could be IIS) to get it to stop running circles around itself. And, if you've done ADO "Classic", you have forgotten to do that. More than once. If you say you haven't done that, you're lying.
So ... I hope that helps a bit. Unfortunately, there are times when there isn't a hard line in the sand when you choose one vs the other, especially when deciding between typed and untyped datasets. That's one of the reasons why I often refer to development more as an art than as a true engineering discipline ... in my mind, engineering is more defined and cut and dry. Calling it an art doesn't mean that you don't need rigor and discipline ... remember, the best artists are so because of the rigor and discipline that they apply to their art. Of course, that could also be because, long ago, I turned away from engineering as a major (what I intended when I started college) and went into English Lit instead (yes, lit is an artform).

I know that I didn't even touch in Linq here ... that can complicate this decision tree as well.