Ruminations of J.net idle rants and ramblings of a code monkey

Cool way to do ASP.NET Caching with Linq

.NET Stuff | Linq | Web (and ASP.NET) Stuff
OK, well, I think it's cool (and since the mind is its own place ...). I've been a big fan of ASP.net's cache API since I found out about it way back in the 1.0 beta. It certainly solves something that was problematic in ASP "Classic" in a clean, elegant and darn easy to use way. Unfortunately, not a lot of folks seem to know about it. So I'll start with a little overview of ASP.net caching.

As the name implies, it's a cache that sits server side. All of the relevant, .Net-supplied classes are in the System.Web.Caching namespace and the class representing the cache itself is System.Web.Caching.Cache. You can access it from the current HttpContext (which you'll see). The management of the cache is handled completely by ASP.net ... you just have to add objects to it and then read from it. When you add to the cache, you can set options like dependencies, expiration, priority and a delegate to call when the item is removed from the cache. Dependencies are interesting ... they will automatically invalidate (and remove) the cache item based on notification from the dependency. ASP.net 1.x had only one cache dependency class (System.Web.Caching.CacheDependency) that allowed you to have a dependency on a file, another cache item, an array of them or another CacheDependency. Framework 2.0 introduced System.Web.Caching.SqlCacheDependency for database dependencies and System.Web.Caching.AggregateCacheDependency for multiple, related dependencies. With the AggregateCacheDependency, if one of the dependencies changes, the item is invalidated and tossed from the cache. Framework 2.0 also (finally) "unsealed" the CacheDependency class, so you can create your own cache dependencies. With expiration, you can have an absolute expiration (a specific time) or a sliding expiration (a TimeSpan after last access). Priority plays into the clean-up algorithm; the Cache will remove items that haven't expired if the cache is taking up too much memory/resources, and items with a lower priority are evicted first. Do yourself a favor and make sure that you keep your cache items reasonable. Your AppDomain will thank you for it. ASP.net also provides page and partial-page caching mechanisms. That, however, is out of our scope here. For the adventurous among you who don't know what that is ...

So ... the cache ... mmmmm ... yummy ... gooooood. It's golly-gee-gosh-darn useful for items that you need on the site, but don't change often. Those pesky drop-down lookup lists that come from the database are begging to be cached. It takes a load off the database and is a good way to help scalability - at the cost of server memory, of course. (There ain't no free lunch.) Still, I'm a big fan of appropriate caching.

So ... what's the technique I mentioned that this post is titled after? Well, it's actually quite simple. It allows you to have one single common method to add and retrieve items from the cache ... any Linq item, in fact. You don't need to know anything about the cache ... just the type that you want and the DataContext that it comes from. And yes, it's one method to rule them all, using generics (generics are kewl!) and the Black Voodoo Majick goo. From there, you can either call it directly from a page or (my preferred method) write a one-line method that acts as a wrapper. The returned objects are detached from the DataContext before they are handed back (so the DataContext doesn't need to be kept open at all) and returned as a generic list object.
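As a quick aside before getting to the Linq technique: using the raw cache API described above directly looks something like this. This is just a minimal sketch; the key, the file path and the LoadCountriesFromDatabase helper are made up for illustration.

// A minimal sketch of the raw cache API: a file dependency, a sliding expiration,
// a priority and a removal delegate. The key, path and loader are hypothetical.
string key = "CountryList";
List<string> countries = LoadCountriesFromDatabase();   // hypothetical helper

HttpContext.Current.Cache.Insert(
    key,
    countries,
    new System.Web.Caching.CacheDependency(
        HttpContext.Current.Server.MapPath("~/App_Data/countries.xml")),
    System.Web.Caching.Cache.NoAbsoluteExpiration,
    TimeSpan.FromMinutes(30),
    System.Web.Caching.CacheItemPriority.BelowNormal,
    (cacheKey, value, reason) => { /* fires when the item is removed from the cache */ });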
The cache items are keyed by the type name of the DataContext and the object/table so that it's actually possible to have the same LinqToSql object come from two different DataContexts and cache both of them. While you can load up the cache on application start up, I don't like doing that ... it really is a killer for the app start time. I like to lazy load on demand. (And I don't wanna hear any comments about the lazy.) Here's the C# code:

/// <summary>
/// Handles retrieving and populating Linq objects in the ASP.NET cache
/// </summary>
/// <typeparam name="LinqContext">The DataContext that the object will be retrieved from.</typeparam>
/// <typeparam name="LinqObject">The object that will be returned to be cached as a collection.</typeparam>
/// <returns>Generic list with the objects</returns>
public static List<LinqObject> GetCacheItem<LinqContext, LinqObject>()
    where LinqObject : class
    where LinqContext : System.Data.Linq.DataContext, new()
{
    //Build the cache item name. Tied to the context and the object.
    string cacheItemName = typeof(LinqObject).ToString() + "_" + typeof(LinqContext).ToString();
    //Check to see if they are in the cache.
    List<LinqObject> cacheItems = HttpContext.Current.Cache[cacheItemName] as List<LinqObject>;
    if (cacheItems == null)
    {
        //It's not in the cache -or- is the wrong type.
        //Create a new list.
        cacheItems = new List<LinqObject>();
        //Create the context in a using{} block to ensure cleanup.
        using (LinqContext dc = new LinqContext())
        {
            try
            {
                //Get the table with the object from the data context.
                System.Data.Linq.Table<LinqObject> table = dc.GetTable<LinqObject>();
                //Add to the generic list. Detaches from the data context.
                cacheItems.AddRange(table);
                //Add to the cache. No absolute expiration and a 60 minute sliding expiration.
                HttpContext.Current.Cache.Add(cacheItemName, cacheItems, null,
                    System.Web.Caching.Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(60),
                    System.Web.Caching.CacheItemPriority.Normal, null);
            }
            catch (Exception ex)
            {
                //Something bad happened.
                throw new ApplicationException("Could not retrieve the requested cache object", ex);
            }
        }
    }
    //return ...
    return cacheItems;
}

And in VB (see, I am multi-lingual!) ...

''' <summary>
''' Handles retrieving and populating Linq objects in the ASP.NET cache
''' </summary>
''' <typeparam name="LinqContext">The DataContext that the object will be retrieved from.</typeparam>
''' <typeparam name="LinqObject">The object that will be returned to be cached as a collection.</typeparam>
''' <returns>Generic list with the objects</returns>
Public Shared Function GetCacheItem(Of LinqContext As {DataContext, New}, LinqObject As Class)() As List(Of LinqObject)
    Dim cacheItems As List(Of LinqObject)
    'Build the cache item name. Tied to the context and the object.
    Dim cacheItemName As String = GetType(LinqObject).ToString() + "_" + GetType(LinqContext).ToString()
    'Check to see if they are in the cache.
    'TryCast returns Nothing if the item is missing or the wrong type (and, unlike
    'calling GetType() on the raw cache entry, it won't throw when the entry is Nothing).
    cacheItems = TryCast(HttpContext.Current.Cache(cacheItemName), List(Of LinqObject))
    If cacheItems Is Nothing Then
        'It's not in the cache -or- is the wrong type.
        'Create a new list.
        cacheItems = New List(Of LinqObject)()
        'Create the context in a Using block to ensure cleanup.
        Using dc As LinqContext = New LinqContext()
            Try
                'Get the table with the object from the data context.
                Dim table As Linq.Table(Of LinqObject) = dc.GetTable(Of LinqObject)()
                'Add to the generic list. Detaches from the data context.
                cacheItems.AddRange(table)
                'Add to the cache. No absolute expiration and a 60 minute sliding expiration.
                HttpContext.Current.Cache.Add(cacheItemName, cacheItems, Nothing, _
                    Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(60), _
                    CacheItemPriority.Normal, Nothing)
            Catch ex As Exception
                'Something bad happened.
                Throw New ApplicationException("Could not retrieve the requested cache object", ex)
            End Try
        End Using
    End If
    'return ...
    Return cacheItems
End Function

The comments, I think, pretty much say it all. It is a static method (and the class is a static class) because it's not using any private fields (variables). This does help performance a little bit and, really, there is no reason to instantiate a class if it's not using any state. Also, note the generic constraints - these are actually necessary and make sure that we aren't handed something funky that won't work. These constraints are checked and enforced by the compiler.

Using this to retrieve cache items is now quite trivial. The next example shows a wrapper for an item from the AdventureWorks database. I made it a property but it could just as easily be a method. We won't get into choosing one over the other; that gets religious.

public static List<StateProvince> StateProvinceList
{
    get { return GetCacheItem<AdvWorksDataContext, StateProvince>(); }
}

And VB ...

Public ReadOnly Property StateProvinceList() As List(Of StateProvince)
    Get
        Return GetCacheItem(Of AdvWorksDataContext, StateProvince)()
    End Get
End Property

Isn't that simple? Now, if you only have one DataContext type, you can safely hard-code that type into the method instead of taking it as a generic. However, looking at this, you have to admit ... you can use this in any ASP.net project where you are using Linq to handle the cache. I think it's gonna go into my personal shared library of tricks. As I think you can tell, I'm feeling a little snarky. It's Friday afternoon so I have an excuse. BTW ... bonus points to whoever can send me an email naming the lit reference (and finish it!) in this entry. Umm, no it isn't Lord of the Rings.
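To round it out, using the wrapper from a page is a one-liner. Here's a minimal sketch; LookupCache (the name of the static class) and the StateList DropDownList are hypothetical, and Name/StateProvinceID are the usual AdventureWorks StateProvince columns:

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        //Bind the drop-down to the cached list. The first hit fills the cache;
        //subsequent hits, across all users, come straight from memory.
        StateList.DataSource = LookupCache.StateProvinceList;   //LookupCache is a hypothetical class name
        StateList.DataTextField = "Name";
        StateList.DataValueField = "StateProvinceID";
        StateList.DataBind();
    }
}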

ADO.NET: What to use when?

.NET Stuff
This comes from a young lady's question last night at the Houston .Net User's Group. She's at HCC, taking C# courses, and felt that it really wasn't clear when to use DataReaders vs. DataTables vs. DataSets. So ... I'm going to put a couple of thoughts down here on that topic, in the hope that she'll be reading it.

DataReader

These are provider-specific classes (i.e. SqlDataReader, OracleDataReader, etc.) that give you access to a read-only, forward-only, server-side cursor. It's a firehose cursor. It's not scrollable, it's not updatable and the current record location is tracked by the server, not the client. The classes will be found in the namespaces for the provider (i.e. System.Data.SqlClient) and will implement the System.Data.IDataReader interface. You get a DataReader by calling ExecuteReader on the provider's command class (i.e. SqlCommand).

When to use

DataReaders provide the most performant way to read through a set of data returned from a query, by far. But they also provide the least additional functionality. In general, you want to use them when you want to do a single pass through a resultset and then don't need it. They are typically perfect for ASP.NET applications when you are doing databinding directly to resultsets where you won't need the data again and are going to toss it out after the page renders. If you are using a SqlDataSource, you can set the DataSourceMode property to DataReader to force the data source to use a reader (the default is DataSet). Also, if you are using an object structure that you build from query results, you should use a DataReader to populate the object hierarchy. They will not, however, work with databinding in Windows Forms applications ... that inherently needs to be scrollable and that's something that a DataReader won't do.

One thing to consider ... even if you are doing a single pass, you may not always want to use a DataReader. If you are doing a non-trivial amount of additional processing on each record as it is read, you may want to consider a DataSet because the DataReader will keep a connection open on the database server and consume server resources. Where is that line? Hmmm ... it depends.

Make sure that you dispose of your DataReaders (they do implement IDisposable). I will typically nest DataReaders in a using block to make sure that they get disposed properly. I also open my DataReaders using the CommandBehavior.CloseConnection option.

Untyped DataTable/DataSet

DataTables and DataSets are not dependent on the ADO.NET provider but are common classes. Both DataSet and DataTable are found in the System.Data namespace, and a DataSet is a container for multiple DataTables that adds things like parent/child relationships between them (so we'll focus mainly on DataTables but also discuss where DataSets come into play). DataTables use client-side cursors (like the old ADO Classic cursor location adUseClient) ... the data is retrieved from the server and then the connection is closed, a process that is done by the provider-specific DataAdapter. The DataAdapter opens a DataReader to do the population so, as you can guess, DataTables are slower than DataReaders. They hold no locks and they don't detect updates on the server. Unlike DataReaders, DataTables are updatable and scrollable.

When to use

First, if you are doing databinding directly to resultsets in Windows Forms applications, you'll need to use DataTables (at least). In web applications, these are very appropriate for data that you cache ... things like lookup lists, for example. Since these resultsets change very infrequently, caching the results can really help with scalability and performance (though be careful ... you probably won't want to cache a 5000 row DataTable). For a one-time, non-scrollable read, though, it really doesn't make much sense to use a DataTable in most circumstances (unless, as mentioned above, you are doing a non-trivial amount of processing on the data as it is read). There is a monkey wrench to throw out here ... if you have a high-volume, data-driven site, you may well want to do some testing with DataReaders vs. DataTables, as your most important limiting factor may well be server load. In some scenarios, DataTables, because of the relatively quick open/close semantics of the adapter's Fill method, can actually scale better even though they don't perform as well. That's something that you need to determine by understanding what your scalability requirements are and where your critical bottlenecks are. In most scenarios, however, DataReaders will be a better choice.

Just to muddy it up a hair, you can also get a DataReader from a DataTable (by calling CreateDataReader()), but this isn't the same as the DataReader mentioned above. It's not a server-side cursor. It is, however, forward-only and read-only and can enumerate through the rows in a DataTable faster than looping over the Rows collection in a For Each loop. DataTables allow you to add client-side columns to tables (which can be handy at times); these client-side columns can be completely custom information that the application populates for its own nefarious purposes or a column based on a calculation of other columns. Since they are updatable, they can also be useful when data needs to be updated. When you use a DataTable for this purpose, the DataAdapter does the updates using the associated commands for Insert, Update and Delete operations. Since a DataTable is disconnected, it won't inherently detect if there are concurrency conflicts ... you need to do that in your insert/update/delete commands. Handling these concurrency conflicts is a topic all in itself, as there is no one "right" way to do it (how you do it depends on requirements and goals). DataTables also work with the System.Data.DataView class, allowing you to sort and filter the DataTable without making a round trip to the database. This is especially useful for caching scenarios. DataTables and DataSets are also serializable (unlike DataReaders), which means that you can easily pass them across the wire from a web server to a client application or persist them to disk.

Typed DataTable/DataSet

These are custom DataTables/DataSets that inherit from the base classes (above) and add strongly typed properties, methods and classes to the experience. Visual Studio has a designer that enables you to create these in a visual manner.

When to use

Typed DataTables and DataSets provide the best design-time experience ... you'll get IntelliSense, compile-time type-checking for your fields, key (both primary and foreign) enforcement and methods for navigating the relationships between tables. Yes, you can add relationships and keys to tables in an untyped DataSet, but you have to write the code for it. With the typed DataTables/DataSets, it's already in there (well, as long as you define them in the designer, that is). The custom columns mentioned above can also be defined in the DataSet designer, as can additional Fill/Get methods for the DataAdapter to use.
From a RAD perspective, typed DataSets are the way to go ... all of the stuff that you have to wire up with the untyped versions is done for you by the designer. Like their untyped progenitors, they can be serialized, but this time they have strong typing when serializing. That said, typed DataTables and DataSets are the least performant, primarily due to instantiation cost (all of the fields, relationships, etc. are created in generated code when you instantiate one of these puppies). But, realistically, in many scenarios this hit is worth the benefit to application development speed ... it's one of those things where you need to balance your requirements and goals. Typically, I'll tend to lean towards the RAD benefits of typed datasets and use untyped only when there is no need at all for things like updates and strong typing (such scenarios do exist).

Regardless of which method you use, I have to say that I do love the way it is virtually impossible to get yourself into an endless loop with ADO.NET (you can, but you have to work at it). With DataReaders, you loop while(rdr.Read()). For DataTables it's foreach(DataRow rw in tbl.Rows), and for DataViews it's foreach(DataRowView vw in view). Very nice. If you've ever done ADO "Classic", you've forgotten to call MoveNext and had to get medieval on the process (and that could be IIS) to get it to stop running circles around itself. And, if you've done ADO "Classic", you have forgotten to do that. More than once. If you say you haven't done that, you're lying.

So ... I hope that helps a bit. Unfortunately, there are times when there isn't a hard line in the sand when you choose one vs. the other, especially when deciding between typed and untyped datasets. That's one of the reasons why I often refer to development more as an art than as a true engineering discipline ... in my mind, engineering is more defined and cut and dry. Calling it an art doesn't mean that you don't need rigor and discipline ... remember, the best artists are so because of the rigor and discipline that they apply to their art. Of course, that could also be because, long ago, I turned away from engineering as a major (what I intended when I started college) and went into English Lit instead (yes, lit is an art form). I know that I didn't even touch on Linq here ... that can complicate this decision tree as well.
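For completeness, here's a minimal sketch of the basic patterns described above - a firehose DataReader read with CommandBehavior.CloseConnection, a DataAdapter fill into a DataTable with a DataView for sort/filter, and the loop idioms. The connection string, table and column names are made up for illustration.

using System;
using System.Data;
using System.Data.SqlClient;

class AdoNetPatterns
{
    const string ConnString = "...";   //hypothetical connection string

    //Single-pass, read-only: DataReader in using blocks, closing the connection with the reader.
    static void ReadOnce()
    {
        using (SqlConnection conn = new SqlConnection(ConnString))
        using (SqlCommand cmd = new SqlCommand("SELECT StateId, Name FROM StateLookup", conn))
        {
            conn.Open();
            using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection))
            {
                while (rdr.Read())
                {
                    //Strongly-typed, ordinal-based access is the fastest path.
                    Console.WriteLine("{0}: {1}", rdr.GetInt32(0), rdr.GetString(1));
                }
            }
        }
    }

    //Disconnected, scrollable, cacheable: DataAdapter fill plus a DataView for sort/filter.
    static DataTable LoadLookupTable()
    {
        DataTable lookups = new DataTable("StateLookup");
        using (SqlDataAdapter adapter = new SqlDataAdapter("SELECT StateId, Name FROM StateLookup", ConnString))
        {
            adapter.Fill(lookups);   //opens and closes the connection for you
        }

        //Sort and filter in memory - no round trip to the database.
        DataView view = new DataView(lookups, "Name LIKE 'A%'", "Name", DataViewRowState.CurrentRows);
        foreach (DataRowView vw in view)
        {
            Console.WriteLine(vw["Name"]);
        }
        return lookups;
    }
}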

New Account Email Validation (Part II)

.NET Stuff | Security | Web (and ASP.NET) Stuff
In my previous post, I discussed the things to keep in mind with new account validation. Well, as promised, I've done a sample of one way to do this. Certainly step 1 is to do as much as possible without writing any code, following the KISS principle. Since I am using the CreateUserWizard control, I set the DisableCreatedUser property to true and LoginCreatedUser to false. Easy enough. But that's not the whole story. We need to generate the actual validation key.

There are a lot of ways that one can do this. Personally, I wanted, as much as possible, to not have any dependency on storing the validation code in the database anywhere. This, of course, ensures that, should our database be penetrated, the validation codes cannot be determined. With that, then, the validation code should come from data that is supplied by the user and then generated in a deterministic way on the server. Non-deterministic, of course, won't work too well. I started down (and really, almost completed) a path that took the UserName and Email, concatenated them and generated the bytes (using System.Security.Cryptography.Rfc2898DeriveBytes) to create a 32-byte salt. I then concatenated the UserName and email again and hashed that with SHA1. This certainly satisfied my conditions ... the values for this would come from the user and so the validation code didn't need to be stored. And it was certainly convoluted enough that a validation code would be highly difficult to guess, even by brute force. In the email to the user, I also included a helpful link that passed the validation code in the query string. Still, this code was some 28 characters in length. Truly, not an ideal scenario. And definitely complex. It was certainly fun to get the regular expression to validate this correct ... more because I'm just not all that good at regular expressions than anything else. If you are interested, the expression is ^\w{27}=$.

Thinking about this, I really didn't like the complexity. It seems that I fell into that trap that often ensnares developers: loving the idea of a complex solution. Yes, it's true ... sometimes developers are absolutely drawn to creating complex solutions to what should be a simple problem, just because they can. I guess it's a sort of intellectual ego coming out ... we seem to like to show off how smart we are. And all developers can be smitten by it. Developing software can be complex enough on its own ... there really is no good reason to add to that complexity when you don't need to. There are two key reasons that come to mind for this. 1) The code is harder to maintain. Digging through the convolutions of overly complicated code can make the brain hurt. I've done it and didn't like it at all. 2) The more complex the code, the more likely you are to have bugs or issues. There's more room for error and the fact that it's complicated and convoluted makes it easier to introduce these errors and then miss them later. It also makes thorough testing harder, so many bugs may not be caught until it's too late.

So, I wound up re-writing the validation code generation. How did I do it? It's actually very simple. First, I convert the user name, email address and create date into byte arrays. I then loop over all of the values, adding them together. Finally, I take the sum of the lengths of the user name, email address and creation date and subtract it from the previous value. This then becomes the validation code. Typically, it's a 4 digit number.
This method has several things going for it. First, it sticks to the KISS principle. It is simple. There are very few lines of code in the procedure and these lines are pretty simple to follow. There are other values that can be used ... for example, the MembershipUser's ProviderUserKey ... when you are using the Sql Membership provider, this is a GUID. But not using it here avoids taking a dependency on a specific provider. Second, it is generated from a combination of values supplied by the user and values that are kept in the database. There is nothing that indicates what is being used in the code generation ... it's just a field that happened to be there. This value is not as random as the previous one, I know. It's a relatively small number and a bad guy could likely get it pretty quickly with a brute-force attack if they knew it was all numbers. To mitigate this, one could keep track of attempted validations with the MembershipUser using the Comment property, locking the account when there are too many attempts within a certain time period. No, I did not do this. Considering what I was going to use this for (yes, I am actually going to use it), the potential damage was pretty low and I felt that it was an acceptable risk. Overall, it's a pretty simple way to come up with a relatively good validation code. And it's also very user-friendly. Here's the code:

public static string CreateValidationCode(System.Web.Security.MembershipUser user)
{
    byte[] userNameBytes = System.Text.Encoding.UTF32.GetBytes(user.UserName);
    byte[] emailBytes = System.Text.Encoding.UTF32.GetBytes(user.Email);
    byte[] createDateBytes = System.Text.Encoding.UTF32.GetBytes(user.CreationDate.ToString());
    int validationcode = 0;
    foreach (byte value in userNameBytes)
    {
        validationcode += value;
    }
    foreach (byte value in emailBytes)
    {
        validationcode += value;
    }
    foreach (byte value in createDateBytes)
    {
        validationcode += value;
    }
    validationcode -= (user.UserName.Length + user.Email.Length + user.CreationDate.ToString().Length);
    return validationcode.ToString();
}

Architecturally, all of the code related to this is in a single class called MailValidation. Everything related to the validation codes is done in that class, so moving from the overly-complex method to my simpler method was easy as pie. All I had to do was change the internal implementation. Now that I think of it, there's no reason why it can't be done using a provider model so that different implementations are plug-able.

Once the user is created, we generate the validation code. It is never stored on the server, but is sent to the user in an email. This email comes from the MailDefinition specified with the CreateUserWizard ... this little property points to a file that the wizard will automatically send to the new user. It will put the user name and password in there (with the proper formatting), but you'll need to trap the SendingMail event to modify it before it gets sent in order to put the URL and validation code in the email.

//This event fires when the control sends an email to the new user.
protected void CreateUserWizard1_SendingMail(object sender, MailMessageEventArgs e)
{
    //Get the MembershipUser that we just created.
    MembershipUser newUser = Membership.GetUser(CreateUserWizard1.UserName);
    //Create the validation code.
    string validationCode = MailValidation.CreateValidationCode(newUser);
    //And build the url for the validation page.
    UriBuilder builder = new UriBuilder("http", Request.Url.DnsSafeHost, Request.Url.Port,
        Page.ResolveUrl("ValidateLogin.aspx"),
        "?C=" + validationCode);   //the query string for this UriBuilder overload must start with "?"
    //Add the values to the mail message.
    e.Message.Body = e.Message.Body.Replace("<%validationurl%>", builder.Uri.ToString());
    e.Message.Body = e.Message.Body.Replace("<%validationcode%>", validationCode);
}

One thing that I want to point out here ... I'm using the UriBuilder class to create the link back to the validation page. Why don't I just take the full URL of the page and replace "CreateAccount.aspx" with the new page? Well, I would be concerned about canonicalization issues. I'm not saying that there would be any, but it's better to be safe. The UriBuilder will give us a good, clean url. The port is added in there so that it works even if it's running under the VS development web server, which puts the site on random ports. I do see a lot of developers using things like String.Replace() and parsing to get urls in these kinds of scenarios. I really wish they wouldn't.

Things do get a little more complicated, however, when actually validating the code. There is a separate form, of course, that does this. Basically, it collects the data from the user, regenerates the validation key and then compares them. It also checks the user's password by calling Membership.ValidateUser. If either of these fails, the user is not validated. Seems simple, right? Well, there is a monkey wrench in here. If the MembershipUser's IsApproved property is false, ValidateUser will always fail. So we can't fully validate the user until they are approved. But ... we need the password to validate their user account. See the problem? If I just check the validation code and the password is incorrect, you shouldn't be able to validate. What I had to wind up doing was this: once the validation code was validated, I had to then set IsApproved to true. Then I'd call ValidateUser. If this failed, I'd then set it back.

protected void Validate_Click(object sender, EventArgs e)
{
    //Get the membership user.
    MembershipUser user = Membership.GetUser(UserName.Text);
    bool validatedUser = false;
    if (user != null)
    {
        if (MailValidation.CheckValidationCode(user, ValidationCode.Text))
        {
            //Have to set the user to approved to validate the password.
            user.IsApproved = true;
            Membership.UpdateUser(user);
            if (Membership.ValidateUser(UserName.Text, Password.Text))
            {
                validatedUser = true;
            }
        }
    }
    //Set the validity for the user.
    SetUserValidity(user, validatedUser);
}

You do see, of course, where I had to approve the user and then check. Not ideal, not what I wanted, but there was really no other way to do it. There are a couple of things, however, that I want to point out. Note that I do the actual, final work at the very end of the function. Nowhere do I call that SetUserValidity method until the end, after I've explored all of the code branches necessary. Again, I've seen developers embed this stuff directly in the If blocks. Ewww. And that makes it a lot harder if someone needs to alter the process later. Note that I also initialize the validatedUser variable to false. Assume the failure. Only when I know it's gone through all of the tests and is good do I set that validatedUser flag to true. It both helps keep the code simpler and ensures that if something was missed, it would fail. Well, that's it for now. You can download the code at http://code.msdn.microsoft.com/jdotnet.
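The CheckValidationCode call above isn't shown in the post, but since CreateValidationCode is deterministic it can be as simple as regenerating the code and comparing. This is just a minimal sketch of that idea - the real implementation is in the download:

public static bool CheckValidationCode(System.Web.Security.MembershipUser user, string suppliedCode)
{
    //Regenerate the deterministic code from the user's data and compare it to what
    //the user typed in. Nothing is ever persisted to the database.
    string expectedCode = CreateValidationCode(user);
    return string.Equals(expectedCode, suppliedCode.Trim(), StringComparison.Ordinal);
}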

New Account Email Validation (Part I)

.NET Stuff | Security | Web (and ASP.NET) Stuff
We’ve all seen it … when you sign up for a new account, your account isn’t active until you validate it from an email sent to the registered email address. This allows sites with public registration to ensure a couple of things. First, that the email provided by the user actually does exist (and they didn’t have a typo). Second, it also validates that the person signing up has access to that email address. Now, let’s be clear, it doesn’t necessarily ensure that the user signing up is the legitimate owner of the email address … there isn’t much more that we can do to actually validate that, as we don’t control the email system that they use … but, in a realistic world, that’s the best we can do.

Now, there was a little exchange recently on the ASP.NET forums that I had with someone asking how to do this very thing with ASP.NET’s membership system. Of course, this is perfectly possible to do. Now, I do believe in the KISS (that’s Keep It Simple Stupid) principle, so I look at this from a perspective of using as much built-in functionality as possible to accomplish it. So, for example, I’d really prefer not to have any additional database dependencies, such as new tables, to support this new functionality that isn’t there out-of-the-box.

First things first … when the new account is created, the account (represented by the MembershipUser class) should have the IsApproved property set to false. This will prevent any logins until such time as the flag is changed. There are two ways to do this, depending on how the user signs up. If you are using the built-in CreateUserWizard, you can set the DisableCreatedUser property to true. You can also do it if you are calling the API directly from a custom WebForm (or other method). This is accomplished by calling the CreateUser method on the Membership class. There are two overloads that will allow you to do this; both of them take a boolean isApproved argument. Again, if this is false, the user won’t be allowed to log in until they are approved. Of course, in extranet-type scenarios with some user self-service, this can be used to validate that the newly registered extranet user is valid via a manual review process. And in those types of cases, because of the very nature of extranets, you would want it to be a manual review process to thoroughly vet the users. Note that you’ll also want to do this if you happen to be a judge and you have some nasty personal stuff that some people may find offensive or think leads to a conflict of interest in a case that you are trying. But that’s not what we are doing here. We want this to be open and completely self-service, but to still validate that the email is valid and the user has access to it, ensuring that we can communicate with them (or spam them, depending on your viewpoint).

We’ve already discussed the whole e-mail-account-security thing … nothing that we can do about that, so we’ll just move on. But how can we further ensure that we have a (relatively) secure method for doing this, even with the whole e-mail security issue? First, we need to make sure that whatever validation code we use is not easy for a bad guy to guess … an easily guessed code defeats the purpose. How far you go with this will certainly depend a great deal on what the risk is from this failure … for example, if this is a site where you have dog pictures, it’s not that big of a deal. If, however, it’s an ecommerce site, you need to be a bit more cautious.
Second, we also need to make sure that the validation code wasn’t intercepted en route. Keep in mind – and a number of devs seem to forget this – SMTP is not a secure protocol. Neither is POP3. They never were; they just weren’t designed for it. (This highlights one of the things that I tell developers a lot … There is no majik security pixii dust … you cannot bolt “security” on at the end of the project; it absolutely must be built in from the initial design and architecture phase.) Everything in SMTP is transmitted in the clear, as is POP3. In fact, if you’re feeling ambitious, you can pop open a telnet client and use it for SMTP and POP3. It’s not the most productive but it is enlightening. These are the two things that come to mind that are unique to this scenario. There are additional things that you need to account for … Sql Injection, XSS and the rest of the usual suspects. Now that I’ve said all of that, I will also tell you that I’m working on a sample that shows some techniques for doing this. When I’m done, I will post it here along with a discussion of what was done and what alternative options there are based on your needs, requirements and risk analysis. So … keep tuned right here for more fun and .Net goodness!

Thoughts on Secure File Downloads

.NET Stuff | Security | Web (and ASP.NET) Stuff
Well, that’s kinda over-simplifying it a bit. It’s more about file downloads and protecting files from folks that shouldn’t see them, and comes from some of the discussion last night at the OWASP User Group. So … I was thinking that I’d put up a master file-download page for my file repository. The idea around it is that there would be an admin section where I could upload the files, a process that would also put them into the database with the relevant information (name, content type, etc.). This would be an example of one of the vulnerabilities discussed last night … insecure direct object reference. Rather than giving out filenames, etc., it would be a file identifier (OWASP #4). That way, there is no direct object reference. That file id would be handed off to a handler (ASHX) that would actually send the file to the client (just doing a redirect from the handler doesn’t solve the issue at all).

But I got to thinking … I might also want to limit access to some files to specific users/logins. So now we are getting into restricting URL access (OWASP #10). If I use the same handler as mentioned above, I can’t use ASP.NET to restrict access, leaving me vulnerable. Certainly, using GUIDs makes them harder to guess, but it won’t prevent UserA, who has access to FileA, sending a link to UserB, who does not have access to FileA. However, once UserB logged in, there would be nothing to prevent him/her from getting to the file … there is no additional protection above and beyond the indirect object reference and I’m not adequately protecting URL access. This highlights one of the discussion points last night – vulnerabilities often travel in packs. We may look at things like the OWASP Top Ten and identify individual vulnerabilities, but that looks at the issues in isolation. The reality is that you will often have a threat with multiple potential attack vectors from different vulnerabilities. Or you may have a vulnerability that is used to exploit another vulnerability (for example, a Cross-Site Scripting vulnerability that is used to exploit a Cross Site Request Forgery vulnerability and so on and so on).

So … what do I do here? Well, I could just not worry about it … the damage potential and level of risk is pretty low, but that really just evades the question. It’s much more fun to actually attack this head on and come up with something that mitigates the threat. One method is to have different d/l pages for each role and then protect access to those pages in the web.config file. That would work, but it’s not an ideal solution. When coming up with mitigation strategies, we should also keep usability in mind and balance it with our mitigation strategy. This may not be ideal to the purist, but the reality is that we do need to take things like usability and end-user experience into account. Of course, there’s also the additional maintenance that the “simple” method would entail as well – something I’m not really interested in. Our ideal scenario would have one download page that would then display the files available to the user based on their identity, whether that is anonymous or authenticated. So … let’s go through how to implement this in a way that mitigates (note … not eliminates but mitigates) the threats.

First, the database. Here’s a diagram: [diagram: the FileList table joined to a FileRoleXREF cross-reference table on FileId, with FileRoleXREF holding FileId and RoleName]. We have the primary table (FileList) and then the FileRoleXREF table. The second has the file ids and the roles that are allowed to access the file.
A file that all are allowed to access will not have any records in this table. To display this list of files for a logged in user, we need to build the Sql statement dynamically, with a where clause based on the roles for the current user. This, by the way, is one of the "excuses" that I've heard for using string concatenation to build Sql statements. It's not a valid one; it just takes some more work. And, because we are using parameters for the values rather than concatenating them into the Sql, we've also mitigated Sql injection, even though the risk of that is low since the list of roles is coming from a trusted source. Still, it's easy and it's better to be safe. So … here's the code.

public static DataTable GetFilesForCurrentUser()
{
    //We'll need this later.
    List<SqlParameter> paramList = new List<SqlParameter>();
    //Add the base Sql.
    //This includes the "Where" for files for anon users.
    StringBuilder sql = new StringBuilder(
        "SELECT * FROM FileList " +
        "WHERE (FileId NOT IN " +
        "(SELECT FileId FROM FileRoleXREF))");
    //Check the user ...
    IPrincipal crntUser = HttpContext.Current.User;
    if (crntUser.Identity.IsAuthenticated)
    {
        string[] paramNames = GetRoleParamsForUser(paramList, crntUser);
        //Now add to the Sql.
        sql.Append(" OR (FileId IN (SELECT FileId FROM " +
            "FileRoleXREF WHERE RoleName IN (");
        sql.Append(String.Join(",", paramNames));
        sql.Append(")))");
    }
    return GetDataTable(sql.ToString(), paramList);
}

private static string[] GetRoleParamsForUser(List<SqlParameter> paramList, IPrincipal crntUser)
{
    //Now, add the select for the roles.
    string[] roleList = Roles.GetRolesForUser(crntUser.Identity.Name);
    //Create the parameters for the roles.
    string[] paramNames = new string[roleList.Length];
    for (int i = 0; i < roleList.Length; i++)
    {
        string role = roleList[i];
        //Each role is a parameter ...
        string paramName = "@role" + i.ToString();
        paramList.Add(new SqlParameter(paramName, role));
        paramNames[i] = paramName;
    }
    return paramNames;
}

From there, creating the command and filling the DataTable is simple enough. I’ll leave that as an exercise for the reader. This still, however, doesn’t protect us from the failure to restrict URL access issue mentioned above. True, UserA only sees the files that he has access to and UserB only sees the files that she has access to. But that’s still not stopping UserA from sending UserB a link to a file that he can access, but she can’t. In order to prevent this, we have to add some additional checking into the ASHX file to validate access. It’d be easy enough to do it with a couple of calls to Sql, but here’s how I do it with a single call …

public static bool UserHasAccess(Guid FileId)
{
    //We'll need this later.
    List<SqlParameter> paramList = new List<SqlParameter>();
    //Add the file id parameter.
    paramList.Add(new SqlParameter("@fileId", FileId));
    //Add the base Sql.
    //This includes the "Where" for files for anon users.
    StringBuilder sql = new StringBuilder(
        "SELECT A.RoleEntries, B.EntriesForRole " +
        "FROM (SELECT COUNT(*) AS RoleEntries " +
        "FROM FileRoleXREF X1 " +
        "WHERE (FileId = @fileId)) AS A CROSS JOIN ");
    //Check the user ...
    IPrincipal crntUser = HttpContext.Current.User;
    if (crntUser.Identity.IsAuthenticated)
    {
        sql.Append("(SELECT Count(*) AS EntriesForRole " +
            "FROM FileRoleXREF AS X2 " +
            "WHERE (FileId = @fileId) AND " +
            "RoleName IN (");
        string[] roleList = GetRoleParamsForUser(paramList, crntUser);
        sql.Append(String.Join(",", roleList));
        sql.Append(")) B");
    }
    else
    {
        sql.Append("(SELECT 0 AS EntriesForRole) B");
    }
    DataTable check = GetDataTable(sql.ToString(), paramList);
    if ((int)check.Rows[0]["RoleEntries"] == 0)   //Anon access
    {
        return true;
    }
    else if ((int)check.Rows[0]["EntriesForRole"] > 0)
    {
        return true;
    }
    else
    {
        return false;
    }
}

So, this little check before having the handler stream the file to the user makes sure that someone isn’t getting access via URL to something that they shouldn’t have access to. We’ve also added code to ensure that we mitigate any Sql injection issues. Now, I’ve not gotten everything put together in a “full blown usable application”. But … I wanted to show some of the thought process around securing a relatively simple piece of functionality such as this. A bit of creativity in the process is also necessary … you have to think outside the use case, go off the “happy path”, to identify attack vectors and the threats represented by the attack vectors.
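To tie it together, the ASHX handler mentioned above might look roughly like this. This is a minimal sketch, not the post's actual code: the query string parameter name, the FileInfoFor lookup helper and the ContentType/FileName/FileData columns are all assumptions. The important part is the UserHasAccess check before anything is streamed.

public class FileDownload : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        //Indirect object reference: the client only ever sees a file id.
        Guid fileId = new Guid(context.Request.QueryString["id"]);

        //Restrict URL access: the indirect reference alone isn't enough.
        if (!UserHasAccess(fileId))
        {
            context.Response.StatusCode = 403;
            return;
        }

        //Hypothetical lookup of the stored name, content type and bytes for the file id.
        System.Data.DataRow fileInfo = FileInfoFor(fileId);
        context.Response.ContentType = (string)fileInfo["ContentType"];
        context.Response.AddHeader("Content-Disposition",
            "attachment; filename=" + (string)fileInfo["FileName"]);
        context.Response.BinaryWrite((byte[])fileInfo["FileData"]);
    }

    public bool IsReusable { get { return false; } }
}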

Fixing my instance of dasBlog

.NET Stuff | Web (and ASP.NET) Stuff
Well, I finally have the remaining issues that I had with my dasBlog installation fixed - or at least mostly so. Of course, now I see that there is a new build out on CodePlex (mostly bug fixes) but I'm not going to bother with it at this point in time. Still, in digging around in dasBlog, I've found a couple of things that I stumbled over that are (I feel) issues. I'm not sure that I'd call them bugs per se as they don't break the application, but I do think that they oughta be fixed. Don't get me wrong - dasBlog is a great blogging tool and pretty darn solid. I don't know if they've been fixed in the current release, but I'm gonna log them up there as well. Both of these are in the logging runtime.

Exceptions for flow control: This is in newtelligence.dasBlog.Runtime.LoggingDataServiceXml::LoggingDataService.Xml. This method appears to be using exception handling for flow control. Here's what's going on: when you load up the logs for a particular day, it goes to get those log files from the file system (the default location is under the Logs folder of the root web). First it checks for archived (zipped) files ... and it does this right by checking for file existence. But ... when it looks for files in the Xml format, it doesn't check for file existence. It just tries to open it, assuming that the file is there. Well, it may not be ... if the date is in the future or there are no logs for it (i.e. you tried to get the logs for pre-dasBlog), it throws a FileNotFoundException. This is then caught by the generic application exception handler. The simple solution is to wrap the using block for the new StreamReader with an if block that checks File.Exists(path); there's a quick sketch of that pattern at the end of this post.

newtelligence.dasBlog.Runtime.LoggingDataServiceXml.ArchiveLogFiles: This is using ThreadPool.QueueUserWorkItem for multi-threading. This isn't really ideal for multi-threading in a web application ... you really should use the page async model. Though ... I'm scratching my head as to the correct way to implement this. It is called by the WriteEvent method of the LoggingDataService ... so it's not something that you can really put into a PageAsyncTask. Doing so would, it seems, involve a change to the admin pages to call this on demand from the admin UI. That, however, may also violate the source independence of the archive provider.

These aren't, by any stretch of the imagination, show stoppers. The exception handling one, though, really gets under my skin. Exception handling for flow control - and I'm calling it that, though really to do so is something of a stretch because it's only caught by the application's generic handler - is pretty bad mojo. And it's expensive. Much more expensive than a simple check. Just do an MSN Search for Exception Flow Control. It's all over and it's not language-specific. Fortunately, in this case, the fix is a simple one. I do have a little beef too, though, and that's with the reporting. Daily reports into your email are good - don't get me wrong - but it'd be nice to see aggregated reports across a week, a month or more. And yes, before you ask, I've also applied to join the team so that I can fix things and add new stuff rather than sitting here and complaining about it.
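For reference, the File.Exists guard mentioned above is just the standard pattern. A minimal sketch - the path variable and the parsing call stand in for whatever the dasBlog code actually does:

//Only try to read the day's log file if it actually exists; no exception thrown,
//no generic handler involved. ParseLogEntries is a hypothetical stand-in.
if (File.Exists(path))
{
    using (StreamReader reader = new StreamReader(path))
    {
        ParseLogEntries(reader);
    }
}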

Austin Code Camp Stuff ...

.NET Stuff | Linq | Performance | User Groups
I promised that I'd make the materials from my talk at the Austin Code Camp available for download. I've finally gotten it compressed and uploaded. It's 111 MB, so be forewarned. Since I used WinRar (and that's not as ubiquitous as the zip format), I've made it a self-extracting archive. You'll need Visual Studio 2008 Team Edition for Software Developers (at least) to read all of the performance results. But I do have an Excel spreadsheet with the pertinent data.

Notes on performance testing

.NET Stuff | Performance
In performing the performance tests for Linq vs. ADO.NET, I spent quite a bit of time getting the methodology ironed out. Why? Well, I kept getting different results depending on the order in which the test methods were run. This struck me as somewhat odd and, honestly, even more frustrating. If the methodology was valid, one would certainly expect the results to be consistent regardless of the order in which the test methods were called.

Of course, the first thing that comes to mind is the connection pool. The first access to the database with a particular set of credentials would create the pool and take the hit for opening the connection to Sql Server. This would skew the results against the first test run. This was an easy one and one that I had figured out before even running the tests. Creating and opening the connection before any of the tests were run was a no-brainer. But something else was going on. The first method called on a particular run seemed to have a performance advantage. I even, at one time on previous tests, had case statements to alter the order ... but even then I'd get different results on different runs. This left me scratching my head a bit.

Eventually, though, it occurred to me. There's a bunch of stuff that the Framework does for us and it's sometimes easy to forget about these things and how they impact performance. In this case, it was garbage collection. And it makes complete sense. Think about it ... the GC is non-deterministic. It happens pretty much when the runtime "feels" like it. So ... the GC would happen in various places and invariably skew the results somewhat, and the impact didn't seem to be evenly distributed. Why the skewing? Because the GC, when it does a collection, halts all thread processing while it does its thing. Of course, when this occurred to me, it was a "DOH!" moment. Once I added a call to GC.Collect() after every call to a test method, the results were, as I expected, remarkably similar across all of the test runs, regardless of the order in which they were called. Confirming, of course, my newly realized theory about the garbage collection and its impact on my performance tests.

I did, for the final "numbers", toss out the low and the high values and re-averaged. Since Windows always has other things going on, some of those things may take a time slice or two of the processor from the test run. Or not take any. Still, doing this actually made very little difference to the results. As I think about it, though, I should also create an instance of every class that I use in order to make sure that the type is initialized in memory and the dll is loaded. But, looking at the results, this really didn't appear to make much difference. Still, on future tests, I'll start doing that.

Now, keep in mind that this applies only to artificial tests. And if you look at the Linq vs. ADO.NET tests, they were certainly quite artificial. Not what you would do in a real-world application. This was, of course, really only designed to test raw numbers for each of the methods that were being used at the time. When you are doing performance testing on your applications, this kind of testing methodology is invalid, to say the least. And calling GC.Collect() after every method call will, without question, hurt the overall performance of your application. So don't do it. For your individual applications, you need to take a holistic approach; test the application in the way it is expected to be used in the real world.
Of course, this can only go so far because users will, invariably, do something that we didn't expect (why is that???) and telling them "Well, just don't do that" never seems to be an acceptable answer. For web applications, this needs to go a step further - in web apps, performance != scalability. They are related, to be sure, but not the same. I've seen web apps that perform pretty well ... but only with a few users, keeling over when they get 20 or more users. That's not good.
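For what it's worth, the kind of harness described above boils down to something like this. A minimal sketch with made-up test method names - not the actual test code from the Linq vs. ADO.NET posts:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class PerfHarness
{
    static void Main()
    {
        //Warm up the connection pool (and anything else expensive) before timing anything.
        InitializeConnectionPool();

        Action[] tests = { TestA, TestB };   //hypothetical test methods
        const int runs = 30;

        foreach (Action test in tests)
        {
            List<double> timings = new List<double>();
            for (int i = 0; i < runs; i++)
            {
                Stopwatch watch = Stopwatch.StartNew();
                test();
                watch.Stop();
                timings.Add(watch.Elapsed.TotalMilliseconds);

                //Force a collection between runs so a non-deterministic GC
                //doesn't land inside (and skew) another method's timing.
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }

            //Toss the low and high values and average the rest.
            double normalized = (timings.Sum() - timings.Min() - timings.Max()) / (runs - 2);
            Console.WriteLine("{0}: {1:F2} ms", test.Method.Name, normalized);
        }
    }

    static void InitializeConnectionPool() { /* open/close a connection, cache the plan */ }
    static void TestA() { /* hypothetical test body */ }
    static void TestB() { /* hypothetical test body */ }
}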

Thoughts on Linq vs ADO.NET - Simple Query

.NET Stuff | Linq | Performance
I had a little discussion with an old buddy of mine this morning. I won't mention his name (didn't ask him for permission to) but those of you in Houston probably remember him ... he used to be a Microsoft guy and is probably one of the best developers in town. I have a world of respect for him and his opinion. So ... it started with this ... he was surprised by the "do you think a user will notice 300 ms". Of course, that's a loaded question. They won't. But his point was this: 300 ms isn't a lot of time for a user, but under a heavy load, it can be a lot of time for the server. Yes, it can be ... if you have a heavy load. I won't give a blow-by-blow account of the conversation (I can't remember it line for line anyway), but it was certainly interesting.

One thing that we both agreed on that is important for web developers to understand is this: performance is not equal to scalability. They are related. But they are not the same. It is possible (and I've seen it) to create a web app that is really fast for a single user, but dies when you get a few users. Not only have I seen it, but (to be honest here) I've done it ... though, in my defense, it was my first ASP "Classic" application some 10 or 11 years ago; I was enamored with sessions at the time. This was also the days when ADO "Classic" was new and RDO was the more commonly used API. And ... if you are a developer and haven't done something like that ... well, you're either really lucky or you're just not being honest.

With that out of the way ... I'd like to give my viewpoint on this: Data Readers are still the fastest way to get data for a single pass. If it's one-time-use data that is just thrown away, it's still the way to go. No question. (At least, IMHO.) But there's a lot of data out there that isn't a single pass and then toss ... it may be something that you keep around for a while as the user is working on it (which you often see in a Smart Client application) or is shared among multiple users (such as a lookup field that is consistent ... or pretty much consistent ... across all users). In both of these cases, you will need to have an object that can be held in memory and accessed multiple times. If you are doing a Smart Client application, it also needs to be scrollable. Data Readers don't provide this. So ... if you are doing these types of things, the extra 300 ms is actually well worth it. In a web application, you'll scale a lot better (memory is a lot faster than a database query and it keeps load off the database server for little stuff) by caching common lookup lists in the global ASP.NET Cache.

One thing that I find interesting ... the LinqDataSource in ASP.NET doesn't have an EnableCaching property like the SqlDataSource. It does, however, have a property StoreOriginalValuesInViewState. Hmmm ... curious. Storing this in ViewState can have its benefits ... it's a per-page, per-user quasi-cache ... but at the cost of additional data going over the wire (which might be somewhat painful over a 28.8 modem ... yes, some folks still use those). That said, ViewState is compressed to minimize the wire hit and can be signed to prevent tampering. But ... EnableCaching puts the resulting DataSet (it won't work in DataReader mode) into the global ASP.NET cache ... which, again, is good for things like lookups that really don't change very often, if at all. For the Smart Client application ... well, DataReaders have limited use there anyway due to the respective natures of DataReaders and Smart Client apps.
Granted, you can use a DataReader and then manually add the results to the control that you want to display them in ... but that can be a lot of code (yeah, ComboBoxes are pretty simple, but a DataGrid ... or a grid of any sort?). One thing that struck me is the coding involved with master/child displays in Smart Client applications. There are two ways that you can do this in ADO.NET: you can get all the parents and children in one shot and load 'em into a DataSet (or object structure) -or- you can retrieve the children "on demand" (as the user requests the child). Each method has its benefits, but I'd typically lean to the on-demand access, especially if we are looking at a lot of data. This involves writing code to deal with the switching of the focus in the parent record and then filling the child. Not something that's all that difficult, but it is still more stuff to write and maintain. With Linq to Sql, this can be configured with the DeferredLoadingEnabled property of the DataContext and it will do it for you, depending on the value of that property (settable at runtime ... you won't see it in the property sheet in the DataContext designer).

There was also some discussion about using Linq vs. rich data objects. This ... hmmm ... well, I'll just give my perspective. This is certainly possible with Linq, though certainly not with anonymous types (see http://blog.microsoft-j.net/2008/04/15/LinqAndAnonymousTypes.aspx for a discussion of them). But ... the Linq to Sql classes are generated as partial classes, so you can add to them to your heart's delight. As well as add methods that hit stored procs that aren't directly tied to a data class. Additionally, you can certainly use Linq to Sql with existing (or new) rich data classes that you create independently of your data access and then fill from the results of your query. As for the performance of these ... well, at the current moment, I don't have any numbers, but I'd venture to guess that the performance would be comparable to anonymous types.

One thing that you also need to consider when looking to use Linq in your projects is not just the raw performance, but the other benefits that Linq brings to the table. Things like the ease of sorting and filtering the objects returned by Linq to Sql (or Linq to XML, for that matter) using Linq to Objects. There is also the (way cool, IMHO) feature that lets you merge data from two different data sources (i.e. Linq to Sql and Linq to XML) into a single collection of objects or a single object hierarchy. Additional capabilities and functionality of one methodology over another are often overlooked when writing ASP.NET applications ... it's simply easier to look at the raw, single user, single page performance without thinking about the data in the holistic context of the overall application. This is, however, somewhat myopic; you need to keep the overall application context in mind when making technology and architecture decisions. This in mind ... hmmm ... off to do a bit more testing. Not sure if I'll do updates first or Linq sorting and filtering vs. DataViews.
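To illustrate the "other benefits" point, here's a minimal sketch of Linq to Objects filtering and sorting a cached list - the StateProvince list from the earlier caching post. LookupCache is the hypothetical wrapper class name, and CountryRegionCode/Name are assumed AdventureWorks columns:

//Filter and sort the cached list in memory with Linq to Objects - no round trip
//to the database. LookupCache.StateProvinceList is the cached wrapper property.
List<StateProvince> states = LookupCache.StateProvinceList;

var usStates = from s in states
               where s.CountryRegionCode == "US"
               orderby s.Name
               select s;

foreach (StateProvince s in usStates)
{
    Console.WriteLine(s.Name);
}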

Linq vs. ADO.NET - Simple Query

.NET Stuff | Performance | Linq
In my last blog post, I took a look at how Linq handles anonymous types. I also promised to do some performance comparisons between Linq and traditional ADO.NET code. Believe it or not, creating a "fair" test is not as easy as one would think, especially when data access is involved. Due to the nature of connection pooling, whichever method is first to be tested gets hit with the cost of creating the connection ... which skews the test. Yeah, I'm sure this is out there in the blogosphere, but I do like to do these things myself. Call it the Not-Invented-Here syndrome.

This particular test set is for a very simple query. I created a set of 4 methods to test for performance within a standard Windows Console Application, which should give an overall comparison of data access. All tests used the AdventureWorks sample database, with the statement (or its Linq equivalent) Select FirstName, LastName From Person.Contact. This is about as simple a query as you can get. From there, each method concatenated the two field results into a single string value. The Linq test used an anonymous type going against a data class created with the Data Class designer. Data Reader Test 1 (TestDataReaderIndex) used the strongly-typed DataReader.GetString(index) ... and I did cheat a little with this one by hardcoding the index rather than looking it up before entering the loop (though that is how I'd do it in the "real world"). In previous tests that I've done, I've found that this gives about 10-20% better performance than DataReader[columnName].ToString() ... though that does include the "lookup" that I mentioned previously. Data Reader Test 2 (TestDataReaderNoIndex) represents the more common pattern that I've seen out there ... using DataReader[columnName].ToString(). Now, I'm not sure which of these methods Data Binding uses and, honestly, that's not in the test ... though, now that I think of it, it may be a good thing to test as well. Finally, I included a test for DataSets (TestDataSet) ... using an untyped DataSet. I've found (again, from previous tests) that this performs far better than a typed DataSet ... the typed DataSet gets hit (hard) by the creation/initialization costs.

Before running any tests, I included a method called InitializeConnectionPool, which creates and opens a connection, creates a command with the Sql statement (to cache the access plan), calls ExecuteNonQuery and then exits. This is not included in the results, but is a key part of making sure that the test is as fair as possible. Additionally, all of the tests access the connection string in the same way ... using the application properties. In looking at the code generated by the LinqToSql class, this is how it gets the connection string. This ensures that the connection string for all methods is the same, which means that the connection pools will be the same. To actually do the test, I called each method a total of 30 times from the application's Main, each function in the same loop. This would help to eliminate any variances. After running each test, I also called GC.Collect() to eliminate, as much as possible, the cost of garbage collection from the results. I also closed all unnecessary processes and refrained from doing anything else to ensure that all possible CPU and memory resources were allocated to the test. One thing that I've noticed from time to time is that the order in which functions are called seems to matter, so I made a total of 4 runs, each with a different function first.
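As an aside, the difference between the two DataReader tests is just the column access pattern. Roughly - this is a sketch, not the downloadable test code itself, and the connection handling is trimmed down:

using (SqlConnection conn = new SqlConnection(connectionString))   //connectionString comes from the app settings
using (SqlCommand cmd = new SqlCommand("SELECT FirstName, LastName FROM Person.Contact", conn))
{
    conn.Open();

    //TestDataReaderIndex-style: strongly typed access with hardcoded ordinals.
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            string name = rdr.GetString(0) + " " + rdr.GetString(1);
        }
    }

    //TestDataReaderNoIndex-style: lookup by column name, then ToString().
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            string name = rdr["FirstName"].ToString() + " " + rdr["LastName"].ToString();
        }
    }
}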
Back to the methodology: for each run, I tossed out the min and max values and then averaged the rest -- (total - min - max) / (numCalls - 2). This gave me a "normalized" value that, I hoped, would provide a fair, apples-to-apples comparison. Each method had a set of 4 values, each with 30 calls, 28 of which were actually included in the normalized value. I then took the average of the 4 values. I know that sounds like an overly complex methodology ... and I agree ... but I've seen some weird things go on and some pretty inconsistent results. That said, in looking at the results, there was not a lot of difference between each of the 4 runs, which makes me feel pretty good about the whole thing. So ... without further ado ... the results (values are in milliseconds):

Method                  Normalized Average
TestDataReaderIndex     56.64767857
TestLinq                75.57098214
TestDataSet             117.2503571
TestDataReaderNoIndex   358.751875

Now, I have to say, I was somewhat surprised by the TestDataReaderNoIndex results ... previous tests that I had done didn't show such a big difference between this and TestDataReaderIndex ... though I wonder if that has something to do with the way I did this test - hardcoding the indexes into TestDataReaderIndex. I'm not surprised that TestDataReaderIndex turned out to be the fastest. DataReaders have been, and still are, the absolute fastest way to get data from the database ... that is, if you do it using integer indexes. However, TestLinq didn't come that far behind and was certainly more performant than the untyped DataSet.

So ... let's think about this for a second. The Linq collection that is returned is more like a DataSet than it is a DataReader. DataReaders are forward-only, read-only, server-side cursors. Use them once and kiss them goodbye. Both the Linq collection and the DataSet allow random access and are re-startable ... and they are both updatable as well. I've had a lot of folks ask about the performance of Linq and now I can, without question and with all confidence, tell them that the performance is quite good. Still, let's be honest ... the difference between the fastest and the slowest is a mere 300ms. Do you really think users will notice this?

UPDATE: You can download the code and the tests that I used for this at https://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=jdotnet&ReleaseId=948. If you get different results, I'd be interested to hear about it. Even more, I'd be interested in the methodology that you used to create the report.