Ruminations of J.net idle rants and ramblings of a code monkey

Moving from dasBlog to BlogEngine.NET

BlogEngine.NET | Web (and ASP.NET) Stuff
As I mentioned previously, I’ve moved from dasBlog to BlogEngine.NET for this blog. This, of course, involved reformatting and redesigning the look and feel of the site; that’s nothing unique to the migration and I’m not going to go into that at all. What I will do, however, is discuss the process of moving existing content over from dasBlog to BlogEngine, something that isn’t really hard but does have a few gotchas. Moving the Content That’s the first thing that needs to be done. In fact, I did this before I even started formatting the new site – I wanted to be sure that the existing content rendered relatively well in the new design. It was not quite as simple as described on Merill’s blog. All of his steps are valid, but there is actually a couple of other things that need to be done. You will definitely want to use the dasBlog to BlogML Converter that Merill posted on MSDN Code Gallery – dasBlog doesn’t do BlogML and, while BlogEngine will import RSS, RSS usually will not get all of your content. BlogML works much better. There were two things with moving the content … how big a deal those are depend on how picky you are about the move. I was. First, the timestamp on the entry. dasBlog uses UTC (GMT) to store the time and that’s how it is imported into BlogML. BlogEngine uses the server time. Both have an offset to convert the saved time into blog local time, but dasBlog’s offset is from UTC (using standard time zones) and BlogEngine uses an offset from the server time. My server is on US Eastern Time and my local blog time is US Central Time, which means that, on import, I had to convert the time to US Eastern and then set my offset in BlogEngine to –1 (US Central is 1 hour “behind” US Eastern). To do this, I had to modify the code that imported the blog entries, which can be found at BlogEngine.Web\api\BlogImporter.asmx. Since the incoming BlogML only had a post date, not a DateCreated and a DateModified (as BlogEngine does), I also set both the create date and the modify date to the same value. Here’s the code snippet from AddPost: Post post = new Post(); post.Title = import.Title; post.Author = import.Author; post.DateCreated = import.PostDate.AddHours(-5); post.DateModified = import.PostDate.AddHours(-5); post.Content = import.Content; post.Description = import.Description; post.IsPublished = import.Publish; Once I set BlogEngine’s server time offset (in the admin section under “Settings”), all of the times were now correctly displayed as US Central. The second thing relates to the tags … BE uses tags (and categories) while dasBlog only uses categories. In dasBlog, the “tag cloud” is generated from the categories and BE generates this from the actual post tags. I can’t say which method I like better yet or if I prefer some mish-mash of the two (generate the cloud from tags and categories … that may be an idea) but I did know that I didn’t want to lose my tag cloud. So, on import, I added tags for each post category to the imported post. Again, simple and again, in AddPost: if (import.Tags.Count == 0) { post.Tags.AddRange(import.Categories); } else { post.Tags.AddRange(import.Tags); } From a performance standpoint, I couldn’t tell you if AddRange is faster than looping block to add each value individually (and it really doesn’t matter here), but it is simpler, cleaner and much easier to read … so I tend to prefer AddRange(). With these two “issues” resolved – and they aren’t issues with BE, to be sure, just a difference between the two – I was ready to move on. Preserving links Some of my entries have a pretty good page rank on various search engines and there is a non-trivial amount of traffic that is generated from these search engines. While I can go in and change things like the RSS source for my feed from FeedBurner to make the move transparent, that doesn’t help with the search engines. Therefore, I needed a way to ensure that existing links would continue to work without returning 404’s. Yes, I moved the old domain over to the new domain and added it as a host header on the new site, but that does not help prevent link breakage and BE and dasBlog have different formats for their links. I also did not, at this point in time, want to force a redirect as soon as a new person hit my site from a search engine; it’s just rude (IMHO) and doesn’t create a great user experience. Sure, maybe it wouldn’t be a big deal, but I didn’t like it. And besides, it gave me an excuse to write code. :-) To keep the links intact, I decided that I would leave BlogEngine’s UrlRewriting intact; I didn’t want to make too many changes to the base source code as it would make it harder for me to move between versions/revisions. Rather, I wanted to sit on top of it and make sure that the links worked. So I used ASP.NET Url Routing to intercept the requests and send them to the right place (post.aspx). Before I go into the code, let’s first examine the (default) url structures for individual posts. In dasBlog, the post link is in the format yyyy/mm/dd/{CompressedTitle}. In BlogEngine, this would be post/{CompressedTitle} –or- (the permalink format) post.aspx?id={PostGUID}. While BE can have the date as a part of the post link, it still wouldn’t work; they compress their titles differently and, as mentioned before, dasBlog uses UTC internally and it’s used in the link as well. For the routing, I created the route using "{Y}/{M}/{D}/{Title}" as the route url. From there, I needed to implement GetHttpHandler to do the work. Initially, I did the matching to title using a Linq query and it worked just fine. The problem with this is that every title in the posts would need to be converted to the dasBlog format (I copied over the dasBlog CompressTitle method as dasBlogCompressTitle), a process that seemed far from ideal. Once I understood how the dates worked, I was able to do the primary matching on the date and then, if necessary, match the titles for posts on the same date, minimizing the string manipulation that was required. Once I determined what the matching post was, all I needed to do was append the query string “?id={postGuid}” to the URL and then pass back the HttpHandler from post.aspx for the actually processing. If there was no match, then there would be no query string appended and post.aspx would show a 404. The code for this is below: public System.Web.IHttpHandler GetHttpHandler(RequestContext requestContext) { //Get the date from the route. string dateString = string.Format(@"{0}/{1}/{2}", requestContext.RouteData.Values["M"], requestContext.RouteData.Values["D"], requestContext.RouteData.Values["Y"]); string titleString = ((string)requestContext.RouteData.Values["Title"]).Replace(".aspx", ""); DateTime postDate = DateTime.MaxValue; Post selectedPost = null; if (DateTime.TryParse(dateString, out postDate)) { //Date is valid at least. //Find posts with the same date. //Date in URL is in UTC. var postsForTitle = from p in Post.Posts where p.DateCreated.ToUniversalTime().Date == postDate select p; if (postsForTitle.Count() == 1) { //There is only one posts for the date, so this must be it. selectedPost = postsForTitle.First(); } else { //differentiate on title. foreach (var p in postsForTitle) { if (dasBlogCompressTitle(p.Title).Equals(titleString, StringComparison.InvariantCultureIgnoreCase)) { selectedPost = p; break; } } } if (selectedPost != null) { //Use UrlRewriting to put the id of the post in the query string. requestContext.HttpContext.RewritePath(requestContext.HttpContext.Request.Path + "?id=" + selectedPost.Id.ToString(), false); } } return BuildManager.CreateInstanceFromVirtualPath( "~/post.aspx", typeof(System.Web.UI.Page)) as System.Web.IHttpHandler; } Once I set it up on the web.config file and added the routes to the RouteTable, all was good and it worked fine. Preserving the RSS Feed Url The final step - and the thing that occurred to me last – was to make sure that the RSS feed url continued to work. While FeedBurner had no problem with changes the RSS url for my blog, there was the possibility (however remote it may have seemed) that someone was using the dasBlog’s RSS feed rather than FeedBurner. I’m not sure how remote a possibility this is but I didn’t use FeedBurner in the early days of the blog, so I figured that it might be an issue. And I certainly wouldn’t want to alienate the longest-time subscribers to my feed. This was incredibly simple and didn’t require any code at all, just two lines line in the web.config file to have SyndicationService.asmx (dasBlog’s RSS feed) handled by BlogEngine’s RSS feed, which is implemented as an HttpHandler and, by default, at syndication.axd. The first line is for IIS 6.0/IIS 7.0 Classic mode and is under the httpHandlers node of system.web: <add verb="*" path="SyndicationService.asmx" type="BlogEngine.Core.Web.HttpHandlers.SyndicationHandler, BlogEngine.Core" validate="false"/ The second goes in the corresponding location for IIS 7 Pipeline mode, in the handlers node of system.webServer: <add name="dasBlogSyndication" verb="*" path="SyndicationService.asmx" type="BlogEngine.Core.Web.HttpHandlers.SyndicationHandler, BlogEngine.Core" resourceType="Unspecified" requireAccess="Script" preCondition="integratedMode"/> These were copied from the default nodes used by BlogEngine for it’s syndication.axd and then the relevant attributes were changed. Simple enough.