Ruminations of J.net idle rants and ramblings of a code monkey

TF209001: NullReferenceException in TFBuild Logger with Custom Task

.NET Stuff | TFS

This has been driving me absolutely nuts for … oh … a month or so (off and on). Here’s the scenario: I’ve got a custom task that access the Team Foundation Server to do some things before the build really gets kicking; the task is called in the BeforeEndToEndIteration target, which is one of the ones that are specified as extensible. So … nothing wrong there.

The custom task itself changed around the workspace settings for the build so that we could have 1 build for each branch of the code. Switching between the branches is done based on a property that is passed when kicking off the build (/p: switch). This allows one build definition for all of the branches, rather than having a separate build for each branch or manually changing the workspace mappings before kicking off the build. The task itself worked perfectly; the builds pulled and labeled the correct branch. Running the TFBuild standalone, debugging with MSBuild Sidekick as I previously detailed showed no problems and everything was fine. But, when running the build under full Team Build caused errors and the build itself was marked “Partially Succeeded”. Here’s what the build results looked like:

image

Searching for TF209001 gave me nada. Trying to figure out what was going on, I tracked the error to the Team Build logging component (in Microsoft.TeamFoundation.Build.Server.Logger.dll) and the error message was a generic one from a try{}catch{} block in the LogProjectStartedEvent. However, I couldn’t tell much else and talking with some of my friends at MSFT didn’t yield any information either.

So it bugged me. I knew that there was nothing wrong with my task … it ran just fine on its own. Even when the error happened, it did what it was supposed to do successfully and the build was actually fine. When the task was removed, everything with the logger was fine and there were no errors. But even when my task didn’t actually remap any of the workspace mappings, the logger ran into this error. It could only be some sort of mysterious interaction between the task and the TF Build logger.

Frustrated, I put this on the side burner for a while and just came back to it yesterday, determined to get to the bottom of this mystery and get everything working. Looking through everything in Reflector – something every .NET developer should have in their toolbox – didn’t show anything that really jumped out at me. The only thing that I could possibly guess was that, somehow, in the call to EnsureWorkspace(), the workspace wasn’t getting created. Checking the MSBuild properties WorkspaceName and WorkspaceOwner showed, however, that these were fine and being passed along correctly.

Baffled, I finally decided to fire up cordbg and get to the bottom of this. I knew that it was a (caught) NullReferenceException so I set the debugger to break on all NullReferenceExceptions.

(cordbg) ca e System.NullReferenceException

Next … fire up Task Manager so that I can grab the PID of the MSBuild process that Team Build used to run the build. Then kick off the build and attach to MSBuild. Finally, the debugger stopped on the exception:

First chance exception generated: (0x15e37fc) <System.NullReferenceException>
  _className=<null>
  _exceptionMethod=<null>
  _exceptionMethodString=<null>
  _message=(0x15e38e4) "Object reference not set to an instance of an object."
  _data=<null>
  _innerException=<null>
  _helpURL=<null>
  _stackTrace=(0x15e3960) <System.SByte[]>
  _stackTraceString=<null>
  _remoteStackTraceString=<null>
  _remoteStackIndex=0
  _dynamicMethods=<null>
  _HResult=-2147467261
  _source=<null>
  _xptrs=4572092
  _xcode=-1073741819
Exception is called:FIRST_CHANCE

Now we’re getting somewhere. Let’s see where we are. The where command will give us a stack trace.

(cordbg) w
Thread 0x2168 Current State:GCUnsafe spot
0)* Microsoft.TeamFoundation.Client.TeamFoundationServer::GetService +0055[native] +0014[IL] in <Unknown File Name>:<Unknown Line Number>
1)  Microsoft.TeamFoundation.Build.Server.Logger.BuildLoggerBase::get_VersionControlServer +0055[native] +0000[IL] in <Unknown File Name>:<Unknown Lin
e Number>
2)  Microsoft.TeamFoundation.Build.Server.Logger.BuildLogger::EnsureWorkspace +0317[native] +0000[IL] in <Unknown File Name>:<Unknown Line Number>
3)  Microsoft.TeamFoundation.Build.Server.Logger.BuildLogger::LogProjectStartedEvent +0093[native] +0047[IL] in <Unknown File Name>:<Unknown Line Numb
er>
4)  Microsoft.Build.BuildEngine.EventSource::RaiseProjectStartedEvent +0061[native] +0008[IL] in <Unknown File Name>:<Unknown Line Number>
5)  Microsoft.Build.BuildEngine.EventSource::RaiseStronglyTypedEvent +0390[native] +0131[IL] in <Unknown File Name>:<Unknown Line Number>
6)  Microsoft.Build.BuildEngine.EngineLoggingServicesInProc::ProcessPostedLoggingEvents +0213[native] +0180[IL] in <Unknown File Name>:<Unknown Line N
umber>
7)  Microsoft.Build.BuildEngine.Project::Load +1608[native] +0845[IL] in <Unknown File Name>:<Unknown Line Number>
8)  Microsoft.Build.BuildEngine.SolutionWrapperProject::ScanProjectDependencies +0319[native] +0114[IL] in <Unknown File Name>:<Unknown Line Number>
9)  Microsoft.Build.BuildEngine.SolutionWrapperProject::CreateSolutionProject +0208[native] +0098[IL] in <Unknown File Name>:<Unknown Line Number>

We see the entire stack from MSBuild all the way into the logger. It turns out that the exception is happening in the GetService() method of Microsoft.TeamFoundation.Client.TeamFoundationServer. The stack trace doesn’t show use the actual source lines because we don’t have a pdb for the assembly, but it does show the IL line number, which will get us to what’s going on in ILDASM. Here’s what we find there:

IL_000f:  ldfld      class [System]System.ComponentModel.Design.ServiceContainer Microsoft.TeamFoundation.Client.TeamFoundationServer::m_serviceContainer
  IL_0014:  ldarg.1

In Reflector, this maps to:

object service = this.m_serviceContainer.GetService(serviceType);

If GetService() returns null, it’s no problem; that’s handled. But if m_serviceContainer is null … that isn’t handled and will throw an exception as soon as it tries to call GetService(). Now we’ve identified where the exception is coming from. But we still don’t know why. Investigating further, m_serviceContainer is created in the constructor of the TeamFoundationServer class, which is working just fine or we wouldn’t have even gotten this far. So how does m_serviceContainer get set to null? It turns out that it’s set to null in TeamFoundationServer::Dispose(). And only there. Hmmm … how is the TeamFoundationServer object getting disposed?

It turns out that I was disposing the TeamFoundationServer class. How did this affect the logger? Simple … both components were calling TeamFoundationServerFactory.GetServer() to return an instance of the TeamFoundationServer class. When I called this method, I wrapped the TeamFoundationServer class in a using{} block (it implements IDisposable) like a good little .NET doobie. A look into GetServer(), however, shows that it creates a cache (using a Dictionary) with a single instance of the TeamFoundationServer for each unique TFS URI. It’s  static and handed to all clients that request the same URI; this single instance lives the entire lifetime of the process and is shared around. (Considering how long it can take to initialize the session context, this isn’t a bad thing.) However, because of this, when I called Dispose(), it was on the same instance that the distributed logger was using! (BTW … this is not documented anywhere that I could find!)

The solution – somewhat counter intuitively – is to not dispose the disposable TeamFoundationServer object when you are done with it. (Don’t make a habit of it though, OK?) Once I took out the using{} block, all was well and the build ran without any hitch at all. This is only applicable to instances that are retrieved from TeamFoundationServerProxy.GetServer(); if you create the instance using new, then you should make sure that you dispose it when you are done.

This smells all kinds of wrong to me. I don’t know what the official stance would be on this, but it smells suspiciously like a bug to me. The first thing is that the TeamFoundationServer class isn’t checking to see if it is disposed before it goes about its happy, merry way, contrary to what I thought was the recommended design pattern for implementing IDisposable; this includes checking to see if the class is disposed before doing any work in a method … and throwing an ObjectDisposedException when a method is called after being disposed. If you don’t do this, then be smart enough to realize when you’ve been previous disposed (somewhere) and reinitalize when necessary (ewww … no, that’s just wrong. Don’t do that.)

So there it is, the end to the mystery.

Comments are closed