Hotfix KB971030 Released
Posted by Hugh Ang at 9/02/2009 04:58:00 PM
You can finally download this hotfix from here, and the detailed information is available here. According to the information posted there, it fixes an access violation that occurs when making virtual function calls through the IList<T>, IEnumerable<T>, or ICollection<T> interface. I am not yet sure whether this is another manifestation of the dispatch token problem I described in my previous post, or whether the CLR team simply bundled several virtual dispatch bug fixes together, including one for what I found.
I ran the same test harness that I used before and confirmed that the hotfix does indeed resolve the problem.
I had some trouble installing the hotfix today but eventually got it installed. It turned out that my OS was Vista SP1 and I had to upgrade it to SP2 before the hotfix would install. If you have another OS and are wondering about the same question Chris Dion had, the hotfix should be applicable to Windows Server 2000, Windows Server 2003, Windows Server 2008 and Windows XP.
CLR Virtual Call Stub Issue Resolved
Posted by Hugh Ang at 5/15/2009 01:51:00 PM
Good news - I have just received word from the Microsoft CLR team that the issue I detailed in this blog has been fixed in an upcoming QFE, which will be available for download through the Support/KB site under KB971030 in about a week. This also means that the fix will be automatically included in any future 3.5 service pack release. In addition, it has been fixed in the upcoming 4.0 release. I will try the fix in a week and keep everyone posted.
Making Assumptions in Application Design
Posted by Hugh Ang at 2/18/2009 11:46:00 AM
Back during the holidays, I wrote a small .NET program to manipulate my Canon XTi camera, which was connected to the PC via USB. Everything worked great when I was using my laptop that had Vista on it. But on my wife's laptop that had XP, the performance became unbearable. My utility program would wait for several minutes before I could start using it. The program's thread was pretty much idle during the wait, so I asked myself: could this be the PC hardware? I tested the program on one of my desktops that also had XP. Nope, still the same poor performance. I then updated the camera's firmware and that didn't resolve the issue either. I began to suspect that this was OS related. Sure enough, when I plugged in the camera without my program running, XP took much longer than Vista to come up with the camera and scanners wizard. I noticed that while XP seemed to be cluelessly waiting, the CF card indicator light of the camera was flashing madly. I realized that XP was probably trying to download pictures from the camera, which was busy reading from the CF card. At the time, I had about 300 high resolution pictures on the 2GB CF card. After I swapped it for an empty CF card, the performance problem just went away.
The moral of the story is that we need to be extra careful when making assumptions on behalf of users when designing UI applications. In this case, XP made a not necessarily wise assumption about the intention behind plugging a camera into the USB port. Even for users who do want to download pictures this way, no visual cue is provided, leaving users wondering what is going on.
Reverse P/Invoke - Part 2
Posted by Hugh Ang at 10/14/2008 12:29:00 PM
In my last post, I described a loosely-coupled pattern for native code to call into managed code. That approach requires control of the source code of the native library, although only a minor change has to be made. There are scenarios where the native source code is not available and we simply cannot make any changes to it. For example, I have a Canon Rebel XTi camera connected to my PC and I need to notify my WinForms application when a new picture has just been taken, so that the application can download and display it. The Canon SDK is a native library that exports functions for registering callback function pointers. So what do we do? Scenarios like this require us to call the native API from managed code, passing delegates marshaled as function pointers.
The solution is surprisingly straightforward. To demonstrate the technique, here is the native library code - note the definitions of data structure and callback function prototype:
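#include <windows.h>

#define NATIVELIB_API __declspec(dllexport)

// data structure for the callback function
struct EventData
{
    int I;
    TCHAR* Message;
};

// callback function prototype (__cdecl unless the project default is changed)
typedef void (*FPCallBack)(EventData data);

// exported API; extern "C" keeps the exported name unmangled so that
// DllImport can find "fnNativeAPI" by name (not needed if a .def file is used)
extern "C" NATIVELIB_API void fnNativeAPI(int i, FPCallBack callBack)
{
    EventData data;
    data.I = i;
    // wide literal: assumes a UNICODE build, matching CharSet.Unicode on the managed side
    data.Message = L"Hello from native code!";
    // invoke the callback function
    callBack(data);
}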
And here is my corresponding interop code in C#. Notice that in the P/Invoke definition of the fnNativeAPI call, I added the MarshalAs(UnmanagedType.FunctionPtr) attribute in front of the CallBackDelegate parameter to instruct the runtime to marshal the delegate as a function pointer from managed code to native code.
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct EventData
{
    public int I;
    public string Message;
}

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void CallBackDelegate(EventData data);

public class Native
{
    // the native export is __cdecl, so say so explicitly (the DllImport default is Winapi)
    [DllImport("NativeLib.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void fnNativeAPI(int i, [MarshalAs(UnmanagedType.FunctionPtr)] CallBackDelegate callBack);
}
Here is the C# code of the application:
using System;
using System.Diagnostics;
using System.Windows.Forms;

public partial class Form1 : Form
{
    // held in a field so the delegate stays alive as long as the form does
    private CallBackDelegate _cb;

    public Form1()
    {
        InitializeComponent();
        _cb = new CallBackDelegate(Foo);
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        // call the native API, passing our .NET delegate
        int i = 200;
        Native.fnNativeAPI(i, _cb);
    }

    // this is our callback function
    private void Foo(EventData data)
    {
        Debug.WriteLine(data.I);
        Debug.WriteLine(data.Message);
    }
}
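Careful readers may have noticed that I have kept the delegate as a class field. This is a best practice to prevent the delegate from being garbage-collected while native code may still use it. In this sample the callback is invoked before fnNativeAPI returns, but a library such as the Canon SDK stores the function pointer and calls it later, and the runtime has no way of knowing that native code still holds it. It would be really bad if the native code held an invalid function pointer and tried to invoke it.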
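To make the lifetime concern concrete, here is a minimal sketch of the native side of a register-now, call-later API. All names in it (PictureTakenCallback, RegisterPictureTaken, SimulateShutterPress, OnPictureTaken) are hypothetical and chosen for illustration only; they are not taken from the Canon SDK.

```cpp
#include <cassert>

// Hypothetical callback type; not part of any real SDK.
typedef void (*PictureTakenCallback)(int pictureId);

// The library remembers the pointer it was given.
static PictureTakenCallback g_callback = nullptr;

// Called once by the client to register its callback.
void RegisterPictureTaken(PictureTakenCallback cb)
{
    g_callback = cb;
}

// Called by the library at some later time. If the pointer came from a
// marshaled .NET delegate that has since been collected, this call would
// jump through a dangling pointer.
void SimulateShutterPress(int pictureId)
{
    if (g_callback != nullptr)
        g_callback(pictureId);
}

// Stand-in for the managed callback, so the sketch can be exercised natively.
static int g_lastPictureId = -1;
void OnPictureTaken(int pictureId)
{
    g_lastPictureId = pictureId;
}
```

Calling RegisterPictureTaken(OnPictureTaken) and then SimulateShutterPress(7) leaves g_lastPictureId at 7. The gap between those two calls is exactly the window in which a managed delegate must stay referenced.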
For VB programmers, here is the interop code in VB.NET:
Imports System.Runtime.InteropServices

<StructLayout(LayoutKind.Sequential, CharSet:=CharSet.Unicode)> _
Public Structure EventData
    Public I As Integer
    Public Message As String
End Structure

<UnmanagedFunctionPointer(CallingConvention.Cdecl)> _
Public Delegate Sub CallBackDelegate(ByVal data As EventData)

Public Class Native
    ' the native export is __cdecl, so say so explicitly (the DllImport default is Winapi)
    <DllImport("NativeLib.dll", CallingConvention:=CallingConvention.Cdecl)> _
    Public Shared Sub fnNativeAPI(ByVal i As Integer, _
        <MarshalAs(UnmanagedType.FunctionPtr)> ByVal callBack As CallBackDelegate)
    End Sub
End Class
And the application code:
Public Class Form1
    ' held in a field so the delegate stays alive as long as the form does
    Dim _cb As CallBackDelegate

    Public Sub New()
        ' This call is required by the Windows Form Designer.
        InitializeComponent()
        ' Add any initialization after the InitializeComponent() call.
        _cb = New CallBackDelegate(AddressOf Me.Foo)
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim i As Integer = 200
        ' call the native API, passing our .NET delegate
        Native.fnNativeAPI(i, _cb)
    End Sub

    ' this is our callback function
    Private Sub Foo(ByVal data As EventData)
        Debug.WriteLine(data.I)
        Debug.WriteLine(data.Message)
    End Sub
End Class
Reverse P/Invoke
Posted by Hugh Ang at 9/30/2008 07:24:00 PM
While researching approaches for interop between .NET and native code for ongoing and upcoming projects, specifically for native code calling into .NET code, Reverse P/Invoke has come to my attention as a viable option. Of course there is also the official Microsoft recommendation to expose .NET classes as COM components, which are then callable from native code that talks COM.
The Reverse P/Invoke approach allows native code to call into a .NET delegate through a function pointer. So it could work well for my requirement, which is a way to fire an event from the native app to the .NET app. For instance, an application context change on the native side must be reflected on the .NET side.
The blog post by Junfeng, however, does not give a concrete example of the Reverse P/Invoke approach. So I came up with a POC, for which I had a VS.NET solution with three projects: (1) a native console application (C++ project), (2) a managed class library (C# project) and (3) a mixed mode dll library with an exported C++ function (C++/CLI project).
This POC simulates a native application (#1) that needs to notify managed code (#2) of data changes. I came up with a dll library (#3) compiled with the /clr switch to handle the interop details. Both the native app and the managed code require only minimal changes.
On the .NET side, we have a managed class that has a Foo() function and a GetDelegate() function that hands out a delegate to Foo to its caller.
public class ManagedClass
{
    private CallBackDelegate _delegate;

    public ManagedClass()
    {
        _delegate = new CallBackDelegate(this.Foo);
    }

    public CallBackDelegate GetDelegate()
    {
        return _delegate;
    }

    public void Foo(EventData data)
    {
        Debug.WriteLine(data.I);
        Debug.WriteLine(data.Message);
    }
}
The EventData is a data structure that shares the same binary layout as the one that will be created and marshaled from the native code.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct EventData
{
    public int I;
    public string Message;
}
And here is the delegate definition. Note the attribute UnmanagedFunctionPointer with the calling convention.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void CallBackDelegate(EventData data);
In the mixed mode dll, here is the definition of the EventData data structure and function pointer:
#pragma once
#include <windows.h>

// data structure for the callback function
struct EventData
{
    int I;
    TCHAR* Message;
};

// callback function prototype
typedef void (*NativeToManaged)(EventData data);
And the exported function is defined as in the following. Note how the .NET delegate gets invoked through the function pointer.
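#define INTEROPBRIDGE_API __declspec(dllexport)

using namespace System;
using namespace System::Runtime::InteropServices;

INTEROPBRIDGE_API void fnInteropBridge(EventData data)
{
    ManagedLib::ManagedClass^ c = gcnew ManagedLib::ManagedClass();
    IntPtr p = Marshal::GetFunctionPointerForDelegate(c->GetDelegate());
    NativeToManaged funcPointer = (NativeToManaged) p.ToPointer();
    // invoke the delegate
    funcPointer(data);
    // keep the managed object (and the delegate it holds) reachable until the call has returned
    GC::KeepAlive(c);
}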
Now in the native app, I have code that creates a copy of EventData and invokes the .NET code through the exported dll function fnInteropBridge:
// forward declaration of the API function
void fnInteropBridge(EventData data);

int _tmain(int argc, _TCHAR* argv[])
{
    EventData data;
    data.I = 50;
    data.Message = L"Hello from native code!";
    fnInteropBridge(data);
    return 0;
}
In summary, I like this approach in that it provides quite an easy and non-invasive way for native code to call into managed code. It should work especially well in my scenario, where application context changes initiated from the native app need to be propagated to the managed code. Furthermore, besides polishing this up, I think I will add code to raise a .NET event from inside ManagedClass.Foo(). Then all interested .NET citizens on the managed app side can subscribe to it.
Follow-up of PIAB and WCF Article
Posted by Hugh Ang at 9/23/2008 11:09:00 PM
Since publishing the MSDN article on an approach to integrate PIAB into WCF, I have received quite a bit of feedback from folks using this approach in production. And I am glad to say it's working out successfully for them.
As I mentioned earlier in the comment, Tom Hollander of the p&p group has found an issue with our approach when two PIAB-enabled WCF services are hosted in the same IIS worker process. He graciously shared his code with us and I was able to reproduce the problem when the two WCF services were hosted in two separate AppDomains of the same IIS worker process. Although in production deployment scenarios different WCF services would likely be hosted in separate processes or on different machines, I have still been wanting to figure out the root cause of this particular issue. Unfortunately, with work and other things in my life, I just haven't had time until last week, when I was finally able to sit down and focus on this problem.
I have to admit this has been the most tedious debugging I have ever done as I was deep in the guts of WCF and CLR, inspecting dynamic types, JIT compiler, call stubs, etc. without a whole lot of background in this area - I only wish I were working for the CLR team :-) After a few days, I finally found out the cause of the problem, which is pretty close to my initial hunch. A sense of elation at last!
Here it goes.
1. The setup to repro the problem
The setup is fairly straightforward. There are two WCF services, implementing IService and IAnotherService respectively (the code is pretty much verbatim from Tom):
[ServiceContract]
public interface IService
{
    [OperationContract]
    Foo GetFoo(int id);

    [OperationContract]
    void AddFoo(int i, [NotNullValidator] Foo foo);
}

[ServiceContract]
public interface IAnotherService
{
    [OperationContract]
    void LogFoo(Foo foo);
}
Both services would be enabled with PIAB of course:
[PolicyInjectionBehaviors.PolicyInjectionBehavior]
[ValidationCallHandler]
[LogCallHandler]
public class Service : IService
{
    private static Dictionary<int, Foo> store = new Dictionary<int, Foo>();
    IAnotherService anotherService;

    public Service()
    {
        BasicHttpBinding binding = new BasicHttpBinding();
        binding.SendTimeout = new TimeSpan(4, 0, 0);
        anotherService = new AnotherServiceClient(binding, new EndpointAddress("http://localhost/AnotherTestService/AnotherTestService.svc"));
    }

    public void AddFoo(int id, Foo foo)
    {
        store[id] = foo;
        anotherService.LogFoo(foo);
    }

    public Foo GetFoo(int id)
    {
        if (store.ContainsKey(id))
        {
            anotherService.LogFoo(store[id]);
            return store[id];
        }
        else
            return null;
    }
}

[PolicyInjectionBehaviors.PolicyInjectionBehavior]
[ValidationCallHandler]
[LogCallHandler]
public class AnotherService : IAnotherService
{
    public void LogFoo(Foo foo)
    {
        Logger.Write("LogFoo() called.");
    }
}
Foo is simply a DataContract object that holds an int property and a string property.
Now the services are hosted in two AppDomains in the same IIS worker process. This is what it looks like on my Vista machine:

As you can see, both services are in the DefaultAppPool. With both services set up, we can run the test harness, which first calls Service.AddFoo() and then Service.GetFoo(). The Service.AddFoo() call completes fine, but the Service.GetFoo() call fails with a System.Reflection.TargetException: Object does not match target. The stack trace is as follows:
1017e344 79644832 System.Reflection.RuntimeMethodInfo.CheckConsistency(System.Object)
1017e350 793a4124 System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo, Boolean)
1017e39c 793a40a2 System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo)
1017e3bc 0f96e699 Microsoft.Practices.EnterpriseLibrary.PolicyInjection.RemotingInterception.InterceptingRealProxy+<>c__DisplayClass1.b__0(Microsoft.Practices.EnterpriseLibrary.PolicyInjection.IMethodInvocation, Microsoft.Practices.EnterpriseLibrary.PolicyInjection.GetNextHandlerDelegate)
1017e3ec 0f968ed1 Microsoft.Practices.EnterpriseLibrary.PolicyInjection.HandlerPipeline.Invoke(Microsoft.Practices.EnterpriseLibrary.PolicyInjection.IMethodInvocation, Microsoft.Practices.EnterpriseLibrary.PolicyInjection.InvokeHandlerDelegate)
1017e404 0f968a2e Microsoft.Practices.EnterpriseLibrary.PolicyInjection.RemotingInterception.InterceptingRealProxy.Invoke(System.Runtime.Remoting.Messaging.IMessage)
1017e418 79374dc3 System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(System.Runtime.Remoting.Proxies.MessageData ByRef, Int32)
1017e6b4 79f98b43 [TPMethodFrame: 1017e6b4] WcfPiabInstability.Services.IAnotherService.LogFoo(WcfPiabInstability.Services.Foo)
1017e6c4 0efd08dc DynamicClass.SyncInvokeGetFoo(System.Object, System.Object[], System.Object[])
1017e6d4 50b8d90b System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(System.Object, System.Object[], System.Object[] ByRef)
1017e74c 50b6d245 System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(System.ServiceModel.Dispatcher.MessageRpc ByRef)
1017e7a0 509137ad System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(System.ServiceModel.Dispatcher.MessageRpc ByRef)
1017e7e0 509136a6 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage4(System.ServiceModel.Dispatcher.MessageRpc ByRef)
1017e80c 50913613 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage3(System.ServiceModel.Dispatcher.MessageRpc ByRef)
1017e81c 50913459 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage2(System.ServiceModel.Dispatcher.MessageRpc ByRef)
1017e82c 50912257 System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage1(System.ServiceModel.Dispatcher.MessageRpc ByRef)
1017e844 50911f8f System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean)
1017e888 509115ff System.ServiceModel.Dispatcher.ChannelHandler.DispatchAndReleasePump(System.ServiceModel.Channels.RequestContext, Boolean, System.ServiceModel.OperationContext)
1017ea34 5090f8c9 System.ServiceModel.Dispatcher.ChannelHandler.HandleRequest(System.ServiceModel.Channels.RequestContext, System.ServiceModel.OperationContext)
1017ea78 5090f35e System.ServiceModel.Dispatcher.ChannelHandler.AsyncMessagePump(System.IAsyncResult)
1017ea8c 5090f2f1 System.ServiceModel.Dispatcher.ChannelHandler.OnAsyncReceiveComplete(System.IAsyncResult)
1017ea98 50232d68 System.ServiceModel.Diagnostics.Utility+AsyncThunk.UnhandledExceptionFrame(System.IAsyncResult)
1017eac4 50904501 System.ServiceModel.AsyncResult.Complete(Boolean)
1017eb00 50992b36 System.ServiceModel.Channels.InputQueue`1+AsyncQueueReader[[System.__Canon, mscorlib]].Set(Item)
1017eb14 50992215 System.ServiceModel.Channels.InputQueue`1[[System.__Canon, mscorlib]].EnqueueAndDispatch(Item, Boolean)
1017eb7c 50991ffb System.ServiceModel.Channels.InputQueue`1[[System.__Canon, mscorlib]].EnqueueAndDispatch(System.__Canon, System.ServiceModel.Channels.ItemDequeuedCallback, Boolean)
1017eba4 5091d7e5 System.ServiceModel.Channels.SingletonChannelAcceptor`3[[System.__Canon, mscorlib],[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Enqueue(System.__Canon, System.ServiceModel.Channels.ItemDequeuedCallback, Boolean)
1017ebc8 50977b7e System.ServiceModel.Channels.HttpChannelListener.HttpContextReceived(System.ServiceModel.Channels.HttpRequestContext, System.ServiceModel.Channels.ItemDequeuedCallback)
1017ec0c 5094f396 System.ServiceModel.Activation.HostedHttpTransportManager.HttpContextReceived(System.ServiceModel.Activation.HostedHttpRequestAsyncResult)
1017ec50 5094e4cf System.ServiceModel.Activation.HostedHttpRequestAsyncResult.HandleRequest()
1017ec68 5094defd System.ServiceModel.Activation.HostedHttpRequestAsyncResult.BeginRequest()
1017eca4 5094dea5 System.ServiceModel.Activation.HostedHttpRequestAsyncResult.OnBeginRequest(System.Object)
1017ecd0 50903c3c System.ServiceModel.Channels.IOThreadScheduler+CriticalHelper+WorkItem.Invoke2()
1017ed0c 50903b26 System.ServiceModel.Channels.IOThreadScheduler+CriticalHelper+WorkItem.Invoke()
1017ed20 50903ab5 System.ServiceModel.Channels.IOThreadScheduler+CriticalHelper.ProcessCallbacks()
1017ed54 5090390f System.ServiceModel.Channels.IOThreadScheduler+CriticalHelper.CompletionCallback(System.Object)
1017ed80 5090388b System.ServiceModel.Channels.IOThreadScheduler+CriticalHelper+ScheduledOverlapped.IOCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
1017ed8c 50232e1f System.ServiceModel.Diagnostics.Utility+IOCompletionThunk.UnhandledExceptionFrame(UInt32, UInt32, System.Threading.NativeOverlapped*)
1017edc0 79405534 System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32, UInt32, System.Threading.NativeOverlapped*)
1017ef60 79e7c74b [GCFrame: 1017ef60]
1017f0b8 79e7c74b [ContextTransitionFrame: 1017f0b8]
2. Culprit - a subtle bug in the CLR?
What is going on here? The exception is thrown by System.Reflection.RuntimeMethodInfo.CheckConsistency(), but the problem happened earlier. Look at two adjacent frames in the stack trace (highlighted in red in the original post): DynamicClass.SyncInvokeGetFoo and, directly above it, the TPMethodFrame for IAnotherService.LogFoo. DynamicClass.SyncInvokeGetFoo is the function generated by WCF using Lightweight Code Gen (LCG) to facilitate the GetFoo() call, which is defined in IService. But the next frame on the call stack somehow becomes IAnotherService.LogFoo(). Do not confuse this frame with the LogFoo() call made inside Service.GetFoo(): the call has not yet reached the actual Service object when the exception happens. The SOS commands !clrstack -p and !dumpobject reveal that the object reference in the call context here is the PIAB proxy to the Service object (tp->rp->real object), which implements IService but not IAnotherService. The mismatch simply doesn't manifest as an exception until later, in System.Reflection.RuntimeMethodInfo.CheckConsistency(). So where the exception is thrown is not that important. We need to understand why IService.GetFoo() suddenly becomes IAnotherService.LogFoo().
Let's review the high level picture of how calls are dispatched, starting from the client side (we will only consider synchronous calls for now):
- The client makes a call by sending an XML message to the WCF service.
- WCF processes the message in the pipeline before it finally dispatches the call through System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(), as shown in the stack trace. System.ServiceModel.Dispatcher.InvokerUtil uses LCG to generate a delegate SyncInvokeXXXX, where XXXX is the target method name and SyncInvoke stands for synchronous call. SyncMethodInvoker.Invoke() simply passes control to SyncInvokeXXXX(). This part is not that difficult to figure out using Reflector.
- The CLR takes over from here. Starting with 2.0, .NET uses dispatch stubs to handle interface calls. The runtime figures out the method dispatch token and sends it, along with the target object reference, to mscorwks!ResolveWorkerAsmStub, which calls mscorwks!VirtualCallStubManager::ResolveWorkerStatic and then the heavy lifter mscorwks!VirtualCallStubManager::ResolveWorker to figure out the stub that contains the assembly code to make the actual call.
- The PIAB proxy gets called. This is where the injection magic happens and the service method finally gets called.
The dispatch token is a 32 bit integer whose high 16 bits hold the type id of the interface and whose low 16 bits hold the slot number of the method, as shown in the Rotor (SSCLI) source code:
static const UINT_PTR MASK_TYPE_ID = 0x0000FFFF;
static const UINT_PTR MASK_SLOT_NUMBER = 0x0000FFFF;
static const UINT_PTR SHIFT_TYPE_ID = 0x10;
static const UINT_PTR SHIFT_SLOT_NUMBER = 0x0;

//------------------------------------------------------------------------
// Combines the two values into a single 32-bit number.
static UINT_PTR CreateToken(UINT32 typeID, UINT32 slotNumber)
{
    LEAF_CONTRACT;
    CONSISTENCY_CHECK(((UINT_PTR)typeID & MASK_TYPE_ID) == (UINT_PTR)typeID);
    CONSISTENCY_CHECK(((UINT_PTR)slotNumber & MASK_SLOT_NUMBER) == (UINT_PTR)slotNumber);
    return ((((UINT_PTR)typeID & MASK_TYPE_ID) << SHIFT_TYPE_ID) |
            (((UINT_PTR)slotNumber & MASK_SLOT_NUMBER) << SHIFT_SLOT_NUMBER));
}
Type ids are integer identifiers that represent types in an AppDomain. Slot numbers are integer values representing entries of interface methods in the method table. In the example I am using, IService has a type id of 0x0003 and AddFoo has a slot number of 0x0001, therefore yielding a token of 0x00030001. IService.GetFoo, in slot 0x0000, has a dispatch token of 0x00030000. And IAnotherService.LogFoo also has a token of 0x00030000. You see that both IService and IAnotherService, living in two AppDomains, happen to have the same type id: 0x0003. It is a coincidence, but not one to be ignored.
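The arithmetic is easy to check with a small standalone sketch. The function below is modeled on the Rotor CreateToken shown above but rewritten with plain standard C++ types so that it compiles outside the CLR source tree. The names MakeDispatchToken and PrintExampleTokens and the constant names are assumptions made for this illustration, not CLR code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Illustrative constants mirroring the SSCLI masks and shift.
constexpr std::uint32_t kMaskTypeId     = 0x0000FFFF;
constexpr std::uint32_t kMaskSlotNumber = 0x0000FFFF;
constexpr std::uint32_t kShiftTypeId    = 16;

// Packs the type id into the high 16 bits and the slot number into the low 16 bits.
constexpr std::uint32_t MakeDispatchToken(std::uint32_t typeId, std::uint32_t slot)
{
    return ((typeId & kMaskTypeId) << kShiftTypeId) | (slot & kMaskSlotNumber);
}

// Compile-time check of the two values quoted in the text.
static_assert(MakeDispatchToken(0x0003, 0x0001) == 0x00030001, "IService.AddFoo");
static_assert(MakeDispatchToken(0x0003, 0x0000) == 0x00030000, "IService.GetFoo");

// Prints the tokens for the three interface methods in the example.
inline void PrintExampleTokens()
{
    std::printf("IService.AddFoo        0x%08X\n",
                static_cast<unsigned>(MakeDispatchToken(0x0003, 0x0001)));
    std::printf("IService.GetFoo        0x%08X\n",
                static_cast<unsigned>(MakeDispatchToken(0x0003, 0x0000)));
    std::printf("IAnotherService.LogFoo 0x%08X\n",
                static_cast<unsigned>(MakeDispatchToken(0x0003, 0x0000)));
}
```

The last two lines print the same value, because the token carries no information about which AppDomain assigned the type id.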
The dispatch token is critical here for the following reasons:
- All interface dispatch stubs for our PIAB-enabled services are handled by the VirtualCallStubManager in the shared domain of the process.
- The stub manager keeps a hash table to cache the stub code, using two keys: the token and the object type. In our example, the object type is always a transparent proxy for PIAB-enabled services. So effectively the token becomes the only key that matters.
- The heavy lifter mscorwks!VirtualCallStubManager::ResolveWorker() is responsible for generating and caching the stub. It always checks first whether there is a cached entry. If one is found using the keys, that entry is returned.
When our unit test harness calls Service.AddFoo, which internally calls AnotherService.LogFoo, two dispatch stubs are created and cached by the shared stub manager, with tokens 0x00030001 and 0x00030000 as the respective effective keys. Now the unit test makes a different call, Service.GetFoo. Note that IService.GetFoo also has the dispatch token 0x00030000, the same as IAnotherService.LogFoo, despite the fact that they belong to two types in different domains. The stub manager of the shared domain hands out the previously cached dispatch stub for IAnotherService.LogFoo. This is why we saw the strange call stack above and why the call eventually fails.
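The sequence can be replayed with a toy model of the cache. This is only a sketch of the lookup behavior described above: the class SharedStubCache, the function ReplayHarness, and the use of a string label in place of generated stub code are all assumptions for illustration, and the real VirtualCallStubManager is considerably more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the shared stub cache: (token, object type) -> stub.
class SharedStubCache
{
public:
    // Returns the cached stub for the key, creating one for `target` on a miss.
    const std::string& Resolve(std::uint32_t token, const std::string& objectType,
                               const std::string& target)
    {
        auto key = std::make_pair(token, objectType);
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, "stub for " + target).first;
        return it->second;
    }

private:
    std::map<std::pair<std::uint32_t, std::string>, std::string> cache_;
};

// Replays the calls made by the test harness and returns the stub that the
// final IService.GetFoo dispatch receives.
inline std::string ReplayHarness()
{
    SharedStubCache cache;
    const std::string proxy = "__TransparentProxy"; // same object type for both services

    cache.Resolve(0x00030001, proxy, "IService.AddFoo");        // miss, stub created
    cache.Resolve(0x00030000, proxy, "IAnotherService.LogFoo"); // miss, stub created
    return cache.Resolve(0x00030000, proxy, "IService.GetFoo"); // hit on LogFoo's entry
}
```

ReplayHarness() returns "stub for IAnotherService.LogFoo": the third lookup finds an existing entry under the same key and never creates a stub for GetFoo.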
To further prove this, I changed the definition of IAnotherService to the following to include 5 additional functions as fillers to the method table slots:
[ServiceContract]
public interface IAnotherService
{
    // fillers
    void Filler1();
    void Filler2();
    void Filler3();
    void Filler4();
    void Filler5();

    [OperationContract]
    void LogFoo(Foo foo);
}
The purpose is to alter the slot number of IAnotherService.LogFoo to avoid the token clash. As shown in the following Windbg snippet, I did indeed get a token of 0x00030005 as opposed to the 0x00030000 that I had earlier:
eax=00000003 ebx=0e3f6228 ecx=79e89e87 edx=00000003 esi=00000005 edi=01c45b18
eip=79eb45cf esp=1057dd4c ebp=1057dd80 iopl=0 nv up ei pl nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000213
mscorwks!VirtualCallStubManager::GetCallStub+0x34:
79eb45cf c1e010 shl eax,10h
0:025> p
eax=00030000 ebx=0e3f6228 ecx=79e89e87 edx=00000003 esi=00000005 edi=01c45b18
eip=79eb45d2 esp=1057dd4c ebp=1057dd80 iopl=0 nv up ei pl nz ac pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000216
mscorwks!VirtualCallStubManager::GetCallStub+0x37:
79eb45d2 0bf0 or esi,eax
0:025> p
eax=00030000 ebx=0e3f6228 ecx=79e89e87 edx=00000003 esi=00030005 edi=01c45b18
eip=79eb45d4 esp=1057dd4c ebp=1057dd80 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
And the CLR happily makes all the calls without the dreadful exception!
3. Summary
It is interesting to see how the CLR team may have tried to safeguard against key clashes by using both the token and the object type. In our case, the object type, being the transparent proxy for all PIAB-enabled services, takes that key out of the equation. The token, although its high half corresponds to the interface type and its low half to the slot number, is scoped to the AppDomain where the type is loaded. The fewer types there are and the fewer methods the operation contracts define for the WCF services in each AppDomain, the greater the odds that a key clash like this will happen.
So what is the solution? Could other interface method related values such as MethodDesc, which seems to be unique across app domains, be a better candidate? That is a question for the CLR team.
As for those of us who want to integrate PIAB into WCF while minimizing development team effort, we should be fine hosting WCF services in separate processes. If you really, really want to host everything in one IIS worker process like our contrived example, you can get around this issue by not using the default PIAB interception mechanism. Suppose you can come up with an LCG mechanism to generate one dynamic proxy type (instead of System.Runtime.Remoting.Proxies.__TransparentProxy) for each target. Having a different object type as each service's PIAB intercepting proxy will then avoid the key clash.
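Why distinct proxy types would help can be seen from the cache key alone. The sketch below assumes, as described earlier, that the key is the pair of dispatch token and object type; StubKey, KeysClash and the proxy type names other than __TransparentProxy are made up for illustration.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// (dispatch token, proxy type name); an illustrative model of the cache key.
using StubKey = std::pair<std::uint32_t, std::string>;

// Returns true when two dispatches would land on the same cache entry.
inline bool KeysClash(const StubKey& a, const StubKey& b)
{
    std::map<StubKey, int> cache;
    cache.emplace(a, 1);
    return !cache.emplace(b, 2).second; // emplace reports false if the key already exists
}
```

With the default interception both keys are (0x00030000, "__TransparentProxy") and KeysClash returns true. With one generated proxy type per service, say "ServiceProxy" and "AnotherServiceProxy", the same token no longer collides and KeysClash returns false.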
4. Tools etc.
Debugging is always an enlightening experience. And I can't imagine what life would be like without Windbg and Reflector. Visual Studio 2008 is cool since you can configure it to debug into .NET Framework code. But if the symbols are not available for the module you want to investigate, or you need to dig deeper below the FCL layer, Reflector and Windbg are just indispensable.
Also, having the SSCLI source code is absolutely wonderful. Without it, debugging through machine code in Windbg would be a lot harder.
Most of the SSCLI code for this exercise is located in the virtualcallstub.cpp file.
Interview Questions
Posted by Hugh Ang at 9/09/2008 09:46:00 AM
Having a list of questions with standard answers to score candidates during technical screenings certainly has its benefits, especially for maintaining consistency across different interviewers. However, if the interviewer simply compares the candidate's answers to the official ones, then she is not doing her job. An interview is an interactive process and should be leveraged as such. Many seemingly easy questions can be extended into discussions at both broader and deeper levels. You will get a better picture of the candidate's overall skills and experience this way. For example, there is usually a basic question on the differences between value and reference types. This question can be extended to boxing/unboxing, the scenarios where boxing/unboxing can occur and the performance implications, which can be a good starting point to test the candidate's knowledge of generics. There are semantic implications of boxing and unboxing as well. A boxed integer, for example, is a brand new object with a copy of the initial integer value. The following is not allowed by the C# compiler:
int i = 1; // class instance field
lock(i)
{
    //...
}
You could hack it with:
lock ((object)i)
{
    //...
}
But you will not get the intended lock semantics. I will leave it to you to answer why that is the case. If you know the answer, you will see that you can evaluate the candidate's knowledge of threading and synchronization, besides boxing/unboxing.
And there is more! A related topic is passing by reference vs. passing by value in function calls (I had a previous post in this area). And you can take the initial question and extend it to a discussion of heap vs. stack and further on to GC.
So you see how one easy question can be extended quite a bit and become the vehicle for you to test the candidate's overall knowledge. Of course you need to be sensitive to time and not go off in all directions. You will usually get a sense of the candidate's knowledge halfway through and can decide from that point whether to go further or not.