13 July 2008

Neverball on iPod

I had a lot of fun playing Neverball's minigolf alter ego, Neverputt, with my brother. Neverball itself was fun, but it was too hard to progress using the mouse. But this changed now that Neverball was ported to the iPod Touch/iPhone, using its internal accelerometer as input:

I can't help but thinking this is how the game was intended to be played.

11 July 2008

Propagating time inside recursive function cycles

I believe that a call graph showing the total function time (that is, the time spent inside the function and all its descendants) is one of the most useful visualizations of profile data, as it allows to quickly grasp what is the critical code path in your program that is taking most of the execution time.

Total function time is typically calculated by propagating the time along the call graph, from callees to callers. The problem with measuring total function time are recursive functions, as noted on gprof article:

Time propagation within strongly connected components is a problem. For example, a self-recursive routine (a trivial cycle in the call graph) is accountable for all the time it uses in all its recursive instantiations. In our scheme, this time should be shared among its call graph parents. The arcs from a routine to itself are of interest, but do not participate in time propagation. Thus the simple equation for time propagation does not work within strongly connected components. Time is not propagated from one member of a cycle to another, since, by definition, this involves propagating time from a routine to itself. In addition, children of one member of a cycle must be considered children of all members of the cycle. Similarly, parents of one member of the cycle must inherit all members of the cycle as descendants. It is for these reasons that we collapse connected components. Our solution collects all members of a cycle together, summing the time and call counts for all members. All calls into the cycle are made to share the total time of the cycle, and all descendants of the cycle propagate time into the cycle as a whole. Calls among the members of the cycle do not propagate any time, though they are listed in the call graph profile.

This means that, for example, the recursive call of of singly recursive functions is ignored:

<

And recursion cycles with two or more functions are collapsed in a single node:

In summary, the call graph is converted in a Directed Acyclic Graph (DAG).

However this approach has several drawbacks. By treating alls functions in a recursion cycle as a single entity, you loose the ability to differentiate among them; you cannot prune individually the functions which have little contribution to the overall time, so the call graph gets bigger and more complex. Furthermore, nothing prevents a call graph to be full of recursion cycles. Actually, if you throw in the mix a statistical profiler like oprofile which produces spurious calls when the instruction pointer is sampled before/after the stack frame is setup/destroyed, and a environment like the linux kernel where there are IRQs artificially introducing function calls to the IRQ handler, you'll probably also end up with a 400KB dot file of spaghetti inside, which graphviz's dot chews for all eternity.

I tried to find a better solution for several months. Most attempts ended up with my head in an infinite loop. The solution eventually came to me once I realized that time is a linear quantity, that is, if you propagate the time along the graph assuming zero time for all but one function, repeat for every function, and take the overall sum, then you will reach the same results as doing the computation in a single pass. Given this, and the fact that by using the gprof approach we already know the time being propagated into and out of the cycle, we can then try to estimate the time propagation inside the cycle by decompose the cycle into a DAG having as root each function that is called externally. For example, taking the two function recursion example above, you can decompose the call graph in two cases:

Then it is just a matter of propagating the partial time for each case, and take the total sum. The same thing can be done with any number of nodes:

The end result is an heuristic that enables gprof2dot to visually differentiate and prune recursive functions as any other function, producing consistent and meaningful results.

10 July 2008

Tracing D3D applications

I needed a tool to trace Direct3D 8 applications, like Microsoft's defunct D3DSpy or PIX. Unfortunately D3DSpy/PIX is for Direct3D 9 and above only, and no source is available, so I decided to roll my own tool.

I started with Michael Koch's sample code to intercept calls to DirectX with a proxy DLL, but when I was done I ended up writing a framework in Python to generate automatically all the code to intercept all the Direct3D 8 API (or almost any DLL for that matter), and dump the parameters in between.

The code generation mechanism is inspired in Python's ctypes module. One describes the APIs in Python like:

D3DPRESENT_PARAMETERS = Struct("D3DPRESENT_PARAMETERS", [
    (UINT, "BackBufferWidth"),
    (UINT, "BackBufferHeight"),
    (D3DFORMAT, "BackBufferFormat"),
    (UINT, "BackBufferCount"),
    (D3DMULTISAMPLE_TYPE, "MultiSampleType"),
    (DWORD, "MultiSampleQuality"),
    (D3DSWAPEFFECT, "SwapEffect"),
    (HWND, "hDeviceWindow"),
    (BOOL, "Windowed"),
    (BOOL, "EnableAutoDepthStencil"),
    (D3DFORMAT, "AutoDepthStencilFormat"),
    (DWORD, "Flags"),
    (UINT, "FullScreen_RefreshRateInHz"),
    (UINT, "PresentationInterval"),
])

IDirect3D9.methods.append(
    Method(HRESULT, "CreateDevice", [
        (UINT, "Adapter"), 
        (D3DDEVTYPE, "DeviceType"), 
        (HWND, "hFocusWindow"), 
        (DWORD, "BehaviorFlags"), 
        (OutPointer(D3DPRESENT_PARAMETERS), "pPresentationParameters"), 
        (OutPointer(PDIRECT3DDEVICE9), "ppReturnedDeviceInterface")
    ]),
)

Which will generate the following C++ code:

void DumpD3DPRESENT_PARAMETERS(const D3DPRESENT_PARAMETERS &value) {
    Log::BeginElement("UINT", "BackBufferWidth");
    DumpUINT((value).BackBufferWidth);
    Log::EndElement();
    Log::BeginElement("UINT", "BackBufferHeight");
    DumpUINT((value).BackBufferHeight);
    Log::EndElement();
    Log::BeginElement("D3DFORMAT", "BackBufferFormat");
    DumpD3DFORMAT((value).BackBufferFormat);
    Log::EndElement();
    Log::BeginElement("UINT", "BackBufferCount");
    DumpUINT((value).BackBufferCount);
    Log::EndElement();
    Log::BeginElement("D3DMULTISAMPLE_TYPE", "MultiSampleType");
    DumpD3DMULTISAMPLE_TYPE((value).MultiSampleType);
    Log::EndElement();
    Log::BeginElement("DWORD", "MultiSampleQuality");
    DumpDWORD((value).MultiSampleQuality);
    Log::EndElement();
    Log::BeginElement("D3DSWAPEFFECT", "SwapEffect");
    DumpD3DSWAPEFFECT((value).SwapEffect);
    Log::EndElement();
    Log::BeginElement("HWND", "hDeviceWindow");
    DumpHWND((value).hDeviceWindow);
    Log::EndElement();
    Log::BeginElement("BOOL", "Windowed");
    DumpBOOL((value).Windowed);
    Log::EndElement();
    Log::BeginElement("BOOL", "EnableAutoDepthStencil");
    DumpBOOL((value).EnableAutoDepthStencil);
    Log::EndElement();
    Log::BeginElement("D3DFORMAT", "AutoDepthStencilFormat");
    DumpD3DFORMAT((value).AutoDepthStencilFormat);
    Log::EndElement();
    Log::BeginElement("DWORD", "Flags");
    DumpDWORD((value).Flags);
    Log::EndElement();
    Log::BeginElement("UINT", "FullScreen_RefreshRateInHz");
    DumpUINT((value).FullScreen_RefreshRateInHz);
    Log::EndElement();
    Log::BeginElement("UINT", "PresentationInterval");
    DumpUINT((value).PresentationInterval);
    Log::EndElement();
}

HRESULT __stdcall WrapIDirect3D9::CreateDevice(UINT Adapter, 
  D3DDEVTYPE DeviceType, HWND hFocusWindow, DWORD BehaviorFlags, 
  D3DPRESENT_PARAMETERS * pPresentationParameters, 
  IDirect3DDevice9 * * ppReturnedDeviceInterface
) {
    HRESULT result;
    Log::BeginCall("IDirect3D9::CreateDevice");
    Log::BeginArg("IDirect3D9 *", "this");
    Log::BeginReference("IDirect3D9", m_pInstance);
    Log::EndReference();
    Log::EndArg();
    Log::BeginArg("UINT", "Adapter");
    DumpUINT(Adapter);
    Log::EndArg();
    Log::BeginArg("D3DDEVTYPE", "DeviceType");
    DumpD3DDEVTYPE(DeviceType);
    Log::EndArg();
    Log::BeginArg("HWND", "hFocusWindow");
    DumpHWND(hFocusWindow);
    Log::EndArg();
    Log::BeginArg("DWORD", "BehaviorFlags");
    DumpDWORD(BehaviorFlags);
    Log::EndArg();
    result = m_pInstance->CreateDevice(Adapter, DeviceType, hFocusWindow, 
      BehaviorFlags, pPresentationParameters, ppReturnedDeviceInterface);
    Log::BeginArg("D3DPRESENT_PARAMETERS *", "pPresentationParameters");
    if(pPresentationParameters) {
        Log::BeginReference("D3DPRESENT_PARAMETERS", pPresentationParameters);
        DumpD3DPRESENT_PARAMETERS(*pPresentationParameters);
        Log::EndReference();
    }
    else
        Log::Text("NULL");
    Log::EndArg();
    Log::BeginArg("IDirect3DDevice9 * *", "ppReturnedDeviceInterface");
    if(ppReturnedDeviceInterface) {
        Log::BeginReference("IDirect3DDevice9 *", ppReturnedDeviceInterface);
        if(*ppReturnedDeviceInterface) {
            Log::BeginReference("IDirect3DDevice9", *ppReturnedDeviceInterface);
        Log::EndReference();
    }
    else
        Log::Text("NULL");
        Log::EndReference();
    }
    else
        Log::Text("NULL");
    Log::EndArg();
    if(*ppReturnedDeviceInterface)
        *ppReturnedDeviceInterface = new WrapIDirect3DDevice9(*ppReturnedDeviceInterface);
    Log::BeginReturn("HRESULT");
    DumpHRESULT(result);
    Log::EndReturn();
    Log::EndCall();
    return result;
}

Which, when executed, hopefully generates something like the following XML:

<call name="IDirect3D9::CreateDevice">
  <arg type="IDirect3D9 *" name="this">
    <ref type="IDirect3D9" addr="001481E0"></ref>
  </arg>
  <arg type="UINT" name="Adapter">0</arg>
  <arg type="D3DDEVTYPE" name="DeviceType">D3DDEVTYPE_HAL</arg>
  <arg type="HWND" name="hFocusWindow">00110138</arg>
  <arg type="DWORD" name="BehaviorFlags">0x00000020</arg>
  <arg type="D3DPRESENT_PARAMETERS *" name="pPresentationParameters">
    <ref type="D3DPRESENT_PARAMETERS" addr="0012FE84">
      <elem type="UINT" name="BackBufferWidth">250</elem>
      <elem type="UINT" name="BackBufferHeight">250</elem>
      <elem type="D3DFORMAT" name="BackBufferFormat">D3DFMT_X8R8G8B8</elem>
      <elem type="UINT" name="BackBufferCount">1</elem>
      <elem type="D3DMULTISAMPLE_TYPE" name="MultiSampleType">D3DMULTISAMPLE_NONE</elem>
      <elem type="DWORD" name="MultiSampleQuality">0x00000000</elem>
      <elem type="D3DSWAPEFFECT" name="SwapEffect">D3DSWAPEFFECT_DISCARD</elem>
      <elem type="HWND" name="hDeviceWindow">00110138</elem>
      <elem type="BOOL" name="Windowed">1</elem>
      <elem type="BOOL" name="EnableAutoDepthStencil">0</elem>
      <elem type="D3DFORMAT" name="AutoDepthStencilFormat">D3DFMT_UNKNOWN</elem>
      <elem type="DWORD" name="Flags">0x00000000</elem>
      <elem type="UINT" name="FullScreen_RefreshRateInHz">0</elem>
      <elem type="UINT" name="PresentationInterval">2147483648</elem>
    </ref>
  </arg>
  <arg type="IDirect3DDevice9 * *" name="ppReturnedDeviceInterface">
    <ref type="IDirect3DDevice9 *" addr="0043289C">
      <ref type="IDirect3DDevice9" addr="0014EBA0"></ref>
    </ref>
  </arg>
  <ret type="HRESULT">D3D_OK</ret>
</call>

Which when viewed by a XML and CSS capable browser like Firefox or Internet Explorer will show:

Hovering on value will popup its type; and with Firefox, hovering on a pointer will show the referred data structure.

Source and binaries available.