I've been spending a lot of time profiling DRI drivers for Gallium 3D. I tried gprof, valgrind, and finally oprofile. Of the three, oprofile seems to me the best fit for this purpose. More details are on the DRI Wiki.
I also wrote a script that uses graphviz to generate a time-colored call graph from oprofile output. Here is an example output from profiling glxgears on Gallium 3D:
The hotter a function's colour, the more time is spent in that function and its children.
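To give an idea of how such colouring can work, here is a minimal sketch in Python. It is not the actual script: the names `heat_color` and `to_dot` are made up for illustration, and it assumes the profile has already been reduced to each function's inclusive time (self plus children) as a fraction of the total, plus a list of caller/callee pairs. Parsing oprofile output is left out entirely.

```python
import colorsys


def heat_color(fraction):
    """Map a time fraction in [0, 1] to a hex colour, from blue (cold) to red (hot)."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range input
    hue = (1.0 - fraction) * 2.0 / 3.0       # 2/3 is blue, 0 is red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255))


def to_dot(inclusive, calls):
    """Build graphviz DOT text for a call graph.

    inclusive: dict mapping function name -> fraction of total time
               spent in the function and its children (assumed input).
    calls:     iterable of (caller, callee) name pairs (assumed input).
    Names are assumed not to contain double quotes.
    """
    lines = ["digraph profile {", "  node [style=filled, shape=box];"]
    for name, fraction in sorted(inclusive.items()):
        lines.append('  "%s" [label="%s\\n%.1f%%", fillcolor="%s"];'
                     % (name, name, fraction * 100.0, heat_color(fraction)))
    for caller, callee in calls:
        lines.append('  "%s" -> "%s";' % (caller, callee))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    inclusive = {"main": 1.0, "draw": 0.8, "swap_buffers": 0.1}
    calls = [("main", "draw"), ("main", "swap_buffers")]
    print(to_dot(inclusive, calls))
```

Saving the printed text to a file and running it through `dot -Tpng` should produce a graph where the most expensive call paths stand out in red and the cheap ones fade to blue.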