Sunday, May 22, 2011

Bug Fixes in Android NDK Profiler

One of the biggest outstanding bugs in my Android NDK profiler library was that it would produce the gmon.out file but when you ran it through gprof you would see "no accumulate time" and the timing information was nowhere to be found. Now you can thank Justin Wick for debugging the root cause of the problem and providing a fix.

The library size I was using was too small to trigger the problem. It turned out that on larger libraries the calculation that inserted the addresses into the hystogram buckets had an overflow bug that caused the information to be lost. This was a left over from using the profiling code from Visual Boy Advance, since on the GBA the address space tends to be much smaller especially as it doesn't use random address layouts.

I tested this out using the Android port of FreeCiv, which prior to the change didn't show timing information, and with the change it now shows the times for each call. Once again, thanks to Justin for debugging the problem and figuring this out.

Using armeabi-v7a

The other problem you may have seen mentioned in the comments was with armeabi-v7a. In theory you can combine armeabi and armeabi-v7a together and things just work. Sadly I've had reports that doing this causes code compiled as armeabi-v7a to crash.

I finally got round to looking at what gcc spits out when you use the APP_ABI variable and set it to armeabi-v7a. There was a slight surprise in store for me that would explain the crashes. In a nutshell, when I said in my original post that I had to save all the registers in the profiling glue code, I had made an assumption that the link register (lr) did not have to be saved. With Thumb2 it looks like I do have to save that one as well, since gcc forgets that calling __gnu_mcount_nc (the profiling glue) is a function call that wrecks lr.

Look at this bit of C code:

int foo(int x)
{
        return x * 3;
}

When I compile it with -Os -mthumb --save-temps it gives the following:

foo:
        lsl     r3, r0, #1
        add     r0, r3, r0
        bx      lr

Now if I add in -pg to get the profiling calls I get this:

foo:
        push    {r4, lr}
        push    {lr}
        bl      __gnu_mcount_nc
        lsl     r3, r0, #1
        add     r0, r3, r0
        pop     {r4}
        pop     {r1}
        bx      r1

You can see it pushes lr twice - once for the use in the actual function, and a second time for __gnu_mcount_nc to do what it likes with it. Now if I compile for thumb2 using -march=armv7-a -mthumb I get this:

foo:
        @ link register save eliminated.
        push    {lr}
        bl      __gnu_mcount_nc
        movs    r3, #3
        mul     r0, r3, r0
        bx      lr

That comment is quite interesting. If you squint a bit it almost looks like a subtle compiler bug, since when you add profiling it should consider the call to __gnu_mcount_nc just like another function call and save lr. But as I haven't been able to test this on a real device yet, I'm not 100% sure that it's the only problem with using armeabi-v7a in combination with the profiling library. Plus once I saw what was going on I've been able to work around it anyway. If you get the chance to try this out, and are using armeabi-v7a let me know how the latest release works out for you.

4 comments:

  1. Anonymous5:28 am

    Dear Richard,

    I modify an android ndk sample to do a test for your android ndk profiler and run it on emulator. The call graph shows correct, but the time cost by some heavy functions are zero. Could you help to check what's wrong with this? Here is the program http://www.badongo.com/file/25430153. Thanks a lot.

    Barry

    ReplyDelete
  2. Barry,

    I am currently experiencing a similar problem. Some of my heaviest functions do not show up, but this is only on v7a not normal eabi. I have yet to debug this.

    ReplyDelete
  3. @Justin Hurray! :-)

    @Anon: the call graph shows fir_filter_f as being the hog, and doesn't show fir_filter_c at all. Also Java_com_example_neon_HelloNeon_stringFromJNI shows very little time.

    Since monstartup() is at the start of Java_com_example_neon_HelloNeon_stringFromJNI, it's upsetting the profiling of that function. It takes some time to set up the actual profiling information (read the symbol maps, start the monitoring thread) and this is where you lose the information.

    It's best if you can call monstartup() well before calling any functions that you need to profile.

    ReplyDelete
  4. Ouch, comment collision.

    ReplyDelete

Note: only a member of this blog may post a comment.