I read the VBA source, lifted the relevant chunks of gprof-inspired code, hacked in some timing information, and added a touch of assembly to tie together what
gcc -pg
does and what the profile-creating code does. If you look at a chunk of the output from gcc with -pg
, it looks like this:
0001184c <do_raisedead_cast>:
1184c: b5f8 push {r3, r4, r5, r6, r7, lr}
1184e: b500 push {lr}
11850: f004 f830 bl 158b4 <__gnu_mcount_nc>
...
1189c: 3a01 subs r2, #1
1189e: 701a strb r2, [r3, #0]
118a0: bdf8 pop {r3, r4, r5, r6, r7, pc}
So the compiler pushes the link register (lr) and the
__gnu_mcount_nc
function has to pop it off the stack. This isn't a normal calling convention - usually each function uses the stack symmetrically, pushing and popping the same number of times. Also I learned that the mcount function must preserve all the registers. You can't smush a single one or Bad Things happen (I've seen a lot of weird crashes lately, I can tell you). This little conundrum took me quite some time to figure out. It can probably be done better, but the implementation I came up with was this:
push {r0-r3}
push {lr}
ldr r0, [sp, #20] @ r0 = lr pushed by calling routine
mov r1, lr @ address of calling routine
bl profCount
pop {r0}
mov lr, r0
pop {r0-r3}
@ pop previous address
add sp, #4
bx lr
Since registers r4+ must be preserved by the called function, I can ignore those when the code calls out to
profCount
. Keeping r0-r3 clean was the problem, as usually these are "scratch" registers and you're free to mess them up. The function profCount
takes the C function-just-called and its caller addresses as arguments for use in generating the profile. Anyway, the results are pretty good for the code I've tested it on - you get the usual call counts and timing information that gprof provides. Here's a bit of the information from Chaos...
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
65.31 0.32 0.32 288 1.11 1.11 android_render_screen
18.37 0.41 0.09 450 0.20 0.20 clear_square
4.08 0.43 0.02 3 6.67 6.67 clear_bg
2.04 0.44 0.01 25 0.40 0.40 print_text16
2.04 0.45 0.01 3 3.33 3.33 draw_decor_border_internal
2.04 0.46 0.01 2 5.00 33.00 splash_touch
2.04 0.47 0.01 1 10.00 52.00 options_back
2.04 0.48 0.01 1 10.00 54.80 show_options
2.04 0.49 0.01 chaos_start
0.00 0.49 0.00 442368 0.00 0.00 draw_8x8_tile
0.00 0.49 0.00 28800 0.00 0.00 platform_set_tile_entry
0.00 0.49 0.00 3646 0.00 0.00 platform_set_text_map_entry
0.00 0.49 0.00 2671 0.00 0.00 platform_key_pressed
0.00 0.49 0.00 1186 0.00 0.00 platform_set_palette_entry
index % time self children called name
0.32 0.00 288/288 chaos_gl_render [2]
[1] 65.3 0.32 0.00 288 android_render_screen [1]
0.00 0.00 442368/442368 draw_8x8_tile [22]
0.00 0.00 286/286 InterruptProcess [30]
Functions with no "calls", "self" or "total" are because they got called via JNI. The "draw_8x8_tile" call count there is accurate - it is called for each 8x8 tile on the screen in each frame. There are 32 x 24 tiles, and 2 layers of tiles.
32 * 24 * 2 * 288 = 442368
, as expected. And the android_render_screen count is correct, since at 30 fps that is around 9 seconds of use and I only ran the game for a few seconds to check.More details on the android-ndk-profiler page, including a guide on how to use the library.
I am trying to use the profiler but when I compile I get undefined reference to monstartup and moncleanup functions. My Android.mk file is
ReplyDeleteLOCAL_PATH := $(call my-dir)
# remove -ffunction-sections and -fomit-frame-pointer from the default compile flags
TARGET_thumb_release_CFLAGS := $(filter-out -ffunction-sections,$(TARGET_thumb_release_CFLAGS))
TARGET_thumb_release_CFLAGS := $(filter-out -fomit-frame-pointer,$(TARGET_thumb_release_CFLAGS))
TARGET_CFLAGS := $(filter-out -ffunction-sections,$(TARGET_CFLAGS))
# include libandprof.a in the build
include $(CLEAR_VARS)
LOCAL_MODULE := andprof
LOCAL_SRC_FILES := libandprof.a
include $(PREBUILT_STATIC_LIBRARY)
# RayTracer shared library
include $(CLEAR_VARS)
LOCAL_MODULE := RayTracer
# compile with profiling
LOCAL_CFLAGS := -pg
# Ray Tracer
LOCAL_CFLAGS += -DANDROID_NDK
LOCAL_SRC_FILES := RayTracer.cpp
LOCAL_LDLIBS := -lGLESv1_CM -lstdc++
LOCAL_STATIC_LIBRARIES := andprof
include $(BUILD_SHARED_LIBRARY)
The results I get after I run ndk-build are:
Gdbserver : [arm-linux-androideabi-4.4.3] libs/armeabi/gdbserver
Gdbsetup : libs/armeabi/gdb.setup
SharedLibrary : libRayTracer.so
C:/.../TempNDK/obj/local/armeabi/objs-debug/RayTracer/TempNDK.o: In function `Java_ADE_RayTracer_GLView_nativeInit':
C:/.../TempNDK/jni/TempNDK.cpp:185: undefined reference to `monstartup(char const*)'
C:/.../TempNDK/obj/local/armeabi/objs-debug/RayTracer/TempNDK.o: In function `Java_ADE_RayTracer_GLView_nativePause':
C:/.../TempNDK/jni/TempNDK.cpp:239: undefined reference to `moncleanup()'
collect2: ld returned 1 exit status
make: *** [/cygdrive/c/.../TempNDK/obj/local/armeabi/libRayTracer.so] Error 1
Is there anything else I should in order to make it work?
Thanks
Ah! Looks like you're using C++. Name mangling. I'll upload a fix, but if you declare them with C linkage it should work:
ReplyDeleteextern "C" void monstartup(char const*);
extern "C" void moncleanup();
I'm getting the same undefined reference compilation errors when I attempt to use android-ndk-profiler in my project. I'm using version 3.2, and I've got the extern "C" declarations in my header.
DeleteWithout the extern "C" declarations, I get 'monstartup not declared in this scope'. With them, I get 'undefined reference to monstartup'.
Any tips?
I tried using the pre-compiled profiling library posted on the project page, but my app crashes in System.loadLibrary() if I compile with
ReplyDeleteAPP_ABI := armeabi-v7a
in my Application.mk file. It works fine if I use the default APP_ABI.
I did not try compiling the library myself with armeabi-v7a, although I imagine that would fix the problem. You may want to add this caveat to the documentation.
@LightStruk - Thanks for the report. I thought that I only needed to supply the lowest-common-denominator armeabi, as the docs/PREBUILTS.html file in the NDK says:
ReplyDelete"Remember that you don't need to provide an armeabi-v7a prebuilt library, since an armeabi one can easily run on the corresponding devices."
That probably only applies to shared libraries, not when you mix armeabi and v7a static libraries. I've updated the project with v1.3, which has a prebuilt armeabi-v7a library.
I followed the steps on your ndk-profiler usage, and I'm able to run my program and generate a gmon.out on my nexus one phone. However the prof outpu says "no time accumulated". My program is compiled with pg, and android project is set to debug in its manifest also.
ReplyDeletePerhaps the calls are not long enough? Are you getting *any* profiling information? e.g. the flat profile with the number of calls per function? The library hardcodes PC sampling at 100Hz (one sample every 10 ms), if your function is called sporadically and takes less time I guess it won't trigger time accumulation.
ReplyDeleteI called my function 100 times to ensure the call is long enough. The measured wall time is 13 s.
ReplyDeleteThe Flat profile does not have number of calls per function. It shows "Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated" and then gives the definition of %time, name etc. terms, that's it.
I'm using ndk-r5b on Windows, and use cygwin to compile my natvie code.
Are you using armeab-v7a? I've not tested that at all and it probably doesn't work. I don't have a compatible device, but I'll try and fix it if that's the case.
ReplyDeleteJust to be clear, I downloaded your latest NDK-profiler 1.3, and only make for armeab .
ReplyDeleteBut still gmon.out does not produce any function usage information. My device is Nexus One running Android 2.3.3
Here is PROFILING info LogCat:
04-27 11:16:16.280: INFO/PROFILING(2632): Proflie libnative-plasma.so 80400000-8
0409000: 0
04-27 11:16:23.790: INFO/PROFILING(2632): 1: stopping profile thread
04-27 11:16:23.800: INFO/PROFILING(2632): 1: writing gmon.out
One suggestion: it would be useful if you can provide a test project (stating under what platform it has been tested), so we can download it and quickly test if it can work under our platform. That be easier than putting it into our large project and trying to figure out what goes wrong.
No, I’m using armeab
ReplyDeleteLasted NDK profiler v1.3,
Android 2.3.3 gingerbread.
One suggestion, can you provide a test project that we can download and quickly test it under our platform? That would be easier to isolate the problem. Thanks.
I am trying to use the profiler on my c++ code. However although the flat profile shows the number function calls, there is no accumulated time. When I compile the code it gives me a lot of warning says "-ffunction-sections disabled; it makes profiling impossible". Do you know what might be the problem. Normally it should accumulate time as I have functions that are called a million times and the running time of the application is enough.
ReplyDeleteI've not used C++, but I bet adding something like this to your Android.mk would help:
ReplyDeleteTARGET_CXXFLAGS := $(filter-out -ffunction-sections,$(TARGET_CXXFLAGS))
By default the Android build system adds the -ffunction-sections so you have to remove it.
Hmm, actually, seems there isn't actually a TARGET_CXXFLAGS, so if you follow http://code.google.com/p/android-ndk-profiler/wiki/Usage it should Just Work.
ReplyDeleteThanks for posting the armeabi-v7a build; sadly, when compiling in armeabi-v7a mode, it doesn't appear to work any better than the armeabi build. I don't think it's even getting to monstartup() before it hangs.
ReplyDeleteAs you said earlier, it doesn't make sense that armeabi-v7a builds would need a different andprof library. The calling convention is unchanged.
Do you preserve r2 and r3 because profCount() clobbers them? __gnu_mcount_nc() doesn't otherwise touch them. Looking at the disassembly of profCount(), it isn't obvious to me that profCount is correctly preserving r4, r5, and r6.
FWIW, these are the TARGET_* flags I have in my Android.mk:
TARGET_CFLAGS := $(filter-out -ffunction-sections,$(TARGET_CFLAGS))
TARGET_thumb_release_CFLAGS := $(filter-out -fomit-frame-pointer,$(TARGET_thumb_release_CFLAGS))
TARGET_arm_release_CFLAGS := $(filter-out -fomit-frame-pointer,$(TARGET_arm_release_CFLAGS))
These flags seem to make both thumb and arm objects work.
Finally, I also get "no time accumulated" in my profiling results. This happens both with a JNI-loaded library and with a fully native activity. I'm guessing the problem is caused by the multi-threaded nature of Android apps, even native ones. See http://sam.zoy.org/writings/programming/gprof.html for more details.
Yeah, the calling convention is that r0-r3 are free for the called function to reuse (i.e. profCount is free to clobber them). r4 onwards have to be preserved by the called function if it uses them.
ReplyDeleteI haven't looked at what armeabi-v7a spits out - I imagine that when you compile with armeabi-v7a and thumb2 in particular it is using a different calling mechanism to __gnu_mcount_nc. It hasn't been a priority for me to really investigate it though.
My testing is done on a multi-threaded app. I guess that there's something in 2.3 on hardware that's screwing it up, because even that works fine with regular armeabi on emulators.
The article you linked to is for the real gprof and the Linux kernel hooks it uses, which my own hack here doesn't use. I keep track of apparent time spent on each PC in a separate thread. As long as __gnu_mcount_nc is called, it should be registering time spent in the callee.
If you get the chance, see the other post I made about the native activity example.
thanks for posting your work. If works, it can be SOOO useful!..
ReplyDeleteI have a following issue. When trying to perform the build with -pg directive, I get
"arm-linux-androideabi-g++: -pg and -fomit-frame-pointer are incompatible". May be you can hint what can be done?..
THANKS.
@santa - you need to remove the -fomit-frame-pointer declarations from the build. To do that, you have to temporarily add some lines to your Android.mk. More details https://code.google.com/p/android-ndk-profiler/wiki/Usage
ReplyDeleteThanks! There was another source of CFLAGS in my project...
ReplyDeleteThanks again. I managed to build the thing. Yet, when I get into running, I get
ReplyDeleteI/PROFILING( 714): 0: No smaps found
Any chance you got a hint?
Thanks!!
corrected the misspelling of the parameter passed to monstartup. It rocks&rolls. Thanks!!
ReplyDelete@santa, yep that'll do it. I should add a better debug message in that case.
ReplyDeleteI try to use the tool to profile my projects. It seems that the bottle neck is the following function:
ReplyDelete28.00 11.51 11.51 4922 0.00 0.00 TaskQueue::Dequeue(Queue*, unsigned int, unsigned int*, unsigned long)
In this function, pthread_mutex_lock, pthread_cond_wait and pthread_mutex_unlock is used to for atomic operations.
So, if the call is pending in pthread_cond_wait, is the waiting time included in the final perf counter(because it doesn't take any CPU time)?
With version 2.0 of the library onwards you should get correct timing information when you run it on Gingerbread. Amongst the fixes is just this case: system calls with no overhead report no time. On Froyo the timing info is flaky even in 2.0, that's unfixable.
ReplyDeleteWhat would be the changes required to make this work with an x86 android device (Google TV) ?
ReplyDelete@Jay: the NDK supports x86 now, so what's missing is a conversion of the assembly glue for the profiling calls. That's this code: https://android-ndk-profiler.googlecode.com/svn/trunk/jni/gnu_mcount.S which is ARM only.
ReplyDeleteThanks for replying to my earlier query. I actually looked at prof.c and there seem to be some symbols which are defined only in ARM: "instruction_pointer". I also wrote some replacement assembly glue for x86, but it seems that __gnu_mcount_nc might not be the function call that is put in by gcc, since at no point did I see a call to profCount (which is ordinarily called by __gnu_mcount_nc). Is there a difference in the nomenclature for gnu_mcount_nc? (I remember reading about this on the gprof website, and it mentioned it could be compiler/OS dependent, _mcount, __mcount etc?) Do you have any idea about this?
ReplyDelete@Jay - looks like the glue function is called "mcount" on x86 and it is put in by the compiler, it doesn't appear in the disassembly of the object files like with arm. APart from that, there's the issue of getting the instruction pointer from pt_regs, which on arm is easy (there's a macro for it) but on x86 it's a mystery to me.
ReplyDeleteHello!
ReplyDeleteThanks for your effort developing the profiler, I can't wait to give it a try.
I'm having a slight problem, could you please help? I'm getting this error when I try to compile my lib with your profiler (strictly sticking to your instructions):
make: *** [/home/alex/the_project/android/obj/local/armeabi-v7a/libandprof.a] Error 1
make: *** Waiting for unfinished jobs....
**** Build Finished ****
The libandprof.a module is indeed at the path shown above.
Thanks in advance!
Can't do much with just that snippet. The Error 1 is the error code that the program actually erroring out passed to make. I guess it is gcc linking - perhaps you're using armeabi rather than armeabi-v7a?
ReplyDeleteNope, it's armeabi-v7a.
ReplyDeleteI didnt't manage to link with the .a module, but I can compile the profiler from source. However, I get SIGSEGV upon program startup from somewhere outside (before) of my source code.
Is it even legitimate to compile your profiler right into my shared module?
You can compile it in your own code, just as long as you make sure not to do so with -pg on the actual profiling code (or it will get confused).
ReplyDeleteRichard,
ReplyDeleteI'm trying to use your profiler on a JNI app. I don't have much experience with makefiles. I can't seem to get the makefile to build no matter where I insert the snippets from the Usage guide. Can you give me some guidance here? Here is my makefile:
TOP_LOCAL_PATH := $(call my-dir)
include $(TOP_LOCAL_PATH)/fmod/Android.mk
LOCAL_PATH := $(TOP_LOCAL_PATH)
include $(CLEAR_VARS)
LOCAL_CPP_EXTENSION := .cpp
LOCAL_MODULE := xxx
include $(LOCAL_PATH)/Sources.mk
LOCAL_USER_CFLAGS := -DTARGET_NDK -DOPENGLES -DUSE_LEVEL_LIST
ifeq ($(NDK_DEBUG),1)
LOCAL_USER_CFLAGS += -D_DEBUG
else
LOCAL_USER_CFLAGS += -DNDEBUG -Os -funroll-loops
endif
ifeq ($(NDK_LITE),1)
LOCAL_USER_CFLAGS += -D_LITE
endif
LOCAL_CFLAGS := -Wall $(LOCAL_USER_CFLAGS) $(LOCAL_INCLUDE_CFLAGS)
LOCAL_ARM_MODE := arm
LOCAL_LDLIBS := -llog -lGLESv1_CM -lz #-lthread_db
LOCAL_SHARED_LIBRARIES := fmod-prebuilt
include $(BUILD_SHARED_LIBRARY)
Is it possible to know how much CPU each thread in Android native code took. in short Android native code thread profiling
ReplyDeleteTo use Android native pofilier. I used the profiling library while compiling .so
ReplyDeleteand also used monstartup and mon cleaup in the source file sused to make .so.
I am able to get first three parameter in Flat pofile but rest of the parameters are zero. May I know is there as way to ge that information as well
output i m getting is
ReplyDelete% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 43066 0.00 0.00 std::less::operator()(std::string const&, std::string const&) const
Followed your instruction. But the output is weird. Why all the values are 0.00?
Is there something that i missed. Thanks in advance.
I got similar problem, no time data but only call, I used this library in Android 4.1 emulator and manually compiled it.
Delete