Friday, October 26, 2007

Murphy's Law

Anything that can go wrong, will.

Dave Murphy released version r21 of the popular homebrew tool kit devkitArm a few days ago, together with a new version of libnds. It introduces a couple of changes that make quite a difference. They are minor from the end developer's point of view, but I'm sure they took many hours of work to include.

One nice change is the addition of real time clock support in libnds. This means that any files created on your flash card have the correct time stamp, for example. Previously, I had to use my own code to initialise the clock, keep it refreshed and so on. Any change that means less of my code and more use of libraries is good.

The second noticeable change is support for Windows Vista. Meh. A nice side effect of this is that the problem with ndstool in Ubuntu 7.10 has also been fixed. The previous ndstool in r20 gave an "error 127" after upgrading to the new Gutsy release.

The release also upgraded gcc from 4.1.1 to 4.1.2. A minor version change, so the impact on my code base would be minimal, right? Wrong! Never underestimate the unpredictability of your own code. Especially if it is C++. Prepare to cry when you realise that the capricious compiler will do what it likes - your input code is just a suggestion!

My Bunjalloo code base has lots of test cases - unit tests that run on the PC to check everything is as expected, rendering tests that have to be inspected manually and also some minor library tests that just check some small functionality works. The first compile after upgrading to r21 resulted in the minor tests failing on the desmume emulator. One managed to get it to seg fault. Fair enough, emulators are not the real thing and the code nearly ran on hardware. I say nearly because it randomly froze.

I tried compiling without optimisation. Nothing, same result. I was stuck here. The code ran perfectly in r20. What could have changed that would cause it to fail so miserably? Step up devkitArm's maintainer, wintermute, with a great suggestion: add the exception handler and see what happens.

Aha! A result. The handler gave a Guru Meditation error. Kudos for the Amiga reference here. This shows the memory address of the code that was running at the time, the registers' state and a stack trace. I used gdb to find where the code actually was from the PC value. Setting a breakpoint at a memory address also tells you the line of code if you have compiled with the -g flag. There's also a tool for doing this provided with devkitArm but I didn't know that at the time.

Anyway, the errors all pointed to this horrible bit of code in my own libndspp:

    // clear vram
    unsigned int * vram = (unsigned int*)VRAM_CR;
    for (int i = 0; i < 0x3100/8; ++i) {
        *vram++ = 0;
        *vram++ = 0;
        *vram++ = 0;
        *vram++ = 0;
    }

The register VRAM_CR is defined as follows:

    #define VRAM_CR (*(vuint32*)0x04000240)

Great - the macro already dereferences the register, so I was casting the register's value to a pointer, throwing away the volatileness while I was at it. How had this ever worked? After "fixing" the code (the only possible fix was to delete this mess) the tests ran once again. Success!
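
For comparison, here is a rough sketch of what a well-behaved clear loop could look like. It is not the code from libndspp (that code was simply deleted). The names fakeVram, FAKE_VRAM and clearWords are made up for illustration, and the array stands in for real video memory so the sketch can run on a PC. The point is that the thing being cleared is an address, not the contents of a register, and that the pointer stays volatile all the way through.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef volatile std::uint32_t vuint32;

// Hypothetical stand-in for a block of video memory, so this runs on a PC.
static std::uint32_t fakeVram[64];

// Pointer-style definition: this names an address, not the value stored there.
static vuint32* const FAKE_VRAM = fakeVram;

// Zero `words` 32-bit words starting at dst. The pointer keeps its volatile
// qualifier, so the compiler may not drop or merge the stores.
inline void clearWords(vuint32* dst, std::size_t words)
{
    assert(dst != nullptr);
    for (std::size_t i = 0; i < words; ++i) {
        dst[i] = 0;
    }
}
```

Calling clearWords(FAKE_VRAM, 64) zeroes the whole stand-in buffer. On real hardware the same function would take the address of the memory to clear, assuming that address is what you actually meant to write to.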

Only not quite. Bunjalloo still didn't run. It has never worked on emulators, but now it only showed a white screen on hardware too. To cut a long story short, I spent many hours patiently commenting out code, removing dependencies and not linking certain modules until I at last discovered the culprit. It was this line:

    static const int MAX_SIZE(nds::Canvas::instance().width()-7);

This was buried deep within the file of a class that was used near the start of the program. Supposedly the singleton pattern solves the problem of who instantiates an object first: it gets instantiated by whoever calls it first. The Canvas class here sets up the video and VRAM on the DS. It seems that for some reason calling this code too soon results in a white screen of death, and now in r21 it is called earlier than in r20, i.e. the static data is created earlier. Who knows.
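
One common way to defuse this kind of trap is to wrap the constant in a function, so the singleton is only touched on first use rather than during static initialisation. The sketch below is an assumption about how that could look, not the actual Bunjalloo fix. The Canvas class here is a hypothetical stand-in for nds::Canvas, the width of 256 is assumed, and maxSize and created are names invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for nds::Canvas; the real class sets up video and VRAM.
class Canvas {
public:
    static Canvas& instance()
    {
        static Canvas canvas;   // constructed on the first call only
        return canvas;
    }
    int width() const { return 256; }   // assumed width, for illustration
    static int created() { return s_created; }

private:
    Canvas() { ++s_created; }
    static int s_created;       // counts constructions so the timing is visible
};

int Canvas::s_created = 0;

// Replaces a namespace-scope `static const int MAX_SIZE(...)`.
// Nothing touches Canvas until somebody actually asks for the value.
inline int maxSize()
{
    static const int value = Canvas::instance().width() - 7;
    assert(value > 0);
    return value;
}
```

With this arrangement Canvas::created() stays at 0 until the first call to maxSize(), so the program decides when the video setup happens instead of leaving it to whatever order the compiler and linker pick for static data.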

So there we have it - 2 strange bits of code, easily fixed once found, but which caused inexplicable bugs when they were still unknown. I dread to think what other booby traps lie in wait. The fact that code compiles, even runs, doesn't mean it will work when you change compilers, even minor version changes. There's just too much that can go wrong!

Now to figure out why the jpegdecoder library only decodes until it fills the buffer once, then inexplicably fails to decode any more...

2 comments:

  1. How exactly do you debug with gdb? I've been looking for something online that explains it but haven't found anything.

  2. There are some clues here:

    http://code.google.com/p/quirkysoft/wiki/DebuggingNotes

