Some bugs pull you into the abyss and leave you there. They're the Balrog to our Gandalf. Once you beat them, though, you're wiser and more powerful!
I was building a small plugin in C++ for an MT4 Server (ForEx trading server). The output of the project was a Windows DLL. Using the server's protocol, I managed to get JSON strings and parse them with RapidJSON. Everything ran smoothly in my virtual machine and on some development servers. Even Valgrind couldn't find memory leaks. I thought the plugin was ready for production...
Oh boy, was I wrong!
Note: RapidJSON is an amazing library, and if I needed to parse JSON in C/C++ again, I would use it without hesitation.
Debugging The Problem
Once I deployed the plugin, everything seemed fine... until the next day! A nasty segmentation fault killed the server. The plugin made us lose some money and I had to roll back the deployment.
After several days of testing, I realized the production server always died with the same set of data. I was able to trace the error to RapidJSON. Something weird was happening when the memory was allocated, but none of the debugging tools I was using reported any problems.
I was desperate, so I compiled the DLL with debug symbols and then disassembled it using OllyDbg.
I started reading the DLL's assembly code... for a week and a half! Reading assembly was horrible. I considered switching careers. But then I got to the instruction that failed! Eureka! I couldn't believe it! It felt good to finally understand the bug!
The problem was that RapidJSON's custom allocator:
- Packed the data tightly in memory.
- Allocated only what it needed.
The production machine architecture:
- Allocated the memory RapidJSON asked for.
- Ignored the way RapidJSON wanted the data to be structured.
It's easier to see with an example:
If RapidJSON needed to store an integer, a string and a boolean value, then it could fail seemingly at random depending on the length of the string. E.g. given the integer 42 and a boolean:
- For the string "Hey", it would succeed.
- For the string "Hi", it would fail.
A debugging nightmare!
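One plausible reading of the "Hey" vs. "Hi" behaviour is an alignment problem, and the sketch below works through the arithmetic under that assumption. It is not RapidJSON code: the helper names, the 4-byte integer, and the 4-byte alignment requirement are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of RapidJSON): offset at which the value
// following a tightly packed, NUL-terminated string would start.
inline std::size_t offset_after_string(std::size_t string_offset, const char* text) {
    return string_offset + std::strlen(text) + 1;  // +1 for the NUL terminator
}

// True when `offset` is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset & (alignment - 1)) == 0;
}

// Assumed scenario: a 4-byte integer sits at offset 0, the string is packed
// right after it, and the next value needs 4-byte alignment.
inline bool next_value_is_aligned(const char* text) {
    const std::size_t kWord = 4;              // assumed word size
    const std::size_t string_offset = kWord;  // string starts right after the integer
    return is_aligned(offset_after_string(string_offset, text), kWord);
}
```

Under these assumptions, "Hey" takes 4 bytes with its terminator, so the next value starts at offset 8 and `next_value_is_aligned("Hey")` is true. "Hi" takes only 3 bytes, the next value starts at offset 7, and `next_value_is_aligned("Hi")` is false.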
The fix? I just made RapidJSON use the machine's allocator instead of the custom one. I spent two weeks debugging and ended up changing a single word in the code!
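A one-word fix like this is possible when the allocator is a template parameter. RapidJSON ships a pool allocator (`MemoryPoolAllocator`) and a malloc-backed one (`CrtAllocator`), but whether those were the exact names swapped here is an assumption. The sketch below uses two hypothetical toy allocators to show the idea. `PackedPoolAllocator` is deliberately naive and is not how RapidJSON's pool is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical toy pool: hands out consecutive bytes with no padding.
class PackedPoolAllocator {
public:
    void* allocate(std::size_t size) {
        assert(used_ + size <= sizeof(buffer_));
        void* p = buffer_ + used_;
        used_ += size;
        return p;
    }
    void release(void*) {}  // pool memory is reclaimed all at once

private:
    alignas(std::max_align_t) unsigned char buffer_[256];
    std::size_t used_ = 0;
};

// Hypothetical thin wrapper over the C runtime allocator. malloc returns
// memory suitably aligned for any fundamental type that fits in the block.
class SystemAllocator {
public:
    void* allocate(std::size_t size) { return std::malloc(size); }
    void release(void* p) { std::free(p); }
};

// Allocates a 3-byte string slot ("Hi" plus NUL), then room for an int,
// and reports whether the int's address satisfies alignof(int).
template <typename Allocator>
bool int_after_short_string_is_aligned() {
    Allocator allocator;
    void* text = allocator.allocate(3);
    void* number = allocator.allocate(sizeof(int));
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(number) % alignof(int) == 0;
    allocator.release(number);
    allocator.release(text);
    return aligned;
}
```

The only difference between the failing and the working variant is the template argument: `int_after_short_string_is_aligned<PackedPoolAllocator>()` versus `int_after_short_string_is_aligned<SystemAllocator>()`. The code only inspects addresses and never reads through a misaligned pointer.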
Now I avoid C/C++ at all costs!
I hope whatever bug you're dealing with at the moment gets solved soon!