Written by Alex de Sousa
Some bugs pull you to the abyss and leave you there. They're the Balrog to our Gandalf. Once you beat them though, you're wiser and more powerful!
I was building a small plugin in C++ for an MT4 Server (ForEx trading server). The output of the project was a Windows DLL. Using the server's protocol, I managed to get JSON strings and parse them with RapidJSON. Everything ran smoothly in my virtual machine and in some development servers. Even Valgrind couldn't find memory leaks. I thought the plugin was ready for production...
Oh boy, was I wrong!
Note: RapidJSON is an amazing library, and if I ever need to parse JSON in C/C++ again, I will use it without hesitation.
Once I deployed the plugin, everything seemed fine... until the next day! A nasty segmentation fault killed the server. The plugin made us lose some money and I had to roll back the deploy.
After several days of testing, I realized the production server always died with the same set of data. I was able to pinpoint the error to RapidJSON. Something weird was happening when the memory was allocated, but none of the tools I was using to debug this were reporting any problems.
I was desperate, so I compiled the DLL with debug symbols and then disassembled it using OllyDbg.
I started reading the DLL assembly code ... for a week and a half! Reading assembly was horrible. I considered switching careers. But then I got to the instruction that failed! Eureka! I couldn't believe it! It felt good to finally understand the bug!
The problem was that RapidJSON's custom allocator:
The production machine architecture:
It's easier to see with an image:
If RapidJSON needed to store an integer, a string and a boolean value, it could fail seemingly at random depending on the length of the string. For example, given the integer 42 and the boolean true:
With the string "Hey", it would succeed.
With the string "Hi", it would fail.
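One way a pool allocator can behave like this is by handing out addresses that are not aligned to what the CPU expects. Assuming that is the mechanism here (my assumption, not something RapidJSON's code is shown doing), the toy sketch below packs allocations back to back and reports the offset where the value after the string would land. `BumpPool`, `kWord` and `offset_after_string` are hypothetical names for illustration only, and the 4-byte alignment requirement is also assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed alignment requirement of the target machine, in bytes.
constexpr std::size_t kWord = 4;

// Hypothetical toy pool: reserves consecutive byte ranges and returns
// their starting offsets. It tracks offsets only; no memory is touched.
class BumpPool {
public:
    explicit BumpPool(bool round_up) : round_up_(round_up), next_(0) {}

    std::size_t allocate(std::size_t size) {
        if (round_up_) {
            // Round the next free offset up to a multiple of kWord.
            next_ = (next_ + kWord - 1) / kWord * kWord;
        }
        const std::size_t start = next_;
        next_ += size;
        return start;
    }

private:
    bool round_up_;
    std::size_t next_;
};

// Offset at which a 4-byte value placed after the string would start,
// given an integer stored first and then the NUL-terminated string.
std::size_t offset_after_string(const char* text, bool round_up) {
    BumpPool pool(round_up);
    pool.allocate(sizeof(std::int32_t));    // the integer 42
    pool.allocate(std::strlen(text) + 1);   // the string plus its NUL byte
    return pool.allocate(sizeof(std::int32_t));
}
```

Without rounding, `offset_after_string("Hey", false)` is 8, a multiple of 4, while `offset_after_string("Hi", false)` is 7, which a strict-alignment machine may refuse to access as a 4-byte value. With rounding enabled, both strings give 8. A general-purpose allocator such as `malloc` returns suitably aligned blocks, which is one reason falling back to it can make this class of bug disappear.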
A debugging nightmare!
I just made RapidJSON use the machine's allocator instead of the custom one. I spent two weeks debugging and ended up changing a single word in the code!
Now I avoid C/C++ at all costs!
I hope whatever bug you're dealing with at the moment gets solved soon!