The Mathenaeum – A story about testing and use cases

The importance of unit testing to the software development process is by now well established. Advantages include: a) demonstrating the functionality of code units, b) highlighting any unwanted side-effects caused by new changes, c) a B. F. Skinner-esque positive feedback system reflecting the progress and success of one’s development work. Most importantly, perhaps, watching code fail to perform as desired gives visibility into each successive point of failure and serves to motivate the development process. In general, you can’t fix the bugs that you can’t see, and the importance of baking QA into the development workflow cannot be overstated. Unit testing, regression testing and continuous integration are essential parts of the software development process at Three Byte.

The Mathenaeum exhibit, built for the Museum of Mathematics that opened this past December, is a highly optimized, multi-threaded piece of 3D graphics software written in C++ with OpenGL and the Cinder framework. The algorithms for applying a wide range of geometric manipulations to complex objects across multiple threads were challenging, but for me the most challenging and edifying part of this project was the problem of hardware integration and effective testing. More specifically, working on the Mathenaeum taught me about the difficulties associated with effective testing and the creativity it requires.

Unlike some software deadlines, MoMath was going to open to the general public on December 15 whether we were ready for it or not. At Three Byte we were balancing the pressure of getting our product ready to deliver against the knowledge that long nights and stressful bouts of overtime can introduce more bugs than they fix. Just before opening day, functionality on the Mathenaeum was complete. And we delivered…and the museum opened…and things looked fine…but every so often it would freeze. The freezes were infrequent, and most visitors had a successful experience. The show control software that we wrote also made it trivial for the MoMath floor staff to restart a frozen exhibit from a smartphone. But even an infrequent crash means a frustrated user and a failed exhibit experience, which was devastating to me.

Visitors at work in the Mathenaeum

Effectively testing the Mathenaeum was a challenge. The first issue I solved was a slow leak of OpenGL display lists that weren’t being disposed of properly. This leak was aggravated by a bug in the communications protocol we had set up with a set of five LCD screens embedded in the Mathenaeum control deck. To set the screen state for the Arduinos we were creating and opening Windows Sockets 2 objects (SOCKET) but failing to close them. This meant we were leaking object handles and fragmenting memory, which caused the leaking Mathenaeum to crash after using only 100 MB of memory.
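A leak like the unclosed sockets described above is the kind of mistake that scope-bound cleanup (RAII) guards against. The following is only a minimal sketch, not the Mathenaeum code: openFakeHandle() and closeFakeHandle() are hypothetical stand-ins for a real handle API such as socket()/closesocket(), and they merely count open handles so the example runs on any platform.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for an OS handle API (e.g. socket()/closesocket()).
// They only count open handles so the sketch runs anywhere.
static int g_openHandles = 0;
constexpr int kInvalidHandle = -1;
inline int openFakeHandle() { return ++g_openHandles; }
inline void closeFakeHandle(int /*handle*/) { --g_openHandles; }

// Owns exactly one handle and closes it when the owner goes out of scope.
class ScopedHandle {
public:
    ScopedHandle() : handle_(openFakeHandle()) {}
    ~ScopedHandle() {
        if (handle_ != kInvalidHandle) closeFakeHandle(handle_);
    }

    // Copying would lead to a double close, so only moves are allowed.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
    ScopedHandle(ScopedHandle&& other) noexcept : handle_(other.handle_) {
        other.handle_ = kInvalidHandle;
    }

    int get() const { return handle_; }

private:
    int handle_;
};

// Simulates one screen update: open, use, and close automatically on return.
inline void sendScreenState() {
    ScopedHandle connection;
    assert(connection.get() != kInvalidHandle);
    // A real version would write the screen state to the connection here.
}
```

With a wrapper like this, every return path out of sendScreenState() releases the handle, so repeated screen updates cannot accumulate open handles.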

Visual Leak Detector for C++ was helpful in finding leaks, but in the end tracking the correlation between memory consumption in the task manager and various operations was sufficient for localizing all memory leaks. Despite plugging all the memory leaks, the sporadic crash/freeze persisted, and no matter what I tried I could not reproduce the bug on my development machine. Visibility into this issue was basically zero.

Everyone knows that a developer cannot be an effective tester of his or her own software. Therefore, when trying to reproduce the Mathenaeum crash I would try to inhabit the psyche of a person who had never seen this software before and was feeling their way around for the first time. Everyone at Three Byte tried to reproduce this bug, but to no avail. So, I started spending time at MoMath observing the interactions that happened there. Lots of adults and kids took the time to build stunning creations in 3D and took the care to stylize every vertex and face with artistic precision. Some people were motivated by the novelty of the physical interface or the excitement of experimenting with the various geometric manipulations, and others seemed motivated by a desire to create a stunning piece of visual art to share with the world in a digital gallery. In addition, the most popular creations were printed by a nearby 3D printer and put on display for all to see. I saw a mother stand by in awe as her eleven-year-old son learned to navigate the software and spent hours building an amazing creation. Watching people engaged in my exhibit inspired me in a way I had never felt before and made me extremely proud to be a software developer.

However, I also saw a second type of interaction which was equally interesting. MoMath hosts a lot of school trips and it’s not uncommon for the museum floor to be “overrun” by hundreds of girls and boys under the age of eight. For these kids, the Mathenaeum is an amazingly dynamic contraption. The trackball (an undrilled bowling ball) can be made to spin at great speeds, the gearshift is a noisemaker when banged from side to side, and the throttle generates exciting visual feedback when jammed in both directions. For this particular use case the Mathenaeum is being used to its fullest when two kids are spinning the trackball as fast as possible while two others work the gearshift and throttle with breakneck force. It soon became clear to me that the Mathenaeum was failing because it was never tested against this second use case.

The first step in stress testing the Mathenaeum was making sure that my development machine used the same threading context as the production machines. Concretely, the Mathenaeum explicitly spawns four distinct threads: a) a render-loop thread, b) a trackball polling thread, c) an input polling thread, d) a local visitor/RFID tag polling thread. The physical interface on my development machine, being different from the trackball, gearshift and throttle on the deployment machines, was using only one thread for trackball and input polling (both emulated with the mouse). Replicating the deployment environment meant enforcing a threading context which was consistent in both places. In retrospect, this change was obvious and easy to implement, but I hadn’t yet realized the importance of automated stress testing.

My observations at the museum inspired the construction of a new module called fakePoll(), which would be responsible for injecting method calls into the two input polling threads as fast as my 3.20 GHz Intel Xeon processor would allow. This overload of redundant calls (similar perhaps to a team of second graders) works both input threads simultaneously, triggering all types of operations (and combinations thereof) and navigating the Mathenaeum state machine graph at great speeds. In short, fakePoll() made it possible to easily test every corner of Mathenaeum functionality and all the locks, mutexes and race conditions that could be reached. Unsurprisingly, I was now able to crash the Mathenaeum in a fraction of a second – a veritable triumph!
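The source doesn’t show fakePoll() itself, so the following is only a rough sketch of the idea: two threads firing input events at a shared object as fast as they can. ToyExhibit, its methods, and the choice of events are all hypothetical; the real exhibit has far more state and far more ways to go wrong.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <random>
#include <thread>

constexpr int kStateCount = 4;

// Hypothetical stand-in for the exhibit: a tiny state machine guarded by a mutex.
class ToyExhibit {
public:
    void onTrackball(int delta) {
        std::lock_guard<std::mutex> lock(mutex_);
        rotation_ += delta;
        ++eventsHandled_;
    }
    void onButton() {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = (state_ + 1) % kStateCount;
        ++eventsHandled_;
    }
    long eventsHandled() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return eventsHandled_;
    }
    int state() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

private:
    mutable std::mutex mutex_;
    int state_ = 0;
    long rotation_ = 0;
    long eventsHandled_ = 0;
};

// Fires eventsPerThread events from each of two threads, with no pauses,
// and returns the total number of events injected.
inline long fakePoll(ToyExhibit& exhibit, int eventsPerThread) {
    std::atomic<long> injected{0};

    std::thread trackballThread([&] {
        std::mt19937 rng(1);  // fixed seed keeps a failing run repeatable
        std::uniform_int_distribution<int> delta(-100, 100);
        for (int i = 0; i < eventsPerThread; ++i) {
            exhibit.onTrackball(delta(rng));
            ++injected;
        }
    });
    std::thread inputThread([&] {
        for (int i = 0; i < eventsPerThread; ++i) {
            exhibit.onButton();
            ++injected;
        }
    });

    trackballThread.join();
    inputThread.join();
    return injected.load();
}
```

If the locks in ToyExhibit were removed, a run like fakePoll(exhibit, 50000) would be expected to lose updates sooner or later, which is the kind of failure this style of test is meant to surface quickly.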

Given a failing test I had new visibility into the points of failure and I started uncovering threading problem after threading problem. Numerous deadlocks, inconsistent states, rendering routines that weren’t thread safe, and more. With every fix, I was able to prolong the load test – first to two fractions of a second, then to a few seconds, then to a minute then a few minutes. Seeing all the threading mistakes I had missed was a little disheartening but an important learning experience. Injecting other operations into other threads such as an idle timeout to the attract screen and various visitor identification conditions exposed further bugs.

memoryCorruption.jpg

In a single-threaded environment a heap corruption bug can be difficult to fix; however, by peppering your code with _ASSERTE(_CrtCheckMemory()); it’s possible to do a binary search over your source code and home in on the fault. In a multithreaded application, solving this problem is like finding a needle in a haystack.
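As a rough illustration of the checkpoint technique in the single-threaded case, here is a small sketch. _ASSERTE and _CrtCheckMemory come from the MSVC debug runtime (<crtdbg.h>) and only do work in debug builds; the CHECK_HEAP macro and the buildAndSum() function are hypothetical names introduced for this example, and on other compilers the macro compiles to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
// Walks the debug heap and asserts if any block has been corrupted.
#define CHECK_HEAP() _ASSERTE(_CrtCheckMemory())
#else
// Assumption: outside MSVC debug builds the checkpoint is a no-op.
#define CHECK_HEAP() ((void)0)
#endif

// Hypothetical routine with heap checkpoints between its steps.
inline int buildAndSum(int n) {
    assert(n >= 0);
    CHECK_HEAP();  // checkpoint 1: heap is intact on entry

    std::vector<int> values(static_cast<std::size_t>(n));
    std::iota(values.begin(), values.end(), 1);  // fills 1, 2, ..., n
    CHECK_HEAP();  // checkpoint 2: if this fires, the fault lies between 1 and 2

    int total = std::accumulate(values.begin(), values.end(), 0);
    CHECK_HEAP();  // checkpoint 3: if this fires, the fault lies between 2 and 3
    return total;
}
```

The first checkpoint that fires bounds the corruption to the code between it and the previous checkpoint; adding more checkpoints inside that range narrows the search, much like a binary search.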

After spending hours poring over the most meticulous and painstaking logs I ever produced, I finally found an unsafe state transition in the StylizeEdges::handleButton() method. This bug, the least reproducible and most elusive of all solved Mathenaeum bugs, exposed a weakness in the basic architectural choice on which the whole Mathenaeum was built.

The state machine pattern is characterized by a collection of states, each deriving from a single base class, where each state is uniquely responsible for determining a) how to handle user input in that state, b) what states can be reached next, c) what to show on screen. The state machine design pattern is great because it enforces an architecture built on components which are modular and connected in an extensible network. In the state machine architecture, no individual component is aware of a global topology of states, and states can be added or removed without any side-effects or cascade of changes. In the Mathenaeum, the specific set of operations and manipulations that a user can perform with the gearshift, button and throttle depends on where that person stands within the network of available state machine states.

When a user navigates to the stylizeEdges state in the state machine, they are able to set the diameter of their selected edges and then change the color of these edges. After setting the color of the edges, we navigate them to the main menu state with the call:

_machine->setState(new MainMenuState(_machine));

The setState() method is responsible for deleting the current state and replacing it with a newly created state. At some point, I realized that if the user sets all selected edges to have diameter zero, effectively making these edges invisible, it doesn’t make sense to let the user set the color of these edges. Therefore, before letting the user set the edge color I added a check to see if the edges under inspection had any diameter. If the edges had no diameter, the user would be taken directly to the main menu state without being prompted to set an edge color.

This change set introduced a catastrophic bug. Now, the call to _machine->setState() could delete the stylizeEdges state before the handleButton() method had exited. In other words, the stylizeEdges state commits premature suicide (by deleting itself), resulting in memory corruption and an eventual crash. To fix the bug, I just had to ensure that the handleButton() method would return as soon as the _machine->setState() method was called.
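The hazard and the fix can be sketched in a few lines. This is a simplified reconstruction under stated assumptions, not the actual Mathenaeum source: only setState(), handleButton(), MainMenuState and _machine appear in the text above, while the Machine class body, pressButton(), currentName() and the member fields are hypothetical.

```cpp
#include <cassert>
#include <string>

class Machine;

// Base class: each state decides how to handle input and what comes next.
class State {
public:
    explicit State(Machine* machine) : _machine(machine) {}
    virtual ~State() = default;
    virtual void handleButton() = 0;
    virtual std::string name() const = 0;

protected:
    Machine* _machine;
};

class Machine {
public:
    Machine() = default;
    Machine(const Machine&) = delete;
    Machine& operator=(const Machine&) = delete;
    ~Machine() { delete _current; }

    // Deletes the current state and replaces it with the new one.
    void setState(State* next) {
        delete _current;
        _current = next;
    }
    void pressButton() {
        if (_current) _current->handleButton();
    }
    std::string currentName() const { return _current ? _current->name() : ""; }

private:
    State* _current = nullptr;
};

class MainMenuState : public State {
public:
    using State::State;
    void handleButton() override {}
    std::string name() const override { return "MainMenu"; }
};

class StylizeEdgesState : public State {
public:
    StylizeEdgesState(Machine* machine, double diameter)
        : State(machine), _diameter(diameter) {}

    void handleButton() override {
        if (_diameter == 0.0) {
            // Invisible edges: skip the color step and go to the main menu.
            _machine->setState(new MainMenuState(_machine));
            // This object has just been deleted by setState(). Touching any
            // member past this point is undefined behavior, so return now.
            return;
        }
        _awaitingColor = true;  // safe: this state is still alive here
    }
    std::string name() const override { return "StylizeEdges"; }

private:
    double _diameter;
    bool _awaitingColor = false;
};
```

Without the early return, the write to _awaitingColor would land in freed memory, which matches the kind of silent heap corruption described above. Another option, not taken from the source, would be to have setState() queue the new state and swap it in only after the current handler has returned.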

Now my load test wasn’t failing and I was able to watch colors and shapes spinning and morphing on screen at incredible speeds for a full hour. I triumphantly pushed my changes to the exhibit on site and announced to the office: “the Mathenaeum software is now perfect.” Of course it wasn’t. After about five hours of load testing the Mathenaeum still crashes and I have my eye out for the cause, but I don’t think this bug will reproduce on site anytime soon so it’s low priority.

Some Mathenaeum creations:

Amichai