The Mathenaeum – A story about testing and use cases

The importance of unit testing to the software development process is by now well established. Advantages include: a) demonstrating the functionality of code units, b) highlighting any unwanted side-effects caused by new changes, c) a B. F. Skinner-esque positive feedback system reflecting the progress and success of one’s development work. Most importantly perhaps, developing code that fails to perform as desired gives visibility into each successive point of failure and serves to motivate the development process. In general, you can’t fix the bugs that you can’t see and the importance of baking QA into the development workflow cannot be overstated. Unit testing, regression testing and continuous integration are an essential part of the software development process at Three Byte.

The Mathenaeum exhibit, built for the Museum of Mathematics that opened this past December, is a highly optimized, multi-threaded piece of 3D graphics software written in C++ with OpenGL and the Cinder framework. The algorithms employed for manipulating complex objects across a wide range of geometric manipulations and across multiple threads were challenging, but for me the most challenging and edifying part of this project was the problem of hardware integration and effective testing. More specifically, working on the Mathenaeum taught me about the difficulties associated with and the creativity required for effective testing.

Unlike some software deadlines, MoMath was going to open to the general public on December 15 whether we were ready for it or not. At Three Byte we were balancing the pressure of getting our product ready to deliver and the knowledge that long nights and stressful bouts of overtime can introduce more bugs than they fix. Just before opening day functionality on the Mathenaeum was complete. And we delivered…and the museum opened…and things looked fine…but every so often it would freeze. The freezes were infrequent and most visitors had a successful experience and show control software that we wrote made it trivial for the MoMath floor staff to restart a frozen exhibit from a smart-phone, but even an infrequent crash means a frustrated user and a failed exhibit experience which was devastating to me.

Visitors at work in the Mathenaeum

Effectively testing the Mathenaeum was a challenge. The first issue I solved was a slow leak of openGL display lists that weren’t being disposed of properly. This leak was aggravated by a bug in the communications protocol we had setup with a set of five LCD screens embedded in the Mathenaeum control deck. To set the screen state for the arduinos we were creating and opening Windows Socket 2 objects (SOCKET) but failing to close them. This meant we were leaking object handles and causing memory fragmentation causing the leaking Mathenaeum to crash after using only 100 MB of memory.

Visual Leak Detector for C++ was helpful in finding leaks, but in the end tracking the correlation between memory consumption in the task manager and various operations was sufficient for localizing all memory leaks. Despite plugging up all memory leaks the sporadic crash/freeze persisted and no matter what I tried and I could not reproduce the bug on my development machine. Visibility into this issue was basically zero.

Everyone knows that a developer cannot be an effective tester of his or her own software. Therefore, when trying to reproduce the Mathenaeum crash I would try to inhabit the psyche of a person who never saw this software before and is feeling their way around for the first time. Everyone at Three Byte tried to reproduce this bug but to no avail. So, I started spending time at MoMath observing the interactions that happened there. Lots of adults and kids took the time to build stunning creations in 3D and took the care to stylize every vertex and face with artistic precision. Some people were motivated by the novelty of the physical interface, the excitement in experimenting with the various geometric manipulations, and others seemed motivated by a desire to create a stunning piece of visual art to share with the world on a digital gallery. In addition, the most popular creations were printed by a nearby 3D printed and put on display for all to see. I saw a mother stand by in awe as her eleven year-old son learned to navigate the software and spent hours building an amazing creation. Watching people engaged in my exhibit inspired me in a way I never felt before and made me extremely proud to be a software developer.

However, I also saw a second type of interaction which was equally interesting. MoMath hosts a lot of school trips and it’s not uncommon for the museum floor to be “overrun” by hundreds of girls and boys under the age of eight. For these kids, the Mathenaeum is an amazingly dynamic contraption. The trackball (an undrilled bowling ball) can be made to spin at great speeds, the gearshift is a noise maker when banged from side to side and throttle generates exciting visual feedback when jammed in both directions. For this particular use case the Mathenaeum is being used to its fullest when two kids are spinning the trackball as fast as possible while two others work the gear shift and throttle with breakneck force. It soon became clear to me that the Mathenaeum was failing because it was never tested against this second use case.

The first step in stress testing the Mathenaeum, was making sure that my development machine used the same threading context as the production machines. Concretely, the Mathenaeum explicitly spawns four distinct threads: a) a render-loop thread, b) a trackball polling thread, c) an input polling thread, d) a local visitor/RFID tag polling thread. The physical interface on my development machine, being different from the trackball, gearshift and throttle on the deployment machines, was using only one thread for trackball and input polling (both emulated with the mouse). Replicating the deployment environment meant enforcing a threading context which was consistent in both places. In retrospect, this change was obvious and easy to implement, but I hadn’t yet realized the importance of automated stress testing.

My observations at the museum inspired the construction of a new module called fakePoll() which would be responsible for injecting method calls into the two input polling threads as fast as my 3.20 GHz Inter Xeon processor will allow. This overload of redundant calls, (similar perhaps to a team of second graders) works both input threads simultaneously, while causing all types of operations (and combinations thereof) and navigating the Mathenaeum state machine graph at great speeds. In short, fakePoll() made it possible to easily test every corner of Matheaneaum functionality and all the locks and mutexes and race conditions that could be achieved. Unsurprisingly, I was now able to crash the Mathenaeum in a fraction of a second – a veritable triumph!

Given a failing test I had new visibility into the points of failure and I started uncovering threading problem after threading problem. Numerous deadlocks, inconsistent states, rendering routines that weren’t thread safe, and more. With every fix, I was able to prolong the load test – first to two fractions of a second, then to a few seconds, then to a minute then a few minutes. Seeing all the threading mistakes I had missed was a little disheartening but an important learning experience. Injecting other operations into other threads such as an idle timeout to the attract screen and various visitor identification conditions exposed further bugs.


In a single threaded environment a heap corruption bug can be difficult to fix, however by peppering your code with: _ASSERTE(_CrtCheckMemory()); it’s possible to do a binary search over your source code and home in on the fault. In a multithreaded application solving this problem is like finding a needle in a haystack.

After spending hours poring over the most meticulous and painstaking logs I ever produced I finally found an unsafe state transition in the StylizeEdges::handleButton() method. This bug – the least reproducible and most elusive of all solved Mathenaeum bugs, exposed a weakness in the basic architectural choice on which the whole Mathenaeum was built.

The state machine pattern is characterized by a collection of a states, each deriving from a single base class, where each state is uniquely responsible for determining a) how to handle user input in that state, b) what states can be reached next, c) what to show on screen. The state machine design pattern is great because it enforces an architecture built on components which are modular and connected in an extensible network. In the state machine architecture, no individual component is aware of a global topology of states and states can be added or removed without any side-effects or cascade of changes. In the Mathenaeum, the specific set of operations and manipulations that a user can implement with the gearshift, button and throttle, depends on where that person stands within the network of available state machine states.

When a user navigates to the stylizeEdges state in the state machine, they are able to set the diameter of their selected edges and then change the color of these edges. After setting the color of the edges, we navigate them to the main menu state with the call:

_machine->setState(new MainMenuState(_machine));

The setState() method is responsible for deleting the current state and replacing it with a newly created state. At some point, I realized that if the user sets all selected edges to have diameter zero, effectively making these edges invisible, it doesn’t make sense to let the user set the color of these edges. Therefore, before letting the user set the edge color I added a check to see if the edges under inspection had any diameter. If the edges had no diameter, the user would be taken directly to the main menu state without being prompted to set an edge color.

This change set introduced a catastrophic bug. Now, the _machine->setState() could delete the stylizeEdges state before having exited the handleButton method(). In other words, the stylizeEdges state commits premature suicide (by deleting itself) resulting in memory corruption and an eventual crash. To fix the bug, I just had to insure that the handleButton() method would complete as soon as the _machine->setState() method was called.

Now my load test wasn’t failing and I was able to watch colors and shapes spinning and morphing on screen at incredible speeds for a full hour. I triumphantly pushed my changes to the exhibit on site and announced to the office: “the Mathenaeum software is now perfect.” Of course it wasn’t. After about five hours of load testing the Mathenaeum still crashes and I have my eye out for the cause, but I don’t think this bug will reproduce on site anytime soon so it’s low priority.

Some Mathenaeum creations:


PrePix Virtualization Software

I cannot speculate as to the implications for their sync mechanism, except to observe that the rasters of the three screen segments visible in this video seem to be in good sync, though the frames themselves do not.

A PrePix Screenshot

Different technologies are modeled by selecting a Pixel Model, which is nothing more than a macro image representing a single pixel of the intended display technology displaying white at full brightness. Some sample Pixel Model images representing LCD and various LED technologies are included, but additional Pixel Models may be added easily by the user.

PrePix includes a graphical interface for setting up a virtual sign with parameters for Pixel Model selection, pixel dimensions, physical sign dimensions, brightness gain, RGB level adjustments, and source video selection. PrePix supports 30fps wmv video playback as well as jpg, gif and bmp images.

Source video is automatically up-sampled or down-sampled to match the pixel dimensions of the virtual video surface. Then, a Pixel Model is applied, followed by the Gain and Color Balance layers. The virtual display is viewable in a virtual 3D space.

In the 3D virtual space, the user may select from preset views including “Pixel Match,” which attempts to match large virtual LED pixels onto the PrePix user’s computer display so that the user may get a sense of the look of a virtual video surface in a real, physical space. Viewing a real 11mm LED display from 10 feet away should be comparable to viewing such a virtual display Pixel Matched to a 24” LCD display from the same distance, with the only significant difference being brightness. If this kind of modeling is deemed comparable and effective, then PrePix can be used to determine optimal viewing distances for different pixel pitches and LED technologies without requiring a lot of sample hardware.

PixelMatch close-up

An even more concrete use of PrePix is to determine the effectiveness of video or image content when displayed on certain low-resolution LED signs. The LED signs on the sides of MTA busses in NYC have pixel dimensions of 288×56. They are often sourced with video content that was obviously designed for a higher resolution display. With PrePix, a content producer can easily preview his video content on a virtual LED bus sign to check text legibility and graphical effectiveness.

288x56-pixel source image

288x56-pixel source image

Close-up of 3D video display model

Close-up of 3D video display model

Full view of virtual LED sign. The moire patterns is comparable to that 
which would be seen in a digital photo of a real LED display.

Full view of virtual LED sign. The moire patterns is comparable to that which would be seen in a digital photo of a real LED display.

3Byte would like to develop PrePix further, depending on feedback from users. Please let us know what you think!

PrePix can be downloaded here.

PrePix System Requirements:

  • Windows XP, Vista or 7
  • A graphics card supporting OpenGl 3.0 and FBOs (Nearly all cards less than 18 months old.)

3Byte can be contacted at

Video Synchronization Testing Part II

In a previous post, I analyzed a video synchronization test from a recent video installation, and suggested that although the synchronization between screens was not perfect, it was certainly satisfactory, and as good as might be expected given the design of the system. Now, lets see why.

When a multi-screen video system is in proper sync, each video frame begins at exactly the same time. A system that draws at 60fps will draw each frame in approximately 16.67 milliseconds. During those 16ms, an LCD display will update all of the pixels on screen from top to bottom. We will call the moving line across which the pixels are updated the Raster Line. In this slow-motion test video, you can see video frames alternating between full black and full white, updated from top to bottom. The screens are mounted in portrait orientation, which is why the updates happen right to left:

Many of the screens seem to be updating together, but some do not. This is because the system does not include dedicated hardware to ensure that the signals are in sync, so many of the displays begin their updates at different time. The system is frame-synchronized, meaning that all displays begin a raster sync within one raster-pass of each other. It just isn’t raster-synchronized.

If the displays were indeed raster-synchronized, we might represent their signals like so:


In an otherwise black video, Display 1 flashes a single frame (Frame 1 / F1) of white, followed by a flash of white on Display 2 in Frame 2. In slow-motion, we would see the raster line, represented here as diagonal lines, move across bother displays in sync, like two Tangueros walking together across a dance floor. The red line represents the particular pass of the raster line at which both displays transition from Frame 1 to Frame 2. In all of these illustrations, the red line indicates a pass of the raster line, or a switch between frames, that in a frame-synchronized system would occur at exactly the same time.

It is important to keep in mind that in this system, video is provided at 30fps, so each frame of video is essentially drawn twice in two consecutive passes of the raster line. This cannot be seen on screen as no change occurs in any of the pixels, but we can see it in our illustration as the diagonal line separating F1a and F1b, the two rendered passes of video Frame 1.

In our system, of course, the raster lines are not synchronized between displays. So even when we intend for both displays to flash white at the same time, it is possible that one display might begin a raster pass at the very end of another’s pass of the same frame:


We can certainly hope that the raster lines of any two average displays are nearly synchronized, but in observance of Murphey’s Law, we must always assume the worst case, as indicated above in Displays 1 and 2. Here, we can see that although the frames might be in sync, with all rasters commencing within 17ms of each other, we will expect to see the raster on Display 2 commence just at the end of that of Display 1. This kind of behavior can be seen in our test video on the third and fourth columns, with the raster seeming to pass smoothly from one display to the other. In truth, any case in which two rasters start more than 16.67ms apart from each other demonstrates an imperfect frame sync, but for simplicity we will just say that the worst case in one in which they commence 17ms apart, as illustrated above in Displays 1 and 2.

So, what might this look like in a case where we see flashes in two consecutive video frames on two separate displays in a worst-case scenario?


In the case of Displays 1 and 4, we have a point in time, Time X, at which both displays are entirely black. In the case of Displays 2 and 3, at Time X we see both displays entirely white. We still consider them to be in sync, and we should not consider these anomalies to indicate a failure of the synchronization mechanism.

There is a minor issue in the test video that does bear mentioning. The flashes move through four rows of white in each column. There are extremely brief moments during which you might notice a touch of white that is visible in the first and third row. This should be obvious after repeated viewing. I leave it as an exercise for the reader to demonstrate why, even in a worst case scenario, this should not be observed if the frame synchronization mechanism is working properly. (<sarcasm>Yeah, right.</sarcasm>) So why am I not concerned? Because it is the nature of LCD display pixels to take some time to switch from full white to full black. Typical response times for the particular displays in this system are specified as 9ms, which means that after the raster line passes and updates a pixel from black to white, it may take an average of 9ms (more than half a raster pass) for that pixel to fully change. I say “Average” because white-to-black and black-to-white transitions are usually slightly different, and the spec will mention only the average of the two. If the response time was an ideal zero ms, our raster line would be crisp and clear in our slow-motion capture, but in reality it is not. The raster line is blurry because after it passes, the pixels take time to change between black and white. We can expect that some pixels might remain white for a brief time after we expect them to go dark, resulting in this subtle observable discrepancy.

What does all this mean? It means that in a slow-motion test video of 24 synchronized displays, we observe nothing to suggest that the synchronization mechanism isn’t performing as well as we could hope for. To the viewer, the synchronization is true, and we deem the project a success.

Pixel Perfect Programming

In any project with multiple display devices that need to work together, an important part of the setup process is aligning and calibrating the displays to create a single canvas.  The best way to ensure that the virtual display space is coherent is to use display grids that highlight intersection points while adjusting physical display alignment and video signal positioning.  Grids are not only necessary while calibrating the final image, but are an invaluable template for talking about how to create media that will take advantage of a display surface, rather than be obstructed by it.  Good templates are important and need to be developed for an exact canvas.

Not too long ago, this usually meant going to Photoshop or Paint.NET and drawing.  First I would need to create a new document that is the total size of the canvas, taking into account the display pixels as well as any virtual pixels for mullions or gaps between the adjacent displays.  I wanted the grids to be precise, with each grid line exactly one pixel wide, so I would zoom all the way in to draw one pixel accurately.  Then I count down 96 pixels and draw another line, but it’s really hard to count to 96 on screen, especially when I have to scroll because everything is zoomed so big:

This takes a really long time, but I finally made a complex alignment grid with different colored layers for circles and diagonals, and straight grids.

A finished projection alignment grid

Then another project came along and I had to do it all over.  I tried scaling an old grid to match the new resolution, but then everything starts to alias and blur and you can’t actually tell where the precise lines are anymore.

I thought that there must be an easier way.  Without too much trouble I was able to leverage Windows Presentation Foundation to do the job.  I parametrized the important attributes of the grid and created Properties that I could set according to the project:

DisplaySizeX = 1920;
DisplaySizeY = 1080;
DisplayCountX = 4;
DisplayCountY = 6;
GridSpacingWidth = 128;
GridSpacingHeight = 128;
MullionX = 24;
MullionY = 120;
GenerateDiagonals = true;
GenerateCircles = true;

Using a Canvas inside a Viewbox, I was able to work with the native dimensions of my virtual canvas, but display an onscreen preview, like a thumbnail.  In XAML, this is all you need:

<Canvas Name="RenderCanvas" Background="Black" />

With the parameters set, WPF’s Canvas layout container allowed me to draw to screen with absolute coordinates using primitives like Line, Rectangle, and Ellipse (everything you need for a grid).  Here is a snippet of the loop used to populate the virtual canvas with display devices represented by rectangles:

for(int x = 0; x < DisplayCountX; x++) {
for(int y = 0; y < DisplayCountY; y++) {
Rectangle newRect = new Rectangle()
Width = DisplaySizeX,
Height = DisplaySizeY
(double)(x * DisplaySizeX) + (x * MullionX));
(double)(y * DisplaySizeY) + (y * MullionY));
newRect.Stroke = new SolidColorBrush(Colors.Wheat);
newRect.Fill = new SolidColorBrush(Colors.DarkGray);

Ultimately, I added the logic to draw all of the alignment elements I needed, including horizontal and vertical grid lines, circles, diagonals and some text to uniquely identify each screen in the array. It looks like this:

This isn’t as detailed as many alignment grids, but it is straight-forward to add more elements and colors using the same process.  The important thing is that all the time you spend making a useful template is reusable, because it isn’t locked to a specific dimension.

Finally, this is what makes the whole process worthwhile: In WPF, anything you can draw to screen, you can also render to an image file. It’s like taking a programmatic screenshot, but with much more control. Cribbing some code from Rick Strahl’s blog, I added a method to save a finished png file to disk:

private void SaveToBitmap(FrameworkElement surface, string filename) {
Transform xform = surface.LayoutTransform;
surface.LayoutTransform = null;
int width = (int)surface.Width;
int height = (int)surface.Height;
Size sSize = new Size(width, height);
surface.Arrange(new Rect(sSize));
RenderTargetBitmap renderBitmap = new RenderTargetBitmap(width, height,
96, 96, PixelFormats.Pbgra32);
using(FileStream fStream = new FileStream(filename, FileMode.Create)) {
PngBitmapEncoder pngEncoder = new PngBitmapEncoder();
surface.LayoutTransform = xform;

As a programmer, this saved me a lot of time. Instead of working with graphic design tools, I used the tools I was familiar with and took advantage of WPF’s support for media and imaging. I have attached a complete sample project of how this first part works.

But this is just the beginning of the things this is useful for: Using Data Binding, I exposed each parameter as an input field, so the user can interactively build a grid image by adjusting the values and see the thumbnail preview update live. Furthermore, instead of generating a single static image, this same process can be used to produce test video content by updating the RenderCanvas and saving a series of images as a frame sequence. This ends up being much easier than going to a timeline based non-linear editing station to create very simple graphics with a timecode display.

Video Synchronization Testing

For a recent project, 3byte developed custom graphics software for a 36-screen video wall. This required some kind of synchronization mechanism with which to keep the various screens in sync.  There are dedicated hardware devices like the nVidia G-Sync card that make this sort of thing really simple. However, this project involved driving four video display from each of our graphics workstations, and we ran into trouble with these cards during initial testing. Instead, we developed our own sync mechanism that runs over the local network.

To test our synchronization, we loaded up a video file that would make the quality of the sync really obvious. Breaking the video vertically into four quarters, on each frame of the video we flash a white rectangle in one of the quarter regions. Like so:

Really, it works better with some good PsyTrance. But what is actually happening? Whatever it is, it’s happening too fast to determine with the naked eye. So I picked up a Casio Exilim EX-FS10 and shot the system at 1000fps:

Far more interesting. The resolution of the video is not great, but it shows us what we need to know.

First of all, you will notice the scrolling of the video from right to left. The screens are actually mounted in portrait orientation, to the motion that you see is actual a top-to-bottom scroll as far as the display is concerned. These LCD displays refresh the pixels on screen 60 times every second, but they don’t all update at the same time. The display updates the pixels line by line from top to bottom, each line updating left to right. This makes sense when we consider that analog and digital video signals order pixel data in this way, just like reading a page of a book. At regular speed, a screen refresh seems instantaneous, but at high speed, we can see the way the screen transitions from black to white one line at a time. This scrolling line across which the pixels are updated we might call the Raster, or the Raster Position.

In this system, the timing of each display’s Raster Line is determined by the graphics card in the computer. Whenever the card says “start drawing the next frame… NOW!” the monitor locks in with that signal and starts the Raster Line at the top of the screen. Had we G-Sync cards for this system, we could tell the graphics cards to chant their “NOW” mantras in unison, and in slow motion we would see the Raster Lines of all the displays being drawn in perfect synchronization. As you can see in the video above, this is not the case for our system, where the lines are updated on different displays at slightly different times. This difference between displays is so subtle that it is never noticed by a viewer. The question is, are the correct frames being drawn on each pass of the Raster?

This system supports source video playback at 30fps, but the displays update at 60fps. Each source video frame is doubled on screen, so a single frame of white flash in our source video will be drawn white in two consecutive passes of the Raster Line on the display. In the slow-motion video, we see the raster line update each screen, then a pause while the subsequent Raster pass draws another frame of white (no change) before moving on the to next source video frame.

If you look at the third and fourth columns of displays, you will see that the Raster seems almost to move straight across from the fourth to the third as it updates the two columns together. Of course this is only an illusion, as the Raster does not sync between frames. What we are actually seeing is one display in column three that seems to be lagging behind column four by almost a full 17ms Raster pass. (I say 17ms because that is just about the amount of time it takes to refresh a display at 60fps.) This is not ideal, but in a system with no dedicated sync hardware, it is not surprising, and not a deal-breaker. It means that at 30fps, these screens are within a half-frame of perfect sync, which is nearly undetectable to the eye.

In Part II, I provide an analysis of the best possible sync performance for a network-synchronized video system. I  explain why the 17ms discrepancy in the video above falls within the tolerances of this system. We are quite pleased with the performance of our synchronization mechanism, and believe that it rivals or surpasses that of other industry-standard network-sync video systems. Next chance I get, I’ll run a similar test on a Dataton Watchout system and let you all know how it goes. Stay tuned.