The Mathenaeum – A story about testing and use cases

The importance of unit testing to the software development process is by now well established. Advantages include: a) demonstrating the functionality of code units, b) highlighting any unwanted side-effects caused by new changes, and c) providing a B. F. Skinner-esque positive feedback loop that reflects the progress and success of one’s development work. Most importantly perhaps, writing tests that expose where code fails to perform as desired gives visibility into each successive point of failure and helps drive the development process forward. In general, you can’t fix the bugs that you can’t see, and the importance of baking QA into the development workflow cannot be overstated. Unit testing, regression testing and continuous integration are an essential part of the software development process at Three Byte.

The Mathenaeum exhibit, built for the Museum of Mathematics that opened this past December, is a highly optimized, multi-threaded piece of 3D graphics software written in C++ with OpenGL and the Cinder framework. The algorithms for applying a wide range of geometric manipulations to complex objects across multiple threads were challenging, but for me the most challenging and edifying part of this project was the problem of hardware integration and effective testing. More specifically, working on the Mathenaeum taught me about the difficulties associated with, and the creativity required for, effective testing.

Unlike some software deadlines, this one was immovable: MoMath was going to open to the general public on December 15 whether we were ready for it or not. At Three Byte we were balancing the pressure of getting our product ready to deliver against the knowledge that long nights and stressful bouts of overtime can introduce more bugs than they fix. Just before opening day, functionality on the Mathenaeum was complete. And we delivered…and the museum opened…and things looked fine…but every so often it would freeze. The freezes were infrequent, most visitors had a successful experience, and the show control software we wrote made it trivial for the MoMath floor staff to restart a frozen exhibit from a smartphone, but even an infrequent crash means a frustrated user and a failed exhibit experience, which was devastating to me.

Visitors at work in the Mathenaeum

Effectively testing the Mathenaeum was a challenge. The first issue I solved was a slow leak of OpenGL display lists that weren’t being disposed of properly. This leak was aggravated by a bug in the communications protocol we had set up with a set of five LCD screens embedded in the Mathenaeum control deck. To set the screen state for the Arduinos we were creating and opening Windows Sockets 2 objects (SOCKET) but failing to close them. This meant we were leaking object handles and fragmenting memory, which caused the leaking Mathenaeum to crash after using only 100 MB of memory.
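
The fix itself was mundane: every SOCKET we open has to get closed, no matter which code path we take out of the send routine. Here is a minimal sketch of the idea (the wrapper and helper names are mine, for illustration, and not the actual Mathenaeum code):

    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <cstring>
    #pragma comment(lib, "Ws2_32.lib")

    // Illustrative RAII wrapper: the SOCKET gets closed no matter how we leave scope,
    // which is exactly the guarantee we were missing when setting the LCD screen state.
    class ScopedSocket {
    public:
        explicit ScopedSocket(SOCKET s) : _s(s) {}
        ~ScopedSocket() { if (_s != INVALID_SOCKET) closesocket(_s); }
        SOCKET get() const { return _s; }
    private:
        SOCKET _s;
        ScopedSocket(const ScopedSocket&);            // non-copyable
        ScopedSocket& operator=(const ScopedSocket&);
    };

    // Hypothetical helper (assumes WSAStartup() already ran at program startup):
    // push one state string to an LCD controller and always clean up the handle.
    bool sendScreenState(const char* ip, unsigned short port, const char* payload)
    {
        ScopedSocket sock(socket(AF_INET, SOCK_STREAM, IPPROTO_TCP));
        if (sock.get() == INVALID_SOCKET) return false;

        sockaddr_in addr = {};
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        if (connect(sock.get(), (sockaddr*)&addr, sizeof(addr)) == SOCKET_ERROR) return false;
        send(sock.get(), payload, (int)strlen(payload), 0);
        return true;   // ~ScopedSocket() calls closesocket() here, so no handle leaks
    }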

Visual Leak Detector for C++ was helpful in finding leaks, but in the end tracking the correlation between memory consumption in the Task Manager and various operations was sufficient for localizing all of the memory leaks. Despite plugging up every memory leak, the sporadic crash/freeze persisted, and no matter what I tried I could not reproduce the bug on my development machine. Visibility into this issue was basically zero.

Everyone knows that a developer cannot be an effective tester of his or her own software. Therefore, when trying to reproduce the Mathenaeum crash I would try to inhabit the psyche of a person who had never seen this software before and was feeling their way around for the first time. Everyone at Three Byte tried to reproduce this bug, but to no avail. So, I started spending time at MoMath observing the interactions that happened there. Lots of adults and kids took the time to build stunning creations in 3D and took the care to stylize every vertex and face with artistic precision. Some people were motivated by the novelty of the physical interface or the excitement of experimenting with the various geometric manipulations, and others seemed motivated by a desire to create a stunning piece of visual art to share with the world in a digital gallery. In addition, the most popular creations were printed on a nearby 3D printer and put on display for all to see. I saw a mother stand by in awe as her eleven-year-old son learned to navigate the software and spent hours building an amazing creation. Watching people engaged with my exhibit inspired me in a way I had never felt before and made me extremely proud to be a software developer.

However, I also saw a second type of interaction which was equally interesting. MoMath hosts a lot of school trips and it’s not uncommon for the museum floor to be “overrun” by hundreds of girls and boys under the age of eight. For these kids, the Mathenaeum is an amazingly dynamic contraption. The trackball (an undrilled bowling ball) can be made to spin at great speeds, the gearshift is a noise maker when banged from side to side, and the throttle generates exciting visual feedback when jammed in both directions. For this particular use case, the Mathenaeum is being used to its fullest when two kids are spinning the trackball as fast as possible while two others work the gearshift and throttle with breakneck force. It soon became clear to me that the Mathenaeum was failing because it had never been tested against this second use case.

The first step in stress testing the Mathenaeum was making sure that my development machine used the same threading context as the production machines. Concretely, the Mathenaeum explicitly spawns four distinct threads: a) a render-loop thread, b) a trackball polling thread, c) an input polling thread, and d) a local visitor/RFID tag polling thread. Because the physical interface on my development machine was different from the trackball, gearshift and throttle on the deployment machines, it was using only one thread for trackball and input polling (both emulated with the mouse). Replicating the deployment environment meant enforcing a threading context that was consistent in both places. In retrospect, this change was obvious and easy to implement, but I hadn’t yet realized the importance of automated stress testing.
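
For the curious, the shape of that change looks roughly like this. It is a simplified sketch with stand-in function names, not the production code:

    #include <atomic>
    #include <thread>

    std::atomic<bool> running(true);

    // Stand-ins for the real polling routines. On the dev machine the trackball and the
    // gearshift/throttle are both emulated with the mouse, but they still get their own
    // threads so the locking behaves the same way it does on the exhibit hardware.
    void renderLoop()      { while (running) { /* draw the current scene */ } }
    void pollTrackball()   { while (running) { /* read trackball deltas */ } }
    void pollInputs()      { while (running) { /* read gearshift, throttle, button */ } }
    void pollVisitorTags() { while (running) { /* read the RFID tag reader */ } }

    int main()
    {
        // Same four-thread layout as the production machines: a) render, b) trackball,
        // c) input, d) visitor/RFID polling.
        std::thread render(renderLoop);
        std::thread trackball(pollTrackball);
        std::thread inputs(pollInputs);
        std::thread visitors(pollVisitorTags);

        // ... run until shutdown ...
        running = false;
        render.join(); trackball.join(); inputs.join(); visitors.join();
        return 0;
    }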

My observations at the museum inspired the construction of a new module called fakePoll(), which is responsible for injecting method calls into the two input polling threads as fast as my 3.20 GHz Intel Xeon processor will allow. This overload of redundant calls (similar perhaps to a team of second graders) works both input threads simultaneously, triggering all types of operations (and combinations thereof) and navigating the Mathenaeum state machine graph at great speed. In short, fakePoll() made it possible to easily exercise every corner of Mathenaeum functionality and all the locks, mutexes and race conditions that could be reached. Unsurprisingly, I was now able to crash the Mathenaeum in a fraction of a second – a veritable triumph!
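
The fakePoll() idea itself is nothing fancy. Here is a sketch of its spirit; the injection helpers are illustrative stand-ins for the real input handlers:

    #include <cstdlib>
    #include <thread>

    // Stand-ins for the real input handlers; the real module drives the same methods
    // the hardware polling threads call, which is the whole point of the test.
    void injectTrackballSpin(float dx, float dy) { /* forward to the trackball handler */ }
    void injectGearshift(int direction)          { /* forward to the gearshift handler */ }
    void injectThrottle(float position)          { /* forward to the throttle handler  */ }
    void injectButtonPress()                     { /* forward to the button handler    */ }

    // fakePoll(): hammer an input thread with random events as fast as the CPU allows,
    // roughly what a team of second graders does to the physical control deck.
    void fakePoll()
    {
        for (;;) {
            switch (rand() % 4) {
                case 0: injectTrackballSpin(float(rand() % 200 - 100), float(rand() % 200 - 100)); break;
                case 1: injectGearshift(rand() % 2 ? +1 : -1); break;
                case 2: injectThrottle(float(rand() % 101) / 100.0f); break;
                case 3: injectButtonPress(); break;
            }
        }
    }

    int main()
    {
        // One fake poller per real input thread, so both polling paths are stressed at once.
        std::thread t1(fakePoll), t2(fakePoll);
        t1.join();
        t2.join();
        return 0;
    }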

Given a failing test, I had new visibility into the points of failure, and I started uncovering threading problem after threading problem: numerous deadlocks, inconsistent states, rendering routines that weren’t thread safe, and more. With every fix, I was able to prolong the load test – first by fractions of a second, then to a few seconds, then to a minute, then a few minutes. Seeing all the threading mistakes I had missed was a little disheartening, but it was an important learning experience. Injecting other operations into other threads, such as an idle timeout to the attract screen and various visitor identification conditions, exposed further bugs.

[Screenshot: memoryCorruption.jpg]

In a single-threaded environment a heap corruption bug can be difficult to fix; however, by peppering your code with _ASSERTE(_CrtCheckMemory()); it’s possible to do a binary search over your source code and home in on the fault. In a multithreaded application, solving this problem is like finding a needle in a haystack.
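
For anyone who hasn’t tried it, the peppering looks something like this (MSVC debug CRT; the CHECK_HEAP macro name is my own shorthand, not a CRT macro):

    #include <crtdbg.h>

    // Debug-only heap check: _CrtCheckMemory() walks the CRT debug heap and returns FALSE
    // if any block's guard bytes have been stomped; _ASSERTE() then breaks into the debugger.
    #ifdef _DEBUG
    #define CHECK_HEAP() _ASSERTE(_CrtCheckMemory())
    #else
    #define CHECK_HEAP() ((void)0)
    #endif

    void suspectFunction()
    {
        CHECK_HEAP();   // heap still intact here...
        // ... do some work ...
        CHECK_HEAP();   // ...if this one fires, the corruption happened in between,
                        // and you keep bisecting until you are staring at the guilty line.
    }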

After spending hours poring over the most meticulous and painstaking logs I had ever produced, I finally found an unsafe state transition in the StylizeEdges::handleButton() method. This bug, the least reproducible and most elusive of all the Mathenaeum bugs I solved, exposed a weakness in the basic architectural choice on which the whole Mathenaeum was built.

The state machine pattern is characterized by a collection of states, each deriving from a single base class, where each state is uniquely responsible for determining a) how to handle user input in that state, b) which states can be reached next, and c) what to show on screen. The state machine design pattern is great because it enforces an architecture built on modular components connected in an extensible network. In this architecture, no individual component is aware of the global topology of states, and states can be added or removed without side-effects or a cascade of changes. In the Mathenaeum, the specific set of operations and manipulations that a user can perform with the gearshift, button and throttle depends on where that person stands within the network of available state machine states.
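
A stripped-down sketch of the pattern, with illustrative names rather than the real Mathenaeum classes:

    // Minimal sketch of the pattern as described; the real Mathenaeum states also know
    // which states they can reach next.
    class Machine;

    class State {
    public:
        explicit State(Machine* machine) : _machine(machine) {}
        virtual ~State() {}
        virtual void handleButton() = 0;   // how this state reacts to user input
        virtual void draw() = 0;           // what this state shows on screen
    protected:
        Machine* _machine;
    };

    class Machine {
    public:
        Machine() : _current(0) {}
        ~Machine() { delete _current; }
        // Replace the current state: delete the old one and take ownership of the new one.
        void setState(State* next) { delete _current; _current = next; }
        State* current() const { return _current; }
    private:
        State* _current;
    };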

When a user navigates to the stylizeEdges state in the state machine, they are able to set the diameter of their selected edges and then change the color of these edges. After setting the color of the edges, we navigate them to the main menu state with the call:

_machine->setState(new MainMenuState(_machine));

The setState() method is responsible for deleting the current state and replacing it with a newly created state. At some point, I realized that if the user sets all selected edges to have diameter zero, effectively making these edges invisible, it doesn’t make sense to let the user set the color of these edges. Therefore, before letting the user set the edge color I added a check to see if the edges under inspection had any diameter. If the edges had no diameter, the user would be taken directly to the main menu state without being prompted to set an edge color.

This change set introduced a catastrophic bug. Now, the _machine->setState() call could delete the stylizeEdges state before the handleButton() method had exited. In other words, the stylizeEdges state commits premature suicide (by deleting itself), resulting in memory corruption and an eventual crash. To fix the bug, I just had to ensure that the handleButton() method would return as soon as _machine->setState() was called.
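
In terms of the sketch above, the fix boils down to this (again, illustrative names and placeholder helpers, not the production code):

    // Building on the sketch above: illustrative versions of the two states involved.
    class MainMenuState : public State {
    public:
        explicit MainMenuState(Machine* m) : State(m) {}
        void handleButton() { /* ... */ }
        void draw()         { /* show the main menu */ }
    };

    class StylizeEdges : public State {
    public:
        explicit StylizeEdges(Machine* m) : State(m) {}
        void draw() { /* show the edge styling UI */ }

        void handleButton()
        {
            if (edgesHaveZeroDiameter()) {                        // hypothetical helper
                _machine->setState(new MainMenuState(_machine));  // deletes *this* state
                return;  // THE FIX: without this return, the code below runs on a deleted object
            }
            promptForEdgeColor();                                 // hypothetical helper
        }

    private:
        bool edgesHaveZeroDiameter() const { return false; /* placeholder */ }
        void promptForEdgeColor()          { /* ... */ }
    };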

Now my load test wasn’t failing, and I was able to watch colors and shapes spinning and morphing on screen at incredible speeds for a full hour. I triumphantly pushed my changes to the exhibit on site and announced to the office: “the Mathenaeum software is now perfect.” Of course it wasn’t. After about five hours of load testing the Mathenaeum still crashes, and I have my eye out for the cause, but I don’t think this bug will reproduce on site anytime soon, so it’s low priority.

Some Mathenaeum creations:

Amichai

Low Latency Synchronization over the Internet

Previously, ActiveDeck was able to stay in sync within about 4 seconds between the PowerPoint computer and its neighboring iPads. However, this wasn’t good enough for us: our background is in show control systems and frame-accurate video playback systems. We have a curse of over-analyzing every video playback system we see for raster tear, frame skips and sync problems. Given that ActiveDeck relies solely on the internet, we thought 4 seconds was pretty good, considering. But we wanted to make it much, much better.

We thought about the best way to improve the latency, and our initial thinking leaned towards a local network broadcast originating from the computer running PowerPoint, but that introduces issues on WiFi networks, especially those you would find in hotel ballrooms. A VPN to the cloud service would be another option, but it adds lots of complexity.

We ended up using pure HTTPS communications (no sockets, no VPN, no broadcasts) to and from the cloud servers with the use of some clever coding. If the iPad has internet connectivity, it will be in sync.
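
We won’t spell out the clever part, but as a generic illustration (not necessarily what ActiveDeck actually does) of how far you can get with nothing but HTTPS, a long-polling loop is one common approach: the client parks a request at the server, and the server only answers when there is news. A rough C++ sketch using libcurl, with an invented endpoint and message format:

    #include <curl/curl.h>
    #include <string>

    // Collect whatever the server eventually answers with.
    static size_t collect(char* data, size_t size, size_t nmemb, void* userdata)
    {
        static_cast<std::string*>(userdata)->append(data, size * nmemb);
        return size * nmemb;
    }

    int main()
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        for (;;) {
            CURL* curl = curl_easy_init();
            std::string message;
            // Hypothetical endpoint: the server holds this request open until the
            // presenter does something, then answers with a tiny sync message.
            curl_easy_setopt(curl, CURLOPT_URL, "https://example-sync-service.net/wait?deck=42");
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, &message);
            curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30L);   // re-park the request every 30 s
            CURLcode rc = curl_easy_perform(curl);
            curl_easy_cleanup(curl);

            if (rc == CURLE_OK && !message.empty()) {
                // apply the message locally, e.g. "goto slide 17"
            }
        }
        curl_global_cleanup();
        return 0;
    }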

Check out the video; this is over a cable modem internet connection and a plain Linksys WRT54G access point. Our Windows Azure servers are at least 13 router hops from our office. The beautiful part is that the sync messages are tiny, so this will scale to hundreds of iPads.

100 iPads

Have you ever wondered what 106 iPads look like when packed as densely as possible? Here is a picture:

For a recent project, we developed a synchronized iPad display app. The project was to support a presentation with a new method of interacting with the participants. The designers liked the idea of handing out iPads to which they could “push” the content they wanted, when they wanted.

So, we fired up Xcode and built the iPad app. The application is made up of several modes of operation. The main mode displays content driven by the presenter, so that on cue all of the iPads display new screens without any interaction by the person holding the iPad. This looks pretty awesome when it gets triggered and you can see all of the iPads change their screens at once.

Other modes are sort of like tests or drills, where the users complete a quiz and then submit that data to the presenter. We have another application that creates graphs based on the statistics from all of the iPad users, which the presenter can speak to when it is projected on a large screen in the center of the room.

When we set out to design the system, we had to think through the potential bottlenecks. Our main concern was network latency, so after some research we specified the best wireless access points we could find: Ruckus Networks. See these links:

http://www.ruckuswireless.com/

http://www.tomshardware.com/reviews/beamforming-wifi-ruckus,2390.html.

We ended up with 5 access points and a network controller on a gigabit network. It worked great (a little bumpy the first day of the presentation due to a faulty access point).

Next, we created a back-end system where the content would be stored locally, yet could still be updated during the presentation. Using IIS, we posted the images and XML files on a web service local to the event network. We then wrote a multi-threaded socket server on another computer that was dedicated to triggering page turns, mode changes, and fresh content downloads to the iPads.
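
As a rough sketch of what a trigger server like that can look like (this is an illustration, not our production code, and the port number and command strings are invented): each iPad holds open a TCP connection, and a broadcast pushes the same one-line command to every client at once.

    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <vector>
    #pragma comment(lib, "Ws2_32.lib")

    std::mutex g_lock;
    std::vector<SOCKET> g_clients;

    // Send the same one-line command ("PAGE 12", "MODE quiz", "RELOAD") to every iPad at once.
    void broadcast(const std::string& cmd)
    {
        std::lock_guard<std::mutex> guard(g_lock);
        for (size_t i = 0; i < g_clients.size(); ++i)
            send(g_clients[i], cmd.c_str(), (int)cmd.size(), 0);
    }

    int main()
    {
        WSADATA wsa;
        WSAStartup(MAKEWORD(2, 2), &wsa);

        SOCKET listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        sockaddr_in addr = {};
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port        = htons(9000);   // port number is arbitrary for the example
        bind(listener, (sockaddr*)&addr, sizeof(addr));
        listen(listener, SOMAXCONN);

        // Accept loop on its own thread; each connected iPad is remembered for broadcasts.
        std::thread acceptThread([listener]() {
            for (;;) {
                SOCKET client = accept(listener, 0, 0);
                if (client == INVALID_SOCKET) break;
                std::lock_guard<std::mutex> guard(g_lock);
                g_clients.push_back(client);
            }
        });

        // In the real system the presenter's cue would call this; here it is just an example.
        broadcast("PAGE 12\n");

        acceptThread.join();
        WSACleanup();
        return 0;
    }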

Here is a video from some initial sync testing; this is all running from one access point and triggered by Chris and his PC.

Stack Exchange

About a year ago or so, I discovered this site for helping with software development problems:

www.stackoverflow.com

It’s a completely free, community-powered site for asking and answering questions related to software development. It was made for experts, by experts. It turned out to be an amazing problem-solving resource, and shortly thereafter www.serverfault.com and www.superuser.com were opened up, and they now have a huge user base. I encourage you all to take a look at the quality of the questions and answers.

The founders have decided to open up to the internet community and ask for ideas to start up new sites. There are ideas for sites from mythology to raw food to industrial control systems.

I’ve put a proposal out there for a site to cater to the community of AV professionals. The concept is that this is where you ask the tough questions and help out people with tough problems. I need people to sign up and back the proposal, as well as to ask sample questions to see if the quality is up to par. This thing needs a critical mass to make it to the next stage.

A sample question could be:

When installing a BSS Soundweb in a rack, can they be stacked with no spacing? Has anyone ever had heat-related problems?

Or, another sample question could be:

What is a good resource for figuring out how to send wake-on-lan to various computers from an AMX controller?

A BAD question could be:

What does the blinking red light on an AMX frame mean?

So, please go to:

http://area51.stackexchange.com/proposals/8341/audio-video-control-systems

log in and post sample questions.

thanks

Green AV?

I recently attended InfoComm 2010. One topic of discussion was “Green AV”, and it was pretty prevalent. Some manufacturers had amp meters attached to their gear with digital readouts so you could see the power consumption in real time. I’ve even been noticing LEED accreditations in the email signatures of AV professionals.

Really? Green AV?

For years we’ve been integrating wake-on-lan, and sleep-on-lan procedures in our AV systems to minimize power consumption. I wonder if that qualifies us…

Green AV is a tough concept for me to get because I feel the best thing to do is often to “turn the damn thing off”. Though I suppose that would be against my interest as someone who makes his living designing AV systems.

On nearly every project, I work up heat/power loads to determine how much electricity we’ll need as well as how much air conditioning is required to cool the system. A few years ago, out of curiosity, I started to convert the power loads to their equivalent in oil (it was easy to find the conversion, though in the US I suppose most systems are ultimately powered by coal).

There are about 5,800,000 BTUs in a single barrel of oil. A barrel of oil is 42 gallons. Assuming perfect efficiency in the generation process, take a 50” plasma screen, estimate that it consumes about 500 watts, and further assume that in a typical system it runs for 12 hours per day. Total consumption for the day is 6 kilowatt-hours. Multiply that by 3,412 to get the BTU equivalent (about 20,500 BTU), and we find that running this screen for the day consumes about 0.0035 barrels of oil.

Say your digital signage network has 20 screens, and you run them 12 hours per day, 365 days a year. The consumption is about 26 barrels or 1092 gallons of oil, not counting the computers to run it and the air conditioning to cool it.
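
If you want to play with the numbers yourself, here is the same back-of-the-envelope arithmetic in a few lines of C++ (all figures are the rough estimates above, not measurements):

    #include <cstdio>

    int main()
    {
        // Rough estimates from the text, not measurements.
        const double watts         = 500.0;       // one 50" plasma screen
        const double hoursPerDay   = 12.0;
        const double btuPerKwh     = 3412.0;
        const double btuPerBarrel  = 5800000.0;
        const double gallonsPerBbl = 42.0;

        double kwhPerDay     = watts * hoursPerDay / 1000.0;           // 6 kWh
        double barrelsPerDay = kwhPerDay * btuPerKwh / btuPerBarrel;   // ~0.0035 barrels

        double barrelsPerYear = barrelsPerDay * 20 * 365;              // 20-screen network
        printf("%.4f barrels/day per screen, %.0f barrels/year for 20 screens (%.0f gallons)\n",
               barrelsPerDay, barrelsPerYear, barrelsPerYear * gallonsPerBbl);
        return 0;
    }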

I wonder how often the message is worth it? What exactly is Green AV?

–olaaf

Times Square Signs in Slow-Motion

And now, by popular demand, a few high-speed captures of LED signs in Times Square:

(All videos were captured at 1000fps with a Casio Exilim EX-FS10, rendered here as 30fps video.)

When transitioning from one frame to the next, the M&M’s sign and all components of the ABC sign update all pixels on the sign at once; the update happens within a single capture frame, so we know it takes less than 1 ms. The different components of the ABC sign seem to update at different times because of a sync issue, not an LED technology issue.

LCD displays will lock to incoming signals with a variety of timings, often anywhere from ~57 Hz to ~63 Hz, depending on the display. When you tell a graphics card to output at 60 Hz, it is doubtful that it’s putting out a true 60.000 Hz; depending on the make of the card, the drivers installed and the phase of the moon, the actual refresh rate will vary quite a bit. LED signs, on the other hand, tend to maintain their own clock regardless of the video signal that might be driving them. It is likely that, without a genlock signal keeping the system in sync, the source will not provide frames at exactly the same rate that the LED sign is displaying them. The more disparate the source and display refresh rates, the more dropped or doubled frames you will observe, which will be visible to the layman as stuttering.
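
To put a number on it, suppose (purely as an example) the card is really putting out 59.94 Hz while the sign refreshes at a true 60.00 Hz. The drift works out to a dropped or doubled frame roughly every 17 seconds:

    #include <cstdio>

    int main()
    {
        const double sourceHz  = 59.94;   // what the graphics card actually outputs (assumed)
        const double displayHz = 60.00;   // the LED sign's own internal clock (assumed)

        double slipPerSecond  = displayHz - sourceHz;   // 0.06 frames of drift per second
        double secondsPerSlip = 1.0 / slipPerSecond;    // one doubled/dropped frame every ~16.7 s
        printf("One dropped or doubled frame roughly every %.1f seconds\n", secondsPerSlip);
        return 0;
    }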

The Reuters sign, however, DOES update one row at a time, just like an LCD. The passing of the raster takes about 14ms, as we might expect for a sign running at 60Hz:

I cannot speculate as to the implications for their sync mechanism, except to observe that the rasters of the three screen segments visible in this video seem to be in good sync, though the frames themselves do not.

Software Code Names

What the hell are they for? Besides sounding cool, our buddy Dave Sims said he read somewhere that they are supposed to obscure the actual purpose of the product. In a small shop like ours, though, I don’t think obfuscation is a huge priority.

In any case, I went a little overboard a few months ago. We have an ongoing project which consists of a couple of different modules, so I decided to call the whole thing “project sealife” and make each component a different sea animal. We have:

  • Octopus
  • Seahorse
  • Clam
  • Guppy
  • Lobster
  • Dogfish (the watchdog app, I thought this was particularly clever)
  • Goldfish
  • Starfish

The only problem? No one remembers the names of anyone else’s modules. Sigh, big fail.

–olaaf

Computers Hiding as Solid State Devices

Prejudice. That is what it’s about. There is an old argument in the Pro AV industry about not using computers as video playback devices, or control systems, or anything else you can imagine that is mission critical.

I remember a few years ago on a certain project in Texas where the AV guys (I was the software consultant) absolutely refused to consider using an off-the-shelf Dell computer and a custom video playback app to run video to the screens. The options given to us were an Alcorn DVM-HD or a GVG Turbo. I mentioned to them that both of these systems were actually embedded PCs with a custom PCI-E video output card (one ran XP Pro Embedded and the other some sort of Linux distro, or maybe even DOS, who knows). However, this didn’t seem to matter. It was all about how the non-computer felt and looked in the racks (and they even threw out the old argument that there isn’t enough rack space). Come on, really? Maybe they didn’t trust my custom software solution, but Josh and I were building the master show control system running on multiple PCs and servers, so I don’t see the logic.

Anyway, that project had a truckload of problems concerning video playback. My argument was that it’s the idiosyncrasies that end up dictating how these systems are run, and by holistically controlling one of the more important parts of the project (the video playback engine) we would control the idiosyncrasies. The system finally stabilized after buying twice as many devices (because the dual-channel capability didn’t actually work too well in practice), and then the main issue was that the 74 GB HDD was way too small, but that was the biggest WD Raptor drive available. Oh well.

So, back to mission critical stuff running on PCs. Here is a story about BAE Systems installing Windows XP and 2000 as a mission critical command and control system in the Royal Navy’s Trafalgar and Vanguard class nuclear subs:

BAE Win XP and Win2k on subs

How about that? If we dial it back to the level of AV systems, consider personal mission critical stuff, like your communications device. I bought an iPhone last week. It’s basically a mini computer, it’s now loaded with over 40 applications, and who knows who wrote these things? I’m not afraid this thing is going to “crash” if I dial 911, so why would you be afraid if a video screen somewhere glitches for an hour or two? It’s obviously different if you are in a theatre situation, but that is what backup hardware and, ahem, professional quality code and professional project management are for.

P.S. There is a whole slew of pictures on the internet of random massive LED screens in Times Square and other places showing Windows error screens that stay up for DAYS. That’s just pathetic and reeks of poor planning. Why would you drop $1+ million on that thing and not have a plan to service & monitor it?

Here are some fun pics:

[Screenshots: appcrash1.jpg, BSOD11.jpg]