Kinect: Cheap Key

The 3Byte R&D lab recently purchased a Microsoft Kinect to play with. We didn’t mind that we don’t have an Xbox to plug it into, because Code Laboratories has published an SDK that allows you to use C# (and several other high-level languages) to access the camera feed. In fact, the test app they distribute makes it immediately clear how this device differs from a normal webcam:

In addition to a normal color camera video stream (with red, green, and blue pixels), it provides another dimension (literally): a separate, parallel stream of depth information. The picture above shows me sitting at my desk, and the depth feed has been colorized to give a rough indication of where different objects are in the frame.

So, how do we do something useful with our new toy?

One thing that we immediately decided to try is Kinect Keying. The concept is similar to chroma keying but instead of requiring a solid blue or green colored background, we use the depth information from the Kinect to extract only the elements at a certain physical depth. I tackled this problem in a proof-of-concept project using WPF.

The important transformations happen in two steps:

    First, I create a mask by capturing the depth frame from the camera and choosing a specific depth value to isolate (plus or minus a margin of error). For every pixel in the depth frame, if it falls within the desired depth slice, I keep it; if it is closer or farther away, I set that pixel to 0 so that it is ignored.
    Second, I combine the new depth mask with the normal incoming video signal: if the pixel from the depth mask is greater than 0, I keep the video pixel; otherwise, I set the video pixel's alpha to 0.0 so that it is totally transparent. (Both steps are sketched in the code just below.)
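
To make the two steps concrete, here is a minimal sketch of the per-pixel logic in C#. It assumes the depth and color frames have already been copied into plain arrays of the same dimensions; the SDK-specific capture code (and the pixel-shader version used in the actual project) is omitted, and names like targetDepth and tolerance are just placeholders for the slider values described below.

```csharp
static class DepthKeyer
{
    // Applies the depth key to one frame.
    //   depth      - one depth value per pixel (same width/height as the video)
    //   colorBgra  - the video frame, 4 bytes per pixel: B, G, R, A
    //   targetDepth, tolerance - the depth slice to keep, plus/minus a margin
    public static byte[] Apply(ushort[] depth, byte[] colorBgra,
                               ushort targetDepth, ushort tolerance)
    {
        var result = new byte[colorBgra.Length];

        for (int i = 0; i < depth.Length; i++)
        {
            // Step 1: the mask. A pixel survives only if its depth falls
            // inside [targetDepth - tolerance, targetDepth + tolerance].
            bool inSlice = depth[i] >= targetDepth - tolerance &&
                           depth[i] <= targetDepth + tolerance;

            // Step 2: the composite. Keep the video pixel where the mask
            // is set; otherwise zero the alpha so it is fully transparent.
            int p = i * 4;
            result[p]     = colorBgra[p];      // B
            result[p + 1] = colorBgra[p + 1];  // G
            result[p + 2] = colorBgra[p + 2];  // R
            result[p + 3] = inSlice ? colorBgra[p + 3] : (byte)0;
        }

        return result;
    }
}
```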

Combine this with a background image, and we can send Mr. Gingerbread Man on a trip to the desert:

The upper left-hand corner is the normal video feed of G-Bread standing on his desk. To the right is the grayscale version of the simultaneous depth feed from the camera. Anything in black is either too close or too far away for the camera to perceive, but that is OK, because we only care about a particular slice of the mid-field here.

On the bottom left is the depth mask I created by selecting a specific depth slice. The sliders at the bottom of the screen let you easily adjust the desired depth and the tolerance (how thick a slice of depth to keep).

Finally, on the lower right is the composited image with a static background. As you can see, this is still a bit primitive: the incoming depth signal is somewhat noisy, and it isn't perfectly registered with the video image (the two cameras sit in slightly different positions). But it demonstrates that a cheap keying effect is possible without specialized hardware or sets.
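
For completeness, here is a rough sketch of how that final panel can be assembled in WPF, assuming the keyed frame has been written into a bitmap with per-pixel alpha (for example, the output of the Apply method above wrapped in a WriteableBitmap). The file name background.jpg is just a stand-in for the desert photo; because the keyed pixels carry alpha, stacking two Image elements is all the compositing required.

```csharp
using System;
using System.Windows.Controls;
using System.Windows.Media.Imaging;

static class CompositeView
{
    // Stacks the keyed video over a static background. Transparent pixels
    // in keyedBitmap let the background show through, so WPF does the
    // compositing for us.
    public static Grid Build(BitmapSource keyedBitmap)
    {
        var grid = new Grid();

        // Bottom layer: the static background image (placeholder file name).
        grid.Children.Add(new Image
        {
            Source = new BitmapImage(new Uri("background.jpg", UriKind.Relative))
        });

        // Top layer: the depth-keyed video frame.
        grid.Children.Add(new Image { Source = keyedBitmap });

        return grid;
    }
}
```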

The source code as a Visual Studio project is available here: KinectDepthSample

With thanks to Code Laboratories for their great SDK and managed libraries, and to Greg Schechter for his series of articles on leveraging GPU acceleration through pixel shaders in a managed environment.