PrimeSense technology is behind Microsoft’s Kinect sensor system, which is designed to work with Microsoft’s Xbox 360 console gaming system (Fig. 1). The theory of operation is simple, but its execution can be complex. PrimeSense’s PS1080 system-on-a-chip (SoC) handles it all.
The chip manages audio and visual information independently, and both are accessible via the USB connection. USB alone is enough to power PrimeSense’s unit, but the Kinect requires additional power for its servos.
Most designers will at least be familiar with the way Kinect works with the Xbox. It provides a game with information about the players, who are located in front of the television with the Kinect facing them. The players move and gesture to interact with the game. How the Kinect gets this information is very interesting.
Prior to the Kinect, gesture recognition like this would be accomplished using LIDAR (light detection and ranging), also known as laser radar. Ultrasonic sensors do not have the accuracy. Another approach is to use image analysis, but that’s algorithmically complicated and computationally expensive.
PrimeSense uses a different approach. It projects a pattern of IR dots from the sensor and detects them using a conventional CMOS image sensor with an IR filter. The pattern will change based upon objects that reflect the light. The dots will change size and position based on how far the objects are from the source.
The PS1080 takes the results from the image sensor and determines the differences to generate a depth map. The resolution of the depth map is 640 by 480 (VGA), but the CMOS sensor has a much higher resolution. The hardware can actually capture a 1600-by-1200 image, which is necessary to produce the depth map. Otherwise, there would be insufficient resolution to detect changes in the position and size of the projected IR dots.
The chip does the heavy lifting in identifying the dots and translating their state into a depth value. This is not a simple task, nor is it something the typical micro can handle. Luckily, the PS1080 can do this at 30 frames/s. Multiple dots are typically found within an area represented by one pixel.
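The PS1080’s actual algorithm is not described here. As a rough illustration of how a dot’s shift can encode distance, the textbook structured-light triangulation relation can be sketched as follows. The focal length, projector-to-sensor baseline, and reference-plane distance are hypothetical values chosen for the example, not PrimeSense specifications.

```python
# Sketch of textbook triangulation, NOT the PS1080's proprietary method.
# A dot seen at a known reference distance serves as the zero-shift case;
# a dot that appears shifted by `shift_px` pixels lies nearer or farther.
# All default parameter values below are assumptions for illustration.

def depth_from_shift(shift_px, focal_px=580.0, baseline_m=0.075, ref_depth_m=2.0):
    """Return distance in meters for a dot shifted `shift_px` from its
    reference position (positive shift = closer than the reference plane)."""
    inv_depth = 1.0 / ref_depth_m + shift_px / (focal_px * baseline_m)
    if inv_depth <= 0:
        raise ValueError("shift implies an object at or beyond infinity")
    return 1.0 / inv_depth


def shift_from_depth(depth_m, focal_px=580.0, baseline_m=0.075, ref_depth_m=2.0):
    """Inverse relation: expected dot shift in pixels for a given distance."""
    return focal_px * baseline_m * (1.0 / depth_m - 1.0 / ref_depth_m)


if __name__ == "__main__":
    for shift in (-5.0, 0.0, 5.0, 20.0):
        print(shift, "px ->", round(depth_from_shift(shift), 3), "m")
```

Because the shift is proportional to the difference of inverse distances, a fixed one-pixel error costs more depth accuracy for far objects than for near ones, which is one reason a sensor with more pixels than the output depth map is useful.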
The minimum range is just under a meter (0.8 m), and the maximum depth is about 3.5 m. This matches the target gaming audience that would be found in front of an HDTV. The field of view is a rectangular cone spanning 58° horizontally by 45° vertically.
Resolution and the quality of detection depend upon the position of an object with respect to the sensors, but they tend to be sufficient for gameplay and for robot object recognition and avoidance. At 2 m, the depth resolution is 10 mm, while the horizontal and vertical resolution is 3 mm.
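The field-of-view and resolution figures can be cross-checked with simple trigonometry. The sketch below computes how large an area the sensor covers at a given distance and how much of that area each depth pixel represents, assuming a VGA depth map of 640 by 480 pixels and an ideal pinhole model (both assumptions for the example).

```python
import math

# Sketch under stated assumptions: ideal pinhole geometry, 58 x 45 degree
# field of view, and a 640 x 480 (VGA) depth map.

def footprint_m(distance_m, fov_deg):
    """Width (or height) of the viewed area at `distance_m` for a given angle."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def per_pixel_mm(distance_m, fov_deg, pixels):
    """Size in millimeters covered by one pixel at `distance_m`."""
    return 1000.0 * footprint_m(distance_m, fov_deg) / pixels


if __name__ == "__main__":
    d = 2.0  # meters
    print("coverage:", round(footprint_m(d, 58.0), 2), "m by",
          round(footprint_m(d, 45.0), 2), "m")
    print("per pixel:", round(per_pixel_mm(d, 58.0, 640), 2), "mm (H),",
          round(per_pixel_mm(d, 45.0, 480), 2), "mm (V)")
```

At 2 m this works out to a viewed area of roughly 2.2 m by 1.7 m and about 3.5 mm per pixel in each direction, which is in line with the quoted 3-mm figure.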
The visible video sensor and depth CMOS sensor are located next to each other, enabling the depth map to be merged with the color image. The PS1080 performs a registration process so the color image (RGB) and depth (D) information are aligned properly. The RGBD information is what’s available to the host.
The depth information alone can be handy for a robot that may need to avoid an object. It’s sometimes sufficient in gameplay depending upon what actions are being taken and how many players are being tracked. This RGBD information can be further analyzed, allowing a system to identify objects and how they might be related, such as a hand-arm-body relationship.
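The registration step itself happens inside the PS1080 and its details are not given here. Conceptually, though, aligning two side-by-side cameras amounts to a reprojection, which can be sketched as follows. The intrinsics (fx, fy, cx, cy), the 25-mm sensor offset, and the assumption that both cameras share the same intrinsics and differ only by a sideways translation are all hypothetical simplifications.

```python
# Sketch of depth-to-color registration for two parallel pinhole cameras.
# Every default value is an illustrative assumption, not a Kinect parameter.

def register_pixel(u, v, depth_m, fx=580.0, fy=580.0, cx=320.0, cy=240.0,
                   offset_x_m=0.025):
    """Map depth-image pixel (u, v) with distance `depth_m` to the
    corresponding pixel coordinates in the color image."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Back-project the depth pixel to a 3D point in the depth camera frame.
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    # Express the point in the color camera frame (pure sideways translation).
    x_color = x - offset_x_m
    # Project into the color image using the same (assumed) intrinsics.
    u_color = fx * x_color / depth_m + cx
    v_color = fy * y / depth_m + cy
    return u_color, v_color


if __name__ == "__main__":
    for z in (0.8, 2.0, 3.5):
        print(z, "m ->", register_pixel(320.0, 240.0, z))
```

The pixel offset between the two images shrinks as distance grows, so a single fixed shift cannot align the whole frame; the depth value of each pixel is needed to place it correctly in the color image.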
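As a minimal sketch of the depth-only robot case, the function below scans a depth map for the nearest reading inside the sensor’s stated 0.8-m to 3.5-m working range. The data layout (a list of rows holding distances in meters, with out-of-range values treated as invalid) and the 1-m stop threshold are assumptions made for the example.

```python
# Sketch: obstacle check on a depth map, using depth information only.
# The depth-map layout and the stop threshold are hypothetical.

def nearest_obstacle(depth_map, min_m=0.8, max_m=3.5):
    """Return (distance_m, row, col) of the closest valid reading, or None."""
    nearest = None
    for row_idx, row in enumerate(depth_map):
        for col_idx, dist in enumerate(row):
            if not (min_m <= dist <= max_m):
                continue  # outside the working range, treat as no reading
            if nearest is None or dist < nearest[0]:
                nearest = (dist, row_idx, col_idx)
    return nearest


def too_close(depth_map, stop_distance_m=1.0):
    """True if any valid reading is at or inside the stop distance."""
    hit = nearest_obstacle(depth_map)
    return hit is not None and hit[0] <= stop_distance_m


if __name__ == "__main__":
    frame = [
        [0.0, 2.9, 3.1],
        [2.5, 0.9, 3.0],
        [2.4, 2.6, 9.9],
    ]
    print(nearest_obstacle(frame), too_close(frame))
```

Identifying what the obstacle is, or how body parts relate to one another, needs the further analysis of RGBD data that the host performs.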
The host handles higher-level object and action recognition. The OpenNI (natural interaction) organization provides a framework and application programming interface (API) for dealing with devices like Kinect. OpenNI addresses a range of devices including visual and audio devices (Fig. 3). It also deals with higher-end middleware for performing functions such as object tracking.
The Kinect has a pair of microphones built into the system. The PS1080 can handle four external digital audio sources as well. It delivers visible video, depth, and audio information in a synchronized fashion via the USB interface.
Microsoft has recognized that the Kinect is going to reach beyond the Xbox. The interface was hacked early on, and Microsoft has now released a software development kit (SDK) for it. A commercial version may be in the works, and PrimeSense also has a hardware development kit.
The technology is scalable, making it interesting for non-game and possibly non-robotic applications. For example, accurate proximity detection may not require the visible video portion. PrimeSense technology is definitely going to change the way the world works.