Reference Information
Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces
Andrew D. Wilson and Hrvoje Benko
Presented at UIST'10, October 3-6, 2010, New York, New York, USA
Author Bios
Andrew D. Wilson is a senior researcher at Microsoft Research with a PhD from MIT's Media Lab. He helped found the Surface Computing group.
Hrvoje Benko is a researcher in Microsoft Research's Adaptive Systems and Interactions group. He has a PhD from Columbia University and an interest in augmented reality.
Summary
Hypothesis
How do depth cameras enable new interactive experiences? Can LightSpace make ordinary room surfaces, and the space between them, support interaction and visualization?
Methods
The authors demonstrated the prototype over a three-day demo to more than 800 people, who were free to try the system however they wished.
Results
The system worked best with no more than six users at a time, both for correct interactions and to keep it running smoothly. Users could block each other from the cameras, precluding interaction. Holding a virtual object in the hand proved difficult. Users also discovered new methods of interaction while using the system.
Contents
The authors produced a room-sized system of projectors and depth cameras that combines interactive displays, augmented reality, and smart rooms into a new interaction system. Depth cameras make real-time 3D modelling inexpensive. LightSpace's primary themes are that everything is a surface (building on ubiquitous computing), the room is the computer (reminiscent of smart rooms), and the body can be a display (related to ubiquitous projection). Multiple calibrated depth cameras and projectors share a single coordinate system, which lets the system project graphics onto the surface of any object, even a moving one, with no physical markers. Because the cameras produce a 3D mesh of the room, 2D image processing can be applied to views of that mesh, allowing once non-interactive objects to act like a Microsoft Surface. Users can hold a virtual object in the hand; the system projects a red ball representing the object onto the body. Objects can be moved from surface to surface when the user touches both surfaces simultaneously. Menus rely on the ability to precisely detect where the user is standing.
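To make the shared coordinate system concrete, here is a minimal sketch (not the authors' code) of how a depth-camera pixel might be lifted into world coordinates and then mapped through a projector, so graphics land on whatever surface that pixel lies on. All calibration values (intrinsics, the camera-to-world transform, the projector matrix) are illustrative placeholders.

```python
import numpy as np

def depth_pixel_to_world(u, v, depth_m, fx, fy, cx, cy, cam_to_world):
    """Back-project pixel (u, v) with depth in meters into world coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    p_cam = np.array([x, y, depth_m, 1.0])
    return cam_to_world @ p_cam          # 4x4 camera-to-world transform

def world_to_projector_pixel(p_world, proj_matrix):
    """Project a world point through a 3x4 projector matrix (a projector acts like an inverse camera)."""
    uvw = proj_matrix @ p_world
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Placeholder calibration: camera at the world origin, identity-like projector.
cam_to_world = np.eye(4)
proj_matrix = np.hstack([np.eye(3), np.zeros((3, 1))])
p = depth_pixel_to_world(320, 240, 1.5, fx=570.0, fy=570.0, cx=320.0, cy=240.0,
                         cam_to_world=cam_to_world)
print(world_to_projector_pixel(p, proj_matrix))
```

With real calibration in place of the placeholders, any tracked 3D point can be lit by whichever projector covers it, which is what removes the need for physical markers.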
The three depth cameras and three projectors are aligned to cover the space well. An infrared camera computes depth from the distortion of a projected pattern in the captured image. The live depth image is compared against measurements taken when no users are in the room to determine where users are. Depth cameras are calibrated before the projectors, using retro-reflective dots on a grid to recover the 3D camera pose. Interactive surfaces are manually designated and must be flat, rectangular, and non-moving. The cameras produce a 3D mesh of the room, but virtual cameras render it into 2D images so the computation stays efficient. LightSpace combines three virtual camera views and can distinguish hands, but not yet fingers. The plan (top-down) view helps determine which bodies and surfaces are connected. Picking up an object is modeled like a ball resting on a surface, which avoids a complex physics engine. Menu items are placed using the user's head position for readability, and the menu serves as a widget for this system.
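The two steps that struck me as most reusable are the empty-room comparison and the plan view. Below is a minimal sketch, under my own assumptions rather than the paper's implementation, of segmenting foreground depth pixels against a stored background map and binning the resulting 3D points into a top-down occupancy grid; the threshold and room extents are made-up values.

```python
import numpy as np

def segment_foreground(depth, background, threshold_m=0.05):
    """A pixel is foreground if it is meaningfully closer than the empty-room depth."""
    valid = (depth > 0) & (background > 0)
    return valid & (background - depth > threshold_m)

def plan_view(points_xyz, room_x=(0.0, 4.0), room_z=(0.0, 4.0), cells=128):
    """Bin world-space points into a top-down grid (x across the room, z into it)."""
    grid = np.zeros((cells, cells), dtype=np.int32)
    xi = ((points_xyz[:, 0] - room_x[0]) / (room_x[1] - room_x[0]) * (cells - 1)).astype(int)
    zi = ((points_xyz[:, 2] - room_z[0]) / (room_z[1] - room_z[0]) * (cells - 1)).astype(int)
    keep = (xi >= 0) & (xi < cells) & (zi >= 0) & (zi < cells)
    np.add.at(grid, (zi[keep], xi[keep]), 1)
    return grid
```

Running connected components on such a grid is one cheap way to tell that a hand touching one surface and a hand touching another belong to the same person, which is the kind of 2D reasoning the virtual camera views make possible.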
Discussion
As a proof of concept, the authors set out to show that LightSpace could work, and for the most part they succeeded. While some of the flaws found in the limited user testing are enough to make me worry about the usability of this technology, I found that the authors adequately demonstrated that the system is in fact feasible.
I was very concerned about users blocking the camera and thus being locked out of the system. I am honestly surprised that that never came up during the design process. Requiring users to be aware of where the cameras are at all times seems to me to completely miss the point of having a smart room. Perhaps more cameras could make this system more robust. The cost is also a little prohibitive at the moment, but the authors are correct to mention that the release of the Kinect may cause depth cameras to decrease in cost, allowing for more widespread use.
I could see this work being combined with the multitoe system for a fully interactive room. The feet could operate one part of the system (maybe a keyboard) while the user's hands interact with the foot-driven material. If the user moved to another surface, the foot menus would follow for ease of access. The system would need some way of telling when a user was ready to input with their feet, which could borrow the double-tap from the pocket-based foot gesture system.