Tuesday, September 20, 2011

Paper Reading #10: Sensing Foot Gestures from the Pocket

Reference Information
Sensing Foot Gestures from the Pocket
Jeremy Scott, David Dearman, Koji Yatani, Khai N. Truong
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Jeremy Scott was an undergraduate at the University of Toronto but now is a graduate student at MIT, where he works in the Multimodal Understanding Group and studies AI.
  • David Dearman is a PhD student at the University of Toronto. His interests are in context-aware computing, specifically using mobile devices.
  • Koji Yatani is a PhD candidate at the University of Toronto and previously worked for Microsoft Research in Redmond. He is interested in developing new sensing technologies. 
  • Khai N. Truong is an Associate Professor at the University of Toronto. He is interested in enhancing usability and holds a PhD in Computer Science from the Georgia Institute of Technology.

Summary
Hypothesis
How effective is foot-based input for a mobile device when the user receives no visual feedback?

Methods
The authors conducted an initial study of the efficacy of foot-based gestures involving lifting and rotating the foot. Participants selected targets by rotating from a start position along three axes of rotation: the ankle, heel, and toe. Ankle rotations were further subdivided into plantar flexion and dorsiflexion. Rotations were captured with a motion-capture system that tracked the foot as a rigid body to ensure consistent measurements. No visual feedback was provided when making a selection, though users were trained beforehand.

A second study logged accelerometer data points for further analysis.

A third study operated similarly to the first experiment, but used the authors' system instead, with three iPhones placed on the user. Again, a practice session preceded the test. Gestures were cross-validated in two ways: leave-one-participant-out, which trains on all participants but one and tests on the participant left out, and within-participant stratified cross-validation, which trains and tests on data from a single participant at a time.
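
To make the difference between the two validation schemes concrete, here is a minimal Python sketch, assuming a feature matrix X, gesture labels y, and an array of participant IDs with one row per recorded gesture; the scikit-learn utilities and the Gaussian Naive Bayes stand-in classifier are my own choices for illustration, not the authors' actual tooling.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

def leave_one_participant_out(X, y, participants):
    # Train on every participant but one; test on the participant left out.
    return cross_val_score(GaussianNB(), X, y, groups=participants, cv=LeaveOneGroupOut())

def within_participant(X, y, participants, folds=10):
    # Train and test on one participant's own data, using stratified folds.
    scores = []
    for pid in np.unique(participants):
        mask = participants == pid
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
        scores.append(cross_val_score(GaussianNB(), X[mask], y[mask], cv=cv).mean())
    return np.array(scores)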

Results
Each of the four rotations was analyzed separately. Closer targets were selected more quickly, but with a larger angular error. Ankle-based selection was less accurate than either toe or heel rotation, with heel rotation the most accurate. A probable cause of error was user fatigue. Users preferred rotations that stayed within normal physiological bounds. Raising the heel (plantar flexion) was more accurate than the other ankle rotation and was preferred by users. Rotating the foot produced roughly the same error whether pivoting on the toe or the heel, though participants preferred the toe-based gestures for comfort.

The accelerometer data suggested 34 features that could be used to distinguish gestures. These were categorized into time-domain features, computed over time intervals of the raw signal, and frequency-domain features, computed from samples in the frequency domain.
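
Since the post does not list the 34 features themselves, the sketch below only illustrates the two categories: a handful of hypothetical time-domain statistics computed directly on a window of 3-axis accelerometer samples, and a few frequency-domain statistics computed from its FFT. The feature names and sampling rate are my assumptions, not the paper's actual list.

import numpy as np

def extract_features(window, fs=100.0):
    # window: (n_samples, 3) array of x/y/z accelerometer readings for one gesture.
    # fs: sampling rate in Hz (assumed; the paper's rate may differ).
    feats = {}
    # Time-domain features: statistics over the raw samples.
    for i, axis in enumerate("xyz"):
        sig = window[:, i]
        feats["mean_" + axis] = sig.mean()
        feats["std_" + axis] = sig.std()
        feats["range_" + axis] = sig.max() - sig.min()
    feats["mean_magnitude"] = np.linalg.norm(window, axis=1).mean()
    # Frequency-domain features: statistics over each axis's spectrum.
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    for i, axis in enumerate("xyz"):
        spectrum = np.abs(np.fft.rfft(window[:, i] - window[:, i].mean()))
        feats["dominant_freq_" + axis] = freqs[spectrum.argmax()]
        feats["spectral_energy_" + axis] = float((spectrum ** 2).sum()) / len(spectrum)
    return feats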

The third study found that the system could classify ten different gestures correctly approximately 86% of the time. Within-participant classification tended to be more accurate. Placing the accelerometer on the side increased the chances of success. Considering the angle of rotation as well caused gestures to be confused with one another more often.

Contents
Eyes-free interaction devices already exist and use aural or vibrotactile feedback, the latter relying on small vibration motors. Some of these rely on accelerometers to detect gestures. Foot controls are less researched than voice or touch controls. The foot cannot perform particularly fine gestures, but it can perform coarse ones. Accelerometers can sense and infer a person's activity fairly accurately.

The authors distinguish foot gestures based on the axis of rotation and, in part, the direction of rotation. Their favored rotations, toe, heel, and plantar flexion, were subcategorized based on the angle of rotation. The remaining ankle rotation, dorsiflexion, was dropped based on the results of their first study.

The authors performed two studies to test the efficacy of foot-based gestures and developed a system that analyzes foot gestures through the accelerometer of a phone worn on the corresponding hip. Gesture recognition is performed through machine learning. The proposed gesture system relies on a 3-axis accelerometer; the user places their foot at an origin position and double-taps it to begin a gesture along a single axis. A Naive Bayes classifier labels the movements and was selected for its low time complexity.
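
As a rough sketch of how such a pipeline could fit together, the code below trains a Gaussian Naive Bayes model on feature vectors extracted from already-segmented gesture windows (reusing the hypothetical extract_features above) and then labels a new window; it is an illustration under those assumptions, not the authors' implementation.

from sklearn.naive_bayes import GaussianNB

def train_classifier(train_windows, train_labels):
    # Each window is a (n_samples, 3) accelerometer segment beginning at the double-tap.
    X = [list(extract_features(w).values()) for w in train_windows]
    clf = GaussianNB()
    clf.fit(X, train_labels)
    return clf

def classify_gesture(clf, window):
    # Return the predicted gesture label for one segmented window.
    x = [list(extract_features(window).values())]
    return clf.predict(x)[0]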

Discussion
The authors tested the efficacy of a foot-based gesture system. Their results suggested that within-participant stratified cross-validation, combined with a small set of foot rotations, could make for a fairly effective gesture recognizer. I was actually quite impressed by how much better the accuracy was than I expected. The principle, at least, convinced me of this study's validity.

I honestly tried to envision myself using this technology while sitting on a bus (for the sake of fairness, a relatively uncrowded one). The most important detail of that sentence is the word "sitting." I cannot begin to imagine the balancing nightmare that trying to use this technology while standing would provoke. Perhaps I'm just overly clumsy or maybe disinclined towards standing on one foot in general, but I cannot imagine myself using a foot-based gesture except while sitting. That is why when I read that the authors thought this could be useful for someone standing with their hands full, the practicality of the device suddenly plummeted. Its uses are far too few to make any real sort of difference in a user's life.
