Tuesday, November 8, 2011

Paper Reading #21: Human Model Evaluation in Interactive Supervised Learning

Reference Information
Human Model Evaluation in Interactive Supervised Learning
Rebecca Fiebrink, Perry R. Cook, Daniel Trueman
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Rebecca Fiebrink is an Assistant Professor in Computer Science and affiliated faculty in Music at Princeton. She received her PhD from Princeton.
  • Perry R. Cook is a Professor Emeritus in Princeton's Computer Science and Music Departments. He has authored a number of papers combining music and software in innovative ways.
  • Daniel Trueman is an Associate Professor of Music at Princeton. He has built a variety of devices, including hemispherical speakers and a device that senses bow gestures.

Summary
Hypothesis
Can interactive supervised learning be used effectively by end users to train an algorithm to correctly interpret musical gestures?

Methods
The authors conducted three studies. The first (A) followed composers who created new musical instruments using incrementally updated versions of the software. The instruments drive digital audio synthesis in real time, and the input devices varied.

The second study (B) asked students to design a system that would classify gestures and map them to different sounds, a task similar to the first study. The students then performed a piece through the interactive systems they had built.

The third study (C) involved working with a cellist to build a real-time gesture recognizer for bow movements. The cellist labeled bow gestures to provide the initial training data.

Results
Each study found that users adjusted their training sets and built them incrementally. Cross-validation was used only occasionally in the latter two studies and never in the first; those who did use it found that high cross-validation accuracies were reliable clues that the model was working. Direct evaluation was used far more commonly in every study and in a wider set of circumstances than cross-validation. It checked correctness against user expectations, suggested that users assigned a higher "cost" to certain kinds of incorrect output (A, B, C), assisted in transitioning between labels for parts of the bow (C), let users treat model confidence as an indicator of algorithm quality (C), and made it clear that complexity was sometimes desired, as users explicitly acted to add more complexity to the model (A).

In the third study, the cellist used direct evaluation to subjectively grade the system after each training run. Interestingly, higher cross-validation accuracy did not always correspond to higher subjective ratings.

Users learned to produce better training data, learned what could be accomplished with a supervised machine learning system, and found possible improvements to their own techniques (C). Users rated the Wekinator very highly.

Contents
Machine learning researchers have recently become interested in integrating humans into the learning process to produce better and more useful algorithms. One approach is supervised learning, which trains software on example inputs and outputs. The learned model generalizes from those examples, allowing the algorithm to produce a likely result for inputs not in the data set. Building good data sets is difficult, so cross-validation can be used to partition the available data into training and held-out portions, giving an estimate of how well the model will handle new inputs. Human interaction can be added to the machine learning process, especially to judge the validity of outputs.
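
As a rough illustration of the cross-validation idea described above (a minimal sketch with made-up "gesture feature" data; the k-NN model and feature dimensions are my own assumptions, not the authors' setup):

```python
# Minimal sketch of k-fold cross-validation on synthetic gesture features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Pretend each example is a 6-dimensional feature vector extracted from a
# gesture, labelled with one of three target sounds (classes 0, 1, 2).
y = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 6)) + y[:, None]

# 5-fold cross-validation: repeatedly hold out one fifth of the examples,
# train on the rest, and score the held-out fold.
model = KNeighborsClassifier(n_neighbors=3)
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

A high mean accuracy here only says the model fits data like the examples it was given; it says nothing about inputs the user never demonstrated, which is exactly the gap direct evaluation fills.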

The authors chose gestures as their inputs, both manual gestures and cello bow movements. The goal was not only to model gestures correctly but also to allow real-time performance with the system. Gesture modelling suits interactive supervised learning because the algorithm must be trained to the individual performer, who can then adjust based on the results. Their software is called the Wekinator. The system supports both cross-validation and direct evaluation, which let users learn implicitly how to use the machine learning system more effectively without any explicit background in machine learning. Even with cross-validation, a training set might be insufficiently general, and generalization is paramount. Since the training data is user-performed, the data set might cover only a narrow range of inputs, which is not ideal.
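
To make the incremental train-evaluate-retrain workflow concrete, here is a hedged sketch of the loop the paper describes. The functions record_gesture, play_through_model, and user_is_satisfied are hypothetical placeholders for the real-time input and audio side, not the Wekinator's actual API, and the 1-nearest-neighbour model is just an example.

```python
# Sketch of an interactive supervised learning loop: the performer records
# labelled examples, a model is retrained, and the performer judges it by
# playing through it (direct evaluation) before adding more data or stopping.
from sklearn.neighbors import KNeighborsClassifier

def interactive_training_loop(record_gesture, play_through_model, user_is_satisfied):
    examples, labels = [], []
    while True:
        # 1. The performer records a few new labelled gesture examples.
        for features, label in record_gesture():
            examples.append(features)
            labels.append(label)

        # 2. Retrain on everything gathered so far (the incrementally built set).
        model = KNeighborsClassifier(n_neighbors=1).fit(examples, labels)

        # 3. Direct evaluation: run the model live and let the performer judge it.
        play_through_model(model)
        if user_is_satisfied():
            return model
```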

Discussion
The authors wanted to know if interactive supervised machine learning could provide a viable means of handling music-related gestures. Their three studies found that users could fairly easily use the software to produce a valid model for the task at hand. As such, I am convinced that interactive supervised machine learning is viable for at least this case, if not for more general ones.

This generality of application is something that I hope to see addressed in future work. The participants needed to be experienced in a musical programming language before they could even start to train the software. I wondered whether a string player who has a pathological fear of Pure Data (and its less-free cousin, which the authors used) could learn to use a variation of the software. If so, perhaps this concept can be extended to less technically minded users.
