Thursday, December 1, 2011

Paper Reading #28: Experimental Analysis of Touch-Screen Gesture Designs in Mobile Environments

Reference Information
Experimental Analysis of Touch-Screen Gesture Designs in Mobile Environments
Andrew Bragdon, Eugene Nelson, Yang Li, Ken Hinckley
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Andrew Bragdon is a PhD student at Brown University. He is currently researching gestural user interfaces.
  • Eugene Nelson is a graduate student at Brown University.
  • Yang Li is a senior research scientist at Google and was a research associate at the University of Washington. He holds a PhD in Computer Science from the Chinese Academy of Sciences. He is primarily interested in gesture-based interaction, with many of his projects implemented on Android.
  • Ken Hinckley is a principal researcher at Microsoft Research. He holds a PhD in Computer Science from the University of Virginia.


Summary
Hypothesis
How do situational impairments impact touch-screen interaction? Will soft buttons perform less effectively in non-ideal environments? What is the best combination of moding techniques and gesture type? How do varying distraction levels affect usage?

Methods
The users were moderately comfortable with computing. The phones ran Android 2.1, with the built-in hard buttons disabled for the test. Eye gaze data was recorded to determine when eye movements started and stopped, with the display and phone separated sufficiently to make those movements identifiable. To simulate expert usage, icons appeared to reveal the needed gesture or command. Feedback was immediately relayed to the user.

The authors used a repeated-measures within-participants experimental design. Users completed a questionnaire. Then, they completed each of the 12 commands six times per environment. The condition order was counterbalanced through randomization. The variables considered were completion time, mode errors, command errors, and baseline and concurrent distractor task performance. Gestures were recorded. The users then completed another questionnaire.

Results
The time required depended on the environment, but there was no significant technique x environment interaction. Bezel marks took the least amount of time. Hard button marks and soft buttons took about the same time, and bezel paths and soft buttons were also about even. Bezel paths were dramatically faster than hard button paths. Bezel marks and soft buttons performed about the same in the direct, distraction-free environment, but in all other cases bezel marks were faster. For soft buttons, the only performance differences came from the direct versus indirect environments, while bezel marks performed the same regardless of direct or indirect. Bezel paths showed fairly consistent results across environments, though there was a non-significant increase from direct to indirect.

Hard button marks and bezel marks had the highest accuracy, followed by soft buttons. Hard button paths and bezel paths had the lowest accuracy, and there were more errors for paths than for marks. The environment had minimal effect on accuracy. When sitting, soft buttons performed significantly worse than bezel marks, and the same held for walking. Soft buttons had the worst normalized distance from the target, while bezel marks had the lowest mean. There was no significant difference in glances among the gesture techniques.

Most users preferred soft buttons for their performance in direct sitting tests, and most liked hard button paths the least. In indirect sitting tests, users were split between bezel marks and hard button marks, with most liking soft buttons the least. Tasks involving distraction produced the same results as the sitting indirect test. Overall, most people preferred hard button marks and bezel marks. Gesturing begins sooner with bezel moding than with hard buttons.

The gesture error rate was high enough to merit random visual sampling of the recordings, which confirmed that the recognizer was functional. The difficulty of gesturing with thumbs, along with speed, is the probable source of errors. Block number had a significant effect on completion time, with later trials usually performing better. Mark-based gestures can be used efficiently one-handed, and bezel marks were the fastest overall.

Contents
Touchscreens increasingly use soft buttons instead of hard buttons. While these are effective in an ideal, distraction-free environment, users frequently use touchscreens in non-ideal environments; the distractions present in these environments are situational impairments. Gestures rely on muscle memory, which allows users to focus on their task, but visual feedback is still a problem. The authors considered situational impairments as composed of two factors: motor activity and distraction level. To examine motor activity, the authors tested users while sitting and while walking. Distraction level was tested with no distraction, a light situational-awareness distraction, and a distraction that relied heavily on attention. They tested moding and gesture type as factors of gesture design, and they considered both mark and free-form path gestures.

Previous studies found that executing actions should not demand a lot of visual attention and that walking is not a significant impediment to keyboard input. One study created a system that used the bezel as a way to select modes. Another inspired the mark gestures in this paper. One found that users prefer using phones with one hand.

The moding techniques were hard button-initiated gestures with a button that has a different feel from the bezel, bezel gestures (which start a gesture by swiping through the bezel), and soft buttons similar in size to typical phone buttons.
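
To make the bezel moding concrete, here is a minimal sketch of how a system might decide that a stroke is bezel-initiated: a touch whose first sensed point falls within a thin band at the screen edge is treated as the start of a gesture, while other touches are handled normally. The screen size, band width, and function name below are invented for illustration and are not from the paper.

SCREEN_W, SCREEN_H = 480, 800   # example screen size in pixels (hypothetical)
BEZEL_BAND = 16                 # width of the edge band treated as "bezel" (hypothetical)

def touch_mode(x, y):
    """Return 'gesture' if a touch-down point starts in the bezel band,
    otherwise 'normal' (taps, soft buttons, etc.)."""
    in_band = (x < BEZEL_BAND or x > SCREEN_W - BEZEL_BAND or
               y < BEZEL_BAND or y > SCREEN_H - BEZEL_BAND)
    return "gesture" if in_band else "normal"

print(touch_mode(5, 400))    # 'gesture': stroke starts at the left edge
print(touch_mode(240, 400))  # 'normal': stroke starts mid-screen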

The two types of gestures were rectilinear mark-based gestures and free-form path gestures. Mark-based gestures use axis-aligned, rectilinear line segments and are quick to execute; the authors used axis-aligned segments to maximize recognition accuracy. The authors also implemented 12 free-form path gestures that were simple to execute, were not axis-aligned, and could be easily recognized. Nine of these were gestures used in previous work. The free-form gestures were recognized with the default Android OS recognizer.

The environment was composed of motor activity and distraction level. Motor activity was divided into sitting and walking on a motorized treadmill, and there were three distraction levels. The tasks involving no distraction required no eye movement away from the phone to read the task. The moderate situational-awareness task involved users watching for a circle to appear on another screen while using the phone. The attention-saturating task (AST) involved keeping a moving circle centered on a crosshair. The authors did not combine sitting with the AST or walking with no distraction, since those combinations are improbable in real life.

Discussion
The authors wanted to evaluate different conditions involved in making a selection on a phone involving either gestures or soft buttons. They found that users preferred either the hard button mark or bezel mark technique, which were the fastest overall.

I was a little concerned that the authors noted that sitting with the AST reflects driving while using a phone. I appreciated that they recommended against such behavior, but the fact that they tested it nonetheless was still bothersome, especially since they dropped other combinations for being improbable.

It was surprising that gestures were faster than soft buttons, but the time required to find the soft button in question explains that result. I would like to see future work on how to use mark gestures in a real application.

Sunday, November 27, 2011

Paper Reading #26: Embodiment in Brain-Computer Interaction

Reference Information
Embodiment in Brain-Computer Interaction
Kenton O'Hara, Abigail Sellen, Richard Harper
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Kenton O'Hara is a Senior Researcher in the Socio-Digital Systems Group at Microsoft Research in Cambridge. He studies Brain-Computer Interaction.
  • Abigail Sellen is a Principal Researcher at Microsoft Research Cambridge and co-manages the Socio-Digital Systems group. She holds a PhD in Cognitive Science from the University of California at San Diego.
  • Richard Harper is a Principal Researcher at Microsoft Research Cambridge and co-manages the Socio-Digital Systems group. He has published over 120 papers.


Summary
Hypothesis
What are the possibilities and constraints of Brain-Computer Interaction in games and as a reflection of society?

Methods
The study used MindFlex. Participants were groups of people who knew each other, assembled through a host participant. The hosts were given a copy of the game to take home and play at their discretion. Each game session was video recorded, and the groups were fluid. The videos were analyzed for physical behavior to explain embodiment and social meaning in the game.

Results
Users deliberately adjusted their posture to relax and gain focus, and these responses were dynamic, based on how the game behaved. Averting gaze was typically used to lower concentration, and users sometimes physically turned away to achieve the correct game response. Despite the lack of visual feedback, the decreased whirring of the fans indicated the game state. The players created a narrative fantasy on top of the game's controls to explain what was occurring, adding an extra source of mysticism and engagement. The narrative is not necessary to control the game, but it persists even when it is contrary to the game's actual function (e.g., thinking about lowering the ball increases concentration). Embodiment plays a key role in mental tasks.

How and why people watched the game was not entirely clear, especially when players did not physically manifest their actions. In one case, when a watcher attempted to give instructions to the player, the lack of a response from the player prompted further instructions, which distracted the player too much. The watcher interpreted the player's mind from the game effects, which can be misread. Players sometimes volunteered their mindset through gestures or words; sometimes these were a performance for the audience. The spectatorship was a two-way relationship, with the audience interacting with the players. These interactions tended to be humorous remarks to and about the player and could be purposefully helpful or detrimental. The nature of the interaction reflected real-world relationships between people.

Contents
Advances in non-invasive Brain-Computer Interaction (BCI) allow for its wider usage in new applications. Recently, commercial products using BCI have appeared, but the design paradigms and constraints for them are unknown. Initial work in BCI was largely philosophical but later moved toward the observable. The nature of embodiment suggests that understanding BCI relies on understanding actions in the physical world; these actions influence both BCI readings and social meaning.

Typical prior studies used BCI as a simple control or took place in a closed environment. One study took place outside of a lab and suggested that the environmental context allows for further ethnographic work. One study worried less about efficacy and more about the experience. These studies noted that the games were a social experience.

To explore embodiment, the authors used a BCI-based game called MindFlex. MindFlex is a commercially available BCI game that uses a wireless headset with a single electrode. It uses EEG to measure relative levels of concentration, with higher concentration raising a ball and lower concentration letting it fall, via a fan.
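
As a rough illustration of the control loop the game implies, the sketch below maps an EEG-derived concentration estimate to a fan duty cycle; the 0-100 scale, function names, and timing are all invented, since the toy's actual interface is not documented in the paper.

import random
import time

def concentration_to_fan(concentration, min_duty=0.0, max_duty=1.0):
    """Map a 0-100 concentration estimate to a fan duty cycle."""
    return min_duty + (max_duty - min_duty) * (concentration / 100.0)

def game_loop(read_concentration, set_fan, steps=3):
    for _ in range(steps):
        level = read_concentration()             # headset reading, 0-100
        set_fan(concentration_to_fan(level))     # more focus -> stronger airflow -> ball rises
        time.sleep(0.1)

# simulated run with random "EEG" readings
game_loop(lambda: random.uniform(0, 100),
          lambda duty: print(f"fan duty cycle: {duty:.2f}"))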

The authors recommended that a narrative that maps correctly to controls should be used. Physical interaction paradigms could be deliberately beneficial or detrimental, depending on the desired effect.

Discussion
The authors wanted to discover possible features and problems with BCI. Their study identified a few of these, but this paper felt far too much like preliminary results for me to agree that these are the only options and flaws. I am convinced that the things the authors identified fall into the categories of constraints and uses, but I am uncertain how representative their findings are of the entire set of each.

I had not considered the design paradigms behind BCI games before reading this paper, so the reflection of social behavior found in them was particularly interesting.

I would like to see more comprehensive data on how social relationships factor into the playing of the games. The anecdotes provided by the authors were interesting, but the lack of statistics limited the ability to determine the frequency with which identified behaviors occurred.

Thursday, November 24, 2011

Paper Reading #25: TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration

Reference Information
TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration
Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, Robert C. Miller
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Adam Marcus is a graduate student at MIT's Computer Science and Artificial Intelligence Lab. He is a member of the Database and Haystack groups.
  • Michael S. Bernstein is a graduate student at MIT's Computer Science and Artificial Intelligence Lab. His work combines crowd intelligence with computer systems.
  • Osama Badar is an undergraduate student at MIT.
  • David R. Karger is a professor in MIT's Computer Science and Artificial Intelligence Lab. His work focuses on information retrieval.
  • Samuel Madden is an Associate Professor with MIT's Computer Science and Artificial Intelligence Lab. His work is in databases.
  • Robert C. Miller is an Associate Professor with MIT's Computer Science and Artificial Intelligence Lab. His work centers on web automation and customization.


Summary
Hypothesis
How functional is a streaming algorithm that allows for real-time searches of Twitter for an arbitrary subject?

Methods
To test the algorithm, tweets from three soccer games and one month of earthquakes were collected. The soccer games were annotated by hand to produce a truth set, while the earthquake data was checked against the US Geological Survey. The authors measured precision and recall: precision tests how many of the events detected by the algorithm were in the truth set, while recall tests how many events in the truth set were detected by the algorithm.
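
For concreteness, this is a minimal sketch of how detected events could be scored against a truth set once detections have been matched to annotated events; the set-based matching is a simplification of the paper's evaluation, and the identifiers are made up.

def precision_recall(detected, truth):
    """detected, truth: sets of matched event identifiers."""
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# example: 5 detections, 4 of them real; 6 real events in the truth set
p, r = precision_recall({"e1", "e2", "e3", "e4", "x1"},
                        {"e1", "e2", "e3", "e4", "e5", "e6"})
print(p, r)   # 0.8 0.666...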

The authors then tested if users could understand the UI. Initially, users were asked to perform arbitrary tasks with the program, with usability feedback gathered. The second task was exploration-based with a time limit. Given an event, users had a little time to dictate a news report on the event. The authors then interviewed the users, one of whom is a prominent journalist.

Results
The algorithm has high recall and usually only failed to detect an event if the Twitter data lacked a peak. Precision was high for soccer but suffered false positives for major earthquakes because the algorithm flagged minor earthquakes too; if minor earthquakes were counted, precision increased to 66%. Twitter's interests, or the lack thereof, introduce a bias to the data. Sometimes a single earthquake produced multiple peaks or two earthquakes overlapped.

Users successfully reconstructed events quickly. Some users felt the information was shallow, but it allowed for high-level understanding. Users focused on the largest peak first. Users liked the timeline, but did not trust the sentiment analysis. The map needed aggregation. Without the system, users would have relied on a news site or aggregator. The journalist suggested that TwitInfo would be useful for backgrounding on a long-term topic. Her opinions were largely similar to the other users.

Contents
Twitter allows for a live understanding of public consciousness but is difficult to distill into a clear timeline. Past attempts to create visualizations are generally domain-specific and use archived data. The authors developed TwitInfo, which takes a search query for an event, identifies and labels event peaks, produces a visualization of long-running events, and aggregates public opinion in a graphical timeline. It identifies subevents and summarizes sentiment through signal processing of social streams. The data is normalized to produce correct results, and the results can be sorted geographically or aggregated to suggest both localized and overall opinions.

A previous paper found that Twitter bears more in common with a news media site than a social network. Other papers covered hand-created visualizations of events. One paper discussed aggregated opinion on microblogs. Some research involved searching for specific events.

TwitInfo events are defined by a user's search terms. Once an event is defined, the system begins logging tweets that match the query and creates a page that updates in real time and is used to monitor data. The event timeline plots the volume of tweets matching the criteria. Spikes in the volume are labelled as peaks with varying granularity and annotated with relevant descriptors, and peaks can be used to form their own timelines. The actual tweets within the event timeframe are sorted by event or peak keywords and color-coded based on perceived sentiment, with overall sentiment displayed in a pie chart. Sentiment is derived from a Naive Bayes classifier. The classifier reduces the effect of bias by predicting the probability that a tweet is negative or positive and then recall-normalizing the data. The top three most common URLs are listed in the Popular Links panel, and another panel shows geographic distribution.
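
My reading of the recall normalization is roughly the following: raw positive and negative counts are corrected by each class's measured recall before being turned into the proportions shown in the pie chart. The sketch below is an illustrative guess at that computation, not the authors' code, and the counts and recall values are invented.

def normalized_sentiment(counts, recalls):
    """counts: raw per-class tweet counts from the classifier, e.g. {'pos': n, 'neg': m};
    recalls: per-class recall measured on labeled data."""
    adjusted = {c: counts[c] / recalls[c] for c in counts}   # undo the classifier's per-class bias
    total = sum(adjusted.values())
    return {c: adjusted[c] / total for c in adjusted}        # proportions for the pie chart

# example: the classifier finds positives more reliably than negatives
print(normalized_sentiment({"pos": 700, "neg": 300}, {"pos": 0.9, "neg": 0.6}))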

Peaks are calculated from tweet frequency maxima in a user-defined period of time. Tweets are sorted into bins based on time, and, in a method similar to TCP's congestion detection, bins with an unusually large number of tweets are found. The method uses hill-climbing to identify the appropriate peak window, which is then labeled. If a user searches for a noisy term, IDF normalization accounts for the term's global popularity to lessen its effect.
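
The description above maps naturally onto TCP's timeout estimator, which keeps an exponentially weighted mean and mean deviation of a signal. The sketch below flags bins whose tweet counts jump well above the running mean; the smoothing constant, threshold, window expansion, and hill-climbing details are simplified guesses rather than the paper's exact algorithm.

def flag_peak_bins(bins, alpha=0.125, tau=4.0):
    """bins: tweet counts per fixed time window, in chronological order."""
    mean, meandev = float(bins[0]), 0.0
    flagged = []
    for i, count in enumerate(bins[1:], start=1):
        if meandev > 0 and (count - mean) > tau * meandev:
            flagged.append(i)                               # unusually busy bin: candidate peak
        # update running estimates, as in TCP's RTT/RTTVAR calculation
        meandev = (1 - alpha) * meandev + alpha * abs(count - mean)
        mean = (1 - alpha) * mean + alpha * count
    return flagged

print(flag_peak_bins([10, 14, 8, 12, 9, 13, 80, 75, 11, 10]))   # [6, 7]: the spike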

Discussion
The authors wanted to see if TwitInfo was useful and functional. While they had a few cases that were not covered which could have altered their results slightly, the vast majority of the work appeared to be well-founded. As such, I was convinced of their results.

Since so many users wanted features that were notably missing, like map aggregation, I would like to see the system tested again with those items. The authors also mentioned that a more extensive test would be helpful, which I agree with.

Dr. Caverlee is starting a Twitter project that looks for peaks around the time of a natural disaster. At first I thought he might have gotten scooped, but this is more just a part of what he plans to do. I found the related nature very interesting.

Monday, November 14, 2011

Paper Reading #23: User-defined Motion Gestures for Mobile Interaction

Reference Information
User-defined Motion Gestures for Mobile Interaction
Jaime Ruiz, Yang Li, Edward Lank
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Jaime Ruiz is a doctoral student in the Human Computer Interaction Lab at the University of Waterloo. His research centers on alternative interaction techniques.
  • Yang Li is a senior research scientist at Google and was a research associate at the University of Washington. He holds a PhD in Computer Science from the Chinese Academy of Sciences. He is primarily interested in gesture-based interaction, with many of his projects implemented on Android.
  • Edward Lank is an Assistant Professor of Computer Science at the University of Waterloo with a PhD from Queen's University. Some of his research is on sketch recognition.


Summary
Hypothesis
What sort of motion gestures do users naturally develop? How can we produce a taxonomy for motion gestures?

Methods
The authors created a guessability study that directed users to perform what they thought would be appropriate gestures for a given task. Participants were given 19 tasks and were instructed to think aloud as well as provide a subjective preference rating for each produced gesture. No recognizer feedback was provided, and users were told to consider the phone to be a magic brick that automatically understood their gesture, to avoid the gulf of execution. The sessions were video recorded and transcribed. The Android phone logged accelerometer data and kept its screen locked.

Tasks were divided into action-based and navigation-based tasks, and each task acted either on the phone or on an application. Each of these subdivisions had a representative task to minimize task duplication. Similar tasks were grouped together; one set included answering, muting, and ending a call. The gestures were not finalized until the user completed all tasks in a set. The user then performed their gesture multiple times and reported on its effectiveness.

The authors considered their user-defined gesture set through an agreement score, which evaluated the degree of consensus among the users.
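
Assuming the agreement score has the same form used in earlier guessability studies (the sum of squared proportions of identical proposals, averaged over tasks), a minimal computation looks like the sketch below; the task name and gesture labels are hypothetical.

from collections import Counter

def agreement(proposals_by_task):
    """proposals_by_task: dict mapping each task to a list of gesture labels,
    one per participant, where identical gestures share a label."""
    scores = []
    for proposals in proposals_by_task.values():
        total = len(proposals)
        groups = Counter(proposals)                          # group identical proposals
        scores.append(sum((n / total) ** 2 for n in groups.values()))
    return sum(scores) / len(scores)                         # average over tasks

# example: 4 participants propose gestures for "answer call"
print(agreement({"answer call": ["to_ear", "to_ear", "to_ear", "shake"]}))   # 0.625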

Results
Users produced the same gesture as their peers for many of the tasks. The transcripts suggested several themes. The first of these, mimicking normal use, imitated motions performed when normally using the phone; users produced remarkably similar actions within this theme and found their gestures to be natural. Another theme was real-world metaphors, where users treated the device as if it were a physical object, like hanging up by turning the phone as though it were an older-style telephone. To clear the screen, users tended to shake the phone, which is reminiscent of an Etch A Sketch. The third theme, natural and consistent mappings, considered the users' mental model of how something should behave. The scrolling tasks were designed to test the users' mental model of navigation: in the XY plane, panning left involved moving the phone left, and zooming in and out involved moving the phone closer to and further from the user, as though the phone were a magnifying glass. Discrete navigation, as opposed to navigation that varied with the force used, was preferred.

Users wanted feedback and designed their gestures to allow for visual feedback while performing the gesture. Most users indicated that they would use motion gestures at least occasionally.

Based on the authors' taxonomy of the produced gestures, users opted for simple discrete gestures in a single axis with low kinematic impulse.

The agreement scores for the user-defined gestures are similar to a prior study's scores for surface gestures. A consensus could not be reached on switching to another application and acting on a selection. The subjective ratings for goodness of fit were higher for the user-defined set than gestures not in the set. Ease of use and frequency of use had no significant difference between those two groups. Users stated that motion gestures could be reused in similar contexts, so the user-defined set can complete many of the tasks within an application.

Contents
Smartphones have two common input methods: a touchscreen and motion sensors. Gestures on the former are surface gestures, while those on the latter are motion gestures. The authors focused on motion gestures, for which many design questions remain unanswered. They developed a taxonomy of parameters that can differentiate between types of motion gestures. A taxonomy of this sort can be used to create more natural motion gestures and to aid the design of sensors and toolkits for motion gesture interaction at both the application and system level.

Most of the prior work on classifying gestures focused on human discourse, but the authors focused on the interaction between the human and the device. One study produced a taxonomy for surface gestures based on user elicitation, which is the foundation of participatory design. Little research on classifying motion gestures has been done, though plenty describes how to develop motion gestures.

The authors' taxonomy classified the user-produced motion gestures from their study into two categories. The first, gesture mapping, involves the mapping, whether in the nature, temporal, or context dimension, of motion gestures to devices. The nature dimension considers mapping to a physical object and labels a gesture as metaphorical, physical, symbolic, or abstract. Temporal describes when the action occurs relative to when the gesture is made: discrete gestures have the action occur after the gesture, while continuous ones have the action occur during the gesture and afterwards, as in map navigation. Context considers whether the gesture requires a particular context; answering a phone is in-context, while going to the Home screen is out-of-context. The other category, physical characteristics, covers kinematic impulse, dimensionality, and complexity. Impulse classifies the range of jerk produced into low, moderate, and high groupings. Dimension is the number of axes involved in performing a gesture. Complexity states whether a gesture is simple or compound, the latter being multiple simple gestures combined into one.
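
Purely to illustrate the structure of the taxonomy, the sketch below encodes its dimensions as Python enums and classifies one hypothetical gesture; the names are paraphrased from the summary above and the example classification is my own guess, not the authors'.

from dataclasses import dataclass
from enum import Enum

class Nature(Enum):
    METAPHORICAL = "metaphorical"
    PHYSICAL = "physical"
    SYMBOLIC = "symbolic"
    ABSTRACT = "abstract"

class Temporal(Enum):
    DISCRETE = "action occurs after the gesture"
    CONTINUOUS = "action occurs during and after the gesture"

class Context(Enum):
    IN_CONTEXT = "requires a specific context"
    OUT_OF_CONTEXT = "usable anywhere"

class Impulse(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"

@dataclass
class MotionGesture:
    nature: Nature
    temporal: Temporal
    context: Context
    impulse: Impulse
    dimensions: int      # number of axes involved
    compound: bool       # True if built from multiple simple gestures

# hypothetical classification of "raise the phone to the ear to answer a call"
answer_call = MotionGesture(Nature.PHYSICAL, Temporal.DISCRETE,
                            Context.IN_CONTEXT, Impulse.LOW,
                            dimensions=3, compound=False)
print(answer_call)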

The authors took their taxonomy and user gestures to create a user-defined gesture set, based on groups of identical gestures from the study.

Some gestures in the consensus set


The authors suggested that motion gesture toolkits should provide easy access to the user-defined set. The end user, not just the designer, should be allowed to add gestures. The low kinematic impulse gestures could result in false positives, so a button or gesture to denote a motion gesture could be useful, and additional sensors might be necessary. Gestures should also be socially acceptable.

Discussion
The authors wanted to see how users define motion gestures and how to classify these. Their user study was user-oriented and their taxonomy relied heavily on those results. Since the study was well-founded, I concluded that the taxonomy is valid, so I am convinced of the correctness of the results.

The lack of feedback, while necessary for the study, made me wonder if users might produce different gestures when they are able to see what they are doing on-screen. Perhaps a future work could test this idea based on an implemented consensus set.

While I tend to not use motion gestures, this was interesting because, when refined, it allows for greater ease of access in varying contexts.

Tuesday, November 8, 2011

Paper Reading #22: Mid-air Pan-and-Zoom on Wall-sized Displays

Reference Information
Mid-air Pan-and-Zoom on Wall-sized Displays
Mathieu Nancel, Julie Wagner, Emmanuel Pietriga, Olivier Chapuis, Wendy Mackay
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Mathieu Nancel is a PhD student in Human-Computer Interaction at INRIA. He has work in navigating large datasets.
  • Julie Wagner is a PhD student in the insitu lab at INRIA. She was a Postgraduate Research Assistant in the Media Computing Group at Aachen before that.
  • Emmanuel Pietriga is the Interim Leader for insitu at INRIA. He worked for the World Wide Web Consortium in the past.
  • Olivier Chapuis is a Research Scientist at Université Paris-Sud. He is a member of insitu.
  • Wendy Mackay is a Research Director for insitu at INRIA. She is currently on sabbatical at Stanford.


Summary
Hypothesis
What possible forms of indirect (i.e., not on the wall) interaction are best for wall-sized displays? Will bimanual gestures be faster, more accurate, and easier to use than unimanual ones? Will linear gestures slow over time and be preferred over circular gestures? Will tasks involving the fingers be faster than those involving whole limbs? Will 1D gestures be faster? Will 3D gestures be more tiring?

Methods
The authors tested the 12 different conditions under consideration, as enumerated in the Contents section below. They measured performance time and the number of overshoots (cases where users zoomed past the target level). The users navigated a space of concentric circles, zooming and panning to reach the correct level and centering for each set of circles. Users performed each of the 12 possible tasks and then answered questions about the experience.

Results
Movement time and the number of overshoots were correlated. No fatigue effect appeared, though users learned over time. Two-handed techniques won out over unimanual gestures. Tasks using the fingers were faster, and 1D gestures were faster, though less so for bimanual tests. Linear gestures were faster than circular ones and were preferred by users. Responses to the Likert-scale questions confirmed these findings. 3D gestures were the most fatiguing.

Contents
Increasingly, high-resolution wall-sized displays can show enormous numbers of pixels. These are inconvenient or impossible to manage with touchscreens alone, so interaction techniques should allow the user the freedom to move while working with the display. Other papers found that large displays are useful and discussed mid-air interaction techniques; one involved a circular gesture technique called CycloStar. Having multiple degrees of freedom allows users to parallelize their tasks. Panning and zooming is 3DOF, since the user controls the 2D Cartesian position and the scale.

For their system, the authors discarded techniques that are not intuitive for mid-air interaction or are not precise enough. Their final design space covered unimanual versus bimanual input, linear versus circular gestures, and three types of guidance through passive haptic feedback. Unimanual techniques involve one hand, while bimanual ones use two. Linear gestures move in a straight line, while circular ones involve rotation. Following a path in space with a limited device is 1D guidance, using a touchscreen is 2D, and free gestures are 3D. The bimanual techniques assign zoom to the non-dominant hand and the remaining controls to the dominant hand. The limb portions under consideration were the wrist, forearm, and upper arm. The 3D circular gestures resembled CycloStar. The linear gestures push inwards to zoom in and pull outwards to zoom out, while circular ones involve making circles with the hand.
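
As a concrete view of the three degrees of freedom involved, the sketch below keeps a viewport with two pan coordinates and a scale, and applies a linear push/pull zoom gesture of the kind described above; the gain constant and function names are invented, and the mapping is only an illustrative guess at the paper's techniques.

from dataclasses import dataclass

@dataclass
class Viewport:
    x: float = 0.0       # pan position (2 DOF)
    y: float = 0.0
    scale: float = 1.0   # zoom level (3rd DOF)

def apply_gesture(view, dx=0.0, dy=0.0, push=0.0, gain=0.01):
    """dx, dy: pan displacement from the dominant hand; push: non-dominant hand
    displacement toward the display (positive pushes in, i.e. zooms in)."""
    view.x += dx / view.scale            # pan in world coordinates
    view.y += dy / view.scale
    view.scale *= (1.0 + gain * push)    # push in -> zoom in, pull out -> zoom out
    return view

v = Viewport()
apply_gesture(v, dx=20, dy=-5, push=10)
print(v)   # roughly Viewport(x=20.0, y=-5.0, scale=1.1)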

Discussion
The authors wanted to determine which forms of interaction were best for a large interactive display. Their tests were comprehensive and thorough, so I have no reason to doubt the validity of their claims.

This might be useful for major disaster emergency response teams. A large display combined with available information could be vastly useful, albeit expensive to implement. Dr. Caverlee is starting a project that gleans disaster information from social media sites; these technologies would certainly work well together.

With the initial steps put together in this paper, I would like to see future work that more finely hones the details of the preferred types of interactions.

Paper Reading #21: Human Model Evaluation in Interactive Supervised Learning

Reference Information
Human Model Evaluation in Interactive Supervised Learning
Rebecca Fiebrink, Perry R. Cook, Daniel Trueman
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Rebecca Fiebrink is an Assistant Professor in Computer Science and affiliated faculty in Music at Princeton. She received her PhD from Princeton.
  • Perry R. Cook is a Professor Emeritus in Princeton's Computer Science and Music Departments. He has a number of papers combining music and software in innovative ways.
  • Daniel Trueman is an Associate Professor of Music at Princeton. He has built a variety of devices, including hemispherical speakers and a device that senses bow gestures.

Summary
Hypothesis
Can interactive supervised learning be used effectively by end users to train an algorithm to correctly interpret musical gestures?

Methods
The authors conducted three studies. The first (A) consisted of composers who created new musical instruments with the incrementally changing software. The instruments drive digital audio synthesis in real-time. Input devices were varied.

The second study (B) asked users to design a system that would classify gestures for different sounds and had a task similar to the prior study. Students would perform a piece through an interactive system they had built.

The third test (C) involved a cellist and building a gesture recognizer for bow movements in real time. The cellist labeled gestures appropriately to provide initial data.

Results
Each study found that users would adjust their training sets and build them incrementally. Cross-validation was only used occasionally in the latter two studies and never in the first. Those who did use it found that high cross-validation accuracy was a reliable clue that the model was working. Direct evaluation was used far more commonly in each study and in a wider set of circumstances than cross-validation. It checked correctness against user expectations, suggested that users assigned a higher "cost" to more incorrect output (A, B, C), assisted in transitioning between labels for parts of the bow (C), let users use model confidence as a basis for algorithm quality (C), and made it clear that complexity was sometimes desired, as users explicitly acted to add more complexity to the model (A).

In the third study, the cellist had to use direct evaluation to subjectively grade the system after each training. Interestingly, higher cross-validation accuracy did not always appear with higher subjective ratings.

Users learned to produce better data, what could be accomplished with a supervised machine learning system, and possible improvements to their own techniques (C). Users rated Wekinator very highly.

Contents
There has recently been interest in integrating humans into machine learning to make better and more useful algorithms. One type is supervised machine learning, which trains the software on example inputs and outputs. The trained model generalizes, which allows the algorithm to produce a likely result for input not in the data set. Building good data sets is difficult, so cross-validation can be used to partition the data into training and held-out test portions to estimate how well the model generalizes. Human interaction can be added to the machine learning process, especially to judge the validity of outputs.

The authors set their inputs to be gestures, both manual and of a cello bow. The manual one was used not only to model gestures correctly but also to allow real-time performance with the system. Gesture modelling works for interactive supervised learning because the algorithm must be trained to the person, who can then adjust based on the results. Their software is called Wekinator. The system allowed for both cross-validation and direct evaluation, which let users implicitly learn how to use the machine learning system more effectively without any explicit background training in machine learning. Even with cross-validation, a training set might be insufficiently general, and generalizing is paramount; since the training is user-performed, the data set might be optimized for only a select range, which is not ideal.
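
The workflow described above amounts to a short loop: add labeled examples, retrain, try the model on live input, and add more data where it fails. The sketch below imitates that loop with a k-nearest-neighbor classifier from scikit-learn standing in for Wekinator's learners (Wekinator itself is built on Weka); the feature vectors and labels are made up.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

examples, labels = [], []

def add_example(features, label):
    # grow the training set incrementally, as the study participants did
    examples.append(features)
    labels.append(label)

def retrain():
    model = KNeighborsClassifier(n_neighbors=1)
    model.fit(np.array(examples), np.array(labels))
    return model

# iteration 1: one example per gesture class
add_example([0.1, 0.9], "bow_up")
add_example([0.8, 0.2], "bow_down")
model = retrain()

# direct evaluation: perform a new gesture and judge the output by ear and eye;
# if the prediction is wrong, add more examples and retrain
print(model.predict([[0.15, 0.85]]))   # ['bow_up']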

Discussion
The authors wanted to know if interactive supervised machine learning could produce a viable means of handling music-related gestures. Their three tests found that the users could fairly easily use the software to produce a valid model for the task at hand. As such, I am convinced that interactive supervised machine learning is viable for at least this case, if not for more general ones.

This generality of applications is something that I hope to see addressed in future work. The possible users needed to be experienced in a musical programming language before they could even start to train the software. I wondered if a string player who has a pathological fear of PureData (and its less-free cousin, used by the authors) could learn to use a variation of the software. If so, perhaps this concept can be extended to the increasingly less technically-minded.

Thursday, November 3, 2011

Paper Reading #27: Sensing Cognitive Multitasking for a Brain-Based Adaptive User Interface

Reference Information
Sensing Cognitive Multitasking for a Brain-Based Adaptive User Interface
Erin Treacy Solovey, Francine Lalooses, Krysta Chauncey, Douglas Weaver, Margarita Parasi, Matthias Scheutz, Angelo Sassaroli, Sergio Fantini, Paul Schermerhorn, Audrey Girouard, Robert J.K. Jacob
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Erin Treacy Solovey is a PhD candidate at Tufts University with an interest in reality-based interaction systems to enhance learning.
  • Francine Lalooses is a graduate student at Tufts University who studies brain-computer interaction.
  • Krysta Chauncey is a Post-Doc at Tufts University with an interest in brain-computer interaction.
  • Douglas Weaver is a Master's student at Tufts University working with adaptive brain-computer interfaces.
  • Margarita Parasi graduated from Tufts University and is now a Junior Software Developer at Eze Castle.
  • Matthias Scheutz is the Director of Tufts University's Human-Robot Interaction Lab.
  • Angelo Sassaroli is a Research Assistant Professor at Tufts University and received his doctorate from the University of Electro-Communications.
  • Sergio Fantini is a Professor of Biomedical Engineering at Tufts University, with research in biomedical optics.
  • Paul Schermerhorn is affiliated with Tufts University.
  • Audrey Girouard received her PhD from Tufts University and is now an Assistant Professor at Carleton University's School of Information Technology.
  • Robert J.K. Jacob is a Professor of Computer Science at Tufts University with research in new interaction modes.


Summary
Hypothesis
Can non-invasive functional near-infrared spectroscopy (fNIRS) help with the design of user interfaces with respect to multitasking? How does it compare to a prior study that used a different system?

Methods
Their preliminary study used fNIRS in comparison to the earlier functional MRI study, with the hope of distinguishing brain states as fMRI did. The design was identical apart from the sensors. The authors removed noise from the data and then classified it using leave-one-out cross-validation.
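
Leave-one-out cross-validation holds out each sample in turn, trains on the rest, and averages the resulting accuracy. The sketch below is generic and illustrative; the toy nearest-neighbor "model" and the data stand in for the paper's fNIRS features and classifier.

def leave_one_out_accuracy(samples, labels, train_fn, predict_fn):
    correct = 0
    for i in range(len(samples)):
        # hold out sample i and train on everything else
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)
        correct += predict_fn(model, samples[i]) == labels[i]
    return correct / len(samples)

# toy usage: a 1-nearest-neighbor "model" over scalar features (hypothetical data)
train = lambda xs, ys: list(zip(xs, ys))
predict = lambda model, x: min(model, key=lambda pair: abs(pair[0] - x))[1]
print(leave_one_out_accuracy([1.0, 1.1, 5.0, 5.2],
                             ["low", "low", "high", "high"],
                             train, predict))   # 1.0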

The second set of experiments involved human-robot cooperation. The human monitored the robot's status and sorted rocks found on Mars by the robot, and the information from the robot required a human response for each rock found. The Delay task involved checking for immediately consecutive classification messages. The Dual Task checked whether successive messages were of the same type. Branching required the Delay process for rock classification messages and the Dual Task process for location messages. The tasks were presented in pseudo-random order and repeated until 80% accuracy was achieved. Then the fNIRS sensors were placed on the forehead to get a baseline measure, and the user completed ten trials for each of the three conditions.

The authors then tested if they could distinguish variations on branching. Random branching presented messages at random, while Predictive presented rock messages for every three stimuli. The procedures were otherwise identical to the preceding test.

The final test used branching and delay tasks to indicate branching and non-branching. The system was trained using the data from that test and then classified new data based on the training results.

Results
The preliminary study was about 68.4% successful in three pairwise classifications and 52.9% successful for three-way classification, suggesting that fNIRS could be used for further studies.

For the first test of the robot task, with users who performed at less than 70% accuracy removed from consideration, the data was split into normal and non-Gaussian distributions. Normal distributions were analyzed with repeated-measures one-way ANOVA, and the latter with non-parametric repeated-measures ANOVA. Accuracy and response time did not have a significant correlation, and no learning effect was found. Total hemoglobin was higher in the branching condition.

The next test found no statistically significant difference in response times or accuracy. The correlation between these two was significant for predictive branching. Deoxygenated hemoglobin levels were higher in random branching for the first half of the trials, so this can be used to distinguish between the types of tasks.

In the last test, the system correctly distinguished between task types and adjusted its mode.

Contents
Multitasking is difficult for humans unless the tasks allow for integration of data into each other. Repeated task switching can lead to lower accuracy, increased duration, and increased frustration. Past work has measured mental workload through means like pupil dilation and blood pressure; other work measured interruptibility costs based on different inputs, such as eye tracking or desktop activity. One study identified three multitasking scenarios: branching, dual-task, and delay. The first involves a tree of tasks, the second involves unrelated tasks, and the last involves delaying secondary tasks until later. Functional MRI (fMRI) was used for that study and could distinguish between the tasks.

Non-invasive brain-sensing systems can help to enhance researchers' understanding of the mind during tasks. The authors' system uses functional near-infrared spectroscopy to distinguish between four states that occur with multitasking. Their goal is to use this system to assist in the development of user interfaces. They chose a human-robot team scenario to understand cognitive multitasking in UIs; these tasks require the human to perform a task while monitoring the robot.

The authors developed a system that can adjust a UI on the fly depending on the type of task being performed. The system can adjust the robot's goal structure and also has a debug mode. When a branching state appears, the robot autonomously moves to new locations; otherwise it returns to the default state.

Discussion
The authors wanted to find out if fNIRS could distinguish between tasks to adjust user interfaces for better multitasking. Their multiple tests and proof-of-concept certainly suggest that this is a possible way to enhance multitasking.

I am concerned that this innovation will lead to yet another level of complexity when designing user interfaces, since we now have to classify possible scenarios we think will occur. If this could be automated somehow, that would be beneficial.

The authors found one way to distinguish between tasks, but I imagine that there are far more. Future work might find one that is even less obtrusive than external sensors.