Thursday, December 1, 2011

Paper Reading #28: Experimental Analysis of Touch-Screen Gesture Designs in Mobile Environments

Reference Information
Experimental Analysis of Touch-Screen Gesture Designs in Mobile Environments
Andrew Bragdon, Eugene Nelson, Yang Li, Ken Hinckley
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Andrew Bragdon is a PhD student at Brown University. He is currently researching gestural user interfaces.
  • Eugene Nelson is a graduate student at Brown University.
  • Yang Li is a senior research scientist at Google and was a research associate at the University of Washington. He holds a PhD in Computer Science from the Chinese Academy of Sciences. He is primarily interested in gesture-based interaction, with many of his projects implemented on Android.
  • Ken Hinckley is a principal researcher at Microsoft Research. He holds a PhD in Computer Science from the University of Virginia.


Summary
Hypothesis
How do situational impairments impact touch-screen interaction? Will soft buttons perform less effectively in non-ideal environments? What is the best combination of moding techniques and gesture type? How do varying distraction levels affect usage?

Methods
The users were moderately comfortable with computing. The phones ran Android 2.1, with the built-in hard buttons disabled for the test. Eye gaze data was recorded to determine when eye movements started and stopped, with the display and phone separated far enough apart to make those movements identifiable. To simulate expert usage, icons appeared to reveal the needed gesture or command. Feedback was relayed to the user immediately.

The authors used a repeated-measures within-participants experimental design. Users completed a questionnaire. Then, they completed each of the 12 commands six times per environment. The condition order was counterbalanced through randomization. The variables considered were completion time, mode errors, command errors, and baseline and concurrent distractor task performance. Gestures were recorded. The users then completed another questionnaire.

Results
The time required depended on the environment, but there was no significant technique x environment interaction. Bezel marks took the least amount of time. Hard button marks and soft buttons took about the same time, as did bezel paths and soft buttons. Bezel paths were dramatically faster than hard button paths. Bezel marks and soft buttons performed about the same in the direct, distraction-free environment, but in all other cases bezel marks were faster. For soft buttons, the only performance differences came from the direct versus indirect environments; bezel marks performed the same regardless of direct or indirect. Bezel paths produced fairly consistent results across environments, though there was a non-significant increase from direct to indirect.

Hard button marks and bezel marks had the highest accuracy, followed by soft buttons. Hard button paths and bezel paths had the lowest accuracy, and there were more errors for paths than for marks. The environment had minimal effect on accuracy. When sitting, soft buttons performed significantly worse than bezel marks, and the same held for walking. Soft buttons had the worst normalized distance from the target, while bezel marks had the lowest mean. There was no significant difference in glances for any of the gesture techniques.

Most users preferred soft buttons for their performance in direct sitting tests, and most liked hard button paths the least. In indirect sitting tests, users were split between bezel marks and hard button marks, with most liking soft buttons the least. Tasks involving distraction produced the same results as the indirect sitting test. Overall, most people preferred hard button marks and bezel marks. Gesturing begins sooner with bezel moding than with hard buttons.

The gesture error rate was high enough to merit random visual sampling of the recordings, which confirmed that the recognizer was functional. The difficulty of using thumbs, along with speed, is the probable source of errors. Block number had a significant effect on completion time, with later trials usually performing better. Mark-based gestures can be used efficiently one-handed, and bezel marks were the fastest overall.

Contents
Touchscreens increasingly use soft buttons instead of hard buttons. While these are effective in an ideal, distraction-free environment, users frequently use touchscreens in non-ideal environments; the distractions present in these environments are situational impairments. Gestures rely on muscle memory, which allows users to focus on their task, but visual feedback is still a problem. The authors treated situational impairments as composed of two factors: motor activity and distraction level. To examine motor activity, they tested users while sitting and while walking. Distraction level was tested with no distraction, a light situational-awareness distraction, and a distraction that relied heavily on attention. They tested moding and gesture type as factors of gesture design, considering both mark and free-form path gestures.

Previous studies found that executing actions should not demand a lot of visual attention and that walking is not a significant impediment to keyboard input. One study created a system that used the bezel as a way to select modes. Another inspired the mark gestures in this paper. One found that users prefer using phones with one hand.

The moding techniques were hard button-initiated gestures with a button that has a different feel from the bezel, bezel gestures (which start a gesture by swiping through the bezel), and soft buttons similar in size to typical phone buttons.

The two gesture types were rectilinear mark-based gestures and freeform path gestures. Mark-based gestures use axis-aligned, rectilinear line segments and are quick to execute; the authors used axis-aligned mark segments to maximize recognition. The authors implemented 12 freeform path gestures that were simple to execute, were not axis-aligned, and could be easily recognized. Nine of these were gestures used in previous work. The freeform gestures used the default Android OS recognizer.

The environment was composed of motor activity and distraction level. Motor activity was divided into sitting and walking on a motorized treadmill. There were three distraction levels. The tasks involving no distraction required no eye movement away from the phone to read the task. The moderate situational-awareness task had users watch for a circle to appear on another screen while using the phone. The attention-saturating task (AST) involved keeping a moving circle centered on a crosshair. The authors did not pair walking with the no-distraction condition, judging that combination improbable in real life.

Discussion
The authors wanted to evaluate different conditions involved in making a selection on a phone involving either gestures or soft buttons. They found that users preferred either the hard button mark or bezel mark technique, which were the fastest overall.

I was a little concerned that the authors noted that sitting with the AST reflects using a phone while driving. I appreciated that they recommended against such behavior, but the fact that they tested it nonetheless was still bothersome, especially since they omitted other possible combinations due to their improbability.

It was surprising that gestures were faster than soft buttons, but the time required to find the soft button in question explains that result. I would like to see future work on how to use mark gestures in a real application.

Sunday, November 27, 2011

Paper Reading #26: Embodiment in Brain-Computer Interaction

Reference Information
Embodiment in Brain-Computer Interaction
Kenton O'Hara, Abigail Sellen, Richard Harper
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Kenton O'Hara is Senior Researcher in the Socio Digital Systems Group at Microsoft Research in Cambridge. He studies Brain-Computer Interaction.
  • Abigail Sellen is a Principal Researcher at Microsoft Research Cambridge and co-manages the Socio-Digital Systems group. She holds a PhD in Cognitive Science from the University of California at San Diego.
  • Richard Harper is a Principal Researcher at Microsoft Research Cambridge and co-manages the Socio-Digital Systems group. He has published over 120 papers.


Summary
Hypothesis
What are the possibilities and constraints of Brain-Computer Interaction in games and as a reflection of society?

Methods
The study used MindFlex. Participants were composed of groups of people who knew each other and were assembled through a host participant. The hosts were given a copy of the game to take home and play at their discretion. Each game was videorecorded. Groups were fluid. The videos were analyzed for physical behavior to explain embodiment and social meaning in the game.

Results
Users deliberately adjusted their posture to relax and gain focus. These responses were dynamic, based on how the game behaved. Averting the gaze was typically used to lower concentration, and users sometimes physically turned away to achieve the correct game response. Despite the lack of visual feedback, the decreased whirring of the fan indicated the game state. The players created a narrative fantasy on top of the game's controls to explain what was occurring, adding an extra source of mysticism and engagement. The narrative is not necessary to control the game, but it persists even when it is contrary to the game's function (e.g., thinking hard about lowering the ball increases concentration, which raises it). Embodiment plays a key role in mental tasks.

How and why people watched the game was not entirely clear, especially when players did not physically manifest their actions. In one case, when a watcher attempted to give instructions to the player, the lack of a response from the player prompted further instructions, which distracted the player too much. The watcher interpreted the player's mind from the game effects, which can be misread. Players sometimes volunteered their mindset through gestures or words, and sometimes these were a performance for the audience. The spectatorship was a two-way relationship, with the audience interacting with the players. These interactions tended to be humorous remarks to, and about, the player and could be purposefully helpful or detrimental. The nature of the interaction reflected real-world relationships between people.

Contents
Advances in non-invasive Brain-Computer Interaction (BCI) allow for its wider usage in new applications. Recently, commercial products using BCI have appeared, but the design paradigms and constraints for them are unknown. Initial work in BCI was largely philosophical but later moved toward the observable. The nature of embodiment suggests that understanding BCI relies on understanding actions in the physical world; these actions influence BCI readings and social meaning.

Typical prior studies used BCI as a simple control or took place in a closed environment. One study took place outside of a lab and suggested that the environmental context allows for further ethnographic work. One study worried less about efficacy and more about the experience. These studies noted that the games were a social experience.

To explore embodiment, the authors used a BCI-based game called MindFlex. MindFlex is a commercially available BCI game that uses a wireless headset with a single electrode. It uses EEG to measure relative levels of concentration, with higher concentration raising a fan-levitated ball and lower concentration lowering it.

The authors recommended that a narrative that maps correctly to controls should be used. Physical interaction paradigms could be deliberately beneficial or detrimental, depending on the desired effect.

Discussion
The authors wanted to discover possible features and problems with BCI. Their study identified a few of these, but the paper felt far too much like preliminary results for me to agree that these are the only options and flaws. I am convinced that the things the authors identified are genuine constraints and uses, but it is unclear how representative their findings are of the entire set of each.

I had not considered the design paradigms behind BCI games before reading this paper, so the reflection of social behavior found in them was particularly interesting.

I would like to see more comprehensive data on how social relationships factor into the playing of the games. The anecdotes provided by the authors were interesting, but the lack of statistics limited the ability to determine the frequency with which identified behaviors occurred.

Thursday, November 24, 2011

Paper Reading #25: TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration

Reference Information
TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration
Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, Robert C. Miller
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Adam Marcus is a graduate student at MIT's Computer Science and Artificial Intelligence Lab. He is a member of the Database and Haystack groups.
  • Michael S. Bernstein is a graduate student at MIT's Computer Science and Artificial Intelligence Lab. His work combines crowd intelligence with computer systems.
  • Osama Badar is an undergraduate student at MIT.
  • David R. Karger is a professor in MIT's Computer Science and Artificial Intelligence Lab. His work focuses on information retrieval.
  • Samuel Madden is an Associate Professor with MIT's Computer Science and Artificial Intelligence Lab. His work is in databases.
  • Robert C. Miller is an Associate Professor with MIT's Computer Science and Artificial Intelligence Lab. His work centers on web automation and customization.


Summary
Hypothesis
How functional is a streaming algorithm that allows for real-time searches of Twitter for an arbitrary subject?

Methods
To test the algorithm, tweets from three soccer games and one month of earthquakes were collected. The soccer games were annotated by hand to form a truth set, while the earthquake data was checked against the US Geological Survey. The authors tested precision and recall: the first measures how many events found by the algorithm were in the truth set, while the second measures how many events in the truth set were found by the algorithm.
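As a rough illustration of those two measures (not the authors' code; the 60-second matching tolerance is an arbitrary choice for the example), precision and recall over detected event times could be computed like this:

```python
# Minimal sketch: precision and recall of detected event peaks against a
# hand-annotated truth set of event times (in seconds).

def precision_recall(detected, truth, tolerance=60):
    """A detection counts as correct if it falls within `tolerance`
    seconds of some truth event."""
    def matches(t, events):
        return any(abs(t - e) <= tolerance for e in events)

    true_positives = sum(1 for d in detected if matches(d, truth))
    found_truth    = sum(1 for e in truth if matches(e, detected))

    precision = true_positives / len(detected) if detected else 0.0
    recall    = found_truth / len(truth) if truth else 0.0
    return precision, recall

# Example: three detected peaks against a two-event truth set.
p, r = precision_recall(detected=[10, 300, 900], truth=[15, 905])
print(p, r)  # 0.666..., 1.0
```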

The authors then tested if users could understand the UI. Initially, users were asked to perform arbitrary tasks with the program, with usability feedback gathered. The second task was exploration-based with a time limit. Given an event, users had a little time to dictate a news report on the event. The authors then interviewed the users, one of whom is a prominent journalist.

Results
The algorithm has high recall and usually only failed to detect an event if the Twitter data lacked a peak. Precision was high for soccer but produced false positives for major earthquakes as it flagged minor earthquakes too. If minor earthquakes were included, the accuracy increased to 66%. Twitter's interests or the lack thereof introduce a bias to the data. Sometimes a single earthquake produced multiple peaks or two earthquakes overlapped.

Users successfully reconstructed events quickly. Some users felt the information was shallow, but it allowed for high-level understanding. Users focused on the largest peak first. Users liked the timeline, but did not trust the sentiment analysis. The map needed aggregation. Without the system, users would have relied on a news site or aggregator. The journalist suggested that TwitInfo would be useful for backgrounding on a long-term topic. Her opinions were largely similar to the other users.

Contents
Twitter allows for a live understanding of public consciousness but is difficult to adapt into a clear timeline. Past attempts to create visualizations are domain-specific and generally use archived data. The authors developed TwitInfo, which takes a search query for an event, identifies and labels event peaks, produces a visualization of long-running events, and aggregates public opinion in a graphical timeline. It identifies subevents and summarizes sentiment through signal processing of social streams. The data is normalized to produce correct results, and the results can be sorted geographically or aggregated to suggest both localized and overall opinions.

A previous paper found that Twitter bears more in common with a news media site than a social network. Other papers covered hand-created visualizations of events. One paper discussed aggregated opinion on microblogs. Some research involved searching for specific events.

TwitInfo events are defined by a user's search terms. Once an event is defined, the system begins logging tweets that match the query and creates a page that updates in real time and is used to monitor data. The event timeline plots the volume of tweets matching the criteria. Spikes in the volume are labeled as peaks at varying granularity and annotated with relevant descriptors, and peaks can be used to form their own timelines. The actual tweets within the event timeframe are sorted by event or peak keywords and color-coded based on perceived sentiment. Overall sentiment is displayed in a pie chart. Sentiment is derived from a Naive Bayes classifier, which reduces the effect of bias by predicting the probability that a tweet is negative or positive and then recall-normalizing the data. The top three most common URLs are listed on the Popular Links panel, and another panel shows geographic distributions.
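A minimal sketch of what recall-normalized aggregation could look like is below; this is my reading of the idea, and the recall values are illustrative rather than anything reported in the paper:

```python
# Sketch: raw positive/negative counts are divided by each class's recall
# (estimated on labeled data) so that a classifier that misses more of one
# class does not bias the aggregated sentiment.
def sentiment_proportions(n_pos, n_neg, recall_pos, recall_neg):
    adj_pos = n_pos / recall_pos
    adj_neg = n_neg / recall_neg
    total = adj_pos + adj_neg
    return adj_pos / total, adj_neg / total

# Example: 300 tweets flagged positive, 100 negative, but the classifier
# only catches half of the true negatives.
print(sentiment_proportions(300, 100, recall_pos=0.9, recall_neg=0.5))
# -> (0.625, 0.375)
```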

Peaks are calculated from tweet-frequency maxima in a user-defined period of time. Tweets are sorted into bins by time and, in a method similar to TCP congestion detection, bins with an unusually large number of tweets are flagged. The method uses hill climbing to identify the appropriate peak window, which is then labeled. If a user searches for a noisy term, IDF normalization takes the term's global popularity into account to lessen its effect.
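A rough sketch of that flavor of peak flagging is below. The smoothing constant, threshold, and example counts are illustrative assumptions, and the hill-climbing step that grows flagged bins into a full peak window is omitted:

```python
# Sketch: track an exponentially weighted moving mean and mean deviation of
# per-bin tweet counts (as in TCP round-trip-time estimation) and flag bins
# that exceed the mean by several deviations. alpha and tau are illustrative.

def flag_peaks(bin_counts, alpha=0.125, tau=4.0):
    peaks = []
    mean = float(bin_counts[0])
    meandev = 0.0
    for i, count in enumerate(bin_counts[1:], start=1):
        if meandev > 0 and count > mean + tau * meandev:
            peaks.append(i)  # unusually busy bin -> candidate peak
        # update the running estimates
        meandev = (1 - alpha) * meandev + alpha * abs(count - mean)
        mean = (1 - alpha) * mean + alpha * count
    return peaks

print(flag_peaks([10, 14, 12, 11, 13, 90, 85, 15, 12]))
# -> [5, 6]; adjacent flagged bins would be merged into one peak window
```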

Discussion
The authors wanted to see if TwitInfo was useful and functional. While a few uncovered cases could have altered their results slightly, the vast majority of the work appeared to be well-founded. As such, I was convinced of their results.

Since so many users wanted features that were notably missing, like map aggregation, I would like to see the system tested again with those items. The authors also mentioned that a more extensive test would be helpful, which I agree with.

Dr. Caverlee is starting a Twitter project that looks for peaks around the time of a natural disaster. At first I thought he might have gotten scooped, but this covers only part of what he plans to do. I found the overlap very interesting.

Monday, November 14, 2011

Paper Reading #23: User-defined Motion Gestures for Mobile Interaction

Reference Information
User-defined Motion Gestures for Mobile Interaction
Jaime Ruiz, Yang Li, Edward Lank
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Jaime Ruiz is a doctoral student in the Human Computer Interaction Lab at the University of Waterloo. His research centers on alternative interaction techniques.
  • Yang Li is a senior research scientist at Google and was a research associate at the University of Washington. He holds a PhD in Computer Science from the Chinese Academy of Sciences. He is primarily interested in gesture-based interaction, with many of his projects implemented on Android.
  • Edward Lank is an Assistant Professor of Computer Science at the University of Waterloo with a PhD from Queen's University. Some of his research is on sketch recognition.


Summary
Hypothesis
What sort of motion gestures do users naturally develop? How can we produce a taxonomy for motion gestures?

Methods
The authors created a guessability study that directed users to perform what they thought would be appropriate gestures for a given task. Participants were given 19 tasks and were instructed to think aloud as well as provide a subjective preference rating for each produced gesture. No recognizer feedback was provided, and users were told to consider the phone a magic brick that automatically understood their gestures, to avoid the gulf of execution. The sessions were video recorded and transcribed. The Android phone logged accelerometer data and had its screen locked.

Tasks were divided into action-based and navigation-based tasks. Each task acted either on the phone or on an application, and each of these subdivisions had a representative task to minimize duplication. Similar tasks were grouped together; one set included answering, muting, and ending a call. Gestures were not finalized until the user completed all tasks in a set. The user then performed their gesture multiple times and reported on its effectiveness.

The authors evaluated their user-defined gesture set through an agreement score, which measures the degree of consensus among users.
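The agreement score here follows the formula used in earlier guessability studies (the sum of squared proportions of identical proposals for each task). A minimal sketch, assuming the gestures proposed for a task have already been manually grouped so that identical proposals share a label:

```python
from collections import Counter

# Agreement for one task: sum over groups of (group size / participants)^2.
def agreement(gesture_labels):
    """gesture_labels: one label per participant for a single task."""
    n = len(gesture_labels)
    return sum((count / n) ** 2 for count in Counter(gesture_labels).values())

# Example: 20 participants; 12 propose a "shake", 5 a "flip", 3 a "tap".
print(agreement(["shake"] * 12 + ["flip"] * 5 + ["tap"] * 3))  # 0.445
```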

Results
Users produced the same gesture as their peers for many of the tasks. The transcripts suggested several themes. The first of these, mimicking normal use, imitated motions performed when normally using the phone; users produced remarkably similar actions in this theme and found their gestures natural. Another theme was real-world metaphors, where users treated the device as a physical object, like hanging up by turning the phone as though it were an older-style telephone. To clear the screen, users tended to shake the phone, reminiscent of an Etch A Sketch. The third theme, natural and consistent mappings, reflected the users' mental model of how something should behave. The scrolling tasks were designed to test the users' mental model of navigation: in the XY plane, panning left involved moving the phone left, and zooming in and out involved moving the phone closer to and further from the user, as though it were a magnifying glass. Discrete navigation was preferred over navigation that varied with the force used.

Users wanted feedback and designed their gestures to allow for visual feedback while performing them. Most users indicated that they would use motion gestures at least occasionally.

Based on the authors' taxonomy of the produced gestures, users opted for simple discrete gestures in a single axis with low kinematic impulse.

The agreement scores for the user-defined gestures are similar to a prior study's scores for surface gestures. A consensus could not be reached on switching to another application and acting on a selection. The subjective ratings for goodness of fit were higher for the user-defined set than gestures not in the set. Ease of use and frequency of use had no significant difference between those two groups. Users stated that motion gestures could be reused in similar contexts, so the user-defined set can complete many of the tasks within an application.

Contents
Smartphones have two common input methods: a touchscreen and motion sensors. Gestures on the former are surface gestures, while gestures using the latter are motion gestures. The authors focused on motion gestures, which still raise many unanswered design questions. They developed a taxonomy of parameters that can differentiate types of motion gestures. Such a taxonomy can be used to create more natural motion gestures and to aid the design of sensors and toolkits for motion gesture interaction at both the application and system level.

Most of the prior work on classifying gestures focused on human discourse, but the authors focused on the interaction between the human and the device. One study produced a taxonomy for surface gestures based on user elicitation, which is the foundation of participatory design. Little research on classifying motion gestures has been done, though plenty describes how to develop motion gestures.

The authors' taxonomy classified the user-produced motion gestures from their study into two categories. The first, gesture mapping, covers how motion gestures map to devices along the nature, temporal, and context dimensions. The nature dimension considers mapping to a physical object and labels a gesture as metaphorical, physical, symbolic, or abstract. Temporal describes when the action occurs relative to when the gesture is made: discrete gestures have the action occur after the gesture, while continuous ones have the action occur during the gesture and afterwards, as in map navigation. Context considers whether the gesture requires a particular context; answering a phone is in-context, while going to the Home screen is out-of-context. The other category, physical characteristics, covers kinematic impulse, dimensionality, and complexity. Impulse considers the range of jerk produced, grouped into low, moderate, and high. Dimension is the number of axes involved in performing a gesture. Complexity states whether a gesture is simple or compound, the latter being multiple simple gestures combined into one.

The authors took their taxonomy and user gestures to create a user-defined gesture set, based on groups of identical gestures from the study.

Some gestures in the consensus set


The authors suggested that motion gesture toolkits should provide easy access to the user-defined set. The end user should be allowed to add gestures in addition to a designer. The low kinetic impulse gestures could result in false positives, so a button or gesture to denote a motion gesture could be useful. Additional sensors might be necessary. Gestures should also be socially acceptable.

Discussion
The authors wanted to see how users define motion gestures and how to classify these. Their user study was user-oriented and their taxonomy relied heavily on those results. Since the study was well-founded, I concluded that the taxonomy is valid, so I am convinced of the correctness of the results.

The lack of feedback, while necessary for the study, made me wonder if users might produce different gestures when they are able to see what they are doing on-screen. Perhaps a future work could test this idea based on an implemented consensus set.

While I tend to not use motion gestures, this was interesting because, when refined, it allows for greater ease of access in varying contexts.

Tuesday, November 8, 2011

Paper Reading #22: Mid-air Pan-and-Zoom on Wall-sized Displays

Reference Information
Mid-air Pan-and-Zoom on Wall-sized Displays
Mathieu Nancel, Julie Wagner, Emmanuel Pietriga, Olivier Chapuis, Wendy Mackay
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Mathieu Nancel is a PhD student in Human-Computer Interaction at INRIA. He has work in navigating large datasets.
  • Julie Wagner is a PhD student in the insitu lab at INRIA. She was a Postgraduate Research Assistant in the Media Computing Group at Aachen before that.
  • Emmanuel Pietriga is the Interim Leader for insitu at INRIA. He worked for the World Wide Web Consortium in the past.
  • Olivier Chapuis is a Research Scientist at Université Paris-Sud. He is a member of insitu.
  • Wendy Mackay is a Research Director for insitu at INRIA. She is currently on sabbatical at Stanford.


Summary
Hypothesis
What forms of indirect interaction (i.e., not touching the wall) are best for wall-sized displays? Will bimanual gestures be faster, more accurate, and easier to use than unimanual ones? Will linear gestures slow over time yet be preferred over circular gestures? Will tasks involving only the fingers be faster than those involving whole limbs? Will 1D gestures be faster? Will 3D gestures be more tiring?

Methods
The authors tested the 12 different conditions under consideration with groups as enumerated in the Contents section below. They measured the performance time and number of overshoots, when users zoomed too far. The users navigated a space of concentric circles, zooming and panning to reach the correct level and centering for each set of circles. Users performed each of the 12 possible tasks and then answered questions about the experience.

Results
Movement time and number of overshoots were correlated. No fatigue effect appeared, though users learned over time. Two-handed techniques won out over unimanual gestures. Tasks using the fingers were faster, and 1D gestures were faster, though less so for bimanual tests. Linear gestures were faster than circular ones and preferred by users, and responses to the Likert-scale questions confirmed these findings. 3D gestures were the most fatiguing.

Contents
Increasingly, high-resolution wall displays can show enormous numbers of pixels. These are inconvenient or impossible to manage with touchscreens, so interaction techniques should allow the user the freedom to move while working with the display. Other papers have found that large displays are useful and have discussed mid-air interaction techniques; one involved a circular gesture technique called CycloStar. The number of degrees of freedom allows users to parallelize their tasks. Panning and zooming is 3DOF, since the user controls the 2D Cartesian position and the scale.

For their system, the authors discarded techniques that are not intuitive for mid-air interaction or are not precise enough. Their final considerations were unimanual versus bimanual input, linear versus circular gestures, and three types of guidance through passive haptic feedback. Unimanual techniques involve one hand, while bimanual ones use two. Linear gestures move in a straight line, while circular ones involve rotation. Following a path in space with a limited device is 1D guidance, using a touchscreen is 2D, and free gestures are 3D. The bimanual gestures assign zoom to the non-dominant hand and the other functions to the dominant hand. The limb portions in consideration were the wrist, forearm, and upper arm. The 3D circular gestures resembled CycloStar. Linear gestures push inwards to zoom in and pull outwards to zoom out, while circular ones involve making circles with the hand.

Discussion
The authors wanted to determine which forms of interaction were best for a large interactive display. Their tests were comprehensive and thorough, so I have no reason to doubt the validity of their claims.

This might be useful for major disaster emergency response teams. A large display combined with available information could be vastly useful, albeit expensive to implement. Dr. Caverlee is starting a project that gleans disaster information from social media sites; these technologies would certainly work well together.

With the initial steps put together in this paper, I would like to see future work that more finely hones the details of the preferred types of interactions.

Paper Reading #21: Human Model Evaluation in Interactive Supervised Learning

Reference Information
Human Model Evaluation in Interactive Supervised Learning
Rebecca Fiebrink, Perry R. Cook, Daniel Trueman
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Rebecca Fiebrink is an Assistant Professor in Computer Science and affiliated faculty in Music at Princeton. She received her PhD from Princeton.
  • Perry R. Cook is a Professor Emeritus in Princeton's Computer Science and Music Departments. He has a number of papers combining music and software in innovative ways.
  • Daniel Trueman is an Associate Professor of Music at Princeton. He has built a variety of devices, including hemispherical speakers and a device that senses bow gestures.

Summary
Hypothesis
Can interactive supervised learning be used effectively by end users to train an algorithm to correctly interpret musical gestures?

Methods
The authors conducted three studies. The first (A) consisted of composers who created new musical instruments with the incrementally changing software. The instruments drive digital audio synthesis in real-time. Input devices were varied.

The second study (B) asked users to design a system that would classify gestures for different sounds and had a task similar to the prior study. Students would perform a piece through an interactive system they had built.

The third test (C) involved a cellist and building a gesture recognizer for bow movements in real time. The cellist labeled gestures appropriately to provide initial data.

Results
Each study found that users adjusted their training sets and built them incrementally. Cross-validation was used only occasionally in the latter two studies and never in the first. Those who did use it found that high cross-validation accuracy was a reliable clue that the model was working. Direct evaluation was used far more commonly in each study and in a wider set of circumstances than cross-validation. It checked correctness against user expectations, suggested that users assigned a higher "cost" to more incorrect output (A, B, C), assisted in transitioning between labels for parts of the bow (C), let users use model confidence as a basis for judging algorithm quality (C), and made it clear that sometimes complexity was desired, as users explicitly acted to add more complexity to the model (A).

In the third study, the cellist used direct evaluation to subjectively grade the system after each training round. Interestingly, higher cross-validation accuracy did not always coincide with higher subjective ratings.

Users learned to produce better data, what could be accomplished with a supervised machine learning system, and possible improvements to their own techniques (C). Users rated Wekinator very highly.

Contents
There has recently been interest in integrating humans into the machine learning process to produce better and more useful algorithms. One approach is supervised machine learning, which trains the software on example inputs and outputs. The trained model generalizes, producing a likely result for input not in the data set. Because building good data sets is difficult, cross-validation is used to estimate generalization: the data is partitioned into training and held-out portions, and the model is evaluated on the held-out inputs. Human interaction can be added to the machine learning process, especially to judge the validity of outputs.
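As a concrete, simplified illustration of cross-validation as an automatic check, the sketch below uses scikit-learn with made-up feature vectors. Wekinator itself is built on Weka, so this is illustrative only, and the classifier and data are assumptions:

```python
# Sketch: a k-fold cross-validation estimate of how well a gesture
# classifier generalizes -- the kind of automatic check the paper
# contrasts with direct, hands-on evaluation by the user.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))       # stand-in feature vectors (e.g. sensor readings)
y = rng.integers(0, 3, size=60)    # stand-in gesture class labels

scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores.mean())  # mean held-out accuracy across the 5 folds
```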

The authors used gestures, both manual and of a cello bow, as inputs. The manual one was used not only to model gestures correctly but to allow for real-time performance of the system. Gesture modeling works for interactive supervised learning because the algorithm must be trained to the person, who can then adjust based on results. Their software is called Wekinator. The system allowed for both cross-validation and direct evaluation, which let users learn implicitly how to use the machine learning system more effectively without any explicit background training in machine learning. Even with cross-validation, a training set might be insufficiently general, and generalizing is paramount; since the training is user-performed, the data set might be optimized for only a select range, which is not ideal.

Discussion
The authors wanted to know if interactive supervised machine learning could produce a viable means of handling music-related gestures. Their three tests found that the users could fairly easily use the software to produce a valid model for the task at hand. As such, I am convinced that interactive supervised machine learning is viable for at least this case, if not for more general ones.

This generality of applications is something that I hope to see addressed in future work. The possible users needed to be experienced in a musical programming language before they could even start to train the software. I wondered if a string player who has a pathological fear of PureData (and its less-free cousin, used by the authors) could learn to use a variation of the software. If so, perhaps this concept can be extended to the increasingly less technically-minded.

Thursday, November 3, 2011

Paper Reading #27: Sensing Cognitive Multitasking for a Brain-Based Adaptive User Interface

Reference Information
Sensing Cognitive Multitasking for a Brain-Based Adaptive User Interface
Erin Treacy Solovey, Francine Lalooses, Krysta Chauncey, Douglas Weaver, Margarita Parasi, Matthias Scheutz, Angelo Sassaroli, Sergio Fantini, Paul Schermerhorn, Audrey Girouard, Robert J.K. Jacob
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Erin Treacy Solovey is a PhD candidate at Tufts University with an interest in reality-based interaction systems to enhance learning.
  • Francine Lalooses is a graduate student at Tufts University who studies brain-computer interaction.
  • Krysta Chauncey is a Post-Doc at Tufts University with an interest in brain-computer interaction.
  • Douglas Weaver is a Master's student at Tufts University working with adaptive brain-computer interfaces.
  • Margarita Parasi graduated from Tufts University and is now a Junior Software Developer at Eze Castle.
  • Matthias Scheutz is the Director of Tufts University's Human-Robot Interaction Lab.
  • Angelo Sassaroli is a Research Assistant Professor at Tufts University and received his doctorate from the University of Electro-Communications.
  • Sergio Fantini is a Professor of Biomedical Engineering at Tufts University, with research in biomedical optics.
  • Paul Schermerhorn is affiliated with Tufts University.
  • Audrey Girouard received her PhD from Tufts University and is now an Assistant Professor at Carleton University's School of Information Technology.
  • Robert J.K. Jacob is a Professor of Computer Science at Tufts University with research in new interaction modes.


Summary
Hypothesis
Can a non-invasive functional near-infrared spectroscopy (fNIRS) help with the design of user interfaces with respect to multitasking? How does it compare to a prior study that used a different system?

Methods
Their preliminary study used fNIRS in comparison to the earlier functional MRI study, with the hope of distinguishing brain states as fMRI did. The design paradigm was identical apart from the sensors. After noise was removed from the data, the authors classified it using leave-one-out cross-validation.
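A minimal sketch of leave-one-out cross-validation is below; the paper's classifier and features are not described in this summary, so the linear SVM and the random stand-in data are assumptions:

```python
# Sketch: each trial is held out once and predicted by a model trained on
# the remaining trials; accuracy is averaged over all held-out trials.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 16))      # stand-in: one feature vector per trial
y = rng.integers(0, 2, size=30)    # stand-in: condition label per trial

acc = cross_val_score(SVC(kernel="linear"), X, y, cv=LeaveOneOut()).mean()
print(acc)
```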

The second set of experiments involved human-robot cooperation. The human monitored the robot's status and sorted rocks found on Mars by the robot, and the information from the robot required a human response for each rock found. The Delay task involved checking for immediately consecutive classification messages. Dual-Task checked whether successive messages were of the same type. Branching required the Delay process for rock classification messages and the Dual-Task process for location messages. The tasks were presented in pseudo-random order and repeated until 80% accuracy was achieved. Then the fNIRS sensors were placed on the forehead to get a baseline measure, and the user completed ten trials for each of the three conditions.

The authors then tested if they could distinguish variations on branching. Random branching presented messages at random, while Predictive presented rock messages for every three stimuli. The procedures were otherwise identical to the preceding test.

The final test used branching and delay tasks to indicate branching and non-branching. The system was trained using the data from that test and then classified new data based on the training results.

Results
The preliminary study was about 68.4% successful in three pairwise classifications and 52.9% successful for three-way classification, suggesting that fNIRS could be used for further studies.

For the first test of the robot task, with users who performed at less than 70% accuracy removed from consideration, data was classified into normal and non-Gaussian distributions. Normal distributions were analyzed with repeated-measures one-way ANOVA, and the latter with non-parametric repeated-measures ANOVA. Accuracy and response time did not have a significant correlation. No learning effect was found. Total hemoglobin was higher in the branching condition.

The next test found no statistically significant difference in response times or accuracy. The correlation between these two was significant for predictive branching. Deoxygenated hemoglobin levels were higher in random branching for the first half of the trials, so this can be used to distinguish between the types of tasks.

In the last test, the system correctly distinguished between task types and adjusted its mode.

Contents
Multitasking is difficult for humans unless the tasks allow data to be integrated into one another. Repeated task switching can lead to lower accuracy, increased duration, and increased frustration. Past work has measured mental workload through means like pupil dilation and blood pressure; other work measured interruptibility costs based on different inputs, such as eye tracking or desktop activity. One study identified three multitasking scenarios: branching, dual-task, and delay. The first involves a tree of tasks, the second involves unrelated tasks, and the last involves delaying secondary tasks until later. Functional MRI (fMRI) was used for that study and could distinguish between the tasks.

Non-invasive brain-sensing systems can help enhance researchers' understanding of the mind during tasks. The authors' system uses functional near-infrared spectroscopy to distinguish between the four states that occur with multitasking. Their goal is to use this system to assist in the development of user interfaces. They chose a human-robot team scenario to understand cognitive multitasking in UIs; these tasks require the human to perform a task while monitoring the robot.

The authors developed a system that can adjust a UI on the fly depending on the type of task being performed. The system can adjust the robot's goal structure and also has a debug mode. When a branching state appears, the robot autonomously moves to new locations; otherwise it returns to the default state.

Discussion
The authors wanted to find out if fNIRS could distinguish between tasks to adjust user interfaces for better multitasking. Their multiple tests and proof-of-concept certainly suggest that this is a possible way to enhance multitasking.

I am concerned that this innovation will lead to yet another level of complexity when designing user interfaces, since we now have to classify possible scenarios we think will occur. If this could be automated somehow, that would be beneficial.

The authors found one way to distinguish between tasks, but I imagine that there are far more. Future work might find one that is even less obtrusive than external sensors.

Thursday, October 27, 2011

Paper Reading #24: Gesture Avatar: A Technique for Operating Mobile User Interfaces Using Gestures

Reference Information
Gesture Avatar: A Technique for Operating Mobile User Interfaces Using Gestures
Hao Lu, Yang Li
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Hao Lu is a graduate student in the University of Washington's Computer Science and Engineering Department and DUB Group. He is interested in new interaction methods.
  • Yang Li is a senior research scientist at Google and was a research associate at the University of Washington. He holds a PhD in Computer Science from the Chinese Academy of Sciences. He is primarily interested in gesture-based interaction, with many of his projects implemented on Android.

Summary
Hypothesis
Does Gesture Avatar outperform Shift? Is it slower on large objects and faster on small ones? Is it less error-prone? Will mobile situations affect it less than Shift?

Methods
The walking tasks were performed on a treadmill. All participants used touchscreen phones extensively. A within-subjects factorial user study had half the participants use Shift then Gesture Avatar, with the order reversed for the other half. For each technique, the task was performed while sitting and while walking. Users were asked to locate a single highlighted target out of 24 small letter boxes. Ambiguity was simulated by controlling the distance between objects and the number of letters used. Finger positions were stabilized.

Results
Gesture Avatar was faster than Shift for small targets and slower for large ones; medium sizes did not produce significantly different times. Shift while sitting was faster than Shift while walking, but Gesture Avatar showed no difference. Gesture Avatar was far more consistent than Shift across the varying tests and also had lower error rates. Most participants preferred Gesture Avatar.

Contents
Touchscreen mobile devices struggle with low precision due to the size of fingers and the finger occluding the target. Increasing the size of widgets is not always feasible. The authors developed Gesture Avatar, which combines the visibility of traditional GUIs with the casual interaction of gestures. The user's gesture is drawn on the screen and can be manipulated to affect the underlying widget. The translucent bounding box of a drawn gesture is associated with an element and can be tapped to use the widget. Gestures can be arbitrary and support multiple types of interaction. An incorrect association can be corrected with directional gestures or by dismissing it. Previous work in this area does not focus on gestures or requires significant adjustment of the UI.

The system first distinguishes taps from gestures and then displays the gesture with its bounding box and target. Gestures are first classified as characters or shapes and then identified, with shapes being compared to on-screen objects. A 2D Gaussian distribution is used to determine the target. Possible uses include a mobile browser, media player, moving the caret in a text box, and navigating a map program. In all cases, the program was wrapped into the UI as an additional interaction layer.
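How the 2D Gaussian is combined with the gesture match is not spelled out in this summary, so the sketch below is hypothetical: it weights each candidate's match score by a Gaussian of its distance from the drawn gesture, with the variance chosen arbitrarily:

```python
# Hypothetical sketch (variance and the form of the match score are my
# assumptions, not the paper's): pick the on-screen item whose label best
# matches the recognized gesture, weighted by a 2D Gaussian of distance.
import math

def pick_target(gesture_center, candidates, sigma=80.0):
    """candidates: list of (item, match_score, (x, y)) tuples, where
    match_score in [0, 1] says how well the item's label matches the
    recognized gesture and (x, y) is the item's on-screen position."""
    def weight(item_pos):
        dx = item_pos[0] - gesture_center[0]
        dy = item_pos[1] - gesture_center[1]
        return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))

    return max(candidates, key=lambda c: c[1] * weight(c[2]))[0]

# Example: two links both matching the drawn letter; the closer one wins.
links = [("cache", 1.0, (110, 205)), ("cited by", 1.0, (118, 520))]
print(pick_target((100, 220), links))  # "cache"
```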

Gesture Avatar can be accelerated in cases of low ambiguity and can track moving targets. The stroked gesture is shown to be consistent with the avatar metaphor. It struggles with mode switching from gestures to actions.

Discussion
The authors of this paper tested Gesture Avatar against a preexisting method and supposed that theirs would perform better in many respects. Their user test was comprehensive in its range of variables, so I have no doubt that Gesture Avatar performs better than Shift in a number of ways.

When I was reading this paper, I couldn't help but think of Li's Gesture Search (to the point where I would accidentally type "Search" instead of "Avatar"). The upcoming release of Android 4 strives for a more consistent interface, and I can't think of what would be more consistent than combining these two systems on a mobile device. I would have tested this myself, but the interface is not yet available.

This system seems like it could help considerably with navigating webpages, much less the other examples provided. I frequently need to visit the cached version of a website in Google's search results, but usually my fingers, which are very far from huge, trigger another action instead. A targeted gesture avatar could dramatically reduce that frustration, which for me would make the software definitely worth the download for that alone.

Monday, October 24, 2011

Paper Reading #20: The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures

Reference Information
The Aligned Rank Transform for Nonparametric Factorial Analyses Using Only ANOVA Procedures
Jacob O. Wobbrock, Leah Findlater, Darren Gergle, James J. Higgins
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios

  • Jacob O. Wobbrock is an Associate Professor in the Information School and an Adjunct Associate Professor in the University of Washington's Department of Computer Science and Engineering. He focuses on novel interaction techniques.
  • Leah Findlater will be affiliated with University of Maryland's College of Information Studies, but is currently in the Information School at the University of Washington.
  • Darren Gergle is an Associate Professor in Northwestern University's Department of Communication Studies and Department of Electrical Engineering. His work is in HCI related to visual information.
  • James J. Higgins is a Professor in Kansas State University's Department of Statistics.


Summary
Hypothesis
How well does an aligned rank transform (ART) analysis of existing data sets correspond with results the original authors obtained through other measures?

Methods
The authors used their software with data sets from published HCI work and compared the results with the original authors' findings. One case evaluated the use of ART to examine interaction effects. The second showed how ART is not bound to the distributional assumptions of ANOVA. The third was nonparametric testing of repeated-measures data.

Results
The first case study revealed a possible interaction that the Friedman test used in the original could not have found. It also confirmed interactions that even an inappropriate test had revealed.

The second study originally found minimal interactions because the data was lognormal. The ART test revealed that the interactions were far more significant.

The third study found that ART reduced the skew in the data and revealed all of the significant interactions, which could not be found in the original study.

Contents
Nonparametric data appears frequently in multi-factor HCI experiments, but current methods are likely to violate ANOVA assumptions or do not allow for the examination of interaction effects. Methods exist to solve this problem but are not widely available or easy to use. The authors developed a generalizable system that relies on the ART to align and rank data before performing F-tests. ARTool is the desktop version and ARTweb runs online.

The ART is usable in situations similar to parametric ANOVA, but it does not require a continuous response variable or normally distributed data. Rank tests apply ranks to data sets, and alignment adjusts the data to remove all effects except the one of interest. The procedure follows five steps. The first is to compute residuals: each response minus the mean of all responses that share its combination of factor levels. Second, the estimated effects are calculated for all main and interaction effects; the authors present a generalized version for an n-way interaction. Third, the aligned response is found by adding the results of the previous two steps. Fourth, averaged ranks are assigned, with the smallest aligned response receiving a rank of 1 and so on, ties receiving the average of their ranks. Fifth, a full-factorial ANOVA is performed on the ranks from the previous step. There are two opportunities to assess correctness: every aligned column from the third step should sum to 0, and an ANOVA on the aligned (but not yet ranked) data should show all effects stripped out except the one for which the data was aligned.
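A minimal two-factor sketch of the alignment and ranking steps (steps one through four) for the interaction effect is below; this is not ARTool itself, just an illustration of the procedure on made-up data:

```python
# Sketch of aligning and ranking for the A x B interaction in a two-factor
# design; the final F-test would be a standard full-factorial ANOVA on the
# "aligned_rank" column.
import pandas as pd

df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a1", "a1", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2"],
    "Y": [3.0, 5.0, 4.0, 9.0, 2.0, 6.0, 5.0, 8.0],
})

grand = df["Y"].mean()
cell  = df.groupby(["A", "B"])["Y"].transform("mean")
a_mu  = df.groupby("A")["Y"].transform("mean")
b_mu  = df.groupby("B")["Y"].transform("mean")

residual  = df["Y"] - cell                   # step 1: residuals
ab_effect = cell - a_mu - b_mu + grand       # step 2: estimated A x B effect
df["aligned"]      = residual + ab_effect    # step 3: aligned response
df["aligned_rank"] = df["aligned"].rank()    # step 4: average ranks (ties averaged)

# Sanity check from the paper: the aligned column should sum to (about) 0.
print(round(df["aligned"].sum(), 10))  # 0.0
print(df)
```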

ARTool parses long-format data tables and produces aligned and ranked responses for all main and interaction effects. It produces descriptive error messages in case of a problem, and its output contains (2+N)+2(2^N-1) columns. The system does not work well with extremely skewed data and is best suited to randomized designs.

Discussion
The authors developed a system that applies a vetted statistical method and then validated it against pre-existing papers. Their method confirmed the original results and also revealed interesting avenues of future work that had previously gone unnoticed. Because of this, I found their results to be believable.

I held some concerns about ART, because such a useful method ought to be standard in the average statistical package. On the other hand, it may just be a matter of the method's having gone unnoticed in the CHI community.

Possible future work could include comparisons of the various statistical methods the authors discussed with ART on the same data set. This could help to validate it as a means of evaluating CHI results.

Book Reading #4: Obedience to Authority


Milgram's Obedience to Authority was a fascinating, if not disturbing, read. One of my biggest complaints about conference papers is that the page limit puts a harsh constraint on the extent of detail that authors can provide about their experiment. With an entire book to fill, Milgram more than filled the pages and probably could have gone on for much longer. Like Norman, he clearly had a lot to say; however, Milgram actually avoided repeating details wherever possible. That meant that each page had a new detail about the failings of humanity when authority gets involved.

I was greatly concerned that humanity as a whole is so easily manipulated by a man in a lab coat stating that the experiment must continue at all costs. I started the book knowing that I was going to be alarmed by the results, but that didn't make the crushing knowledge that Milgram initially expected results like I did—with people stopping sooner than they did—any less painful. The whole time I kept thinking to myself that I wouldn't be like the people who were obedient, but the more I read, the more I considered that my morality might just fail as theirs did.

I was surprised that no one at all had realized that the shocking experiment was a sham. Surely someone had enough electrical cognizance to think that the hand plate would shock the person holding down the actor too. There was even an electrician who took part in the experiment and was taken in by the ruse. I don't even know how the huge numbers of people—from all backgrounds too—couldn't figure out that they weren't actually going to hurt someone.

The woman who deluded herself into thinking that she was more reluctant to continue the experiment was fascinating. I didn't realize just how much memories could be manipulated to make it seem as though we are more benign than we actually act when presented with authority. I suspect that was the case with many of the people involved with the experiment. We are largely good people, so anything that lets us believe we acted better than we really did gets entertained as a possibility.

I appreciated that Milgram thoroughly considered possible factors that could affect our willingness to comply with authority. The study on conflicting authorities was a relief to read, since it suggested that people make the right decision when presented with an apparent moral quandary. On the other hand, the fact that we readily fall in line with other people bothered me. The natural tendency is to do what everyone else wants us to do, even if that is utterly wrong.

The discussion on the importance of obedience in the creation of society provided some explanation of why it is our nature to obey. While I do have to agree that obedience and hierarchy make a functioning society possible, they can be corrupted to serve extremely detrimental causes. Perhaps we as a whole need to rebel against our natural tendencies whenever we feel that something is wrong, since internal morality so frequently tends to be in line with what it should be.

I had expected the book to be considerably more dull than it actually was. This is due at least in part to Milgram, though a large portion of it was because of the interesting nature of the material. Milgram's writing is almost understated; he allows the content to be the predominant focus, not flowery prose. His matter-of-fact tone makes it seem as though he means to discuss the most boring topic on earth, which simply makes the material that much more intriguing, since he is talking about something that cannot be considered mundane in any sense.

Paper Reading #19: Reflexivity in Digital Anthropology


Reference Information
Reflexivity in Digital Anthropology
Jennifer A. Rode
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios
Jennifer A. Rode is an Assistant Professor at Drexel's School of Information and a fellow in Digital Anthropology at University College London. Her dissertation research involved an ethnography examining gender and domestic end-user programming for computer security.

Summary
Hypothesis
What are the contributions of anthropological ethnographies to the study of HCI? How can reflexive ethnographies contribute?

Methods
The author referred to a wide body of other works to define the various forms of ethnography, especially with respect to HCI. She also determined which aspects of modern anthropological ethnographies are missing from HCI work. She then discussed the different ways of producing an ethnography and a design together.

Results
The voice of the ethnographer is critical to a complete ethnography. Without it, the reasoning behind a given design goes untold. Discussing rapport, participant-observation, and use of theory all aid in producing a reflexive ethnography, which can be more useful in design than positivistic works. Iterative ethnographies feature grounded theories and increase users' interest in a design. However, all forms of ethnography have their individual merits.

Contents
HCI is slow to adopt changing practices in anthropology, especially digital anthropology, the comparative ethnography of how technology shapes the way humans experience life. While there is HCI work in that area, it is not reflexive. Reflexivity is one of two anthropological approaches and embraces intervention as a means of gathering data. The social-technological gap can be studied through reflexive works. The other approach is positivism, which celebrates data above all else. Ethnographies are useful aids in the design process.

There are three main forms of writing ethnographies: realist, confessional, and impressionistic. The realist form is the only one commonly accepted in HCI papers. It shares several traits with positivism and focuses on experiential authority, typical forms, the native's point of view, and interpretive omnipotence. The first means that the researcher minimizes reactivity to observed events. Typical forms include data and precision. The native's point of view implies that the ethnography is precisely in line with the views of those being observed. Interpretive omnipotence does not allow the observed to participate in the writing and does not allow for uncertainty in the information. Confessional ethnographies reveal the author's biases to address the inherent subjectivity of ethnographic work. Impressionistic ethnographies create a narrative of the daily lives of those observed. Confessional and impressionistic ethnographies are normally found only as portions of a realist work. The author emphasized that none of the presented styles lacks rigor.

There are several elements of modern anthropological works that tend to be absent from CHI papers: discussion of rapport, participant-observation, and use of theory. Rapport enables access to valid data, and discussing it serves as an explanation of method. It helps to explain data, but it is frequently treated as implicitly understood, which detracts from the quality of the ethnography because the unknown is never enumerated. Participant-observation means becoming a part of the participants' daily lives. It tests hypotheses through experience rather than external validation and is not discussed first-hand in realist works. Use of theory entails using previous theories as a basis for newer ones that are formed only after working with participants.

HCI ethnographies are usually formative, summative, or iterative. Formative ethnographies focus on current technology in order to improve it or produce new technology, and they are the most common in HCI. Summative ethnographies evaluate technology after its design is finished. Iterative ethnographies produce a design in stages, with participants actively aiding the design both directly and indirectly; they require a particularly long time to produce.

Discussion
The author sought to analyze the contributions of anthropological ethnographies to the study of HCI. Her paper examined several HCI papers of varying qualities and cited prominent works in ethnography. Her argument was sufficiently well supported that I believe her conclusion that all three forms of ethnography are valid for CHI papers.

Given how much Rode frowned upon positivistic papers, I was a little surprised by her conclusion at first. On a second reading, though, I realized that she was merely suggesting that other forms of ethnography should have merit too, not advocating reflexivity over positivism. The tone of the paper as a whole led me to that first conclusion, so it might have been worth explaining her stance more clearly before the final page.

As for future work, iterative ethnographies could be useful in the design process, but it would be worth investigating whether the extra time spent on iterative design yields a proportionally better product. Since the type of paper the author proposed is very rare, more research should be done to see whether it is a viable alternative.

Thursday, October 13, 2011

Book Reading #3: Emotional Design: Why We Love (Or Hate) Everyday Things


I feel there is no better way to describe Donald Norman's Emotional Design than to compare it with The Design of Everyday Things. It was, simply put, a radical departure from that previous work of his that we read. It seemed to me that the author had some sort of enlightenment, since he went from disapproving of things that probably “won an award” to celebrating visual aesthetics as an important factor in the likability of a device. Honestly, I wonder what could have caused such a radical change in viewpoint. I would expect him to have grown more disillusioned with the visceral as time passed, not less so. It could be argued that he had an epiphany that brought him from relative ignorance (though he hardly believed himself ignorant in the past) to the realization that people like things for more than just their uses. After all, someone had to design the thing in the first place; who would design something they did not like, short of being forced to do so? The worship of image that he claimed was detrimental in Design became a perfectly valid reason to produce an object in Emotional. If I had been told while reading Design that Norman would transform his mentality into nearly its polar opposite, I would have told the person to stop lying. It is, after all, quite rare to completely reverse one's viewpoint rather than simply adjust it. Fifteen years is a long time, but I am not certain it is a sufficiently long period.

I liked how he divided the dominant draws of an object into the visceral, behavioral, and reflective factors. His previous book left me wondering why he was so adamantly against beautiful objects that are just that and nothing more. In this book, though, he revealed the hypocrisy of his works: he too likes useless objects. Some of them are more than just knick-knacks, like his impossible tea kettle; they occupy a none-too-ignorable amount of space that he could have filled with more functional objects, like those he praised in Design. Considering he was so concerned with the operability of everyday objects in Design, I have to wonder if he was at all thinking about things that he didn't technically manipulate on a daily basis, but simply observed. Perhaps he thought that curiosities were not everyday enough to be considered. Regardless, I was pleased to see that he extended his definition of everyday objects to include those that are simply observed, not manipulated.

If I had been given the choice between reading both of Norman's books or just this one, I definitely would have selected just this one. Norman is apparently quite fond of reusing material extensively. When I saw some of the examples (and images!) from Design reappear here, I was a little astonished. (I am, however, grateful that he spared us the wish for a portable computer organizer in this book.) I would understand the reuse of content if he were refuting his previous ideas, but he only did this with some of the concepts. Even without any background in his prior works, I feel I would have understood his ideas thoroughly. Of course, for all I know, he might have written two more books extolling the visceral and reflective attributes of objects. What I do know is that I hardly needed a further explanation of the same, given the thoroughness with which Norman writes. Perhaps that is why I am so frustrated with Norman. I don't mind his writing style too much, aside from his tendency to overexplain a concept. With his previous book, the last chapter taught me all I needed to know. Similarly, this one taught me all that I needed to know about the other book. His writing here was actually less dense and more accessible, but still just as informative.

Paper Reading #18: Biofeedback Game Design: Using Direct and Indirect Physiological Control to Enhance Game Interaction

Reference Information
Biofeedback Game Design: Using Direct and Indirect Physiological Control to Enhance Game Interaction
Lennart E. Nacke, Michael Kalyn, Calvin Lough, Regan L. Mandryk
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios
  • Lennart E. Nacke is an Assistant Professor of HCI and Game Science at the University of Ontario Institute of Technology. As a postdoctoral researcher, he worked in the Interaction Lab of the University of Saskatchewan and studied affective computing.
  • Michael Kalyn is an undergraduate student in the Interaction Lab of the University of Saskatchewan's Department of Computer Science. His focuses are in affective feedback and interfacing sensors.
  • Calvin Lough is affiliated with the Interaction Lab of the University of Saskatchewan's Department of Computer Science. His research is in affective computing.
  • Regan L. Mandryk is an Assistant Professor in the Interaction Lab of the University of Saskatchewan's Department of Computer Science. Her research focuses on affective computing and ubiquitous and mobile gaming.
Summary
Hypothesis

How do users respond to physiological sensors that work with game controllers? Which types of physiological sensors work for certain game tasks?

Methods
The authors developed a side-scrolling shooter that uses a traditional controller as the normal form of input. Physiological sensors augment the controller through indirectly or directly controlled input, so participants played three versions of the game. Two game conditions used direct sensors and four used indirect. The direct measures were respiration and EMG on the leg, and the indirect included GSR and EKG. Both physiological games had the eye gaze power-up, which the control condition lacked. All participants played the three games, presented in random order, after playing a training level. The players completed questionnaires about their experience after each game and again after completing all the levels. The players were not very experienced with side-scrolling shooters and had mostly used Nintendo's Wii and DS as their only forms of novel input.


Results
Players found physiological controls more fun than playing with the controller alone, with 90% preferring some form of physiological control. The pleased users liked the increased level of involvement and variety. Participants agreed that physiological control was novel, requiring a little learning at first but then feeling quite natural. Users preferred eye gaze the most, and only 1/20 of the votes went to indirect sensing. Overall, direct input was preferred to indirect in every category tested. The GSR and EKG sensors were difficult to use, EMG responses were split fairly evenly, and the respiration sensor was liked for its ease of use. The temperature sensor was easy at first, but users found it tedious over time. Users preferred multiple forms of input. Direct controls responded in real time and suited controlling the player's avatar, while indirect controls were slower to respond and thus better suited to controlling the environment. Direct controls were also thought to increase the player's sense of accomplishment. Natural mappings were preferred. The players were comfortable wearing sensors as long as they contributed to gameplay.

Contents
The authors developed a classification of direct and indirect sensor input to work alongside traditional game controls. Current physiological game design paradigms revolve around indirectly controlled signals, like heart rate, whereas eye gaze and muscle flexion are directly controlled. Computer games allow for a low-risk way of testing physiological HCI. This sort of interaction is called affective gaming and relies on the player's emotional state. Replacing traditional controls entirely with biofeedback has not worked well in the past. Adaptive affective games use biofeedback to alter technical parameters or user preferences based on, for example, controller movement or button pressure. Indirect controls allow players to learn to control signals such as their brainwaves, though a previous study showed that people liked explicit biofeedback in first-person shooters. The physiological measures considered include eye gaze, electromyography (EMG), galvanic skin response (GSR), electrocardiography (EKG), respiration sensors, and temperature sensors. The premise of biofeedback training is to turn indirectly controlled physiological measures into increasingly directly controlled ones. Indirect sensors are available to consumers, but direct ones are not at the moment.
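
To make the classification concrete, here is a minimal sketch (in Python, not the authors' C# library) that groups the sensors named in this summary and applies the rule of thumb from the results section: direct signals suit avatar control, while slower indirect signals suit environmental effects. The function name and structure are my own illustrative assumptions.

```python
# Illustrative sketch of the direct/indirect sensor classification described
# above. Sensor names come from this summary; suggested_target() follows the
# results section (direct -> player avatar, indirect -> environment).
SENSOR_CLASS = {
    "eye_gaze": "direct",        # consciously controllable in real time
    "emg_leg": "direct",
    "respiration": "direct",
    "gsr": "indirect",           # reflects slower physiological state
    "ekg": "indirect",
    "skin_temperature": "indirect",
}

def suggested_target(sensor: str) -> str:
    """Pick a game element based on how quickly the signal can be controlled."""
    return "player avatar" if SENSOR_CLASS[sensor] == "direct" else "environment"
```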

The authors' game used controller mappings common in Xbox 360 shooter games, with physiological input controlled separately. The game featured bosses and checkpoints, which could only be activated once a player killed all the enemies. The size of enemy targets' shadows increased based on physiological control and provided a larger hit box. The backup flamethrower's range was variable. Speed and jump height also varied, though these two factors were tied together. The rate of snowfall during the final boss fight changed as well. An eye gaze power-up was included but only lasted for 20 seconds to reduce eye strain and maintain game balance. The sensors were integrated through a custom C# library.
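
The general shape of such a mapping is simple: a normalized physiological reading scales a game parameter between design limits. The sketch below illustrates that idea; the parameter ranges and the pairing of sensors to mechanics are my own assumptions, since the summary does not specify which sensor drove which element.

```python
# Hedged sketch of mapping a normalized sensor reading onto a game parameter.
# Ranges and sensor-to-mechanic pairings are illustrative, not the authors'
# exact design.
def scale(reading: float, low: float, high: float) -> float:
    """Clamp a reading to [0, 1] and map it linearly onto [low, high]."""
    reading = max(0.0, min(1.0, reading))
    return low + reading * (high - low)

respiration_level = 0.7   # hypothetical direct-sensor reading
gsr_level = 0.3           # hypothetical indirect-sensor reading

jump_height = scale(respiration_level, low=1.0, high=2.5)    # avatar movement
snowfall_rate = scale(gsr_level, low=0.0, high=100.0)        # environmental effect
print(jump_height, snowfall_rate)
```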

Discussion
The authors tested user preference for biofeedback controls as part of a game and examined which types of controls are ideal for certain tasks. Their user test was small and focused on a single genre, but they created a basic framework that a large body of future work will be able to build upon.

I was initially concerned that the sensors that were applied to users would be cumbersome. However, the players were okay with most sensors so long as they contributed to gameplay and were not excessively taxing. That brought up an important limitation: the types of sensors used must not tire the user.

I hope that future work examines additional genres. The next foreseeable step is first-person shooters, which are closely related to their side-scrolling brethren. I would be very interested in seeing how well "god games" like Civilization could be played with biofeedback, since that seems like the genre where the technology is least likely to be effective.

Paper Reading #17: Privacy Risks Emerging from the Adoption of Innocuous Wearable Sensors in the Mobile Environment

Reference Information
Privacy Risks Emerging from the Adoption of Innocuous Wearable Sensors in the Mobile Environment
Andrew Raij, Animikh Ghosh, Santosh Kumar, Mani Srivastava
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada

Author Bios
  • Andrew Raij is a Post Doctoral Fellow in University of Memphis's Computer Science Department as a member of Dr. Kumar's lab. He is interested in persuasive interfaces.
  • Animikh Ghosh is a junior research associate at SETLabs and was a research assistant to Dr. Kumar at the University of Memphis. He is interested in privacy risks from participatory sensing.
  • Santosh Kumar is an Associate Professor in the University of Memphis's Computer Science Department. He leads the Wireless Sensors and Mobile Ad Hoc Networks Lab.
  • Mani Srivastava is a professor in UCLA's Electrical Engineering Department. He worked at Bell Labs, which he considers to be the driving force behind his interest in mobile and wireless systems.

Summary
Hypothesis
How comfortable are individuals with the possibility that their data may be made public? How can we reduce the risk of sensitive data leakage?

Methods
The authors administered a survey to people who had their data stored, people who did not have data stored, and people with stored data who had been informed of the extent of that data, in order to determine their comfort levels with the possibility of data compromise. The people with stored data thus took the survey twice, after having participated in a companion study using AutoSense that stored their data. The participants were college students. The survey measured their level of concern about data disclosure both with and without data restrictions and abstractions. Participants were shown the collected data through the Aha visualization system.

Results
Positively-associated activities, like exercise, were acceptable to share, as was location. The group with no data at stake and the group that had not yet learned of the extent of the data collected about them showed similarly low levels of concern about data storage. After learning about their data, the second group gave higher concern ratings. Some participants mentioned that they expected physiological states to remain private. Adding a temporal context increased concern, with increasing abstraction reducing it. Duration was less worrisome than a timestamp. Making data sets more public also concerned users, especially when identity was included in the data. Participants seemed initially naive to the dangers of shared data, with the exception of location. The differing concerns about certain activities suggest that privacy should be handled to different extents depending on the study.

Contents
Wearable sensors record sensitive physical and physiological data, to which machine learning algorithms can be applied. These algorithms reveal a wealth of private information about behavioral states and activities, including stress levels and addictions. These inferences can be shared without the user's permission, potentially revealing private data or identifying the individual. Data sets produced from tests of wearable sensors cannot be released for that reason. Most notably, seemingly innocuous data can be combined to produce informed inferences about a person. Sensor data is hard to anonymize because it is inherently sensitive and quasi-identifying.

The authors produced a framework that focuses on how to shift the boundary where privacy and publicity are in tension. It covers measurements, behaviors, contexts, restrictions, abstractions, and privacy threats. Behaviors and contexts are derived from measurements. Contexts can be further subdivided into temporal, physical, physiological, and social contexts. Restrictions and abstractions safeguard data: the former removes data from the set, while the latter reduces the level of detail of what remains.
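
As a concrete, entirely hypothetical illustration of the difference between the two safeguards, the sketch below drops sensitive fields outright (a restriction) and coarsens exact timestamps into a duration (an abstraction). The record fields are invented for the example and are not from the paper's data set.

```python
# Minimal sketch of restriction vs. abstraction on a made-up behavior record.
from datetime import datetime

record = {
    "activity": "conversation",
    "start": datetime(2011, 10, 13, 14, 2),
    "end": datetime(2011, 10, 13, 14, 47),
    "location": (35.118, -89.937),   # hypothetical GPS fix
    "stress_level": 0.82,            # hypothetical inferred state
}

def restrict(rec, fields):
    """Restriction: remove sensitive measurements from the record entirely."""
    return {k: v for k, v in rec.items() if k not in fields}

def abstract_time(rec):
    """Abstraction: report only how long the behavior lasted, not when."""
    out = dict(rec)
    out["duration_minutes"] = (out.pop("end") - out.pop("start")).seconds // 60
    return out

shared = abstract_time(restrict(record, {"stress_level", "location"}))
print(shared)   # {'activity': 'conversation', 'duration_minutes': 45}
```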

The authors developed the Aha visualization system to provide four visualizations of individual behavior, including daily life and stress.

Discussion
The authors wanted to find out how concerned users were about sensor data being used to determine things about them and how to prevent identifiable information from being released. Their survey was well-founded and their framework seems reasonable, so I am convinced that this paper is sound.

I was very interested to see just how much could be determined about a person through seemingly unrelated data points. It was actually extremely disturbing to think that so much information could be inferred through accelerometers and stress meters.

I would be very interested in seeing this survey expanded to cover a variety of demographics. While I would think that college students would be the most knowledgeable about the extent of the information they are revealing, I am curious what a child or a senior citizen might think. Perhaps generation gaps would emerge, or perhaps everyone would be equally ignorant of the dangers. Either way, I would love to see those results.

Paper Reading #15: Madgets: Actuating Widgets on Interactive Tabletops

Reference Information
Madgets: Actuating Widgets on Interactive Tabletops
Malte Weiss, Florian Schwarz, Simon Jakubowski, Jan Borchers
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Malte Weiss is a PhD student in the Media Computing Group at RWTH Aachen University. His work focuses on interactive surfaces and tangible user interfaces.
  • Florian Schwarz is a Diploma Thesis student in the Media Computing Group at RWTH Aachen University. His work is in interactive tabletop computing.
  • Simon Jakubowski is a student assistant in the Media Computing Group at RWTH Aachen University. He is working on two other projects besides this one.
  • Jan Borchers is the head of the Media Computing Group at RWTH Aachen University. He holds a PhD from Darmstadt University of Technology in Computer Science and now explores HCI.

Summary
Hypothesis
How can we create tangible magnetic widgets for use on tabletop computers?

Methods
The authors created a prototype of the Madgets system and various sample controls for it, including radio buttons and slider knobs. They considered general purpose widgets, height, force feedback, water wheel Madgets (which transfer energy), and mechanical audio feedback as possibilities.

Results
General-purpose widgets provide both haptic and visual feedback. A physical configuration can be saved for later use. Multiple users can also share the system, since it is actuated by electromagnets, which gives another user a tangible presence. The widgets also support ad-hoc usage.

Height is a possible feature, as the electromagnets can keep a Madget in place while lifting parts of it, as in a radio button. A user can feel both the shape of the button and its current state. A clutch control can lock or unlock moving parts to disable them, similar to "graying out" an option in current GUIs.

Force feedback can be generated through resistance to moving a part of a Madget. The algorithm used allows Madgets to vibrate and also to create dynamic notches when a user reaches a certain step on a scale.

The water wheels transfer energy from the table to the Madget. Inductive energy transfer, performed through plates, allows power to be sent to a Madget without additional components. Motors work through rotational actuation of a part and can be combined to produce more complex systems.

Mechanical audio feedback can occur through a magnetic pulse that triggers a noise of some sort.

Prototypes are quickly producible and do not take much time to program into the system with dynamic mappings.

Contents
The Madgets created by the authors are translucent, tangible widgets that resemble common controls like sliders and are intended for general use. They have permanent magnets attached to them that can be actuated independently through an array of electromagnets; the actuation capabilities include moving a Madget across the surface and providing force feedback. They are low-power and low-cost. The devices themselves are unpowered and passive, which makes them easy to produce and hides the underlying technology from the user. Controls can be relabeled dynamically.

The sensing technique requires uniform backlighting, provided by an electroluminescent foil. An array of electromagnets, controlled by an Arduino through shields that provide output channels for pulse width modulation, actuates objects separately. To track the physical devices, the authors used a visual sensing technique that does not interfere with the electromagnets. Diffused Surface Illumination detects touch events and is also precise enough to differentiate Madgets from fingers. The controls are illuminated through the LCD so that they can be labeled dynamically. They are mounted on cylindrical markers that are used to determine where the Madgets are and to encode the type of Madget; moving parts also carry a marker. Permanent magnets are attached for actuation.

Each rigid body in a Madget can have different actuation forces applied to its magnets, tangentially or normally. The forces needed to move a permanent magnet a certain distance are computed from the polarization and position of the magnet, and linear optimization is used to solve for them. The Coin-or optimization library minimizes the objective function, and thus the total force, power, and heat production. The weights are adjusted dynamically to balance performance and reliability at a frame rate of 30 fps; overheating electromagnets are weighted more heavily than others to reduce their usage.

The widgets have gradient fiducials that increase the resolution of each sensing dot based on the radius of the object, detected by its brightness. The table is pre-calibrated with no objects, a white sheet, and then a dark gray sheet successively placed on it to determine the thresholds. The tracking algorithm can identify multiple touches and dragging gestures; it first detects widget footprints and then focuses on the remaining input.
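
Since the actuation step is described as a weighted linear optimization, here is a small sketch of that idea using scipy's linprog rather than the Coin-or library the authors used. The force model, coil layout, target force, and heat weights are illustrative assumptions, not values from the paper; the point is only the structure: pick electromagnet duty cycles that produce a desired net force while penalizing hot coils.

```python
# Sketch of choosing electromagnet duty cycles for one permanent magnet as a
# linear program: minimize a heat-weighted sum of duty cycles subject to
# producing a target net force. All numbers below are made up for illustration.
import numpy as np
from scipy.optimize import linprog

# Force (N) each coil exerts on the magnet at full duty, as x/y components.
# In a real system this would come from the magnet's position relative to
# each coil in the array.
unit_forces = np.array([
    [ 0.02,   0.00 ],
    [-0.01,   0.015],
    [ 0.005, -0.02 ],
    [ 0.015,  0.01 ],
])

target_force = np.array([0.01, 0.005])   # desired net force on the magnet (N)
heat = np.array([0.2, 0.9, 0.1, 0.4])    # normalized coil temperatures
weights = 1.0 + 5.0 * heat               # penalize hot coils more strongly

res = linprog(
    c=weights,
    A_eq=unit_forces.T,                  # duty cycles must sum to the target force
    b_eq=target_force,
    bounds=[(0.0, 1.0)] * len(weights),  # duty cycles between 0 and 1
    method="highs",
)

if res.success:
    print("duty cycles:", np.round(res.x, 3))
```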

Discussion
The authors created a system that features magnetic widgets as a new, cheap form of interaction. While their results certainly suggest that more work in this field is viable, the lack of user testing concerned me. I don't doubt that users could quickly adapt to using the devices as an extension of touchscreens, but the effect of automatic actuation was untested, which leaves me concerned about the effectiveness of the system.

The authors proposed that work should be done in avoiding Madget collisions, which struck me as a very good idea. While a good interface should mean that this situation would never occur, designs should always assume that some error will be made. If the system inherently tries to avoid crashing Madgets, then it is far less likely that such a problem will occur.

I was particularly intrigued by the sheer cheapness of producing one's own widgets, especially with a 3D printer available. The system was specifically built to allow the average designer to easily program any device into the system, which increases its accessibility that much more.