Thursday, September 29, 2011

Paper Reading #13: Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces

Reference Information
Combining Multiple Depth Cameras and Projectors for Interactions On, Above, and Between Surfaces
Andrew D. Wilson and Hrvoje Benko
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
Andrew D. Wilson is a senior researcher at Microsoft Research with a PhD from MIT's Media Lab. He helped found the Surface Computing group.
Hrvoje Benko is a researcher in Microsoft Research's Adaptive Systems and Interaction group. He has a PhD from Columbia University and has an interest in augmented reality.

Summary
Hypothesis
How do depth cameras enable new interactive experiences? Does LightSpace enable interactivity and visualization?

Methods
The authors demonstrated the prototype at a three-day demo to more than 800 people. Users could use the system freely.

Results
The system ran smoothly and interactions stayed correct with no more than about six simultaneous users. Users could block each other from the cameras, precluding interaction. Holding a virtual object steadily was difficult. Users also found new methods of interaction on their own while using the system.

Contents
The authors produced a room-sized system with projectors and depth cameras that combines interactive displays, augmented reality, and smart rooms into a new interaction system. Depth cameras make real-time 3D modelling inexpensive. LightSpace's primary themes are that everything is a surface (building on ubiquitous computing), the room is the computer (reminiscent of smart rooms), and the body can be a display (related to ubiquitous projection). Multiple calibrated depth cameras and projectors, sharing one coordinate system, project graphics onto any object's surface, even a moving one, without physical markers. By rendering the 3D mesh data into 2D images, standard 2D image processing lets previously non-interactive objects act like a Microsoft Surface. Users can hold a virtual object, since the system projects a red ball representing the object onto the body. Objects can be moved from surface to surface when the user touches both surfaces simultaneously. Menus rely on the ability to precisely detect where the user is.

The cameras and three projectors are aligned to cover the space well. An infrared camera computes depth from the distortion of a projected pattern in the captured image, and this is compared against measurements taken when no users are in the room to determine where users are. The depth cameras are calibrated before the projectors, using retro-reflective dots on a grid to recover each camera's 3D pose. Interactive surfaces are manually designated and must be flat, rectangular, and non-moving. The cameras produce a 3D mesh of the room, but virtual cameras render it into 2D images so the computation stays efficient. LightSpace combines three virtual camera views and can distinguish hands, but not yet fingers. The plan (top-down) view helps determine which bodies and surfaces are connected. Picking up an object is modeled like a ball rolling on a surface to avoid a complex physics engine. The menu, a widget built for this system, uses the user's estimated head position to keep its items readable.
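
To make the virtual-camera idea concrete, here is a minimal sketch of how depth pixels could be lifted into a shared world frame and flattened into a plan view; the intrinsics (fx, fy, cx, cy), the cam_to_world matrix, and the grid parameters are hypothetical placeholders rather than values from the paper.

    import numpy as np

    def depth_to_world(depth, fx, fy, cx, cy, cam_to_world):
        """Back-project a depth image (in meters) into world-space points."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) * depth / fx
        y = (v - cy) * depth / fy
        pts = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)
        return (pts @ cam_to_world.T)[:, :3]        # cam_to_world: assumed 4x4 extrinsic

    def plan_view(points, cell=0.02, extent=4.0):
        """Render an orthographic top-down occupancy image of the room."""
        size = int(extent / cell)
        img = np.zeros((size, size), dtype=np.uint16)
        ij = np.floor((points[:, [0, 2]] + extent / 2) / cell).astype(int)
        ok = (ij >= 0).all(axis=1) & (ij < size).all(axis=1)
        np.add.at(img, (ij[ok, 1], ij[ok, 0]), 1)   # count depth samples per floor cell
        return img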

Discussion
As a proof of concept for this system, the authors tried to make LightSpace effective and for the most part succeeded. While some of the flaws found in the limited user testing are enough to make me worry about the usability of this technology, I found that the authors adequately demonstrated that this system was in fact feasible.

I was very concerned about users blocking the camera and thus being locked out of the system. I am honestly surprised that that never came up during the design process. Requiring users to be aware of where the cameras are at all times seems to me to completely miss the point of having a smart room. Perhaps more cameras could make this system more robust. The cost is also a little prohibitive at the moment, but the authors are correct to mention that the release of the Kinect may cause depth cameras to decrease in cost, allowing for more widespread use.

I could see this work being combined with the Multitoe system for a fully interactive environment. The feet could operate one system (maybe a keyboard) while the hands interact with the foot-driven material. If the user moved to another surface, the foot menus would follow for ease of access. The system would need some way of telling when a user was ready to give foot input, which could borrow the double-tap from the pocket-based foot gesture system.

Monday, September 26, 2011

Book Reading #1: Gang Leader for a Day

As a whole, Sudhir Venkatesh's Gang Leader for a Day provided an interesting, albeit legally gray, way of conducting an ethnography. I have basically zero knowledge of sociology, but I learned of two extremes of the discipline from the book: the almost clinical approach and the one that pushes the researcher in too deep. Sudhir's side of it made me wonder about the effectiveness of IRBs in certain instances. Of course they are intended to protect the rights of both researchers and participants, but the collection of the more controversial data is what made Sudhir's study effective. At one point, he mentions how some researchers alienated themselves from the people they intended to study by gathering statistics from the police. While Sudhir gained personal knowledge of criminal activities that could have implicated him, he also acquired a deeper understanding of the lifestyles of the residents of Robert Taylor. He lived in a morally gray area that hurt some people, but he personally did not intend to inflict this harm. It is very difficult to say whether he was completely justified or not. While I want to say that his contribution to the general body of knowledge was profound, I can't honestly decide whether that is enough to absolve him of the wrongs he inflicted on some of the residents. I tried to think of what I would do about the abusive manager, the bribe-taking building president, or overhearing a gang hit. When I realized that my personal decisions would resemble Sudhir's, namely simple observation with minimal interference, I had to pull back and think of how a researcher who followed IRB instructions would act. They probably would inform the police about criminal activities, but not only would the research likely suffer, they would also be entrusting a blatantly corrupt police force to solve the problem. On the other hand, the people Sudhir talked to were aware of this possibility, so perhaps approval from an IRB would not be detrimental in the least. From a legal or research standpoint, that's great, but the fact is that the moral boundary is very indistinct. I enjoyed reading the book, but it left me with more ethical questions, perhaps too many, than I had before I started it.

I was particularly perturbed by the last chapter of the book, not out of any sense of literary pacing but because of the bleakness portrayed in it. It may have been fortunate for Sudhir that Robert Taylor was being torn down so that he would have a clean exit from the community, but that demolition was just one step in a long line of torments inflicted upon residents. The corruption inherent in the system, the federal crackdowns on gangs, and even more personal conflicts all seemed to pick the end of Sudhir's study as an ideal time to manifest. With all these problems rising up, the sorrow in Sudhir's voice at seeing the hopeful youth of the community flounder or die was plain. It was demoralizing to see that governmental incompetence or corruption led to people having to move to less favorable areas, leaving behind their friends and family. Also, while the federal crackdown was ostensibly benign, it fragmented a begrudging partnership between building tenants and gangs. The police response to that didn't help matters either. I don't think the federal government expected the local police, already entrusted to protect the area (but terrible at it), to take out their frustrations on even the law-abiding residents of Robert Taylor.

However, Sudhir's naivete and recklessness drove me insane. Certainly, I would expect someone new to an area not to be knowledgeable about the local culture, but after six years of hanging out with the residents of Robert Taylor, he hardly had a reason to be ignorant of the effects of sharing his hustler research with others. It killed me that he would foolishly demolish his reputation within the community. Maybe that betrayal was the reason why his women's writing workshop further pushed him from the community. Maybe he had made himself a plausible target for any sort of disdain and frustration, even when it was unjustified.

Thursday, September 22, 2011

Paper Reading #12: Enabling Beyond-Surface Interactions for Interactive Surface with An Invisible Projection

Reference Information
Enabling Beyond-Surface Interactions for Interactive Surface with An Invisible Projection
Li-Wei Chan, Hsiang-Tao Wu, Hui-Shan Kao, Ju-Chun Ko, Home-Ru Lin, Mike Y. Chen, Jane Hsu, Yi-Ping Hung
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Li-Wei Chan is a PhD student in the Image and Vision Lab at National Taiwan University. He is interested in computer vision and tangible user interfaces.
  • Hsiang-Tao Wu is a student at National Taiwan University with four papers relating to tabletop computing.
  • Hui-Shan Kao is a student at National Taiwan University with four papers relating to tabletop computing and other forms of display.
  • Ju-Chun Ko is a student at National Taiwan University with six papers, most of which relate to interface design.
  • Home-Ru Lin is a student at National Taiwan University with four papers relating to tabletop computing.
  • Mike Y. Chen is a professor at National Taiwan University and previously worked at Intel Research Seattle. He is interested in mobile computing, human-computer interaction, and social networks.
  • Jane Hsu is a professor of Computer Science and Information Engineering at National Taiwan University. She is interested in data mining and service-oriented computing.
  • Yi-Ping Hung is a professor in the Graduate Institute of Networking and Multimedia and in the Department of Computer Science and Information Engineering of National Taiwan University. He holds a PhD from Brown University.

Summary
Hypothesis
How effective is a system that combines infrared markers and multi-touch controls? What possible uses does it have?

Methods
Users were presented with the three devices the authors developed and asked to navigate through landmarks on a map. With the first two, users could view the photo represented by a pin on the map; with the third, they could see a 3D perspective view. User feedback was collected.

Results
The 3D building viewer could only show a portion of what users wanted to see. Phone orientation and zooming were not handled, so users' attempts to correct the lack of visibility failed. The viewer was also considered too immersive. The flashlight suffered severe focus problems because users moved it rapidly, and users also wanted to use it as a mouse. The lamp was moved on occasion but mostly remained in a single spot for long stretches.

Contents
The authors developed a programmable infrared (IR) tabletop system that allows mobile devices to interact with displayed surfaces through invisible markers. The system allows for on-surface and above-surface interaction and is based on a direct-illumination setup. Two IR cameras detect finger touches. A standard DLP projector was converted to IR, with touch glass placed under the surface's diffuser layer. Augmented reality markers change in size to match the mobile camera's view; the camera communicates with the system to update marker layouts and requires at least four visible markers at a time for calibration. Priority goes to the closest camera, and Kalman filtering reduces jitter. The usual approach of detecting touches by subtracting a static background is ineffective with the changing IR background, so the system instead simulates the expected background at each frame and projects expected white areas, with a maximum delay of one frame. The simulation stitches together the marker locations, and foregrounds are extracted with thresholds. Rendering the markers as pure black pixels provides too little illumination, so the authors adjusted the intensity of the black. Camera and projector synchronization keeps several previous frames in memory for the subtraction method. An additional camera is used only for calibration, and the four calibration points allow additional devices to join without extra input. Content is generated in Flash and warped through DirectX.
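
A minimal sketch of the frame-differencing step, under my own assumptions about array types and threshold values (neither comes from the paper): the simulated IR background for the current frame is subtracted from the captured frame before thresholding for fingertips.

    import numpy as np

    def extract_foreground(ir_frame, simulated_background, threshold=30):
        """Return a binary touch mask from one IR frame (both uint8 images)."""
        diff = np.abs(ir_frame.astype(np.int16) - simulated_background.astype(np.int16))
        return (diff > threshold).astype(np.uint8)   # 1 where a fingertip is likely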

One proposed usage resembles a desk lamp that projects onto the surface; where the lamp's projection and the tabletop's intersect, the tabletop's projection is masked to reduce blur. A portable version, used like a flashlight, was also proposed. It acts as a pointer to indicate certain information but can also manipulate content through a button, and it uses an integrated laser to resolve focusing issues. The third concept uses a tablet to display 3D geographical content, with a table boundary drawn around 3D objects to remind users of reality.

Discussion
The authors were trying to create a basic framework and a few applications of a system that combined IR and a touch surface. My concern is that they did not perform a lot of user testing and apparently omitted several extremely useful features. However, as a proof of concept, I could see the potential of the technology, so the authors convinced me that this is a possibly viable area of research.

The combination of augmented reality with a touchscreen was particularly interesting to me. The authors combined two technologies that are not commonly used together to produce a new form of interaction. However, their finding that the 3D viewer was too immersive concerns me: since the goal of the system is to combine two disparate elements into a harmonious new technology, the fact that users could only focus on the mobile device necessarily limits the system's effectiveness. I was pleased that the authors intend to address this issue, but I am not sure how quickly, if at all, that goal can be achieved.

Paper Reading #11: Multitoe: High-Precision Interaction with Back-Projected Floors Based on High-Resolution Multi-Touch Input

Reference Information
Multitoe: High-Precision Interaction with Back-Projected Floors Based on High-Resolution Multi-Touch Input
Thomas Augsten, Konstantin Kaefer, Rene Meusel, Caroline Fetzer, Dorian Kanitz, Thomas Stoff, Torsten Becker, Christian Holz, Patrick Baudisch
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Thomas Augsten is a Master's student in IT Systems Engineering at the Hasso Plattner Institute of the University of Potsdam. This is his second paper and first presented at UIST.
  • Konstantin Kaefer is a Master's student in IT Systems Engineering at the Hasso Plattner Institute of the University of Potsdam and also works on mapping software for Development Seed. He is the co-author of a book on Drupal.
  • René Meusel is a student at the Hasso Plattner Institute of the University of Potsdam. This is his first paper.
  • Caroline Fetzer is a student at the Hasso Plattner Institute of the University of Potsdam. This is her first paper.
  • Dorian Kanitz is a student at the Hasso Plattner Institute of the University of Potsdam. This is his first paper.
  • Thomas Stoff is a student at the Hasso Plattner Institute of the University of Potsdam. This is his first paper.
  • Torsten Becker is a graduate student at the Hasso Plattner Institute of the University of Potsdam, specializing in Human-Computer Interaction and mobile and embedded systems. He has two peer-reviewed papers.
  • Christian Holz is a PhD student at the Hasso Plattner Institute of the University of Potsdam. He has six publications.
  • Patrick Baudisch is a professor of Computer Science at the Hasso Plattner Institute of the University of Potsdam. He worked at PARC and Microsoft Research.

Summary
Hypothesis
How effective is a back-projected floor-based computer that reads input from users' shoe soles?

Methods
The study of how not to activate a button had participants walk over four buttons, two of which were to be activated and two of which were not. The authors observed the strategies used and conducted personal interviews. The buttons were labelled pieces of paper, and the user strategies were categorized.

A second study determined which area of the soles users expected to be detected. Users stepped onto the multi-touch floor, which produced a honeycomb grid reflecting where contact with the foot was detected based on user perception.

The third study tried to determine whether users have a consistent expected hotspot for foot contact. Users placed their hotspot over a cross-hair generated by the system and confirmed their selection. For the first contact they could use whatever portion of the foot they desired; the remaining trials used specific portions.

Another study determined precision by asking users to use three differently-sized projected keyboards. Tracking inaccuracy was mitigated, as this test revolved around user capability. The users typed a sentence and were timed.

Results
Users did not have a consistent way of activating only certain paper buttons. Some strategies used were ergonomically unsound. Most participants tapped buttons with their feet.

The second study found that most users expected the foot's arch to be a point of contact, though two excluded the arch. Some users expected a slightly smaller area than the actual contact region, but most agreed detection should be based on the sole's projection.

The third study showed substantial disagreement between user hotspots, with no hotspot gaining a majority of usage.

The fourth study found that errors and time increased as the key size decreased. Participants were split between a preference for the large and medium keyboards.

Contents
Tabletop computers suffer from size constraints based on a user's arm reach. The authors developed a system that, instead of being a table, is projected under a floor, allowing users to walk to access items. It is based on frustrated total internal reflection (FTIR), which allows for resolution similar to a tabletop computer. Proper foot posture is required for input, pop-up menus are location-independent and activated by jumping, and the hotspots that determine foot placement are user-customizable. The system can approximate head tracking and body posture and can recognize users based on the pressure their soles exert on the floor.

Soles are detected with front diffuse illumination, which tracks shadows. The floor surface is specialized and contains a screen, glass, acrylic, and silicone. Because of the expense of creating the surface, the trial version detects a small subregion of the space. Most menus are location independent and are activated by jumping, which was rarely unintentionally done. Based on the second user study, the front diffuse illumination was used over FTIR's tracker, though elements of FTIR were used in the general detection algorithm. FTIR was a predominant component in determining user pressure. Hotspots reduce the foot contact to a single point and are requested whenever a new pair of soles is detected. The new user detection system tries to interpolate from a database of pre-existing soles. Determining actions is based on frames of pressure patterns gleaned from FTIR. Head tracking is approximated based on pressure interpretations of balance. Further subdivision of soles allowed the authors to play a game on their surface.
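
As a rough illustration of the hotspot idea, here is a minimal sketch under my own assumptions (the offset convention and calibration step are hypothetical, not taken from the paper): the detected sole region is reduced to a single interaction point using a per-user offset recorded when the hotspot was requested.

    import numpy as np

    def apply_hotspot(contact_mask, hotspot_offset):
        """contact_mask: binary image of one detected sole; hotspot_offset: (dy, dx)
        from the sole's bounding-box corner, captured during per-user calibration."""
        ys, xs = np.nonzero(contact_mask)
        if len(ys) == 0:
            return None                              # no contact this frame
        top, left = ys.min(), xs.min()
        return (top + hotspot_offset[0], left + hotspot_offset[1])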

Discussion
The authors tried to prove that a floor-based system in the same vein as a tabletop computer was usable. Considering that the design of each part of the system relied heavily on user feedback, their work certainly has convinced me of the effectiveness of such a system.

When I first heard about Microsoft Surface, I couldn't help but think it was an amazing feat of engineering. This system raised the bar for me, as it addressed my main usability concern with Surface (arm length). I would imagine that a commercial model would be just as prohibitively expensive as the Surface, but the ability to compute while getting a bit of exercise is very interesting.

Tuesday, September 20, 2011

Paper Reading #10: Sensing Foot Gestures from the Pocket

Reference Information
Sensing Foot Gestures from the Pocket
Jeremy Scott, David Dearman, Koji Yatani, Khai N. Truong
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Jeremy Scott was an undergraduate at the University of Toronto but now is a graduate student at MIT, where he works in the Multimodal Understanding Group and studies AI.
  • David Dearman is a PhD student at the University of Toronto. His interests are in context-aware computing, specifically using mobile devices.
  • Koji Yatani is a PhD candidate at the University of Toronto and previously worked for Microsoft Research in Redmond. He is interested in developing new sensing technologies. 
  • Khai N. Truong is an Associate Professor at the University of Toronto. He is interested in enhancing usability and holds a PhD in Computer Science from the Georgia Institute of Technology.

Summary
Hypothesis
How effective is a foot-controlled input system for a mobile device that provides no visual feedback?

Methods
The authors conducted an initial study of the efficacy of foot-based gestures involving lifting and rotating. Participants selected targets by rotating from the start position along three axes of rotation: the ankle, heel, and toe. Ankle rotations were further subdivided. Rotations were captured through a motion capture device that focused on a rigid foot model to ensure uniformity. No visual feedback was provided when making a selection, though users were trained.

A second study logged accelerometer data points for further analysis.

A third study operated similarly to the first experiment but used the authors' system instead, with three iPhones placed on the user. Again, a practice session preceded the test. Gestures were cross-validated through leave-one-participant-out validation, which trains on all but one participant and tests on the one left out, and within-participant stratified cross-validation, which trains and tests within a single participant's data.

Results
Each of the four rotation types was analyzed separately. Closer targets were selected more quickly, but their angular error was more severe. Ankle selection was less accurate than toe or heel rotation, which were the most accurate. A probable cause of error was user exhaustion. Users preferred rotations that stayed within normal physiological bounds. Raising the heel in plantar flexion was more accurate than the other ankle rotation and was preferred by users. Rotating the foot has roughly the same error whether pivoting on the toe or heel, though participants preferred the toe-based gestures for comfort.

The accelerometer data suggested 34 features that could be used to distinguish a gesture. These were categorized into time-domain features, based on time intervals, and frequency-domain features, based on samples in the frequency domain.

The third study found that the system could classify ten different gestures correctly approximately 86% of the time. Within-participant classification tended to be more accurate, and placing the accelerometer on the side increased the chances of success. Distinguishing gestures by their rotation angle led to more gestures being confused.

Contents
Eyes-free interaction devices already exist and use aural or vibrotactile feedback, the latter relying on small vibration motors. Some of these rely on accelerometers to detect gestures. Unlike voice or touch controls, foot controls are less researched. The foot cannot provide particularly fine gestures but can perform coarse ones, and accelerometers can sense and infer a person's activity fairly accurately.

The authors distinguish foot gestures based on the axis of rotation and in part the direction of rotation. Their favored rotations, toe, heel, and plantar flexion, were subcategorized based on the angle of rotation. The final type of rotation was ignored based on their first study.

The authors performed two studies to test the efficacy of foot-based gestures and developed a system that analyzes foot gestures through the accelerometer of a phone on the corresponding hip. Gesture recognition is derived through machine learning. The proposed gesture system relies on a 3-axis accelerometer; the user places their foot at an origin position and double-taps to begin a gesture along a single axis. A Naive Bayes algorithm, selected for its low time complexity, is used to classify movements.
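
A minimal sketch of that classification and evaluation setup, assuming scikit-learn and assuming the 34 accelerometer features have already been extracted into arrays (the file names below are hypothetical):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    X = np.load("foot_features.npy")        # hypothetical: one 34-feature row per gesture
    y = np.load("foot_labels.npy")          # hypothetical: gesture class per row
    participants = np.load("participant_ids.npy")

    # Leave-one-participant-out: train on everyone but one participant, test on the one left out.
    scores = cross_val_score(GaussianNB(), X, y,
                             groups=participants, cv=LeaveOneGroupOut())
    print("mean accuracy:", scores.mean())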

Discussion
The authors tested the efficacy of a foot-based gesture system. Their results suggested that the within-participant stratified cross validator, combined with a small array of foot rotations, could make for a fairly effective gesture recognizer. I was actually quite impressed by how much better the accuracy was than I expected. The principle, at least, convinced me of this study's validity.

I honestly tried to envision myself using this technology while sitting on a bus (for the sake of fairness, a relatively uncrowded one). The most important detail of that sentence is the word "sitting." I cannot begin to imagine the balancing nightmare that trying to use this technology while standing would provoke. Perhaps I'm just overly clumsy or maybe disinclined towards standing on one foot in general, but I cannot imagine myself using a foot-based gesture except while sitting. That is why when I read that the authors thought this could be useful for someone standing with their hands full, the practicality of the device suddenly plummeted. Its uses are far too few to make any real sort of difference in a user's life.

Paper Reading #9: Jogging over a Distance between Europe and Australia

Reference Information
Jogging over a Distance between Europe and Australia
Florian 'Floyd' Mueller, Frank Vetere, Martin R. Gibbs, Darren Edge, Stefan Agamanolis, Jennifer G. Sheridan
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Florian 'Floyd' Mueller was affiliated with the University of Melbourne Interaction Design Group, Microsoft Research Asia, the United Kingdom's Distance Lab, and the London Knowledge Lab. He is now a Fulbright Visiting Scholar at Stanford with a PhD from the University of Melbourne.
  • Frank Vetere is a professor in the University of Melbourne Interaction Design Group, but also has many strictly sociological papers. He was Mueller's PhD advisor.
  • Martin R. Gibbs is a lecturer with the University of Melbourne Interaction Design Group and has a PhD in sociology. One of his research projects involves studying social interactions in World of Warcraft.
  • Darren Edge works in Microsoft Research Asia's Human-Computer Interaction Group. He has a PhD from the University of Cambridge's Rainbow Group.
  • Stefan Agamanolis was the Chief Executive and Research Director of Distance Lab, but currently is the Associate Director of the Rebecca D. Considine Research Institute at Akron Children's Hospital. He has a PhD from MIT's Media Lab.
  • Jennifer G. Sheridan was with the London Knowledge Lab, but currently is the Co-founder and Director of BigDog Interactive, which develops interactive applications. She holds a PhD from Lancaster University.
Summary
Hypothesis
Can runners in different areas have the social experience of running together through spatialized audio? How do you design a technologically-augmented social exertion activity?

Methods
The authors interviewed participants after 14 paired runs using Jogging over a Distance. The pairs already had some social connection to their partner. Most of the tests were cross-continent, though one took place on a single track. A coding process was used to identify common themes.

Results
Initial findings suggested that users enjoyed using Jogging over a Distance more than jogging with a physically present person, because the percentage of target heart rate, rather than pace, determined relative position. The integration of communication encouraged users to exert similar efforts if they wanted to talk to the other person. Users also picked up on their partner's breathing, though questions about who was in front were frequent. Users could understand their effort level rather than raw performance, though this measure of exertion can produce relatively unintuitive results for a race's winner. The system could also map exertion into a shared digital representation, which users compared to a handicap in golf.

Contents
Most current systems that make exercise social measure user performance as a competitive score after the completion of some exercise. The authors proposed a system that shares this sort of data as it is collected, in real time, and uses advances in networking to simulate exercising together. Their system, "Jogging over a Distance", sends audio from a headset in real time through a spatialized channel to give the illusion of being at a position relative to the other runner. That position is based on each user's percentage of target heart rate, allowing runners of different abilities to jog together. Jogging over a Distance uses a mobile phone and a heart rate monitor to facilitate communication, and it does not display data while running, to reduce distractions.
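
A minimal sketch of how that mapping might work, under my own assumptions (function name, scaling factor, and sign convention are hypothetical): the difference in percentage of target heart rate positions the partner's voice ahead of or behind the listener.

    def relative_audio_position(my_hr, my_target_hr, partner_hr, partner_target_hr,
                                meters_per_percent=1.0):
        """Positive result: partner sounds ahead; negative: partner sounds behind."""
        my_effort = 100.0 * my_hr / my_target_hr
        partner_effort = 100.0 * partner_hr / partner_target_hr
        return (partner_effort - my_effort) * meters_per_percent

    # Example: I am at 80% of my target heart rate, my partner at 90%,
    # so their voice is rendered about 10 m ahead of me.
    print(relative_audio_position(144, 180, 153, 170))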

The ability to talk with one another is part of the communication goal and may also be conducive to a runner's health. For this paper, though, the authors chose to explore only the user experience aspect of a social exertion system, as this is one step in a multi-stage investigation of user experiences. While Jogging over a Distance has been revised multiple times, this paper is based on the latest revision to date.

Design recommendations for a social exertion system can be organized around the themes the authors noted for Jogging over a Distance, but each theme has its trade-offs. The authors argued that the degree of communication integration must be balanced between a high and a low point based on the designer's intentions. Effort comprehension can rely on traditional measures, like goals scored, or more unusual ones, like heart rate alone, to give users a better kinetic understanding. Virtual mapping of bodily investment to some digital event, like leveling up or relative position, can vary in its level of detail; coarser mappings tend to suit beginners better, though the authors suggest dynamic difficulty balancing.

Discussion
The authors found that Jogging over a Distance, at least in the short term, encouraged participants to run together and had some distinct advantages above normal running. The numeric data was necessarily limited, as the goal of this study was to gain qualitative data. Nonetheless, the overwhelming success of the system with its test users convinced me that this system is viable.

This is particularly interesting technology to me because I prefer to not jog alone, and this would allow a greater diversity of partners. Also, the authors produced a more generalized set of design considerations to encourage more work in the area, which I appreciate.

The biggest flaw I could find is that the system relied on both voice and data access. For those of us with less-than-stellar cellular service, this technology is all but unusable.

Thursday, September 15, 2011

Paper Reading #8: Gesture Search: A Tool for Fast Mobile Data Access

Reference Information
Gesture Search: A Tool for Fast Mobile Data Access
Yang Li
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bio
Yang Li is a senior research scientist at Google and was a research associate at the University of Washington. He holds a PhD in Computer Science from the Chinese Academy of Sciences. He is primarily interested in gesture-based interaction, with many of his projects implemented on Android.

Summary
Hypothesis
How effective is a search tool for mobile phones that allows for faster data access by drawing only a few gestures? Does the search performance optimize over time? How different are user queries for the same item? Can touch events be differentiated from gestures on the basis of velocity?

Methods
To test if gestures were distinguishable from touch events, data was collected on touch events and gestures, even with gestures outside Gesture Search's list. Seven participants found contacts and locations on the phone, yielding over 560 touch events. Gestures were pulled from published data and users who opted in to sending their data. The authors measured the squareness of these events.

To test Gesture Search's effectiveness, a longitudinal study was performed on company Android users. The test was over everyday usage of the program, with a survey at the end, so data was filtered based on the level of usage. Data was logged to a server, including the size of the dataset and actions performed. The number of unique queries versus the number of unique items accessed was assessed.

Results
Gestures generally have larger bounding boxes than touch events, but possibly ambiguous gestures make a time delay necessary. Even events like scrolling tended to have narrower bounding boxes. Squareness allowed quicker prediction of gestures.

Users tended to search for contacts or applications and accessed their desired results by drawing more gestures. Most queries used two or fewer gestures and involved no re-writing. Roughly half the time, the top choice was selected. Most users did not access most of their available items, but the complexity of queries did not change much with the size of the item dataset. Generally, a unique query was used to access a unique item, though sometimes multiple queries were used to access the same item. Users generally found the program useful and liked not needing to type or navigate a UI, though they felt it needed better integration.

Contents
Data access on smartphones is hindered by the small screen size and deep hierarchies. Both desktops and smartphones have keyword-based search tools available, but the small key size of phones makes these applications less than ideal. Voice-activated tools are not always correct. Gestures are also used, but can suffer from difficulty in recognizing many different symbols.

Gesture Search, which the author developed for Android, matches multiple interpretations of an entered gesture against its dataset and updates its search ranking based on selected items for faster access later. The gestures are handwritten characters, segmented with a time-out, so they are easy for users to remember. The entire screen is used to draw, so users draw directly on top of the search results. In case of ambiguity, all reasonably likely results are returned, with the matching characters highlighted, and a smaller version of the drawn query is displayed on-screen. Users can erase all or part of their query with swipes and can search on multiple prefixes by inserting spaces.

Users do not have to do anything special to start drawing. When the system cannot yet tell whether a touch sequence is a gesture, the UI processes it as a normal touch event, but the strokes are kept in a buffer so the system can later decide whether they formed a character. Once a gesture is detected, the gesture layer stops sending information to the list and the color of the strokes changes. A threshold for discarding low-probability interpretations was derived, and the program is optimized for real-time searching. The mapping of a partially complete query to an item is done through a probabilistic, high-order Hidden Markov Model, and items only appear if they exceed the threshold found from inference over all possible paths.
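
A minimal sketch of the kind of bounding-box "squareness" test described above; the function name and thresholds are my own placeholders, not values from the paper.

    def looks_like_gesture(points, min_side=40, max_aspect=4.0):
        """points: list of (x, y) screen samples from one buffered touch sequence."""
        xs, ys = zip(*points)
        width, height = max(xs) - min(xs), max(ys) - min(ys)
        if max(width, height) < min_side:
            return False                     # tiny bounding box: probably a tap
        aspect = max(width, height) / max(min(width, height), 1)
        return aspect < max_aspect           # long narrow boxes look like scrolls or swipes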

Discussion
The author produced a gesture-based system and wanted to test whether it would work both in theory and actuality. On both accounts, the data suggests the system was widely successful. His claims were backed initially by the small-scale test, but also by the public release of the program. He checked all of the details thoroughly to produce a solid application, so I am completely convinced that this is effective.

I found this was an innovative approach to searching that also learned with a greater accuracy than I expected. The flexibility of the system in character detection is a major boon for those of us who can't write very legibly, and the real-time searching is speedy.

The only real drawback I could think of for this program is the length of time needed to call up the app, for which some of the users in the study made quick workarounds. I could see how handling touch events while waiting for gestures could cause accidental selections, but this is largely prevented by the squareness measurements.

Paper Reading #7: Performance Optimizations of Virtual Keyboards for Stroke-Based Text Entry on a Touch-Based Tabletop

Reference Information
Performance Optimizations of Virtual Keyboards for Stroke-Based Text Entry on a Touch-Based Tabletop
Jochen Rick
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bio
Jochen Rick was a research fellow at the United Kingdom's Open University but is now a junior faculty member in the Department of Educational Technology at Saarland University. He holds a PhD in Computer Science from Georgia Tech. His primary interest is in how new media can enhance collaborative learning.

Summary
Hypothesis
Can two proposed alternative keyboard layouts reduce text entry times for stroke-based entry in comparison with tap-based entry?

Methods
Initially, participants completed a series of gestures, both tapping and stroking, through four points. The distance travelled and the angles were measured. Sequences were drawn repetitively and in rapid succession. The results for left-handed participants were flipped to resemble those of right-handed users. The author derived equations for the time needed per portion of a gesture.

To test layouts, existing layouts were modified slightly so that they could be evaluated against a list of the 40,000 most common English words. The results were then compared against the author's specialized layouts. Both tests measured the amount of time needed to produce each word.

Results
Moving in a straight line took the least time, and changing direction by 180 degrees took the longest. More efficient angles follow the arm's direction, and the arm visually blocking the target was not a problem. The author's variation of Fitts' law had a higher coefficient of determination than the original.

When comparing existing layouts, stroking was faster than tapping, and keyboards optimized for one-finger typing performed best. The author's generated square and hexagonal stroke layouts outperformed the best tapped layouts in their respective categories.

Contents
For tabletop computing, tap-based virtual keyboards are not intended for ten-finger typing, and stroke-based keyboards somewhat address the lack of tactile feedback. The author produced two keyboard layouts that are statistically more efficient than the standard Qwerty layout. Various other layouts exist, some of which address stroke-based entry. Shape writing matches the shape of the stroke, as opposed to its absolute position, for word-based input. The ideal is a layout in which letters frequently used in succession are close to each other. Tap-based interfaces usually follow Fitts' law, which says the time needed for a movement grows logarithmically with the ratio of distance to target width (roughly T = a + b log2(D/W + 1)). The author sought to improve the approximation of the time needed to produce a gesture and used a variation of Fitts' law.

He created approximations for the length of time needed to produce a stroke or a series of taps. The approximations were applied to Project Gutenberg's 2006 list of most frequent English words. The author ignored special characters and capital letters, though characters with similar equivalents were mapped differently for strokes. An exhaustive search to produce the best keyboard layout is not feasible, so he used a simulated annealing process followed by hill climbing.
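
To make the optimization loop concrete, here is a minimal sketch under my own assumptions: simulated annealing over letter-to-key assignments, scored by a stand-in cost function (the paper's actual cost is the Fitts'-law-based stroke-time model, which is stubbed out here).

    import math, random

    def layout_cost(layout, words):
        """Hypothetical stand-in for the stroke-time model: sum of key distances."""
        return sum(abs(layout[a] - layout[b])
                   for w in words for a, b in zip(w, w[1:])
                   if a in layout and b in layout) / max(len(words), 1)

    def anneal(words, letters="abcdefghijklmnopqrstuvwxyz", steps=20000, t0=1.0):
        layout = {c: i for i, c in enumerate(letters)}      # key index per letter
        cost = layout_cost(layout, words)
        for step in range(steps):
            t = t0 * (1 - step / steps)                     # linear cooling schedule
            a, b = random.sample(letters, 2)
            layout[a], layout[b] = layout[b], layout[a]     # try swapping two letters
            new_cost = layout_cost(layout, words)
            accept = new_cost < cost or random.random() < math.exp((cost - new_cost) / max(t, 1e-9))
            if accept:
                cost = new_cost
            else:
                layout[a], layout[b] = layout[b], layout[a] # revert the swap
        return layout, cost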

Discussion
The author sought to achieve a more efficient keyboard layout for stroking than current models. While the models he created are theoretically more efficient, his layout as it stands has ambiguity for word entry. Interpreting user input relies on software that guesses the correct input. This is such a significant step backwards from tapping that I cannot accept this model as a viable alternative for tapped layouts without substantial improvements.

My main problem with the stroked layouts is that they rely on predictions. These are difficult to generalize, as a person who only types in chatspeak requires a different subset of language from one who writes more formally. It would take a significant amount of work to produce efficient, and more importantly correct, predicting software.

The prediction software would be particularly interesting future work, though I'm not entirely convinced it would be feasible. If and only if it can be done, I could see myself using this keyboard layout.

Tuesday, September 13, 2011

Paper Reading #6: TurKit: Human Computation Algorithms on Mechanical Turk

Reference Information
TurKit: Human Computation Algorithms on Mechanical Turk
Greg Little, Lydia B. Chilton, Max Goldman, Robert C. Miller
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Greg Little is a professor in MIT's Computer Science and Artificial Intelligence Lab. He hopes to build a computer that can house consciousness.
  • Lydia B. Chilton is a graduate student at the University of Washington and interned for Microsoft Research Asia.
  • Max Goldman is a graduate student at MIT's Computer Science and Artificial Intelligence Lab and User Interface Design Group.
  • Robert C. Miller is an Associate Professor at MIT and leads their User Interface Design Group. He has a PhD from Carnegie Mellon.

Summary
Hypothesis
Does the TurKit toolkit provide an effective way of integrating iterative programming with human computation?

Methods
The authors tested various example applications in the lab and evaluated performance from 20 scripts running nearly 30,000 tasks over a year. The applications included iterative writing, blurry text recognition, decision theory experimentation, and psychophysics experimentation.

Results
Iterative writing found that most contributions added to the text. Blurry text recognition gradually improved with iterations, though initial bad guesses were detrimental. TurKit facilitated the decision theory and psychophysics experiments, though those were run with an earlier version that made parallelization more difficult. The authors found that waiting for humans took an order of magnitude more time than running most of the scripts, suggesting that the overhead of recording and replaying executions is acceptable. The need for determinism was not clear in many cases, though, and TurKit does not scale well.

Contents
TurKit is a toolkit to facilitate human computation algorithms, in which certain functions are delegated to a human. It uses Amazon's Mechanical Turk (MTurk) platform, which relies on paid workers to collect and organize data. TurKit uses a crash-and-rerun model to let programmers write imperative programs that rely on MTurk: a script can be modified and re-executed without re-running its costlier side-effecting operations. TurKit's implementation stores the trace of calls leading to the current state, and certain actions can be designated to run only once, saving time and resolving non-determinism. Crashing a script plays the role that blocking does in ordinary programs.
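
A minimal sketch of the crash-and-rerun idea, written in Python rather than TurKit Script and using my own file name and function shape: expensive, non-deterministic calls are recorded in a trace so that re-running the script replays them instead of repeating their cost.

    import json, os

    TRACE_FILE = "trace.json"                 # hypothetical on-disk trace
    if os.path.exists(TRACE_FILE):
        with open(TRACE_FILE) as f:
            trace = json.load(f)
    else:
        trace = []
    cursor = 0

    def once(fn, *args):
        """Run fn the first time; on later re-runs, replay the recorded result."""
        global cursor
        if cursor < len(trace):
            result = trace[cursor]            # replay: skip the costly call entirely
        else:
            result = fn(*args)                # e.g. post a HIT and wait for a worker
            trace.append(result)
            with open(TRACE_FILE, "w") as f:
                json.dump(trace, f)
        cursor += 1
        return result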

An extension of JavaScript, TurKit Script, provides a wrapper for MTurk. It allows for parallel computing through forks. An online interface facilitates script design and execution. Three primary MTurk functions used are text entry, voting, and sorting.

Discussion
The authors wanted to know if TurKit provided a useful and efficient way to integrate human computation into iterative programming. Since the overhead incurred by the system was so dramatically less than the human time required, they achieved their goal. This intuitively makes sense, so I support the authors' claim.

This work is significant because it allows for human computation to be systematic and minimized, reducing time and financial costs. Notably, it stores costly state information in a database in a manner similar to a stack-trace of a debugger. This storage allows a programmer to modify the script for re-execution while still using the same data, which is incredibly useful.

Some of the language features seemed unintuitive, and the selective, implicit wrapping of functions in once made the language a bit of a guess-and-check affair. I for one would need to constantly refer to the API to see which functions use an implicit once.

Thursday, September 8, 2011

Paper Reading #5: A Framework for Robust and Flexible Handling of Inputs with Uncertainty

Reference Information
A Framework for Robust and Flexible Handling of Inputs with Uncertainty
Julia Schwarz, Scott E. Hudson, Jennifer Mankoff, Andrew D. Wilson
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Julia Schwarz is a PhD student at Carnegie Mellon's Human Computer Interaction lab and has interests in handling ambiguous input.
  • Scott E. Hudson is a Professor in Carnegie Mellon's Human-Computer Interaction Institute and has published over 100 papers.
  • Jennifer Mankoff is an Associate Professor in Carnegie Mellon's Human Computer Interaction lab and holds a PhD from the Georgia Institute of Technology.
  • Andrew D. Wilson is a senior researcher at Microsoft Research with a PhD from MIT's Media Lab. He helped found the Surface Computing group.

Summary
Hypothesis
How effective is a new input handling framework that coexists with existing applications and handles uncertainty?

Methods
The authors illustrated the effectiveness of the framework through six case studies. Three focused on improving touch interaction, two on smarter text entry, and one on improved GUI pointing for the motor impaired. The motor impaired test used prerecorded data to simulate clicks in the framework.

Results
Selecting buttons was successful due to the increased target area. Text entry was implemented quickly with no report on its effectiveness. The motor impaired test found a dramatically reduced number of errors compared with the sample's original test. Overall, the framework was effective in handling uncertainty.

Contents
New means of interaction mean that the certainty of inputs can no longer be assumed, and conventional input frameworks may resolve uncertainty in an undesirable way. The presented framework temporarily keeps track of the possible interpretations of uncertain input to allow for a more informed decision. Conventional frameworks model inputs, dispatch them to an object, interpret what events occurred, and take an action. In the proposed framework, interpretations use a probability mass function to estimate what the event was meant to do. Integration with conventional interactors is not yet implemented; a wrapper would be used for this.

Because of ambiguity, each interactor in the list of possibilities, derived by scoring interactors against a query, receives notification of the event, but conflicting actions must have their ambiguity resolved by making finalization requests to a mediator. The scores are derived from a combination of factors, including nearness. For each interactor, the normalized selection score and the probability of the event having happened are multiplied to determine the final probability of that event on that object. Feedback is immediate for non-permanent or reversible actions even in the case of ambiguity; these actions are modeled separately and are temporary. The mediator can select an action, cancel all actions, or query the user.
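
A minimal sketch of that scoring step under my own naming (the function and score sources are hypothetical): each interactor's relevance score is normalized and multiplied by the event's own likelihood to rank the possible interpretations handed to the mediator.

    def rank_interpretations(event_probability, interactor_scores):
        """interactor_scores: {interactor_name: raw relevance score, e.g. nearness}."""
        total = sum(interactor_scores.values()) or 1.0
        ranked = {name: event_probability * score / total
                  for name, score in interactor_scores.items()}
        return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)

    # Example: a touch that is 70% likely to be a tap, landing near two buttons.
    print(rank_interpretations(0.7, {"ok_button": 3.0, "cancel_button": 1.0}))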

Discussion
While the authors claim to have succeeded in creating an effective framework, the data produced was limited to six test cases. Only one of those involved any proper data analysis. While I can't claim that the framework is unlikely to work, the only case in which I was truly convinced was the test of motor impaired clicking, as that provided a measurable claim that the framework was successful. The rest of the cases required me to simply infer from the test with data that they also held.

I found this work interesting because the rate of erroneous input for touchscreens and other non-traditional input forms is fairly high. With more research in this area, we might be able to use this or a similar framework to make such input forms unambiguous and fairly accurate.

When it came to window resizing, I was deeply bothered by the fact that the authors assumed that vertical resizing was unlikely. I work with short/small screens frequently and thus use a vertical resize to increase my productivity. This framework would seemingly punish me for such an action, which is a painful limitation.

Paper Reading #4: Gestalt: Integrated Support for Implementation and Analysis in Machine Learning

Reference Information
Gestalt: Integrated Support for Implementation and Analysis in Machine Learning
Kayur Patel, Naomi Bancroft, Steven M. Drucker, James Fogarty, Andrew J. Ko, James A. Landay
Presented at UIST'10, October 3-6, 2010, New York, New York, USA

Author Bios
  • Kayur Patel is a computer science PhD student at the University of Washington and is interested in machine learning.
  • Naomi Bancroft was an undergraduate at the University of Washington and is currently working for Google.
  • Steven M. Drucker is a principal researcher at Microsoft Research and an affiliate professor at the University of Washington.
  • James Fogarty is an Assistant Professor at the University of Washington and is a key member of the university's Human-Computer Interaction group.
  • Andrew J. Ko is an Assistant Professor at the University of Washington and directs the USE research group at that university.

Summary
Hypothesis
Does the system presented in this paper aid in reducing the debugging time for a machine learning system?

Methods
The study compared bug-finding performance with a similar task in MATLAB. The participants, graduate computer science students, had some experience in machine learning. Participants had to write code to connect data in the MATLAB environment, though the remainder of the debugging task, involving sentiment analysis and gesture recognition, was identical. The measurements were the number of bugs found and the number of bugs fixed.

Results
Users found and fixed far more bugs with the Gestalt environment than in the MATLAB one. Some used the visualization scripting feature of Gestalt. Participants enjoyed the connectedness of Gestalt. Most time was spent analyzing for errors.

Contents
Gestalt is a general-purpose tool for applied machine learning that supports implementing a classification pipeline, analyzing the data in the pipeline, and transitioning between implementation and analysis. Analysis currently requires extensive developer time, yet it is performed repeatedly throughout the learning process. Two example applications are sentiment analysis and gesture recognition. Classifying data ensures the accuracy of models but suffers from problems of verifiability and sparseness; systems are often built without a given user's data and then tested against that user's data for verification. Developers work from two high-level perspectives, implementation and analysis. Because hiding steps of the pipeline would limit generality, Gestalt operates much like an IDE for flexibility. Information is stored in a relational data table to eliminate the need for data conversion, and Gestalt allows developers to write code that produces generalized visualizations.
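
A minimal sketch of that relational-table idea, using pandas and invented toy data rather than Gestalt's actual API: each pipeline stage's output lives as a column alongside the example, so misclassified examples can be pulled up and inspected directly.

    import pandas as pd

    table = pd.DataFrame({"text": ["great movie", "terrible plot", "loved it"],
                          "label": ["pos", "neg", "pos"]})
    table["features"] = table["text"].str.split()          # stage 1: featurize
    table["prediction"] = ["pos", "pos", "pos"]             # stage 2: classifier output (stubbed)
    table["correct"] = table["prediction"] == table["label"]
    print(table[~table["correct"]])                         # inspect the misclassified rows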

The authors reference domain-specific tools as inspiration for many of the generalized features of Gestalt. Disconnected generalized systems influenced the interconnectedness of the program. MATLAB is a commonly used tool in machine learning due to its connected nature, but does not have data representations reflecting the diversity of machine learning.

Discussion
The authors attempted to create a system that is more useful for debugging machine learning than a popular existing solution. I am only somewhat convinced that they succeeded, since the small sample size leaves open the possibility that these users were flukes who happened to prefer Gestalt. However, given that I think I would prefer Gestalt over MATLAB myself, their results hold some merit.

I find any innovation that makes the debugging process at least a little easier to be worth looking into, and this seemed to cut down on the difficulty of debugging machine learning problems more than the standard process does.

As future work, I would be interested in seeing some of the generalized concepts shown here extended into general-purpose IDEs. Data visualization in particular could make finding any bug significantly easier, as debugging is a problem not just found in machine learning.

Monday, September 5, 2011

Paper Reading #3: Pen + Touch = New Tools

Reference Information
Pen + Touch = New Tools
Ken Hinckley, Koji Yatani, Michel Pahud, Nicole Coddington, Jenny Rodenhouse, Andy Wilson, Hrvoje Benko, Bill Buxton
Presented at UIST'10 October 3-6, 2010, New York, New York, USA

Author Bios

  • Ken Hinckley is a principal researcher at Microsoft Research. He holds a PhD in Computer Science from the University of Virginia.
  • Koji Yatani is a PhD candidate at the University of Toronto and previously worked for Microsoft Research in Redmond.
  • Michel Pahud works for Microsoft Research and holds a PhD in parallel computing from the Swiss Federal Institute of Technology.
  • Nicole Coddington worked for Microsoft Research and is now a senior interface designer at HTC.
  • Jenny Rodenhouse works for Microsoft Research. She is currently in their Xbox division.
  • Andy Wilson is a senior researcher at Microsoft Research with a PhD from MIT's Media Lab. He helped found the Surface Computing group.
  • Hrvoje Benko is a researcher of adaptive systems and interaction for Microsoft Research. He received his PhD from Columbia University.
  • Bill Buxton is a principal researcher for Microsoft Research with three honorary doctorates.
Summary
Hypothesis

Can we divide pen, touch, and combination tasks intuitively for UI design?

Methods
A design study asked participants to illustrate a short film storyboard by pasting clippings into a paper notebook. Observed behavior was categorized into nine behavior types. The implemented system was presented in a similar fashion.

Results
The paper experiment revealed various behaviors: fingers and the pen have specific roles, the pen is tucked away when not in use, clippings were held and "framed" by the fingers, sheets were held in the non-dominant hand, an extended workspace appeared, users created piles of materials, some users drew along the edges of clippings, and users held their place in the notebook with their fingers. The principle of the pen writing, touch manipulating, and pen + touch producing more complex interactions felt fairly natural to users once instructed, though the gestures are not self-revealing. Users enjoyed the stapling and copying features and being able to hold their place, though they sometimes lost it. The contrast between the cutting feature and tearing emphasized that users perceive touch differently from pen + touch.

Contents
The authors tried to combine pen and touch gestures into a combination they call pen + touch. Most previous work in this area resulted in unintuitive gestures or relied on buttons. The authors suggested that pen and touch generally operate in different roles, and that gestures should be based on how users interact with physical materials. Their main design considerations were the input types, whether the interface should change based on the task, how to assign devices to hands, how to best map inputs, how many hands to use, whether simultaneous or sequential actions are needed, when to use ink versus command mode, and whether inputs should be simple or phrased together. The authors decided that, in general, the pen should write ink and touch should manipulate objects. Users naturally interleave pen and touch, allowing for more actions.

The system reflected natural interaction with a notebook, providing whole-screen and dual-screen views. Simultaneous pen and touch input is supported, but palm contacts are ignored. Manipulating, zooming, and selecting objects is done through touch gestures, and common controls support both pen and touch to fit user tendencies. The pen + touch techniques use the non-preferred hand to hold an item while the pen performs an action related to it. A stapling tool was included, with the pen acting as the stapler to prevent errors. Users could cut or copy images by moving the pen, and objects could serve as a straightedge instead of a separate ruler tool. The various tasks could also be combined: photos can be used as brushes, and a pen stroke can act like curving tape. Finger painting broke the "touch manipulates" rule but fit user expectations. A finger-operated bezel menu allowed objects to be stored off-screen, pages to be flipped, and places to be held.
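
A minimal sketch of that division of labor, with hypothetical event and action names of my own (the paper does not publish code): pen alone writes, touch alone manipulates, and pen while the other hand holds an object triggers a combined tool.

    from dataclasses import dataclass

    @dataclass
    class InputEvent:
        kind: str                             # "pen", "touch", or "palm"

    def dispatch(event, held_object=None):
        """Route one input event according to the pen-writes / touch-manipulates rule."""
        if event.kind == "pen":
            # pen + touch: the non-preferred hand holds an item while the pen acts on it
            return "pen_plus_touch_tool" if held_object else "ink_stroke"
        if event.kind == "touch":
            return "manipulate_object"        # drag, zoom, select
        return "ignored"                      # palm contacts are dropped

    print(dispatch(InputEvent("pen")))                        # ink_stroke
    print(dispatch(InputEvent("pen"), held_object="photo"))   # pen_plus_touch_tool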

Discussion
The authors sought a more intuitive usage for touch and pen controls and found one. I am convinced that their system is effective, though I must agree that it is not self-evident.

This work is particularly interesting to me because I find consumer touch-based software at best clunky and at worst unusable. This system has the potential to set in motion a push for natural interfaces that don't necessarily state that they are an emulation of paper but are just as intuitive as paper.

The authors specifically noted that palm touches are ignored, which drops an entire category of interactions. For instance, the dominant hand could rest on the screen while writing, triggering a handwriting recognition program, but when the hand isn't resting, the user intends to leave their writing as is.

Paper Reading #2: Hands-On Math: A page-based multi-touch and pen desktop for technical work and problem solving

Reference Information
Hands-On Math: A page-based multi-touch and pen desktop for technical work and problem solving
Robert Zeleznik, Andrew Bragdon, Ferdi Adeputra, Hsu-Sheng Ko
Presented at UIST'10 October 3-6, 2010, New York, New York, USA

Author Bios

  • Robert Zeleznik is the Director of Research at Brown University and the CTO of FluiditySoftware.com, with a Master's degree from Brown.
  • Andrew Bragdon is a PhD student of Computer Science at Brown and studies gestural techniques.
  • Ferdi Adeputra is affiliated with Brown University and has published work on Code Bubbles.
  • Hsu-Sheng Ko is affiliated with Brown University and has published work on gesture selection.
Summary

Hypothesis
Can a computer algebra system (CAS) with a multi-touch user interface help students and others learn and work more efficiently?

Methods
The participants were undergraduates who needed math for their classes. Users created and manipulated pages, performed a quick calculation, performed multi-step derivation, graphed an equation, used palm-detecting options, used the Web, and manipulated the contents of a page.

Results
The system was largely successful, though students noted it might work better on a portable device. Page manipulation was successful, though users needed instruction first. Paper folding, bimanual gestures, and the rectangular lasso selection were not popular, unlike space insertion. Palm recognition required repeated reminders. Most users were comfortable with both hands on the screen. Panning graph contents showed that the required finger posture was difficult to perform at first, as were the hidden menus. The ability to manipulate math was popular, with even more functionality desired, and limiting the scope of expressions to a single page alleviated some fears. Gesture recognition was an occasional problem, and users frequently approached gestures with inefficient physical strategies. Overall, a more mature version of the system appeared viable.


Contents
This technology seeks to combine the sequential computation of paper with the efficient answers of a CAS to yield an intuitive interface. For this paper, the complexity of the math was limited to the high school level. This work differs from prior work in that it tries to stay general, simply allowing interaction with and manipulation of mathematical equations. The system uses a Microsoft Surface and an infrared light pen. It uses the StarPad SDK to recognize handwriting and convert it to a typeface, create graphs, perform computations with extended notations, and provide manipulable web access.

The technology combines the space of whiteboards with the organization of paper. Pages are managed through swipes. The system ignores large contact points as well as non-dominant hand touches to hold the paper in place. A bar containing a live view of the user's workspace can be opened. Users can fold a section of the page similar to code folding. The system provides hidden menus that only appear when an object is moved and do not interfere with dragging. The pen is touch-activated and combines touch and pen tools. The features include lasso selection, inserting space, and clipboard pasting. The non-dominant palm and fingers can be used to provide convenient additional commands. Expressions are manipulated through context-specific pinching, dragging, and stretching, but a hidden menu can help to differentiate actions. Any manipulations appear on the following line.
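
A minimal sketch, using SymPy as a stand-in CAS and a hypothetical gesture-to-rewrite mapping, of the "result appears on the following line" behavior described above:

    import sympy as sp

    steps = [sp.sympify("(x + 1)*(x + 2)")]      # the recognized handwritten expression

    def apply_gesture(gesture):
        """Map a manipulation gesture to an algebraic rewrite and append the result."""
        rewrite = {"stretch": sp.expand, "pinch": sp.factor}[gesture]   # hypothetical mapping
        steps.append(rewrite(steps[-1]))
        return steps[-1]

    print(apply_gesture("stretch"))   # x**2 + 3*x + 2
    print(apply_gesture("pinch"))     # (x + 1)*(x + 2)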

Discussion

The authors sought to design a hybrid of CAS and paper that was intuitive and largely succeeded. While some of the anticipated features caused problems, as an investigative work of the promise of such a system, the system worked. I am convinced that the results found here are correct.

As someone who frequently makes algebraic errors in math, this technology seems to vastly reduce the chance for error while rewarding exploration of unusual techniques. While this is not the first work to correct that problem, this seems like the most probable and logical solution.

The authors mentioned that this system would be ideal for educational work, which I agree with. Children in particular would benefit from the innovative control scheme, which would allow them to learn both visually and physically.