TwitInfo: Aggregating and Visualizing Microblogs for Event Exploration
Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, Robert C. Miller
Presented at CHI 2011, May 7-12, 2011, Vancouver, British Columbia, Canada
Author Bios
- Adam Marcus is a graduate student at MIT's Computer Science and Artificial Intelligence Lab. He is a member of the Database and Haystack groups.
- Michael S. Bernstein is a graduate student at MIT's Computer Science and Artificial Intelligence Lab. His work combines crowd intelligence with computer systems.
- Osama Badar is an undergraduate student at MIT.
- David R. Karger is a professor in MIT's Computer Science and Artificial Intelligence Lab. His work focuses on information retrieval.
- Samuel Madden is an Associate Professor with MIT's Computer Science and Artificial Intelligence Lab. His work is in databases.
- Robert C. Miller is an Associate Professor with MIT's Computer Science and Artificial Intelligence Lab. His work centers on web automation and customization.
Summary
Hypothesis
How well does a streaming algorithm support real-time search of Twitter for an arbitrary subject, and can users make sense of the resulting visualizations?
Methods
To test the algorithm, the authors collected tweets covering three soccer games and one month of earthquakes. The soccer games were annotated by hand to produce a ground-truth set of events, while the earthquake data was checked against US Geological Survey records. The authors measured precision and recall: precision is the fraction of peaks detected by the algorithm that correspond to events in the truth set, while recall is the fraction of truth-set events that the algorithm detected.
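To make the two metrics concrete, here is a minimal Python sketch of precision and recall over sets of detected and ground-truth events; the event names are hypothetical, not from the paper.

    def precision_recall(detected, truth):
        """Precision: fraction of detected events that are real.
        Recall: fraction of real events that were detected."""
        true_positives = len(detected & truth)
        precision = true_positives / len(detected) if detected else 0.0
        recall = true_positives / len(truth) if truth else 0.0
        return precision, recall

    # Hypothetical soccer example: the algorithm flags four peaks, three of
    # which match the four hand-annotated events.
    p, r = precision_recall({"goal1", "goal2", "halftime", "noise"},
                            {"goal1", "goal2", "halftime", "redcard"})
    print(p, r)  # 0.75 0.75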
The authors then tested whether users could understand the UI. First, users performed open-ended tasks with the system while the authors gathered usability feedback. The second task was exploration-based with a time limit: given an event, users had a few minutes to explore it and then dictate a brief news report on it. The authors then interviewed the users, one of whom is a prominent journalist.
Results
The algorithm has high recall, and usually failed to detect an event only when the Twitter data lacked a peak. Precision was high for soccer but lower for the earthquake query, where the algorithm flagged minor earthquakes alongside major ones, producing apparent false positives. If minor earthquakes were counted as true events, precision rose to 66%. Twitter users' interests, or lack thereof, introduce a bias into the data. Sometimes a single earthquake produced multiple peaks, or two earthquakes overlapped in one peak.
Users successfully reconstructed events quickly. Some users felt the information was shallow, but it supported high-level understanding. Users focused on the largest peak first. They liked the timeline but did not trust the sentiment analysis, and they wanted the map to aggregate tweets rather than plot them individually. Without the system, users said they would have relied on a news site or aggregator. The journalist suggested that TwitInfo would be useful for background research on a long-running topic; her opinions otherwise largely matched the other users'.
Contents
Twitter offers a live window into public consciousness but is difficult to distill into a clear timeline. Past visualization efforts are generally domain-specific and rely on archived data. The authors developed TwitInfo, which takes a search query defining an event, identifies and labels peaks of activity, produces a visualization of long-running events, and aggregates public opinion in a graphical timeline. It identifies subevents and summarizes sentiment by applying signal processing to the social stream. The data is normalized so that noisy or biased signals do not distort the results, and results can be viewed geographically or in aggregate to surface both localized and overall opinions.
A previous paper found that Twitter has more in common with a news medium than with a social network. Other papers covered hand-created visualizations of events; one discussed aggregating opinion on microblogs; and some prior research involved searching for specific kinds of events.
TwitInfo events are defined by a user's search terms. Once an event is defined, the system begins logging tweets that match the query and creates a page that updates in real time for monitoring the data. The event timeline plots the volume of matching tweets over time. Spikes in volume are flagged as peaks at varying granularity and annotated with relevant keywords, and a peak can be opened into its own timeline. The actual tweets within the event timeframe are ranked by event or peak keywords and color-coded by perceived sentiment, with overall sentiment displayed in a pie chart. Sentiment comes from a Naive Bayes classifier, which predicts the probability that a tweet is positive or negative; to reduce classifier bias, the aggregate counts are recall-normalized. The three most common URLs are listed in a Popular Links panel, and another panel shows the geographic distribution of tweets.
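The recall normalization suggests a simple correction; the sketch below shows one way it could work, assuming each classifier's recall has been estimated on labeled held-out data (all numbers here are made up for illustration).

    def normalized_sentiment(pos_count, neg_count, pos_recall, neg_recall):
        """Estimate true positive/negative proportions from raw counts.

        If the positive classifier catches only a fraction of truly positive
        tweets, raw counts under-report that class; dividing each count by
        its classifier's recall corrects for the imbalance."""
        est_pos = pos_count / pos_recall
        est_neg = neg_count / neg_recall
        total = est_pos + est_neg
        return est_pos / total, est_neg / total

    # Hypothetical: 300 tweets classified positive (recall 0.6),
    # 200 classified negative (recall 0.8).
    print(normalized_sentiment(300, 200, 0.6, 0.8))  # ~(0.667, 0.333)

Note how the correction shifts the estimate toward the class whose classifier misses more tweets: the raw split of 300 vs. 200 becomes roughly two-thirds positive once the positive classifier's lower recall is accounted for.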
Peaks are calculated from tweet-frequency maxima within a user-defined time period. Tweets are sorted into time bins, and, using a method similar to TCP's congestion detection, bins with an unusually large number of tweets are flagged. The method then hill-climbs to identify the full peak window, which is labeled with representative keywords. To keep labels meaningful even for noisy search terms, candidate terms are IDF-normalized: a term's weight is discounted by its global popularity, lessening the effect of words that are common everywhere. A sketch of both steps appears below.
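This is a rough Python sketch of that pipeline under my own assumptions: the smoothing factor, threshold, warm-up length, and TF-IDF labeling details are illustrative choices, not the paper's exact parameters.

    import math
    from collections import Counter

    def detect_peaks(bins, alpha=0.125, threshold=2.0, warmup=5):
        """Flag windows of unusually high tweet volume in per-bin counts."""
        # Warm up the moving statistics on the first few bins (an assumption
        # of this sketch; the paper's initialization may differ).
        mean = sum(bins[:warmup]) / warmup
        meandev = sum(abs(b - mean) for b in bins[:warmup]) / warmup or 1.0
        peaks, i = [], warmup
        while i < len(bins):
            if bins[i] > mean + threshold * meandev and bins[i] > bins[i - 1]:
                start = i - 1
                # Hill-climb to the local maximum of the spike.
                while i + 1 < len(bins) and bins[i + 1] > bins[i]:
                    i += 1
                # Extend the window until volume falls back toward the mean.
                end = i
                while end + 1 < len(bins) and bins[end + 1] > mean:
                    end += 1
                peaks.append((start, i, end))  # (start, peak bin, end)
                i = end
            else:
                # Exponentially weighted updates, as in TCP round-trip estimation.
                meandev = (1 - alpha) * meandev + alpha * abs(bins[i] - mean)
                mean = (1 - alpha) * mean + alpha * bins[i]
            i += 1
        return peaks

    def label_peak(peak_tweets, global_doc_freq, n_docs, top_k=3):
        """Label a peak with terms frequent in its window but rare globally."""
        tf = Counter(w for tweet in peak_tweets for w in tweet.lower().split())
        score = {w: c * math.log(n_docs / (1 + global_doc_freq.get(w, 0)))
                 for w, c in tf.items()}
        return sorted(score, key=score.get, reverse=True)[:top_k]

    # Hypothetical tweet counts per minute around a goal in a soccer match.
    print(detect_peaks([10, 12, 11, 13, 60, 150, 90, 40, 15, 12, 11]))
    # [(4, 5, 7)]: the window starts at bin 4, peaks at bin 5, ends at bin 7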
Discussion
The authors wanted to see whether TwitInfo was useful and functional. While a few cases were not covered, which could have altered their results slightly, the vast majority of the work appears well-founded, so I was convinced by their results.
Since so many users wanted features that were notably missing, like map aggregation, I would like to see the system tested again with those features added. The authors also mentioned that a more extensive evaluation would be helpful, and I agree.
Dr. Caverlee is starting a Twitter project that looks for peaks around the time of a natural disaster. At first I thought he might have been scooped, but this is just one part of what he plans to do. I found the overlap between the two projects very interesting.