Wellesley Researchers Develop Tool that Tracks Truth and Lies on Twitter
Social media has become a major part of modern news reporting, but the drive to be first to report on a story can often eclipse accuracy and lead to the spread of false information. Enter TwitterTrails, an interactive web tool developed in Wellesley’s Social Informatics Lab that enables journalists to quickly examine a claim made on Twitter, track how far the claim has spread, and determine whether it appears to be true or false by analyzing the behavior of people who follow the story.
TwitterTrails is directed by Wellesley professors Takis Metaxas, Professor of Computer Science and founder of the College’s Media Arts and Sciences Program, and Eni Mustafaraj, Assistant Professor of Computer Science, and managed by Samantha Finn ’12. The project is supported by Wellesley College and by a grant from the National Science Foundation. According to Mustafaraj, the NSF grant has provided a great deal of support for student employment; many student researchers have contributed to the project in myriad ways over the last several years.
TwitterTrails works by using an algorithm to study crowd behavior. Each story is evaluated using two major metrics: “spread” (which examines how much the story has been passed around) and “skepticism” (which examines how much doubt exists in the crowd about the validity of the claim). Claims that receive higher skepticism and lower spread are more likely to be false, and claims that receive lower skepticism and higher spread are more likely to be true.
“When enough people see the same piece of information, their reaction correlates well with whether the information is true or false,” said Metaxas. “When a lot of people see something they know is true, it is very unlikely that they will refute it. However, when they see something they know to be false, a good subset of these people will raise questions... TwitterTrails measures these two parameters (how much something is shared versus how much skepticism exists) to determine whether a claim appears to be true or false.”
Within minutes, TwitterTrails can retrieve and return vital information, such as: who posted the first tweet on the topic; how fast the information has spread, when it started spreading, and whether it’s still going; whether people expressed doubt about the claim; and who the major players were in propagating the information.
To measure and analyze crowd behavior, the algorithm uses a well-known metric from library science called the h-index. The h-index, which was developed by Dr. Jorge Hirsch of the University of California, San Diego, gauges the impact of an author’s work by counting citations to the author’s publications; an author’s h-index is the largest number h such that h of the author’s papers have each received h or more citations.
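As a rough illustration of the definition above, the h-index can be computed by ranking citation counts from highest to lowest and finding the last rank at which the count is still at least as large as the rank. The function name and sample numbers below are made up for this sketch; they are not taken from the TwitterTrails code.

```python
def h_index(citation_counts):
    """Return the largest h such that h items have at least h citations each."""
    # Rank the counts from most cited to least cited.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Hypothetical author with five papers and these citation counts.
papers = [10, 8, 5, 4, 3]
print(h_index(papers))  # 4: four papers have at least 4 citations each
```

A single heavily cited paper does little to raise the score: an author with counts of 25, 8, 5, 3, and 3 has an h-index of only 3.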
“In our use, a tweet in the dataset of a story corresponds to a publication and retweets act like citations,” Metaxas said. “Retweeting increases a tweet’s visibility, since this action forwards a tweet to one’s followers. It also gives a story credibility, since retweeting generally shows trust and agreement in the information presented.”
When analyzing “spread,” TwitterTrails makes a determination by calculating the h-index of all tweets about a given story. To calculate “skepticism,” the system calculates the h-index of tweets expressing doubt, and compares it to the h-index of those that do not express doubt, then produces a ratio based on these two values. Doubt and disbelief can be detected in a tweet by analyzing what words are being used. According to the researchers, because Twitter messages are short, people tend to use the same words to describe things. When expressing doubt or disbelief, the 10 most common keywords are: hoax, fake, doubt, false, scam, untrue, mistake, unreal, bogus, and mislead.
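The two measurements described above might be sketched as follows. This is only an approximation under stated assumptions: the article does not give the exact form of the ratio, so the sketch simply divides the h-index of doubting tweets by the h-index of the rest; the substring keyword match is a crude stand-in for whatever text analysis the real system performs; and the sample tweets and retweet counts are invented.

```python
# The ten doubt keywords listed by the researchers.
DOUBT_KEYWORDS = ("hoax", "fake", "doubt", "false", "scam",
                  "untrue", "mistake", "unreal", "bogus", "mislead")


def h_index(counts):
    """Largest h such that h tweets have at least h retweets each."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


def expresses_doubt(text):
    """Assumption: a tweet expresses doubt if it contains any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in DOUBT_KEYWORDS)


def spread_and_skepticism(tweets):
    """tweets is a list of (text, retweet_count) pairs for one story."""
    spread = h_index([rt for _, rt in tweets])
    h_doubt = h_index([rt for text, rt in tweets if expresses_doubt(text)])
    h_plain = h_index([rt for text, rt in tweets if not expresses_doubt(text)])
    # Assumed ratio; left undefined (None) when no non-doubting tweet counts.
    skepticism = h_doubt / h_plain if h_plain else None
    return spread, skepticism


# Invented sample data for a single story.
story = [
    ("Breaking: bridge closed after storm", 12),
    ("Bridge closed, avoid the area", 7),
    ("Is this a hoax? No official source", 5),
    ("Looks fake to me", 3),
    ("Photos from the bridge", 2),
]
print(spread_and_skepticism(story))  # (3, 1.0)
```

In this toy story the doubting tweets are retweeted about as strongly as the others, which under the article's description would count as a high level of skepticism.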
“We have found that for most of the English tweets, these keywords work well,” wrote TwitterTrails Project Manager Samantha Finn in a post on the project blog. “In about 20 percent of the cases, however, these are not enough to capture the majority of tweets: sometimes they miss tweets which should be counted, and sometimes they capture too many tweets, when people are using these words but without the intention of expressing doubt. To combat this, the algorithm can be customized on a story-by-story basis: words can be added or removed from this list, and words can also be added to a list so that their presence in a tweet excludes it from being counted.”
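The story-by-story customization Finn describes could look something like the sketch below. The function name, its parameters, and the sample phrases are hypothetical and chosen only for illustration; the actual TwitterTrails interface is not described in the article.

```python
# The ten default doubt keywords named by the researchers.
DEFAULT_KEYWORDS = frozenset({"hoax", "fake", "doubt", "false", "scam",
                              "untrue", "mistake", "unreal", "bogus", "mislead"})


def build_doubt_filter(add=(), remove=(), exclude=()):
    """Return a function that decides whether a tweet counts as doubtful.

    add     -- extra keywords to treat as doubt for this story
    remove  -- default keywords to ignore for this story
    exclude -- words or phrases whose presence keeps a tweet from being counted
    """
    keywords = (DEFAULT_KEYWORDS | {w.lower() for w in add}) - {w.lower() for w in remove}
    excluded = {w.lower() for w in exclude}

    def is_doubtful(text):
        lowered = text.lower()
        # An exclusion term overrides any keyword match.
        if any(term in lowered for term in excluded):
            return False
        return any(keyword in lowered for keyword in keywords)

    return is_doubtful


# Hypothetical tuning for one story.
is_doubtful = build_doubt_filter(add=["rumor"], remove=["mistake"],
                                 exclude=["false alarm"])
print(is_doubtful("This is just a rumor"))             # True
print(is_doubtful("Honest mistake by the pilot"))      # False
print(is_doubtful("Police say it was a false alarm"))  # False
```

Keeping the exclusion list separate from the keyword list means a phrase such as "false alarm" can be ruled out without giving up the keyword "false" for the rest of the story's tweets.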
TwitterTrails is currently designed for journalists investigating recent and breaking stories and for social media researchers, but others can trail tweets by submitting a request through the “Request a Story” form on the site. The Trails team plans to make the tool available for personal, individual use in the future.
For case studies and additional information, visit the TwitterTrails blog. Read stories about TwitterTrails in Network World (October 2014), BostInno (November 2014), and The Daily Beast (November 2014).