At the Graduate Research Symposium: An algorithm to model Twitter politics and 'fake news'

Twitter algorithm: Cheng Li was awarded the Northrop Grumman Corporation Award for his development of an algorithm that can predict political leanings of social media accounts without reading individual tweets. Li is a Ph.D. student in computer science. Photo by Stephen Salpukas


by Joseph McClain | March 20, 2017

Cheng Li’s work gives political analysts a new set of tools to use in their quest to understand the role of social media in deciding elections, but he is upfront about what he brings to the table.

“We’re not political scientists,” he says. “We’re computer scientists.”

Li is a Ph.D. student in William & Mary’s Department of Computer Science. He worked with Zhenming Liu, an assistant professor in the department, on a project that presents what is perhaps the best way yet to model the swirl and snarl of political interactions on Twitter.

Li has been awarded the Northrop Grumman Corporation Award for his work. The award recognizes excellence in scholarship in the natural and computational sciences. He will join other award-winning William & Mary graduate students honored as part of the 16th Annual Graduate Research Symposium March 24-25 at William & Mary’s Sadler Center.

The computer science project includes a way to look at how “fake news” is propagated through social media. Li notes that Facebook and Twitter are widely credited and blamed for influencing national decisions such as Brexit and U.S. presidential elections, but analysis of the role of social networks in the decision-making process remains “speculative and crude,” even though social media have been in popular use for more than a decade.

“We want to make sense of Twitter users’ political leanings,” Li said.

He added that political polarization appears to be increasing in the United States, with evidence indicating that left-leaning individuals tend to move further to the left. The same tendency seems to hold for right-leaning individuals.

“With social media, this trend has even gotten more serious,” Li said. “Because a left-leaning person will never see news from a source that is right of center, even though it might be correct and make sense, because their friends don’t share this kind of news.”

In data-science terms, their project is a “sparse graph” challenge: the data contain comparatively few connections between data points. Li said sparse-graph issues have been at the root of the inability of data scientists to accurately model social media interactions. He said his project has produced a viable algorithm that works for both stochastic block models and small-world models, two methods of mathematically describing the interrelationships of “nodes.”
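For readers unfamiliar with the term, a stochastic block model can be illustrated with a few lines of Python. The sketch below is not the researchers' code; the function name `sample_sbm`, the block sizes and the connection probabilities are all assumptions chosen for illustration. It builds a random graph in which nodes in the same block connect more often than nodes in different blocks, then reports how sparse the result is.

```python
import numpy as np

def sample_sbm(sizes, p_in, p_out, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model.

    sizes  -- number of nodes in each block (illustrative values only)
    p_in   -- probability of an edge between two nodes in the same block
    p_out  -- probability of an edge between nodes in different blocks
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    n = labels.size
    same_block = labels[:, None] == labels[None, :]
    probs = np.where(same_block, p_in, p_out)
    # Draw each unordered pair once (strict upper triangle), then mirror it.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adj = (upper | upper.T).astype(int)
    return adj, labels

# Hypothetical toy setting: two blocks of 200 nodes with few connections.
adj, labels = sample_sbm([200, 200], p_in=0.05, p_out=0.005)
n = adj.shape[0]
density = adj.sum() / (n * (n - 1))  # fraction of possible edges present
print(f"nodes: {n}, edges: {adj.sum() // 2}, density: {density:.4f}")
```

Even this toy graph uses only a small fraction of its possible edges, which is the sense in which such graphs are called sparse. A real Twitter interaction graph of millions of users would be far sparser still.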

Li’s project focused on the time that encompassed the lead-up to the U.S. presidential election and its aftermath.

“Our data set only focuses on last October to last November, two months,” he said. “Our relationship graph is sparse. And there’s also noise in there. So how can we exclude the noise and still get some sense from the sparse graph?”

In Li’s project, the nodes are individual Twitter users. His algorithm constructed a mathematical portrait of the political leaning of each of the 12 million Twitter users in the study. The portrait, termed a “latent variable,” is constructed by mapping each user’s interactions with other users. The researchers didn’t even have to read the individual tweets.
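The general idea of recovering a hidden leaning from interactions alone can be sketched with a standard textbook technique, spectral embedding. This is not Li and Liu's algorithm, which the article does not describe in detail. The toy data, the two "camps" and the interaction probabilities below are all assumptions, and the toy graph is much denser than real Twitter data, which is exactly the regime where simple methods like this one work well.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 400 accounts split evenly into two camps.
n_per_camp = 200
labels = np.repeat([0, 1], n_per_camp)  # hidden "true" camp of each account
n = labels.size

# Assumed probabilities that two accounts interact (same camp vs. different).
p_same, p_cross = 0.12, 0.02
probs = np.where(labels[:, None] == labels[None, :], p_same, p_cross)
upper = np.triu(rng.random((n, n)) < probs, k=1)
adj = (upper | upper.T).astype(float)  # symmetric interaction matrix

# np.linalg.eigh returns eigenvalues in ascending order. The top eigenvector
# mostly tracks how active each account is; the second-largest one separates
# the two camps, so it serves here as a one-dimensional latent score.
eigenvalues, eigenvectors = np.linalg.eigh(adj)
latent_score = eigenvectors[:, -2]

# The sign of an eigenvector is arbitrary, so compare up to a label swap.
predicted = (latent_score > 0).astype(int)
agreement = np.mean(predicted == labels)
accuracy = max(agreement, 1.0 - agreement)
print(f"camp recovered for {accuracy:.1%} of accounts")
```

No tweet text appears anywhere in this sketch: the score for each account comes entirely from who interacts with whom.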

The computer scientists checked their work by “groundtruthing,” establishing a real-world basis to test their modeling of the political landscape of Twitter. They sought help from a member of William & Mary’s Department of Government.

“In fact, when we started this project, we were not sure whether we were asking the best political science questions,” Li said. “So we asked (Assistant) Professor Jaime Settle for help. Professor Settle gave us invaluable feedback, which made us confident that we were heading in the right direction.”

He said Settle, who works extensively on social-media issues in her own research, gave the computer scientists advice about using data from an online analysis of the voting records and co-sponsorship of bills among the members of the 114th Congress.

Li was able to confirm the predictive power of his model by running it against the groundtruthed matrix. He added that the algorithm was put to other tests, outperforming competing methods.

Li and Liu also used their model in a brief study of the circulation of “fake news” on Twitter, showing that the rightmost 10 percent of users in the study accounted for most of the fake news counts. Li is careful to point out that as “computer guys,” they tread carefully in this area.

“Fake news is one application of our work, but we are not particularly interested in fake news,” he said. “We have no time to verify whether something is fake news or real news.”

One aspect of their analysis will come as no surprise: Donald Trump dominated the entire spectrum of Twitter. Li said that reactions to Trump-related tweets were highest at both political extremes.

“Trump reactions make a nearly perfect U-curve,” he said. “There are large reactions from the far right and large reactions from the far left. And it goes down pretty evenly on both sides to a low point where the middle-of-the-road is.”