Twitter Algorithm Can Track Flu Outbreaks
January 29, 2013

Twitter Can Be Used To Track Outbreaks Of Infectious Diseases Like The Flu

Brett Smith for - Your Universe Online

Is it an epic timewaster or a new way to take society’s pulse, literally?

Computer scientists from around the world have devised algorithms that parse Twitter posts for all kinds of insight, and a new one from Johns Hopkins University is showing promising results for tracking the spread of the flu virus across the United States.

To track the flu via Twitter, researchers Mark Dredze and Michael Paul needed to develop a program that delivers real-time data on flu cases by filtering out online chatter that has nothing to do with actual flu infections.

“Lots of people on Twitter have what we call Justin Bieber fever,” explained Paul in an online video. “So, just looking for ‘fever’ doesn’t really work.”

According to Paul, the new algorithm utilizes a machine-learning technique that allows the computer to learn which tweets are relevant to tracking the flu and which are not. This enabled the team to determine which tweets were about health and which were about something else.
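
The article does not say which machine-learning technique the team used, so the sketch below swaps in a small naive Bayes text classifier purely to illustrate the general idea of learning from labeled tweets. The training tweets, the two labels and the function names are all hypothetical, and a real system would need far more data than this.

```python
import math
import re
from collections import Counter

# Hypothetical hand-labeled tweets (assumption: not data from the study).
TRAINING_TWEETS = [
    ("i have the flu and a high fever, staying in bed", "flu"),
    ("home sick with the flu, fever and chills all night", "flu"),
    ("caught the flu, my throat hurts and i feel awful", "flu"),
    ("i have bieber fever, best concert ever", "other"),
    ("bieber fever at the concert tonight", "other"),
    ("kobe played the game with flu symptoms, what a legend", "other"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_TWEETS)
print(classify("stuck in bed with a fever and chills", model))
print(classify("bieber fever at the concert", model))
```

The word “fever” appears under both labels here, so the classifier has to lean on the surrounding words, which is the point Paul makes about keyword searches falling short.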

“For example,” Dredze said, “a recent spike in Twitter flu activity was caused by discussions about basketball legend Kobe Bryant’s flu-like symptoms during a recent game. Mr. Bryant’s health notwithstanding, such tweets do very little to help public health officials prepare our nation for the next big outbreak.”

Another obstacle the researchers had to overcome is the media-generated noise that often accompanies a virus outbreak. With many people posting concerns or reactions to the outbreak, the algorithm had to be designed to ignore flu-commenter tweets in pursuit of flu-sufferer tweets.

“In late December,” Dredze said, “the news media picked up on the flu epidemic, causing a somewhat spurious rise in the rate produced by our Twitter system. But our new algorithm handles this effect much better than other systems, ignoring the spurious spike in tweets.”

To check the accuracy of their algorithm, Dredze and Paul compared their results to CDC data for the same time period. The researchers noted that during November and December 2012, their system closely followed the official CDC figures on the flu outbreak.
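
The article does not describe how the comparison with CDC data was scored. One common way to check whether two weekly series move together is the Pearson correlation coefficient, sketched below as an assumption rather than the researchers' actual method. The weekly figures are made-up placeholders, not values from the study or from the CDC.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder weekly flu rates (assumption: illustrative numbers only).
twitter_rate = [1.0, 1.4, 2.1, 3.0, 4.2]
cdc_rate = [1.2, 1.5, 2.0, 3.3, 4.0]

r = pearson(twitter_rate, cdc_rate)
print(f"correlation between the two series: {r:.3f}")
```

A value near 1 would mean the Twitter-based estimate rises and falls with the official figures; a value near 0 would mean the two are unrelated.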

One of the biggest advantages of the new program is its ability to deliver real-time results. The current flu-tracking system in use by the CDC typically takes two weeks to publish data on the flu outbreak, and the researchers said they hope their system can eventually be used to enhance the current government methods.

“This new work demonstrates that Twitter posts can be used to guide public health officials in their response to outbreaks of infectious diseases,” Dredze said. “Our hope is that the new technology can be used to track other diseases as well.”

He added that the new algorithm is also an exciting step forward in culling information from social media, something that was readily dismissed as frivolous when it first emerged.

“This really opens up the path for so many new types of questions,” Dredze said. “This really could change the way we do public health in this country and how we get feedback from our population.”