redOrbit Staff & Wire Reports – Your Universe Online
Researchers at the University of Rochester have devised a new system that can tell people how likely they are to become ill when dining at a particular restaurant by “listening” to the tweets from other restaurant patrons.
The system, dubbed nEmesis, can help people make more informed decisions and could potentially complement traditional public health methods for monitoring food safety, the researchers said.
The system works by combining machine-learning and crowdsourcing techniques to analyze millions of tweets to identify people reporting food poisoning symptoms after visiting a restaurant.
Such a large volume of tweets would be impossible to analyze manually, the researchers noted. Indeed, over a four-month period, nEmesis gathered 3.8 million tweets from more than 94,000 unique users in New York City, traced 23,000 restaurant visitors, and found 480 reports of likely food poisoning – data that correlated fairly well with public inspection figures published by the local health department.
The nEmesis system then ranked restaurants by how likely diners are to become ill after eating there.
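The article does not say how nEmesis turns its counts into a ranking. As a rough illustration only, the sketch below scores each restaurant by the fraction of detected visitors who later reported illness, smoothed toward the city-wide rate so that a restaurant with one visitor and one report does not top the list. The `RestaurantStats` fields, the smoothing scheme, and the `pseudo_visits` value are all hypothetical assumptions, not the researchers' method.

```python
from dataclasses import dataclass


@dataclass
class RestaurantStats:
    name: str
    visitors: int      # hypothetical: distinct users detected visiting
    sick_reports: int  # hypothetical: visitors who later tweeted about illness


def illness_score(stats, prior_rate, pseudo_visits=50):
    """Smoothed share of visitors who later reported illness (assumed scheme).

    Adds `pseudo_visits` imaginary visitors who fall ill at the city-wide
    rate, pulling sparse restaurants toward the average.
    """
    return (stats.sick_reports + prior_rate * pseudo_visits) / (stats.visitors + pseudo_visits)


def rank_restaurants(all_stats, pseudo_visits=50):
    """Return restaurants sorted from highest to lowest smoothed illness score."""
    total_visitors = sum(s.visitors for s in all_stats)
    total_sick = sum(s.sick_reports for s in all_stats)
    prior = total_sick / total_visitors if total_visitors else 0.0
    return sorted(
        all_stats,
        key=lambda s: illness_score(s, prior, pseudo_visits),
        reverse=True,
    )
```

With a raw ratio, a restaurant seen once with one illness report would rank first; the smoothing trades some sensitivity for stability when counts are small.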
“The Twitter reports are not an exact indicator – any individual case could well be due to factors unrelated to the restaurant meal – but in aggregate the numbers are revealing,” said Henry Kautz, chair of the computer science department at the University of Rochester and co-author of a paper about the system to be presented at the Conference on Human Computation & Crowdsourcing in November.
In other words, a “seemingly random collection of online rants becomes an actionable alert” that can help detect cases of foodborne illness in a timely manner, Kautz said.
Since people often tweet from GPS-enabled phones and other mobile devices, nEmesis can "listen" to relevant, geotagged public tweets and detect restaurant visits by matching the location from which a tweet was sent to the known locations of restaurants.
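The location-matching step can be pictured with a small sketch: compute the great-circle (haversine) distance from a geotagged tweet to each known restaurant and accept the nearest one within a cutoff. The 25-meter threshold, the `restaurants` dictionary layout, and the function names are hypothetical; the article does not describe how nEmesis performs the match.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_restaurant(tweet_lat, tweet_lon, restaurants, max_dist_m=25.0):
    """Return the name of the closest restaurant within max_dist_m, else None.

    `restaurants` maps name -> (lat, lon); the cutoff is an assumed value.
    """
    best_name, best_dist = None, max_dist_m
    for name, (lat, lon) in restaurants.items():
        d = haversine_m(tweet_lat, tweet_lon, lat, lon)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name
```

A linear scan is fine for illustration; at city scale a spatial index would avoid comparing every tweet against every restaurant.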
If a user tweets from a location that is determined to be a restaurant, the system will continue to track that person’s tweets for 72 hours, even if they are not geotagged or if the person is tweeting from a different device.
If a user later tweets about feeling ill, the system captures that data.
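The 72-hour follow-up described above amounts to a time-window check: after a detected visit, look at the same user's later tweets, geotagged or not, and flag the visit if any tweet in the window reads as illness. In the sketch below, `looks_sick` is a hypothetical keyword stand-in for the trained classifier the researchers actually used, and the function names are illustrative.

```python
from datetime import datetime, timedelta

TRACKING_WINDOW = timedelta(hours=72)  # follow-up period reported for nEmesis


def looks_sick(text):
    """Hypothetical keyword stand-in for the real machine-learned classifier."""
    keywords = ("food poisoning", "vomit", "throwing up", "stomach ache", "diarrhea")
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def sick_after_visit(visit_time, later_tweets, is_sick=looks_sick):
    """True if any of the user's tweets within 72 hours after the visit looks like illness.

    `later_tweets` is an iterable of (timestamp, text) pairs from the same user,
    whether or not they were geotagged.
    """
    deadline = visit_time + TRACKING_WINDOW
    return any(visit_time < ts <= deadline and is_sick(text) for ts, text in later_tweets)
```

Passing the classifier in as `is_sick` keeps the windowing logic independent of how illness tweets are recognized.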
The Twitter data agreed with the public inspection data well enough that about one third of the inspection scores could be reliably predicted from tweets, while the remaining scores showed some divergence.
"This disagreement is interesting as the public inspection data is not perfect either," wrote study co-author Adam Sadilek, a former colleague of Kautz at Rochester who is now at Google.
“The adaptive inspections could reveal the real risk, which is currently hidden for both methods.”
The current work builds on an earlier collaboration by Kautz and Sadilek that used Twitter to estimate how likely a specific user was to have flu-like symptoms, and to measure the influence of different lifestyle factors on health.
At the heart of all this work is a machine-learning algorithm that Sadilek developed to distinguish tweets suggesting the user is unwell from those that do not.
“It’s like teaching a baby a new language,” only in this case it’s a computational algorithm that is being taught, Sadilek said.
For nEmesis, Kautz and Sadilek introduced an extra layer of complexity to improve the algorithm – crowdsourcing.
The researchers turned to Amazon's Mechanical Turk system to reach a crowd of readily available workers, each of whom was paid small amounts of money to categorize tweets for use in training the algorithm.
To ensure the labeled tweets were accurate enough, more than one worker examined each tweet. The researchers rewarded correct answers by paying workers when their answers agreed with the majority and deducting money when they did not.
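The labeling scheme above combines majority voting with a pay-or-deduct incentive. The sketch below shows one way to express it: take the most common label per tweet, then credit workers who matched it and debit those who did not. The payment amounts, the label names, and the choice to skip tied tweets entirely are hypothetical; the article gives no figures or tie rule.

```python
from collections import Counter, defaultdict


def aggregate_labels(votes, reward=0.05, penalty=0.02):
    """Majority-vote labels plus per-worker balances (amounts are assumptions).

    `votes` maps tweet_id -> {worker_id: label}. Tweets with a tied vote
    receive no label and trigger no payments (an assumed policy).
    Returns (labels, balances).
    """
    labels = {}
    balances = defaultdict(float)
    for tweet_id, worker_labels in votes.items():
        counts = Counter(worker_labels.values()).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            continue  # no clear majority
        majority = counts[0][0]
        labels[tweet_id] = majority
        for worker, label in worker_labels.items():
            # Pay agreement with the majority, deduct for disagreement.
            balances[worker] += reward if label == majority else -penalty
    return labels, dict(balances)
```

Requiring several independent judgments per tweet is what makes the majority meaningful; with only two workers, any disagreement is a tie.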
From these labeled training samples, the algorithm learned to identify tweets from people who were likely suffering from foodborne illness.
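The article does not name the learning method. To show the general idea of learning from crowd-labeled examples, the sketch below swaps in a textbook multinomial naive Bayes classifier with add-one smoothing. It is a stand-in chosen for brevity, not the model Sadilek built, and the class name, tokenizer, and labels are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple whitespace tokenizer (assumption; real systems do more)."""
    return text.lower().split()


class NaiveBayesTweetClassifier:
    """Multinomial naive Bayes with add-one smoothing; a stand-in, not nEmesis's model."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best
```

Usage: fit on tweets paired with the majority labels from the crowd step, then call `predict` on new tweets; a production system would need far more data and richer features.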
The researchers acknowledge that the system considers only people who tweet, who may not be representative of everyone visiting a particular restaurant. However, Twitter data can be combined with information from other sources to detect foodborne illness in a timely manner. It can also provide an extra, passive layer of monitoring that is cost-effective and can benefit everyone.