September 16, 2008
Questions After News Error Tore Into United
By Miguel Helft
The swift, sharp and short-lived collapse of United Airlines shares on Sept. 8 has been followed by a week of finger-pointing.
The stock plunge was instigated by a series of cascading human and machine errors, and it raised new questions about the reliability of automated news services like Google News and the struggles of some traditional media companies to adapt to the Internet age.
Analysts of new media say there is plenty of blame to go around. They say the problem had its roots in the still-clumsy dance between the Web sites of traditional news outlets and the search engines whose attention they covet. It was then amplified by a researcher at a financial information service who had failed to verify information retrieved from the buyer-beware world of the Web.
"This is what happens when everything goes on autopilot and there are no human controls in place or those controls fail," said Scott Moore, who as head of Yahoo media oversees Yahoo News, the most popular news site on the Web.
The problem began on Sept. 7, shortly after midnight, when a link to an article headlined "United Airlines Files for Bankruptcy," which was originally published in The Chicago Tribune in 2002, appeared in the "Most Viewed" box on the main business page of The Sun-Sentinel's site.
Within a minute, the automated scanning system of Google News, which visits more than 7,500 news sites every 15 minutes or so in search of new material, found the link and followed it to the article page in The Sun-Sentinel's archives.
The article, which was about United's bankruptcy in 2002, did not have a date on it, but the page carried the current date at the top.
Google's system, which the company said had never come across the item before, interpreted it as being new and added it to its news index along with the date it was found, Sept. 6 on the West Coast of the United States.
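The dating failure described above can be illustrated with a minimal sketch. This is not Google's code. The function name assign_index_date and the rule of falling back to the crawl date are assumptions made for illustration, based only on the behavior the company described.

```python
from datetime import date
from typing import Optional


def assign_index_date(article_date: Optional[date], crawl_date: date) -> date:
    """Return the date an indexer would attach to a newly seen page.

    Hypothetical rule: trust the article's own date when it has one,
    otherwise fall back to the day the crawler first found the page.
    """
    if article_date is not None:
        return article_date
    return crawl_date


# Crawl date as reported: Sept. 6, 2008, on the West Coast.
crawl_day = date(2008, 9, 6)

# Illustrative 2002 date; the article gives only the year of the original story.
dated = assign_index_date(date(2002, 1, 1), crawl_day)

# The same story with no date on the page is stamped with the crawl date,
# so it looks like fresh news.
undated = assign_index_date(None, crawl_day)

print(dated.isoformat())
print(undated.isoformat())
```

Under this assumed rule, an explicit date on the archived page would have been enough to keep the 2002 story from being indexed as new.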
Google News did not put a link to the article on its main news pages, but it could be found through a search, and it could have been received via e-mail by users who had set up an automated alert for news about United.
The next morning, Monday, an employee of Income Securities Advisors, a financial information company, came across the old article after searching Google News for recent bankruptcies. The employee mistakenly sent out a summary of the article via the Bloomberg service, which is used by finance professionals.
Bloomberg's own news service subsequently flashed a headline, citing the Sun-Sentinel report. Within minutes, United shares crashed.
Trading in the shares was halted, and they later recovered most of their value.
On Thursday, the Securities and Exchange Commission said it had begun a preliminary inquiry into the incident. Income Securities Advisors did not return a call seeking comment for this article.
Tribune said in a statement that its archived bankruptcy article had simply been there online all along. The statement blamed "the inability of Google's automated search agent 'Googlebot' to differentiate between breaking news and frequently viewed stories on the Web sites of its newspapers" for the problem.
The publisher said it had found problems with the Googlebot months ago and had asked Google to stop using it to scan the sites of The Sun-Sentinel and other papers. Tribune said it had asked Google to use a different approach called site maps, which tell a search engine which pages to index.
Tribune also said that a single click on the archived article would have been sufficient to place it on the "most viewed" section because the click came in the middle of the night on a weekend.
For its part, Google said it was unfair to blame it for Tribune's mistakes, including the failure to date the article properly and the failure to use one of many simple methods to prevent links to old articles from appearing on a news page or being seen by a search engine.
Google defended the track record of its news search service, which has to deal with the peculiarities of thousands of news sites. It said it had been talking with Tribune about potential changes in how it found articles on Tribune's sites, but that it had not yet made any changes.
"It is impossible to write a perfect algorithm," said Matt Cutts, a top Google engineer who is best known for helping site owners understand the mysteries of Google's search systems. "People trust a news source in relation to how well it has served them in the past. Google News is very accurate and very reliable and has worked well in lots of situations."
At the root of the problem is the tenuous relationship between newspapers and news aggregators like Google News. By and large, newspapers crave the attention of search engines like Google and Yahoo, which account for about a fifth of the visitors to newspaper Web sites. A prominent link from a search service to a news article can deliver a torrent of Web traffic, along with an advertising windfall. Newspapers often complain to search engines when they think their articles do not appear prominently enough.
At the same time, many newspaper sites now include thousands of older articles that would once have been safely hidden away on microfilm, creating the potential for confusion when they pop up in searches. And many newspaper companies have not fully learned to work in a world where many people are coming to their sites through a search instead of through their own front doors.
"If you are going to make a story available to the public, it is going to be public to Google, and you have to understand how search technology works and how Google News works," said Marshall Simmonds, chief search strategist for The New York Times.
Others defended Google by saying that no article should appear on a news site's "most viewed" list after a single click, and that Google had merely amplified a problem that began on the Sun-Sentinel Web site.
"Leaving out a date in an old article is a bug, the kind of bug that confuses both robots and humans," said Gabriel Rivera, founder and chief executive of TechMeme, a site that automatically compiles the most popular news articles and blog posts on technology topics. "They really do need to fix it."
Rivera said old, undated articles appeared from time to time on TechMeme, but they did not typically have dire consequences for financial markets.
Legal experts, meanwhile, said that while UAL investors might want to recover their losses in court, Tribune is unlikely to be vulnerable to libel charges, and the Communications Decency Act of 1996 generally protects companies like Google that simply transmit electronic information first published by others.
"I would be concerned about the credibility I lost with my customers more than I would about a big fat lawsuit that is going to hit me," said Eric Goldman, an associate professor at the law school of Santa Clara University and the director of its High Tech Law Institute.
"Everyone here lost some credibility with its readers," Goldman said. "Most obviously Google News, because they were the person in the chain who presented the information to the reader who made the bad judgment."
Originally published by The New York Times Media Group.