Automated Tagging Carries Risk Of Unwanted Results

Posted on: Monday, 4 August 2008, 16:03 CDT

Many readers of an Associated Press story related to the Eliot Spitzer scandal were likely surprised when Yahoo’s automated “Shortcuts” system invited them, in the article’s final paragraph, to peruse photos of underage girls.

Typically used to help readers search for Web sites and news on topics such as “President Bush” or “Washington”, Yahoo Shortcuts had in this case highlighted the phrase “underage girls”. Readers who passed their mouse over these words were then shown a pop-up window that included images from Yahoo’s photo-sharing Web site, Flickr.

And while some of the images were not untoward, several captions said the photos were of underage drinkers. A click on the pop-up window produced more troubling results, such as girls in pigtails, knee socks and lingerie, and a photo of a naked, faceless female.

Associated Press editors contacted Yahoo Inc. about the incident, which occurred in early July, and were told by a company spokeswoman that the link had been swiftly removed.  Many of Flickr’s more suggestive photos were apparently also removed.

Yahoo said the phrase "underage girls" had been added to a list of thousands of blocked terms that will never again generate a Yahoo Shortcut.   However, the gaffe underscores the difficulty publishers face in managing their sites in the age of user-generated content.
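
The blocked-terms approach Yahoo describes can be illustrated with a minimal sketch. This is not Yahoo's code: the names `BLOCKED_TERMS`, `normalize` and `linkable_phrases` are hypothetical, and the single-entry list stands in for the thousands of terms the company says it maintains. The sketch assumes candidate phrases have already been picked out of a story and simply drops any that match the list, ignoring case and extra spacing.

```python
import re

# Hypothetical blocklist; the real list reportedly holds thousands of terms.
BLOCKED_TERMS = {"underage girls"}


def normalize(phrase: str) -> str:
    """Collapse runs of whitespace and lowercase so matching ignores case and spacing."""
    return re.sub(r"\s+", " ", phrase).strip().lower()


def linkable_phrases(candidates, blocked=BLOCKED_TERMS):
    """Return the candidate phrases that are not on the blocklist, in original order."""
    blocked_norm = {normalize(term) for term in blocked}
    return [p for p in candidates if normalize(p) not in blocked_norm]


# Example: only the phrases that survive the filter would be turned into links.
print(linkable_phrases(["President Bush", "Underage  Girls", "Washington"]))
```

A filter of this kind only catches phrases someone has already thought to list, which is one reason a lapse such as the one described here can slip through.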

Internet publishers increasingly rely on automated systems such as Shortcuts to tag phrases and provide links to other Web sites. With sites such as YouTube and Flickr becoming ever more popular, a substantial number of photographers and bloggers now post their own content, and it's nearly impossible for publishers using automation to exercise complete control.

"No matter how sophisticated you make these automated systems, you're not going to make them perfect, and all you can really strive for is to tune them as you go along," Lauren Weinstein, co-founder of People for Internet Responsibility, told the Associated Press.

Nevertheless, with respect to this case, "it's pretty clear there was a lapse in terms of the quality control of Yahoo's keyword list," he said.

It is not yet clear how the phrase "underage girls" was selected for a link.  Meagan Busath, a Yahoo spokeswoman, said Shortcuts "leverages a combination of algorithmic and editorial processes to identify current, relevant and popular terms."

Words entered into Yahoo’s search engine are among the factors the system takes into consideration when selecting the phrases, raising concerns that the term "underage girls" might be among the most popular searches, according to Chris Sherman, executive editor of industry Web site Search Engine Land.

However, Sherman said a combination of factors was likely at play, such as the use of similar phrases in a popular search term, or the exploitation of girls being a hot news topic at the time. The selection of the words could also have been a result of their relevance to the story at hand, which, after all, was about how Spitzer’s call girl had dropped a lawsuit claiming she was underage during her appearance in a "Girls Gone Wild" video.

If the system was merely determining whether Flickr had enough pertinent results, the answer apparently was a resounding “yes”. A Flickr search for the words "underage girls" produced 428 photos. But Busath notes that users and employees of Flickr routinely monitor the site's content and report questionable images.

Any technology experiences blunders as engineers work to perfect it, and automated content is no exception. In one recent gaffe, a portfolio of Yahoo photos about Osama bin Laden began with a picture of Sen. Barack Obama. There was nothing wrong with the programming; indeed, the senator had just attended a hearing about the al-Qaeda leader. A Yahoo spokesman said the company had rewritten its code to prevent further occurrences.

In a broadly publicized event in the early days of Google Inc.’s AdSense system, the service displayed an ad for luggage adjacent to a news story about a murder victim whose body was stuffed in a suitcase.

Google spokesman Daniel Rubin told the AP the company has since improved its technology, and it now detects sites containing "sensitive content". Such sites often receive public service announcements, rather than advertisements, Rubin said.

"We are really only in the infancy of this kind of automated analysis," Weinstein told the AP.

"I'm sure it's going to be expanding greatly, not just in volume but in sophistication."

Yahoo already displays its Shortcuts on articles hosted by Yahoo News from sources such as E! Online and Time. And since 2006, The New York Times' Web site has used an automated system that tags key words within its stories and directs readers to archived stories about the related topics. Chief Technology Officer Marc Frons said the Times also uses automated technology to methodically vet blogs from within its site.

Frons said that further expansion of current practices that automatically link to additional sites might be considered in the future. He noted, however, that the links would have to be carefully selected.

"The quality of the content and the information is paramount," he told the AP.

"You want to make sure you're striking the right balance between giving your readers everything the Web has to offer with making sure they're getting the right information and the relevant information."

In the meantime, the most pressing short-term objective for automated tagging is creating a richer browsing experience for users, while offering publishers a profit opportunity as the technology is implemented to link to commercial sites. For instance, the word "television" in a story could be linked to Best Buy's Web page.

For sites considering expanding their use of the technology, the main question is likely whether users will embrace the presence of additional links or whether the experience would be viewed as annoying and irrelevant.

"If it does succeed, it's going to be done in a way that's subtle enough that it's there for people who want it," said Sherman.

"But it's not going to be intrusive for people who don't."


Source: redOrbit Staff & Wire Reports
