New Classification Engines Provide Alternative Web Search
By O’Leary, Mick
Database Review
Google reigns supreme among search engines for its general Web search and its many specialty search products (Google Scholar and Google Book Search, among others). Google’s dominance is well-deserved and seemingly unshakeable, but longtime rivals such as Yahoo! and Microsoft have been joined on the list of contenders by a host of newer search engines. The most high-profile newcomer is Ask.com (see the May 2006 issue of Information Today), which has run ads in national media, thanks to IAC/InterActiveCorp, its deep-pocketed owner.
Classification Engines
As a Web search engine, Ask.com is nothing special. It works much like Google, and the same search on both will usually produce similar results. Ask.com, however, adds something new: In addition to retrieving a single, relevance-ranked list of Web pages, Ask.com also classifies the results by subject, automatically grouping pages that have words or phrases in common. This is a highly useful complement to the single-list search, which is valuable for highlighting the presumptively best pages, but is of little use for exploring the varied aspects of a complex subject. Ask.com’s “classification engine” not only saves the laborious work of doing this exploration by hand, it also suggests not-so-obvious themes and connections. This subject analysis is the principle behind other useful classification engines, including Clusty (www.clusty.com) and Grokker (www.grokker.com; see the October 2005 issue of Information Today).
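To make the idea of grouping pages that share words concrete, here is a minimal Python sketch. It is not Ask.com's, Clusty's, or Grokker's actual method, which the vendors do not describe here. The function name, the stopword list, and the sample titles are all hypothetical, and a real engine would work on full page text and phrases rather than titles alone.

```python
from collections import defaultdict

# Hypothetical stopword list; a real engine would use a far larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "on", "is"}

def group_by_shared_terms(pages, min_pages=2):
    """Map each term to the page titles containing it.

    Only terms that appear in at least `min_pages` titles are kept,
    so every surviving term names a group of related pages.
    """
    groups = defaultdict(list)
    for title in pages:
        # Normalize: strip punctuation, lowercase, and de-duplicate words.
        terms = {word.strip(".,:;!?").lower() for word in title.split()}
        for term in sorted(terms - STOPWORDS):
            if term:
                groups[term].append(title)
    return {t: titles for t, titles in groups.items() if len(titles) >= min_pages}

# Made-up result titles for an ambiguous query.
results = [
    "Jaguar dealers and car reviews",
    "Jaguar habitat in the Amazon",
    "Used car reviews and prices",
    "Amazon rainforest wildlife",
]
clusters = group_by_shared_terms(results)
for term, titles in sorted(clusters.items()):
    print(term, "->", titles)
```

Even this crude grouping separates the automotive pages from the wildlife pages, which is the kind of theme a single ranked list leaves the searcher to discover by hand.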
exalead and Kosmix are two other search/classification engines that take the classification function in new and intriguing directions. exalead (www.exalead.com) classifies not only by subject, but also by several other important Web page characteristics. Kosmix (www.kosmix.com) produces topic-related classifications for several high-demand subjects, including health, travel, and autos. Neither of these search engines is expected to overtake Google in this extremely competitive market, but each has noteworthy distinctions in the field of classification engines.
exalead
exalead was developed by a team of French technologists and released in 2000. The search engine, which is actually a cross-platform product that can be used for desktop, intranet, and Web searching, is prominent in enterprise search. It uses a different search technology than Google and other leading Web search engines, which rank primarily on page popularity measured by the number of incoming links. Instead, exalead uses linguistic analyses of page content to retrieve and sort results. The company states that it indexes more than 8 billion Web pages, and it offers an exceptional set of advanced search features.
An exalead search produces two separate results sets. The first is a ranked list, based on relevance as measured by exalead’s search query analysis. The second contains results sets classified by several criteria: Related terms, Multimedia (audio, video, and RSS), Language (eight major Web languages), Directory (a topical classification), File type (Acrobat, Word, etc.), and Geographic location (major regions and countries). When you select one of the classified searches, it displays its results and generates a new set of classified searches. This automatic search “refreshing” lets you explore connections among related subjects as well as limit results to a specific file type, language, etc. exalead’s advanced search offers several other options, including truncation, a proximity operator, and limiting by domain or date.
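The select-and-refresh cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not exalead's implementation: the record fields (`language`, `file_type`), the function names, and the sample hits are hypothetical.

```python
from collections import Counter

def facet_counts(results, facets):
    """Count how many results fall under each value of each facet."""
    return {f: Counter(r[f] for r in results if f in r) for f in facets}

def refine(results, facet, value):
    """Keep only the results whose facet has the selected value."""
    return [r for r in results if r.get(facet) == value]

# Made-up search hits with two of the criteria the article lists.
hits = [
    {"url": "http://example.com/a", "language": "English", "file_type": "HTML"},
    {"url": "http://example.com/b", "language": "French", "file_type": "PDF"},
    {"url": "http://example.com/c", "language": "English", "file_type": "PDF"},
]

# Initial classified view of the full result set.
counts = facet_counts(hits, ["language", "file_type"])

# Selecting "PDF" narrows the results and regenerates the classifications.
pdf_hits = refine(hits, "file_type", "PDF")
pdf_counts = facet_counts(pdf_hits, ["language", "file_type"])
print(pdf_counts["language"])
```

Recomputing the counts after every selection is what makes each click produce a fresh set of classified searches rather than a dead end.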
Kosmix
Kosmix carries the classification engine concept a step further by concentrating on a few high-demand Web search topics. Released in 2006, Kosmix was developed by a group that includes several Amazon.com veterans. Kosmix searches six extremely popular Web subjects: health, video games, finance, travel, U.S. politics, and autos. The search results in each of these topics are further sorted according to a classification related to the topic itself. For example, search results in the health category are classified by causes, treatments, support groups, etc. Results in the autos category are classified by reviews, recalls, blogs, etc. (Kosmix’s formal classification system is different from those in exalead and Ask.com, which are automatically generated based on recurring terms and phrases.) Within each category and subcategory, Kosmix displays results in relevance order.
A Reviewer’s Nightmare
Your first thought would probably be to compare exalead’s and Kosmix’s classified search results with Google’s single ranked list. It’s not that simple, however, because now Google also does classified searches. Since spring 2006, Google searches in certain high-demand topics (including health and autos) produce both the familiar ranked list and results classified by topic-related subcategories. These topics are displayed at the top of the search results page under the heading Refine Results. The topics are developed by third-party organizations through the Google Co-op program.
All in all, this has become a reviewer’s nightmare, with several search engines offering different combinations of single ranked searches, formal classifications, and on-the-fly classifications, and each working with different search algorithms, advanced search options, and content sets. The best you can do is run hundreds of side-by-side comparison searches and then announce informed, but still arbitrary, conclusions. If we can agree on this, my conclusion is that Google is generally more useful than exalead or Kosmix, especially in searches where it produces classified results as well as a single list.
exalead Versus Google
Google’s biggest advantage over exalead is not its classification system; instead, it’s the basic search method. exalead uses a content-analysis program that can work on multiple platforms. This method may work well in the controlled environments of the desktop or the intranet, but on the chaotic Web it is less effective than Google’s link popularity search method. So I find that a Google Web search generally produces more useful results than an exalead Web search. In those subject areas where Google refines results, its categories are also usually more productive than exalead’s subcategories. Furthermore, exalead’s other search classifications (language, file type, etc.) can also be performed in Google’s own Advanced Search. Because of its many advanced search options, however, exalead may be more productive than Google for certain kinds of specialized searches.
Kosmix Versus Google
In my book, Google has the edge over Kosmix. Google has good classification searching in three of the subjects that Kosmix uses (health, video games, and autos). The classifications are necessarily similar (symptoms, reviews, etc.), and the two search results often overlap. Google, however, is far larger and pulls from a much larger set of Web content. It’s also generally more up-to-date than Kosmix.
Still More Classification Searching
Classification searching is the trend. Google’s two closest competitors, Yahoo! and Windows Live Search (the old Microsoft Search), now have some classification searching built into their search products. This is a welcome trend for everyone, especially for the people who rely solely on Web search engines for their information.
Though it may be new to the big Web search engines, classifying information by subject is not new. It’s been around for several thousand years or so in library catalogs and the like. Still, search engines are finally getting with the program, even if they are a bit late.
exalead and Kosmix
SYNOPSIS
exalead and Kosmix are Web search engines that concentrate on classifying results by subject, a valuable complement to the conventional single ranked list provided by the leading search engines.
PRODUCERS
Exalead, Inc., 15 Mercer St., New York, NY 10013, (646) 290-5860 (U.S. office); www.exalead.com.
Kosmix Corp., 444 Castro St., Suite 109, Mountain View, CA 94041, (650) 938-2300; www.kosmix.com.
Mick O’Leary is the director of the library at Frederick, Md., and a principal in The Data Brokers. His email address is harmonyrd@yahoo.com. Send your comments about this column to itletters@infotoday.com.
Copyright Information Today, Inc. Jan 2007
