A web search engine stores information about web pages, which are written in Hypertext Markup Language (HTML). The pages are retrieved by a web crawler, an automated program that follows every link on a website. Words are extracted from titles, headings, page content, or special fields called meta tags. Data about each page is stored in an index database for later queries. This data can be a single word or a phrase, which helps in quickly finding information about the page being searched for.
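The crawling step can be sketched as a breadth-first traversal. This is a minimal illustration, not a real crawler: it walks an invented in-memory "site" (the page names and contents below are made up) instead of fetching pages over HTTP.

```python
from collections import deque

# A toy "web" (invented data): each page maps to its text and outgoing links.
SITE = {
    "/index":   {"text": "welcome page",   "links": ["/about", "/news"]},
    "/about":   {"text": "about the site", "links": ["/index"]},
    "/news":    {"text": "latest news",    "links": ["/about", "/archive"]},
    "/archive": {"text": "old news",       "links": []},
}

def crawl(start):
    """Breadth-first crawl: visit a page, store its text, then follow every link."""
    seen, queue, pages = set(), deque([start]), {}
    while queue:
        url = queue.popleft()
        if url in seen:
            continue                        # skip pages already visited
        seen.add(url)
        pages[url] = SITE[url]["text"]      # store the page text for indexing
        queue.extend(SITE[url]["links"])    # follow every link on the page
    return pages

crawled = crawl("/index")
# crawled now holds the text of every page reachable from /index
```

A production crawler adds HTTP fetching, HTML parsing, politeness delays, and robots.txt handling; the traversal logic, however, stays essentially this simple.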
Google, for example, stores all or part of the source page in a cache, since the user expects the returned page to contain the information searched for; cached pages also help ensure that the proper page can still be found. Google additionally allows searching by date, via the search tools on its results page.
When a user enters a word or phrase, called a keyword, in the search box, the search engine examines its index and returns a list of the best-matching web pages containing the keyword(s). Search engines typically support the Boolean operators AND, OR, and NOT to refine a search; these let the user require, combine, or exclude particular keywords. Some search engines also offer proximity search, which lets the user specify the maximum distance between the keywords.
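Boolean operators map directly onto set operations over the index's postings lists. A small sketch, using invented page IDs and keywords:

```python
# Toy postings lists (invented data): each keyword maps to the set of
# page IDs whose text contains it.
index = {
    "python": {1, 2, 4},
    "search": {2, 3, 4},
    "engine": {2, 4, 5},
}

# AND — pages containing every keyword: intersect the sets.
both = index["python"] & index["search"]     # {2, 4}

# OR — pages containing any of the keywords: union the sets.
either = index["python"] | index["engine"]   # {1, 2, 4, 5}

# NOT — exclude pages containing a keyword: subtract its set.
without = index["search"] - index["engine"]  # {3}
```

Proximity search needs more than these sets: the index must also record word positions within each page, so the engine can check how far apart the matched keywords are.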
There are also concept-based search engines, such as Ask.com, where the user can ask a question as if talking with another human.
Millions of web pages may contain the keywords, so most search engines rank the results for the keywords entered, usually showing the best match first.
There are two main types of search engines: one relies on keywords predefined by human editors, while the other, the most popular type, analyzes the text of pages and generates an inverted index.
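An inverted index simply maps each word to the set of documents containing it, inverting the natural document-to-words direction. A minimal sketch over invented documents:

```python
from collections import defaultdict

# Invented example documents: ID -> text.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog tricks",
}

def build_inverted_index(docs):
    """Map each word to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():   # naive tokenization: lowercase + split
            index[word].add(doc_id)
    return index

index = build_inverted_index(docs)
# index["quick"] == {1, 3}; index["dog"] == {2, 3}
```

Looking up a keyword is then a single dictionary access rather than a scan of every page, which is what makes searching billions of documents feasible.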
Most search engines collect revenue from advertising by running ads alongside the search results, and some allow advertisers to pay for higher placement in the listings.
The first tool for searching the Internet, created in 1990 by Alan Emtage, Bill Heelan, and J. Peter Deutsch, was named Archie, derived from the word "archive". The program downloaded the directory listings of files on public FTP sites, but did not index their contents, since the data was limited enough to be searched manually.
In 1991, Mark McCahill at the University of Minnesota created Gopher, a program designed to distribute, search for, and retrieve documents over the Internet. Two new search programs followed: Veronica was a keyword search program for Gopher menus, while Jughead was a tool for retrieving menu information from specific Gopher servers.
The first actual web search engine, W3Catalog, was released on September 2, 1993. It built on earlier work such as that of Matthew Gray, who, while at MIT, had produced the first web robot in June 1993; its purpose was to measure the size of the web, and it generated an index called Wandex. In November 1993 a second search engine, Aliweb, was released; it relied on website administrators notifying it of an index file on each site.
In December 1993, JumpStation was released; it used a web robot to find web pages and build its index. It was the first WWW resource-discovery tool to combine the three essential features of a search engine: crawling, indexing, and searching. However, its index was limited to the titles and headings of the pages it crawled.
In 1994, WebCrawler was released as a full-text search engine: it let the user search for any word on any web page, which has been the standard for search engines ever since. After its release many more search engines appeared, including Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista; but the most popular was Yahoo!. Its search operated on its web directory rather than full-text copies of pages, so users browsed the directory instead of searching by keyword.
In 1996, Netscape struck a deal with five major search engines, Yahoo!, Magellan, Lycos, Infoseek, and Excite, each paying $5 million a year to rotate as the featured engine on Netscape's search page. In 1998, Goto.com began selling search listings to advertisers, a model Google later adopted, which helped make search one of the most profitable businesses on the Internet.
Google built its rankings on an algorithm called PageRank, which rates a web page by the number and importance of the links to it from other web sites. Yahoo! acquired Inktomi in 2002 and Overture in 2003, and used Google's search engine until 2004, when it launched its own search engine using the technology gained from its acquisitions.
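The core idea of PageRank can be sketched as a power iteration: each page repeatedly shares its score among the pages it links to, so pages with many incoming links from well-ranked pages end up ranked highest. This is a simplified sketch over an invented three-page link graph, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank scores for a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform score
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outgoing in links.items():
            if outgoing:
                share = damping * rank[p] / len(outgoing)
                for q in outgoing:              # split p's score among its links
                    new[q] += share
            else:                               # dangling page: spread score evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Invented link graph: A and C both link to B, so B ends up ranked highest.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
ranks = pagerank(graph)
```

The damping factor (conventionally 0.85) models a surfer who occasionally jumps to a random page instead of following links, which keeps the scores from draining into dead ends.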
In 2011, 82.80 percent of search engine users worldwide used Google. Within the US that year, Google held 65.38 percent of the market and Yahoo! 28.62 percent, with the remaining 66 search engines sharing only about six percent.
Google, Bing, and other search engines provide customized search results based on the user's activity on the Internet, producing an effect called a filter bubble. Algorithms selectively guess which pages the user would like to see, based on information about that user. This isolates the user, restricting the pages returned, and ultimately places the user in an informational bubble with less exposure to alternate viewpoints. Since this problem was recognized, some search engines have chosen not to track users at all, which avoids it.