Web Presence and Impact Factors for Middle-Eastern Countries
Posted on: Thursday, 2 March 2006, 21:00 CST
By Noruzi, Alireza
Questions can arise about how global the World Wide Web really is. Particularly for Middle-Eastern countries, it is important to know whether their Web presence and Web Impact Factor (WIF) attract the attention they deserve from the World Wide Web community. The academic community, in particular, is ill served if an important geographical region is underrepresented. This study investigates the Web presence and WIF for country code top-level domains (ccTLDs) of Middle-Eastern countries, and for sub-level domains (SLDs) related to education and academic institutions in these countries. The study counts links to the Web sites of Middle-Eastern countries using Yahoo! searches and computes the WIF at two levels: top-level domains and sub-level domains.
The results show that the Middle-Eastern countries, apart from Turkey, Israel, and Iran, have a low Web presence. On the other hand, these three countries' Web sites have a low inlink WIF. Specific features of sites may affect a country's WIF. For linguistic reasons, Middle-Eastern Web sites (Persian, Kurdish, Turkish, Arabic, and Hebrew languages) may not receive and attract the attention that they deserve from the World Wide Web community.
The World Wide Web is a reflection of human culture, a massive sociocultural network of Web resources authored by millions of people and organizations around the world. As a whole, "the Web displays a striking 'rich get richer' behavior, with a relatively small number of countries having a disproportionately large number of Web sites and pages and share of hyperlink references and traffic" (Pennock et al., 2002). Think of the scientific community on the Web, and especially online journals, as a citation network where traditional information entities and citations from them are replaced by Web pages (e-articles) and hyperlinks, respectively.
WEBOMETRIC RESEARCH
The WIF is an important part of Webometric research, which studies hyperlinks and the impacts and influence of Web sites. Webometric studies display several similarities to informetric and scientometric studies and use bibliometric methods, according to Almind and Ingwersen, who first discussed these concepts in 1997. For example, simple counts and content analysis of Web resources resemble traditional publication analysis. Counts and analyses of outlinks (outgoing links from Web resources) and inlinks (backlinks pointing to Web resources) can be seen as reference and citation analyses, respectively. Outlinks and inlinks are similar to references and citations, respectively, in scientific e-articles. Webometric studies of the structure and content of Web sites in various countries, as well as link structures, are important to understand the international virtual highway and interconnections among countries. The WIF provides quantitative tools for ranking, evaluating, categorizing, and comparing Web sites, top-level domains, and sub-domains.
The primary objective of this study is to formulate a methodology for the calculation of WIF at two hierarchical levels: ccTLD and SLD. This involved three activities.
1. Calculate WIF for all the Middle-Eastern countries and rank them based on their inlink (backlinks coming from other countries) WIFs.
2. Calculate WIF for SLDs related to education and academic institutions and rank them based on their inlink WIFs.
3. Show the number of Web pages from these countries indexed by the Yahoo! search engine and rank them based on their Web page size.
The WIF, a measure of the overall influence of a Web site based on the backlinks or inlinks to that site, was proposed independently by two bibliometric researchers (Rodríguez i Gairín, 1997; Ingwersen, 1998). For a detailed literature review, see Noruzi, 2005a.
WEB ADDRESS STRUCTURE
The Web address is hierarchical in structure. This hierarchy has its origin in the Domain Name System (DNS). The DNS translates the plain-English address (ut.ac.ir) into a corresponding IP address (217.218.33.14). From right to left, the domain name structure has the following hierarchy:
* Top-level domain
* Sub-level domain
* Host-level domain (site/server domain)
In the above example, the hierarchy is as follows: .ir (top-level domain for Iran), .ac (sub-level domain of academic sites under .ir), and .ut (specific domain of the University of Tehran, operating under top-level domain .ir and sub-level domain .ac). The ccTLD is allotted for each country in accordance with two-letter codes based on ISO-3166 (.ir for Iran, .sa for Saudi Arabia, and so on). Each Middle-Eastern country has an SLD for universities and academic institutions as outlined in Table 1 on this page.
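As a rough illustration of this right-to-left hierarchy, the following Python sketch splits a host name into its three levels. The names `DomainLevels` and `split_domain` are hypothetical, and the sketch assumes every name has at least three labels in the form host.sld.cctld, which is not true of all real host names.

```python
from typing import NamedTuple


class DomainLevels(NamedTuple):
    top_level: str   # ccTLD, e.g. "ir"
    sub_level: str   # SLD, e.g. "ac"
    host_level: str  # site/server domain, e.g. "ut"


def split_domain(name: str) -> DomainLevels:
    """Split a name such as 'ut.ac.ir' into levels, reading right to left."""
    labels = name.strip(".").lower().split(".")
    if len(labels) < 3:
        raise ValueError(f"expected at least three labels, got {name!r}")
    # The rightmost label is the ccTLD, the next one is the SLD,
    # and whatever remains on the left is treated as the host level.
    top, sub = labels[-1], labels[-2]
    host = ".".join(labels[:-2])
    return DomainLevels(top, sub, host)


print(split_domain("ut.ac.ir"))
```

For the University of Tehran example, this yields `ir` as the top level, `ac` as the sub level, and `ut` as the host level.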
METHODOLOGY
The most convenient way of measuring links among countries' sites is to use the advanced search facilities of general Web search engines, several of which, notably Yahoo!, include link data in their databases. Several WIF studies have been carried out using the advanced search facilities of AltaVista, both before and after its acquisition by Yahoo! (Ingwersen, 1998; Smith & Thelwall, 2002; Thelwall, 2002; Kousha & Horri, 2004; Smith, 2004; Noruzi, 2005b), and of Yahoo! itself (Noruzi, 2005a). This study uses Yahoo! because it offers special commands that search for matches only in Web elements such as pages, domains, links, and so on.
Table 1. ccTLD of Middle-Eastern countries and their SLDs for universities in these countries
Google's advanced search facility does not support the same level of Boolean querying as Yahoo! or AlltheWeb. Its advanced search can limit the source to a given domain, but it cannot explicitly exclude links from within the site itself (that is, it cannot eliminate self-links), a second critical gap in its functionality.
In this study, Yahoo! is used to collect data for the calculation of WIF at different levels. Yahoo! supports linkdomain: as a command to find pages with a backlink to a site. For example, linkdomain:ut.ac.ir will find all pages with at least one link to the Web site of the University of Tehran. It also supports domain: as a command to retrieve the number of Web pages indexed per site or domain. Using these commands, the study collected the number of Web pages and the number of link-pages, respectively, from the Yahoo! search engine. Yahoo! reports the number of Web pages retrieved against each search. For example, the following queries will retrieve the number of Web pages, the number of inlinks, and the number of self-links for the ccTLD of Iran (.ir):
Linkdomain:ir/
Will report total number of Web pages in Yahoo! database that link to .ir domain (ccTLD of Iran), i.e., total number of link-pages.
Linkdomain:ir/ NOT domain:ir/
Will report number of Web pages not under .ir domain but that link to .ir domain (ccTLD of Iran), i.e., inlink pages.
Linkdomain:ir/ AND domain:ir/
Will report number of Web pages under .ir domain that link to .ir domain (ccTLD of Iran), i.e., self-links.
domain:ir/
Will report number of pages under .ir domain (ccTLD of Iran) indexed by Yahoo! search engine.
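The four query patterns above can be generated for any ccTLD or SLD. The sketch below only assembles the query strings as they are written in this article; it does not contact a search engine, and the function name `yahoo_link_queries` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def yahoo_link_queries(domain: str) -> dict[str, str]:
    """Build the four query strings described in the text for one domain."""
    # Normalize inputs such as ".ir", "ir" or "ir/" to the "ir/" form used above.
    d = domain.strip("./") + "/"
    return {
        "total_link_pages": f"linkdomain:{d}",
        "inlink_pages": f"linkdomain:{d} NOT domain:{d}",
        "self_link_pages": f"linkdomain:{d} AND domain:{d}",
        "indexed_pages": f"domain:{d}",
    }


for label, query in yahoo_link_queries(".ir").items():
    print(f"{label}: {query}")
```

Keeping the normalization in one place avoids stray spaces or missing slashes when the same queries are repeated for many countries.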
DATA COLLECTION
Data collection took place on Sept. 29, 2005. All the domain names were searched to check whether Yahoo!'s databases include these ccTLDs. For each of the countries, a search was carried out to determine the total number of links, the number of inlinks, the number of self-links, and the total number of Web pages at the domain. Searches were carried out to determine the following:
* the total number of pages linking to the ccTLD, A, for example, Linkdomain:ir/
* the number of pages at the ccTLD, D, determined by the command domain:ir/
* the number of inlinks (links from pages outside the ccTLD), B: Linkdomain:ir/ NOT domain:ir/
* the number of self-links (links from pages in the same ccTLD), C, measured in the following way: Linkdomain:ir/ AND domain:ir/
For universities and academic institutions in these countries, the following searches were carried out to determine:
* the total number of pages linking to the SLD, A, for example, linkdomain:ac.ir/
* the number of pages at the SLD, D, determined by the command domain:ac.ir/
* the number of inlinks (links from pages outside the SLD), B: linkdomain:ac.ir/ NOT domain:ir/
* the number of self-links (links from pages in the same SLD), C, measured in the following way: linkdomain:ac.ir/ AND domain:ir/
Figure 1. Middle-Eastern countries with the highest Web presence
Table 2. Number of pages indexed by Yahoo! from each Middle-Eastern country
Table 3. WIF for ccTLDs of Middle-Eastern countries
RESULTS
The data obtained from the search statements described above are in Tables 2 to 4. The WIF for each ccTLD and SLD has been calculated at two levels: overall WIF, which considers all the link-pages, and inlink WIF, which considers only inlink pages without self-links. The ranking is based on the revised WIF (inlink WIF), as this is the true reflection of the degree of impact of the domain spaces on the Web.
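The article does not spell out the formula, so the following sketch assumes the usual ratio definition from the WIF literature (Ingwersen, 1998): link-pages divided by the number of pages indexed for the domain. In the notation of the data collection section, that gives overall WIF = A / D and inlink WIF = B / D. The class name `LinkCounts` and the counts below are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class LinkCounts:
    name: str
    total_link_pages: int  # A: all pages linking to the domain
    inlink_pages: int      # B: linking pages outside the domain
    indexed_pages: int     # D: pages indexed under the domain (must be > 0)

    @property
    def overall_wif(self) -> float:
        # Assumed definition: A / D
        return self.total_link_pages / self.indexed_pages

    @property
    def inlink_wif(self) -> float:
        # Assumed definition: B / D (self-links excluded)
        return self.inlink_pages / self.indexed_pages


# Hypothetical counts for illustration only, NOT taken from Tables 2 to 4.
domains = [
    LinkCounts("large-domain", 50_000, 20_000, 1_000_000),
    LinkCounts("small-domain", 4_000, 3_000, 10_000),
]

# Rank by inlink WIF, highest first, as the study does.
ranked = sorted(domains, key=lambda d: d.inlink_wif, reverse=True)
for d in ranked:
    print(f"{d.name}: overall WIF {d.overall_wif:.3f}, inlink WIF {d.inlink_wif:.3f}")
```

With these invented numbers, the domain with far more indexed pages ranks lower even though it attracts more inlinks in absolute terms, because the denominator grows faster than the numerator.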
Table 2 and Figure 1, both on page 26, show that Turkey, Israel, and Iran, in that order, have the highest Web page size among Middle-Eastern countries, while Table 3 below shows that they have a low WIF. This unexpected result occurred because the higher number of Web pages generated a comparatively lower number of link-pages. The higher the Web page size, the lower the WIF for the country. Therefore, countries that publish many Web resources may not have as high an impact as countries with few Web pages, because the high Web resources rate counteracts the high inlinks rate. Additionally, results overall suggest that Web sites from these countries are somewhat insular: nationally well interconnected, but less well known internationally.
Table 4. WIF for academic SLDs
LIMITATIONS OF USING THE WIF
Webometrics is the extension of the theory and practice of bibliometric techniques to the Web. Bibliometric research has been criticized for certain inherent limitations of ISI (now part of Thomson Scientific) products (Moed, 2002; Seglen, 1997). Similarly, search engines as primary data-gathering instruments may create problems in drawing conclusions for WIF studies. The tool being used for WIF analysis is not specifically meant for the task: search engines are designed for content retrieval, not link analysis. These problems are technical and could be resolved if the search engine programmers had incentives to work on them. Because data gathering with commercial search engines is easy to carry out, Webometrics has the potential to evolve into a tool for instant performance evaluation of any Web site (Mukhopadhyay, 2004).
A primary limitation of the current study is that, although several thousand Middle-Eastern Web sites have generic top-level domains such as dot-org, dot-com, and dot-net, the current features of the search engines that serve as the basic data mechanism make it impossible to determine how many Web sites have generic TLDs. Thus, the current research has considered only country code top-level domains from these countries. For example, the number of pages of Persianblog, one of the well-known Iranian blog services, is about twice the number of Iranian university Web pages:
* domain:persianblog.com/
955,000
* domain:ac.ir/
468,000
In the case of Iraqi-Kurdistan Universities, which have not used SLDs such as edu.iq or ac.iq, their WIFs had to be calculated separately (see Table 5 on page 28).
Figure 2. Middle-Eastern countries with the highest academic Web presence
Table 5. WIF for Iraqi-Kurdistan Universities
Therefore, the WIF is not a perfect tool to measure the quality, or even the quantity, of Web sites from a country. Although 10 years of criticism have established that the WIF is an imperfect measure, there is no obvious alternative. Those forced to use this tool for direct Web site comparison should be encouraged to remain open-minded and cautious, with an awareness of the inherent limitations of its use. Although the WIF is arguably useful for quantitative intracountry comparison, application beyond this (to intercountry assessment) has little value.
A comparison of Middle-Eastern countries' sites raises interesting questions about the place of different countries, cultures, and languages on the Web. These countries are outside the main Web area, which is dominated by the U.S., Canada, Europe, Australia, India, Japan, and China. It appears that Middle-Eastern Web sites may achieve a lower visibility on the Web because their language and culture are different from the current mainstream of the Web, dominated by English-speaking countries. This should be a warning to cybercitizens.
Further research is needed to gain a better understanding of the nature of Web links and to find reasons for the limited number of Middle-Eastern Web pages. It also could be interesting to investigate the Web presence of African countries to see if their situation is similar.
ACKNOWLEDGEMENTS
The author wishes to thank Mrs. Marjorie Sweetko for her helpful comments.
REFERENCES
Almind, T. C., & Ingwersen, P. "Informetric analyses on the World Wide Web: Methodological approaches to Webometrics," Journal of Documentation 53, No. 4 (1997): pp. 404-426.
Ingwersen, P. "The calculation of Web Impact Factors," Journal of Documentation 54, No. 2 (1998): pp. 236-243.
Kousha, K., & Horri, A. "The relationship between scholarly publishing and the counts of academic inlinks to Iranian university Web sites: Exploring academic link creation motivations," Journal of Information Management and Scientometrics, Vol. 1, No. 2 (2004): pp. 13-22.
Moed, H. F. "The impact factors debate: The ISI's uses and limits," Nature 415 (2002): pp. 731-732.
Mukhopadhyay, P. "Measuring Web Impact Factors: A Webometric study based on the analysis of hyperlinks," Proceedings of National Seminar on Information Support for Rural Development, India, IASLIC, December (2004).
Noruzi, A. "The Web Impact Factor: A critical review," The Electronic Library, Vol. 23 (2005a), forthcoming.
Noruzi, A. "Web Impact Factor for Iranian Universities," Webology 2, No. 1 (April 2005b), Article 11 [www.Webology.ir/2005/v2n1/a11.html].
Pennock, D. M., Flake, G. W., Lawrence, S., Glover, E. J., & Giles, C. L. "Winners don't take all: Characterizing the competition for links on the Web," PNAS 99, No. 8 (April 16, 2002): pp. 5207-5211.
Rodríguez i Gairín, J. M. "Valorando el impacto de la información en Internet: AltaVista, el 'Citation Index' de la Red" [Impact assessment of information on the Internet: AltaVista, the Citation Index of the Web], Revista Española de Documentación Científica 20, No. 2 (1997): pp. 175-181.
Seglen, P. O. "Why the impact factor of journals should not be used for evaluating research," British Medical Journal 314 (1997): pp. 498-502.
Smith, A. G. "Citations and links as a measure of effectiveness of online LIS journals," World Library and Information Congress: 70th IFLA General Conference and Council, 22-27 August 2004, Buenos Aires, Argentina. Retrieved Sept. 29, 2005 [www.ifla.org/IV/ifla70/papers/049e-Smith.pdf].
Smith, A. G., & Thelwall, M. "Web Impact Factors for Australasian Universities," Scientometrics 54, No. 3 (2002): pp. 363-380.
Thelwall, M. "A comparison of sources of links for academic Web Impact Factor calculations," Journal of Documentation 58, No. 1 (2002): pp. 66-78.
Alireza Noruzi [anouruzi@yahoo.com] is a scholar in the Department of Library and Information Science, University of Tehran, Tehran, Iran.
Comments? E-mail letters to the editor to marydee@xmission.com
Copyright Information Today, Inc. Mar/Apr 2006
Source: Online