Many researchers are interested in finding citation information about a given article: both how many times the article is cited and who is citing it. This may be for the completeness of a literature search, or to find how often their own publications are cited. Eugene Garfield made the widespread use of citation analysis in academe possible through his creation of three citation indices, the Science Citation Index, the Social Sciences Citation Index and the Arts & Humanities Citation Index, which were combined and transformed into an electronic version called the Web of Science. These indices were based on the concept that a carefully selected subset of journals would produce the majority of important citing literature for any given article. Citation analysis has real-world implications: for good or bad, citedness is considered in grant, hiring and tenure decisions. Professors and researchers may want to demonstrate the impact of their work for many reasons, and citation analysis is one way (albeit a controversial one [1-3]) to accomplish this. For many years Web of Science had a virtual monopoly on citation tracking. Late in 2004 two competitors to Web of Science emerged: Google Scholar and Scopus.
The Internet search giant Google sponsored the creation of Google Scholar, a tool that attempts to give users a simple way to broadly search the scholarly literature. Google Scholar uses a matching algorithm to look for keyword search terms in the title, abstract or full text of an article from multiple publishers and Web sites (Google does not share the specifics of how this algorithm works). The number of times a journal article, book chapter, or Web site is cited also plays an important part in Google Scholar's ranking algorithm, so that more highly cited and highly relevant articles rise to the top of the result set. This differs from the traditional default "reverse chronological" order employed by most scholarly databases. Google Scholar lists neither the journal titles it includes nor its dates of coverage, although Google has indicated that it has agreements with most major publishers (except Elsevier). Unlike most scholarly research databases, Google Scholar also looks beyond the journal literature to cover other modes of scholarly communication, including preprint servers such as arXiv (physics) and government and academic Web sites. Google Scholar does not state how a Web site qualifies for inclusion in its searches.
At approximately the same time that Google Scholar was made public, Elsevier introduced Scopus, an indexing and abstracting service with its own citation-tracking tool. Scopus indexes more journals than Web of Science and includes more international and open access journals. Citation coverage, however, dates back only to 1996 (abstracts, but not citations, are available back to 1966 for some journals). Scopus also includes its own Web search engine, Scirus. Scirus results are presented separately from other Scopus journal results, and material from Scirus does not figure into the citation counts for Scopus journal records. Table 1 provides a comparison summary of features in Web of Science, Scopus, and Google Scholar.
Table 1. Comparison of features in Web of Science, Scopus, and Google Scholar
Citation analysis has been the focus of research and discussion for decades. Much has been written about citation analysis techniques [18-29], its application to different disciplines [1,28,30], and the controversies surrounding the use of citation analysis and journal impact factors to gauge the value and impact of a given journal title or the corpus of a given author [1-3]. Since the introduction of Scopus and Google Scholar, many articles have carefully analyzed the features of each individual tool and compared two or more of these tools with one another and with others (for example, PubMed and Scirus) [9-17]. While these articles discuss general characteristics and report the results of sample searches the authors have performed, they do not systematically evaluate the citation analysis functions. In a 2005 study of Google Scholar, Noruzi [14] briefly compared citation counts in two products, Google Scholar and Web of Science, in the field of webometrics. Noruzi first selected the article that established the term "webometrics" [18], reported its "times cited" count in both Web of Science and Google Scholar, and then compared the number of unique and overlapping citations to this one article in each product. Noruzi also examined citation counts for the "most-cited" articles in the field by searching for "webometrics or webometric" in each product.
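Comparing unique and overlapping citations to a single article amounts to set operations on the lists of citing items each product returns. The sketch below is illustrative only: it assumes each citing item has already been matched to a shared identifier across products (in practice the hard part, since the tools format references differently), and the identifiers are hypothetical placeholders, not data from Noruzi's study.

```python
# Hypothetical citing-item identifiers; NOT data from any published study.
wos_citing = {"A", "B", "C", "D"}
gs_citing = {"C", "D", "E", "F", "G"}


def overlap_summary(first: set, second: set) -> dict:
    """Count citing items unique to each source, shared by both, and in total."""
    return {
        "unique_to_first": len(first - second),   # found only in the first product
        "unique_to_second": len(second - first),  # found only in the second product
        "overlap": len(first & second),           # found in both products
        "union": len(first | second),             # distinct citing items overall
    }


print(overlap_summary(wos_citing, gs_citing))
```

With these placeholder sets, the first source has two unique citing items, the second has three, two are shared, and the union contains seven distinct items.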
Subject searches are inherently problematic as a comparison measure because Web of Science, Google Scholar and Scopus perform searches differently. For example, Web of Science does not automatically search for common word variations, while Scopus and Google Scholar do. Similar keyword searches in Scopus and Web of Science often return relatively small result sets (fewer than one hundred records), while the same search in Google Scholar may return hundreds of results. A search for the phrase "complementary medicine" with the word "obesity", for instance, returns 9 results in Scopus, 6 in Web of Science and 596 in Google Scholar.
Using citation tracking of known articles as the comparison method avoids these inconsistencies in subject searching. In a preliminary study, Bauer and Bakkalbasi [31] examined the citation counts reported by the three tools for articles published in the Journal of the American Society for Information Science and Technology (JASIST) in 1985 and 2000. They found that older material appeared to be best covered by Web of Science, although the small size of the dataset prevented statistical confirmation. For the newer material, citation counts were higher in Google Scholar than in either Web of Science or Scopus, while there was no statistically significant difference between the counts reported by Web of Science and Scopus. The authors recommended a larger, more robust study.
Aiming for a more robust study, this paper looks at a known set of articles and examines the number of citing articles and other material returned by each of the three search tools for that discrete set, thus removing the ambiguity inherent in subject searches. In this way the study produces data sufficient to test the hypothesis that the different scholarly publication coverage provided by the three search tools will lead to different citation counts from each. In selecting a set of articles, we also sought to account for variations in the publication habits of different disciplines [4-8]. We therefore chose two disciplines that we suspected follow different publication patterns: physics, which has largely embraced preprint servers for the early dissemination of research literature, and medicine, which has not. The subjects were narrowed to condensed matter physics (henceforth CM physics) and oncology. Sets of known articles from each discipline were selected from both 1993 (before e-publishing dominated scientific disciplines) and 2003 (well into the e-publishing era).
This approach of working with sets of known articles and looking for citing material mirrors the experience of a searcher who wants to find citing references to a known article. What can this researcher expect from a new landscape that combines the familiar indices of Web of Science with the new territory of Scopus and Google Scholar?