The data for this study were obtained from four databases. These included Ulrichsweb, Journal Citation Reports 2010 (JCR), SCImago Journal & Country Rank (SCImago), and the Directory of Open Access Journals (DOAJ). SCImago and DOAJ are openly available and provide their data in an easily downloaded format. Both our institutions have subscriptions to the electronic versions of Ulrichsweb and JCR, and it was possible to use our institutional access to these databases to obtain the information needed.
Ulrichsweb is a database of detailed information on more than 300,000 periodicals of all types. The JCR is the 2010 version of a database concerning the articles published and the citations received by the peer-reviewed journals indexed in the Web of Science citation index, a database of selected high-quality scholarly journals maintained by Thomson Reuters. This study largely focuses on the average number of citations received by a journal over the most recent 2-year period, commonly called an impact factor. SCImago provides open access to similar citation metrics for journals included in the Scopus Citation Database maintained by Elsevier. Scopus is similar to Web of Science but provides data on a larger number of journals. The DOAJ is a database of open access journals that provides basic information about the journals as well as immediate unrestricted access to full text articles for some of these journals. Of these services, Web of Science, whose citation data are provided through the JCR, has the strictest inclusion criteria, followed by Scopus. DOAJ accepts all journals that fulfill certain criteria concerning open accessibility and peer review, whereas Ulrichsweb is open to any journal that self-reports its data.
A limitation of this method is that journals not indexed in Web of Science or Scopus cannot be included, since there is no systematic way to obtain citation data for them. Google Scholar could be used to study citations in that index to individual journals, but the process is extremely labor intensive and cannot be performed for large numbers of journals.
Studies have shown a high degree of correlation between the citation metrics of JCR and Scopus, although their absolute values differ. For instance, Pislyakov [18] studied the citedness of 20 leading economics journals using data from both JCR and Scopus and found that the correlation between the impact factors of these two indexes was 0.93 (Pearson). Sicilia et al. [19] also found a strong correlation between the two measures for computer science journals. Hence either one provides a good measure for the level of citations.
We used this mix of sources because we needed a number of data items for our analysis that could not be obtained from just one database. Ulrichsweb was used to obtain the start year for each journal as well as the up to five discipline categories in which it was classified. It was also used to identify the country of origin of the publisher. Being listed in the DOAJ was used as an indicator of whether a journal was open access and to determine if a journal charged APCs. The JCR was used to obtain the 2-year impact factor for each journal as well as the number of articles published in it in the most recent year available in the report, 2010. SCImago was used to obtain the 2-year citation count divided by number of articles published for Scopus indexed journals (in essence similar to the JCR impact factor) and the number of articles published in 2011.
To create a merged data set for analysis we started with the Ulrichsweb database, first narrowing the database to only journals that were: abstracted or indexed, currently active, academic/scholarly, refereed, and formatted as online and/or in print.
We selected all journals within those limits that were listed in the following discipline categories (based on the discipline coding used by Ulrichsweb): arts and literature; biological science; business and economics; chemistry; earth, space and environmental sciences; education; mathematics; medicine and health; physics; social sciences; technology and engineering. While there were other disciplines categorized in Ulrichsweb, these in our view captured the major scholarly disciplines. Many journals were listed under multiple disciplines. We recorded each discipline listed for each journal. The maximum for any journal was five. The data were retrieved in January 2012.
We then merged data from the other three databases to the journals identified in Ulrichsweb using either the International Standard Serial Number (ISSN) or the Electronic International Standard Serial Number (EISSN) as the identifier. There were 23,660 journals identified in Ulrichsweb meeting the criteria within the 11 disciplines of which 12,451 (52.6%) were in the SCImago database as of January 2012, 8,256 (35.0%) were in the JCR 2010 and 2,530 (10.7%) were in the DOAJ as retrieved from their web site in August 2011.
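The matching step described above can be sketched as follows. This is only an illustrative sketch, not the procedure actually used in the study: the record layout (dictionaries with `issn` and `eissn` keys), the function names, and the sample journals are all assumptions made for the example.

```python
from typing import Optional


def normalize_issn(raw: Optional[str]) -> Optional[str]:
    """Return an ISSN as 8 uppercase characters without a hyphen, or None."""
    if not raw:
        return None
    cleaned = raw.replace("-", "").strip().upper()
    return cleaned if len(cleaned) == 8 else None


def build_index(records):
    """Map every usable ISSN or EISSN in one source database to its record."""
    index = {}
    for rec in records:
        for key in ("issn", "eissn"):
            code = normalize_issn(rec.get(key))
            if code is not None:
                # Keep the first record seen for a given identifier.
                index.setdefault(code, rec)
    return index


def merge_journals(base_records, other_sources):
    """Attach the matching record (or None) from each source to each base journal."""
    indexes = {name: build_index(recs) for name, recs in other_sources.items()}
    merged = []
    for journal in base_records:
        codes = [normalize_issn(journal.get(k)) for k in ("issn", "eissn")]
        row = dict(journal)
        for name, index in indexes.items():
            # A journal matches if either of its identifiers appears in the source.
            row[name] = next((index[c] for c in codes if c in index), None)
        merged.append(row)
    return merged


# Hypothetical sample data; titles, identifiers, and values are made up.
ulrichs = [
    {"title": "Journal A", "issn": "1234-5678", "eissn": "8765-4321"},
    {"title": "Journal B", "issn": "1111-2222", "eissn": None},
]
sources = {
    "jcr": [{"issn": "1234-5678", "impact_factor": 2.5}],
    "doaj": [{"issn": None, "eissn": "87654321", "apc": True}],
}
merged = merge_journals(ulrichs, sources)
```

In this sketch a journal is linked to a source if either its print or electronic identifier appears there, which mirrors the "either ISSN or EISSN" rule; journals without a match simply carry `None` for that source.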
Citation metrics of OA and subscription journals were analyzed in two different ways. First, they were analyzed with journals as the unit of analysis, which is the level at which the data were retrieved from the four databases. Second, we estimated the citation metrics of the articles published. This was performed by weighting the journal level citation metrics by the number of articles published in each journal per year, using article counts provided by the JCR and SCImago databases. This lends more or less weight to each journal based on the number of articles that were published within the journal. We feel this adds a new and important dimension to the analysis as compared to earlier studies.
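The difference between the two units of analysis can be illustrated with a small sketch. The field names (`metric`, `articles`) and the two sample journals are hypothetical and serve only to show how article weighting shifts the average.

```python
def journal_level_mean(journals):
    """Unweighted mean: every journal counts equally."""
    return sum(j["metric"] for j in journals) / len(journals)


def article_level_mean(journals):
    """Mean weighted by the number of articles each journal published."""
    total_articles = sum(j["articles"] for j in journals)
    weighted_sum = sum(j["metric"] * j["articles"] for j in journals)
    return weighted_sum / total_articles


# Hypothetical data: a large low-cited journal and a small high-cited one.
sample = [
    {"metric": 1.0, "articles": 300},
    {"metric": 3.0, "articles": 100},
]
by_journal = journal_level_mean(sample)   # (1.0 + 3.0) / 2 = 2.0
by_article = article_level_mean(sample)   # (300 + 300) / 400 = 1.5
```

Here the article-weighted figure is lower than the journal-level figure because the larger journal has the lower citation metric; with the opposite size pattern the weighting would raise the average instead.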
In the data collection and analysis process we found some problems with the SCImago data. The site allows downloading the basic article numbers and citation data for all journals as one Microsoft Excel file with the most current year's data. The data on impact factors and number of articles were for 2011, but it seems that the article and citation counts were not complete for the full year, so that both the article numbers and impact factors are too low. This could easily be checked for individual journals, and it turned out that the impact factors for 2010 as well as preceding years were in most cases almost double the 2011 figures. A comparison with the journal level analysis in Miguel et al. [17] also pointed in the same direction. Unfortunately it was not possible to extract the older data for the over 12,000 journals in the study, so we were limited to using the 2011 data, which were incomplete.
We nevertheless feel that the analysis using Scopus data provides a useful triangulation with the JCR analysis. Provided that the insufficient counting for 2011 is systematic across all journals, with no differentiation between OA and subscription journals, the citation levels for OA vs. subscription relative to each other should remain the same, although the absolute levels are lower. When we compared the SCImago numbers with the JCR-based ones, the proportions between OA and subscription citation rates were approximately the same in both sets, supporting the conclusions we later illustrate mainly with the JCR results.
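The reasoning behind this proviso can be made explicit with a short derivation. It rests on a simplifying assumption that is ours and not something verified in the data: the incomplete 2011 counts scale every journal's citation metric $m_i$ by one common factor $k$ and every journal's article count $n_i$ by one common factor $c$. The symbols $k$, $c$, $m_i$ and $n_i$ are introduced here purely for illustration.

```latex
% Article-weighted mean of a group G (OA or subscription) under the
% assumed uniform undercount factors k (citation metric) and c (articles):
\[
\bar{m}'_{G}
  = \frac{\sum_{i \in G} (c\,n_i)(k\,m_i)}{\sum_{i \in G} c\,n_i}
  = k \,\frac{\sum_{i \in G} n_i\,m_i}{\sum_{i \in G} n_i}
  = k\,\bar{m}_{G}
\]
% The ratio between the two groups is therefore unaffected:
\[
\frac{\bar{m}'_{\mathrm{OA}}}{\bar{m}'_{\mathrm{Sub}}}
  = \frac{k\,\bar{m}_{\mathrm{OA}}}{k\,\bar{m}_{\mathrm{Sub}}}
  = \frac{\bar{m}_{\mathrm{OA}}}{\bar{m}_{\mathrm{Sub}}}
\]
```

The same cancellation holds for the unweighted journal level mean (set all $n_i = 1$). If the undercount differed between OA and subscription journals, the factor $k$ would no longer cancel and the ratio would be biased.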