Our results concerning the overall OA share are well in line with the results of our earlier study of articles published in 2006. In that study we concluded, using different methods and a much smaller sample, that 8,1% of the articles were gold OA and 11,3% were available as green copies. The overall share of OA was 19,4%, compared to 20,4% in this study. The difference can be explained by a number of factors:
- The difference is within the confidence interval
- The two studies used partly different methods
- The share of OA is changing with time and two years had elapsed between the studies.
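The first point can be illustrated with a normal-approximation confidence interval for a proportion. The following is only a sketch: the sample size `n = 1800` is a hypothetical value chosen for illustration, the function name is ours, and the calculation ignores any weighting applied to the sample.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion p from n observations."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# 20,4% is the overall OA share reported in this study.
# n = 1800 is a HYPOTHETICAL sample size used only for illustration.
low, high = proportion_ci(0.204, 1800)

earlier_share = 0.194  # overall OA share from the earlier study of 2006 articles
print(f"interval: {low:.3f} to {high:.3f}")
print(f"earlier share inside interval: {low <= earlier_share <= high}")
```

Under these assumptions the interval is wide enough to contain the earlier figure; a smaller sample would widen it further.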
The most thorough study we have found to compare our results with is Matsubayashi et al (2009). Using methods similar to ours, they studied the OA availability of articles in biomedicine from 2005. Their source of article metadata was the PubMed bibliographic database.
Their material included both peer reviewed articles and news items etc. They reported an OA percentage of 26,3 for peer reviewed articles (70% of their sample), and if the overall share of OA articles requiring registration (0,4%) is subtracted the number comparable to our study would be 25,9%.
Due to the high number of articles analyzed, their confidence interval should be rather narrow and their results rather reliable from a statistical viewpoint. PubMed currently offers a search facility in which one of the filters is “link to free full text”. We searched for all “journal articles” published in 2005 with no restrictions (636162 hits) and then repeated the search with the further restriction “link to free full text”. The OA percentages obtained this way were 23,1 for 2005 and 23,3 for 2008. The figure should include all OA journals (full and delayed) as well as full text deposited in PubMed Central (either as exact replicas or as author manuscripts).
It should be noted that Matsubayashi et al (2009) found 72% of their OA copies at journal websites, 26% in PubMed Central and 17,4% in journal platforms or portal sites (like SciELO). The numbers add up to more than 100% due to possible duplication. Thus their figures are well in line with the above rather exact figures from PubMed, all the more so since they estimated that, of all the OA copies they found, only 5,9% were in institutional repositories and 4,8% on authors' personal websites (which together correspond to 2,8% of all analyzed articles).
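The 2,8% figure can be reproduced by combining the two shares with the overall OA percentage. A minimal sketch of the arithmetic, assuming the shares are applied to the 26,3% OA percentage reported for peer reviewed articles (the variable names are ours):

```python
# Shares of OA copies by location, as reported by Matsubayashi et al (2009)
share_institutional = 0.059  # institutional repositories
share_personal = 0.048       # authors' personal websites

# Assumption: the shares apply to the 26,3% OA percentage for peer reviewed articles
oa_share = 0.263

# Convert "share of OA copies" into "share of all analyzed articles"
share_of_all_articles = (share_institutional + share_personal) * oa_share
print(f"{share_of_all_articles:.1%}")
```

Using the adjusted figure of 25,9% instead of 26,3% gives the same result after rounding to one decimal.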
The difference between their results and ours (in particular our discipline-specific results for medicine, areas related to medicine, and biochemistry) could be caused by a number of factors:
- Use of Pubmed vs. Scopus as a source for article data
- Different base year and time delay from publishing to searching for copies
- Different search strategy. We only used Google. Matsubayashi et al used four different databases and search engines to identify full text copies. They also checked the first 20 results in Google and Google Scholar, whereas we only checked the first page.
- The method of obtaining the sample (a search based on the pagination of articles) was the same but we compensated for the possible bias towards small journals by multiplying by the number of articles published per year.
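The compensation mentioned in the last point can be sketched as a weighted share. This is an illustration under assumptions, not our exact procedure: it assumes each sampled article is weighted by the number of articles its journal publishes per year, and the sample data and function name are hypothetical.

```python
def weighted_oa_share(sample):
    """sample: list of (articles_per_year, is_oa) tuples, one per sampled article."""
    total_weight = sum(n for n, _ in sample)
    oa_weight = sum(n for n, is_oa in sample if is_oa)
    return oa_weight / total_weight

# HYPOTHETICAL sample: (articles the journal publishes per year, OA copy found?)
sample = [(50, True), (50, False), (400, False), (1000, True), (1000, False)]

unweighted = sum(is_oa for _, is_oa in sample) / len(sample)
weighted = weighted_oa_share(sample)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Without weights, every sampled article counts equally, so small journals contribute more than their share of the total article output; the weights shift the estimate towards the journals that publish most articles.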
In a study concerning the journal output from 2003, McVeigh (2004) found that out of 747060 citable articles indexed in the ISI Web of Science, 2,9% were in open access journals. This can be compared to our 6,6% of gold OA in ISI journals. It should be noted that our figures also include delayed OA and article-specific OA.
Bhat (2009) studied the OA availability of research articles from 2003–2007, indexed by Scopus, from five leading Indian research institutes. Of the 17516 articles studied, 7,8% were published in open access journals (either full or delayed). About two thirds of these were in Indian open access journals. The study did not include green copies.
In a study of the citation advantage of OA, Norris, Oppenheim and Rowland (2008) also calculated the OA availability of 4633 articles from 65 high-impact-factor journals (included in Web of Science) in four subjects: Applied Mathematics, Ecology, Economics and Sociology. They specifically recorded only green copies which had the same title and authors as the published article, and discarded any hits to the publisher's web site. The availability of OA copies was very high in Economics (65%), Applied Mathematics (59%) and Ecology (53%) but considerably lower in Sociology (21%). Since the purpose of their study was specifically to study the citation advantage, it appears that they purposely included subjects which were known a priori to have a tradition of posting green copies.
Way (2010) studied the OA availability of articles published in 2007 in 20 top journals (using ISI's journal impact factors) in Library and Information Science. The overall OA share was 27% over a sample of 922 articles. Way also classified the green copies and found that subject-based repositories (38%) and personal web sites (29%) were the two most common locations for the copies.
The study with the biggest sample of articles was Hajjem et al (2005), who used web robot techniques to study the citation advantage of OA. They also calculated the OA availability of 1,3 million articles from 1992–2003 in 10 disciplines and found that the overall OA share was between 5% and 16%. These figures are difficult to compare with ours.
The clear majority of the gold articles that we found in our study were in pure gold journals (62%). Articles in delayed OA journals accounted for only 14%. Studying the prevalence of delayed OA articles is much more difficult than studying pure OA ones, since the journals containing the latter tend to be listed in DOAJ, whereas just about the only site where aggregate information about delayed OA journals can be found is the HighWire Press website, which lists around 200 of the journals they host as offering delayed OA.
We found that 24% of gold articles were individually paid OA articles on subscription sites. This seems to be in line with the few reports available on the actual uptake of this option by authors. For instance, an average uptake in 2007 of 7% has been reported for the 65 journals offering Oxford Open. It is also important to note that only a minority of journals currently offer paid article-level access. Of the 9500 journals of 22 major publishers, 22% offered this option in September 2009 (informal communication, Max Planck Digital Library). Theoretically, if 22% of the whole volume of articles from 2008 had this option and the average uptake was 10%, this would lead to a figure of 2,2% of the articles. In principle it would be possible to calculate relatively exact numbers by analyzing the tables of contents for the full 2008 volumes of all the journals of the major publishers offering this option, including over a thousand Springer journals. This task would be very tedious and would probably require using a sampling method.
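The theoretical figure is simply the product of the two shares. As a short sketch, with the simplifying assumption that journals offering the option publish the same share of articles as they represent of journals (variable names are ours):

```python
share_with_option = 0.22  # journals offering paid article-level OA (September 2009)
average_uptake = 0.10     # assumed average uptake among eligible articles

# Share of all 2008 articles that would be individually paid OA
paid_oa_share = share_with_option * average_uptake
print(f"{paid_oa_share:.1%}")
```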
The overall breakdowns of green copies according to type of repository and type of copy should also be of interest. Since the overall number of “hits” in each category is rather small, we decided not to publish the figures per discipline, as they would be very unreliable from a statistical point of view. We can just note that in a few disciplines subject-based repositories dominated: PubMed Central in medicine and arXiv in physics.
It may come as a surprise that only one out of four green copies was found in institutional repositories. A lot of effort has recently been put into starting such repositories and issuing university guidelines encouraging or requiring academics to post copies there. But compared to the leading subject-based repositories, these have had a shorter lifespan so far. Other web sites, in particular the authors' home pages, were still the most popular places for placing copies (40%).
Morris and Thorn (2009) surveyed the OA attitudes and behaviour of members of learned and professional societies in the UK in the winter of 2008. Of particular interest are their figures on where those respondents who practiced self-archiving placed the copies. The figures sum up to over 100%, but if they are normalized to 100% the answers are 30,2 for institutional repositories, 11,8 for subject-based repositories and 58,0 for author, departmental and other websites. These figures thus differ quite a lot from our findings, but one has to bear in mind that the questions were differently phrased. The spread of the respondents over research fields might also differ quite a lot from that in our study.
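The normalization is a simple rescaling so that the categories sum to 100. A minimal sketch follows; the raw percentages below are hypothetical placeholders, not the figures reported by Morris and Thorn, and the function name is ours.

```python
def normalize_to_100(raw_percentages):
    """Rescale percentages that sum to more than 100 so that they sum to 100."""
    total = sum(raw_percentages.values())
    return {place: 100 * value / total for place, value in raw_percentages.items()}

# HYPOTHETICAL raw survey percentages (respondents could tick several options)
raw = {"institutional": 45.0, "subject-based": 15.0, "other websites": 90.0}

normalized = normalize_to_100(raw)
for place, value in normalized.items():
    print(f"{place}: {value:.1f}")
```

The rescaling preserves the ratios between the categories but no longer describes the share of respondents using each location, which is one more reason to compare such figures with caution.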
Fry et al (2010) surveyed the attitudes and behaviour of European researchers as authors. Although they received 3136 responses, a high proportion came from the physical sciences and mathematics (56%). They report on the characteristics of the green copies that the respondents had deposited (p. 33). By normalising their figures to 100% we get the following distribution: preprint version (34%), author final manuscript (38%) and publishers' version (28%). The relative popularity of the different types of repositories was: subject-based repositories (46%), institutional repositories (45%) and other web sites (9%).
The high share of exact copies we found was slightly surprising, considering the types of copyright restrictions the major publishers impose. In fact, a number of clearly illegal copies were found, where the publishers' files had been copied, usually without proper attribution. These were usually on the authors' or their departments' home pages. It was also very noticeable that preprints were mainly posted in a few disciplines: mathematics, economics and physics in particular. These areas are known to have traditions of making manuscripts available in the form of preprints or working papers.
We will not attempt a more detailed discussion about the possible reasons for the differences between disciplines. Factors which we believe are particularly important include:
- Uneven spread of available OA journals across disciplines
- Unequal possibilities for financing author charges
- Availability of well-established subject-based repositories in some disciplines
- Traditions of making preprints available in some subjects
All in all, we believe our results should be of interest to science policy makers and scientists alike, providing one of the most comprehensive cross-disciplinary OA studies to date. There are numerous ways to extend the method we have used, for instance by comparing in more detail the quality of OA and non-OA articles. A comparison of the OA availability of articles originating from different countries would be of great interest, since OA has been seen as a way for authors from developing countries to make their research results better known.