Status of open access in the biomedical field in 2005

Objectives: This study was designed to document the state of open access (OA) in the biomedical field in 2005. Methods: PubMed was used to collect bibliographic data on target articles published in 2005. PubMed, Google Scholar, Google, and OAIster were then used to establish the availability of free full text online for these publications. Articles were analyzed by type of OA, country, type of article, impact factor, publisher, and publishing model to provide insight into the current state of OA. Results: Twenty-seven percent of all the articles were accessible as OA articles. More than 70% of the OA articles were provided through journal websites. Mid-rank commercial publishers often provided OA articles in OA journals, while society publishers tended to provide OA articles in the context of a traditional subscription model. The rate of OA articles available from the websites of individual authors or in institutional repositories was quite low. Discussion/Conclusions: In 2005, OA in the biomedical field was achieved under an umbrella of existing scholarly communication systems. Typically, OA articles were published as part of subscription journals published by scholarly societies. OA journals published by BioMed Central contributed to a small portion of all OA articles.


INTRODUCTION
Open access (OA) has become a hot topic in the field of scholarly communication over the past several years. Although many different definitions of OA have been proposed, all contain common themes, such as, ''to improve access to literature'' or ''the basic human right to know'' [2]. These common themes refer to the fundamental purpose of scholarly communication, because access to scholarly information is essential for all research. However, not all of those involved in scholarly communication place the highest priority on access to information. For example, some researchers consider the peer-review system to be the most important contributor to a successful system of scholarly communication. Most commercial publishers even question the economic sustainability of the OA model, as compared to the current ''pay for access'' model. While many opinions have been expressed either in support of or against OA, few data are available that document the quantity of OA activity or describe the direction in which OA is currently evolving. Whatever their opinions on the value and sustainability of OA, all parties should benefit from an accurate understanding of the breadth of OA publishing.
The purpose of this study was to document the status of OA articles in biomedical publications in 2005. Biomedicine is an area in which OA may be particularly welcome, as members of the public, seeking the most recent research findings on health conditions and treatments, are increasingly demanding access to biomedical articles. Because of the public's interest in health information, members of the Science and Technology Committee in the United Kingdom have stated that ''it is better that the public should be informed by peer-reviewed research'' [3].
The year 2005 was chosen for the study because it was a critical year for OA in biomedicine. In May of that year, the National Institutes of Health (NIH) public access policy was implemented, requesting that NIH-funded scientists submit their articles to PubMed Central (PMC) within 12 months of publication. (This policy was later updated to require rather than request submission upon the acceptance of the article for publication after April 2008 [4].) The year 2005 was therefore chosen for this investigation because it was the last year in which the prevalence of OA in biomedicine would not have been influenced to any significant degree by the new NIH public access policy. Although some of the articles in the data sample were actually published after the policy went into effect, their effect on the data would only be minor, as only 200-500 ''author's manuscripts'' were archived in PMC per month in 2005, which is less than 1% of all the articles indexed in PubMed [4].

Highlights
- Seventy percent of the OA articles were accessible on journal websites, while the rate of OA articles available from authors' websites or institutional repositories was quite low.
- OA articles were most frequently available as ''free-access'' articles in journals published by scholarly societies.

Implications
- The data acquired in this survey may be used as a starting point for future surveys on OA articles, providing a snapshot of the situation in 2005, when the National Institutes of Health's public access policy was implemented.
- PubMed is an effective database for sampling articles in the biomedical field to be used in this kind of survey.
- Institutional repositories, which are now being constructed by university libraries, play a unique role in contributing to the availability of OA articles.

Studies of the distribution of open access (OA) articles
Most quantitative analyses on the progress of the OA movement have studied the citation advantage of OA as opposed to non-OA articles. In the present review, the authors focus only on results that show the ratio of OA articles to all published articles, ignoring those analyses that do not report the absolute percentage of OA [5,6].
The ratio of OA to non-OA articles has been shown to vary according to the academic field of publication. Hajjem et al. reported on the citation advantage of OA articles in an analysis of more than 140,000 article records from the Web of Science (WoS) database, covering 10 academic fields (including biology, psychology, sociology, and health) and published between 1992 and 2003 [7]. The percentages of OA articles in WoS ranged between 5% and 16% according to field, with 15% in biology and 6% in health. Antelman compared the percentage of OA publications in 4 academic fields in a study of articles published during 2001 and 2002 in 10 journals from each field [8]. The percentage of OA articles varied, ranging from 17% in philosophy to 69% in mathematics.
The percentage of OA articles in physics has been relatively high because arXiv, an electronic preprint archive for the physics community, has been active since 1991. Freely available, online access to peer-reviewed research is well established in this discipline. Harnad and Brody reported the following percentages of OA articles in physics: 10% on average between 1992 and 2001 and 18% in 2001 [9]. The highest percentage of OA articles was found in nuclear and particle physics, with over 40% of the articles in 1996 and 48% in 2001 available via open access. This specific field is well known for its large number of OA registrations in arXiv and has received a substantial amount of discipline participation in OA from the early years of the web to the present time. Kurtz et al. found that 70% of the articles published in 2003 in Astrophysical Journal, a core journal in its field, had first been registered with arXiv [10].
Hajjem et al. also reported the average percentage of OA articles in physics by country (based on the first author's affiliation): 13% of the articles in the United States, 10% in the United Kingdom, and 7% in both Japan and Germany [7] were OA. Antelman investigated the various ways in which articles published in 2001-2002 were archived for OA [8]. Her results indicated that, with the exception of mathematics, placing articles on authors' websites was the most common way to provide OA, accounting for 36% of the articles archived in philosophy and more than 20% in both political science and electrical and electronic engineering. In mathematics, use of discipline-specific repositories was much more common (30%) than authors' websites (15%).
Some studies have reported on the characteristics of OA articles, specifically, that higher ranking journals or articles were more frequently published via OA. Wren investigated articles published in both subscription and OA journals in the biomedical field between 1994 and 2003 [11]. He found that articles published in journals with a high impact factor (IF) had a greater tendency to be available as OA articles. Kurtz et al. suggested that self-selection policies by authors might lead them to deposit their most citable articles as OA in arXiv [10]. Miyairi pointed out that the results of investigations on the current state of OA articles might be biased toward ''qualified articles, because the studies tend to deal with arXiv, Web of Science, and samplings from prestigious journals available online'' [12]. In other words, the available samples are not necessarily representative of overall scholarly output.

Definition and types of OA
A variety of definitions of OA articles have been proposed. The Budapest Open Access Initiative (BOAI) 2002 restricts the definition of OA to peer-reviewed journal articles only [13], although many researchers consider this definition to be too narrow. In contrast, Willinsky includes the provision of bibliographic information and abstracts by ScienceDirect in his definition of OA [1]. Other researchers have restricted OA to articles that are freely available immediately [14]. This study adopts the definition that ''open-access (OA) literature is digital, online, [and] free of charge'' [15], regardless of the timing of article availability.
The classification of the method used to provide OA has varied as well. Most researchers have recognized the ''two roads to OA'' that were described at BOAI 2002, the ''green road'' (BOAI-I) of self-archiving and the ''gold road'' (BOAI-II) of OA journals [13]. However, in 2002, when these classifications were proposed, authors' individual websites or arXiv were the only available means to provide OA. Although this bipartite classification (i.e., self-archiving and OA journals) may still be useful, the provision of OA has since expanded beyond these two options. Currently, OA can be provided in at least six ways. Methods 1, 2, and 3 correspond to ''self-archiving'' in BOAI-I (i.e., the green road).
1. ''Authors' websites'' are a conventional method of providing open access. The authors archive their articles on their websites for various purposes, showing their accomplishments or keeping internal records, for example. This method might be unstable for OA, because the availability of articles depends on the authors' voluntary contribution.
2. ''Institutional repositories'' (IRs) are developed by universities or other institutions for research and education to collect and provide access to the research achievements of their affiliated researchers. Both the UK Science and Technology Committee and Harnad recommend IRs as the most effective way to provide OA to research output [3,14].
3. ''Discipline-specific archives'' provide open access to articles in a specific field. PMC is currently a discipline-specific repository for articles in biomedical sciences and related disciplines. Preprint servers such as arXiv, which is most popular in the discipline of physics, are also examples of discipline-specific archives. Before 2005, the NIH's PMC provided open access to articles published in journals from BioMed Central or from a few traditional subscription journals, such as the Proceedings of the National Academy of Sciences (PNAS), which provided their articles as OA after a brief embargo. In that earlier version, PMC would be classified as a journal website. However, as noted above, the NIH public access policy requested that researchers post all articles resulting from research funded by the NIH in PMC beginning in May 2005. PMC therefore may now be regarded as a discipline-specific archive. As of 2005, however, only a few articles had been registered by the authors themselves, according to the NIH administrators [16].
4. ''Journal websites'' are basically identical to the category of ''OA journals'' in BOAI-II (i.e., the gold road); they include not only journals in which authors pay for publishing but also hybrid OA journals in which some authors may choose to publish OA if they pay a fee, subscription journals with free access to the website version but a fee for print, and embargo journals. (Embargo journals are basically paid-access journals that provide articles as OA a set time period after their initial publication.)
5. ''Journal platforms'' are supplied by a government or public institution to support the digitization of domestic scholarly journals. J-STAGE by the Japan Science and Technology Agency (JST) and the Scientific Electronic Library Online (SciELO) in Brazil are examples. Many scholarly journals that receive public financial support provide articles on the web free of charge.
6. ''Other portal sites'' are generally free web services operated by third parties that supply access to journal articles from a variety of publishers. FindArticles and Nursing.com are examples. FindArticles covers articles in many fields, including business, health sciences, technology, sports, and so on. The articles offered by Nursing.com are limited to the medical and health sciences fields.

Focus of the current study
As is evident from the above review, the few available empirical studies have reported varying percentages of OA. They have also documented a variety of characteristics of OA articles such as the academic field of publication, the countries of the first author, and the IF of the journals in which OA articles appear. To more completely and comprehensively capture the complex structure of OA in biomedicine, a more detailed and large-scale analysis is required. The current study therefore included a broad target sample in the biomedical field, a detailed analysis of the types of OA, and a detailed analysis of the journal's publishers and the publishing model for OA.

METHOD
Sampling
PubMed was used to collect the target sample because of its broad coverage and popularity in the biomedical field. A sample was taken in January 2006, consisting of all of the articles in PubMed with publication dates between January and September 2005 and with page numbers of 11, 12, 13, 14, 15, 16, 17, 18, or 19 in the ''Pagination'' tag. Half of the articles in this sample were then selected randomly and searched again in PubMed so that articles without the authors' names or titles could be eliminated, resulting in a final sample of 4,667 articles.
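The sampling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual procedure: the record fields (`pagination`, `authors`, `title`), the helper names, and the reading of ''page numbers of 11 to 19'' as the starting page in the pagination string are all hypothetical.

```python
import random

# Page values named in the sampling rule: 11 through 19.
TARGET_PAGES = set(range(11, 20))


def first_page(pagination):
    """Return the starting page of a pagination string such as '11-9', or None.

    Assumption: the starting page is the part before the first hyphen and
    non-numeric values (for example 'e11') are ignored.
    """
    head = pagination.split("-")[0].strip()
    return int(head) if head.isdigit() else None


def draw_sample(records, seed=0):
    """Apply the three steps: page filter, random half, completeness check."""
    # Step 1: keep records whose starting page is one of 11..19.
    candidates = [
        r for r in records if first_page(r.get("pagination", "")) in TARGET_PAGES
    ]
    # Step 2: select half of the candidates at random (seeded for repeatability).
    rng = random.Random(seed)
    half = rng.sample(candidates, len(candidates) // 2)
    # Step 3: drop records that lack author names or a title.
    return [r for r in half if r.get("authors") and r.get("title")]
```

The fixed seed is a convenience for reproducibility in the sketch; the source does not say how the random half was drawn.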
Procedure
PMC, Google Scholar, Google, and OAIster were searched to locate the full text (FT) of articles in this sample from March to May 2006. First, we searched PMC, Google Scholar, and Google (in that order). If the FT was not found in PMC, Google Scholar was searched; only when the FT was not found in either PMC or Google Scholar was Google (web) searched. When searching Google Scholar and Google (web), we examined only the first 20 results in the search results list. If the FT was not found among the first 20 search results, we moved to the next database. If multiple versions of the FT existed, we recorded all versions.
Next, OAIster was searched for the FT for all of the articles in the sample. Because OAIster is a database specializing in searches for OA articles, an exhaustive search for FT could be made.
If the FT was found in any of these four databases, the URL was recorded with a code corresponding to one of three categories, as follows: 1=OA; 2=restricted OA (e.g., user must register to gain access); 3=electronic subscription journal (non-OA). Articles for which no FT could be found were assigned to a fourth category, ''Not available online.''
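The search order and coding scheme in this procedure can be summarized as a short Python sketch. The searches in the study were carried out by hand, so the four search functions here are hypothetical stand-ins: each is assumed to take an article and return a ranked list of `(url, code)` pairs, with `code` set to `None` when a result is not a full text.

```python
# Access codes from the procedure; an empty result list corresponds to the
# fourth category, "Not available online".
OA, RESTRICTED_OA, SUBSCRIPTION = 1, 2, 3


def full_text_hits(results, limit=None):
    """Keep full-text results, looking only at the first `limit` results."""
    top = results if limit is None else results[:limit]
    return [(url, code) for url, code in top if code is not None]


def locate_full_text(article, pmc, scholar, web, oaister, limit=20):
    """Return every (url, code) pair recorded for one article.

    pmc, scholar, web, and oaister are hypothetical search callables.
    """
    found = []
    # Cascade: PMC first, then Google Scholar, then Google (web).
    # The 20-result cap applies only to Google Scholar and Google.
    for search, cap in ((pmc, None), (scholar, limit), (web, limit)):
        hits = full_text_hits(search(article), cap)
        if hits:
            found.extend(hits)  # record all versions found in this source
            break               # later sources in the cascade are skipped
    # OAIster is searched for every article, regardless of the cascade result.
    found.extend(full_text_hits(oaister(article)))
    return found
```

How multiple versions of one article were collapsed into a single availability category is not stated in the text, so the sketch simply returns all recorded versions.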

Classification of full-text articles
Those articles in the sample for which free FT was found online (hereafter referred to as OA articles) were then analyzed with regard to:
1. Method of providing OA: The six means of providing OA listed in the ''Background'' were regrouped into five categories: (1) authors' websites, [...]
[...] (2) mid-rank commercial publishers (commercial publishers other than the eight publishers listed in (1) above); (3) scholarly society publishers; (4) combination (more than one of the three categories mentioned above, such as both a society publisher and a major commercial publisher); and (5) pharmaceutical companies (not publishers).
6. Publishing models: Publishing models were classified into one of two types: OA journals and subscription journals. ''Subscription journals'' included journals that offer authors the opportunity to have their articles made OA in exchange for payment of a fee (often referred to as ''hybrid OA journals'').

Percentage of OA
Overall trends. Twenty-six percent of the articles in this sample provided unrestricted OA, and 0.4% provided ''restricted OA'' (user registration was required) (Figure 1). Together, these categories represented more than one-quarter of the sample (27%) that was available as an OA publication. By contrast, the FT of 53.2% of all the articles was only available in electronic subscription journals, and 19.8% did not have FT online.
Type of article. The majority of the articles in this sample could be classified as research articles (70.5%), followed by news items (22.2%). The remaining articles included commentaries, replies, chapters of monographs, and unknown items (7.3%). PubMed includes many more news items than WoS. In this sample, however, most of the target articles were full-length research papers with introduction-method-results-discussion (I-M-R-D) elements.
The percentages of OA for research articles and news items were similar. Of the research articles (n=3,290), 26.3% were OA, compared with 29.3% of the news items (n=1,034).
Impact factor. Approximately half (n=2,333) of all the articles in the sample (n=4,667) were published in journals for which Thomson Reuters provides an IF, and the others (n=2,334) were published in journals for which Thomson Reuters does not publish an IF. Among articles in this sample published in journals that have a Thomson Reuters IF, 22.3% were OA, while 30.9% of the articles from journals that do not have an IF were OA. While this is a relatively small difference, the percentage of articles not available online varied substantially between these 2 groups: 5.8% of the articles published in journals with an IF were not available online, while 33.8% of the articles published in the journals that do not have an IF were not available online.
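The counts reported for article type and IF status can be cross-checked against the stated shares with simple arithmetic. The short Python sketch below uses only figures given in the text (4,667 articles in total; 3,290 research articles; 1,034 news items; 2,333 articles in journals with an IF); the variable names are illustrative.

```python
total = 4667           # final sample size
research = 3290        # research articles
news = 1034            # news items
other = total - research - news   # commentaries, replies, and so on
with_if = 2333         # articles in journals with a Thomson Reuters IF
without_if = total - with_if

for label, count in [
    ("research articles", research),
    ("news items", news),
    ("other items", other),
    ("journals with IF", with_if),
    ("journals without IF", without_if),
]:
    print(f"{label}: {count} ({count / total:.1%})")
```

The article-type shares come out at 70.5%, 22.2%, and 7.3%, matching the reported percentages, and the remainder of 2,334 articles without an IF agrees with the count given in the text.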

Methods of providing OA
The majority of OA articles were available from journal websites, in which OA is provided by the journal's publishers (72.1%). PMC (26.0%) was the second most common method of access, followed by journal platforms or portal sites (17.4%).
In contrast, the percentage of OA articles available via self-archiving (in IRs and authors' personal websites) was considerably lower (5.9% and 4.8%, respectively). However, 87.7% (64 of 73 items) of the articles available from IRs were published in journals that did not have a stated policy of OA. Thus, although only a small number of articles were available from IRs, these repositories were important because they provided OA articles that were not available in other locations.
Despite the fact that the NIH public access policy took effect in May 2005, the number of articles made available by authors' self-archiving in PMC represented only a small minority of the OA articles. We found only 1 author's manuscript among the target articles, while NIH reports that 200-500 authors' manuscripts were archived per month in 2005 [16].
Method of providing OA by countries. Three thousand six hundred ninety-five articles in the sample included information on the affiliation of the first author. Articles from the 20 countries with the highest number of articles in this sample were then further analyzed for patterns of OA (Table 1). Authors residing in Belgium had the highest rate of OA article publication: 41.7% of the articles by these authors were OA. They were followed by India (40.0%), Canada (37.6%), and Brazil (36.4%). While articles by authors residing in these countries had OA rates greater than 35%, they represent less than 2% of the total articles in the sample, except for Canada. The small sample size might account for the high OA ratios. Among the 8 countries accounting for the largest number of articles in the sample, the rate at which OA articles were published by authors residing in Canada was the highest (37.6%), with the United States second (30.1%). Articles by authors from 4 other countries were also published as OA at a rate of more than 20%: the United Kingdom (24.0%), France (22.3%), Japan (20.9%), and Italy (20.3%).
Among these top 20 countries in terms of number of articles, with the exception of Brazil, a high percentage of OA articles were accessible from journal websites. This result mirrored the overall trend mentioned in the previous section. In countries such as the United States and the United Kingdom, the percentage of OA articles accessible from journal websites was very high (70% to 80%) and the percentage accessible through PMC was around 30%, which was a little higher than average. Japan and Brazil, however, showed different patterns of providing OA.
Although journal websites accounted for the highest percentage of OA in Japan among the various types studied (57.1%), a distinctive characteristic in Japan was the higher percentage of journal platforms or portal sites (40.8%). All of the OA articles by authors residing in Japan and categorized as journal platforms or portal sites were available from J-STAGE. J-STAGE is an electronic journal platform that was established by a Japanese governmental agency to encourage domestic scholarly societies in science, technology, and medicine to make their journals available online. Most journal publishers on J-STAGE have been offered support to digitize their articles at no cost. The high percentage of J-STAGE OA journal articles indicates that the Japanese government's policy on the digitization of society journals has contributed to the availability of OA articles in Japan. However, unlike the NIH with PMC, the Japanese government has not clearly stated a policy of advancing OA [17].
In Brazil, the percentage of OA available from journal platforms or portal sites was remarkably high. The percentage of OA available from SciELO, an electronic journal platform created by the government, was overwhelming (85.0%), while the percentage of OA available from journal websites was low (only 10%).

Publishers and publishing models of journals
The publishers and publishing models of the journals in the sample were then examined. After excluding articles regarded as ''monographs'' in Ulrich's Periodicals Directory and articles for which the publishers or publishing models were unknown, the sample consisted of 4,463 articles, 1,203 of which were classified as OA. Figure 2 shows articles in this sample by journal publisher (n=4,463). Overall, mid-rank commercial publishers published about one-third (32.8%) of the articles, while major commercial publishers (21.2%) and society publishers (21.1%) were nearly tied for second place. Table 2 shows the distribution of OA articles and online availability by type of journal publisher.
Just over four-tenths (40.6%) of the articles published by society publishers were OA, while only one-third (32.7%) of the articles published by mid-rank commercial publishers fell into this category. The percentage of OA in articles published by major commercial publishers was relatively small (11.7%). The percentage of OA among articles published by pharmaceutical companies was the highest (52.6%) in this sample; however, as this type of publisher accounted for only 1.7% of the total articles (n=4,463), the percentage might be misleading.
Of the articles in this sample that were published by major commercial publishers, 97.4% had FT available online, followed by mid-rank commercial publishers (80.3% available online) and society publishers (63.5%

Journal publishers and publishing models for OA articles
To analyze the relationship between journal publishers and publishing models in a simple manner, the several OA publishing models considered earlier were consolidated into two categories: full ''OA journals'' and traditional ''subscription journals.'' Many journals provide OA articles on the web but maintain a traditional subscription model for their print version. Oxford University Press journals and Japanese journals in J-STAGE are prominent examples. The category of subscription journals also included 32 OA articles published in ''hybrid OA journals,'' which offer OA at the author's expense; these accounted for a small percentage (2.7% of the OA articles). Table 3 shows the distribution of OA articles in OA and subscription journals by publisher type. Fewer OA articles in our sample were published in full OA journals (37.2%) than in traditional subscription journals (62.8%).
OA articles in OA journals were published mostly by mid-rank commercial publishers (69.1%) and to a lesser degree by society publishers (26.8%). However, it should be noted that BioMed Central accounted for 83.2% of the OA articles in OA journals published by mid-rank commercial publishers.
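The shares reported above can be chained to estimate BioMed Central's contribution, which may help when comparing these figures with the rounded values quoted in the Discussion. The Python sketch below is a worked calculation using only percentages stated in the text (37.2% of OA articles in full OA journals, 69.1% of those from mid-rank commercial publishers, 83.2% of those from BioMed Central); the variable names are illustrative.

```python
# Shares taken from the text.
oa_journal_share = 0.372       # OA articles published in full OA journals
midrank_share = 0.691          # of those, from mid-rank commercial publishers
bmc_share_of_midrank = 0.832   # of those, from BioMed Central

# BioMed Central as a share of all OA articles in full OA journals.
bmc_within_oa_journals = midrank_share * bmc_share_of_midrank

# BioMed Central as a share of all OA articles in the sample.
bmc_within_all_oa = oa_journal_share * bmc_within_oa_journals

print(f"{bmc_within_oa_journals:.1%}")  # 57.5%
print(f"{bmc_within_all_oa:.1%}")       # 21.4%
```

These products are consistent with the ''about 60%'' of full OA journal articles and the 21% of all OA articles attributed to BioMed Central later in the paper.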
Among OA articles in subscription journals, society publishers accounted for the highest number of articles (34.7%) and mid-rank commercial publishers the next highest number (22.5%). If the combination categories (society with major commercial publishers and society with mid-rank commercial publishers) were included, society publishers in total accounted for 58.9% of the OA articles. Table 4 shows the distribution of OA types (OA journal and subscription journal) by publisher type. The subscription journal dominates the distribution of OA articles published by major commercial publishers (99.1%). Hybrid OA journals represent a minority (17.1%) of OA articles in subscription journals. Among mid-rank commercial publishers, the percentage of OA articles published in OA journals (64.5%) was nearly twice as high as that published in subscription journals (35.5%). Among biomedical society publishers, on the other hand, the percentage of OA articles published in subscription journals (68.6%) was more than twice as high as that published in OA journals (31.4%).
As mentioned in the previous section, mid-rank commercial publishers and society publishers play central roles in providing OA articles. OA articles by mid-rank commercial publishers are often provided through a new publishing model, OA journals, while society publishers may provide OA articles in the traditional subscription model.

DISCUSSION AND CONCLUSION
This study showed that 27% of articles in the biomedical field in 2005 were accessible as OA articles, including ''restricted OA'' articles. Hajjem et al. reported an even lower percentage (15%) in the field of biology in 2003 [7]. The difference between these 2 studies might represent progress in the OA movement or the use of PubMed instead of WoS to derive the sample. The higher OA percentages found in our results might also be due to checking for OA sites manually, instead of running a search algorithm using robots.
In our analysis of the type of OA, more than 70% of the OA articles were provided on sites maintained by the publishers of the articles. In contrast, the percentages of OA articles available from self-archiving (institutional repositories and authors' websites) were quite small (5.9% and 4.8%, respectively). Although many OA advocates have considered self-archiving, or the green road, as a feasible means of advancing OA [3,14], this method did not contribute substantially to OA in the biomedical field in 2005. Eighty-eight percent of the OA articles in institutional repositories, however, were not available any other way. Thus, IRs contributed in an important way to the accessibility of OA articles, despite the small number of archived articles. In contrast, 92% of the PMC articles were also available on journal websites.
More than 60% of the OA articles were published in traditional ''subscription journals,'' while a relatively small percentage of OA articles (37.2%) were published in full OA journals. Among the OA articles published in subscription journals, about 60% were published in society journals. Among the articles in full OA journals (basically, an author-pays model), on the other hand, about 60% of the articles were published by BioMed Central alone.
In conclusion, OA in the biomedical field in 2005 was achieved under an umbrella of existing scholarly communication systems, the majority of which still use traditional paid-access journals. The OA innovations, author-paid OA journals published by BioMed Central and self-archiving efforts such as IRs and authors' websites, were part of the picture. Both of these methods, however, contributed to only a small portion of OA articles (21% and 10%, respectively).
In 2008, the NIH updated its public access policy, which now mandates OA. According to the NIH's statistical report, the number of ''author manuscripts'' submitted per month since April 2008 is about four times higher than it was in 2005 [16]. While this study provides a valuable snapshot of the state of OA in 2005, further study is needed to investigate whether or how the NIH's renewed public access policy will affect trends in OA in the biomedical field. To do so, it will be necessary to investigate continually not only the rate of OA, but also the details of the distribution of OA among different publishers and publisher types.