The Status and Patterns of Open Access in Research Output of Most Productive Indian Institutions

Open Access has emerged as an important movement worldwide over the last few years, triggered mainly by the high subscription cost of paywalled journals, which creates barriers to the universal dissemination of the knowledge reported in those journals. The paywall barriers to accessing knowledge have become so problematic that even institutions in developed countries are not only cancelling subscriptions but also mandating that their researchers either publish in open access journals or at least deposit their research papers in Institutional Repositories. The high subscription cost of journals is a more serious issue for developing countries, as it takes away institutional resources that could be used for other productive purposes. India has taken several steps to promote open access, including the release of an open access policy by the Ministry of Science and Technology; however, it is not clear how effective these initiatives have been. This paper intends to address this issue. It examines the published output, indexed in Web of Science, of the 100 most productive institutions in India and analyzes how much of that research output is available in Open Access (OA). The paper further analyzes the availability of research papers from these institutions on the popular pirate site Sci-Hub. It is interesting to observe that legal OA percentages are significantly lower than Sci-Hub availability for all the institutions, an indication that the existing systems for promoting open access in India are not working efficiently. At the end, the paper also presents statistics on the number of papers deposited in three central institutional repositories in India. These statistics provide an indication of the extent to which these repositories have been able to promote open access in India. The paper concludes by pointing to some factors that impede Open Access in India.


INTRODUCTION
The increasing subscription cost of journals from large publishing houses is becoming a severe deterrent to universal open access to knowledge. Not only are institutions in developing countries finding it difficult to pay these costs, but several institutions in the developed world are also cancelling subscriptions to many journals. Many top universities in the world are now asking their researchers to publish in open access journals so as to make the knowledge generated accessible, without barriers, to the world. Institutions are creating Institutional Repositories (IRs) to keep copies of papers published by their researchers in different journals. The top research funding organizations of the world are now making it mandatory for their researchers to submit pre- or post-prints of their research papers to repositories, either institutional or disciplinary. The US National Institutes of Health, the US National Science Foundation, the Wellcome Trust, the Bill and Melinda Gates Foundation and the European Commission are some examples. Further, almost all major publishing houses now allow submission of papers that have earlier been deposited as pre-prints in repositories.
India has also tried to move towards Open Access, through efforts at both the national and the institutional level. Open Access culture in India was initiated by Indian physicists in the early 1990s, when they started depositing their preprints in arXiv. Later, they were joined by mathematicians, computer scientists, biologists and others. Since then several initiatives have been taken, though their impact has not been as expected. Some of the early initiatives include the mirror server for arXiv set up by the Institute of Mathematical Sciences (IMSc) at Chennai, the Vidyanidhi Digital Library for electronic theses and dissertations from the University of Mysore in 2002 [1] and the EPrints@IISc electronic repository set up by the Indian Institute of Science, Bangalore in 2002. [2] Open access and open educational resources were discussed by the National Knowledge Commission in a report in 2007. [3] In 2009, CSIR headquarters sent a memorandum to all of its 38 laboratories in the country to set up institutional open access repositories, as detailed ahead. The work of Das [4] and of Arunachalam and Muthu [5] may be consulted for details of the initial open access initiatives and projects in India.
The Government of India's most recent effort in this direction was the release of the Department of Biotechnology (DBT) and Department of Science and Technology (DST) Open Access Policy in 2014. [6] The DBT and DST, the two main research departments of the Ministry of Science and Technology of the Government of India, jointly issued the policy. It states that "since all funds disbursed by the DBT and DST are public funds, it is important that the information and knowledge generated through the use of these funds are made publicly available as soon as possible....". The fundamental guiding principle of this policy is that publicly funded research outcomes should be publicly available. The policy also envisaged that open access would allow cutting-edge research to percolate into higher education curricula, which in turn would raise the standards of technical and scientific education in the country. Through the policy, institutions were encouraged to set up institutional repositories (IRs) and to deposit all their research output in them. A central harvester, sciencecentral.in, was also created, and it was expected that all institutional repositories would eventually link to it. All institutions receiving funds from DBT and DST are expected to follow the policy mandatorily.
In a similar step, the Council of Scientific and Industrial Research (CSIR) issued an open access mandate [1] that instructs all CSIR laboratories to set up interoperable institutional open access repositories. Through this mandate CSIR took the initiative to "lead the open access movement within the country". The csircentral.net portal established by CSIR-URDIP is the central harvester that is expected to link the institutional repositories created by individual CSIR labs. The Indian Council of Agricultural Research (ICAR) has also created an open access policy, [2] which requires each ICAR institute to set up an open access institutional repository. It has also set up a one-stop access platform, known as Krishikosh, [3] to provide access to all the agricultural knowledge generated in ICAR. In 2018, the Delhi Declaration on Open Access [4] was signed by a group of academicians and open access enthusiasts; it advocated "the practice of open science" and "adoption of open technologies for the development of models for sharing science and scholarship". IndiaRxiv [5] of the Open Science project is the most recent addition to the open access cause. Open Access India first launched AgriXiv, a preprints repository for agriculture and allied sciences, and then launched IndiaRxiv. However, this preprints repository had only 67 preprints available as of 16 February 2020, since its launch in April 2019, and is yet to gain momentum.
Despite all these initiatives and efforts, India has lagged behind in ensuring open access to its research output. There is no proper estimate of how much of the research emerging from Indian institutions is in open access. It is in this context that this paper tries to measure the current status of open access in the research output of the 100 most productive institutions in India. These institutions provide a good estimate of the research from Indian institutions that is available in open access, as they together account for about 82% of the total research output from India for the year 2016 as indexed in Web of Science. Papers covered in WoS are regarded as a benchmark of quality by several institutions, and there is a general tendency for researchers to access papers that are in this database. To provide another perspective and complement this study, the volume of research papers in various institutional repositories (IRs) in India is also examined in detail.
In addition to computing open access levels of research output indexed in Web of Science and in institutional repositories in India, the study also presents statistics on the availability of these papers in Sci-Hub. Sci-Hub provides a questionable route for access to scientific papers, as it ignores copyright clauses to allow users to download papers that are not openly licensed content. Our motive in examining paper availability in Sci-Hub is not to endorse this type of database but to underscore that a lack of Open Access may lead to these types of behaviors. Apart from this, using this approach can lead to legal consequences for readers, authors and the institutions to which they are affiliated.
The paper focuses on three questions. First, how much scholarly research output from the 100 most productive Indian institutions is available in legal (such as gold, green and bronze) and black access models?
... Unpaywall for publications worldwide and found that about 28% of the scholarly articles are available in open access. Bosman and Kramer [17] collected data from Web of Science using its oaDOI service and explored open access levels across research fields, languages, countries, institutions, funders and topics, and found high variations in open access levels on all these dimensions.
There are, however, very few recent studies on open access levels in publications from India. Among the recent studies, Kumar and Mahesh [18] analyzed the Institutional Repositories in India as a means of providing open access to articles. They analyzed the statistics of paper submissions for about one year and found that submissions to the Institutional Repositories were very low, with some repositories not receiving a single paper in the whole year. They concluded that the Institutional Repositories in India have not really picked up in terms of papers deposited. Another recent article in Nature [19] discussed the newly proposed IndiaRxiv repository, with the caution that the performance of Institutional Repositories in India is not very good. Piryani et al. [20] is another recent study, which focused on measuring open access levels for India as a country. They analyzed the overall level of open access in Indian research output by taking data from the Web of Science for all publications during 2016. They concluded that the overall open access level in Indian research output is about 24% of the total output, which is less than the world average. Another study, by Singh et al., [21] analyzed Indian research output data in Web of Science for the period 2014-18 and obtained OA evidence from Unpaywall as well as the volume of papers available for free download from Sci-Hub. However, both of these previous studies looked only at overall data for India as a country and did not go down to the institutional level. Therefore, there is no evidence available about what proportion of research papers from different institutions in India are available as OA. Further, no recent analysis is available of the number of papers available in Indian IRs.

Data and Methodology
The 100 most productive Indian institutions (including institution systems) account for a total of 62,688 publications out of the total of 76,530 publications indexed in Web of Science for India for the publication year 2016. This is about 82% of the total research output from India for the year 2016 as indexed in Web of Science. All of these 62,688 publication records are then scanned one by one to find out whether they are available in open access on some platform.
... freely available to everyone. Then, there are paywalled journals that require readers to pay before accessing an article. But these journals may also make some articles freely available to everyone, either after payment of an article processing charge or after a particular period from the publication date. Open access articles are classified into the following main categories based on ...

Black open access:
This refers to an article that is shared on illegal pirate sites, such as Sci-Hub [6] or LibGen. [7] However, this type is not well recognized as open access in the literature.

Closed access:
This refers to all other articles that are not openly accessible in legal forms. The copyright is with the publisher and readers need to pay to access the paper.

Related Work
There exist several previous studies that tried to understand and characterize open access (OA) patterns in research output at the international level. Hajjem et al. [7] is one of the earliest studies to have analyzed open access availability of articles, and found that OA articles have comparatively more citations than non-OA ones. Bjork et al., [8][9][10][11][12][13] through multiple studies during 2010 to 2017, analyzed open access patterns in scientific publishing using varied data. Archambault et al. [14,15] analyzed the proportion of open access peer-reviewed papers at European and world levels for the 2004-2011 and 1996-2013 time periods, respectively. They showed that several countries, including Brazil, Switzerland, the Netherlands and the US, have more than 50% of their research articles freely available. Piwowar et al. [16] used three different samples of 100K articles each drawn from Web of Science (WoS), CrossRef and ...
... terms, only about 23% of the combined output of the 100 most productive institutions in India is available in open access. [10]
It can be seen that the PHY discipline accounts for the highest proportion (23%) of the total open access articles for all the institutions taken together. This is followed by the MED discipline with a 17% share and the MUL discipline with a 13% share. Thus, PHY, MED and MUL taken together contribute more than 50% of the open access articles in the combined output of the 100 institutions. The SS, MAT and (surprisingly) INF disciplines make a very low contribution to open access articles, possibly because they also have a lower volume of output. The PHY discipline is the most interesting case: it contributes only 12.25% of the combined output of the 100 institutions, but its contribution to open access articles is much higher, at 23%. One possible reason for this could be the existence of the well-known arXiv repository, where physicists are the key contributors.
After looking at the OA levels for the combined data of all institutions, we analyzed OA levels at the granularity of individual institutions. Table 1 presents the detailed data for all the 100 institutions (including institution systems). It shows the name and location of each institution, the number of records it has in Web of Science, the number of those records that are available in open access, and the percentage contribution that the institution makes to the total output as well as to the total open access articles from India. In order to understand open access levels in each of the institutions in more detail, we group institutions into three categories: OA_Low, OA_Med and OA_High, corresponding to open access percentage levels of below 25%, between 25% and 45%, and above 45%, respectively. It can be observed that out of the 100 institutions, 58 fall under the OA_Low category, 32 under the OA_Med category and 10 under the OA_High category. The statistics for these 100 institutions are also plotted in Figure 2 as a scatter plot, as an alternative way of visualizing and understanding them. Here, the x-axis denotes the number of papers ...
Thus, the data for analysis in this work is obtained from four sources: (a) Web of Science 3 (WoS) database, (b) Unpaywall [9] portal, (c) Sci-Hub and (d) main IRs in India.
For all the publication records downloaded from WoS, the OA evidence is obtained from the Unpaywall portal through API calls. The data were downloaded from WoS, and the OA evidence was obtained from the Unpaywall portal, in July 2019.
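The Unpaywall lookup described above can be sketched as follows. This is a minimal illustration and not the script used in this study: it assumes the public Unpaywall v2 REST endpoint, which takes a DOI and a contact e-mail address, and it assumes the returned JSON record carries the `is_oa` and `oa_status` fields. The function names `fetch_unpaywall_record` and `summarize_oa` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint pattern of the public Unpaywall v2 REST API.
UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def fetch_unpaywall_record(doi, email):
    """Fetch the Unpaywall JSON record for one DOI (requires network access)."""
    url = UNPAYWALL_URL.format(
        doi=urllib.parse.quote(doi),      # keeps "/" in the DOI, escapes the rest
        email=urllib.parse.quote(email),  # contact address passed as a query parameter
    )
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def summarize_oa(record):
    """Reduce an Unpaywall record to the fields needed for OA counting.

    Records without an explicit status are treated as closed access here;
    that fallback is an assumption of this sketch.
    """
    return {
        "doi": record.get("doi"),
        "is_oa": bool(record.get("is_oa", False)),
        "oa_status": record.get("oa_status") or "closed",
    }
```

In use, each DOI from the WoS export would be passed to `fetch_unpaywall_record` and the result to `summarize_oa`, with a pause between requests so as to stay within the service's usage limits.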
The publication record data from WoS was used to make an automated lookup in Sci-Hub website. The crawling was done using a custom Python script to identify which of the DOIs in our WoS data have full text available in Sci-Hub. Thus, for all the 62,688 publication records obtained from the WoS database, the Sci-Hub portal was queried and evidence of availability of full text was recorded.
Further, as observed by Das and Dutta, [22] there were 726 open access journals covered by WoS in 2017, which cover 7% of the total open access articles globally. It is therefore clear that capturing open access output from WoS alone would limit any study. For this reason, we also looked at data in IRs in India, as this data complements the study and provides a more informed assessment of the institutional research from India that is in open access. Data from the three main central portals and their associated IRs, as explained later, were obtained and analyzed.
The publication records data has been analyzed computationally by writing programs in Python and R. Standard computational methods are used for computing results and generating plots. As stated earlier, the OA evidence was obtained from Unpaywall through an API lookup in the portal. One important point of analysis in the paper is to look at disciplinary variations in open access. For this purpose, each publication record is tagged into one of the 14 broad research disciplines, as proposed in a previous work. [23] The Web of Science Category (WC) field is used for this tagging.
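As an illustration of the kind of computation involved, the sketch below computes a percentage share and assigns an institution to the OA_Low, OA_Med or OA_High category used in this paper (below 25%, between 25% and 45%, and above 45%). It is not the authors' code. The function names are hypothetical, and treating exactly 25% and exactly 45% as OA_Med is an assumption, since the text does not say how boundary values are handled.

```python
from collections import Counter


def percentage(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def oa_category(oa_percent):
    """Map an open access percentage to the paper's three categories.

    Boundary handling (25 and 45 both fall in OA_Med) is an assumption.
    """
    if oa_percent < 25.0:
        return "OA_Low"
    if oa_percent <= 45.0:
        return "OA_Med"
    return "OA_High"


def category_counts(institutions):
    """Count institutions per category.

    institutions: iterable of (name, total_records, oa_records) tuples.
    """
    return Counter(
        oa_category(percentage(oa, total)) for _name, total, oa in institutions
    )


# Share of India's 2016 WoS output covered by the 100 institutions,
# using the two counts reported in the paper.
share = percentage(62688, 76530)
print(f"Share of national output: {share:.1f}%")
```

Applied to the two counts reported for 2016, `percentage(62688, 76530)` reproduces the "about 82%" share quoted in the text; `category_counts` would be fed one tuple per institution from Table 1.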

Open Access Evidence from Unpaywall
The

Papers available in Sci-Hub
We also looked at how many papers from each institution are available as full text for free download on the popular pirate site Sci-Hub. Our argument is that if Indian papers are available in legal forms of open access, there will be less incentive for illegal access; researchers, more so in developing countries, are constrained by research funding and hence tend to exploit this type of resource (see for example Gresheke). [24] At the same time, it is equally interesting to note that Sci-Hub is frequently used for paper downloads in both developed and developing countries. Singh et al. [21] have also found that Sci-Hub seems to complement legal availability across disciplines too, with disciplines having low legal OA levels being preferentially covered in Sci-Hub.

Open Access through Institutional Repositories
Several organizations have adopted various measures to promote open access in India, including the creation of Institutional Repositories. The DST-DBT's sciencecentral.in, [11] CSIR's csircentral.net [12] and ICAR's krishikosh [13] are some of the most prominent central institutional repositories in the country. Various government organizations have now made it compulsory for their researchers and scientists to submit their research output to the relevant IRs. Different funding bodies are also increasingly promoting the cause of public access to publicly funded research. Grant award recipients are being encouraged to submit their research outcomes to the relevant IRs. Given these initiatives, IRs can play a major role in providing open access to research articles. We have, therefore, tried to analyze the number of research articles that are accessible by way of being deposited in prominent Indian IRs.
... of Science and in fact the actual publication volume would be much higher than that, it can be concluded that articles accessible through this IR will be less than even 15% of the total volume.
The csircentral.net IR is the other big central repository. All the IRs of the different labs that are part of the CSIR system are regularly harvested by this central harvester. We observe that it has about 30 IRs with a total of about 100,609 articles deposited. Table 4 shows the number of articles in each of these 30 IRs.
In this case also, taking the Web of Science publication count as the volume benchmark for India for the concerned period, it is seen that there are 747,759 papers in Web of Science, out of which less than 13% are available through this IR. Given that in actual practice the total publication volume for India would be much higher than the Web of Science indexed volume, this availability proportion is actually much lower than 13%.
Fourthly, the number of papers available in Sci-Hub from all the 100 institutions (including institution systems) is many times higher than the number available in legal open access forms. More than two-thirds of the institutions in the set have more than 90% of their papers available for free download in Sci-Hub. These results are in a sense an indication of the failure of the legal OA models. They indicate that there are either deterrents to the use of legal OA models or possibly some kind of apathy among researchers towards using them (such as depositing papers in institutional or disciplinary repositories).
Lastly, the analysis of major Indian IRs shows that the proportion of papers deposited in the IRs is very low and that IRs as such have not emerged as a significant source of open access availability of research articles. The analysis of the three major repositories, sciencecentral, csircentral and krishikosh, suggests that more efforts are needed to promote the IR culture in India. At the same time, a lot needs to be done to understand why IRs are not an attractive medium for Indian researchers to disseminate their research in open access. These efforts should not be limited to funding agencies; all research institutions should make it mandatory for their researchers to submit their papers (or pre-prints or post-prints) to the relevant IRs. Individual scientists should also be encouraged and incentivized to submit their papers to IRs. This would help not only other Indian researchers who do not have access to costly journal subscriptions, but also the researchers themselves, as their papers would find more citations and use.
taken together are 18,486, which would again be a very small proportion of Indian research output in the area.
The most recent initiative in the repository culture of open access in India is the creation of IndiaRxiv, hosted at indiarxiv.org. However, there were only 67 preprints available as of 16 February 2020, almost 10 months after the launch of the repository in April 2019. This repository is thus yet to gain popularity and momentum of deposits.

CONCLUSION
The paper tried to analyze the research output of 100 most productive institutions (including institution systems) in ...
It is quite clear from the study that much more needs to be done to promote open access culture in India, including the provision of incentives to researchers and institutions that help promote open availability of research output from Indian institutions. The study opens up new research questions, possibilities for expanding the scope of research and some important lessons.
One important aspect that this paper could not analyze is the amount of research output from India that is deposited in disciplinary repositories like arXiv or in academic social networks (such as ResearchGate). Academic social networks are now increasingly being used by researchers and are also indexed by Web search engines. Difficulties in accessing these platforms for automated crawling prevented us from analyzing data from them. However, we are working on finding solutions to this and hope to look at this aspect as well in future work.