Evolution of a research field—a micro (RNA) example

Background. Every new scientific field can be traced back to a single, seminal publication. Therefore, a bibliometric analysis can yield significant insights into the history and potential future of a research field. This year marks 21 years since that first ground-breaking microRNA (miRNA) publication. Here, we use bibliometrics to make the case that the miRNA field is mature. Methods. Utilising publication and citation information from the Web of Science™ (WoS) database, we charted the history of miRNA-related publications, describing and dissecting contributions by publication type (plus category, pay-per-view or open access), journal (highlighting dominant journals), country, citations and language. Results. We found that the United States of America (USA) publishes the most miRNA papers, followed by China and Germany. Significantly, publications attributed to the USA also receive the most citations per publication, followed by a close grouping of England, Germany and France. We also describe the relevance and acceptance of the miRNA field to different research areas, through its uptake in areas from oncology to plant sciences. Exploring the recent momentous change in publishing, we find that although pay-per-view articles vastly outnumber open-access articles, the citation rate of pay-per-view articles is currently less than double that of open-access articles. Conclusions. We believe the trends described here represent the typical evolution of a research field. By analysing publications, citations and distribution patterns, key moments in the evolution of this research area are recognised, indicating the maturation of the miRNA field and providing guidance for future research endeavours.


INTRODUCTION
With expanding scientific production and unprecedented access to information, the need for a means of independently assessing and analysing research output has become apparent. Consequently, various data analysis tools and internet-based search engines have been devised to enable the processing and organisation of this scientific output into a manageable form. Bibliometric parameters are one such tool utilised for this assessment. First described by Paul Otlet in 1934 (Rousseau, 2014), bibliometrics refers to the quantitative statistical analysis of publications to enable activity and dynamics within research fields to be mapped. The onset of the digital age revolutionised the manner in which scientific knowledge is produced and distributed. Gibbons et al. (1994) provide an account of this fundamental change in their concept of 'Mode 2' knowledge production. This refers to the development of a highly interactive and transdisciplinary research system that is socially distributed. Nowotny, Scott & Gibbons (2001) elaborated this concept still further, highlighting novel and contemporary scientific practices with an increasing range of 'knowledge producers.' Various theories of science evolution have been proposed, with some authors analogising the progression of research to the evolution of living organisms: the introduction of new concepts, development of novel research directions and the emergence and loss of hypotheses (Chavalarias & Cointet, 2013).
The field of microRNA research presents an exceptional opportunity to use bibliometrics to observe the progression of a novel area of scientific investigation from point of discovery to rapidly maturing field. The term microRNA (miRNA) was coined by Ruvkun (2001) to refer to a naturally occurring class of short, non-coding RNA molecules between 19 and 21 nucleotides long. Lee, Feinbaum & Ambros (1993) discovered the first miRNA in 1993, isolating Lin-4 from the nematode Caenorhabditis elegans. It took seven years before a second miRNA, Let-7, was discovered by Pasquinelli et al. (2000). The revelation that Let-7 sequence, expression and function were conserved across animal phylogeny (Pasquinelli et al., 2000), from nematodes to humans, resulted in a research revolution. Subsequently, thousands of miRNAs have been identified in eukaryotes, including plants, fish, fungi and mammals. In humans alone, approximately 2,555 unique mature miRNAs have been identified (http://www.mirbase.org/). While the exact function of all recognized miRNAs remains to be fully elucidated, they are known to regulate gene expression by binding target messenger RNA (mRNA), inhibiting translation or triggering mRNA degradation. Importantly, it has been demonstrated that, in addition to their inhibitory role, miRNAs can function to induce or activate transcript levels (Vasudevan, Tong & Steitz, 2007; Place et al., 2008). Through these mechanisms miRNAs fulfil a regulatory role in various cellular processes, including cell development, differentiation, proliferation and apoptosis (Place et al., 2008; Filipowicz, Bhattacharyya & Sonenberg, 2008; Bartel, 2004).
Deregulated miRNA expression patterns have been noted across all organisms, encompassing a wide spectrum of pathological processes, from immunological defects in fish and altered developmental phase transition and flowering time in plants, to neurodegeneration, cardiovascular disease and cancer in humans (Calin et al., 2002; van Rooij et al., 2006). Discovery of deregulated miRNA expression led to the hypothesis that miRNAs could potentially be used as diagnostic or prognostic markers of disease. Furthermore, miRNAs are attractive therapeutic targets for the treatment of various conditions, including cancer. The novel role of miRNAs and their importance to many different processes has led to an explosion of scientific enquiry into miRNA function. The aim of our study is to utilise one data analysis tool, Web of Science™, in conjunction with current theories of research evolution, to quantitatively analyse a novel field of investigation from its point of discovery, to outline its progression and potentially predict its future course.

Search terms and methods
The WoS database was searched utilising the terms "miRNA" and "microRNA" with the Boolean operator "OR". Upon analysis of initial search findings, conflicting results were identified, namely publications containing the following: mirna estuary, mirna bay, mirna river, mirna equation, Mirna A (author) and mirna SC. As such, the Boolean operator "NOT" was utilised to exclude these findings and refine our search results. Further to this, the research category of "Agriculture" was excluded from our search, as several inaccurate search results were identified within this classification.
Although recognised as the first miRNAs discovered, the initial publications pertaining to Lin-4 and Let-7 did not utilise the terms microRNA or miRNA (coined in Ruvkun, 2001); as such, these papers did not feature in the search results. To account for this, the Web of Science Core Collection database was searched for Lin-4 and Let-7 in isolation, utilising the Boolean operator "NOT" to exclude miRNA and microRNA, so as to prevent overlap with the original search results.
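The two searches described above can be sketched as Boolean query strings. The following is a minimal illustration only: the field tags `TS=` (topic), `AU=` (author) and `WC=` (category), the helper names, and the treatment of "mirna SC" as a topic phrase are all assumptions made for this example, since the exact query syntax entered into WoS is not given in the text.

```python
# Hypothetical sketch of the search logic; field tags and helper names are
# assumptions, not the exact strings used in the study.

def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def any_of(field, terms):
    """Join terms with OR inside a single field clause, e.g. TS=(a OR b)."""
    return f"{field}=(" + " OR ".join(quote(t) for t in terms) + ")"


def build_query(include, exclude_phrases=(), exclude_authors=(), exclude_categories=()):
    """Combine an inclusion clause with optional NOT clauses."""
    query = any_of("TS", include)
    if exclude_phrases:
        query += " NOT " + any_of("TS", exclude_phrases)
    if exclude_authors:
        query += " NOT " + any_of("AU", exclude_authors)
    if exclude_categories:
        query += " NOT " + any_of("WC", exclude_categories)
    return query


# Main search: miRNA OR microRNA, minus the unrelated hits listed in the text.
main_query = build_query(
    include=["miRNA", "microRNA"],
    exclude_phrases=["mirna estuary", "mirna bay", "mirna river",
                     "mirna equation", "mirna SC"],
    exclude_authors=["Mirna A"],
    exclude_categories=["Agriculture"],
)

# Supplementary search: early Lin-4 / Let-7 papers that predate the term miRNA.
early_query = build_query(
    include=["lin-4", "let-7"],
    exclude_phrases=["miRNA", "microRNA"],
)

print(main_query)
print(early_query)
```

Keeping the supplementary search disjoint from the main one (via the NOT clause) means the two result sets can simply be added without de-duplication.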
The publication timeframe analysed encompassed January 1993 to December 2013. Publications of all languages were accepted, comprising all peer-reviewed articles, including reviews, letters to the editor and editorials.

Data analysis
WoS data tools were utilised to perform certain elements of result analysis, e.g. generating Journal Citation Reports. Additional data analysis was performed using Microsoft Excel 2010© and Minitab version 16®.
N.B. The results returned from WoS upon searching study criteria were found to increase with the passage of time. This is thought to be due to delayed indexing of journals among other factors. As a result, some figures containing quantitative numbers may differ slightly among sections of this manuscript, when totalled.

Database
Several research platforms currently exist for examining bibliometrics, including the Web of Science™, HighWire© and PubMed™. Prior to commencing this study, the suitability of these databases was individually analysed to allow selection of the optimal data resource. Ultimately, WoS was chosen for the purpose of this study due to its superior journal coverage: approximately 9,300 journals, compared to HighWire (1,700) and PubMed (5,669). An additional significant factor was the return of a substantially higher proportion of results using the defined study search criteria (see section 'Search terms and methods').

Publication distribution
The number of published items identified pertaining to miRNAs, as catalogued in the Web of Science™ Core Collection database, totals 26,177 publications. The first publication occurred in 1993, with minimal additional publications (35)   with 25% of total miRNA publications (6,560). Of total publications identified, 99.2% of items were published in English (25,980 publications), with the remainder of articles in French (49 publications

Publications by country
To further analyse miRNA-related literature, the distribution of publications by country was determined. In total, 84 countries contributed to the miRNA literature, as outlined in  separating the United Kingdom into its constituent countries England alone would rank 6th (1,239 publications) behind the USA, the People's Republic of China, Germany, Japan and Italy (Fig. S1).

Research categories
Analysing the ontological research categories by which miRNA publications are classified in the WoS database, the top 50% of miRNA publications identified can be categorised into medicine and health sciences, which comprises Biochemistry Molecular biology (23.5%, 6,163 publications), Oncology (15.5%, 4,070 publications), Cell biology (15%, 3,923 publications) and Genetics heredity (10%, 2,554 publications) (Table 2). Further ontological categories of miRNA research listed include horticulture, marine science and entomology.
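The category percentages quoted above are shares of the 26,177 publications identified. A minimal sketch of that arithmetic follows; the variable and function names are hypothetical, and the counts are copied from the text. Because a single publication may be assigned to more than one category, the shares are computed independently rather than summed.

```python
# Counts are taken from the text; names are illustrative only.
TOTAL_PUBLICATIONS = 26_177

category_counts = {
    "Biochemistry Molecular biology": 6_163,
    "Oncology": 4_070,
    "Cell biology": 3_923,
    "Genetics heredity": 2_554,
}


def percentage_share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Share of all identified miRNA publications per category, to one decimal place.
category_shares = {
    name: round(percentage_share(count, TOTAL_PUBLICATIONS), 1)
    for name, count in category_counts.items()
}

for name, share in category_shares.items():
    print(f"{name}: {share}%")
```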

Peer-reviewed publications
The categories of research, as defined in Table 2, comprise miRNA publications that feature in various international peer-reviewed journals. Table 3 outlines the 25 most prolific journals publishing miRNA research (Table S1, 50 most prolific). In the 20-year period analysed, the youngest journal, Public Library Of Science ONE (PLoS ONE), delivered the largest number of miRNA-related publications at 1,589 (6% of total), with Nucleic Acids Research responsible for 489 publications (2% of total) and Proceedings of the National Academy of Sciences of the USA (PNAS) for 451 publications (2% of total). We found that the top 50 journals were responsible for 38% of total miRNA publications, with the top 10 journals alone representing 18% of total miRNA publications identified.

Document type
MiRNA publications comprise various document types including original articles, review articles, news items, editorial material, corrections, reprints, database  Table S2 shows an analysis of published document types for the top 10 journals which published miRNA-related documents. Consistent with the overall trend, articles featured most prominently, comprising 88% of publications for these top 10 journals (4,789 publications), followed by meeting abstracts at 8% (393 publications), reviews at 1.5% (73 publications), editorial material at 0.6% (28 publications) and corrections at 0.4% (20 publications). Proceedings papers, letters and database reviews comprise the remainder, providing minimal input.

Publication citations
For all miRNA-related publications, 837,898 citations were found by querying the WoS Core Collection Database. In concordance with publication numbers, citations per year increased exponentially, peaking in 2008, at which point the citation rate decreased (Fig. 4A). Ranking the total number of citations by country, publications originating from the USA received the highest total number of citations (n = 475,300), followed by China (n = 72,265), Germany (n = 71,051), Italy (n = 47,084) and England (n = 42,970) (Fig. 4B). Analysing the average citations per item for the top 10 countries publishing miRNA material, the USA retained the first position (44.3 citations per item). However, the remaining positions changed, with the second position now held by England (35.9 citations per item), followed by Germany (35.4 citations per item), France (34.4 citations per item) and Italy (34.2 citations per item). Interestingly, of the top 10 countries, China now displays the lowest number of citations per item (n = 14.5) (Fig. 4C). Examining the citations per item from the top 20 countries (as described in Table 1), the rankings change considerably, with Switzerland (13th) now displaying the highest number of citations per item at 55.1, followed by the USA (44.3 citations per item), then the Netherlands (39.6 citations per item), England (35.9 citations per item) and Sweden (35.8 citations per item) (Table S3). To further analyse the citation pattern, we investigated citations of the top 10 journals publishing miRNA material (Fig. 5). Proceedings of the National Academy of Sciences of the USA (PNAS) was the most frequently cited, with 6% of the total citations (46,112 citations), followed by Cancer Research, Nucleic Acids Research and PLoS ONE, each with 3% of total citations (26,763, 26,118 and 20,242 citations respectively). The final entry in this list is RNA A Publication of the RNA Society with 2% of the total citations (19,217 citations).
Considering citations per publication, Table S4 outlines the 10 most cited miRNA publications since the discovery of this research field. The top 3 publications cited featured in the journal Cell, with a cumulative total of 11,581 citations (1.4% of total citations).  (Table 4). Cell was still seen to contribute 3 of the top 10, with Nature, Nature Genetics and Nature Methods contributing 5 publications and PNAS one publication, with the addition of one publication by Science.
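The citation shares reported in this section are each a citation count divided by the 837,898 total citations. The sketch below reproduces that calculation for the journal counts and the top 3 Cell publications quoted in the text; the names are hypothetical. Because WoS counts drift over time (see 'Data analysis'), recomputed values need not round to exactly the percentages quoted.

```python
# Citation counts are copied from the text; names are illustrative only.
TOTAL_CITATIONS = 837_898

journal_citations = {
    "PNAS": 46_112,
    "Cancer Research": 26_763,
    "Nucleic Acids Research": 26_118,
    "PLoS ONE": 20_242,
    "RNA": 19_217,
}

# Cumulative citations of the three most cited publications (all in Cell).
TOP3_CELL_CITATIONS = 11_581


def citation_share(citations, total=TOTAL_CITATIONS):
    """Return citations as a percentage of all miRNA-related citations."""
    return 100.0 * citations / total


journal_shares = {name: citation_share(n) for name, n in journal_citations.items()}
top3_cell_share = citation_share(TOP3_CELL_CITATIONS)

for name, share in journal_shares.items():
    print(f"{name}: {share:.1f}% of total citations")
print(f"Top 3 Cell publications: {top3_cell_share:.1f}% of total citations")
```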

Open access versus pay-per-view
Of the miRNA-related publications identified, 17% were open access (n = 4,560), with 83% pay-per-view access (n = 22,788) (Fig. 6A). Analysing the citations of these two categories of publication, open access items were cited  (Fig. 6C).

Hallmarks of miRNA research
Utilising the bibliometric data retrieved, we identified the key discoveries in the field of miRNA research. The seminal miRNA publication by Lee, Feinbaum & Ambros (1993), outlining the discovery of these short RNA molecules, is certainly the first hallmark of miRNA research. Subsequent to this, recognition of the conservation of miRNA sequence and expression across animal phylogeny, from nematodes to humans, by Pasquinelli et al. (2000), together with the identification of further miRNAs, also represents a key advancement in this field. As previously discussed (section 'Publication citations'), these papers feature 3rd and 28th respectively in the most cited miRNA publications, highlighting their visibility and influence. Following these crucial findings, discovery of the regulatory roles of miRNAs in various cellular processes, from differentiation to apoptosis, should be considered highly significant in furthering our understanding of the functionality of these short RNA molecules (Place et al., 2008; Filipowicz, Bhattacharyya & Sonenberg, 2008; Bartel, 2004). The next, related key event in the miRNA field was the discovery of deregulation of miRNA expression associated with human diseases, such as cancer (Calin et al., 2002; van Rooij et al., 2006; Schaefer et al., 2007). This discovery raises a potential use of miRNAs as both predictive and prognostic biomarkers of disease. At present, multiple clinical trials are registered with ClinicalTrials.gov, investigating the ability of miRNAs to function as biomarkers of disease and of response to current therapies.
Another key event in the evolution of the field has been the discovery that miRNAs are capable of extracellular signalling (Valadi et al., 2007). This novel finding significantly added to our knowledge of the mechanisms that can be employed for cell-cell signalling and communication.
The most recent development in the field that should be considered significant is the therapeutic use of miRNAs as targeted therapies to modulate disease (Kota et al., 2009). The attainment of personalised disease management via the use of miRNAs is highly appealing, though many obstacles currently remain, including identification of optimal delivery methods, off-target effects and safety.

DISCUSSION
To our knowledge, this work represents the first bibliometric analysis of the miRNA field. Our analysis revealed an exponential increase in research output, with yearly publications more than quadrupling between 2005 (n = 356) and 2008 (n = 1,672), and increasing eighteen-fold by 2013 (n = 6,560) (Fig. 1). In describing phylomemetic patterns of science evolution, Chavalarias & Cointet (2013) outline the importance of 'special events' in the progression of a research field, with scientific output increasing significantly during the subsequent time period. Special events pertaining to the miRNA field can be identified by analysing the most cited of all primary research miRNA publications since the discovery of this research field (Table 4). The timeline covered by the 10 most cited primary research miRNA publications spans from 1993 to 2008, highlighting the continual key discoveries being made in the miRNA field. Of these 10 publications, 3 were published in the journal Cell, 5 in Nature or one of its subsidiary journals, one in PNAS and one in Science. The average journal impact factor of the 10 most cited primary research miRNA publications is 27.8, highlighting their visibility and influence as a driving force behind the exponential increase in yearly miRNA publications. Currently, one key event in the evolution of the field was identified that was not found by our bibliometric analysis: miRNA involvement in extracellular signalling (Valadi et al., 2007). This publication did not make it into our current top 10 (although it has >1,200 citations), possibly due to its relatively recent publication date (2007). However, we anticipate that this publication will in future enter the top 10 most cited papers in the field.
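The growth figures quoted above (more than quadrupling from 2005 to 2008, and an eighteen-fold increase by 2013) follow directly from the yearly counts. The sketch below recomputes those fold changes and, as an additional illustration not reported in the study, the constant yearly growth factor that would be implied if growth were perfectly exponential over 2005 to 2013. Function and variable names are hypothetical.

```python
# Yearly publication counts as reported in the Discussion (Fig. 1).
yearly_publications = {2005: 356, 2008: 1_672, 2013: 6_560}


def fold_change(counts, start, end):
    """Ratio of publications in the end year to publications in the start year."""
    return counts[end] / counts[start]


def mean_annual_growth_factor(counts, start, end):
    """Constant yearly multiplier that would turn counts[start] into counts[end].

    This assumes idealised exponential growth; it is an illustration,
    not a statistic reported in the study.
    """
    return fold_change(counts, start, end) ** (1.0 / (end - start))


fold_2005_2008 = fold_change(yearly_publications, 2005, 2008)
fold_2005_2013 = fold_change(yearly_publications, 2005, 2013)
growth_2005_2013 = mean_annual_growth_factor(yearly_publications, 2005, 2013)

print(f"2005 to 2008: {fold_2005_2008:.1f}-fold")
print(f"2005 to 2013: {fold_2005_2013:.1f}-fold")
print(f"Implied mean yearly growth factor, 2005 to 2013: {growth_2005_2013:.2f}")
```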
The trend of miRNA publications therefore adheres to Chavalarias and Cointet's association, with output increasing exponentially in concordance with the hallmarks of their brief history: discovery (Lee, Feinbaum & Ambros, 1993), recognition of deregulation in cancer (Calin et al., 2002), cardiovascular disease (van Rooij et al., 2006), autoimmune and neurodegenerative disease (Schaefer et al., 2007; Sonkoly et al., 2007), potential use as disease biomarkers (Cortez & Calin, 2009), expression in serum/plasma (Cortez & Calin, 2009) and use of anti-miRs as targeted therapy (Kota et al., 2009).
While 84 countries have contributed to miRNA literature to date, five countries dominate scientific production in this research field (Table 1). The USA, China, Germany, the United Kingdom and Japan are collectively responsible for more than 80% of all current miRNA literature with 96% of total citations. The distribution of citations per country differs considerably when considering average citations per publication. Switzerland exhibits the highest number of citations per item, with the USA featuring 2nd, followed by a grouping of the Netherlands, England, Sweden and Germany (Fig. 4C). While China features second in the top 10 countries publishing miRNA material, it exhibits the lowest number of citations per publication (n = 14.5), reflecting a large portion of uncited literature. Further analysing citations by journal access type (open access journals to pay-per-view), it was observed that pay-wall restricted journals, which represented the majority of publications (83%), accounted for 90% of total citations. Interestingly, the average citation per publication was 1.7 times higher for pay-wall restricted journals compared to open access journals (Fig. 6C). While this adheres to currently observed citation patterns in the literature, due to the growing popularity of open access journals and open-access publication requirements from funding agencies, it is proposed that this discrepancy will no longer be apparent in future years, with open-access publications and citation numbers currently increasing (Bjork & Solomon, 2012).
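The comparison of average citations per publication between access types can be approximated from figures given in the text: the publication counts from the Results (22,788 pay-per-view, 4,560 open access) and the 90% share of citations attributed to pay-wall restricted journals. The sketch below is an illustration under those rounded inputs, with hypothetical names; the total citation count cancels out of the ratio, and because the 90% figure is rounded, the result (roughly 1.8) is only expected to approximate the reported value of 1.7.

```python
# Publication counts from the Results; citation share from the Discussion.
pay_per_view_pubs = 22_788
open_access_pubs = 4_560
pay_per_view_citation_share = 0.90  # rounded figure quoted in the text
open_access_citation_share = 1.0 - pay_per_view_citation_share


def citations_per_publication_ratio(share_a, pubs_a, share_b, pubs_b):
    """Ratio of average citations per publication, group A relative to group B.

    Each average is (share * total_citations) / pubs; total_citations cancels
    in the ratio, so only shares and publication counts are needed.
    """
    return (share_a / pubs_a) / (share_b / pubs_b)


ratio = citations_per_publication_ratio(
    pay_per_view_citation_share, pay_per_view_pubs,
    open_access_citation_share, open_access_pubs,
)

# Share of publications that are open access, for comparison with the 17% quoted.
open_access_pub_share = 100.0 * open_access_pubs / (open_access_pubs + pay_per_view_pubs)

print(f"Open access share of publications: {open_access_pub_share:.0f}%")
print(f"Pay-per-view vs open access citations per publication: {ratio:.2f}x")
```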
Of the miRNA publications identified in the WoS database, it is interesting to note that 69% of all documents were original articles, reflecting the relative youth of this novel field of investigation, with only ∼10 years of sustained multi-group research efforts. In an analysis of the progression of a field of science, Bonaccorsi (2008) outlines increasing diversity within research paradigms as an instrumental factor, attributing this to various scientific hypotheses and the investigative techniques applied to examine them. With the discovery of a definitive role for miRNAs in the pathogenesis of multiple disease processes, the predominance of medicine and health sciences becomes apparent (Table 2). Categories of published research span the gamut of medical domains, from immunology to oncology, haematology to virology and neuroscience to surgery. While further areas of investigation such as entomology and agriculture feature, their contribution to overall research output is currently negligible.
In an analysis of the dynamic interest in research topics within the biomedical scientific community, Michon & Tummers (2009) identified trends that are exemplified by our analysis of miRNAs. When novel research is initially published, it generally appears in high impact journals, followed by a lag in scientific output, prior to subsequent progression of publications. The initial miRNA publication outlining the discovery of lin-4 appeared in 1993 in the journal Cell, which then held an impact factor of 37.2. Subsequently, however, miRNA output did not progress significantly until 2003. As miRNA publications began to escalate, the average journal impact factor of the top 10 publishing journals was 13.6. Five years later, with miRNA output increasing still further, the equivalent impact factor had decreased to 9, with a further drop to 7.4 by 2012. With increasing scientific output, a shift towards lower impact journals is seen, producing a long-tail distribution of publishing when viewed by host journal impact factor. Originally described by Vilfredo Pareto, a social economist, the long-tail distribution can refer to a number of observable phenomena. In this context, it is used to describe a publishing pattern whereby high and medium impact factor journals feature in the minority, with the majority of journals having minimal citation impact (Michon & Tummers, 2009). Presence of this distribution is recognised as a sign of acceptance of a research topic as valid within the scientific community (Michon & Tummers, 2009). Once this stage of publication saturation is reached, Pfeiffer & Hoffmann (2007) advocate the development of novel research directions within a given field as particularly advantageous, with pioneering work potentiating publication in high impact journals, and thus returning the cycle to the beginning of the long-tail distribution once more.

CONCLUSION
When we consider the ongoing remodelling of scientific production, our analysis of publication trends, citations and distribution patterns was very informative. Recognising the developmental stage of a particular research field provides researchers with direction and guidance, both in current and future investigative goals. The current unprecedented access to scientific material and bibliometric information provides an opportunity to analyse the dynamics of scientific landscapes, enabling the production of informed, targeted scientific outputs.

Abbreviations
miRNA  microRNA
WoS  Web of Science
PLoS ONE  Public Library Of Science
PNAS  Proceedings of the National Academy of Sciences of the USA

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
Maire-Caitlin Casey, James A. Brown and Michael J. Kerin are funded by BREST-PREDICT and the National Breast Cancer Research Institute (NBCRI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.