A Bibliometric Analysis of Linguistics Publications in the Web of Science

This article investigates the world's publications in the category of linguistics based on Thomson Reuters' Social Sciences Citation Index (SSCI) from 2005 to 2014. Research achievements were recorded and analyzed, including publication year, characteristics of journals, productivity of authors, institutions, and countries, and the citation life of highly cited articles. Results showed that while the number of publications increased dramatically, the average numbers of authors, cited references, and pages per publication per year remained quite stable. The USA was the most productive country on all the bibliometric indicators (total publications, single-country articles, and nationally and internationally collaborative articles), and the University of Illinois in the USA ranked first among institutions on all these indicators. M. Onslow from the University of Melbourne was the most productive author. Journal of Memory and Language was the most productive venue of publication; within it, the article by Baayen et al. (2008) was the most highly cited. Implications of the findings for researchers and research policy makers are highlighted at the end of the article.


INTRODUCTION
The Web of Science database is arguably the leading standard index of research performance across different disciplines. Hence, it is crucial to assess research performance within a category to see the productivity of academics, countries, and institutions to date. [1,2] Bibliometric analysis is an efficient way to quantify the quality of published work for organizations, authors, and countries by analysing data obtained from several indices and converting them into numerical figures. Research has analyzed the output indexed in SSCI across different fields such as business, business finance, economics, and management, [3] nursing, [4] social work, [5] health care sciences and services, [6] psychiatry, [7] and information science and library science. [8] Bibliometric analyses have also been carried out on research indexed in the Arts & Humanities Citation Index (A&HCI) to evaluate the quality of published work in various disciplines, for example dance [9] and religion. [10]
The linguistics category, like other categories, has a long history, dating back to the work of von Humboldt and Saussure in the 19th century and to Bloomfield in the early 20th century. [11] Yet this work had not been documented in research papers. The first leading periodical in the linguistics category indexed in SSCI is The Modern Language Journal, which was launched in 1916, starting with 29 documents (12 articles and 17 book reviews), and had accounted for 11% of the world's publications by 2014. One year later, the International Journal of American Linguistics was created, and the journal Language appeared in the mid-1920s, covering 193 documents up to 1930. In the subsequent decades (i.e., the 1930s, 1940s, and 1950s), 12,949 documents were published in 14 journals. Publications increased tremendously in the 1960s, amounting to 98% of the publications that had appeared in the period 1930-1960. These dramatic changes could be ascribed to the work of Noam Chomsky, whose well-known innatist theory was a reaction to the behaviorist views of language acquisition proposed by Skinner earlier in the 20th century. Publications continued to increase over the decades, in parallel with the growth in the number of publication venues, reaching a peak in the first decade of the new millennium: 34,130 records published in 188 journals, covering several sub-areas of linguistics such as applied linguistics, theoretical linguistics, psycholinguistics, neurolinguistics, computational linguistics, and translation. It is therefore of paramount importance to analyze the productivity of the linguistics category across countries, authors, institutions, and publication venues to see the world's contributions in the last ten years.
We claim that no single bibliometric study has thoroughly investigated the world's publications in the field of linguistics. There was a bibliometric study in this regard, [11] but it focused solely on a comparison of linguistics publications between SSCI and A&HCI. Arik [11] did not analyze other factors such as citations per publication or productivity across nations, authors, areas of research, and organizations. There was also an attempt by Mohsen [12] to examine the attitudes of researchers towards publishing in the Web of Science, but it was descriptive and subjective. This bibliometric study aims to investigate thoroughly the world's publications in the linguistics category in SSCI from 2005 to 2014 to see how this category has been researched and what research trends emerged over the ten-year period of examination.

METHODOLOGY
The analysis provided in this study is based on the Social Sciences Citation Index (SSCI) database of the Web of Science Core Collection of Thomson Reuters (updated on 5 February 2016). According to the Journal Citation Reports (JCR) of 2014, there are 3,154 journals with citation references across 56 scientific disciplines in the SSCI edition, 172 of which are listed in the category of linguistics. A total of 2,261,354 documents from 2005 to 2014 were found in SSCI. Results were refined by selecting the Web of Science category of linguistics (50,351 documents). Therefore, 2.2% of the total documents published in the Web of Science belong to the category of linguistics, including articles, book reviews, editorial materials, proceedings papers, reviews, meeting abstracts, biographical items, corrections, letters, bibliography, news items, book chapters, software reviews, reprints, books, and hardware reviews. A total of 33,479 articles were retrieved for further analysis. We chose articles for this bibliometric analysis because they contain descriptions of complete research and results. [13] Data about those articles and the total annual citations for each article were downloaded. All results were analyzed using Microsoft Excel 2013. [14] The total number of citations from the Web of Science Core Collection from publication to the end of 2014 was referred to as TC2014. [15,16] The advantage of TCyear is that it is an invariant parameter, thus ensuring repeatability, in contrast to the citation count shown in the Web of Science Core Collection, which is updated from time to time. [17] Citations per publication (CPP2014) is TC2014 divided by the total number of publications. [18,19]
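The CPP indicator defined above can be illustrated with a minimal Python sketch. The function name and the example totals are hypothetical and are not taken from the data set; the study itself reports using Microsoft Excel rather than code.

```python
def citations_per_publication(total_citations: int, total_publications: int) -> float:
    """Return CPP = TC / TP for one year or one group of articles."""
    if total_publications <= 0:
        raise ValueError("total_publications must be positive")
    return total_citations / total_publications


# Hypothetical totals for a single publication year (not from Table 1).
example_tc2014 = 8500   # citations received up to the end of 2014
example_tp = 500        # articles published in that year
print(citations_per_publication(example_tc2014, example_tp))  # 17.0
```

Because TC2014 is frozen at the end of 2014, rerunning the calculation later on the same export should give the same CPP2014.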

RESULTS AND DISCUSSION
We analyze the data in terms of publication year, productivity of authors, countries, and institutions, source titles, and citations per publication. All acronyms are explained beneath the tables.

Publication Year
Several characteristics of the journal articles were examined for the timespan set for this study [20]. Table 1 shows that the number of publications increased dramatically: the number of publications in 2014 was almost four times that in 2005. The numbers of authors, cited references, and pages also increased tremendously. However, the average of each indicator in proportion to the number of publications per year remained quite stable. For example, there was a 261% increase in the number of authors, but the average number of authors per publication remained almost the same, from 2.2 in 2005 to 2.1 in 2014, with an overall average of 2.0. Similarly, the average number of cited references per publication did not change significantly, at 45 in 2005 and 49 in 2014, with an overall average of 46 for the whole period. The average number of pages per publication also remained nearly the same, at 21 pages in 2005 and 20 pages in 2014. There was a steep decline in the average citations per publication (CPP2014) over the ten-year period, from a peak of 17 in 2005 down to 0.25 in 2014.
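The contrast between growing totals and stable per-article averages can be made concrete with a short sketch. The counts below are hypothetical placeholders chosen only to mimic the pattern described (roughly fourfold growth in articles with about two authors per article); they are not the values in Table 1.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100.0


def average_per_article(total: float, articles: int) -> float:
    """Average of an indicator (authors, references, pages) per article."""
    if articles <= 0:
        raise ValueError("articles must be positive")
    return total / articles


# Hypothetical yearly totals: (articles, authors).
first_year = (1000, 2200)
last_year = (3800, 7980)

growth = percent_increase(first_year[1], last_year[1])
avg_first = average_per_article(first_year[1], first_year[0])
avg_last = average_per_article(last_year[1], last_year[0])
print(f"authors grew by {growth:.0f}%, average {avg_first:.1f} -> {avg_last:.1f}")
```

A large rise in total authors is therefore compatible with an almost unchanged number of authors per article, as long as the number of articles grows at a similar rate.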

Characteristics of the Venue of Publications
Only the top 15 journals were selected for analysis of their quantity and quality. To analyse quantity, the number of publications of each journal was counted, and its percentage was calculated out of the total for the 195 journals indexed in the linguistics category between 2005 and 2014. Additionally, journal quality was assessed by impact factor in the Journal Citation Reports (JCR) 2014. Results indicated that Journal of Pragmatics was ranked first in number of publications (1,349

Institutions and Countries
Publications across institutions and countries were analysed and compared to identify the most productive institution and country in the category of linguistics. 3 was 54 countries, while 4,828 (15%) were internationally collaborative publications from 120 countries. In recent years, Ho and co-workers proposed several indicators to examine the publication performance of countries and institutions, including total publications (TP), independent publications (IP), collaborative publications (CP), first-authored publications (FP), corresponding-authored publications (RP), and single-authored publications (SP). [21] Table 3

Citation Life Cycles of Highly Cited Articles
The citation history of papers gives more detail on the impact of articles [16]. In the last decade, citation life cycles of highly cited articles have been discussed for a number of research topics [17,21]. A high citation count indicates greater visibility of the cited document: the higher the citation count, the more value a document is likely to gain. Figure 1 shows the highly cited articles in the linguistics category. As shown in Table 5 and Figure 1, the article entitled "Mixed-effects modeling with crossed random effects for subjects and items" by Baayen et al. [23], published in Journal of Memory and Language, ranked first on almost all the citation parameters: total citations since publication to the end of 2014 (TC2014 = 1,172), total citations in 2014 (C2014 = 338), and total citations per year (TCPY = 167). Baayen et al. [23] introduced mixed-effects models for the analysis of repeated measurement data with subjects and items as crossed random effects. R.H. Baayen [23], both first and corresponding author, from the University of Alberta in Canada, published an article with a "distinguished pattern" [24,25] in the linguistics field (Figure 1). A steep slope could be found with the distinguished pattern of citations per year [24]. The article entitled "Random effects structure for confirmatory hypothesis testing: Keep it maximal" [26], published in Journal of Memory and Language, ranked first in total citations in the publication year (C0 = 26) and second in total citations in 2014 (C2014 = 172). This paper stated that researchers using linear mixed-effects models (LMEMs) for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades [26]. Furthermore, Jaeger's [27] article entitled "Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models" ranked second and third in TC2014 and C2014 respectively. This paper identified several serious problems with the widespread use of ANOVAs for the analysis of categorical outcome variables such as forced-choice variables, question-answer accuracy, choice in production, et cetera [27]. The third rank in TC2014 went to Budanitsky and Hirst's [28] article entitled "Evaluating WordNet-based measures of lexical semantic relatedness", which was published in the journal Computational Linguistics. Budanitsky and Hirst's [28] article also took fourth place in total citations per year (TCPY = 32). Table 5 also shows that the article by Naeser et al. [29] entitled "Improved picture naming in chronic aphasia after TMS to part of right Broca's area: An open-protocol study", which was published in the journal Brain and Language, ranked fourth in TC2014. Most of the top cited articles had a low C0, which means that the citations received immediately after publication did not influence the total citations. It is worth mentioning that the journals which received high citations, and thus obtained high impact factors, intersect with other categories such as neuroscience, audiology and speech language pathology, psychology, computer science, and artificial linguistics.
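The total citations per year (TCPY) indicator is not formally defined in this section. A minimal sketch, assuming TCPY is TC2014 divided by the number of years from the publication year to 2014 inclusive, reproduces the reported figures for the 2008 article by Baayen et al. (TC2014 = 1,172, TCPY = 167). The inclusive-year convention and the function name are assumptions made for illustration.

```python
def total_citations_per_year(total_citations: int,
                             publication_year: int,
                             end_year: int = 2014) -> float:
    """Assumed definition: TC divided by years elapsed, counting both end points."""
    years = end_year - publication_year + 1
    if years <= 0:
        raise ValueError("publication_year must not be after end_year")
    return total_citations / years


# Values reported in the text for the article published in 2008.
tcpy = total_citations_per_year(1172, 2008)
print(round(tcpy))  # 167
```

Under a different convention (for example, excluding the publication year) the result would differ, so the counting rule should be stated alongside any TCPY value.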
The higher an author's j, the more likely the author is to be first in the list of authors or to act as the corresponding author. Being the first or corresponding author means playing the major role in leading the publication. The value of h is a publication characteristic constant that differentiates the nature of the leadership role. When h > 0.7854, an author has more corresponding-author publications, and when h < 0.7854, more first-author publications. When h = 0, j is the number of first-author publications, and when h = π/2, j is the number of corresponding-author publications. The Y-index has a limitation for the analysis of publication characteristics of authors and institutions: only articles with both first-author and corresponding-author information can be considered. A total of 33,244 articles by 34,308 authors in the linguistics field were analysed. Figure 2 displays the distribution of the top 32 authors with j ≥ 25 (j cos h and j sin h are chosen as the x and y coordinate axes). Each dot represents one value, which could correspond to one author or many authors.
The author with the most potential to publish articles in the linguistics field was M. Onslow (j = 43), followed by R.A. van Compernolle (j = 42), and L.B. Leonard and S. Montrul (j = 40 each). The publication characteristics constant, h, helps to distinguish the proportion of corresponding-author articles to first-author articles. The advantage of the Y-index is that when authors have the same j, their publication characteristics can be distinguished by h. For example, L.B. Leonard and S. Montrul both had j = 40 (Figure 2). However, h of Leonard was 0.9343 whereas h of Montrul was 0.7854; Leonard thus had a greater proportion of corresponding-author articles to first-author articles than Montrul. Among these 32 authors, T.A. Hall was the only author with h < 0.7854 (h = 0.7454), that is, with more first-author than corresponding-author articles, indicating that the top productive authors contributing to the linguistics field were more likely to be designated as corresponding authors. Of the 32 authors, 17 had more corresponding-author articles than first-author articles. The remaining 14 authors were exactly on the boundary line (h = 0.7854), having equal numbers of first-author and corresponding-author articles. Figure 2 shows they had different publication potential. These authors probably contributed more to the initial conception and supervision of the study. [34] A potential bias in the analysis of authorship might occur when different authors have the same name or when authors used different names over time in their articles. [20]
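The defining equations of the Y-index are not reproduced in this section. A minimal sketch, assuming the commonly used definition j = FP + RP and h = arctan(RP/FP), where FP and RP are the numbers of first-author and corresponding-author articles, is consistent with the boundary values stated above (h = 0 with first-author articles only, h = π/2 with corresponding-author articles only, and h = π/4 ≈ 0.7854 when the two counts are equal). The function name and the example counts are illustrative assumptions; the counts were chosen only because they reproduce a reported pair (j = 40, h = 0.9343), and the actual FP and RP values are not given in the text.

```python
import math


def y_index(fp: int, rp: int) -> tuple[int, float]:
    """Return (j, h) for fp first-author and rp corresponding-author articles.

    Assumed definition: j = fp + rp and h = arctan(rp / fp), so 0 <= h <= pi/2.
    """
    if fp < 0 or rp < 0 or fp + rp == 0:
        raise ValueError("counts must be non-negative and not both zero")
    j = fp + rp
    # atan2 covers fp == 0 (h = pi/2) without a division by zero.
    h = math.atan2(rp, fp)
    return j, h


# Hypothetical counts: 17 and 23 do not come from the source.
j, h = y_index(fp=17, rp=23)
print(j, round(h, 4))  # 40 0.9343

# Coordinates as plotted in a figure with axes j cos h and j sin h.
x, y = j * math.cos(h), j * math.sin(h)
```

With this definition the plotted coordinates satisfy x² + y² = j², so j is the distance of an author's dot from the origin and h is its angle.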

Keywords Analysis
In recent years, Ho and co-workers proposed an indicator to infer research trends and their main foci. [24,35] The distribution of title words, words in abstracts, author keywords, and Key Words Plus over different periods has been thoroughly examined to elucidate research focus. [36,37] Key Words Plus in the Web of Science database supplies additional search terms extracted from the titles of articles cited by authors in their bibliographies and footnotes. [38] Word clusters have been further applied to determine research foci and to characterize article lives in subsequent years. [24,35] Among the 33,479 articles in the linguistics field indexed in SSCI, 31,894 articles (95%) had abstract information, 24,248 articles (72%) had author keyword information, and 22,748 articles (68%) had Key Words Plus information; these were analyzed.

Words in titles
One way of measuring research tendencies is to analyze words in titles. Authors select title words carefully so that titles appear thought-provoking and draw readers' attention to the published document, allowing readers to detect the research focus, the variables investigated, and the research environment.
The second most used word was "English", which indicated that research had mostly investigated the English language, as it is widely spoken and written worldwide. "Spanish" was also frequently repeated in research titles, ranking 8th. Words such as "learning", "speech", and "children" showed that most research in the target 10-year period focused on speech pathology and language acquisition.

Author keywords
Author keyword analysis may reveal the areas of greatest interest to researchers. Recently, the submission management systems used by journals ask authors to supply keywords, which are later used by journal editors to assign reviewers to new submissions.
Author keywords have been used in recent bibliometric studies [36] to track research hotspots and major directions of scientific research. [16] Author keywords appearing in the articles were counted and ranked for the total 10-year period and for 2-year sub-periods. In total, 49,499 author keywords were used. Table 6 shows the 20 most frequently used keywords.
The top 20 author keywords are mostly shared by different areas such as "speech pathology", "applied linguistics", and "discourse analysis". Two keywords were used exclusively in the "speech pathology" area: "aphasia", which ranked seventh among author keywords, and "stuttering", which ranked ninth (Table 6). It has been argued that author keyword analysis may have limitations for generalizing findings about research trends owing to synonymous words, acronyms, and spelling variations. To address this problem, we checked the words retrieved from the Web of Science for such cases in our data set. We found two synonymous keywords in the data set: "second language" and "L2".
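The counting and synonym check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table contains the single pair reported in the text, the normalization rules (lower-casing, collapsing spaces) are hypothetical, and the sample records are invented.

```python
from collections import Counter

# Assumed synonym table; only the "L2" / "second language" pair is from the text.
SYNONYMS = {"l2": "second language"}


def normalize(keyword: str) -> str:
    """Lower-case, collapse whitespace, and map known synonyms to one form."""
    key = " ".join(keyword.lower().split())
    return SYNONYMS.get(key, key)


def rank_keywords(articles: list[list[str]], top_n: int = 20) -> list[tuple[str, int]]:
    """Count the number of articles using each keyword and return the top_n."""
    counts: Counter[str] = Counter()
    for keywords in articles:
        # A set ensures each keyword is counted once per article.
        counts.update({normalize(k) for k in keywords})
    return counts.most_common(top_n)


# Invented sample records, one keyword list per article.
sample = [
    ["L2", "aphasia"],
    ["second language", "Stuttering"],
    ["Second  Language", "L2"],
]
print(rank_keywords(sample, top_n=1))  # [('second language', 3)]
```

Without the synonym mapping, "L2" and "second language" would be ranked as separate keywords and each would appear less frequent than the merged concept.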

Key Words Plus
Key Words Plus is examined in bibliometric analysis to show novelty in research [36]. According to Garfield [38], Key Words Plus terms are extracted from the titles of documents cited in authors' bibliographies or footnotes in the Web of Science database, augmenting the indexing provided by title words and author keywords. Key Words Plus were counted for the 10-year period and for two-year sub-periods. Like the author keywords, the most frequent Key Words Plus were used across multiple areas of linguistics such as "applied linguistics", "speech pathology", "discourse analysis", and "theoretical linguistics". For example, the words "language", "English", "acquisition", "children", and "comprehension" were used very frequently in the targeted articles. The word "language" ranked first among Key Words Plus during the 10-year period, followed by "English" in second place. The word "acquisition" ranked third and relates exclusively to the area of first or second language acquisition (i.e., a sub-area of applied linguistics). All together

CONCLUSION
An overview of the world's research in the category of linguistics during 2005-2014 was presented in this bibliometric study. Several issues have been investigated to provide insights into the research tendencies of the world's publications in linguistics during the last ten years. We observed a steady increase in publications in the category of linguistics, while the average numbers of authors, cited references, and pages per publication remained quite stable. The USA, UK, Canada, Spain, and Germany were the most productive countries in linguistics publications. The USA took the leading position in total publications along with high citations per year, followed by the UK. Highly cited venues and highly cited articles were those that studied the area of speech pathology, which intersects with other areas such as psychology and neuroscience. Research trends, as shown by the analysis of words in titles, author keywords, and Key Words Plus, indicate that speech pathology was the most researched area in the linguistics category. These findings have implications for research policy makers seeking to increase publications in the category of linguistics, particularly in Asian countries, where research in this area is still scarce. There is a myriad of languages in Asian countries which are worth investigating by linguistics researchers to enhance the quantity and quality of research.

Figure 1: Citation lives of the top six publications with TC2014 ≥ 220.

Table 1 : Characteristics of journal articles in Web of Science category of linguistics, 2005-2014.
TP: total number of articles; AU: number of authors; NR: number of cited references; PG: number of pages; TC2014: total citations since publication to the end of 2014; CPP2014: TC2014/TP.

articles, 4.0% of 33,479 articles). Concerning journal quality, Journal of Memory and Language obtained the highest rank in terms of impact factor (IF2014 = 4.23), while Dynamics of Language: An

Table 2 .
To avoid ambiguity of territories' names, articles originating from England, Scotland, Northern Ireland, and Wales were reclassified as being from the United Kingdom (UK). Excluding 546 articles (1.6% of 33,479 articles) without any author address information in the Thomson Reuters Web of Science, the remaining 32,933 articles originated from 137 countries. Among those articles with author affiliations, 28,105 (85%) were independent publications by 116

while Australia was ranked third in nationally collaborative research (310; 18%). A ratio (S:N:I) of the percentage of single institute articles in a country (%SP) to the percentage of nationally collaborative articles in a country (%NCP) to the percentage of internationally collaborative articles in a country (%ICP) might be used to describe the publication characteristics of institutions or countries. [22] The USA's S : N : I ratio was 57 : 26 : 18 in the Web of Science category of linguistics, revealing that USA research in the linguistics field involved more national teamwork or collaboration among different institutions than international collaboration among different countries. The USA was the only country with more national than international collaborations (Table 3). South Africa was more inclined or able to conduct research independently, with S : N : I = 70 : 7.0 : 23. Thirteen countries in Table 3 had at least 50% institutional independent articles among their total publications. Germany and Spain published similar numbers of articles; however, S : N : I shows that Germany (49 : 12 : 39) published many more internationally collaborative articles while Spain (64 : 15 : 20) published many more single institute articles. Average S : N : I for 18 countries in Table

Table 3 : Characteristics of top 18 contributing countries (TP> 400).
TP: total articles; SP: single institute articles; NCP: nationally collaborative articles; ICP: internationally collaborative articles; %SP: percentage of single institute articles in a country; %NCP: percentage of nationally collaborative articles in a country; %ICP: percentage of internationally collaborative articles in a country.

: 16 : 30. However, the average S : N : I for 137 countries in the linguistics field was found to be 50 : 8.0 : 40. "SNI" shows three important rates related to collaboration, which is more comprehensive and visual than a single traditional collaboration rate for measurement and comparison.

Of all the 32,933 articles with author addresses in the Web of Science, 21,295 (65%) were institutional independent articles and 11,638 (35%) were collaborations by two or more institutes. Among the top 20 institutes, seven were in the USA, four in the UK, three in the Netherlands, two in Australia, two in Canada, and one each in Belgium and China (Table 4). The University of Illinois ranked first in total articles (TP = 386), in SP (167; 42%), and in NCP (153; 40%). The University of Toronto in Canada ranked second in both TP and SP. The Max Planck Institute for Psycholinguistics in the Netherlands ranked first in ICP (145; 60%). A pattern of S : N : I = 41 : 30 : 29 was found for all institutions in Table 4 and 31 : 44 : 25 for all institu-

Table 4 : Top 20 most productive institutes during 2005-2014.
TP: total articles; SP: single institute articles; NCP: nationally collaborative articles; ICP: internationally collaborative articles; %SP: percentage of single institute articles in a country; %NCP: percentage of nationally collaborative articles in a country; %ICP: percentage of internationally collaborative articles in a country.
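The %SP, %NCP, and %ICP columns defined in this legend combine into the S : N : I ratio used in the text. A minimal sketch of that calculation follows; the function name and the article counts are hypothetical and are not taken from Tables 3 or 4. It assumes the three categories are mutually exclusive and together make up TP.

```python
def sni_ratio(sp: int, ncp: int, icp: int) -> tuple[int, int, int]:
    """Return (%SP, %NCP, %ICP) rounded to whole percentages of TP = sp + ncp + icp."""
    total = sp + ncp + icp
    if total <= 0 or min(sp, ncp, icp) < 0:
        raise ValueError("counts must be non-negative with a positive total")
    s, n, i = (round(100 * count / total) for count in (sp, ncp, icp))
    return s, n, i


# Hypothetical country with 200 articles in total.
s, n, i = sni_ratio(sp=140, ncp=14, icp=46)
print(f"S : N : I = {s} : {n} : {i}")  # S : N : I = 70 : 7 : 23
```

Because each share is rounded separately, the three values may sum to 99 or 101 rather than exactly 100, as in some of the ratios reported in the text.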

Table 6 : The 20 most frequently used author keywords.
TP: total articles; R: rank.

15,481 Key Words Plus were found in 22,748 articles. Only six Key Words Plus can be found in more than 1,000 articles, including language (3.