What You Publish Matters: A Novel Way of Measuring Research Sophistication

We adopt a concept from the field of international economics called “export sophistication” and transform it into two scientometric indicators, which we name REARY and PUBLY and which can be considered proxies for research sophistication. REARY is a hypothetical value of income per capita connected with conducting research in a research area. PUBLY is a hypothetical value of income per capita connected with the research structure of a country. We present rankings of the best and worst performers based on the values of the indicators and discuss their importance and shortcomings. The indicators constitute a new tool for governments and other stakeholders to evaluate national research structure based on its theoretical economic contribution.


INTRODUCTION
Research outputs have long been evaluated by a diverse range of bibliometric indices. The usual metrics of number of citations, h-index or Eigenfactor have recently been complemented by alternative indicators, [1] mostly due to the rise of the social web and its fast uptake by scholars. [2] While not yet normalized, they include microblogging indicators (such as Twitter), online reference managers (such as Mendeley) or blogging [3] and concern downloads, tweets, shares etc. Several other more "exotic" measures have been adopted from different fields of science, such as Revealed Comparative Advantages, [4][5][6] activity index, [7] Gini coefficient or Herfindahl-Hirschman index. [8] Many authors have studied how well these indicators predict the future success of articles. In separate recent papers, Wang et al. [9] and Xie et al. [10] analysed numerous early-after-publishing indicators and their ability to predict future citation counts. Tahamtan et al. [11] divided the factors into paper-related, journal-related and author-related. The most often used ones include paper length, [12] language, [13] number of authors, [14] international collaboration, [15] references used, [16] accessibility, [17] journal impact factor [18] and many others. The importance of these factors varies by field. [19] Some of the papers use paper quality as an independent variable, even though it is difficult to measure. Several attempts to quantify quality were based on reviewer scores, [20] editors' assessments [21] or expert evaluation. [22] More often, citation count is designated as the dependent variable approximating paper quality.
However, the success of research does not rest in the number of citations (as implicitly assumed by the vast majority of the previously mentioned studies and numerous other papers focusing explicitly on research quality and research impact [23,24]) but in its influence on the real economy. Indeed, it has already been suggested that citations reflect aspects related to scientific relevance, but not other dimensions of research quality. [25] Research should transform science into social, economic and medical results [26] and contribute to economic development. [27] Considering that research is often funded from public sources, one of the most important indicators of quality is the value for money it brings. Consequently, many governments, such as those of Australia, [28] Canada or the UK, [29] have felt the pressure from taxpayers and altered their research evaluation systems to better incorporate the value-for-money principle. Approaches to measuring research impact have recently been reviewed by Greenhalgh et al. [29] They include the Payback Framework, Research Impact Framework, societal impact assessment or monetization models, and are composed of dozens of quantitative measures as well as multi-level logical methods. To the best of our knowledge, none of them embraces a concept that could be described as "research sophistication". This concept is based on the premise that even though all research areas are important, their role in the national economy is not equal. Some research areas are connected with high levels of national product (measured by Gross Domestic Product or similar indicators), while others are connected with significantly lower ones.
Journal of Scientometric Research, Vol 10, Issue 1, Jan-Apr 2021
We will consider the former research areas "more sophisticated" and the latter "less sophisticated" without any judgement regarding their importance, desirability, complexity or cost intensity. As will be seen later in the paper, the concept originated in international economics. We will therefore also apply the terminology of the original field regarding "sophistication", even though we are well aware that the word can have numerous interpretations, and a hundred social scientists would define "research sophistication" in a hundred different ways.
The goal of the present paper is to develop an aggregate indicator which connects the sophistication of a country's research structure to its economic output. This would allow governments and regulators to evaluate research areas based on their theoretical economic contribution, and hence give them a new tool for making informed decisions about research policy.

METHODOLOGY
The paper builds on the work of Hausmann, Hwang and Rodrik [30] and alters the indicators they developed in international economics for use in scientometrics. Given that their approach is not generally known in the current field, we will briefly present it first.
In an influential paper titled "What you export matters", the authors constructed an index of the "income level of a country's exports" called EXPY and tested its empirical validity and statistical properties. Put simply, the indicator is a hypothetical value in dollars which shows the level of quality of a country's exports. If a country specializes in more sophisticated goods, its EXPY will have a high value; if it focuses on producing simple cheap goods, the value will be low. Countries with the highest values can be seen as the most successful in reaping the benefits of international trade and vice versa.
The indicator is calculated in two steps. First, an intermediate measure called PRODY is computed. It is a hypothetical value of income connected with the export of a product (in economics usually referred to as a "good"). For each good, the equation is
$$\mathrm{PRODY}_{g} = \sum_{c} \left[ \frac{X_{c,g}/X_{c,*}}{\sum_{c'} \left( X_{c',g}/X_{c',*} \right)} \right] \mathrm{GDP}_{c} \tag{1}$$
where X are exports, g is the index of the good, * denotes the sum over all goods, c is the country index and GDP is the Gross Domestic Product. Each good's PRODY is therefore the sum of each country's GDP multiplied by the share of the good in the country's total exports, relative to the sum of these shares over all countries. The expression in brackets is very similar to the Revealed Comparative Advantage index, also originating in international economics, which has nevertheless been successfully used in scientometrics; see for example Lattimore and Revesz, [4] Chuang et al., [31] Harzing and Giroud [6] or Radosevic and Joruk. [5]
Second, EXPY can be calculated as the sum of the PRODY of each good the country exports multiplied by the good's share in the country's total exports:
$$\mathrm{EXPY}_{c} = \sum_{g} \frac{X_{c,g}}{X_{c,*}} \, \mathrm{PRODY}_{g} \tag{2}$$
This economic concept can be altered and applied to scientometrics. It will result in an indicator which shows the level of sophistication of a country's research measured with the help of GDP. In the first step we will compute an intermediate measure, let's call it REARY (Research Area Income Level; Y is the usual sign for income in economics), which will be a hypothetical value of income per capita connected with conducting research in a research area:
$$\mathrm{REARY}_{ra} = \sum_{c} \left[ \frac{P_{c,ra}/P_{c,*}}{\sum_{c'} \left( P_{c',ra}/P_{c',*} \right)} \right] \mathrm{GDP}_{c} \tag{3}$$
where P is the number of publications, ra is the index of the research area, * denotes the sum over all research areas, c is the country index and GDP is the Gross Domestic Product. In simpler words, for each research area we calculate what could be called a weighted average of the GDP of the countries conducting research in it. For example, if only one country conducts research in a specific area, the value of REARY for the area will be equal to that country's GDP. If ten countries conduct research in the area, the value of REARY will be determined by the GDP of these countries, each weighted by the country's relative share of research in this area.
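The REARY calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `compute_reary`, the nested-dictionary layout and the two-country toy data are all invented for the example.

```python
from typing import Dict

# country -> research area -> number of publications (hypothetical layout)
Publications = Dict[str, Dict[str, int]]


def compute_reary(pubs: Publications, gdp: Dict[str, float]) -> Dict[str, float]:
    """Return REARY per research area as a share-weighted average of GDP."""
    # Share of each research area in each country's total output: P_{c,ra} / P_{c,*}
    shares: Dict[str, Dict[str, float]] = {}
    for country, by_area in pubs.items():
        total = sum(by_area.values())
        if total > 0:
            shares[country] = {area: n / total for area, n in by_area.items()}

    areas = {area for by_area in shares.values() for area in by_area}
    reary: Dict[str, float] = {}
    for area in areas:
        # Denominator: sum of the area's shares over all countries
        denom = sum(s.get(area, 0.0) for s in shares.values())
        if denom == 0:
            continue  # no country publishes in this area
        reary[area] = sum(
            s.get(area, 0.0) / denom * gdp[country] for country, s in shares.items()
        )
    return reary


# Toy data, invented for illustration only
toy_pubs = {
    "A": {"dance": 2, "chemistry": 8},
    "B": {"chemistry": 10, "tropical medicine": 10},
}
toy_gdp = {"A": 40_000.0, "B": 10_000.0}

toy_reary = compute_reary(toy_pubs, toy_gdp)
for area in sorted(toy_reary):
    print(area, round(toy_reary[area], 2))
```

In this toy case, dance is researched only in country A and therefore takes A's GDP (40,000), tropical medicine takes B's GDP (10,000), and chemistry, which both countries publish in, lands in between at roughly 28,462.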
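After obtaining REARY we can proceed to calculate the main indicator, which we call PUBLY (PUBLication-related income level). It is the sum of the REARY of each research area the country is active in multiplied by the area's share in the country's total number of publications:
$$\mathrm{PUBLY}_{c} = \sum_{ra} \frac{P_{c,ra}}{P_{c,*}} \, \mathrm{REARY}_{ra} \tag{4}$$
where P is the number of publications, ra is the index of the research area, * denotes the sum over all research areas and c is the country index.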
PUBLY's value range is determined by the lowest and the highest GDP in the studied year, and generally will be well within these two limits. Similar to the interpretation in international economics, if a country conducts research in more sophisticated research areas, its PUBLY will have a high value; if it focuses on less sophisticated research areas, the value will be low. It is important to note again that it is not the goal of the present paper to divide research areas into "sophisticated" and "less sophisticated" ones, nor would such an exercise be useful. However, calculating REARY will produce a list of research areas by hypothetical value of income connected with them, which can (with a great amount of generalization and the necessary caution) be interpreted in terms of sophistication.
Countries with the highest values of PUBLY can be seen as the most successful in reaping the benefits of international research cooperation and vice versa.
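The second step, weighting each research area's REARY by its share in a country's output, can be sketched as follows. The function name `compute_publy` and the REARY values in `toy_reary` are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import Dict


def compute_publy(country_pubs: Dict[str, int], reary: Dict[str, float]) -> float:
    """PUBLY of one country: REARY of each area weighted by the area's share."""
    total = sum(country_pubs.values())
    if total == 0:
        raise ValueError("country has no publications")
    return sum(n / total * reary[area] for area, n in country_pubs.items())


# Hypothetical REARY values, invented for illustration only
toy_reary = {"dance": 40_000.0, "chemistry": 28_000.0, "tropical medicine": 10_000.0}

publy_a = compute_publy({"dance": 2, "chemistry": 8}, toy_reary)
publy_b = compute_publy({"chemistry": 10, "tropical medicine": 10}, toy_reary)
print(round(publy_a), round(publy_b))
```

Country A's value is 0.2 x 40,000 + 0.8 x 28,000 = 30,400 and country B's is 0.5 x 28,000 + 0.5 x 10,000 = 19,000, so a structure tilted towards areas with a higher REARY yields a higher PUBLY.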
The theoretical underpinnings of equations 1 and 2 result from the general equilibrium model, where, without going into technical details, a country's economy is driven by the most productive goods it produces and by the resulting exports, which can then be approximated by EXPY. The logic of equations 3 and 4 is based on the same principle. Modern economies are driven by research; [32] hence they are driven by the most productive research areas, not in terms of the number of publications or citations, but rather in terms of the relative revealed comparative advantage captured by PUBLY. This is based on the assumption that research is not detached from the real economy, and a country's research structure mirrors its economic structure. It can be expected that economic and societal fields developing in a country also attract the attention of researchers. As a result, just as production sophistication predetermines export sophistication, it also induces research sophistication and in turn the country's per capita income. This relationship, understandably, is bidirectional on multiple levels (Figure 1).
The key to all the terms in Figure 1 is comparative advantage. Even though it is an economic concept and explains (by means of multiple parallel theories) why countries are good at producing what they produce, it is based on numerous non-economic factors, such as geological endowment, climate, history, culture etc. These comparative advantages drive the economy, but they also drive research. Given the wide variety of their determinants, their explanatory power is not limited to the areas of science which can be directly linked to an industry (such as chemistry and the chemical industry, agriculture and the agricultural sector, or metallurgy and the metallurgical industry), but also applies to less "tangible" research areas (such as mathematics, literature, sociology etc.).
The data for the present research was taken from the Web of Knowledge database accessed on 2 September 2019. [33] It is based on all articles published in 2018 and indexed in the Web of Science Core Collection, sorted by research area and author country. Because a paper can be classified in more than one research area and international co-authorship is shown as a separate publication for each author country, the database includes 4,141,736 entries of 1,970,376 unique articles (Table 1). The number of research areas is 152, as reported by Web of Knowledge. The number of author countries is 192, after enforcing a limit of at least 10 publications per country in 2018 and making some minor changes, which leads to the exclusion of 0.032 % of articles from the data. (These changes include: merging England, Scotland, Wales and Northern Ireland, which are reported separately, into a single country called the UK; merging entries for Georgia and Republic of Georgia; and dropping several countries/regions due to unavailability of reliable and consistent GDP data, including Kosovo.)
The source of information on the Gross Domestic Product is UNCTAD's UNCTADSTAT. [34] Per capita values based on US dollars at current prices are used. All data is for 2018, except for Curacao (2016) and North Korea (2017), for which newer records were unavailable.

RESULTS AND DISCUSSION
One might expect that if research areas are ranked by their REARY value, the top positions will be taken mostly by fields with high material and financial requirements. This is not the case. Cost-intensive areas such as oceanography, orthopaedics or transportation are interspersed with fields from Social Sciences or Arts and Humanities such as dance, classics or social issues, which undoubtedly require less investment (Table 2). Similarly, on the other end of the list there are some areas which are clearly expensive, mostly related to diseases. This confirms that "research sophistication" as calculated by our method does not necessarily mean that the research areas are more financially costly or scientifically demanding; rather, it is related to the nature of the studied object and to lifestyle. Once consumers reach a certain level of income, their consumption habits change in favour of more luxurious goods and services. More affluent people visit theatres more often than less affluent ones [35] and go to more concerts. [36] Rich people also practice sports more often [37] and use the services of psychologists with a higher frequency. [38] For the vast majority of the items in the left half of Table 2, a similar explanation can be provided. It can be argued that exactly as higher income leads to higher consumption of luxury (or, using alternative terminology, "more sophisticated") goods, higher national income leads to a higher share of research in the related areas, often areas of Arts and Humanities or Social Sciences. This statement would deserve a full research paper; however, even a preliminary analysis using our dataset suggests that it is valid. For example, Pearson correlation coefficients between GDP and the share of the research area in a country's total research are 0.44 for sport sciences, 0.39 for psychology, 0.36 for dance and 0.29 for theatre. While these might not appear to be high correlations by usual standards, they are considerably higher than the Pearson coefficients of the majority of other research areas; in the case of sport sciences it is the maximum value. Furthermore, the assumption is logical. As we have argued above, research is not detached from the national economy and society, and economic and societal fields developing in a country quite understandably also attract the attention of researchers. It is therefore not surprising that Arts and Humanities and Social Sciences have a high presence in the list.
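The correlation between GDP and a research area's share in national output, as reported in the preceding paragraph, can be computed along the following lines. This is a sketch only: the helper names `pearson` and `area_share_vs_gdp` and the three-country toy data are assumptions made for illustration and do not reproduce the paper's dataset or figures.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def area_share_vs_gdp(
    pubs: Dict[str, Dict[str, int]], gdp: Dict[str, float], area: str
) -> float:
    """Correlate GDP with the area's share in each country's total publications."""
    countries = sorted(c for c in pubs if c in gdp and sum(pubs[c].values()) > 0)
    shares = [pubs[c].get(area, 0) / sum(pubs[c].values()) for c in countries]
    incomes = [gdp[c] for c in countries]
    return pearson(incomes, shares)


# Toy data, invented for illustration only
toy_pubs = {
    "A": {"sport sciences": 3, "chemistry": 7},
    "B": {"sport sciences": 2, "chemistry": 8},
    "C": {"sport sciences": 1, "chemistry": 9},
}
toy_gdp = {"A": 40_000.0, "B": 20_000.0, "C": 10_000.0}

r_sport = area_share_vs_gdp(toy_pubs, toy_gdp, "sport sciences")
print(round(r_sport, 3))
```

Countries that do not publish in the area at all enter with a share of zero, which is one possible convention; the paper does not state how such cases were handled.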
On the other hand, many fields which can hardly be considered scientifically basic and financially inexpensive have the lowest values of REARY. Tropical medicine, parasitology and infectious diseases, along with several other related fields, are at the bottom of the ranking. The obvious reason is that they constitute a high share of research in poor countries with a high prevalence of tropical diseases. This is not a choice on the scientists' part, but rather a necessity.
The mismatch between sophistication index values and expectations can also be seen in international economics, where several types of fish belong among the items with the highest value of the sophistication index, while cobalt ores or uranium ores (which themselves are obviously not too sophisticated, but whose mining is expensive) are on the opposite end. [39]
Having computed REARY, the final indicator called PUBLY can be calculated for every country in the world and the countries ranked by its values (Table 3). Unsurprisingly, countries in the top spots of the list belong to the group of the most developed nations in the world or are small countries with highly concentrated research, whereas the lowest values can be found in some of the poorest countries of Africa and a few from South-East Asia (Figure 2). The variation in PUBLY is considerably lower than the variation in GDP, which is a direct consequence of the former being based on a weighted average of the latter for all the countries in the world.
Out of the top 20, nine countries have at least one university in the top 100 of the current THE World University Ranking 2019 [40] by research score. The rest have a university in the top 250, the exceptions being Iceland and, more importantly, the countries in the first five ranks of the list. The very high positions of these countries can be explained by the combination of (1) a low number of publications, and (2) a high concentration on a few research areas. Bermuda, San Marino, Palau and Andorra each had fewer than 40 publications in 2018, spread across between 14 and 27 research areas. With such a small publishing output, it is only logical that two or three publications in a highly sophisticated research area would considerably increase the value of PUBLY. For example, San Marino had 10 papers published in the field of geriatrics, which has a REARY of almost 40,000 USD. Qatar and Iceland had a significantly higher number of publications, which however still constituted only around 0.1 % of the world total, and many of them were concentrated in fields with a high REARY, such as sport sciences (Qatar) or geology (Iceland).
The end of the list consists chiefly of countries which are not known for the quality of their research and are active in just a few research areas of great importance to them, which, as mentioned before, mostly revolve around tropical diseases and parasites.
Not surprisingly, there is a strong correlation between PUBLY and GDP per capita (Figure 3). This indicates that rich countries have a research structure based mostly on highly sophisticated research areas, while poor countries have a research structure based mostly on less sophisticated research areas. This is partly by construction, because GDP is an integral part of the equation for the calculation of REARY. However, even if the country's own GDP is excluded from the calculation, the correlation coefficients remain similar. (A similar observation was made by Hausmann et al. [30] with regard to EXPY.) Importantly, all the major economies and countries from the top university ranking by research score are within the 10-per-cent band from the hypothetical regression line. The outliers (using the standard definition of two standard errors) are without exception economically and scientifically marginal countries. This is not statistically obvious, because the method of calculation of PUBLY does not give higher weight to richer or larger countries; if anything, it is the other way around.
The reason why research sophistication and income per capita are linked was illustrated in Figure 1. The mechanism itself is not straightforward and leaves ample room for different interpretations. Importantly, it is clear that the process is not unidirectional and one cannot draw any conclusions regarding a causal relationship. Even though comparative advantage is at the beginning of the process, it is also influenced by all the other factors in the scheme, and hence depends on them. As countries get richer, their endowment with factors of production and the prices of these factors change, directly influencing comparative advantage. This is the reason why less developed countries base their production mostly on cheap labour, but as their level of development (and wages) increases, the comparative advantage in this field declines. Similar reasoning can be applied to export sophistication: changes in export structure lead to changes in comparative advantage, induced either by free market mechanisms or by governmental protectionist policy. From the research perspective, changes in research structure can also lead to changes in comparative advantage. This happens when these changes affect factors of production, their quality or quantity, i.e. lead to a higher productivity of labour, increase the yield of primary produce or provide new crucial knowledge.
An additional matter to note is that the trade path and the research path of the mechanism are not isolated but affect each other. Changes in export structure can lead to changes in research structure and vice versa. For example, right after Slovakia gained independence, its research focused heavily on biology and chemistry, with engineering areas having merely a 5.2-per-cent share of total publications in 1994. [33] After the arrival of crucial foreign investors in the automotive and electronics industries, the share increased to 14.6 % in 2018. The opposite is also possible, e.g. a concentrated research effort in the fields related to geology can lead to a discovery of new natural resources and a subsequent change in export structure. Indeed, the ultimate result of research, technological progress, affects all the elements of the mechanism: comparative advantage, export as well as the research structure itself.
There is a relatively high level of correlation between PUBLY and the total number of publications per million inhabitants (r_P = 0.5321, r_S = 0.7981). This indicates that countries with higher publication output per capita tend to conduct research in more sophisticated areas than countries with lower publication output per capita, and vice versa. Even though the relationship is not absolute, it shows that quantity in research is often accompanied by higher sophistication. The highest mismatch can be found in numerous small island states (they perform significantly better in publication output than in PUBLY) and highly populated developing nations such as India and Indonesia (they perform significantly better in PUBLY than in publication output). This difference can be easily explained by the small-number effect in the case of the former and low publication activity or a low relative number of researchers in the case of the latter.
The PUBLY indicator is not without issues. The weights incorporated in its calculation decrease the importance of large countries (in terms of annual number of publications) and increase the importance of small countries, thus giving small countries a higher likelihood of distorting the ranking than big ones. [39] Moreover, the indicator tends to be sensitive to variation in research structure. [41] Due to the statistical properties of its calculation, the values of PUBLY can decline over time as a result of changes in the composition of research structure, [42] specifically as more countries start to be active in more research areas.
It could also be argued that the link between research sophistication and GDP was stronger in the past than it is now. The flow of information and knowledge today is much faster than before [43] and globalization has opened borders to rapid growth in trade [44] and movement of capital. Under these conditions, any change within one economy affects other economies, the difference being only the size of the effect and the delay. In the past, the spread of research results was relatively slow and limited, whereas now it is almost instantaneous. It follows that today changes in research sophistication in one country are more likely to immediately spill over into foreign countries than in the past, possibly even having a crowding-out effect in the country of origin; hence the link between research sophistication and GDP appears to be weaker than it would have been decades ago.
The above-mentioned issues indicate that extra care needs to be taken when interpreting the results and implications of PUBLY. However, they do not make the present research invalid.

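The robustness check mentioned in the discussion, in which a country's own GDP is excluded from the calculation, can be sketched as a leave-one-out variant of the two-step procedure. The paper does not spell out the exact procedure, so the following is only one possible reading: the function name `publy_excluding_self`, the renormalisation over areas that other countries also publish in, and the toy data are all assumptions.

```python
from typing import Dict, Optional


def publy_excluding_self(
    pubs: Dict[str, Dict[str, int]], gdp: Dict[str, float], country: str
) -> Optional[float]:
    """PUBLY of `country` with REARY computed from all other countries only."""
    # Area shares per country, skipping countries without publications
    shares = {
        c: {a: n / sum(by_area.values()) for a, n in by_area.items()}
        for c, by_area in pubs.items()
        if sum(by_area.values()) > 0
    }
    others = [c for c in shares if c != country]

    value, covered = 0.0, 0.0
    for area, own_share in shares[country].items():
        denom = sum(shares[c].get(area, 0.0) for c in others)
        if denom == 0:
            continue  # assumption: skip areas no other country publishes in
        reary_without_self = sum(
            shares[c].get(area, 0.0) / denom * gdp[c] for c in others
        )
        value += own_share * reary_without_self
        covered += own_share
    # Assumption: renormalise by the share of output that could be evaluated
    return value / covered if covered > 0 else None


# Toy data, invented for illustration only
toy_pubs = {
    "A": {"dance": 2, "chemistry": 8},
    "B": {"chemistry": 10, "tropical medicine": 10},
    "C": {"chemistry": 5, "dance": 5},
}
toy_gdp = {"A": 40_000.0, "B": 10_000.0, "C": 20_000.0}

loo_a = publy_excluding_self(toy_pubs, toy_gdp, "A")
loo_b = publy_excluding_self(toy_pubs, toy_gdp, "B")
print(round(loo_a), round(loo_b))
```

Correlating such leave-one-out values with GDP per capita removes the mechanical contribution of a country's own income to its score, which is the concern raised by the "partly by construction" remark.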
CONCLUSION
Holding other factors constant, a country is better off if its research structure focuses more on sophisticated research areas than if it focuses on less sophisticated research areas. The reasoning is based on the relation between the production structure of a country and its per capita income: just like a country exports mostly the goods in which it possesses a comparative advantage (and hence exports mostly the goods that have a high share in its domestic production), it is a logical assumption that it conducts research mostly in those research areas where it possesses a comparative advantage. The sophistication of a country's research structure can be measured by an indicator called PUBLY, which we developed in the present paper.
A country which has a comparative advantage in a field it does not desire (because it generates low revenue, is unstable, is obsolete etc.) can try and change it. In trade this can be done by protectionist governmental measures, including production subsidies or tariffs. In science, the measures can range from guiding documents such as adopting a new research and innovation strategy to financial ones such as field-specific national grants and/or redistributing research funding. The importance of the present paper lies in providing stakeholders with a novel way of measuring research sophistication.
It is vital to note not only what PUBLY is, but also what it is not. It is not an indicator which measures research sophistication by quantifying the intellectual complexity or scientific value of research areas or countries. Higher sophistication of a research area does not necessarily mean it requires more financial, technological or knowledge inputs. In our definition it simply indicates a research area which is connected with a higher average income per capita than less sophisticated research areas, which is a result of the mechanism described above. Therefore, it cannot be surprising that dance is found to have a higher level of sophistication than chemistry or forestry, just like economic research has found watches to be a more sophisticated good than nuclear reactors. [41] Additionally, PUBLY is not an indicator implying a causal relationship between GDP and research sophistication. Neither is it a measure of research quality.
Finally, one cannot stress strongly enough that our research says nothing about the importance of individual research areas. We believe that every research area (be it in Science, Social Sciences or Arts and Humanities) has a crucial role for the development of mankind. While some might be accompanied by a higher GDP and/or might be costlier, in no way should they be considered superior to the others. The same holds in international economics, where the index originates: exporting airplanes is in itself not superior to exporting food. It might bring higher profits, but without securing the latter it would be completely useless.