Do Scientists with Foreign Names Improve European Medical Research? A Preliminary Study of a New Methodology

The UK’s recent departure from the European Union may make it harder for researchers from other countries to be employed here and may discourage research institutions from seeking them. We sought to develop a methodology to see if non-native heritage scientists (with foreign names) brought measurable benefit to their host countries’ research. We selected two European countries, Sweden and Italy, and two medical subject areas (cancer and diabetes). We studied research papers from the Web of Science for 2009-15. We compared the citations received and journal impact factors (JIFs) for papers with different combinations of nationality (other Western Europe, Rest of the World) based on the researchers’ given names and surnames, using the OriginsInfo Ltd database to identify their cultural heritage and sex. Sweden had more international papers, and more foreign-name researchers, than Italy. Citations and journal impact factors were higher for both countries’ domestic papers when the diversity of their researchers’ names was greater, especially for Sweden. An analysis of the effects of varied name-nationality on individual papers gave inconclusive results. We developed a methodology that could be used to determine the contributions of foreign-name researchers and women to the impact of European medical research papers and to their subject matter. More in-depth analysis is needed to discover how this is taking place.


INTRODUCTION
The tradition of scientists moving across borders is old and many distinguished researchers have found fame and sometimes also fortune in countries other than the ones in which they were born. For example, of the 85 Nobel Prize winners in science (chemistry, medicine or physiology, physics) associated with the UK, 18 were born overseas and a further seven now work abroad (https://www.nobelprize.org/prizes/lists/all-nobel-prizes/). Some scientists, especially Jews or those with supposedly politically incorrect views, have also been persecuted in their own countries and have needed to escape simply in order to survive, sometimes unsuccessfully. For example, Michael Servetus offended the Roman Catholic Church in Spain and also the Calvinists in Switzerland whither he had fled. They tortured and killed him in 1553. In 1847 Ignaz Semmelweis in Vienna ordered hand-washing with chlorinated water between deliveries of babies, so greatly reducing mortality. But opposition from his colleagues forced him to relocate to Hungary, where he died a few years later. There has also been targeting of LGBT (Lesbian, Gay, Bisexual, Transgender) scientists in countries with regressive social attitudes, such as the chemical castration of Alan Turing by the British in 1952.
In recent years, there has been less overt discrimination, but many black and other minority ethnic scientists still struggle to be accepted in the USA. [1] And many scientists still move, often to richer countries where the opportunity to carry out research is better. [2][3][4] This has been described as the "brain drain" and it applies particularly to researchers from Lower Middle Income and Low Income countries moving to North America and to Western Europe. We noted earlier [5] that there were more cancer researchers of Indian heritage working in Canada and the USA than there were in India itself. However, not all the moves are permanent and some are only for a few years, or even months.
For the individual scientist, movement to another country is usually of great benefit, [6] and if she can work in English the language barriers that might be problematic in other professional occupations are much less because the output of research teams is usually journal articles written in English and so it is often the common language for the researchers. For the host organisation, the ability to recruit for a vacancy from a much larger pool of potential applicants is also attractive, [7,8] although it may involve sifting through a larger pile of curricula vitae in order to draw up a short list for interview. The advent of convenient video conferencing platforms, such as Skype, Zoom and Microsoft Teams, has also reduced the need for long-distance travel even when senior positions are being filled.
Nevertheless, there is often resistance to recruitment from abroad by governments who are concerned to protect jobs for their own citizens and who may be under populist pressure to limit immigration, particularly from countries with different cultural norms. There can also be a reluctance by research institutions to go to the trouble of seeking candidates from abroad and of accommodating them if they are appointed. Additionally, there may be a reasonable fear that recruitment from less well-off countries will denude them of the expertise which they may need much more than the prospective host country and not just in science, [9] and represent an unfair harvesting of skills and training that have been provided by the donor country. To some extent this may be offset by remittances sent back to the donor country and some emigrants may return. [10] There is surprisingly little quantitative research on the actual benefits that a host laboratory (or country) gains from having foreigners in its research cadres. Anecdotally, it has been suggested that the newcomers' different methods of education and training and a more varied range of experience may act as stimulants to the creative process needed for research. [7] We wanted to explore this question in greater depth and to bring statistical rigour to the question of whether there are real benefits to a research team if it includes members from different cultural heritages, i.e., with foreign names. These benefits can be conveniently measured by means of actual citations to the papers that they publish, and by the potential citations that may accrue if the papers are published in more prestigious and higher citation impact journals. These are only two of a large number of possible indicators of benefit, but at least they are well-established and can be used to compare research outputs in many different fields.
There are many other factors that can influence the citation scores that research papers receive and the effect of varied author names may be hard to separate out from these other independent variables. However, the question has some chance of being answered clearly because of the sheer volume of data that can be brought to the table for examination. The amount of work needed makes this a daunting prospect, so in this paper we have tried simply to develop a methodology that can be used widely and may then provide more definitive answers to the question of whether there is actually an observable benefit and if so, how big it is.
We are very much aware that this is only a preliminary study and in particular that a scientist with a name that is foreign to the country in which she is working may not be a new immigrant, but a second or higher generation one. This would need to be tested by examination of each identified researcher's curriculum vitae, which is often available in abbreviated form on the website of their institution.
In this study, we examined samples of papers from two European countries whose cultures we initially thought might be sufficiently different that they would furnish some measurable indicators of benefit. We sought out two countries whose propensity to collaborate internationally was either greater than that expected from their integer-based output, or less, see Figure 1. This shows the situation for cancer research in 2009-13. Sweden (SE) is widely considered to be quite welcoming towards visitors and in 2018 18% of the population were not born there (https://www.statista.com/statistics/549292/foreign-born-population-of-eu/), as were 38% of its researchers. [8] It is also notably welcoming to immigrants and was seventh on the list ranking countries' acceptance of migrants in 2016. [11] On the other hand, Italy (IT) is traditionally considered much less so and indeed there are still strong feelings of provincial loyalties over-riding national ones. [12] The percentage of foreign-born people in Italy was only 10% in 2018 and only 3% of its researchers were born abroad. [8] It was well below Sweden in its acceptance of migrants (being ranked 43rd), but well above the 15 Eastern European countries whose attitudes were surveyed.
Both countries are active in medical research, [13,14] so that we might expect that each would carry out work in a wide range of subject areas and research domains. Their citizens tend to have distinctive personal and family names. Both have also been Member States of the European Union for many years, with no legal restriction on the recruitment of scientists from other Member States and indeed a legal requirement to make all such posts open to applicants from them. [Although Switzerland (CH) and Belgium (BE) are even more relatively "international" than Sweden in Figure 1, their multi-cultural populations would have made analysis on the basis of names almost impossible. Greece (GR) and Portugal (PT) are much less "international" than Italy, but their medical research outputs are much smaller than that of Italy and so less fruitful for analysis.]

METHODOLOGY
We selected two well-defined medical research areas: cancer in Sweden and diabetes in Italy and identified papers recorded in the Web of Science for the seven years, 2009 to 2015. We used two complex filters based on specialist journals and title words [13,14] and downloaded the details of the two sets of papers to text files and then to MS Excel spreadsheets. One of the columns was labelled C1 and contained details of both the researchers' full names and their addresses. We parsed these items and for each paper listed the names of researchers with an address in Italy (or, for the cancer papers, Sweden) and also any researchers in any other countries. We then removed any duplicates from each of the two lists and removed the names of any researchers in the second list who were also in the first list. For each paper, we were then able to determine the total number of authors, A.
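The parsing step just described can be sketched in Python. This is an illustration only, not the macro actually used: the layout assumed for the C1 column (bracketed author groups followed by an address whose last element is the country) is our assumption about the Web of Science export, and the function names are hypothetical.

```python
import re

# Assumed C1 layout (not given in the text): blocks of the form
# "[Surname, Given; Surname, Given] Institution, ..., Country" joined by "; ".
C1_BLOCK = re.compile(r"\[([^\]]+)\]\s*([^\[]+)")


def country_of(address):
    """Return the country, taken as the last comma-separated element."""
    last = address.strip().rstrip(";. ").split(",")[-1].strip()
    # US addresses carry state, ZIP code and 'USA' in a single element.
    return "USA" if last.endswith("USA") else last


def split_authors(c1, home):
    """Return (home authors, other authors, total number of authors A)."""
    home_names, other_names = [], []
    for names, address in C1_BLOCK.findall(c1):
        target = home_names if country_of(address) == home else other_names
        for name in names.split(";"):
            name = name.strip()
            if name and name not in target:   # drop duplicates within a list
                target.append(name)
    # An author with a home address is not also counted as a foreign author.
    other_names = [n for n in other_names if n not in home_names]
    return home_names, other_names, len(home_names) + len(other_names)
```

Calling split_authors(c1_text, "Italy") on one paper's C1 entry returns the two de-duplicated name lists and the author count A for that paper.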
We listed all the names with an address in Italy (or Sweden) alphabetically by surname and determined how many occurrences of each name (surname and given name) were present in the lists. The names were then processed by the OriginsInfo Ltd software (https://www.originsinfo.com/services) to show the country (and in some countries, the region) from which they came. This software uses a file of over four million surnames and one million personal (given) names and categorises them into one of nearly 200 countries and regions and, for the personal names, by sex. For the purposes of our analysis, these were amalgamated into just three categories: Italy (IT) or Sweden (SE), the rest of the "old" European Union in 1988 of 15 Member States (EUR) and the Rest of the World (RoW).
However, we soon saw that some people with the same surname had been categorised differently, so it was necessary to resolve each of these cases and ensure that any misattributions were, as far as possible, removed. This applied particularly to names that appeared to be Italian in origin, but had been characterised as EUR or RoW. We also took the opportunity, by means of a special macro, to identify the sex of researchers who only had an initial if their surnames were also present with a given name, from which the person's sex could almost always be identified. Thus Abbatini, F. could be identified as female, because Abbatini, Francesca was also in the list. But if Abbatini, Fabio was there as well, Abbatini, F. could not be sexed.
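The rule for sexing initial-only names can be expressed as a short Python sketch. It is not the macro we used; the function name is hypothetical and the dictionary sex_of_given merely stands in for the name-database lookup.

```python
from collections import defaultdict


def sex_initial_only(names, sex_of_given):
    """Assign a sex to 'Surname, X.' entries where the evidence is unambiguous.

    names: iterable of 'Surname, Given' strings.
    sex_of_given: dict mapping a given name to 'M' or 'F'
                  (a stand-in for the name-database lookup).
    Returns a dict mapping each name to 'M', 'F' or 'U' (undetermined).
    """
    seen = defaultdict(set)   # (surname, initial) -> sexes seen on full names
    parsed = []
    for name in names:
        surname, _, given = name.partition(",")
        surname, given = surname.strip(), given.strip()
        first = given.split()[0] if given else ""
        is_initial = len(first.rstrip(".")) == 1
        parsed.append((name, surname, first, is_initial))
        if first and not is_initial:
            seen[(surname, first[0].upper())].add(sex_of_given.get(first, "U"))

    result = {}
    for name, surname, first, is_initial in parsed:
        if not first:
            result[name] = "U"
        elif is_initial:
            sexes = seen.get((surname, first[0].upper()), set())
            # Adopt the sex only if every matching full name agrees.
            result[name] = next(iter(sexes)) if len(sexes) == 1 else "U"
        else:
            result[name] = sex_of_given.get(first, "U")
    return result
```

With the example above, "Abbatini, F." is returned as F when only "Abbatini, Francesca" is present, and as U once "Abbatini, Fabio" is also in the list.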
In order to enumerate the researchers of male (M), female (F) and undetermined (U) sex, we needed to scan the names individually and mark any that appeared to be the same person as another one with a closely matching given name, but the same surname. This could occur for any of the following reasons:
• one name only had an initial and the other a given name (this was the most common)
• there was an initial as well as a given name, e.g., Katarina Y.
• one of the given names was mis-spelled because the WoS does not use accents or umlauts, e.g., Goeran or Goran
For each of these sources of error we needed to make a careful comparison of the names. Ones that appeared to be duplicates were marked with an "X" so that they would not be counted twice when we sought the numbers of individual researchers in Italy (or Sweden) in each of the 3 × 3 = 9 categories (i.e., three divisions of sex and three of name origins).
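A pre-screening step of this kind could be automated along the following lines. This Python sketch only flags candidates for the manual comparison; the folding of oe/ae/ue is a heuristic of ours, and all function names are hypothetical.

```python
import unicodedata


def given_key(given):
    """Reduce a given name to a comparison key (heuristic, for flagging only)."""
    parts = given.strip().split()
    first = parts[0] if parts else ""
    # Drop accents, then fold the ASCII transliterations oe/ae/ue of umlauts.
    text = unicodedata.normalize("NFKD", first)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.lower().rstrip(".")
    for pair, single in (("oe", "o"), ("ae", "a"), ("ue", "u")):
        text = text.replace(pair, single)
    return text


def same_person_candidate(given_a, given_b):
    """True if two given names under one surname may denote one person."""
    a, b = given_key(given_a), given_key(given_b)
    if not a or not b:
        return False
    if len(a) == 1 or len(b) == 1:        # one of them is only an initial
        return a[0] == b[0]
    return a == b


def mark_duplicates(names):
    """Return one mark per name: 'X' for a likely repeat, '' otherwise."""
    split = []
    for name in names:
        surname, _, given = name.partition(",")
        split.append((surname.strip().lower(), given))
    # Visit full given names before bare initials, so that an initial is
    # marked against a full name rather than the other way round.
    order = sorted(range(len(names)), key=lambda i: -len(given_key(split[i][1])))
    kept, marks = [], [""] * len(names)
    for i in order:
        surname, given = split[i]
        if any(s == surname and same_person_candidate(g, given) for s, g in kept):
            marks[i] = "X"
        else:
            kept.append((surname, given))
    return marks
```

An initial that matches two different full names (as in the Abbatini case) is still marked "X" here, so such cases would need the same manual check as before.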
Another macro was then used to list the countries named in the C1 column and to distinguish those papers with only Italian (or Swedish) addresses from those that were internationally co-authored. Yet another macro then marked each paper with the contributions made by the three groups of own-country-based authors and by EUR or RoW authors, if any. A few papers had corporate authors, such as Elderly Study Grp or Progetto Diabete Calabria. These were ignored for the purpose of analysis.
Journal of Scientometric Research, Vol 10, Issue 1, Jan-Apr 2021
Traditionally, papers have been classified as "international" if the addresses of their authors contained more than one country. Such papers would naturally have authors of different national backgrounds or ethnicities. However, we wanted to study the effect of the presence of authors with foreign names in the study country research teams, so we therefore primarily concentrated on purely domestic papers, without foreign addresses.
In order to evaluate the impact of the papers with different author name origins, we next determined the numbers of citations in a five-year period beginning with the year of publication for each of the papers. The citation counts were downloaded from the WoS and converted to a single MS Excel spreadsheet by another macro. The five-year values, Actual Citation Impact (ACI), were then calculated and matched to the papers in the first file on the basis of their titles. However, for a few papers this could not be done because the title contained quotation marks, or was too long and for these a match was made based on the papers' source data (journal name, year, volume, issue and pagination). It was then possible to determine the mean ACI value for any given set of papers.
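The five-year count can be written compactly. The sketch below assumes, purely for illustration, that the downloaded citation counts are held as a mapping from citing year to number of citations; the function names are hypothetical.

```python
def actual_citation_impact(pub_year, cites_by_year, window=5):
    """Sum the citations received in `window` years, starting with pub_year."""
    years = range(pub_year, pub_year + window)
    return sum(cites_by_year.get(year, 0) for year in years)


def mean_aci(papers):
    """Mean ACI over an iterable of (pub_year, cites_by_year) pairs."""
    values = [actual_citation_impact(year, cites) for year, cites in papers]
    return sum(values) / len(values) if values else float("nan")
```

For a paper published in 2010 the window covers 2010 to 2014 inclusive, so citations received in 2015 or later do not count towards its ACI.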
Although we have been warned [15] not to use journal impact factors (JIFs) as a means to evaluate research, we thought that it would be useful to see if mean JIF values for different sets of papers followed similar trends to those seen for ACI values.
We therefore determined the JIFs for all those papers for which Clarivate Analytics had tabulated the values for the corresponding year, in practice about 97% of them.
Our attention was focussed on four mutually exclusive sets of papers, for both domestic-only ones and for those that were co-authored internationally, which were marked to show ones with:
• Italian (or Swedish) authors only
• EUR authors (but not RoW ones)
• RoW authors (but not EUR ones)
• With both EUR and RoW authors.
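The marking of papers into these sets can be sketched as follows. The sketch reflects our reading that the name groups refer to authors with an address in the study country; the labels and function names are illustrative rather than those used in our spreadsheets.

```python
def name_group(home_author_origins):
    """Classify a paper by the name origins of its home-based authors.

    home_author_origins: iterable with one entry per author who has an
    address in the study country: 'OWN' (IT or SE), 'EUR' or 'RoW'.
    """
    origins = set(home_author_origins)
    has_eur, has_row = "EUR" in origins, "RoW" in origins
    if has_eur and has_row:
        return "own + EUR + RoW"
    if has_eur:
        return "own + EUR"
    if has_row:
        return "own + RoW"
    return "own only"


def is_international(address_countries, home):
    """True if any address on the paper lies outside the study country."""
    return any(country != home for country in address_countries)
```

Each paper therefore falls into exactly one of the four name groups and, independently, is either domestic or international.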
The first analysis was of the mean ACI and JIF values for the four groups of papers, to see if the presence of other Europeans, or researchers from RoW, had any positive effect on them. Internationally co-authored papers typically receive more citations than purely domestic ones, [16] so these two groups were analysed separately.
However, as there are many other factors that affect citation scores, [17][18][19][20] we also carried out an analysis of the individual papers, in which the dependent variable (ACI or JIF) could be influenced by several independent variables, of which the team composition would be two (i.e., presence of other European (EUR) or Rest of the World (RoW) authors). Others that have been implicated in having an effect on citation scores are the numbers of authors (A), the numbers of addresses (D) and the numbers of funding bodies acknowledged (F), whether the paper is an article or a review (document type, DT) and whether the paper is open access (OA). The subject area within the medical field (i.e., diabetes or oncology) may also have an effect. [13,14] For this second analysis, we used the software package SPSS, version 25. We limited the analysis to domestic papers from the two countries, as we believed that the presence of foreign collaborators would swamp any possible impact improvement from non-native heritage researchers. In our previous attempt [21] to derive an equation relating the dependent variable, ACI, to the numerous independent variables, we found that it was helpful to allow both linear and squared terms for A, D and F and to limit each of them to 10. We found that the linear term for D had a negative coefficient. This means that for D up to about five, all other variables being the same, single-address papers were more highly cited than ones with multiple addresses. [17] The name-origins of the teams were represented by two categorical variables for the presence or absence of EUR and RoW authors in Italy (or Sweden), on each paper. The dependent variable, the number of citations (ACI), was also transformed to its square root (ACI^0.5) so as to cater better for extreme values. We also investigated JIF as the dependent variable.
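The variables entering the regression can be laid out as one row per paper. The Python sketch below shows one possible coding under the description above; the column order, the 0/1 coding of the categorical variables and the function names are our assumptions, and the fitting itself was done in SPSS.

```python
import math

CAP = 10  # A, D and F are each limited to 10, as described in the text


def design_row(a, d, f, is_review, is_open_access, has_eur, has_row):
    """Return one row of predictors for a paper.

    Columns: intercept, A, A^2, D, D^2, F, F^2, DT (review = 1),
    OA (open access = 1), EUR present, RoW present.
    """
    a, d, f = (min(value, CAP) for value in (a, d, f))
    return [1.0, a, a * a, d, d * d, f, f * f,
            float(is_review), float(is_open_access),
            float(has_eur), float(has_row)]


def transformed_aci(aci):
    """Square-root transform of the five-year citation count."""
    return math.sqrt(aci)
```

The coefficients of the last two columns are the quantities of interest: a positive value would indicate that papers with EUR-named or RoW-named home authors score higher, other variables being equal.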

RESULTS
There were 4282 diabetes research papers from Italy in 2009-15 and 10,836 cancer research papers from Sweden in the same years. It was immediately apparent that the latter were far more international than the former in terms of addresses and that the international papers were more than twice as well cited as the domestic ones in both countries, see Table 1.
In order to reduce the effect from any difference in research impact from the medical researchers in the two countries, we also examined the values of ACI for the opposite sets of papers, viz., Italian cancer and Swedish diabetes research.
In their domestic papers, the Italians were less cited than were the Swedes in both subject areas, but in their international papers the reverse was true. This was because higher percentages of these Italian papers were co-authored with researchers from the Netherlands and the USA, who tend to be well-cited, and fewer with China, whose papers have (until recently) been less cited than the world average. Overall, Swedish papers in both diabetes and cancer were better cited than Italian ones in the same subjects (by 9% in diabetes and 11% in cancer), mainly because much higher percentages of the Swedish papers were international. When all the Italian (and Swedish) names had been categorised by national origin, with sex assigned as far as possible to those with only initials and without given names, the results were as shown in Table 2. There is still doubt about the uniqueness of some authors. For example, although names such as Barbagallo, Mario and Barbagallo, M. can be considered as referring to a single individual, there could be some homonyms (two individuals with the same name) and it is possible that we missed a few authors with different names that were really the same individual. [For example, a woman on marriage might have changed her surname completely to that of her husband.] So the totals given above must be regarded as indicative rather than definitive.
It is very clear that the percentage of non-native heritage researchers is much higher in Sweden than in Italy. This is a different indicator than the percentage of papers with foreign addresses, shown in Table 1. It appears on average that all the members of the Italian diabetes research teams tend to be more female than male and for foreign cancer researchers in Sweden the difference is in the same direction but smaller. Female researchers in five of the six groups are less productive than males: the exception is RoW diabetes researchers in Italy.
Finally, it appears that all the nine groups in Sweden are more individually productive than the corresponding ones in Italy. However, because they are working in different subject areas, it may be that cancer researchers produce more papers than ones in diabetes.
The third and main result is the mean value of five-year citation counts, ACI, and the mean value of the Journal Impact Factor (JIF) for the papers in the different groups. These values are all presented in Table 3. The pattern is almost identical between ACI and JIF values. For the domestic papers, the high-impact groups are own country + RoW, with or without EUR. However, for the international papers, the highest-impact groups are either all three (for Italy), or own country only (for Sweden). Clearly papers with name diversity from other countries can make up for a lack of it in own country, which is understandable. However, the advantage of name diversity, especially from the RoW, is important for domestic-only papers.
The results of the analysis of the individual papers with SPSS in the two domestic sets are somewhat anomalous. Of the six different analyses (two countries, three possible dependent variables) the one consistent finding was that reviews were more highly cited and in higher impact journals than articles, as were open access papers. Neither result is unexpected and both were statistically highly significant (p < 0.01). For the Swedish oncology papers, the coefficients of D and D^2 (the number of addresses on a paper and its square) were uniformly as expected, viz., respectively negative and positive and also statistically significant (p < 0.05), showing that fewer addresses were beneficial. For the Italian diabetes papers, the signs of the coefficients of D and D^2 were reversed, but none of them were statistically significant. As for the main question, whether the coefficients of EUR and RoW would be positive, the results are shown in Table 4 for the six analyses. Only one of the 12 coefficients (two for each of the six analyses) was statistically significant. This was the positive effect of other Europeans on the JIFs of the Italian diabetes researchers. None of the other 11 gave a statistically significant result. However, it is worth noting that five of the six analyses showed that other Europeans had a positive effect, whereas only two of the RoW analyses showed one and they were much smaller than the negative effects shown by the other four analyses. This is an important result.
Data are shown for different groups of authors based on their names: own country only, own + Western European (EUR), own + Rest of the World (RoW) and all three groups. The highest impact group in each set is tinted bright green and the second-highest impact group is tinted pale green.
Our detailed methodology also permitted an analysis of the sex ratio of both the autochthonous and visiting researchers. For most of the six groups, females outnumbered males in these two subject areas and visitors were more female than the locals in both countries and from both Europe and the Rest of the World. This is a surprising result, as one might have expected visiting scientists to be predominantly male (perhaps they are in physics and engineering). However, they tended to be less productive than locals as their percentage contributions were less than their percentage presences.

DISCUSSION
The results of the analysis with SPSS of the individual papers were disappointing, probably because the numbers of papers were relatively small and so the numbers of independent variables had to be limited. No account was taken of detailed subject areas, or of research level. In previous exercises, the numbers of papers were much greater (by an order of magnitude) and many more independent variables were considered. There is some slight evidence that visitors from other European countries have a positive effect, especially those going to Italy to work on diabetes. If the exercise were to be repeated, it would be desirable to cover a wider range of medical subject areas so as to provide many more papers.
It was suggested to us (by one of the referees) that the presence of researchers with foreign names in a research team on a domestic paper might not just make it of greater impact, but possibly bias the research subject area. The one analysis that could allow this hypothesis to be tested was of the cancer manifestations that were researched by the Swedes and whether they were different if the team included one or more members with names characteristic of East Asia, particularly China, Japan and Korea (CJK). Would the cancer manifestations being researched be skewed towards those that caused a higher relative burden in those countries and away from those most burdensome in Sweden?
We checked the cancer burden in Disability Adjusted Life Years (DALYs) using World Health Organization (WHO) data for 2010. [22] We selected three cancers with a much higher burden in CJK, namely liver cancer (6.1 times as burdensome as in Sweden), stomach cancer (4.2 times) and oesophageal cancer (3.8 times). For comparison, we selected three cancers with a much lower CJK burden, prostate cancer (0.10 times), melanoma (0.23 times) and breast cancer (0.38 times that in Sweden). We then identified those domestic papers with one of the CJK-named authors whose names appeared twice or more in the file. Of the 309 such papers, 53 were on a cancer more burdensome in Sweden and 21 were on an East Asian cancer. For comparison, we compared the distribution of papers by cancer site where there was no CJK author. For these, the ratio of those on Swedish cancers to those on East Asian cancers was 730/171 = 4.27. So the expected number of papers on East Asian cancers where a CJK author was present would have been 53/4.27 = 12. The difference between this number and the observed number of 21 is statistically significant (on the Poisson distribution with one degree of freedom) with p = 0.8%. It therefore does appear that the presence of a CJK member in a Swedish cancer research team has altered the subject matter of the research somewhat to reflect the interests and experience of the visitor. Of course, this is only one observation, but it does suggest that the question is worthy of further investigation.
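The arithmetic of this comparison can be checked with a few lines of Python. The sketch computes the expected number of papers without rounding and a one-sided Poisson upper tail; since this may not be exactly the form of test applied above, the probability it yields need not coincide with the quoted value.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)      # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # step to P(X = i + 1)
    return 1.0 - cdf


# Papers without a CJK-named author: Swedish-burden vs East Asian-burden cancers.
ratio = 730 / 171
# Expected East Asian-cancer papers among those with a CJK-named author,
# given the 53 such papers on cancers more burdensome in Sweden.
expected = 53 / ratio
# Probability of observing 21 or more such papers by chance.
p_value = poisson_upper_tail(21, expected)
print(f"expected = {expected:.1f}, one-sided p = {p_value:.3f}")
```

A small probability here supports the same reading as above, namely that papers with a CJK-named author are over-represented among those on cancers with a higher East Asian burden.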

CONCLUSION
We have developed a methodology that could be used to calculate the contributions of researchers with foreign names and women to the impact of European medical research papers, and to their subject matter. However, more in-depth analysis is needed to discover how this is taking place.
In particular, it will be necessary first to check that these "foreign" researchers really are from a different country, and not second- or third-generation immigrants. We will then need to send questionnaires to, and hold interviews with, the foreign researchers and their team leaders to learn about how their different backgrounds may have helped the research in which they were engaged.