ORCID growth and field‐wise dynamics of adoption: A case study of the Toulouse scientific area

Research‐focused information systems harvest and promote the scientific output of researchers. Disambiguating author identities is key when disentangling homonyms to avoid merging several persons' records. ORCID offers an identifier to link one's identity, affiliations and bibliography. While funding agencies and scholarly publishers promote ORCID, little is known about its adoption rate. We introduce a method to quantify ORCID adoption according to researchers' discipline and occupation in a higher‐education organization. We semi‐automatically matched the 6,607 staff members affiliated with the 145 labs of the Toulouse scientific area with the 7.3 million profiles at orcid.org. The observed ORCID adoption of 41.8% comes with discipline‐wise disparities. Unexpectedly, only 48.3% of all profiles listed at least one work, and profiles with no works might have been created merely to obtain an identifier. Those 'empty' profiles are of little interest for the entity disambiguation task. To our knowledge, this is the first study of ORCID adoption at the scale of a multidisciplinary scientific metropole. This method is replicable, and future studies can target other cases to contrast the dynamics of ORCID adoption worldwide.


INTRODUCTION
Scholarly publications feature the names of the contributing authors in the by-line of the articles. With an increasing number of scholarly works published each year, the number of homonyms is growing too: of the more than 6 million authors in a major abstract and citation database, more than two-thirds share a last name and first initial with another author, and an ambiguous name in the same database refers to eight people on average (Sabine, 2014).
Such ambiguity has several implications, a critical one being identity theft. The recent Surgisphere scandal, a.k.a. LancetGate, highlighted the issue of homonymy detection (Piller, 2020): a US-based tenured faculty member had padded his CV with publications, two-thirds of which were written not by him but by homonyms. Bibliometric studies also suffer from the 'namesake' problem: Harzing (2015) highlighted the case of author Y. Wang, who published nine papers a day according to Web of Science data. In reality, this identity amalgamates thousands of academics whose output is counted as one.
This article is an extended version of a conference paper given in French (Heusse & Cabanac, 2020, Adoption de l'identifiant chercheur ORCID : le cas des universités toulousaines. In: INFORSID'20: 38e congrès de …).
Research organizations worldwide use current research information systems (CRISs) to collect, analyse and share their research output (Fabre et al., 2021; Sivertsen, 2019). They rely on document identifiers (e.g., DOI) and author identifiers such as VIAF, ISNI and, since October 2012, ORCID. 1 ORCID stands for 'Open Researcher and Contributor ID' and is operated by a non-profit organization that mints identifiers for authors of scholarly works (Haak et al., 2012). As of May 2021, 2 there were 11 million ORCIDs, more than the 7.8 million researchers that UNESCO estimates make up the worldwide research population (Soete et al., 2015, p. 33). This discrepancy may partly come from non-researchers, such as support staff and librarians, creating ORCID profiles. Organizations and publishers have also adopted and promoted ORCID: funding agencies and editorial managers request an ORCID to submit a proposal, a manuscript or a report during peer review (Haak et al., 2018; Hanson et al., 2016). ORCID has also become a key component of open archives (Brown et al., 2016).

ORCID PROFILE: CREATION, UPDATE AND VISIBILITY SETTINGS
There were two ways to create an ORCID profile. On the one hand, until 2016 an institution such as the University of Colorado could create IDs for its employees. 3 On the other hand, authors are free to sign up with their email address and obtain an ID. They are asked to pick a visibility setting for their profile: public, restricted to trusted parties only, or private (Fig. 1). Several sections make up an ORCID profile, as shown […]. A publicly available profile may contain items that are visible to trusted parties only or even to the owner only (e.g., some confidential works).
ORCID exercises no control over the contents of profiles. As a result, many profiles are devoid of information: all sections appear empty, and even the owner's identity is undisclosed (we elaborate on this in Section 5). Readers have no way of knowing whether owners entered no data at all or chose to mask some entries (by picking a visibility other than 'Everyone'). While some authors legitimately prefer not to disclose information, Teixeira da Silva (2021) reported abusive profile creations by paper mills to fool publishers at submission time.
ORCID owners can enter data themselves or use built-in import features. For some sections, such as Funding and Works, owners provide an identifier (e.g., DOI, PubMed ID, arXiv ID) or a search query, and the system fetches the relevant metadata. Haak

Key points
• ORCID adoption, measured at the scale of a French multidisciplinary scientific metropole, was 41.8% among its 6,607 staff members.
• The ORCID adoption rate varies throughout the year: it peaked in October, when applications to the French National Research Agency were due.
• ORCID adoption was higher in science, technology, engineering and mathematics (STEM) than in social sciences and humanities (SSH). The yearly adoption rate differed among disciplines within both STEM and SSH.
• Senior and junior faculty and researchers adopted ORCID at a higher rate than postdocs, although postdocs are more likely to use an up-to-date ORCID profile for applications.

To overcome these biases, we designed a study focusing on a large scientific area in France (Heusse & Cabanac, 2020) gathering 6,471 research staff (a sample of comparable size to that of the OECD study). We found an adoption rate of about 40% in the Toulouse area, compared with 17% at Caen University, a smaller site with 1,047 research staff members (Boudry & Durand-Barthez, 2020). To the best of our knowledge, there is no other estimate of ORCID adoption in the general research population of a scientific metropole. The present paper extends our previous study published in French (Heusse & Cabanac, 2020) to report ORCID adoption in the Toulouse scientific area over the period 2012-2020.

CASE STUDY: THE TOULOUSE SCIENTIFIC AREA
This article aims to assess the dynamics of ORCID adoption in one of the leading academic sites in France.
About 20 years ago, European countries harmonized the structure of university education. The organization of research, however, is not uniform across Europe. As this article focuses on a scientific area in France, we briefly summarize the French higher-education landscape (Angermuller, 2017; Chevaillier, 2001; Grossetti et al., 2020). This context is important to bear in mind when interpreting the results presented in this paper.
France had 115,308 teaching and research staff in 2017 (Meuric, 2020, p. 91), working as tenured civil servants or contract workers:
• Academics employed by universities and higher schools (grandes écoles) divide their time between research and teaching (192 hours a year of face-to-face teaching). Nine-tenths are tenured staff (Meuric, 2020, p. 100).
Whatever their employer, all staff are affiliated with a public research laboratory (laboratoire de recherche), which provides office space and research facilities (Chevaillier, 2001, p. 57). Unlike in some countries such as the US, a laboratory is not usually run by a single PI; it gathers many teams (équipes) and hosts from 30 to more than 500 members.
The Toulouse scientific area ranks third after Paris and Lyon in terms of scientific output (Grossetti et al., 2020). More than 100,000 students a year are trained in all disciplines, 45% of whom are enrolled in a master's or doctoral programme. More than 9,000 researchers, whose disciplinary breakdown is detailed in Section 4.1, work in 145 public research laboratories. The Toulouse scientific area is best known for its contributions to health, astronomy, universe sciences and economics.

METHOD AND DATA
This section introduces the data collection protocol we used. We detail how the demographic data retrieved from the institutions were matched to ORCID profiles and manually validated. The

Demographic and bibliographic data collection
Demographic data on the 6,607 persons affiliated with the research centres in the Toulouse scientific area were collected in a previous study (Heusse, 2016). This staff registry tabulates 5,029 faculty and researchers as well as 1,578 support staff involved in research activities: engineers, technicians and clinical research fellows. The 4,284 PhD students enrolled in universities and schools (écoles) were not included. Each individual in the registry is characterized by his/her identity (first name, last name), year of birth, sex, institution, job category, rank, laboratory and scientific domain. The staff registry lists individuals (Fig. 3) with the following job categories:
• Faculty (enseignants-chercheurs) teach and conduct research as junior faculty (maître de conférences) or senior faculty (professeur des universités).
• Researchers are involved in research and may teach, although teaching is not mandatory. Four ranks were considered: postdocs (postdoctorant), junior (chargé de recherche), senior (directeur de recherche) and undefined (e.g., visitors).
Demographic data were collated by each research structure (145 laboratories). We merged these data into the staff registry of the Toulouse scientific area and tagged each person with the discipline of the lab he/she belongs to. Each lab is associated with one of the following six discipline groups, delineated by the research council of the Toulouse scientific area to reflect its major scientific highlights:
• AAE: Astronomy, astrophysics, environment
• AHS: Arts, humanities, social sciences
• EMC: Engineering, mathematics, computing
• HBA: Health, biology, agronomy
• LEM: Law, economics, management
• PC: Physics, chemistry
The sex ratio is imbalanced across disciplines (Fig. 5), with more men than women in all disciplines except Art, Humanities, Social Sciences and Health, Biology, Agronomy.
We aimed to retrieve the ORCID profiles of the 6,607 individuals by querying the ORCID public API 7 with their first and last names. Results were overabundant for some identities, such as 'Philippe Durand', which yielded 4,534 profiles. The top-ranked profiles were perfect homonyms and the remaining ones were partial matches such as 'Philippe S. Durand' and 'Romain […] and the person's identity, biography, affiliations and works. Part of these data appear on the online version of ORCID profiles (Fig. 2).
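The name-based lookup can be sketched as follows. This is a minimal illustration, not the authors' script: the search endpoint (`https://pub.orcid.org/v3.0/search/`), the `given-names` and `family-name` query fields and the `rows` paging parameter are assumptions based on the ORCID public API v3.0 documentation, the cap of 20 rows is illustrative, and the helper names are hypothetical. Network calls are left out; the sketch only builds the query URL and keeps perfect homonyms, discarding partial matches such as 'Philippe S. Durand'.

```python
import unicodedata
from urllib.parse import urlencode

# Assumed ORCID public API v3.0 search endpoint; check the current documentation.
ORCID_SEARCH = "https://pub.orcid.org/v3.0/search/"


def search_url(first_name: str, last_name: str, rows: int = 20) -> str:
    """Build a Solr-style query on the given-names and family-name fields."""
    query = f'given-names:"{first_name}" AND family-name:"{last_name}"'
    return ORCID_SEARCH + "?" + urlencode({"q": query, "rows": rows})


def normalize(name: str) -> str:
    """Case-fold, strip accents and collapse whitespace ('Jaurès ' -> 'jaures')."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


def is_perfect_homonym(first: str, last: str, cand_first: str, cand_last: str) -> bool:
    """True when a candidate profile carries exactly the registry's first and last name."""
    return normalize(first) == normalize(cand_first) and normalize(last) == normalize(cand_last)


if __name__ == "__main__":
    print(search_url("Philippe", "Durand"))
    print(is_perfect_homonym("Philippe", "Durand", "philippe", "DURAND"))     # True
    print(is_perfect_homonym("Philippe", "Durand", "Philippe S.", "Durand"))  # False
```

Normalizing both sides before comparison avoids rejecting a true homonym because of capitalization or a missing accent.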
We tagged the profiles mentioning named entities relevant to the Toulouse scientific area: 12 cities (the largest being Auch, Castres, Foix, Rodez, Tarbes and Toulouse), university names (Champollion, Capitole, Mirail and its new name Jean Jaurès, and Paul Sabatier) and 13 school acronyms (e.g., ENSEEIHT, SUPAERO, TBS). This tag, together with the count of matching entities, proved helpful for the visual inspection detailed in the next section.
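The tagging step can be illustrated by counting the distinct local entities mentioned in a profile's text. This sketch is hypothetical: it uses only the entity names quoted in the paragraph above (the study used 12 cities and 13 school acronyms, not all listed here), matches whole words after case and accent folding, and assumes the profile's fields have already been flattened into a single string.

```python
import re
import unicodedata

# Illustrative subset only: the entity names quoted in the text
# (the study used 12 cities and 13 school acronyms in total).
TOULOUSE_ENTITIES = [
    "Auch", "Castres", "Foix", "Rodez", "Tarbes", "Toulouse",
    "Champollion", "Capitole", "Mirail", "Jean Jaurès", "Paul Sabatier",
    "ENSEEIHT", "SUPAERO", "TBS",
]


def fold(text: str) -> str:
    """Lower-case and strip accents so 'Jaurès' and 'JAURES' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def count_local_entities(profile_text: str) -> int:
    """Number of distinct Toulouse-area entities mentioned as whole words."""
    folded = fold(profile_text)
    return sum(
        1
        for entity in TOULOUSE_ENTITIES
        if re.search(r"\b" + re.escape(fold(entity)) + r"\b", folded)
    )


if __name__ == "__main__":
    print(count_local_entities("Université Paul Sabatier, Toulouse, France"))  # 2
    print(count_local_entities("Auchan, Lyon"))  # 0
```

Word boundaries keep short names such as 'Auch' from matching inside longer words like 'Auchan'.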
Matching the staff registry to ORCID profiles: Semi-automated approach
With the ORCID data collected, we faced three cases for a given person's identity used as a query:
1. No matching ORCID profile.
2. One matching ORCID profile.
3. Between 2 and 20 matching ORCID profiles.
Each matching identity-profile pair was annotated with one of these codes:
• Code '0' when the looked-up identity and the ORCID profile did not match (e.g., different first names).
We considered as ORCID adopters the 2,789 identities with code '1' or '?'. This is a conservative estimate of the number of ORCID adopters in the Toulouse scientific area, since some identities coded '??' were discarded for lack of evidence.
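The three matching cases and the adopter rule can be expressed in a few lines. This is a sketch under stated assumptions: the case labels and helper names are hypothetical, only the codes quoted in the text are used, and an identity counts as an adopter when at least one of its identity-profile pairs is coded '1' or '?'. Since the text does not say what happens beyond 20 candidates, the sketch simply flags that situation.

```python
# Codes that make an identity count as an ORCID adopter, as stated in the text.
ADOPTER_CODES = {"1", "?"}


def match_case(n_profiles: int) -> str:
    """Map the number of perfect-homonym profiles to the protocol's three cases."""
    if n_profiles == 0:
        return "no match"
    if n_profiles == 1:
        return "one match"
    if n_profiles <= 20:
        return "2-20 matches"
    return "over 20 (not covered by the protocol)"


def count_adopters(annotations: dict) -> int:
    """annotations maps each registry identity to the codes of its identity-profile pairs."""
    return sum(1 for codes in annotations.values() if ADOPTER_CODES & set(codes))


if __name__ == "__main__":
    toy = {"a": ["1"], "b": ["0", "0"], "c": ["??"], "d": ["0", "?"], "e": []}
    print(count_adopters(toy))  # 2 (identities a and d)
```

Identities coded only '??' are left out, which mirrors the conservative estimate described above.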

QUANTITATIVE RESULTS
As of September 2020, 41.8% of the considered population had an ORCID profile. Setting aside the 'Other staff' job category, ORCID adoption amounts to 46.9%. These two overall figures hide disparities by job category and discipline, as discussed in this section.
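The two headline figures are consistent with the 25.5% adoption among the 1,578 'other staff' reported for that job category. The back-of-the-envelope check below removes that group from the 6,607 registered persons; it works on rounded percentages, so it only recovers the published 46.9% to one decimal place. The function name is illustrative.

```python
def rate_excluding(total_n: int, total_rate: float, group_n: int, group_rate: float) -> float:
    """Adoption rate of the population left once one group is set aside."""
    adopters_total = total_n * total_rate  # estimated adopters in the whole registry
    adopters_group = group_n * group_rate  # estimated adopters in the excluded group
    return (adopters_total - adopters_group) / (total_n - group_n)


rate = rate_excluding(6607, 0.418, 1578, 0.255)
print(f"{rate:.1%}")  # 46.9%
```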

ORCID adoption by job category
Researchers were the most likely to register an ORCID profile (54.7%), with adoption varying by rank (Fig. 6). Junior and senior tenured researchers registered ORCID profiles more often (64.0% on average) than postdocs (36.4%). This imbalance is all the more surprising as postdocs usually have an intense research activity and are expected to care about the visibility of their research (Nicholas et al., 2020).
Faculty members were less likely to register an ORCID (42.5%), with little difference across seniority levels. This lower adoption may result from two factors. On the one hand, faculty members are overrepresented in disciplines with a lower use of ORCID: Art, Humanities, Social Sciences and Law, Economics, Management. On the other hand, some faculty members are more active in teaching and management roles than in research per se.
The 'other staff' category has the lowest ORCID adoption rate (25.5%). These are engineers, technicians and assistants working in a research lab. Their work is acknowledged in papers, and they are sometimes co-authors of their lab's publications.
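Breakdowns such as the adoption rates per job category, and likewise per discipline, reduce to a grouped ratio of adopters over registered persons. A minimal sketch, assuming each registry entry is a dictionary with a boolean `adopter` field; the field names and toy records are hypothetical, not the study's data.

```python
from collections import defaultdict


def adoption_by(registry, key):
    """Share of ORCID adopters per value of `key` (e.g. 'job' or 'discipline').

    registry: iterable of dicts, each with the grouping key and a boolean 'adopter'.
    """
    totals = defaultdict(int)
    adopters = defaultdict(int)
    for person in registry:
        group = person[key]
        totals[group] += 1
        adopters[group] += person["adopter"]  # True counts as 1, False as 0
    return {group: adopters[group] / totals[group] for group in totals}


if __name__ == "__main__":
    toy = [
        {"job": "Researcher", "discipline": "PC", "adopter": True},
        {"job": "Researcher", "discipline": "LEM", "adopter": False},
        {"job": "Other staff", "discipline": "PC", "adopter": False},
        {"job": "Faculty", "discipline": "PC", "adopter": True},
    ]
    print(adoption_by(toy, "job"))
    print(adoption_by(toy, "discipline"))
```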

ORCID adoption by discipline and sex
The average ORCID adoption per discipline is 39.2%, ranging from 21.7% for Law, Economics, Management to 49.8% for Physics, Chemistry (Fig. 7). This result is in line with the publication habits of these disciplines and with the well-established use of identifiers in the hard sciences. The sex imbalance among adopters mirrors the one observed in the population (Fig. 5) […] Health, Biology, Agronomy, where the share of women among adopters is 5% lower than in the discipline as a whole.
Figure 8 shows the adoption of each discipline per job category. Researchers are the top adopters in all disciplines but Health, Biology, Agronomy. The promotion of ORCID by research institutes 10 might be a favouring factor here. In Law, Economics, Management, researchers and faculty members adopted ORCID at a lower rate than average. Other staff have a high adoption rate in AAE, HBA and PC, disciplines whose labs were more likely to list support personnel as co-authors (see Section 6.2).

ORCID adoption through time
[…] October 2012 and September 2020 is shown in Fig. 9. […] consortium members to be identified with an ORCID: usually the PI and at least one partner per institution involved. 11 We split the cumulative red line of Fig. 9 to plot the cumulative number of profiles created in each discipline (Fig. 10).
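The cumulative per-discipline curves of Figs. 9 and 10, and the October peak noted in the key points, can be derived from each profile's creation month. In this hypothetical sketch, creation dates are assumed to be available for every matched profile, `records` pairs a discipline code with a creation month formatted as 'YYYY-MM', and the toy records are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import accumulate


def month_range(start: str, end: str):
    """Yield 'YYYY-MM' strings from start to end inclusive."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month == 13:
            year, month = year + 1, 1


def monthly_counts(records):
    """Count profile creations per discipline and per 'YYYY-MM' month."""
    per_discipline = defaultdict(Counter)
    for discipline, month in records:
        per_discipline[discipline][month] += 1
    return per_discipline


def cumulative(counter, months):
    """Cumulative number of profiles created up to each month of `months`."""
    return list(accumulate(counter.get(m, 0) for m in months))


def peak_calendar_month(records):
    """Calendar month (1-12) with the most creations, all years pooled."""
    by_month = Counter(int(month[5:7]) for _, month in records)
    return by_month.most_common(1)[0][0]


if __name__ == "__main__":
    toy = [("PC", "2019-10"), ("PC", "2020-01"), ("AHS", "2019-10"), ("PC", "2019-12")]
    months = list(month_range("2019-10", "2020-01"))
    counts = monthly_counts(toy)
    print(cumulative(counts["PC"], months))   # [1, 1, 2, 3]
    print(cumulative(counts["AHS"], months))  # [1, 1, 1, 1]
    print(peak_calendar_month(toy))           # 10
```

Generating every month explicitly keeps months without any creation in the curve, so the cumulative lines stay flat rather than skipping dates.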
Disciplines fall into two groups with different growth patterns. Throughout the 2012-2020 period, AAE-EMC-HBA-PC showed a higher adoption rate than AHS-LEM. In the first group, HBA adopted ORCID early, then AAE became a forefront adopter; despite a slow start, PC is nowadays the leading adopting discipline. In the second, lower-adoption group, LEM created ORCID profiles faster than AHS until 2018, when AHS showed the highest adoption rate of all six disciplines. The reasons behind these changing adoption speeds are unknown. We can only speculate about the different publishing houses active in AAE-EMC-HBA-PC (i.e., STEM: Science, Technology, Engineering and Mathematics) versus AHS-LEM (i.e., SSH: Social Sciences and Humanities) and their use of ORCID:
• STEM publishers are generally large firms (e.g., Elsevier, Springer, Wiley). Most of them are members of the ORCID consortium and have integrated ORCID into their peer-review and production systems. Authors and reviewers are encouraged to create an ORCID and link it to their profile on the submission/review platform (Johnson et al., 2018, p. 162).
• SSH publishers are more diverse and smaller (e.g., local university presses). Many are small companies publishing works on a specific topic, such as Dalloz in Law and Vrin in Philosophy, or university presses such as Presses Universitaires de Rennes in the SSH. Most of these publishers are not members of the ORCID consortium, and their authors are not asked to provide an ORCID at the submission stage.

Diversity of ORCID profile (mis)uses
Recall that researchers from the Toulouse area register an ORCID themselves (no automated institutional registration is performed) and fill in the associated sections, such as identity, employment, biography and works (Section 2). The ideal ORCID profile, as illustrated in Fig. 2, lists comprehensive data about the author and his/her publications. Looking at the ORCID profiles in our corpus, we found that many fall far short of such a thoroughly completed and up-to-date profile. Most profiles appear incomplete or even empty: the profile in Fig. 11 only shows 'philippe Durand' as the author's identity (there were five such empty profiles for this name at the time of data extraction). This homonym issue is even worse for surnames such as Wang (Youtie et al., 2017).
11. See page 20 in https://anr.fr/fileadmin/aap/2019/aapg-anr-2019-Guide.pdf
For our corpus, Fig. 12 shows the percentage of ORCID profiles with at least one publication listed. All disciplines considered, only 48.3% of the profiles feature one or more works. This is

FIGURE 12
Share of ORCID profiles with at least one publication.

[…] (Fig. 1) and never connect to ORCID and change the visibility to public, the works are attached to the profiles but invisible to the public.
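The share of profiles listing at least one work can be computed from the public records. This sketch assumes the JSON layout of ORCID API v3.0 records (`activities-summary`, then `works`, then a list of work `group` entries), which should be checked against the current documentation; the function names are hypothetical. Because the public API only exposes items whose visibility is 'Everyone', a profile whose works are restricted is counted as empty, which is the limitation described above.

```python
def has_public_work(record: dict) -> bool:
    """True if the record lists at least one work group (assumed v3.0 layout)."""
    works = (record.get("activities-summary") or {}).get("works") or {}
    return len(works.get("group") or []) > 0


def share_with_works(records: list) -> float:
    """Fraction of records with at least one publicly visible work."""
    if not records:
        return 0.0
    return sum(has_public_work(r) for r in records) / len(records)


if __name__ == "__main__":
    empty = {"activities-summary": {"works": {"group": []}}}
    full = {"activities-summary": {"works": {"group": [{"work-summary": [{}]}]}}}
    print(share_with_works([empty, full, {}, full]))  # 0.5
```

Missing or null sections are treated as empty rather than raising an error, since sparse records are common in the corpus.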
Based on the visual inspection of more than 100 profiles, we noted the following intriguing characteristics regarding two aspects:
• The Biography and Employment sections are supposed to list the successive affiliations of the profile owner. Yet some users list foreign universities that never employed them; these turned out to be the affiliations of colleagues with whom the profile owner had collaborated. There is also a propensity to list prestigious institutions. Some researchers employed by French national research organizations (e.g., CNRS and INSERM) and affiliated with a lab in the Toulouse area mention the Paris headquarters (e.g., CNRS Paris) instead of the regional lab (e.g., CNRS IRIT Toulouse), even though they use their 'local' affiliation in their published papers.
• The Works section does not seem comprehensive in most ORCID profiles, which contain only a fraction of the owner's publications. We noticed several profiles of highly productive researchers missing hundreds of bibliographic records compared with the Web of Science. We also noted quality issues: some bibliographic records showed no publication date, and some titles and author names were misspelled.
These errors suggest that profile owners entered the bibliographic data manually. They might not have known that they could retrieve the metadata automatically by providing the DOIs or other identifiers of the works they wanted to add to their profiles.
A researcher can create multiple ORCID profiles linked to several of his/her email addresses. One can only speculate that it was easier and faster to create a new ORCID than to search one's archives for an ID and password created a long time ago. We found 18 individuals in our corpus who created multiple ORCID profiles: eight cases of one completed and one empty profile, seven cases of two empty profiles, and three cases of two completed profiles (with varying degrees of completion). The profile created last was not always the one the researcher completed. We do not suspect any fraud related to fake profile creation as reported by Teixeira da Silva (2021).
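Spotting individuals with several accepted profiles, and classifying each case as in the tally above, amounts to grouping identity-profile pairs by person. In this hypothetical sketch, the input format and the notion of a 'completed' profile (for instance, at least one filled-in section) are assumptions for illustration; the function also covers persons owning three or more profiles.

```python
from collections import Counter, defaultdict


def multiple_profile_patterns(matches):
    """matches: iterable of (person_id, orcid_id, is_completed) for accepted pairs.

    Returns {person_id: pattern} for persons owning two or more profiles.
    """
    by_person = defaultdict(list)
    for person, _orcid, completed in matches:
        by_person[person].append(bool(completed))
    patterns = {}
    for person, flags in by_person.items():
        if len(flags) < 2:
            continue  # a single profile is the normal case
        n_completed = sum(flags)
        if n_completed == 0:
            patterns[person] = "all empty"
        elif n_completed == len(flags):
            patterns[person] = "all completed"
        else:
            patterns[person] = "completed and empty"
    return patterns


if __name__ == "__main__":
    toy = [("p1", "A", True), ("p1", "B", False), ("p2", "C", False),
           ("p2", "D", False), ("p3", "E", True)]
    patterns = multiple_profile_patterns(toy)
    print(patterns)  # {'p1': 'completed and empty', 'p2': 'all empty'}
    print(Counter(patterns.values()))  # tally per pattern
```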
One may wonder why people do not create ORCID profiles and, when they own one, why they do not fill it in properly. ORCID is not the only platform allowing authors to create a profile listing their publications (Boudry & Durand-Barthez, 2020; French & Fagan, 2019; Tran & Lyon, 2017). Some authors might own such a profile and view ORCID as a mere identifier provider, disregarding its bibliography curation capability. These authors might invest more effort in Google Scholar and the like, at the expense of their ORCID profiles.
We tested this hypothesis by tabulating (Table 1)

Focus on the ORCID profiles by 'other staff'
Slightly more than 25% of the 'other staff' have created an ORCID profile (Section 5.1). Whether these staff are included as co-authors varies across disciplines. Other staff were included as co-authors in 7 of the 30 labs in Health, Biology, Agronomy.
It also happened in Astronomy, Astrophysics, Environment and in Physics, Chemistry, with a few additional cases in Engineering, Mathematics, Computing. It virtually never happened in Law, Economics, Management or in Art, Humanities, Social Sciences. Being listed as a co-author is an incentive for 'other staff' to create an ORCID profile.

CONCLUSION
Introduced in 2012, ORCID has become a key global infrastructure for disambiguating scholarly authors' identities. This study reported on the adoption of ORCID in one of the largest scientific areas in France. Unlike Youtie et al. (2017), who focused on a single surname of interest (Wang), we considered the entire multidisciplinary research workforce of a French scientific metropole. Matching the 6,607 staff members affiliated with the 145 labs based in the Toulouse area in 2016 with the ORCID registry, we observed a steady increase in adoption for all research disciplines. The overall adoption is 41.8%, with varying percentages across job ranks and disciplines. Faculty members in Health, Biology, Agronomy are the leading adopters (62.0%), whereas only 20.4% of those in Law, Economics, Management have an ORCID profile. Such differences might result from various contexts: discipline-related incentives enacted by research institutions, funding agencies and the publishing industry. As for the 60% of ORCID profiles with some content, a qualitative analysis revealed that they were not comprehensively filled in. Profile sections documenting the biography, employment history, other identifiers, awards and grants received are frequently lacking. The same goes for publications, with incomplete lists of works. We acknowledge that people hired between 2017 and 2021 are not considered in this study. Likewise, people who left Toulouse after 2016 are still included in this study, performed in 2020. All in all, our results suggest that a majority of researchers in the Toulouse area make little use of ORCID and have little understanding of (or interest in) identifiers.
12. https://orcid.org/statistics
13. https://recognition.webofsciencegroup.com/awards/highlycited/2019/
There are only a few comparable studies of ORCID adoption worldwide. Dasler et al. (2017) analyzed the ORCID profiles created over 2012-2016, with breakdowns by discipline and location. Unlike two later studies of French scientific areas, this global-scale study lacks a reference population. The adoption in Toulouse of 41.8% (the present study) is higher than the 17.1% reported by Boudry and Durand-Barthez (2020) for Caen University, which has a smaller research workforce. We cannot explain this nearly 2.5-fold difference, and more research is needed to understand it. The limited adoption of ORCID, and the related misuses, call for better education of academics on identity management and on the use of ORCID to tackle author homonymy worldwide, as highlighted by the Surgisphere affair (Piller, 2020).