Scientific Collaboration Network Analysis: A Brazilian Computer Science Graduate Programs Case

Scientific collaboration networks can present different views of researchers' interaction. The analysis of these networks promotes research transparency and helps to understand how science evolves in different contexts. In this article, we present a scientific collaboration network analysis using social network metrics such as average degree, connected components, and betweenness and closeness centralities, applied to five Brazilian Computer Science graduate programs. An online artifact built upon the design science research paradigm, namely SCI-synergy, was developed to ease the analysis of publication data available on the Digital Bibliography & Library Project (DBLP) from the members of the graduate programs allied to the Federal University of Minas Gerais (UFMG), the State University of São Paulo (USP), the Federal University of Rio Grande do Norte (UFRN), the Federal University of Amazonas (UFAM), and the University of Brasília (UnB). The objective is to analyze the scientific collaboration network of each program, focusing on researchers' collaboration, to understand patterns in intra- and inter-program relationships. The analysis covers two database updates, from 2019 and 2021. Results show that intra-program relationships are not always stronger among the highest-ranked programs in CAPES. Nevertheless, UFMG presents the greatest total of cooperation among the five programs, indicating research leadership and good inter-program collaboration patterns. We advocate that collaboration network analysis using an online artifact can be useful to understand the patterns of all Brazilian Computer Science graduate programs and to discover research perspectives to be improved, avoided, or even applied to other contexts.


Introduction
The scientific production process currently requires strategies to interconnect and enrich the different perspectives and the interdisciplinarity of science. Information sharing, the joining of competencies, and the union of researchers' efforts to pursue common goals drive the production of knowledge [1]. Besides, the association of different points of view may generate new research perspectives that, combined with technology and connection facilities, might increase the number of collaborations among geographically dispersed individuals [2]. These factors contribute to the current appreciation of researchers who are capable of forming productive scientific collaboration networks.
As the number of studies and scientific publications increases, so does the interest in analyzing these collaborations. Regardless of specificities, the co-authoring of artifacts generated by scientific activity, particularly publications, can be taken as an indicator of collaboration [3,4,5]. Also, exploring co-authorship can reveal the flow and patterns of knowledge integration, since the relationships between authors can serve as the basis for scientific collaboration network analysis.
Co-authorship studies cover many research contexts, such as the differences between academic and technical collaborations [6] and the characteristics of collaboration within disciplines and among researchers from several institutions or countries [7,8,9,10,11,12]. These studies confirm that collaboration among authors has increased in science. Other studies demonstrate that international cooperative work has greater impact and visibility, increasing researchers' productivity [13]. As for inter-institutional research collaboration, it acts as a channel of open innovation that boosts the flow of knowledge and technology between the actors of an innovation system [14,15,16].
This work focuses on Computer Science graduate programs allied to Brazilian universities. The Brazilian Computer Science community has long studied its graduate programs, as presented in a seminal work [17] and in later works [18,19,20]. Focusing on collaboration networks, the authors of [21] use a quantification method based on the Gini coefficient to analyze scientific collaboration in research networks. In previous works [22,23], we presented a collaboration network analysis as a management tool using publication data available in the Lattes Platform from one Computer Science graduate program in Brazil, including a recommendation module for scientific partnerships. Besides, many international works have led to interesting findings, such as [24], which explores how collaboration in Computer Science has evolved since 1960, a work measuring the influence of scientist groups based on multiple types of collaboration [25], and a cross-disciplinary survey of scientific teamwork collaboration [26]. Nevertheless, there is a research gap concerning intra- and inter-program scientific collaboration analysis of Brazilian Computer Science graduate programs.
Therefore, this work builds upon the fact that information about researchers' publications in their respective graduate programs, including intra- and inter-program partnerships, is not usually available online. Thus, the assumed hypothesis is that online scientific collaboration network artifacts may help to promote the transparency of co-authorship. The resulting information might help researchers and institutions to analyze collaborative relationships and improve research policy decisions. For example, institutions may implement initiatives to foster potential collaborations between researchers who share common interests but are geographically distant, in order to reduce regional disparities or increase internationalization.
In this context, we present a scientific collaboration network analysis using social network metrics through an online artifact, namely SCI-synergy. SCI-synergy was built upon the design science research (DSR) paradigm, an important methodology for information systems research. The source data comes from five Computer Science graduate programs linked to Brazilian universities and focuses on the co-authorship relation in publications indexed by the Digital Bibliography & Library Project (DBLP) [27]. The graduate programs are located in five different states in Brazil, illustrating the variety of the country's research in the area: the Federal University of Minas Gerais (UFMG), the State University of São Paulo (USP), the Federal

Social Networks
The scientific study of networks is broadly interdisciplinary, including computer networks, social networks, and biological networks [33]. Social networks have received an enormous amount of interest in the last few years through platforms such as Facebook and Instagram, and their study has led to important breakthroughs in the social sciences.
According to the fundamentals of graph theory, a social network can be represented by a graph G = (V, E), where V is the set of vertices denoting the individuals under consideration, and E is the set of edges corresponding to the existing relationships between these individuals (e.g., friendship, parenthood, or professional collaboration). Relationships can have different intensities that reflect the strength of social connections; this intensity is usually represented by a function w(e), with e ∈ E, which assigns a weight to each edge of the graph.
Scientific collaboration networks are essentially social networks in which relationships represent some type of scientific interaction. In this work, we are interested in a specific network where the vertices correspond to authors of scientific publications and the weighted edges express how much two authors have collaborated in the authorship of articles. In this regard, there are wide-open platforms such as ResearchGate, with over 15 million registered researchers and 118 million publications. However, such platforms do not provide transparency of scientific collaboration through graphical views with social network metrics that help to understand existing relationships from intra- and inter-program perspectives.
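As a minimal sketch of the weighted graph G = (V, E) with w(e) described above, the Python fragment below builds a co-authorship network from publication author lists. It assumes, for illustration only, that w(e) counts the publications two authors share; the function name `build_coauthorship_graph` and the input format are hypothetical and not taken from SCI-synergy.

```python
from itertools import combinations


def build_coauthorship_graph(publications):
    """Weighted undirected graph {author: {coauthor: w(e)}} from author lists.

    Assumption: w(e) is the number of publications the two authors share.
    """
    graph = {}
    for authors in publications:
        unique = sorted(set(authors))          # ignore a name repeated in one record
        for author in unique:
            graph.setdefault(author, {})       # keep authors without co-authors as vertices
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


# Hypothetical author lists, one per publication.
pubs = [["Ana", "Bruno"], ["Ana", "Bruno", "Carla"], ["Davi"]]
g = build_coauthorship_graph(pubs)
print(g["Ana"])                                      # {'Bruno': 2, 'Carla': 1}
print(sum(len(nbrs) for nbrs in g.values()) // 2)    # |E| = 3
```

Storing each edge in both directions keeps neighbor lookups symmetric, which is what undirected co-authorship requires.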

Network Metrics
With the structure of a network, it is possible to compute a variety of measures that capture particular features of the network topology. The literature offers standard measures and metrics for quantifying network structure, including degree centrality, eigenvector centrality, Katz centrality, transitivity, reciprocity, and similarity. Some of these metrics were first introduced to describe aspects of social relationships and are now widely used in many other areas [33]. In this work, we adopt the network centrality concepts of [34]. The following metrics are used to analyze co-authorship relationships:
• average degree: the average number of relationships per node, i.e., the sum of the degrees of all vertices divided by the number of vertices, as presented in Equation 1. Let N = |V| be the number of nodes and L = |E| the number of edges; since each edge contributes to the degree of both of its endpoints, the average degree of a network is given by:
$$\langle k \rangle = \frac{1}{N}\sum_{i \in V} k_i = \frac{2L}{N} \qquad (1)$$
The average degree indicates how connected the network is. A low average degree means many isolated nodes, while a high score indicates many relationships among nodes (i.e., high collaboration). The average degree presents the overall profile of each graduate program network, while the degree of individual vertices can point to highly connected or collaborative researchers who may easily connect with the wider network of researchers.
• connected components: disjoint sets of nodes in which any two nodes of the same set are linked by a path. They can be computed with a union-find (disjoint-set) structure and reveal smaller research groups or communities. This measure gives a picture of how big and how many communities there are in the network. These communities can lie inside a graduate program (intra-program view) or span graduate programs from different universities (inter-program view).
• betweenness centrality: the extent to which a vertex lies on the shortest paths (geodesic paths) between other pairs of vertices, computed with Equation 2 and the breadth-first search algorithm. Let $n^{i}_{s,t}$ be the number of geodesic paths from s to t that pass through i, and let $n_{s,t}$ be the total number of geodesic paths from s to t. The betweenness centrality of vertex i is:
$$x_i = \sum_{s,t} w^{i}_{s,t}, \qquad w^{i}_{s,t} = \frac{n^{i}_{s,t}}{n_{s,t}} \qquad (2)$$
where, by convention, $w^{i}_{s,t} = 0$ if $n_{s,t} = 0$. This measure shows researchers who can act as 'bridges' between nodes in a network, which is a way to find members who can influence the flow around a system.
• closeness centrality: based on the geodesic distances from a node to all other nodes, as presented in Equation 3. Nodes with a high closeness score have the shortest distances to all other nodes, which makes the metric useful for finding the researchers best placed to influence the entire network quickly. Suppose $d_{i,j}$ is the length of a geodesic path from i to j, i.e., the number of edges along the path. Then the closeness centrality of vertex i is:
$$C_i = \frac{N}{\sum_{j} d_{i,j}} \qquad (3)$$
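The stdlib-only Python sketch below shows how the betweenness and closeness definitions can be evaluated on an unweighted co-authorship graph with breadth-first search. It is an illustration, not SCI-synergy's code (which computes these centralities based on [41]). Betweenness follows Brandes' accumulation scheme, excludes paths where i is an endpoint, and sums over ordered pairs (s, t), so each unordered pair of an undirected graph contributes twice. Closeness sums distances only over vertices reachable from i, which is an assumption for disconnected networks. Libraries such as NetworkX normalize both measures differently (closeness uses N - 1 in the numerator, for instance), so absolute values are not directly comparable with Tables 3 and 4.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distance (number of edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """C_i = N_i / sum_j d_ij. Assumption: N_i counts only vertices reachable from i."""
    scores = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        total = sum(dist.values())
        scores[i] = len(dist) / total if total > 0 else 0.0  # isolated vertex -> 0
    return scores


def betweenness(adj):
    """x_i = sum over ordered (s, t) of n^i_st / n_st, via Brandes' algorithm."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                         # vertices in non-decreasing distance from s
        preds = {v: [] for v in adj}       # predecessors of v on geodesics from s
        sigma = dict.fromkeys(adj, 0)      # sigma[v] = number of geodesics s -> v
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)    # dependency of s on each vertex
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Path a - b - c: b lies on both geodesics between a and c (ordered pairs).
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(adj))  # {'a': 0.0, 'b': 2.0, 'c': 0.0}
print(closeness(adj))    # {'a': 1.0, 'b': 1.5, 'c': 1.0}
```

The adjacency can also be a weighted dictionary such as {author: {coauthor: weight}}, since only the neighbor keys are iterated; weights are ignored, matching the "number of edges" definition of $d_{i,j}$.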

Name Disambiguation
In the context of scientific social networks, one challenge is to check whether author names correspond to the same individuals. Although there is a whole area of research on automatic author name disambiguation [35,36,37], we apply the traditional techniques available in DBLP, which include the treatment of homonyms and synonyms.
A homonym is one of a group of words that share the same spelling but have different meanings. In DBLP, different authors with the same name are homonyms. Two names are considered the same only if they are the same (Latin-1) string; differences in punctuation (e.g., "O'Shea" and "O-Shea"), diacritics (e.g., "Æleen" and "AEleen"), and case ("Gianluigi" and "GianLuigi") make them different names.
In the DBLP digital bibliography repository, each author is represented by a person page. Authors are assigned a unique key, and homonymous names are distinguished in the database by a unique numerical suffix. At the moment, an existing DBLP author page is split either at the request of the authors or when the DBLP team can prove that several persons are behind an entry. Unfortunately, in many cases homonyms remain undetected and need further investigation.
There are many reasons why several author names may be synonyms for a particular author: name changes, nicknames, sporadic use of middle names, missing or abbreviated name parts, or even pseudonyms. Different spellings, misspellings, and mistranslations also cause author name synonyms [38]. Occasional spelling errors in the publishers' metadata further complicate the matter.
When multiple versions of a name are frequently used in publications, these names may be included as aliases in the DBLP data set, and we used those aliases in the SCI-synergy artifact. There are many techniques to deal with author name disambiguation, such as string similarity metrics, e.g., Jaccard similarity, Levenshtein distance, term frequency-inverse document frequency (TF-IDF), and the Jaro-Winkler distance [39,40]. In future work, we may combine some of these metrics with the DBLP author name disambiguation.
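To illustrate two of the string similarity metrics mentioned above (which SCI-synergy does not currently use), here is a small Python sketch of Levenshtein edit distance and token-level Jaccard similarity, applied after an illustrative normalization that casefolds names and strips combining diacritics. The helper names are hypothetical, and the normalization deliberately merges variants that DBLP treats as distinct strings, so it suits candidate generation rather than a final matching decision.

```python
import unicodedata


def normalize(name):
    """Casefold and strip combining diacritics, e.g. 'João' -> 'joao' (illustrative choice)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or keep)
        prev = curr
    return prev[-1]


def jaccard_tokens(a, b):
    """|A ∩ B| / |A ∪ B| over the sets of normalized name tokens."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


print(levenshtein(normalize("O'Shea"), normalize("O-Shea")))        # 1
print(normalize("Gianluigi") == normalize("GianLuigi"))              # True
print(round(jaccard_tokens("Maria Silva Souza", "Maria Souza"), 3))  # 0.667
```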

Environment
The environment in the DSR paradigm establishes where the information system research is applied and to whom it aims to provide results and benefits. This work focuses on research institutions, more specifically graduate programs allied to Brazilian universities, using real data from five Computer Science programs.
In Brazil, every four years the research and education quality of graduate programs is assessed by a public agency of the Brazilian Ministry of Education, the Brazilian Coordination for the Improvement of Higher Education Personnel (CAPES) [1]. CAPES is responsible for the expansion and consolidation of graduate programs in the country and, to that end, constantly improves its assessment metrics. Nevertheless, some commonly used metrics, such as intellectual production, always compose the program's level, a numerical value between 3 and 7. According to CAPES, a level 7 program excels in its field worldwide.
The research policy of Brazilian graduate programs focuses on increasing intra- and inter-program interactions and on encouraging cooperation with international senior researchers in order to achieve the highest level in the quadrennial assessment carried out by CAPES (last run 2013-2016). As presented in the Introduction section, the selected Computer Science graduate programs cover four different regions and five states in Brazil: in the Southeast region, UFMG in Minas Gerais state (level 7 by CAPES) and USP in São Paulo state (level 6 by CAPES); in the North region, UFAM in Amazonas state (level 5 by CAPES); in the Northeast region, UFRN in Rio Grande do Norte state (level 5 by CAPES); and in the Central-West region, UnB in the Federal District (level 5 by CAPES). Since there are no level 7 or level 6 Computer Science graduate programs in the North and Central-West regions of Brazil, we chose three level 5 programs to enable comparison among regions.
The researchers' information was collected from the websites of the graduate programs and linked to publication authorship in the DBLP digital repository. Additional details of the DSR environment, focusing on technological aspects, are presented in the Artifact Construct section.

Artifact Construct
The architecture of the computational infrastructure used in the SCI-synergy artifact construct is presented in Figure 1. It is composed of three modules encapsulating different requirements, namely data source (input), database storage, and collaboration network visualization (output), and comprises the following components:
• data collection from digital repositories: researchers' information (author names) from the graduate programs' websites (Uniform Resource Locator, URL), and authors' publications from the DBLP data source in eXtensible Markup Language (XML);
• graph NoSQL database construction: consolidates all data collected from the two repositories (websites and DBLP);
• social network metrics: average degree and connected components; the betweenness and closeness centralities are computed based on [41];
• visualization interface: presentation of the scientific collaboration network with intra- and inter-program views, and of researchers' information with a collaboration chart and a publication histogram.
The operationalization process is presented in Figure 2.
[1] http://www.capes.gov.br/
The process begins with the collection of researchers' data from the universities' graduate program websites, using a Python script that receives the page address as input. The collected data includes the name (as author of publications), the researcher's curriculum in the Brazilian Curriculum Vitae platform called Lattes [2], and the email (when available). The researchers' data is stored in a JavaScript Object Notation (JSON) file for the respective graduate program. These initial authors are called 'seeds', since the publications and co-authors are filtered from the DBLP records (retrieved in XML format) starting from them. The 'seeds' are loaded into the graph database according to the conceptual data model presented in Figure 3. Note that the researcher's name on the graduate program website should match the author's name in DBLP; however, this does not happen in all cases, and even with disambiguation techniques some information may be lost.
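A minimal sketch of the seed step follows, assuming a hypothetical JSON schema with `name`, `lattes`, and `email` fields per researcher (the actual file layout is not described here). The exact-match check shows why a name that differs between a program website and DBLP is lost unless an alias or disambiguation step recovers it.

```python
import json


def load_seeds(json_text):
    """Return seed names from one program's JSON file (hypothetical schema)."""
    return [record["name"] for record in json.loads(json_text)]


def match_seeds(seed_names, dblp_names):
    """Split seeds into names found verbatim among DBLP authors and names that are not."""
    known = set(dblp_names)
    matched = [n for n in seed_names if n in known]
    unmatched = [n for n in seed_names if n not in known]
    return matched, unmatched


# Hypothetical seed file content for one graduate program.
seed_file = json.dumps([
    {"name": "Ana Souza", "lattes": None, "email": None},
    {"name": "Bruno Lima", "lattes": None, "email": "bruno@example.org"},
])
matched, unmatched = match_seeds(load_seeds(seed_file), ["Ana Souza", "Bruno L. Lima"])
print(matched, unmatched)  # ['Ana Souza'] ['Bruno Lima']
```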
In parallel, publication data is retrieved from the DBLP repository. A gzip-compressed DBLP XML archive is obtained from the DBLP URL [3]. Then, the publications whose authors are from one of the selected institutions ('seeds') are loaded into the graph database, together with their co-authors from other institutions. DBLP name disambiguation of the authors' names is carried out as described in the Research Approach section. Authorship attribute matching is then performed to create the scientific collaboration network (attribute name for vertices of type Author and attribute author for vertices of type Publication).
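Because the DBLP dump is large, streaming it is more practical than loading it at once. The Python sketch below uses `xml.etree.ElementTree.iterparse` to keep only `article`, `inproceedings`, and `book` records that have at least one seed author; the function names are hypothetical. One caveat: the real dblp.xml relies on character entities declared in dblp.dtd, which the standard-library parser does not resolve, so a production run would need a parser that loads that DTD (lxml, for instance). The test input below is entity-free.

```python
import gzip
import io
import xml.etree.ElementTree as ET

RECORD_TAGS = {"article", "inproceedings", "book"}  # journal, proceedings, book records


def open_dblp(path):
    """Open the gzip-compressed dump as a binary stream (path is a placeholder)."""
    return gzip.open(path, "rb")


def filter_publications(stream, seeds):
    """Yield (key, record_type, year, authors) for records with at least one seed author.

    Assumption: the input contains no DTD entities (the real dblp.xml does).
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue
        authors = [a.text for a in elem.findall("author")]
        if seeds.intersection(authors):
            yield elem.get("key"), elem.tag, elem.findtext("year"), authors
        elem.clear()  # release the processed record's children and text


# Hypothetical, entity-free sample in the DBLP record layout.
sample = b"""<?xml version="1.0"?>
<dblp>
<article key="journals/x/A1"><author>Alice Silva</author><author>Bob Souza</author><year>2019</year></article>
<inproceedings key="conf/y/B1"><author>Carol Lima</author><year>2020</year></inproceedings>
<book key="books/z/C1"><author>Bob Souza</author><year>2018</year></book>
</dblp>"""
for record in filter_publications(io.BytesIO(sample), {"Bob Souza"}):
    print(record)
# ('journals/x/A1', 'article', '2019', ['Alice Silva', 'Bob Souza'])
# ('books/z/C1', 'book', '2018', ['Bob Souza'])
```

On the real archive, the same generator would be fed by `open_dblp(...)` inside a `with` block, subject to the DTD caveat above.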
Finally, the graph of the scientific collaboration networks is generated and is ready to be visualized through the online interface or with a tool such as Gephi, a graph visualization and manipulation software [42]. The purpose of the graph presentation is to show the relationships between researchers and how strongly connected they are (the edge weight) to other groups. Several pieces of information and statistics focusing on researchers' collaboration are then collected from the final database and presented.
The conceptual data model used in the graph NoSQL database is depicted as a directed graph in Figure 3. The central vertex «Author» represents the authors of each published article and is related, through the Write edge, to their respective articles («Article» vertex), which in turn are related to the «Journal» vertex through the Publish edge. The relationship of each author with their university is modeled through the «Institution» vertex and the Associated edge, and the relation to the graduate program («Program» vertex) to which the author is bound is modeled with the Has edge when the author is a researcher from one of the five universities in the case study. Authors may also be connected to other authors through the Co-Authoring edge.

Technologies Summary
The artifact construct applied different technologies that helped to overcome the work's challenges. One challenge was the storage of the collaboration network data in a graph NoSQL database. For that, Neo4j was chosen since, according to [43], it offers the best query performance for graph-based data models. To execute analytical calculations and generate graphs, the NetworkX library was used [44]. Its Python implementation allowed integration with the software artifacts developed in this research. For communication between the various implemented modules, the JSON data format was adopted, since it is a common structured data representation in the development of Web services, where it is often preferred to XML.
[2] http://lattes.cnpq.br
[3] http://dblp.uni-trier.de/xml
In parallel with the collection of the researchers of each graduate program, the DBLP publication data was obtained as a single gzip-compressed XML file of 510 MB. When extracted, the XML file consumed about 2.5

Scientific Collaboration Network Analysis
The scientific collaboration network analysis, using social network metrics such as average degree, connected components, and betweenness and closeness centralities, was applied to five Brazilian Computer Science graduate programs using the SCI-synergy artifact. In the database update of 2019-12-27, the five programs include 179 members distributed as follows (decreasing order): 55 from UFMG, 38 from USP, 36 from UFAM, 26 from UnB, and 24 from UFRN. Starting from these 179 researchers ('seeds'), their co-authors were added, resulting in 7,368 authors in the scientific collaboration network.
The names of the 'seeds' were used to filter the publications from DBLP. These authors produced 7,203 publications: journal articles (2,229), conference proceedings papers (4,961), and books (13). In total, 13,962 vertices and 77,605 edges were loaded into the scientific collaboration network, including authors and publications from the five Computer Science graduate programs together with co-authors from other institutions. A database update was performed on 2021-02-03 for the five graduate programs, which increased the number of members to 194, distributed as follows (decreasing order): 61 from UFMG, 37 from USP, 36 from UFAM, 31 from UnB, and 29 from UFRN. The total number of authors increased to 8,418 and the number of publications to 8,757, with 3,030 journal articles, 5,711 conference proceedings papers, and 16 books.
The SCI-synergy home page is presented in Figure 4, including the available universities and their number of researchers; the total number of graduate program researchers; the total number of authors (researchers and co-authors); the total number of publications (detailed by books, journal articles, and proceedings papers); and the intra- and inter-program collaboration network among the five programs.

RQ1. How is the scientific collaboration network of each graduate program?
Of the four social network metrics used in the analysis, we applied the average degree to each graduate program to characterize its collaboration network. As presented in the Network Metrics section, a high average degree indicates many relationships among nodes, while many isolated nodes yield a low score. Although it is a basic centrality measure for such potentially complex networks, we consider that this metric can present the overall profile of each program, focusing on how connected or collaborative its researchers are in the scientific network. The first-order neighborhood was used to calculate the average degree of each program; that is, the co-authorship relations between the program's authors and their co-authors were considered for this metric.
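The text does not spell out which edges belong to a program's first-order neighborhood. One plausible reading, sketched below in Python as an assumption, keeps the program's seed authors plus their direct co-authors, counts only edges incident to at least one seed, and applies the average degree formula ⟨k⟩ = 2L/N to that subgraph. Counting every edge of the induced subgraph (including co-author to co-author links) would be an equally defensible alternative and would give different values. The function name is hypothetical.

```python
def program_average_degree(graph, seeds):
    """<k> = 2L/N on a program's first-order neighborhood.

    Assumed reading: vertices are the seeds plus their direct co-authors, and only
    edges with at least one seed endpoint are counted.
    """
    nodes = set(seeds)
    edges = set()
    for seed in seeds:
        for coauthor in graph.get(seed, {}):
            nodes.add(coauthor)
            edges.add(frozenset((seed, coauthor)))  # undirected edge, counted once
    return 2 * len(edges) / len(nodes) if nodes else 0.0


# Hypothetical weighted co-authorship graph {author: {coauthor: weight}}.
graph = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "C": 1},
    "C": {"A": 2, "B": 1, "D": 1},
    "D": {"C": 1},
}
print(program_average_degree(graph, {"A"}))       # 2 edges, 3 vertices -> 1.333...
print(program_average_degree(graph, {"A", "B"}))  # 3 edges, 3 vertices -> 2.0
```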
For the database update of 2019-12-27, Table 1 presents the network totals of each graduate program, including the number of researchers, co-authors, co-authorship relations, scientific production with its period, and average degree. The average degree of the collaboration networks ranges from 4.58 to 2.76, which means that, on average, each author is connected to about 4.6 and 2.8 others in the UFMG and USP programs, respectively. Considering that the Computer Science graduate program of UFMG is level 7 by CAPES, it is expected that the average degree of its researchers' network is higher than that of the other programs, which are level 6 (USP) and level 5 (UFAM, UFRN, and UnB). Although the publication periods of UFMG and USP start in almost the same year (1981 and 1980, respectively), the number of co-authors, co-authorship relations, and total publications are almost three times larger in UFMG than in USP, so that USP's average degree is about 60% of UFMG's (2.76/4.58 ≈ 0.60). Interestingly, the average degree of UFAM (3.78) is the highest among the three level 5 programs, even though UFAM has the fewest co-authors (895) and publications (819), which indicates a well-connected network with many collaborative researchers.
To compare the 2019 results with the 2021 database update, we present Table 2. Although the number of researchers increased only slightly, from 179 to 194, there are changes in the number of co-authors, co-authorship relations, total publications, and average degree of the programs. The average degree of the collaboration networks ranges from 3.49 to 2.32, with UFMG the only program above 3 and the other four between 2 and 3. The average degree of UFRN (2.83) is the highest among the other level 5 programs (UFAM and UnB) and is also higher than that of USP (2.45).
The Scientific Network menu option in SCI-synergy is illustrated in Figure 5 for the UnB graduate program, highlighting the filters that refine the search by institution, researcher, and period (year). For each of the five programs, researchers share the same vertex color in the network, differing from researchers linked to other programs according to the colors presented on the SCI-synergy home page (Figure 4). The UFMG, USP, UFAM, UnB, and UFRN network views are presented in Figures 6, 7, 8, 9, and 10, respectively.

RQ2. How cooperative is each researcher in the graduate program?
Collaboration among researchers can occur in different ways. To check how cooperative each researcher is in the graduate program, we used two centrality measures. Betweenness centrality allows finding the members who act as a "bridge" between researchers in the collaboration network, i.e., the researchers who influence the flow of knowledge in the scientific network. Table 3 presents the ten researchers with the highest betweenness centrality scores, with their program association and CNPq researcher scholarship level, for both database updates (2019-12-27 and 2021-02-03). The researchers' names are anonymized to preserve privacy but can be accessed in SCI-synergy. The same name identifiers are used for closeness centrality in Table 4.
Considering the 2019-12-27 database update, six of the ten highest scores belong to the UFMG program (60%), but in the 2021-02-03 update the participation of UFMG falls to 40%. These researchers hold CNPq researcher scholarships ranging from PQ 1A (the highest level) to PQ 2. Interestingly, some of the researchers who promote collaboration in the network are not associated with the five studied programs but with PUC-Rio, UFPA, UFRJ, Unicamp, and the University of Ottawa in Canada.
Another centrality measure used to check how cooperative researchers are is closeness centrality. This metric indicates which researchers are closest to the largest number of other collaborators in the social network and can therefore influence the entire network more quickly. Table 4 presents the ten researchers with the highest scores, with their program association and CNPq researcher level, for both database updates (2019-12-27 and 2021-02-03).
Considering the 2019-12-27 database update, nine of the ten highest scores belong to the UFMG program (90%), and all of these researchers hold CNPq scholarships ranging from PQ 1A to PQ 2. However, in the 2021-02-03 update, UFMG's participation falls to 60%, with a greater diversity of programs (UnB, USP, UFRN). In the 2019-12-27 update, although the score variation is small (from 0.3305 to 0.3017), two PQ 2 researchers have higher closeness centrality scores than others holding higher CNPq scholarship levels. Similarly, in the 2021-02-03 update, six researchers without a CNPq scholarship are ranked above two PQ 1A researchers. This is a characteristic of closeness centrality, which reflects network position: these researchers can influence the entire network more quickly.
Although the degree of internationalization is not the focus of this work, Table 5 illustrates aspects of the collaboration of one researcher from each Computer Science graduate program in the 2021-02-03 database update. We included researchers who hold a CNPq level 1 scholarship (from PQ 1A to PQ 1D), since they are expected to have the highest degree of internationalization. The aspects include the scores of intra-program, inter-program, national (Brazilian co-authors), and international co-authorship, together with a relative score: Intl. (%) = 100 × international / (intra-program + inter-program + national + international).
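The relative score is a simple ratio; the hypothetical Python helper below computes it. The counts in the usage line are invented for illustration and are not values from Table 5.

```python
def international_share(intra, inter, national, international):
    """Intl. (%) = 100 * international / (intra + inter + national + international)."""
    total = intra + inter + national + international
    return 100.0 * international / total if total else 0.0  # no co-authorship -> 0


# Invented counts for illustration only (not taken from Table 5).
print(international_share(intra=10, inter=5, national=20, international=15))  # 30.0
```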
Since the DBLP digital repository does not include the affiliations of all researchers, the degree of internationalization could not be extracted automatically and required manual treatment. The relative score varies widely among the researchers, in decreasing order: (1) Name4 (UnB, PQ 1B) 67.64%; (2) Name2 (USP, PQ 1D) 40.19%; (3) Name1 (UFMG, PQ 1A) 16.27%; (4) Name5 (UFRN, PQ 1D) 13.57%; (5) Name3 (UFAM, PQ 1B) 5.98%. Thus, the highest relative international co-authorship score does not correspond to the highest CNPq scholarship level. In future work, the degree of internationalization could be computed automatically by including an affiliation attribute in the database or by integrating other databases that provide affiliation data, e.g., ORCID [4].
Individual collaboration aspects of the researchers can be viewed through filters in the SCI-synergy artifact, as presented in Figure 12. Each edge in the network represents at least one co-authorship relation, and the total number appears as the edge label.
There is also the Find Researcher menu option, presented in Figure 13. This option shows how cooperative a researcher is in the graduate program compared with the highest, medium, and lowest co-authoring degrees per year. The number and names of co-authors are presented by year, together with the number of publications. An annual histogram of publications is presented in Figure 14. The search engine uses a full-text search index on researchers' names, allowing queries with partial names.
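The per-year information shown by the Find Researcher option can be approximated as follows. This Python sketch assumes a hypothetical input of (year, authors) pairs and returns, for one researcher, the number of publications and the distinct co-authors per year, printed as a small text histogram; it does not model SCI-synergy's full-text name search.

```python
from collections import Counter, defaultdict


def yearly_profile(publications, researcher):
    """Return {year: (publication_count, sorted_coauthors)} for one researcher."""
    counts = Counter()
    coauthors = defaultdict(set)
    for year, authors in publications:
        if researcher not in authors:
            continue
        counts[year] += 1
        coauthors[year].update(a for a in authors if a != researcher)
    return {year: (counts[year], sorted(coauthors[year])) for year in sorted(counts)}


# Hypothetical (year, authors) records.
pubs = [
    (2019, ["Ana", "Bruno"]),
    (2019, ["Ana", "Carla", "Bruno"]),
    (2020, ["Ana"]),
    (2020, ["Davi"]),
]
profile = yearly_profile(pubs, "Ana")
for year, (count, names) in profile.items():
    print(year, "#" * count, names)   # text histogram of publications per year
# 2019 ## ['Bruno', 'Carla']
# 2020 # []
```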

RQ3. What are the intra- and inter-program collaboration patterns?
The intra- and inter-program research collaboration network views of the five graduate programs are presented in Figure 11 for both database updates: (a) 2019-12-27 and (b) 2021-02-03. In the relationship graph, the edge width represents the amount of collaboration between the institutions, and the total appears as the edge label.
Tables 6a and 6b present the intra- and inter-program relationships of the graduate programs for the two database updates. Considering the 2019 update and inter-program relationships greater than ten, there are 408 relationships between UFMG-UFAM, followed by UFMG-UnB (59), UFMG-UFRN (28), and UnB-UFRN (26). Thus, UFMG (level 7 by CAPES) presents the greatest total of collaborations among the five Computer Science graduate programs, indicating research leadership and good inter-program cooperation patterns. An interesting aspect is the absence of cooperation between some level 5 programs, such as UFAM-UnB and UFAM-UFRN. In the intra-program view, UFMG holds the highest collaboration score (3,290), followed by UFAM (684), USP (532), UFRN (464), and UnB (456). Thus, intra-program relationships are not always stronger among the highest-ranked programs in CAPES, as the following decreasing order shows: UFMG (level 7), UFAM (level 5), USP (level 6), UFRN and UnB (level 5).
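Program-level totals like those in Tables 6a and 6b can be derived by aggregating author-level co-authorship edges. The Python sketch below assumes each edge carries a weight equal to the number of shared publications and sums those weights per program pair; whether SCI-synergy sums weights or counts distinct author pairs is not stated, so this is one possible definition. Co-authors outside the studied programs are skipped, and the function name, program labels, and counts in the example are invented.

```python
from collections import Counter


def program_relationships(weighted_edges, program_of):
    """Sum co-authorship weights per program pair; (P, P) keys hold intra-program totals.

    Assumption: a relationship total is the sum of edge weights (shared publications).
    """
    totals = Counter()
    for a, b, weight in weighted_edges:
        pa, pb = program_of.get(a), program_of.get(b)
        if pa is None or pb is None:   # skip co-authors outside the studied programs
            continue
        totals[tuple(sorted((pa, pb)))] += weight
    return totals


# Invented authors, programs, and weights.
edges = [("a1", "a2", 3), ("a1", "b1", 12), ("a2", "b1", 1), ("b1", "x", 5)]
program_of = {"a1": "P1", "a2": "P1", "b1": "P2"}
totals = program_relationships(edges, program_of)
intra = {p: n for (p, q), n in totals.items() if p == q}
inter = {(p, q): n for (p, q), n in totals.items() if p != q and n > 10}
print(intra, inter)  # {'P1': 3} {('P1', 'P2'): 13}
```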
Considering the 2021 database update, the relationship patterns are the same. Considering relationships greater than twelve, there are 410 collaborations between UFMG-UFAM, followed by UFMG-UnB (67), UFMG-UFRN (28), and UnB-UFRN (27); UFMG keeps the collaboration leadership among the programs. The absence of cooperation between the UFAM-UnB and UFAM-UFRN programs also persists. The intra-program view presents UFMG with the highest collaboration score (3,498), followed by UFAM (764), USP (564), UFRN (526), and UnB (512).
[4] The ORCID iD is a unique, persistent identifier free of charge to researchers, available at https://orcid.org/
Considering the topological aspect of the network, we applied the connected components metric to check how big and how many communities there are in the network and to examine whether the small world phenomenon [45] occurs in this network. Each component identifier represents a research group in the network. In the SCI-synergy interface, clicking on an identifier shows the group's participants. Table 7 presents the first ten groups obtained by applying the connected components metric to the network formed by the five graduate programs in both database updates (2019-12-27 and 2021-02-03).
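The connected components metric, computed with a union-find structure as noted in the Network Metrics section, can be sketched in a few lines of Python. This hypothetical helper uses path halving and returns the components sorted by size, largest first; it does not reproduce the component identifiers shown in SCI-synergy.

```python
def connected_components(nodes, edges):
    """Group vertices into connected components using union-find (disjoint sets)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a        # merge the two sets

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values(), key=len, reverse=True)  # largest component first


# Hypothetical vertices and co-authorship edges.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
components = connected_components(nodes, edges)
print(components)                       # [['a', 'b', 'c'], ['d', 'e'], ['f']]
print(len(components[0]) / len(nodes))  # share of vertices in the largest component: 0.5
```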
Note that component 8 is formed by 6,933 members in 2019, representing the biggest group in the network. In 2021 (14 months later), component 0, which corresponds to component 8 of 2019, has 8,373 members. Thus, the biggest group of the social network is more connected in 2021 than in 2019. Note also that in 2021 there are only four groups with more than one member, while there were eight in 2019.
Consulting the 2019 data, we verified that, of the 6,933 members of the biggest component (component 8), only 160 are linked to one of the five programs analyzed. The graph in Figure 15 […] Thus, the connected components metric indicates that most researchers are connected to the biggest group in both years (6,933 in 2019 and 8,373 in 2021). There are also many individual groups in the whole network, as presented in Table 7. Besides, considering Figures 15 and 16, four programs have high participation in the biggest groups (USP, UFRN, UFMG, and UnB). However, participation is more uniformly distributed in 2019 (UFMG 92.73%, USP 89.47%, UFAM 86.11%, UnB 92.31%, UFRN 87.50%) than in 2021 (UFMG 95.08%, USP 100%, UFAM 50%, UnB 87.10%, UFRN 96.55%). Comparing 2019 and 2021, some programs increased their participation in the biggest group (UFMG, USP, and UFRN), while others decreased it (UFAM and UnB). Further investigation is necessary to draw conclusions about the composition of the connected components.
Considering the researchers' cooperation in the graduate programs, many aspects can be analyzed using the intra- and inter-program views of a scientific collaboration network. For example, the network results in Table 7 present many individual groups, which might reflect a specific characteristic of the graduate programs analyzed; further investigation including other Computer Science programs can verify whether this is a national reality. However, we can confirm that this network has a small-world aspect, with more than 99% of its members in the biggest component (8,373 of the 8,418 researchers in 2021).
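A minimal Python sketch of the connected components metric, and of the participation shares reported above (e.g., 51/55 = 92.73% for UFMG in 2019), is given below. It runs a plain breadth-first search over an undirected co-authorship graph; the function names and the toy data are hypothetical, and this is not SCI-synergy's code. Researchers without co-authors are passed in explicitly so that single-member groups are counted as components too.

```python
from collections import deque


def connected_components(nodes, edges):
    """Connected components of an undirected graph, largest first.

    nodes: all researchers, including those without co-authors.
    edges: iterable of (u, v) co-authorship pairs.
    """
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components


def participation_in_largest(components, members_by_program):
    """Fraction of each program's members that lie in the largest component."""
    largest = components[0]
    return {program: sum(m in largest for m in members) / len(members)
            for program, members in members_by_program.items()}


# Toy data (hypothetical).
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
components = connected_components(nodes, edges)
shares = participation_in_largest(components, {"P1": ["a", "b", "e"], "P2": ["c", "g"]})
giant_share = len(components[0]) / len(nodes)  # analogous to 8,373 / 8,418
print([len(c) for c in components], shares, giant_share)
```

The share of all researchers in the largest component (`giant_share`) is the quantity behind the small-world remark; per-program shares are the same ratio restricted to a program's members.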

Discussion
Interesting results answering the research questions were presented in the Scientific Collaboration Network Analysis section. In this section, we present an overview of these results for each Computer Science graduate program, considering the two database updates (2019-12-27 and 2021-02-03), as summarized in Tables 8a and 8b.
The highlights of each program network analysis include:
• The overall profile of the UFMG graduate program shows the highest average degree among the five analyzed programs (4.58 in 2019 and 3.49 in 2021), meaning that, on average, each researcher is connected to more than four and more than three other researchers, respectively (Tables 1 and 2). Regarding the researchers who influence the flow of knowledge in the whole network, six of the ten researchers with the highest betweenness centrality scores are from UFMG in 2019 (60%, Table 3), and four in 2021 (40%). Regarding how cooperative researchers are, as measured by closeness centrality, nine of the ten highest scores belong to UFMG researchers in 2019 (90%) and six in 2021 (60%) (Table 4). UFMG also presents the highest intra-program collaboration amount, with 3,290 in 2019 and 3,498 in 2021, and the highest inter-program amount, with 498 in 2019 and 508 in 2021 (Tables 6a and 6b). In terms of connected components, UFMG presents the highest participation in the biggest group in 2019 (51/55 = 92.73% of a group of 6,933 members) and 58/61 = 95.08% in 2021 (Table 7). Note that UFMG presents the highest results in five aspects reported in Table 8a. In particular, its high closeness and betweenness centrality scores affect how researchers from other programs pair up. UFMG presents the highest intra-program collaboration (3,290 and 3,498) and inter-program collaboration, especially with the UFAM program (408 and 410). In summary, the UFMG metrics confirm that the program is a good representative of the level 7 programs by CAPES.
• The USP graduate program presents the lowest average degree among the five programs in 2019 (2.76) and the second lowest in 2021 (2.45), meaning that, on average, each researcher is connected to more than two other researchers (Tables 1 and 2). USP has one of the ten researchers with the highest betweenness centrality scores in 2019 (10%) and none in 2021 (0%) (Table 3), and none of the ten highest closeness centrality scores in 2019 (0%) but one in 2021 (10%) (Table 4). USP also presents the third highest intra-program collaboration amount, with 532 in 2019 and 564 in 2021, and the lowest inter-program amount, with 23 in 2019 and 25 in 2021 (Tables 6a and 6b). In terms of connected components, USP shows a good level of participation in the biggest group in 2019 (34/38 = 89.47%), increasing to 37/37 = 100% in 2021 (Table 7). USP is a level 6 program by CAPES, yet its average degree is lower than those of the three level 5 programs in 2019 and of two of them (UFAM and UFRN) in 2021, indicating that its researchers are not very connected in the network. USP presents good intra-program collaboration but very incipient inter-program collaboration with the UFAM, UnB, and UFRN programs. Its betweenness centrality is equal to UFAM's and better than UnB's and UFRN's in 2019, but worse than UFAM's in 2021. The closeness centrality of USP in 2019 is equal to that of the three level 5 programs; in 2021 it is equal to UFRN's, worse than UnB's, and better than UFAM's. USP presents the third highest connected components score in the biggest group in 2019 and the best in 2021, showing the greatest increase in social network connection in a short period (14 months), while UFAM and UnB decreased their participation over the same period.
• The UFAM graduate program presents the highest average degree among the three level 5 programs by CAPES in 2019 (3.78) and the second highest in 2021 (2.63), meaning that, on average, each researcher is connected to more than three and more than two other researchers in the network, respectively, just like UnB and UFRN (Tables 1 and 2). Considering 2019, UFAM (like USP) has one of the ten researchers with the highest betweenness centrality scores (10%, Table 3) and none among the ten highest closeness centrality scores (0%, Table 4). In 2021, however, UFAM accounts for 20% of the top betweenness centrality scores, while no UFAM researcher appears among the top closeness centrality scores. UFAM also presents the second highest intra-program collaboration amount, with 684 in 2019 and 764 in 2021, and the second highest inter-program amount, with 418 in 2019 and 422 in 2021 (Tables 6a and 6b). In terms of connected components, UFAM presents the smallest participation in the biggest group in 2019 (31/36 = 86.11%) (Table 7), which drops to 18/36 = 50% in 2021 (Table 8b). UFAM presents the most recent publication period (1995-2019) among the five programs, with inter-program collaboration concentrated on the UFMG program (408 relationships in Table 6a and 410 in Table 6b). Considering 2021, UFAM (level 5 by CAPES) outperforms USP (level 6) by having two researchers among the ten highest betweenness centrality scores, who act as "bridges" between researchers in the whole scientific network; one of them does not hold a CNPq PQ scholarship.
• The UnB graduate program presents the lowest average degree among the three level 5 programs (3.04 in 2019 and 2.32 in 2021), meaning that, on average, each researcher is connected to more than three and more than two other researchers in the network, respectively, just like UFAM and UFRN (Tables 1 and 2). UnB has no researcher among the ten highest betweenness centrality scores (0%, […] Tables 1 and 2). Like UnB, UFRN has no researcher among the ten highest betweenness centrality scores (0%, Table 3), but one among the ten highest closeness centrality scores (10%, Table 4). UFRN also presents the second best intra-program collaboration amount among the level 5 programs, with 464 in 2019, but the second lowest inter-program amount, with 55 in 2019 and 56 in 2021 (Tables 6a and 6b). In terms of connected components, UFRN presents a good level of participation in the biggest group in 2019 (21/24 = 87.50%) (Table 7), which improves in 2021 (28/29 = 96.55%) (Table 8b). The UFRN program is the smallest in number of researchers, but it has a good average degree (better than USP and UnB), an intra-program collaboration score better than UnB's, an inter-program collaboration score better than USP's, and a participation in the biggest group in 2019 better than UFAM's that even increased in 2021, indicating a successful effort to build a well-connected collaboration network.
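The average degree, betweenness, and closeness figures behind these highlights can be reproduced on a toy graph with the self-contained Python sketch below. Several choices are assumptions, not statements from the text: betweenness uses Brandes' algorithm on an unweighted, undirected graph and is left unnormalized; closeness applies the Wasserman-Faust scaling so that researchers in small components are not over-rated; and the top-k share labels co-authors outside the five programs as "external". Whether SCI-synergy uses these exact variants is unknown; all names and data here are hypothetical.

```python
from collections import deque


def average_degree(adj):
    """2|E| / |V|: each undirected edge adds one to the degree of both ends."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Closeness centrality with Wasserman-Faust scaling for disconnected graphs."""
    n = len(adj)
    scores = {}
    for s in adj:
        dist = bfs_distances(adj, s)
        reachable = len(dist) - 1  # nodes reachable from s, excluding s itself
        total = sum(dist.values())
        if total == 0 or n < 2:
            scores[s] = 0.0
        else:
            scores[s] = (reachable / total) * (reachable / (n - 1))
    return scores


def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm), unweighted undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) was visited from both ends.
    return {v: c / 2 for v, c in bc.items()}


def top_k_share(scores, program_of, k=10):
    """Share of the k highest-scoring researchers held by each program."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    shares = {}
    for node in top:
        program = program_of.get(node, "external")
        shares[program] = shares.get(program, 0) + 1 / len(top)
    return shares


# Toy graph (hypothetical): a-b, b-c, b-d, d-e.
adj = {}
for u, v in [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
bc = betweenness(adj)
cl = closeness(adj)
print(average_degree(adj), bc, cl)
print(top_k_share(bc, {"b": "UFMG", "d": "UFAM"}, k=2))
```

In this toy graph, researcher "b" plays the "bridge" role discussed above: it lies on most shortest paths and also reaches everyone in the fewest hops.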

Related Work
The literature review highlights works that deal with collaboration aspects in scientific social networks and/or discuss the Brazilian Computer Science scenario. To analyze these works, we selected a set of aspects reflecting the aims of this research, inspired by the subjects addressed in the profile analysis of the top Brazilian Computer Science graduate programs [46]. These aspects were classified into six categories:
• focus of the study: studies concentrated on individual scholars, research groups, or institutions.
• geographic coverage: the scientific network coverage, i.e., a specific country or the world.
• data source: the DBLP digital library or other platforms/sites (e.g., Lattes, Jems).
• adopted metrics: researchers' volume of publication, whether authors and co-authors are considered, and social network metrics (e.g., average degree, connected components, betweenness and closeness centralities).
• scope of the analysis: the network properties, the adopted collaboration patterns, how temporal evolution is considered, and how intra- and inter-program collaboration occurs.
• implementation: whether the artifact to analyze the scientific network is available online and its code is accessible to anyone.
Considering the listed set of aspects, SCI-synergy was compared to nine research articles, summarized in Table 9 and discussed below. […] mostly from Brazil, but with co-authors from all over the world. As future work, the SCI-synergy database can be extended to include the 88 Computer Science graduate programs in Brazil.
Adopted Metrics, Scope of the Analysis and Implementation
Regarding the adopted metrics, all works used the researchers' volume of publication considering authors, but [50,49] did not consider co-authorship. In this work, we calculate how many co-authors each author has worked with and how many times each collaboration occurred. In [48], the authors apply the "tieness" metric to compute the strength of the relationship between a pair of authors and identify whether a tie is weak or strong. In [47], social network metrics of Computer Science Ph.D. authors are computed based on the co-authorship network. We also applied social network metrics such as average degree, connected components, and betweenness and closeness centralities. The papers [50,51,46] dealt with venue quality using the Brazilian CAPES Qualis ranking.
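Counting how many distinct co-authors each author has and how often each pair collaborated can be sketched in Python as follows. The sketch assumes that each publication is available as a list of already disambiguated author names, as one might extract from DBLP records; the function and the toy data are hypothetical and do not reproduce SCI-synergy's implementation.

```python
from collections import Counter
from itertools import combinations


def coauthorship(publications):
    """Count co-authorship frequency per pair and distinct co-authors per author.

    publications: iterable of author-name lists, one list per paper.
    Returns (pair_counts, distinct_coauthors), where pair_counts[(a, b)] with
    a < b is the number of papers a and b wrote together, and
    distinct_coauthors[a] is how many different people a has co-authored with.
    """
    pair_counts = Counter()
    for authors in publications:
        # set() guards against a name listed twice in the same record
        for a, b in combinations(sorted(set(authors)), 2):
            pair_counts[(a, b)] += 1
    distinct_coauthors = Counter()
    for a, b in pair_counts:  # each key is a distinct pair
        distinct_coauthors[a] += 1
        distinct_coauthors[b] += 1
    return pair_counts, distinct_coauthors


# Toy publication list (hypothetical names).
papers = [["ana", "bia"], ["ana", "bia", "caio"], ["caio", "duda"], ["duda"]]
pairs, distinct = coauthorship(papers)
print(pairs[("ana", "bia")], distinct["caio"])
```

The pair counts give the collaboration frequency (the edge weights of the co-authorship network), while the distinct counts give each author's degree in that network.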
Regarding the scope of the analysis, the works [47,48,52,53,49] implemented network properties. Collaboration patterns were considered in [46,51,52,53,49,54]; in particular, small-world and link-density collaboration patterns were applied in [51,53]. Temporal evolution is taken into account in [46,50,51,47,52,54], but it is mostly handled manually. In this work, we use the absolute frequency of interaction, but our work stands out by also considering the time elapsed between co-authorships. SCI-synergy supports year-range queries, in which researchers can query their own network or a graduate program network considering only publications within a specific period (e.g., the last 3 years). Note that our work is the only one that focuses on intra- and inter-program collaboration patterns.
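A year-range query of this kind can be sketched as a filter applied before counting co-authorships. In this hypothetical Python sketch, publications are (year, authors) tuples, the window is inclusive, and an optional `focus` researcher restricts the result to his/her own network; `last_collaboration` records the most recent joint year per pair, which is only one possible way (our assumption) to measure the time elapsed since a co-authorship.

```python
from collections import Counter
from itertools import combinations


def window_network(publications, start_year, end_year, focus=None):
    """Co-authorship counts using only papers published in [start_year, end_year].

    publications: iterable of (year, author_list) tuples.
    focus: optional researcher; if given, only pairs that include him/her are kept.
    """
    counts = Counter()
    for year, authors in publications:
        if not start_year <= year <= end_year:
            continue  # outside the queried period
        for a, b in combinations(sorted(set(authors)), 2):
            if focus is None or focus in (a, b):
                counts[(a, b)] += 1
    return counts


def last_collaboration(publications):
    """Most recent year in which each pair co-authored a paper."""
    last = {}
    for year, authors in publications:
        for pair in combinations(sorted(set(authors)), 2):
            last[pair] = max(year, last.get(pair, year))
    return last


# Toy data (hypothetical); "the last 3 years" relative to 2021 is 2019-2021.
pubs = [(2016, ["ana", "bia"]), (2019, ["ana", "bia"]),
        (2020, ["ana", "caio"]), (2021, ["bia", "caio"])]
recent = window_network(pubs, 2019, 2021)
ego = window_network(pubs, 2019, 2021, focus="ana")
elapsed = {pair: 2021 - year for pair, year in last_collaboration(pubs).items()}
print(dict(recent), dict(ego), elapsed)
```

Filtering before counting means every downstream metric (degree, components, centralities) is computed on the period-restricted network rather than on the full history.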
Regarding implementation aspects, none of the related works provides an implementation with publicly available code. The code of the SCI-synergy online artifact is available on GitLab, as presented in the Technologies Summary section.
As summarized in Table 9, our work is the only one that provides an available implementation (SCI-synergy) and presents an intra- and inter-program analysis using social network metrics.

Conclusion
This work presents a scientific collaboration network analysis of five Brazilian Computer Science graduate programs using social network metrics and the SCI-synergy artifact. The analysis focuses on the researchers' collaboration, involving publication data available on DBLP, to understand patterns of intra- and inter-program relationships, considering two database updates (2019-12-27 and 2021-02-03). The three research questions posed were answered through the analysis of scientific co-authorship, applying the social network metrics average degree, connected components, betweenness centrality, and closeness centrality.
The comparison between the two database updates yields several interesting results, as the number of graduate program members increased slightly from 179 to 194:
• This increase changed the number of co-authors, co-authorship relations, and total publications, and consequently decreased the average degree of the programs (Tables 1 and 2). The highest increase in the number of researchers was in UFMG, with six new researchers (about 11% of its 2019 membership), followed by five new researchers in both UnB and UFRN. In contrast, the number of researchers decreased by one in USP (about 3% of its 2019 membership) and remained the same in UFAM.
• The betweenness centrality shows a high concentration of UFMG researchers acting as "bridges" in the collaboration network (60% in 2019, Table 3), which decreased to 40% in 2021.
• The closeness centrality shows that UFMG researchers can influence the entire network more quickly, as they boost the flow of knowledge (90% in 2019 and 60% in 2021, Table 4).
• The relative internationalization score varies greatly among the graduate programs' researchers; researchers holding the highest CNPq fellowship level do not always have the biggest international collaboration network (Table 5). […] (Figures 15 and 16). This result demands further investigation, but a first guess relates it to the pandemic situation in 2020.
In future work, the SCI-synergy database could be extended to include the 88 Computer Science graduate programs in Brazil. Affiliation attributes would be introduced in the database to allow institutional analysis and the study of internationalization aspects of co-authorship. Also, different data sources, such as Web of Science and Scopus, can be explored to include the evaluation of other knowledge domains. Author disambiguation techniques also have to be improved to better identify correspondences between author records. Finally, additional network metrics, such as degree centrality, eigenvector centrality, Katz centrality, transitivity, reciprocity, and similarity, can be used to capture different collaboration aspects.

Figures

Figure 7: The USP scientific collaboration network.

Figure 12: SCI-synergy scientific collaboration network focusing on a specific researcher.
[…] GB that was handled by XML parsers based on the Document Object Model (DOM) and the Simple API for XML (SAX). The SCI-synergy online artifact is available for use at http://165.227.113.212, with the code available in the InfoKnow research group's GitLab project repository (https://gitlab.com/InfoKnow/SocialNetwork/aureliocosta-sci-synergy).

Table 8: Overview of scientific collaboration network analysis.