Identifying major civil engineering research influencers and topics using social network analysis

Abstract This paper applies social network analysis techniques to a co-authorship network in order to discover the influencers in the civil engineering research field in Nigeria. It further applies the Latent Dirichlet Allocation (LDA) algorithm to uncover the major research topics in this field. The research used 663 publications downloaded from the Scopus database, with the year of publication ranging from 1968 to 2018, using Nigeria as the case study and Civil and Structural Engineering as the field of research. The co-authorship network was analysed using the centrality measures of network analysis (degree centrality, closeness centrality, and betweenness centrality), and text mining with the LDA algorithm was used to discover the research focus of the authors. The relationship between the centrality measures and authors' performance, measured in terms of citation, was also investigated using regression analysis. The results showed that performance had a significantly positive relationship with betweenness centrality and closeness centrality, but a negative relationship with degree centrality. The topics discovered using the LDA algorithm also helped to reveal the major focus of civil engineering research in Nigeria. In conclusion, the co-authorship network of civil engineering research in Nigeria was found to be a healthy small-world community, and it is recommended that this environment be improved upon to support collaboration and sharing of ideas between researchers in the civil engineering field.

ABOUT THE AUTHOR Ibukun T. Afolabi holds a B.Sc in Engineering Physics, an M.Sc in Computer Science and a Ph.D. in Computer Science. Her specific area of research is in the field of Data & Text Mining, Programming and Business Analytics. She has supervised several undergraduate and postgraduate students in computer science research and published over 39 articles in International high impact journals and conferences indexed in Scopus and Web of Science. She is a member of the Computer Profession of Nigeria (CPN) and Nigeria Computer Society (NCS).

PUBLIC INTEREST STATEMENT
In order to provide valuable interventions that will improve academic research, there is a current wave of studies that focus on co-authorship networks and on co-innovator networks, where patents are the focus of the research. In this paper, a co-authorship network is developed to discover the influencers in civil engineering research. The study used centrality measures from social network analysis and a semantic text clustering approach to identify major civil engineering research influencers and topics. It was discovered that academic performance has a significantly positive relationship with betweenness centrality and closeness centrality but a negative relationship with degree centrality. The co-authorship network of civil engineering research in Nigeria was found to be a healthy small-world community, and it is therefore recommended that this environment be improved upon to support collaboration and sharing of ideas between researchers in the civil engineering field.

Introduction
Civil Engineering is the branch of engineering that deals with the planning, design, and construction of structures for the needs of the people (Ricketts et al., 2003), while Structural Engineering is a branch of Civil Engineering that deals with the study and application of structural theory to the design, analysis and evaluation of civil engineering structures (Chen & Liew, 2002). Very important engineering structures in everyday life need a touch of Civil and Structural Engineering: bridge construction, airport construction, highway engineering, water resources engineering, and community and regional planning are all notable mentions (Ricketts et al., 2003). Research in civil and structural engineering is important in Nigeria since the country is still developing and the standard structures existing in developed countries are not yet fully visible in the country. The literature reports that among the most important factors causing construction delay in Nigeria are incomplete architects' drawings and incomplete structural drawings (Aibinu & Odeyinka, 2006). In the same study, it was reported that 88% of the delay factors caused 90% of the overall delays on the building projects surveyed (Aibinu & Odeyinka, 2006). The impact of many delay factors on buildings in Nigeria was also confirmed in a recent study (Ogunde et al., 2017). Ogunsemi and Jagboro (2006) also reported an issue in building projects in Nigeria in their study. Hence, there is a need for civil and structural engineering research in Nigeria.
Co-authorship network analysis, on the other hand, involves analyzing the inter-connectedness of authors with respect to how they collaborate on research projects (Racherla & Hu, 2010). Research in co-authorship network analysis typically uses centrality measures, which include degree centrality, closeness centrality, and betweenness centrality (Abbasi et al., 2012b), to measure the characteristics of authors in a co-authorship network. Co-authorship network analysis has been done on "steel structures", an emerging research discipline in civil engineering (Uddin et al., 2012), and also on infrastructure resilience in disasters (Nazarnia & Sarmasti, 2018). Other studies on co-authorship network analysis in the civil engineering discipline can be found in the literature (Abbasi et al., 2012b; Cimenler et al., 2014; Franceschet & Costantini, 2010). Some studies have focused on individual countries and regions of the world; countries of focus in recent studies include Slovenia (Cugmas et al., 2016), Spain (Bordons et al., 2015), UK (Crescenzi et al., 2016), USA (Velden et al., 2010) and Italy (Abramo et al., 2011a; Franceschet & Costantini, 2010).
The research area is also not devoid of new techniques and innovation, as is evident in the literature. He et al. (2013) proposed a new method for extracting information from a co-authorship network called Diversity Subgraph Extraction (DSE), using data primarily for studying interdisciplinary collaboration. The method was compared against another method based on breadth first search (BFS), and the results favoured DSE in terms of capturing richer information from a network. Oliveira et al. (2017) proposed a Bayesian inference approach to measure the reliability of nodes (researchers) in a network and concluded in their study that the contribution of each researcher is relevant to the maintenance of the research groups.
This research work takes a step further from co-authorship and co-innovator network analysis by introducing semantic analysis to discover the focus of the research outputs in order to provide valuable interventions that will further improve academic research and collaboration.

Literature review
The review is divided into two major sections. The first section discusses recent works on co-authorship networks, with the focus solely on research papers as the product of academics, while the second section focuses on co-innovator networks, where patents are the focus of the research products.
In the first section, Racherla and Hu (2010) carried out a social network analysis in the field of tourism using data from the three top journals identified in the field: Annals of Tourism Research, Journal of Travel Research, and Tourism Management. They concluded that researchers in the tourism field restrict their collaborations to a specific group of interest; that is, prominent researchers did not show the tendency to connect disparate groups even though researchers tend to favour collaborations with highly successful researchers. Franceschet and Costantini (2010) studied how collaboration affects co-authored papers in terms of impact and quality using a dataset spanning various disciplines in Italian universities. Impact was measured through the citation metrics of the retrieved papers, while quality was measured based on the judgement of peer reviewers on the same set of papers. The national research assessment exercise of Italian universities (a peer review exercise) was used as the major source of data, and most of the articles were also indexed in the Web of Science database. They concluded that in terms of impact (citation metrics), in-house papers garnered fewer citations than those with contributions from external collaborators (collaborators from other universities). Velden et al. (2010) obtained data from Web of Science (WoS) for three research fields in the discipline of Chemistry over a 20-year period to carry out a mesoscopic analysis of some research groups in Europe and the USA. The authors combined participant interviews (qualitative) and network analysis (quantitative) as viable approaches for co-authorship analysis.
Abramo et al. (2011a) carried out a bibliometric analysis of Italian researchers in the field of hard sciences, with data gathered from Web of Science over the period 2001-2005, to answer questions relating to the link between the level of internationalization of a single researcher and research performance. The authors used indicators that can be broadly grouped into two (research performance indicators and internationalization indicators) for the required analysis. The results of their analysis confirmed that researchers with more international collaborations have the best research performance. Abramo et al. (2011b) studied collaboration networks among Italian researchers for the period between 2001 and 2005 and how they affect each researcher's research performance. As in the work of Franceschet and Costantini (2010), it was discovered that the discipline of Physics has the highest degree of international collaboration among researchers and that a researcher with a high level of international collaboration enjoys more productivity (publication volume). Productivity was judged to have more impact on the intensity of international collaboration than the average quality of scientists' research papers, although both factors have similar relative importance for the disposition to collaborate internationally. Abbasi et al. (2012a) carried out another study on scholars in the discipline of Information Science and Library Science using data gathered from Scopus within the time frame of 2000 to 2009. The authors sought the relationship between researchers' collaboration networks and their citation performance using structural holes theory. They concluded from their results that research performance is positively correlated with a researcher's ego-network structure. It was also concluded that researchers with high betweenness centrality (those lying on many shortest paths between collaborators) exhibit higher performance in their research.
Finally, it was suggested that researchers should connect to more diverse research groups to improve their efficiency in research. Abbasi et al. (2012b) carried out a study within the domain of "steel structures" to investigate how authors evolve during the expansion of the co-authorship network using three common centrality measures: degree centrality, closeness centrality and betweenness centrality. The authors set out to answer three research questions and came to the following conclusions: (1) authors with higher betweenness centrality tend to attract more co-authors than those with higher degree centrality; (2) a relatively small number of new authors attach themselves to existing authors, which coincides with the fact that existing authors also prefer to collaborate with their existing co-authors; (3) as one of the main highlights of the paper, the authors identified a "new" property of authors called brokering roles, which characterises PhD supervisors who tend to create new collaborations as they get new PhD students and who appear as the nodes with the highest betweenness centrality over the years. Li et al. (2013) investigated how six indicators of social capital interact and affect the citations of Information Systems (IS) researchers using data obtained from the Social Science Citation Index (SSCI). The top five pure IS journals were selected for this study, and the data gathered span the years 1999-2003. Their results favour betweenness centrality, in agreement with the work of Yan and Ding (2009) and Abbasi et al. (2012b), concluding that it significantly affects the citation count of publications. Also, co-authoring with highly productive researchers has a strong tendency to increase the centrality of a researcher and positively affect their citation counts. Finally, the results presented in the paper also indicated that team exploration and publishing tenure do not have any effect on author citations.
Liao (2010) reached the same conclusion on team exploration. Benckendorff and Zehrer (2013) studied the influence that pioneer authors in tourism research, along with their seminal works, have on recent works. Scopus was used as the primary data source for data gathered from the following leading journals: Annals of Tourism Research (ATR), Journal of Travel Research (JTR) and Tourism Management (TM) between 1996 and 2010. The following data points were gathered from their research: most frequently cited journals, most cited lead authors, co-citation network of most cited authors, most cited individual works and co-citation network of most influential works. De Stefano et al. (2013) carried out a co-authorship analysis among Italian statisticians using three different data sources: Web of Science (WoS), Current Index to Statistics (CIS) and nationally funded research projects (PRIN). All the hypotheses tested in the study were confirmed, leading to the following conclusions: the number of co-authored publications is growing faster than that of single-authored publications in the field of statistics; the collaboration style of Italian statisticians is similar to that observed in the Social Sciences; the sub-fields have different collaboration styles; and finally, the performance of Italian authors in the field of statistics is related to their collaboration styles in the co-authorship network. Didegah and Thelwall (2013) examined whether certain factors affect the citation counts of articles. The factors include research collaboration, journal and reference impact, abstract readability, and article size. The number of authors and the number of countries positively affect citation counts in the fields studied, except that there is no significant correlation between the number of countries and citation counts in the Social Sciences field.
The authors also concluded that each additional author causes an increase in the average number of citations and that each additional country in the international collaboration network has the same positive effect on the citation metric. The main conclusions Ortega (2014) drew in answer to the research questions raised in his work include: (1) authors within sparse and thin networks have a higher citation-per-document count than their counterparts within dense networks; (2) sparse networks are more frequent in the Mathematics, Social Sciences and Economics, and Business disciplines, while dense networks occur frequently in the Physics, Engineering and Geosciences disciplines; (3) the collaboration type, research impact, and network structure are not correlated. These findings were obtained from a co-authorship analysis of data from the Microsoft Academic Search bibliographic source. Cimenler et al. (2014) carried out a regression analysis on data gathered for the College of Engineering of the University of South Florida, USA, to investigate the effect social network metrics have on researchers' citation performance. Another study by Manganote et al. (2014) was based on the Scimago Institutions Ranking 2012 and focused on three indicators, including internationalisation (international collaboration), which is closely related to our focus in this study. The result revealed a significant difference in the behaviours of Northern American and Western European institutions towards international collaboration. The study also revealed that Chinese institutions have a low propensity to collaborate internationally, which was also reflected in their research impact (which was low). Ebadi and Schiffauerova (2015) studied the impact of some factors on the position of researchers in a collaboration network at the individual level. The dataset was gathered from Canadian researchers during the period 1996 to 2010.
It was concluded that funding is an important element in driving researchers to collaborate more. Bordons et al. (2015) analysed the structure of co-authorship networks in three fields (Nanoscience, Pharmacology and Statistics) for scientists residing in Spain over a period of 3 years (2006 to 2008), with data gathered from the Web of Science database. In the two experimental fields (Nanoscience and Pharmacology), the network structure is dense, unlike that of the Statistics field, which is less connected. For the three fields, the strength of the links among co-authors has a positive relationship with the g-index. The authors used a Poisson regression model to explore the relationship between authors' performance and the different measures of the collaboration network. It was also observed that there is a strong relationship between the positions of authors in the collaboration network and their research performance, a relationship that was especially clear in Nanoscience and Pharmacology. Finally, answering the third research question in the paper, it was concluded that there exists a significant difference in the "average citedness" of research papers among the three fields used for the study. Sadoughi et al. (2016) studied co-authorship networks of Iranian researchers in the field of parasitology covering the period between 1972 and 2013. Data were gathered from the Web of Science database. They concluded that the co-authorship network consists of 78 authors, with a particular researcher leading in all measures of centrality (degree centrality, betweenness centrality, and closeness centrality), and that collaboration with international colleagues is mostly done with researchers in England and the United States of America.
Guan et al. (2017) took into consideration the node attributes of knowledge elements in a paper, in addition to the author attributes, to study their effects on paper citations using papers in the research field of wind energy from the Web of Science and Journal Citation Reports databases. Dehdarirad and Nasini (2017) applied a two-mode regression model to capture simultaneously the effect of paper properties and of authors' behavioural characteristics and demography on the success and impact of the academic careers of authors, based on the assumption that paper impact is affected not only by the characteristics of the paper but also by the characteristics of the authors.
The second section of the review focused on co-innovator networks. Beaudry and Schiffauerova (2011) gathered data for Canadian inventors in the field of nanotechnology to study the impact of co-inventorship network characteristics on the quality of inventions (patent quality). They studied 5067 patents from the Nanobank database where at least one of the inventors resides in Canada, with the aim of confirming four research hypotheses. Based on previous studies that highlighted the negative impact of geographical distance on R&D collaborations, Morescalchi et al. (2015) carried out a study on data gathered from the European Patent Office (EPO) for OECD countries to verify the claim. The study analysed four different networks: co-inventor network, patent citations network, inventor mobility network, and applicant-inventor network within the period 1988 to 2009. Their findings corroborated previous findings: the distance effect has had an impact on the two networks of concern to this study, the co-inventor and patent citations networks. Guan et al. (2015) studied the effects of multi-level collaboration on innovators' innovation performance, a departure from other studies that analyzed a single network. They used patent records in the field of alternative energy. The inter-country network structure was able to moderate the relationship between the inter-city network structure and innovation performance. The authors opined that their results were consistent with the interaction theory (Paruchuri, 2010). Crescenzi et al. (2016) studied the characteristics of the collaborations among inventors in the UK, based on three research questions. They concluded that geographical proximity seems to be paramount to the inventors' inventions, though there is a potential advantage in creating a sparse network through collaborations with innovators outside the geographical boundaries.
Guan and Liu (2016) studied the effects that not just collaboration networks but also knowledge networks have on exploitative and exploratory innovations using data for the nano-energy field. The authors considered both types of networks based on two previous works (Wang et al., 2014; Yayavaram & Ahuja, 2008), where it was stated that technological innovation is influenced not only by collaboration networks but also by knowledge networks. This research work takes a step further from co-authorship and co-innovator network analysis by introducing semantic analysis to discover the focus of the research outputs.

Data collection and preprocessing
In order to retrieve the publications used for this research, "Civil and Structural Engineering" was used as the search criterion, with the region restricted to Nigeria. A total of 663 records (publications) were retrieved, and 1461 unique authors were identified using their Scopus identification numbers (Author Ids). The publications retrieved for this research ranged from 1969 to 2018, as revealed in Table 1. The records retrieved had the following attributes: Publication Identity in Scopus, Author Identity in Scopus, Citations, Title, Year, Source title, Affiliations, Abstract, and Keywords. The undirected graph generated contained 1461 nodes (authors) and 12,273 edges (collaborations). Each node (author) in the network has a unique identification number, and each edge has a weight representing the citation of the collaboration. The dataset was retrieved from the Scopus website (https://www.scopus.com), an abstract and indexing database with full-text links that is produced by Elsevier. The data were downloaded on the 5th of October 2018. A script was written in MATLAB to convert the raw data downloaded from the Scopus database into an edge list, which is the input to the NetworkX API in Python and to Gephi. The network graph for civil engineering research in Nigeria was constructed using the NetworkX library in Python and visualized using Gephi (https://gephi.org). The author identity number was used as the node identifier for the network construction to avoid problems caused by variations in author names.
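The conversion from publication records to a weighted edge list can be sketched as follows. This is a minimal illustration in plain Python, not the authors' MATLAB script: the records are invented, the function name `build_coauthor_graph` is hypothetical, and the edge weight is assumed to be the summed citation count of the papers two authors share, which is one possible reading of "the citation of the collaboration".

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (Scopus author IDs on one paper, citation count).
records = [
    (["A1", "A2", "A3"], 12),
    (["A2", "A4"], 3),
    (["A1", "A2"], 5),
    (["A5"], 7),  # a single-author paper adds a node but no edge
]


def build_coauthor_graph(records):
    """Return (nodes, edges); edges maps an unordered author pair to a weight."""
    nodes = set()
    edges = defaultdict(int)
    for author_ids, citations in records:
        unique_ids = sorted(set(author_ids))  # guard against repeated IDs
        nodes.update(unique_ids)
        # Every pair of co-authors on the same paper is linked once.
        for u, v in combinations(unique_ids, 2):
            # Assumption: the weight accumulates the citations of shared papers.
            edges[(u, v)] += citations
    return nodes, dict(edges)


nodes, edges = build_coauthor_graph(records)

# One "source target weight" line per edge.
edge_list = [f"{u} {v} {w}" for (u, v), w in sorted(edges.items())]
print(len(nodes), "authors,", len(edges), "collaborations")
print("\n".join(edge_list))
```

Lines in this whitespace-separated form can be read by NetworkX's `read_weighted_edgelist`. Keying nodes on the author ID rather than the author name mirrors the paper's choice for avoiding name variants.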

Research structure
In this study, the analysis is broken down into two sections: Scientific co-authorship Network Analysis on Civil Engineering Research (Section 4.2) and Identification of major civil Engineering research fields (Section 4.3).
The first section covers the methods and theories used to identify the major influencers of civil engineering research in Nigeria using Social Network Analysis. The second section describes the method for identifying research fields, which is essentially text mining. For this research, the LDA (Latent Dirichlet Allocation) algorithm was used for topic modelling. The LDA algorithm was applied to the abstracts retrieved for the publications.

Social networks analysis
Social Network Analysis originates from sociology but is based on the mathematical notions of graph theory (Arif et al., 2012). According to Kolaczyk and Csárdi (2014), network graph modeling can be used to test the significance of the characteristics of observed network graphs, and also to study proposed mechanisms of real-world networks such as degree distributions and small-world effects. In this research, centrality measures from network analysis were used to identify the important authors in the network, that is, the most influential researchers in civil engineering research in Nigeria. According to Marsden (2002), centrality is most often used to explain an actor's structural position in a network. Centrality can therefore be defined as the extent to which a node is connected to other nodes within a specific network (Wasserman & Faust, 1994). Several studies have been carried out on the centrality measures of co-authorship networks, including Arif et al. (2012) and Badar et al. (2013). Some studies, such as Abbasi et al. (2011), investigated and identified positive influences of centrality on performance outcomes in co-authorship networks. In this research, we explore questions such as "Who are the major influencers of the civil engineering community in Nigeria?" and "Are there correlations between performance, measured in terms of citation, and the centrality measures?", i.e., does the importance or influence of the researchers correlate with their output in terms of citation? The centrality measures used in this paper include degree centrality, closeness centrality, and betweenness centrality.
Degree Centrality aims at finding centrality based on the notion that important nodes have many connections and is expressed in (1) (Zachary, 1977).
\[ C_{deg}(v) = \frac{d_v}{|N| - 1} \tag{1} \]
where $C_{deg}(v)$ is the degree centrality of node $v$, $|N|$ is the number of nodes in the network, and $d_v$ is the degree of node $v$.
Closeness centrality aims at finding centrality based on the notion that important nodes are a short distance away from all other nodes in the network. Equation (2), given here in its standard normalized form, expresses the closeness centrality of node $v$ (Zachary, 1977).
\[ C_{close}(v) = \frac{|N| - 1}{\sum_{u \in N \setminus \{v\}} d(v, u)} \tag{2} \]
where $d(v, u)$ is the length of the shortest path between nodes $v$ and $u$.
Betweenness centrality aims at finding centrality based on the assumption that nodes that connect other nodes are important nodes. It is the number of shortest paths between nodes that go through a particular author. It reflects the perspective that importance relates to where a vertex is located with respect to the paths in the network graph (Azondekon et al., 2018). Equation (3) is used to express betweenness centrality (Zachary, 1977).
\[ C_{btw}(v) = \sum_{s, t \in N} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}} \tag{3} \]
where $C_{btw}(v)$ is the importance of node $v$ using betweenness, the sum runs over all possible pairs of nodes $s$ and $t$, $\sigma_{s,t}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{s,t}(v)$ is the number of shortest paths between nodes $s$ and $t$ that pass through node $v$.
In this research, the research performance of the authors is measured using the citations of the authors. According to Garfield (2014), "Citations are the currency of scholarship". Other researchers such as Bornmann et al. (2008) and Badar et al. (2013) also corroborated this belief. According to existing research by Soheili et al. (2015) and Khasseh et al. (2017), regression analysis has been widely used to explore the relationship between centrality measures and performance measures in order to make recommendations for improving co-authorship collaboration and, in turn, research. In this research, we report the results of the analysis of variance (ANOVA) for the regression of performance measures on centrality measures.
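The three centrality measures described above can be sketched on a toy graph as follows. This is a plain-Python illustration rather than the authors' code: the five-node graph is invented, the function names are illustrative, closeness assumes a connected graph (the paper's network is not connected, so a real analysis needs a per-component or rescaled variant), and betweenness is computed with Brandes' accumulation and halved so that each unordered pair of authors is counted once. NetworkX provides `degree_centrality`, `closeness_centrality` and `betweenness_centrality` for the same purpose.

```python
from collections import deque

# Hypothetical undirected co-authorship graph (adjacency sets).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def bfs(graph, source):
    """Hop distances, shortest-path counts, predecessors and visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def degree_centrality(graph):
    """Degree divided by the maximum possible degree, |N| - 1."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(|N| - 1) over the summed distances; assumes a connected graph."""
    n = len(graph)
    result = {}
    for v in graph:
        dist, _, _, _ = bfs(graph, v)
        total = sum(dist.values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' dependency accumulation."""
    btw = {v: 0.0 for v in graph}
    for s in graph:
        _, sigma, preds, order = bfs(graph, s)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                btw[w] += delta[w]
    # Each unordered pair (s, t) was visited from both ends.
    return {v: b / 2.0 for v, b in btw.items()}


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for v in sorted(graph):
    print(v, round(degree[v], 3), round(closeness[v], 3), betweenness[v])
```

In this toy graph node B has the most co-authors and also brokers every path between the A-B-C triangle and the D-E tail, which is the kind of bridging position that betweenness is meant to capture.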

Text mining
According to Irfan et al. (2015), text mining is described as an intersection of techniques from information retrieval, text analysis, Natural Language Processing (NLP), and information classification domains which can be used to provide computational intelligence. It is a complex procedure involving the automated discovery of useful and interesting knowledge from unstructured and ambiguous textual data (Basu et al., 2001; Han, 2005). Several existing studies have attempted text clustering in various forms in order to discover non-trivial knowledge, including Liu and Xiong (2011) and Afolabi et al. (2017). In this research, the basic text mining process employed is topic modelling, which is a text clustering problem where documents and words are clustered simultaneously (Blei et al., 2003). The main process is to transform the keywords (or terms) contained in text documents (i.e., abstracts) into a document-keyword matrix, which is then used to extract the topics in the document collection. According to Song and Suh (2018), various data mining methods have been applied to the document-keyword matrix, such as clustering, latent semantic analysis, and sentiment analysis. According to Blei et al. (2003), LDA is useful for basic natural language processing tasks such as classification, novelty detection, summarization, and similarity and relevance judgments. This informed the selection of the LDA algorithm. The text mining methodology employed in this research is based on three major steps: (i) text pre-processing; (ii) creation of the document-term matrix; (iii) building LDA models on the document-term matrix. The text pre-processing step consists of tokenization, normalization, and removal of stop words. The final stage of the pre-processing is lemmatization using the WordNet dictionary. All the pre-processing was implemented using the Python NLTK library.
The tokenized words were then converted to a document-term matrix using the gensim library in Python. Finally, the LDA models were built using the gensim library. According to Blei et al. (2003), the LDA algorithm can be described as a generative probabilistic model for collections of discrete data such as text corpora, and it is used for topic modelling. The main idea of the LDA algorithm is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words (terms) (Song & Suh, 2018). The algorithm derives latent topics from the topic probabilities conditioned on the documents and the word probabilities conditioned on the topics. The LDA algorithm is useful when the topic probabilities provide an explicit representation of the words contained in documents (Lee et al., 2015).
The number of keywords used to represent each latent topic in this work is 10, because it is difficult to characterize the latent topics based on a small number of keywords. Also, five latent topics were selected for each text category examined.
The text mining section is presented in the flow chart in Figure 1.
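The three steps of the text mining methodology (pre-processing, document-term matrix, LDA) can be sketched end to end as below. This is an illustrative stand-in, not the authors' pipeline: the paper used NLTK and gensim, whereas this sketch uses only the standard library, a tiny made-up stop-word list, no lemmatization, invented abstracts, and a collapsed Gibbs sampler in place of gensim's LDA implementation. The hyperparameters `alpha`, `beta` and `iterations` are arbitrary assumptions, and the toy run uses 2 topics with 5 keywords instead of the paper's 5 topics with 10 keywords.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a fuller one).
STOP_WORDS = {"the", "of", "and", "in", "a", "is", "for", "on", "to", "with", "was", "were"}


def preprocess(text):
    """Tokenize, lower-case and drop stop words; lemmatization is omitted here."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, rows): rows[d][j] counts term vocab[j] in document d."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


def lda_gibbs(docs, vocab, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    index = {t: j for j, t in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = []
        for t in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[t]] += 1
            topic_total[k] += 1
        assignments.append(z)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                w, k = index[t], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample: P(topic | document) * P(word | topic), up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta) / (topic_total[j] + n_vocab * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_keywords(topic_word, vocab, n=10):
    """List the n highest-count terms of each topic."""
    return [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
        for row in topic_word
    ]


# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "The compressive strength of concrete with cement and aggregate was tested.",
    "Cement content and aggregate property control the strength of concrete.",
    "Groundwater quality and source control in the water supply.",
    "Water quality of groundwater source was studied for control.",
]
docs = [preprocess(a) for a in abstracts]
vocab, dtm = document_term_matrix(docs)
topic_word = lda_gibbs(docs, vocab, num_topics=2)
topics = top_keywords(topic_word, vocab, n=5)
for number, words in enumerate(topics, start=1):
    print("Topic", number, words)
```

The sampler makes the generative view concrete: each token's topic is redrawn in proportion to how common that topic is in its document and how common the word is in that topic. Results depend on the random seed and on the hyperparameters, so the keyword lists from a run like this are indicative rather than definitive.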

Descriptive analysis
The degrees of the nodes in the undirected graph ranged from 1 to 128, with an average degree of 15.943. The degree distribution graph is presented in Figure 2. Mean closeness centrality was found to be 0.711283, and the top 10 authors based on closeness centrality values are shown in Table 2. Betweenness values range between 0 and 9605, and the top 10 broker authors, i.e., the authors with the highest betweenness centrality, are also shown in Table 2. The degree centrality values ranged from 0.000684 to 0.087671, and the most connected authors, i.e., the top 10 authors based on degree centrality, are also shown in Table 2.
Afolabi et al., Cogent Engineering (2020), 7: 1835147 https://doi.org/10.1080/23311916.2020.1835147
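As a quick consistency check, and assuming that degree centrality is the degree divided by $|N| - 1$ with $|N| = 1461$ authors, the reported minimum and maximum degrees of 1 and 128 reproduce the reported range of degree centrality up to rounding in the last digit:

\[ \frac{1}{1461 - 1} \approx 0.000685, \qquad \frac{128}{1461 - 1} \approx 0.087671 \]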

Network cohesion
A total of 227 communities were identified in the network, and Figure 3 shows the size distribution of the communities (represented as modularity classes). The network is not connected, and a census of all the communities reveals a giant community containing 166 authors. Figure 4 is a visualization of this largest community. The clustering coefficient measures the tendency of authors who share connections in the network to become connected themselves, which is also known as triadic closure. For a co-authorship network, the higher the clustering coefficient, the more likely it is that authors who share a co-author are also connected to each other. In this research, the average clustering coefficient, calculated using the method of Latapy (2008), was 0.911, with a total of 358,850 triangles. The civil engineering co-authorship network has a density of 0.011, where density measures how close the network is to being complete.
The largest community in the network, which consists of 166 authors and is visualized in Figure 4, indicates that its authors collaborate frequently, which points to a very active research community.
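The cohesion measures above can be illustrated on a small example. The sketch below computes density and the average clustering coefficient directly from their definitions, and uses connected components as a simpler stand-in for the modularity-based communities reported in the study. The toy network is hypothetical, and nodes with fewer than two neighbours are assigned a clustering coefficient of 0, which is one common convention rather than necessarily the one used to obtain the reported values.

```python
def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * edges / (n * (n - 1)) if n > 1 else 0.0


def local_clustering(graph, v):
    """Share of a node's neighbour pairs that are themselves linked."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for degree 0 or 1
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, v) for v in graph) / len(graph)


def connected_components(graph):
    """Return the connected components as a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            component.add(u)
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Hypothetical toy co-authorship network (author -> co-authors).
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"F"},
    "F": {"E"},
}
largest = max(connected_components(toy), key=len)
```

A high average clustering coefficient together with a low density, as reported above, is the pattern commonly associated with small-world networks.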

Relationship between performance (i.e. citation) and centrality measures
Regression analysis was used to study the relationship between performance, measured in terms of citations, and the centrality measures (betweenness centrality, closeness centrality, and degree centrality).
The result of analysis of variance (ANOVA) for regression analysis is presented in Table 3.
The regression result shows that the centrality measures explain only about 4.4% of the variance in performance measured in terms of citations (R Square = 0.043905). The Significance F is 4.042E-14, which is far below 0.05, indicating that the regression model as a whole is statistically significant. The regression coefficients indicate a positive relationship between performance and betweenness centrality (Coefficient = 7094.659, P-Value = 4.24E-09) and closeness centrality (Coefficient = 114.6901, P-Value = 0.009965), and a negative relationship between performance and degree centrality (Coefficient = −207.761, P-Value = 2.77E-05). The regression coefficients of the predictor variables show that closeness centrality and betweenness centrality can significantly explain the variance in performance.
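To make the regression setup concrete, the sketch below fits a multiple linear regression of citations on the three centrality measures by ordinary least squares and reports R Square. It is an illustration under stated assumptions, not the authors' procedure: the data values are hypothetical, the helper names `solve` and `ols` are introduced here, and the solver assumes the predictors are not perfectly collinear.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data: (betweenness, closeness, degree) per author.
rows = [
    (0.00, 0.50, 0.01),
    (0.02, 0.60, 0.03),
    (0.05, 0.70, 0.02),
    (0.01, 0.80, 0.05),
    (0.08, 0.90, 0.04),
    (0.03, 1.00, 0.06),
]
citations = [2, 9, 14, 6, 25, 11]  # hypothetical citation counts
beta, r_squared = ols(rows, citations)
```

The sign of each entry in `beta` after the intercept corresponds to the direction of the relationship discussed above, and `r_squared` corresponds to the R Square value. P-values and the ANOVA table require additional computation that is not shown here.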

Identification of major civil engineering research topics
The goal of the text mining section is to discover three sets of latent topics: those researched in the most recent year (2018), those researched in the largest research community, and those that characterize civil engineering research in Nigeria overall. The topics extracted by text mining with the LDA algorithm are presented in Table 4.
Although unnecessary keywords were eliminated in the semantic text pre-processing step, the titles of the topics were determined with the help of experts, and some of the topics consist of common words. From Table 4, it can be seen that in 2018, Topic 1 focuses on the strength of concrete and materials, with additional interest in waste and material properties and an emphasis on using models for their study. Topic 2 also focuses on the strength of concrete, but words such as cement also feature in this topic. Topic 3 still revolves around the strength of materials, but steel, construction, and Nigeria are added to the keywords. Topic 4 introduces new words such as building, agricultural, structure, and water. The main new keyword introduced in Topic 5 is alloy.
For the largest community, the keywords in the topics are similar to those discovered for 2018, with a few additions such as ceramic and glass.
The topics from the total publications, which represent the overall content of civil engineering research in Nigeria as retrieved from the Scopus database, are as follows. Topic 1 is focused on concrete, strength, cement, compressive, result, aggregate, property, study, and content. Topic 2 adds words such as value, material, sample, and compaction. Topic 3 revolves around factors, models, and the analysis of buildings in Nigeria. Topic 4 centres on groundwater, magnetic field, source, quality, and control. Finally, Topic 5 is focused on building design and construction.
Previous research by Liu et al. (2007), Cimenler et al. (2014), and Khasseh et al. (2017) has also shown that centrality measures affect the performance of authors in a co-authorship network. Our research confirms this: it found a significant positive relationship between performance and both betweenness centrality and closeness centrality.
Unlike most studies on co-authorship networks, our research adds a semantic text mining approach to discover the research focus of the co-authorship network. According to Blei et al. (2003), this approach addresses issues of semantics in text clustering and provides better inference.
In the text mining section of the research, we were able to discover both the current research topics and the overall topics of civil engineering research in Nigeria. A limitation of this study is that it used only data collected from the Scopus database; further work can include publication data from Web of Science and Google Scholar.