An Integrated Method for Cooperation Prediction in Complex Standard Networks

Standards play significant roles in the development of technology and economics, and the cooperation between drafters directly determines the quality of standard systems. Cooperation prediction is a significant yet challenging problem when seeking new cooperation opportunities between drafting units, owing to their differences in experience and professional ability. In this study, an integrated artificial intelligence method is proposed for cooperation prediction that combines link prediction, text analysis, and network modeling. Specifically, we develop a multi-layer standard network formed by standard citation relationships and cooperation relationships between drafters. Then, a set of novel metrics is designed for predicting the cooperation between drafters, considering their knowledge, experience, and professional capability. These metrics are further integrated into a neural network to improve the prediction accuracy. The advantages of our method in terms of prediction accuracy are verified with real data on Chinese environmental health standards. The prediction results provide strong support for the selection of drafters and help to further optimize the structure of standard systems.


Introduction
A standard is an authoritative technical specification document designed to regulate technical requirements, performance indicators, or quality assurance in a specific field, and it can be developed by a variety of drafters, including international organizations, national agencies, industry associations, or enterprises [1][2][3]. A substantial number of standards within a specific industry, along with their corresponding drafters, form a complex, interconnected system. A high-quality standard system must exhibit synergy, compatibility, and coordination, which can be achieved through the close cooperation of qualified drafters. Due to the differences between drafters in terms of professional level, technical ability, and resources, the selection of cooperation partners is essential to the quality of standards [4]. High-quality standards ensure product quality, environmental health, and safety. In other words, the quality and influence of a standard largely depend upon the experience and ability of its drafters [5,6].
The network structure formed by enterprise cooperation plays a key role in acquiring resources, promoting technological innovation, and facilitating the formulation of advanced and compatible technology. Furthermore, enterprise cooperation in standards development significantly affects standardization capability, including the standard-setting capability, standard implementation capability, and standard diffusion, which in turn affects standard quality and further promotes innovation [7]. It is well recognized that the selection of drafters, which is usually managed by the national technical committees of standardization (NTCs), should be broadly representative, meaning that different kinds of innovative entities are encouraged to participate in standards development [8,9]. In practice, choosing appropriate drafters requires considering various factors, such as professional abilities, technical relevance, resource richness, and financial conditions, which largely determine the quality of the drafted standards [10,11]. It is challenging for an NTC to identify cooperation opportunities between drafters in a specific field by relying only on social relations and experience-based judgment [12]. From a systematic perspective, an NTC should optimize the structure of such a cooperation network with the aim of improving the compatibility and coordination of a standard system. For instance, it might produce a high-quality standard system by bridging institutions or companies with similar professional associations that seldom draft standards jointly [13,14]. If the cooperative relationships between drafters can be quantitatively predicted, the predictions can provide strong support for optimizing these relationships and improving the quality of the standards. In this study, we focus on cooperation prediction with an integrated method based on network theory, text analysis, and machine learning. In particular, the cooperation prediction between environmental health drafting units is solved with the link prediction method.
Link prediction refers to forecasting the likelihood of a connection between node pairs that have not yet established a link, using the existing network structure and node information. This process aims to uncover hidden relationships within the network [15][16][17]. At present, link prediction has been used to study a variety of network structures, including social networks and innovation networks [18][19][20][21], but there is a research gap in the field of cooperation in drafting standards. Currently, the mainstream similarity-based prediction methods rely on the network topological structure, such as the Jaccard coefficient index, Adamic-Adar index, and preferential attachment index [22][23][24], but the prediction accuracy of these individual indexes is often limited [25]. The cooperation prediction problem between drafters can be viewed as a network-based link prediction problem. In this problem, there exist multiple factors and rich data, such as the standard documents and the information of drafters, which can be used to improve the prediction performance.
In this study, we propose a text-based learning method for cooperation prediction based on a complex network model by integrating text similarity analysis, neural network learning, and network analysis. Firstly, we developed a multi-layer network using real data on Chinese environmental health standards. The network is composed of three layers, namely, the standard citation layer, standard association layer, and drafter cooperation layer. Secondly, we developed four metrics based on text similarity and the citation relationships. Meanwhile, three classical link prediction indicators [22][23][24], namely, Common Neighbors (CN), Local Path (LP), and SimRank (SR), were also deliberately selected in the learning and prediction process for comparison. Finally, we integrated our proposed metrics with the classical indicators in the neural network model to improve the prediction performance. This approach considers not only the topological information of cooperation relationships but also the relationships between drafters based on knowledge and experience. Based on our prediction model, several pairs of drafting units are encouraged to be linked. The results show that our method provides an effective solution for optimizing the structure of cooperation networks.
The remainder of this study is organized as follows. Section 2 introduces the related literature. An integrated research framework is proposed in Section 3, and its three main stages are introduced. The method is verified with a case study and real data in Section 4. The paper is concluded in Section 5.

Literature Review
This study focuses on cooperation prediction in a complex standard network. Two research streams are closely related to this study, namely, cooperation network analysis and link prediction, which are reviewed as follows.

Cooperation Network Analysis
A cooperation network can be defined as a group of agents who conduct specific activities cooperatively, such as inventing patents, drafting standards, and conducting research. In this study, we focus on a cooperation network in drafting standards. Research on cooperation networks has attracted extensive academic attention [26][27][28][29] since Newman pioneered the study of the co-authorship network [30]. In recent years, the collaboration network has been widely researched across various fields, including computer science, engineering, mathematics, business economics, environmental and ecological science, and telecommunications [31][32][33]. With the tools of network theory, the formation and dynamic evolution laws of collaboration patterns and behaviors can be disclosed, the collaboration ability and characteristics can be evaluated or analyzed, and critical nodes can also be identified. Li et al. (2020) [34] studied the evolution of cooperation over temporal networks and found that network timeliness actually enhances the evolution of cooperation relative to static networks. In recent years, research on multi-layer cooperation networks with heterogeneous nodes and edges and on weighted cooperation networks has begun to draw academic attention [4,35,36]. In addition, some scholars have devoted significant attention to studying the impact of cooperation on innovation using patent data. For example, De et al. (2018) [37] used a 7-year patent data set to verify whether cooperation among scholars is beneficial to technological innovation. Based on China's national air quality standards, Wei et al. (2020) [4] proposed a method based on a two-layer network model to assess the impact of drafters. Lin et al. (2022) [38] tested the influence of the stability of the self-cooperative network on innovation by using Huawei's patent application data. As we can see, various fields are keen on exploring the structure and functionality of diverse cooperation networks, particularly in fostering innovation. However, the cooperation networks among drafters underlying standard systems, which are closely related to the quality of standards, have not been explored.

Link Prediction
Link prediction involves forecasting the likelihood of upcoming connections in order to reconstruct networks and delve deeper into the factors behind network formation and development. Link prediction methods can be grouped into three main classes: probabilistic methods, topological methods, and machine learning methods. The first class uses probability models, such as matrix factorization, Markov chains, and logistic regression, for different networks, such as social networks and citation networks [39,40]. These methods have the advantage of accounting for randomness and uncertainty. In the past decade, with the development of network science, topological algorithms and attribute-based metrics that exploit local or global structure information have been widely used to make better predictions [22,41,42]. An important hypothesis of such algorithms is that two nodes that are similar in terms of structure or function tend to link with each other. Link prediction algorithms based on network topology have the advantages of simplicity, low computational complexity, and strong universality. The commonly used metrics include the common neighbor index (CN), the Jaccard index (JA), the preferential attachment index (PA), the Adamic-Adar index (AA), the Katz index (KZ), and so on [43,44]. In addition, the similarity between two nodes can be measured with attribute information. The cosine similarity, Euclidean distance, and Mahalanobis distance have also been effectively applied [45]. In recent years, due to the development of big data and artificial intelligence, machine learning methods, such as reinforcement learning, deep learning, transfer learning, and generative adversarial networks, have received considerable attention in link prediction problems [46,47]. These methods have the advantage of incorporating more diversified information. In addition, the integration of multiple methods has become a significant trend for producing better prediction performance. Cao et al. (2019) [48] combined the chaotic perturbation model with the ant colony optimization algorithm to propose a chaotic ant colony optimization (CACO) link prediction algorithm. Ghorbanzadeh et al. (2021) [49] proposed a new method based on common neighborhood, which has been applied to three benchmark networks with unsupervised and supervised learning modes.
From the above, the prediction problem of cooperation in standard systems is a novel challenge that remains unexplored. In this study, we used a machine learning method to predict the cooperation between drafters based on a multi-layer network constructed via text similarity analysis. In particular, we used text similarity analysis to quantify the association intensity between standards, which provides a basis for proposing new metrics that quantify the relationship between drafters. Combined with traditional topology metrics, our integrated method substantially improved the prediction performance.

Problem Statement and Research Framework
As stated previously, standard systems play a vital role in social and economic development. In the standardization process, drafters play a crucial role in shaping the quality of standard systems. As standards are developed through the joint effort of multiple drafters, the structure of the collaboration network among these drafters holds great significance. Consequently, structure optimization for the cooperation network among drafting units provides an effective way to achieve high-quality standard systems. A critical research problem naturally arises: how do we optimize the topology of such a cooperation network? Specifically, we focus on how to form new cooperation relationships considering the background or experience of drafters in standard drafting. To solve this problem, we propose an integrated data-driven link prediction framework that combines network analysis, text analysis, and machine learning. The research framework is depicted in Figure 1. The cooperation prediction problem includes three stages (network model construction, metrics design and selection, and link prediction), which integrate network modeling, text analysis, and a neural network-based learning algorithm. The first stage focuses on constructing a multi-layer network model that includes the citation relationships and text similarity-based association relationships between standards and the cooperation relationships between drafters. The citation relationships form a directed subnetwork. It is noted that text similarity analysis can better capture the interrelationship between standards than the citation relationship, which carries limited information. The association relationship between each pair of standards has a continuous strength value between 0 and 1, while the citation relationship takes a value of either 0 or 1. Meanwhile, the inter-layer relationship reflects the drafting relationships between drafters and standards. The second stage focuses on designing or selecting effective metrics for link prediction. The critical contribution lies in the incorporation of text similarity-based metrics, citation-based metrics, and traditional topological metrics. These metrics are treated as inputs in the training and prediction processes of the neural network system. The third stage is the link prediction, which uses the metrics proposed in the second stage as candidate input variables and the existing cooperation links as outputs. Due to the unknown nonlinear relationship between inputs and outputs, we use a neural network to predict the cooperation between drafters. To improve the prediction accuracy, the hyper-parameters of the neural network model are optimized.

A Multi-Layer Network Model
In this study, a multi-layer network model is developed for cooperation prediction. The network is represented as the tuple $(N_s, N_d, E_{sc}, E_{sa}, E_{dc})$, where $N_s$ is the set of standard nodes, $N_d$ is the set of drafter nodes, $E_{sc} = \{(s_i, s_j) \mid s_i, s_j \in N_s\}$ is the set of directed edges representing the citation relationship between standards, $E_{sa} = \{(s_i, s_j) \mid s_i, s_j \in N_s\}$ is the set of undirected edges expressing the association relationship between standards, and $E_{dc} = \{(d_i, d_j) \mid d_i, d_j \in N_d\}$ is the set of undirected edges representing the cooperation relationship between each pair of drafters. We assume that the network has $n_s$ standard nodes and $n_d$ drafter nodes. The network structure is depicted in Figure 2, which is composed of three layers: the standard citation layer $L_{sc}$, the standard association layer $L_{sa}$, and the drafter cooperation layer $L_{dc}$. The three layers differ in their nodes and edges. It should be noted that the edges reflecting citation relationships are directed following the sequence of the issue time of each standard, which means that the citation layer is an acyclic subnetwork. The edges in the association layer $L_{sa}$ are weighted with strengths, which are quantified through text analysis.
Based on the network structure, we can define an adjacency matrix for the standard citation layer as $A = [a_{ij}]_{n_s \times n_s}$, in which $a_{ij} = 1$ if standard $s_i$ is cited by standard $s_j$; otherwise, $a_{ij} = 0$. $A$ is an asymmetric matrix due to the directed citation relationship. Furthermore, we define $\psi(s_i, s_j)$ as the similarity degree between each pair of standards $s_i$ and $s_j$, as computed through text analysis. Finally, we define another adjacency matrix $C = [c_{ij}]_{n_d \times n_d}$ to represent the mutual cooperation relationship, in which $c_{ij} = c_{ji} = 1$ if drafter $d_i$ and drafter $d_j$ have drafted at least one standard jointly; otherwise, $c_{ij} = c_{ji} = 0$. In addition, it should be noted that the nodes in $L_{sc}$ and $L_{sa}$ are identical. The main variables and symbols used in this study are listed in Table 1.
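As an illustration of these definitions, the following minimal Python sketch builds the two 0/1 matrices from raw records. The input layout (a mapping from each standard to its drafters and a list of (citing, cited) pairs), the function names, and the toy identifiers are assumptions made for illustration only; they are not taken from the study's data pipeline.

```python
from itertools import combinations

def cooperation_matrix(drafters_by_standard, drafters):
    """Symmetric 0/1 matrix C: c[i][j] = 1 if drafters i and j co-drafted a standard."""
    index = {d: i for i, d in enumerate(drafters)}
    n = len(drafters)
    c = [[0] * n for _ in range(n)]
    for team in drafters_by_standard.values():
        # Every unordered pair of drafters on the same standard is a cooperation edge.
        for a, b in combinations(sorted(set(team)), 2):
            i, j = index[a], index[b]
            c[i][j] = c[j][i] = 1
    return c

def citation_matrix(citations, standards):
    """Asymmetric 0/1 matrix A: a[i][j] = 1 if standard i is cited by standard j."""
    index = {s: i for i, s in enumerate(standards)}
    n = len(standards)
    a = [[0] * n for _ in range(n)]
    for citing, cited in citations:
        a[index[cited]][index[citing]] = 1
    return a

# Hypothetical toy data: three standards, three drafters.
drafters_by_standard = {"S1": ["D1", "D2"], "S2": ["D2", "D3"], "S3": ["D1"]}
citations = [("S2", "S1"), ("S3", "S1")]  # (citing, cited)

C = cooperation_matrix(drafters_by_standard, ["D1", "D2", "D3"])
A = citation_matrix(citations, ["S1", "S2", "S3"])
```

In this toy example, D1 and D3 never co-draft a standard, so their entry in C stays 0; such unlinked pairs are the candidates for link prediction.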

Text Similarity Analysis
In this study, text similarity analysis (TSA) is used to quantify the association strength between each pair of standards. At present, there are many TSA algorithms, such as the cosine similarity algorithm, Levenshtein distance algorithm, SimHash algorithm, and Euclidean distance algorithm [45]. The core idea of text similarity analysis is to segment and clean the text and extract its keywords. The word frequencies are then converted into space vectors so that each document is mapped to a vector. Then, different TSA algorithms are used to calculate the text similarity degree.
This study combines the Vector Space Model and the cosine similarity formula to measure the text similarity degree between standards due to their wide application. The main idea is shown in Figure 3. After keyword extraction, a standard text is transformed into a multi-dimensional vector composed of the TF-IDF (term frequency-inverse document frequency) values of its keywords. Then, the cosine formula is used to calculate the association degree between each pair of standards. For example, the cosine value of the included angle $\theta$ between two $n$-dimensional space vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ is computed as
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}.$$
The specific process of the TSA is as follows:
Step 1: Content extraction. The selection of standard text is directly related to the final computational results of the association degree. In order to reduce the subjectivity and computational effort, we propose uniform rules for text selection. First, we select structured content in each standard document, including the standard name, preface, scope, and term definitions. Then, to improve the theme relevance, we select the texts highly related to the standard topic by reading the original document.
Step 2: Text segmentation. Each original standard text is decomposed into a set of "word + POS (part of speech)" pairs via word segmentation of the collected standard texts.
Step 3: Invalid word filtering. The invalid words, which are the words automatically filtered out during text processing to save storage space and improve running efficiency, mainly include auxiliary words, modal words, adjectives, and so on. In this step, we maintain a set of stop words and a set of deactivated parts of speech. All the "word + part of speech" pairs obtained in the previous step are filtered against both sets. The remaining words form a preliminary word bag. Then, we manually inspect the word bag, add any invalid words that were missed to the invalid word set, and filter again. We repeat the above operations until no invalid words remain in the word bag, which gives the final keyword set.
Step 4: Dictionary construction. A dictionary based on the set of keywords obtained from all the standards is built. The main task in this step is to assign a specific ID to each keyword for subsequent vectorization. After this step, each standard text is converted into the form of "word + ID" and stored in the dictionary.
Step 5: Corpus establishment. A corpus is established based on the dictionary (the corpus here represents the sparse vector of the key content of each standard), and TF-IDF model training is conducted. This step is divided into two phases. First, the final keyword set obtained from each standard is converted into the form of "word ID + word frequency" according to the dictionary obtained in Step 4 and stored in a set. Second, in order to measure the importance of the extracted standard keywords accurately, we conduct TF-IDF model training based on the obtained corpus.
It is apparent that the frequency of a specific keyword does not necessarily measure its importance. We need to introduce an adjustment coefficient to determine whether a word is common or not. If a certain word appears rarely in other standard texts but frequently in a certain standard text, this word can represent the thematic characteristics of that text. It means that we can assign an "importance" weight to each word on the basis of word frequency. The TF-IDF is a widely used algorithm [50], which is divided into three steps. The first step calculates the TF, the second step calculates the IDF, and the final step calculates the TF-IDF value. The TF-IDF value of a keyword $k$ in a specific standard $s$, $\chi_{k,s}$, is calculated in the standard form consistent with the definitions that follow:
$$\chi_{k,s} = \frac{h_{k,s}}{\xi_s} \times \log\frac{\tau}{\tau_k},$$
where $h_{k,s}$ is the number of occurrences of word $k$ in standard $s$, $\xi_s$ is the number of all the keywords in standard $s$, $\tau$ is the total number of standards, and $\tau_k$ is the number of standards containing the specific keyword $k$. The trained TF-IDF model can further transform "word ID + word frequency" into "word ID + relative importance of the word to the whole corpus", which makes the final calculation results of the text association degree between standards more reliable.
Step 6: Similarity calculation. We use the cosine similarity formula to calculate the text similarity between each pair of standards.
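Steps 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation: it starts from keyword lists that have already been segmented and filtered (Steps 1 to 3), uses the unsmoothed TF-IDF form (term frequency times log of total standards over standards containing the keyword), and the function names and toy keywords are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(keyword_lists):
    """Map each standard's keyword list to a sparse {keyword: TF-IDF} vector."""
    n_docs = len(keyword_lists)
    # Number of standards that contain each keyword at least once.
    doc_freq = Counter(k for kws in keyword_lists for k in set(kws))
    vectors = []
    for kws in keyword_lists:
        counts = Counter(kws)
        total = len(kws)
        vectors.append({
            k: (h / total) * math.log(n_docs / doc_freq[k])
            for k, h in counts.items()
        })
    return vectors

def cosine(x, y):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * y.get(k, 0.0) for k, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_x * norm_y)

# Hypothetical keyword lists for three standards.
docs = [
    ["water", "quality", "limit"],
    ["water", "quality", "sampling"],
    ["noise", "limit"],
]
vecs = tfidf_vectors(docs)
psi_01 = cosine(vecs[0], vecs[1])  # two shared keywords
psi_02 = cosine(vecs[0], vecs[2])  # one shared keyword
psi_12 = cosine(vecs[1], vecs[2])  # no shared keywords
```

Note that a keyword occurring in every standard receives an IDF of zero under this unsmoothed form, so it contributes nothing to the similarity; this is the intended down-weighting of common words.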

Link Prediction Hypotheses
In the existing literature, although several topological metrics have been proposed, the prediction capability of individual metrics is limited. Meanwhile, the topological metrics mainly focus on structural information without considering the professional experience or abilities of drafters. Therefore, we incorporate both topological metrics and text analysis-based metrics in the neural network model to improve prediction performance.
Cooperation between drafters is often based on professional preferences. The standard text reflects the knowledge preferences of drafters well, which provides a basis for exploring their cooperation relationships. In this study, we put forward the following hypotheses on the prediction of cooperation between drafters to guide the metrics design.

Standards Association and Cooperation Potential
Firstly, from a theoretical perspective, the standards association between drafters reflects the extent of overlap in their knowledge bases, objectives, and technical languages [51,52]. This shared foundation can facilitate communication and understanding, thereby increasing the possibility of further cooperation. When developing standards on similar themes, drafters with a common knowledge system and technical language find it easier to engage in effective communication and coordination, thereby laying a solid foundation for cooperation [53]. Secondly, knowledge sharing and network effects are significant factors driving standardization cooperation [54]. Drafting standards on similar topics often involves similar stakeholders, including industry experts, regulatory bodies, and academic institutions [55]. This overlap creates network effects for the exchange of knowledge and best practices. As drafters become more familiar with each other's work, trust and mutual recognition gradually increase, paving the way for future collaboration. Additionally, resource optimization is a powerful motivator for cooperation. Drafters cooperating on similar topics can more effectively utilize each other's resources, including time, expertise, and financial investments. Joint efforts can reduce redundant work, making the standardization process more efficient and cost-effective [56]. These economic incentives further enhance the motivation for cooperation. The above leads to the following hypothesis.
Hypothesis 1. The higher the association between the standards, the greater the possibility of cooperation between the corresponding drafters in the future.
Based on this hypothesis, we propose two metrics for cooperation prediction: total association intensity (TAI) and maximal association intensity (MAI). We define $C_u$ and $C_v$ as the two sets of standards drafted by drafters $d_u$ and $d_v$, respectively.
The metric TAI sums the similarity degrees, as computed via text analysis, over every pair of standards drafted by the two drafters, which is expressed as
$$\mathrm{TAI}(d_u, d_v) = \sum_{s_a \in C_u} \sum_{s_b \in C_v} \psi(s_a, s_b),$$
where $\psi(s_a, s_b)$ is the similarity degree between two standards $s_a$ and $s_b$, which are drafted by drafters $d_u$ and $d_v$, respectively. Due to the capability differences of drafters, the number of standards drafted by them can be substantially different. The TAI value might not necessarily indicate the cooperation possibility or necessity if the two drafters have drafted a large number of standards, since the sum then grows with the number of pairs. To overcome this problem, we propose the MAI metric, which is represented as
$$\mathrm{MAI}(d_u, d_v) = \max_{s_a \in C_u,\; s_b \in C_v} \psi(s_a, s_b).$$
Unlike TAI, MAI predicts the cooperation probability based on the maximal association degree between the standards drafted by the two drafters $d_u$ and $d_v$.
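A minimal Python sketch of the two association metrics follows. The similarity lookup `psi`, the dictionary of pairwise similarities, and the standard identifiers are hypothetical placeholders; in practice the similarity values would come from the text similarity analysis described earlier.

```python
def tai(c_u, c_v, psi):
    """Total association intensity: sum of similarities over all cross pairs."""
    return sum(psi(a, b) for a in c_u for b in c_v)

def mai(c_u, c_v, psi):
    """Maximal association intensity: largest similarity over all cross pairs."""
    return max((psi(a, b) for a in c_u for b in c_v), default=0.0)

# Hypothetical pairwise similarities between standards (symmetric lookup).
similarity = {
    frozenset(("S1", "S3")): 0.5,
    frozenset(("S1", "S4")): 0.25,
    frozenset(("S2", "S3")): 0.125,
}

def psi(a, b):
    # Pairs without a stored value are treated as having zero similarity.
    return similarity.get(frozenset((a, b)), 0.0)

C_u = ["S1", "S2"]  # standards drafted by drafter d_u (toy data)
C_v = ["S3", "S4"]  # standards drafted by drafter d_v (toy data)

total = tai(C_u, C_v, psi)
largest = mai(C_u, C_v, psi)
```

Because TAI grows with the sizes of the two sets while MAI does not, comparing both values for the same pair shows how much of a high TAI is due to sheer drafting volume.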

Citation Degree and Cooperation Potential
First, a high degree of citation indicates a high level of consistency in technical content and expertise among drafters. This consistency is reflected not only in the specific content of the standards but also in the recognition of technical norms and best practices [57]. When drafters frequently cite each other's standards, it signifies a common foundation and perspective in technical understanding and application, which strongly promotes the possibility of further cooperation [58]. Secondly, standards with high citation frequencies usually possess significant authority and influence within the industry [59]. When drafters cite each other's standards, it is not just an acknowledgment but also an expression of trust. Through citations, an invisible network of trust is established, which can effectively reduce future communication costs and coordination difficulties, thereby promoting cooperation. Moreover, citation relationships build a knowledge network among drafters [4]. When one drafter cites another drafter's standard, it essentially draws on their experience and knowledge, strengthening the connection between them. Frequent citation relationships enable these drafters to collaborate more closely in technical development and innovation, jointly addressing industry challenges and technological changes, thereby enhancing the adaptability and foresight of the standards. The above leads to the following hypothesis.
Hypothesis 2. The higher the citation degree between standards, the greater the possibility of cooperation between the corresponding drafters in the future.
Similar to the TSA-based metrics TAI and MAI, two citation-based metrics are proposed. The first one is the total citation intensity (TCI), which represents the total number of citations between the two sets of standards drafted by the two drafters $d_u$ and $d_v$, represented as
$$\mathrm{TCI}(d_u, d_v) = \sum_{s_a \in C_u} \sum_{s_b \in C_v} \delta(s_a, s_b),$$
in which $\delta(s_a, s_b) = 1$ if standard $s_a$ cites $s_b$ or $s_b$ cites $s_a$; otherwise, $\delta(s_a, s_b) = 0$. Similar to the metric MAI, we define the maximal citation intensity (MCI) as
$$\mathrm{MCI}(d_u, d_v) = \max\Big\{ \max_{s_a \in C_u} \sum_{s_b \in C_v} \delta(s_a, s_b),\; \max_{s_b \in C_v} \sum_{s_a \in C_u} \delta(s_a, s_b) \Big\},$$
where the two inner terms represent the maximal value of the total number of citations of a single standard drafted by $d_u$ and by $d_v$ in the corresponding sets $C_v$ and $C_u$, respectively.
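The citation-based metrics can be sketched in the same way. The total citation intensity follows directly from the definition of the indicator δ. For the maximal citation intensity, the sketch adopts one reading of the description, namely the largest number of citation links that any single standard of one drafter has with the other drafter's set; this reading, the function names, and the toy citation pairs are assumptions.

```python
def delta(a, b, cites):
    """1 if either standard cites the other, else 0. `cites` holds (citing, cited) pairs."""
    return 1 if (a, b) in cites or (b, a) in cites else 0

def total_citation_intensity(c_u, c_v, cites):
    """Number of citation links between the two drafters' sets of standards."""
    return sum(delta(a, b, cites) for a in c_u for b in c_v)

def maximal_citation_intensity(c_u, c_v, cites):
    """Assumed reading: the most citation links any single standard has with the other set."""
    best_u = max((sum(delta(a, b, cites) for b in c_v) for a in c_u), default=0)
    best_v = max((sum(delta(a, b, cites) for a in c_u) for b in c_v), default=0)
    return max(best_u, best_v)

# Hypothetical citation pairs (citing, cited).
cites = {("S3", "S1"), ("S4", "S1"), ("S2", "S4")}
C_u = ["S1", "S2"]  # standards drafted by d_u (toy data)
C_v = ["S3", "S4"]  # standards drafted by d_v (toy data)

tci_value = total_citation_intensity(C_u, C_v, cites)
mci_value = maximal_citation_intensity(C_u, C_v, cites)
```

Storing citations as a set of ordered pairs keeps the direction available for other uses while letting the indicator ignore it, as the definition requires.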

Topological Connectivity and Cooperation Potential
The topological connectivity in a cooperation network not only reflects the overall embeddedness of drafters within the standardization process but also reveals the paths of knowledge flow and the positions of drafters within the network [60,61]. These factors collectively determine the potential for future cooperation. Firstly, highly embedded drafters typically maintain direct or indirect connections with others, which enhances trust and the capacity for information flow among them [62]. When a drafter possesses a greater number of links within the network, it is more likely to gain recognition and trust from others, thereby increasing the possibility of future collaborations. Secondly, the network topology determines the flow paths of knowledge and innovation within the entire standardization network [63]. Close link relationships can accelerate the dissemination of knowledge and best practices, thereby promoting collective learning and mutual progress. In a highly interconnected network, the diffusion of knowledge and innovation occurs at a faster pace, enabling drafters to acquire and apply new technical and managerial expertise more quickly [64]. This rapid acquisition and application capability forms a crucial foundation for future collaboration. Tight topological connectivity not only facilitates the dissemination of existing knowledge but also inspires new opportunities for innovation and cooperation. The above leads to the following hypothesis.
Hypothesis 3. The topological connectivity in the drafter cooperation network may be positively related to cooperation possibility in the future.
In this study, after an initial accuracy test, we chose three metrics, Common Neighbors (CN), Local Path (LP), and SimRank (SR), for solving our link prediction problem.
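The CN index is the most basic similarity index, which implies that two nodes tend to be connected if they have more common neighbors. In this study, the CN index for the drafter cooperation layer is defined as
$$\mathrm{CN}(d_u, d_v) = |\Gamma(d_u) \cap \Gamma(d_v)|,$$
where $\Gamma(d_u)$ and $\Gamma(d_v)$ denote the sets of adjacent nodes of drafters $d_u$ and $d_v$, respectively. The LP metric further considers the contribution of high-order neighbors based on the CN metric, which is defined as
$$\mathrm{LP}(d_u, d_v) = \phi_{uv}^{2} + \alpha\,\phi_{uv}^{3} + \cdots + \alpha^{n-2}\,\phi_{uv}^{n},$$
where $\alpha \in [0, 1]$ is an adjustable parameter and $\phi_{uv}^{n}$ is the number of paths of length $n$ between $d_u$ and $d_v$. The LP metric becomes CN if $\alpha = 0$, since $\phi_{uv}^{2}$ equals the number of common neighbors. For simplicity, we take $n = 3$, so that $\mathrm{LP}(d_u, d_v) = \phi_{uv}^{2} + \alpha\,\phi_{uv}^{3}$.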
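The two local indices can be illustrated with a short Python sketch on a 0/1 adjacency matrix. It uses the usual matrix form of the Local Path index with paths up to length three, in which path counts are read from powers of the adjacency matrix (so length-3 counts are walk counts). The helper names, the toy graph, and the value of α are assumptions for illustration.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return sum(1 for x, y in zip(adj[u], adj[v]) if x and y)

def local_path(adj, u, v, alpha):
    """Local Path index with n = 3: (A^2)[u][v] + alpha * (A^3)[u][v]."""
    a2 = matmul(adj, adj)
    a3 = matmul(a2, adj)
    return a2[u][v] + alpha * a3[u][v]

# Hypothetical cooperation layer with edges 0-1, 0-2, 1-2, 1-3, 2-3.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]

cn_03 = common_neighbors(adj, 0, 3)
lp_03 = local_path(adj, 0, 3, alpha=0.5)
```

For a network of a few hundred drafters the cubic-time matrix products are inexpensive; for much larger networks a sparse matrix library would be the more natural choice.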
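A basic assumption underlying the SR metric is that two nodes are similar if their adjacent nodes are similar. Then, we can define the SR metric for two drafters $d_u$ and $d_v$ with the following iterative algorithm:
$$\mathrm{SR}(d_u, d_v) = \frac{\beta}{k_u k_v} \sum_{d_x \in \Gamma(d_u)} \sum_{d_y \in \Gamma(d_v)} \mathrm{SR}(d_x, d_y), \qquad \mathrm{SR}(d_u, d_u) = 1,$$
where $\beta \in (0, 1)$ is an adjustable parameter, and $k_u$ and $k_v$ are the degree centrality values of nodes $d_u$ and $d_v$, namely, their numbers of adjacent nodes.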
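The iterative character of SimRank is easiest to see in code. The sketch below is a minimal fixed-iteration version for an undirected 0/1 adjacency matrix; the decay value, the iteration count, the convention that a node without neighbors has zero similarity to every other node, and the toy graph are assumptions for illustration.

```python
def simrank(adj, beta, iterations=10):
    """Iterate the SimRank recursion on an undirected 0/1 adjacency matrix."""
    n = len(adj)
    neighbors = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    # Start from the identity: each node is fully similar to itself only.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        new = [[0.0] * n for _ in range(n)]
        for u in range(n):
            new[u][u] = 1.0
            for v in range(u + 1, n):
                if neighbors[u] and neighbors[v]:
                    total = sum(sim[x][y] for x in neighbors[u] for y in neighbors[v])
                    score = beta * total / (len(neighbors[u]) * len(neighbors[v]))
                    new[u][v] = new[v][u] = score
        sim = new
    return sim

# Hypothetical three-node chain 0 - 1 - 2.
chain = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
S = simrank(chain, beta=0.8)
```

In the chain, the two end nodes share their only neighbor, so their similarity equals the decay parameter, while the middle node has zero similarity to either end because their neighborhoods never overlap.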
In this study, a back-propagation (BP) neural network model is used for predicting cooperation in drafting standards due to its strong capability to map the nonlinear relationship between the input and output, which is expressed as
$$y_{uv} = f(\mathbf{x}_{uv}),$$
where $\mathbf{x}_{uv}$ is the vector of selected metric values for the drafter pair $(d_u, d_v)$ and $f$ is the mapping learned by the network. For each pair of drafters, the input data include the chosen combination of the previous metrics, while the output is their cooperation relationship, which has been characterized by the element $c_{uv}$ in the adjacency matrix $C$. That is, the output $y_{uv} = 1$ if $c_{uv} = 1$; otherwise, $y_{uv} = 0$. After the training procedure with the existing network data, we can obtain a predictive value $y_{uv} \in [0, 1]$ for each pair of drafters. As a result, we can rank the predictive values for those pairs of nodes with $c_{uv} = 0$. A high value of $y_{uv}$ denotes a high possibility of cooperation in the future. Meanwhile, the Area Under the Curve (AUC) index [65], which is the most commonly used index to measure the prediction performance in link prediction problems, is used to measure the prediction accuracy and to optimize the hyper-parameters of the BP neural network, such as the number of neurons in each layer. The larger the AUC value, the higher the prediction accuracy.
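The AUC used for evaluation has a simple pairwise interpretation: it is the probability that a randomly chosen cooperating pair receives a higher predicted score than a randomly chosen non-cooperating pair, with ties counted as one half. The following minimal Python sketch computes it exhaustively; the function name and the toy scores are hypothetical and do not come from the study's results.

```python
def auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0
    ties = 0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1
            elif p == q:
                ties += 1
    return (wins + 0.5 * ties) / (len(positive_scores) * len(negative_scores))

# Hypothetical network outputs for pairs that did and did not cooperate.
cooperating = [0.9, 0.6, 0.4]
non_cooperating = [0.5, 0.4, 0.1]

score = auc(cooperating, non_cooperating)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two groups. The exhaustive comparison is quadratic in the number of pairs; with thousands of negatives, a rank-based computation or random sampling of pairs is a common alternative.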

Data Collection and Resources
We first collected 118 standard documents on environmental health from the National Health Commission of the People's Republic of China (http://www.nhc.gov.cn/wjw/pgw/wsbz.shtml, accessed on 10 January 2021), covering standards issued up to 31 December 2020, as the total sample. The corresponding citation and cooperation relationships were then established from the "Normative Reference Documents" and "Introduction" sections of each standard's normative document, as confirmed on the website of the National Public Service Platform for Standards Information (http://std.samr.gov.cn/gb, accessed on 10 January 2021), to ensure the accuracy of the data. For the text association calculation, we did not use the entire original text of the standards, since too many irrelevant sentences and words would significantly increase the program's runtime and could lead to considerable deviations in the final results because irrelevant words cannot be removed completely. Therefore, a purely manual screening method was adopted, selecting sentences related to the topic based on an analysis of each standard's original text. To minimize errors caused by subjective factors, uniform rules were established for selecting the text of each standard. First, we chose the common structured content of the standards, namely the standard name, foreword, scope, and definitions of terms. Second, based on a thorough reading of the standard text, we selected texts related to the standard's theme from the primary and secondary headings. These five parts of content adequately reflect the theme and focus of a standard.
Finally, 1092 cooperation relationships and 5436 non-cooperative relationships in the cooperation network of the drafters up to 31 December 2010 were selected as the training set, and the 232 newly added cooperative relationships and 8613 non-cooperative relationships up to 2020 were selected as the validation set. The cooperation network in 2010 consists of 115 drafters and 1092 edges; the cooperation network in 2020 includes 176 drafters and 1324 edges. Unconnected pairs of drafters are treated as non-cooperative relationships.

AUC Validation under Metrics Integration
In the training process, the three metrics CN, LP, and SR are selected as the classical link prediction indexes. Other classical link prediction metrics proposed in the existing literature are excluded because of their poor accuracy in pre-tests. Since the prediction performance of LP and SR is affected by the parameters $\alpha$ and $\beta$, we tune these two parameters in steps of 0.1 and set $\alpha = 0.2$ and $\beta = 0.6$, which give the best link prediction performance. In this way, we obtain the AUC value of each prediction indicator, as listed in Table 2.
From Table 2, we see that all four predictors proposed in this paper perform better than the three classical predictors, with the two text-analysis-based indicators, TAI and MAI, achieving the best prediction performance. In contrast, the two metrics based on the citation relationship between standards, TCI and MCI, perform only marginally better, at roughly the same level as the three traditional metrics. This means that the citation-based metrics can also be used in cooperation prediction. The results validate the effectiveness of the four predictors proposed in this study. To further improve the prediction performance, we use the BP network to integrate the seven metrics in the training and prediction process by combining different metrics as feature inputs. We randomly divide the cooperative and non-cooperative relationships before July 20 in 2020 into a training set and a validation set with a ratio of 7:3. To reduce the effect of chance, all prediction accuracy metrics are averaged over 100 independent tests.
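The evaluation protocol described above (a random 7:3 split repeated over 100 independent tests, with the accuracy averaged) can be sketched as follows. The `evaluate` callback stands in for training the network on one split and returning its AUC; it, the function names, and the fixed seed are assumptions introduced for this illustration.

```python
import random
from statistics import mean


def split_7_3(samples, rng, train_frac=0.7):
    """Shuffle a copy of the samples and cut it into training/validation parts."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def averaged_score(samples, evaluate, runs=100, seed=0):
    """Average evaluate(train, valid) over several independent random splits."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    scores = []
    for _ in range(runs):
        train, valid = split_7_3(samples, rng)
        scores.append(evaluate(train, valid))
    return mean(scores)
```

In practice `samples` would hold one labeled feature vector per drafter pair, and `evaluate` would fit the model on `train` and compute the AUC on `valid`.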
The feature selection proceeds as follows. First, we select the three classical prediction indicators CN, LP, and SR as the feature set for learning the cooperative relationships in the training set. All input data are normalized to fall into (0,1). In addition, to achieve high prediction performance, we select the optimal hyper-parameters of the neural network through comparative tests. Because of the limited number of input features and the limited data size, we select a neural network composed of three layers and 6 neurons. The average AUC value with these three metrics is 0.69, with a maximal value of 0.74, a median value of 0.7, and a variance of 0.07. This means that the neural network performs stably in cooperation prediction. In the second step, we add one of the metrics TAI, MAI, TCI, and MCI at a time to the set of input variables, which leads to average prediction accuracies of 0.85, 0.83, 0.80, and 0.77, respectively. The prediction accuracy with different numbers of neurons in the hidden layer is shown in Figure 4. This means that including our proposed metrics significantly improves the prediction accuracy, with the text-analysis-based metrics performing best. Meanwhile, the integration of multiple metrics in a neural network model performs better than a single metric.
To integrate more metrics, we design 10 further cases, as shown in Table 3, in which 1 indicates that the corresponding metric is selected and 0 denotes that it is not selected. In addition to the three classical metrics, two additional metrics are included in cases 1-6, three additional metrics are included in cases 7-9, and all the metrics are included in case 10. The prediction performance for these 10 cases is shown in Figure 4a-d.

Cooperation Prediction with Neural Network
Through feature selection and hyper-parameter optimization, we establish an optimized neural network model using the configuration of Case 2 in Table 3. We then measure the prediction accuracy over 100 tests using the cooperation data in 2020, a total of 15,400 drafter pairs. The updated neural network model outputs a cooperation probability in [0, 1] for each pair of drafters. The higher the predictive value, the greater the probability of cooperation between the drafters. The results show strong prediction performance: the maximal prediction accuracy is 0.92, the minimum is 0.71, and the average is 0.85. Table 4 shows the top ten drafter pairs that do not have a cooperative relationship in 2020 but are recommended to cooperate in the future. This result provides strong support for selecting drafters in standard-setting activities with the aim of improving the quality of standard systems.
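The recommendation step, ranking the pairs without an existing cooperative relationship by their predictive values and keeping the top ten, can be sketched as below. The data structures (a dictionary of scores keyed by drafter pair and a list of existing edges), the function name, and the sample values are assumptions for illustration.

```python
def top_candidate_pairs(scores, existing_edges, k=10):
    """Return the k highest-scoring pairs that are not already connected.

    scores: {(u, v): predictive value in [0, 1]}
    existing_edges: iterable of (u, v) pairs, in either orientation
    """
    existing = {frozenset(edge) for edge in existing_edges}
    candidates = [
        (pair, value)
        for pair, value in scores.items()
        if frozenset(pair) not in existing
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:k]


# Hypothetical predictive values for four drafter pairs; A and B already cooperate.
sample_scores = {("A", "B"): 0.95, ("A", "C"): 0.80, ("B", "C"): 0.60, ("C", "D"): 0.90}
sample_top = top_candidate_pairs(sample_scores, [("B", "A")], k=2)
```

The pair `("A", "B")` is excluded even though it has the highest score, because the aim is to recommend new cooperation rather than to confirm existing links.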
As can be seen from Table 4, there are 16 government departments and 4 research institutes among the 20 drafting units in the top 10 pairs. This suggests that the drafting of environmental health standards will continue to be dominated by the relevant government departments and assisted by scientific research institutes. We also note that the Institute of Environmental and Health-related Product Safety of the Chinese Center for Disease Control and Prevention is one of the top three drafters playing a leading role in the construction of the environmental health standard system.
Additionally, upon reviewing the standards drafted by the units listed in Table 4, we find that among these 20 drafting units, 15 have drafted at least one standard related to public health. To further verify this finding, a word frequency analysis is conducted on all the themes of the standards drafted by the aforementioned units, resulting in the word cloud shown in Figure 5. Figure 6 shows the specific frequencies of the keywords.
In the following, we demonstrate the significance of the predicted relationships at the level of the network's topological structure. Figure 7 represents the current cooperation network of drafting units, in which the nodes of the five different components are marked with different colors. The independence of the components indicates the absence of cooperative relationships between their drafting units, which may have two causes. First, the thematic preferences of the drafting units may differ significantly. Second, the drafting units within each component lack a mechanism for selecting appropriate collaborative partners and instead resort to the principle of proximity and clustering. The red dashed lines in the figure represent the top 10 predicted cooperative relationships from Table 4. All of these relationships involve only component 1 and component 2, extending from the centrally positioned nodes of each component to other units, while the other three components remain isolated. Further analysis shows that the thematic preferences of the drafting units in these three components are overly narrow, focusing specifically on lead-acid batteries, iron foundries, and crematories, which deviates significantly from the themes of component 1 and component 2. Moreover, among the ten predicted relationships, five occur between component 1 and component 2.
If these relationships are established in the future, the size of the largest connected subgraph of the collaboration network will increase from 132 to 167 nodes, significantly enhancing the collaborative synergy among environmental health drafting units. This will greatly optimize the cooperative network structure of drafters and improve the efficiency of knowledge flow in the network. The above demonstrates the changes in network topology resulting from the predicted cooperative relationships. The results reflect the advantages of the cooperation prediction model proposed in this paper: our prediction method effectively integrates the network topology attributes and the thematic preference attributes of the node units. It not only weakens the status of extreme edge nodes in future cooperation trends but also overcomes the constraints of the topological structure attributes, allowing for scientific predictions based on the thematic preferences of drafting units.
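The effect of predicted links on the largest connected subgraph can be checked with a simple breadth-first search. The sketch below uses a hypothetical seven-node network, not the actual drafter network; the function name and the data are assumptions for illustration.

```python
from collections import deque


def largest_component_size(nodes, edges):
    """Size (node count) of the largest connected component of an undirected graph."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


# Hypothetical network with three components and one predicted bridging link.
toy_nodes = [1, 2, 3, 4, 5, 6, 7]
current_edges = [(1, 2), (2, 3), (4, 5), (6, 7)]
predicted_edges = [(3, 4)]
size_before = largest_component_size(toy_nodes, current_edges)
size_after = largest_component_size(toy_nodes, current_edges + predicted_edges)
```

In this toy case a single predicted link between two components enlarges the largest component from 3 to 5 nodes, which mirrors the kind of growth reported above for the drafter network.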

Discussion and Practical Implications
This study proposes an integrated method for predicting collaborations among drafters in the context of developing standards using a multi-layer network. Cooperation prediction is a link prediction problem, which aims to estimate the likelihood of a link between two nodes based on structural or functional information. In the existing literature, heuristic methods, embedding methods, and machine learning techniques such as graph neural networks are prevalent [25,66,67]. Although graph neural networks have garnered significant attention, they overlook pairwise structural information. Recently, combining structural features (SFs) with neural networks (NNs) has enhanced prediction accuracy. Two integration methods have been employed: SF-then-NN and SF-and-NN [68]. In the SF-then-NN approach, the SFs serve as input for the NN, whereas in the SF-and-NN approach the SFs and the NN operate independently, potentially limiting the expressive capability. Consequently, we have chosen the former as the integration method to enhance prediction accuracy.
In this study, we have devised four new metrics based on a multi-layer network model. These metrics leverage citation relationships and text association strengths tailored to complex standard systems. Within such a system, collaboration between two drafters should account for the correlation between their professional expertise and knowledge, with standards serving as the best carrier of such information. Our approach uses text similarity analysis to quantify the resemblance between each pair of standards. To our knowledge, this is the first study to quantify standard relationships using text mining methods for cooperation prediction. The existing literature predominantly models standard systems as unweighted networks based on citation relationships, which may fail to filter out citation noise stemming from various intentional behaviors. For comparison, we also take into account three classical structural metrics: CN, LP, and SR [43,44]. Through a case study, we show that our text-analysis-based metrics yield noticeably improved prediction results in terms of AUC values. Furthermore, upon integrating our metrics with traditional prediction metrics, we demonstrate a significant enhancement in prediction accuracy.
The standard development process is open to various individuals or organizations who might be affected by a standard or have relevant technical knowledge. With a wide range of potential participants, organizations, and forms of cooperation [69], bringing together appropriate drafters is increasingly becoming a critical challenge. In terms of practical contribution, this study puts forward an effective method for predicting cooperation in complex standard networks, which has the potential to change the way standards are developed, particularly in the selection of cooperative drafters. The method is versatile and applicable across various industries or application areas. Overall, our study aims to enhance the structure and functionality of existing standard systems, whose components are relatively independent because of loose or hesitant cooperation. The potential benefits of our method include, but are not limited to, the following aspects. It can optimize the structure and quality of standard systems by enhancing compatibility and systematicity through the identification of valuable and more suitable potential collaborations. We maintain that cooperation among strong and influential drafters, such as companies or research institutions, has the capacity to generate high-quality standards. Furthermore, our method can aid in the selection of drafters for the development or update of standards, as the recommendations presented in Table 4 are valuable for establishing new collaborations.

Conclusions
Cooperation in standard drafting is essential to improving the quality of standard systems, much as academic cooperation is to research. This paper proposes an integrated text-based learning method for cooperation prediction in a complex standard system, which is modeled as a multi-layer network composed of heterogeneous nodes and edges. To improve the prediction performance, we first propose four predictive metrics based on text analysis and citation relationships, namely, TAI, MAI, TCI, and MCI. We then integrate these four metrics with the classical topological metrics CN, LP, and SR into a neural network model for cooperation prediction through learning. Our method is verified with publicly available data on standards and drafting units in the field of Chinese environmental health standards. Specifically, we compare the AUC values of the three classical link prediction metrics with those of our proposed metrics. Finally, we compare the prediction performance of the neural network model under different combinations of prediction metrics. The prediction results show that government departments and research institutes are recommended to cooperate and to play leading roles in the formulation of environmental health standards. Through topological visualization and the analysis of thematic preferences, we observe that our method can effectively enhance the synergy of a standard system and bridge different groups of drafting units. In this sense, our results can provide strong support for selecting drafting units, which is critical to improving the quality of standard systems. Meanwhile, our approach is versatile and not limited to specific fields or industries, making it suitable for implementation in various standard systems.
It should be noted that this study also has some limitations. First, we have selected only three classical link prediction indicators for comparison, and integrating more metrics may produce better performance. Second, the prediction accuracy may be limited by our neural network model. In the future, other machine learning models can be used, such as the Graph Attention Network Model (GAT), the Generative Adversarial Network Model (GAN), and the ML-Link Model [25,70,71]. Finally, we have verified the effectiveness of our method using only Chinese environmental health standards. Cooperation prediction for standard systems in other fields, such as the ICT and automotive industries, remains to be explored.

Figure 1. Research framework for cooperation prediction in standards drafting.

Figure 2. A schematic diagram of the three-layer standard network.

Figure 4. Prediction accuracy of cooperation prediction with metrics integration.

Table 1. Main symbols and explanations.
Symbol | Explanation
$\psi(s_i, s_j)$ | Similarity degree between standards $s_i$ and $s_j$, $\psi(s_i, s_j) \in (0, 1)$
$\delta(s_i, s_j)$ | Citation index between standards $s_i$ and $s_j$, $\delta(s_i, s_j) \in \{0, 1\}$

Table 2. AUC values of each prediction indicator.

Table 3. Cases of the metrics integration.