A MOOC Course Data Analysis Based on an Improved Metapath2vec Algorithm

Abstract: Many real-world scenarios can be naturally modeled as heterogeneous graphs, which contain both symmetric and asymmetric information. How to learn useful knowledge from such graphs has become a research hot spot in artificial intelligence. Building on the Metapath2vec algorithm, an improved Metapath2vec algorithm is presented, which combines meta-path random walk, used to capture semantic and structural information between different nodes of a heterogeneous network


Introduction
In the real world, many scenarios, such as transportation networks, social networks, and communication networks, can be effectively stored and represented using a graph structure. In the graph, nodes represent the objects in the scenario and edges represent the relationships between objects [1]. Therefore, analyzing the structure and properties of graphs helps us understand such real-world complex systems. With the increasing complexity of real-world data, graph structures have become massive, high-dimensional, sparse, heterogeneous, complex, and dynamic [2]. Both symmetric and asymmetric information exist in a graph. For example, in a social network, the marriage relationship is symmetric while the filiation relationship is asymmetric [3]. Therefore, how to learn useful knowledge from graphs has become a research hot spot in artificial intelligence.
Graph embedding is an important technical tool for graph knowledge discovery. It aims to map graph data into a low-dimensional vector that can correctly represent some important information of the original graph [4]. Moreover, graph embedding enables practical application problems such as node/graph classification [5], node clustering [6], connection prediction [7], and recommendation systems [8]. Initially, much of the research on graph embedding focuses on the homogeneous networks, which consist of only one node type and edge type [9]. However, the relationships in the real-world are naturally modeled as heterogeneous networks, which contain richer and more complex semantics and structures. As more than one node type and edge type are provided, the graph embedding techniques for homogeneous networks cannot be directly applied to heterogeneous networks [10]. Therefore, how to learn embedding of heterogeneous networks has become one of the hot topics of research in artificial intelligence [11].
Initially, matrix decomposition methods [12], e.g., singular value decomposition (SVD) [13] and non-negative matrix factorization (NMF) [14], were used to generate latent features in heterogeneous networks. The main idea is to represent the association information of a graph as a matrix and then decompose this association matrix to obtain a low-dimensional vector for each node. However, decomposing large-scale matrices is computationally expensive, and there are also statistical performance drawbacks [15]. Moreover, matrix decomposition cannot easily incorporate features of different node types and contexts, which limits the amount of effective information it can capture. Subsequently, random walk-based approaches were proposed to learn node embeddings. These methods evolved from the Word2vec model in natural language processing [16]: different walk strategies are used to obtain node sequences from the graph, and the Skip-Gram word embedding model is then used to learn the node embeddings. For example, DeepWalk [17] repeatedly samples a random neighbor of the current node to form fixed-length random walk sequences, while the Node2vec model [18] obtains random walk sequences by adjusting parameters that interpolate between Breadth-First Search (BFS) and Depth-First Search (DFS) strategies. However, neither strategy distinguishes node types when choosing the next node, so when applied to a heterogeneous graph with various node types, they easily lead to statistical bias.
Therefore, Metapath2vec [19] was proposed. It introduces meta-paths to guide the random walk and generate sequences of heterogeneous nodes, so that the semantic and structural information between different types of nodes is captured and semantic transitions remain valid; the nodes in each sequence are then treated as words, and the Skip-Gram model is adopted to learn the embeddings of the heterogeneous graph. In essence, Metapath2vec transforms heterogeneous graph embedding into a word embedding problem and can capture the structural and semantic relevance of different types of nodes and relations. The word embedding model is therefore the key to achieving good embedding results. The Skip-Gram model [20] is a local context window approach and a mainstream model for word vector learning, whereas global matrix decomposition models [21] (e.g., LSA) learn word vectors from corpus-wide co-occurrence statistics. The former may be better at analogy tasks, but it is trained on separate local context windows of limited length and does not make good use of the global statistics of the data. The latter makes effective use of statistical information but fails to capture semantic information, and so performs poorly in tasks such as word similarity [22]. The Global Vectors for Word Representation (GloVe) model [23] combines the global statistical information of matrix decomposition with local co-occurrence windows, and thus has the advantages of both. Its core idea is to train on the number of co-occurrences between words. It is efficient to train, and its word vectors can carry more semantic information. Zheng Yanan et al. [24] used GloVe to extract text features, compared them with Word2vec word vectors, and then used an SVM to classify the text.
Their experiments show that GloVe features give better text classification results. Shishir Kulkarni et al. [25] used second-order random walks to create a corpus and then generated node embeddings of the graph by adapting the GloVe algorithm. Their experiments show that GloVe is also effective for extracting node features from graphs.
In this paper, an algorithm named Metapath-GloVe is proposed to improve the efficiency of the Metapath2vec algorithm. The main idea of this method is to train the heterogeneous node sequences, which are obtained by random walk based on Metapath2vec algorithm, and then use the GloVe model to complete the learning of node embedding in heterogeneous graphs [26]. Following that, the proposed algorithm is applied to analyze the MOOC course data, which are collected from the MOOC online learning platform [27]. The dataset is modeled into a heterogeneous graph by extracting the users, videos, course, and the asymmetric relationship among them. Based on the constructed heterogeneous graph, the proposed algorithm is used to transform the structured information of the heterogeneous graph into vector information. Subsequently, based on the learned user and video embedding vectors, a joint Spectral Co-clustering algorithm is used to perform association analysis of users and videos. Then, a classification-based link prediction model is constructed by obtaining the embedding vectors of user-video links to make video recommendations to users.
The main contributions of this paper can be summarized as follows: (1) A heterogeneous graph embedding learning algorithm is proposed by training GloVe word embedding model on meta-path random walks, which can improve the efficiency of heterogeneous node embedding learning. (2) The proposed algorithm is applied on the MOOC course data to learn embeddings of users and videos, which lays the foundation for the realization of subsequent analysis tasks. (3) The proposed method has better results than traditional methods for user and video association analysis and video recommendation experiments on MOOC course data.
The rest of the paper is organized as follows: Section 2 introduces an improved Metapath2vec algorithm proposed in this paper in the context of the MOOC data used in this paper; Section 3 presents the application of the algorithm proposed in this paper to MOOC data analysis; Section 4 gives the experimental results of this paper and its analysis; Section 5 makes a summary of the whole paper.

Learning MOOC Course Data Node Embedding Based on Metapath-GloVe
In this section, the proposed Metapath-GloVe algorithm is applied in the context of MOOC data. First, the relationships between users, videos, and courses in the MOOC data are extracted, and a heterogeneous graph of the MOOC data is constructed. Second, random walk sequences containing nodes of the types user and video are obtained from the heterogeneous graph by meta-path random walk. Then, in view of the advantages of the GloVe model, it is used to learn the node embeddings of users and videos from the meta-path sequences. Finally, an overall framework of the proposed algorithm for learning node embeddings in heterogeneous graphs is given.

The Construction of Heterogeneous Graph from MOOC Course Data
In general, a graph is structured data consisting of a collection of vertices (nodes) connected by edges. A graph is usually represented by G = (V, E), where V and E are the set of all nodes and the set of all edges, respectively. The types of nodes and edges are described by a node type mapping function ϕ : V → Tv, such that ∀v ∈ V, ϕ(v) ∈ Tv, where Tv is the set of node types, and an edge type mapping function ψ : E → Te, such that ∀e ∈ E, ψ(e) ∈ Te, where Te is the set of edge types. Graphs can be classified as homogeneous or heterogeneous according to the types of nodes and edges. Homogeneous graphs have only one node type and one edge type, i.e., |Tv| + |Te| = 2, and usually have a simple network structure. Heterogeneous graphs have more than one node type or edge type, i.e., |Tv| + |Te| > 2, and contain richer semantic information and a more complex network structure.
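The criterion |Tv| + |Te| = 2 can be checked mechanically. The following is a minimal sketch, assuming the type mappings ϕ and ψ are stored as plain dictionaries on a non-empty graph; the function name and the toy node and edge identifiers are hypothetical and serve only as illustration.

```python
def classify_graph(node_types, edge_types):
    """Return 'homogeneous' or 'heterogeneous' from the two type mappings.

    node_types: dict mapping each node v to its type phi(v)
    edge_types: dict mapping each edge e to its type psi(e)
    """
    t_v = set(node_types.values())   # T_v, the set of node types
    t_e = set(edge_types.values())   # T_e, the set of edge types
    return "homogeneous" if len(t_v) + len(t_e) == 2 else "heterogeneous"


# Hypothetical toy graphs (identifiers are illustrative only).
friends_nodes = {"alice": "user", "bob": "user"}
friends_edges = {("alice", "bob"): "friend"}

mooc_nodes = {"u1": "user", "v1": "video", "c1": "course"}
mooc_edges = {("u1", "v1"): "watches", ("v1", "c1"): "belongs_to"}

print(classify_graph(friends_nodes, friends_edges))
print(classify_graph(mooc_nodes, mooc_edges))
```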
In this paper, users, videos, courses, and their relationships are extracted from MOOC course data, and a heterogeneous graph is built as shown in Figure 1. The heterogeneous graph contains three types of nodes: users, videos, and courses, and different types of relationships exist between different nodes. Specifically, if a user watches a video, we consider that there is a link between the user and the video, and if the video belongs to a course, we consider that there is a link between the video and the course. In the following experiments, we take this heterogeneous graph as the object of study and mainly analyze the relationship between users and videos, while courses are treated as labels to evaluate the proposed algorithms. Thus, we use B(U, V, E) to denote this graph, where U and V stand for users and videos, respectively, and E is the set of relationships between users and videos.
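As an illustration of this construction, the sketch below builds B(U, V, E) as two adjacency maps and keeps the video-course relation only as labels. It assumes the raw logs have already been reduced to (user, video) and (video, course) pairs; the record format and all identifiers are hypothetical.

```python
from collections import defaultdict


def build_user_video_graph(watch_records):
    """Build B(U, V, E) as two adjacency maps from (user, video) pairs."""
    user_to_videos = defaultdict(set)
    video_to_users = defaultdict(set)
    for user, video in watch_records:
        user_to_videos[user].add(video)
        video_to_users[video].add(user)
    return dict(user_to_videos), dict(video_to_users)


def video_labels(video_course_records):
    """Map each video to its course; courses serve only as evaluation labels."""
    return {video: course for video, course in video_course_records}


# Hypothetical records; real MOOC logs would be parsed into this shape first.
watches = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u3", "v3")]
belongs = [("v1", "c1"), ("v2", "c1"), ("v3", "c2")]

user_to_videos, video_to_users = build_user_video_graph(watches)
labels = video_labels(belongs)
```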

The Acquisition of Meta-Path Sequences Based on Heterogeneous Random Walk
When a heterogeneous graph is obtained, a heterogeneous random walk is adopted to acquire node sequences containing multiple node types. First, a pre-set meta-path scheme ρ is introduced in the heterogeneous graph to guide the random walk, so that the generated node sequences capture the semantic and structural correlations between different types of nodes. A meta-path scheme ρ is defined in Equation (1).
$$\rho : V_1 \xrightarrow{r_1} V_2 \xrightarrow{r_2} \cdots \xrightarrow{r_{l-1}} V_l \quad (1)$$
where R = r_1 • r_2 • · · · • r_{l−1} denotes the composite relation between node types V_1 and V_l. In this way, a meta-path scheme describes a higher-order relation across nodes of different types, which can capture richer and more complex semantic relations than first-order relations between nodes of the same type. When a meta-path scheme ρ is given, the i-th transition probability of the meta-path random walk is defined in Equation (2).
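For reference, the transition probability in the original Metapath2vec formulation [19] has the form sketched below, which Equation (2) is assumed to follow. Here v_t^i denotes the node of type V_t visited at step i, and N_{t+1}(v_t^i) denotes its neighbors of type V_{t+1}; these symbols are taken from [19] and are an assumption with respect to the present paper's notation.

```latex
p\left(v^{i+1} \mid v^{i}_{t}, \rho\right) =
\begin{cases}
\dfrac{1}{\left|N_{t+1}\left(v^{i}_{t}\right)\right|}, & \left(v^{i+1}, v^{i}_{t}\right) \in E,\ \phi\left(v^{i+1}\right) = t+1 \\[6pt]
0, & \left(v^{i+1}, v^{i}_{t}\right) \in E,\ \phi\left(v^{i+1}\right) \neq t+1 \\[6pt]
0, & \left(v^{i+1}, v^{i}_{t}\right) \notin E
\end{cases}
```

In words, the walker moves uniformly at random to a neighbor whose type is the next type prescribed by the meta-path and never moves to any other node.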
Clearly, the random walk is biased by the predefined meta-path scheme ρ. Thus, a meta-path-based random walk captures the semantic and structural information among different types of nodes, keeps semantic transitions valid, and effectively avoids the statistical bias caused by a high proportion of one node type, so that the heterogeneous graph can be integrated into the subsequent model to complete the node embedding. In our MOOC course heterogeneous graph B(U, V, E), two meta-paths are constructed to form a meta-path scheme, as shown in Equation (3).
In detail, R_1 describes the relationship between users who have clicked on the same video (user-video-user), and R_2 describes the relationship among videos that have been clicked by the same user (video-user-video). Biased random walks are guided by this meta-path scheme to obtain node sequences with multiple node types.
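A meta-path guided walk on the user-video graph can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes the graph is stored as two adjacency maps with disjoint user and video identifiers, and the function names, walk length, and number of walks per node are hypothetical choices.

```python
import random


def metapath_walk(start, user_to_videos, video_to_users, walk_length, rng):
    """One walk that alternates user -> video -> user -> ... (or the reverse).

    Because the graph is bipartite, every neighbour of a user is a video and
    vice versa, so a uniform choice among neighbours already respects the
    user-video-user and video-user-video meta-paths.
    """
    walk = [start]
    on_user = start in user_to_videos
    while len(walk) < walk_length:
        adjacency = user_to_videos if on_user else video_to_users
        neighbours = sorted(adjacency.get(walk[-1], ()))
        if not neighbours:          # dead end: stop this walk early
            break
        walk.append(rng.choice(neighbours))
        on_user = not on_user
    return walk


def generate_corpus(user_to_videos, video_to_users,
                    walks_per_node=2, walk_length=5, seed=0):
    """Start several walks from every user and every video node."""
    rng = random.Random(seed)
    corpus = []
    for start in list(user_to_videos) + list(video_to_users):
        for _ in range(walks_per_node):
            corpus.append(metapath_walk(start, user_to_videos,
                                        video_to_users, walk_length, rng))
    return corpus


# Hypothetical toy graph.
user_to_videos = {"u1": {"v1", "v2"}, "u2": {"v2"}, "u3": {"v2", "v3"}}
video_to_users = {"v1": {"u1"}, "v2": {"u1", "u2", "u3"}, "v3": {"u3"}}

corpus = generate_corpus(user_to_videos, video_to_users)
```

Each resulting sequence alternates between user and video nodes, so it can be passed on as a sentence of a corpus.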

Learning Node Embedding Based on GloVe Model
Once the node sequences are obtained, the sequences are treated as a corpus and the nodes as words, and the GloVe model [23] is adopted to learn the node embeddings. The GloVe model uses both the global statistics and the local context features of the corpus. The procedure is as follows.
Step 1: The node sequences generated by the random walk are treated as the corpus and used as the input of the model.
Step 2: A global co-occurrence matrix is generated. A context window is slid from the beginning to the end of the corpus, and the number of times two nodes occur together in the window is counted. The co-occurrence matrix is denoted X, where the element X_{i,j} is the number of times the words i and j appear together in a window.
Step 3: An approximate relationship between the word vectors and the co-occurrence matrix is constructed, as shown in Equation (4).
$$u_i^{\top} v_j + b_i + b_j = \log X_{i,j} \quad (4)$$
where u_i and v_j are the word vectors of the words i and j, and b_i and b_j are the corresponding bias terms. The equation states that the inner product of the word vectors, plus the biases, approximates the logarithm of the corresponding co-occurrence count.
Step 4: The loss function L is constructed from this approximate relationship as a weighted squared error, as shown in Equation (5).
$$L = \sum_{i,j=1}^{N} f(X_{i,j}) \left( u_i^{\top} v_j + b_i + b_j - \log X_{i,j} \right)^2 \quad (5)$$
where N is the dimension of the co-occurrence matrix and f(x) is the weight function given in Equation (6).
$$f(x) = \begin{cases} (x / x_{\max})^{\alpha}, & x < x_{\max} \\ 1, & \text{otherwise} \end{cases} \quad (6)$$
f(x) is non-decreasing, so the weights do not decrease as the number of co-occurrences increases, and it is capped at 1 so that very frequent pairs are not over-weighted. In this paper, x_max = 100 and α = 0.75.
In this paper, AdaGrad [28] is adopted to train the model. First, all non-zero entries (i, j) of the co-occurrence matrix X are randomly sampled as training data, and the corresponding word vectors u_i, v_j and biases b_i, b_j are randomly initialized, so that the initial loss value can be calculated. Next, the gradient of the loss is computed and, with a given learning rate, u_i, v_j, b_i, and b_j are updated. This process continues until the given stopping condition is reached. Finally, the output vectors are the embedding representations of the nodes. The obtained embeddings express the semantic information of the whole heterogeneous network and the relationships between nodes, and can be used for node classification, clustering, or similarity search.
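The four steps and the AdaGrad training loop can be sketched in plain Python as follows. This is an illustrative toy rather than the implementation used in the experiments: the window size, embedding dimension, learning rate, number of epochs, the initial value 1.0 of the AdaGrad accumulators, and the final choice of summing the two vector sets are all assumptions, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def cooccurrence(corpus, window=2):
    """Count how often two nodes appear within `window` steps of each other."""
    counts = defaultdict(float)
    for walk in corpus:
        for i, centre in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    counts[(centre, walk[j])] += 1.0
    return counts


def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): grows as (x / x_max)^alpha and is capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def train_glove(counts, dim=8, epochs=50, lr=0.05, seed=0):
    """Fit u_i . v_j + b_i + b_j ~ log X_ij with per-parameter AdaGrad steps."""
    rng = random.Random(seed)
    nodes = sorted({n for pair in counts for n in pair})
    u = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    v = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    bu = {n: rng.uniform(-0.5, 0.5) for n in nodes}
    bv = {n: rng.uniform(-0.5, 0.5) for n in nodes}
    # AdaGrad accumulators of squared gradients (assumed to start at 1.0).
    gu = {n: [1.0] * dim for n in nodes}
    gv = {n: [1.0] * dim for n in nodes}
    gbu = {n: 1.0 for n in nodes}
    gbv = {n: 1.0 for n in nodes}
    pairs = list(counts.items())          # only non-zero entries of X
    history = []
    for _ in range(epochs):
        rng.shuffle(pairs)
        total = 0.0
        for (i, j), x in pairs:
            dot = sum(a * b for a, b in zip(u[i], v[j]))
            diff = dot + bu[i] + bv[j] - math.log(x)
            fx = weight(x)
            total += fx * diff * diff      # this pair's term of the loss
            g = 2.0 * fx * diff            # shared factor of all gradients
            for k in range(dim):
                grad_u, grad_v = g * v[j][k], g * u[i][k]
                gu[i][k] += grad_u ** 2
                gv[j][k] += grad_v ** 2
                u[i][k] -= lr * grad_u / math.sqrt(gu[i][k])
                v[j][k] -= lr * grad_v / math.sqrt(gv[j][k])
            gbu[i] += g * g
            gbv[j] += g * g
            bu[i] -= lr * g / math.sqrt(gbu[i])
            bv[j] -= lr * g / math.sqrt(gbv[j])
        history.append(total)
    # Summing the two vector sets is one common choice (an assumption here).
    embeddings = {n: [a + b for a, b in zip(u[n], v[n])] for n in nodes}
    return embeddings, history


# Hypothetical toy corpus of meta-path walks.
corpus = [["u1", "v1", "u2", "v2", "u1"],
          ["u3", "v3", "u3", "v2", "u2"],
          ["v1", "u1", "v2", "u3", "v3"]]
embeddings, history = train_glove(cooccurrence(corpus), dim=4)
```

The list `history` holds the loss accumulated in each epoch, which gives a quick check that training is making progress.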

An Overall Process of Metapath-GloVe Algorithm
In this paper, the proposed Metapath-GloVe algorithm is used to analyze the MOOC course data, which are collected from the MOOC platform. The overall process of learning the node embeddings of MOOC course data based on Metapath-GloVe is shown in Figure 2. First, users, videos, courses, and the relationships among them are extracted from the MOOC course data to construct the heterogeneous graph shown in Figure 1. In the graph, we focus on the relationship between users and videos, while courses are treated as labels for evaluation. Then, meta-paths are pre-set to guide the random walk and obtain heterogeneous node sequences. The obtained node sequences are input to the GloVe model as a text corpus, and the co-occurrence matrix is obtained by counting. Word vector training is then performed on this co-occurrence matrix, and the final embedding matrix is obtained by optimizing the loss function. Based on the learned node embeddings, tasks such as association partitioning and user-video link prediction can be carried out.

Data Analysis of MOOC Course Data Based on Metapath-GloVe
After learning the node embeddings based on Metapath-GloVe, two methods are proposed to analyze the MOOC course data: the node embedding-based association partitioning method and the node embedding-based user-video link prediction method. The details are shown in the following.

User/Video Association Analysis
Based on the learned node embeddings, a clustering algorithm can be adopted to achieve association analysis. The specific experimental process is shown in Figure 3.
Figure 4 demonstrates the effect of clustering analysis on MOOC course data. In detail, Figure 4a shows the adjacency matrix of the graph, Figure 4b shows the learned node embedding feature matrix, and Figure 4c shows the result of clustering analysis. As the figures show, Figure 4a,b exhibit no regular pattern, while in Figure 4c the association between users and videos can be extracted easily after clustering analysis. In this paper, Spectral Co-clustering [29] is applied to the user and video embeddings to perform association partitioning. The procedure is as follows.
Step 1: An input matrix A is constructed with users as rows and videos as columns. Each element is the cosine similarity of the corresponding user and video embeddings.
Step 2: The input matrix A is preprocessed as shown in Equation (7).
$$A_n = R^{-1/2} A C^{-1/2} \quad (7)$$
where R is a diagonal matrix whose i-th diagonal element equals ∑_j A_ij, and C is a diagonal matrix whose j-th diagonal element equals ∑_i A_ij.
Step 3: Singular value decomposition is performed on the preprocessed matrix A_n, as shown in Equation (8).
$$A_n = U \Sigma V^{\top} \quad (8)$$
It yields a partition of the rows and columns of A: a subset of the left singular vectors gives the row partition, and a subset of the right singular vectors gives the column partition.
Step 4: Calculate the matrix Z, which provides the required partitioning information, by Equation (9).
$$Z = \begin{bmatrix} R^{-1/2} U \\ C^{-1/2} V \end{bmatrix} \quad (9)$$
Here, the columns of U are u_2, u_3, ..., u_{λ+1}, and the columns of V are defined analogously.
Step 5: The clustering result is obtained by running k-means on the rows of Z.
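The five steps can be illustrated for the simplest case of two co-clusters, where a single pair of singular vectors (u_2, v_2) is enough. The sketch below is not the implementation used in the experiments: it assumes all cosine similarities are positive so that the row and column sums are valid scaling factors, it obtains the second singular pair by power iteration instead of a full SVD, and it replaces general k-means by a two-cluster version on scalars. All function names and the toy embeddings are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two vectors (Step 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def two_means_1d(values, iters=20):
    """Plain k-means with k = 2 on scalars, seeded at the min and max."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        group0 = [x for x, lab in zip(values, labels) if lab == 0]
        group1 = [x for x, lab in zip(values, labels) if lab == 1]
        if group0:
            lo = sum(group0) / len(group0)
        if group1:
            hi = sum(group1) / len(group1)
    return labels


def unit_orthogonal(vec, basis):
    """Remove the component of vec along the unit vector basis, then normalise."""
    dot = sum(a * b for a, b in zip(vec, basis))
    vec = [a - dot * b for a, b in zip(vec, basis)]
    norm = math.sqrt(sum(a * a for a in vec)) or 1.0
    return [a / norm for a in vec]


def cocluster_two_way(A, iters=200, seed=0):
    """Split the rows and columns of a positive matrix A into two co-clusters."""
    n_rows, n_cols = len(A), len(A[0])
    r = [sum(row) for row in A]                                        # diagonal of R
    c = [sum(A[i][j] for i in range(n_rows)) for j in range(n_cols)]   # diagonal of C
    # Step 2: A_n = R^{-1/2} A C^{-1/2}
    An = [[A[i][j] / math.sqrt(r[i] * c[j]) for j in range(n_cols)]
          for i in range(n_rows)]
    # The leading singular vectors of A_n are known in closed form and carry
    # no partition information, so they are projected out below.
    total = sum(r)
    u1 = [math.sqrt(x / total) for x in r]
    v1 = [math.sqrt(x / total) for x in c]
    rng = random.Random(seed)
    v = unit_orthogonal([rng.gauss(0, 1) for _ in range(n_cols)], v1)
    for _ in range(iters):   # Step 3 (restricted): power iteration for (u_2, v_2)
        u = unit_orthogonal([sum(An[i][j] * v[j] for j in range(n_cols))
                             for i in range(n_rows)], u1)
        v = unit_orthogonal([sum(An[i][j] * u[i] for i in range(n_rows))
                             for j in range(n_cols)], v1)
    # Step 4 with a single column: Z = [R^{-1/2} u_2 ; C^{-1/2} v_2]
    z = ([u[i] / math.sqrt(r[i]) for i in range(n_rows)]
         + [v[j] / math.sqrt(c[j]) for j in range(n_cols)])
    labels = two_means_1d(z)                                           # Step 5
    return labels[:n_rows], labels[n_rows:]


# Hypothetical 2-D embeddings: two groups of users and two groups of videos.
user_vecs = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.0]]
video_vecs = [[1.0, 0.2], [0.8, 0.1], [0.1, 0.9], [0.2, 1.0]]
A = [[cosine(a, b) for b in video_vecs] for a in user_vecs]
row_labels, col_labels = cocluster_two_way(A)
```

Users and videos that receive the same label form one co-cluster. For more than two clusters, a full SVD and k-means on several columns of Z would be needed, as described in Steps 3 to 5.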

User-Video Link Prediction Method Based on Node Embeddings
In this section, a link prediction task is carried out on the MOOCCube data to evaluate the performance of the Metapath-GloVe algorithm. Users are treated as nodes U, videos as nodes V, and the relations between them as user-video links (U-V). The process of link prediction is shown in Figure 5.
Step 1: The user-video pairs with existing links are considered positive node pairs, forming a set E of link relationships, and all information is integrated to build a user-video bipartite graph B(U, V, E).
Step 2: Randomly select 20% of the positive node pairs as test samples and the remaining 80% as training samples.
Step 3: All unlinked user-video pairs are treated as negative links, from which the same numbers of negative links are randomly selected to form the test and training sets, respectively.
Step 4: Generate a new user-video bipartite graph B′(U, V, E′) by removing the positive test links, and learn the node embeddings of B′(U, V, E′) using Metapath-GloVe.
Step 5: The edge embedding values of all positive and negative node pair samples in the training and test sets are computed by Equation (10).
$$\text{value} = a - b \quad (10)$$
where a is the feature vector of the user node in a link, and b is the feature vector of the video node in this link.
Step 6: The positive node pairs in the training and test sets are assigned a label value of 1, and the negative node pairs are assigned a label value of 0.
Step 7: The embedding values are used as input and the label values are used as output, which are input to different classifiers for training.
Step 8: The trained classifiers, i.e., Bagging classifier, Stacking classifier, and Neural Network (MLP) classifier, are used to predict the links in the test set for link prediction.
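The sample construction in Steps 1 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions: Equation (10) is read as the element-wise difference a − b, the classifiers of Steps 7 and 8 are omitted, and the function names and the toy graph are hypothetical.

```python
import random


def build_link_samples(users, videos, edges, test_ratio=0.2, seed=0):
    """Split positive links 80/20 and draw equally many negative links."""
    rng = random.Random(seed)
    positives = sorted(set(edges))
    rng.shuffle(positives)
    n_test = int(round(test_ratio * len(positives)))
    test_pos, train_pos = positives[:n_test], positives[n_test:]

    linked = set(positives)
    negatives = [(u, v) for u in users for v in videos if (u, v) not in linked]
    rng.shuffle(negatives)
    test_neg = negatives[:len(test_pos)]
    train_neg = negatives[len(test_pos):len(test_pos) + len(train_pos)]

    # Label 1 for positive pairs and 0 for negative pairs (Step 6).
    train = [(p, 1) for p in train_pos] + [(p, 0) for p in train_neg]
    test = [(p, 1) for p in test_pos] + [(p, 0) for p in test_neg]
    # train_pos plays the role of E': the links left after removing the
    # positive test links, on which the embeddings are to be learned (Step 4).
    return train, test, train_pos


def edge_feature(user_vec, video_vec):
    """Edge embedding as the element-wise difference a - b (assumed Eq. (10))."""
    return [a - b for a, b in zip(user_vec, video_vec)]


# Hypothetical toy graph with 5 users, 5 videos, and 10 links.
users = [f"u{i}" for i in range(5)]
videos = [f"v{i}" for i in range(5)]
edges = ([(f"u{i}", f"v{i}") for i in range(5)]
         + [(f"u{i}", f"v{(i + 1) % 5}") for i in range(5)])

train, test, remaining_edges = build_link_samples(users, videos, edges)
```

The edge features of the training pairs and their labels would then be passed to a classifier, and the test pairs are used only for evaluation.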

Experiments and Results Analysis
In this paper, the two methods proposed in Section 3 are applied to the MOOC course dataset to perform user/video association analysis and user-video link prediction. To validate the effectiveness and scalability of the proposed methods, extensive experiments were conducted as follows.

Result of User/Video Association Analysis on MOOC Course Dataset
In the MOOC course dataset, the label of users is not available. To better evaluate the proposed method, synthetic datasets are generated with explicit labels. Therefore, we implement the association analysis method on both a synthetic dataset and the MOOC course dataset.

Synthetic Data
In this paper, synthetic datasets are generated based on the scheme shown in Figure 6. Suppose there are two node types, T_1 and T_2. For each node type, we suppose there are two node clusters, T_1 : [C_1, C_2] and T_2 : [C_3, C_4], and the number of nodes in each cluster is set to 100; thus, the size of the synthetic graph is 200 × 200, and the 400 nodes in total are labelled from 0 to 399. Following the clustering principle that connections are dense within a cluster and sparse between clusters, we adopt the connection probability P = [P_1, P_2, P_3, P_4] to represent the density of connections. As shown in the figure, P_1 and P_4 are the within-cluster probabilities, and P_2 and P_3 are the between-cluster probabilities. Thus, when P is given, edges are generated randomly according to these probabilities. Three sets of connection probabilities, P = [0.9, 0.1, 0.1, 0.9], P = [0.7, 0.3, 0.3, 0.7], and P = [0.6, 0.4, 0.4, 0.6], were used in the experiments, and the data distribution is shown in Figure 7. As the figure shows, Figure 7a has a clear cluster division, and as the within-cluster probability decreases and the between-cluster probability increases, the noise increases markedly.
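The generation scheme can be sketched as follows. The assignment of P_1, P_2, P_3, and P_4 to the blocks C_1-C_3, C_1-C_4, C_2-C_3, and C_2-C_4 is an assumption made for illustration (Figure 6 is not reproduced here), as are the function name and the random seed.

```python
import random


def synthetic_bipartite(p, cluster_size=100, seed=0):
    """Return a (2*cluster_size) x (2*cluster_size) 0/1 biadjacency matrix.

    Rows are T1 nodes (C1 then C2); columns are T2 nodes (C3 then C4).
    p = [P1, P2, P3, P4] is assumed to give the edge probability for the
    blocks C1-C3, C1-C4, C2-C3 and C2-C4 respectively.
    """
    rng = random.Random(seed)
    n = 2 * cluster_size
    matrix = []
    for i in range(n):
        row_cluster = i // cluster_size          # 0 for C1, 1 for C2
        row = []
        for j in range(n):
            col_cluster = j // cluster_size      # 0 for C3, 1 for C4
            prob = p[2 * row_cluster + col_cluster]
            row.append(1 if rng.random() < prob else 0)
        matrix.append(row)
    return matrix


# The clearest of the three settings used in the experiments.
graph = synthetic_bipartite([0.9, 0.1, 0.1, 0.9])
```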

Lastly, the Fowlkes-Mallows index (FMI) is the geometric mean of the precision and recall of the clustering result with respect to the ground truth, with a value range of [0, 1]. In summary, the clustering quality is determined by the values of these indices: higher values indicate better clustering results, and lower values indicate poorer clustering quality.
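As an illustration of the FMI, the sketch below uses the standard pair-counting definition, FMI = TP / sqrt((TP + FP)(TP + FN)), where the counts are taken over pairs of nodes. The function name is hypothetical, and the quadratic loop is meant for small examples only, not as the implementation used in the experiments.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting Fowlkes-Mallows index between two labelings."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # grouped together only in the prediction
        elif same_true:
            fn += 1  # grouped together only in the ground truth
    if tp == 0:
        return 0.0
    # Geometric mean of pairwise precision tp/(tp+fp) and recall tp/(tp+fn).
    return tp / sqrt((tp + fp) * (tp + fn))


print(fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0]))
print(fowlkes_mallows([0, 0, 0, 1], [0, 0, 1, 1]))
```

Because the index depends only on which pairs are grouped together, a relabeling of the clusters (as in the first call) does not change the score.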
In this paper, the Metapath-GloVe algorithm, together with the NMF and Metapath2vec algorithms, is implemented on the three generated datasets for association analysis evaluation. The important hyperparameters of each method are shown in Table 1. Node embeddings are learned by each method and then fed into the clustering algorithm. The average evaluation results of the five indicators are shown in Table 2. As the cluster relation is distinct in the case P = [0.9, 0.1, 0.1, 0.9], all three methods achieve a perfect result. As the within-cluster probability decreases and the between-cluster probability increases, the performance of all methods degrades to different degrees. However, the proposed Metapath-GloVe method achieves the best result in the other two cases.

Table 1. Hyperparameters of various methods on the self-generated synthetic dataset.

Method: Hyperparameters
NMF: k = 150
Metapath2vec: The meta-path random walk step size is 20,000, the dimension is 150, the window size is 30, the learning rate is 0.01, and the number of iterations is 20.
Metapath-GloVe: The meta-path random walk step size is 20,000, the dimension is 150, the window size is 30, the learning rate is 0.01, and the number of iterations is 20.

MOOC Course Dataset
In this section, the association analysis is implemented on the MOOC course dataset. In detail, the dataset is extracted from the MOOCCube data for the concept named "K_Data_Structure_Computer_Science_Technology", which includes data on courses, videos, users, and their relationships. Table 3 provides details on the specific data analyzed in this paper. User information is treated as node type U: each user node is given the attribute "id", numbered from 0, so the user node "id" values are 0-1049. Video information is treated as node type V: each video node is given the attribute "video", numbered from 10,000, so the video node "video" values are 10,000-10,586. The dataset also includes the relationships between videos and courses, where videos are contained within courses. Additionally, the videos are classified into 12 categories based on the course category, which serve as the labels for the external evaluation indicators. Table 4 provides a detailed list of these categories. As before, the Metapath-GloVe algorithm, together with the NMF and Metapath2vec algorithms, is implemented on the MOOC course dataset. The optimal parameter values are selected through grid search, and the important hyperparameters of each method on this dataset are shown in Table 5. When performing clustering, this experiment also uses the average value of each group as the final evaluation value. The evaluation results for the five categories of indicators are shown in Table 6, and Figure 8 shows the clustering visualization results for each method.

Table 5. Hyperparameters of various methods on the MOOCCube dataset.

Method: Hyperparameters
Metapath2vec: The meta-path random walk step size is 20,000, the dimension is 300, the window size is 15, the learning rate is 0.01, and the number of iterations is 20.
Metapath-GloVe: The meta-path random walk step size is 20,000, the dimension is 300, the window size is 15, the learning rate is 0.01, and the number of iterations is 20.
As seen from Table 6, the proposed method achieves the highest values of all the indicators on the MOOC course dataset. In addition, we used the function perf_counter() in the time library to measure the running time of each algorithm. Because NMF is not iterative and does not need mapping and relationship identification, it saves running costs; its running time is 14 s, the shortest of the three. However, the visualization results in Figure 8a show that the NMF algorithm cannot effectively divide the nodes into different groups, and the results are not ideal. Metapath2vec has the longest running time (4413.6 s). In the Metapath2vec clustering results in Figure 8b, two types of label nodes, six and seven, are missing, which means that the videos of types "ARM Microcontrollers and Embedded Systems (Spring 2019)" and "Operating system (autonomous mode)" cannot be detected. The Metapath-GloVe algorithm proposed in this paper needs 2252.9 s, roughly half the running time of Metapath2vec. Moreover, the clustering result of the Metapath-GloVe algorithm in Figure 8c is missing only the nodes labeled six, which indicates that the Metapath-GloVe model has better performance.
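The running-time measurement can be sketched with time.perf_counter() as below. The helper timed and the function toy_workload are hypothetical stand-ins introduced for illustration; in the experiments, the timed call would be the training of an embedding model.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def toy_workload(n):
    # Hypothetical stand-in for an embedding training run.
    return sum(i * i for i in range(n))


result, seconds = timed(toy_workload, 100_000)
print(f"elapsed: {seconds:.4f} s")
```

perf_counter() is a monotonic, high-resolution clock, so the difference of two readings is a suitable wall-clock measure for comparing the methods on the same machine.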
In addition, we compare the clustering results of the Metapath-GloVe and Metapath2vec algorithms from a statistical perspective. The null hypothesis is that there is no significant difference in NMI between the Metapath-GloVe and Metapath2vec algorithms. The t-test gives a p value of 1.9721 × 10^−37, which is much less than 0.05, so we reject the null hypothesis. Therefore, there is a significant difference in NMI between Metapath-GloVe and Metapath2vec. Similarly, Metapath-GloVe outperforms Metapath2vec on all metrics, which indicates that Metapath-GloVe has better clustering performance.
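The text does not state which variant of the t-test was applied. As a hedged illustration, the sketch below assumes Welch's unequal-variance two-sample test and computes only the test statistic and the degrees of freedom; the p value would then be obtained from the t distribution, for example with a statistics package. The function name and the sample values are toy assumptions, not results from the experiments.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    sa = variance(sample_a) / na
    sb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


# Toy per-run scores for two methods (illustrative values only).
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(scores_a, scores_b)
print(t_stat, dof)
```

If the two methods were evaluated on matched runs, a paired t-test on the per-run differences would be the more appropriate choice.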

User-Video Link Prediction
In this section, the link prediction described in Section 3.2 is implemented on the MOOC course data. In detail, 20% of the user-video links are selected for prediction evaluation.
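One way to select 20% of the links for evaluation is a random hold-out split, sketched below. The function split_links and the toy link list are hypothetical; the text does not describe how the evaluation links or any negative examples were sampled, so this shows only the basic idea.

```python
import random


def split_links(links, test_fraction=0.2, seed=0):
    """Shuffle the links and hold out a fraction of them for evaluation."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy user-video links: user ids start at 0, video ids start at 10,000.
links = [(u, 10_000 + v) for u in range(10) for v in range(5)]
train_links, test_links = split_links(links)
print(len(train_links), len(test_links))
```

The embeddings would then be learned from the graph built on the training links only, so that the held-out links are unseen at prediction time.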
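(1) Evaluation indicators
In this paper, four metrics, namely accuracy, precision, recall, and F1 value, are adopted to evaluate the result of link prediction. They are calculated as shown in Equations (11)-(14).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{14}$$

where TP is the number of positive cases that the model predicts correctly, FN is the number of positive cases that the model predicts incorrectly, FP is the number of negative cases that the model predicts incorrectly, and TN is the number of negative cases that the model predicts correctly.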
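The four metrics can be computed directly from the confusion-matrix counts. The sketch below assumes the standard definitions of accuracy, precision, recall, and F1 in terms of TP, FN, FP, and TN; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy counts for illustration only.
metrics = classification_metrics(tp=8, fn=2, fp=2, tn=8)
print(metrics)
```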
(2) Experimental results
From the result of node clustering, it is clear that NMF learns the node features of this data poorly; therefore, this experiment only compares two algorithmic models, Metapath2vec and Metapath-GloVe. Figure 9 visualizes the confusion matrices, Table 7 gives an example of the comparative experimental results, and Table 8 shows the average results of 10 runs per model. From Figure 9, we can see the difference between them more intuitively: the Metapath-GloVe model produces fewer incorrectly predicted positive examples and achieves higher recommendation accuracy, so it is slightly better than Metapath2vec. From Table 8, the accuracy of the Metapath-GloVe algorithm is higher than that of Metapath2vec on all classifiers, and the Bagging classifier achieves the best performance among them.
Similarly, the null hypothesis is that there is no significant difference between the link prediction accuracy of the Metapath-GloVe and Metapath2vec algorithms. The t-test gives a p value of 2.8635 × 10^−13, which is much less than 0.05, so we reject the null hypothesis. Therefore, there is a significant difference between the link prediction accuracy of the Metapath-GloVe and Metapath2vec algorithms.

Summary
As many real-world scenarios are commonly represented by heterogeneous graphs, graph embedding has become an important technique for graph analysis and applications. In this paper, an algorithm is proposed that combines meta-path random walk, which obtains sequences of heterogeneous nodes, with the GloVe model, which learns word vectors from global corpus statistics, to improve the embedding of heterogeneous graphs. The proposed model not only uses meta-path random walk to capture semantic and structural information among different nodes, but also utilizes global statistical information in the vectorized representation of heterogeneous graph nodes. In the MOOC course data, rich auxiliary information, such as viewing order and viewing duration, is usually associated with the videos. It will be interesting to study how to incorporate such auxiliary information to further improve heterogeneous node embedding learning.