Multi-Site Wind Speed Prediction Based on Graph Embedding and Cyclic Graph Isomorphism Network (GIN-GRU)

Abstract: Accurate and reliable wind speed prediction is conducive to improving the power generation efficiency of electrical systems. Due to the lack of adequate consideration of spatial feature extraction, existing wind speed prediction models have certain limitations in capturing the rich neighborhood information of multiple sites. To address these constraints, our study introduces a graph isomorphism-based gated recurrent unit (GIN-GRU). Initially, the model utilizes a hybrid mechanism of random forest and principal component analysis (PCA-RF) to process the feature data from different sites. This process not only preserves the primary features but also extracts critical information by performing dimensionality reduction on the residual features. Subsequently, the model constructs graph networks by integrating graph embedding techniques with the Mahalanobis distance metric to synthesize the correlation information among features from multiple sites. This approach effectively consolidates the interrelated feature data and captures the complex interactions across multiple sites. Ultimately, the graph isomorphism network (GIN) delves into the intrinsic relationships within the graph networks, and the gated recurrent unit (GRU) integrates these relationships with temporal correlations to address the challenges of wind speed prediction effectively. Experiments conducted on wind farm datasets for offshore California in 2019 demonstrate that the proposed model has higher prediction accuracy compared to comparative models such as CNN-LSTM and GAT-LSTM. Specifically, by modifying the network layers, we achieved higher precision, with the mean square error (MSE) and root mean square error (RMSE) of wind speed at a height of 10 m being 0.8457 (m/s)² and 0.9196 m/s, respectively.


Introduction
The growing global energy crisis and the critical issue of environmental pollution have highlighted the need for clean and renewable energy sources. Wind energy has garnered significant attention due to its relatively short construction cycle, minimal environmental prerequisites, and vast reserves [1]. This has led to its widespread adoption and rapid development globally. Many nations have recognized the potential of wind power and are actively promoting its generation to capture wind energy. Consequently, the wind power sector has undergone rapid growth and expansion, firmly establishing wind power generation as a field with promising current prospects. Presently, wind speed prediction plays a pivotal role in formulating control strategies for wind farms, which stands as a cornerstone technology that enhances the operational efficiency of wind turbines [2].
Accurate wind speed prediction is crucial to optimizing wind power generation. It enhances wind energy utilization, mitigates wind power's impact on the grid, and ensures efficient operation of wind farms. To refine wind speed forecasting, researchers worldwide have pioneered various innovative approaches, for instance, data decomposition techniques such as wavelet transform [3], empirical modal decomposition [4][5][6], and variational modal decomposition. In the framework proposed here, graph networks are constructed to capture the relationships between each site's features. Ultimately, the GIN model is leveraged for its proficient learning of graph structure similarities. By conducting deep learning on the spatial attributes of these graph structures, we can feed the deeply learned feature data into a GRU network for subsequent prediction. This approach generates wind speed predictions that consider the characteristics of each site and their interconnectedness, and it also accounts for the interactive effects between sites on the predicted wind speeds.

PCA-RF Fusion Model
Raw meteorological data are complex and multifaceted; however, not all variables are pertinent to changes in wind speed. An overabundance of predictive variables can introduce redundancy, thereby diminishing the model's generalization capabilities, while some residual features still carry key information that needs to be extracted. Therefore, we focus on primary feature extraction and dimensionality reduction of residual features from the original meteorological elements. This approach streamlines the dataset and enhances model interpretability and efficiency. Consequently, this research leverages the intrinsic feature extraction capabilities of the random forest algorithm in conjunction with PCA to further reduce the dimensionality of residual features. This enables the separate processing of primary and residual features for maximal efficiency. Subsequently, the primary features and the dimensionality-reduced residual features are concatenated to form the dataset used for constructing graph networks.

Random Forests
Random forest is a machine learning algorithm that makes predictions from multiple decision trees and integrates their results. Random forests construct each decision tree using random samples and features, which imparts the model with robustness against overfitting [18]. MDI is a measure of feature importance based on the reduction in Gini impurity contributed by each feature at the split points of the decision trees [19]. Given a dataset containing nodes from C categories, where the probability that a sample at node j belongs to category c is denoted as $p_{jc}$, the Gini impurity of node j is as follows:
$\mathrm{Gini}(j) = \sum_{c=1}^{C} p_{jc}\,(1 - p_{jc}) = 1 - \sum_{c=1}^{C} p_{jc}^{2}$
Gini impurity is a statistical metric that measures the probability of a random classification error for an element chosen from a dataset, given that the classification is random and reflects the distribution of classes in the dataset. Owing to its simplicity and direct computation, MDI is selected for evaluating the significance of feature variables within this study. A larger MDI value signifies a feature's greater relevance in strengthening the predictive accuracy of the model. The steps for calculating MDI are given as follows (a brief code sketch follows these steps):
Step 1. Each decision tree generated through bootstrap sampling on the training set constitutes part of the random forest.
Step 2. For every tree, the Gini impurity of each node is computed.
Step 3. MDI for each feature is ascertained by averaging the reduction in Gini impurity across all trees.
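As an illustration, scikit-learn's random forest exposes impurity-based (MDI) importances directly through `feature_importances_`; note that its regression trees measure impurity by variance reduction rather than the Gini index, but the mean-decrease-in-impurity idea is the same. The column names and data below are placeholders, not the actual wind farm variables.

```python
# Illustrative sketch: ranking features by MDI with scikit-learn.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Toy stand-in for one site's meteorological data (samples x features).
data = pd.DataFrame(
    rng.normal(size=(500, 5)),
    columns=["windspeed_10m", "winddirection_10m", "temperature_2m",
             "pressure_0m", "relativehumidity_2m"],
)
target = data["windspeed_10m"]
features = data.drop(columns=["windspeed_10m"])

# feature_importances_ of a fitted random forest is the impurity-based
# (MDI) importance averaged over all trees in the forest.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(features, target)
mdi = pd.Series(rf.feature_importances_, index=features.columns).sort_values(ascending=False)
print(mdi)
```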

Principal Component Analysis
PCA is a prevalent technique for reducing data dimensionality. It employs orthogonal transformations to convert correlated variables into a new set of uncorrelated variables, known as principal components [20]. PCA is designed to preserve critical information from the original dataset and reduce dimensionality by decreasing the number of variables. PCA streamlines complex datasets by isolating pivotal features to reduce noise and safeguard essential information.
The fundamental method involves computing the eigenvalues and eigenvectors of the covariance matrix. The eigenvalues quantify how much of the original variables' information each principal component carries, which is crucial when processing the remaining (residual) features. Subsequently, the dimensionality reduction of the feature matrix facilitates the isolation of distinctive meteorological factors that exert a significant influence on wind speed variations.

PCA-RF Feature Fusion
In this study, the primary and residual feature matrices are built by ordering features according to their MDI scores; the residual feature matrix is then subjected to PCA for dimensionality reduction. For a wind field with N sites, the feature matrix of the n-th site is expressed as $X_n = [V_1, V_2, \ldots, V_M]^{T} \in \mathbb{R}^{M \times D}$, where n = 1, ..., N, M is the number of features per site, D is the number of samples, and $V_k$ is the k-th feature vector of the site, so that each site contributes an M × D block. This dataset primarily includes measurements taken at 15-m intervals for wind speed, wind direction, temperature, pressure, and other relevant characteristics. The steps of wind speed feature processing for the n-th site are given as follows (a brief code sketch follows the description of Figure 1 below):
(1) Build a random forest. Bootstrap sampling and random feature selection techniques are utilized to generate a training dataset and a corresponding subset of features. These elements form the basis for constructing an individual decision tree. The process of building a decision tree involves randomly selecting a subset of the dataset and recursively splitting the nodes based on optimal segmentation criteria until the stopping criteria of the random forest are met. Through the repetition of this process, a random forest model is assembled with I different decision trees, where the i-th decision tree has T nodes and C categories.
(2) Calculate the Gini impurity and its variation. In tree i, at node j, it is essential to calculate the proportion of category c, represented as p(c|(i, j), k), against the total categories. The variation in Gini impurity is then determined by evaluating the Gini impurity before and after the branching of node j:
$\Delta \mathrm{Gini}(i, j, k) = \mathrm{Gini}(i, j) - \mathrm{Gini}(i, j_b) - \mathrm{Gini}(i, j_f)$
where j = 1, 2, ..., T; i = 1, 2, ..., I; k = 1, 2, ..., M; c = 1, 2, ..., C. $j_b$ and $j_f$ represent the two new nodes after node j is branched, and $\Delta \mathrm{Gini}(i, j, k)$ denotes the reduction in Gini impurity for feature vector $V_k$ after node j splits in decision tree i.
(3) Calculate the MDI. The MDI for feature $V_k$ is as follows:
$M(V_k) = \frac{1}{I} \sum_{i=1}^{I} \sum_{j=1}^{T} \Delta \mathrm{Gini}(i, j, k)$
(4) Feature selection. In the feature selection phase, features are ranked by their MDI values, and a threshold γ is determined. Features with an MDI value $M(V_k) \geq \gamma$ are selected to comprise the primary feature matrix $X_f$, which includes $M_f$ features. Conversely, features with an MDI value $M(V_k) < \gamma$ are used to construct the auxiliary matrix $X_b$, which incorporates the remaining $M_b = M - M_f$ features.
(5) Feature decentralization. The auxiliary matrix $X_b$ is decentralized to yield the matrix $X'_b$:
$X'_b = X_b - \bar{X}_b$
where $\bar{X}_b$ is the mean of the auxiliary matrix $X_b$.
(6) Compute eigenvalues and eigenvectors. Eigenvalue decomposition is performed on the covariance matrix of $X'_b$. The eigenvalues are ranked in descending order and sequentially aggregated until the cumulative contribution rate satisfies η ≥ 0.85. In adherence to this criterion, the first r principal directions are retained, and the corresponding components constitute the matrix R of size r × D, where $\lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_{M_b}\}$ is the set of eigenvalues and $l = \{l_1, l_2, \ldots, l_{M_b}\}$ is the matrix of eigenvectors.
(7) Data dimensionality reduction. We calculate the dimensionality reduction matrix Y by projecting the decentralized features onto the retained eigenvectors:
$Y = [l_1, l_2, \ldots, l_r]^{T} X'_b$
(8) Matrix splicing. By horizontally concatenating the matrices $X_f$ and Y, we obtain the fused feature matrix of the n-th site. PCA-RF processes all sites to yield a sequence of feature matrices $W_1, W_2, \ldots, W_N$, which captures the pivotal features of the dataset. Finally, all the feature matrices are horizontally spliced to obtain a feature matrix W of size g × D,
where $g = N \cdot M_f + r$. Figure 1 illustrates the steps of PCA-RF. Initially, the raw datasets from different sites are processed through the RF algorithm to extract the primary feature matrix and the auxiliary feature matrix. Subsequently, the auxiliary feature matrix is subjected to PCA for dimensionality reduction to yield the dimensionality-reduced matrix. Finally, the final feature matrix is constructed by concatenating the primary feature matrix with the dimensionality-reduced matrix.
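A minimal sketch of the per-site PCA-RF fusion is given below, assuming the MDI scores have already been computed (for example, as in the earlier sketch) and using scikit-learn's PCA with the 0.85 cumulative-contribution criterion. The array shapes, the threshold γ, and the choice of keeping the ten highest-MDI features are illustrative.

```python
# Sketch of PCA-RF fusion for one site: split features by MDI, reduce the
# residual block with PCA, and stack the results back together.
import numpy as np
from sklearn.decomposition import PCA

def pca_rf_fuse(X, mdi, gamma, eta=0.85):
    """X: (M, D) features-by-samples matrix; mdi: (M,) importance scores."""
    primary = X[mdi >= gamma]              # primary feature matrix X_f
    residual = X[mdi < gamma]              # auxiliary matrix X_b
    # PCA over the residual features: decentre and keep components until the
    # cumulative explained-variance ratio reaches eta (0.85 in the text).
    pca = PCA(n_components=eta, svd_solver="full")
    Y = pca.fit_transform(residual.T).T    # (r, D) dimensionality-reduced matrix
    return np.vstack([primary, Y])         # fused matrix W_n of shape (M_f + r, D)

# Toy usage with random stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(46, 1000))
mdi = rng.random(46)
W_n = pca_rf_fuse(X, mdi, gamma=np.sort(mdi)[-10])   # keep the 10 highest-MDI features
print(W_n.shape)
```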
Graph neural networks combined with graph networks built on feature correlations have achieved excellent results in wind speed prediction [21,22]. However, the performance of graph neural networks is highly dependent on predefined graphs to characterize the relationships between site features, such as those based on the Pearson correlation coefficients. Predefined graphs often do not reflect the complex dynamics of wind speed features, with their quality significantly depending on expert judgment and accurate wind speed measurements. Considering these factors, this paper introduces a novel compositional algorithm to analyze the spatio-temporal attributes of wind speed. The algorithm acquires node embedding vectors by employing graph embedding techniques on a predefined graph, which elucidate the graph's topology and the inter-node relationships [23]. Subsequently, it quantifies the distance between node embedding vectors by employing the Mahalanobis distance [24,25]. The final graph network results from the adjustment of edge relationships in the predefined graph through a filter based on a threshold Mahalanobis distance.

Construction of Predefined Graphs
In this paper, we use the matrix W obtained above to construct the predefined graph G = (V, E), where V represents the set of nodes. We consider each row of the matrix W as the feature vector of a node; W has dimension g × D, where g is the number of nodes and D is the dimension of each node feature. The set of edges $E = \{e_{ij}\}_{i,j=1}^{g}$ embodies the connectivity relationships among nodes within the network, such that an edge may connect any two nodes $V_i$ and $V_j$. In the predefined graph, the existence of an edge is denoted by $e_{ij} = 1$, which is determined as follows:
$e_{ij} = \begin{cases} 1, & |\rho_{ij}| \geq \delta \\ 0, & \text{otherwise} \end{cases}, \quad \rho_{ij} = \frac{\sum_{d=1}^{D}(v_{id} - \bar{v}_i)(v_{jd} - \bar{v}_j)}{\sqrt{\sum_{d=1}^{D}(v_{id} - \bar{v}_i)^{2}}\,\sqrt{\sum_{d=1}^{D}(v_{jd} - \bar{v}_j)^{2}}}$
where δ is the correlation threshold, $\rho_{ij}$ is the Pearson correlation coefficient between the feature vectors of nodes $V_i$ and $V_j$, and $\bar{v}_i$ represents the mean value of the i-th node.
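As an illustration of this rule, the following sketch builds the predefined adjacency matrix from the fused matrix W by thresholding absolute Pearson correlations between node feature vectors; the threshold value δ and the matrix sizes used here are arbitrary.

```python
# Sketch of the predefined graph: nodes are the g rows of the fused matrix W,
# and an edge e_ij = 1 is created when the absolute Pearson correlation between
# the two node vectors reaches the threshold delta.
import numpy as np

def predefined_adjacency(W, delta=0.6):
    """W: (g, D) matrix whose rows are node feature vectors."""
    corr = np.corrcoef(W)                  # (g, g) Pearson correlation matrix
    A = (np.abs(corr) >= delta).astype(int)
    np.fill_diagonal(A, 0)                 # no self-loops in the predefined graph
    return A

rng = np.random.default_rng(0)
W = rng.normal(size=(30, 1000))
A = predefined_adjacency(W)
print(A.sum() // 2, "edges")
```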
Graph embedding is a process that maps graph data into low-dimensional, dense vectors, preserving the graph's structure and properties. This process facilitates node classification, clustering, link prediction, graph reconstruction, and visualization [26]. In this study, graph embedding techniques are employed to transform a predefined graph into a continuous vector space. This conversion enables the effective learning of features and facilitates subsequent data analysis tasks. Certain shallow graph embedding techniques begin by randomly selecting neighboring nodes within the network to create a fixed-length random walk sequence. This sequence is then utilized by the skip-gram model to project the sequence of nodes into a low-dimensional space, which results in an embedding vector that captures the essential structural features of the graph [27]. This method effectively reduces the dimensionality and maintains node contextuality. The specific process is shown in Figure 2. However, shallow models are unable to capture highly nonlinear structure, which in turn leads to non-optimal solutions. Recent breakthroughs in deep learning have profoundly impacted graph analysis, as deep neural network techniques are increasingly employed to advance graph embedding methodologies [28]. We therefore use deep neural networks to perform a nonlinear transformation of node features and neighborhood information, which leads to node embedding vectors that capture the graph's higher-order dependencies. This approach allows for a more profound understanding of node characteristics, which optimizes the graph's structural representation.


GraphSAGE-Based Graph Embedding
We employ a deep learning approach grounded in GraphSAGE for graph embedding. This method leverages node feature information to generate embedding vectors for new nodes or subgraphs via an enhanced graph convolutional network. It facilitates incremental updates to node embeddings and preserves the graph's features and structural information. Based on this groundwork, the algorithm establishes 'edges' that precisely delineate the relationships between nodes. The specific process is given as follows (a code sketch follows these steps):
A. Sampling: for node $V_i$, we randomly sample a subset of its neighboring nodes $N(V_i)$ from the set of its neighbor nodes $S_k$ to create a subgraph. This method decreases computational demands yet preserves the heterogeneity among adjacent nodes.
B. Aggregation: for node $V_i$, we employ an aggregation function $\mathrm{AGGREGATE}_k$ that compresses and transforms the feature vectors of its neighboring nodes $\{h_u^{k-1}, \forall u \in N(V_i)\}$ to generate a new feature vector $h_{V_i}^{k}$ through aggregation. The formula for aggregation is expressed as
$h_{V_i}^{k} = \sigma\!\left(Q^{k} \cdot \mathrm{CONCAT}\!\left(h_{V_i}^{k-1}, \mathrm{AGGREGATE}_k\!\left(\{h_u^{k-1}, \forall u \in N(V_i)\}\right)\right)\right)$
where $h_{V_i}^{k}$ denotes the embedding vector of node $V_i$ at the k-th layer and $h_{V_i}^{k-1}$ is the feature vector of node $V_i$ from the previous layer. σ denotes the sigmoid activation function, the matrix $Q^{k}$ represents the learnable weights, CONCAT signifies the splicing operation, $\mathrm{AGGREGATE}_k$ is the aggregation function at the k-th layer, and $N(V_i)$ refers to the neighboring set of node $V_i$.
C. Update: for node $V_i$, the feature vector $h_{V_i}^{k-1}$ is concatenated with the aggregated feature vector $h_{V_i}^{k}$. This combined vector then passes through a fully connected layer followed by an activation function, which results in the embedding vector $h_{V_i}^{k+1}$ for node $V_i$. This process facilitates the integration and nonlinear transformation of the features represented by node $V_i$ with those of its neighboring nodes. The update formula is expressed as follows:
$h_{V_i}^{k+1} = \sigma\!\left(Q^{k+1} \cdot \mathrm{COMBINE}\!\left(h_{V_i}^{k-1}, h_{V_i}^{k}\right)\right)$
where COMBINE denotes the splicing or summing operation and $Q^{k+1}$ denotes the learnable weight matrix.
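A minimal NumPy sketch of one GraphSAGE layer following steps A, B, and C is given below; the mean aggregator, the sampling size, and the random matrix standing in for the learnable weights Q are assumptions made purely for illustration.

```python
# Minimal sketch of one GraphSAGE layer: sample neighbours, mean-aggregate
# their vectors, concatenate with the node's own vector, then apply a learned
# linear map plus a sigmoid.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def graphsage_layer(H, A, Q, n_samples=5):
    """H: (g, d) node embeddings, A: (g, g) adjacency, Q: (d_out, 2*d) weights."""
    g, d = H.shape
    H_new = np.zeros((g, Q.shape[0]))
    for i in range(g):
        neigh = np.flatnonzero(A[i])
        if len(neigh) > n_samples:                                    # A. sampling
            neigh = rng.choice(neigh, n_samples, replace=False)
        agg = H[neigh].mean(axis=0) if len(neigh) else np.zeros(d)    # B. aggregation
        concat = np.concatenate([H[i], agg])                          # C. update: CONCAT + linear + sigmoid
        H_new[i] = sigmoid(Q @ concat)
    return H_new

H = rng.normal(size=(30, 16))
A = (rng.random((30, 30)) > 0.8).astype(int)
Q = rng.normal(size=(16, 32)) * 0.1
print(graphsage_layer(H, A, Q).shape)
```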

Modification of Graph Network Edges
The node embedding vectors obtained through GraphSAGE graph embedding capture the features of the nodes and the relationships between the nodes. However, they are not directly used as components of the graph network. In this study, the edge connections are determined by setting a similarity threshold based on the Mahalanobis distance, which dictates the connectivity between nodes.
The Mahalanobis distance quantifies the similarity or divergence between two data samples by incorporating their covariance matrix. This metric excels at handling features with varying scales and interdependencies, which effectively neutralizes the impact of scale disparities and feature correlations. Unlike traditional distance measures (such as the Euclidean distance), which often assume that individual features are independent, real-world data often exhibit correlations between features. The Mahalanobis distance accounts for these correlations by utilizing the covariance matrix. Consequently, we employ the Mahalanobis distance method for comparing node embedding vectors. This method is particularly suitable for high-dimensional data influenced by numerous meteorological factors and can accommodate dimensions that are neither independent nor identically distributed. Its calculation formula is expressed as follows:
$d_{\mathrm{mahal}}\!\left(h_{V_i}^{k+1}, h_{V_j}^{k+1}\right) = \sqrt{\left(h_{V_i}^{k+1} - h_{V_j}^{k+1}\right)^{T} \Sigma^{-1} \left(h_{V_i}^{k+1} - h_{V_j}^{k+1}\right)}$
where $h_{V_i}^{k+1}$ denotes the node embedding vector. The covariance matrix is represented by Σ, and its inverse is denoted as $\Sigma^{-1}$. Finally, we compare the distance $d_{\mathrm{mahal}}$ against the correlation threshold α to determine the connectivity of the optimized edges within the predefined graph network G = (V, E).
A new graph network G′ = (V, E′) is thus created. The process of embedding the graph network is illustrated in Figure 3.
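The sketch below illustrates how pairwise Mahalanobis distances between node embedding vectors could be thresholded to rebuild the edge set; the embedding matrix and the threshold value α are illustrative, and a pseudo-inverse of the covariance matrix is used for numerical stability.

```python
# Sketch of edge modification: pairwise Mahalanobis distances between node
# embedding vectors, thresholded by alpha to decide which edges survive.
import numpy as np

def mahalanobis_adjacency(H, alpha):
    """H: (g, d) node embedding vectors; returns a 0/1 adjacency matrix."""
    cov = np.cov(H, rowvar=False)                 # (d, d) covariance of embeddings
    cov_inv = np.linalg.pinv(cov)                 # pseudo-inverse for stability
    diff = H[:, None, :] - H[None, :, :]          # (g, g, d) pairwise differences
    d2 = np.einsum("ijd,de,ije->ij", diff, cov_inv, diff)
    d_mahal = np.sqrt(np.maximum(d2, 0.0))
    A = (d_mahal <= alpha).astype(int)            # closer embeddings -> connected
    np.fill_diagonal(A, 0)
    return A

rng = np.random.default_rng(0)
H = rng.normal(size=(30, 8))
print(mahalanobis_adjacency(H, alpha=3.0).sum() // 2, "edges")
```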


Spatio-Temporally Integrated Forecasting Model Based on GIN and GRU

Graph Isomorphism Network
GIN is a robust method for learning graph representations, which closely approximates the performance of the graph isomorphism test (WL-test) [29,30]. It effectively distinguishes between non-isomorphic graphs. GIN excels at learning node embedding vectors, discerning patterns within graphs to enhance efficiency and capturing structural dependencies to improve performance. The framework of GIN proceeds as follows:
(1) Aggregation. The GIN model employs a summation aggregation function to compile the feature vectors $h_u^{k-1}$ from the neighboring nodes $u \in N(V_i)$ of node $V_i$, which aims to integrate the information from all adjacent nodes:
$a_{V_i}^{k} = \sum_{u \in N(V_i)} h_u^{k-1}$
where $a_{V_i}^{k}$ is the temporary aggregation result at layer k, which contains only the information of neighboring nodes.
(2) Combination. The aggregated features of the neighboring nodes are combined with the target node $V_i$ and the features from the previous layer $h_{V_i}^{k-1}$ to form a new node feature $h_{V_i}^{k}$. This process, facilitated by learnable parameters and the nonlinear transformations of a multilayer perceptron (MLP), enhances the model's ability to learn and represent complex patterns:
$h_{V_i}^{k} = \mathrm{MLP}^{k}\!\left((1 + \epsilon^{k}) \cdot h_{V_i}^{k-1} + a_{V_i}^{k}\right)$
where $\epsilon^{k}$ is a trainable parameter that allows the model to adjust the contribution of the self-loop when updating the node features, and $h_{V_i}^{k}$ is the final node feature representation after combination.
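The following NumPy sketch illustrates one GIN layer as described above (sum aggregation, a (1 + ε) self-weighting, then an MLP); the two-layer ReLU MLP and the random weights are illustrative placeholders for learned parameters.

```python
# Sketch of one GIN layer: sum-aggregate neighbour features, add the
# (1 + epsilon)-scaled self feature, then pass through a small MLP.
import numpy as np

def gin_layer(H, A, W1, W2, eps=0.0):
    """H: (g, d) node features, A: (g, g) adjacency, W1/W2: MLP weight matrices."""
    a = A @ H                                  # (1) aggregation: sum over neighbours
    combined = (1.0 + eps) * H + a             # (2) combination with the self feature
    hidden = np.maximum(combined @ W1, 0.0)    # MLP layer 1 (ReLU)
    return np.maximum(hidden @ W2, 0.0)        # MLP layer 2 (ReLU)

rng = np.random.default_rng(0)
H = rng.normal(size=(30, 16))
A = (rng.random((30, 30)) > 0.8).astype(int)
W1 = rng.normal(size=(16, 32)) * 0.1
W2 = rng.normal(size=(32, 32)) * 0.1
print(gin_layer(H, A, W1, W2, eps=0.1).shape)
```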

Gated Recurrent Unit
GRU is a variant of the recurrent neural network (RNN) designed for processing sequential data, including natural language, speech, and video. Characterized by its dual gating mechanisms, the reset gate and the update gate, GRU regulates the information flow and memory within the network [31,32]. This architecture effectively addresses the vanishing gradient problem, preserves long-term dependencies, and enhances model performance. GRU is particularly adept at extracting temporal correlations, which makes it a prime choice for the second layer in deep learning neural networks. The procedural steps are as follows (a code sketch of a single step follows):
(1) Determine the values of the update and reset gates. The update gate assesses the degree to which the hidden state from the preceding timestep is retained in the current timestep. Conversely, the reset gate regulates the proportion of the previous timestep's hidden state that is incorporated into the computation of the current state. The formulas for this step are given by
$z_t = \sigma\!\left(U_z \cdot [h_{t-1}, x_t] + b_z\right), \quad r_t = \sigma\!\left(U_r \cdot [h_{t-1}, x_t] + b_r\right)$
where $z_t$ and $r_t$ denote the values of the update gate and the reset gate, respectively. The sigmoid activation function, represented by σ, is utilized to facilitate the computation of gradients and enable effective backpropagation. The weight matrices $U_z$ and $U_r$, along with the bias vectors $b_z$ and $b_r$, play a pivotal role in this mechanism. Additionally, $h_{t-1}$ represents the hidden state from the previous timestep, while $x_t$ corresponds to the input at the current timestep.
(2) Compute the candidate hidden state. The candidate hidden state is obtained by applying the hyperbolic tangent (tanh) activation function to the current input and the previously reset hidden state:
$\tilde{h}_t = \tanh\!\left(U_h \cdot [r_t \odot h_{t-1}, x_t] + b_h\right)$
where $\tilde{h}_t$ is the candidate hidden state, $U_h$ is the weight matrix, $b_h$ is the bias vector, and ⊙ represents the element-wise multiplication operation.
(3) Compute the current hidden state. The current hidden state is computed as the weighted average of the previous hidden state and the candidate hidden state, where the weights are governed by the update gate:
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
where $h_t$ represents the hidden state at the current moment, $z_t$ denotes the value of the update gate, and $\tilde{h}_t$ is the candidate hidden state. These equations collectively illuminate the dynamic update mechanism of the hidden state at each timestep, as orchestrated by gated recurrent units (GRUs). This update methodology is intricately designed to capture long-term dependencies and complex patterns inherent in sequential data. As illustrated in Figure 4, the GRU's input-output structure includes the current input $x_t$, as well as the transmitted hidden state $h_{t-1}$ from the previous timestep, which retains essential historical information. By amalgamating $x_t$ and $h_{t-1}$, the GRU calculates the hidden node output $y_t$ for the current timestep and the hidden state $h_t$, which will be forwarded to the next timestep.
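As a minimal illustration of the gate equations above, the following NumPy sketch performs one GRU step; the weight matrices and bias vectors are random placeholders rather than learned parameters, and the concatenated-input form of the gates is an assumption for illustration.

```python
# Sketch of a single GRU step mirroring the update/reset/candidate equations.
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, U_z, U_r, U_h, b_z, b_r, b_h):
    xh = np.concatenate([h_prev, x_t])
    z_t = sigmoid(U_z @ xh + b_z)                                        # update gate
    r_t = sigmoid(U_r @ xh + b_r)                                        # reset gate
    h_tilde = np.tanh(U_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                          # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
x_t, h_prev = rng.normal(size=d_in), np.zeros(d_h)
U = lambda: rng.normal(size=(d_h, d_h + d_in)) * 0.1
b = lambda: np.zeros(d_h)
print(gru_step(x_t, h_prev, U(), U(), U(), b(), b(), b()).shape)
```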

Hybrid Model of GE-GIN-GRU Network
In our study, we explore wind speed prediction models based on GIN and GRU. These models capture both multi-site wind speed spatio-temporal correlations and wind direction spatio-temporal correlations, which facilitates accurate local wind speed predictions across various time domains. The wind direction, wind speed, temperature, and barometric pressure data are time series that are independent of one another. To incorporate the feature information affecting wind speed and prepare it as input for the GIN, we need to process this information to construct a network graph. In our study, we use the PCA-RF method to optimize the nodes. Furthermore, more accurate edge representations are obtained by employing graph embedding techniques in conjunction with the Mahalanobis distance. The graph embedding techniques are intended to augment the accuracy of the graph network within the GIN framework.
Firstly, at the node level, primary features are extracted using PCA-RF and residual features are dimensionally reduced to retain essential information.This process yields the g × D fusion matrix W, which serves as the nodes in the predefined graph G = (V,E).
Secondly, at the edge level, the predefined graph G = (V,E) is subject to aggregation and subsequent updating to obtain the node embedding vector.Subsequently, the Mahalanobis distance between node embedding vectors is employed as a discriminative criterion to establish the edges within the newly formulated graph network G ′ = (V, E ′ ).
Finally, we use the new graph network G′ = (V, E′) as the input for the GIN-GRU network. The structure of the proposed GIN-GRU network model is illustrated in Figure 5. Within this model, the GIN component primarily handles feature extraction, while the GRU network focuses on wind speed prediction. The GIN is structured with two convolutional layers, which sequentially perform the aggregation and combination operations. In the GRU network component, we observed that a greater number of GRU units, which adds to the model's depth, enhances its prediction capability. Consequently, the proposed model includes two layers of GRU networks, with 128 neurons in each layer. In each layer of the GRU network, dropout (random deactivation of units) is employed to prevent overfitting. Ultimately, the wind speed prediction vector is generated through the fully connected layer (Dense).
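A sketch of the prediction head described above is given below, assuming the GIN component has already produced a sequence of feature vectors for the target site: two GRU layers with 128 units each, dropout against overfitting, and a Dense output layer. The sequence length and the feature dimension are illustrative, not the actual configuration used in the experiments.

```python
# Keras sketch of the GRU prediction head (two GRU layers of 128 units,
# dropout, Dense output). The input is assumed to be a (timesteps, features)
# sequence already produced by the GIN feature-extraction stage.
import tensorflow as tf

timesteps, n_features = 24, 64          # placeholder sequence length / GIN output size

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.GRU(128, return_sequences=True, dropout=0.2),
    tf.keras.layers.GRU(128, dropout=0.2),
    tf.keras.layers.Dense(1),           # wind speed prediction
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```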

Results
This subsection is structured into four different parts: the first part introduces the dataset, the second part analyzes the dimension reduction results of PCA-RF on the dataset, the third part discusses the results of graph networks for graph embedding, and the fourth part examines the prediction results of different models on the same test data.

Experimental Equipment
This experiment implements the TensorFlow 2.10.0 framework in Python 3.10.9 and accelerates computation through the Compute Unified Device Architecture (CUDA). The simulation hardware platform features an Intel Core i7-10875H CPU (manufactured by Intel Corporation, Santa Clara, CA, USA) running at 2.30 GHz with 32 GB of RAM, complemented by an Nvidia GeForce RTX 2070 GPU (manufactured by NVIDIA Corporation, Santa Clara, CA, USA).

Error Assessment Criteria
For the multi-site local wind speed prediction with graph network input in the wind farm, we evaluate its effectiveness using several key metrics. These metrics serve as indicators of prediction accuracy and performance. The following evaluation metrics are employed in this simulation: mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) [33]. The formulas are expressed as follows:
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}, \quad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$
$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \quad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$
where n represents the number of samples, $y_i$ denotes the i-th observed value, and $\hat{y}_i$ represents the i-th predicted value. MSE and RMSE quantify the goodness of fit of the model: smaller values indicate better performance and reduced prediction error. However, it is important to note that MSE and RMSE are sensitive to outliers and can reflect the distribution of prediction errors. The MAE is the mean of the absolute differences between the predicted values and the true values. Unlike MSE, MAE is not strongly influenced by outliers and provides a robust measure of prediction accuracy. MAPE likewise remains largely unaffected by outliers and provides insight into the relative error across different data points. MSE and RMSE are commonly utilized for evaluating model performance, while MAE and MAPE offer alternative perspectives that take outliers into account, providing a more comprehensive assessment.
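For reference, the four metrics can be computed as in the following sketch; the observed and predicted arrays are illustrative, and the MAPE term assumes no zero observations.

```python
# Sketch of the four evaluation metrics on illustrative arrays of observed
# and predicted wind speeds.
import numpy as np

def metrics(y, y_hat):
    err = y - y_hat
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y)) * 100.0   # assumes no zero observations
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE (%)": mape}

y = np.array([5.2, 6.1, 4.8, 7.3])
y_hat = np.array([5.0, 6.4, 4.5, 7.0])
print(metrics(y, y_hat))
```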

Analysis of PCA-RF
In our study, we address the challenge of extracting meaningful meteorological features from a dataset containing 46 variables. Previous attempts using individual methods yielded inaccurate results and exhibited bias. Therefore, we employ random forest to identify weather elements that capture essential information from the original variables.
By doing so, we filter out noise and focus on the most relevant features. Subsequently, we employ PCA to downsize the remaining minor weather elements. Finally, we fuse the main features with the dimensionality-reduced data to extract the characteristic weather elements that significantly impact wind changes. Figure 6 shows the MDI of the top 13 features for windspeed_10 m, which is the feature with the largest contribution among the features of the site numbered 0. We observed a significant decline in MDI for all sites after the 10th feature. Consequently, the first 10 features are selected as the main features. Subsequently, the remaining 36 residual features undergo downsizing via PCA, which results in a new downscaled matrix with five principal components. Finally, the main feature matrix and the downscaling matrix are spliced as our feature set to build the graph network.

Evaluation of Graph Embedding for Graph Networks
To verify the effectiveness of using graph embedding to build graph networks, we compare against a model that builds graph networks solely from the Pearson correlation coefficient, i.e., the PS-GIN-GRU model. Due to the variability in the number of edges resulting from distinct threshold settings, we employ the optimally constructed network graph from each model for training purposes. The constructed graph networks were fed into the same GIN-GRU model for training, and wind speed prediction at 10 m height over 40 days was performed. The model's performance was assessed using four key metrics: MSE, RMSE, MAE, and MAPE. We compared the predicted wind speeds from several stations with the actual measurements to validate the model's effectiveness. The error metrics for wind speed prediction, as obtained from graph networks constructed by the different methods, are presented in Table 2.

Our focus was on evaluating the stability of predictions across multiple sites. The GE-GIN-GRU model integrates various neural network algorithms. Notably, this model demonstrates robustness and strong generalization ability. Models constructed solely using Pearson correlation coefficients rely exclusively on the correlation threshold method for building graph networks. While this approach is straightforward, it lacks the adaptability and complexity inherent in the GE-GIN-GRU model. We evaluated both models to assess the stability of wind speed forecasts across various sites. In particular, the GE-GIN-GRU model consistently outperformed the single correlation threshold model in terms of stability. Additionally, the GE-GIN-GRU model demonstrated robust prediction quality across diverse wind speed datasets. Our study underscores the critical role of constructing a graph that captures intricate relationships among wind speed features. By embedding vectors within this graph, we enhance accuracy in wind speed prediction. Particularly, the fusion of graph embedding techniques with neural networks significantly contributes to the model's performance and reliability. Figure 7 presents the graph networks constructed by the various models. With 20 sites in total, the diagram is divided into 20 corresponding groups. Within this diagram, each feature is depicted as a node, and features belonging to the same site are denoted by a uniform color. Figure 7 reveals that the graph networks generated through the Pearson correlation coefficient possess fewer edges compared to the one constructed via graph embeddings. Additionally, the node density within the correlation-based graph networks is comparatively lower, which leads to a more uniform distribution of edge connections. In contrast, the graph networks constructed by graph embeddings demonstrate a concentrated focus on specific nodes to capture their profound connections.

Evaluation and Analysis of GIN-GRU Neural Networks
To further validate the effectiveness of the GIN-GRU neural network, we compared it with recently popular wind speed prediction models: CNN-LSTM proposed by W. Tuerxun [16], GAT-GRU proposed by D. Aykas [34], and GAT-LSTM proposed by A. Flores [35]. For these models, we also constructed networks using the TensorFlow 2.10.0 framework in Python for training and fine-tuning. We evaluated these models using a test set of 20 stations over 40 days for wind speed prediction at each height. The input graph network, which is identical across the various models, is constructed utilizing the same graph embedding techniques. The results were summarized using four metrics: MSE, RMSE, MAE, and MAPE. The error metrics for the wind speed predictions made by the different neural network models are displayed in Tables 3 and 4.
As shown in Figure 8, we compare the wind speed predictions for heights of 10 m and 30 m using the GIN-GRU model.
Despite variations in altitude, the GIN-GRU model consistently predicts wind speed with higher accuracy compared to other models. This result validates the effectiveness of the proposed method.
An examination of wind speed forecasts using various models at the designated site for elevations of 10 m and 30 m uncovers notable trends, as illustrated in Figure 9. While other networks generally align with measured wind speed signals, they often fail to grasp the nuanced graphical structural relationships. In contrast, GIN excels in this regard by leveraging additional spatio-temporal nodes as a predictive foundation.
Overall, both GE-GIN-GRU and the other models effectively track wind speed trends. However, due to the limitations of the other models in data processing and their sensitivity to critical information, their prediction error tends to be larger during wind speed fluctuations. GE-GIN-GRU stands out by leveraging information from neighboring stations. This approach better simulates the unique wind speed patterns across winds of different heights. Notably, the predicted trend of GE-GIN-GRU closely aligns with the measured wind speed data, particularly during smooth wind speed changes and gradual increases. Additionally, during wind speed fluctuation periods, GE-GIN-GRU leverages its nonlinear fitting ability to approximate the wind speed more accurately than the other methods.

Discussion
To delve deeper into the impact of network graph alterations on wind speed prediction outcomes within graph neural networks, this study explores two primary dimensions: the variations in nodes within the input graph networks and the modifications in edges within the input graph networks.

Effect of the Different Nodes on Graph Networks
In this study, we construct the graph nodes using the primary site features obtained through PCA-RF processing. We focus exclusively on the principal features identified after RF processing as the graph's nodes and deliberately omit the residual feature matrix that is obtained following PCA dimensionality reduction for our comparative analysis. Tables 5 and 6 present the error metrics for wind speed predictions, following the application of different feature selection methods to the nodes.

The distinction between the RF and PCA-RF models is evident in the diminished quantity of nodes and edges within the graph networks, as presented in Figure 10. To ascertain the influence of varying nodes, we modulate the threshold to align the number of edges as closely as possible. Our empirical observations reveal that both models exhibit comparable accuracy levels in predicting wind speeds at an altitude of 30 m. However, at a 10-m elevation, the performance of the RF model is significantly inferior to that of the PCA-RF model. This discrepancy may stem from the presence of nodes in the residual feature matrix, obtained through the PCA dimensionality reduction, which significantly enhances the precision of wind speed predictions at the 10-m mark. Conversely, the RF model, which relies solely on the primary feature matrix, yields suboptimal predictions due to an inadequate feature set.

Effect of the Different Edges on Graph Networks
The construction of additional edges within the input graph networks does not necessarily correlate with improved prediction accuracy. By varying the threshold value, we manipulate the number of edges obtained via graph embedding, and the predictive outcomes are presented in Table 7.

The experimental data suggest that the network achieves peak predictive accuracy when it maintains a balanced number of edges, as illustrated in Figure 11. A paucity of edges compromises accuracy because the neural network lacks the requisite information for effective learning. On the other hand, an overabundance of edges leads to computational inefficiency by introducing unnecessary data, which does not translate to improved accuracy.
Furthermore, the capacity for information processing and filtration varies across graph networks constructed via disparate methodologies. Notably, when a graph network constructed via correlation amasses 1000 edges, there is a significant decline in the accuracy of neural network processing. Comparatively, when the correlation and graph embedding models are used to construct graph networks with an equivalent number of edges, the latter outperforms the former in terms of quality. This superiority is attributed to the graph embedding model's advanced proficiency in feature extraction, understanding of the dataset's topology, and advanced analysis of node relationships, which facilitates the construction of a network graph optimally suited for graph neural network learning. Moreover, the methods by which various neural networks process graph networks differ significantly. GIN exhibits superior performance over other graph neural networks in managing intricate network edges, which translates to enhanced prediction accuracy.

Conclusions
This paper presents the GE-GIN-GRU model, which synthesizes graph embeddings and neural network techniques to streamline wind speed prediction for stations within the study area. The proposed PCA-RF algorithm effectively reduces the number of features involved in the computation and improves the computational efficiency of the model. Subsequently, we employed deep learning-based graph embedding techniques to construct graph networks that capture the interrelationships among the sites. Our graph embedding methods capitalize on the strengths of both GraphSAGE and the Mahalanobis distance. The former excels at extracting intricate connections within wind speed features, forming feature vectors, while the latter demonstrates advantages in processing high-dimensional feature vectors. By fully leveraging the strengths of these two methods, we construct optimized graph networks. The GIN-GRU model seamlessly integrates diverse neural network algorithms to enhance generalization capabilities and improve prediction accuracy. Consequently, it consistently maintains excellent prediction quality and stability across wind speed datasets at varying heights. By fully leveraging the strengths of both models, we achieve deep extraction of spatio-temporal relationship features. In subsequent research, our primary goal is to construct new graph networks by combining the location coordinates of the sites with their corresponding nodes.

Figure 2. The flow chart of graph embedding.

Figure 3. The flow chart of graph creation.
Figure 4. (a) Structure of GRU; (b) flow of GRU.

Figure 6. Features of the top thirteen MDI scores.

Figure 7. Graph networks constructed by different models: (a) graph networks constructed by the Pearson correlation coefficient; (b) graph networks constructed by integrating graph embedding techniques with the Mahalanobis distance metric.
Figure 8. Result of comparison between GE-GIN-GRU wind speed prediction and real value: (a) wind speed prediction at 10 m height; (b) wind speed prediction at 30 m height.

Figure 9. GE-GIN-GRU wind speed prediction compared with other models.

Figure 10. Comparison between PCA-RF feature screening and RF feature screening.

Figure 11. Comparison of varying the number of network edges on the error metrics of wind speed prediction.

Table 1. Site dataset classification.

Table 2. Comparison results of different methods of constructing graph networks.

Table 3. Wind speed error metrics predicted by different neural networks at 10 m height.

Table 4. Wind speed error metrics predicted by different neural networks at 30 m height.

Table 5. Different feature selection methods are utilized to predict the wind speed error metrics at 10 m height.

Table 6. Different feature selection methods are utilized to predict the wind speed error metrics at 30 m height.


Table 7. The impact of varying the number of network edges on the error metrics of wind speed prediction.
