Cyberattack patterns in blockchain-based communication networks for distributed renewable energy systems: A study on big datasets

Blockchain-based reliable, resilient, and secure communication for Distributed Energy Resources (DERs) is essential in Smart Grid (SG). The Solana blockchain, due to its high stability, scalability, and throughput, along with low latency, is envisioned to enhance the reliability, resilience, and security of DERs in SGs. This paper presents big datasets focusing on SQL Injection, Spoofing, and Man-in-the-Middle (MitM) cyberattacks, which have been collected from Solana blockchain-based Industrial Wireless Sensor Networks (IWSNs) for events monitoring and control in DERs. The datasets provided include both raw (unprocessed) and refined (processed) data, which highlight distinct trends in cyberattacks in DERs. These distinctive patterns demonstrate problems like superfluous mass data generation, transmitting invalid packets, sending deceptive data packets, heavily using network bandwidth, rerouting, causing memory overflow, overheads, and creating high latency. These issues result in ineffective real-time events monitoring and control of DERs in SGs. The thorough nature of these datasets is expected to play a crucial role in identifying and mitigating a wide range of cyberattacks across different smart grid applications.


Value of the Data
• The cybersecurity research community, especially those focusing on energy and power sectors, can derive significant value from these datasets in enhancing smart grid applications.
• These rarely made-available cybersecurity datasets allow researchers to effectively distinguish between normal and abnormal system behaviors in power generation, transmission, and distribution processes.
• Analysis of these datasets is instrumental in predicting the patterns of cyberattacks, including their frequency and continuity, particularly in distributed renewable energy systems. This knowledge is crucial for designing and developing advanced solutions for anomaly detection and mitigation in the power and energy sector.
• The collaboration between the cybersecurity and energy sectors, along with other stakeholders, is essential in utilizing these datasets to fortify communication infrastructures in the smart grid. This effort is essential for protecting the privacy of employees, organizations, and customers.
• Enhancing the datasets with expert-annotated semantics also improves their credibility, trustworthiness, and access control. In remote system applications, such as those in e-health, e-transportation, e-agriculture, and other fields, this enrichment is especially helpful. Such comprehensive utilization and enhancement of the datasets promise a more secure and resilient future in these fields.

Background
The need for more energy is rising day by day, pushing electric power companies to rapidly integrate green energy sources into the smart grid using advanced information and communication technologies (ICTs) [2][3][4]. However, the ICTs in DERs are susceptible to various kinds of cyberattacks such as SQL Injection, Spoofing, Man-in-the-Middle, cloning, and others [5][6][7][8][9]. Therefore, innovative solutions are essential and must be integrated to improve the resilience, stability, and efficiency of the DERs in the SG [10][11][12][13]. Blockchain technology offers a reliable, resilient, and secure information exchange architecture for monitoring and control of DERs in the SG [14][15][16][17]. In this regard, some advanced blockchain technologies with different characteristics are listed in Table 1 [18][19][20][21] for the various types of SG applications shown in Table 2 [22]. Consequently, this study presents big cybersecurity datasets for further analyses, interpretations, and visualizations that were not fully explored in the original research, thereby enriching the understanding of the framework's efficiency in energy and power systems security. The big datasets, collected from various wind turbines in a wind farm, reveal nuanced aspects of the cybersecurity framework, contributing to a more comprehensive view of its potential and limitations. By making this extensive data and methodological information available, the data article fosters further cybersecurity research and innovation in blockchain-based infrastructure in various energy and power systems applications.

Data Description
This paper presents datasets of Solana blockchain-based IWSNs deployed for events monitoring and control in geographically distributed wind turbines in a wind farm. As part of the research methodology, real-world statistics on cyber events in Solana blockchain-based IWSNs in DERs are gathered and analyzed. These datasets contain details on different kinds of cyberattacks, their frequency, and the tactics used by attackers in energy and power systems. By examining these big datasets, researchers can identify common attack vectors, vulnerabilities, and potential weak points in the security framework of blockchain-based communication systems. For the sake of reusability, the measured cybersecurity datasets provided here are in .CSV (Comma Separated Values) format. As shown in Fig. 1, these datasets were collected and transmitted from the wind farm to the remote data center using hybrid (5G and optical fiber) communication technologies, and stored in an MS SQL server in the SG. Statically deployed sensors were involved in computing and measuring various events such as wind direction, speed, temperature, humidity, smoke, proximity, motion, cracks, current, voltage, frequency, etc.
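Because the datasets are distributed as CSV files, they can be read with standard tooling. The following is a minimal sketch of loading such a file with Python's standard library. The column names (`node`, `normal`, `attack_a`, `attack_b`) and the inline sample values are assumptions made purely for illustration and are not taken from the published files.

```python
import csv
import io

# Hypothetical layout: column names and values are placeholders for illustration,
# not the actual header or measurements of the published CSV files.
SAMPLE_CSV = """node,normal,attack_a,attack_b
n1,0.50,0.50,0.90
n2,0.40,1.20,0.40
n3,0.30,0.30,0.30
"""

def load_rows(handle):
    """Parse a CSV handle into dicts, converting every non-node column to float."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"node": record["node"]}
        for key, value in record.items():
            if key != "node":
                row[key] = float(value)
        rows.append(row)
    return rows

# An in-memory string stands in for an opened dataset file.
rows = load_rows(io.StringIO(SAMPLE_CSV))
```

To read an actual file, the same function could be called on `open(path, newline="")`, after adjusting the node column name to match the real header.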
During the monitoring and control process, various cyberattacks, including SQL Injection, Spoofing, and Man-in-the-Middle, were launched to cause data leakage, malicious tampering, and identity theft in the energy and power systems. The SQL Injection attack involves inserting malicious SQL code into a database query, allowing attackers to manipulate or steal data from the database. The Man-in-the-Middle attack allows an attacker to intercept and possibly alter the communication between two energy and power system nodes without their knowledge, potentially manipulating the data being exchanged. In a Spoofing attack, on the other hand, the attacker disguises themselves as a trusted entity to manipulate data, for example by altering the information in the cache of sensors and intelligent electronic devices or by redirecting the monitoring system's website.
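The SQL Injection mechanism described above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MS SQL server mentioned in this article. The table name, column names, and readings are hypothetical. The example contrasts a query built by string concatenation with a parameterized query.

```python
import sqlite3

# In-memory database standing in for the remote data store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (node TEXT, voltage REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("n1", 229.8), ("n2", 230.4)])

def query_unsafe(node):
    # Vulnerable: the input is concatenated directly into the SQL text,
    # so quotes inside it can change the structure of the query.
    sql = "SELECT node, voltage FROM readings WHERE node = '" + node + "'"
    return conn.execute(sql).fetchall()

def query_safe(node):
    # Parameterized: the driver passes the input strictly as a value.
    sql = "SELECT node, voltage FROM readings WHERE node = ?"
    return conn.execute(sql, (node,)).fetchall()

payload = "n1' OR '1'='1"        # classic injection string
leaked = query_unsafe(payload)   # the WHERE clause becomes always true
blocked = query_safe(payload)    # no node has this literal name
```

With the concatenated query, the payload turns the filter into a condition that is always true, so every row is returned. The parameterized form treats the same payload as an ordinary, non-matching node name.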
In the simulation studies, 40 nodes (n), each with a unique identity (e.g., the node with unique identity number 1 is denoted n1, and so on), were randomly selected to study the cyberattack patterns in the smart grid. Measurements are taken in real time at intervals of 30 min, and the values measured in the under-attack networks are given in Tables 3-7, with their graphical representations shown in Figs. 2-6. In addition, the values presented in Tables 3 to 7 were converted from Megabits per second (Mbps) to Gigabits per second (Gbps) for a clearer presentation.
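The unit conversion mentioned above can be sketched as follows, assuming the decimal (SI) convention in which 1 Gbps equals 1000 Mbps. The article does not state which convention was used, and the input values below are illustrative only.

```python
MBPS_PER_GBPS = 1000.0  # assumed SI decimal prefix; not stated in the article

def mbps_to_gbps(value_mbps):
    """Convert a rate in Megabits per second to Gigabits per second."""
    return value_mbps / MBPS_PER_GBPS

# Illustrative inputs, not values from the datasets.
converted = [mbps_to_gbps(v) for v in (1500.0, 250.0)]
```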
Table 3 illustrates the datasets for creating key (CrK), decryption (DeC), and signature (SiG) operations in the Solana blockchain-based IWSNs. For the randomly selected nodes in the SG, the latency of CrK varies between a maximum of 3.80 and a minimum of 0.03, the latency of DeC between 1.74 and 0.0013, and the latency of SiG between 1.44 and 0.01. The data presented in Table 3 highlight that the CrK latency is higher than that of both DeC and SiG in the SG. The DeC latency, in turn, is slightly higher than the SiG latency, and most of the time the two overlap, as shown in Fig. 2.
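The comparison above can be reproduced with a short sketch. Only the maximum and minimum latency values quoted in the text for Table 3 are used here. The full per-node series from the CSV files would be substituted in practice, and the helper name `latency_ranges` is a hypothetical one.

```python
# Extremes quoted in the text for Table 3 (maximum, minimum) per operation.
reported = {
    "CrK": [3.80, 0.03],
    "DeC": [1.74, 0.0013],
    "SiG": [1.44, 0.01],
}

def latency_ranges(samples):
    """Return {operation: (max, min)} for each list of latency samples."""
    return {op: (max(vals), min(vals)) for op, vals in samples.items()}

ranges = latency_ranges(reported)
# Order the operations from highest to lowest peak latency.
ranking = sorted(ranges, key=lambda op: ranges[op][0], reverse=True)
```

Sorting by peak latency places CrK first, followed by DeC and then SiG, which matches the ordering described in the text.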
Table 4 presents the datasets for updating smart contracts (UsC), signature verification (SiV), and encryption (EnG) operations in the Solana blockchain-based IWSNs, and the corresponding latency trends are shown in Fig. 3.

Case (i): Table 5 indicates the network resilience datasets when the nodes are involved in malicious activity under a single type of cyberattack (SQL Injection) introduced by the adversary in the Solana blockchain-based IWSNs. The first column in Table 5 shows the normal data shared between different nodes during events monitoring and control in the DERs, whereas columns 5(a) and 5(b) show the malicious activity between specific nodes in the data-sharing process. The highlighted datasets in columns 5(a) and 5(b) represent the cases in which 50% and 70% of the nodes in the network, respectively, are involved in malicious activities in the DERs, and they show that the data values change frequently under a single kind of cyberattack. After analyzing the datasets of randomly selected nodes with unique identities, e.g., n9 and n39, it is noticed that in some instances the data shared between nodes over a communication link is higher than the data packets generated in the network, while in other instances it is extremely low compared to the data packets generated. Such types of cyberattacks may lead to memory overflow and invalid data packet issues in the Solana blockchain-based IWSNs. The impact of a single type of attack on network resilience is shown in Fig. 4.

Case (ii): Table 6 highlights the network resilience datasets when the nodes are involved in malicious activity under up to two cyberattacks (Spoofing and Man-in-the-Middle) introduced by the adversary in the Solana blockchain-based IWSNs. The first column in Table 6 shows the normal data shared between nodes during events monitoring and control, while the highlighted datasets in columns 6(a) and 6(b) illustrate the cases in which 60% and 80% of the nodes in the network, respectively, are involved in malicious activities in the SG. The highlighted columns show the frequent change in data values when the nodes are involved in malicious activities under multiple cyberattacks in the DERs. In such cases, we notice several malicious activities of the nodes, including (i) bulk data packets being shared between nodes to create memory overflow and bandwidth utilization issues, (ii) invalid data packets being shared between nodes to create system monitoring and control issues, and (iii) empty data packets being routed between the nodes to enlarge overheads in the network. These observations are made by considering the malicious activities of specific nodes with unique identities, e.g., n14, n62, n84, n170, etc. The impact of multiple cyberattacks on network resilience is shown in Fig. 5.
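The three malicious behaviors listed for Case (ii) (bulk, invalid, and empty data packets) suggest a simple rule-based labeling step. The sketch below is one possible illustration and is not the method used to produce the datasets. The `Packet` type, the size threshold, and the `valid` flag (assumed to come from some upstream integrity check) are all hypothetical.

```python
from dataclasses import dataclass

BULK_THRESHOLD_BYTES = 1024  # assumed cutoff; not derived from the datasets

@dataclass
class Packet:
    node: str       # hypothetical sender label
    payload: bytes  # raw packet content
    valid: bool     # result of an upstream validity check (assumed to exist)

def classify(packet):
    """Label a packet as 'empty', 'invalid', 'bulk', or 'normal'."""
    if len(packet.payload) == 0:
        return "empty"      # adds overhead without carrying data
    if not packet.valid:
        return "invalid"    # disturbs monitoring and control
    if len(packet.payload) > BULK_THRESHOLD_BYTES:
        return "bulk"       # stresses memory and bandwidth
    return "normal"

# Illustrative packets from hypothetical senders.
packets = [
    Packet("a", b"", True),
    Packet("b", b"\x00" * 4096, True),
    Packet("c", b"garbage", False),
    Packet("d", b"temp=21.5", True),
]
labels = [classify(p) for p in packets]
```

The checks are ordered so that an empty payload is reported as empty even when it would also fail validation. A different ordering could be chosen depending on which symptom matters most for the analysis.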

Table 6
Network resilience datasets when the network is attacked by spoofing and man-in-the-middle cyberattacks in Solana blockchain-based IWSNs.

Case (iii): Table 7 highlights the network resilience datasets when the nodes are involved in malicious activity under more than two and up to five cyberattacks (SQL Injection, Spoofing, and Man-in-the-Middle) launched by the adversary in the Solana blockchain-based IWSNs. In Table 7, the highlighted datasets in columns 7(a) and 7(b) illustrate the cases in which 80% and 95% of the nodes in the network, respectively, are involved in malicious activities in the SG. In the case of multiple cyberattacks, we noticed several malicious activities of the nodes, including the aforementioned ones as well as (i) data packets embedded with misleading information being shared between the nodes to mislead the control of the power generation and distribution systems, and (ii) data packets with missing information being shared between the nodes to cause loss of control of the smart grid. These observations were made by considering the malicious activities of specific nodes with unique identities, e.g., n14, n62, n84, n170, n196, etc. The impact of multiple cyberattacks on network resilience is shown in Fig. 6.

Experimental Design, Materials, and Methods
In this study, a Fedora 32 virtual machine installed on a local server, with the programming tools Metaplex and Rust, is used to simulate the blockchain architecture in combination with RTDS/OPAL-RT in the smart grid. In the wind farm, each wind turbine was equipped with at least 9 multifunction sensors for temperature, humidity, smoke, proximity, motion, cracks, current, and voltage measurements in the energy and power systems. The path loss model [23] is used to simulate a point-to-point communication environment in each wind turbine located in different regions in the SGs. In addition, the positioning method [24] is employed to find the appropriate location of each node in the system, along with perfect synchronization between power equipment and nodes in the Solana blockchain-based IWSNs [25]. In addition, the missing or manipulated data values of a sensor node involved in events monitoring were obtained using the neighboring nodes matrix technique, in which the average data flow of the neighboring nodes j = 1, ..., n is observed in an event region at time t in the SG.

Limitations
There are some limitations with the datasets. First, the extent and variety of the datasets may not adequately cover all types of stealthy cyberattack scenarios, particularly the new ones. Therefore, it would be advantageous to generate synthetic datasets using machine learning techniques and integrate them with the given datasets to encompass a broader spectrum of attack vectors and novel forms of cyberthreats in various energy and power system applications. Second, because the cybersecurity landscape is changing quickly, the datasets may not be sufficient to adequately represent all types of network setups and user habits in diverse cyberattack environments in the smart grid. Therefore, enhancing the datasets to encompass a wider range of real-world network infrastructures might further improve the blockchain-based communication networks for power generation, transmission, and distribution systems. In future studies, researchers might explore these issues to address cybersecurity challenges in large-scale distributed energy and power systems.

Fig. 2. The relationship between the number of nodes and the time spent on running creating key, decryption, and signature operations in the smart grid.

Fig. 3. The relationship between the number of nodes and the time spent on updating smart contracts, signature verification, and encryption operations in the smart grid.

Fig. 4. Case (i): the relationship between the number of compromised nodes and the network resilience in the smart grid.

Fig. 5. Case (ii): the relationship between the number of compromised nodes and the network resilience in the smart grid.

Fig. 6. Case (iii): the relationship between the number of compromised nodes and the network resilience in the smart grid.

Table 1
Blockchain technologies for IWSNs in smart grid applications.

Table 2
Communication requirements for blockchain-based IWSNs in smart grid applications.

Table 3
Datasets for creating key, decryption, and signature operations in Solana blockchain-based IWSNs.

In Table 4, the latency values of UsC vary between a maximum of 1.44 and a minimum of 0.64 for the randomly selected nodes in the SG. On the other hand, the maximum and minimum latency values of SiV and EnG are 134 and 0.093, and 0.1053 and 0.086, respectively. The data presented in Table 4 clearly show that the EnG latency is low compared to both UsC and SiV in the SG. Most of the time, the EnG and SiV latency values overlap each other in the SG. The UsC latency is high compared to both SiV and EnG, as highlighted in Fig. 3.

Table 4
Datasets for updating smart contracts, signature verification, and encryption operations in Solana blockchain-based IWSNs.

Table 5
Network resilience datasets when the network is attacked by SQL injection cyberattack in Solana blockchain-based IWSNs.

Table 7
Network resilience datasets when the network is attacked by SQL injection, spoofing, and man-in-the-middle cyberattacks in Solana blockchain-based IWSNs.