Multilayer cyberattacks identification and classification using machine learning in internet of blockchain (IoBC)-based energy networks

The world's need for energy is rising due to factors like population growth, economic expansion, and technological breakthroughs. However, burning gas and coal to meet this surge in demand has major consequences. Although these fossil fuels are still essential for meeting energy demands, their combustion releases large amounts of carbon dioxide and other pollutants into the atmosphere. This jeopardizes community health and exacerbates climate change, so it is essential to move swiftly to incorporate renewable energy sources by employing advanced information and communication technologies. However, this change raises several security issues, emphasizing the need for innovative cyber threat detection and prevention solutions. Consequently, this study presents big datasets obtained from solar- and wind-powered distributed energy systems through blockchain-based energy networks in the smart grid (SG). A hybrid machine learning (HML) model that combines the characteristics of Deep Learning (DL) and Long Short-Term Memory (LSTM) models is developed and applied to identify the unique patterns of Denial of Service (DoS) and Distributed Denial of Service (DDoS) cyberattacks in the power generation, transmission, and distribution processes. The presented big datasets are essential and help significantly in identifying and classifying cyberattacks, leading to accurate prediction of energy system behavior in the SG.


Value of the Data
• The data provide insights into cyberattacks, helping in the development of predictive models that can anticipate future threats and vulnerabilities in energy and power systems.
• Data analysis helps in customizing security protocols and measures tailored to specific threats and vulnerabilities in distributed energy systems, enhancing overall system security.
• The data contribute to a better understanding of the current security posture of energy and power systems, aiding strategic decision-making and resource allocation for cybersecurity.
• Cybersecurity and smart grid agencies, along with other stakeholders, can leverage these datasets to develop a more intelligent and resilient data exchange network. This forward-thinking strategy will help in identifying and mitigating different types of cyberattacks, protecting the confidentiality of employees, companies, and clients.

Background
The increasing global demand for energy is met primarily by fossil fuels like coal and gas, leading to significant emissions of carbon dioxide and pollutants [2,3]. This poses health risks and intensifies the problem of climate change. To address these challenges, it is crucial to incorporate green energy sources like hydropower, wind, and solar power in the smart grid by using blockchain-based advanced information and communication technologies [4][5][6]. Table 1 highlights various characteristics of blockchain technology in the smart grid [7][8][9]. However, there are several critical cybersecurity issues, which bring unique challenges to the reliability, stability, and resilience of the smart grid [10,11]. In recent years, the scientific community has acknowledged the importance of machine learning technology, since it plays a crucial part in predicting current and future behavior in various industrial applications. In the smart grid, machine learning can help analyze unique data patterns to identify and classify the behavior of different cyberattacks and thereby improve the behavior of energy systems [12][13][14]. Table 2 highlights various types of machine learning algorithms, their strengths, weaknesses, and potential applications [15][16][17]. Consequently, this study presents cybersecurity datasets collected from wind turbines and solar panels in energy systems, which were not fully explored previously [18][19][20]. These datasets offer new opportunities for analysis and visualization, enhancing understanding of a cybersecurity framework's effectiveness in energy and power systems. The comprehensive data contribute to evaluating the cybersecurity framework's potential and limitations, encouraging further research and innovation in the smart grid.

Data Description
This study presents datasets obtained from IoT-enabled, Solana blockchain-based IWSNs and IEDs deployed to monitor and control events across spatially dispersed solar panels and wind turbines in the SG. The datasets offer an in-depth examination of various cyberattack modalities, delineating their occurrence rates and elucidating the strategies employed by attackers targeting critical energy and power infrastructures. The data gathering mechanism involved statically deployed nodes tasked with continuously monitoring and recording a wide array of environmental and operational parameters, including wind direction, velocity, ambient temperature, humidity levels, smoke detection, proximity, motion, structural integrity (cracks), electrical current, voltage, and frequency metrics. As depicted in Fig. 1, the data acquisition process involved collecting energy and power systems data from the solar park and wind farm and transmitting it directly to the remote data center over a hybrid communication infrastructure that combines 5G and optical fiber technologies in the SG. Subsequently, the collected data is securely stored on an MS SQL server situated in the SG. To ensure the datasets' applicability and facilitate their future use, they have been structured in the .CSV format (available at: https://data.mendeley.com/datasets/zc9z7m7gcd/1).
Throughout the surveillance and management phases, the network was exposed to a series of DoS and DDoS cyberattacks aimed at compromising data integrity, effecting unauthorized data manipulation, and usurping identity verification of the energy and power systems, including the users and utilities. A DoS attack is a cyberattack in which the attacker seeks to make a power system or energy network resource unavailable to its intended users or neighboring devices by temporarily or indefinitely disrupting the services of a host connected to the internal and external network. This is achieved by overwhelming the target with a flood of requests or packets, causing the system to slow down or crash, thereby denying service to legitimate users and systems. The attack can be executed from a single internal or external internet connection, targeting one or more websites, servers, or other resources. Common methods include flooding the network to block legitimate traffic, disrupting connections between two machines, preventing access to a service, or exhausting resources in a targeted device. A DDoS attack is similar to a DoS attack, but it originates from multiple, often thousands of, internal or external sources. This makes it much harder to stop, because blocking a single source does not stop the attack. In a DDoS attack, a network of intelligent devices such as computers (often part of a botnet) is used to flood the target with an overwhelming amount of traffic. This can include requests for connections, messages, or malformed packets, with the goal of exhausting the target's resources. DDoS attacks can be volumetric (increasing traffic to saturate the bandwidth), protocol attacks (targeting network layer protocols), or application layer attacks (targeting web applications with seemingly legitimate requests). DDoS attacks are generally more complex and difficult to mitigate than DoS attacks because they involve multiple distributed sources in distributed energy and power systems. Both types of attacks can cause significant damage to existing energy and power infrastructures by disrupting services, causing financial loss, and damaging the reputations of energy utilities.
To identify these imperceptible cyberattacks, a hybrid machine learning model, shown in Fig. 2, is designed to rigorously analyze the collected big datasets, uncover their recurring patterns, and highlight their inherent vulnerabilities in the smart grid.
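The distinction drawn above between a single-source DoS flood and a multi-source DDoS flood can be illustrated with a minimal Python sketch. It is not the detection method used in this study: the function name `classify_window` and all three thresholds are hypothetical values chosen only for illustration, and the rule simply counts requests per source within one time window.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration (not from the study).
PER_SOURCE_LIMIT = 100   # max requests per window tolerated from one source
TOTAL_LIMIT = 500        # max requests per window tolerated in aggregate
MIN_DDOS_SOURCES = 10    # distinct sources needed to call a flood "distributed"


def classify_window(source_ids):
    """Label one time window of requests as 'normal', 'dos', or 'ddos'.

    source_ids: iterable with one source identifier per received request.
    """
    counts = Counter(source_ids)
    if not counts:
        return "normal"
    total = sum(counts.values())
    _, top_count = counts.most_common(1)[0]
    # A single source exceeding its own limit looks like a DoS flood.
    if top_count > PER_SOURCE_LIMIT:
        return "dos"
    # Many sources, each individually modest, that together saturate the
    # target look like a DDoS flood.
    if total > TOTAL_LIMIT and len(counts) >= MIN_DDOS_SOURCES:
        return "ddos"
    return "normal"
```

For example, a window in which 50 hypothetical nodes each send 20 requests stays under the per-source limit but exceeds the aggregate limit, which is why blocking any one source would not stop the flood.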
In the initial phase, data is collected from various devices through a blockchain-based communication network and stored in the input data storage at the data center, as shown in Fig. 2. Following initial preprocessing, attributes and weights for each metric are determined to assess changes in the original data values. The proposed HML model employs DL and LSTM techniques to analyze big datasets and identify the precise patterns, together with their initial true values, in the smart grid. The DL model uses CNN layers to process raw data, enabling vital features to be identified without manual intervention. The CNN architecture consists of convolutional layers, max pooling layers, and sequential layers. Convolutional layers are mainly responsible for feature extraction, while the max pooling layers reduce the dimensionality of the datasets and increase robustness. Sequential layers, which form a layered structure, simplify the development of linearly connected neural networks, thereby making it more efficient to recognize patterns and structures in the energy systems datasets. LSTM, a specialized type of recurrent neural network, is proficient at handling sequences and understanding long-term dependencies, making it well suited to time-series energy systems datasets in the smart grid. It is capable of analyzing temporal dynamics and learning from event sequences to either predict future outcomes or classify anomalies. The process starts with the preprocessing of energy and power systems data, during which relevant features are extracted using the CNN technique. The extracted data is then fed into an LSTM model, where it is analyzed to assess temporal dependencies and sequential patterns. The LSTM model is specifically designed to refine the energy and power systems datasets, leveraging its temporal analysis capabilities to significantly improve the relevance and accuracy of the data values. This step is vital for processing the data with high precision and ensuring that it reflects the most pertinent information with the highest accuracy. The combination of DL for spatial feature extraction and LSTM for modeling time-dependent aspects advances the detection of anomalies in complex datasets of energy systems. Finally, the updated information is stored in the output data storage as results, completing the data processing cycle.
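The data flow described above (convolution for feature extraction, max pooling for dimensionality reduction, then an LSTM over the resulting feature sequence) can be sketched in plain Python. This is only an illustration of the ordering of the stages, not the trained HML model of this study: the function names, the scalar single-unit LSTM cell, the kernel, and the weight values are all assumptions made for the example.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as used in CNN layers)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def max_pool(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights):
    """One step of a scalar LSTM cell.

    weights maps each gate name to a (w_x, w_h, bias) tuple; the values
    used here are illustrative, not trained parameters.
    """
    def gate(name, activation):
        w_x, w_h, bias = weights[name]
        return activation(w_x * x + w_h * h + bias)

    f = gate("forget", sigmoid)    # how much of the old cell state to keep
    i = gate("input", sigmoid)     # how much of the candidate to write
    o = gate("output", sigmoid)    # how much of the cell state to expose
    g = gate("cell", math.tanh)    # candidate cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def cnn_lstm_score(signal, kernel, pool_size, weights):
    """Run conv -> max pool -> LSTM and return the final hidden state."""
    features = max_pool(conv1d(signal, kernel), pool_size)
    h, c = 0.0, 0.0
    for x in features:
        h, c = lstm_step(x, h, c, weights)
    return h
```

In this sketch a difference kernel such as `[1, -1]` responds to abrupt changes between consecutive samples, and the final hidden state always lies strictly between -1 and 1. In practice the kernel and gate weights would be learned from labeled data, typically with a deep learning framework, rather than set by hand.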
Eq. 1 shows the multidimensional input data D, arranged as a matrix, received from different energy systems at a given time step in the smart grid. The data is collected from various kinds of sensors installed on different distributed energy and power systems operating on a blockchain-based communication network in the smart grid (as discussed in detail in the Experimental Design, Materials, and Methods section). Eq. 2 illustrates the DL method, which is applied to the received data matrix at that time step; the output of this process is then forwarded to the LSTM model at the next time step. This significantly improves the capacity to identify anomalies within complex energy and power systems datasets in the smart grid. Lastly, Eq. 3 describes how the output data obtained from both the DL and LSTM models is stored in the data storage as output, so that changes relative to the original datasets can be observed at the next time step in the smart grid.
Fig. 3 shows a detailed view of the multidimensional data collection process from a variety of IEDs and sensors in DERs in the SG. In Fig. 3, the X-axis shows the received data signal values, with the upper limit set to 0.8%, while the Y-axis shows the time domain between 0 and 0.5 s in the SG. Figs. [...] storage devices play within the smart grid. Given the erratic nature of wind and solar power output, the data flow highlighted here is crucial for maintaining a balance between the supply and demand of energy. The information is strategically routed, as indicated by the distinct black line, so that energy storage is managed as efficiently as possible to maintain grid stability and dependability. Fig. 3(e) provides a broad overview of the data integration and flow from all wind and solar powered systems, such as storage, wind, and solar power, to the smart grid control center. This figure highlights the intricate, multi-domain data gathering tactics and shows how operational, maintenance, and performance data are combined to provide information for demand response and real-time grid management. The black lines highlight how the smart grid can adaptively control energy production, storage, and distribution in response to changing demands and conditions. They also show how harmonized data flow occurs across various time scales and frequencies.
Figs. 3(f) to (g) illustrate the consequences of DoS attacks aimed at exposing vulnerabilities in the distributed energy and power systems. Initially, the data transmission between energy systems proceeds normally for the first few seconds. However, as time progresses, the integrity of the received data deteriorates; it becomes increasingly difficult to discern the values within the timeframe of 0.081 seconds to 0.3 seconds due to manipulation. Subsequently, the data signals return to normal as the blockchain algorithm initiates an immediate recovery process for the compromised nodes after receiving input from the hybrid machine learning algorithm, effectively isolating them in the network. A similar process is repeated in different time domain cycles in the other figures. However, the impact of DDoS attacks is observed to be more severe on the DERs in the SG, as illustrated in Fig. 3(h) to (n). Initially, there is a noticeable deviation in the data transmission performance among different energy systems. The integrity of the received data deteriorates significantly, making it challenging to accurately discern values within the timeframe of 0.081 seconds to 0.3 seconds due to manipulation, as demonstrated in Fig. 3(h).
Subsequently, the data signals return to normal as the blockchain algorithm, reinforced by insights from the hybrid machine learning algorithm, initiates an immediate recovery process for the compromised nodes, thereby effectively isolating them in the network. This recovery mechanism is consistently applied in various time domains in all subsequent figures. In both cases, the proportion of compromised nodes remained below 51%, enabling the blockchain algorithm, with the assistance of the hybrid machine learning algorithm, to begin recovering the compromised nodes in the DERs. Finally, the most severe scenario, combining both DoS and DDoS attacks, is depicted in Fig. 3(o), where all data packet signals are corrupted when received at the control center. In this case, it becomes significantly more challenging for the blockchain algorithm to facilitate recovery in as short a time as observed in the previous figures, although the machine learning algorithm still identifies the cyberattacks. The primary reason for the slow recovery is that the number of compromised nodes rises beyond the 51% threshold, complicating the system's ability to recover autonomously from the DoS and DDoS cyberattacks in the smart grid.
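The role of the 51% threshold described above can be made concrete with a small Python sketch. It only illustrates the decision rule stated in the text (fast recovery while the compromised share stays below 51%); the function names and the node identifiers are hypothetical, and treating a share of exactly 51% as "not fast" is an assumption, since the text does not specify that boundary case.

```python
RECOVERY_THRESHOLD = 0.51  # share of compromised nodes cited in the text


def compromised_fraction(node_flags):
    """node_flags: dict mapping node id -> True if flagged as compromised."""
    if not node_flags:
        return 0.0
    return sum(node_flags.values()) / len(node_flags)


def plan_recovery(node_flags):
    """Return (fast_recovery_possible, sorted list of nodes to isolate).

    Assumption: a share of exactly 51% is treated as above the threshold.
    """
    fraction = compromised_fraction(node_flags)
    to_isolate = sorted(node for node, bad in node_flags.items() if bad)
    return fraction < RECOVERY_THRESHOLD, to_isolate
```

With four hypothetical nodes of which one is flagged, the compromised share is 25% and the flagged node is returned for isolation; with three of four flagged, the share is 75% and fast recovery is reported as unavailable.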

Experimental Design, Materials, and Methods
The simulation model depicted in Fig. 4 consists of a network that incorporates several sensor nodes in the smart grid. These nodes are crucial for gathering data from multiple energy and power systems in the smart grid. The data collected by the nodes is transmitted to a centralized data storage server, which serves as the core for information management and aggregation. The nodes are connected to the data storage server over a blockchain-based wireless network architecture, offering continuous data transfer to the central repository [21,22]. In a hybrid topology, nodes sense real-time measurements including voltage, current, signal strength, network traffic, and bandwidth usage, along with power parameters. This setup allows for continuous monitoring of the status of the energy and power systems. Furthermore, the nodes collect information on the status of critical equipment and generate event reports that detail system events and faults. They also enable the timely transmission of control commands for device management. Additionally, the nodes gather security metrics to guarantee the integrity of data transmission and evaluate the connectivity status of devices across both wired and wireless networks. In sum, the data gathered by the nodes enables a comprehensive understanding of the smart grid, enhancing the monitoring, control, and optimization of its components. Consequently, advanced analytical methods such as machine learning (as explained in the Data Description section) can be employed to identify patterns and anomalies in the data impacted by DoS and DDoS cyberattacks. This enables the early detection of potential security breaches or unauthorized access attempts in power and energy systems. This setup not only ensures the integrity and reliability of the data collected but also supports efficient and secure data handling. Moreover, the data storage server is directly linked to the RTDS, a tool vital for simulating real-time operations. The connection, established over an internal network, underscores the effective collaboration between the simulation tool and the data storage system. The RTDS is essential for modeling power and energy sources, such as wind turbine and solar panel models, in the smart grid. The administrator has access to all of this updated information and can therefore take timely and crucial actions to mitigate cybersecurity threats in energy and power systems. In this manner, it lays the groundwork for a reliable, resilient, and secure energy infrastructure that benefits consumers, businesses, and society at large. In addition, the blockchain architecture is simulated with the help of the programming tools C++, Java, and Rust installed on a virtual computer running Fedora 32. The path loss model [23], synchronization between nodes [24], and positioning technique [25,26] were employed to identify the locations of the energy systems and nodes in the SG. The simulation parameters and their values used in this study are given in Table 3.

Fig. 3. Multidimensional data collection using IEDs and sensors in DERs in SG.

Fig. 4. Simulation design and testing in smart grid.

Table 1
Various blockchain techniques in smart grid.

Table 2
Machine learning techniques with their applications, strengths, and weaknesses in smart grid.

Table 2 (continued)
Technique: Hierarchical Clustering
Description: A method of cluster analysis which seeks to build a hierarchy of clusters
Applications: • Gene sequence analysis • Social network analysis • Market segmentation
Strengths: • Does not require a pre-specified number of clusters • Easy to interpret and visualize • Can capture complex structures
Weaknesses: • Scalability issues with large datasets • Sensitive to noise and outliers • Finding the optimal number of clusters can be subjective

Table 3
Simulation parameters and values.