Link Characterization and Edge-Centric Predictive Modeling in an Ocean Network

One of the critical problems fishermen face in deep-sea fishing is the lack of low-cost communication mechanisms to the shore. The Offshore Communication Network (OCN) is a network of fishing vessels at sea whose goal is to provide wireless internet over the ocean. The impact of extreme weather conditions on wireless signals, the inability to deploy additional infrastructure, the movements induced by sea waves, the expanded mobility freedom at sea, and the misalignment of directional antenna links are all unique challenges that cause abrupt signal quality fluctuations in OCN. For this reason, it is necessary to integrate near real-time link quality assessment to improve the resilience of communication. This paper examines the characteristics of marine wireless links and the factors impacting communication using data collected through sea-trial experiments involving multiple fishing vessels. The paper proposes a Bayesian framework for forecasting signal strength by employing historical and real-time data. This hybrid learning integrates offline and online probabilistic learning methods to provide intelligence at edge devices. The evaluation of the learning scheme on real datasets and the comparison with baseline methods under different communication contexts show improved predictive accuracy in OCN links.

to deploy additional infrastructure, and the misalignment of directional antennas. Even when an ad-hoc network has been established, nodes may experience unpredictable movements due to sea waves. The topology may change rapidly due to the antenna orientation, the rocking movement of vessels, and the propagation effects leading to abrupt changes in the link quality. Therefore, integrating near real-time link quality evaluation is crucial to ensure adequate connectivity among nodes.

A. ARCHITECTURE OF OCN
The offshore communication network is a heterogeneous network of fishing vessels to provide Internet over the ocean [1]. This network uses a distributed architecture integrated with edge computing. The architecture considers fishing vessels as edge nodes that process the data collected locally, avoiding the dependency on the base station for analysis. This approach helps adapt to the ocean environment's dynamic and extreme variability. Figure 1 presents the architecture. A comprehensive description of the communication architecture and the routing schemes of OCN can be found in companion papers [1], [3], [4], [5], [6].
OCN nodes are categorized into three groups based on the communication mechanisms available in the fishing vessels: access nodes, adaptive nodes, and supernodes. Access nodes are vessels that provide only a wireless access router (AR); adaptive nodes hold one adaptive back-haul equipment (ABE), while supernodes hold two ABEs and one AR. The latter two types of nodes are also designated as long-range (LR) nodes. In fishing vessels, each AR is equipped with an omnidirectional antenna that provides a Wi-Fi signal within 500 m and connects devices such as smartphones, tablets, and other nearby ARs. The ABEs are equipped with 120° sector antennas that provide connectivity up to 20 km using long-range Wi-Fi links.
The architecture of OCN is divided into three layers: Layer 0 is a mesh network of access nodes that communicate through Wi-Fi links; Layer 1 is the ad-hoc backbone network of LR nodes; Layer 2 is the network of base stations. Machine learning techniques on offline data are employed in Layer 2 nodes; the models obtained there are sent to the edge nodes in Layer 0 and Layer 1. These edge nodes update offline model parameters using real-time data. A pilot implementation of the OCN architecture has been evaluated over the Arabian Sea from a coastal village in the state of Kerala, India. In the field tests, the network provided a range of at least 50 km in the first hop and 20 km in every subsequent hop.

B. OUR CONTRIBUTIONS
The wireless link characteristics of different ocean regions are unknown; to our knowledge, there are currently no studies on signal strength analysis in maritime networks. Extensive marine experiments on multiple fishing vessels provided data for uncovering the dynamic characteristics of wireless links. We used these data to analyze the characteristics of marine wireless links and proposed a machine learning framework to predict link quality. Many predictive models have been applied to terrestrial networks, but those networks exhibit less environmental dynamism than the OCN context. Most existing prediction methods assume restricted node mobility in the network. In OCN, variations between real-time and historical data are more likely in rough sea states. Hence, existing deterministic models are poorly suited to predicting signal behavior. Instead, the OCN scenario requires a context-dependent prediction strategy that utilizes historical and real-time data depending on the dynamics of the signal variance.
In this research, we first study the factors that influence wireless signal strength in the ocean using data collected from sea trials. Second, we analyze the signal strength characteristics in different ocean regions to understand the signal degradation with distance. Finally, we develop a hybrid prediction model based on the identified factors and communication regions. The main contributions of the paper, shown in Figure 2, are the following:
• The factors influencing radio signal strength in maritime communication are analyzed using the data collected from marine experiments in the Arabian Sea.
• A statistical analysis of the impact of wave-induced mobility on signal strength is performed.
• A detailed analysis of signal degradation with distance is performed in different ocean regions, and the connectivity levels are categorized.
• A real-time link quality prediction system is proposed to reduce the impact of environment dynamics. The hybrid Bayesian learning framework that utilizes historical and real-time data is designed to forecast the received signal strength variations in OCN links.

C. ORGANIZATION OF THIS ARTICLE
The rest of this paper is organized as follows: Section II reviews previous works related to link analysis and prediction models for wireless networks. Section III describes the factors impacting signal strength in the marine environment, while Section IV examines signal characteristics in different ocean regions. Section V introduces the hybrid Bayesian learning framework for signal prediction. Section VI presents experimental results, followed by the concluding remarks in Section VII.

II. RELATED WORK
This section summarizes the state-of-the-art link prediction techniques applied in terrestrial wireless networks. Several studies have investigated the link quality estimation problem in ad-hoc and mesh networks; interested readers can refer to the review of Lowrance and Lauf [7]. Metrics used for link estimation can be classified into physical, logical, or hybrid [8]. Received signal strength, link quality, and signal-to-noise ratio are some of the physical layer metrics that can be directly obtained or computed from the radio hardware measurements. Logical metrics such as packet success rate and expected transmission count (ETX) are instead obtained by measuring the number of packets delivered successfully. Hybrid approaches combine physical and logical data [9]. Three main approaches for empirical link prediction exist: analytical modeling, packet-counting approaches, and statistical models. In analytical modeling, empirical measurements and theoretical models are used to estimate the probabilistic loss in a wireless channel [10]. These models are complex and based on assumptions that may not correspond to real-world conditions.
The packet-counting approach estimates the link quality by counting the number of successful and failed transmissions. To get statistics on transmission success, periodic verification of channels through probe packets is required [11], [12]. ETX and its variants are available for link quality estimation in ad-hoc networks. Draves et al. proposed a routing metric based on expected transmission time [13]. Although probe-based approaches are simple, they generate message overhead. Passive methods reduce bandwidth utilization by monitoring the MAC level traffic; MAC coordination, however, is a challenging task [14], [15]. Markovian models were also widely applied for channel quality prediction in various wireless network applications under different channel conditions [16], [17]. In the statistical prediction method, the channel features are measured and are statistically mapped to link quality [18].
Nowadays, machine-learning approaches are widely used to capture the dynamism of wireless links and to predict link quality [15], [19], [20]. Supervised, unsupervised, and reinforcement learning strategies have been applied. A signal-to-noise ratio (SNR) based pattern-matching method to predict link quality is discussed by Farkas et al., assuming that the link data has a repetitive SNR pattern [21]. Many batch-supervised machine learning techniques such as support vector regression (SVR), k-nearest neighbor (k-NN), logistic regression, and regression trees (RT) have been applied in different wireless network applications to predict link quality. Kudelski et al. used SVR to predict the packet reception ratio as a link quality measure in a robotic network [22]. Liu et al. used both physical and link-layer information to predict the reception probability of the next packet using a naive Bayes classifier, logistic regression, and artificial neural networks [23]. Feng et al. proposed a prediction method using fuzzy C-means clustering and XGBoost for wireless sensor networks [24]. The authors compared current values of SNR with historical data for prediction in an 802.11 ad hoc network. Cacciapuoti et al. proposed a neural network model for estimating ETX [25]. The link prediction performance of RT, SVR, and k-NN was compared by Millan et al. in wireless mesh community networks [26].
As wireless link characteristics change over time, online machine learning algorithms have also been applied to predict link quality. Di et al. proposed locally weighted projection regression to estimate link quality in a mobile ad hoc network [27]. Marinca and Minet proposed an online prediction method to estimate link quality in IEEE 802.15.4 networks [28]. Liu et al. predicted the packet success rate using a stochastic gradient descent online learning algorithm for a logistic regression classifier [29]. However, the mobility due to environmental effects and the link characteristics in OCN are different from those in terrestrial networks. Hence, an adaptive real-time prediction model is recommended for the OCN scenario.
A review of the related work on link estimation techniques in terrestrial networks shows that the existing methods cannot be directly applied to the OCN environment. In many rough sea conditions, real-time data variability is much greater than that of the historical data collected in past sea trials. Using models derived from offline data may increase the prediction errors in these situations. Additionally, the parameters affecting signal strength in the ocean need to be considered in forecasting models. Hence, the OCN demands a context-dependent prediction model that incorporates historical and real-time data depending on the changes in signal characteristics.

III. OCN LINK ANALYSIS USING DATA COLLECTED FROM THE ARABIAN SEA
This section analyzes the data collected from fishing vessels at sea and explores the factors that affect signal strength in the marine environment. The field trials were conducted over the Arabian Sea from a coastal village in Kerala, India. Figure 3 shows the fishing vessel setup for collecting data. The field trials utilized LR Wi-Fi equipment from Ubiquiti Networks and Cisco Linksys access routers. The onshore base station and the boats' ABE were 56 m and 9 m above sea level, respectively. In these marine experiments, the network offered a range of at least 40 km in the first hop and 20 km in every succeeding hop. With this setup, we collected data in sea states 3, 4, and 5. It is challenging for the fishing vessels to collect data when the sea is between states 6 and 9.
The dynamics of the ocean environment influence the connectivity of the wireless link between OCN nodes. Over time, the physical state of the sea waves changes according to their characteristics, such as period and height. To define ocean surface harshness, we use the Douglas Sea Scale, which categorizes the ocean conditions into different states [30]. On this scale, sea states go from 0 (calm) to 9 (phenomenal), the latter being the most severe turbulence. The Indian National Centre for Ocean Information Services (INCOIS) has developed a forecasting system named INDian Ocean FOrecasting System (INDOFOS) to predict the surface roughness of the Indian Ocean five to seven days in advance [31]. We use these sources to describe the sea rocking states in this study.
Link availability and topology variability are driven by the unique characteristics of the marine environment. The data analysis shows that the primary factors influencing radio links between vessels in ocean communication are wave-induced vessel rocking motion, physical distance, antenna misalignment, and propagation effects.

A. WAVE-INDUCED VESSEL ROCKING MOTION
Fishing vessels at sea experience six degrees of freedom of movement: translational movements include linear vertical up/down (heave), side-to-side motion (sway), and front/back motion (surge), while rotational ones include left/right (yaw), up/down (pitch) and front/rear (roll) motion. Depending on the varying intensity of sea waves, different sea states experience diverse translational and rotational motions. Huber et al. studied the mobility characteristics of ships in the ocean [32]. However, the intensity of the rocking movement varies among boats and fishing vessels due to dissimilarities in size. The mobility induced by ocean waves may lead to the incorrect orientation of the transmission-reception antennas, resulting in dynamic signal variability. These variations are proportional to the roughness of the ocean surface. Therefore, we define Douglas State 0 to 9 as vessel rocking degrees from 0 to 9. Figure 4 shows real-time data collected from marine experiments in rocking states 3, 4, and 5. In these different ocean conditions, the signal strength may differ, even at the same distance between the transmitter and the receiver antenna. Table 1 summarizes the descriptive statistics of signal variations. The mean signal strength, the 25th percentile, and the 75th percentile differ across these three sea states. For example, in State 3, 25% of the signal samples are below -78 dBm and 25% are above -61 dBm. In State 5, 25% are below -76 dBm and 25% are above -66 dBm. These data illustrate the impact of vessel movements on the signal strength.

B. DISTANCE BETWEEN FISHING VESSELS
Like terrestrial networks, the signal strength in the OCN decreases with the distance between the transmitter and the receiver. Figure 4 reveals the drop in signal strength across all sea states as the distance increases. These data are collected from the long-range Wi-Fi link between the base station and the fishing vessel. Section IV presents a comprehensive analysis of the distance and signal strength relationship.

FIGURE 4. Signal strength variation in different sea states with distance: In higher sea states, more signal degradation is observed. A relationship between distance and signal strength has been noticed in all sea states after a distance ≈ 15 km.

VOLUME 11, 2023

C. DEGREE OF ANTENNA MISALIGNMENT
Since the LR nodes utilize a 120° sectored antenna, the angle between the transmitting and the receiving antennas plays a significant role in the quality of signals. We conducted experiments with sectored antennas separated by a few meters of line-of-sight to investigate the impact of incorrect alignment between antennas on signal variation. A transmitter antenna was installed on the top of a building approximately 16 m in height and a receiver antenna at 150 m above ground level. An automatic antenna-rotating platform was operated to adjust the angle of alignment between the transmitter and receiver. Figure 5 shows the signal strength variations with the degree of misalignment between antennas. As the misalignment increases, the signal attenuation increases. The signal strength falls significantly beyond 45° of misalignment.

D. PROPAGATION EFFECTS
The characteristics of radio wave propagation over the sea surface are distinct from those of terrestrial wireless channels. Comprehensive research on propagation effects in terrestrial networks has been conducted, but very few investigations focus on the wireless channel properties at sea. Free space path loss and two-ray models have been used in marine environments without considering movements induced by wave rocking motions [33], [34]. A significant deviation is observed when comparing OCN propagation loss with two-ray and free-space path loss models, as demonstrated in Figure 6. This deviation indicates the influence of radio propagation over the sea on maritime communication.
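For reference, the two baseline models named above can be sketched as follows. This is a minimal illustration and not the authors' implementation. The function names are hypothetical, and the two-ray version assumes a flat, smooth sea surface, unit antenna gains, and a reflection coefficient of -1. The antenna heights of 56 m and 9 m come from Section III, while the 5.8 GHz carrier frequency in the usage note is an assumption, since the operating frequency is not stated in the text.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


def two_ray_path_loss_db(distance_m, h_tx_m, h_rx_m, freq_hz, reflection=-1.0):
    """Two-ray path loss in dB: direct ray plus one surface-reflected ray.

    Assumes a flat reflecting surface and unit antenna gains.
    `reflection` is the surface reflection coefficient (assumed -1).
    """
    wavelength = C / freq_hz
    k = 2.0 * math.pi / wavelength                   # wavenumber
    d_los = math.hypot(distance_m, h_tx_m - h_rx_m)  # direct path length
    d_ref = math.hypot(distance_m, h_tx_m + h_rx_m)  # reflected path length
    field = (cmath.exp(-1j * k * d_los) / d_los
             + reflection * cmath.exp(-1j * k * d_ref) / d_ref)
    return -20.0 * math.log10(wavelength / (4.0 * math.pi) * abs(field))
```

Calling `two_ray_path_loss_db(d, 56.0, 9.0, 5.8e9)` and `free_space_path_loss_db(d, 5.8e9)` over a range of distances gives the two reference curves against which measured loss could be compared. Neither model accounts for wave-induced antenna motion.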

IV. SIGNAL CHARACTERISTICS IN DIFFERENT OCEAN REGIONS
We performed a detailed analysis of signal degradation using data collected from fishing vessels and categorized the connectivity of ocean regions based on the LR link from the base station. Treating the received signal strength (rss) data as a time series, we computed moving-window statistics of the data to infer the signal variance. Figure 7 shows the rolling variance of the data in three sea states. In State 3, a pronounced excursion in signal variance is visible up to 150 units of time. This deviation happens at 0-15 km from the coast. In States 4 and 5, a comparable deviation of up to 50 units has been recorded. On the distance scale, this variation can be observed at 11 km from the shore. There is a signal inconsistency in the initial few kilometers in all sea states. Hence, we analyzed the forward and return trips of vessels. The analysis of the data from the forward and return fishing trips also reveals high signal variations within a few kilometers of the shore, as shown in Figure 8b. These observations demonstrate high signal variations in the region within approximately 15 km of the shore in all three sea states.
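The moving-window variance described above can be sketched in a few lines. This is a minimal illustration: the function name and the window length are assumptions, since the text does not state the window used, and the sample variance with an n - 1 denominator is chosen here for concreteness.

```python
from collections import deque


def rolling_variance(samples, window):
    """Sample variance of `samples` over a sliding window.

    Returns one value per full window, so the output has
    len(samples) - window + 1 entries.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    out = []
    buf = deque(maxlen=window)  # keeps only the most recent `window` values
    for x in samples:
        buf.append(x)
        if len(buf) == window:
            mean = sum(buf) / window
            out.append(sum((v - mean) ** 2 for v in buf) / (window - 1))
    return out
```

Applied to an rss time series in dBm, high output values mark intervals of unstable signal, such as those reported near the shore.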
Another measure obtained from the Ubiquiti NanoStation device is the client connection quality (ccq). This metric provides the connection quality on a scale of 0 to 100 percent, taking into account errors in transmission, latency, and throughput. If the link state is ideal, then ccq will be 100%. We assume that the quality of communication is grade A if ccq is greater than or equal to 50% and grade B otherwise. Figure 9 shows the variation of communication quality with rss. For rss in the range of 0 to -70 dBm, quality is greater than 50% and is classified as grade A. This signal strength range is obtained for distances in the range of 15 to 35 km. Regions beyond 35 km from the shore show grade B quality. This finding helped to divide the spatial region based on rss and ccq into three: the near-shore communication region, for distances up to 15 km; the strongly connected stable region, for distances from 15 to 35 km; and the weakly connected stable region, beyond 35 km. This partitioning scheme helps to devise connectivity management algorithms for distinct communication regions.

FIGURE 9. When the signal strength is below -70 dBm, the ccq is less than 50. This fall in ccq happens in regions beyond 35 km from the shore. In the 0-35 km region, rss has been observed above -70 dBm, and the ccq is greater than 50.

Figure 8a shows the signal strength data in the near-shore region, where uncertainty can be observed in all sea states. Forward and return trials in the near-shore region were examined to verify this observation, and they revealed a similar change, as shown in Figure 8b. Over short time intervals, the rss variance is high in the near-shore area compared to other zones. We cannot perceive any association between distance and rss in either the forward or the return journeys, as summarized in Figure 8b. Moreover, the temporal variance in this region is high compared to the remaining area, as demonstrated in Figure 10.
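The grading rule and the three-region partition described in this section can be written down directly. The sketch below uses hypothetical function names, and the treatment of the exact boundary values (15 km and 35 km) is an assumption, since the text does not say which region they fall into.

```python
def link_grade(ccq_percent):
    """Grade A if ccq is at least 50%, grade B otherwise."""
    if not 0 <= ccq_percent <= 100:
        raise ValueError("ccq must be between 0 and 100")
    return "A" if ccq_percent >= 50 else "B"


def communication_region(distance_km):
    """Map distance from the shore to one of the three regions.

    Boundary values are assigned to the nearer-shore region (assumption).
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    if distance_km <= 15:
        return "near-shore"
    if distance_km <= 35:
        return "strongly connected stable"
    return "weakly connected stable"
```

A connectivity manager could use the region label to decide whether a predictive model is worth applying, since the near-shore region shows no usable relationship between distance and rss.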

A. THE NEAR-SHORE COMMUNICATION REGION
We examined the near-field effect of the antenna to interpret the signal degradation in this region. The data employed for this analysis come from the link connecting the onshore base station and an LR node positioned at sea. For the base station antenna, located at an altitude of 150 m above ground level, the near-field effect extends at most 735 m from the shore. However, substantial variations occur up to approximately 15 km from the coast, so the near field alone does not explain them. Another reason for the high near-shore variability is the sea depth and wave characteristics. The sea wave properties are related to the sea depth; where the water is shallower, typically near the shore, waves become higher and steeper [35]. If the water depth is smaller than half the wavelength, then the upper part of the wave moves faster than its lower part. This phenomenon is due to the friction acting on the deepest segment of the wave. In these cases, the anterior surface of the wave gradually becomes steeper than the posterior surface. This steepness disturbs the vessels more strongly and degrades the signal. For these reasons, we concluded that the signal strength is unstable, and the link prediction is complex in the near-shore region.

B. THE STRONGLY CONNECTED STABLE REGION
In the strongly connected stable region (15 to 35 km), the variance in signal strength is low, as shown in Figure 11; hence, stable communication is possible. We can observe that the signal strength is related to distance in all three sea states. Figure 8b conveys that the behavior of signal degradation is similar in forward and return journeys from 15 to 35 km. The temporal variance of this region plotted in Figure 10 is also low. The communication quality obtained is grade A. Hence, predictive models will help in this region for connectivity optimization.

C. THE WEAKLY CONNECTED STABLE REGION
In the region beyond 35 km, the distance between the transmitter and the receiver is larger, and lower signal strength is obtained. We can observe a relationship between distance and signal strength as in Figure 11. Temporal variance is low, as plotted in Figure 10. Although the communication quality is grade B, we can use predictive models here for connectivity optimization. Figure 12 shows a comparison of signal variance in these three regions. In all sea states, the near-shore region has the highest variation, and the variation decreases from State 5 to State 3. Table 2 summarizes the characteristics of links; the following are the key observations from the analysis of OCN links:

D. OBSERVATIONS FROM LINK ANALYSIS
• Observation 1: The main factors affecting communication in OCN are the wave-induced rocking movement of vessels, the distance between vessels, the antenna alignment angle, and the propagation effects.
• Observation 2: In the near-shore region, the signal strength does not depend on the distance in both forward and return fishing trips. Over short time intervals, the signal variation is abrupt; hence, the temporal variance is high.
• Observation 3: In the strongly connected stable region, signal strength is related to distance in both forward and return journeys. The temporal variance in signals is low, and communication quality is grade A.
• Observation 4: In the weakly connected stable region, the signal strength is associated with distance in both forward and return journeys. The temporal signal variance is low, and the communication quality is grade B.
• Observation 5: Predictive analytics can be employed to forecast link features in both the strongly and the weakly connected stable regions.

V. MACHINE LEARNING FRAMEWORK FOR LINK PREDICTION
The evaluation of links is one of the crucial tasks in preserving reliable connectivity between nodes. Link status forecasting enables the nodes to predict forthcoming changes in the topology and reduce packet drops. rss is a physical layer parameter to estimate the wireless channel quality. rss provides instant link data directly from the hardware, and hence we prefer it as the metric to measure the OCN link status. Each node monitors the neighborhood rss and stores it as a time series for prediction. Data gathered from numerous marine experiments provide important information about the wireless link features in a sea environment. These historical data are beneficial to develop offline predictive models for link status forecasting. Since the ocean environment is highly dynamic and the link properties are time-dependent, signal characteristics may change in different periods. These variations cause the actual prediction to deviate from the offline generated model. Spatiotemporal variations in path loss parameters further complicate the prediction of link features. Thus, we require a model to learn the link features using real-time data continuously.
Since the data streams are continuously evolving and the network connectivity is intermittent, forwarding them to an onshore server for analysis is not preferable. Here, edge computing can be employed for real-time data processing by considering fishing vessels as the network edge. Online machine learning allows continuous estimation of the link features to provide edge intelligence to the vessels. In addition, this online analysis accelerates the learning process and incorporates node-specific local knowledge into the prediction system. Integrating the online learning scheme with the offline model provides a hybrid predictive system that combines the benefits of historical and real-time measurements.
We propose a hybrid learning framework that integrates offline and online predictors, as shown in Figure 13. First, a linear base model $H_1$ is fitted to the historical data obtained from marine experiments. $H_1$ is then expanded to an offline Bayesian learning model $H_2$ to generate a probability distribution of model parameters. During network operation, each node obtains real-time signal strength. These recent data are employed to update the offline probabilistic learner on a mini-batch basis, resulting in a hybrid model $H_3$. If historical data are not available for a sea state, only online data are utilized for generating the learning model.

A. LINEAR PREDICTION MODEL
In order to analyze the OCN link characteristics, we used the historical signal strength data collected from sea trials. The dataset consists of samples with the following features: distance between vessels ($d$), sea state ($s$), and the target variable, received signal strength ($E_s$). This dataset includes measurements from sea states 3, 4, and 5. The primary goal of OCN link analysis is to find the relationship between $E_s$ and the input features $(d, s)$ from the historical data for predicting $E_s$. Since the signal measurements in the near-shore area are extremely variable, the prediction area is limited to the stable region (15-55 km). The signal prediction requirement is very high in stable regions for connectivity management.
Due to hardware errors, $E_s$ may be out of the valid range and lead to biased machine learning models. Hence, data processing with cleaning and outlier removal is carried out. After the cleaning phase, 70% of the data, selected at random, is used for training and the remaining 30% for testing. A forward selection strategy is applied to select a suitable offline prediction model. We trained linear, second-order, and higher-order models with the training data and applied the test set for prediction. Figure 14a shows the linear and polynomial approximations on the complete data, which include the near-shore region as well. Figure 14b shows the stable communication region, where higher-order models behave similarly to the linear model. Hence, a linear model is selected for offline prediction.
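The linear model selected is $f_{\theta^s}(d) = \theta^s_1 + \theta^s_2 d$, with parameter vector $\theta^s = (\theta^s_1, \theta^s_2)$ for each sea state $s$. Assume that the error $\epsilon$ is a normal random variable that is independent and identically distributed with mean zero and variance $\sigma^2$. Then the likelihood of the linear model $E_{s,i} = f_{\theta^s}(d_i) + \epsilon_i$ with probabilistic error function is given by

$$P(E_s \mid d, \theta^s, \sigma^2) = \prod_{i} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(E_{s,i} - f_{\theta^s}(d_i)\right)^2}{2\sigma^2}\right)$$

for $s = 3, 4, 5$. Figure 16 shows the fitted linear model in three sea states. At higher rocking states, the slope gradually increases, and the intercept parameters of the model shrink.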
An increased slope ($\theta^s_2$) in the high ocean states indicates that a small change in the distance has a more significant impact on the signal strength. Suppose we assume that the signal degradation is proportional to the sea rocking effect. In that case, it is possible to predict the signal behavior in another state by shifting the prediction line. Then, the models fitted to different states become parallel, and it is straightforward to generate a model for a given rocking degree with unknown data. Therefore, a statistical test was conducted to examine whether the model for another sea state differs from that of a reference state only in its intercept. Such a test conveys the significance of the slope of the prediction model for other sea states.

B. STATISTICAL ANALYSIS ON THE IMPACT OF ROCKING STATE
As data collection is a challenging task in rough sea states, we investigated whether signal behavior in a sea state can be predicted by shifting the prediction model of a known sea state, that is, by changing only its intercept. To this end, we conducted a statistical test on the significance of the slope of the prediction model across sea states. From the linear model obtained, the signal strength at a distance $d_i$ is given by

$$E_{s,i} = \theta^s_1 + \theta^s_2 d_i + \epsilon_i$$

for real constants $\theta^s_1, \theta^s_2$ depending on the sea state $s \in \{3, 4, 5\}$. A t-test is used to conduct hypothesis tests on the parameters obtained from the linear model. Under the null hypothesis, the slope is restricted to be the same in all three sea states. The null and alternative hypotheses of the test are

$$H_0: \theta^3_2 = \theta^4_2 = \theta^5_2, \qquad H_a: \theta^s_2 \neq \theta^3_2 \ \text{for at least one } s \in \{4, 5\}. \tag{3}$$

We test whether there is sufficient evidence to reject the null hypothesis.

Here we give the details of the hypothesis test on the independence of the regression slope from the sea state. The regression equations (1) for the three states $s \in \{3, 4, 5\}$ can be rewritten in terms of the differences in slopes from the first sea state, $\xi^{(4)} := \theta^4_2 - \theta^3_2$ and $\xi^{(5)} := \theta^5_2 - \theta^3_2$, so that $H_0$ corresponds to $\xi^{(4)} = \xi^{(5)} = 0$. This admits a coefficient test in a multilinear regression model with $n = n^{(3)} + n^{(4)} + n^{(5)}$ observations, where $n^{(3)}, n^{(4)}, n^{(5)}$ are the numbers of measurements in states 3, 4, and 5, respectively. Least squares estimators for linear regression coefficients are normally distributed random variables. An unbiased estimator of the variance of the error $\epsilon_i$ can be computed from the residual sum of squares as $\hat{\sigma}^2 = \mathrm{RSS}/(n - k)$, where $k$ is the number of parameters of the regression model; see [36, Theorem 12.1]. Using this, the t-statistic derived for $\xi^{(4)} = \beta_5$ follows a t-distribution under $H_0$. Here we took the number of data points $n$ as 576, and the test statistic follows a t-distribution with 574 degrees of freedom.
We applied a t-test to compare the slopes of the linear models in the three sea states at the 5% significance level (95% confidence level) to determine whether the signal strength variation differs across sea states. The two-sided critical value of the t-distribution with 574 degrees of freedom at this level is 1.964. The t-statistic computed from the data is larger than this critical value. Hence, the null hypothesis is rejected, and the test provides statistical evidence that the slopes of the prediction model vary with sea states. That is, we cannot assume the same slope parameter $\theta^s_2$ for all sea states; hence, vessel rocking movement impacts the signal behavior.
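As an illustration of the slope comparison, the sketch below tests whether two sea states share one slope while keeping separate intercepts. It is a simplified two-group version, not the authors' three-state design. For two groups, the pooled statistic used here equals the t-statistic of the slope-difference coefficient in a single regression with a state indicator. The function names are hypothetical.

```python
import math


def fit_line(d, e):
    """Ordinary least squares fit e = a + b*d. Returns (a, b, rss, sxx)."""
    n = len(d)
    d_mean = sum(d) / n
    e_mean = sum(e) / n
    sxx = sum((x - d_mean) ** 2 for x in d)
    sxy = sum((x - d_mean) * (y - e_mean) for x, y in zip(d, e))
    b = sxy / sxx
    a = e_mean - b * d_mean
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(d, e))
    return a, b, rss, sxx


def slope_difference_t(d_ref, e_ref, d_other, e_other):
    """t-statistic for H0: both groups have the same slope.

    Returns (t, degrees_of_freedom). The error variance is estimated
    as RSS / (n - k) with k = 4 parameters (two intercepts, two slopes).
    """
    _, b_ref, rss_ref, sxx_ref = fit_line(d_ref, e_ref)
    _, b_oth, rss_oth, sxx_oth = fit_line(d_other, e_other)
    dof = len(d_ref) + len(d_other) - 4
    sigma2 = (rss_ref + rss_oth) / dof
    se = math.sqrt(sigma2 * (1.0 / sxx_ref + 1.0 / sxx_oth))
    return (b_oth - b_ref) / se, dof
```

The null hypothesis is rejected when the absolute value of the returned statistic exceeds the threshold of the t-distribution for the returned degrees of freedom. The threshold depends on the degrees of freedom, so the value 1.964 quoted for 574 degrees of freedom applies only to a sample of that size.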

C. OFFLINE BAYESIAN PREDICTIVE MODEL
In many fishing contexts, the historical data available from the vessels are not sufficient to generate a good prediction model. Also, the distribution of data varies dynamically with the environmental conditions. Linear regression yields a single predicted value of the signal strength, with no indication of its uncertainty. Hence, a deterministic prediction scheme may not perform well in all scenarios of OCN. A Bayesian learning approach is appropriate here because it can generate prediction models from smaller volumes of data. Besides, it allows prior knowledge about the parameters to be integrated into the model.
To use Bayesian methods for predicting rss, we need to define the likelihood of the data and the priors of the model parameters. The likelihood for the data can be written as

$$E^s_i \sim \mathcal{N}(\mu_i, \sigma^2)$$

where the mean $\mu_i$ is $\theta^s_1 + \theta^s_2 \cdot d_i$, with model parameters $\theta^s_1$, $\theta^s_2$, and $\sigma$. Each observation $(d, s, E^s)$ in the collected data updates this distribution using Bayes' theorem. Bayesian inference generates a predictive distribution of $E^s$ instead of predicting only the most likely value. This distribution carries information about the uncertainty of the model parameters. $E^s$ can be sampled from a normal distribution given by equation (10).
The selection of the prior is important in Bayesian learning because it affects inference. The prior distribution $P(\theta^s)$ indicates the uncertainty in the parameter vector $\theta^s$ and is modeled as a normal distribution. From the linear model, we have a point estimate of the parameter vector $\hat{\theta}^s = (\hat{\theta}^s_1, \hat{\theta}^s_2)$ for each state. This point estimate is used as the mean of the prior distribution. Let $\sigma_1^2$ and $\sigma_2^2$ be the variances of the parameters $\theta^s_1$ and $\theta^s_2$. Then the prior distribution can be written in the form

$$\theta^s_1 \sim \mathcal{N}(\hat{\theta}^s_1, \sigma_1^2), \qquad \theta^s_2 \sim \mathcal{N}(\hat{\theta}^s_2, \sigma_2^2)$$

For example, we have point estimates $\hat{\theta}^s_1 = -50.19$ and $\hat{\theta}^s_2 = -0.78$ in State 3. So we define the prior of the parameter $\theta^s_1$ as a normal distribution with mean −50.19 and standard deviation 3 in this state. This means that 95% of the coefficients are between −55 and −49. For $\theta^s_2$, the prior is selected as a normal distribution with mean −0.7 and standard deviation 0.1 to make 95% of the coefficients lie between −0.8 and −0.6.
Here the priors are more informative than in ordinary linear regression, where all coefficient values are treated as equally probable.
The prediction noise $\sigma$ is assigned a half-normal distribution because we assume the noise is always positive.
The posterior probability of the model parameters is

$$P(\theta^s \mid E^s, d) \propto P(E^s \mid \theta^s, d)\, P(\theta^s)$$

Kruschke introduced a Bayesian model diagram that displays the prior distributions of the parameters [37]. Figure 15 shows the Bayesian model for rss prediction. Predictions of $E^s$ are obtained by integrating over the posterior distribution of the model parameters. To avoid the complexity of this integration, we use Markov Chain Monte Carlo sampling.
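The offline model can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in the paper: a simple random-walk Metropolis sampler stands in for the MCMC sampler, the data are synthetic, the half-normal scale of 5 for $\sigma$ and the proposal step sizes are assumed values, and the function names are hypothetical. The prior means and standard deviations for State 3 are taken from the text.

```python
import math
import random

def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) evaluated at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def log_posterior(theta1, theta2, sigma, d, e, prior):
    """Unnormalized log posterior: normal priors, half-normal noise, normal likelihood."""
    if sigma <= 0.0:
        return -math.inf                      # half-normal prior has no mass at sigma <= 0
    lp = log_normal_pdf(theta1, *prior["theta1"])
    lp += log_normal_pdf(theta2, *prior["theta2"])
    lp += log_normal_pdf(sigma, 0.0, prior["sigma_scale"])  # half-normal up to a constant
    for x, y in zip(d, e):
        lp += log_normal_pdf(y, theta1 + theta2 * x, sigma)
    return lp

def metropolis(d, e, prior, n_iter=12000, burn_in=2000, seed=1):
    """Random-walk Metropolis over (theta1, theta2, sigma); returns post-burn-in samples."""
    rng = random.Random(seed)
    state = [prior["theta1"][0], prior["theta2"][0], prior["sigma_scale"] / 2.0]
    steps = [0.3, 0.015, 0.2]                 # assumed proposal standard deviations
    lp = log_posterior(*state, d, e, prior)
    samples = []
    for i in range(n_iter):
        proposal = [v + rng.gauss(0.0, s) for v, s in zip(state, steps)]
        lp_new = log_posterior(*proposal, d, e, prior)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            state, lp = proposal, lp_new      # accept the proposed move
        if i >= burn_in:
            samples.append(tuple(state))
    return samples

# Synthetic State 3-like observations (distances in km); values are illustrative.
data_rng = random.Random(0)
dist = [data_rng.uniform(1.0, 30.0) for _ in range(60)]
rss = [-50.19 - 0.78 * x + data_rng.gauss(0.0, 2.0) for x in dist]

prior = {"theta1": (-50.19, 3.0), "theta2": (-0.7, 0.1), "sigma_scale": 5.0}
samples = metropolis(dist, rss, prior)
slope_mean = sum(s[1] for s in samples) / len(samples)

# Posterior predictive draws of rss at 30 km: one draw per posterior sample.
pred_rng = random.Random(1)
pred_30km = [pred_rng.gauss(t1 + t2 * 30.0, sg) for t1, t2, sg in samples]
pred_mean = sum(pred_30km) / len(pred_30km)
print(f"posterior mean slope: {slope_mean:.3f}, predictive mean at 30 km: {pred_mean:.1f}")
```

The predictive draws approximate the integral over the posterior by averaging over samples, which is the role MCMC plays in the model described above.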

D. ONLINE LEARNING
Nodes traversing the sea receive real-time signal strength from their neighboring nodes. Changes in the characteristics of ocean waves over time and the lack of historical data for specific sea states can cause variations in signal forecasts. All nodes store offline prediction models derived from previously collected data for different sea states. The accuracy of these models falls as real-time data deviates further from the historical data due to environmental changes. In these situations, we can exploit online data. The offline Bayesian model is updated with new data points whenever the nodes start receiving signals. Hence, the effect of real-time data can be incorporated into the learning scheme. This model update is done on mini-batches depending on real-time data availability. This hybrid model of offline and online learning schemes can dynamically adapt the forecast to changes in the environment.
Let $E^s_1, E^s_2, \ldots, E^s_n$ be the series of signal strengths received at distances $d_1, d_2, \ldots, d_n$. In the initial state of online learning with a single data point, the posterior distribution is computed as

$$P(\theta^s \mid E^s_1, d_1) \propto P(\theta^s)\, P(E^s_1 \mid \theta^s, d_1)$$

After receiving the next data point, the distribution update is

$$P(\theta^s \mid E^s_1, E^s_2, d_1, d_2) \propto P(\theta^s \mid E^s_1, d_1)\, P(E^s_2 \mid \theta^s, d_2)$$

After getting $n$ data points in a window, the posterior is updated as

$$P(\theta^s \mid E^s_{1:n}, d_{1:n}) \propto P(\theta^s \mid E^s_{1:n-1}, d_{1:n-1})\, P(E^s_n \mid \theta^s, d_n) \tag{13}$$

where $E^s_{1:n}$ denotes $E^s_1, \ldots, E^s_n$ and $d_{1:n}$ denotes $d_1, \ldots, d_n$. Here, the new predictive model depends only on the previous model and the current input. Since the posterior distribution from the previous update becomes the prior distribution for the next one, the algorithm performs online learning. Moreover, it is only necessary to maintain the current posterior distribution of the parameters. When an offline model is available for a sea state, we can update the offline posterior distribution with online data using the same approach. This hybrid prediction model utilizes offline and real-time data to resolve the initial delay in online learning.
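The recursive update, in which each posterior serves as the prior for the next mini-batch, can be illustrated with a closed-form sketch. To keep the example short and exact, it assumes the noise standard deviation is known (here 2.0) and that the prior on $(\theta^s_1, \theta^s_2)$ is Gaussian, so the posterior stays Gaussian; this is a simplification of the model above, which places a half-normal prior on $\sigma$ and relies on MCMC. The data, the function names `inv2` and `update`, and the batch size are hypothetical.

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def update(mean, cov, batch, noise_sd):
    """Bayesian update of theta = (intercept, slope) with prior N(mean, cov).

    Likelihood per observation: E ~ N(theta1 + theta2 * d, noise_sd^2).
    Returns the posterior mean and covariance; inputs are not modified.
    """
    prec = inv2(cov)                                   # prior precision matrix
    h = [prec[0][0] * mean[0] + prec[0][1] * mean[1],  # precision times mean
         prec[1][0] * mean[0] + prec[1][1] * mean[1]]
    w = 1.0 / noise_sd ** 2
    for d, e in batch:                                 # add each observation's information
        prec[0][0] += w
        prec[0][1] += w * d
        prec[1][0] += w * d
        prec[1][1] += w * d * d
        h[0] += w * e
        h[1] += w * e * d
    new_cov = inv2(prec)
    new_mean = [new_cov[0][0] * h[0] + new_cov[0][1] * h[1],
                new_cov[1][0] * h[0] + new_cov[1][1] * h[1]]
    return new_mean, new_cov

# Synthetic stream of (distance in km, rss) pairs; values are illustrative.
rng = random.Random(2)
stream = [(float(d), -50.0 - 0.8 * d + rng.gauss(0.0, 2.0)) for d in range(5, 25)]

prior_mean = [-50.19, -0.7]                 # offline point estimates used as prior means
prior_cov = [[9.0, 0.0], [0.0, 0.01]]       # prior standard deviations 3 and 0.1

# Online learning: the posterior after each mini-batch becomes the next prior.
mean, cov = prior_mean, prior_cov
for start in range(0, len(stream), 5):
    mean, cov = update(mean, cov, stream[start:start + 5], noise_sd=2.0)

# Reference: a single update with all data at once gives the same posterior.
batch_mean, batch_cov = update(prior_mean, prior_cov, stream, noise_sd=2.0)
print(mean, batch_mean)
```

Only the current posterior mean and covariance need to be stored between updates, which mirrors the memory argument made in the text.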
At any distance $d_i$ in a given sea state $s$, signal strength can be predicted using

$$P(E^s_i \mid d_i, E^s_{1:n}, d_{1:n}) = \int P(E^s_i \mid \theta^s, d_i)\, P(\theta^s \mid E^s_{1:n}, d_{1:n})\, d\theta^s$$

where $E^s_{1:n}$ and $d_{1:n}$ are the signal strengths and distances observed so far. Since evaluating this integral over the posterior distribution is computationally intensive, we applied a Markov Chain Monte Carlo approximation in the simulations.

One of the critical factors influencing prediction accuracy is the windowing scheme. Since we process online data in real time, a window stores the recent samples after dropping irrelevant data points based on statistical properties. The algorithm determines which samples to keep in the window and when to update the previous model depending on changes in the incoming data. Instead of a fixed-size window, we employed an adaptive windowing scheme that uses the rolling variance to track how the signal strength distribution changes. Each incoming rss value is added to the rear of the window, and existing values are dropped from the front when the rolling variance is greater than a threshold. The nature of the incoming data thus adaptively determines the window size. The model update occurs only when the difference between the previous and current batches of online data is significant.
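The adaptive windowing rule can be sketched as below. This is a minimal illustration, not the authors' algorithm: the class name `AdaptiveWindow`, the variance threshold, the minimum window size, and the use of "a sample was dropped" as the signal that a model update may be warranted are all assumptions, and the rss stream is synthetic.

```python
from collections import deque
from statistics import pvariance

class AdaptiveWindow:
    """Window of recent rss samples whose size adapts to the rolling variance."""

    def __init__(self, var_threshold, min_size=3):
        self.var_threshold = var_threshold   # assumed tuning parameter
        self.min_size = min_size             # never shrink below this many samples
        self.samples = deque()

    def push(self, rss):
        """Append rss at the rear; drop from the front while variance is too high.

        Returns True if any old sample was dropped, which this sketch treats as
        a hint that the signal distribution changed.
        """
        self.samples.append(rss)
        dropped = False
        while (len(self.samples) > self.min_size
               and pvariance(self.samples) > self.var_threshold):
            self.samples.popleft()
            dropped = True
        return dropped

# Synthetic stream: a stable phase near -60 followed by a shift to about -70.
stable = [-60.0, -60.5, -59.5, -60.2, -59.8, -60.1, -59.9, -60.3, -59.7, -60.0]
stream = stable + [x - 10.0 for x in stable]

window = AdaptiveWindow(var_threshold=4.0)
change_points = [i for i, value in enumerate(stream) if window.push(value)]
print("samples kept:", len(window.samples), "changes flagged at:", change_points)
```

During the stable phase the window simply grows; after the shift, old samples are discarded until the window contains only values from the new regime.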
Using the predicted value of $E^s$, each node can compute the probability of connectivity of its neighbors. A node position reorientation algorithm has been proposed to optimize the connectivity among nodes in OCN [38]. In that work, a node-level metric called the dynamic connectivity index (DCI) is proposed to quantify the communication capability of nodes and to facilitate the reorientation of node positions for maintaining connectivity. Awareness of this connectivity index helps to maximize the network connectivity via optimal node reorientations, to choose locations of high connectivity, and to reduce the chances of node isolation. DCI can also be used to make decisions on message forwarding and communication planning. The computation of DCI uses the prediction of $E^s$ from this learning framework.

VI. RESULTS
A pilot implementation of the OCN architecture was verified over the Arabian Sea. Fishing vessels used in the sea trials were equipped with long-range Wi-Fi equipment from Ubiquiti Networks along with Cisco Linksys access routers. Network components deployed in the vessels are shown in  at least 20+ km range are used. The Cisco Linksys E2500 access point provides an 802.11b/g/n 2.4 GHz Wi-Fi network on the boat. This setup collects data on received signal strength for varying distances.
A linear model was fitted to the data collected from three sea states, and its prediction error was evaluated. Ten-fold cross-validation was executed to obtain the best parameter vector $\theta^s$ from the offline data. The linear model fits for these three states are plotted in Figure 16. The parameters (slope and intercept) obtained in these three states differ significantly. For states with a high wave effect, an increase in slope indicates that a small difference in distance causes more signal disruption in rough sea conditions. Figure 17 shows the predicted and actual rss when testing the model. Since the signal distribution varies with time, a probabilistic approach is suggested to account for the uncertainties associated with the parameters. This probabilistic model can use the range of values obtained for each parameter from the historical data.

The purpose of using Bayesian analysis in OCN was to predict the signal strength at a node for a given distance and sea state. To this end, we employed data from three sea states and applied Bayesian inference to capture the distribution of the parameters. We performed a separate Bayesian analysis for each sea state, so the model contains the intercept $\theta^s_1$, the slope $\theta^s_2$, and a noise term $\sigma$. The prior distribution is a normal curve with a mean value taken from the point estimate of the linear regression model, and the error distribution is selected as half-normal. We adopted the Python-based probabilistic programming framework PyMC3 in our implementation. The No-U-Turn Sampler, a Markov Chain Monte Carlo (MCMC) method, is used to draw samples from the posterior distribution of the model parameters. Two chains were used to draw 1000 samples of the model parameters $\theta^s$ and the noise $\sigma$; the result is shown in Figure 18. The first part of Figure 18 is the KDE plot of the intercept, slope, and noise terms. The marginal distribution of each model parameter specifies the probability of each parameter value.
The second half of Figure 18 shows the sampled values from the two chains at each step. MCMC takes a random path through the posterior distribution. We can observe that the two chains start from different initial values and converge to the same region with stationary behavior. In addition, we used the Gelman-Rubin statistic as a convergence test, which compares the variability between the chains with the variability within the chains. The obtained score is close to 1, indicating the convergence of the chains. Table 3 shows the posterior distribution statistics in the three sea states, with $\theta^s_1$ as the intercept term and $\theta^s_2$ as the slope term of the fit. The posterior mean and standard deviation, along with an estimate of the simulation error (mc_error) and the highest posterior density (hpd), are shown in Table 3. A 95% hpd of [−0.82, −0.75] for the slope means that the value of the slope parameter is between −0.82 and −0.75 with 95% probability. This 95% credible interval is a direct measure of the uncertainty in the model parameter. The bounds of the 95% hpd are reported at the 2.5 and 97.5 levels. Here we can notice that each parameter's mean, standard deviation, and credible interval vary with the sea states. As we move to higher-numbered sea states, the fitted slope increases. The increase in the slope of the rough sea state means that even a slight distance difference between the nodes causes a more significant signal loss in the rough sea states than in the low sea states. Figure 19 shows a visualization of this credible interval for States 3 through 5. These credible intervals give the lower and upper limits that contain each parameter with a probability of 0.94. A wide interval indicates a significant uncertainty in the parameter. For example, the interval of $\theta^s_1$ for State 3 is −51 to −49, while for State 5 it is −59 to −55. This variation shows more uncertainty in State 5 for $\theta^s_1$ than in State 3.
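The two diagnostics discussed above, the Gelman-Rubin statistic and the highest posterior density interval, can be computed from raw samples as in the following sketch. It is illustrative only: the chains are synthetic draws rather than MCMC output, the basic (non-split) form of the Gelman-Rubin statistic is used, and the function names `gelman_rubin` and `hpd` are hypothetical rather than the routines of any particular library.

```python
import math
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # average within-chain variance
    between = n * variance(chain_means)            # between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return math.sqrt(var_hat / within)

def hpd(samples, mass=0.95):
    """Narrowest interval that contains the requested fraction of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))                # samples inside the interval
    width, start = min((ordered[i + k - 1] - ordered[i], i) for i in range(n - k + 1))
    return ordered[start], ordered[start + k - 1]

# Synthetic "chains" for a slope parameter; values are illustrative.
rng = random.Random(3)
chain_a = [rng.gauss(-0.78, 0.02) for _ in range(1000)]
chain_b = [rng.gauss(-0.78, 0.02) for _ in range(1000)]
chain_c = [rng.gauss(-0.50, 0.02) for _ in range(1000)]   # deliberately disagrees

r_hat_good = gelman_rubin([chain_a, chain_b])   # chains agree: value near 1
r_hat_bad = gelman_rubin([chain_a, chain_c])    # chains disagree: value well above 1
low, high = hpd(chain_a + chain_b, mass=0.95)
print(f"R-hat (agreeing chains): {r_hat_good:.3f}, R-hat (disagreeing): {r_hat_bad:.1f}")
print(f"95% hpd of slope: [{low:.3f}, {high:.3f}]")
```

For a symmetric, unimodal posterior the hpd interval nearly coincides with the interval between the 2.5 and 97.5 percentiles; the two differ when the posterior is skewed.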
Bayesian credible intervals signify which parameter values have the highest probability for a given dataset and prior distribution. Confidence intervals from linear regression do not directly give the probable values of the parameters. A 95% confidence interval in linear regression means that, if the experiment were repeated many times, about 95% of the intervals constructed in this way would contain the true parameter [39].
To predict the signal strength for a given distance and sea state, we can use the uncertainty of the model parameters. Figure 20 shows the prediction for a distance of 30 km in State 3. Also, we evaluated the confidence interval of the Bayesian model. From Table 3, State 5 has posterior means of −57.2 and −0.48 for the parameters $\theta^s_1$ and $\theta^s_2$. The hpd bounds (2.5 and 97.5) are [−58.45, −55.95] for $\theta^s_1$ and [−0.52, −0.44] for $\theta^s_2$. Based on this data, we can infer that there is a 95% chance that rss decreases by between 0.44 and 0.52 (in the units of rss) for every additional kilometer of distance. Figure 21 shows a 95% confidence interval of predictions in State 5. We compared the performance of the Bayesian model with the following existing prediction models: Random Forest Regressor (RFR), Extra Trees Regressor (ETR), Hoeffding Tree Regressor (HTR), and Decision Tree Regressor (DTR). The Mean Absolute Percentage Error (MAPE) of each prediction model is shown in Figure 22. Here we can observe that the Bayesian model has better prediction accuracy in all sea states.
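As a small worked illustration of the quantities used in this comparison, the sketch below computes a point prediction from the State 5 posterior means quoted above (−57.2 and −0.48) and the MAPE of a few predictions. The "observed" rss values are hypothetical and serve only to show the arithmetic; the function names are likewise assumptions.

```python
def predict_rss(distance_km, intercept=-57.2, slope=-0.48):
    """Point prediction of rss from the State 5 posterior means."""
    return intercept + slope * distance_km

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

distances = [10.0, 20.0, 30.0]
observed = [-62.0, -66.5, -72.0]            # hypothetical measurements
predicted = [predict_rss(d) for d in distances]

error_pct = mape(observed, predicted)
print("predictions:", [round(p, 1) for p in predicted])
print(f"MAPE: {error_pct:.3f}%")
```

Because rss values are negative, the absolute value in the MAPE is taken over the ratio, so the sign of the measurements does not affect the result.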
In addition, we evaluated the performance of the online learning scheme on real-time data. A sequence of rss recorded in a selected link is provided to the learning algorithm as a mini-batch. The training window size is a critical factor affecting model updates and prediction performance. If the signal strength distribution does not change considerably over a period of time, we can adopt the previous prediction models. Hence, an adaptive windowing scheme is employed. The rolling variance of the real-time data is utilized to understand the changes in signal distribution. Fixed and adaptive window sizes are applied to test the predictive performance of the online learning scheme. Figure 23 shows the MAPE of the fixed and the adaptive windowing schemes. Here we can observe that the prediction error is high in the initial batches and decreases in the subsequent batches due to the model updates. For a fixed window size of 15, more model updates are required than for a window size of 20. Compared to the fixed-window online algorithm, adaptive windowing takes fewer model updates for the same training set without reducing accuracy.

FIGURE 21. Confidence interval of rss in Bayesian regression for State 5: the 95% confidence intervals for prediction signify that there is a 95% probability that the observed rss falls between these upper and lower limits.

We also compared the prediction accuracy of the offline Bayesian and hybrid models. Figure 24 shows the comparison of MAPE in the offline Bayesian and the hybrid Bayesian schemes for all sea states. S3Bayesian, S4Bayesian, and S5Bayesian represent the offline models applied in States 3, 4, and 5. S3Hybrid, S4Hybrid, and S5Hybrid denote the hybrid models. In all sea states, the hybrid model has less error than the offline models, which utilize historical data alone. Here we can observe that employing real-time data in the hybrid model improves the prediction accuracy.

VII. CONCLUSION
Designing predictive models for OCN links is a challenging problem due to the extremely dynamic nature of the marine environment. In this paper, we analyzed the behavior of wireless signals in different regions of the sea and investigated the factors affecting communication with the data collected from sea trials over the Arabian Sea. We presented a hybrid approach combining an offline and an online predictive model for learning signal strength characteristics using a Bayesian framework. The offline model is trained on historical data, while the online model exploits real-time data in edge devices for learning. The proposed learning scheme is evaluated on real data collected from sea trials. We studied the predictive performance of the learning framework and inferred that the performance improved with the hybrid learning scheme. The prediction accuracy is expected to improve when more real-time data is available for online learning. In the future, we plan to investigate propagation-dependent features in the learning scheme.