A Deep Learning Framework for Real-Time Bird Detection and Its Implications for Reducing Bird Strike Incidents

Bird strikes are a substantial aviation safety issue that can result in serious damage to aircraft components and even passenger deaths. In response to this growing threat, new and more efficient detection and prevention technologies are urgently needed. This paper presents a novel deep learning model developed to detect birds and mitigate bird strike risks in airport environments, thereby improving aircraft safety. Drawing on an extensive database of bird images covering different species and flight patterns, the research adopts sophisticated image augmentation techniques that generate multiple aircraft operation scenarios, ensuring that the model is robust under varied conditions. The methodology centers on a spatiotemporal convolutional neural network that employs spatial attention structures together with dynamic temporal processing to recognize flying birds precisely. A key feature of this research is its dual-focus architecture, which consists of two components: an attention-based temporal analysis network and a spatially aware convolutional neural network. This architecture can identify specific features nested in a crowded and shifting backdrop, thereby lowering false positives and improving detection accuracy. The model's attention mechanisms sharpen its focus by identifying the vital features of bird flight patterns. The results show that the proposed model outperforms existing bird detection systems in both accuracy and real-time responsiveness. An ablation study demonstrates the indispensable role of each component, confirming their synergistic effect on detection performance. The research substantiates the model's applicability as part of an airport bird strike surveillance system, providing an alternative prevention strategy. This work benefits from a distinctive application of deep learning features, yielding a scalable and reliable tool for addressing the bird strike problem.


Introduction
The interaction between aviation safety and wildlife, including birds, has been a subject of concern and research for many years within the aviation industry [1,2]. Bird strikes, which are a relatively common phenomenon, a serious threat to aviation security, and an expensive problem, have been increasing alongside the growth in air traffic [3]. This article presents an improved deep learning model for sensing and avoiding bird strikes in the areas surrounding airports.

Background
Bird strikes have been a persistent problem of aviation since its beginning: the first recorded incident occurred in 1905 [4]. Such accidents can cause irreversible damage to an aircraft, moderate operational disturbances, and even catastrophic failure, posing a threat to human lives [5][6][7]. According to the Federal Aviation Administration (FAA), bird strikes were a major cause of damage to US civil and military transportation, estimated at an annual amount of USD 600 million in 2010, and the costs are higher today [8]. The increase in global air traffic and variations in bird populations due to environmental changes are further reasons for the rise in bird strike incidents [9]. The array of solutions used for bird strike avoidance ranges from habitat management to avian radar systems and acoustic deterrents [10]. Nevertheless, these appear inadequate in sensing capability and flexibility when different environmental conditions arise. This explains the urgent need to look for better and superior alternatives.
The authors of [11] aimed to develop a prediction model using machine learning techniques such as fuzzy decision trees and dynamic neural networks, which are quite useful in aviation safety systems, to forecast aircraft failures in advance, improve aviation safety, and decrease aircraft crash rates. The prediction accuracy falls within a range of 80% to 90%. Among other evaluations, the proposed aircraft crash prediction model was tested against a synthetic dataset.
In another research study [12], a new and highly challenging dataset, AirBirds, was presented, formed by 118,312 images obtained by an aerial camera. It has a complete set of 409,967 annotations attached manually to each flying bird captured in the images. The average size of the annotated objects is around 10 pixels in the 1920 × 1080 images. The dataset consists of airport images captured by a network of four cameras across the four seasons of one year. It features birds both on the ground and in flight, changes in lighting, and 13 meteorological situations. In the work of [13], the researchers presented a collection of drone and bird images from various environments and habitats. The dataset was curated using the Roboflow software, which allowed custom editing of the images with AI-assisted bounding boxes, polygons, and instance segmentation. The software supported a variety of input and output formats, including import and export, providing an interface to different machine learning frameworks. To attain the best accuracy, each image was manually segmented edge to edge and annotated in detail for training the YOLO-based model. The research in [14] used a deep learning-based object detection framework on aerial photographs captured by an Unmanned Aerial Vehicle (UAV). The aerial photography dataset included photos of different birds taken in various habitats, near lakes, and adjacent to farmland. Moreover, aerial views of placed bird decoys were filmed to obtain additional bird information. Bird detection and classification models using Faster R-CNN, R-FCN, SSD, RetinaNet, and YOLO were created, and their time complexity as well as precision were compared. The test results revealed that Faster R-CNN performs better than YOLO in terms of accuracy, while YOLO is the fastest among the models.
The authors of [15] developed a unified spatiotemporal CNN model for visual object segmentation, composed of a spatial segmentation network and a temporal coherence network, which achieved accurate results on challenging datasets [16][17][18][19]. The authors of [20] tested feature extraction variations and a spatiotemporal convolutional deep neural network (STCNN) for driver fatigue detection. As the experiments revealed, the combination of the two produced better results than either feature extraction or the STCNN used independently. The proposed STCNN architecture includes an attention, time, and convolutional neural network sequence inspired by the attention mechanism of CNNs. In another study [21], a new approach to the aviation failure problem was suggested that combined STL (Seasonal-Trend decomposition using Loess) with a hybrid model using transformer and ARIMA components. The combination of STL and ARIMA proved effective for predicting aviation failures.
Previous approaches have largely relied on simpler CNNs or traditional machine learning algorithms that do not account for the temporal dynamics of moving objects. Such models lack the sophistication needed to differentiate between birds and other moving objects such as aircraft, leading to significant detection inaccuracies. Additionally, these systems often require extensive manual tuning and are not scalable to different or larger datasets without substantial reconfiguration.
Fundamental research on aviation safety gravitates toward machine learning models for aircraft fault estimation and object detection; however, there are insufficient studies specifically on real-time bird detection for preventing bird strike incidents. The present study addresses this gap by introducing new feature extraction methods coupled with deep learning techniques.

Motivation
Deep learning, a transformative AI technique, has demonstrated outstanding abilities in analyzing visual data using CNNs and related methods, making it apt for bird detection in densely crowded areas such as airport localities. This work draws inspiration from the power of neural networks in this field, and its objective is to apply deep learning to the real-time identification of birds, which is vital for aircraft safety equipment. The motivation for this research is rooted in the need to improve existing bird detection systems, which are often limited in detection range and accuracy.

Problem Definition
Precise, timely bird detection in and around airport areas, with the aim of preventing bird strikes, is the focus of this study. The complexity lies in the multitude of bird species with different sizes, flight patterns, and behaviors, as well as in the unusual airport environment with moving aircraft, vehicles, and various types of infrastructure. Designing a model capable of distinguishing birds from fast-changing backgrounds under different weather conditions and times of day is a considerable challenge.

Objectives
The main objective of the paper is to advance bird strike prevention technologies through the development of a novel deep learning framework optimized for real-time bird detection in aviation environments. The key goals supporting this objective are as follows:
• To develop spatiotemporal convolutional neural network (ST-CNN) mechanisms for enhanced feature recognition and dynamic temporal processing for accurate motion tracking.
• To optimize model performance metrics by fine-tuning the model's parameters, ensuring minimal false positives and negatives for reliable detection.

Novel Approach
The technique suggested in this research is a novel ST-CNN that incorporates both spatial and temporal characteristics to identify birds. Using ideas originating from the analysis of EEG signals for the detection of driver fatigue [20], the model combines spatial attention with temporal dynamics. This makes it easier to detect the movement patterns of birds and minimize false detections. The proposed technique enables the model to discard noisy information while targeting bird-specific cues in the video feed, such as wing flapping and changes of flight direction.

Structure of the Paper
The remainder of this paper is organized as follows: Section 2 describes the development of the model and the experimental methodology. Section 3 presents the findings of the performance evaluation of the proposed model and comparisons with other existing models. Finally, Section 4 concludes with a highlight of the research contributions and possible future research on the proposed model.

Methodology and Methods
Towards an effective solution to the bird strike problem, proper consideration was given to the following aspects: collection of a suitable dataset and creation of a DL model. This section defines and describes the basic components of the proposed study, ranging from data collection methods to the modeling method. The flow of the proposed research is illustrated in Figure 1.


Data Collection and Preprocessing
Given that deep learning models rely on the quality and variety of training data ("the more data, the better the model"), a large sample of bird images covering various species, flying poses, and environments was collected. To supplement the dataset's variety, additional techniques of image scaling, rotation, and lighting alteration were used to simulate different weather conditions, times of day, and bird distances from the camera. Table 1 lists the parameters used in data preprocessing, such as scaling images from 0.5× to 2×.
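As an illustration, the scaling, rotation, and lighting adjustments described above can be sketched in a few lines of NumPy. This is a generic sketch, not the authors' implementation: only the 0.5× to 2× scale range comes from Table 1, while the 90°-step rotation (standing in for rotation by an arbitrary angle) and the brightness range are illustrative assumptions.

```python
import random

import numpy as np


def augment(image: np.ndarray, rng: random.Random) -> np.ndarray:
    """Randomly scale, rotate, and relight one image.

    The 0.5x-2x scale range follows Table 1; the rotation (a random
    multiple of 90 degrees) and the brightness range are illustrative.
    """
    # Random scaling between 0.5x and 2x via nearest-neighbour resampling.
    s = rng.uniform(0.5, 2.0)
    h, w = image.shape[:2]
    new_h, new_w = max(1, int(h * s)), max(1, int(w * s))
    out = image[np.arange(new_h) * h // new_h][:, np.arange(new_w) * w // new_w]

    # Random rotation by 0, 90, 180, or 270 degrees.
    out = np.rot90(out, k=rng.randrange(4))

    # Brightness jitter simulates different lighting, weather, and time of day.
    factor = rng.uniform(0.7, 1.3)
    return np.clip(out.astype(np.float32) * factor, 0, 255).astype(image.dtype)
```

Each call draws fresh random parameters, so applying `augment` repeatedly to one source image yields many distinct training samples.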


Model Architecture Design
To resolve the limitations of the current models, the proposed model is a spatiotemporal convolutional neural network that integrates spatial attention mechanisms and temporal analysis into one system. The model architecture is based on recent work on using EEG signals to detect driver fatigue, generalized to visual input [20]. The model consists of two main components: a spatially aware convolutional network (SACN) and an attention-based temporal analysis network, named ATAN.

Attention-Based Temporal Analysis Network (ATAN)
ATAN is intended to work with sequences of image data, in which temporal features of the birds' motion describe the flow of data. It uses forward-backward (bidirectional) LSTM layers, further improved with attention mechanisms, to focus on essential temporal features reflecting the flight of birds, such as wing flapping and fluctuations in speed. The overall process of the proposed ATAN is shown in Figure 2.

Let S = {I_1, I_2, ..., I_T} denote a sequence of image frames captured over time T. Here, each frame I_t represents a snapshot at time t. These frames are processed individually to extract spatial features before being analyzed temporally. For each frame I_t, we define the extracted spatial features as F_t. These are obtained via a convolutional neural network (CNN), which processes each frame as follows:

F_t = CNN(I_t)    (1)

The sequence of feature maps {F_1, F_2, ..., F_T} is then fed into a bidirectional LSTM network to capture the temporal dynamics of the birds in flight. The bidirectional LSTM allows the network to have both forward-looking and backward-looking insights at each time step, capturing patterns that unfold over time. Let H_t^fwd and H_t^bwd denote the hidden states of the forward and backward LSTMs at time t, respectively:

H_t^fwd = LSTM_fwd(F_t, H_{t-1}^fwd)    (2)
H_t^bwd = LSTM_bwd(F_t, H_{t+1}^bwd)    (3)

The combined hidden state at each time t is then given by concatenating the forward and backward hidden states. Equation (4) is obtained by combining Equations (2) and (3):

H_t = [H_t^fwd; H_t^bwd]    (4)

With the temporal features extracted, an attention mechanism is applied to weigh the significance of each time step's features based on their relevance to the detection task. The attention weights α_t are computed using a softmax function over the energy scores e_t, which are derived from the hidden states:

e_t = v_a^T tanh(W_a H_t + b_a),    α_t = exp(e_t) / Σ_{k=1}^{T} exp(e_k)    (5)

where v_a, W_a, and b_a are trainable parameters of the attention network. The context vector C, which represents a weighted sum of the hidden states, is then calculated:

C = Σ_{t=1}^{T} α_t H_t    (6)

This context vector C is a dynamic representation of the entire sequence, emphasizing the most relevant temporal features for bird detection. Finally, the context vector C is passed through a fully connected layer for classification. The output layer provides a probability distribution over two classes, "bird" and "no bird":

P = softmax(W_c C + b_c)    (7)

where W_c and b_c represent the weights and bias of the output layer, and P provides the probabilities of bird presence in the sequence of frames.
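For concreteness, the attention step (energy scores, softmax weights, and context vector) can be sketched in NumPy. This is a minimal sketch of a standard additive-attention layer consistent with the surrounding equations, not the authors' implementation; the random values below merely stand in for actual BiLSTM hidden states.

```python
import numpy as np


def attention_context(H: np.ndarray, W_a: np.ndarray, b_a: np.ndarray,
                      v_a: np.ndarray):
    """Compute attention weights over T hidden states and the context vector.

    H has shape (T, d): one combined BiLSTM hidden state per time step.
    Returns (alpha, C), where alpha sums to 1 and C = sum_t alpha_t * H_t.
    """
    # Energy score per time step: e_t = v_a . tanh(W_a H_t + b_a)
    e = np.tanh(H @ W_a.T + b_a) @ v_a            # shape (T,)
    # Softmax over time steps gives the attention weights alpha_t.
    e = e - e.max()                               # numerical stability
    alpha = np.exp(e) / np.exp(e).sum()
    # Context vector: attention-weighted sum of the hidden states.
    C = alpha @ H                                 # shape (d,)
    return alpha, C
```

The softmax guarantees that the weights form a distribution over time steps, so frames showing salient motion (e.g., a wing-flap) can dominate the context vector.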

Spatially Aware Convolutional Network (SACN)
SACN focuses on analyzing individual frames for spatial features, utilizing a series of convolutional layers followed by spatial attention mechanisms.This component is tasked with identifying bird-specific features against complex airport backgrounds, effectively distinguishing birds from other moving objects like aircraft and vehicles.Figure 3 represents the workflow of the SACN mechanism.

Let I = {I_1, I_2, ..., I_N} represent a batch of N image frames, where each frame I_n corresponds to a different point in time or a different camera view, with n ∈ {1, 2, ..., N}. Each frame I_n undergoes a series of convolutional operations that extract features at multiple levels of abstraction. We define the output feature map after the l-th convolutional layer for frame I_n as F_n^l. The convolutional operation can be mathematically represented as follows:

F_n^l = σ(W^l * F_n^{l-1} + b^l)    (8)

where "*" denotes the convolution operation, W^l and b^l are the weights and biases of the l-th convolutional layer, F_n^{l-1} is the input feature map from the previous layer with F_n^0 = I_n, and σ is the non-linear activation function, such as ReLU.

After each convolutional layer, a pooling layer is typically applied to reduce spatial dimensions and enhance the invariance of the features. The pooling operation for layer l is denoted as P_n^l and is defined as follows:

P_n^l = pool(F_n^l)    (9)

where pool represents the pooling function; here, max pooling is used. This is the most commonly used type of pooling in CNNs. It reduces the dimensionality of feature maps by applying a maximum filter to non-overlapping subregions of the initial feature map. For each such subregion, max pooling takes the maximum value:

P_n^l(i, j) = max_{(a, b) ∈ W(i, j)} F_n^l(a, b)    (10)

where W(i, j) is the window of the feature map F_n^l covered by the pooling operation at position (i, j).
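The per-layer convolution-and-pooling step can be illustrated with a minimal single-channel NumPy sketch. This is a generic illustration, not the authors' implementation; note that, as in most deep learning frameworks, "convolution" here is computed as cross-correlation (no kernel flip).

```python
import numpy as np


def conv2d_relu(F: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
    """Single-channel 'valid' convolution followed by ReLU: sigma(W * F + b)."""
    kh, kw = W.shape
    h, w = F.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Cross-correlation: elementwise product over the kernel window.
            out[i, j] = (W * F[i:i + kh, j:j + kw]).sum() + b
    return np.maximum(out, 0.0)  # ReLU activation


def max_pool(F: np.ndarray, k: int = 2) -> np.ndarray:
    """kxk max pooling over non-overlapping windows (dims divisible by k)."""
    h, w = F.shape
    return F.reshape(h // k, k, w // k, k).max(axis=(1, 3))
```

Stacking several `conv2d_relu` + `max_pool` pairs reproduces the layer-by-layer spatial reduction described above.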
To emphasize relevant features and suppress less useful information, a spatial attention mechanism is applied to the feature maps. The mechanism generates a spatial attention map A_n^l, which is element-wise multiplied with the feature map to weight the importance of each feature spatially:

A_n^l = Conv(F_n^l; W_a^l),    F̃_n^l = A_n^l ⊙ F_n^l    (11)

where Conv is a convolution operation specific to the attention mechanism, W_a^l are the weights of the attention convolution, and ⊙ represents element-wise multiplication.

The feature maps from different layers can be integrated using a feature integration function φ, which combines information across layers to form a rich representation of the spatial features:

F_n^integrated = φ(F̃_n^1, F̃_n^2, ..., F̃_n^L)    (12)

The integrated feature map F_n^integrated is then flattened and passed through one or more fully connected layers for classification. The output is a probability score indicating the likelihood of a bird's presence:

P_n = softmax(W_c F_n^integrated + b_c)    (13)

where W_c and b_c are the weights and bias of the fully connected layer.

The SACN is trained using a supervised learning approach, where the loss function measures the discrepancy between the predicted probabilities and the ground-truth labels. The loss function L_SACN is typically a cross-entropy loss, defined as follows:

L_SACN = -(1/N) Σ_{n=1}^{N} [y_n log P_n + (1 - y_n) log(1 - P_n)]    (14)

where y_n represents the true label for frame I_n, with y_n = 1 indicating the presence of a bird and y_n = 0 otherwise.
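The cross-entropy loss L_SACN can be computed directly from the predicted probabilities, as in the short sketch below. Averaging over the batch is an assumption here, since the text does not specify the normalization.

```python
import numpy as np


def sacn_loss(p: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Binary cross-entropy over a batch of N frames.

    p[n] is the predicted probability of a bird in frame n; y[n] is in {0, 1}.
    Probabilities are clipped away from 0 and 1 to avoid log(0).
    """
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```

The loss approaches 0 as confident predictions agree with the labels and grows without bound for confident mistakes, which is what drives the supervised training.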

Integration and Fusion
The outputs of ATAN and SACN are fused using a custom integration layer, combining temporal and spatial insights to make a final prediction. This fusion approach ensures that the model's decision-making process benefits from a comprehensive understanding of both the spatial appearance and temporal behavior of birds in flight. Assume that the ATAN and SACN output feature maps for the n-th frame are represented as C_n (the context vector from ATAN) and F_n^integrated (the integrated feature map from SACN), respectively. The workflow of ATAN and SACN is given in Figure 4.

The goal of the fusion layer is to combine these two sets of features in a way that preserves and utilizes the information from both sources. This is conducted through a fusion function ψ, which is defined as follows:

F_n^fused = ψ(C_n, F_n^integrated)    (15)

A common choice for ψ is concatenation followed by a fully connected layer that mixes the features, although other fusion techniques such as summation or multiplication could also be considered. If concatenation is used, the fusion function can be mathematically expressed as follows:

F_n^fused = σ(W_fused [C_n; F_n^integrated] + b_fused)    (16)

where W_fused and b_fused are trainable weights and biases of the fusion layer, and [;] denotes the concatenation operation.
The fused feature vector F_n^fused is then fed into a classification layer to make a final prediction about the presence of a bird in the frame. The output of the classification layer is a probability distribution over the possible classes, given by the following:

P_n = softmax(W_class F_n^fused + b_class)    (17)

where W_class and b_class are the weights and bias of the classification layer, and P_n represents the predicted probabilities.
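A minimal NumPy sketch of the concatenation-based fusion and classification steps follows. The tanh nonlinearity inside the fusion layer is an assumption, since the text only specifies a fully connected layer; the shapes are illustrative.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract max for numerical stability
    return np.exp(z) / np.exp(z).sum()


def fuse_and_classify(C_n, F_int, W_fused, b_fused, W_class, b_class):
    """Fuse ATAN and SACN features by concatenation, then classify.

    Returns a probability distribution over the two classes
    ("bird", "no bird").
    """
    z = np.concatenate([C_n, F_int])           # [C_n ; F_n^integrated]
    fused = np.tanh(W_fused @ z + b_fused)     # fully connected fusion layer
    return softmax(W_class @ fused + b_class)  # class probabilities P_n
```

Swapping the concatenation for elementwise summation or multiplication (with matching dimensions) would implement the alternative fusion choices mentioned above.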
During training, a loss function is employed to optimize the model parameters. Given that the model performs a fusion of temporal and spatial features, the total loss L_total is a combination of losses from both ATAN and SACN as well as a classification loss on the fused output:

L_total = λ_1 L_ATAN + λ_2 L_SACN + λ_3 L_class    (18)

where the coefficients λ_1, λ_2, and λ_3 are hyperparameters that balance the contributions of the respective components. The classification loss L_class for the fused feature vector is typically a cross-entropy loss if the task is a classification problem, defined as follows:

L_class = -(1/N) Σ_{n=1}^{N} [y_n log P_n + (1 - y_n) log(1 - P_n)]    (19)

where y_n is the true label for the presence or absence of a bird in the n-th frame.

Proposed Algorithm
In the framework of the current study, we focus on preventing bird strikes through real-time detection. The proposed work develops a comprehensive algorithmic approach that seamlessly integrates with the spatiotemporal convolutional neural network architecture detailed in Section 2.2. This section describes the procedural aspects of the proposed approach, encompassing image preprocessing, bird detection, and the fusion of spatial and temporal data for enhanced decision-making.

Image Preprocessing and Augmentation
The initial stage of the proposed algorithm involves meticulous preprocessing and augmentation of the raw image data. Given a batch of raw images I, each image I_i is first resized to a uniform dimension, ensuring consistency across the dataset. Following that, each image undergoes normalization to standardize pixel values, preparing the data for efficient model processing. Augmentation plays a crucial role in enhancing the robustness of the model. Through techniques such as random flipping, rotation by an angle θ, and scaling by a factor s, it simulates a variety of environmental conditions. This diversity in the training data is instrumental in developing a model capable of accurate bird detection across different scenarios. The data preprocessing process is given in Algorithm 1.
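The resize-and-normalize step can be sketched as follows. The 640 × 640 target matches the image dimensions of the dataset used in this study; the nearest-neighbour resampling and the [0, 1] normalization range are illustrative choices, not the authors' exact pipeline.

```python
import numpy as np


def preprocess(image: np.ndarray, size: tuple = (640, 640)) -> np.ndarray:
    """Resize to a uniform dimension and normalize pixel values to [0, 1]."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize to the uniform input dimension.
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = image[rows][:, cols]
    # Normalize 8-bit pixel values into the unit interval.
    return resized.astype(np.float32) / 255.0
```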

Bird Detection Model Prediction
After preprocessing, the images are fed into the deep learning model M to detect birds. The model's prediction result includes confidence scores P for every possible bird detection in an image and the corresponding bounding boxes B. If a detection's confidence is greater than or equal to a predetermined threshold T, it is considered valid and is used in the subsequent analysis. To further refine the detection outcomes and remove duplicate detections, the method uses Non-Maximum Suppression (NMS). This step is crucial in retaining the most likely bird detections while minimizing the model's false positives. Furthermore, NMS is pivotal in enhancing the accuracy and reliability of the bird strike prevention system: it minimizes false positives and overlaps among detected birds, providing clearer and more actionable insights for aviation safety measures.
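The confidence filtering and greedy NMS described above can be sketched in plain Python. The (x1, y1, x2, y2) box format and the default thresholds are illustrative assumptions, not values taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def nms(boxes, scores, conf_t=0.5, iou_t=0.5):
    """Greedy Non-Maximum Suppression over confidence-filtered detections."""
    # Keep only detections at or above the confidence threshold T.
    dets = [(s, b) for s, b in zip(scores, boxes) if s >= conf_t]
    dets.sort(key=lambda d: d[0], reverse=True)  # highest confidence first
    kept = []
    for s, b in dets:
        # Discard b if it overlaps a higher-scoring kept box too much.
        if all(iou(b, kb) < iou_t for _, kb in kept):
            kept.append((s, b))
    return [b for _, b in kept]
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box and a low-confidence box are handled by the IoU and confidence thresholds, respectively.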
A distinctive feature of the proposed approach is the integration and fusion of temporal and spatial data.It is derived from ATAN and SACN, respectively.This fusion is facilitated by a custom integration layer which combines the context vector C from ATAN with the integrated feature map F integrated from SACN.
The fusion function ψ amalgamates these data sources, leveraging both the spatial and temporal characteristics of bird movements. This comprehensive data integration ensures that the model's predictions are based on a holistic understanding of birds in flight, significantly improving detection accuracy. The process of the bird detection model is given in Algorithm 2. Further, Algorithm 3 presents the step-by-step process of applying non-maximum suppression to the calculated detections.

Training and Validation
The data were augmented to increase the dataset size and then split in the ratio 7:3, with 70% of the augmented data used for training the model and the remaining 30% used for validation. Training employed advanced optimization algorithms and loss functions suited to object detection tasks. The validation process is concerned with accuracy, speed, reliability, and performance under different environmental conditions, with the applied metrics including precision, recall, and the F1 score. In Algorithm 3, if a detection d_j is not redundant, it is added to D_final, and D_final is returned as the set of detections after applying NMS.
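The validation metrics named above can be derived directly from confusion-matrix counts; a short helper for doing so is sketched below (a generic computation, not the authors' evaluation code).

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1 score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)          # of predicted birds, how many were birds
    recall = tp / (tp + fn)             # of actual birds, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

With the counts later reported in the Results (TP = 970, TN = 960, FP = 30, FN = 40), this yields a precision of 0.97 and an accuracy of 0.965.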

Dataset and Experimental Setup
For the study on bird strike prevention technology, a valuable resource was the construction of a proper database for differentiating birds from UAVs [22]. The dataset contains 20,925 labelled images with dimensions of 640 × 640, featuring birds and drones captured in various settings and lighting. All the images are divided and labeled following the YOLOv7 specifications to facilitate accurate identification of the objects during the training of the deep learning models. With sets for training and validation as well as for testing model performance, the dataset provides comprehensive and optimized assistance in training and fine-tuning the model for real-life applications, and every image in the dataset is thoroughly annotated, allowing for the precise detection of airborne objects in real-time conditions. Table 2 presents the hardware and software used for the experiments, along with the deep learning parameters.

Results
Table 3 below presents a comparison of performance measurements between the developed model and six benchmark meta-architectures, namely Faster R-CNN, R-FCN, RetinaNet, SSD, YOLO v4, and YOLO v5. The assessment measures used are precision, recall, F1 score, accuracy, the area under the receiver operating characteristic curve (AUC-ROC), and the intersection over union (IoU). The proposed model shows higher efficiency across all investigated characteristics, which speaks to its potential effectiveness for bird detection in aviation security. Notably, the model achieves a precision of 97% and an accuracy of 96%, re-asserting the tool's capability to make accurate threat assessments on birds. A more detailed understanding of how the proposed model performs in terms of real-time bird detection is provided by the evaluation of the inference time and the confusion matrix, which can be found in Figures 5 and 6, respectively.
In more detail, Figure 5 presents a comparative analysis of the proposed model and the competing meta-architectures (Faster R-CNN, R-FCN, RetinaNet, SSD, YOLO v4, and YOLO v5) in terms of inference time. The inference time, measured in milliseconds (ms), reflects how quickly the system can decide in real-time applications such as preventing bird strikes in aviation. The proposed model demonstrates a competitive inference time of roughly 45 ms, supporting its viability for real-time analysis. Compared with the other architectures, this rapid processing capability shows that the model is optimized for speed without a significant loss of detection accuracy, making it well suited to aircraft safety systems where immediate response matters.
The performance of the proposed model has also been quantified using a confusion matrix, depicted in Figure 6. The confusion matrix reports the numbers of true positives, true negatives, false positives, and false negatives, i.e., the objects correctly and incorrectly recognized as birds, and thereby gives a precise handle on the model's accuracy. With 970 true positives and 960 true negatives, the model detects both bird presence and bird absence accurately. The comparatively low false positive and false negative counts (30 and 40, respectively) further suggest that the model is effective at suppressing false alarms. The high predictive accuracy, combined with the inference times reported in this study, positions the proposed model as a preferred candidate for real-time bird detection in safety-related aspects of aviation.
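The headline figures can be checked directly from the confusion matrix counts reported above (TP = 970, TN = 960, FP = 30, FN = 40); a quick arithmetic sketch, not the paper's evaluation code:

```python
# Recomputing the reported metrics from the confusion matrix counts
# (TP = 970, TN = 960, FP = 30, FN = 40).

def detection_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

p, r, f1, acc = detection_metrics(970, 960, 30, 40)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} accuracy={acc:.3f}")
# precision=0.970 recall=0.960 f1=0.965 accuracy=0.965
```

These values match the precision of 97%, recall of 96%, and accuracy of 96.5% quoted in the text.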
Figure 7 presents the training and validation accuracy over epochs, where both accuracies exhibit an upward trend, reflecting the model's improving capability to correctly classify birds. The close tracking of validation accuracy with training accuracy implies that the model generalizes well to unseen data.
Figure 9a shows that the model is sensitive and precise in identifying birds, as indicated by the labelled bounding boxes around a single bird. This behaviour reflects the robustness of the model's feature extraction layers, which is significant for accuracy in real-time online applications. According to Table 1, the model reaches a precision of 97%, illustrating its accuracy in localizing birds in a visual scene. Figure 9b shows how the model predicts multiple birds flying next to each other. This figure highlights another important strength of the model: the ability to handle crowded scenes with all birds properly separated from one another, an important quality for operational systems given the high number of birds per frame expected in scenes such as an airport. The estimated accuracy is 96.5% and the recall is 96%; these figures, highlighted in Table 1, demonstrate the model's ability to maintain high detection performance under adverse conditions.

Discussion
This discussion concerns the applicability of the proposed spatiotemporal convolutional neural network (ST-CNN), which plays a significant role in improving bird strike prevention technologies by detecting birds in real time. The model is distinguished by its combination of spatial and temporal data processing, which is crucial for identifying and tracking birds in dynamic airport environments.
The proposed model integrates an ATAN, which targets the temporal aspect of the data, with a SACN, which targets the spatial aspect. This design enables precise detection by highlighting the appearance of birds against their backgrounds as well as their movements, a substantial enhancement over previous bird detection systems. The spatial attention mechanisms allow the model to concentrate on vital cues of bird behaviour, such as wing flapping or changes in direction, improving detection in environments with varying amounts of noise or interference.
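The spatial attention idea can be sketched in a few lines of numpy: pool the feature map across channels, form a sigmoid gate, and rescale the features so salient regions dominate. The pooling choice and the scalar weights here are assumptions for illustration, not the paper's exact layer:

```python
import numpy as np

# Minimal sketch of a spatial attention gate: channel-wise average and max
# pooling summarize where activity is concentrated, a weighted sum forms
# attention logits, and a sigmoid gate rescales the feature map so salient
# regions (e.g. a bird against clutter) dominate. w_avg and w_max stand in
# for learned parameters.

def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """feat: (C, H, W) feature map -> gated feature map of the same shape."""
    avg_map = feat.mean(axis=0)            # (H, W) average over channels
    max_map = feat.max(axis=0)             # (H, W) max over channels
    logits = w_avg * avg_map + w_max * max_map
    gate = 1.0 / (1.0 + np.exp(-logits))   # sigmoid attention map in (0, 1)
    return feat * gate                     # broadcast gate over all channels

feat = np.random.default_rng(0).standard_normal((8, 4, 4))
out = spatial_attention(feat)
print(out.shape)  # (8, 4, 4)
```

Because the gate lies in (0, 1), attention can only attenuate features, never amplify them; the network learns to attenuate background regions more strongly.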
In comparison with current approaches, the benefit of this model is supported by the performance results given in Table 1 of this paper. Specifically, it gains a clear advantage over Faster R-CNN, RetinaNet, and the YOLO series models in terms of precision, recall, and IoU. This superior performance shows that the proposed ST-CNN is not only more accurate but also faster in processing and responding to real-time scenarios, which is crucial for systems requiring immediate action, such as aviation safety systems.
In addition, the use of deep learning to achieve real-time interpretation of complex visual data represents considerable progress in the field. By decreasing false positives and increasing detection accuracy, the framework offers airports a valuable tool that can reduce the high costs associated with bird strikes while strengthening safety. Given the impact of deep learning on risk management in the aviation industry, including the reduction of human error, the proposed framework stands as a strong alternative to the conventional detection systems that have long struggled with this safety issue.
Despite its robust performance, the proposed model has limitations primarily related to its generalization capabilities and real-time processing constraints. While effective in diverse scenarios, its accuracy may diminish in environments drastically different from those in the training dataset, particularly under extreme weather conditions or highly cluttered backgrounds. Additionally, the computational demands of the model, while manageable on high-end hardware, could pose challenges when scaling to broader, real-time applications requiring ultra-low latency. These areas present opportunities for further optimization and testing to enhance the model's practical deployment in aviation safety and beyond.

Ablation Study
The ablation study aims to dissect the contributions of two critical components within the proposed model: ATAN and SACN. Table 4 presents the ablation study of the proposed methodology. The study evaluates the necessity and impact of each component by observing changes in performance metrics under three distinct configurations: (i) Proposed Model, which integrates both ATAN and SACN and serves as the control setup to benchmark the full capabilities of the proposed system; (ii) Without ATAN, which omits the ATAN component and leverages only SACN, assessing the importance of spatial feature analysis in the absence of temporal dynamics; and (iii) Without SACN, which removes SACN and relies solely on ATAN, determining the impact of temporal analysis when spatial features are not considered. Each model configuration is rigorously tested under uniform conditions using the same dataset, ensuring that any variations in performance are attributable solely to the changes in model architecture.
Proposed Model: The performance of the full model, incorporating both ATAN and SACN, sets a high standard across all evaluated metrics, including precision, recall, F1 score, accuracy, AUC-ROC, and IoU. This configuration highlights the synergistic effect of combining spatial and temporal analyses, crucial for the dynamic detection of objects such as birds in flight. The integration ensures robust detection capabilities, minimizing both false positives and false negatives, which is vital for applications requiring high reliability.
Without ATAN: This configuration shows a significant reduction in performance across all key metrics. The decrease in recall and precision is particularly notable, underscoring the critical role of temporal dynamics in tracking and identifying objects over time. The absence of ATAN suggests that temporal features are essential for understanding object behavior and movement, which are key to reducing identification errors and improving model responsiveness to real-time changes.
Without SACN: The removal of SACN results in the most dramatic decline in performance, especially in metrics such as precision and IoU. This outcome emphasizes the importance of spatial analysis in accurately distinguishing objects from their backgrounds. Spatial features provide the necessary context for the model to localize and classify objects effectively within their environments. The pronounced drop in performance metrics indicates that spatial analysis is not merely supportive but fundamental to the model's detection capabilities.

Conclusions
This research has successfully demonstrated the development and implementation of a novel deep learning framework for real-time bird detection, aimed at mitigating the risks associated with bird strikes in aviation. Through the integration of a spatiotemporal convolutional neural network, enhanced with spatial attention mechanisms and temporal analysis, the proposed model has shown superior performance in accurately identifying birds within diverse environmental settings. The experimental results provided a comprehensive evaluation of the model's efficacy. The performance comparison with other meta-architectures, including Faster R-CNN, R-FCN, RetinaNet, SSD, YOLO v4, and YOLO v5, shows that the proposed model achieved higher mean precision, mean recall, mean F1 score, mean accuracy, and maximum mean IoU. In addition, the inference time analysis supports the proposed model as a real-time solution, which is vital for aviation safety systems. The confusion matrix further shows that the model is highly reliable in differentiating between birds and other objects, a capability essential for reducing false alarms and activating protective measures only when necessary. Further research could investigate the addition of other data sources, such as radar and acoustic data, for enhanced detection. Another avenue is transfer learning and domain adaptation, which could broaden the model's applicability across different geographies and operating environments.
1. Attention-Based Temporal Analysis Network (ATAN). ATAN operates on sequences of image data, in which the temporal features of birds' motion describe the flow of data. It uses forward-backward (bidirectional) LSTM layers, further improved with attention mechanisms, to focus on essential temporal features reflecting bird flight, such as the flapping of wings and fluctuations in speed. The overall process of the proposed ATAN is shown in Figure 2.
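The attention pooling that ATAN is described as applying over its bidirectional-LSTM outputs can be sketched as a softmax-weighted summary of per-frame hidden states. The scoring vector `w` stands in for learned parameters; this is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

# Sketch of attention pooling over per-time-step hidden states: score each
# step, softmax-normalize the scores, and form a context vector weighted
# toward the informative frames (e.g. where wing flapping or a speed change
# occurs). `w` is a stand-in for learned attention parameters.

def temporal_attention(hidden, w):
    """hidden: (T, d) per-time-step states; w: (d,) scoring vector."""
    scores = hidden @ w                              # (T,) one score per step
    scores = scores - scores.max()                   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # softmax weights
    context = alpha @ hidden                         # (d,) weighted summary
    return alpha, context

rng = np.random.default_rng(1)
hidden = rng.standard_normal((6, 16))   # 6 frames, 16-dim bi-LSTM output
alpha, context = temporal_attention(hidden, rng.standard_normal(16))
print(alpha.sum(), context.shape)  # weights sum to 1; context is (16,)
```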

2. Spatially Aware Convolutional Network (SACN). SACN focuses on analyzing individual frames for spatial features, utilizing a series of convolutional layers followed by spatial attention mechanisms. This component is tasked with identifying bird-specific features against complex airport backgrounds, effectively distinguishing birds from other moving objects such as aircraft and vehicles. Figure 3 represents the workflow of the SACN mechanism.

Figure 3. Workflow of SACN. Let {F_1, F_2, ..., F_N} represent a batch of N image frames, where each frame F_i corresponds to a different point in time or a different camera view, with i ∈ {1, 2, ..., N}. Each frame F_i undergoes a series of convolutional operations that extract features at multiple levels of abstraction. We define the output feature map after the l-th convolutional layer for frame F_i as C_i^l. The convolutional operation can be mathematically represented as follows:

C_i^l = σ(W^l * C_i^{l-1} + b^l),

where W^l and b^l denote the weights and bias of the l-th layer, * denotes convolution, and σ is a nonlinear activation function.
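The per-layer convolutional operation above can be written out explicitly for a single channel; this is a minimal illustrative sketch (valid cross-correlation plus a ReLU activation), not the paper's actual layer configuration:

```python
import numpy as np

# Single-channel sketch of one convolutional layer: slide kernel W over the
# feature map C, add bias b, apply ReLU. Illustrative only; real layers use
# many channels and learned kernels.

def conv2d_valid(C, W, b=0.0):
    """Valid-mode 2D cross-correlation of C (H, W) with kernel W, plus ReLU."""
    kh, kw = W.shape
    H, Wd = C.shape
    out = np.empty((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(C[i:i + kh, j:j + kw] * W) + b
    return np.maximum(out, 0.0)  # ReLU activation

C = np.arange(16.0).reshape(4, 4)        # toy 4x4 feature map: C[i, j] = 4i + j
W = np.array([[-1.0, 0.0], [0.0, 1.0]])  # toy 2x2 difference kernel
print(conv2d_valid(C, W))  # every entry is C[i+1, j+1] - C[i, j] = 5.0
```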

Figure 4. Workflow of integration and fusion.


Algorithm 2: Bird Detection Model Prediction
Variables:
- D represents the set of detection results.
- P is the set of prediction scores.
- B is the set of bounding boxes.
- T is the detection threshold.
- I_processed is a preprocessed image.
- M represents the deep learning model.
Algorithm: Detect_Birds(I_processed, M, T)
1. Pass I_processed through model M to obtain predictions P and bounding boxes B.
2. Initialize an empty set for detection results, D = {}.
3. For each (p_i, b_i) in (P, B): if p_i > T, add (b_i, p_i) to D.
4. Apply Non-Maximum Suppression (NMS) to D, yielding D_filtered.
5. Return D_filtered.

Algorithm 3: Non-Maximum Suppression (NMS)
Variables:
- Let D = {d_1, d_2, ..., d_n} represent the set of detected bounding boxes, where each detection d_i is associated with a confidence score s_i and a bounding box b_i.
- D_final is the final set of detections after applying NMS.
- IoU(b_i, b_j) represents the intersection over union between bounding boxes b_i and b_j.
- T_IoU is the IoU threshold for determining overlap.
1. Sort detections: sort all detections D in descending order of confidence score: D_sorted = sort(D, key = s_i, order = descend).
2. Initialize final detections: set D_final = {}.
3. Iterate over sorted detections: for each detection d_i in D_sorted, compare d_i with each detection d_j in D_final, computing IoU(b_i, b_j) = area(b_i ∩ b_j) / area(b_i ∪ b_j); if IoU(b_i, b_j) > T_IoU for any j, d_i is considered redundant and is not added to D_final.
4. Add non-redundant detections to the final set D_final.
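Algorithm 3 can be rendered directly as runnable Python; a minimal sketch assuming axis-aligned boxes in (x1, y1, x2, y2) form, not the paper's implementation:

```python
# Runnable sketch of Algorithm 3 (NMS): keep the highest-confidence boxes,
# suppressing any box whose IoU with an already-kept box exceeds T_IoU.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, t_iou=0.5):
    """detections: list of (box, score); returns the non-redundant subset."""
    ordered = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in ordered:
        # Keep the box only if it does not heavily overlap a kept box.
        if all(iou(box, kb) <= t_iou for kb, _ in kept):
            kept.append((box, score))
    return kept

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8), ((20, 20, 30, 30), 0.7)]
kept = nms(dets)
print(kept)  # the second box (IoU 0.81 with the first) is suppressed
```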

Figure 5. Comparison of inference times across models.

Figure 6. Confusion matrix of the proposed model.


Figure 7. Comparison of training and validation accuracy.

Figure 8 showcases the training and validation loss over epochs, depicting a steady decrease in loss values as training progresses. This trend indicates that the model is learning effectively, with both training and validation losses converging, suggesting a good fit without overfitting the training data.


Figure 8. Comparison of training and validation loss.


Figure 9. (a) Multiple detections for a single bird instance. (b) Effective bird detection in a dense scene.

Image augmentation step: given a batch of raw images I = {I_1, I_2, ..., I_n}, each image is normalized to I_i,norm; if a random draw r < p_op, the augmentation operation op is applied to I_i,norm, resulting in I_i,op; then I_i,op (or I_i,norm, if no operation was applied) is added to I′. Finally, I′ is returned.
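The augmentation step in this fragment can be sketched as follows; the normalization and the flip operation are placeholders for the actual augmentation set, which is not fully recoverable here:

```python
import random

# Sketch of the augmentation fragment: normalize each raw image, then with
# probability p_op apply an augmentation op, collecting results into I'.
# The normalize/flip operations are illustrative placeholders.

def augment_batch(images, op, p_op, rng):
    """images: list of pixel lists in [0, 255]; returns the augmented batch I'."""
    out = []
    for img in images:
        norm = [px / 255.0 for px in img]   # normalization to [0, 1]
        if rng.random() < p_op:             # draw r; apply op only if r < p_op
            norm = op(norm)
        out.append(norm)                    # I_i,op or I_i,norm
    return out

batch = [[0, 128, 255], [255, 0, 64]]
flipped = augment_batch(batch, op=lambda im: im[::-1], p_op=1.0,
                        rng=random.Random(0))
print(flipped)  # every image is reversed, since p_op = 1.0 always fires
```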

Table 2. Hardware and software specifications.