Prairie Dog Optimization Algorithm with Deep Learning-Assisted Aerial Image Classification on UAV Imagery

This study presents a Prairie Dog Optimization Algorithm with a Deep Learning-Assisted Aerial Image Classification Approach (PDODL-AICA) on UAV images. The PDODL-AICA technique exploits an optimal DL model for classifying aerial images into numerous classes. In the presented PDODL-AICA technique, the feature extraction procedure is executed using the EfficientNetB7 model. Besides, the hyperparameters of the EfficientNetB7 model are tuned using the PDO algorithm. The PDODL-AICA technique then uses a convolutional variational autoencoder (CVAE) model to detect and classify the aerial images. The performance of the PDODL-AICA model is studied on a benchmark UAV image dataset. The experimental values confirm the superiority of the PDODL-AICA approach over recent models in terms of different measures.


Introduction
Unmanned aerial vehicle (UAV) networks bridge the gap between airborne, spaceborne, and ground-based remote sensing information (RSI). Their lightweight and low-cost features allow affordable observations with high temporal and spatial resolutions [1]. The growth of RSI technology, the resulting significant increases in the spatial, temporal, and spectral resolutions of remotely sensed data, and the progress in information and communication technologies (ICT) for storage, data transmission, and integration are remarkably altering the way the Earth is observed [2]. The most vital use of RSI is Earth observation, and the core concern in Earth observation is monitoring changes in land cover. Aerial image classification (AIC) of scenes categorizes each aerial image into sub-regions, assigning the ground objects and types of land cover they contain to various semantic classes [3]. Many practical uses of RSI are available, such as urban development, computer cartography, and resource management, among which AIC is very important. Usually, similar land cover object types are shared among several kinds of scenes [4]. For example, commercial and residential scenes are two of the foremost scene types and may both contain roads, buildings, and trees; still, there are dissimilarities in the spatial distribution and density of these three types. So, classification of aerial scenes depends entirely on spatial and structural patterns, which makes it a complex issue [5].
The standard method constructs a holistic scene representation for classification [6]. In the RSI community, bag-of-visual-words (BoVW) is one of the best-known techniques for scene classification problems. It was originally designed for text analysis, where a text is represented by the occurrence of its words. The bag-of-words (BoW) method was employed with a clustering model to characterize images by the number of visual words produced by quantizing local features. The BoVW approach is thus a variant of the BoW method for image analysis, where each image is defined as an ordered set. Deep learning (DL) methodologies like convolutional neural networks (CNNs) have been widely recognized as an excellent technique for frequent computer vision (CV) uses (video or image classification and detection) and have also shown remarkable outcomes in several applications [7]. Thus, various benefits derive from employing DL models in emergency response and disaster management: they recover vital data promptly, permit superior preparation and response during critical situations, and aid decision-making procedures [8]. Past works have shown how DL models can outperform traditional machine learning (ML) techniques with hand-crafted features through transfer learning (TL), where a pre-trained CNN is applied as a feature extractor and one or more layers are inserted on top to execute the classification for the novel task [9].
This study presents a Prairie Dog Optimization Algorithm with a Deep Learning-Assisted Aerial Image Classification Approach (PDODL-AICA) on UAV images. The PDODL-AICA technique exploits an optimal DL model for classifying aerial images into numerous classes. In the presented technique, the feature extraction process is executed by customizing the EfficientNetB7 model, and a convolutional variational autoencoder (CVAE) is used to detect and classify the aerial images. The performance of the PDODL-AICA method is studied on a benchmark UAV image dataset. The contributions of the PDODL-AICA method are as follows:
• The presented PDODL-AICA technique implements the EfficientNetB7 model for feature extraction, exploiting a state-of-the-art architecture that balances model size and accuracy. EfficientNetB7 is recognized for its capacity to capture convolutional patterns and fine details in the data, leading to stronger feature extraction
• The hyperparameters of the EfficientNetB7 model are tuned by employing the PDO algorithm. This optimization ensures that the hyperparameters are finely adjusted to attain optimum performance, improving the overall effectiveness and accuracy of the technique
• The approach also incorporates a CVAE model for recognition and classification, allowing efficient modelling of convolutional data distributions. The CVAE integrates the power of CNNs with variational inference, making it appropriate for tasks requiring both recognition and classification
• The novelty of the PDODL-AICA technique lies in its integration of EfficientNetB7, PDO, and CVAE for advanced feature extraction, hyperparameter tuning, and efficient recognition and classification. This integrated model addresses several challenges in recognition and classification tasks, giving a comprehensive, efficient, and accurate outcome

Literature review
Behera et al. [10] developed a superpixel-aided multiscale CNN architecture to prevent misclassification in complex urban aerial images. The presented structure is a dual-tier DL-based segmentation framework. In the first phase, a superpixel-based simple linear iterative clustering system generates superpixel images with vital contextual data. The second phase includes a multiscale CNN design that employs these data-rich superpixel images to extract scale-invariant features. Rahman et al. [11] proposed a forest fire detection model based on a CNN architecture using a unique fire detection database. The model employs separable convolution layers alongside conventional layers for instant fire recognition, making it suitable for real-world use. Dewangan and Vij [12] present a new structure, a hybrid CNN and Long Short-Term Memory (HCNN-LSTM), which aims to automatically identify anomalies in farmland using images obtained from UAVs. The model uses a CNN for deep feature extraction, whereas an LSTM is used for the recognition task. Behera et al. [13] proposed a DL-based technique built on dense units that support the feed-forward nature of the network, thereby alleviating the vanishing gradient issue commonly found in deep models. The proposed end-to-end CNN design contains contracting and symmetric expanding paths that extract global features to separate vegetation types from aerial images. Pandey and Jain [14] describe an innovative conjugated dense CNN (CD-CNN) design with a novel activation named SL-ReLU for intelligent classification. CD-CNN combines data fusion and feature map extraction with the classification procedure. Jiskani et al. [15] presented a DL-based model, built on the ResNet50 architecture, for identifying insulator faults in real-world conditions.
Minu and Canessane [16] proposed an effective DL-based AIC model utilizing Inception with Residual Network v2 and a multilayer perceptron (DLIRV2-MLP). UAVs have mainly been used for a range of aerial imaging tasks. IRV2-based feature extraction is applied to create a beneficial set of feature vectors, and the AIC stage then processes the resultant feature vector through the MLP model. Samadzadegan et al. [17] developed a new DL-based model for effectively detecting two classes of objects: drones and birds. Evaluating the proposed system on the collected image dataset shows superior efficacy compared to existing recognition methods; notably, drones are frequently confused with birds due to their visual and behavioural resemblance. Cheng et al. [18] collectively review DL models, covering open challenges as well as autoencoder-based, CNN-based, and generative adversarial network-based models. Geetha and Sunitha [19] introduce a Pelican Optimization Algorithm with a Convolutional-Recurrent Hop-Field Neural Network (POA-CRHFNN) model. This method comprises a Gaussian Filter (GF) technique for noise removal and contrast enhancement, ShuffleNetv2 for feature extraction, and CRHFNN for spatial-temporal dependency capture. In Ref. [20], an Archimedes Optimization with DL-based Aerial Image Classification and Intrusion Detection (AODL-AICID) method is presented. This approach employs MobileNetv2, AOA, BPNN, and a stacked bidirectional LSTM (SBLSTM) model tuned with the Nadam optimizer for extraction, optimization, classification, and detection, respectively. Mogaka et al. [21] propose a co-design optimization technique for deploying the EmergencyNet CNN on resource-limited UAVs. This method encompasses channel-wise pruning to reduce network size and optimize the model. Additive powers-of-two (APoT) quantization is also utilized for further model compression and enhanced computational efficiency.

The proposed method
This research proposes an innovative PDODL-AICA technique for UAV images. The method exploits an optimal DL model for classifying aerial images into manifold classes. It contains an EfficientNetB7-based feature extractor, PDO-based parameter tuning, and CVAE-based classification processes. Fig. 1 portrays the workflow of the PDODL-AICA model.

Feature extraction: EfficientNetB7 model
The PDODL-AICA technique employs the EfficientNetB7 model for the feature extraction procedure. Due to the large increase in convolutional layers, network width, and depth, deep CNN (DCNN) architectures are usually overparameterized, making them computationally expensive and compromising network efficiency [22]. There is a tradeoff between network accuracy and efficiency: deeper networks generalize well on testing datasets, but their cost in terms of network parameters, model size, inference speed, and FLOPs (floating-point operations) increases. Google researchers introduced the EfficientNet family in 2019, comprising EfficientNetB0-B7, as a backbone structure that has outperformed many DCNN-based frameworks, namely InceptionV2, InceptionV3, ResNet50, and DenseNet, for transfer learning, image classification, segmentation, ImageNet, and other problems. This contradicts the classical scaling strategies of prior research, which arbitrarily increase the network's depth, width, and resolution to improve generalization capability. EfficientNet instead scales the CNN using a fixed set of scaling coefficients through the compound scaling model. Compound scaling balances the network width w, depth d, and resolution r by scaling each with a constant ratio, as shown in Eq. (1):

d = α^φ, w = β^φ, r = γ^φ, (1)

such that α·β²·γ² ≈ 2, where α ≥ 1, β ≥ 1, γ ≥ 1.
The α, β, and γ values are defined by a grid search. A user-defined parameter φ specifies the increase in computational resources given to the network. The FLOPs of a convolution operation are proportional to d, w², and r²: FLOPs double if the network depth is doubled, and increase fourfold if the width or resolution is doubled. The total increase in FLOPs follows the relation (α·β²·γ²)^φ, so overall FLOPs grow by roughly 2^φ for a new φ value. The EfficientNet structure has a stem block, followed by seven blocks and a final layer. Every block contains a variable number of modules, and the number of modules increases from EfficientNet-B0 to B7, along with the parameter count and depth. EfficientNetB0 is the simplest version, with 5.3M parameters and 237 layers, whereas EfficientNetB7 has 66M parameters and 813 layers. The EfficientNet structure exploits MBConv layers, similar to MnasNet and MobileNetV2. Since a normalization layer is already present in the stem, no further image standardization is needed as a preprocessing stage, and the network takes input images in the 0-255 range. The choice of EfficientNet variant depends on factors such as dataset size, the resources available for evaluation and model training, the model depth, the number of network parameters, and the batch size; EfficientNet-B5 through B7 differ from the smaller variants mainly in having more parameters and deeper networks.
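The compound scaling relation above can be sketched in a few lines of Python. The coefficients below (α = 1.2, β = 1.1, γ = 1.15) are the values reported in the original EfficientNet paper, assumed here for illustration:

```python
# Sketch of EfficientNet compound scaling. The base coefficients are the
# ones reported in the original EfficientNet paper (assumed, not derived
# from this article): alpha=1.2 (depth), beta=1.1 (width), gamma=1.15 (res.).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    d = ALPHA ** phi   # depth multiplier
    w = BETA ** phi    # width multiplier
    r = GAMMA ** phi   # resolution multiplier
    return d, w, r

# Conv FLOPs scale linearly with depth and quadratically with width and
# resolution, so the per-step FLOPs growth factor is alpha * beta^2 * gamma^2,
# constrained to be close to 2 (hence total FLOPs grow by roughly 2^phi).
flops_ratio = ALPHA * BETA ** 2 * GAMMA ** 2
print(round(flops_ratio, 3))  # -> 1.92, i.e. close to 2
```

Doubling φ therefore roughly doubles the FLOPs at each step, which is the budget-controlled scaling behaviour the text describes.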

Hyperparameter tuning: PDO model
At this stage, the hyperparameter tuning of the EfficientNetB7 technique takes place using the PDO model. PDO is a biologically inspired optimization technique motivated by four prairie dog (PD) activities, which determine the exploitation and exploration stages of the algorithm [23]. PDs spend their days building new burrows, eating, maintaining existing burrows, and monitoring for enemies. A PD produces sounds according to the current event, which might be an alarm for a predator threat or for food source availability. They can communicate different signals for chasing patterns and different kinds of predators, and PDs show different survival responses depending on the type of signal received: each dog will watch from the burrow entrance in the case of a coyote, but will escape into the burrows if the sound indicates a human predator. Hence, different responses to predators are applied in the exploitation phase. These survival habits and communication sounds are the motivation for the PDO technique. In designing the algorithm, the following assumptions were made:
-Every colony creates a group of wards, where an entire coterie (CT) lives in a ward.
-Each PD is a member of CT that has n dogs, whereas m CT forms a colony.
-There is no difference between PDs; they are only divided into different groups.
-The new nutrition supply and anti-predation are the only sounds of communication.
-The number of caves in a ward ranges from 10 to 100.
-Because the colony is divided into m CTs in which the same activity is done simultaneously, the exploitation and exploration steps are repeated m times.
-The anti-predation activities, signal transfer, food search, and burrow-creating actions are limited to the dogs living in the same CT.

Initialization
Like other metaheuristic approaches, the PDO technique assigns random initial positions to the PDs within the search space. The matrices in Eqs. (2) and (3) give the locations of the PDs within the CTs that are part of the colony and the locations of each CT within the colony.
Here, CT_{i,j} represents the j-th dimension of the i-th CT, and PD_{i,j} indicates the j-th parameter of the i-th PD in a particular CT. According to Eqs. (4) and (5), the CT and dog locations are defined correspondingly.
In Eqs. (6) and (7), UB_j and LB_j are the upper and lower limits of the search range for the j-th dimension, ub_j and lb_j are the corresponding bounds within a coterie, and U(0, 1) is a uniformly distributed random value within [0, 1].
The fitness function (FF) value of each dog position is evaluated and stored in the matrix shown in Eq. (8). The smallest FF value is recorded and taken as the optimum solution.
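The initialization step above can be sketched as follows. This is a minimal NumPy sketch, assuming m coteries of n dogs in a dim-dimensional search space; the names (`init_colony`, the toy sphere fitness) are illustrative, not from the original algorithm's code:

```python
import numpy as np

# Hedged sketch of PDO initialization (cf. Eqs. (2)-(8)).
rng = np.random.default_rng(0)

def init_colony(m, n, dim, lb, ub):
    """Random positions: CT[i, j] per coterie, PD[i, k, j] per dog in coterie i."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    CT = lb + rng.uniform(0, 1, (m, dim)) * (ub - lb)       # coterie positions, Eq. (4)
    PD = lb + rng.uniform(0, 1, (m, n, dim)) * (ub - lb)    # dog positions, Eq. (5)
    return CT, PD

def sphere(x):
    """Toy fitness function (FF) standing in for the classifier error rate."""
    return float(np.sum(x ** 2))

CT, PD = init_colony(m=3, n=5, dim=4, lb=-10.0, ub=10.0)
# FF matrix over all dogs, Eq. (8); the smallest value is kept as the optimum.
fitness = np.array([[sphere(PD[i, k]) for k in range(5)] for i in range(3)])
best = PD.reshape(-1, 4)[fitness.argmin()]
```

In the actual PDODL-AICA pipeline each position vector would encode EfficientNetB7 hyperparameters rather than points of a toy sphere function.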

Exploration phase
The burrow-constructing and food-search activities are used in this stage. The PDs search for new food sources and create new burrows around them; they search the group's territory to discover new food sources once the current source is exhausted. The PDs in the colony live in CTs and seek food and build burrows within the bounds of the CT. Here, the maximal iteration counter is split into four parts; the first two parts are applied for the exploration stage, whereas the remaining parts are used for exploitation. Eq. (9) demonstrates the position update for seeking new food sources, where a Lévy flight (LF) governs the PDs' movement. After exploring the food sources, burrows are constructed near them. The digging strength limits the number of burrows built, and Eqs. (9) and (10) specify the position updates of the dogs.
Here, GBest_{i,j} is the global best solution, eCBest_{i,j} is the effect of the current best, ρ is the alarm signal for the food source, rPD_{i,j} signifies the position of a random solution, and CPD_{i,j} indicates the random cumulative effect of all members. According to Eqs. (11)-(13), the digging strength of the CT is DS, which depends on the fitness of the food source and is evaluated at random. Levy(n) denotes the Lévy distribution used for efficient exploration.
CPD_{i,j} = (GBest_{i,j} − rPD_{i,j}) / (GBest_{i,j} + Δ) (12)

where r refers to a random value in [−1, 1]. The factor Δ captures the individual differences among the members of the colonies. Iter and Max_iter are the current and maximum iteration counters.
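A single exploration update can be sketched as below. This is a simplified, hedged reading of Eqs. (9) and (12): the Lévy step uses the common Mantegna algorithm, and the food-source alarm ρ and digging strength DS are folded into one scalar `ds`; all function names are illustrative:

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(1)

def levy(dim, beta=1.5):
    """Levy-distributed step sizes via Mantegna's algorithm (a common choice)."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def explore_step(gbest, cbest, rpd, ds, delta=0.005):
    """One food-search position update for a single prairie dog (cf. Eq. (9))."""
    cpd = (gbest - rpd) / (gbest + delta)      # cumulative effect, Eq. (12)
    return gbest - cbest * ds - cpd * levy(gbest.size)

new_pd = explore_step(gbest=np.ones(4), cbest=np.full(4, 0.9),
                      rpd=rng.uniform(-1, 1, 4), ds=0.1)
```

The heavy-tailed Lévy term occasionally produces large jumps, which is what gives the exploration phase its ability to escape local optima.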

Exploitation phase
The PD responds to different scenarios, ranging from predator threats to food sources. Communication skills play a substantial role in fulfilling the needs for anti-predation and nutrition. The scenarios adopted in the exploitation stage are shown in Eqs. (14) and (15).
Here, PE shows the effect of the predator, ε indicates a small number describing the quality of the food source, and rand is a random value within [0, 1], as illustrated in Eq. (16).
The PDO method optimizes an FF to attain enhanced classifier performance. It assigns a positive value to indicate the improved performance of the candidate solution. In this research, minimization of the classifier error rate is taken as the FF, as given in Eq. (17).
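Since Eq. (17) reduces to minimizing the classification error rate, the FF evaluated for each candidate hyperparameter set can be sketched simply (the validation labels here are made-up toy data):

```python
# Sketch of the FF in Eq. (17): the PDO search minimizes the classifier
# error rate on a validation split, i.e. fitness = 1 - accuracy.
def error_rate(y_true, y_pred):
    """Fraction of misclassified samples; smaller is better for the PDO search."""
    assert len(y_true) == len(y_pred)
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return errors / len(y_true)

# Toy example: 1 mistake out of 5 predictions -> error rate 0.2
print(error_rate([0, 1, 2, 1, 0], [0, 1, 2, 1, 1]))  # -> 0.2
```

In the full pipeline, each PD position would be decoded into EfficientNetB7 hyperparameters, the model trained or fine-tuned, and this error rate returned as the position's fitness.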

Image classification: CVAE model
Finally, the PDODL-AICA technique uses the CVAE model to detect and classify aerial images. The VAE model extends the conventional AE network [24]. A VAE combines two neural networks, an encoder and a decoder. As an inference model q(z|x), the encoder maps the input data x to a low-dimensional latent variable space z. The decoder, in turn, takes the latent variable z as input and outputs the probability of the data p(x|z). A VAE differs from an AE in how it forms the latent vector: instead of directly producing a latent vector, it generates a mean vector (μ) and a standard deviation vector (σ) that are combined to create the latent vector. However, if these parameters were integrated directly into the latent vector z, z would remain a continuous random variable whose sampling step the network cannot backpropagate through. The reparameterization trick solves this issue by defining the random variable z as a deterministic function of the encoder outputs (μ, σ) and a noise variable ε sampled from a standard Gaussian distribution, as shown in Eq. (18):

z = μ + σ ⊙ ε, ε ~ N(0, I). (18)

Further, the VAE differs from a conventional AE in maximizing the evidence lower bound (ELBO) on the marginal log-likelihood of p(x). The Kullback-Leibler (KL) divergence between the encoder's distribution q(z|x) and the prior distribution p(z) is denoted KL(q(z|x)‖p(z)); this regularization term measures the information lost when q is used to approximate p, as depicted in Eq. (19).
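The reparameterization in Eq. (18) and the Gaussian KL regularizer of Eq. (19) can be sketched in NumPy. This is a minimal sketch assuming a standard-normal prior and a diagonal-Gaussian encoder, parameterized by (μ, log σ²) as is conventional:

```python
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu, log_var):
    """Eq. (18): z = mu + sigma * eps with eps ~ N(0, I), so gradients can
    flow through mu and sigma while the randomness stays in eps."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) per sample (the Eq. (19) regularizer):
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

mu = np.zeros((2, 8))
log_var = np.zeros((2, 8))        # sigma = 1 everywhere
z = reparameterize(mu, log_var)
print(kl_to_standard_normal(mu, log_var))  # zeros: q already equals the prior
```

When μ = 0 and σ = 1, the encoder distribution coincides with the prior, so the KL term vanishes, which is a quick sanity check on the formula.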
The CVAE is a conditional extension of the VAE in which conditional data y is presented to both the encoder and decoder models. This causes the sample class to be mapped into the encoder's latent space z, which enhances the ability to discriminate between sample classes, as shown in Eq. (20):

max E_{q(z|x,y)}[log p(x|z, y)] − KL(q(z|x, y)‖p(z|y)). (20)

The data flow of the CVAE is as follows: the encoder takes the inputs (x and y) and outputs the mean (μ) and standard deviation (σ), which together with the noise ε generate the intermediate latent variable z. Fig. 2 demonstrates the architecture of the CVAE.
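The conditioning described above amounts to concatenating the label y onto both the encoder input (x, y) and the decoder input (z, y). The shape-level sketch below illustrates this wiring; the dense projection is an illustrative stand-in for the convolutional encoder, and the dimensions (21 classes, as in the UCM dataset; 64-d features; 16-d latent space) are assumed for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
n_classes, latent_dim, x_dim = 21, 16, 64   # assumed illustrative sizes

def one_hot(labels, n=n_classes):
    """Encode integer class labels as one-hot rows."""
    out = np.zeros((len(labels), n))
    out[np.arange(len(labels)), labels] = 1.0
    return out

x = rng.standard_normal((4, x_dim))          # a batch of 4 feature vectors
y = one_hot(np.array([0, 3, 7, 20]))         # their class conditions

enc_in = np.concatenate([x, y], axis=1)      # encoder sees (x, y) -> q(z|x, y)
W_mu = rng.standard_normal((enc_in.shape[1], latent_dim)) * 0.01
mu = enc_in @ W_mu                           # stand-in for the encoder's mean head
dec_in = np.concatenate([mu, y], axis=1)     # decoder sees (z, y) -> p(x|z, y)
```

Because y enters on both sides, the latent space only has to capture within-class variation, which is what improves class discrimination relative to an unconditional VAE.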
Fig. 4 establishes the confusion matrix formed by the PDODL-AICA method under 70 % of TRAPH. The outcomes specify that the PDODL-AICA method correctly categorized the 21 classes.
In Table 2, the overall AIC outcomes of the PDODL-AICA technique are reported with 70 % of TRAPH. Fig. 5 establishes the confusion matrix formed by the PDODL-AICA method under 30 % of TESPH. The outcomes state that the PDODL-AICA procedure correctly classified the 21 classes.
The accu_y curves for training (TRA) and validation (VL) shown in Fig. 7 for the PDODL-AICA technique under the UCM dataset deliver valuable insights into its performance over numerous epochs. Mainly, there is a consistent improvement in both TRA and VL accu_y with rising epochs, demonstrating the model's capability to learn and recognize patterns from both sets of data. The growing trend in VL accu_y underlines the model's ability to generalize beyond the TRA dataset and to generate precise predictions on unseen data.
Fig. 8 summarises the TRA and VL loss values for the PDODL-AICA technique under the UCM dataset across several epochs. The TRA loss reliably decreases as the model adjusts its weights to minimize the classification error on both datasets. The loss curves demonstrate the model's fit to the TRA data, underlining its capability to capture patterns well in both datasets. Notable is the continuous adjustment of parameters in the PDODL-AICA technique, which is designed to minimize differences between predictions and true TRA labels. Regarding the precision-recall (PR) curve in Fig. 9, the results confirm that the PDODL-AICA method under the UCM dataset reliably attains enhanced PR values across every class. These outcomes underline the model's capacity to discriminate among diverse classes, demonstrating its value in accurately assigning class labels.
Besides, in Fig. 10, the ROC curves produced by the PDODL-AICA method are presented under the UCM dataset, representing its ability to differentiate among classes. These curves deliver valuable insights into how the TPR and FPR tradeoffs vary across dissimilar classification thresholds. The results underline the model's exact classification performance over the class labels, underlining its efficiency in addressing various classification tasks.
Table 4 reports the comparative results, in which the PDODL-AICA model shows higher performance with the greatest prec_n of 98.46 %, reca_l of 98.5 %, F_score of 98.45 %, and G_mean of 98.47 %.
Table 5 and Fig. 13 comprehensively compare the computational time (CT) of the PDODL-AICA method. The outcomes specify that the DL-MOPSO, DL-AlexNet, DL-VGG-S, and DL-VGG-VD19 techniques have revealed worse performance than the other techniques. Besides that, the SIDTLD-AIC, SIDTLD + SSA, and DL-C-PTRN approaches achieve closer classification performance. However, the PDODL-AICA model performs better, with a smaller CT of 0.91 s. These superior results confirm the enhanced performance of the PDODL-AICA technique over other recent methods in terms of distinct measures. Furthermore, exploring models to improve the interpretability of CVAE-generated features and enhancing robustness against varied real-world scenarios would be significant for advancing the technique's applicability in practical settings.

Fig. 7. Accu_y curve of the PDODL-AICA technique under the UCM dataset.
Table 4 offers a detailed comparison study of the PDODL-AICA model under distinct measures [27]. In Fig. 11, a comparative accu_y analysis of the PDODL-AICA technique is given. The results indicate that the DL-MOPSO, DL-AlexNet, DL-VGG-S, and DL-VGG-VD19 techniques have performed worse than the other approaches. The SIDTLD-AIC, SIDTLD + SSA, and DL-C-PTRN techniques accomplish closer classification performance. However, the PDODL-AICA technique exhibits superior performance with a maximum accu_y of 99.85 %. In Fig. 12, a comparative prec_n, reca_l, F_score, and G_mean study of the PDODL-AICA model is given. The results specify that the DL-MOPSO, DL-AlexNet, DL-VGG-S, and DL-VGG-VD19 approaches show worse performance than the other techniques. Besides that, the SIDTLD-AIC, SIDTLD + SSA, and DL-C-PTRN methods achieve nearer classification performance, while the PDODL-AICA model attains the best results.

Table 1
Details on database.

Table 4
Comparative analysis of PDODL-AICA technique with other approaches.

Table 5
CT analysis of PDODL-AICA technique with other approaches.