Deep Learning-Enhanced Hybrid Fruit Fly Optimization for Intelligent Traffic Control in Smart Urban Communities

The rapid urbanization accompanying the evolution into “smart” communities presents numerous challenges, not least of which is the significant increase in road vehicles. This proliferation exacerbates congestion and accident rates, posing major barriers to the successful implementation of innovative technologies such as Wireless Sensor Networks (WSNs), surveillance cameras, and the Internet of Things (IoT). Accurate traffic flow prediction, a crucial component of these technological initiatives, requires a reliable and efficient methodology. This research explores the implementation of an intelligent traffic control system that employs a Transferable Texture Convolutional Neural Network (TTCNN). The design of this system eschews the traditional pooling layer, instead incorporating three convolutional layers and a single Energy Layer (EL). This configuration facilitates the provision of real-time traffic updates, which can enhance the utility and efficiency of the smart city infrastructure. A Hybrid Fruit Fly Optimization (HFFO) algorithm tunes the system’s hyperparameters. The application of HFFO to the TTCNN showcases the potential for improved accuracy in traffic flow prediction. Simulation results suggest that the HFFO provides superior structural parameters for the TTCNN, enhancing the overall accuracy of the model’s predictions. The hybrid forecasting method discussed herein demonstrates its potential to outperform other established techniques. This investigation sheds light on the potential benefits of applying deep learning algorithms and hybrid models in the context of traffic flow prediction and control, contributing to the ongoing development of smart urban communities.


Introduction
Smart vehicles are now a practical choice for urban residents owing to the widespread availability of autonomous driving technologies, which have been miniaturized in recent years. By performing tasks such as collision avoidance and identification, autonomous vehicles lighten the load on human drivers. Vehicles equipped with autonomous driving technology also offer enhanced fuel efficiency and reduced emissions, and they can better manage traffic flow and minimize congestion [1]. Providing safe and dependable transportation for the elderly and the disabled, solving parking issues, and reducing the number of accidents caused by human error are all ways in which autonomous vehicles help people in their everyday lives [2].
People today spend the vast majority of their waking hours away from home [3]. This includes commutes to and from work, visits to entertainment venues and retail establishments, and trips to and from the city centre. This has complicated daily mobility and prompted the creation of parking services so that people do not have to waste time and fuel driving aimlessly around downtown in search of a parking spot. Such driving results in more greenhouse gas emissions, and it also exacerbates driver frustration and urban traffic, both of which are major contributors to increased vehicle collisions [4].
Owing to the evolving nature of the global economy and contemporary lifestyles, urban areas have recently been expanding at a rapid pace. Information and communication technologies are essential to city planning and urban development [5]. The goal of these new technologies and networked smart devices is to create smart cities and maximize the efficiency of urban services that directly affect inhabitants [6]. Every aspect of daily living benefits from the technological advancements made possible by smart cities [7,8]. The increase in automobile ownership over the previous five years has been alarming, leading to gridlock, accidents, and even driver illness caused by the resulting stress and irritation [9]. Poor management and the need for drivers to circle aimlessly in search of parking spaces in already crowded areas are to blame [10].
Major technological advancements, shifts in corporate practices, and global environmental issues have all contributed to the emergence of the study of "smart cities and communities" (SCC). Information and communication technologies (ICT) that link infrastructure to data monitoring and asset management systems are among the technologies that make SCC possible [11]. The Internet of Things (IoT) is another such technology; it allows even the tiniest devices to connect to the Internet and report on their status [12]. IoT technology enables the interconnection of devices from many sectors, including the transportation network, power plants, and private homes [13].
Processing and analyzing the collected data is necessary for realizing the potential of SCC in any application area. AI makes it possible to find connections, causes, and patterns in data. With AI, coaching and suggestions for improving user behaviour can be made more specific and useful [6,7]. Many problems remain to be solved in terms of SCC, and some of them are detailed below. The main contributions of this study are as follows.
• For traffic flow prediction, we present a novel Transferable Texture Convolutional Neural Network (TTCNN).
• In our TTCNN architecture, we employ the energy layer (EL). In this way we can retain texture information, keep the output vector small, and improve the model's learning capacity.
• We improve the quality of the solutions generated by the original fruit-fly optimization (FFO) procedure by creating a hybrid version of the algorithm that uses swarm intelligence.
The remainder of the paper is structured as follows. The relevant literature is presented in Section 2, followed by a problem statement in Section 3. Section 4 gives a concise summary of the proposed model. In Section 5, we present the results of comparing the proposed model with existing methods. Section 6 provides a conclusion.

Related Works
To predict traffic flows, Bao et al. [15] offer a new ST-CGCN-based technique. Based on physical locations, past data records, and external interference among traffic nodes, the authors first build a matrix. To further enhance the joint modelling of spatial-temporal characteristics and external influences, these are fused into a multifaceted matrix by incorporating self-learning dynamic weights. Next, modules for extracting spatial characteristics and temporal information are developed so that dynamic spatial-temporal aspects can be described. The spatial feature extraction component is made up of a residual unit and a graph convolution operator, and it is complemented by a temporal feature extraction component.
Djenouri et al. [16] explored a novel convolutional system for predicting urban traffic flows in an edge IoT setting that handles graph-based prediction in a single pipeline. First, the raw data set of urban traffic road networks is pre-processed using a connected graph pre-processing procedure to eliminate unwanted noise. The road network is explored effectively by employing an outlier identification approach to filter out unnecessary patterns and noise. The generated graph is then used to train a network, which is ultimately used to provide traffic predictions for the city. A novel branch-and-bound-based optimization approach is created to fine-tune the parameter values of the proposed framework. Multiple datasets and reference methods are used in a thorough comparative evaluation. The findings demonstrate that, especially with a large number of nodes in the graph, the suggested framework outperforms the reference solutions.
To deal with the extremely discontinuous and irregular character of traffic during the novel coronavirus epidemic, Li et al. [17] offer a deep spatial-temporal traffic model based on the discrete wavelet transform (DSTM-DWT). First, DSTM-DWT decomposes flow information into components such as trend, amplitude, and baseline. Second, a graph-based approach is used to model the spatial connections of the transportation network, incorporating newly available data on the coronavirus outbreak into the attributes of each node. The spatial correlation of each node is then determined using a graph convolutional network, while the temporal correlation is determined using a temporal convolutional network. The authors also propose a graph memory network (GMN) for converting the discrete magnitudes split off by the discrete wavelet transform into high-dimensional discrete features, which addresses the irregularity that the epidemic introduces into the flow data. Once the traffic data has been decomposed using the DWT, the traffic trend and discrete baseline can be separated out, and the GMN-predicted discrete component can be compared with the outcome of the inverse DWT.
An attention-based learning model has been proposed by Jia and Cai [18] to accurately predict turning traffic flow at road intersections throughout an entire city. The authors first examine how turning traffic is distributed over space and time. Then, to predict turning traffic, a four-part, end-to-end deep learning framework is developed. To learn spatial dependencies and sparseness, they adapt a graph convolutional network, and to learn temporal dependencies and fluctuations, they design a gated recurrent unit mechanism. The model was trained and tested using trajectories collected from taxi rides in Wuhan, China. According to the results, the model provides more accurate estimates of turning traffic flow than existing state-of-the-art models.
A Henry gas solubility optimization-based deep learning method for traffic flow forecasting (HGSODL-TFF) was developed by Escorcia-Gutierrez et al. [19] for 6G-enabled vehicular networks. The primary goal of the HGSODL-TFF method is to predict the volume of traffic in a 6G-equipped VANET. The traffic data is first preprocessed in the HGSODL-TFF model with a z-score normalization technique. A deep belief network (DBN) model is then used to predict traffic volumes. The DBN model's forecasting performance is optimized by adjusting hyperparameters such as the batch size using the HGSO method. The HGSODL-TFF model is experimentally validated using test data, and the findings are analyzed in detail. According to the simulation findings, the HGSODL-TFF model is superior to the other contemporary methods.
Air traffic flow may be predicted using a graph convolution network (TAaDGCN) proposed by Cai et al. [20], which takes into account both the airspace structure and the flow paths. First, a spatial module is built to capture the interdependencies between neighbouring and OD sectors. Next, an (SE) block is employed to represent potentially connected flight sectors in order to incorporate long route information. In addition, a module is used to retrieve historical aspects of the input sequence in order to define the temporal evolution pattern. A spatio-temporal block, capturing several spatial and temporal relationships, is built from the aforementioned blocks. The experimental findings on real-world flight data show that the model outperforms other methods in terms of prediction performance, especially those that disregard the sector's spatial structure.

Challenges Addressed by SCC
Over the past 60 years, there have been enormous shifts in the way businesses function. The decline of manufacturing and the expansion of services to businesses and individuals can be seen in the GDP trends from 1947 to 2009. About half of the United States' industrial GDP has been lost. Conversely, GDP growth for commercial and professional services is 400% [14]. The percentage of GDP contributed by services has climbed from 72.8% to 77.4% during the past 17 years [21], while the percentage contributed by industry has declined from 22.5% to 18.6%.
This trend is mostly attributable to digitalization with the aid of ICT, and with the most recent advances in computational power, AI is emerging as a crucial technology for exploiting the data and improving services. Energy consumption, especially from nonrenewable sources such as oil, is another challenge. In 2017, primary energy consumption in Europe was 1561 Mtoe, 5.3% more than the EU goal for 2020. In 2018, crude oil accounted for 36% of all energy production, followed by natural gas (21%), renewables (15%), solid fossil fuel (15%), and nuclear (13%). Industry accounts for 31% of final energy consumption in the EU, followed by transport (28%), homes (25%), and services (2%), according to a recent study [22].
[23] are both unique to individual stretches of road. One-way streets have only one incoming and one outgoing lane. Figure 1 illustrates how the number of vehicles arriving at and departing from the road segment during the interval from 5(n − 1) to 5n minutes, together with the number of vehicles already present in the road segment, affects the traffic flow along that section of road. This allows us to derive the following formula for the number of vehicles using this stretch of road at time k,
where q_0 is the number of vehicles present in the initial period, and q_in(i) and q_out(i) are the numbers of vehicles entering and leaving the section in the i-th interval, respectively.
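A form consistent with the definitions of the initial count, the inflow, and the outflow given above is the running balance below. It is stated here as an assumption rather than as the authors' exact expression; the symbol Q(k) for the vehicle count at time k is borrowed from the multi-channel case.

```latex
Q(k) = q_0 + \sum_{i=1}^{k} \bigl[\, q_{\mathrm{in}}(i) - q_{\mathrm{out}}(i) \,\bigr]
```

Each 5-minute interval adds the vehicles that entered and subtracts the vehicles that left, so the count at time k depends only on the initial count and the accumulated net inflow.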

Figure 2. Extraction of the number of vehicles on a road section
Figure 2 illustrates the process of vehicle count extraction along a multi-channel road segment. Estimating the volume of traffic on a multi-channel road segment requires setting up several detection points and using the data obtained from these sensors to predict the number of vehicles on the complicated road segment. The number of vehicles at time k is defined as follows, where Q(k) is the total number of vehicles on the multi-channel road segment; q_0 is the number of vehicles at the initial instant; and q_in(i, s) and q_out(j, s) are the numbers of vehicles entering and leaving the segment at the s-th instant from the i-th on-ramp and the j-th off-ramp, respectively.
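The multi-channel count can be sketched in a few lines of Python. The sketch assumes the balance Q(k) = q_0 + sum over s up to k of (sum over on-ramps of q_in(i, s) minus sum over off-ramps of q_out(j, s)), which follows from the definitions above but is not quoted from the paper. The function name and the detector counts are hypothetical.

```python
from typing import List, Sequence


def vehicles_on_segment(
    q0: int,
    inflow: Sequence[Sequence[int]],
    outflow: Sequence[Sequence[int]],
) -> List[int]:
    """Return the vehicle count after each interval.

    inflow[s][i]  -- vehicles entering from on-ramp i during interval s
    outflow[s][j] -- vehicles leaving via off-ramp j during interval s
    """
    if len(inflow) != len(outflow):
        raise ValueError("inflow and outflow must cover the same intervals")
    counts = []
    current = q0
    for entering, leaving in zip(inflow, outflow):
        # Net change in this interval: all on-ramps minus all off-ramps.
        current += sum(entering) - sum(leaving)
        counts.append(current)
    return counts


# Hypothetical 5-minute detector counts: two on-ramps, one off-ramp.
inflow = [[12, 5], [9, 7], [4, 3]]
outflow = [[10], [15], [11]]
counts = vehicles_on_segment(q0=20, inflow=inflow, outflow=outflow)
print(counts)  # [27, 28, 24]
```

A single-channel segment is the special case with one on-ramp and one off-ramp.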

Network Architecture
For the purpose of traffic forecasting, the TTCNN framework is described as follows. The proposed deep CNN takes into account three image characteristics. First, certain descriptive patterns are considerably smaller than the source image, but a convolution filter can still locate such a pattern if the pattern's size is comparable to the size of the filter mask. Second, the same shapes or patterns may appear in different areas of the image, and convolving the full source image allows them to be detected wherever they occur. Third, downsampling the pixels in the max-pooling layer does not alter the overall form of the original image.
Three convolution layers, two pooling layers, and an energy layer (EL) are all part of the proposed TTCNN. The fully connected (FC) layer is then followed by a softmax layer. By averaging the rectified activation output, the EL summarizes the feature maps of the previous convolutional layer. A single value, representing the energy response of a filter bank, is returned for each feature map. This design not only improves efficiency in learning texture features but also uses less memory and compute. This trade-off between speed and processing cost is made possible by the EL. The primary motivation for implementing this layer is to maintain the original layer's data flow. The output of the EL is flattened and delivered to the concatenation layer after the final pooling layer. Through this link, information about the image's contours and textures is flattened into a new vector and passed to the fully connected layer. Table 1 provides a comprehensive description of the proposed network, including input and output dimensions. Eq. (3) provides a formula for calculating the convolution layer's output size,
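The energy layer's core operation, averaging the rectified activations of each feature map into one scalar, can be illustrated with a small pure-Python sketch. The function name and the toy feature maps are hypothetical, and the layer's learnable weights are left out.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W activation map


def energy_layer(feature_maps: List[FeatureMap]) -> List[float]:
    """Average the rectified activations of each feature map.

    Returns one scalar per map, so C maps of any spatial size
    become a length-C vector.
    """
    energies = []
    for fmap in feature_maps:
        rectified = [max(0.0, v) for row in fmap for v in row]
        energies.append(sum(rectified) / len(rectified))
    return energies


# Two hypothetical 2x2 feature maps from the last convolution layer.
maps = [
    [[1.0, -2.0], [3.0, 0.0]],
    [[-1.0, -1.0], [2.0, 6.0]],
]
energy = energy_layer(maps)
print(energy)  # [1.0, 2.0]
```

Because the output length equals the number of feature maps rather than the number of pixels, the vector handed to the FC layer stays small, which is the saving in memory and compute that the text attributes to the EL.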
where I_a and I_b signify the input size and the filter size, respectively, S denotes the padding, and ϱ is the stride value. The first two convolution layers have 16 and 32 output channels, respectively. The third convolution layer, with a kernel size of 3×3 and 64 output channels, is used to extract intermediate texture attributes. The convolution layers learn at most 31,744 parameters, which are computed using Eqs. (4) and (5), where ξ_v signifies the parameters of CNN layer v, I_k signifies the kernel size, and ζ_v denotes the number of channels.
At each convolution layer, the output of each neuron connected to the input is calculated as the dot product of its weights and the local input region connected to it. The first convolution layer, with 16 kernels, produces a 32×32×16 output. The output of the neurons in the first convolution layer may be calculated using Eq. (6), where S_ϑ is the output feature map, C_ϑ is the input feature map, and T_ϑ is the weight map.
After that, the output of the last convolution layer is mapped onto an energy descriptor. After the third convolutional layer, energy layers are merged according to the specifications of the energy descriptor, which acts as a texture description for a cluttered, dense surface. Eq. (7) explains the relationship, where EL(ξ, ϑ) represents the EL weighted vector, j stands for the EL inputs, and the result represents the EL output. The link between the EL and FC layers is substantially smaller than it would be with a final traditional convolution layer, which reduces the number of trainable parameters. Furthermore, the EL learns during both forward and backward propagation by retaining energy information from the previous layer. The EL also enhances the network's general learning ability and simplifies the proposed system by decreasing the vector size of the subsequent FC layer.
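The output-size formula and the parameter count can be checked numerically with the sketch below. It assumes the standard form (I_a − I_b + 2S)/ϱ + 1 for the output width and the usual count of kernel weights plus one bias per output channel. The 5×5 kernels for the first two layers and the single input channel are assumptions not stated in the text; they are chosen because they reproduce the quoted total of 31,744 parameters.

```python
def conv_output_size(input_size: int, kernel_size: int, padding: int, stride: int) -> int:
    """Output width of a convolution: (I_a - I_b + 2S) / rho + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Kernel weights plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


# ASSUMPTION: (kernel size, input channels, output channels) per layer.
# Only the 3x3 kernel and the 16/32/64 channel counts come from the text.
layers = [(5, 1, 16), (5, 16, 32), (3, 32, 64)]
total_params = sum(conv_params(k, c_in, c_out) for k, c_in, c_out in layers)

# ASSUMPTION: 32x32 input, padding 2, stride 1 for the first layer.
first_layer_width = conv_output_size(32, 5, 2, 1)

print(total_params)       # 31744
print(first_layer_width)  # 32
```

Under these assumptions the first layer keeps the 32×32 spatial size, matching the 32×32×16 output quoted above, and a 2×2 pooling window with stride 2 would halve it to 16.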
The learnable EL parameters can be determined by solving Eq. (8), where ξ_EL denotes the learnable EL parameters, η_m is the current FC layer neuron, and η_(m−1) is the previous FC layer neuron.
Between the convolution layer and the rectified linear unit (ReLU) layer, batch normalization is used to speed up training. Batch normalization reduces internal covariate shift by normalizing the mean and standard deviation of the activations. The mean and variance are determined using Eqs. (9) and (10), respectively, in the batch normalization procedure,
where τ_Q and v_Q indicate the mean and variance, and n is the mini-batch size for the features in the i-th dimension. In our study, n equals 64. Eq. (11) describes how to determine the batch normalization, where a and A are the initial values of each output layer's learnable parameters. Eq. (12) computes the ReLU activation function, and Eq. (13) determines the ReLU layer's output, where λ_(i,j,k) denotes the output element and (i, j, k) indexes the corresponding input element. The pooling layer shrinks the feature maps, the number of weights, and the amount of computation, which helps to control overfitting of the network. Eq. (14) gives the formula for the max pooling layer, where M_pool signifies the pooled maps, Q indicates the input feature maps, and T signifies the vector. The work uses two max pooling layers, each with a kernel size of 2×2.
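The three operations described above can be sketched in pure Python as follows. The sketch uses the standard textbook forms of batch normalization, ReLU, and 2×2 max pooling, which may differ in detail from Eqs. (9) to (14). The names `scale`, `shift`, and `eps` are assumptions standing in for the learnable parameters and the usual numerical-stability constant.

```python
import math
from typing import List


def batch_norm(values: List[float], scale: float = 1.0,
               shift: float = 0.0, eps: float = 1e-5) -> List[float]:
    """Normalize a mini-batch to zero mean and unit variance, then rescale."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return [scale * (v - mean) / math.sqrt(variance + eps) + shift
            for v in values]


def relu(values: List[float]) -> List[float]:
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]


def max_pool_2x2(fmap: List[List[float]]) -> List[List[float]]:
    """Take the maximum of each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


batch = [2.0, 4.0, 6.0, 8.0]          # hypothetical activations
normalized = batch_norm(batch)
activated = relu(normalized)

grid = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
pooled = max_pool_2x2(grid)
print(pooled)  # [[6.0, 8.0], [14.0, 16.0]]
```

Pooling a 4×4 map down to 2×2 shows the fourfold reduction in activations that each 2×2 pooling layer provides.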
To avoid overfitting the training data, the dropout layer is used throughout the weight update phase to repeatedly drop a random subset of parameters. Because FC layers contain the most parameters in the network, they are particularly prone to overfitting during training. As a result, the dropout layer is placed after the FC layer. To perform classification, the softmax layer employs the loss function. The softmax outputs are valid probability values between zero and one. Eq. (15) provides the mathematical formulation of the loss function,
where k_l signifies the total loss and δ_j is the element of the class vector δ for class j. The goal is to minimise the difference between the true label and the estimated label computed by the softmax function in Eq. (16). The hyperparameter tuning of the proposed model is carried out by the improved fruit fly optimization procedure.
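As an illustration of the softmax loss described above, the following sketch uses the standard softmax and cross-entropy definitions, which are assumed here to correspond to Eqs. (15) and (16). The logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Map raw class scores to probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], true_class: int) -> float:
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_class])


logits = [2.0, 1.0, 0.1]  # hypothetical FC-layer outputs for three classes
probs = softmax(logits)
loss = cross_entropy(probs, true_class=0)
```

The loss shrinks as the probability assigned to the true class approaches one, which is the sense in which training minimises the gap between the true and estimated labels.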

Original fruit-fly optimization procedure
The original FFO algorithm takes its name and inspiration from the foraging habits of the fruit fly. It has four steps: initialization, population evaluation, osphresis foraging, and vision foraging. As shown in Eq. (17), initial solutions (fruit flies) are created at random inside the given lower and upper bounds, with x_(i,j) denoting the j-th element of the i-th solution. The lower bound (lb) and upper bound (ub) define the limits, and rand is a uniformly distributed random number generator.
After the position is updated, the fitness value of each solution is determined, and the greedy selection process decides whether the previous position or the new one should be kept. The vision foraging phase is the next step in the algorithm. A new solution replaces an older one if its fitness value is higher; otherwise, the older solution remains in the population and the newer one is discarded. Upon meeting the stop condition, the algorithm exits and returns the best solution.
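The initialization and greedy selection steps can be sketched as follows. The random initialization x_(i,j) = lb_j + rand · (ub_j − lb_j) is the usual form and is assumed to match Eq. (17). The osphresis update is not reproduced in the text, so the bounded random perturbation used here is only a stand-in, and the fitness function is a hypothetical toy objective.

```python
import random
from typing import Callable, List

Solution = List[float]


def init_population(n: int, lb: Solution, ub: Solution,
                    rng: random.Random) -> List[Solution]:
    """Create n random solutions inside [lb, ub]."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]
            for _ in range(n)]


def osphresis_step(x: Solution, lb: Solution, ub: Solution,
                   rng: random.Random, radius: float = 0.1) -> Solution:
    """ASSUMPTION: a bounded random move stands in for the smell-based update."""
    return [min(hi, max(lo, v + radius * (hi - lo) * (2 * rng.random() - 1)))
            for v, lo, hi in zip(x, lb, ub)]


def greedy_select(old: Solution, new: Solution,
                  fitness: Callable[[Solution], float]) -> Solution:
    """Keep the new position only if its fitness is higher."""
    return new if fitness(new) > fitness(old) else old


def fitness(x: Solution) -> float:
    """Hypothetical objective: highest (zero) at the origin."""
    return -sum(v * v for v in x)


rng = random.Random(0)
lb, ub = [-5.0, -5.0], [5.0, 5.0]
population = init_population(6, lb, ub, rng)
initial_best = max(fitness(x) for x in population)
for _ in range(50):
    population = [greedy_select(x, osphresis_step(x, lb, ub, rng), fitness)
                  for x in population]
final_best = max(fitness(x) for x in population)
```

Because greedy selection never accepts a worse position, the best fitness in the population cannot decrease from one iteration to the next.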

Proposed hybrid FFO
The FFO is easy to customize because of its uncomplicated structure, few parameters, and flexibility in solving a wide variety of problems. Despite these benefits, the approach has flaws. The method can become stuck in local optima, has a static position update technique, and is weak at exploitation.
Experiments on unconstrained functions revealed the algorithm's flaws. Because of its strength in intensification, the firefly algorithm (FA) search mechanism has been incorporated into the algorithm to improve exploitation. In addition, opposition-based learning (OBL) is introduced as a means of enhancing exploration of the search space.
The hybrid algorithm is named HEFFF, which stands for "hybrid enhanced fruit-fly firefly". At first, Eq. (17) is used to produce a random population. In even iterations, the FFO search mechanism is used if a random number is less than 0.5, and opposition-based learning is performed otherwise; in odd iterations, the firefly search mechanism is used.
The opposite number X′ is defined by Eq. (19).
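Opposition-based learning is commonly defined by reflecting a solution within its bounds, X′ = lb + ub − X. The sketch below assumes that Eq. (19) takes this standard form; the function name and the sample values are hypothetical.

```python
from typing import List


def opposite(x: List[float], lb: List[float], ub: List[float]) -> List[float]:
    """Reflect each coordinate of x within its [lb, ub] interval."""
    return [lo + hi - v for v, lo, hi in zip(x, lb, ub)]


x = [1.0, -3.0]
lb = [0.0, -5.0]
ub = [10.0, 5.0]
x_opp = opposite(x, lb, ub)
print(x_opp)  # [9.0, 3.0]
```

The opposite point always stays inside the bounds, and applying the operation twice returns the original solution.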
To determine how far apart two solutions are, the firefly search process uses Eq. (20), where r_(i,j) signifies the distance between solutions x_i and x_j. β_0 represents the attractiveness at zero distance of the fittest firefly (the best solution). The random number κ is drawn from a Gaussian distribution, and α is a control parameter.
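The firefly search step can be sketched with the standard firefly algorithm formulation: Euclidean distance, attractiveness β_0 · exp(−γ r²), and a Gaussian random term scaled by α. This is assumed to correspond to the equations referenced above, which are not reproduced in the text. The default parameter values are hypothetical.

```python
import math
import random
from typing import List


def distance(xi: List[float], xj: List[float]) -> float:
    """Euclidean distance between two solutions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attractiveness(r: float, beta0: float = 1.0, gamma: float = 1.0) -> float:
    """Attractiveness decays with squared distance; equals beta0 at r = 0."""
    return beta0 * math.exp(-gamma * r * r)


def firefly_move(xi: List[float], xj: List[float], alpha: float,
                 rng: random.Random, beta0: float = 1.0,
                 gamma: float = 1.0) -> List[float]:
    """Move xi towards the brighter xj, plus a Gaussian random step."""
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha * rng.gauss(0.0, 1.0)
            for a, b in zip(xi, xj)]


rng = random.Random(0)
xi, xj = [0.0, 0.0], [3.0, 4.0]
r = distance(xi, xj)
# With gamma = 0 and alpha = 0 the move is a full jump onto xj.
jumped = firefly_move(xi, xj, alpha=0.0, rng=rng, gamma=0.0)
# With gamma = 1 the attraction at distance 5 is almost zero.
nudged = firefly_move(xi, xj, alpha=0.0, rng=rng, gamma=1.0)
```

The two calls show the role of γ: a small value makes distant fireflies strongly attractive, while a large value confines attraction to close neighbours.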
Following reference [24], a dynamic step is used for the control parameter α to further enhance the algorithm's efficiency: the value of α decreases continuously from its initial value α_0 over the course of the iterations. Eq. (22) defines the update of α at each iteration, where t is the current iteration, MaxIter is the maximum number of iterations, α(t) is the value of α at the current iteration, and α(t + 1) is the updated value.
Algorithm 1 summarizes the main steps of the proposed method [25].

Algorithm 1 Pseudo-code of the proposed model
Initialize the population randomly by Eq. (17)
Initialize the FA parameters β_0, γ, α
Set the iteration counter t to 0 and define the termination criteria
Evaluate the fitness of each individual
while termination criteria is not satisfied do
    for i = 1 to N do
        if t is even then
            if rand < 0.5 then
                Update the position according to FFO updating mechanism by Eq.
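The even/odd schedule described in the text can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the FFO update is modelled as a bounded random move, the firefly step moves towards the current best solution only, α decays linearly, and all default parameter values and the test objective are hypothetical.

```python
import math
import random
from typing import Callable, List

Solution = List[float]


def hybrid_ffo(fitness: Callable[[Solution], float], lb: Solution,
               ub: Solution, pop_size: int = 10, max_iter: int = 100,
               beta0: float = 1.0, gamma: float = 1.0,
               alpha0: float = 0.5, seed: int = 0) -> Solution:
    """Maximize `fitness` with the even/odd schedule described in the text."""
    rng = random.Random(seed)

    def clip(x: Solution) -> Solution:
        return [min(hi, max(lo, v)) for v, lo, hi in zip(x, lb, ub)]

    # Random initial population inside [lb, ub].
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lb, ub)]
           for _ in range(pop_size)]
    alpha = alpha0
    for t in range(max_iter):
        best = max(pop, key=fitness)
        for i, x in enumerate(pop):
            if t % 2 == 0:
                if rng.random() < 0.5:
                    # ASSUMPTION: FFO step modelled as a bounded random move.
                    cand = [v + alpha * (hi - lo) * (2 * rng.random() - 1)
                            for v, lo, hi in zip(x, lb, ub)]
                else:
                    # Opposition-based learning: reflect within the bounds.
                    cand = [lo + hi - v for v, lo, hi in zip(x, lb, ub)]
            else:
                # ASSUMPTION: firefly move towards the current best only.
                r2 = sum((a - b) ** 2 for a, b in zip(x, best))
                beta = beta0 * math.exp(-gamma * r2)
                cand = [a + beta * (b - a) + alpha * rng.gauss(0.0, 1.0)
                        for a, b in zip(x, best)]
            cand = clip(cand)
            # Greedy selection: keep the fitter of old and new positions.
            if fitness(cand) > fitness(x):
                pop[i] = cand
        # ASSUMPTION: linear decay of alpha; the paper's update may differ.
        alpha = alpha0 * (1 - (t + 1) / max_iter)
    return max(pop, key=fitness)


def neg_sphere(x: Solution) -> float:
    """Hypothetical objective with its maximum (zero) at the origin."""
    return -sum(v * v for v in x)


best = hybrid_ffo(neg_sphere, [-5.0, -5.0], [5.0, 5.0])
```

In the setting of the paper, `fitness` would wrap a TTCNN training run and return a validation score for a given hyperparameter vector, which is far more expensive than the toy objective used here.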

Dataset and Pre-Processing
Our experimental dataset consists of traffic counts recorded along California State Route 1 over 31 days, from March 1 to March 31 [26]. The ratio of training data to validation data to test data is approximately 6:2:2. We use data on traffic flows from March 1st through March 19th for our training set, March 20th through March 25th for our validation set, and March 26th through March 31st for our test set. Because this is a cross-section dataset, the traffic data for the road segment scenario is computed using the mathematical model discussed in Section 2. Figure 3 depicts a comparison of daily and weekly traffic volumes on the road segment. It is clear that the amplitude, temporal correlation, and time lag of the road segment's traffic flow are all higher than those of the overall traffic flow.
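The chronological split described above can be written down directly. The sketch works on day-of-month numbers only, since the text gives no year, and the variable names are hypothetical.

```python
days = list(range(1, 32))  # March 1 to March 31

train_days = days[:19]     # March 1 to March 19
val_days = days[19:25]     # March 20 to March 25
test_days = days[25:]      # March 26 to March 31

# Share of the 31 days that falls in each subset.
shares = [round(len(part) / len(days), 2)
          for part in (train_days, val_days, test_days)]
print(shares)  # [0.61, 0.19, 0.19]
```

Splitting by date rather than at random keeps every test day later than every training day, so the model is always evaluated on traffic it has not yet seen.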

The suggested model's predictive power is shown in Table 2 by calculating the deviation between the predicted value and the actual traffic flow for ten randomly selected samples. It demonstrates that the suggested model has better predictive accuracy than the state-of-the-art methods. Table 2 compares the proposed model with existing models across different samples. In the 5th sample the actual data is 14

Performance Metrics
Based on the aforementioned criteria, the following measures of effectiveness are analysed.
Accuracy: the fraction of records in the test dataset that are predicted correctly. Accuracy is a suitable measure for an experimental dataset with evenly distributed classes.
Precision: the ratio of correctly identified positive records to all records identified as positive. Higher precision (Precision ∈ [0,1]) is preferable.
F1-Score: the harmonic mean of precision and recall. A higher F1-score (F1-score ∈ [0,1]) is preferable.
False Positive Rate (FPR): the number of negative records incorrectly identified as positive divided by the total number of negative records. A lower FPR (FPR ∈ [0,1]) indicates a better model.
In Table 3, the proposed model reached an accuracy of 95.62, a precision of 98.32, a recall of 94.62, and an F-score of 94.53. In this comparison, the proposed model achieved better results than the other compared models.

Conclusion
Using the traffic flow data from the cross section, this research first establishes a mathematical model of the road segments and then uses this model to derive the traffic flow information for the road segments. The enhanced TTCNN is then used to carry out the traffic flow prediction. The TTCNN architecture relies on an EL to analyze texture characteristics, extract the broad shape information, limit the size of the output vector, and improve the model's learning capacity. Because the TTCNN is sensitive to its initial weights, the HFFO is built to optimize the TTCNN's structural parameters using a combination of a parallel search method and a group cooperation technique. The experimental results validate the superiority of the proposed HFFO: the HFFO-TTCNN hybrid model improves prediction accuracy and convergence speed by using the high-quality structural parameters found by the proposed HFFO for the TTCNN. This article considers only the temporal characteristics of traffic flow and ignores its spatial characteristics. In future research, the prediction performance may be greatly enhanced by including spatio-temporal characteristics in the input matrix.

Data Availability
The data supporting our research results are included within the article or supplementary material.

Conflicts of Interest
The authors declare no conflict of interest.