When Smart Cities Get Smarter via Machine Learning: An In-depth Literature Review

The manuscript presents a comprehensive and systematic literature review of machine learning (ML) methods in emerging smart city applications. The application domains cover essential aspects of smart cities, including energy, healthcare, transportation, security, and pollution. The methodology presents the state of the art, a taxonomy, and an evaluation of model performance. The study concludes that hybrid models and ensembles are the best performers since they exhibit both high accuracy and affordable complexity. On the other hand, deep learning (DL) techniques had higher accuracy than the hybrid models and ensembles, but they demanded relatively higher computation power. Moreover, all these advanced ML methods had a slower processing speed than the single methods. Likewise, the support vector machine (SVM) and decision tree (DT) generally outperformed the artificial neural network (ANN) in accuracy and other metrics. However, since the difference is negligible, it can be concluded that any of them is appropriate. The study’s findings identify the pros and cons of the methods in each application, helping future researchers, practitioners, and policy-makers choose the right method for the right problem within the context of smart cities.


I. INTRODUCTION
Due to their abundance of resources, facilities, and welfare, half of the world's population lives in cities [1]. The lack of a specific definition of what makes a smart city smart [2] has led many cities around the world to tag themselves as smart [3]. In this paper, we define the smart city as a city that utilizes various Information and Communication Technologies (ICT) [4,5] to improve the lives of its citizens, to solve problems (e.g., pollution, traffic, and crime) [6], and to preserve its natural resources [7]. Conceptually, smart cities might be the answer to goals such as improving living standards, provisioning more services and facilities, and attaining social sustainability [4,6,8,9]. Consequently, numerous technologies, such as the Internet of Things (IoT) [7], Big Data, and Cloud Computing [10], have been among the tools used to support smart cities and the goals behind constructing them [11]. IoT-based devices help to optimize decisions and thereby enhance the performance of the city services offered to citizens [12]. However, adopting IoT in smart cities can take its toll on lifestyles and have undesirable impacts, such as increased energy consumption [13] and increased pollution levels in the air, soil, and water resources [14]. As a result, several studies have emerged to mitigate these drawbacks. For example, Ghahramani et al. evaluated an intelligent technique for routing recommendations in an IoT-based waste management complex [15]. Ghahramani et al. also provided a unified topic modeling technique to disclose urban green space characteristics using artificial intelligence (AI) techniques [16]. Alsamhi et al. [17] proposed Green IoT as an environmentally friendly solution for the future use of IoT. Almalki et al. [18] presented a low-cost platform to monitor environmental parameters by employing flying IoT for real-time applications.
Figure I portrays a future smart city and shows that the concept of smart cities integrates a variety of ICTs into all aspects of human life. The incessant adoption of ICT inevitably produces a sheer volume of data from which machines can learn and discover latent patterns. Big data is another technology that will help us analyze smart cities' data efficiently and at a higher degree of scalability [19]. Figure I gives a holistic view of smart city applications [20].

FIGURE I. An outlook on smart cities
The literature is rich with studies investigating the role of AI and ML-based techniques in smart city applications. For example, Ullah et al. [19] reviewed recent trends in the application of artificial intelligence techniques in smart cities but limited their analysis to ML and reinforcement learning and to a selected set of applications (i.e., transportation, cybersecurity, smart grids, unmanned air vehicles, and healthcare). The study lacks a comparison between the performance of the different ML techniques. Shafiq et al. [21] presented a survey on the applications of data mining and single ML techniques for sustainable smart cities. The study discussed the performance of these techniques against complex datasets. Nosratabadi et al. [22] reviewed the use of ML and deep learning techniques in smart cities for prediction, planning, and uncertainty analysis. Din et al. [23] studied IoT-based ML techniques in some aspects of smart cities such as healthcare, smart grids, and vehicular communications. Similarly, Din et al. [24] surveyed single ML and Internet of Things (IoT) techniques used in healthcare, smart grids, and vehicular communications. Souza et al. [25] surveyed ML data mining techniques and their role in smart city applications using the arrangement method [25] and the VOSviewer tool [25]. That survey focused on statistical perspectives rather than on comparing performance or recommending certain techniques for smart city applications. Batty et al. [26] discussed the relationship between AI and smart cities and proposed ML techniques for real-time city functions. Mohammadi et al. [27] shed light on the challenge of big data in smart city applications from a machine learning point of view. The study focused on deep reinforcement learning and how it was used to handle the cognitive aspect of smart city services. Bhattacharya et al. [28] developed a qualitative study discussing the future of DL-based techniques for smart city applications. 
Kolomvatsos et al. [29] presented a comparative analysis of deep reinforcement learning and clustering for a query controller application in smart cities. Table I presents the strengths and weaknesses of these studies to identify the central research gap. This table compares the conducted studies against the criteria of the present study. Despite the abundance of the conducted studies, they still have shortcomings and limitations that warrant further investigation. Specifically, they do not provide a classification of the ML and DL techniques used, nor do they categorize their roles and functionality in smart cities. In addition, researchers in the field are challenged by the scarcity of reviews that contrast the performance of ML techniques and analyze their suitability for solving different problems. Currently, the literature lacks a comprehensive review that categorizes ML algorithms and their applications to smart cities. Such a study would guide researchers in the field of smart cities to use the right tool for a given problem. Managing a significant amount of data in review articles can support the successful implementation of smart cities for future planning and policymaking [30]. We argue that our analysis in this study may bridge the gap by providing a taxonomy of the ML algorithms and their contributions to improving smart cities. Furthermore, we provide a quantitative analysis of the performance of these ML algorithms to select the one most likely to be effective in a given field. We evaluate these algorithms with respect to efficiency, accuracy, and computational complexity. Our contribution in this paper is a novel taxonomy that focuses on the type of ML algorithms and approaches rather than the type of applications in smart cities. The proposed taxonomy may help researchers, policy makers, and practitioners to enhance the living standards in smart cities by leveraging the right ML tools. The rest of the manuscript is organized as follows. 
Section II explains the methodology we used to carry out this literature review. Section III surveys the literature, describes the role of state-of-the-art ML algorithms in solving problems in smart cities, and presents the taxonomy of the AI and ML-based techniques for application in smart city concepts.

TABLE I. Strengths and weaknesses of the conducted studies
Ref. | Scope | Strengths | Weaknesses
[19] | Recent trends in the application of artificial intelligence techniques in smart cities | N.A. | Database information and subject review interval
[21] | A survey on the applications of data mining and single ML techniques against complex datasets | The most cited methods and datasets | Evaluation interval
[22] | The use of ML and DL techniques in smart cities for prediction, planning, and uncertainty analysis | Database from Web of Science (WoS) and Scopus | Subject review interval
[23] | IoT-based ML techniques in healthcare, smart grids, and vehicular communications | N.A. | Database information and subject review interval
[24] | ML and Internet of Things (IoT) techniques used in healthcare and smart grids | N.A. | Database information
[25] | ML data mining techniques in smart city applications | Database from Web of Science (WoS) and Scopus |
[26] | The relationship between AI and smart cities | |

II. Methodology
It is challenging to search and identify all studies in which ML algorithms have supported smart cities due to the abundance of such algorithms and their variations. Simple search queries for "smart city" and "machine learning" may not provide a comprehensive list of relevant literature. We cannot rely solely on the search phrase "smart city" because other phrases with close semantics, such as "intelligent city," "smart urban planning," "smart urban mobility," etc., should not be neglected. The complexity notably increases when we compound the query with the names of many ML algorithms. For the names of the ML algorithms, we relied on the main algorithms discussed in textbooks and in surveys such as [31]. In this research, the Scopus database has been used as the primary repository as it indexes the major authenticated publishers. Our review ultimately aims to identify, organize, and classify the ML techniques that have been used to serve smart cities into one of four architecture categories: single models, hybrid models, ensemble models, and DL. Figure II depicts our review methodology, which consists of four stages. In the first stage, an initial set of relevant articles is identified based on the search queries "smart city" and "machine learning methods". For each ML method, we applied a new search query taking into consideration the specifics of that method and its variations. In the second and third stages of the review methodology, we analyzed and classified the ML algorithms based on how each algorithm was applied in smart cities, the datasets used, and the results attained. Finally, in the fourth stage, the ML models are classified into the four aforementioned categories. Overall, our search generated more than 430 relevant documents. 
During the second stage, we carefully analyzed these documents to discern the most relevant ones (i.e., those belonging to the fields depicted in Figure II) and thus narrowed the search pool down to 100 relevant papers. In the third stage, the paper pool was further refined so that we ended up with 80 core papers to review. There was a considerable increase in the number of articles that used ML methods over the last ten years (2010 to 2020) (Figure III).
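The compounding of city-related phrases with ML method names described above can be illustrated with a small sketch. The phrase lists, the function names, and the generic Boolean syntax below are assumptions made for illustration only; the review does not publish its exact query strings.

```python
# Hypothetical phrase lists: the review does not publish its exact query strings.
CITY_PHRASES = ["smart city", "intelligent city", "smart urban planning", "smart urban mobility"]
ML_METHODS = ["decision tree", "random forest", "support vector machine", "artificial neural network"]


def build_query(city_phrases, method):
    """OR together the city synonyms, then AND them with one ML method name."""
    city_part = " OR ".join(f'"{phrase}"' for phrase in city_phrases)
    return f'({city_part}) AND "{method}"'


def build_queries(city_phrases, methods):
    """Return one query string per ML method, mirroring the first review stage."""
    return [build_query(city_phrases, method) for method in methods]


for query in build_queries(CITY_PHRASES, ML_METHODS):
    print(query)
```

Generating one query per method keeps each result set attributable to a single technique, which may simplify the later classification into single, hybrid, ensemble, and DL categories.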

Limitation
Research work on smart cities dates back to 2010. However, the number of published papers has grown exponentially since 2016. Additionally, the popularity of ML applications in smart city technologies has also been recognized since 2016, with significant growth of publications in recent years as shown in Figure III. Consequently, and for the sake of staying current and relevant, the focus of this survey has been confined to papers published in 2016 or after.

III. Smart Cities and Machine Learning
The concept of the smart city has been used in literature since the early 90s [32]. However, the term "smart city/cities" was used only in a limited number of articles until 2011, when the concept started to become widely popular. Additionally, the importance of ML methods has grown exponentially over the past few years (see Figure III). In reverse chronological order, Table II provides a summary of the most important studies in which ML algorithms were used in smart cities. Next, we discuss these studies in more detail. Elsaeidy et al. [33] used Restricted Boltzmann Machines (RBM) as the ML technique to detect distributed denial of service attacks in smart cities. The use of RBM was justified by the high number of features in the datasets. Evaluation results showed that the approach can cope with the attack-detection task, as it achieved high accuracy and reliability scores. Alrashdi et al. [34] used the IoT-based Random Forest (RF) technique to intelligently detect anomalies in a smart city. In comparison with several other techniques, the authors found that RF gives the most reliable and accurate results for detecting compromised IoT-based systems at distributed fog nodes. Similarly, Meenal and Selvakumar [35] found that the RF technique is promising for the estimation of global solar radiation when compared with other ML techniques. Bilen et al. [36] tackled the problem of estimating business locations in smart cities using the Multi-Layer Perceptron (MLP) and Multi-Linear Regression (MLR) techniques. They justified the use of these techniques by the large number of features involved and the need for high accuracy. In their method, London data are first imported into the Feature set module of the main algorithm. The next step is to develop the regression module, followed by error analysis using the Relative Absolute Error (%). If the calculated error value (%) is higher than the desired value, the algorithm returns to the regression module to perform the modeling operation again. 
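The regression-then-error-check loop described above for [36] can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `fit_until_acceptable` and `toy_trainer` are hypothetical names, the data are made up, and the Relative Absolute Error is taken in its usual form, the total absolute error divided by the total absolute deviation of the actual values from their mean.

```python
def relative_absolute_error(actual, predicted):
    """Relative Absolute Error in percent: 100 * sum|y - y_hat| / sum|y - mean(y)|."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(a - p) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean_actual) for a in actual)
    if denominator == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    return 100.0 * numerator / denominator


def fit_until_acceptable(train_fn, inputs, actual, max_error_pct, max_rounds=10):
    """Re-run the regression module until the RAE is at or below the desired value."""
    for round_no in range(1, max_rounds + 1):
        model = train_fn(round_no)                           # regression module
        predicted = [model(x) for x in inputs]
        error = relative_absolute_error(actual, predicted)   # error analysis
        if error <= max_error_pct:
            break                                            # error acceptable: stop remodeling
    return model, error, round_no


def toy_trainer(round_no):
    """Hypothetical stand-in for training: the constant offset shrinks each round."""
    return lambda x: 2.0 * x + 2.0 / round_no


model, error, rounds = fit_until_acceptable(
    toy_trainer, [1, 2, 3, 4], [2, 4, 6, 8], max_error_pct=30.0
)
print(f"accepted after {rounds} rounds with RAE = {error:.1f}%")
```

The `max_rounds` cap is an addition of this sketch so that the loop terminates even when the desired error is never reached.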
Then, the estimated value is imported by the clustering module in parallel with the Web Client Feature Input for Hierarchical Clustering. This module generates district clusters, which are returned to the Web client as the suggested district. Figure IV presents the related algorithm reproduced from Bilen et al. [36]. Bakhshi and Ahmad [37] combined ML algorithms with IoT techniques to manage waste in smart cities. Figure V presents the algorithm of the mechanism. According to the mechanism represented in the figure, reproduced from [37], there is an IoT-based unit for collecting the information of all the dustbins. This system uses the installed sensors to monitor whether each dustbin (Dustbin i) is full or empty and then transfers the information to the analyzing server over an internet connection. Together, these components form the IoT unit. The next step is to generate the optimal route for the garbage truck to collect the waste. To increase safety and security in smart cities, Lourenço et al. [38] used ML techniques to detect criminal patterns based on historical data. The main mechanism can be found in Figure VI, which is reproduced from [38]. The citizen on the client side communicates with the Data Center Module. Data are imported by the Sci-Cumulus workflow engine on the server side. The ML-based technique in this unit employs external sources and communicates with the knowledge base unit. The next step is to export the information to the analytical module. This module, acting as a decision-making system, communicates with the police. The ML techniques showed promising prediction results in comparison with other non-ML tools. Reid et al. [39] focused on one of the crucial issues in smart cities, namely traffic jams. 
The authors found that the Support Vector Machine (SVM) showed high accuracy for classifying vehicular traffic in their attempt to mitigate air and noise pollution and optimize fuel consumption. Martínez-España et al. [40] experimented with RF and compared it with k-NN and Bagging ML techniques for forecasting air pollution in smart cities. Results, evaluated using the RMSE and correlation coefficient values, showed that RF provides the highest accuracy among the considered ML techniques. In another study by Chung and Jeng [41], ML techniques were also used to predict air pollution and to determine the factors that affect air quality. Addressing another weather-related problem, Chin et al. [42] developed a personalized service using an ML-based IoT system that correlates weather data (i.e., rainfall and temperature) with short journeys made by cyclists. Alsamhi et al. [43] provided a classification of ML-based techniques for enhancing the applications of IoT-based technologies in a smart city. Carrera et al. [44] employed a meta-XGBoost model integrated with meta-regression to generate energy data and enhance the prediction accuracy of energy production. Alagumalai et al. [45] also used ML-based techniques to assess the trends of using nanogenerators in smart cities. Ullah et al. [46] analyzed the different applications of ML-based techniques employed for enhancing the efficiency of unmanned aerial vehicles. Shahriar et al. [47] discussed supervised and unsupervised ML-based techniques for handling electric vehicles in a smart city. By analyzing the above studies, we noticed that two main motives compelled the use of ML techniques in smart cities. First, most of the tackled problems involve high-dimensional datasets (i.e., the number of features is large). Second, accuracy and reliability were a priority in most of the studies in order to have a sustainable ecosystem in smart cities. Next, we briefly describe each ML technique used in smart cities.
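Several of the studies above report the RMSE and the correlation coefficient. For reference, a minimal sketch of both measures is given here, assuming the Pearson form of the correlation coefficient; the function names and sample readings are made up for illustration and do not come from any of the cited studies.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between observed and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up pollutant readings, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, forecast), correlation_coefficient(observed, forecast))
```

A lower RMSE and a correlation coefficient closer to 1 both indicate a closer match between forecasts and observations, which is how the studies above rank competing techniques.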

FIGURE V. IoT trash collection mechanism
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication.

1) Decision Trees
The first regression tree was proposed and implemented in 1963 by Morgan and Sonquist [48]. Then, the first work on Decision Trees (DTs) was published in 1966 by Hunt in the psychology field [49]. The DT algorithm is a supervised learning method [50] that can be employed for classification and regression tasks. More specifically, a DT leverages a tree-based data structure in which the samples are recursively partitioned on the feature whose values split them most effectively, that is, the split that maximizes a purity measure [51,52]. Figure IV presents a simple DT algorithm schematic diagram with leaf nodes as attributes. As seen in Figure VII, reproduced from [56], a DT algorithm collects the outcome of each node and decides the final result. Attribute selection is one of the most important challenges in constructing a DT. The value of an attribute is measured by two functions: Information Gain (IG) and Gini Index (GI) [53][54][55]. IG computes the entropy change across the whole DT mechanism based on Eq. 1:

\[ IG(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v) \tag{1} \]

where $S$, $A$, and $S_v$ denote the set of instances, the attribute, and the subset of instances for which attribute $A$ takes its $v$-th value, respectively, whereas the entropy characterizes the impurity of an arbitrary collection. On the other hand, the GI determines the frequency of incorrect identification for a randomly chosen element, which leads to favoring an attribute with a lower GI. Eq. 2 shows GI's formula:

\[ GI = 1 - \sum_{i} p_i^{2} \tag{2} \]

where $p_i$ refers to the probability of the $i$-th event occurring. In addition to a single DT, the RF approach is constructed by considering an ensemble of multiple DTs, constituting a "forest" of simpler estimators. Each tree is built on a different portion of the training set to minimize the error between the predictions and the actual values. Figure IV presents a simple flowchart for a decision-making purpose in smart city applications by DT. 
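To make the two attribute-selection measures concrete, the following sketch computes the entropy-based Information Gain and the Gini Index (one minus the sum of squared class probabilities) on a toy dataset. The attribute names, labels, and function names are hypothetical and serve only to illustrate the definitions; they are not taken from any of the surveyed studies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_index(labels):
    """Gini Index: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the full set minus the weighted entropy after splitting on `attribute`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    n = len(labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy traffic records, invented for illustration.
rows = [
    {"rush_hour": "yes", "rain": "yes"},
    {"rush_hour": "yes", "rain": "no"},
    {"rush_hour": "no", "rain": "yes"},
    {"rush_hour": "no", "rain": "no"},
]
labels = ["jam", "jam", "free", "free"]

for attribute in ("rush_hour", "rain"):
    print(attribute, information_gain(rows, labels, attribute))
print("gini", gini_index(labels))
```

In this toy case, splitting on `rush_hour` separates the classes perfectly while `rain` carries no information, so a DT builder that maximizes IG would place `rush_hour` at the root.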
Table III presents the most important studies in smart cities that leveraged DT-based techniques. Connected vehicles in a smart city are a hot topic due to their security [70] and control aspects (e.g., platooning) [71]. DT was used in [57] for traffic classification and compared with other ML-based techniques. Results were evaluated using the accuracy metric (99.18% for DT). In [58], DT was used for pandemic prediction and compared to other ML-based techniques. According to the findings, DT provided good accuracy (about 99%) for the estimation task. Balta et al. [59] employed DT integrated with a fuzzy approach for the optimization of the traffic signals in a smart city. Accordingly, a performance improvement of nearly 15% to 17% was obtained using the proposed technique. The study of Aloqaily et al. [61] deemed transportation one of the important fields in smart cities when they investigated how to detect connected vehicles using Deep Belief Networks (DBN) and DT. The performance of the proposed technique was evaluated using accuracy and detection rate. In the telecommunication field, Manzanilla-Salazar et al. [62] detected failures in LTE infrastructures using the DT and SVM techniques and compared them. Early detection of failures in the LTE infrastructure can be a big cost saver. The study showed that the proposed DT technique can increase the accuracy and detection rate of failures. To protect smart cities from cyber-attacks, Alrashdi et al. [34] developed a system for the detection of attack points using an IoT-based RF whose accuracy reached 99.34% on real datasets. 
Solar radiation is one of the vital issues in smart cities, and it captured the attention of Meenal and Selvakumar [35], who found that the RF technique outperformed other ML techniques on empirical data collected in Tamil Nadu. Furthermore, in the field of electric cars, the RF technique showed another success in smart cities when the authors in [44] tried to predict the charging demands of electric vehicles. Detecting and locating road anomalies is a significant aspect of smart cities. To that end, El-Wakeel et al. [64] used the DT algorithm with great success. Education and predicting student performance were the focus of Gomede et al. [65], who relied on an RF-based technique to do so. Orlowski et al. [66] presented a DT-based IoT model for increasing the performance of building business models. The paper discussed sustainable decision-making processes in smart cities and highlighted the importance of DTs and business models for making decisions. Validation was performed via a case study on air quality. Similarly, in [67], Mei et al. proposed a Rule-based Incentive Framework utilizing a DT along with a Game Theory (GT)-based technique that was evaluated in terms of decision-making accuracy for handling the traveling information of passengers in a smart city. Simulations showed that the proposal constitutes an effective way to incentivize travelers to change travel routes, proving to be an essential smart city service. Air pollution in smart cities was also investigated by Benedict [68], who built a prediction framework based on RF for estimating air pollution, which is considered one of the most urgent challenges in smart cities. Using real validation data, the accuracy ranged between 70% and 90%. Pribadi et al. [69] developed a DT-based decision-making mechanism for handling CCTV cameras in smart cities. The performance evaluation showed an accuracy of 87.96%.
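2) Support Vector Machines
The Support Vector Machine (SVM) algorithm was first developed by Vladimir Vapnik et al. in 1963 [72], whereas Boser et al. provided an approach to employ non-linear classifiers using the kernel trick in 1992 [73]. SVM is one of the most frequently used supervised ML algorithms and employs the learned model to handle both classification and regression tasks. In detail, the SVM represents the training samples as points in the feature space to find a set of hyperplanes that provide the best class separation, whereas new points are classified or predicted according to the portion of space they belong to. The input-output formulation of an SVM is formally described by f(x) given in Eq. 3:

\[ f(x) = w^{T} \varphi(x) + b \tag{3} \]

where $w^{T}$ denotes the transposed vector related to the output layer, $\varphi(x)$ represents the kernel function, and $b$ the bias. Overall, the input matrix has $n \times N$ dimensions, in which $n$ and $N$ refer to the number of input parameters and data points, respectively. The following cost function is optimized to evaluate the $w$ and $b$ parameters [74]:

\[ \min_{w, b, \xi, \xi^{*}} \; \frac{1}{2} w^{T} w + C \sum_{k=1}^{N} \left( \xi_k + \xi_k^{*} \right) \tag{4} \]

which is constrained by Eq. 5:

\[ \begin{cases} Y_k - w^{T} \varphi(X_k) - b \le \varepsilon + \xi_k \\ w^{T} \varphi(X_k) + b - Y_k \le \varepsilon + \xi_k^{*} \\ \xi_k, \; \xi_k^{*} \ge 0 \end{cases} \qquad k = 1, 2, \dots, N \tag{5} \]

in which $X_k$ and $Y_k$ are the $k$-th input and output, respectively, and $C$ is the regularization constant, whereas $\varepsilon$ represents the fixed precision of the estimation; the slack variables $(\xi_k, \xi_k^{*})$ determine the acceptable error margin. The following Lagrangian optimization is applied to minimize the cost function:

\[ \max_{\alpha, \alpha^{*}} \; -\frac{1}{2} \sum_{k,l=1}^{N} (\alpha_k - \alpha_k^{*})(\alpha_l - \alpha_l^{*}) K(X_k, X_l) - \varepsilon \sum_{k=1}^{N} (\alpha_k + \alpha_k^{*}) + \sum_{k=1}^{N} Y_k (\alpha_k - \alpha_k^{*}) \tag{6} \]

\[ \sum_{k=1}^{N} (\alpha_k - \alpha_k^{*}) = 0, \qquad \alpha_k, \; \alpha_k^{*} \in [0, C] \tag{7} \]

\[ K(X_k, X_l) = \varphi(X_k)^{T} \varphi(X_l), \quad \text{for } k, l = 1, 2, \dots, N \tag{8} \]

where $\alpha_k, \alpha_k^{*}$ are the Lagrangian multipliers. In the last step, the f(x) of the SVM is given as follows:

\[ f(x) = \sum_{k=1}^{N} (\alpha_k - \alpha_k^{*}) K(x, X_k) + b \tag{9} \]

Recently, Manogaran et al. [75] adopted an approach that integrates the SVM with the shared Adaptive Computing Model for a traffic management system that provided an improved platform by increasing the decision reliability and reducing the computing time compared to the SVM alone. In [76], the SVM was compared to other ML-based techniques for cyber-attack detection in smart cities, but the performance was not promising. Shen et al. 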
[78] devised a secure and privacy-preserving SVM using blockchain-based encrypted IoT data. Results reported the accuracy and confidentiality of the proposed technique, showing that it could successfully cope with the considered task and ensure the confidentiality of sensitive data. The SVM was also leveraged by Aymen and Mahmoudi [79], who presented a methodology for the management and optimization of power status in electric vehicles for smart cities. The evaluation used the energy consumption and charge state of batteries and showed that the SVM attains high performance and robustness. Differently, Pujol et al. [80] developed an SVM-based system to detect and classify violence types in social media. This system monitors the social media space and decides about observations by using a set of terms and rules. The accuracy of the proposed system was acceptable, ranging between 85% and 97%. Le et al. [81] developed a platform for predicting and estimating building heating load in smart cities using ML methods, including SVM and RF, and a hybrid technique based on particle swarm optimization and extreme gradient boosting machine (PSO-XGBoost). Evaluations were performed using the Root Mean Square Error (RMSE) and correlation coefficient measures. Results demonstrated that the best single method (i.e., SVM) generates predictions with moderate accuracy values, but also emphasized that hybrid techniques are able to outperform single models (cf. later Sec. 4.4). Likewise, Chui et al. [82] presented a study aimed at the optimization of energy consumption in smart cities. The proposed method employed a Genetic Algorithm (GA) to construct a hybrid GA-SVM technique that was compared with other single ML techniques in terms of specificity, sensitivity, and accuracy. The proposed technique improved the performance by more than 21%, thanks to the presence of the GA optimizer. Garcia-Font et al. 
[83] tested an SVM method for anomaly detection in a laboratory that reproduces a real smart city use case with heterogeneous devices, algorithms, protocols, and network configurations. Results indicated the high reliability and accuracy attained by the proposed method for anomaly detection despite possible technical difficulties in configuring and implementing ML models in such environments. Belhajem et al. [84] developed an estimation platform for vehicle position using SVM and the Extended Kalman Filter. The dataset was gathered via the Global Positioning System (GPS) and Inertial Navigation System (INS). This technique is aimed at low-cost detection of vehicle position. Experimental results showed an improvement of up to 94% in position prediction in case of GPS failures compared to related baselines. In another study, Aborokbah et al. [85] devised and evaluated a platform for a clinical decision support system based on SVM. The latter was developed with the RBF kernel function and leveraged to detect heart failures. Performance was evaluated using the sensitivity measure, and the SVM was shown to provide a sensitivity of 76.9%.
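As an illustration of the SVM formulation introduced at the start of this subsection, the sketch below evaluates the kernel form of the regression output, a weighted sum of kernel evaluations against the training inputs plus a bias, together with the ε-insensitive error that the slack variables account for. The RBF kernel, the `gamma` value, the function names, and all coefficients are assumptions chosen for illustration; no model is trained here and the numbers do not come from any surveyed study.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """f(x) = sum_k (alpha_k - alpha_k*) K(x, X_k) + b.

    dual_coefs[k] holds the difference (alpha_k - alpha_k*) for support vector X_k.
    """
    return sum(c * kernel(x, sv) for c, sv in zip(dual_coefs, support_vectors)) + bias


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical, hand-picked values (not the result of any training run).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [1.5, -0.5]
bias = 0.2

prediction = svr_predict([0.5, 0.5], support_vectors, dual_coefs, bias)
print(prediction, epsilon_insensitive_loss(1.0, prediction, epsilon=0.1))
```

Only training points with a non-zero coefficient contribute to the sum, which is why the prediction cost of an SVM depends on the number of support vectors rather than on the size of the full training set.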

3) Artificial Neural Networks
Artificial Neural Networks (ANNs) were first developed by Warren McCulloch et al. in 1943 [86]. This work simulated a simple neural network with electrical circuits to investigate the performance of neurons in learning tasks. The ANN is an early and simple way to design an intelligent learning system inspired by the biological neurons that constitute brains. This system uses a training stage related to a certain task that extracts knowledge from a training dataset without the need to be programmed with task-specific rules [87]. Indeed, the basic idea of ANNs is to perform tasks without any prior knowledge about the nature of the phenomena. Consequently, ANNs can generate identifying characteristics (i.e., extract discriminative features) from the data that are given as input [88]. ANNs can be considered a comprehensive modeling framework to process complex datasets. Recently, ANNs have been employed for forecasting, regression, and curve-fitting purposes [87]. In an ANN model, neurons represent the fundamental components that employ transfer functions for generating the output values. The most important advantage of ANNs is that they are simple and cost-effective methods for handling large datasets [89]. The Multilayer Perceptron (MLP) is one of the simplest and most frequently used variants of feedforward ANNs. The MLP is characterized by an architecture of three or more layers, as shown in Figure VIII [90]. The first layer is the input layer, the intermediate layers are the hidden layers, and the last layer is the output layer [91]. An MLP can have multiple hidden layers. In that case, we refer to it as a "deep" MLP (cf. Sec 4.4).
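The layered structure of the MLP can be illustrated with a minimal forward-pass sketch. It assumes the common node computation O = f(Σ w·x + B), with a sigmoid transfer function f; this form, the function names, and the weights below are assumptions for illustration and are not taken from the surveyed studies.

```python
import math


def sigmoid(z):
    """Logistic transfer function, one common choice among several."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation=sigmoid):
    """Compute each node's output O = f(sum_i w_i * x_i + B) for one layer.

    weights[j] is the list of incoming weights of node j; biases[j] is its bias.
    """
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def mlp_forward(inputs, layers, activation=sigmoid):
    """Propagate the inputs through every (weights, biases) layer in order."""
    outputs = inputs
    for weights, biases in layers:
        outputs = layer_forward(outputs, weights, biases, activation)
    return outputs


# Hypothetical 2-2-1 network with hand-picked (untrained) parameters.
hidden_layer = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output_layer = ([[1.0, -1.0]], [0.05])

print(mlp_forward([1.0, 2.0], [hidden_layer, output_layer]))
```

Adding further `(weights, biases)` pairs to the `layers` list yields the "deep" MLP mentioned above without changing the propagation code.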
In the MLP formulation, w refers to the weight values that control the propagation of the value x from input to output, with n being the number of layers, whereas O refers to the output value of each node, which is modified by the bias value B. Alsamhi et al. [92] developed a platform using an ANN to predict the signal strength of a drone. The independent variables were drone altitude and path loss. Results were analyzed using the determination coefficient. Findings showed reasonable validation accuracy, with determination coefficients of 0.96 and 0.98, respectively, for varying heights and distances. Singh et al. [93] employed an ANN to estimate the arsenic-vulnerable zones with reasonable accuracy. Le et al. [94] developed different types of ANNs combined with optimization techniques (e.g., GA, PSO, etc.) to estimate building heating loads in smart cities and optimize energy efficiency. Results were evaluated using the RMSE and determination coefficient measures. They showed that the hybrid ANN-GA provides the highest determination coefficient and the lowest RMSE, equal to 0.9 and 1.625, respectively. Similarly, Ullah et al. [95] developed an ANN-based smart system to detect lighting in smart cities (i.e., a classification task). The evaluation employed the accuracy measure and reported that the proposed ANN provides an accuracy value of 92.6%. Recently, Keung et al. [96] developed an ANN-based monitoring system for drainage handling in a smart city. Monitoring and prediction of urban drainage relied on IoT sensor data and on the ANN, respectively. Results showed that the proposed ANN could successfully perform drainage prediction since it obtained 99% accuracy on the testing dataset related to Hong Kong. Banach et al. [97] developed a platform for mapping air pollution in smart cities using an ANN-based system. Specifically, the ANN was trained and tested for the required prediction task and integrated into a laboratory target prototype. 
The evaluation was based on accuracy and showed that the ANN has acceptable performance and could be implemented and tested in a real operational scenario. Sharad et al. [98] devised an ANN-based technique for addressing the delays that bus drivers experience in reaching their destinations. The technique managed the urban bus transportation paths in smart cities and monitored them to find the shortest path. The authors demonstrated that the ANN could provide an accurate estimation of the arrival time, effectively reducing delays. Bennati et al. [99] employed various data-driven learning algorithms based on ANNs to investigate and evaluate their application to social welfare, fairness, and privacy in smart cities. The algorithms were evaluated through computer simulations based on real-world data (i.e., smart-meter readings and participatory sensing) and considering two implementation scenarios (i.e., a smart grid and a traffic congestion information system). The authors identified algorithm trade-offs and provided a set of guidelines depending on the requirements and privacy constraints of the specific smart-city scenario and application. Differently, Sharad et al. [100] developed a real-time management console for public transportation systems in smart cities employing an ANN-based monitoring system. The latter computed the shortest path to reach a destination and provided that information to the bus driver. In addition, the ANN was used to accurately estimate the arrival time for commuters. Based on the findings of a real-time implementation, the authors demonstrated that the proposed technique could successfully provide a fleet management console for administrators to use as a real-time monitoring system in buses. Jiang and Claudel [101] implemented an intelligence platform based on wireless technology for traffic and flash flood monitoring systems.
The platform worked in real-time and provided high reliability and accuracy on complex problems arising in smart cities (e.g., traffic flow monitoring, machine-learning-based flash flood monitoring, and Kalman-filter-based vehicle trajectory estimation). More specifically, for flash flood monitoring, the authors employed an ANN that learned the variations of the air temperature profile as a function of the ground and air temperature inputs measured by passive sensors.
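To make the layered MLP structure described at the start of this subsection more concrete, the following minimal sketch propagates an input through one hidden layer and one output layer. It is only an illustration under stated assumptions: the sigmoid transfer function, the layer sizes, the random initialization, and all function names (`dense_layer`, `init_layer`, `mlp_forward`) are ours and are not taken from any of the surveyed works.

```python
import math
import random


def sigmoid(z):
    """Logistic transfer function (an assumed, common choice for MLP neurons)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Each node outputs sigmoid(weighted sum of its inputs + bias)."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def mlp_forward(x, layers):
    """Propagate x through the hidden layer(s) and then the output layer."""
    activation = x
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


rng = random.Random(0)
# Input layer with 3 features, one hidden layer with 4 nodes, 1 output node.
layers = [init_layer(3, 4, rng), init_layer(4, 1, rng)]
output = mlp_forward([0.2, -1.0, 0.5], layers)
```

Adding further `init_layer` entries to `layers` yields a deeper MLP without changing the forward pass. Training (i.e., tuning the weights and biases from data) is deliberately left out of this sketch.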

B) Advanced Machine Learning Approaches: Hybrid, Ensembles, and Deep Learning
This section presents hybrid approaches, ensembles, and DL-based techniques that we have categorized as advanced ML methods.

1) Hybrid Approaches
Hybrid approaches integrate two or more (ML-based) methods to jointly exploit their advantages in solving learning tasks (e.g., joint prediction and optimization) [102]. Figure IX reports an example flowchart that illustrates the application of a hybrid method. Hereinafter, we briefly explain how a hybrid method is developed and what it aims to achieve. As depicted in Figure IX, input data are fed to the predictor component, which in turn produces the output values. These values are given as input to the optimizer component, which compares them with the target values (i.e., the ground truth) to optimize a compound cost function. Depending on the specific optimization task, the cost function can be either minimized or maximized. In detail, this optimization procedure aims to tune the predictor parameters, and the cycle continues until the desired performance is achieved. This is assessed by comparing the output of the optimized predictor with the target values and computing the related evaluation metrics.
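The predictor/optimizer cycle described above can be sketched as follows. This is a toy illustration, not the method of any surveyed paper: the predictor is a two-parameter linear model standing in for, e.g., an ANN, and the optimizer is a simple random-perturbation hill climber standing in for the metaheuristics (e.g., GA, PSO) used in the literature. The data, the tolerance, and all names are assumptions made for the example.

```python
import random


def predict(params, xs):
    """Predictor component: a linear model standing in for, e.g., an ANN."""
    slope, intercept = params
    return [slope * x + intercept for x in xs]


def cost(params, xs, targets):
    """Cost function: mean squared error between outputs and target values."""
    outputs = predict(params, xs)
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(xs)


def optimize(xs, targets, tolerance=1e-3, max_iters=20000, seed=0):
    """Optimizer component: random-perturbation hill climbing (a stand-in)."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_cost = cost(best, xs, targets)
    for _ in range(max_iters):
        if best_cost <= tolerance:  # desired performance reached, stop the cycle
            break
        candidate = tuple(p + rng.gauss(0.0, 0.1) for p in best)
        candidate_cost = cost(candidate, xs, targets)
        if candidate_cost < best_cost:  # keep only parameters that reduce the cost
            best, best_cost = candidate, candidate_cost
    return best, best_cost


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy ground truth generated by y = 2x + 1
params, final_cost = optimize(xs, targets)
```

Here the cost is minimized. A task that requires maximization can reuse the same loop by negating the cost function.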

2) Ensemble Methods
Similarly, ensemble methods jointly employ different ML techniques (usually called "weak learners") but for different purposes, such as decreasing variance and bias or increasing prediction performance. They are based on the assumption that combining multiple models to solve the same problem can produce a model with better performance. Figure X depicts the general structure of an ensemble model encompassing N ML-based "weak classifiers" whose outputs are combined via a meta-classifier. It should be noted that each ML-based classifier can be fed with a different set of features (viz., inputs). Bagging, stacking, and boosting are common meta-algorithms for obtaining an ensemble of ML-based algorithms [103]. Bagging employs homogeneous weak learners trained in parallel and combines their outcomes using deterministic averaging. Bagging is frequently used to successfully improve the performance of DTs used as weak learners in RF [104]. Boosting, on the other hand, also uses homogeneous weak learners, but they are trained sequentially in an adaptive fashion (i.e., each model depends on the previous one) and are deterministically combined. Finally, stacking considers weak learners trained in parallel that are combined by training a meta-algorithm (i.e., a meta-classifier) that provides a prediction by intelligently combining the "base" models (see, e.g., Figure 7). Advanced combination techniques can exploit both hard decisions and soft outputs of base models [50].

This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3181718

Figure XI sketches the flowchart of an example application of an ensemble method. In this workflow, the ensemble predictor is shown as a black box ("Predictor" in the figure), and its role is independent of the specific ensemble meta-algorithm adopted.
First, input data enter the pre-processing component that performs dataset cleaning and normalization. Then the data are passed to the feature selection unit. In more detail, the former component makes input values suitable to feed the ensemble predictor, whereas the latter aims to select the most informative features to improve ensemble performance that is assessed by comparing the output values with the target ones.
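As an illustration of the bagging meta-algorithm discussed in this subsection, the sketch below trains homogeneous weak learners on bootstrap resamples and combines them by majority vote (the classification counterpart of averaging). It is a minimal example under our own assumptions: the weak learner is a one-feature decision stump, the data are invented and assumed to be already pre-processed, and the function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Weak learner: a one-feature threshold classifier (decision stump).

    Picks the threshold that misclassifies the fewest training samples,
    predicting class 1 when x >= threshold and class 0 otherwise.
    """
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(samples, n_learners=15, seed=0):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_learners)
    ]


def bagging_predict(thresholds, x):
    """Combine the weak learners' hard decisions by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy, already-normalized data: class 1 for larger feature values.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = bagging_fit(data)
prediction = bagging_predict(ensemble, 0.95)
```

Boosting and stacking would differ in how the learners are trained (sequentially, or followed by a trained meta-classifier), while the surrounding workflow stays the same.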

3) Deep Learning
Among advanced ML techniques, DL has emerged as a potentially disruptive breakthrough, allowing the automatic design of inference systems that can distill complex dependencies among input data and limiting the need for human experts in designing accurate features. The term "deep" refers to the usage of multiple transformation steps to create these features, which is reflected in computations performed by a neural network encompassing many "hidden" layers placed between the input layer (passing input data to the first hidden layer) and the output layer (producing the output variables). A wide variety of practical and robust methods are comprised within this subset of ML techniques. The most common DL architectures that we have found in the literature fall within the families of Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Auto Encoders (AEs) [105]. These DL techniques are widely used for multiple purposes, such as audio and speech processing, computer vision, network traffic analysis, social network filtering, pattern recognition, and big data applications. The parameters of DL networks are learned iteratively via the stochastic gradient descent optimization algorithm, which searches for the minimum of a cost (or loss) function. Specifically, an estimate of the gradients is calculated from a random subset of the training data. Also, the backpropagation algorithm is leveraged to efficiently compute the gradient of the loss function [80]. We briefly describe the most common variants of DL networks in the following. A CNN architecture is inspired by the visual functioning of living creatures and is one of the most popular DL techniques, finding applications especially in computer vision [106]. Figure XII depicts an example of a bi-dimensional CNN (briefly, a 2D-CNN). From a macroscopic viewpoint, the CNN architecture encompasses two main parts.
The former is a chain of convolutional layers that employ translation-invariant filters (whose dimensionality depends on the nature of the input, e.g., bi-dimensional in the case of images) that extract the features from a given input region within their receptive field by convolving with the input data. Commonly, each convolutional layer is followed by a pooling layer (e.g., a max-pooling in Figure XII) that performs downsampling of the intermediate convolutional representation to reduce complexity and avoid overfitting.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
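The convolution and pooling steps of the first CNN part can be illustrated with a small, dependency-free sketch. The 5x5 input, the 2x2 filter, and the function names are invented for the example. In a real CNN the filter values are learned during training and many filters are applied in parallel.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most DL libraries, this is technically cross-correlation:
    the kernel is not flipped.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # hypothetical 2x2 filter
features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 map after downsampling
```

The same filter weights are reused at every position of the input, which is what makes the extracted features insensitive to where a pattern appears, and pooling reduces the 4x4 feature map to 2x2.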

Figure XII: Example of a 2D-CNN architecture adopted by [107]

The latter part consists of a series of fully connected layers that generate the proper output values depending on the considered task (e.g., classification vs. regression). An RNN architecture presents neuron connections forming directed cycles and is usually employed to recall temporal information via a state vector. It takes a vector sequence as input and outputs either its final state or its entire time evolution. Long Short-Term Memory (LSTM) is one of the most common variants of RNNs and presents special neurons (called cells) that can store and model dynamic temporal behaviors with long-term dependencies. An LSTM cell is made of three main gates (i.e., internal mechanisms that operate with sigmoid and hyperbolic-tangent activation functions and sum and product operations of vector variables), namely the input, output, and forget gates, which control the input and output of the cell, regulate the information flow, and decide which information is relevant to recall or forget [108].
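The role of the three LSTM gates can be sketched with a single scalar cell. This follows the commonly used LSTM formulation rather than any specific surveyed implementation; the parameter layout (one `(w_x, w_h, b)` triple per gate), the parameter values, and the input sequence are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps each gate name to a hypothetical (w_x, w_h, b) triple.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate values to be written
    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (cell output)
    return h, c


params = {name: (0.5, 0.5, 0.0)
          for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:  # a short input sequence
    h, c = lstm_step(x, h, c, params)
```

After the loop, `h` is the final state of the cell. Collecting `h` at every step instead would give the entire time evolution mentioned above.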

Figure XIII: Example workflow of an AE adopted from [109].

The AE is a type of ANN commonly used for (unsupervised) feature learning, whose aim is to (ideally) output a reconstruction of the input by learning a compressed data representation. Figure XIII reports the example architecture of an AE. Specifically, the first AE block (i.e., the encoder) provides a lower-dimensional data representation (via a hidden layer of neurons), whereas the second block (i.e., the decoder) tries to reconstruct the data from the compressed representation [109]. The AE is commonly trained via fast, optimized backpropagation algorithms like the conjugate gradient [110].

Several studies have demonstrated the higher capability of advanced ML approaches (i.e., hybrid, ensemble, and DL techniques) in designing accurate models compared to traditional ML approaches. Ardabili et al. [111] presented a comparative study among single and hybrid Extreme Learning Machine (ELM) techniques for predicting and optimizing ethyl and methyl esters production, claiming that hybrid ELM techniques provided higher accuracy and optimized efficiency performance compared with that of single ELM. Jesús Cuenca-Jara et al. [112] proposed a novel data-driven methodology employing a fuzzy classifier based on volunteer geographic information to label spatial-temporal trajectories. Results were evaluated considering real-time detection of tourists and local citizens' flows. Classification accuracy was compared with that of a well-established trajectory classifier used as a baseline, proving that the proposed solution is suitable for coping with the task. Recent research has shown that advanced ML techniques have become more and more popular due to their applicability in different research fields and their higher performance when compared to traditional ML approaches. Smart city applications are one of the most relevant fields that have benefited from the appropriate usage of advanced ML methods.
Table VI presents notable papers, starting from the most recent ones, that have leveraged advanced ML methods for smart city applications. In [113], an RNN-based LSTM platform was proposed to detect cyber-attacks in a smart city. The proposed technique provided an accuracy of more than 90%. In [114], the RNN-based LSTM technique was employed for preparing a platform to estimate traffic using noise pollution analyses in a smart city. The proposed technique provided a higher accuracy. In [116], Kumar employed Bi-LSTM to recognize the duplicity within the medical community sites. The obtained results suggested that the proposed technique provided an accuracy of 86.375%. Yin et al. [118] proposed a hybrid Ant Colony Optimization Ridge Regression (ACO-RR) algorithm, a smart-city evaluation method based on ridge regression, exploited to help construct small and medium-sized smart cities by intelligently reusing existing resources and systems. The experimental evaluation was performed considering real smart-city datasets spanning different years and coming from the evaluation report on the development level of China's smart cities. The results showed that the hybrid ACO-RR technique provides higher accuracy compared to SVM, DT, and ANN, thus proving to be more reliable than single ML approaches in the evaluation of smart cities. Kwon et al. [120] developed a hybrid reasoning model via a combination of crowd knowledge extracted from open source data and collective knowledge (CBR) for handling huge amounts of data aimed at obtaining a healthy environment (i.e., diagnosing wellness levels in patients suffering from stress or depression) in smart cities.
The empirical evaluation demonstrated that the proposed approach performs better than traditional ML-based methods (e.g., SVM, DT, k-NN, Bayesian Network, Logistic Regression) due to the ability of hybrid CBR to properly manage big data (and possible class imbalance). Belhajem et al. [123] presented a study on the real-time prediction of vehicle positions in a smart city using a hybrid approach based on ANN and Autoregressive Integrated Moving Average (ARIMA) techniques. The ANN-ARIMA model is trained with GPS data to jointly learn both linear and non-linear dependencies in vehicle positions. Results showed up to 95% accuracy in predicting vehicle position during GPS outages compared to the Extended Kalman Filter. Besides, a group of works applied ensemble methods in smart cities. Hansen et al. [122] presented an ensemble method of ML-based classifiers exploiting both Logistic Regression (LR) and RF for forecasting home-care hours in a smart city. Experiments were carried out considering data of Copenhagen citizens receiving home care from 2013 to 2017 and showed that the proposed method reaches an Area Under Curve (AUC) value of 0.715. The authors claimed that the proposed methodology can properly predict large increases in home-care hours, which is one of the major health expenses in a smart city. Alajali et al. [124] developed an ensemble technique based on Gradient Boosting Regression Trees (GBRT) for the prediction of car parking availability in smart cities. The method exploited data from multiple sources (i.e., car parking, pedestrian, and car traffic data) for extracting the relationship between pedestrian volume and car parking demand to predict parking availability at fifteen-minute intervals. The authors compared the proposed ensemble method with traditional SVM and DT in terms of accuracy and error probability.
Experimental results demonstrated that the proposed ensemble technique has higher performance than single ML-based techniques, presenting an error probability of 0.029. Finally, DL techniques have also been widely applied in smart cities, as discussed here. Mujeeb et al. [117] employed LSTM for developing a prediction platform for the load and cost of an electricity grid system in the presence of data generated in smart cities. The proposed DL-based method was compared with ANN and ELM techniques in terms of Mean Absolute Error (MAE) and Normalized Root Mean Square Error (NRMSE) measures. The results demonstrated that the LSTM outperformed the compared forecasting methods in terms of accuracy, proving the efficiency of the proposed method for electricity price and load prediction. Indeed, the LSTM showed an MAE of 1.95 and an NRMSE of 0.08 for price forecasting on the ISO NE (Independent System Operator, New England) dataset, and an MAE of 2.9 and an NRMSE of 0.087 for load forecasting, showing better performance than ANN and ELM. Chackravarthy et al. [119] employed a DL architecture composed of an RNN and a CNN to predict criminal acts (e.g., assault detection, car theft, etc.) in smart cities. The proposed system aims to overcome the limitation of single DL techniques in analyzing video stream data displaying criminal acts. The results showed higher accuracy compared to single DL algorithms at the cost of higher training time, thus allowing the implementation of an effective crime detection system that can reduce the workload of supervising officials in smart cities. Obinikpo et al. [121] presented a survey discussing the application of DL techniques for handling data generated by connected smart health systems. Specifically, they considered how these techniques can be exploited to improve the prediction of data sensed by IoT devices and to help decision-making in smart health services. The authors focused on both architectures (e.g., CNN, RNN, DBF, etc.)
and methods for data collection using different sensor types, studying also challenges and open issues for identifying future directions for the application of DL techniques in smart health systems (e.g., medical imaging, bioinformatics, and predictive analysis).

IV. Evaluation of the ML Methods
As explained in the previous section, many ML algorithms have been used to tackle various challenges in smart cities. Therefore, we believe it is useful to augment our study by evaluating each ML technique (Section IV-B) based on different performance metrics (Section IV-A).

A) Overview of Performance Evaluation Metrics
Several evaluation criteria were used to evaluate the ML algorithms in the dozens of papers that we reviewed in this study. Figure XIV shows the commonly used evaluation criteria and the frequency distribution of their use across nearly 40 case studies that adequately analyzed and reported on their model's performance. The figure shows that accuracy, precision, and recall are the most common metrics, followed by error-related metrics (i.e., MAE and RMSE) and the correlation coefficient. Other metrics are less common since they either complement the above metrics (e.g., MAPE and MSE) or are simply specific to the case study (e.g., sensitivity and specificity related to binary classification tasks).

B) Experimental Results Reported in the Surveyed Case Studies
Next, we briefly explain the most common metrics (i.e., accuracy, precision, recall, RMSE, and correlation coefficient) and use them to compare the experimental results reported in the studies we reviewed in this paper.

1) Accuracy
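Accuracy has a positive correlation with the performance of ML methods and a negative correlation with RMSE (and, in general, with error-related metrics). Per Equation (11), accuracy is the fraction of correctly classified samples among the total number of samples:

$$\mathrm{Accuracy} = \frac{\mathrm{True}_p + \mathrm{True}_n}{\mathrm{True}_p + \mathrm{True}_n + \mathrm{False}_p + \mathrm{False}_n} \qquad (11)$$

where $\mathrm{True}_p$ denotes the true positives, $\mathrm{True}_n$ the true negatives, $\mathrm{False}_p$ the false positives, and $\mathrm{False}_n$ the false negatives. Figure XV compares the accuracy values reported in the reviewed studies. The advantage of methods that combine several models could be justified by the hypothesis that the inference power increases when we combine multiple predictors or voters, which helps optimize the final performance (see, e.g., [125] and [31]).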

2) Recall
Recall (also known as sensitivity, particularly in binary classification) is a metric that measures the relevance of a model. Equation (12) shows the formal definition of the recall metric, defined as the fraction of relevant instances of a class that are correctly classified (i.e., the class-conditional accuracy):

$$\mathrm{Recall} = \frac{\mathrm{True}_p}{\mathrm{True}_p + \mathrm{False}_n} \qquad (12)$$

where $\mathrm{True}_p$ denotes the true positives and $\mathrm{False}_n$ the false negatives. Figure XVI compares the results in terms of recall values as reported in the reviewed studies. The horizontal axis reports the methods employed, while the vertical axis represents the associated recall values. Again, single methods (e.g., SMO, NRBNF, LR) more often provided lower recall values, as shown in Kwon et al. [120], whereas DTs and hybrid methods reached higher recall values based on the findings reported, for instance, by Elsaeidy et al. [33] and Kwon et al. [120].

3) Precision
Precision is a metric that measures the overall performance stability of a model. Equation (13) outlines the formal definition of the precision metric, defined as the share of classifier decisions for a certain class that are correct:

$$\mathrm{Precision} = \frac{\mathrm{True}_p}{\mathrm{True}_p + \mathrm{False}_p} \qquad (13)$$

where $\mathrm{True}_p$ denotes the true positives and $\mathrm{False}_p$ the false positives. Figure XVII compares the precision values reported in the reviewed studies. As with the other performance metrics, we notice that DTs and hybrid methods provided a higher precision value compared to other methods, based on the results reported in Elsaeidy et al. [33] and Kwon et al. [120]. Ensemble techniques (e.g., RF used in Alrashdi et al. [26]) also showed high precision values (i.e., > 95%) when applied to smart city applications.

4) RMSE
RMSE is an error-related metric that measures the difference between actual and predicted values. In general, increasing the difference between actual and predicted values reduces the accuracy and increases error metrics such as the RMSE. Equation (14) defines the RMSE formally as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2} \qquad (14)$$

where $N$ denotes the total number of samples, $x_i$ the actual samples, and $\hat{x}_i$ the predicted samples. Figure XVIII compares the RMSE values reported in the reviewed studies (e.g., [35]). Moreover, from Figure XVIII it is evident that the best models reaching the minimum RMSE are DL-based and hybrid techniques. For instance, the DLSTM and the hybrid WT+SAPSO+KELM compared in Mujeeb et al. [117], the hybrid GA-ANN, PSO-ANN, and ABC-ANN employed in Le et al. [94], and the PSO-XGboost proposed in Le et al. [81] yielded significantly lower RMSE values compared with the single-method baselines.
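The three classification metrics defined in Equations (11) to (13) can be computed directly from the confusion counts, as in the following minimal sketch. The labels and predictions are invented for illustration, and the function names are ours.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are correctly classified."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of actual positives that are correctly classified."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of positive decisions that are correct."""
    return tp / (tp + fp)


# Invented ground truth and predictions for a binary task.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
scores = {
    "accuracy": accuracy(tp, tn, fp, fn),
    "recall": recall(tp, fn),
    "precision": precision(tp, fp),
}
```

In this toy case the three metrics differ (0.625, 0.75, and 0.6, respectively), which illustrates why the surveyed studies often report more than one of them.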

5) Correlation Coefficient
The correlation coefficient measures the (linear) statistical relationship between actual and predicted values. In particular, a higher correlation between target and output values increases the overall accuracy and reduces the total error. Equation (15) shows the formulation for calculating the correlation coefficient:

$$r = \frac{\mathrm{cov}(x, \hat{x})}{\sigma_x \, \sigma_{\hat{x}}} \qquad (15)$$

where $x$ refers to the actual samples, $\hat{x}$ to the predicted samples, $\mathrm{cov}(x, \hat{x})$ to the covariance between $x$ and $\hat{x}$, and $\sigma$ to the standard deviation (calculated for both $x$ and $\hat{x}$). The correlation coefficient ranges between -1 and +1. A negative number indicates a negative correlation, whereas a positive number denotes a direct correlation between target and output values: the closer the coefficient is to 1, the higher the resulting correlation as well as the accuracy. Figure XIX presents the comparison of the correlation coefficient obtained by different ML methods in the reviewed studies. First of all, we can notice that the values of the correlation coefficient are always positive, indicating a direct correlation and thus the suitability of all proposed models for applications in smart cities. Again, Figure XIX demonstrates that single techniques provided a lower correlation coefficient than hybrid techniques, with the notable exception of DT-based ones that had comparable performance (e.g., the RF in Meenal et al. [35]). Specifically, the hybrid techniques PSO-XGboost proposed in Le et al. [81] and GA-ANN and PSO-ANN presented in Le et al. [94] confirm this claim.
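For regression tasks, the RMSE and the correlation coefficient can be computed as in the short sketch below. The sample values are invented, and the population (divide-by-N) form of the covariance and standard deviation is an assumption of this example; the choice does not affect the coefficient as long as it is applied consistently.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted samples."""
    n = len(actual)
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / n)


def correlation(actual, predicted):
    """Covariance of the two series divided by the product of their standard deviations."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((x - mean_x) * (p - mean_p) for x, p in zip(actual, predicted)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    return cov / (std_x * std_p)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated but biased predictions
error = rmse(actual, predicted)
r = correlation(actual, predicted)
```

In this example the correlation is (numerically) 1 although the RMSE is clearly non-zero, because the predictions are a scaled copy of the targets. The two metrics therefore capture different aspects of model quality and are best read together.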

V. Analysis and Discussion
In this section, we discuss ML methods used in smart cities from different perspectives. Based on our survey, we analyze how these methods compare to each other for efficiency (processing time), reliability (accuracy of results), and other performance aspects.

A) Efficiency (Processing Time) Analysis
Figure XX sketches the processing time score of each ML algorithm: the lower the score, the faster the algorithm. The x-axis of the chart lists the ML algorithms, while the y-axis represents the processing time score. These scores are normalized using min-max normalization by applying Equation (16):

$$X_N = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (16)$$

where $X$ denotes the raw processing time, $X_N$ the normalized processing time score, and $X_{min}$ and $X_{max}$ the parameters used for the min-max normalization, which depend on the specific dataset employed. This ensures scores between 0 and 1. For better interpretation, we further categorized this score into four zones: (i) Low if $0 \le X_N < 0.25$; (ii) Moderate if $0.25 \le X_N < 0.5$; (iii) High if $0.5 \le X_N < 0.75$; (iv) Very high if $0.75 \le X_N \le 1$. We noticed that the ANN is the fastest model, whereas DL and hybrid/ensemble methods are slower due to their complex computational architecture.
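The min-max normalization and the subsequent zoning of the scores can be sketched as follows. The raw processing times are invented purely for illustration and do not come from the surveyed studies. The sketch assumes four equal-width zones named Low, Moderate, High, and Very high.

```python
def min_max_normalize(values):
    """Map raw values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values]


def zone(score):
    """Assign a normalized score to one of four (assumed equal-width) zones."""
    if score < 0.25:
        return "Low"
    if score < 0.5:
        return "Moderate"
    if score < 0.75:
        return "High"
    return "Very high"


# Hypothetical raw processing times (arbitrary units), not measured data.
raw_times = {"ANN": 12.0, "SVM": 20.0, "DT": 28.0, "DL": 44.0}
normalized = dict(zip(raw_times, min_max_normalize(list(raw_times.values()))))
zones = {name: zone(score) for name, score in normalized.items()}
```

Because the minimum and maximum are taken from the values being compared, the resulting scores are relative: adding or removing a method can shift the zone of the others.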

B) Reliability Analysis
When it comes to reliability, Figure XXI compares the accuracy of the output of each ML algorithm used in the smart city studies that we reviewed. The x-axis lists the ML algorithms, and the y-axis indicates the reliability score, which is computed by normalizing the performance metric (i.e., accuracy, precision, recall, RMSE, or correlation coefficient) used in the relevant work. To make these metrics comparable, we normalized them using min-max normalization, as shown in Equation (17):

$$Y_N = \frac{Y - Y_{min}}{Y_{max} - Y_{min}} \qquad (17)$$

where $Y$ denotes the reported metric value, $Y_N$ the normalized reliability score, and $Y_{min}$ and $Y_{max}$ the parameters (depending on the specific metric reported) used for the min-max normalization. This ensures scores between 0 and 1. For better interpretation, we further categorized this score into four zones: (i) Low if $0 \le Y_N < 0.25$; (ii) Moderate if $0.25 \le Y_N < 0.5$; (iii) High if $0.5 \le Y_N < 0.75$; (iv) Very high if $0.75 \le Y_N \le 1$. Based on the reliability analysis, we conclude that the ANN was the least reliable, while DL and hybrid/ensemble methods were the most reliable. Within the single ML category, we noticed that the SVM showed better performance (High) than the DT (Moderate).

Table VII gives a comprehensive comparison of the single ML-based, hybrid, ensemble, and DL-based models. The table describes the complexity, user-friendliness, accuracy, and processing speed of models used in smart city applications using the following categories: Low, Reasonable, Reasonably high, and High. We can notice that hybrid models and ensembles are the best performers since they exhibit both high accuracy and not-costly complexity. On the other hand, although the DL techniques had higher accuracy than the hybrid models and ensembles, they demanded relatively higher computation power. Moreover, all these advanced ML methods had a slower processing speed than the single methods. Likewise, the SVM and DT generally outperformed the ANN for accuracy and other metrics. However, since the difference is negligible, we can conclude that using any one of them is appropriate (cf. Sec. 5).
The summary of Table VII suggests that the advanced ML methods are the best candidates to use in smart cities based on accuracy and efficiency. Nevertheless, it is not uncommon to use the ANN and SVM, as they have a simpler design, are faster, and offer acceptable accuracy. Table VIII highlights the pros and cons of each method and recaps the discussion presented herein. Based on this report, we may claim that advanced ML models have superior performance to the single ML techniques and that, despite their higher complexity, they can still be used successfully in specific applications.

C) Pros and Cons
Table VIII (excerpt): Pros: high accuracy, reliability, and user-friendliness. Cons: high complexity and moderate-to-low processing speed.

D) Application Share
Figure XXII depicts the relative share of each ML method under different smart city applications such as vehicles and transportation, mobile communications, building, energy, health care, data management, public safety, management of (IoT) sensors, pollution monitoring and reduction, etc. As illustrated in Figure XXII, the ANN, DT, and SVM were predominantly used in smart transportation, mobile communication, IoT sensors, smart energy, smart education, smart building, and air pollution monitoring. On the other hand, the advanced ML models are commonly used in more complex applications, for instance, those that involve big data. Consequently, the hybrid, ensemble, and DL methods are more popular in smart health and transportation system applications and in managing open big data and resources in smart environments.

VI. Open Issues and Challenges
ML-based techniques have emerged as a new paradigm in smart city applications. They are regarded as a vital element of smart cities, but existing studies have not considered them sufficiently and comprehensively. This part of the study discusses some open issues and challenges that can be targeted by future studies. For example, smart-city datasets are big and are used by time-sensitive applications that demand real-time or semi-real-time analytics. This highlights the need for a new analytic platform that supports big data analytics together with fast/streaming data analytics. Furthermore, when applying ML-based methods to smart city applications, the system's validity is closely related to the accuracy and precision of the data. On the other hand, data availability is a major challenge from the point of view of copyright issues and ethics. Moreover, due to the nature of the data required for smart city applications, many performance domains can easily be rendered inaccessible if the results are not confirmed by simulations with large volumes of data. Therefore, the success of ML-based techniques in smart city applications depends on overcoming these challenges. Finally, given the real-time nature of smart city applications, an ML-based technique that provides high accuracy together with a high operating speed and a lightweight platform can improve system reliability, stability, sustainability, and availability.

A) Advanced ML and DL techniques
The number of applications in smart cities and their complexity will keep increasing due to the increase in the human population, the advent of new technologies every day, and the complexity of orchestrating all these systems together. This situation will continually generate big data that require more computational power and smarter algorithms. Handling this massive amount of data will remain a constant challenge for scientists to tackle by introducing more efficient and reliable ML algorithms that can be practically used in smart cities. The current advancements in ML and DL-based technologies rendered the concept of Smart Cities a reality. Nevertheless, more improvements will remain in demand if we aim to have smarter cities in all fields such as healthcare, security, transportation, traffic congestion, parking, pollution, etc.

B) IoT in Smart city applications
The presence of IoT in smart city applications can be a game-changer. Many open issues related to security, healthcare, safety, transportation, waste management, etc., can benefit from IoT sensors. Combining those sensors and the data they collect with ML algorithms can foster the development of smart cities and make them more efficient and sustainable. This alliance between the rich datasets collected by IoT sensors and more powerful ML algorithms can provide practical solutions for the serious challenges that smart cities are encountering today.

C) ML for security and privacy
Cities all over the world are evolving into smart cities. To do so, they need to collect and analyze huge volumes of data for applications such as automating processes, enhancing service quality, improving marketing services for users, and making better decisions. One of the main goals in creating smart cities is to increase the quality of life through digital interconnectivity, leading to increased efficiency and accessibility in cities. This pushes smart cities to strengthen privacy and security in order to secure the participation of citizens, because security and privacy underpin both the satisfaction of citizens and the stability of society. Therefore, one of the most important challenges of a smart city is ensuring security and privacy. Security and privacy challenges in the smart city fall into several subsections, which are described as follows:

1) Cyber risks
Smart cities offer several advantages and benefits. IoT-based technologies in smart city applications can successfully enhance critical infrastructures. However, arrangements are required to prevent cyber risks to smart cities, such as threats that endanger the safety of citizens and the continuation of operations and services. These arrangements also have to protect personal privacy in the face of reliance on rapid data sharing and data mining techniques. A smart city is integrated with a database to store data securely. Meanwhile, employing ML-based techniques can prevent cyber-attacks and strengthen the security infrastructure. In fact, ML techniques are well suited to recognizing patterns, estimating behaviors, organizing huge volumes of files, identifying potentially dangerous ones, and blocking perceived threats.
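As a minimal sketch of how a learned model can recognize and block a perceived threat, the following trains a one-level decision tree (a decision stump) on labeled network events. The two features (failed logins per minute, distinct ports contacted), the toy data, and the function names are hypothetical illustrations and are not taken from any of the reviewed studies.

```python
def train_stump(rows, labels):
    """Find the (feature, threshold, polarity) that misclassifies the fewest rows.

    polarity = 1 predicts "threat" when the feature is >= threshold,
    polarity = -1 predicts "threat" when the feature is <= threshold.
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for polarity in (1, -1):
                preds = [polarity * r[f] >= polarity * t for r in rows]
                errors = sum(p != bool(y) for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]


def predict(stump, row):
    """Return True if the event is classified as a threat."""
    f, t, polarity = stump
    return polarity * row[f] >= polarity * t


# Hypothetical events: (failed logins per minute, distinct ports contacted)
events = [(3, 0), (5, 1), (4, 0), (40, 12), (55, 9), (38, 15)]
labels = [0, 0, 0, 1, 1, 1]  # 1 = known attack, 0 = benign (assumed labels)

stump = train_stump(events, labels)
blocked = predict(stump, (45, 10))   # event resembling the attack examples
allowed = predict(stump, (4, 1))     # event resembling the benign examples
```

A production system would use far richer features and one of the stronger model families discussed in this review; the stump only illustrates the pattern-recognition step.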
2) Public safety
Public safety is a growing challenge for smart cities. Urban population growth is accompanied by new digital technologies that enhance the efficiency of different smart city applications. 5G technology, AI, and IoT form the basis of smart cities, and progress in each of them increases the need for public safety. In turn, improving public safety in smart cities increases trust and confidence in the system. Progress in AI and ML-based techniques for smart city applications can enhance public safety by concentrating insights from IoT networks so that they can be monitored, analyzed, and acted upon in real time. Data-based systems are one of the fields in which ML-based techniques have been successfully tested and employed to increase a system's abilities. Public safety in smart city applications can be considered one such data-based system, which can be integrated with ML-based techniques to enhance its efficiency. Several applications of ML-based techniques, such as image processing, speech recognition, and efficient monitoring algorithms, can serve as building blocks for deploying ML-based techniques in public safety. ML-based techniques are considered a collection of intelligent technologies that provide considerable benefits to the criminal justice, law enforcement, corrections, courts, homeland security, and public safety domains, such as fire and emergency management services.

3) Monitoring and sensor-based technologies
Monitoring in smart city applications is a significant open issue and a challenge in its own right. Monitoring needs to be equipped with powerful information technology that enables the ML-based transformation of big data into a wide range of custom services for monitoring and controlling complex urban processes in real time. As a practical system in real-time applications, monitoring provides a holistic view of, and transparency into, the complex processes in the urban area. Accordingly, stakeholders can enhance the efficiency and quality of local and regional management and governance, and the resulting quality of life and community transparency will promote new business models. Monitoring can be employed for traffic, public transportation, and natural hazards, for example in flash flood and air pollution monitoring systems. The shift of smart city applications toward monitoring systems creates a considerable growth opportunity for sensor makers. This growth supports technologies such as 5G, robots, AI, and edge computing for smart city applications. Electronic, infrared, thermal, and proximity sensors are among the sensor technologies used in smart city applications. Clearly, the future of smart cities is intertwined with new technologies in the sensor industry, and tremendous progress can be expected in this area.

Table IX summarizes the studies and ML-based techniques that address the security and privacy challenges in smart city applications and identifies which challenges and ML-based techniques require further high-level studies and experimental work.
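To illustrate the kind of real-time monitoring described in this subsection, the sketch below flags sensor readings that deviate strongly from a short rolling window of recent values. It is a simple statistical baseline rather than a method from the reviewed studies; the window size, the threshold, and the sample PM2.5 readings are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def monitor(readings, window=5, threshold=3.0):
    """Return (index, value) pairs for readings far from the recent window.

    A reading is flagged when its z-score against the last `window`
    accepted readings exceeds `threshold`.
    """
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.append((i, value))
                continue  # keep the outlier out of the baseline window
        recent.append(value)
    return alerts


# Hypothetical air-quality stream (PM2.5, micrograms per cubic meter)
pm25 = [12.0, 13.0, 12.5, 13.5, 12.0, 12.8, 95.0, 13.1, 12.4]
alerts = monitor(pm25)
```

Excluding flagged readings from the window keeps a single spike from inflating the baseline; whether that is desirable depends on whether spikes in the monitored process tend to be isolated or sustained.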

VII. CONCLUSION
In this work, we present a comprehensive, systematic review of machine learning algorithms in smart city applications. We conclude that the ML algorithms fall into one of the following four categories: decision trees, support vector machines, artificial neural networks, and advanced machine learning methods (i.e., hybrid methods, ensembles, and deep learning techniques). We give a theoretical description of each ML algorithm and demonstrate how it was used across many applications in the smart city context. Furthermore, we evaluate all reviewed ML algorithms with respect to efficiency (computational speed), reliability (accuracy of the output), and the pros and cons of each. Among the many important observations from our analysis, we found that hybrid methods, ensembles, and deep learning techniques can outperform single methods at the cost of higher complexity and processing time. With this analysis and these comparisons, we hope to guide researchers, practitioners, and policymakers in selecting the appropriate ML tool for the right problem. Many challenges and issues remain open for smart cities. We believe that coupling IoT with more powerful and reliable ML algorithms that can process the massive amounts of data collected from sensors will be the trend in the coming years. This might result in solutions for the persistent problems typically associated with cities, such as traffic, healthcare, pollution, and education.