1 Introduction

Cities are constantly evolving, and their transformation into smarter and more efficient environments is a pressing need driven by population growth, resource management, technological advancements, citizen expectations and economic competitiveness. The innovative use of data and AI-driven techniques is central to accomplishing this transformation. These technologies have the potential to revolutionise many aspects of our lives, from infrastructure maintenance to service delivery and citizen engagement Angelidou et al. [1] and Kaluarachchi [2].

In particular, within city-wide infrastructure maintenance, roadside infrastructure is a critical area with the power to transform our lives Forkan et al. [3] and Karimi and Faghri [4]. Roadside infrastructure plays a vital role in the development and growth of cities, and enormous amounts of money are spent each year on inspecting, monitoring and maintaining these essential components of urban infrastructure. For instance, the estimated cost of maintaining and renewing local road infrastructure across Australia between 2010 and 2024 is expected to reach a staggering A$45 billion The Conversation [5]. Inadequate maintenance of roadside infrastructure poses significant challenges, including environmental externalities, safety risks and citizen frustration Karimi and Faghri [4]. Thus, it is imperative for governments at the local, state and federal levels to prioritise the monitoring and timely maintenance of these critical assets [6, 7]. However, current approaches to roadside infrastructure maintenance rely largely on traditional methods that are reactive, time-consuming and costly. Infrequent road condition audits and manual processing of community-initiated service requests often lead to delays, inaccurate information and operational inefficiencies Salcedo et al. [8] and Forkan et al. [3]. Moreover, these approaches are limited in areas with lower levels of citizen participation, visibility and reporting, such as remote and rural regions.

To bridge this gap and harness the potential of emerging technologies, the smart city paradigm has gained widespread attention. Smart cities leverage cutting-edge information and communications technologies to address the complex challenges faced by modern cities Ahad et al. [9] and Atitallah et al. [10]. Technologies such as the Internet of Things (IoT), edge and cloud computing Atitallah et al. [10], big data Rathore et al. [11], artificial intelligence (AI) Ashwini et al. [12], geospatial technologies and high-speed wireless networks such as 5G have been identified as key enablers of smart cities Ahad et al. [9]. These technologies are expected to open avenues for automating city services, integrating diverse entities within cities and enhancing citizens' overall quality of life.

In this paper, we present AIoT-CitySense, an adaptable framework driven by AI and IoT for city-scale sensing. Designed collaboratively with a local government in Australia, it leverages a range of emerging technologies for mobile city-scale sensing in support of roadside infrastructure maintenance. AIoT-CitySense stands out because it collects city-scale data and applies AI models to analyse these data in real time, advancing the identification and prioritisation of maintenance needs for roadside infrastructure. A key advantage of AIoT-CitySense is that it leverages the natural mobility of existing mobile assets, such as waste collection service trucks, public transport vehicles, police cars and street cleaning vehicles. These assets cover a large geographical area of the city every day, enabling city-wide data collection that fulfils the sensing needs of road infrastructure maintenance. By equipping these assets with suitable sensor modalities that integrate seamlessly into their operations, AIoT-CitySense converts each vehicle into a sensing node capable of efficiently collecting and storing city-scale data.

Moreover, we present Mobile IoT-Roadbot, an AI and IoT-driven solution developed by adapting AIoT-CitySense to the specific requirements of a municipal government, with a focus on managing roadside infrastructure, such as detecting damaged road signs and identifying rubbish dumped on the streets. Specifically, the solution equips the municipality's existing waste collection service trucks with smart IoT devices such as stereo-vision cameras and global navigation satellite system (GNSS) sensors, along with edge computers and 5G connectivity, and pairs them with a cloud computing environment. The trucks capture roadside data seamlessly and in real time, and transmit them to our AI models hosted on a cloud server for real-time analysis and processing. The solution was successfully deployed in a municipality in the western suburbs of Melbourne, Australia, on 11 waste collection service trucks covering over 1000 km of roads each week. By tailoring AIoT-CitySense to a real-world operational environment, we demonstrate its effectiveness and practicality. This live deployment demonstrates the potential of our solution to enhance roadside infrastructure management and validates its ability to address real-life challenges in a meaningful way.

In contrast to previous studies that primarily centred on laboratory experiments [13], small-scale pilot tests using smartphones for data collection [14] or publicly available data [15], this paper offers city-scale sensing and a real-world pilot test. This paper lays a strong foundation for further exploration and implementation of AI-driven approaches in the advancement of smart cities, paving the way for urban environments that are more efficient, sustainable and conducive to a high quality of life.

Specifically, this paper makes the following key contributions:

  • AI and IoT-driven adaptable framework for city-scale sensing: We propose an adaptable framework, AIoT-CitySense, that leverages AI and IoT-driven emerging techniques to enable city-scale sensing. It is designed to be flexible and scalable, allowing for the seamless integration of emerging technologies. By harnessing the power of AI and IoT, it can empower cities to gather and analyse large volumes of data, transforming traditional infrastructure maintenance practices into proactive and efficient processes.

  • Tailored solution for roadside infrastructure maintenance: We demonstrate Mobile IoT-Roadbot, a solution tailored from AIoT-CitySense to meet the distinct needs of a municipal government in relation to roadside infrastructure maintenance. To the best of our knowledge, Mobile IoT-Roadbot is the first research solution that combines emerging technologies such as IoT, 5G, cloud and AI to automate roadside infrastructure management using mobile assets in real time in a real-world setting.

  • Development of custom deep learning (DL) models for roadside infrastructure maintenance: We demonstrate the practical application of DL models that achieve an average accuracy of 88% in identifying, in real time, roadside infrastructure in need of maintenance. Using a cloud computing setting, we demonstrate the effectiveness of these DL models in proactively identifying maintenance requirements.

  • Real-world pilot test and lessons learned: We demonstrate the cost-effectiveness and efficiency of the solution adapted from AIoT-CitySense. Through a pilot deployment, we validate the feasibility and effectiveness of the solution in a real-world setting. We address specific roadside infrastructure monitoring use cases, demonstrating the practical application of our solution. This pilot deployment provides insights into the impact of our solution on creating sustainable cities, demonstrating the benefits of proactive maintenance and its contribution to more liveable urban environments.

The rest of this paper is organised as follows: Sect. 2 reviews existing practices in roadside infrastructure maintenance. Section 3 details the architecture of AIoT-CitySense. Section 4 presents a solution, outlining how AIoT-CitySense is adapted to the specific needs of identifying maintenance issues related to damaged road signs and dumped rubbish. In Sect. 5, we discuss the outcomes of a pilot study conducted to evaluate the effectiveness of the solution and highlight the key lessons learned. Finally, Sect. 6 concludes this paper.

2 Related Work

Several studies have explored AI and IoT-driven approaches and frameworks for smarter cities. Atitallah et al. [10] surveyed the literature on using IoT and DL to develop smart cities, including computing infrastructure such as edge and cloud computing and big data analytics. The generalised IoT architecture proposed in this survey consists of hardware, connectivity and communication middleware, big data storage and analytics, and IoT applications. Our AIoT-CitySense framework is adapted from this generalised architecture and modified to meet the unique requirements of using mobile assets for roadside infrastructure maintenance. Ashwini et al. [12] surveyed AI-based solutions for smart city implementation and found that DL models are the most popular choice for developing such applications. The survey also highlighted that urban planning, the domain of the case study presented in this paper, is less explored in the smart city literature. Dias et al. [16] emphasised the importance of AI and IoT for smart city applications such as detecting road maintenance issues like potholes, reducing the risk of traffic collisions and reducing energy consumption. Rathore et al. [11] proposed various smart systems that obtain real-time city data for decision-making, using a big data analytics approach whose effectiveness they demonstrated with lab-based experiments. Our solution differs from such work by offering city-scale sensing in real-world settings.

The literature on city-scale sensing for roadside infrastructure monitoring has sought to leverage emerging technologies and advanced analytics to improve the efficiency and effectiveness of infrastructure management in smart cities. For example, Wu et al. [17] proposed an AI-based system for city-scale pothole detection using data from vehicle-mounted IoT sensors; the system used machine learning algorithms to analyse sensor data and identify potholes in real time. Lim et al. [18] and Li et al. [19] proposed AI models that detect traffic signs in various scenarios and conditions. While traffic sign detection is a crucial component of roadside infrastructure monitoring, our focus is on the specific challenges of detecting damaged road signs and identifying dumped hard rubbish. Damaged road sign detection is more complex because it requires identifying signs that exhibit various forms of damage or degradation, such as graffiti, fading or physical deterioration. Similarly, identifying and analysing hard rubbish dumped on the streets involves complex visual recognition and data-driven classification techniques. Our solution is tailored from AIoT-CitySense to tackle these challenges and provide effective detection and analysis in these specific domains.

Various techniques have been used to capture roadside data, each with different costs and capabilities: radar [20], laser [21, 22] and visual systems [13, 23]. Radar is commonly used to detect objects and measure distances, providing information on traffic volumes and speeds [20]. It works in various weather conditions, detecting objects at long range through snow, rain and fog. However, some limitations hinder its use for city-scale sensing. One is its relatively high cost compared to other techniques; another is that radar may not provide detailed information about the shape or characteristics of detected objects. Light Detection and Ranging (LiDAR) uses laser pulses to generate a detailed 3D representation of the environment, enabling measurements of the roadway and its surrounding features [13]. It can produce high-resolution 3D images and measure distances, making it effective for detecting small objects. However, LiDAR's range is restricted, so it can only capture data within a limited distance from the sensor. Further, the equipment required for LiDAR-based data collection and processing can be expensive, making it less accessible for widespread deployment and large-scale applications such as city-scale sensing. Computer vision-based systems use cameras and software to analyse roadside images and videos and extract useful information about roadside infrastructure and signage [3, 23]. They offer high-resolution imaging, can detect and classify issues using AI and cost relatively little compared to LiDAR and radar, which makes them cost-effective for analysing large volumes of image and video data. The proposed solution leverages stereo-vision cameras and GNSS sensors: the cameras capture images that can be used for detailed analysis, while the GNSS sensors provide precise location data. This combination offers several advantages, including cost-effectiveness, accurate positioning and the ability to capture both visual and spatial information.

Machine learning (including deep learning) models have gained popularity as AI-driven approaches for monitoring the condition of roadside infrastructure assets. To support these models, crowdsourced data collection has become widespread as a way to address the scarcity of real-world data. In [24] and [25], accelerometers and GPS devices were installed in hundreds of participant-owned vehicles to monitor vibrations and locations, and the data collected from these crowdsourced sensors were analysed to identify road anomalies such as potholes and road roughness. Although beneficial for data collection, crowdsourcing may not provide the labelled data essential for training AI models. Moreover, crowdsourcing is susceptible to challenges such as participant dependency, restrictions on sensor types and concerns about the reliability of data sources. A study [26] used machine learning techniques to identify traffic signs on rural highways using a 3D point cloud dataset generated by LiDAR. However, that work focused solely on recognising traffic signs and did not address the automated flow of data collection and maintenance issue identification in a smart city context. In [13], machine learning models were used to identify road maintenance issues related to road surfaces, road markings and traffic signs. These models were trained on proprietary databases, but the study did not conduct real-world experiments or evaluate the efficiency and feasibility of the approach in a smart city setting, where efficient and timely responses across a large area are crucial.

To address the limitations of crowdsourcing and data scarcity in building effective machine learning models, several works have focused on capturing city street data. For example, the City of Boston in the USA developed a smartphone application that uses the device's sensors to identify and report potential road issues as citizens drive through the city [27]. However, the success of this approach relies heavily on active citizen engagement, and it may not scale to provide comprehensive coverage of the city's streets. Another City of Boston project deployed fixed cameras at a specific intersection, where machine learning models analysed the captured data to characterise traffic flow and count pedestrians, bicycles and vehicles [28]. While these data can be valuable for local traffic planning and improvement, the approach is difficult to scale up to cover a significant area of the city's streets. In Spain, the SmartSantander testbed was implemented as a functional smart city environment, deploying use cases such as waste management and incident management, which share similarities with the roadside maintenance use case addressed in this paper [29]. However, like the City of Boston smartphone application, these implementations relied on citizen reporting, which introduces limitations and a dependency on active citizen participation.

Our proposed framework, AIoT-CitySense, and its tailored solution differ from previous studies by providing an adaptable city-scale sensing approach for roadside infrastructure monitoring. While most studies focused on pothole detection or traffic sign recognition, our approach tackles the challenges of identifying damaged road signs and dumped hard rubbish. It is adaptable, and it reduces data scarcity by collecting city-scale data with stereo-vision cameras and GNSS sensors. Our solution is cost-effective, leveraging affordable technologies while integrating visual and spatial information for more accurate analysis. Thus, we present a more holistic and adaptable approach to city-scale sensing in smart cities that addresses these limitations and provides a broader range of capabilities.

3 AIoT-CitySense Framework

This section provides an overview of the AIoT-CitySense framework, depicted in Fig. 1. AIoT-CitySense is a framework for city-wide data collection that uses IoT devices mounted on existing mobile assets, such as service vehicles, and AI for efficient processing. It comprises five layers that leverage a range of technologies. AIoT-CitySense stands out for its cost-effectiveness, adaptability and scalability, making it a transformative framework for city-scale sensing in smart cities.

AIoT-CitySense eliminates the need for extensive deployment of dedicated roadside infrastructure sensors, resulting in significant cost savings. It also minimises disruptions to the vehicles’ usual operations, making it a practical and scalable solution for city-scale sensing.

The adaptability of AIoT-CitySense stems from its layer-based design and flexible architecture. By separating the system into distinct layers, each with its specific functionality, AIoT-CitySense allows for customisation and integration of additional components. This approach enables organisations to tailor AIoT-CitySense to their specific requirements and seamlessly incorporate it into their existing infrastructure. Whether the change involves integrating new sensors, expanding the functionality of the AI models or adapting to evolving data sources, AIoT-CitySense is designed to accommodate it without disrupting the overall system.

Scalability is another key strength of AIoT-CitySense. As cities grow and their infrastructures become more complex, the demand for effective city-scale sensing solutions increases. AIoT-CitySense is designed to meet this challenge by leveraging both cloud-based and edge-based computing environments. This allows for the efficient processing and analysis of large volumes of data, ensuring real-time insights and timely decision-making. Moreover, the scalability of AIoT-CitySense enables it to accommodate expanding data sources, such as additional sensors or sources of public data, without compromising its performance or reliability.

The cost-effectiveness, adaptability and scalability of AIoT-CitySense make it an ideal choice for smart city applications beyond roadside infrastructure maintenance. Its layer-based and flexible architecture empowers cities to address a wide range of challenges and leverage the benefits of AI-driven city-scale sensing. By providing a versatile framework that can be easily customised and scaled, AIoT-CitySense paves the way for more efficient resource allocation, improved decision-making and the creation of smarter and more sustainable urban environments. The five layers of AIoT-CitySense are described as follows.

Fig. 1

The adaptable AIoT-CitySense framework for city-scale sensing

3.1 Mobile IoT Layer

The Mobile IoT layer produces city-scale data. It consists of multiple service vehicles (SVs) that collect roadside infrastructure data; for example, a waste collection service truck collects data during its waste collection rounds. Each SV can be equipped with a number of IoT devices, for instance a camera, a global navigation satellite system (GNSS) receiver and an onboard edge computer. These devices are wired into the SV's electrical system and power on when the SV is turned on. The camera produces roadside images or videos, and the GNSS receiver provides the corresponding location information. The edge computer receives data from the connected IoT devices and can run customised software to prepare the data in the required format (e.g. compressed video segments). The formatted data are then sent to the data ingestion layer over whichever communication medium is available (e.g. 5G, Wi-Fi or a local network). For example, if the data ingestion layer is hosted on a cloud server and a 5G router and 5G dome antenna are connected to the edge computer, the data can be transmitted to the ingestion layer via 5G. If, however, the data ingestion layer runs as a separate process on the edge computer itself, the data can be passed to it directly through the edge computer's memory and disk storage.
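The routing decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed edge software: the `Segment` type, the `EdgeDispatcher` class and the policy of buffering segments on the device while no link is available are all hypothetical, and in-memory lists stand in for the real uplink and the local ingestion process.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class Segment:
    """A formatted video segment plus the metadata the ingestion layer needs (hypothetical type)."""
    vehicle_id: str
    start_time: float                      # seconds; the clock source is an assumption
    frames: List[bytes]                    # compressed frames, treated as opaque bytes
    gnss: List[Tuple[float, float, float]] = field(default_factory=list)  # (time, lat, lon)


class EdgeDispatcher:
    """Routes segments to a local or remote ingestion layer (hypothetical policy)."""

    def __init__(self, ingestion_on_edge: bool) -> None:
        self.ingestion_on_edge = ingestion_on_edge
        self.backlog: Deque[Segment] = deque()  # held on the device while offline
        self.uplink: List[Segment] = []         # stands in for a 5G/Wi-Fi transmission
        self.local: List[Segment] = []          # stands in for an on-device ingestion process

    def dispatch(self, seg: Segment, link_up: bool) -> str:
        if self.ingestion_on_edge:
            # Ingestion runs on the same computer: hand over via memory/disk.
            self.local.append(seg)
            return "local"
        if not link_up:
            # No connectivity: keep the segment until a link is available.
            self.backlog.append(seg)
            return "buffered"
        # Link available: send older buffered segments first to preserve order.
        while self.backlog:
            self.uplink.append(self.backlog.popleft())
        self.uplink.append(seg)
        return "network"
```

For example, a segment produced while a vehicle is out of coverage is buffered and then sent ahead of the next segment once the link returns, so the ingestion layer still receives segments in capture order.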

3.2 Data Ingestion Layer

The data ingestion layer receives the data streamed from the Mobile IoT layer and processes them in several steps. A streaming data receiver receives video segments, including metadata (e.g. GNSS location and time), from the SVs. The data are then queued so that every segment is processed. A data filtering process removes irrelevant data based on GNSS location, SV speed and time. Next, a data transformation process extracts and selects images from each video segment, eliminating repeated or duplicated information. In the final step, the images are integrated with their location and time. Depending on the organisation's requirements, the processes in the data ingestion layer can be hosted in an edge, fog or cloud environment.
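One way to picture these ingestion steps is as a queue feeding a chain of stages, each of which either passes a record on or drops it. The sketch below assumes that records are plain dictionaries and that repeated frames are exactly identical byte strings; a real deployment would more likely compare frames by visual similarity. The function names, field names and the 60 km/h threshold in the usage example are hypothetical.

```python
from queue import Queue
from typing import Callable, Iterable, List, Optional

# A stage takes a record and returns it (possibly transformed) or None to drop it.
Stage = Callable[[dict], Optional[dict]]


def run_ingestion(segments: Iterable[dict], stages: List[Stage]) -> List[dict]:
    q: "Queue[dict]" = Queue()
    for seg in segments:          # the receiver queues every incoming segment
        q.put(seg)
    out: List[dict] = []
    while not q.empty():
        rec: Optional[dict] = q.get()
        for stage in stages:
            rec = stage(rec)
            if rec is None:       # filtered out: skip the remaining stages
                break
        if rec is not None:
            out.append(rec)
    return out


def speed_filter(max_kmh: float) -> Stage:
    """Filtering: drop segments recorded above a speed threshold."""
    def stage(rec: dict) -> Optional[dict]:
        return None if rec["speed_kmh"] > max_kmh else rec
    return stage


def drop_repeated_frames(rec: dict) -> dict:
    """Transformation: keep a frame only if it differs from the previous one."""
    frames = rec["frames"]
    kept = [f for i, f in enumerate(frames) if i == 0 or f != frames[i - 1]]
    return {**rec, "frames": kept}


def integrate(rec: dict) -> dict:
    """Integration: attach location and time to every remaining image."""
    images = [{"image": f, "lat": rec["lat"], "lon": rec["lon"], "time": rec["time"]}
              for f in rec["frames"]]
    return {**rec, "images": images}
```

A chain such as `run_ingestion(segments, [speed_filter(60.0), drop_repeated_frames, integrate])` keeps each step independent of where it runs, which mirrors the option of hosting this layer on an edge, fog or cloud environment.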

3.3 Data Storage Layer

This layer stores the necessary incoming data from the other layers. For example, all videos, images and integrated data processed by the data ingestion layer are stored here, as are the analytics outcomes produced by the analytics layer and the decision outcomes generated by the decision support layer. The databases required to support organisation-specific data are also hosted in this layer. The storage layer can be located in the cloud or in organisation-managed local storage.

3.4 Analytics Layer

The analytics layer automatically processes the integrated city-scale data stored in the data storage layer to identify points of maintenance (PoMs), i.e. roadside infrastructure requiring maintenance. It is composed of multiple AI models (off-the-shelf or custom-developed) and performs two major tasks. First, it identifies the target objects of interest using multiple AI models, which acts as a first level of image filtering. For example, to detect damaged road signs, an off-the-shelf DL model first selects the images that contain road signs. Similarly, other DL models detect other objects of interest (e.g. selecting images with bus shelters to detect vandalised bus shelters, or images with lane markings to detect faded lane markings). AI models can also be used to discard images containing objects of no interest (e.g. excluding an image from dumped rubbish detection if it contains a rubbish bin) or for privacy filtering (e.g. removing images containing people). Second, custom-developed AI models detect the PoMs themselves. Because PoMs for road infrastructure maintenance vary widely, these models are custom-developed by training on a large number of annotated samples of the target PoMs, together with performance improvement techniques such as image augmentation and incremental learning. These models can make image-level or object-level predictions: at the object level, they draw bounding boxes around the detected PoMs in the image; at the image level, they predict whether the image contains a PoM, along with a confidence score. This makes it easier for humans to inspect and verify the results manually.
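The two tasks can be sketched as a gate followed by a detector. In this minimal illustration the detectors are passed in as plain functions, so nothing here is tied to a particular AI service; the `Prediction` type, the label strings and the default confidence threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Prediction:
    label: str
    confidence: float  # 0-100


Detector = Callable[[str], List[Prediction]]


def detect_poms(images: Dict[str, str], object_model: Detector, pom_model: Detector,
                target: Optional[str], exclude: Set[str],
                min_conf: float = 50.0) -> List[Tuple[str, Prediction]]:
    """Return (image_id, best PoM prediction) for images that pass both tasks."""
    results = []
    for image_id, image in images.items():
        # Task 1: first-level filtering with an off-the-shelf object model.
        seen = {p.label for p in object_model(image) if p.confidence >= min_conf}
        if target is not None and target not in seen:
            continue  # no object of interest (e.g. no road sign)
        if seen & exclude:
            continue  # object of no interest or privacy hit (e.g. a bin or a person)
        # Task 2: a custom model decides whether a PoM is present.
        hits = [p for p in pom_model(image) if p.confidence >= min_conf]
        if hits:
            results.append((image_id, max(hits, key=lambda p: p.confidence)))
    return results
```

Passing `target=None` covers image-level use cases such as dumped rubbish, where there is no single object class to gate on and only the exclusion set applies.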

3.5 Decision Support Layer

This layer provides an interactive tool with maps to visualise PoMs along with location and detection time. This dashboard can be designed to communicate interactively with maintenance crews about identified PoMs and allows the crew to verify them. The interactive tool can consist of several filtering options to view city-scale data such as filter by time, geographical region and use case. The tool allows maintenance crews to accept (as correct PoMs) or ignore (as incorrect PoMs) reported issues. These decisions can be exported into the organisations’ issue-tracking software, allowing for the maintenance crew to take immediate action. This also helps to collect ongoing data to improve the performance of AI models in the analytics layer.
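The accept/ignore loop can be sketched with a small record type. This is an illustrative model only: the `PoM` fields, the ISO-8601 date strings and the split into an export list and a retraining-label list are assumptions, not the pilot's actual schema or issue-tracker integration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PoM:
    pom_id: int
    use_case: str
    lat: float
    lon: float
    detected_at: str               # ISO-8601 date, so string order equals date order
    verdict: Optional[str] = None  # "accept", "ignore" or None while pending


def filter_poms(poms: List[PoM], use_case: Optional[str] = None,
                since: Optional[str] = None) -> List[PoM]:
    """Dashboard-style filtering by use case and detection time."""
    return [p for p in poms
            if (use_case is None or p.use_case == use_case)
            and (since is None or p.detected_at >= since)]


def apply_reviews(poms: List[PoM],
                  verdicts: Dict[int, str]) -> Tuple[List[PoM], List[Tuple[int, bool]]]:
    """Record crew decisions; return accepted PoMs to export and labels for retraining."""
    for p in poms:
        if p.pom_id in verdicts:
            if verdicts[p.pom_id] not in ("accept", "ignore"):
                raise ValueError(f"bad verdict for PoM {p.pom_id}")
            p.verdict = verdicts[p.pom_id]
    to_export = [p for p in poms if p.verdict == "accept"]
    # Each reviewed PoM becomes an (id, is_true_positive) label for the analytics layer.
    feedback = [(p.pom_id, p.verdict == "accept") for p in poms if p.verdict is not None]
    return to_export, feedback
```

Keeping ignored PoMs as negative labels, rather than discarding them, is what lets the crew's decisions feed back into model improvement.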

4 AIoT-CitySense's Adaptation: Mobile IoT-Roadbot

This section presents our solution, Mobile IoT-Roadbot, which has been tailored from AIoT-CitySense for a real-world pilot to detect PoMs of roadside infrastructure. The solution was successfully deployed on 11 waste collection service trucks (WCSTs) in a city council covering an area of 123 km² in the western suburbs of Melbourne, Australia. The deployment of Mobile IoT-Roadbot is depicted in Fig. 2; it uses the Amazon Web Services (AWS) cloud environment for hosting and data processing. For seamless data transmission, it uses commercial 5G routers on 5G connectivity deployed by Optus Enterprise, a major Australian telecom provider. Note that, although this particular deployment of AIoT-CitySense is based on AWS-specific services, the framework is independent of any specific cloud or network provider.

Fig. 2

AWS-specific implementation and deployment adapted from AIoT-CitySense

The customisation details of each layer with AWS-specific implementation of AIoT-CitySense are described as follows.

4.1 Mobile IoT Layer

This layer consists of multiple WCSTs that collect city-scale data during waste collection rounds using our customised IoT hardware and transmit the data to the AWS cloud via 5G. The IoT hardware of each WCST consists of a stereo-vision camera and a 5G dome antenna mounted on the front bull bar (Fig. 3a, b), and an onboard edge computer, a GNSS receiver and a 5G router inside the cabin (Fig. 3c). These devices are connected to the truck via internal cabling and power on when the truck is turned on (Fig. 3d). AWS IoT Greengrass is used to manage these IoT and edge devices remotely from the cloud. The stereo-vision camera captures video at 32 images (or frames) per second. The video frames are compressed and partitioned into consecutive segments of ~3 s, each containing ~50 frames. A segment length of ~3 s was chosen to keep each segment small (~1 MB), allowing fast real-time streaming of video data via 5G to the data ingestion layer in the cloud for further processing. Short segments also limit how much data is lost if a single transmission fails.
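The partitioning step amounts to grouping consecutive frames into fixed-duration chunks. The sketch below is a hypothetical helper, not the deployed edge software; the frame rate and segment length are parameters so that the figures above can be plugged in, and the final segment is allowed to be shorter when the recording does not divide evenly.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_segments(frames: Sequence[T], fps: float,
                        segment_seconds: float) -> List[List[T]]:
    """Group consecutive frames into segments of roughly `segment_seconds` each."""
    if fps <= 0 or segment_seconds <= 0:
        raise ValueError("fps and segment_seconds must be positive")
    per_segment = max(1, round(fps * segment_seconds))  # frames per segment
    return [list(frames[i:i + per_segment]) for i in range(0, len(frames), per_segment)]
```

For instance, `split_into_segments(list(range(10)), fps=4, segment_seconds=1.0)` yields segments of 4, 4 and 2 frames. Shorter segments reduce the size of each transmission and the amount of video lost if one upload fails, at the cost of more per-segment overhead.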

Fig. 3

The IoT device setup in the waste collection service truck

An edge computer software sub-component (EC) was developed to pack the recorded video and GNSS information together, split it into short 2–3 s video segments and transmit them to the cloud via the Amazon Kinesis Video Streams (KVS) service. KVS acts as a buffer to ensure that no video is lost and as a load balancer to ensure that all data streamed by the WCSTs are securely transmitted and stored in the cloud for further processing.
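Packing video with GNSS information requires deciding which location fix belongs to which frame, since the camera and the receiver run at different rates. A minimal sketch, assuming both streams share a clock and the fixes are sorted by time, tags each frame with its nearest fix in time; the function name and tuple layout are hypothetical, and linear interpolation between fixes would be a reasonable alternative.

```python
import bisect
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (timestamp_s, lat, lon)


def tag_frames(frame_times: List[float], fixes: List[Fix]) -> List[Fix]:
    """Attach the GNSS fix closest in time to each frame timestamp."""
    if not fixes:
        raise ValueError("no GNSS fixes")
    fix_times = [f[0] for f in fixes]  # assumed sorted ascending
    tagged = []
    for t in frame_times:
        i = bisect.bisect_left(fix_times, t)
        # Candidates: the fix just before t and the fix at or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(fixes)]
        best = min(candidates, key=lambda j: abs(fix_times[j] - t))
        tagged.append((t, fixes[best][1], fixes[best][2]))
    return tagged
```

Each frame costs one binary search, so tagging stays cheap even when the fix list covers a whole collection round.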

4.2 Data Ingestion Layer

The data ingestion layer involves several technical steps, beginning with a purpose-built cloud software sub-component (CC) that runs on an Amazon Elastic Compute Cloud (EC2) instance. The CC reads the streamed data (i.e. videos with GNSS locations) from KVS and extracts individual frames (i.e. images) and GNSS information from each video segment. The CC then filters out irrelevant data based on pre-defined rules. For example, data captured while a truck is moving at high speed on a highway or is in an area of no interest (e.g. a depot or landfill site) are discarded.
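The pre-defined rules can be illustrated with a speed threshold and a set of exclusion zones. In this sketch, which is an assumption rather than the CC's actual rule set, zones such as a depot or landfill site are polygons of (longitude, latitude) vertices tested with the standard ray-casting point-in-polygon method; treating coordinates as planar is acceptable only for small zones, and the 60 km/h default is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count crossings of a horizontal ray from pt to the right."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y, which implies y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keep_frame(lon: float, lat: float, speed_kmh: float,
               exclusion_zones: List[Sequence[Point]],
               max_speed_kmh: float = 60.0) -> bool:
    """Hypothetical rule set: drop fast (e.g. highway) driving and excluded areas."""
    if speed_kmh > max_speed_kmh:
        return False
    return not any(point_in_polygon((lon, lat), zone) for zone in exclusion_zones)
```

Applying these rules before frames are stored keeps depot and highway footage out of both the storage and the analytics layers.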

4.3 Data Storage Layer

The pre-processed data are stored in Amazon Simple Storage Service (S3) for long-term storage. In particular, one frame per second is selected from each video segment and stored in S3 for further processing by the analytics layer. This choice is based on the relatively slow movement of the WCSTs, which means the scene changes little within one second of video. For each frame, the corresponding metadata (i.e. GNSS location, time and the frame's path in S3) are stored in a separate repository within S3. All metadata related to the PoMs identified by the analytics layer (including GNSS coordinates, images and videos) and the decision outcomes from the decision support layer are stored in Amazon Relational Database Service (RDS) with a PostgreSQL engine.
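Selecting one frame per second and building its metadata record can be sketched as follows. The frame dictionaries, the `s3://` key layout and the bucket name used in testing are hypothetical; the sketch only builds the records and does not call S3.

```python
import math
from typing import Dict, List


def one_frame_per_second(frames: List[Dict]) -> List[Dict]:
    """Keep the first frame seen in each whole second (frames sorted by time 't')."""
    kept, seen = [], set()
    for f in frames:
        second = math.floor(f["t"])
        if second not in seen:
            seen.add(second)
            kept.append(f)
    return kept


def metadata_record(vehicle_id: str, frame: Dict, bucket: str) -> Dict:
    """Metadata stored alongside each selected frame: location, time and frame path."""
    key = f"frames/{vehicle_id}/{frame['t']:.3f}.jpg"  # hypothetical key layout
    return {"vehicle_id": vehicle_id, "time": frame["t"], "lat": frame["lat"],
            "lon": frame["lon"], "frame_path": f"s3://{bucket}/{key}"}
```

Keeping the frame path in the metadata record means the analytics layer can locate each image from its metadata without listing the bucket.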

4.4 Analytics Layer

This layer is composed of deep learning (DL) models designed to identify roadside infrastructure issues requiring maintenance, such as rubbish dumped on the road and damaged road signs. The Amazon Rekognition service was used to deploy the DL models, and an AWS Lambda function stores the outcomes detected by the models. Three use cases were chosen as target PoMs for the pilot: damaged road signs, dumped rubbish and bus shelters. The real-world roadside data collected from the WCSTs were used to train the DL models, with a separate DL model for each use case.

Amazon Rekognition provides a pre-trained DL model capable of detecting around 300 objects and more than 3000 labels across many categories. With the Custom Labels feature of Rekognition, a custom DL model can be developed on top of Rekognition's existing capabilities, which are already trained on tens of millions of images across many categories. The DL model development steps for two of the use cases are described as follows.
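For orientation, the two Rekognition capabilities mentioned above correspond to the `DetectLabels` and `DetectCustomLabels` operations, exposed in the AWS SDK for Python (boto3) as `detect_labels` and `detect_custom_labels` on a Rekognition client. The sketch below is not the pilot's code: the wrapper names, thresholds and label strings (such as "Road Sign") are assumptions, the client is passed in (e.g. `boto3.client("rekognition")`) so the logic can be exercised with a stub, and a trained Custom Labels model version must be running before `detect_custom_labels` can be called.

```python
from typing import Any, Dict, List, Optional


def s3_image(bucket: str, key: str) -> Dict[str, Any]:
    """Image argument referring to an object already stored in S3."""
    return {"S3Object": {"Bucket": bucket, "Name": key}}


def has_label(client: Any, bucket: str, key: str, label: str,
              min_conf: float = 70.0) -> bool:
    """Pre-trained model: does the image contain `label` (e.g. a road sign)?"""
    resp = client.detect_labels(Image=s3_image(bucket, key), MaxLabels=50,
                                MinConfidence=min_conf)
    return any(lab["Name"] == label for lab in resp.get("Labels", []))


def custom_detections(client: Any, model_arn: str, bucket: str, key: str,
                      min_conf: float = 50.0) -> List[Dict[str, Optional[Any]]]:
    """Custom Labels model: name, confidence and (for object models) bounding box."""
    resp = client.detect_custom_labels(ProjectVersionArn=model_arn,
                                       Image=s3_image(bucket, key),
                                       MinConfidence=min_conf)
    return [{"name": c["Name"], "confidence": c["Confidence"],
             "box": c.get("Geometry", {}).get("BoundingBox")}
            for c in resp.get("CustomLabels", [])]
```

Bounding boxes are only returned by object-detection models, so `box` is `None` for an image-level classifier such as the dumped rubbish model.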

4.4.1 Damaged Sign Detection Model

The pre-trained DL model of Rekognition can already detect a ‘road sign’ in an image. The custom model for damaged sign detection was developed by training this DL model on a large number of labelled image samples of road signs and damaged signs collected during the trial, so that it accurately identifies damaged signs. The development process is as follows.

  • Collect 2 weeks of data from 11 WCSTs

  • Use the pre-trained DL model to separate all images containing a road sign

  • A human expert manually inspects each image, draws a bounding box around the road sign using an annotation tool (e.g. Rekognition Custom Labels) and labels it ‘damaged sign’ if any damage defined by the council (e.g. bent, cracked, vandalised or faded) is identified, or otherwise simply ‘road sign’. Some examples of labelled data are presented in Fig. 4.

  • The annotated data were then used to train and develop the custom DL model in Rekognition. This model is used to predict damaged signs from live streaming data from WCSTs.

  • The prediction outcomes are verified by the experts and any false positives detection by the DL model are corrected.

  • To improve the model detection accuracy further, the custom DL model was incrementally trained using new corrected label data from subsequent weeks of WCSTs operation and new models are developed

  • The final DL model, after incremental improvements, was trained using 3000+ images where only 10% of those images had damaged signs and the remaining were road signs.

  • The final DL model is the deployed in analytics layer for damaged road sign detection.
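One iteration of the verify, correct and retrain loop above can be sketched in terms of label bookkeeping. This is an assumption-laden illustration: labels are held in plain dictionaries keyed by a hypothetical image ID, and the retraining itself (which happens inside Rekognition) is not shown. The `class_share` helper tracks the class imbalance noted above, where only about 10% of images contain damaged signs.

```python
from collections import Counter
from typing import Dict, Tuple


def apply_expert_review(predictions: Dict[str, str],
                        verdicts: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Overwrite model labels with expert labels and count the corrections.

    Images without an expert verdict keep the model's label.
    """
    corrected = {}
    n_fixed = 0
    for image_id, label in predictions.items():
        expert_label = verdicts.get(image_id, label)
        if expert_label != label:
            n_fixed += 1
        corrected[image_id] = expert_label
    return corrected, n_fixed


def merge_into_training_set(training: Dict[str, str],
                            corrected: Dict[str, str]) -> Dict[str, str]:
    """Add this week's corrected labels to the cumulative training set."""
    merged = dict(training)
    merged.update(corrected)
    return merged


def class_share(training: Dict[str, str], label: str) -> float:
    """Fraction of training images carrying `label` (tracks class imbalance)."""
    if not training:
        return 0.0
    return Counter(training.values())[label] / len(training)
```

Each week's merged set would then be used to train the next model version.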

Fig. 4 Sample annotation for damaged sign and road sign

4.4.2 Dumped Rubbish Detection Model

The custom model for dumped rubbish detection was developed by training Rekognition's model on a large number of labelled images of dumped rubbish. Unlike the damaged sign model, which relies on bounding boxes, this model is constructed as an image classifier, as described below.

  • The initial dumped rubbish dataset was obtained from the council’s image repository. Images from this dataset that contain dumped rubbish are labelled ‘dumped rubbish’, while images collected during the trial that do not contain any rubbish are labelled ‘not rubbish’. Some examples of labelled data are provided in Fig. 5.

  • The labelled data were used to train the custom DL model in Rekognition for detecting dumped rubbish in images.

  • As with the damaged sign detection model, false-positive detections are corrected and the model is incrementally improved.

  • Some false detections remained, such as rubbish bins (e.g. the leftmost image of the bottom row in Fig. 5), fences and trees. These objects are filtered out in the final detection phase using Rekognition's pre-trained DL model.

  • The final DL model was trained on 10,000+ images and deployed in the analytics layer for real-time detection of dumped rubbish PoMs.
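The final filtering step could be implemented in several ways; the paper does not specify the rule. The sketch below shows one plausible, hypothetical rule: discard a ‘dumped rubbish’ prediction when the general-purpose model reports a known confuser object (bin, fence or tree) with higher confidence than the rubbish classifier's own score. The input mirrors the shape of a Rekognition `DetectLabels` response (`Labels` entries with `Name` and `Confidence`), and the confuser label names are assumptions.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical label names for the objects the text says were mistaken for
# dumped rubbish (rubbish bins, fences, trees).
CONFUSER_LABELS: FrozenSet[str] = frozenset({"Bin", "Fence", "Tree"})


def keep_rubbish_detection(rubbish_confidence: float,
                           general_response: Dict[str, Any],
                           confusers: FrozenSet[str] = CONFUSER_LABELS) -> bool:
    """Keep a 'dumped rubbish' prediction unless a confuser object is more certain.

    `general_response` follows the shape of a DetectLabels response:
    {"Labels": [{"Name": ..., "Confidence": ...}, ...]}.
    """
    for label in general_response.get("Labels", []):
        if label["Name"] in confusers and label["Confidence"] > rubbish_confidence:
            return False
    return True
```

Comparing against the classifier's own score, rather than rejecting any image that contains a tree, avoids discarding genuine rubbish photographed near vegetation; a bounding-box overlap test would be a stricter alternative.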

Fig. 5 Sample annotation for dumped rubbish and not rubbish images

Overall, we annotated more than 15,000 images to develop the custom DL models for the analytics layer. The dataset can be made available for future research upon request.

4.5 Decision Support Layer

This layer comprises two sub-components: (1) a front-end interface, a visualisation dashboard developed using Vue.js and TypeScript; and (2) a back-end API developed using .NET Core. The dashboard, shown in Fig. 6, provides an interactive map with markers, where each marker corresponds to a potential maintenance task identified by the analytics layer. Clicking on a marker shows details of the identified issue, such as its type (e.g. damaged road sign, dumped rubbish or bus shelter), location, date and time of detection, GNSS coordinates (latitude and longitude) and a photo or video of the PoM. The dashboard also shows a PoM detection confidence score, a probability-based score (between 1 and 100) provided by the DL model. Filtering capabilities enable users to retrieve PoM data by date range (From and To), status (Open, Accepted or Ignored) and PoM type. The PoM dashboard is a valuable management tool because it gives local council users a real-time, interactive visualisation of roadside maintenance issues that was previously unavailable to them.
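The dashboard's filter semantics can be illustrated with a short sketch. The real back end is a .NET Core API; the Python below only mirrors the filtering behaviour described above, under the assumptions that the date range is inclusive at both ends and that omitted filters match everything. The `PoM` record type is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class PoM:
    pom_type: str      # e.g. "damaged road sign", "dumped rubbish", "bus shelter"
    status: str        # "Open", "Accepted" or "Ignored"
    detected_on: date
    latitude: float
    longitude: float
    confidence: float  # detection confidence score from the DL model


def filter_poms(poms: Iterable[PoM], date_from: date, date_to: date,
                status: Optional[str] = None,
                pom_type: Optional[str] = None) -> List[PoM]:
    """Return PoMs detected within [date_from, date_to] that match the optional filters."""
    return [
        p for p in poms
        if date_from <= p.detected_on <= date_to
        and (status is None or p.status == status)
        and (pom_type is None or p.pom_type == pom_type)
    ]
```

For example, `filter_poms(poms, date(2022, 7, 28), date(2022, 8, 15), status="Open")` would return the open PoMs detected during that period.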

Fig. 6 Map-based user interface in PoM dashboard

5 Outcome Analysis of Real-World Pilot and Lessons Learned

This section analyses the outcomes of the real-world pilot deployment of Mobile IoT-Roadbot for city-scale sensing. The trial ran for 6 months, from June 2022 to December 2022, on 11 WCSTs. Each WCST typically operates for about seven hours within a 5 AM to 2 PM window. Fig. 7 shows the travel routes of the 11 WCSTs over 2 weeks, as captured by the installed GNSS receivers; the trucks' natural mobility covered approximately 95% of the entire council area in that period.

In this section, we present insights gained from the deployment and testing of Mobile IoT-Roadbot. Our analysis focuses mainly on evaluating the effectiveness and accuracy of Mobile IoT-Roadbot in detecting PoMs for roadside infrastructure.

Fig. 7 The travel routes of the 11 WCSTs for the trial period of 2 weeks
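The paper does not say how the 95% coverage figure was computed. One common approach, sketched here as an assumption, is to overlay a grid on the council area and count the fraction of cells visited by at least one GNSS fix. The 100 m cell size, the equirectangular projection and the `area_cells` set (which in practice would be derived from the council boundary polygon) are all hypothetical choices.

```python
import math
from typing import Iterable, Set, Tuple

EARTH_RADIUS_M = 6_371_000.0
Cell = Tuple[int, int]


def to_cell(lat: float, lon: float, ref_lat: float, cell_m: float) -> Cell:
    """Map a GNSS fix to a square grid cell (equirectangular approximation)."""
    y = math.radians(lat) * EARTH_RADIUS_M
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    return (math.floor(x / cell_m), math.floor(y / cell_m))


def coverage(route_fixes: Iterable[Tuple[float, float]], area_cells: Set[Cell],
             ref_lat: float, cell_m: float = 100.0) -> float:
    """Fraction of the council area's grid cells visited by at least one truck."""
    visited = {to_cell(lat, lon, ref_lat, cell_m) for lat, lon in route_fixes}
    return len(visited & area_cells) / len(area_cells)
```

The equirectangular approximation is adequate at the scale of a single council area; the result depends on the chosen cell size, so a reported coverage figure should state it.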

5.1 Effectiveness of Mobile IoT-Roadbot

The effectiveness of Mobile IoT-Roadbot in detecting and reporting PoMs was evaluated through a comparative study against the traditional method. The study analysed data from the council’s community-initiated (resident-reported) service request system between 28 July and 15 August 2022 and compared them with the roadside issues identified by Mobile IoT-Roadbot. The resident reporting data, which were manually exported from the council’s system, included the GNSS location, time and a plain-text description of each complaint. GNSS locations were matched to identify the dumped rubbish issues that were both detected by Mobile IoT-Roadbot and reported to the council via community-initiated service requests. The results of this analysis are summarised below and in Fig. 8.
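The GNSS matching step can be sketched as a nearest-neighbour search using the haversine distance. The 50 m matching radius and the `(timestamp, lat, lon)` tuple layout are assumptions for illustration; the paper does not report the matching threshold. The lead time is positive when Mobile IoT-Roadbot detected an issue before residents reported it.

```python
import math
from datetime import datetime


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two GNSS points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_reports(detections, reports, radius_m: float = 50.0):
    """Pair each detection with the closest resident report within radius_m.

    detections, reports: lists of (datetime, lat, lon).
    Returns (detection_index, report_index, lead_days) tuples.
    """
    matches = []
    for i, (t_det, lat_d, lon_d) in enumerate(detections):
        best = None
        for j, (t_rep, lat_r, lon_r) in enumerate(reports):
            d = haversine_m(lat_d, lon_d, lat_r, lon_r)
            if d <= radius_m and (best is None or d < best[0]):
                best = (d, j, t_rep)
        if best is not None:
            lead_days = (best[2] - t_det).total_seconds() / 86_400
            matches.append((i, best[1], lead_days))
    return matches
```

The share of detections that also appear in resident reports is then `len(matches) / len(detections)`, and the mean of the positive lead times gives the average head start.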

Fig. 8 Effectiveness of Mobile IoT-Roadbot

  • Mobile IoT-Roadbot accurately detected 85% more dumped rubbish incidents than the traditional method.

  • Of the incidents that matched resident reports in the council’s reporting system, 14 out of 15 dumped rubbish issues (about 93%) were captured by Mobile IoT-Roadbot before residents reported them, on average 3 to 4 days earlier.

  • Only 15% of the dumped rubbish PoMs detected by Mobile IoT-Roadbot were captured in the council’s resident-reported system. The remaining 85% of the detected issues were not reported.

These observations suggest that Mobile IoT-Roadbot is effective and feasible, and that it can support city councils in transitioning to proactive roadside infrastructure maintenance through city-scale sensing.

5.2 Accuracy of PoM Detection Using DL Models

This section evaluates the accuracy of PoM detection using the DL models. To assess their performance, we used three well-known metrics: precision, recall and F1-score. These metrics, together with overall accuracy, are computed from the true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts obtained from the outputs of Mobile IoT-Roadbot. Analysing them shows how effectively and reliably the DL models detect PoMs.
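For reference, the standard definitions of these metrics, and of the overall accuracy reported for the models, are given below. Note that TN enters only the accuracy; precision, recall and F1-score depend on TP, FP and FN alone.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, &
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{aligned}
```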

Overall, the developed DL models performed well, achieving an overall accuracy of over 88% in automatically detecting roadside infrastructure issues for the three use cases: dumped rubbish, road sign damage and bus shelters. The performance of each DL model is presented in Table 1. The models for dumped rubbish and bus shelters performed better than the model for damaged road signs, with F1-scores of 97% and 96%, respectively, compared with an F1-score of 71% for damaged road signs. The models were trained on datasets of different sizes: 2917 images for damaged road signs, 7808 for dumped rubbish and 1004 for bus shelters. They were then tested on separate datasets of 731, 1956 and 252 images, respectively. The lower detection accuracy for damaged road signs is due to the limited training data, of which only around 10% contains damaged road signs. The performance of this model is expected to improve as it is re-trained with more real-world images of damaged signs.
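As a quick consistency check on the dataset sizes above, the held-out share of each use case can be computed directly from the reported counts. It comes to roughly 20% in every case, which suggests an approximately 80/20 train/test split; the paper does not state the split ratio explicitly, so this is an inference from the numbers only.

```python
# Image counts reported in the text for each use case: (training, test)
SPLITS = {
    "damaged road signs": (2917, 731),
    "dumped rubbish": (7808, 1956),
    "bus shelters": (1004, 252),
}


def held_out_fraction(train: int, test: int) -> float:
    """Share of all images in a use case that were held out for testing."""
    return test / (train + test)


for use_case, (train, test) in SPLITS.items():
    print(f"{use_case}: {train + test} images, "
          f"{held_out_fraction(train, test):.1%} held out")
```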

Table 1 The performance of each of the DL models for the three use cases

Figure 9 shows some examples of PoMs identified by the custom-developed DL models during the trial, along with zoomed-in views. As shown, Mobile IoT-Roadbot was able to detect damaged road signs (e.g. bent or cracked), dumped rubbish on the street and bus shelters. The examples also show that our DL models were capable of detecting PoMs in dark, cloudy or rainy conditions.

Fig. 9 Examples of identified PoMs

5.3 Lessons Learned

Intelligent decision support tools based on city-scale sensing are expected to transform the way people live and work in a smart city. Our solution, adapted from AIoT-CitySense, demonstrates the power of leveraging city-wide data with AI and IoT techniques to improve people’s quality of life through fast and efficient detection of roadside infrastructure issues.

The real-world pilot deployment showed that Mobile IoT-Roadbot, tailored from AIoT-CitySense, is adaptable and scalable. It has been operational since June 2022 on 11 waste collection service trucks and can easily be extended to many more trucks. The solution detected several maintenance issues that are currently not reported through the council’s citizen reporting system. On average, it identified 40–50 roadside issues, i.e. 1 issue every 1–2 km\(^2\) across the service areas of the 11 trucks. Each truck streamed approximately 5 GB of data per day on average, at a transmission rate of 2.5 MBps (max: 4.24 MBps). However, daily data loss varied between 10% and 20% due to weather conditions, network coverage and truck speed. On average, 5000 images per truck per day were processed by the DL models, which took approximately 2.5 h in AWS Rekognition at an associated AWS running cost of about A$12. Across all operational trucks, this amounts to approximately 35,000–55,000 images per day, with an associated cloud cost of approximately A$200. This cost could be reduced by developing and deploying DL models that are compatible with edge computers.
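The throughput figures above can be cross-checked with simple arithmetic. The sketch uses only the reported averages; the interpretation that the 35,000-image lower bound corresponds to days with fewer trucks operating is an assumption, since the text does not say how many trucks ran on a given day.

```python
IMAGES_PER_TRUCK_PER_DAY = 5000   # average reported in the text
PROCESSING_HOURS_PER_TRUCK = 2.5  # Rekognition processing time per truck per day
FLEET_SIZE = 11

# Average Rekognition processing time per image: 9000 s / 5000 images = 1.8 s
per_image_s = PROCESSING_HOURS_PER_TRUCK * 3600 / IMAGES_PER_TRUCK_PER_DAY

# Upper bound of the daily fleet volume with all 11 trucks operating: 55,000
fleet_images_max = IMAGES_PER_TRUCK_PER_DAY * FLEET_SIZE

# Number of trucks implied by the 35,000-image lower bound (assumed): 7.0
trucks_for_lower_bound = 35_000 / IMAGES_PER_TRUCK_PER_DAY
```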

The development of AIoT-CitySense has several important implications. First, AIoT-CitySense, from which Mobile IoT-Roadbot is derived, is cost-effective. By leveraging the existing fleet of waste collection service trucks, Mobile IoT-Roadbot capitalises on the natural mobility of these assets to collect data about roadside infrastructure during their regular service operations. This avoids the need for extensive deployment of dedicated roadside infrastructure sensors, making it an economically viable solution for city-scale sensing. Second, AIoT-CitySense is adaptable and can easily be tailored to various applications within smart cities. Its layer-based design and flexible architecture allow additional components to be integrated and customised to meet specific requirements. This enables organisations to address a wide range of challenges beyond roadside infrastructure maintenance by applying AI-driven city-scale sensing to other applications. Third, AIoT-CitySense is scalable. As cities and their infrastructure grow in size and complexity, so does the demand for effective city-scale sensing. AIoT-CitySense is designed to meet this challenge by leveraging cloud-based or edge-based computing environments, enabling efficient processing and analysis of large volumes of data to provide real-time insights and support timely decision-making. Its scalability also allows additional sensors or sources of public data to be incorporated without compromising performance or reliability.

We demonstrated that Mobile IoT-Roadbot can help reduce the burden on citizens to report issues themselves. The analysis from the pilot deployment shows that many maintenance issues detected by Mobile IoT-Roadbot are never reported through the council’s citizen reporting system. As a result, citizens can feel more confident in the safety and maintenance of their communities without having to report every issue they encounter. In addition, using 5G technology to transmit city-scale data in real time opens up new possibilities for monitoring and analysing further road infrastructure issues (e.g. potholes and road surface cracks).

In summary, AIoT-CitySense and Mobile IoT-Roadbot represent a significant step forward in using AI and IoT to address social problems and improve community satisfaction. By leveraging IoT, 5G, AI, and edge and cloud computing, this paper presents an efficient and effective approach to monitoring and maintaining roadside infrastructure, improving safety and quality of life for citizens.

6 Conclusion

In this paper, we presented AIoT-CitySense, an AI- and IoT-driven city-scale sensing framework, and Mobile IoT-Roadbot, a solution tailored from it. Mobile IoT-Roadbot is an innovative, first-of-its-kind city-scale sensing solution that has been deployed and piloted on 11 waste collection service trucks in Melbourne, Australia. It uses IoT devices to capture data about roadside infrastructure and advanced AI models to automatically detect and report roadside infrastructure issues. A 6-month pilot validated that the solution scales to city-wide sensing while delivering significant benefits to cities through real-time identification and reporting of road infrastructure issues. AIoT-CitySense has the potential to revolutionise the way cities manage their infrastructure and services, making them more efficient and more responsive to the needs of their citizens. Future work includes further optimising and enhancing our AI models to improve the accuracy of automatic issue detection, and developing and deploying the models on the edge.