Litter on the streets - solid waste detection using VHR images

ABSTRACT Failures in urban solid waste management lead to clandestine garbage dumping and pollution. This affects sanitation and public hygiene, deteriorates quality of life, and contributes to deprivation. This study aimed to test a combination of machine learning, high-resolution earth observation and GIS data to detect diverse categories of residual waste on the streets, such as sacks and construction debris. We conceptualised five different classes from image interpretation: “Sure”, “Half-sure”, “Not-sure”, “Dispersed”, and “Non-garbage”. We tested a combination of k-means-based segmentation and supervised random forest to investigate the capabilities of automatic classification of these waste classes. The model can detect the presence of solid waste on the streets and achieved an accuracy between 73.95% and 95.76% for the class “Sure”. Moreover, a building extraction using an EfficientNet deep-learning-based semantic segmentation allowed masking the rooftops. This improved the accuracy of the classes “Sure” and “Non-garbage”. The systematic evaluation of all parameters considered in this model provides a robust and reliable method of solid waste detection for decision-makers. These results highlight areas where insufficient waste management affects the citizens of a given city.

Key policy highlights
• The best segmentation using simple linear iterative clustering (SLIC) was achieved with the parameter values 8,000 segments and 0.3 compactness.
• The subsequent supervised classification of the segmented images using Random Forest yielded an average overall accuracy of 80.18%.
• The model can detect the presence of solid waste on the streets and achieved an accuracy between 73.95% and 95.76% for the class “Sure”.
• The average reflectance values of the classes “Sure” and “Non-garbage” overlapped. Removing the building rooftops from the orthotiles reduced this overlap, which allowed better identification of the class “Sure”.
• Rooftop removal helped improve the accuracy of the classifier, from 59.51%–90.18% in study areas with rooftops to 71.53%–95.76% in study areas without rooftops.


Introduction
Can we monitor garbage on the streets? Can we use remote sensing together with an automatic or semiautomatic method to identify where there are sanitary problems in a city? Sanitation, a human right, refers to the access to and use of facilities to dispose of solid waste appropriately, among others (Habitat, 2020). Unfortunately, not all institutions or governments have the capacity and resources to provide the necessary services to the residents, like proper sanitation, fast enough (Habitat, 2020). The "management of solid waste and stormwater drainage", also named "environmental sanitation" (HABITAT, 2008), affects the individual and the community as well.
At a local scale, municipalities are typically the ones in charge of the collection, transport, and final disposal of solid waste (HABITAT, 2008). When this service fails, residents might discard their garbage legally or illegally in open spaces, parks, and rivers or leave it accumulating on the streets. This waste can be transported to other areas because of strong winds or rain, polluting other neighbourhoods, rivers, or groundwater. When left on the streets, it clogs the drainage system, causing flooding (Medina, 2010). Improper management of solid urban waste contaminates groundwater (Vasanthi et al., 2008), attracts pests and animals (e.g. rats) that transmit diseases, and contaminates the air, among others (Yang et al., 2018). Since the decomposition of garbage also occurs in an anaerobic way, it produces methane, which in turn causes spontaneous fires. Moreover, some people might induce fires to burn waste and reduce the sanitary impact and the volume of waste in the dumps (Medina, 2010).
These challenges of unmanaged waste predominantly appear in cities of the Global South, for example, in Latin America and the Caribbean, and especially in poor urban areas, such as informal settlements, where public services are often not comprehensive (Martínez Arce et al., 2010). With the expected increase in the global urban population to 60.4% by 2030, the number of slums or areas with deprived urban infrastructure and sanitary problems will also increase (Habitat, 2020; Medina, 2010).
The lack of proper sanitation or poor management of urban solid waste deprives the population of basic hygiene and health, leading to a lower quality of life (HABITAT, 2008). This deprivation of basic needs and opportunities limits the individual's ability to live a fulfilling life, thereby enhancing poverty (Anand & Sen, 1997; Kuffer et al., 2018; Taubenböck et al., 2018). Tackling these issues, and thereby improving the well-being of urban inhabitants, is part of the objectives of the United Nations Sustainable Development Goals (SDGs). More specifically, targets 6.3 and 11.6.1 aim for the appropriate disposal of waste to avoid the pollution of water sources (UN-Water, 2017) and promote the sustainable management of solid waste (Habitat, 2016). Therefore, identifying areas with deficiencies in sanitation or solid waste management (SWM) can support urban planning and management.

Research on remote sensing to study waste management in urban areas
Remotely-sensed data can provide information on the location where the garbage was or should be disposed. The use of sensor products varies with the size and characteristics of the solid waste being studied. For example, Gill et al. (2019) used Landsat TM and ETM+ to detect the waste that spread under landfills with a Ground Sampling Distance (GSD) of 100 m. For the monitoring of waste on land areas approximately 2 × 2 m in size, Yonezawa (2009) combined data from ALOS and Quickbird (0.65-2.5 m GSD). Karimi et al. (2022) used Landsat 8 and night light images from the Suomi NPP to estimate the probability of locating illegal landfills (30-500 m GSD). In general, images from high- to middle-resolution sensors can be used for the identification of dumping sites at a scale of a few metres.
In the case of dumping zones a few centimetres in size, very high-resolution (VHR) imagery is necessary. Data from airborne cameras or unmanned aerial vehicles (UAVs) are available in the order of millimetres or centimetres, depending on the flight altitude, quality of the camera, and atmospheric conditions (Osco et al., 2021). Jakovljevic et al. (2020) used UAV (0.4-2.3 cm GSD) data to detect plastic bottles in water bodies. Torres and Fraternali (2021) used UAVs (20 cm GSD) to detect and map illegal dumping zones. To achieve greater detail on the nature and extent of solid waste, other sources have been used, such as data from photos or images from surveillance cameras (Alfarrarjeh et al., 2018; Dabholkar et al., 2017), a combination of Google Street View, ImageNet, and self-taken images (Ping et al., 2020), or repositories of data like SpotGarbageGINI in GitHub (Patel et al., 2021). In all of these cases, objects like bottles, cartons, furniture, etc., were visible and easily identified.
For solid waste data analysis, several methods have been tested. Visual estimations of dumping areas can be helpful if it is not possible to access them and were the most successful on sites <400 m² in Bangalore, India (Chanakya et al., 2017). Diverse machine learning methods have been applied, such as deep learning (DL) (Dabholkar et al., 2017; Jakovljevic et al., 2020; Patel et al., 2021; Ping et al., 2020; Torres & Fraternali, 2021; Youme et al., 2021) and decision tree classifiers (Alfarrarjeh et al., 2018), among others (Shahabi et al., 2014); for a more detailed review, see Singh (2019) and Xia et al. (2021). However, some studies also relied on spectral signature differences (Yonezawa, 2009) or visual change detection to estimate the dumping zones' location.
Despite the extensive research on solid waste identification using remote sensing methods, when urban deprivation in cities is estimated, the waste aspect is integrated using GIS or survey-based methods (Ajami et al., 2019; Kuffer et al., 2021). For example, Ajami et al. (2019) measured the deprivation of a slum using a set of surveyed and remotely-sensed factors. Waste management was only part of the survey (i.e. GIS data). While there is general agreement with this approach, we believe that the estimation of urban deprived areas could also benefit from a remote-sensing-based method as a proxy of sanitary deprivation. After all, the more accurate the data for urban areas, especially the ones that struggle the most, the better we can provide information for policymakers and stakeholders, among others (Kuffer et al., 2021).

Solid waste conceptualisation
Illegal waste disposal has different meanings depending on many factors, including the following:
(1) The legal system (i.e. how does a local government define litter?).
(2) The components (i.e. domestic or construction waste, among others).
(3) Size (i.e. from a few square centimetres to several hundred square metres).
(4) The behaviour of the citizens (e.g. dumping zones on the streets or around collection centres).
Different locations have laws to define illegal littering. For example, in a study of illegal dumping in Queensland, Australia, the authors stuck to the local legislation for a definition of the type of waste on which their research focused: "illegal waste disposal sites are restricted to the unlawful deposit of an amount of domestic waste 200 litres or greater in volume" (Glanville & Chang, 2015). In Colombia, Law 120-99 of the National Congress states where solid waste should not be disposed of and, if so, how the person should be penalised (Congreso Nacional, 1999). Although the law does not define solid waste, it states that garbage should not be disposed of on "streets, sidewalks, curbs, parks, highways, roads, public baths, seas, rivers, creeks, streams, and irrigation channels, beaches, squares and other places of recreation and other public places" (Congreso Nacional, 1999).
The size of the dumping zones depends on their characteristics. Generally, solid waste refers to objects or materials that are useless to humans and, therefore, discarded (Medina, 2010). Waste can be divided into several categories: household solid waste, municipal or urban solid waste, special waste, construction waste, and hazardous waste (Martínez Arce et al., 2010). Waste is defined by sources, such as the households of city residents, generated during production processes, produced by the construction or demolition of infrastructure, or by activities that could affect human health. These can be in solid, liquid, or gaseous form (HABITAT, 2008; Martínez Arce et al., 2010; Xia et al., 2021). Even though there is research on clandestine littering on streets using remote sensing and/or artificial intelligence (AI) methods, all studies have diverse definitions of garbage (Alfarrarjeh et al., 2018; Dabholkar et al., 2017; Patel et al., 2021; Ping et al., 2020; Torres & Fraternali, 2021).
When waste is packed in plastic bags, regardless of content, specific elements escape remote sensing detection. With VHR or camera surveillance imagery, it is possible to detect specific elements like furniture, electronics (Alfarrarjeh et al., 2018; Dabholkar et al., 2017), or plastic bottles (Jakovljevic et al., 2020). Detection focused on garbage bag accumulation or small piles of litter on the streets might be useful, especially for low-income countries that struggle with their SWM (Iyamu et al., 2020), and when VHR imagery is not available for detecting individual objects.
In this study, we developed a model for detecting solid waste that focuses on objects disposed on the streets or areas of public access that are not dumped into containers but instead abandoned, cornered, or grouped into visually defined clusters. Usually, these waste objects are packed into white or black bags, creating compact objects that can be recognised in several locations. For this purpose, we defined classification categories based on the probability that an object was garbage.
As a case study, we focused on Medellín, Colombia. Local media constantly reports about citizens dumping their waste on the streets outside the containers designated for its disposal. The municipality struggles to identify the more affected zones and the citizens who litter illegally (El Tiempo, 2022). The novelty of this work is to develop a model of solid waste identification focused on aggregations of litter (like bags) dumped in streets or areas of public access and not dumped into containers or landfills, which have been the main focus of most of the recent studies on this topic. Moreover, our model uses imagery provided by the local government of Medellín, which allows for faster implementation of SWM programmes.

Objectives
This study aimed to test a combination of remote sensing data and machine learning approaches to conceptualise and detect illegal solid waste dumping in an urban landscape. In this way, we can provide decision-makers with a reliable method for locating where insufficient waste management affects urban residents. The steps of the workflow are described below:
(1) Perform a supervised segment-based classification of orthorectified images to detect urban waste accumulations.
(2) Evaluate which appearance or type of urban waste can be detected at which accuracy levels with the approach mentioned above.
(3) Determine if an auxiliary data set on the buildings improves the capacity to identify street waste accumulations.
In the following sections, we (i) describe the utilised materials and explain the developed methods, (ii) focus on the results of our experiments, and (iii) explain the outcome of our analysis and implications for policymakers or decision-makers.

Materials and methods
The following section describes the datasets and the algorithms used.

Study area and data
The research used data from Medellín, Colombia (Figure 1). This municipality belongs to the Department of Antioquia. Its authority extends over 374.8 km², which contains 16 communes and 273 neighbourhoods in 117.4 km². The working area or region of interest (ROI) comprises 23.04 km². This is defined by the area covered by the available aerial images. From this ROI, 25 areas of interest (AOI) were selected, each 0.25 km². These AOIs are image subsets in which the analyses were conducted.
The orthorectified images comprise four bands (blue, green, red, and near-infrared) with a pixel size of 8 cm. Each image is a composite created by the mosaic of different stripes of camera recording underneath an aeroplane (Servicios de Imágenes de Medellín, 2021) and covers an area of 3.84 km². All images were reprojected to the Antioquia Medellín coordinate system with Datum MAGNA and Mercator Projection. The images were from 2019 and were provided by the Image Service of the Municipality of Medellín via an ArcGIS online server (Servicios de Imágenes de Medellín, 2021).
Covering the entire city of Medellín, building footprint data outlined the borders of all buildings with rooftops. The dataset from 2017 is provided by the GeoMedellín Service of the Municipality of Medellín (GeoMedellin, 2020). Since this dataset did not include all buildings created from 2017 onwards, an updated building footprint data set was created based on semantic segmentation of the orthophotos (see the Building Footprint section) and merged with the official one. Since garbage is usually found on the streets, rooftops are excluded by masking out the building footprints. This allowed the classifier to focus on the streets. The intent was to detect garbage that poses a hygiene risk to the urban population. Therefore, any element that resembled garbage inside a private property was beyond the scope of this research. The training and test data included labelled polygons in two main categories:
(1) Areas that included garbage or urban residual waste (G).
(2) Areas with anything else that was not garbage (nG).
The G-dataset was created manually by visually recognising garbage accumulations on the orthorectified images of the ROI and subsequently mapping them. This process produced a total of 2,660 training areas for detected waste. The nG dataset was created using segmentation (see the section on Sensitivity Analysis and Segmentation) and subsequent selection of segments representing the diversity of nG elements on the 25 AOI raster images. For the input dataset, 500 samples of G and nG objects were randomly selected. Finally, this input dataset was split into 70% training and 30% testing data.
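The random 70/30 split described above can be sketched with the Python standard library. This is an illustrative sketch only: the function name `split_samples`, the fixed seed, and the reading of "500 samples" as 500 in total (the text does not say whether the figure is per class) are assumptions, not details taken from the study.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into train and test lists."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in identifiers for the randomly selected G and nG samples
# (assumed here to be 500 in total).
sample_ids = list(range(500))
train_ids, test_ids = split_samples(sample_ids)
```

Under these assumptions the split yields 350 training and 150 test samples, and no sample appears in both subsets.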

Building footprint
The official building footprint data set provided by the city of Medellín dates from 2017, while the orthophotos for garbage detection date from 2019. The entire orthophotos, at 16 cm pixel size, were split into individual image tiles of 224 × 224 pixels with 33% overlap between the images to reduce border effects. Using this high-resolution remote sensing data, we created an updated building footprint dataset with a deep-learning-based semantic segmentation approach (Wurm et al., 2019, 2021) (Figure 3). A precise and complete data set on building footprints was essential for the success of the presented method for garbage detection.
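To make the tiling step concrete, the sketch below computes tile start offsets along one image axis for 224-pixel tiles with 33% overlap. The rounding of the stride to 150 pixels and the extra tile placed flush with the image border are assumptions made for illustration; the tiling code actually used is not described in the text, and the helper name `tile_origins` is hypothetical.

```python
def tile_origins(length, tile=224, overlap=0.33):
    """Return the start offsets of overlapping tiles along one image axis."""
    stride = max(1, round(tile * (1 - overlap)))   # 224 * 0.67 -> 150 px
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        # Assumption: add one last tile flush with the border so that
        # no pixels at the image edge are left uncovered.
        origins.append(length - tile)
    return origins

# Offsets for a hypothetical axis of 1,000 pixels.
offsets = tile_origins(1000)
```

Applying the same function to rows and columns gives the upper-left corner of every tile; neighbouring tiles then share roughly 74 pixels along each axis.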
For the process of building extraction using semantic segmentation, we used a U-Net architecture, as introduced by Ronneberger et al. (2015). One of the main advantages of this architecture is that it can deal with a small number of samples, which is advantageous in the context of building extraction. Furthermore, the network uses data augmentation to artificially increase the number of training samples. The network was trained with local domain knowledge from Medellín orthophotos with manually derived building footprints. Detailed information on the model set-up and parameters can be found in Wurm et al. (2021).

Urban residual waste dataset
Samples for training the model were created by visual interpretation. This resulted in more than 3,000 polygons assigned to five different categories, indicating the reliability of the objects being garbage or not. These are termed "Sure", "Half-sure", "Not-sure", "Dispersed", and "Non-garbage" (Figure 4). The class "Sure" was composed of grouped black-and-white round-shaped objects easily recognisable as garbage elements. Comparing some "Sure" locations with Google Street View data confirmed that it refers to bags piled up and disposed of on the streets.
The classes "Half-sure" and "Not-sure" refer to the 50% and <25% probability that the selected objects are garbage, respectively. These proportions are based on ground-based user experience, i.e. some of the authors have observed the littering problem in the region. Small, scattered elements of garbage covering an empty area or with ground surface visible in between are labelled as "Dispersed". They are probably garbage, but they are not packed into bright and black bags like the "Sure" ones. Everything else in the scenes that does not belong to the categories above was labelled "Non-garbage", i.e. dwellings, streets, humans, vehicles, vegetation, and rivers, among others.
To determine which residual waste classes could be identified, all waste categories were combined with nG polygons. The five categories mentioned above were combined in different ways, called "Treatments" in this paper. Treatments refer to the five combinations of the solid waste factor on which the classifier is applied. They were defined as follows (see also Figure 4):

In conclusion, based on the ground sampling size and temporal frequency of the orthophotos, it is theoretically possible to detect solid waste objects of at least 256 cm² when the aeroplane camera is recording. However, the segmentation algorithm will also influence the final minimum size, since it clusters the pixels in searching for homogeneity zones. Our waste objects have a minimum size of 760 cm². Since it is impossible to detect individual elements such as plastic bottles or cartons with 8 cm GSD, we defined the probability mentioned above as classes of garbage. The categorisation of waste was based on the user ground-based experience. Finally, the VHR imagery available is recorded once per year. This only allows for the detection of waste at a specific time of the year.
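The minimum sizes quoted above can be related to pixel counts with a short calculation. The sketch assumes square pixels at the stated 8 cm GSD; the interpretation of 256 cm² as a block of four pixels (for example 2 × 2) follows from that arithmetic and is not stated explicitly in the text.

```python
GSD_CM = 8  # ground sampling distance of the orthophotos, from the text

def pixels_for_area(area_cm2, gsd_cm=GSD_CM):
    """Number of square pixels needed to cover an area at a given GSD."""
    return area_cm2 / (gsd_cm ** 2)

pixel_area = GSD_CM ** 2                     # area of one pixel in cm2
pixels_theoretical = pixels_for_area(256)    # theoretical minimum object
pixels_smallest = pixels_for_area(760)       # smallest digitised waste object
```

Under these assumptions one pixel covers 64 cm², the theoretical minimum corresponds to 4 pixels, and the smallest digitised waste object spans just under 12 pixels.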

Sensitivity analysis and segmentation
SLIC is an unsupervised k-means-based algorithm for spatial image segmentation. It groups or clusters pixels based on their colour similarity and closeness on the image plane, thereby reducing the complexity of an image (Achanta et al., 2010). It operates in a five-dimensional labxy space: [lab] represents the colour vector in the CIELAB colour space, using lightness L and chromaticity ab, and [xy] represents the location coordinates of a given pixel (Achanta et al., 2010, 2012). Segmentation results in a series of superpixels or regions of homogeneity. These superpixels are a set of pixels grouped into a segment that does not necessarily represent a semantic object completely, but a homogeneous part (Ren & Malik, 2003).
The implementation of the SLIC algorithm in Python was accomplished via the skimage library (van der Walt et al., 2014). The SLIC function assigned each pixel i to the closest cluster, and in every iteration, the distance was reduced (Achanta et al., 2012). The Python implementation of the SLIC function clustered the pixels based on the parameters "number of segments" (ns) and "compactness" (c). The first defines the approximate number of segments to fit in the image, and the latter measures the compromise between colour and spatial proximity (van der Walt et al., 2014).
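The role of the compactness parameter can be illustrated with the distance measure from the SLIC formulation of Achanta et al. (2012), D = sqrt(d_c² + (d_s / S)² m²), where d_c is the colour distance, d_s the spatial distance, S the grid interval between cluster centres, and m the compactness. The sketch below is a plain-Python illustration of that formula, not the skimage implementation; the function names and the assumption of a square 6,250 × 6,250 pixel AOI (0.25 km² at 8 cm GSD) are ours.

```python
import math

def grid_spacing(n_pixels, n_segments):
    """Approximate interval S between cluster centres on a regular grid."""
    return math.sqrt(n_pixels / n_segments)

def slic_distance(color_a, color_b, xy_a, xy_b, spacing, compactness):
    """Combined colour/spatial distance between a pixel and a cluster centre."""
    d_color = math.dist(color_a, color_b)   # Euclidean distance in colour space
    d_space = math.dist(xy_a, xy_b)         # Euclidean distance on the image plane
    return math.sqrt(d_color ** 2 + (d_space / spacing) ** 2 * compactness ** 2)

# Assumed square AOI of 6,250 x 6,250 pixels segmented into 8,000 superpixels.
spacing_8000 = grid_spacing(6250 * 6250, 8000)
```

With a small compactness the colour term dominates, so segments follow spectral boundaries; with a large compactness the spatial term dominates and segments approach a regular grid.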
The goal was to segment each of the 25 AOIs so that the segments could capture even the smallest garbage areas. To determine which parameter values would generate meaningful segmentation in our AOIs, a supervised sensitivity analysis was applied. After a brief inspection of the SLIC function and how it performs with our data, we systematically tested the following values for the "number of segments": 2,000, 4,000, 6,000, 8,000, 10,000, 12,000, and 14,000. The following values were tested for "compactness": 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1. This analysis was performed on the randomly selected AOIs 1, 25, 6, 14, and 17, resulting in 560 segmentations.
Figure 5 illustrates the effect of different compactness values for a fixed number of segments (8,000 ns). The lower the compactness, the larger and less homogeneous the segments. For compactness > 1, the quality rate (QR) and the number of squared polygons stabilise until the values for both functions do not change further (for more detailed information, see Supplementary Material, Tables 1 and 2). After running all the segmentations, the resultant polygons were tested against the digitised garbage dataset. An accuracy metric and a measure of shape were used to estimate the goodness of the segmentation algorithm (Clinton et al., 2010).
QR was chosen as an accuracy metric because it considers false-positive errors. In this way, we can avoid segment sizes that cannot fit into a potential garbage object but are incorrectly identified as suitable for the task, i.e. we want to avoid a Type I error (Bramer, 2016). QR is an area-based measure that calculates the proportion between the intersection and union of digitised and algorithm-segmented polygons (Clinton et al., 2010; Weidner, 2008):

$$\rho_q = \frac{|R \cap S|}{|R \cup S|} \tag{1}$$

In equation 1, R refers to the polygons of the reference dataset, and S to the segments created by the SLIC segmentation that we want to evaluate. QR takes values ρ_q ∈ [0, 1], with 1 being the optimal segmentation (Weidner, 2008).

Figure 4. Training data categories defined in this study. There were four categories of urban residual waste ("sure", "half-sure", and "not-sure", which represent 100%, 50%, and <25% probability of being garbage, respectively, as well as "dispersed") and "non-garbage". The first row shows the orthotile subsets, while the second row shows the same location in Google Street View.
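As an illustration of the quality rate, the sketch below computes the ratio of intersection to union for two regions represented as sets of pixel coordinates. Representing polygons as rasterised pixel sets is a simplification chosen for this example; the study compared vector polygons, and the helper names are hypothetical.

```python
def quality_rate(reference, segment):
    """Area of intersection divided by area of union of two pixel sets."""
    reference, segment = set(reference), set(segment)
    union = reference | segment
    if not union:
        raise ValueError("both regions are empty")
    return len(reference & segment) / len(union)

def rectangle(row0, col0, height, width):
    """Pixel coordinates of an axis-aligned rectangle."""
    return {(r, c)
            for r in range(row0, row0 + height)
            for c in range(col0, col0 + width)}

reference = rectangle(0, 0, 4, 4)   # digitised garbage polygon, 16 pixels
segment = rectangle(2, 0, 4, 4)     # segment shifted by two rows, 8 pixels overlap
qr = quality_rate(reference, segment)   # 8 shared pixels out of 24 in the union
```

A segment that matches the reference exactly gives a QR of 1, while a segment that is much larger than the reference is penalised through the growing union.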
The SLIC segmentation produced squared segments for high values of the compactness parameter. In this case, homogeneity no longer plays a role (Figure 5). Since we wanted segments that considered the influence of the spectral signature, we excluded the polygons with four right-angle vertices using formula 2:

$$P_i = 4\sqrt{A_i} \tag{2}$$

where A_i equals the area of each segment i. For every combination of ns and c, if P_i equals the perimeter of the segment, it is considered a "perfect square" or a segment with four right-angle vertices. The proportion of polygons X that are not squares was calculated as X = 1 − P, where P in this expression denotes the proportion of perfect squares. Both indexes, X and ρ_q, were summed. For the classification step, three of the highest values of X were selected (Table 1).
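One reading of the square test described above is that a segment counts as a "perfect square" when its perimeter equals four times the square root of its area. The sketch below implements that reading; the tolerance, the function names, and the representation of segments as (area, perimeter) pairs are assumptions for illustration and may differ from the original computation.

```python
import math

def is_square(area, perimeter, rel_tol=1e-6):
    """True if the perimeter matches that of a square with the same area."""
    return math.isclose(perimeter, 4 * math.sqrt(area), rel_tol=rel_tol)

def non_square_proportion(segments):
    """Proportion X of segments that are not perfect squares.

    segments: iterable of (area, perimeter) pairs.
    """
    segments = list(segments)
    squares = sum(is_square(area, perimeter) for area, perimeter in segments)
    return 1 - squares / len(segments)

# Two squares (4 x 4 and 3 x 3) and two non-square rectangles (2 x 8 and 3 x 4).
example = [(16, 16), (16, 20), (9, 12), (12, 14)]
x_value = non_square_proportion(example)
```

A value of X close to 0 indicates that a parameter combination has degenerated into a regular grid, whereas a value close to 1 indicates that segment shapes still follow the image content.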

Factors and classification
To test if the garbage accumulations could be detected in aerial imagery with machine learning approaches, a combination of many framework conditions or factors was chosen, namely (Figure 6):
• 25 AOIs in scene subsets taken from five orthotiles.
• Three appropriate sets of parameter values for SLIC segmentation (see Table 1).
• Five combinations (A-E) of garbage categories or treatments.
• Two building conditions: Each raster subset or AOI was classified as a whole or without buildings. In the second case, we used a building footprint to clip the rooftops of dwellings out of the scene.
• Six statistical metrics: For each segment, six metrics were calculated for each of the four bands separately. These are minimum pixel value, maximum pixel value, mean, variance, skewness, and kurtosis. The metrics provided a higher-dimensional feature space for classification than only the pixel values.
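The six per-band metrics listed above yield 24 features per segment. The sketch below shows one way to compute them with the Python standard library. The use of population (biased) moments, the definition of kurtosis as excess kurtosis, and the band names are assumptions; the text does not specify which estimators were used.

```python
BANDS = ("blue", "green", "red", "nir")

def segment_metrics(values):
    """Minimum, maximum, mean, variance, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3 if m2 > 0 else 0.0  # excess kurtosis
    return [min(values), max(values), mean, m2, skewness, kurtosis]

def segment_features(bands):
    """Concatenate the six metrics of each band into one feature vector.

    bands: dict mapping band name to the pixel values inside one segment.
    """
    features = []
    for name in BANDS:
        features.extend(segment_metrics(bands[name]))
    return features

# Hypothetical pixel values of one small segment.
pixels = {name: [1, 2, 3, 4, 5] for name in BANDS}
features = segment_features(pixels)
```

Each segment is thus described by a fixed-length vector regardless of how many pixels it contains, which is what allows segments of different sizes to be fed to the same classifier.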
The combination of the factors mentioned above produced 750 classifications. Each subset was classified using a random forest (RF) approach. RF is an algorithm that performs a classification using decision trees. This classifier was trained with samples selected randomly and with replacement. Roughly two-thirds of these samples were used to create decision trees, while the remaining third was used to validate these trees, in other words, to measure the model's accuracy.
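The two-thirds/one-third proportion follows from sampling with replacement: when n samples are drawn n times, on average about 36.8% of them are never selected and remain "out of bag". The simulation below illustrates this property with the standard library. The sample size of 350 (70% of 500) and the seeds are illustrative assumptions, and the code does not reproduce the classifier itself.

```python
import random

def out_of_bag_fraction(n_samples, seed=0):
    """Fraction of samples never drawn in one bootstrap of size n_samples."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples

# One bootstrap sample per tree, for 500 trees as in the study.
fractions = [out_of_bag_fraction(350, seed=tree) for tree in range(500)]
mean_oob = sum(fractions) / len(fractions)
```

The mean out-of-bag fraction converges to (1 − 1/n)^n, which approaches 1/e for large n, so each tree is validated on roughly a third of the training samples.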
Two variables must be defined for the model: the number of decision trees and the number of variables considered at each split within a tree. For this study, RF was chosen for the following reasons: its high computational speed and accuracy, it does not assume a normal distribution, and its implementation is quite simple, since it requires only setting up two parameters (Belgiu & Drăguţ, 2016; Breiman, 2001). It has been successfully used in a wide range of remote sensing data, from low to VHR images, in combination with other products, to detect land cover classes (for an overview, see Belgiu & Drăguţ, 2016). RF was implemented using the "RandomForestClassifier" class from the scikit-learn package in Python (van der Walt et al., 2014). The model was applied using 500 trees and bootstrapped samples. Using the training data, the model was fit using the metadata of the segments belonging to each of the classes mentioned above and their corresponding labels.

Accuracy assessment
The estimation of the classification accuracy was grouped into different categories: the segmentation values used, the type of waste, and the presence or absence of building footprint. A confusion or error matrix per group was calculated, and the following indexes were measured: overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and kappa coefficient. Furthermore, PA and UA were summarised in the F-score for better readability of the results. The F-score is defined as the "harmonic mean between precision P and recall R" (Dalianis, 2018), or PA and UA, and it is given by formula 3:

$$F = \frac{2 \cdot P \cdot R}{P + R} \tag{3}$$

The confusion matrix summarises how the samples from the test dataset correspond to the categories of the same pixels from the classified image (Bramer, 2020). The correctly classified pixels, related to the total number of pixels evaluated, correspond to OA (Congalton, 2001). The kappa coefficient measures the agreement between the classified image and the reference data and has a value range of [−1, 1]. The closer the value is to 1, the higher the agreement between the classified image and the reference dataset (Congalton, 2001). To get an idea of the performance of each class, PA and UA were calculated (Story & Congalton, 1986). The PA measures the "errors of omission" or the probability of a class being correctly classified, in other words, how well the algorithm predicted every class. On the other hand, the UA measures the "errors of commission" or the probability that the classification is what, in reality, is happening in the area studied, in other words, how reliable it is (Congalton, 2001; Story & Congalton, 1986).
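The accuracy indexes described above can all be derived from a confusion matrix. The sketch below does so for a matrix given as nested lists, using the standard definitions of OA, PA, UA, F-score, and Cohen's kappa. The orientation of the matrix (rows as reference classes, columns as predicted classes), the function name, and the example counts are assumptions for illustration.

```python
def accuracy_report(matrix):
    """Accuracy indexes from a square confusion matrix.

    matrix[i][j] counts samples of reference class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    reference_totals = [sum(row) for row in matrix]
    predicted_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    oa = sum(diagonal) / total
    pa = [d / r if r else 0.0 for d, r in zip(diagonal, reference_totals)]
    ua = [d / p if p else 0.0 for d, p in zip(diagonal, predicted_totals)]
    f_score = [2 * p * u / (p + u) if p + u else 0.0 for p, u in zip(pa, ua)]

    # Agreement expected by chance, used by Cohen's kappa.
    chance = sum(r * p for r, p in zip(reference_totals, predicted_totals)) / total ** 2
    kappa = (oa - chance) / (1 - chance) if chance < 1 else 0.0
    return {"OA": oa, "PA": pa, "UA": ua, "F": f_score, "kappa": kappa}

# Hypothetical two-class example: garbage versus non-garbage.
report = accuracy_report([[40, 10],
                          [5, 45]])
```

In this hypothetical example 85 of 100 samples lie on the diagonal, chance agreement is 0.5, and kappa is therefore 0.7.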

Results
In the following section, we describe in detail the results and accuracy metrics for the various steps of the workflow. We found that small and heterogeneous urban residual waste can be identified in VHR aerial images using an RF classifier with high accuracy.

SLIC segmentation
For the sensitivity analysis of the SLIC segmentation, the higher the ns and c values, the higher the QR (0.017 to 0.354; see Supplementary material). Segmentations with <8,000 ns produced very big superpixels. For example, polygons with [2,000 ns, 0.001c] and [4,000 ns, 0.001c] were huge and yielded lower QR values. These values are located in the upper left light region of the heatmap (Figure 7). Since vast segments are not suitable for detecting garbage polygons, they were excluded.
On the other hand, high compactness levels affected the shape of the produced segments. The estimation of the proportion of square polygons per scene showed that approximately 99% of the segments with c ≥ 10 were perfect rectangles. In this case, the shape and homogeneity information of every superpixel were lost. Moreover, the accuracy of the segmentations with >12,000 ns and >10c was the same, which means the function reached a plateau after these values. These QR and shape values were primarily found in the lower right diagonal of the heatmap (Figure 7). After discarding all non-suitable values, three parameter combinations were selected (see Table 1). These three combinations of SLIC function parameters were used to segment the 25 AOIs for the analysis in this study (Figure 8).
The choice of the values of the SLIC segmentation parameters, ns and c, influenced the classifications. In general, the segmentation 8,000 ns − 0.3c performed the best in terms of OA. The average OA of the segmentation 8,000 ns − 0.3c was 80.18%, followed by 10,000 ns − 0.1c with an OA of 77.95%, and 12,000 ns − 0.1c with an OA of 75% (see detailed values in Table 2).

Building footprint
The accuracy of the resulting updated building footprint was evaluated using official cadastral building data, yielding 80% accuracy. Specifically, this building footprint had an accuracy of F1: 0.92, precision: 0.89, and recall: 0.94. Adding a building footprint increased OA on almost all treatment and segmentation combinations (Figure 9). This graph shows the difference in OA between the classifications with and without the building footprint mask. The OA is generally higher in the results where the classification is limited to areas outside building rooftops (from 71.53% to 95.76%) (Table 2). Otherwise, OA was lower (from 59.51% to 90.18%).
A closer look at every class shows that the F1 score tends to be higher without the building rooftops, especially for "non-Garbage", "Sure", "Dispersed", and "Not-Sure" (Figure 10). However, the class "Half-Sure" often performed better in the presence of rooftops (for treatments B and D).
When the buildings were removed, the algorithm located residual waste objects mainly on the sidewalks, where the garbage is usually dumped. Moreover, the classifier seldom identified residual waste objects in open areas, such as in the middle of streets, rivers, and vegetation. This confirms the plausibility of our classification results.

Identification of urban residual waste categories
The algorithm and combination of different factors evaluated in this study can separate the defined solid waste classes from the nG segments in the selected study areas. However, differentiating the diverse classes proved to be a more difficult task. The detection of urban waste was successfully achieved with the probability class "Sure": treatment A performed best, followed by treatment B. Treatment A had an OA ranging from 79.62% to 95.76%, whereas the latter had an OA ranging from 73.95% to 90.18%. On the other hand, the "non-Garbage" category scored the highest UA and PA in all treatments (>70%) (Figure 10). In other words, the absence of solid waste was the most accurate result obtained.
Table 3 shows the confusion matrix of the classification of the 25 AOIs using segments created with the SLIC parameters 8,000 ns and 0.3c. "Non-garbage" is the class best identified on the orthotiles, with UA = 92.21% and PA = 88.96%. In this example, the class "Sure", with UA = 56.81%, is the second most reliable one. The model can identify and differentiate it from "Non-garbage" and "Dispersed". Approximately 50.22% of the objects classified as "Dispersed" were correctly identified, but only 13% of the scattered solid waste piles found were actually "Dispersed". "Not-sure" and "Dispersed" showed the lowest UA values, which indicates high commission errors, that is, segments wrongly assigned to these classes. Finally, the classifier performs best when all solid waste samples are evaluated together against non-garbage.
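For readers less familiar with these indices, the sketch below shows how OA, UA, and PA are derived from an error matrix. The two-class matrix and its counts are hypothetical and serve only to illustrate the arithmetic; they are not taken from Table 3.

```python
def accuracy_metrics(matrix, classes):
    """Overall, user's, and producer's accuracy from an error matrix.

    matrix[i][j]: number of segments classified as class i (rows)
    whose reference class is j (columns).
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    ua, pa = {}, {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                       # all segments mapped to this class
        col_total = sum(matrix[r][i] for r in range(n))  # all reference segments of this class
        ua[name] = matrix[i][i] / row_total  # 1 - commission error
        pa[name] = matrix[i][i] / col_total  # 1 - omission error
    return correct / total, ua, pa

# Hypothetical counts, for illustration only.
classes = ["Sure", "Non-garbage"]
counts = [
    [40, 10],    # classified as "Sure"
    [20, 130],   # classified as "Non-garbage"
]
oa, ua, pa = accuracy_metrics(counts, classes)
print(oa, ua["Sure"], pa["Sure"])
```

In this toy case the UA of "Sure" (0.80) exceeds its PA (about 0.67): most segments labelled "Sure" are correct, but a third of the reference "Sure" segments are missed.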
The classification results are influenced by the choice of methods and the properties of the data. Therefore, we explored the average spectral signature of each class. The "Half-sure" class exhibits, on average, the highest spectral reflectance values in all bands (Blue: 146.49 ± 35.55, Green: 160.36 ± 32.65, Red: 161.48 ± 30.69, NIR: 108.98 ± 32.66). In addition, its reflectance distribution was easily differentiated from the signatures of all other classes. In contrast, the average spectral signature of the class "Sure" did not overlap with the nG classes only when the building footprint was masked. The average values of the classes "Dispersed" and "Not-Sure" were similar, and their spectral signatures overlapped. "Non-Garbage" elements have lower reflectance values in the training data without building footprints. In general, the spectral signature that was best differentiable from the nG datasets was that of the class "Half-sure", followed by "Sure" without the influence of the rooftops (Table 4). For more information, see Figure 1 of the supplementary content.
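One simple way to express such overlap is to check, band by band, whether the mean ± one standard deviation intervals of two classes intersect. The sketch below assumes this one-sigma criterion, which is not necessarily the criterion used in this study. The "Half-sure" values are those reported above; the comparison class is hypothetical.

```python
def overlapping_bands(sig_a, sig_b):
    """Bands in which the mean ± 1 SD intervals of two classes intersect.

    Each signature maps a band name to a (mean, standard deviation) pair.
    """
    return [
        band
        for band in sig_a
        if abs(sig_a[band][0] - sig_b[band][0]) <= sig_a[band][1] + sig_b[band][1]
    ]

# Values reported in the text for the class "Half-sure".
half_sure = {
    "Blue": (146.49, 35.55),
    "Green": (160.36, 32.65),
    "Red": (161.48, 30.69),
    "NIR": (108.98, 32.66),
}
# Hypothetical comparison class, for illustration only.
hypothetical_class = {
    "Blue": (80.0, 25.0),
    "Green": (85.0, 25.0),
    "Red": (90.0, 30.0),
    "NIR": (70.0, 20.0),
}
print(overlapping_bands(half_sure, hypothetical_class))
```

For these inputs only the NIR intervals intersect, so the two classes would be considered separable in the visible bands under this criterion.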

Discussion
In our experiments, we showed that it was possible to locate residual waste on urban roads in high-resolution aerial imagery. This was possible with an accuracy of up to 80-90% for the class "Sure" when the rooftops were masked, although objects in the class "Half-Sure" were also detected when the entire scene was used.

Effects of segmentation on classification
The sum of QR and the rate of non-square polygons shown in Figure 8 combines the spectral and shape information in one index. The rate of non-square polygons allowed us to exclude segmentations that had a high QR but a compactness so high that homogeneity no longer played a role. Since solid waste objects do not always have the same appearance and size, the selected values were optimal for the segmentation process. The selected segmentations produced small, non-square segments that fit into the training garbage areas. Many small segments combined had a higher chance of overlapping with any possible garbage object, increasing the possibility of identifying garbage areas of any shape and size in an image.
The selected parameter values ns and c provide the best balance among all the variables defining the shape and size of superpixels that can detect garbage objects or parts of them. However, it is essential to highlight that these chosen values are specific to our data for several reasons: the size of each AOI raster (i.e. 0.25 km²); the size, shape, and spectral properties of the objects to be found on the image; the spectral information and resolution (i.e. bands red, green, blue, and near-infrared); and the pixel and ground sampling size (i.e. 8 cm pixel size), as well as other factors that affected the scenes, such as differences in light intensity or quality, errors in the mosaicking, or shadows. Nevertheless, by applying a sensitivity analysis using the SLIC algorithm, we developed a systematic and non-subjective method to overcome these challenges and choose suitable values for the final segmentation. Here, we highlight the importance of evaluating other parameter values, such as the second and third best, instead of only the first. Since urban waste objects present high spectral variability, this approach increases the chances of detection.

Effects of the building footprint on classification
Training the model without the building footprint improved the classification for several reasons. First, after removing the buildings, we identified waste within public spaces. Second, the remaining area contains fewer land cover classes. A visual inspection of the images without the building footprint indicated that streets, vegetation, bare soil, and water were primarily present. Finally, the spectral information of the study areas with and without rooftops was very different (Table 4). The signature of the class "Sure" overlapped with "non-Garbage" when rooftops were included. This could be due to the colours of the rooftops, which resemble garbage elements. Therefore, removing the building footprint allowed better identification of the class "Sure". The other solid waste classes have an average spectral signature higher than nG, making them easier to differentiate.
The fact that the accuracy of the classes "Half-Sure", "Not-Sure", and "Dispersed" was not always improved when masking the rooftops could have different explanations: (1) how the class was defined, or how the objects were assigned to this class; (2) how the training data for these classes were created, because these objects were not as easily identified as solid waste as those of the class "Sure"; (3) the spectral signatures of "Not-Sure" and "Dispersed" presented a high overlap, so another feature space could be considered in future analysis (for example, the inter-channel correlation could enhance the distinction between classes); and (4) the algorithm might perform better in identifying single classes than mixtures of them. A future step would be to evaluate those classes independently (similar to Treatment A).
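The inter-channel correlation mentioned under point (3) could, for instance, be computed per segment as the Pearson correlation between the pixel values of each pair of bands. The following is a minimal sketch of that idea only; it was not part of the analysis in this study, and the function names and pixel values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long value lists.

    Assumes both lists have non-zero variance (a constant band would
    cause a division by zero and needs separate handling).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def interchannel_correlations(segment_pixels):
    """Correlation for every pair of bands within one segment.

    segment_pixels: dict mapping band name -> list of pixel values.
    """
    bands = list(segment_pixels)
    return {
        (a, b): pearson(segment_pixels[a], segment_pixels[b])
        for i, a in enumerate(bands)
        for b in bands[i + 1:]
    }

# Hypothetical pixel values of a single segment.
segment = {
    "Blue": [10, 20, 30, 40],
    "Green": [12, 22, 32, 42],
    "NIR": [40, 30, 20, 10],
}
features = interchannel_correlations(segment)
print(features[("Blue", "Green")], features[("Blue", "NIR")])
```

Each pairwise value could then be appended to the per-segment statistics as an additional feature for the classifier.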

Identification of garbage and non-garbage areas
The model was successful at identifying what is not garbage, as well as the category "Sure". Visually speaking, the class "Sure" was very homogeneous because it was primarily composed of the same types of objects, mostly plastic bags. Hence, objects containing diverse elements, not packed in the usual white or black plastic garbage bags, scored lower accuracy. The UA was mostly higher than the PA, indicating how many segments identified as "Sure" waste genuinely belonged to this category of garbage (Congalton, 2001). Other classes and treatments scored lower in UA and PA, which denoted how difficult it was to distinguish them from nG or other classes. When classes such as "Dispersed" and "Not-sure" were included in the treatments, the accuracy dropped. These classes were the most difficult to detect and classify correctly. This could be due to the nature of the objects, i.e. the semantic information used to label those elements on the streets as one class or the other. Without these classes, the model can still identify the typical litter, clearly wrapped in bags, dumped along the streets or piled against an electricity pole.
The average reflectance of these categories overlapped significantly in the blue and green bands (Table 4). Including band combinations of blue and green could be a way to identify these classes. The fact that the algorithm performed worse when including these classes could be due to the identification of the elements that belonged to the class itself, to the algorithm chosen, or to the variability of the spectral signature of that class. Difficulties distinguishing solid waste from bare soil have been previously reported. Yonezawa (2009) struggled to identify garbage over ground without vegetation using multispectral Quickbird data.
During the creation of training data, garbage areas were sometimes challenging to distinguish from other objects on the scene, which were difficult to identify or assign to any class. Sometimes, it was clear that a specific object belonged to a residual waste category, but its appearance differed from other objects of the same class. At other times, objects on the scene looked similar to urban residual waste, for example, motorbikes, car windows, shadows, street drains, heads of pedestrians, or other types of garbage not previously identified. There might also be other solid waste categories in the images that we could not detect due to their size or appearance. These features influence the reliability of the final classification, which can be seen in the images in Figure 11. Most misclassifications involved some of the objects mentioned above, which were identified as solid waste and were primarily located alongside the streets.
While there were designated centres for waste collection and recycling in the city, our selected AOIs did not overlap with them. Nevertheless, our model could also identify the garbage inside these locations. Where AOIs do overlap with them, these locations could be excluded to generate an accurate view of the illegal dumps. Besides the official centres, there are also authorised locations for solid waste accumulation, which our model can detect. This method can be used to validate illegal dumping zones if combined with ground truth data.

Challenges
Due to the spatial resolution of the orthotiles, it is only possible to detect objects larger than the sampling size (64 cm²). This implies that the model cannot detect small solid waste elements thrown on the streets, such as plastic bottles or cigarette butts. However, when local media report illegal dumping on the streets, this includes big plastic bags of domestic waste. Therefore, this model contributes to the detection of a significant component of littering in Medellín.
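The sampling size follows directly from the 8 cm pixel size. The short calculation below, a sketch using only the pixel size and the AOI area of 0.25 km² stated in this article, also gives the resulting number of pixels per AOI; the variable names are illustrative.

```python
GSD_CM = 8            # ground sampling distance of the orthotiles (cm)
AOI_AREA_KM2 = 0.25   # area of one AOI raster (km²)

pixel_area_cm2 = GSD_CM ** 2                        # area covered by one pixel
aoi_area_cm2 = AOI_AREA_KM2 * 1_000_000 * 10_000    # km² -> m² -> cm²
pixels_per_aoi = aoi_area_cm2 / pixel_area_cm2

print(pixel_area_cm2)        # 64
print(int(pixels_per_aoi))   # 39062500
```

Any object smaller than one 64 cm² pixel contributes at most a fraction of a single pixel value, which is why bottles or cigarette butts cannot be resolved.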
The temporal resolution of the images, one record per year, provides only a snapshot of the city. When we apply the waste detection model to these images, we see the city's condition at a single moment, which might not represent a whole year. Therefore, it is impossible to quantify how much solid waste can be found on the streets of Medellín.
Another aspect of the temporal scale is the comparison with photos from other dates. The orthotiles were recorded in 2019, while the Google Street View photos span from 2016 to 2021. Comparing the identified locations with Google Street View did not necessarily indicate that the dumping zones were permanent. However, if certain spots were visible at different time stamps, this might indicate that some illegal dumping occurred regularly, as the local media reported. Nevertheless, the model can detect solid waste, and if more images are available, more accurate temporal quantification could be possible.
The definition of the classes might have affected the capacity of the classifier to identify them. As previously mentioned, sometimes the waste objects looked similar to other elements in the scene. We focused on garbage dumped in large plastic clusters because the average citizen discards all types of garbage in one bag, regardless of which category it belongs to (Peralta Miranda et al., 2019). Therefore, the classes were designed based on the probability of being garbage and not on its content.
Further research on this topic can benefit from many lessons learned, such as focusing on a garbage vs non-garbage classification, including multi-temporal information, or testing other AI algorithms. More training data of adequate quality are necessary for better classification performance. For example, Thung and Yang (2017) first built a dataset of images called TrashNet before training a model. The final algorithm helped identify and sort trash in a recycling process. More extensive and diverse training data on solid waste could also improve the detection of dumping zones.
Recently, more studies have been conducted in the SWM field using DL. As an artificial neural network method, it can handle unbalanced or incomplete datasets (Abdallah et al., 2020). This would be suitable for our case study, since most elements on a scene are not solid waste. This could contribute to excluding objects that can be confused with garbage because of a similar appearance. DL methods can handle larger amounts of nonlinear, complex data faster. However, they are prone to overfitting and will not necessarily improve accuracy compared to decision tree models like the RF tested here (Abdallah et al., 2020).

Socioeconomic applications
The model proposed in this study contributes to identifying the areas in which SWM collection may be failing. The model demonstrates that it is possible to detect solid waste wrapped in bags and dumped in urban areas. Furthermore, comparing imagery at different times can show which areas are most affected by littering and how it changes over time. Machine learning approaches are not restricted to this phase. Several authors have also contributed to other phases, such as waste bin detection, collection routing optimisation, waste classification for recycling, model parameters of the composting process, and landfill location, among others (Xia et al., 2021).
The local municipality of Medellín reports recurrent disposal of solid waste at unauthorised sites next to a designated dumping site, or even right after the garbage was collected (El Tiempo, 2022). In other words, the problem is not only the garbage disposed of at unauthorised locations, but also that citizens dispose of it right after the regular municipal collection. Due to the impact of residual waste on health and quality of life (Medina, 2010), people need to dispose of their garbage, regardless of whether the local government has an efficient or effective residual waste management system. Using imagery from different times of the day, this model could also identify the zones where people dump their waste outside authorised times.
Classification of images for solid waste can be applied to aerial imagery or camera surveillance (Ping et al., 2020). The local municipality of Medellín has recently implemented a machine called "Robocop", which performs real-time visual recognition and image classification on camera recordings of people who dump their waste at unauthorised sites. The machine "speaks" with the citizen and reports the information to the corresponding office. As a result, people are discouraged from repeating this behaviour. A further step would be to couple this technology with critical dumping zones identified from remotely sensed data.
Garbage collection efficiency might be related to the socioeconomic level of the district (Galvis Gonzalez, 2016). A visual inspection of the orthotiles used in this model shows that districts categorised as middle class or higher seldom have litter on the streets. The failure to provide an adequate residual waste management service could have many reasons: budget, political will, illegal actions of residents, infrastructure, and terrain, among others. Many slums are located in difficult-to-access areas: streets may be narrow or unpaved, or the location may be very steep, hilly, or far away from the disposal centre (Sliuzas & Kuffer, 2008).
Moreover, the residents usually pay the waste management costs via taxes or, in some cities like Bogotá, through the electricity bill (Medina, 2010). Since many slum dwellers do not pay taxes (Medina, 2010) or are not registered users of the electricity grid service, they do not contribute to this service, which worsens their quality of life. The poor thus become poorer. The application of this model to the entire city of Medellín can also highlight socioeconomic and political problems. Whether the community is rich or poor, or whether dumping is legal or illegal, the disposal of litter in public areas poses a hygiene threat to all citizens (Du et al., 2021).
The implementation of this method requires the generation and processing of orthophotos. However, once automated, this process can be more economically efficient than humans patrolling the streets. With this approach, we hope to generate more knowledge about ineffective waste management and its solutions.

Conclusions
This study aimed to test a combination of methods with a conceptual definition of solid waste to detect residual urban waste in Medellín, Colombia. For this purpose, several possible combinations of residual waste categories, segmentation, and the presence or absence of a building footprint were tested on orthorectified aerial images of Medellín. The methodology for this study focused on statistical robustness, hence the systematic selection of segmentation parameters, the balanced number of samples, and the evaluation of many combinations of factors.
The research methods applied in this study can identify the presence of solid waste. While the model struggles to differentiate among the categories of garbage, especially "Dispersed" and "Not-sure", it can detect with high accuracy where objects of "Sure" solid waste are disposed of on the streets. In general, the method proved capable of detecting the random waste littered on the streets, serving as valuable information for decision-makers working to enhance SWM.
Accumulations of residual waste in the urban environment are a known public hygiene problem highly correlated with urban poverty (Medina, 2010). Although this research did not explicitly detect the location of poverty, it contributed a method to determine areas in a city affected by a public health issue. Future research should further develop and confirm these initial findings to provide a proxy for urban poverty based on sanitation using remote sensing data.

Figure 1. Relative location and overview of the study area, Medellín. The administrative borders of the different neighbourhoods are shown in green.

Figure 2. Vector and raster datasets used in this project: a) shapefiles clipped to the extent of the 25 AOIs, b) shapefile of building and rooftop footprint, and c) orthotiles, 8 cm GSD, 2019, clipped to the ROI.

Figure 3. A subset of an orthotile: a) with buildings and b) without building footprints.

Figure 6. Workflow: A sensitivity analysis of six AOIs was performed, and the building footprint was created. The input data preprocessing involved the following datasets and steps: a) five orthotiles, b) 25 AOIs, c) three SLIC segmentation parameter combinations resulting from the sensitivity analysis, d) five treatments or combinations of garbage categories, and e) two building conditions, with or without the building footprint. All possible combinations of factors [b:e] were calculated. Six statistical metrics were calculated and integrated into the model for each segment. The resultant input dataset was divided into training and testing. This model was evaluated for a total of 750 classifications; later, the accuracy was estimated.
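The total of 750 classifications follows from crossing factors b) to e). A minimal sketch of this enumeration is given below; the AOI identifiers and the labels "wB"/"woB" are placeholders, the treatment letters A to E follow the treatments named in the text, and the three segmentations are the parameter pairs reported in the results.

```python
from itertools import product

aois = range(1, 26)                                      # 25 AOIs (placeholder ids)
segmentations = [(8000, 0.3), (10000, 0.1), (12000, 0.1)]  # (ns, c) pairs
treatments = ["A", "B", "C", "D", "E"]                   # five treatments
building_conditions = ["wB", "woB"]                      # with / without footprint

# One classification run per combination of the four factors.
runs = list(product(aois, segmentations, treatments, building_conditions))
print(len(runs))  # 25 * 3 * 5 * 2 = 750
```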

Figure 7. Heatmap of the sensitivity analysis showing the combination of the QR with the proportion of non-square polygons. The upper left corner represents segments with low QR and a high proportion of non-square polygons. The bottom right corner represents segments with high QR and a low proportion of non-square polygons. The optimum values are shaded in dark orange.

Figure 11. Detailed view of some examples of the classification performance with the corresponding F1 score. The segments related to the class in the label are highlighted in yellow. Since some classifications involved more than one class, the segments of the other solid waste categories are shown in black. The examples are from the following combinations of treatments, building footprint, and segmentations: a) class "Sure", treatment A, with building footprint, 8,000 ns − 0.3c, b) class "Sure", treatment A, without building footprint, 10,000 ns − 0.1c, c) class "Half-sure", treatment B, with building footprint, 8,000 ns − 0.3c, d) class "Half-sure", treatment B, without building footprint, 8,000 ns − 0.3c, e) class "Dispersed", treatment C, with building footprint, 8,000 ns − 0.3c, f) class "Dispersed", treatment C, without building footprint, 12,000 ns − 0.1c, g) class "Not-sure", treatment E, with building footprint, 10,000 ns − 0.1c, h) class "Not-sure", treatment E, without building footprint, 10,000 ns − 0.1c.

Table 1. Selected values from the sensitivity analysis using the sum of the QR and the proportion of polygons that are not squares (NSP) in each vector scene. NS is the number of segments, and C is compactness.

Table 2. Overall accuracy (OA) and kappa for all treatments (T), number of segments (NS), compactness (C), and presence (wB) or absence (woB) of the building footprint.

Table 3. Estimated error matrix of a classification with 8,000 ns and 0.3c for all garbage and non-garbage classes. Overall accuracy (OA), user accuracy (UA), and producer accuracy (PA) calculations are included. The upper part of the table shows detailed information on every index per class. The lower part of the table shows summarised information for all solid waste classes.

Table 4. Overall mean and standard deviation of the spectral signature (reflectance) of each classification category for all four bands. wB: nG dataset with buildings; woB: nG dataset without buildings.