Automated urban tree survey using remote sensing data, Google street view images, and plant species recognition apps

ABSTRACT Urban tree inventories have mostly focused on information at the level of individual trees because this allows city authorities to plan urban forestation efficiently. However, single-tree urban inventories are expensive for municipalities, so inventories often lack detail and are out of date. In this work, we integrate two online applications for automatic species identification with worldwide coverage, Pl@ntNet and Plant.Id, with Google Street View (GSV) images in order to perform cost-effective urban tree inventories at the single-tree level, and we evaluate the performance of the two applications through comparison with a locally trained neural network using an appropriate set of metrics. Plant.Id gave the best performance, correctly identifying plants in the city of Prato with a median balanced accuracy of 0.73 and better performance for the most common species: Pinus pinea 0.87, Tilia × europaea 0.87, Platanus hybrida 0.89. The proposed method also has a limitation: trees within parks, walking paths, and private green areas cannot be photographed and identified because Google cars cannot access them. A possible solution is to combine GSV images with spherical photos taken by light unmanned aircraft.


Introduction
Urban tree inventories are a key tool for urban planning, especially in the context of global climate change trends (Padayachee et al., 2017). They have mostly focused on single-tree information rather than on surveying forest stands (Östberg et al., 2013), because information on individual trees allows city authorities to plan urban forestation efficiently in terms of species selection, risk management, and subsequent replanting decisions (Keller & Konijnendijk, 2012). However, urban single-tree inventories are generally expensive for municipalities, so, despite their importance, inventories lack detail and are often out of date due to the costs associated with mapping and monitoring trees over time and over large areas (Nielsen et al., 2014). Nielsen et al. (2014) distinguish four types of inventories for data collection at the individual tree level: 1) satellite remote sensing, 2) aircraft remote sensing, 3) field scanning or digital photography, and 4) field surveys with direct hand measurements and/or visual assessment.
Satellite or airborne remote sensing methods can cost-effectively collect information over very large areas (Cook & Iverson, 1991; Small & Lu, 2006). Very high-resolution multispectral imagery can also be used to collect information at the individual tree level (Jansen et al., 2006). Combining multispectral and LiDAR data also makes it possible to segment individual tree crowns (Alonzo et al., 2014; Wallace et al., 2021).
Compared to remote sensing methods, data collection and processing from digital scanning (Patterson et al., 2011) or ground-based photography (proximate sensing) is limited to small areas because each scan/photo is limited to a single tree or a small group of trees. Although this technology is developing rapidly, it is still time consuming. Recently, methods based on images from Google Street View™ (GSV) have been developed to conduct virtual inventories of street trees (Berland & Lange, 2017; Barbierato et al., 2021).
Field survey methods include dendrometric and/or phytopathological surveys of individual trees by volunteers or professional staff (Adkins et al., 1997; Martin, 2011; Östberg et al., 2012). Although field surveys are labour-intensive and time-consuming, this inventory method is the most widely adopted (Nielsen et al., 2014).
Most of the more recent works follow the classic processing method based on artificial intelligence techniques: extract a small set of structure and shape features from the images and/or aerial Light Detection and Ranging (LiDAR) data, and train a classifier (e.g. Linear Discriminant Analysis, Support Vector Machines, or more recently Deep Learning) to distinguish among a small number of species: three in Korpela et al. (2010), Leckie et al. (2005) and Heikkinen et al. (2011); four in Heinzel & Koch (2011); seven in Waser et al. (2010) and Pu and Landry (2012). Recently, some authors (Branson et al., 2018; Ringland et al., 2021) applied tree detection and species recognition methods to publicly available Google Maps™ aerial and street view images using convolutional neural networks (CNNs). However, no studies have been conducted on the transferability of the methodology to other cities worldwide.
Over the past 10 years, much research on deep learning image recognition approaches for plant identification has been published (see e.g. Wäldchen & Mäder, 2018). The development of many trained convolutional neural networks stems from the Cross Language Evaluation Forum (CLEF) initiative (http://www.clefinitiative.eu/association), which since 2013 has included the LifeCLEF challenge to develop automatic identification systems for living organisms. The PlantCLEF subproject has focused on plant identification (Goëau et al., 2013; Cappellato et al., 2017), with experiments aimed at comparing the performance of "experts" with that of the best deep learning algorithms (Bonnet et al., 2018).
In this paper, we integrate automatic species identification apps with worldwide coverage and Google Street View images to perform cost-effective urban green inventories at the individual tree level. In detail, the objectives of the research are to: (1) identify a methodology to extract images of urban trees from GSV using LiDAR and multispectral data, without the need for other ancillary data; (2) classify plant species in GSV images through the two globally available applications that can be queried through a programming interface, Pl@ntNet and Plant.Id; (3) compare the classification performances of the two apps with those of a Convolutional Neural Network (CNN) trained on the study area, using an appropriate set of metrics; (4) evaluate the benefits and limitations of an automated urban green inventory integrating LiDAR data, GSV images, and tree species classification apps.

Study area
Prato (Figure 1) is a city in Tuscany (Italy) with 200,647 inhabitants and an area of 44.37 square kilometres, located at coordinates 43°52′50.93″N 11°05′47.62″E. Prato is the third largest city in central Italy after Rome and Florence, thanks to immigrants who arrived first from the countryside and then from southern Italy. The city's climate is characterized by rather cold and moderately dry winters and hot, sometimes sultry summers. The city of Prato has an urban greenspace of 62.56 square kilometres, of which 13.77 hectares are public urban trees. There are 3 protected natural areas in the city totalling 11.63 square kilometres (ISTAT, 2019). According to official statistics (ISTAT, 2019), Prato is one of the greenest cities in Italy with 14.2% urban green area, compared to 13.2% in Florence and 9.6% in Rome. According to the city's public green census (Figures 2 and 3), there are 147 different species in the city of Prato, and 27 species are present with at least 50 plants. The predominant species (Figure 2) is Pinus pinea, covering almost 27% of the total green area, followed by Tilia × europaea with 16% and Platanus × acerifolia with almost 10%.
The city of Prato is one of the most active Italian cities in urban green planning. Since 2021, Prato has adopted the "Prato Urban Jungle" reforestation plan. The project will redesign the city's neighbourhoods in a sustainable and inclusive way, developing high-density green areas that will be incorporated into the urban landscape, multiplying the natural ability of plants to abate pollutants and transforming areas of urban marginality into green places of well-being within the city. Urban jungles will be co-designed with the help of citizens, through shared urban planning facilitated by the use of digital platforms. Implementation of the plan will involve the planting of 190,000 trees in highly urbanized areas to create multifunctional ecological spaces and corridors that generate urban renaturalization processes.

Automated image recognition apps for plant identification
Over the past 20 years, much progress has been made in the development of image recognition/AI approaches for plant identification. Much effort has been focused on the Cross Language Evaluation Forum (CLEF) initiative, whose PlantCLEF subproject has focused on plant identification (Goëau et al., 2021), with many different research groups contributing their models from 2011 to 2022, all with the goal of comparing the performance of "experts" with that of the best deep learning algorithms (Bonnet et al., 2018). Image recognition technology is maturing so rapidly that numerous automatic plant identification apps are now available for personal computers, smartphones, and tablets, so it is worth considering the state of the art of this technology (Jakuschona et al., 2022).
The two apps chosen for the present research, Plant.Id and Pl@ntNet, have the following advantages: (a) availability of an Application Programming Interface (API) for personal computers; (b) good performance in recognizing photographic images of plants from standardized datasets (Jones, 2020; Jakuschona et al., 2022).
Pl@ntNet, made by a consortium including CIRAD, INRA, INRIA, IRD and the Agropolis Foundation, is a tool that supports image-based plant identification for amateurs and, especially, professionals. The model behind the API is updated monthly, both in terms of training data and of training architecture. Pl@ntNet's identification service is a RESTful JSON API, which returns the list of species corresponding to the query, each associated with the classification score emitted by the deep learning model. For each species, the scientific name, common name, and genus and family names of the identified plant are provided.
Plant.Id is a project developed by the FlowerChecker team, whose main goal is to facilitate the monitoring of invasive and endangered species for a wide range of use scenarios, from business to private use. The API is based on TensorFlow, Python and AWS technologies. For matching images, the API returns predictions about the species represented in the image and additional information about the species, such as potential plant diseases. With Plant.Id it is also possible to specify the geographic coordinates of the plant's location, which significantly improves the accuracy of classification. To further increase efficiency, the Plant.Id API allows multiple images of the same plant to be uploaded. The API returns the scientific and common names, and Plant.Id associates each proposed prediction with a prediction score. Plant.Id can also identify whether the plant is affected by a disease and provides additional details about it.

Remote sensing and other spatial data
The spatial data used in the research derive from remote sensing (multispectral orthophotos and data from a LiDAR survey) and from a field survey, namely the public urban green census of the municipality of Prato.
Both remotely sensed datasets were downloaded from the Tuscany Region's mapping portal. The multispectral aerial frames have four bands, covering the red, green, blue and near-infrared (NIR) regions, and were acquired in 2019 by means of an UltraCam Xp (Vexcel) digital metric camera, with a resolution of 0.2 × 0.2 m (Table 1). The spectral sensitivities are: panchromatic channel (PAN) from 410 nm to 690 nm, RED from 580 nm to 700 nm, GREEN from 480 nm to 630 nm, BLUE from 410 nm to 570 nm, and NIR from 690 nm to 1000 nm. The camera has a large image format of 17,310 pixels cross track and 11,310 pixels along track, with a 100 mm focal length optical system for the panchromatic camera heads and 33 mm for the multispectral camera heads.
The LiDAR data were provided by the Italian Ministry of the Environment, Land and Sea. The points acquired in this survey have an altimetric accuracy of ±15 cm and a planimetric accuracy of ±30 cm. In this work, the data made available through the geographical portal of the Tuscany Region with a resolution of 1 × 1 m were used. This resolution was considered satisfactory for the objectives of the work; if necessary, however, the proposed method could also be applied at more detailed scales. The LiDAR survey data consist of two datasets: a Digital Surface Model (DSM) and a Digital Terrain Model (DTM). All input rasters were aligned at 1 × 1 m resolution using a polynomial warp algorithm.
The census of public urban trees was conducted by GPS survey in 2017 and can be downloaded through the open data network of the municipality of Prato. The dataset uses point primitives: each point is associated with the scientific name of the plant and the coordinates of the trunk in the EPSG:3003 reference system. Figure 3 shows some small examples of the raster layers used.

Methods
The workflow followed in the research is as follows (see Figure 4):
1) Data production: a) identification of tree morphological parameters: (i) segmentation of tree crowns using a watershed segmentation algorithm for automatic extraction of tree crowns from LiDAR and Normalized Difference Vegetation Index (NDVI) data; (ii) extraction of tree geometric features (height, crown diameter, and x and y coordinates of the crown centre of gravity); (iii) ground identification of genus and species for trees visible from public roads. b) Harvesting of canopy images from Google Street View (GSV): (i) download of all GSV photo codes in the city of Prato; (ii) identification, for each tree, of the GSV codes of the nearest spherical photos; (iii) calculation of the azimuth and elevation angles of the canopy with respect to each GSV photo point; (iv) download of ground images of the crowns from GSV.
2) Processing: a) identification of tree genus and species with the Plant.Id and Pl@ntNet apps; b) identification of tree genus and species with a GoogLeNet CNN.
3) Validation: a) evaluation of classification efficiency by comparing app and CNN results with the tree census of the Prato municipality using a set of classification performance metrics.

Crown model segmentation
The basis for modern LiDAR-based forest measurements is the acquisition of three surfaces, namely the crown height model (CHM), digital terrain model (DTM), and digital surface model (DSM) (Hyyppä et al., 2017). To segment the crowns, we applied the geographic watershed algorithm to the CHM. The watershed algorithm is the typical procedure applied to CHMs to extract individual tree crowns of woody vegetation (Chen et al., 2006; Hyyppä et al., 2001). Based on the similarity between geographic reliefs and tree crown surfaces, the watershed segmentation approach (Jing et al., 2012; Silván-Cárdenas, 2012) is widely used to segment images for tree crown delineation. In applying the procedure to the canopy, it is first necessary to invert the filtered CHM so that the highest value becomes the lowest and vice versa. In order to apply the algorithm in an urbanized area, it was necessary to create a green mask to remove artificialized areas from the analysis. We created the mask by calculating the NDVI and adopting a threshold value of 0.6, identified from the literature related to the sensor used for imaging (Alvarez et al., 2010). In order to exclude areas of shrubs and grasslands, we added the condition that the Digital Height Model (DHM), calculated as the difference between DSM and DTM, should be greater than 3 metres. Formally, M = 1 where NDVI = (NIR − Red)/(NIR + Red) > 0.6 and DHM = DSM − DTM > 3 m, and M = 0 elsewhere, with M the binary mask, NIR the near-infrared band and Red the red band.
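The green-mask rule can be sketched in Python as follows (an illustrative translation of the masking conditions, not the GIS procedure used in the study; it assumes the bands and height models are co-registered numpy arrays, and the variable names are ours):

```python
import numpy as np

def green_mask(nir, red, dsm, dtm, ndvi_thr=0.6, height_thr=3.0):
    """Binary vegetation mask: NDVI above 0.6 AND canopy taller than 3 m.

    nir, red : near-infrared and red bands (float arrays, same shape)
    dsm, dtm : digital surface and terrain models, in metres
    """
    ndvi = (nir - red) / (nir + red + 1e-9)  # small epsilon avoids division by zero
    dhm = dsm - dtm                          # digital height model
    return (ndvi > ndvi_thr) & (dhm > height_thr)
```

The resulting boolean raster can then be used to blank out non-vegetated pixels of the CHM before segmentation.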
The CHM obtained from the mask analysis was subjected to a two-dimensional (2D) Gaussian filter: G(x, y) = 1/(2πσ²) · exp(−(x² + y²)/(2σ²)) (3), with G the Gaussian filter, σ the standard deviation, and r the window radius, fixed at a value of 2 pixels, following the methods proposed by Persson et al. (2002), Brandtberg et al. (2003) and Falkowski et al. (2006). By applying this Gaussian filtering to the CHM, smaller tree crowns were better outlined, while larger ones became more regular (Falkowski et al., 2006). For the watershed segmentation we used the SAGA software (Conrad et al., 2015). To identify the canopies of public green areas, we performed a map overlay operation with urban public areas. Through the canopy geodatabase, the maximum diameter of the canopy (maxCanopy) was calculated for each tree element based on the coordinates of the bounding boxes. Finally, through a map overlay operation with the CHM, we assigned the tree height (Htree). Thus, the final structure of the database is: Tree(TreeId; Species; Tlat; Tlon; maxCanopy; Htree) (5), with TreeId the tree identifier; Species the scientific name of the tree; Tlat and Tlon the coordinates of the centre of gravity of the canopy in EPSG:32632; maxCanopy the maximum diameter of the canopy; Htree the height of the tree.
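A minimal numpy sketch of the Gaussian smoothing step, assuming the kernel form and the 2-pixel window radius given above (the study performed this step in dedicated GIS software, not with this code):

```python
import numpy as np

def gaussian_kernel(sigma=2.0, radius=2):
    """Normalised 2-D Gaussian kernel, G(x, y) ∝ exp(-(x^2 + y^2) / (2 sigma^2))."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def smooth_chm(chm, sigma=2.0, radius=2):
    """Smooth a canopy height model by direct convolution with the kernel."""
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(chm, radius, mode="edge")  # replicate edges at the borders
    out = np.zeros_like(chm, dtype=float)
    for i in range(chm.shape[0]):
        for j in range(chm.shape[1]):
            out[i, j] = np.sum(padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1] * k)
    return out
```

Because the kernel is normalised, flat areas of the CHM are preserved while isolated height spikes are attenuated.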

Street level imagery
GSV is a Google service (Google, 2014) that provides 360° horizontal and 180° vertical panoramic views along streets, spaced 10-20 metres apart, and is available in most nations around the world. Through a specific Application Programming Interface, the Street View Static API (SWSAPI), square portions of the panoramic images can be downloaded. By specifying different parameters in the SWSAPI, users can download GSV images at different locations, direction angles, and pitch angles. Figure 5 shows the parameters needed to locate a specific portion of the panoramic views: heading indicates the azimuth angle (heading values range from 0 to 360), pitch specifies the elevation angle relative to the ground plane, and FOV determines the horizontal field of view of the image.
To identify the parameters of the SWSAPI related to urban trees in the study area, we used the following procedure.
By querying the SWSAPI with the metadata option, it was possible to obtain a points geodatabase with the Id of the panoramic photo and the geographic coordinates of the photo's shooting point closest to each tree surveyed on the ground.
To avoid obstacles standing between GSV's car and the tree, we performed a viewshed analysis using the digital height model obtained from the LiDAR data as the DEM and GSV's shot points as the observation points. The analysis was performed with the "Viewshed" module of the QGIS software, setting a search radius of 30 metres (considered appropriate to obtain images with satisfactory detail) and a GSV car height of 3 metres. Through the intervisibility layer, only trees visible from the shot points were selected. A GSV snapshot point database was then formed with the following structure: GSV(PanoId; Glat; Glon) (6), with PanoId identifying the GSV panoramic image and Glat and Glon the coordinates of the image shot point in the EPSG:32632 - WGS 84/UTM zone 32N reference system.
The two databases Tree and GSV were merged via the TreeId field, obtaining the database TreeGSV: TreeGSV(TreeId; Tlat; Tlon; PanoId; Glat; Glon; maxCanopy; Htree) (7). For each TreeGSV record, the parameters of the SWSAPI were identified as shown in Figure 6, with d = √((Tlat − Glat)² + (Tlon − Glon)²) the ground distance between shot point and crown centre: pitch = arctan((Htree − h_GSV)/d) (8), with h_GSV = 2.5 metres the panoramic camera height of GSV; FOV = arcsin((maxCanopy/2)/d) (9); heading, the azimuth from the shot point to the crown centre, computed from the coordinate differences (Tlat − Glat) and (Tlon − Glon) (10). Thus, the final database has the following structure: TreeGSV(TreeId; Tlat; Tlon; PanoId; Glat; Glon; maxCanopy; Htree; pitch; FOV; heading) (11). The TreeGSV data were used to download tree crown images from GSV via the API. In the case of Plant.Id, since the model accepts up to 5 images of the same tree, a second image was also downloaded for each tree by setting FOV = 20 and leaving pitch and heading unchanged, so as to have a more detailed view of the foliage at the centre of the canopy. To access the API we used a procedure based on the googleway library of the R software.
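The camera geometry can be sketched in Python as follows (illustrative only: the study worked in R via googleway; the function and variable names are ours, and the heading computation assumes planar metric UTM coordinates):

```python
import math

GSV_CAMERA_HEIGHT = 2.5  # metres: assumed height of GSV's panoramic camera

def sv_parameters(tx, ty, gx, gy, h_tree, max_canopy):
    """Heading, pitch and FOV (degrees) of the SWSAPI request for one tree.

    (tx, ty)   crown centre of gravity, (gx, gy) shot point, in metric coordinates;
    h_tree     tree height in metres; max_canopy maximum crown diameter in metres.
    """
    d = math.hypot(tx - gx, ty - gy)                  # ground distance shot point -> crown
    heading = math.degrees(math.atan2(tx - gx, ty - gy)) % 360  # azimuth from north
    pitch = math.degrees(math.atan((h_tree - GSV_CAMERA_HEIGHT) / d))
    fov = math.degrees(math.asin((max_canopy / 2) / d))  # angle subtended by the crown
    return heading, pitch, fov
```

For instance, a 12.5 m tall tree with a 10 m crown standing 20 m due north of the shot point yields heading 0°, a pitch of about 26.6° and an FOV of about 14.5°.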
To provide the classification apps with images of sufficient quality, we examined the actual presence of greenery in the photos collected from GSV. We estimated the total area covered by trees in each image by applying semantic segmentation with deep learning (for more details on the procedure, see Barbierato et al., 2020). A semantic segmentation network classifies each pixel in an image, resulting in an image segmented by class. In this phase of the work, we used the pre-trained Deeplabv3+ network available in the MatLab software, a type of convolutional neural network (CNN) designed for semantic image segmentation (Brostow et al., 2009), with weights initialized from a pre-trained ResNet-18 network. ResNet-18 is an efficient network, suitable for applications with limited computational resources. The network was trained on the University of Cambridge's CamVid dataset (Zhang et al., 2010), a collection of images containing street-level views. There are 11 image segmentation classes: "Sky", "Building", "Pole", "Road", "Sidewalk", "Tree", "SignSymbol", "Fence", "Car", "Pedestrian", and "Cyclist". Once all images were classified, we retained those with at least 50% of pixels classified as "Tree"; on the basis of pre-processing, this limit was considered satisfactory to achieve efficient classification.
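The filtering rule can be sketched as follows (illustrative Python on the network's per-pixel label map; the index used here for the "Tree" class is an assumption, since it depends on the label ordering of the trained network):

```python
import numpy as np

TREE_CLASS = 5  # assumed index of "Tree" in the CamVid label ordering

def tree_fraction(label_map, tree_class=TREE_CLASS):
    """Fraction of pixels labelled as tree in a semantic-segmentation output."""
    return float(np.mean(label_map == tree_class))

def keep_image(label_map, threshold=0.5):
    """Retain an image only if at least 50% of its pixels are classified 'Tree'."""
    return tree_fraction(label_map) >= threshold
```

Images failing the 50% test are simply dropped from the download set before the identification step.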

Tree species classification by Pl@ntNet and Plant.Id
Images of tree crowns were sent to the Pl@ntNet and Plant.Id APIs using the R software with the rjson, httr, and base64 libraries. In the case of Pl@ntNet, one image was used for each API access, while in the case of Plant.Id each query sent the two images (the one with the computed FOV and the one with FOV = 20) along with the coordinates of the crown centre of gravity.
For each tree in the Tree database, the result was obtained in JSON format. The models return a list of suggested species, but we considered only the result with the best score to evaluate the accuracy of the classification.

Tree species classification by convolutional neural networks
Classification of tree species was carried out using the GoogLeNet CNN. As shown in the pioneering work of Wegner et al. (2016), the GoogLeNet model offers the best trade-off in terms of recognition performance, execution time and memory consumption.
We resized the GSV images to 400 × 400 pixels, and the data were divided into a training set and a validation set with a ratio of 90 to 10.In the training parameter settings, to avoid overfitting, the maximum epoch value was set to 500 and the initial learning rate to 0.0003.
To expand the size of the GSV image set, data augmentation was performed. We employed geometric transformations of the GSV images: horizontal and vertical inversion, and rotations of 90, 180, and 270 degrees.
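The augmentation step can be sketched with numpy (illustrative; the study performed the equivalent transformations within its CNN training pipeline):

```python
import numpy as np

def augment(image):
    """Geometric variants used to enlarge the GSV training set:
    horizontal/vertical inversion and rotations of 90, 180 and 270 degrees."""
    return [
        np.fliplr(image),    # horizontal inversion
        np.flipud(image),    # vertical inversion
        np.rot90(image, 1),  # 90-degree rotation
        np.rot90(image, 2),  # 180-degree rotation
        np.rot90(image, 3),  # 270-degree rotation
    ]
```

Each original image thus contributes five additional training samples with the same species label.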
For training the CNN for each species, we used 90% of the photos for the training set and 10% of the photos for the testing set.To have an adequate number of photos in the testing set for calculating performance metrics, we selected only species with at least 200 GSV images, for a total of 26 species.

The confusion matrix
The goal of the machine learning models used in our research is to classify trees according to their genus and species, so the problem falls under "multiclass classification". The metrics that we employed are based on the concept of the multiclass confusion matrix. The confusion matrix is a cross table that records the number of occurrences of each combination of actual and predicted classification. By convention, the columns represent the model prediction, while the rows show the actual classification. In binary confusion matrices (Figure 7a), the main diagonal reports the correct answers (true positives, TP), the position above the main diagonal reports the number of false negatives (FN), and the position below the diagonal reports the false positives (FP). Multiclass confusion matrices can be brought back to the binary case by performing a separate analysis for each row (class Ci), as shown in Figure 7b.
The multiclass metrics for unbalanced data are derived from the two main indices of the confusion matrix, sensitivity and specificity. Sensitivity (also called recall) is the proportion of correct positive predictions (TP) out of the total number of items actually in class Ci, i.e. Sensitivity = TP/(TP + FN); it ranges from 0 (worst) to 1 (best). Specificity is the proportion of correct negative predictions (TN) out of the total number of items actually outside class Ci, i.e. Specificity = TN/(TN + FP); it also ranges from 0 (worst) to 1 (best).
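The one-vs-rest reduction can be sketched as follows (illustrative Python, assuming the row/column convention stated above, with rows as actual classes and columns as predictions):

```python
def class_rates(cm, i):
    """Sensitivity and specificity of class i from a multiclass confusion matrix.

    cm[r][c] counts items of actual class r predicted as class c.
    """
    n = len(cm)
    tp = cm[i][i]
    fn = sum(cm[i][c] for c in range(n) if c != i)            # row i, off-diagonal
    fp = sum(cm[r][i] for r in range(n) if r != i)            # column i, off-diagonal
    tn = sum(cm[r][c] for r in range(n) for c in range(n)
             if r != i and c != i)                            # everything else
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity
```

Running this per row reproduces the separate binary analyses of Figure 7b.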
The evaluation metrics
Balanced accuracy is a metric used in remote sensing applications to evaluate classification efficiency (Gibson et al., 2020; Simoniello et al., 2022). It is calculated as the average of sensitivity and specificity, i.e. the mean of the accuracies obtained on the two classes: Balanced accuracy = (Sensitivity + Specificity)/2.
Another metric applied to the evaluation of classification performance using remotely sensed data is the geometric mean (Gmean) between sensitivity and specificity (Silva et al., 2017).Gmean was proposed in Burez and Van den Poel (2009) by combining the prediction accuracies, i.e. sensitivity as accuracy in positive classifications and specificity as accuracy in negative classifications.Poor performance in identifying true positives will lead to a low G-mean value, even if negative examples are classified correctly by the model (Hido et al., 2009).
Gmean = √(Sensitivity × Specificity)
The likelihood ratio (Bekkar et al., 2013; Dabboor & Shokr, 2013) is divided into positive and negative. The positive likelihood ratio, LR(+) = Sensitivity/(1 − Specificity), represents the ratio of the probability of classifying a case correctly as positive to the probability of classifying it incorrectly.

The negative likelihood ratio, LR(−) = (1 − Sensitivity)/Specificity, is the ratio of the probability of predicting an example as negative when it is actually positive to the probability of correctly classifying it as negative.
A higher positive likelihood ratio indicates better performance on positive classes, while a lower negative likelihood ratio indicates better performance on negative classes. For example, a positive likelihood ratio of 50 means that the probability of correctly classifying a lime tree is 50 times greater than the probability of incorrectly classifying the species under consideration; a negative likelihood ratio of 0.01 means that the probability of classifying a plane tree as a lime tree is 100 times lower (1/0.01 = 100) than that of classifying the plane tree correctly.
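The derived metrics can be collected in a short sketch (illustrative Python, following the definitions above):

```python
import math

def balanced_accuracy(sens, spec):
    """Average of sensitivity and specificity."""
    return (sens + spec) / 2

def g_mean(sens, spec):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sens * spec)

def likelihood_ratios(sens, spec):
    """LR(+) = sens / (1 - spec); LR(-) = (1 - sens) / spec.
    Degenerate denominators are mapped to infinity."""
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return lr_pos, lr_neg
```

For example, a class with sensitivity 0.8 and specificity 0.95 has balanced accuracy 0.875, Gmean ≈ 0.87, LR(+) = 16 and LR(−) ≈ 0.21.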
To interpret the results of LR(+) and LR(-) we can refer to the limits proposed by Bekkar et al. (2013), shown in Table 2.

Results
Given the complexity of the methodology, the exposition of results will systematically follow the workflow previously illustrated and depicted in Figure 4, and will therefore be divided into the following subsections: data production, processing and validation.

Data production
Identification of tree morphological parameters
Canopy segmentation within the urban perimeter of the city of Prato resulted in the identification of 329,026 canopies, for an urban forest area of 782.55 hectares (Figure 8). The map overlay operation identified 14,057 "public urban trees" with an area of 91.38 hectares; the category "other urban green", including private green spaces and abandoned areas with shrub/herbaceous species, covers an area of 691.17 hectares. Comparison with official national statistics shows a green area extent almost 25% larger; this figure is plausible because the procedure adopted also segments areas that are not officially classified as urban green, mainly abandoned urban areas under natural succession (so-called brownfields) or small residual agricultural areas entirely included within the urban perimeter. According to many authors (Mathey et al., 2015; Pueffel et al., 2018; Sikorski et al., 2021), even these unofficially recognized green areas contribute to the provision of ecosystem services, and it would be correct for them to be officially surveyed.
The harvesting of canopy images from GSV
Within a 30-metre radius of a GSV shot point, 12,659 urban public trees were found to be visible out of the 14,057 trees taller than 3 metres surveyed by the City of Prato, about 90%. Trees not visible can be attributed to two main reasons: plants covered by other plants from the perspective of GSV's camera point, or plants within urban parks or other areas not accessible to Google cars. In the latter case, however, trees at the edges were often visible. Therefore, 12,659 photos taken between September 2020 and June 2021 were downloaded from GSV.
Based on the classification through ResNet-18, we retained 11,552 shot points (91%) with a green index above 50%. For these images, photos with FOV = 20 were also downloaded. An example of the downloaded images for the most important species is shown in Figure 9.

New data production
Classification procedures using Plant.Id, Pl@ntNet, and the GoogLeNet CNN were applied to the canopy images downloaded from GSV. The result is a new geographic database containing the vector geometry of the canopy and the following features: coordinates of the centre of gravity of the canopy, height of the tree, codes of the GSV images collected, genus and species from the ground survey of the census, and genus and species as classified by Plant.Id, Pl@ntNet and the GoogLeNet CNN. Figure 10 shows a sample of the result of the three classifications compared with the ground observation.

Validation
In accordance with the literature, we evaluated the performance of the two applications Pl@ntNet and Plant.Id and of the GoogLeNet CNN classifier at the genus and species levels. Tables 3-8 report performance metrics for the most frequent species and genera, as well as descriptive statistics for all species and genera in the study area.
The Pl@ntNet classification at the species level (Table 3) has a barely acceptable median balanced accuracy (0.60), with a rather narrow frequency distribution (third and first quartiles at 0.64 and 0.55, respectively). Among the three most frequent species in the study area, Pinus pinea and Tilia × europaea have slightly better performances (0.75 and 0.62, respectively), while Platanus hybrida is below the median (0.54). Among species with at least 50 trees in the study area, the best performance is for Pinus and the worst for Robinia. The geometric mean shows worse performance and greater dispersion of the descriptive statistics (median 0.48, first quartile 0.33 and third quartile 0.57). Classification of the top three species gives acceptable results for Pinus pinea (0.71), but Tilia × europaea and Platanus hybrida perform poorly (0.58 and 0.33, respectively). The best performance is for Pinus pinea and the worst for Robinia pseudoacacia (just 0.13). LR(+) shows good performance in identifying true positives: the first quartile is above the threshold of 5 and the median is above 10. For the top three species, the correct classifications according to this metric are good only for Pinus pinea, barely acceptable for Platanus hybrida and poor for Tilia × europaea. Among other species with more than 50 plants, performances are poor (LR(+) < 5) for some important species in the study area: Acer campestre, Acer platanoides, Fraxinus excelsior and Populus nigra. LR(−), on the other hand, always has values below the limit of 1, and thus a low probability of false negatives.
The performance of Pl@ntNet at the genus level is slightly better than at the species level (Table 4). At the genus level, balanced accuracy has an acceptable median of 0.62, with first and third quartiles at 0.56 and 0.69, respectively. Among the three most frequent genera in the study area, Pinus and Platanus perform substantially better (0.81 and 0.78, respectively), while Tilia is only slightly above the median (0.68). Among genera with at least 50 occurrences, the best performance is for Pinus and the worst for Robinia. The geometric mean shows a median of 0.48, with first and third quartiles at 0.35 and 0.63. However, the classification of Pinus, Tilia, and Platanus yields results above the median (0.80, 0.77, and 0.63, respectively). The best result is for Cedrus and the worst for Robinia (just 0.13). For LR(+), even the first quartile is above the threshold of 10 (10.2). For the three main genera, the correct-classification performance is good for Pinus and Platanus, but only acceptable for Tilia. Among genera with more than 50 plants, performance is poor (LR(+) < 5) for Quercus and Acer. At the genus level, too, LR(−) always remains below the limit of 1, indicating that negative classifications are reliable.
As shown in Table 5, in terms of balanced accuracy the Plant.Id classification by species performs quite well, with a median of 0.73, first quartile 0.67, and third quartile 0.80. The top three species have decidedly good results: Pinus pinea 0.87, Tilia europaea 0.87, Platanus hybrida 0.89. The species with the lowest value is Ligustrum lucidum (0.59). Again, the geometric mean has lower values: median 0.69, first and third quartiles 0.58 and 0.78. The values of the three most important species remain high: Pinus pinea 0.87, Tilia europaea 0.87, and Platanus hybrida 0.89. LR(+) is very good both in the descriptive statistics and for the most representative species, with the sole exception of Tilia europaea, which has a still-acceptable value of 8.92. LR(−) is always less than 1.
The results at the genus level are again slightly better than the classification by species (Table 6). Balanced accuracy has a median of 0.75, with first and third quartiles at 0.70 and 0.82; the geometric mean has slightly lower values: median 0.70, first and third quartiles 0.64 and 0.80. Good performance is maintained for the most important genera, with LR(+) and LR(−) values all within the "good" limits. Pinus, Tilia, and Platanus have balanced accuracy and geometric mean values all above 0.9.
GoogLeNet's classification performance is comparable to that of Plant.Id and higher than that of Pl@ntNet (Tables 7 and 8). Its median balanced accuracy of 0.71 is slightly lower than that of Plant.Id, a trend confirmed by the geometric mean (0.65 for GoogLeNet versus 0.69 for Plant.Id). The top three species are classified efficiently: Pinus pinea has a balanced accuracy of 0.93 and a geometric mean of 0.93, Tilia europaea 0.847 and 0.846, and Platanus hybrida 0.82 and 0.81.
Examining the frequency distribution statistics, GoogLeNet has a wider distribution, especially for the geometric mean. The first and third quartiles of balanced accuracy are 0.52 and 0.85, respectively, compared with 0.67 and 0.80 for Plant.Id; for the geometric mean the gap is even larger, 0.22 and 0.84 compared with 0.57 and 0.78. This can be explained by the fact that for less frequent species GoogLeNet has few images to process in the training set and therefore obtains poor results, while for species with more images it achieves satisfactory results. This hypothesis is confirmed by the LR(+) values, which are decidedly unsatisfactory for Acer campestre, Acer saccharinum, Fraxinus angustifolia, Prunus cerasifera, and Quercus robur, and only fair for Aesculus hippocastanum, Quercus rubra, and Tilia europaea. LR(−) is always less than 1 and thus satisfactory. By grouping species by genus, GoogLeNet achieves a better classification, because even the less frequent plants draw on a larger training set. Indeed, at the genus level GoogLeNet obtains better results than Plant.Id: its median balanced accuracy is 0.78, with first and third quartiles at 0.60 and 0.87, compared to 0.75, 0.70, and 0.82 for Plant.Id.

Discussion
The objective of our study was to define a framework for automatically creating an urban forest inventory at the individual-tree level by integrating LiDAR data and publicly available GSV 360° images and applying online automatic species-classification apps. Our work demonstrated that it is possible to extract images of urban tree canopies from GSV by segmenting LiDAR data filtered through the NDVI computed from high-resolution multispectral remote sensing data. The GSV images were classified by two artificial intelligence applications (Pl@ntNet and Plant.Id), queried through an API, and by a classifier trained with the GoogLeNet CNN. Despite the low resolution of GSV images, Plant.Id and GoogLeNet achieved satisfactory classification efficiency, especially for the most frequent plants in the study area. The good performance of Plant.Id probably stems from its ability to also use the geographic location as input data in the query.
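The NDVI filtering mentioned above reduces each pixel's near-infrared and red reflectances to a single index used to mask vegetation. A minimal per-pixel sketch, with a hypothetical reflectance pair and threshold chosen for illustration:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from per-pixel reflectances."""
    total = nir + red
    return (nir - red) / total if total else 0.0  # guard against empty pixels

def is_vegetation(nir, red, threshold=0.3):
    """Keep a pixel as likely canopy when its NDVI exceeds the threshold."""
    return ndvi(nir, red) > threshold

# Healthy canopy reflects strongly in NIR and absorbs red...
canopy_pixel = is_vegetation(nir=0.6, red=0.1)
# ...while asphalt and roofs reflect the two bands about equally.
asphalt_pixel = is_vegetation(nir=0.2, red=0.2)
```

Because healthy vegetation reflects strongly in the near infrared and absorbs red light, canopy pixels push the index toward 1, while built surfaces stay near 0, which is what makes the index effective for masking the LiDAR data.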
The GoogLeNet CNN, for its part, had the advantage of being trained on images taken from GSV, whereas the global apps are trained on images from very diverse sources, mainly provided by users through smartphones. Pl@ntNet, on the other hand, yielded less satisfactory results, partly because it performs best when given images of plant organs (leaves, flowers, and bark) as input, whereas the GSV images contained only crowns.
A difference from other studies using GSV 360° imagery is that our model does not require prior sampling to train a deep learning classifier, since one of the two apps tested, Plant.Id, demonstrated performance similar to a deep learning classifier trained on the species in the study area. With the Plant.Id app it is possible to classify even species with very few plants in the study area (at the limit, even species with a single plant), whereas a trained CNN classifier can identify only species with a sufficient number of observations. These advantages further reduce costs and make it possible to transfer the method even to small cities that could not provide a training set large enough to train a classifier.
The results are consistent with those reported in the literature (Ringland et al., 2021; Lumnitz et al., 2021; Jakuschona et al., 2022). Plant.Id and GoogLeNet demonstrated good performance in detecting and classifying the main urban street trees and tree species from GSV 360° images using DL-based techniques, with balanced accuracies at the genus level of 0.75 and 0.78, respectively, and at the species level of 0.73 and 0.71.
This performance is lower than that of Zarrin (2019), who reported a tree classification performance of 0.96 for all trees at the genus level. Those authors, however, collected their images in situ with a smartphone camera, thus with optimal image quality compared to what can be obtained from GSV.
Comparing our results with other work that used GSV images: in Berland and Lange's (2017) research, genus identification agreed between field and virtual surveys for 90% of trees (kappa = 0.88, p < 0.001), while at the species level the agreement was 66% of trees (kappa = 0.64, p < 0.001). Branson et al. (2018) achieved slightly better classification performance at the species level, with an average class precision of 0.83 for 30 different species; finally, Choi et al. (2022) recently performed worse, with a mean precision of only 0.54.
In agreement with Choi et al. (2022), we believe that the performance of CNN-based species classification systems is strongly influenced by the morphological and phenological characteristics of each tree species. Consistent with this, in our results tree species with distinctive morphological characteristics showed better classification accuracy: Pinus pinea achieved the highest classification accuracy among the main species in our study because its characteristic umbrella-shaped canopy is clearly different from that of the other species.
To evaluate app performance, we used a set of metrics that provides comprehensive information on strengths and weaknesses. In particular, balanced accuracy and the geometric mean allowed us to evaluate the efficiency of classifying the most frequent species, and LR(+) the probability of a correct positive classification. Finally, the low LR(−) values showed that neither Pl@ntNet nor Plant.Id is prone to systematic misclassification of the trees in the study area.
Contrary to the findings of Zarrin et al. (2019), our work also showed that Plant.Id performance does not vary significantly when classifying trees at the genus rather than the species level.
Regarding the research question, "Does the integration of multispectral data, LiDAR, GSV imagery and AI for species recognition enable automatic censuses of urban greenery at the individual tree level?", the answer is only partially positive.
Google cars cannot enter parks and pedestrian paths, so the trees inside them cannot be photographed and identified. Even for street trees, a tree may be masked by another object from the point where the GSV photo was taken. Finally, GSV images cannot be used to survey private urban greenery. These aspects represent the main limitation of the present work.
Regarding errors in the acquisition of canopy images, the main causes of failure of the methodology were: (1) the GSV images were not all taken on the same date, and in some cases were taken at very different times, so they are not temporally aligned with the remotely sensed data; (2) some GSV images were taken in seasons when leaves were not present.
A solution to this limitation is to supplement unsatisfactory GSV images with spherical photos taken on foot with backpack-mounted cameras, or with small electric vehicles that can operate in urban parks. A planned development of this research is to apply our methodology in this way and assess whether there are significant differences in performance between professional and consumer cameras. A further development involves integrating the spherical photos with an aerial survey using unmanned aerial vehicles (UAVs).
Another limitation of the work is that the canopy segmentation procedure was not subjected to statistical validation, since the purpose of the work was to evaluate the efficiency of classification by Pl@ntNet and Plant.Id; the canopy segmentation was only functional to GSV image acquisition. From non-systematic inspection, however, the quality of the segmentation appeared good at different scales, as shown in the example in Figure 11.
However, future work will need to investigate the most efficient methodologies for segmenting canopies in urban areas, for both automatic identification and map rendering.
Finally, the economic costs of automated urban forest inventories, integrating GSV images, spherical images taken from the ground and from UAVs, and LiDAR data classified with Plant.Id, should be carefully evaluated. Currently, the base price of Plant.Id is 0.05 euros per request, but significant discounts are offered for larger volumes of identifications. It will be necessary to run cost simulations comparing our methodology with other methods of conducting an urban forest inventory at the individual-tree level.
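Such a simulation can start from simple per-request arithmetic. A minimal sketch, in which the tree count, the number of views per tree, and the volume discount are hypothetical inputs (only the 0.05-euro base price comes from the text):

```python
def inventory_cost(n_trees, views_per_tree=1, price_per_request=0.05, discount=0.0):
    """Classification cost in euros; discount is a hypothetical volume rebate."""
    return n_trees * views_per_tree * price_per_request * (1.0 - discount)

# A hypothetical mid-sized city: ~30,000 street trees, two views per tree.
base_cost = inventory_cost(30000, views_per_tree=2)
```

At the base price, that scenario comes to roughly 3,000 euros before any volume discount, an order of magnitude that a full cost simulation would need to compare against field-survey labour costs.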

Conclusion
Our work demonstrated that it is possible to combine the Plant.Id application with photos downloaded from GSV and with LiDAR and multispectral data to produce single-tree inventories of public greenery.
Through a crown segmentation procedure based on the capture-area method, we calculated the vertical projection parameters of the tree crowns and extracted crown images from spherical GSV photos. The Plant.Id application correctly identified plants in the city of Prato with a median accuracy of 0.73, with better performance for the most common plants: Pinus pinea 0.87, Tilia europaea 0.87, Platanus hybrida 0.89.
As emerged in the discussion section, there are many possible research developments. The most promising, in our opinion, are the following:
− The use of higher-detail azimuthal spherical images obtained from UAVs.
− The use of more advanced machine-learning-based crown segmentation procedures applied to aerial zenithal images, again from UAVs.
− The evaluation of the efficiency of the new Plant.Id feature dedicated to plant disease classification for health monitoring of urban greenery.
We believe that our procedure can be useful for city administrators to keep urban green censuses up to date in order to plan the correct maintenance and management actions.
We hope that this work will contribute to the dissemination of single-tree urban green inventories even in small cities.

Figure 1 .
Figure 1. The city of Prato (point data: census of public urban trees of the municipality of Prato; basemap: OpenStreetMap).

Figure 2 .
Figure 2. Frequency distribution and total area covered by major tree species (species with at least 50 plants).

Figure 3 .
Figure 3. Small examples of the raster layers used: (a) RGB orthophoto from the UltraCam Xp camera, (b) Normalized Difference Vegetation Index (NDVI) from UltraCam Xp data, and (c) LiDAR layer.

Figure 5 .
Figure 5. Google street view static API parameters.

Figure 8 .
Figure 8. Urban greenery and public urban trees in the perimeter of the city of Prato.

Figure 9 .
Figure 9. An example of the downloaded images for the most important species (FOV = field of view).

Figure 10 .
Figure 10. Sample of the result of the three classifications compared with ground observation.

Figure 11 .
Figure 11. Sample of crown segmentation at different scales.

Table 2 .
Thresholds for positive likelihood ratio interpretation.

Table 3 .
Performance indicators of the Pl@ntNet app by species.

Table 4 .
Performance indicators of the Pl@ntNet app by genus.

Table 5 .
Performance indicators of the Plant.Id app by species.

Table 6 .
Performance indicators of the Plant.Id app by genus.

Table 7 .
Performance indicators of the GoogLeNet network by species.

Table 8 .
Performance indicators of the GoogleNet network by genus.