ESTATE: A Large Dataset of Under-Represented Urban Objects for 3D Point Cloud Classification

Cityscapes contain a variety of objects, each with a particular role in urban administration and development. With the rapid growth and implementation of 3D imaging technology, urban areas are increasingly surveyed with high-resolution point clouds. This technical advancement greatly improves our ability to capture and analyse urban environments and their small objects. Deep learning algorithms for point cloud data have shown considerable capacity in 3D object classification but still struggle with generally under-represented objects (such as light poles or chimneys). This paper introduces the ESTATE dataset (https://github.com/3DOM-FBK/ESTATE), which combines available datasets of various sensors, densities, regions, and object types. It includes 13 classes featuring intensity and/or colour attributes. Tests using ESTATE demonstrate that the dataset improves the classification performance of deep learning techniques and could significantly advance the 3D classification of urban objects.


INTRODUCTION
Urban point clouds have recently played an important role in 3D scene interpretation (Xie et al., 2020; Grilli et al., 2021). The growing use of reality-based 3D techniques is sparking research efforts to develop solutions for point cloud analyses useful for building modelling (Özdemir and Remondino, 2018), urban management (Zolanvari et al., 2019), street furniture extraction (Bai et al., 2021) and digital twin generation (Ismail et al., 2023). Operative approaches for point cloud categorization rely on hand-crafted feature extraction rules and a variety of machine learning-based classifiers (Zhang et al., 2023). With the advancements in deep learning-based techniques, the use of deep neural networks has gained traction (Hu et al., 2020; Hu et al., 2021; Mao et al., 2022; Ren et al., 2023), including the combination with logic rules (Grilli et al., 2023). Despite these positive findings, the classification of 3D point clouds still encounters numerous difficulties when applied in real-world scenarios. While current methods perform well on single datasets, they generally struggle to generalize in cross-dataset circumstances, where the training and test data are drawn from distinct distributions (Wang et al., 2021). Figure 1 shows how point clouds from distinct datasets can vary in terms of density, colour, noise and shape. This variation is particularly visible in small and generally under-represented objects, such as cables, traffic lights and garbage boxes. With the advancement of 3D digitization techniques and the increase in point cloud density, machine and deep learning methods struggle even more to detect small urban objects. The presence of under-represented classes limits the performance of state-of-the-art neural networks. Therefore, to allow an effective utilization of deep learning-based algorithms in real-world contexts, especially for supporting the needs of municipalities and mapping agencies, a set of well-discriminated 3D urban elements must be incorporated. The use of reality-based 3D surveying data, with respect to synthetic ones (Wu et al., 2015; Chang et al., 2015; Deitke et al., 2022), can enhance dataset generalization and expand its applications.

Paper motivation and aims
The main motivation behind the presented research activities is that data should support object classification. Therefore, the aim of the paper is to present ESTATE, a new dataset to improve the identification and classification of normally under-represented objects in urban point clouds, including generalization capabilities (Figure 2). ESTATE contains 13 objects (classes). It is produced by combining in-house and available heterogeneous datasets and could also be used for semantic segmentation purposes. The reported analyses demonstrate that ESTATE improves the classification performance of deep learning techniques.

STATE OF THE ART
3D point cloud datasets for object classification purposes can be broadly categorised based on the location (indoor vs outdoor) and scene type (synthetic vs real-world) (Table 1). Among the widely recognized datasets, Objaverse (Deitke et al., 2022), ModelNet10 (Wu et al., 2015), ModelNet40 (Wu et al., 2015), ModelNet40-C (Sun et al., 2022) and ShapeNet (Chang et al., 2015) consist of synthetic and indoor object samples. Despite their sizes, the synthetic contents limit their applicability in real-world scenarios with environmental variability and unpredictability. On the other hand, datasets like ScanNet (Dai et al., 2017) and ScanObjectNN (Uy et al., 2019) provide real-world data captured from indoor environments. Although these datasets introduce more realistic scenarios compared to their synthetic counterparts, they still have disadvantages in representing generally under-represented urban objects, such as traffic lights or street furniture, which are crucial for applications like autonomous driving and urban planning. The Sydney Urban Objects dataset (De Deuge et al., 2013) addresses some of these gaps by comprising reality-based point clouds of outdoor objects. However, the insufficient sample size of this dataset hampers its ability to generalize across the broad range of objects and complex urban conditions. In summary, while significant progress has been made in the development of 3D datasets for object classification, the field should continue to evolve with an increasing focus on enhancing the diversity, realism and practical applicability of real-world datasets by including a sufficient variety and number of under-represented urban objects.

THE ESTATE DATASET
To overcome the above-mentioned gaps, as well as the generalization limitations of neural networks trained on existing datasets, we provide ESTATE (Figures 2 and 3), which includes various urban objects normally under-represented in publicly available datasets. ESTATE contains 13 different classes of 3D points (with colour and/or intensity information) extracted and merged from 11 MLS/ALS/UAV-photogrammetry datasets, which were originally created for 3D segmentation purposes, including:
• WHU-Urban3D (Han et al., 2024): a large-coverage ALS and MLS annotated dataset (three subsets) containing urban scenes and roads from different cities (one of the subsets contains 37 annotated classes);
• an in-house dataset acquired by FBK.
The ESTATE dataset encompasses and refines semantically segmented 3D data from each of the presented datasets, focusing on 13 specific classes. It was noted that most of the datasets, due to semi-automatic or manual labelling procedures, contain many labelling errors; therefore, only higher-quality and manually refined data were included in ESTATE. The dataset characteristics are summarized in Table 2.
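The merging step described above can be sketched as a label-remapping pass over the source datasets. The class names and per-dataset label tables below are illustrative placeholders, not the actual ESTATE mapping:

```python
# Illustrative sketch of merging heterogeneous datasets into a single
# label space. The mappings below are invented examples, not the real
# ESTATE class tables.

ESTATE_CLASSES = ["light_pole", "traffic_light", "garbage_box"]  # subset, for illustration
ESTATE_ID = {name: i for i, name in enumerate(ESTATE_CLASSES)}

# Each source dataset maps its own label ids to ESTATE class names;
# labels absent from a table are discarded during the merge.
SOURCE_MAPS = {
    "toronto3d": {4: "light_pole", 7: "traffic_light"},
    "swiss3dcities": {2: "garbage_box"},
}

def remap(dataset: str, label: int):
    """Return the ESTATE class id for a source label, or None to drop it."""
    name = SOURCE_MAPS[dataset].get(label)
    return None if name is None else ESTATE_ID[name]
```

In practice each retained instance would also be manually checked, since, as noted above, the source annotations contain labelling errors.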

EXPERIMENTS
The ESTATE dataset is generated to facilitate the accurate classification of generally under-represented objects. In the experiments, the data were split into training (70%) and test (30%) sets with three different input configurations (XYZ, XYZ+Intensity and XYZ+RGB). Among the available recent deep learning methods (Wang et al., 2019; Wu et al., 2019; Guo et al., 2021; Lu et al., 2022), we evaluated the performance of KPConv (Thomas et al., 2019), which is commonly used in 3D semantic segmentation, object classification and SLAM segmentation benchmarks. KPConv utilizes radius neighbourhoods as input and applies weights spatially determined by a small set of kernel points. KP-CNN is a convolutional network with 5 layers for classification. Every layer consists of two convolutional blocks, with the exception of the first layer, where the first block is not strided. The convolutional blocks are structured similarly to bottleneck ResNet blocks (He et al., 2016), utilizing a KPConv instead of the traditional image convolution, along with batch normalization and leaky ReLU activation. After the final layer, the features are aggregated through global average pooling and then processed by the fully connected and softmax layers. Only deformable kernels are utilized in the last 5 KPConv blocks. These kernels have proven to be highly effective in learning local shifts that can accurately adapt to the point cloud geometry and local structures. An optimizer minimizes the cross-entropy loss using gradient descent with momentum. The batch size is set to 16, while the momentum is set to 0.98. The initial learning rate is set to 10⁻³. The learning rate decreases exponentially, with a decay chosen to guarantee a division by 10 every 100 epochs during a 300-epoch training. A dropout probability of 0.5 is employed in the fully connected layers at the end. The initial subsampling grid size was set to 1 cm.
The purpose of the ESTATE data is to make deep learning models invariant to the varying densities, sensors, and object types belonging to the same class. Thus, in order to determine whether the ESTATE dataset improves (i) the classification performance and (ii) the generalization capability of deep learning methods, two different training and testing approaches were applied: STST and ATST (Tables 3 and 4). Meaningful F1-score improvements were obtained for Traffic Light in FBK (from 0.00 to 0.40), Light Pole in SensatUrban (from 0.50 to 0.80), Electrical Pole in Paris-Lille3D (from 0.67 to 1.00), Garbage Box in Swiss3DCities (from 0.44 to 0.73), and Traffic Light (from 0.00 to 0.86) and Electrical Pole (from 0.24 to 0.69) in Toronto3D. However, no improvement was achieved on the YTU3D and Hessigheim datasets. This is probably due to the high similarity of objects within the same class in those datasets.
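The learning-rate schedule described above (initial rate 10⁻³, exponential decay dividing the rate by 10 every 100 epochs over 300 epochs) can be sketched as follows; the helper name is illustrative and not taken from the KPConv codebase:

```python
# Sketch of the exponential learning-rate schedule described above:
# starting at 1e-3 and decaying so that the rate is divided by 10
# every 100 epochs over a 300-epoch training run.

INITIAL_LR = 1e-3
EPOCHS = 300
# Per-epoch multiplicative factor: gamma**100 == 0.1, so gamma = 0.1**(1/100)
GAMMA = 0.1 ** (1 / 100)

def lr_at_epoch(epoch: int) -> float:
    """Learning rate after `epoch` epochs of exponential decay."""
    return INITIAL_LR * GAMMA ** epoch
```

With this choice, the rate is 10⁻³ at epoch 0, approximately 10⁻⁴ at epoch 100 and approximately 10⁻⁵ at epoch 200, matching the "division by 10 every 100 epochs" behaviour.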

CONCLUSIONS
The paper (i) introduced a new dataset for the 3D classification of urban objects and (ii) evaluated its benefits on a deep learning method with various input configurations. The shared data and research findings are publicly available at https://github.com/3DOM-FBK/ESTATE. The detailed collection of the 13 objects, which includes some urban objects normally under-represented in commonly available datasets, enhances the practical utility of 3D object classification models. The reported experimental results indicate that the ESTATE dataset improved the overall performance of classification models. The Intensity feature yielded better results than RGB colour inputs. This shows that additional features can improve the classification performance, but the inclusion of various colour features did not provide the expected improvement. In order to use models trained on the ESTATE dataset in real-life scenarios, only the coordinate values of the objects can be considered, since the results show relatively small differences between the XYZ and XYZ+Intensity input configurations. Furthermore, the ESTATE dataset has the potential to be used for object classification as well as for semantic, instance or panoptic segmentation (Figure 2), where objects in complex urban areas can be extracted using traditional preprocessing methods, unsupervised learning, graphs, etc. Future studies may evaluate the performance of other neural networks and focus on improving and integrating supervised and unsupervised learning techniques into a complementary process.

Figure 1: Examples of some objects included in the ESTATE dataset, realized to improve the identification and classification of normally under-represented objects in urban point clouds.

Figure 2: Potential uses of the proposed ESTATE dataset.

Figure 3: Examples of instances (rows) available in the proposed ESTATE dataset and collected from various available datasets (columns).

Table 1. A summary of some representative datasets for object classification in point clouds.

Table 2. Selected datasets and extracted objects (classes) featuring the proposed ESTATE dataset. ALS and MLS data also include intensity values.
• DublinCity (Zolanvari et al., 2019): a benchmark dataset including 13 manually annotated object classes from a LiDAR point cloud depicting the city of Dublin;

Table 3. ATST results with KPConv on the 13 objects of ESTATE.
For the Traffic Light class, the addition of Intensity and RGB features decreased the classification accuracy. It was observed that the RGB attribute decreased the classification accuracy of objects with various colour ranges in different datasets, such as Traffic Light, Electrical Pole, Traffic Sign, Garbage Box and Bus. This finding is similar to Sun et al. (2020), where the classification performance decreases with the addition of colour information. However, the best results were obtained with the addition of Intensity for the Light Pole, Pole, Electrical Pole, Traffic Sign, Pylon, Garbage Box and Bus classes. According to these results, the employed network predominantly uses the point cloud geometry, while RGB attributes (which can differ among datasets) decrease the generalization ability, which instead increases with the Intensity attribute. Table 4 reports classification results using only XYZ information. The model trained on the ESTATE dataset using XYZ features improved the classification results. The F1-scores obtained for STST and ATST, respectively, show relatively small improvements from 0.95 to 0.96 for Light Pole in DublinCity, from 0.85 to 0.89 for Traffic Sign in FBK, and from 0.63 to 0.64 for TR-MLS Pole. At the same time, the ESTATE dataset allows meaningful improvements for Bus in DublinCity (from 0.20 to 1.00) and for several other under-represented classes.
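The reported scores follow the standard per-class F1 definition (harmonic mean of precision and recall). A minimal sketch, not tied to the actual evaluation code used for the experiments:

```python
def per_class_f1(y_true, y_pred, cls):
    """F1-score of a single class from parallel lists of labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        # No true positives: precision or recall is zero, hence F1 is zero
        # (this is why an entirely missed class scores 0.00 in the tables).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a class that is never predicted correctly scores 0.00 regardless of the other classes, while a class with perfect precision and recall scores 1.00, matching the ranges quoted for the Traffic Light and Bus classes.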

Table 4. Classification results (F1-score) and number of instances per object with XYZ input for STST and ATST.