SimoSet: A 3D Object Detection Dataset Collected from Vehicle Hybrid Solid-State LiDAR

Abstract: Three-dimensional (3D) object detection based on point cloud data plays a critical role in the perception system of autonomous driving. However, this task presents a significant challenge in terms of its practical implementation due to the absence of point cloud data from automotive-grade hybrid solid-state LiDAR, as well as the limitations regarding the generalization ability of data-driven deep learning methods. In this paper, we introduce SimoSet, the first vehicle-view 3D object detection dataset composed of automotive-grade hybrid solid-state LiDAR data. The dataset was collected from a university campus, contains 52 scenes, each of which is 8 s long, and provides three types of labels for typical traffic participants. We analyze the impact of the installation height and angle of the LiDAR on its scanning effect and provide a reference process for the collection, annotation, and format conversion of LiDAR data. Finally, we provide baselines for LiDAR-only 3D object detection.


Introduction
Autonomous driving technology has attracted widespread attention in recent years due to its potential to free drivers from tiring driving activities and improve travel safety and traffic efficiency [1,2]. As a key component of environment perception technology, the task of 3D object detection forms the basis and premise for intelligent vehicles to obtain information about their surroundings and ensure safe autonomous driving. To achieve this, LiDAR is considered the core sensor in strong-perception schemes and has been deployed in autonomous vehicles [3,4].
Deep learning is an end-to-end method that does not require manual feature engineering and can uncover latent features of data [5,6]. It is therefore favored by more and more researchers working on 3D point cloud object detection algorithms, driving the need for benchmark datasets. Existing open-source datasets collect point cloud data using mechanical spinning LiDAR, which differs from the hybrid solid-state LiDAR fitted to mass-produced vehicles, resulting in significant differences in the distribution of the resulting point cloud data. Mechanical spinning LiDAR and hybrid solid-state LiDAR may even yield opposite results due to their different point cloud resolutions [7]. However, because autonomous driving companies closely guard their data, there are no publicly available hybrid solid-state LiDAR point cloud datasets, inhibiting algorithm development and deployment.
To help fill the gap in automotive-grade hybrid solid-state LiDAR point cloud data and to facilitate the application testing of 3D object detection algorithms, we launched SimoSet, an open-source dataset for training 3D point cloud object detection deep learning models, to further accelerate the application of 3D perception technology and to provide baseline results.

The innovative KITTI dataset [10], released in 2012, is considered the first benchmark dataset collected by an autonomous driving platform. KITTI uses a 64-channel mechanical spinning LiDAR to collect data from urban areas, rural areas, and highways during the daytime in Karlsruhe, Germany. In the KITTI dataset, an object is annotated when it appears in the field of view of the vehicle's front-view camera. The ApolloScape [11], H3D [12], Lyft L5 [13], Argoverse [14], and A*3D [15] datasets launched in 2019 expanded the quantity and quality of open-source datasets for 3D object detection. ApolloScape utilizes two survey-grade LiDAR units to capture dense point cloud data in complex traffic environments. H3D focuses on congested and highly interactive urban traffic scenes. Lyft L5 and Argoverse comprise abundant high-definition semantic maps and extend to tasks such as motion prediction and motion planning. A*3D increases the diversity of scenes in terms of time periods, lighting conditions, and weather circumstances, including a large number of night-time and heavily occluded scenes. The 2020 A2D2 [16] employs five LiDAR units to acquire point cloud data with precise timestamps. nuScenes [17] provides point cloud data within 70 m using a 32-channel mechanical spinning LiDAR, annotates 23 object classes, introduces new detection metrics that balance all aspects of detection performance, and demonstrates how the amount of data affects the performance potential of algorithms. Waymo Open [18] is equipped with one medium-range LiDAR with a scanning range of 75 m and four short-range LiDAR units with a scanning range of 20 m to acquire data under various weather conditions in multiple regions. Two difficulty levels are set based on the number of points in the 3D bounding box, and the detection performance is divided into three levels according to the object distance. Currently, nuScenes and Waymo Open are the most commonly used open-source datasets for evaluating 3D object detection algorithms. Cirrus [19] adopts a long-range bi-pattern LiDAR to obtain point cloud data within a range of 250 m. PandaSet [20] provides point cloud data within a range of 250 m via a mechanical spinning LiDAR and a forward-facing LiDAR. Larger data scales, more diverse time and weather conditions, and more challenging scenes in open-source datasets have promoted the progress of 3D object detection technology. The long-range main LiDAR on vehicles has evolved from the mechanical spinning type to the hybrid solid-state type. However, at present, there is no 3D object detection dataset composed of hybrid solid-state LiDAR point cloud data, and the generalization ability of data-driven deep learning algorithms is weak. These factors hinder the development and deployment of 3D object detection algorithms.

LiDAR-Only 3D Object Detection Methods
Three-dimensional (3D) object detection with point cloud data alone can mainly be summarized into two categories, voxel-based and point-based methods, depending on how the point cloud data are converted into 3D representations for localizing objects; hybrid point-voxel methods combine the two.
The voxel-based methods convert irregular point cloud data into ordered grid representations and typically use PointNet to extract features. VoxelNet [21] is a pioneering method that voxelizes the sparse point cloud and then uses voxel feature encoding (VFE) layers and 3D convolutions to generate geometric representations; however, the huge computational burden of the 3D convolutions results in low efficiency. To avoid wasting computation on empty voxels, SECOND [22] introduces 3D sparse convolutions and 3D submanifold sparse convolutions to reduce memory consumption. However, 3D sparse convolution is not deployment-friendly. To this end, PointPillars [23] converts 3D point cloud data to a 2D pseudo-image and uses highly optimized 2D convolutions to achieve excellent performance; this user-friendliness has made PointPillars a mainstream approach suitable for industrial deployment. To further improve detection accuracy, CIA-SSD [24] makes use of spatial semantic features, and Part-A2 [25] performs semantic segmentation on foreground points. SA-SSD [26] proposes an auxiliary network that is only used during the training stage to guide the backbone to learn the structural information of 3D objects; this improves detection accuracy without increasing inference time. Voxel R-CNN [27] aggregates K-nearest-neighbor voxel features in a second stage to refine 3D bounding boxes. FastPillars [28] proposes a Max-and-Attention Pillar Encoding (MAPE) module to minimize the loss of local fine-grained information during feature dimension reduction and applies structural reparameterization to make the network more compact and efficient. CenterPoint [29] represents objects as points, which removes the constraints imposed by anchor-based methods on angle and size and shrinks the search space for objects. VoxelNeXt [30] proposes an efficient structure that predicts objects directly from sparse voxel features rather than relying on hand-crafted proxies. Overall, the voxel-based methods achieve decent detection performance with promising efficiency. However, voxelization inevitably introduces quantization loss, discarding fine-grained 3D structural information, and localization performance largely depends on the voxel grid size: smaller voxels yield more fine-grained feature representations but at the cost of longer running time.

The point-based methods directly learn geometry from raw point cloud data without additional preprocessing steps and typically use PointNet++ to extract features. PointRCNN [31] proposes a point-based two-stage 3D region proposal paradigm: the first stage generates proposals from segmented foreground points, and the second stage regresses high-quality 3D bounding boxes by exploiting semantic features and local spatial cues. However, extracting features directly from the original point cloud data is inefficient. Therefore, 3DSSD [32] performs farthest point sampling in both feature space and Euclidean space and applies a fusion strategy to remove part of the background points. To reduce memory usage and computational cost, IA-SSD [33] extracts semantic information from points, keeping as many foreground points as possible. The point-based methods typically adopt a two-stage pipeline, estimating 3D object proposals in the first stage and refining them in the second. They achieve higher detection accuracy, but they spend 90% of their runtime organizing irregular point data [34], which is inefficient.
The point-voxel methods combine the advantages of point-based and voxel-based methods. PV-RCNN [35] aggregates point features into the voxel-based framework through a voxel-to-keypoint encoding technique. HVPR [36] designs a memory module to simplify the interaction between voxel features and point features. To avoid the computational burden of point-based methods while preserving the precise geometric shape of the object in the original point cloud, LiDAR R-CNN [37] generates 3D region proposals using voxel features in the first stage, and refines the geometric information of the 3D bounding boxes utilizing the raw point cloud coordinates in the second stage. Overall, different detection pipelines have their own advantages in terms of detection accuracy and/or operational efficiency.
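To make the trade-off discussed above concrete, the following minimal sketch (our own illustration, not taken from any of the cited methods) voxelizes a point cloud at two grid resolutions: smaller voxels preserve more geometric detail but produce many more non-empty voxels for the network to process.

```python
import numpy as np

# A minimal sketch of the voxelization step shared by voxel-based detectors.
# The points are synthetic; voxel_size controls the accuracy/speed trade-off:
# finer grids keep more structure but yield more non-empty voxels.

def voxelize(points: np.ndarray, voxel_size: float, max_pts: int = 32):
    """Group an (N, 3) point cloud into voxels of edge length `voxel_size`.

    Returns a dict mapping integer voxel indices (i, j, k) to the points that
    fall inside that voxel (truncated to `max_pts`, as real pipelines do).
    """
    voxels = {}
    indices = np.floor(points / voxel_size).astype(np.int64)
    for pt, idx in zip(points, indices):
        bucket = voxels.setdefault(tuple(idx), [])
        if len(bucket) < max_pts:
            bucket.append(pt)
    return voxels

points = np.random.uniform(0, 70, size=(10_000, 3)).astype(np.float32)
for vs in (0.4, 0.1):  # coarse vs. fine grid
    print(f"voxel_size={vs}: {len(voxelize(points, vs))} non-empty voxels")
```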

SimoSet Dataset
Here, we introduce sensor specification, sensor placement, scene selection, data annotation, and data format conversion, then provide a brief analysis of our dataset.

Sensor Specification and Layout
The data collection uses a forward-facing hybrid solid-state LiDAR (RS-LiDAR-M1, 150 m range at 10% reflectivity). RS-LiDAR-M1 employs a micro-electro-mechanical system (MEMS) solution that covers a 120° × 45° spatial area through a zigzag scan pattern. Table 2 presents detailed specifications for the LiDAR.

In mass-produced vehicles, the installation locations for forward-facing hybrid solid-state LiDAR include above the front windshield, on both sides of the headlights, and in the intake grille. Due to the curved structure of the windshield, the signal of a LiDAR mounted inside it is attenuated and cannot meet the ranging and resolution requirements; therefore, installing the LiDAR inside the front windshield has not yet been adopted. There are obvious differences in size between different brands or series of vehicles in the same category. In order to measure the scanning effect of the hybrid solid-state LiDAR at different heights, the positions above the front windshield, beside the headlights, and in the intake grille of the sedan are set at heights of 1.4 m, 0.7 m, and 0.6 m, respectively, whereas the corresponding positions on the SUV are set at heights of 1.6 m, 0.8 m, and 0.7 m. The location and height of the hybrid solid-state LiDAR on the sedan and SUV are shown in Figure 1.

To determine the installation position of the hybrid solid-state LiDAR on the vehicle, a LiDAR mounting bracket with two degrees of freedom is designed. The fixation plate, on which the hybrid solid-state LiDAR is placed, can rotate in the pitch direction and slide vertically. The spatial coordinate system is established according to the right-hand rule. The direction from the origin O of the coordinate system towards the positive X-axis is defined as the initial 0° rotation around the Y-axis, and, viewed from the positive Y-axis towards the origin O, the counterclockwise direction is defined as the positive direction of rotation around the Y-axis. The fixation plate rotates around the Y-axis from 0° to 15° and slides along the Z-axis from 0.1 m to 1.7 m. The LiDAR mounting bracket and its coordinate system are shown in Figure 2.
The installation height directly affects the scanning effect of the hybrid solid-state LiDAR. With the hybrid solid-state LiDAR pitched at 0°, the blind spot, the farthest ground line distance, and the effective detection distance are measured at the heights corresponding to above the front windshield, beside the headlights, and in the intake grille of both a sedan and an SUV. The effective detection distance is defined as the farthest distance at which a pedestrian object is represented by no fewer than five points. It is observed that, although the point cloud from the same laser beam fluctuates between adjacent frames, when the angle remains unchanged, the blind spot, the farthest ground line distance, and the effective detection distance all shorten as the installation height decreases. In the current 360° full-coverage perception scheme, the hybrid solid-state LiDAR serves as the main long-range LiDAR for forward detection on the vehicle, focusing on long distances. Therefore, the installation height of the hybrid solid-state LiDAR is set at 1.6 m.
The installation angle is an important factor affecting the scanning effect of the hybrid solid-state LiDAR. For the task of 3D object detection, more attention is paid to traffic participants walking or driving on the ground, such as cars, cyclists, and pedestrians. Therefore, with the hybrid solid-state LiDAR placed at 1.6 m, the blind spot, the farthest ground line distance, and the effective detection distance are measured every 5° along the overlooking direction from 0° (horizontal) to 15°. When the height remains unchanged, as the angle increases, the blind spot, the farthest ground line distance, and the effective detection distance all decrease, with the change most significant at 15°. The installation angle of the hybrid solid-state LiDAR is set at 0° for ease of installation and maintenance.
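These measured trends are consistent with simple mounting geometry. The sketch below is our own back-of-the-envelope model, not the paper's measurement procedure: with the RS-LiDAR-M1's 45° vertical FOV centered on the optical axis, the lowest beam leaves the sensor at (pitch + 22.5°) below horizontal, so on flat ground the blind-spot distance is h / tan(pitch + 22.5°). Real measurements also include vehicle-body occlusion and beam divergence, which this model ignores.

```python
import math

# Illustrative geometry only (our own sketch, not the dataset toolchain).
V_FOV_DEG = 45.0  # RS-LiDAR-M1 vertical field of view

def blind_spot_m(height_m: float, pitch_deg: float) -> float:
    """Distance at which the lowest beam first reaches flat ground."""
    lowest_beam = math.radians(pitch_deg + V_FOV_DEG / 2)
    return height_m / math.tan(lowest_beam)

for h in (0.6, 0.7, 1.4, 1.6):       # tested mounting heights (m)
    for pitch in (0, 5, 10, 15):     # tested pitch angles (deg)
        print(f"h={h:.1f} m, pitch={pitch:2d}°: "
              f"blind spot ≈ {blind_spot_m(h, pitch):.2f} m")
```

Consistent with the measurements, the modeled blind spot shrinks both as the mount is lowered and as the pitch angle increases.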
The placement position of the hybrid solid-state LiDAR (above the front windshield of the SUV) is determined with respect to the positions used on mass-produced vehicles and the scanning effect measured at typical heights and angles. The hybrid solid-state LiDAR is therefore mounted at a height of 1.6 m with a pitch angle of 0°.

Scene Selection and Data Annotation
The raw data packets were collected through an autonomous vehicle test platform equipped with a hybrid solid-state LiDAR at Yanshan University campus. After obtaining the raw sensor data, 52 scenes were carefully selected, each lasting 8 s. There were a total of 4160 point cloud frames of the forward-facing hybrid solid-state LiDAR. These scenes covered different driving conditions, including complex traffic environments (e.g., intersections, construction), important traffic participants (e.g., cars, cyclists, pedestrians), and different lighting conditions, throughout the day and at night. The diversity of the scenes helps to capture the complex scenarios found in real-world driving.
SimoSet provides high-quality ground truth annotations of the hybrid solid-state LiDAR data, including 3D bounding box labels for all objects in the scenes. The annotation frequency is 10 Hz. For the 3D object detection task, cars, cyclists, and pedestrians are exhaustively annotated in the LiDAR sensor readings. Each object is labeled as a 3D upright bounding box (x, y, z, l, w, h, θ) with 7 degrees of freedom (DOF), where x, y, z represent the center coordinates; l, w, h are the length, width, and height; and θ denotes the heading angle of the bounding box in radians. All cuboids contain at least five LiDAR points [20]; cuboids with fewer than five object points are discarded. All ground truth labels of the point cloud data are created by human annotators using SUSTechPOINTS [38], and multiple phases of label verification are performed to ensure precise, high-quality annotations. An example of a labeled hybrid solid-state LiDAR point cloud is shown in Figure 4, where the annotated bounding boxes are displayed in blue and the object points within the 3D bounding boxes are colored red.
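For concreteness, a minimal sketch of the 7-DOF label and the five-point annotation rule follows; the field names and helper functions are our own illustration rather than the dataset's official tooling.

```python
from dataclasses import dataclass
import numpy as np

# Illustration only: the 7-DOF upright box described above and the rule that
# cuboids with fewer than five object points are discarded.

@dataclass
class Box3D:
    x: float; y: float; z: float      # center coordinates (m)
    l: float; w: float; h: float      # length, width, height (m)
    theta: float                      # heading angle (rad)
    label: str                        # "Car", "Cyclist", or "Pedestrian"

def points_in_box(points: np.ndarray, box: Box3D) -> int:
    """Count (N, 3) points inside an upright box by rotating into box frame."""
    c, s = np.cos(-box.theta), np.sin(-box.theta)
    shifted = points[:, :2] - np.array([box.x, box.y])
    local_xy = shifted @ np.array([[c, -s], [s, c]]).T  # rotate by -theta
    inside = (
        (np.abs(local_xy[:, 0]) <= box.l / 2)
        & (np.abs(local_xy[:, 1]) <= box.w / 2)
        & (np.abs(points[:, 2] - box.z) <= box.h / 2)
    )
    return int(inside.sum())

def keep_box(points: np.ndarray, box: Box3D, min_points: int = 5) -> bool:
    """Annotation rule: keep a cuboid only if it holds >= 5 object points."""
    return points_in_box(points, box) >= min_points
```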

Format Conversion and Dataset Statistics

We create a virtual camera coordinate system, add fake images, and convert the data to the widely known KITTI dataset format for researchers to use. In the pre-processing stage of the point cloud data, we discard the filtering of the point cloud range by image borders. We measured that the number of points on pedestrian objects fluctuates near the annotation threshold at a distance of 75 m. Considering the locations and numbers of the annotated objects, two levels of difficulty are designed based on the horizontal distance of objects: LEVEL_1 and LEVEL_2 correspond to the object ranges [0 m, 35 m) and [35 m, 70 m], respectively. The object horizontal distance, LEVEL_1, and LEVEL_2 are defined as follows:

$$
\mathrm{range}_{\mathrm{object}} = \sqrt{x^2 + y^2}, \qquad
\begin{cases}
\text{LEVEL\_1}, & 0\ \text{m} \le \mathrm{range}_{\mathrm{object}} < 35\ \text{m}, \\
\text{LEVEL\_2}, & 35\ \text{m} \le \mathrm{range}_{\mathrm{object}} \le 70\ \text{m},
\end{cases}
$$

where range_object is the horizontal distance of the object, x and y are the horizontal center coordinates of the object, LEVEL_1 represents the easy level, and LEVEL_2 denotes the hard level.

The evaluation metric for 3D object detection adopts Average Precision (AP) [39]. The AP is calculated as

$$
\mathrm{AP} = \frac{1}{|R|} \sum_{r \in R} \rho_{\mathrm{interp}}(r), \qquad
\rho_{\mathrm{interp}}(r) = \max_{r' : r' \ge r} \rho(r'),
$$

where R is the set of equally spaced recall levels, r is the recall, and ρ_interp(r) is the interpolation function, which takes the maximum precision ρ(r′) over all recalls r′ ≥ r.

SimoSet provides pre-defined training (32 scenes) and test (20 scenes) set splits. The training set contains 2560 frames with 14,249 3D object annotations; the test set consists of 1600 frames with 8309 3D object annotations. The proportions of 3D object annotations in the training and test sets are shown in Figure 6. SimoSet defines 3D bounding boxes for 3D object detection, and we anticipate the need to extend it to 3D multi-object tracking tasks in the future.
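The KITTI-format conversion can be pictured as follows. This is a hedged sketch under explicit assumptions: it assumes KITTI's camera convention (x right, y down, z forward) and a LiDAR frame with x forward, y left, z up, and fills the 2D bounding box and truncation/occlusion fields with placeholders since the images are fake. SimoSet's actual virtual-camera extrinsics may differ.

```python
import math

# Hedged illustration of emitting one KITTI label line from a 7-DOF
# LiDAR-frame box, under the frame assumptions stated above.

def to_kitti_label(cls: str, x: float, y: float, z: float,
                   l: float, w: float, h: float, theta: float) -> str:
    """Convert one LiDAR-frame box into a KITTI-format label line."""
    # Box location in camera coordinates; KITTI stores the bottom-face
    # center, hence the +h/2 shift along the downward camera y-axis.
    cam_x, cam_y, cam_z = -y, -z + h / 2.0, x
    rot_y = -theta - math.pi / 2.0               # heading in camera frame
    alpha = rot_y - math.atan2(cam_x, cam_z)     # observation angle
    fields = [cls, 0.0, 0, alpha,                # truncated/occluded: dummies
              0.0, 0.0, 50.0, 50.0,              # dummy 2D bbox (fake image)
              h, w, l, cam_x, cam_y, cam_z, rot_y]
    return " ".join(f"{f:.2f}" if isinstance(f, float) else str(f)
                    for f in fields)

print(to_kitti_label("Pedestrian", 12.3, -1.8, -0.4, 0.9, 0.7, 1.7, 0.1))
```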
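The difficulty assignment and the AP metric defined above can be summarized in a few lines. The sketch below is illustrative only and assumes the KITTI-style AP|R40 convention of 40 equally spaced recall levels.

```python
import numpy as np
from typing import Optional

# Illustration of the difficulty levels and interpolated AP defined above.

def difficulty_level(x: float, y: float) -> Optional[str]:
    """Assign a difficulty level from the object's horizontal center (x, y)."""
    rng = float(np.hypot(x, y))       # range_object = sqrt(x^2 + y^2)
    if rng < 35.0:
        return "LEVEL_1"              # easy: [0 m, 35 m)
    if rng <= 70.0:
        return "LEVEL_2"              # hard: [35 m, 70 m]
    return None                       # beyond the evaluated range

def ap_r40(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """Mean over 40 recall levels of the max precision at recall >= level."""
    ap = 0.0
    for r in np.linspace(1 / 40, 1.0, 40):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 40
```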

Baseline Experiments
We established baselines on our dataset with LiDAR-only 3D object detection methods. Training and test sets were created according to the SimoSet pre-defined dataset split. The AP of seven-DOF 3D boxes at 40 recall positions is adopted as the evaluation benchmark. Cars, cyclists, and pedestrians are chosen as detection objects, with an Intersection-over-Union (IoU) threshold of 0.7 used for cars and 0.5 used for cyclists and pedestrians. The class imbalance problem is an inherent attribute of autonomous driving scenarios; we therefore tested the baseline algorithms under the Det3D [40] framework with a class-balanced sampling and augmentation strategy.
To establish the baseline for LiDAR-only 3D object detection, the typical voxel-based (SECOND, PointPillars, SA-SSD) and point-based (PointRCNN) methods were retrained, taking the widely deployed PointPillars algorithm for point cloud object detection as an example. As shown in Figure 7, the PointPillars network structure consists of three parts: a pillar encoder module that converts a point cloud to a sparse pseudo-image, a 2D convolutional backbone for feature extraction, and a detection head for 3D box regression. PointPillars first divides the 3D space point cloud into pillars, extracts features using PointNet, and converts them into a sparse pseudo-image. Then, multiple 2D convolutions are utilized for downsampling to generate feature maps of different resolutions, which are aligned to the same size through multiple 2D deconvolution upsampling before being concatenated. Finally, the multi-scale features are fed into a region proposal network composed of 2D convolutional neural networks to regress the class, location, orientation, and scale.
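To make the pillar step concrete, the following simplified sketch scatters points into a dense pseudo-image. It is our illustration only: it uses a per-pillar mean where the real PointPillars learns features with a small PointNet, and the grid parameters are hypothetical.

```python
import numpy as np

# Simplified pillar encoding (illustration; not the original implementation).

def pillarize(points: np.ndarray,
              pillar: float = 0.16,
              x_range=(0.0, 69.12),
              y_range=(-39.68, 39.68)) -> np.ndarray:
    """Scatter an (N, 4) cloud [x, y, z, intensity] into a (4, H, W)
    pseudo-image by averaging the features of the points in each pillar."""
    w = int((x_range[1] - x_range[0]) / pillar)   # cells along x
    h = int((y_range[1] - y_range[0]) / pillar)   # cells along y
    xi = ((points[:, 0] - x_range[0]) / pillar).astype(int)
    yi = ((points[:, 1] - y_range[0]) / pillar).astype(int)
    ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)

    pseudo = np.zeros((4, h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for c in range(4):                            # accumulate feature sums
        np.add.at(pseudo[c], (yi[ok], xi[ok]), points[ok, c])
    np.add.at(count, (yi[ok], xi[ok]), 1.0)

    nonempty = count > 0                          # mean feature per pillar;
    pseudo[:, nonempty] /= count[nonempty]        # empty pillars stay zero
    return pseudo                                 # input to the 2D backbone
```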

Considering the different coverage between the forward hybrid solid-state LiDAR used by SimoSet and the mechanical spinning LiDAR used in existing open-source datasets, it is unsurprising that the network configuration is slightly different. The model is trained on single-frame point cloud data from the hybrid solid-state LiDAR; the detection range along the x-axis is set to [0 m, 76. The AP results of 3D proposals at different levels on the test set for the trained models can be seen in Table 3. From Table 3, it can be seen that the point-based method performs better than the voxel-based methods and that the auxiliary network aids feature representation. PointRCNN has a lower AP for pedestrian objects at LEVEL_2, which may be due to the sparsity of the point cloud for distant pedestrians, leaving the PointRCNN model with insufficient neighboring point features to rely upon. As the distance increases, the object point cloud gradually becomes sparse, leading to a decline in detection performance. We found that the AP of cars in the test results is lower than that of cyclists and pedestrians. This may be because, in some scenes, vehicles are parked along the roadside in sequence, with the preceding vehicle occluding the shape features of the following vehicle's point cloud. In addition, the horizontal FOV of the hybrid solid-state LiDAR is only 120°; when an object vehicle enters or exits the blind spot of the ego vehicle, the parts of the object vehicle outside the coverage area of the LiDAR return no point cloud data. However, as such a vehicle is relatively close to the ego vehicle, it still requires special attention and is therefore annotated. This is another reason why the AP of cars is lower than that of cyclists and pedestrians. The trained PointPillars model is applied to the test set for inference; a visualization of the results is shown in Figure 8, where the predicted and ground truth bounding boxes are shown in green and blue, respectively, and the object points within the 3D bounding boxes are colored red.

Conclusions
In this paper, we introduced SimoSet, the world's first open-source dataset collected from automotive-grade hybrid solid-state LiDAR for 3D object detection. We collected 52 scenes on a university campus, annotated three types of typical traffic participants, counted the number of objects of each type, and provided pre-defined training and test set splits along with their statistical proportions. Two levels of difficulty are designed based on the distance of the objects, and the AP of 3D bounding boxes is used as the evaluation metric. The performance of LiDAR-only detectors on SimoSet was chosen as the baseline, and the sample quality of SimoSet was demonstrated by utilizing the PointPillars algorithm. We also presented details on the installation height and angle of the hybrid solid-state LiDAR, data collection, 3D object annotation, and format conversion. By introducing SimoSet, we hope to help researchers accelerate the development and deployment of 3D point cloud object detection in the field of autonomous driving. We acknowledge that the number of data labels included in SimoSet is currently limited. In the future, we plan to expand the dataset with more diverse weather conditions, such as rain, snow, and fog. Further, we will extend SimoSet to include 3D point cloud object tracking.

Data Availability Statement: SimoSet dataset and baseline code are available from the corresponding author upon request.

Conflicts of Interest: The authors declare no conflict of interest.