Pseudo-labelling-aided semantic segmentation on sparsely annotated 3D point clouds

Manually labelling point cloud scenes for use as training data in machine learning applications is a time- and labour-intensive task. In this paper, we aim to reduce the effort associated with learning semantic segmentation tasks by introducing a semi-supervised method that operates on scenes with only a small number of labelled points. For this task, we advocate the use of pseudo-labelling in combination with PointNet, a neural network architecture for point cloud classification and segmentation. We also introduce a method for incorporating information derived from spatial relationships to aid in the pseudo-labelling process. This approach has practical advantages over current methods by working directly on point clouds and not being reliant on predefined features. Moreover, we demonstrate competitive performance on scenes from three publicly available datasets and provide studies on parameter sensitivity.


Introduction
Processing of point cloud data, such as scans acquired by LiDAR systems, is a topic of interest in the fields of machine vision and robotics [1]. For a machine to understand the contents of a scanned scene, it is often necessary to semantically segment the scene by labelling each point. Most current approaches to semantic segmentation tasks on point clouds use supervised machine learning methods which rely on abundant and accurately labelled training data. However, suitable training data is relatively scarce and expensive to generate because the task of manually annotating every point in a scene is laborious and time consuming. It is therefore advantageous to develop semantic segmentation methods that are effective when only a small amount of annotated data is available. Semi-supervised learning techniques have been used to effectively handle scarcity of labelled data by incorporating unlabelled data in training. However, only a few works have addressed this issue with specific regard to semantic segmentation of point clouds.
We propose to integrate pseudo-labelling with PointNet [2] to form a technique which can semantically label a point cloud scene given only a few labelled points. PointNet is a deep neural network architecture designed to work directly with point clouds, which allows us to process the scene without explicitly defining pre-set features. Pseudo-labelling is a form of semi-supervised learning where a classifier is trained, used to make predictions, and then retrained by taking select predictions as ground truth. We use this method to include initially unlabelled data in training. To aid in the selection of accurate pseudo-labels, we prioritize pseudo-label assignment to points close to already labelled points. Since spatially near points have an increased likelihood of sharing a label, this has the effect of prioritizing predictions which are more likely to be correct.
In this paper, we describe our approach and evaluate the performance of our technique on three publicly available datasets, comparing results to our own baselines as well as state of the art methods. Our contributions are as follows:
• We integrate pseudo-labelling with a state of the art architecture for deep learning on point clouds to semantically label a sparsely annotated scene.
• We introduce a method for generating high quality pseudo-labelled training data by leveraging the assumption that spatially near points tend to be semantically similar.
• We demonstrate improved labelling accuracy compared to learning on sparsely annotated data without the aid of pseudo-labelling. The unweighted average F-score across classes was increased by 18.3% for the Oakland dataset [3], 14.8% for the Semantic3D dataset [4], and 20.1% for the S3DIS dataset [5].
• We provide a parameter sensitivity investigation of our method by varying key parameters (the size of the local neighbourhood, the label selection thresholds, the number of labelled points, and the stopping point of the process).
An earlier version of this study was presented at the International Conference on 3D Vision (3DV) 2019 [6]. In comparison to [6], which focused on handling outdoor datasets with no RGB information, we extend the method to utilize the colour information of point cloud data and show the generality of our method by providing experimental results on an indoor dataset. Additionally, we formulate the method in a simpler way to reduce the number of hyperparameters while maintaining performance.

*Correspondence: yasuhiro.yao.tc@hco.ntt.co.jp
† Yasuhiro Yao and Katie Xu contributed equally to this work.
1 NTT Media Intelligence Laboratories, Yokosuka 239-0847, Japan
Full list of author information is available at the end of the article
(2020) 12:2

Semantic segmentation of point clouds
Feature-based pointwise classification such as [1,7,8] has traditionally been the method of choice for semantic segmentation tasks [9]. Descriptive pointwise features are computed based on a local neighbourhood and used to train a classifier such as a random forest or a support vector machine. The usefulness of this approach is limited by its reliance on predefined features.
To overcome the limitations of traditional approaches, many recent works make use of deep neural networks. Examples include 2D convolutional neural networks (CNN) [10] that operate on rendered views of the data, 3D CNNs [11] that operate on voxel representations of 3D data, and networks that operate directly on point cloud data [2,12]. These methods are much more versatile than traditional approaches because neural networks are able to represent the data without predefined features. Superpoint graph [13] has achieved state of the art performance in multiple datasets for semantic segmentation. Superpoint graph combines PointNet [2] with graphical methods to encode local features and contextual information. In our work, we also use PointNet as the base on which our method is built.
The works described above focus on training with densely labelled point clouds. However, we are interested in learning based on a reduced number of labelled points. Interactive segmentation methods such as [14][15][16] can be used to label groups of points by making a binary foreground/background classification based on sparse annotations. However, to obtain a full semantic labelling of the scene, it would be necessary to manually identify and classify every object instance. This can prove to be a time consuming task if there are many distinct objects scattered throughout the scene. In [9], unsupervised pre-segmentation is used to generate examples of objects before annotation by a human operator. Labelled examples are then used along with pairwise constraints to train a classifier in a semi-supervised fashion. This method operates on a range image representation of the scene and uses a CNN for classification. The range image representation restricts the applicability of this technique, as individual data frames are often not available. Segmentation-aided classification [17] also uses pre-segmentation to work with sparse annotation; segments are classified initially based on the output of a pointwise classifier and further processed by a conditional random field (CRF). Its use of weak supervision (by the pointwise classifier output) overcomes the need for manual classification of object instances. An important drawback of this method, however, is its reliance on carefully engineered features to capture geometry. Nonetheless, this method is, to the best of our knowledge, the state of the art for this application.

Semi-supervised learning
Semi-supervised learning refers to the use of both labelled and unlabelled data in machine learning. It is often used to enhance performance when limited amounts of labelled training data are available. In some cases, results competitive with fully supervised learning have been achieved using substantially less labelled data [18]. Pseudo-labelling [19] is an approach to semi-supervised learning which takes some of the model's own predictions as ground truth for training. The process is iterative in nature and alternates between training and pseudo-label generation. A number of variations on pseudo-labelling exist; for example, Iscen et al. [20] recently proposed transductive label propagation as a way of generating pseudo-labels. Pseudo-labelling itself is a variation of self-training, adapted for use with deep neural networks. Self-training, sometimes known as bootstrapping, is one of the earliest approaches to semi-supervised learning [21]. Self-training and pseudo-labelling share the commonality of using an existing model to automatically generate additional training data. However, label selection and retention procedures differ. In this work, we present our own variation of pseudo-labelling specifically targeted towards point cloud processing. Pseudo-labelling was chosen over other semi-supervised learning methods for its simplicity and adaptability. It is a wrapper algorithm which can be used around almost any base classifier, and its basic concept is not inherently reliant on assumptions typically used in semi-supervised learning (cluster, smoothness, low density separation, and manifold assumptions) [21]. Rather, assumptions are imposed by choices in base classifier and label selection criteria.

Problem statement
In this paper, we consider the task of semantically labelling a point cloud scene given only a small number of annotated points. Point labels are drawn from a set of known, mutually exclusive semantic classes. Point cloud scenes of interest typically contain several hundred thousand points or more. For annotations to be manually producible within a short amount of time, we set the number of labelled points to be a few tens per class (Fig. 1a). We believe such a small number of scattered initial points can be manually selected in practical situations.

Method
Our method is based on pseudo-labelling, a semi-supervised learning technique described by Lee [19]. It operates by alternating between training of a classification network and label propagation (Fig. 1). The classification network is used to predict the label of each point based on its local neighbourhood of points. It is trained in a supervised fashion using originally labelled and pseudo-labelled points. Pseudo-labelled points are points which were originally unlabelled but have been assigned a pseudo-label. Pseudo-labels are assigned during the label propagation step in each iteration by selecting "good" predictions based on their confidence and spatial relationship to already labelled points. Once assigned, pseudo-labels continue to be used as ground truth in all subsequent training steps. The process continues until all unlabelled points have been assigned a pseudo-label; at the end, pseudo-labels are taken as the final semantic labelling. This stands in contrast to pseudo-labelling as described by Lee [19], where pseudo-labels are discarded every iteration rather than accumulated as we have done. We choose to accumulate pseudo-labels because, as we demonstrate in Section 5.5.2, labels assigned early in the process are highly accurate. Thus, we avoid overwriting this useful information. In practice, due to the label selection criterion and the decreasing number of unlabelled points, the number of pseudo-labels assigned decreases exponentially every iteration. Thus, continuing to iterate until no unlabelled points remain is not feasible in a reasonable amount of time. For this reason, we end the process by assigning labels to all remaining points once more than 95% of the scene has been labelled. We study the effect of changing the 95% cutoff in Section 5.6.4. In the next two sections, we describe the network training and label propagation steps in more detail.
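The alternation between training and label propagation can be illustrated with a minimal sketch. It is not the authors' implementation: the `train` and `predict` callables are hypothetical stand-ins for the network, the toy nearest-centroid classifier exists only to make the loop runnable, and stopping when no prediction is accepted is a simplification of the stall handling described in the label propagation section.

```python
import math


def pseudo_label_loop(points, initial_labels, train, predict,
                      t_conf=0.98, cutoff=0.95):
    """Alternate training and label propagation, accumulating pseudo-labels.

    points         : list of per-point inputs
    initial_labels : dict {point index: class} with the sparse annotations
    train          : hypothetical callable (points, labels) -> model
    predict        : hypothetical callable (model, point) -> (class, confidence)
    """
    labels = dict(initial_labels)  # accepted pseudo-labels are never discarded
    n = len(points)
    while True:
        model = train(points, labels)
        unlabelled = [i for i in range(n) if i not in labels]
        preds = {i: predict(model, points[i]) for i in unlabelled}

        # Accept only predictions whose confidence exceeds the threshold.
        accepted = {i: cls for i, (cls, conf) in preds.items() if conf > t_conf}

        # Stop once the cutoff fraction is exceeded, or (a simplification of
        # this sketch) when no prediction is accepted: label all that remains.
        if len(labels) > cutoff * n or not accepted:
            labels.update({i: cls for i, (cls, _) in preds.items()})
            return labels
        labels.update(accepted)


# Toy stand-ins for the network: a nearest-centroid classifier on 1D points.
def train_centroids(points, labels):
    sums, counts = {}, {}
    for i, cls in labels.items():
        sums[cls] = sums.get(cls, 0.0) + points[i]
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: sums[cls] / counts[cls] for cls in sums}


def predict_centroid(model, x):
    scores = {cls: math.exp(-abs(x - mu)) for cls, mu in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


toy_points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
toy_result = pseudo_label_loop(toy_points, {0: "a", 4: "b"},
                               train_centroids, predict_centroid, t_conf=0.9)
```

In this toy run, two annotated points are enough to label the two well-separated clusters; the real method differs mainly in the classifier and in the selection criterion.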

Network and training
Traditional classification methods are limited by the need for predefined features. For this reason, we choose to use a neural network for feature learning and point classification. Specifically, we select the PointNet architecture [2] because it is simple and is not subject to the computational expense and loss of information associated with conversions to voxel and image representations. We use their implementation of the classification network as described in [2] with similar parameters and training schedule. Details on the network architecture and training parameters are given in Appendix A.
To facilitate feature extraction, each point i is represented by its local neighbourhood in the local coordinate frame. We define the neighbourhood as the collection of points within a radius r of i. Using this as input to the network, the model is trained as described above. This is equivalent to PointNet++ with single scale grouping and a single set abstraction layer [12], and we use their implementation of ball query to compute the local neighbourhood. The output of the network is a softmax normalized score which represents the probabilistic classification of the point. The predicted class is the class with the highest probability, and that probability is taken as the confidence c of the prediction.
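As an illustration of the input representation and of the confidence value, the following sketch gathers the neighbours within radius r of a point, expresses them relative to that point, and reads the predicted class and confidence from softmax-normalized scores. It is a brute-force stand-in, not the PointNet++ ball query implementation (which, among other things, bounds the number of samples per ball); the function names are hypothetical.

```python
import math


def ball_query_local(points, i, r):
    """Neighbours of points[i] within radius r, in coordinates centred on points[i]."""
    cx, cy, cz = points[i]
    neighbourhood = []
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        if dx * dx + dy * dy + dz * dz <= r * r:
            neighbourhood.append((dx, dy, dz))
    return neighbourhood


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(scores):
    """Return (class index, confidence c) from raw network scores."""
    probs = softmax(scores)
    cls = max(range(len(probs)), key=probs.__getitem__)
    return cls, probs[cls]


cloud = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
local = ball_query_local(cloud, 1, 1.0)
cls, conf = predict_with_confidence([2.0, 0.0, 0.0])
```

The query point itself is part of its own neighbourhood and appears at the origin of the local frame.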
Every iteration, the model is trained until convergence using a modified version of the cross entropy loss used in [2]. Convergence is determined by keeping a moving average of training accuracy. If this value changes by less than 0.2% for five consecutive updates, training is said to have converged and pseudo-labels are assigned after the end of the epoch. Cross-entropy loss weighted by the proportion of pseudo-labelled data to originally labelled data is used to account for the increasing quantity of pseudo-labelled points compared to originally labelled points.
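The convergence test can be sketched as follows. The text does not specify how the moving average is computed or whether 0.2% is an absolute change, so the exponential moving average, its `momentum` value, and the absolute tolerance of 0.002 are all assumptions of this sketch.

```python
class ConvergenceMonitor:
    """Flags convergence when the moving average of training accuracy is stable."""

    def __init__(self, tol=0.002, patience=5, momentum=0.9):
        self.tol = tol              # 0.2%, read here as an absolute change (assumption)
        self.patience = patience    # five consecutive stable updates
        self.momentum = momentum    # exponential moving average (assumption)
        self.avg = None
        self.stable = 0

    def update(self, accuracy):
        """Feed one training accuracy in [0, 1]; return True once converged."""
        if self.avg is None:
            self.avg = accuracy
            return False
        new_avg = self.momentum * self.avg + (1.0 - self.momentum) * accuracy
        if abs(new_avg - self.avg) < self.tol:
            self.stable += 1
        else:
            self.stable = 0  # any larger change restarts the count
        self.avg = new_avg
        return self.stable >= self.patience


monitor = ConvergenceMonitor()
history = [monitor.update(0.9) for _ in range(6)]
```

With a constant accuracy, the first call only initializes the average and the following five calls each count as a stable update, so convergence is reported on the sixth call.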
For discrete distributions with mutually exclusive classes, cross-entropy loss is given by

$$L = -\sum_{(l,p) \in S} l \cdot \log p = -\sum_{(l,p) \in S_a} l \cdot \log p - \sum_{(l,p) \in S_p} l \cdot \log p$$

where $l$ is the training class of a point as a one-hot vector, $p$ is the probabilistic prediction made by the network, $S$ is the set of pairs of $l$ and $p$ for all labelled points, $S_a$ is the set of pairs of $l$ and $p$ for initially annotated points, and $S_p$ is the set of pairs of $l$ and $p$ for pseudo-labelled points. Note that $S_a \cup S_p = S$ and $S_a \cap S_p = \emptyset$.
As pseudo-labelling progresses, the cardinality $|S_p|$ becomes much greater than $|S_a|$, causing the model to increasingly favour fidelity with pseudo-labelled data over the originally labelled data. This is undesirable for two reasons: (i) the originally labelled data is guaranteed to be correct whereas the pseudo-labelled data may contain errors and (ii) the quantity of pseudo-labelled data may be highly imbalanced across classes, which is known to adversely impact learning. The originally labelled data, on the other hand, can be selected to be well balanced. We compensate for these effects by scaling the first term by the proportion of pseudo-labelled data relative to originally labelled data. The modified loss $L'$ thus becomes

$$L' = -\frac{|S_p|}{|S_a|} \sum_{(l,p) \in S_a} l \cdot \log p - \sum_{(l,p) \in S_p} l \cdot \log p$$

In practice, this effect is achieved by repeatedly sampling from the original labelled data. In the event that not all pseudo-labelled points are used in training, $S_p$ is the set of pairs of $l$ and $p$ for only the points actually used in training in one epoch.
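The weighting can be made concrete with a small sketch. It assumes that the annotated term is scaled by the ratio of the number of pseudo-labelled pairs to the number of annotated pairs, and it checks that this equals an unweighted loss in which the annotated pairs are repeated, which is how the text says the effect is achieved in practice. Function names are hypothetical.

```python
import math


def cross_entropy(pairs):
    """Summed cross-entropy over (l, p) pairs; l is one-hot, p is a probability list."""
    # Only the entry selected by the one-hot label contributes to l . log p.
    return -sum(math.log(pi) for l, p in pairs for li, pi in zip(l, p) if li)


def weighted_cross_entropy(annotated, pseudo):
    """Scale the annotated term by |S_p| / |S_a| (assumed form of the modified loss)."""
    weight = len(pseudo) / len(annotated)
    return weight * cross_entropy(annotated) + cross_entropy(pseudo)


annotated = [([1, 0], [0.5, 0.5])]
pseudo = [([0, 1], [0.5, 0.5])] * 4

loss = weighted_cross_entropy(annotated, pseudo)
# Repeating each annotated pair |S_p| / |S_a| = 4 times gives the same value.
resampled_loss = cross_entropy(annotated * 4 + pseudo)
```

The equivalence with resampling is exact only when the ratio is an integer; otherwise repeated sampling approximates the weight in expectation.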

Label propagation
In the label propagation step, the model is evaluated to generate predictions for all unlabelled points which have not yet been assigned a pseudo-label. Predictions made with confidence (or modified confidence) values above a threshold $t_{conf}$ are selected as pseudo-labels. The confidence $c$ of a prediction is defined as the maximum element of the softmax normalized probability of the model output.
Our observations confirm that $c$ is strongly correlated with accuracy, even on data not used in training. Additionally, we experiment with using modified confidence values $c'$ and $c''$ for label selection instead of $c$ alone. $c'$ incorporates awareness of spatial relationships by applying a multiplier $k_{dist}$ which reduces the confidence of a prediction if it is far away from already labelled points of the predicted class. This encourages spatially smooth labelling, which is desirable because point cloud representations of reality generally display some degree of spatial regularity. Formally,

$$c' = k_{dist} \, c$$

where $k_{dist}$ is defined as follows, by assuming that the probability of two points sharing a label follows a normal distribution based on the distance between them:

$$k_{dist} = \exp\left(-\frac{d^2}{\sigma^2}\right)$$

Here, $d$ is the Euclidean distance between the two points and $\sigma$ is the standard deviation of the distribution. For a given prediction, $d$ is computed as the distance to the nearest point of the predicted class. $\sigma$ is selected so that $k_{dist}$ is slightly less than $t_{conf}$ when $d = r$, where $r$ is the radius of the local neighbourhood described in Section 4.1. This has the effect of restricting label selection to points that are within a distance of $r$ of existing labelled points. When RGB data is available, we can extend this idea to RGB space by defining $k_{rgb}$ based on $d_{rgb}$, the distance in RGB space, rather than the physical distance. $k_{rgb}$ is applied to $c$ in addition to $k_{dist}$ as follows:

$$c'' = k_{rgb} \, k_{dist} \, c$$

with $k_{rgb}$ defined as

$$k_{rgb} = \exp\left(-\frac{d_{rgb}^2}{\sigma_{rgb}^2}\right)$$

where $\sigma_{rgb}$ is the standard deviation of this distribution, selected in a similar way to $\sigma$.
Since $k_{dist}$ decreases towards zero as distance increases, the maximum confidence of predictions far away from already labelled points is reduced below the selection threshold, and such predictions have no chance of being selected as pseudo-labels. This is problematic because it prevents instances of a class which are not near the originally labelled points from being labelled. As a provision for such a situation, we temporarily ignore $k_{dist}$ or $k_{rgb}$ when the number of pseudo-labels assigned in an iteration drops below 1000, as follows. When $k_{rgb}$ is used, we first ignore $k_{rgb}$. If the number of predictions selected is still less than 1000 without $k_{rgb}$, then $k_{dist}$ is also ignored. Finally, if fewer than 1000 predictions are selected despite ignoring $k_{dist}$ and $k_{rgb}$, then all remaining unlabelled points are labelled and the process is stopped.
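The selection rule and its fallback can be sketched for the geometry-only case. The sketch assumes the unnormalized Gaussian form exp(-d^2 / sigma^2) for the distance multiplier, a form chosen here because it reproduces the sigma values quoted in the experiment conditions; the nearest-neighbour search is brute force, and all function names are hypothetical.

```python
import math


def k_gauss(d, sigma):
    """Distance multiplier; assumed form exp(-d^2 / sigma^2)."""
    return math.exp(-(d * d) / (sigma * sigma))


def select_pseudo_labels(preds, labelled, t_conf, sigma, min_accept=1000):
    """Pick pseudo-labels for one label propagation step.

    preds    : list of (xyz, predicted class, confidence c), one per unlabelled point
    labelled : dict {class: list of xyz of points already carrying that label}
    Returns (accepted, finished); accepted is a list of (index into preds, class).
    """
    def run(use_dist):
        accepted = []
        for i, (xyz, cls, c) in enumerate(preds):
            if use_dist:
                same_class = labelled.get(cls, [])
                if not same_class:
                    continue  # no labelled point of this class to measure from
                d = min(math.dist(xyz, q) for q in same_class)  # brute force
                c = k_gauss(d, sigma) * c  # modified confidence c'
            if c > t_conf:
                accepted.append((i, cls))
        return accepted

    accepted = run(use_dist=True)
    if len(accepted) >= min_accept:
        return accepted, False
    accepted = run(use_dist=False)  # progress stalled: ignore k_dist this step
    if len(accepted) >= min_accept:
        return accepted, False
    # Still too few: label every remaining point and stop the process.
    return [(i, cls) for i, (_, cls, _) in enumerate(preds)], True


labelled = {"ground": [(0.0, 0.0, 0.0)]}
preds = [((0.1, 0.0, 0.0), "ground", 0.99), ((10.0, 0.0, 0.0), "ground", 0.99)]
near_only = select_pseudo_labels(preds, labelled, 0.95, 4.02, min_accept=1)
```

In the example, both predictions have the same raw confidence, but only the point close to an existing "ground" label survives the distance weighting. For scenes with hundreds of thousands of points, a spatial index would replace the brute-force search.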

Competing methods
To demonstrate the performance of our method compared to the state of the art, we evaluate against segmentation-aided classification (seg-aided) [17] as well as their baselines, CRF-regularization (CRF-reg) [22], and pointwise classification with a random forest [1]. We use their implementations of CRF-regularization [23][24][25][26][27] and segmentation-aided classification [17,28] with our own implementation of a random forest classifier using their geometric features. Neighbourhood selection was performed following the procedure described by Weinmann et al. [1]. Details are given in Appendix B.
In their original work, Guinard and Landrieu [17] describe four local descriptors (linearity, planarity, scatter, and verticality) and two global descriptors (elevation and position with respect to the road). All descriptors were used for initial pointwise classification and only the local descriptors were used for segmentation. However, we only implement the local descriptors which are used for both initial classification and segmentation. We do not include the global descriptors because they are not applicable to indoor scenes, which we also consider.

Data
We test our method on scenes from three publicly available datasets: the Oakland 3D point cloud dataset [3], the Semantic3D large scale point cloud classification benchmark [4], and the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [5]. For each scene, we choose a small number of points to use as labelled data. The remaining points are used as unlabelled training data and for evaluation. For the Oakland and Semantic3D datasets, we deliberately choose a data setup similar to [17] so that readers can compare their evaluation with ours.
The Oakland dataset is a labelled 3D point cloud captured by mobile laser scanners near the CMU campus in Pittsburgh, Pennsylvania. The scene is divided into five classes: foliage, wire, pole, ground, and façade. Specifically, we take the urban portion of the test set consisting of 655,273 points. For training, 15 labelled points are randomly selected from each class for a total of 75 points.
The Semantic3D dataset consists of several outdoor scenes captured by stationary 3D scanners. We consider one of the urban scenes (domfountain1). The full dataset consists of 8 classes: man-made terrain, natural terrain, high vegetation, low vegetation, buildings, hard scape, scanning artefacts, and cars; however, our chosen scene does not contain any natural terrain. This dataset also includes unlabelled points. As with the Oakland dataset, we prepare this dataset similarly to Guinard and Landrieu [17]. We start by subsampling the scene to 3.5 million points. High vegetation and low vegetation are then combined into a single class and all unlabelled points are removed. 1,982,375 points remain, divided between 6 classes: terrain, vegetation, buildings, hard scape, scanning artefacts, and cars. For training, we randomly select 30 points per class for a total of 180 points.
The S3DIS dataset is a large scale dataset comprised of coloured scans of indoor areas. The full dataset consists of 6 areas from 3 buildings. Each area is divided into sections such as offices, auditoriums, and hallways. For our experiments, we choose to work with a single room consisting of 759,861 points (area 3, office 1). This dataset has 13 classes; however, only 11 of these appear in our section of choice. These classes are ceiling, floor, wall, beam, window, door, table, chair, bookcase, board, and clutter. The two classes that do not appear are sofa and column. Just as with the Oakland dataset, we randomly select 15 labelled points per class for a total of 165 points.
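The random selection of a fixed number of labelled points per class used for all three datasets can be sketched as follows. The function name and the seeding are assumptions for illustration; the paper does not describe its sampling code.

```python
import random


def sample_initial_labels(point_classes, per_class, seed=0):
    """Randomly pick `per_class` annotated points from every class.

    point_classes : list giving the ground-truth class of each point
    Returns a dict {point index: class} to use as the sparse annotation.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, cls in enumerate(point_classes):
        by_class.setdefault(cls, []).append(idx)
    chosen = {}
    for cls, indices in by_class.items():
        # Classes with fewer points than requested contribute all their points.
        for idx in rng.sample(indices, min(per_class, len(indices))):
            chosen[idx] = cls
    return chosen


toy_classes = ["floor"] * 100 + ["wall"] * 50 + ["chair"] * 5
annotation = sample_initial_labels(toy_classes, per_class=15)
```

With 15 points per class, the five Oakland classes give 75 labelled points and the eleven S3DIS classes give 165, matching the totals stated above.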

Evaluation metric
Following Guinard and Landrieu [17], we evaluate our results using the unweighted average of F-scores across classes. This metric compensates for class imbalance because it is not influenced by class cardinality and tends to favour balanced performance across classes. That is to say, exceptionally poor performance in a given class is not easily compensated by exceptionally good performance in another. The evaluation metric is computed based on pseudo-label assignments at the end of the process. It is sometimes the case that the F-score for a class is undefined. This happens when either no instances of the class are predicted correctly, or no instances of the class are predicted at all. When taking the average, undefined F-scores are treated as 0. In our results, we also note the overall accuracy and per-class F-scores.
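A minimal sketch of this metric, with undefined per-class F-scores counted as 0, is shown below. The function name is hypothetical.

```python
def macro_f_score(y_true, y_pred, classes):
    """Unweighted average of per-class F-scores; undefined scores count as 0."""
    per_class = {}
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        # With no true positives, precision or recall is zero or undefined,
        # so the F-score is treated as 0.
        per_class[cls] = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
    return sum(per_class.values()) / len(classes), per_class


truth = ["a", "a", "b", "b", "c"]
predicted = ["a", "a", "b", "a", "a"]
average, scores = macro_f_score(truth, predicted, ["a", "b", "c"])
```

Here class "c" is never predicted, so it contributes 0 to the average and pulls the score down even though four of the five points belong to other classes.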

Experiment conditions
Unless otherwise specified, $r = 1$ is used for experiments on the Oakland and Semantic3D datasets and $r = 0.25$ for the S3DIS dataset. When $k_{dist}$ is used, $\sigma$ is specified so that $k_{dist} = t_{conf} - 0.01$ when $d = r$. When $k_{rgb}$ is used, $\sigma_{rgb}$ is selected in the same way except that $r_{rgb} = 15$ is used instead of $r$. In Section 5.6, we further explore the effect of changing $r$ and $t_{conf}$ using the Oakland dataset.
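The rule for choosing the standard deviations can be worked through numerically. The sketch assumes the multiplier has the form exp(-d^2 / sigma^2), so that requiring a value of t_conf - 0.01 at distance r gives sigma = r / sqrt(-ln(t_conf - 0.01)); the helper name is hypothetical.

```python
import math


def sigma_for(radius, t_conf, margin=0.01):
    """Solve exp(-radius^2 / sigma^2) = t_conf - margin for sigma."""
    return radius / math.sqrt(-math.log(t_conf - margin))


sigma_outdoor = sigma_for(1.0, 0.95)    # Oakland and Semantic3D, r = 1
sigma_indoor = sigma_for(0.25, 0.95)    # S3DIS, r = 0.25
sigma_colour = sigma_for(15.0, 0.95)    # RGB space, r_rgb = 15
```

Under this assumption, the three results round to 4.02, 1.005, and 60.30, which are the values of the standard deviations listed in the experiment conditions for a threshold of 0.95.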
To evaluate the performance of our method on the Oakland and Semantic3D datasets, we perform the following experiments:
i) Pointwise RF: classification based on local geometric features using a random forest classifier trained on only the labelled data
ii) CRF-reg: CRF-regularization applied to the random forest initial classification
iii) Seg-aided: segmentation-aided classification with segmentation based on local geometric features applied to the random forest initial classification
iv) Supervised baseline: PointNet trained on only the labelled data
v) Ours no $k_{dist}$: pseudo-labelling with $k_{dist}$ not applied; label selection is based only on the confidence of the prediction ($t_{conf} = 0.98$)
vi) Ours with $k_{dist}$: pseudo-labelling with $k_{dist}$ ($\sigma = 4.02$ and $t_{conf} = 0.95$)
For the S3DIS dataset, we perform the same experiments described for the Oakland dataset; however, training data was sampled from pseudo-labelled points so that the number of points taken from each class was the same (2,272 points/class). Furthermore, RGB information is available for this dataset; however, existing methods are not designed to handle RGB information. Therefore, we run these experiments once without RGB information and once using RGB information. For conditions i through iii, we include RGB information by using RGB values as features to train the random forest; RGB values are not used for segmentation as we found this to be less effective than using geometry alone. For conditions iv through vi, we append RGB values to point features after the input transform in PointNet. For condition vi, following the derivation of $\sigma$ given earlier in this section, we select $\sigma = 1.005$. When RGB information is used, we include an additional test condition where $k_{dist}$ and $k_{rgb}$ are both used:
vii) Ours with $k_{dist}$ and $k_{rgb}$: pseudo-labelling with $k_{dist}$ and $k_{rgb}$ ($\sigma = 1.005$, $t_{conf} = 0.95$, and $\sigma_{rgb} = 60.30$)

Overall performance
Table 1 lists our results for the Oakland dataset. Our method at its best outperforms segmentation-aided classification [17] due to significant improvement in the pole class. However, their method achieves slightly better performance in other classes. Additionally, our method with $k_{dist}$ demonstrates substantial improvement over the fully supervised baseline, especially in the pole and wire classes. From our observations, this is due largely to an improvement in precision as fewer points are mislabelled as poles and wires. For the Semantic3D dataset, results are shown in Table 2. Again, our method with $k_{dist}$ achieves the best results overall, demonstrating significant improvements over both the state of the art and the fully supervised baseline. We notice, however, that performance of the competing methods is notably worse on the Semantic3D dataset compared to Oakland, yet in their original paper, the authors report better performance on the Semantic3D dataset. This difference can be attributed to several factors:
• We did not implement their global descriptors, which may have been important for this dataset.
• We selected training data randomly rather than manually choosing representative points based on the geometric features.
• In all cases, we use the same hyperparameters for both datasets. This may also explain why our own method fares worse on the Semantic3D dataset compared to Oakland. However, we believe this result shows that our method is less dependent on hyperparameter settings than the alternative.
For the S3DIS dataset, results are shown in Table 3 (no RGB) and in Table 4 (with RGB). We observe that incorporating RGB information improves performance for both the competing methods and our method. Among all conditions, our method with $k_{dist}$ and $k_{rgb}$ (condition vii in Section 5.4) achieved the best result. Notably, segmentation-aided classification [17] failed to correctly classify the clutter and the wall classes on this dataset with RGB. Although their pointwise prediction gave some correct results, the points were mislabelled after the segmentation-aided smoothing. We believe this is because the correctly predicted points were segmented together with a larger number of points from another class. On the other hand, our method stably predicted correct classes for all types of objects. Figure 2 shows the visualized results of semantic segmentation described in this section. We notice that for the Oakland and Semantic3D datasets, the $k_{dist}$ variation results in better performance, particularly in areas with low point densities. We believe this occurs because variations without $k_{dist}$ tend to wrongly label sparse scatter early in the process as a result of overfitting to the initial training data. On the other hand, applying $k_{dist}$ does not allow labelling of faraway points; as a result, most scatter is not labelled until the model has developed better generalization abilities by training on data with greater variation. The ability to perform well in low density regions presents an important advantage when working with real world data, which often contains large variations in point density.

Intermediate results
Our method labels the scene gradually by accepting confident predictions every iteration. In this section, we discuss the intermediate stages of the process for the case when $k_{dist}$ is applied. Figure 3 visualizes label assignments at three points in the process alongside error cases for each. Intermediate F-scores shown below the images are calculated by evaluating on the accepted pseudo-labels at each stage. Figure 4 plots intermediate F-scores against the percentage of points labelled. From these figures, we make two important observations. First, pseudo-labelled points selected early on are highly accurate. Thus, they provide the model with additional high quality training data. This is why our method was able to achieve improvements over the supervised baseline. Second, we observe that pseudo-labels remain quite accurate until most points have been labelled and that there is a rather sudden drop in performance when approximately 85% of the scene has been labelled. Interestingly, we note that this occurred when $k_{dist}$ was not applied (as described in Section 4.2, we do this when the number of pseudo-labels assigned in an iteration drops below 1000). This confirms our initial assumption that spatially near points tend to be semantically similar. Additionally, based on these results, we suggest that it may be possible to improve performance by incorporating user interaction into our process. This can be accomplished, for example, by having the programme ask the user for additional annotations rather than ignoring $k_{dist}$ when progress slows.

Parameter studies
In this section, we investigate the effect of varying key process and data preparation parameters. Specifically, we experiment with changing the size of the local neighbourhood ($r$), the label selection thresholds ($t_{conf}$ and $t_{dist}$), the number of labelled points ($|S_a|$), and the stopping point of the process.
Table 5 shows the effect of changing the neighbourhood radius $r$. We observe that there exists an optimal radius for this particular dataset around 1 to 1.5. We also observe that away from the optimum, a larger radius yields better results than a smaller radius. This is consistent with observations made by Qi et al. in [12]. Furthermore, we note that performance does not deteriorate rapidly as we stray from optimal values and remains competitive with segmentation-aided classification in most cases tested.
Table 6 shows the effect of varying the label selection thresholds. For these experiments, we restrict the search space by setting $t_{dist} = t_{conf} - 0.01$, consistent with the choice of $\sigma$ in Section 5.4, where $k_{dist} = t_{conf} - 0.01$ at $d = r$. These results show that a highly restrictive threshold is detrimental to performance while a more relaxed threshold yields favourable results. Furthermore, with the exception of highly restrictive selection thresholds, performance is not heavily influenced by small changes and remains competitive with segmentation-aided classification.
Table 7 shows the effect of changing the number of labelled points. We tested both our method and segmentation-aided classification. We observe that our method cannot outperform segmentation-aided classification when very few points are labelled. Furthermore, neither method benefits significantly from increased data. In fact, our method performs better with 15 labels per class than with 30 or 100. This indicates sensitivity to the specific choice of initially labelled data. Thus, it would be beneficial to develop a suitable strategy for selecting annotations.
[Figure caption: a ground truth, b predictions made using Seg-aided [17], c predictions made by our supervised baseline, d predictions made by our method without $k_{dist}$ or $k_{rgb}$, e predictions made by our method with $k_{dist}$ (for the Oakland and Semantic3D datasets) or with $k_{dist}$ and $k_{rgb}$ (for the S3DIS dataset). Colours correspond to semantic classes. White dots indicate initially annotated points.]

Process cutoff
In Section 4.2 we specify that, for practicality reasons, the process ends by assigning labels to all unlabelled points when more than 95% of the scene has been labelled. Here, we show the effect of changing the cut-off point. In Fig. 5 we plot, against the percentage of pseudo-labels assigned, the F-score that would be obtained if the process were stopped at that point. The F-score was calculated after each iteration by evaluating on current pseudo-label assignments and predictions made on unlabelled points. We observe that, in general, delaying the end of the process improves performance, and thus stopping point selection becomes a trade-off between processing time and accuracy.
[Table notes: the segmentation-aided classification result is reproduced in the last row for reference; experiments were performed using our method and segmentation-aided classification.]

Conclusions
We have introduced a method for semantically labelling a point cloud scene given a small number of annotated examples. Our proposed method implements a pseudo-labelling training procedure using PointNet as a base classifier. In addition, we include spatial awareness by favouring points near existing labelled points when selecting pseudo-labels. We have demonstrated competitive performance against baseline and state of the art methods for this task. Moreover, our method has several advantages over current approaches. Most significantly, we are able to work directly with point clouds and do not rely on predefined features. Our method with $k_{dist}$ in particular was observed to perform well in regions with low point density, where other variants had failed. Additionally, we have shown that our method is fairly robust to changes in hyperparameter settings.
In the future, it is worthwhile to investigate methods to select favourable initial labels. It may also be possible to improve results by incorporating user interaction to avoid deteriorating performance during later stages of the process.
In addition, by selecting initial points randomly, our experiments implicitly assume that the labelled and unlabelled data follow the same distribution. We did not evaluate whether this assumption is practical or how our method performs when it does not hold. A user study with manually created initial annotations, along with user guidance, is left as future work.