Feature weighting in DBSCAN using reverse nearest neighbours

DBSCAN is arguably the most popular density-based clustering algorithm, and it is capable of recovering non-spherical clusters. One of its main weaknesses is that it treats all features equally. In this paper, we propose a density-based clustering algorithm capable of calculating feature weights representing the degree of relevance of each feature, which takes the density structure of the data into account. First, we improve DBSCAN and introduce a new algorithm called DBSCANR. DBSCANR reduces the number of parameters of DBSCAN to one. Then, a new step is introduced to the clustering process of DBSCANR to iteratively update feature weights based on the current partition of the data. The feature weights produced by the weighted version of the new clustering algorithm, W-DBSCANR, measure the relevance of variables in a clustering and can be used in feature selection in data mining applications where large and complex real-world data are often involved. Experimental results on both artificial and real-world data have shown that the new algorithms outperformed various DBSCAN-type algorithms in recovering clusters in data. © 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The digital universe is growing in size every year. This wealth of generated data is usually stored in digital media, hence offering huge potential for its automatic mining. Raw data by itself is unlikely to be useful, and given its size it is very expensive to label. Clustering algorithms follow the unsupervised learning framework, which does not require labelled data to learn from. Given a data set Y containing n points, a clustering algorithm will produce a set of clusters so that the points assigned to a given cluster are similar according to some measure. These algorithms have been applied as a dominant data analysis tool in diverse fields including medicine, marketing, bioinformatics, image processing, computer security, geography, physics, and astronomy (see for instance [1], and references therein).
There are indeed different approaches clustering algorithms may employ. Arguably, the most popular approaches are partitional, hierarchical, and density-based. Algorithms following the partitional approach produce a set of K disjoint clusters S = {S_1, S_2, ..., S_K} whose sum of cardinalities equals the cardinality of the data set itself, that is, Σ_{S_l ∈ S} |S_l| = n. Hierarchical algorithms go a step further by also producing information about the relationships between clusters, usually at a higher computational cost. Density-based algorithms define clusters as areas of higher density, which allows clusters with arbitrary shapes. In this paper we focus on density-based algorithms. We direct readers interested in other approaches to the many sources in the literature (see for instance [2,3], and references therein). DBSCAN [4] is a classic example of a density-based algorithm, and it remains very relevant [5]. It recovers clusters in a two-step approach: (i) it identifies the core points, that is, a set of high-density points; (ii) it forms clusters from these core points by grouping reachable points. Reachability is defined in such a way that no two points of different high-density regions, separated by a contiguous low-density region, are reachable from each other. This definition makes DBSCAN intuitive and rather popular among density-based clustering algorithms.
Unfortunately, DBSCAN does have shortcomings. Among these we have: (i) it requires two parameters with no obvious method to determine their optimum values; in fact, the algorithm is rather sensitive to these parameters; (ii) it is not particularly suitable for data sets containing clusters with widely different densities; (iii) it treats all features equally regardless of their contribution to the clustering. Various algorithms have attempted to address shortcomings (i) and (ii); for details see Section 2.
We find (i) and (ii) to be important, but (iii) is particularly interesting. This is an issue because in real-world data sets it is unlikely that all features will be relevant. In fact, even among relevant features there may be different degrees of relevance. Hence, density-based clustering algorithms may benefit from taking into account the relevance of each feature during the clustering procedure. This principle has been successfully applied to partitional and hierarchical algorithms (see for instance [6-8], and references therein), but no such large research effort has been made for density-based clustering algorithms.
The main contribution of this paper is two-fold. We first introduce DBSCANR, a density-based clustering algorithm that uses the reverse nearest neighbourhood to address shortcomings (i) and (ii). Second, we extend this work by introducing automatic feature weights. These feature weights are used to model the degree of relevance of each feature, and can be seen as a generalisation of feature selection. The latter either selects or deselects a particular feature, whereas feature weights assign a degree of relevance to each feature, a factor between zero and one.

Related work
Unfortunately, there is no precise, widely accepted definition of the term cluster. A loose definition often employed is that a cluster is a compact set of similar points. Clearly, clusters may have different cardinalities, shapes and densities. Density-based clustering algorithms aim at discovering high-density regions that are separated from each other by contiguous regions of lower density [9]. It is intuitive to assign the term cluster to such high-density areas. These algorithms rely heavily on a density estimation function, but they do not usually make assumptions regarding the number of clusters in a data set, or the data distribution. This can lead to the identification of arbitrarily shaped clusters. In this section we describe some of the key algorithms related to our research, giving emphasis to those we experimentally compare with.
DBSCAN is often considered the most popular density-based clustering algorithm. As such, it can detect clusters of different cardinalities and shapes. Given a data set Y containing n points y_i, each described over V features, DBSCAN begins by assigning each y_i ∈ Y to one of three categories: (i) core; (ii) (directly) reachable; (iii) outlier. A core point is a y_i ∈ Y with at least minPts points within a distance of ε. In other words, let d(y_i, y_j) denote the distance between y_i and y_j (1), and N_ε(y_i) = {y_j ∈ Y : d(y_i, y_j) ≤ ε} (2). The point y_i ∈ Y is a core point iff |N_ε(y_i)| ≥ minPts, where minPts is a user-defined threshold. A point y_j is said to be directly reachable from y_i iff y_i is a core point and y_j ∈ N_ε(y_i). A point y_j is reachable from y_i if there is a path of points y_i, ..., y_j where each point is directly reachable from the previous. Outliers are points that are unreachable from any other point in the data set. DBSCAN produces a clustering using the definitions above and following three simple steps: (i) for each y_i ∈ Y, compute N_ε(y_i) and identify the set of core points; (ii) for each core point, identify all reachable and directly reachable points; (iii) assign each non-core point (excluding outliers) to its connected cluster.
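As an illustration, a minimal sketch of the ε-neighbourhood and core-point test just described; function names are ours and this is not the reference implementation.

```python
# Sketch of DBSCAN's epsilon-neighbourhood (Eq. (2)) and core-point test.
import numpy as np

def epsilon_neighbourhood(Y, i, eps):
    """Indices of all points within distance eps of point i (point i included)."""
    d = np.linalg.norm(Y - Y[i], axis=1)
    return np.where(d <= eps)[0]

def core_points(Y, eps, min_pts):
    """A point is core when its epsilon-neighbourhood holds at least min_pts points."""
    return [i for i in range(len(Y))
            if len(epsilon_neighbourhood(Y, i, eps)) >= min_pts]
```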
DBSCAN produces a clustering based on three things: ε, minPts, and the distance function in use. Under usual conditions, ε is inversely proportional to the number of clusters: a high ε leads to larger neighbourhoods and, by consequence, a lower number of clusters, while a low ε has the opposite effect. It is often stated that density-based clustering algorithms are capable of recovering clusters of arbitrary shapes. This is a very tempting thought, which may lead some to disregard the importance of selecting an appropriate distance or similarity measure. This measure is the key to producing homogeneous clusters as it defines homogeneity, so it has an impact on the actual clustering. Most likely the impact will not be as obvious as when applying an algorithm such as k-means [10] (where the distance in use leads to a clear bias towards a particular cluster shape, something that can also be exploited [11]). However, the impact of this selection will still exist at a more local level. If this were not the case, DBSCAN would produce the same clustering regardless of the distance measure in place.
OPTICS [12] still requires two parameters, minPts and ε, but it manages to address DBSCAN's inability to deal with clusters of different densities. It does so by taking into consideration the distance between core points and the minPts-th nearest point when calculating the reachability distances. This essentially allows OPTICS to identify clusters in data of varying density. One should note that a higher ε incurs more computational cost [13].
ISDBSCAN [14] pioneered the use of reverse nearest neighbours (RNN) [15] in DBSCAN-based algorithms. Let us first make some important definitions. The nearest neighbour of y_i ∈ Y is the point y_j ∈ Y with the lowest distance to y_i, with y_i ≠ y_j. With this, we can now define the k-neighbourhood of y_i as the set NN_k(y_i) containing the k nearest points to y_i, with y_i ∉ NN_k(y_i). We can now make an important definition we will use later on.
Definition 1. The reverse k-nearest neighbourhood of y_i ∈ Y is the set RNN_k(y_i) = {y_j ∈ Y : y_i ∈ NN_k(y_j)}. While NN_k(y_i) always contains exactly k points, there is no such guarantee for RNN_k(y_i). However, this RNN-based approach allows the algorithm to capture local densities in different regions of the data space, leading to the recovery of clusters having heterogeneous densities. In addition, ISDBSCAN attempts to lower the difficulty of using DBSCAN by removing one of its parameters, ε, leaving only k (the number of nearest neighbours) as a parameter.
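For illustration, a brute-force sketch of the k-neighbourhood NN_k and the reverse neighbourhood RNN_k as defined above; function names are ours and the O(n²) implementation is for clarity only.

```python
# Sketch of NN_k and RNN_k: a point's reverse neighbourhood collects the points that
# count it among their own k nearest neighbours, so its size can be anything from 0 to n-1.
import numpy as np

def nn_k(Y, k):
    """nn[i] = indices of the k nearest points to y_i (y_i itself excluded)."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    return [set(np.argsort(row)[:k]) for row in d]

def rnn_k(Y, k):
    """rnn[i] = indices of the points that have y_i among their k nearest neighbours."""
    nn = nn_k(Y, k)
    rnn = [set() for _ in range(len(Y))]
    for i, neighbours in enumerate(nn):
        for j in neighbours:
            rnn[j].add(i)
    return rnn
```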
ISDBSCAN performs the clustering task in two steps. First, it attempts to identify all outliers in a given data set. It does so by calculating the k-influenced outlierness of each point y_i, which is based on the density den_k(y_i) = 1/d(y_i, y_t), where y_t is the kth neighbour of y_i. Second, the clustering algorithm is applied to the residual data set. This algorithm builds a cluster based on a density threshold of 2k/3, which was the best threshold identified by its authors. Of course, it is fair to assume that a different threshold may be found in experiments on different data sets. Hence, one may even argue that this threshold is in fact a parameter with no clear method to identify its optimal value. Such a thought leads ISDBSCAN to have the same number of parameters as DBSCAN.
RNN-DBSCAN [16] aims at reducing the number of parameters of DBSCAN by adapting ISDBSCAN's RNN_k-based density estimation. Thus, RNN-DBSCAN is able to recover clusters with different degrees of density by setting a single parameter, k. Unlike ISDBSCAN, the density of a point is determined by a particular combination of the nearest neighbourhood and the reverse nearest neighbourhood instead of the influence space. Given y_i, y_j ∈ Y there are three scenarios for connectivity.

1. The point y_j is directly density-reachable from y_i if y_j ∈ NN_k(y_i) and y_i is a core point (i.e., |RNN_k(y_i)| ≥ k).
2. A point y_j is density-reachable from y_i if there exists a sequence of points C = (y_i, ..., y_j), such that each of these points is directly density-reachable from the previous.
3. The point y_j is density-connected to point y_i if there is a point y_t ∈ Y such that both y_i and y_j are density-reachable from y_t.
Using the above, RNN-DBSCAN defines a cluster using a simple definition: any two points y_i, y_j ∈ Y belong to the same cluster if they are density-reachable or density-connected. RNN-DBSCAN indeed requires a single parameter to tackle the problem of variable-density clusters; however, it does so considering all the features to be equally relevant.
Adaptive DBSCAN (ADBSCAN) [17] is a recent advancement in density-based clustering that requires two parameters, k and noise_percent (the prior estimate of the noise ratio of the data set). This algorithm automatically discovers the number of clusters by initially building a nearest neighbour graph, and eventually dividing the data set into subgraphs. In the latter, two vertices are considered to be subgraph core points if they are the nearest neighbour of each other. The density of a point y_i ∈ Y is given by (5), defined in terms of its kth nearest neighbour y_i^k. The original authors then specify a criterion for a subgraph to be a core subgraph based on the existence of a subgraph core point, the value of noise_percent, and the average, quartile, and standard deviation obtained with (5). ADBSCAN is indeed an enhancement of DBSCAN as it seems to act well on data sets with large density variations. However, it introduces a new parameter to do so.
Density Peak Clustering (DPC) [18] has recently gained popularity [19-22] due to its effectiveness and intuitive distance threshold parameter (with a suggested standard value). The general idea behind DPC is that cluster centres are high-density points that are surrounded by lower-density neighbours, and that these centres have a high relative distance to other points of higher density. DPC identifies K clusters and automatically assigns points to them. The local density of a point y_i ∈ Y is the number of neighbours adjacent to y_i within a user-defined cutoff distance d_c.
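As an illustration, a small sketch of the two per-point quantities DPC relies on: the local density ρ_i within the cutoff d_c, and the distance δ_i to the nearest point of higher density. The function name is ours, and the convention for the globally densest point follows the original DPC paper as we understand it.

```python
# Sketch of DPC's rho/delta computation: cluster centres are points where both are large.
import numpy as np

def dpc_quantities(Y, d_c):
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    rho = (d < d_c).sum(axis=1) - 1              # neighbours within d_c, excluding the point itself
    delta = np.empty(len(Y))
    for i in range(len(Y)):
        higher = np.where(rho > rho[i])[0]
        # the globally densest point conventionally takes its maximum distance to any point
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    return rho, delta
```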
As popular as it may be, DPC is not without weaknesses. Hence, it has been the target of numerous extensions. DPC-DBFN [23] improves cluster recovery by calculating local densities using a fuzzy kernel rather than a crisp kernel. Once the cluster centres are identified, and before the label assignment, a new step is introduced to the DPC clustering process to form high-density regions called cluster backbones. These are constructed by labelling each data point as a dense point, a border point or a noise point. A point y_i is a dense point if its density is equal to or higher than the average density over all points in the data set. Otherwise, y_i is either a border point or a noise point depending on the variance of the distance between points. DPC-DBFN improves DPC's ability to find clusters with various densities, shapes, and sizes; however, it introduces a new controlling parameter to distinguish border points from noise points.
The above clustering methods enhance DBSCAN; however, they treat all the features equally regardless of their degree of relevance, which can have a detrimental impact on the clustering. One could argue that nowadays the most interesting data sets are high-dimensional. In this type of data, meaningful clusters are often to be discovered in a particular subset of features rather than in all the available features [24,25]. A common solution is to apply a feature selection algorithm before the clustering. However, this introduces two issues: (i) it assumes that all clusters have the same relevant features; (ii) it assumes that all selected features are equally relevant. Both issues go considerably against intuition. In real-world data sets, it is perfectly possible to have a set of relevant features in which the relevance of each of them differs.

DBSCANR
In this section we introduce our density-based clustering algorithm, DBSCANR. Very much like DBSCAN (for details, see Section 2), DBSCANR needs to determine whether a point y_i ∈ Y is core or directly reachable. In the case of DBSCANR this is determined using the reverse nearest neighbourhood, RNN_k(y_i) (see Definition 1).
Notice that we use the quantity of reverse nearest neighbours as the density of a point. So, the density of y_j ∈ Y is higher than that of y_i ∈ Y whenever |RNN_k(y_j)| > |RNN_k(y_i)|. We can now make another important definition for DBSCANR.

Definition 2. A point y_i ∈ Y is a core point with respect to k iff |RNN_k(y_i)| ≥ k.
The key idea of our method is that each point in a cluster has to have at least a given minimum number of points (k) in its reverse nearest neighbourhood. This way, the reverse nearest neighbourhood estimates the density of a point by discarding those points that do not consider the query point as one of their nearest neighbours. We find the above definitions of core and directly reachable entities to be more robust than those used by DBSCAN, and our experiments support this statement.
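A minimal sketch of the core-point test implied by Definition 2, reusing the rnn_k helper from the earlier sketch (names are ours):

```python
# A point is a DBSCANR core point when its reverse k-nearest neighbourhood
# contains at least k points.
def dbscanr_core_points(Y, k, rnn=None):
    rnn = rnn if rnn is not None else rnn_k(Y, k)   # rnn_k from the earlier sketch
    return [i for i in range(len(Y)) if len(rnn[i]) >= k]
```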
Given the basic definitions above, we can now go further and introduce other new key definitions for our method.

Definition 4. A point y_j ∈ Y is density-reachable from y_i ∈ Y with respect to k if there exists a sequence of points C = (y_i, ..., y_j), such that each element is directly reachable from the previous.

Density reachability is the transitive closure of direct reachability. Any point other than a core point cannot be mutually density- and directly-reachable, leading to the asymmetry illustrated in Fig. 1(c). This figure shows the more interesting asymmetric case of this definition in a 2D vector space, which measures distance using (1). Within a cluster S_c, two core entities are density-reachable from each other. The same cannot be said for the entities that are not core. The following definition relates those non-core entities to the core entities they are density-reachable from.
Definition 5. A point y_j ∈ Y is density-connected to a point y_i ∈ Y with respect to k if both y_i and y_j are density-reachable from a common point y_t ∈ Y.

Density-connectivity is a symmetric relation (see Fig. 1(b)). Similar to the approach taken by DBSCAN, a DBSCANR cluster is a set of density-connected points holding maximality with respect to density-reachability.

Definition 6. The purpose of any clustering algorithm is to partition a data set Y containing n points y_i ∈ R^V into K non-empty disjoint clusters S = {S_1, S_2, ..., S_K}, with Σ_{l=1}^{K} |S_l| = n. Here, we are particularly interested in hard clustering, so that a given point y_i can be assigned to a single cluster S_c ∈ S. Our final clustering satisfies the following conditions: 1. ∀ y_i, y_j ∈ Y: if y_i ∈ S_c and y_j is density-reachable from y_i with respect to k, then y_j ∈ S_c (maximality). 2. ∀ y_i, y_j ∈ S_c: y_i is density-connected to y_j with respect to k (connectivity).

Given k, we can recover a cluster in a two-step process. First, we select the core point from the data set with the highest density (see Definition 2), using it to retrieve all related density-reachable points. Second, we assign the latter to the cluster of the core point.
DBSCANR starts with the highest-density core point y_i ∈ Y from a sequence of core points C. If there is more than one point with the same highest density, then one of them is selected uniformly at random. Afterwards, DBSCANR retrieves all points that are density-reachable from y_i with respect to k. This method iteratively recovers all clusters comprising the core points. Finally, each point that does not satisfy the condition for a core point (see Definition 2) is assigned to the cluster of its nearest core point. Although we use only a global value for k, DBSCANR recovers clusters of different densities and shapes simultaneously (see Definition 6).

We can formalise the whole algorithm as follows.

Algorithm 1: DBSCANR.
Input Y: Data set. k: Minimum number of points.
Output S: A clustering S = {S_1, S_2, ..., S_K}.
1: Set C ← ∅.
2: Add each core point in Y to C (as per Definition 2).
3: Identify the point q ∈ C with the highest density, and remove q from C.
4: S_c = RecoverCluster(C, q, k).
5: If |S_c| ≥ k, add S_c to S.
6: Remove each point in S_c from C.
Repeat steps 3 to 6 until |C| has converged.
7: Assign each unclustered point to the cluster of its nearest core point.
In the above, the quantity of nearest neighbours, k, is a user-defined parameter. The quantity of clusters, K, is automatically found by the algorithm.
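For illustration, the following is a condensed sketch of the DBSCANR loop as we read Algorithms 1 and 2, reusing rnn_k from the earlier sketch. Tie-breaking, the seed-expansion order, and the final attachment step follow our reading rather than the reference implementation.

```python
# Condensed DBSCANR sketch: repeatedly take the densest unprocessed core point, grow a
# cluster through reverse-nearest-neighbour chains among core points, keep clusters with
# at least k members, then attach every remaining point to its nearest labelled core point.
import numpy as np

def dbscanr(Y, k):
    rnn = rnn_k(Y, k)                                    # reverse neighbourhoods (earlier sketch)
    core = {i for i in range(len(Y)) if len(rnn[i]) >= k}
    labels, unprocessed, next_label = {}, set(core), 0
    while unprocessed:
        q = max(unprocessed, key=lambda i: len(rnn[i]))  # densest remaining core point
        cluster, frontier = set(), [q]
        while frontier:                                  # grow through RNN chains of core points
            p = frontier.pop()
            if p in cluster:
                continue
            cluster.add(p)
            frontier.extend(j for j in rnn[p] if j in core and j not in cluster)
        if len(cluster) >= k:                            # keep only sufficiently large groups
            for p in cluster:
                labels[p] = next_label
            next_label += 1
        unprocessed -= cluster
    labelled = [p for p in core if p in labels]
    for i in range(len(Y)):                              # attach the remaining points
        if i not in labels and labelled:
            d = np.linalg.norm(Y[labelled] - Y[i], axis=1)
            labels[i] = labels[labelled[int(np.argmin(d))]]
    return labels
```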
There are reasons why using reverse nearest neighbours makes our algorithm superior to others. Notice that the k-nearest neighbourhood of a point usually contains k points. However, with the reverse nearest neighbourhood no such guarantee exists, as local densities are taken into account. This is particularly helpful when attempting to identify clusters with very different local densities. For instance, Fig. 2 illustrates both neighbourhoods using coloured circles. In Fig. 2(a), the black circle of point 1, along with the coloured circles of points 2, 3, 4, and 5, corresponds to the k-nearest neighbourhood of each point at k = 3. In Fig. 2(b), the coloured circles represent the reverse k-nearest neighbourhood of each point at k = 3; for instance, RNN_k(5) = {2, 3, 4}. Since points 2, 3, 4, and 5 do not have point 1 as one of their nearest neighbours, the reverse nearest neighbour set of point 1 is empty, hence there is no circle around point 1 in Fig. 2(b).
Algorithm 3: W-DBSCANR.
Input Y: Data set. k: Minimum number of points. β: Exponent of the feature weights.
Output S: A clustering. W: A set of weight vectors.
1: Set each feature weight to 1/V and obtain an initial clustering S with DBSCANR (Algorithm 1).
2: Set K to be the number of clusters in the clustering produced by Algorithm 1.
3: S = UpdateClustering(Y, k, S, W).
4: Update feature weights for each cluster (as per Equation (17)).
5: Repeat steps 3 and 4 until |S| has converged.
6: Assign each unclustered point to the cluster of its nearest weighted core point.

It is interesting to note that the empty reverse nearest neighbour set of point 1 can be associated with the separation of two clusters of widely variable density. This indicates that the reverse nearest neighbour can be used to identify the border point of a widely variable density cluster, such as point 1, without the need for any special combination of neighbourhoods, while still maintaining competitiveness in clustering recovery when compared with DBSCAN and its state-of-the-art counterparts. Hence, our algorithm can find naturally meaningful clusters rather than clusters that fit a certain static neighbourhood query. Unlike ISDBSCAN and RNN-DBSCAN, we use only the RNN for our neighbourhood calculation rather than any special combination of the nearest neighbourhood and its reverse counterpart.
When two clusters of widely variable densities are separated by very narrow sparse regions, recovering cluster borders may become difficult. To address this issue, only core points are clustered in the initial clustering recovery step of our algorithm. We take the view that cluster borders are surrounded by non-core (border) points, so in the final clustering recovery step we assign each border point to the cluster of its nearest core neighbour. If the density varies within the same cluster, this cluster-extension strategy includes all the points rather than labelling the non-core points as outliers just because they do not meet a clustering definition based on a special neighbourhood search condition.
DBSCANR requires a single user-defined parameter, and it is able to recover clusters of different densities. However, very much like its competitors, DBSCANR still treats all features equally.

Weighted DBSCANR (W-DBSCANR)
In most pattern recognition tasks different features may have different degrees of relevance, and this certainly applies to clustering. Even if we assume that all features in a given data set are relevant, there may be different degrees of relevance. Given a cluster S_l ∈ S, one can set the weight of a feature v to be inversely proportional to the dispersion of v within S_l [26]. In other words, features that are more compact within a cluster are more discriminatory than those that are less compact. We considerably expand the above in order to introduce, perhaps for the first time, feature weighting to a density-based clustering algorithm. Given y_i, y_j ∈ S_l, we can calculate their distance using

d_W(y_i, y_j) = Σ_{v=1}^{V} w_lv^β (y_iv − y_jv)²,   (7)

where β is a user-defined parameter and w_lv is the weight of feature v at cluster S_l. Clearly, the balanced use of (7) for density estimation requires each weight to be non-negative and Σ_{v=1}^{V} w_lv = 1 for each cluster S_l ∈ S. Hence, the weighted k-neighbourhood of y_i is the set NN^W_k(y_i) containing the k nearest points to y_i, calculated using (7), with y_i ∉ NN^W_k(y_i). Notice that in this case the l in (7) represents the cluster y_i belongs to. The above allows us to revisit our definition of the reverse k-neighbourhood (RNN_k) and present its weighted version.

RNN^W_k(y_i) = {y_j ∈ Y : y_i ∈ NN^W_k(y_j)}.
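A small sketch of the weighted distance (7) as reconstructed above, and of the weighted k-neighbourhood NN^W_k; the exact functional form of (7) is our reading of the text, so treat this as an assumption rather than the paper's formula.

```python
# Sketch: cluster-dependent weighted distance and weighted k-neighbourhood.
import numpy as np

def weighted_distance(y_i, y_j, w_l, beta):
    """d_W(y_i, y_j) under the weights w_l of the cluster y_i belongs to (assumed form of Eq. (7))."""
    diff = np.asarray(y_i, float) - np.asarray(y_j, float)
    return float(np.sum((w_l ** beta) * diff ** 2))

def nn_w_k(Y, i, w_l, beta, k):
    """Weighted k-neighbourhood of point i, excluding the point itself."""
    d = np.array([weighted_distance(Y[i], Y[j], w_l, beta) for j in range(len(Y))])
    d[i] = np.inf
    return set(np.argsort(d)[:k])
```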
Now, we are ready to make some important definitions for our algorithm.

Definition 7.
A point y_j is weighted directly density-reachable from a point y_i with respect to k and β if y_j ∈ RNN^W_k(y_i) and |RNN^W_k(y_i)| ≥ k (i.e., y_i is a weighted core point).

Definition 8. A point y_j is weighted density-reachable from a point y_i if there exists a sequence of points C_W = (y_i, ..., y_j) such that each element is weighted directly density-reachable from the previous.

Definition 9. A point y_j is weighted density-connected to a point y_i if both y_i and y_j are weighted density-reachable from a common point y_t.

Weighted density-connectivity is a symmetric relation. We now introduce the notion of a weighted density-based cluster. Similar to DBSCAN, a weighted density-based cluster can be defined as a set of weighted density-connected points which holds maximality with respect to weighted density-reachability.

Definition 10. Weighted clusters are a partition of a data set Y containing n entities y_i ∈ R^V into K non-empty disjoint clusters S = {S_1, S_2, ..., S_K}. Here, we are particularly interested in hard clustering, so that a given point y_i can be assigned to a single cluster S_c ∈ S. Thus, the final clustering is a maximal set of weighted density-connected entities subject to S_k ∩ S_l = ∅ for k, l = 1, 2, ..., K and k ≠ l, satisfying the following conditions: 1. ∀ y_i, y_j ∈ Y: if y_i ∈ S_c and y_j is weighted density-reachable from y_i with respect to k and β, then y_j ∈ S_c (maximality). 2. ∀ y_i, y_j ∈ S_c: y_i is weighted density-connected to y_j with respect to k and β (connectivity).

Our proposed clustering algorithm recovers weighted clusters in two steps. First, it identifies the weighted core point with the largest number of core points in its neighbourhood. Second, it retrieves all points that are weighted density-reachable from that point.

Calculating feature weights in W-DBSCANR
Feature weighting can be thought of as a generalisation of feature selection. Under this view, feature selection assigns a binary weight: a weight of one means the feature is selected, and a weight of zero means the feature is deselected. Feature weighting assigns a value, usually in the interval [0, 1], to each feature. In our model, the higher this value is for a particular feature, the more relevant the feature is. In fact, we go further and assign a weight to each feature at each cluster. Feature weighting is a rather intuitive approach because even among relevant features there may be different degrees of relevance. That is, a feature v may have different degrees of relevance at different clusters. Also, feature weights can be used as a starting point for feature selection (see for instance [27], and references therein).
In order to calculate feature weights, we introduce a new step to DBSCANR. This allows us to iteratively update each feature weight at each cluster based on the current partition. In the first iteration we set each feature weight, w_cv, to 1/V so that all feature weights start from the same value.
With the above, we can recover K clusters from the first iteration of our algorithm, and represent this clustering using graphs. Let G be a graph with K components G(1), G(2), ..., G(K), so that the vertices of G(c) (with 1 ≤ c ≤ K) represent the data points of a cluster S_c ∈ S. Given G(c), we can generate V graphs G(c, v), one per feature, in which the weight of an edge between y_i and y_j is the feature-wise share of d_W(y_i, y_j), so that d_W(y_i, y_j) is fairly distributed over the V features. Notice that d_W(y_i, y_j) is calculated over all features. However, the division by w_cv^β ensures the degree of relevance of feature v at cluster S_c ∈ S is taken into account: a lower weight leads to a higher distance, and by consequence a less compact cluster.
The above requires a precise definition of compactness. Given a graph G(c, v), representing feature v at cluster S_c ∈ S, we can calculate its compactness from the edges of its minimum spanning tree (MST), G*(c, v), after removing all vertices of degree one. Let u_ijc = 1 if there exists an edge between y_i and y_j in G*(c, v), and u_ijc = 0 otherwise. The compactness of G(c, v) is then the sum of the remaining edge weights. Now that we have a measure of compactness, we would like to minimise it over all clusters and features. Let ŵ_cv be the weight found in the previous iteration (or ŵ_cv = V^{-1} in the first iteration). The optimal w_cv is that which minimises the total compactness subject to Σ_{v=1}^{V} w_cv = 1 for each cluster. Introducing a Lagrange multiplier λ, taking the derivative of the resulting objective with respect to w_ck, equating it to zero, and summing over all features to eliminate λ leads to the closed-form weight update (17). We are now ready to present our feature-weighted density-based clustering method, W-DBSCANR, as Algorithm 3.

Table 6
Results of the experiments on the high-dimensional data sets. We measure cluster recovery using the Adjusted Rand index (ARI), F-Measure (FM), Normalised Mutual Information (NMI) and Accuracy (Acc).
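To make the iterative re-weighting concrete, here is a minimal sketch of the per-feature MST compactness described above together with a stand-in weight update. Since Eq. (17) is not reproduced in this excerpt, the normalisation used below (inverse compactness raised to 1/(β−1), in the style of weighted k-means methods) is an assumption, as are the function names.

```python
# Sketch: per-feature compactness of one cluster via an MST, plus an assumed weight update.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def feature_compactness(X, v):
    """Sum of MST edge lengths on feature v, ignoring edges that touch degree-one vertices."""
    d = squareform(pdist(X[:, [v]]))
    d[d == 0] = 1e-12                        # the sparse routine treats 0 as "no edge"
    np.fill_diagonal(d, 0.0)
    mst = minimum_spanning_tree(d).toarray()
    adj = (mst > 0) | (mst.T > 0)
    keep = adj.sum(axis=0) > 1               # drop vertices of degree one, as in the text
    return float((mst * np.outer(keep, keep)).sum())

def update_weights(X, beta=2.0):
    """Assumed stand-in for Eq. (17): weights inversely related to per-feature compactness."""
    D = np.array([feature_compactness(X, v) for v in range(X.shape[1])]) + 1e-12
    w = (1.0 / D) ** (1.0 / (beta - 1.0))    # beta > 1 assumed
    return w / w.sum()
```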

W-DBSCANR complexity
Since W-DBSCANR is an extension of the DBSCANR algorithm, and DBSCANR needs to calculate the k-nearest neighbours of each point, a direct implementation of W-DBSCANR has O(n²) time complexity, where n is the cardinality of the data set. The time complexity of W-DBSCANR depends on the following five parts. 1) The time complexity of finding the k-nearest neighbours. The key issue of DBSCAN-type clustering methods is identifying the type of each point, which is a k-nearest neighbour problem, and DBSCANR is no different. Therefore, improving the complexity of the k-nearest neighbour search improves the computational complexity of DBSCANR and of W-DBSCANR. Many techniques have been proposed to improve the runtime of the nearest neighbour query, for instance the Kd-tree [28], the semi-convex hull tree [29], and the trinary-projection tree [30], to name but a few. Most of the proposed algorithms degenerate in higher-dimensional spaces [31]. If a Kd-tree is used for the nearest neighbour query, the time complexity of finding the reverse nearest neighbours of each point is O(n log n) [32]. DBSCANR computes all pairwise distances to determine the core and non-core points, which requires O(n²). 2) The time complexity of finding the core points. Determining the core points requires O(n) time given that the nearest neighbours have already been calculated. 3) The time complexity of clustering the core points. If c is the number of core points and r is the number of core points in the reverse nearest neighbourhoods of the c points, then clustering the core points requires O(c log c + rc) time. 4) The time complexity of assigning the unclustered points to their nearest core points. If there are l unclustered points and l is considerably smaller than the cardinality of the data set, then clustering the l points requires O(l) time. 5) The time complexity of updating the feature weights for each cluster. If t is the number of iterations required for steps 3 and 4 of Algorithm 3, m is the number of features, and n is the number of data points, updating the feature weights for each cluster requires O(tmn) time. Thus, the time complexity of W-DBSCANR is O(n²).
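As an illustration of the KD-tree strategy mentioned above, the sketch below builds the NN_k lists with SciPy's cKDTree and derives RNN_k by inverting them; the names are ours and this is not the authors' implementation.

```python
# KD-tree based neighbour queries: roughly O(n log n) for the NN_k lists in low dimensions,
# after which RNN_k follows by inverting the neighbour lists.
import numpy as np
from scipy.spatial import cKDTree

def knn_and_rnn(Y, k):
    tree = cKDTree(Y)
    _, idx = tree.query(Y, k=k + 1)          # k+1 because the query point itself is returned first
    nn = [set(row[1:]) for row in idx]
    rnn = [set() for _ in range(len(Y))]
    for i, neighbours in enumerate(nn):
        for j in neighbours:
            rnn[j].add(i)
    return nn, rnn
```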

Table 7
The results of our experiments on the synthetic data sets with 50% added noise features. We measure cluster recovery using the Adjusted Rand index (ARI), F-Measure (FM), Normalised Mutual Information (NMI) and Accuracy (Acc).

Set up of experiments
In this section we describe the set up of our experiments. We experiment with both real-world and synthetic data sets, with and without added noise features.
The real-world data sets we experiment with were obtained from the popular UCI machine learning repository [33] and the scikit-feature selection repository [34]; for details see Table 2. These data sets have no missing values, or features with a range of zero. From some of these data sets we have generated two others by adding 0.5V and V noise features, respectively. Here, a noise feature is one composed entirely of within-domain uniformly random noise. We have added noise features so that we can evaluate how the algorithms we experiment with perform under such conditions.
The synthetic data sets we experiment with were also obtained online [35]; for details see Table 1. We generated two extra data sets from each of these in a similar way to what we did for the real-world data sets.
We have normalised all the data sets we experiment with using range normalisation (18). We opted for (18) rather than the z-score because the latter is biased towards features with a unimodal distribution. Such features are inclined to have a lower standard deviation (when compared to multimodal features), which leads to higher z-scores. Hence, features with a unimodal distribution are likely to have a higher contribution to the clustering than features with a multimodal distribution. However, multimodal features are usually those of particular interest during clustering. The algorithms we experiment with require parameters; we have set these as described below. In all cases we attempted to identify the best possible parameters for each algorithm. All algorithms are deterministic, so the results in our tables are the best we could find.
1. DBSCAN: We experimented with k from 3 to 50 in steps of 1, and ε from the minimum pairwise to the maximum pairwise distance for each data set in steps of 0.01.
The results we present in this section support our claim that density-based algorithms can benefit from feature weighting. Let us analyse the case of a particular data set a bit further. Figure 3 presents the feature weights obtained by our method, averaged over the three clusters in the Iris data set. We can see higher weights for features 3 and 4 (petal length and petal width) than for features 1 and 2 (sepal length and sepal width). These results are very much supported by the literature on partitional clustering algorithms (see for instance [26]).

Given that popular density-based algorithms tend to degenerate in higher-dimensional spaces [31], we experiment further with data sets with a much higher number of features than points. Table 6 presents the results on 10 well-known high-dimensional data sets, again under four clustering evaluation indices. W-DBSCANR has the best performance in 21 cases, while DPC-DBFN (the algorithm in second place) has the best results in 8 cases.

Experiments on data sets with added noise features
In this section, we present the results of our experiments with the data sets to which we added noise features. We find this set of experiments rather important because we can be certain these data sets have irrelevant features. Hence, we are interested in the behaviour of density-based clustering algorithms in this scenario. More specifically, we show the superiority of W-DBSCANR (in terms of cluster recovery) over the other algorithms we experiment with. First, we show the catastrophic effect noise features can have on data sets. Figures 4 and 5 show plots over the first and second principal components of the Aggregation and D31 data sets, respectively. In these, it is quite clear that the original data sets (those with no added noise features) have clear clusters. However, as the number of added noise features increases these clusters become less and less clear. This has a direct effect on cluster recovery, as our experiments in this section demonstrate.
Tables 7 and 8 present the results of our experiments on the synthetic data sets when adding one (V × 0.5) or two (V × 1) noise features, respectively, since these data sets are two-dimensional. In this set of experiments, W-DBSCANR reached the highest ARI in 9 + 5 = 14 data sets (9 when adding one noise feature, and 5 when adding two noise features), while DBSCAN reached the highest ARI in 2 + 2 = 4 data sets. Given the presence of noise features, unsurprisingly, ISDBSCAN and RNN-DBSCAN could not reach the highest ARI for any of the data sets, and could not recover the true number of clusters for most of the artificial data sets with noise features under experiment (hence the dashes). We were, of course, pleased to see that our W-DBSCANR reached the highest possible score of 1 in most data sets. Overall, W-DBSCANR dominates with the highest score in 35 + 32 = 67 of the 80 cases. DBSCANR takes second place with the highest score in 8 + 8 = 16 cases.
Tables 9 and 10 present the results on the noisy versions of the 15 real-world data sets we experiment with. We use the same noise model as before, adding 50% and 100% noise features to each data set (for details see Table 3). W-DBSCANR had results higher by about 0.15 on average (an increase of about 27%) when compared to the second best performing algorithm, DBSCAN. W-DBSCANR also reached the highest overall score in 42 + 35 = 77 of the 60 + 60 = 120 cases. The result for DBSCAN (the second best algorithm) was 15 + 16 = 31. However, the total average cluster recovery of DBSCAN across all 15 data sets and 4 measures is only 1% higher than that of the proposed DBSCANR. Given the latter has only one parameter to be optimised (DBSCAN has two), we are tempted to claim DBSCANR is still rather competitive. Notice that OPTICS, ISDBSCAN and RNN-DBSCAN ceased to find the true number of clusters in some of the data sets, and we were therefore forced to put dashes under their scores and parameters.

Conclusion and future work
Feature selection has a long history in the machine learning community. However, even among relevant features there may be different degrees of relevance. With this in mind, this paper introduces, perhaps for the first time, the use of feature weights in density-based clustering algorithms. Our method, W-DBSCANR, is capable of generating a set of weights modelling the degree of relevance of features. In fact, it goes a step further by allowing for the intuitive idea that a given feature may have different degrees of relevance at different clusters. Clearly, as a clustering method, it does the above without requiring labelled samples.
Our experiments clearly demonstrate that W-DBSCANR outperforms other popular and recent density-based clustering algorithms (for details see Section 6). We have demonstrated this on a number of data sets with and without added noise features, high-dimensional or not, real-world and synthetic. Our evaluation made use of four measures: the Adjusted Rand Index, F-Measure, Normalised Mutual Information, and the usual classification Accuracy. However, this is not to say our algorithm has no limitations. For instance, W-DBSCANR has two parameters (the same number as DBSCAN and most others). One of these, β, helps define how much higher the weight of a compact feature should be in comparison to less compact features. That is, with a high β the standard deviation of the weights of a particular feature within a cluster is lower than with a low β. Although β seems somewhat stable (its optimal value is usually between 1.1 and 2.5), it would be better to have a method to estimate it. The same can be said for the other parameter, k, the number of nearest neighbours.
Another limitation of our algorithm (shared with the vast majority of feature weighting algorithms in partitional clustering [7]) is that feature weights are calculated in isolation. This is problematic when relevance is not found in a particular feature but instead in a group of features. We envisage that it should be possible to deal with this problem by grouping features (rather than points) in the data pre-processing stage. Of course, research is needed to find the exact way this should be done. The third main limitation of our algorithm is that feature weights are always used in distance calculations, even if the weight itself is negligible. It may be of benefit to perform some level of feature selection before applying our algorithm, or any other feature weighting method. Our future work will address the limitations above.

Declaration of Competing Interest
There is no conflict of interest.


Fig. 1. Reverse nearest neighbour based density-reachability and density-connectivity. (a) y_1 is density-reachable from q; q is not density-reachable from y_1. (b) y_1 and y_2 are density-connected by q.

Algorithm 2: RecoverCluster(C, q, k).
Input C: Core point vector. q: A point of high density. k: Minimum number of points.
Output S_c: A cluster.
1: Set seeds ← ∅ and S_c ← ∅.
2: For each y_i ∈ C: if y_i ∈ RNN_k(q) and y_i has never been assigned to S_c, add y_i to seeds.
3: Add q to S_c, and remove q from seeds (if q ∈ seeds).
4: For each y_i ∈ seeds: if y_i has never been assigned to S_c, set q ← y_i.
Repeat steps 2 to 4 until |seeds| = 0.
5: For each y_i ∈ S_c: add to S_c all points in NN_k(y_i) that are not in C and have never been assigned to S_c.


Fig. 2. The neighbourhoods are shown in different colours: (a) k-nearest neighbourhood at k = 3; (b) reverse k-nearest neighbourhood at k = 3.

Algorithm 4: UpdateClustering(Y, k, S, W).
Input Y: Data set. k: Minimum number of points. S: A clustering. W: A set of weight vectors.
Output S: A clustering S = {S_1, S_2, ..., S_K}.
1: Set C ← ∅.
2: Add each weighted core point in Y to C (as per Definition 7).
3: Identify the point q ∈ C with the highest density, and remove q from C.
4: S_c = RecoverCluster(C, q, k, W).
5: If |S_c| ≥ k, add S_c to S.
6: Remove each point in S_c from C.
Repeat steps 3 to 6 until |C| has converged.

Algorithm 5: RecoverCluster(C, q, k, W).
Input C: Weighted core point vector. q: Weighted core point with the highest density. k: Minimum number of points. W: A set of weight vectors.
Output S_c: A cluster.
1: Set seeds ← ∅ and S_c ← ∅.
2: For each y_i ∈ C: if y_i ∈ RNN^W_k(q) and y_i has never been assigned to S_c, add y_i to seeds.
3: Add q to S_c, and remove q from seeds (if q ∈ seeds).
4: Identify y_i ∈ seeds such that y_i has not been assigned to any cluster, and set q ← y_i.
Repeat steps 2 to 4 until |seeds| = 0.
5: For each y_i ∈ S_c: add to S_c all points in NN^W_k(y_i) that are not in C and have not been assigned to a cluster.

Fig. 4. Clustering using true labels shown on the plane of the first two principal components: (a) Aggregation original data set; (b) Aggregation data set with one noise feature; (c) Aggregation data set with two noise features.

Fig. 5. Clustering using true labels shown on the plane of the first two principal components: (a) D31 original data set; (b) D31 data set with one noise feature; (c) D31 data set with two noise features.

Table 1
The synthetic data sets we experiment with.

Table 2
The real-world data sets we experiment with.

Table 3
The real-world data sets with added noise we experiment with.

Table 4
Results of the experiments on the original synthetic data sets (no noise features have been added). We measure cluster recovery using the Adjusted Rand index (ARI), F-Measure (FM), Normalised Mutual Information (NMI) and Accuracy (Acc).

Table 5
Results of the experiments on the original real-world data sets (no noise features have been added). We measure cluster recovery using the Adjusted Rand index (ARI), F-Measure (FM), Normalised Mutual Information (NMI) and Accuracy (Acc).