A Large-Scale Evaluation of Shape-Aware Neighborhood Weights and Neighborhood Sizes

In this paper, we define and evaluate a weighting scheme for neighborhoods in point sets. Our weighting takes the shape of the geometry, i.e., the normal information, into account. This causes the obtained neighborhoods to be more reliable in the sense that connectivity also depends on the orientation of the point set. We utilize a sigmoid to define the weights based on the normal variation. For an evaluation of the weighting scheme, we turn to a Shannon entropy model for feature classification that can be proven to be non-degenerate for our family of weights. Based on this model, we evaluate our weighting terms on a large scale of both clean and real-world models. This evaluation provides results regarding the choice of optimal parameters within our weighting scheme. Furthermore, the large-scale evaluation reveals that neighborhood sizes should not be fixed globally when processing models. Finally, we highlight the applicability of our weighting scheme within the application context of denoising.


Introduction
Point sets arise naturally in many kinds of 3D acquisition processes, like e.g. 3D laser scanning. As early as 1985, they have been recognized as fundamental shape representations in computer graphics, see [1]. Ever since, they have been used in diverse applications, e.g. in face recognition [2], traffic accident analysis [3], or archaeology [4].
Despite their versatility and their advantages, like easy acquisition and low storage costs, point sets have a significant downside to them when compared with mesh representations: They are not equipped with connectivity information. This is mostly due to the acquisition process. Consider for example a manually guided scanning device. The operator will scan those areas of the real-world objects that have very sharp features multiple times. Consequently, occlusion is prevented and the whole geometry is captured. Even though each scan can provide connectivity information on the respectively acquired points, the complete point set obtained via registration of the individual scans (see e.g. [5]) does not provide global connectivity information in general. Thus, the notion of neighborhoods has to be defined and computed for each point.
Many notions of neighborhoods, combinatorial or geometric, with global or local parameters, have been proposed and discussed (see Section 2.). Furthermore, the concept of weighting neighboring points is not new. The pure selection of a neighborhood causes an equal treatment of all neighbors. Isotropic weighting is one common alternative, for instance evaluating Euclidean distances via a Gaussian weighting function. This provides closer points with higher influence (see e.g. [6]). Additionally, other point set information can be incorporated, like density or distribution (see e.g. [7] or [8]). The inclusion of normal deviation in the area of anisotropic weighting has also been considered and discussed before (see [9, 10]).
This research aims at investigating anisotropic weighting terms in a broad framework (Section 3.) which includes usual weighting choices such as all-one weights or sharp cut-off weights 1 . Our evaluation is processed via a Shannon entropy model (Section 4.), which is based on the work of [11, 12]. Furthermore, we aim at evaluating the weighting scheme on a large scale. This is to prevent over-interpretation of findings obtained from a very small set of models, as is often the case. Overall, the contributions of this work are:
• Definition of a shape-aware neighborhood weighting utilizing sigmoid function weights based on normal variation;
• Presentation of an evaluation model and proof of its non-degenerate cases as well as dependency on the sigmoid parameters;
• Large-scale experimental evaluation of the proposed neighborhood weighting concept;
• Discussion of the results with respect to both neighborhood weighting and neighborhood sizes.

Related Work
Neighborhoods are very important in point set processing, as almost all algorithmic approaches rely on them, yet only a few works discuss them properly. A common choice is to use heuristics to determine sufficient notions like the size of a combinatorial or metric neighborhood. In some areas neighborhoods arise as byproducts, for instance in segmentation, where one could consider segments to impose a neighborhood relation on the points they cover. However, we aim at a more general framework for the determination and weighting of neighborhoods. In the following section, we recall works discussing heuristic neighborhood definitions. Several works have advanced from simple heuristics and derive more involved notions for better-fitting neighborhood definitions in different contexts. These are mainly obtained from error functionals, which we will also discuss.

Heuristics
Most works consider either a combinatorial k-nearest neighborhood N_k(·) or a metric ball B_r(·) inducing a neighborhood. Both of these notions have parameters to be tuned, namely the number of neighbors k or the radius r of the neighborhood. Several works have been presented introducing heuristics to find appropriate values for k or r in different scenarios. The authors of [6] for instance use a global radius and change it to affect the running time of their algorithm. In [13], the authors fix a combinatorial number k of neighbors to be sought. Then, for each point p_i, these k neighbors are found, which fixes a radius r_i to the farthest of them. Finally, the neighbors within radius r_i/3 are used. Therefore, their approach resembles the geometric neighborhood in a local manner. The method used in [14] is more involved. The authors recognize that both a too large and a too small radius r lead to problems and thus aim for a local adaption like [13]. A local density estimate δ_i around each point p_i ∈ P is computed from the smallest ball centered at p_i containing N_k(p_i), where k is found experimentally to be best chosen from {6, . . . , 20} ⊂ N. Given the radius r_i of this ball, the local density is set to be δ_i = k/r_i². In a second step, a smooth density function δ is interpolated from the local density estimates δ_i; hence this weighting involves the incorporation of density information into the weight assignment.
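The local density heuristic of [14] described above can be sketched in a few lines. The following is our own minimal illustration, not the authors' implementation; it uses a brute-force nearest-neighbor search and the function name `local_density` is ours.

```python
import numpy as np

def local_density(points, k=10):
    """Local density estimate delta_i = k / r_i^2, where r_i is the radius
    of the smallest ball around p_i containing its k nearest neighbors
    (a sketch of the heuristic of [14]; brute-force neighbor search)."""
    P = np.asarray(points, dtype=float)
    deltas = []
    for i, p in enumerate(P):
        d = np.linalg.norm(P - p, axis=1)
        d[i] = np.inf                    # exclude the point itself
        r_i = np.sort(d)[k - 1]          # distance to the k-th nearest neighbor
        deltas.append(k / r_i ** 2)
    return np.array(deltas)
```

On a regular grid, interior points obtain a higher density estimate than corner points, since their k-th neighbor is closer; a smooth density function would then be interpolated from these per-point estimates.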
In the context of surface reconstruction, the authors of [15] discuss several choices for neighborhoods and corresponding weights. While two of the three presented methods simply use geometric neighborhoods, the third method takes a different approach. Namely, the authors collect all neighbors of p_i in a "large" ball ([15, page 7]) around p_i. Then, they fit a plane to this preliminary neighborhood and project all neighbors and p_i onto this plane. On the projections, a Delaunay triangulation is built and the induced neighborhood of the triangulation is used in the following computations, making their approach local and respecting distributions.
A completely different route is taken by [16]. The authors first calculate features of a point set based on differently sized neighborhoods. Then, they use a training procedure to find the combination of neighborhood sizes that provides the best separation of different feature classes.
The inclusion of normal deviation, and hence anisotropic weighting, into neighborhood concepts is part of the work [9]. The approach of the authors is to use a weighted principal component analysis, which fits our evaluation model. However, they rely on a global neighborhood size and assign sharp cut-off weights, while we allow for changing neighborhood sizes and smooth weighting terms.

Error Functionals
While the approaches presented above are based on heuristics, some works try to deduce an optimal k for the k-nearest neighborhoods based on error functionals. For instance, the authors of [17] work in the context of the MLS framework (see [6, 18, 19, 20]) for function approximation. The authors perform an extensive error analysis to quantify the approximation error both independent of and depending on the given data. Finally, they obtain an error functional. This is then evaluated for different neighborhood sizes k. The neighborhood N_k yielding the smallest error is finally chosen to be used in the actual MLS approximation.
In contrast, the authors of [21] deduce an error bound on the normal estimation obtained from different neighborhood sizes. Utilizing this error functional, they obtain the best-suited neighborhood size for normal computation. The work of [17] heavily depends on the MLS framework in which the error analysis is deduced, while the work of [21] depends on the framework of normal computation.
The authors of [12] take a more general approach in the context of segmentation of 3D point sets. They also use the concept of combinatorial neighborhoods, going back to results of [22, 11]. In order to choose an optimal value for k, the authors turn to the covariance matrix, which is symmetric and positive-semi-definite. Thus, the matrix has three non-negative eigenvalues. Following an idea of [23], the authors of [14] grow a neighborhood around each point p_i and consider the surface variation as a growth measure. The same quantity is used by [24]. However, they do not grow a neighborhood, but choose a size k for it according to a consistent curvature level. The authors of [12] proceed to consider three more quantities derived from the eigenvalues of the covariance matrix reflecting point set features, see [11, 12]. Afterwards, following the concept of entropy by Shannon [25], they evaluate combinatorial and geometric neighborhood sizes via two error measures (see Section 4. for a detailed discussion).

Sigmoid Weights
Given a point p_i with unit normal n_i and its neighborhood N_i, we obtain the following weights:

    w_ij = φ( (⟨n_i, n_j⟩ + 1) · 2^{−1} ).    (1)

Note that the argument of φ includes the deviation of the normals measured by the Euclidean scalar product. The term ⟨n_i, n_j⟩ ranges from −1 to 1, because we assume normals of unit length. By shifting the scalar product and normalizing, the argument lies in the range [0, 1]. Note that by the symmetry of the scalar product the weights are symmetric, i.e. w_ij = w_ji. The weighting function φ shall assign non-negative weights between 0 and 1. These weights should correspond to the similarity of the corresponding normals, i.e. a small normal variation should result in weights close to 1, while a high normal variation should yield weights close to 0.
Our choice for the weighting function φ is a sigmoid. A sigmoid function is visually characterized by its "S"-shaped curve, see Figure 1. We will consider a family of sigmoid functions that provide different interpolations between 0 and 1. The family is based on the trigonometric cosine function. It is related to the sigmoid used in [26]; however, we fix the image of the function to be 0 or 1 respectively outside of [0, 1]. Given a ∈ [0, 1), b ∈ R_{≥1} ∪ {∞}, and a′ := (1 − a)/b + a ∈ (a, 1], we define the sigmoid weighting function sig^cos_{a,b} as

    sig^cos_{a,b}(x) = 0                                      for x < a,
                       (1 − cos(bπ(x − a)(1 − a)^{−1})) / 2   for a ≤ x < a′,
                       1                                      for x ≥ a′.    (2)

The threshold parameter a ∈ [0, 1) translates the curve along the x-axis and controls where the cosine curve starts. Furthermore, the incline parameter b influences the slope of the cosine curve, where a value of 1 results in a soft increase, while higher values of b cause increasingly steeper slopes until the curve simulates a sharp cut-off at b = ∞. An illustration of Equation (2) for different pairs of parameters (a, b) is given in Figure 1. Observe that we obtain equal weights w_ij ≡ 1 by parameters a = 0, b = ∞ and mimic a sharp cut-off at a ∈ [0, 1) by setting b = ∞. These observations relate our weights to the uniform weights used in [12] and to the sharp cut-off of [9], respectively.
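As a concrete illustration, the sigmoid family and the resulting normal-based weights can be sketched as follows. This is a minimal sketch under our reading of Equations (1) and (2): the function returns 0 for arguments at or below the threshold a, ramps with the cosine up to a′ = (1 − a)/b + a, and is 1 beyond; b = ∞ is treated as a sharp cut-off. All function names are ours.

```python
import math

def sig_cos(x, a, b):
    """Sigmoid weighting function of Equation (2): 0 up to the threshold a,
    a cosine-shaped ramp on (a, a'), and 1 from a' = (1 - a)/b + a on.
    b = math.inf mimics a sharp cut-off at a."""
    if x <= a:                         # zero set is x <= a, see Appendix A
        return 0.0
    if math.isinf(b):                  # sharp cut-off
        return 1.0
    a_prime = (1.0 - a) / b + a
    if x >= a_prime:
        return 1.0
    return 0.5 * (1.0 - math.cos(b * math.pi * (x - a) / (1.0 - a)))

def weight(n_i, n_j, a=0.25, b=2.0):
    """Shape-aware weight w_ij of Equation (1): the normal deviation
    <n_i, n_j> in [-1, 1] is shifted to [0, 1] before applying the sigmoid."""
    dot = sum(u * v for u, v in zip(n_i, n_j))
    return sig_cos((dot + 1.0) / 2.0, a, b)
```

Identical normals obtain weight 1, antipodal normals weight 0, and the symmetry w_ij = w_ji follows directly from the symmetry of the scalar product.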

Evaluation Model
Having presented the set of neighborhood weights in Equation (1) and the corresponding weighting function in Equation (2) in the previous section, we will now describe the mathematical background of our evaluation process. For this, we turn to the information measures originally introduced by Shannon [25]. Specifically, we will use a variation of the quantities derived in [11, 12], as we will present in Section 4.1. First, we will establish the necessary notation and preliminary results.
Consider the covariance matrices C_i ∈ R^{3×3} given by

    C_i = Σ_{j∈N_i} w_ij (p_j − p̄_i)(p_j − p̄_i)^T,    (3)

where p̄_i = (1/|N_i|) Σ_{j∈N_i} p_j is the barycenter of the neighborhood of p_i and v^T denotes the transpose of a vector v ∈ R^3. The weights w_ij are chosen according to Equation (1). The covariance matrix C_i is symmetric and positive-semi-definite. Thus, it has three non-negative eigenvalues, which in the following we will denote by

    λ^1_i ≥ λ^2_i ≥ λ^3_i ≥ 0.    (4)

Depending on the neighborhood N_i and the assigned weights w_ij, we can prove the following theorem about the covariance matrix C_i.

Theorem 1 (Non-degenerate Covariance Matrix). Let P = {p_i | i ∈ [n]} be a set of points, fix a point p_i ∈ P and its neighborhood N_i ⊆ [n], and consider the sigmoid function sig^cos_{a,b} from Equation (2) as well as the covariance matrix C_i given in Equation (3). Assume there are ℓ_1, ℓ_2 ∈ N_i, ℓ_1 ≠ ℓ_2 with p_{ℓ_1} ≠ p_{ℓ_2} and n_{ℓ_1} ≠ −n_{ℓ_2}. Then there exists some a ∈ [0, 1), such that the sum of all eigenvalues of C_i is strictly positive independent of the choice of b ∈ R_{≥1} ∪ {∞}.
Proof. First, we make the following two observations: i) The weights w_ij = sig^cos_{a,b}((⟨n_i, n_j⟩ + 1) · 2^{−1}) are non-negative for all j ∈ N_i. This follows directly from the definition of the function in Equation (2).
ii) Each summand C_ij := w_ij (p_j − p̄_i)(p_j − p̄_i)^T is symmetric and positive-semi-definite, hence its eigenvalues λ^ℓ_ij, ℓ = 1, 2, 3, are non-negative. For the sum of eigenvalues of C_i we obtain

    Σ_{ℓ=1}^{3} λ^ℓ_i  =(1)  Tr(C_i)  =(2)  Σ_{j∈N_i} Tr(C_ij)  =(3)  Σ_{j∈N_i} Σ_{ℓ=1}^{3} λ^ℓ_ij,    (5)

where Tr(A) denotes the trace of a square matrix A. Equalities (1) and (3) hold because of the relation between the trace and the eigenvalues, and (2) is justified by the linearity of the trace.
From the observations i) and ii) above we know that both w_ij and λ^ℓ_ij are non-negative for all j ∈ N_i and all ℓ ∈ {1, 2, 3}. Hence, the sum (5) is 0 if and only if all summands are.
We fix an arbitrary summand C_ij. Assume that w_ij = 0. We set x := (⟨n_i, n_j⟩ + 1) · 2^{−1} and deduce directly that w_ij = sig^cos_{a,b}(x) = 0. Independent of the choice of b, by the reasoning of Appendix A, we obtain that x ≤ a. For n_i ≠ −n_j, we have x > 0. Therefore, choosing a_j := x/2 results in weights sig^cos_{a_j,b}(x) > 0 independent of b.
Finally, by setting a = min{a_j | j ∈ N_i}, we obtain a new set of weights that are strictly positive wherever x > 0. Since we assumed that there is at least one pair of distinct points p_{ℓ_1} ≠ p_{ℓ_2} with normals n_{ℓ_1} ≠ −n_{ℓ_2}, at least one of the summands C_{iℓ_1}, C_{iℓ_2} is non-zero and with it the sum (5). Therefore, we constructed a parameter a such that the corresponding covariance matrix provides a strictly positive sum of eigenvalues independent of the choice of b ∈ R_{≥1} ∪ {∞}.
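The weighted covariance matrix of Equation (3) and the trace identity used in the proof can be illustrated numerically. The following is our own sketch; it assumes the barycenter is the unweighted neighborhood mean, and the function names are ours.

```python
import numpy as np

def weighted_covariance(points, weights):
    """Weighted covariance matrix C_i of Equation (3) for one neighborhood:
    C_i = sum_j w_ij (p_j - pbar_i)(p_j - pbar_i)^T, with pbar_i the
    barycenter of the neighborhood (assumed here to be the plain mean)."""
    P = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    D = P - P.mean(axis=0)
    return (w[:, None] * D).T @ D     # 3x3, symmetric PSD for w >= 0

def sorted_eigenvalues(C):
    """Eigenvalues sorted as in Equation (4): lambda^1 >= lambda^2 >= lambda^3."""
    return np.sort(np.linalg.eigvalsh(C))[::-1]
```

For collinear points with all-one weights, only the leading eigenvalue is non-zero, and the sum of eigenvalues equals the trace of C_i, matching the chain of equalities in Equation (5).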

Non-degenerate Covariance Matrix
Given the assumptions of Theorem 1, we can assume that C_i ≠ 0 ∈ R^{3×3}. Therefore, we can derive certain quantities from the eigenvalues of the covariance matrix. In our context, we will consider the linearity L_λ, planarity P_λ, and scattering S_λ. These are given by

    L_{λ_i} = (λ^1_i − λ^2_i)/λ^1_i,   P_{λ_i} = (λ^2_i − λ^3_i)/λ^1_i,   S_{λ_i} = λ^3_i/λ^1_i,    (6)

and represent 1D, 2D, and 3D features in the point set, respectively. See [11] for a derivation and a detailed explanation of these quantities. As C_i ≠ 0, we have λ^1_i ≠ 0, therefore the quantities in Equation (6) are well-defined. Furthermore, because of the ordering of the eigenvalues given in Equation (4), we have L_{λ_i}, P_{λ_i}, S_{λ_i} ∈ [0, 1]. Hence, as L_{λ_i} + P_{λ_i} + S_{λ_i} = 1, each of these three quantities can be interpreted as the probability of the considered point to be part of an intrinsic 1D, 2D, or 3D part of the geometry. The authors of [11, 12] consider the first measure

    E^dim_i = −L_{λ_i} ln(L_{λ_i}) − P_{λ_i} ln(P_{λ_i}) − S_{λ_i} ln(S_{λ_i}).    (7)

See Figure 2 for a plot of each summand of the equation. Note that while lim_{x→0} ln(x) = −∞, it is lim_{x→0} x ln(x) = 0, see Appendix B for a detailed discussion; hence the summands of Equations (7) and (9) are well-defined for x ∈ [0, 1], as all arguments L_{λ_i}, P_{λ_i}, and S_{λ_i} lie in this range. Practically, the error measure E^dim_i assesses to what extent the neighborhood N_i indicates a corner, an edge point, or a planar point of the geometry. In particular, the extreme cases

    λ^1_i > 0, λ^2_i = λ^3_i = 0;   λ^1_i = λ^2_i > 0, λ^3_i = 0;   λ^1_i = λ^2_i = λ^3_i > 0,    (8)

all obtain E^dim_i = 0.
The second measure is a more general solution for optimal selection of neighborhood sizes. For this, recall that the eigenvalues correspond to the size of the principal components spanning a 3D covariance ellipsoid, see [7]. We denote their sum by λ^Σ_i = Σ_{ℓ=1}^{3} λ^ℓ_i. Then, by normalizing the eigenvalues with λ^Σ_i and recalling the positiveness of all eigenvalues, we once more obtain quantities λ^ℓ_i/λ^Σ_i ∈ [0, 1] that sum to 1. Therefore, these quantities can also be interpreted as probabilities for p_i being a corner or part of an edge or planar area respectively. Furthermore, as we assume λ^1_i > 0, these terms are well-defined. By considering the entropy of the eigenvalues, i.e. the eigenentropy [12], we obtain the second measure

    E^λ_i = −Σ_{ℓ=1}^{3} (λ^ℓ_i/λ^Σ_i) ln(λ^ℓ_i/λ^Σ_i).    (9)

Note that while the arguments are slightly different, the summands in this measure behave once more like the plot in Figure 2. However, in terms of the different extremal cases for eigenvalues given in Equation (8), this measure only attains 0 for λ^1_i > 0 and λ^2_i = λ^3_i = 0 and not for the other two. Therefore, it shows a general preference for linear structures over planar or volumetric structures in the data.
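As a concrete illustration, the two entropy measures can be sketched in Python. This is our own minimal implementation, assuming the linearity/planarity/scattering definitions of Equation (6) and the continuous extension x ln(x) = 0 at x = 0 discussed in Appendix B; function names are ours.

```python
import numpy as np

def xlogx(x):
    """x * ln(x), continuously extended by 0 at x = 0 (see Appendix B)."""
    return 0.0 if x <= 0.0 else x * np.log(x)

def e_dim(l1, l2, l3):
    """Dimensionality measure E^dim of Equation (7), built from the
    linearity, planarity, and scattering of Equation (6); l1 >= l2 >= l3."""
    L = (l1 - l2) / l1
    P = (l2 - l3) / l1
    S = l3 / l1
    return -xlogx(L) - xlogx(P) - xlogx(S)

def e_lambda(l1, l2, l3):
    """Eigenentropy E^lambda of Equation (9) on the normalized eigenvalues."""
    s = l1 + l2 + l3
    return -sum(xlogx(l / s) for l in (l1, l2, l3))
```

One can check the behavior described above: all three extremal eigenvalue configurations of Equation (8) yield E^dim = 0, while E^λ vanishes only in the purely linear case and is ln 2 and ln 3 in the planar and volumetric cases, respectively.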
We will use the two measures (7) and (9) in our quantitative experiments in Section 5. However, the above discussion depends on the assumptions providing Theorem 1. In the following we will discuss cases in which these assumptions are not satisfied.

Degenerate Covariance Matrix
In practical applications, the assumptions of Theorem 1 are not always satisfied. Note here that the error values E^dim_i and E^λ_i are evaluated on a single point p_i of the point set P. The following reasons can hinder the correct evaluation:
i) If the point set contains multiple duplicates of a point, more than the sought-for number of neighbors k, all points in the reported neighborhood collapse into a single point equal to the barycenter of the neighborhood. Thus, the summands C_ij all become 0.
ii) If a point p_i has a flipped normal in comparison to all its neighboring points p_j, the argument x in the weight equation w_ij = sig^cos_{a,b}(x) becomes 0 and therefore all weights degenerate to 0. This happens in particular for very small or thin geometries as well as for faulty normal fields.
iii) Even if the assumptions of Theorem 1 are satisfied, the theorem only states the existence of a suitable parameter a ∈ [0, 1). Therefore, choosing a parameter a too large can cause all weights in the covariance matrix (3) to degenerate to 0.
In the following evaluation, we prevent case i) by requiring the point sets to only contain distinct points. Furthermore, we orient the normal field to prevent case ii). Concerning a too large parameter a, we report a failure in the computation of the error values for the point set P if λ^Σ_i = 0 for at least one point p_i ∈ P. By including the choice a = 0 for the parameters, we ensure that each model has at least one correctly evaluated pair of error values E^dim and E^λ.
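The safeguards above can be sketched as a small preprocessing and reporting step. This is our own illustration, not the paper's pipeline; the function name and the tolerance are assumptions.

```python
import numpy as np

def preprocess_and_check(points, lambda_sums, tol=1e-12):
    """Safeguards sketched from the discussion above: require distinct
    points (case i), and report a failure for the whole point set if any
    per-point eigenvalue sum lambda^Sigma_i vanishes, i.e. some covariance
    matrix degenerated for the current parameters (a, b)."""
    P = np.asarray(points, dtype=float)
    if len(np.unique(P, axis=0)) != len(P):
        raise ValueError("point set contains duplicate points (case i)")
    # False means the current parameter pair (a, b) is rejected as optimal.
    return all(s > tol for s in lambda_sums)
```

A parameter pair for which this check fails is simply excluded from the argmin over the parameter grid, which mirrors setting the error values to infinity.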

Evaluation Results
In this section, we present our quantitative evaluation of the weights presented in Equation (1). For the evaluation, we utilize the error measures E^dim and E^λ as defined in Equations (7) and (9) respectively. Our models are taken from a data set described in [27]. The authors provide ten thousand clean and manifold surface meshes, which are obtained by exporting only the boundary of the tetrahedral meshes used in [27]. From these, we randomly select a subset of 1,000 meshes with uniform probability. Furthermore, we use nine meshed models 2 from the Stanford 3D Scanning Repository [28]. We use the mesh information and its manifold property to obtain oriented face normals. From these, we compute vertex normals and then use these and the vertices as point sets for our experiments. For each such point set P, we consider the parameter set P := {0, .25, .5, .75, .9} × {1, 2, 4, ∞}.
We use the combinatorial neighborhood notion 3 , so that for every pair (a, b) and every point p i ∈ P , we calculate its E dim i and E λ i value over the range of k, taken from K := {6, . . ., 20}.
We assume this range for k, as it reflects typical, heuristic choices for neighborhood sizes in the area of point set processing, see the works discussed in Section 2., in particular [14].
For each point set P, we obtain the optimal parameter pair (a*, b*) as

    (a*, b*) = argmin_{(a,b)∈P} Σ_{p_i∈P} min_{k∈K} E^dim_i(a, b, k),    (10)

and analogously for E^λ. Following the discussion from Section 4.2., we set E^dim_i = ∞ if there is some point p_i ∈ P for which the covariance matrix C_i degenerates given the current parameters (a, b) ∈ P, k ∈ K. We proceed accordingly for E^λ_i. That is, a parameter choice (a, b) ∈ P cannot be attained as optimal parameter pair if there is at least one point that cannot be interpreted meaningfully. Furthermore, for the optimal parameters (a*, b*) and each point p_i, we store the utilized neighborhood sizes argmin_{k∈K} E^dim_i and argmin_{k∈K} E^λ_i respectively. In the following we report and interpret our findings.
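The selection procedure described above can be sketched as a grid search: per point, minimize the error over the neighborhood sizes k; per parameter pair, sum these minima; then take the argmin over the grid. This is our own sketch, and `error_fn(i, a, b, k)` is a hypothetical stand-in for evaluating E^dim_i or E^λ_i; returning infinity for a degenerate configuration automatically excludes that parameter pair.

```python
import math

def optimal_parameters(point_ids, param_grid, k_range, error_fn):
    """Grid-search sketch of the selection of (a*, b*): for each (a, b),
    sum the per-point minima over k; a pair scoring inf (some degenerate
    covariance matrix) can never be optimal. `error_fn(i, a, b, k)` is a
    hypothetical callable standing in for E^dim_i or E^lambda_i."""
    best, best_score = None, math.inf
    for a, b in param_grid:
        score = sum(min(error_fn(i, a, b, k) for k in k_range)
                    for i in point_ids)
        if score < best_score:
            best, best_score = (a, b), score
    return best, best_score
```

With the parameter grid used above, {0, .25, .5, .75, .9} × {1, 2, 4, ∞}, the cost of the search is simply |grid| · |P| · |K| error evaluations per point set.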

Global (a,b) analysis
We analyze the total amount of (a, b) choices for both model repository selections. Here, we count, for each point set, its respective optimal parameter pair (a*, b*). The corresponding four global histograms for both model repositories and both error measures are given in Figures 3 and 4. In summary, both error measures behave almost identically on the two data sets, i.e. in the comparison between clean and real-world models.
On the large scale of 1,000 point sets (Figure 3), we observe that on average a large choice for parameter a and a small choice for parameter b are preferred. This can be interpreted to say that it is desirable to take only normals into account that exhibit a small deviation. Also, this suggests assigning weights with a slow ascent, caused by a low choice of parameter b.

Figure 4. Histograms for (a) E^dim (Eq. (7)) and (b) E^λ (Eq. (9)) over the range K applied to 9 geometries taken from [28].

This experiment shows the potential of softly increasing weights assigned to smaller normal deviations only, and it contrasts the assignment of equal weights (a = 0, b = ∞ as used in [12]) or a sharp cut-off (b = ∞ as used in [9]), as both are rarely chosen as optimal choices regarding the two error measures. A localized, i.e. model-dependent, discussion about the possibility to increase a and b for better results is given in the upcoming section.
In terms of scanned real-world models (Figure 4), we analyzed only a small number of point sets. In comparison to the clean models, we do however observe a different behavior. Namely, smaller values for a and b are favored. The latter matches the observation made above for the clean models. We interpret the parameter a to reflect the noise components caused by the acquisition process. Therefore, even a mid-range choice for a causes several points p_i ∈ P to have degenerate covariance matrices C_i. As discussed above, these parameter pairs (a, b) are then excluded from the choice of optimal parameters. See also the following section for a more detailed discussion of this.
In conclusion, we see that the weight determination favors a soft increase of the weight function, i.e. choosing b rather small. The value a, however, depends on the geometry. Clean models mostly attain smaller error values for larger values of a, whereas real-world models require smaller values of a to obtain non-degenerate covariance matrices. Both model repositories have in common that they almost never report equal weights or sharp cut-off weights as the preferred weight assignment.

Local (a,b) analysis
In this section, we will discuss the (a*, b*) choices presented in the previous section from a local, i.e. point-set-dependent, perspective. Therefore, we consider Table 1. In the table, the first two rows correspond to the clean and the last two rows to the real-world models. The columns present, per data set, the number of point sets attaining the maximal value a = .9, allowing or forbidding an increase of a, attaining the maximal value b = ∞, and allowing or forbidding an increase of b.

Table 1. Distribution of (a*, b*) choices into the three cases of (a) an attained maximum (a = .9, b = ∞), (b) a possible increase of the parameter without failure (a+, b+), and (c) impossibility of increasing the parameter because it would cause a failure (¬a+, ¬b+). Columns: a = .9 | a+ | ¬a+ | b = ∞ | b+ | ¬b+ | Num.

For instance, the column labeled a = .9 reports the number of all geometries reporting this value as best choice. The column labeled a+ gives the number of those point sets where larger values for a would have been possible but were not attained. Finally, the column labeled ¬a+ provides the number of geometries where an increase of the parameter a would result in a failure, i.e. in at least one degenerate covariance matrix C_i, see Section 4.2. Observe that we cover all cases, hence the three columns sum up to the number of considered geometries, given in the last column. The second set of three columns presents the corresponding values for parameter b.
Having all values in one chart, we directly observe the behavior assessed for parameter a in the previous section. There, we stated that especially in the case of clean models, an as-large-as-possible value for a is favorable over smaller values for a. Indeed, Table 1 confirms this statement: in the case of E^dim, all but 20 (clean) and 1 (real-world) point sets attain the highest possible choice for a. For the error measure E^λ, the result is even more striking. For this measure, neither the clean nor the scanned models exhibit any case in which parameter a could be increased without causing a degenerate covariance matrix C_i for some point of the geometry. This justifies the comparably small values for a attained in the real-world scenarios presented in Figure 4.
The reported numbers on the parameter b support the observation drawn before: From Theorem 1 we know that an increase of parameter b cannot cause a failure. And indeed, we observe all-zero entries in the ¬b+ column, which experimentally validates Theorem 1. Furthermore, the optimal choices for b rarely assume larger values up to ∞, as seen in Figures 3 and 4. The one interesting addendum to the previous observations that can be made via this analysis is the difference between the measures. Namely, the measure E^λ offers a more distinctive separation into fewer cases, as it attains the maximal value for b in 0.2% of the clean models and in none of the real-world models.
Summarizing the global and local analysis of the parameter choices (a*, b*), we draw the following conclusions:
• The utilized error measures favor weight determination with as-large-as-possible values for parameter a and small values for parameter b.
• The error measure E λ better supports the first conclusion when compared to E dim .
• Equal or sharp cut-off weights as widely used in the literature rarely attained minimal error measures.

Global k analysis
As stated in the beginning of Section 5., for each point in the utilized point sets, we store the neighborhood size k ∈ K which leads to the optimal choice of parameters (a*, b*) according to Equation (10). In Figures 5 and 6 we show histograms plotting this data, i.e. for each neighborhood size k ∈ K we show how many points use this k when contributing to the optimal parameters (a*, b*).
Note that Figures 5(a), 5(b), and 6(b) are qualitatively similar. All favor an as-small-as-possible neighborhood size k over larger neighborhoods. However, when using the error measure E^dim as defined in Equation (7) and evaluating it on the scanned real-world models taken from [28], the histogram indicates a different behavior. Still, the smallest neighborhood size k = 6 collects the highest number of points. But the other neighborhood sizes exhibit more of a uniform distribution, while in the other cases the histogram rather resembles a hyperbola.
For the clean geometries taken from [27], we obtain an average neighborhood size. For the real-world models from [28], we have average neighborhood sizes of 12.5880 and 8.6881 for E^dim and E^λ respectively, with corresponding standard deviations of 4.5557 and 3.3407. These findings suggest that variable neighborhood sizes yield smaller error values in the two functionals. In order to further investigate this behavior, in the following section, we turn to a local, i.e. point-set-dependent, perspective.

Local k analysis
We will now consider the standard deviation of the neighborhood size taken over a single model for E^dim and E^λ. In order to interpret these values, we present them in the form of a box-whisker plot in Figure 7. While, taken over all points of all models, the standard deviations of the neighborhood size according to E^dim and E^λ are comparable, as seen above, when considering the individual models we find a slightly more diverse behavior. However, it is obvious from Figure 7 that all standard deviations are located well away from 0, which would correspond to a uniform neighborhood size over the whole geometry. This observation is further supported when considering an analog to Figure 6, but separated for the respective models. A very interesting case occurs for the famous "Bunny" model. When considering the error value E^dim, the optimal neighborhood size qualitatively follows a Gaussian distribution around a mean of k = 16. However, when considering E^λ, it once more mimics a hyperbolic behavior. Another noteworthy model is the "Drill" model. For error measure E^dim, it is roughly uniformly distributed except for three peaks at 6, 8, and 20. In the case of E^λ, there is a qualitative Gaussian bump centered at k = 15, with notable exceptions at 6 and 10.
In summary, from the global and local analysis of the obtained neighborhood sizes k, we draw the following conclusions:
• All standard deviations lie well above 0, i.e. both considered error measures favor variable neighborhood sizes over constant-size neighborhoods.
• As already observed in Sections 5.1. and 5.2., the measure E^λ provides more precise predictions. From Figure 8, it is clear that E^λ indeed mimics the behavior of clean models better than E^dim when applied to real-world models.

Conclusion
In this paper, we investigated a family of weights (Eq. (1)) for point set processing. These weights are based on normal similarity. The family includes common choices such as all-one weights or sharp cut-off weights at a given threshold. Furthermore, we presented an evaluation model for neighborhood weights based on two Shannon entropy error measures (Eqs. (7) and (9)).
We have performed a large-scale evaluation of our weight family on two data sets. The first set consisted of 1,000 clean surface meshes from the work of [27]. The second set consisted of 9 real-world scans taken from [28]. A statistical analysis revealed that the optimal weight parameters should lead to a neglect of non-similar normals, yet include mid-range normals with a low weight. Furthermore, it became obvious in the evaluation that neighborhood sizes have to be variable over a point set, as only these variable sizes attain minimal error values.
Further research consists of running the large-scale analysis on a broader range of neighborhood sizes, comparable to [12]. From a theoretical point of view it remains to be better understood how the two error measures E^dim and E^λ differ. Finally, more tests need to be run on a large set of real-world models to further validate the findings presented in this paper.

B Limit Results
In Equations (7) and (9) we deal with entropy terms of the form x ln(x). We want to reason about their limit and show that f(x) = x ln(x) → 0 for x ↘ 0. To do so, we rewrite it as f(x) = ln(x)/x^{−1}. The limits of numerator and denominator are −∞ and +∞ respectively, so by the rule of L'Hôpital we obtain

    lim_{x↘0} x ln(x) = lim_{x↘0} ln(x)/x^{−1} = lim_{x↘0} x^{−1}/(−x^{−2}) = lim_{x↘0} (−x) = 0.

Figure 1. Plots of the sigmoid sig^cos_{a,b}(x) for three parameter choices.


Figure 5. Histogram of preferred neighborhood sizes k with respect to minimal error values (a) E^dim and (b) E^λ for the corresponding optimal sigmoid parameters (a*, b*) applied to 1,000 geometries taken from [27].

Figure 6. Histogram of preferred neighborhood sizes k with respect to minimal error values (a) E^dim and (b) E^λ for the corresponding optimal sigmoid parameters (a*, b*) applied to 9 geometries taken from [28].

Figure 7. Box-whisker plot for the standard deviations obtained by the different models. Each model contributes its own standard deviation as a data point for the diagram. Therefore, the two leftmost columns represent 1,000 data points each, while the two rightmost columns represent 9 data points each.

Figure 8. Histogram of preferred neighborhood sizes k with respect to minimal error values (a) E^dim and (b) E^λ for the corresponding optimal sigmoid parameters (a*, b*) applied to 9 geometries taken from [28] and separated into the individual models.

Acknowledgments. This material is based upon work supported by the National Science Foundation under Grant No. DMS-1439786 and the Alfred P. Sloan Foundation award G-2019-11406 while the author was in residence at the Institute for Computational and Experimental Research in Mathematics in Providence, RI, during the Illustrating Mathematics program. Furthermore, this research was supported by the DFG Collaborative Research Center TRR 109, "Discretization in Geometry and Dynamics", as well as by the German National Academic Foundation.

Appendices

A Zeroes of the Sigmoid Function

Here, we will determine the zeroes of the sigmoid function sig^cos_{a,b} as defined in Equation (2). Note that sig^cos_{a,b}(x) = 0 for all x ∈ (−∞, a) and sig^cos_{a,b}(x) ≠ 0 for all x ∈ [a′, +∞). Thus, it remains to be determined for which x ∈ [a, a′) we have sig^cos_{a,b}(x) = 0. The latter is true if and only if bπ(x − a)(1 − a)^{−1} = 2ℓπ with ℓ ∈ Z, i.e. the argument in cos(·) is a multiple of 2π, which yields

    x = 2ℓ(1 − a)/b + a for some ℓ ∈ Z.

We proceed with a case distinction for ℓ ∈ Z, but first we recall that a′ = (1 − a)/b + a.
a) If ℓ = 0, it follows directly that x = a.
b) If ℓ > 0, we have x = 2ℓ(1 − a)/b + a ≥ 2(1 − a)/b + a = 2a′ − a ≥ a′, but this indicates sig^cos_{a,b}(x) = 1.
c) If ℓ < 0, we have x = 2ℓ(1 − a)/b + a ≤ −2(1 − a)/b + a = −2a′ + 3a ≤ a, where the last inequality holds as a ≤ a′. Consequently, the only additional case for sig^cos_{a,b}(x) = 0 aside from x ∈ (−∞, a) is x = a.