Random Sample Consensus (RANSAC) Algorithm: A Generic Implementation

The Random Sample Consensus (RANSAC) algorithm for robust parameter value estimation has been applied to a wide variety of parametric entities (e.g., planes and spheres).

The RAndom SAmple Consensus (RANSAC) algorithm, originally introduced in [1], is a widely used robust parameter estimation algorithm. Two reasons have contributed to its wide adoption: it is simple, and it can potentially deal with outlier contamination rates greater than 50%.
In the interest of making this paper self contained, we first present the theory behind the Random Sample Consensus algorithm and identify the elements required for its use in estimating parametric objects. This presentation is primarily based on [4]. We then describe two such parametric objects, the n-dimensional (nD) plane and sphere. Following the theoretical discussion, we describe our implementation, highlighting the separation between the RANSAC algorithm and the specific estimation task.

RANSAC
The RANSAC algorithm is based on the observation that if we can draw a small subset from the data that is sufficiently large to estimate the model parameters and contains no outliers, then all inliers will agree with this model. This is done by randomly drawing data subsets, estimating the corresponding models, and assessing how many data elements agree with each model, its consensus set. The maximal consensus set obtained in this manner is assumed to be outlier free and is used as input for a least squares model estimate.
Assuming we have no prior knowledge about the data elements, we assign equal probability to all subsets (in the statistics literature this is known as simple random sampling). Optionally, a minimal threshold on the size of the largest consensus set can be used to decrease the probability of accepting an erroneous model. Table 1 summarizes the basic algorithm. To complete the algorithm's description we need to answer the following questions: a) How do we set the threshold, τ, for inclusion into the consensus set? b) What subset size should we choose? c) How many subsets do we need to draw in order to ensure that one of them is outlier free?

An informed choice of the threshold, τ, requires knowledge of the probability distribution of the inlier distances. In most cases we do not have this information, and the threshold is set empirically. For the derivation of the threshold assuming a zero mean Gaussian distribution see [4].

As our goal is to obtain an outlier free subset, we observe that the probability of choosing an outlier free subset of size s, given n data elements out of which o are outliers, is:
$$P = \frac{\binom{n-o}{s}}{\binom{n}{s}} = \prod_{i=0}^{s-1} \frac{n-o-i}{n-i}$$
From this equation it is clear that to increase the probability we should choose the minimal possible value for s, which is the minimal number of elements required for model estimation.
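For concreteness, a worked example with numbers of our own choosing, not taken from the paper: given $n = 100$ data elements of which $o = 40$ are outliers,
$$P(\text{outlier free}, s=3) = \frac{\binom{60}{3}}{\binom{100}{3}} = \frac{60 \cdot 59 \cdot 58}{100 \cdot 99 \cdot 98} \approx 0.21, \qquad P(\text{outlier free}, s=10) \approx 0.004,$$
illustrating why s should be kept at the minimum required for estimation.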
Finally, we derive the number of subsets required to ensure that one of them is outlier free. As subsets are drawn independently with replacement, ensuring that one subset is outlier free requires that we check all $\binom{n}{s}$ subsets. In most cases this is computationally prohibitive, although if n is small enough or s ≃ n it may still be feasible [5].
To alleviate the computational burden we relax the requirement for an outlier free subset, changing it to a probabilistic requirement: we now require that we obtain an outlier free subset with probability p. Assuming that the probability for an inlier is w, the probability for an outlier is ε = 1 − w. Then we need to perform at least N subset selections, where $(1 - w^s)^N = 1 - p$, which gives us:
$$N = \frac{\log(1-p)}{\log(1 - w^s)}$$
Finally, it should be emphasized that even when we do draw an outlier free minimal subset we may not be able to estimate the model. For a successful estimate an outlier free minimal subset is necessary but not sufficient. Usually, the subset must satisfy additional problem specific constraints (e.g. three non-collinear points to fit a 3D plane).
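A minimal sketch of the computation of N (Python used here purely for illustration; the function name and the guard for w = 1 are ours, anticipating the outlier free data issue discussed in the Implementation section):

    import math

    def num_iterations(p, w, s):
        """Number of subset draws needed to obtain an outlier free subset
        of size s with probability p, given inlier probability w:
        N = log(1-p) / log(1 - w^s)."""
        if w >= 1.0:       # outlier free data: a single draw suffices
            return 1
        if w <= 0.0:
            return math.inf
        return math.ceil(math.log(1.0 - p) / math.log(1.0 - w**s))

    # e.g. 50% inliers, minimal subset of 3 points, 99% success probability
    print(num_iterations(p=0.99, w=0.5, s=3))   # -> 35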
Having answered these questions we can reformulate the RANSAC algorithm. We now require as input the desired probability, p, of drawing an outlier free subset. In addition, we adaptively update the number of drawn subsets, N, whenever a new subset results in a larger consensus set (a higher percentage of inliers). Table 2 summarizes the adaptive sampling algorithm. Thus, to use the generic RANSAC algorithm we require:
1. A method for estimating the parametric entity given the minimal number of data elements.
2. A method for least squares parameter estimation.
3. A method that checks if a given data element agrees with the model.
A minimal sketch of this separation is given below.
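The sketch below illustrates the separation between the generic loop and the three problem-specific methods (Python for illustration only; the estimator interface and all names are hypothetical, not the paper's actual implementation):

    import math, random

    def ransac(data, estimator, s, tau, p=0.99):
        """Generic RANSAC: 'estimator' supplies the three problem-specific
        methods listed above; the loop itself is model agnostic."""
        best_consensus = []
        max_iterations = math.comb(len(data), s)   # pessimistic initial N
        i = 0
        while i < max_iterations:
            subset = random.sample(data, s)        # draw without replacement
            model = estimator.exact_fit(subset)    # None on degenerate subsets
            if model is not None:
                consensus = [d for d in data
                             if estimator.agree(d, model, tau)]
                if len(consensus) > len(best_consensus):
                    best_consensus = consensus
                    w = len(consensus) / len(data) # estimated inlier fraction
                    if w == 1.0:                   # outlier free data: stop
                        break
                    max_iterations = math.log(1 - p) / math.log(1 - w**s)
            i += 1
        return estimator.ls_fit(best_consensus)    # final least squares fit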
We end this section with a discussion of methods for simple random sampling, drawing subsets with replacement from a uniform distribution.
The first method draws a subset by drawing data elements with replacement using a uniform distribution (in MATLAB: ceil(rand(1,s)*n)). That is, s values are independently generated using a uniform distribution on the interval [1, n]. As the elements are drawn with replacement, a data element can be drawn multiple times for a single subset. The number of possible draws is thus $n^s$, while the number of valid subsets is smaller, $\binom{n}{s}$. Since we are generating minimal subsets, drawing the same data element multiple times yields a degenerate configuration. While this is not optimal, it is simple to implement, and the requirement that all valid subsets have the same probability (each is drawn with probability $s!/n^s$) is satisfied.

A slight modification that ensures that only valid subsets are generated is to remove each selected data element and change the uniform distribution such that the probability of choosing an element at the k'th draw is $\frac{1}{n-k}$, $k = 0, \ldots, s-1$. Thus only valid subsets are generated and they all have equal probability, $1/\binom{n}{s}$.

Another method for simple random sampling is to independently generate a random value from a uniform distribution over the interval [a, b] for each of the data elements, then sort the values and take the s data elements that correspond to the highest ones. It is clear that this approach only generates the $\binom{n}{s}$ valid subsets. We now show that they all have the same probability. Let $x_{i=1 \ldots n}$ be our independent identically distributed (iid) random variables. The probability of incorporating element i into the subset is the probability that $x_i$ is greater than $n - s$ of the other random variables. As the variables are generated independently, this probability is determined solely by the pairwise probabilities $p(x_i > x_j)$. If $p(x_i > x_j)$ is a constant, then the probability of incorporating element i into the subset is constant, which means that the probability of every subset is constant and equal.
As our random variables are iid, their joint distribution is symmetric about the diagonal $x_i = x_j$, which gives us:
$$p(x_i > x_j) = p(x_j > x_i) = \frac{1 - p(x_i = x_j)}{2}$$
We now show that $p(x_i = x_j)$ is a constant, using the fact that the variables are iid from a continuous uniform distribution: $p(x_i = x_j) = 0$ for all pairs $i \neq j$, and therefore $p(x_i > x_j) = \frac{1}{2}$ for all pairs. While this approach only generates valid subsets with uniform probability, it is computationally expensive due to the sorting phase, which is an O(n log n) operation.
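The three sampling schemes can be sketched as follows (Python with numpy, our illustration; the paper's implementation is not in Python):

    import numpy as np

    rng = np.random.default_rng()
    n, s = 100, 3

    # 1. Draw with replacement: simple, but may repeat an element,
    #    yielding a degenerate (invalid) subset.
    subset1 = rng.integers(0, n, size=s)          # MATLAB: ceil(rand(1,s)*n)

    # 2. Draw without replacement: remove each selected element, so the
    #    k'th draw is uniform over the remaining n-k elements.
    subset2 = rng.choice(n, size=s, replace=False)

    # 3. Sort-based: assign each element an iid uniform value and keep the
    #    s largest; all C(n,s) subsets are equally likely, at O(n log n) cost.
    subset3 = np.argsort(rng.random(n))[-s:]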

nD Plane
An nD plane is defined as the set of points p that satisfy the following equation:
$$\mathbf{n}^T(\mathbf{p} - \mathbf{a}) = 0$$
where a is a point on the hyperplane and n is the plane's normal.

Exact estimate
Each point, $p_i \in \mathbb{R}^d$, provides us with one linear equation in (d + 1) unknowns:
$$\begin{bmatrix} p_i^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{n} \\ -\mathbf{n}^T\mathbf{a} \end{bmatrix} = 0$$
Given d linearly independent points we obtain a matrix whose one dimensional null space gives us the plane normal, n. The point on the plane, a, is arbitrarily chosen from the point set.
For d = 2 and d = 3 the line/plane normal can be computed constructively:
• d = 2: given two points p, q, the line normal is $\mathbf{n} = [p_y - q_y, \; q_x - p_x]^T$.
• d = 3: given three points p, q, w, the plane normal is given by the cross product $\mathbf{n} = (q - p) \times (w - p)$.
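For general d, the null space computation can be sketched as follows (Python with numpy; the function name is ours, not from the paper's implementation):

    import numpy as np

    def plane_exact_fit(points):
        """Exact hyperplane fit from d linearly independent points in R^d.
        Each point contributes the row [p_i^T 1]; the right singular vector
        of the smallest singular value spans the one-dimensional null space."""
        points = np.asarray(points, dtype=float)   # shape (d, d)
        d = points.shape[1]
        A = np.hstack([points, np.ones((d, 1))])   # d x (d+1) system matrix
        _, _, vt = np.linalg.svd(A)
        n = vt[-1, :d]                             # normal, up to scale
        n /= np.linalg.norm(n)
        a = points[0]                              # arbitrary point on plane
        return n, a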

Geometric least squares estimate
Given m points in $\mathbb{R}^d$, m > d, we want to fit them to a plane such that their orthogonal distance from the plane is minimized. A point $p_i$ can be written as follows (Figure 1):
$$p_i = a + \delta_i \mathbf{n} + s_i \mathbf{n}_i^{\perp}$$
where $\mathbf{n}_i^{\perp}$ is a unit vector perpendicular to n and $s_i$ is the appropriate scale factor. The signed point-to-plane distance is thus:
$$\delta_i = \mathbf{n}^T(p_i - a)$$
The optimal plane parameters are computed as:
$$[a^*, \mathbf{n}^*] = \arg\min_{a, \mathbf{n}} \Delta, \quad \|\mathbf{n}\| = 1$$
and in explicit form:
$$\Delta = \sum_{i=1}^{m} \left[\mathbf{n}^T(p_i - a)\right]^2$$
Deriving ∆ with respect to a we get:
$$\frac{\partial \Delta}{\partial a} = -2 \sum_{i=1}^{m} \mathbf{n}\mathbf{n}^T(p_i - a)$$
Equating this to zero yields $a = \frac{1}{m}\sum_{i=1}^{m} p_i$; that is, a is the mean of the sample points.
Going back to our optimization problem we have:
$$\min_{\mathbf{n}} \mathbf{n}^T \Sigma \mathbf{n}, \quad \text{s.t.} \; \|\mathbf{n}\| = 1$$
where $\Sigma = \sum_{i=1}^{m}(p_i - a)(p_i - a)^T$. The minimum is obtained for the eigenvector corresponding to the minimal eigenvalue of Σ [3]. Eigenvector computation is performed either using methods that take advantage of the fact that the matrix is symmetric, or, in the case of d = 2, using an analytic solution.
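A minimal sketch of this estimate (numpy; the function name is ours):

    import numpy as np

    def plane_geometric_ls_fit(points):
        """Geometric least squares hyperplane fit: a is the centroid and n
        is the eigenvector of the scatter matrix with minimal eigenvalue."""
        points = np.asarray(points, dtype=float)   # shape (m, d), m > d
        a = points.mean(axis=0)                    # optimal point on plane
        centered = points - a
        sigma = centered.T @ centered              # sum (p_i-a)(p_i-a)^T
        eigvals, eigvecs = np.linalg.eigh(sigma)   # symmetric eigensolver,
        n = eigvecs[:, 0]                          # ascending eigenvalues
        return n, a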

nD Sphere
An nD sphere is defined as the set of points p that satisfy the following equation:
$$(\mathbf{p} - \mathbf{c})^T(\mathbf{p} - \mathbf{c}) = r^2 \tag{3}$$
where c is the center and r is the radius.

Exact estimate
As all the points, $p_i \in \mathbb{R}^d$, are on the sphere, we have:
$$p_i^T p_i - 2 p_i^T \mathbf{c} + \mathbf{c}^T\mathbf{c} - r^2 = 0$$
Subtracting these equations pairwise eliminates the quadratic terms, so each pair of points provides us with one linear equation in the d unknowns c:
$$2(p_j - p_i)^T \mathbf{c} = p_j^T p_j - p_i^T p_i$$
Given d + 1 linearly independent points we obtain a regular matrix, and solve the equation system to get c. The radius is computed as $r = \|p_i - \mathbf{c}\|$, where $p_i$ is arbitrarily chosen from the point set.
For d = 2 and d = 3 we can use Cramer's rule to compute the matrix inverse, $A^{-1} = \frac{1}{\det(A)} C^T$, where C is the cofactor matrix:
• d = 2: given three points $p_1, p_2, p_3$, the equation system is
$$2\begin{bmatrix} (p_2 - p_1)^T \\ (p_3 - p_1)^T \end{bmatrix} \mathbf{c} = \begin{bmatrix} p_2^T p_2 - p_1^T p_1 \\ p_3^T p_3 - p_1^T p_1 \end{bmatrix}$$
• d = 3: given four points $p_1, p_2, p_3, p_4$, the analogous system is
$$2\begin{bmatrix} (p_2 - p_1)^T \\ (p_3 - p_1)^T \\ (p_4 - p_1)^T \end{bmatrix} \mathbf{c} = \begin{bmatrix} p_2^T p_2 - p_1^T p_1 \\ p_3^T p_3 - p_1^T p_1 \\ p_4^T p_4 - p_1^T p_1 \end{bmatrix}$$
and in both cases the matrix inverse is computed analytically.
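For general d, the system can also be solved numerically; a minimal sketch (numpy; the function name is ours):

    import numpy as np

    def sphere_exact_fit(points):
        """Exact nD sphere fit from d+1 points: pairwise differences of the
        sphere equation yield a d x d linear system for the center c."""
        points = np.asarray(points, dtype=float)   # shape (d+1, d)
        p0, rest = points[0], points[1:]
        A = 2.0 * (rest - p0)                      # d x d system matrix
        b = (rest**2).sum(axis=1) - (p0**2).sum()  # p_i^T p_i - p_0^T p_0
        c = np.linalg.solve(A, b)                  # fails on degenerate input
        r = np.linalg.norm(p0 - c)
        return c, r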

Algebraic least squares estimate
The derivations described in this subsection are partially due to [2].
Given m points in $\mathbb{R}^d$, m > (d + 1), we want to fit them to a sphere such that the sum of the squared algebraic distances is minimized. From the definition of a sphere (Equation 3) we have:
$$\delta_i = p_i^T p_i - 2 p_i^T \mathbf{c} + \mathbf{c}^T\mathbf{c} - r^2$$
The optimal sphere parameters are computed as:
$$[\mathbf{c}^*, r^*] = \arg\min_{\mathbf{c}, r} \sum_{i=1}^{m} \delta_i^2$$
Setting $\hat{m} = \mathbf{c}^T\mathbf{c} - r^2$ we get the following linear equation system (Ax = b):
$$\begin{bmatrix} 2p_1^T & -1 \\ \vdots & \vdots \\ 2p_m^T & -1 \end{bmatrix} \begin{bmatrix} \mathbf{c} \\ \hat{m} \end{bmatrix} = \begin{bmatrix} p_1^T p_1 \\ \vdots \\ p_m^T p_m \end{bmatrix} \tag{4}$$
The solution of this equation system minimizes $\sum_{i=1}^{m} \delta_i^2 = \|Ax - b\|^2$. Note that this equation system admits solutions where $\hat{m} \geq \mathbf{c}^T\mathbf{c}$. That is, we have a solution that does not represent a valid sphere, as $r^2 \leq 0$. This situation can arise in the presence of outliers.
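A minimal sketch of this estimate, including the validity check discussed in the Implementation section (numpy; names are ours):

    import numpy as np

    def sphere_algebraic_ls_fit(points):
        """Algebraic least squares sphere fit: solve the linear system for
        x = [c, m_hat] with m_hat = c^T c - r^2, then recover r^2.
        Returns None when r^2 <= 0 (no valid sphere, e.g. due to outliers)."""
        points = np.asarray(points, dtype=float)      # shape (m, d)
        A = np.hstack([2.0 * points, -np.ones((len(points), 1))])
        b = (points**2).sum(axis=1)                   # p_i^T p_i
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        c, m_hat = x[:-1], x[-1]
        r2 = c @ c - m_hat
        return (c, np.sqrt(r2)) if r2 > 0 else None   # validity check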

Geometric least squares estimate
Given m points in $\mathbb{R}^d$, m > (d + 1), estimate the parameters of the sphere that minimizes the sum of squared geometric distances to the point set.
The signed geometric distance is (Figure 2):
$$\delta_i = \|p_i - \mathbf{c}\| - r$$
The optimal sphere parameters are computed as:
$$[\mathbf{c}^*, r^*] = \arg\min_{\mathbf{c}, r} \sum_{i=1}^{m} \left(\|p_i - \mathbf{c}\| - r\right)^2$$
This nonlinear optimization problem is solved using the Levenberg-Marquardt method, which requires the computation of $\delta_i$ and its partial derivatives with respect to the (d + 1) unknowns:
$$\frac{\partial \delta_i}{\partial \mathbf{c}} = \frac{\mathbf{c} - p_i}{\|p_i - \mathbf{c}\|}, \qquad \frac{\partial \delta_i}{\partial r} = -1$$
When there is no prior knowledge with regard to the sphere location or its radius, the RANSAC algorithm will often yield incorrect estimates. A key assumption of the algorithm is that the maximal consensus set is outlier free. In the case of spheres, the size of the consensus set is related to the sphere radius, causing bias. That is, for a random data set a larger radius will most often lead to a larger consensus set (the surface area of the nD sphere is proportional to $r^{n-1}$). This can be mitigated by incorporating prior constraints on the parameter values. An implicit approach is to reject hypotheses that invalidate the constraints, similar to the way that we deal with degenerate configurations.
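A minimal sketch of the geometric fit, assuming SciPy's Levenberg-Marquardt solver in place of whichever solver the actual implementation uses; here the derivatives are approximated numerically rather than supplied analytically:

    import numpy as np
    from scipy.optimize import least_squares

    def sphere_geometric_ls_fit(points, x0):
        """Geometric least squares sphere fit via Levenberg-Marquardt.
        x0 = [c_0, r_0] is an initial estimate, e.g. from the algebraic fit."""
        points = np.asarray(points, dtype=float)

        def residuals(x):
            c, r = x[:-1], x[-1]
            return np.linalg.norm(points - c, axis=1) - r  # signed distances

        result = least_squares(residuals, x0, method='lm')
        return result.x[:-1], result.x[-1]                 # center, radius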

Implementation
We next address several implementation-related issues.

1. What happens if the data is outlier free?
In the basic algorithm we continue to iterate even though the algorithm could have terminated immediately with all the data as the consensus set. In the adaptive sampling version we need to explicitly address this situation, as the denominator in Table 2, step 2(d), $\log(1 - w^s)$, is not defined for w = 1. When encountering this situation we terminate the iterations without attempting to compute the undefined value.
2. What happens when the selected subset is a degenerate configuration, that is, when a unique model cannot be estimated (e.g., fitting a homography to four points where three or four of them are collinear)?
We can either accept this subset as a valid draw whose consensus set is empty, or we can require that the sample be drawn again and again until we obtain a non-degenerate configuration. The latter approach requires a limit on the number of attempts, as we risk iterating infinitely. This can happen if the data does not contain an instance of the model (e.g., fitting a circle to line data). It is also possible that we repeatedly draw the same degenerate configuration, as we are drawing with replacement and all subsets have equal probability. In our implementation we consider this a valid subset with an empty consensus set.

Setting the initial number of iterations too low can cause the algorithm to fail: as we do not know the number of outlying data elements, it is possible that the first k subsets all contain outliers. Thus we set the initial number of iterations to $\binom{n}{s}$. The only issue with this choice is the potential for data overflow during the computation, which is addressed in our implementation (see the sketch below).

3. What happens if there are multiple consensus sets with the maximal cardinality?

One possibility is to simply ignore this scenario and arbitrarily accept the consensus set that was obtained first; this is what we do. Another option is to keep all of the sets, estimate the least squares model for each of them, and select the one with the minimal residual error. Again, if multiple models result in the same residual error we can arbitrarily choose one.
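One way to guard against the overflow, sketched in Python for illustration (Python integers do not actually overflow, so the cap mimics a fixed width integer implementation; the function name is ours):

    import math

    def initial_iterations(n, s, cap=2**31 - 1):
        """Overflow-safe initial subset count min(C(n,s), cap): the binomial
        coefficient is first bounded in the log domain via log-gamma, so the
        intermediate values never exceed the fixed width limit."""
        log_comb = (math.lgamma(n + 1) - math.lgamma(s + 1)
                    - math.lgamma(n - s + 1))
        return cap if log_comb > math.log(cap) else math.comb(n, s)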
Finally, we note an interesting aspect of the probabilistic approach to computing the required number of subsets: it is possible that the number of subsets computed in this manner is greater than the number of all possible subsets. This is a result of using the proportion of inliers/outliers and not their nominal numbers. Figure 3 shows an example of how this can happen.
Algebraic Least Squares Sphere Estimate: The validity of the solution of Equation 4 is checked, and the solution is rejected if it is invalid, that is, if $r^2 \leq 0$, which can arise in the presence of outliers.
Geometric Least Squares Sphere Estimate: The iterative nonlinear solver requires an initial estimate; this is obtained using the analytic algebraic least squares method.
Figure 4: Least squares (left column) and RANSAC (right column) based estimates of a plane and sphere. Green spheres are the data points that agree with the models. Quantitative comparison for these data sets is given in Table 3.
A visual comparison of results obtained using a least squares estimate versus those obtained using our RANSAC implementation is shown in Figure 4, with the corresponding quantitative comparison given in Table 3.

Table 1:
Input:
data - Data contaminated with outliers (cardinality(data) = n).
s - Number of data elements required to estimate model M.
N - Number of subsets to draw.
τ - Threshold which defines if a data element, d_i, agrees with model M.
1. maximalConsensusSet ← ∅
2. Iterate N times:
  (a) consensusSet ← ∅
  (b) Randomly draw a subset containing s elements and estimate model M.
  (c) For each data element, d_i: if Agree(d_i, M, τ), consensusSet ← consensusSet ∪ d_i
  (d) If cardinality(consensusSet) > cardinality(maximalConsensusSet), maximalConsensusSet ← consensusSet
3. Estimate model parameters using maximalConsensusSet.
RANSAC, basic algorithm.

Table 2:
Input:
data - Data contaminated with outliers (cardinality(data) = n).
s - Minimal number of data elements required to estimate model M.
τ - Threshold which defines if a data element, d_i, agrees with model M.
p - Desired probability for drawing an outlier free subset.
1. maximalConsensusSet ← ∅, N ← $\binom{n}{s}$
2. Iterate N times:
  (a) consensusSet ← ∅
  (b) Randomly draw a subset containing s elements and estimate model M.
  (c) For each data element, d_i: if Agree(d_i, M, τ), consensusSet ← consensusSet ∪ d_i
  (d) If cardinality(consensusSet) > cardinality(maximalConsensusSet): maximalConsensusSet ← consensusSet, w ← cardinality(consensusSet)/n, N ← log(1 − p)/log(1 − w^s)
3. Estimate model parameters using maximalConsensusSet.
RANSAC, adaptive sampling algorithm.

Figure 3: When estimating a line we randomly draw pairs of points. Given the pair p_1, p_2, the consensus set includes only these two points (for a specific value of τ). The number of trials required to obtain an outlier free subset with a probability of 0.99 is then computed as N = log(0.01)/log(0.96) ≃ 113, while the number of all possible subsets for this data set is smaller ($\binom{10}{2} = 45$).
