Error analysis for circle fitting algorithms

We study the problem of fitting circles (or circular arcs) to data points observed with errors in both variables. A detailed error analysis for all popular circle fitting methods -- geometric fit, Kasa fit, Pratt fit, and Taubin fit -- is presented. Our error analysis goes deeper than the traditional expansion to the leading order. We obtain higher order terms, which show exactly why and by how much circle fits differ from each other. Our analysis allows us to construct a new algebraic (non-iterative) circle fitting algorithm that outperforms all the existing methods, including the (previously regarded as unbeatable) geometric fit.


Introduction
Fitting circles and circular arcs to observed points is one of the basic tasks in pattern recognition and computer vision, nuclear physics, and other areas [5,9,11,23,24,27,30,32]. Many algorithms have been developed that fit circles to data. Some minimize the geometric distances from the circle to the data points (we call them geometric fits). Others minimize various approximate (or 'algebraic') distances, they are called algebraic fits. We overview most popular algorithms in  Geometric fit is commonly regarded as the most accurate, but it can only be implemented by iterative schemes that are computationally intensive and subject to occasional divergence. Algebraic fits are faster but presumably less precise. At the same time the assessments on their accuracy are solely based on practical experience, no one has performed a detailed theoretical comparison of the accuracy of various circle fits. It was shown in [8] that all the circle fits have the same covariance matrix, to the leading order, in the small-noise limit. Thus the differences between various fits can only be revealed by a higher-order error analysis.
The purpose of this paper is to do just that. We employ higher-order error analysis (a similar analysis was used by Kanatani [22] in the context of more general quadratic models) and show exactly why and by how much the geometric circle fit outperforms the algebraic circle fits in accuracy; we also compare the precision of different algebraic fits. Section 5 presents our error analysis in a general form, which can be readily applied to other curve fitting problems.
Finally, our analysis allows us to develop a new algebraic fit whose accuracy exceeds that of the geometric fit. Its superiority is demonstrated by numerical experiments.
Note thatũ 2 i +ṽ 2 i = 1 for every i. Remark. In our paper δ i and ε i have common variance σ 2 , i.e. our noise is homoscedastic. In many studies the noise is heteroscedastic [25,35], i.e. the normal vector (δ i , ε i ) has point-dependent covariance matrix σ 2 C i , where C i is known and depends on i, and σ 2 is an unknown factor. Our analysis can be extended to this case, too, but the resulting formulas will be somewhat more complex, so we leave it out.

Geometric circle fits
A standard approach to fitting circles to 2D data is based on orthogonal least squares, it is also called geometric fit, or orthogonal distance regression (ODR). It minimizes the function where d i stands for the distance from (x i , y i ) to the circle, i.e.
where (a, b) denotes the center, and R the radius of the circle.
In the context of the functional model, the geometric fit returns the maximum likelihood estimates (MLE) of the circle parameters [6], i.e. (7) (â MLE ,b MLE ,R MLE ) = argmin F (a, b, R).
A major concern with the geometric fit is that the above minimization problem has no closed form solution. All practical algorithms of minimizing F are iterative; some implement a general Gauss-Newton [6,15] or Levenberg-Marquardt [9] schemes, others use circle-specific methods proposed by Landau [24] and Späth [30]. The performance of iterative algorithms heavily depends on the choice of the initial guess. They often take dozens or hundreds of iterations to converge, and there is always a chance that they would be trapped in a local minimum of F or diverge entirely. These issues are explored in [9].
A peculiar feature of the maximum likelihood estimates (â,b,R) of the circle parameters is that they have infinite moments [7], i.e.
for any set of true values (ã,b,R); here E denotes the mean value. This happens because the distributions of these estimates have somewhat heavy tails, even though those tails barely affect the practical performance of the MLE (the same happens when one fits straight lines to data with errors in both variables [2,3]).
To ensure the existence of moments one can adopt a different parameter scheme. An elegant scheme was proposed by Pratt [27] and others [13], which describes circles by an algebraic equation (9) A(x 2 + y 2 ) + Bx + Cy + D = 0 with an obvious constraint A = 0 (otherwise this equation describes a line) and a less obvious constraint B 2 + C 2 − 4AD > 0. The necessity of the latter can be seen if one rewrites equation (9) as It is clear now that (9) defines a circle if and only if B 2 + C 2 − 4AD > 0. As the parameters (A, B, C, D) only need to be determined up to a scalar multiple, it is natural to impose a constraint (11) B 2 + C 2 − 4AD = 1, because it automatically ensures B 2 +C 2 −4AD > 0. The constraint (11) was first proposed by Pratt [27]. Under this constraint, the parameters A, B, C, D are essentially bounded, see [9], and their maximum likelihood estimates can be shown to have finite moments. The equation (9), under the constraint (11), conveniently describes all circles and lines (the latter are obtained when A = 0); the inclusion of lines is necessary to ensure the existence of the least squares solution [9,26,37].
After one estimates the algebraic circle parameters A, B, C, D, they can be converted to the natural parameters via

Algebraic circle fits
An alternative to the complicated geometric fit is made by fast non-iterative procedures called algebraic fits. We describe three most popular algebraic circle fits below.
Kåsa fit. One can find a circle by minimizing the function In other words, one minimizes (13) to a linear least squares problem minimizing where we denote z i = x 2 i +y 2 i for brevity (we intentionally omit symbol A here to make our formulas consistent with the subsequent ones). Now the problem reduces to a system of linear equations (normal equations) with respect to B, C, D that can be easily solved, and then one recovers the natural circle parameters a, b, R via (12).
This method was introduced in the 1970s by Delogne [11] and Kåsa [23], and then rediscovered and published independently by many authors, see references in [9]. It remains popular in practice. We call it Kåsa fit.
The Kåsa method is perhaps the fastest circle fit, but its accuracy suffers when one observes incomplete circular arcs (partially occluded circles); then the Kåsa fit is known to be heavily biased toward small circles [9]. The reason for the bias is that the algebraic distances f i provide a poor approximation to the geometric distances d i ; in fact, (15) hence the Kåsa fit minimizes F K ≈ 2R 2 d 2 i , and it often favors smaller circles minimizing R 2 rather than the distances d i .
Pratt fit. To improve the performance of the Kåsa method one can minimize another function, F = 1 4R 2 F K , which provides a better approximation to d 2 i . This new function, expressed in terms of A, B, C, D reads due to (12). Equivalently, one can minimize subject to the constraint (11). This method was proposed by Pratt [27].
Taubin fit. A slightly different method was proposed by Taubin [32] who minimizes the function Expressing it in terms of A, B, C, D gives Equivalently, one can minimize (17) subject to a new constraint (20) 4A 2z + 4ABx + 4ACȳ + B 2 + C 2 = 1.
Here we use standard 'sample means' notation: General remarks. Note that the minimization of (17) must use some constraint, to avoid a trivial solution A = B = C = D = 0. Pratt and Taubin fits utilize constraints (11) and (20), respectively. Kåsa fit also minimizes (17), but subject to constraint A = 1. While the Pratt and Taubin estimates of the parameters A, B, C, D have finite moments, the corresponding estimates of a, b, R have infinite moments, just like the MLE (8). On the other hand, Kåsa's estimates of a, b, R have finite moments whenever n ≥ 4; see [37].
All the above circle fits have an important property -they are independent of the choice of the coordinate system, i.e. their results are invariant under translations and rotations; see a proof in [14].
Practical experience shows that the Pratt and Taubin fits are more stable and accurate than the Kåsa fit, and they perform nearly equally well, see [9]. Taubin [32] intended to compare his fit to Pratt's theoretically, but no such analysis was ever published. We make such a comparison below.
There are many other approaches to the circle fitting problem in the modern literature [4,10,35,28,29,31,33,36,38], but most of them are either quite slow or can be reduced to one of the algebraic fits [14, Chapter 8].
Matrix representation. We can represent the above three algebraic fits in matrix form. Let A = (A, B, C, D) denote the parameter vector, Differentiating with respect to A gives thus A must be a generalized eigenvector for the matrix pair (M, N). This fact is sufficient for the subsequent analysis, because it determines A up to a scalar multiple, and multiplying A by a scalar does not change the circle it represents, so we can set A = 1.
Remark. The generalized eigenvalue problem (26) may have several solutions. To choose the right one we note that for each solution (η, A) thus for the purpose of minimizing A T MA we should choose the solution of (26) with the smallest positive η.

Error Analysis: a general scheme
We employ an error analysis scheme based on a 'small noise' assumption. That is, we assume that the errors δ i and ε i (Section 2) are small and treat their standard deviation σ as a small parameter. The sample size n is fixed, though it is not very small.
This approach goes back to Kadane [16] and was employed by Anderson [2] and other statisticians [3]. More recently it has been used by Kanatani [19,22] in image processing applications, who argued that the 'small noise' model, where σ → 0 while the sample size n is kept fixed, is more appropriate than the traditional statistical 'large sample' approach, where n → ∞ while σ > 0 is kept fixed. We use a combination of these two models: our main assumption is σ → 0, but n is regarded as a slowly increasing parameter; more precisely we assume n ≪ σ −2 .
Suppose one is fitting curves defined by an implicit equation where Θ = (θ 1 , . . . , θ k ) T denotes a vector of unknown parameters to be estimated. LetΘ = (θ 1 , . . . ,θ k ) T be the 'true' parameter vector corresponding to the 'true' curve P (x, y;Θ) = 0. As in Section 2 let (x i ,ỹ i ), i = 1, . . . , n, denote true points, which lie on the true curve, and (x i , y i ) observed points satisfying (1). LetΘ(x 1 , y 1 , . . . , x n , y n ) be an estimator. We assume that Θ is a regular (at least four times differentiable) function of observations (x i , y i ). The existence of the derivatives ofΘ is only required at the true points (x i , y i ) = (x i ,ỹ i ), and it follows from the implicit function theorem under general assumptions provided P (x, y; Θ) in (27) is differentiable (in most cases P is a polynomial in all its variables); we omit the proof. For brevity we denote by X = (x 1 , y 1 , . . . , x n , y n ) T the vector of all observations, so that X =X + E, whereX = (x 1 ,ỹ 1 , . . . ,x n ,ỹ n ) T is the vector of the true coordinates and E = (δ 1 , ε 1 , . . . , δ n , ε n ) T is the 'noise vector'; the components of E are i.i.d. normal random variables with mean zero and variance σ 2 .
We use Taylor expansion to the second order terms. To keep our notation simple, we work with each scalar parameter θ m of the vector Θ separately: m denote the gradient (the vector of the first order partial derivatives) and the Hessian matrix of the second order partial derivatives ofθ m , respectively, taken at the true vectorX. The remainder Expansion (28) shows thatΘ(X) →Θ(X) in probability, as σ → 0. It is convenient to assume that Precisely (29) means that whenever σ = 0, i.e. when the true points are observed without noise, then the estimator returns the true parameter vector, i.e. finds the true curve. Geometrically, it means that if there is a model curve that interpolates the data points, then the algorithm finds it.
With some degree of informality, one can assert that whenever (29) holds, the estimateΘ is consistent in the limit σ → 0. This is regarded as a minimal requirement for any sensible fitting algorithm. For example, if the observed points lie on one circle, then every circle fitting algorithm finds that circle uniquely. Kanatani [20] remarks that algorithms which fail to follow this property "are not worth considering".
Under the assumption (29) we rewrite (28) as The accuracy of an estimatorθ in statistics is characterized by its Mean Squared Error (MSE) where bias(θ) = E(θ) −θ. But it often happens that exact (or even approximate) values of E(θ) and Var(θ) are unavailable because the probability distribution ofθ is overly complicated, which is common in curve fitting problems, even if one fits straight lines to data points; see [2,3]. There are also cases where the estimates have theoretically infinite moments because of somewhat heavy tails, which on the other hand barely affect their practical performance. Thus their accuracy should not be characterized by the theoretical moments which happen to be affected by heavy tails; see also [2]. In all such cases one usually constructs a good approximate probability distribution forθ and judges the quality ofθ by the moments of that distribution.
It is standard [1,2,3,12,34] to construct a normal approximation toθ and treat its variance as an 'approximative' MSE ofθ. The normal approximation is usually based on the leading term in the Taylor expansion, like G T m E in (30). For circle fitting algorithms, the resulting variance (see below) will be the same for all known methods, so we will go one step further and use the second order term. This gives us a better approximative distribution and allows us to compare circle fitting methods. In our formulas, E(θ m ) and Var(θ m ) denote the mean and variance of the resulting approximative distribution.
The first term in (30) The vector E m = Q m E has the same distribution as E does, i.e. its components are i.i.d. normal random variables with mean zero and variance σ 2 . Thus where the Z i 's are i.i.d. standard normal random variables, and the mean value of (32) is Therefore, taking the mean value in (30) gives . Note that the expectations of all third order terms vanish, because the components of E are independent and their first and third moments are zero; thus the remainder term is of order σ 4 .
Squaring (30) and again using (32) give the mean squared error (MSE) . The remainder R includes terms of order σ 6 , as well as some terms of order σ 4 that contain third order partial derivatives, such as ∂ 3θ m /∂x 3 i and ∂ 3θ m /∂x 2 i ∂x j . A similar expression can be derived for E ∆θ m ∆θ m ′ for m = m ′ , we omit it and only give the final formula below.
Classification of higher order terms. In the MSE expansion (35), the leading term σ 2 G T m G m is the most significant. The terms of order σ 4 are often given by long complicated formulas. Even the expression for the bias (34) may contain several terms of order σ 2 , as we will see below. Fortunately, it is possible to sort them out keeping only the most significant ones, see next.
Kanatani [22] recently derived formulas for the bias of certain ellipse fitting algorithms. First he found all the terms of order σ 2 , but in the end he noticed that some terms were of order σ 2 (independent of n), while the others of order σ 2 /n. The magnitude of the former was clearly larger than that of the latter, and when Kanatani made his conclusions he ignored the terms of order σ 2 /n. Here we formalize Kanatani's classification of higher order terms as follows: -In the expression for the bias (34) we keep terms of order σ 2 (independent of n) and ignore terms of order σ 2 /n. -In the expression for the mean squared error (35) we keep terms of order σ 4 (independent of n) and ignore terms of order σ 4 /n.
These rules agree with our assumption that not only σ → 0, but also n → ∞, although n increases rather slowly (n ≪ 1/σ 2 ). Such models were studied by Amemiya, Fuller and Wolter [1,34] who made a more rigid assumption that n ∼ σ −a for some 0 < a < 2.
Now it turns out (we omit detailed proofs; see [14]) that the main term σ 2 G T m G m in our expression for the MSE (35) is of order σ 2 /n; so it will never be ignored. Of the fourth order terms, 1 2 σ 4 H m 2 F is of order σ 4 /n, hence it will be discarded, and the same applies to all the terms involving third order partial derivatives mentioned above.
The bias σ 2 tr H m in (34) is, generally, of order σ 2 (independent of n), thus its contribution to the mean squared error (35) is significant. However the full expression for the bias may contain terms of order σ 2 and of order σ 2 /n, of which the latter will be ignored; see below.
Now the terms in (35) have the following orders of magnitude: where each big-O simply indicates the order of the corresponding term in (35). It is interesting to roughly compare their values numerically. In typical computer vision applications, σ does not exceed 0.05; see [5]. The number of data points normally varies between 10-20 (on the low end) and a few hundred (on the high end). For simplicity, we can set n ∼ 1/σ for smaller samples large samples (n ∼ 1/σ 2 ) σ 4 σ 4 σ 6 σ 6 Table 1: The order of magnitude of the four terms in (35).
and n ∼ 1/σ 2 for larger samples. Then Table 1 presents the corresponding typical magnitudes of each of the four terms in (35). We see that for larger samples the fourth order term coming from the bias may be just as big as the leading second-order term, hence it would be unwise to ignore it. Earlier studies, see e.g. [5,8,17], usually focused on the leading, i.e. second-order, terms only, disregarding all the fourth-order terms, and this is where our analysis is different. We make one step further -we keep all the terms of order O(σ 2 /n) and O(σ 4 ). The less significant terms of order O(σ 4 /n) and O(σ 6 ) would be discarded. Now combining all our results gives a matrix formula for the (total) mean squared error (MSE) where G is the k × 2n matrix of first order partial derivatives ofΘ(X), its rows are G T m , 1 ≤ m ≤ k, and B = 1 2 [ tr H 1 , . . . tr H k ] T is the k-vector that represents the leading term of the bias ofΘ, cf. (34). The trailing dots in (37) stand for all insignificant terms (those of order σ 4 /n and σ 6 ).
We call the first (main) term σ 2 GG T in (37) the variance term, as it characterizes the variance (more precisely, the covariance matrix) of the es-timatorΘ, to the leading order. For brevity we denote V = GG T . The second term σ 4 BB T is the 'tensor square' of the bias σ 2 B of the estimator, again to the leading order. When we deal with particular estimators in the next sections, we will see that the actual expression for the bias is a sum of terms of two types: some of them are of order O(σ 2 ) and some others are of order O(σ 2 /n), i.e.
where B 1 = O(1) and B 2 = O(1/n). We call σ 2 B 1 the essential bias of the estimatorΘ. This is its bias to the leading order, σ 2 . The other terms, i.e. σ 2 B 2 , and O(σ 4 ), constitute non-essential bias; they can be discarded. Now (37) can be written as where we only keep significant terms of order σ 2 /n and σ 4 and drop the rest.
KCR lower bound. The matrix V representing the leading terms of the variance has a natural lower bound (an analogue of the Cramer-Rao bound): for every curve family (27) there is a symmetric positive semi-definite matrix V min such that for every estimator satisfying (29) (40) in the sense that V − V min is a positive semi-definite matrix. Here Therefore, andũ i ,ṽ i are given by (4). The general inequality (40) was proved by Kanatani [17,18] for unbiased estimatorsΘ and then extended by Chernov and Lesort [8] to all estimators satisfying (29). The geometric fit (which minimizes orthogonal distances) always satisfies (29) and attains the lower bound V min ; this was proved by Fuller (Theorem 3.2.1 in [12]) and independently by Chernov and Lesort [8], who named the inequality (40) Kanatani-Cramer-Rao (KCR) lower bound. See also survey [25] for the more general case of heteroscedastic noise.
Assessing the quality of estimators. Our analysis dictates the following strategy of assessing the quality of an estimatorΘ: first of all, its accuracy is characterized by the matrix V, which must be compared to the KCR lower bound V min . We will see that for all the circle fitting algorithms the matrix V actually achieves its lower bound V min , i.e. we have V = V min , hence these algorithms are optimal to the leading order.
Next, once the factor V is already at its natural minimum, the accuracy of an estimator should be characterized by the vector B 1 representing the essential bias -better estimates should have smaller essential biases. It appears that there is no natural minimum for B 1 , in fact there exist estimators which have a minimum variance V = V min and a zero essential bias, i.e. B 1 = 0. We will construct such an estimator in Section 7.

Error analysis of geometric circle fit
Here we apply the general method of the previous section to the geometric circle fit, i.e. to the estimatorΘ = (â,b,R) of the circle parameters minimizing the sum d 2 i of orthogonal (geometric) distances from the data points to the fitted circle.
Variance of the geometric circle fit. We start with the main part of our error analysis -the variance term represented by σ 2 V in (39). The distances d i = r i − R can be expanded as see (4). Minimizing i to the first order is equivalent to minimizing This is a classical least squares problem that can also be written as where W is given by (45), Θ = (a, b, R) T , as well as δ = (δ 1 , . . . , δ n ) T and ε = (ε 1 , . . . , ε n ) T , whileŨ = diag(ũ 1 , . . . ,ũ n ) andṼ = diag(ṽ 1 , . . . ,ṽ n ). The solution of the least squares problem (48) is of course this does not include the O P (σ 2 ) terms. Thus the variance of our estimator, to the leading order, is Now observe that E(δε T ) = E(εδ T ) = 0, as well as E(δδ T ) = E(εε T ) = σ 2 I, and we haveŨ 2 +Ṽ 2 = I. Thus to the leading order where the higher order (of σ 4 ) terms are not included. Comparing this to (44) confirms that the geometric fit attains the minimal possible covariance matrix V.
Bias of the geometric circle fit. Now we do a second-order error analysis, which has not been previously done in the literature. According to a general formula (28), we put Here ∆ 1 a, ∆ 1 b, ∆ 1 R are linear combinations of ε i 's and δ i 's, which were found above, in (49), and ∆ 2 a, ∆ 2 b, ∆ 2 R are quadratic forms of ε i 's and δ i 's to be determined next.
Expanding the distances d i to the second order terms gives Since we already found ∆ 1 a, ∆ 1 b, ∆ 1 R, the only unknowns are ∆ 2 a, ∆ 2 b, This is another least squares problem, and its solution is where F = (f 1 , . . . , f n ) T ; of course this is a quadratic approximation which does not include O P (σ 3 ) terms. In fact, the contribution from the first three (linear) terms in (55) vanishes, quite predictably; thus only the last two (quadratic) terms matter. Taking the mean value gives, to the leading order, The second term in (57) is of order O(σ 2 /n), thus the essential bias is given by the first term only, and it can be simplified. Since the last column of the matrix W T W coincides with the vector W T 1, we have (W T W) −1 W T 1 = [0, 0, 1] T , hence the essential bias of the geometric circle fit is Thus the estimates of the circle center,â andb, have no essential bias, while the estimate of the radius has essential bias which is independent of the number and location of the true points. These facts are consistent with the results obtained by Berman [5] under the assumptions that σ > 0 is fixed and n → ∞.

Error analysis of algebraic circle fits
Here we analyze algebraic circle fits using their matrix representation.
Matrix perturbation method. For every random variable, matrix or vector, L, we write whereL is its 'true', nonrandom, value (achieved when σ = 0), ∆ 1 L is a linear combination of δ i 's and ε i 's, and ∆ 2 L is a quadratic form of δ i 's and ε i 's; all the higher order terms (cubic etc.) are represented by O P (σ 3 ). For brevity, we drop the O P (σ 3 ) terms in our formulas. Therefore A =Ã + ∆ 1 A + ∆ 2 A and M =M + ∆ 1 M + ∆ 2 M, and (22) implies Since the true points lie on the true circle,ZÃ = 0, as well asMÃ = 0 (henceM is a singular matrix). Therefore , and premultiplying (26) by A T yields η = O P (σ 2 ). Next, substituting the expansions of M, A, and N into (26) gives (recall that N is data-dependent for the Taubin method, but only its 'true' valueÑ matters, as η = O P (σ 2 ), hence the use of the observed values only adds higher order terms). Now usingMÃ = 0 yields The left hand side of (66) consists of a linear part (M ∆ 1 A + ∆ 1 MÃ) and a quadratic part (M ∆ 2 A + ∆ 1 M ∆ 1 A + ∆ 2 MÃ). Separating them gives (where we used (62) andZÃ = 0) and Note thatM is a singular matrix (becauseMÃ = 0), but whenever there are at least three distinct true points, they determine a unique true circle, thus the kernel ofM is one-dimensional, and it coincides with span(Ã). Also, we set A = 1, hence ∆ 1 A is orthogonal toÃ, and we can write whereM − denotes the Moore-Penrose pseudoinverse. Now one can easily check that E(∆ 1 M ∆ 1 A) = O(σ 2 /n) and E(∆ 1 A) = 0; these facts will be useful in the upcoming analysis.
Variance of algebraic circle fits. From (69) we conclude that denote the columns of the matricesZ T and ∆ 1 Z T , respectively. Next, (23) and (24). Hence Combining the above formulas gives Remarkably, the variance of algebraic fits does not depend on the constraint matrix N, hence all algebraic fits have the same variance (to the leading order). In the next section we will derive the variance of algebraic fits in the natural circle parameters (a, b, R) and see that it coincides with the variance of the geometric fit (51).
Bias of algebraic circle fits. Since E(∆ 1 A) = 0, it will be enough to find E(∆ 2 A). Premultiplying (68) byÃ T yields Recall that E(∆ 1 M ∆ 1 A) = O(σ 2 /n), thus this term will not affect the essential bias and we drop it. Taking the mean value and using (69) gives We substitute (63) into (76), useZÃ = 0, then observe that (hereÃ is the first component of the vectorÃ). Then, note that ∆ 2 Z i = (δ 2 i + ε 2 i , 0, 0, 0) T , and so Therefore the essential bias is given by A more detailed analysis (which we omit) gives the following expression containing all the O(σ 2 ) and O(σ 2 /n) terms: This expression demonstrates that the terms of order σ 2 /n (the non-essential bias) only add a small correction, which is negligible when n is large.
The expressions (80) and (81) can be simplified. Note that the vector n −1 Z i coincides with the last column of the matrixM, hence In fact, this term will play the key role in the subsequent analysis.

Comparison of various circle fits
Bias of the Pratt and Taubin fits. We have seen that all the algebraic fits have the same main characteristic -the variance (75), to the leading order.
We will see below that their variance coincides with that of the geometric circle fit. Thus the difference between all our circle fits should be traced to the higher order terms, especially to their essential biases. First we compare the Pratt and Taubin fits. For the Pratt fit, the constraint matrix is N =Ñ = P, hence its essential bias (80) becomes In other words, the Pratt constraint N = P cancels the second (middle) term in (80); it leaves the first term intact. For the Taubin fit, the constraint matrix is N = T and its 'true' value is N =T = 1 n T i ; also note thatT iÃ = 2ÃZ i + PÃ for every i. Hence the Taubin's bias is Thus, the Taubin constraint N = T cancels the second term in (80) and a half of the first term; it leaves only a half of the first term in place. As a result, the Taubin fit's essential bias is twice as small as that of the Pratt fit. Given that their main terms (variances) are equal, we see that the Taubin fit is statistically more accurate than that of Pratt. We believe our analysis answers the question posed by Taubin [32] who intended to compare his fit to Pratt's.
'Hyperaccurate' algebraic fit. Our error analysis leads to another stunning discovery -an algebraic fit that has no essential bias at all. To our knowledge, this is the first such algorithm for curve fitting problems.
Let us set the constraint matrix to Then one can easily see that HÃ = 4Ã 1 n Z i + PÃ, as well asÃ T HÃ = A T PÃ, hence all the terms in (80) cancel out! The resulting essential bias vanishes: We call this fit hyperaccurate, or 'Hyper' for short. The term hyperaccuracy was introduced by Kanatani [21,22] who was first to employ Taylor expansion up to the terms of order σ 4 for the purpose of comparing various algebraic fits and designing better fits. We note that the Hyper fit is invariant under translations and rotations because its constraint matrix H is a linear combination of two others, T and P, that satisfy the invariance requirements; see a proof in [14].
As any other algebraic circle fit, the Hyper fit minimizes the function The Hyper fit can be computed by a numerically stable procedure involving singular value decomposition (SVD). First, we compute the (short) SVD, Z = UΣV T , of the matrix Z. If its smallest singular value, σ 4 , is less than a predefined tolerance ε (we suggest ε = 10 −12 ), then A is the corresponding right singular vector, i.e. the fourth column of the V matrix. In the regular case (σ 4 ≥ ε), one forms Y = VΣV T and finds the eigenpairs of the symmetric matrix YH −1 Y. Selecting the eigenpair (η, A * ) with the smallest positive eigenvalue and computing A = Y −1 A * completes the solution. The prior translation of the coordinate system to the centroid of the data set (which ensures thatx =ȳ = 0) makes the computation of H −1 particularly simple. The corresponding MATLAB code is available from our web page [14].
Transition between parameter schemes. Our next goal is to express the covariance and the essential bias of the algebraic circle fits in terms of the natural parameters Θ = (a, b, R) T . Taking partial derivatives in (12) gives a 3 × 4 'Jacobian' matrix Thus we have whereJ denotes the matrix J at the true parameters (Ã,B,C,D). The remainder term O P (σ 2 /n) comes from the second order partial derivatives, for example where ∇ 2 a is the Hessian matrix of the second order partial derivatives of a with respect to (A, B, C, D). The last term in (89) can be actually discarded, as it is of order We collect all such terms in the remainder term O P (σ 2 /n) in (88).
Next we need a useful fact. Suppose a point (x 0 , y 0 ) lies on the true circle (ã,b,R), i.e.
In accordance with our early notation we denote z 0 = x 2 0 + y 2 0 and Z 0 = (z 0 , x 0 , y 0 , 1) T . We also put u 0 = (x 0 −ã)/R and v 0 = (y 0 −b)/R, and consider the vector W 0 = (u 0 , v 0 , 1) T . The following formula will be useful: where the matrix (W T W) −1 appears in (57) and the matrixM − appears in (75). The identity (91) is easy to verify directly for the unit circleã =b = 0 andR = 1, and then one can check that it remains valid under translations and similarities. Equation (91) implies that for every true point ( Variance and bias of algebraic circle fits in the natural parameters. Now we can compute the variance (to the leading order) of the algebraic fits in the natural geometric parameters. Notice that the third relation in (12) impliesÃ T PÃ =B 2 +C 2 − 4ÃD = 4Ã 2R2 . Thus using (93) gives Thus the variance of all the algebraic circle fits (to the leading order) coincides with that of the geometric circle fit, cf. (51). Therefore the difference between all the circle fits should be then characterized in terms of their biases, which we do next.
The essential bias of the Pratt fit is, due to (83), Observe that the estimates of the circle center are essentially unbiased, and the essential bias of the radius estimate is 2σ 2 /R, which is independent of the number and location of the true points. We know that the essential bias of the Taubin fit is twice as small, hence Comparing to (59) shows that the geometric fit has an essential bias that is twice as small as that of Taubin and four times smaller than that of Pratt. Therefore, the geometric fit has the smallest bias among all the popular circle fits, i.e. it is statistically most accurate. The formulas for the bias of the Kåsa fit can be derived, too, but in general they are complicated. However recall that all our fits, including Kåsa, are independent of the choice of the coordinate system, hence we can choose it so that the true circle has center at (0, 0) and radiusR = 1. For this circlẽ A = 1 θ Figure 1: The arc containing the true points.
Next, assume for simplicity that the true points are equally spaced on an arc of size θ (a typical arrangement in many studies  Table 2: Mean square error (and its components) for four circle fits (10 4 ×values are shown). In this test n = 100 points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ = 0.05.
coordinate system so that the east pole (1, 0) is at the center of that arc (see Figure 1) ensuresȳ = xy = 0. It is not hard to see now that Using the formula (88) we obtain (omitting details as they are not so relevant) the essential bias of the Kåsa fit in the natural parameters (a, b, R): The first term here is the same as in (95) (recall thatR = 1), but it is the second term above that causes serious trouble: it grows to infinity because xx −x 2 → 0 as θ → 0. This explains why the Kåsa fit develops a heavy bias toward smaller circles when data points are sampled from a small arc.

Experimental tests and conclusions
To illustrate our analysis of various circle fits we have run a few computer experiments where we set n true points equally spaced along a semicircle of radius R = 1. Then we generated random samples by adding a Gaussian noise at level σ = 0.05 to each true point, and after that applied various circle fits to estimate the parameters (a, b, R).  Table 3: Mean square error (and its components) for four circle fits (10 6 ×values are shown). In this test n = 10000 points are placed (equally spaced) along a semicircle of radius R = 1 and the noise level is σ = 0.05.
fit (obtained by averaging over 10 7 randomly generated samples). The table also gives the breakdown of the MSE into three components. The first two are the variance (to the leading order) and the square of the essential bias, both computed according to our theoretical formulas. These two components do not account for the entire mean square error, due to higher order terms which our analysis discarded. The remaining part of the MSE is shown in the last column, which is relatively small. (We note that only the total MSE can be observed in practice; all the other columns of this table are the results of our theoretical analysis.) We see that all the circle fits have the same (leading) variance, which accounts for the 'bulk' of the MSE. Their essential bias is different, it is highest for the Pratt fit and smallest (zero) for the Hyper fit. Algorithms with smaller essential biases perform overall better, i.e. have smaller mean square error. The Hyper fit is the best in our experiment; it outperforms the (usually unbeatable) geometric fit.
To highlight the superiority of the Hyper fit, we repeated our experiment increasing the sample up to n = 10000, see Table 3 and Figure 2. We see that when the number of points is high, the the Hyper fit becomes several times more accurate than the geometric fit. Thus, our analysis disproves the popular belief in the statistical community that there is nothing better than minimizing the orthogonal distances.
Needless to say, the geometric fit involves iterative approximations, which are computationally intensive and subject to occasional divergence, while our Hyper fit is a fast non-iterative procedure, which is 100% reliable.  Summary. All the known circle fits (geometric and algebraic) have the same variance, to the leading order. The relative difference between them can be traced to higher order terms in the expansion for the mean square error. The second leading term in that expansion is the essential bias, for which we have derived explicit expressions. Circle fits with smaller essential bias perform better overall. This explains a poor performance of the Kåsa fit, a moderate performance of the Pratt fit, and a good performance of the Taubin and geometric fits (in this order). We showed that while there is a natural lower bound on the variance to the leading order (the KCR bound), there is no lower bound on the essential bias. In fact there exists an algebraic fit with zero essential bias (the Hyper fit), which outperforms the geometric fit in accuracy. We plan to perform a similar analysis for ellipse fitting algorithms in the near future.