Performance Measures for Geometric Fitting in the NIST Algorithm Testing and Evaluation Program for Coordinate Measurement Systems

The Algorithm Testing and Evaluation Program for Coordinate Measurement Systems (ATEP-CMS) is a Special Test Service offered under the NIST Calibration Program. ATEP-CMS evaluates the performance of geometric fitting software used in coordinate measurement systems. It is a Special Test because it is a new type of NIST service, experimental in nature and unsupported by historical data. This report documents and explains the rationale of the performance measures used in ATEP-CMS and analyzes the uncertainties of those measures.


Introduction
This report is concerned with the performance of software used in coordinate measurement systems (CMSs) to evaluated the geometric characteristics of manufactured parts. The particular performance characteristics of interest are those that impact the uncertainty of measurement results produced by CMSs.
Inspection planners need quantitative measures of performance to develop uncertainty budgets and to evaluate the quality of measurement results. In support of this need, NIST has recently established a Special Test Service, the Algorithm Testing and Evaluation Program for Coordinate Measurement Systems (ATEP-CMS), to measure the performance of geometric fitting software used in CMSs [1]. This report documents and explains the performance measures used in ATEP-CMS.
Geometric fitting is the process of computing the representation parameters of a geometric element that in some sense best represents a set of point coordinate data. This representative geometric element is called the substitute geometry for the data points. A manufactured hole, for instance, is usually not perfectly cylindrical because the process that produced it can never be totally perfect. Inspection of the hole might involve measuring the coordinates of selected points on the surface of the hole, fitting a cylinder to the measured points to minimize the sum of squares of the orthogonal distances from the points to the cylinder, and comparing the position, orientation, and size of the fitted cylinder to the dimensions and tolerances of the part specification.
Many factors affect the accuracy of inspection procedures. 1 This report focuses on one of these factors: how close the computed fit is to the intended, mathematically defined, substitute geometry. The performance 1 ATEP-CMS-part of a larger field of endeavor called computational metrology [2]-addresses the fitting software as a source of uncertainty due to possible computational errors. It does not address the propagation of point coordinate uncertainty through the geometric fitting computations, how well the points represent the surface, or whether the intended software function is the most appropriate for the task. measures used in ATEP-CMS quantify how well the software computes substitute geometries over a range of inspection problems.
The next section provides background information on the need for algorithm testing, how ATEP-CMS works, and criteria for performance measures. Section 3 presents details of the test methods used in ATEP-CMS for various types of geometric elements. Section 4 discusses a statistical interpretation of algorithm performance and explains the method used in ATEP-CMS to summarize and interpret the test results. Section 5 presents an uncertainty analysis of ATEP-CMS results.

Background
The drive to develop methods for testing CMS algorithms began in Germany in the early 1980s, when it became apparent that the software supplied with commercial coordinate measurement machines (CMMs) varied widely in quality [3,4]. In the United States, the American Society of Mechanical Engineers established a task force on CMM software that is now designated B89.4.10. Work intensified in 1988 following widespread notification within the defense industries of serious problems with certain commercial CMM software [5]. A U.S. standard on performance evaluation of CMS software is now in draft form and is expected to be issued for public review in early 1996 [6]. NIST's ATEP-CMS service [1] supports this emerging standard.

What is Evaluated by ATEP
When discussing software, the term "testing" is used in many different ways, and it is important to understand the sense in which it is used within ATEP-CMS. It is specifically not used in the sense of software engineering, where software testing is aimed at supporting debugging or maintenance and typically involves code structure analysis, walk-throughs, and similar activities. Within ATEP-CMS, testing is strictly limited to blackbox testing of how the actual behavior of the software compares with its intended behavior. Moreover, ATEP-CMS restricts itself to measuring the accuracy of reported results. Other performance characteristics-such as memory requirements, computing time, ease of use, and other factors-are not considered. In sum, ATEP-CMS ignores the fact that the software under test is software; the ATEP-CMS process would be unchanged if the fitting methods were implemented with an analog computer, as a digital filter circuit, or even mechanically [7]. ATEP-CMS does not address whether the intended behavior of the software is appropriate for a particular measurement task. That issue is not one of performance, but of whether a particular data analysis method is the right tool for the job. Such considerations are beyond the scope of a performance testing program.
ATEP-CMS currently supports testing of orthogonal distance regression fitting 2 for seven geometry types: line, circle, plane, sphere, cylinder, cone, and torus. (Lines and circles are fit to three-dimensional point coordinates which need not be coplanar.) The reader is referred to Ref. [9] for information on how to use ATEM-CMS and the procedures used by NIST for conducting a test. Section 3 below describes the specific difference parameters used in ATEP-CMS for each geometry type.

Criteria for Performance Measures
In developing performance measures, the question naturally arises as to how one might choose one measure over another. One approach is to seek a reliable method of certifying software as "good" or "bad." (This is the approach taken, for instance, by the German testing program [3].) The approach taken in ATEP-CMS is somewhat different. We start with the observation that software is not mathematically perfect. (If nothing else, the end results must be represented in finite precision.) So rather than developing a pass/fail criterion, ATEP-CMS quantifies the expected performance numerically. The goal is to develop numerical measures of performance that provide to a user of the software, information necessary to decide whether the software is adequate for a particular application. To this end, we have used the following criteria in developing ATEP-CMS performance measures: • each measure should be directly related to inspection tasks; • each measure should combine like a standard deviation with other sources of uncertainty in a CMS; 2 Orthogonal distance regression is commonly called "least squares" by manufacturing practitioners. Statisticians consider these two terms to be quite different. Lease squares measures deviations from the surface along a particular direction in a coordinate system. Orthogonal distance regression (which is also called "errors in variables" regression or, for linear fits, "total least squares") measures deviations perpendicular to the surface. Unfortunately, some of the literature (e.g., Ref. [8]) further confuses the terminology by using the term "orthogonal distance regression" for one specific algorithm that solves the orthogonal distance regression problem.
• each measure should have reasonable probability and coverage interpretations; • an estimate of uncertainty should be derivable for each measure.
The first three criteria directly support the goal of using the performance measures to establish uncertainty budgets for CMSs. The last criterion reflects the view that an assessment of software uncertainty is itself a measurement. By NIST [10] and international [11] guidelines, a quantitative statement of uncertainty should be part of any complete measurement result. The last criterion says that the ATEP-CMS performance measures must be amenable to such a treatment.

ATEP-CMS Test Methods
To start a software test, NIST generates a collection of data sets for each geometry type, consisting of threedimensional point coordinates together with some bookkeeping information. NIST also generates a fit for each data set, called the reference fit for that data set. Within ATEP-CMS, the reference fits are generated by fitting the data sets using NIST-developed fitting algorithms. 3 (It is also possible to start with the desired fits and generate data sets having those fits as answers [13].) The data sets are then sent to the software under test, which generates a test fit for each data set. The test fits, represented by a set of parameter values in a standard format, are returned to NIST for evaluation with respect to the corresponding reference fits.
This section describes how differences between a test fit and the corresponding reference fit are evaluated in ATEP-CMS. Differences between each pair of fits are represented by a set of difference parameters. Each geometry type gives rise to its own set of difference parameters. The relationship between the difference parameters and tolerance applications is also discussed.

General Procedure for a Single Fit
Methods for comparing one fit to a reference are described in the draft B89.4.10 standard [6]. The procedure for all geometry types follows a common pattern.
First, the test fit is bounded by projecting the data points onto the geometry. (The way this is done depends on the geometry type, and will be described below. Also, bounding does not apply to circles, spheres, or tori, which are naturally bounded.) This is done because tolerancing standards (e.g., Ref. [14]) specify that tolerances are to be evaluated over the full extent of the associated geometric features. ATEP-CMS assumes that the data points represent the features.
Once the test fit is bounded, geometric differences between the test fit and the reference fit are determined. Again, the specifics depend on the geometry type. In general, however, the differences are designed either to directly reflect a tolerancing application or to provide diagnostic information about the software under test. The difference parameters depend only on the geometry represented by the fit parameters, and not on the parameterization of that geometry.
Finally we note that in many cases the difference parameters are not symmetric. This is a result of the asymmetric treatment of the test and reference fit (the test fit is bounded; the reference fit is not). Thus, reversing the roles of test and reference fit will generally change the difference parameter values.

Application to Specific Geometry Types
The remainder of this section discusses the application of the general procedure to the seven geometry types currently supported by ATEP-CMS: lines, circles, planes, spheres, cylinders, cones, and tori. For each geometry type, we describe the fitting objective, typical uses for such fits in inspection, and the difference parameters computed within ATEP-CMS.

Lines
Line fitting minimizes the sum of squares of orthogonal distances from the points to the fit line. Line fitting is commonly used to check straightness and to establish an axis from the centers of circular cross sections. In the latter case, the axis may be checked for location or orientation or used as a coordinate datum axis.
Within ATEP-CMS, two parameters representing differences between the lines are computed. (Henceforth, we will call such parameters, difference parameters. ) One difference parameter is the angle between the test and reference lines. This directly relates to the use of the fit as a datum axis. To define the other parameter, the test fit line is first trimmed to a segment by the orthogonal projection of the data points. Then the second parameter is the maximum orthogonal distance from the segment endpoints to the reference line. This is a measure of separation between the lines and directly relates to the assessment of straightness, position, and orientation. Both parameters are inherently nonnegative quantities. (Henceforth, quantities that are inherently nonnegative will be referred to as magnitudes. )

Circles
Circle fitting is a two-step process. First, the plane of a circle is found by fitting a plane to the data (as described below). Then, the points are projected into the plane and an orthogonal distance regression circle is fit to the projected points. Note that this process generally results in a different circle than would be found by using the three-dimensional distances from the points to the circle.
Circle fitting is commonly used to check roundness, runout, cross-sectional size, the straightness of the median line of a hole or shaft, and sometimes position. It is also sometimes used as one step in finding an axis. The purpose of the two-step fitting process is that, in practice, the deviations of the points from a plane usually result from measurement error only, while the deviations from a circle in the plane also include the effects of form deviation of the part.
Within ATEP-CMS, three difference parameters are computed. One is the angle between the planes of the test and reference circles. This is primarily a diagnostic measure 4 that can explain other differences between the fits. The second parameter is the Euclidean distance between the circle centers, measured in three dimensions. This relates directly to assessment of roundness, position, and establishing an axis. The third parameter is the difference between the test and reference circle radii. This relates directly to the assessment of size. The first two parameters are magnitudes, while the last parameter is a signed quantity, and is negative whenever the test fit radius is smaller than the reference fit radius.

Planes
Plane fitting minimizes the sum of squares of orthogonal distances from the points to the fit plane. Plane fitting is commonly used to establish a coordinate datum plane and to check flatness, position, and orientation.
Within ATEP-CMS, two difference parameters are computed. One parameter is the angle between the test and reference planes. This directly relates to the use of the fit as a datum plane. To define the second parameter, the data points are orthogonally projected onto the test fit plane. The convex hull of the projected points in the plane forms a planar patch. The second parameter is the maximum separation of the planar patch measured orthogonally from the reference plane. This measure of separation directly relates to the assessment of flatness, position, and orientation. Both parameters are magnitudes.

Spheres
Sphere fitting minimizes the sum of squares of orthogonal distances from the points to the fit sphere. Sphere fitting is commonly used to check form, size, and location, and occasionally to locate the origin of a coordinate system.
Within ATEP-CMS, two difference parameters are computed. The first parameter is the Euclidean distance between the sphere centers. This relates directly to assessment of form, position, and establishing an origin. The second parameter is the difference between the test and reference sphere radii. This relates directly to the assessment of size. The first parameter is a magnitude, while the second is a signed quantity, and is negative whenever the test fit radius is smaller than the reference fit radius.

Cylinders
Cylinder fitting minimizes the sum of squares of orthogonal distances from the points to the fit cylinder. Cylinder fitting is commonly used to check cylindricity and size, and to establish an axis. In the latter case, the axis is used to check position and orientation and to establish a coordinate datum.
Within ATEP-CMS, three difference parameters are computed. One parameter is the angle between the test and reference cylinder axes. This directly relates to the use of the fit as a datum axis. To define the second parameter, the test fit axis is first trimmed to a segment by the perpendicular projection of the data points. Then the second parameter is the maximum perpendicular distance from the segment endpoints to the reference cylinder axis. This is a measure of separation between the axes and directly relates to the assessment of cylindricity, position, and orientation. The third parameter is the difference between the test and reference cylinder radii. This relates directly to the assessment of size. The first two parameters are magnitudes, while the third is a signed quantity, and is negative whenever the test fit radius is smaller than the reference fit radius.

Cones
Cone fitting minimizes the sum of squares of orthogonal distances from the points to the cone surface. Cone fitting is commonly used to check profile and conical taper.
Within ATEP-CMS, four difference parameters are computed. One is the angle between the test and reference cone axes. This directly relates to the use of the fit for checking profile. To define the next parameter, the test fit axis is first trimmed to a segment by first projecting the data points perpendicularly onto the cone surface and then orthogonally projecting the projections onto the cone axis. Then the second parameter is the maximum perpendicular distance from the segment endpoints to the reference cone axis. This is a measure of separation between the axes and relates to the assessment of profile. To define the third parameter, the reference cone axis is bounded using the same process as was used for the test fit. For each fit, the perpendicular distance from the midpoint of the axis segment to the corresponding cone surface is a measure of location of the cone along its axis. The third parameter is the difference between the location of the test fit cone and the location of the reference fit cone along their respective axes. This is primarily diagnostic, and relates to the assessment of profile. The last (fourth) parameter is the difference between the test and reference cone half-angles. This relates directly to the assessment of conical taper. The first two parameters are magnitudes, while the last two are signed quantities, and are negative whenever the test fit cone parameter is smaller than that of the reference fit.

Tori
Torus fitting minimizes the sum of squares of perpendicular distances from the points to the torus surface. Torus fitting is commonly used to check major and minor radii and profile. Occasionally torus fitting is used to establish a datum axis or plane.
Within ATEP-CMS, four difference parameters are computed. One is the angle between the test and reference torus planes. This directly relates to the use of the fit for checking profile and as a datum plane or axis. The second is the Euclidean distance between the torus centers. This also relates to the assessment of profile. The third and fourth parameters are the difference between the test and reference fit major and minor radii. These relate directly to the assessment of major and minor radius tolerances. The first two parameters are magnitudes, while the last two are signed quantities, and are negative whenever the test fit torus parameter is smaller than that of the reference fit.

Data Interpretation
The previous section described how each data set generates a set of difference parameters representing the differences between the test fit to the data set and the corresponding reference fit. In this section we discuss how these difference parameters are summarized into performance measures. For each difference parameter, we will define an associated performance measure summarizing the difference parameter values for all the data sets.
We consider the set of difference parameter values associated with each data set to be one (random) sample of software performance from a theoretical underlying population of inspection problems. We will use a statistical approach to the interpretation of the observed sample values. In Sec. 4.1 we derive the population characteristic we wish to use as a performance measure. In Sec. 4.2 we derive a practical estimator for that measure.

Statistical Model of Algorithm Behavior
We deal with each geometry type separately, and consider each performance measure (and associated difference parameters) independently. For simplicity, consider a single performance parameter for a given geometry type, as described in Sec. 3. Each data set represents an inspection problem drawn at random from a theoretical underlying population of inspection problems. We call the substitute geometry defined by the mathematical objective function the true fit to the data set. We assume the true fit cannot be computed exactly. For the i th data set, we now define three difference parameter values: t i -the difference between the test fit and the true fit to the data set; t i is unknown and unobservable.
r i -the difference between the reference fit and the true fit to the data set; r i is unobservable, but it can be bounded using numerical analysis theory (see Sec. 5).
p i -the difference between the test fit and the reference fit to the data set; p i is the calculated difference parameter for the i th data set.
These quantities are random variables since they are functions of the data set, which is a random sample. As random variables, we can define various expectations.
where E ( ) denotes the average over all the data setsthat is, the expected value taken over the theoretical population of inspection problems. We define analogous quantities for r i : R , R , and ␥ R ; and for p i : P , P , and ␥ P . If the t i are normally distributed, then one can make statements of coverage like This approach has two shortcomings. First, two parameters, T and T , are needed to summarize the test; for simplicity, we would like a single measure of performance. Second, it is difficult to estimate T and T , because we do not know the true fit.
If T is less than half of T , then the rmse provides similar coverage. 5 Pr { | t i | Յ 2␥ T : T < T /2} ≈ 0.95.
If T > T /2, the actual coverage is greater (the software performance is better) than that suggested by a measure based on ␥ T .
The above coverage interpretations are valid for normal distributions. We believe that for the typical distributions that will arise, the coverage error will be on the conservative side since the tails are likely to be smaller than those of a normal distribution. 6 ATEP-CMS uses ␥ T as the theoretical performance measure of the software under test. Using the rmse overcomes one shortcoming of Eq. (1)-we have reduced the performance indicator to a single number. We have also replaced the problem of estimating T and T with the problem of estimating ␥ T . We consider this next.

Definition of Practical Performance Measures
In this section, we will describe two methods of estimating ␥ T . The first follows directly from the definition of ␥ T . The second is a simplified estimate. ATEP-CMS uses the second method, for reasons discussed at the end of this section.
We observe from the definitions that t i = p i + r i for signed parameters, and t i Յ p i + r i for magnitudes. Then a simple substitution yields Thus, we can estimate an upper bound for ␥ T if we can estimate ␥ R and ␥ P . The upper bound can then be used as an estimate for ␥ T .
To estimate ␥ R , we need an estimate of the difference of each reference fit from the true fit. Methods for doing this will be discussed in detail in Sec. 5. For now, assume that the value of r i can be estimated as u i . Then an estimate 7 ␥ R of ␥ R is the positive square root of 5 See, e.g., Ref. [15,Chap. 2]. 6 This assumption will be tested using data collected during future operation of ATEP-CMS. 7 Throughout this paper, a carat appearing over a quantity denotes an estimated value.
where the sum is over all n data sets. (In this paper, all summations are from i = 1 to n unless otherwise indicated. Henceforth, we will use the summation sign alone with the index variable and limits understood.) Similarly, we estimate ␥ P by the positive square root of This first estimation method includes a term in the performance measure that represents the estimated difference between the reference and true fits.
A second method of estimating ␥ T is simply ␥ T = ␥ P , estimating ␥ P as before. This method uses the reference fit as an estimate of the true fit. Unlike the first approach, the differences between the reference and true fits are not incorporated in the performance measure of the software under test. However, they will contribute to the uncertainty of the performance measures (discussed in Sec. 5).
An argument in support of the first method of estimation is that the second, simpler method may underestimate the true quantity. If the performance measures are to be used to establish uncertainty budgets for a CMS, use of the second method makes it critical to analyze the sensitivity of the budget to the uncertainties of the ATEP-CMS results. The first method may have smaller uncertainties.
On the other hand, there are three reasons why the simpler estimate is preferable. First, the software under test should not be penalized simply because the reference software has uncertain performance. Second, the first method may overemphasize the role of the reference software, and may be unduly pessimistic (since it is an upper bound). Finally, a sensitivity study is a proper precaution, and should be recommended anyway.
For these reasons, ATEP-CMS uses the second method-the observed sample rmse-to estimate the performance of the software under test. That is, for each performance measure in ATEP-CMS, ␥ T = ␥ P = ͱ 1 n p i 2 .

Uncertainty Analysis
This section addresses the uncertainty of the ATEP-CMS estimates of software performance. Recall that we have defined different performance measures for each geometry type, and that each performance measure is estimated from the corresponding set of difference parameters for the fits. We will show in this section that the uncertainty of each estimate arises mainly from two sources: (1) the uncertainty of the individual difference parameters due to the uncertainty of the reference fit and (2) the uncertainty arising from having sampled the population of inspection problems.
In Sec. 5.1 we will first examine the uncertainties of the reference fits themselves. In Sec. 5.2 we then examine how these uncertainties propagate to the difference parameters. Finally, in Sec. 5.3 we discuss how the difference parameter uncertainties combine with the sampling uncertainty to yield an overall uncertainty for each performance measure.

Uncertainty of Reference Fits
The uncertainties of the difference parameters for a single test fit arise primarily from the uncertainty due to the presence of (unknown) numerical errors in the reference fit. 8 That is, the reference fit may not correspond exactly to the true fit. A detailed discussion of the NIST fitting algorithms and their uncertainties will be presented in a subsequent report. Here, we provide a summary of the pertinent results.
Our approach is to develop bounds for the numerical errors, assume the errors are uniformly distributed between the bounds, and combine the standard deviations of the errors into the uncertainty of the reference fits. The results presented in this section and the next are specific to the particular algorithms currently used in ATEP-CMS. It is possible that the algorithms may change in the future [16], in which case these results would no longer hold. (The performance measures would not change, however.)

Centroid Coordinates
For all fits, the data are first translated to be centered at the origin. One source of uncertainty is the possible rounding error in computing the centroid. The ATS algorithms use the Kahan summation algorithm [17] with a C-language long double accumulator (80 bit IEEE extended real format). The error in the coordinates of the computed centroid can be bounded in terms of the unit roundoff 80 for the accumulator (about 1.084ϫ10 -19 ). For instance, if x c is the computed x coordinate of the centroid (the average of the coordinates x i , i = 1, . . . , m) and x 0 is the true coordinate, then the following result holds: where | ␦ i | Յ 2 80 ≈ 2.17ϫ10 -19 and in the second line we have ignored the term in O (10 -38 ). Similar results hold for the y and z coordinates.
To conform to NIST policy on uncertainty statements, this must be converted to an uncertainty by assuming some distribution on the possible values of ␦ i . We will assume that the ␦ i are independent and uniformly distributed within the bounds [18]. Then the i th error term x i ␦ i is a random variable with mean of zero and standard deviation 2| x i | 80 /͙3. So, by the Central Limit Theorem [19], the x coordinate of the computed centroid of m data points is approximately normally distributed with mean x 0 and standard deviation of Similar results hold for the y and z coordinates. For simplicity, the uncertainty of the centroid is modeled in ATEP-CMS by an isotropic, trivariate normal distribution using a standard deviation of

Linear Fits (Lines and Planes)
For lines and planes, orthogonal distance regression can be formulated as a linear algebra problem. Within ATEP-CMS, the reference fits are computed using the singular value decomposition of the data matrix after translating the centroid to the origin. The properties of the singular value decomposition are well understood, and the magnitude of the error in the direction vector can be bounded using standard techniques [20] as follows.
We consider first the case of line fits. The direction of the line is the right eigenvector corresponding to the largest singular value of the data matrix. The singular value decomposition is computed using C-language double (64 bit IEEE real format) arithmetic, for which the unit roundoff is 64 ≈ 2.22ϫ10 -16 . Call the data matrix D and denote the exact solution by Q * . It can be shown that the computed solution, Q , is (within 64 ) the exact solution to a matrix D + E where ʈE ʈ 2 р 64 ʈD ʈ 2 . (Here, ʈ и ʈ 2 denotes the vector or matrix 2-norm.) From this, it can be shown that [20], as long as the largest singular value is well separated 9 from the next largest singular value in comparison to ʈE ʈ 2 , the error in the computed solution is bounded by where 1 is the largest singular value and 2 is the next-largest singular value. (This bound is not the tightest possible. For instance, as 2 → 1 , the bound can far exceed the theoretical worst-case difference between Q and Q * of 2. In such situations, however, the singular values are not well separated. Within ATEP-CMS, the error bound is never allowed to exceed 2.) Similar results can be obtained for plane fits. In this case, however, the normal to the plane is the right eigenvector corresponding to the smallest singular value, 3 . The corresponding error bound is ʈQ -Q *ʈ 2 Յ 4 64 3 2 | 2 2 -3 2 | (for planes).
We have here developed overall bounds for the errors in the eigenvectors. To convert these to standard uncertainties, we assume the errors are uniformly distributed within the bounds. The resulting standard deviation, denoted u D , is given by These results are quite conservative. In general any one eigenvector will be more sensitive in one direction than another to perturbations in the data. This behavior could, in principle, be used to develop tighter bounds on the numerical errors propagated through the difference computations. Such an analysis will be the subject of future work.

Nonlinear fits
For geometries other than line and plane (circle, sphere, cylinder, cone, and torus), orthogonal distance regression is a nonlinear leastsquares optimization problem. (Circle fitting uses plane fitting to reduce the problem to two dimensions.) The algorithm currently used in ATEP-CMS to compute the reference fits is a modified Levenberg-Marquardt algorithm proposed by Nash [21].
The ATEP-CMS algorithm starts with an initial guess and iteratively solves the normal equations for a linear approximation to the nonlinear objective function, where the solution is constrained to lie within a "trust region" of the current best guess. The iterations terminate when the computed solution to the normal equations changes the solution by less than a convergence factor set by the tester. (Typically, the convergence factor used in ATEP-CMS is 10 Ϫ15 .) The uncertainty of the reference fit is a combination of two factors. First, the termination logic introduces an uncertainty that is bounded by the convergence factor setting. Second, the solution to the normal equations at the terminating iteration is subject to numerical roundoff effects.
The convergence factor introduces an uncertainty in each element of the computed solution. If the calculated solution is a k -dimensional vector F , if F * is the exact solution, and if C is the convergence factor, then the resulting error due to C alone is bounded by We next deal with the numerical inaccuracies. The fits are calculated using the Choleski decomposition of the normal equations in C-language double arithmetic, so the error in the solution due to numerical roundoff effects alone can be bounded by where (A ) is the condition number of the normal equation matrix A used for the last iteration of the Levenberg-Marquardt routine, and 64 ≈ 2.22ϫ10 -16 as before [20].
To obtain an overall uncertainty, we must convert these bounds into standard deviations. We assume that the numerical errors and the convergence factor errors are independent and uniformly distributed between the bounds. Denote by u F the standard uncertainty for nonlinear fits. Then As with linear fits, the bounds developed here bracket the overall behavior of the fit parameter vector. But as with the singular value decomposition, the solution to the nonlinear problem will in general exhibit greater sensitivity and correlations for some parameters than for others. Thus, it may be possible to obtain tighter uncertainty bounds than those developed here. One problem is that, for geometries other than the sphere, the parameterization used for fitting must somehow represent the orientation of the element. Unfortunately, there is no parameterization of orientations that is free of singularities, and there is always the possibility that the Jacobian of the residuals is rank deficient. This makes traditional approaches using, for instance, the condition number of the asymptotic covariance matrix at the computed solution problematic [22]. As with the singular value decomposition, future work will address tightening the bounds on the fit uncertainties.

Propagation to Difference Parameters
This section discusses how the uncertainties of the computed reference fits propagate to the difference parameters for individual fits. As with Sec. 5.1, the results of this section are specific to the particular algorithms currently being used within ATEP-CMS.
The previous section identified several sources of standard uncertainty for the reference fits. We have designated these as follows: u 0 -the standard deviation of the centroid coordinates u D -the standard deviation of the error in direction for line and plane fits u F -the standard deviation of the error norm of the representation parameter vector for nonlinear fits.
We will separately propagate the standard uncertainties represented by each of these standard deviations through the comparison algorithms and combine the resulting standard uncertainties.
The calculations are fairly straightforward, but tedious. Therefore, we will show details only for the case of the uncertainties of line fits. For other geometry types, we will just present the final results, since the method is the same. 5.2.1 Line Line differences are defined by two characteristics: the angle between the test and reference lines, and the distance of an endpoint of the test line segment to the reference line. For the angle, we proceed as follows. As in Sec. 5.1.2, let Q be the direction of the computed reference line and Q * the direction of the true reference line. Also, let Q T be the direction vector of the test fit, ␣ be the angle between Q and Q T , and ␣* be the angle between Q T and Q *. Since all the direction vectors are unit vectors, the sine of the angle between any two of the vectors is the magnitude of their vector crossproduct. We then have: Յ sin (␣ *) + ʈ Q -Q *ʈ 2 .
If we start with sin(␣ *) = ʈQ T ϫQ *ʈ 2 and follow a similar sequence, we find that sin(␣ *) Յ sin(␣ ) + ʈQ-Q*ʈ 2 . Assuming that ␣ and ␣ * are small (i.e., that the test and reference results are close), sin(␣ ) ≈ ␣ and sin(␣ *) ≈ ␣ *. Thus, |␣ -␣ *| Յ ʈQ -Q *ʈ 2 . Using the results of Sec. 5.1.2, ʈQ -Q *ʈ 2 is the random variable corresponding to a standard deviation of u D . We then have We now consider the distance from any point b to the reference line. Call this distance d . The reference line is located by the centroid of the data, q 0 , so d = ʈ(q-q 0 )ϫQʈ 2 . Two components of the fit uncertainty affect the uncertainty of d : the uncertainty of Q and the uncertainty of q 0 . We treat these as independent sources of uncertainty and combine them in quadrature. With regard to Q , we proceed as we did with ␣ and find that the uncertainty in d due to the uncertainty in Q is bounded by ʈq -q 0 ʈ 2 ʈQ -Q *ʈ 2 .
The standard uncertainty in d due to the standard uncertainty in q 0 is u 0 . Thus, the standard uncertainty of the separation parameter for lines is where q is now the endpoint on the test line segment furthest from the reference line.

Circle
The standard uncertainties in the difference parameters for circles are as follows: • distance between centers: u c = ͙u 0 2 + u F 2 • angle between planes (due to the plane fits): u ␣ = u D • difference in radii: u r = u F .

Plane
The standard uncertainties in the difference parameters for planes are as follows: • angle between planes: u ␣ = u D • plane separation: u d = ͙u 0 2 + ʈ q -q r ʈ 2 2 u F 2 , where q is the point on the test plane furthest from the reference plane and q 0 is the centroid of the data.

Sphere
The standard uncertainties in the difference parameters for spheres are as follows: • distance between centers: u c = ͙u 0 2 + u F 2 • difference in radii: u r = u F .

Cylinder
The standard uncertainties in the difference parameters for cylinders are as follows: • angle between axes: u ␣ = u F • axis separation: u d = ͙u 0 2 + ʈ q -q r ʈ 2 2 u F 2 where q is the endpoint of the test fit axis furthest from the reference fit axis and q r is the point used to locate the reference fit axis • difference in radii: u r =u F

Cone
The standard uncertainties in the difference parameters for cones are as follows: • angle between axes: u ␣ = u F • axis separation: where q is the endpoint of the test fit axis furthest from the reference fit axis and q r is the point used to locate the reference fit axis • difference in axial location:

Torus
The standard uncertainties in the difference parameters for tori are as follows: • angle between planes: u ␣ = u F • center separation: u c = ͙u 0 2 + u F 2 • difference in major radii: u M = u F • difference in minor radii: u m = u F

Uncertainty of the Estimated Performance Parameters
The results of Sec. 5.2 provide us with a means of estimating the uncertainties, expressed as standard deviations, of individual difference parameters p i for individual data sets. Using this, we can now estimate the uncertainty of the software performance measures for each geometry type.
Recall (from Sec. 4.2) that for a particular performance characteristic, we are estimating the performance measure by the observed root-mean-square value of the difference parameters: Furthermore, we have an estimate, denoted by u i , for the standard deviation of each p i attributable to the uncertainty of the reference fit.
Our objective is to estimate the variance of ␥ P . To that end, we will use the following identity: where A is a random event, E (␥ P | A )and V (␥ P | A ) are, respectively, the expected value and variance of ␥ P conditional on event A , and E A [ ] and V A [ ] are, respectively, the expected value and variance over all events A . This identity, which follows from the theorem of total probability, helps us estimate the variance of ␥ P by using the observed sample of performance for the data sets we used.
Let A be the event that the sample of observations is the one we indeed observed. (By conditioning on A we are treating the sample as fixed and not random.) We can interpret the two terms of the variance as follows. The first term, E A [V (␥ P | A )] represents the component of standard uncertainty due to the uncertainty of the reference fits. We will denote this quantity by u R 2 , meaning "square of the standard uncertainty due to the reference." The second term, V A [E (␥ p | A )], represents the component of standard uncertainty due to the variation in the sampling. We will denote this by u S 2 , meaning "square of the standard uncertainty due to sampling." We now develop estimates for the two components of uncertainty, starting with u R .

Uncertainty Due to the Reference
To estimate u R , we must first estimate V (␥ P | A ). Considering ␥ P as a function of the p i , we have, using the law of propagation of uncertainty (If all the p i are zero (as might happen, for example, if the NIST algorithms are tested against themselves), then this estimate is indeterminate. However, whenever p i is small in relation to u i , we can improve the estimate by evaluating the partial derivatives in the propagation formula at the expected value of | p i | instead of at the expected value of p i . To do this, we assume that p i is distributed uniformly about the observed value within Ϯ͙3u i . Then, whenever | p i | < ͙3u i the mean should be modified to include the "folding back" at zero of the distribution of | p i |. The result is This approximation to V (␥ P | A ) is used in ATEP-CMS as an estimate of (If all the p i are zero-that is, u i = 0 whenever p i = 0then we set û R = 0.) The component of standard uncertainty due to the reference is estimated from numerical analysis considerations and, in the language of the guidelines on uncertainty [11,10], is a Type B standard uncertainty.

Uncertainty Due to Sampling Variation
We now turn to u S 2 , the component of standard uncertainty due to sampling. To estimate u S 2 , we need to estimate E (␥ P | A ). We do this by simply using a zero-order expansion of ␥ P about the p i , giving E (␥ P | A ) ≈ ␥ P , and we are left with estimating V A (␥ P ). If ␥ P = 0, we estimate V A (␥ P ) = 0. Otherwise we proceed as follows. By the law of propagation of uncertainty, For convenience, we denote by A 2 the mean variance of the p i 2 over all samples Since ␥ p is estimated from the sample, an unbiased estimate for A 2 is: n -1 (pi 2 -␥ P 2 ) 2 .
We then have û S 2 = ⌺(p i 2 -␥ P 2 ) 2 4n (n -1)␥ P 2 = ⌺ (p i 2 -␥ P 2 ) 2 4(n -1) ⌺p i 2 , with û S = 0 if all of the p i = 0. Since the component of uncertainty due to sampling is estimated from a statistical analysis of data, it is a Type A standard uncertainty, with n -1 degrees of freedom.

Combined Standard Uncertainty
Putting the two terms of the variance identity together, the combined standard uncertainty of the estimated performance parameter ␥ P is the positive square root of the estimated variance of ␥ P V (␥ P ) = u R 2 + u S 2 ≈ ⌺p i 2 u i 2 n⌺p i 2 + ⌺(p i 2 -␥ P 2 ) 2 4(n -1)⌺p i 2 .
Following NIST convention, the expanded uncertainty reported by ATEP-CMS is twice the combined standard uncertainty (i.e., a coverage factor of k = 2).

Summary
This paper has described how NIST evaluates the performance of geometric fitting software used for inspection. The NIST service, ATEP-CMS, is the only known test that provides quantitative measures of performance, complete with statements of uncertainty in accordance with international standards.
ATEP-CMS is something new in the way of calibration services. It is the only test of software offered by NIST in the field of dimensional metrology. It has the status of a Special Test, rather than a Calibration Service, because it is an experiment for us. The statements of uncertainty, in particular, are unsupported by any historical data. We believe, however, that they are fundamentally sound. We have tested some of our capabilities by running our software on both a personal computer and on NIST's supercomputer. These limited results support the validity of our approach. This paper has focused on the performance measures used in ATEP-CMS; we have not discussed testing procedures. However, one key aspect of those procedures bears discussion: the selection of data sets to be used for the test. We use the collection of data sets as if the average performance of the software over those problems represents the average performance of the software when it is used in production. To help ensure this, we use a stratified sampling approach in designing data sets for a test. Through judgment, we define ranges of parameters for inspection problems, including ideal geometry, form errors, surface sampling plans, and point measurement errors. Within each range, we select a representative sampling of data sets. We believe that this mixed approach improves the quality of the test.
We expect to refine our procedures as we gain experience in testing. We also plan to extend the scope of our testing services beyond orthogonal distance regression to include other fitting methods and other common CMS software functions [23].
The methods described in this paper demonstrate that classical concepts of metrology can be used to assess the performance of software, when that software forms part of a measurement system. Moreover, it seems that performance testing like that offered by ATEP-CMS is a requirement if CMS measurements are to be traceable to accepted standards of length.