A procedure for estimating bias between quantitative analytical methods



Introduction
Before quality control techniques and protocols can be applied to a laboratory procedure, the quality of that procedure must itself be evaluated. This involves, among other considerations, an assessment of analytical accuracy. Some, but not all (and perhaps not even many), clinical laboratory analytical procedures can be considered accurate in the sense that every laboratory should get the same quantitative result for the analyte in question, regardless of the composition of the sample medium (potassium, for example). That most methods are to one degree or another inaccurate does not invalidate their use. But the burden is on the laboratory, in choosing a method, to know just how accurate that method is.
If a new method can be shown to be sufficiently precise and stable over a period of weeks, and is linear and free of carry-over, then a few simple operations can assess and document its inaccuracy. One option is to determine the amount of bias that exists between the method under study (referred to as the test method) and a second, reference method, the characteristics of which are presumably already known. An ample number of patient samples are analysed by both methods, preferably with each pair being analysed on the same day. A plot of one method's results versus the other's is usually quite revealing: imprecision, non-linearity, and the presence of outliers are often evident from a visual inspection of the graph. Problems arise, however, when more quantitative techniques are applied to the pairs of results. A simple least squares regression analysis is usually not valid here because it implicitly (and incorrectly) assumes that one of the two methods is essentially free of measurement error. The popular 'correlation coefficient' usually provides no added information: it is more appropriate for assessing departures from no correlation than for assessing departures from perfect correlation [1-3].
Instead of comparing the points to a best-fitting line, it is generally more revealing to compare the points to the line of identity (the line passing through the origin with a slope of 1.0). A simple inspection of this plot will generally reveal, in addition to lack of linearity and the presence of 'outliers', any bias which may exist between the two methods. Obvious outliers (however defined) should be examined during the collection of data, preferably on or near the day of incidence. The sample in question should be reanalysed by both methods. A log should be kept of all gross random errors thus revealed. These may be instrument related or the result of human error. If the abnormal result persists on repeating the analyses, then an effort must be made to determine what characteristic of that sample matrix is causing the atypical behaviour.
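A numerical screen along these lines can be sketched in Python. The cutoff (here, two standard deviations of the paired differences) and the data are illustrative assumptions, not part of the procedure described above:

```python
# Flag sample pairs that fall unusually far from the line of identity.
# The 2-SD cutoff is an illustrative choice, not a rule from the paper.
from statistics import mean, stdev

def flag_outliers(ref, test, k=2.0):
    """Return indices of pairs whose (test - ref) difference deviates
    from the mean difference by more than k standard deviations."""
    diffs = [t - r for r, t in zip(ref, test)]
    m, s = mean(diffs), stdev(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - m) > k * s]

ref  = [4.0, 4.2, 3.8, 5.0, 4.5, 4.1, 3.9]
test = [4.1, 4.3, 3.9, 5.1, 4.6, 4.2, 9.0]   # last pair is a gross error
print(flag_outliers(ref, test))
```

Any flagged sample would then be reanalysed by both methods, as described above, before being accepted into the comparison data.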
Although plotting a set of 'test' method results versus those generated by a 'reference' method will probably show the presence of substantial bias between the methods, that procedure is poorly suited to quantifying the bias. A number of more rigorous alternatives to the construction of a simple least squares regression line have been suggested to accomplish this [1, 2 and 4]. With the exception of Lubran's technique, which employs multiple calculations of Student's t-test, these methods involve complex mathematics unfamiliar to most clinical scientists. This paper presents a simple graphical technique for estimating the amount of bias between the results of two quantitative clinical methods, and for determining whether that bias is 'real' or just an artifact resulting from random sampling fluctuations. It is based on a statistical method developed by Bretaudiere and co-workers to study the suitability of control materials when used on different methods of assay of the same analyte [5].

Method
The assay material for the comparison of methods should include human (patient) samples, as well as all control samples which are likely to be used in future monitoring of the method in question. As Bretaudiere points out, the data derived from this procedure may also be used to obtain information about the suitability of a control sample for its designated purpose.
The material is analysed by both methods between which the bias is to be determined. The reference method may or may not be a true analytical reference method, but should be a method which is well studied and characterized as to its reproducibility and accuracy, including its response to interfering substances. In order to determine bias at different concentrations, the patient samples should be grouped according to predetermined value ranges. This stratified analysis makes it possible to distinguish systematic bias (in which the magnitude remains fairly constant over the entire range of test values) from proportional error (in which the magnitude increases as the test value increases). Control material is also tested at different concentrations. Approximately 25 patient samples should be analysed by both methods, preferably at about the same time. Each control sample should be analysed approximately 10 times by both methods.
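The stratified analysis can be sketched as follows; the cut-points, data, and function name are invented for illustration. A roughly constant mean bias across strata would suggest systematic bias, while a bias that grows with concentration would suggest proportional error:

```python
# Group paired results into predetermined concentration ranges and
# report the mean bias (test - reference) within each stratum.
# Ranges and data are invented for illustration only.
from statistics import mean

def bias_by_range(ref, test, cutpoints):
    """cutpoints: ascending bounds defining strata on the reference
    values; returns {(low, high): mean bias} for each non-empty stratum."""
    strata = {}
    bounds = [float('-inf')] + list(cutpoints) + [float('inf')]
    for lo, hi in zip(bounds, bounds[1:]):
        diffs = [t - r for r, t in zip(ref, test) if lo <= r < hi]
        if diffs:
            strata[(lo, hi)] = mean(diffs)
    return strata

ref  = [2.1, 2.4, 4.8, 5.1, 9.7, 10.2]
test = [2.2, 2.5, 5.1, 5.4, 10.3, 10.8]
print(bias_by_range(ref, test, cutpoints=[3.0, 7.0]))
```

In this invented data set the bias grows from about 0.1 in the lowest stratum to about 0.6 in the highest, the pattern the text describes as proportional error.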
The pairs of results are then plotted, a line of identity is drawn, and the resulting graphs are examined visually for signs of non-linearity and bias. To determine whether the bias is real or merely due to random sampling fluctuations, given the imprecision of the two methods, the raw reference and test method data can be used to calculate a z-value [6]:

z = (x̄ − z0) / (s/√n)

where x̄ and s are the mean and standard deviation of the test method results, n is the number of samples, and z0 is the mean of the reference method results. The z-value gives a quantitative estimate of the significance of the observed bias. When z > 1.96, the test method exhibits significant positive bias; when z < −1.96, the test method exhibits significant negative bias.
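As a sketch, the z-value can be computed as the mean of the test method results minus a comparison value z0, divided by the standard error of that mean. This particular form is an assumption consistent with the two-SEM criterion applied in the example below, not necessarily the exact formula of reference [6]; the data are invented:

```python
# z-statistic for intermethod bias: (mean of test results - z0) / SEM,
# where z0 is the comparison value (the reference-method mean for raw
# data).  This form is an assumption consistent with the paper's
# two-SEM significance criterion.
from math import sqrt
from statistics import mean, stdev

def z_value(values, z0):
    sem = stdev(values) / sqrt(len(values))
    return (mean(values) - z0) / sem

# Invented raw test-method results compared against a reference mean:
test = [4.4, 4.6, 4.5, 4.7, 4.5, 4.6, 4.4, 4.6]
z = z_value(test, z0=4.2)
print(round(z, 2), "significant positive bias" if z > 1.96 else "n.s.")
```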
Normalized test method data (the ratios of test method values to reference method values, expressed as percentages) could also be used in place of raw test method results in the formula above, with z0 being set equal to 100. However, for results near zero, normalized values are subject to large fluctuations, even when only small differences between the two methods exist. In some circumstances, this might lead to erroneous conclusions regarding intermethod bias.
When a number of related methods are being compared to their corresponding standard methods, the results can be displayed concisely by means of bar graphs of the normalized data, as illustrated in the following example. Four serum analytes (sodium, potassium, chloride, and carbon dioxide) were each analysed by a test method and a reference method; the reference methods included amperometric titration for chloride (Fiske Chloridometer, Fiske Associates, Uxbridge, Massachusetts, USA) and measurement of carbon dioxide by rate of pH change (Beckman Cl/CO2 Analyzer, Beckman Instruments, Brea, California, USA). Twenty-one patient serum specimens were analysed by both methods for each analyte. The results of the test method were then plotted against the results of the corresponding reference method for the same analyte on linear graph paper (see figures 1, 2, 3 and 4). These graphs clearly show the upward bias in the potassium test method, the downward bias in the chloride and carbon dioxide test methods, and the evident lack of bias in the sodium test method. On closer examination, one can further discern that the bias in the potassium and chloride methods is probably of the systematic type (the points appear to lie parallel to the line of identity), while the bias in the CO2 method is more proportional (the points diverge farther from the line of identity for larger values of concentration).

Example
The computations required to assess the significance of bias in the four methods are shown in table 1. The test method result for each sample was expressed as a percentage, relative to the reference method result for the same sample (normalized result). It is seen that in the case of sodium the average percentage does not differ from 100% by more than two SEMs, so there is no significant bias. The other three analyses are seen to have average percentages differing from 100% by more than two SEMs, indicative of significant bias in these methods. The results of the significance testing are concisely expressed by the bar-graph in figure 5.

A computer program is available which performs the linear plot of data from one method versus that of another, data normalization, and z-value calculation. It is available for the Apple II+/IIe with 48K RAM and one disk drive.
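The table-1 style computation can be sketched as follows; the serum values are invented for illustration and do not reproduce the paper's data:

```python
# Normalize each test result as a percentage of its paired reference
# result, then apply the two-SEM rule: bias is judged significant when
# the mean percentage differs from 100 by more than two SEMs.
# The data below are invented, not the paper's table 1.
from math import sqrt
from statistics import mean, stdev

def assess_bias(ref, test):
    pct = [100.0 * t / r for r, t in zip(ref, test)]
    m = mean(pct)
    sem = stdev(pct) / sqrt(len(pct))
    return m, sem, abs(m - 100.0) > 2.0 * sem

ref  = [140, 138, 142, 145, 139, 141]
test = [141, 137, 143, 146, 138, 142]
m, sem, significant = assess_bias(ref, test)
print(f"mean = {m:.1f}%, SEM = {sem:.2f}, significant bias: {significant}")
```

For this invented data set the mean percentage is within two SEMs of 100%, so no significant bias would be declared, the same conclusion the example reaches for sodium.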
Even without a microcomputer, however, this is a convenient, simple way of estimating bias.
Please note that the data in this paper do not constitute an evaluation of any of the methods used as examples. These methods are merely representative of the type for which this procedure for the estimation of bias is a useful statistical tool.