Normalization and Technical Variation in Gene Expression Measurements

Using data from the Microarray Quality Control (MAQC) project, we demonstrate two data-analysis methods that shed light on the normalization of gene expression measurements and thereby on their technical variation. One is an improved method for normalization of multiple assays with mRNA concentrations related by a parametric model. The other is a method for characterizing limitations on the effectiveness of normalization in reducing technical variation. We apply our improved normalization to the four project materials as part of testing the linearity of the probe responses. We find that the lack of linearity is statistically significant but small enough that its sources cannot be easily identified. Applying our characterization method to assays of the same material, we show that there is a source of variation that cannot be eliminated by normalization and therefore must be dealt with by other means. Four high-density, single probe, one-color microarray platforms underlie our demonstration.

Using data from the Microarray Quality Control (MAQC) project, we demonstrate two data-analysis methods that shed light on the normalization of gene expression measurements and thereby on their technical variation. One is an improved method for normalization of multiple assays with mRNA concentrations related by a parametric model. The other is a method for characterizing limitations on the effectiveness of normalization in reducing technical variation. We apply our improved normalization to the four project materials as part of testing the linearity of the probe responses. We find that the lack of linearity is statistically significant but small enough that its sources cannot be easily identified. Applying our characterization method to assays of the same material, we show that there is a source of variation that cannot be eliminated by normalization and therefore must be dealt with by other means. Four high-density, single probe, one-color microarray platforms underlie our demonstration.
With some extensions, our techniques could be applied to the data from the other three microarray platforms. Each microarray platform was deployed at three test sites, and five replicates were assayed at each site. Thus, for each platform, the MAQC project produced sixty assays. The complicating feature in this otherwise simple inter-laboratory study is the large number of expression levels measured in each assay. This feature leads us to the use of novel statistical methods for investigating technical variation.
In its original form, a gene expression assay consists of many probe responses, which are generally intensity readings from a scanner. In the cases of the platforms considered here, the number of probe responses is greater than 30,000, although we only select some of these responses for our demonstrations. Each probe is a distinct piece of nucleic acid that is immobilized on the microarray substrate. In the case of gene expression, each probe interrogates an mRNA sample to determine the relative abundance of a particular transcript among all the transcripts that make up the sample. Hybridization is the process of applying an mRNA sample to the probes that make up a microarray.
Sources of technical variation, the focus of this paper, affect the probe responses. Some sources of technical variation affect most probe responses shifting all of them up or all of them down. The effects of such sources can be corrected in part through what is called normalization. Two basic choices of normalization method are usually considered. One is assay-to-assay equalization of the intensity distribution [1]. The other is use of external RNA controls [2,3]. The part of the technical variation that is most important is the part that normalization cannot correct.
A third normalization choice can be applied to probe responses from the MAQC project. Each laboratory measured four reference materials, two of which are mixtures of the other two. Because of this mixing, the expression levels in the materials are all related by a parametric model: The expression levels in the mixture materials are known proportions of the levels in the original materials. As this paper demonstrates, this parametric model can be the basis for a different type of normalization. Although this normalization is potentially useful in reaching biological conclusions, our immediate purpose for introducing it is testing the linearity of the probe responses.
Another aspect of normalization is limitations on its effectiveness. The basis of normalization, regardless of the type, is assumption of a relation among the effects of sources of technical variation on the intensity meas-urements from an array. Equalization of intensity distributions depends, at a minimum, on the assumption that the technical variation shifts all the intensities of an assay in the same direction. Use of external controls depends on an assumption that allows changes in the control intensities to be extrapolated to the other intensities. In this paper, we use the MAQC measurements to show limitations on normalization due to more complicated intensity co-variation.
This paper provides results on the technical variation observed through measurements of different materials made in the same laboratory and through measurements of the same material made in three different laboratories. The two sets of results are based on different statistical models, which are discussed in the next section. The results themselves appear in Sec. 3 and are discussed in Sec. 4. Finally, details on the methods are given in Sec. 5.

Statistical Models
Implementation of our normalization method, which we call parametric normalization, requires a mathematical model that faithfully describes the relations among assays. For a particular laboratory, consider the intensity value (probe response) for assay i and probe g, which we denote by y ig , where i = 1, ..., 20. The 20 assays are 5 hybridizations of material A, 5 of material B, 5 of material C, and 5 of material D, respectively. Parametric normalization as implemented here consists of a linear transformation of all the intensities of an assay. Thus, after normalization, the intensity for assay i and probe g is given by where η 0i and η i are the normalization coefficients for assay i. Our parametric normalization model is (1) where for materials A, B, C and D, x Ai equals 1, 0, γ C or (1γ D ), respectively, and x Bi equals 0, 1, (1γ C ) or γ D , respectively. This equation represents intensities in terms of mean intensities, θ Ag and θ Bg , and noise. For material A, the right side is θ Ag + σ ig e ig , and for material B, θ Bg + σ ig e ig . The quantity e ig is a random variable with mean zero and variance one. The standard deviation of the noise σ ig is shown as a function of assay i and probe g. In this model, the fraction of material C that 362 Volume 111, Number 5, September-October 2006 Journal of Research of the National Institute of Standards and Technology 0 , came from material A is γ C , and the fraction of material D that came from material B is γ D . On the basis of this equation, parametric normalization is performed by fitting the unknown parameters η 0i , η i , θ Ag , and θ Bg to the observed intensities y ig and then using the estimated values of η 0i and η i for normalization. The algorithm is given in Sec. 5. Investigation of the limitations on normalization also requires a mathematical model but of a different type. Consider measurements of the same material given as intensities on the base 2 logarithmic scale. We denote these intensities by z ig , where i = 1, ..., 15 indexes the five assays from each of three laboratories. Computation of the correlation matrix is a familiar summarization of such data. Let the covariance between assays i and j be (2) where zi is the mean of the log intensities over the probes for assay i and G is the number of probes. The familiar correlation between assays i and j is given by Our approach is based on modeling the covariance between assays i and j, which is estimated by s i j , using a factor analysis model with two common factors. The covariance model is This covariance modeling is equivalent to modeling the z ig by means of the factor analysis model (3) where the f 1g , f 2g , and e ig , i = 1, ..., 15 are all independent, normal random variables with mean 0 and variance 1. In this model, the random variables f 1g and f 2g , represent the probe-to-probe variation in the common factors and the parameters λ i1 and λ i2 govern the contributions of the common factors to the individual assays. This modeling gives the (log) intensity distribution for assay i as normal with mean µ i and variance The limitation arises because equalization of the intensity distribution has only this mean and variance available for adjustment and not the λ i1 and λ i2 individually.
The results in this paper are based on modeling the technical variation in four platforms. From platform to platform, the technical variation as portrayed by these two models is similar. We leave aside gene-level modeling of the differences among platforms [4]. We do not consider the question of inter-platform reproducibility, the main subject of many studies [5,6].

Parametric Normalization
To assess the linearity of the probe responses after parametric normalization, we determine how well the mixture model fits the normalized measurements. Our measure of lack of fit is a weighted sum of squared deviations, which we compute for each probe. For a particular laboratory, let the normalized intensities for probe g be denoted by u ig , i = 1, ..., 20. From these intensities, we estimate θ Ag and θ Bg denoting the estimates by θ^A g and θ^B g . The averages of the five normalized intensities for each material deviate from the values computed with the mixture model in accordance with Note that these deviations reflect departure of the normalized probe responses from linearity. Lack-of-fit for each probe is indicated by the weighted sum of squares of these deviations where the weights w Ag , w Bg , w Cg , and w Dg are inversely proportional to σ ig 2 . The amounts of RNA material mixed to form materials C and D were in 3 to 1 and 1 to 3 ratios. Were the fraction of mRNA in materials A and B the same, we would have γ C = γ D = 0.75. However, these fractions are not exactly equal. Based on our analysis of the MAQC data and a complementary analysis [7], we have adopted γ C = 0.80 and γ D = 0.69 as values for our deviations. In our analysis, we obtained estimates of γ C and γ D from each of the four platforms we consider. The estimates given by different platforms are in reasonable agreement.
One way to interpret our deviations is to compare them with what would be expected were the mixture model to fit perfectly [8]. We obtain a standard deviation from each of the five normalized intensities for a material and denote these by s Ag , s Bg , s Cg , and s Dg . The comparison involves the ratio of the sum of squared deviations with The scale factor 1/10 depends on specifics of the MAQC design and the number of unknown parameters in the mixture model. If the mixture model fits, this ratio is F distributed, that is, distributed as the ratio of two independent variances. The numerator and denominator degrees of freedom are 2 and 16.
To complete this interpretation of lack-of-fit, we combine the F values from all the probes into a distribution and compare this observed distribution with the F distribution for 2 and 16 degrees of freedom. We make this comparison with a quantile-quantile plot. Each point on this plot corresponds to a probe. The y value is the natural logarithm of the F value for this probe. Thus, a y value corresponds to an observed quantile. The corresponding x value is obtained from the ranking of this F value among all the F values. The x value is obtained as follows: First, find the proportion of all the F values that are less than the particular F value. Let this proportion be m/n, where n is the total number of probes. Second, find the value such that the cumulative F distribution for this value is (m + 0.5)/n. The logarithm of this value is the x value for the probe. Thus, an x value corresponds to a quantile hypothesized on the basis of perfect fit to the F distribution. Quantile-quantile plots for the sites that made use of the Applied Biosystems platform (ABI) are shown in Fig. 1. The methods section describes selection of probes for this figure. In addition to the points, there is the x = y line. If there were no lack of fit, the points would lie on this line. Apparently, there is some lack of fit, and the unanswered questions involve the implications of this lack of fit.
The size of the gap between the points and the x = y line depends on both the fit of the mixture model and on the denominator of the F ratio, which is proportional to the variance exhibited by the replicate measurements on the same material. To judge the gap in terms of the replicate variance, we can ask for the size of the factor that the denominator of the F ratio would have to increase for the points and the line to coincide. This factor is generally small as is shown by the fact that if the denominator were increased by a factor of 2.7, then the points would move downward 1 unit on the y axis. This suggests that the lack of fit exhibited in Fig. 1 is so small in comparison to the noise level that diagnosis of this lack of fit might be difficult. Moreover, note that the gap is ambiguous as a performance measure since it increases with lack of fit but decreases with increasing noise level.
For each of the sites in Fig. 1, the departure of the points from the x = y line is greater at the upper end. This shows that the upper tail of the observed distribution is heavier than it would be were the F values truly F distributed. Generally, one might summarize the lack of fit as being most notable in a small fraction of the probes that have observed F values so large that they cannot be attributed to estimation error as given by the F distribution.
Another view of the sum of squared deviations is comparison of its value under parametric normalization with its value under the manufacturer's specified normalization. To obtain this latter value, we use the manufacturer's values for u ig but do not change the weighting in either the estimation of θ Ag , and θ Bg and or in the computation of the weighted sum of squared deviations. Because the two normalizations may differ by a scale factor, we divide each sum of squares by the corresponding value of (θ^A g -θ^B g ) 2 . We have compared one scaled sum of squares with the other for each of the sites that used the Illumina platform (ILM). We see that for the majority of the probes considered, parametric normalization gives a smaller scaled sum of squares, which indicates that parametric normalization provides a better fit. This, of course, is not surprising because the parametric normalization is optimized on the basis of the mixture model whereas the manufacturer's normalization does not make use of such knowledge.
In characterizing the non-linearity of probe responses, one must apply normalization but, ideally, normalization that does not introduce non-linearity by itself. Figure 2 provides lack-of-fit quantile-quantile plots for the data under parametric normalization and under manufacturer's prescribed normalization. That the parametric normalization gives F statistics closer to the x = y line shows that parametric normalization is better for characterization of the inherent non-linearity of the probe responses.
Linearity of the probe responses is especially important in the determination of differential expression between two samples. We consider the differential expression between materials A and B expressed as the base 2 logarithm of the ratio of the two responses. We compute the differential expression as log 2 (θ^A g /θ^B g ). Figure 3 compares differential expression determined under parametric normalization with differential expression determined under manufacturer's prescribed normalization. In Fig. 3, we treat parametric normalization as the baseline because it provides better linearity as shown in Fig. 2. The vertical dimension shows the bias introduced by the manufacturer's prescribed normalization. Note that the bias is toward higher differential expression for material A. This figure shows that were it possible, there might be benefit in using parametric normalization in the interpretation of the results from a biological experiment.

Effectiveness of Normalization
We now switch from within-laboratory modeling of different materials to among-laboratory modeling of the same material. It has long been recognized that there are potentially substantial inter-laboratory differences in outright measurement of intensities. For this reason, one would avoid comparison of measured intensities across laboratories if one could. However, biological considerations in the design of a study may require processing samples to be compared in different laboratories or in a single laboratory with sufficient time delay that the technical variation will have interlaboratory characteristics. This sub-section shows the perils in such comparisons.
Of course, the meaningfulness of the following characterization of inter-laboratory differences depends on data used for the characterization. We note the following: Mixing should not be apparent as a source of interlaboratory variation because the materials were mixed in one laboratory and shipped to all the other sites in the project. Each platform manufacturer was motivated to choose the other two sites for their platform carefully and to harmonize protocols among the three sites. There was a pilot study that contributed to the harmonization. However, it is true that gauging how well the protocols agreed is difficult because of the multidimensional nature of microarray assays. Finally, generalization from three sites per platform is subject to appreciable uncertainty. Thus, from the MAQC data, one cannot say how difficult it might be to overcome the inter-laboratory differences discussed.
As discussed in the methods section, we select, for the Agilent platform (AG1), the 3000 probes for each material that have the highest intensity overall. Applying factor analysis to the set of material A intensities transformed to the log base 2 scale, we obtain Table 1. The symbols heading the columns in Table 1 are those found in Eq. assay. Inclusion of this parameter in the model provides an elementary form of assay-to-assay normalization. The column labeled Specific Factor gives an estimate of the standard deviation of the noise that is independent from probe to probe. This column also shows some difference among laboratories.
The columns labeled Common Factor complete the model of the distribution of the log intensities that our analysis provides. The intensities are centered at µ i and have spread given by the standard deviation Between assays i and j, the log intensities have correlation given by The amount of correlation depends on the similarity of parameters λ i1 and λ i2 , which model how the common sources affect each assay.
The relation between the components of the two common factors λ i1 and λ i2 is shown in Fig. 4 for all four materials. Normalization is generally based on differences among distributions of the intensities of the assays. In terms of our model, this distribution depends on the distance of the points in this figure from the origin. Thus, normalization can be used to adjust the intensities for each assay so that the distance of each assay from the origin is the same. However, Fig. 4 shows that such adjustment cannot eliminate the differences among assays. The oblique axes shown in each panel generally differentiate the part of the assay-to-assay differences that can and cannot be reduced by normalization. Apparently, there is a source of technical variation that must be dealt with not by normalization, but by other means. Such a source might be variation in some experimental circumstance, perhaps temperature or washing technique, that interacts with the strength of the hybridization bond.
In addition to showing the effects of sources of variation, Fig. 4 also shows clustering of the assays by laboratory, which portrays the magnitude of the among-laboratory variation compared to the within-laboratory variation. Comparisons like this are central to metrology. In the case of univariate measurements, the comparison is between the within-laboratory variance and the amonglaboratory variance and is called gauge R&R (repeatabil ity and reproducibility) [9]. In the case of high dimensional measurements such as gene expression assays with thousands of probes, it is common to apply some form of dimension reduction such as principal components analysis (PCA) followed by display of the measurements in the reduced coordinates [10,11]. Irizarry et al. [12] discuss inter-laboratory variation in gene expression measurements.
Instead of PCA plots, we obtain a perspective on the measurements that places more emphasis on the amonglaboratory variation. Our approach is linear discriminant analysis. Like PCA, we find two linear combinations of the measurements from each assay. In the case of discriminant analysis, these combinations are called discriminant coordinates. The difference is that we find linear combinations that maximize the separation of the laboratories with respect to the within-laboratory variation. To do this, we must adopt penalized discriminant analysis [13,14]. Based on the log 2 intensities for material A from the GE Healthcare platform (GEH), we derive discriminant coordinates with which we plot the 15 material-A assays. These points along with points for material B are shown in Fig. 5. We see that the groups of material-A points for each laboratory are separated. This is not surprising since we derived the discriminant coordinates from the material A measurements.
What is more interesting about Fig. 5 is the clustering of the material-B points. The discriminant coordinates for material A also separate the material B assays by laboratory. Apparently, the measurements from each array have inherent in them a signature associated with the processing laboratory. Such a signature originates from one or more sources of technical variation. This signature is strong enough that it stands out despite the difference between materials A and B. Were we to add materials C and D to Fig. 5 in the same way as material B, we would obtain the same clustering for these materials. Reversing the roles of materials A and B, that is, deriving the discriminant coordinates from the material B measurements instead, gives a result similar to Fig. 5.
Portraying among-laboratory differences in terms of discriminant analysis is particularly relevant to development of biomarkers on the basis of microarray measurements. Biomarkers can be developed from microarray measurements of samples from both cases and controls. Consider the possibility that the cases have been measured in a different laboratory than the controls, perhaps for logistical reasons. What Fig. 5 shows is that amonglaboratory differences could easily be mistaken for casecontrol differences. Moreover, this suggests that withinlaboratory differences involving a considerable time period might also be mistaken for case-control differences.
Volume 111, Number 5, September-October 2006 Journal of Research of the National Institute of Standards and Technology 368 Fig. 4. For AG1, plots of common factors showing assay differences with assay sites labeled. Shown are the relations of the differences to the origin. Distribution-based normalization might bring all the assays to the same distance from the origin, but it cannot bring the assays together in the other dimension.

Discussion
As a resource for investigation of technical variation, the laboratories involved in the MAQC project have provided measurements with state-of-the-art replication. Such measurements are of particular interest because the variation observed represents a technological challenge, not a circumstance that can be easily changed by closer attention to the requirements of the measurement protocol.
An overview of the MAQC project is provided in Ref. 15. This reference provides a more complete description of the MAQC data including parts of data set not considered in this paper. This reference also contains figures that depict the MAQC data. The title of this reference is problematical in that it implies that reproducibility is a qualitative property that a measurement system does or does not have. In metrological studies, reproducibility is a quantitative property that describes closeness of the agreement between the results of measurements on the same material. In the case of univariate measurements, reproducibility is usually described by a standard deviation. It is not clear what quantitative summary of the technical variation in microarray assays is sufficiently inclusive of all aspects of this technical variation. As demonstrated herein, statistical modeling can portray some aspects of reproducibility for gene expression assays.
On the basis of statistical modeling, this paper characterizes the technical variation in four ways generally corresponding to the figures. First is the linearity of probe responses, which we characterize by comparison with appearances that can be expected from noise alone. Second is the amount of reduction of technical variation by off-the-shelf normalization, which we characterize by comparison with parametric normalization. Third is a fundamental limitation of normalization, which we characterize in terms of sources of variation not subject to the usual normalization assumptions. Fourth is a difference between laboratories, which we characterize by choosing a certain perspective on this difference and comparing it with the difference between materials A and B. Note that the figures except for Figs. 2 and 3 are each based on a different platform. Each characterization could be applied to each platform giving a total of twenty figures, which we do not display because each characterization gives similar results for each platform.
The predominant features of these characterizations and the statistical models that they reflect can be expected to hold true in future applications of the platforms to biological studies. Prediction of properties of the technical variation that one should expect in future studies is the purpose of inter-laboratory studies such as the MAQC project. This standard should be applied to the depiction of the MAQC data in Ref. 15.
Common in the application of microarrays is screening for a few differentially expressed genes. This paradigm influences the way people analyze measurements when trying to understand technical variation. In particular, many of the graphical methods familiar in microarray analyses are centered on quantities with familiar meanings in the context of individual genes and have probe-to-probe independence as the foundation for their interpretation. In contrast, this paper focuses on assay-toassay associations that involve many probes. Such associations generally point to a few sources of technical variation and, in addition, form the basis for normalization. The characteristics of technical variation we have identified are important in common applications of microarray technology, but the connection requires some thought. Moreover, this thought is different from the quality-control thinking that serves so well for univariate measurements. Univariate quality control has only limited application to a measurement method that gives perhaps 50,000 responses at a time.
Many scientists require a transparent link between the original data and analysis results before they are willing to trust a characterization. Since providing this link with a single figure does not seem possible, we have performed further analyses to verify our characterizations. Regarding Fig. 1, for example, the observation that points for site 2 lie below the x = y line seems to indicate violation of some assumption about the data besides model lack of fit. We have identified violation of the normality assumption as a possible cause. Regarding Fig. 4, we chose to do factor analysis with two common factors. Inclusion of a third common factor adds detail to the results. More broadly, we note that two different models of microarray intensity form the basis of this paper and that this raises the question of which model is more faithful to the physical processes underlying microarray measurements. Although better models give better results, models do not have to be exact to give useful results. Because we do not use the two models in the same context, they can each be useful without their being in agreement.
The analyses in this paper indicate that technical variation in microarray assays might sometimes be reduced through better approaches to normalization, but that reduction of the technical variation itself through improved measurement protocols and through more consistent adherence to these protocols might finally be needed. Moreover, the analyses provided should be considered in developing experimental designs for microarray studies and QC metrics for monitoring study progress. In particular, our analyses show technical variation problems in comparing expression levels across sites and therefore potentially also over long time periods.

Methods
The original measurements, the responses of the probes before normalization, are the starting point for our approach. This is necessary because our approach involves insight into technical variation gained through study of normalization. We apply a novel normalization method, parametric normalization, as a way of observing non-linearity in the responses of the probes. In portraying the limitations on normalization, we compare among-laboratory variability with within-laboratory variability. This is a comparison we cannot make with measurements subjected to a type of normalization that reduces within-laboratory variation without similar reduction in the among-laboratory variation.

Probe Selection
We selected probes for each of our four analyses following a similar process for each platform. We began with the 12091 probes in the commonly mapped set chosen by the MAQC organizers for inter-platform comparison. For AG1 and GEH, we eliminated probes from this set based on the qualitative calls or on duplication of an identifier in the original data. For AG1, inclusion is based on the value "0" for control type, feature nonuniformity, and saturation. For GEH, inclusion is based on "L" or "G" as the qualitative call. Moreover, we eliminated any probe that does not meet the inclusion criterion for all sixty assays performed with the platform. We entirely eliminated probes with duplicate identifiers. Thus, we begin our selection with 12091, 12091, 11705, and 9378 probes for ABI, ILM, AG1, and GEH, respectively.
For each platform and each material, we next selected the 3000 probes with highest expression level. Consider material A. First, we transformed the intensities to the base 2 logarithmic scale. This required addition of a starter of 100 for AG1 and a starter of 0.3 for GEH. Next, we centered the logarithmic intensities for each assay at their mean. We then computed the mean over the 15 assays for material A and on the basis of this mean selected the 3000 largest intensities. This gives a set of probes we call the A set. Similarly, we obtained the B set, the C set and the D set. Our lack-of-fit analysis and normalization comparison are based on combining the A set and the B set with an exclusive OR. In other words, the probes in these analyses are either in the A set but not the B set or in the B set but not in the A set. The number of probes is 1982 for ABI and 1984 for ILM. We applied factor analysis to single materials: the probes in set A, set B, set C and set D for materials A, B, C and D, respectively. We applied discriminant analysis to the material A responses from the probes in set A. We then calculated discriminant coordinates for the material B responses from the probes in set A.

Parametric Normalization
Our algorithm for parametric normalization is a combination of iterative reweighted least squares [16] and crisscross regression [17,18]. The weighting is based on the suggestion of Rocke and Durbin [19].
For regression under the model discussed in Sec. 2, the weights are inversely proportional to the variance σ ig 2 . Following Rocke and Durbin [19], this variance is proportional to the sum of the expression level squared (x Ai θ Ag + x Bi θ Bg ) 2 and a constant. We followed the approach of Rocke and Durbin [19] in finding the size of the constant relative to the size of the squared expression level.
To obtain an initial estimate of the weights, we estimate the expression levels from the un-normalized intensities y ig using un-weighted least squares. Then, using these weights, we re-estimate the expression levels.
The main iteration in our algorithm follows: First, we use the current estimate of the expression levels to estimate the normalization coefficients η 0i and η i . These estimates are obtained by fitting the model to the un-normalized intensities with weighted least squares based in the current expression levels. Second, we re-normalize the intensities and use these to re-estimate the expression levels. This estimation is based on weights computed from the previous expression levels. We then begin the next iteration by re-estimating the normalization.

Factor Analysis
Separately for each material, we apply factor analysis to original AG1 intensities transformed to the log (base 2) scale. For the 3000 probes with highest expression levels that are given by sets A, B, C, and D, respectively, it is not necessary to add a starter to make the responses all positive. Our factor analysis computations are based on the usual maximum likelihood algorithm [20].

Discriminant Analysis
We apply discriminant analysis to original GEH intensities transformed to the log (base 2) scale with a starter of 0.3 added. We select the set A probes for both materials A and B. Because the starter is necessary for only one intensity value, we could as easily use another approach, but this would not lead to any noticeable change in Fig. 5. The classical approach to discriminant analysis does not apply in our situation because there are many more probes than replicate assays. For this reason, our derivation is based on penalized discriminant analysis [13], which entails adoption of a penalty function. As inputs to their algorithm, we take the centered log base 2 intensities. This centering is an elementary form of normalization. Our choice of penalty function is the identity matrix multiplied by 1000. Hastie and Tibshirani [14] provide a needed part of the algorithm.