The Statistical Value of Raw Fluorescence Signal in Luminex xMAP Based Multiplex Immunoassays

Tissue samples (plasma, saliva, serum or urine) from 169 patients classified as either normal or having one of seven possible diseases are analysed across three 96-well plates for the presences of 37 analytes using cytokine inflammation multiplexed immunoassay panels. Censoring for concentration data caused problems for analysis of the low abundant analytes. Using fluorescence analysis over concentration based analysis allowed analysis of these low abundant analytes. Mixed-effects analysis on the resulting fluorescence and concentration responses reveals a combination of censoring and mapping the fluorescence responses to concentration values, through a 5PL curve, changed observed analyte concentrations. Simulation verifies this, by showing a dependence on the mean florescence response and its distribution on the observed analyte concentration levels. Differences from normality, in the fluorescence responses, can lead to differences in concentration estimates and unreliable probabilities for treatment effects. It is seen that when fluorescence responses are normally distributed, probabilities of treatment effects for fluorescence based t-tests has greater statistical power than the same probabilities from concentration based t-tests. We add evidence that the fluorescence response, unlike concentration values, doesn’t require censoring and we show with respect to differential analysis on the fluorescence responses that background correction is not required.

true for the observed fluorescence responses for a given platform; as automatic calibration steps during setup yields a highly reproducible signal output even across different instruments 23 . Recently, we have argued that for statistical differential analysis and for reproducibility fluorescence values are the better choice and they alleviate the concern of determining levels of detection 15,17,22,24,25 . It this report a xMAP suspension bead-based immunoassay format is used to demonstrate fluorescence based data analysis, however the methodological techniques used here are generic and applicable to other immunoassay platforms that use typical sigmoidal/ logistic concentration curves to map fluorescence to concentration values.

Results
Blanks and standards. Comparing patient fluorescence responses against standards and blanks reveals whether the patient responses are within range of the standards 14 (Fig. 1a). The same distributions for all 37 analytes are given in Supplementary Figure S1. The analytes in (Fig. 1a) were chosen because their sample median fluorescent response was below the median response of their lowest standard (S8). For IL-11 its tissue median response is actually below the median of the blank (B, Fig. 1a). Also note, the median response for IL-32 in plasma and serum are also below their median blank response.
It's not uncommon for test sample fluorescence responses to be less than their blank 2,14 . Given that analyte distributions with a median response below the blank appear normal against the shape of the distributions seen for other analytes, it's arguable what the real association is between the standards and test samples 14 . While fluorescence responses below the associated blank cannot be assigned concentration values without using imputation 19 , they can be used for differential analysis and for assigning statistical significances to treatment effects.
Fluorescence and concentration level of detection. When working with concentration values a decision is made on how to set the appropriate level-of-detection (LOD) for each analyte 25 . However, the inverse procedure of going from fluorescence to concentration, via the inverse 5PL curve, also has limits (top/bottom), beyond which no concentration values can be obtained. In contrast, when working with the fluorescence responses no level of detection is generally specified 2,13,[26][27][28][29][30][31][32][33] . This is because fluorescence enables the analysis of low signal and will have more power for testing differences in analyte expression 13,14 . In this report we are interested to test if censoring the fluorescence responses is required.
If a fluorescent level of detection exists it is expected that as the fluorescence response drops, the responses, for a given analyte, should approach some lower limit. This was not seen in the analyte responses that have a median response less than their lowest standard (Fig. 1b). When the individual fluorescence responses are compared against their lower level of detection, LOD, (Fig. 1b: dashed horizontal lines) as obtained from the Bio-Plex Manager TM , there doesn't appear to be a real difference in point scatter above or below these thresholds (Fig. 1b). While IL-22 appears to show a greater scatter above its LOD, this is only a perceptual limit and comes from the scaling to view all the responses for the serum samples with respect to the other analytes. Note, for IL-11 all the urine samples are below the associated LOD (Fig. 1b). Table 1 gives the coefficients of variation (CV), used here as a standardised measure of point scatter and dispersion, for analyte responses with respect to tissue types that have at least 5 fluorescence responses above and below their associated LOD. The samples for analysis were chosen such that the N nearest points above and below the LOD was selected. Therefore, in total 2N points for each analyte and tissue combination was used. The actual value of each N was determined from the minimum number of responses either above or below the associated LOD (Table 1). If the analyte responses approach a lower limit then we expected on average that the point scatter (CV) below the LOD to be less than above it. A test of equality of two coefficients of variation is reportedly equivalent to testing for equality of variances between the logarithmic transformed data 34 . Therefore, we used an F-test and the NULL hypothesis that the ratio between the variances of the log2 transformed data above and below the LOD should be one. Seven of the 17 comparisons were found significant at the 0.05 level (Table 1). Of these only two represented the case where the CV was greater above the LOD than below. The boxplot distributions of the log2 of the CVs (Fig. 1c), for the groups above (A) and below (B) the LOD (Table 1) shows that above the LOD the CVs have a greater range of values than below it. However, statistically the two groups (A, B), according to a Mann-Whitney test they are statistically similar (w = 135, p-value = 0.65) and similarly according to a paired t-test (t = − 0.42, df = 16, p-value = 0.68). Thus, confirming our original observation that there appears little difference in point scatters above and below the LOD, as determined from the standards.
A lower limit wasn't revealed either by considering the log2 of the rank differences; where if a limit of detection existed then this difference should approach zero as the rank reduces (Fig. 1d). Over the first 50% of the differences the average log2 difference for all these analytes is approx. − 1 (Fig. 1d). This reflects ½ a fluorescence response unit change per unique fluorescence response change; and suggests that Fl(rank + 1)− Fl(rank) = 0.5, when rank is in the set 1:(n/2) and n is the largest rank value (Fig. 1d).
Looking at the ranked ordered of the analyte blank distributions (Fig. 1e) shows a fairly continuous range of values. The ranked fluorescence responses for all the patient plasma samples (Fig. 1f) reveals a log2 value of 4 may indicate a potential lower level of detection. However, this is just the response level that most patients' responses are actually above. This ordering also exposes the existence of a bend, or dog leg, near the fluorescence log2 value of 6 ( Fig. 1e). While this bend is not discussed here, it is noted that it represent approximately the median fluorescence response, as about 50% of the fluorescence responses fall below the log2 value of 6. As there are 8 of the 37 analytes with a median blank response below the log2 value of 4 ( Fig. 1d; dashed horizontal line), reveals that the Luminex100 instrument has the ability to distinguish fluorescence signal weaker than that observed for the majority of our test samples.
While, the above observations suggest that there is no detectable lower level-of-detection for the fluorescence responses given here, it doesn't rule out the case that the signal observed for the low abundant analyte samples represents just noise. If this was the case then the p-value distribution obtained for the low-abundant analyte pairwise tissue comparisons (Fig. 1g), should appear uniform 35 . However, as seen ( Fig. 1g) there is an overrepresentation of the lowest p-values, indicating the presence of structure and grouping effects in the low abundant responses with respect to fixed analyte and pairwise tissue comparisons. Differential analysis. For differential analysis of analytes across treatment, tissue or disease statuses, a variety of statistical approaches can be used, such as: t-tests 36,37 , ANOVA and ANCOVA 37,38 , and nonparametric tests 12,24 . However, mixed-effects modelling with fixed and random-effects are emerging as the preferred approach 5,14,27,39 . This is because of its ability to represent different experimental designs, to cope with unbalanced data sets, handle paired and longitudinal information and each random effect decreases the degrees of freedom less than the same effects when treated as fixed 40 , leaving more degrees of freedom for the additional effects and error term estimations 41 .
The most common method for employing these tests is to consider one analyte at a time using multiple statistical models 42 . While this has the advantage of computational tractability, compared to a single global statistical model approach, it has the disadvantages of not incorporating the variances from all the analytes simultaneously, it will missing intrinsic expression similarities or differences and theoretically increasing false positive and negative rates 27,42 . Other advantages for including all the analytes in a single statistical model are that it allows for easy discovery of statistically significant interactions 43 , and provides a simple framework for extracting corrected and uncorrected probabilities for analyte differences across, or in pair-wise contrasts, between tissues/treatments and disease status.
A common form of data reduction prior to statistical analysis is the stratification of results such that only the patients with conditions or tissues/treatments of interest are included in the statistical models. For example: to compare the Mononucleosis patients against the normal patients, Supplementary Table S1, two statistical models would normally be used, one including just the plasma data and other including just the serum data. However, doing so is overlooking the ability to correct for tissue and disease differences as needed; thereby, allowing more data into the analysis and increasing its underlying statistical power.
Here, two linear mixed-effects models are first examined (Fig. 2a), both use all the available data, but in one (the reduced model) not all the data associations are included. In both models the log2 of the fluorescence responses (Fl) are modelled with two fixed main effects (reduced model) or three fixed main effects (global model) plus 1 st order interactions, Tissue:Cytokine and/or Condition:Cytokine. Both models include the same two scalar random-effects defined by the terms in brackets and which are conditional, '|' , on Plate:Condition:Tissue or on Patient. The first random-effect allows for tissues and conditions to associate across different plates, and from the data sets here there are 14 such associations, see Supplementary Table S1. The 2 nd random effect accounts for patient-to-patient variations. Both these models adjust for extraneous variations due to plates, conditions, tissues and patient differences and because we have sparse data, Supplementary Table S1, we avoid overfitting by not including higher order fixed-effects interactions such as, Cytokine:Tissue:Condition.   For fluorescence, the advantage of including more data associations in the statistical analysis can be tested easily; by testing if the global model adds explanatory value over the reduced model and where the NULL hypothesis is no explanatory value added. From such analysis (Fig. 2b) we see while the global model uses nearly 3 times the degrees of freedom (Df) there is good statistical reasons (p-value < 2.2e- 16) to use a global model over a reduced model as in this case it helps explain more variation in the data. This can also be visualized using regression Conditional plots show the relationship between the outcome and explanatory/conditional variables to be viewed as other effects are held constant. Note for the global model, (d), the cluster of points (residuals) around each conditional mean response (dark horizontal line) is tighter than that seen in the reduced model, (c). Both models contain the same number of samples per tissue. For brevity, only results for 9 of the 37 analytes are given, however, the same comparison but for all 37 analytes are given in Supplementary Fig. S2. Since the conditional residuals are reasonably scattered above and below their respective means, implies that either model is a reasonable fit to the data.
conditional plots 44 and in terms of a reduced spread in the partial residuals about the conditional means for each analyte obtained from the global model compared to the same results obtained from the reduced model (Fig. 2c,d). Therefore, all statistical analysis here will use the global model (Fig. 2a).
Statistical significance of analyte differences. For the fluorescence responses there are no issues when applying the above statistical models (Fig. 2a) because of a reasonable number of patient readings for each analyte against tissue and conditions ( Supplementary Fig. S3). In contrast, for the concentration analysis there were major problems due to zero readings per analyte against tissue or condition (Supplementary Fig. S3). For the concentration data set there are 4 analytes IL-11, IL-34, Light, and MMP-3 that have at least one cell with zero patient readings ( Supplementary Fig. S3) and the mixed-effects models in (Fig. 2a) need at least 1 entry per cell to avoid matrix-rank deficiency problems during analysis. Therefore, these 4 analytes were removed from consideration for the concentration analysis, but not from the fluorescence based analysis.
The probabilities of analyte abundance changing across tissue types are easily obtained from a mixed-effects model and are given in Table 2. For completeness, the probabilities for the 6 pairwise comparisons between tissue types per analytes, for all analytes, are given in Supplementary Table S2 analytes that have significant abundance changes across tissue types according to either fluorescence responses or concentration values. The concentration analysis shows that nearly every analyte except for IL-10, IL-12p70 and TSLP, are significant at the 0.05 level. Seven analytes (IFN-a2, IFN-b, INF-g, IL-12p40, IL-2, IL-26 and MMP-1) are significant according to concentration but not according to fluorescence, Table 2. By looking at the adjusted means and the 95% confidence limits against tissue (Fig. 3a) for these analytes it's seen for fluorescence that all confidence limits for each of the 7 analytes, across tissues, overlap each other. This is not so for the concentration responses and this is why theses analytes show up as being significant according to concentration but not according to the fluorescence analysis. In Table 3, analyte pair-wise comparisons between normal patients and diseased patients (COPD, Mono, Myeloma, Psoriasis, RA, Sepsis and T23) are given. For brevity, Table 3 shows only those comparisons that are significant according to either the fluorescence or concentration based analysis at the 0.05 level. There are 56 comparisons in Table 3, 39 are significant according to fluorescence, but 45 according to the concentration results. The largest difference between the fluorescence and concentration results is seen for the pair-wise differences between normal and T2D patients. For this comparison: the fluorescence result shows only April as being differentially expressed, while from the concentration analysis IFN-g, IL-12p40, IL-2, IL-28 and IL-29, are identified as being differentially expressed. From the adjusted means and the 95% confidence limits according to the fluorescence and concentration responses for these analytes (Fig. 3b), it is clear from the concentration data that these analytes, except for April, are differentially expressed between normal and T2D patients, but equally clear from the fluorescence analysis is, that these same analytes are not differentially expressed (Fig. 3b).
We believe that the differences between the relative analyte abundances seen between the concentrations and fluorescence responses (Fig. 3) can only come about if the mapping from fluorescence to concentration changed the relative abundances and differences between the analytes from that given by the fluorescence responses.

Mapping the fluorescence response into concentration values.
It should be noted that the differences between the fluorescence and concentration results shown above are mostly highlighting low abundant analytes (Fig. 3). Yet it is the low abundant analytes that at times are the most interesting to the life scientists and can occupy most of the analysis. Therefore, simulation is used to confirm our understanding of the differences between the analyte abundance profiles seen in (Fig. 3). Mapping a hypothetical fluorescence response distribution to concentration values through a simple sigmoidal curve (Fig. 4a) at low (0.05), middle (0.5; the EC50 response) and high (0.95) response levels, reveals that the resulting concentration distributions are skewed either to left for low fluorescence response, or skewed right at high fluorescence response level (Fig. 4b). EC50 represents the effective concentration that produces 50% of the maximum fluorescence response. Note also, not all input responses were mapped to concentration values because the maximum (top) and minimum (bottom) log2 fluorescence responses that can be mapped via the inverse sigmoidal curve in this simulation is 1 and zero respectively. Also note, the variances, as measured by the standard deviation, sd, increases as the output distribution moves away from the EC50 location. This implies that any resulting statistical test on the concentration distributions when the fluorescence distribution is normally distributed but not centred at the EC50 response will have less statistical power and hence higher p-values compared to the input fluorescence responses.  Simulations for mapping the fluorescence responses from two hypothetical tissues/treatment groups A and B to concentrations are used to see how censoring and mapping fluorescence responses to concentration values effects statistical comparisons (Fig. 4c,d). Three input distribution pairs are used and they differ only with respect to level of skewness (Fig. 4c). The resulting p.values from two-sample t-tests, performed on the input fluorescence responses after translation, along the fluorescence axis, and on the corresponding output concentration distributions after the translated fluorescence responses are mapped to concentration values. This was carried out in a stepwise fashion (stepsize = 0.01) across the range of responses from 0.05 to 0.95 inclusively (Fig. 4d). The left plots (Fig. 4c,d) gives the case when the input distributions are normally distributed (skew = 0), and shows that the resulting concentration p.values, increase with distance from EC50 location (0 on the log2 scale). However, the expected p.values as obtained from the translated fluorescence responses remains constant. The middle plots (Fig. 4c,d) gives the case when both input distributions are skewed left (skew = − 5) and reveals that below the EC50 concentration the concentration p.values increase, but above the p.values decrease. The right plots (Fig. 4c,d) show the opposite case, when the input distributions are skewed right (skew = 5), and below the EC50 the concentration p.values are seen to decrease but above they increase.
The plots in (Fig. 4d) reveal that the concentration t-tests results changed with respect to concentration, implying that the effect-size between the inputs must also be changing with respect to concentration. Therefore, the underlying relative abundances determined from concentration based analysis can be different from that obtained from a fluorescence based analysis. Also note, that the expected p.values (Fig. 4d), obtained from the input fluorescence responses remained constant with respect to translation along the fluorescence axis, telling us that the results of t-tests are invariant under translation and more importantly, that differential analysis performed on the raw fluorescence values don't actually require any form of background correction prior to analysis.

Discussion
Here we have added evidence that there is no need to specify a limit of detection for the fluorescence response. That a global statistical model has more statistical power than a reduced statistical model, and this we believe extends to the cases of looking at single analytes at a time or via stratifying the results into logical groupings and analysing these groups individually. It was demonstrated that the observed concentration values and resulting p.values from comparative concentrations based analysis are affected by the shape of the input fluorescence response distributions and the actual location on the concentration curve these input distributions map through. Because of these relationships the exact output probabilities after mapping fluorescence responses to concentration values is to some extent unpredictable, even if the input fluorescence comparative probability is known. The output probabilities according to concentration comparisons can be less than obtained from the same comparisons using the fluorescence responses. This then leads to false conclusions and claims of greater effect size and significance. In other instances the output concentration comparisons probabilities can actually be greater than that obtained from the input fluorescence distributions leading to the false conclusion of little or small observable effect size.
Here it was seen that the analyte blanks showed varying response levels, suggesting that non-specific binding of multiplexed detector antibody complexes might be contributing to this variations. Some bead types appear to be inherently stickier, such as Light (51), while other bead types appeared much less sticky, for example IL-8 (54). We suggest that this non-specific binding observed in the blanks maybe blocked by agents found in the test sample 14 . These blocking agents could be lipids, cholesterol, proteins, or heterophilic antibodies. However, the common approach of subtracting the blank from the test sample responses to correct for nuisance background levels must assume no blocking is occurring in the test samples, yet it's not unusual to find sample responses for low abundant analytes less than the associated blank and this would not be the case if this assumption was true. Alternatively, by not subtracting the blank, as done here, assumes 100% of the non-specific binding sites are blocked by agents in the test samples. The reality, perhaps, will be between these two extremes. Of concern, for the analyst, is that individual estimates of the background levels will be mixed in with noise. Subtracting these estimates especially from the low abundant analytes will result in values with even more noise. Alternatively, the effect of no background subtraction will mean slightly higher response values for the low abundant analytes and will underestimate slightly any fold change measurements. Yet for differential analysis this isn't a problem because it was shown here via simulation that background levels should have no impact on differential analysis when using the fluorescence responses. This is because our simulations revealed that t-tests were invariant under translation; that is, t-test(x,y) = t-test(x + b, y + b); where x, y represent two groups of values and b represents the background level.
The use of fluorescence response in the case of xMAP technology permits data analysis to be performed independently from a standard curve. As a general guideline, for data points collected in duplicate or triplicate wells, intra fluorescence responses %CV reflects actual well-to-well variation in reading. This %CV value is typically smaller than intra-observed concentration %CV 14 , which is dictated by the precision of the entire standard curve. As a result, a poor between-plate %CV on a standard curve will have a larger impact on data interpretation if sample analysis is concentration driven. This is not an issue if data analysis is performed using fluorescence response. In addition, sample analysis in the absence of a standard curve will also help to maximize the number of samples to analyse per 96-well plate, from 39 to 47 samples (run in duplicate wells). Table 3. Significant analyte pair-wise condition (contrast) differences. The probability values (Pr(> Chisq)) of the Chi-square result have been multiple test corrected using Holm's methods 48 . The number of * 's in the Sig. column indicates significance levels. Note the empty rows associated with the concentrations results represents analyte comparisons that couldn't be analysed because of missing concentration values. COPD = chronic obstructive pulmonary disease, Mono = mononucleosis, RA = rheumatoid arthritis, T2D = type 2 diabetes. Degrees of freedom = 1 for all tests.  Supplementary Fig. S1). The dashed horizontal and vertical lines on the sigmoidal and the inverse curve respectively highlight these levels. (b) Shows the corresponding output concentration distributions after mapping the input distribution in (a) to pg/ml when the input distribution is centred at low, middle, or high log2(Fl) responses. The vertical dashed lines overlaid on the output distributions represent the expected mean concentration as obtained from the input mean fluorescence response. (c) Simulations of mapping the log2(Fl) responses from two hypothetical tissues A and B to pg/ml is given. Three input distribution pairs that differ only with respect to their level of skewness (0, − 5, and 5 respectively). The distance between the input means in each pair was set to achieve a Cohen's effect size of 0.8.

Methods
From 169 patients, a total of 191 samples, for either one or two of four possible sample types (plasma, saliva, serum and urine) were purchased from Bioreclamation Inc (Hicksville, NY). All blood specimens were collected at clinical locations using standard vacutainer-type blood collection tubes and processed to plasma or serum by the vendor. The samples were aliquoted and stored frozen at − 80 °C for single use. Bioreclamation Inc. collects every sample under IRB approved protocols, where ethical guidelines are followed to protect patient confidentiality and safety. Each sample has the patients consent for use in a wide range of research including the development of commercial products or services.
The fluorescence responses and concentrations of analytes were obtained using a Bio-Plex Pro ™ Human Inflammation Panel 37-Plex assay kit with magnetic beads (171AL1001M, Bio-Rad, Hercules, California, USA) and analysed with a Luminex100 system and the accompanying Bio-Plex Manager TM Software 6.1(Bio-Rad, Hercules, California, USA).
The concentration values and detection limits were determined from standard curves generated from each kit's standards using the Bio-Plex Software Manager TM weighted 5PL curve fitting procedure. To maximize the number of concentrations values available for analysis we included the Bio-Plex extrapolated values. Therefore, the definition of out-of-range here, and unless otherwise stated, refers to concentration values that cannot be obtained from the 5PL logistic curve; that is beyond extrapolation.
All statistical analysis was performed using R version 3.1.0 (2014-04-10) 45 via RStudio Version 0.98.507 46 . The mixed-effects modelling were done using lmer 47 . The visualization of regression results was done using visreg 44 , and the significance of interactions terms and interaction means were determined using Phia package 43 . Unless otherwise stated all p-values have been multiple test corrected according to Holm's method 48 . For simulation experiments normal distributions were obtained from rnorm and skewed distribution were obtained using skew normal distribution methods, rsn, from the R package sn 49 .