Analysis of Variance for Item Differences in Verification Data with Unknown Groups

For sequentially collected data, this paper introduces a lag-one differencing method to estimate the random error standard deviation σ_R and then uses the estimate σ̂_R to calculate a change detection threshold in a moving window method to detect shifts in the short-term systematic error. Performance results on simulated and real data are presented. Fortunately, the impact of having to perform change detection on the estimated short-term systematic and random error variances is anticipated to be modest or small. The motivating example arises from facilities under nuclear safeguards agreements, where inspector data collected during International Atomic Energy Agency (IAEA) verifications are compared to corresponding operator data. The differences between the operator and inspector values are evaluated using an application of analysis of variance (ANOVA). Typically, it is assumed that short-term systematic errors change across inspection periods, so inspection periods form the groups used in the ANOVA. In some data sets, it appears that the short-term errors have changed at other times, so change detection methods could be used to detect the actual change times.


Introduction and Background
For facilities under nuclear safeguards agreements, inspector data collected during International Atomic Energy Agency (IAEA) verifications are compared to corresponding operator data. The differences between the operator and inspector values are evaluated using an application of analysis of variance (ANOVA). It is typically assumed that short-term systematic errors change across inspection periods (groups); however, in some data sets, it appears that the short-term errors have changed at other times, so change detection methods could be used to detect the actual change times. This paper introduces a lag-one differencing method to estimate the within-group random error standard deviation σ_R and then uses the estimate σ̂_R to calculate a change detection threshold in a moving window method to detect shifts in the short-term systematic error. Performance results on simulated and real data are presented.
An effective measurement error model for inspector measurement data must account for variation within and between groups, where in this context a group is usually (but not always) an inspection period. A model for multiplicative errors for the inspector I (and similarly for the operator O) is

I_ij = μ_ij(1 + S_Ij + R_Iij),   (1)

where, for item i (from 1 to n) in group j (from 1 to g), I_ij is the inspector's measured value, μ_ij is the true unknown value, S_Ij ~ N(0, σ²_SI) is the short-term systematic error in group j, and R_Iij ~ N(0, σ²_RI) is a random error. The notation S_Ij ~ N(0, σ²_SI) means that S_Ij has a normal distribution with mean 0 and variance σ²_SI, and similarly for R_Iij ~ N(0, σ²_RI) [1]. For simplicity, it is assumed here that there are n measurements in each inspection period; allowing for a differing number of measurements in each inspection period is straightforward.
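As a concrete illustration, model (1) can be simulated directly. The following is a minimal Python sketch (the paper's own code is in R); the function name, seed, and parameter values are illustrative, not from the paper:

```python
import numpy as np

def simulate_inspector(mu, sigma_S, sigma_R, rng):
    """Simulate I_ij = mu_ij * (1 + S_j + R_ij) per model (1).

    mu: (g, n) array of true values. S_j ~ N(0, sigma_S^2) is shared by
    all n items in group j; R_ij ~ N(0, sigma_R^2) is drawn per item.
    """
    g, n = mu.shape
    S = rng.normal(0.0, sigma_S, size=(g, 1))   # one systematic error per group
    R = rng.normal(0.0, sigma_R, size=(g, n))   # one random error per item
    return mu * (1.0 + S + R)

rng = np.random.default_rng(1)
mu = np.full((3, 10), 100.0)                    # g = 3 groups of n = 10 items
I = simulate_inspector(mu, sigma_S=0.02, sigma_R=0.01, rng=rng)
```

Because S_j is drawn once per group, all items in a group share the same systematic offset, which is what makes grouping matter in the ANOVA below.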
The measurement error model in (1) sets the stage for applying ANOVA with random effects [1][2][3][4]. Neither S_Ij nor R_Iij is observable. However, for various types of observed data, the variances σ²_S and σ²_R can be estimated.

ANOVA
The usual ANOVA decomposition (a standard analysis technique) is

SST = SSB + SSW,   (3)

where SST is the total sum of squares, SSW is the within-group sum of squares, and SSB is the between-group sum of squares, with σ̂²_R = SSW/(g(n − 1)) and σ̂²_S = (SSB/(g − 1) − σ̂²_R)/n. In random effects ANOVA (with group effects S_j ~ N(0, σ²_S)), the estimators σ̂²_S and σ̂²_R have the minimum variance among all possible unbiased estimators (MVUE) [1]. However, biased estimators can have smaller mean-squared error (MSE). Also, the MVUE property relies on the normality assumption. Therefore, other estimators are sometimes considered [1]; for example, setting σ̂²_S to 0 if it is negative leads to a positive bias, but also to a lower MSE.
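The method-of-moments estimators above can be sketched in Python for a balanced design (an illustration, not the paper's R implementation; the simulated example values are assumptions):

```python
import numpy as np

def anova_estimates(d, clip=False):
    """One-way random-effects ANOVA estimates from a (g, n) array d.

    Returns (sigma2_S_hat, sigma2_R_hat). sigma2_R_hat is the within-group
    mean square; sigma2_S_hat = (MSB - MSW)/n can be negative, and
    clip=True sets it to 0 (positive bias, but lower MSE).
    """
    g, n = d.shape
    msw = ((d - d.mean(axis=1, keepdims=True)) ** 2).sum() / (g * (n - 1))
    msb = n * ((d.mean(axis=1) - d.mean()) ** 2).sum() / (g - 1)
    s2_S = (msb - msw) / n
    if clip and s2_S < 0:
        s2_S = 0.0
    return s2_S, msw

# Example: d_ij = S_j + R_ij with sigma_S = 0.02, sigma_R = 0.01.
rng = np.random.default_rng(0)
d = rng.normal(0, 0.02, (200, 1)) + rng.normal(0, 0.01, (200, 50))
s2_S, s2_R = anova_estimates(d)
```

With many groups, the two estimates are close to σ²_S = 4 × 10⁻⁴ and σ²_R = 1 × 10⁻⁴; with few groups, σ̂²_S is noisy and can be negative.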
Figure 1(a) plots 10 simulated d = (O − I)/O values in each of 3 groups. The alarm limit on the right side of the plot in Figure 1(a) is estimated on the basis of the g = 3 sets of n = 10 values of d. The groups in data such as that in Figure 1(a) are typically inspection periods; however, in some data sets, it appears that the short-term errors have changed at other times, so change detection methods can be used to detect the actual change times. This paper introduces a lag-one differencing method to estimate the random error standard deviation σ_R and then uses the estimate σ̂_R to calculate a change detection threshold in a moving window method to detect shifts in the short-term error S. Performance results on simulated and real data are presented.
Data such as that in Figure 1 are used for verification, and also for two main metrology purposes. One metrology purpose is to estimate performance values for the four relative standard deviations (RSDs), σ_SO, σ_RO, σ_SI, and σ_RI. For verification, statistical tests are applied to d = (O − I)/O, so a second metrology purpose is to estimate σ²_S and σ²_R by applying random effects one-way ANOVA to real paired relative difference data that are typically assumed to follow (2). Reference [5] evaluates impacts on alarm probabilities of using estimates of σ²_S and σ²_R (this paper's focus) instead of the true quantities. Figure 2 plots four pairs of real operator and inspector data for each of three measurement periods ("groups"), as (O − I)/O versus the measurement order. In the context of change detection for paired (O, I) data such as that in Figure 1 (simulated data) or Figure 2 (real data), there are situations where the default choice to group by inspection period seems to be contradicted by the data. The main goals for such paired (O, I) data are to (1) estimate the within-group and between-group variance for O − I data (this paper's focus), (2) estimate thresholds that correspond to small false alarm probabilities for future O − I data as in Figure 1, and (3) estimate the within-group and between-group variance for both O and I data for international target values [4] and for use in error variance propagation in material balance evaluation. This application is an example of random effects ANOVA, which is a standard analysis technique, but with the nonstandard feature that the group memberships are unknown. Some research issues arise in applying ANOVA to paired (O, I) data, including the following three: (1) One-way random effects ANOVA with unknown groups, as sometimes arises in the verification context, has not been investigated in published literature; see Section 3.
(2) Although ANOVA dates to the 1930s [1], research continues regarding, for example, constructing confidence intervals for σ²_S. Assuming that the O and I values are normally distributed, an exact confidence interval can be constructed; an approximate interval uses the estimated kurtosis κ̂ of the distribution of d [7], and if the O and I distributions are normal, then κ̂ = 0. This approach works reasonably well in simulations reported in [6]; however, it ignores the fact (unpublished) that cov(d_i, d_j) = (1/n)(μ₄ − 3σ⁴), where μ₄ is the fourth moment of the distribution of S + R.
(3) A variance approximation issue arises in analysing (O − I)/O instead of (O − I)/μ (which is not available in practice). Linear first-term error variance propagation is accurate for quite large σ_R, up to approximately 0.60. The results reported in Section 4 use d_rel = (O − I)/O to estimate σ_S and σ_R. Also, when checking whether future d_rel = (O − I)/O values are within alarm limits, there is no need to separately estimate σ²_SO and σ²_SI or σ²_RO and σ²_RI, so usual ANOVA is adequate, rather than Grubbs' ANOVA [2][3][4][5].

One-Way Random Effects ANOVA with Unknown Groups
Regarding research issue one, if wrong groupings are chosen, then σ̂²_S and σ̂²_R can be adversely impacted. For example, Figure 3 plots an exaggerated example (for ease of illustration) in which successive groups are incorrectly combined. To show that E(σ̂²_S) and E(σ̂²_R) are impacted by mismatch between the true and assumed grouping, consider the ANOVA decomposition equation (3), where d_ij = (O_ij − I_ij)/O_ij for multiplicative models. In Figure 3, groups 1 and 2, 3 and 4, and 5 and 6 are merged. The resulting expected values of σ̂²_S and σ̂²_R have been confirmed by simulation in the R statistical programming language [8] for data simulated from (1) such as that in Figure 3 (see Table 1), and they satisfy E(σ̂²_S) + E(σ̂²_R) = σ²_S + σ²_R. As another extreme case, assume that each of the 6 true groups in Figure 3 is split into 3 groups. The extreme case of merging distinct groups 1 and 2, 3 and 4, and 5 and 6, or of splitting each of the 6 true groups into 3 groups, will not occur in this simulated data or in other similar data if any reasonably effective change detection method is used, as illustrated in Section 4.
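The merged-groups effect is easy to check numerically. Under the simplification d_ij = S_j + R_ij (a Python sketch, not the paper's R code; seed, simulation count, and parameter values are assumptions), merging successive pairs of 6 true groups into 3 assumed groups inflates σ̂²_R and deflates σ̂²_S while preserving the sum of their expected values:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma_S, sigma_R = 0.02, 0.01
nsim, g_t, n_t = 20000, 6, 6          # 6 true groups of 6 observations each

# d_ij = S_j + R_ij for all simulated realizations at once
d = (rng.normal(0, sigma_S, (nsim, g_t, 1))
     + rng.normal(0, sigma_R, (nsim, g_t, n_t)))

# Analyse with the WRONG grouping: merge successive pairs -> 3 groups of 12
g_a, n_a = 3, 12
dm = d.reshape(nsim, g_a, n_a)

msw = ((dm - dm.mean(axis=2, keepdims=True)) ** 2).sum(axis=(1, 2)) / (g_a * (n_a - 1))
msb = n_a * ((dm.mean(axis=2) - dm.mean(axis=(1, 2))[:, None]) ** 2).sum(axis=1) / (g_a - 1)
s2_S_hat = (msb - msw) / n_a

# Mean of msw exceeds sigma_R^2 and mean of s2_S_hat falls below sigma_S^2,
# but the two means still sum to approximately sigma_S^2 + sigma_R^2.
print(msw.mean(), s2_S_hat.mean(), msw.mean() + s2_S_hat.mean())
```

This mirrors the confirmation-by-simulation described above: the within-group mean square absorbs part of the between-group variation when distinct groups are merged.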

Change Detection to Detect Changes in the Value of S = S_I − S_O
There are many change detection options for sequences of (O − I) data [9,10]. Assuming that not too many changes occur in S = S_I − S_O during the entire analysis period, a robust estimate of σ_d, where d = (O − I)/O, is available using the median absolute deviation (MAD) of the lag-one differences, adjusted to provide an unbiased estimate of the true standard deviation [8]. The following function madwin implements the MAD-based estimate of σ_d for a moving window change detector in the R statistical programming language [8]. The moving window compares the difference of successive window means (X̄_i − X̄_{i+1}) to T·σ̂_d, where X̄_i is the average value of d = (O − I)/O in window i (which by the central limit theorem can be assumed to have approximately a normal distribution) and T is a threshold chosen such that the false alarm probability over the entire analysis period is small, such as 0.05.

madwin = function(x = testdata1 - testdata2, nwin1 = 5, nwin2 = 7, t.percentile = .99, peak.fixed.threshold = .5)

In simulation studies, one can know the true grouping and enforce all assumptions to hold, so it is possible to compare the performance of σ̂²_S and σ̂²_R for candidate grouping options.
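The body of madwin is not reproduced here. As a rough sketch of the same idea in Python (the names mad_sigma and window_changes, the window size, and the threshold are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mad_sigma(x):
    """MAD-based robust estimate of sigma_d from lag-one differences of x.

    Differencing removes slowly varying group means; 1.4826 makes the MAD
    consistent for a normal standard deviation, and 1/sqrt(2) accounts for
    each lag-one difference having variance 2*sigma_d^2.
    """
    diffs = np.diff(np.asarray(x, dtype=float))
    mad = np.median(np.abs(diffs - np.median(diffs)))
    return 1.4826 * mad / np.sqrt(2.0)

def window_changes(x, w=3, threshold=3.0):
    """Compare successive non-overlapping window means of x.

    Returns approximate data indices where |mean_i - mean_{i+1}| exceeds
    threshold * sd(mean difference), using the robust sigma estimate.
    """
    x = np.asarray(x, dtype=float)
    sigma = mad_sigma(x)
    means = x[: len(x) // w * w].reshape(-1, w).mean(axis=1)
    z = np.diff(means) / (sigma * np.sqrt(2.0 / w))
    return (np.where(np.abs(z) > threshold)[0] + 1) * w
```

For a series with a mean shift of several multiples of σ_d, the flagged indices cluster near the true change point; in practice the threshold would be tuned so that the whole-series false alarm probability is small.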
In Figure 4, there are three change points, at 101, 201, and 301, so there are four true groups. The black 1s are the d values. Notice the positive mean shift from group 1 (1:100) to group 2 (101:200). The red 2s are the moving average differences, scaled using the MAD-based robust estimate of σ_d. The blue and green curves are Page's cumulative sums, defined as C_i = max(0, C_{i−1} + d̃_i), where d̃_i = d_i/σ̂_d [3,10], for positive and for negative change detection. The red 2s (size 3 moving window) easily detect all three changes, at 101, 201, and 301. The cusums are more difficult to assess visually, but correctly have edges near the change points. The two horizontal lines are approximate 0.95 false alarm probability limits for the parametric size 3 moving window using the MAD approach; the moving window option has a few false alarms where the red 2s exceed the horizontal lines away from the three change points, meaning it detects more than four groups in this realization. The moving window option computes the difference between successive moving averages of, for example, 3 (user specified) successive nonoverlapping d values.
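Page's cusum can be sketched as follows (a minimal Python illustration; the reference value k is the standard extension of the recursion, and setting k = 0 reproduces the display above exactly):

```python
import numpy as np

def page_cusum(d, sigma, k=0.5):
    """Page's cusum for positive and negative shifts in standardized d.

    pos tracks upward shifts via C_i = max(0, C_{i-1} + d_i/sigma - k);
    neg tracks downward shifts symmetrically. k = 0 gives the plain
    recursion C_i = max(0, C_{i-1} + d_i/sigma).
    """
    z = np.asarray(d, dtype=float) / sigma
    pos = np.zeros(len(z))
    neg = np.zeros(len(z))
    for i in range(len(z)):
        prev_p = pos[i - 1] if i else 0.0
        prev_n = neg[i - 1] if i else 0.0
        pos[i] = max(0.0, prev_p + z[i] - k)
        neg[i] = max(0.0, prev_n - z[i] - k)
    return pos, neg
```

An alarm is declared when either cusum crosses a decision threshold; the reset to 0 is what produces the "edges" near change points noted above.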
Figure 5 is a simulated example illustrating that even if σ_S/σ_R is large, so that most mean shifts due to S = S_I − S_O shifting will be easily observed, some shifts will be small and difficult to detect. In Figure 5, the mean shift from group 1 to group 2 might not be detected, particularly if σ_R increased. Other change detection options can be considered; for example, a nonparametric option has been implemented that works with the ranks of d values in a moving window scheme similar to the parametric window scheme used here. Nonparametric options can outperform parametric options if the distributional assumptions made by the parametric options are badly violated in the data.
Several simulations have been performed in R [8] to assess the impacts of having to perform change detection to choose the groups. As shown above, incorrect group assignments can lead to negative bias in σ̂²_S and to positive bias in σ̂²_R. The IAEA's calculations of the sample size needed to attain a specified loss detection probability require separate estimation of σ_S and σ_R [5].
For the results in Tables 1 and 2, the simulations used a total of 48 observations, divided into cases defined as having 2, 3, 4, 6, 8, 12, 16, or 24 equal-sized groups of 24, 16, 12, 8, 6, 4, 3, or 2 observations, respectively. Each case was analysed using 10^5 simulations, and for brevity, only some of the results are reported in Tables 1 and 2. The Table 1 results (repeatable to ±1 across sets of 10^5 simulations) are the relative bias (in percent) and the relative root mean-squared error (RMSE, in percent). Tables 1 and 2 use the correct grouping (g = 6, n = 8) and several wrong groupings, as indicated. Tables 1 and 2 also use the MAD-based windowing change detection method (with a false alarm probability, FAP, of 0.1, 0.05, or 0.01 as listed), which reduces the bias in the estimates σ̂²_S or σ̂_S. The RMSE in σ̂²_S or σ̂_S is then only slightly larger than in the case where the true groups are known.
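The performance measures reported in the tables follow the stated definitions; a short sketch of those conventions (relative bias and relative RMSE, in percent):

```python
import numpy as np

def rel_bias_rmse(estimates, truth):
    """Relative bias and relative RMSE in percent, per the Table 1 conventions.

    estimates: per-simulation estimates of one quantity; truth: its true value.
    """
    est = np.asarray(estimates, dtype=float)
    rel_bias = 100.0 * (est.mean() - truth) / truth
    rel_rmse = 100.0 * np.sqrt(np.mean((est - truth) ** 2)) / truth
    return rel_bias, rel_rmse
```

For example, estimates of 1.1 and 0.9 against a true value of 1.0 give 0% relative bias and 10% relative RMSE.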
Figure 6(a) plots an example realization of the data with g = 6 and n = 8 and with σ_S = 0.02 and σ_R = 0.01. Figure 6(b) plots the results of the MAD-based windowing change detection method with a false alarm probability (the probability of deciding that a potential change point is a true change point when in fact it is not) of 0.01. Figure 7 is the same as Figure 6, but with σ_S = 0.02 and σ_R = 0.06. Based on the 10^5 simulations, the distribution of the number of groups found was 0.05, 0.17, 0.33, 0.30, 0.13, 0.02, 0.002, and 0.00006 for ĝ = 4, 5, 6, 7, 8, 9, 10, or 11, respectively; the average number of groups inferred is 5.4; the standard deviation in the estimated number of groups is 1.1.

Discussion and Summary
This paper introduced a lag-one differencing method to estimate the random error standard deviation σ_R and then used the estimate σ̂_R to calculate a change detection threshold in a moving window method to detect shifts in the short-term systematic error. Performance results on simulated and real data were presented. The impact of having to perform change detection on the estimated short-term systematic and random error variances appears to be modest or small. In the motivating example, it is typically assumed that short-term systematic errors change across inspection periods, so inspection periods form the groups used in the ANOVA. In some data sets, it appears that the short-term errors have changed at other times, so change detection methods could be used to detect the actual change times.
The default assumption, which is often adequate, is that S changes across inspection periods, so the groups are defined as inspection periods. However, if change detection contradicts this default assumption, then change detection as presented in Section 4 is a reasonable option. Recall from Section 3 that E(σ̂²_S) + E(σ̂²_R) = σ²_S + σ²_R. In the simulations (Tables 1 and 2), the bias in σ̂²_S is not that much larger when the groups are chosen by change detection than in the known-groups case. Interestingly, due to a negative bias in σ̂²_S, the RMSE in σ̂²_S can be slightly smaller for some wrong-groups choices than for the known-groups case. However, negative bias in σ̂²_S could lead to more false alarms in nuclear safeguards applications, such as the application illustrated in Figure 1 [5], so it is preferable to use change detection to avoid large negative biases. Any reasonably effective change detection method [9,10] avoids the extreme group mismatches analysed in Section 3, and separate estimates of the systematic and random error variances are required in safeguards calculations [2][3][4][5]. Therefore, grouping choices that lead to negative bias in σ̂²_S will tend to underestimate the corresponding propagated variances.
Data such as that in Figure 1(a) are collected in statistical quality control applications, where in phase I the control limits are estimated, and in phase II the process is monitored using the estimated limits from phase I [11]. A key choice in quality control is the choice of rational subgroups [11]; a good rational subgrouping minimizes variation within groups and maximizes variation between groups. Ideally, only random errors are present within groups, while other effects might be present between groups. Analysis of phase I data is retrospective, such as that presented here for paired (O, I) data, where the windowing scheme based on a robust MAD-based estimate of σ_d was introduced to find good subgroups such as those found in Figure 7 (the inferred groups closely approximated the true groups).
One open question is how to estimate the RMSE in σ̂_S for any particular data set for which the true values of σ_S and σ_R are unknown. One option is to use the same type of simulations that were used to estimate the bias and RMSE in σ̂_S (Tables 1 and 2), using σ̂_S and σ̂_R as the true values of σ_S and σ_R. However, this approach is likely to be quite noisy unless the number of groups g is large enough that the RMSE in σ̂_S is small. The relative RMSE of 52% in σ̂_S is too large (Table 2) for g = 6 and n = 8, so more than 6 groups would probably be needed for reliable estimation of the RMSE in σ̂_S. A second open question is how to estimate the actual false alarm probability using the estimates σ̂_S and σ̂_R in data such as that in Figure 1, when the nominal false alarm probability is specified, such as 0.05. Reference [12] addressed this question using tolerance intervals, assuming that the group memberships are known. A third open question is to evaluate the impact of having to infer group memberships on σ̂_SO, σ̂_RO, σ̂_SI, and σ̂_RI.
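The plug-in idea behind the first open question can be sketched as follows (Python, assuming the simplified model d_ij = S_j + R_ij and known groups; an illustration of the resimulation idea, not the paper's procedure):

```python
import numpy as np

def plugin_rmse_sigma_S(s_S_hat, s_R_hat, g, n, nsim=2000, seed=0):
    """Plug-in estimate of the RMSE of sigma_S_hat.

    Treat the data-based estimates as the true values, resimulate nsim
    balanced data sets, re-estimate sigma_S each time, and report the
    root mean squared deviation from the plug-in truth.
    """
    rng = np.random.default_rng(seed)
    S = rng.normal(0, s_S_hat, (nsim, g, 1))
    d = S + rng.normal(0, s_R_hat, (nsim, g, n))
    msw = ((d - d.mean(axis=2, keepdims=True)) ** 2).sum(axis=(1, 2)) / (g * (n - 1))
    msb = n * ((d.mean(axis=2) - d.mean(axis=(1, 2))[:, None]) ** 2).sum(axis=1) / (g - 1)
    s_S_resim = np.sqrt(np.maximum((msb - msw) / n, 0.0))  # negative estimates -> 0
    return np.sqrt(np.mean((s_S_resim - s_S_hat) ** 2))
```

As noted above, with only g = 6 groups the resimulated σ̂_S values scatter widely around the plug-in truth, so the resulting RMSE estimate is itself noisy unless g is larger.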
Figure 1(a): (O − I)/O inspector versus operator data in group 1.

Figure 4: A moving window option and a cumulative sum option [3,10] (monitoring separately for positive shifts and for negative shifts), both using the MAD-based estimate of σ_d. The data are simulated with g = 4, n = 100, σ_R = √(σ²_RO + σ²_RI) = 0.0022, and σ_S = √(σ²_SO + σ²_SI) = 0.011. The estimated number of groups ĝ is a random quantity that varies across hypothetical realizations of the data for the same unknown true grouping.

Figure 5: Even with σ_R = 0.01 and σ_S = 0.3, so σ_S/σ_R = 30, some group changes will not necessarily be detected (such as the change from group 1 to group 2).

Figure 6: Example of change detection results using a sliding window and the MAD-based estimate of σ_d. The simulated data had g = 6 groups, each with n = 8 observations, with σ_S = 0.02 and σ_R = 0.01.

Figure 7: The same as Figure 6, but with σ_S = 0.02 and σ_R = 0.06.

Figure 8: Real (O − I)/O data with the groups being the inspection periods in (a) and the groups being inferred using windowing in (b).

Figure 9: Real (O − I)/O data with the groups being the inspection years in (a), the inspection periods in (b) (with multiple inspection periods per year), and the groups being inferred using windowing in (c).
The item variance σ²_μ is the variance of the true values μ_i, defined as σ²_μ = Σ_{i=1}^{N}(μ_i − μ̄)²/N, where μ̄ is the overall mean of all the true values. The values d = (O − I)/O, by linear error variance propagation, have approximate variance σ²_d = σ²_S + σ²_R, so σ²_S and σ²_R can be estimated. The variance of I is σ²_I = σ²_μ + (σ²_μ + μ̄²)(σ²_S + σ²_R), or, for a fixed value of μ, σ²_I = μ²(σ²_S + σ²_R).
In the merged-groups example of Section 3 (groups 1 and 2, 3 and 4, and 5 and 6 merged), let the total sample size be gn = 36, with assumed values n_a = 12 and g_a = 3 but true values n_t = 6 and g_t = 6. One must take care to interpret terms, because there are two values of n (n_t = 6 and n_a = 12) and two values of g (g_t = 6 and g_a = 3), depending on whether the true or merged groups are used in the calculation. Because the new group 1 has two different values of S (one for the first 6 d values and another for the second 6 d values), the within-group sum of squares has an inflated expected value. The between-groups sum of squares also has the wrong expected value, because the average of 2 successive S values is used in each of the 3 merged groups, so var(S_1 + S_2) is effectively being estimated in (n_a/(g_a − 1)) Σ_{j=1}^{g_a}(d̄_j − d̄)². In the opposite extreme case, each of the 6 true groups is split into 3 groups, creating 18 groups of size 2. It is easily shown for this case that E(σ̂²_R) = (15/17)σ²_R; the factor 15/17 arises from having 3 repeated values of S_1, S_2, ..., and S_6 in the expression for the within-group sum of squares.
Table 2 is the same as Table 1, but negative values of σ̂²_S are set to 0, and the Table 2 entries are for estimation of σ_S, σ_R, and σ_d rather than for estimation of σ²_S, σ²_R, and σ²_d. The RMSE is defined in the usual manner, for example, for σ̂²_S in Table 1 as RMSE = √(Σ(σ̂²_S − σ²_S)²/n_sim), and the relative RMSE is defined as RMSE/σ²_S.

Table 2: Performance of estimators of σ̂_S, σ̂_R, and σ̂_d for several values of g_a and n_a. The true values are σ_S = 0.02, σ_R = 0.01, and σ_d = 0.022, and negative estimates of σ̂²_S were set to 0.