1 Introduction: Reliability and Classic Test Theory

Within statistics, the reliability of a measure has been addressed mostly within the framework of Classic Test Theory (CTT) [1, 2]. Measures addressed in CTT differ from measurements of common physical magnitudes in that CTT is usually oriented to simultaneously obtaining several measures (the different items/questions in the test/questionnaire) over several objects (the subjects taking the test) [2]. The framework developed for CTT can, however, be extended to certain engineering problems outside the behavioral sciences, as will be described later. A brief exposition of the basics of CTT required for the rest of the paper is presented next, followed by the objectives of this paper.

In CTT, an individual measure \( x_{i} \), with 1 ≤ i ≤ I where I is the number of available measures, is considered to be the sum of a true value \( \tau \) and a random measurement error \( e_{i} \). When applied to K subjects, indexed by 1 ≤ k ≤ K, it can be expressed as:

$$ x_{i} (k) = \tau (k) + e_{i} (k) $$
(1)

It is assumed that any individual error \( e_{i} \) is uncorrelated with \( \tau \), as well as with the other errors \( e_{j} \), for i ≠ j. The reliability of the individual item is then defined as:

$$ \rho_{i} = \frac{{\sigma_{\tau }^{2} }}{{\sigma_{{x_{i} }}^{2} }} = \frac{{\sigma_{\tau }^{2} }}{{\sigma_{\tau }^{2} + \sigma_{{e_{i} }}^{2} }} = \frac{1}{{1 + \frac{1}{{\sigma_{\tau }^{2} /\sigma_{{e_{i} }}^{2} }}}} $$
(2)

The reliability is the fraction of the observed variance accounted for by the true variance, and as such ranges between 0, for completely erroneous measurements, and 1 in the absence of error. It equals the squared correlation coefficient between the true component \( \tau \) and the measured value \( x_{i} \). The third expression for \( \rho_{i} \) in Eq. (2) is useful to show the relationship between reliability and the Signal-to-Noise ratio (SNR), given by the quotient between the variances of \( \tau \) and \( e_{i} \). As seen from (2), SNR has a monotonic relationship with \( \rho_{i} \). SNR is preferred in engineering fields to characterize \( x_{i} \), and several advantages of interpreting test items in terms of SNR were described in [3]. Nevertheless, \( \rho \) remains the most relevant feature within the CTT framework.
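The monotonic mapping between reliability and SNR can be made concrete with a minimal sketch. It uses the form \( \rho = SNR/(1 + SNR) \), which follows from the second expression in Eq. (2) after dividing numerator and denominator by the error variance; the function names are illustrative and do not come from the paper.

```python
# Sketch: converting between reliability (rho) and signal-to-noise ratio (SNR).
# All function names here are illustrative assumptions.


def snr_from_variances(var_true: float, var_error: float) -> float:
    """SNR as the quotient of true-score variance and error variance."""
    return var_true / var_error


def reliability_from_snr(snr: float) -> float:
    """rho = SNR / (1 + SNR), i.e. var_true / (var_true + var_error)."""
    return snr / (1.0 + snr)


def snr_from_reliability(rho: float) -> float:
    """Inverse mapping; rho = 1 (no error) would give an infinite SNR."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return rho / (1.0 - rho)


if __name__ == "__main__":
    # Reliability grows monotonically with SNR and saturates towards 1.
    for snr in (0.0, 0.5, 1.0, 4.0, 99.0):
        print(f"SNR = {snr:6.2f} -> rho = {reliability_from_snr(snr):.4f}")
```

Because the mapping is one-to-one on its domain, any ranking of items by SNR is also a ranking by reliability.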

For any given subject, the set of I individual \( x_{i} \) outcomes needs to be aggregated to form a single score. This aggregated score X is called the composite, and can be expressed in the most general way as a weighted sum of the items, with weights \( w_{i} \):

$$ X(k) = \sum\limits_{i = 1}^{I} {w_{i} } x_{i} (k) $$
(3)

The aggregate score is known to have a reduced noise variance [2, 4], a fact that makes it useful in many fields where noisy repetitions of a supposedly invariant true pattern/component are available (e.g. geodesic sensing [5], electrocardiography [6] or voiced signals [7]). Equation (3) is equivalent to the typical averaging operation if all weights are equal (for instance, set to 1/I). There is no point in giving different weights to items with equal reliabilities, so if Eq. (1) holds, averaging is the aggregate of choice. However, the \( x_{i} \) can in general be expected to have different reliabilities, resulting from different proportions between the variances of the true and error components. To reflect this, the expression for the individual items given by (1) should be revised [8]. It is common to arbitrarily assign unit variances to the true and error components, and to account for the differences in individual reliabilities by using different loadings (\( \beta_{i} \) and \( \varepsilon_{i} \)) for the true and error components, respectively. An offset term \( \mu_{i} \) can also be included:

$$ x_{i} (k) = \beta_{i} \tau (k) + \varepsilon_{i} e_{i} (k) + \mu_{i} $$
(4)

There are different models [8] to obtain the reliability of X, depending on the restrictions imposed on \( \beta_{i} \) (the scale on which the true score is measured), \( \varepsilon_{i} \) (the magnitude of error in the measurement), and \( \mu_{i} \) in Eq. (4). The unrestricted model for the \( x_{i} \) set is called congeneric (all items measure the true score on different scales with different errors). If \( \beta_{i} \) is assumed to be constant for all i, the model is called essentially tau-equivalent (all items measure the true score on the same scale, but with different errors). If, additionally, \( \mu_{i} \) is assumed to be zero for all i, the model is called tau-equivalent. Since reliability is based on variances and correlations, which are not influenced by mean values, the essentially tau-equivalent and tau-equivalent models yield identical estimates of composite reliability. If, in addition to constant \( \beta_{i} \) and zero \( \mu_{i} \), the value of \( \varepsilon_{i} \) is also considered constant, the model is called parallel (all items measure the true score on the same scale with the same amount of error). The parallel model is the most restrictive one, and can be completely described by Eq. (1).

The most commonly used estimate of the reliability of an unweighted composite is Cronbach’s \( \alpha \) [9], which is based on an essentially tau-equivalent model:

$$ \alpha = \frac{Ic}{(v + (I - 1)c)} $$
(5)

Here v stands for the average variance of the \( x_{i} \) items, and c for the average covariance between distinct items (same-item covariances excluded). The popularity of \( \alpha \) stems from not requiring estimates of the individual \( \rho_{i} \): under the essentially tau-equivalent model, the value in (5) can be obtained from the observed \( x_{i} \) scores alone.
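As an illustration of how Eq. (5) is evaluated from observed scores only, the following is a minimal sketch in plain Python. The data layout (`scores[k][i]` holding the score of subject k on item i), the function names, and the use of the n − 1 denominator are assumptions made here; the ratio in Eq. (5) does not depend on that denominator as long as it is the same for variances and covariances.

```python
# Sketch of Eq. (5). `scores[k][i]` is the score of subject k on item i;
# this layout and the helper names are illustrative assumptions.


def covariance(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def cronbach_alpha(scores):
    """alpha = I*c / (v + (I - 1)*c), with v the average item variance and
    c the average covariance between distinct items."""
    items = list(zip(*scores))  # one tuple of K subject scores per item
    n_items = len(items)
    if n_items < 2:
        raise ValueError("at least two items are needed")
    v = sum(covariance(x, x) for x in items) / n_items
    pairs = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)]
    c = sum(covariance(items[i], items[j]) for i, j in pairs) / len(pairs)
    return n_items * c / (v + (n_items - 1) * c)


if __name__ == "__main__":
    # Four subjects, two items (made-up numbers for illustration only).
    example = [[1, 2], [2, 1], [3, 4], [4, 3]]
    print(f"alpha = {cronbach_alpha(example):.3f}")
```

No loadings are estimated anywhere in this computation, which is the practical advantage of \( \alpha \) noted above.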

The reliability of an unweighted composite under the congeneric model is:

$$ \rho = \frac{{\left( {\sum\limits_{i = 1}^{I} {\beta_{i} } } \right)^{2} }}{{\left( {\sum\limits_{i = 1}^{I} {\beta_{i} } } \right)^{2} + \sum\limits_{i = 1}^{I} {\left( {\varepsilon_{i}^{2} } \right)} }} = \frac{{\left( {\sum\limits_{i = 1}^{I} {\beta_{i} } } \right)^{2} }}{{\left( {\sum\limits_{i = 1}^{I} {\beta_{i} } } \right)^{2} + \sum\limits_{i = 1}^{I} {\phi_{i} } }} $$
(6)

The term \( \varepsilon_{i}^{2} \) is frequently denoted as \( \phi_{i} \), the unique/error variance of the ith item. Using (6) requires going beyond the observed scores \( x_{i} \) and estimating the individual true and error loadings, which is a drawback compared to (5). These estimates can be obtained by means of covariance matrix analysis [10, 11], of the kind performed in Common Factor Analysis (CFA), in this case for a single factor, \( \tau \) [12]. For the case of a weighted composite, it has been shown [13] by algebraic methods that the weights producing maximum reliability of the composite in (3) using the items described in (4) are:

$$ w_{i} = \frac{{\beta_{i} }}{{\phi_{i} }} $$
(7)
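A small numerical sketch may help to compare Eq. (6) with the weights of Eq. (7). The weighted reliability used below, \( (\sum w_{i} \beta_{i})^{2} / ((\sum w_{i} \beta_{i})^{2} + \sum w_{i}^{2} \phi_{i}) \), is an assumption obtained here by inserting Eq. (4) into Eq. (3) with unit-variance, mutually uncorrelated true and error components; it reduces to Eq. (6) for unit weights. Function names and loading values are made up for illustration.

```python
# Sketch: composite reliability for given loadings, with unit or optimal
# weights. The weighted formula is an assumption derived from Eqs. (3)-(4).


def composite_reliability(beta, phi, weights=None):
    """Reliability of a weighted composite; unit weights reproduce Eq. (6)."""
    if weights is None:
        weights = [1.0] * len(beta)
    true_var = sum(w * b for w, b in zip(weights, beta)) ** 2
    error_var = sum(w * w * p for w, p in zip(weights, phi))
    return true_var / (true_var + error_var)


def optimal_weights(beta, phi):
    """Weights of Eq. (7): true loading divided by error variance."""
    return [b / p for b, p in zip(beta, phi)]


if __name__ == "__main__":
    beta = [0.9, 0.5, 0.3]  # hypothetical true loadings
    phi = [0.2, 0.6, 0.9]   # hypothetical error variances
    w_opt = optimal_weights(beta, phi)
    print("unit weights   :", round(composite_reliability(beta, phi), 4))
    print("optimal weights:", round(composite_reliability(beta, phi, w_opt), 4))
```

Multiplying all weights by the same constant leaves the reliability unchanged, so only the ratios between the weights matter.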

In spite of theoretically offering maximum reliability, differential weights as given by (7) are far from being widely adopted [14, 15]. A main concern is the adequacy of a single-factor CFA to estimate the loadings [16], together with reports that unity weights work rather well [17, 18]. On the other hand, the use of unity/constant weights and of composite reliability measures like \( \alpha \) is heavily criticized for leading to several misleading beliefs [19], such as “Reliability increases with I (test length)”, “Reliability increases with the individual item reliabilities” or “Reliability increases with the correlation between items”, all of which seem sound at first sight. Counterexamples to all these common-sense beliefs are given in Appendix A of [19].

In this paper, we attempt to provide the readers with analytical tools to understand the effect that the weight chosen for a particular item has on the composite’s reliability.

2 Analytical Derivations

Our focus is the effect of differential weighting of the \( x_{i} \) items on the composite’s reliability. We depart from reported results on optimal reliability in two ways:

  • First, our approach is analytical: we look for the effect that the weight of an individual item has on the resulting composite, and the optimal value of the weight w is then shown within the continuum of possible weight values (including unity, the alternative approach). This differs from the widely used algebraic solution, which provides the optimal values of all \( w_{i} \) as the solution of an equation system and thus prevents the user from grasping what is being gained and whether it is relevant.

  • Second, we perform our analysis in terms of SNRs instead of reliabilities. SNR carries no additional information compared to reliability, since they are monotonically related as shown in Eq. (2). However, their ranges (reliability values from \( 0 \ldots 1 \) map into SNR values from \( 0 \ldots \infty \)) favor visualizing the influence of \( w_{i} \) in terms of SNR. Besides, the analytical derivations of the influence of w are simpler to obtain for SNR than for \( \rho \), since we start from a simpler quotient.

Let us assume we have I + 1 items, from which we set one item apart in order to analyze the effect of adding it, with a weight, to the existing composite formed by the other I items. For all the I + 1 items we have estimates of the true and error loadings, presumably obtained by CFA. The items \( x_{i} \), 1 ≤ i ≤ I, are not necessarily weighted in the existing composite of our assumption. For the extended composite, with I + 1 items including the newly added item weighted by w, \( SNR^{I + 1} \) is:

$$ SNR^{I + 1} = \frac{{\left( {\sum\limits_{i = 1}^{I} {\beta_{i} } + w\beta_{I + 1} } \right)^{2} }}{{\sum\limits_{i = 1}^{I} {\phi_{i} } + w^{2} \phi_{I + 1} }} = \frac{{T^{I} + 2w\beta_{I + 1} \sum\limits_{i = 1}^{I} {\beta_{i} } + w^{2} \beta_{I + 1}^{2} }}{{E^{I} + w^{2} \phi_{I + 1} }} $$
(8)

For simplicity, \( T^{I} \) denotes the true variance in the original (I items) composite, and \( E^{I} \) its error variance. For w = 0 (i.e. not adding item I + 1), \( SNR^{I + 1} \) reduces to the original \( SNR^{I} = T^{I} / E^{I} \). It follows directly from Eq. (8) that \( SNR^{I + 1} \) has no singularities, and has a double zero located at a negative value of w (assuming positive loadings):

$$ w^{SNR = 0} = - \frac{{\sum\limits_{i = 1}^{I} {\beta_{i} } }}{{\beta_{I + 1} }} = - \frac{{\sqrt {T^{I} } }}{{\beta_{I + 1} }} $$
(9)
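Equations (8) and (9) are easy to explore numerically. The sketch below evaluates the SNR of the extended composite as a function of w for an unweighted existing composite; the function names and the loading values are hypothetical and serve only as an illustration.

```python
# Sketch of Eqs. (8) and (9) for an existing composite with unit weights.
# Loadings and names are illustrative assumptions, not data from the paper.


def snr_extended(w, beta, phi, beta_new, phi_new):
    """Eq. (8): SNR after adding one item with weight w to the composite
    whose items have true loadings `beta` and error variances `phi`."""
    signal = (sum(beta) + w * beta_new) ** 2
    noise = sum(phi) + w * w * phi_new
    return signal / noise


def zero_weight(beta, beta_new):
    """Eq. (9): weight at which the added item cancels the true component."""
    return -sum(beta) / beta_new


if __name__ == "__main__":
    beta, phi = [0.8, 0.7, 0.6], [0.5, 0.6, 0.7]
    beta_new, phi_new = 0.5, 0.4
    for w in (zero_weight(beta, beta_new), -1.0, 0.0, 1.0, 2.0):
        value = snr_extended(w, beta, phi, beta_new, phi_new)
        print(f"w = {w:6.2f} -> SNR = {value:.4f}")
```

At w = 0 the function returns the SNR of the original composite, and at the weight of Eq. (9) it returns zero, as stated in the text.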

The double zero is also the location of the minimum value of \( SNR^{I + 1} \), but we are actually interested in the location of the maximum, corresponding to the optimal value of w. The derivative of \( SNR^{I + 1} \), as expressed in (8), with respect to w is:

$$ SNR^{{I + 1^{\prime } }} = 2\frac{{ - w^{2} \phi_{I + 1} \beta_{I + 1} \sum\limits_{i = 1}^{I} {\beta_{i} } + w\left( {E^{I} \beta_{I + 1}^{2} - \phi_{I + 1} T^{I} } \right) + E^{I} \beta_{I + 1} \sum\limits_{i = 1}^{I} {\beta_{i} } }}{{\left( {E^{I} + w^{2} \phi_{I + 1} } \right)^{2} }} $$
(10)

The roots of Eq. (10) are those of the quadratic polynomial in its numerator, with values:

$$ w_{a} = - \frac{{\sum\limits_{i = 1}^{I} {\beta_{i} } }}{{\beta_{I + 1} }}\quad \quad w_{b} = \frac{{E^{I} \beta_{I + 1} }}{{\phi_{I + 1} \sum\limits_{i = 1}^{I} {\beta_{i} } }} $$
(11)
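The two roots can be read directly from a factorization of the numerator of Eq. (10). As a worked step, with the shorthand \( S = \sum_{i = 1}^{I} \beta_{i} \) (introduced here only for brevity, so that \( T^{I} = S^{2} \)):

$$ - w^{2} \phi_{I + 1} \beta_{I + 1} S + w\left( {E^{I} \beta_{I + 1}^{2} - \phi_{I + 1} S^{2} } \right) + E^{I} \beta_{I + 1} S = \left( {S + w\beta_{I + 1} } \right)\left( {E^{I} \beta_{I + 1} - w\phi_{I + 1} S} \right) $$

The first factor vanishes at \( w_{a} \) and the second at \( w_{b} \).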

The root \( w_{a} \) is the already known position of the double zero, the minimum value of \( SNR^{I + 1} \) given by Eq. (9), while \( w_{b} \) corresponds to the maximum value of \( SNR^{I + 1} \), i.e. the optimal value of w. Substituting \( w_{b} \) as given by Eq. (11) in the expression for \( SNR^{I + 1} \) given by Eq. (8) yields, after some manipulation:

$$ SNR^{I + 1} \left( {w_{b} } \right) = \frac{{T^{I} }}{{E^{I} }} + \frac{{\beta_{I + 1}^{2} }}{{\phi_{I + 1} }} = SNR^{I} + SNR^{item} $$
(12)
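The additivity stated by Eq. (12) can be checked numerically. The following sketch, with hypothetical loadings and illustrative function names, computes the optimal weight for an item added to an unweighted composite and compares the resulting SNR with the sum of the two individual SNRs.

```python
# Sketch: numerical check of Eq. (12) using the optimal weight w_b of
# Eq. (11). Loadings and names are illustrative assumptions.


def snr_extended(w, beta, phi, beta_new, phi_new):
    """Eq. (8) for an existing composite with unit weights."""
    return (sum(beta) + w * beta_new) ** 2 / (sum(phi) + w * w * phi_new)


def optimal_added_weight(beta, phi, beta_new, phi_new):
    """w_b: error variance of the composite times the new true loading,
    divided by the new error variance times the summed true loadings."""
    return sum(phi) * beta_new / (phi_new * sum(beta))


if __name__ == "__main__":
    beta, phi = [0.8, 0.7, 0.6], [0.5, 0.6, 0.7]
    beta_new, phi_new = 0.5, 0.4
    w_b = optimal_added_weight(beta, phi, beta_new, phi_new)
    snr_composite = sum(beta) ** 2 / sum(phi)
    snr_item = beta_new ** 2 / phi_new
    print("w_b                 :", w_b)
    print("SNR at w_b          :", snr_extended(w_b, beta, phi, beta_new, phi_new))
    print("SNR^I + SNR^item    :", snr_composite + snr_item)
    print("SNR with unit weight:", snr_extended(1.0, beta, phi, beta_new, phi_new))
```

Comparing the last two printed values gives a direct measure of what is lost by using a unit weight instead of the optimal one for that particular item.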

Equation (12) shows that the maximum increment that an item can produce on a composite’s original \( SNR^{I} \) is limited to its own \( SNR^{item} \). A final element of interest is the intersection of the \( SNR^{I + 1} \) curve with the original \( SNR^{I} \) value:

$$ SNR^{I + 1} - SNR^{I} = \frac{{w\left( {w\left( {\beta_{I + 1}^{2} E^{I} - \phi_{I + 1} T^{I} } \right) + 2\beta_{I + 1} E^{I} \sum\limits_{i = 1}^{I} {\beta_{i} } } \right)}}{{\left( {E^{I} + w^{2} \phi_{I + 1} } \right)E^{I} }} = 0 $$
(13)

One of the solutions is w = 0 (the no-addition case), while the other occurs at:

$$ w^{{SNR_{I + 1} = SNR_{I} }} = \frac{{2\beta_{I + 1} E^{I} \sum\limits_{i = 1}^{I} {\beta_{i} } }}{{\left( {\phi_{I + 1} T^{I} - \beta_{I + 1}^{2} E^{I} } \right)}} $$
(14)

The sign of this weight depends on the denominator, and will be positive if:

$$ \frac{{T^{I} }}{{E^{I} }} > \frac{{\beta_{I + 1}^{2} }}{{\phi_{I + 1} }}\quad {\text{i}} . {\text{e}} . :\quad SNR^{I} > SNR^{item} $$
(15)
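The crossing point of Eq. (14) and the sign condition of Eq. (15) can be illustrated with a short sketch. Function names and loading values are hypothetical; returning `None` when both SNRs coincide is a choice made here for the degenerate case in which the denominator of Eq. (14) vanishes.

```python
# Sketch of Eqs. (14) and (15): weight at which the extended composite has
# the same SNR as the original one. Names and values are assumptions.


def snr_extended(w, beta, phi, beta_new, phi_new):
    """Eq. (8) for an existing composite with unit weights."""
    return (sum(beta) + w * beta_new) ** 2 / (sum(phi) + w * w * phi_new)


def crossing_weight(beta, phi, beta_new, phi_new):
    """Eq. (14); returns None when SNR^I equals SNR^item (no finite root)."""
    s = sum(beta)
    e = sum(phi)
    denominator = phi_new * s * s - beta_new ** 2 * e
    if denominator == 0:
        return None
    return 2.0 * beta_new * e * s / denominator


if __name__ == "__main__":
    # Strong composite, weak item: the crossing occurs at a positive weight.
    print(crossing_weight([0.8, 0.7, 0.6], [0.5, 0.6, 0.7], 0.5, 0.4))
    # Weak composite, strong item: the crossing occurs at a negative weight.
    print(crossing_weight([0.3], [0.9], 0.9, 0.2))
```

In the first case, any positive weight beyond the crossing point makes the extended composite worse than the original one.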

According to Eqs. (14) and (15), the \( SNR^{I + 1} \) curve behaves differently depending on whether or not the SNR of the item to be added (\( SNR^{item} \)) exceeds the SNR of the existing composite (\( SNR^{I} \)). With the expressions for all the relevant points obtained, we can plot representative examples of both cases, shown in the top and bottom panes of Fig. 1.

Fig. 1. Behavior of the resulting \( SNR^{I + 1} \) after adding an item to a composite. Top: case in which the additional intersect \( SNR^{I + 1} = SNR^{I} \) occurs for positive w; bottom: case in which the intersect occurs for negative w. The dotted horizontal line corresponds to the \( SNR^{I} \) of the composite, and the dashed horizontal line to \( SNR^{item} \). Arrows are placed at the optimum value of w to show that the resulting \( SNR^{I + 1} \) at that point is the sum of the levels of both horizontal lines.

Both panes depict the behaviors analytically described above. The graphs were obtained by interchanging the values 11.5 and 31.5 between \( SNR^{I} \) and \( SNR^{item} \). The horizontal axis has been compressed. A third behavior, not shown in Fig. 1, occurs for \( SNR^{I} = SNR^{item} \) (i.e. with both dotted and dashed lines at the same level) and leaves w = 0 as the only intersect.

2.1 Equivalence with the Previous Algebraic Solution

The expression we obtained for the optimal weight of an item to be added to a composite, i.e. \( w_{b} \) given by Eq. (11), can be shown to be equivalent to the optimal weights described in the literature, given by Eq. (7), as long as all the items already in the composite have been optimally weighted. The derivation above did not consider any weights for those items. For convenience, we rewrite the expression for \( w_{b} \) here:

$$ w_{b} = w_{I + 1} = \frac{{E^{I} \beta_{I + 1} }}{{\phi_{I + 1} \sum\limits_{i = 1}^{I} {\beta_{i} } }} = \frac{{\sqrt {\frac{{T_{item} }}{{T^{I} }}} }}{{\frac{{E_{item} }}{{E^{I} }}}} $$
(16)

By design, we require the prior existence of true and error loadings, since our procedure was conceived to add an item to an existing composite. The term \( E^{I} \) and the summation of \( \beta_{i} \) are not defined for a fresh start from zero (i.e. I = 0). However, we can choose any value for \( w_{1} \) and proceed to find \( w_{2} \) according to (16), since the choice of an initial weight can only affect the rest by a proportionality constant. It is natural to choose \( w_{1} = \beta_{1} / \phi_{1} \), for two reasons: it results from ignoring the undefined terms in Eq. (16), and it is also the known optimal weight from the algebraic solution. Once this value is assigned to \( w_{1} \), we can check the result of Eq. (16) for \( w_{2} \):

$$ w_{2} = \frac{{E^{1} \beta_{2} }}{{\phi_{2} \sum\limits_{i = 1}^{1} {w_{i} } \beta_{i} }} = \frac{{w_{1}^{2} \phi_{1} \beta_{2} }}{{\phi_{2} w_{1} \beta_{1} }} = \frac{{\beta_{2} }}{{\phi_{2} }} $$
(17)

This result agrees with Eq. (7), and incrementally finding the rest of the weights, up to I + 1 items, produces the same result. Starting from a different value for the initial weight \( w_{1} \) only rescales all the weights by a common proportionality constant.
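The incremental procedure can be sketched as follows. The weighted error variance and weighted sum of loadings used at each step generalize the pattern of Eq. (17) to composites of more than one item; this generalization, the function name, and the loading values are assumptions made for illustration.

```python
# Sketch: building the weights one item at a time and comparing them with
# beta_i / phi_i from Eq. (7). Names and loadings are hypothetical.


def incremental_weights(beta, phi, w1):
    """Start from weight w1 for the first item, then weight each further
    item by (error variance of the current weighted composite * beta_new)
    / (phi_new * weighted sum of the current true loadings)."""
    weights = [w1]
    for b_new, p_new in zip(beta[1:], phi[1:]):
        error_var = sum(w * w * p for w, p in zip(weights, phi))
        loading_sum = sum(w * b for w, b in zip(weights, beta))
        weights.append(error_var * b_new / (p_new * loading_sum))
    return weights


if __name__ == "__main__":
    beta = [0.9, 0.5, 0.3, 0.7]
    phi = [0.2, 0.6, 0.9, 0.4]
    print("from beta_1/phi_1:", incremental_weights(beta, phi, beta[0] / phi[0]))
    print("Eq. (7)          :", [b / p for b, p in zip(beta, phi)])
    print("from w1 = 1      :", incremental_weights(beta, phi, 1.0))
```

With \( w_{1} = 1 \) the resulting weights differ from those of Eq. (7) only by a common factor, which does not change the SNR of the composite.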

3 Discussion and Final Remarks

The analytical approach developed here provides more insight to the researcher than the previously reported batch-oriented, simultaneous solution for the values of all the weights. With the set of equations provided, the researcher can evaluate whether using the optimal instead of the unitary weight modifies the value of SNR in a significant way (so as to trust the loadings provided by CFA), and whether the gain in SNR that an item can provide to the composite will be significant (so as to keep it in or out of the composite), among other uses. In particular, the functional dependency obtained for \( SNR^{I + 1} (w_{I + 1} ) \) and depicted in Fig. 1 helps to understand the causes of the apparently contradictory examples provided in [19]. The corresponding explanations are not provided here for space reasons, but the readers can readily obtain them by means of Eq. (10) and the behaviors depicted in Fig. 1.

Differential weighting strategies in composites are by no means exclusive to the behavioral sciences. Optimal weighting of repetitive patterns immersed in noise is a goal in many areas in order to recover the true component. Different strategies for recovering this pattern have been evaluated in many engineering fields, like geodesics [5], acoustics [20], evoked potentials [21], glottal pulses [22], electrocardiography [6], or perceptual judgments of pathological symptoms [23].

The analysis of the influence of differential weighting on the resulting SNR (and consequently on the reliability of the composite/pattern recovered), and its visual comparison with unitary weighting, made possible by the procedure described here, can be of great help for researchers in any of those areas.

Future research by the authors will address the development of procedures for selecting items/judges to be discarded from the composite.