Uncertainties in Interpolated Spectral Data

Interpolation is often used to improve the accuracy of integrals over spectral data convolved with various response functions or power distributions. Formulae are developed for propagation of uncertainties through the interpolation process, specifically for Lagrangian interpolation increasing a regular data set by factors of 5 and 2, and for cubic-spline interpolation. The interpolated data are correlated; these correlations must be considered when combining the interpolated values, as in integration. Examples are given using a common spectral integral in photometry. Correlation coefficients are developed for Lagrangian interpolation where the input data are uncorrelated. It is demonstrated that in practical cases, uncertainties for the integral formed using interpolated data can be reliably estimated using the original data.


Introduction
Measurements of spectral irradiance, spectral responsivity or spectral reflectance are often made for a limited set of wavelengths and then used to calculate weighted spectral sums for photometry, colorimetry or filter radiometry. It is often necessary to interpolate the spectral data to a finer grid to avoid errors arising from the discrete approximation used to estimate the integral where the weighting function varies strongly between the wavelengths at which the measurements were made [1]. Interpolated values are then correlated to the original data and to nearby interpolated points; unless these correlations are taken into account, uncertainty calculations will give misleading results, generally underestimating errors in the spectral sums by significant amounts.
Interpolation may also be required when reference standards are provided for a limited set of wavelengths. Many of the primary reference standards in the national metrology institutes are derived in a functional form, and can be calculated on as fine a wavelength grid as required, along with all the necessary correlations [2]. However, calibrations of client lamps or detectors involve a transfer, by comparison, to artifacts that do not necessarily show a spectral variation that is easily modeled, and in the interests of reducing costs may be provided at a limited number of wavelengths. The reference data at different wavelengths may also be correlated. While the primary reference standards are often strongly correlated between wavelengths, the transfer process itself adds uncertainty that is generally random and often reduces the correlations to negligible levels [2]. Where data are available at sufficient wavelengths to avoid errors due to the sum approximation to the integral, it is preferable to interpolate reference tables, such as the photometric response function V(λ) and the colorimetric tristimulus response functions [3], to the wavelengths at which measurements are taken, because those tables contain no uncertainty. However, interpolation of measurement data is often applied. One reason is that software for a calculation may require data at a particular interval; another is that instrument measurement programs may provide limited sets of data.
Following a brief description of uncertainty propagation, this paper is divided into two sections. The first covers Lagrangian interpolation, the second cubic spline interpolation; these are the two most commonly used interpolation methods. In each section, interpolation from a data set which is itself correlated is considered. Various simplifications for practical applications are made, and examples are presented. A conclusion is that in practical terms, uncertainty can be accurately derived from the original data set without a complex calculation of correlations.

Propagation of Uncertainty
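Uncertainty propagation is described in detail in the ISO Guide to the Expression of Uncertainty in Measurement [4]. The uncertainty in a quantity $y$ formed by combining measured quantities $x_i$ through the relationship $y = f(x_1, x_2, \ldots, x_N)$ is given by

$$u^2(y) = \sum_{i=1}^{N} \left(\frac{\partial f}{\partial x_i}\right)^2 u^2(x_i) + 2\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} \frac{\partial f}{\partial x_i}\frac{\partial f}{\partial x_j}\, u(x_i, x_j), \tag{1}$$

where $u(x_i)$ is the uncertainty in $x_i$ and $u(x_i, x_j)$ is the covariance between $x_i$ and $x_j$. For uncorrelated input quantities, the covariance between pairs of variables is zero and Eq. (1) reduces to the "sum of squares" commonly applied. The derivatives $\partial f / \partial x_i$ are sensitivity coefficients for the dependence of $y$ on the various measured quantities. Given that $u^2(x_i) \equiv u(x_i, x_i)$, Eq. (1) can be expressed as

$$u^2(y) = \mathbf{f}_x^{T}\, \mathbf{U}_x\, \mathbf{f}_x, \tag{2}$$

where

$$\mathbf{f}_x = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_N}\right)^{T} \tag{3}$$

is a column vector of sensitivity coefficients ($T$ indicates the transpose) and

$$\mathbf{U}_x = \begin{pmatrix} u^2(x_1) & u(x_1, x_2) & \cdots & u(x_1, x_N) \\ u(x_2, x_1) & u^2(x_2) & \cdots & u(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ u(x_N, x_1) & u(x_N, x_2) & \cdots & u^2(x_N) \end{pmatrix} \tag{4}$$

is the $N \times N$ uncertainty matrix.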
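As an illustration of the matrix form in Eq. (2), the following minimal Python sketch evaluates the quadratic form for a two-variable sum. The function name `combined_variance` and all numerical values are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def combined_variance(sens, cov):
    """Return u^2(y) = f^T U f for a sensitivity vector f and uncertainty matrix U."""
    f = np.asarray(sens, dtype=float)
    U = np.asarray(cov, dtype=float)
    return float(f @ U @ f)

# Hypothetical case: y = x1 + x2, so both sensitivity coefficients are 1,
# and u(x1) = u(x2) = 1.
f = [1.0, 1.0]
U_uncorrelated = np.diag([1.0, 1.0])
U_correlated = np.array([[1.0, 0.5],
                         [0.5, 1.0]])   # covariance 0.5 between x1 and x2

var_uncorrelated = combined_variance(f, U_uncorrelated)  # plain sum of squares
var_correlated = combined_variance(f, U_correlated)      # includes 2 * u(x1, x2)
print(var_uncorrelated, var_correlated)
```

With a positive covariance the combined variance exceeds the sum of squares, which is the effect that matters once interpolated values are combined.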
Interpolation of spectral data is generally performed to produce a set of values $(p_k, x_k)$ from the set $(y_i, x_i)$, where $x_i$ is the independent variable (generally wavelength in radiometry). Quantities in the set $(p_k, x_k)$ that depend on the same $(y_i, x_i)$ are correlated through this common dependence. The covariance between two values $p_k$ and $p_m$ (and, when $k = m$, the square of the uncertainty) is given by

$$u(p_k, p_m) = \sum_{i}\sum_{j} \frac{\partial p_k}{\partial y_i}\frac{\partial p_m}{\partial y_j}\, u(y_i, y_j). \tag{5}$$

In matrix form, this is simply expressed as

$$\mathbf{U}_p = \mathbf{F}^{T}\, \mathbf{U}_y\, \mathbf{F}, \qquad F_{ik} = \frac{\partial p_k}{\partial y_i}. \tag{6}$$

It is sometimes convenient to use correlation coefficients rather than covariances, defined as

$$r(p_k, p_m) = \frac{u(p_k, p_m)}{u(p_k)\, u(p_m)}. \tag{7}$$

A matrix of correlation coefficients is square, symmetric about the diagonal and has value 1 in the diagonal elements.
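The propagation of an input uncertainty matrix to a set of interpolated values, and the conversion to correlation coefficients, can be sketched as follows for the simplest case: one value formed midway between two uncorrelated inputs by linear interpolation. The matrix `W` holds the sensitivity coefficients ∂p_k/∂y_i with one row per output value; the function names and the unit input uncertainties are illustrative assumptions.

```python
import numpy as np

def propagate_covariance(W, U_y):
    """Covariance of p = W @ y, where row k of W holds d p_k / d y_i."""
    W = np.asarray(W, dtype=float)
    return W @ np.asarray(U_y, dtype=float) @ W.T

def correlation_matrix(U):
    """Convert an uncertainty (covariance) matrix to correlation coefficients."""
    u = np.sqrt(np.diag(U))
    return U / np.outer(u, u)

# Output set: y1, the midpoint (y1 + y2) / 2, and y2.
W = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
U_y = np.eye(2)                      # assumed: unit, uncorrelated input uncertainties

U_p = propagate_covariance(W, U_y)
r = correlation_matrix(U_p)
print(np.round(r, 4))
```

The midpoint is correlated with both of its neighbours, while the two original values remain uncorrelated with each other.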

Lagrangian Interpolation
We have a tabulated function $y_i$ at values $x_i$:

$$(x_i, y_i), \qquad i = 1, 2, \ldots, N. \tag{8}$$

For $x$ in the range $x_m$ to $x_{m+n}$, the formula for Lagrangian interpolation is [5]

$$y(x) = \sum_{j} w_j\, y_j, \qquad w_j = \prod_{k \neq j} \frac{x - x_k}{x_j - x_k}, \tag{9}$$

where the sum and the product run over the $n$ tabulated values used. Equation (9) represents an $(n-1)$th order polynomial fitted through the $n$ original values. Interpolated data are formed as a linear combination of nearby existing data. Sensitivity coefficients for the dependence of the interpolated data on the input data are simply the weights $w_j$ in Eq. (9). Covariances between the output values, that is, the original input values plus the interpolated values, take several forms. Correlations present in the original values remain. The values newly formed by interpolation are correlated both to the input values forming them (and through them to the remaining input values if correlations are present in the original set) and to any new values formed from common input values. All of these covariances, including the uncertainty of the interpolated values, can be calculated through Eq. (6).
In many instances, the input data $y_i$ are present at regular values of $x_i$ and, further, the output data are also required at regular intervals. Interpolation is usually then performed by running Eq. (9) along the data set, and the weights are determined by the interval spacings only and can be calculated prior to the interpolation. This was clearly demonstrated by Savitzky and Golay [6] in the development of their algorithms for smoothing and differentiation of spectral data, where polynomial expressions of various order are fitted to regularly-spaced data; in those cases, the coefficients are determined as fixed linear combinations of the input dependent data. The arguments presented here for Lagrangian interpolation can easily be extended to cover smoothing of data using the Savitzky-Golay routines. Two-point Lagrangian interpolation forming a new value in the center of the existing values has weights $(w_1, w_2) = (½, ½)$, equivalent to a linear interpolation.
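Because the weights depend only on the spacings, they can be tabulated once for a regular grid. The sketch below evaluates the standard Lagrange weight product with exact rational arithmetic; the function name `lagrange_weights` and the unit node spacing are illustrative assumptions.

```python
from fractions import Fraction

def lagrange_weights(nodes, x):
    """Weights w_j such that the interpolated value is sum_j w_j * y_j at abscissa x."""
    weights = []
    for j, xj in enumerate(nodes):
        w = Fraction(1)
        for k, xk in enumerate(nodes):
            if k != j:
                w *= Fraction(x - xk) / Fraction(xj - xk)
        weights.append(w)
    return weights

# Two-point interpolation to the centre of an interval: linear interpolation.
two_point = lagrange_weights([0, 1], Fraction(1, 2))

# Four-point interpolation to the centre of the middle interval.
four_point = lagrange_weights([0, 1, 2, 3], Fraction(3, 2))

print(two_point)
print(four_point)
```

The weights always sum to one, so a constant input is reproduced exactly whatever the position of the interpolated point.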
Propagation of uncertainty through two common examples of Lagrangian interpolation used in photometry is now discussed. In both of these we consider the calculation of illuminance response to CIE illuminant D65, a tabulated distribution carrying no uncertainty, for a photometer whose spectral response is a close approximation to V(λ), measured at different spectral intervals and where the measured response values are uncorrelated. These distributions are shown in Fig. 1 for a wavelength interval of 5 nm. As the response function tapers to zero at each end, the illuminance response is given as

$$R_v = \sum_i R_i\, E_{\mathrm{D65},i}\, \Delta\lambda, \tag{10}$$

where $R_i$ is the photometer response and $E_{\mathrm{D65},i}$ is the illuminant value at the $i$th wavelength, respectively, and $\Delta\lambda$ is the 5 nm wavelength separation between the values.
For uncorrelated spectral response values, the uncertainty in $R_v$ is given by

$$u^2(R_v) = \sum_i E_{\mathrm{D65},i}^2\, u^2(R_i)\, (\Delta\lambda)^2. \tag{11}$$

For values as tabulated by CIE [3], the value of $R_v$ is 10567.41, with an uncertainty of 19.60 (relative uncertainty 0.1855 %) if the responsivity values have a relative uncertainty of 1 % and are uncorrelated. (Note that more significant figures than normal practice are given for the uncertainty, for the purpose of comparison.)
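The weighted sum and its uncertainty for uncorrelated responsivities can be sketched as below. The CIE tables are not reproduced here: the Gaussian response and the sloping illuminant are synthetic stand-ins (assumptions), so the numbers will not match those quoted above, but the dependence of the relative uncertainty on the sampling interval can still be seen.

```python
import numpy as np

def spectral_sum(resp, illum, step, rel_u=0.01):
    """Return (R_v, u(R_v)) for uncorrelated responsivities with relative uncertainty rel_u."""
    terms = resp * illum * step
    value = terms.sum()
    u = np.sqrt(np.sum((rel_u * terms) ** 2))
    return value, u

def synthetic_response(lam):
    # Assumed stand-in for a V(lambda)-like photometer response.
    return np.exp(-0.5 * ((lam - 555.0) / 40.0) ** 2)

def synthetic_illuminant(lam):
    # Assumed smooth stand-in for a tabulated illuminant.
    return 100.0 * (1.0 - 0.001 * (lam - 555.0))

results = {}
for step in (5.0, 10.0):
    lam = np.arange(360.0, 830.0 + step / 2, step)
    results[step] = spectral_sum(synthetic_response(lam), synthetic_illuminant(lam), step)

rel5 = results[5.0][1] / results[5.0][0]
rel10 = results[10.0][1] / results[10.0][0]
print(rel5, rel10, rel10 / rel5)
```

For this smooth synthetic curve, halving the number of uncorrelated points raises the relative uncertainty of the sum by very nearly the square root of 2.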

Photometer Measured at 10 nm Intervals, Interpolated to 5 nm
The spectral integral Eq. (10) for input values on a 10 nm grid evaluates to 10568.18, a small change compared to the 5 nm data due to the discrete approximation to the integral; the relative uncertainty for uncorrelated spectral response values with a relative uncertainty of 1 % becomes 0.2623 %, the expected $\sqrt{2}$ increase compared to the response measured at 5 nm intervals. We wish to interpolate the photometer response with a four-point Lagrangian function to data on a 5 nm interval. The weights for Eq. (9) are then

$$(w_1, w_2, w_3, w_4) = \left(-\tfrac{1}{16},\ \tfrac{9}{16},\ \tfrac{9}{16},\ -\tfrac{1}{16}\right) \tag{12}$$

and, as the input data are uncorrelated, the uncertainty for an interpolated value is given by

$$u^2(y) = \tfrac{1}{256}\left[u^2(y_1) + 81\,u^2(y_2) + 81\,u^2(y_3) + u^2(y_4)\right], \tag{13}$$

where $y_1, \ldots, y_4$ are the four input values spanning the interpolated point. In any given interval spanning four input values, the uncertainty value $u(y_i)$ is approximately constant, and the interpolated values have an uncertainty approximately 80 % of that of the original values in that range. If we ignore the correlations that have been introduced, the relative uncertainty of the integral evaluates as 0.17 %, low compared to the original value and clearly incorrect, as it would imply that the uncertainty could be reduced to zero by repeated interpolation.
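The 80 % figure follows directly from the four-point midpoint weights (-1/16, 9/16, 9/16, -1/16) when the four inputs are uncorrelated. A minimal sketch, with the function name and the unit uncertainties as illustrative assumptions:

```python
import math

# Four-point Lagrange weights for a value midway between the two central inputs.
weights = [-1 / 16, 9 / 16, 9 / 16, -1 / 16]

def interpolated_uncertainty(u_inputs, w=weights):
    """Uncertainty of sum_j w_j * y_j for uncorrelated inputs with uncertainties u_inputs."""
    return math.sqrt(sum((wj * uj) ** 2 for wj, uj in zip(w, u_inputs)))

# Equal input uncertainties: the result is sqrt(164) / 16 times the input value.
factor = interpolated_uncertainty([1.0, 1.0, 1.0, 1.0])
print(round(factor, 4))
```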
A full correlation matrix for the output values is formed as follows. First the interpolation is performed, giving an output set in which the original input values lie at odd index i and the interpolated values at even i. Uncertainties for these values are known for the original data and for the interpolated data from Eq. (13), and these are used to populate the diagonal elements of the uncertainty matrix $U_y$. We then have to populate the elements to the right of the diagonal only, before filling to the left of the diagonal by symmetry. As the original input values (now at i odd) are uncorrelated, these covariances follow from Eq. (5) and the weights of Eq. (12), with expressions that hold for all i and further expressions for i even (interpolated values), Eqs. (14) to (17). For completeness, similar expressions were applied for the values formed by linear interpolation in the first and last intervals. The uncertainty calculated for the integral including all these correlations was then exactly that calculated with the original data for the 10 nm grid.
One further simplification can be made for uncorrelated input values. In practical terms, nearby values have the same uncertainty. Hence the sets of Eqs. (14) to (17) can be reduced to a matrix of correlation coefficients determined purely from the weights, Eq. (18), where the first row corresponds to an interpolated value. Uncertainty for the integral using the interpolated data set is then found by modifying the sensitivity column vector to include the uncertainty at each value, and then performing the matrix multiplication Eq. (2). A negligible change relative to the true values is due to averaging through regions where the response is changing rapidly; while the relative uncertainty in these regions is constant, the absolute value is not. Table 1 shows uncertainties calculated for all the options discussed in this section. It can be seen that proper accounting for the correlations introduced by the interpolation reproduces the uncertainty calculated for the input values alone, and that using the correlation coefficients provides a practical calculation of the uncertainty matrix.
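The procedure can be sketched numerically by writing the interpolation as a matrix acting on the input values and propagating the input uncertainty matrix through it. The response and illuminant below are synthetic stand-ins (assumptions), the function name is hypothetical, and rows are indexed from zero, so original values sit at even rows here rather than at odd i.

```python
import numpy as np

def midpoint_interpolation_matrix(n):
    """Map n inputs to 2n-1 outputs: inputs at even rows, interpolated midpoints at odd rows."""
    W = np.zeros((2 * n - 1, n))
    for i in range(n):
        W[2 * i, i] = 1.0                       # original values pass through
    for i in range(n - 1):                      # midpoint between inputs i and i+1
        if 1 <= i <= n - 3:
            W[2 * i + 1, i - 1:i + 3] = [-1 / 16, 9 / 16, 9 / 16, -1 / 16]
        else:                                   # first and last intervals: linear
            W[2 * i + 1, i:i + 2] = [0.5, 0.5]
    return W

def illum(lam):
    return 1.0 + 0.001 * (lam - 555.0)          # assumed smooth illuminant

lam10 = np.arange(360.0, 835.0, 10.0)
lam5 = np.arange(360.0, 832.5, 5.0)
resp10 = np.exp(-0.5 * ((lam10 - 555.0) / 40.0) ** 2)   # assumed response

W = midpoint_interpolation_matrix(lam10.size)
U_y = np.diag((0.01 * resp10) ** 2)             # 1 % relative, uncorrelated
U_p = W @ U_y @ W.T                             # full uncertainty matrix of the 5 nm set

f10 = illum(lam10) * 10.0                       # sensitivities of the sum on each grid
f5 = illum(lam5) * 5.0
u_original = np.sqrt(f10 @ U_y @ f10)
u_full = np.sqrt(f5 @ U_p @ f5)                 # correlations included
u_naive = np.sqrt(np.sum(f5 ** 2 * np.diag(U_p)))   # correlations ignored

# With equal input uncertainties the correlation coefficients depend on the weights only.
C = W @ W.T
r = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
k = 21                                          # an interior interpolated value
r_interp = r[k, k:k + 7]                        # interpolated value and the six that follow
r_input = r[k + 1, k + 1:k + 5]                 # input value and the three that follow
print(u_full / u_original, u_naive / u_original)
print(np.round(r_interp, 4), np.round(r_input, 4))
```

In this sketch the uncertainty with all correlations included agrees with that of the original 10 nm data, while ignoring the correlations gives a value roughly two thirds as large.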

Photometer Measured at 5 nm Intervals, Interpolated to 1 nm
… a 1 nm grid with a cubic-spline routine), reduces to 0.074 %, a value too low by the order of $\sqrt{5}$. The integral itself has the same value as that shown in Table 1 for the 5 nm grid.
For a four-point Lagrange interpolation adding four values between each of the input values, correlations in the output set extend over the 19 values following an input value. The matrix of correlation coefficients is shown as the transpose relative to Eq. (18) in the interests of printing; that is, it is equivalent to filling to the lower left of the diagonal of the correlation matrix prior to filling the upper right by symmetry. Beginning at a column corresponding to an input value, the matrix of correlation coefficients is given by Eq. (21). A number of these, for the values furthest from the input set, are negligible. The relative uncertainty of the integral for the interpolated data set calculated with these correlation coefficients is 0.197 %, equivalent in practical terms to that calculated using the original 5 nm data set and shown in Table 1.
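The extent of these correlations can be checked with a short sketch that builds the five-fold interpolation as a matrix and forms correlation coefficients for equal, uncorrelated input uncertainties. The function names, the 12-point grid and the use of linear interpolation in the end intervals are assumptions made for illustration.

```python
import numpy as np

def lagrange_weights(nodes, x):
    """Lagrange weights for interpolation at x from the given nodes."""
    nodes = np.asarray(nodes, dtype=float)
    w = np.ones(nodes.size)
    for j in range(nodes.size):
        for k in range(nodes.size):
            if k != j:
                w[j] *= (x - nodes[k]) / (nodes[j] - nodes[k])
    return w

def fivefold_matrix(n):
    """Map n regular inputs to 5(n-1)+1 outputs; inputs are kept at rows 0, 5, 10, ..."""
    W = np.zeros((5 * (n - 1) + 1, n))
    for i in range(n):
        W[5 * i, i] = 1.0
    for i in range(n - 1):
        for s in range(1, 5):
            t = s / 5.0                          # fractional position in the interval
            if 1 <= i <= n - 3:
                W[5 * i + s, i - 1:i + 3] = lagrange_weights([-1, 0, 1, 2], t)
            else:                                # end intervals: linear (assumption)
                W[5 * i + s, i:i + 2] = [1.0 - t, t]
    return W

W = fivefold_matrix(12)
C = W @ W.T                                      # covariance for unit input uncertainties
r = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))

k0 = 20                                          # row of an interior input value
block = r[k0:k0 + 5, k0:k0 + 20]                 # input value and the four values after it
first_factor = np.sqrt(C[k0 + 1, k0 + 1])        # uncertainty factor one step from an input
print(np.round(block.T, 4))
```

For the input value and the four interpolated values after it, no coefficient is nonzero beyond the 19th position following the input value, and the most distant ones are small.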

Cubic-Spline Uncertainty Propagation
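For our set of data Eq. (8), cubic-spline interpolation [7] calculates a value $y$ at $x$ in the interval $x_i$ to $x_{i+1}$ as

$$y = A\,y_i + B\,y_{i+1} + C\,y''_i + D\,y''_{i+1}, \tag{22}$$

where

$$A = \frac{x_{i+1} - x}{x_{i+1} - x_i}, \tag{23}$$

$$B = 1 - A = \frac{x - x_i}{x_{i+1} - x_i}, \tag{24}$$

$$C = \tfrac{1}{6}\left(A^3 - A\right)\left(x_{i+1} - x_i\right)^2, \tag{25}$$

$$D = \tfrac{1}{6}\left(B^3 - B\right)\left(x_{i+1} - x_i\right)^2. \tag{26}$$

The first two terms of Eq. (22) represent simple linear interpolation. Including the second derivatives $y''$ yields a function that has first and second derivatives continuous at the boundaries between intervals.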
The second derivatives are unknown. The relation between them is given by

$$\frac{x_i - x_{i-1}}{6}\,y''_{i-1} + \frac{x_{i+1} - x_{i-1}}{3}\,y''_i + \frac{x_{i+1} - x_i}{6}\,y''_{i+1} = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} - \frac{y_i - y_{i-1}}{x_i - x_{i-1}}, \qquad i = 2, \ldots, N-1, \tag{27}$$

which is a system of $N-2$ equations in the $N$ unknowns $y''_i$. The natural cubic-spline, which is commonly used, sets

$$y''_1 = y''_N = 0 \tag{28}$$

and solves for the remaining terms [7]. We are interested in using the cubic-spline interpolation on spectral data of known uncertainties, including the possibility of correlations, where the interpolated data may then be combined in various ways, so that not only the uncertainties in the interpolated data but also the correlations present are important in propagating uncertainties in the combinations.
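Since the system for the second derivatives is linear in the ordinates, the second derivatives can be written as a fixed matrix acting on the input values, which is convenient for uncertainty propagation. The sketch below assumes the standard tridiagonal system for a natural cubic spline and solves it with a dense solver for brevity; the function name and the 8-point grid are illustrative assumptions.

```python
import numpy as np

def second_derivative_matrix(x):
    """Return M such that y'' = M @ y for a natural cubic spline through abscissae x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)                        # h[i] = x[i+1] - x[i]
    A = np.zeros((n, n))                  # coefficients acting on the second derivatives
    B = np.zeros((n, n))                  # coefficients acting on the ordinates
    A[0, 0] = A[-1, -1] = 1.0             # natural end conditions: end values are zero
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1] / 6.0
        A[i, i] = (h[i - 1] + h[i]) / 3.0
        A[i, i + 1] = h[i] / 6.0
        B[i, i - 1] = 1.0 / h[i - 1]
        B[i, i] = -1.0 / h[i - 1] - 1.0 / h[i]
        B[i, i + 1] = 1.0 / h[i]
    return np.linalg.solve(A, B)

x = np.arange(360.0, 400.0, 5.0)          # assumed 8-point grid at 5 nm spacing
M = second_derivative_matrix(x)
print(np.round(M[3], 5))                  # one interior row: every entry is nonzero
```

Each interior row of `M` has no zero entries, so each second derivative depends on every input value, with the dependence falling off with distance.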
The $(N-2)$ values of $y''_i$ depend on each of the input values, i.e., are correlated to each input value. Then even for uncorrelated input data, the output data are correlated over the whole set of interpolated values. We wish to calculate the covariance $u(y_n, y_m)$ between two interpolated values

$$y_n = A_i\,y_i + B_i\,y_{i+1} + C_i\,y''_i + D_i\,y''_{i+1}, \qquad y_m = A_j\,y_j + B_j\,y_{j+1} + C_j\,y''_j + D_j\,y''_{j+1}, \tag{29}$$

where the $y_n$ and $y_m$ values may be in the same or different intervals denoted by i, j. The uncertainty in $y_n$ is given by the case $m = n$. The covariance between (and uncertainty of) the input values is known, carried in the matrix $U_y$. The covariance $u(y_n, y_m)$ is then evaluated from a matrix whose elements are the variances and covariances of the ordinates $y$ and second derivatives $y''$ entering Eq. (29), with the lower half symmetric about the diagonal. Only half of this matrix is required. The two right quadrants can be separately multiplied on the right by the column vector $(A_j\ B_j\ C_j\ D_j)^T$ and the two results combined into a single eight-element column vector for the final multiplication.
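An equivalent way to obtain these covariances numerically, used here as a swapped-in shortcut rather than the quadrant-by-quadrant evaluation described above, is to note that a natural cubic spline is linear in the input ordinates. Each interpolated value is then a fixed row of weights acting on the inputs, and the general matrix propagation applies. The function name, grid and unit input uncertainties are assumptions for illustration.

```python
import numpy as np

def spline_matrix(x, x_new):
    """Return S with p = S @ y for natural cubic-spline interpolation at x_new."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)
    A_sys = np.zeros((n, n))
    B_sys = np.zeros((n, n))
    A_sys[0, 0] = A_sys[-1, -1] = 1.0            # natural end conditions
    for i in range(1, n - 1):
        A_sys[i, i - 1:i + 2] = [h[i - 1] / 6.0, (h[i - 1] + h[i]) / 3.0, h[i] / 6.0]
        B_sys[i, i - 1:i + 2] = [1.0 / h[i - 1], -1.0 / h[i - 1] - 1.0 / h[i], 1.0 / h[i]]
    M = np.linalg.solve(A_sys, B_sys)            # second derivatives: y'' = M @ y
    S = np.zeros((len(x_new), n))
    for k, xv in enumerate(x_new):
        i = min(max(int(np.searchsorted(x, xv, side="right")) - 1, 0), n - 2)
        A = (x[i + 1] - xv) / h[i]
        B = 1.0 - A
        C = (A ** 3 - A) * h[i] ** 2 / 6.0
        D = (B ** 3 - B) * h[i] ** 2 / 6.0
        S[k, i] += A
        S[k, i + 1] += B
        S[k] += C * M[i] + D * M[i + 1]          # dependence through the second derivatives
    return S

x = np.arange(400.0, 505.0, 5.0)                 # assumed 5 nm input grid (21 points)
x_new = np.arange(400.0, 502.5, 2.5)             # inputs plus interval midpoints
U_y = np.eye(x.size)                             # assumed unit, uncorrelated uncertainties

S = spline_matrix(x, x_new)
U_p = S @ U_y @ S.T
u_p = np.sqrt(np.diag(U_p))
r = U_p / np.outer(u_p, u_p)
print(np.round(u_p[18:25], 4))
print(round(r[21, 22], 4), r[21, 37])
```

In this sketch the uncertainties at the input abscissae equal the input values, those at the midpoints are smaller, neighbouring values are strongly correlated, and correlations between distant values are very small.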

Cubic-Spline Interpolation Examples
Consider again the photometer response curve of Fig. 1, to be integrated over the wavelength range from 360 nm to 830 nm (effectively the photometric response to an equal-energy source). The function was interpolated over the same input range (one less value) but shifted by 2.5 nm and the integral recalculated. Similarly, the function was interpolated to 1 nm intervals and the integral recalculated. Table 2 shows the results for these interpolations, where the uncertainty in the integral was calculated based on 1 % uncertainty in the input values, uncorrelated between values. The consequence of ignoring the correlation between the interpolated values is also shown. Correlations between distant points, introduced through the dependence of the second derivatives, were negligible (largely because the response curve is relatively smooth), but strong correlations were found between near-neighbours. Figure 3 shows propagated uncertainties for the interpolation shifting the input by 2.5 nm; for an interpolation to the mid-point, we would expect the interpolated value for a smooth function to be near the mean of the two interval boundaries, with a propagated uncertainty $1/\sqrt{2}$ that of the input (but of course correlated to adjacent values). Figure 4 shows the variation in uncertainty for values interpolated at different positions within the 5 nm interval. This is a practical concern where a wavelength offset may be present in measurements, although in general it is better practice to retain the wavelength values of the measured points and interpolate weighting functions such as the illuminant or the colorimetric response functions, as these carry no uncertainty. At the input points, uncertainties equal that of the input; at the midpoints they fall below those of the input points. Again this is expected as the input data are smooth, and cubic-spline results are not much different from linear interpolation.
The cubic-spline reproduces the input ordinate values for abscissa values equal to an input value; for these points, we expect the propagated uncertainty to be that of the input and correlations between similar such values to match that of the input matrix $U_y$. These conditions can be used to test the coding. Figure 5 shows V(λ) interpolated from a 20 nm grid to 2 nm. Where the input curve is changing rapidly relative to the magnitude of the data, linear interpolation would be discontinuous and for these regions, the relative uncertainty of the interpolated values rises above that of the 1 % assumed for the input values. This is shown more strongly in Fig. 6 for input data spaced at 40 nm, interpolated to 5 nm. Here the interpolation does not provide a good representation of V(λ) (as shown in Fig. 7); where the interpolation is poor, the uncertainties for the interpolated values rise above those of the input.

Conclusion
Interpolation of spectral data is a common occurrence in radiometric and photometric measurements. Those data are often then combined in forming integral values such as photometric or colorimetric responses or filter radiometer responses. Interpolation is particularly important when a relatively smooth curve available only on a wide spectral spacing needs to be convolved with a more rapidly changing curve and then integrated. Uncertainties in the interpolated values will generally be smaller than those of the input data, although this is not always true in the case of cubic-spline interpolation. Uncertainties in combinations calculated using the interpolated data set must then include the correlations introduced by the interpolation. Ignoring the correlations will lead to significant underestimation of uncertainties. The calculations in this paper for both Lagrange interpolation and cubic-spline interpolation show that the uncertainty can be reliably estimated, in practical terms, by propagating the uncertainty through the combination using only the original set of data.