Review of measures for light-source color rendition and considerations for a two-measure system for characterizing color rendition

Twenty-two measures of color rendition have been reviewed and summarized. Each measure was computed for 401 illuminants comprising incandescent, light-emitting diode (LED)-phosphor, LED-mixed, fluorescent, high-intensity discharge (HID), and theoretical illuminants. A multidimensional scaling analysis (Matrix Stress = 0.0731, R² = 0.976) illustrates that the 22 measures cluster into three neighborhoods in a two-dimensional space, where the dimensions relate to color discrimination and color preference. When just two measures are used to characterize overall color rendition, the most information can be conveyed if one is a reference-based measure that is consistent with the concept of color fidelity or quality (e.g., Qa) and the other is a measure of relative gamut (e.g., Qg).
©2013 Optical Society of America
OCIS codes: (330.1690) Color; (330.1715) Color, rendering and metamerism; (230.3670) Light-emitting diodes.

References and links
1. CIE, “Methods of measuring and specifying colour rendering properties of light sources,” in CIE 13 (CIE, Vienna, Austria, 1965).
2. W. Walter, “How meaningful is the CIE color rendering index?” Lighting Des. Appl. 11(2), 13–15 (1981).
3. T. Seim, “In search of an improved method for assessing the colour rendering properties of light sources,” Lighting Res. Tech. 17(1), 12–22 (1985).
4. K. W. Houser, “Lighting for quality,” Lighting Des. Appl. 32(11), 4–7 (2002).
5. J. A. Worthey, “Color rendering: asking the question,” Color Res. Appl. 28(6), 403–412 (2003).
6. CIE, “TC 1-62: Color rendering of white LED light sources,” in CIE 177:2007 (CIE, Vienna, Austria, 2007).
7. CIE, “TC 1-69: Color rendition by white light sources,” (Accessed Nov 18, 2012) http://div1.cie.co.at/?i_ca_id=549&pubid=239.
8. CIE, Division 1: vision and color, meeting minutes, (Taipei, Taiwan, Sep. 26–27, 2012), 26–28.
9. R. Luo, “An update of the div. 1 meeting in Taipei,” e-mail distributed to TC1-69 listserv. Oct 10, 2012.
10. X. Guo and K. W. Houser, “A review of color rendering indices and their application to commercial light sources,” Lighting Res. Tech. 36(3), 183–199 (2004).
11. M. P. Royer, K. W. Houser, and A. M. Wilkerson, “Color discrimination capability under highly structured spectra,” Color Res. Appl. (Online) Nov 2011. 9 pgs. DOI:10.1002/col.20702.
12. D. B. Judd, “A flattery index for artificial illuminants,” Illum. Eng. (USA) 62, 593–598 (1967).
13. P. J. Bouma, “Physical aspects of colour; an introduction to the scientific study of colour stimuli and colour sensations,” (Philips Gloeilampenfabrieken (Philips Industries) Technical and Scientific Literature Dept., Eindhoven, 1948).
14. W. A. Thornton, “The quality of white light,” Lighting Des. Appl. 12, 51–52 (1972).
15. J. A. Schanda, “A combined colour preference – colour rendering index,” Lighting Res. Tech. 17(1), 31–34 (1985).
16. W. A. Thornton, “A validation of the color-preference index,” J. Illum. Eng. Soc. 4(1), 48–52 (1974).
17. W. Davis and Y. Ohno, “Color quality scale,” Opt. Eng. 49(3), 033602 (2010).
18. Y. Ohno and W. Davis, “NIST CQS version 7.5,” Excel Software, Sep. 10, 2009, (personal communication,


Introduction
Limitations of the general color rendering index (Ra) provided by the Commission Internationale de l'Éclairage (CIE), first introduced in 1965 [1], are well known and documented [2–5]. Nevertheless, Ra has worked well enough to remain in continuous use for nearly 50 years. Ra's limitations are especially pronounced when applied to highly structured spectral power distributions (SPDs), that is, SPDs with sharp changes in slope, spikes, discontinuities, or some regions of smoothness and others that are spiky, including those produced by some solid-state lighting (SSL) sources such as light-emitting diodes (LEDs). CIE TC1-62 Colour Rendering of White LED Light Sources, established in 2002, concluded that Ra cannot even correctly rank-order the color-rendering ability of light sources when LEDs are included [6], yet the committee did not recommend an alternative measure.
CIE TC1-69 Colour Rendition by White Light Sources, established in 2006, was formed "to investigate new methods for assessing the color rendition properties by white-light sources used for illumination, including solid-state light sources, with the goal of recommending new assessment procedures." [7] At the CIE September 2012 meeting in Taipei, TC1-69 agreed to produce a technical report based on work to date, after which it will close. It is anticipated that the draft report will not make a single recommendation. It is expected to be a progress report that includes summaries from the different research groups that have developed their own approaches and measures for color rendition [8]. The membership of TC1-69 believe they need more time to verify the performance of the various models that have been developed before offering one solution for use by the lighting industry [9].
Two new TCs will continue where TC1-69 left off. TC1-90 Colour Fidelity Index was established "to evaluate available indices based on colour fidelity for assessing the colour quality of white-light sources with a goal of recommending a single colour fidelity index for industrial use". TC1-91 New Method for Evaluating the Colour Quality of White-Light Sources was established "to evaluate available new methods for evaluating the colour quality of white-light sources with a goal of recommending new methods for industrial use. (Methods based on colour fidelity should not be included)." Both committees have been given four years to perform their work [8].
Meanwhile, industry and standards organizations have an exigent need for a measure (or measures) of color rendition that faithfully characterizes the color quality of light sources, especially SSL sources. The question is: Is there now enough information to recommend a solution that can faithfully and reliably serve the needs of the lighting industry? In this paper, we take a first step toward answering this question by reviewing the color measures that have already been put forth, in order to understand their similarities and differences as well as their strengths and shortcomings.
This paper includes a summary of 22 indices for color rendition that now appear in the literature, including those considered by TC1-69 and that will continue to be evaluated by the new CIE TCs. This component of the paper is an extension and update of the 2004 review performed by Guo and Houser [10]. We then summarize our process for computing each of the 22 indices for a set of 401 SPDs, and present statistical analyses that quantify the similarities, differences, and interrelationships between the 22 indices, plus CCT.
Our objectives are to: 1) summarize the work that has already been done by others; 2) quantitatively demonstrate that there are many commonalities among the existing measures, including evidence of how they cluster; and 3) contribute to the discussion regarding a two-measure system for characterizing color rendition.

Existing measures of color rendition
The appendix provides a summary of the 22 indices that are considered in this paper. The reader is referred to the cited references for thorough treatment of the computational details. The indices fall into one of three basic classes of color rendition: the accurate rendition of colors so that they appear as they would under familiar reference illuminants; the rendition of objects such that they appear pleasant, vivid, or flattering; and the capability of an illuminant to allow an observer to distinguish between colors when viewed simultaneously. These dimensions of color rendition are respectively referred to as color fidelity, color preference, and color discrimination [11–15]. In column 3 of the appendix, we have classified the various indices as "F", "P", and "D" based on our understanding of the authors' original intent. The groupings are not based on numerical analyses and they should not prejudice one measure over another. Moreover, the classifications are not entirely independent. Gamut has sometimes been used as a proxy for both discrimination and preference, but gamut is an imperfect predictor of both. In the case of preference, for example, an excessively large gamut can make object colors appear too saturated, which is neither natural nor preferred [12,16]. In the case of either preference or discrimination, increases in saturation that lead to a larger gamut are almost always accompanied by hue shifts [17]. Thus, color discrimination and preference improvements associated with increased saturation may be offset by hue shifts. In the appendix, gamut-based indices have been denoted with either a "D" or a "P" as per our understanding of the developer's original intent. We also employed the letter "Q" as an abbreviation for "Quality" to denote an index that has been employed as something other than a pure measure of fidelity, preference, or discrimination.
As will be expanded upon in the discussion, there is little support to suggest that any single-number index can capture more than one dimension. It is not possible to simultaneously maximize fidelity, preference, and discrimination with any one illuminant because they have conflicting optimization criteria [10,12–17].

SPDs and method of computing measures of color rendition
Table 1 provides a summary of the 401 SPDs employed in this study. These comprise 107 SPDs from the CQS programs provided by NIST [18,19], 12 from CIE (i.e., F1–F12) [20], 28 from Guo and Houser [10], 100 from Wei and Houser [21], and 20 additional SPDs that we digitized from other sources. We also computed and included 8 phases of blackbody radiation from 2,000 to 4,999 K and 6 CIE D illuminants from 5,000 to 8,000 K.
Table 1 notes: a) Fluorescent models include broadband and narrowband; b) e.g., Equal-Energy, Clipped Incandescent, Ideal Prime Color.
For the contemporary measures now being considered by CIE TC1-69, we contacted the respective authors and requested Excel spreadsheets that would allow us to compute their measures. Programs were generously provided, allowing us to compute CQS (Qa, Qf, Qp, Qg), MCRI, Ra2012, RCRI, FCI94, and FCI02. An Excel spreadsheet previously developed by Guo and Houser [10] was employed to compute CCT, Ra, R9, RaO, Rf, CPI, CDI, CRC84, CRC93, CSA, and PI. We wrote new code to compute FMG, FSCI, and GAI. All computations were performed using SPDs from 380 to 780 nm in 5 nm increments. If the SPD was not originally in that format, we employed derivative-constrained-spline interpolation. Extrapolation was never done; if the SPD did not extend to either 380 or 780 nm, then unreported values were set to zero. The computations yielded a data set from which we were able to perform statistical analyses of the interrelationships between the 22 indices. We also included CCT in our analyses since gamut area increases with CCT for CIE reference illuminants (i.e., blackbody radiation up to 5000 K and daylight phases at or above 5000 K).
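The resampling rule described in this section (a 380 to 780 nm grid in 5 nm steps, with no extrapolation and zeros outside the measured range) can be sketched as follows. This is a minimal illustration rather than the procedure actually used: linear interpolation is substituted for the derivative-constrained spline to keep the sketch dependency-free, the function name `resample_spd` is hypothetical, and the input wavelengths are assumed to be sorted in ascending order.

```python
from bisect import bisect_right


def resample_spd(wavelengths, values, start=380, stop=780, step=5):
    """Resample an SPD onto a regular grid, with zeros outside the measured range.

    Assumption: linear interpolation stands in for the derivative-constrained
    spline named in the text. `wavelengths` must be sorted ascending.
    """
    grid = list(range(start, stop + 1, step))
    resampled = []
    for wl in grid:
        if wl < wavelengths[0] or wl > wavelengths[-1]:
            # No extrapolation: unreported values are set to zero.
            resampled.append(0.0)
            continue
        i = bisect_right(wavelengths, wl) - 1
        if i >= len(wavelengths) - 1:
            # wl coincides with the last measured wavelength.
            resampled.append(float(values[-1]))
            continue
        x0, x1 = wavelengths[i], wavelengths[i + 1]
        t = (wl - x0) / (x1 - x0)
        resampled.append(values[i] + t * (values[i + 1] - values[i]))
    return grid, resampled
```

A spline would follow narrow emission peaks more smoothly than this piecewise-linear stand-in, so the sketch is meant only to show the grid and zero-fill conventions.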
For the 10 gamut-area measures that are computed with reference to the gamut of a fixed reference illuminant (i.e., FMG, CDI, GAI, CRC84, CRC93, CSA, PI, FCI94, FCI02, and FSCI), we also computed modified versions using the gamut of a reference illuminant with the same CCT as that of the test source, either blackbody radiation or a CIE D illuminant. We edited the source code in the Excel spreadsheets and Visual Basic for Applications (VBA) macros in order to perform these computations.
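For the polygon-based gamut measures, the idea of normalizing a test gamut by a reference gamut can be illustrated with the shoelace formula. The sketch below assumes that the chromaticity coordinates of the test-color samples have already been computed under each illuminant and are listed in hue order; the function names are hypothetical, and each published index uses its own sample set, chromaticity space, and scaling, none of which is reproduced here.

```python
def polygon_area(points):
    """Area of a polygon from ordered (x, y) vertices, using the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def relative_gamut(test_points, reference_points):
    """Gamut area of the test source as a percentage of the reference gamut area."""
    return 100.0 * polygon_area(test_points) / polygon_area(reference_points)
```

With a fixed reference, `reference_points` is the same for every test source; with a variable reference, it is recomputed for an illuminant at the CCT of the test source.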

Data distribution
The distributions of all 22 indices and CCT were checked for normality using measures of skewness and kurtosis.
Skewness measures the extent to which a distribution of values deviates from symmetry around the mean. A value of zero means that the distribution is symmetric. Positive skewness indicates a greater number of smaller values and negative skewness indicates a greater number of larger values. Computed values of skewness were mostly within an acceptable range of ± 2.
Kurtosis is a measure of the peakedness or flatness of a distribution. A kurtosis value near zero indicates a shape close to normal; positive kurtosis values indicate a shape more peaked than normal, with heavier tails, and negative values indicate a shape flatter than normal. Kurtosis values within ±2 are usually considered acceptable for employing statistics that are based on the assumption of normality. Computed values of kurtosis for the 23 distributions (the 22 indices plus CCT) ranged from 0.02 to 30.8, with most greater than 2.
In the following data analyses, we employed statistical methods that do not require normality of the data distribution.
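The two shape statistics used in this screening can be computed directly from central moments. The sketch below uses the simple population (biased) estimators and reports excess kurtosis, so that a normal distribution scores near zero; statistical packages often apply small-sample corrections, so their values may differ slightly. The function name is hypothetical.

```python
def skewness_and_excess_kurtosis(data):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return skewness, excess_kurtosis
```

A distribution would pass the ±2 screening described above only if both returned values lie within that range.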

Correlation
We employed Spearman's correlation method to analyze correlations between the 22 indices. Spearman's coefficient compares the rank order of paired observations and makes no assumptions about the underlying data distribution. Table 2 is a matrix of Spearman correlation coefficients arranged to highlight three clusters of highly correlated measures.
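As a minimal sketch of the rank-based approach, Spearman's coefficient can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The function names are hypothetical, and significance testing (the * and ** flags of Table 2) is not included.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between two indices yields a coefficient of ±1, regardless of how the indices are scaled.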

Multidimensional scaling
Multidimensional scaling (MDS) was employed to further assist in identifying clusters of similarity and to identify the underlying dimensionality. MDS uncovers the structure and relationships in a set of data by finding a representation of the indicators (i.e., color measures) in a low-dimensional space such that the matrix of Euclidean distances among the indicators corresponds as closely as possible to some characteristics of the input matrix. We generated input dissimilarities (i.e., distances) between every pair of measures using the multivariate raw score data (i.e., values of the 23 computed measures for each of the 401 SPDs). Since the various measures employ different scales, they were converted to z-scores prior to performing the MDS analysis. The output was a spatial map of color measures, each represented as a point in a low-dimensional space. The greater the dissimilarity between a pair of color measures, the further apart the points lie on the spatial map.
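The preprocessing that feeds an MDS routine can be sketched as follows. Each measure is a column of values (one per SPD), the columns are converted to z-scores, and the dissimilarity between two measures is taken as the Euclidean distance between their z-scored columns. The choice of Euclidean distance for the input dissimilarities and all names here are assumptions made for illustration; the MDS optimization itself is left to a statistics package.

```python
def zscores(column):
    """Standardize a column to zero mean and unit (sample) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in column]


def dissimilarity_matrix(measures):
    """Pairwise Euclidean distances between z-scored measures.

    `measures` maps a measure name to its list of values, one per SPD.
    Returns the list of names and a dict keyed by (name_a, name_b).
    """
    names = list(measures)
    z = {name: zscores(measures[name]) for name in names}
    dist = {}
    for a in names:
        for b in names:
            dist[(a, b)] = sum((p - q) ** 2 for p, q in zip(z[a], z[b])) ** 0.5
    return names, dist
```

Two measures that differ only by a positive linear rescaling end up at distance zero, which is why the z-score step matters when indices use different scales.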
The low-dimensional space was optimized to minimize stress, a criterion function indicating the lack of correspondence between the distances among points in the MDS map and the input matrix. Values of matrix stress smaller than 0.2 indicate a good fit. R² values are the proportion of variance that is accounted for by the MDS model, with a maximum value of 1.0. Our two-dimensional MDS solution has a matrix stress value of 0.0731 and R² of 0.976, indicating that this solution is an excellent fit to these data. We also computed one-dimensional and three-dimensional MDS solutions. The one-dimensional solution had poor values for stress and R², and the three-dimensional solution only marginally improved the fit. The final MDS solution is presented as Fig. 1. We also computed MDS solutions for real light sources only (N = 263), theoretical illuminants only (N = 147), and LED sources only (N = 233). The MDS solutions for all three sub-sets had similar stress and R² values and spatial configurations comparable to that shown in Fig. 1, indicating the stability of the two-dimensional solution.
Table 2. Matrix of Spearman rank correlation coefficients that also illustrates blocks of similarity from the MDS solution (see next section). The upper-left shading in orange identifies a cluster that can be called "fidelity-based" measures, the middle shading in green identifies a cluster that can be called "preference-based" measures, and the lower-right shading in blue identifies a cluster that can be called "gamut-based" (discrimination) measures. The date that each index appeared in the literature is also provided. ** indicates that the correlation is significant at the 0.01 level and * at the 0.05 level (2-tailed).
Figure 1 illustrates that most indices clustered into one of three distinct neighborhoods, highlighted by orange, green, and blue bubbles. The color scheme and the clusters correspond with the regions identified in the correlation matrix of Table 2.
Based on the measures in each neighborhood, we refer to the three clusters as fidelity-based (orange), gamut-based (blue), and preference-based (green). Those labeled as gamut-based are generally intended to characterize color discrimination. There is subjectivity in forming the neighborhoods. CCT, for example, has a positive and relatively high Spearman rank correlation coefficient with the measures that we have called gamut-based (0.78 < R < 0.91). We did not include CCT in that group because it lies some distance away and it is not intended to be a measure of color rendition. What we have called the preference-based neighborhood is also larger than the other two, suggesting that the measures in that group employ more dissimilar computational methodologies than those in the other two groups, leading to higher heterogeneity. Qg may appear to be out of place in what we have called the preference-based neighborhood since it is a (relative) gamut-based measure; this is discussed below.
Orthogonal axes have been superimposed on the spatial map and are suggestive of two dimensions that underlie these data. The horizontal dimension, which we have labeled "Color Discrimination", is bounded on the left by what we have called fidelity-based measures and on the right by what we have called gamut-based measures. Color discrimination is operationally similar to gamut area and to the magnitude of color-shift vectors in chromaticity space. Thus, the plot reveals a tradeoff between fidelity and discrimination. The vertical dimension, which we have labeled "Color Preference", is bounded at the upper-end by what we have called preference-based measures. Ignoring FSCI for the moment, which we discuss below, the lower-end is bounded by what we have labeled fidelity-based measures and CCT. The spatial map suggests that what we have labeled as gamut-based measures are generally high in color discrimination, what we have labeled as preference-based measures are generally high in color preference, and what we have labeled as fidelity-based measures are relatively low on both.
Consider NIST's CQS indices Qa, Qp, Qg, and Qf as a way of interpreting the meaning of the dimensions. Qf, a measure of fidelity, plots at nearly the same location as other fidelity-based test-color sample methods (e.g., Ra, RaO, Ra12). Qa plots above Qf along the vertical dimension of color preference. This is appropriate because in the computation of Qa, a test source that increases object chroma is not penalized (nor is it rewarded), whereas any shift is penalized in the computation of a pure fidelity measure such as Qf. This was incorporated into the formulation of Qa based on evidence that increases in object chroma are not detrimental to color quality as long as they are not excessive. Qp, a measure of preference, plots further up the vertical dimension of color preference, above Qa. Qp places additional weight on increases in object chroma and rewards a light source for that behavior. Note that Qp plots at the lower portion of the preference neighborhood. This likely occurs because in the computation of Qp, scores are rescaled so that the average score for the 12 reference fluorescent lamp spectra (i.e., CIE F1–F12) is equivalent for Qp and Ra [17]. Considering the color fidelity/discrimination dimension, Qp plots further to the right of Qa, and Qg plots still further to the right. This reflects the fact that these indices progressively credit increases in gamut. Qf plots slightly to the right of Qa, though they are very close; this is reasonable since neither gives credit for increasing gamut. Finally, Qg, a measure of relative gamut area, plots well above Qp on the color preference dimension and also falls within the color preference neighborhood. Qg has a different scaling than Qa, Qf, and Qp. It is normalized by the gamut area of a reference illuminant that is at the same CCT as the test illuminant. Values can be greater than 100, which can account for its position relative to Qa, Qf, and Qp.
Similar logic can be followed to understand the relationships between other measures on the spatial map.
To demonstrate the level of correlation between measures belonging to the same MDS cluster, Fig. 2 includes scatter plots that illustrate the correlation between Ra and Qa and between Ra and Ra12. The greatest differences between Ra and Qa appear when one of the two values is below 60. There do not appear to be marked differences between Ra and Qa as a function of light source type. Ra12, in general, penalizes fluorescent light sources, especially those with Ra between 80 and 85. Nevertheless, the overall high correlations between measures within the same cluster suggest that the evaluation of spectra will be rather robust against the specific choice of a metric in a given MDS cluster. More pointedly, Ra12 and Qa show high correlation with Ra despite the many improvements and refinements in these newer formulations. Most gamut-based measures fall between the fidelity-based and preference-based measures on the color preference dimension. This is reasonable because increasing gamut, in comparison to the reference illuminants, frequently leads to an increase in preference [12,16,28–33]. However, the increases in saturation that are required to increase gamut are generally accompanied by hue shifts that may not be preferable. Whereas larger gamut is always better for a gamut-based measure, a well-constructed preference-based measure will account for the fact that oversaturation and hue shifts can make objects look unnatural and not preferred, even when they increase gamut.
Not all indices cluster in the group that might have been expected based on the intent of the original developer. For example, Judd's Flattery Index (Rf) was intended to characterize preference, but our data suggest that it actually resembles fidelity-based measures, raising questions about its efficacy in characterizing preference. In order to establish an Rf value of 90 for the reference illuminant and to employ the same constant as used in the computation of Ra (i.e., 4.6), Judd reduced the preferred color shifts to 1/5 of their experimental values. That decision effectively doomed the utility of Rf to characterize preference and made it perform similarly to Ra. Thornton built upon Judd's work when developing CPI and he preserved the full magnitude of preferred color shifts. CPI clusters in the neighborhood of preference-based measures.
Of the gamut measures plotted, Qg may appear to be anomalous since it plots in the preference neighborhood. But Qg (in version 9.0c of NIST's CQS, which is what we have reported) is computed differently than all other gamut-based measures: it employs a variable reference at the same CCT as the test illuminant rather than employing the same reference for all test illuminants. Unlike the other gamut-based measures that cluster in the blue neighborhood at the right side of the spatial map, Qg is not correlated with CCT (Spearman rank correlation coefficient = −0.206). We discuss this further in the next section.
FSCI and PI do not cluster within any of the three identified groups. FSCI is computed differently than all other indices. It is based on a spectral-bands method that only considers the similarity of the test illuminant's SPD to that of an equal-energy illuminant; it does not

CCT correlations and renormalization
CCT plots far from the preference indices, suggesting that high CCT is not associated with high preference. Indeed, the correlation matrix of Table 2 shows a negative correlation between CCT and all of the indices that clustered into the preference neighborhood. CCT, however, is positively and relatively strongly correlated with the gamut-based measures (0.78 < R < 0.91), which is consistent with the fact that gamut tends to increase with CCT for CIE reference illuminants. As an example, Fig. 3 illustrates the correlation between CCT and GAI. GAI is favored by higher CCTs because gamut area increases with CCT [16]. Similar trends were observed between CCT and other fixed-reference gamut-based measures (e.g., CDI, CRC84, CRC93, CSA, FMG, PI, FSCI). From a practical standpoint, it is not ideal for a measure of color rendition to be correlated with CCT. In the practice of lighting design it is common to first select CCT to set the overall color tone of the environment. After CCT is selected, other color quality characteristics can then be evaluated, such as fidelity, discrimination, and preference. Most measures of rendition accommodate this process by using a reference illuminant of variable CCT (matched to the test spectrum), but this is not the case for measures based on gamut: all measures of gamut studied, with the exception of Qg, are computed with reference to a high-CCT illuminant such as D65, CIE Illuminant C, or equal-energy.
To remove dependence upon CCT for all measures that are tied to a reference illuminant at a fixed CCT, we recomputed these measures using a reference illuminant at the same CCT as the test illuminant. For the reference, we employed blackbody radiation below 5000 K and a phase of daylight at or above 5000 K. These modified measures are abbreviated as CDI_VR, CRC84_VR, CRC93_VR, CSA_VR, PI_VR, FCI94_VR, FCI02_VR, FMG_VR, and FSCI_VR, where "VR" stands for "variable reference". Note that because CDI and GAI have a correlation of 1.0 and are essentially equivalent (i.e., they measure the gamut area enclosed by the eight test-color samples that are used in the computation of Ra, but using different reference illuminants), we did not compute a variable-reference version of GAI. In effect, this means that the renormalized measures from the blue cluster of Fig. 1 employ a variable reference at the same CCT as the test illuminant. This is the procedure already used in the definition of Qg. (Note: In [17,18] Davis and Ohno computed Qg with reference to D65. The change to a variable reference was introduced in version 9.0 of NIST's CQS formulation [19]. To our knowledge, a description of this new formulation of Qg has not heretofore appeared in the refereed literature. We only discovered NIST's new formulation after developing our own VR versions of the other gamut-based measures.) Fig. 4 provides the MDS spatial map for these data (Matrix Stress = 0.107, R² = 0.961). When CCT dependence is removed, three neighborhoods appear on the spatial map. The neighborhoods can be classified as fidelity-based (orange oval, left), those based on target chromaticity coordinates (blue oval, middle), and those based on relative gamut (green oval, right). FSCI_VR and CCT do not cluster with any of these three groups, likely for the reasons previously discussed.
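The variable-reference rule (blackbody radiation below 5000 K, a phase of daylight at or above 5000 K) can be sketched as below. The blackbody SPD follows Planck's law on the same 380 to 780 nm, 5 nm grid used for the test sources; normalizing to 1.0 at 560 nm is a convention chosen for this sketch. The daylight branch is only identified, not computed, because it requires the tabulated CIE daylight basis functions, which are not reproduced here. Function names are hypothetical.

```python
import math

H = 6.62607015e-34  # Planck constant, J s
C = 2.99792458e8    # speed of light in vacuum, m/s
K = 1.380649e-23    # Boltzmann constant, J/K


def reference_family(cct):
    """Reference type for a test source: blackbody below 5000 K, else daylight."""
    return "blackbody" if cct < 5000 else "daylight"


def planck_spd(temperature_k, start=380, stop=780, step=5):
    """Relative blackbody SPD as (wavelength_nm, value) pairs, 1.0 at 560 nm."""
    def radiance(wl_nm):
        wl = wl_nm * 1e-9  # nm -> m
        return (2.0 * H * C ** 2 / wl ** 5) / math.expm1(H * C / (wl * K * temperature_k))

    norm = radiance(560)
    return [(wl, radiance(wl) / norm) for wl in range(start, stop + 1, step)]
```

A reference SPD produced this way would then replace the fixed reference when computing the gamut of the reference illuminant for a given test source.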
The horizontal axis is much more strongly pronounced than the vertical axis. We have labeled it "Relative Saturation", an aspect of color perception that relates to preference, discrimination, and gamut. Measures in the leftmost cluster, which are fidelity-based measures, have comparatively low relative saturation and high fidelity. Relative saturation increases for the three measures in the cluster that are based on target appearance, where the target chromaticities that underlie these measures are intended to relate to an aspect of preference. MCRI has fixed target chromaticities that represent a presumed ideal archetype. Qp has target chromaticities on constant hue lines that allow for chroma enhancement. CPI has target chromaticities that allow for chroma enhancement in comparison to a reference illuminant. Relative saturation increases still more for measures in the rightmost cluster, which contains measures that are based on relative gamut. Qg is the only measure in the rightmost cluster that is an existing measure of color rendition. All of the others are based on our computational modification where we employed a reference of variable CCT.
Because of chromatic adaptation, and because CCT is selected to set the overall color tone of an environment as part of the lighting design process, we believe that variable-reference measures are especially relevant to applied lighting design. If the relative gamut is greater than that of the reference, and illuminance is lower than that provided by daylight, then an increase in preference and discrimination might be expected relative to the reference at that same CCT. If the relative gamut is smaller than that of the reference, then a decrease in preference and discrimination might be expected relative to the reference at the same CCT.
The collapse of the three MDS clusters to one dimension (relative saturation) is especially interesting in view of the relative information contained in the various measures of color rendition. As already discussed and presented in Table 3, measures pertaining to the same cluster have high correlation. The left (fidelity) and right (relative gamut) clusters are most differentiated along the horizontal dimension; that is, they are least correlated along this dimension. Thus, if an SPD is characterized with just two derived measures, a maximum amount of uncorrelated information will be retained if one measure comes from each of these two clusters.

Discussion of considerations for a two-dimensional color scale
Any proposal for a new measure of color rendition must be guided by practical and theoretical considerations. Like others [5,10,12,17,30,34,35], we believe that one number cannot fully encapsulate the multidimensional problem of color rendition. We also accept that the lighting industry needs a simple and readily interpretable tool for communicating color quality. To address these considerations, below we consider the concept of a two-dimensional scale that could be simplified into a one-word categorical rating. The framework builds upon the work of Guo and Houser [10] and of the other groups summarized in this section.
Guo and Houser performed a factor analysis based on 8 measures of color rendition for 34 illuminants, yielding two underlying factors that they labeled as reference-based (fidelity) and gamut-based. Guo and Houser suggested the use of one reference-based (fidelity) and one gamut-based measure when evaluating light sources for general illumination, with the caveat that they did not support using a blackbody reference below 5000 K [10]. Rea and Freyssinier-Nova created a set of 8 white-light illuminants that varied in Ra, GAI, and FSCI. They performed a series of psychophysical experiments related to color discrimination, vividness, and naturalness. Their central conclusion was that GAI should be used in conjunction with Ra and that neither measure is adequate on its own [30]. Rea and Freyssinier-Nova later made numerical recommendations, proposing that an illuminant should have both an Ra between 80 and 100 and a GAI between 80 and 100 for spaces where color is important [31,32]. Dangol and others performed psychophysical experiments at three different CCTs (2700, 4000, 6500 K) using lighting booths equipped with fluorescent and LED sources. They purposely adjusted the LED spectra to vary some measure of color rendition (i.e., Qp, Qg, FCI) while maintaining Ra at a value of 80. They concluded that people's judgments of naturalness and overall preference could not be predicted with a single measure, but required the joint use of a reference-based measure (e.g., Qp) and a gamut-based measure (e.g., Qg or GAI) [38]. Smet and his colleagues suggest that Ra and GAI can be combined into a single number that relates to naturalness [40].
In consideration of Rea and Freyssinier-Nova's proposal, Fig. 5 is a scatter plot of Ra versus GAI for the 401 SPDs in our data set, coded by CCT ranges. The plot illustrates that higher-CCT light sources are strongly favored in simultaneously achieving high values of both Ra and GAI. For example, fourteen illuminants in our data set have values of Ra and GAI between 90 and 100; their range of CCT is 3921 to 7412 K with a mean of 5687 K.
There are five illuminants that have an Ra between 90 and 100 and a GAI greater than 100: two phases of daylight at 7500 and 8000 K, two versions of the F40T12/C75 lamp (7821 and 7867 K), and one theoretical fluorescent lamp (6378 K). We suggest that these illuminants could be excellent in some applications. Daylight, for example, is widely considered to be the standard by which other sources are judged for color rendition, and F40T12/C75 lamps have been successfully employed for decades in color-critical applications. Yet, these lamps fail to meet Rea and Freyssinier-Nova's criteria because GAI exceeds 100. Recall that Rea and Freyssinier-Nova compute GAI with reference to an equal-energy spectrum, which has a CCT of 5455 K and a smaller gamut than that of these sources. These examples underscore a limitation of employing a fixed reference and simultaneously setting an upper limit on GAI, as has been done by Rea and Freyssinier-Nova. Some common phases of daylight with a CCT greater than that of EEW produce a gamut greater than that of EEW. Thus, those phases of daylight have a GAI greater than 100 and would be considered unacceptable in the Rea and Freyssinier-Nova model. Even without specifying details (such as light level, the objects being illuminated, or the specific color rendering objectives), we suggest that daylight is an excellent illuminant for color rendition.
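The effect of an upper limit on GAI can be made concrete with a minimal sketch. It assumes only the acceptance window described above (Ra and GAI each between 80 and 100). The function name and the two sets of scores are hypothetical, chosen for illustration, and are not values from our data set.

```python
def meets_ra_gai_window(ra, gai, low=80.0, high=100.0):
    """Return True when both Ra and GAI fall inside [low, high]."""
    return low <= ra <= high and low <= gai <= high


# Hypothetical (Ra, GAI) scores, for illustration only.
examples = {
    "high-CCT daylight-like source": (98.0, 104.0),
    "source inside the window": (92.0, 95.0),
}

results = {
    name: meets_ra_gai_window(ra, gai) for name, (ra, gai) in examples.items()
}

for name, acceptable in results.items():
    print(f"{name}: {'acceptable' if acceptable else 'unacceptable'}")
```

In this sketch the daylight-like source is rejected solely because its GAI exceeds the upper bound, even though its Ra is very high.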
We support the development of a two-measure system. Below we offer considerations that we believe to be relevant when developing a system suitable for applied lighting.
1. Test-Color Samples: Davis and Ohno showed that light sources can perform poorly with saturated test-color samples even when they perform well with the 8 desaturated test-color samples employed in the computation of Ra and GAI. Computations by Davis and Ohno suggest that the inverse is never true [17]. Smet and his colleagues suggest a set of imaginary test-color samples that span the visible spectrum [41]. The fundamental rationale is that steep slopes and sharp changes in slope (within a test-color sample or an illuminant) are more likely to lead to color shifts than gentle slopes and gentle changes in slope. Therefore, it is more difficult for an illuminant to minimize color shifts when the test-color samples have steep slopes, and steep slopes in the spectral reflectance distribution of a test-color sample are characteristic of more saturated colors. It follows that more saturated test-color samples provide a more difficult test of a light source's color rendition ability. Regardless of the specific approach adopted, there is evidence that test-color samples should not be desaturated.

2. Consideration of Illuminance: Daylight is an outstanding light source that renders colors naturally and with high fidelity. It has been employed as the reference for most gamut-based indices. Yet, a person's experience and reference for daylight is outdoors, where illuminance can be as high as 100,000 lx. Daytime outdoor illuminance is almost always greater than the 50 to 1000 lx that is typical of indoor illuminance from electric light sources. The magnitude of illuminance strongly affects the appearance of colored objects. Perceived hues are dependent upon illuminance (Bezold-Brücke effect) [42], colors appear more saturated under higher illuminance (Hunt effect) [42], and color discrimination performance is dependent upon illuminance level [30,43].
If an electric light source increases object saturation relative to a reference illuminant at a typical indoor illuminance level, then the object may appear more like it would under daylight at a typical outdoor daytime illuminance level. Nevertheless, it may not be practical to consider illuminance directly within a color rendition system. Most measures of color rendition are based on relative colorimetry, which ignores illuminance level. In our computations, only MCRI considered illuminance level or the degree of adaptation. Yet, given that color quality measures will almost always be employed at illuminance levels much lower than those provided by daylight, a source should not be harshly penalized (if at all) for increases in gamut. As a practical matter, no measure that is based only on chromaticity will be able to predict color appearance.
3. Consideration of CCT: Lower CCT light sources can have excellent color qualities, including excellent color-discrimination performance despite having a smaller gamut [11,30]. This characteristic of human perception is not captured when one of the dimensions of a two-measure system is pegged to a particular CCT, as occurs when a single reference is employed for a gamut-based measure. A two-measure system should reflect quality in lighting applications, which includes a consideration of CCT to set the overall color tone of the environment. We do not believe that it is appropriate for a two-measure system that is intended to characterize color quality to strongly favor higher CCT illuminants. Such a bias arises because most two-measure proposals include a measure of gamut area, where gamut area is normalized to a reference with a relatively high CCT and relatively large gamut. There are alternate approaches. One would be to use references of different gamut areas at a limited number of CCTs, perhaps aligning with ANSI bins [44]. A second approach would be to eliminate the use of reference illuminants and instead use target chromaticity coordinates [12,16,45], possibly using different targets for different CCTs. A third approach is for both dimensions of a two-measure system to be based on comparison to a reference illuminant at the same CCT. The latter approach is what we explore below, employing "VR" normalized gamut indices in conjunction with a measure based on fidelity.
4. Categorical Definitions: Rea and Freyssinier-Nova provided only two categories in their two-dimensional plot of fidelity and gamut: acceptable and unacceptable. We build on their basic idea [32], and that offered by Bodrogi and his colleagues [39], by suggesting the use of word-categories to define regions of a two-dimensional plot. Numerical regions can be defined to represent excellent, good, fair, and poor color quality. We believe that a simple word scale has the potential to capture overall color quality, would be especially useful to end-users, and could be considered for consumer packaging.
5. Choice of Measures: When more than one measure is employed for evaluating color quality, the measures should reflect two salient and scalable aspects of color rendition that have relatively low correlation with each other. Table 1 shows high correlation among three groups of indices, Fig. 1 illustrates that most existing measures cluster into one of three neighborhoods, and Fig. 4 illustrates that when CCT dependence is removed the measures essentially collapse onto one dimension. When considering the paired use of two indices, the two indices should be selected from opposite ends of the horizontal axis shown in Fig. 4. We developed a spreadsheet tool that plots the relationship between any two user-selected indices to assist in evaluating candidate measures. After studying all paired combinations, and considering the above criteria, we tentatively suggest Qa to represent the fidelity neighborhood and Qg to represent the preference neighborhood. A plot is provided as Fig. 6. We initially expected to select purer measures of fidelity and gamut, essentially updating and confirming the proposal of Rea and Freyssinier-Nova [32]. van der Burgt and van Kemenade also suggest that a pure fidelity measure should be one component of a two-measure system [46]. A pure fidelity measure, however, penalizes all color shifts and may incorrectly penalize some illuminants for favorable increases in chroma. We believe that Qa is more reflective of color quality in application because it does not penalize illuminants for chroma increases. Traditional gamut measures are inappropriate because of their dependence upon CCT. Qg was selected because it is an existing measure of relative gamut and because it shares some of the same computational framework as Qa, such as test-color samples. We believe that each of the two dimensions should have individual meaning and predict a criterion of color rendition or a hybrid criterion.
Figure 6 takes a hybrid-criterion approach by commingling Fidelity / Quality / Naturalness on the horizontal axis and Preference / Discrimination on the vertical axis. We believe that this is reflective of how light sources perform, how people perceive color, and how designers select light sources for applied lighting.
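The low-correlation criterion in item 5 can be illustrated with a minimal sketch, which is not the spreadsheet tool mentioned above. It computes the Pearson correlation for every pair of indices and returns the least correlated pair. The function names and the five scores per index are hypothetical and serve only as an illustration; in practice the selection also depends on the other criteria discussed above, not on correlation alone.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def least_correlated_pair(scores):
    """Return the pair of index names with the smallest |r|."""
    return min(
        combinations(sorted(scores), 2),
        key=lambda pair: abs(pearson(scores[pair[0]], scores[pair[1]])),
    )


# Hypothetical scores for five illuminants (not from the paper's data set).
scores = {
    "Ra": [95, 82, 70, 88, 60],
    "Qa": [94, 84, 72, 86, 63],
    "Qg": [100, 108, 95, 112, 90],
}

pair = least_correlated_pair(scores)
print(pair)
```

With these illustrative numbers the two fidelity-type indices are nearly collinear, so the least correlated pair combines one of them with the gamut-type index.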
Rather than indiscriminately computing color differences, the Qa computation does not penalize (nor does it reward) an illuminant for increasing object chroma. Hue shifts are penalized, as are chroma shifts that desaturate object colors. For example, and as illustrated in Fig. 6, neodymium lamps are rated highly (see the three orange + symbols with Qa ≈ 90 and Qg ≈ 115). They have an especially large gamut relative to a blackbody at the same CCT, which leads to a high score for Qg. But the increase in gamut is accompanied by a hue shift, which lowers the score for Qa. The data points in the scatter plot form a triangular pattern, converging near 100 for both Qa and Qg. This occurs because all sources with Qa = 100 will have a value of Qg near 100, while lower values of Qa allow for a much wider range of Qg. These measures provide useful information individually, are mutually complementary, and limit each other when considered together.

Fig. 6. Plot of Qa vs. Qg. The horizontal axis is related to fidelity and is a proxy for quality or naturalness when used for general illumination. The vertical axis is a measure of relative gamut and is a proxy for preference and discrimination. Refer to Table 2 for an explanation of the abbreviations.
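The notion of relative gamut that underlies Qg can be sketched as a ratio of polygon areas. The sketch below follows the summary description in the Appendix (gamut area of the test illuminant divided by that of a reference at the same CCT, multiplied by 100) and computes areas with the shoelace formula. It is not the CQS implementation: the function names are hypothetical, the coordinates are invented, vertices are assumed to be supplied in hue order, and four points stand in for the 15 test-color samples.

```python
def polygon_area(points):
    """Area of a simple polygon (shoelace formula); vertices given in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def relative_gamut(test_ab, reference_ab):
    """Gamut area of the test source as a percentage of the reference area."""
    return 100.0 * polygon_area(test_ab) / polygon_area(reference_ab)


# Hypothetical (a*, b*) coordinates: a square and a copy enlarged by 10%.
reference_ab = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
test_ab = [(1.1 * a, 1.1 * b) for a, b in reference_ab]

score = relative_gamut(test_ab, reference_ab)
print(round(score, 1))
```

Because the score is a ratio to a reference at the same CCT, it can exceed 100 whenever the test source enlarges the gamut, which is the behavior described for the neodymium lamps.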
As a next step, we believe that the lighting community should develop a two-measure system for color rendition, and we will enthusiastically participate in that work. In our opinion, the first goal of a two-measure system should be to communicate the maximal amount of information about a light source's color rendition potential. It must be readily interpretable by design professionals and formulated in such a way that it can be simplified even further into grades, classes, or words that would be understood by the general public. We do not believe that it is necessary for a new system to incorporate existing measures, but we also see no reason to invent new measures if what already exists can be intelligently combined into a two-measure system. While we cannot predict the future, we can imagine SPDs becoming even more discontinuous and spiky, just as we can imagine paints, plastics, and other materials with highly structured spectral reflectance distributions. This is our rationale for moving away from measures that rely on CIE test-color samples 1 to 8. It is also desirable to employ the same set of test-color samples for both measures. We believe that each of the two dimensions should have individual meaning and predict a criterion of color rendition or a hybrid criterion. Figure 6 takes a hybrid-criterion approach by commingling Fidelity/Quality on the horizontal axis and Preference/Discrimination on the vertical axis. We believe that this is reflective of how light sources perform and of how people perceive color. The straw-man of Fig. 6 is readily interpretable, providing information that can be helpful to expert users. If regions of the two-dimensional space of Fig. 6 were to be defined with words such as excellent, good, fair, and poor, these words could have meaning to non-experts, including the general public. Table 3 lists details about some of the 401 illuminants in our data set, including numerical scores for CCT, Ra, Qf, and Qa.
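How a two-dimensional plot might be reduced to a word scale can be sketched as follows. We have not defined numerical regions, so every cutoff below is a hypothetical placeholder, as are the function name and the downgrade rule; the sketch shows only the mechanics of mapping a (Qa, Qg) pair to one of four words.

```python
RATINGS = ["poor", "fair", "good", "excellent"]


def word_rating(qa, qg, qa_cutoffs=(70.0, 80.0, 90.0), qg_window=(90.0, 120.0)):
    """Map a (Qa, Qg) pair to a word category.

    All cutoffs are hypothetical placeholders, not proposed values.
    The level starts from the number of Qa cutoffs met and drops by
    one step when Qg falls outside the assumed window.
    """
    level = sum(qa >= cutoff for cutoff in qa_cutoffs)  # 0 to 3
    if not (qg_window[0] <= qg <= qg_window[1]):
        level = max(level - 1, 0)
    return RATINGS[level]


# Illustrative inputs only.
for qa, qg in [(95, 100), (90, 115), (85, 130), (60, 80)]:
    print(qa, qg, word_rating(qa, qg))
```

Any real system would need region boundaries grounded in visual experiments, and the regions need not be rectangular.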

Conclusions
While we appreciate and respect the considerable work that has gone into developing new and improved measures of color rendition, especially as part of CIE TC 1-69, the above analyses suggest that the newer indices are not remarkably different from the older ones. Many of the newer measures have stronger theoretical underpinnings, for example by employing improved test-color samples and the latest CIE color appearance models, chromatic adaptation models, and/or color spaces. Nevertheless, when the output of the computations is a single number, frequently on a scale of 0 to 100, these improved computational engines yield results that are highly similar to longstanding measures that were based on cruder underlying models. The basic problems of color rendition (fidelity, discrimination, and preference) are well established, not subjects of debate, and are clearly revealed in our correlation and MDS analyses. In our assessment, the newer measures that have been recently proposed do not represent reconceptualizations of the basic dimensions that define quality white light. The improvements are at the margins.
Unless new aspects are brought to bear, we believe that the work of the new CIE TCs is unlikely to lead to an index or indices that are unequivocally superior to those that already exist. We question the pragmatism of asking the lighting community to be patient while two new CIE committees wrestle with details that may be lost when complex computations are distilled to an integer scale. We hope that these new committees will reach consensus, and that their recommendations will be mutually supportive, but is it prudent to ask the lighting industry to continue to wait? Against this backdrop, common sense suggests that it is not possible for any single index to fully encapsulate the multidimensional problem of color rendition. For example, if a light source is to be optimized for a measure of color preference, then it is necessary to make some colors appear more saturated than they would under a reference illuminant [12,16,28,29]. Thus, there is an intrinsic tradeoff between measures of fidelity and preference. If a light source is to be optimized for a measure of gamut area, then the goal will be to create colors that are as saturated as possible. Maximizing a gamut-based measure will come at the expense of fidelity and preference since extreme color saturation is neither preferred nor natural.
As a single-number index, Ra has fulfilled the practical need for a measure that is simple to understand and readily interpretable. Its utility is proven despite its limitations and shortcomings. Yet, as light sources have become more spectrally complex, Ra has been pushed beyond its limits. We believe that any single-number index that has been or will be proposed as a replacement for Ra will still suffer from a fundamental problem: one number is not enough to characterize all dimensions of color rendition and one number cannot faithfully summarize color quality.
While an expert user might be comfortable evaluating multiple indices, it is still necessary to condense information into a limited number of measures. When just two measures are used, the maximal amount of information that is relevant to applied lighting is retained if one of them is a reference-based measure that is consistent with the concept of color fidelity or quality (such as Qa), and the other is a measure of relative gamut (such as Qg). To meet the needs of some users, such as residential homeowners, even multiple measures may be further compressed into a single scale, such as a word scale [48].

Appendix: summary descriptions of measures for color rendition
Relative Gamut Area Scale of CQS version 9.0c (2012), Qg (-): Computed as the relative gamut area formed by the (a*, b*) coordinates of the 15 test-color samples in CIELAB, normalized by the gamut area of a reference illuminant at the same CCT and multiplied by 100. Scaling is different from Qa, Qf, and Qp, and values can be greater than 100. Qg does not employ a chromatic adaptation transform. See comments above under Qa regarding the difference between Qg v7.5 and Qg v9.0c. [17,19]

Color Preference Scale of CQS version 7.5 (2009), Qp (P): This index places additional weight on preference of object color appearance based on the idea that increases in chroma are generally preferred. It is scaled from 0 to 100 and in such a way that 12 reference fluorescent lamp spectra have equivalent values of Qp and Ra. Qp was dropped from CQS version 9.0 with the belief that additional visual experiments are needed before Qp can be placed into practice. [17,18]

Memory Color Rendering Index (2010), MCRI (P, Q): MCRI is based on observers' memory of the preferred color of 10 familiar objects (e.g., fruits, flowers, skin, neutral grey). There is no reference illuminant; the reference is color memory. Tristimulus values for the objects are transformed to corresponding colors under D65 illumination using CIECAT02 and then transformed to the IPT color space, where MCRI is computed. MCRI has a range of 0 to 100. The result is also affected by the degree of adaptation and illuminance. [45]

Color Discrimination Index (1972), CDI (D): A higher CDI is associated with a larger gamut in the CIE 1960 UCS chromaticity diagram. The gamut is normalized to 100 based on CIE illuminant C. The practical range is about 10 to 130.