Perceived contrast on displays with different luminance ranges

Abstract
Purpose: Medical displays are fundamental in today's healthcare since they provide the link between digitally stored data and the human clinician; it is therefore important that the transfer of information is as effective and reliable as possible. Contrast perception in viewed images is complex due to the nature of the human visual system, and the luminance distribution in the viewed scene plays a major role. Standards and guidelines concerning medical displays are important as they set a baseline image quality. However, as both the number of imaging applications and display technology have evolved rapidly during the past decades, there may be uses not foreseen in the current guidelines. Bright screens may perform as well in bright rooms as less bright displays do in dark rooms, but current guidelines are likely to favor dark rooms for historical reasons. The purpose of this study was to determine the limits of contrast perception in three very different lighting conditions and relate the outcome to guideline recommendations.
Methods: Three different display luminance settings were studied, 1–250, 6–500, and 12–750 cd/m², with luminance ratios of 250, 85, and 62, respectively. Although the luminance ratios, black levels, and white levels were different, they all covered the same number of just noticeable differences (JNDs). Using a two-alternative forced-choice method, contrast thresholds were determined at dark, mid-gray, and bright pixel values for all luminance settings using bar patterns with two different spatial frequencies. In total, 18 contrast thresholds were determined by each of the 10 observers.
Results: The contrast thresholds for the low-frequency patterns were close to 0.5 JNDs, and there were no systematic differences between the three luminance conditions at any of the pixel values. The high-frequency patterns required almost 10 times higher contrast, and the highest contrast threshold (worst visibility) was obtained for the luminance setting 1–250 cd/m² at the dark pixel value.
Conclusions: The differences between the three luminance conditions were mostly minor, which indicates that display settings with low luminance ratios and high minimum luminance levels can be used without compromising displayed image contrast. The number of JNDs enclosed by the luminance range of a display is a reliable metric for global perceived contrast. Luminance ratios are of limited value for predicting the ability to detect low-contrast objects when there are large differences in luminance, although they can still be used within a relatively small range of luminance levels. Low luminance levels may cause a loss of visibility, especially for fine details, and should be avoided.


INTRODUCTION
Quality assurance of medical displays is important for consistent and optimal rendition of clinical images. The display is the link that translates digital pixel values into visible light perceivable by the human visual system (HVS). Standardized display properties ensure adequate quality of rendered medical images. There are numerous standards and guidelines that specify display requirements such as display size, resolution, homogeneity, noise, and temporal performance. [1][2][3][4][5][6][7][8][9][10] This paper focuses on the validity and limitations of the requirements for display luminance range, that is, the minimum and maximum luminance and the distance between them.
The first displays used in medical imaging had a low maximum luminance and a high reflectance. 11 The most logical solution was to use them in dark rooms, where reflections were reduced, the luminance ratio increased, and contrast improved. It was also a simple and natural solution to use the minimum and maximum luminance, together with the corresponding luminance ratio, as display requirements. The technical aspects of displays have improved rapidly since then. Today, the maximum luminance can be very high and the reflectance very low, which allows high-contrast renditions in much brighter environments, such as operating rooms and dental clinics.
For medical applications, the most commonly used requirement for the display input to luminance output characteristic is the grayscale standard display function (GSDF) in DICOM Part 14. 12 This function uses the concept of just noticeable differences (JNDs) to distribute the perceived contrast evenly throughout the entire luminance range. In short (and somewhat simplified), the number of JNDs enclosed by the minimum and maximum luminance of a display corresponds to the number of theoretically visible gray levels.
The luminance ratio works reasonably well as a requirement in dark environments, but a display with a high minimum luminance requires the maximum luminance to reach unrealistic levels if the specified luminance ratio is also high. In this case, the number of JNDs is probably a better requirement than the luminance ratio for specifying display contrast, since it accounts for the nonlinear nature of the HVS. As an example (see Tables 1 and 2), consider the following recommendations for diagnostic displays according to AAPM TG270: minimum luminance 1 cd/m²; maximum luminance 350 cd/m²; luminance ratio 350. 2 The corresponding JND range is 582 and will in this example be used as an alternative to the luminance ratio requirement. In a brighter room where the display minimum luminance is 4 cd/m², 582 JNDs can be achieved with a maximum luminance of 578 cd/m², while a luminance ratio of 350 would require a maximum luminance of 1400 cd/m², which is well above the specifications of most diagnostic displays. Even in a very bright environment, a high-end medical display could be used with a luminance range of 6-700 cd/m². In this case, the JND range is still 582, but the luminance ratio is only 117.
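The JND arithmetic in this example can be reproduced from the GSDF. The following is a minimal Python sketch, assuming the JND-index polynomial j(L) and its coefficients A to I as published in DICOM PS3.14 (valid from 0.05 to 4000 cd/m²). The helper names are hypothetical, and the maximum luminance for a given JND range is found by simple bisection rather than by the standard's closed-form inverse.

```python
import math

# Coefficients A..I of the DICOM PS3.14 GSDF: j(L) = sum_k c_k * log10(L)**k
GSDF_J_COEFFS = (71.498068, 94.593053, 41.912053, 9.8247004, 0.28175407,
                 -1.1878455, -0.18014349, 0.14710899, -0.017046845)


def jnd_index(lum):
    """GSDF JND index for a luminance in cd/m^2 (valid 0.05-4000 cd/m^2)."""
    x = math.log10(lum)
    return sum(c * x**k for k, c in enumerate(GSDF_J_COEFFS))


def jnd_range(l_min, l_max):
    """Number of JNDs enclosed by a display luminance range."""
    return jnd_index(l_max) - jnd_index(l_min)


def l_max_for_jnd_range(l_min, n_jnd, upper=4000.0):
    """L_max that encloses n_jnd JNDs above l_min (bisection on log10 L)."""
    lo, hi = math.log10(l_min), math.log10(upper)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if jnd_range(l_min, 10**mid) < n_jnd:
            lo = mid
        else:
            hi = mid
    return 10 ** (0.5 * (lo + hi))


n = jnd_range(1, 350)  # TG270 diagnostic example: 1-350 cd/m^2
print(f"1-350 cd/m^2: {n:.0f} JNDs, luminance ratio 350")
l_max = l_max_for_jnd_range(4, n)
print(f"L_min = 4 cd/m^2: same JND range needs L_max ~ {l_max:.0f} cd/m^2; "
      f"ratio 350 would need {4 * 350} cd/m^2")
print(f"6-700 cd/m^2: {jnd_range(6, 700):.0f} JNDs, luminance ratio {700 / 6:.0f}")
```

The bisection works because j(L) increases monotonically over the GSDF luminance range, so a fixed JND range can be matched for any minimum luminance as long as the resulting maximum stays below 4000 cd/m².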
A fixed luminance ratio thus requires the minimum luminance to be relatively low to keep the maximum luminance within a realistic range, while a fixed JND range would allow diagnostic displays to be used in much brighter environments. Image quality can be critical also in bright environments where dimming the lights is not possible. Unfortunately, displays in bright rooms tend to be low-cost models with little or no quality control. Quality displays with QA capabilities are more expensive than standard displays, and the investment is difficult to justify given that, at best, only the lower requirements (review displays) can be fulfilled. However, if the assumption of equal perceived contrast for displays with equal JND ranges is valid, the actual image quality can be just as good in bright rooms as in dark rooms. Today, the higher requirements (diagnostic displays) are impossible to meet in bright rooms because the luminance ratio requirement forces the maximum luminance to unrealistic levels. For end users in bright locations, there are no guidelines on how to best use their displays. For a more comprehensive theoretical study of luminance ranges and possible strategies for maintaining stable image contrast in bright rooms, see another recently published paper by Sund. 13 Replacing luminance ratios with JND ranges would not only allow standardized display properties in environments that cannot be dimmed, but would also make it possible to use much brighter reading rooms in general. Both AAPM Report 270 and ACR-AAPM-SIIM recommend that the minimum luminance not be too low, to avoid the mesopic region of the HVS. 2,8 The HVS performs better with more light. 14,15 Fatigue may also be reduced with more light. 16 There are a few detection studies related to the visibility of images under different lighting conditions.
Pollard showed that a moderate increase in illuminance (<100 lx) will likely not degrade, and may even improve, an observer's ability to detect low-contrast objects, provided that the luminance ratio is maintained. [17][18][19] Other studies demonstrated that an increase in ambient light degrades the visibility of image details. [20][21][22] However, in those studies the increase in ambient light was never accompanied by a corresponding increase in the display's maximum luminance, thereby reducing both the luminance ratio and the JND range. The purpose of this study was to determine the limits of contrast perception in three very different lighting conditions and relate the outcome to guideline recommendations. The corresponding contrast thresholds were determined in a human observer study using a two-alternative forced-choice (2AFC) method.

Terminology
L_min: Minimum luminance output from a display (cd/m²).
L_max: Maximum luminance output from a display (cd/m²).
E: Room illuminance measured at the display surface when the display is off (lx).
R_d: Display diffuse reflection coefficient (cd/m²/lx).
L_amb: Reflected luminance from the display (cd/m²).

Equipment
A high-end medical display (Eizo Radiforce RX350, Eizo, Hakusan, Japan), driven by a graphics board capable of 10-bit rendering (FirePro W5100, AMD, Sunnyvale, CA, USA), was placed in a small windowless room where the walls and ceiling were painted matte black and the gaps in the door framing were covered with black plastic sheets. The ambient light sources consisted of three Philips Hue color ambience E27 lights (Philips Lighting, Eindhoven, The Netherlands) positioned close to the ceiling, directly above the observer. The display luminance output was measured using an LS-100 telescopic luminance meter (Konica Minolta, Tokyo, Japan). All display calibrations and observer performance studies were performed using the same software, developed in-house by the author. During all luminance measurements, conditions such as display brightness, display mode, internal look-up tables (LUTs), graphics board LUTs, and room light properties were always recorded. The ambient light was also measured using the illuminance meter in the bezel of the display. Illuminance at the screen center was measured using a Hagner Universal photometer model S2 (Hagner, Solna, Sweden).

Display calibration
Three different luminance and illuminance settings were used, as shown in Table 3. They all covered 533 JNDs but had very different luminance ratios. The black and white levels were set during calibration by adjusting the display brightness and the internal LUT. Higher illuminances were used for the higher luminance settings to reduce the possible eye strain caused by viewing a bright display in a dark room. The calibrations were made from L′_min to L′_max according to DICOM Part 14, using the specified test pattern and 256 measurement points. Calibrations were stored in the display internal LUT, and the LUT on the display board was linear. For each luminance setting, the maximum luminance of the uncalibrated display was adjusted to be slightly higher than L_max after calibration. Keeping the uncalibrated maximum luminance only slightly above the calibrated value minimized the loss of luminance resolution caused by using only part of the LUT.
Notes to Table 3: L′_min is the minimum luminance including reflected light. L′_max is the maximum luminance including reflected light. E is the illuminance at the center of the display surface when the display is off. L_amb is the luminance caused by reflected light. r′ is the ratio between L′_max and L′_min. #JND is the number of JNDs enclosed by L′_min and L′_max. R_d is assumed to be 0.005 cd/m²/lx.
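A DICOM Part 14 calibration assigns each pixel value a target luminance so that equal pixel-value steps correspond to equal JND steps between L′_min and L′_max. The sketch below computes such a target table for 10-bit input. It is an illustration, not the in-house calibration software used in the study: it assumes the GSDF JND-index polynomial from DICOM PS3.14, inverts it by bisection, and leaves out the measurement step that maps targets onto actual LUT entries. The function names are hypothetical.

```python
import math

# Coefficients A..I of the DICOM PS3.14 GSDF: j(L) = sum_k c_k * log10(L)**k
GSDF_J_COEFFS = (71.498068, 94.593053, 41.912053, 9.8247004, 0.28175407,
                 -1.1878455, -0.18014349, 0.14710899, -0.017046845)


def jnd_index(lum):
    """GSDF JND index for a luminance in cd/m^2."""
    x = math.log10(lum)
    return sum(c * x**k for k, c in enumerate(GSDF_J_COEFFS))


def luminance_for_jnd(j, lo=0.05, hi=4000.0, iters=60):
    """Invert jnd_index by bisection on log10(L) over the GSDF range."""
    a, b = math.log10(lo), math.log10(hi)
    for _ in range(iters):
        m = 0.5 * (a + b)
        if jnd_index(10**m) < j:
            a = m
        else:
            b = m
    return 10 ** (0.5 * (a + b))


def gsdf_targets(l_min_prime, l_max_prime, n_levels=1024):
    """Target luminance per pixel value, equally spaced in JND index."""
    j0, j1 = jnd_index(l_min_prime), jnd_index(l_max_prime)
    step = (j1 - j0) / (n_levels - 1)  # JNDs per pixel value step
    targets = [luminance_for_jnd(j0 + p * step) for p in range(n_levels)]
    return targets, step


targets, step = gsdf_targets(1.0, 250.0)  # the 1-250 cd/m^2 setting
print(f"{len(targets)} targets, {step:.2f} JND per pixel value, "
      f"pixel value 511 -> {targets[511]:.1f} cd/m^2")
```

Run on the 1-250 cd/m² setting, this gives roughly 0.52 JND per 10-bit pixel value. That matches the 533 JNDs and 0.5 JND resolution quoted for a 10-bit display in the measurement section.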

Test image
The same test image (see Figure 1) was used for both the display luminance measurements and the observer studies. It was intended to simulate the luminance variations in a typical X-ray image. The entire screen was set to pixel value 127 using 8-bit grayscale rendering, corresponding to mid-gray.
In the center 900 × 900 pixels (19 × 19 cm), the actual test image was displayed using a 10-bit OpenGL rendering technique. The test image was divided into 30 × 30 squares whose pixel values were generated only once, at the beginning of the study, by random selection from 11 pixel values uniformly distributed between 0 and 1023. The center 10 × 10 squares were then replaced by a homogeneous area that could take any pixel value. During the observer studies, but not during the luminance measurements, a low-contrast bar pattern (64 × 64 pixels) was displayed in the center of the image, with an average luminance equal to the luminance of the surrounding homogeneous center area.
Figure 1. Test image for luminance measurements and test pattern observations. The center bar pattern was present only during the observer studies, not during the luminance measurements. In this image, the bar pattern contrast has been greatly exaggerated for demonstration purposes; normally, it was barely visible.
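The layout of the test image can be sketched with NumPy as follows. This is a simplified, hypothetical reconstruction. It assumes the 11 levels are evenly spaced 10-bit values from 0 to 1023, and it writes the bar pattern as plain gray values, whereas the study used measured RGB subpixel combinations. The function names and the example levels 499/500/501 are illustrative only.

```python
import numpy as np


def make_background(rng, size=900, n_squares=30, n_levels=11):
    """Random 30 x 30 grid of squares (30 px each), generated once per study."""
    sq = size // n_squares
    levels = np.round(np.linspace(0, 1023, n_levels)).astype(np.uint16)  # assumed even spacing
    grid = rng.choice(levels, size=(n_squares, n_squares))
    return np.kron(grid, np.ones((sq, sq), dtype=np.uint16))  # upscale squares to pixels


def with_center(background, value, n_squares=30, center_squares=10):
    """Copy of the background with the center 10 x 10 squares set to one value."""
    sq = background.shape[0] // n_squares
    start = (n_squares - center_squares) // 2 * sq
    stop = start + center_squares * sq
    img = background.copy()
    img[start:stop, start:stop] = value
    return img


def add_bar_pattern(img, low, high, period_px, vertical, size=64):
    """Place a size x size bar pattern (half period high, half low) in the center."""
    half = period_px // 2
    stripes = np.where((np.arange(size) // half) % 2 == 0, high, low)
    # Vertical bars vary along x (columns); horizontal bars vary along y (rows)
    patch = np.tile(stripes, (size, 1)) if vertical else np.tile(stripes[:, None], (1, size))
    c = img.shape[0] // 2
    out = img.copy()
    out[c - size // 2:c + size // 2, c - size // 2:c + size // 2] = patch
    return out


rng = np.random.default_rng(seed=1)
background = make_background(rng)          # reused for every setup
mid_gray = with_center(background, 500)    # homogeneous center at pixel value 500
stimulus = add_bar_pattern(mid_gray, low=499, high=501, period_px=8, vertical=True)
```

Because the high and low bars sit symmetrically around the center value, the pattern's mean equals the surrounding homogeneous area. The study achieved the same property in measured luminance rather than in pixel values.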

Luminance measurements and test pattern generation
Measuring luminance output valid for the actual viewing conditions was not trivial, even with a high-end telescopic luminance meter. The major problem concerned the lowest luminance output from the display. Even with the room lights off, light from the bright parts of the image was reflected by the luminance meter back onto the screen surface, thereby increasing the reflected light. The distance between the luminance meter and the display therefore greatly influenced the measured values. To achieve a measuring geometry valid for an observer viewing images, the luminance meter was placed approximately where the observer's head would be.
The Minolta LS-100 has a circular measurement area covering a 1° viewing angle, while the viewfinder covers 9°. The influence of light from the part of the viewfinder field outside the measurement area is normally small but can be substantial when measuring a small dark area surrounded by bright regions. 23 The normal closest focusing distance of the luminance meter is 1 m, but because of the head-simulating position closer to the screen, a close-up lens (lens 135) had to be used. To compensate for light attenuation in the lens, all measured values were multiplied by 1.05, as specified in the Minolta user manual. Using this lens had another positive effect: the entire viewfinder field of view was within the boundaries of the homogeneous center area of the test image, thereby reducing the effect of bright light from outside the measurement area.
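A quick geometric check shows why the close-up geometry keeps the viewfinder inside the homogeneous center area. The sketch assumes that the meter sat at the 39 cm viewing distance given for the observer studies (the exact meter distance is not stated). It uses the 900 px / 19 cm test-image dimensions to infer the pixel pitch.

```python
import math

VIEW_DISTANCE_MM = 390.0       # assumption: meter at the 39 cm observer viewing distance
PIXEL_PITCH_MM = 190.0 / 900   # inferred from 900 px spanning 19 cm


def field_diameter_mm(distance_mm, field_deg):
    """Diameter on the screen covered by a circular field of view."""
    return 2 * distance_mm * math.tan(math.radians(field_deg) / 2)


spot = field_diameter_mm(VIEW_DISTANCE_MM, 1.0)        # 1 deg measurement area
viewfinder = field_diameter_mm(VIEW_DISTANCE_MM, 9.0)  # 9 deg viewfinder field
center_area = 10 * 30 * PIXEL_PITCH_MM                 # 10 x 10 squares of 30 px each
print(f"measurement spot {spot:.1f} mm, viewfinder {viewfinder:.1f} mm, "
      f"homogeneous center {center_area:.1f} mm")
```

Under these assumptions, the 9° viewfinder field (about 61 mm) fits just inside the 300 × 300 px homogeneous center (about 63 mm), and the 1° measurement spot is about 7 mm across.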
Since extremely low-contrast test patterns (down to 0.1 JND) were to be displayed at different luminance levels, it was crucial to measure the display luminance response at the highest possible luminance resolution. An 8-bit display covering 533 JNDs has a luminance resolution of 2.1 JNDs per pixel value change, while a 10-bit display has 0.5. Since even 10-bit rendering would be insufficient, the three colored subpixels were allowed to differ from each other by one pixel value, thereby increasing the number of (near-)gray levels to 7162 (0.074 JND per subpixel change). 24,25 The luminance response for each of the three luminance settings was measured three times, and the average value for each gray level was used for test pattern generation.
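The level count follows from letting each of the R, G, and B channels take either v or v + 1 for every 10-bit gray value v. This adds six mixed combinations between each pair of neighboring grays, so 1024 + 6 × 1023 = 7162 levels. The sketch below only counts these levels and reports the average JND step. It assumes the steps are spread evenly over 533 JNDs, which the measured levels are not; in the study, the order and spacing came from the repeated luminance measurements.

```python
import itertools


def near_gray_levels(bits=10):
    """All RGB triplets whose channels are v or v + 1 for some gray value v."""
    max_v = 2**bits - 1
    levels = set()
    for v in range(max_v + 1):
        for offsets in itertools.product((0, 1), repeat=3):
            levels.add(tuple(min(v + o, max_v) for o in offsets))  # clip at full white
    return levels


JND_RANGE = 533  # JNDs covered by all three luminance settings
levels = near_gray_levels(10)
for label, n in (("8-bit gray", 256), ("10-bit gray", 1024), ("10-bit near-gray", len(levels))):
    print(f"{label}: {n} levels, {JND_RANGE / (n - 1):.3f} JND per step on average")
```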
Since the distances between subsequent luminance values were irregular, the software searched for the best possible test pattern with a given luminance and contrast, within given tolerances (±5% for luminance and ±0.01 JND for contrast). The requirement for a good bar pattern was that the higher and lower luminance levels were equally spaced from a center luminance level. This center luminance level was then used for the homogenous area in which the bar pattern was positioned. By using this method, any difference in bar pattern average luminance from the luminance in the surrounding homogenous area was too small to be detectable.
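The search for a bar pattern can be sketched as a scan over pairs of measured levels. The sketch assumes that contrast is the JND difference between the high and low bars, and that the best pattern is the one whose center level lies closest in luminance to the mean of the two bars. The ±5% and ±0.01 JND tolerances come from the text; the function name is hypothetical. The usage example runs on synthetic, randomly spaced levels with a toy logarithmic JND scale, not the GSDF or the study's measurements.

```python
import numpy as np


def find_bar_pattern(lum, jnd, target_lum, target_contrast, lum_tol=0.05, contrast_tol=0.01):
    """Return (low, center, high, mismatch) indices into the sorted levels, or None.

    lum: measured luminances sorted ascending; jnd: JND index of each level.
    """
    lum, jnd = np.asarray(lum, dtype=float), np.asarray(jnd, dtype=float)
    best = None
    for lo in range(len(lum)):
        if lum[lo] > target_lum * (1 + lum_tol):
            break  # the low bar must lie below an acceptable center level
        # High levels whose contrast to `lo` is within the contrast tolerance
        first = np.searchsorted(jnd, jnd[lo] + target_contrast - contrast_tol, side="left")
        last = np.searchsorted(jnd, jnd[lo] + target_contrast + contrast_tol, side="right")
        for hi in range(first, last):
            mid = 0.5 * (lum[lo] + lum[hi])       # mean luminance of the two bars
            c = int(np.searchsorted(lum, mid))
            for center in (c - 1, c):             # nearest levels on either side of mid
                if not lo < center < hi:
                    continue
                if abs(lum[center] - target_lum) > lum_tol * target_lum:
                    continue
                mismatch = abs(lum[center] - mid)
                if best is None or mismatch < best[3]:
                    best = (lo, center, hi, mismatch)
    return best


# Synthetic example: irregularly spaced levels and a toy JND scale (NOT the GSDF)
rng = np.random.default_rng(0)
lum = np.sort(rng.uniform(1.0, 250.0, 7162))
jnd = 100.0 * np.log(lum)
result = find_bar_pattern(lum, jnd, target_lum=50.0, target_contrast=0.5)
if result is not None:
    lo, c, hi, err = result
    print(f"low {lum[lo]:.3f}, center {lum[c]:.3f}, high {lum[hi]:.3f} cd/m^2; "
          f"contrast {jnd[hi] - jnd[lo]:.3f} JND; mismatch {err:.4f} cd/m^2")
```

Using the center level closest to the bar mean as the homogeneous surround is what keeps any average-luminance step at the pattern border below visibility.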

Observer studies: determination of contrast thresholds
For each of the three luminance settings, contrast thresholds were determined for bar patterns at three pixel values: 50 (dark), 500 (mid-gray), and 950 (bright). Two different bar pattern spatial frequencies were also used, one with 8 pixels per period (4 high + 4 low) and one with 2 pixels per period (1 high + 1 low). In total, 18 contrast thresholds were determined for each observer, each by a 2AFC technique combined with an adaptive method in what is referred to as a run. 26 Each run took approximately 10 min to complete, and the observers were free to decide the number of consecutive runs in each session before taking a break. Before each new run, the observers had to wait at least 30 s for their vision to adapt and for the display and room lighting to stabilize. The order of the runs was randomized for each observer, but all lower-frequency patterns were always viewed before any of the higher-frequency patterns. All technical aspects associated with each run, such as display luminance, display mode, LUTs, room light level settings, and room illuminance, were automatically set and verified by the software before each run could start. The viewing distance was 39 cm, and for the 8-pixels-per-period patterns this corresponded to the standard target defined by DICOM Part 14 (a 2-deg × 2-deg square filled with a horizontal or vertical grating with sinusoidal modulation of 4 cycles per degree), apart from the fact that bar patterns were used instead of sinusoidal patterns.
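The correspondence with the DICOM Part 14 standard target at 39 cm can be checked with a few lines of arithmetic. The pixel pitch (about 0.211 mm) is inferred from the 900 px / 19 cm test-image dimensions rather than taken from a display specification, so the numbers are approximate.

```python
import math

PIXEL_PITCH_MM = 190.0 / 900   # inferred from 900 px spanning 19 cm
VIEWING_DISTANCE_MM = 390.0    # 39 cm viewing distance


def mm_per_degree(distance_mm=VIEWING_DISTANCE_MM):
    """Screen distance subtended by one degree of visual angle."""
    return 2 * distance_mm * math.tan(math.radians(0.5))


def cycles_per_degree(period_px, distance_mm=VIEWING_DISTANCE_MM):
    """Spatial frequency of a bar pattern with the given period in pixels."""
    return mm_per_degree(distance_mm) / (period_px * PIXEL_PITCH_MM)


def subtended_degrees(size_px, distance_mm=VIEWING_DISTANCE_MM):
    """Visual angle of a square pattern side of size_px pixels."""
    return math.degrees(2 * math.atan(size_px * PIXEL_PITCH_MM / (2 * distance_mm)))


for period in (8, 2):
    print(f"{period} px per period: {cycles_per_degree(period):.1f} cycles/degree")
print(f"64 px pattern: {subtended_degrees(64):.2f} degrees")
```

Under these assumptions, the 8-pixel period gives about 4 cycles/degree and the 2-pixel period about 16 cycles/degree. The 64 × 64 px pattern subtends about 2°, consistent with the 2-deg × 2-deg standard target.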
There are many possible methods to determine contrast thresholds. [27][28][29][30][31][32][33] The method used in this study combined 2AFC with an adaptive method that determined the contrast of the upcoming pattern in the run based on previous answers. 26 The goal of the adaptive method was to choose patterns with a contrast close to, or slightly above, the estimated contrast threshold. For each of the 18 setups, a set of bar patterns was created within a plausible contrast range, considering that different observers have different contrast sensitivity. Each bar pattern could be displayed in one of two orientations, vertical or horizontal, and the observer had to decide the most likely orientation. The first pattern was always the one with the highest contrast, and the following patterns had decreasing contrast until the observer made an erroneous decision. From that point on, the contrast of the next pattern was always determined by using all of the observer's previous responses combined with a modified version of the cumulative Gaussian distribution. The modified version spanned from 0.5 to 1.0, since the guess rate in a 2AFC study is 50%. The best least-squares fit of the modified cumulative Gaussian to all responses was determined using the COBYLA optimization algorithm.
After 100 observations following the first error, the psychometric curve was approximated by the last fit of the modified cumulative Gaussian function. The contrast threshold was taken as the mean of the Gaussian distribution, which corresponds to 75% correct responses, midway between guessing and 100% correct. See Figure 2 for an example run.
Figure 2. Results from one run, that is, the determination of one contrast threshold for one observer. In the upper chart, test pattern contrast is shown for every observation in the run. White squares indicate correct answers, while black squares indicate incorrect answers. The lower chart shows the proportion of correct observations for each test pattern contrast together with the best possible fit of the modified cumulative Gaussian function. The number of observations for each contrast level is represented by the filled gray area. The contrast threshold was set to the test pattern contrast resulting in 75% correct answers according to the fitted function.
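The fitting step can be sketched with SciPy's COBYLA optimizer. This is a minimal illustration, not the study's software. It fits only the final psychometric curve by least squares over individual 0/1 responses, parameterizes σ on a log scale to keep it positive, and leaves out the adaptive rule that chose each next contrast. The simulated observer (threshold 0.5 JND, σ = 0.15) is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def p_correct(contrast, mu, sigma):
    """Modified cumulative Gaussian for 2AFC: 0.5 (guessing) up to 1.0."""
    return 0.5 + 0.5 * norm.cdf((contrast - mu) / sigma)


def fit_threshold(contrasts, correct):
    """Least-squares fit with COBYLA; mu is the 75 % correct threshold."""
    contrasts = np.asarray(contrasts, dtype=float)
    correct = np.asarray(correct, dtype=float)

    def sse(params):
        mu, log_sigma = params
        return np.sum((correct - p_correct(contrasts, mu, np.exp(log_sigma))) ** 2)

    x0 = [np.median(contrasts), np.log(np.std(contrasts) + 1e-6)]
    res = minimize(sse, x0, method="COBYLA", options={"rhobeg": 0.2, "maxiter": 2000})
    mu, log_sigma = res.x
    return mu, np.exp(log_sigma)


# Hypothetical simulated observer with a true threshold of 0.5 JND
rng = np.random.default_rng(3)
contrasts = np.repeat(np.linspace(0.1, 1.0, 10), 50)
correct = rng.random(contrasts.size) < p_correct(contrasts, 0.5, 0.15)
mu, sigma = fit_threshold(contrasts, correct)
print(f"estimated threshold {mu:.2f} JND (75 % correct), sigma {sigma:.2f}")
```

Because the curve is scaled to run from 0.5 to 1.0, its midpoint (75% correct) falls exactly at the Gaussian mean μ. This is why the mean can be reported directly as the threshold.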
Each mean contrast threshold was calculated by averaging the individual contrast thresholds of the ten observers. Confidence intervals and paired difference tests were calculated using bootstrapping. [34][35][36] For each bootstrap sample, ten observers were randomly selected (with replacement), making the results representative of a general population of observers. For each of the observers' runs, the observations at every test pattern contrast were replaced by resampling from the original observations at that contrast level, resulting in new psychometric curves and new contrast thresholds. A total of 2000 bootstrap samples were drawn, resulting in the same number of mean value estimates for every contrast threshold. The confidence interval for each mean was calculated as the middle 95% of all estimates. Difference distributions were calculated by creating pairwise differences between all contrast threshold estimates. The p-value for the difference between two contrast thresholds was determined by the position of zero within the difference distribution. If zero splits the distribution exactly in the middle, there is (on average) no difference, resulting in a p-value of 1. If zero splits the distribution with 2.5% of all values on one side and 97.5% on the other, the p-value is 0.05.
Figure 3. Average contrast thresholds for the ten observers. Left chart (A): low-frequency patterns. Right chart (B): high-frequency patterns. Three display luminance conditions were studied: 1-250, 6-500, and 12-750 cd/m². For each condition, contrast thresholds were determined at three pixel values: dark (50), mid-gray (500), and bright (950). Error bars indicate the 95% confidence intervals.
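The outer bootstrap loop, the percentile confidence interval, and the p-value rule can be sketched as below. The sketch simplifies the procedure described above: it resamples only per-observer thresholds and omits the inner step of resampling observations within each run and refitting the psychometric curves. It also assumes that differences are paired within each bootstrap sample. The threshold values in the usage example are hypothetical.

```python
import numpy as np


def bootstrap_observer_means(thresholds, n_boot=2000, seed=0):
    """thresholds: (n_observers, n_conditions); resample observers with replacement."""
    thresholds = np.asarray(thresholds, dtype=float)
    rng = np.random.default_rng(seed)
    n_obs = thresholds.shape[0]
    idx = rng.integers(0, n_obs, size=(n_boot, n_obs))
    return thresholds[idx].mean(axis=1)  # shape (n_boot, n_conditions)


def percentile_ci(estimates, level=0.95):
    """Middle `level` fraction of the bootstrap estimates."""
    tail = 100 * (1 - level) / 2
    return tuple(np.percentile(estimates, [tail, 100 - tail]))


def bootstrap_p_value(differences):
    """Two-sided p-value from where zero falls in the difference distribution."""
    d = np.asarray(differences, dtype=float)
    return min(1.0, 2 * min(np.mean(d < 0), np.mean(d > 0)))


# Hypothetical thresholds (JND) for 10 observers under two conditions
rng = np.random.default_rng(7)
thresholds = np.column_stack([rng.normal(0.50, 0.08, 10), rng.normal(0.55, 0.08, 10)])
boot = bootstrap_observer_means(thresholds)
for k in range(2):
    lo, hi = percentile_ci(boot[:, k])
    print(f"condition {k}: mean {thresholds[:, k].mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print(f"p = {bootstrap_p_value(boot[:, 1] - boot[:, 0]):.3f}")
```

The p-value rule reproduces the two reference cases in the text. Zero exactly in the middle of the difference distribution gives p = 1, and 2.5% of the differences on one side of zero gives p = 0.05.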

RESULTS
Each of the 10 observers determined 18 contrast thresholds, for a total of 180 2AFC runs. The result from one of the runs is shown in Figure 2. Most of the observations were made close to the detection threshold. The average contrast thresholds for the ten observers are shown in Figure 3 for two bar pattern frequencies, three luminance conditions, and three pixel values. Pairwise p-values for the differences between contrast thresholds are shown in Table 4.
The contrast thresholds for the low-frequency patterns were close to 0.5 JNDs for all luminance settings and pixel values, while the contrast thresholds for the high-frequency patterns were almost 10 times higher.
The GSDF is based on the visibility of test patterns with a frequency corresponding to the lower frequency used in this study. For this frequency, the contrast thresholds at all three pixel values were, with a few exceptions, almost the same, and thus close to perceptually linear. Although there were some statistically significant differences, they were not systematic. The issue of possible systematic errors (which are not reflected in the p-values) when determining very low contrast thresholds is addressed in the discussion. For the high-frequency patterns, no significant differences were found between the three luminance settings at the mid-gray and bright pixel values. However, at the dark pixel value, the contrast threshold for 1-250 cd/m² was significantly higher than for 12-750 cd/m². The contrast threshold for 1-250 cd/m² was also significantly higher at the dark pixel value than at the mid-gray and bright pixel values. For the other luminance settings, there were no significant differences between any of the pixel values for the high-frequency patterns.

DISCUSSION
The JND is based on human perception studies under different lighting conditions, so it is not surprising that the JND range of a display is a valid metric for perceived contrast. In this study, contrast properties for displays with three different luminance ranges, but with the same JND range, were examined. The differences were found to be minor, with similar contrast properties at the three pixel values: dark, mid-gray, and bright. Although there were a few statistically significant differences between the luminance settings for the low-frequency patterns, there was no trend indicating which setting performed best. The GSDF accounts for different contrast thresholds at different luminance levels, but only for the specific frequency corresponding to the low-frequency patterns in this study. The higher contrast thresholds obtained in dark image areas for the high-frequency patterns are consistent with the theoretical fact that small details are difficult to see when the luminance is low. 37 Image quality would probably improve by avoiding low display luminance levels. Another possible reason is that the luminance ratio is highest for 1-250 cd/m² and lowest for 12-750 cd/m². A high luminance ratio causes a decrease in contrast sensitivity as the difference between image object luminance and adaptation luminance increases. [37][38][39][40] Although the GSDF was a big step toward perceptual linearization, the JND is only valid when viewing a single luminance at a time. 12 One solution is to use a modified version of the GSDF that takes the luminance range into account. 40 If an unmodified version of the GSDF is used, a smaller luminance range may even be beneficial for the perceived contrast, since the HVS then operates closer to its peak contrast sensitivity. 13 A common misconception about image display is that a large luminance range with a dark black and a bright white is always better than a smaller luminance range.
The HVS has a remarkable capability of scaling a multitude of luminance ranges into the same perceived gray-scale range. 41 The risk of experiencing dull and gray images on a display with a low luminance range is therefore low, provided that the JND range is sufficiently high. Medical display calibration is about contrast visibility, and the spacing between gray-levels is far more important than their actual intensity.
The measured contrast thresholds in this study, expressed in JNDs, reflect properties of the HVS, and do not depend on the number of JNDs per gray level. However, when viewing images, a large JND range is important for low-contrast visibility, and a large number of gray levels (image and display bit depth) is important to reduce luminance quantization effects. The minimum requirements for these parameters are probably task dependent and outside the scope of this paper.
The consensus for the past 30 years has been to use a relatively low black level and a high luminance ratio, which is also what most guidelines in the field of medical displays recommend. The historical reason behind these recommendations is that they provided an easily implementable solution to the two major problems of the displays of that time: high reflectance and low maximum luminance. Today, modern displays are very bright and use antireflective technology, and are therefore much less problematic. Although the current recommendations still fulfill their purpose of excluding scenarios with poor image quality, they are also somewhat blunt and exclude other scenarios with adequate or possibly even superior image quality. Some of the excluded scenarios may even show better image quality than the included ones, especially when the luminance increases to levels where the HVS performs better. A display with a black level of 12 cd/m² and a luminance ratio of 62 may seem unlikely to be used in a clinical environment, but when using a display with a reflection coefficient of 0.005 cd/m²/lx in an operating room with 2000 lx, this is close to reality. Another problem occurs with surgical lights that have a central illuminance of 40 000-160 000 lx. To avoid adaptation problems, both the room illuminance and the display luminance should be deliberately high. As the results from this study show, common diagnostic displays could very well be used under such conditions without compromising perceived contrast in any part of the image, if properly calibrated according to room illumination.
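The operating-room figure follows from the reflected-luminance relation L_amb = R_d · E, which the units in the terminology list imply. In the short calculation below, the display's own black level of 2 cd/m² is a hypothetical value chosen for illustration. The 750 cd/m² white level is taken from the brightest setting in this study.

```python
import math

R_D = 0.005          # cd/m^2/lx, diffuse reflection coefficient assumed in Table 3
E_OR = 2000.0        # lx, operating-room illuminance from the text
L_MIN_PANEL = 2.0    # cd/m^2, hypothetical black level of the display itself
L_MAX_PRIME = 750.0  # cd/m^2, white level (including reflections) of the brightest setting

l_amb = R_D * E_OR                  # reflected luminance L_amb = R_d * E
l_min_prime = L_MIN_PANEL + l_amb   # effective black level L'_min
ratio = L_MAX_PRIME / l_min_prime   # effective luminance ratio r'
print(f"L_amb = {l_amb:.1f} cd/m^2, L'_min = {l_min_prime:.1f} cd/m^2, r' = {ratio:.1f}")
```

Reflected light alone contributes 10 cd/m² here. For any display in such a room, the effective black level is therefore dominated by the room rather than by the panel.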
According to the visual model published by Barten, 37 higher spatial frequencies require higher contrast to be visible at a given luminance, which agrees with the results from this study. However, the increase in contrast threshold expected between the two frequencies used is smaller than the obtained result, which was close to a factor of ten. One contributing factor could be pixel bleeding from the thin bright lines into the thin dark lines in the test pattern, thereby reducing the actual contrast. There is also a difference in luminance conditions between the model and this study. The model assumes ideal viewing conditions, whereas in this study the pattern was positioned in the center of an area with large luminance variations. Intraocular light scattering is known to reduce contrast sensitivity. 42,43
The rapid improvement in display technology, with less reflective surfaces and higher light output, has enabled the use of medical displays in bright environments with adequate image quality. Replacing the luminance ratio requirement with a JND range requirement would allow consistent image presentation for a wider range of operating conditions, including bright rooms. The possibility of using brighter displays in brighter rooms is not only of interest when dimming the lights is not feasible. Since the HVS is likely to perform better with more light, and humans in general tend to dislike working in dimly lit rooms, it may be time for radiologists to replace dark reading rooms with brighter locations. If the room illumination is constant and the display is calibrated accordingly, the perceived contrast will be as good as, or possibly better than, in dark rooms.

CONCLUSIONS
The number of JNDs enclosed by the luminance range of a display is a reliable metric for global perceived contrast. Luminance ratios are limited regarding the ability to detect low-contrast objects, since they do not take the properties of the HVS into account, although they can still be used within a relatively small range of low luminance levels. If the JND range requirements of a display are met, and the display is calibrated correctly according to ambient illumination, low luminance ratios and high black levels are possible to use without compromising image contrast.

ACKNOWLEDGMENTS
The author would like to thank the observers without whom this paper would not exist. The observers were: Angelica, Anna, Diba, Jennie, Ludvig, Magnus, Maria, Simon, Suhela, and the author himself. Eizo Nordic AB provided the API toolkit, which enabled software control of the display properties.

FUNDING INFORMATION
The author received no specific funding for this work.

CONFLICT OF INTEREST
The author has no relevant conflicts of interest to disclose.

DATA AVAILABILITY STATEMENT
Data available on request from the authors.