Contrast Sensitivity of the Wavelet, Dual Tree Complex Wavelet, Curvelet, and Steerable Pyramid Transforms

Accurate estimation of the contrast sensitivity of the human visual system is crucial for perceptually based image processing in applications such as compression, fusion and denoising. Conventional contrast sensitivity functions (CSFs) have been obtained using fixed-sized Gabor functions. However, the basis functions of multiresolution decompositions such as wavelets often resemble Gabor functions but are of variable size and shape. Therefore to use the conventional CSFs in such cases is not appropriate. We have therefore conducted a set of psychophysical tests in order to obtain the CSF for a range of multiresolution transforms: the discrete wavelet transform, the steerable pyramid, the dual-tree complex wavelet transform, and the curvelet transform. These measures were obtained using contrast variation of each transforms' basis functions in a 2AFC experiment combined with an adapted version of the QUEST psychometric function method. The results enable future image processing applications that exploit these transforms such as signal fusion, superresolution processing, denoising and motion estimation, to be perceptually optimized in a principled fashion. The results are compared with an existing vision model (HDR-VDP2) and are used to show quantitative improvements within a denoising application compared with using conventional CSF values.

Abstract-Accurate estimation of the contrast sensitivity of the human visual system is crucial for perceptually based image processing in applications such as compression, fusion and denoising.Conventional contrast sensitivity functions (CSFs) have been obtained using fixed-sized Gabor functions.However, the basis functions of multiresolution decompositions such as wavelets often resemble Gabor functions but are of variable size and shape.Therefore to use the conventional CSFs in such cases is not appropriate.We have therefore conducted a set of psychophysical tests in order to obtain the CSF for a range of multiresolution transforms: the discrete wavelet transform, the steerable pyramid, the dual-tree complex wavelet transform, and the curvelet transform.These measures were obtained using contrast variation of each transforms' basis functions in a 2AFC experiment combined with an adapted version of the QUEST psychometric function method.The results enable future image processing applications that exploit these transforms such as signal fusion, superresolution processing, denoising and motion estimation, to be perceptually optimized in a principled fashion.The results are compared with an existing vision model (HDR-VDP2) and are used to show quantitative improvements within a denoising application compared with using conventional CSF values.
Index Terms-Discrete wavelet transforms, contrast sensitivity function.

I. INTRODUCTION
T HE Contrast Sensitivity Function (CSF) measures the threshold sensitivity of simple grating patterns of the Human Visual System (HVS) over a wide range of spatial frequencies.It plays a central role in the visual models and metrics used in perceptually-based image processing techniques such as image fusion and compression (e.g.[1], [2]).
Wavelet decompositions (see section II) provide invertible multiscale transforms enabling two-dimensional spatial-frequency analysis.The usage of wavelet decompositions in perceptual image processing has commonly been based on standard CSFs that have been generated using fixed-size Gabor functions (e.g.[2]).Watson et al. [3] measure and model the CSF curves associated with the DWT and observe that they do not follow the conventional CSF shape.Specifically, maximum contrast sensitivity is significantly shifted toward lower frequencies.
Although effective in multiscale decomposition, the basis functions of the DWT do not effectively decompose orientated two dimensional content compared to more recent wavelet decompositions (as discussed below).Our work is based on the work of Watson et al. [3] and its aim is therefore to generalise this previous work; specifically, to generate contrast sensitivity measures for each subband for a range of very effective and popular wavelet transforms, thus enabling more effective and accurate perceptual processing.Additionally, we have added further appropriate and useful experimental conditions (such as pink noise) compared to Watson et al. [3].In principle, the presented thresholds could be predicted using Linear System Theory (LST) from previous developed models.However, due to the inherent non-linearities of the HVS and related effects, direct measurements will provide precision to an accuracy not possible through model predictions.
This paper is structured as follows.First the background of both wavelets and contrast sensitivity functions are discussed in section II.The methods used to obtain the threshold values for each of the considered transforms is described in section III.The results are presented and discussed in section IV.This is followed by a section comparing the results to an existing vision model in section V and a description of applications in section VI.Finally a conclusion including a description of further work is included in section VII.

II. BACKGROUND A. Wavelet Transforms
Wavelets, in their different forms, provide a large number of invertible transforms that decompose a signal into a selfsimilar, weighted set of basis functions that vary in scale and orientation (in two dimensions).Although the discrete form (the Discrete Wavelet Transform: DWT) has primarily been used for compression, more recent non-critically decimated discrete forms (i.e.where the number of coefficients equals the number of samples) have been defined for the This work is licensed under a Creative Commons Attribution 3.0 License.For more information, see http://creativecommons.org/licenses/by/3.0/use in analysis applications such as: image and signal fusion, super-resolution processing, denoising and motion estimation.Popular wavelet transforms used for analysis include the DWT, DT-CWT, Curvelet and Steerable Pyramid Transforms.These are discussed below.

1) Discrete Wavelet Transform (DWT):
The DWT is a critically decimated invertible transform that uses separable orthogonal or bi-orthogonal high and low pass spatial filters that have been designed to provide perfect reconstruction [4]. 1he CSF has been measured and modelled for the DWT by Watson et al. [3].
2) Dual-Tree Complex Wavelet Transform (DT-CWT): The two dimensional DT-CWT was formulated by Kingsbury [5] and Selesnick et al. [6] in order to provide improved directional resolution and shift invariance compared to the conventional DWT.In common with the DWT, the DT-CWT is implemented using separable FIR filters over octave frequencies using subsampling between levels.However, the DT-CWT utilises two separate trees that form Hilbert filter pairs leading to complex real and imaginary coefficients in each subband. 2he transformation is able to distinguish between positive and negative frequencies leading to 6 orientated complex subbands at each scale (with the 6 orientated subbands being evenly angularly distributed as illustrated in figure 3).
It has been identified by Adelson et al. [9] that separating the positive and negative frequencies of an image transform is important in terms of human vision based processing.Figure 3(a) shows 3 subbands' real basis functions (there are 6 per scale with the remaining 3 being mirrors of the ones shown).This figure shows: the basis functions orientated at the 3 angles (15°, 45°, 75°), exponential change in support and how the basis functions resemble Gabor functions.The near shift invariance and improved directional selectivity have facilitated excellent results in denoising, fusion and other image processing applications [10]- [12].
3) Curvelet Transform: Conventional wavelet transforms such as the DWT are not well suited to two dimensional discontinuities.The Curvelet transform is a multiscale decomposition where an image is decomposed with geometric bases in multiple directions and positions [13].Specifically, frequency space is covered with directionally selective subbands at octave frequency centres with increasing directional selectivity as the frequency increases (leading to needle shaped basis functions at these finer scales).
Curvelets use parabolic scaling whereby at scale j each basis function has approximately the length of 2 − j/2 and width 2 − j .This arrangement leads to geometrically tuned basis functions that are better able to represent orientated discontinuities (especially at higher frequencies) and therefore give a sparser representation for natural images.Due to their geometric nature they have given very good results in applications such as denoising, image contrast enhancement and fusion (e.g.[14], [15]).Curvelets have been implemented using both an Unequally-Spaced Fourier Transform (USFFT) and also the wrapping of specially selected Fourier samples (wrapping method) [13].The USFFT implementation has been used within this work (to generate the basis functions shown in figure 5(a)).

4) Steerable Pyramid Transform:
The Steerable Pyramid decomposition is a multi-orientation, multi-scale and invertible transform that has been used for many image processing applications over the last two decades [16], [17].The Steerable Pyramid differs from the DWT, DT-CWT and Curvelet transforms in that the degree of orientation selectivity can be arbitrarily selected.Using kth order directional derivatives, the basis functions have k+1 orientations over an arbitrary number of scales and the transform was originally implemented using non-separable 2D steerable spatial filters.We have used the FFT implementation [17] to generate the basis functions shown in figure 4(a).

B. Contrast Sensitivity Function
A widely adopted approach to obtain the transfer function of the HVS is to measure the contrast sensitivity for spatially modulated and orientated Gabor or sinusoid gratings.CSF functions obtained with this method produce the typical inverse u-shaped graph that illustrates the HVS attenuation in sensitivity at high and low frequencies.A large number of models for the CSF have been developed such as those by: Daly [18], Barten [19] and van Meeteren and Vos [20]. 3 These works use common methods for obtaining the CSF using sinusoidal or Gabor gratings and vary the pattern contrast on a fixed uniform background luminance to obtain the threshold when the pattern is "just seen".Experiments commonly use iterated forced choice methods such as 2AFC [3] or 4AFC [21] to obtain this threshold.The current contrast together with whether the observer has chosen the correct forced choice is then compared to a chosen psychometric function model producing an updated contrast value and the method iterated until convergence.It should be noted that the estimated psychometric function represents the probability of detection reflecting that HVS thresholds are not simple step functions.

1) Contrast Sensitivity Function Dependencies:
The CSF has a large number of dependencies on signal context.We now give a non exhaustive list of these dependencies.a) Static average background luminance: Many different studies have measured the CSF on various (uniform) luminance backgrounds (for example an extensive study of CSF for different backgrounds on HDR displays was conducted by Kim et al. [21]).However, for non HDR applications, a static average background luminance is often assumed.Within the following experiments the background was measured to be 23.91cd/m 2 .This value was chosen to give a suitable average background luminance for a non HDR display. 3These three references have all used sinusoids windowed with a rectangle.This is historically the first way that CSFs were obtained before the more common contemporary use of Gabor filters.

b) Adaptation effects:
The effect of light adaptation on the CSF was first investigated by Van Nes et al. [22].More recent contrast sensitivity experiments have used a selection of typical natural images to adapt the HVS to typical frequency and spatial content.In studies, these images have been presented before, [23] and during, the stimulus presentation [23], [24].A study of the effect on the CSF functions on natural image content is investigated by Johnson and Fairchild [25].Many other studies have shown the adaptation of the varying stimuli to other controlled (orientated or scaled) stimuli such as masking Gabor functions (e.g.[26]).A more conventional approach (single stimulus presentation) has been used in the work presented here as it was considered that the analysis of each basis function in any application will be made alone without any reference to global frequency.Further independent masking effects could be used to extend the model in future work.However, simple 1/ f spatial adaptation, using pink noise (see section III-E), was employed to give some basic adaptation in a non-localised sense.
c) Suprathreshold effects: For applications such as compression and fusion, quantitative measures of difference (for contrast sensitivity) above the threshold of perception are important.This has rarely been dealt with in the literature.Most contrast sensitivity models for compression assume a linear response above the threshold.However, it has been demonstrated that the effects at threshold are significantly different than at suprathreshold levels [27], [28] leading to the conventional CSF curves not being applicable at levels far from threshold.Specifically, Daly [29] and Georgeson and Sullivan [27] have shown that equal contrast curves flatten out as the contrast becomes significantly greater than the threshold level.This effect is termed contrast constancy.
d) Orientation: It has been observed that natural scene imagery contains a larger proportion of frequency content in vertical and horizontal orientations (i.e. the cardinal axes) relative to the oblique orientations [30], [31] (with an additional slight bias to horizontal content as noted by Hansen and Essock [32]).This is reflected in the orientational tuning of the contrast sensitivity function of the HVS.Initial studies have shown that the contrast sensitivity function obtained using simple grating thresholds is more sensitive to horizontal and vertical orientations (the so called "Oblique Effect" [33], [34]).However, more recent experiments [30], [32] using broadband natural image masking content have conversely shown that the visual system is more sensitive to oblique orientations.It is anticipated that the conventional "Oblique Effect" will be reflected in our results.Therefore threshold data has been included for different basis function orientations where practical.
e) Additional dependencies: These include phase, chrominance, and temporal effects.These effects are not further investigated in our work.
Due to our application focus and to ensure tractability of analysis, we assume only a simple dependence of the CSF on orientation and 1/ f spatial adaptation.Further work is intended to extend the model to take into account the remaining dependencies listed above.

A. Display and Viewing Conditions 1) Display Visual Resolution:
The thresholds of the different transform basis functions depend upon both the viewing distance and the display visual resolution in pixels/degree.Given a viewing distance v in cm and a display resolution d in pixels/cm, the display visual resolution r in pixels/degree is given by (1).
For each trial, the viewing distance was set to 57cm therefore giving r ≈ d.Given this viewing distance and the resolution of the monitor, the display visual resolution was 21.38 pixels/degree.This was used for all subsequent calculations.This resolution relates to the design specification for HDTV (approximately 30 cycles/degree).It should be noted that there are significant differences in sensitivity as a function of the viewing distance independent of the visual resolution in pixels/degree.The distance used within our experiments would therefore be more applicable to desktop use than cinema or large TV applications.

B. Basis Function Spatial Frequency
In order to obtain meaningful CSF curves, the spatial frequency (cycles/degree) can be easily calculated for conventional Gabor gratings, since their central frequencies can be specifically defined.For the basis functions used throughout our work, the spatial frequency is calculated by (2).
where f is the spatial frequency in cycles/degree, r is the display resolution in pixels/degree and c is the cycles/pixel associated with the basis function.In previous work by Watson et al. [3], c is defined as 2 λ where λ is the wavelet level.This definition is not applicable to our work because each subband and its associated basis function will have a variable central frequency that changes according to both scale and orientation.The central frequency c for our work is calculated for each subband as the basis functions' maximum spatial frequency (as a proportion of the sampling frequency).This was calculated by finding the distance (in normalised units of frequency) from DC to the maximum magnitude of the two dimensional frequency transform of the subbands' basis function.

C. Display Calibration
In order to generate accurate contrast sensitivity threshold values, the monitor output must be calibrated so the stimuli provide linear luminance differences and contrast.The chosen monitor (see below) was calibrated using a photometer and the calibration functions provided within Psychtoolbox [35].The gamma value was measured to be approximately 1.9 (generated from a range of 9 luminance values across the monitor's possible values).A look-up table was used in all experiments thus creating a linearised display.A linearised display was chosen due to the following reasons: • As illustrated by Watson et al. [3] the difference in thresholds for a linearised and non-linearised display is approximately an offset.As the generated models were monotonically increasing, the relative offsets between transform subbands are not considered to be significant for comparative analysis.• For compatibility with compression / analysis methods that have assumed linearised displays and where visual models are combined under the assumption of linearity (e.g.Höntsch and Karam [36]).• Potential variations in gamma within display applications (typical values from 1.6 to 2.6).Our results should only strictly be valid for linearised displays although conversion from linearised to non-linearised threshold models is possible using the methods given by Watson et al. [3] and Peterson et al. [37].

D. Stimuli
Basis functions from the DWT, DT-CWT, Steerable Pyramid and Curvelet transforms were used as the stimuli for the threshold measurement experiments described in section III-E.For all transforms, the subbands are referenced as {λ, θ} where λ is the decomposition level and θ is the subband orientation.For transforms using complex basis functions, the real component was used in all cases.
For all transforms, the spatial basis function images were created as follows.
• A 512 × 512 zero filled image was transformed using the chosen transform (resulting in all zero coefficients within each subband).4• A single centrally located coefficient in the chosen subband was set to 1 (or 1+0i in the complex transform case).• The transform was inverted and the basis function exactly centred to produce the basis function stimulus.The basis function stimuli were presented in a manner identical to the work of Watson et al. [3].Firstly, the basis functions were normalised to give unit maximum amplitude (resulting in a signal in the range of [−1,1]).These normalised basis functions were variably scaled to produce maximum amplitudes in the range of [0,128].The scaling factor of these signals is our measure of stimulus contrast Y .Before presentation (within the threshold calculation procedure described in section III-E) the scaled signal is added to the display average value of 128 and the resulting image in the range [0,255] is presented on the linearly compensated display.Our defined stimulus threshold Y is therefore the maximum amplitude of the signal minus 128 at the threshold of perception.
The average intensity of the display L 0 (when displaying a uniform value of 128) was found to be 23.91cd/m 2 and the maximum intensity of the display (displaying a uniform value of 255) was found to be double this value (due to the linearised display).The commonly used contrast measure L/L 0 has been used (in dB units) defined for threshold experiments by Manituk et al. [38] as For our experiments, due to the linearised display, L is related to Y as: The stimulus contrast Y defined above can therefore be converted to the contrast C using All stimuli were displayed on a calibrated SONY Trintion 20" monitor (Model number 300SF).

1) DWT Basis Function Stimuli:
A selection of basis DWT functions is shown in figure 2(a) ({2 . . .4, 1 . . .3}).The actual basis functions used in the experiments were {1 . . .5, 1 . . .3} i.e. all three orientations for the first 5 levels.The filters used were the nearly symmetric filters designed by Abdelnour and Selesnick [39].These filters were used as they were the default first level filters used in the DT-CWT in section III-D2.
2) DT-CWT Basis Function Stimuli: Figure 3(a) shows the basis functions labelled {2 . . .4, 1 . . .3}.This figure shows how the positive and negative frequencies can be differentiated by the DT-CWT, leading to 6 orientated subbands at each level (+/ − 15°, +/ − 45°, +/ − 75°).Figure 3(b) shows how these basis functions are related to their frequency response zones.The basis functions used in the experiments were {1 . . .5, 1 . . .3} i.e. all the scales from 1 to 5 but only the first three orientations.A subset of orientations was used to make the experiments tractable.We have made the assumption that sensitivity to a stimulus is not affected by left-right reflection.Therefore, as the basis functions for the DT-CWT, Curvelet and Steerable Pyramid are reflected across the y-axis, the results are assumed to hold for the reflected orientations.

3) Steerable Pyramid Basis Function Stimuli:
The Steerable Pyramid basis functions and their frequency regions are shown in figures 4(a) and 4(b) respectively.The basis functions displayed are {2 . . .5, 1}.The basis functions used for the experiments are {1 . . .5, 1}.The Fourier domain implementation of the Steerable Pyramid transform was chosen (as it gives perfect reconstruction).The number of orientations of the transform can be chosen arbitrarily.However, we have used four orientations as this gives good directional coverage of the frequency domain and is the default implementation number within the available MATLAB code. 5) Curvelet Basis Function Stimuli: The curvelet basis functions used in the experiments are shown in figure 5(a) and can be denoted {2 . . .5, 1}.Only 4 scales were used as the single residual highpass filter did not contain any orientational information.Only one orientation per scale was selected for the experiment due to the very large number of separate orientations (16 orientation bands for scales 2 and 3 and 32 orientation bands for scales 4 and 5).Analysis of all these orientations (even with assumed left-right mirroring) would make the experiment intractable.This is a different case from the Steerable Pyramid orientations as the Steerable Pyramid filters (at the same scale) are rotated versions of each other but the Curvelet filters are not.However, the differently orientated Curvelets at the same scale do resemble each other so the final CSF values for all directions are generalised from the single orientation but interpolated to the actual filter frequency centres of each subband.Threshold values are therefore available on a per-subband basis within the MATLAB code (discussed below).

5) Gabor Function Stimuli:
The Gabor functions shown in figure 1 are included for reference and completeness.The centre frequencies (measured from DC) of the Gabor functions are equivalent to the centre frequencies of the horizontal (and vertical) subbands of the DWT (as assumed given by Watson et al. [3]).The spatial support of the Gabor Stimuli is defined by a constant Gaussian envelope with the sigma equal to 1 visual degree (this compares to 0.5 visual degree for equivalent patterns within Modelfest [40] and 1.5 visual degrees for the Gabor patterns used by Kim et al. [21]).

E. Threshold Measurement
It has been recognised that forced choice experiments are an effective way to determine contrast sensitivity at threshold [21], [41].Kim et al. [21] have used a 4AFC method to derive CSF curves, stating that the convergence to the perceptual threshold is faster than using 2AFC.However it was considered that presentation of stimuli at 4 possible spatial positions would the spatial and temporal aspects of the results and therefore centrally located stimuli using the 2AFC method was chosen.
The 2AFC trials consisted of two 200ms intervals separated by 500ms [3].One interval contained the stimulus added to a uniform grey screen and the other interval contained just the uniform grey screen (randomly alternated after each trial).Each interval was signalled by a separate audible warning tone.After each presentation, the observer was forced to select which interval they considered to contain the stimulus (using two mouse buttons).
Between presentation trials, the amplitude of the stimulus Y was varied adaptively using a Quest staircase [42].At the end of the trials, for each stimulus a Weibull distribution was fitted to the proportion correct.The final threshold was estimated as the stimulus amplitude giving an 82% correct rate [43].The code for our experiments used the Psychtoolbox [35] for MATLAB (version 3.0.11),utilising the toolbox's stimulus display, stimulus timing (with audio warnings) and QUEST implementations.
Other features of the trials include: • 36 trials for each stimulus.
• The threshold was not adapted for the first 4 trials for each stimulus -This was done because: (i) observers became used to the experimental procedure and (ii) mistakes within the first few trials may lead the Quest method to erroneous threshold values that may be difficult to recover from.
• Between each interval and trial, a small cross (5 × 5 pixels) was presented at the centre of the monitor screen (luminance value 0).As the stimuli were also presented at the centre of the screen this "fixation point" was used to aid the observer in keeping the presentation within the observer's foveal region.The cross pattern was extinguished while the stimuli were presented.• Between each trial an adaptation image was displayed.
This consisted of achromatic spatial 1/ f pink noise [44] with mean luminance equal to that of the stimuli (and background), a maximum amplitude of 128 and a standard deviation of 29.10.
-This helped keep the observers focused on the screen between presentation of the stimulus [45].-For each observer, before the trials the same adaptation image was displayed for 60 seconds in order to adapt the observers' visual system.-A pink noise adaptation visual image was used to adapt the observers' visual system to the typical frequency content of natural images (in a statistical sense).-There were eight observers.All observers were assessed to have normal or corrected to normal vision.None of the observers wore contact lenses.
Viewing was within a darkened room and binocular.-Causes spatial frequency adaptation.

IV. RESULTS
Figures 6, 7, 9 and 10 show the threshold values for the transform subband basis functions described above.Eight participants were used in these experiments and significant outliers were discounted as a pre-processing stage.This was done by discounting samples outside confidence limit of 95% (i.e. an alpha of 0.05) as described in detail by Grubbs [46] and implemented using the deleteoutliers.mMATLAB function. 6The confidence limits were calculated across all subjects for each stimulus.The crosses in each graph show the data after outlier removal.The average value over the eight observers is shown as a solid line on each graph.The graphs are plotted using loglog axes in order to better portray the variance over the range of frequencies and thresholds.

A. Threshold Model
Although there are a multitude of possible different parametric formulae available to model transform thresholds, we  have adopted the model used by Watson et al. [3] shown in equation ( 5).This equation represents a parabolic curve in logarithmic space of the frequency and threshold.
Where a is the contrast offset, g θ is the orientation subband frequency offset, f 0 is the frequency minimum and k is the scaling product of the exponent.The dotted line on each graph shows a fitted curve using standard least squares regression.The curves and average plots do not coincide for all the graphs as: firstly, outliers were discounted when using the regression tool and secondly, the parameters of the threshold model were averaged over a number of datasets.
Figure 11 shows all the models for each transform compared on a single graph.This illustrates that the diagonal subbands of the DWT and DT-CWT have significantly higher thresholds compared to the vertical and horizontal subbands.However, the difference between the vertical and horizontal results are less pronounced within the DT-CWT.It also can be noted that the curvelet results show a pronounced reduction in thresholds compared to the other transforms (presumably from the significantly different shapes of the basis functions).

B. Discussion
For all transforms considered, the model parabola is monotonically increasing (i.e. the function exists only as the right hand side of the parabola) as shown in figures 6, 7, 9 and 10.This contrasts with the conventionally assumed minimum between 2-3 cycles/pixel as shown in figure 8.This was assumed to be because: • there is a decrease in contrast sensitivity with increasing spatial frequency [47], • the basis function size increases with decreasing spatial frequency.
Comparison of all model results (all the dotted lines from figures 6,7,9,10.
The second item has also been noted in [48] for Gabor functions similarly varying in spatial support compared to their central frequency.
The threshold function minima can also be observed to have shifted to a lower spatial frequency by at least 1.5 cycles/degree for basis function transforms that decrease in support linearly with frequency. 7However, since these low frequencies are not likely to be analysed (i.e. over 5 levels of decomposition) in common circumstances, the considered transforms can be assumed to have monotonically increasing models. 8This is important in many applications where the relative importance of subbands at different scales is analysed/ exploited.
1) Model Parameters: The "Oblique Effect" can be observed comparing figures 6(a), 6(b), 7(a) and 7(c) to figures 6(b) and 7(b) (although it is only significant at high frequencies as suggested in the literature).Comparing the output parameter g 2 for both the DWT and DT-CWT it can also be seen that the oblique effect (or at least the lowering of the diagonal sensitivity for high frequencies) is smaller for the DT-CWT.
Comparing our results in table 1 to those of Watson et al. [3] (also included in the table) shows that the results are broadly similar (i.e.threshold frequency minimum f 0 and "Oblique Effect" characterised by g 2 are close in value) except a different offset given by parameter a.It should be noted that the study by Watson et al. employed a significantly smaller dataset (just two observers), was conducted under different viewing conditions and used different wavelet filters.Additionally, they did not linearise their display for the majority of their experiments or use a pink noise masker between tests.Together with the fact that they also pooled spatial basis functions these differences account for the reduced thresholds found in the results of our work relative to theirs.Many vision models exist that are able to predict detection thresholds from visual stimuli [38], [50].Firstly, the detection thresholds of the Modelfest dataset [40] have been very well predicted using a vision model developed by Watson and Ahumada, Jr., [50].The Modelfest dataset contains a selection of stimuli including Gabor and similar patterns (some of the stimuli resemble the elongated gratings of the Curvelet basis functions).The vision model of Watson and Ahumada, Jr., [50] has been able to accurately predict the detection thresholds of all the Modelfest stimuli.This model incorporates the effects of aperture, CSF, pooling etc.Additionally, a similar High Dynamic Range (HDR) based vision has been developed by Mantiuk et al. [38] called HDR-VDP2.As the HDR-VDP2 system was also able to accurately model conventional Visual Dynamic Range (VDR) stimuli and because the code was freely available (as opposed to Modelfest based model [50]) we have used it to compare with our results.The basis function stimuli used in our tests were input into the HDR-VDP2 system together with a mean background "blank image" and the amplitude of the basis function stimuli was adjusted until the probability of detection P det was estimated to be 0.5.The amplitude of the basis function that gave this probability of detection was considered to be the detection threshold of the stimulus (as described in section III-D and shown as Y ) in figures 6, 7, 8, 9 and 10.The HDR-VDP2 results show a good correlation between our results and the model. 9Specifically, these results reflect the lower frequency CSF peak of the considered basis functions in comparison to equivalent Gabor gratings.

A. HDR-VDP2 Model Error
If the mean observer thresholds t j and the HDR-VDP2 model predictions m j for each stimulus j are represented in dBs (i.e. in the log domain), we can use the model error RM S me as defined by Watson and Ahumada, Jr., [50] to represent the HDR-VDP2 model error over all stimuli.
where J is the number of stimuli.For our dataset and the results shown in figures 6, 7, 9 and 10, RM S me = 3.081 dB.This is slightly higher than the RM S me value of 2.8 dB obtained by the HDR-VDP2 model using the Modelfest dataset.However, their experiments have considerable differences from our own and the standard deviation of all observations in our dataset is large at 4.092 dB. 10 These values together with visualisations in the above figures show that our model and results are a good fit to the visual model implemented within HDR-VDP2.

B. Relationship to Contrast Sensitivity
All results have been reported as the threshold of perception of the basis functions.In order to convert these values to contrast sensitivity S we use S = 1 C [51] where C is defined in (4).
Figure 12 compares the contrast sensitivities (S) at orientation θ = 1 (75°) for both the DT-CWT basis functions and Gabor functions.This graph shows that the shape and sensitivity peak of the two functions differ significantly. 11

VI. APPLICATIONS
Numerous examples exist where standard CSF curves are used, together with different wavelet transforms, in image processing algorithms [52]- [56].However, the standard CSFs used are based on Gabor grating experiments.Our work has shown that these curves are inappropriate for use with transforms such as the DWT, DT-CWT, Curvelets and the Steerable Pyramid.Therefore, when considering applications such as watermarking, compression and denoising, our precise transform based CSF / threshold information will enable exact frequency based perceptual processing.

A. Application Example: Perceptual Denoising Using the DT-CWT
Laskar et al. [56] developed a perceptual denoising algorithm that uses a standard CSF curve to modify the perceptual importance of each subband before denoising within a soft thresholding technique based on a DWT decomposition.Achim and Kuruoglu [57] have implemented a Bayesian image denoising method that utilises a bivariate Cauchy prior that exploits the cross scale statistical dependencies of the DT-CWT.As a preliminary example of how our approach can improve performance we have combined these two denoising methods to perform CSF based perceptual image denoising using the DT-CWT and the CSF/threshold models developed within our paper (figure 13).This method is identical to that used by Laskar et al. [56] but replaces the DWT with the DT-CWT and uses a bivariate thresholding technique [57] (the DT-CWT has considerable advantages over the DWT as discussed in [57]).
1) Results: Table II shows the results of denoising experiments where variable amounts of white noise (with standard deviation σ ) are added to three test images and the results are assessed using SSIM [58], VSNR [59] and HDR-VDP2 [38].The SSIM metric is a standard perceptually based metric that measures the structural similarity of two images [58].However, SSIM does not explicitly use any definition of a CSF and therefore we have also included the VSNR and HDR-VDP2 metrics which do.The HDR-VDP2 output gives the probability of detection of a difference between the original and the noisy / denoised image.Therefore (unlike SSIM and VSNR) smaller output values indicate better performing methods.Results for low amounts of noise (2 ≤ σ ≤ 7) are included as the probabilities of detection generated by VDP-HDR2 are only meaningful within this range (for noise levels where σ ≥ 10 the distortion is obvious reflecting a probability of 1.0 for all methods).
This table shows: • DTCWT1: The results of the bivariate denoising method using the DTCWT transform developed by Achim and Kuruoglu [57].• DTCWT2: Identical to DTCWT1 but uses the perceptual weighting method shown in figure 13 in conjunction with the CSF curve used by Laskar et al. [56].• DTCWT3: Identical to DTCWT2 but uses the CSF/threshold results presented in our paper.The images were all of resolution 512 × 512 pixels and greyscale.The DT-CWT used the near symmetric first level filters [39] and the transform contained 6 levels.The results indicate that the integration of CSF curves used by Laskar et al. [56] (DTCWT2) provides better results compared to the original method by Achim et al. (DTCWT1).However, by using our new CSF/threshold information (DTWCWT3) considerable improvement in the results are shown when compared to both DTCWT1 and DTCWT2.The improved results of DTCWT3 over DTCWT2 demonstrates that conventional CSF curves are not appropriate for image processing with analysis transforms such as the DTCWT.

VII. CONCLUSION
We have presented the perceptual thresholds and contrast sensitivities of a range of basis functions from popular wavelet analysis decompositions (the DWT, Steerable Pyramid, DT-CWT and Curvelet transforms).Whereas typical CSF graphs show a peak of sensitivity at approximately 2 cycles/degree, we have demonstrated that only high frequency attenuation of sensitivity is apparent for the considered basis functions over common frequency ranges.The lack of low frequency attenuation is assumed to be caused by the variation in basis function support (low frequency basis functions having larger support than high frequency basis functions).Although a previous study has shown this effect for the DWT, our study is the first to be able to confirm similar trends in the results for the range of more flexible transforms.The results were shown to be comparable to the thresholds generated from the same basis functions using the visual model HDR-VDP2.Similar variations in CSF shape have been found through the variation of grating area by Rovamo et al. [60].However, the basis functions we have considered are not simple Gabor or gratings and therefore our results are precisely realised for the considered transforms.
These new models have demonstrated quantitative improvements in an example denoising application.Our quantitative results facilitate the next generation of perceptual processing applications when using any of the analysed transforms.MATLAB code giving precise thresholds and contrast sensitivities for all subbands for all the transforms are publicly available on figshare https://shar.es/1urZ5a.
Contrast Sensitivity of the Wavelet, Dual Tree Complex Wavelet, Curvelet, and Steerable Pyramid Transforms Paul Hill, Member, IEEE, Alin Achim, Senior Member, IEEE, Mohammed E. Al-Mualla, Senior Member, IEEE, and David Bull, Fellow, IEEE

TABLE I PARAMETERS
FOR THRESHOLD MODEL FOR DWT, DT-CWT, CURVELETS AND STEERABLE PYRAMID TRANSFORMS V. COMPARISON WITH EXISTING VISION MODELS