Adaptive Smoothing of Digital Images: The R Package adimpro

Digital imaging has become omnipresent in recent years, with applications ranging from medical imaging to photography. When pushing the limits of resolution and sensitivity, noise has always been a major issue. However, commonly used non-adaptive filters can reduce noise only at the cost of a reduced effective spatial resolution. Here we present a new package adimpro for R, which implements the propagation-separation approach of Polzehl and Spokoiny (2006) for smoothing digital images. This method naturally adapts to structures of different size in the image and thus avoids oversmoothing edges and fine structures. We extend the method to imaging data with spatial correlation. Furthermore, we show how the estimation of the dependence between variance and mean value can be included. We illustrate the use of the package through some examples.


Introduction
Digital imaging has seen huge progress over the last decade through the broad availability of digital cameras. An image in its digital form can easily undergo image processing (Gonzalez and Woods 2002) in order to improve its quality, to enhance certain details, or to simply achieve some artistic effect. In the search for ever increased resolution and sensitivity, denoising has been a major issue since the sources of noise are inherently connected with the physical properties of the technology. Many cameras therefore include smoothing algorithms in their imaging process in order to achieve a maximum of image quality under any lighting conditions. However, denoising algorithms have to balance the trade-off between denoising and preservation of structure. Non-adaptive Gaussian filters typically tend to oversmooth edges and thus reduce noise at the cost of sharpness of the image. In order to overcome this drawback there has been a lot of work in recent years on edge preserving algorithms. Standard algorithms are based on non-linear diffusion (Weickert 1998), or wavelets (Mallat 1998; Whitcher 2006). Recent developments include curvelets, contourlets, and bandelets (Candès and Donoho 1999; Do and Vetterli 2005; Le Pennec and Mallat 2005; Candès, Demanet, Donoho, and Ying 2007; Do 2003).
Here, we use a new spatially adaptive smoothing algorithm that has been proposed in a series of papers (see e.g., Polzehl and Spokoiny 2000, 2006). In contrast to classical non-adaptive filters, the smoothing method based on the propagation-separation (PS) approach naturally adapts to structures of different size in the image. This is realized through local adaptive weighting schemes which ensure two main properties of the algorithm, from which it draws its name. The propagation property means that the smoothing algorithm behaves like a non-adaptive filter in homogeneous regions. On the other hand, the separation property characterizes the restriction of the weighting scheme to homogeneous regions with respect to a specified model for the image data. The procedure has been very successfully applied to the analysis of functional magnetic resonance imaging (Polzehl and Spokoiny 2001; Tabelow, Polzehl, Voss, and Spokoiny 2006). Furthermore, the use of this adaptive smoothing method has been suggested in the context of image denoising (Polzehl and Spokoiny 2000, 2007).
In this paper we present an implementation of an algorithm based on PS for denoising of grayscale and color images. The R (R Development Core Team 2007) package adimpro is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/. Let i = (i_h, i_v), 1 ≤ i_h ≤ n_h, 1 ≤ i_v ≤ n_v, denote the n = n_h × n_v pixel coordinates of a 2D image and Y_i ∈ 𝒴 ⊆ R^q an observed gray value (q = 1) or a vector of color values (q ∈ {3, 4}) at coordinates i. We assume that the distribution of each Y_i is determined by a parameter θ_i = θ(i) which may depend on the coordinates i.
Example 1.1 [Grayscale images] Every Y_i follows a Gaussian distribution with mean θ_i and unknown variance σ² that may depend on θ_i.

Example 1.2 [Color images] In color images Y_i denotes a vector of values in a three-dimensional color space at pixel coordinates i. A fourth component may code transparency information. The observed vectors Y_i can often be modeled as multivariate Gaussian, i.e., Y_i ∼ N_q(θ_i, Σ_i) with some unknown covariance Σ_i that again may depend on θ_i.
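The two observation models above can be sketched by simulating data in R. The image dimensions, the smooth "true" image, and the noise level below are illustrative assumptions, not values used by adimpro itself.

```r
set.seed(1)
nh <- 64; nv <- 48

# Example 1.1: grayscale image, Y_i ~ N(theta_i, sigma^2)
theta.grey <- outer(seq(0, 1, length.out = nh),
                    seq(0, 1, length.out = nv))   # smooth "true" image
sigma  <- 0.05
Y.grey <- theta.grey + rnorm(nh * nv, sd = sigma) # n_h x n_v matrix

# Example 1.2: color image, Y_i ~ N_3(theta_i, Sigma_i), here with
# independent channels of equal variance for simplicity
theta.rgb <- array(runif(nh * nv * 3), dim = c(nh, nv, 3))
Y.rgb     <- theta.rgb + array(rnorm(nh * nv * 3, sd = sigma),
                               dim = c(nh, nv, 3))
```

The arrays follow the layout used by adimpro objects: two dimensions for grayscale data and a third dimension of size 3 for color data.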
Depending on the image acquisition process noise in digital images may have another distribution than a simple Gaussian (Gonzalez and Woods 2002). In all examples the situation may be aggravated by spatial correlation due to physical reasons or preprocessing steps.
The paper is organized as follows. Color images require the definition of some appropriate space to represent color. Section 2 therefore reviews the definition of some important color spaces. Section 3 provides the smoothing algorithm based on the propagation-separation approach. Spatial correlation and models for heteroscedasticity can be incorporated into the approach. We provide estimates for both spatial correlation and heteroscedastic variances. Examples are given in Section 4.

Color spaces
The concept of color spaces is closely related to the perception of color by the human eye. Sunlight passing through a glass prism reveals a whole spectrum of pure colors, ranging from red to violet, which blend smoothly into one another. They can be characterized by a single wavelength of the corresponding electromagnetic wave, ranging from approximately 400 to 700 nm. The superposition of such chromatic light is perceived by humans as another color rather than as the individual wavelengths separately (Malacara 2002). Thus the mixture of chromatic red and green light is seen as yellow. In the human eye the cones are responsible for color vision. It has been found experimentally that they can be mostly divided into three principal sensing categories, corresponding roughly to red, green, and blue. Other colors can therefore be seen as combinations of these three primary colors. This naturally leads to the definition of the RGB color space, in which each color is characterized by the amounts of the primary colors necessary to build it up.
Many display devices like computer monitors directly make use of this color space through excitation of different amounts of red, green, and blue areas on the screen. Digital cameras usually filter the three primary colors from the incoming light in the image acquisition process using a Bayer mask (Bayer 1976). Encoding color on the internet or in imaging software follows this scheme too. It should be noted that not all visible colors can be created by such a mixture, but the range is sufficient for most applications.
However, this does not match the way humans describe color. For example bright or pale yellow are seen as two variants of the same color rather than different mixtures of the three primary colors. Taking this into account we can characterize color in terms of brightness, hue, and saturation. Hue is associated with the dominant wavelength and corresponds to the notion of color as we perceive it. The amount of white light mixed with the hue is referred to as saturation. Therefore the pure spectral colors are fully saturated. Other colors like pink are mixtures of a pure color with white and thus less saturated. The combination of hue and saturation is called chromaticity. The brightness corresponds to the intensity as known from grayscale images. The three characteristics mentioned above form the HSI color space.
Other color spaces like YUV or YIQ are related to the way signal transport for television is performed. One of the components (Y) again reflects the brightness of the image while the other two refer to the chromaticity. Therefore black and white TV sets can simply ignore the color information in the signal and nevertheless produce a reasonable output.
A specific color can be transferred from one color space into another. For example the conversion from RGB to YUV is a linear transformation and can thus be represented as a matrix multiplication. This makes the conversion a very fast operation. Details on the conversion between the color spaces can be found in the literature (see Gonzalez and Woods 2002).
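The linearity of the RGB to YUV conversion can be sketched as a single matrix multiplication applied to all pixels at once. The BT.601 coefficients below are one common convention for this transformation; the constants used by adimpro's rgb2yuv may differ slightly, so treat this as an illustrative sketch rather than the package implementation.

```r
# RGB -> YUV as one matrix multiplication over all pixels
rgb2yuv.sketch <- function(img) {
  M <- rbind(c( 0.299,    0.587,    0.114),    # Y (luminance)
             c(-0.14713, -0.28886,  0.436),    # U (chromaticity)
             c( 0.615,   -0.51499, -0.10001))  # V (chromaticity)
  d  <- dim(img)                 # n_h x n_v x 3 array
  px <- matrix(img, ncol = 3)    # one pixel per row: (R, G, B)
  array(px %*% t(M), dim = d)    # back to n_h x n_v x 3
}

# a pure white pixel has full luminance and zero chromaticity
white <- array(c(1, 1, 1), dim = c(1, 1, 3))
yuv   <- rgb2yuv.sketch(white)
```

Because the whole conversion is one matrix product, it is a very fast, vectorized operation even for large images.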
In the package adimpro grayscale and color images are represented by objects of class "adimpro". Image data is stored as two and three dimensional arrays, respectively. The third dimension for color images is 3 and contains the color information in a specific color space as described above. The color space is coded in the attribute type and taken from the following list: yuv, yiq, hsi, xyz, rgb, greyscale. It is possible to get information on an adimpro object via the generic summary function. The package contains several functions to transform images from one color space to another, e.g.:

R> img.yuv <- rgb2yuv(img.rgb)
R> img.rgb <- hsi2rgb(img.hsi)
R> img.grey <- rgb2grey(img.rgb)

One can view any image with:

R> show.image(img)

Smoothing using structural adaptation
Efficient image denoising requires a reduction of random noise without compromising the content of the image. The propagation-separation approach implemented in the package adimpro is designed to achieve this goal. Image denoising is ideally done on images that have been exposed to little preprocessing, like images coded in RAW or 16-Bit TIFF format. Lossy preprocessing like conversion to JPEG should be avoided ahead of adaptive image denoising to improve the quality of adaptation of the algorithm. Image acquisition in digital cameras is usually done by measuring the intensity of light for the three primary colors using a Bayer mask (Bayer 1976). The process of reconstruction of the missing color components for the full resolution is called demosaicing but will be considered elsewhere (Polzehl and Tabelow 2007).
Let the distribution of Y i be specified up to a parameter θ i that may depend on the coordinates i of a pixel and some additional parameters ν , such as possibly heterogeneous error variances. Let us also assume for a moment that we know how estimates of these parameters can be obtained.
We presume that for each pixel i the function θ(.) can be well approximated by a constant within a local vicinity U i containing pixel i . This serves as our structural assumption.
Our estimation problem can now be viewed as consisting of two parts. In order to efficiently estimate the function θ(.) at pixel coordinates i we need to describe a local model, i.e., to assign weights W_i = {w_{i1}, ..., w_{in}}. If we knew the neighborhood U_i by an oracle we would define local weights as w_{ij} = w_j(i) = I_{j ∈ U_i} and use these weights to estimate θ_i as

  θ̂_i = Σ_j w_{ij} Y_j / Σ_j w_{ij} .   (1)

Since θ and therefore U_i are unknown, the assignments have to depend on the information about θ that we can extract from the observed data. If we have good estimates θ̂_j of θ_j we can use this information to infer on the set U_i by testing the hypothesis

  H : θ_i = θ_j .   (2)

A weight w_{ij} can be assigned based on the value of a test statistic T_{ij}, assigning zero weights if θ̂_j and θ̂_i are significantly different. This provides us with a set of weights W_i. Given the local model we can then estimate our function θ(.) in each pixel i by (1).
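The oracle estimate (1) is just a weighted mean over the homogeneous neighborhood. A minimal sketch, using a hypothetical 1D step signal and indicator weights for an oracle neighborhood:

```r
# Estimate (1): weighted mean of the observations under weights w
theta.hat <- function(Y, w) sum(w * Y) / sum(w)

set.seed(42)
truth <- c(rep(0, 10), rep(1, 10))         # two homogeneous plateaus
Y     <- truth + rnorm(20, sd = 0.1)

# oracle neighborhood U_i for a pixel on the first plateau:
# indicator weights w_ij = I(j in U_i)
w.oracle <- as.numeric(seq_along(Y) <= 10)
est <- theta.hat(Y, w.oracle)              # averages only the first plateau
```

With indicator weights the estimate reduces to the plain mean over U_i, so no observations from the other plateau blur the edge.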
We utilize both steps in an iterative procedure. We start with a very local model in each point i given by weights

  w_{ij}^{(0)} = K_loc( l_{ij}^{(0)} )  with  l_{ij}^{(k)} = |i − j| / h^{(k)} .

The initial bandwidth h^{(0)} is chosen very small. K_loc is a kernel function supported on [−1, 1], i.e., weights vanish outside a ball U_i^{(0)} of radius h^{(0)} centered in i. We then iterate two steps, estimation of θ(x) and refining the local models. In the k-th iteration new weights are generated as

  w_{ij}^{(k)} = K_loc( l_{ij}^{(k)} ) K_st( s_{ij}^{(k)} ) ,

where the statistical penalty s_{ij}^{(k)} = T_{ij}^{(k)} / λ serves as a test statistic for (2). The bandwidth h^{(k)} is increased by a constant factor with each iteration k. The term

  T_{ij}^{(k)} = N_i^{(k−1)} D( θ̂_i^{(k−1)}, θ̂_j^{(k−1)} ; ν )   (6)

effectively measures the statistical difference of the current estimates in i and j. In (6) the term D(θ, θ′; ν) denotes a distance between the probability measures P_{θ,ν} and P_{θ′,ν}.

Adaptive weights smoothing
We now formally describe the resulting algorithm. For clarity of presentation we, for a moment, defer the problem of estimating ν i and modifications due to spatial correlation in the image.
• Initialization: Set k = 0 and the initial bandwidth h^{(0)}, and compute, for every i, the statistics N_i^{(0)} = Σ_j w_{ij}^{(0)} and the estimates θ̂_i^{(0)} = Σ_j w_{ij}^{(0)} Y_j / N_i^{(0)}.

• Adaptation: For every pair i, j, compute the penalties l_{ij}^{(k)} and s_{ij}^{(k)}. Now compute the weights w_{ij}^{(k)} = K_loc( l_{ij}^{(k)} ) K_st( s_{ij}^{(k)} ) and specify the local model by W_i^{(k)} = {w_{i1}^{(k)}, ..., w_{in}^{(k)}}.

• Estimation: Compute new statistics N_i^{(k)} = Σ_j w_{ij}^{(k)} and new estimates θ̂_i^{(k)} = Σ_j w_{ij}^{(k)} Y_j / N_i^{(k)}.

• Stopping: Stop if h^{(k)} ≥ h_max. Otherwise set h^{(k+1)} = c_h h^{(k)}, increase k by 1, and continue with the adaptation step.
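The iteration above can be sketched for a one-dimensional grayscale signal with known variance σ². The kernel choices, λ, the bandwidth schedule, and the exact form of the penalty below mirror the text but are illustrative assumptions, not adimpro's internal defaults.

```r
# Minimal 1D sketch of the adaptive weights iteration (grayscale)
aws1d <- function(Y, sigma2, hmax = 10, lambda = 8, ch = sqrt(1.25)) {
  n <- length(Y); x <- seq_len(n)
  Kloc <- function(u) pmax(1 - u^2, 0)   # location kernel, support [-1, 1]
  Kst  <- function(u) pmax(1 - u^2, 0)   # statistical kernel
  h <- 1
  theta <- Y; N <- rep(1, n)             # initialization
  while (h <= hmax) {
    theta.new <- numeric(n); N.new <- numeric(n)
    for (i in x) {
      l <- abs(x - i) / h                          # location penalty l_ij
      T <- N[i] * (theta[i] - theta)^2 / sigma2    # T_ij (Gaussian distance)
      w <- Kloc(l) * Kst(T / lambda)               # adaptive weights
      N.new[i]     <- sum(w)
      theta.new[i] <- sum(w * Y) / sum(w)
    }
    theta <- theta.new; N <- N.new
    h <- ch * h                                    # increase bandwidth
  }
  theta
}

set.seed(1)
truth <- c(rep(0, 50), rep(2, 50))       # piecewise constant "image"
Y     <- truth + rnorm(100, sd = 0.3)
fit   <- aws1d(Y, sigma2 = 0.09)
```

Within each plateau the weights propagate like a non-adaptive kernel, while across the jump the penalty T_ij becomes large, weights vanish, and the edge is preserved.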
Table 1: Statistical penalty s_ij for the different image types.

The algorithm can be adjusted to meet the properties of different types of images, that is, different types of distributions of Y, by an appropriate choice of the statistical penalty s_ij in (9). For grayscale images (Example 1.1) and color images (Example 1.2) the appropriate penalty is given in Table 1.

Choice of parameters -Propagation condition
The proposed procedure involves several parameters. The most important one is the scale parameter λ in the statistical penalty s_ij. The special case λ = ∞ simply leads to a kernel estimate with bandwidth h_max. We propose to choose λ as the smallest value satisfying a propagation condition. This condition requires that, if the local assumption is valid globally, i.e., θ(x) ≡ θ does not depend on x, then with high probability the final estimate for h_max = ∞ coincides in every point with the global estimate. More formally, we request that in this case, for each iteration k and a specified constant α > 0, the risk of the adaptive estimate θ̂^{(k)} exceeds the risk of its non-adaptive counterpart by at most a margin controlled by α. Here θ̌^{(k)} denotes the non-adaptive kernel estimate employing the bandwidth h^{(k)} from step k. The value α characterizes a maximal loss in efficiency of the adaptive estimate compared to its non-adaptive counterpart, with α = 0.1 providing reasonable results.
The parameter λ provided by this condition is characteristic for the selected class of error distributions, but does not depend on the unknown model parameter θ . A default value for λ can therefore be selected by simulations using an artificial constant grayscale or color image with independent Gaussian noise. The values of λ implemented in the package for grayscale and color images are selected using α = 0.1 . Smaller values of λ may lead to random segmentation, while a larger λ simply reduces the sensitivity of the procedure. In general there is no need to adjust this parameter for any image if the noise is Gaussian. For the case of heavy tailed or highly skewed error distributions we provide an additional argument ladjust in function awsimage that can be used to increase the value of λ if random segmentation is observed.
The second parameter of interest is the maximal bandwidth h_max, which controls both the numerical complexity of the algorithm and the smoothness within homogeneous regions. The maximal bandwidth can be chosen by heuristic arguments. Such heuristics include the amount of computing time, the maximal size of homogeneous regions within the image, and the desired variance reduction. The separation property of the procedure ensures that, up to a constant factor, the quality of reconstruction reached at any intermediate step will not be lost for larger bandwidths (see Polzehl and Spokoiny 2006).
Additionally we specify a number of parameters and kernel functions that have less influence on the resulting estimates. As a default the kernel functions are chosen as K_loc(x) = (1 − x²)₊ and K_st(x) = (1 − x²)₊. The initial bandwidth h^{(0)} is chosen as 1. The bandwidth is increased after each iteration by a default factor c_h = 1.25^{1/2}.

Correction for spatial correlation
In a CCD sensor, the common device within a consumer digital camera, a Bayer mask is used to obtain color information. Within a square of four pixels one is red filtered, one blue, and two green, that is, red and blue are recorded in every fourth pixel while green is obtained in every second. To obtain a color image in full resolution the missing color values are filled in by interpolation (demosaicing). This results in spatial correlation within the image. Other sources of spatial correlation are noise reduction filters and preprocessing steps applied within the camera.
The statistical penalty s_ij effectively measures the probability of observing the estimate θ̂_i if the correct parameter θ_i equals θ_j. The penalty heavily depends on a correct assessment of the variability of θ̂_i. In case of (positive) spatial correlation the achieved variance reduction is smaller than in the independent case, and we therefore need to adjust the statistical penalty.
As a rough model we assume the spatial correlation to be caused by convolution with a Gaussian kernel with unknown bandwidths g = (g h , g v ) in horizontal and vertical direction.
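Directional spatial correlation of this kind can be estimated from lag-one autocorrelations of a noise field. The sketch below uses a simple two-pixel moving average as an illustrative stand-in for camera preprocessing (the text's model is a Gaussian blur) and then measures the induced correlation in horizontal and vertical direction separately.

```r
set.seed(2)
eps <- matrix(rnorm(200 * 200), 200, 200)   # white noise image

# introduce horizontal correlation by averaging neighboring columns
eps.c <- (eps[, -1] + eps[, -200]) / 2      # 200 x 199

# lag-1 autocorrelation, horizontal (along rows) and vertical (along columns)
rho.h <- cor(as.vector(eps.c[, -1]), as.vector(eps.c[, -ncol(eps.c)]))
rho.v <- cor(as.vector(eps.c[-1, ]), as.vector(eps.c[-nrow(eps.c), ]))
```

For this moving average the theoretical horizontal correlation is 0.5 while the vertical correlation is 0, so the two estimated bandwidths g_h and g_v of the rough model would differ accordingly.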
This leads to a multiplicative correction factor C_i(g, h) for the variance of θ̂_i, which depends on the correlation bandwidths g and the bandwidth h of the local model, and which enters the statistical penalty.

Variance mean dependence
Error variances in the recorded images are usually unknown and need to be estimated. Depending on the imaging process, error variances may be heteroscedastic. Special examples include confocal microscopy or perfectly linear cameras, where the signal can be viewed as a linear combination of the number of recorded photons (Poisson counts) and thermal noise (approximately homoscedastic Gaussian). Such a situation is characterized by a linear dependence between variance and the expected signal (Tian, Fowler, and Gamal 2001; Hirakawa and Parks 2005). A linear behavior may also serve as an approximation in case of RAW images. In contrast, images recorded in common formats like TIFF or JPEG are gamma corrected, that is, a nonlinear transformation has been applied to the measured intensities. Except for low color values a constant variance may be a reasonable assumption for such images, see Figure 1.
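The linear variance-mean dependence can be illustrated by simulating a Poisson photon count plus Gaussian read noise and fitting a linear model to the empirical variances. The gain and read-noise values below are illustrative assumptions, not calibrated camera parameters.

```r
set.seed(3)
gain    <- 0.01                       # intensity units per photon
read.sd <- 0.02                       # thermal (read) noise, sd

# several homogeneous intensity levels, many replicates per level
mu <- rep(seq(0.1, 0.9, by = 0.2), each = 2000)
Y  <- gain * rpois(length(mu), mu / gain) +    # photon counts
      rnorm(length(mu), sd = read.sd)          # thermal noise

# empirical variance and mean per intensity level
v <- tapply(Y, mu, var)
m <- tapply(Y, mu, mean)

# linear variance-mean model: Var(Y) = a + b * E(Y)
fit <- lm(v ~ m)
```

Here Var(Y) = gain * E(Y) + read.sd², so the fitted slope recovers the gain and the intercept the thermal noise variance, matching the linear model described in the text.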

Estimation of variances and spatial correlation
Problems frequently encountered in image processing are unknown error variances, or even variance inhomogeneity, spatial correlation, and correlated errors between channels in color images. We estimate, in each color channel, both the spatial correlation and the possibly heteroscedastic error variances. These estimates are improved within the estimation process.
Let l specify the index of the color channel and let r_il denote the residuals Y_il minus a non-adaptive kernel estimate obtained using bandwidth 2.
We estimate the spatial correlation in color channel l from the empirical autocorrelation of these residuals in horizontal and vertical direction. These estimates are severely biased due to the spatial correlation of the original data. The bias vanishes as h^{(k)} increases. We apply an approximate bias correction using a polynomial model for the correlation in ρ, (h^{(k)})^{−1/2}, and (h^{(k)})^{−1}.
In case of color images, estimates of the correlations between channels, that is ρ_{1,2}, ρ_{1,3}, ρ_{2,3}, can be obtained from the empirical correlations of the residuals between channels. Variances within channels can again be estimated from adjusted squared residuals in case of constant variances. In case of a linear variance model the parameters are estimated by weighted least squares with weights (N_i^{(k)} − 1), leading to channelwise variance estimates. For color images, estimates of the covariance matrices Σ_i are then obtained by combining the channelwise variance estimates with the estimated between-channel correlations, assuming that correlations between channels do not depend on θ_i.
The proposed procedure, employing a local constant structural assumption, is implemented in function awsimage.

Local polynomial propagation-separation approach
Until now we have assumed that the gray or color value is locally constant. This assumption is essentially used in the form of the statistical penalty s_ij. The effect can be viewed as a regularization in the sense that, in the limit h_max → ∞, the reconstructed image is forced to a locally constant gray value or color structure even if the true image is locally smooth. Such effects can be avoided if a local polynomial structural assumption is employed. Due to the increased flexibility of such models this comes at the price of a decreased sensitivity to discontinuities.
The propagation-separation approach from Polzehl and Spokoiny (2004) assumes that within a homogeneous region containing pixel i, i.e., for j = (j_h, j_v) ∈ U_i, the gray value or color Y_j can be modelled as

  Y_j = θ_i^T Ψ(j_h − i_h, j_v − i_v) + ε_j ,

where the components of Ψ(δ_h, δ_v) contain values of basis functions ψ_{m1,m2} for integers m_1, m_2 ≥ 0, m_1 + m_2 ≤ p, and some polynomial order p. For a given local model W_i estimates of θ_i are obtained by local least squares as

  θ̂_i = argmin_θ Σ_j w_{ij} ( Y_j − θ^T Ψ(j_h − i_h, j_v − i_v) )² .

The parameters θ_i are defined with respect to a system of basis functions centered in i. The component of the estimate of θ in the local model W_j corresponding to the basis function ψ_{m1,m2} centered at i can be obtained by a linear transformation. In iteration k a statistical penalty can now, for grayscale images, be defined from the differences of these transformed local polynomial estimates. In case of color images with independent errors between channels a natural generalization is a possibly weighted sum of the statistical penalties for the three color channels.
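The local polynomial model and its weighted least squares fit can be sketched as follows. The monomial basis ψ_{m1,m2}(δ_h, δ_v) = δ_h^{m1} δ_v^{m2}, the toy plane-plus-noise data, and the location weights are illustrative assumptions.

```r
# basis matrix Psi for a local polynomial model of order p
Psi <- function(dh, dv, p = 1) {
  cols <- list()
  for (m1 in 0:p) for (m2 in 0:(p - m1))
    cols[[length(cols) + 1]] <- dh^m1 * dv^m2   # psi_{m1,m2}
  do.call(cbind, cols)                          # columns: 1, dv, dh (p = 1)
}

# toy local fit around pixel i = (0, 0): a noisy linear gradient
set.seed(4)
g  <- expand.grid(dh = -3:3, dv = -3:3)
Yj <- 0.5 + 0.2 * g$dh - 0.1 * g$dv + rnorm(nrow(g), sd = 0.01)

w <- pmax(1 - (g$dh^2 + g$dv^2) / 16, 0)        # location weights in W_i
X <- Psi(g$dh, g$dv, p = 1)

# weighted least squares solution of the local model
theta.hat <- solve(t(X) %*% (w * X), t(X) %*% (w * Yj))
```

The estimate recovers the intercept and the two slope coefficients of the local plane; a local quadratic model (p = 2) would simply add the columns δ_h², δ_h δ_v, and δ_v².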
A detailed description and discussion of the resulting algorithm and corresponding theoretical results can be found in (Polzehl and Spokoiny 2004).
Local linear and local quadratic models are implemented for both grayscale and color images in function awspimage. The implementation allows for homogeneous spatial correlation and unknown variances that are either constant or a linear function of the gray or color values.

Examples
The package adimpro can natively read and write the pgm and ppm image formats. However, the package allows reading and writing of all image formats supported by ImageMagick (ImageMagick Studio LLC 2006), if it is installed on the system. For reading camera RAW formats the package adimpro uses dcraw (Coffin 2006). Both programs are available for many operating systems. However, if you cannot or do not want to install them on your system, you must rely on some other software or R package to read the image data into the R session or to transform the image data into portable pixmap format. Since image conversion is not the target of the package and excellent software for this purpose exists, we do not plan to extend its capabilities in this direction.
Reading an image file from disk is done by one of the following statements:

R> img <- read.image("image.tif")
R> img <- read.raw("image.raw")

Both statements create an object of class "adimpro". Such objects are lists containing the gray or color values as well as image metadata.

Table 2: Estimated spatial correlation in horizontal and vertical direction in the three color channels for the example image used in Figure 3.
red/green   red/blue   green/blue
-0.15       -0.0023    -0.58

Table 3: Estimated correlation between the color channels for the image used in Figure 3.
Alternatively, an object of class "adimpro" can be generated from image data provided as an array, with the first two dimensions corresponding to the image dimensions and the third being 1 for grayscale or 3 for color valued images, respectively. This is achieved by

R> img <- make.image(imgdata)
In the package adimpro we provide two functions which implement the propagation-separation approach described in the preceding sections. The function awsimage uses a local constant model while awspimage considers local polynomial models up to a degree of 2.
The complexity of the algorithm is mainly determined by h max which corresponds to the maximum bandwidth used in the iteration process (see Section 3 for details).
In Figure 2 we show an example of image smoothing under controlled noise conditions. An image of good quality and low noise has been taken using a CANON 300D with ISO 100. Spatially correlated homogeneous noise has been added.
As already mentioned, the assumption of homogeneous noise may only be suitable for gamma corrected images. However, if the camera provides RAW format, noise is characterized by a linear dependence between its variance and the mean intensity of light at each pixel, aside from saturation effects. Thus for images in RAW format it is possible to specify a model for the variance mean dependence, namely a linear model. Figure 3 shows a test image with nine homogeneous colored regions taken by a Panasonic Lumix LX2 under extreme lighting conditions. ISO 1600 has been used, therefore the image has a relatively strong noise component. The result of adaptive smoothing is given on the right side of the figure. We read and processed the RAW image, as obtained by dcraw (Coffin 2006), using bilinear interpolation for demosaicing. The estimates for the spatial correlation and the correlation between the color channels are given in Tables 2 and 3.
Furthermore we demonstrate the effects of the use of the linear variance model, which is the correct one for RAW images as described above. When smoothing an image using an inadequate constant variance model we underestimate the variance where color values are large. This leads to a segmentation of bright homogeneous regions through adaptation to the noise component. A further negative bias in the estimated variance is introduced when estimating the variance again from residuals in the iterative process. We demonstrate the effect in Figure 4 and Tables 4 and 5, where the use of a linear and a constant variance model is compared. Table 6 gives an impression of the computing time needed to perform the smoothing with different maximum bandwidths.

Table 4: Estimated values for the variance in the three color channels in four areas of different mean color for the example image used in Figure 3. We considered the four areas in the upper right corner of the image. The first four rows of the table give the mean estimates for a linear variance mean model, while the last row contains the corresponding values for a constant variance mean dependence. The red values show cases where, in a constant model, the variance is underestimated in contrast to the (true) linear model, and thus the smoothing procedure partly adapts to the noise. The effects are illustrated in Figure 4.
As a last example we compare in Figure 5 the results of adaptively smoothing an image taken by a CANON PowerShot S30 at 800 ISO with a local constant and a local quadratic model.
The image files and R scripts to produce the figures are published along with this paper.

Figure 4: A part of the image in Figure 3 is shown here. Again h_max = 22.7 has been used. The left column corresponds to a constant variance model, while the right column shows the corresponding results for a linear variance model. The upper images show the degree of adaptation to structure through the sum of weights in the two cases, as in Figure 2. The edges between the colored regions can be clearly identified in both models. While there is only little structure in the homogeneous region, which is mainly due to the heavy noise, the bright square shows a lot of structure when smoothing with a constant variance model. Since noise for RAW images consists of two additive components which are approximately proportional to the mean (photon counts) and constant (thermal noise), the linear variance model is correct; a constant model underestimates the variance in the bright region. Thus smaller differences of the color value are detected as significant with higher probability. The procedure therefore partly adapts to the noise structure.

                               red channel   green channel   blue channel
Estimated constant parameter   0.000378      0.0000824       0.0000775
Estimated linear parameter     0.00573       0.00404         0.00382

Table 5: Estimated parameters for the linear variance model in the three color channels for the image used in Figure 3.
h_max        2      4      10      22.7
Grayscale    2.2s   5.1s   13.6s   37.2s
Color        4.9s   11.4s  29.7s   83.8s

Table 6: CPU time needed for smoothing a part of the image in Figure 2 (color and grayscale) of size 512x512 pixels on a Xeon 5160 (3.00GHz), SuSE Linux 10.1, R 2.3.1, using different maximum bandwidths h_max.

Figure 5: Center: Result of smoothing with a local constant model and h_max = 11.6. Right: Same for a local quadratic model and h_max = 18.4. This demonstrates that a local quadratic model leads to a smoother impression in homogeneous regions at the cost of a slightly decreased sensitivity at edges.