Multi-Focus Fusion Technique on Low-Cost Camera Images for Canola Phenotyping

To meet the high demand for supporting and accelerating progress in the breeding of novel traits, plant scientists and breeders have to measure a large number of plants and their characteristics accurately. Imaging methodologies are being deployed to acquire data for quantitative studies of complex traits. Images are not always of good quality, in particular when they are obtained from the field. Image fusion techniques can give plant breeders easier access to plant characteristics by improving the definition and resolution of color images. In this work, the multi-focus images were loaded and then the similarity of visual saliency, gradient, and color distortion were measured to obtain weight maps. The maps were refined by a modified guided filter before the images were reconstructed. Canola images were obtained by a custom-built mobile platform for field phenotyping and were used for testing alongside public databases. The proposed method was also tested against five common image fusion methods in terms of quality and speed. Experimental results show that the proposed technique produces good reconstructed images, both subjectively and objectively. The findings contribute a new multi-focus image fusion method, based on visual saliency maps and a gradient domain fast guided filter, that exhibits competitive performance and outperforms some other state-of-the-art methods. The proposed fusion technique can be extended to other fields, such as remote sensing and medical image fusion applications.


Introduction
The sharp increase in demand for global food raises the awareness of the public, especially agricultural scientists, of global food security. To meet the high demand for food in 2050, agriculture will need to produce almost 50 percent more food than was produced in 2012 [1]. There are many ways to improve yields for canola and other crops. One of the solutions is to increase breeding efficiency. In the past decade, advances in genetic technologies, such as next-generation DNA sequencing, have provided new methods to improve plant breeding techniques. However, the lack of knowledge of phenotyping capabilities limits the ability to analyze the genetics of quantitative traits related to plant growth, crop yield, and adaptation to stress [2]. Phenotyping creates opportunities not only for functional research on genes, but also for the development of new crops with beneficial features. Image-based phenotyping methods are integrated approaches with the potential to greatly enhance plant researchers' ability to characterize many different traits of plants. Modern advanced imaging methods provide high-resolution images and enable the visualization of multi-dimensional data. The basics of image processing have been thoroughly studied and published. Readers can find useful information on image fusion in the textbooks by Starck or Florack [3,4]. These methods allow plant breeders and researchers to obtain exact data, speed up image analysis, bring high artifact. In a recently published article, the authors reviewed the works using sparse representation (SR)-based methods on multi-sensor systems [18]. Based on sparse representation, the same authors also developed an image fusion method for multi-focus and multi-modality images [19]. Because this SR method learns an over-complete dictionary from a set of training images for image fusion, it may result in a large increase in computational complexity.
To deal with the obstacles mentioned above, a new multi-focus image fusion method based on image quality assessment (IQA) metrics is proposed in this paper. The proposed fusion method is developed based on crucial IQA metrics and a gradient domain fast guided image filter (GDFGIF). This approach is motivated by the fact that visual saliency maps, including visual saliency, gradient similarity, and chrominance similarity maps, outperform most of the state-of-the-art IQA metrics in terms of prediction accuracy [20]. According to Reference [20], visual saliency similarity, gradient similarity, and chrominance maps are vital metrics in accounting for the visual quality of image fusion techniques. In most cases, changes in the visual saliency (VS) map are a good indicator of the degree of distortion, and thus the VS map is used as a local weight map. However, the VS map does not work well for contrast-change distortion. Fortunately, the image gradient can be used as an additional feature to compensate for the lack of contrast sensitivity of the VS map. In addition, the VS map does not work well for distortion caused by changes in color saturation. This color distortion cannot be well represented by the gradient either, since the gradient is usually computed from the luminance channel of images. To deal with this color distortion, two chrominance channels are used as features to represent the quality degradation caused by color distortion. These IQA metrics have been shown to be stable and to have the best performance [20]. In addition, the gradient domain guided filter (GDGIF) [21] and the fast guided filter (FGF) [22] are adopted in this work because the combination of GDGIF and FGF can offer faster and better fused results, especially near the edges, where halo artifacts appear in the original guided image filter. This study focuses on how to fuse multi-focus color images to enhance the resolution and quality of the fused image using a low-cost camera.
The proposed image fusion method was developed and compared with other state-of-the-art image fusion methods. In the proposed multi-focus image fusion, two or more images captured by the same sensor from the same visual angle but with different focus are combined to obtain a more informative image. For example, a fused image with clearer canola seedpods can be produced by fusing many different images of a canola plant acquired by the same Pi camera at the same angle with many different focal lengths.

Data Acquisition System
This image fusion work is part of the development of a low-cost, high-throughput phenotyping mobile system for canola in which a low-cost Raspberry Pi camera is used as the source of image acquisition. This system includes a 3D Time-of-Flight camera, a Pi camera, a Raspberry Pi3 (RP3), and appropriate power supplies for the cameras and the mini computer (Raspberry Pi3). A built-in remote control allows the user to start and stop image recording as desired. Figure 1 shows various components of the data acquisition system. Data are recorded on the SD card of the RP3 and retrieved using a USB connection to a laptop before the images are processed. The Kuman for Raspberry Pi 3 Camera Module with adjustable focus is used in this system. This camera is connected to the Raspberry Pi using the dedicated CSI interface. The Pi camera is equipped with the 5-megapixel OV5647 sensor. It is capable of capturing static images of 2592 × 1944 pixels; it also supports video capture at 1080p at 30 fps, 720p at 60 fps, and 640 × 480p at 60/90 fps.
The test subjects were canola plants at different growth stages. The plants were grown in a controlled environment and also in the field. To capture images of the canola, the plants were placed directly underneath the Pi camera, which was fixed on a tripod at a distance of 1000 mm (Figure 1). Each canola plant was recorded at 10 fps for 3 s. The time between each change of the focal length was 10 s. Only frame number 20 of each video stream acquired from the Pi camera was extracted and stored in the database for later use. The 20th frame was selected because the plants and the camera are required to be stable before the images are captured and processed. Only the regions containing the plant in the selected images were cropped and used for the multi-focus image fusion methods.
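The frame selection described above can be illustrated with a small sketch. It assumes that frames are numbered from 1 and are evenly spaced in time; the function name `select_stable_frame` is hypothetical and is not part of the acquisition software described here.

```python
def select_stable_frame(fps, duration_s, frame_number):
    """Return (total frames, zero-based index, start time in seconds).

    Assumption: frame numbering starts at 1 and frames are evenly spaced.
    """
    total_frames = int(round(fps * duration_s))
    if not 1 <= frame_number <= total_frames:
        raise ValueError("frame_number lies outside the recorded clip")
    index = frame_number - 1          # zero-based index into the clip
    start_time_s = index / fps        # time at which this frame starts
    return total_frames, index, start_time_s


# A 3 s clip at 10 fps holds 30 frames; frame 20 falls in the last third,
# after the plant and camera have had time to settle.
total_frames, index, start_time_s = select_stable_frame(10, 3, 20)
```

Under these assumptions the selected frame is index 19 of 30 and starts 1.9 s into the clip.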

Image Fusion Algorithm
In the proposed fusion approach, three image quality assessment (IQA) metrics, namely visual saliency similarity, gradient similarity, and chrominance similarity (or color distortion), are measured to obtain their weight maps. These weight maps are then refined by a gradient domain fast guided filter, which combines the gradient domain guided filter proposed in Reference [21] and the fast guided filter proposed in Reference [22]. The workflow of the proposed multi-focus image fusion algorithm is illustrated in Figure 2. The details of the proposed algorithm are described as follows.
First, each input image is decomposed into a base component and a detail component, which contain the large-scale and small-scale variations in intensity, respectively. A Gaussian filter is applied to each source image to obtain its base component, and the detail component is obtained by subtracting the base component from the input image, as given by:
$$B_n = I_n * G_{r,\sigma} \tag{1}$$
$$D_n = I_n - B_n \tag{2}$$
where $B_n$ and $D_n$ are the base and detail components of the $n$-th input image $I_n$, respectively, $*$ denotes the convolution operator, and $G_{r,\sigma}$ is a 2-D Gaussian smoothing filter.
Several measures were used to obtain weight maps for image fusion. Visual saliency similarity, gradient similarity, and chrominance maps are vital metrics in accounting for the visual quality of image fusion techniques [20]. In most cases, changes in the visual saliency (VS) map are a good indicator of the degree of distortion, and thus the VS map is used as a local weight map. However, the VS map does not work very well for contrast-change distortion. Fortunately, the gradient modulus can be used as an additional feature to compensate for the lack of contrast sensitivity of the VS map. In addition, the VS map does not work well for distortion caused by changes in color saturation. This color distortion cannot be well represented by the gradient either, since the gradient is usually computed from the luminance channel of images. To deal with this color distortion, two chrominance channels are used as features to represent the quality degradation caused by color distortion. Motivated by these metrics, an image fusion method is designed based on the measurement of the three key visual features of input images.
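The base and detail decomposition can be sketched as follows for a single-channel image. This is a minimal illustration, assuming a separable Gaussian kernel with edge replication at the borders; the function names and the default radius and sigma are hypothetical choices, not values taken from this work.

```python
import numpy as np


def gaussian_kernel_1d(radius, sigma):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image, radius=2, sigma=1.0):
    """Separable Gaussian smoothing of a 2-D array (borders are replicated)."""
    kernel = gaussian_kernel_1d(radius, sigma)
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def decompose(image, radius=2, sigma=1.0):
    """Split an image into a base (smoothed) and a detail (residual) component."""
    base = gaussian_blur(image, radius, sigma)
    detail = image.astype(np.float64) - base
    return base, detail
```

By construction the two components sum back to the input, so no information is lost by the decomposition itself. A color image could be handled by applying `decompose` to each channel separately.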

Visual Saliency Similarity Maps
A saliency detection algorithm proposed in [23] is adopted to calculate visual saliency similarity due to its high accuracy and low computational complexity. This algorithm is constructed by combining three simple priors: frequency, color, and location. The visual saliency similarity maps are calculated as
$$VS^k_n = SF^k_n \cdot SC^k_n \cdot SD^k_n \tag{3}$$
where $SF^k_n$, $SC^k_n$, and $SD^k_n$ are the saliency at pixel $k$ under the frequency, color, and location priors, respectively. $SF^k_n$ is calculated by
$$SF^k_n = \left( (IL^k_n * g)^2 + (Ia^k_n * g)^2 + (Ib^k_n * g)^2 \right)^{1/2} \tag{4}$$
where $IL^k_n$, $Ia^k_n$, and $Ib^k_n$ are the three channels obtained by transforming the given RGB input image $I_n$ to CIE L*a*b* space, and $*$ denotes the convolution operation. CIE L*a*b* is an opponent color system in which the a* channel represents green-red information while the b* channel represents blue-yellow information. If a pixel has a smaller (greater) a* value, it would seem greenish (reddish). If a pixel has a smaller (greater) b* value, it would seem blueish (yellowish). Then, if a pixel has a higher a* or b* value, it would seem warmer; otherwise, colder. The color saliency $SC^k_n$ at pixel $k$ is calculated using
$$SC^k_n = 1 - \exp\left( -\frac{(\widetilde{Ia}^k_n)^2 + (\widetilde{Ib}^k_n)^2}{\sigma_C^2} \right) \tag{5}$$
where $\sigma_C$ is a parameter, $\widetilde{Ia}^k_n = \frac{Ia^k_n - mina}{maxa - mina}$, $\widetilde{Ib}^k_n = \frac{Ib^k_n - minb}{maxb - minb}$, $mina$ ($maxa$) is the minimum (maximum) value of $Ia$, and $minb$ ($maxb$) is the minimum (maximum) value of $Ib$.
Many studies have found that regions near the image center are more attractive to human visual perception [23]. It can thus be suggested that regions near the center of the image are more likely to be "salient" than those far away from the center. The location saliency at pixel $k$ under the location prior can be formulated by
$$SD^k_n = \exp\left( -\frac{\lVert k - c \rVert_2^2}{\sigma_D^2} \right) \tag{6}$$
where $\sigma_D$ is a parameter and $c$ is the center of the input image $I_n$. Then, the visual saliency is filtered with a Gaussian filter $G_{r,\sigma}$ to construct the visual saliency (VS) maps.
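The color and location priors can be sketched as follows. This is a minimal illustration that assumes the a* and b* channels have already been computed, that the color prior is one minus a Gaussian of the normalized chrominance and the location prior is a Gaussian of the distance to the image center (as in the saliency model of [23]), and that the frequency prior is supplied by the caller. The function names and the default `sigma_c` are hypothetical.

```python
import numpy as np


def _normalize(channel):
    """Rescale a channel to [0, 1]; a constant channel maps to zeros."""
    channel = np.asarray(channel, dtype=np.float64)
    span = channel.max() - channel.min()
    if span == 0:
        return np.zeros_like(channel)
    return (channel - channel.min()) / span


def color_prior(a_chan, b_chan, sigma_c=0.25):
    """Color saliency: larger normalized a*/b* values give higher saliency."""
    a_n = _normalize(a_chan)
    b_n = _normalize(b_chan)
    return 1.0 - np.exp(-(a_n ** 2 + b_n ** 2) / sigma_c ** 2)


def location_prior(height, width, sigma_d):
    """Location saliency: Gaussian falloff with distance from the center."""
    rows, cols = np.mgrid[0:height, 0:width]
    center_y, center_x = (height - 1) / 2.0, (width - 1) / 2.0
    dist_sq = (rows - center_y) ** 2 + (cols - center_x) ** 2
    return np.exp(-dist_sq / sigma_d ** 2)


def visual_saliency(sf, sc, sd):
    """Pixel-wise product of the frequency, color, and location priors."""
    return sf * sc * sd
```

The frequency prior is left out because it depends on the band-pass filter $g$, whose parameters are not given in this section.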

Gradient Magnitude Similarity
According to Zhang et al. [24], the gradient magnitude is calculated as the root mean square of image directional gradients along two orthogonal directions. The gradient is usually computed by convolving an image with a linear filter such as the classic Sobel, Prewitt, and Scharr filters. The gradient magnitude similarity algorithm proposed in Reference [24] is adopted in this study. This algorithm uses a Scharr gradient operator, which can achieve slightly better performance than the Sobel and Prewitt operators [25]. With the Scharr gradient operator, the partial derivatives $GMx^k_n$ and $GMy^k_n$ of an input image $I_n$ are calculated by convolving the image with the horizontal and vertical Scharr kernels, and the gradient modulus of the image $I_n$ is calculated by
$$GM^k_n = \sqrt{(GMx^k_n)^2 + (GMy^k_n)^2}$$
The gradient is computed from the luminance channel of the input images, which is introduced in the next section. Similar to the visual saliency maps, the gradient magnitude (GM) maps are constructed from the gradient modulus.
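The gradient modulus can be sketched with the standard 3 × 3 Scharr kernels. This is a minimal illustration, assuming the commonly used 1/16 normalization and edge replication at the borders; the function names are hypothetical.

```python
import numpy as np

# Standard 3x3 Scharr kernels, normalized by 1/16 (an assumption of this sketch).
SCHARR_X = np.array([[3.0, 0.0, -3.0],
                     [10.0, 0.0, -10.0],
                     [3.0, 0.0, -3.0]]) / 16.0
SCHARR_Y = SCHARR_X.T


def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel (borders are replicated)."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out


def gradient_magnitude(luminance):
    """Gradient modulus from horizontal and vertical Scharr responses."""
    gx = filter3x3(luminance, SCHARR_X)
    gy = filter3x3(luminance, SCHARR_Y)
    return np.sqrt(gx ** 2 + gy ** 2)
```

The sign of each response depends on whether correlation or convolution is used, but the modulus is unaffected because both responses are squared.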

Chrominance Similarity
The RGB input images are transformed into an opponent color space with one luminance channel (L) and two chrominance channels (M and N). The L channel is used to compute the gradients introduced in the previous section. The M and N (chrominance) channels are used to calculate the color distortion saliency. Finally, the chrominance similarity or color distortion saliency (CD) maps are calculated from this saliency.
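The color space transform can be sketched as a single matrix product per pixel. The coefficients below are an assumption: they are the LMN opponent-space weights commonly used with visual-saliency-based quality metrics, and the exact matrix used in this work is not recoverable from the text. The name `rgb_to_lmn` is hypothetical.

```python
import numpy as np

# Assumed RGB -> LMN weights (rows: L, M, N); not confirmed by this section.
RGB_TO_LMN = np.array([[0.06, 0.63, 0.27],
                       [0.30, 0.04, -0.35],
                       [0.34, -0.60, 0.17]])


def rgb_to_lmn(rgb):
    """Convert an (H, W, 3) RGB array into separate L, M, and N channels."""
    rgb = np.asarray(rgb, dtype=np.float64)
    lmn = rgb @ RGB_TO_LMN.T      # apply the 3x3 transform to every pixel
    return lmn[..., 0], lmn[..., 1], lmn[..., 2]
```

With these weights a neutral gray pixel produces chrominance values close to zero, which is what makes M and N useful for isolating color changes from luminance changes.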

Weight Maps
Using the three metrics measured above, the weight maps $W_n$ are computed from the VS, GM, and CD maps, where $\alpha$, $\beta$, and $\gamma$ are parameters used to control the relative importance of visual saliency (VS), gradient saliency (GM), and color distortion saliency (CD). From these weight maps, at each location $k$, the overall weight maps of each input image can be obtained, where $N$ is the number of input images and $W^k_n$ is the weight value of pixel $k$ in the $n$-th image. The proposed weight maps are then determined by normalizing the saliency maps as follows:
$$\overline{W}^k_n = \frac{W^k_n}{\sum_{m=1}^{N} W^k_m}, \quad \forall n = 1, 2, \ldots, N$$
These weight maps are then refined by a gradient domain guided filter described in the next section.
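The combination and normalization of the weight maps can be sketched as follows. The product-of-powers form in `combine_maps` is an assumption modeled on how visual-saliency-based quality metrics weight their components; the exact combination rule is not recoverable from this section. Both function names and the small constant `eps` are hypothetical.

```python
import numpy as np


def combine_maps(vs, gm, cd, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed combination rule: exponents set the importance of each map."""
    return (vs ** alpha) * (gm ** beta) * (cd ** gamma)


def normalize_weights(weights, eps=1e-12):
    """Normalize N weight maps so that they sum to one at every pixel.

    `weights` is a sequence of N arrays with identical shapes; `eps`
    guards against division by zero where all maps are zero.
    """
    stack = np.stack(weights, axis=0)
    return stack / (stack.sum(axis=0, keepdims=True) + eps)
```

For example, two maps with values 1 and 3 at the same pixel against 1 and 1 in the other image give normalized weights of 0.5 and 0.75 for the first image.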

Gradient Domain Fast Guided Filter
The gradient domain guided filter proposed in Reference [21] is adopted to optimize the initial weight maps. By using this filter, the halo artifacts can be more effectively suppressed. It is also less sensitive to its parameters but still has the same complexity as the guided filter. The gradient domain guided filter has good edge-preserving smoothing properties like the bilateral filter, but it does not suffer from gradient reversal artifacts. The filtering output is a local linear model of the guidance image. This is one of the fastest edge-preserving filters. Therefore, the gradient domain guided filter can be applied in image smoothing to avoid ringing artifacts.
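The local linear model behind this family of filters can be sketched with the original guided filter, which is swapped in here for simplicity. This sketch deliberately omits the edge-aware weighting of the gradient domain variant [21] and the subsampling of the fast variant [22]; the function names, the window radius `r`, and the regularization constant `eps` are hypothetical.

```python
import numpy as np


def box_mean(image, r):
    """Mean over a (2r+1) x (2r+1) window (borders are replicated)."""
    padded = np.pad(image.astype(np.float64), r, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width))
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (2 * r + 1) ** 2


def guided_filter(guide, src, r=2, eps=1e-3):
    """Plain guided filter: the output is locally a * guide + b."""
    mean_g = box_mean(guide, r)
    mean_p = box_mean(src, r)
    cov_gp = box_mean(guide * src, r) - mean_g * mean_p
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    a = cov_gp / (var_g + eps)        # slope of the local linear model
    b = mean_p - a * mean_g           # offset of the local linear model
    # Average the coefficients of all windows that cover each pixel.
    return box_mean(a, r) * guide + box_mean(b, r)
```

In the fusion pipeline, `src` would be an initial weight map and `guide` the corresponding source image, so that the refined weights follow the edges of the image.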
It is assumed that the filtering output $Q$ is a linear transform of the guidance image $G$ in a local window $w_k$ centered at pixel $k$:
$$Q_i = a_k G_i + b_k, \quad \forall i \in w_k$$
where $(a_k, b_k)$ are linear coefficients assumed to be constant in the local window $w_k$ with the size of $(2\zeta_1 + 1) \times (2\zeta_1 + 1)$. The linear coefficients $(a_k, b_k)$ can be estimated by minimizing the cost function of the filter [21] in the window $w_k$ between the output image $Q$ and the input image $P$. In this cost function, $\gamma_k$ is defined as
$$\gamma_k = 1 - \frac{1}{1 + e^{\eta (\chi(k) - \mu_{\chi,\infty})}}$$
where $\mu_{\chi,\infty}$ is the mean value of all $\chi(k)$, and $\eta$ is calculated as $4 / (\mu_{\chi,\infty} - \min(\chi(k)))$. $\hat{\Gamma}_G(k)$ is a new edge-aware weighting used to measure the importance of pixel $k$ with respect to the whole guidance image. It is defined by using the local variances of $3 \times 3$ windows and $(2\zeta_1 + 1) \times (2\zeta_1 + 1)$ windows of all pixels, through $\chi(k) = \sigma_{G,1}(k)\,\sigma_{G,\zeta_1}(k)$, where $\zeta_1$ is the window size of the filter. The optimal values of $a_k$ and $b_k$ are those that minimize this cost function.
The edge information preservation value lies between 0 and 1; the higher the value, the less information of the fused image is lost. The fusion performance $Q^{AB/F}$ is evaluated as a sum of local information preservation estimates between each of the input images and the fused image, in which the edge information preservation values of the input images are weighted by their respective weights. Table 1 illustrates the quantitative assessment values of five different multi-focus fusion methods and the proposed method. The larger the value of these metrics, the better the image quality. The values shown in bold represent the highest performance.
From Table 1, it can be seen that the proposed method produces the highest quality scores for all three objective metrics, except for $Q_Y$ with the "Canola 2" dataset and $Q^{AB/F}$ with "Book" (extra images were also run to test the performance). These highest quality scores imply that the proposed method performed well, stably, and reliably. Overall, it can be concluded that the proposed method shows competitive performance when compared with previous multi-focus fusion methods, both in visual perception and objective metrics. Table 2 describes the ranking of the proposed method and the other methods based on the quality of fused images. The performance (including quality of the images and the processing time) is scaled from 1 to 6. The results show that the proposed technique outperforms other previously published techniques.

Summary and Conclusions
To improve the description and quality of images, especially images acquired from a digital camera or the Pi camera for canola phenotyping, an image fusion method is necessary. A new multi-focus image fusion method was proposed that combines the VS maps and gradient domain fast guided filters. In the proposed algorithm, the VS maps were first deployed to obtain visual saliency, gradient magnitude similarity saliency, and chrominance saliency (or color distortions), and the initial weight map was then constructed from a mix of the three metrics. Next, the final decision weight maps were obtained by optimizing the initial weight map with a gradient domain fast guided filter at two components. Finally, the fused results were retrieved by combining the two-component weight maps with the two-component source images that present large-scale and small-scale variations in intensity. The proposed method was compared with five representative fusion methods in both subjective and objective evaluations. Based on the experimental results, the proposed fusion method, which is based on the VS maps measure and the gradient domain fast guided filter, performs competitively with or outperforms some state-of-the-art methods. The proposed method can use digital images captured by either a high-end or low-end camera, especially the low-cost Pi camera. This fusion method can be used to improve the images for trait identification in phenotyping of canola or other species.
On the other hand, the proposed multi-focus image fusion has some limitations, such as small blurred regions at the boundaries between the focused and defocused regions and computational cost.
The final value of $\hat{Q}_i$ is calculated from $\bar{a}_k$ and $\bar{b}_k$, the mean values of $a_k$ and $b_k$ in the window, respectively. $\bar{a}_k$ and $\bar{b}_k$ are computed by

$$\bar{a}_k = \frac{1}{|w_{\zeta_1}(k)|} \sum_{i \in w_{\zeta_1}(k)} a_i, \qquad \bar{b}_k = \frac{1}{|w_{\zeta_1}(k)|} \sum_{i \in w_{\zeta_1}(k)} b_i$$

where $|w_{\zeta_1}(k)|$ is the cardinality of $w_{\zeta_1}(k)$.
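The window averaging of the filter coefficients can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a square window of radius `radius` clipped at the image border, images stored as nested Python lists, and the standard guided filter output model in which the averaged coefficients are applied to a guidance image (here the hypothetical argument `guide`).

```python
def window_mean(coeff, radius):
    """Mean of `coeff` over a (2*radius+1) x (2*radius+1) window, clipped at the borders."""
    rows, cols = len(coeff), len(coeff[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            total = sum(coeff[i][j] for i in range(r0, r1) for j in range(c0, c1))
            count = (r1 - r0) * (c1 - c0)  # cardinality of the clipped window
            out[r][c] = total / count
    return out


def apply_averaged_coefficients(guide, a, b, radius):
    """Assumed output model: q = mean(a) * guide + mean(b) at every pixel."""
    a_bar = window_mean(a, radius)
    b_bar = window_mean(b, radius)
    rows, cols = len(guide), len(guide[0])
    return [[a_bar[r][c] * guide[r][c] + b_bar[r][c] for c in range(cols)]
            for r in range(rows)]
```

With `a` equal to one everywhere and `b` equal to zero, the output reproduces the guidance image, which is a quick sanity check on the averaging step.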

Refining Weight Maps by Gradient Domain Guided Filter
Because these weight maps are noisy and not well aligned with the object boundaries, the proposed approach deploys a gradient domain guided filter to refine them. The gradient domain guided filter is applied to each weight map $W_n$ with the corresponding input image $I_n$. However, the weight map $W\_D_n$ used $W\_B_n$ as the guidance image to improve $W\_D_n$. The refined maps are calculated by

$$W\_B_n = G_{r_1,\varepsilon_1}(W_n, I_n) \quad (28)$$

$$W\_D_n = G_{r_2,\varepsilon_2}(W\_B_n, I_n) \quad (29)$$

where $r_1$, $\varepsilon_1$, $r_2$, and $\varepsilon_2$ are the parameters of the guided filter. $W\_B_n$ and $W\_D_n$ are the refined weight maps of the base and detail layers, respectively. Both weight maps $W\_B_n$ and $W\_D_n$ are then processed with mathematical morphology techniques to remove small holes and unwanted regions in the focused and defocused regions. The morphology steps are as follows:

    mask = W_n < threshold
    temp1 = imfill(mask, 'holes')
    temp2 = 1 - temp1
    temp3 = imfill(temp2, 'holes')
    W_n(refined) = bwareaopen(temp3, threshold)

Then, the values of the N refined weight maps are normalized such that they sum to one at each pixel k. Finally, the fused base and detail layer images are calculated and blended to fuse the input images, as given by

$$\bar{B}_n = W\_B_n * B_n \quad (31)$$

The fast guided filter is an improved version of the guided filter, as proposed in Reference [22]. This algorithm is adopted to reduce the time complexity of the gradient domain guided filter. Before the gradient domain guided filter is applied, the rough transmission map and the guidance image are down-sampled using nearest-neighbor interpolation. After filtering, the output image of the gradient domain guided filter is up-sampled using bilinear interpolation to obtain the refined transmission map. With this fast guided filter, the gradient domain guided filter performs better than the original one. Therefore, the proposed filter was named the gradient domain fast guided filter.
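The normalization and blending steps described above can be sketched as follows. This is a hedged illustration under stated assumptions: images are nested Python lists, pixels where all weights vanish fall back to equal weights, and the fused image is taken as the sum of the fused base and fused detail layers (the usual two-scale convention, assumed here because the remaining fusion equations are not reproduced in the text). All function names are hypothetical.

```python
def normalize_weights(weight_maps, eps=1e-12):
    """Scale N weight maps so that they sum to one at every pixel."""
    rows, cols = len(weight_maps[0]), len(weight_maps[0][0])
    count = len(weight_maps)
    normalized = [[[0.0] * cols for _ in range(rows)] for _ in range(count)]
    for r in range(rows):
        for c in range(cols):
            total = sum(w[r][c] for w in weight_maps)
            for n, w in enumerate(weight_maps):
                # Assumption: equal weights where every map is (near) zero.
                normalized[n][r][c] = w[r][c] / total if total > eps else 1.0 / count
    return normalized


def blend(weight_maps, layers):
    """Pixel-wise weighted sum of the layers using normalized weights."""
    weights = normalize_weights(weight_maps)
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[sum(weights[n][r][c] * layers[n][r][c] for n in range(len(layers)))
             for c in range(cols)] for r in range(rows)]


def fuse_layers(base_weights, detail_weights, base_layers, detail_layers):
    """Assumed reconstruction: fused image = fused base + fused detail."""
    fused_base = blend(base_weights, base_layers)
    fused_detail = blend(detail_weights, detail_layers)
    return [[fb + fd for fb, fd in zip(row_b, row_d)]
            for row_b, row_d in zip(fused_base, fused_detail)]
```

Normalizing before blending keeps the fused intensities within the range spanned by the source layers at each pixel.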

Multi-Focus Image Fusion
This section describes the comprehensive experiments conducted to evaluate and verify the performance of the proposed approach. The proposed algorithm was developed to fit many types of multi-focus images that are captured by any digital camera or Pi camera. The proposed method was also compared with five multi-focus image fusion techniques: the multi-scale weighted gradient based method (MWGF) [26], the DCT based Laplacian pyramid fusion technique (DCTLP) [27], the image fusion with guided filtering (GFF) [28], the gradient domain-based fusion combined with a pixel-based fusion (GDPB) [29], and the image matting (IM)-based fusion algorithm [30]. The codes of these methods were downloaded and run on the same computer to compare to the proposed method.
The MWGF method is based on the image structure saliency and two scales to solve the fusion problems raised by anisotropic blur and mis-registration. The image structure saliency is used because it reflects the saliency of local edge and corner structures. The large-scale measure is used to reduce the impacts of anisotropic blur and mis-registration on the focused region detection, while the small-scale measure is used to determine the boundaries of the focused regions. The DCTLP method fuses images using a Discrete Cosine Transform based Laplacian pyramid in the frequency domain. The higher the level of pyramidal decomposition, the better the quality of the fused image. The GFF method is based on fusing two-scale layers by using a guided filter-based weighted average method. This method measures pixel saliency and spatial consistency at two scales to construct weight maps for the fusion process. The GDPB method fuses luminance and chrominance channels separately. The luminance channel is fused by using a wavelet-based gradient integration algorithm coupled with a Poisson solver at each resolution to attenuate the artifacts. The chrominance channels are fused based on a weighted sum of the chrominance channels of the input images. The image matting (IM) fusion method is based on three steps: obtaining the focus information of each source image by morphological filtering, applying an image matting technique to achieve accurate focused regions of each source image, and combining these focused regions to construct the fused image.
All methods used the same input images as the ones applied in the proposed technique. Ten multi-focus image sequences were used in the experiments. Four of them were canola images captured by setting the Pi camera to be well focused and by manually changing its focal length; the others were selected from the general public datasets used for many image fusion techniques. These general datasets are available in References [31,32]. Of the four canola datasets, three were artificial multi-focus images obtained by using the LunaPic tool [33], and one was a multi-focus image acquired directly from the Pi camera after cropping the region of interest, as described in Section 2.1.
The empirical parameters of the gradient domain fast guided filter and VS metrics were adjusted to obtain the best outputs. The parameters of the gradient domain fast guided filter (see Equation (22)) consisted of a filter window size (ζ1), a small positive constant (ε), the subsampling ratio of the fast guided filter (s), and the dynamic range of the input images (L). The parameters of the VS maps (Equation (16)), α, β, and γ, were used to control the visual saliency, gradient similarity, and color distortion measures, respectively. The empirical parameters of the gradient domain fast guided filter were experimentally set as s = 4, L = 9, and two pairs of ζ1(1) = 4, ε(1) = 1.0e-6 and ζ1(2) = 4, ε(2) = 1.0e-6 for optimizing the base and detail weight maps. The empirical parameters of the VS maps were set as α = 1, β = 0.89, and γ = 0.31.
Surprisingly, when these parameters of the VS maps were changed, for example to α = 0.31, β = 1, and γ = 0.31, the fused results had a similar quality to those of the first parameter settings. It can thus be concluded that, to obtain focused regions, both visual saliency and gradient magnitude similarity can be used as the main saliencies. In addition, the chrominance colors (M and N) also contributed to the quality of the fused results. For example, when the parameters of M and N were increased, blurred regions appeared in the fused results. Figure 3 shows the outputs of the proposed algorithm, including visual saliency, gradient magnitude similarity, and chrominance colors. The red oval denotes the defocused region of the input image (Figure 3a).

Comparison with Other Multi-Fusion Methods
In this section, a comprehensive assessment, including both subjective and objective assessment, is used to evaluate the quality of fused images obtained from the proposed and other methods. Subjective assessments evaluate the quality of an image through many factors, including viewing distance, display device, lighting condition, vision ability, etc. However, subjective assessments are expensive and time consuming. Therefore, objective assessments, which are mathematical models, are designed to predict the quality of an image accurately and automatically.
For subjective or perceptual assessment, the fused images are compared visually (see, e.g., Figures 4, 6, and 7). In almost all cases, the MWGF method offers quite good fused images; however, it sometimes fails to deal with the focused regions. For example, blurred regions remain in the fused image, as marked by the red circle in Figure 4c. The DCTLP method offers fused images as good as those of the MWGF method but causes blurring of the fused images in all examples. The IM method also provides quite good results; however, ghost artifacts remain in the fused images, as shown in Figure 4g, Figure 6g, and Figure 7g. Although the fused results of the GFF method reveal good visual effects at first glance, small blurred regions still remain at the edge regions (the boundary between focused and defocused regions) of the fused results. This blurring of edge regions can be seen in the "Rose flower" fused images in Figure 7e. The fused images of the GDPB method have unnatural colors and too much brightness. The fused results of the GDPB method also suffer from ghost artifacts on the edge regions and on the boundary between the focused and defocused regions. It can be clearly seen that the proposed algorithm obtains clearer fused images and better visual quality and contrast than the other algorithms due to its combination of the gradient domain fast guided filter and VS maps. The proposed algorithm offers fused images with fewer block artifacts and blurred edges.
In addition to subjective assessments, an objective assessment without the reference image was also conducted. Three objective metrics, including mutual information (MI) [34], structural similarity (QY) [35], and the edge information-based metric Q(AB/F) [36] were used to evaluate the fusion performance of different multi-focus fusion methods.
The mutual information (MI) measures the amount of information transferred from both source images into the resulting fused image. It is calculated by

$$MI = 2\left[\frac{I(X, F)}{H(F) + H(X)} + \frac{I(Y, F)}{H(F) + H(Y)}\right]$$

where $I(X, F)$ is the mutual information of the input image X and fused image F, and $I(Y, F)$ is the mutual information of the input image Y and fused image F. $H(X)$, $H(Y)$, and $H(F)$ denote the entropies of the input images X and Y and the fused image F, respectively.
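A minimal sketch of this metric is given below, assuming the normalized form of MI that divides each mutual information term by the sum of the corresponding entropies, and assuming each image is supplied as a flat list of integer gray levels of equal length. The function names are illustrative, and the entropies are estimated from simple histograms.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), using the joint histogram of pixel pairs."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def normalized_mi(x, y, f):
    """Assumed normalized fusion MI between sources x, y and fused image f.

    Raises ZeroDivisionError if an image pair has zero total entropy
    (for example, two constant images).
    """
    h_f = entropy(f)
    return 2.0 * (mutual_information(x, f) / (h_f + entropy(x))
                  + mutual_information(y, f) / (h_f + entropy(y)))
```

For instance, if the fused image equals the first source and is statistically independent of the second, only the first term contributes, and the score is half of its maximum value of 2.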
Edge information preservation values are formed by

$$Q^{AF}(n, m) = Q_g^{AF}(n, m)\, Q_\alpha^{AF}(n, m)$$

with $0 \le Q^{AF}(n, m) \le 1$, where $Q_g^{AF}$ and $Q_\alpha^{AF}$ denote the edge strength and orientation preservation values. The higher the value of $Q^{AF}(n, m)$, the less information of the fused image is lost. The fusion performance $Q^{AB/F}$ is evaluated as a normalized weighted sum of local information preservation estimates between each of the input images and the fused image. It is defined as

$$Q^{AB/F} = \frac{\sum_{n}\sum_{m}\left[Q^{AF}(n, m)\, w^{A}(n, m) + Q^{BF}(n, m)\, w^{B}(n, m)\right]}{\sum_{n}\sum_{m}\left[w^{A}(n, m) + w^{B}(n, m)\right]}$$

where $Q^{AF}(n, m)$ and $Q^{BF}(n, m)$ are edge information preservation values, weighted by $w^{A}(n, m)$ and $w^{B}(n, m)$, respectively.

Table 1 illustrates the quantitative assessment values of five different multi-focus fusion methods and the proposed method. The larger the value of these metrics, the better the image quality. The values shown in bold represent the highest performance. From Table 1, it can be seen that the proposed method produces the highest quality scores for all three objective metrics except for QY with the "Canola 2" dataset and $Q^{AB/F}$ with "Book" (extra images were also run to test the performance). These high quality scores imply that the proposed method performed well, stably, and reliably. Overall, it can be concluded that the proposed method shows competitive performance when compared with previous multi-focus fusion methods both in visual perception and objective metrics. Table 2 describes the ranking of the proposed method with others based on the quality of fused images. The performance (including quality of the images and the processing time) is scaled from 1 to 6. The results show that the proposed technique outperforms the previously published techniques.
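As a minimal sketch of how the $Q^{AB/F}$ score aggregates local estimates, the function below assumes that the edge preservation maps and the weight maps have already been computed and are passed as nested Python lists of equal size. The function name and the convention of returning zero when all weights vanish are assumptions made for illustration.

```python
def q_abf(q_af, q_bf, w_a, w_b):
    """Weighted, normalized sum of edge preservation values over all pixels."""
    numerator = 0.0
    denominator = 0.0
    for row_qa, row_qb, row_wa, row_wb in zip(q_af, q_bf, w_a, w_b):
        for qa, qb, wa, wb in zip(row_qa, row_qb, row_wa, row_wb):
            numerator += qa * wa + qb * wb
            denominator += wa + wb
    # Assumption: report 0.0 when no pixel carries any edge weight.
    return numerator / denominator if denominator > 0 else 0.0
```

Because every preservation value lies between 0 and 1 and the weights are non-negative, the resulting score also lies between 0 and 1.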

Summary and Conclusions
To improve the definition and quality of images, especially images acquired from the digital camera or the Pi camera for canola phenotyping, an image fusion method is necessary. A new multi-focus image fusion method was proposed that combines the VS maps and gradient domain fast guided filters. In the proposed algorithm, the VS maps were first deployed to obtain visual saliency, gradient magnitude similarity saliency, and chrominance saliency (or color distortions); the initial weight map was then constructed from a mix of these three metrics. Next, the final decision weight maps were obtained by optimizing the initial weight map with a gradient domain fast guided filter at two components. Finally, the fused results were retrieved by combining the two-component weight maps and the two-component source images that present large-scale and small-scale variations in intensity. The proposed method was compared with five representative fusion methods in both subjective and objective evaluations. Based on the experimental results, the proposed fusion method, which is based on the VS maps measure and the gradient domain fast guided filter, is competitive with or outperforms some state-of-the-art methods. The proposed method can use digital images captured by either a high-end or low-end camera, especially the low-cost Pi camera. This fusion method can be used to improve the images for trait identification in phenotyping of canola or other species.
On the other hand, some limitations of the proposed multi-focus image fusion, such as small blurred regions at the boundaries between the focused and defocused regions and the computational cost, are worth investigating. Morphological techniques and optimization of the multi-focus fusion algorithm are also recommended for further study.
Furthermore, 3D modeling from enhanced depth images and image fusion techniques should be investigated. The proposed fusion technique can be implemented in a phenotyping system that has multiple sensors, such as thermal, LiDAR, or high-resolution sensors, to acquire multi-dimensional images and improve the quality or resolution of the 2D and 3D images. The proposed system and fusion techniques can be applied in plant phenotyping, remote sensing, robotics, surveillance, and medical applications.