1 Introduction

Marr [1] suggested that color could play an important role as it ‘carries information that often has important biological significance’. This information could help distinguish ‘whether a fruit is ripe, whether a leaf is green and supple, whether an insect is likely to be poisonous, and many other things’. Similarly, McAndrew [2] argued that ‘for human beings, colour provides one of the most important descriptors of the world around us’. However, despite the undoubted importance of color, grayscale images have traditionally been the more widely used ones in image processing (IP). This is particularly significant for the edge detection problem, where dealing with color images introduces some complications.

RGB and HSV images are among the most important color image types, and are built in the RGB and HSV color spaces, respectively. The RGB color space is based on human perception: the human eye has three different kinds of cone cells, one that captures red luminosity, another that captures green, and a last one that captures blue. Rods, the second kind of cell, process intensity but not color [3]. As a result, only ‘three numerical components are necessary and sufficient to describe a color’, as Bogumil has indicated [3]. In this sense, the RGB color space should be ideal for capturing all the color information, as it uses three components to do so: red, green and blue. A cube is the geometric figure commonly used to represent RGB images.

On the other hand, HSV was created by graphic designers to mimic the artist’s process of creating colors [4]. HSV is made up of three channels, like RGB images, but the color information is instead captured in a single channel, the hue ‘H’, while the saturation ‘S’ is placed in a different channel: the higher the saturation, the purer the color. A third channel contains the brightness information, the value ‘V’: the higher the value, the brighter (less black) the color. The HSV color space has recently found new applications such as skin detection and face detection [5,6,7]. The geometric figure that usually represents this color space is a cone (or a hexcone).

Edge detection (ED) is considered one of the main techniques within the Image Processing (IP) field [8,9,10]. Its importance lies in the fact that most higher-level techniques and algorithms make use of it. In the literature there is no single accepted definition of ED; the most common one is that it is a technique that pursues finding the most important luminosity changes in a digital image [8, 9]. It can be stated that ED mimics the natural process of extracting visual information accomplished by human vision, a process that has been named the primal sketch [1]. ED has found applications in a wide range of tasks and fields, such as pathological diagnostics in medicine [11], with a special focus on tumor detection, and remote sensing [12], which is useful for agriculture and biology and, more recently, for research related to climate change. Other relevant fields for ED are the military industry, surveillance [13], and others [14,15,16,17,18,19].

Two well-known ED algorithms developed in the literature for working with gray images are the Sobel operator [20] and Canny’s algorithm [21]. ED algorithms differ from each other mainly in the filters they use. These filters or masks are mathematical operations performed over the matrix of pixels, i.e., the image, to find significant differences between pixel values. Other differences between them lie in how they carry out each of the edge detection phases [8, 9, 22], which are commonly divided into conditioning (optional), feature extraction, blending or aggregation of features, and scaling.

Color edge detection is more complex to deal with than grayscale edge detection [23], and partially due to this, gray ED is commonly overused, even when the original images are color images that would deserve a more careful treatment to preserve their valuable color information. As Koschan and Abidi argue: ‘the color edges describe an object geometry in the scene better than the intensity edges, although over 90% of the edges are identical’. Most edge detection algorithms deal only with grayscale images, whereas a high number of segmentation algorithms deal with color images [10, 24, 25]. One important reason for this is that the distance/dissimilarity between pixel luminosities in one dimension is easier to compute than in the multidimensional case, where the approach for computing the distance/dissimilarity between colors remains an open problem. Therefore, it could be said that the main problem arising in color ED is how to measure the distance between colors.

In the literature, two main methods for dealing with color inside the ED problem have been proposed: individual-channel [26] and vector-based approaches [27,28,29]. The first approach seems quite ‘natural’: it consists of extracting the edges of each channel separately. It brings, however, another problem, namely the necessity of choosing an appropriate aggregation method for blending the information of the different channels. This issue is especially intricate in the case of HSV images due to the different nature of the channels. Another motivation for exploring the possibilities of ED with HSV images is that they have been less common in the literature than RGB ones.

This paper shows that applying color edge detection algorithms over HSV images, making use of aggregation operators inspired by Yager’s [30], outperforms RGB-based approaches. These algorithms are based on four different methods for aggregating the RGB and HSV channels.

The rest of this paper is organized as follows: Sect. 2 presents some preliminaries, including basic IP concepts, color edge detection, the generation of HSV images from RGB images, relevant aggregation operator concepts, and the evaluation of ED algorithms’ performance using human references. Then, the different approaches for aggregating the RGB and HSV channels are presented in Sect. 3. Finally, Sects. 4 and 5 present the comparison, its results, and the conclusions of this research.

2 Preliminaries

This section introduces some important concepts related to IP and aggregation operators.

2.1 Digital Image Processing

Let us denote a digital image by I, and its pixel coordinates in the spatial domain by (i, j). For clarity’s sake, the coordinates are integers, where each point (i, j) represents a pixel with \(i=1, \ldots , n\) and \(j=1, \ldots , m\). Therefore, the size of an image, \(n \ \times \ m\), is the number of its horizontal pixels multiplied by the number of its vertical pixels. As color images are being dealt with, an index \(k=1, \ldots ,\tilde{k}\) is needed to express the number of channels in the image. Let us denote by \(I^k_{i,j}\) the spectral information associated with each pixel (i, j) at channel k. The value range of this information depends on the digital image type being considered:

  • binary map (\(I_{\text {bin}}\)), where \(I_{i,j}\in \lbrace 0,255 \rbrace \) (it is also commonly expressed as \(\lbrace 0,1 \rbrace \)).

  • grayscale image (\(I_{\text {gray}}\)), where \(I_{i,j}\in \lbrace 0,1,\ldots ,255\rbrace \).

  • soft image (\(I_{\text {soft}}\)), where \(I_{i,j}\in \left[ 0,1\right] \). It is also referred to as a normalized grayscale image.

  • RGB image (\(I_{\text {RGB}}\)), where \(I_{i,j}\in \lbrace 0,1,\ldots ,255\rbrace ^3\) (R = Red, G = Green and B = Blue). In this paper, the channels are referred to either as \(I_{\text {RGB}}^{\text {R}},I_{\text {RGB}}^{\text {G}},I_{\text {RGB}}^{\text {B}}\) or in the simplified form \(I_{\text {R}},I_{\text {G}},I_{\text {B}}\).

  • HSV image (\(I_{\text {HSV}}\)), composed of three channels \((I_{\text {HSV}}^{\text {H}},I_{\text {HSV}}^{\text {S}},I_{\text {HSV}}^{\text {V}})\) (H = Hue, S = Saturation and V = Value), with \(I_{i,j}^{\text {S}},I_{i,j}^{\text {V}} \in [0,1]\). For \(I_{i,j}^{\text {H}}\) there are two possible definitions, \(I_{i,j}^{\text {H}} \in [0^{\circ},360^{\circ}]\) or \(I_{i,j}^{\text {H}} \in [0,1]\) (the one used in this paper); the first can easily be obtained from the second by multiplying it by 360 and changing the scale to degrees. For ease of reference, in this paper the HSV channels are also referred to as \(I_{\text {H}},I_{\text {S}},I_{\text {V}}\). Moreover, for use inside formulas by means of a numeric index, a third kind of expression is employed: \(I_{\text {HSV}}^1,I_{\text {HSV}}^2,I_{\text {HSV}}^3\), where 1 stands for Hue, 2 for Saturation and 3 for Value. See Sect. 2.2 for more information about HSV images.

2.2 HSV and Other Color Models

HSV was created to mimic the process of a painter who starts by choosing a hue and then adds some white to lighten it, or some black to darken it. A hue is any pure color, which can be represented as a point placed on a disk (or on a hexagon); this angular placement is the reason why hue is expressed in degrees. According to its saturation, a color ranges from the purest color, i.e., the maximum saturation, placed on the rim of the disk or hexagon, to white or gray, i.e., the minimum saturation, situated at the center of the circle or hexagon. ‘Value’ is the third dimension and represents the degree to which the color is non-black (along the gray axis, 0 = ‘black’ and 1 = ‘white’): the less black, the higher the value. Thus, a cone or a similar figure is usually employed to represent this color model (see Fig. 1).

The literature has proposed color spaces other than RGB and HSV to study the dissimilarities between colors, such as YUV and its variant YCoCg [31], CIELab [32], and CMYK, among others.

Another color model similar to HSV is HSL. There is a degree of confusion between these two models, and it is important to distinguish them [4]. In HSL, ‘L’ stands for ‘lightness’, which is equivalent to whiteness, while the value ‘V’ in HSV refers to the purity or ‘non-blackness’ of a color. A practical distinction is that in HSV all the pure tones or hues share the same value and are placed in a single plane, while in HSL every hue corresponds to a color with a different lightness.

The algorithm known as the ‘hexcone model’ [4] allows transforming an RGB image into an HSV image, and its steps are specified in what follows.

  1. A normalized RGB image \(I_{\text {RGB}} \in [0,1]^3\) is given;

  2. \(I_{\text {V}} = {\text {max}}(I_{\text {R}},I_{\text {G}},I_{\text {B}})\);

  3. Let \(X = {\text {min}}(I_{\text {R}},I_{\text {G}},I_{\text {B}})\);

  4. \(I_{\text {S}} = \frac{I_{\text {V}}-X}{I_{\text {V}}}\) if \(I_{\text {V}} \ne 0\); otherwise \(I_{\text {S}} = 0\) (the color is pure black);

  5. Let \(r = \frac{I_{\text {V}}-I_{\text {R}}}{I_{\text {V}}-X}, g = \frac{I_{\text {V}}-I_{\text {G}}}{I_{\text {V}}-X}\) and \(b = \frac{I_{\text {V}}-I_{\text {B}}}{I_{\text {V}}-X}\);

  6. (a) If \(I_{\text {R}} = I_{\text {V}}\) then \(I_{\text {H}} = 5+b\) if \(I_{\text {G}} = X\), and \(I_{\text {H}} = 1-g\) otherwise;

     (b) else if \(I_{\text {G}} = I_{\text {V}}\) then \(I_{\text {H}} = 1+r\) if \(I_{\text {B}} = X\), and \(I_{\text {H}} = 3-b\) otherwise;

     (c) else \(I_{\text {H}} = 3+g\) if \(I_{\text {R}} = X\), and \(I_{\text {H}} = 5-r\) otherwise;

  7. \(I_{\text {H}} = I_{\text {H}}/6\).

After this algorithm is applied, an HSV image is obtained, with \(I_{\text {H}}\in [0,1]\), \(I_{\text {S}}\in [0,1]\) and \(I_{\text {V}}\in [0,1]\).
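The following sketch is an illustrative, single-pixel transcription of the hexcone algorithm above (it is not code from the paper); it takes a normalized pixel \((r, g, b) \in [0,1]^3\) and returns \((h, s, v)\) in \([0,1]^3\). The achromatic case, where hue is undefined, is handled here by returning a hue of 0, and a resulting hue of 1 is equivalent to 0 since hue wraps around. Vectorized library implementations, such as Matlab’s rgb2hsv used later in this paper, follow the same logic.

```python
def rgb_to_hsv(r, g, b):
    """Hexcone conversion of one normalized RGB pixel to HSV, all values in [0, 1]."""
    v = max(r, g, b)                       # step 2: value
    x = min(r, g, b)                       # step 3
    s = 0.0 if v == 0 else (v - x) / v     # step 4: saturation (pure black -> s = 0)
    if v == x:                             # achromatic pixel: hue is undefined, return 0
        return 0.0, s, v
    rr = (v - r) / (v - x)                 # step 5: normalized distances from the maximum
    gg = (v - g) / (v - x)
    bb = (v - b) / (v - x)
    if r == v:                             # step 6: sector selection
        h = 5 + bb if g == x else 1 - gg
    elif g == v:
        h = 1 + rr if b == x else 3 - bb
    else:
        h = 3 + gg if r == x else 5 - rr
    return h / 6.0, s, v                   # step 7: scale hue to [0, 1]


# Example: pure green maps to h = 1/3 (i.e., 120 degrees), s = 1, v = 1.
print(rgb_to_hsv(0.0, 1.0, 0.0))
```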

This paper is concerned with two different color spaces, RGB and HSV, whose differences can be appreciated in Fig. 1. Previous works have employed a different one, Super8 [26, 29], and in the previous, shorter version of this research [23] only RGB was employed. In this paper, RGB and HSV are handled by means of the individual-channel approach (a multi-channel approach was employed in [29]).

2.3 Aggregation Operators

Aggregation operators (AO) constitute one of the most important topics in information sciences, since they are a fundamental part of knowledge acquisition. Aggregating information is a key tool for most knowledge-based systems.

Definition 1

A function \(A:[0,1]^n\rightarrow [0,1]\) is said to be an n-ary aggregation function if the following conditions hold:

  (A1) A is increasing in each argument: for each \(i\in \{1,\ldots ,n\}\), if \(x_i\le y\), then \(A(x_1,\ldots , x_i, \ldots , x_n)\le A(x_1,\ldots , x_{i-1},y,x_{i+1},\ldots ,x_n)\);

  (A2) A satisfies the boundary conditions: \(A(0,\ldots , 0)=0\) and \(A(1,\ldots , 1)=1\).
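As a small illustration, not part of the paper, the following sketch checks the two conditions of Definition 1 empirically (monotonicity is only sampled, not proved) for the arithmetic mean and for the maximum:

```python
import random


def mean(*xs):
    return sum(xs) / len(xs)


def looks_like_aggregation_function(A, n=3, samples=1000, seed=0):
    """Check (A2) exactly and (A1) on randomly sampled points."""
    rng = random.Random(seed)
    if A(*([0.0] * n)) != 0.0 or A(*([1.0] * n)) != 1.0:   # (A2) boundary conditions
        return False
    for _ in range(samples):                               # (A1) increasing in each argument
        x = [rng.random() for _ in range(n)]
        i = rng.randrange(n)
        y = x.copy()
        y[i] = rng.uniform(x[i], 1.0)                      # raise the i-th argument
        if A(*y) < A(*x):
            return False
    return True


print(looks_like_aggregation_function(mean))                          # True
print(looks_like_aggregation_function(lambda a, b, c: max(a, b, c)))  # True
```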

AO have been employed in different disciplines due to their large number of applications, IP being one of them [33,34,35].

As this paper deals with information provided by the different channels of RGB or HSV images, the use of AO is justified by the fact that the value of each channel is related to the likeliness of a given pixel being an edge, what the literature has called edginess. There is thus a natural connection between the boundary conditions of AO and the potential edginess of a pixel: the complete lack of edginess can be associated with the minimal boundary, while the pixel is an edge when the supreme boundary is reached. Another desired property is monotonicity, as for a given pixel an increment in the value of any channel means a higher likeliness of the pixel being an edge.

The classical definition of AO can be naturally extended by means of replacing the unit interval [0, 1] with a more general lattice, which in the fuzzy area is traditionally assumed to be a complete lattice [36].

Some of these operators allow prioritizing some types of data over others. This can be done, for example, when dealing with prioritized information, as happens with the prioritized aggregation operators proposed by Yager in [30], which were generalized by Rojas et al. [37]. In the latter, the generalization consisted in the use of general weights acting over the hierarchy and of internal aggregation operators that can differ from the minimum, which was the one employed by Yager [30].

Definition 2

The Yager-inspired hierarchical prioritized aggregation operator is defined as:

$$\begin{aligned} V(y_1, \ldots , y_n) = \sum _{i=1}^{n} w_{i} \prod _{k=1}^{i} y_{n-k+1}, \end{aligned}$$
(1)

where ‘V’ stands for vertical, as every hierarchized set of data is placed in a different box inside a vertical structure that shows the hierarchy between the clusters ‘\(y_i\)’, and ‘n’ is the number of different clusters (for further information see [37]).

The prioritization of clusters (or ‘categories’ in Yager’s words) enables not only the assignment of a different importance to each one (which could also be done employing weighted operators) but also the use of ‘a kind of importance weight in which the importance of a lower priority criteria will be based on its satisfaction to the higher priority criteria’ [38].

For this paper, and within the HSV context, the ‘value’ channel occupies the top box, followed by the ‘saturation’ channel in the middle, and the least important, the ‘hue’ channel, at the bottom. The assumption is made that the ‘value’ channel carries more information for edge extraction purposes, i.e., that it is more important than the other two. In fact, using only this channel already yields a working edge extraction (in practice, this happens because this channel equals the maximum color intensity of the RGB version of the image, as can be seen in the hexcone model algorithm). Another reason for this prioritization is that lightness (the ‘value’ channel) comes first, followed by the saturation of the color (its purity), and finally the hue, which in theory would be the least important of the three channels for ED purposes. Finally, to satisfy the conditions of Definition 1, \(\sum _{i=1}^{n} w_i = 1\) and \(w_i \in [0,1]\).
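The following sketch, not code from the paper, is a direct transcription of Eq. (1); the clusters are listed so that the last element has the highest priority, which for HSV edge detection means ordering a pixel as [hue, saturation, value]. The aggregation then expands to \(w_1 I_{\text {V}} + w_2 I_{\text {V}} I_{\text {S}} + w_3 I_{\text {V}} I_{\text {S}} I_{\text {H}}\), the expression used in Sect. 3.

```python
def yager_aggregation(y, w):
    """Eq. (1): y lists the cluster values with the highest-priority cluster last;
    w lists the weights, with sum(w) = 1 and each w_i in [0, 1]."""
    n = len(y)
    total = 0.0
    for i in range(1, n + 1):
        prod = 1.0
        for k in range(1, i + 1):
            prod *= y[n - k]        # y_{n-k+1} in the 1-based notation of Eq. (1)
        total += w[i - 1] * prod
    return total


# For an HSV pixel (value has the highest priority), with illustrative weights:
h, s, v = 0.3, 0.5, 0.8
w1, w2, w3 = 0.8, 0.1, 0.1
assert abs(yager_aggregation([h, s, v], [w1, w2, w3])
           - (w1 * v + w2 * v * s + w3 * v * s * h)) < 1e-12
```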

Fig. 1 A visual representation of RGB and HSV (hexcone model) color spaces. The white color is outlined in black to contrast it better against the background

2.4 Color Edge Detection

From a mathematical point of view, an edge detection algorithm is a function that converts a digital image into a binary image. The literature on the topic offers two main approaches to color edge detection:

  1. Individual channel: The edges are extracted for each channel separately. This is the approach this paper employs.

  2. Vector-based approach: An aggregation function is applied: for example, a median filter [27], a range operator [28], or other statistical aggregation methods [39]. This approach was employed in [29].

Other alternative approaches to color edge detection have also been proposed: for example, in [40, 41] the problem is solved by working with gradients that result from combining the different colors.

2.5 Performance in Edge Detection Problems

Evaluating an edge detection algorithm is not a trivial task, and many different approaches exist for it. This paper follows the boundary-based evaluation methodology developed in [42, 43]: the methodology for benchmarking boundary detection algorithms of [43] is used on the Berkeley Segmentation Dataset (BSDS500). Although this dataset was not created specifically for edge detection, it has been widely used for edge detection comparisons [18, 22]. It consists of 500 natural images divided into a training set of 200 images, a test set of 200 images and a validation set of 100 images. Each image of the BSDS is accompanied by a set of four to seven human-made reference boundary maps (an example of this human ground-truth is shown in Fig. 4) that serve as ground-truth for evaluating the automatic boundary maps produced by an edge detection technique [42].

Given an image I, to compare an edge detection solution \(I_{\text {bin}}\) (a binary image) for this image with the result given by a human ground-truth, a matching algorithm is used to find the true positive values needed to build the confusion matrix. In this matching algorithm, a distance threshold \(\delta \) specifies the tolerance to small boundary localization errors. An automatic boundary pixel that lies closer than a distance \(\delta \) to a human boundary pixel is counted as a true positive (TP); the remaining unmatched automatic boundary pixels are counted as false positives (FP), and the unmatched human boundary pixels as false negatives (FN). Once these values are obtained, the confusion matrix can be built, as well as other accuracy measures such as precision (Prec), recall (Rec) and the \(F_{\beta }\)-measure. These have constituted the most accepted alternative in recent years [18, 42, 44] to evaluate the performance of each one-to-one comparison.

Formally, given a candidate automatic boundary map \(I_{\text {bin}}\) and a ground-truth human boundary map \(I_{\text {gt}}\), the \(F_{\beta }\) of their comparison is computed as follows:

$$\begin{aligned} F_{\beta }(I_{\text {bin}},I_{\text {gt}})=(1+\beta ^2)\frac{{\text {Prec}}(I_{\text {bin}},I_{\text {gt}})\cdot {\text {Rec}}(I_{\text {bin}},I_{\text {gt}})}{\beta ^2 {\text {Prec}}(I_{\text {bin}},I_{\text {gt}})+{\text {Rec}}(I_{\text {bin}},I_{\text {gt}})} , \end{aligned}$$
(2)

where a harmonic mean is obtained for \(\beta =1\) and

$$\begin{aligned} {\text {Prec}}(I_{\text {bin}},I_{\text {gt}})&=\frac{{\text {TP}}}{{\text {TP}}+{\text {FP}}} , \end{aligned}$$
(3)
$$\begin{aligned} {\text {Rec}}(I_{\text {bin}},I_{\text {gt}})&= \frac{{\text {TP}}}{{\text {TP}}+{\text {FN}}}. \end{aligned}$$
(4)
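As a minimal illustration of Eqs. (2)–(4), and assuming that the TP, FP and FN counts have already been obtained from the \(\delta\)-tolerant matching described above, the measures can be computed as follows:

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp > 0 else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn > 0 else 0.0


def f_beta(tp, fp, fn, beta=1.0):
    prec, rec = precision(tp, fp), recall(tp, fn)
    denom = beta ** 2 * prec + rec
    return 0.0 if denom == 0 else (1 + beta ** 2) * prec * rec / denom


# Example: 80 matched pixels, 20 spurious ones, 40 missed ones.
print(f_beta(tp=80, fp=20, fn=40))  # Prec = 0.8, Rec = 2/3, F_1 ~ 0.727
```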

3 Aggregating Channels in Color Edge Detection

This paper proposes four different methods for aggregating the RGB and HSV channels. The Sobel operator [20, 45] and Canny’s algorithm [21] are both used within the methods specified below. Moreover, all these methods make use of the individual-channel approach. A minimal sketch of methods A, B and C is given after the algorithm listings below.

  1. Crisp pre-aggregation (method A): This method aggregates the different components/channels into one single channel. The exact procedure depends on the color space being used. In the RGB case, the three channels/colors are aggregated into one single gray channel by taking the mean. Then, an edge detection algorithm is applied over the resulting grayscale image; Canny’s and Sobel’s are the ones applied in this paper. This can be regarded as the classic method, as it is the most common procedure employed in the literature.

    $$\begin{aligned} I_{\text {gray}}=\frac{I_{\text {R}} + I_{\text {G}} + I_{\text {B}}}{3}. \end{aligned}$$
    (5)

    Equation 5 can be easily extended to more channels by considering the mean or a weighted mean, which has already been applied, for example, in [26, 39]. This approach allows assigning a variable importance to each color/channel.

    When aggregating the three channels of an HSV image, the specific nature of each channel has to be taken into account. To deal with the information provided by \(I^1_{\text {HSV}}\), \(I^2_{\text {HSV}}\) and \(I^3_{\text {HSV}}\), corresponding to hue, saturation and value respectively, this paper proposes what has been termed the Yager aggregation of channels, which is inspired by [30], employed in [37] and explained in Sect. 2.3. Three different weights, \(w_1,w_2,w_3\), are applied hierarchically as shown below:

    $$\begin{aligned} I_{\text {gray}} = \sum _{i=1}^3 w_{i} \prod _{k=1}^i I_{\text {HSV}}^{4-k} \end{aligned}$$
    (6)

    which can also be written as:

    $$\begin{aligned} I_{\text {gray}} = w_1 \cdot I_{\text {V}} + w_2 \cdot I_{\text {V}} \cdot I_{\text {S}} + w_3 \cdot I_{\text {V}} \cdot I_{\text {S}} \cdot I_{\text {H}}. \end{aligned}$$
    (7)

    Once the HSV image has been aggregated into a single grayscale channel, which can be expressed as \({\text {Yager}}(I_{\text {HSV}},w_1,w_2,w_3)\), an edge detection operator is applied following standard procedures: \(I_{\text {soft}}={\text {edge}}(I_{\text {gray}})\) in general, and in this paper \(I_{\text {soft}}={\text {Sobel}}(I_{\text {gray}})\) or \(I_{\text {soft}}={\text {Canny}}(I_{\text {gray}})\), as these two edge detection algorithms (an operator in the case of Sobel) are the ones employed.

    As the final ED step, the binarized image \(I_{\text {bin}}\) is produced. This image results from applying a threshold to the soft image (for a detailed explanation of the different phases of the edge detection task see [22, 46]): \(I_{\text {bin}}={\text {threshold}}(I_{\text {soft}},\alpha )\), meaning that a threshold \(\alpha \) is applied over the soft image. Normally, this \(\alpha \) value ranges from 0 to \(100\%\), i.e., over [0, 1]. The \(w_i\) weights are chosen to satisfy the conditions of Definition 1.

    Two schemes for this method, one for RGB and the other for HSV, can be seen at the top of Figs. 2 and 3.

  2. Crisp post-aggregation (method B): An edge detection operator is applied over each channel separately, which produces \(\tilde{k}\) different edge maps (\(\tilde{k}=3\) in the case of RGB and HSV). Then, the \(\tilde{k}\) resulting binarized images are aggregated into a single one. This methodology is shown in Algorithm 1:

    In the case of HSV images, the three channels differ in their nature, as pointed out in Sect. 2. Therefore, the three \(I^k_{\text {soft}}\) images are adapted as follows:

    (a) \(I^{\text {YagerV}}_{\text {soft}}={\text {Sobel}}(w_1 \cdot I_{\text {V}})\);

    (b) \(I^{\text {YagerS}}_{\text {soft}}={\text {Sobel}}(w_2 \cdot I_{\text {V}} \cdot I_{\text {S}})\);

    (c) \(I^{\text {YagerH}}_{\text {soft}}={\text {Sobel}}(w_3 \cdot I_{\text {V}} \cdot I_{\text {H}} \cdot I_{\text {S}})\).

    After this, the binarized image of each channel is created by applying a threshold (see the 4th line of Algorithm 1). Then, they are aggregated (5th line of Algorithm 1). The maximum aggregation function \(\Theta () = {\text {max}}()\) has been employed:

    \(I_{\text {bin}}= \max (I^1_{\text {bin}}, \ldots ,I^{\tilde{k}}_{\text {bin}})\)

    In the case of RGB images, this method B with the max() aggregation function has been employed by [2].

  3. Fuzzy post-aggregation (method C): In this case the aggregation function is not applied over the already binarized images; it is instead applied in the previous step, over the soft image corresponding to each color channel. This soft image is made up of what are termed candidate edge pixels (see [22]). At the last step of Algorithm 2, the binarized image is produced, following a soft approach that we have called the ‘fuzzy’ approach.

    One more algorithm, Algorithm 3, has been developed, going beyond what was done in [23]. In this new approach, named method C2, the aggregation function is applied before blending the edge information, also known as edginess, of the different directions/features (two directions, horizontal and vertical, in the case of Sobel). When the aggregation of channels is instead performed after the blending of directional information, the approach is named method C1.

Algorithm 1
Algorithm 2
Algorithm 3
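To make the three methods concrete, the following sketch puts them together for an HSV image. It is a simplification under assumed choices rather than the paper’s exact implementation: a normalized Sobel gradient magnitude is used as the soft edge operator, the maximum is used as the aggregation function, the weights (0.8, 0.1, 0.1) and the threshold \(\alpha = 0.25\) are merely illustrative, and the common rescaling of the per-channel soft maps is an assumed simplification.

```python
import numpy as np
from scipy import ndimage


def sobel_mag(channel):
    """Sobel gradient magnitude of a single 2-D channel."""
    gx = ndimage.sobel(channel, axis=0)
    gy = ndimage.sobel(channel, axis=1)
    return np.hypot(gx, gy)


def to_soft(mag):
    """Rescale a gradient magnitude map to a soft edge map in [0, 1]."""
    return mag / mag.max() if mag.max() > 0 else mag


def method_a(hsv, w=(0.8, 0.1, 0.1), alpha=0.25):
    """Crisp pre-aggregation: Yager-aggregate the channels, then detect and binarize."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    gray = w[0] * v + w[1] * v * s + w[2] * v * s * h     # Yager aggregation of channels
    return to_soft(sobel_mag(gray)) > alpha


def _yager_soft_maps(hsv, w):
    """Per-channel soft edge maps of the hierarchical products, on a common scale."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    mags = [sobel_mag(w[0] * v),
            sobel_mag(w[1] * v * s),
            sobel_mag(w[2] * v * s * h)]
    top = max(m.max() for m in mags) or 1.0               # common scale keeps the weights meaningful
    return [m / top for m in mags]


def method_b(hsv, w=(0.8, 0.1, 0.1), alpha=0.25):
    """Crisp post-aggregation: binarize each channel's edge map, then take the maximum."""
    return np.maximum.reduce([m > alpha for m in _yager_soft_maps(hsv, w)])


def method_c1(hsv, w=(0.8, 0.1, 0.1), alpha=0.25):
    """Fuzzy post-aggregation (C1): aggregate the soft maps first, then binarize."""
    return np.maximum.reduce(_yager_soft_maps(hsv, w)) > alpha
```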
Fig. 2 Scheme of methods A, B and C with RGB images

Fig. 3 Scheme of methods A, B and C with HSV images

4 Comparisons

To test the edge extraction quality of the different aggregation methods specified in the previous section, the Berkeley Segmentation Dataset (BSDS500) [43] was used. More specifically, the first 50 images sorted by number were employed (from ‘100075.jpg’ to ‘16052.jpg’). This sample was divided into training and test sets, which allowed learning the best parameters. Following a tenfold cross-validation method, the 50 images were divided into 5 blocks of 10 images each, which resulted in 10 different combinations of training sets of 30 images and validation sets of 20 images. This learning approach constituted another improvement over the previous research [23].

The comparisons were conducted for both versions of each image, RGB and HSV. The RGB version was the original included in Berkeley’s dataset, while the HSV version was computed by means of the \({\text {rgb2hsv}}(I_{\text {RGB}})\) function of Matlab 2020b. This function transforms an RGB image into its HSV version by applying the algorithm known as the hexcone model [4].

Moreover, the \(w_i\) weights of the HSV channels were varied in steps of 0.1, always respecting the boundary condition of Definition 1. In summary, this paper compared 2 different color spaces (RGB and HSV), 4 methods, 3 smoothing values (\(\sigma _{\text {smooth}}=\lbrace 0,0.6,1.0\rbrace \)), 66 different weight combinations (the number of \(w_i\) combinations that result when steps of 0.1 are considered), and 2 different edge detection algorithms (with 3 different Gaussian values in the case of Canny’s: \(\sigma _{\text {Canny}} = \lbrace 1, 1.5, 2.0\rbrace \)), all of this employing 19 different upper threshold values for the hysteresis process [21] (ranging from 0.05 to 0.95 in steps of 0.05). The lower threshold was set to \({\text {Thr}}_{\text {low}}=0.4{\text {Thr}}_{\text {sup}}\), a relationship between the two thresholds found in previous research. All these combinations yielded almost three and a half million binarized images to be evaluated. An ACER Intel(R) Core(TM) i7-8550U CPU (1.80–2.00 GHz) with 20 GB of RAM was employed for the computational analysis, occasionally supported by an OMEN HP Intel(R) Core(TM) i7-7700HQ CPU (2.80–2.81 GHz) with 16 GB of RAM.
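As a quick sanity check, not part of the paper’s code, the figure of 66 weight combinations can be reproduced by enumerating all \((w_1, w_2, w_3)\) triples on a 0.1 grid that sum to one (assuming here that weights equal to 0 and 1 are allowed):

```python
# Enumerate all (w1, w2, w3) on a 0.1 grid with w1 + w2 + w3 = 1.
weights = [(w1 / 10, w2 / 10, (10 - w1 - w2) / 10)
           for w1 in range(11)
           for w2 in range(11 - w1)]
print(len(weights))  # 66
```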

After the matching process between the outputs of each algorithm and the human ground-truth was performed, Precision, Recall and two different \(F_1\) values were computed for the 10 cross-validation folds. The first \(F_1\) was computed with respect to the average human, and the second one with respect to the most similar human (this was another improvement compared to what was done in [23], where only the average-human \(F_1\) was computed). Finally, the average Precision and Recall over the 10 cross-validation folds were employed to compute the overall \(F_1\) for each type of image, HSV and RGB, and each method applied: A, B and C.

Focusing first on Sobel’s results, Tables 1 and 2 show the superiority of the HSV aggregations over the RGB ones. Among the HSV aggregations, the Yager-inspired aggregations, whose scheme is shown in Fig. 3, based on method C (the ‘fuzzy’ approach) and method A, reached the highest \(F_1\) value (0.555). Method A proved superior for both the average human and the closest human. Among the RGB aggregations, method B reached the best performance.

The best value of a parameter was taken to be the one that reached the highest F value for a given cross-validation fold. The best smoothing parameter was \(\sigma _{\text {smooth}}=1\) in all the methods except Sobel’s method C with maximum aggregation, where it was 0. In the case of \(\sigma _{\text {Canny}}\), the best value for all the Canny methods was 2. Regarding the \(w_i\) Yager-inspired weights, the values most frequently selected as best across the 10 cross-validation folds are shown in the last column of the tables.

Table 1 Sobel evaluated measures for HSV and RGB algorithms (average of 10 cross-validation sets)—closest human
Table 2 Sobel evaluated measures for HSV and RGB algorithms (average of 10 cross-validation sets)—average human

In the case of the Canny algorithm, Tables 3 and 4 show again (as in Sobel’s case) that the HSV Yager-inspired aggregations outperformed the RGB results.

Table 3 Canny evaluated measures for HSV and RGB algorithms (average of 10 cross-validation sets)—closest human
Table 4 Canny evaluated measures for HSV and RGB algorithms (average of 10 cross-validation sets)—average human

Fig. 4 shows, for an example image, all the outputs, the human references and the original HSV and RGB images.

Fig. 4 Best outputs of all methods for an example image

A battery of Wilcoxon signed rank tests with continuity correction was performed. These tests pursued finding differences between methods in the \(F_1\) median values (average human) over the 10 training-test folds. The results of all these tests can be seen in Tables 5 and 6, and the medians with their standard errors are shown in Fig. 5. The tests showed that the best method for HSV outperformed the best method for RGB, which means that the \(F_1\) median for HSV images was significantly higher than the \(F_1\) median for RGB images. Further Wilcoxon rank tests were performed separately for RGB and HSV to seek differences between methods A, B and C. Method B was the best among the RGB methods, and method HSV Yager A was found to be the best among the HSV ones.
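The sketch below shows how such a paired test can be run; it is an assumed workflow rather than the paper’s actual script, and the fold-wise \(F_1\) values are purely illustrative.

```python
from scipy.stats import wilcoxon

# Hypothetical F_1 values per cross-validation fold for two methods being compared.
f1_hsv_yager_a = [0.56, 0.55, 0.57, 0.54, 0.56, 0.53, 0.58, 0.55, 0.56, 0.54]
f1_rgb_b       = [0.52, 0.50, 0.53, 0.51, 0.52, 0.50, 0.54, 0.51, 0.53, 0.50]

# Paired, two-sided Wilcoxon signed rank test with continuity correction.
stat, p_value = wilcoxon(f1_hsv_yager_a, f1_rgb_b, correction=True)
print(stat, p_value)
```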

Fig. 5 \(F_1\) medians and standard errors for all methods (closest human)

Table 5 Wilcoxon signed rank test with continuity correction between the aggregation methods (Sobel)
Table 6 Wilcoxon signed rank test with continuity correction between the aggregation methods (Canny)

5 Conclusions and Future Research

This paper yields two main findings. The first concerns the potential of Yager-inspired aggregations with HSV channels, which proved to be closer to human vision (they reached the highest \(F_1\) values) than their equivalent RGB aggregations. Moreover, the HSV aggregations presented a lower standard deviation than the RGB ones, which shows that these methods were more robust. This was tested for both Canny’s and Sobel’s algorithms. The superiority of the HSV aggregations is remarkable, and it seems slightly counterintuitive that the same image, containing the same information but differently arranged, could have such a different potential for edge extraction purposes. The second finding is related to what we have called the Yager-inspired aggregation, which proved to be an efficient approach for HSV image analysis. Regarding the \(w_i\) Yager-inspired weights applied over the HSV images, it turned out that \(w_1\) and \(w_2\), which act over the ‘value’ and ‘saturation’ channels, respectively, were more relevant for the edge extraction process than \(w_3\).

Future research can explore Canny aggregations in more depth, since Canny’s algorithm brings additional complexity: its scaling phase is based on two different thresholds in a process known as hysteresis. When the scaling is performed separately for each channel, the segments built by hysteresis tend to be shorter than when hysteresis is performed over the information provided by the three channels combined.

We are also considering the possibility of adapting these aggregation methods to edge detection based on the global evaluation with segments explored in [22]. Moreover, it would be worthwhile to continue exploring these aggregation methods, or equivalent ones, with other color spaces such as HSL (similar to HSV), CMYK, CIELAB, and even Super 8 [29]. Another interesting possibility would be to compare the proposed HSV aggregations with other HSV aggregations proposed in the literature. Finally, the approach based on mixing different channels from different color spaces using deep learning techniques seems like a promising and natural next step.