Article

Infrared Image Superpixel Segmentation Based on Seed Strategy of Contour Encoding

1 Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(2), 602; https://doi.org/10.3390/app12020602
Submission received: 16 December 2021 / Revised: 4 January 2022 / Accepted: 6 January 2022 / Published: 9 January 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Superpixel segmentation has become a crucial pre-processing tool to reduce computation in many computer vision applications. In this paper, a superpixel extraction algorithm based on a seed strategy of contour encoding (SSCE) for infrared images is presented, which can generate superpixels with high boundary adherence and compactness. Specifically, SSCE can solve the problem of superpixels being unable to self-adapt to the image content. First, a contour encoding map is obtained by ray scanning the binary edge map, which ensures that each connected domain belongs to the same homogeneous region. Second, according to the seed sampling strategy, each seed point can be extracted from the contour encoding map. The initial seed set, which is adaptively scattered based on the local structure, is capable of improving the capability of boundary adherence, especially for small regions. Finally, the initial superpixels limited by the image contour are generated by clustering and refined by merging similar adjacent superpixels in the region adjacency graph (RAG) to reduce redundant superpixels. Experimental results on a self-built infrared dataset and the public datasets BSD500 and 3Dircadb demonstrate the generalization ability in grayscale and medical images, and the superiority of the proposed method over several state-of-the-art methods in terms of accuracy and compactness.

1. Introduction

Infrared (IR) systems have a stronger penetrating ability and are less affected by fog and haze than visible-light systems. In recent decades, IR systems have been widely used in the remote sensing and surveillance fields, where detection speed and accuracy are key indicators. Using blocks, instead of individual pixels, as the basic processing units can effectively increase processing speed and serve as a pre-processing step for tasks such as removing halo artifacts in image dehazing [1,2]. Therefore, a reliable block (superpixel) segmentation method is necessary.
Superpixel segmentation is an unsupervised image clustering method, the concept of which was first proposed by Ren et al. [3]. An image usually includes a lot of redundant information that participates in information processing, such as adjacent pixels with similar gray values in a sky region; the handling of such data is unnecessary. Superpixel segmentation divides images into several simple connected regions with similar features, such as brightness, texture, and contrast, in order to replace point-to-point operations. Compared with single pixels, superpixels can effectively represent image features and reduce the number of processing units [4]. Therefore, superpixel segmentation has become a key pre-processing step in many computer vision applications, such as image dehazing [5], stereo 3D reconstruction [6], object recognition [7], remote sensing image segmentation [8,9], and many others. For these applications, a superpixel as a homogeneous region should have the following properties [10,11]:
  • Each superpixel is a simply connected domain, and each pixel in the image is classified into only one superpixel;
  • Superpixels should have regular shapes and adhere well to image boundaries. Furthermore, they should have consistent texture, and internal pixels should have similar properties;
  • The density of superpixels is adaptive to the complexity of the image contents. The number of superpixels in a sparse region (e.g., an area of the sky), should be fewer than that in a dense region;
  • The generation of superpixels should be fast, memory-efficient, and require little manual intervention.
Automatic superpixel segmentation has received widespread attention in the last decade, and can be broadly divided into two processing strategies [11,12]: graph-based algorithms, and gradient-based algorithms. Graph-based algorithms [9,13,14] aim to construct an undirected graph and minimize a defined cost function to generate superpixels. These methods treat pixels as nodes in the graph, and the edge weights between adjacent nodes are proportional to the similarity between the two nodes. Gradient-based algorithms [12,15,16] first obtain coarse segmentation by initial clustering, then iteratively refine the coarse clusters until a convergence condition is satisfied, such as the number of iterations or image quality criteria. This type of method usually starts from an initial seed set and finally assigns a superpixel label to each pixel. While methods in both categories can generate superpixels, they also have their own drawbacks. Graph-based algorithms have better performance in adhering to image boundaries but lack operational efficiency, which limits their use in many real-time applications. In addition, the shapes of superpixels are generally extremely irregular and lack compactness. In contrast, gradient-based algorithms perform well in terms of time complexity, but are inadequate for complex or irregular regions, as the superpixels easily cross the image boundary. Therefore, most tasks involve a trade-off between adherence to boundaries and efficiency. In recent years, increases in video resolution have led to a growing demand for an efficient superpixel generation algorithm with better boundary adherence.
Due to its efficiency advantage, the clustering method has drawn much attention, and many constraints have been proposed to improve the boundary tracing ability [17]. In the process of clustering, the initial seed set and the grouping of pixels are crucial. An image is usually composed of smooth regions and complex content regions, but most existing superpixel algorithms cannot generate superpixels adapted to the image content. Regular grid seed sampling is used by most algorithms to construct the initial seed set, which ignores the image structure and may lead to over-segmentation in smooth regions or poor boundary adherence in complex regions. In addition to color and spatial distance, texture is also used as an additional similarity measure factor to enhance the boundary adherence capability; however, it cannot ensure that the size of superpixels matches the image content [18,19,20]. Texture can reflect the complexity of the image, but not its structure. Contours can better reflect the image structure, and have been utilized by more scholars to strengthen the similarity measure between pixels. Zhang et al. [21] added the contour item as one of the similarity measure factors and determined the reallocation of pixels based on the possibility of their location on object boundaries; however, this method cannot actively increase the number of superpixels in complex regions to segment small regions. Luo et al. [22] directly extracted over-segmented superpixels from the binary edge map, then optimized the initial superpixel image using similarity clustering to generate the final superpixels. Although their method is able to generate superpixels based on image content with better boundary adherence, it is more sensitive to the fracture/closure of the contour and will fail in regions with weak boundaries or color progression. Most algorithms can only increase the number of superpixels across the board to improve segmentation accuracy, rather than in the desired region. 
Therefore, it is challenging to design a superpixel algorithm that is both robust and sensitive to image content.
Inspired by clustering algorithms and the contour constraint, we propose an effective IR image superpixel generation method based on a seed strategy of contour encoding (SSCE). Specifically, we apply the seed sampling strategy and an optimized clustering process to generate superpixels. The former provides the initial seed set for the clustering process by contour encoding, while the latter assigns a unique label to each pixel according to clustering criteria. As mentioned above, contours provide the structural information of the image, and their number represents the complexity of the structure in a local region. In general, larger superpixels are suitable for smooth regions in order to avoid over-segmentation, while more superpixels with smaller sizes are necessary for complex texture regions, thus enhancing the boundary adherence capability [23]. Our adaptive seed sampling strategy takes this feature into account. First, a contour encoding map is obtained by ray scanning, which is capable of constructing closed connected domains based on the image content. According to the sampling strategy, initial seed points are placed in each connected domain. Therefore, the initial seed set can adapt to the local image structure and place more seed points in dense regions. Then, starting from the initial seed set, the clustering algorithm is utilized to obtain the initial superpixels. In the clustering process, we eliminate the local outliers and optimize the similarity evaluation criteria to improve the boundary adherence performance. In order to reduce the leakage of superpixels, a contour constraint is applied simultaneously to the similarity measure function to refine the initial superpixel map. Finally, high-quality superpixels that can reduce over-segmentation and adapt to irregular shapes are generated by merging similar adjacent superpixels.
To demonstrate the effectiveness of our algorithm, comparative experiments were performed on a self-built IR dataset, the real-world image dataset BSD500 [24], and the medical image dataset 3Dircadb [25]. The results indicate that the proposed algorithm is adaptive to the structure of the image and performs better in adhering to boundaries and irregular objects. The main contributions of this paper are as follows:
  • First, an adaptive seed sampling strategy of contour encoding is proposed to solve the problem of the insensitivity of regular grid seed sampling to the image content;
  • Second, a novel balanced weighted similarity measurement enables accurate similarity evaluation between superpixels with different sizes, and a contour penalty term increases the boundary adherence capability and adaptation to irregular objects;
  • Finally, a constrained recombination method is applied to the initial superpixels, which significantly reduces over-segmentation and optimizes the labeling of irregular objects.
The rest of the paper is organized as follows. In Section 2, we briefly review both of the image segmentation categories by discussing some representative methods. Section 3 describes the proposed method in detail. Section 4 presents the experimental results of our algorithm and several existing methods. Finally, a summary is presented in Section 5.

2. Related Work

In order to generate satisfactory superpixels, a large number of algorithms have been proposed in the last few decades, most of which can be roughly categorized into two strategies: graph-based and gradient-based methods. We briefly review each strategy in this section.

2.1. Graph-Based Algorithms

Most graph-based algorithms extract superpixels by optimizing a cost function defined on a graph model. Normalized cut (Ncut), proposed by Shi et al. [26], is a classical image segmentation method. This method treats pixels and the similarity between each pixel pair from the image as nodes and edge weights in the graph, respectively. Then, superpixels are generated by minimizing a normalized sum of total edge weights. Although Ncut can express the global relationship between pixels, the boundary adherence performance is poor, and the computational complexity becomes progressively higher as the size of the image or the number of superpixels increases. To accelerate this process, the Felzenszwalb–Huttenlocher method (FH) [27] was proposed. FH defines a predicate that measures the evidence of boundaries to guide the merging of two nodes on the graph. Compared with Ncut, this method has better efficiency and boundary adherence, but the final superpixels have irregular shapes and sizes.
The aim of Ncut and its optimized version is to achieve global optimization, which leads to high resource consumption. Spectral clustering (SC) [28] maps data into another feature space by utilizing a feature function, in which graph-cut methods are applied to generate the optimal result. However, eigendecomposition is a complex pre-processing method. Chen and Li [29] have proposed linear spectral clustering (LSC) to accelerate superpixel segmentation. LSC replaces eigendecomposition with K-means clustering, in order to optimize the objective function and demonstrate the underlying mathematical equivalence between K-means and NCut. Although LSC reduces the amount of calculation, the similarity measure and restricted search range limit the boundary adherence capability.
The random walk (RW) method [30] labels pixels using the probability of non-labeled pixels reaching different objective regions for the first time. This method only considers the first arrival probability and the local relationship between pixels and their corresponding seeds, which may lead to irregular regions in the final superpixel map. Shen et al. [13] have proposed lazy random walks (LRW) to generate compact superpixels by considering self-loops and the commute time. In addition, LRW utilizes a relocation and splitting mechanism to ensure adherence to weak boundaries. As the entire graph must be traversed in each iteration, the computational cost is very high.

2.2. Gradient-Based Algorithms

Algorithms in this category extract superpixels by either gradient ascent or gradient descent. The early gradient-based method, known as watershed [31], obtains local minima and the similarity between neighboring pixels to segment images; however, it is sensitive to noise, and the shapes of superpixels are very irregular. In the last decade, gradually refining the initial segmentation regions has become the mainstream.
The Turbopixels method, proposed by Levinshtein et al. [32], aims to generate regular superpixels using level set based geometric flow evolution. This method starts with seed points and then iteratively dilates these seeds to generate highly uniform lattice-like superpixels. Although the superpixels are very compact, their adherence to boundaries is low, and this method is also slow, due to stability and efficiency issues of the level set. Wang et al. [33] have proposed a structure-sensitive superpixel technique using the geodesic distance. In this method, the geodesic distance obtained by geometric flows between pixels is adopted to guide clustering, which improves the segmentation accuracy. Although optimizing an energy function consisting of color homogeneity and structure density can adapt the size and number of superpixels to the image content, the computational cost is very high.
Simple linear iterative clustering (SLIC), proposed by Achanta et al. [12], is the most popular gradient-based segmentation algorithm. SLIC restricts the search scope of K-means clustering, which reduces the time complexity of each iteration to linear. Although each iteration is highly efficient, the cluster centers require multiple iterations to adhere to the image boundary. To generate superpixels in only one iteration, the authors then proposed simple non-iterative clustering (SNIC) [34], using a priority queue and a real-time center update strategy. However, the time cost of SNIC is not significantly reduced, especially for high-resolution images, and its boundary adherence capability is insufficient in complex regions. Zhang et al. [18] utilized a multi-feature weighted similarity measure function to increase segmentation accuracy.
The above methods are not sensitive to image content. To produce content-sensitive atomic regions, Liu et al. [10] have proposed manifold SLIC by mapping the input image to a two-dimensional manifold $M \subset \mathbb{R}^5$, which can be used to measure the content density. Thus, the search range of each superpixel center can be adapted to the image content, but additional superpixels cannot be added in dense regions to improve boundary adherence. Di et al. [15] adopted hierarchical multi-level SLIC (Li-SLIC) to adaptively segment texture-sparse and texture-dense regions. Li-SLIC decides whether to perform the segmentation again by evaluating the color standard deviation of each superpixel. This approach is less robust and cannot be used in mixed dense and sparse regions. Zhang et al. [11] have developed a method in which the seed scattering conforms to the image complexity, according to the principle of image entropy equalization. However, their method relies heavily on manually preset values and cannot accurately place seed points within each object. At present, few methods can generate superpixels or an initial seed set that exactly matches the image structure.

3. Methodology

Superpixel segmentation is a process of pixel labeling, where each pixel has a unique label. In dense regions of the image, more superpixels are required to adhere to the object’s boundaries. In contrast, fewer compact superpixels should be distributed in sparse regions. In order to adapt the superpixels to the image content while increasing boundary adherence, we propose an IR image superpixel segmentation method based on a seed strategy of contour encoding. Our approach begins by encoding the contour map of the image, which is introduced in Section 3.1. Based on the coded result, a rough homogeneous map ψ is obtained. Then, we present a seed sampling strategy extracted from ψ in Section 3.2, which initializes our cluster and makes the segmentation insensitive to disconnected contours. In Section 3.3, clustering is performed based on the initial seed set and weighted similarity evaluation. The optimal interval is applied to update the seed position, instead of the entire superpixel region. Finally, in Section 3.4, high-quality superpixels that can adapt to different shapes are obtained by refining the initial superpixel map based on a graph method. The block diagram of the proposed methodology is shown in Figure 1.

3.1. Contour Encoding

Most superpixel segmentation algorithms perform an undifferentiated segmentation operation on each part of the image while ignoring the image structure. In general, pixels around edges are the focus of segmentation. Image contours, which represent the boundaries between different regions, provide an efficient global clue. The homogeneous region map ψ can be initially generated by the contour map, and its guiding principle is that pixels surrounded by the same edge have a high similarity measure and a higher probability of belonging to the same objective region. An example is shown in Figure 2, where $p_i$, $p_j$, and $p_s$ are three pixels. The pixels $p_i$ and $p_j$ are surrounded by the same contour, implying that their path $l_{ij}$ is not crossed by edge $e$, such that they have higher similarity. On the contrary, the pixels $p_j$ and $p_s$ are separated by edge $e$, which means that their path $l_{js}$ is discontinuous, such that low similarity should be assigned. Therefore, $p_i$ and $p_j$ belong to the same superpixel, while $p_s$ belongs to another.
Based on the above analysis, the similarity between pixel pairs can be described by their surrounding relationships with edges; that is, pixels surrounded by the same edge can form an initial homogeneous region. However, executing this idea in practice is difficult, as existing edge extraction algorithms, such as the Canny algorithm [35] with a bilateral filter used in this paper, fail to achieve complete closure (especially for weak boundary regions), which may lead to misjudgment of the relationship between pixels at edge breaks.
To quickly describe the relationship between pixel pairs and edges, the binary edge map $E$ is scanned by rays. A set of parallel rays is drawn across $E$ at an angle $\vartheta_i$ from an image reference axis, and each ray is cut into several segments by the edges. Pixels that are covered by the same ray interval should be grouped into the same region, as these pixels strictly satisfy the condition of not being split by edges. The corresponding intervals of all rays jointly form the initial homogeneous regions.
Based on the above analysis, we propose a contour encoding method to quickly generate initial homogeneous regions. Both vertical and horizontal rays are used to scan the entire image; we illustrate the process in detail for the vertical direction. An example is shown in Figure 3, where black and white pixels belong to the edge and background, respectively. First, the initial value of each scan line, $r(p_0)$, is encoded as 0, and each column of the binary edge map $E$ is scanned from top to bottom. Then, when the scan line reaches an edge pixel, its encoded value $r(p_i)$ is updated as follows:
$$r(p_i) = \begin{cases} r(p_{i-1}) + 1, & \text{if } p_i = 1 \text{ and } p_{i-1} \neq 1 \\ r(p_{i-1}), & \text{otherwise} \end{cases} \quad (1)$$
where $p_i = 1$ and $p_{i-1} \neq 1$ mean that the current pixel $p_i$ is an edge pixel and the previous pixel $p_{i-1}$ is a background pixel, respectively. The encoded value changes only when a scan line meets an edge pixel, and the current encoded value is assigned to each non-edge pixel scanned by the ray. Finally, after scanning the whole image, each connected domain with the same encoded value is considered to belong to the same homogeneous region, as shown in Figure 3b. After contour encoding, each connected domain is guaranteed to be surrounded by the same edge.
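The vertical ray scan of Equation (1) can be sketched in Python as follows; this is a minimal illustration, and the array conventions (binary edge map as a NumPy array, code −1 for edge pixels) are our own:

```python
import numpy as np

def encode_columns(edge_map):
    """Contour-encode a binary edge map by vertical ray scanning (sketch).

    Each column is scanned top to bottom; the running code r increases by
    one each time the ray first enters an edge run (Equation (1)), and the
    current code is assigned to every non-edge pixel the ray passes.
    """
    h, w = edge_map.shape
    codes = np.full((h, w), -1, dtype=int)  # -1 marks edge pixels
    for x in range(w):
        r = 0          # initial value r(p_0) of this scan line
        prev = 0
        for y in range(h):
            cur = edge_map[y, x]
            if cur == 1 and prev != 1:   # ray enters an edge run
                r += 1
            if cur != 1:                 # background pixel receives the code
                codes[y, x] = r
            prev = cur
    return codes
```

Connected domains sharing the same code in this map then serve as the initial homogeneous regions.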

3.2. Seed Sampling Strategy

Gradient-based superpixel generation algorithms usually begin with initial points that gradually form closed regions; for example, the watershed algorithm injects water from local minimum points, while the superpixel growth of Turbopixels, SLIC, and SNIC starts with a set of uniformly distributed seed points sampled from a regular grid. However, regular grid seed sampling depends only on the number of superpixels expected by the user and ignores image structure information. In general, the grid scattering strategy does not work well across different images or different regions of the same image, as it is independent of variations in the image content. Regions containing two small objects are difficult to label separately with superpixels of large expected size, while superpixels of small expected size may lead to severe over-segmentation.
In order to efficiently adapt the seed distribution to the image structure, we designed a novel seed scattering strategy using the initial homogeneous regions formed by contour encoding. With the contour encoding method described in Section 3.1, we can obtain the initial homogeneous regions, scanned from top to bottom, as shown in Figure 4a. A region surrounded by the same contour should be considered a rational independent region, and arranging a seed point inside it is more conducive to generating superpixels with good boundary adherence, especially for small regions. In Figure 4a, each initial homogeneous region satisfies the property that every pixel within it is surrounded by the same contour. Since the encoded value of each ray starts from 0, the region labeled 0 at the beginning is usually large, such as the pink region in Figure 4a. Internal consistency is more difficult to ensure in large regions, as it is difficult to identify edges in weak boundary regions. Therefore, a set of rays in the reverse direction is applied to rescan the image and sub-divide the large region, as shown in Figure 4b. Then, the two scan maps in opposite directions, $\vartheta_i$ and $\pi + \vartheta_i$, are combined to form the joint initial vertical regions, as shown in Figure 4c. In the merging process, the smaller region is dominant where the two directional regions overlap.
However, each column is scanned by only one ray, which may cause an ideal homogeneous region to be over-segmented after merging; for example, the pink regions in Figure 4a,b are over-segmented at the corresponding position in Figure 4c. Therefore, further merging is needed for similar initial regions that are surrounded by the same contour, in order to reduce the number of seed points and, thus, reduce over-segmentation. At the same time, the merging strategy also reduces the impact of edge detection anomalies in regions of gradual grayscale change. Since the generation of the binary edge map depends on the magnitude of the image gradient, contour information cannot be effectively extracted in such gradual regions. In the merging process of the contour encoding map, a similarity metric $M$ is computed for adjacent regions that are not crossed by edges, represented as:
$$M_{v_i v_j} = (\bar{v}_i - \bar{v}_j)^2, \quad (2)$$
where $\bar{v}_i$ is the grayscale average of the initial region $v_i$. When $M_{v_i v_j}$ is less than the threshold $M_T$, $v_j$ is merged into $v_i$, and $\bar{v}_i$ is updated. For a scan map in direction $\vartheta_i$, only merging regions along the orthogonal direction is required. Additionally, when a region faces more than one adjacent region on a side, only the most similar adjacent region is selected for merging.
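A minimal sketch of the merge test based on the similarity metric $M$ above; the area-weighted mean update is our assumption, since the text only states that $\bar{v}_i$ is updated:

```python
def should_merge(mean_i, mean_j, threshold):
    """Similarity metric M: squared difference of the grayscale means
    of two adjacent regions. They merge when M is below threshold M_T."""
    return (mean_i - mean_j) ** 2 < threshold

def merge_region(mean_i, n_i, mean_j, n_j):
    """Update the grayscale mean after v_j merges into v_i.
    Area-weighted averaging is an assumption of this sketch."""
    n = n_i + n_j
    return (mean_i * n_i + mean_j * n_j) / n, n
```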
Strong edges ensure that the surrounded region is closed during edge extraction; however, IR images often contain many weak edges, leading to broken contours. If a ray at the vertical angle $\vartheta_v$ passes through a location where the edge is broken, the two homogeneous regions on either side of the fracture cannot be separated by that ray, as shown in Figure 5a. In order to segment the edge fracture region effectively, a set of orthogonal rays is applied to rescan the image at the horizontal angle $\vartheta_h$, as shown in Figure 5b. Rays in the orthogonal direction segment the image again according to the edge map, separating at the edge break the vertical homogeneous regions that were wrongly connected.
By scanning the IR image in orthogonal directions (i.e., vertical $\vartheta_v$ and horizontal $\vartheta_h$), the joint initial orthogonal homogeneous regions $\phi_v$ (vertical direction) and $\phi_h$ (horizontal direction) are obtained, which are over-segmented, as in Figure 4c. Then, compact initial orthogonal scan maps $\phi_v'$ and $\phi_h'$ are generated by merging adjacent initial regions that are not crossed by edges, based on the similarity metric $M$. Together, $\phi_v'$ and $\phi_h'$ form the final homogeneous map shown in Figure 6c. For a non-edge pixel $p_i$ in the binary edge map, its region labels in $\phi_v'$ and $\phi_h'$ are denoted as $(\phi_{v\_i}, \phi_{h\_i})$, and pixels with the same $(\phi_{v\_i}, \phi_{h\_i})$ are collected to form the final homogeneous map $\psi$, which is the basic element for the seed sampling strategy. Regions in $\psi$ whose area is smaller than a threshold $t$ ($t$ is related to the image resolution; we recommend $t = 1.5 \times 10^{-4} \times N$, where $N$ is the total number of pixels in the image) are merged once into their most similar adjacent region to reduce redundant regions, unless there is no neighboring region.
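The grouping of pixels by their paired orthogonal labels can be sketched as follows; this is a minimal illustration assuming integer label maps, and edge-pixel handling and small-region merging are omitted:

```python
import numpy as np

def joint_homogeneous_map(phi_v, phi_h):
    """Form the final homogeneous map psi (sketch): pixels sharing the
    same label pair (phi_v_i, phi_h_i) fall into the same region."""
    # Encode each (vertical, horizontal) label pair as a single integer.
    pairs = phi_v.astype(np.int64) * (int(phi_h.max()) + 1) + phi_h
    # Relabel the distinct pairs with consecutive region ids.
    _, psi = np.unique(pairs, return_inverse=True)
    return psi.reshape(phi_v.shape)
```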
The clustering approach beginning with seed points performs better for convex regions, due to its compact-shape characteristic, while its capability of adhering to the boundaries of non-convex targets is reduced. In the joint homogeneous map $\psi$, there are many regions with a large length-to-width ratio, such as $l_1$ in Figure 6c, and regions with large areas, such as $l_2$ in Figure 6c. In SLIC [12], the distance term in the similarity metric maintains compactness, such that the final superpixels tend to be quasi-circular. The farther a pixel is from the seed point, the greater the influence of the distance term on the similarity metric, and the more likely a labeling error is to occur. Therefore, for regions with large aspect ratios or large areas, placing only one seed point in the interior cannot effectively describe the entire region. To accurately segment each region, the interval of seed point placement should be restricted. As in SLIC, a constant square search interval $S$ is applied:
$$S = \sqrt{\frac{N}{K}}, \quad (3)$$
where $N$ and $K$ represent the total number of pixels in the image and the expected density of seed points, respectively. For each homogeneous region $\varphi_i \in \psi$, the seed point sampling strategy is defined as follows:
$$\begin{cases} \text{Place a seed point,} & \text{if } \varphi_i \setminus (\varphi_i \cap S) = \varnothing \\ \text{Scan } \varphi_i \text{ by } S, & \text{if } \varphi_i \setminus (\varphi_i \cap S) \neq \varnothing \end{cases} \quad (4)$$
If $\varphi_i$ is completely covered by $S$, a seed point is placed at the centroid of $\varphi_i$; otherwise, $\varphi_i$ is scanned with a sliding window $S$, as shown in Figure 7. If the length or width of the part of $\varphi_i$ covered by $S$, $(\varphi_i \cap S)$, is more than half of the length or width of the search interval, a seed point is placed at the centroid of the covered area. The sliding window $S$ then moves to the next position and the placement condition is checked again, until the entire region has been covered.
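The sliding-window placement described above can be sketched as follows; this is a simplified illustration, and the exact window stride and boundary handling are our assumptions:

```python
import numpy as np

def place_seeds(region_mask, S):
    """Seed sampling strategy (sketch). If the region fits inside one
    S x S window, one seed is placed at its centroid; otherwise an
    S x S window slides over the region and a seed is placed at the
    centroid of each covered part spanning more than half the window
    in height or width."""
    ys, xs = np.nonzero(region_mask)
    seeds = []
    if np.ptp(ys) < S and np.ptp(xs) < S:     # fully covered by one window
        seeds.append((int(ys.mean()), int(xs.mean())))
        return seeds
    for y0 in range(int(ys.min()), int(ys.max()) + 1, S):
        for x0 in range(int(xs.min()), int(xs.max()) + 1, S):
            wy, wx = np.nonzero(region_mask[y0:y0 + S, x0:x0 + S])
            if wy.size and (np.ptp(wy) + 1 > S / 2 or np.ptp(wx) + 1 > S / 2):
                seeds.append((y0 + int(wy.mean()), x0 + int(wx.mean())))
    return seeds
```

For instance, a small blob yields a single centroid seed, while a long thin strip receives one seed per qualifying window position.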
After placing seed points for each region $\varphi_i \in \psi$, an initial set of seed points that adapts to the image structure is obtained. Algorithm 1 summarizes the main steps of our adaptive seed sampling strategy.
Algorithm 1. Adaptive seed sampling strategy.
Input: An image $I$ and its binary edge map $E$; similarity metric threshold $M_T$ and seed search interval $S$.
Output: Initial seed set of the image $I$.
/* Initialization */
Set the initial value of rays in directions $\vartheta_v$, $\pi + \vartheta_v$, $\vartheta_h$, and $\pi + \vartheta_h$ to 0
/* Initial homogeneous map $\psi$ */
for each ray direction $\vartheta_i$ do
  scan $E$ by rays in direction $\vartheta_i$
  if current $p_i = 1$ and previous $p_{i-1} \neq 1$ then
    $r(p_i) = r(p_{i-1}) + 1$
  else
    $r(p_i) = r(p_{i-1})$
    if $p_i \neq 1$ then
      label $l_{p_i} = r(p_i)$ to form the initial scan map in direction $\vartheta_i$
    end if
  end if
end for
Construct $\phi_v$ by combining scan maps in directions $\vartheta_v$ and $\pi + \vartheta_v$
Construct $\phi_h$ by combining scan maps in directions $\vartheta_h$ and $\pi + \vartheta_h$
for $\phi_v$ and $\phi_h$ do
  merge adjacent regions based on $M_T$ to obtain $\phi_v'$ and $\phi_h'$
end for
Construct $\psi$ by combining $\phi_v'$ and $\phi_h'$
/* Seed point placement */
for each region $\varphi_i \in \psi$ do
  if $\varphi_i \setminus (\varphi_i \cap S) = \varnothing$ then
    place a seed point inside $\varphi_i$
  else
    scan $\varphi_i$ by $S$ and place seeds where Equation (4) is met
  end if
end for

3.3. Initial Superpixel Extraction

After obtaining the initial seed set, each pixel is labeled based on the similarity between the pixel and seed points to form superpixels. SLIC [12] limits the search range to a space proportional to the patch size and utilizes a multidimensional feature vector to constrain the segmentation process. Normally, the search range is limited to 2S × 2S, in order to achieve linear complexity. D is defined as the distance in multidimensional space, which describes the similarity of pixels and cluster centers:
$$D = \sqrt{d_{lab}^2 + \left(\frac{d_s}{S}\right)^2 m^2}, \quad (5)$$
$$d_{lab} = \sqrt{(l_p - l_q)^2 + (a_p - a_q)^2 + (b_p - b_q)^2}, \quad (6)$$
$$d_s = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}, \quad (7)$$
where $d_{lab}$ and $d_s$ represent the color proximity and spatial proximity, respectively; $p$ and $q$ are two pixels within the bounded range; $[l, a, b]$ and $(x, y)$ indicate the pixel color in the CIELAB color space and the pixel coordinates, respectively; and $m$ controls the boundary preservation ability.
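The SLIC distance of Equations (5)-(7) can be transcribed directly in Python; the tuple layout $(l, a, b, x, y)$ is our convention:

```python
import math

def slic_distance(p, q, S, m):
    """SLIC similarity D between two pixels p and q, each given as an
    (l, a, b, x, y) tuple; S is the grid interval, m the compactness
    weight that trades color accuracy against spatial regularity."""
    d_lab = math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                      + (p[2] - q[2]) ** 2)
    d_s = math.sqrt((p[3] - q[3]) ** 2 + (p[4] - q[4]) ** 2)
    return math.sqrt(d_lab ** 2 + (d_s / S) ** 2 * m ** 2)
```

A larger $m$ inflates the spatial term, producing more compact, rounder superpixels at the cost of boundary adherence.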
Inspired by SLIC, each IR image is transformed into a three-dimensional feature vector V = [l, x, y]^T, which combines the luminance space and the coordinate space. Unlike color images, IR images carry only luminance information, so the classical SLIC metric is not directly applicable. We transform Equation (6) into a luminosity-based form suitable for IR images, where the distance D can be expressed as:
D = \sqrt{d_l^2 + \left( \frac{d_s}{S} \right)^2 m^2},
d_l = \sqrt{(l_p - l_q)^2},
where d l represents the luminance proximity and l is the luminance value.
In IR images, spot noise and blind elements, as local outliers, cannot be correctly clustered based on the similarity metric D. To improve the robustness to image noise, neighboring information is integrated when computing the luminance proximity d_l. A square region \mathcal{N}(p) of size |\mathcal{N}(p)| = (2n+1) \times (2n+1), centered on pixel p, is constructed to represent the luminance value of p, instead of its own luminance value (n is set to 1 in this paper). Within \mathcal{N}(p), the set of non-edge pixels remaining after the extreme points are excluded is denoted as \tilde{\mathcal{N}}(p), and its average value is used to calculate the luminance similarity between the central point p and seed point C_k. Equation (9) can thus be transformed into:
d_l(p, C_k) = \sqrt{(\bar{l}(p) - l_{C_k})^2},
\bar{l}(p) = \frac{1}{n(\tilde{\mathcal{N}})} \sum_{p_i \in \tilde{\mathcal{N}}(p)} l_{p_i},
where n(\tilde{\mathcal{N}}) and \bar{l}(p) represent the total number of pixels in \tilde{\mathcal{N}}(p) and its average luminance, respectively. Combining pixel neighborhood information improves the robustness to abnormal pixels.
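A minimal sketch of this neighborhood-averaged luminance might look as follows. The name `robust_luminance` is hypothetical; edge pixels are excluded via a binary edge mask, and the window minimum and maximum are taken as the "extreme points" to drop.

```python
import numpy as np

def robust_luminance(img, edge, p, n=1):
    """Average luminance of the (2n+1) x (2n+1) window around pixel p,
    computed over non-edge pixels with the extremes excluded, used in place
    of the pixel's own value for noise robustness (sketch)."""
    y, x = p
    h, w = img.shape
    ys, ye = max(0, y - n), min(h, y + n + 1)
    xs, xe = max(0, x - n), min(w, x + n + 1)
    win = img[ys:ye, xs:xe].astype(float)
    vals = win[edge[ys:ye, xs:xe] == 0]     # keep non-edge pixels only
    if vals.size > 2:
        vals = np.sort(vals)[1:-1]          # drop the extreme points
    return float(vals.mean()) if vals.size else float(img[y, x])
```

A spot-noise pixel thus contributes at most one of the discarded extremes and no longer dominates its own luminance estimate.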
In SLIC, the distance between adjacent initial seed points is constant due to regular grid sampling, such that pixels have equal spatial similarity weight to every seed point within the 2S × 2S range. In the initial seed map obtained by our seed sampling strategy, the distribution of seeds is related to the image content, resulting in different distances between adjacent seed points, as shown in Figure 8. The spatial proximity weight w(p_i, C_k) between pixel p_i and seed point C_k in the 2S × 2S search range should be inversely proportional to the corresponding sampling area of C_k; for example, the shaded area in Figure 7. Equation (7) can be transformed into:
d_s(p_i, C_k) = w(p_i, C_k) \left[ (x_{p_i} - x_{C_k})^2 + (y_{p_i} - y_{C_k})^2 \right],
w(p_i, C_k) = 1 - \frac{\Omega(C_k)}{\sum_{g \in (0, n)} \Omega(C_g)},
where \Omega(C_k) represents the sampling area of seed point C_k, and n is the total number of seed points in the search range. When only the spatial distance is considered, w(p_i, C_k) balances the distribution of pixels within the search range so that each superpixel matches its sampling area. Without this weighting, p_i would be assigned to C_2 in Figure 8.
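The weighting scheme can be illustrated with a short sketch (function names are hypothetical; `areas` holds the sampling areas Ω(C_g) of the seeds whose search windows cover the pixel):

```python
def spatial_weight(areas, k):
    """Spatial proximity weight for seed k: 1 - Omega(C_k) / sum(Omega).
    Seeds with larger sampling areas receive smaller weights, so they can
    claim proportionally more pixels, while seeds from small regions do not
    overgrow their neighbours (sketch)."""
    return 1.0 - areas[k] / sum(areas)

def weighted_spatial_distance(p, c, areas, k):
    """Squared spatial distance between pixel p and seed c, scaled by the
    seed's spatial proximity weight."""
    d2 = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return spatial_weight(areas, k) * d2
```

With two seeds of areas 100 and 300, the small-area seed gets weight 0.75 and the large-area seed 0.25, biasing pixel assignment toward the seed meant to cover the larger region.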
As mentioned in Section 3.1, pixels surrounded by the same contour have a higher probability of belonging to a uniform homogeneous region, such as C 1 and p i in Figure 8. Therefore, the edge prior map E is considered to constrain the similarity metric. When pixel p i and seed point C k are crossed by an edge from E , such as C 2 and p i in Figure 8, their similarity metric should be increased to prevent p i from being associated with C k . To ensure that the superpixel contours accurately follow the image contour, an edge penalty term d λ is integrated into the similarity metric, as follows:
d_\lambda(p_i, C_k) = 1 + \gamma \max(\lambda(q)), \quad q \in l_{C_k p_i},
\lambda(q) = \begin{cases} 1, & e(q) = 1 \\ 0, & \text{otherwise} \end{cases},
where γ is the penalty control factor and l_{C_k p_i} is the Euclidean path between the two pixels (e.g., l_1 and l_2 in Figure 8). As γ decreases, the constraining force of the boundary weakens; γ is set to 0.3 in the following analysis. Combining Equations (10), (12) and (14), our proposed similarity metric is as follows:
D_\lambda(p_i, C_k) = D \cdot d_\lambda = \sqrt{d_l^2 + d_s^2 \left( \frac{m}{S} \right)^2} \cdot \left( 1 + \gamma \max(\lambda(q)) \right).
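The edge penalty and the combined metric can be sketched as follows, assuming a simple uniform sampling of the straight line between pixel and seed (the names `edge_penalty` and `similarity` are illustrative, not from the paper):

```python
import math

def edge_penalty(edge, p, c, gamma=0.3):
    """Edge penalty d_lambda: sample the straight line between pixel p and
    seed c on the binary edge map and raise the distance by a factor
    (1 + gamma) if the line crosses any edge pixel (sketch)."""
    (y0, x0), (y1, x1) = p, c
    steps = max(abs(y1 - y0), abs(x1 - x0), 1)
    crossed = 0
    for t in range(steps + 1):
        y = round(y0 + (y1 - y0) * t / steps)
        x = round(x0 + (x1 - x0) * t / steps)
        crossed = max(crossed, int(edge[y][x]))   # max of lambda(q) on path
    return 1.0 + gamma * crossed

def similarity(d_l, d_s, S, m, penalty):
    """Combined metric D_lambda = D * d_lambda."""
    return math.sqrt(d_l ** 2 + d_s ** 2 * (m / S) ** 2) * penalty
```

A pixel separated from a seed by an edge thus pays a 30% surcharge on its distance, discouraging superpixels from leaking across contours.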
Based on the distance similarity metric, each pixel is assigned to the nearest seed point to minimize D λ . After all pixels have been assigned, pixels that have the same label are combined to create initial superpixels. Then, the seed points are updated as the centroid of each initial superpixel, and the initial superpixel map can be obtained after several iterations.

3.4. Refining

In the process of initial superpixel extraction, multiple seed points may be placed in a single homogeneous region in order to maintain convex shapes and correct clustering, which tends to increase over-segmentation and reduce semantics. Meanwhile, if there is an edge inside an object, the object will be divided into two homogeneous regions. To reduce redundant superpixels so that the final superpixels can adapt to differently shaped targets, a regional merging step combines the over-segmented initial superpixels. The region adjacency graph (RAG) of the superpixels is constructed as shown in Figure 9.
The RAG is an undirected graph that provides a spatial view of the relationships between superpixels, where each superpixel is regarded as a node v_i. Let G = (V, E) be the RAG, where V = {v_1, v_2, ..., v_n} is the set of nodes and E = {e_1, e_2, ..., e_n} is the set of corresponding boundaries. Each edge carries a weight E_{w_i} that represents the similarity between the two adjacent nodes it connects. We use the luminance average \bar{I} and the texture entropy e_n to obtain a reasonable representation of regional similarity, as follows:
E_{w_i} = a w_l + b w_{en},
w_l = \frac{\min(\bar{I}(c_i), \bar{I}(c_j))}{\max(\bar{I}(c_i), \bar{I}(c_j))},
w_{en} = \frac{\min(e_n(c_i), e_n(c_j))}{\max(e_n(c_i), e_n(c_j))},
where c_i and c_j are adjacent nodes, and a and b are two weight factors (a + b = 1). The entropy of the image texture is widely used to characterize texture information [18,36] and can be defined as:
e_n = -\sum_{i=0}^{H} P(i) \log[P(i)],
where H is the number of gray-level differences and P(i) is the probability of each gray-level difference obtained from the histogram statistics. When the texture of the IR image is simple, the luminance average \bar{I} is more important than the texture entropy e_n, so a can be set greater than b. When E_{w_i} exceeds the threshold E_{w_T}, the adjacent nodes c_i and c_j are highly similar and, thus, the merging operation is performed. In the following experiments, we set E_{w_T} = 0.8, a = 0.5, and b = 0.5. All superpixel blocks are searched, and the merging process is iterated layer by layer until convergence; the final superpixel map, which has the desired characteristics, is then obtained. The clustering algorithm is summarized in Algorithm 2.
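The edge weight computation can be sketched as follows. This is a simplified illustration: `texture_entropy` uses a 16-bin histogram of raw gray levels rather than gray-level differences, and two regions with identical zero entropy are treated as fully similar.

```python
import numpy as np

def texture_entropy(values, bins=16):
    """Shannon entropy of the gray-level histogram of a region (sketch)."""
    hist, _ = np.histogram(values, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log(p)).sum())

def _ratio(u, v):
    """min/max ratio in [0, 1]; identical zero values count as fully similar."""
    lo, hi = min(u, v), max(u, v)
    return 1.0 if hi == 0 else lo / hi

def edge_weight(vals_i, vals_j, a=0.5, b=0.5):
    """RAG edge weight E_w = a*w_l + b*w_en between two adjacent superpixels,
    each given as an array of pixel intensities (a + b = 1); pairs scoring
    above the merge threshold (0.8 in the paper) are merged."""
    w_l = _ratio(float(np.mean(vals_i)), float(np.mean(vals_j)))
    w_en = _ratio(texture_entropy(vals_i), texture_entropy(vals_j))
    return a * w_l + b * w_en
```

Both ratio terms lie in [0, 1], so a weight near 1 indicates nearly identical luminance and texture statistics.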
Algorithm 2. The clustering algorithm.
Input: An image I and its binary edge map e ; initial seed set; search interval S and weight constant m ; penalty control factor γ ; iteration number n ; weight parameter a , b ; merging threshold E w T
Output: superpixel segmentation map
/* Initialization */
Replace l p i with l ¯ ( p i ) for each pixel p i
set label l(p_i) = −1 and distance d(p_i) = ∞ for each pixel p_i
/* Initial superpixel extraction */
repeat
for each pixel p i  do
 record the seed points C_k that appear in the 2S × 2S window centered on p_i
 compute the spatial proximity weight w(p_i, C_k) and the edge penalty term d_λ between p_i and each C_k
end for
for each seed point C k  do
for each pixel p i in 2S × 2S centered on C k  do
 compute the distance D λ ( p i , C k ) between p i and C k
if D λ ( p i , C k ) < d ( p i ) then
 set l ( p i ) = k
  set d ( p i ) = D λ
 end if
end for
end for
update cluster centers
until iteration number = n
/* Refining */
repeat
construct RAG based on the initial superpixel map
if the weight of edge E w i between adjacent nodes c i and c j > E w T then
 merge c j to c i and update c i
end if
until convergence
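The refining loop of Algorithm 2 can be illustrated with a greedy merging sketch. For brevity, it scores adjacent pairs by the luminance ratio w_l alone (i.e., a = 1, b = 0) and averages the merged means unweighted; both are simplifications of the paper's scheme, and the names are illustrative.

```python
def merge_regions(means, edges, threshold=0.8):
    """Greedy RAG merging (sketch): while some adjacent pair's min/max
    intensity ratio exceeds the threshold, fold the second node into the
    first, redirect its edges, and repeat until convergence.

    'means' maps node id -> mean intensity; 'edges' is an iterable of
    adjacent node-id pairs."""
    means = dict(means)
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        for e in sorted(edges, key=sorted):
            i, j = sorted(e)
            if i not in means or j not in means:
                continue
            w = min(means[i], means[j]) / max(means[i], means[j])
            if w > threshold:
                # merge j into i: combine the means, redirect j's edges to i
                means[i] = (means[i] + means[j]) / 2
                del means[j]
                edges = {frozenset({i if v == j else v for v in f})
                         for f in edges} - {frozenset({i})}
                changed = True
                break
    return means, edges
```

With three nodes of means 100, 98 and 10 in a chain, the first pair (ratio 0.98) merges and the dissimilar third node survives as its own region.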

4. Experiment

To evaluate the performance of our proposed method, we tested it on a real self-built IR dataset and compared it with state-of-the-art superpixel segmentation algorithms. Extended experiments on public datasets were also conducted, in order to verify the generalization ability of our method on visible grayscale images and medical images. The experiments on these datasets demonstrate the effectiveness of our method.

4.1. Datasets

We compared our SSCE superpixel segmentation algorithm with five representative algorithms, namely SLIC [12], SNIC [34], LRW [13], USEAQ [23], and CAS [37], on the real-world image dataset BSD500 [24], the medical image dataset 3Dircadb [25], and a self-built IR dataset. The self-built IR dataset was acquired using an uncooled infrared sensor with a resolution of 640 × 512. The basic parameters of the IR camera are shown in Table 1. IR imaging systems are commonly used in surveillance; therefore, this dataset contains natural scenes and urban buildings, some of which contain aerial targets. BSD500 has been widely used for segmentation quality evaluation and contains 500 manually labeled color images covering diverse natural scenes. To evaluate a segmentation algorithm designed for IR images, the images in BSD500 were converted to grayscale. The 3Dircadb dataset contains 20 sets of CT images of biological tissues, with regions of interest validated by medical experts. Compared with BSD500, 3Dircadb contains grayscale images with more noise, owing to the different imaging modality.

4.2. Benchmark

As mentioned in various superpixel studies [14,38], boundary adherence is the primary property in the superpixel evaluation system. To quantitatively evaluate the performance of the superpixel segmentation algorithms, two commonly used evaluation metrics were taken into account: boundary recall (BR) and under-segmentation error (USE).
BR is used to measure the degree of overlap between the boundary of the superpixel map and the boundary of an artificially labeled result. Better superpixel generation algorithms should lead to effective adherence to object boundaries. BR is expressed as follows:
BR = \frac{\sum_{p_i \in AL_b} IF(p_i)}{|AL_b|},
IF(p_i) = \begin{cases} 1, & \Omega(p_i) \cap SM_b \neq \emptyset \\ 0, & \Omega(p_i) \cap SM_b = \emptyset \end{cases},
where A L b and S M b represent the boundary of the artificially labeled map and the superpixel map, respectively; | A L b | represents the total number of pixels in boundary A L b ; Ω ( p i ) is a region with a radius of two pixels centered at p i ; and I F ( · ) is an indicator function: if there is a point on the edge of the superpixel falling within Ω ( p i ) , it returns 1; otherwise, it returns 0. The larger the value of BR, the stronger the boundary adherence capability.
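A direct implementation of BR under these definitions might look as follows. This is a sketch: boundaries are given as boolean maps, and the neighborhood Ω(p_i) is realized as a square window of radius r = 2.

```python
import numpy as np

def boundary_recall(gt_boundary, sp_boundary, r=2):
    """Boundary recall: the fraction of ground-truth boundary pixels that
    have at least one superpixel boundary pixel within an r-pixel square
    neighbourhood (sketch)."""
    gt = np.asarray(gt_boundary, dtype=bool)
    sp = np.asarray(sp_boundary, dtype=bool)
    hits = 0
    ys, xs = np.nonzero(gt)
    for y, x in zip(ys, xs):
        window = sp[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        hits += int(window.any())         # indicator IF(p_i)
    return hits / max(len(ys), 1)
```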
For each superpixel S j generated by the segmentation algorithm, the internal pixels should belong to the same object, and under-segmentation will occur when there are two objects within it. To measure the “leakage” of the superpixel over the artificially labeled region G i , the USE metric is introduced, as follows:
USE = \frac{1}{N} \left[ \sum_{i=1}^{M} \left( \sum_{S_j \,\mid\, |S_j \cap G_i| > B} |S_j| \right) - N \right],
where N is the total number of image pixels, M represents the number of artificially labeled segments, and B is the tolerance factor between S_j and G_i, which is set to 5% of |S_j| in this paper. A smaller USE indicates better overlap between each superpixel and the artificially labeled regions, with values close to zero indicating accurate segmentation.
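USE can be computed from two label maps as sketched below (illustrative only; the tolerance B is expressed as a fraction of |S_j|, 5% as in the paper):

```python
import numpy as np

def under_segmentation_error(sp_labels, gt_labels, tol=0.05):
    """Under-segmentation error: for every ground-truth segment, sum the
    areas of the superpixels that overlap it by more than tol*|S_j|, then
    measure the excess over the image size (sketch)."""
    sp = np.asarray(sp_labels)
    gt = np.asarray(gt_labels)
    n = sp.size
    total = 0
    for g in np.unique(gt):
        g_mask = gt == g
        for s in np.unique(sp[g_mask]):       # superpixels touching segment g
            s_mask = sp == s
            overlap = np.logical_and(s_mask, g_mask).sum()
            if overlap > tol * s_mask.sum():
                total += s_mask.sum()
    return (total - n) / n
```

A perfect segmentation counts each pixel exactly once, so the sum equals N and the error is zero; superpixels straddling two segments are counted twice and inflate the score.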

4.3. Results

In the process of seed point sampling, we adopt adaptive seed placement based on the contour prior, which better adapts to the local image structure. The traditional regular grid sampling method ignores the complexity of the image content, leading to under-segmentation in complex regions when the sampling interval is large. To improve boundary adherence, most existing segmentation algorithms simply increase the sampling frequency, which causes unnecessary segmentation in flat regions. In this section, we compare the results of the proposed method with those of state-of-the-art methods. The optimal parameters suggested by the respective authors were used in all comparative experiments, and the expected number of superpixels was set to 150. The proposed technique was implemented in MATLAB 2018a and tested on an i5-8400 CPU with 16 GB of RAM; parallel computing and multi-threaded GPU acceleration were disabled.
Figure 10a–c and Figure 10d–f show quantitative comparisons of BR and USE, respectively, on the self-built dataset, BSD500, and 3Dircadb. The quantitative comparison supports a consistent conclusion: SSCE achieved better boundary adherence with fewer superpixels, while remaining ahead as the number grew. As revealed in Figure 10a–c, when the expected number of superpixels was less than 300, the BR value of SSCE differed significantly from the others, demonstrating a clear advantage in boundary adherence. As the number of superpixels increased, the BR values of all methods gradually rose and converged, but the growth rate slowed markedly. SLIC and SNIC cannot generate superpixels based on image content, so their performance lagged behind the others. The proposed SSCE clearly outperformed the others with a small number of superpixels and was rivaled only by CAS when the number of superpixels was large. Benefiting from the boundary constraint in the clustering process, SSCE obtained the best USE values, as it applies a weighted penalty for crossing borders. As the number of superpixels increased, all methods leaked fewer pixels from the ground-truth segmentation, but the gain continually decreased. LRW can further split superpixels based on thresholds, which makes it better than SLIC and SNIC. USEAQ and CAS also produced better results when a large expected number was set. From the analysis of the quantitative results, only our method achieved high-precision boundary adherence with few superpixels.
The average running times of all tested methods on the three datasets are shown in Table 2. Data reading was excluded from all runtime tests. The results show that all methods are sensitive to image resolution: the higher the resolution, the longer the running time. LRW had the longest running time, while SSCE ran faster than CAS. Among the cluster-based methods, SLIC and SNIC required the least time, since no additional conditions constrain their clustering process. Overall, our approach offered a favorable balance between accuracy and runtime.
To further demonstrate the performance of SSCE, we also conducted a qualitative analysis of SSCE and the most representative methods through visual inspection. Figure 11 demonstrates the superiority of our algorithm on the self-built IR image dataset.
According to Figure 11, SLIC, SNIC, LRW, and our method generated regularly shaped superpixels in the smooth regions. However, SLIC, SNIC, and LRW tended to produce superpixels with similar sizes and remained regular in complex texture regions, thus reducing the boundary adherence capability, especially in weak regions (e.g., the edge of the mountain in Example 2 of Figure 11). In contrast, USEAQ and CAS had better boundary adherence performance, as shown in Figure 11d,e, but their results were irregular. In Example 2 of Figure 11, both of them showed dentate and sinuous boundaries, and USEAQ achieved better boundary adherence at the cost of higher irregularity. Since our method places seed points according to the image content and the sampling interval is only used to maintain convexity, our method was able to maintain both regularity and better boundary adherence, as can be observed in Figure 11f. In the smooth regions, our method achieved compactness similar to that of SLIC, SNIC, and LRW, but the superpixels generated by our method could better extend along the edge of local objects. The amplified details in Figure 11 showed that only our method could completely segment the small objects, such as the road, bush, and aircraft, without increasing the expected number of superpixels. In summary, our proposed segmentation algorithm based on a seed sampling strategy improved the boundary adherence capability while maintaining regular shapes in the self-built IR dataset.
In order to validate the feasibility of our method on grayscale images and illustrate its strengths, our algorithm and the compared methods were tested on the most commonly used segmentation dataset, BSD500. The results presented in Figure 12 lead to similar conclusions as those from the experiment on the self-built IR dataset: our method maintained regular shapes in smooth regions and better adhered to boundaries in complex textured regions. As shown in Figure 12, the superpixels generated by SLIC, SNIC, LRW, and our method maintained smooth and regular shapes in smooth regions such as the sky; in comparison, those obtained with USEAQ and CAS were slightly less regular. However, in highly textured regions, the irregular shapes produced by USEAQ and CAS were more pronounced, with severe dentate boundaries, such as the zoomed-in stone in Figure 12, Example 3. Although SLIC, SNIC, and LRW remained compact, their boundary adherence capability was weakened, as can be observed in the aircraft tail in Figure 12, Example 1. Compared with the other methods, our method had better boundary adherence performance. Only our method was able to generate superpixels along the boundary of the caudal wing, and the letter “A” was also precisely segmented. We can observe, from the results, that when a low expected number of superpixels was set to avoid over-segmentation in smooth regions, our method could appropriately increase the number of superpixels in complex textured regions, according to the image content, to achieve correct segmentation. Therefore, our method is more advantageous in terms of boundary tracking.
In addition to the public dataset containing natural scenes, our method was also tested on the medical CT imaging dataset 3Dircadb, in order to verify its generalization ability. Although the imaging modality is different, CT images with weak boundaries and noise are relatively similar to IR images, in terms of visual quality, and pose challenges for segmentation. The results of segmentation on 3Dircadb are shown in Figure 13, in which columns (a–f) are the results generated by SLIC, SNIC, LRW, USEAQ, CAS, and our method, respectively. In Example 1, all methods except for SLIC could correctly segment the tissue in the red region, as the size of this tissue was similar to the expected size of the superpixel, and the shape was nearly convex. When the size of the tissue to be segmented is much smaller than the expected size of the superpixel, it is difficult to achieve accurate segmentation, as shown in Figure 13, Example 2. In the blue and green rectangles of Figure 13, compared to the other methods, only our method was able to extract the tissue entirely, even though it was small. The results of the experiment on 3Dircadb provide clear support that the superpixels produced by our method have higher segmentation accuracy.
Most existing algorithms generate a superpixel map based on the expected number of superpixels. A large expected number of superpixels can improve boundary adherence, but over-segmentation is aggravated, especially in smooth regions. Although our SSCE achieved better boundary adherence by applying adaptive seed sampling instead of increasing the number of superpixels in each interval, we deliberately added extra superpixels to large initial homogeneous regions in order to maintain convex shapes. Therefore, unnecessary segmentation may occur in smooth regions, as was seen throughout the experiments. Merging similar initial superpixels can reduce over-segmentation and increase semantics, as shown in Figure 14. Compared to the initial superpixel map, the refined result represents smooth regions, such as the sky, with fewer superpixels. As a result, SSCE can achieve higher boundary adherence with fewer superpixels.
The spatial proximity weight and the edge penalty term both facilitate boundary adherence. An ablation study of these two terms is shown in Figure 15. The spatial proximity weight constrains the superpixel size to stay close to the seed sampling area by balancing the spatial similarity. Without it, seeds from small sampling areas expand beyond their enclosing regions when competing with seeds from large sampling areas, as shown in the red box in Figure 15a. With the weight, the superpixels generated by seeds from small sampling regions are reasonably compressed, which improves boundary adherence, as shown in the red box in Figure 15b. However, when a pixel is close to a seed point, the influence of luminance similarity decreases, and crossing the image boundary easily occurs near edges. To correct the similarity evaluation, the edge penalty term is introduced. As Figure 15c shows, the edge constraint further improves the capability of boundary adherence. For seeds within the same homogeneous region, neither the spatial proximity weight nor the edge penalty term has a significant effect, as shown in the blue boxes in Figure 15. In summary, both terms contribute to the superior boundary adherence of SSCE.

4.4. Discussion

From the experiments, it is evident that most algorithms can improve their boundary adherence capability with an increasing number of superpixels, which is achieved at the cost of over-segmentation. A better superpixel extraction algorithm should achieve high accuracy with fewer superpixels, in order to maintain compactness and reduce the difficulty of subsequent tasks. In the quantitative results, more superpixels contributed to boundary adherence, but once a certain number was exceeded, the precision improvement became marginal. A large number of small superpixels is simply the result of over-segmenting existing superpixels, which does not help boundary adherence. Therefore, adaptively selecting the appropriate size of superpixels based on the local image content is a more efficient way to improve accuracy. Our seed sampling strategy, based on contour encoding, is capable of placing fewer seed points in smooth regions and more seed points in complex texture regions, in order to generate more appropriate superpixels. In other words, our method can fix the effective superpixels early, without adding a large number of superpixels, such that SSCE obtains optimal BR values. In addition, the contour constraint can limit the generation of superpixels within the contours, which reduces the leakage of pixels. Our method achieves higher initial segmentation accuracy with fewer superpixels and further reduces redundant superpixels by merging the initial superpixel map. Experiments on a self-built IR dataset, BSD500, and 3Dircadb demonstrated the effectiveness of the proposed algorithm.

5. Conclusions

In this work, we presented a novel image segmentation algorithm for IR images with stronger boundary adherence capability and a better trade-off between compactness and segmentation accuracy. Compared with traditional grid seed sampling, our seed sampling strategy based on contour encoding is capable of scattering initial seed points according to the image content. This seed strategy and the contour constraint allow our method to maintain better boundary adherence with fewer superpixels. In smooth regions, large superpixels are generated by the merging operation, which avoids over-segmentation; in texturally complex regions, adaptively sized superpixels accurately segment the objects. Experimental results on our self-built IR dataset and the public datasets BSD500 and 3Dircadb demonstrate that SSCE has high generalization ability and outperforms state-of-the-art algorithms in terms of segmentation accuracy and compactness.
Since superpixels are generated based on homogeneity within the image region, the associated semantics are reduced. Therefore, future work will focus on increasing the semantics of superpixels. Moreover, the accelerating effect of superpixel segmentation on pattern recognition will potentially be explored as well.

Author Contributions

All of the authors contributed to this study. Conceptualization, W.L. and F.L.; Methodology, W.L. and F.L.; Software, W.L.; Data curation, Z.M. and J.M.; Funding acquisition, F.L.; Writing—Original draft preparation, W.L.; Writing—review and editing, W.L., Z.M. and J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Foundation of the Shanghai Key Laboratory of Criminal Scene Evidence (Grant No. 2017xcwzk08) and the Innovation Fund of the Shanghai Institute of Technical Physics (Grant No. CX-267).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Saranathan, A.M.; Parente, M. Uniformity-based superpixel segmentation of hyperspectral images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1419–1430.
  2. Sun, N.; Jiang, F.; Yan, H.C.; Liu, J.X.; Han, G. Proposal generation method for object detection in infrared image. Infrared Phys. Technol. 2017, 81, 117–127.
  3. Ren, X.; Malik, J. Learning a classification model for segmentation. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, 13–16 October 2003; pp. 10–17.
  4. Xie, X.; Xie, G.; Xu, X.; Cui, L.; Ren, J. Automatic image segmentation with superpixels and image-level labels. IEEE Access 2019, 7, 10999–11009.
  5. Yang, M.M.; Liu, J.C.; Li, Z.G. Superpixel-based single nighttime image haze removal. IEEE Trans. Multimed. 2018, 20, 3008–3018.
  6. Oliveira, A.Q.d.; Silveira, T.L.T.d.; Walter, M.; Jung, C.R. A hierarchical superpixel-based approach for DIBR view synthesis. IEEE Trans. Image Process. 2021, 30, 6408–6419.
  7. Zeng, X.; Wu, W.; Tian, G.; Li, F.; Liu, Y. Deep superpixel convolutional network for image recognition. IEEE Signal Process. Lett. 2021, 28, 922–926.
  8. Jia, S.; Zhan, Z.W.; Zhang, M.; Xu, M.; Huang, Q. Multiple feature-based superpixel-level decision fusion for hyperspectral and LiDAR data classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1437–1452.
  9. Wang, M.; Dong, Z.; Cheng, Y.; Li, D. Optimal segmentation of high-resolution remote sensing image by combining superpixels with the minimum spanning tree. IEEE Trans. Geosci. Remote Sens. 2018, 56, 228–238.
  10. Liu, Y.; Yu, M.; Li, B.; He, Y. Intrinsic manifold SLIC: A simple and efficient method for computing content-sensitive superpixels. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 653–666.
  11. Zhang, D.; Xie, G.; Ren, J.C.; Zhang, Z.; Bao, W.L.; Xu, X.Y. Content-sensitive superpixel generation with boundary adjustment. Appl. Sci. 2020, 10, 3150.
  12. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
  13. Shen, J.; Du, Y.; Wang, W.; Li, X. Lazy random walks for superpixel segmentation. IEEE Trans. Image Process. 2014, 23, 1451–1462.
  14. Galvao, F.L.; Guimaraes, S.J.F.; Falcao, A.X. Image segmentation using dense and sparse hierarchies of superpixels. Pattern Recognit. 2020, 108, 107532.
  15. Di, S.; Liao, M.; Zhao, Y.; Li, Y.; Zeng, Y. Image superpixel segmentation based on hierarchical multi-level LI-SLIC. Opt. Laser Technol. 2021, 135, 106703.
  16. Lei, T.; Jia, X.; Zhang, Y.; Liu, S.; Meng, H.; Nandi, A.K. Superpixel-based fast fuzzy C-means clustering for color image segmentation. IEEE Trans. Fuzzy Syst. 2019, 27, 1753–1766.
  17. Saxena, A.; Prasad, M.; Gupta, A.; Bharill, N.; Patel, O.P.; Tiwari, A.; Er, M.J.; Ding, W.P.; Lin, C.T. A review of clustering techniques and developments. Neurocomputing 2017, 267, 664–681.
  18. Zhang, J.; Wang, P.; Gong, F.; Zhu, H.; Chen, N. Content-based superpixel segmentation and matching using its region feature descriptors. IEICE Trans. Inf. Syst. 2020, 103, 1888–1900.
  19. Chen, Y.T.; Li, Y.Y.; Wang, J.S. Remote aircraft target recognition method based on superpixel segmentation and image reconstruction. Math. Probl. Eng. 2020, 2020, 6087680.
  20. Giraud, R.; Ta, V.; Papadakis, N.; Berthoumieu, Y. Texture-aware superpixel segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing, Taipei, Taiwan, 22–25 September 2019; pp. 1465–1469.
  21. Zhang, Y.; Li, X.; Gao, X.; Zhang, C. A simple algorithm of superpixel segmentation with boundary constraint. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 1502–1514.
  22. Luo, B.; Xiong, J.K.; Xu, L.; Pei, Z. Superpixel segmentation based on global similarity and contour region transform. IEICE Trans. Inf. Syst. 2020, 103, 716–719.
  23. Huang, C.; Wang, W.; Wang, W.; Lin, S.; Lin, Y. USEAQ: Ultra-fast superpixel extraction via adaptive sampling from quantized regions. IEEE Trans. Image Process. 2018, 27, 4916–4931.
  24. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916.
  25. Soler, L.; Hostettler, A.; Agnus, V.; Charnoz, A.; Fasquel, J.B.; Moreau, J. 3D Image Reconstruction for Comparison of Algorithm Database: A Patient-Specific Anatomical and Medical Image Database. Available online: https://www.ircad.fr/research/3d-ircadb-01/ (accessed on 21 September 2021).
  26. Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905.
  27. Felzenszwalb, P.F.; Huttenlocher, D.P. Efficient graph-based image segmentation. Int. J. Comput. Vis. 2004, 59, 167–181.
  28. Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
  29. Chen, J.; Li, Z.; Huang, B. Linear spectral clustering superpixel. IEEE Trans. Image Process. 2017, 26, 3317–3330.
  30. Grady, L. Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1768–1783.
  31. Vincent, L.; Soille, P. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 583–598.
  32. Levinshtein, A.; Stere, A.; Kutulakos, K.N.; Fleet, D.J.; Dickinson, S.J.; Siddiqi, K. TurboPixels: Fast superpixels using geometric flows. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 2290–2297.
  33. Wang, P.; Zeng, G.; Gan, R.; Wang, J.D.; Zha, H.B. Structure-sensitive superpixels via geodesic distance. Int. J. Comput. Vis. 2013, 103, 1–21.
  34. Achanta, R.; Süsstrunk, S. Superpixels and polygons using simple non-iterative clustering. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4895–4904.
  35. Bao, P.; Lei, Z.; Xiaolin, W. Canny edge detection enhancement by scale multiplication. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1485–1490.
  36. Zheng, M.Y.; Qi, G.Q.; Zhu, Z.Q.; Li, Y.Y.; Wei, H.Y.; Liu, Y. Image dehazing by an artificial image fusion method based on adaptive structure decomposition. IEEE Sens. J. 2020, 20, 8062–8072.
  37. Xiao, X.; Zhou, Y.; Gong, Y. Content-adaptive superpixel segmentation. IEEE Trans. Image Process. 2018, 27, 2883–2896.
  38. He, W.; Li, C.; Guo, Y.; Wei, Z.; Guo, B. A two-stage gradient ascent-based superpixel framework for adaptive segmentation. Appl. Sci. 2019, 9, 2421.
Figure 1. Block diagram of the proposed methodology.
Figure 2. An example of edge-based relationships between pixels. Pixels surrounded by the same edge, such as pi and pj, are likely to be similar. Conversely, pixels separated by edges, such as pj and ps, are likely to belong to different homogeneous regions.
Figure 3. An example of the scan process in the vertical direction. (a) Binary edge map; and (b) the encoded result of (a). Black pixels are edges, and each color represents a homogeneous region.
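The vertical ray scan illustrated in Figure 3 can be sketched in a few lines: each column is traversed top to bottom, and every maximal run of non-edge pixels between edge pixels receives its own label, so no label crosses an edge. The sketch below is illustrative only (the function name and the boolean `edge_map` convention are our assumptions); the full method additionally merges adjacent runs and combines the orthogonal scan directions shown in Figures 4–6.

```python
import numpy as np

def encode_vertical_runs(edge_map):
    """Label maximal vertical runs of non-edge pixels.

    edge_map: 2D boolean array, True at edge pixels.
    Returns an integer label map; edge pixels keep label 0, and each
    non-edge run within a column receives a distinct positive label.
    """
    h, w = edge_map.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for x in range(w):              # one vertical "ray" per column
        in_run = False
        for y in range(h):          # scan top to bottom
            if edge_map[y, x]:      # an edge pixel terminates the run
                in_run = False
            else:
                if not in_run:      # first pixel past an edge: new run
                    next_label += 1
                    in_run = True
                labels[y, x] = next_label
    return labels
```

Because a label never crosses an edge pixel, every labeled run lies inside a single homogeneous region, which is the property the seed sampling strategy relies on.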
Figure 4. An example of initial homogeneous region generation. (a) Scan from top to bottom; (b) scan from bottom to top; and (c) the joint initial map combining both scans in the vertical direction.
Figure 5. A set of orthogonal rays is used to resolve the segmentation at the edge break. (a) Ray scanning in the vertical direction cannot split two adjacent homogeneous regions at the fracture; (b) horizontal rays rescan the image to re-segment the initial homogeneous region at the fracture.
Figure 6. The joint homogeneous map obtained by combining the orthogonal initial homogeneous maps. (a) The compact initial vertical scan map ϕv after merging adjacent initial regions in Figure 4c; (b) the compact initial horizontal scan map ϕh; (c) the joint homogeneous map ψ.
Figure 7. Example of the seed sampling strategy for an initial homogeneous region that cannot be completely covered by S.
Figure 8. Example of seed distribution in a 2S × 2S region. The distance l1 between pixel pi and seed point C1 is much larger than the distance l2 from C2.
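Competition between nearby seeds such as C1 and C2 is resolved by a clustering distance. For reference, a SLIC-style joint distance combining an intensity difference dc with a grid-normalized spatial distance ds is sketched below; the compactness factor m = 10 is a hypothetical default, and SSCE's actual measure additionally carries an edge penalty term (cf. the ablation in Figure 15) that is not reproduced here.

```python
import math

def cluster_distance(dc, ds, S, m=10.0):
    """SLIC-style distance: intensity term dc combined with the spatial
    term ds, normalized by the seed grid interval S and weighted by the
    compactness factor m (larger m favors more compact superpixels)."""
    return math.sqrt(dc ** 2 + (ds / S) ** 2 * m ** 2)
```

With this measure, a pixel one grid interval away from a seed of identical intensity costs as much as an intensity difference of m.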
Figure 9. Example of RAG in a local region. (a) A local region in the initial segmented map; (b) the corresponding RAG; and (c) the next RAG after merging p1 and p5, if the weight w is greater than wT.
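The merge step in Figure 9 can be sketched with a union-find over the RAG: adjacent superpixels are fused whenever their similarity weight w exceeds the threshold wT. The weight below (an inverse mean-intensity difference) and the unweighted mean update are our own simplifications, not the paper's exact definitions.

```python
import numpy as np

def merge_by_rag(labels, image, w_T):
    """Greedily merge 4-adjacent superpixels whose similarity exceeds w_T."""
    ids = np.unique(labels)
    means = {int(i): float(image[labels == i].mean()) for i in ids}

    # Collect adjacent label pairs (horizontal and vertical neighbors).
    pairs = set()
    for la, lb in [(labels[:, :-1], labels[:, 1:]),
                   (labels[:-1, :], labels[1:, :])]:
        diff = la != lb
        pairs.update(zip(la[diff].tolist(), lb[diff].tolist()))

    parent = {int(i): int(i) for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in sorted(pairs):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        w = 1.0 / (1.0 + abs(means[ri] - means[rj]))  # hypothetical weight
        if w > w_T:
            parent[rj] = ri
            means[ri] = (means[ri] + means[rj]) / 2.0  # naive mean update
    return np.vectorize(find)(labels)
```

Each merge contracts one RAG edge, exactly as in the transition from (b) to (c) in Figure 9.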
Figure 10. Quantitative evaluation of the compared superpixel segmentation algorithms and the proposed SSCE: (a–c) BR curves and (d–f) USE curves on the self-built dataset, BSD500, and 3Dircadb, respectively.
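Of the two metrics plotted in Figure 10, boundary recall (BR) is the fraction of ground-truth boundary pixels that lie within a small tolerance of a superpixel boundary. A minimal sketch, assuming boolean boundary masks and a Chebyshev tolerance eps (a tolerance of 1–2 pixels is a common choice; the paper's exact setting is not restated here):

```python
import numpy as np

def boundary_recall(seg_boundary, gt_boundary, eps=2):
    """Fraction of ground-truth boundary pixels that have a segmentation
    boundary pixel inside their (2*eps+1) x (2*eps+1) neighborhood."""
    h, w = gt_boundary.shape
    ys, xs = np.nonzero(gt_boundary)
    hits = 0
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - eps), min(h, y + eps + 1)
        x0, x1 = max(0, x - eps), min(w, x + eps + 1)
        hits += bool(seg_boundary[y0:y1, x0:x1].any())
    return hits / max(1, len(ys))
```

Higher BR means the superpixel boundaries track the annotated object contours more faithfully.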
Figure 11. Experimental results of IR image superpixels on the self-built dataset from the compared algorithms: (a) SLIC; (b) SNIC; (c) LRW; (d) USEAQ; (e) CAS; and (f) our method. The zoomed-in regions below the results show that our method adheres better to object boundaries.
Figure 12. Visual comparison of superpixel segmentation among six different methods tested on the BSD500 dataset. Images in each column represent the results of: (a) SLIC; (b) SNIC; (c) LRW; (d) USEAQ; (e) CAS; and (f) our method. The expected number of superpixels was set to 150.
Figure 13. Experimental results on the medical image dataset 3Dircadb: (a) result of SLIC; (b) result of SNIC; (c) result of LRW; (d) result of USEAQ; (e) result of CAS; and (f) result of our method. Alternating rows show the segmented results and their local details.
Figure 14. Refined results after merging similar initial superpixels. The initial superpixel maps of Figure 11f and the corresponding merged results are interlaced in (af).
Figure 15. Ablation study on the impact of the spatial proximity weight and edge penalty term. (a) Result without any constraints; (b) result only with the spatial proximity weight; (c) result of SSCE.
Table 1. The basic parameters of the IR camera.
Detector Type: Uncooled VOx Microbolometer
Pixel Pitch: 12 μm
Spectral Response: LWIR, 8–14 μm
NETD: <40 mK
Frame Rate: 60 Hz
Focal Length: 40 mm
Table 2. Running times of different methods on three datasets (in seconds).
Dataset       SLIC     SNIC     LRW        USEAQ    CAS      SSCE
IR Dataset    0.116    0.143    148.515    0.099    0.253    0.216
BSD500        0.067    0.075    72.913     0.060    0.138    0.134
3Dircadb      0.083    0.113    128.134    0.076    0.188    0.177
Average       0.089    0.110    116.520    0.078    0.193    0.176
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Li, W.; Miao, Z.; Mu, J.; Li, F. Infrared Image Superpixel Segmentation Based on Seed Strategy of Contour Encoding. Appl. Sci. 2022, 12, 602. https://doi.org/10.3390/app12020602