1 Introduction

In image data, similar objects may occur at highly varying scales. Examples are cars or pedestrians at different distances from the camera, cracks in concrete of varying thickness or imaged at different resolutions, or blood vessels in biomedical applications (see Fig. 1). It is natural to assume that the same object or structure at different scales should be treated equally, i.e., should have equal or at least similar features. This property is called scale or dilation invariance and has been investigated in detail in classical image processing [1,2,3].

Neural networks have proven to segment and classify robustly and accurately in many computer vision tasks. Nowadays, the most popular and successful neural networks are convolutional neural networks (CNNs). It would be desirable that neural networks share typical properties of human vision such as translation, rotation, or scale invariance. While this is true for translation invariance, CNNs are not scale or rotation invariant by default. This is due to the extensive use of convolutions, which are local operators. Moreover, training sets often contain a very limited number of scales. To overcome this problem, CNNs are often trained with rescaled images through data augmentation. However, when a CNN is given input whose scale lies outside the range covered by the training set, it will not be able to generalize [4, 5]. As a remedy, a CNN trained at a fixed scale can be applied to several rescaled versions of the input image and the results can be combined. This, however, requires multiple runs of the network.

One application example where the challenges just described naturally occur is the task of segmenting cracks in 2d or 3d gray scale images of concrete. Crack segmentation in 2d has been an actively researched topic in civil engineering, see [6] for an overview. Cracks are naturally multiscale structures (Fig. 1, left) and hence require multiscale treatment. Nevertheless, adaptation to scale (crack thickness) has not been treated explicitly so far.

Recently, crack segmentation in 3d images obtained by computed tomography (CT) has become a subject of interest [6, 7]. Here, the effect of varying scales is even more pronounced [8]: crack thicknesses can vary from a single pixel to more than 100 pixels. Hence, the aim is to design and evaluate crack segmentation methods that work equally well on all possible crack widths without complicated adjustment by the user.

Fig. 1

Examples of similar objects appearing on different scales: section of a CT image of concrete showing a crack of locally varying thickness (left) and pedestrians at different distances from the camera (right, taken from [9])

In this work, we focus on 2d multiscale crack segmentation in images of concrete samples. We design the Riesz network which replaces the popular 2d convolutions by first- and second-order Riesz transforms to allow for a scale-invariant spatial operation. The resulting neural network is provably scale-invariant in a single forward pass. It is sufficient to train the Riesz network on a single scale or crack thickness only. The network then generalizes automatically, without any adjustments or rescaling, to completely unseen scales. We validate the network performance using images with simulated cracks of constant and varying widths generated as described in [6, 10]. Our network is compared with competing methods for multiscale segmentation and finally applied to real multiscale cracks observed in 2d slices of tomographic images.

There is just one publicly available dataset which allows for testing scale equivariance—MNIST Large Scale [5]. Additional experiments with the Riesz network on this dataset are reported in “Appendix A”.

1.1 Related Work

1.1.1 The Riesz Transform

The Riesz transform is a generalization of the Hilbert transform to higher dimensional spaces, see, e.g., [11]. First practical applications of the Riesz transform arose in signal processing through the definition of the monogenic signal [12], which enables a decomposition of higher dimensional signals into local phase and local amplitude. First, a bandpass filter is applied to the signal to isolate a band of frequencies. Using the Riesz transform, the local phase and amplitude can then be calculated for the selected range of frequencies. For more details, we refer to [12, 13].

As images are 2d or 3d signals, applications of the Riesz transform naturally extend to the fields of image processing and computer vision through the Poisson scale space [14, 15] which is an alternative to the well-known Gaussian scale space. Köthe [16] compared the Riesz transform with the structure tensor from a signal processing perspective. Unser and van de Ville [17] related higher-order Riesz transforms and derivatives. Furthermore, they give a reason for preferring the Riesz transform over the standard derivative operator: The Riesz transform does not amplify high frequencies.

Higher-order Riesz transforms were also used for the analysis of local image structures using ideas from differential geometry [18, 19]. Benefits of using the first- and second-order Riesz transforms as low-level features have also been shown in measuring similarity [20], analysis and classification of textures [11, 21], and orientation estimation [22, 23]. The Riesz transform can be used to create steerable wavelet frames, so-called Riesz-Laplace wavelets [17, 24], which were the first to exploit the scale equivariance property of the Riesz transform and which inspired the design of quasi-monogenic shearlets [25].

Interestingly, in early works on the Riesz transform in signal and image processing [12, 13, 15], scale equivariance was not recognized as a feature of the Riesz transform and hence remained sidelined. The benefits of scale equivariance were shown later in [17, 19].

Recently, the Riesz transform found its way into the field of deep learning: Riesz transform features are used as supplementary features in classical CNNs to improve robustness [26]. In our work, we use the Riesz transform to extract low-level features from images and use them as basis functions which replace trainable convolutional filters in CNNs or the Gaussian derivatives in [27].

1.1.2 Scale-Invariant Deep Learning Methods

Deep learning methods with mechanisms to handle variations in scale effectively can be split into two groups based on their scale generalization ability.

Scale-invariant deep learning methods for a limited range of scales

The first group can handle multiscale data but is limited to the scales represented either by the training set or by the neural network architecture. The simplest approach to learn multiscale features is to apply the convolutions to several rescaled versions of the images or feature maps in every layer and to combine the results by maximum pooling [4] or by keeping the scale with maximal activation [28] before passing them to the next layer. In [29, 30], several approaches based on downscaling images or feature maps with the goal of designing robust multiscale object detectors are summarized. However, scaling is included in the network architecture such that scales have to be selected a priori. Therefore, this approach only yields local scale invariance, i.e., an adaptation to the scale observed in a given input image is not possible after training.

Another intuitive approach is to rescale trainable filters, i.e., convolutions, by interpolation [31]. In [29], a new multiscale strategy was derived which uses convolution blocks of varying sizes sequenced in several downscaling layers creating a pyramidal shape. The pyramidal structure is utilized for learning scale dependent features and making predictions in every downsampling layer. Layers can be trained according to object size. That is, only the part of the network relevant for the object size is optimized. This guarantees robustness to a large range of object scales. Similarly, in [30], a network consisting of a downsampling pyramid followed by an upsampling pyramid is proposed. Here, connections between pyramid levels are devised for combining low- and high-resolution features and predictions are also made independently on every pyramid level. However, in both cases, scale generalization properties of the networks are restricted by their architecture, i.e., by the depth of the network (number of levels in the image pyramid), the size of convolutions as spatial operators as well as the size of the input image.

Spatial transformer networks [32] focus on invariance to affine transformations including scale. This is achieved by using a so-called localization network which learns transformation parameters. Finally, using these transformation parameters, a new sampling grid can be created and feature maps are resampled to it. These parts form a trainable module which is able to handle and correct the effect of the affine transformations. However, spatial transformer networks do not necessarily achieve invariant recognition [33]. Also, it is not clear how this type of network would generalize to scales not represented in the training set.

In [34], so-called structured receptive fields are introduced. Linear combinations (\(1\times 1\) convolutions) of basis functions (in this case Gaussian derivatives up to fourth order) are used to learn complex features and to replace convolutions (e.g., of size \(3\times 3\) or \(5\times 5\)). As a consequence, the number of parameters is reduced, while the expressiveness of the neural network is preserved. This type of network works better than classical CNNs in the case where little training data is available. However, the standard deviation parameters of the Gaussian kernels are manually selected and kept fixed. Hence, the scale generalization ability remains limited.

Making use of the semi-group property of scale spaces, scale-equivariant neural networks employ dilated convolutions [35] to define scale-equivariant convolutions on the Gaussian scale space [36] or on morphological scale spaces [37]. Unfortunately, these neural networks are unable to generalize to scales outside those determined by their architecture and are only suitable for downscale factors which are powers of 2, i.e., \(\{2,4,8,16,\ldots \}\). Furthermore, scale-equivariant steerable networks [38] show how to design scale-invariant networks on the scale-translation group without using standard or dilated convolutions. Following an idea from [34], convolutions are replaced by linear combinations of basis filters (Hermite polynomials with a Gaussian envelope). While this allows for non-integer scales, scales are still limited to powers of a positive scaling factor a. The scale space is again discretized and sampled. Hence, a generalization to arbitrary scales is not guaranteed.

Scale-invariant deep learning methods for arbitrary scales

The second group of methods can generalize to arbitrary scales, i.e., any scale in the range bounded from below by the image resolution and from above by the image size, but not necessarily contained in the training set. Our Riesz network also belongs to this second group of methods.

An intuitive approach is to train standard CNNs on a fixed range of scales and enhance their scale generalization ability by the following three-step procedure based on image pyramids: downsample the image by a factor \(a>1\), apply a forward pass of the CNN, and upsample the result by \(\frac{1}{a}\) to the original image size [5, 8]. Finally, forward passes of the CNN for several downscaling factors \(a_1, \ldots , a_n > 0\), \(n\in {\mathbb {N}}\), are aggregated by using the maximum or average operator across the scale dimension. This approach indeed guarantees generalization to unseen scales as the scales can be adapted to the input image and share the weights of the network [5]. However, it requires multiple forward passes of the CNN and the downsampling factors have to be selected by the user.
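As an illustration, such a rescale-and-aggregate procedure could be sketched in PyTorch as follows; `model` stands for any fixed-scale segmentation network and the downscaling factors are user-chosen placeholders of ours, not part of [5, 8].

```python
# Hedged sketch of multiscale inference by rescaling: downsample, run the
# fixed-scale network, upsample, and aggregate by the pixel-wise maximum.
import torch
import torch.nn.functional as F

def multiscale_inference(model, image, factors=(1, 2, 4, 8)):
    """image: tensor of shape (1, 1, H, W); returns the maximum over scale channels."""
    h, w = image.shape[-2:]
    outputs = []
    for a in factors:
        small = image if a == 1 else F.interpolate(
            image, scale_factor=1.0 / a, mode="bilinear", align_corners=False)
        pred = model(small)                               # one forward pass per scale
        pred = F.interpolate(pred, size=(h, w),
                             mode="bilinear", align_corners=False)
        outputs.append(pred)
    return torch.stack(outputs, dim=0).max(dim=0).values  # aggregate across scales
```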

Inspired by scattering networks [39, 40], normalized differential operators based on first- and second-order Gaussian derivatives stacked in layers or in a cascade can be used to extract more complex features [41]. Subsequently, these features serve as input for a classifier such as a support vector machine. By varying the standard deviation parameter \(\sigma \) of the Gaussian kernel, generalization to new scales can be achieved. However, this type of network only creates handcrafted complex scale-invariant features and hence is not trainable.

Its expansion to trainable networks by creating so-called Gaussian derivative networks [27] is one of the main inspirations for our work. For combining spatial information, \(\gamma \)-normalized Gaussian derivatives are used as scale-equivariant operators (\(\gamma = 1\)). Similarly to [34], linear combinations of normalized derivatives are used to learn more complex features in the spirit of deep learning. However, the Gaussian derivative network is based on ideas from scale space theory and requires the specification of a sufficiently wide range of scales covering all the scales present in the training and test sets. Hence, the scale dimension needs to be discretized and sampled densely. These networks have a separate channel for every scale and the network weights are shared between channels. Scale invariance is achieved by maximum pooling over the multiple scale channels.

2 The Riesz Transform

Let \(L_2({\mathbb {R}}^d) = \{ f:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\text { } | \text { } \int _{{\mathbb {R}}^d}{|f(x)|^2dx < \infty }\}\) be the set of square integrable functions.

Formally, for a d-dimensional signal \(f\in L_2({\mathbb {R}}^d)\) (i.e., an image or a feature map), the first-order Riesz transform \({\mathcal {R}}=({\mathcal {R}}_1,\ldots ,{\mathcal {R}}_d)\) with components \({\mathcal {R}}_{j}: L_2({\mathbb {R}}^d) \rightarrow L_2({\mathbb {R}}^d)\) is defined in the spatial domain as

$$\begin{aligned} {\mathcal {R}}_{j}(f)(x) = C_d \lim _{\varepsilon \rightarrow 0}{\int _{{\mathbb {R}}^d \setminus B_{\varepsilon }}{\frac{y_jf(x-y)}{|y|^{d+1}}{\textrm{d}}y}}, \end{aligned}$$
(1)

where \(C_d = \Gamma ((d+1)/2)/\pi ^{(d+1)/2}\) is a normalizing constant and \(B_{\varepsilon }\) is the ball of radius \(\varepsilon \) centered at the origin. Alternatively, the Riesz transform can be defined in the frequency domain via the Fourier transform \({\mathcal {F}}\)

$$\begin{aligned} {\mathcal {F}}({\mathcal {R}}_j(f))(u) = -i\frac{u_j}{|u|}{\mathcal {F}}(f)(u) = \frac{1}{|u|}{\mathcal {F}}(\partial _j f)(u), \end{aligned}$$
(2)

for \(j \in \{1,\ldots ,d\}\). Higher-order Riesz transforms are defined by applying a sequence of first-order Riesz transforms. That is, for \(k_1,k_2,...,k_d\in {\mathbb {N}}\cup \{0\}\) we set

$$\begin{aligned} {\mathcal {R}}^{(k_1,k_2,...,k_d)} (f)(x) := {\mathcal {R}}^{k_1}_1( {\mathcal {R}}^{k_2}_2( \cdots ({\mathcal {R}}^{k_d}_d(f(x)))), \end{aligned}$$
(3)

where \({\mathcal {R}}_j^{k_j}\) refers to applying the Riesz transform \({\mathcal {R}}_j\) \(k_j\) times in a sequence.
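For concreteness, the transforms can be computed directly in the frequency domain via Eq. (2). The following NumPy sketch (ours; the discretization and the function name are not from the original work) returns the first- and second-order Riesz transforms of a 2d image.

```python
# Minimal NumPy sketch of the first- and second-order Riesz transforms of a
# 2d image, computed in the frequency domain according to Eq. (2).
import numpy as np

def riesz_transforms_2d(f):
    """Return R1, R2, R^(2,0), R^(1,1), R^(0,2) of a 2d array f."""
    h, w = f.shape
    u = np.fft.fftfreq(h).reshape(-1, 1)     # frequencies along axis 0
    v = np.fft.fftfreq(w).reshape(1, -1)     # frequencies along axis 1
    norm = np.sqrt(u ** 2 + v ** 2)
    norm[0, 0] = 1.0                         # avoid division by zero at zero frequency
    H1, H2 = -1j * u / norm, -1j * v / norm  # transfer functions of R_1 and R_2
    F = np.fft.fft2(f)
    R1 = np.real(np.fft.ifft2(H1 * F))
    R2 = np.real(np.fft.ifft2(H2 * F))
    R20 = np.real(np.fft.ifft2(H1 * H1 * F))
    R11 = np.real(np.fft.ifft2(H1 * H2 * F))
    R02 = np.real(np.fft.ifft2(H2 * H2 * F))
    return R1, R2, R20, R11, R02
```

Note that the transfer functions depend only on the direction \(u/|u|\) and not on the magnitude \(|u|\), so no scale parameter appears; this is the frequency-domain view of the scale equivariance discussed below.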

Fig. 2

Visualizations of Riesz transform kernels of first and second orders. First row (from left to right): \({\mathcal {R}}_1\) and \({\mathcal {R}}_2\). Second row (from left to right): \({\mathcal {R}}^{(2,0)}\), \({\mathcal {R}}^{(1,1)}\), and \({\mathcal {R}}^{(0,2)}\)

The Riesz transform kernels of first- and second-order resemble those of the corresponding derivatives of smoothing filters such as Gaussian or Poisson filters (Fig. 2). This can be explained by the following relations

$$\begin{aligned}&{\mathcal {R}}(f) =(-1)(-\triangle )^{-1/2} \nabla f \end{aligned}$$
(4)
$$\begin{aligned}&{\mathcal {R}}^{(k_1,k_2,...,k_d)} (f)(x)= (-1)^N(- \triangle )^{-N/2} \frac{\partial ^N f(x)}{\partial x_1^{k_1} \cdots \partial x_d^{k_d}}, \end{aligned}$$
(5)

for \(k_1 + \cdots + k_d=N\) and \(N\in {\mathbb {N}}\). The fractional Laplace operator \((-\triangle )^{-N/2}\) acts as an isotropic low-pass filter. The main properties of the Riesz transform can be summarized in the following way [17]:

  • translation equivariance: For \(x_0 \in {\mathbb {R}}^d\), define the translation operator \({\mathcal {T}}_{x_0}:L_2({\mathbb {R}}^d) \rightarrow L_2({\mathbb {R}}^d)\) by \({\mathcal {T}}_{x_0}(f)(x) = f(x-x_0)\). It holds that

    $$\begin{aligned} {\mathcal {R}}_j({\mathcal {T}}_{x_0}(f))(x) = {\mathcal {T}}_{x_0}({\mathcal {R}}_j(f))(x), \end{aligned}$$
    (6)

    where \(j\in \{1,\ldots ,d\}\). This property reflects the fact that the Riesz transform commutes with the translation operator.

  • steerability: The directional Hilbert transform \({\mathcal {H}}_v:L_2({\mathbb {R}}^d) \rightarrow L_2({\mathbb {R}}^d)\) in direction \(v\in {\mathbb {R}}^d\), \(||v||=1\), is defined via \({\mathcal {F}}({\mathcal {H}}_v(f))(u) = i \text { } {{\,\textrm{sign}\,}}(\langle u,v\rangle ){\mathcal {F}}(f)(u)\). \({\mathcal {H}}_v\) is steerable in terms of the Riesz transform, that is, it can be written as a linear combination of the Riesz transforms

    $$\begin{aligned} {\mathcal {H}}_v(f)(x) = \sum _{j=1}^d v_j {\mathcal {R}}_j(f)(x) = \langle {\mathcal {R}} (f)(x), v\rangle . \end{aligned}$$
    (7)

    Note that in 2d, for a unit vector \(v=(\cos \theta , \sin \theta ),\ \theta \in \left[ 0,2\pi \right] \), the directional Hilbert transform becomes \({\mathcal {H}}_v(f)(x) = \cos (\theta ){\mathcal {R}}_1(f)(x) + \sin (\theta ){\mathcal {R}}_2(f)(x)\). This is equivalent to the link between gradient and directional derivatives [17] and a very useful property for learning oriented features.

  • all-pass filter [12]: Let \(H = (H_1, \cdots , H_d)\) be the Fourier transform of the Riesz kernel, i.e., \({\mathcal {F}}({\mathcal {R}}_j(f))(u) = i\frac{u_j}{|u|}{\mathcal {F}}(f)(u) = H_j(u){\mathcal {F}}(f)(u)\). The energy of the Riesz transform for frequency \(u \in {\mathbb {R}}^d\) is defined as the norm of the d-dimensional vector H(u) and has value 1 for all non-zero frequencies \(u\ne 0\), i.e.,

    $$\begin{aligned} ||H(u)|| = 1, \quad u \ne 0. \end{aligned}$$
    (8)

    The all-pass filter property reflects the fact that the Riesz transform is a non-local operator and that every frequency is treated fairly and equally. Combined with scale equivariance, this eliminates the need for multiscale analysis or multiscale feature extraction.

  • scale (dilation) equivariance: For \(a>0\) define a dilation or rescaling operator \(L_{a}:L_2({\mathbb {R}}^d) \rightarrow L_2({\mathbb {R}}^d)\) as \(L_{a}(f)(x) = f(\frac{x}{a})\). Then

    $$\begin{aligned} {\mathcal {R}}_j(L_{a}(f))(x) = L_{a}({\mathcal {R}}_j(f))(x), \end{aligned}$$
    (9)

    for \(j\in \{1,\ldots ,d\}\). That is, the Riesz transform does not only commute with translations but also with scaling.

Scale equivariance enables an equal treatment of the same objects at different scales. As this is the key property of the Riesz transform for our application, we briefly present a proof. We restrict ourselves to the first-order transform and work in the Fourier domain. The proof for higher orders follows directly from the one for the first order.

Lemma 1

The Riesz transform is scale-equivariant, i.e.,

$$\begin{aligned} {\mathcal {R}}_{j}\Big (f\Big (\frac{\cdot }{a}\Big )\Big ) = [{\mathcal {R}}_{j}f]\Big (\frac{\cdot }{a}\Big ), \end{aligned}$$
(10)

for all \(f \in L_2({\mathbb {R}}^d)\) and \(a>0\).

Proof

Remember that the Fourier transform of a dilated function is given by \({\mathcal {F}}(f(\alpha \cdot ))(u) = \frac{1}{\alpha ^d} {\mathcal {F}}(f)(\frac{u}{\alpha })\). Setting \(g(x) = f(\frac{x}{a})\), we have \({\mathcal {F}}(g)(u) = a^d {\mathcal {F}}(f)(au)\). This yields

$$\begin{aligned} {\mathcal {F}}\Bigg ({\mathcal {R}}_j\Big (f\big (\frac{\cdot }{a}\big )\Big )\Bigg )(u)&= {\mathcal {F}}\Big ({\mathcal {R}}_j(g)\Big )(u) = i \frac{u_j}{|u|} {\mathcal {F}}(g)(u) \\&= i \frac{u_j}{|u|} a^d {\mathcal {F}}(f)(au) = a^d \Big (i\frac{au_j}{a|u|}\Big ){\mathcal {F}}(f)(au) \\&= a^d {\mathcal {F}}\Big ({\mathcal {R}}_j(f)\Big )(au) = {\mathcal {F}}\Bigg ({\mathcal {R}}_j(f)\Big (\frac{\cdot }{a}\Big )\Bigg )(u). \end{aligned}$$

\(\square \)

Fig. 3

Illustration of the Riesz transform on a mock example of \(550\times 550\) pixels: aligned rectangles with equal aspect ratio and constant gray value 255 (left) and response of the second-order Riesz transform \({\mathcal {R}}^{(2,0)}\) of the left image sampled horizontally through the centers of the rectangles (right)

Figure 3 provides an illustration of the scale equivariance. It shows four rectangles with length-to-width ratio 20 and varying width (3, 5, 7, and 11 pixels) together with the gray value profile of the second-order Riesz transform \({\mathcal {R}}^{(2,0)}\) along a linear section through the centers of the rectangles. In spite of the different widths, the Riesz transform yields equal filter responses for each rectangle (up to rescaling). In contrast, to achieve the same behavior in Gaussian scale space, the scale space has to be sampled (i.e., a subset of scales has to be selected), the \(\gamma \)-normalized derivative [1] has to be calculated for every scale, and finally, the scale yielding the maximal absolute value has to be selected. In comparison, the Riesz transform achieves the same with a single transform, without sampling the scale space and without the need for a scale parameter.

3 Riesz Transform Neural Networks

In the spirit of structured receptive fields [34] and Gaussian derivative networks [27], we use the Riesz transforms of first and second orders instead of standard convolutions to define Riesz layers. As a result, Riesz layers are scale-equivariant in a single forward pass. Replacing standard derivatives with the Riesz transform has been motivated by [16], while using a linear combination of Riesz transforms of several orders follows [21].

3.1 Riesz Layers

The base layer of the Riesz networks is defined as a linear combination of Riesz transforms of several orders implemented as 1d convolution across feature channels (Fig. 4). Here, we limit ourselves to first- and second-order Riesz transforms. Thus, the linear combination reads as

$$\begin{aligned} J_{{\mathcal {R}}}(f)&= C_0 + \sum _{k=1}^{d}{C_k\cdot {\mathcal {R}}_k(f)} + \nonumber \\&\quad +\ \mathop {\sum }_{k,l\in {\mathbb {N}}_0, k+l=2}{C_{k,l}\cdot {\mathcal {R}}^{(k,l)}(f)}, \end{aligned}$$
(11)

where \(\{C_0, C_k \vert k\in \{1,\ldots ,d\} \}\cup \{C_{k,l} \vert l,k \in {\mathbb {N}}_0, l+k=2\}\) are parameters that are learned during training.

Now we can define the general layer of the network (Fig. 4). Let us assume that the Kth network layer takes input \(F^{(K)} = ( F^{(K)}_1,\ldots , F^{(K)}_{c^{(K)}}) \in \big (L_2({\mathbb {R}}^d)\big )^{c^{(K)}}\) with \(c^{(K)}\) feature channels and has output \(F^{(K+1)}= (F^{(K+1)}_1,\ldots ,F^{(K+1)}_{c^{(K+1)}})\in \big (L_2({\mathbb {R}}^d)\big )^{c^{(K+1)}}\) with \(c^{(K+1)}\) channels. Then the output in channel \(j \in \{1,\ldots ,c^{(K+1)} \}\) is given by

$$\begin{aligned} F^{(K+1)}_j = \sum _{i=1}^{c^{(K)}} J_{K}^{(j,i)}(F^{(K)}_i). \end{aligned}$$
(12)

Here, \(J_{K}^{(j,i)}\) is defined in the same way as \(J_{{\mathcal {R}}}(f)\) from Eq. (11), but trainable parameters may vary for different input channels i and output channels j, i.e.,

$$\begin{aligned} J^{(j,i)}_K(f)&= C_0^{(j,i,K)} + \sum _{k=1}^{d}{C_k^{(j,i,K)}\cdot {\mathcal {R}}_k(f)} + \nonumber \\&\quad + \mathop {\sum }_{k,l\in {\mathbb {N}}_0, k+l=2}{C_{k,l}^{(j,i,K)}\cdot {\mathcal {R}}^{(k,l)}(f)}. \end{aligned}$$
(13)

In practice, the offset parameters \(C_0^{(j,i,K)}\), \(i=1,\ldots ,c^{(K)}\), are replaced by a single parameter defined as \(C_0^{(j,K)}:= \sum _{i=1}^{c^{(K)}}C_0^{(j,i,K)}\).
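A possible PyTorch sketch of such a layer is given below: the five Riesz responses (two first-order and three second-order) of each input channel are stacked and mixed by a trainable \(1\times 1\) convolution, whose weights play the role of the coefficients \(C_k^{(j,i,K)}\), \(C_{k,l}^{(j,i,K)}\) and whose bias realizes the single offset per output channel. The names and the FFT-based discretization are ours; the authors' implementation may differ.

```python
# Hedged sketch of the Riesz layer from Eqs. (11)-(13).
import torch
import torch.nn as nn

def riesz_features(x):
    """x: real tensor of shape (B, C, H, W) -> (B, 5*C, H, W) holding
    R1, R2, R^(2,0), R^(1,1), R^(0,2) for every input channel."""
    _, _, height, width = x.shape
    u = torch.fft.fftfreq(height, device=x.device).view(1, 1, height, 1)
    v = torch.fft.fftfreq(width, device=x.device).view(1, 1, 1, width)
    norm = torch.sqrt(u ** 2 + v ** 2)
    norm[..., 0, 0] = 1.0                        # avoid division by zero at zero frequency
    H1, H2 = -1j * u / norm, -1j * v / norm      # transfer functions, cf. Eq. (2)
    X = torch.fft.fft2(x)
    responses = [H1 * X, H2 * X, H1 * H1 * X, H1 * H2 * X, H2 * H2 * X]
    return torch.cat([torch.fft.ifft2(r).real for r in responses], dim=1)

class RieszLayer(nn.Module):
    """Linear combination of Riesz responses: a 1x1 convolution with
    5 * c_in * c_out weights and one offset (bias) per output channel."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.mix = nn.Conv2d(5 * in_channels, out_channels, kernel_size=1, bias=True)

    def forward(self, x):
        return self.mix(riesz_features(x))
```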

Fig. 4

Building blocks of Riesz networks: the base Riesz layer from Eq. (11) (left) and the full Riesz layer from Eq. (12) (right)

3.2 Proof of Scale Equivariance

We prove the scale equivariance for \(J_{{\mathcal {R}}}(f)\). That implies scale equivariance for \(J^{(j,i)}_K(f)\) and consequently for \(F^{(K+1)}_j\) for arbitrary layers of the network. By construction (see Sect. 3.3), this will result in provable scale equivariance for the whole network. Formally, we show that \(J_{{\mathcal {R}}}(f)\) from Eq. (11) is scale-equivariant, i.e.,

$$\begin{aligned} J_{{\mathcal {R}}}\left( f(\frac{\cdot }{a})\right) = J_{{\mathcal {R}}}(f) (\frac{\cdot }{a}), \end{aligned}$$
(14)

for \(f \in L_2({\mathbb {R}}^d)\).

Proof

For any scaling parameter \(a > 0\) and \(x\in {\mathbb {R}}^d\), we have

$$\begin{aligned} J_{{\mathcal {R}}}\left( f(\frac{\cdot }{a})\right) (x)&= C_0 + \sum _{k=1}^{d}{C_k\cdot {\mathcal {R}}_k\left( f(\frac{\cdot }{a})\right) (x)} \\&\quad +\mathop {\sum }_{k,l\in {\mathbb {N}}_0, k+l=2}{C_{k,l}\cdot {\mathcal {R}}^{(k,l)}\left( f(\frac{\cdot }{a})\right) (x)} \\&\overset{\mathrm {(10)}}{=}C_0 + \sum _{k=1}^{d}{C_k\cdot {\mathcal {R}}_k(f)(\frac{x}{a})} \\&\quad +\mathop {\sum }_{k,l\in {\mathbb {N}}_0, k+l=2}{C_{k,l}\cdot {\mathcal {R}}^{(k,l)}(f)(\frac{x}{a})} \\&= J_{{\mathcal {R}}}(f) (\frac{x}{a}). \end{aligned}$$

\(\square \)

3.3 Network Design

A basic building block of modern CNNs is a sequence of the following operations: batch normalization, spatial convolution (e.g., \(3\times 3\) or \(5\times 5\)), the rectified linear unit (ReLU) activation function, and max pooling. Spatial convolutions have by default a limited size of the receptive field and max pooling is a downsampling operation performed on a window of fixed size. For this reason, these two operations are not scale-equivariant and consequently CNNs are sensitive to variations in scale. Hence, among the classical operations, only batch normalization [42] and ReLU activation preserve scale equivariance. To build our neural network from scale-equivariant transformations only, we restrict ourselves to batch normalization, ReLUs, and Riesz layers, which serve as a replacement for spatial convolutions. In our setting, max pooling can be avoided completely since its main purpose is to increase the size of the receptive field of cascaded spatial convolutions while reducing the number of parameters.

Generally, a layer consists of the following sequence of transformations: batch normalization, Riesz layer, and ReLU. Batch normalization improves the training capabilities and avoids overfitting, ReLUs introduce non-linearity, and the Riesz layers extract scale-equivariant spatial features. For every layer, the number of feature channels has to be selected. Hence, our network with \(K \in {\mathbb {N}}\) layers can simply be defined by a \((K+2)\)-tuple specifying the channel sizes, e.g., \((c^{(0)}, c^{(1)}, \ldots , c^{(K)},c^{(K+1)})\). The final layer is defined as a linear combination of the features from the previous layer followed by a sigmoid function yielding the desired probability map as output.

The four-layer Riesz network we apply here can be schematically written as \(1 \rightarrow 16 \rightarrow 32 \rightarrow 40 \rightarrow 48 \rightarrow 1\) and has \((1\cdot 5\cdot 16+16)+(16\cdot 5\cdot 32+32)+(32\cdot 5\cdot 40+40)+(40\cdot 5\cdot 48+48)+(48\cdot 1+1)=18\,825\) trainable parameters.
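The quoted figure can be reproduced from the channel sizes: each Riesz layer contributes five coefficients (two first-order and three second-order Riesz transforms) per pair of input and output channels plus one offset per output channel. The short check below (ours) mirrors the formula above and therefore omits the batch normalization parameters.

```python
# Verifying the parameter count of the 1 -> 16 -> 32 -> 40 -> 48 -> 1 Riesz network.
channels = [1, 16, 32, 40, 48]
riesz_layers = sum(c_in * 5 * c_out + c_out
                   for c_in, c_out in zip(channels[:-1], channels[1:]))
final_layer = 48 * 1 + 1            # final linear combination + offset, before the sigmoid
print(riesz_layers + final_layer)   # 18825
```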

Fig. 5

Cracks of width 3 used for training: before (first row) and after cropping (second row). Image sizes are \(256\times 256\) (first row) and \(64 \times 64\) (second row)

4 Experiments and Applications

In this section we evaluate the four-layer Riesz network defined above on the task of segmenting cracks in 2d slices from CT images of concrete. Particular emphasis is put on the network’s ability to segment multiscale cracks and to generalize to crack scales unseen during training. To quantify these properties, we use images with simulated cracks. Being accompanied by an unambiguous ground truth, they allow for an objective evaluation of the segmentation results.

Additionally, in “Appendix A” scale equivariance of the Riesz network is experimentally validated on the MNIST Large Scale data set [5].

Data generation: Cracks are generated by fractional Brownian motion (Experiment 1) or as minimal surfaces induced by the facet system of a Voronoi tessellation (Experiment 2). Dilated cracks are then integrated into CT images of concrete without cracks. As pores and cracks are both air-filled, their gray value distributions should be similar. Hence, the gray value distribution of crack pixels is estimated from the gray value distribution observed in air pores. The crack thickness is kept fixed (Experiment 1) or varies (Experiment 2) depending on the objective of the experiment. As a result, realistic semi-synthetic images can be generated (see Fig. 5). For more details on the simulation procedure, we refer to [6, 10]. Details on the number and size of the images can be found below. Finally, we show the applicability of the Riesz network to real data containing cracks generated by tensile and pull-out tests.

Quality metrics: As metrics for the evaluation of the segmentation results we use precision (P), recall (R), F1-score (or Dice coefficient, Dice), and intersection over union (IoU). The first three quality metrics are based on the numbers of true positives tp (pixels correctly predicted as crack), true negatives tn (pixels correctly predicted as background), false positives fp (pixels wrongly predicted as crack), and false negatives fn (pixels wrongly predicted as background). Precision, recall, and Dice coefficient are then defined via

$$\begin{aligned} P = tp/(tp+fp),\quad R = tp/(tp+fn),\quad \text {Dice} = 2PR/(P + R). \end{aligned}$$

IoU compares union and intersection of the foregrounds X and Y in the segmented image and the corresponding ground truth, respectively. That is

$$\begin{aligned} IoU(X,Y) = \frac{|X\cap Y|}{|X \cup Y|}. \end{aligned}$$

All these metrics have values in the range [0, 1] with values closer to 1 indicating a better performance.
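For binary masks, these metrics can be computed as in the following NumPy helper (a sketch of ours; edge cases such as empty masks are ignored).

```python
# Precision, recall, Dice, and IoU for binary 0/1 masks pred (segmentation)
# and gt (ground truth) of equal shape.
import numpy as np

def segmentation_metrics(pred, gt):
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    dice = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)        # equals |X ∩ Y| / |X ∪ Y| for binary masks
    return precision, recall, dice, iou
```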

Training parameters: If not specified otherwise, all models are trained on cracks of fixed width of 3 pixels. Cracks for the training are generated in the same way as for Experiment 1 on \(256\times 256\) sections of CT images of concrete. Then, 16 images of size \(64\times 64\) are cropped without overlap from each of the generated images. In this way, images without cracks are present in the training set. After data augmentation by flipping and rotation, the training set consists of 1 947 images of cracks. Some examples are shown in Fig. 5. For validation, another set of images with cracks of width 3 is used. The validation data set’s size is a third of the size of the training set.

All models are trained for 50 epochs with initial learning rate 0.001 which is halved every 20 epochs. ADAM optimization [43] is used, while the cost function is set to binary cross entropy loss.

Crack pixels are labeled with 1, while background is labeled with 0. As there are far more background than crack pixels, we deal with a highly imbalanced data set. Therefore, crack and pore pixels are given a weight of 40 to compensate for the class imbalance and to help distinguish between these two types of structures, which hardly differ in their gray values.
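One way to realize this weighting in PyTorch is a pixel-wise weight map passed to the binary cross entropy, as sketched below; the precomputed pore mask, the function name, and the API usage are our assumptions rather than the authors' implementation.

```python
# Weighted binary cross entropy: crack and pore pixels receive weight 40.
import torch
import torch.nn.functional as F

def weighted_bce(prediction, target, pore_mask, weight_value=40.0):
    """prediction, target, pore_mask: tensors of shape (B, 1, H, W), values in [0, 1]."""
    weights = torch.ones_like(target)
    weights[(target > 0.5) | (pore_mask > 0.5)] = weight_value   # cracks and pores
    return F.binary_cross_entropy(prediction, target, weight=weights)
```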

Fig. 6

Measure of scale equivariance \(\Delta _a\) for the four-layer Riesz network with randomly initialized parameters w.r.t. the downscaling factor a. Mean (black), minimum, and maximum (gray) of 20 repetitions. Points on the line correspond to \(a~\in ~\{1,2,4,8,16,32,64\}\)

Table 1 Experiment 1. Ablation study: scale generalization ability of Riesz networks

4.1 Measuring Scale Equivariance

Measures for assessing scale equivariance have been introduced in [36, 38]. For an image or feature map f, a mapping function \(\Phi \) (e.g., a neural network or a subpart of a network), and a scaling function \(L_a\) we define

$$\begin{aligned} \Delta _a(\Phi ) := \frac{|| L_a(\Phi (f)) - \Phi (L_a(f))||_2}{||L_a(\Phi (f)) ||_2}. \end{aligned}$$
(15)

Ideally, this measure should be 0 for perfect scale equivariance. In practice, due to scaling and discretization errors we expect it to be positive yet very small.
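The measure can be computed directly from Eq. (15), e.g., as in the following PyTorch sketch, where bilinear interpolation serves as the scaling operator \(L_a\) (an implementation choice on our part).

```python
# Hedged sketch of the scale equivariance measure Delta_a from Eq. (15).
import torch
import torch.nn.functional as F

def delta_a(phi, f, a):
    """phi: network or layer; f: tensor of shape (1, C, H, W); a: downscaling factor."""
    def L_a(x):
        return F.interpolate(x, scale_factor=1.0 / a,
                             mode="bilinear", align_corners=False)
    scaled_output = L_a(phi(f))          # L_a(Phi(f))
    output_of_scaled = phi(L_a(f))       # Phi(L_a(f))
    return (torch.norm(scaled_output - output_of_scaled)
            / torch.norm(scaled_output)).item()
```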

To measure the scale equivariance of the full Riesz network with randomly initialized weights, we use a data set consisting of 85 images of size \(512\times 512\) pixels with crack width 11 and downscaling factors \(a\in \{2,4,8,16,32,64\}\). The evaluation was repeated for 20 randomly initialized Riesz networks. The resulting values of \(\Delta _a\) are given in Fig. 6.

The measure \(\Delta _a\) was used to validate the scale equivariance of deep scale spaces (DSS) in [36] and scale-equivariant steerable networks in [38]. In both works, a steep increase in \(\Delta _a\) is observed for downscaling factors larger than 16, while for very small downscaling factors, \(\Delta _a\) is reported to be below 0.01. In [38], \(\Delta _a\) reaches 1 for downscaling factor 45. The application scenario studied here differs from those of [36, 38]. Results are thus not directly comparable but can be considered only as an approximate baseline. For small downscaling factors, we find \(\Delta _a\) to be higher than in [38] (up to 0.075). However, for larger downscaling factors \((a>32)\), \(\Delta _a\) increases more slowly, e.g., \(\Delta _{64} = 0.169\). This proves the resilience of Riesz networks to very high downscaling factors, i.e., large changes in scale.

4.2 Experiment 1: Generalization to Unseen Scales

Our models are trained on images of fixed crack width 3. To investigate their behavior on crack widths outside of the training set, we generate cracks of widths \(\{1, 3, 5, 7, 9, 11\}\) pixels in images of size \(512\times 512\), see Fig. 9. Each class contains 85 images. Besides scale generalization properties of the Riesz network, we check how well it generalizes to random variations in crack topology or shapes, too. For this experiment, we will assume that the fixed width is known. This means that the competing methods will use a single scale which is adjusted to this width.

4.2.1 Ablation Study on the Riesz Network

We investigate how the network parameters and the composition of the training set affect the quality of the results, in order to learn how to design this type of neural network efficiently.

Fig. 7

Experiment 1. Effect of the training set size (left), the crack width in the training set (center), and the network depth (right) on generalization to unseen scales. The baseline Riesz network is marked with 1947 (left), w3 (center), and layer 4 (right) and with square symbol \(\square \). Quality metric: IoU

Table 2 Experiment 1. Comparison with competing methods: Dice coefficients for segmentation of cracks of differing width
Fig. 8

Experiment 1. Comparison of the competing methods. Results of the simulation study with respect to crack width. Training on crack width 3. Quality metrics (from left to right): precision, recall, and IoU

Fig. 9

Experiment 1. Columns (from left to right): crack of widths 3, 5, 7, and 11. Rows (from top to bottom): input image, ground truth, Riesz network, plain U-net, U-net with scale adjustment, and Gaussian derivative network. All images have size \(512\times 512\) pixels

Size of training set: First, we investigate the robustness of the Riesz network to the size of the training set. The literature [34] suggests that neural networks based on structured receptive fields are less data hungry, i.e., their performance with respect to the size of the training set is more stable than that of conventional CNNs. Since the Riesz network uses the Riesz transform instead of the Gaussian derivatives as in [34], it is expected that the same holds here, too.

The use of smaller training sets has two main benefits. First, obviously, smaller data sets reduce the effort for data collection, i.e., annotation or simulation. Second, smaller data sets reduce the training time for the network if we do not increase the number of epochs during training.

We constrain ourselves to three sizes of training sets: \(1\,947\), 975, and 489. These numbers refer to the sets after data augmentation by flipping and rotation. Hence, the number of original images is three times smaller. In all three cases we train the Riesz network for 50 epochs and with similar batch sizes (11, 13,  and 11, respectively). Results on unseen scales with respect to data set size are shown in Table 1 and Fig. 7 (left). We observe that the Riesz network trained on the smallest data set is competitive with counterparts trained on larger data sets albeit featuring generally 1–2% lower Dice and IoU.

Choice of crack width for training: There are two interesting questions with respect to crack width. Which crack width is suitable for training of the Riesz network? Do varying crack thicknesses in the training set improve performance significantly?

To investigate these questions, we choose three training data sets with cracks of fixed widths 1, 3, or 5. A fourth data set combines crack widths 1, 3, and 5. We train the Riesz network with these sets and evaluate its generalization performance across scales. Results are summarized in Fig. 7 (center) and Table 1. Crack widths 3 and 5 yield similar results, while crack width 1 seems not to be suitable, except when trying to segment cracks of width 1. Cracks of width 1 are very thin, subtle, and in some cases even disconnected. Hence, they differ significantly from thicker cracks which are 8-connected and have a better contrast to the concrete background. This indicates that very thin cracks should be considered a special case which requires somewhat different treatment. Rather surprisingly, using the mixed training data set does not improve the metrics. Diversity with respect to scale in the training set seems not to be a decisive factor when designing Riesz networks.

Number of layers: Finally, we investigate the explanatory power of the Riesz network depending on network depth and the number of parameters. We train four networks with \(2-5\) layers and \(2\,721\), \(9\,169\), \(18\,825\), and \(34\,265\) parameters, respectively, on the same data set for 50 epochs. The network with 5 layers has structure \(16\rightarrow 32\rightarrow 40\rightarrow 48\rightarrow 64\), and every other network is constructed from this one by removing the required number of layers at the end. Results are shown in Table 1 and in Fig. 7 (right). The differences between the networks with 3, 4, and 5 layers are rather subtle. For the Riesz network with only 2 layers, performance deteriorates considerably (3–5% in Dice and IoU).

In general, Riesz networks appear to be robust with respect to training set size, depth of network, and number of parameters. Hence, it is not necessary to tune many parameters or to collect thousands of images to achieve good performance, in particular for generalization to unseen scales. For the choice of crack width, 3 and 5 seem appropriate while crack width 1 should be avoided.

4.2.2 Comparison with Competing Methods

Competing methods: The four-layer Riesz network is compared to two other methods: Gaussian derivative networks [27] and U-net [44] on either rescaled images [5] or an image pyramid [8]. The Gaussian derivative network uses scale space theory based on the Gaussian kernel and the diffusion equation. Using the \(\gamma \)-normalized Gaussian derivatives from [1], layers of first- and second-order Gaussian derivatives are constructed [27]. We briefly state the differences between our reimplementation and the original work [27]. In order to reduce the computation time, we use a version of the Gaussian network that has a single scale channel corresponding to the training thickness during training, while additional channels with shared weights are added for inference, i.e., for testing the scale generalization. This version of the Gaussian network differs from the original one [27], which has more scale channels during training. Using multiple scale channels in the training step has been found to result in better scale generalization properties if the different scale channels are allowed to compete against each other during the training stage. However, this increases the computational burden compared to a single channel network. In this section, we use a single channel version of the network but with \(\sigma \) adjusted to the crack width. The sparser scale sampling ratio of 2 used in our reimplementation in Sect. 4.3 is, however, expected to lead to lower performance compared to the scale sampling ratio of \(\sqrt{2}\) used in the original work.

U-net has around 2.7 million parameters, while the Gaussian derivative network has the same architecture as the Riesz network and hence the same number of parameters (18k).

We design an experiment for detailed analysis and comparison of the ability of the methods to generalize to scales unseen during training. In typical applications, the thickness of the cracks would not be known. Here, crack thickness is kept fixed such that the correct scale of cracks is a priori known. This allows for a selection of an optimal scale (or range of scales) such that we have a best case comparison. For the Gaussian derivative network, scale is controlled by the standard deviation parameter \(\sigma \) which is set to the half width of the crack. Here, we have avoided the inference of the Gaussian network with multiple scale channels and have used the assumption that the scale is a priori known. For the U-net, scale is adjusted by downscaling the image to match the crack width used in the training data. Here, we restrict the downscaling to discrete factors in the set \(\{2,4,8,... \}\) that were determined during validation. For widths 1 and 3, no downscaling is needed. For width 5, the images are downscaled by 2, for width 7 by 4, and for widths 9 and 11 by 8. For completeness, we include results for the U-net without downscaling denoted by "U-net plain". Table 2 yields the prediction quality measured by the Dice coefficient, while the other quality measures are shown in Fig. 8. Exemplary segmentation results are shown in Fig. 9.

As expected, the performance of the plain U-net decreases with increasing scale. Scale adjustment stabilizes the U-net’s performance but requires manual selection of scales. Moreover, the interpolation in upsampling and downsampling might induce additional errors. The decrease in performance with growing scale is still apparent (10–15%) but significantly reduced compared to the plain U-net \((55\%)\). To get more insight into the performance and characteristics of the U-net, we add an experiment similar to the one from [5]: We train the U-net on crack widths 1, 3, and 5 on the same number of images as for one single crack width. This case is referred to as "U-net-mix scale adj." in Table 2. Scales are adjusted similarly: w5 and w7 are downscaled by factor 2, w9 and w11 are downscaled by factor 4. The results are significantly better than those obtained by the U-net trained on a single width (10–15% in Dice and IoU on unseen scales), but still remain worse than those of the Riesz network trained on a single scale (around \(7\%\) in Dice and IoU on unseen scales).

Table 3 Experiment 2. Performance on simulated multiscale cracks

The Gaussian derivative network is able to generalize steadily across the scales (Dice and IoU \(74\%\)) but nevertheless performs worse than the scale-adjusted U-net (around \(10\%\) in IoU). Moreover, it is very sensitive to noise and typical CT imaging artifacts (Fig. 9).

On the other hand, the Riesz network’s performance is very steady with growing scale. We even observe improving performance in IoU and Dice with increase in crack thickness. This is due to pixels at the edge of the crack influencing the performance metrics less and less the thicker the crack gets. The Riesz network is unable to precisely localize cracks of width 1 as, due to the partial volume effect, such thin cracks appear to be discontinuous. With the exception of the thinnest crack, the Riesz network has Dice coefficients above \(94\%\) and IoU over \(88\%\) for completely unseen scales. This even holds for the cases when the crack is more than 3 times thicker than the one used for training.

4.3 Experiment 2: Performance on Multiscale Data

Since cracks are naturally multiscale structures, i.e., crack thickness varies as the crack propagates, the performance of the considered methods on multiscale data is analyzed as well. On the one hand, we want to test on data with an underlying ground truth without relying on manual annotation prone to errors and subjectivity. On the other hand, the experiment should be conducted in a more realistic and less controlled setting than the previous one, with cracks as similar as possible to real ones.

We therefore use again simulated cracks, this time however with varying width. The thickness is modeled by an adaptive dilation. See Fig. 10 for realization examples. What renders this experiment more realistic than the first one is that no prior information about the scale is exploited. The Riesz network does not have to be adjusted for this experiment, while the competing methods require scale selection as described in Sect. 4.2.2. Without knowing the scale, testing several configurations is the only option. See “Appendix B” for examples. Note that in this experiment we use a different crack simulation technique [10] than in Experiment 1. In principle, we cannot claim that either of the two techniques generates more realistic cracks. However, this change serves as an additional goodness check for the methods since the two simulation techniques can be seen as independent.

Fig. 10

Experiment 2. Cracks with varying width. From left to right: input image, results of the Riesz network and the U-net with four pyramid levels. Image size \(400 \times 400\) pixels

Fig. 11

Experiment 3. Real cracks in concrete: slice from input CT image, results of the Riesz network and of U-net with two pyramid levels. Image sizes are \(832 \times 1088\) (1st row) and \(544 \times 992\) (2nd row)

We adjust the U-net as follows: We downscale the image by several factors from \(\{2,4,8,16,\ldots \}\). The forward pass of the U-net is applied to the original and every downscaled image. Subsequently, the downscaled images are upscaled back to the original size. All predictions are joined by the maximum operator. We report results for several downscaling factor combinations specified by a number N, which is the number of consecutive downscaling factors used, starting at the smallest factor 2. As in Experiment 1, we report results of two U-net models: the first model is trained on cracks of width 3 like the other models in the comparison. The second model is trained on cracks with mixed widths. Including more crack widths in the training set has proven to improve the scale generalization ability in Experiment 1. Hence, the second model represents a more realistic setting that would be used in practice where the crack width is typically unknown. We denote the respective networks as "U-net pyramid" N and "U-net-mix pyramid" N.

For the Gaussian network, we vary the standard deviation parameter \(\sigma \) in the set \(\{1.5,3,6,12\}\). This selection of scales is motivated by the network having been trained on crack width 3 with \(\sigma = 1.5\). We start with the original \(\sigma \) and double it in each step. Note that in the related study on the scale sampling for scale-channel deep networks [5], using a scale sampling factor of \(\sqrt{2}\) was found to lead to better performance than using a scale sampling factor of 2. Hence, additional experimentation with hyperparameters of the method might improve the results. We decided to keep the sampling scheme as similar as possible to the one from U-net. The reason is that U-net with downscaling was more extensively tested on the crack segmentation task [8, 45]. As for the U-net, we test several configurations, now specified by the number N of consecutive \(\sigma \) values used, starting at the smallest (1.5). We denote the respective network "Gaussian network" N.

Results are reported in Table 3 and Fig. 10. We observe a clear weakness of the Riesz network in segmenting thin cracks (Fig. 10, first and last row). Despite this, the recall is still quite high (\(90\%\)). However, this could be due to thicker cracks, which are handled very well, contributing more strongly to these statistics as they occupy more pixels. Nevertheless, the Riesz network deals with the problem of the wide range of scales without sampling the scale dimension, in just a single forward pass of the network.

The performance of the U-net improves when including more levels in the pyramid, too. However, this applies only up to a certain number of levels after which the additional gain becomes minimal. Moreover, applying the U-net on downscaled images seems to induce oversegmentation of the cracks (Fig. 10, second and third row). Including a variety of crack widths in the training set improves the overall performance of the U-net in all metrics. This confirms the hypothesis that the U-net significantly benefits from variations in the training set. However, this model of the U-net is still outperformed by the Riesz network trained on a single crack width. The Gaussian network behaves similarly to the U-net, with slightly worse performance (according to Dice or IoU) but better crack coverage (recall). As the number of \(\sigma \) values grows, the recall increases but at the same time artifacts accumulate across scales, reducing precision. The best balance on this data set is achieved with three scales.

4.4 Experiment 3: Application to Cracks in CT Images of Concrete

Finally, we check the methods’ performance on real data: cracks in concrete samples generated by tensile and pull-out tests. In these examples, the crack thickness varies from 1 or 2 pixels to more than 20 pixels (Fig. 11). This motivates the need for methods that automatically generalize to completely unseen scales. Here, we can assess the segmentation results only qualitatively, as no ground truth is available. Manual segmentation of cracks in high-resolution images is time-consuming and prone to individual biases. Additional experiments on real cracks in different types of concrete are shown in “Appendix C”.

The first sample (Fig. 11, first row) is a concrete cylinder with a glass fiber-reinforced composite bar embedded along the center line. A force is applied to this bar to pull it out of the sample and thus initiate cracking. Cracking starts around the bar and branches in three directions: left, right diagonal, and down (very subtle, thin crack). Crack thicknesses and thus scales vary depending on the crack branch. As before, our Riesz network is able to handle all but the finest crack thicknesses efficiently in a single forward pass without specifying the scale range. The U-net on the image pyramid requires a selection of downsampling steps (Appendix B), accumulates artifacts from all levels of the pyramid, and slightly oversegments thin cracks (left branch).

The second sample (Fig. 11, second row) features a horizontal crack induced by a tensile test. Here we observe permanently changing scales, similar to our simulated multiscale data. The crack thickness varies from a few to more than 20 pixels. Once more, the Riesz network handles the scale variation well and segments almost all cracks with minimal artifacts. In this example, U-net covers the cracks well, too, even the very subtle ones. However, it accumulates more false positives in the areas of concrete without any cracks than the Riesz network.

5 Conclusion

In this paper, we introduced a new type of scale-invariant neural network based on the Riesz transform as filter basis instead of standard convolutions. Our Riesz neural network is scale-invariant in one forward pass without specifying scales or discretizing and sampling the scale dimension. Its ability to generalize to scales differing from those trained on is tested and validated in segmenting cracks in 2d slices from CT images of concrete. The usefulness of the method becomes manifest in the fact that only one fixed scale is needed for training, while generalization to completely unseen scales is preserved. This reduces the effort for data collection, generation, or simulation. Furthermore, our network has relatively few parameters (around 18k) which reduces the danger of overfitting.

Experiments on simulated yet realistic multiscale cracks as well as on real cracks corroborate the Riesz network’s potential. Compared to other deep learning methods that can generalize to unseen scales, the Riesz network yields improved, more robust, and more stable results.

A detailed ablation study on the network parameters reveals several interesting features: This type of network requires relatively little data to generalize well. The Riesz network proves to perform well on a data set of approximately 200 images before augmentation. This is particularly useful for deep learning tasks where data acquisition is exceptionally complex or expensive. The performance was analyzed as a function of the depth of the network and the number of parameters. Only three layers of the network suffice to achieve good performance on cracks in 2d slices of CT images. Furthermore, the choice of crack thickness in the training set is found to be not decisive for the performance. Training sets with crack widths 3 and 5 yield very similar results.

The two main weaknesses of our approach in the crack segmentation task are undersegmentation of thin cracks and edge effects around pores. In CT images, thin cracks appear brighter than thicker cracks due to the partial volume effect reducing the contrast between crack and concrete. For the same reason, thin cracks appear discontinuous. Thin cracks might therefore require special treatment. In some situations, pore edge regions get erroneously segmented as crack. These can, however, be removed by a post-processing step and are not a serious problem.

To unlock the full potential of the Riesz transform, validation on other types of problems is needed. Furthermore, scaling the Riesz network to larger, wider, and deeper models remains an open topic. Our study as well as previous ones [34, 46, 47] imply that small models based on linear combinations of convolutions with fixed filters can yield results comparable to those of large CNN models. However, in order to state this reliably, training convergence, expressiveness, and run-times of these types of networks have to be compared systematically to those of CNNs.

In the future, the method should be applied in 3d since CT data is originally 3d. In this case, memory issues might occur during the discretization of the Riesz kernel in frequency space.

An interesting topic for further research is to combine translation and scale invariance with rotation invariance to design a new generation of neural networks with encoded basic computer vision properties [40]. This type of neural network could be very efficient because it would have even fewer parameters and hence would require less training data, too.