FRD-Net: a full-resolution dilated convolution network for retinal vessel segmentation

Accurate and automated retinal vessel segmentation is essential for the diagnosis and surgical planning of retinal diseases. However, conventional U-shaped networks often produce segmentation errors on fine and low-contrast vessels. Successive down-sampling in the encoding stage discards spatial resolution, and the decoding stage cannot recover the lost information. To address this issue, this paper introduces an effective full-resolution retinal vessel segmentation network, FRD-Net. It consists of two core components: a backbone network and a multi-scale feature fusion module (MFFM). The backbone network expands horizontally and vertically through an interaction mechanism among multi-resolution dilated convolutions while preserving the full image resolution. Within the backbone, two elements work together: dilated convolutions with varying dilation rates, and dilated residual modules that integrate multi-scale feature maps from adjacent stages. Together they enable continuous learning of multi-scale features and enrich high-level contextual information. MFFM further improves segmentation by fusing deeper multi-scale features with the original image, which helps recover edge details for accurate vessel segmentation. In tests on multiple classical datasets, FRD-Net achieves superior performance and generalization compared with state-of-the-art segmentation algorithms while using fewer model parameters.


Introduction
Diabetic retinopathy, glaucoma, and age-related macular degeneration are major causes of blindness in the elderly [1]. In clinical practice, physicians diagnose these retinal diseases by analyzing the morphology of blood vessels. They also plan surgeries and guide interventions based on the structure and location of the vessels [2,3]. However, the limitations of imaging devices and the inherent characteristics of biological tissue mean that the acquired images often fail to represent this structural information accurately. As a result, experienced clinicians must annotate the retinal vessels manually, a time-consuming and tedious task. This burden is especially critical in acute cases that require timely treatment [4]. Automatic retinal vessel segmentation is therefore of significant importance and has become a research hotspot in computer-aided medical diagnosis in recent years.
Early research on retinal vessel segmentation focused primarily on mathematical morphology methods [5], matched filtering methods [6], multi-scale methods [7], and region growing methods [8]. These methods make their final predictions using manually designed feature extractors. However, fundus images contain diverse and complex backgrounds, such as regions where vessels have low contrast, so these methods tend to misclassify vessels as background. Benefiting from data-driven learning and advances in computing hardware, deep learning methods have made significant progress in retinal vessel segmentation [9,10]. Deep neural networks (DNNs) offer automatic feature learning and end-to-end training, and by leveraging these capabilities they have significantly improved segmentation accuracy [11][12][13]. In particular, since the introduction of the landmark U-Net architecture [14], many strong variants for vessel segmentation have emerged [15][16][17]. Although these approaches achieve good scores on standard evaluation metrics, they still struggle to segment fine vessels accurately within complex backgrounds.
We conducted experiments on CE-Net to investigate the loss of detailed information, such as edges and textures, in the encoder-decoder structure. Vessel images differ significantly from other medical images, such as cardiac and cellular images: vessels are relatively thin and occupy a small proportion of the image pixels, especially in capillary regions (see Fig. 1(a)). Traditional encoder-decoder segmentation networks use down-sampling operations to expand the receptive field [18] and reduce computational complexity, which discards valuable spatial information. As a result, extracting semantic information about small vessels in low-contrast areas is challenging for the model. In the decoder, up-sampling layers struggle to recover these fine structures, so the network tends to identify larger vessels while overlooking smaller ones (see Fig. 1(c)). When the number of down-sampling rounds is reduced, information about some of the smaller vessels is preserved (see Fig. 1(d) and (e)).
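How repeated down-sampling weakens thin structures can be illustrated with a toy example. This sketch is hypothetical and is not the CE-Net experiment described above. It assumes 2×2 average pooling as the down-sampling operation, which is one common choice. It uses a synthetic image containing a single 1-pixel-wide vessel and tracks the vessel's peak intensity after each of three down-sampling rounds.

```python
def avg_pool2x2(img):
    """2x2 average pooling with stride 2; height and width are assumed even."""
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
            for x in range(0, len(img[0]), 2)
        ]
        for y in range(0, len(img), 2)
    ]


# Hypothetical 16x16 image: background 0, a 1-pixel-wide vertical vessel of intensity 1 at column 5
img = [[1.0 if x == 5 else 0.0 for x in range(16)] for _ in range(16)]

peaks = []
for _ in range(3):  # three down-sampling rounds
    img = avg_pool2x2(img)
    peaks.append(max(max(row) for row in img))

print(peaks)  # [0.5, 0.25, 0.125]
```

Each round halves the peak response of the 1-pixel line. After three rounds it retains only one eighth of its original contrast against the background, while wider structures would survive averaging far better.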
Motivated by these challenges, this paper proposes an effective full-resolution retinal vessel segmentation network called FRD-Net. It comprises two core components: the backbone network and the Multi-Scale Feature Fusion Module (MFFM). Existing methods attempt to alleviate the loss of spatial information caused by excessive down-sampling through multi-scale output strategies, but limitations persist. To tackle this problem, the backbone network replaces pooling-based down-sampling with convolutional down-sampling, reducing the loss of spatial detail incurred by pooling. Reducing the number of down-sampling rounds mitigates detail loss, but it also shrinks the network's receptive field. To balance detail preservation against receptive-field size, we use three down-sampling rounds. To mitigate detail loss further, the backbone network applies dilated convolutions whose dilation rate first increases and then decreases in both the horizontal and vertical directions. Increasing the dilation rate at first balances a large receptive field with high spatial resolution. However, because dilated kernels are sparse, increasing the rate further fails to aggregate local features, which leads to the loss of information about fine vessels. Reducing the dilation rate afterwards restores spatial consistency and addresses this issue. Additionally, we introduce MFFM to fuse shallow multi-scale features with deeper multi-scale information. The shallow features complement the deep ones and restore more vessel edge details, enabling precise retinal vessel segmentation. Specifically, the main contributions of this paper are as follows:
1. We propose a new and effective full-resolution retinal vessel segmentation network, named FRD-Net, which consists of interconnected multi-resolution dilated convolution layers. FRD-Net iteratively learns full-resolution representations to mitigate the loss of spatial information. In addition, an effective sequence of dilated convolutions compensates for the limitations of dilated convolutions and preserves fine vessel details.
2. To integrate multi-scale features and enhance segmentation performance, we introduce MFFM to extract vessel structural information at different scales. MFFM allows detail-rich features from shallow layers to be transmitted directly to deeper layers, protecting both thick and thin vessels from down-sampling degradation. It also suppresses background noise while retaining edge details, further improving the precision of the segmentation results.
The remainder of this paper is organized as follows. The second section discusses representative deep learning methods for retinal vessel segmentation, and the third section details the proposed methodology and network architecture. The fourth section reports the experimental parameter settings, datasets, and data preprocessing. The fifth section presents the experimental results and ablation studies, followed by a comparative analysis against other state-of-the-art methods. The sixth section concludes the paper.

Retinal vessel segmentation
In recent years, deep learning has become the predominant approach for retinal vessel segmentation, and U-Net is one of the most widely applied deep learning frameworks in medical image segmentation. Numerous researchers have employed U-Net for retinal vessel segmentation [14]. Compared with traditional unsupervised methods, U-Net-based approaches learn complex features automatically, improving the accuracy of retinal vessel segmentation. To address the loss of spatial information caused by down-sampling in deep convolutional neural networks, U-Net introduced skip connections that fuse low-level and high-level features. Although U-Net models benefit from this feature fusion, some spatial information from the shallow encoder stages remains difficult to recover in the decoder. Moreover, U-Net struggles with fine and irregular retinal vessel structures and with the scarcity of annotated data. Recent research has therefore focused on improving U-Net. For instance, Guo, Pei, et al. [19] proposed SD-Net, an enhanced U-Net model that introduces the DropBlock structure into the convolutional blocks of U-Net to alleviate overfitting. SA-Net [20] added batch normalization (BN) layers to the convolutional blocks of SD-Net and used a spatial attention mechanism to enhance the network's representational capacity. These methods further improved retinal vessel segmentation. However, they mainly addressed overfitting caused by limited annotated data and neglected the irregular characteristics of retinal vessel structures. Mou et al. addressed the curvilinear structure of vessels with CS²-Net, which uses 1×3 and 3×1 convolutions in two directions to capture vessel morphology. It also emphasizes regions of interest through channel and spatial attention mechanisms [21]. Gu, Cheng et al. targeted the spatial information loss caused by consecutive convolution and pooling operations with CE-Net [12]. Its context extractor module, built from dense dilated convolution blocks and residual multi-kernel pooling blocks, captures more contextual information. These methods add extra modules between the U-Net encoder and decoder to obtain more high-level semantic features, greatly improving the model's ability to segment blood vessels. However, they overlook the importance of low-level spatial information for segmenting fine retinal vessels. To preserve low-level features rich in spatial information, Wang et al. [22] introduced a spatial refinement path and a semantic refinement path that fuse vessel features of different resolutions and levels. However, the method fails to fully fuse low-level and high-level semantic features, and some features that represent fine vessels are lost. Inspired by these works, we made targeted improvements to the network architecture. These include multi-scale fusion, changes to the pooling method and the number of pooling layers, and the application of the DropBlock structure.

High/full-resolution network
In semantic segmentation, high- and full-resolution networks typically have deeper and more complex structures that let them learn richer feature representations, thus preserving more local and contextual detail. Sun et al. [23,24] proposed the High-Resolution Network (HRNet) for tasks such as human pose estimation, object detection, and semantic segmentation. HRNet uses multi-resolution fusion and cross-stage connections to exploit information at different scales, improving its understanding of image semantics. Its feature learning mechanism maintains high-resolution representations throughout while exploiting multi-scale features, which improves both performance and efficiency. UNet++ [25,26] redesigns the skip connections as dense, nested pathways to reduce the semantic gap between the feature maps of the encoder and decoder sub-networks, maintaining richer feature representations. A pruning method is also designed to accelerate its inference. To address discontinuities in vessel segmentation results, Liu et al. proposed FR-UNet [27]. It expands horizontally and vertically through a multi-resolution convolutional interaction mechanism while preserving the full image resolution. It then applies a Dual Threshold Iterative (DTI) algorithm to extract fine vessel pixels and improve vessel connectivity. Previous studies have shown that full- and high-resolution convolutional networks perform well in medical image segmentation. However, these networks still suffer from redundant skip connections, complex architectures, and large parameter counts and computational costs. We aim to address these issues and apply full-resolution networks to retinal vessel segmentation to achieve better segmentation performance.
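The dual-threshold idea behind DTI can be sketched as hysteresis-style region growing. This is a hypothetical simplification and not FR-UNet's published implementation. The function name, the 8-connectivity rule, and the threshold values in the example are all assumptions. High-confidence pixels seed the vessel mask. Pixels whose probability lies between the two thresholds are then added iteratively whenever they touch an accepted vessel pixel.

```python
def dual_threshold(prob, low, high, max_iter=100):
    """Hysteresis-style dual thresholding of a 2-D probability map (list of lists).

    Pixels >= high are accepted as vessels; pixels in [low, high) are accepted
    iteratively if any 8-connected neighbour is already accepted.
    """
    h, w = len(prob), len(prob[0])
    mask = [[p >= high for p in row] for row in prob]
    for _ in range(max_iter):
        changed = False
        for y in range(h):
            for x in range(w):
                if mask[y][x] or prob[y][x] < low:
                    continue  # already a vessel, or too unlikely to ever be one
                # look for an accepted pixel in the 3x3 neighbourhood
                if any(
                    mask[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                ):
                    mask[y][x] = True
                    changed = True
        if not changed:  # fixed point reached: no candidate touches a vessel
            break
    return mask


prob = [[0.9, 0.5, 0.5, 0.1, 0.5]]
print(dual_threshold(prob, low=0.4, high=0.8))  # [[True, True, True, False, False]]
```

The isolated 0.5 pixel at the right end is rejected because no chain of candidate pixels connects it to a high-confidence seed. This is how a dual-threshold scheme can recover faint vessel segments without admitting scattered noise.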

Network architecture
This paper proposes a simple yet effective retinal vessel segmentation model, and Fig. 2 illustrates the complete architecture of the proposed FRD-Net. FRD-Net consists of two components: the backbone network and MFFM. The backbone network is responsible for extracting retinal vessel information at different scales. The outputs of the different layers of the backbone network are used as inputs to MFFM to generate the final fusion result. The following sections explain the backbone network and MFFM in detail.

Backbone network
As shown in Fig. 2, the backbone network adopts a three-layer architecture in which each layer uses dilated residual convolutional blocks with distinct dilation rates to extract retinal vessel features. An interaction mechanism expands the network horizontally and vertically, and the effective sequence of dilated convolutions is maintained in both directions. This approach preserves the details of fine vessels while retaining the full image resolution.

Effective sequential utilization of dilated convolutions
In many encoder-decoder models, a common strategy to balance large receptive fields and high spatial resolution is to replace traditional convolutions with dilated convolutions. However, stacking dilated convolutions consecutively gives rise to two issues. (1) Gridding effect: during feature learning, not every pixel within the receptive field contributes to the computation. The extracted information therefore becomes discontinuous, which hampers the learning of fine vessel details in the retina. (2) Weak correlation between distant information: a dilated kernel samples widely spaced pixels, so it is suited to capturing long-range information. Consequently, high dilation rates are effective mainly for segmenting coarse vessels and provide limited benefit for fine vessels. It is therefore essential to choose carefully the order in which dilated convolutions are applied when segmenting retinal vessels.
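The gridding effect can be checked numerically with a simplified one-dimensional model. This sketch assumes stride-1, 3-tap kernels and ignores the actual weights. It tracks only which input offsets can reach a given output pixel after a stack of dilated convolutions with the given rates. A "gap" is a position inside the receptive field that never contributes. The helper names are illustrative.

```python
def contributing_offsets(rates, kernel_size=3):
    """Input offsets (relative to one output pixel) that can influence it
    after stacking stride-1 dilated convolutions with the given rates."""
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        # a dilated kernel reads positions k * r for k in [-half, half]
        offsets = {o + k * r for o in offsets for k in range(-half, half + 1)}
    return offsets


def has_gaps(rates, kernel_size=3):
    """True if some position inside the (symmetric) receptive field never
    contributes, i.e. the stack suffers from the gridding effect."""
    offsets = contributing_offsets(rates, kernel_size)
    extent = max(offsets)
    return len(offsets) < 2 * extent + 1


for rates in ([2, 2, 2], [2, 4, 8], [1, 2, 3], [1, 2, 5]):
    print(rates, "gaps" if has_gaps(rates) else "dense")
# [2, 2, 2] gaps
# [2, 4, 8] gaps
# [1, 2, 3] dense
# [1, 2, 5] dense
```

Stacks whose rates are all even only ever sample even offsets, so half of the pixels in their receptive field are ignored. Starting from rate 1 and growing the rate gradually, as in [1, 2, 3], covers every position.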
Handling vessels of widely varying thickness is a critical challenge in designing dilated convolutional networks. The Hybrid Dilated Convolution (HDC) structure proposed in [28] addresses the issue of information discontinuity with two key design principles. First, the dilation rates of stacked convolutions should not share a common factor greater than 1. Second, the dilation rates follow a sawtooth pattern, such as [1,2,5,1,2,5]. For fine vessels, the sparse connectivity of dilated kernels matters: as illustrated in Fig. 3, increasing the dilation rate further reduces the spatial consistency between adjacent information units [29]. We therefore increase the dilation rates gradually (1, 2, 3) to slow this decline in spatial consistency and to facilitate its subsequent recovery. The decline in spatial consistency also poses challenges for higher-level information units. These units can capture only partial information from non-overlapping units, which makes local structural information hard to extract and can cause information loss. To address this issue, we decrease the dilation rates after increasing them, similar to HDC, to restore spatial consistency. The difference is that we use a symmetric dilated convolution sequence for the rate reduction. This sequence reconnects the information pyramids of adjacent units and enables the extraction of higher-level local structures [29]. Specifically, as depicted in Fig. 3, we propose the horizontal dilation rate sequence 1,2,3,2,1,3. The proposed backbone network uses an interaction mechanism that applies dilated convolution sequences in both the horizontal and vertical directions. In the horizontal direction, the dilated convolution sequence balances the demands of a large receptive field and high spatial resolution in vessel segmentation. In the vertical direction, the interaction mechanism realizes the designed dilated convolution sequence across resolutions. This enables cross-resolution information sharing and feature fusion, improving the detection of small local structures such as tiny blood vessels.
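The receptive-field growth of the proposed horizontal sequence can be worked out directly. This sketch assumes 3×3 kernels, stride 1, and one dilated convolution per stage along a single row, and it ignores the vertical (cross-resolution) paths. Under these assumptions, a kernel with dilation r widens the receptive field by 2r pixels per axis. The sketch also computes the greatest common divisor of the rates, which corresponds to the first HDC principle.

```python
from functools import reduce
from math import gcd


def receptive_fields(rates, kernel_size=3):
    """Receptive field (pixels per axis) after each stride-1 dilated convolution.

    A k-tap kernel with dilation r spans (k - 1) * r + 1 pixels, so each
    layer widens the receptive field by (k - 1) * r.
    """
    rf, history = 1, []
    for r in rates:
        rf += (kernel_size - 1) * r
        history.append(rf)
    return history


def common_factor(rates):
    """Greatest common divisor of the dilation rates (HDC asks for 1)."""
    return reduce(gcd, rates)


sequence = [1, 2, 3, 2, 1, 3]        # horizontal sequence given in the text
print(receptive_fields(sequence))     # [3, 7, 13, 17, 19, 25]
print(common_factor(sequence))        # 1
print(common_factor([2, 4, 8]))       # 2: the rates share a factor, so gaps appear
```

Under these assumptions, the sequence reaches a 25-pixel receptive field per axis while its rates share no common factor, so no positions inside that field are skipped.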

Multi-resolution interaction mechanisms and residual modules
Figure 4 illustrates the structure of the backbone network. Similar to HRNet, the backbone network uses several convolutional operations to expand horizontally and vertically: 2×2 convolution and transposed convolution, and dilated convolutions with different dilation rates. We also introduce a multi-resolution interaction mechanism at each feature map stage to exchange information between adjacent stages. Shallow stages contribute fine-grained detail, while deeper stages enrich the high-level contextual information and local receptive fields of the feature maps. The feature fusion module of FR-UNet [27] employs multiple parallel dilated convolutions. In contrast, to reduce the parameter count and maintain the effective sequence of dilated convolutions, we use only dilated residual modules (Fig. 4) to extract features from the horizontally and vertically concatenated feature maps. This helps avoid semantic misinterpretations that a fixed receptive field can cause during feature learning, while also reducing the overall number of parameters. Our multi-resolution interaction mechanism (highlighted by the red box in Fig. 4) operates as follows:
$$
X_{i,j}=\begin{cases}
\mathcal{R}\left(\left[X_{i,j-1},\,U(X_{i+1,j-1})\right]\right), & \text{if row } i \text{ is the top (full-resolution) row},\\
\mathcal{R}\left(\left[X_{i,j-1},\,D(X_{i-1,j-1})\right]\right), & \text{if row } i \text{ is the bottom row},\\
\mathcal{R}\left(\left[X_{i,j-1},\,D(X_{i-1,j-1}),\,U(X_{i+1,j-1})\right]\right), & \text{otherwise},
\end{cases}
$$
where $D(\cdot)$ and $U(\cdot)$ denote the down-sampling and up-sampling operations, respectively, $[u, v, \ldots]$ denotes concatenation, and $\mathcal{R}(\cdot)$ denotes the dilated residual module. As illustrated in Fig. 4, $X_{i,j}$ represents a fusion stage of feature maps, where $i$ and $j$ index the rows and columns of the backbone network. Depending on the hierarchical level of the fusion stage, feature fusion falls into three cases:
(1) concatenation of the output of $X_{i,j-1}$ and the up-sampled output of $X_{i+1,j-1}$;
(2) concatenation of the output of $X_{i,j-1}$ and the down-sampled output of $X_{i-1,j-1}$;
(3) concatenation of the output of $X_{i,j-1}$, the down-sampled output of $X_{i-1,j-1}$, and the up-sampled output of $X_{i+1,j-1}$.
The concatenated output in each case serves as the input to a dilated residual module.
The structure of the backbone network is relatively simple, consisting mainly of dilated residual blocks, up-sampling, and down-sampling. Figure 5 illustrates the configuration of the dilated residual blocks with varying dilation rates. Few public retinal datasets are available, which makes overfitting a challenge during training. We therefore introduce the DropBlock structure into the dilated residual blocks as an effective regularization method. These dilated residual modules not only address overfitting but also accelerate convergence during training, improving the model's performance and generalization ability. The up-sampling and down-sampling operations each consist of a convolution layer, a BN layer, and a LeakyReLU activation function. Specifically, down-sampling uses a 2×2 convolution with a stride of 2 to double the number of channels and halve the spatial dimensions. Up-sampling, in turn, uses a 2×2 transposed convolution with a stride of 2 to halve the number of channels and double the spatial dimensions. The channel count in FRD-Net starts at 32 and doubles at each successive level to meet the feature extraction requirements of the different levels.
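The channel and size bookkeeping for the three fusion cases can be checked with a small sketch. It makes several assumptions: row 0 is the full-resolution row, each lower row halves the spatial size and doubles the channel count starting from 32 (as stated above), and the 2×2 convolutions and transposed convolutions use no padding. The number of rows and the 64-pixel input size in the example are hypothetical, as is the helper name `stage_inputs`. The function lists the tensors concatenated before a dilated residual module together with their total channel count.

```python
def conv_out(size, kernel=2, stride=2):
    """Output length of an unpadded strided convolution: floor((n - k) / s) + 1."""
    return (size - kernel) // stride + 1


def transposed_conv_out(size, kernel=2, stride=2):
    """Output length of an unpadded transposed convolution: (n - 1) * s + k."""
    return (size - 1) * stride + kernel


def stage_inputs(i, num_rows, base_channels=32, height=64):
    """Tensors concatenated to form the input of stage X[i][j].

    Returns a list of (name, channels, spatial size) and the total channel
    count. Row 0 is full resolution; each lower row halves the spatial size
    and doubles the channels.
    """
    channels = [base_channels * 2 ** r for r in range(num_rows)]
    sizes = [height // 2 ** r for r in range(num_rows)]
    parts = [(f"X[{i}][j-1]", channels[i], sizes[i])]
    if i > 0:  # finer neighbour: 2x2 conv, stride 2, doubles channels
        parts.append((f"D(X[{i - 1}][j-1])", channels[i - 1] * 2, conv_out(sizes[i - 1])))
    if i < num_rows - 1:  # coarser neighbour: 2x2 transposed conv, halves channels
        parts.append((f"U(X[{i + 1}][j-1])", channels[i + 1] // 2, transposed_conv_out(sizes[i + 1])))
    if len({size for _, _, size in parts}) != 1:
        raise ValueError("spatial sizes must match before concatenation")
    return parts, sum(c for _, c, _ in parts)


for i in range(4):  # four rows is a hypothetical choice
    parts, total = stage_inputs(i, num_rows=4)
    print(i, total)
# 0 64
# 1 192
# 2 384
# 3 512
```

Boundary rows concatenate two tensors and interior rows concatenate three. The input width of the dilated residual module therefore differs between the three fusion cases, even though all inputs in a given row share the same spatial size.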

Multi-scale feature fusion module (MFFM)
Retinal vessels exhibit diverse thicknesses and sizes, which places higher demands on accurate segmentation across multiple scales. We therefore introduce the MFFM block, depicted in Fig. 6, to better integrate feature maps from different layers of the backbone network and improve the final segmentation accuracy. Its primary functions are to suppress background noise effectively and to segment retinal vessels finely.

Prior research commonly restored feature maps of all scales to the same scale and then combined them by pixel-wise addition or concatenation. However, these approaches often fail to adequately consider the spatial relationships between feature maps of different scales, which limits segmentation performance. Given the semantic gap between feature maps of different scales, we start the fusion process from the lowest layer of the backbone network, i.e., the smallest-scale feature map. We then progressively up-sample it, concatenate it with the feature map from the layer above, and fuse the result with a 1×1 convolution. This process is repeated until the features are restored to the original image size. To compensate for detail lost during fusion, we feed the original image into MFFM and concatenate it with the final fused feature map from the backbone network. This lets shallow multi-scale information propagate directly to deeper layers, protecting the details of both coarse and fine retinal vessels from the effects of down-sampling. MFFM then applies a traditional convolution together with dilated convolutions of varying dilation rates to the concatenated feature maps. Specifically, we use a traditional 1×1 convolution together with dilated convolutions at rates 1, 2, and 3 to extract multi-scale vessel features, improving the final segmentation accuracy. By fusing features of different scales, the network can capture and represent the details and diversity of retinal vessels more accurately.
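The MFFM branches can be concatenated with each other and with the full-resolution fused map because "same" zero padding keeps the spatial size unchanged. The single-channel, pure-Python sketch below illustrates this. It assumes 3×3 kernels for the dilated branches, since their kernel size is not stated above. It also uses fixed rather than learned weights, and the function names are hypothetical. Real MFFM layers operate on multi-channel tensors.

```python
def dilated_conv2d(img, kernel, rate):
    """Single-channel 2-D dilated convolution (cross-correlation, as in CNN layers).

    Zero padding of rate * (k // 2) on every side keeps the output the same
    height and width as the input.
    """
    h, w = len(img), len(img[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(k):
                for kx in range(k):
                    # tap (ky, kx) reads the pixel (ky - half) * rate rows/cols away
                    sy = y + (ky - half) * rate
                    sx = x + (kx - half) * rate
                    if 0 <= sy < h and 0 <= sx < w:  # outside the image counts as zero
                        acc += kernel[ky][kx] * img[sy][sx]
            out[y][x] = acc
    return out


def mffm_branches(img, kernel_1x1, kernel_3x3, rates=(1, 2, 3)):
    """Hypothetical MFFM-style branch set: one 1x1 convolution plus a 3x3
    dilated convolution per rate, returned as a list of output channels."""
    branches = [dilated_conv2d(img, kernel_1x1, 1)]
    branches += [dilated_conv2d(img, kernel_3x3, r) for r in rates]
    return branches


impulse = [[1.0 if (y, x) == (3, 3) else 0.0 for x in range(7)] for y in range(7)]
ones3 = [[1.0] * 3 for _ in range(3)]
branches = mffm_branches(impulse, [[1.0]], ones3)
print(len(branches), [len(b) for b in branches])      # 4 [7, 7, 7, 7]
print([y for y in range(7) if any(branches[2][y])])   # [1, 3, 5]
```

Feeding in an impulse shows the sampling pattern of each branch. With rate 2, the single bright pixel spreads only to rows and columns 1, 3, and 5. Every branch still returns a 7×7 map, so the outputs can be stacked channel-wise.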
The DRIVE dataset comprises 40 color fundus images of the retina (33 from non-diabetic individuals and 7 from patients with mild diabetic retinopathy), collected from different patients aged 25-90 in the Netherlands.The STARE dataset consists of 20 color fundus images (10 depicting pathological retinas and 10 normal retinas).The CHASE_DB1 dataset includes 28 color fundus images taken from 14 school children, providing binocular retinal color images.
The HRF dataset contains 45 color fundus images (15 from healthy individuals, 15 from patients with diabetic retinopathy, and 15 from patients with glaucoma). For the first three datasets, which provide annotations from two experts, we follow other methods and use the first expert's annotations as labels, which are fed to the network together with the original images. The second expert's annotations serve as the human-observer reference for these three datasets.
Because the four datasets contain only 133 images in total, network training is prone to overfitting. To address this, we applied data augmentation: random horizontal flipping with a probability of 0.5, random vertical flipping with a probability of 0.5, random rotation by an angle in the range [0°, 360°], and random cropping. These operations increased the number of training images fivefold, mitigating overfitting and improving the model's performance. Table 1 summarizes the number of images in each dataset, the training/testing split, the resolutions before and after cropping, and the number of training images after data augmentation.
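A minimal NumPy sketch of the augmentation pipeline, assuming 2-D grayscale images paired with a binary vessel mask; the same random transform must be applied to an image and its label. Nearest-neighbour rotation (which keeps the mask binary), the 48x48 crop size, and reaching the fivefold increase by producing five augmented copies per training image are assumptions made for illustration; the actual crop sizes are those listed in Table 1.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate a 2-D array about its centre with nearest-neighbour sampling; pixels from outside become 0."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # inverse mapping: for every output pixel, locate its source pixel in the input
    src_x = np.cos(t) * (xx - cx) + np.sin(t) * (yy - cy) + cx
    src_y = -np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy
    sx, sy = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def augment_pair(image, label, crop, rng):
    """Apply one random flip/rotation/crop to an image and its vessel mask together."""
    if rng.random() < 0.5:  # horizontal flip, p = 0.5
        image, label = image[:, ::-1], label[:, ::-1]
    if rng.random() < 0.5:  # vertical flip, p = 0.5
        image, label = image[::-1, :], label[::-1, :]
    angle = rng.uniform(0.0, 360.0)  # rotation angle in degrees
    image, label = rotate_nearest(image, angle), rotate_nearest(label, angle)
    h, w = image.shape
    top = rng.integers(0, h - crop + 1)  # random crop position
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop], label[top:top + crop, left:left + crop]


def expand_training_set(pairs, copies=5, crop=48, seed=0):
    """Return `copies` augmented (image, label) pairs for every input pair (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return [augment_pair(img, lab, crop, rng) for img, lab in pairs for _ in range(copies)]


rng = np.random.default_rng(1)
img = rng.random((64, 64))
lab = (img > 0.8).astype(np.uint8)  # toy "vessel" mask
augmented = expand_training_set([(img, lab)])
print(len(augmented), augmented[0][0].shape)  # 5 (48, 48)
```

A production pipeline would usually interpolate the image bilinearly and only the mask with nearest-neighbour sampling; nearest-neighbour is used for both here to keep the sketch short.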

Data preprocessing
Because the color fundus images in the retinal datasets vary considerably in color tone and contrast, they are first converted to grayscale to reduce these interfering factors. In addition, the low contrast between vessels and background often makes vessel details unclear. To address this, Contrast Limited Adaptive Histogram Equalization (CLAHE) [34,35] is applied to enhance the contrast between vessels and background. CLAHE performs histogram equalization locally on small image regions and clips each local histogram to limit noise amplification, which substantially improves local contrast. Finally, gamma correction (GC) is applied as a further contrast enhancement step; GC effectively highlights the darker vessel structures in retinal images [36]. Figure 7 illustrates the result of each of these processing steps.
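The three preprocessing steps can be sketched in NumPy as follows. The sketch is illustrative rather than the authors' code: the BT.601 luma weights, the clip limit, and the gamma value are assumptions, and `clip_limited_equalize` applies the clipped-histogram equalization globally, whereas CLAHE proper applies it per tile and blends neighbouring tiles by bilinear interpolation (OpenCV provides the full algorithm through `cv2.createCLAHE`).

```python
import numpy as np


def to_gray(rgb):
    """(H, W, 3) RGB image -> (H, W) float gray levels in [0, 255], using BT.601 luma weights."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def clip_limited_equalize(gray, clip_limit=0.01):
    """Histogram equalization with a clipped histogram; a global simplification of CLAHE."""
    g = np.clip(np.rint(gray), 0, 255).astype(np.int64)
    hist = np.bincount(g.ravel(), minlength=256).astype(np.float64)
    limit = clip_limit * g.size                     # maximum count allowed per gray level
    excess = np.maximum(hist - limit, 0.0).sum()
    hist = np.minimum(hist, limit) + excess / 256   # redistribute the clipped counts evenly
    cdf = np.cumsum(hist)
    if cdf[-1] == cdf[0]:                           # degenerate histogram: nothing to equalize
        return g.astype(np.float64)
    lut = (cdf - cdf[0]) / (cdf[-1] - cdf[0]) * 255.0  # gray-level lookup table
    return lut[g]


def gamma_correct(img, gamma=1.2):
    """Power-law mapping on [0, 255]: gamma < 1 brightens dark regions, gamma > 1 darkens them."""
    return 255.0 * (np.clip(img, 0.0, 255.0) / 255.0) ** gamma


def preprocess(rgb):
    """Grayscale conversion, clipped histogram equalization, then gamma correction."""
    return gamma_correct(clip_limited_equalize(to_gray(rgb)))


ramp = np.tile(np.arange(256, dtype=np.uint8), (4, 1))
rgb = np.stack([ramp] * 3, axis=-1)  # synthetic gray ramp as an RGB image
out = preprocess(rgb)
print(out.shape, out.min(), out.max())  # (4, 256) 0.0 255.0
```

Clipping the histogram before building the lookup table is what distinguishes contrast-limited equalization from plain equalization: it caps how steeply any gray-level range can be stretched.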

Experimental environment and parameter settings
The experiments were conducted on a Dell system with an Intel Xeon Gold 6226R processor and an NVIDIA RTX A6000 graphics card, running Windows 10 (64-bit); development and testing were carried out in PyCharm Community Edition 2022.2.3 x64. The network models were trained and tested with the open-source PyTorch framework. The network was optimized with the Adam algorithm [37] at a learning rate (lr) of 0.001 for 250 training iterations. For every dataset, DropBlock was configured with a block size of 7 and a keep probability of 0.9.
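The DropBlock setting above (block size 7, keep probability 0.9) can be illustrated with a minimal NumPy sketch of DropBlock as described by Ghiasi et al. (2018). Where DropBlock is inserted in FRD-Net is not stated in this section, and masking each channel independently with blocks lying fully inside the feature map are assumptions. During training, contiguous 7x7 regions are zeroed and the remaining activations are rescaled; at inference the layer is the identity.

```python
import numpy as np


def dropblock(x, block_size=7, keep_prob=0.9, training=True, rng=None):
    """DropBlock on a (C, H, W) feature map; requires H and W >= block_size."""
    if not training or keep_prob >= 1.0:
        return x  # identity at inference time
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = x.shape
    valid_h, valid_w = h - block_size + 1, w - block_size + 1
    # seed probability chosen so that roughly (1 - keep_prob) of the units are dropped
    gamma = (1.0 - keep_prob) / block_size**2 * (h * w) / (valid_h * valid_w)
    seeds = rng.random((c, valid_h, valid_w)) < gamma  # top-left corners of dropped blocks
    drop = np.zeros((c, h, w), dtype=bool)
    for ch, i, j in zip(*np.nonzero(seeds)):
        drop[ch, i:i + block_size, j:j + block_size] = True
    keep = ~drop
    n_kept = keep.sum()
    if n_kept == 0:
        return np.zeros_like(x)
    return x * keep * (keep.size / n_kept)  # rescale by total units / kept units


x = np.ones((4, 48, 48))
y = dropblock(x, rng=np.random.default_rng(0))
print(round(float((y == 0).mean()), 3), float(y.mean()))  # dropped fraction, rescaled mean
```

Dropping whole blocks instead of isolated units matters for convolutional features, because neighbouring activations are strongly correlated and a single dropped pixel can be recovered from its neighbours.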

Evaluation metrics
To quantitatively evaluate the segmentation performance of the proposed model, the manual segmentations provided with each dataset served as the ground truth. Five metrics were used for objective quantitative evaluation: Accuracy (Acc), Sensitivity (Se), Specificity (Sp), F1 score, and the Area Under the ROC Curve (AUC). The metrics are defined as follows:
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Se} = \frac{TP}{TP + FN}$$
$$\mathrm{Sp} = \frac{TN}{TN + FP}$$
$$\mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN}$$
where TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively. These counts are obtained by comparing, pixel by pixel, the retinal vessel segmentation produced by the tested method with the corresponding ground-truth label image. The AUC is the area under the Receiver Operating Characteristic (ROC) curve. All five metrics range from 0 to 1, and an AUC of 0.5 corresponds to a random classifier. Larger values of all five metrics indicate better classification performance, i.e., better retinal vessel segmentation.
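A minimal NumPy sketch of the five metrics, computed over all pixels of a vessel probability map and a binary ground truth. The 0.5 binarization threshold is an assumption, and the sketch counts every pixel, whereas evaluations on these datasets are often restricted to the field of view. AUC is computed with the rank-based (Mann-Whitney U) formulation, which equals the area under the empirical ROC curve.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x; tied values share the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    ranks = np.empty(len(x), dtype=float)
    starts = np.r_[0, np.nonzero(np.diff(xs))[0] + 1]  # first index of each run of equal values
    ends = np.r_[starts[1:], len(x)]
    for s, e in zip(starts, ends):
        ranks[order[s:e]] = (s + 1 + e) / 2.0  # mean of the ranks s+1, ..., e
    return ranks


def segmentation_metrics(prob, gt, threshold=0.5):
    """Pixel-wise Acc, Se, Sp, F1 and AUC of a vessel probability map against a binary label."""
    prob = np.asarray(prob, dtype=float).ravel()
    gt = np.asarray(gt).ravel().astype(bool)
    pred = prob >= threshold
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    n_pos, n_neg = tp + fn, tn + fp
    # AUC = probability that a random vessel pixel scores higher than a random background pixel
    auc = (average_ranks(prob)[gt].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "AUC": auc,
    }


prob = np.array([0.9, 0.8, 0.3, 0.2, 0.6, 0.1])
gt = np.array([1, 1, 1, 0, 0, 0])
print(segmentation_metrics(prob, gt))  # Acc = 4/6, Se = 2/3, Sp = 2/3, F1 = 2/3, AUC = 8/9
```

Unlike the other four metrics, AUC uses the raw probabilities rather than the thresholded mask, so it does not depend on the chosen threshold.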

Loss function
In retinal image segmentation, the choice of loss function is crucial for accurately segmenting essential retinal structures such as vessels and lesions. Background pixels typically make up the majority of a retinal image, while the target structures of interest (e.g., vessels) are a minority, which leads to class imbalance. To train the proposed model, we use the binary cross-entropy (BCE) loss, which measures the difference between the predicted values P(i) and the ground-truth values G(i) and serves as the basis for optimizing the network parameters:
$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[G(i)\log P(i) + \left(1 - G(i)\right)\log\left(1 - P(i)\right)\right]$$
where P(i) ∈ (0, 1) is the predicted probability that pixel i is a vessel pixel (a higher value indicates a higher likelihood), G(i) ∈ {0, 1} is the ground-truth label of pixel i, and N is the total number of pixels in the image.
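The loss above can be written directly in NumPy as a minimal sketch; the clipping constant `eps` is an assumption added to avoid log(0). In PyTorch, the same quantity is provided by `torch.nn.BCELoss` on probabilities or, more stably, by `torch.nn.BCEWithLogitsLoss` on pre-sigmoid logits.

```python
import numpy as np


def bce_loss(p, g, eps=1e-7):
    """Mean binary cross-entropy over N pixels: p are vessel probabilities, g are 0/1 labels."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    g = np.asarray(g, dtype=float)
    return float(-np.mean(g * np.log(p) + (1.0 - g) * np.log(1.0 - p)))


print(bce_loss([0.5, 0.5], [1, 0]))  # log(2) ≈ 0.6931 for an uninformative prediction
print(bce_loss([0.9, 0.1], [1, 0]))  # -log(0.9) ≈ 0.1054 for a confident, correct prediction
```

Because every pixel contributes equally to the mean, the abundant background pixels dominate the loss value in an imbalanced image.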

Quantitative comparison experiments between FRD-Net and other methods
To evaluate FRD-Net, we compared it with widely recognized retinal vessel segmentation methods on four public datasets, using five evaluation metrics: sensitivity (Se), specificity (Sp), accuracy (Acc), F1 score, and area under the curve (AUC). The best results are highlighted in red and the second-best in blue; "N/A" indicates that the corresponding result was not reported in the original paper. We trained and evaluated five methods ourselves, CE-Net [12], SA-Unet [20], CS²-Net [21], FR-Unet [27], and FRD-Net, and compared their performance. For methods without available code, we used the results reported in the corresponding literature.
As shown in Table 2, on the DRIVE dataset FRD-Net outperforms the other methods in Se, Acc, F1, and AUC. CS²-Net achieves the best Sp but lags behind FRD-Net in Se, Acc, F1, and AUC. Because vessels occupy only a small proportion of the pixels, Sp is dominated by the abundant background pixels and is less informative under this severe class imbalance; Se and F1 better reflect overall vessel segmentation performance. CS²-Net falls short of FRD-Net by 1.55% in Se and 0.64% in F1. Table 3 reports the quantitative comparison on the STARE dataset, where FRD-Net achieves the best Sp, Acc, AUC, and F1. Specifically, FRD-Net achieves a 0.45% higher Acc than the second-best SCS-Net. Although FRD-Net's Se is slightly lower than that of DM-Net and SDDC-Net, by 0.19% and 0.23%, respectively, FRD-Net outperforms both methods in Sp, Acc, F1, and AUC. Notably, in Acc and F1, FRD-Net surpasses SDDC-Net by 1.12% and 2.89%, respectively, and DM-Net by 0.49% and 0.58%.
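The following worked example, with hypothetical pixel counts (100,000 pixels, 10% of them vessel), shows why Sp says little about vessel segmentation under class imbalance: a predictor that labels every pixel as background already attains perfect Sp and 90% Acc, while Se and F1 expose the failure.

```python
def pixel_metrics(tp, tn, fp, fn):
    """Acc, Se, Sp and F1 from confusion-matrix counts, rounded to 4 decimals."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return tuple(round(v, 4) for v in (acc, se, sp, f1))


# hypothetical image: 10,000 vessel pixels and 90,000 background pixels
print(pixel_metrics(tp=0, tn=90_000, fp=0, fn=10_000))         # (0.9, 0.0, 1.0, 0.0): all background
print(pixel_metrics(tp=8_000, tn=88_000, fp=2_000, fn=2_000))  # (0.96, 0.8, 0.9778, 0.8)
```

The second hypothetical predictor gives up about 2% of Sp but gains 0.8 in both Se and F1, which is why Se and F1 are the more informative metrics for this comparison.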
Tables 4 and 5 report the quantitative comparison of FRD-Net with other methods on the CHASE_DB1 and HRF datasets. On CHASE_DB1, FRD-Net achieves the best results in every metric except Sp; on HRF, it outperforms the other methods in all five metrics. Although its Sp on CHASE_DB1 is 0.03% lower than that of FR-Unet, FRD-Net performs better in Se, Acc, F1, and AUC. Notably, FRD-Net surpasses FR-Unet by 2.86% in Se (the proportion of vessel pixels correctly segmented) and by 1.1% in F1, which comprehensively measures segmentation performance. On HRF, FRD-Net outperforms SA-Unet by 3.12% in F1.
These results indicate that FRD-Net achieves the highest Acc and F1 on all four public datasets and the highest Se on three of them (all except STARE). FRD-Net is designed to strengthen the extraction of retinal vessels, particularly small vessels and vessels against complex backgrounds. Higher Se indicates a stronger ability to detect vessel details, while higher Acc and F1 indicate better overall retinal vessel segmentation.

Visual comparison of FRD-Net with other methods
To compare segmentation quality visually, Figs. 8 and 9 show local details of the results produced by FRD-Net and four other methods, CE-Net [12], SA-Unet [20], CS²-Net [21], and FR-Unet [27], under the same experimental conditions on the DRIVE, STARE, CHASE_DB1, and HRF datasets. For each dataset, we selected two test images, marked regions containing tiny blood vessels with green and red borders, and displayed these regions enlarged. In Fig. 8, on the DRIVE, STARE, and CHASE_DB1 datasets, the comparison methods miss small vessels or segment them incompletely or discontinuously (red boxes), whereas FRD-Net segments these small vessels accurately. The other methods can also miss or over-segment larger vessels, as in the first STARE image: all methods, including FRD-Net, detect the larger vessel inside the red border, but compared with the ground truth, CS²-Net segments it with an excessively large diameter, whereas FRD-Net reproduces its width accurately.
In the same image, the other methods struggle to segment the small vessels adjacent to the larger vessel, whereas FRD-Net segments them more accurately. Figure 9 compares FRD-Net with the other methods on the HRF dataset, whose higher proportion of small vessels makes it a good test of small-vessel segmentation. In Fig. 9, the other methods again miss small vessels or segment them incompletely or discontinuously (red and green boxes, yellow arrows), whereas FRD-Net segments these vessels accurately. The Acc and F1 scores of each result are also shown next to the corresponding subfigure in Figs. 8 and 9. Taken together, the quantitative scores and the qualitative comparison show that FRD-Net captures vessel details more comprehensively than the comparison methods, segmenting small vessels more accurately while also performing well on larger vessels.

Performance of FRD-Net in challenging areas of retinal vessel segmentation
The intricate, curved structure of retinal vessels, low image contrast, and interference from lesions make retinal vessel segmentation challenging. The many tiny vessels in the peripheral and intermediate regions of the retina have blurred outlines, which makes them particularly hard to identify. To assess the robustness of FRD-Net in such scenarios, Fig. 10 shows selected segmentation results from the four datasets. The first row shows that FRD-Net successfully segments micro-vessels under low contrast. The second and third rows show accurate segmentation of complex, curved vessels in pathological regions, indicating robustness to lesion interference. The fourth and fifth rows show that FRD-Net segments micro-vessels accurately when low contrast and pathology occur together. The quantitative metrics in Fig. 10 confirm that FRD-Net captures fine vessel structures more accurately in these challenging cases, yielding more precise segmentation. Figure 11 compares the FRD-Net segmentation with the ground truth and shows that our method extracts tiny vessels that were not annotated by the first expert. Although the model was trained on the first expert's annotations, at test time it identified more vessels than the first expert had marked. This suggests that FRD-Net has learned a vessel representation that generalizes beyond the individual training labels, so that it still segments tiny vessels in regions where the first expert missed them. The quantitative indicators in Fig. 11 provide further evidence of this behavior.

Computational complexity
Recent studies [48] suggest that increasing the complexity of a network typically enhances its representational capacity, leading to improved performance. However, this might not be the optimal choice in many medical applications, where sufficient computational resources for deploying and running highly complex models are often unavailable in clinical settings.
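A minimal sketch of how convolution parameters are counted, illustrating why dilated convolutions suit a low-parameter design: dilation enlarges the span of a kernel without adding weights. The 32-channel layer width is hypothetical; in PyTorch, a model's total is commonly obtained as `sum(p.numel() for p in model.parameters())`.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a k x k convolution layer (weights plus optional bias)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def kernel_span(k, dilation):
    """Side length of the input window covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


for d in (1, 2, 3):
    print(f"3x3, dilation {d}: span {kernel_span(3, d)}, params {conv_params(32, 32, 3)}")
print(f"plain 7x7: span 7, params {conv_params(32, 32, 7)}")
# a dilation-3 3x3 kernel spans 7 pixels with 9,248 parameters; a 7x7 kernel needs 50,208
```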
The number of model parameters serves as an objective metric for evaluating computational complexity. We compare the proposed FRD-Net with other state-of-the-art methods in terms of complexity by estimating the number of parameters; Fig. 12 shows the F1 and Se scores of each method on the DRIVE dataset together with its number of parameters.
In the ablation study, the U-Net architecture served as the baseline model. We successively introduced the multi-resolution interaction mechanism and residual module (referred to as FR in this section), the effective use of dilated convolutions (DCEU), and the multi-scale feature fusion module (MFFM) into the baseline, resulting in five distinct models. Table 7 summarizes the results and shows the contribution of each module to the model's performance. Compared with the baseline, the multi-resolution interaction mechanism (Baseline + FR) markedly improved overall segmentation performance, increasing Se, Sp, Acc, F1, and AUC by 4.67%, 0.22%, 0.74%, 1.03%, and 0.8%, respectively. Building on Baseline + FR, we then added DCEU and MFFM separately: the former completes the backbone network of FRD-Net, while the latter isolates the role of the MFFM. These models appear in Table 7 as "Baseline + FR + DCEU" and "Baseline + FR + MFFM". For DCEU, we replaced the conventional 3x3 convolutions with dilated convolutions whose dilation rates first increase and then decrease, applied in both the horizontal and vertical directions. This improved all five metrics, most notably Se and F1, by 0.62% and 0.69%, respectively, relative to "Baseline + FR". We posit that combining DCEU with the multi-resolution interaction mechanism aggregates richer contextual information before fusion, which benefits the segmentation of delicate vessels under severe pixel-level class imbalance. Although multi-scale fusion modules are widely used in mainstream network architectures, we modified the internal fusion so that the module co-learns with the original image. Compared with "Baseline + FR", the MFFM passes multi-scale, information-rich features directly from shallow to deep layers, protecting thick and thin vessels from degradation caused by down-sampling. This improved overall performance and yielded the highest F1 score in the ablation study. Finally, we incorporated both DCEU and MFFM (Baseline + FR + DCEU + MFFM) to validate the combined effect of all modules. As shown in Table 7, this model achieves the highest Sp, Acc, and AUC in the ablation study, and its Se and F1 scores are only marginally below the best values. The local visual details of the ablation experiment in Fig. 13 show that each added module further improves segmentation performance. These findings support the efficacy of FRD-Net for vessel segmentation: maintaining full-resolution interaction while adding an effective dilated-convolution sequence and a multi-scale aggregation module is an effective strategy for vessel segmentation.
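The motivation for a carefully ordered dilation schedule can be checked with a short sketch. It lists which 1-D input offsets can reach one output pixel after stacking 3-tap dilated convolutions; for 3x3 kernels the 2-D footprint is the product of this set with itself. Rates sharing a common factor leave holes (the gridding effect), whereas a schedule containing rate 1 covers a contiguous window. The increase-then-decrease schedule [1, 2, 3, 2, 1] is hypothetical; the exact rates used in DCEU are not given in this section.

```python
from itertools import product


def covered_offsets(dilations, k=3):
    """Sorted 1-D input offsets that influence one output pixel after stacked k-tap dilated convs."""
    taps = range(-(k // 2), k // 2 + 1)
    return sorted({sum(t * d for t, d in zip(combo, dilations))
                   for combo in product(taps, repeat=len(dilations))})


def has_holes(offsets):
    """True if some offset inside the covered window can never be reached (gridding)."""
    return len(offsets) != offsets[-1] - offsets[0] + 1


for schedule in ([2, 2, 2], [1, 2, 3], [1, 2, 3, 2, 1]):
    offs = covered_offsets(schedule)
    print(schedule, f"window {offs[0]}..{offs[-1]}", "holes" if has_holes(offs) else "contiguous")
# [2, 2, 2] window -6..6 holes
# [1, 2, 3] window -6..6 contiguous
# [1, 2, 3, 2, 1] window -9..9 contiguous
```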
To assess how the number of down-sampling operations affects FRD-Net, we denote a model with a single down-sampling as a 2-layer FRD-Net and increase the number of down-samplings step by step, obtaining four models from the 2-layer to the 5-layer FRD-Net. As shown in Table 8, the 3-layer and 5-layer FRD-Nets stand out among the four models, each with distinct strengths and weaknesses. Compared with the 5-layer FRD-Net, the 3-layer FRD-Net improves Se (sensitivity), F1, and AUC by 1.11%, 0.16%, and 0.09%, respectively, while Sp (specificity) and Acc (accuracy) decrease by 0.09% and 0.01%. At the same time, the 3-layer FRD-Net has 8.4 times fewer parameters than the 5-layer FRD-Net. Given comparable scores on the five metrics and a far smaller parameter count, the 3-layer FRD-Net offers the better overall trade-off. These results indicate that the 3-layer FRD-Net best balances the loss of detail caused by down-sampling against the benefit of an enlarged receptive field.
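The arithmetic behind this trade-off can be sketched as follows, assuming each down-sampling halves the resolution (floor division) and using the 565x584 size of a full DRIVE image purely for illustration (the networks may operate on crops). Every extra down-sampling halves the coarsest feature map, so a 1-pixel-wide capillary shrinks to a fraction of a pixel there, while each coarse unit summarizes a larger region of context.

```python
def stage_shapes(height, width, layers):
    """(H, W) of every stage of an n-layer network that down-samples by 2 a total of n - 1 times."""
    return [(height // 2**s, width // 2**s) for s in range(layers)]


def width_at_coarsest(vessel_px, layers):
    """Width, in pixels of the coarsest stage, of a vessel `vessel_px` wide at full resolution."""
    return vessel_px / 2 ** (layers - 1)


for n in range(2, 6):
    print(f"{n}-layer: coarsest {stage_shapes(584, 565, n)[-1]}, 1-px vessel -> {width_at_coarsest(1, n)} px")
# 2-layer: coarsest (292, 282), 1-px vessel -> 0.5 px
# 3-layer: coarsest (146, 141), 1-px vessel -> 0.25 px
# 4-layer: coarsest (73, 70), 1-px vessel -> 0.125 px
# 5-layer: coarsest (36, 35), 1-px vessel -> 0.0625 px
```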

Cross-validation
To assess the generalization of the proposed model, we adopted a cross-training strategy on the DRIVE and STARE datasets. In the first set of experiments, we trained FRD-Net on DRIVE and tested it on STARE. FRD-Net achieved the best scores among the compared methods on all five metrics; in particular, it reached an Se of 82.38%, reflecting accurate vessel detection, and an Acc of 97.31%, reflecting overall segmentation performance. When trained on STARE and tested on DRIVE, however, vessel segmentation was noticeably degraded because the STARE ground truth contains few fine vessels. As shown in Table 9, the other methods also performed worse in this second set of experiments than in the first. Compared with the other five methods, FRD-Net achieved the best Se, Acc, F1, and AUC, reaching 80.79%, 96.63%, 80.68%, and 98.07%, respectively. Overall, these cross-dataset experiments on DRIVE and STARE indicate that FRD-Net generalizes satisfactorily.

Conclusion and future work
In this paper, we proposed FRD-Net, a full-resolution dilated convolution network for retinal vessel segmentation. FRD-Net comprises two components: the backbone network and the multi-scale feature fusion module (MFFM). The backbone consists of interacting multi-resolution dilated convolution layers that continuously learn full-resolution representations, mitigating the loss of spatial information. We also employed an effective sequence of dilation rates that addresses the limitations of dilated convolutions and preserves vessel details. To fuse features across scales, we introduced the MFFM, which efficiently extracts vessel structure at different scales and passes multi-scale, information-rich features directly from shallow to deep layers, protecting both thick and thin vessels from degradation caused by down-sampling. This improves segmentation precision while retaining edge details. Experiments on four public datasets (DRIVE, STARE, CHASE_DB1, and HRF) demonstrate that FRD-Net achieves better segmentation performance than state-of-the-art retinal vessel segmentation methods while using fewer model parameters. In future work, we will evaluate FRD-Net on other medical image segmentation tasks to further confirm its effectiveness and generalization.
Disclosures. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. Visualization of segmentation results with different down-sampling operations on CE-Net. (a) Original image from the DRIVE dataset; (b) Ground truth; (c) Segmentation result of the original CE-Net; (d) Segmentation result of CE-Net without one round of down-sampling; (e) Segmentation result of CE-Net without two rounds of down-sampling.

Figure 4 illustrates the structure of the backbone network. Similar to HRNet, the backbone uses several convolutional operations (2x2 convolution and deconvolution, and dilated convolutions with different dilation rates) to expand both horizontally and vertically. In addition, we introduce a multi-resolution interaction mechanism at each feature map stage to exchange information between adjacent stages. Shallow stages contribute refined semantic information, while deeper stages enrich high-level contextual information and enlarge the receptive fields of the feature maps. In order to reduce parameter count and maintain the …
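A minimal NumPy sketch of one information exchange between two adjacent stages, using the 2x2 stride-2 convolution and 2x2 stride-2 deconvolution named above. Fusing by element-wise addition, the channel widths, and the random weights are assumptions in the spirit of HRNet; in FRD-Net the exchanged features are further processed by dilated residual modules, which this sketch omits.

```python
import numpy as np


def conv2x2_s2(x, w):
    """2x2 convolution, stride 2: (C_in, H, W) -> (C_out, H/2, W/2); w has shape (C_out, C_in, 2, 2)."""
    c, h, wd = x.shape
    blocks = x.reshape(c, h // 2, 2, wd // 2, 2)  # blocks[c, i, a, j, b] = x[c, 2i + a, 2j + b]
    return np.einsum("ocab,ciajb->oij", w, blocks)


def deconv2x2_s2(x, w):
    """2x2 transposed convolution, stride 2: (C_in, H, W) -> (C_out, 2H, 2W); w has shape (C_in, C_out, 2, 2)."""
    c, h, wd = x.shape
    out = np.einsum("coab,cij->oiajb", w, x)  # out[o, i, a, j, b] is output pixel (2i + a, 2j + b)
    return out.reshape(w.shape[1], 2 * h, 2 * wd)


def exchange(high, low, rng):
    """Hypothetical exchange step: each stage adds the other stage resampled to its own resolution."""
    c_high, c_low = high.shape[0], low.shape[0]
    w_up = 0.1 * rng.standard_normal((c_low, c_high, 2, 2))    # low -> high resolution
    w_down = 0.1 * rng.standard_normal((c_low, c_high, 2, 2))  # high -> low resolution
    return high + deconv2x2_s2(low, w_up), low + conv2x2_s2(high, w_down)


rng = np.random.default_rng(0)
high = rng.standard_normal((16, 32, 32))  # full-resolution stage
low = rng.standard_normal((32, 16, 16))   # next (half-resolution) stage
new_high, new_low = exchange(high, low, rng)
print(new_high.shape, new_low.shape)  # (16, 32, 32) (32, 16, 16)
```

The full-resolution branch never loses spatial detail, since it only receives up-sampled context from the coarser stage rather than being down-sampled itself.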

Experimental ablation
To validate the effectiveness of the modules proposed in FRD-Net, we conducted ablation experiments on the well-established DRIVE dataset, with the U-Net architecture as the baseline model.

Fig. 11. Comparison of Fine Blood Vessel Segmentation between the Proposed Method and Two Observers. (a) Original Image; (b) First Observer; (c) Second Observer; (d) Our Proposed Method.

Fig. 12. F1 and Sensitivity (Se) Scores on the DRIVE Dataset. The values in parentheses represent the number of parameters (in MB). Larger circles indicate a higher number of parameters.
As illustrated in Figs. 13(g) and (h), local visual results from the ablation experiments further demonstrate the improvement in segmentation performance. The experimental results indicate that introducing the multi-resolution interaction mechanism between adjacent stages enables interactive fusion of contextual information, thereby strengthening the model's segmentation of retinal vessels.