Fuzzy neighbourhood neural network for high-resolution remote sensing image segmentation

ABSTRACT Remote sensing image segmentation plays an important role in many industrial-grade image processing applications. However, the problem of uncertainty caused by intraclass heterogeneity and interclass blurring is prevalent in high-resolution remote sensing images. Moreover, the complexity of information in high-resolution remote sensing images leads to a large amount of background information around objects. To solve these problems, a new fuzzy convolutional neural network is proposed in this paper. This network resolves the ambiguity and uncertainty of feature information by introducing a fuzzy neighbourhood module into the deep learning network structure. In addition, it adds a multi-attention gating module to highlight small object features and separate them from the complex background information to achieve fine segmentation of high-resolution remote sensing images. Experimental results on three different segmentation datasets suggest that the proposed method has higher segmentation accuracy and better performance than other deep learning networks, especially for complicated shadow information. Code will be provided at https://github.com/tingtingqu/code.


Introduction
Remote sensing image segmentation technology has been widely used in urban planning (Wurm et al., 2019), environmental protection (Song et al., 2018), climate change, and other fields (Salzano et al., 2019). However, it is still a challenging task due to the complexity of the objects in remote sensing images, such as the wide observation range, high information complexity and unstable imaging quality (X. Li et al., 2021).
A considerable amount of literature has been published on remote sensing image segmentation. Conventional methods rely on prior knowledge for manual selection of target features, and there is a gap with the actual semantic features in the process of building the corresponding segmentation model (Guo et al., 2018; Han & Wu, 2017; Yuan et al., 2014; Jabri et al., 2014). In recent years, convolutional neural networks have been extensively used in high-resolution remote sensing image segmentation due to their superior nonlinear representation capabilities, which allow them to learn deeper and more essential features from massive sample data (Wang et al., 2022; Pan et al., 2021). Applications range from scene classification (Zhang et al., 2016) and semantic segmentation (Bao et al., 2021; R. Liu et al., 2020) to instance segmentation and panoptic segmentation (de Carvalho, de Carvalho Júnior, Silva, et al., 2022). Segmentation approaches vary from patch-based methods (Zhang et al., 2018; Sharma et al., 2017) to a large variety of convolutional neural networks (Caesar et al., 2018; Wang et al., 2020; M. Liu et al., 2020; Ni et al., 2019; X. Li et al., 2019; Yue et al., 2016; Z. Zhao et al., 2021). M. Chen et al. (2021) added dense connection blocks and residual structures to the DeepLabv3+ encoder and decoder to overcome the drawback of incomplete fusion of shallow-extracted low-level features and depth-extracted abstract features. Hua et al.
(2021) proposed a cascade panoptic segmentation network for high-resolution remote sensing images. This method integrates the complementary features of different stages by sharing foreground instance segmentation with background semantic segmentation. Khoshboresh-Masouleh and Shah-Hosseini (2021a) proposed a building panoptic change segmentation method based on the squeeze-and-attention convolutional neural network. The proposed method provided better panoptic segmentation performance for bitemporal images. de Carvalho, de Carvalho Júnior, de Albuquerque, et al. (2022) introduced panoptic segmentation of multispectral remote sensing data and evaluated different configurations regarding band arrangements.
Remote sensing images exhibit similarity among non-homogeneous targets and diversity among homogeneous targets. The boundaries of ground objects are confused and may even overlap with shadows, which leads to uncertainty in remote sensing image segmentation (Zhao, Xu et al., 2021). Figure 1 shows an appropriate example (the building shadows obscure the boundaries of cars and roads, and buildings are intricately connected to low vegetation boundaries). Some methods utilize a multiscale feature fusion strategy (Abdollahi et al., 2020) and multiscale structures to extract feature maps with large receptive fields to recognize those challenging and complicated objects (H. Zhao et al., 2017; Yuan et al., 2020; Zheng et al., 2019). However, these methods ignore the problem that the semantic information of objects is covered and does not match the scale-sensing receptive field of the feature map. Fortunately, fuzzy learning systems (Mylonas et al., 2013; Sugeno & Yasukawa, 1993) can capture the ambiguity of human thinking from a macroscopic view and have a unique strength in solving uncertainty problems (Huang et al., 2021; Lu et al., 2020; Nida et al., 2019; Pal & Sudeep, 2016). Many researchers have attempted to apply fuzzy systems to vision tasks. Deng et al. (2016) proposed a fuzzy neural network for segmentation that reduces the data noise caused by image uncertainty by fusing fuzzy features and convolutional features. Zhang et al. (2017) proposed a deep belief network that uses the fuzzy c-means clustering algorithm to partition the input space and implicitly creates a deep belief network by defining a fuzzy membership function. Hurtik et al. (2019) proposed a new image structure that reduces misclassification during pre-processing by transforming crisp pixel values to fuzzy values. To this end, we propose a fuzzy neighbourhood module to learn the relationship between pixels and their neighbourhoods and solve the uncertainty problem in remote sensing image semantic segmentation.
In addition, remote sensing images are also characterized by high complexity (Xu et al., 2021; Zhou et al., 2019) and varied object sizes (Xiao et al., 2018). The attention mechanism can selectively focus on a certain part of the image, combining information from different regions (Wei et al., 2021). Qi et al. (2020) added an attention module to the convolutional neural network (CNN). Their network refines the nonlinear boundaries of objects in remote sensing images by using multi-scale convolution and attention mechanisms. Li, Zheng, et al. (2021) proposed a densely skip-connected network with a multiscale feature fusion attention mechanism to solve the problem of detail and blurred edge loss during high-resolution down-sampling of remote sensing images. Sun et al. (2020) proposed a boundary-aware semi-supervised semantic segmentation network, which uses a channel-weighted multi-scale feature module to balance semantic and spatial information and a boundary attention module to weight the boundary information.
Inspired by the above analysis and discussion, this paper proposes a fuzzy neighbourhood convolutional neural network for high-resolution remote sensing image segmentation. Due to the low contrast at boundaries, such as shaded and occluded areas, the segmentation results for the boundary pixels are likely to be uncertain. We use the fuzzy neighbourhood module to eliminate unfavourable factors arising from inter-class noise and intra-class complexity, and the multi-attention gating module to acquire object features. The main contributions of this paper can be summarized as follows.
(1) We propose a new architecture that combines fuzzy learning and convolutional neural network for high-resolution remote sensing image segmentation.
(2) The fuzzy neighbourhood module is used to process each pixel and learn the relationship between one pixel and its neighbourhood pixels, so as to tackle the problem of the diversity of homogeneous objects and the similarity among non-homogeneous objects in remote sensing images.
(3) A multi-attention gating module is introduced to use low-level features to provide guidance information for high-level features and to highlight small object detail information in the feature map through weighting indices.
(4) The effectiveness of the proposed method has been demonstrated by visualizing the segmentation results and by objective indexes in a comparative experimental analysis using three benchmark remote sensing datasets.

Methodology
The network architecture is shown in Figure 2. It is designed with an encoder-decoder structure, which is one of the most common frameworks of deep neural networks. Because some significant spatial information may be lost in the encoder, the multi-attention gating module is introduced into the down-sampling stage to fuse the detail information of low-level features with the semantic information of high-level features and thereby prevent the loss of spatial information in the low-level features. The outputs of the fuzzy neighbourhood module and the results from the multi-attention gating module are input to the decoder, and the non-local features are used as local features for information guidance to complete the final feature map prediction.

Feature extraction structure
The symmetric network architecture can efficiently extract global and local information from feature intervals. The encoder in this network is used for multi-scale feature extraction from high-resolution remote sensing images. Because the scale information obtained from the convolution modules of different layers is not exactly the same, by feeding information at different scales to the multi-attention gating module, the connection between low-level features and high-level features achieves a refinement effect, suppressing irrelevant regions and highlighting useful features in salient regions. The network integrates various feature maps with different scales into a unified feature map for prediction. Specifically, the output result of the convolution blocks in the final layer is obtained by (1).
where g_n denotes the extraction process of the nth encoder block E_n, θ_n is its learning parameter, S_n is the output of the nth encoder block, and F_d is the final output of the encoder.
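As a rough sketch of this recursive composition, the stages can be chained as S_n = g_n(S_{n-1}; θ_n), with the final output F_d taken from the last stage. The snippet below is a NumPy stand-in, not the actual convolution blocks: each g is illustrated by strided down-sampling plus a fixed channel mix, and the shapes (3 channels, 32 × 32 input, 5 stages) are assumptions for illustration only.

```python
import numpy as np

def g(feature, theta):
    """Illustrative stage extractor g_n: 2x spatial down-sampling followed
    by a learned linear channel mix (a stand-in for a conv block E_n)."""
    pooled = feature[:, ::2, ::2]                  # strided down-sampling
    c, h, w = pooled.shape
    return np.tanh(theta @ pooled.reshape(c, -1)).reshape(c, h, w)

def encode(image, thetas):
    """Compose N stages: S_n = g_n(S_{n-1}; theta_n); the last output is F_d."""
    s = image
    outputs = []
    for theta in thetas:
        s = g(s, theta)
        outputs.append(s)
    return outputs                                 # outputs[-1] plays the role of F_d

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 32, 32))               # toy 3-channel input
thetas = [rng.standard_normal((3, 3)) for _ in range(5)]
features = encode(x, thetas)                       # multi-scale feature pyramid
```

The list of intermediate outputs is what the multi-attention gating module later consumes: each S_n is half the spatial size of its predecessor.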

The fuzzy neighbourhood module
Compared with methods that pre-process fuzzy data and cascade it to a neural network, our fuzzy neighbourhood module can extract fuzzy features from a crisp set of limited information. This can solve the problem of intraclass heterogeneity caused by complex feature information in high-resolution remote sensing images by using pixel-to-pixel fuzzy relationships. The module processes the feature map instead of the original image. This avoids the information confusion caused by high contrast and border shadows in the original image.
The fuzzy neighbourhood module constructs a fuzzy neighbourhood set Z for each feature point p_i ∈ X in each channel by fuzzifying the function channels. For the input feature map F, we use M fuzzy functions to fuzzify each point in F with the help of the convolution operation. Note that M is the same for all channels of a given feature map but differs between input feature maps: M is related to the feature map size, not to the number of channels.
To obtain the fuzzy set Z, the fuzzy function uses a Gaussian fuzzy function, as shown in (2).
where (x, y) are the coordinates of the feature point p_{x,y,c} in channel c, and μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the kth Gaussian function, respectively.
The fuzzy degree γ_{i,j} ∈ [0, 1] denotes the fuzzy relationship between the feature points p_i and p_j, and the fuzzy value of each pixel is calculated from the fuzzy degree. The M fuzzy functions must be recombined through defined fuzzy logic decision rules; each Gaussian function can learn the original feature values under various distributions. The definition of fuzzy degree is as follows: given the feature points in a channel, a fuzzy neighbourhood set Z is calculated for each channel according to the proposed fuzzification process. After obtaining the fuzzy neighbourhoods of different classes, they are passed through a convolutional normalization module to ensure that they have the same size as the input feature map F. The result is then stitched with the original feature map to obtain the final output feature map, which enters the next branch.
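A minimal sketch of the fuzzification step described above, following the Gaussian form of Eq. (2): each feature value is mapped through M Gaussian membership functions, giving M fuzzy channels per input channel. The means, standard deviations, and map shapes below are illustrative assumptions; in the actual module they are learned via the convolution operation.

```python
import numpy as np

def gaussian_membership(f, mu, sigma):
    """Gaussian fuzzy membership of a feature value f, in the spirit of Eq. (2):
    exp(-(f - mu)^2 / (2 sigma^2)), which lies in (0, 1] and equals 1 at f = mu."""
    return np.exp(-((f - mu) ** 2) / (2.0 * sigma ** 2))

def fuzzify(feature_map, mus, sigmas):
    """Transform each point of a (C, H, W) feature map with M Gaussian
    membership functions, producing an (M, C, H, W) fuzzy set Z."""
    return np.stack([gaussian_membership(feature_map, m, s)
                     for m, s in zip(mus, sigmas)])

rng = np.random.default_rng(1)
F = rng.standard_normal((4, 8, 8))                 # toy crisp feature map
mus, sigmas = [-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]    # M = 3 assumed functions
Z = fuzzify(F, mus, sigmas)                        # fuzzy neighbourhood set
```

In the paper's module, Z would then be normalized by a convolutional block back to the shape of F and concatenated with it; here the sketch stops at the fuzzy set itself.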

The multi-attention gating module
The input image is processed by the convolution module to obtain feature maps of different sizes. A shallow network acquires feature maps with higher resolution that contain more local detail, whereas a deep network contains more semantic information. Because remote sensing images suffer from an uneven distribution between foreground and background, low contrast, and large variability in size and shape, small objects in the foreground are not easily recognized. To achieve accurate semantic segmentation, the high-level semantic information should be preserved while the low-level detail information is also considered. To this end, we introduce a multi-attention gating module to refine different feature details. Finally, the splicing design is adopted to ensure the stability of propagation.
As shown in Figure 4, assuming that X_L is a vector from the low-level feature channel and X_H is a vector from the high-level feature channel, multi-attention gating is defined by (4).
where X_0 is the output of the module, f is the corresponding function that calculates the relationship between X_H and X_L, 1/C(X) is the normalization factor (referred to here as the attention factor), and g is the unary function that transforms the input vector. For each element of X_L and X_H there is a corresponding response X_R. Equation (4) is converted to a computable neural network module by (5).
where E_e and F'_d denote the output of the deconvolution information and the original information, respectively, τ denotes the convolution block formed by the convolution, normalization, and activation layers, and σ denotes the activation function. The coefficients α_i ∈ [0, 1] are used to identify the foreground region containing small objects and construct the feature map response information.
The output of the multi-attention gating module is the product of the input feature map and the attention coefficient, α_i E_e. The module computes a single scalar attention value for each pixel vector. Thus, each attention gate learns to focus on a subset of the target structure rather than the entire background information. In this manner, it is more convenient to focus on a specific area and thereby reduce the amount of redundant computation. The multi-attention gating module trims the low-level feature information through contextual information and connects low-level features with high-level features to achieve information refinement. The overall definition of the module is as follows: where E_e denotes the output deconvolution information, F'_d denotes the information after the multi-attention gating module, τ denotes the convolution block formed by the convolution, normalization, and activation layers, and ⊕ denotes the splicing operation.
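The gating described above can be sketched as an additive attention gate: low- and high-level features are projected into a shared space, passed through a nonlinearity, and reduced to one attention coefficient per pixel, which then re-weights the low-level map. This NumPy sketch replaces the module's 1 × 1 convolutions with per-channel weight matrices (w_l, w_h, w_psi are hypothetical names) and assumes the high-level map has already been upsampled to match the low-level one.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(x_low, x_high, w_l, w_h, w_psi):
    """Additive attention gate sketch: per-pixel coefficients alpha in (0, 1)
    computed from low- and high-level features, then used to re-weight the
    low-level feature map (1x1 convs stand in as channel-mixing matrices)."""
    # project both inputs to a shared intermediate space and combine
    q = np.tensordot(w_l, x_low, axes=1) + np.tensordot(w_h, x_high, axes=1)
    q = np.maximum(q, 0.0)                               # ReLU
    alpha = sigmoid(np.tensordot(w_psi, q, axes=1))      # (1, H, W) attention map
    return x_low * alpha                                 # gated low-level features

rng = np.random.default_rng(2)
C, H, W = 4, 8, 8
x_low = rng.standard_normal((C, H, W))
x_high = rng.standard_normal((C, H, W))                  # assumed upsampled to match
w_l = rng.standard_normal((C, C))
w_h = rng.standard_normal((C, C))
w_psi = rng.standard_normal((1, C))
gated = attention_gate(x_low, x_high, w_l, w_h, w_psi)
```

Because each α lies in (0, 1), the gate can only attenuate background responses, never amplify them, which matches the stated goal of suppressing irrelevant regions around small objects.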

Experiments and analysis
In this section, we evaluate the effectiveness, superiority, and generalization ability of the proposed network. First, we describe the three datasets used and present the details of the experimental setup. Second, we report quantitative and qualitative comparison results for the three datasets. Finally, we demonstrate the effectiveness of our network through ablation experiments.

The CCF dataset
The CCF dataset was derived from the "CCF Satellite Image AI Classification and Recognition Competition", comprising high-resolution remote sensing images of a region in southern China captured in 2015. The features are divided into five categories: low vegetation, roads, buildings, water bodies, and others. Because the original dataset images ranged from 4000 × 2000 pixels to 8000 × 8000 pixels, we cut the sample images into slices of size 320 × 320 pixels and augmented the remote sensing images and the corresponding labels by using OpenCV. Finally, we obtained 10,489 slices for training and 3011 slices for testing.
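The slicing step can be sketched as cutting each large image into non-overlapping fixed-size tiles (the paper uses OpenCV; plain array slicing shows the same idea). The stride and border handling below are assumptions: non-overlapping tiles with the ragged border discarded.

```python
import numpy as np

def slice_image(image, tile=320):
    """Cut a large remote sensing image (H, W, C) into non-overlapping
    tile x tile slices, discarding the ragged border. Stride/overlap and
    border policy are illustrative assumptions, not the paper's exact recipe."""
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

# a dummy image at the smallest size mentioned in the text (2000 x 4000)
img = np.zeros((2000, 4000, 3), dtype=np.uint8)
tiles = slice_image(img, tile=320)     # 6 rows x 12 columns = 72 slices
```

The same tiling would be applied to the label masks so that each 320 × 320 slice keeps its pixel-aligned ground truth.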

The Vaihingen dataset
The Vaihingen dataset was imaged in the Vaihingen area of Stuttgart, Germany. It shows many separate buildings. The objects are divided into impervious surfaces, buildings, low vegetation, trees, clutter, and cars. This dataset includes 33 images, each with three bands corresponding to near-infrared, red, and green wavelengths, and the ground sampling distance is 9 cm. In this study, we used 16 patches for training and 17 patches for testing. Finally, we used 1324 and 395 slices for training and testing, respectively.

The IND.V2 dataset
The IND.v2 dataset shows many separate buildings. The objects are divided into shadows, occluded areas, vegetation covers, complex roofs, and dense building areas. This dataset contains 294 images with a size of 1024 × 1024 pixels. This study used 256 patches for training and 38 patches for testing. We cut the sample images into slices of size 520 × 520 pixels. Finally, we obtained 1536 slices for training and 228 slices for testing.

Implementation details
The PyTorch deep learning framework was used to conduct experiments on high-performance computing equipment with an Intel Core i7-7820X CPU and an NVIDIA GeForce GTX 1080 Ti GPU (with 23.2 GB RAM). The maximum boost frequency of this CPU is 3.6 GHz, each instruction can perform four single-precision floating-point operations, and the CPU has eight physical cores. The operating system was Linux, the IDE was PyCharm Community, and the program was written in Python 3.6. The image processing functionality was provided by open-source computer vision libraries. The commonly used Adam optimizer guided the optimization. The learning rate was 0.0001 with a decay base of 0.95. Because of the limited physical memory on our GPU card, the batch size was set to 4 during training.
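Reading "learning rate 0.0001 with base 0.95" as an exponential decay schedule (an assumption; the text does not name the schedule explicitly), the per-epoch learning rate would be:

```python
def learning_rate(epoch, base_lr=1e-4, decay=0.95):
    """Exponential learning-rate schedule assumed from the text:
    lr = base_lr * decay**epoch (initial lr 0.0001, decay base 0.95).
    This mirrors what torch.optim.lr_scheduler.ExponentialLR(gamma=0.95)
    would apply on top of Adam, if that is indeed the setup used."""
    return base_lr * decay ** epoch

lrs = [learning_rate(e) for e in range(3)]
# epoch 0: 1.0e-4, epoch 1: 9.5e-5, epoch 2: 9.025e-5
```

If the decay is instead applied per iteration or per step plateau, the formula changes accordingly; the schedule above is only one plausible reading.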
The evaluation metrics used are derived from those used in (Qi et al., 2020): Overall Accuracy (OA), Recall, Precision, F1 Score, Mean Intersection over Union (MIoU) and Frequency Weighted Intersection over Union (FWIoU).
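The pixel-level metrics above are standard functions of a class confusion matrix; a compact sketch (rows are ground truth, columns are predictions; the toy matrix is invented for illustration):

```python
import numpy as np

def segmentation_metrics(conf):
    """OA, MIoU, and FWIoU from a (K, K) confusion matrix whose rows are
    ground-truth classes and columns are predicted classes."""
    conf = conf.astype(float)
    tp = np.diag(conf)                       # correctly classified pixels per class
    oa = tp.sum() / conf.sum()               # Overall Accuracy
    union = conf.sum(0) + conf.sum(1) - tp   # predicted + actual - intersection
    iou = tp / union                         # per-class Intersection over Union
    miou = iou.mean()                        # Mean IoU
    freq = conf.sum(1) / conf.sum()          # ground-truth class frequencies
    fwiou = (freq * iou).sum()               # Frequency Weighted IoU
    return oa, miou, fwiou

# toy 2-class example: 40 + 45 of 100 pixels classified correctly
conf = np.array([[40, 10],
                 [ 5, 45]])
oa, miou, fwiou = segmentation_metrics(conf)
```

Precision, Recall, and F1 follow from the same matrix (column sums, row sums, and their harmonic combination per class).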

Results on the CCF dataset
The efficiency of our network was compared with those of UNet (Ronneberger et al., 2015), SegNet (Badrinarayanan et al., 2017), FCN (Long et al., 2015), Deeplabv3+ (L. C. Chen et al., 2018), LinkNet (Chaurasia & Culurciello, 2017), EffuNet, Deeplabv3, MacuNet (Li, Duan, et al., 2021) and RSFCNN (Zhao et al., 2021). EffuNet is a combination of the UNet and EfficientNet-B7 baselines. Example results of the proposed network on the CCF dataset are shown in Figure 5. Our segmentation result is complete for low vegetation, roads, and buildings. Compared with other methods, the segmentation results of our network have clearer boundaries and are more accurate for some segmentation details within the images. Representative detailed results for the CCF dataset are presented as an example in Figure 6. In the first row of Figure 6, our network segments images with more complete buildings and roads simultaneously. The segmentations of roads and buildings are not mixed with too much orange background information. As shown in the second row of Figure 6, SegNet and UNet incorrectly classify the buildings as low vegetation. In contrast, the segmentation performed by our model is more accurate. As shown by the third row of Figure 6, our network has the advantage of identifying disturbing factors, such as buildings affected by shadows and light. The above results show that our network can perform robust feature extraction and high-resolution detail recovery from high-resolution remote sensing images.
Table 1 shows the OA, Precision, Recall, FWIoU, MIoU, and Mean F1 of the different methods on the CCF dataset. Our method is superior to the comparison networks in terms of Recall and MIoU. Among the indices, our model scores worse than LinkNet for Precision. Compared with RSFCNN, it achieves an improvement of 3.486% in Mean F1. Compared with LinkNet, our method improves the OA and Recall by 0.846% and 3.453%, respectively. Table 2 shows the detailed results for each class of the CCF dataset. It shows that our method achieves the highest F1 Score for buildings, roads, and low vegetation. These results also prove the effectiveness of our network.

Results on the Vaihingen dataset
To further evaluate the effectiveness of our method, we also conducted comparative experiments on the Vaihingen dataset. The experimental results are presented in Table 3, which shows that our network achieved the best performance in Mean F1, MIoU, and Precision. The visualization results on the test set are shown in Figure 7.
The prediction results on this dataset show that our method achieved a good segmentation effect on the impervious surfaces in the cyan areas of Vaihingen, with more explicit building boundaries and attenuated blurring and mismatch caused by shadows. In addition, objects with almost the same colour were differentiated well into their categories. For example, as shown in the first row of Figure 7, SegNet and UNet incorrectly classify buildings as trees because of their shadows. RSFCNN, although relatively accurate in classification, introduced too much background information in tree classification. In contrast, our network performs the best segmentation. As shown in the third row of Figure 7, our results for buildings are significantly better than those of the other networks in large areas of low vegetation and trees.
The quantitative results for the Vaihingen dataset are shown in Tables 3 and 4. These results show that our model outperformed the other methods with respect to Mean F1 and MIoU. The Mean F1 of our model is 1.382% greater than that of UNet. The OA of our model is 0.541% greater than that of Deeplabv3+. Our model also demonstrates significant advantages in handling small objects. For example, for the classification of buildings and cars, the F1 Scores of our model are improved by 3.072% and 0.561%, respectively, compared with those of RSFCNN.

Results on the IND.V2 dataset
We also conducted experiments on the IND.v2 dataset. We performed extensive visualization of the prediction results, as shown in Figure 8. Since the dataset consists of a large number of single small objects, as shown in the local prediction results of the dataset, our method has a better segmentation effect on detailed information in buildings. UNet and LinkNet correctly identify the buildings but also incorrectly identify the surrounding concrete as buildings, as shown in Figure 9. FCN introduces too much background information in classification, resulting in blurred building boundaries. In contrast, our network achieves satisfactory performance on shadow regions and tricky buildings, such as the building covered by the tree at the top right of the image. Finally, Table 5 lists the performance comparison for the different methods. Our method outperforms the other methods for OA, FWIoU, and MIoU. In this case, FWIoU reached 94.478%, which is a rather advanced level, and MIoU reached 85.119%, at least 1 percentage point better than the other methods.

Ablation Study and optimizer
The optimization algorithm has a significant impact on the performance of the model. In this section, we report full-scale ablation experiments under different optimization algorithms to validate each module. Taking the Vaihingen dataset as an example, the baseline network was trained first, and we then gradually added the multi-attention gating module and the fuzzy module, reporting their performance with respect to OA. M represents the multi-attention gating module, and F represents the fuzzy module.
Table 6 presents the results of the ablation study on the Vaihingen dataset. The OA of our full model in the ablation experiment is 84.960%, which is better than that of the benchmark model. In addition, the use of multi-attention gating blocks significantly improves the OA compared with the benchmark model: using the multi-attention gating module to enhance contextual information improves the OA by at least 0.934%. Moreover, adding the fuzzy module improves the OA by 1.097%. Clearly, the fuzzy neighbourhood module is more conducive to high-resolution remote sensing image segmentation. Furthermore, when the two are combined, the OA increases by 2.924%, which proves, to some extent, that both modules are required.
The results of the visual ablation study with the Adam optimizer on the Vaihingen dataset are presented in Figure 10. Compared with the ground truth, when only the baseline is used for segmentation, adjacent objects stick together because of shadow overlap, the edge noise is evident, and the boundaries are mostly jagged. After the fuzzy neighbourhood module and multi-attention gating module are added, the mutual independence between the objects is improved, and the edge noise and region noise are reduced. The corresponding visualization results are shown in the third row of Figure 10. The results of the visual ablation study with the SGD optimizer on the Vaihingen dataset are presented in Figure 11. The semantic information of cars is easily covered by other categories. As illustrated in Figure 11(d), when applying M to the baseline, the performance on cars is improved. When applying F to the baseline, the performance for classification of heterogeneous objects is improved. We also trained the models with two different optimizers, Adam and SGD, to investigate the segmentation effect and the behaviour over iterations; the model performs best with the Adam optimizer. Figure 12 illustrates the loss value per iteration for the IND.v2 dataset. It is not difficult to see that in the initial stage of training, the loss varies greatly. However, as the number of cycles increases, the loss value changes slowly. When the number of epochs reaches 500, the loss value gradually converges. Moreover, when the optimizer is SGD, the loss function decreases slowly. Therefore, we chose Adam as the optimizer when training the model on the different datasets.

Discussion
In this study, we proposed a fuzzy neighbourhood convolutional neural network that extracts ground objects from high-resolution remote sensing images. The network combines a multi-level feature extractor and a fuzzy neighbourhood module, which is used to process uncertain information resulting from intra-class noise and the boundaries of different objects. However, one may question whether, for remote sensing images with high intra-class difference and complex object boundaries, the fuzzy neighbourhood module is reliable. Our fuzzy neighbourhood module proceeds under a basic assumption: a pixel p_i in the feature map is related to its neighbourhood pixels p_j, which contain most of the corresponding feature representation. Based on this assumption, the difference of uncertain information can be minimized by considering pixel neighbourhood information in the pixel classification process. Thereby, the only question is whether this assumption holds true in the complex scenarios mentioned above. We think the hypothesis holds. Our method was tested on three remote sensing datasets and obtained some remarkable segmentation results. Such results showed the potential of the fuzzy neighbourhood module to solve the above problems. Also, our method improved the deep-learning structure by adopting fuzzy pattern recognition methods, which can construct fuzzy relations to express problems that are difficult to describe accurately with classical mathematical logic, and which have a unique advantage in solving uncertain problems. Recent research combining convolutional neural networks and fuzzy learning (M. Liu et al., 2019) extends the image input by adding additional channels, which is equivalent to adding additional handcrafted features represented by fuzzy sets to the crisp input. However, these handcrafted fuzzy features computed from the crisp input may fundamentally limit performance, and such methods may lose the original deterministic information. Rather than using methods that cascade fuzzy data pre-processing to a neural network that extracts features from crisp input, we deeply integrated a fuzzy neighbourhood module at the network architecture level to boost the performance of our model. Our approach incorporates the convolutional neural network model alongside fuzzy learning methods in an effective manner. That is, the original definite information is retained; at the same time, the local features of each point are captured, and the permutation of the data feeding order is kept unchanged.
A common design in convolutional neural networks is based on stacked convolutions and pooling operations, which constantly reduce the spatial size of features to enhance their semantic representations. As mentioned in the feature extraction structure section, we used a five-layer deep convolutional neural network structure. Although a deeper and wider network can capture richer and more complex features, this comes at the cost of losing detailed spatial information (He et al., 2016). After down-sampling several times in a deep convolutional neural network architecture, the later feature maps lose spatial information: a small object of size 32 × 32 pixels is clearly visible in shallower feature maps, but not in the deeper ones.
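The loss of small objects can be made concrete with a back-of-the-envelope calculation: with five 2× down-sampling stages, a 32 × 32 pixel object shrinks to a single feature-map pixel.

```python
def footprint_after_downsampling(size, stages, factor=2):
    """Side length (in feature-map pixels) occupied by an object of the
    given size after repeated down-sampling; illustrates why a 32 x 32
    pixel object survives in shallow maps but collapses in deep ones."""
    for _ in range(stages):
        size = max(1, size // factor)
    return size

footprints = [footprint_after_downsampling(32, s) for s in range(6)]
# 32 -> 16 -> 8 -> 4 -> 2 -> 1: a single pixel after five stages
```

At that point the object is indistinguishable from noise in the deep feature map, which is precisely why the multi-attention gating module re-injects shallow detail.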
An early method (Y. Liu et al., 2021) to address small object semantic segmentation was to improve segmentation performance through an image multi-scale strategy. This method enlarged the input image to different sizes and combined the prediction results for all the sizes to obtain the final segmentation result. Other methods (G. Chen et al., 2019; Xiao et al., 2020) used pyramid pooling layers to segment objects at multiple scales. Recent studies on small object detection (Nguyen et al., 2020; Tong et al., 2020) have successfully improved detection accuracy mainly by focusing on top-down or bottom-up feature fusion. For example, some methods used skip connections to directly add lower-level feature maps to higher-level feature maps, or concatenated features by performing pooling and deconvolution simultaneously.
In our method, we adopted weight coefficients that can adaptively fuse shallow and deep features. In this manner, it is more convenient to focus on a specific area rather than on the entire background information. The performance of the fuzzy neighbourhood neural network was based on the resulting information of the fuzzy neighbourhood module and the feature complement of the multi-attention gating module, respectively. The addition of the multi-attention gating part helped to supplement object information clearly. Therefore, our method was a process of mutual promotion: it not only had certain effects in small object segmentation, but also had certain advantages in eliminating complex boundaries and intra-class noise. Based on the resulting segmentation and the corresponding accuracies, we conclude that the fuzzy neighbourhood neural network can play an important role in remote sensing image segmentation.
At the same time, other works on small object detection (Pang et al., 2019; Qian et al., 2019) proposed losses to address class imbalance for small objects. These methods give us much inspiration for improving the recognition of small objects and thus optimizing the overall segmentation effect. In addition, the proposed model heavily relies on large labelled datasets. However, it is time-consuming and laborious to label large-scale remote sensing images that exhibit great diversity, such as different sensors, ground sampling distances, acquisition areas, and so on. Transfer learning can extract common knowledge from labelled datasets to help improve the performance of the model when the target lacks labels. In follow-up studies, we plan to delve further into fuzzy learning, modify the proposed architecture, and explore other promising ideas.

Conclusion
This paper proposes a network structure to solve the problems of shadow noise and classification in remote sensing images. The network uses the fuzzy neighbourhood module to overcome the inherent uncertainty of remote sensing images, achieve better boundary delineation, and improve segmentation performance. It realizes the filtering and extraction of shallow information in remote sensing images by using the multi-attention gating module, which can effectively remove noise from the shallow feature maps and compensate for the detail in the deep feature maps more robustly. As verified on the CCF, Vaihingen, and IND.V2 datasets, our method can better identify shadow information and smaller targets and can recognize target edge boundaries while maintaining high accuracy. In the future, we will continue to study fuzzy learning in depth and adhere to the idea of combining fuzzy learning with deep learning for remote sensing image segmentation, further optimizing our proposed method.

Figure 1 .
Figure 1. Boundary pixels and shadows in remote sensing images. (a) The building shadows obscure the boundaries of cars and roads. (b) The buildings are intricately connected to low vegetation boundaries.

Figure 10 .
Figure 10. Results of the baseline fusion ablation on the Vaihingen dataset (Adam optimizer). (a) Original image. (b) Ground truth. (c) BS. (d) BS+M. (e) BS+F. (f) Ours.

Figure 11 .
Figure 11. Results of the baseline fusion ablation on the Vaihingen dataset (SGD optimizer). (a) Original image. (b) Ground truth. (c) BS. (d) BS+M. (e) BS+F. (f) Ours.

Figure 12 .
Figure 12. Loss curves for the training set with different optimizers.

Table 1 .
Performance comparison of different methods on the CCF dataset.

Table 2 .
F1 score of different classes on the CCF dataset.

Table 3 .
Performance comparison of different methods on the Vaihingen dataset.

Table 4 .
F1 score of different classes on the Vaihingen dataset.

Table 5 .
Performance comparison of different methods on the IND.V2 dataset.

Table 6 .
Comparison between the baseline only and the baseline with the corresponding model added.