A Fine-Grained Image Classification Approach for Dog Feces Using MC-SCMNet under Complex Backgrounds

Simple Summary: More and more people now keep dogs, and gastrointestinal diseases in pet dogs cause great losses to families. The condition of a dog's feces is closely related to the health of its stomach and intestines. By scoring dog feces, we can learn about a dog's intestinal condition in advance and take measures such as adjusting its food. The PURINA FECAL SCORING CHART and the WALTHAM™ Faeces Scoring System score dog feces visually, but using them requires scoring experience. Therefore, this paper proposes an artificial intelligence method that automatically classifies the condition of dog feces by combining these scoring criteria with the assistance of animal experts. The method achieves an accuracy of 88.27%, improving the diagnostic efficiency of veterinarians.

Abstract: In natural environments, factors such as weathering and sun exposure degrade the characteristics of dog feces; disturbances such as decaying wood and dirt easily cause false detections; and the visual distinctions between different kinds of feces are slight. To address these issues, this paper proposes a fine-grained image classification approach for dog feces under complex backgrounds using MC-SCMNet. First, a multi-scale attention down-sampling module (MADM) is proposed to carefully retrieve tiny fecal feature information. Second, a coordinate location attention mechanism (CLAM) is proposed to inhibit the entry of disturbance information into the network's feature layers. Then, an SCM-Block containing MADM and CLAM is proposed, and we use this block to construct a new backbone network that increases the efficiency of fecal feature fusion. Throughout the network, we decrease the number of parameters using depthwise separable convolution (DSC). In conclusion, MC-SCMNet outperforms all compared models in terms of accuracy.
On our self-built DFML dataset, it achieves an average identification accuracy of 88.27% and an F1 value of 88.91%. The results of the experiments demonstrate that it is more appropriate for dog fecal identification and maintains stable results even in complex backgrounds, which may be applied to dog gastrointestinal health checks.


Introduction
Dogs are one of the first domesticated animals in the world and are widely bred for their docile temperament. Living with a pet dog can relieve stress [1] and even provide medical psychological support [2]. Therefore, people pay increasing attention to the health and physical condition of dogs [3,4]. In the breeding process, if a sick dog is not treated in time, the breeder suffers both economic and emotional losses [5].
To address the issue that the features of degraded dog feces are hard to retain via down-sampling, we propose MADM, which combines the advantages of convolution and pooling operations. The method runs multiple convolutions and max pooling in parallel while concatenating an attention mechanism. Such a connected structure collects global features and enhances the scale variety of local features. As a result, the down-sampled fecal information is fully represented and critical information is highlighted. Even when feces have been transformed by harsh natural conditions such as air-drying, MADM retains their fine traits to the greatest extent possible.
To solve the issue of false identification of dog feces, we propose CLAM, a novel attention mechanism based on the properties of dog feces that filters background interference and improves the network's sensitivity to fecal features. Because feces sit at different heights from the ground, their spatial distribution differs from that of the background. CLAM has a cross structure that swiftly perceives spatial information and provides auxiliary features to the network. Based on CA's location weight assignment strategy [32], CLAM concentrates spatial attention by embedding global and local information into feature channels along the horizontal and vertical directions, respectively. This helps the network focus on the target objects while minimizing the interference of redundant inputs.
To overcome the inter-class similarity problem of dog feces, we propose a new SCM-Block. Specifically, we combine CLAM and MADM to construct a new feature extraction unit, rebuild the ResNeSt backbone network, and compose SCMNet. The design is end-to-end and adds no extra computational overhead for complex backgrounds in images. SCMNet improves the network's ability to express stool information by aggregating CLAM attention weights and MADM multi-scale features, boosting ResNeSt's accuracy on dog feces.
The research contributions of this paper can be summarized as follows:
1. For the first time, we collected a dataset of 1623 images of dog feces (DFML). Following the suggestion of Zuofu Xiang [33,34], a member of the Institute of Evolutionary Ecology and Conservation Biology, Central South University of Forestry and Technology, the dataset divides dog feces into four categories: diarrhea, lack of water, normal, and soft stool, providing a reference for the gastrointestinal diagnosis of dogs.

2. To address the challenges of dog feces image identification, we propose the MC-SCMNet network, which is designed as follows:
• We design a multi-scale attention down-sampling module (MADM) to address the loss of critical characteristics caused by the degradation of dog feces. It enables the network to keep fecal texture details to the greatest extent possible while resisting the effects of feature degradation.

• We design a coordinate location attention mechanism (CLAM) to minimize the interference of similar objects in dog feces identification. Its cross structure filters noisy signals while highlighting fecal feature values in the attention map.

Related Works
The problem of classifying dog feces is a typical fine-grained visual classification (FGVC) problem [35]. FGVC focuses on samples of the same or closely related subordinate categories and is more specific than standard image classification hierarchies. For example, classifying different vehicle models [36], dog breeds [37], fruit types [38], citrus surface defects [39], etc. Fine-grained image classification has undergone long-term development, transitioning from machine learning methods to deep learning.
In recent years, researchers have applied complex machine learning [40-49] to fine-grained image classification, which provides a reference for our research. In agriculture, Chen et al. [28] recognized tomato leaf disease with a method combining ABCKBWTR and B-ARNet and achieved a recognition accuracy of about 89%. Li et al. [50] proposed FWDGAN, which combines the deep features of ResNet with the global features of InceptionV1 and achieved high recognition accuracy on tomato diseases. Deng et al. [51] integrated DenseNet-121, SE-ResNet-50, and ResNeSt-50 to diagnose six types of rice diseases with an average accuracy of 91%. In stool-related work, Saboo et al. [52] used machine learning (ML) analysis to find that the fecal microbiota is strongly correlated with the severity of liver cirrhosis. Ludwig et al. [53] used machine learning to classify infant feces, reaching 77.0% agreement with manual detection. Hwang et al. [54] used a support vector machine (SVM) classifier to detect feces in colonoscopy videos based on color features, with average sensitivity and specificity of 88.89% and 91.35%, respectively. Liao et al. [55] developed a stool diagnostic system that achieves 100% and 99.2% recognition accuracy on the color and features of stool images, respectively. Zhou et al. [56] proposed a hierarchical convolutional neural network architecture to classify stool shapes; their method also incorporated machine learning and achieved 84.4% accuracy with an 84.2% latency reduction on human stool classification. Leng et al. [57] achieved a classification accuracy of 98.4% on a self-built five-class stool dataset using a lightweight shallow CNN. Choy et al. [58] used ResNeXt-50 to classify human feces for disease monitoring and diagnosis, achieving an accuracy of 94.35%.
However, previous investigations focus primarily on a single type of background interference, and the classification subject is merely human excrement, which limits extensibility. Furthermore, applying deep learning components alone does not explore the relationship between stool features and network structure and cannot cope with more complex background interference. Therefore, this paper extends the target to dog feces and selects ResNeSt as the benchmark network to broaden the application range of FGVC.

Materials Acquisition
The dog feces dataset served as the foundation of this paper. We collected a large number of dog feces photos to create the dog feces differentiated by moisture level (DFML) dataset, basing the scoring on the PURINA FECAL SCORING CHART [19] and the WALTHAM™ Faeces Scoring System [20]. On the advice of Zuofu Xiang [59,60], the stool images were classified into four categories based on their moisture content: diarrhea, lack of water, normal, and soft stools. In addition, we removed invalid images according to the following criteria: (1) complete loss of stool characteristics; (2) shaky, blurry images; (3) excessive or inadequate lighting. Finally, 1623 valid photos were obtained; Table 1 shows the features and number of images in each category. The DFML images consist of two parts: 1412 images captured with a Canon EOS R6 in natural light at Hunan Provincial Animal Hospital, White Sand Kennel, and Changsha Small Animal Protection Association, and 211 images screened from the Internet.
The following are detailed descriptions of the four categories of feces: (1) Lack of water: dry surface and hard texture; sometimes roughly spherical in form, at other times resembling concave-convex strips. (2) Normal: visible gaps, commonly shaped like a long snake. (3) Soft stool: smooth, soft texture with a shiny finish and irregular form. (4) Diarrhea: sticky, extremely soft and shapeless, occasionally liquid. There are also similarities across categories: (1) and (2) are both long strips, (2) and (3) have similar surface colors, and (3) and (4) are both shapeless. These traits disturb the categorization of dog feces, and researching them has practical significance for FGVC.

Materials Preprocessing
A large number of images can improve the network's training process by enriching the effective characteristics of feces and reducing overfitting. For data augmentation of the DFML images, we utilize the imgaug package and apply the following methods at random: (1) Gaussian blur with coefficient δ ∈ [0, 0.5] on 50% of the images; (2) enhancing or reducing the image's contrast; (3) Gaussian noise, sampled once per pixel for 50% of the images and once per pixel and channel for the rest; (4) adjusting the brightness of 20% of the images; (5) affine transformations, including scaling, panning, rotating, and clipping. Finally, we gathered 3246 training images; Figure 2 and Table 2 illustrate the augmentation effect and the augmented data.


MC-SCMNet
This paper proposes a network MC-SCMNet based on ResNeSt for FGVC of dog. Figure 3 depicts the model structure, with (a) to (e) representing the main structure, MADM module, CLAM, DSC, and SCM-Block. In the feature extraction stage, we utilize MADM and CLAM to enrich feces features while filtering out irrelevant information to enter the feature fusion stage. We employ SCM-Block to build a backbone network during the feature fusion stage. SCM-Block combines the advantages of Split-Attention, CLAM, and MADM, optimizes fusion efficiency, accelerates information sharing across layers, and improves the network's global perception ability. Furthermore, DSC was used to reduce the number of network parameters and expand the flexibility of MC-SCMNet. MC-SCMNet integrates dog feces characteristics more effectively than ResNeSt to strengthen adaptive judgment for the future classification stage.

Multi-Scale Attention Down-Sampling Module (MADM)
The natural environment's attenuation of fecal characteristics strongly affects the network's features. The down-sampling structure in a neural network adjusts the image feature scale, creates image thumbnails, and extracts prominent features in the receptive field; it highlights valuable information and helps ease this condition. Scaling down the image also reduces complexity and suppresses overfitting. Down-sampling is commonly done with max or average pooling. The former emphasizes the most prominent characteristics in the receptive field, so minor feature information is easily overlooked; the latter is milder, smoothing all information and diminishing the sharpness of prominent characteristics, which impairs the network's ability to withstand noise interference. Weathering, rain, and sun, in particular, dilute the texture and color features of feces, lowering the network's effective feature values and making the classifier's judgment tougher. Traditional down-sampling of such feces images exacerbates the loss of tiny features. Furthermore, standard pooling is static and has no learnable parameters. In summary, the natural attenuation of stool characteristics reduces feature information, and standard down-sampling struggles to cope. As a result, we propose MADM, whose structure is shown in Figure 3b.
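The max-pooling versus average-pooling trade-off described above can be made concrete with a toy 1D example (our illustration, not code from the paper): max pooling preserves a prominent peak that average pooling dilutes, while both lose the weaker neighboring feature.

```python
# Toy 1D down-sampling with window = stride = 2 (illustrative only).
def max_pool_1d(signal, window=2):
    # Keep only the most prominent value in each window.
    return [max(signal[i:i + window]) for i in range(0, len(signal), window)]

def avg_pool_1d(signal, window=2):
    # Smooth every value in each window equally.
    return [sum(signal[i:i + window]) / window for i in range(0, len(signal), window)]

# A weak texture feature (0.1) sitting next to a strong edge (0.9).
signal = [0.1, 0.9, 0.1, 0.1]
print(max_pool_1d(signal))  # the 0.9 peak survives, the weak 0.1 beside it is lost
print(avg_pool_1d(signal))  # the 0.9 peak is diluted toward 0.5
```

This is exactly the dilemma MADM targets: its parallel, trainable convolution branches let the network learn which fine details to retain rather than applying one fixed rule.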
We divide the information flow into three branches with different convolution kernels and max pooling. Unlike the static pooling process, the convolution operations share weights across all sampled points in the feature map, boosting the model's attention to tiny fecal characteristics. Multiple convolutions also extract multi-scale information and describe fecal characteristics comprehensively. We employ large convolution kernels of 5 × 5 and 7 × 7 with a stride of two. Large convolutions, however, increase the number of network parameters, which we balance by employing depthwise separable convolution. These two branches are trainable and compensate for the static disadvantage of max pooling. This enables MADM to extract the most important fecal textures while merging rich dog feces features into the network. Finally, we employ SE attention to enhance the correlation between fecal feature channels and the information flow between the three branches.

Coordinate Location Attention Mechanism (CLAM)
In the environment, dead leaves, dirt, stumps, and other similar items often hamper dog feces categorization. We can employ attention mechanisms in deep learning to filter out these disturbances. The Attention Mechanism (AM) may reconfigure the corresponding weight of the feature map based on the target's attributes, therefore strengthening the target's feature expression. CA is a fairly new attention mechanism. It decomposes channel attention into two intersecting one-dimensional codes, aggregates characteristics in two spatial directions, and can locate target information accurately. Its spatial sensitivity is beneficial for fine-grained categorization in complicated environments. We propose a CLAM based on CA and the background properties of dog feces, and the structure is depicted in Figure 3c.
The feature map with input size h × w × C is first duplicated three times. The coordinate information x_C is initially embedded in the feature map, as illustrated in Equation (1). Then, average pooling is employed to provide direction awareness for two of the copies along the horizontal and vertical dimensions. The orientation perception is made up of two one-dimensional feature encodings, one in each direction; the feature encodings output in the horizontal coordinate h and the vertical coordinate w are given in Equations (2) and (3). We transpose the horizontal coordinate codes to allow later dimensional unity; Equation (4) depicts the transposition process.
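As a rough sketch of the direction-aware pooling in Equations (2) and (3) (our single-channel simplification, not the paper's implementation), each row and each column of the feature map is averaged into its own one-dimensional encoding:

```python
# Direction-aware average pooling of one H x W feature map channel,
# in the style of CA/CLAM: one encoding per row, one per column.
def directional_pool(feature_map):
    H, W = len(feature_map), len(feature_map[0])
    z_h = [sum(row) / W for row in feature_map]            # horizontal: one value per row
    z_w = [sum(feature_map[i][j] for i in range(H)) / H
           for j in range(W)]                              # vertical: one value per column
    return z_h, z_w

fmap = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
z_h, z_w = directional_pool(fmap)
print(z_h)  # [2.0, 5.0]
print(z_w)  # [2.5, 3.5, 4.5]
```

Because each encoding preserves position along one axis, attention built from these vectors can locate a target's row and column, which is the spatial sensitivity CLAM exploits.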
We design a location attention mechanism (LAM) as the core of CLAM. This mechanism contains DSC and dynamic connection to generate changeable dimensionality via global pooling and parameter size. It enriches the fecal information adaptively. The pooling result is then coupled to a fully connected layer and encoded as a one-dimensional feature. Simultaneously, we join a 3 × 3 DSC with a dot product one-dimensional encoding. Following that, we synthesize the output LAM horizontal and vertical direction information, and the fusion process is depicted in Equations (5) and (6).
where F1 and F2 are the LAM inputs, P1 and P2 are average pooling operations, Pa and Pb are fully connected layers, M is the element-wise dot product, and A_A and A_B are the LAM output information. Finally, we use multiplication and softmax to fuse the LAM information in the two spatial directions and aggregate the output A_C of the third branch to generate the CLAM module's output F_C, as indicated in Equation (7).
The CLAM output comprises multiple spatial views of the feces and the significant target location, efficiently filtering complex background interference and solving the problem of mistaking similar objects for feces.

SCM-Block
The color, posture, and texture characteristics of dog feces are quite complex, with slight variance across classes as well as diversity within classes. For example, water-deficient and normal feces are generally long strips, and diarrhea may be yellow, black, brown, or green in hue. For similar classes, the color map spaces and texture channels are extremely comparable. If feature fusion is insufficient, the model will ignore minor variations, hurting its performance.
Therefore, we propose a new SCM-Block based on the ResNeSt-Block and stack it into a new four-layer backbone network, SCMNet. The block builds on Split-Attention for channel information modeling, combining the spatial dimension of CLAM and the multi-scale information of MADM to improve information exchange efficiency; its structure is shown in Figure 3e. CLAM first locates the fecal target in the input image, and MADM then stretches the feces' feature scale. Finally, we transmit the combined information into the Split-Attention module. For an input map X, the procedure is calculated as follows.
In the residual branch, different stride settings execute distinct procedures. With stride = 1, the network maps the information directly; with stride = 2, the network down-samples using MADM, as indicated in Equation (8).
In the main branch, we first use CLAM to extract the position information from the input image and then use a 1 × 1 convolution to split the feature channels into k × r branches (hyperparameters Cardinal = k, Radix = r). When the stride is 2, we use MADM to fuse the down-sampled features; when the stride is 1, we use DSC with a 3 × 3 kernel to fuse the features. The final k × r sets of multi-channel features are transmitted to the Split-Attention module. Split-Attention is calculated as follows:
1. The splits within each Cardinal group are summed element-wise; the kth Cardinal group is shown in Equation (9).
where Û^k ∈ R^(H×W×C/K) for k = 1, 2, …, K, and H, W, and C are the height, width, and channel sizes of the block's output feature map.

2. Global information is gathered by global average pooling of s^k ∈ R^(C/K) across the spatial dimensions, with the cth component derived as defined in Equation (10).
3. The channels are separated into r branches using a 1 × 1 convolution kernel, and the weights are allocated by the R-Softmax operation after another 1 × 1 convolution, as described in Equation (11), where G^c_i indicates the weight computed for each split of s^k.

4. We sum the features to obtain the calculation formula of the cth channel, as shown in Equation (12).
The Split-Attention output of each group is then aggregated along the channel dimension as illustrated in Equation (13), and the backbone network output information is acquired after passing through the 1 × 1 convolution kernel.
We conduct feature fusion on the residual and main branches, and the final output signal Y is presented in Equation (14).
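The radix-wise weighting at the heart of Split-Attention (Equations (9)-(12)) can be sketched in simplified form. Note this is our illustration: in the real block, the per-split logits come from learned fully connected layers, whereas here we use the split values themselves as logits just to show the R-Softmax weighting and weighted sum.

```python
import math

# Simplified radix-wise Split-Attention for one Cardinal group:
# r splits of C channels each, softmax across the radix dimension per channel.
def split_attention(splits):
    r, C = len(splits), len(splits[0])
    out = []
    for c in range(C):
        logits = [splits[k][c] for k in range(r)]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]           # R-Softmax per channel
        weights = [e / sum(exps) for e in exps]
        # Weighted sum of the r splits for this channel.
        out.append(sum(w * splits[k][c] for k, w in enumerate(weights)))
    return out

splits = [[1.0, 0.0], [3.0, 0.0]]   # r = 2 splits, C = 2 channels
print(split_attention(splits))       # channel 0 leans toward the stronger split
```

The key property shown is that each channel's output is a soft, data-dependent blend of the r splits rather than a fixed average, which is what lets the block emphasize the most informative branch.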
The number of split channels is determined by the SCM-Block hyperparameter pair cardinal and radix, which affects model performance in different ways. The discussion of hyperparameters can be found in Section 4.3.

Depthwise Separable Convolution (DSC)
MC-SCMNet increases the number of network parameters while boosting accuracy, which makes fast training difficult. We therefore use DSC instead of standard convolution (as shown in Figure 3d) to decrease the number of model parameters and speed up training without hurting performance. DSC is a sequential connection of a depthwise convolution and a pointwise convolution. The former is a spatial convolution applied to each input channel individually; the latter is a 1 × 1 convolution that projects the channels from the previous step into a new channel space, acting like a 3D filter with both channel and spatial dimensions. DSC introduces grouped convolution into the standard convolution, allowing it to compute the spatial mapping and the channel mapping independently. As a consequence, it preserves the information-processing outcome while reducing the number of parameters.
Consider input data of size S × S × C, a convolution kernel of K × K × N, and a stride of 1; Equation (15) gives the number of standard convolution parameters.
The calculation volume corresponding to the standard convolution is shown in Equation (16).
The parameter number of DSC is shown in Equation (17).
The corresponding calculation amount of DSC is shown in Equation (18).
Therefore, the ratio between the number of parameters and the amount of computation corresponding to the two structures is shown in Equations (19) and (20).
The above formulas show that DSC factorizes one large convolution into a depthwise step and a pointwise step, shrinking both the parameter count and the computation while producing output feature maps of the same size. It makes better use of model parameters, lowers operational cost, and accelerates computation.
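As a quick numerical check of the savings (a sketch consistent with Equations (15)-(20), bias terms omitted), the ratio of DSC cost to standard-convolution cost reduces to 1/N + 1/K²:

```python
# Parameter and multiply-accumulate counts for standard vs depthwise
# separable convolution: input S x S x C, kernel K x K, N output channels,
# stride 1, "same" padding.
def conv_costs(S, C, K, N):
    std_params = K * K * C * N                 # Eq. (15)-style count
    std_macs = K * K * C * N * S * S           # Eq. (16)-style count
    dsc_params = K * K * C + C * N             # depthwise + 1x1 pointwise, Eq. (17)
    dsc_macs = (K * K * C + C * N) * S * S     # Eq. (18)
    return std_params, std_macs, dsc_params, dsc_macs

std_p, std_m, dsc_p, dsc_m = conv_costs(S=224, C=64, K=3, N=128)
print(dsc_p / std_p)        # equals 1/N + 1/K^2
print(1 / 128 + 1 / 9)
```

For a typical 3 × 3 layer with 128 output channels, DSC therefore needs only about 12% of the parameters and computation of the standard convolution.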

Results and Analysis
This section is divided into ten subsections: (1) Experimental environment and preparation; (2) Evaluation indicators; (3) Hyperparameter selection; (4) Data validation methods and tests; (5) Ablation experiment; (6) Analysis of performance experiments; (7) Comparison with other state-of-the-art methods; (8) Statistical test; (9) Model visualization; and (10) Workflow for a practical application. We tested and compared the models in many aspects. The results show that MC-SCMNet handles qualitative changes of dog feces in the natural environment, misdetection of similar objects, and the slight variability between classes.

Experimental Environment and Preparation
To prevent differing experimental conditions from influencing the MC-SCMNet results, all experiments in this paper were conducted in the same hardware and software environment. The main hardware is an NVIDIA GeForce RTX 3070 Ti GPU and a 12th Gen Intel(R) Core(TM) i9-12900H 2.50 GHz CPU (Legion Y9000P|AH7H). The software must be compatible with this hardware; Table 3 lists the particular experimental environment of this work. Considering the hardware performance and training effect, this paper sets the number of training epochs to 130, employs the Adam optimizer with a learning rate of 0.0001, and sets the batch size to 8.
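The training setup above can be captured in a small configuration sketch; the dictionary and key names are illustrative choices of ours, and only the values come from the paper:

```python
# Illustrative configuration mirroring the reported training setup;
# the key names are hypothetical, the values are those stated in the paper.
TRAIN_CONFIG = {
    "epochs": 130,
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 8,
    "input_size": (224, 224),  # image size used for all experiments
}

assert TRAIN_CONFIG["learning_rate"] == 0.0001
```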

Evaluation Indicators
In this paper, we evaluate the performance of the model by accuracy, precision, recall, F1, parameter size, and training time:
Accuracy = (TP + TN)/(TP + TN + FP + FN),
Precision = TP/(TP + FP),
Recall = TP/(TP + FN),
F1 = 2 × Precision × Recall/(Precision + Recall),
where TP (true positive) means the prediction is positive and the ground truth is positive; FP (false positive) means the prediction is positive but the ground truth is negative; TN (true negative) means both are negative; and FN (false negative) means the prediction is negative but the ground truth is positive. The inter-class averages of model precision and recall are calculated using the weighted-average strategy [61]. The parameter size measures the model size and operating cost. Training time indicates the model's training speed.
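These indicators, including the weighted-average strategy for the inter-class means, can be sketched directly from the definitions (the counts below are hypothetical, purely for illustration):

```python
def precision(tp, fp):
    # Fraction of predicted positives that are truly positive
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were predicted positive
    return tp / (tp + fn)

def f1_score(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

def weighted_average(metrics, supports):
    # Weighted-average strategy: weight each class metric by its sample count
    return sum(m * s for m, s in zip(metrics, supports)) / sum(supports)

# Hypothetical counts for one class, purely for illustration
p = precision(tp=90, fp=10)   # 0.90
r = recall(tp=90, fn=10)      # 0.90
assert abs(f1_score(p, r) - 0.9) < 1e-9
```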

Hyperparameter Selection
In Section 3.3.3, we mentioned the different effects of hyperparameters on the model. To explore the best-performing combination, we test the impact of various radix and cardinality settings; the experimental results are listed in Table 4. According to the results, the network performs best when radix = 2 and cardinality = 1. Therefore, the combination of radix = 2 and cardinality = 1 was used for all subsequent experiments in this paper.

Data Validation Methods and Tests
We fed the data-augmented images, a total of 3246 dog feces photos in four categories, into model training while keeping the other experimental settings constant. The image size was standardized to 224 × 224. The 5-fold cross-validation approach is used for training in this paper. First, the images are randomly split into five equal parts. Then, five repeated trials are set up, each training on four of the parts and leaving one to test the model. The model's final performance is measured as the average of the five experimental outcomes. This strategy efficiently reduces the randomness of model training while improving the accuracy and reliability of the experiments.
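The splitting procedure above can be sketched with the standard library alone; the function name and seed are our illustrative choices:

```python
import random

def five_fold_splits(n_images, seed=0):
    """Randomly partition image indices into five (train, test) splits."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]  # five near-equal parts
    splits = []
    for k in range(5):
        test = folds[k]
        train = [i for j in range(5) if j != k for i in folds[j]]
        splits.append((train, test))
    return splits

splits = five_fold_splits(3246)  # the augmented DFML image count
assert len(splits) == 5
train, test = splits[0]
assert len(train) + len(test) == 3246 and set(train).isdisjoint(test)
```

Each image lands in exactly one test fold, so averaging the five test accuracies uses every sample for evaluation exactly once.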
We use this strategy to train ResNet50, ResNeSt50, and MC-SCMNet, as shown in Figure 4; their average accuracies are 80.22%, 83.70%, and 88.24%, respectively. MC-SCMNet's accuracy is much higher than that of the original models ResNet50 and ResNeSt50. To further validate the performance of our model, we retrained ResNet50, ResNeSt50, and MC-SCMNet using the images before data augmentation; the resulting accuracies were 71.30%, 74.69%, and 77.30%, respectively. The experiments demonstrate that our model outperforms ResNet50 (+5.86%, +8.02%) and ResNeSt50 (+2.47%, +4.54%) on both the original and augmented datasets. Furthermore, as the number of images increases, so does the improvement in accuracy.

After training with five-fold cross-validation, the MC-SCMNet model achieves an average accuracy of 88.24%. Fold 1's accuracy, 88.27%, is the closest to the average, as shown in Figure 5.

Ablation Experiment
We present the MADM and CLAM modules, build SCM-Block, and apply DSC to the models in this paper. To test their effectiveness, we chose four factors for the ablation experiments on the model: Block, Attention, Conv, and down-sampling. Table 5 displays the experimental outcomes. Table 5 shows a total of 14 experimental schemes. Schemes 1 and 2 are the basic model ResNet50 and the baseline ResNeSt50, respectively. Schemes 8 and 14 are the intermediate model SCMNet and the final model MC-SCMNet, respectively.

Comparing Scheme 4 with Schemes 1–3 shows that Split-Attention + CLAM performs feature extraction better, with considerably higher accuracy than no attention (+5.55%), Split-Attention alone (+2.16%), and Split-Attention + CA (+1.54%). Comparing Schemes 5–7 with Schemes 2–4 shows that DSC significantly decreases the number of network parameters (−8.861 M) while maintaining comparable identification accuracy. Comparing Schemes 2 and 8 shows that SCM-Block better filters interference information in the feature-fusion stage of the model, decreases feature-information loss, and greatly enhances model accuracy (+3.70%). Comparing Schemes 8 and 9, we find that MADM allows the model to focus better on fine-grained information during the down-sampling phase. Schemes 8–14 fully depict the effects of MADM, CLAM, and DSC on SCM-Block and on the model. In Scheme 14, MC-SCMNet reaches an accuracy of 88.27% and greatly surpasses ResNet50 (+8.02%) and ResNeSt50 (+4.63%).

Comparison with Other Basic Classification Networks
In terms of average accuracy, we compare MC-SCMNet with SCMNet and several basic networks: AlexNet, VGG, GoogLeNet, ResNet, and ResNeSt. To depict the curve values clearly, we apply the same level of smoothing and fitting to all of them. After processing, the average accuracies of MC-SCMNet, SCMNet, AlexNet, VGG, GoogLeNet, ResNet, and ResNeSt are depicted in Figure 6 by red, blue, orange, purple, green, brown, and black dotted lines, respectively. We can clearly observe that the accuracy of the SCMNet model is substantially greater than that of the ResNeSt model after 130 epochs. This demonstrates that SCM-Block is considerably effective in properly identifying stool types.

Table 6 displays their accuracies on the different types of data. The MC-SCMNet model has the highest accuracy on diarrhea (93.65%), normal (85.71%), and soft stool (84.71%). However, its accuracy on water-deficient stool (92.06%) is somewhat lower than that of SCMNet (93.65%) and ResNeSt-50 (93.12%).
This is because water-deficient and normal stool samples have similar features, and their number is much higher than that of diarrhea and soft stools. The data extraction stage (deep-stem) of ResNeSt and SCMNet pays insufficient attention to location information and down-sampling details, which causes many normal-type feces with similar characteristics to be identified as water-deficient. As a result, the water-deficient type's accuracy is artificially high, while the normal type's accuracy is low. As indicated in Table 7, we also compared the MC-SCMNet model under each classification with the initial model ResNet50 and the basic model ResNeSt50. Figure 7 depicts the confusion matrices of the MC-SCMNet model and the original models for the different categories.

Comparison with Other State-of-the-Art Methods
To better assess MC-SCMNet's classification ability on dog feces, we compare it to ResNeXt50 [48], B-ARNet [28], DMS-Robust AlexNet [62], Swin-Transformer [63], and CA-MSNet [64]. Table 8 shows that the recognition accuracy of MC-SCMNet in each category is higher than that of the other state-of-the-art methods. The network's SCM-Block combines CLAM for spatial information and MADM for fine-grained information using the Split-Attention approach. This raises the computational cost while also improving network performance, resulting in a long training time (3 h 31 min 38 s). In the future, we will therefore employ more lightweight strategies to improve the network's classification performance and training speed.
The precision of ResNet50 ranges from 71% to 88%, with an average precision of 80.79%; its recall ranges from 71% to 89%, with an average recall of 81.16%. The model's F1 value is 80.98%, and its accuracy is 80.25%. The precision of ResNeSt50 ranges from 76% to 93%, with an average precision of 84.13%; its recall ranges from 76% to 90%, with an average recall of 84.05%. The model's F1 value is 84.09%, and its accuracy is 83.64%. MC-SCMNet, in contrast, maintains a precision of 84% to 94%, with an average precision of 88.89%, and a recall of 85% to 95%, with an average recall of 88.93%. The model's F1 value is 88.91%, and its accuracy is 88.27%. The testing results show that MC-SCMNet distinguishes the feces features of each category accurately and raises the lower bound of precision and recall. The metrics above are much higher than those of ResNet50 and ResNeSt50, thoroughly demonstrating that MC-SCMNet is effective.


Statistical Test
To rule out randomness and chance in our experimental data, we use SPSS to conduct a one-way ANOVA on the experimental data in Tables 6 and 8 [65]; the results are shown in Table 9. We first made the null hypothesis (H0) that all performance indicators are equal and that any small gains or losses observed are not statistically significant, and the alternative hypothesis (H1) that the observed gains or losses are statistically significant. We set α = 0.05. The results show F = 4.1364 > Fcrit = 2.0666 and p = 0.0000579 < α = 0.05, so we reject H0. Therefore, we conclude that there are significant differences between these groups and that MC-SCMNet is superior to the current mainstream methods on the feces classification task.
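For readers without SPSS, the F statistic underlying this test can be computed directly from its definition. The sketch below is a minimal illustration of the one-way ANOVA computation (the function name and toy data are ours; the paper's actual values come from Tables 6 and 8):

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups.

    A minimal sketch of the test reported in Table 9; the paper performs
    the computation in SPSS.
    """
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    # F = between-group mean square / within-group mean square
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data: these two small groups give F = 1.5
assert abs(one_way_anova_f([[1, 2, 3], [2, 3, 4]]) - 1.5) < 1e-9
```

H0 is rejected when the computed F exceeds the critical value at the chosen α, as it does for the paper's data (4.1364 > 2.0666).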

Model Visualization
To observe the focus of the MC-SCMNet model more intuitively, we produced heat maps using the Gradient-weighted Class Activation Mapping (Grad-CAM) method and assessed the model in conjunction with the original images; Figure 8 depicts the results. In the MC-SCMNet model, we can observe that after layer 1 training, the network's attention is somewhat divergent, focusing on detailed information. Following layer 2 training, the network pays more attention to the features of the darker portions while still attending to details. After layer 3 training, the network begins to focus on the edge boundaries of objects in the picture. Following layer 4 training, the network completes image feature extraction and concentrates all attention on the dog feces region. In contrast, after layer 1 training, the attention of the ResNet50 and ResNeSt50 models is concentrated on a narrow local area, and attention to the image as a whole is lost. After layer 2 training, ResNet50 concentrates on observing weeds and water stains, while the ResNeSt50 model's attention begins to diverge, focusing more on water stains and excrement. Following layer 3 and layer 4 training, both models concentrate on ground water spots.
Based on the findings above, ResNet50 and ResNeSt50 highlight erroneous interference information. During training, MC-SCMNet pays close attention to location information and down-sampling details, which filters out the influence of similar items and highlights the properties of feces. As a result, MC-SCMNet is better adapted to dog feces identification than ResNeSt50 and ResNet50.
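The heat maps above follow the standard Grad-CAM recipe: channel weights come from global-average-pooled gradients of the class score, and the weighted sum of the feature maps is passed through a ReLU. A framework-free sketch (nested lists stand in for tensors; all names are ours):

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from one layer's activations and class-score gradients.

    feature_maps, gradients: K channels, each an H x W nested list.
    """
    H, W = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weights: global average pooling of the gradients
    weights = [sum(sum(row) for row in g) / (H * W) for g in gradients]
    cam = [[0.0] * W for _ in range(H)]
    for w, fmap in zip(weights, feature_maps):
        for i in range(H):
            for j in range(W):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only regions that positively influence the class score
    return [[max(0.0, v) for v in row] for row in cam]

# A 2 x 2 toy example: uniform positive gradients highlight the active pixels
heat = grad_cam([[[1, 0], [0, 1]]], [[[1, 1], [1, 1]]])
assert heat == [[1.0, 0.0], [0.0, 1.0]]
```

In practice, the heat map is upsampled to the input resolution and overlaid on the original photo, which is how the per-layer attention in Figure 8 is rendered.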
To further examine the reliability of MC-SCMNet in dog feces categorization, we visually compared it to the AlexNet, VGG-16, GoogLeNet, ResNet-50, ResNeXt-50, and ResNeSt-50 models. Figure 9 depicts the experimental outcomes. Through observation, we found that the distractors in the images contributed, to some extent, to the incorrect attention of the other classification algorithms. MC-SCMNet, on the other hand, suppresses the expression of background information and focuses specifically on the feces region, boosting feces classification accuracy.

Workflow for a Practical Application
We demonstrate the practical workflow of MC-SCMNet in Figure 10. In actual use, pet dog owners and veterinarians can take images of dog feces in any environment with a camera or mobile phone, upload them to the cloud, and pass them to the trained MC-SCMNet model for classification. After processing, they can view the stool classification results on a computer monitor or mobile phone. The results can help veterinarians diagnose a dog's intestinal health status, and can also give the owner a preliminary judgment of the dog's gastrointestinal health so that the dog's diet can be improved in a targeted manner.



Discussion
In this work, we propose a fine-grained image classification approach for dog feces using MC-SCMNet under complex backgrounds and confirm the effectiveness of MC-SCMNet. In this section, we discuss the use and application value of MC-SCMNet in many aspects and explore how to improve the application of MC-SCMNet in dog feces classification work based on MC-SCMNet and the DFML dataset.
In the process of raising dogs, dogs are easily affected by parasitic infections [6], gastrointestinal biota disorder [7], stress disorders [8], etc., and produce abnormal feces such as diarrhea [12] or hard stool [66]. Different from the manual method of classifying feces by the PURINA FECAL SCORING CHART [19] and the WALTHAM™ Faeces Scoring System [20], novices who cannot distinguish feces can also accurately classify dog feces through MC-SCMNet. In actual use, owners or veterinarians only need to take pictures of dog feces and upload them, and MC-SCMNet can automatically obtain the classification results end-to-end, and provide early warning of the dog's intestinal health [26]. In the future, we should aim to deploy this approach to edge devices [67] for veterinarians and pet owners.
The results of this study show that the MC-SCMNet model outperforms other classification models in terms of overall classification accuracy under the same experimental setting. Moreover, because we use DSC instead of standard convolution, the total training time of MC-SCMNet is only increased by 16 min and 36 s compared with the original model ResNeSt50 [49]. Although MC-SCMNet has achieved a good balance in time and accuracy compared with ResNeSt, there is still a lot of room for improvement in the time of 3 h 31 min 38 s. Therefore, we will work on developing more lightweight strategies in the future to continuously improve the operational efficiency of the network.
In the performance experiments in Table 6, MC-SCMNet's accuracy on water-deficient stool (92.06%) is somewhat lower than that of SCMNet (93.65%) and ResNeSt-50 (93.12%). However, it has the highest accuracy on diarrhea (93.65%), normal (85.71%), and soft stool (84.71%). This shows that MC-SCMNet has no obvious bias in recognizing feces and has stable classification performance. However, the shortfall in recognition accuracy also reflects the insufficient amount of data in our DFML dataset and its limited coverage of data categories. To alleviate this, we will add more categories in the future to improve the generalization ability of the model. Since the dataset contains only a small number of samples, this paper uses data augmentation to alleviate the network's overfitting and improve model accuracy, but the improvement is limited and cannot fully substitute for additional real images. Therefore, we plan to extend the DFML dataset in the future to improve model performance.
MC-SCMNet classifies dog feces accurately. However, in practical applications, the feces classification results cannot be mapped precisely to specific gastrointestinal diseases. When a dog's feces have been diagnosed as abnormal by MC-SCMNet for a long time, the owner needs to contact a veterinarian for further diagnosis [26,31]. Therefore, in future work, we should record characteristics of the dogs while photographing their feces, and conduct a degree of biochemical analysis of the dogs' gastrointestinal health without affecting their well-being [31]. In this way, a direct correlation between dog feces and gastrointestinal health can be established, improving the adequacy and reliability of the work.

Conclusions
The dog feces classification model proposed in this paper significantly improves the performance of fine-grained dog feces classification in complex backgrounds. In the absence of an existing dataset, we built our own DFML dataset for the first time, and proposed MC-SCMNet based on ResNeSt. First, we propose a multi-scale attention down-sampling module, MADM. It expands the information scale and has learnable parameters while focusing on the salient features of feces, further improving the model's fecal feature extraction. Then, we propose CLAM. It filters background interference in the image through bidirectional weight encoding, making the network focus on the target hotspot. Finally, in the feature fusion stage, we combined the two sub-modules above to form the SCM-Block. We use this block to build a new backbone network, SCMNet, which improves the model's fusion efficiency for dog feces features. Considering the computational cost and training time, we use DSC to reduce the number of network parameters while maintaining model performance. The experimental results show that MC-SCMNet alleviates the problems of information loss and false classification in dog feces recognition. Compared to other standard classification models, it achieves the highest accuracy (88.27%) and is better targeted at the fine-grained classification of dog feces.
In summary, the work in this paper could monitor the health of dogs. The method can provide an initial screening of fecal status to aid veterinary diagnosis when a dog's feces are abnormal. This can improve the efficiency of the veterinary work while providing an initial warning to the pet dog owner.