Recognition of Ellipsoid-like Herbaceous Tibetan Medicinal Materials Using DenseNet with Attention and ILBP-Encoded Gabor Features

Tibetan medicinal materials play a significant role in Tibetan culture. However, some types of Tibetan medicinal materials share similar shapes and colors but possess different medicinal properties and functions. The incorrect use of such materials may lead to poisoning, delayed treatment, and potentially severe consequences for patients. Historically, the identification of ellipsoid-like herbaceous Tibetan medicinal materials has relied on manual methods, including observation, touching, tasting, and smelling, which depend heavily on the technicians' accumulated experience and are prone to errors. In this paper, we propose an image-recognition method for ellipsoid-like herbaceous Tibetan medicinal materials that combines texture feature extraction and a deep-learning network. We created an image dataset consisting of 3200 images of 18 types of ellipsoid-like Tibetan medicinal materials. Because of the complex backgrounds and the high similarity in shape and color of the materials in the images, we conducted a multi-feature fusion experiment on the shape, color, and texture features of these materials. To leverage the importance of texture features, we used an improved LBP (local binary pattern) algorithm to encode the texture features extracted by the Gabor algorithm. We fed the final features into the DenseNet network to recognize the images of the ellipsoid-like herbaceous Tibetan medicinal materials. Our approach focuses on extracting important texture information while ignoring irrelevant information such as background clutter, which reduces interference and improves recognition performance. The experimental results show that our proposed method achieved a recognition accuracy of 93.67% on the original dataset and 95.11% on the augmented dataset.
In conclusion, our proposed method could aid in the identification and authentication of ellipsoid-like herbaceous Tibetan medicinal materials, reducing errors and ensuring the safe use of Tibetan medicinal materials in healthcare.


Introduction
As the material basis of medical theory, Tibetan medicinal materials serve to achieve the purposes of disease prevention and healthcare, acting as a bridge between medical theory and clinical practice [1]. The correct recognition and application of Tibetan medicines are essential prerequisites for making full use of their medicinal value. Ellipsoid-like herbaceous Tibetan medicinal materials have fewer intraclass differences due to their similar natural attributes, such as their color and shape. In the early days, people mainly relied on manual methods of identification, such as observation, touch, taste, and smell, to recognize Tibetan medicinal materials [2]. However, these methods are highly subjective, labor-intensive, and prone to errors. With the development of deep-learning technology, great progress has been made in the recognition of ellipsoid-like herbaceous Tibetan medicinal materials [3,4]. Compared to traditional manual methods, deep-learning-based methods [...]

To address these challenges, we built a standard dataset of ellipsoid-like herbaceous Tibetan medicinal materials with complex backgrounds. We combined a Gabor wavelet transform and improved local binary patterns to extract the texture features of images, and used the DenseNet network with an added attention mechanism to identify ellipsoid-like herbaceous Tibetan medicine images. The experimental results show that our method can achieve a 93.67% recognition accuracy on our dataset. To sum up, the main contributions of this paper are as follows:

• We verified the key role of texture features in recognizing ellipsoid-like herbaceous Tibetan medicinal materials by conducting multi-feature fusion experiments on a constructed ellipsoid-like herbaceous Tibetan medicinal material dataset.

• We used data enhancement to increase the number of images and validated its effectiveness in the recognition of ellipsoid-like herbaceous Tibetan medicinal materials.

• We proposed the use of an improved LBP algorithm to encode texture features of ellipsoid-like herbaceous Tibetan medicinal materials and demonstrated its effectiveness at improving the recognition accuracy on an additional complex test set.

• We evaluated our proposed method against existing herbal methods on the constructed dataset, and our results show that our method achieved better recognition for ellipsoid-like herbaceous Tibetan medicinal materials on a complex background.
The remainder of this paper is organized as follows: Section 2 reviews related work, Section 3 presents the method for identifying ellipsoid-like herbaceous Tibetan medicinal materials, Section 4 presents the experiments and an analysis of the results, and Section 5 concludes the paper.

Related Work
Progress has been made in the computer-based recognition of herbal medicines with similar shapes. Earlier works relied on the low-level features of a single image, such as the color [6][7][8][9], texture [10][11][12][13][14], and shape [5,15], for various fine-grained herbal medicine recognition tasks. Due to the richness of herbal species, even herbs belonging to the same species can vary significantly in quality due to differences in the growing regions, climate, harvesting times, and processing methods. Recent research has proposed the use of deep-learning networks in the field of traditional Chinese medicine recognition, with convolutional neural networks showing greater advantages over traditional shallow machine-learning algorithms in image classification. The main deep-learning algorithms used in this field include GoogleNet [16][17][18][19], VGGNet [16][17][18][20,21], ResNet [20,22,23], DenseNet [24], and AlexNet [20,21][25][26][27][28], among others. Lightweight CNNs such as SqueezeNet [29], ShuffleNet [30,31], and MobileNet [32][33][34] are also gaining popularity due to their fast speed, small memory requirement, and low computation, making them suitable for mobile devices. Recent advancements in peripheral vision [35], multi-axis vision transformers [36], and visual transformers [37][38][39] have also improved the accuracy of fine-grained classification tasks. These methods provide important references for recognizing ellipsoid-like herbaceous Tibetan medicinal materials. However, existing experiments have mainly focused on images of individual ellipsoid-like herbaceous Tibetan medicinal materials taken in ideal environments, resulting in degraded recognition performance for images with complex backgrounds. In this paper, we propose a recognition model that combines texture feature extraction with deep-learning methods for images of ellipsoid-like herbaceous Tibetan medicinal materials captured in complex backgrounds.
We improved the model's robustness to complex background distractors by introducing an attention mechanism.

A Dataset of Ellipsoid-like Herbaceous Tibetan Medicinal Materials
By reviewing the Encyclopedia of Tibetan Medicinal Materials in China, we selected 18 types of ellipsoid-like herbaceous Tibetan medicinal materials, including Lu Lu Tong, soapberry, and You Ma Zhong. We used Python 3.8 [40] to search for corresponding images of ellipsoid-like herbaceous Tibetan medicinal materials from the Bing search engine and major Tibetan medicinal material websites. Additionally, we went to the Tibetan Museum of Natural Sciences to take pictures of ellipsoid-like herbaceous Tibetan medicinal materials. Due to the low quality of the captured images, manual screening was required to ensure that the images in the original dataset correctly reflected the corresponding ellipsoid-like herbaceous Tibetan medicinal materials. Therefore, we hired researchers specializing in Tibetan medicine to identify and screen the images in the dataset, ensuring their accuracy. In total, we acquired 3200 images of 18 species of ellipsoid-like herbaceous Tibetan medicinal materials.

After the manual screening, many of the images obtained from the internet were discarded for some types of ellipsoid-like herbaceous Tibetan medicinal materials. Meanwhile, the number of images obtained by field photography at Lhasa joint specialty stores and the Tibetan Museum of Natural Sciences was also limited. As a result, the available training images for model learning were insufficient. To address this issue, we used data augmentation methods to expand the dataset by adjusting the brightness, adding Gaussian noise, and mirroring and rotating the images. In this way, the training dataset was enlarged and the number of Tibetan medicinal material images increased from 3200 to 16,000, which helped alleviate the model overfitting issue [4].
To evaluate the effect of data augmentation on the recognition of ellipsoid-like herbaceous Tibetan medicinal materials, we conducted experiments on the original dataset and the augmented dataset using our proposed method. The dataset was randomly divided into a training set and a test set in an 8:2 ratio. To evaluate the recognition accuracy of our proposed model, we additionally collected 360 images of complex backgrounds to build a complex test set. Compared with the images in the normal test set, the images in the complex test set had backgrounds that were usually very similar in color to the medicinal materials, and the occlusion was more severe. Some example images are shown in Figure 2.
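The four augmentation operations and the 8:2 split described above can be sketched as follows. This is a minimal pure-Python illustration on small grayscale images stored as nested lists with values in [0, 1], not the authors' pipeline. The function names, the brightness factor, the noise level, and the reading of the five-fold growth (3200 to 16,000) as "original plus one image per operation" are all assumptions made for this example.

```python
import random

def adjust_brightness(img, factor):
    """Scale every pixel by `factor`, clipping to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def mirror(img):
    """Horizontal flip."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img, rng):
    """Return the original plus four augmented variants (assumed 5x growth)."""
    return [
        img,
        adjust_brightness(img, 1.2),        # hypothetical brightness factor
        add_gaussian_noise(img, 0.05, rng),  # hypothetical noise level
        mirror(img),
        rotate90(img),
    ]

def split_train_test(items, train_ratio=0.8, seed=0):
    """Shuffle and split into training and test subsets (8:2 by default)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With 3200 items, `split_train_test` yields 2560 training and 640 test items. A real pipeline would normally use an image library and arbitrary rotation angles rather than nested lists and a fixed 90-degree turn.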

Feature Extraction
Color features are relatively insensitive to image size and orientation and have clear, intuitive, and easily described physical properties [41]. The RGB (red, green, blue) encoding method was used to represent the intensity of each of the three color channels. After encoding, the RGB color space was converted to the HSI (hue, saturation, intensity) color space for extracting the image feature vectors, and the obtained color feature vectors were subsequently normalized. Compared with the RGB model, the HSI model adds two feature parameters: saturation and intensity. Assuming that the color components in the RGB color space are (R, G, B) with R, G, B ∈ [0, 1], the formulas for converting from the RGB color space to the HSI color space are as follows [42]:

$$H = \begin{cases} \theta, & B \le G \\ 360^\circ - \theta, & B > G \end{cases}$$

$$S = 1 - \frac{3}{R + G + B}\,\min(R, G, B)$$

$$I = \frac{R + G + B}{3}$$

where

$$\theta = \arccos\left\{ \frac{\tfrac{1}{2}\left[(R - G) + (R - B)\right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right\}.$$

The histogram of oriented gradient (HOG) algorithm is widely used for the shape feature extraction of images. The image is first normalized, and gamma compression is applied to the color image to reduce the effects of shadows and illumination variations. Then, the gradient calculation is performed on the normalized color image to obtain the horizontal and vertical gradient components, $G_x$ and $G_y$, and to calculate the gradient amplitude $G$ of the current pixel. The calculation formula is as follows [43]:

$$G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}$$

The local binary pattern (LBP) texture analysis operator was first proposed by Ojala et al. [44]. This algorithm is widely used in the feature extraction process of recognizing objects [45]. The texture structure characteristics of ellipsoid-like herbaceous Tibetan medicinal materials do not change significantly under different angles and levels of illumination and shading. The LBP operator can therefore extract the texture features of these materials well, which increases the robustness and accuracy of their recognition. The LBP algorithm [46] we used is defined as follows:

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $P$ and $R$ represent the number of neighborhood pixels and the radius of the processing unit, respectively, $g_c$ represents the gray value of the center pixel, and $g_p$ represents the gray value of the $p$-th pixel in the neighborhood, with $p = 0, 1, 2, \ldots, P - 1$.
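As an illustration of two of the formulas in this section, the sketch below converts a single RGB pixel to HSI and computes the basic LBP code for P = 8, R = 1. It is a pure-Python example, not the authors' code. The neighbor ordering (clockwise from the top-left pixel) and the handling of gray pixels, whose hue is undefined, are conventions assumed here.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I)."""
    total = r + g + b
    intensity = total / 3.0
    saturation = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if den == 0:
        hue = 0.0  # gray pixel: hue is undefined, 0 is used by convention here
    else:
        ratio = max(-1.0, min(1.0, num / den))  # guard against rounding
        theta = math.degrees(math.acos(ratio))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity

# Offsets (row, col) of the 8 neighbors at radius 1, clockwise from top-left.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """Basic LBP code of the interior pixel (r, c) of a grayscale image."""
    g_c = img[r][c]
    code = 0
    for p, (dr, dc) in enumerate(NEIGHBORS):
        if img[r + dr][c + dc] >= g_c:  # s(g_p - g_c) = 1
            code |= 1 << p              # weight 2^p
    return code

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]
```

For the 3 × 3 patch `[[89, 95, 100], [80, 90, 91], [10, 20, 30]]`, only the neighbors 95, 100, and 91 are at least as large as the center value 90, so the code is 2 + 4 + 8 = 14. The neighbor with gray value 89 contributes 0 even though it differs from the center by a single gray level.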

Multi-Feature Fusion of Images
Ellipsoid-like herbaceous Tibetan medicinal materials have a high similarity in terms of shape and color. Conducting multi-feature fusion experiments on the extracted features can verify the importance of color, shape, and texture features in image representation. We allocated different weights to different features for feature fusion [47]. The total weight of the fused features was 1. Through experiments, we obtained the optimal weights for each feature. The multi-feature fusion equation is as follows:

$$F = a\,F_{RGB} + b\,F_{HOG} + c\,F_{LBP}, \qquad a + b + c = 1$$

where $F$ represents the fused features; $F_{RGB}$ represents the color features; $F_{HOG}$ represents the shape features; $F_{LBP}$ represents the texture features; and $a$, $b$, and $c$ represent the weight coefficients of the respective features.
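A weighted fusion of this kind can be sketched as below. The example assumes that the three feature vectors have already been normalized and brought to a common length, which the text does not specify. The candidate grid `weight_grid`, which enumerates weights in steps of 0.1, is a hypothetical way of organizing the weight search and is not taken from the paper.

```python
def fuse_features(f_color, f_shape, f_texture, a, b, c):
    """Element-wise weighted sum of three equal-length feature vectors."""
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if not (len(f_color) == len(f_shape) == len(f_texture)):
        raise ValueError("feature vectors must have the same length")
    return [a * x + b * y + c * z
            for x, y, z in zip(f_color, f_shape, f_texture)]

def weight_grid(steps=10):
    """All (a, b, c) with a + b + c = 1 and each weight a positive multiple of 1/steps."""
    return [(i / steps, j / steps, (steps - i - j) / steps)
            for i in range(1, steps - 1)
            for j in range(1, steps - i)]
```

Each candidate triple from the grid would be used to fuse the features, train or evaluate the classifier, and record the accuracy; the triple with the best score is kept.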

ILBP-Encoded Gabor Features
Improved local binary patterns: The basic LBP operator assigns the gray value of all pixels smaller than the central gray value to 0 when extracting the image texture [42]. It does not take into account pixels with small differences from the central gray value, so some useful texture information will be lost. For example, if the center pixel is 90 and the surrounding pixels have a gray value of 89, then obviously assigning the gray value of this surrounding pixel to 0 will result in some loss of information.
In addition, comparing only the grayscale values of the peripheral pixels with the central grayscale value suffers from the influence of the central grayscale value and is not very stable [48]. Therefore, we proposed an improved LBP algorithm. Suppose $P$ stands for the current pixel, the gray value of the current pixel is defined as $g_r$, the average gray value of the 8 neighborhoods is $g_a$, and the standard deviation of the 8 neighborhoods is $g_\delta$. The ILBP (improved local binary pattern) algorithm operator is defined as follows:

$$P = \begin{cases} 1, & g_r \ge g_a \ \text{or} \ (g_r < g_a \ \text{and} \ g_a - g_r < g_\delta) \\ 0, & g_r < g_a \ \text{and} \ g_a - g_r \ge g_\delta \end{cases}$$

The improved LBP algorithm uses the gray average of the 8 neighborhoods instead of the central gray value when calculating the binary sequence and takes the standard deviation of the neighborhoods into account. This reduces the influence of the central gray value on the LBP operator and allows the texture features of the image to be extracted more effectively.
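The ILBP rule can be sketched as follows. The example reads "current pixel" as each of the 8 neighbors in turn, uses the population standard deviation of those 8 neighbors, and orders the bits clockwise from the top-left neighbor. These three choices are assumptions made for illustration, since the text does not fix them.

```python
import math

# Offsets (row, col) of the 8 neighbors, clockwise from top-left (assumed order).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def ilbp_code(img, r, c):
    """ILBP code of the interior pixel (r, c) of a grayscale image."""
    values = [img[r + dr][c + dc] for dr, dc in NEIGHBORS]
    g_a = sum(values) / 8.0                                     # neighborhood mean
    g_d = math.sqrt(sum((v - g_a) ** 2 for v in values) / 8.0)  # population std
    code = 0
    for p, g_r in enumerate(values):
        # Bit is 1 unless the neighbor lies at least one standard deviation
        # below the neighborhood mean.
        if g_r >= g_a or (g_a - g_r) < g_d:
            code |= 1 << p
    return code
```

For the patch `[[89, 95, 100], [80, 90, 91], [10, 20, 30]]`, the neighborhood mean is 64.375 and the standard deviation is about 35.13. Only the neighbors 20 and 10 fall at least one standard deviation below the mean, so the code is 1 + 2 + 4 + 8 + 16 + 128 = 159. The neighbor with gray value 89, which a comparison against the center value 90 would set to 0, is now encoded as 1.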
The process of ILBP-encoded Gabor feature extraction is as follows: the Gabor wavelet can reduce the interference of external factors and extract feature information from multiple angles and scales of the target image, while the LBP algorithm can better present the local feature information of the image, extract clearer local texture features, and reduce the feature dimension of the image [49]. Combining the two algorithms allows each to compensate, to a certain extent, for the deficiencies of the other. The specific implementation process of the algorithm combination can be seen in Algorithm 1.
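The general idea of encoding Gabor responses with a binary pattern operator can be sketched as below. This is not a reproduction of Algorithm 1. The kernel size, sigma, aspect ratio, orientations, and wavelengths are hypothetical values, and the basic LBP threshold is used for brevity; the ILBP rule could be substituted in `lbp_encode`.

```python
import math

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size nested list (size odd)."""
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * cos_t + y * sin_t    # coordinates rotated by theta
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel

def filter2d(img, kernel):
    """'Valid' 2-D correlation of img with a square kernel (no padding)."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)]
            for r in range(h - k + 1)]

def lbp_encode(resp):
    """8-neighbor LBP code for every interior position of a response map."""
    h, w = len(resp), len(resp[0])
    return [[sum(1 << p for p, (dr, dc) in enumerate(NEIGHBORS)
                 if resp[r + dr][c + dc] >= resp[r][c])
             for c in range(1, w - 1)]
            for r in range(1, h - 1)]

def lbp_encoded_gabor(img, thetas, lambdas, size=5, sigma=2.0):
    """One LBP-coded map per (orientation, wavelength) pair of the filter bank."""
    return [lbp_encode(filter2d(img, gabor_kernel(size, sigma, theta, lambd)))
            for theta in thetas for lambd in lambdas]
```

With four orientations and two wavelengths, the bank produces eight coded maps. Histograms of these maps could then be concatenated into a texture descriptor, which is one common way of reducing the feature dimension.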

Attentional Mechanisms
The attention mechanism was originally inspired by the human brain's signal-processing mechanism for vision. When the brain receives information from the external world, it selectively processes only the important information while filtering out the distracting information, thus enhancing the efficiency of information processing [50]. In cognitive science, humans are known to selectively focus on a portion of all information when faced with a large and complex scene, such as regions of abrupt color or style changes, while ignoring other relatively mundane regions due to bottlenecks in information processing. The attention mechanism in computer vision draws from this concept, allowing the network to focus on the important information and ignore the unimportant information. Its first application was in natural language processing, and it was later extended to image processing. Since images of ellipsoid-like herbaceous Tibetan medicinal materials with complex backgrounds often contain irrelevant information, the recognition of these images is usually based on the texture features of the slices that occupy only a part of the image. In this paper, we introduced the attention mechanism into the DenseNet network to focus on the key areas of texture features of ellipsoid-like herbaceous Tibetan medicinal material images with complex backgrounds and extract more accurate key texture feature information, thus enhancing the recognition accuracy.

Construction of a Recognition Model for Ellipsoid-like Herbaceous Tibetan Medicinal Materials
To construct a recognition model for ellipsoid-like herbaceous Tibetan medicinal materials, we used the DenseNet proposed by Huang et al. [51] in 2017 as the backbone network. Because the collected images were unevenly distributed across the types of ellipsoid-like herbaceous Tibetan medicinal materials, we changed the loss function to focal loss, which mitigates the category imbalance and mines difficult samples, improving the image recognition accuracy of the DenseNet network. The complex backgrounds of ellipsoid-like herbaceous Tibetan medicine images may contain invalid information such as utensils and human hands, which may affect the accuracy of the model during the training process. Therefore, when training recognition models, it is crucial to introduce an attention mechanism that preserves the target object's location features while removing background features [52].

First, through a series of convolutions and downsampling, high-level features were gradually extracted to increase the receptive field of the model. Activation pixels in high-level features can reflect regions of interest. Then, the same amount of upsampling was performed by bilinear interpolation to upscale the attention map to the same size as the original input. In this way, each input pixel was assigned a corresponding attention value, yielding an attention map. The channel attention mechanism was introduced to the feature maps of different scales used to generate candidate regions. Instead of considering the feature information of all channels in the feature maps equally, we assigned different weights to each channel of the feature maps. For each channel, we increased the weights of object regions and decreased the weights of non-object regions by weight adaptation. Therefore, the model focused on the valid information with large weights while mitigating the interference of background information.

To summarize, based on the spatial attention map, the feature map of each channel was multiplied by the corresponding weight to achieve the final attention mechanism. The attention mechanism unit introduced in each dense block structure can strengthen the global features in the shallow network and re-weight the important channels of each feature in the deep network, thus enhancing the model's accuracy. Finally, we formed the DenseNet with attention and ILBP-encoded Gabor features. The designed network structure is shown in Figure 3.
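The two operations named above, bilinear upsampling of a coarse attention map and re-weighting of feature channels, can be sketched as follows. In the network, the attention map and the channel weights are produced by learned layers; here they are fixed inputs. The function names and the corner-aligned sampling convention are assumptions made for this example.

```python
def bilinear_upsample(src, out_h, out_w):
    """Upsample a 2-D map to out_h x out_w by bilinear interpolation
    (corner-aligned: the corner values of src are preserved)."""
    h, w = len(src), len(src[0])
    sy = (h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = min(int(y), h - 1)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = min(int(x), w - 1)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def apply_attention(features, spatial_map, channel_weights):
    """Multiply each channel by its weight and by the spatial attention map.

    features: list of C maps, each H x W; spatial_map: H x W;
    channel_weights: list of C scalars.
    """
    return [[[value * weight * spatial_map[i][j]
              for j, value in enumerate(row)]
             for i, row in enumerate(channel)]
            for channel, weight in zip(features, channel_weights)]
```

Positions where the spatial map is close to 0 (background) are suppressed in every channel, while channels with small weights are attenuated everywhere.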

Experimental Settings
To verify the effectiveness of our method, we conducted experiments on the dataset of ellipsoid-like herbaceous Tibetan medicinal material images with complex backgrounds. We first fed the image features obtained by multi-feature fusion into the DenseNet network with the attention mechanism, and obtained the optimal weights of the different features by experimentally comparing the recognition accuracy under different weight assignments. We then verified the performance of the network when using a single LBP or Gabor algorithm, LBP-encoded Gabor, ILBP-encoded Gabor, and an attention mechanism. The accuracy of each model was compared and analyzed. We used the adaptive momentum stochastic optimization algorithm to update the weights and biases in the network model. The parameters in the experiment were set as follows: the network learning rate was set to 0.001 and the batch size was set to 16. In the experiment, the stochastic gradient descent method was used for network training, the number of training epochs was set to 50, and focal loss was used as the loss function. We set the hyperparameters α and γ of the focal loss to 0.25 and 2, respectively. Accuracy and macro-F1 were used as the evaluation indexes of the model.

Table 1 shows the image recognition accuracy and macro-F1 score under different weight assignments. The weighting factors a, b, and c correspond to the color, shape, and texture features, respectively. The highest accuracy and macro-F1 score for the recognition of complex background ellipsoid-like herbaceous Tibetan medicinal material images were achieved when a = 0.1, b = 0.1, and c = 0.8. Although the color and shape features played a role in classifying and recognizing different types of ellipsoid-like herbaceous Tibetan medicinal materials, they were easily influenced by background interference. The complexity of the texture structure of these materials made texture characteristics crucial for their identification. The analysis of the feature weights therefore indicates that texture features play a key role in expressing the content information of ellipsoid-like herbaceous Tibetan medicinal material images.
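The loss function and evaluation metrics named in the settings can be written out as below. This is a pure-Python sketch for a single sample and for lists of labels. It assumes that α is applied as one scalar to every class, which the text does not state; the function names are chosen for this example.

```python
import math

def focal_loss(p_t, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one sample, where p_t is the predicted probability
    of the true class: -alpha * (1 - p_t)^gamma * log(p_t)."""
    p_t = min(1.0, max(eps, p_t))
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With γ = 2, a sample predicted with probability 0.9 for its true class is scaled by (1 − 0.9)² = 0.01, while a sample at 0.5 is scaled by 0.25, so hard samples dominate the loss. Macro-F1 weights every class equally, which makes it more sensitive than accuracy to poor performance on classes with few images.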

Results and Analysis of Ablation Experiments
The results of the ablation experiment (shown in Figure 4) demonstrate that the recognition accuracy of the network model gradually improved and eventually stabilized as the number of training iterations increased. The recognition accuracy of our model (DenseNet with attention and LBP-encoded Gabor features) was 92.38%, which is higher than that of models using a single LBP or Gabor algorithm. The added attention module improved the model's feature extraction ability, reducing the weight of useless information and increasing that of useful information. In turn, this improved the overall performance of the network. Texture feature extraction provided a comprehensive understanding of the distinguishing features of ellipsoid-like herbaceous Tibetan medicinal materials, resulting in better classification and recognition results. The ablation experiment results confirmed that using LBP-encoded Gabor resulted in better texture features.

We also compared the original LBP algorithm and the improved LBP (ILBP) method under the same experimental setup, and the results are shown in Table 2. As can be seen, the texture features extracted by the improved LBP operator had a better recognition performance and achieved a 93.67% recognition accuracy. The improved LBP algorithm can extract texture features more effectively by replacing the central gray value with the gray average of the eight neighborhoods and considering the standard deviation of the neighborhoods, which reduces the influence of the central gray value on the LBP operator when calculating the binary sequence.

To verify the effectiveness of data augmentation, the model was trained separately on the original dataset and on the augmented dataset, and both models were then tested on the original images. The experimental results using our method for the original dataset and the augmented dataset are shown in Figure 5. The recognition accuracy for the original dataset was 93.67%, and the recognition accuracy after performing data augmentation was 95.11%. These results indicate that data augmentation can increase the number of training samples and reduce network overfitting, ultimately improving the model generalization and robustness.

Verification of the Validity of Dilated Convolution
Dilated convolution [53] enlarges the receptive field of each output unit without increasing the number of parameters by inserting zero-weight holes at regular intervals between the elements of a conventional convolution kernel; the spacing of the kernel elements is set by the dilation rate [54]. Most images of ellipsoid-like herbaceous Tibetan medicinal materials with complex backgrounds contain a large amount of irrelevant content, such as the surrounding scene and apparatus. To reduce the impact of such distractions, the 3 × 3 convolution in the first dense block was replaced with a 3 × 3 dilated convolution with a dilation rate of 2, which moderately enlarges the receptive field. The output size was kept unchanged by setting the stride to 1 and the padding to 2. The experiments were conducted on the collected complex-background image dataset of ellipsoid-like herbaceous Tibetan medicinal materials, and the results are shown in Table 3. With dilated convolution incorporated into the proposed method, the recognition accuracy was 89.72%. This is about 1% higher than that of DenseNet alone, but lower than that of the proposed recognition method without dilated convolution (ILBP-encoded Gabor_attention_DenseNet). A possible reason is that, although dilated convolution enlarges the receptive field, each output of a dilated layer is computed from a separate, non-adjacent set of units in the previous layer, so neighboring outputs are uncorrelated and local information is lost.
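
The claim that a 3 × 3 convolution with dilation 2, stride 1, and padding 2 preserves the output size can be checked with the standard output-size formula for convolutions, out = floor((n + 2p − d(k − 1) − 1) / s) + 1. The sketch below is only a worked check of that arithmetic; the function names and the 224-pixel input size are illustrative assumptions, not values taken from the paper.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span in pixels covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one axis for a (possibly dilated) convolution."""
    span = effective_kernel_size(kernel_size, dilation)
    return (input_size + 2 * padding - span) // stride + 1


def kernel_weight_count(kernel_size):
    """Weights per 2-D kernel; independent of the dilation rate."""
    return kernel_size * kernel_size


# Illustrative input size (an assumption, not taken from the paper).
side = 224
same_size = conv_output_size(side, kernel_size=3, stride=1, padding=2, dilation=2)
shrunk = conv_output_size(side, kernel_size=3, stride=1, padding=1, dilation=2)
```

With dilation 2 the 3 × 3 kernel spans 5 pixels while still holding 9 weights, so padding 2 returns the input size unchanged (`same_size` equals 224 here), whereas the usual padding of 1 would shrink the output by 2 pixels per axis.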

Comparative Experimental Results and Analysis
In this section, the proposed model is compared with existing methods for identifying traditional Chinese medicines with similar shapes. The models were trained on both the original dataset and the augmented dataset and then tested on the original images. The results are shown in Table 4. The recognition accuracy and macro-F1 score were generally higher with the augmented dataset than with the original dataset, and the proposed method outperformed the other methods on both datasets. The color moment + SVM model [54] had the lowest accuracy among the seven comparison models, because it classifies images using extracted color features, and the colors of ellipsoid-like herbaceous Tibetan medicinal materials are too similar to serve as reliable recognition features. This model therefore performed poorly on the complex-background dataset. The existing ResNet [20], Inception-V3 [15], LeNet-5 [54], and YOLOv3 [55] networks achieved recognition accuracies of over 80% on images of ellipsoid-like herbaceous Tibetan medicinal materials on complex backgrounds, but their recognition accuracies and macro-F1 scores remained well below those of the proposed method. YOLOv5-Ghost-CA [56] builds a lightweight GhostBottleneck module on the backbone network of YOLOv5, adds an attention mechanism to the model structure, and replaces the original convolution layers with depthwise-separable convolutions. This method achieved a recognition accuracy of 89.77% on our dataset, the best among the existing traditional Chinese medicine recognition methods. Overall, the experimental results show that the proposed method achieved the highest accuracy and the best performance in the comparative experiment.
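
For readers less familiar with the two metrics reported in Table 4, the following minimal sketch shows how accuracy and macro-F1 are commonly computed from true and predicted labels. It assumes the usual definition of macro-F1 as the unweighted mean of the per-class F1 scores; the paper does not state its exact convention, and the class labels in the usage note are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed macro-F1 convention)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels `["a", "a", "b", "b", "c", "c"]` and predictions `["a", "a", "b", "c", "c", "c"]`, the accuracy is 5/6 and the per-class F1 scores are 1, 2/3, and 4/5, giving a macro-F1 of 37/45 (about 0.822). Because every class contributes equally, macro-F1 penalizes poor performance on rare classes more than accuracy does.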

Experimental Validation on Complex Test Sets
The recognition performance of the various models on the complex test set is presented in Figure 6. The classification algorithms based on convolutional neural networks clearly outperformed the traditional shallow machine-learning algorithms in image classification accuracy. In particular, when the herb color was similar to the background or the image was heavily occluded, our proposed method achieved a higher recognition accuracy than the other models, with an average recognition accuracy of 92.41% over the 18 types of ellipsoid-like herbaceous Tibetan medicinal materials. The experimental results demonstrate that combining traditional texture features (ILBP-encoded Gabor) with deep learning (DenseNet) and integrating an attention mechanism can effectively improve the recognition accuracy for images with complex backgrounds. Figure 7 shows the recognition results of the different methods on a selection of images.

Discussion
In this paper, using the ellipsoid-like herbaceous Tibetan medicinal material dataset that we established, we first verified through multi-feature fusion experiments that texture features are critical for distinguishing images of different medicinal materials. We proposed a DenseNet model with attention and LBP-encoded Gabor features to recognize ellipsoid-like herbaceous Tibetan medicinal materials on complex backgrounds, together with an improved LBP algorithm for texture feature extraction. We also examined the effect of data augmentation experimentally, and the results show that data augmentation effectively improves recognition accuracy: our method achieved 93.67% accuracy on the original dataset and 95.11% on the augmented dataset. We additionally selected images whose backgrounds are more similar to the medicinal materials as a complex test set, and showed that our proposed method obtained a higher accuracy on this test set than the other methods. Nevertheless, misidentified ellipsoid-like herbaceous Tibetan medicinal materials must still be re-identified manually to ensure medication safety. The dataset used in this experiment contains fewer images than the standard public dataset CIFAR-10. Although the proposed model improves accuracy, there is still considerable room for improvement relative to the ideal case of Chinese medicinal material recognition. In future work, the Tibetan medicinal material dataset constructed in this paper will be further expanded, and unsupervised or semi-supervised methods will be used to address the high cost of annotating a large-scale dataset of ellipsoid-like herbaceous Tibetan medicinal materials.

Data Availability Statement: While we appreciate the potential benefits of sharing our dataset, the sensitive nature of the data prevents us from doing so. The dataset contains data on cherished Tibetan medicinal herbs that cannot be made public without violating biodiversity security. We will be happy to provide access upon request without disclosing the agreement.

Conflicts of Interest:
The authors declare no conflict of interest.