Ancient Chinese Character Image Retrieval Based on Dual Hesitant Fuzzy Sets

The complex and changeable structures of ancient Chinese characters reduce the accuracy of their image retrieval. To resolve this problem, a new retrieval method based on dual hesitant fuzzy sets is proposed. Dual hesitant fuzzy sets, which can express uncertain information more comprehensively, are employed in the feature extraction process of directional line elements. The multiattribute evaluation index of adjacent grids for the current grid and its corresponding membership and nonmembership functions are established, and the weight of each attribute is calculated by the dual hesitant fuzzy entropy, such that the extracted features can fully reflect the topological structure of ancient Chinese characters. The dual hesitant fuzzy correlation coefficient is used to measure the similarity between the ancient Chinese character image to be retrieved and the candidate images, which realizes the retrieval of ancient Chinese character images. Experiments show that when the threshold value of the correlation coefficient is 0.9, the average retrieval accuracy is 90.4%.


Introduction
Image retrieval of ancient Chinese characters can assist researchers in tracing similar glyphs and is an effective tool for the study of Chinese characters in ancient books. The theoretical basis of image retrieval of Chinese characters in ancient books is content-based image retrieval [1,2]. Its key step is feature extraction and matching: color, texture, shape, and other features [3][4][5] or their fusion features [6] are extracted through mathematical descriptions, and similarity is calculated from these image features to identify similar images. Traditional Chinese character feature extraction methods mainly adopt structural features [7] and statistical features [8]. Fuzzy set theory represents uncertain problems well; thus, it has been applied to the recognition and retrieval of handwritten Chinese characters. Wei and Guo [9] extracted image features using dual-elastic grid technology and the correlation fuzziness between grid words. Ran et al. [10] proposed a normalized overlapped fuzzy bielastic grid to improve the effectiveness of the extracted features. Li et al. [11] used fuzzy entropy to classify Chinese characters with high accuracy and improved the recognition rate of Chinese characters. Liu and Meng [12] improved the membership function of a fuzzy support vector machine, which improved text classification ability and recognition efficiency. Fuzzy set theory thus improves the efficiency of Chinese character recognition and retrieval to some extent. However, it expresses fuzzy information through a single value, which causes information loss. Therefore, the hesitant fuzzy set [13] was proposed and applied to multiattribute decision-making (MADM) problems.
The dual hesitant fuzzy set (DHFS) [14] is an extended form of the hesitant fuzzy set that combines the membership and nonmembership degrees of multiattribute evaluation information to express the uncertainty problem more comprehensively. Su et al. [15] proposed some distance measures and similarity measures and illustrated their applications in pattern recognition. Wang et al. [16] proposed a new dual hesitant fuzzy distance metric for the multiattribute decision-making problem with completely unknown attribute weights. Zhang [17] proposed several distance measurements and entropy measurement methods for dual hesitant fuzzy sets, which avoided the process of data expansion and overcame the problem of information loss to a certain extent. Combining the DHFS, Wei et al. [18] proposed some dual hesitant Pythagorean fuzzy Hamy mean aggregation operators, which are more valid for dealing with the MADM problem. Wang et al. [19] extended the q-rung orthopair fuzzy sets (q-ROFSs) to a dual hesitant fuzzy environment and presented the dual hesitant q-ROFSs. Tang and Wei [20] defined some dual hesitant Pythagorean fuzzy generalized Bonferroni mean operators, which are utilized to design methods for handling MADM problems. Tang and Wei [21] investigated MADM problems based on Bonferroni mean operators with dual Pythagorean hesitant fuzzy numbers.
In view of the complex and changeable structure of ancient Chinese characters, this study proposes an image retrieval method for ancient Chinese books based on dual hesitant fuzzy sets. The membership and nonmembership degrees between adjacent elastic grids are calculated under the set attribute indexes, and the obtained evaluation values are placed in a dual hesitant fuzzy set, which is used to extract the features of directional line elements. The dual hesitant fuzzy correlation coefficient is used to evaluate the similarity between the query image and the candidate images to realize the image retrieval of ancient Chinese characters.

Elastic Grid Division of Chinese Characters in Ancient Books.
Ordinary elastic mesh [22] does not evenly distribute the pixel density in each grid and has a low tolerance for feature mutations caused by stroke offset. In this study, the general elastic mesh is improved and a new elastic mesh division method is designed. The Chinese character image is first divided into elastic layers according to the pixel density in a certain direction. Subsequently, each layer is divided again according to its pixel density. The grid lines after the second division may not join into straight lines, as shown in Figure 1. The grid in Figure 1(a) is used to extract features of horizontal strokes of Chinese characters in ancient books. The grid in Figure 1(b) is used to extract features of vertical strokes. The grid in Figure 1(c) is used to extract features of apostrophe strokes. The grid in Figure 1(d) is used to extract features of downstrokes. The partition algorithm of Figure 1(a) is shown in Algorithm 1. The other grid division algorithms can be obtained similarly.
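As a rough illustration of density-based division, the following Python sketch places band boundaries so that each horizontal band holds about the same number of stroke pixels. It is only a minimal sketch under stated assumptions: the function name `density_boundaries`, the binary-image representation (1 for a stroke pixel), and the reading of the threshold test as "running pixel count exceeds `total * (k + 1) / bands`" are illustrative choices, not the authors' Algorithm 1.

```python
from typing import List


def density_boundaries(image: List[List[int]], bands: int = 5) -> List[int]:
    """Return the row indices at which a new horizontal band starts.

    `image` is a binary image given as rows of 0/1 values (1 = stroke pixel).
    Boundaries are placed so that each band holds roughly the same number of
    stroke pixels (hypothetical helper, not the paper's exact procedure).
    """
    row_counts = [sum(row) for row in image]
    total = sum(row_counts)
    boundaries: List[int] = []
    running = 0  # cumulative stroke-pixel count seen so far
    k = 0        # index of the band currently being filled
    for i, count in enumerate(row_counts):
        running += count
        # Start a new band each time the running count passes the next share.
        while k < bands - 1 and running > total * (k + 1) / bands:
            boundaries.append(i)
            k += 1
    return boundaries


if __name__ == "__main__":
    # Ten rows with one stroke pixel each split evenly into five bands.
    print(density_boundaries([[1]] * 10))
```

Applying the same idea a second time inside each band, along the other axis, would give grid lines that differ from band to band, which matches the observation that the lines after the second division need not join into straight lines.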

Dual Hesitant Fuzzy Attribute Index Setting.
Considering the correlation between adjacent grids and the grid to be calculated, the corresponding membership and nonmembership functions are provided under three attribute indexes. Using the extraction of horizontal stroke features as an example, the definition process of the attribute indexes and their membership and nonmembership degrees is illustrated in Figure 2. In Figure 2, the six neighborhoods extracted from the left and right sides of any grid are $G_k^{(i)}$ ($i = 1, 2, \ldots, 6$).

Pixel Distance Index.
In Figure 3, $a$ is a pixel point in any horizontal stroke in the grid. The standard normal distribution is used to characterize the membership and nonmembership of point $a$ to $G_k$ [9]. The membership and nonmembership functions under the pixel distance index are expressed in terms of $W_k^{(2)}$, the width of $G_k^{(2)}$; $d_a$, the distance between pixel point $a$ and the left boundary of $G_k^{(2)}$; and $n$, the number of horizontal stroke pixel points in $G_k^{(2)}$.

Stroke Position Index.
In Figure 4, stroke $b$ intersects with the left edge of $G_k$, and stroke $c$ is separated from the left edge of $G_k$. The membership and nonmembership functions under the stroke position index are expressed in terms of $n_1$, the number of strokes intersecting with $G_k$ and $G_k^{(2)}$, and $n_2$, the number of strokes that do not intersect with $G_k$.

Grid Position Index.
In Figure 5, the left boundary of $G_k$ and the right boundary of $G_k^{(2)}$ overlap, and the left boundary of $G_k$ and the right boundary of $G_k^{(3)}$ partially overlap. The membership and nonmembership functions under the grid position index are expressed in terms of $H_o$, the overlap height between the right edge of $G_k^{(3)}$ and the left edge of $G_k$, and $H_k$, the height of $G_k$.

Under each attribute index, the membership and nonmembership degrees of each grid are calculated by the aforementioned functions, and the average membership and average nonmembership degrees under the three indexes are statistically analyzed over different numbers of Chinese character images.
In Figure 6, $h_i$ ($i = 1, 2, 3$) are the average membership degrees under the three indexes and $g_j$ ($j = 1, 2, 3$) are the average nonmembership degrees under the three indexes. As can be seen from Figure 6, $h_3$ is higher than $h_1$ and $h_2$, and, from Figure 6(b), $g_2$ is higher than $g_1$ and $g_3$. This analysis shows that the evaluation information of the grid differs considerably under different indexes, so the three attribute indexes cannot be given equal weight when the membership and nonmembership degrees are calculated. In this study, the weight of each grid attribute index is calculated using dual hesitant fuzzy entropy, which improves the authenticity of the evaluation information.

ALGORITHM 1: Elastic grid division algorithm.
    ...
    if DV > Sum * (k + 1)/5 then  // the image is divided into 5 areas in the horizontal direction; Sum is the total number of pixels
        ...
        $I_z^l = i$; $l = l + 1$; $z = z + 1$  // the initial values of l and z are 0
    end if
    end for
    end for
    return $J_k$, $I_z^k$

Determination of the Attribute Index Weight.
Dual hesitant fuzzy entropy can effectively describe the degree of uncertainty of dual hesitant fuzzy elements [23]. It is defined in terms of $\phi(d)$, $\varphi(d)$, and $d$, which refer to the membership, nonmembership, and hesitancy of any dual hesitant fuzzy element, respectively. Entropy theory can describe the degree of uncertainty of each attribute index and can therefore be used to determine the attribute weights. Equation (9) is used to calculate the entropy $E_j$ of the $j$th attribute index ($j = 1, 2, 3$) of the grid $G_k^{(i)}$, whose weight is $\omega_j$. $E_j$ and $\omega_j$ vary inversely: the larger the $E_j$, the smaller the $\omega_j$, and the smaller the $E_j$, the larger the $\omega_j$. Equation (10) describes this relationship between $E_j$ and $\omega_j$.
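The inverse relationship between entropy and weight can be illustrated with the entropy-weight normalization that is common in the multiattribute decision-making literature. The Python sketch below assumes the form $\omega_j = (1 - E_j) / \sum_k (1 - E_k)$; this is an assumption made for illustration and may differ from the paper's equation (10). The function name `entropy_weights` is hypothetical.

```python
from typing import List, Sequence


def entropy_weights(entropies: Sequence[float]) -> List[float]:
    """Map per-attribute entropies E_j in [0, 1] to weights that sum to 1.

    Assumed form (not confirmed by the paper): w_j = (1 - E_j) / sum_k (1 - E_k),
    so a larger entropy always yields a smaller weight.
    """
    complements = [1.0 - e for e in entropies]
    total = sum(complements)
    if total == 0:
        # Every attribute is maximally uncertain: fall back to equal weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [c / total for c in complements]


if __name__ == "__main__":
    # The attribute with the lowest entropy receives the largest weight.
    print(entropy_weights([0.2, 0.5, 0.8]))
```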

Dual Hesitant Fuzzy Set.
Based on the hesitant fuzzy set, the dual hesitant fuzzy set fuses the membership and nonmembership degrees of multiple attribute indexes to improve its ability to express uncertain problems in decision-making. A dual hesitant fuzzy set $D$ on a nonempty set $X$ is defined as

$$D = \{ \langle x, h(x), g(x) \rangle \mid x \in X \},$$

where $h(x)$ and $g(x)$ are sets of some numbers in $[0, 1]$, respectively, representing the possible membership and nonmembership degrees of element $x$ in $X$ under $D$ [12].
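A dual hesitant fuzzy element can be represented in code as a pair of value collections. The Python sketch below is a minimal illustration: the class name `DualHesitantElement` is hypothetical, and the check that the largest membership degree plus the largest nonmembership degree does not exceed 1 follows the usual DHFS convention in the literature rather than anything stated explicitly in this section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DualHesitantElement:
    """One element <h(x), g(x)> of a dual hesitant fuzzy set (illustrative)."""

    membership: Tuple[float, ...]     # possible membership degrees h(x)
    nonmembership: Tuple[float, ...]  # possible nonmembership degrees g(x)

    def __post_init__(self) -> None:
        values = tuple(self.membership) + tuple(self.nonmembership)
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError("all degrees must lie in [0, 1]")
        top_h = max(self.membership, default=0.0)
        top_g = max(self.nonmembership, default=0.0)
        # Usual DHFS constraint (assumed here): max(h) + max(g) <= 1.
        if top_h + top_g > 1.0 + 1e-9:
            raise ValueError("max membership + max nonmembership exceeds 1")


if __name__ == "__main__":
    element = DualHesitantElement((0.6, 0.7), (0.1, 0.2))
    print(element)
```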

Feature Extraction of Dual Hesitant Fuzzy Direction Line Element.
The traditional feature vector of the directional line element only involves the directional attributes of strokes in a single grid; it does not consider the correlation between strokes in adjacent grids and in the grid to be computed, which affects the stability of the extracted feature. Through equations (2)-(6), the evaluation value of $G_k^{(i)}$ ($i = 1, 2, \ldots, 6$) to $G_k$ under each attribute index is calculated, and the dual hesitant fuzzy set $\mu_{G_k^{(i)}}$ is constructed. The weighted correlation coefficient between $G_k^{(i)}$ ($i = 1, 2, \ldots, 6$) and the ideal grid at the corresponding position is calculated [24], as shown in equation (12); it represents the degree of influence of adjacent grids on the current grid. In equation (12), $\gamma_{ij}^{s}$ and $\eta_{ij}^{s}$ represent the $s$th largest elements of the membership degree and nonmembership degree of $G_k^{(i)}$ under each attribute [25]. The dual hesitant fuzzy direction line element feature of horizontal strokes in $G_k$, denoted $F^{H}_{G_k}$, is computed from the weighted correlation coefficients $\rho_{G_k^{(i)}}$ of the adjacent grids.
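To make the weighted correlation coefficient concrete, the Python sketch below implements one widely used form for dual hesitant fuzzy sets: the weighted inner product of the sorted membership and nonmembership values, normalized by the geometric mean of the two self-correlations. This form is an assumption for illustration and is not guaranteed to match the paper's equation (12). The function names are hypothetical, and both sets are assumed to have the same number of values per attribute.

```python
import math
from typing import Sequence, Tuple

# One dual hesitant fuzzy element per attribute: (membership, nonmembership).
Element = Tuple[Sequence[float], Sequence[float]]


def _weighted_correlation(a: Sequence[Element], b: Sequence[Element],
                          weights: Sequence[float]) -> float:
    """Weighted inner product of two dual hesitant fuzzy sets."""
    total = 0.0
    for (h_a, g_a), (h_b, g_b), w in zip(a, b, weights):
        if len(h_a) != len(h_b) or len(g_a) != len(g_b):
            raise ValueError("elements must have matching lengths")
        # Pair the s-th largest values of the two elements.
        h_a, h_b = sorted(h_a, reverse=True), sorted(h_b, reverse=True)
        g_a, g_b = sorted(g_a, reverse=True), sorted(g_b, reverse=True)
        h_term = sum(x * y for x, y in zip(h_a, h_b)) / len(h_a) if h_a else 0.0
        g_term = sum(x * y for x, y in zip(g_a, g_b)) / len(g_a) if g_a else 0.0
        total += w * (h_term + g_term)
    return total


def dhfs_correlation_coefficient(a: Sequence[Element], b: Sequence[Element],
                                 weights: Sequence[float]) -> float:
    """Correlation coefficient in [0, 1]; returns 0.0 if either set is all zeros."""
    denom = math.sqrt(_weighted_correlation(a, a, weights)
                      * _weighted_correlation(b, b, weights))
    return _weighted_correlation(a, b, weights) / denom if denom else 0.0


if __name__ == "__main__":
    grid = [((0.6, 0.5), (0.2,)), ((0.7,), (0.1, 0.2))]
    other = [((0.4, 0.3), (0.5,)), ((0.2,), (0.6, 0.7))]
    print(dhfs_correlation_coefficient(grid, other, [0.5, 0.5]))
```

In the setting described above, `a` would hold the evaluation values of an adjacent grid under the three attribute indexes, `b` the ideal grid, and `weights` the entropy-based attribute weights.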

Image Retrieval Algorithm for Chinese Characters in Ancient Books
By introducing the dual hesitant fuzzy direction line element feature and using it to calculate the similarity between images of Chinese characters in ancient books, the retrieval results are obtained and output in order, as described in Algorithm 2. The correlation coefficient is extended to a similarity measure between Chinese character images. Equation (14) is used to calculate the correlation coefficient $\rho_{R_i}$ between the image $T_I$ to be retrieved and the image $T_{R_i}$ in the database. Multiple threshold values $\rho_\theta$ of the correlation coefficient are set to control the number of output images. In equation (14), $F_I$ is the feature vector of the Chinese character image to be retrieved, $F_{R_i}$ is the feature vector of the image in the database, $N$ is the number of images in the database, and $C(F_I, F_{R_i})$ is the covariance of $F_I$ and $F_{R_i}$.
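The retrieval step can be sketched as a thresholded ranking over the database. The Python example below assumes that the correlation coefficient takes the Pearson form (covariance divided by the product of standard deviations), which is consistent with the covariance term mentioned above but is not confirmed as the paper's equation (14). The function names and the dictionary-based database are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two equal-length feature vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    var_u = sum((x - mean_u) ** 2 for x in u)
    var_v = sum((y - mean_v) ** 2 for y in v)
    denom = math.sqrt(var_u * var_v)
    return cov / denom if denom else 0.0


def retrieve(query: Sequence[float], database: Dict[str, Sequence[float]],
             threshold: float) -> List[Tuple[str, float]]:
    """Return (image id, coefficient) pairs at or above the threshold, best first."""
    scored = [(image_id, pearson(query, features))
              for image_id, features in database.items()]
    hits = [(image_id, rho) for image_id, rho in scored if rho >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    db = {"a": [2, 4, 6, 8], "b": [4, 3, 2, 1], "c": [1, 2, 3, 5]}
    for image_id, rho in retrieve([1, 2, 3, 4], db, threshold=0.9):
        print(image_id, round(rho, 4))
```

Raising the threshold shortens the result list, which is how the threshold controls the number of output images.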

Experimental Parameter Setting.
The image samples of Chinese characters in ancient books were collected from "Si ku Quan shu," an important document recognized in the study of Chinese characters in ancient books. The images were marked according to information such as the cabinet, ministry, and book to which they belong. Owing to the absence of a public retrieval dataset, an experimental dataset for image retrieval of Chinese characters in ancient books was established based on the Chinese character samples collected earlier. Visual Studio 2017 and SQL Server 2017 were used to implement the image retrieval system of ancient Chinese books. The retrieval experiment was conducted on 92840 ancient Chinese book samples in the dataset.
Definition 2. Recall rate (R) is the ratio of the number of images of similar Chinese characters in the retrieval results to the number of images of all similar Chinese characters in the experimental data.

Definition 3. F-Measure ($F_\alpha$) is the weighted harmonic mean of the precision (P) and the recall rate (R). When $\alpha = 1$, P and R have the same weight. $F_1$ combines the results of P and R; the higher the $F_1$, the better the retrieval performance.
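The three measures can be computed from a set of retrieved image ids and a set of relevant ones, as in the Python sketch below. It assumes the standard definition of precision (relevant retrieved images divided by all retrieved images) and the standard weighted F-measure $F_\alpha = (1 + \alpha^2) P R / (\alpha^2 P + R)$; the paper's own formulas are not reproduced in this section and may be written differently. The function name `precision_recall_f` is hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f(retrieved: Iterable, relevant: Iterable,
                       alpha: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F_alpha) for one query.

    Precision and F_alpha follow the standard information-retrieval
    definitions (an assumption; the source formulas are not shown here).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = alpha ** 2 * precision + recall
    f_alpha = (1 + alpha ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_alpha


if __name__ == "__main__":
    # Four images returned, three truly similar, two of them found.
    print(precision_recall_f({1, 2, 3, 4}, {2, 3, 5}))
```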

Retrieval Performance Analysis.
Four groups were selected according to the structure of Chinese characters, and 10 samples in each group were retrieved for analysis, as shown in Table 1. The feature extraction methods in [9,10] were used for the comparative tests. The same dataset and similarity measure were used to compare these algorithms. The retrieval results of the aforementioned experimental samples were counted and analyzed under the set thresholds of the correlation coefficient, and the recall, precision, and $F_1$ were calculated. The final result was obtained from the average of the 40 samples.
According to the experimental results, when ρ θ is 0.7, the average recall rate of three experiments falls below 60%. To ensure the validity of the experiment, the threshold value was selected from [0.7, 0.9] and the interval was set as 0.05.
It can be seen from Tables 2 to 4 that, under the set correlation thresholds, the retrieval method in this study is superior to the comparison methods in terms of recall rate and precision. This indicates the effectiveness of this method in the image retrieval of ancient Chinese characters. Figure 7 shows the contrast line chart of $F_1$, where E1 refers to the algorithm of this study, E2 refers to the algorithm in [9], and E3 refers to the algorithm in [10]. Figure 7 shows that the $F_1$ of the algorithm in this study is higher than that of the other algorithms. The proposed algorithm evaluates the influence of adjacent grids on the current grid and improves the robustness of the features. Therefore, the algorithm in this study is applicable to the retrieval of ancient Chinese character images. Figure 8 shows the average recall and precision of the four groups of samples in this study.

Scientific Programming
ALGORITHM 2: Image retrieval algorithm of Chinese characters in ancient books based on the dual hesitant fuzzy direction line element.
Input: image $T_I$ to be retrieved
Output: images $T_{R_i}$ of Chinese characters within the threshold of the correlation coefficient
    Open the image $T_I$ to be retrieved
    Preprocess $T_I$
    Elastic grid division
    Extract the dual hesitant fuzzy direction line element features of $T_I$
    while $i < N$ do  // traverse the database
        Calculate the correlation coefficient $\rho_{R_i}$ between $T_I$ and $T_{R_i}$
        if $\rho_{R_i} \geq \rho_\theta$ then
            Add $T_{R_i}$ to the result data table R(id, ρ)
        else
            ...
        end if
    end while
    return R(id, ρ)

In Figure 8, $P_j$ ($j = 1, 2, 3, 4$) are the average precisions and $R_i$ ($i = 1, 2, 3, 4$) are the average recalls of the four groups. As can be seen from Figure 8, the retrieval accuracy of this method for Chinese characters with left and right structures and a single structure is higher than that for other structures.

Retrieval Result Analysis.
A Chinese character image was randomly selected to compare the retrieval results of the three experiments. As shown in Figure 9, the threshold value of the correlation coefficient is set as 0.8. The lower left view shows the first 15 images of the retrieval results of this experiment. The upper right view shows the first 15 images of retrieval results of the method in [9] and the lower left view shows the first 15 images of retrieval results of the method in [10].
As can be seen from Figure 9, the first 15 images in this experiment have a higher similarity than those of the comparison experiments. The similarity of the output images in the comparison experiments is significantly lower than that in this experiment. This indicates that the retrieval method in this study has a high accuracy.

Conclusion
This study proposes a method of image retrieval for Chinese characters in ancient books based on dual hesitant fuzzy sets. Dual hesitant fuzzy sets have the advantage of expressing uncertain information more comprehensively. They are introduced into the feature extraction of directional line elements to calculate the comprehensive evaluation value of adjacent grids to the current grid under multiple attributes, which yields more complex and robust image features of Chinese characters in ancient books and improves the retrieval performance. The experimental results show that the average precision and average recall of this method are 1-4 percentage points higher than those of the comparison methods under multiple correlation coefficient thresholds. Follow-up work will address two aspects: (1) improving the attribute indexes according to the topological structure of Chinese characters in ancient books and (2) optimizing the algorithm to improve the retrieval efficiency, because its time complexity is relatively high: the membership and nonmembership degrees must be calculated under multiple attributes.

Data Availability
The data used to support the findings of this study have been deposited at https://github.com/ningmengweidexiaotanke/AecientCC_DHFS.

Conflicts of Interest
The authors declare no conflicts of interest.