Quantification of Tobacco Leaf Appearance Quality Index Based on Computer Vision

The appearance quality index of tobacco leaves is widely used in the tobacco industry. However, in the national standards for flue-cured tobacco, the specified indicators have only qualitative descriptions, and only a few have a range of quantitative values; quantitative calculation methods are lacking, which limits the effective use of these indicators in automatic tobacco grading. In this work, we present a computer vision-based quantitative approach for four tobacco appearance quality indicators: color intensity, length, waste, and body. We design quantitative algorithms for these indices to obtain precise quantitative values; in particular, the quantization algorithms for color intensity and waste are original contributions. To apply the quantification algorithm for each index, the tobacco leaf image is first segmented to determine the tobacco leaf region in the picture. For the color intensity index, a mesh segmentation technique is developed: the tobacco leaf image is divided into several sub-images, and the color differences between them are compared. For the length index, the boundary points at both ends of the tobacco leaf are located, the minimum enclosing rectangle of the leaf is computed, the pixel length is calculated, and the actual length is obtained from checkerboard reference data. The waste index is divided into internal waste and marginal waste: connected-region analysis is employed to locate holes and abnormal areas for internal waste, while a waveform of the edge is created and analyzed to determine missing parts of the edge. The body index is expressed as weight per unit area; the actual area of the tobacco leaf is calculated by the designed algorithm, and the weight is measured by a pressure sensor. Finally, experimental verification is designed for each index.
The empirical findings demonstrate that the average accuracy of semantic segmentation is 8.4% higher than that of threshold segmentation in extracting tobacco leaf regions. The average relative error between the calculated tobacco leaf length and manual measurement is 2.83%. The average accuracy of tobacco leaf position classification across the six classifiers is 88.52%. The correlation coefficient between the quantified tobacco leaf body value and the measured leaf thickness is 0.9270.


I. INTRODUCTION
Grading tobacco leaves is one of the crucial tasks in the tobacco industry, since it affects a range of areas including tobacco collection, storage, processing, and selling [1]. Currently, the two primary methods used in the industry for grading tobacco are the physical-chemical method and the appearance method. Physical-chemical methods are more accurate, but they are more expensive, require costly equipment, and involve complex and lengthy detection processes, making them better suited to laboratory analysis than to industrial mass production. The appearance method, in contrast, relies on human experience and is less accurate and reproducible than physical-chemical methods. Still, it offers quick detection and simple batch operation, which are more advantageous in industrial mass-production settings [2].
(The associate editor coordinating the review of this manuscript and approving it for publication was Junchi Yan.)
The appearance method continues to be used in tobacco leaf grading production today. According to the National Standard of Flue-cured Tobacco of the People's Republic of China, the appearance method grades tobacco leaves based on their color, maturity, body, oil, color intensity, length, waste, and other appearance quality indicators.
There are two implementation approaches for appearance-based grading of tobacco leaves: manual grading and intelligent grading. In manual grading, classification results are obtained through the observation and work experience of tobacco grading experts. Mass production requires numerous grading experts, whose labor is expensive and time-consuming. Intelligent grading primarily uses near-infrared spectroscopy and computer vision to identify and examine the characteristics of appearance indicators [5]. In the 1970s, computer vision techniques began to be used to grade and inspect agricultural products. In the middle and late 1980s, this technology gradually advanced and was applied to various agricultural items, including fruits, vegetables, and grain [6]. Early studies also attempted to apply computer vision to grading tobacco leaves, but these did not find practical application [7]. Machine vision technology started to be used for tobacco leaf grading in the middle and late 1990s, when some researchers used it to collect data for modeling and grade prediction on tobacco leaves [8]. These studies demonstrated good discrimination ability for samples from a particular growing area and at a certain grade. Current research directions concentrate on extracting reasonable characterization criteria of tobacco leaf quality, fusing artificial intelligence technology with the engineering experience of tobacco leaf experts, and developing specialized automation equipment. Related study outcomes include the following. Drawing on the working knowledge of tobacco leaf experts, J. Ma et al. extracted several image characteristics of tobacco leaves and, using principal component analysis and clustering, achieved automatic evaluation of tobacco leaf quality [9].
Based on the characteristics of transmission images of flue-cured tobacco leaves, Z. Shen et al. proposed using neural network analysis to extract tobacco leaf characteristic parameters [10]. L. Han et al. created an image acquisition system for photo-electromechanical integrated flue-cured tobacco leaves and an intelligent grading system based on a brain-inspired system [11]. Recently, NIR spectral detection technologies have also become widely used for tobacco leaf grading. In contrast to computer vision, they employ spectral data as features, allowing some indicators that are problematic under the national standard to be integrated into the spectral data of various bands. Associated study findings include the following. After studying the near-infrared spectral data of tobacco leaves, W. Du and other researchers proposed a method to distinguish between different tobacco leaves by soft independent modeling classification [12]. J. Zhang et al. found that tobacco leaves have distinct infrared spectral signals, which can be combined with probabilistic neural networks or support vector machines to grade tobacco leaves automatically [13].
The key development trend for tobacco leaf grading is detection with high efficiency, good repeatability, low labor cost, and low labor intensity, since intelligent grading does not rely on the expert knowledge and field experience of grading workers. The tobacco industry now emphasizes intelligent grading, intelligent collection, and intelligent storage of tobacco leaves [7]. Before intelligent grading of tobacco leaves can be implemented, the pertinent indicators must be quantified: extraction characteristics, predetermined standards, and so on. However, the current national standards' evaluation of tobacco appearance quality indicators focuses more on qualitative description and less on quantitative analysis. Most qualitative descriptions are based on the professional experience of tobacco grading experts, which involves a great deal of subjectivity and uncertainty [3]. The lack of unified quantitative standards negatively impacts the stability and reproducibility of tobacco leaf grading. For example, the oil index is qualitatively described at four levels: rich, oily, less, or lean; subjectivity and uncertainty arise at each level. Additionally, although some indicators have quantitative descriptions, they provide only a threshold value set for a grade. These quantitative values are likewise empirical estimates made by tobacco leaf specialists based on real-world experience and are also highly subjective [4]. For instance, the national standard for tobacco waste at grade X2F is 25%, which refers to the maximum waste allowed for grade X2F, but the precise calculation method is not explicitly specified. Because of the lack of common quantitative standards and theoretical underpinnings, research on the intelligent grading of tobacco leaves remains inconsistent and difficult to unify.
As can be seen, it is difficult to meet the development needs of intelligent tobacco grading using the qualitative description of tobacco appearance quality indicators in national standards. It is necessary to conduct relevant research on quantifying tobacco leaves' appearance quality indicators.

II. PROPOSED METHOD
Quantitative research on tobacco leaf appearance quality indicators is an essential basis for industry work such as the intelligent grading of tobacco leaves. It focuses on techniques for the quantitative description, extraction, treatment, and representation of the tobacco appearance quality index, which can provide more detailed and accurate features for intelligent grading and therefore has significant practical value. At present, research in this area is relatively rare. To meet the needs of intelligent tobacco leaf grading, this paper establishes a quantitative model of the tobacco appearance quality index, verifies the usability of the quantitative model through pertinent experiments, and carries out a quantitative study of the index based on computer vision. Although there are many more appearance quality indicators for tobacco leaves, this study focuses on four crucial ones that are challenging to quantify manually: color intensity, length, waste, and body. The quantitative representation method and value calculation model for these indicators are established with computer vision.
The model for the tobacco appearance quality index quantification algorithm proposed in this paper has two components: tobacco leaf region extraction and the appearance quality index quantification algorithm. Figure 1 depicts the entire model structure. The former addresses the accurate acquisition of the leaf-region pixel set, which forms the foundation for quantifying the appearance quality index. The latter establishes a quantitative algorithm model for the indicators of color intensity, length, waste, and body.

A. TOBACCO LEAF AREA EXTRACTION
Extracting the tobacco leaf region is an image segmentation task: the precise acquisition of the tobacco leaf region's pixel set in the image. Frequently used object detection models such as SSD [14], Faster-RCNN [15], [16], and Yolo [17] are primarily designed to identify target objects in an image and frame their positions. They are not appropriate for the pixel-level extraction of tobacco leaf regions, because they cannot easily provide each target's complete and precise pixel set. A high degree of segmentation accuracy is also required: whether the tobacco leaf region is accurately extracted affects how accurately the tobacco leaf appearance quality index is quantified. This work therefore uses high-precision semantic segmentation based on deep learning to extract the tobacco leaf region.
The Yunnan Tobacco Leaf Company provided actual tobacco leaf samples for the image dataset used in this study. Sample photos of each grade of tobacco leaf were acquired with a dedicated tobacco leaf image acquisition device, shown in Figure 2, to create a dataset of tobacco leaf photographs. Small pieces of tobacco leaf are frequently strewn on the background workbench due to the working environment, which can produce mixed tobacco leaf images, as shown in Figure 3, and compromise the accuracy of leaf region extraction. The extraction technique therefore consists of two steps: semantic segmentation and connected-region screening. The former adopts the DeepLabV3+ semantic segmentation model to preliminarily determine the tobacco leaf region. The latter removes residual background, broken tobacco, and other impurities from the tobacco leaf picture to obtain a more accurate tobacco leaf region. The specific steps are shown in Figure 4.
The DeepLabV3+ model uses the encoder-decoder network design frequently employed in image segmentation, as depicted in Figure 5. The encoder extracts features from the input and reduces its spatial dimension; the decoder restores the details of the target and the corresponding spatial dimensions. Let A be the original image of the tobacco leaf. When DeepLabV3+ runs, the input image A first enters the network's backbone, and the output is then processed in the encoder and the decoder, respectively. In the encoder, the receptive field is expanded by multiple dilated convolutions, and more useful features are extracted and merged; a 1 × 1 convolution kernel then reduces the number of feature channels, and an up-sampling operation yields the encoder's output feature map, which contains the in-depth features of the input image. The decoder directly obtains a feature map containing the shallow features of the input image through a 1 × 1 convolution kernel and up-sampling. The two feature maps are fused and restored to the input image's size by interpolated up-sampling, giving the final output. In this paper, the up-sampling operation adopts bilinear interpolation, calculated as in equation (1):

f(x, y) ≈ [f(Q11)(x2 − x)(y2 − y) + f(Q21)(x − x1)(y2 − y) + f(Q12)(x2 − x)(y − y1) + f(Q22)(x − x1)(y − y1)] / [(x2 − x1)(y2 − y1)]   (1)

where Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2) are the four points nearest the pixel to be interpolated, forming a rectangle, and f is the function giving the value of a pixel. In this paper, MobileNetV2 is used as the backbone network of the DeepLabV3+ architecture. Its convolutional layers use depthwise separable convolutions, which have a low operation cost and a small number of parameters and can speed up model training.
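As an illustration of equation (1), a minimal numpy sketch of bilinear interpolation at a fractional coordinate might look like the following (a hypothetical helper for clarity, not the paper's implementation):

```python
import numpy as np

def bilinear_interpolate(img, x, y):
    """Bilinear interpolation at fractional coordinates (x, y):
    a weighted average of the four nearest pixels Q11, Q12, Q21, Q22,
    as in equation (1)."""
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    # Clamp the opposite corner at the image border
    x2 = min(x1 + 1, img.shape[1] - 1)
    y2 = min(y1 + 1, img.shape[0] - 1)
    # Fractional offsets within the unit square
    dx, dy = x - x1, y - y1
    # Weighted sum of the four corner pixels
    return (img[y1, x1] * (1 - dx) * (1 - dy)
            + img[y1, x2] * dx * (1 - dy)
            + img[y2, x1] * (1 - dx) * dy
            + img[y2, x2] * dx * dy)
```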
The semantic segmentation output is a category matrix B of the same size as the original image A, whose elements are the label names of the corresponding pixels: ''Back'' for the background and ''StopSigh'' for the tobacco leaf part, as shown in Figure 6. A zero matrix C of the same size as B is created; at every position where the label in B is ''StopSigh,'' the corresponding element of C is set to 1. The elements of C with value 1 represent the preliminarily determined tobacco leaf region. All connected regions in C are then found using the 8-connected method and sorted by the number of elements in each region. The connected region with the largest element count is taken as the accurate tobacco leaf region, while the other connected regions are residual background and broken leaves. In the original image A, the pixel values at the positions of the maximum connected region are retained and all other pixel values are set to 0, which removes the background and broken leaves and leaves only the precise tobacco leaf area. Figure 7 displays the outcome.
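The connected-region screening step can be sketched with scipy.ndimage: label the 8-connected regions of the binary mask C, keep the largest, and zero out everything else in the original image. Function names here are illustrative, not from the paper:

```python
import numpy as np
from scipy import ndimage

def keep_largest_region(image, mask):
    """Keep only the largest 8-connected foreground region of `mask`
    and zero out everything else in `image` (residual background and
    broken-leaf debris)."""
    # 3x3 structuring element of ones gives 8-connectivity
    structure = np.ones((3, 3), dtype=int)
    labels, n = ndimage.label(mask, structure=structure)
    if n == 0:
        return np.zeros_like(image)
    # Pixel count of each labelled region (labels run from 1 to n)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    largest = 1 + int(np.argmax(sizes))
    out = image.copy()
    out[labels != largest] = 0
    return out
```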

B. QUANTIFICATION ALGORITHM OF TOBACCO LEAF APPEARANCE QUALITY INDEX
The quantitative algorithm of the appearance quality index develops a quantitative algorithm model for the indices of color intensity, length, waste, and body. The color intensity quantification model consists of color uniformity quantification, color saturation quantification, and color gloss intensity quantification; the length quantification model consists of tobacco leaf boundary point extraction and actual length and width measurement; the waste quantification model consists of internal waste region extraction and marginal waste region extraction; the body quantification model consists of actual area measurement and precise weight measurement.

1) COLOR INTENSITY INDEX
The national standard recognizes three primary colors of tobacco leaves: lemon (L), orange (F), and red (R). According to the investigation, most tobacco leaves purchased by relevant enterprises in the tobacco industry are orange; the other two kinds are generally used as fillers for ingredients and are rarely purchased. The RGB mean values of the three primary colors in the chromatogram differ considerably: lemon is [250, 202, 46], orange is [237, 109, 0], and red is [204, 102, 51]. It is therefore relatively easy to distinguish the three primary colors of tobacco leaves, both by traditional human observation and by machine vision methods.
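As a simple illustration of how the three primary colors can be separated automatically, the sketch below assigns a leaf's mean RGB value to the nearest of the three reference means by Euclidean distance (a hypothetical helper, not part of the paper's pipeline):

```python
import numpy as np

# RGB means of the three primary colors from the chromatogram
PRIMARY_COLORS = {
    "lemon":  np.array([250, 202, 46]),
    "orange": np.array([237, 109, 0]),
    "red":    np.array([204, 102, 51]),
}

def classify_primary_color(rgb_mean):
    """Assign a leaf's mean RGB triple to the nearest primary color
    by Euclidean distance in RGB space."""
    rgb_mean = np.asarray(rgb_mean, dtype=float)
    return min(PRIMARY_COLORS,
               key=lambda name: np.linalg.norm(rgb_mean - PRIMARY_COLORS[name]))
```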
The color intensity index in the appearance quality index describes differences in more detailed factors, such as color uniformity, color saturation, and color gloss intensity, among tobacco leaves of the same color; it is equivalent to a deeper description of the color index. Currently, there is no unified standard for quantifying the color intensity index and obtaining its quantified value. This paper therefore draws on the principle that the three RGB color channel values can represent object color, and adopts three factors that significantly influence the color intensity index: color uniformity, color saturation, and color gloss intensity, which form a three-dimensional color intensity vector. The quantization value of the color intensity vector is then obtained by quantizing each of the three dimensions.
Color uniformity indicates the degree to which the surface color of tobacco leaves is uniform. The method based  on color difference theory is used to construct the uniformity quantification algorithm in this research. Lab color space is typically utilized when comparing color differences.
In this work, the CIEDE2000 color difference formula proposed by the Commission Internationale de l'Éclairage (CIE) is used, which is more accurate than prior color difference calculation methods. The preprocessed tobacco leaf image is first split into several smaller images, and the pixels of each smaller image are kept separately. All sub-images are then paired, and the color difference of each pair is calculated independently. A color difference threshold is established, and the percentage of pairs with a color difference below the threshold, out of the total number of pairs, is calculated. This percentage represents the degree of color difference between portions of the tobacco leaf image and can be used as a quantitative measure of color uniformity. Figure 8 displays the process flow diagram for this algorithm.
The tobacco leaf image is divided into several sub-images, and the number of segments can be adjusted according to actual needs and effects. With a larger partition number, the comparison is finer and the result more accurate, but the amount of calculation is enormous and overall working efficiency suffers; with a smaller partition number, the running speed is quick, but the outcome is not precise enough. In tests, with fewer than 50 segments the speed is high but the result is coarse, while with more than 150 segments the computation takes a long time. The final segmentation uses 100 sub-images, which is easy to observe and has good computational efficiency. The segmentation effect is displayed in Fig. 9.
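The grid split into 100 sub-images can be done, for example, with numpy's array_split; this is a minimal sketch, and the paper's exact cropping may differ:

```python
import numpy as np

def split_into_subimages(image, grid=(10, 10)):
    """Split the leaf image into grid[0] x grid[1] sub-images
    (100 by default, the count settled on in the paper)."""
    subimages = []
    # Split into horizontal bands, then split each band into tiles
    for band in np.array_split(image, grid[0], axis=0):
        subimages.extend(np.array_split(band, grid[1], axis=1))
    return subimages
```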
A 100 × 1 cell array is used to hold the pixel matrices of the segmented sub-images, whose elements are then examined. A sub-image whose pixel matrix is all 0 corresponds to pure background; sub-images of this type are discarded, leaving M remaining sub-images. The RGB means of these sub-images are calculated as in Equations (2), (3), and (4):
where the mean function calculates the mean of the matrix elements in parentheses; R_i, G_i, and B_i respectively represent the pixel matrices of the three RGB channels of the ith sub-image; the sum function calculates the sum of all elements of the matrix in parentheses; and the find function counts the matrix elements in parentheses that meet the given condition. After the RGB means of the M sub-images are computed, they are placed in the cell array as 1 × 3 triples. All elements of the cell array are then paired to form color difference comparison groups, and the color difference ΔE of each group is calculated with the CIEDE2000 algorithm. The set E is composed of the ΔE values of all combinations. The range of the elements of E is calculated as the difference between the maximum and minimum values. The threshold is set at one third of this range, and the quantization value of color uniformity is the percentage of combinations in E whose ΔE is below the threshold, out of the total number of combinations, as indicated in Equation (5): where color_uni represents the quantified value of color uniformity (the larger the value, the more uniform the color of the tobacco leaf surface), and E_max and E_min respectively represent the maximum and minimum values in E.

The color saturation represents the intensity of the tobacco leaf surface color; the visual perception is the vividness of the color. For a given color, the maximum of the RGB values reflects the spectral value of the color in the corresponding color gamut. The closer the object color is to the spectral value, the more vivid the color and the higher the saturation. The calculation of the color saturation quantization value is shown in Equation (6): where color_sat represents the quantified value of color saturation; the larger the value, the more vivid and saturated the tobacco leaf surface color.
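Putting the uniformity steps together (RGB means of sub-images, pairwise color differences, threshold at one third of the range, as in Equation (5)), a sketch might look like the following. The color-difference function is injected so that, in practice, CIEDE2000 on Lab values (e.g. skimage.color.deltaE_ciede2000) can be supplied; the example here is generic and not the paper's code:

```python
import itertools
import numpy as np

def color_uniformity(sub_rgb_means, delta_e):
    """Fraction of sub-image pairs whose color difference is below
    one third of the range of all pairwise differences, following
    the description of Equation (5).  `delta_e` computes the color
    difference between two color triples."""
    pairs = list(itertools.combinations(sub_rgb_means, 2))
    diffs = np.array([delta_e(a, b) for a, b in pairs])
    # Threshold: one third of the range of all pairwise differences
    threshold = (diffs.max() - diffs.min()) / 3.0
    return float(np.sum(diffs < threshold) / len(diffs))
```

A quick check with a plain Euclidean distance in place of CIEDE2000: with sub-image means [0, 0, 0], [1, 0, 0], [10, 0, 0], the pairwise differences are 1, 10, 9, the threshold is 3, and the uniformity is 1/3.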
R, G, and B are the RGB mean values of the tobacco leaf part of the overall image, calculated in the same way as Equations (2), (3), and (4). The intensity of the color gloss represents the degree of brightness of the tobacco leaf's surface color. As Equation (7) illustrates, the maximum of the RGB values is used as the quantization value of gloss intensity, since it is the primary component of object-color brightness that human eyes perceive:

color_bri = max(R, G, B)   (7)
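For the saturation and gloss quantities, Equation (7) is stated directly (the maximum of the RGB means), while the exact form of Equation (6) is not reproduced in the text; the sketch below therefore uses an HSV-style saturation as a clearly labeled placeholder consistent with the description of vividness:

```python
def color_gloss_intensity(r_mean, g_mean, b_mean):
    """Equation (7): gloss intensity is the maximum of the RGB means."""
    return max(r_mean, g_mean, b_mean)

def color_saturation(r_mean, g_mean, b_mean):
    """Placeholder for Equation (6), whose exact form is not
    reproduced in the text: HSV-style saturation (max - min) / max,
    which grows as the color approaches its spectral value."""
    mx = max(r_mean, g_mean, b_mean)
    mn = min(r_mean, g_mean, b_mean)
    return (mx - mn) / mx if mx > 0 else 0.0
```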
The quantified values of the three influencing factors, color uniformity, color saturation, and color gloss intensity, together make up the three-dimensional vector [color_uni, color_sat, color_bri], which represents the quantitative value of the final color intensity index of the appearance quality of tobacco leaves.

2) LENGTH AND WIDTH INDEX
In the national standard, the length is specified as the distance between the main vein stalk and the tip of the tobacco leaf, which is easy to measure physically. The standard does not precisely define width; it is typically determined by experience and intuitive judgment. Since tobacco leaves have asymmetric and irregular external shapes, this paper uses a definition similar to length to determine the width, and proposes a non-contact automatic measurement method for the length and width of tobacco leaves based on computer vision. The procedure is illustrated in Figure 10. First, the pixel matrix of the tobacco leaf image is evaluated to determine its minimum enclosing rectangle, and the boundary points at the upper and lower ends, as well as the left and right ends, of the tobacco leaf are located. A black-and-white checkerboard image is then employed as the reference to determine the relationship between pixel distance and real distance.
Finally, measurements of the tobacco leaf's actual length and width are calculated.
After image segmentation, the background pixels are 0 and the tobacco leaf pixels retain their original values in the image's pixel matrix. The matrix is therefore scanned row by row from the top and the bottom, and column by column from the left and the right; the first row or column in each direction containing a nonzero pixel together define the minimum rectangle enclosing the tobacco leaf, as Figure 11 illustrates. On each side of this rectangle, the nonzero pixel positions are found and the middle one is taken; Equation (8) gives the calculation of this middle pixel.
where N is the total number of pixels on one side of the minimum rectangle, K is any positive integer, and i is the index of the ith nonzero pixel on this side. The method above produces four boundary pixels, as shown in Fig. 12; the left and right pixels are situated at the leaf tip and the end of the leaf peduncle, respectively, and the distance between them equals the pixel length of the tobacco leaf. The distance between the upper and lower pixels represents the width of the tobacco leaf. Converting pixel distance to actual distance requires a reference. This paper uses a standard black-and-white checkerboard image to establish the conversion between pixel distance and real distance at a given shooting height. The edge length a_i of five squares at different positions on the checkerboard is measured, and the average is taken as the actual edge length of a square. Five squares on the photographed checkerboard are then selected in the same manner, and the pixel edge length, the Euclidean distance b_i between image pixel locations, is measured; its average gives the pixel edge length of a checkerboard square. The real length and width of the tobacco leaf are then calculated according to Equations (9) and (10):

L_n = ‖l_1 − l_2‖ · a / b   (9)
W_n = ‖w_1 − w_2‖ · a / b   (10)
where L_n and W_n represent the actual length and width of the tobacco leaf, a and b are the averaged actual and pixel edge lengths of a checkerboard square, l_1 and l_2 are the pixel coordinates of the left and right boundary pixels, and w_1 and w_2 are the pixel coordinates of the upper and lower boundary pixels.
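Combining the steps of this subsection, a minimal sketch of the length and width measurement might look like this; the tie-breaking of Equation (8)'s middle pixel is an assumption, and the millimetre-per-pixel scale is taken as already derived from the checkerboard:

```python
import numpy as np

def leaf_length_width(mask, mm_per_pixel):
    """Estimate actual leaf length and width: locate the minimum
    enclosing rectangle of the nonzero mask, take the middle nonzero
    pixel on each side (in the spirit of Equation (8)), and convert
    pixel distances with the checkerboard-derived scale."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]

    def middle_pixel(indices):
        # Middle element of the nonzero pixels on one rectangle side
        return indices[len(indices) // 2]

    # Boundary pixels (row, col) on the four sides of the rectangle
    l_pt = (middle_pixel(np.flatnonzero(mask[:, left])), left)
    r_pt = (middle_pixel(np.flatnonzero(mask[:, right])), right)
    t_pt = (top, middle_pixel(np.flatnonzero(mask[top, :])))
    b_pt = (bottom, middle_pixel(np.flatnonzero(mask[bottom, :])))

    # Euclidean pixel distances scaled to millimetres
    length = np.hypot(l_pt[0] - r_pt[0], l_pt[1] - r_pt[1]) * mm_per_pixel
    width = np.hypot(t_pt[0] - b_pt[0], t_pt[1] - b_pt[1]) * mm_per_pixel
    return length, width
```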

3) WASTE INDEX
Waste is defined in the national standard as abnormal phenomena such as diseased patches, holes, and edge damage caused by injury to the tobacco leaf's surface. Based on prior observation, tobacco experts estimate the maximum percentage of waste for each grade, which is specified in the national standard. The traditional method of measuring the proportion of waste area, relying mostly on visual observation and judgment, has low accuracy. This paper proposes a more precise quantitative treatment of the tobacco waste index: the internal waste ratio and the edge waste ratio are calculated separately and then summed to give the overall waste ratio. Waste regions, consisting mainly of abnormal locations such as holes and burnt spots, differ noticeably in color from intact tobacco leaf. First, the tobacco leaf image is cropped using the length calculation method to the smallest rectangle that fits the tobacco leaf part. The cropped image is then binarized following the binarization method used in the color intensity calculation; at this point ''0'' stands for the background and ''1'' for the tobacco leaf part. Every element of the binary image matrix is then inverted to exchange the values of the tobacco leaf and the background. Computing the connected regions of this binary image yields a sequence of regions comprising the background and the abnormal areas within the tobacco leaf. The four connected regions with the highest pixel counts, corresponding to the image's four corners, make up the background. When taking tobacco leaf pictures, industry specifications require imaging the front of the leaf.
In contrast, the color of the leaf veins on the back of the leaf differs significantly, while the difference between the front and other abnormal areas is slight. The color of the leaf stem is highly similar to the black background and is included in the background when the connected regions are calculated. The connected regions are arranged in descending order by the number of elements they contain. After the first four are removed, the remaining regions represent the damaged areas inside the tobacco leaf. The calculation of the internal damage ratio is shown in Formula (11):

rate_in = (num_5 + num_6 + … + num_N) / p_1   (11)

and the algorithm flow chart is shown in Figure 13.
where rate_in is the internal waste ratio of the tobacco leaf, num_i is the number of elements in the ith connected region, N is the total number of connected regions, and p_1 is the number of pixels in the tobacco leaf. The waste area inside the tobacco leaf is shown in Figure 14. Edge flaws and mechanical creases are the principal causes of tobacco leaf edge damage. In this paper, the edge of the tobacco leaf part of the image is first extracted, and the image's center is used as a reference point to determine the pixel distance between each point in the edge point set and the reference point, as well as the inclination angle of the line connecting them. From these values, the edge waveform map is created, the marginal waste area is examined and extracted, and the marginal waste ratio is ultimately calculated. Figure 15 displays the flowchart for the algorithm.
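The internal waste computation of Formula (11) can be sketched with scipy.ndimage as follows; this is illustrative, and assumes (as the text states) that the four largest inverted regions are the corner background:

```python
import numpy as np
from scipy import ndimage

def internal_waste_ratio(binary_leaf):
    """Internal waste ratio in the spirit of Formula (11).
    `binary_leaf` is 1 on the leaf and 0 elsewhere.  After inversion,
    the four largest connected regions (the corner background) are
    discarded; the remaining regions are holes and abnormal areas."""
    inverted = 1 - binary_leaf
    labels, n = ndimage.label(inverted, structure=np.ones((3, 3), int))
    sizes = ndimage.sum(inverted, labels, index=range(1, n + 1))
    # Sort region sizes in descending order; drop the four largest
    order = np.argsort(sizes)[::-1]
    waste_pixels = sizes[order[4:]].sum() if n > 4 else 0.0
    leaf_pixels = binary_leaf.sum()
    return float(waste_pixels / leaf_pixels)
```

For a full 8 × 8 leaf with its four corner pixels missing and one interior hole, the four corner regions are discarded and the ratio is 1/59.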
This paper uses the Canny operator to extract the edge of the tobacco leaf. To avoid edge interference from internal waste, the previously extracted internal waste areas are first filled with red. Equation (12) is used to determine the tilt angle of the line connecting each point in the edge point set to the reference point:

A = arctan( (y_i − y_r) / (x_i − x_r) )   (12)
where A is the inclination angle, (x_r, y_r) represents the coordinates of the reference point, the center of the tobacco leaf image, and (x_i, y_i) represents the coordinates of the ith point in the edge point set. After the tilt angles of all points are determined, they are arranged in ascending order, and the distance between each corresponding edge point and the reference point is computed in that order. The distance is used as the ordinate and its sequence number as the abscissa when drawing the edge waveform map. Because the leaf pedicel part of the edge is extremely close to the reference point and the distance jumps substantially there, the waveform exhibits drastic oscillations in that region that are unsuitable for further analysis. The overall shape of the edge waveform is otherwise similar to the sine and cosine functions. Through repeated observation, the undulating region of the leaf pedicel was found to lie within an inclination angle of ±10°, that is, the part with inclination angle less than 10° or greater than 350°. This part was therefore removed, and the edge characteristic waveform of Figure 16 was redrawn. In the processed waveform, a sizable undulating area corresponds to a typical region of marginal waste. To extract the locations of these undulating areas, the edge waveform is given first-order differential processing to obtain the differential waveform, as shown in Fig. 17. The precise method is to subtract neighboring values to take the first difference of the edge waveform's horizontal and vertical coordinates. In the differential waveform of the tobacco leaf edge, the absolute value of the ordinate in normal sections is less than 3, so 3 is taken as the threshold value.
In the differential waveform, ordinate values whose absolute value is below the threshold are set to 0, while values above it are left unchanged; the parts whose ordinate is not 0 correspond to the waste characteristics of the edge. To determine the final waste ratio of the tobacco leaf edge, the relevant points are extracted as point set B and the ratio is calculated as shown in Equation (13): rate_bound = n / bound_all, where rate_bound is the waste ratio of the tobacco edge, n is the number of elements in point set B, and bound_all is the total number of elements in the edge point set. Fig. 18 depicts the waste region of the tobacco leaf edge. The total waste ratio of a tobacco leaf is the sum of the internal waste ratio and the marginal waste ratio. Traditional measurement of waste relies mostly on internal waste information; this paper enriches it by also including edge information. During the experiments it was found that the quantification results for most tobacco leaves did not greatly exceed the maximum waste ratio of the national standard, so the sum of the two ratios does not exceed 1. The method of calculating the total waste rate is depicted in Equation (14):
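The thresholding step and Equations (13) and (14) can be sketched as follows. This is illustrative NumPy code under our own names; the toy waveform stands in for a real edge waveform, and `bound_all` is the size of the full edge point set.

```python
import numpy as np

def marginal_waste_ratio(waveform, bound_all, thresh=3.0):
    """First-difference the edge waveform, zero out ordinates whose
    absolute value is below the threshold, and count the remaining
    non-zero points as the marginal waste point set B (Eq. 13)."""
    diff = np.diff(np.asarray(waveform, dtype=float))
    diff[np.abs(diff) < thresh] = 0.0     # normal edge: |diff| < 3
    n = int(np.count_nonzero(diff))       # elements of point set B
    return n / bound_all

def total_waste_ratio(rate_inner, rate_bound):
    """Eq. 14 idea: total waste is the sum of the internal and the
    marginal waste ratio, capped at 1 for safety."""
    return min(rate_inner + rate_bound, 1.0)

# Toy waveform: a smooth edge with one abrupt notch of depth 10,
# producing two above-threshold first differences (entering/leaving it).
wave = [20.0] * 10 + [10.0] * 3 + [20.0] * 10
rate_b = marginal_waste_ratio(wave, bound_all=len(wave))
```

Only the two jumps at the notch boundaries survive the threshold, so `rate_b` equals 2 divided by the number of edge points.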

4) BODY INDEX
In the national standard, the body of tobacco leaves is characterized by leaf thickness, cell density, or weight per unit area; any of these three quantities can be used, and weight per unit area is adopted in this paper.
Conventional methods usually use tobacco leaf thickness because it is easy to measure by hand, whereas weight per unit area is better suited to the computer vision approach of this paper. The actual area of a tobacco leaf was calculated using the method of the length and width indices. For the weight, a pressure sensor is mounted beneath the tobacco leaf placement platform, and an industrial camera is installed above the platform to capture images for back-end image processing. The overall efficiency is greater than that of manual thickness measurement, and many appearance quality indicators can be calculated simultaneously. The general structure is depicted in Figure 2. The method for calculating the actual tobacco leaf area is depicted in Equation (15): where P is the total number of small squares in the black-and-white checkerboard, p1 is the number of pixels in the tobacco leaf region of the image, and p2 is the total number of pixels in the image taken by the device.
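Equation (15) is not reproduced here, so the following is only a hedged sketch of the checkerboard-reference idea: assuming the checkerboard of P squares, each of known area, fills the camera's field of view, the leaf pixel fraction converts directly to real area. The function names and the numbers are hypothetical.

```python
def leaf_actual_area(p1, p2, P, square_area_cm2):
    """Estimate real leaf area from pixel counts (a sketch of the idea
    behind Eq. 15). Assumes the P-square checkerboard, each square of
    area `square_area_cm2`, covers the whole field of view, so one
    pixel corresponds to P * square_area_cm2 / p2 of real area."""
    return p1 / p2 * P * square_area_cm2

def body_index(weight_g, area_cm2):
    """Body index expressed as weight per unit area (g/cm^2); the
    weight would come from the pressure sensor under the platform."""
    return weight_g / area_cm2

# Hypothetical numbers: 1.2e6 leaf pixels in an 8.2e6-pixel frame,
# a 10 x 10 checkerboard of 4 cm^2 squares, and a 2.1 g leaf.
area = leaf_actual_area(1.2e6, 8.2e6, P=100, square_area_cm2=4.0)
g_s = body_index(2.1, area)
```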

III. EXPERIMENTAL ANALYSIS
A. REGION EXTRACTION EXPERIMENT
The industrial camera used to photograph the tobacco leaves is a Mercury series USB3.0 digital camera from China Daheng Group Co., Ltd., model MER-1520-13u3c. The pictures are in bmp format with a resolution of 4232 × 1944. Subsequent image processing and quantization algorithms are implemented on the MATLAB R2021a software platform. The computer used in this paper runs a 64-bit Windows 10 operating system with an Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz. The tobacco leaf image data set comprised 200 images, with 140 for training and 60 for validation, and another 100 tobacco leaf pictures were taken as the test set. During label making, the tobacco leaf part was marked as ''Stop-Sigh'' and the background as ''Back''; the model was then trained for 200 iterations. The training outcomes are depicted in Figure 19, which shows the accuracy curve and the loss curve.
For image segmentation, traditional visual methods frequently employ an RGB threshold between the foreground and background. To further illustrate the benefits of the DeepLabV3+ semantic segmentation model, 100 images of tobacco leaves were chosen to compare the effects of semantic segmentation and threshold segmentation; the results are presented in Table 1. The results demonstrate that the average accuracy (AP) and average recall (AR) of semantic segmentation are superior to those of threshold segmentation, with the difference in average accuracy being particularly pronounced. The primary reason is that threshold segmentation categorizes some dark speckles on the surface of the tobacco leaf as background, altering the actual pixel information of the leaf and decreasing the segmentation accuracy.
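The comparison in Table 1 can be reproduced pixel-wise. The sketch below is ours, assuming AP and AR are computed per image from pixel counts against a ground-truth mask; the function name and the toy masks are hypothetical.

```python
import numpy as np

def mask_accuracy_recall(pred, gt):
    """Pixel-wise accuracy and foreground recall of a predicted
    segmentation mask against a ground-truth mask (boolean arrays)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    accuracy = (pred == gt).mean()           # fraction of pixels agreed
    tp = np.logical_and(pred, gt).sum()      # true foreground pixels kept
    recall = tp / max(gt.sum(), 1)
    return accuracy, recall

# Toy 4x4 masks: the thresholded mask mislabels one dark speckle pixel
# (a true leaf pixel) as background, lowering both metrics.
gt   = np.array([[0,1,1,0],[0,1,1,0],[0,1,1,0],[0,0,0,0]], bool)
pred = np.array([[0,1,1,0],[0,1,0,0],[0,1,1,0],[0,0,0,0]], bool)
acc, rec = mask_accuracy_recall(pred, gt)
```

The single mislabeled speckle pixel drops accuracy to 15/16 and foreground recall to 5/6, mirroring how threshold segmentation loses leaf pixels.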

B. QUANTITATIVE EXPERIMENT OF COLOR INTENSITY INDEX
For the color intensity index, this paper employs two experimental methods. In the first method, three images with considerable color differences observed by human eyes in tobacco leaf samples are chosen, and the three quantities  in the three-dimensional vector expressing color intensity proposed in this paper are calculated, as depicted in Figure 20.
As seen in Figure 20, the surface of the first tobacco leaf from bottom to top is free of progressive lines and holes, etc., resulting in a more uniform and consistent color with the highest degree of color uniformity. The second tobacco leaf has a surface with numerous gradient marks, and the human eye perceives the color to be dark generally; hence, its color uniformity and color gloss intensity are the lowest. The surface of the third leaf contains holes, and its color uniformity is slightly less than that of the first leaf, but it is the brightest. Therefore, its color gloss intensity is the greatest.
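The exact formulas for the color intensity triple are defined earlier in the paper and are not reproduced here; the following is only a hedged illustration of how a (uniformity, saturation, gloss) triple could be computed with grid sub-images and HSV statistics. All names and the specific formulas are our assumptions, not the paper's.

```python
import numpy as np

def color_intensity_vector(rgb, grid=4):
    """Illustrative color triple: uniformity from the spread of
    sub-image mean brightness over a grid (mesh segmentation idea),
    saturation as mean HSV S, gloss as mean HSV V."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    mx, mn = rgb.max(axis=2), rgb.min(axis=2)
    s = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1), 0)  # HSV S
    v = mx                                                        # HSV V
    h_, w_ = v.shape
    gh, gw = h_ // grid, w_ // grid
    cell_means = [v[i*gh:(i+1)*gh, j*gw:(j+1)*gw].mean()
                  for i in range(grid) for j in range(grid)]
    uniformity = 1.0 / (1.0 + np.std(cell_means))  # 1 = perfectly uniform
    return float(uniformity), float(s.mean()), float(v.mean())

# A perfectly uniform mid-gray image: uniformity 1, saturation 0.
img = np.full((64, 64, 3), 128, dtype=np.uint8)
u, sat, gloss = color_intensity_vector(img)
```

A leaf with gradient marks or holes would yield unequal cell means and hence a lower uniformity value, qualitatively matching the comparison in Figure 20.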
In the second method, 50 C2F and C3F grade tobacco leaves from five distinct tobacco-producing regions were selected, and the mean values of the three quantities of the color intensity three-dimensional vector were calculated and compared, as shown in Table 2.
Due to the influence of geography, climate, and other environmental factors, the same grade of tobacco leaf grown in different places will differ. Therefore, this paper selected five distinct production regions and analyzed two grades of tobacco leaf from each. In the national standard, C2F and C3F have the same color, with C2F having a strong color intensity index and C3F a medium one. As shown in Table 2, the color uniformity and color saturation of the two grades from Dehong are identical, but the color gloss intensity of C2F is 7% greater than that of C3F, indicating that the color intensity index of C2F is stronger and satisfies the national standard's quality requirements. In Baoshan and Wenshan, all three values of C2F were greater than those of C3F. Although the color saturation of C2F from Lijiang was somewhat lower than that of C3F, the other two values were higher, so the overall index was still greater than that of C3F. For Dali, both the color saturation and the color gloss intensity of C2F were lower than those of C3F, but the increment in color uniformity was larger, so the overall index still indicated a strong C2F.
The above investigation demonstrates that when tobacco leaves share the same color, the sensory color difference, such as between strong and medium color intensity, becomes less pronounced, which significantly affects the accuracy of hand grading. In this paper, a triple is employed to express the color intensity index, and the influence of its three elements on subtle color differences is analyzed in depth, so it can precisely depict the color intensity difference between tobacco leaves of the same position and color.

C. QUANTITATIVE EXPERIMENT OF LENGTH AND WIDTH INDEX
Two schemes are designed in this paper to test the length-width index quantization algorithm. In Scheme 1, 10 groups of tobacco leaf samples were selected, and the length of each sample was obtained both by the proposed algorithm and by manual measurement for comparison. In Scheme 2, the length and width data obtained by the algorithm are used as input features to classify leaf parts with a classifier, and the classification performance is examined. The comparison results of Scheme 1 are shown in Table 3. One-way analysis of variance is also used to examine the differences between the two groups of data, and the results are displayed in Table 4. Table 3 shows that the results produced by the proposed algorithm are marginally inferior to the manual measurements. The reason is that manual measurement can be more exact because the leaves can be processed in greater detail, such as flattening and unfolding them in advance and fixing the ends, but measuring a single leaf then takes longer. As shown in Figure 22, the P-value of the ANOVA result is greater than 0.05, so the length calculated by the proposed algorithm and the manual measurement show no appreciable difference.
In Table 4, the differences between indicators under a single factor were assessed using one-way analysis of variance; here we compare the length differences between the manual measurements and the results of the proposed algorithm. Columns 1 through 5 of the table give the one-way ANOVA calculation parameters, while the PF value in column 6 is the significance probability of the test. One-way ANOVA is effectively a hypothesis test whose null hypothesis is that there is no discernible difference between the indicators; the null hypothesis is rejected if the significance probability is less than or equal to 0.05. Here the PF value is greater than 0.05, so the null hypothesis is accepted, indicating no appreciable difference between the results of the proposed algorithm and those of the manual measurement.
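The one-way ANOVA above can be sketched with NumPy as follows. The length values are hypothetical stand-ins for the Scheme-1 data, and the F statistic is compared against the 5% critical value F(1, 18) ≈ 4.41 for two groups of ten.

```python
import numpy as np

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA:
    between-group mean square / within-group mean square."""
    groups = [np.asarray(g, float) for g in groups]
    all_data = np.concatenate(groups)
    grand = all_data.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_b = len(groups) - 1            # k - 1
    df_w = len(all_data) - len(groups)  # N - k
    return (ss_between / df_b) / (ss_within / df_w)

# Hypothetical length data (cm): algorithm vs. manual measurement for
# ten leaves; the small systematic offset mimics Table 3.
algo   = [58.2, 61.5, 55.0, 63.1, 59.8, 57.4, 60.2, 62.0, 56.7, 58.9]
manual = [58.9, 62.0, 55.6, 63.8, 60.3, 58.1, 60.9, 62.4, 57.2, 59.5]
F = one_way_anova_f(algo, manual)
# F well below 4.41 -> the null hypothesis (no difference) is accepted.
```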
The classification of tobacco leaf parts is primarily guided by the length and width index, so the quantification effect can be further assessed through the part classification performance. For this paper, 400 tobacco leaf photos with known grades were used to calculate the length and width index. The actual length, width, and area of the tobacco leaves were used as the feature input and the tobacco leaf part as the classification target; 130 images were split into 70% for training and 30% for validation, and the remaining 270 images were used for testing. Several machine learning classification models were utilized for training and verification, including the support vector machine, naive Bayes, decision tree, and multi-layer perceptron; the results are displayed in Table 5. Five producing regions are selected as five groups, and based on the classification results in Table 5, the quadratic-kernel SVM is selected as the classifier. Fifty samples are selected from each group and divided into their respective parts; the parts are then classified using these samples, and the classification recall and accuracy are recorded for analysis, as shown in Table 6.
As can be seen from Table 6, the length, width, and area of the tobacco leaf calculated by the algorithm in this paper are useful features for part classification, and the average accuracy and recall rates are around 90%, which confirms the usefulness of the length and width calculation method proposed in this paper.
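A minimal sketch of the part-classification setup using scikit-learn's quadratic-kernel SVM, the classifier selected from Table 5. The (length, width, area) features below are synthetic stand-ins generated for three hypothetical leaf parts, not the paper's data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_part(n, length_mu, width_mu):
    """Synthetic (length, width, area) features for one leaf part;
    real values would come from the quantification algorithms."""
    length = rng.normal(length_mu, 1.5, n)
    width = rng.normal(width_mu, 1.0, n)
    return np.column_stack([length, width, length * width * 0.7])

# Three hypothetical parts with different typical sizes (cm / cm^2).
X = np.vstack([make_part(50, 45, 15),
               make_part(50, 60, 22),
               make_part(50, 52, 18)])
y = np.repeat([0, 1, 2], 50)

# Quadratic (degree-2 polynomial) kernel SVM.
clf = SVC(kernel="poly", degree=2).fit(X, y)
acc = clf.score(X, y)
```

With well-separated size distributions the training accuracy is high, consistent with the roughly 90% accuracy the paper reports on real leaves.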

D. QUANTITATIVE EXPERIMENT OF WASTE INDEX
In this paper, ten groups of tobacco leaves with known grades were selected to determine waste, and Table 7 compares the results with the maximum waste ratio in the national standard. The results indicate that only one of the ten sample groups has a waste ratio exceeding the national standard's maximum value, and only by 2%, confirming the algorithm's efficacy.

E. QUANTITATIVE EXPERIMENT OF BODY INDEX
Weight per unit area was selected as the quantification value of the body index in this paper. Because standardized, matched reference data can hardly be found for the conventional manual measurement, this paper uses thickness and weight per unit area as body indicators in a quantitative correlation analysis experiment.
The tobacco leaf density ρ varies little among tobacco leaves of the same grade, and its relationship with weight per unit area g_s and thickness h is ρ = g_s/h. Therefore, the correlation between the weight per unit area and the thickness of the same tobacco leaf should be positive. In this paper, 40 X3F grade tobacco leaves were selected; the weight per unit area was calculated, and the thickness was measured with a high-precision thickness meter. Fig. 21 displays the scatter relationship between the two.
The correlation coefficient was 0.9270, and the figure clearly shows a considerable positive correlation between the weight per unit area and the thickness of the tobacco leaf. This shows that the weight per unit area calculated in this paper has the same merit as the thickness measured by the conventional method and can be used as the quantification value of the body index, while the method employed in this paper is quicker and easier.
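The correlation analysis can be sketched as follows. The thickness and weight-per-unit-area values are simulated under the ρ = g_s/h relationship with added measurement noise; they are not the paper's measurements, and the constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated paired measurements for 40 leaves: thickness h (mm) and
# weight per unit area g_s ~ rho * h plus measurement noise.
h = rng.uniform(0.08, 0.20, 40)         # thickness, mm (assumed range)
rho = 55.0                              # near-constant density term (assumed)
g_s = rho * h + rng.normal(0, 0.4, 40)  # weight per unit area, mg/cm^2

# Pearson correlation coefficient between the two quantities.
r = np.corrcoef(g_s, h)[0, 1]
```

Because g_s is essentially proportional to h for a fixed grade, `r` comes out strongly positive, matching the 0.9270 correlation reported above in spirit.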
The running time of the color intensity quantification model in this paper is 8.7 seconds, that of the length and width quantification model is 2.6 seconds, that of the waste quantification model is 2.1 seconds, and that of the body quantification model is 0.3 seconds.

IV. CONCLUSION AND DISCUSSION
When grading tobacco, the appearance quality indicators typically rely on artificial observation and measurement to obtain qualitative descriptions and quantitative values. This involves considerable subjectivity, which reduces classification accuracy; in addition, labor costs are high and the work is time-consuming, which lowers actual production efficiency. To address this problem, this paper proposes a computer vision-based quantitative method for the tobacco appearance quality index, which can provide precise quantitative values for some qualitative indices in accordance with the descriptions of the national standard and the experience of manual operation. The fundamental concept is to capture photos of tobacco leaves for back-end computer vision segmentation, then design algorithms based on the characteristics and actual meaning of the quantitative indicators to obtain precise quantitative values. To remove the influence of background and contaminants, the tobacco leaf region was extracted using semantic segmentation and connected region analysis. The color intensity index makes use of partition differences to compare the color between partitions. Finding the boundary endpoints is the essential task of the length index. The waste index analyzes the inside and the edge to determine the percentage of the waste area. The measurement of the tobacco leaf region is the main focus of the body index.
For each quantitative index, an experimental scheme is designed to verify the proposed method. The experimental plan for tobacco leaf region extraction compares semantic segmentation to conventional threshold segmentation. The findings demonstrate that threshold segmentation incorrectly labels some dark portions of the tobacco leaf surface as background, while semantic segmentation performs more accurately. Two experimental schemes were designed for the color intensity index. In one, several groups of C2F and C3F tobacco leaves were selected for comparison; the results showed that C2F produced a stronger color intensity quantification result than C3F, in line with the national standard. In the other, a collection of tobacco leaf samples with notable color variations was selected for study; the results demonstrated that differently colored tobacco leaves had markedly different color quantization values, reflecting the color characteristics of the leaves. For the length index, two experimental approaches were designed. The first scheme compared the length calculated by the proposed algorithm with the actual measured length and revealed no appreciable difference between the two results. The second scheme classified tobacco leaf parts using the calculated length and showed an effective classification result, suggesting that the length calculation method proposed in this paper may be used for tobacco leaf classification. For the waste index, the experimental scheme compared the waste ratios calculated in this paper for different groups with the maximum waste ratio outlined in the national standard; the calculated results fell within the national standard's permissible range.
The experimental plan for the body index involved examining the relationship between the weight per unit area and the thickness of several groups of the same tobacco leaves. The correlation coefficient of 0.9270 demonstrated a significant positive correlation between them, indicating that the body index's quantitative value could be based on the weight per unit area selected in this paper.
The following study findings are presented in this paper: 1) The quantification algorithms of the appearance quality indices designed in this paper need high-precision pixel information of the tobacco leaf area, and traditional threshold segmentation in image preprocessing loses more of this pixel information. Semantic segmentation can distinguish the tobacco leaf from the background in an image as well as the speckles and holes on the leaf surface: speckles are labeled as part of the tobacco leaf, while holes are labeled as background. 2) A three-dimensional vector can describe more specific aspects of color, such as lightness, brilliance, and uniformity, than a single scalar value can. In future studies on tobacco leaf grading, different weights can be set according to the actual influence of color uniformity, saturation, and gloss intensity and used in the relevant computations and models.
3) The length calculated by the method in this paper is comparable in accuracy to actual manual measurement and has a substantial effect on part classification. In addition to using fewer human resources, it is substantially more efficient than manual measurement. 4) A new waste quantification model was put forward that takes into account the characteristics and variation of waste inside the leaf and at its edge. In recent years, a small number of researchers have determined the internal waste of tobacco leaves by manually calibrating the size of surface holes with Photoshop and similar applications. Although a precise quantization value can be obtained this way, the operation is time-consuming and inefficient, making it difficult to use in actual production. With the semantically segmented tobacco leaf images in this paper, the relevant waste indices are easy to obtain, the quantitative values fall within the range of the national standard, and the running speed is high; the efficiency is significantly greater than manual identification and calibration with such software. 5) Since weight per unit area can be quickly measured by computer vision techniques, it is used in this paper as the quantitative value of the body indicator; thickness measurement is typically chosen in conventional methods because manual measurement of the tobacco leaf area is difficult. When the quality of the tobacco leaf is fixed, the density barely varies, so the correlation between weight per unit area and thickness is positive.