Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Comparison of machine learning and deep learning models for evaluating suitable areas for premium teas in Yunnan, China

  • Guiyu Wei,

    Roles Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation School of Geography and Ecotourism, Southwest Forestry University, Yunnan, China

  • Ruliang Zhou

    Roles Conceptualization, Funding acquisition, Project administration, Resources, Supervision

    Zhou_ruliang@163.com

    Affiliation School of Geography and Ecotourism, Southwest Forestry University, Yunnan, China

Abstract

Background: Tea is an important economic crop in Yunnan, and the market price of premium teas such as Lao Banzhang is significantly higher than ordinary teas. For planting lands to promote, the tea industry to develop and minority lands’ economies to prosper, it is vital to evaluate and analyze suitable areas for premium tea cultivation. Methods: Climate, terrain, soil, and green cropping system in the premium tea planting areas were used as evaluation variables. The suitability of six machine learning models for predicting suitable areas of premium teas were evaluated. Result: FA+ResNet demonstrated the best performance with an accuracy score of 0.94 and a macro-F1 score of 0.93. The suitable areas of premium teas were mainly located in the southern catchment of LancangJiang River, south-central part of Dehong, a few areas in the mid-west of Lincang, central scattered areas of Pu’er, most of the southern western part of Xishuangbanna and the southern edge of Honghe. Annual mean temperature, annual mean precipitation, mist belt, annual mean relative humidity, soil type and elevation were the key components in evaluating the suitable areas of premium teas in Yunnan.

Introduction

Among the world’s most popular beverages, tea has been shown to be beneficial for health, wellness, and the treatment of chronic diseases [1]. The largest producer of tea in the world is China [2]. In addition to becoming a major cash crop in southwestern China [3], tea has also become a traditional characteristic of Yunnan’s ethnic minorities and an important economic driver, contributing to agricultural development, poverty alleviation, and increasing the incomes of tea farmers [4]. Among the 73 poor counties in Yunnan, 43 have a tea industry, making development of this industry one of the most effective ways to alleviate poverty in these areas [5]. Premium teas from ancient tree, like Lao Banzhang, Yiwu, Jingmaishan, Bingdao, Xigui, are as expensive as tens of thousands of yuan per kilogram, but their production is low and demand exceeds supply [6], thus have a high potential to expand the tea planting. However, tea farmers from poor minority countries generally have a low level of education and are less knowledgeable about the natural growing conditions of premium tea communities. They are less knowledgeable about suitable growing areas, and scientific methods for managing plantations [7]. As a result, it is necessary to develop an evaluation model that is capable of identifying the ideal growing areas for premium teas in Yunnan.

Firstly, climate variables including temperature and precipitation are important environmental variables for the cultivation of tea and other crops [810]. Temperatures of 17–22°C and precipitation of 1,200mm-1,500mm are the ideal conditions for the growth of Pu’er tea trees [11]. Humidity is another factor that influences a tea production area’s suitability. The suitable annual average humidity for tea planting should be more than 70% [1]. Secondly, Yunnan ethnic minorities have different methods for harvesting tea, fertilizing, irrigation, and controlling pests and diseases, so the green cropping system is also an important factor in promoting the cultivation of premium teas [12]. The artificial use of land, the price of tea, and political decisions are closely related to tea cultivation, which can result in inconsistencies in premium teas habitats [3]. Therefore, tea garden management, tea tree fertilization, and tea processing techniques are of particular importance in the cultivation of premium teas [13]. Thirdly, soil variables, including soil type, soil pH value, nitrogen, phosphorus, and potassium content within the soil are important indicators of tea leaf quality. According to Ye et al., soil PH and nitrogen, phosphorus, and potassium content were positively correlated with tea leaf quality [14]. It is recommended by Shamsheer and Ismet that soils with pH values ranging between 4.5 and 5.5 and containing 2% organic matter be planted for tea production, and soils with pH values below 4 and above 6 are incompatible with tea production [1]. Fourth, terrain variables, such as terrain conditions, can also affect the crop’s fitness [15]. According to Yang et al., the most suitable elevation for Pu’er tea trees is 1400 to 1800 meters, as well as flat and gently sloped lands with slopes of 15° or less [11]. Combined with the above variables, climate variables, green cropping systems, soil variables, and terrain variables were used as predictors of premium tea suitable areas.

Research on the prediction of suitable areas is currently focused on both flora and fauna. In terms of suitable areas of animals, various models have been used by scholars to predict the suitable areas. A study by Mudereri et al. demonstrated the feasibility of using MaxEnt (ME), random forests (RF), and support vector machines (SVM) to modify the habitat suitability of southern ground hornbills [16]. To estimate the optimal habitat for Asian horseshoe crabs in the future, Vestbo et al. used ME and SVM in order to create a niche model that could then be used to identify protected areas for Asian horseshoe crabs [17]. In their study, Lindbladh et al. used the k-NearestNeighbor (kNN) algorithm combined with information from satellite images and Swedish National Forest Survey data to predict the distribution of Aegithalos caudatus and deciduous forests [18]. A combination of five machine learning (ML) techniques was used by Muoz-Mas et al. to simulate the effects of climate change on the suitable areas of large brown trout (generalized additive models, multilayer perceptron ensembles, RF, SVM, and fuzzy rule-based systems) [19]. The findings of Su et al. have provided a better understanding of suitable habitats for Asiatic black bears and red pandas based on the use of ME and classification and regression tree (GARP) ecological niche models [20]. Zannou et al. applied and compared six statistical models (ME and generalized linear model (GLM) in a single model, generalized additive model, RF, augmented regression tree and SVM model in an integrated model) to predict the habitat suitability for ticks in order to better prevent and control ticks populations in different countries [21].

In terms of suitable areas of plants, scholars have used different models to predict the suitable areas of different pairs of plants. Angga Yudaputra et al. used an ensemble ML model with RF and artificial neural network techniques predicted endangered giant flower amorphophallus titanum in terms of habitat preference, spatial distribution and population status [22]. Mudereri et al. used RF, generalized linear model, SVM, CART, flexible discriminant analysis (FDA) and enhanced regression tree (BRT) modeling techniques and integrated models (IM) to predict current and future Striga weed distribution patterns [23]. Silva et al. analyzed the overlap of the range of Pygochelidon melanoleuca in Brazil by RF, ME and SVM algorithms [24]. Dai used the ME model to determine suitable habitat for tea tree cultivation under current climate scenarios [3]. In a prediction of the suitable areas of Tossa jute, ME and educational global climate models were used to study the effect of climate change on the distribution of Tossa [25]. Using a residual network, Yue et al. simulated the distribution of Panax Notoginseng in its suitable areas in China [26].

Furthermore, the accuracy of the prediction results for suitable areas varies considerably between models. According to Mudereri et al., the IM was the most accurate among RF, GLM, SVM, CART, FDA, and BRT models in predicting the applicability of current and future Striga weed distribution patterns [23]. In a study conducted by Hu et al., DNN models had excellent performance compared with ME, SVM, RF in predicting the habitat suitability of Framework-forming scleractinian, a deep-sea sclerenchyma coral in the Gulf of Mexico [27]. In comparison with one-dimensional convolutional networks (1D-CNN) without residual connections. Zhao et al. showed that residual connections can provide richer information about the bearing signal features and increase classification accuracy [28]. Jiang et al. introduced a convolutional attention mechanism in a deep residual network and showed that the model with the addition of the attention mechanism improved 0.65%, 0.38%, 0.62% and 0.60% in accuracy, precision, sensitivity and F1-score, respectively [29].

According to the literature review above, the study of fauna and flora habitats has gained considerable attention among scholars. However, little research has been conducted on tea’s suitable areas and the precision derived from different models varies widely. This paper will model and evaluate the suitability distribution of premium teas in Yunnan by utilizing four types of variables (climate variables, terrain variables, soil variables, and green cropping system variables) and six advanced ML methods: SVM, kNN, back propagation neural network (BPNN), Convolutional Neural Networks (CNN), Residual Network (ResNet) and FA+ResNet. It has not only been possible to identify variables associated with the quality of premium teas, but also to identify the optimal models that can be used to predict the suitable areas for premium teas in Yunnan, to provide precise guidance to tea farmers in the cultivation and production of tea, contribute to the industry theoretically and practically, and support the work of poverty alleviation in Yunnan, China.

Study area

Yunnan (Fig 1) is located in the southwest border of China, between 97°31′ and 106°11′ East longitude and 21°8′ and 29°15′ North latitude [30]. In Yunnan, which is a low-latitude inland region with high topography in the northwest and low topography in the southeast, the Tropic of Cancer passes through its southern end. Yunnan has 88.64% mountainous area, 87.21% mid-altitude region between 1000 and 3500 meters, and 56.46% area below 25° slope [31]. Yunnan spans six major water systems: the Yangtze, Pearl, Yuan, Lancang, Nujiang and Daying rivers [32]. Yunnan possesses a subtropical plateau monsoon climate with small annual temperature differences, large daily temperature differences, distinct dry and wet seasons, and large vertical temperature variations with altitude. The northwestern part of Yunnan has a cold climate; the eastern and central part has a temperate climate; and the southern and southwestern part has a low-heat valley climate with long summers, no winters, and rainfall in autumn [33]. In Yunnan, the average monthly temperature of the hottest month (July) varies between 19°C and 22°C, while the average monthly temperature of the coldest month (January) varies between 6 and 8°C, and the annual temperature difference is usually only 10°C to 12°C. The distribution of precipitation in Yunnan is extremely uneven geographically, with the most areas receiving 2,200mm to 2,700 mm of precipitation each year and the least receiving only 584 mm, with most areas receiving more than 1,000 mm per year [34].

thumbnail
Fig 1. Digital Elevation Map (DEM) of Yunnan Province, China.

Shape file source: republished from http://www.gscloud.cn under a CC BY license, with permission from Geospatial Data Cloud, original copyright [2022]; Own Map output: using ArcGIS 10.8 Software analysis.

https://doi.org/10.1371/journal.pone.0282105.g001

Methodology

Fig 2 depicts a flowchart illustrating the entire process of evaluating the suitable areas for premium teas in Yunnan.

thumbnail
Fig 2. Data pre-processing and evaluation of premium tea suitable areas.

https://doi.org/10.1371/journal.pone.0282105.g002

Premium teas data collection

Spring tea is generally picked in March and April in Yunnan, which represents the most premium quality in the whole year. Tea leaves have higher organic matter content after winter and spring recuperation. Additionally, the quality of spring tea is superior to summer and winter teas. Normal production of spring tea accounts for 35%-40% of the year, summer tea for 30% and autumn tea for 30%. Due to this overcapacity, spring tea’s actual share of sales has increased as summer tea is often abandoned for picking, and it now occupies 65%-70% of the annual economic share of tea industry [35]. Data on premium teas for this study were collected between March 20 and April 10, 2022, and 19 tea-growing villages were visited in the field. No permits were required with the help of local farmers. We used a handheld global positioning system (GPS) with a margin of error of three meters to locate reference control points, and 23 reference data were collected with the assistance of local farmers. Reference data were collected from 14 regions including Shuangjiang County, Lincang City, Menghai County, Xishanbanna, Mengla, Xishanbanna, Lancang Lahu Autonomous County, Ning’er Hani and Yi Autonomous County, Mojiang Hani Autonomous County, Jingdong Yi Autonomous County, Zhenyuan Yi Hani Lahu Autonomous County, Tengchong City, Baoshan City, Baoshan City, Shidian County, Simao District, Pu’er City, Luquan Yi Miao Autonomous County, Dali City, Guangnan County, Chuxiong Yi Autonomous Prefecture. Using ArcMap, 120 randomly selected sampling points were created within a radius of two kilometers, with each tea tree data point as the center of a circle, resulting in 2280 samples in total. The values of each variable are then extracted to each sample point using the multi-value extraction to points in the ArcGIS spatial analysis tool to form a complete sample set of data. To minimize redundancy in distribution data during modeling, species distribution data were buffered and analyzed using ArcGIS to remove duplicate data within 1 km, ensuring that each 250 × 250 m2 grid contained only one tea tree distribution point [23]. Following the above screening, 2028 species distribution data were finally retained, which were distributed on different elevation gradients from 610m to 3400m above sea level.

Variable data source and processing

Four types of variables were used in this study, mainly classified as climate variables, terrain variables, soil variables, and green cropping system variables.

Climatic variables include: annual mean temperature, annual mean precipitation, annual mean relative humidity, and cloud belt. These factors were selected because temperature and precipitation are important influencing factors for tea growth [10, 11] and the annual mean relative humidity for tea planting should not be less than 70% [1].

The annual mean temperature and annual mean precipitation data were obtained from the World Climate Information Platform (https://www.worldclim.org/), which has a resolution of 1000×1000m2 and a post resampling of 250×250m2. The annual average relative humidity data was obtained by calculating the average of meteorological station data in Yunnan Province from 2000 to 2020 and spatially interpolated, with a resolution of 250×250m2. Mist belt data are derived from Wang et al.’s analysis of the latitude-temperature relationship for Yunnan [36].

Soil variables include soil type, soil pH, soil nitrogen content, soil phosphorus content, and soil potassium content. Soil type data were obtained from Nanjing Soil Research Institute (https://soildata.issas.ac.cn). Soil PH, soil nitrogen content, soil phosphorus content, and soil potassium content data were obtained from the Geographic Remote Sensing Ecology Network (http://www.gisrs.cn/). The soil variables were chosen as a result of the positive correlation between the soil PH and soil N, P, and K contents with the quality of the tea [14].

The terrain variables are elevation, aspect, and slope. Terrain variables were chosen because topographic conditions also influence the range of crops that are suitable [17]. The elevation data were obtained from the digital elevation model (DEM) downloaded from the geospatial data cloud (http://www.gscloud.cn/), and aspect and slope were obtained by using the spatial analysis tool in ArcMap software with a resolution of 250×250 m2. Table 1 describes the predictor variables for the evaluation of suitable areas for premium teas.

thumbnail
Table 1. Predictor variables in the evaluation of suitable areas for premium teas.

https://doi.org/10.1371/journal.pone.0282105.t001

Moreover, green cropping systems are also significant factors affecting premium tea habitat distribution [12]. The green planting system data in this paper is a comprehensive scoring of the influence of tea history and culture of Yunnan ethnic groups on tea production methods [37], tea picking methods [38], whether the tea area is a minority gathering area, whether pesticides are applied to tea trees and whether fertilizers are applied to the tea area [39] and the resolution is 250×250 m2.

Variable data in this study were processed using normalization, missing value processing, and principal component dimensionality reduction. The minimal data set of this study is provide in S1 Table.

The normalization process has an effect of improving the speed of convergence of the loss function, preventing the gradient explosion and improving the computational accuracy [40]. It is common practice for the input of the model to contain a variety of data of different dimensions when predicting the suitable areas for teas. In order to eliminate the influence of different dimensions on the prediction results and improve the model’s accuracy and efficiency, it is necessary to normalize the data of different dimensions. In this paper, the min-max method was used to normalize the data to 0 to 1.

Missing values can be handled in a variety of ways, including deletion, interpolation, and no processing [41]. There are 6,143,563 raster points in the prediction set data, and there are inevitably missing values when the values are extracted. Due to the fact that the raster points will ultimately be converted into raster surfaces, neither the deletion nor the no processing methods are reasonable. Therefore the interpolation method is used to complete the missing values by forward complement (’pad’) and backward complement (’bfill’) in Python [42].

Principal component analysis (PCA) is a multivariate statistical technique that reduces multiple variables to a few composite variables through the idea of dimensionality reduction. During PCA, as many characteristics of the original sample as possible are retained [26].

Descriptive statistics on tea prices and quality

This study examined the price distribution of Yunnan tea over the study period and distinguished premium and inferior tea based on tea prices. The price of sample teas were obtained by web crawler from Taobao [43] (the largest e-commerce platform in China, and the variety and quantity of teas it sells are wide and representative) (https://www.taobao.com/) and the number and percentage of tea types in different price ranges were also calculated. In Table 2, the tea price range is between 0.8¥ and 99999¥, with a mean value of 8275.72¥ and a standard deviation of 21422.75¥. The full sample points were randomly divided into modeling and validation sets in a ratio of 8:2. In the modeling project, the statistical characteristics of the validation set and the modeling set are similar. This indicates that the validation set and the modeling set are representative of the study area for the modeling study. A validation set also isn’t involved in any modeling process, only participating in experiments as validation data. Table 3 shows the number and proportion of tea types in different tea price ranges. The tea samples are divided into 9 classes, with Class 1 representing inferior teas and Class 9 representing premium teas. The higher the price of the tea, the better the quality and vice versa.

thumbnail
Table 2. Statistics on descriptive characteristics of tea prices in the study area.

https://doi.org/10.1371/journal.pone.0282105.t002

thumbnail
Table 3. Number and proportion of price categories at sample points.

https://doi.org/10.1371/journal.pone.0282105.t003

Model interpretation

SVM.

SVM is a binary classifier whose main objective is to find the optimal classification hyperplane by using a hyperplane as a decision surface to maximize the edge between positive and negative examples [44]. In addition to reducing prediction errors, SVM also reduces the risk of overfitting. SVM can be used to solve nonlinear classification problems by mapping the data to a high-dimensional space and defining a partitioned hyperplane [45].

kNN.

kNN is a nonparametric classification method that is widely used in a wide variety of practical classification tasks. The main theory of kNN is to find the nearest k neighbors of an unknown sample to determine its category. The implementation steps of the kNN algorithm consist of three key elements: the k-value, the distance measurement method, and the classification decision rule. Based on a grid search, the best k-value is selected using Euclidean distance as a measure of distance and the majority voting method as a classification decision method in this study [46].

BPNN.

BPNN is one of the most widely used neural network algorithms [47], which is a supervised learning algorithm using three network layers for feedforward optimization. Its advantages include its simple architecture, ease of model construction, and fast computation [48].

CNN.

CNN is a multilayer feedforward neural network, designed for processing large-scale image or remote sensing data in the form of multiple arrays by considering local and global smooth properties. The structure of the convolutional neural network is shown in Fig 3. As compared to other deep learning structures, convolutional neural networks produce better results in areas such as image recognition and classification. CNN has local connectivity as well as weight sharing; parameter sharing can effectively minimize the problem of overfitting, and sparse connectivity allows the network to learn local features [49]. CNNs are usually composed of one or more convolutional layers, pooling layers, and fully connected layers [40].

ResNet block.

Shallow conventional neural networks with more desirable output results may instead lead to network degradation when additional layers are added [50]. Therefore, it is difficult to fit a potential identity mapping function HX = X directly to a traditional neural network. In contrast, residual networks use residuals to convert fitted objectives into an estimation of the residuals to zero. Instead, they learn a complete output, for instance the difference between HX and X in order to achieve the identity mapping training target [51].

FA+ResNet.

An attention mechanism [52] is developed by simulating the attention features of the human brain, and has been initially applied to image processing [53]. The attention mechanism in deep learning assigns weights based on different features, giving greater weights to key features and smaller weights to other features, which can enhance the efficiency of processing information. Fig 4 illustrates the structure of the attention mechanism unit.

The processes of attention state transition [54] are shown in Eqs (1) to (4): (1) (2) (3) (4) where ati is the attention weight value. y1, y2, y3, …, yt is the input sequence. h1, h2, h3, …, ht is the hidden layer state value corresponding to the input sequence y1, y2, y3, …, yt. ht is the hidden layer state value corresponding to the input yt. is the final feature vector. VWUb are the learning parameters of the model, which continuously update with the model training process.

The model structure of FA+ResNet is shown in Fig 5. This is a combined model, based on residual networks, for optimizing residual network structure by adding an attention mechanism, namely feature attention (FA), to feature extraction [28]. The FA+ResNet model is based on residual network, improved residual block structure and introduced attention mechanism in feature extraction, known as the Feature Attention (FA). As can be seen in Fig 6, the FA+ResNet model mainly includes sample input layer, PCA layer, FA layer, 1D convolution layer, residual block layer, dropout layer, flat layer, fully connected layer, and output layer. The modeling processes are as followed.

  • Normalize the input sample data and then perform PCA dimensionality reduction to make the data structure simpler and make the model easier to identify and process.
  • Incorporate attention mechanism in feature extraction, which improves the efficiency of information processing.
  • 1D convolutions are performed, and two residual blocks are then connected to form the whole residual connected part.
  • To prevent overfitting, the dropout layer is added after the residual block, and the result is "flattened" by the flat layer and finally classified by the Softmax layer.

Model assessment

Confusion matrix (CM).

The CM is also known as a likelihood matrix or error matrix, which is a visualization tool, especially for supervised learning. The main purpose of CM in image accuracy evaluation is to compare the classification results with the actual measured values and display the classification accuracy within itself [55, 56].

Accuracy and Macro-F1 score.

The accuracy rate, precision rate, recall rate and F1-score are the metrics used to evaluate the classification models and directly measure the performance of the models. The greater the accuracy of the model, the better it performs, as the accuracy represents the prediction accuracy of the whole sample [56]. The calculation formula is as follows.

(5)

True Positive (TP): successfully predicts positive samples as positive. True Negative (TN): successfully predicts negative samples as negative. False Positive (FP): incorrectly predicts negative samples as positive. False Negative (FN): incorrectly predicts positive samples as negative.

Precision is proposed in the context of sample imbalance and is defined as the probability that the actual result is a positive sample among all the samples that are predicted to be positive, calculated as Eq (6).

(6)

Recall rate is the probability that a sample with a positive result will also be predicted to be positive, and is calculated as in Eq (7).

(7)

The F1-score hopes to find a balance point between precision rate and recall, so that the precision rate and recall results can be optimal, and the formula is (8).

(8)

Macro-averaged F1 (Macro-F1 score) is the accuracy of each behavior type considered together in the case of multiple classifications. Thus it is utilized to evaluate the overall classification effectiveness of the model and is calculated as Eq (9) [57].

(9)

Results

Model parameter setting

Across all six models, SVM and KNN use grid search to set parameters. The grid search method increases the accuracy and applicability of parameters in algorithms [58]. Using Python 3.8, the predictor variables affecting the distribution of suitable areas for high quality tea were input into the model and fitted to obtain the optimal distribution model. 80% of the accumulated data is used for training and 20% for testing. Below are detailed descriptions of the parameter settings for each model:

  1. SVM: kernel function is "RBF", penalty factor C is 100, and error threshold gamma is 0.2.
  2. kNN: the optimal parameters are: k = 3, weight = "distance", and algorithm = "auto".
  3. BPNN: built in Python with a 3-layer implicit layer, an implicit layer activation function of "ReLU", an output layer activation function of "Softmax", a dropout of 0.1, a learning rate of 0.001, and an optimizer of "Adam".
  4. CNN: the kernel size is 3, the step size is 1, the padding is "same", and the activation is "relu" in the 1D convolution layer. The pool size of the pool layer is 2, the padding is "same"; the activation function of the hidden layer is relu, and the optimizer is "Adam". The dropout rate is 0.1, and the learning rate is 0.001.
  5. ResNet: the convolution and output layer parameters are set the same as in our CNN model. The optimizer is "Adam". The dropout rate is 0.1 and the learning rate is 0.0001.
  6. FA+ResNet: "rule" and "Softmax" are the activation functions of the 1D convolutional and hidden layers, respectively. The optimizer is "Adam", the dropout rate is 0.1 and the learning rate is 0.0001.

Results of model accuracy validation

Results of CM.

The CMs [59] of the six models is shown in Figs 611, respectively. The performance of the six models in predicting the suitable areas of first class tea in descending order is: ResNet<kNN<BPNN<SVM<CNN≤FA+ResNet. The performance of the six models in predicting the suitable areas of second class tea in descending order is: KNN≤BPNN≤CNN<SVM<ResNet<FA+ResNet. All six models performed well in predicting the suitable areas of second class tea, with no misclassifications. The performance of the six models in predicting the suitable areas of fourth class tea in descending order is: KNN<SVM≤BPNN≤CNN≤ResNet<FA+ResNet. The performance of the six models in predicting the suitable areas of fifth class tea in descending order is: KNN≤SVM≤BPNN≤CNN≤FA+ResNet<ResNet. The performance of the six models in predicting the suitable areas of sixth class tea in descending order is: KNN≤SVM≤BPNN≤CNN≤FA+ResNet<ResNet. The performance of the six models in predicting the suitable areas of seventh class tea in descending order is: KNN<BPNN<SVM≤CNN≤ResNet≤FA+ResNet. The performance of the six models in predicting the suitable areas of eighth class tea in descending order is: KNN<SVM≤FA+ResNet<BPNN≤ResNet<CNN. The performance of the six models in predicting the suitable areas of ninth class tea in descending order is: KNN<BPNN≤SVM≤CNN≤ResNet≤FA+ResNet. From the overall results, FA+ResNet had the highest comprehensive prediction ability.

Results of accuracy and Macro-F1 score.

Accuracy and Macro-F1 score are key indicators for judging the performance of multi-classification models [28]. Figs 12 and 13 describe the results of the six models obtained after ten trials for these two indicators. Both indicators show that CNN, ResNet, and FA+ResNet are more precise than BPNN, KNN, and SVM. Nevertheless, CNN, ResNet, and FA+ResNet perform differently at different stages and can’t be directly compared. Thus, the average, maximum, and minimum values of the two indicators were calculated as shown in Table 4 and Figs 14 and 15. The FA+ResNet model has both the highest accuracy and macro-F1 scores, and the overall performance of the ResNet model was slightly inferior to that of FA+ResNet. Additionally, CNN has the second best average, maximum, and minimum accuracy and macro-F1 scores after ResNet. The BPNN performs less well than the SVM, but its maximum accuracy and macro-F1scores are slightly higher than the SVM’s. The kNN model performs the worst, with the lowest average accuracy score of 81.28% and the lowest average Macro-F1 score of 80.59%.

thumbnail
Fig 12. Accuracy comparisons of ten experiments with SVM, KNN, BPNN, CNN, ResNet, FA+ResNet models.

https://doi.org/10.1371/journal.pone.0282105.g012

thumbnail
Fig 13. Macro-F1 comparisons of ten experiments with SVM, KNN, BPNN, CNN, ResNet, FA+ResNet models.

https://doi.org/10.1371/journal.pone.0282105.g013

thumbnail
Fig 14. Visualization of accuracy of SVM, kNN, BPNN, CNN, ResNet, FA+ResNet models in predicting suitable areas for premium teas.

https://doi.org/10.1371/journal.pone.0282105.g014

thumbnail
Fig 15. Visualization of Macro-F1 score of SVM, kNN, BPNN, CNN, ResNet, and FA+ResNet models in predicting suitable areas for premium teas.

https://doi.org/10.1371/journal.pone.0282105.g015

thumbnail
Table 4. Accuracy and Macro-F1 score of SVM, kNN, BPNN, CNN, ResNet, FA+ResNet models in predicting suitable areas for premium teas.

https://doi.org/10.1371/journal.pone.0282105.t004

Prediction of suitable areas of premium teas

Fig 16 illustrates the prediction results of six selected models, and the FA+ResNet model performed best based on the results of Macro-F1 score and Accuracy.

thumbnail
Fig 16.

The classified predicted suitable areas of premium teas using remote sensing variables in six models: (a) SVM; (b) kNN; (c) BPNN; (d) CNN); (e) ResNet; (f) FA+ ResNet. The suitability of premium teas increase gradually from Class 1 to Class 9. Class 9 is the most suitable areas for premium teas growth, while Class 1 is the least suitable area for premium teas growth. Shape file source: republished from http://www.gscloud.cn under a CC BY license, with permission from Geospatial Data Cloud, original copyright [2022]; Own Map output: using ArcGIS 10.8 Software analysis.

https://doi.org/10.1371/journal.pone.0282105.g016

In Fig 17, the FA+ResNet model’s predictions for premium tea suitable areas are illustrated in greater detail and the overall probability of the distribution of suitable areas for premium teas was estimated. Firstly, the predicted suitable areas for premium teas (Class 8 and Class 9 in Fig 17) are located in the southern catchment of LancangJiang River, south-central part of Dehong, a few areas in the mid-west of Lincang, central scattered areas of Pu’er, most of the southern western part of Xishuangbanna and the southern edge of Honghe, where abundant rainfall, high humidity and fog prevail. Secondly, the northwestern and northeastern areas of Yunnan Province other than the Lancangjiang River are predicted to be low suitable areas for premium teas (Class 1, Class 2, Class 3 and Class 4 in Fig 17), due to their high altitude (mostly above 2300 m), insufficient precipitation and low temperature conditions for premium teas’ growth. Thirdly, the medium suitable areas of premium teas are mainly distributed in southwestern and central Yunnan (Class 5, Class 6 and Class 7 in Fig 17). The quality of tea in these areas decreases due to its inadequate precipitation, temperature, cultivation system, soil type, and topography.

thumbnail
Fig 17. Predicted suitable areas of premium teas by FA+ResNet model.

The suitability of premium teas increase gradually from Class 1 to Class 9. Class 9 is the most suitable areas for premium teas growth, while Class 1 is the least suitable area for premium teas growth. Shape file source: republished from http://www.gscloud.cn under a CC BY license, with permission from Geospatial Data Cloud, original copyright [2022]; Own Map output: using ArcGIS 10.8 Software analysis.

https://doi.org/10.1371/journal.pone.0282105.g017

Variable importance

By examining the importance of variables, this paper reveals the relationship between variables and suitable areas for premium tea. In our study, the FA+ResNet algorithm is used to model the remaining variables after reducing one variable at a time. Under the same conditions, the importance of one variable is calculated as the difference between its accuracy and the accuracy of a successful model. Fig 18 shows that annual mean temperature, annual mean precipitation, mist belt, soil type, and elevation are the most prominent variables. Among them, annual mean temperature and annual mean precipitation are significant factors in the corresponding predicted area. The effect of each variable on the growth of tea trees is analyzed as follows.

thumbnail
Fig 18. Comparison of the importance of variables in the FA+ResNet model.

https://doi.org/10.1371/journal.pone.0282105.g018

In terms of annual mean temperature, annual mean precipitation and annual mean relative humidity, temperature and precipitation are significant environmental variables for tea cultivation [810]. Temperatures of 17–22°C and precipitation of 1,100mm-1,500mm are ideal for the growth of Pu’er tea trees in Yunnan [11]. The annual mean humidity should be at least 70% for premium tea growing [1].

In terms of mist belt, one ancient Chinese saying "high mountain clouds produce premium tea" reflects the influence of mist on tea [37, 60].

In terms of soil type, different soil types have different soil pH and nutrients. In addition, organic matter, nitrogen, phosphorus, potassium and microelements are the main nutrients required for the growth of tea trees [1, 61].

In terms of elevation, the most suitable altitude for Pu’er tea trees is 1400 to 1800 m, as well as flat and gently sloping lands with slopes of 15° or less, according to Yang et al. [11].

In terms of green cropping system, most of the quality tea projection areas in this paper overlap with areas inhabited by ethnic minorities, in accordance with a study by Cao et al., which suggested that multiethnic groups such as the Brown, Yi, Dai, Hani, De’ang, and Kino have inherited intangible cultural heritage, as well as traditional production techniques and minority tea customs, through their long history of tea cultivation and production [62]. Further, the unique cultivation methods of multiethnic groups are one of the major factors contributing to the high price of premium tea. Multiethnic groups do not apply artificial fertilizer during cultivation, but rather employ a manual hand picking method that does not degrade the quality of the premium tea leaves. In addition, the ecological environment in the cultivation areas is suitable and there is little human interference with the growth of tea leaves. This ensures the distinctive flavor of the premium tea to a large degree.

According to the FA+ResNet model, slope, soil PH, P, N, K, and aspect were all statistically negative variables. It is imperative that all of them be included in the evaluation, since they are indispensable for spatially predicting suitable areas for tea [1, 11, 13].

Discussion

In this study, it was demonstrated that deep learning methods are capable of predicting suitable areas for premium teas. Within the six models, FA+ResNet performed the best in terms of accuracy during data validation. With the use of cutting-edge artificial intelligence, FA+ResNet framework can assess crop suitability rapidly, accurately, and cost-effectively [58], which were previously difficult for quantification.

Deep learning plus attention mechanism has the advantage of obtaining more information when extracting features and preventing bias in decision making. Further, this study was more concerned with the predictive performance and accuracy of the model than the interpretability, thus FA+ResNet is the preferred approach in this context. This study has some limitations.

Firstly, this study measures the quality of tea classified by tea prices due to the fact that the market price tends to reflect the quality of tea to a large extent in most cases. However, the quality and price of teas are not completely correlated, since poor quality tea can be sold for high prices and high quality tea can be sold for high prices. Future research could choose more diverse variables to quantify tea quality.

Secondly, this study used a spatial resolution raster dataset of 250m×250m, and future studies should focus on a more refined high-resolution (i.e., sub-meter level) to detect and monitor distribution areas. Particularly, drones are suggested for monitoring the distribution area and growth of premium teas on a regular basis in order to adjust management methods as necessary.

Thirdly, in spite of the fact that premium teas in Yunnan were examined as a case study, the proposed approach can be replicated and applied to predict suitable areas for other cash crops with similar agroecological conditions.

Conclusion

In this study, the geographical distribution pattern of different suitable areas for premium teas in Yunnan was evaluated from a combination of theoretical and practical perspectives.

Firstly, this study compared the performance of six models in multi-classification prediction of suitable areas for premium teas in Yunnan: SVM, kNN, BPNN, CNN, ResNet and FA+ResNet, based on multi-source remote sensing data. Consequently, the FA+ResNet model with the attention mechanism based on the residual network demonstrated the best performance, with the highest accuracy and macro F1-score.

Secondly, the suitable areas of premium teas in Yunnan were mainly located in the southern catchment of LancangJiang River, south-central part of Dehong, a few areas in the mid-west of Lincang, central scattered areas of Pu’er, most of the southern western part of Xishuangbanna and the southern edge of Honghe, where abundant rainfall, high humidity and fog prevail by the FA+ResNet model.

Thirdly, annual mean temperature, annual mean precipitation, mist belt, annual mean relative humidity, soil type and elevation were the significant factors in evaluating suitable areas for premium tea distribution, which corresponds to the conventional knowledge that "premium tea comes from high mountains accompanied by clouds" by local tea farmers.

As a general rule, in order to help farmers from minority groups who live in areas where premium teas are distributed escape poverty and increase their income, initiatives that enhance the management of premium tea areas should be implemented. Furthermore, researchers, policymakers, extension agents, and various stakeholders can benefit from our findings by adopting effective management programs for premium tea suitable areas.

Supporting information

Acknowledgments

The authors express their gratitude to Mr. Kehao Chen, a master’s student at Chongqing University, for his supports in the process of proofreading the language of the article.

References

  1. 1. Shamsheer UH, Ismet B. Developing a Set of Indicators to Measure Sustainability of Tea Cultivating Farms in Rize Province, Turkey. Ecological Indicators. 2018;95 P1: 219–232.
  2. 2. Wu X. Analysis of the world tea production and trade pattern and its inspiration. Tea in Fujian. 2019;41: 41–42.
  3. 3. Dai Y. The overlap of suitable tea plant habitat with Asian elephant (Elephus maximus) distribution in southwestern China and its potential impact on species conservation and local economy. Environmental Science and Pollution Research. 2022;29: 5960–5970. pmid:34432214
  4. 4. Zhao M, Li S, Niu Y, Yang B, Liu R, Zhao P. Development Status and Analysis of Tea Industry in Yunnan. Chinese Journal of Tropical Agriculture. 2019;39: 120–124.
  5. 5. Chen H. Strategies to improve the performance of Yunnan tea industry in poverty alleviation with precision. Tea in Fujian. 2016;38: 260–261.
  6. 6. Duan Z. Expensive is respectful of resources. Pu-erh. 2018; 13.
  7. 7. Guangdong Tea Industry. China Tea Circulation Association launched a demonstration project to help poor tea farmers in mountainous minorities. Guangdong Tea Industry. 2015; 54.
  8. 8. Fekadu A, Soromessa T, Dullo BW. GIS-based assessment of climate change impacts on forest habitable Aframomum corrorima (Braun) in Southwest Ethiopia coffee forest. Journal of Mountain Science. 2020;17: 2432–2446.
  9. 9. Yin Y, Wang J, Leng G, Zhao J, Wang L, Ma W. Future potential distribution and expansion trends of highland barley under climate change in the Qinghai-Tibet plateau (QTP). Ecological Indicators. 2022;136: 108702.
  10. 10. Huang Z, Xie L, Wang H, Zhong J, Zheng X. Geographic distribution and impacts of climate change on the suitable habitats of Zingiber species in China. Industrial Crops and Products. 2019;138: 111429.
  11. 11. Yang Y, He C, Li X. Evaluation of the suitability of large scale plantations of cultivated Puer tea in Yunnan Province based on GIS. Journal of Beijing Forestry University. 2010;32: 33–40.
  12. 12. Yu X, Zhou X. Analysis of soil suitability of famous tea producing areas in Taihu County and the main measures to improve soil suitability. Anhui Agricultural Science Bulletin. 2016;22: 89+96.
  13. 13. Zhu B. Evaluation of ecological suitability of famous tea plantations in Guangde County. Anhui Agricultural Science Bulletin. 2016;22: 53–57.
  14. 14. Ye J, Zhang Q, Lin S, Luo S, Jia X, He H. Correlation of growth and fresh leaf quality of Dahongpao tea tree with soil characteristics. Journal of Forest and Environment. 2019;39: 488–496.
  15. 15. Luitel DR, Siwakoti M, MD Joshi, Rangaswami M, Jha PK. Potential suitable habitat of Eleusine coracana (L) Gaertn (Finger millet) under the climate change scenarios in Nepal. 2020 [cited 4 Jul 2022].
  16. 16. Mudereri BT, Chitata T, Chemura A, Makaure J, Mukanga C, Abdel-Rahman EM. Is the protected area coverage still relevant in protecting the Southern Ground-hornbill (Bucorvus leadbeateri) biological niche in Zimbabwe? Perspectives from ecological predictions. GIScience & Remote Sensing. 2021 [cited 4 Jul 2022].
  17. 17. Vestbo S, Obst M, Fernandez FJQ, Intanai I, Funch P. Present and Potential Future Distributions of Asian Horseshoe Crabs Determine Areas for Conservation. Frontiers in Marine Science. 2018;5: 164.
  18. 18. Lindbladh M, Felton A, Trubins R, Sallns O. A landscape and policy perspective on forest conversion: Long-tailed tit (Aegithalos caudatus) and the allocation of deciduous forests in southern Sweden. European Journal of Forest Research. 2011;130: 861–869.
  19. 19. Muoz-Mas R, Lopez-Nicolas A, Martínez-Capel F., Pulido-Velazquez M. Shifts in the suitable habitat available for brown trout (Salmo trutta L.) under short-term climate change scenarios. Science of The Total Environment. 2016;544: 686–700. pmid:26674698
  20. 20. Su H, Bista M, Li M. Mapping habitat suitability for Asiatic black bear and red panda in Makalu Barun National Park of Nepal from Maxent and GARP models. Sci Rep. 2021;11: 14135. pmid:34238986
  21. 21. Zannou OM, Da Re D, Ouedraogo AS, Biguezoton AS, Abatih E, Yao KP, et al. Modelling habitat suitability of the invasive tick Rhipicephalus microplus in West Africa. Transboundary and Emerging Diseases. 2022;n/a. pmid:34985810
  22. 22. Yudaputra Angga· Fijridiyanto Izu Andry · Yuzammi · Witono Joko Ridho ·, Astuti Inggit Puji · Robiansyah Iyan, et al ·. Habitat preferences, spatial distribution and current population status of endangered giant flower Amorphophallus titanum. Biodiversity and Conservation. 2022;31: 831–854.
  23. 23. Mudereri BT, Abdel-Rahman EM, Dube T, Landmann T, Niassy S. Multi-source spatial data-based invasion risk modeling of Striga (Striga asiatica) in Zimbabwe. GIScience & Remote Sensing. 2020; 1–20.
  24. 24. Silva GAD, Frederico RG, Almeida SM, Salvador GN, Malacco GB, Melo CD. Conservation of the Black-collared Swallow, Pygochelidon melanoleuca (Wied, 1820) (Aves: Hirundinidae) in Brazil: potential negative impacts of hydropower plants. 2022 [cited 4 Jul 2022]. http://www.scielo.br/j/bn/a/HksPjZvzMGgJcgH6Dyx3NPM/#figures
  25. 25. Mollah TH, Shishir S, Momotaz , Rashid MS. Climate change impact on the distribution of Tossa jute using maximum entropy and educational global climate modelling. The Journal of Agricultural Science. 2021;159: 500–510.
  26. 26. Yue J, Li Z, Zuo Z, Wang Y. Evaluation of Ecological Suitability and Quality Suitability of Panax notoginseng Under Multi-Regionalization Modeling Theory. Frontiers in Plant Science. 2022;13. pmid:35574115
  27. 27. Hu Z, Hu J, Hu H, Zhou Y. Predictive habitat suitability modeling of deep-sea framework-forming scleractinian corals in the Gulf of Mexico. Science of The Total Environment. 2020;742: 140562. pmid:32721728
  28. 28. Zhao J, Zhao Z, Yang S. Rolling bearing fault diagnosis based on residual connection and 1D-CNN. Journal of Vibration and Shock. 2021;40: 1–6.
  29. 29. Jiang Z, He T, Shi Y, Long X, Yang S. Remote sensing image classification based on convolutional block attention module and deep residual network. Laser Journal. 2022;43: 76–81.
  30. 30. Location and area_Impression of Yunnan_Yunnan Provincial People’s Government Portal. [cited 22 Jul 2022]. http://www.yn.gov.cn/yngk/gk/201904/t20190403_96247.html
  31. 31. Natural Overview_Impression of Yunnan_Yunnan Provincial People’s Government Portal. [cited 22 Jul 2022]. http://www.yn.gov.cn/yngk/gk/201904/t20190403_96255.html
  32. 32. Yunnan Water Resources Department. [cited 22 Jul 2022]. https://wcb.yn.gov.cn/html/shuiziyuangongbao/
  33. 33. Climate Characteristics_Impression of Yunnan_Yunnan Provincial People’s Government Portal. [cited 22 Jul 2022]. http://www.yn.gov.cn/yngk/gk/201904/t20190403_96257.html
  34. 34. Statistical Yearbook—Yunnan Bureau of Statistics—. [cited 22 Jul 2022]. http://stats.yn.gov.cn/tjsj/tjnj/202112/t20211223_1071686.html
  35. 35. Department of Rural Agriculture of Yunnan. [cited 15 Nov 2022]. https://nync.yn.gov.cn/
  36. 36. Wang Y, Huang X, Zhao P, Ye J, Zhou R. Analysis of Relationship Between Latitude and Temperature in Yunnan Province Using Temperature Fields. Remote Sensing Information. 2016;31: 44–50.
  37. 37. Cao M. Study on the Historical and Cultural Heritage of Tea of Ethnic Minorities in Yunnan. Agricultural Archaeology. 2022; 203–210.
  38. 38. Wang S. Tea picking standards and techniques. Guangdong Sericulture. 2020;54: 61–62.
  39. 39. Jiang la, Pan N, Liu Y, Pan W, Tian Y. Environmental evaluation and revelation of Chinese herbal medicine base in Meitan County. Agricultural Development & Equipments. 2015; 32–34.
  40. 40. Ren J, Wei H, Zhou Z, Hou T, Yuan Y, Shen jiquan, et al. Ultra-short-term power load forecasting based on CNN-BiLSTM-Attention. Power System Protection and Control. 2022;50: 108–116.
  41. 41. Zhen G, Yao X, Chen G, Xu Z. Research on Crawling and Cleaning of Ecological Data in Changbai Mountain. Journal of Changchun Institute of Technology(Natural Sciences Edition). 2021;22: 82–86+124.
  42. 42. Xiong Z, Guo H, Wu Y. Review of Missing Data Processing Methods. Computer Engineering and Applications. 2021;57: 27–38.
  43. 43. Taobao Internet. [cited 2 Aug 2022]. https://www.taobao.com/
  44. 44. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intelligent Systems and their Applications. 1998;13: 18–28.
  45. 45. Burges C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery. 1998;2: 121–167.
  46. 46. Weinberger KQ. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Jmlr. 2009;10.
  47. 47. Lu H, Zhao M. Prediction of soil pH in Anhui province based on RPROP and GRPROP algorithms. Jiangsu Journal of Agricultural Sciences. 2019;35: 1119–1123.
  48. 48. Yang Y, Zhu J, Zhao C, Liu S, Tong X. The spatial continuity study of NDVI based on Kriging and BPNN algorithm. Mathematical & Computer Modelling. 2011;54: 1138–1144.
  49. 49. An J, Ai P, Xu S, Liu C, Xia J, Liu D. An intelligent fault diagnosis method for rotating machinery based on one dimensional convolution neural network. Journal of Nanjing University(Natural Science). 2019;55: 133–142.
  50. 50. Su Y, Zhang G, Wang B, Xu X. Field Weed Identification Method Based on Deep Connection Attention Mechanism. Computer Engineering and Applications. 2022;58: 271–277.
  51. 51. Pei X, Zhang Y. Flower image Classification Algorithm Based on Improved Residual Network. Chinese Journal of Electron Devices. 2020;43: 698–704.
  52. 52. Jin Y, Yao M, Liu X, Huang F. Rolling Bearing Fault Diagnosis Model Combining with Residual Network and Attention Mechanism. Mechanical Science and Technology for Aerospace Engineering. 2020;39: 919–925.
  53. 53. Mnih V, Heess N, Graves A, Kavukcuoglu K. Recurrent Models of Visual Attention. Advances in Neural Information Processing Systems. 2014;3.
  54. 54. Xie Q, Dong L, Ku X. Short-term electricity price forecasting based on Attention-GRU. Power System Protection and Control. 2020;48: 154–160.
  55. 55. Foody GM. Status of land cover classification accuracy assessment. Remote Sensing of Environment. 2002;80: 185–201.
  56. 56. Li N, Zhang W, Guo Z, Yuan T, Han X. Fault Prediction of Intelligent Electricity Meter Based on Multi-Classification Machine Learning Model. Electrical & Energy Management Technology. 2022; 6–11.
  57. 57. Ren J, Zhang Y, Zhang B, Li S. Classification Algorithm of Industrial Internet Intrusion Detection Based on Feature Selection. Journal of Computer Research and Development. 2022; 1–12.
  58. 58. Liu ZY-C, Chamberlin AJ, Tallam K, Jones IJ, Lamore LL, Bauer J, et al. Deep Learning Segmentation of Satellite Imagery Identifies Aquatic Vegetation Associated with Snail Intermediate Hosts of Schistosomiasis in Senegal, Africa. Remote Sensing. 2022;14: 1345.
  59. 59. Pan X, Yue J. Early Fault Diagnosis of Rolling Bearings Based on Parametric Optimized MCKD. Noise and Vibration Control. 2021;41: 109–113+218.
  60. 60. Chi Y. Alpine cloud tea cultivation techniques in Dehua County. Tea in Fujian. 2010; 42–43.
  61. 61. Zhang W, Shen S, Liu M, Feng M, Yang S. Climate division of tea planting in Hubei province. Journal of the Meteorological Sciences. 2011;31: 153–159.
  62. 62. Qi H, Gao H, Yang S, Tang Y, Chen Z, Ye Q, et al. Study on Climatic Regionalization of Tea Plant in Chongqing Based on GIS. Plateau and Mountain Meteorology Research. 2017;37: 61–65.