Machine Learning Programs Predict Saguaro Cactus Death

Results: Saguaro cacti (Carnegiea gigantea) show extensive bark coverage, and cacti with extensive bark coverage die prematurely. Over the 23-year period of study, rates of bark coverage on all surfaces were relatively constant. Decision trees predicted cactus death with accuracies of up to 96%. Three machine learning programs used similar surface coverages to make similarly accurate predictions (approximately 92%) of future bark coverage and cactus death for cacti that had overall bark coverage less than 80% on south-facing surfaces. Higher prediction accuracies were obtained for cacti with low bark percentages. Predictions of bark coverage rates and cactus death were less accurate for cacti with higher bark percentages because cacti can retain high bark percentages for many years prior to death. Cacti with more than 80% coverage on south-facing surfaces were accurately predicted (p<0.05) to be alive or dead over the 23-year period with a tracking method.


Introduction
Saguaro cacti (Carnegiea gigantea), a columnar plant species native to the Sonoran Desert, exhibit extensive bark coverage and experience premature death [1]. Twenty-three additional columnar cactus species show bark coverage in the Americas [2]. Bark coverage is caused by a build-up of epicuticular waxes produced by epidermal cells [3,4]. During wax accumulation, many stomata become blocked, preventing gas exchange [2][3][4]. Extensive bark coverage results in cactus mortality rates of 2.3% per year even though estimates of life expectancies may be several hundred years [5][6][7].
Controlled exposures to UV-B light produce a buildup of epicuticular waxes in the same manner that occurs in nature [8]. Averaged over the entire year, south-facing surfaces of saguaro cacti in Tucson, Arizona (32.2°N) receive four times more sunlight than north-facing surfaces [9]. Corresponding with these sunlight exposures, south-facing surfaces of saguaros show initial bark coverage whereas north-facing surfaces are the last surfaces to show bark coverage prior to cactus death (Figure 1) [2,3,10].
In previous studies, machine learning programs predicted increases in both bark coverages on cactus surfaces and death of saguaro cacti [1,6,11]. The most accurate predictions of cactus death involved bark coverage on north-facing right troughs (NR). For example, a WEKA C4.5 decision tree predicted cactus death with 88.7% accuracy using only trough data [6]. Two independent machine learning methods using only NR data predicted increased rates of bark coverage (RBC) and mortality (84% accuracy) [1]. Most recently, results of logistic curve analyses (with all r² values above 0.95) demonstrated the sequence of bark coverage on twelve surfaces around cacti. NR were the last of the twelve surfaces to have bark coverage, after an estimated 13-year delay [11].
The current study uses data from four consecutive evaluations, analyzed with three machine learning programs, to determine the following: (1) if saguaro cacti have relatively constant RBC; (2) if bark coverage on several cactus surfaces can predict bark coverage on NR; and (3) if bark coverage can predict cactus death.
The following were hypothesized: that bark coverage increases at constant rates, bark coverage of NR can be predicted with bark coverage on other surfaces, mortality rates can be predicted with bark coverage, and mortality of Class IV cacti can be predicted with bark coverage.

Field and survey conditions
Over a 23-year period, 1,149 saguaro cacti (Carnegiea gigantea) were studied. All cacti were within 50 field plots in Tucson Mountain Park near Tucson, AZ (32.2°N, 111.14°W) [5]. Cacti were first selected in 1993-1994. All selected cacti were taller than 4 m and assumed to be more than 80 years old [7,12]. Physical characteristics, nearby vegetation, topographical features, and GPS coordinates were used to distinguish each cactus plant. All cacti were re-evaluated in 2002, 2010, and 2017 to provide a total of four sampling periods [5].

Evaluation of cacti in field plots
Each saguaro cactus has 20 to 23 rib crests or convex protrusions [13]. Troughs are concave surfaces on the sides of each crest [14]. Usually, crests have higher bark coverage percentages than troughs since crests are more exposed to sunlight than troughs. For each evaluation, the crest closest to south, east, north, and west azimuths was evaluated [3,4,15,16]. Troughs to the right and left of each crest were evaluated so that twelve surfaces were assessed for each cactus. An 8 cm vertical span, at 1.75 m from ground level, was evaluated for percent bark coverage for each surface [3,4,15,16]. Bark coverage percentages were determined visually. Previously, similar estimates of bark coverage were obtained with visual and digital methods [6]. Percentages of bark coverage on all surfaces were entered into Microsoft Excel. The final data file, titled Master File, had 55,152 data points for the twelve surfaces of all 1,149 cacti over the four data collection periods (1993-1994, 2002, 2010, and 2017).
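The size of Master File follows directly from the sampling design. The sketch below is a minimal illustration, assuming hypothetical two-letter surface labels (azimuth plus C for crest, L and R for the left and right troughs); only NC, NL, NR, and WL appear in the text, and the remaining labels are formed by analogy.

```python
# Twelve surfaces per cactus: one crest and two troughs at each azimuth.
# Labels other than NC, NL, NR, and WL are assumed by analogy.
DIRECTIONS = ["S", "E", "N", "W"]
POSITIONS = ["C", "L", "R"]  # crest, left trough, right trough
SURFACES = [d + p for d in DIRECTIONS for p in POSITIONS]

N_CACTI = 1149
N_SURVEYS = 4  # 1993-1994, 2002, 2010, and 2017

# One bark coverage percentage per cactus, surface, and survey.
n_data_points = N_CACTI * len(SURFACES) * N_SURVEYS
print(len(SURFACES), n_data_points)  # 12 55152
```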

Classes of cactus plants
Bark coverage varies on cactus surfaces. Each cactus was placed in a class depending upon the percentage of bark coverage on south-facing crests only [5]: Class I, less than 20% bark coverage; Class II, between 21 and 49% bark coverage; Class III, between 50 and 80% bark coverage; Class IV, more than 80% bark coverage; Class V, cactus was dead.
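The class assignment can be written as a short function. This is a minimal sketch, not the authors' procedure: the function name is hypothetical, and because the stated ranges leave a value of exactly 20% unassigned, the sketch assumes that such a value falls in Class II.

```python
def bark_class(south_crest_pct, alive=True):
    """Return the class (I to V) from bark coverage on the south-facing crest.

    Assumption: a value of exactly 20% is placed in Class II, since the
    published ranges ("less than 20%", "between 21 and 49%") do not cover it.
    """
    if not alive:
        return "V"
    if south_crest_pct < 20:
        return "I"
    if south_crest_pct < 50:
        return "II"
    if south_crest_pct <= 80:
        return "III"
    return "IV"


print([bark_class(p) for p in (10, 35, 65, 90)])  # ['I', 'II', 'III', 'IV']
```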

Estimating constant rates of bark coverage (RBC)
To determine if bark coverage occurs at constant rates, the collected data were divided into three time periods (1994-2002, 2002-2010, and 2010-2017). Rates of bark coverage (RBC) were determined to be constant when the rate of bark coverage was similar between two successive periods. Thus, the data were placed into six class changes with three time periods each. For each cactus, bark coverage percentages were converted to arcsine values to linearize the data for analysis [17]. The rate of change for each cactus was computed from X and Y, where X is the arcsine bark percentage at the end of the period and Y is the arcsine bark percentage at the beginning of the period. The RBC values for all samples in a class change were compared among time periods using a paired Student's t-test (p=0.05).
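The arcsine conversion can be illustrated as follows. The exact form of the transform is not stated here, so this sketch assumes the common arcsine square-root variant with output in degrees; the function name is hypothetical.

```python
import math


def arcsine_transform(pct):
    """Arcsine square-root transform of a percentage (0 to 100), in degrees.

    Assumption: the arcsine square-root variant is used; the text only
    says that percentages were "converted to arcsine values".
    """
    proportion = pct / 100.0
    return math.degrees(math.asin(math.sqrt(proportion)))


# The transform stretches values near 0% and 100% relative to mid-range values.
for pct in (0, 5, 50, 95, 100):
    print(pct, round(arcsine_transform(pct), 1))
```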

Machine learning programs
Three machine learning programs (WEKA 3.8 decision trees, Validate Model, and Random Forest) were used to understand bark coverage. The WEKA 3.8 program generated decision trees that resulted in two distinct outcomes with a corresponding accuracy [18,19]. The WEKA 3.8 program implemented the ID3 algorithm to generate variables used to make predictions and produce decision trees [20]. The decision trees were visualized using the J48 algorithm.
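Decision tree algorithms in the ID3 family choose splits that maximize information gain, the reduction in entropy of the outcome labels. The sketch below illustrates that criterion for a single threshold split; it is not WEKA code, the function names are hypothetical, and the bark percentages and outcomes are invented for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical NR bark percentages and outcomes, for illustration only.
nr_pct = [5, 10, 20, 60, 70, 90]
outcome = ["alive", "alive", "alive", "dead", "dead", "dead"]

gain_good = information_gain(nr_pct, outcome, threshold=40)  # separates classes
gain_poor = information_gain(nr_pct, outcome, threshold=7)   # isolates one cactus
print(gain_good > gain_poor)  # True
```

A tree builder would evaluate many candidate surfaces and thresholds in this way and keep the split with the largest gain.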
The second machine learning program, Validate Model, was used to analyze RBC on cactus surfaces. Validate Model used a 10-fold cross-validation technique: the database was divided into ten randomly selected subsets that were used in turn to build and test models, and the final machine learning model was then validated against the entire dataset. From Validate Model, rates of bark coverage (RBC) were classified as Slow, Normal, and Fast. Rates were considered Slow if the values were more than two standard deviations below the mean (Normal). Rates were considered Fast if the values were more than two standard deviations above the mean (Normal).
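The Slow, Normal, and Fast grouping can be sketched with a two-standard-deviation rule. This is an illustration rather than the Validate Model implementation: it assumes the sample standard deviation and cutoffs at the mean minus and plus two standard deviations, and the rates shown are hypothetical.

```python
import statistics


def classify_rates(rates):
    """Label each rate Slow, Normal, or Fast using mean +/- 2 SD cutoffs.

    Assumption: the sample standard deviation is used.
    """
    mean = statistics.mean(rates)
    sd = statistics.stdev(rates)
    low, high = mean - 2 * sd, mean + 2 * sd
    labels = []
    for rate in rates:
        if rate < low:
            labels.append("Slow")
        elif rate > high:
            labels.append("Fast")
        else:
            labels.append("Normal")
    return labels


# Hypothetical rates: twenty typical cacti plus one slow and one fast outlier.
rates = [5.0] * 20 + [0.0, 10.0]
labels = classify_rates(rates)
print(labels.count("Slow"), labels.count("Normal"), labels.count("Fast"))  # 1 20 1
```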
A third machine learning program, Random Forest, was used to confirm the results of the first two machine learning programs through the construction of decision trees [21].

Bark coverage percentages increase at constant rates
There were six group comparisons (Class I to Class II, Class I to Class III, Class I to Class IV, Class II to Class III, Class II to Class IV, Class III to Class IV) made for twelve surfaces and three time periods (Table 1). The data show that there were no consistent differences in the RBC for any surface across the three time periods. Some individual differences were present for all trough surfaces. No crests showed statistically significant differences. Therefore, we conclude that bark coverage rates were relatively constant on all surfaces.

Bark coverage rates of north-facing right troughs can be predicted with bark coverage on other surfaces
To predict bark coverage on north-facing right troughs (NR), three programs were used. First, all dead cacti were removed from Master File; the new file was titled No Dead File. Second, the data from No Dead File were placed into Validate Model so that individual cacti could be catalogued as having Slow, Normal, or Fast RBC. Three clusters were selected as predictor surfaces. Cluster 1 consisted of all surfaces except NR. Cluster 2 consisted of north-facing crests (NC) with north-facing left troughs (NL). Cluster 3 consisted of NL with west-facing left troughs (WL). After Slow, Normal, and Fast groups of cacti were determined, data were processed with the WEKA 3.8 program. Within each cluster, comparisons were made between Normal and Slow cacti as well as between Normal and Fast cacti (Figure 2).
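The data preparation described above can be sketched as follows. The record layout (a dictionary with an "alive" flag) and the surface labels other than NC, NL, NR, and WL are hypothetical; the sketch only shows how dead cacti are dropped and how the three predictor clusters relate to the NR target.

```python
# Hypothetical surface labels: azimuth plus C (crest), L or R (troughs).
SURFACES = [d + p for d in "SENW" for p in "CLR"]
TARGET = "NR"

# Predictor surfaces for each cluster, as described in the text.
CLUSTERS = {
    "Cluster 1": [s for s in SURFACES if s != TARGET],
    "Cluster 2": ["NC", "NL"],
    "Cluster 3": ["NL", "WL"],
}


def drop_dead(records):
    """Keep only living cacti (the No Dead File step)."""
    return [r for r in records if r["alive"]]


# Two invented records, for illustration only.
master = [
    {"id": 1, "alive": True, "NL": 12.0, "NC": 30.0, "WL": 8.0, "NR": 5.0},
    {"id": 2, "alive": False, "NL": 80.0, "NC": 95.0, "WL": 70.0, "NR": 75.0},
]
no_dead = drop_dead(master)
print(len(CLUSTERS["Cluster 1"]), [r["id"] for r in no_dead])  # 11 [1]
```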
Several machine learning models were compared for predicting bark coverage on NR from multiple predictor surfaces. The WEKA 3.8 and Random Forest (RF) programs predicted bark coverage on NR with 89.9 to 94.9% accuracies (Table 2). Overall, analyses with Slow cacti produced higher accuracies than analyses with Fast cacti. Accuracies of the three predictive surfaces (north-facing left troughs (NL) combined with west-facing left troughs (WL), NL combined with north-facing crests (NC), and all surfaces except the NR) were similar. The numbers of Fast and Slow cacti from WEKA 3.8 confusion matrices were within 96% of cacti from Validate Model.
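Accuracies such as those reported here are derived from confusion matrices as the share of correctly classified cacti. The sketch below shows that calculation; the counts are hypothetical and are not taken from Table 2.

```python
def accuracy(confusion):
    """Fraction of correct predictions; rows are actual, columns predicted."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts for a Normal versus Slow comparison.
#          predicted: Normal  Slow
matrix = [[85, 5],   # actual Normal
          [5, 5]]    # actual Slow

print(f"{accuracy(matrix):.1%}")  # 90.0%
```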

Mortality rates can be predicted with bark coverage data
The Master File was used to predict cactus death. For the periods (1994-2002, 2002-2010, 2010-2017), each cactus was placed in its class accordingly. Considering each class individually, WEKA 3.8 decision trees were used to determine whether a cactus was alive or dead by the end of the period.
The WEKA 3.8 program was used to predict cactus death with data of several cactus classes (Table 3; Figure 3).
Decision trees predicted death of Class II cacti with accuracies between 64.9 and 95.2% (Figure 4). Decision trees predicted death of Class III cacti with accuracies between 57.1 and 92.5%. Decision trees predicted death of Class IV cacti with relatively low accuracies between 68.9 and 77.4% (Table 4). Although Validate Model and the WEKA 3.8 program did not predict death of Class IV cacti accurately, death of Class IV cacti can be predicted based upon bark coverages from 1994 through 2017.

Discussion
Saguaro cacti can live several hundred years [7,12]. Bark coverage on saguaro cacti did not occur, or was rare, prior to the 1950s [10]. Thus, bark coverage on saguaros is a relatively recent phenomenon. The purpose of this study was to closely examine bark coverage and cactus death within a population of saguaro cacti. Turner showed that 158 of a total of 208 cacti were taller than 4.2 m in 1962. In the same plot in 1988, only 27 of 168 cacti were taller than 4.2 m [22]. By subtraction, a minimum of 131 saguaros taller than 4.2 m died over the 27-year period. In addition, Turner and Funicelli demonstrated that 16% of the saguaro cacti in their study plots died between 1990 and 2000 [23]. O'Brien et al. demonstrated that the oldest of 20,372 saguaros in Saguaro National Park was less than 110 years old [24]. The above data and the data of the current study are consistent with high mortality rates [5]. These increases in morbidity and mortality are inconsistent with the lifespan noted above [7].
As stated previously, saguaros have extensive bark coverage prior to experiencing premature death [1]. Bark coverage first occurs on south-facing surfaces, while north-facing surfaces are the last surfaces to have bark coverage before cactus death [2,3,10]. For example, south crests of Class I cacti had an 80% increase in bark coverage, while east and west crests increased between 35 and 50%. Concurrently, NC increased 11 to 15%, while NR and NL increased only 2.6 and 3.1%, respectively. Similar large differences occurred for Class II cacti that moved to Class IV. South crests increased 60%, east and west crests changed between 25 and 35%, north crests changed between 12 and 16%, and NR and NL changed only 3.6 and 6.9%, respectively. The current data are in complete accordance with previous data from 1994, 2002, and 2010 that demonstrate the delay in bark coverage from south-facing surfaces to north-facing surfaces [11].
Saguaro cacti younger than 80 years old have little bark coverage. Bark coverage begins slowly on a surface but eventually increases rapidly. Surfaces above 90% bark coverage increase slowly thereafter. Thus, the increase in bark coverage reflects a logistic curve [11]. Data of Class I cacti that moved to Classes II, III, and IV may reflect a logistic curve (Figure 5). The low slope for north crests may be projected to the lower portion of a logistic curve. The higher slopes of east and west crests may be projected to the initial increase in the logistic curve. The south crest's slope may be projected to the upper part of the logistic curve.
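The slow-fast-slow pattern described above is the signature of a logistic curve, whose slope is small at both ends and largest at the midpoint. The sketch below illustrates this with hypothetical parameters (a 100% ceiling, a growth rate of 0.3 per year, and a midpoint at year 20); none of these values are fitted to the cactus data.

```python
import math

# Hypothetical parameters, chosen only to illustrate the curve's shape.
CEILING = 100.0   # maximum bark coverage (%)
RATE = 0.3        # growth rate per year
MIDPOINT = 20.0   # year at which coverage reaches half of the ceiling


def logistic(t):
    """Bark coverage (%) at time t (years) under a logistic curve."""
    return CEILING / (1.0 + math.exp(-RATE * (t - MIDPOINT)))


def slope(t):
    """Rate of increase (% per year): the derivative of the logistic curve."""
    y = logistic(t)
    return RATE * y * (1.0 - y / CEILING)


for t in (5, 20, 35):
    print(t, round(logistic(t), 1), round(slope(t), 2))
```

With these assumed values, the slope is small at year 5, peaks at year 20, and is small again at year 35, which mirrors the early, middle, and late portions of the curve.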
All three machine learning models provided very similar results. The WEKA 3.8 program and Random Forest accuracies from all samples were within 4.2% (Table 2). Moreover, data of Fast and Slow cactus groups from Validate Model were within 2.9% of WEKA 3.8 decision tree values. The greatest error between the results of the machine learning models occurs because of outliers. The majority (88%) of outliers were Class IV cacti. Therefore, most of the Normal cacti were from Classes I, II, and III.
Over the past several decades, many new and innovative decision tree programs have been developed. Decision trees have been used for a large number of purposes in making predictions. For example, field guides that are used to assist in the identification of wild bird and plant species serve as a form of decision tree [20,25]. Machine learning decision trees enable conclusions with large data sets in a wide range of topics such as board games, astronomy, petrochemistry, cancer research, and phylogenetics, among many others [26][27][28][29][30].
Decision trees that are used to address large databases do not depend on a conceptual model. Therefore, decision trees may serve to validate models and concepts held by researchers, based upon current knowledge. For example, a previous WEKA 3.8 decision tree analysis of saguaro cactus data, from the 1994 and 2002 sampling periods, selected north-facing surfaces as the first indicator surfaces [6]. Data from the 2010 survey revealed that if NL had more than 70% bark coverage, and NR had more than 65% bark coverage, WEKA 3.8 decision trees predicted with 85% accuracy that a cactus would die within eight years.
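The two-threshold rule from the 2010 survey can be expressed as a simple predicate. This sketch encodes only the thresholds stated above; the function name is hypothetical, and the reported 85% accuracy means that the rule is expected to be wrong for some cacti.

```python
def predicts_death_within_eight_years(nl_pct, nr_pct):
    """Apply the reported rule: NL above 70% and NR above 65% bark coverage."""
    return nl_pct > 70 and nr_pct > 65


print(predicts_death_within_eight_years(85, 70))  # True
print(predicts_death_within_eight_years(85, 40))  # False
```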
The above predictions, from three time periods (1994, 2002, and 2010), are supported by the data herein that include a survey from 2017. Summation data for all three north-facing surfaces of Class IV Alive cacti were 71 to 74% of Class IV Dead for 1994, 2002, and 2010. In addition, summation data of east troughs, north troughs, and west troughs of Class IV Alive cacti were 72 to 79% of Class IV Dead for the same three time periods (Table 5). The 70 to 79% lower percentages on troughs may represent the lag time of bark coverage on troughs compared to crests. In addition, a sum of all twelve surfaces was created for Class IV Alive and Class IV Dead. Class IV Dead cacti had a sum of 656 in 1994 (23 years prior to 2017), and Class IV Alive cacti had a sum of 691 in 2002. Do these data suggest that Class IV Alive cacti will die within 23 years of 2002, that is, by 2025?
Many WEKA 3.8 decision trees were generated for this research; however, machine learning programs have some limitations. Although Random Forest produced confusion matrices and had high accuracies, the program does not produce decision trees. Machine learning models used by WEKA 3.8 generate decision trees using a ten-fold cross-validation technique. This technique uses 90% of the dataset for training and utilizes the remaining 10% of cacti to generate trees, confusion matrices, and accuracies. If a dataset has few cacti, the WEKA 3.8 program may not have enough information to provide high accuracies. For example, Class I cacti from 2010-2017 had only 35 cacti available for examination. Only one cactus died, while 34 remained alive. For this case, the proportion of alive to dead cacti was inadequate to train the WEKA 3.8 program to produce an accurate decision tree.
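The difficulty with the 35-cactus example can be seen by sketching a ten-fold split. This is a generic illustration of k-fold cross-validation, not WEKA's implementation; the function name and the fixed random seed are assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# 35 Class I cacti from 2010-2017: each test fold holds only 3 or 4 cacti,
# and the single dead cactus can appear in just one of the ten test folds.
splits = list(k_fold_indices(35))
test_sizes = sorted(len(test) for _, test in splits)

# Always predicting "alive" is already correct for 34 of 35 cacti.
baseline_accuracy = 34 / 35
print(test_sizes, round(baseline_accuracy, 3))
```

With so few deaths, a model that never predicts death scores about 97%, so accuracy alone says little about whether the tree has learned to recognize dying cacti.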
Predicting death of Class IV cacti from 1994-2017 using the WEKA 3.8 procedures may prove to be more successful than doing so from 1994-2002, 2002-2010, and 2010-2017. Attempts to predict the fate of Class IV cacti from 1994-2017 using the WEKA 3.8 program gave low accuracies. The low accuracies most likely resulted from the slow increases in bark coverage exhibited by already unhealthy Class IV cacti. As shown in this study, the tracking of bark coverage from period to period (1994-2002, 2002-2010, and 2010-2017) provided an accurate way to differentiate between alive and dead cacti in 2017. Differentiation between alive and dead cacti of Class IV began in 1994. To our knowledge, this is the first publication that predicts cactus health with three independent machine learning models. Therefore, machine learning models provide powerful tools to address complex issues with large databases.

Figure 5. Relationships between changes in bark coverages for crests of Class I cacti that changed to Classes II, III, and IV. All slopes had r² values above 0.96 and the slopes were 30.1, 14.5, 11.5, and 3.5 for south crests (green circles), west crests (orange circles), east crests (purple circles), and north crests (blue circles), respectively. Note the black solid arrows that indicate the possible location of these data with a logistic curve.

Author Contributions
LSE was involved in the initial experimental designs, field work in evaluating the cacti, helping to format the Excel databases, evaluation of the data, and writing of the manuscript. CRJ was involved in the field work to evaluate cacti, entered all data into Excel data sheets and proofread them, entered the data into the various machine learning programs, did the initial evaluations of the results of the machine learning programs, and assisted with the editing process and creation of graphics. The authors are the sole originators of the data and have no conflicts of interest. The funding for this research was from the Catherine and Robert Fenton Endowed Chair to LSE and is the only funding source. To be clear, we received no funding or any other type of support from any commercial sources and we have no conflicts of interest with this research and/or this report. This manuscript has not been submitted elsewhere in any form.