Past, current, and future trends of red spiny lobster based on PCA with MaxEnt model in Galapagos Islands, Ecuador

Abstract To improve the accuracy of species potential-distribution modeling, this study proposes using principal components of environmental variables as input to the maximum entropy (MaxEnt) model. Principal components selected from a principal component analysis (PCA) of the environmental variables, performed in ArcGIS, were used as MaxEnt input instead of raw data to model the potential distribution of red spiny lobster from 1997 to 2015 and for three future scenarios (2020, 2050, and 2070). One set of six original environmental variables for the years 1997–2015 and one set of four variables for the future scenarios were each combined into a single multiband raster in ArcGIS, and the components whose eigenvalues explained more than 5% of the total variance were retained for modeling in MaxEnt. The years 1997 and 1998 were chosen to compare model accuracy; principal components gave better results than raw data in terms of area under the curve and partial receiver operating characteristic, as well as better predictions of suitable areas. Using principal components as input to MaxEnt enhances the prediction of habitat suitability for red spiny lobster; however, the future scenarios indicate that researchers need to develop appropriate management guidelines to conserve the habitat of this valuable species in the face of climate change.

The Intergovernmental Panel on Climate Change underlines the importance of conserving biodiversity in the face of climate change (IPCC, 2011). The relationship between species and their surrounding environmental variables has been of particular interest to ecologists (Kuemmerlen, Stoll, Sundermann, & Haase, 2016). Understanding these interactions is important for predicting current and future species distributions (Remya, Ramachandran, & Jayakumar, 2015). Species distribution modeling (SDM) is increasingly used by scientists and policymakers as a tool to investigate a wide variety of ecological problems. When SDMs are generated with maximum entropy (MaxEnt), the environmental layers are usually supplied as raw data and treated as hypothetical predictor variables; when associated with the geographical records of a particular species, they are subject to spatial autocorrelation, defined as the degree of dependency among variables in geographical space (Anselin & Moreno, 2003). SDMs built from large sets of associated environmental covariates often suffer from multicollinearity, a statistical problem defined as a high degree of correlation among covariates. It is common in nonexperimental situations, where the researcher has no control over the risk associated with hypothetical factors related to the independent variables.
To overcome this problem, principal components analysis (PCA), available as a multivariate tool in ArcGIS, can be applied to the environmental layers, and the resulting components can be used as MaxEnt input instead of the raw environmental variables.
In many applications, PCA is used to reduce a dataset to a smaller number of predictor variables that retain a high proportion of the original information (Tabachnick & Fidell, 2007).
Maximum entropy is considered to be the most consistent methodology in studies of species distribution (Elith, Graham, Anderson, Dudik, & Ferrier, 2006; Hernandez, Graham, Master, & Albert, 2008; Phillips, Anderson, & Schapire, 2006; Wisz et al., 2008; Mateo, Croat, Felicisimo, & Muñoz, 2010; Aguirre-Gutierrez et al., 2013) and has been described as especially efficient at handling complex interactions between response and predictor variables (Elith et al., 2011). In the Galapagos Islands, MaxEnt has not previously been used to predict the habitat suitability of red spiny lobster (RSL), nor to examine the impact of climate change on the distribution of suitable habitat for this species. Therefore, the key contribution of this study is to retain the principal components (PCs) from a PCA of the environmental variables, performed in ArcGIS, whose eigenvalues explain more than 5% of the total variance and to use them as MaxEnt input instead of raw data, in order to determine how this approach can enhance the accuracy of the potential distribution of RSL over a 19-year period and three future scenarios in the Galapagos Marine Reserve (GMR). The receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and Cohen's kappa statistic were used to validate the performance of the MaxEnt model.

| Study area
The study area is the Galapagos Archipelago, shown in Figure 1. It covers a total area of 85,647 km².
The study area within the GMR includes twelve islands that represent the summits of volcanoes that emerged from the sea approximately 1-3 million years ago. Three major ocean currents influence the climate in the GMR: the Panama current, bringing warm water from the north; the Humboldt current, bringing cooler water from the south; and the upwelling subequatorial (or Cromwell) current, with highly productive cold waters (Hearn, 2008).

| Environmental predictors
For the 19-year period, six environmental variables were recorded as original climatic data: annual sea surface temperature (ASST); annual maximum air temperature (AMAT); annual mean air temperature (AMeAT); annual minimum air temperature (AMiAT); annual precipitation (AP); and annual relative humidity (ARH). The data, covering 1997 to 2015, were taken from the data zone of the Charles Darwin National Foundation home page (http://www.darwinfoundation.org/datazone/climate/).
Using PCA, a multivariate tool in ArcGIS, the principal components whose eigenvalues explained more than 5% of the total variance were selected as input to MaxEnt.
Four environmental variables were considered for the future climate scenarios for the years 2020, 2050, and 2070, taken from the Climate Change Scenarios GIS program home page (https://gisclimatechange.ucar.edu/gis-data). The environmental variables recorded were as follows: total precipitation (PPT); surface temperature (TS); maximum air temperature (TASMAX); and minimum air temperature (TASMIN). The model simulation selected for the future scenarios was the IPCC Climate Change Commitment Scenario. PCA was applied to these environmental variables, and the components whose eigenvalues explained more than 5% of the total variance were selected as input to MaxEnt. Table 1 shows all environmental variables considered in the study, which had a spatial resolution of 1 km².

| Species presence records
RSL (Panulirus penicillatus) is the most widely distributed species of spiny lobster, ranging throughout the Indo-Pacific, Red Sea, and eastern tropical Pacific Islands, including the Galapagos Archipelago, where it is found around most islands and islets, inhabiting the rocky shallows (Hickman & Zimmerman, 2000). This species is gregarious and may often be found in groups of more than 20 individuals of different sizes, in submerged caves and lava tunnels (Hearn, 2008). Presence data for P. penicillatus were extracted from the History of Marine Animal Populations (HMAP)-GMR, Ecuador III (HMAP Data Pages), and from previous monitoring performed by researchers. A total of 39,150 presence records for RSL from the twelve islands were used to predict the potential distribution over the 19-year period. The data were collected by interviewers and fishery observers from fishers at Puerto Ayora, Baquerizo Moreno, and Puerto Villamil on a daily basis during each fishing season, which usually lasted from September to December (with some exceptions).
The data provide the catch and effort statistics, the fishing method (hookah diving), effective fishing hours (5 hr on average), number of divers (15 maximum), vessel type (fiberglass or wood), departure and landing port, and departure and arrival date. In addition, more than 150,000 measurements of total, carapace, and tail lengths of red lobsters were taken between 1997 and 2011. In most cases, not all three length measurements were recorded for each individual.

Research Highlights
• Using principal components of PCA as input of MaxEnt instead of raw environmental data.
• Enhancing the accuracy and predictions of habitat suitability in red spiny lobster using MaxEnt.
• Analyzing the historical, current, and future trends of red spiny lobster in Galapagos under global warming.

| Modeling procedure
The modeling procedure was carried out in ArcGIS 10.5. The six environmental variables recorded from 1997 to 2015 were imported into ArcGIS as raw data, converted to raster format, and analyzed year by year, starting with 1997.
The environmental variables were clipped to the study area. The "Extract by Mask" tool was applied to each environmental variable so that all layers shared the extent and cell size of the study area. The Composite Bands tool creates a single raster dataset from multiple bands, so the six environmental variables were combined into one multiband raster image. Nearest-neighbor resampling was used during display. This multiband image was used as input to PCA, with the processing extent set to the study area, to generate the predictor variables. The PCA results were used as MaxEnt input together with the presence records of RSL. The same procedure was followed for the years 1998 to 2015 and for the future scenarios (2020, 2050, and 2070). As a result, between two and four PCs derived from the six original environmental variables were selected each year to predict the potential distribution of RSL for the period 1997-2015, and four PCs were selected for the future scenarios, based on the percentage of variance explained by their eigenvalues (>5%).
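The selection rule described above (keep the components whose eigenvalues explain more than 5% of the total variance) can be sketched outside ArcGIS as follows. This is a minimal illustration, not the ArcGIS implementation: it assumes a covariance-based PCA, a band stack with no NoData cells, and synthetic values; the function name `select_principal_components` is hypothetical.

```python
import numpy as np


def select_principal_components(stack, min_share=0.05):
    """Return the PCs of a (bands, rows, cols) stack whose eigenvalues
    explain more than `min_share` of the total variance, plus all shares."""
    bands, rows, cols = stack.shape
    # One row per grid cell, one column per environmental band.
    cells = stack.reshape(bands, -1).T.astype(float)
    centered = cells - cells.mean(axis=0)
    # Eigen-decomposition of the band covariance matrix (assumed here).
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    shares = eigvals / eigvals.sum()       # fraction of variance per PC
    keep = shares > min_share
    scores = centered @ eigvecs[:, keep]   # project cells onto kept PCs
    return scores.T.reshape(-1, rows, cols), shares


# Synthetic six-band stack (hypothetical values, not the study data).
rng = np.random.default_rng(0)
signal = rng.normal(size=(2, 20, 20))
stack = np.stack([
    signal[0], signal[0] + 0.1 * rng.normal(size=(20, 20)),
    signal[1], signal[1] + 0.1 * rng.normal(size=(20, 20)),
    signal[0] + signal[1], signal[0] - signal[1],
])
pcs, shares = select_principal_components(stack)
print(pcs.shape[0], "components kept; variance shares:", np.round(shares, 3))
```

Each retained component is returned as a raster-shaped array, so it could be exported as one layer per component for the next step.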
The format supported by MaxEnt is ASCII (.asc); for this reason, each variable was converted to ASCII using the "Convert Raster to ASCII" tool. All predictor variables used in the model had a spatial resolution of 30 arc-seconds (approximately 1 km). Figure 2 illustrates the generation of species distribution models based on this approach.
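For readers working outside ArcGIS, the ASCII grid layout can be illustrated with a short writer. This is a sketch under assumptions: it uses the common ESRI ASCII grid header (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value), and the corner coordinates, file name, and grid values below are hypothetical. The cell size of 1/120 degree corresponds to 30 arc-seconds.

```python
import os
import tempfile

import numpy as np


def write_ascii_grid(path, grid, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Write a 2-D array as an ESRI ASCII grid; NaN cells become `nodata`."""
    nrows, ncols = grid.shape
    with open(path, "w") as f:
        f.write(f"ncols {ncols}\n")
        f.write(f"nrows {nrows}\n")
        f.write(f"xllcorner {xllcorner}\n")
        f.write(f"yllcorner {yllcorner}\n")
        f.write(f"cellsize {cellsize}\n")
        f.write(f"NODATA_value {nodata}\n")
        # Rows are written from the top (north) of the grid to the bottom.
        for row in grid:
            cells = [str(nodata) if np.isnan(v) else f"{v:.6f}" for v in row]
            f.write(" ".join(cells) + "\n")


# Hypothetical 2 x 3 component layer with one missing cell.
grid = np.array([[0.1, 0.2, np.nan], [0.4, 0.5, 0.6]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pc1_1997.asc")  # hypothetical file name
    write_ascii_grid(path, grid, xllcorner=-92.0, yllcorner=-1.5, cellsize=1 / 120)
    with open(path) as f:
        lines = f.read().splitlines()
print("\n".join(lines))
```

All layers passed to the model would need identical header values, which is what the extent and cell-size alignment in the procedure above ensures.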
The maximum entropy model has performed best among many species distribution models (Elith et al., 2006 and Ortega-Huerta & …). For ecological niche modeling with MaxEnt, the predictor variables obtained from PCA for each year must be loaded in ASCII format, and the presence data must contain the species name and geographical coordinates in CSV format, generated here in Excel. For this purpose, ArcGIS 10.5 was used to generate a point shapefile from the geographical coordinates of the species presences, thereby identifying the points used to develop the model. The final presence points were tabulated in Excel. Each record occupied a single cell in the sheet, with the species name followed by the longitude and latitude, separated by commas, for example "Panulirus_penicillatus, −90.04, −0.84"; the document was then saved in CSV format (.csv).
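The presence file described above can also be produced programmatically. The following is a minimal sketch rather than the authors' workflow: the header row and its column names are assumptions, the function name `write_presence_csv` is hypothetical, and the second coordinate pair is invented for illustration.

```python
import csv
import io


def write_presence_csv(handle, species, points):
    """Write one row per presence point: species, longitude, latitude."""
    writer = csv.writer(handle)
    writer.writerow(["species", "longitude", "latitude"])  # assumed header
    for lon, lat in points:
        writer.writerow([species, f"{lon:.2f}", f"{lat:.2f}"])


# First point is the example from the text; the second is hypothetical.
points = [(-90.04, -0.84), (-91.10, -0.95)]
buffer = io.StringIO()
write_presence_csv(buffer, "Panulirus_penicillatus", points)
rows = buffer.getvalue().splitlines()
print("\n".join(rows))
```

Writing to a real file instead of `io.StringIO` only requires opening it with `open(path, "w", newline="")` and passing the handle.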
The occurrence records for each year, as well as the predicted occurrences for 2020, 2050, and 2070, and their coordinates were first checked for consistency in order to eliminate pseudoabsences generated outside the study area, using the Spatial Analyst tool in ArcGIS on the environmental layers. Editing the environmental variables ensured that all environmental layers had the same extent, the same pixel size (1 × 1 km), and the same position. Of the spatial occurrence records of Panulirus penicillatus, 70% were used to train the MaxEnt model and the remaining 30% to validate it. Runs were conducted with the default variable response settings and the logistic output format. The maximum number of iterations was set to 5,000 and the convergence threshold to 0.00001. To avoid overfitting, a regularization value of 0.1 was used (Phillips et al., 2006). The outputs generated by MaxEnt were converted to raster format in ArcMap (ArcGIS) for further analysis.
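The 70/30 partition of presence records can be illustrated with a simple random split. This sketch is independent of how MaxEnt itself partitions data; the function name, the fixed seed, and the use of integer identifiers in place of real records are assumptions made for the example.

```python
import random


def split_presence_records(records, train_fraction=0.7, seed=42):
    """Shuffle records reproducibly and split them into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integer IDs stand in for the 39,150 presence records (hypothetical).
records = range(39150)
train, test = split_presence_records(records)
print(len(train), "training records;", len(test), "test records")
```

Fixing the seed keeps the partition reproducible, so that runs with PCs and runs with raw data can be compared on the same training and test records.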
FIGURE 2 Generation of species distribution models using principal components as input of MaxEnt

To establish confidence in a predictive model, researchers such as Fielding and Bell (1997) and Farber and Kadmon (2003) have described robust measures accepted as the best tools for evaluating model performance. AUC and Cohen's kappa statistic were used in this study to assess MaxEnt model performance. A partial ROC test was applied for additional precision analysis. This measure estimates the relationship between AUC and the null expectation of bootstrap repetitions (Peterson, Papes, & Soberon, 2008). The evaluation of the model is based on forecast performance and includes the determination of a minimum threshold of the quantitative value produced for the potential presence of the communities. ROC figures and AUC values were obtained directly from the MaxEnt analysis. AUC values vary between 0 and 1, where 1 indicates high performance and values lower than 0.5 indicate low performance (Luoto, Poyry, Heikkinen, & Saarinen, 2005; Elith et al., 2006).
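The two evaluation measures named above can be computed from first principles, which may help in reading the reported values. The sketch below is not the MaxEnt or ArcGIS implementation: AUC is computed as the fraction of presence/background pairs ranked correctly (ties count one half), kappa follows the standard observed-versus-chance agreement formula, and the labels, scores, and function names are hypothetical.

```python
def auc_score(labels, scores):
    """AUC as the share of (presence, background) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohen_kappa(observed, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e); assumes p_e < 1."""
    n = len(observed)
    categories = set(observed) | set(predicted)
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    p_e = sum(
        (observed.count(c) / n) * (predicted.count(c) / n) for c in categories
    )
    return (p_o - p_e) / (1 - p_e)


# Hypothetical evaluation data: 1 = presence, 0 = background/absence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
predicted = [1 if s >= 0.6 else 0 for s in scores]  # assumed 0.6 threshold
auc = auc_score(labels, scores)
kappa = cohen_kappa(labels, predicted)
print(f"AUC = {auc:.3f}, kappa = {kappa:.3f}")
```

AUC is threshold-independent, whereas kappa depends on the threshold used to turn suitability scores into a presence/absence map, so the two measures need not rank models identically.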
The data management tool was used to calculate the kappa statistic from a classification map, based on a set of polygons and random points created on the classification maps in ArcGIS. The kappa statistic (K) ranges from −1 to +1, where +1 indicates excellent agreement between predictions and observations and values of 0 or less indicate agreement no better than random classification (Zhang et al., 2015). Zhang, Liu, Sun, and Wang (2015) suggested the following ranges of agreement: K < 0.4, poor; 0.4-0.8, useful; and K > 0.8, good to excellent.

| RESULTS
… for the period 1997-2015 as well as future scenarios (2020, 2050, and 2070). The PCA shows that in most years AMAT, AMeAT, AP, and ARH were the predictor variables that contributed the most variance among the original variables, explaining more than 95% of the total variance. Figure 3 shows the potential distribution of RSL over the 19-year period, and Figure 4 shows the predicted potential distributions for the future scenarios for the years 2020, 2050, and 2070.

Figure 3 … Figure 4a,c, respectively, in terms of prediction of the areas in which RSL can survive, rather than raw data, which explained 91% and 92% in Figure 4b,d, respectively. In terms of AUC values and the kappa statistic, the prediction using PCs scored 0.92 and 0.98, in comparison with raw data, which scored 0.81 and 0.85, in the years 1997 and 1998, respectively. Table 3 shows the contribution of each environmental variable for the period 1997-2015 as well as for the future scenarios (2020, 2050, and 2070); AMAT, AMeAT, AP, and ASST were the environmental predictors with the highest contribution to the potential distribution of RSL in the 19-year period, and PPT, TS, TASMAX, and TASMIN for the future scenarios.

TABLE 2 Number of principal components (PCs) extracted from principal components analysis during the 19-year period and for future scenarios (2020, 2050, and 2070)

| DISCUSSION
This study used PCs extracted in ArcGIS as input to MaxEnt, instead of raw data, with the purpose of improving the current and future predictions of the potential distribution of RSL in the GMR.
The common approach of using raw data to model present and future species distributions in ecology does not always account for the uncertainty associated with variable selection. Statistical methods of model selection, including indices such as AUC and kappa, are not by themselves sufficient to accept or reject a model (Burnham & Anderson, 2002); they measure internal model performance, not ecological validity. Therefore, high measured accuracy for the modeled current and future distributions of RSL does not guarantee that the predictions themselves will be accurate.
PCA has been widely used in various fields of investigation. These studies concern environmental variation (Janzekovic & Novak, 2012), the investigated species, or community characteristics. In aquatic habitat studies, it has been applied to evaluate habitat suitability and its seasonal and spatial variation (Ahmadi-Nedushan et al., 2006). When PCs are used as predictor variables, autocorrelation must be minimized before selection in order to avoid negative effects on the modeling analysis. The prediction maps based on raw data generally show fewer suitable areas for RSL and lower accuracy in terms of AUC and kappa; this is … (Wenju, Santoso, Wang, & Wu, 2015).

… Figure 5. Figure 5a illustrates the case of 2020, in which the areas of RSL presence around the GMR maintain a trend similar to that of the last year analyzed in this study (2015), and the environmental variables have a positive influence in that year. In 2050, the predicted areas of good habitat suitability for RSL increase along the GMR and the equatorial zone, as shown in Figure 5b. This scenario can be explained by favorable environmental conditions that allow the RSL population to increase. By 2070, however, environmental conditions may change drastically under the influence of the surroundings, reducing the areas suitable for the survival of RSL. Considering these scenarios, climate change has a strong influence on the distribution of RSL; future studies should therefore support adequate management and develop appropriate guidelines to inform researchers' decision-making for the conservation of the habitat of this species, the most economically valuable in the GMR, in the face of global warming.

| CONCLUSIONS
In this study, PCA was applied to generate the input of MaxEnt in order to model the potential distribution of RSL in the GMR for the period 1997-2015 as well as for future scenarios (2020, 2050, and 2070). The main conclusions are as follows:

1. The use of PCs of environmental variables as input of MaxEnt, instead of raw environmental data, can enhance the modeling of the potential distribution of RSL in terms of accuracy and prediction of habitat suitability.

2. The environmental variables that had the most influence on the distribution of RSL from 1997 to 2015 were AMAT, AMeAT, AP, and ASST, and those for the future climate scenarios were PPT, TS, TASMAX, and TASMIN.

3. The kappa index showed better prediction accuracy in comparison with AUC values, suggesting that it can be used as a more accurate tool for evaluating the quality and performance of species distribution models using MaxEnt.

CONFLICT OF INTEREST
None declared.