EnvRtype: a software to interplay enviromics and quantitative genomics in agriculture

Abstract Envirotyping is an essential technique used to unfold the nongenetic drivers associated with the phenotypic adaptation of living organisms. Here, we introduce the EnvRtype R package, a novel toolkit developed to interplay large-scale envirotyping data (enviromics) into quantitative genomics. To start a user-friendly envirotyping pipeline, this package offers: (1) remote sensing tools for collecting (get_weather and extract_GIS functions) and processing ecophysiological variables (processWTH function) from raw environmental data at single locations or worldwide; (2) environmental characterization by typing environments and profiling descriptors of environmental quality (env_typing function), in addition to gathering environmental covariables as quantitative descriptors for predictive purposes (W_matrix function); and (3) identification of environmental similarity that can be used as an enviromic-based kernel (env_typing function) in whole-genome prediction (GP), aimed at increasing ecophysiological knowledge in genomic best-unbiased predictions (GBLUP) and emulating reaction norm effects (get_kernel and kernel_model functions). We highlight literature mining concepts in fine-tuning envirotyping parameters for each plant species and target growing environments. We show that envirotyping for predictive breeding collects raw data and processes it in an eco-physiologically smart way. Examples of its use for creating global-scale envirotyping networks and integrating reaction-norm modeling in GP are also outlined. We conclude that EnvRtype provides a cost-effective envirotyping pipeline capable of providing high quality enviromic data for a diverse set of genomic-based studies, especially for increasing accuracy in GP across untested growing environments.


Introduction
Quantitative genetics divides phenotypic variation (P) into a genetic (G) and nongenetic sources of variation (E). The latter may involve micro-environmental effects that can be controlled by adequate experimental designs and phenotype correction strategies (e.g., Resende and Duarte 2007;Galli et al. 2018). Conversely, most nongenetic sources are due to macro-environmental fluctuations resulting from resource availability during crop lifetime (Shelford 1931). Despite this unfolded division, the effect of the environment on shaping gene expression (e.g., Plessis et al. 2015;Jo nczyk et al. 2017;Liu et al. 2020) and fine-tuning epigenetic factors (Varotto et al. 2020;Vendramin et al. 2020) creates an indissoluble envirotype-phenotype covariance in the phenotypic records (Lynch and Walsh 1998). Thus, for any genotypephenotype association study across multiple environments (e.g., mapping quantitative trait loci, QLT; genomic association studies, GWAS), there is a strong nongenetic influence that can be better understood using envirotyping-based data, i.e., a foundation of multiple techniques to collect, process, and integrate environmental information in genetic and genomic studies, Costa-Neto et al. (2020a).
Over the last 10 years, envirotyping (Xu 2016) has been incorporated into whole-genome prediction (GP, Meuwissen et al. 2001) aiming to better model genotype Â environment interaction (G Â E) as a function of reaction-norm from environmental covariables (ECs), i.e., linearized responsiveness of a certain genotype for a target environmental gradient. Those genomic-related reaction norms can be modeled as genotype-specific coefficients for each EC due to whole-genome factorial regressions (Heslot et al. 2014;Ly et al. 2018;Millet et al. 2019), allowing a deeper understanding of which ECs may better explain the phenotypic plasticity of organisms. Furthermore, ECs can also be used to create envirotyping-based kinships (Jarquín et al. 2014;Morais Junior et al. 2018;Costa-Neto et al. 2020a), enabling the establishment of putative environmental similarities that may drive a large amount of phenotypic variation. The integration of ecophysiologically enriched envirotyping data has led to outstanding results in modeling crops such as maize, due to the use of Crop Growth Models (Cooper et al. 2016;Messina et al. 2018) and Deep Kernel approaches (Costa-Neto et al. 2020a). Combined with phenotyping and genotyping data, the use of envirotyping data may leverage molecular breeding strategies to understand historical trends and cope with future scenarios of environmental change (Gillberg et al. 2019;de los Campos et al. 2020). Its use can also support other prediction-based pipelines in plant breeding, such as highthroughput phenotyping surveys (Bustos-Korts et al. 2019;Krause et al. 2019;Galli et al. 2020).
Despite advancements in the development of hypotheses supporting the inclusion of envirotyping data in GP, it is difficult for most breeders to deal with the interplay between envirotyping, ecophysiology, and genetics. For example, much research has been conducted to explore and associate data into the concepts and theories underlying quantitative genetics (e.g., Fisher's Infinitesimal Model) with the goal of building genomic relationship matrices (GRM). Genotyping pipelines based on bioinformatics were successfully developed to translate biochemical outputs collected from plant tissues into biologically significant markers of DNA polymorphisms, e.g., genotyping-by-sequence (GBS, Elshire et al. 2011). To the best of our knowledge, there is no publicly available user-friendly software to implement envirotyping pipelines to translate raw environmental data into a useful, highlytailored matrix of envirotypic descriptors. Consequently, a workflow to interplay enviromics (pool of environment-types, abbreviated as envirotypes) and genomic analysis is lacking, especially for GP conditions in multi-environment testing (MET) where G Â E is the main hindrance in the model's accuracy.
In this study, we introduce EnvRtype, a novel R package used to integrate macro-environmental factors in various fields of plant and animal research or evolutionary ecology. We approach basic ecophysiological concepts underlying the collection and processing of raw-environmental data, both biologically and statistically. Then, we present the functions for implementing remote data collection and primary processing and its applications for deriving quantitative and qualitative descriptors of relatedness. Finally, we present a comprehensive view of how enviromebased data can be incorporated into GP for selecting genotypes across diverse environments. We highlight the use of different envirotyping levels to discover descriptors of environmental similarity, using crop species to exemplify the concepts.

Envirotyping pipeline
EnvRtype is an R package created for handling envirotyping by ecophysiological concepts in quantitative genetics and genomics for multiple environments. It means that envirotyping is not only a collection of raw environmental data that is used for exploratory or predictive processes but rather a pipeline based on the collection of raw data and their subsequent processing in a manner that makes sense for describing the development of an organism in the target environment using a priori ecophysiological knowledge. Here, we consider enviromics as the large-scale envirotyping (Xu 2016;Resende et al. 2020) of a theoretical population of environments for a target species or germplasm (the so-called envirome). It may also denote the core of possible growing conditions and technological inputs that create different productivity levels. The envirotyping pipeline implemented by EnvRtype software is divided into three modules briefly described above and detailed in the following sections ( Figure 1).
Module 1 (yellow toolboxes in Figure 1) starts by collecting raw environmental data from public platforms, such as a satellitebased weather system named "NASA's Prediction of Worldwide Energy Resources" (NASA POWER, https://power.larc.nasa.gov/), which can access information daily anywhere on earth. This database is well consolidated and validated for use in several research fields, including crop modeling in agricultural research (White et al. 2011;Monteiro et al. 2018;Aboelkhair et al. 2019). Details about resolution and validation are given in https:// power.larc.nasa.gov/docs/methodology/validation/.
Data collection may span existing experimental trials (single sampling trials) or historical trends for a given location Â planting date arrangement. This module gathers the functions for remote data collection of daily weather and elevation data, as well as the computation of ecophysiological variables, such as the effect of air temperature on radiation use efficiency. The module includes a toolbox with "Remote Data Collection" and "Data Processing" steps, both designed to help researchers find a viable alternative for expensive in-field environmental sensing equipment. More details about the theoretical basis of environmental sensing and the module are given in the section "Module 1: Remote Environmental Sensing".
The processed environmental information can then be used for many purposes. In Module 2, we designed tools for the characterization of the macro-environmental variations, which can also be done across different time intervals of crop growth and development (when associated with a crop) or fixed time intervals (to characterize locations). The environmental characterization toolbox (green toolbox in Figure1) involves two types of profiling: 1) Discovering environmental types (envirotypes, hereafter abbreviated as ETs) and how frequently they occur at each growing environment (location, planting date, and year). Based on the ET-discovering step, it is possible to create environmental profiles and group environments with the same ET pattern. This step is also useful for running exploratory analysis, e.g., to discover the main ET of planting dates at a target location. 2) Gathering ECs from point-estimates (e.g., mean air temperature, cumulative rainfall). These ECs can be used for many purposes, from a basic interpretation of GÂE to estimating gene-environment interactions. At the end of this process, a matrix of ECs (W) is created and integrated with tools from Module 3. 3) Further details about this module are given in the section "Module 2: Macro-Environmental Characterization".
Finally, the information from Module 2 can be used to create environmental similarity and integrate robust GP platforms for multiple environments, hereafter referred to as envirotypeinformed GPs. Module 3 (the dark purple boxes Figure 1) aims to provide tools to compute environmental similarity using correlations or Euclidean distances across different trials conducted on ECs. Thus, we developed a function to integrate this enviromic source in GP as an additional source of variation to bridge the gap between genomic and phenotypic variation. For that, we provide at least four different structures into a flexible platform to integrate multiple genomic and enviromic kinships. Figure 1 shows some possible outputs of the EnvRtype package (in red toolbox colors), in which W can be used to interpret G Â E (e.g., factorial regression) or exploited in terms of increasing the accuracy of phenotype prediction for multiple environments. More details are given in the section "MODULE 3: Enviromic Similarity and Phenotype Prediction". Below we give some theoretical details about each module and a description of the functions used to implement it.

Software
The R package EnvRtype is available at https://github.com/alloga mous/EnvRtype (version 1.0.0, verified January 15, 2021). More details about graphical plots and additional codes can also be found on this Git Hub webpage and in Supplementary Software. Typing the following command in R will automatically install the package (BOX 1):All codes and BOX in this work are available on both the Git Hub web page and as Supplementary Code file.

Remote data collection
EnvRtype implements the remote collection of daily weather and elevation data by the get_weather function. This function has the following arguments: the environment name (env.id); geographic coordinates (latitude, lat; longitude, lon) in WGS84; time interval (start.day and end.day, given in "year-month-day"); and country identification (country), which sets the raster file of elevation for the region of a specific country. Countries are specified by their 3letter ISO codes [check in the package Git Hub or use the getData("ISO3") function from the raster package to see the codes]. Table 1 shows the names of the outputs of get_weather and processWTH (see Tools for basic processing). All weather information is given on a daily basis. Altitude (ALT) information is provided by SRTM 90 m resolution and can be collected from any place between À60 and 60 latitudes. This information is Figure 1 The workflow of the envirotyping pipeline implemented using EnvRtype in R. Yellow, green, dark purple, and red-colored boxes denote the steps related to Modules 1, 2, 3, and the examples of outputs from EnvRtype. Straight arrows indicate the flux of the envirotyping pipeline passing by each module. Curved arrows represent a process between Modules 1 and 2 in which field-growing conditions can be described as a panel of envirotype descriptors from each environmental factor processed and organized in Module 2. BOX 1. Install EnvRtype > install.packages('devtools') > devtools::install_github('allogamous/EnvRtype') presented as a data.frame class output in R. It is possible to download data for several environments by country.
A practical example of get_weather is given below (BOX 3). A collection of environmental data for Nairobi, Kenya (latitude 1.367 N, longitude 36.834 E) from 01 march 2015 to 01 April 2015, is performed by: A second function is extract_GIS, which can collect point values from large raster files from GIS databases. This function has six arguments. The object env.data indicates the name of the environmental dataset (arranged as a data.frame). It can be an output data.frame of the get_weather function or any spreadsheet of environmental data, as long as it is organized with a column denoting the environment's name, which is defined by the env.id argument (default is env.id ¼ "env"). Latitude and Longitude is provided in the same manner described in get_weather.
A practical use of extract_GIS is given below (BOX 4). A collection of clay content (g/kg) from 5 cm to 15 cm of depth for Nairobi using a raster file was downloaded from SoilGrids and the function extract_GIS. The raster file 'clay_5_15' can be accessed in R by typing data("clay_5_15").

Summarizing raw-data
A basic data summary of the outputs from the get_weather function is done by the summaryWTH function. This function has 10 arguments (env.data, id.names, env.id, days.id, var.id, statistic, probs, by.interval, time.window, and names.window). The common arguments with extract_GIS have the same described utility. Other identification columns (year, location, management, responsible researcher, etc.) may be indicated in the id.names argument, e.g., id.names ¼ c("year", "location", "treatment").
Considering a specific environmental variable, the argument var.id can be used as, for example, var.id ¼ "T2M". By default, this function considers all the names of the variables presented in Table 1. For other data sources, such as micro-station outputs, this argument is necessary for identifying which variables will be summarized. The argument days.id indicates the variable pointing to the time (days), where the default is the daysFromStart column from the get_weather function. A basic example of this use is given below (BOX 4):Dividing the development cycle into time intervals (e.g., phenology), whether phenological or fixed time intervals (e.g., 10-day intervals), helps to understand the temporal variation of environmental factors during the crop growth Table 1 The core of environmental factors available using the "environmental sensing module" of the EnvRtype package

Source
Environmental factor Unit Nasa POWER a Top-of-atmosphere insolation MJ m À2 d À1 Average insolation incident on a horizontal surface MJ m À2 d À1 Average downward longwave radiative flux MJ m À2 d À1 Wind speed at 10 m above the surface of the earth m s -1 Minimum air temperature at 2 m above the surface of the earth C d À1 Maximum air temperature at 2 m above the surface of the earth C d À1 The dew-point temperature at 2 m above the surface of the earth C d À1 Relative air humidity at 2 m above the surface of the earth % Rainfall precipitation (P) mm d À1 SRTM b Elevation ( c(0,14,35,60,90,120) denotes intervals of 0-14 days from the first day on record (0). If the first record denotes the crop's emergence date in the field, this can also be associated with some phenological interval. Those intervals can be named using the argument names.window, names.window ¼ c("P-E", "E-V1", "V1-V4", "V4-VT", "VT-GF", "GF-PM").
The argument statistic denotes which statistic should be used to summarize the data. The statistic can be: mean,sum or quantile. By default, all statistics are used. If statistic ¼ "quantile", the argument prob is useful to indicate which percentiles (from 0 to 1) will be collected from the data distribution, i.e., default is prob ¼ c(0.25, 0.50, 0.75), denoting the first (25%) second (50%, median), and third (75%) quantiles.

Tools for basic data processing
The processWTH function performs basic data processing. As described for summaryWTH, this function can also process environmental data for get_weather outputs and other sources (microstations, in-field sensors) using the same identification arguments (env.data, id.names, env.id, days.id, var.id). This function also gathers three other sub-functions created to compute general variables related to ecophysiological processes, such as the macro effects of soil-plant-atmosphere dynamics and atmospheric temperature on crop development. The basic usage of this package is given by processWTH(env.data ¼ env.data); in addition, crop-specific parameters such as cardinal values of temperature and evapotranspiration, as well as site-specific characteristics, can be given in additional arguments. Below, we describe these arguments over three functions that compose processWTH. We provide a brief description of them, in addition to the ecophysiological concepts underlying their application (see Appendix 1).

Radiation-related covariables
EnvRtype made a function called param_radiation available to compute additional radiation-based variables that can be useful for plant breeders and researchers in several fields of agricultural research (e.g., agrometeorology). These parameters include the actual duration of sunshine hours (n, in hours) and total daylength (N, in hours), both estimated according to the altitude and latitude of the site, time of year (Julian day, from 1 to 365), and cloudiness (for n). In addition, the global solar radiation incidence (SRAD, in MJ m 2 d À1 ) is computed as described at the beginning of this section. The latter is important in most computations of crop evapotranspiration (Allen et al.1998) and biomass production (Muchow et al.1990;Muchow and Sinclair, 1991). More details about those equations are given in ecophysiology and evapotranspiration literature (Allen et al.1998;Soltani and Sinclair 2012). More detail is given in Appendix 2.
The arguments of param_radiation are: env.data and merge, in which merge denotes if the computed radiation parameters must be merged with the env.data set (merge ¼ TRUE, by default).

Temperature-related covariables
The param_temperature function computes additional thermalrelated parameters, such as GDD and FRUE, and T2M_RANGE. This function has eight arguments (env.data, Tmax,Tmin, Tbase1, Tbase2, Topt1, Topt2, and merge). To run this function with data sources other than get_weather, it is necessary to indicate which columns denote maximum air temperature (Tmax, default is Tmax ¼ "T2M_MAX") and minimum air temperature (Tmin, default is Tmin ¼ "T2M_MIN") (BOX 6). The cardinal temperatures must follow the processes provided in the previously described ecophysiology literature (Soltani and

Atmospheric demands
We implemented the param_atmospheric function to run basic computations of atmospheric demands. This function has 11 BOX 5. Practical use of SummaryWTH Practical use of param_temperature for Dry Beans in Nairobi, Kenya Table 2 Synthesis of some cardinal limits for temperature on the phenology development in the main crops These estimates were adapted from Soltani and Sinclar (2012) arguments: env.data; PREC (rainfall precipitation in mm, default is PREC¼"PRECTOT"); Tdew (dew point temperature in C, default is Tdew¼"T2M_DEW"); Tmax (maximum air temperature in C, default is Tmax¼"T2M_MAX"); Tmin (minimum air temperature in C, default is Tmin¼"T2M_MIN"); RH (relative air humidity %, default is RH¼"RH2M"); Rad (net radiation, in MJ m À2 day À1 , default is Rad ¼ "Srad"); alpha (empirical constant accounting for vapor deficit and canopy resistance values, default is alpha ¼ 1.26); Alt (altitude, in meters above sea level, default is . Details about these inputs and equations are given in the Appendix Section. Below we present an example of usage in Nairobi, Kenya. Consider the same env.data collected in the previous box and an elevation value of Alt ¼ 1,628 (BOX 7).

Discovering Envirotypes with env_typing
An environment can be viewed as the status of multiple resource inputs (e.g., water, radiation, and nutrients) across a certain time interval (e.g., from sowing to harvesting) within a specific space or location. The quality of those environments is an end-result of the daily balance of resource availability, which can be described as a function of how many resources are available and how frequently those resources occur (e.g., transitory or constant effects). Also, the relationship between resource absorption and allocation depends on plant characteristics (e.g., phenology, current health status). Then, this particular environmental-plant influence is named after the envirotype to differentiate it from the concept of raw environmental data (data collected directly from sensors). It can be referred to as ETs. Finally, the typing of environments can be done by discovering ETs; the similarity among environments is a consequence of the number of ETs shared between environments.
Before the computation of ETs, a first step was to develop a design based on ecophysiological concepts (e.g., plants' needs for some resource) or summarize the raw data from the core environments being analyzed. Then, for each ET we computed the frequency of occurrence, which represents the frequency of specific quantities of resources available for plant development. Typing by frequency of occurrence provides a deeper understanding of the distribution of events, such as rainfall distribution across different growing cycles and the occurrence of heat stress conditions in a target location (Heinemann et al. 2015). Thus, groups of environments can be better identified by analyzing the events occurring in a target location, year, or planting date. This step can be done not only by using grade point averages (e.g., accumulated sums or means for specific periods), but also by their historical similarity. In this way, we are able to not only group environments in the same year, but also through a historical series of years. Finally, this analysis deepens in resolution when the same environment is divided by time intervals, which can be fixed (e.g., 10-day intervals), or categorized by specific phenological stages of a specific crop.
To implement envirotype profiling, we created the env_typing function. This function computes the frequency of occurrence of each envirotype across diverse environments. This function has 12 arguments, nine of which (env.data, id.names, env.id, days.idvar.id, statistic, by.interval, time.window, and names.window) work in the same way as already described in the previous functions. The argument cardinals are responsible for defining the biological thresholds between envirotypes and adaptation zones. These cardinals must respect the ecophysiological limits of each crop, germplasm, or region. For that, we suggest literature on ecophysiology and crop growth modeling, such as Soltani and Sinclar (2012). The argument cardinals can be filled out as vectors (for single-environmental factors) or as a list of vectors for each environmental factor considered in the analysis. For example, considering the cardinals for air temperature in rainfed rice presented in Table 2, the cardinals are typed for Los Baños, Philippines, from 2000 to 2020, as (BOX 8):If cardinals ¼ NULL, the quantiles 10%, 25%, 50%, 75%, and 90% are used by default. Which quantiles will be used is determined in the same manner as prob (in summaryWTH), but now using the quantile argument, e.g., quantile ¼ c(0.25,0.50,0.75).
For multiple environmental factors, a list of cardinals must be provided-for example, considering rainfall precipitation (PRECTOT, mm.day À1 ) and dew point temperature (T2DEW, C.day À1 ). Suppose precipitation values less than 10 mm.day À1 are insufficient to meet the studied crops' demand. Values between 11 mm.day À1 and 40 mm.day À1 would be considered excellent water conditions, and values greater than 40 mm.day À1 would be considered excessive rainfall. In this scenario, such rainfall values could be negatively associated with flooding of the soil and drainage of fertilizers, among other factors related to crop lodging or disease occurrence. Thus, for PRECTOT, the cardinals will be cardinals ¼ c(0,5,10,25,40,100). For dew point, let's assume data-driven typing (cardinals ¼ NULL) using the previously described quantiles. Taking the same example for Los Baños, Philippines (BOX 9). BOX 7. Practical use of param_atmospheric > # first need to compute radiation parameters Basic use of env_typing > # typing temperature in Los Baños, Philipines, from 2000 to 2020 T2M¼c(0,8,15,28,40,45

Environmental Covariables with W_matrix
The quality of an environment is measured by the amount of resources available to fulfill the plants' demands. In an experimental network composed of multi-environment trials (MET), the environment's quality is relative to the global environmental gradient. Finlay and Wilkinson (1963) proposed using phenotypic data as a quality index over an implicit environmental gradient. However, this implicit environmental quality index was proposed as an alternative to explicit environmental factors, given the difficulties in obtaining high-quality envirotyping data. Here we provide the use of detailed environmental data arranged in a quantitative descriptor such as a covariate matrix (W), following the terminology used by Costa-Neto et al. (2020a) and de los Campos et al. (2020). Based on these W matrices, several analyses can be performed, such as (1) dissecting the G Â E interaction; (2) modeling genotype-specific sensibility to critical environmental factors; (3) dissecting the environmental factors of QTLÂE interaction; (4) integrating environmental data to model the gene Â environment reaction-norm; (5) providing a basic summary of the environmental gradient in an experimental network; (6) producing environmental relationship matrices for genomic prediction.
To implement these applications, the processed environmental data must be translated into quantitative descriptors by summarizing cumulative means, sums, or quantiles, such as in summaryWTH. However, these data must be mean-centered and scaled to assume a normal distribution and avoid variations due to differences in scale dimensions. To create environmental similarity kernels, Costa-Neto et al. (2020a) suggested using quantile statistics to better describe each variable's distribution across the experimental network. Thus, this allows a statistical approximation of the environmental variables' ecophysiological importance during crop growth and development. In this context, we developed the W_matrix function to create a double-entry table (environments/sites/years environmental factors). Contrary to env_typing, the W_matrix function was designed to sample each environmental factor's quantitative values across different environments.
The same arguments for the functions summaryWTH and env_typing are applicable (env.data, id.names, env.id, days.idvar.id, statistic, by.interval, time.window, and names.window). However, in W_matrix, arguments center¼ TRUE (by default) and scale ¼ TRUE (by default) denote mean-centered (w À Àw) and scaled ðw À Àw=r), in which w is the original variable, Àw and r are the mean and standard deviation of this covariable across the environments. Quality control (QC ¼ TRUE argument) is done by removing covariables with more than r TOL 6 r, where r TOL is the tolerance limit for standard deviation, settled by default argument as sd.tol ¼ 3.
To exemplify a basic use of W_matrix, let us consider the maizeWTH object, involving only weather variables temperature, rainfall and precipitation, while assuming a quality control of sd.tol ¼ 4. The time intervals were settled for every ten days (default), and statistic as 'mean' for each variable at each time interval (BOX 10).

MODULE 3-enviromic similarity and phenotype prediction
The prediction of phenotypes across multiple environments can be conducted using different approaches, such as mechanistic crop models and empirical regressions, in which environmental and/or genomic information is necessary for training accurate models. The latter, named after whole-genome prediction (GP, Meuwissen et al. 2001), has revolutionized both plant and animal breeding pipelines around the world. Most approaches rely on increasing the accuracy of modeling genotype-phenotype patterns and exploring them as a predictive breeding tool. Among the several enrichments of computational efficacy and breeding applications, the integration of genomic by environment interaction (G Â E) has boosted the ability of genomic-assisted selection to evaluate a wide number of genotypes under several growing conditions over multiple environmental trials (METs). Heslot et al. (2014) and Jarquín et al. (2014) introduced ECs to model an environmental source of the phenotypic correlation across MET. These approaches aim to model the reaction-norm of genotypes across MET, i.e., how different genotypes react to different environmental gradient variations. In most cases, reaction-norm modeling serves as an additional source of variation for complementing the genomic relatedness among individuals tested and untested under known environmental conditions. Thus, in addition to the genomic kernels, envirotypeinformed kernels can be used to capture macro-environmental relatedness that shapes the phenotypic variation of relatives, the so-called enviromic kernel (Costa-Neto et al. 2020a).
In the third module of the EnvRtype package, we present the tools used to implement this modeling approach. Three main functions were designed for this purpose. First, the function env_kernel can be used for the construction of environmental relationship kernels using environmental information. Second, the get_kernel aims to integrate these kernels into statistical models accounting for different structures capable of explaining the phenotypic variation across MET. Finally, the function kernel_model BOX 10. env_typing for several variables > # toy set of environmental data > data("maizeWTH") > # type the variable names > var ¼ c("PRECTOT", "T2MDEW", "T2M_MAX", "T2M_MIN") var.id¼var, statistic¼"mean", by.interval¼TRUE) > dim(W) BOX 9. Basic use of env_typing for more variables > var ¼ c("PRECTOT", "T2MDEW") # variables > env.data ¼ get_weather(env.id ¼ 'LOSBANOS', country ¼ 'PHL', lat ¼ 14.170, lon ¼ 121.241, variables.names ¼ var, start.day ¼ '2000-03-01',end.day ¼ '2020-03-01') > # cardinals and data-driven limits >card ¼ list (PRECTOT ¼ c(0,5,10,25,40,100), can be used to fit regression models accounting for environmental and or genomic data using a computationally efficient Bayesian approach. In the following subsections, we describe the kernel methods for modeling envirotype relatedness. Then we present the statistical models that can be built with these kernels.

Enviromic Kernels with env_kernel
In this package, we use two types of kernel methods to compute enviromic-based similarity. The first consists of the traditional method based on the linear variance-covariance matrix (Jarquín et al. 2014). This kernel is equivalent to a genomic relationship matrix and can be described mathematically as: where K E is the enviromic-based kernel for similarity among environments and W matrix of ECs. Note that we use W matrix, but any other source of data from environments can be used here as EC (e.g., typologies, disease evaluations, management). The second method is a nonlinear kernel modeled by Gaussian processes, commonly called the Gaussian Kernel or GK, and widely used in genomic-enabled prediction (Gianola and van Kaam 2008;de los Campos et al. 2010;Cuevas et al. 2017). The use of GK for modeling K E was proposed by Costa-Neto et al. (2020a) and is described in a similar way to the approach already used for modeling genomic effects: where h is the bandwidth factor (assume as h ¼ 1 by default) multiplied by the Euclidean Distance D 2 ii 0 ¼ P k w ik À w i 0 k ð Þ 2 for each pairwise elements in the W ¼ fw i ; w i 0 g. This means that the environmental similarity is a function of the distance between environments realized by ECs. The scalar variable Q denotes the quantile used to ponder the environmental distance assumed as Q ¼ 0.5, equal to the median value of D 2 ii 0 . The h can be computed using a marginal function described by Pé rez-Elizalde et al. (2015).
The env_kernel function implements both methods. It has the following main arguments: env.data, env.id, gaussian, and h.gaussian. The first two arguments work in the same manner previously described for other functions. The gaussian argument (default is gaussian ¼ FALSE) denotes if models (1) or (2) are used to compute K E . If gaussian ¼ TRUE, then the Gaussian kernel (equation 2) is used, and h.gaussian must be inserted to compute it. In the Y argument (default is Y ¼ NULL), it is possible to insert a phenotypic record to be used in the marginal function to compute a data-driven h (Pé rez-Elizalde et al. 2015).
The env_kernel function has two outputs called varCov (relatedness among covariables) and envCov (relatedness among environments). The first is useful to deepen the understanding of the relatedness and redundancy of the ECs. The second output is K E . This matrix is the enviromic similarity kernel integrated into the GP models.
A basic use of env_kernel is presented below. Consider the W matrix created in BOX 10 for the maizeWTH object (5 environments in Brazil). The K E . value using linear covariance and the Gaussian kernel is given as (BOX 11):

Phenotype prediction across multiple environments
After constructing the relationship kernels for environmental relatedness, it is possible to fit a vast number of statistical models using several packages available in R CRAN. However, it is important to consider that statistical models containing more complex structures (e.g., more than one genetic effect plus G Â E and environmental information) are models that require more expensive computational effort and time. Under Bayesian inference, which demands multiple iterative sampling processes (e.g., via Gibbs sampler) to estimate the variance components, the computational effort may be more expensive. Among the R packages created to run Bayesian linear models for genomic prediction, three main packages may be highlighted: BGLR-Bayesian Generalized Linear Regression (Pé rez and de los Campos 2014), BMTME-Bayesian Multi-Trait Multi-Environment (Montesinos-Ló pez et al. 2016) and BGGE-Bayesian Genotype plus Genotype by Environment ). However, BGGE employs an optimization process that can be up to five times faster than BGLR and permits the incorporation more kernel structures than BMTME. For this reason, we implement the kernel_model function that runs the same optimization algorithm for Hierarchical Bayesian Modeling used in BGGE (see Appendix 5).
Below we describe a generic model structure that covers the diversity of possible combinations for modeling the phenotypic variation across MET. This model considers k genomic and l enviromic effects, plus fixed-effects and a random residual variation: where y is the vector combining the means of each genotype across each one of the q environments in the experimental network, in which y ¼ y 1 ; y 2 ; . . . y q ½ T . The scalar 1l is the common intercept or the overall mean. The matrix X f represents the design matrix associated with the vector of fixed effects b. In some cases, this vector is associated with environmental effects (target as fixed-effect). Random vectors for genomic effects (g s ) and enviromic-based effects (w r ) are assumed to be independent of other random effects, such as residual variation (e). It is a generalization for a reaction-norm model because, in some scenarios, the genomic effects may be divided as additive, dominance, and other sources (epistasis) and the genomic by environment (G Â E) multiplicative effect. In addition, the envirotyping-informed data can be divided into several environmental kernels and a subsequent genomic by envirotyping (G Â W) reaction-norm kernels. Based on Equation 6, the theory underpinning the get_kernel function is summarized in three • Benchmark Genotypic Effects. These are baseline models accounting only for genotype-based effects, mostly associated with pedigree-based or genomic realized kinships. P p s¼1 g s 6 ¼ 0 and P q r¼1 w r ¼ 0, in which g s may be related to main BOX 11. Basic use of env_kernel for linear and nonlinear kernel methods genotype-effect (G) in the case of the main genotype-effect model (MM); and G plus a genotype by environment deviation (GþGÂE), in the case of the so-called MDs model. Note that multiple genotype-relatedness kernels may be incorporated, e.g., for additive (A) and dominance (D) deviations and other sources of information from "omics." All genomic kernels must have the pÂp dimension, in which p is the number of genotypes. However, this model does not consider any environmental effect. • Enviromic-enriched Main Effects. We added the acronym "E" to the MM and MDs models to denote 'enviromic-enriched' for EMM and EMDs models. These models consider P p s¼1 g s 6 ¼ 0 and P q r¼1 w r 6 ¼ 0, in which g s are related to G (EMM) or GþGÂE (EMDs), and w r are only the main envirotype effects (W). In this type of model, the environmental effects can be modeled as a fixed deviation (using X f b) plus a random envirotyping-based variation ( P q r¼1 w r ). • Enviromic-based Reaction-Norm. We added the acronym "RN" for "reaction-norm" to the MM and MDs models, resulting in RNMM and RNMDs models, respectively. As described in (ii), the environmental effects can now be modeled as fixed deviations (using X f b) plus a random envirotyping-based variation ( P q r¼1 w r ). However, those RN models consider P p s¼1 g s 6 ¼ 0 and P q r¼1 w r 6 ¼ 0, in which g s are related to G (RNMM) or GþGÂE (RNMDs), and w r are related to main envirotype effects (W) plus an envirotype Â genomic interaction (GÂW). In this context, RNMM accounts for the variation due to GþWþGW, whereas RNMDS considers GþGEþWþGW.
Getting covariance structures with get_kernel: The get_kernel function has four main arguments: a list of genomic relationship kernels (K_G); a list of environmental relationship kernels (K_E); and a phenotypic MET data set (data), composed of a vector of environment identification (env), a vector of genotype identification (gid), and a vector of trait values (y); at last, the model argument sets the statistical model used ("MM","MDs", "EMM", "EMDs", "RNMM" and "RNMDs"). Each genomic kernel in K_G must have the dimension of p Â p genotypes. This argument assumes K_G ¼ NULL by default. If no structure for genetic effects is provided, the get_kernel function automatically assumes an identity matrix for genotype effects, in which it considers no relatedness among individuals. Finally, the argument stage (stage ¼ NULL by default) states which development stages can be used to create stage-specific enviromic kernels. More detail about the latter is given in the Example 3 of the Results section.
In the same manner, K_E could have the dimension of q Â q environments, but the environmental kernels can be built at the phenotypic observation level in some cases. It means that for each genotype in each environment, there is a different EC, according to particular phenology stages or envirotyping at the plant level. Thus, using the additional argument dimension_KE ¼ c("q", "n") (default is "q", for environment), the K_E may accomplish a kernel with n Â n, in which n ¼ pq. The basic usage of get_kernel is given below (BOX 12) and its detailed applications are provided in the Results section.
Modeling the phenotypic variation with kernel_models: Finally, the kernel_model function has four main arguments: a phenotypic BOX 12. Basic usage of get_kernel function >data("maizeYield") >data("maizeG") >data("maizeWTH") # toy set of environmental data >y ¼ "value" # name of the vector of phenotypes >gid ¼ "gid" # name of the vector of genotypes >env ¼ "env" # name of the vector of environments >ECs ¼ W_matrix(env.data ¼ maizeWTH, var.id ¼ c("FRUE",'PETP',"SRAD","T2M_MAX"),statistic ¼ 'mean') ## KG and KE might be a list of kernels ## Creating kernel models with get_kernel MET data set (data), composed of a vector of environment identification (env), a vector of genotype identification (gid), and a vector of trait values (y), a list object for random effects (random, from get_kernel) and a matrix for fixed effects (fixed). For the Hierarchical Bayesian Modeling (see Appendix 5), the arguments for number of iterations (iterations, default is 1,000) and the number of samples used for burn-in (burnin, default is 200) and thining (thining, default is 10) must be provided. The function has two main outputs: the predicted phenotypes (yHat), variance components for each random effect (VarComp). Below we show a brief example of the use of kernel_model for a MDs model (BOX 13) using the same inputs used in BOX 12.

Practical examples
Three practical examples were implemented to present a comprehensive overview of the most important functions of EnvRtype. First, we illustrate the use of EnvRtype for starting an envirotyping pipeline across different locations in the world (Example 1).
Second, we used the toy data set (maizeG, maizeWTH and maizeYield) to demonstrate different environmental similarities based on different environmental factors (Example 2). We used two envirotyping levels (per environment and per development stage at each environment) and two ECs (FRUE, PETP, and FRUEþPETP) to demonstrate different ways to build environmental relatedness for GP. This type of application can be useful for researchers interested in predicting the individual genotypic responses shaped by genomic and enviromic-specific factors across existing experimental trials or for the assembly of virtual scenarios.
Finally, in Example 3 we ran a genomic prediction study case in maize (maizeG, maizeWTH,and maizeYield) involving three models (M1, Baseline Genomic MDs model; M2, Reaction Norm RNMM model, and M3, Reaction Norm RNMM considering a different enviromic kinship for each development stage) and two crossvalidation schemes (CV1: prediction of novel genotypes, using 20% of the data as a training set; CV00: prediction of novel genotypes at novel environments, using 3 of the 5 environments plus 20% of the genotypes as a training set).
From the collected variables, it is possible to type any environmental factor or a core of environmental factors ( Figure 2B). As a toy exemplification (BOXES 14 and 15), we used the variable "T2M" (the daily average temperature at 2 m) to discover ETs and compute environmental similarity ( Figure 2C). In this case, we used the Gaussian kernel as a sign of environmental distance, but it can also be used as kinship for predictive breeding (Costa-Neto et al. 2020a).
It is possible to see in this toy example, that perhaps locations in different continents might have similar ET trends for air temperature. This process can be done for several variables (single or joint) to better describe those similarities.

Example 2: modeling genomic-enabled reaction-norm
To illustrate the use of different ECs in modeling genomic-enabled reaction-norms, we ran a toy example involving a tropical maize data set available in EnvRtype (see BOX 2). From Equation (3), we assumed the following baseline model: y ¼ 1l þ X f bþ gþ gEþ e; where X f b is the fixed environmental effects, g is the random genomic-additive effects and gE is the genomic Â environment interaction, modeled by a block diagonal matrix of genomic effects across environments (MDs model). Thus, we added enviromic effects following two envirotyping levels: (1) envirotyping mean values per environment (entire croplife), and (2) envirotyping for each time interval (development stage) across crop life, assuming fixed stages in terms of days after emergence. For each envirotyping level, we considered two types of ECs: the factor of temperature effect over radiation use efficiency (FRUE) and the difference between rainfall precipitation and crop evapotranspiration (PETP) (Figure 3). From Equation (3), these matrices of ECs (EC1, EC2, EC3, EC4, EC5, and EC6) were arranged in three kernel structures using the RNMDs model, with the baseline genomic model updated to y ¼ 1l þ X f b þ g þ gEþ EC þ gEC þ e, which resulted in 6 models (M1, M2, M3, M4, M5, and M6) according to each EC matrix used, plus a baseline genomic model (M0).

Example 3: Genomic Prediction using kernels for different development stages
Finally, we illustrate a case study of genomic prediction (GP) for two prediction scenarios (CV1 and CV00). In order to demonstrate the get_kernel function, we assumed a nonlinear kernel for enviromic effects (gaussian ¼ TRUE). Different from the last example, in this study, we created different enviromic kernels for each development stage (M3). This model was compared to the benchmark reaction-norm model (M2, Jarquín et al. 2014) and the baseline multi-environment genomic model (M1, the MDs model Lopez-Cruz et al. 2015). Thus, we ran the RNMM, which means that for M2 and M3 the genomicÂenvironment effects are computed as Kronecker product between enviromic and genomic kinships, and for M1, as a block diagonal genomic matrix (MDs). In this example, we hope to demonstrate that for the same environment, it is possible to create different enviromic relatedness kernels according to the similarities BOX 15. Discovering ETs and similarity among locations found in different environments at the same development stage (Figure 4).
We assume four development stages in maize: first vegetative stage (S 1 , from V1 to V6); second vegetative stage, in which the rate of biomass growth is increased in tropical maize (S 2 , from V6 to VT); anthesis-silking stage (S 3 , from VT to R1); and the grain filling stage (S 4 , from R1 to R3). To create this relatedness, 13 ECs were computed from daily weather data. The full matrix of ECs for all development stages considered a combination of 13 ECs by 4 development stages, thus resulting in 52 environmental descriptors (BOX 18). Figure 3 shows that for the same five environments, the crops' growing conditions, and consequently the patterns of similarity, differ according to the crop development stage. So, it's feasible to hypothesize a relatedness build up for some development stages may be more informative of the environmental-phenotype covariance among field trials than others, and perhaps more so than all environmental variables at all development stages. The biological explanation behind this hypothesis relies on ecophysiology, in which the plants are more or less sensitive to environmental variations at specific development stages. If the growing conditions drastically differ at some key development stage, and do not differ in others that do not have a strong impact on the final phenotypic expression, it is feasible to assume that the environmental variance for those "non BOX 16. Envirotyping levels and model structures for modeling reaction-norm > data("maizeYield") # phenotypic data > data("maizeG") # GRM > data("maizeWTH") # weather data > y ¼ "value" # name of the vector of phenotypes > gid ¼ "gid" # name of the vector of genotypes > env ¼ "env" # name of the vector of environments ### 1-Environmental Covariables (ECs) > stages ¼ c('VE','V1_V6','V6_VT','VT_R1','R1_R3','R3_R6',"H") > interval ¼ c (0,7,30,65,70,84,105) BOX 17. Running kernel_model to compute variance components polymorphic stages" may lead to an increased noise in the enviromic relatedness, reducing then the accuracy of GP using reactionnorms. In order to test this hypothesis, we ran the crossvalidation (BOX 19) for two prediction scenarios (CV1 and CV00). Below we present an example of code for running GP. More detail about the codes is given in the Supplementary Codes section. Table 4 presents a summary of the variance components for the random effects of each model. An increased trend in the genomic variance was observed as the inclusion of some enviromic data (from 0.426 in M1 to 0.509 in M2 and 0.555 in M3) and reduction of residual error variance (from 0.848 in M1 to 0.269 in M2 and 0.262 in M3). Despite that, for this germplasm evaluated at this experimental network, the effect of G Â E decreased when estimated using an enviromic kinship (from 0.353 in M1 to 0.269 in M2), but were better understood when dissected for each development stage (M3). The variance components for environmental relatedness plus genomicÂstage interaction was higher for reproductive-related stages (S 3 and S 4 ), suggesting that these stages may be more important in explaining environmentalphenotype covariances across field trials than using all environmental information to build up enviromic kinships. Finally, in Table 5, we present the accuracy of the statistical models for each prediction scenario. These results were obtained by the average Pearson's Moment Correlation (r) for each one of the 30 random samples of training sets. The use of enviromic information was beneficial for both prediction scenarios. The ability to predict novel genotypes at known growing conditions (CV1), using only the phenotypic records of 20% of the germplasm led to an increase from r ¼ 0.130 (baseline genomic model, M1) to r ¼ 0.762 (benchmark reaction-norm model, M2), in which the latter was not different for the reaction-norm accounting for stagespecific enviromic kernels (r ¼ 0.760 for M3). However, great differences between M2 and M3 were observed for predicting novel genotypes at novel growing conditions (CV00). In this scenario, based on the phenotypic records from 20% of the germplasm evaluated in 3 of the 5 environments (so the remaining 2 were used as testing-environments), the M3 model outperformed all models (r ¼ 0.504) in comparison to M2 (r ¼ 0.485) and M1 (r ¼ 0.102). This last model has the worst performance due to the lack of phenotypic records.

Discussion
The collection, processing, and use of envirotyping data in genomic-based studies do not depend only on the quality of the data sources. Here, we demonstrate that the increased ecophysiological knowledge in envirotyping increases statistical models' accuracy in genomic prediction and provides a better explanation of the sources of variation, while increasing those efficiency models. The correct use of envirotyping data depends on the quality of data processing and is specific for each crop species (or living organism). The same "environment" (considering a time interval for a target location) may result in different ETs for each organism, depending on their sensitivity to constant and transitory environmental variations. Thus, in this study, we presented some of those concepts and created functions (and gathered others from different R packages) to facilitate the use of envirotyping data in quantitative genomics.
We presented a user-friendly software, but also a costeffective pipeline aimed at democratizing the use of envirotyping in several fields of plant research. EnvRtype is an open-source package and all advances made up until now are freely available, a situation that can ultimately boost the predictive breeding for low-budget research programs to invest in environmental sensors or perform experiments across a geographically heterogeneity region. Thus, as the remote sensing tools and databases Table 3 Summary of variance components [and confidence intervals] for 7 genomic-based reaction-norm models with GxE (RNMDs), considering three envirotyping levels (no envirotyping, envirotyping by environment and envirotyping by development stage at environment) and three combinations of two environmental covariates (FRUE, PETP and FRUEþPETP). Models were fitted using all phenotypic records available (n ¼ 150 genotypes at 5 environments ¼ 750 records). Genomic kinships were based on additive effects. Enviromic kinships were built using a linear-covariance matrix (gaussian ¼ FALSE). FRUE and PETP denote the covariates "effect of temperature on radiation use efficiency" (from 0 to 1) and the "difference between daily precipitation and daily evapotranspiration" (mm day À1 ), respectively.  Figure 4 Nonlinear enviromic kernels (Gaussian) based on 13 environmental covariates over five tropical maize environments (locations SE, PM, NM, IP, and SO). (A) Enviromic kernel using a combination of 13 covariates at 4 development stages in maize (total of 52 ECs). (B) Enviromic kernel using variables 13 covariates at the initial vegetative stage (from V1 to V6). (C) Enviromic kernel using variables 13 covariates at the leaf growing stage (from V6 to VT). (D) Enviromic kernel using variables 13 covariates at the anthesis-silking interval (from VT to R1). (E)Enviromic kernel using variables 13 covariates at the grain filling interval (from R1 to R3). evolve, the power of EnvRtype to perform a quick and accurate envirotyping pipeline also evolves. In addition, other types of data sources can easily be integrated in the modeling approaches. For example, the use of high-throughput phenotypes can easy be integrated in predictive models by using get_kernel to build phenomics-realized relationship kernels (Crain et al. 2018;Rincent et al. 2018;Cuevas et al. 2019 Genomic Prediction using kernel_model >source('https://raw.githubusercontent.com/gcostaneto/SelectivePhenotyping/master/cvrandom.R') >rep ¼ 10; seed ¼ 1010; f ¼ 0.20; iter ¼ 5E3; burn ¼ 1E3; thin ¼ 10; provided end-to-end pipeline to interplay envirotyping in quantitative genomics, the users can adapt their codes and integrate different sources of information using the EnvRtype functions, such other omics data (Westhues et al. 2017).
We also showed that global envirotyping networks could be built using remote sensing tools and functions provided in EnvRtype. The combination of remote sensing þ typing strategies is a powerful tool for turbocharging global partnerships of field testing and germplasm exchange. It also contributes to increasing the prediction of genotypes across a wide range of growing conditions, i.e., the so-called adaptation landscapes (  ). Associated with predictive GIS tools, the recommendation of cultivars could also be leveraged for specific regions (Costa-Neto et al. 2020b). It could also increase a better definition of field trial positioning (Rincent et al. 2017;Resende et al. 2020) and how breeding strategies have impacted crop adaptation in the past (Heinemann et al. 2015). Evidence of this suggestion is the increased ability of predicting novel genotypes at novel growing conditions achieved by obtaining a deeper understanding of how environments are more or less related at each development stage of crop life.
Despite the benefits and potential uses of EnvRtype, we can envisage the following limitations: (1) the resolution of satellitebased weather system (Nasa Power data base), corresponding to 0.5 Â0.5 ($55,5 km Â 55,5 km) of longitude by latitude, may compromise the discrimination of environments in close geographical proximity; (2) the quality of point-estimates of environmental data using extract_GIS function from public GIS databases depends on the file resolution available; (3) the need for a good registration of geographic coordinates of the target environment, but also on knowing the "window" between harvest and sowing (for agricultural crops); and (4) management factors must be included manually in W_matrix, which we strongly suggest in order to avoid mistakes.  Table 4 Summary of the variance components for three modeling structures (M1, Baseline Genomic Model; M2, Benchmark Reaction-Norm model; M3, Reaction-Norm for each development stage) considering different sources of phenotypic variation due to genomic and enviromic effects. Confidence intervals (a ¼ 5%) for each variance component is given between square brackets. Horizontal dashed lines separate the genomics, environmental and genomic Â environment effects. Genomic kinships were based on additive effects. Enviromic kinships were built using a nonlinear method (gaussian ¼ TRUE). For M1 model, GxE is based on the Kronecker product between an identity environment matrix and the genomic kinship matrix. For M2, this Kronecker product is based on the enviromic kinship matrix instead of an identity matrix, in which this enviromic kinship were built up using envirotyping data to mimic an environmental relatedness among field trials. Table 5 Accuracy (6 standard deviation) of statistical models for phenotype prediction using genomic (M1) and genomic plus enviromic sources of variation (M2 and M3) in predicting novel genotypes at known environments (CV1), using 20% of the genotypes as a training set, and novel genotypes and novel environments (CV00), used as a training set 20% of the genotypes phenotyped at 3 from the 5 environments . Grain yield data are mean-centered and scaled (MaizeYield object). The genotyping relationship for additive effects is based on 52,811 SNPs available to make the predictions (maizeG object). The phenotypic and genomic data are credited to Helix Seed Ltda. Company. Finally, weather data are presented for each of the five environments (maizeWTH object). All codes are available in the BOX codes (from 1 to 19), and as Supplementary Codes (Supplementary Codes.R file). Additional tutorials can be found at Git Hub (https://github.com/allogamous/EnvRtype).

Random effect
Supplementary material is available at G3 online.
Funding APPENDIX Appendix 1. What is "environment" and how to remotely collect environmental data We use the term "environment" to refer to delimited unit that combines location, planting date, and management, which gathers the fluctuation for a core of environmental factors. Thus, the first step of any envirotyping study is to collect reliable environmental data. However, for most breeding programs worldwide, this step is limited by the availability of sensing equipment (e.g., weather stations) installed in the field or nearby site. It is important to highlight that some equipment can be expensive or difficult to access for some research groups in certain regions, more specifically developing countries. For this reason, below, we present two justifications for incorporating a remote environmental sensing routine (insilico) into this package. Then, we present recommendations to enrich the envirotyping platforms to collect and organize environmental data that will be useful to breeders' decision making. First, in order to facilitate the step of collecting environmental data, we decided to include a routine for collecting raw daily weather data through the NASA POWER database (https://power. larc.nasa.gov/), which can access daily information anywhere on earth. This database was integrated using the tools provided by the nasapower R package (Sparks, 2018). In addition, we integrated the raster R package to support downloading climatic data (from the WorldClim database) and SRTM (Shuttle Radar Topography Mission, which provides elevation information). The information from both databases is freely available and can be downloaded using geographical coordinates (latitude and longitude, given in decimal degrees, both in WGS84 format) for a specific window of time (e.g., from sowing to harvest).
Second, the collected environmental data processing requires some expertise in fields such as agrometeorology, soil physics, and ecophysiology. For this to be useful in explaining crop adaptation, the environmental data must be representative of some envirotype-to-phenotype dynamic linked to certain ecophysiological knowledge (e.g., air temperature, relative air humidity, and solar radiation driving the crops' evapotranspiration and, consequently, the soil-water balance). A direct example of the importance of processing raw envirotyping data into ecophysiological enriched information is given for the "daily air temperature"variable. This variable can be processed in heat units, heat stress effects on radiation use efficiency, and thermal range, which is species-specific. For some traits such as grain yield in maize, the impact of temperature-derived factors differs from what is observed in traits, such as plant height or flowering time. This dynamic also varies across crop developmental stages, which can be more or less prone to become a stress factor at certain stages (e.g., in maize, heat at the flowering time has a more significant impact on grain yield). Below is a detailed description of some of those subroutines.