Datasets of the environmental factors and management practices of the smallholder tomato production systems in the Colombian Andes

Datasets presented here were employed in the main work “Understanding the heterogeneity of smallholder production systems in the Andean tropics – The case of Colombian tomato growers” Gil, et al., 2019. In this region, tomato crop is developed under two technological levels: low, carried out under open field (OF) conditions and, a high, by using greenhouses (GH). For OF, data belong to five municipalities of the Guanentá province (Santander department), while for GH, data belong to five municipalities of the Alto Ricaurte province (Boyacá department). The data presented here includes information on soil parental materials and climate variables (averages ± standard deviations) relevant from the agricultural point of view, which were calculated from historical climate series. Soils natural fertility data, obtained by sampling the production areas, are also presented. After filtering the data, 67 samples were obtained for OF and 70 for the GH. For GH, a dataset with the results of 38 soil samples taken inside greenhouses were paired with the results of samples taken outside these greenhouses in uncropped areas. In the case of these soil analyses, the data correspond to tables with the results reported by the laboratory for both, chemical and physical variables, for each location in which soil samples were taken. In this work, the main dataset is one that contains the inputs of fertilizers and water, and the corresponding yields of tomato production cycles managed by local growers. This information was collected through two data collection tools: surveys (SVY) to growers about these aspects in their last production cycle, and through detailed follow-ups of selected production cycles (FWU). For the OF, we collected data from 71 cycles through the surveys and 22 through the follow-ups, while for the GH, information from 138 to 38 tomato cycles was collected through surveys and follow-ups, respectively. A table with the results aggregated by tomato cycle is attached.

one that contains the inputs of fertilizers and water, and the corresponding yields of tomato production cycles managed by local growers. This information was collected through two data collection tools: surveys (SVY) to growers about these aspects in their last production cycle, and through detailed follow-ups of selected production cycles (FWU). For the OF, we collected data from 71 cycles through the surveys and 22 through the follow-ups, while for the GH, information from 138 to 38 tomato cycles was collected through surveys and follow-ups, respectively. A table with the results aggregated by tomato cycle is attached.  Government agencies, farmer's interviews and on-field measurements Data format Raw, filtered and analysed Experimental factors We present a table and datasets that describes the environmental heterogeneity in terms of the climate and the natural fertility of the soils for two representative tomato production areas in Colombia. In terms of the soils natural fertility, the experimental factor taken into account was that the samples corresponded to uncultivated soils, seeking to avoid that the results were influenced by the addition of nutrients through fertilization or amendments. We also included data to show the effect of continuous greenhouse tomato production on the soil chemical and physical variables; in this case, the factor corresponds to the place where the samples were taken, i.e. inside or outside the greenhouse. Finally, a tomato production cycles dataset with aggregated inputs and outputs, as well as some cultivation practices is included. This dataset has two factors associated: the production zones and the tool used for data collection, which correspond to surveys and a direct observation procedure called follow-ups. Experimental features Soil fertility dataset came from a systematic sampling of the soils in each production region was carried out. Inside and outside soil samples were collected for selected greenhouse tomato production units. Finally, an extensive data collection work was carried out using two data collection tools to learn about the main management practices used by tomato growers, as well as a quantification of the amount of fertilizers used and the yield they are able to obtain.

Value of the data
The data serves to quantify the uncertainty derived from soil natural fertility along with the diversity in fertilization practices applied by smallholders.
The data provides the possibility to compare the usefulness of standard data collection tools such as surveys and detailed follow-ups to inventory the inputs and outputs of productions systems lacking a proper record keeping system. The data allows to compare the efficiency of smallholder production systems, which have different technological intensification level, and at the same time draw a baseline to increase fertilizers use efficiency.

Data
The data presented in this article serve to describe the heterogeneity associated with tomato production systems managed by small producers in the Colombian Andes [1]. Data used in this work came from two of the main tomato production areas in Colombia, located in the provinces of Guanent a (Santander department) and Alto Ricaurte (Boyac a department), where tomatoes are cropped under open field (OF) and greenhouse (GH) conditions, respectively. The distance between the two production zones is 115 km. The landscape of the Guanent a province is formed by mountains and hills crossed by rivers, which in some sectors form small riverbanks (Fig. 1). For GH zone, the elevation profile of this study area, including the main soil parent materials, is presented in Fig. 2. In Table 1 the monthly variations of the main agro-climatic variables are presented, from each production area. The datasets attached as supplementary material corresponds to the soil natural fertility defined based on a set of physico-chemical variables (Appendix A) in both production areas (at 30 cm of depth). Also, a table with soil analysis result for lots historically dedicated to tomato production under greenhouse versus nearby uncultivated lots is attached (Appendix B). Finally, we share data related with crop management, fertilizers addition and yields records obtained from surveys and detail follow-ups, recollected on commercial tomato lots (Appendix C).

Soil and climate data
For each production area, the elevation profiles were constructed with data extracted from Google Earth and supplemented with information about the soil parental materials obtained from previous studies [2e4]. Climate was described based on historical times series obtained from Colombia meteorological services agency (IDEAM). For the Guanent a province, climate time series were obtained from a weather station placed in Mogotes municipality (6 29 0 4.79 00 N, 72 58 0 28.59 00 W), while in Alto Ricaurte the station was placed in Villa de Leyva municipality (5 38 0 37.18 00 N, 73 34 0 17.85 00 W). Time series corresponded to daily records of 50 years for Boyac a (1964e2014) and of 57 years for Santander (1958e2015), but both presented a high proportion of missing data. From the dataset, we excluded years with strong or very strong affection of the El Niño (1957,1958,1965,1966,1972,1973,1982,1983,1997,1998,2015,2016) and La Niña (1973,1974,1975,1976,1988,1989), based on the Oceanic Niño Index (ONI) calculated and published by the National Oceanic Atmospheric Administration [5]. With the filtered dataset, monthly averages (±standard deviation) were calculated for four variables: precipitation (mm), temperature (ºC), relative humidity (%) and solar radiation (MJ m À2 d À1 ). The solar radiation was estimated from the hours of bright sunshine (hours day À1 ) since this variable was not included in the original data. The incoming solar radiation was estimated from extraterrestrial solar radiation and relative sunshine hours, following the equation proposed by Angstrom [6]: where I et is the daily total solar radiation above plant canopies (J m À2 day À1 ); I et is the extraterrestrial solar radiation (J m À2 day À1 ); b 0 and b 1 are empirical coefficients, s is the duration of the sunshine (h day À1 ); DL is the day length (hours). The coefficients b 0 and b 1 were determined according to climate zones (dry tropical) based in the values proposed by Frere and Popov [7] as 0.25 and 0.45, respectively. For extraterrestrial solar radiation and day length estimation, we followed the equations proposed by Christopher [8], which take into account an eccentricity correction factor and the local solar times.

Soil natural fertility results
Initially, in each production area, 75 soils samples were collected at 30 cm depth on fallow plots between May and July 2015. Sampling spots were determined by a non-aligned random sampling procedure and adjusted once on the field to sample only uncropped soils. Based on the geographic coordinates of the sampling points, we determined the altitude and slope using a 30 m digital elevation model (DEM). Soil samples were processed at a certificated soils laboratory, and the analysis included chemical properties such as nitrate (NeNO 3 ), ammonium (NeNH 4 ), phosphorus (P) and potassium (K) contents, pH, electrical conductivity (EC), soil organic carbon (SOC); and physical properties such as clay, silt and sand contents (%). Exchangeable NeNH 4 and NeNO 3 were determined by extraction with KCl, and the solution was analyzed as described by Bremner and Keeney [9]. Available P was determined by the Bray II method, K by flame photometry, pH and EC were determined in a 1:2 (v/v) soil: water solution, SOC by the wet oxidation method and texture was measured through a hydrometer. P and K contents were transformed to P 2 O 5 and K 2 O by multiplying them by 2.292 and 1.205, respectively.
Because some samples could correspond to recently cultivated soils and this would cause overestimates of the natural fertility level, a pre-treatment of the data was carried out to detect and remove outlier records. As we have a multivariate dataset, a method capable of identifying outliers on a matrix composed of n observations and p variables were used. The method used was based on determining the distance in a p-dimensional space taking into account the covariance matrix; specifically, the Mahalanobis distance was calculated using the following formula: where m is the multivariate arithmetic mean (centroid) and C is the covariance matrix. This distance is used to determine which observations can be considered atypical, as shown by Filzmoser et al. [10] for geochemical multivariable datasets. To define the threshold for atypical observations, we took into account that MD 2 i approximates to a chi-square distribution with p degrees of freedom [11]. Soil samples having a MD 2 i greater than c 2 p;0:98 were classified as outliers and removed from the dataset. Final datasets comprised 67 and 70 records for Guanent a and Alto Ricaurte provinces, respectively.

Effect of the tomato production under greenhouse on soil properties
As part of the characterization, we determined the effect of GH fertilization management on soil properties. We only considered only the GH system since tomato is the only crop planted inside these greenhouses throughout the time without any rotation with other crops, and planting can be done at any time throughout the year. Based on the above, changes in soil properties depend exclusively on the fertilization management conducted by the growers. To analyse the effect of GH fertilization, we took 30 cm depth soil samples inside GH during the fallow period and sampling only GH soils with more than two years dedicated to tomato production; and also, on adjacent non-cultivated areas (100e500 m away from the greenhouse edges). The sampling sites were randomly selected based on satellite images on which GH locations were clearly identified. Samples were analysed including the same variables used to describe the soil natural fertility in a certified soils laboratory and following the aforementioned methods. We took 38 pairs of soil samples during June 2013.

Fertility management practices
In the present work, two data collection tools were employed: surveys (SVY) and a direct follow-up observation procedure (FWU). SVY consisted on a questionnaire of closed-ended questions about technical aspects related to the last tomato growing cycle such as: cropped area, plant density, cycle length, type and amount of fertilizers applied, crop management practices, water input by irrigation and yield. Questions were redacted by the research team and subsequently were tested through a simulacrum on local growers to improve its comprehension. Once on the field, previously trained undergraduate students conducted the interviews. Between 2009 and 2010, a total of 80 and 174 surveys were carried out to randomly selected smallholder of the OF and GH technological levels, respectively.
Among surveyed growers, a random subsample was selected for a detailed FWU during complete production cycles. The FWU method corresponds to a detailed observational procedure on tomato production cycles carried out by members of the research team. With this data collection tool we focused on capture data related to use of fertilizers, irrigation, management practices and the obtained yield. At the beginning of each production cycle, we measured characteristics such as cropping area, plant density, irrigation water flow rate, and features associated to available infrastructure. Remaining factors were recorded through weekly interviews with growers, focused on crop management practices including the allocation of time and resources, irrigation schedule, dosing and timing for inputs used in fertilization, as well as fruit production. This procedure was conducted from soil preparation to the end of harvest.
In OF, we recorded data for 22 tomato cycles belonging to 10 farms, with four of them being planted consecutively on the same plots. Under GH conditions, we recorded data for 39 cycles established in nine farms, with 10 of them being consecutives. The smaller number of consecutive cycles in OF is due to the crop rotations scheme applied by each grower. The FWU method demanded a higher data collection time than the SVY method. FWU data collection took 16 months (from October 2011 to February 2013) for the OF and 30 months (from September 2010 to March 2013) for the GH. From both data collection tools, the data recorded were aggregated in order to obtain the total inputs (e.g. fertilizers) employed for the tomato production along with the total yield achieved. For commercial fertilizers, the nutrient elements contribution was obtained from the official information showed in the label. In the case of organic fertilizers, samples were taken and analyzed to determine the concentration of nitrogen, phosphorus and potassium.
The datasets of soil natural fertility, the effect of the GH fertilization strategies on the soils and the data about the management practices in relation to the fertilization are georeferenced. In each table, samples have associated the coordinates in decimal degrees taken in the official geodesic datum for Colombia (MAGNA-SIRGAS).

Acknowledgments
This datasets were collected with funds from the VLIR-UOS (Flemish Interuniversity Council) project "Multidisciplinary assessment of efficiency and sustainability of smallholder based tomato production systems in Colombia, with a roadmap for change" (grant number: ZEIN2009PR364); and from the COLCIENCIAS project "Desarrollo de un prototipo de sistema de soporte a la decision para el manejo del agua y la nutrici on del tomate a campo abierto y bajo invernadero" (grant number: 1202-669-45624).

Transparency document
Transparency document associated with this article can be found in the online version at https:// doi.org/10.1016/j.dib.2019.103844.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.dib.2019.103844.