Chile's agricultural research institute plant tissue analysis dataset

This dataset holds 9,175 entries that report the nitrogen, phosphorus, potassium, calcium, magnesium, zinc, manganese, copper and boron contents of various plant species, with a focus on crops. The dataset covers 94 plant species and reports nutrient concentrations for 14 different plant tissues. The data come from the Soil and Plant Nutrition Lab of the Chilean Agricultural Research Institute, which provided services to farmers in the Chilean Central Valley between 2006 and 2020. The analytical methods used to generate these data were consistent across all years, which supports the reliability of the information. Specifically, nitrogen content was determined using the Kjeldahl method, while all other analytes were quantified via colorimetry (phosphorus and boron) or atomic absorption spectrometry following high-temperature oxidation and dissolution of the ash with hydrochloric acid. The dataset has numerous potential applications, including the estimation of crop nutrient extraction rates, the identification of nutrient deficiencies or excesses, and the provision of reference or prior information for researchers studying plant physiology. The dataset includes 21 Chilean endemic species, which might be of particular interest to researchers studying the biodiversity and ecology of Chile's Central Valley.


Specifications
Subject: Agronomy and Crop Science
Specific subject area: Crop nutrition
Type of data: Table
How the data were acquired: The dataset comprises plant tissue analysis for total nitrogen (%), phosphorus (%), potassium (%), calcium (%), magnesium (%), zinc (mg/kg), manganese (mg/kg), copper (mg/kg) and boron (mg/kg). Total nitrogen was measured by Kjeldahl method (Gerhardt Kjeldatherm, Turbosog and Vapodest 50 s). The remaining elements were measured after high-temperature oxidation of the organic matter and dissolution of the ash with hydrochloric acid. Digest analyte concentrations were determined by atomic absorption spectrometry (Thermo Scientific iCE 3000

Value of the Data
• The dataset can be used to estimate crop nutrient extraction and to set out fertilization plans, benefiting farmers, agronomists, and scientists.
• The dataset can be used to estimate deficiency or excess ranges for different crops, or to set out likelihoods in future analyses or studies.
• The dataset can be used as a reference by researchers studying plant physiology and nutrient metabolism.
• The dataset can serve to set out priors in Bayesian calculations with new data.
• Plant tissue analysis data on crops growing on Chilean soils are scarce. Soils in Chile have high contents of copper and manganese [2].
• The dataset includes crops and plants not commonly studied, such as 21 Chilean endemic tree species.
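As an illustration of the first use listed above, nutrient extraction can be approximated as harvested dry matter multiplied by the tissue concentration. The sketch below is a minimal example under that assumption; the function name, the dry matter yield and the concentrations are hypothetical and are not taken from the dataset.

```python
def nutrient_removal_kg_per_ha(dry_matter_kg_per_ha: float,
                               concentration: float,
                               unit: str) -> float:
    """Nutrient removed with the harvested biomass, in kg/ha.

    `unit` follows the dataset conventions: "%" for N, P, K, Ca and Mg,
    "mg/kg" for Zn, Mn, Cu and B.
    """
    if unit == "%":
        # 1 % means 1 kg of nutrient per 100 kg of dry matter.
        return dry_matter_kg_per_ha * concentration / 100.0
    if unit == "mg/kg":
        # 1 mg/kg means 1 kg of nutrient per 1,000,000 kg of dry matter.
        return dry_matter_kg_per_ha * concentration / 1_000_000.0
    raise ValueError(f"unsupported unit: {unit!r}")


# Hypothetical harvest: 8000 kg/ha of dry matter at 2.5 % N and 40 mg/kg Zn.
n_removed = nutrient_removal_kg_per_ha(8000, 2.5, "%")
zn_removed = nutrient_removal_kg_per_ha(8000, 40, "mg/kg")
print(f"N: {n_removed:.1f} kg/ha, Zn: {zn_removed:.2f} kg/ha")
```

In practice the concentration would be drawn from the PTA_DATA records for the crop and tissue that is actually removed from the field.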

Objective
The Soil and Plant Nutrition Laboratory of the Chilean Agricultural Research Institute analyzes about 700 plant samples a year, serving farmers alongside its research work. The laboratory passes the results directly to agronomists and consultants and has archived the reports in digital format since 2006. Previously, all reports were archived on paper for two years and then destroyed. Two years is also the embargo period for the reported data.
Since 2006, lab reports have moved from physical archive shelves to indefinite storage on a hard drive. As a result, they serve no further purpose once the embargo period ends.
Recognizing this, we identified the digital storage unit as a means to organize and provide access to tissue analysis data from both wild plants and crops cultivated in the soils of the Chilean Central Valley. The dataset encompasses all records from 2006 to 2020, with the last records having reached the end of their embargo period.

Data Description
The dataset comprises one SQL file with three data tables (Fig. 1). The data tables are labeled CROPS, SAMPLED_TISSUE, and PTA_DATA. The CROPS data table has four columns. The first holds a unique ID, which is also the primary key of the CROPS table. The other three columns hold crop names in English (CROP_NAME_EN), in Spanish (CROP_NAME_ES), and as scientific names (CROP_NAME_SC). Legitimate or nomen conservandum scientific names were included to facilitate translation to languages other than the two provided. The data table has 94 rows. The SAMPLED_TISSUE data table has three columns. The first holds a unique ID, which is also the primary key of the table. The other two columns hold descriptors for the plant tissue on which the analyses were performed: one column has the tissue names in English (TISSUE_NAME_EN), the other in Spanish (TISSUE_NAME_ES). The data table has 14 rows.
The PTA_DATA data table has twelve columns and contains the plant tissue analysis data. The first two columns relate the PTA_DATA table to the other two by attaching to every row the crop ID and tissue ID that correspond to the analysis in that row. The third column assigns a unique ID to every entry of the PTA_DATA table. The remaining nine columns hold the analyte concentrations. Reported values for nitrogen, phosphorus, potassium, calcium and magnesium are expressed in %. Values for zinc, manganese, copper and boron are expressed in mg/kg.
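The relationship between the three tables described above can be sketched as a join. The example below is a minimal sketch that uses an in-memory SQLite database as a stand-in for MariaDB. The ID column names (CROP_ID, TISSUE_ID, PTA_ID) and the analyte column names (N, P, K, CA, MG, ZN, MN, CU, B) are assumptions, since the text does not spell them out, and the inserted rows are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CROPS (
    CROP_ID INTEGER PRIMARY KEY,          -- assumed column name
    CROP_NAME_EN TEXT, CROP_NAME_ES TEXT, CROP_NAME_SC TEXT
);
CREATE TABLE SAMPLED_TISSUE (
    TISSUE_ID INTEGER PRIMARY KEY,        -- assumed column name
    TISSUE_NAME_EN TEXT, TISSUE_NAME_ES TEXT
);
CREATE TABLE PTA_DATA (
    CROP_ID INTEGER REFERENCES CROPS(CROP_ID),
    TISSUE_ID INTEGER REFERENCES SAMPLED_TISSUE(TISSUE_ID),
    PTA_ID INTEGER PRIMARY KEY,           -- assumed column name
    N REAL, P REAL, K REAL, CA REAL, MG REAL,   -- expressed in %
    ZN REAL, MN REAL, CU REAL, B REAL           -- expressed in mg/kg
);
""")

# Invented rows, not taken from the dataset.
conn.execute("INSERT INTO CROPS VALUES "
             "(1, 'Peruvian pepper', 'Pimiento', 'Schinus molle L.')")
conn.executemany("INSERT INTO SAMPLED_TISSUE VALUES (?, ?, ?)",
                 [(1, "leaf", "hoja"), (2, "root", "raíz")])
conn.executemany(
    "INSERT INTO PTA_DATA VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 2.0, 0.2, 1.5, 1.0, 0.3, 20, 50, 8, 30),
     (1, 1, 2, 3.0, 0.3, 1.7, 1.2, 0.4, 25, 60, 9, 35),
     (1, 2, 3, 1.0, 0.1, 0.9, 0.5, 0.2, 15, 40, 6, 20)],
)

# Mean nitrogen content per crop and tissue, with readable names.
rows = conn.execute("""
SELECT c.CROP_NAME_EN, t.TISSUE_NAME_EN, COUNT(*) AS n, AVG(p.N) AS mean_n
FROM PTA_DATA AS p
JOIN CROPS AS c ON c.CROP_ID = p.CROP_ID
JOIN SAMPLED_TISSUE AS t ON t.TISSUE_ID = p.TISSUE_ID
GROUP BY c.CROP_NAME_EN, t.TISSUE_NAME_EN
ORDER BY c.CROP_NAME_EN, t.TISSUE_NAME_EN
""").fetchall()
for row in rows:
    print(row)
```

Against the distributed file, the same SELECT statement should only need its column names adjusted to those in the actual table definitions.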

Laboratory reports
All laboratory reports available on the hard drive were sorted by date. They were all Microsoft Word document files named after the internal ID of the samples processed. The hard drive held 1540 individual files from 2006 to 2020. As the format of the reports changed over the years, sample data were transferred manually to an Excel spreadsheet. Reports were discarded when: (a) the crop and sampled tissue were unknown; or (b) the analytical method followed was not clear.
Crop and plant species names were harmonized, mainly to present them in lowercase letters while keeping the first letter in upper case. As the reported names were common names in Spanish, we manually retrieved scientific names from Tropicos.org [2], keeping only those that were legitimate or conservandum. The process revealed that some plants had been reported under different common names despite corresponding to the same species, variety or cultivar. This was the case for the Peruvian pepper (Schinus molle L.) and the Magellan barberry (Berberis microphylla K. Koch). One column was added to provide English common names.
Sampled tissue names were harmonized by presenting them in lowercase letters and using singular nouns (e.g., replacing roots with root). A column was added to provide the English translation.
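The two harmonization rules described above can be sketched as small string functions. This is a minimal sketch and not the procedure actually followed, which was manual; the function names and the plural-to-singular mapping are hypothetical.

```python
def harmonize_crop_name(raw: str) -> str:
    """Collapse whitespace, lowercase the name, keep the first letter upper case."""
    cleaned = " ".join(raw.split())
    return cleaned[:1].upper() + cleaned[1:].lower()


# Hypothetical mapping from plural to singular tissue names.
SINGULAR = {"roots": "root", "leaves": "leaf", "stems": "stem"}


def harmonize_tissue_name(raw: str) -> str:
    """Collapse whitespace, lowercase the name and use the singular noun."""
    cleaned = " ".join(raw.split()).lower()
    return SINGULAR.get(cleaned, cleaned)


print(harmonize_crop_name("  PERUVIAN   pepper "))
print(harmonize_tissue_name("Roots"))
```

An explicit mapping is used for the singular forms because irregular plurals such as "leaves" cannot be handled by stripping a trailing "s".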
The spreadsheet was split and exported to comma-separated values (csv) files to produce the three separate data tables that make up the dataset. The csv files were imported into MariaDB and the dataset was backed up using mysqldump. The backup file is the one distributed: it contains a set of SQL statements that can be executed to reproduce the original database object definitions and table data.
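Splitting one flat spreadsheet into the three related tables amounts to assigning an ID to each distinct crop and tissue and replacing the names with those IDs in the analysis rows. The sketch below shows one way this could be done; it is not the authors' procedure. The flat-row keys (crop_en, crop_es, crop_sc, tissue_en, tissue_ID column names, and so on), the ID column names and the sample rows are all assumptions, and only the nitrogen column is carried through for brevity.

```python
import csv
import io


def split_flat_rows(flat_rows):
    """Split flat spreadsheet rows into crops, tissues and analysis tables."""
    crop_ids, tissue_ids = {}, {}
    crops, tissues, pta = [], [], []
    for entry_id, row in enumerate(flat_rows, start=1):
        crop_key = row["crop_sc"]
        if crop_key not in crop_ids:
            crop_ids[crop_key] = len(crop_ids) + 1
            crops.append({"CROP_ID": crop_ids[crop_key],
                          "CROP_NAME_EN": row["crop_en"],
                          "CROP_NAME_ES": row["crop_es"],
                          "CROP_NAME_SC": crop_key})
        tissue_key = row["tissue_en"]
        if tissue_key not in tissue_ids:
            tissue_ids[tissue_key] = len(tissue_ids) + 1
            tissues.append({"TISSUE_ID": tissue_ids[tissue_key],
                            "TISSUE_NAME_EN": tissue_key,
                            "TISSUE_NAME_ES": row["tissue_es"]})
        pta.append({"CROP_ID": crop_ids[crop_key],
                    "TISSUE_ID": tissue_ids[tissue_key],
                    "PTA_ID": entry_id,
                    "N": row["N"]})
    return crops, tissues, pta


def to_csv(rows):
    """Serialize a list of dicts to csv text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented rows for illustration only.
flat = [
    {"crop_en": "Peruvian pepper", "crop_es": "Pimiento",
     "crop_sc": "Schinus molle L.", "tissue_en": "leaf",
     "tissue_es": "hoja", "N": 2.0},
    {"crop_en": "Peruvian pepper", "crop_es": "Pimiento",
     "crop_sc": "Schinus molle L.", "tissue_en": "root",
     "tissue_es": "raíz", "N": 1.0},
    {"crop_en": "Magellan barberry", "crop_es": "Calafate",
     "crop_sc": "Berberis microphylla K. Koch", "tissue_en": "leaf",
     "tissue_es": "hoja", "N": 1.8},
]
crops, tissues, pta = split_flat_rows(flat)
print(to_csv(tissues))
```

Keying crops on the scientific name means that records reported under different common names for the same species end up under a single crop ID.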

Plant tissue analysis
All reported analyses were done following Kalra's Reference Methods for Plant Analysis [3]. Nitrogen was determined by the Kjeldahl method. Tissue samples were digested with sulfuric acid, salicylic acid and hydrogen peroxide. After the digestion, NH4-N in the digestate was determined by NH3 distillation and titration. The remaining elements were measured after high-temperature oxidation of the organic matter and dissolution of the ash with hydrochloric acid. Concentrations of potassium, calcium, magnesium, zinc, manganese and copper in the digestates were determined by atomic absorption spectrometry. The same digestate was used to determine phosphorus concentrations by the vanadium phosphomolybdate colorimetric reaction and boron by azomethine-H colorimetry.
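The conversion from a titration volume to a nitrogen percentage can be illustrated with the standard Kjeldahl calculation. The sketch below assumes a direct titration of the distilled ammonia with a standard acid and a reagent blank; the text does not specify the titrant, its normality or the sample mass, so every number in the example is hypothetical.

```python
N_MOLAR_MASS = 14.007  # g/mol, equivalently mg/mmol


def kjeldahl_n_percent(v_sample_ml: float, v_blank_ml: float,
                       acid_normality: float, sample_mass_mg: float) -> float:
    """Total nitrogen (%) from a Kjeldahl distillation and titration."""
    # mL of titrant times normality gives mmol of ammonia-N.
    mmol_n = (v_sample_ml - v_blank_ml) * acid_normality
    mg_n = mmol_n * N_MOLAR_MASS
    return 100.0 * mg_n / sample_mass_mg


# Hypothetical run: 7.5 mL titrant, 0.5 mL blank, 0.1 N acid, 500 mg sample.
n_percent = kjeldahl_n_percent(7.5, 0.5, 0.1, 500.0)
print(f"{n_percent:.2f} % N")
```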

Quality control
Since 2007, the Soil and Plant Nutrition Laboratory of INIA has participated in the Wageningen Evaluating Programs for Analytical Laboratories (WEPAL-QUASIMEME) proficiency tests. Over the years, the lab has scored within one standard deviation of the WEPAL agreed values and uses the WEPAL samples for internal laboratory quality assurance. Moreover, the WEPAL samples are used to calibrate the internal samples used as controls in routine analysis. In addition to the controls, by protocol 20% of the samples are analyzed in duplicate, and the analyses of a batch are accepted only when the variation between the duplicates of a sample is less than 10%.
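The duplicate acceptance rule can be sketched as a simple check. The text does not define how the variation between duplicates is computed, so the sketch below assumes the absolute difference relative to the mean of the two results; the function names and the values are hypothetical.

```python
def relative_difference_percent(first: float, second: float) -> float:
    """Absolute difference between duplicates as a percentage of their mean.

    Assumption: this is one common definition of duplicate variation; the
    laboratory protocol may use a different one.
    """
    mean = (first + second) / 2.0
    if mean == 0:
        return 0.0 if first == second else float("inf")
    return abs(first - second) / mean * 100.0


def batch_accepted(duplicate_pairs, limit_percent: float = 10.0) -> bool:
    """Accept a batch only if every duplicate pair varies by less than the limit."""
    return all(relative_difference_percent(a, b) < limit_percent
               for a, b in duplicate_pairs)


# Hypothetical nitrogen duplicates (%) from two batches.
good_batch = [(2.00, 2.10), (1.50, 1.45)]
bad_batch = [(2.00, 2.10), (2.00, 2.50)]
print(batch_accepted(good_batch), batch_accepted(bad_batch))
```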