Abstract
Soybean cultivation plays an important role in Mato Grosso do Sul (MS) and around the world. Given the inherent complexity of the agricultural system, this study aimed to develop climate-based yield prediction models using machine learning (ML), considering the most correlated meteorological variables for each condition; to test the best model with independent data; and to define zones of higher soybean yield in Mato Grosso do Sul in order to recommend better planting sites. The study was carried out in two stages. First, meteorological and soybean yield data from 47 locations in the state were used to calibrate the ML algorithms. Second, the best algorithm was used to predict soybean yields throughout Mato Grosso do Sul. Daily meteorological data from the NASA-POWER system for 2002 to 2021 were used: air temperature (T, °C), precipitation (P, mm), global solar irradiance (Qg, MJ m−2 day−1), wind speed (u2, m s−1), net radiation (Rn, MJ m−2 day−1), and relative humidity (RH, %). Reference evapotranspiration (ETo), by the standard FAO method, and the water balance (WB), by Thornthwaite and Mather (1955), were calculated for each collection point. The ML algorithms used in this stage were multiple linear regression (MLR), multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBOOSTING), and gradient boosting (GradBOOSTING). The models were calibrated using 70% of the data for training and 30% for validation. Algorithms were evaluated by accuracy, precision, and tendency. All analyses were performed in Python 3.8. Climate variables showed high spatial and seasonal variability throughout MS. Pearson’s univariate correlations between soybean yield and the climate variables of the phenological period showed distinct relationships and different intensities.
For instance, soil water storage (ARM) showed negative, neutral, and positive correlations in October, November, and December, respectively. The calibrated ML algorithms had high precision and accuracy in both calibration and testing. The best model in calibration was XGBOOSTING, with MAPE, R2, RMSE, MSE, and MAE values of 1.84%, 0.95, 2.06%, 4.24%, and 0.921%, respectively. In the test, random forest (RF), extreme gradient boosting (XGBOOSTING), and gradient boosting (GradBOOSTING) were the most precise algorithms, with R2 values of 0.71, 0.62, and 0.62, respectively.
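As an illustration of the evaluation workflow summarized above, the following is a minimal sketch, not the authors' script. It assumes a plain random 70/30 split and the usual textbook definitions of Pearson's r, MAPE, R2, RMSE, MSE, and MAE. The function names and the seed are hypothetical, and R2 is taken here as the coefficient of determination, which may differ from the definition used in the paper.

```python
import math
import random


def split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and return (train, validation) at 70/30.

    Assumption: a simple random split; the paper does not state how the
    70% and 30% subsets were drawn.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def regression_metrics(observed, predicted):
    """Return MAPE (%), R2, RMSE, MSE, and MAE for paired values.

    Observed values must be non-zero (MAPE divides by them) and must not
    all be equal (R2 divides by their total sum of squares).
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "R2": r2, "RMSE": rmse, "MSE": mse, "MAE": mae}


if __name__ == "__main__":
    # Made-up yields (kg/ha), for demonstration only; not data from the study.
    observed = [3000.0, 3200.0, 3400.0, 3600.0]
    predicted = [3100.0, 3100.0, 3500.0, 3500.0]
    print(regression_metrics(observed, predicted))
    train, validation = split_70_30(range(47))
    print(len(train), len(validation))
```

In practice the same quantities are available from standard Python libraries. The pure-Python version only makes the definitions explicit.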
Availability of data and material
The data and material are openly available.
Code availability
The software used was Python, and the scripts are available.
Change history
06 February 2023
A Correction to this paper has been published: https://doi.org/10.1007/s00704-023-04389-1
References
Adeboye OB, Schultz B, Adekalu KO, Prasad K (2017) Soil water storage, yield, water productivity and transpiration efficiency of soybeans (Glyxine Max L.Merr) as affected by soil surface management in Ile-Ife, Nigeria. Int Soil Water Conserv Res 5(2):141–50
Al-Jarrah OY et al (2015) Efficient machine learning for big data: a review. Big Data Res 2(3):87–93
Allen RG, Pereira LS, Raes D, Smith M (1998) Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and drainage paper 56. FAO, Rome
Alvares CA et al (2013) Köppen’s climate classification map for Brazil. Meteorol Z 22(6):711–28
Aparecido LE et al (2016) Agrometeorological models for forecasting coffee yield. Agronomy Journal 109(1):249–258
Aparecido LEDO et al (2020) Caracterização Hídrica Espacial e Sazonal de Mato Grosso do Sul com Dados em Grid. Rev Bras de Meteorologia 35:147–56
Battisti R, Sentelhas PC, Boote KJ (2017) Inter-comparison of performance of soybean crop simulation models and their ensemble in Southern Brazil. Field Crops Res 200:28–37
Benos L et al (2021) Machine learning in agriculture: a comprehensive updated review. Sensors 21(11):3758
Bhatnagar R (2018) Machine learning and big data processing: a technological perspective and review. In: Hassanien AE, Tolba MF, Elhoseny M, Mostafa M (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA 2018). Advances in intelligent systems and computing. Springer International Publishing, Cham, pp 468–478. https://doi.org/10.1007/978-3-319-74690-6_46
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Câmara GMS (1991) Efeito do fotoperíodo e da temperatura no crescimento, florescimento e maturação de cultivares de soja (Glycine max (L.) Merrill). Sci Agric (Piracicaba, Braz.) 54 (spe). https://doi.org/10.1590/S0103-90161997000300017
Camargo M et al (1998) Teste e Análise de Modelos Agrometeorológicos de Estimativa de Produtividade Para a Cultura da Soja na Região de Ribeirão Preto. Bragantia 57 (2). https://doi.org/10.1590/S0006-87051998000200021
Cardoso A et al (2010) Extended time weather forecasts contributes to agricultural productivity estimates. Theoret Appl Climatol 102:343–350
Carvalho-Junior WC, Calderano Filho B, Silva Chagas C, Bhering SB, Pereira NR, Pinheiro HSK (2016) Multiple linear regression and random forest model to estimate soil bulk density in mountainous regions. Pesq Agrop Brasileira 51(9):1428–1437
Chan KY et al (2020) Affective design using machine learning: a survey and its prospect of conjoining big data. Int J Comput Integr Manuf 33(7):645–669
Che D, Safran M, Peng Z (2013) From big data to big data mining: challenges, issues, and opportunities. In: Hong B et al (eds) Database systems for advanced applications. Lecture notes in computer science. Springer, Berlin, Heidelberg, pp 1–15. https://doi.org/10.1007/978-3-642-40270-8_1
Chen XW, Lin X (2014) Big data deep learning: challenges and perspectives. IEEE Access 2:514–25
Chen S, Liu W, Feng P, Ye T, Ma Y, Zhang Z (2022) Improving spatial disaggregation of crop yield by incorporating machine learning with multisource data: a case study of Chinese maize yield. Remote Sens 14(10):2340
Companhia Nacional de Abastecimento (CONAB) (2021) Acompanhamento de safra brasileira de grãos, Safra 2020/21: décimo primeiro levantamento. Conab, Brasília. https://www.conab.gov.br/info-agro/safras/graos/boletim-da-safra-de-graos
Cravero A, Sepúlveda S (2021) Use and adaptations of machine learning in big data—applications in real cases in agriculture. Electronics 10(5):552
Evstatiev BI, Gabrovska-Evstatieva KG (2021) A review on the methods for big data analysis in agriculture. Mater Sci Eng 1032(1):012053
Ferreira LB, da Cunha FF, de Oliveira RA, Fernandes-Filho EI (2019) Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM – a new approach. J Hydrol 572:556–70
Fishman J et al (2010) An investigation of widespread ozone damage to the soybean crop in the Upper Midwest determined from ground-based and satellite measurements. Atmos Environ 44(18):2248–2256
Gao F, Anderson M, Daughtry C, Johnson D (2018) Assessing the variability of corn and soybean yields in central Iowa using high spatiotemporal resolution multi-satellite imagery. Remote Sensing 10(9):1489
García-Villalba R et al (2008) Comparative metabolomic study of transgenic versus conventional soybean using capillary electrophoresis–time-of-flight mass spectrometry. J Chromatogr A 1195(1):164–173
Gibson LR, Mullen RE (1996) Influence of day and night temperature on soybean seed yield. Crop Sci 36(1):98–104
Helm JM et al (2020) Machine learning and artificial intelligence: definitions, applications, and future directions. Curr Rev Musculoskelet Med 13(1):69–76
Hoffman AL, Kemanian AR, Forest CE (2020) The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environ Res Lett 15(9):094013
Hoogenboom GCH (1997) Decision support system for agrotechnology transfer (DSSAT) Version 4.7 (https://DSSAT.net). DSSAT Foundation
Hopper NW, Overholt JR, Martin JR (1979) Effect of cultivar, temperature and seed size on the germination and emergence of soya beans (Glycine max (L.) Merr.). Ann Bot 44(3):301–8
Isabella SJ, Srinivasan S (2018) An understanding of machine learning techniques in big data analytics: a survey. Int J Eng Technol(UAE) 7:666–69
James M, Chui M (2011) Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute, Lexington, KY
Jin Z, Azzari G, Lobell DB (2017) Improving the accuracy of satellite-based high-resolution yield estimation: a test of multiple scalable approaches. Agric for Meteorol 247:207–220
Kaul M, Hill RL, Walthall C (2005) Artificial neural networks for corn and soybean yield prediction. Agric Syst 85(1):1–18
Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J South Afr Inst Min Metall 52(6):119–139
L’Heureux A, Grolinger K, Elyamany HF, Capretz MA (2017) Machine learning with big data: challenges and approaches. IEEE Access 5:7776–97
Mahajan D et al (2016) TABLA: a unified template-based framework for accelerating statistical machine learning. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp 14–26. https://cseweb.ucsd.edu/~hadi/doc/paper/2015-tr-tabla.pdf
Major DJ, Johnson DR, Tanner JW, Anderson IC (1975) Effects of daylength and temperature on soybean development. Crop Sci 15(2):174–179
Martorano LG et al (2009) Indicadores da condição hídrica do solo com soja em plantio direto e preparo convencional. Rev Bras De Engenharia Agríc e Ambient 13:397–405
Meng T, Jing X, Yan Z, Pedrycz W (2020) A survey on machine learning for data fusion. Inf Fusion 57:115–29
Miranda JM, Reinato RA, Silva ABD (2014) Mathematical model for predicting coffee yield. Rev Bras de Engenharia Agríc e Ambient 18:353–61
Mitchell TM (1997) Machine Learning, 1st edn. McGraw-Hill Science/Engineering/Math, New York. http://www.cs.cmu.edu/~tom/mlbook.html
Monteiro LA, Sentelhas PC, Pedra GU (2018) Assessment of NASA/POWER satellite-based weather system for Brazilian conditions and its impact on sugarcane yield simulation. Int J Climatol 38(3):1571–1581
Neethirajan S (2020) The role of sensors, big data and machine learning in modern animal farming. Sens Bio-Sensing Res 29:100367
Pesqueira ADS, Bacchi LMA, Gavassoni WL (2016) Associação de fungicidas no controle da antracnose da soja no Mato Grosso do Sul. Rev Ciênc Agron 47:203–12
Qiu J et al (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016(1):67
Rabunal JR, Dorado J (2006) Artificial neural networks in real-life applications. Idea Group Inc (IGI), p 375
Rosa VGCD, Moreira MA, Rudorff BFT, Adami M (2010) Estimativa da produtividade de café com base em um modelo agrometeorológico-espectral. Pesquisa Agropecuária Brasileira 45:1478–88
Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229
Santos MA, Camargo MBP (2006) Parametrização de modelo agrometeorológico de estimativa de produtividade do cafeeiro nas condições do Estado de São Paulo. https://www.scielo.br/j/brag/a/zr8jKGz9bT9YkMHWfCCqL6H/?lang=pt (accessed 9 February 2022)
Sarker MNI et al (2019) Promoting digital agriculture through big data for sustainable farm management. Int J Innov Appl Stud 25(4):1235–40
Sassi I, Ouaftouh S, Anter S (2019) Adaptation of classical machine learning algorithms to big data context: problems and challenges: case study: hidden Markov models under spark. In 2019 1st International Conference on Smart Systems and Data Science (ICSSD), pp 1–7. https://ieeexplore.ieee.org/document/9002857
Schaafsma W, Vark GNV (1979) Classification and discrimination problems with applications, Part IIa. Statistica Neerlandica 33(2):91–126
Sentelhas PC et al (2015) The soybean yield gap in Brazil – magnitude, causes and possible solutions for sustainable production. J Agric Sci 153(8):1394–1411
Slavakis K, Giannakis GB, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process Mag 31(5):18–31
Smiderle O et al (2019) Cultivo de Soja no Cerrado de Roraima. EMBRAPA. ISBN: 1809-2675. http://www.infoteca.cnptia.embrapa.br/infoteca/handle/doc/1120127
Sonka S (2016) Big data: fueling the next evolution of agricultural innovation. J Innov Manag 4(1):114–136
Sørensen CAG, Kateris D, Bochtis D (2019) ICT innovations and smart farming. In: Salampasis M, Bournaris T (eds) Information and communication technologies in modern agricultural development. Communications in computer and information science. Springer International Publishing, Cham, pp 1–19. https://eprints.lincoln.ac.uk/id/eprint/39235/
Souza GM, Tiago AC, Suzana CB, Rogerio PS (2013) A comprehensive survey of international soybean research - genetics, physiology, agronomy and nitrogen relationships soybean under water deficit: physiological and yield responses. IntechOpen. https://www.intechopen.com/chapters/40862 (accessed 21 February 2022)
Sun Y, Cheng AC (2012) Machine learning on-a-chip: a high-performance low-power reusable neuron architecture for artificial neural networks in ECG classifications. Comput Biol Med 42(7):751–57
Tacker P, Vories E (2014) Chapter 8: Irrigation. In Soybean irrigation and water use. Arkansas Soybean Handbook, University of Missouri Extension, University of Arkansas Cooperative Extension Service. https://www.uaex.uada.edu/publications/pdf/mp197/chapter8.pdf
Thornthwaite CW, Mather JR (1955) The water balance. Drexel Institute of Technology, Laboratory of Climatology, Centerton, NJ, p 104
Van Schaik PH, Probst AH (1958) Effects of some environmental factors on flower production and reproductive efficiency in soybeans. Agron J 50(4):192–97
Vanuytrecht E et al (2014) Aquacrop: FAO’s crop water productivity and yield response model. Environ Model Softw 62:351–360
Victorino EC, Carvalho LG, Ferreira DF (2016) Modelagem agrometeorológica para a previsão de produtividade de cafeeiros na região sul do estado de Minas Gerais. http://www.sbicafe.ufv.br/handle/123456789/8070 (accessed 8 February 2022)
Volpato MML, Vieira TGC, Alves HMR, Santos WJRD (2013) Modis Images for Agrometeorological Monitoring of Coffee Areas. http://www.sbicafe.ufv.br/handle/123456789/7978 (accessed 9 February 2022)
Zhang M et al (2007) Uniconazole-induced tolerance of soybean to water deficit stress in relation to changes in photosynthesis, hormones and antioxidant system. J Plant Physiol 164(6):709–717
Acknowledgements
This study was supported by the Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT) and the Federal Institute of Mato Grosso do Sul (IFMS), Naviraí campus.
Funding
This study was supported by the Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT) and the Federal Institute of Mato Grosso do Sul (IFMS), Naviraí campus.
Author information
Contributions
Guilherme B Torsoni: formal analysis, investigation, data curation, writing – original draft, writing – review and editing, visualization. Lucas Eduardo de Oliveira Aparecido: conceptualization, methodology, supervision, project administration. Gabriela Marins dos Santos: writing – review and editing. Alisson Gaspar Chiquitto: writing – review and editing. Glauco de Souza Rolim: writing – review and editing. José Reinaldo da Silva Cabral de Moraes: writing – review and editing. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised because data were added to Section 2.2, "1st stage – ML calibration".
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Torsoni, G.B., de Oliveira Aparecido, L.E., dos Santos, G.M. et al. Soybean yield prediction by machine learning and climate. Theor Appl Climatol 151, 1709–1725 (2023). https://doi.org/10.1007/s00704-022-04341-9
DOI: https://doi.org/10.1007/s00704-022-04341-9