Soybean yield prediction by machine learning and climate

  • Research
  • Published in Theoretical and Applied Climatology

A Correction to this article was published on 06 February 2023

This article has been updated

Abstract 

Soybean cultivation plays an important role in Mato Grosso do Sul (MS) and around the world. Given the inherent complexity of agricultural systems, this study aimed to develop climate-based yield prediction models using machine learning (ML) with the meteorological variables most correlated with yield under each condition, to test the best model with independent data, and to define zones of higher soybean yield in MS in order to recommend better planting sites. The study was carried out in two stages. First, meteorological and soybean yield data from 47 locations in MS were used to calibrate the ML algorithms. Second, the best algorithm was used to predict soybean yields throughout the state. Daily NASA-POWER data from 2002 to 2021 were used for air temperature (T, °C), precipitation (P, mm), global solar irradiance (Qg, MJ m−2 day−1), wind speed (u2, m s−1), net radiation (Rn, MJ m−2 day−1), and relative humidity (RH, %). Reference evapotranspiration (ETo), by the standard FAO method, and the water balance (WB), by the method of Thornthwaite and Mather (1955), were calculated for each collection point. The ML algorithms used in the calibration stage were multiple linear regression (MLR), multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBOOSTING), and gradient boosting (GradBOOSTING). The models were calibrated with 70% of the data used for training and 30% for validation. Algorithms were evaluated by accuracy, precision, and tendency. All analyses were performed in Python 3.8. Climate variables showed high spatial and seasonal variability throughout MS. Pearson's univariate correlations between soybean yield and the climate variables of the phenological period differed in both sign and intensity. For instance, soil water storage (ARM) showed negative, neutral, and positive correlations in October, November, and December, respectively. The calibrated ML algorithms showed high precision and accuracy in both calibration and testing. The best model in calibration was XGBOOSTING, with MAPE, R2, RMSE, MSE, and MAE values of 1.84%, 0.95, 2.06%, 4.24%, and 0.921%, respectively. RF, XGBOOSTING, and GradBOOSTING were the most precise algorithms in the test, with R2 values of 0.71, 0.62, and 0.62, respectively.
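
The error statistics quoted in the abstract can be illustrated with a short sketch. The block below is a minimal example in plain Python, not the authors' scripts: the function names `split_70_30` and `regression_metrics` and the yield values are hypothetical, R2 is assumed to be the coefficient of determination (the paper may instead use the squared Pearson correlation), and the 70%/30% partition is assumed to be a simple random shuffle.

```python
import math
import random


def split_70_30(records, seed=0):
    """Shuffle records reproducibly and return (training, validation) lists.

    Assumption: a plain random 70% / 30% partition, not a split by site or year.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(observed, predicted):
    """Return MAPE (%), R2, RMSE, MSE, and MAE for paired observed/predicted yields."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an observed value is zero; yields are assumed positive.
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Coefficient of determination; assumes the observed values are not all identical.
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAPE": mape, "R2": r2, "RMSE": rmse, "MSE": mse, "MAE": mae}


# Hypothetical yields in arbitrary units, for illustration only.
observed = [50.0, 60.0, 55.0, 65.0]
predicted = [52.0, 58.0, 55.0, 66.0]
metrics = regression_metrics(observed, predicted)

# Hypothetical record identifiers standing in for site-year observations.
training, validation = split_70_30(range(10))
```

Because MSE is the square of RMSE, the two carry the same information; this is consistent with the calibration values reported above, where 2.06 squared is approximately 4.24.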

Availability of data and material

The data and material are openly available.

Code availability

The software used was Python, and the scripts are available.

References  

  • Adeboye OB, Schultz B, Adekalu KO, Prasad K (2017) Soil water storage, yield, water productivity and transpiration efficiency of soybeans (Glyxine Max L.Merr) as affected by soil surface management in Ile-Ife, Nigeria. Int Soil Water Conserv Res 5(2):141–50

  • Al-Jarrah OY et al (2015) Efficient machine learning for big data: a review. Big Data Res 2(3):87–93

  • Allen RG, Pereira LS, Smith M (1998) Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. FAO, Rome 300(9):D05109

  • Alvares CA et al (2013) Köppen’s climate classification map for Brazil. Meteorol Z 22(6):711–28

  • Aparecido LE et al (2016) Agrometeorological models for forecasting coffee yield. Agronomy Journal 109(1):249–258

  • Aparecido LEDO et al (2020) Caracterização Hídrica Espacial e Sazonal de Mato Grosso do Sul com Dados em Grid. Rev Bras de Meteorologia 35:147–56

  • Battisti R, Sentelhas PC, Boote KJ (2017) Inter-comparison of performance of soybean crop simulation models and their ensemble in Southern Brazil. Field Crops Res 200:28–37

  • Benos L et al (2021) Machine learning in agriculture: a comprehensive updated review. Sensors 21(11):3758

  • Bhatnagar R (2018) Machine learning and big data processing: a technological perspective and review. In: Hassanien AE, Tolba MF, Elhoseny M, Mostafa M (eds) The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA 2018). Advances in intelligent systems and computing. Springer International Publishing, Cham, pp 468–478. https://doi.org/10.1007/978-3-319-74690-6_46

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Câmara GMS (1991) Efeito do fotoperíodo e da temperatura no crescimento, florescimento e maturação de cultivares de soja (Glycine max (L.) Merrill). Sci Agric (Piracicaba, Braz.) 54 (spe). https://doi.org/10.1590/S0103-90161997000300017

  • Camargo M et al (1998) Teste e Análise de Modelos Agrometeorológicos de Estimativa de Produtividade Para a Cultura da Soja na Região de Ribeirão Preto. Bragantia 57 (2). https://doi.org/10.1590/S0006-87051998000200021

  • Cardoso A et al (2010) Extended time weather forecasts contributes to agricultural productivity estimates. Theoret Appl Climatol 102:343–350

  • Carvalho-Junior WC, Calderano Filho B, Silva Chagas C, Bhering SB, Pereira NR, Pinheiro HSK (2016) Multiple linear regression and random forest model to estimate soil bulk density in mountainous regions. Pesq Agrop Brasileira 51(9):1428–1437

  • Chan KY et al (2020) Affective design using machine learning: a survey and its prospect of conjoining big data. Int J Comput Integr Manuf 33(7):645–669

  • Che D, Safran M, Peng Z (2013) From big data to big data mining: challenges, issues, and opportunities. In: Hong B et al (eds) Database systems for advanced applications. Lecture notes in computer science. Springer, Berlin, Heidelberg, pp 1–15. https://doi.org/10.1007/978-3-642-40270-8_1

  • Chen XW, Lin X (2014) Big data deep learning: challenges and perspectives. IEEE Access 2:514–25

  • Chen S, Liu W, Feng P, Ye T, Ma Y, Zhang Z (2022) Improving spatial disaggregation of crop yield by incorporating machine learning with multisource data: a case study of Chinese maize yield. Remote Sens 14(10):2340

  • Companhia Nacional de Abastecimento (CONAB) (2021) Acompanhamento de safra brasileira de grãos, Safra 2020/21: décimo primeiro levantamento. Conab, Brasília. https://www.conab.gov.br/info-agro/safras/graos/boletim-da-safra-de-graos

  • Cravero A, Sepúlveda S (2021) Use and adaptations of machine learning in big data—applications in real cases in agriculture. Electronics 10(5):552

  • Evstatiev BI, Gabrovska-Evstatieva KG (2021) A review on the methods for big data analysis in agriculture. Mater Sci Eng 1032(1):012053

  • Ferreira LB, da Cunha FF, de Oliveira RA, Fernandes-Filho EI (2019) Estimation of reference evapotranspiration in Brazil with limited meteorological data using ANN and SVM – a new approach. J Hydrol 572:556–70

  • Fishman J et al (2010) An investigation of widespread ozone damage to the soybean crop in the Upper Midwest determined from ground-based and satellite measurements. Atmos Environ 44(18):2248–2256

  • Gao F, Anderson M, Daughtry C, Johnson D (2018) Assessing the variability of corn and soybean yields in central Iowa using high spatiotemporal resolution multi-satellite imagery. Remote Sensing 10(9):1489

  • García-Villalba R et al (2008) Comparative metabolomic study of transgenic versus conventional soybean using capillary electrophoresis–time-of-flight mass spectrometry. J Chromatogr A 1195(1):164–173

  • Gibson LR, Mullen RE (1996) Influence of day and night temperature on soybean seed yield. Crop Sci 36(1):98–104

  • Helm JM et al (2020) Machine learning and artificial intelligence: definitions, applications, and future directions. Curr Rev Musculoskelet Med 13(1):69–76

  • Hoffman AL, Kemanian AR, Forest CE (2020) The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environ Res Lett 15(9):094013

  • Hoogenboom GCH (1997) Decision support system for agrotechnology transfer (DSSAT) Version 4.7. DSSAT Foundation. https://DSSAT.net

  • Hopper NW, Overholt JR, Martin JR (1979) Effect of cultivar, temperature and seed size on the germination and emergence of soya beans (Glycine max (L.) Merr.). Ann Bot 44(3):301–8

  • Isabella SJ, Srinivasan S (2018) An understanding of machine learning techniques in big data analytics: a survey. Int J Eng Technol(UAE) 7:666–69

  • James M, Chui M (2011) Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute, Lexington, KY

  • Jin Z, Azzari G, Lobell DB (2017) Improving the accuracy of satellite-based high-resolution yield estimation: a test of multiple scalable approaches. Agric for Meteorol 247:207–220

  • Kaul M, Hill RL, Walthall C (2005) Artificial neural networks for corn and soybean yield prediction. Agric Syst 85(1):1–18

  • Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J South Afr Inst Min Metall 52(6):119–139

  • L’Heureux A, Grolinger K, Elyamany HF, Capretz MA (2017) Machine learning with big data: challenges and approaches. IEEE Access 5:7776–97

  • Mahajan D et al (2016) TABLA: a unified template-based framework for accelerating statistical machine learning. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp 14–26. https://cseweb.ucsd.edu/~hadi/doc/paper/2015-tr-tabla.pdf

  • Major DJ, Johnson DR, Tanner JW, Anderson IC (1975) Effects of daylength and temperature on soybean development. Crop Sci 15(2):174–179

  • Martorano LG et al (2009) Indicadores da condição hídrica do solo com soja em plantio direto e preparo convencional. Rev Bras De Engenharia Agríc e Ambient 13:397–405

  • Meng T, Jing X, Yan Z, Pedrycz W (2020) A survey on machine learning for data fusion. Inf Fusion 57:115–29

  • Miranda JM, Reinato RA, Silva ABD (2014) Mathematical model for predicting coffee yield. Rev Bras de Engenharia Agríc e Ambient 18:353–61

  • Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Science/Engineering/Math, New York. http://www.cs.cmu.edu/~tom/mlbook.html

  • Monteiro LA, Sentelhas PC, Pedra GU (2018) Assessment of NASA/POWER satellite-based weather system for Brazilian conditions and its impact on sugarcane yield simulation. Int J Climatol 38(3):1571–1581

  • Neethirajan S (2020) The role of sensors, big data and machine learning in modern animal farming. Sens Bio-Sensing Res 29:100367

  • Pesqueira ADS, Bacchi LMA, Gavassoni WL (2016) Associação de fungicidas no controle da antracnose da soja no Mato Grosso do Sul. Rev Ciênc Agron 47:203–12

  • Qiu J et al (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016(1):67

  • Rabunal JR, Dorado J (2006) Artificial neural networks in real-life applications. Idea Group Inc (IGI), p 375

  • Rosa VGCD, Moreira MA, Rudorff BFT, Adami M (2010) Estimativa da produtividade de café com base em um modelo agrometeorológico-espectral. Pesquisa Agropecuária Brasileira 45:1478–88

  • Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229

  • Santos MA, Camargo MBP (2006) Parametrização de modelo agrometeorológico de estimativa de produtividade do cafeeiro nas condições do Estado de São Paulo. https://www.scielo.br/j/brag/a/zr8jKGz9bT9YkMHWfCCqL6H/?lang=pt (accessed 9 February 2022)

  • Sarker MNI et al (2019) Promoting digital agriculture through big data for sustainable farm management. Int J Innov Appl Stud 25(4):1235–40

  • Sassi I, Ouaftouh S, Anter S (2019) Adaptation of classical machine learning algorithms to big data context: problems and challenges: case study: hidden Markov models under spark. In 2019 1st International Conference on Smart Systems and Data Science (ICSSD), pp 1–7. https://ieeexplore.ieee.org/document/9002857

  • Schaafsma W, Vark GNV (1979) Classification and discrimination problems with applications, Part IIa. Statistica Neerlandica 33(2):91–126

  • Sentelhas PC et al (2015) The soybean yield gap in Brazil – magnitude, causes and possible solutions for sustainable production. J Agric Sci 153(8):1394–1411

  • Slavakis K, Giannakis GB, Mateos G (2014) Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process Mag 31(5):18–31

  • Smiderle O et al (2019) Cultivo de Soja no Cerrado de Roraima. EMBRAPA. ISBN: 1809-2675. http://www.infoteca.cnptia.embrapa.br/infoteca/handle/doc/1120127

  • Sonka S (2016) Big data: fueling the next evolution of agricultural innovation. J Innov Manag 4(1):114–136

  • Sørensen CAG, Kateris D, Bochtis D (2019) ICT innovations and smart farming. In: Salampasis M, Bournaris T (eds) Information and communication technologies in modern agricultural development. Communications in computer and information science. Springer International Publishing, Cham, pp 1–19. https://eprints.lincoln.ac.uk/id/eprint/39235/

  • Souza GM, Tiago AC, Suzana CB, Rogerio PS (2013) Soybean under water deficit: physiological and yield responses. In: A comprehensive survey of international soybean research - genetics, physiology, agronomy and nitrogen relationships. IntechOpen. https://www.intechopen.com/chapters/40862 (accessed 21 February 2022)

  • Sun Y, Cheng AC (2012) Machine learning on-a-chip: a high-performance low-power reusable neuron architecture for artificial neural networks in ECG classifications. Comput Biol Med 42(7):751–57

  • Tacker P, Vories E (2014) Chapter 8: Irrigation. In Soybean irrigation and water use. Arkansas Soybean Handbook, University of Missouri Extension, University of Arkansas Cooperative Extension Service. https://www.uaex.uada.edu/publications/pdf/mp197/chapter8.pdf

  • Thornthwaite CW, Mather JR (1955) The water balance. Drexel Institute of Technology, Laboratory of Climatology, Centerton, NJ, p 104

  • Van Schaik PH, Probst AH (1958) Effects of some environmental factors on flower production and reproductive efficiency in soybeans. Agron J 50(4):192–97

  • Vanuytrecht E et al (2014) Aquacrop: FAO’s crop water productivity and yield response model. Environ Model Softw 62:351–360

  • Victorino EC, Carvalho LG, Ferreira DF (2016) Modelagem agrometeorológica para a previsão de produtividade de cafeeiros na região sul do estado de Minas Gerais. http://www.sbicafe.ufv.br/handle/123456789/8070 (accessed 8 February 2022)

  • Volpato MML, Vieira TGC, Alves HMR, Santos WJRD (2013) MODIS images for agrometeorological monitoring of coffee areas. http://www.sbicafe.ufv.br/handle/123456789/7978 (accessed 9 February 2022)

  • Zhang M et al (2007) Uniconazole-induced tolerance of soybean to water deficit stress in relation to changes in photosynthesis, hormones and antioxidant system. J Plant Physiol 164(6):709–717

Acknowledgements

This study was supported by the Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT) and the Federal Institute of Mato Grosso do Sul (IFMS), Naviraí campus.

Funding

This study was supported by the Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT) and the Federal Institute of Mato Grosso do Sul (IFMS), Naviraí campus.

Author information

Contributions

Guilherme B Torsoni: formal analysis, investigation, data curation, writing—original draft, writing—review and editing, visualization. Lucas Eduardo de Oliveira Aparecido: conceptualization, methodology, supervision, project administration. Gabriela Marins dos Santos: writing—review and editing. Alisson Gaspar Chiquitto: writing—review and editing. Glauco de Souza Rolim: writing—review and editing. José Reinaldo da Silva Cabral de Moraes: writing—review and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Lucas Eduardo de Oliveira Aparecido.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised to add data in Section 2.2 (1st stage: ML calibration).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Torsoni, G.B., de Oliveira Aparecido, L.E., dos Santos, G.M. et al. Soybean yield prediction by machine learning and climate. Theor Appl Climatol 151, 1709–1725 (2023). https://doi.org/10.1007/s00704-022-04341-9

  • DOI: https://doi.org/10.1007/s00704-022-04341-9
