Harnessing the power of machine learning for crop improvement and sustainable production

Crop improvement and production domains encounter large amounts of expanding data with multi-layer complexity that forces researchers to use machine-learning approaches to establish predictive and informative models to understand the sophisticated mechanisms underlying these processes. All machine-learning approaches aim to fit models to target data; nevertheless, it should be noted that a wide range of specialized methods might initially appear confusing. The principal objective of this study is to offer researchers an explicit introduction to some of the essential machine-learning approaches and their applications, comprising the most modern and utilized methods that have gained widespread adoption in crop improvement or similar domains. This article explicitly explains how different machine-learning methods could be applied for given agricultural data, highlights newly emerging techniques for machine-learning users, and lays out technical strategies for agri/crop research practitioners and researchers.


Introduction
Naturally, humans' learning procedure is carefully or randomly monitoring surrounding events, grasping some experience and then predicting the next event, mainly occurring without human awareness.For instance, consider a human child who is learning how to talk.Basically, children do not know language learning techniques, procedures, or linguistics.Nevertheless, by listening to surrounding sounds, experimenting, and making mistakes, children gradually adjust their listening skills and simultaneously learn to talk and communicate in different situations.These procedures will continue until children feel confident enough to speak.Technically, they are learning how to talk by establishing a sound and an adequately accurate model of a whole set of procedures automatically and by testing the developed model again and again with surrounding voice data and improving it to build a more precise model.The term "machine learning" typically refers to the procedures of finding relevant groups within data or fitting prediction models to a target dataset.In essence, machine learning aims to mimic or resemble human capacity and the ability to identify patterns using computation approaches.Machine learning is especially handy when the dataset being analyzed is huge or sophisticated beyond human ability to analyze it or when we want to build an automated platform for analyzing a target dataset by considering it to be timeefficient and repeatable.Agricultural data often have these characteristics.Over the past few decades, agricultural databases have experienced remarkable growth in quantity and multi-layer complexity.Having a solid grasp of the methods being employed and some valuable tools for interpreting this wealth of data is becoming increasingly crucial.Although machine learning has been engaged in the crop domain for many years, its usage in agriculture and crop improvement has now become so widespread that it is used in almost every discipline.Only recently, though, has the field started to examine the various strategies more closely to determine which ones work best in certain situations or whether they are suitable at all.This review aims to offer compact, sufficient, and explicit information and details on how to use machine-learning techniques for agricultural and crop improvement researchers.We do not seek to provide a comprehensive analysis and investigate the literature on machine-learning applications for crop improvement problems nor to get into the specific mathematical details of different machine-learning techniques (Liakos et al., 2018;Sharma et al., 2020).We focus on connecting specific machine-learning methods to various kinds of agricultural data.In addition, we will try to explain some best practices for approaching training and modeling improvement in real-world scenarios.The intrinsic intricacy of agricultural data poses opportunities and challenges for analytical methods in machine learning.We highlight common problems that undermine the validity of research and offer advice on how to overcome these challenges.The discussion of several machine-learning methods takes up most of this review, and we also provide explicit examples of how to use the strategy appropriately and understand the outcomes in each case.Traditional machinelearning techniques are included in the discussion as, in many situations; they continue to be the best options to apply.Our discussion covers techniques of deep learning, which shows satisfactory performance and is the best option for various machine-learning responsibilities.We also cover federated learning as a robust technique for having a machine-learning global crop improvement model to deal with future challenges such as climate change.We conclude by outlining the prospects for integrating machine learning into agricultural data analysis pipelines.When using machine learning in agriculture, there are two primary objectives.First, even though the collected data are sufficient or deficient, precise predictions should be made and used to direct further research endeavors.Since scientists are interested in understanding mechanisms, the other objective is to apply machine learning to enhance and increase the comprehension of crop improvement mechanisms, including several types of phenotypical, genotypical, biological, agronomic, and climatic mechanisms.We also summarize some of the limitations and applications of machine-learning approaches along with some data-related concerns for researchers in the crop improvement domain.

Essential concepts
We discuss several fundamental ideas in machine learning and, whenever possible, present examples from agricultural literature to clarify these concepts.

Basic terms in machine learning
A dataset consists of several instances, or data points, that are conceptualized as individual experimental observations.Several fixed features describe each data point.Phenotype, genotype (SNPs), product price, and climatic parameters are a few examples of these features.Whatever we aim to do with a machine-learning model is specified objectively by a machinelearning task.For instance, we could predict the rate of price fluctuation at a particular point in time for a specific agricultural product with an experiment examining the cost of the crop product over time.In this instance, the features "cost of crop product" and "time" could be referred to as input features.The conversion rate, which would represent the anticipated output of the target model at a specific moment, is the quantity we are interested in forecasting.Input and output features of a model can be as many as desired.Features could be either categorical (accepting just discrete values) or continuous (continuous numerical values are used).Technically, categorical features are usually binary in nature, meaning they can be 1 (true) or 0 (false).

Concept of supervised, unsupervised, semi-supervised, and reinforcement learning
Supervised machine learning describes how a model can be fitted to data or part of target data that distinct labels have received for which a ground truth attribute exists; this quality is often determined by experimentation, researchers, or data collectors.In contrast to knowledge derived from inference, ground truth is information verified via direct observation and measurement, thus known to be accurate or real (Kondermann, 2013).Among the examples are high-yield prediction (Panigrahi et al., 2023) and water quality prediction (Ahmed et al., 2019;Ghosh et al., 2023;Chatterjee et al., 2024) using supervised learning for crop improvement.Laboratory or experimental observations ultimately serve as the source of ground truth in both cases.Contrary to supervised learning, patterns in unlabeled data can be found using unsupervised-learning techniques (James et al., 2023).This approach does not require predetermined labels with ground truth information (Sindhu Meena and Suriya, 2020).For example, plant image data can be analyzed using an unsupervised machinelearning technique (Davis et al., 2020;Bah et al., 2023).Semisupervised learning, in which a significant quantity of unlabeled data is paired with tiny quantities of labeled data, occasionally combines the two methodologies (Ouali et al., 2020;Ahfock and McLachlan, 2023); for example, weed distribution and density estimation (Liu et al., 2024).When obtaining tagged or labeled data is expensive, this can dramatically enhance performance.Another component of machine learning known as reinforcement learning (RL) teaches an agent how to behave and react in a given environment by having it carry out specific tasks and then watching the rewards or outcomes.This technique is already employed in different agricultural domains, such as crop yield prediction (Elavarasan and Vincent, 2020;Iniyan et al., 2023) and a completely autonomous precision agricultural aerial scouting technique (Zhang et al., 2020;Elango et al., 2024).

Concept of classification, clustering, and regression problems
In machine learning, a task is referred to as a classification challenge when it requires allocating data points to a collection of discrete classes such as varieties emitting high or low methane, and a classifier is any algorithm that carries out this kind of classification (Sen et al., 2020), such as cassava disease detection and classification (Bian and Priyadarshi, 2024).Contrary to classification, regression models produce a collection of values that are continuous (Pardoe, 2020;Panigrahiet al., 2023), such as the prediction of yield before the harvest of very early potato cultivars by using a regression model (Piekutowska et al., 2021).Regression problems can frequently be reformulated as classification problems since continuous values can be discretized or thresholded (Greener et al., 2022).Typically based on some metric of data point similarity, in a target dataset, clustering algorithms are applied to predict and group similar data points (Ghosal et al., 2020).These techniques are unsupervised and do not necessitate labeling the instances inside a dataset.For example, according to images of soybean, clustering could predict seed weight (Duc et al., 2023).

Concept of classes and labels
When a classifier returns a discrete collection (set) of mutually exclusive values, such values are referred to as classes.These values are called labels when they do not have to be mutually exclusive.Typically, an encoding is used to represent classes and labels.One essential step in preparing data for machine-learning tasks is encoding categorical variables.It is essential to convert categorical data into a numerical format to make them compatible with machine-learning algorithms.Categorical data are not numerical values, such as categories or text.For example, a place variable with the values first, second and third or a color variable with the values; red, green, and blue is categorical data, which every value denotes a distinct category.There might be an inherent ordering or link between some categories.There is a natural ordering of values for the aforementioned place variable.There is a natural ordering of values for the aforementioned place variable.Due to the fact that the values can be ranked or ordered, this kind of categorical variable is known as an ordinal variable.There are several popular category encoding methods, each combining benefits and drawbacks such as label encoding, ordinal encoding, and one-hot encoding methods.One-hot encoding is one of these techniques, which is most frequently employed (Yu et al., 2022b).When there is no innate link or order among the categories, this encoding performs well with nominal categorical variables (Rodrıǵuez et al., 2018).The distinctiveness of every category is maintained by one-hot encoding.It guarantees that no ordinal link between the categories is assumed by the method.Also, one-hot encoding eliminates the possibility of unintentionally adding biases based on the sequence of categories because each category is represented independently.But when working with categorical variables that have a large number of distinct categories, one-hot encoding can dramatically increase the dataset's dimensionality.This may result in the curse of dimensionality and have an adverse effect on the performance of the model.Ordinal encoding is used when the categorical feature is ordinal.Every distinct category value in ordinal encoding is given an integer value.For example, in the color categorical data, red is 1, green is 2, and blue is 3. Maintaining the order is crucial in this method and encoding should so take the sequence into account.Equal intervals between categories are assumed by ordinal encoding, yet this may not always be the case in real-world situations (Dahouda and Joe, 2021).Unlike one-hot encoding, ordinal encoding does not increase the dimensionality of the dataset and It saves space and processing time by substituting integers for categorical variables.Label encoding assigns a unique integer value to each category in a categorical feature.This is an easy-to-use strategy that can be helpful when the categories' order matters.However, because of the allocated integer values, it could create unintentional linkages between categories.For instance, label encoding could assign the values 0, 1, and 2 correspondingly if a categorical feature is small, medium, and large.This would suggest that "large" is twice as significant as "small", which is probably incorrect.Important point is that the type of categorical variable and the issue those researchers are trying to solve will determine which encoding techniques should be used.

Concept of cost or loss functions
Machine-learning models never produce perfect results; they always deviate from the ground truth or the real world (Ho and Wookey, 2019).Cost or loss functions are the mathematical functions that compute this deviation or, more broadly, the degree of disagreement between the actual and ideal outputs (Uma et al., 2021).Mean squared error loss for regression problems is one example, and, for classification-related problems, binary cross entropy (Nar et al., 2019).A mean squared error loss function calculates the average squared difference between the anticipated value and ground truth.Binary cross entropy is a binary classification problem that must divide observations into one of two labels according to specific criteria [such as healthy leaf and infected leaf (Sarkar et al., 2023)].

Concept of parameters and hyperparameters
In essence, models are mathematical functions that take a collection of imported features and return one or several features or values as an output.Models include adaptable and flexible parameters that can be adjusted throughout the training process to optimize the models' performance, allowing them to learn from training data (Yu and Zhu, 2020).In a simple regression model, for instance, each feature has a particular parameter that is being multiplied by the value of the feature; these are then integrated and combined to provide a forecast.Hyperparameters are tunable values that are not changed during training and are thus not regarded as a model component.But this nonetheless affects the performance and training model.The learning rate, which regulates the pace at which the model's parameters are changed during training, is a standard description of a hyperparameter.To simplify it, hyperparameters control a structure and training procedure of machine-learning models, and they might be the number of clusters in K-means clustering, the learning rate in a neural network, or the depth in a decision tree.Hyperparameters, in contrast to model parameters, need to be predefined and cannot be learned during training.A model's ability to perform well or poorly can be determined by selecting the appropriate collection of hyperparameters.Therefore, choosing the set of hyperparameters that result in the best possible model performance is known as hyperparameter tuning.Depending on the type of model being trained, different sorts of hyperparameters may be employed including learning rate, number of epochs, batch size, number of hidden layers and units, regularization parameters, momentum, and activation function.Several tools are developed for model tuning and hyperparameter optimization such as Ray Tune (Shin et al., 2020), Optuna (Akiba et al., 2019), HyperOpt (Bergstra et al., 2015), and AWS Sage Maker (Das et al., 2020).

Splitting target data into training, validation, and testing sets
Models need to be trained, which is the process of automatically modifying model parameters to enhance performance before they can be used to generate predictions (Mathai et al., 2020) This means altering the parameters in a supervised learning setting to minimize the average value of the loss or cost function and improves model performance with a training dataset.Typically, a separate validation dataset tracks but does not alter the training process to detect any overfitting (Twomey and Smith, 1997).Even if a cost function does not run on ground truth outputs in unsupervised scenarios, it is nonetheless decreased.After training, the model can be evaluated using data not used during training (Figure 2A) (Eelbode et al., 2021).For a general overview of the training procedure and instructions on how to divide the target dataset into training set and testing set. Figure 2 illustrates the principal notions for the training of models and displays a flowchart to aid in the whole procedure.

Concept of overfitting and underfitting
For a model to be predictive of unobserved (non-training) data, it must be fitted to training data to grasp the entire connection among all possible variables inside the dataset.The common reasons that a machine-learning model performs poorly are challenges, two key concepts in the field of machine learning (Figure 2B).An overfitted model (often caused by having too  Conversely, an underfitted model cannot accurately represent the connections between the data variables.This can result from an improper model type selection, inaccurate or inadequate data assumptions, a high bias, or a low variance procedure (Figure 2B).

Concept of the bias-variance tradeoff
The inductive bias of the model is the collection of assumptions made by the learning algorithm that leads it to prefer one solution to a learning problem over another (Baxter, 2000).This could be understood as the model favoring one learning solution over another.This choice is frequently encoded into the model by using a specific loss function and/or its particular mathematical form.Various model types have distinct inductive biases that make them more appropriate and they often perform better for specific categories of data.The tradeoff between variance and bias is another crucial concept in machine learning.The general argument is that a model with a large bias places more restrictions on the trained model.Conversely, the low-bias model decreases the number of assumptions about the model property, and it is theoretically capable of modeling a large range of function kinds (Neal, 2019).The amount that the trained model varies when it is trained on various training datasets is indicated by the variance of the model.Ideally, models should have low variance and bias, but these goals frequently conflict with each other, given that a model with low bias will learn distinct signals on separate training sets.To prevent either overfitting or underfitting, it is essential to manage the bias-variance tradeoff.

Overview of ML procedures and required concepts
This section is a concise survey of the procedures that should be followed for training an ML model (Figure 3).Surprisingly, little advice is given for the selection of specific models and methods of training (Bengio, 2012;Greener et al., 2022).The first step is to understand the problem, the nature of the imported data, and the final goal of the prediction, which should come before writing any ML algorithms.This step is essential to have a comprehensive understanding of the crop improvement aspect of the problem or The graphic illustrates the general process for data collection, training, testing, and validation of machine-learning and model evaluation methods.Data collection: collecting data from various data sources related to crop improvement and development.Data processing: the most important step in the ML space is data cleaning and pre-processing.Before beginning the analysis of the best algorithm for the provided data set, we must comprehend the data set and select the cleaning or pre-processing procedures to obtain the best possible outcomes.Learning (Model building and selection of ML algorithm): This is a crucial stage that will soon bring the process to an end.Thus, we must choose our models carefully because we will select one as the ultimate model to solve the provided business challenge.Model evaluation: Model evaluation validates a target model with some standardized mathematical formulae or methodology.Deployment and monitoring: the selected ML model is authorized for deployment in the production environment after its performance satisfies the requirements, and the performance and behavior of the selected model in the real world are continuously observed, analyzed, and assessed through the process known as model monitoring.
question: for example, knowing the sources of noise and the origin of the target data.Understanding the computational storage of the inputs and outputs is also crucial.The following questions could be addressed: Are they adjusted (normalized) to avoid an excessively high effect of one attribute on prediction?Do they have continuous or binary encodings?Are some entries repeated?Are some data pieces missing (NaN)?
In the following step, the collected data (target dataset) must be divided into the first training dataset, the second for validation, and finally the testing dataset (Figure 2A).The training dataset is used for training an ML algorithm.In contrast, the test dataset (holdout set) is used to evaluate the resulting model (to estimate how well the model performs on unseen data).This idea is further used in the model selection part of the training procedure, which might lead to allocating a part of the training dataset as a validation set while the rest of the training data are used for training proper.Using the training set, the parameters of the specific model are updated during the training procedure.Usually, 10% of the available data are split and considered validation data to oversee instruction (training) performance, thus avoiding overfitting of the target model, and select hyperparameters (previously explained) based on datasets for training.Frequently, k-fold cross-validation is used in this step.Typically, 10% to 20% of the total dataset is dedicated as a test dataset to evaluate the expected real-world performance of the target model by assessing how well it performs on data that were not used for training or validation.To prevent adjusting the model to match the test set, there should be only one-time use of the test set in the later stages, if possible (Hastie et al., 2009;Bzdok et al., 2018).Selecting a model comes next, depending on the dataset type (nature of data) and the kind of anticipation being formed.This is conceptualized and made concise in Figure 3.To raise the overall accuracy of the undertaken model, the ensemble model averages the outputs of several comparable models that could be considered.Finally, evaluating the model's accuracy in the dedicated test dataset is crucial.

Conventional machine learning
This section investigates several essential and traditional machine-learning techniques, focusing on their advantages and disadvantages.Table 1 presents a comparison of several machinelearning techniques along with some applications for crop improvement and production.Figure 4 illustrates a few of the conventional machine-learning techniques.To train these models, several software programs have been available, such as Caret in R (Kuhn, 2008;Dege and Brüggemann, 2023), MLJ in Julia (Blaom et al., 2020), and scikit-learn in Python (Pedregosa et al., 2011;Rajamani and Iyer, 2023).When developing machine-learning algorithms for crop improvement-related data, conventional machine learning is typically the first area to investigate to find the most appropriate solution for a given problem.Deep learning is currently prevalent and has the potential to be a robust and valuable method.It is still restricted to the application domains where it performs well, though, such as when a vast quantity of data are accessible, such as extreme data points, when there are several features on each data point or when the features have a lot of structure (Greener et al., 2022).Drone images from crop fields (Killeen et al., 2024;Sahoo et al., 2024) and genotypic data (SNPs) (Uppu et al., 2016) are two examples of agricultural data for which deep learning could be effectively used.Even when the other two conditions are satisfied, deep learning may not be the best option because of the need for vast volumes of data.Technically, conventional approaches build and evaluate solutions for a particular problem far more quickly than deep learning.When compared to more conventional models such as random forests and support vector machines (SVMs) (Hastie et al., 2009), creating the architecture and training a deep neural network might be a computation-intensive and costly process (Sejnowski, 2018).For a given agricultural prediction problem, even if deep learning seems theoretically doable, it is usually wise to train a conventional technique and evaluate it against a model based on neural networks such as ANN (artificial neural network), if at all possible (Smith et al., 2020).Conventional approaches usually assume that every sample in the collection has the same number of characteristics, which is not always feasible.Using SNP data with varying lengths for each case is a clear illustration of this problem.The data can be adjusted using basic techniques such as windowing and padding to make them all the same size and employing standard ways with them.Padding refers to the process that can add zero value to each example up to making the size of each of them equal to the most prominent example in the target dataset.Conversely, the windowing approach condenses each sample to a specific size (Chrysostomou et al., 2011).

Application of regression and classification models
Regarding regression problems such as those depicted in Figure 4A, ridge regression (a type of linear regression) is frequently a valuable place to start when building and developing a model since it could offer a quick and clear baseline for a particular responsibility.The value of one variable can be predicted by using linear regression analysis according to the value of another variable (Su et al., 2012).On the other hand, when a model relies on as few features as possible from the given data, then other variations of linear regression, such as elastic net regression (Zou and Hastie, 2005) and LASSO regression (Tibshirani, 1996), are also worthy of consideration.Since the correlations between the characteristics in the data are frequently non-linear, using a model such as an SVM is usually a better option in these situations (Noble, 2006), as shown in Figure 4B.SVMs are a practical kind of classification and regression model that convert non-separable problems into easier-to-solve separable problems by using kernel functions.A kernel function is a technique for transforming input data into the format needed for data processing.Both non-linear (a statistical method called nonlinear regression is used to model non-linear relationships between independent and dependent variables) and linear regression could be carried out with SVMs based on the kernel function that was applied (Ben-Hur et al., 2008;Ben-Hur and Weston, 2010;Kircher et al., 2014).To quantify, the best idea is to train an SVM through a kernel of a radial basis function and a linear SVM can be used from a non-  (Bzdok et al., 2018).A non-parametric supervisedlearning classifier, the k-nearest neighbors method employs closeness to classify or anticipate how a single data point will be grouped (Peterson, 2009).XGBoost (Figure 4C) (Chen and Guestrin, 2016;Olson et al., 2018) and random forests (Wang and Zhang, 2017) are examples of ensemble-based models, which provide another family of resilient non-linear techniques.These techniques are effective nonlinear models offering feature significance estimations and frequently just need minor adjustments to the hyperparameters.There are often an overwhelming number of variations among the several models available for regression and classification.It can be misleading to try to forecast how well-suited a specific method will be to a given issue in advance; instead, it is usually wiser to use an empirical approach to identify the optimum model via trial-and-error methods.Swapping out these model versions often involves only one line of code change thanks to a novel and robust machine-learning library such as scikitlearn (Pedregosa et al., 2011), which can efficiently run in a Python environment.To find the best approach overall, it is an excellent strategy to optimize and train several of the previously described techniques, and then compare the results on a different test set to see which method performed the best on the validation set.(A) Regression is the link between a single and/or several independent variables, also known as features, and a dependent variable (the observable attribute) is determined by using regression.A straightforward example is the prediction of crop yield based on one or some of the phenotypic features.(B) SVM: a support vector machine divides the original input data into several categories by creating a gap as large as feasible between the data in each converted version.One example is a prediction of whether a variety of a specific crop is a low-or high-yield variety.(C) Gradient boosting makes predictions by combining several weak prediction models, most often decision trees; for example, the prediction of sugarcane yield grade. (D) Clustering: using one of several algorithms, based on related objects; for example, better energy use in crop production.(E) t-distributed stochastic neighbor embedding (t-SNE), for example, dimensionality reduction of crop genotypic (SNP) data.

Application of clustering models
Like many other clustering algorithms (Figure 4D), k-means is a powerful multi-purpose clustering technique that requires the number of clusters to be specified as a hyperparameter (Jain, 2010).An alternate method that is not necessary for a predetermined number of clusters is DBSCAN (Ester et al., 1996).For datasets with plenty of features, dimensionality reduction can also be done prior to clustering to enhance performance.

Dimensionality reduction
High-dimensional data can be transformed into a lowerdimensional format while preserving the different connections and interactions between the data points and pieces using dimensionality reduction techniques.Although more dimensions could be used in machine learning, two or three dimensions are often selected to enable data visualization on several axes.These methods include data transformations that are both linear and nonlinear.Principal component analysis (PCA) (Jolliffe and Cadima, 2016) and t-distributed stochastic neighbor embedding (t-SNE) (Van der Maaten and Hinton, 2008) are some of the examples common in the agriculture domain for dimensionality reduction.The circumstance determines which technique to apply.PCA is based on a linear combination of input features; each component preserves the global connections between the data points and could be explainable, implying that it is simple to identify the characteristics that contribute to data diversity.t-SNE is a versatile technique that can uncover structure in complicated datasets and more robustly maintain local links between data points (Figure 4E).

Concept of artificial neural networks
The mathematical principle of artificial neural networks (ANN) has been conceptualized by following and understanding the behaviors and connectivity of human neurons in the human brain.It was created initially to study the workings of the brain (Crick, 1989).The significant advances in deep neural network training and architecture over the past few decades have increased interest in neural network models (LeCun et al., 2015).The following section covers the fundamentals of neural networks and common varieties used in research on crop improvement.Figure 5 displays some of these concepts.

Concept of neural network fundamentals
The capacity of neural networks to approximate functions universally is one of their primary characteristics; this implies that, with minimal presumptions, any mathematical function can be accurately approximated to any degree by a neural network that is set up appropriately.The fundamental units of every neural network model are artificial neurons.A mathematical function that translates (converts) inputs to outputs in a certain way constitutes an artificial neuron (Wu and Feng, 2018).Any number of input values can be fed into a single artificial neuron, which then uses a predetermined mathematical function to produce an output value.Artificial neurons are layered and the output of one layer is the input of the next, which forms a network.In the following subsections, we present several methods for configuring artificial neurons, sometimes called neural network architectures.Combining several architectural styles is also popular.For instance, fully linked layers are typically used to provide the final classification output in a CNN (convolutional neural network) used for classification.

Concept of multi-layer perceptrons
A feed-forward ANN (artificial neural network) having several layers, comprising an input layer, one or several hidden layers, and an output layer, is called a multi-layer perceptron (MLP) (Figure 5A).Every layer is wholly interconnected with every other layer.The term "perceptron" was initially established by Frank Rosenblatt (Seising, 2018).The fundamental building block of an artificial neural network, the perceptron, specifies the artificial neuron inside the network.Activation functions, node values, inputs, and weights are all used in this supervised learning technique to determine the output.The forward direction is supported by the MLP neural network.Every node has complete network connectivity.Only in the forward direction does each node transmit its value to the next node.Back-propagation is a method used by the MLP neural network to back propagate the error in order to optimize the weights and unit values.

Concept of convolutional neural networks
CNNs are developed mainly to use image data format, and the fundamental component of a CNN is the convolutional layer (Figure 5B).Three things are needed: a feature map, a filter, and input data (Li et al., 2021).Suppose the input will consist of a color picture, a 3D matrix of pixels.As a result, the input will have three dimensions: height, width, and depth, which match the RGB color space of a picture.CNNs are equipped with a feature detector, which could also be called a kernel or filter.This detector traverses the receptive fields of the image and determines (Bouvrie, 2006).A convolution is the name given to this procedure.CNNs can be set up (configured) to function well with various spatially structured datasets.A 1D CNN, for instance, would contain filters that move in only one way.Data with one spatial dimension would be a perfect fit for this kind of CNN (Tang et al., 2020), such as genotypic (SNP) data from rice varieties.Digital images are examples of data with two spatial dimensions that 2D CNNs can process (Hara et al., 2018).Volumetric data, such as multi-temporal remote-sensing images, are what 3D CNNs use to function (Ji et al., 2018).Significant progress has been made in crop improvement for various datasets using CNNs (Jiang and Li, 2020).Crop classification (Durrani et al., 2023), crop yield prediction (Nejad et al., 2022), and maize seedling recognition (Diao et al., 2022;Wei et al., 2024) are some examples of CNN models for crop improvement, and they now frequently surpass skilled human performance.

Concept of recurrent neural networks
RNNs are the most suitable approach with data organized into sequences, where each point in the series has some semblance of dependence or connection with the previous one (at least conceptually) (Greener et al., 2022), as seen in Figure 5C.The primary use of this approach is probably in NLP (natural language processing), which considers text a succession of characters (Medsker and Jain, 2001).One kind of RNN that can retain the outputs of each node for extended periods is called long short-term memory (LSTM) (Goodfellow et al., 2016).In other words, RNNs are modified to build LSTM networks, which provide better recall of previously learned data.Using back-propagation, they train the target model.When dealing with time delays of undetermined length, LSTM is a robust tool for classifying, processing, and predicting time series.Thus, once data are presented in an orderly structure, such as time sentences, LSTM can frequently be employed in different fields such as NLP and time-series analysis (Abdel-Nasser and Mahmoud, 2019).In the crop domain, RNNs are used extensively for crop improvement, such as land cover classification (Sun et al., 2019), prediction of crop biomass (Masjedi et al., 2019), and land cover and crop classification (Mazzia et al., 2019;Abidi et al., 2023;Moharram and Sundaram, 2023).

Principle of graph neural networks
Graph neural networks (GNNs) are especially well-suited for data that lack a clear apparent structure, such as a picture.Still, they are made up of things connected by randomly determined interactions or relationships (Battaglia et al., 2018).Such applications relevant to crop improvement include weed and crop recognition in smart farming (Jiang et al., 2020;Pandey et al., 2024) and crop recommendation systems (Ayesha Barvin and Sampradeepraj, 2023;Ge et al., 2024).In computer language, a graph is merely a representation of this kind of data, and every graph has a collection of nodes or vertices and a collection of edges that show different types of relationships or connections between the nodes.As seen in Figure 5D, when each feature of the nodes is updated across the network, neighboring nodes are considered.The node features in the final layer are then used as the output or merged to generate an output for the entire graph.Graphs For instance, the maize crop has been used for predicting yield (Ahmed, 2023).(B) To compute the values in the subsequent layer, a CNN employs filters that traverse the input layer.Since the filters work throughout the layer, parameters are shared, making it possible to identify related things wherever they may be.Although 1D and 3D CNNs are also used in crop improvement, 2D CNNs demonstrated operating on images of crops.1D and 3D CNNs have been used for crop and crop-land classification (Ji et al., 2018;Liao et al., 2020;Liu et al., 2023).(C) An RNN is a deep-learning model trained to interpret and translate a given set of sequential data inputs into a predetermined set of sequential data outputs.It is used for weed detection for crop improvement (Brahim et al., 2021).(D) A GNN uses data from linked nodes of graph format data.It can be used for genomic selection in crops, and it has been used for predicting crop variety yield (Yang et al., 2023).(E) An autoencoder is composed of an encoder neural network that transforms an input into a latent representation with lower dimensions, and this hidden representation is transformed back into the original input by using a decoder neural network.This method has been used in crop disease detection for crop improvement (Abinaya and Devi, 2022).
illustrating various correlations could use data from several sources to make predictions.Graph Nets (Gao and Ji, 2019) and PyTorch Geometric (Fey and Lenssen, 2019) are some of the most popular programs used to train GNNs.

Autoencoder networks
Autoencoder is used for unsupervised learning or the efficient coding of unlabeled input (Bank et al., 2023).The autoencoder method can learn two tasks: transforming input data by using an encoding function and recreating the input data from the encoded representation by a decoding function (Figure 5E).An alternative perspective is that the encoder attempts to compress the input and the decoder attempts to decompress it.Concurrent training is applied to the encoder, latent representation, and decoder (Doersch, 2016).Predicting the imposing of a structure on the latent space and the degree of similarity between two data points helpful for prediction tasks are two examples of applications.This approach has been used in several domains of crop improvement, such as crop classification (Bhosle and Musande, 2022;Cui et al., 2023) and crop mapping (Hamidi et al., 2021;Madala and Prasad, 2023;Hamidi et al., 2024).

Neural network improving and training
Several issues are unique to neural networks as they are far more sophisticated than conventional machine-learning techniques.It is frequently a good idea to train a neural network on a single training sample after deciding that it is the best model for the desired application for instance, a single image.The trained model is not helpful in forecasting, whereas it is adequate for exposing programming flaws (errors).As the network retains only the input, the training loss function ought to rapidly approach zero.If not, either the algorithm is not sophisticated enough to represent the input data or there is probably a mistake in the code.The network can begin with training on the whole training set after passing this fundamental debugging test when there is a minimum in the training loss function.It might be necessary to adjust hyperparameters such as the learning rate for this, as shown in Figure 6A.Overfitting of the network can be identified by tracking loss on the training dataset and validation dataset, where loss on the training set starts to rise and loss on the validation set keeps becoming less.At that moment, training is often discontinued, a procedure called early stopping, as shown in Figure 6B.A neural network overfitting indicates that the model's capacity to generalize to new data is beginning to wane as it starts to memorize only the features of the training set.Although early stopping is an intelligent strategy for avoiding this, other training approaches could be employed, such as dropout methods or model regularization.Nodes within the network are arbitrarily disregarded to compel the network to discover a more reliable prediction method incorporating more nodes.TensorFlow (Abadi et al., 2016) and PyTorch (Paszke et al., 2019) are well-liked neural network training programs.Neural network training is computationally intensive and often calls for a tensor processing unit or graphics processing unit with enough RAM (random-access memory) because using these devices could accelerate work 10 to 100 times faster than using a regular CPU (central processing unit).This acceleration is necessary for training massive datasets and for the larger models that have demonstrated success in recent years.Nevertheless, using a model that has already been trained is typically much quicker and this could frequently be accomplished with a simple CPU.For researchers without access to a GPU (a graphics processing unit is on-demand computing services) for training, cloud computing options are available from popular suppliers, and thus it is essential to remember that for simple tasks.Python code could be freely tested on graphics or tensor processing units using Colaboratory (Colab).A practical method to get started with deep learning based on Python is to use the Colab environment.

B A
(A) The learning rate concept is that, when training a neural network or other conventional techniques such as gradient boosting, the learning rate of the model controls how quickly parameters that are learned are changed.(B) Early stopping is a regularization technique that helps prevent overfitting when training learners using gradient descent or other iterative methods.
Khatibi and Ali 10.3389/fpls.2024.1417912 Frontiers in Plant Science frontiersin.org7 Challenges of machine learning for crop improvement and productionrelated data The enormous diversity of agricultural and crop domain data is one of the significant challenges in modeling, and these data are also generated from different nature domains.Crop improvementrelated data can be yield-related data, land-related data, cropdevelopment-related data, crop-disease-related data, or even microorganism data.Most of them can be along with genotypic and transcriptomic data such as SNPs or RNA-seq data and/or high-resolution images, 3D structures, or gene expression profiles over time, and different interactions of networks are some examples of these data formats and natures.A summary of recommended techniques and crucial factors for several crop-improvement data kinds is provided in Table 2.Because of the variety of data formats encountered, processing crop-improvement-related data frequently calls for customized solutions.Because of this, it is challenging to provide ready-made solutions or even broad suggestions for applying machine learning in various fields of study.However, for machine learning to be used successfully in crop improvement and agriculture, as well as more broadly, a few common challenges must be considered.

Availability of high-quality data
Since data quality directly affects the functionality, precision, and dependability of ML models, it is essential to the field of artificial intelligence.Models that use high-quality data are more predictive and yield more consistent results.There are some main challenges for insuring data quality in ML including Data collection; the problem facing crop research institutes is obtaining high-quality data from a variety of sources.Ensuring that every data point followed to the same criteria for collecting data and getting rid of redundant or contradicting data is difficult.Data labeling; for training purposes, machine learning algorithms require labeled data; yet, manual labeling is error-prone and time-consuming.Accurate labels that accurately represent real-world conditions are the difficult part.There are some tools and software to generate ground truth data for ML specifically for certain domain such as ROOSTER (image labeler and classifier) (Tang et al., 2023), Bounding boxes (Osman et al., 2021), Polygons (Li et al., 2012), and Polylines (Opach and Rød, 2018).Data security and storage; preserving the integrity of data also entails shielding it from potential corruption and unwanted access.Also, it is essential for agricultural research institutions to have reliable and secure data storage.Data governance; it is difficult for many research facilities to put in place data governance structures that adequately handle problems with data quality.Errors, inconsistent data, and segregated data can result from improper data governance.Also, there is a need for more open-access datasets and standardized data collection protocols to facilitate ML research It is essential to develop more reliable and accurate data collection methods to ensure highquality data for ML research for crop development improvement.

Accessibility of data
Compared to other domains, agricultural and crop-related data have little publicly available data.The selection of strategies that could be applied successfully is significantly influenced by the amount of data available for a particular import data format.Technically, researchers are effectively compelled to employ more conventional machine-learning techniques when limited quantities of data are available because the accuracy of these approaches is more reliable in these particular cases.Deep neural networks and other highly specified models can be explored once more significant quantities of data are available.For supervised machine-learning approaches, it is essential to take into account the relative quantities of every ground truth label included in the dataset.If some labels are insufficient, more data will be needed for machine learning to function (Wei and Dunbrack, 2013;Alzubaidi et al., 2023).

Model interpretability
Researchers often aim to determine why a particular model predicts some subjects in a certain way and why this particular model works in certain situations and is not accurate in other conditions.Putting it in another way, rather than focusing just on correct modeling, agri-researchers are typically interested in identifying the mechanisms and causes accountable for modeling output.The machine-learning technique and the input data determine how well a model can be interpreted.Non-neural network approaches typically contain fewer learnable parameters and feature sets that are more accessible to meaningful interpretation, making interpretation easier.For example, in a simple linear regression model, the parameter allotted to every input feature indicates how that variable influences the prediction.Because non-neural network approaches are inexpensive to train, ablation research in which the impact of eliminating certain input features on performance is quantified is recommended.One approach to potentially finding more reliable, effective, and understandable models is through ablation experiments, which can highlight which aspects are most helpful for a particular modeling job.Because a neural network often has many input parameters and features, interpreting one is often significantly more difficult.

Challenges in transdisciplinary partnerships
The main concern for data-driven crop improvement and production programs is standardized data collection protocol to prevent noisy data and the availability of high-resolution data.On the other hand, it is uncommon for one research organization to be aware of specific resources and knowledge to collect data in machine-learning research and adequately employ the most suitable machine-learning algorithms unless publicly available data are being used.Computer scientists and experimental agriinstitutes frequently collaborate, and the outcomes of these collaborations are often outstanding.However, in these kinds of partnerships, each party must understand the other.In particular, agri-institutes and researchers should be aware of the constraints of the machine-learning algorithms being applied, and computer scientists should understand thoroughly the nature of the data, including the anticipated repeatability and level of noise.Developing such awareness takes time and work, but it is crucial for halting the frequent accidental spread of below-standard models and false conclusions.
8 Federated learning and gossip learning as recommendation approaches for global crop improvement and production programs When leveraging datasets from many institutions, the model could be trained centrally, combining data from silos of various institutions onto a single server.However, different legal, ethical, and administrative restrictions exist on publicly exchanging crop-based data.In many countries, crop-based data must remain in the group, company, or institution.Machine-learning models are trained using a decentralized method called federated learning, often called collaborative learning.Federated learning (FL) is an approach for building machine-learning models where distributed data are used cooperatively by a central server (McMahan et al., 2017;Kairouz et al., 2021), as illustrated in Figure 7. FL allows the data to remain at the original site to protect the safety and intellectual privacy of data, in contrast to centralized training, which transfers data from produced locations to a central server to train the model.Once a new training cycle begins, the most recent version of the model is transmitted to every storage site where the training data are stored (Greener et al., 2022).Each copy of the model is then trained and updated using the data that belong to each unique site.The revised models are then returned to the central server from each site, where they are merged to create a universal model.After that, the freshly revised universal model is released for distribution once more, and the cycle continues until either the model training or convergence is completed.Only those people directly related to that institution have direct access to the data, which means that the data are never virtually transported from the originating location or institution.In an FL approach, the risks of data ownership violations are decreased, data aggregation costs are kept to a minimum, and training datasets can quickly increase in size and variety.Optimum use of the FL approach can lay the groundwork for training deep-learning models for universal crop-based data.

FL taxonomy
The data matrix is the foundation of FL (Li et al., 2022).FL is categorized into three groups according to the various distribution patterns of the sample space and feature space of the data: federated transfer learning (FTL), vertical FL (VFL), and horizontal FL (HFL), which partition datasets non-dimensionally, longitudinally (i.e., dimension of features), and horizontally (i.e., dimension of users), correspondingly, as shown in Figure 8.

General workflow for employing the FL approach
Data holders and central servers are the usual components of FL systems (Li et al., 2022).Not enough local data or feature counts from individual data holders may be available to enable effective model training.As a result, cooperation from other data owners is needed.The FL procedure for the architecture of the client-server is shown in Figure 7.To safeguard data privacy, the data holders exclusively train their data locally in a standard cooperative modeling procedure of FL.After desensitization, the gradients produced by the iterations are used as interaction information and sent to a trustworthy third-party server in place of local data and, to update the model, the server should return the aggregated parameters.The stages involved in FL can be summed up in detail below.The first step is system initialization.In this step, the central server sends out the modeling work and tries to engage with the client.Local calculation is the second step.Upon opening the joint modeling job and initializing the system settings, it will be necessary for each data owner (holder) to initially carry out local measurements and calculations based on the data locally.Eventually, the third step is central polymerization.The central server compiles the estimated values after obtaining the computation results from various data owners (holders).Security, privacy, efficiency, and other concerns are considered and checked during the aggregation process in this step.Significantly, the FL central server's functioning is comparable to that of a distributed machine-learning server, which gathers each data holder's gradient and then produces a new gradient via server aggregation processes.

FL applications in agriculture and relevant work in some crops
Since FL allows datasets to be analyzed even when the raw data are either not readily available or the data owners are not ready to share target data, this opens tremendous opportunities to use the mentioned approach in different domains.In the medical field, FL has been used to recognize COVID-19 disease during the pandemic through an image analysis approach from chest-computed tomography (Lai and Yan, 2022).According to their findings, their network's communication cost decreased by using the federated averaging model.Additionally, to lessen Byzantine assaults in their federated learning test bed, researchers suggest a modified federated learning model in which the edge nodes are randomly split into groups, each assigned a separate transmission time slot (Sifaou and Li, 2022).Because edge devices have a wide range of capabilities and resources, researchers have developed a federated learning framework that analyzes the models without jeopardizing data security or privacy while reaching convergence (Kevin et al., 2022).Agri-researchers, agri-institutes, and agricompanies also frequently gather private data and information that they prefer to keep private, as presented in Table 3. FL uses machine learning to train a shared model across several devices without requiring data exchange.It is perfect for agricultural applications.The FL applications in agriculture are categorized below to create a global model based on data-partitioning techniques, architecture, aggregation algorithms, and scale of federation.In one effort, researchers use a horizontally distributed dataset placed on several client devices to train the yield prediction model using FL (Manoj et al., 2022).To demonstrate the efficacy of agricultural data under decentralized learning, the FedAvg algorithm is used to build deep regression models such as ResNet-16.In another effort, to classify crops (chickpea, maize, and rice), the federated averaging approach has been employed (Idoje et al., 2023).Compared to the stochastic gradient descent (SGD) optimizer, the Adam optimizer model converged more quickly in this research.The study using the farm dataset has shown that decentralized models outperform centralized network models in terms of accuracy and convergence speed.

Federated learning challenges and limitation
Like other systems, FL also has some limitation and challenges for users which can categorized in four main groups include highcost communication, heterogeneity of systems, heterogeneity in statistics, and privacy issues (Mammen, 2021;Moshawrab et al., 2023).The first challenges are raised in FL system is high-cost communication.Network communication in federated systems can be many orders of magnitude slower than local computing because these models consist of a large number computing devices.Compared to traditional data center facilities, communication in these networks can be substantially more expensive.It is also required to design communication-efficient approaches that iteratively send short messages or model updates as part of the training process, instead of sending the complete dataset over the network, in order to fit a model to data supplied by the devices in a federated network.The second challenge is heterogeneity of systems.Due to variations in hardware (memory, CPU), power, and network connectivity, each device in federated networks may have different computing, storage, and communication capabilities.Furthermore, only a small percentage of the devices are usually active at any given time due to the scale of the network and limits imposed by individual systems on each device.For instance, in a network with millions of devices, only hundreds of devices might be in use.It is also possible for any device to be unreliable, and it happens frequently for an active device to stop working during a particular cycle.Problems like stragglers and fault tolerance are far The FL data partition categories: (A) horizontal FL, (B) vertical FL, and (C) federated transfer learning.
more common due to these system-level features than they are in standard data center settings.The third challenge od FL system is heterogeneity in statistics in the system.Across the FL network, devices typically produce and gather data in non-identically dispersed ways.Furthermore, there may be a large variation in the quantity of data points amongst devices.And finally, the last challenge of FL system is privacy concerns.In contrast to learning in data centers, privacy is frequently a primary problem in FL systems.FL only shares model updates rather than raw data, which is a step in the right direction towards preserving user data.Sensitive and important information may still be revealed to the central server or a third party by sharing model changes during the training phase.Although there have been efforts recently to improve FL privacy through the use of techniques like differential privacy or secure multiparty computing, these strategies frequently sacrifice system efficiency or model performance in order to achieve privacy (Zhang et al., 2021a;Wen et al., 2023).

Gossip learning can be alternative to federated learning
To tackle the same issue, gossip learning has also been suggested as an alternative to federated learning (Ormańdi et al., 2013;Hegedűs et al., 2016Hegedűs et al., , 2019)).There is no need for a parameter server because this method is completely decentralized.Nodes immediately share and combine models.Undoubtedly, there are seveal advantages to using gossip learning because there is no single  et al., 2022) point of failure and gossip learning has far cheaper scalability and better resilience because no infrastructure is needed.The term "gossip" describes the information-sharing process that occurs over the network in a manner akin to that of gossip within a social group.In this approach, through information sharing with other nodes in the network, each node in the network updates its model parameters in this distributed machine learning technique.The theory is that any node can rapidly converge to the global optimum by exchanging information with other nodes.In largescale distributed systems where node-to-node communication is unreliable or expensive, gossip learning is very helpful.

Conclusions and future direction
Future predictions display significantly greater use of AI and ML approaches in crop science, which could open a new horizon for integrated and valuable solutions in this area.We have undertaken a thorough review of the essential elements, concepts, applications, and machine-learning definitions required for agri-crop improvement.Nowadays, crop science is leveraging tons of available data to obtain deeper insights through AI and ML and offer the best suggestions for following actions and decisions for enhancing crop productivity or for other necessary tasks.Crop improvement and forecasting are made more accessible by combining computer science and agriculture.Offering broad recommendations and guidance for machine learning in agriculture is challenging because of the diversity of agricultural data.Therefore, our article aimed to provide agricultural and crop science researchers with an overview of the many accessible approaches, as well as some suggestions for conducting efficient machine learning through available data.It is vital to recognize that machine learning is inappropriate for all problems and to know when to avoid it: when the available data are insufficient, when it is necessary to comprehend rather than anticipate, or when it is not apparent how to fairly evaluate performance.Also, here we highlighted the application of federated learning in agriculture along with the definition, procedures, and structure, which can be beneficial for researchers in the agricultural sector.Even though there has been huge progress in machine learning in agriculture, many challenges still need to be addressed to mark ML territory in agricultural science.There is no denying that machine learning has influenced and will continue to influence agricultural research significantly.
FIGURE 2 ML approaches for training.(A) Target data for machine learning should be split into training, validation, and testing sets.The training set is used to train the model directly.With the validation set, the training set is monitored.Test data are used to assess the performance of the model.The k-fold cross-validation approach is also used for validation.(B) Concept of underfitting and overfitting.(C) One-hot encoding is a process by which categorical variables are converted into a form that could be provided to ML algorithms to do a better job in prediction.(D) Numerical data can be represented in a way that machine-learning algorithms can understand using continuous encoding.In the given example, the RGB (R: red, G: green, and B: blue) are shown as the specific values of pixels in the targeted images.

FIGURE 3
FIGURE 3 FIGURE 5Neural network approaches.(A) MLP comprise nodes, which are represented by circles and can be either internal (hidden) values or output values.Layers of nodes are created by connecting each node in one layer to every node of other layers, signifying that the links represent learned parameters.For instance, the maize crop has been used for predicting yield(Ahmed, 2023).(B) To compute the values in the subsequent layer, a CNN employs filters that traverse the input layer.Since the filters work throughout the layer, parameters are shared, making it possible to identify related things wherever they may be.Although 1D and 3D CNNs are also used in crop improvement, 2D CNNs demonstrated operating on images of crops.1D and 3D CNNs have been used for crop and crop-land classification(Ji et al., 2018;Liao et al., 2020;Liu et al., 2023).(C) An RNN is a deep-learning model trained to interpret and translate a given set of sequential data inputs into a predetermined set of sequential data outputs.It is used for weed detection for crop improvement(Brahim et al., 2021).(D) A GNN uses data from linked nodes of graph format data.It can be used for genomic selection in crops, and it has been used for predicting crop variety yield(Yang et al., 2023).(E) An autoencoder is composed of an encoder neural network that transforms an input into a latent representation with lower dimensions, and this hidden representation is transformed back into the original input by using a decoder neural network.This method has been used in crop disease detection for crop improvement(Abinaya and Devi, 2022).

FIGURE 7
FIGURE 7General pipeline constructing-silo for crop-improvement data to use in FL approaches.Several universities or institutes cooperatively train an ML model via federated learning (FL).In phase (A), the central server provides the institution with the most recent model version.In phase (B), each organization uses its data to train the model locally.In phase (C), each institution transmits its trained model to the central server.In phase (D), the central server combines all of the models that have been locally trained by the various universities into a single updated model.In phase (E), each training cycle involves repeating this procedure till the training of the model is complete.Crop data never leave the institution during any of the training phases.Institutions need access to essential resources such as powerful hardware and specialists to conduct FL successfully.

TABLE 1
Comparison of different machine-learning methods.model, if any.Numerous models that are often employed in regression could be used in classification as well.Another acceptable default starting point for a classification problem is to train an SVM based on the kernel function and a linear SVM.k-nearest neighbors classification (also known as k-NN or KNN) is a further technique that could be used linear

TABLE 2
Suggestive strategies for applying machine-learning techniques to varied datasets related to crop improvement.

TABLE 3
Agricultural applications of the federated learning method in some crops.