FIU Digital Commons
Review—Machine Learning Techniques in Wireless Sensor Network Based Precision Agriculture

The use of sensors and the Internet of Things (IoT) is key to moving the world’s agriculture to a more productive and sustainable path. Recent advancements in IoT, Wireless Sensor Networks (WSN), and Information and Communication Technology (ICT) have the potential to address some of the environmental, economic, and technical challenges as well as opportunities in this sector. As the number of interconnected devices continues to grow, they generate more big data with multiple modalities and spatial and temporal variations. Intelligent processing and analysis of this big data are necessary to develop a higher-level knowledge base and insights that result in better decision making, forecasting, and reliable management of sensors. This paper is a comprehensive review of the application of different machine learning algorithms in sensor data analytics within the agricultural ecosystem. It further discusses a case study on an IoT based data-driven smart farm prototype as an integrated food, energy, and water (FEW) system.

Technology plays a central role in mitigating the pressure the farming industry faces from a rising population, changing consumer needs, and growing shortages of land, water, and energy. Smart farming, like other machine-to-machine (M2M) based implementations such as smart metering and smart cities, is also referred to as precision agriculture (PA). According to Libelium, a major driver of the IoT solutions industry, the total market value for PA solutions is expected to reach $4.7 billion in 2021, almost double the amount in 2016. 1 Despite a growing level of exciting research and new smart farming projects, the agriculture industry has been slow to adopt the emerging M2M and IoT technologies compared with other industries. 2 Smart farming requires the integration of sensor technologies that collect data on the soil, crops, various environmental attributes, animal behavior, and tractor status. Through edge IoT computing and analytics, these sensor data can provide the farmer with valuable information on weather conditions and forecasts, crop monitoring, yield prediction, and plant and animal disease detection. 3 The implementation of smart agriculture depends on the type of farming at hand. In a large farm setting, farm vehicles such as smart tractors equipped with GPS, several embedded sensors, and data visualization tools are currently in place with the ability to transmit real-time data. 4 Drones are a big player in this setting, where built-in sensors provide different types of aerial imaging, field survey, and location mapping. 5 In small to medium-sized arable farming, spatially enabled mobile sensing technologies that provide detailed analysis of field conditions in the different soil layers, nutrient levels, and overall ambient environmental conditions are being utilized. 6,7 Also, smart irrigation, which uses the evapotranspiration of plants to optimize the irrigation cycle, is well established.
Soil moisture content and temperature sensors are widely used in scheduling irrigation. [8][9][10][11][12] IoT solutions are also deployed in monitoring the location and health of livestock, where sensors are placed within the animal to transmit these data wirelessly. 13 Other popular IoT based applications are in greenhouses and vertical farming integrated with the emerging practices of aquaponics, aeroponics, and hydroponics. 14,15 Additionally, WSNs have been widely implemented for environmental monitoring across diverse applications. 16 The objective of this paper is first to highlight the use of WSN and IoT in agriculture and to give a comprehensive review of sensor and IoT data analytics using machine learning (ML) techniques for agriculture applications. A number of relevant papers are presented that emphasize vital and unique features of ML models, specifically in yield prediction, decision support for irrigation, and crop quality. The paper also presents a case study on an experimental testbed that implements an end-to-end IoT platform to investigate the interdependence of the food, energy, and water (FEW) system. The literature review uses only published works selected from the past three years. The structure of this paper is as follows: the AI in Agriculture section presents recent advances in artificial intelligence (AI) applications in agriculture. The Machine Learning Techniques section delves into some of the commonly used machine learning techniques within WSN based PA. The Literature Review section summarizes some recent works utilizing ML techniques for WSN based PA applications. A case study on an IoT based smart agriculture solution is presented in the Case Study on IoT Based Smart Agriculture Solution section.

AI in Agriculture
Artificial intelligence (AI) can help farmers get more from the land while using resources more sustainably. Big data refers to the large volume of data coming from sensors, IoT, GPS, aerial imagery, etc. 17 IoT is a system of embedded technologies consisting of wired and wireless communications, sensors, and actuators that are capable of acquiring and transferring data to the internet. 18 Today's farms, with the help of IoT, Unmanned Aerial Vehicles (UAV), and other emerging technologies, are producing millions of data points on the ground daily. With the help of AI, farmers can now analyze weather conditions, temperature, water usage, energy usage, and soil conditions collected from their farm to better inform their decisions. Farmers can now also use captured sensor data to predict yield, and intelligent data processing techniques like machine learning leave them better equipped to handle natural disasters and climate conditions. IoT, combined with AI, is emerging as part of the solution toward improved agricultural productivity and efficiency. 19 From detecting plant diseases 20 to monitoring harvest time, 21 AI's potential in agtech is enormous and largely untapped. In Ref. 20, a model trained on a data set of cassava leaves was able to detect disease and pest damage with 98% accuracy. Furthermore, AI can be used to train robots to efficiently perform the mundane labor of tending, harvesting, and maintaining farmland, which usually requires a lot of human capital, time, and effort. AI applications in agriculture are emerging in three areas: robotics, soil and crop monitoring, and predictive analytics. 22,23 Autonomous robots can replace human laborers in efficiently handling essential agricultural tasks such as planting, weed control, and harvesting. 22
Start-up companies like Blue River Technology, recently acquired by John Deere, implement computer vision in precision sprayers to detect and spray weeds on cotton plants. 24 Robotics and automation are also emerging as a solution to labor problems in harvesting. Harvest CROO Robotics has developed a robot that supports farmers in picking and packing strawberries. 25 Crop disease detection and soil health monitoring are significant areas where ML techniques have been implemented. For instance, Plantix, an image recognition app, uses ML techniques to detect soil defects and plant diseases based on soil patterns. 26 Farmers can view this information on their smartphone, along with techniques and solutions to fix the problem. Similarly, a deep convolutional neural network has been used to identify three crop diseases and two types of pest damage targeting cassava plants in Tanzania. 20 The use of UAVs (drones) is currently growing in agriculture, with the market projected to reach $480 million by 2027. 23 Drones can gather massive amounts of data over vast areas of land within a short period and are ideal for large arable farms. Through AI, data gathered by a drone can help improve crop health and yield and reduce cost. 27 The most popular use of predictive analytics is in connection with satellite data to predict weather and crop sustainability, in pest and disease identification, and in remote PA applications. 6,7 Predictive analytics is used in the processing, wrangling, and analysis of sensor data for future prediction and decision models. In addition, ML techniques are commonly used as decision support in IoT and WSN based irrigation schemes. 10

Machine Learning Techniques
Machine learning is a type of AI that gives machines the ability to learn from experience. Its algorithms use computational methods to learn directly from datasets without depending on predetermined equations as a model. The algorithms progressively adapt to enhance their performance as the number of available training samples increases. [28][29][30] ML approaches are powerful tools capable of autonomously solving extensive non-linear problems using sensor data or other interconnected sources. They facilitate better decision making and informed actions in real-world scenarios with minimal human intervention. ML techniques are constantly undergoing development and are widely applied across almost all domains. However, they have fundamental limitations. The accuracy of the prediction is affected by the data quality, proper model representation, and dependencies between input and target variables. 31 There are two broad categories of machine learning algorithms: supervised and unsupervised learning. Supervised learning uses a known set of labeled data to train a model to predict the target variable for out-of-sample data. 28 Classification and regression are common applications of supervised learning. Common algorithms that fall under the different techniques are highlighted in Fig. 1. Unsupervised learning, on the other hand, relies on hidden patterns or intrinsic structures to draw inferences from unlabelled data. It is useful for exploratory applications where there is no specific goal or where the information the data contain is not clear. It is also well suited to dimensionality reduction on data that have a large number of features. Clustering is the most common learning model of this type, and its applications extend to exploratory data analysis such as gene sequencing and object recognition. 29 Algorithm selection depends on the size and type of the data and on the insight expected from it. There is, however, no general prescription for algorithm selection; in most cases, it is a matter of trial and error. Both supervised and unsupervised learning techniques are used extensively in IoT smart data analysis across various domains. 32 Smart farming enabled by WSN and IoT is one of the domains where ML techniques are emerging to quantify and understand big data. ML applications in PA can be categorized as crop management, 31,33-35 livestock management, 13 water management, 36,37 and soil management. 31,38 ML applications in crop management deal with yield prediction, 31,33-35 disease detection, 20 weed detection, 24 and phenotype classification. 27 This paper focuses on WSN-driven, AI-based agriculture applications.

Regression.-Regression is a supervised ML technique that predicts continuous responses such as stock prices, fluctuations in electricity demand, and time-series sensor data. There are two main types of regression algorithms: linear and nonlinear. Linear models rely on the assumption of a linear relationship between the independent and dependent variables. As presented in Fig. 1, the common regression algorithms are linear, nonlinear, Gaussian process regression model (GPRM), support vector machine (SVM) regression, generalized linear model (GLM), decision tree (DT), ensemble methods, and neural networks. Four of these techniques were selected to be discussed in detail as they are relevant to crop yield prediction.
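As a minimal sketch of the linear case, assuming a single predictor, ordinary least squares can be written in a few lines. The function name `fit_line` and the irrigation and soil-moisture numbers are illustrative assumptions, not data from the reviewed work.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: irrigation volume (L) vs. soil moisture (%), exactly linear.
water = [0.0, 1.0, 2.0, 3.0]
moisture = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(water, moisture)
```

With real sensor data the relationship is rarely this clean, which is one reason the nonlinear methods discussed next are often preferred.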
Decision tree (DT).-This method is also known as classification and regression trees (CART), which can be applied to both categorical and continuous input and output variables. 39 It works by splitting the data into two or more homogeneous sets or regions based on the most significant splitter among the independent variables.
DT works by following the decisions in the tree from the root down to a leaf node, 40 as shown in Fig. 2. The tree is usually shown inverted, with the root node at the top. A tree also consists of branching conditions in which the value of a predictor is compared to a trained weight. In Fig. 2 the branching conditions are shown next to the branches, where X1, X2, X3, and X4 represent the independent variables, and a, b, c, and d are the learned weights.
During the training process, the number of branches and the values of the weights are determined. The best splitter is the one that minimizes the cost metric. The cost metric for a classification tree is often the entropy or the Gini index, whereas for a regression tree the default metric is the Mean Squared Error (MSE), shown in Equation 1:

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$ (1)

where $\hat{y}_i$ is the predicted value for the $i$th sample, $y_i$ is the corresponding true value, and $n$ is the number of samples.
Additional pruning or modification can also be implemented to simplify the model. Pruning, as the name suggests, is the process of removing branches that do not significantly reduce the cost function. DT is easy to interpret and fast to fit, and it suits applications where memory usage must be minimal and high predictive accuracy is not a priority. 41 The performance of simple regression trees, in terms of MSE, is not as good as that of other machine learning methods, but aggregating several decision trees improves it significantly. Methods such as bagging, boosting, and random forests are based on this approach. 42 In particular, random forests have proven to be the most effective and popular among the tree-based approaches.
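To illustrate how a regression tree chooses a branching condition, the sketch below searches one predictor for the threshold that minimizes the sample-weighted MSE of the two resulting regions. It is a simplified illustration under stated assumptions (a single feature, a single split, made-up data), and the helper names `mse` and `best_split` are hypothetical.

```python
def mse(values):
    """Mean squared error of predicting the mean of `values` (0 for empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def best_split(xs, ys):
    """Return (threshold, cost) of the split x <= threshold with the lowest
    sample-weighted MSE over the two resulting regions."""
    n = len(xs)
    best = (None, float("inf"))
    candidates = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = (len(left) * mse(left) + len(right) * mse(right)) / n
        if cost < best[1]:
            best = (t, cost)
    return best

# Hypothetical data: soil moisture (%) vs. yield (kg); yield jumps above 30 %.
moisture = [10, 20, 30, 40, 50, 60]
crop_yield = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
threshold, cost = best_split(moisture, crop_yield)
```

A full tree would apply the same search recursively to each region and over every predictor; pruning then removes splits whose cost reduction is too small to justify the extra branch.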
Ensemble learning.-Ensemble learning (EL) models strive to enhance the predictive performance of a model-fitting technique by creating a linear aggregate of a "base learning algorithm". 19 There are two principal strategies for designing ensemble learning algorithms. The first is to form each hypothesis independently so as to create a set of hypotheses that are accurate and diverse. Common methods of this kind are bagging, also known as bootstrap aggregating, 43 and random forest. 44 The second approach builds the hypotheses in a coupled manner, so that the weighted vote of the hypotheses generates a suitable fit to the data. 45 Unlike a single DT, the random forest algorithm overcomes over-fitting by reducing the variance of the decision trees. The models are called forests because they are a collection, or ensemble, of several decision trees. 44 One major difference between a DT and a random forest model is how the splits happen. In a random forest, instead of trying splits on all the features, a sample of features is selected for each split, thereby reducing the variance of the model.
Averaging observations is an effective technique to reduce variance and improve predictive accuracy. If predictions $P_1, P_2, \dots, P_b$ are calculated on $b$ independent training sets and then averaged to calculate $P_{bag}$, the variance is reduced by a factor of $b$. However, it is not always feasible to obtain several data sets for training. In that case bootstrapping is applied: samples are repeatedly drawn, with replacement, from the same data set. This approach is referred to as bagging. 42 One drawback of bagging is that the decision trees trained for the prediction can be highly correlated: a variable that is a strong predictor will be chosen for the first split (at the root node) by all the trees. This can limit the improvement in prediction accuracy. Random forests improve on bagging by decorrelating the trees. They achieve this by randomly selecting a subset of m predictors out of the p available features at each split. Usually, m is chosen to be much smaller than p; Ref. 42 suggests choosing m close to √p.
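The bootstrap-and-average idea can be sketched as follows. This is a minimal illustration rather than a full random forest: the "model" is simply the sample mean standing in for a decision tree, the yield values are made up, and the helper names are hypothetical.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_prediction(data, fit, b, rng):
    """Average the predictions of b models, each fit on a bootstrap sample."""
    predictions = [fit(bootstrap_sample(data, rng)) for _ in range(b)]
    return sum(predictions) / b

def feature_subset_size(p):
    """Rule of thumb from the text: try about sqrt(p) features per split."""
    return max(1, round(p ** 0.5))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
yields = [2.1, 2.4, 1.9, 2.8, 2.2, 2.6, 2.0, 2.5]  # hypothetical plot yields
# The "model" here is just the sample mean, standing in for a decision tree.
p_bag = bagged_prediction(yields, statistics.mean, b=50, rng=rng)
m = feature_subset_size(16)  # 16 available predictors is an assumed example
```

Replacing `statistics.mean` with a tree-fitting function, and restricting each split to `m` randomly chosen predictors, would turn this skeleton into the random forest procedure described above.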
Bayesian models.-Bayesian models (BM) are a group of probabilistic graphical models in which the analysis is carried out within the context of Bayesian inference. 28 Equation 3 represents Bayes' theorem, which forms the basis for BM:

$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$ (3)

This equation is used to calculate the posterior probability from the prior probability and the information in the collected data. $P(A|B)$ is the posterior probability that we wish to calculate, $P(A)$ is the known prior probability, $P(B|A)$ is the likelihood of the observation $B$, and $P(B)$ is the overall probability of that observation. 42 Bayesian models belong to the supervised learning category and can be employed to solve either classification or regression problems. Unlike most machine learning algorithms, Bayesian inference needs a relatively small number of training samples. 46 Bayesian methods adapt the probability distribution to efficiently detect possible concepts without over-fitting. 30
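A short worked example may make the roles of prior, likelihood, and posterior concrete. The numbers below are invented for illustration (they are not from the reviewed studies), and the function name `posterior` is hypothetical; the denominator is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_alarm):
    """P(A|B) from Bayes' theorem, with P(B) expanded by total probability:
    P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A))."""
    evidence = likelihood * prior + false_alarm * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 10 % of plants are diseased (prior); a leaf-spot
# detector fires on 90 % of diseased plants and on 20 % of healthy ones.
p_disease_given_spot = posterior(prior=0.10, likelihood=0.90, false_alarm=0.20)
```

Here the posterior works out to 0.09 / 0.27, or one third: even with a fairly sensitive detector, a low prior keeps the probability of disease well below the detector's hit rate.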
Some of the most common algorithms are Naive Bayes, 47 Gaussian naive Bayes, multinomial naive Bayes, Bayesian network, 28 mixture of Gaussians, 48 and Bayesian belief network. 39

Support vector machine (SVM).-SVM regression algorithms work like SVM classification algorithms but are modified to predict a continuous response. 49 Instead of finding a hyperplane that separates the data, SVM regression algorithms find a model that deviates from the measured data by no more than a small amount, with parameter values that minimize sensitivity to error. 40 SVM regression is suitable for high-dimensional data where a large number of predictor variables exist. Potential applications of SVM in WSN supported PA are regression for yield and sensor data forecasting. 50,51 Figure 3 shows a linear case of support vector regression. The goal is to fit a linear function of the form y = w · x + b that optimizes the cost function. Since w and x are vectors, w · x represents a dot product. Replacing the dot product with a nonlinear kernel implicitly maps the data into a higher-dimensional space, which allows the model to learn higher-order functions.
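The pieces of linear support vector regression can be sketched as below. This assumes the standard ε-insensitive formulation, which the text describes only informally as deviating "by no more than a small amount", and uses a Gaussian (RBF) kernel as one example of a nonlinear kernel. All parameter values are made up, and no fitting procedure is shown.

```python
import math

def dot(w, x):
    """Dot product w . x of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def predict(w, b, x):
    """Linear model y = w . x + b."""
    return dot(w, x) + b

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |error| <= eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel, one common nonlinear replacement for the dot product."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

w, b = [2.0, -1.0], 0.5          # hypothetical fitted parameters
x = [1.0, 3.0]                   # hypothetical sensor feature vector
y_hat = predict(w, b, x)         # 2*1 - 1*3 + 0.5
loss_inside = eps_insensitive_loss(-0.4, y_hat, eps=0.2)   # error within the tube
loss_outside = eps_insensitive_loss(0.5, y_hat, eps=0.2)   # error beyond the tube
```

Errors smaller than `eps` cost nothing, which is what makes the fitted function insensitive to small measurement noise in the sensor data.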
Artificial neural network (ANN).-An ANN is an information-processing system that has certain performance characteristics in common with biological neural networks. This learning algorithm can be constructed by cascading chains of decision units, such as perceptrons or radial basis functions, and is used to recognize non-linear and complex functions. 52 A neural network is characterized by 1) its pattern of connections between the neurons, called its architecture, 2) its method of determining the weights on the connections, called its learning algorithm, and 3) its activation function. The general architecture of an ANN consists of input units, single or multi-layer hidden units, and output units. 53 ANN can be used for regression and classification problems. Commonly implemented ANN learning algorithms include the radial basis function, 54 perceptron algorithms, back-propagation, and feedforward propagation. 19,53 The output of a unit is $y_i^m = A(h_i^m)$, where $A$ is the activation function and $h_i^m$ is the input to that unit. During the training process, the weights and biases in each layer are learned with the goal of minimizing the cost function. There are several choices for the cost function, and the MSE shown in Equation 1 is one of the most common.
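A forward pass through a small network can be sketched as follows, assuming that each unit's input is a weighted sum of the previous layer's outputs plus a bias (a common convention that the text does not spell out). The weights, the sigmoid activation, and the layer sizes are arbitrary choices for illustration.

```python
import math

def sigmoid(h):
    """Logistic activation A(h) = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + math.exp(-h))

def layer_forward(inputs, weights, biases, activation):
    """One layer: unit j outputs A(h_j) with h_j = sum_k w_jk * x_k + b_j."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        h = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(h))
    return outputs

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 2-input, 2-hidden-unit, 1-output network with made-up weights.
x = [0.5, -1.0]
hidden = layer_forward(x, [[1.0, 2.0], [0.0, 0.0]], [1.5, 0.0], sigmoid)
output = layer_forward(hidden, [[1.0, 1.0]], [0.0], lambda h: h)  # linear output
loss = mse([1.0], output)
```

Training would adjust the weights and biases, typically by back-propagation, to reduce `loss` over many samples; only the forward computation is shown here.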

Literature Review
Yield prediction is a vital feature of precision agriculture that utilizes farmland and weather data to help farmers increase crop production. Farmland data, which can be manipulated by man, can include land usage and preparation (tillage vs no-till soil), depth of till, soil texture, soil structure, organic matter present, the amount of fertilizer (nitrogen, phosphorus, and potassium) present and consumed, efficiency of water usage based on the type of irrigation scheme, crop rotation pattern, method for pest and weed control, and total yield produced. Environmental weather data can include temperature, rainfall, solar radiation, wind speed, presence of pests, weeds, and biodiversity. The use of advanced sensor technology allows these data to be collected autonomously and non-destructively. The acquired data are then used with ML to provide actionable insight and to select the best decision management systems for yield prediction. Table I provides a list of algorithms and approaches used for the prediction of crop yield, irrigation management, and crop disease detection.

Case Study on IoT Based Smart Agriculture Solution
This section discusses a distributed WSN developed using open-source hardware platforms, an Arduino based micro-controller, and a ZigBee 55 module to monitor and control parameters critical to crop growth, such as soil, environmental, and weather conditions. The experimental testbed, detailed in Ref. 6, is a small, off-grid, photovoltaic (PV) powered smart farm that additionally captures energy and water data. The main objective of this experimental project is to investigate the nexus of food, water, and energy by designing an IoT based farm system that can produce more food with less energy and water, using a simple automated system powered by a solar panel, in order to address current and future FEW resource scarcity. 70 It further aims to advance the goal of integrated planning, policy, and management by using IoT and data analytics; to bring together stakeholders working on different sectors of FEW systems by providing a user-friendly interface to track and control the system; and to offer flexibility in the size of the system, broadening the user base.
The farm prototype, shown in Fig. 5, operates on distributed wireless sensor technology and is able to monitor and measure various environmental parameters, such as soil temperature and moisture, in real time to schedule precise irrigation events. The system further collects real-time weather information in order to minimize environmental impact and make better decisions on how to manage resources such as water and energy. The information gathered is available in the local and external databases, and users can retrieve it using an intuitive mobile application. The intent of the mobile app is to allow users to monitor or interact with the farm infrastructure remotely. The overall system is designed to be power-efficient, cost-effective, and low maintenance, allowing farmers and other users to manage their farm or garden with little effort. The system is currently deployed at the FIU engineering campus for development and testing purposes. The deployment includes a gateway, six WSN nodes, and a weather console.
Sensor nodes.-Each wireless sensor unit is a customized Arduino microcontroller consisting of a main functionality module board and a sensor interface board. The sensor board can be interfaced with various sensors to measure soil moisture content, pH level, soil temperature, leaf wetness, ambient temperature, solar radiation, atmospheric pressure, humidity, and weather parameters including wind direction, precipitation, and wind speed. It uses the ZigBee protocol with an XBee PRO S2 2.4 GHz module to transmit sensor data to the gateway and to communicate with other nodes. Sensor integration and programming can be achieved via the Integrated Development Environment (IDE). Each WSN node is equipped with 3.7 V, 1150 mAh lithium-ion batteries for power.
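For a rough sense of the node power budget, the nominal battery energy follows directly from the stated voltage and capacity. The sketch below also estimates an idealized runtime; the 5 mA average current is purely an assumed duty-cycled draw, not a measurement from the testbed, and the function names are hypothetical.

```python
def battery_energy_wh(voltage_v, capacity_mah):
    """Nominal stored energy in watt-hours: volts times amp-hours."""
    return voltage_v * capacity_mah / 1000.0

def runtime_hours(capacity_mah, average_current_ma):
    """Idealized runtime, ignoring self-discharge and conversion losses."""
    return capacity_mah / average_current_ma

energy = battery_energy_wh(3.7, 1150)                 # battery rating from the text
hours = runtime_hours(1150, average_current_ma=5.0)   # 5 mA is an assumed draw
```

A single 3.7 V, 1150 mAh cell stores about 4.26 Wh; under the assumed 5 mA average draw it would last roughly 230 hours between charges, which is why duty cycling and solar recharging matter for unattended nodes.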
IoT Gateway.-A Linux-based mesh router is used as an IoT gateway where all the sensor data is saved in a local MySQL database. The gateway supports different wireless communication protocols, but this project uses the ZigBee protocol to communicate with the sensor nodes. Furthermore, it supports Ethernet or Wi-Fi connection, so data stored in the local database can easily be synchronized to an external database via TCP/IP through a Wi-Fi or cellular connection. Additionally, the gateway can push sensor data to a cloud platform. The gateway provides a user interface application to view recently captured data, as shown in Fig. 6.

Services and the cloud.-The gateway pushes sensor data to the Microsoft Azure cloud platform, a limited paid cloud service platform, and Google Firebase, a free cloud service with a generous storage limit. The cloud provides convenient and flexible access to data for the intended user. It enables data access outside the farm network, long-term applications like crop suggestions, and data analytics. Therefore, data from this smart farm system can be accessed through web-based applications and smartphones.

Table I (partial).
• Machine vision uses "canny edge" detection and seeded region growing to estimate leaf area; one-way and two-way ANOVA are used to examine the correlation between environmental sensor data and leaf growth.
• To find the best growth condition for orchids and increase productivity.
• To increase production of crops through yield prediction.
• Counting coffee fruits on a branch, estimating weight, and the maturation percentage of the coffee fruits.
• Image processing using digital imagery.
• Crop characteristics were obtained using remote sensing approaches.
• Sunflower seed yield could be reasonably estimated using crop characteristic indexes under complex environmental conditions and management options (e.g., saline soils, nitrogen application).
• Remote sensing for collecting data of high spatial-temporal resolution, e.g., NDVI for crop development.
Application.-With the data infrastructure extended to the cloud, part of the goal for this project is to live-stream sensor data using a mobile application. The mobile application platform enables intended users to understand what their farm is doing in real time and additionally to track critical information on energy consumption, irrigation events, and weather variables. The sensor data stored in the local database of the IoT gateway are constantly synchronized to an external MySQL database located in a virtual machine and to the Google Firebase cloud. The data have also been successfully pushed to Microsoft Azure cloud services. However, due to storage limits and cost, Google Firebase is selected for this application. The mobile application, GreenLink Farming, is currently developed for Android OS and will be extended to iOS in the future. The functionality of the GreenLink Farming app is summarized as follows:
1. A dashboard menu with soil moisture content, leaf wetness, and soil temperature, critical to the feedback response of irrigation events, as shown on the right side of Fig. 8.
2. Insight into previously collected and real-time sensor data. These data are divided into five tracks: weather data, soil data, yield data, energy data, and water data, as shown in Fig. 7.
3. Data visualization capability: sensor data can be viewed as a list or plotted to give insight into trends and patterns in the data.
4. Data analytics: predictive modeling of crop yield, weather, energy, and water using different ML techniques. The end objective is to eventually maximize food production through multi-objective optimization of the aforementioned variables. Additionally, it will explore the interdependence of food production, energy, and water.
Data analysis.-Part of the goal for this project is to use high-resolution sensor data for the prediction of crop yield, weather, and crop quality. This IoT solution manages variations in the field to increase crop yield, raise productivity, and reduce the consumption of agricultural inputs. The data-driven physical model informs farmers how much energy is being produced and consumed by their farm, how much water is being consumed and recycled, and the quality of the yields. Monitoring weather data long-term will give better leverage in building a time-series forecast model that can accurately predict the weather a day ahead, equipping the farmer to decide when to irrigate. The mobile application platform provides just this, showing the intended user on the dashboard when to schedule irrigation. The number of measurements that sensors can take makes the data storage and management process overwhelming, but it will help narrow down potential predictor attributes in crop data sets. As this project is ongoing, the data analysis is in a preliminary phase, and more data processing and mapping need to be completed. The research tasks for the data analysis track, as shown in the left diagram of Fig. 8, are listed as follows:
1. Tracking and gathering data from the food, energy, and water infrastructure.
2. Data pre-processing to organize, clean, and prepare the data. This step is critical since the data in this project have different temporal and spatial scales.
3. Modeling with different machine learning algorithms:
• Use a classification and regression trees (CART) model to identify potential predictors for crop yield, FEW interactions, and yield quality
• Use an autoregressive integrated moving average (ARIMA) model for all time-series based sensor data
• Use a deep neural network on remote sensing data to supplement the WSN data
• Evaluate the models
4. Upgrading and modifying the mobile and web-based applications to display predicted values.

This project implements an IoT based data-driven prototype for an integrated food, water, and energy system. The main goal is to monitor and measure the three interdependent resources using wireless sensor networks and an IoT platform across the whole system. To ease data acquisition and integration, the GreenLink Farming mobile application is designed and implemented with Google Firebase cloud storage as a back-end. The implications of such a system are many: it addresses the current research challenge of the lack of data-driven integrated FEW systems, it explores the application of AI in agriculture, and it facilitates the adoption of IoT technology in an agriculture sector that has been slow to adopt it. It revolutionizes the way farmers cultivate by giving them direct insight into what their farm is doing through a mobile application capable of data integration, visualization, and analytics. Subject to a cost feasibility analysis, the prototype is also well suited to regions of the world where access to electricity is a challenge, through the use of off-grid solar panels with energy storage. [71][72][73] There is no doubt data-driven techniques can tremendously help boost agricultural productivity. This case study presents an end-to-end IoT platform for agriculture that collects and monitors data from various sensors, with a data analysis framework that is easily accessed via smartphone and the internet.
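The pre-processing task listed in the data analysis track, reconciling sensor streams recorded at different temporal scales, can be sketched as follows. This is one possible approach under stated assumptions (timestamps in seconds, hourly averaging, made-up readings); the function names are hypothetical and the testbed's actual pipeline is not described at this level of detail.

```python
from collections import defaultdict

def hourly_means(readings):
    """Average (timestamp_seconds, value) readings into hourly buckets.
    Returns {hour_index: mean_value}, where hour_index = timestamp // 3600."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp // 3600].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

def align(series_a, series_b):
    """Keep only the hours present in both series, as (hour, a, b) rows."""
    common = sorted(set(series_a) & set(series_b))
    return [(hour, series_a[hour], series_b[hour]) for hour in common]

# Hypothetical readings: soil moisture every 30 min, energy use once per hour.
soil = [(0, 30.0), (1800, 32.0), (3600, 28.0), (5400, 30.0)]
energy = [(600, 1.2), (4200, 1.5), (7800, 1.1)]
rows = align(hourly_means(soil), hourly_means(energy))
```

Each row then pairs soil and energy values on a common hourly grid, a form that tree-based or time-series models can consume directly.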

Conclusions
Agriculture, like several industries, is undergoing a digital transformation. The amount of data being collected from farms is increasing exponentially. The use of wireless sensor networks, IoT, robotics, drones, and AI is on the upswing. Machine learning algorithms enable the extraction of useful information and insights from the deluge of data. This paper has reviewed the ML methods frequently used by researchers in the past two years in conjunction with wireless sensor networks. The coming years may see an increased use of more advanced techniques like distributed (or edge) deep learning. AI must be leveraged to increase the automation of tasks in agriculture and improve yield while optimizing the use of natural resources. This paper has shown different ML models applied in multiple applications within the precision agriculture ecosystem, including yield prediction and weed and disease detection. The reviewed work focuses specifically on WSN based PA applications where ML algorithms were implemented for data mining, forecasting, and automation purposes. By applying ML to sensor data, farm management systems are evolving into real AI systems, providing optimal insights for decisions and actions. This observation is further demonstrated through an experimental smart farm prototype case study. An environment to help anyone deploy a PA monitoring application has been described and successfully evaluated in this case study. The architecture, hardware, communication protocol, and data acquisition infrastructure are detailed. The implementation of the smartphone application and the back-end data analysis framework for prediction of weather, crop yield, and crop quality is presented.