Artificial neural network-based repair and maintenance cost estimation model for rice combine harvesters

This research proposes an artificial neural network (ANN)-based repair and maintenance (R&M) cost estimation model for agricultural machinery. The proposed ANN model can achieve high estimation accuracy with a small data requirement. In the study, the proposed ANN model is implemented to estimate the R&M costs using a sample of locally-made rice combine harvesters. The model inputs are geographical regions, harvest area, and curve-fitting coefficients related to historical cost data; and the ANN output is the estimated R&M cost. Multilayer feed-forward is adopted as the processing algorithm and Levenberg-Marquardt backpropagation learning as the training algorithm. The R&M costs are estimated using the ANN-based model, and the results are compared with those of the conventional mathematical estimation model. The results reveal that the percentage error between the conventional and ANN-based estimation models is below 1%, indicating the proposed ANN model's high predictive accuracy. The proposed ANN-based model is useful for setting the service rates of agricultural machinery, given the significance of R&M cost in profitability. The novelty of this research lies in the use of curve-fitting coefficients in the ANN-based estimation model to improve estimation accuracy. Besides, the proposed ANN model could be further developed into web-based applications using a programming language to enable ease of use and greater user accessibility. Moreover, with minor modifications, the ANN estimation model is also applicable to other geographical areas and to tractors or combine harvesters of different countries of origin.


Introduction
In agricultural machinery operation, fuel costs account for the largest proportion of the variable cost, varying by the extent of operation [1] . Another significant variable-cost item is repair and maintenance (R&M) cost, which is a function of annual usage and machinery service life [2] . The ownership and operating costs constitute the key cost components of agricultural machinery investment. The ownership (fixed) cost, including depreciation, interest on investment, tax and insurance, is straightforward. On the other hand, the operating costs (e.g., fuel, lubricant, labor, R&M) vary in amount subject to the extent of operation.
As a result, a model to estimate the operating costs of agricultural machines, particularly the R&M outlay, is vital to the effective cost management and maximum return on investment [4][5][6] . The R&M cost of agricultural machinery is nonlinear, subject to a variety of factors including machinery age, extent of operation, harvest area, soil condition, crop type, and operators' skills and experience [7] .
As an alternative to the conventional mathematical methods, artificial neural networks (ANN) have been adopted to estimate the R&M cost of agricultural machinery. Ranjbar et al. [22] compared two neural network structures (a single network and separate networks) for estimating the R&M costs of tractors and found that a single network gave better results than separate networks for each cost component. They concluded that neural networks could improve the economic decision-making capabilities of machinery managers. Rohani et al. [23] estimated the R&M costs of two-wheel-drive tractors using ANN and conventional mathematical models, and reported that the ANN model achieved a coefficient of determination (R²) and root mean square error (RMSE) of 0.99 and 0.3674, respectively. Azim et al. [24] predicted the R&M cost of two-wheel-drive tractors using a multilayer neural network and compared the Backpropagation Declining Learning Rate Factor (BDLRF) algorithm with the Feed-Forward Backpropagation (FFBP) training algorithm. Their results showed that FFBP surpassed BDLRF in predicting tractor R&M costs, using separate networks rather than a single network.
Despite their significantly smaller data requirement, the existing ANN models suffer from limited estimation accuracy compared to the mathematical models [25]. In light of the large data requirements of mathematical models and the limited predictive accuracy of existing ANN models, this research proposes an ANN-based R&M outlay estimation model for agricultural machinery. The proposed ANN-based estimation model can achieve high predictive accuracy with a small data requirement. In this study, the proposed ANN-based model is implemented to estimate the R&M costs using a sample of locally-made rice combine harvesters in the rice-growing regions of Thailand.
The inputs of the ANN estimation model are geographical regions (Thailand's northern, northeastern, and central regions), size of harvest area, and curve fitting coefficients related to historical cost data; and the ANN output is estimated R&M cost.
Levenberg-Marquardt (LM) backpropagation has several advantages over traditional backpropagation. In terms of convergence speed and rate, LM-based backpropagation (BP) performs better than other algorithms such as Artificial Bee Colony-Levenberg Marquardt (ABC-LM), Artificial Bee Colony-backpropagation (ABC-BP), and backpropagation neural network (BPNN) algorithms [26]. Sapna et al. [27] concluded that the Levenberg-Marquardt algorithm gives the best performance in the prediction of diabetes compared to other backpropagation algorithms. In this research, multilayer feed-forward and Levenberg-Marquardt backpropagation learning algorithms are used for R&M cost estimation. Unlike previous ANN models, the proposed ANN estimation model incorporates curve-fitting coefficients, which are part of the mathematical technique, to improve the predictive accuracy. To validate the model, the ANN-based R&M cost estimations are calculated, and the results are compared with those of the conventional estimation model. The proposed ANN-based model is useful for setting the rental rate or service charge of agricultural machinery in an efficient and reasonable manner, given the significance of R&M cost in profitability.

Research methodology
The research methodology consists of three stages: data collection, evolution and training of the algorithmic scheme, and validation. In the data collection stage, a field survey is carried out to gather data on purchase prices, harvest areas, machine ages, and historical annual R&M costs. The curve-fitting coefficients of mathematical functions are then determined.
In the ANN algorithmic scheme evolution stage, multilayer feed-forward is adopted as the processing algorithm and Levenberg-Marquardt (LM) backpropagation learning as the training algorithm. The post-training ANN-based algorithmic scheme is subsequently established to estimate the R&M cost. In the validation stage, the ANN-based R&M cost estimations are calculated, and the results are compared with those of the conventional estimation model. The validation stage is detailed in the Results and Discussion section.
In this research, a field survey was undertaken with a random sample of 100 owners of locally-made rice combine harvesters in 30 rice-growing provinces (excluding pre-owned vehicles). The owners have maintained detailed records of R&M costs since the first year of machine acquisition. Because the survey participants are required to have a complete record of repairs and maintenance, the sample size is limited to 100 combine harvesters. Besides, previous research works on the relationship between R&M cost and usage relied on a minimum sample size of 30 [4,7]. Since there exists no official and systematic record keeping of R&M costs in Thailand, this research conducts a field survey by face-to-face interviews using a semi-structured questionnaire to collect the data. This method is a straightforward and efficient way to collect data from participants [27].
The 30 rice-growing provinces consist of 7 provinces in the North (20 combine harvesters), 11 provinces in the Northeast (40), and 12 provinces in the Central Plains (40). In Thailand, rice cultivation is densely concentrated in the central region due to fertile lands and efficient irrigation. The soils of the northern and northeastern regions are saline and gravelly, while that of the central region is clay loam. Since topographical features vary from region to region and influence the operation and R&M costs of rice combine harvesters, this research uses data from different geographical regions. The average age of the rice combine harvesters is six years. The field survey data include initial acquisition costs (purchase prices), years in service, annual harvest areas, and annual R&M outlays, including lubricants, oil filters, spare parts, and labor. Table 1 lists the specifics of the surveyed rice combine harvesters. Rice combine harvesters are categorized by cutting width into small and large combine harvesters. Due to different farm scales and crop types, the large cutting widths (5-6 m) are normally used in European countries and the U.S., while the small widths (1-4 m) are ubiquitous in Asian countries [28].
Figures 1a-1c, plotted using MATLAB/Simulink, illustrate the accumulative R&M cost (USD) of rice combine harvesters relative to harvest area (hectare, hm²) for Thailand's northern, northeastern, and central regions, respectively. The relationships between R&M cost and harvest area are nonlinear. Conventionally, the accumulative R&M cost as a percentage of initial purchase price is a function of accumulative hours of machinery use, and the machine is replaced upon reaching a predetermined maximum hour-usage threshold. However, this practice is impractical in the Thai setting due to a lack of hour-based R&M expenditures. In Thailand, the R&M outlay is recorded as an annual lump sum for the total harvest area.
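The accumulative series of the kind plotted in Figures 1a-1c can be derived from annual lump-sum records by taking running sums of harvest area and R&M outlay. The following is a minimal sketch under stated assumptions: the records, the variable names, and the helper `accumulate_records` are hypothetical and purely illustrative, and the numbers are not survey data.

```python
from itertools import accumulate

# Hypothetical annual records for one harvester (illustrative values only,
# not survey data): harvest area in hm^2 and lump-sum R&M outlay in USD.
annual_area_hm2 = [120.0, 135.0, 110.0, 140.0]
annual_rm_usd = [150.0, 420.0, 610.0, 980.0]


def accumulate_records(areas, costs):
    """Return (accumulative area, accumulative cost) pairs, one per year."""
    if len(areas) != len(costs):
        raise ValueError("areas and costs must have the same length")
    return list(zip(accumulate(areas), accumulate(costs)))


points = accumulate_records(annual_area_hm2, annual_rm_usd)
for x, y in points:
    print(f"{x:7.1f} hm^2 -> {y:8.1f} USD")
```

Each resulting pair is one (x, y) point of the accumulative cost curve for that machine.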

Mathematical curve-fitting coefficients
With the accumulative R&M cost relative to harvest area, the curve-fitting models, based on coefficients of power, polynomial, exponential, and linear functions, are subsequently established. The accumulative R&M cost (y) is modeled by the power (Equation (1)), polynomial (Equation (2)), exponential (Equation (3)), and linear (Equation (4)) functions, where y is the accumulative R&M cost; a, b, and c are the curve-fitting coefficients; and x is the accumulative harvest area. The number e, also known as Euler's number, is a mathematical constant approximately equal to 2.718 28.
To obtain the curve fitting coefficients of the power function, the accumulative R&M cost and harvest area data are fitted into Equation (1) (i.e., the power function). The results are graphically depicted by geographical region in Figures 2a-2c.
To acquire the curve fitting coefficients of the polynomial function, the accumulative R&M cost and harvest area data are fitted into the polynomial function (Equation (2)). Figures 3a-3c illustrate the polynomial function-fitted accumulative R&M cost relative to accumulative harvest area of the country's North, Northeast, and Central Plains, respectively.
Figures 4a-4c show the exponential function-fitted accumulative R&M cost (Equation (3)) relative to accumulative harvest area of the northern, northeastern, and central regions, respectively. The corresponding linear function-fitted accumulative R&M outlay (Equation (4)) in relation to accumulative harvest area are depicted in Figures 5a-5c.
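The fitting step can be illustrated with a short least-squares sketch. It assumes the common functional forms y = a·x^b for the power function and y = a·e^(b·x) for the exponential function; these forms, the function names, and the synthetic data are assumptions for illustration. The fit is done by linearizing with logarithms, which minimizes the error in log space and may differ slightly from a direct nonlinear fit; the fitting procedure actually used in the study is not stated here.

```python
import math


def fit_line(us, vs):
    """Ordinary least squares for v = intercept + slope * u."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = sxy / sxx
    return mean_v - slope * mean_u, slope


def fit_power(xs, ys):
    """Fit y = a * x**b via a straight line in log-log space."""
    intercept, slope = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via a straight line in semi-log space."""
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope


# Synthetic, noise-free data generated from known coefficients
# (illustrative only, not survey data).
xs = [100.0, 200.0, 400.0, 800.0]
ys_power = [0.02 * x ** 1.8 for x in xs]
ys_exp = [2000.0 * math.exp(0.001 * x) for x in xs]

a_p, b_p = fit_power(xs, ys_power)
a_e, b_e = fit_exponential(xs, ys_exp)
print(f"power:       a={a_p:.4f}, b={b_p:.4f}")
print(f"exponential: a={a_e:.1f}, b={b_e:.5f}")
```

A linear fit is simply `fit_line(xs, ys)` applied to the untransformed data. Under the assumed exponential form, y equals a at x = 0, which is why an exponential fit cannot start from zero cost.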
In Figures 2a-2c, the minimum accumulative R&M outlays of the power-function fitting curves of the three geographical regions approach zero. The finding indicates that the power function-fitted accumulative R&M cost estimation model is applicable to small-, moderate-, and large-scale harvest areas.
In Figures 3a-3c, the minimum polynomial function-fitted accumulative R&M expenditures of the North and Central Plains approach zero, while that of the northeastern region is negative. The negative R&M outlay of the Northeast is attributable to the scarcity of repairs and maintenance during early machine service life. The polynomial function-fitted R&M cost estimation model is thus suitable for small- to large-scale farmland in the northern and central regions but unfit for small-scale harvest areas in the Northeast.
In Figures 4a-4c, the minimum exponential function-fitted accumulative R&M cost of rice combine harvesters of the three geographical regions is around USD 2000. The excessive minimum R&M expenditure is contrary to logic, rendering the exponential-fitted R&M cost estimation model impractical. Meanwhile, due to the nonlinearity of the R&M outlay of agricultural machinery, the linear function curve-fitting model is non-ideal for estimation of R&M cost, as shown in Figures 5a-5c. Table 2 lists the curve-fitting coefficients (a, b, and c) of the power, polynomial, exponential, and linear functions by geographical region (Figures 2-5). In the table, the power-function coefficients of determination (R²) of the three geographical regions are 0.9775-0.9806; those of the polynomial, exponential, and linear functions are 0.9776-0.9820, 0.9132-0.9348, and 0.9465-0.9725, respectively. The large R² values indicate high predictive accuracy of the mathematical functions.
Source: curve-fitting functions based on survey data. RMSE: root mean squared error.
In Table 2, in addition to the straightforwardness of power function model, its root mean squared errors (RMSE) for the three geographical regions are comparably small. Besides, the power function-fitted model is commonly used in estimation of R&M expenses of agricultural machinery [15,18,20,21] .
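The two goodness-of-fit measures used to compare the candidate functions can be computed as follows. This is a minimal sketch using the standard definitions of R² and RMSE; the observed and predicted values are hypothetical placeholders, not entries from Table 2.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the same unit as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical accumulative costs (USD) and fitted values, for illustration.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

fit_r2 = r_squared(observed, predicted)
fit_rmse = rmse(observed, predicted)
print(f"R^2 = {fit_r2:.3f}, RMSE = {fit_rmse:.1f} USD")
```

R² is unitless, whereas RMSE is expressed in the cost unit, so the two measures are best read together when ranking the fitted functions.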

Second stage: Algorithmic scheme evolution and training
In the second stage, multilayer feed-forward algorithm is adopted as the processing algorithm of the ANN-based estimation model. In the ANN training, Levenberg-Marquardt backpropagation learning is used as the training algorithm.
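The Levenberg-Marquardt update can be sketched compactly. At each iteration it solves (JᵀJ + μI)δ = Jᵀr for the parameter step δ, where J is the Jacobian of the model outputs, r is the residual vector, and μ is a damping factor that is decreased after a successful step and increased otherwise. To stay short, the sketch below applies the update to a two-parameter power function rather than to network weights; the model, the synthetic data, the starting point, and the damping schedule (factor of 10) are all assumptions chosen for illustration and are not taken from the study.

```python
import math


def model(x, a, b):
    return a * x ** b


def sse(xs, ys, a, b):
    """Sum of squared residuals for the current parameters."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))


def lm_step(xs, ys, a, b, mu):
    """Solve (J^T J + mu*I) delta = J^T r for the two-parameter model."""
    h11 = h12 = h22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        r = y - model(x, a, b)
        ja = x ** b                    # d(model)/da
        jb = a * x ** b * math.log(x)  # d(model)/db
        h11 += ja * ja
        h12 += ja * jb
        h22 += jb * jb
        g1 += ja * r
        g2 += jb * r
    h11 += mu
    h22 += mu
    det = h11 * h22 - h12 * h12
    # Cramer's rule for the 2x2 linear system.
    return (g1 * h22 - h12 * g2) / det, (h11 * g2 - h12 * g1) / det


def levenberg_marquardt(xs, ys, a, b, mu=1e-3, iterations=200):
    current = sse(xs, ys, a, b)
    for _ in range(iterations):
        da, db = lm_step(xs, ys, a, b, mu)
        trial = sse(xs, ys, a + da, b + db)
        if trial < current:   # accept the step, trust Gauss-Newton more
            a, b, current = a + da, b + db, trial
            mu /= 10.0
        else:                 # reject the step, lean toward gradient descent
            mu *= 10.0
    return a, b


# Synthetic, noise-free data from a = 2, b = 1.5 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 1.5 for x in xs]
a_hat, b_hat = levenberg_marquardt(xs, ys, a=1.0, b=1.0)
print(f"a = {a_hat:.6f}, b = {b_hat:.6f}")
```

In network training the same update is applied to the full weight and bias vector, with the Jacobian obtained by backpropagation.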

Multilayer feed-forward algorithm
Multilayer feed-forward algorithm is used as the processing algorithm to estimate the R&M cost, which is a function of geographical region ($R_E$), harvest area (x), and power-function curve-fitting coefficients (a, b) (Table 2). Figure 6 illustrates the schematic of the ANN-based R&M cost estimation model, consisting of three layers: input (R), hidden (S), and output (T) layers.
For the input layer ($[P]_{(1 \times i)}$), the geographical regions ($p_{1,1}$) are Thailand's northern (N), northeastern (NE), and central (C) regions; and the harvest area (x; $p_{1,2}$) is in hectares (hm²). The power-function curve-fitting coefficients (a, b; $p_{1,3}$, $p_{1,4}$) are obtained from the [...] (Figure 7) is the activation function of the input layer.
For the input-side hidden layer ([IW] (i×j) ), the number of neurons is iteratively optimized by ANN (i.e., multilayer feed-forward algorithm) based on type or complexity of experimentation [28] . The relationships are expressed in Equations (5)-(6) and graphically depicted in Figure 7.
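The forward computation of such a three-layer network can be sketched as below. The sketch assumes a standard structure: each hidden neuron applies the hyperbolic tangent sigmoid (tansig) to a weighted sum of the four inputs plus a bias, and a single output neuron forms a linear combination of the hidden outputs. The numeric region code, the input scaling, the two-neuron hidden layer, the linear output, and all weight values are hypothetical placeholders, not trained values or settings from the study.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid, 2/(1+exp(-2n)) - 1, identical to tanh(n)."""
    return math.tanh(n)


def forward(p, iw, b1, lw, b2):
    """One forward pass: inputs -> tansig hidden layer -> linear output."""
    hidden = [tansig(sum(w * x for w, x in zip(row, p)) + bias)
              for row, bias in zip(iw, b1)]
    return sum(w * h for w, h in zip(lw, hidden)) + b2


# Inputs: [region code, harvest area, coefficient a, coefficient b],
# assumed here to be already scaled to a small numeric range.
p = [0.0, 0.5, 0.1, 0.2]

# Placeholder weights and biases (arbitrary numbers for illustration).
iw = [[0.2, -0.4, 0.1, 0.3],   # input weights, one row per hidden neuron
      [-0.5, 0.6, 0.2, -0.1]]
b1 = [0.1, -0.2]               # hidden-layer biases
lw = [0.7, -0.3]               # layer weights from hidden to output
b2 = 0.05                      # output bias

estimate = forward(p, iw, b1, lw, b2)
print(f"network output (scaled units): {estimate:.6f}")
```

Here each row of `iw` holds the weights of one hidden neuron; a row-vector convention for the input simply transposes this layout.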
Substituting the hyperbolic tangent sigmoid transfer function (tansig) in $a_1$, [...] In Figures 6-8, [...] (i.e., $R_E$, x, a, b). After a series of trial-and-error runs, 10 neurons (n = 10) are selected, given the large R² (0.999 78) and optimal response time. The matrix size of the input weight [...] (10)-(14).
Figure 11 shows the internal validation result of the post-training ANN algorithmic scheme, whose R² is 0.999 78, indicating very high predictive accuracy of the algorithmic scheme. Figure 12a illustrates the geometry of the post-training ANN algorithmic scheme to estimate the R&M cost, and the MATLAB/Simulink procedural scheme of the ANN-based estimation model is depicted in Figure 12b.
In estimation of R&M cost (Figure 12b), a geographical region (North, Northeast, or Central Plains) and the corresponding region-specific coefficients (Coef a (N), Coef a (Ne), Coef a (C), Coef b (N), Coef b (Ne), Coef b (C)) are manually selected using switches 1-6, where N, NE, and C denote the northern, northeastern, and central

Results and discussion
This section is concerned with the validation results of the proposed ANN algorithmic scheme. In the validation stage (i.e., the third stage), the ANN-based R&M cost estimations are calculated, and the results are compared with those of the mathematical estimation model. Table 3 lists the simulation parameters of the conventional power-function curve-fitting and ANN-based estimation models.
Table 4 summarizes the estimated R&M outlay (USD) of the three geographical regions using the conventional curve-fitting model. In the table, given a harvest area (x) of 400 hm², the estimated R&M costs using the conventional curve-fitting model are 1749.2 USD, 2415 USD, and 1698 USD for the northern, northeastern, and central regions, respectively. With a harvest area of 800 hm², the corresponding R&M costs are 5436 USD, 6562 USD, and 5123.3 USD. The R&M expenditure is positively correlated with the size of harvest area. Figure 13 illustrates the R&M cost estimations in relation to harvest area using the conventional R&M models for the three geographical regions (Equations (15)-(17)).
Figure 13 R&M cost estimations relative to harvest area using the conventional power-function curve-fitting R&M cost models of northern, northeastern, and central regions
Table 5 presents the R&M cost estimations of the three geographical regions using the ANN-based estimation model, based on the simulation parameters in Table 3. Given a harvest area (x) of 400 hm², the estimated R&M costs using the ANN-based model are 1749 USD, 2416 USD, and 1699 USD for the northern, northeastern, and central regions, respectively. With a harvest area of 800 hm², the corresponding R&M costs are 5437 USD, 6563 USD, and 5124 USD. The R&M expenditure and harvest area size are positively correlated. The R&M cost estimations relative to harvest area using the ANN-based estimation model by geographical region are shown in Figure 14.
In Equation (18), R&M_CONV and R&M_ANN are the R&M cost estimations of the conventional and ANN-based models, respectively.
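The comparison between the two models can be illustrated with a short calculation. The sketch assumes the usual relative-difference definition of percentage error, (R&M_CONV − R&M_ANN) / R&M_CONV × 100; this form and its sign convention are assumptions, since Equation (18) is not reproduced here. The inputs are the rounded estimates quoted in the text for 400 and 800 hm², so the outputs need not reproduce the values in Table 6.

```python
def percentage_error(conventional, ann):
    """Assumed form: (conventional - ANN) / conventional * 100."""
    return (conventional - ann) / conventional * 100.0


# Rounded estimates (USD) quoted in the text, keyed by (region, area in hm^2).
conventional = {("N", 400): 1749.2, ("NE", 400): 2415.0, ("C", 400): 1698.0,
                ("N", 800): 5436.0, ("NE", 800): 6562.0, ("C", 800): 5123.3}
ann = {("N", 400): 1749.0, ("NE", 400): 2416.0, ("C", 400): 1699.0,
       ("N", 800): 5437.0, ("NE", 800): 6563.0, ("C", 800): 5124.0}

errors = {key: percentage_error(conventional[key], ann[key])
          for key in conventional}

for (region, area), err in errors.items():
    print(f"{region:>2} at {area} hm^2: {err:+.4f} %")
```

Every value computed this way is well below 1% in absolute terms.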
In Table 6, the error between the conventional curve-fitting and ANN-based R&M cost estimation models of the northern region (N) is between -0.003 80% and 0.001 24%. The percentage errors between both estimation models for the Northeast (NE) and Central Plains (C) are between 0.000 41% and 0.000 04%, and between -0.003 06% and 0.000 584 6%, respectively. The overall error is below 1%, indicating good agreement between the conventional and ANN-based estimation models.
Figure 15 compares the R&M outlay estimations of the conventional and ANN-based models of the northern region. The results of both models are in good agreement and consistent with Table 6. Likewise, the R&M expenditure estimations of the Northeast are also in good agreement, as shown in Figure 16. In Figure 17, the estimated R&M costs of both estimation models of the central region are also in good agreement. In essence, the results validate the applicability of the ANN-based model to estimating the R&M outlays of agricultural machinery, i.e., rice combine harvesters.
The comparison between the estimated R&M costs of both estimation models validates the suitability of the proposed ANN-based estimation model as an alternative to the conventional mathematical estimation model, as evidenced by the percentage error of less than 1%. Unlike the mathematical model, which demands a large amount of data, the proposed ANN model requires a substantially smaller amount of data for accurate estimation and thereby a lower data-collection budget. Besides, data update is more convenient for the ANN-based estimation model. The proposed ANN model also incorporates curve-fitting coefficients to improve the estimation accuracy. Specifically, the proposed ANN-based model is useful for pricing the service rate of agricultural machinery in an efficient and reasonable fashion.

Conclusions
This research proposes an ANN-based R&M cost estimation model for agricultural machinery with high estimation accuracy and a small data requirement. In the study, the proposed ANN estimation model is implemented to estimate the R&M costs using a sample of locally-made rice combine harvesters in the rice-growing regions of Thailand. The model inputs are geographical regions, size of harvest area, and curve-fitting coefficients related to historical cost data; and the ANN output is the estimated R&M cost. The ANN algorithmic scheme uses multilayer feed-forward and Levenberg-Marquardt backpropagation learning algorithms. To validate the model, the ANN-based R&M cost estimations are calculated, and the results are compared with those of the conventional mathematical estimation model. The results show that the percentage error between the conventional and ANN-based estimation models is below 1%, indicating high estimation accuracy of the proposed ANN model. The proposed ANN model requires a substantially smaller amount of data and, with the inclusion of curve-fitting coefficients in the model, can achieve improved estimation accuracy. The ANN-based model is beneficial for pricing the service rate of agricultural machinery in an efficient and reasonable manner. Moreover, with minor modifications, the ANN-based R&M cost estimation model is also applicable to other geographical areas and to combine harvesters or tractors of different countries of origin.