Prediction of fishers’ income using a flexible model in Karanggongso fishers community, Trenggalek regency, Indonesia

The contribution of fisheries to the national GDP increased from 2.32% in 2014 to 2.60% in 2018. In 2020, however, the Covid-19 pandemic hit all sectors of the economy, including fisheries. Many communities, especially coastal fishing communities, reported economic hardship: income fell dramatically because people’s purchasing power fell significantly. In response, this research built a prediction model for fishers’ income. The case study covered fishers in Karanggongso, Trenggalek Regency, through a survey of 50 fishing households. There were 12 predictor variables: Boat Type (X1), Boat Price (X2), Age of the Boat (X3), Boat Power (X4), Machine Price (X5), Engine Life (X6), Fishing Equipment Price (X7), Fishing Gear Life (X8), Cool Box (X9), Trip/week (X10), Average Hours/trip (X11), and Total Expenditures/week (X12). The response variable was Income Per Week (Y). The data were analysed using multiple linear regression and flexible modelling with a machine learning approach. The multiple linear regression model had R² = 70.5% and MSE = 1.086 × 10^18, with boat price as the most dominant influence on fishers’ income, while the flexible model had R² = 85.2% and MSE = 3.308 × 10^14. The flexible model therefore predicted more accurately than the linear regression model. It also captured nonlinear effects of the number of cool boxes and the fishing gear life.


Introduction
The contribution of quarterly fisheries GDP at current prices to national GDP rose from an average of 2.32% in 2014 to an average of 2.60% in 2018, an increase that reflects an average rise in the income of fisheries sector actors [1]. The value of fisheries GDP increased from IDR 58.97 trillion in Q1 2018 to IDR 62.31 trillion in Q1 2019. Meanwhile, the Fishers' Terms of Trade, which stood below 106% in 2014, rose to 113.08% in May 2019 [2].
The achievements up to 2019 were not reached without obstacles. The biggest threat, which continues from year to year, is rampant illegal fishing [3]. From 2014 to 2018, the Marine and Fisheries Ministry sank 463 foreign vessels proven to have carried out illegal fishing. In 2020, another threat emerged internationally: the Covid-19 pandemic hit all economic sectors around the world, including fisheries. Many people, especially in coastal fishing communities, complained of financial difficulties because their income had dropped dramatically. The cause was not a problem with going to sea to fish, but a significant fall in people's purchasing power due to the pandemic [4]. The government has tried to provide direct assistance to the community by buying fishery products directly from fishers to increase people's purchasing power. It has also increased the construction of cold storage in several areas [5].
Given these problems, it is important to develop a model that can predict fishers' income so that they can use resources optimally. Modelling research is needed that not only identifies which predictors are significant but also predicts income accurately. As stated in Universitas Brawijaya's Master Plan for Good Governance, the university has launched a research program on the strategic issues of Economic Resilience and Sustainable Local Business. This research was conducted to build a fishers' income prediction model with a machine learning approach, taking rod-and-line fishers in Karanggongso, Trenggalek Regency, as a case study.

Coastal community household production economics
According to [6], the time available to a farmer household as an economic resource can be allocated to four kinds of activity: (a) income-generating activities; (b) activities that do not generate income; (c) leisure; and (d) time devoted to acquiring skills.
Based on [7], revenue comprises all receipts of all members of an economic household, in the form of both goods and services. These receipts include: (a) withdrawals from savings; (b) sale or procurement of goods; (c) collection of receivables; and (d) irregular remittances or gifts from family or other parties, inheritance or grants, and others.
The results of the research showed that the income contribution of fisherwomen ranged from 2.25% to 45.45%, with an average of 15.09%. This low contribution reflects the low earnings from fisherwomen's activities. Most fisherwomen earn income from service work, such as "rampek" workers, "pindang" labourers, shop workers, and scavengers, or from selling with small capital.
Household expenditure is one indicator that can provide an overview of the welfare of the population. As income rises, the share of expenditure shifts from food to non-food items. Consumption expenditure is grouped into food and non-food expenditure, namely: 1. Consumption of food items: grains, tubers, fish, meat, eggs, milk, vegetables, nuts, fruits, oils and fats, beverage ingredients, spices, food and beverages, tobacco, and betel. 2. Consumption of non-food items: housing, fuel, lighting

Prediction model of fishers' income
Multiple linear regression is the statistical method most often used to predict fishers' income. Studies on the use of predictive models include "Deep learning models for the prediction of small-scale fisheries catches: Finfish fishery in the region of 'Bahía Magadalena-Almejas'" [15], which used a Non-linear Autoregressive Neural Network. Table 1 shows several types of modelling methods used for prediction, namely linear and nonlinear approaches. The linear approaches include multiple linear regression and ARIMA. The nonlinear approaches include Cobb-Douglas regression, Singular Spectrum Analysis, and Probit regression. These methods have the advantage of accommodating complex sets of exogenous variables, but their weakness is a relatively large error rate. Several studies also use a deep learning approach, one example being the Artificial Neural Network. The advantage of deep learning is a relatively lower error, because the model is fitted through a fairly complex iteration process. Its weakness is that only a limited number of exogenous variables can be used: the complexity of the iteration makes coding more difficult and the iteration process slower.

Machine learning for modelling
Arthur Samuel introduced the term machine learning in 1959 in his paper "Some Studies in Machine Learning Using the Game of Checkers" (IBM Journal of Research and Development). Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed, using a model defined by certain parameters [16]. It uses statistical theory to build mathematical models, which can be predictive or descriptive. In general, machine learning algorithms can be grouped into four categories, namely:

Types and sources of data
This study used primary data, obtained from a survey of rod-and-line fishers carried out in July 2019. The data covered production assets, variable costs and production costs, time allocated to fishing, household expenses, and income from fishery products, all recorded on an observation sheet.

Place and time of research
This research was conducted in Trenggalek Regency from July to September 2019. The research location was the target area affected by coastal development, specifically Tasikmadu Village, which hosts an Indonesian fishing port.

Population and sample research
Based on the preliminary study, 333 households living on the coast of Karanggongso, in Tasikmadu Village, Watulimo District, work as fishers. Applying the Slovin formula to this population with a sampling error of 15% (0.15) gave a sample of 40 households. To anticipate responses that failed to produce valid data, 10 households were added, so the total sample in this study was 50 households. The households were selected by simple random sampling.
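The sample size reported above can be checked with a short calculation. The sketch below assumes the usual form of the Slovin formula, n = N / (1 + N e^2), rounded up to a whole household; the function name `slovin_sample_size` is introduced here for illustration only and does not come from the study.

```python
import math


def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin formula n = N / (1 + N * e^2), rounded up to a whole unit."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    return math.ceil(population / (1 + population * margin_of_error ** 2))


# Figures taken from the text: 333 fishing households, 15% sampling error.
base_sample = slovin_sample_size(333, 0.15)

# The study added 10 households as a buffer against invalid responses.
total_sample = base_sample + 10

print(base_sample, total_sample)
```

With N = 333 and e = 0.15 the unrounded value is about 39.2, which rounds up to the 40 households stated in the text.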

Best model selection criteria
The best model was selected by comparing the candidate models and choosing the one with the largest R² and the lowest MSE. The coefficient of determination (R²) was calculated for each model; the greater the R² value, the closer the relationship between the actual and predicted values, which indicates a high level of prediction accuracy.
MSE measures the magnitude of the difference between the actual value and the predicted value. The best model is the one with the lowest MSE, that is, the smallest error and therefore the highest accuracy.
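The two selection criteria can be illustrated with a small sketch. It assumes the common definitions R² = 1 - SSE/SST and MSE = SSE/n, which may differ in detail from the formulas used in the study. The numbers and model names below are hypothetical and are not the survey data.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, computed as 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Hypothetical values for illustration only; not the survey data.
actual = [2.0, 4.0, 6.0, 8.0]
predictions = {
    "model_a": [2.5, 3.5, 6.5, 7.5],
    "model_b": [1.0, 5.0, 5.0, 9.0],
}

# Score every candidate model on both criteria.
scores = {
    name: (r_squared(actual, pred), mean_squared_error(actual, pred))
    for name, pred in predictions.items()
}

# Pick the model with the lowest MSE; on the same data this is also
# the model with the largest R², because SST is identical for all models.
best = min(scores, key=lambda name: scores[name][1])

for name, (r2, mse) in scores.items():
    print(f"{name}: R2={r2:.3f}, MSE={mse:.3f}")
print("best:", best)
```

Because both criteria are computed from the same sum of squared errors, they rank models identically when evaluated on the same observations.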

Descriptive statistics
In general, the data showed high variability, especially in boat price (X2), engine price (X5), fishing gear price (X7), and total expenses per week (X12). This can be seen from the wide gap between the minimum and maximum values and from the high standard deviations. Income per week also showed high variability.

Results of multiple linear regression analysis
Multiple linear regression was used to determine the effect of the independent variables on the income per week (Y) of fishers in Karanggongso, Trenggalek Regency. The regression results in Table 3 show that boat price (X2) and fishing gear price (X7) have p-values below 0.05 (p < 0.05), so these two variables had a significant effect on fishers' income per week (Y). The coefficient of determination (R²) was 0.705, or 70.5%, and the Mean Squared Error (MSE) was 1.086 × 10^18. The regression model therefore explained 70.5% of the variation in fishers' income per week (Y); the remaining 29.5% was attributable to error and other factors not included in the linear regression model.
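The least-squares fit behind a multiple linear regression can be sketched as follows. This is a minimal illustration, not the software or data used in the study: the helper names `fit_ols` and `predict` are hypothetical, the data are synthetic with two predictors instead of twelve, and the coefficients are obtained by solving the normal equations with Gauss-Jordan elimination.

```python
import random


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(beta, row):
    """Predicted response for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic data (assumption): y = 2 + 1.5*x1 - 0.5*x2 + small noise.
random.seed(0)
rows = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(50)]
y = [2.0 + 1.5 * x1 - 0.5 * x2 + random.gauss(0, 0.1) for x1, x2 in rows]

beta = fit_ols(rows, y)
fitted = [predict(beta, r) for r in rows]

# R² of the fitted model on the same data.
mean_y = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - mean_y) ** 2 for yi in y)
print("coefficients:", [round(b, 3) for b in beta])
print("R2:", round(1 - sse / sst, 3))
```

The estimated coefficients should land close to the values used to generate the data. A linear model of this kind places no lower bound on its predictions, so it can return negative values even when the response, such as income, is never negative.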

Flexible model analysis results
The flexible model parameters were estimated with Eureqa 1.24 software. Model generation began with the mean model and then proceeded by taking into account the complexity of the model being built. In this study, 583,589 models were generated. After the search reached a stability of 91.6%, a maturity of 85.6%, and a convergence of 96.3%, the 27 best candidate models were ranked by their R² values. Based on the goodness-of-fit coefficients, the 27th model had the highest R² value and the lowest Mean Squared Error, which shows that it is more accurate than the other models. The 27th model was therefore used to predict the income of fishers in the Karanggongso area of Trenggalek Regency.
Figure 2. Relationship between actual and predicted fishers' income.
The scatter plot in Figure 2 likewise shows a positive relationship between the actual fishers' income (Y) and the predicted values. The closer the points lie to the diagonal line, the closer the predicted values are to the true values.

Comparison of linear regression model with flexible model
Linear regression analysis produced a linear model with R² of 70.5% and MSE of 1.086 × 10^18. In contrast, the flexible model with the machine learning approach produced one model with R² of 85.2% and MSE of 3.308 × 10^14. Figure 3 shows the scatter plot between actual income and predicted income for each variable. Predicted income from the regression model has a wider spread and even takes negative values, whereas the flexible model produces no negative predictions. The descriptive statistics show that no fisher has a negative income per week, so the flexible model performs better in predicting fishers' income.