Generalized Cross Validation (GCV) in Smoothing Spline Nonparametric Regression Models

A nonparametric regression model is used when the regression curve carries no information about an assumed shape, i.e. the curve is not assumed to follow a known functional form. If a curve is estimated without any restriction on its functional form, the result is rough and non-unique. The smoothing spline can remove roughness in some segments by following the pattern of the curve. The approach that combines nonparametric regression with the smoothing spline is known as the smoothing spline nonparametric regression model. The main problem in estimation is the selection and determination of the smoothing parameter, which takes into account the number of knots used and their positions, so the Generalized Cross-Validation (GCV) method is required. This study examines the GCV method in the smoothing spline nonparametric regression model. The method used is a literature study based on articles, journals, and books that support the research aim. The results show that the GCV method yields a minimum GCV value that determines how well the smoothing parameter performs, indicated by an estimator that does not change significantly even when the number and position of the knots vary.


1. Introduction
The regression model is a tool used to determine the mathematical relationship between the predictor variable and the response variable. Estimating the regression curve is a central problem in regression analysis. In connection with this estimation, two classes of models are used: the parametric and the nonparametric regression model. Parametric regression models are suitable when the form of the regression curve is known. The assumptions of the parametric regression curve require other sources of information available in the study to provide that detailed knowledge. If the form of the regression curve is not known, or the information available about it is inadequate, then the estimate of the regression curve must rely on the data alone, and a nonparametric regression model can be used. Forcing a parametric form in that situation yields inconsistent results. In the nonparametric regression model, the curve is only assumed to belong to a function space chosen for its smoothness properties [1]. A process that removes roughness in a curve by following the pattern of the data is known as smoothing.
The development of smoothing techniques began in 1941, when smoothing was first introduced by Ezekiel. The aim of smoothing is to suppress variation in the data that carries no information, so that the characteristics of the data appear more clearly. Smoothing has become a common procedure in nonparametric regression [2]. However, if a curve is fitted without any restriction on its functional form, the shape of the curve can be inconsistent with the data, because the behavior of the function differs across its polynomial pieces. The spline technique solves this problem by dividing the curve into several segments. A model that combines the smoothing and spline techniques is known as the smoothing spline model.
Smoothing spline is a nonparametric regression approach for obtaining regression curve estimates [3]. Research conducted by [4] emphasized that estimation based on the smoothing spline technique gives better results than kernel regression, another nonparametric regression approach. The main problem when estimating the smoothing spline regression function is selecting and determining the smoothing parameter [5]. According to [6], the Generalized Cross Validation (GCV) method can be used to obtain the best smoothing parameter. The GCV method is superior among the methods available for determining the smoothing parameter because its computation is simpler and quite efficient [7]. In this study, a review of the GCV method in the nonparametric smoothing spline regression model was carried out.

2. Research Methods
The present research is theory-based research that examines the smoothing parameter in the nonparametric smoothing spline regression model using the Generalized Cross Validation (GCV) method. The main procedure is to minimize the penalized least squares criterion and to determine the function estimator. This procedure includes estimating the parameters, determining the function estimator, and determining the matrices that appear in the GCV method.

3. Results and Discussion
The study of spline regression was conducted by [8] using the spline technique. The model formulated by [9] estimates the smoothing parameter in order to assess the smoothing spline regression curve. According to [10], there are various methods for selecting the smoothing parameter, and GCV is recommended as a selection method for spline smoothing parameters because of its computational effectiveness and the precision of the functional coefficients of the regression model. Based on the results of [11], REML and GCV are good smoothing-parameter selection criteria for small and medium sample sizes.

3.1. Nonparametric regression
The model that is used when the form of the regression curve is unknown or does not follow a certain pattern is called a nonparametric regression model [12]. Nonparametric regression models estimate a regression curve that depends only on the observed data. Suppose the predictor variable and the response variable are, respectively, $(x_i, y_i)$ for $i = 1, 2, \ldots, n$; then the relationship between $x_i$ and $y_i$ is written as
$$ y_i = f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \qquad (3.1) $$
where $f$ is the regression function to be estimated and $\varepsilon_i$ is an error that is assumed to be normally distributed with variance $\sigma^2$ and zero mean.
The goal of nonparametric regression is to estimate a regression function whose form is unknown [7]. The regression curve is only assumed to lie in a certain function space, in the sense of being smooth, so that it has high flexibility [3]. There are several techniques for estimating the regression curve in nonparametric regression; one of them is the smoothing spline.
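As an illustration of the model $y_i = f(x_i) + \varepsilon_i$ above, data from it can be simulated as follows (a minimal sketch; the choice $f(x) = \sin(2\pi x)$, the sample size, and $\sigma = 0.3$ are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))   # observed predictor values
f_true = np.sin(2 * np.pi * x)          # stand-in for the unknown regression curve f
eps = rng.normal(0.0, 0.3, size=n)      # errors: normal, mean 0, variance sigma^2
y = f_true + eps                        # responses y_i = f(x_i) + eps_i
```

Only the pairs $(x_i, y_i)$ are available to a nonparametric estimator; `f_true` is shown here solely to generate the data.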

3.2. Spline function in nonparametric regression
The spline is a segmented polynomial model whose segmented nature provides better flexibility than the usual polynomial model. This property allows the spline regression model to adapt effectively to the local characteristics of the data. The use of splines is focused on behavior or data patterns that in certain regions have different characteristics from other regions [12]. The spline function of order $m$ with one explanatory variable and knots $K_1, K_2, \ldots, K_r$ can generally be written in the form
$$ f(x) = \sum_{j=0}^{m-1} \beta_j x^{j} + \sum_{k=1}^{r} \gamma_k (x - K_k)_+^{m-1}, \qquad (3.2) $$
where $(x - K_k)_+^{m-1} = (x - K_k)^{m-1}$ for $x \ge K_k$ and $0$ otherwise. Substituting equation (3.2) into equation (3.1) gives the spline nonparametric regression equation
$$ y_i = \sum_{j=0}^{m-1} \beta_j x_i^{j} + \sum_{k=1}^{r} \gamma_k (x_i - K_k)_+^{m-1} + \varepsilon_i. $$
The truncated term shows that the spline is a piecewise polynomial model, with the spline function remaining continuous at the knots [13]. A knot is characterized as a joint point within the spline function, so that the curve formed is segmented at that point. The knot point is an intersection point that marks a change in the behaviour of the curve over different intervals [14].
Furthermore, if the error $\varepsilon_i$ is assumed to be normally distributed with mean and variance $(0, \sigma^2)$ respectively, then $y_i$ is likewise normally distributed with mean and variance $(f(x_i), \sigma^2)$ respectively.
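The truncated power basis of equation (3.2) can be sketched as a design-matrix builder (the function name and the cubic default $m = 3$ are my own choices for illustration):

```python
import numpy as np

def truncated_power_basis(x, knots, m=3):
    """Design matrix for the order-m spline of equation (3.2):
    columns 1, x, ..., x^(m-1), then (x - K_k)_+^(m-1) for each knot K_k."""
    x = np.asarray(x, dtype=float)
    poly = np.column_stack([x**j for j in range(m)])   # polynomial part
    trunc = np.column_stack(
        [np.where(x >= k, (x - k) ** (m - 1), 0.0) for k in knots]
    )                                                  # truncated power part
    return np.hstack([poly, trunc])

# Example: cubic-order basis (m = 3) with two knots on [0, 1].
X = truncated_power_basis(np.linspace(0.0, 1.0, 50), knots=[0.3, 0.7])
```

Each truncated column is zero below its knot and a degree-$(m-1)$ polynomial above it, which is exactly what keeps the fitted curve continuous at the knots while allowing its behavior to change there.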

3.3. Smoothing spline nonparametric regression model
The smoothing spline is a function that can describe the data well and has a small error variance [15]. The role of the smoothing spline is to estimate the regression function as the solution of an optimization problem obtained by minimizing the Penalized Least Squares (PLS) criterion. Let $f(x)$ be a smooth function contained in a certain function space, in particular the Sobolev space $f \in W_2^2[a, b]$; let $\lambda$ be a positive number governing the regression curve estimate, and let $\varepsilon_i$ be an error assumed to be normally distributed with mean and variance $(0, \sigma^2)$ respectively [1]. An optimal estimator minimizes the penalized least squares criterion
$$ \min_f \; \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int_a^b \left( f''(x) \right)^2 dx, \qquad (3.7) $$
where $\sum_{i=1}^{n} (y_i - f(x_i))^2$ is the sum of squared errors, a measure of the distance between the actual and estimated values, and $\int_a^b (f''(x))^2 dx$ is a roughness penalty, a measure of the smoothness of the curve in mapping the data; the constant $\lambda$, which manages the balance between fidelity to the data (goodness of fit) and smoothness of the curve (penalty), is known as the smoothing parameter [16]. The value of $\lambda$ has a very large influence on the PLS criterion [17]. As $\lambda$ varies from zero to $+\infty$, the solution changes from an interpolating curve to a linear model. When $\lambda \to +\infty$, the roughness penalty dominates the PLS criterion and the spline estimate is forced to be linear. Conversely, when $\lambda \to 0$, the roughness penalty vanishes from the PLS criterion and the spline estimate interpolates the data. Therefore, the choice of the smoothing parameter is decisive in estimating the unknown function. The function estimator $\hat{f}$ can be determined by minimizing equation (3.7). Given the data, with $\mathbf{y} = (y_1, y_2, \ldots, y_n)^T$ and $\mathbf{f}$ the vector of values $f(x_i)$, equation (3.7) is written in matrix form as
$$ (\mathbf{y} - \mathbf{f})^T (\mathbf{y} - \mathbf{f}) + \lambda\, \mathbf{f}^T \mathbf{K}\, \mathbf{f}, \qquad (3.8) $$
where $\mathbf{K}$ is a penalty matrix having the specific structure
$$ \mathbf{K} = \Delta^T \mathbf{W}^{-1} \Delta, \qquad (3.9) $$
in which $\Delta^T$ is a matrix of size $n \times (n-2)$ and $\mathbf{W}$ is a matrix of size $(n-2) \times (n-2)$.
The matrices $\Delta$ and $\mathbf{W}$ are defined, with $h_i = x_{i+1} - x_i$, by the second-difference construction
$$ \Delta_{i,i} = \frac{1}{h_i}, \qquad \Delta_{i,i+1} = -\left( \frac{1}{h_i} + \frac{1}{h_{i+1}} \right), \qquad \Delta_{i,i+2} = \frac{1}{h_{i+1}}, $$
with all other entries zero, and $\mathbf{W}$ the symmetric tridiagonal matrix with $W_{i,i} = (h_i + h_{i+1})/3$ and $W_{i,i+1} = W_{i+1,i} = h_{i+1}/6$ [18]. Next, the estimator of $\mathbf{f}$ in equation (3.8) is obtained by minimizing the PLS criterion, giving
$$ \hat{\mathbf{f}} = (\mathbf{I} + \lambda \mathbf{K})^{-1} \mathbf{y} = \mathbf{S}_\lambda \mathbf{y}, \qquad (3.10) $$
where $\mathbf{S}_\lambda$ is known as the smoothing matrix; it is positive definite (symmetric) and depends on the smoothing parameter $\lambda$ but not on $\mathbf{y}$ [18]. If $\hat{f}$ is the estimator of the spline function and $\lambda$ is the smoothing parameter in the spline regression, then the optimal $\lambda$ can be selected using the GCV method.
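The construction above can be sketched numerically as follows (a sketch assuming the standard second-difference form of $\Delta$ and $\mathbf{W}$ cited from [18]; the 0-based index conventions and function names are mine):

```python
import numpy as np

def penalty_matrix(x):
    """Penalty matrix K = Delta^T W^{-1} Delta for the smoothing spline;
    x must be strictly increasing."""
    n = len(x)
    h = np.diff(x)                                  # h_i = x_{i+1} - x_i
    Delta = np.zeros((n - 2, n))                    # second-difference matrix, (n-2) x n
    W = np.zeros((n - 2, n - 2))                    # symmetric tridiagonal, (n-2) x (n-2)
    for i in range(n - 2):
        Delta[i, i] = 1.0 / h[i]
        Delta[i, i + 1] = -1.0 / h[i] - 1.0 / h[i + 1]
        Delta[i, i + 2] = 1.0 / h[i + 1]
        W[i, i] = (h[i] + h[i + 1]) / 3.0
        if i + 1 < n - 2:
            W[i, i + 1] = W[i + 1, i] = h[i + 1] / 6.0
    return Delta.T @ np.linalg.solve(W, Delta)      # K = Delta^T W^{-1} Delta

def smooth(x, y, lam):
    """Smoothing spline fit f_hat = (I + lam*K)^{-1} y = S_lam y, equation (3.10)."""
    K = penalty_matrix(x)
    return np.linalg.solve(np.eye(len(x)) + lam * K, np.asarray(y, dtype=float))
```

Because the second differences of a linear function vanish, $\mathbf{K}$ annihilates linear data, so a linear $\mathbf{y}$ passes through the smoother unchanged at any $\lambda$; this matches the limiting behavior described for $\lambda \to +\infty$.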

AEVEC 2020 · Journal of Physics: Conference Series 1808 (2021) 012053 · IOP Publishing · doi:10.1088/1742-6596/1808/1/012053

3.4. Generalized cross validation (GCV)
Determination of the optimal smoothing parameter is very important in obtaining a good curve estimator. The GCV method can be used to assess how well the resulting estimator performs. In the smoothing spline nonparametric regression model, the method is written as
$$ \text{GCV}(\lambda) = \frac{\text{MSE}(\lambda)}{\left( n^{-1} \operatorname{tr}(\mathbf{I} - \mathbf{S}_\lambda) \right)^2}, \qquad (3.11) $$
where $\mathbf{S}_\lambda$ is a matrix of size $n \times n$ defined as
$$ \mathbf{S}_\lambda = (\mathbf{I} + \lambda \mathbf{K})^{-1}, $$
with $\mathbf{K}$ the matrix in accordance with equation (3.9), $\mathbf{I}$ the identity matrix, and $\operatorname{tr}(\cdot)$ the sum of the main diagonal entries of its matrix argument, while $\text{MSE}(\lambda)$ is the mean of the squared residuals, formulated as
$$ \text{MSE}(\lambda) = n^{-1} \sum_{i=1}^{n} \left( y_i - \hat{f}(x_i) \right)^2. $$
Since the value of $\hat{f}$ is influenced by $\lambda$, the optimum smoothing parameter is selected as the one with the smallest GCV value. Therefore, to obtain the optimal estimator of the spline function, one can experiment with the value of $\lambda$ $(0 < \lambda < 1)$ until the minimum GCV is obtained. The smoothing parameter in the spline regression is optimum when the criterion value is relatively small, meaning that the estimator does not change much.
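The search for the minimum-GCV $\lambda$ can be sketched as a grid search (the penalty construction assumes the standard second-difference form; the grid range and the simulated test function are illustrative assumptions):

```python
import numpy as np

def penalty_matrix(x):
    # Second-difference penalty K = Delta^T W^{-1} Delta for the smoothing spline.
    n = len(x)
    h = np.diff(x)
    Delta = np.zeros((n - 2, n))
    W = np.zeros((n - 2, n - 2))
    for i in range(n - 2):
        Delta[i, i] = 1.0 / h[i]
        Delta[i, i + 1] = -1.0 / h[i] - 1.0 / h[i + 1]
        Delta[i, i + 2] = 1.0 / h[i + 1]
        W[i, i] = (h[i] + h[i + 1]) / 3.0
        if i + 1 < n - 2:
            W[i, i + 1] = W[i + 1, i] = h[i + 1] / 6.0
    return Delta.T @ np.linalg.solve(W, Delta)

def gcv_score(x, y, lam):
    """GCV(lam) = (n^{-1} ||(I - S)y||^2) / (n^{-1} tr(I - S))^2 with S = (I + lam*K)^{-1}."""
    n = len(x)
    S = np.linalg.inv(np.eye(n) + lam * penalty_matrix(x))
    resid = y - S @ y                       # residuals (I - S) y
    mse = resid @ resid / n                 # MSE(lam)
    return mse / (np.trace(np.eye(n) - S) / n) ** 2

# Evaluate GCV over a lambda grid and keep the minimizer.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 60)
grid = np.logspace(-6, 1, 40)
scores = np.array([gcv_score(x, y, lam) for lam in grid])
lam_opt = grid[np.argmin(scores)]
```

A log-spaced grid is the usual choice here, since GCV typically varies over several orders of magnitude of $\lambda$; in practice the minimizer can also be refined with a one-dimensional optimizer once the grid has bracketed it.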

4. Conclusion
The smoothing spline with penalized least squares combines the smoothness of the fitted function with the mean squared residuals (MSE) of the spline regression. The smoothing spline regression model is strongly affected by the value of the smoothing parameter, which plays an important role in deciding whether the resulting estimated regression curve is good. The method used to determine the smoothing parameter of the smoothing spline regression is Generalized Cross Validation (GCV); with the GCV method, a minimum GCV value can be obtained, identifying the optimum smoothing parameter, marked by an estimator that does not change significantly.