BI-RESPONSE TRUNCATED SPLINE NONPARAMETRIC REGRESSION WITH OPTIMAL KNOT POINT SELECTION USING GENERALIZED CROSS-VALIDATION IN DIABETES MELLITUS PATIENT'S BLOOD SUGAR LEVELS

This article discusses statistical modeling applied in the health sector. The study uses bi-response nonparametric regression with a truncated spline estimator, that is, a model with two response variables. Nonparametric regression is used when the shape and pattern of the regression curve are unknown. The study aims to model the blood sugar levels of people with diabetes mellitus. The data are blood sugar levels before fasting, blood sugar levels two hours after fasting, cholesterol levels, and triglyceride levels of people with diabetes mellitus. The optimal knot points are determined using Generalized Cross-Validation (GCV), and the parameters are estimated by Weighted Least Squares. The best model was obtained from the study results,


INTRODUCTION
Regression analysis is used to determine the relationship between a response variable and one or more predictor variables [1]. It is an analytical method for analyzing data and drawing meaningful conclusions about how one variable depends on others [2]. There are three approaches to estimating the regression curve: parametric, nonparametric, and semiparametric [3], [4]. If the regression curve follows a known pattern, such as a linear, quadratic, cubic, or degree-k polynomial form, the parametric approach is appropriate. The nonparametric approach is used when the shape of the regression curve is unknown. The semiparametric approach is used when part of the regression curve has a known shape and part does not [5], [6].
When the regression function is unknown, researchers often analyze the data using nonparametric regression [7]. Some authors, such as Hardle [8] and Wahba [9], suggest nonparametric regression for modeling data because of its flexibility. The nonparametric regression model is used when the shape or pattern of the regression curve between the response variable and the predictor variable is unknown [10], [11]. It is highly flexible in forming the regression curve and, in particular, does not require parametric assumptions [12]. Widely used nonparametric regression approaches include the kernel [13]-[16], truncated spline [17]-[20], and Fourier series [14], [21], [22] estimators, among others. A popular nonparametric regression approach is the truncated spline.
A truncated spline is a truncated polynomial function of order k whose segments are joined at connecting points called knot points [3], [23]. A knot point is the joint point at which the behavior pattern of the function or curve changes. The order of the function indicates the degree of the polynomial. The knot points and the order are then used to determine the truncated spline regression model [10]. Truncated splines can describe changes in behavior patterns and processes on sub-intervals, can handle data patterns that rise or fall sharply with the help of knots, and produce relatively smooth curves [3].
Nonparametric regression has been studied and developed according to the type of response and the number of response variables involved in the model. Based on the number of responses, regression models are divided into single-response, bi-response, and multiresponse models [24], [25]. Truncated spline regression with one predictor variable and one response variable is called univariable truncated spline regression. If there is one response variable and more than one predictor variable, the regression is called multivariable truncated spline regression. Regression with two response variables is called bi-response truncated spline regression. Bi-response truncated spline regression is a form of multivariate regression because it involves more than one correlated response variable together with one or more predictor variables [26], [27]. The correlation between the responses in the bi-response regression model can be assessed using the Pearson correlation coefficient. Studies that examine bi-response truncated spline regression include [24], [25], [28]. Regression analysis is widely applied to problems in the health sector.
Diabetes is one of the health conditions with a high risk of death. The WHO defines diabetes mellitus as a chronic metabolic disorder of multiple etiologies, characterized by high blood sugar levels and accompanied by disturbances of carbohydrate, lipid, and protein metabolism as a result of insufficient insulin function [29], [30].
Diabetes is not only a cause of premature death worldwide. It is also a significant cause of blindness, heart disease, and kidney failure. The International Diabetes Federation (IDF) estimates that at least 463 million people aged 20-79 years had diabetes in 2019, equivalent to a prevalence of 9.3% of the total population in that age range. The prevalence of diabetes is estimated to increase with age, reaching 19.9%, or 111.2 million people, among those aged 65-79 years. The number is predicted to grow to 578 million in 2030 and 700 million in 2045 [31]. Based on this background, this study applies bi-response truncated spline regression to model the blood sugar levels of inpatients with diabetes mellitus at Abdul Wahab Sjahranie Hospital (AWS) Samarinda, East Kalimantan.

A. Nonparametric Regression
In general, the nonparametric regression model for data pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$, is written in Equation (1):

$$y_i = f(x_i) + \varepsilon_i \qquad (1)$$

where $f$ is a regression curve of unknown shape and $\varepsilon_i$ is a random error. The general purpose of regression analysis is to estimate or predict the value of the response variable when the predictor variable is assigned a value [32]-[34]. In other words, the aim is to find an estimate that matches the shape of the regression curve.

B. Truncated Spline Nonparametric Regression
A truncated spline is a regression model that can adapt to changes in data patterns, so it is often said to have high flexibility. The truncated spline function is written in Equation (2).
The truncated function in Equation (3) is described in Equation (4).
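As an illustration of the truncated function, the sketch below assumes the usual truncated power definition: the term equals (x - K) raised to the spline degree when x is at or beyond the knot K, and zero otherwise. The function name `truncated_power` and its arguments are hypothetical and are not taken from the equations of this paper.

```python
def truncated_power(x, knot, degree=1):
    """Return (x - knot)_+^degree for degree >= 1.

    The term is zero to the left of the knot and a polynomial
    in (x - knot) to the right of it.
    """
    diff = x - knot
    return float(diff ** degree) if diff > 0 else 0.0


# A point left of the knot contributes nothing; a point right of it does.
left = truncated_power(3.0, knot=5.0)             # 0.0
right = truncated_power(7.0, knot=5.0)            # 2.0
right_sq = truncated_power(7.0, knot=5.0, degree=2)  # 4.0
```

Because the term switches on only past the knot, each knot lets the fitted curve change its slope (or curvature) at that point.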
The regression model in Equation (3) can be written in matrix form as in Equation (5).
Applying Maximum Likelihood Estimation (MLE) to the matrix form of the model yields the estimated parameter vector given in Equation (6).
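The matrix form and the estimator can be sketched as follows for a single predictor. This is a minimal illustration under stated assumptions, not the authors' implementation: the design matrix is assumed to hold the polynomial terms followed by one truncated term per knot, and the errors are assumed to be independent Gaussian with constant variance, in which case the MLE of the coefficients coincides with ordinary least squares. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def truncated_spline_design(x, knots, degree=1):
    """Design matrix [1, x, ..., x^m, (x-K_1)_+^m, ..., (x-K_r)_+^m], degree >= 1."""
    x = np.asarray(x, dtype=float)
    poly = [x ** j for j in range(degree + 1)]
    trunc = [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(poly + trunc)


def fit_least_squares(X, y):
    """Least-squares estimate; equals the Gaussian MLE of the coefficients."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Synthetic, noise-free data whose slope changes at x = 5 (illustrative only)
x = np.linspace(0, 10, 21)
y = 2.0 + 0.5 * x + 1.5 * np.clip(x - 5.0, 0.0, None)

X = truncated_spline_design(x, knots=[5.0], degree=1)
beta_hat = fit_least_squares(X, y)  # intercept, slope, slope change at the knot
```

With the knot placed where the slope actually changes, the three coefficients of the generating function are recovered.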

C. Bi-response Truncated Spline Nonparametric Regression
Bi-response truncated spline regression is a nonparametric regression model with two response variables that are strongly correlated with each other. The model for the bi-response truncated spline is written in Equation (7).
The two regression functions, one for each response, are curves of unknown shape and are approximated by the bi-response truncated spline in Equation (8).
The regression model in Equation (8) can be written in matrix notation, in which $\mathbf{0}$ denotes a null matrix that fills the off-diagonal blocks of the design matrix.
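The stacked matrix form and a weighted least-squares estimate can be sketched as below. The sketch rests on several assumptions that are not confirmed by the equations of this paper: the two responses are stacked one after the other, the design matrix is block-diagonal with null off-diagonal blocks, and the weight matrix is the inverse of a 2 x 2 response covariance repeated for every observation. All names, knots, and numbers are hypothetical.

```python
import numpy as np


def design(x, knot):
    """Linear truncated spline design matrix with a single knot."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, np.clip(x - knot, 0.0, None)])


def block_diag2(X1, X2):
    """Block-diagonal design matrix with null matrices off the diagonal."""
    n1, p1 = X1.shape
    n2, p2 = X2.shape
    top = np.hstack([X1, np.zeros((n1, p2))])
    bottom = np.hstack([np.zeros((n2, p1)), X2])
    return np.vstack([top, bottom])


def wls_estimate(X, y, W):
    """Weighted least squares: solve (X' W X) beta = X' W y."""
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)


# Synthetic, noise-free responses (illustrative only)
x = np.linspace(0, 10, 21)
X1, X2 = design(x, knot=4.0), design(x, knot=6.0)
beta1 = np.array([1.0, 2.0, -1.0])
beta2 = np.array([3.0, -0.5, 2.0])
y = np.concatenate([X1 @ beta1, X2 @ beta2])

# Assumed weights: inverse response covariance, repeated for each observation
sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
W = np.kron(np.linalg.inv(sigma), np.eye(len(x)))

X = block_diag2(X1, X2)
beta_hat = wls_estimate(X, y, W)  # coefficients of response 1, then response 2
```

The off-diagonal weights are what let the correlation between the two responses enter the estimate; with an identity weight matrix the estimator reduces to two separate least-squares fits.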

D. Generalized Cross-Validation (GCV)
Theoretically, the GCV method is asymptotically optimal, is invariant to transformation, and does not require the error variance to be known. The formula for selecting the optimal knot points in bi-response truncated spline regression using GCV is written in Equation (12).
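Knot selection by GCV can be sketched as a grid search. The example below is a single-response analogue under stated assumptions, not the bi-response formula of this paper: it uses the common form GCV(K) = MSE(K) / (trace(I - A(K)) / n)^2, where A(K) is the hat matrix for knot K, and it searches one knot over the interior data points. The function names, the random seed, and the synthetic data are hypothetical.

```python
import numpy as np


def design(x, knot):
    """Linear truncated spline design matrix with a single knot."""
    return np.column_stack([np.ones_like(x), x, np.clip(x - knot, 0.0, None)])


def gcv_score(X, y):
    """GCV = MSE / (trace(I - A) / n)^2 with hat matrix A = X X^+."""
    n = len(y)
    A = X @ np.linalg.pinv(X)
    resid = y - A @ y
    mse = resid @ resid / n
    return mse / (np.trace(np.eye(n) - A) / n) ** 2


# Synthetic data with a slope change at x = 5 plus small noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 41)
y = 2.0 + 0.5 * x + 1.5 * np.clip(x - 5.0, 0.0, None) + rng.normal(0, 0.02, x.size)

candidates = x[2:-2]  # interior points considered as candidate knots
scores = [gcv_score(design(x, k), y) for k in candidates]
best_knot = candidates[int(np.argmin(scores))]
```

The knot with the smallest GCV score is taken as optimal. For several knots or several predictors, the same score would be evaluated over combinations of candidate knots.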

A. Data Sources
The data used in this study are secondary data obtained from Abdul Wahab Sjahranie Hospital (AWS) in 2022. The research variables used in this study are presented in Table 1. Triglycerides are a type of fat found in the blood; triglyceride levels were measured in patients with diabetes mellitus.

MAIN RESULTS
This section presents the results of the bi-response truncated spline nonparametric regression applied to the blood sugar levels of inpatients with diabetes mellitus at Abdul Wahab Sjahranie Hospital (AWS) Samarinda, East Kalimantan.

A. Descriptive Statistics
Descriptive statistical analysis gives an overview of the data. The descriptive statistics used in this study are the mean, minimum, maximum, and standard deviation. The results are presented in Table 2.
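The four summary measures can be computed as in the following minimal sketch. The readings are made-up illustrative values, not the hospital data, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the divisor used in this study is not stated.

```python
import statistics


def describe(values):
    """Return the mean, minimum, maximum, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Made-up illustrative readings, not the data of this study
example = [110, 145, 180, 200, 165]
summary = describe(example)
```

Applying `describe` to each response and predictor variable would give one row of a summary table per variable.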

B. Scatter Plot
The initial step in bi-response truncated spline regression is to examine the relationship pattern between each response variable and each predictor variable using scatter plots. The goal is to see the data patterns that are formed. Figure 1 shows the scatter plots of blood sugar levels in patients with diabetes mellitus before fasting ($y_1$) against the predictor variables, namely total cholesterol levels ($x_1$) and triglyceride levels ($x_2$). Figure 2 shows the scatter plots of blood sugar levels two hours after fasting ($y_2$) against the same predictor variables, total cholesterol levels ($x_1$) and triglyceride levels ($x_2$). Figures 1 and 2 show that the relationship between the response variables and the predictor variables does not follow a known pattern, so a nonparametric regression approach is appropriate.
The bi-response truncated spline regression model with two predictor variables and three knot points is presented in Equation (15).
Specifying this general bi-response truncated spline regression model is the first step in estimating the parameters. Next, the model with the optimal knot points is selected.

D. Selection of Optimal Knot Points
The best bi-response truncated spline regression model is the one with the minimum GCV value and the maximum $R^2$ value. The bi-response truncated spline nonparametric regression model with three optimal knot points has the form given in Equation (15). With the parameters of the three-knot model estimated as described previously, the best bi-response truncated spline regression model is written in Equation (16).

E. Model Fit Test
A comparison between the estimates and the observed blood sugar levels of diabetic patients is presented in Figure 3. Figure 3 shows how closely the estimated (predicted) values follow the actual data.
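The closeness of the estimates to the actual data can also be summarized numerically by the coefficient of determination. The sketch below assumes the usual definition, $R^2 = 1 - SS_{res} / SS_{tot}$; the function name and the numbers are hypothetical and are not the data or results of this study.

```python
def r_squared(actual, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative values, not the data of this study
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(actual, fitted)  # SS_res = 0.10, SS_tot = 5.0
```

A value close to 1 means that the fitted values reproduce most of the variation in the observed response.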

CONCLUSION
The best model is the bi-response truncated spline nonparametric regression model with three knot points. The resulting GCV value is 8.573, with an $R^2$ of 99.625%. The bi-response truncated spline regression model for the first response variable, namely data on blood glucose levels in patients with diabetes mellitus during fasting, is as follows.