Modelling lecturer performance index of private university in Tulungagung by using survival analysis with multivariate adaptive regression spline

Survival analysis models the relationship between independent variables and survival time as the dependent variable. In practice, not all survival data can be recorded completely, for various reasons; such data are called censored data. Moreover, several survival models require distributional assumptions. The nonparametric approach to survival analysis relaxes these assumptions. In this research, the nonparametric approach employed is Multivariate Adaptive Regression Splines (MARS). This study aims to measure the performance of lecturers at a private university. The survival time in this study is the duration a lecturer needs to obtain the professional certificate. The results show that research activities are a significant factor, along with developing course materials, good publications in international or national journals, and research collaboration activities.


Introduction
Survival analysis is a statistical method used specifically to analyse data on the time until an event occurs. Survival data are typically characterised by the presence of censored observations [1]. Early work on survival focused on predicting the probability of average life expectancy. More recently, survival analysis has been developed to identify risk factors and prognostic factors associated with the progression of a disease [2], and it has been applied in many other areas.
Survival analysis describes a process associated with time, from the time origin until the occurrence of a particular event or end point [3]. There are several approaches to survival analysis, for example parametric and nonparametric approaches. One nonparametric approach that can be employed is Multivariate Adaptive Regression Splines (MARS), a nonparametric and semiparametric regression method that takes the covariates into account with a multivariate approach [4].
Nonparametric regression using MARS does not depend on the assumption of a particular curve shape, is flexible for high-dimensional data, and can model many interactions that each involve only a few variables [5]. In survival analysis using the MARS approach, the residuals of the Cox Proportional Hazard Model (Cox PHM) are treated as the dependent variable in MARS modelling [6]. Reference [7] is an example of research that employs survival analysis with the MARS approach for dengue fever cases, where the dependent variable of the MARS model is the martingale residual for uncensored data. Another study applied Cox PHM with the MARS approach to the survival of heart attack patients in Germany and showed that the MARS approach gives better results than Cox PHM alone [6]. Reference [8] employed the same approach in a digital economy case. Another nonparametric approach to survival analysis is the survival support vector machine, i.e. a modification of support vector machines (SVM) for survival modelling. References [9,10] developed the additive survival SVM and applied it to medical cases. Simulation studies on survival SVM were carried out in [11,12] and applied to a cervical cancer study. Survival SVM and survival MARS take different approaches.
In this study, the lecturer performance index, or Indeks Kinerja Dosen (IKD), is modelled using survival MARS. Lecturers are professional educators and scientists whose main task is to transform, develop, and disseminate science, technology, and the arts through education, research, and community service. Lecturer performance is defined as the ability of lecturers to carry out the work or tasks assigned to them. Performance can be interpreted as work performance, job performance, work achievement, or work result. It also refers to the result or output of a process. Education aims to: (1) improve the performance, capability, and output of education; (2) facilitate communication and the exchange of information about best practice in education among various types of educational institutions; and (3) serve as a tool for improving institutional performance and as a guideline in strategic planning. By using survival MARS, the significant factors that influence the time lecturers need to obtain their professional certificate can be identified.

Literature review

Survival analysis
Survival analysis is a statistical method for analysing data on the time from a starting point (time origin) until a specific event (end point) or failure event occurs [13]. Determining the survival time requires three elements: (i) the time origin (starting point), i.e. the time at which recording starts; (ii) the ending event of interest, here the time when a lecturer obtains the professional certificate; and (iii) a measurement scale for the passage of time from the time origin to the event. The scale may be days, weeks, months, or years. In this study, the duration is measured in years.
A difficulty in survival analysis is that some individuals cannot be observed from the starting point to the end point. Such observations are called censored data [1], and censoring arises for three reasons: (i) loss to follow-up, which occurs when a lecturer moves to another university; (ii) drop-out, which occurs when a lecturer chooses to retire; and (iii) termination of the study, which occurs when the research period ends before the lecturer has obtained the professional certificate.
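As a minimal sketch of how censored observations are usually represented, each record can carry an observed duration together with an event indicator. The records and the helper name `split_by_censoring` below are hypothetical and serve only as an illustration; they are not taken from the study data.

```python
# Each record is (years_observed, certified).
# certified = True  -> the certificate was obtained (event observed)
# certified = False -> the observation is censored
def split_by_censoring(records):
    """Return (uncensored, censored) lists of records."""
    uncensored = [r for r in records if r[1]]
    censored = [r for r in records if not r[1]]
    return uncensored, censored

# Hypothetical records, for illustration only.
records = [
    (3.5, True),
    (6.0, False),  # e.g. study ended before certification
    (4.0, True),
    (2.0, False),  # e.g. lecturer moved to another university
]

uncensored, censored = split_by_censoring(records)
censoring_rate = len(censored) / len(records)
print(len(uncensored), len(censored), censoring_rate)
```

Keeping the indicator alongside the duration makes it possible either to drop censored records or to keep them for methods that can use them.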

Hazard function and survival function
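There are two main functions in survival analysis: the survival function and the hazard function. The survival function is the basis of survival analysis and refers to the probability that an individual survives beyond time t [2]. Suppose T is the survival time with density function f(t). The cumulative distribution function is denoted as:

$$F(t) = P(T \le t) = \int_0^t f(u)\,du \qquad (1)$$

The survival function S(t) is the probability that an individual survives longer than time t:

$$S(t) = P(T > t) = 1 - F(t) \qquad (2)$$

The hazard function, which gives the instantaneous rate at which an individual experiences the event at time t given survival up to t, is described as [1]:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \qquad (3)$$

The relationship between the density function, hazard function, and survival function is:

$$h(t) = \frac{f(t)}{S(t)}$$

The relationship between the cumulative hazard H(t) and the survival function is:

$$H(t) = \int_0^t h(u)\,du = -\ln S(t)$$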
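The standard relationships between the density, hazard, and survival functions, h(t) = f(t)/S(t) and H(t) = -ln S(t), can be checked numerically. The sketch below assumes a two-parameter Weibull survival time with hypothetical shape and scale values chosen only for illustration.

```python
import math

SHAPE, SCALE = 1.5, 4.0  # hypothetical parameters, not estimates from the study

def survival(t):
    """S(t) for a two-parameter Weibull distribution."""
    return math.exp(-((t / SCALE) ** SHAPE))

def density(t):
    """f(t), the derivative of 1 - S(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def cumulative_hazard(t, steps=2000):
    """H(t) as the trapezoid-rule integral of h(u) from 0 to t."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * width)
    return total * width

t = 3.0
print(cumulative_hazard(t), -math.log(survival(t)))
```

The two printed values agree up to the numerical integration error, which illustrates that the survival function can be recovered from the hazard alone.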

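Cox proportional hazard model
The Cox PHM was proposed by Cox in 1972 [14] and has since been widely developed. The model specifies a log-linear relationship between the independent variables and the hazard function:

$$h(t \mid x) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)$$

where $h_0(t)$ is the baseline hazard, whose form is not specified. When the distribution of the survival time is known, the baseline hazard takes the specific form corresponding to that distribution.
The distribution used in this study is that of the time a lecturer needs to obtain the professional certificate, measured from the time of first becoming a lecturer. The distribution of the survival time is selected as the candidate with the minimum Anderson-Darling statistic:

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(t_{(i)}) + \ln\left(1 - F(t_{(n+1-i)})\right)\right]$$

where $t_{(1)} \le \cdots \le t_{(n)}$ are the ordered survival times and F is the cumulative distribution function of the candidate distribution.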
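A minimal sketch of the proportional hazards structure follows. The coefficients, covariate values, and baseline hazard are hypothetical; the point is only that the ratio of two hazards under the Cox PHM does not depend on time.

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """Hazard at time t for covariates x: h0(t) * exp(beta . x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def hazard_ratio(x_a, x_b, beta):
    """Ratio of hazards for two covariate vectors; the baseline cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))

# Hypothetical values, for illustration only.
beta = [0.4, -0.2]
baseline = lambda t: 0.1 * t
x_a, x_b = [1.0, 2.0], [0.0, 2.0]

for t in (1.0, 5.0):
    ratio = cox_hazard(t, x_a, beta, baseline) / cox_hazard(t, x_b, beta, baseline)
    print(t, ratio, hazard_ratio(x_a, x_b, beta))
```

Because the baseline hazard cancels in the ratio, the covariate effects can be interpreted without specifying the baseline form.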
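The selection by minimum Anderson-Darling score can be sketched as below, using the standard Anderson-Darling statistic. The durations and the fixed candidate parameters are hypothetical; in practice the parameters of each candidate distribution would first be estimated from the data.

```python
import math

def anderson_darling(sample, cdf):
    """Standard Anderson-Darling statistic of a sample against a candidate CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        lower = cdf(value)            # F at the i-th smallest observation
        upper = cdf(ordered[n - i])   # F at the (n + 1 - i)-th smallest observation
        total += (2 * i - 1) * (math.log(lower) + math.log(1.0 - upper))
    return -n - total / n

def exponential_cdf(t, mean=4.0):            # hypothetical parameter
    return 1.0 - math.exp(-t / mean)

def weibull_cdf(t, scale=4.5, shape=3.0):    # hypothetical parameters
    return 1.0 - math.exp(-((t / scale) ** shape))

# Hypothetical durations in years, for illustration only.
durations = [2.5, 3.0, 3.2, 3.5, 3.8, 4.0, 4.1, 4.5, 5.0, 5.5]

scores = {
    "exponential": anderson_darling(durations, exponential_cdf),
    "weibull": anderson_darling(durations, weibull_cdf),
}
best = min(scores, key=scores.get)
print(scores, best)
```

A smaller statistic indicates a closer agreement between the empirical and candidate distributions, so the candidate with the minimum score is retained.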

Weibull distribution with three parameters
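Given that the survival time follows a Weibull distribution with three parameters, the baseline hazard takes the corresponding form. The cumulative distribution function of the three-parameter Weibull distribution is:

$$F(t) = 1 - \exp\left[-\left(\frac{t-\delta}{\lambda}\right)^{\gamma}\right], \quad t \ge \delta$$

where $\delta$ is the location parameter, $\lambda$ is the scale parameter, and $\gamma$ is the shape parameter, with $\lambda > 0$ and $\gamma > 0$. The baseline hazard for a survival time that follows the three-parameter Weibull distribution is:

$$h_0(t) = \frac{\gamma}{\lambda}\left(\frac{t-\delta}{\lambda}\right)^{\gamma-1}$$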
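The three-parameter Weibull baseline hazard can be sketched as follows, under the usual parameterisation by location, scale, and shape. The parameter values are hypothetical and are not the estimates obtained in this study; the check confirms numerically that the hazard equals the density divided by the survival function.

```python
import math

def weibull3_cdf(t, location, scale, shape):
    """CDF of the three-parameter Weibull; zero at or below the location."""
    if t <= location:
        return 0.0
    return 1.0 - math.exp(-(((t - location) / scale) ** shape))

def weibull3_hazard(t, location, scale, shape):
    """Hazard of the three-parameter Weibull for t above the location."""
    if t <= location:
        raise ValueError("hazard is evaluated here only for t > location")
    return (shape / scale) * ((t - location) / scale) ** (shape - 1.0)

# Hypothetical parameters, for illustration only.
params = dict(location=1.0, scale=3.0, shape=2.5)

t, step = 4.0, 1e-6
density = (weibull3_cdf(t + step, **params) - weibull3_cdf(t - step, **params)) / (2 * step)
hazard_from_ratio = density / (1.0 - weibull3_cdf(t, **params))
print(hazard_from_ratio, weibull3_hazard(t, **params))
```

With a shape parameter above one, as in this sketch, the hazard increases with time.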

Multivariate adaptive regression splines (MARS)
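Multivariate Adaptive Regression Splines (MARS) is a flexible method for modelling high-dimensional regression data. MARS extends spline basis functions, with the number of basis functions acting as a parameter of the model. Some terms that need to be considered in MARS are as follows:
a. Knots, the points at which one regression line ends and another begins, dividing the regression function into regions.
b. Basis function (BF), a set of functions used to describe the relationship between the response variable and the predictors.
c. Interaction, a cross-product between variables; the maximum number of interactions (MI) is 1, 2, or 3.
The MARS model is formulated as follows:

$$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m}\left[s_{km}\left(x_{v(k,m)} - t_{km}\right)\right]_+$$

where $a_0$ is the intercept, $a_m$ is the coefficient of the m-th basis function, M is the number of basis functions, $K_m$ is the degree of interaction of the m-th basis function, $s_{km} = \pm 1$, $x_{v(k,m)}$ is a predictor variable, and $t_{km}$ is the knot. The best model is the one with the smallest generalized cross-validation (GCV) value, which is denoted as:

$$GCV(M) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left[y_i - \hat{f}_M(x_i)\right]^2}{\left[1 - \frac{C(M)}{n}\right]^2}$$

where $C(M) = \operatorname{trace}\left(B(B^{T}B)^{-1}B^{T}\right) + 1$ and B is the matrix of basis function values.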
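A minimal sketch of the MARS building blocks follows: hinge basis functions, a prediction formed from products of hinges, and the GCV criterion. The knots, coefficients, and data are hypothetical, and the effective number of parameters is passed in directly rather than computed from the basis matrix, which is a simplification made for this illustration.

```python
def hinge(x, knot, sign=1):
    """Truncated linear basis: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate a MARS model.

    terms is a list of (coefficient, factors), where factors is a list of
    (variable_index, knot, sign); several factors form an interaction.
    """
    total = intercept
    for coefficient, factors in terms:
        product = 1.0
        for index, knot, sign in factors:
            product *= hinge(x[index], knot, sign)
        total += coefficient * product
    return total

def gcv(observed, fitted, effective_parameters):
    """Mean squared error divided by (1 - C/n)^2, with C supplied by the caller."""
    n = len(observed)
    mean_squared_error = sum((y - f) ** 2 for y, f in zip(observed, fitted)) / n
    return mean_squared_error / (1.0 - effective_parameters / n) ** 2

# Hypothetical model: one main effect and one two-variable interaction.
terms = [
    (0.5, [(0, 3.0, 1)]),
    (-0.2, [(0, 3.0, 1), (1, 1.0, -1)]),
]
prediction = mars_predict([5.0, 0.0], intercept=1.0, terms=terms)
score = gcv([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], effective_parameters=2)
print(prediction, score)
```

The penalty in the denominator grows with the effective number of parameters, so adding basis functions lowers GCV only when the reduction in error outweighs the added complexity.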

Research methodology
The dependent variable of this study is the lecturer performance index (IKD). The IKD is measured for both lecturers who are civil servants (DPK) and lecturers with permanent status (DTY). The dependent variable is categorical so that it suits a MARS model with a binary response. The survival data in this study are the times needed by DPK or DTY lecturers at a private university to obtain the professional certificate. The time origin is the point at which a person becomes a lecturer. The predictors comprise education activities (6 indicators), research activities (4 indicators), and public services (3 indicators).
The steps of the analysis in this study are as follows: a. Determine the survival data to be used and eliminate the censored data.

Analysis and discussion
The first step of the analysis is to fit the distribution of the survival data. According to the minimum Anderson-Darling statistic as reported in Table, the survival time follows a Weibull distribution with three parameters, so the baseline hazard is specified accordingly. The baseline hazard function is then estimated from this fitted distribution. Table 2 gives the probability that a lecturer obtains the professional certificate by time t. In general, the rate of obtaining the certificate increases over time. A lecturer typically obtains the professional certificate between three and four years after first becoming a lecturer.
Based on trial and error over combinations of BF, MI, and MO, the combination that produces the minimum GCV value is BF = 22, MI = 3, and MO = 1, with GCV = 0.373 and $R^2$ = 0.829. The MARS model produced by this combination involves the basis functions BF6 = research activities, BF7 = submission of articles to international/national journals, BF9 = attending conference activities, and BF16 = participating in workshops and training. The results of the survival MARS modelling show that, in general, the variables that affect the time to become a professional lecturer are research activities, submission of articles to international/national journals, attending conference activities, and participating in workshops and training.

Conclusion
Based on the empirical results of the Cox PHM MARS approach, the best model uses a combination of basis functions (BF), maximum interaction (MI), and minimum observations (MO) of 22, 3, and 1, with a minimum GCV value of 0.022. The significant predictors of the survival time until a lecturer obtains the professional certificate are research activities, submission to international/national journals, attending conference activities, and participation in workshops and training. Research activities make the largest contribution to the model, followed by submission to international/national journals, attending conferences, and participating in workshops and training.