Accelerated failure time models in analyzing duration of employment

Parametric accelerated failure time (AFT) models, which assess the relationship between event times and explanatory variables, constitute an essential class of regression models. In this research we use AFT models based on the exponential, Weibull, log-logistic and lognormal distributions to analyze how long employees have been in their current job, otherwise known as job tenure. A variety of commercial and public companies in Albania were surveyed for this study. The first goal of the research is to find the probability distribution that best fits the data. Then, the AFT model is used to assess the impact on job termination of predictors such as the employee's age, wages, the employee's age when he started the job, sex, profession, academic degree, marital status and years of experience prior to this position. The lognormal AFT model was the most accurate of the models considered, and from this model we conclude that the employee's age, the employee's age when he started the job, wages, academic degree and position in the company can affect how long someone stays employed.


Introduction
The most common type of model used to predict the duration of socio-economic phenomena is the survival model [1]. Many studies have examined the duration of employment and unemployment using survival models [2], [3]. The authors of [4] focus on job tenure as a measure of whether a job is a good match for someone after unemployment. The relationship between career flexibility and professional plateau, with a focus on how long employees have been employed and on their self-awareness, is examined in [5]. Job duration is of the utmost importance to workers, as it can be interpreted as a measure of a job for life. According to the Bureau of Labor Statistics, the median length of time that employees had been with their current employer was 4.2 years in January 2018. The authors of [6] adopt the semi-parametric Cox proportional hazards model, and those of [7] determine whether work ability played a decisive role in keeping a job among hospital staff in São Paulo, Brazil, who were eventually laid off within 4 years. Using a multiple-record Cox proportional hazards model, [8] examine risk variables for health-related job loss over a two-year follow-up period. In this research, we surveyed the characteristics of 887 employees in a variety of commercial and public companies in Albania. In the first stage of the study, we analyze the duration of job tenure and find the parametric probability distribution that best fits the data. Given that time is a continuous variable, we consider the continuous distributions most often used in the literature: the lognormal, Weibull, exponential and log-logistic distributions. Then, the variables thought to have a strong impact on the length of employment are studied. The factors studied are: the employee's age, wages, the employee's age when he started the job, sex, profession, academic

Materials and methods
When the survival time is assumed to follow a known distribution, we can use a parametric survival model for the data. For parametric models, the full maximum likelihood method can be used to estimate the parameters, and hence the probability distribution of the variable. The probability distributions we consider are the continuous lognormal, Weibull, exponential and log-logistic distributions. In survival analysis models, explanatory variables are considered to be related to the time it takes for the event to occur. To explore this relationship, researchers proposed the Cox proportional hazards (PH) model with right censoring [9]. A useful alternative to the Cox PH model is the accelerated failure time (AFT) model [10], [11]. The AFT model is a linear regression for the logarithm of the survival time and supposes that the error term of this regression has a specific distribution. This model has the advantage of an intuitive physical interpretation [12] and gives more efficient estimates compared with the Cox model [13]. We denote by $X_1, \ldots, X_n$ the variable of interest, with probability density function $f$ and cumulative distribution function $F$. The censoring variable is $C_1, \ldots, C_n$, with continuous distribution function $G$. We suppose that the variable $X$ is independent of $C$. Under random right censoring, the variable is not fully observed. We can only observe the pair $(\min(X_i, C_i),\ \mathbf{1}\{X_i \le C_i\})$, that is, the smaller of the two times and an indicator of whether the event was observed. [...] where $h_0(t)$ is the baseline hazard function. The influence of the covariates on the hazard in equation (2) depends on time and is therefore not proportional. A negative coefficient for a covariate indicates that the covariate reduces the hazard, and may therefore increase the duration as the covariate increases [14].
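To make the role of censoring in full maximum likelihood concrete, here is a minimal sketch for the simplest of the four candidate distributions, the exponential. An observed event contributes the log-density and a censored time contributes the log-survivor function, so the log-likelihood is $d \log\lambda - \lambda \sum_i t_i$, where $d$ is the number of observed events, and it is maximized at $\hat\lambda = d / \sum_i t_i$. The function names and the toy data are illustrative assumptions; they are not the study's data or code (the study itself used R).

```python
import math

def exp_loglik(rate, times, events):
    """Log-likelihood of an exponential model under right censoring.

    Observed events (event == 1) contribute log f(t) = log(rate) - rate * t;
    censored times (event == 0) contribute log S(t) = -rate * t.
    """
    d = sum(events)
    total_time = sum(times)
    return d * math.log(rate) - rate * total_time

def exp_mle(times, events):
    """Closed-form maximum likelihood estimate: events / total exposure."""
    return sum(events) / sum(times)

# Hypothetical tenures in months; event = 1 if the job ended, 0 if censored.
times = [12, 30, 45, 60, 90, 120]
events = [1, 0, 1, 1, 0, 0]

rate_hat = exp_mle(times, events)
print(f"estimated rate: {rate_hat:.5f} per month")
print(f"log-likelihood at the estimate: {exp_loglik(rate_hat, times, events):.3f}")
```

The other three distributions have no closed-form estimate, so in practice the same censored log-likelihood is maximized numerically.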
The survivorship, probability density, hazard and cumulative hazard functions are mathematically equivalent, and all the relevant formulas are summarized below. The general Weibull regression model is [...]. The log-logistic and lognormal regression models are examples of accelerated failure time models, as is the exponential model, which is the special case of the Weibull distribution with $\alpha = 1$. If the event time $T$ has a lognormal distribution, we model its dependence on the explanatory variables as in equation (3), where the error term has a standard normal distribution. The survivor function has the form [...]. If the event time $T$ has a log-logistic distribution, we model its dependence on the explanatory variables as in equation (3), where the error term has a standard logistic distribution. The survivor function has the form [...].
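For reference, the standard textbook form of the log-linear AFT model, and of the survivor functions it implies for the lognormal and log-logistic cases, can be written as follows. The symbols $x$, $\beta$, $\sigma$ and $\varepsilon$ are assumed notation and may differ from the one used in the authors' equation (3).

```latex
\log T = x^{\top}\beta + \sigma\,\varepsilon
\quad\Longrightarrow\quad
S(t \mid x) = \Pr(T > t \mid x)
            = 1 - F_{\varepsilon}\!\left(\frac{\log t - x^{\top}\beta}{\sigma}\right)

% lognormal: \varepsilon \sim N(0, 1)
S(t \mid x) = 1 - \Phi\!\left(\frac{\log t - x^{\top}\beta}{\sigma}\right)

% log-logistic: \varepsilon has the standard logistic distribution
S(t \mid x) = \left[\,1 + \exp\!\left(\frac{\log t - x^{\top}\beta}{\sigma}\right)\right]^{-1}
```

In this form a one-unit increase in covariate $x_j$ multiplies the survival time by $\exp(\beta_j)$, which is the acceleration factor used when interpreting the fitted model.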

Study population
In our research, we look at how long 887 employees in a variety of commercial and public companies in Albania have been employed in their current job. The study covers the period from January 1991 to December 2018. The time when the employee begins working for the employer is taken as the start time, and the end time is the time when the financial relationship is interrupted. At the time the data were received from the company, some employees were still working there, while others were no longer employees of the company. For this purpose, we have established a censoring variable, which has a value of 0 for employees who are "still working" and a value of 1 otherwise. We have tried to fit a probability model for the duration of job tenure. Given that time, measured here in months, is a continuous variable, we have considered continuous distributions such as the lognormal, gamma, Weibull, exponential and log-logistic distributions. Some factors thought to be important for the time spent with the same employer in the same job position are studied. The accelerated failure time (AFT) model is used to estimate how these factors may affect job tenure. Of the 887 employees in total, 534 were still in a financial relationship with the company in December 2018, the last month in which the data were updated. The remaining employees had had their employment relationship with the company interrupted. The shortest time that an employee has been in the current job position is one month, and the longest is 26 years. In the first stage of this work we have proposed four probability distributions to fit the job tenure time: the lognormal, exponential, Weibull and log-logistic distributions.
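The construction of the tenure time and the censoring variable described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the function names and the three example employees are hypothetical, the study end is taken as the last day of December 2018, and durations are counted in whole calendar months with a floor of one month.

```python
from datetime import date

STUDY_END = date(2018, 12, 31)  # assumed: last month in which the data were updated

def months_between(start, end):
    """Whole calendar months from start to end, with an assumed floor of 1."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return max(months, 1)

def tenure_record(start, end=None):
    """Return (duration in months, status).

    status follows the coding in the text: 0 for an employee who is still
    working at the end of the study (right-censored), 1 otherwise.
    """
    if end is None or end > STUDY_END:
        return months_between(start, STUDY_END), 0
    return months_between(start, end), 1

# Hypothetical employees: (start date, end date or None if still employed).
employees = [
    (date(1995, 3, 1), date(2004, 9, 30)),
    (date(2010, 6, 15), None),
    (date(2018, 11, 20), date(2018, 12, 10)),
]
records = [tenure_record(s, e) for s, e in employees]
print(records)  # [(114, 1), (102, 0), (1, 1)]
```

With this coding, a status of 1 marks an observed job termination and a status of 0 marks a tenure that was still ongoing when observation stopped.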
To estimate and evaluate the fit we have used the R software. To compare the fitting performance of the probability distributions we have used the QQ-plot graphical test and the information criteria statistics (AIC and BIC). As is clearly seen from Table 1, the goodness-of-fit statistics for the job tenure time are also in favor of the lognormal distribution, with parameters meanlog: 5.2; sdlog: 2.44. From a given set of candidate models, we take the one that minimizes the AIC (BIC). The Kaplan-Meier estimator [15] is a non-parametric maximum likelihood estimator of the survival function. This estimator makes no assumption about the distribution of T, but from the curve we can see which parametric distribution is appropriate for the data. Figure 2 shows the survival function estimated by the Kaplan-Meier method together with the lognormal, exponential, Weibull and log-logistic distributions fitted to job tenure, without taking the covariates into consideration. We can see that the exponential curve is very different from the Kaplan-Meier curve, and here again we can confirm that the lognormal distribution fits the data best.
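As an illustration of how the Kaplan-Meier curve is obtained without any distributional assumption, the following minimal sketch implements the product-limit formula: at each distinct event time the current survival estimate is multiplied by one minus the number of events divided by the number still at risk. The function name and the toy data are assumptions for illustration and are unrelated to the study's 887 records.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : observed durations
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both events and censored cases leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical tenures in months; 0 marks a censored observation.
times = [6, 12, 12, 24, 36, 48, 60, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored observations never lower the curve directly; they only shrink the risk set used at later event times, which is why the estimate remains valid under right censoring.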
In the next step we estimate the relationship between job tenure and all the covariates mentioned above. A stepwise regression procedure is used to select the explanatory variables. A forward-selection rule starts with no explanatory variables and then adds variables one by one, each time choosing the most statistically significant variable, until no statistically significant variables remain. The selection was carried out by minimizing a goodness-of-fit indicator, in this case the Akaike Information Criterion (AIC). This procedure is applied to the exponential, Weibull, lognormal and log-logistic accelerated failure time (AFT) models. The statistical significance of the explanatory variables was assessed at the 5% level after the procedure was completed. The final estimated model yielded the results given in Table 2. The first column of the table shows the parameter estimates obtained by maximizing the log-likelihood. A likelihood-ratio test indicates that the model as a whole is significantly better than one that does not include any covariates. According to the AFT model, the factors employee's age, wages, the employee's age when he started the job, sex, profession and years of experience prior to this position in the company are statistically significant, with p-values less than 0.05. Marital status and academic degree are eliminated from the model based on the AIC. The parameter estimates from AFT models are interpreted as effects on the time scale, which can either speed up or slow down the survival time. If the acceleration factor is greater than one, then the exposure is beneficial to survival. Being male significantly increases the survival time, by approximately 30%. A one-unit increase in the employee's age when he started the job at the current position in the company shortens the survival time by 16%.
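The forward-selection rule described above can be sketched generically. The sketch below is a simplified variant that uses the AIC alone as the entry criterion, with AIC = 2k - 2 log L for a model with k estimated parameters. The `fit` argument is a hypothetical callable standing in for an actual AFT fit (in R this role could be played by `survreg` from the `survival` package), and `toy_fit` with its log-likelihood gains is made up purely so that the example runs.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: smaller is better."""
    return 2 * n_params - 2 * loglik

def forward_select(candidates, fit):
    """Greedy forward selection that minimizes the AIC.

    candidates : iterable of covariate names
    fit        : callable taking a tuple of covariate names and returning
                 (log-likelihood, number of estimated parameters)
    """
    selected = []
    remaining = list(candidates)
    best_aic = aic(*fit(tuple(selected)))  # covariate-free model
    while remaining:
        # Score every one-variable extension of the current model.
        scored = [(aic(*fit(tuple(selected + [v]))), v) for v in remaining]
        cand_aic, cand = min(scored)
        if cand_aic >= best_aic:
            break  # no addition lowers the AIC
        selected.append(cand)
        remaining.remove(cand)
        best_aic = cand_aic
    return selected, best_aic

# Hypothetical log-likelihood gain of each covariate (toy values, not results).
GAIN = {"age": 40.0, "wage": 12.0, "sex": 3.5, "marital_status": 0.4}

def toy_fit(variables):
    """Stand-in for a model fit: intercept and scale plus one slope per variable."""
    loglik = -1500.0 + sum(GAIN[v] for v in variables)
    return loglik, 2 + len(variables)

selected, best = forward_select(GAIN.keys(), toy_fit)
print(selected, best)
```

In this toy setting a covariate enters only if it raises the log-likelihood by more than one unit, which is the trade-off the AIC encodes between fit and model size.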
The results suggest that the median survival time is decreased by a factor of 2.9 if the employee holds a supervisory position, compared with a financier. Profession, sex and the employee's age extend the duration of the financial relationship. When a model is fitted to survival data, the survival curves can be adjusted to account for all of the covariates used as predictors. The Kaplan-Meier method is unsuitable for this problem because it does not adjust for covariates. Figure 3 shows adjusted survival curves for job tenure, in which each covariate being adjusted for is set to its average value. The probability that an employee will continue to work with the current employer or company for at least one year is 96.2%. The lower quartile of time is 71 months; this means that nearly one-quarter of employees leave their company within 71 months of their start date, while half leave within 114 months and 75% of them interrupt their relationship with the company within 160 months of their start date.
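The statements about factors and quartiles above rest on one property of AFT models: a covariate effect multiplies every quantile of the survival time by the same time ratio. A minimal sketch for the lognormal case follows. The values of `meanlog`, `sdlog` and `time_ratio` are illustrative assumptions and are not the fitted coefficients of the study.

```python
import math
from statistics import NormalDist

def lognormal_quantile(p, meanlog, sdlog):
    """Time by which a fraction p of subjects have experienced the event."""
    return math.exp(meanlog + sdlog * NormalDist().inv_cdf(p))

def lognormal_survival(t, meanlog, sdlog):
    """S(t) = 1 - Phi((log t - meanlog) / sdlog)."""
    return 1.0 - NormalDist().cdf((math.log(t) - meanlog) / sdlog)

# Illustrative values only; not the study's estimates.
meanlog, sdlog = 4.7, 0.6
time_ratio = 1.30  # a covariate effect that lengthens tenure by 30%

for p in (0.25, 0.50, 0.75):
    base = lognormal_quantile(p, meanlog, sdlog)
    # In a log-linear AFT model the effect shifts meanlog by log(time_ratio).
    shifted = lognormal_quantile(p, meanlog + math.log(time_ratio), sdlog)
    print(f"p={p:.2f}: baseline {base:6.1f}, shifted {shifted:6.1f}, "
          f"ratio {shifted / base:.2f}")
```

Because the ratio is the same at every quantile, a single coefficient summarizes the effect on the lower quartile, the median and the upper quartile alike.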

Conclusion
The objective of this study was to analyse the length of time an employee has worked in his or her current position or with the current employer, to find the model that best fits the data, and to assess the impact on job termination of the employee's age, wages, the employee's age when he started the job, sex, profession, academic degree, marital status and years of experience prior to this position. Following a preliminary analysis of how to model the phenomenon in question, four models were proposed: exponential, Weibull, lognormal and log-logistic. Based on graphical and statistical tests, the lognormal distribution was found to be the best fit for the empirical data, with parameters meanlog: 5.2; sdlog: 2.44. The results showed that the average work experience of an employee in a company is approximately 187 months, that 25% of employees have been in their current position for more than 10 years, and that 30% have been in it for less than 2 years. According to the AFT analysis, profession, sex and age extend the duration of the financial relationship, while the employee's age when he started the job and wages increase the likelihood that the relationship is interrupted, and thus affect job tenure. Also, half of the employees leave within 114 months, and 75% of them interrupt their relationship with the company within 160 months of their start date.