The role of web browsing in credit risk prediction

Online mail order and online retail purchases have increased rapidly worldwide in recent years, with Covid-19 forcing almost all non-grocery shopping to move online. These practices have made new data sources available, such as web behavioural variables, which provide scope for innovation in credit risk analysis and decision practices. This paper examines new web browsing variables and incorporates them into survival analysis as predictors of the probability of default (PD). Using a large sample of purchase and repayment credit accounts from a major digital retailer and financial services provider, we show that these new variables enhance the predictive accuracy of PD models at account level. This also holds in the absence of credit bureau data; therefore, the new information can help people who have no credit history (thin file) and who cannot be assessed using traditional variables. Moreover, we leverage the dynamic nature of these new web variables and explore their predictive value over short and long-term horizons. By adding macroeconomic variables, we also make stress-testing possible. Our empirical findings provide insights into web browsing behaviour, highlight how the inclusion of non-standard variables can improve credit risk scoring models and lending decisions, and may provide a solution to the thin file problem. Our results also suggest direct value for the online retail credit industry: firms should leverage the increasing trend of consumers embracing the digital environment.


Introduction
The mail order and catalogue sector and many other retailers offer credit together with the goods purchased. Online retail purchases have increased rapidly in many countries over the last decade, and especially over the last five years. For example, in the UK online retail sales have grown at a substantially faster rate than in-store sales, increasing from a 3.4% share of all retail sales in 2007 to 27.9% in 2020. From November 2006 to February 2020, online sales across all retailing excluding automotive fuel increased just over ten-fold, showing that online retail sales were already growing strongly before the pandemic [1]. In the EU, sales volume by mail order and the internet increased by 54% between 2014 and 2017 (Eurostat, 2017). The recent Covid-19 restrictions made online shopping the preferred, and in many cases the only possible, way of operating for non-grocery retailers. This rise in online sales on credit enables retail lenders to use new predictors in their credit scoring models that were previously unavailable. These new variables describe how the web is used when making online purchases and the characteristics of a credit applicant's online access to their account.
Although this developing literature (see additional details in Section 2) has demonstrated that "alternative" data offer valuable information for predicting loan default, research has applied this type of data mainly to peer-to-peer (P2P) lending platforms [12–18,21,24,25], credit cards [11,20,22] and traditional bank loans [4,5,7–10]. Very few papers have dealt with credit risk modelling in the online retail credit sector [19,24]. To the best of our knowledge, web related variables (e.g. behavioural online characteristics) have not yet been explored in the literature in the context of the online retail credit industry. This paper addresses this gap. Whilst the alternative variables mentioned above show some ability to predict loan default, implementing them in real online retail credit systems may be impractical for the following reasons. First, some variables, such as phone usage data, cannot be accessed without explicit permission in developed countries because of data protection regulation (e.g. the General Data Protection Regulation 2016/679 (GDPR) in Europe and the Data Protection Act (DPA) in the UK). Second, variables such as social network characteristics may simply not exist (e.g. some people do not have social media accounts). Third, texts and photos are not typically collected by online retailers. Fourth, customers may resist the collection and/or use of, for example, mobile usage data. Hence, to predict PD in the context of online retail credit, new variables that are readily available to the service provider are preferable. The aim of this paper is to show how web browsing variables, which online retailers can easily collect without specifically requesting this additional information from customers, can be incorporated into credit risk models to predict PD at account level. We aim to show that these new types of variables can enhance the predictive accuracy of credit scoring models compared with models that include only traditional predictors.
We make several important contributions. We show the impact on predictive accuracy of certain specific web browsing variables in addition to traditional application, behavioural and credit bureau information. Specifically, we first show that, for the evaluation of an application for credit to fund online purchases, including measures of customer interaction with the online platform (Number of website visits, Number of account sessions, Number of terms and conditions views and Number of mobile devices used) on average increases the predictive accuracy of credit scoring models. When a variety of costs are applied to the misclassification of applicants, the increase in accuracy and the benefits to the lender are notable, even though the improvement in rank ordering, as indicated by the area under the Receiver Operating Characteristic (ROC) curve, is modest. The predictive contribution of these variables has not been shown before in the literature. Second, we demonstrate the value of web browsing behaviour as a PD predictor by exploring its predictive value over two different time horizons. We show that including web browsing variables improves predictive performance over the longer term (a 12-month horizon) but not over the shorter term (a 3-month horizon). Third, we show which new predictors are statistically significant in the new models, which gives confidence in the relationships identified.
These findings suggest that it is beneficial for online retailers to collect this information to enhance the accuracy of their credit risk models. Our investigation of the predictive accuracy of a risk model in which the new variables are included in place of bureau variables answers the question of whether the new information may help people who have no credit history or who have a thin file 1, and so have no credit score based on conventional variables. Enabling such people to obtain a credit score based on the new variables may facilitate financial inclusion.
The structure of the paper is as follows. Section 2 gives an overview of the literature on new features in credit risk models. Section 3 introduces survival models and the notation used in the paper, and describes the data and experimental set-up. Section 4 discusses the results and Section 5 concludes.

Literature review
Our paper contributes to the recent and growing literature that has considered the predictive accuracy of additional and new covariates in credit risk scoring models. A number of papers have assessed the characteristics of the text used by P2P applicants to describe themselves and the loan purpose as predictors of PD. For example, Gao et al. [15] considered the readability, tone and deceptive cues of the text used. Netzer et al. [16] considered two-word combinations and word classes. Iyer et al. [17] considered the number of words and whether the applicant included a picture. Dorfleitner et al. [18] examined spelling errors, keywords and the number of words. Using characteristics extracted from photos of both lenders and borrowers on a P2P lending platform, Gonzalez and Loureiro [25] analysed the probability of loan application success.
Other papers have used psychometric information [4–10] to predict PD. Further studies have considered Facebook likes [12] or characteristics of friendship groups [13]; Lin et al. [13] found that the online friendships of borrowers increase the probability of successful funding. A few papers have used characteristics of mobile phone usage data, such as phone calls, messages, data volume and app usage [20–24,26], to predict PD. Social media usage was considered by Lu [26], and social media network size and messaging activity were used by Ge [14] and found to increase predictive accuracy.
Several papers have related some aspect of loan performance to the types of products purchased using the credit. Wu [27] showed that including such variables increased the accuracy of a PD model for P2P loans. Vissing-Jorgensen [28] found that the type of product affects the proportion of a loan that is not repaid, but did not relate it to PD. Li [29] showed that the type of product (health or education products) was statistically significant in a PD model, but did not show its contribution to predictive accuracy.
The paper closest to ours is by Berg et al. [19], who considered factors affecting PD for an online German e-commerce company, which they described as a "digital footprint". This included factors such as the type of device used, the operating system, the channel through which the customer arrives at the website, the time of day of purchase, the email host and email provider, checkout time on the website and whether the customer allowed data tracking. Wu [27] considered types of web based behaviour different from those in our paper: topics of customer interest (health or digital) gleaned from their web usage, and whether shopping or photo apps were downloaded onto their device.
However, despite the growing desirability of incorporating Know Your Customer (KYC) information, we cannot find any papers that have considered statistical aspects of web usage (apart from Wu [27], who considered search frequency) or of online account access as predictors of the probability of loan default. This is surprising given the rapid increase in the use of the web to make purchases. Mail order companies and retailers that provide credit online and ignore new web related variables might see their models become less accurate as online purchases increase. Our paper adds to this literature by quantifying the increase in accuracy in estimating the probability of a borrower defaulting when new web behaviour variables are included. Our research is all the more important now that a rapidly increasing proportion of shopping is moving online.

Survival analysis
Credit risk analysis is an essential tool for estimating the probability of a borrower defaulting [3,30] and allows banks and retail lenders to predict the credit risk in their portfolios of either traditional credit products or those offered online. Traditional scorecards predict PD in a given time period. However, these fixed-period scorecards do not consider time-dependent characteristics. This difficulty can be overcome by implementing survival analysis [31–33]. Survival analysis (SA) models the probability that a credit account will remain in a particular state (for example, that payments are up to date) until a chosen time, when it will move into a different state [3]. For example, it models the probability that an account will move from being up to date with payments to being in default, for the first time, within a time period of interest, e.g. 6 months or 12 months. A survival model differs from a cross-sectional model in that, instead of modelling whether the customer defaults in a fixed period, it models when the event (e.g. default) will occur [34]. The advantages provided by survival analysis are discussed in [31–33,35–38], among others.

1 … history. A bank or lender may be unable to calculate a credit score since there is not enough information in the credit history to do so.

B.J.G. Rozo et al.

Discrete time
In this paper we follow the literature [37,39,40] and treat time as discrete, since the data are observed monthly. We assume a discrete-time hazard function with a pre-specified but very flexible baseline hazard function. Thus, we use the following model:

$$\log\left(\frac{P_{it}}{1-P_{it}}\right) = \beta_1^{\top} x_i + \beta_2^{\top} \omega_{i,t-l} + \beta_3^{\top} z_{t-l} + g(t_m) \qquad (1)$$

where
• $P_{it}$ denotes the probability of default for account $i$ in (discrete) duration period $t$, $t = 1, 2, \ldots, T$;
• $x_i$ denotes a column vector of covariates whose values are specific to borrower $i$ and do not vary over time (application variables);
• $\omega_{it}$ denotes a column vector of covariates whose values differ between borrowers and over time (behavioural and transactions variables);
• $z_t$ denotes a column vector of covariates that vary over time but not between individuals (macroeconomic variables);
• $\beta_1, \beta_2, \beta_3$ denote column vectors of coefficients to be estimated;
• $g(t_m)$ is the baseline hazard function; and
• $l$ denotes a lag.
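As a minimal sketch of how eq. (1) turns covariates into a monthly hazard, the Python below evaluates the logit link with a cubic spline baseline. The truncated-power spline form, the function names and the plain-list representation of the coefficient vectors are assumptions made for illustration, not the authors' implementation.

```python
import math

def spline_baseline(t, alphas, thetas, knots):
    """Cubic spline baseline g(t) in truncated-power form (an assumed basis).

    alphas: [alpha_0, alpha_1, alpha_2, alpha_3] for the global cubic polynomial.
    thetas: one coefficient per knot for the hinge terms max(0, t - knot) ** 3.
    """
    poly = sum(a * t ** d for d, a in enumerate(alphas))
    hinges = sum(th * max(0.0, t - k) ** 3 for th, k in zip(thetas, knots))
    return poly + hinges

def dot(coefs, values):
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(c * v for c, v in zip(coefs, values))

def hazard(x_i, omega_lagged, z_lagged, b1, b2, b3, g_t):
    """P_it from eq. (1): logit(P_it) = b1'x_i + b2'omega_{i,t-l} + b3'z_{t-l} + g(t_m)."""
    eta = dot(b1, x_i) + dot(b2, omega_lagged) + dot(b3, z_lagged) + g_t
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit
```

The baseline is passed in as a number so that the same `hazard` function works with any choice of g(t_m), spline or otherwise.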
The functional form of the baseline, $g(t_m)$, must be specified, and various options are available [31,37–39]. In this paper, we use spline functions, specifically cubic spline basis functions 2, because they give a very flexible form for $g(t_m)$. Note also that macroeconomic variables are measured in calendar time, so for each case their values must be matched to duration time using the relationship $c = o + t$, where $o$ denotes the calendar time of account origination and $c$ is calendar time. We estimate the parameters of eq. (1) using pooled logistic regression. To ensure that (a) a hazard model is estimated rather than a cross-sectional PD model and (b) the model represents the probability that default happens for the first time in period $t$, the data matrix used in estimation must be set up in a specific way: the dependent variable, the default indicator, is coded 0 for all periods before the period of default, 1 in the month of default and missing thereafter. See [37,38,43,44].
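The data set-up described above (0 before default, 1 in the default month, no rows afterwards) can be sketched as a person-period expansion. The function name and the dictionary row layout are hypothetical; the stacked rows from all accounts would then be passed to any standard logistic regression routine to obtain the pooled estimator.

```python
def person_period(account_id, months_observed, default_month=None):
    """Expand one account into monthly rows for a discrete-time hazard model.

    months_observed: last duration month observed (censoring point).
    default_month:   duration month of first default, or None if never observed to default.
    Rows after the default month are not created, which implements 'missing thereafter'.
    """
    if default_month is None:
        last = months_observed
    else:
        last = min(default_month, months_observed)
    rows = []
    for t in range(1, last + 1):
        y = 1 if default_month is not None and t == default_month else 0
        rows.append({"account": account_id, "t": t, "default": y})
    return rows
```

A censored account contributes one row per observed month, all coded 0, so it still informs the hazard in the months it survived.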

Data
Our dataset, provided by a major UK online retailer and financial services provider, consists of active accounts opened between January 2013 and January 2017. Performance was observed until October 2017. We have access to purchase and repayment information for this sample. The purchase process is as follows. A potential purchaser visits the company's website to search for the product they wish to buy. If the customer decides to buy a product, they are given a choice between paying within a fixed period or repaying the cost of the product over 6 or 12 months after the order is placed, with interest added to the price from the date of the order. If the purchaser has not bought from this company before, they complete an application form for an account. The applicant is credit scored using an application scoring model and, if the score exceeds a threshold, is granted credit up to a specific limit and the product is delivered. If the purchaser already has an open account, they are still credit scored, using a behavioural model. If that score exceeds a threshold and the price of the product, when added to any outstanding balance, does not exceed the credit limit, the purchaser is granted the credit. The purchaser can then choose between paying the full price immediately or taking credit, and the product is delivered. In essence, an open account is a credit line to which the customer can add further credit up to a set limit. Survival analysis is therefore appropriate: our data contain time-varying behavioural variables, and the model allows PD to be predicted for any chosen future period.
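The approval rule for a purchase, as described above, reduces to two checks. The sketch below is hypothetical: the parameter names, the strict reading of "exceeds a threshold" and the treatment of a new applicant as an account with zero outstanding balance are assumptions for illustration, not the retailer's actual system.

```python
def grant_credit(score, threshold, price, credit_limit, outstanding_balance=0.0):
    """Hypothetical approval rule for a purchase on credit.

    The (application or behavioural) score must exceed the threshold, and the new
    purchase plus any outstanding balance must stay within the credit limit.
    """
    if score <= threshold:  # "exceeds a threshold" read as strictly greater
        return False
    return price + outstanding_balance <= credit_limit
```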

New web behavioural variables
We have available a number of traditional characteristics of each applicant, for example age, socio-demographic category, bureau variables 3 and a number of behavioural variables. In addition, we have access to several variables that are not commonly used in credit scoring models and whose predictive performance we wish to assess. These are listed in Table 1, which gives the definition and the variable name used in the analysis for each variable. We call these variables "new behavioural web related variables"; they relate to the interaction between the customer and the retailer's website.
We are unable to give summary statistics for each variable because of commercial confidentiality. We might expect several of these new behavioural web related variables to be predictive of default. For example, customers who are concerned about their ability to repay might look at their balance, which includes accrued interest, more frequently than customers who are well able to repay, because they may repeatedly compute how much interest they would incur relative to the benefits of making further expenditures or debt repayments instead.
Previous research has also provided some evidence that psychological traits are related to credit performance; in particular, neuroticism or anxiety can be positively related to the number of missed credit repayments [45]. It is plausible that more anxious people visit the website more often, and browse more erratically with a higher number of visits, than less anxious people. In terms of devices, one might also argue that the number of devices used to access the company website is a proxy for income.
To the best of our knowledge, the new behavioural web related variables relating to web browsing that we have access to are novel and have not been considered in the traditional credit risk modelling literature for predicting PD. Including these new variables could generate more accurate predictions of default, or risk scores, because they take a more complete view of risk. A further advantage of including such variables in credit risk modelling is that PD can be computed using the most recent (behavioural) data available.

Other variables
Transactions variables are also available for inclusion; these are listed in Table 2. We also consider for inclusion several macroeconomic variables, which are described in Table 3. We chose these variables because we expect them to be correlated with PD and because there is evidence in the literature that they are [31,37,39,46–49]. For example, one might expect that if the unemployment rate or bank rate is high, then on average PD would be high, as people have less disposable income from which to repay loans. The house price index and the FTSE both represent wealth of different kinds, which would be expected to be negatively correlated with PD, and the index of production is a proxy for average income.

2 A cubic spline basis function of a variable consists of a separate cubic polynomial within each of several selected ranges of values of that variable, where the separating values between the ranges are known as knots. So, for example, a cubic spline function with four knots, $\gamma_1, \gamma_2, \gamma_3$ and $\gamma_4$, can be written as

$$g(t) = \sum_{d=0}^{3} \alpha_d\, t^{d} + \sum_{j=1}^{4} \theta_j\, B_j(t), \qquad B_j(t) = (t-\gamma_j)_+^{3} = \max(0,\, t-\gamma_j)^{3}$$

where $d$ is the power of the polynomial, $B_j$ is a set of basis functions and $\gamma_j$ is knot $j$. See [41,42].

3 Bureau variables are predictive variables collected by a credit bureau, which gathers and researches individual credit information from various creditors and sells that information to private lenders for a fee. These variables thus help lenders decide whether to grant credit to a new applicant.

Variable selection and model set-up
We coarse classified both categorical and numeric input variables and represented their original values as weights of evidence. 4 Both practices are commonly implemented [3,34,50]. The variables eventually included in the models were the result of a two-stage selection procedure. The first stage consisted of screening variables individually (based on the Gini coefficient and Information Value) before running the survival analysis. In the second stage, a stepwise survival model was applied to the variables that passed the first stage.
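A minimal sketch of the weights of evidence (WoE) and Information Value (IV) used in the first screening stage, assuming the coarse classes have already been formed and their Good and Bad counts tallied. The sign convention (log of the Good share over the Bad share) and the 0.5 smoothing for empty cells are assumptions; practitioners differ on both. The function name is hypothetical.

```python
import math

def woe_iv(bins, eps=0.5):
    """bins: dict mapping coarse-class label -> (n_good, n_bad).

    Returns (dict of WoE per class, total Information Value).
    WoE = ln(share of Goods in class / share of Bads in class);
    IV  = sum over classes of (Good share - Bad share) * WoE.
    eps is added to every cell so that empty classes do not produce log(0).
    """
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    k = len(bins)
    woe, iv = {}, 0.0
    for label, (g, b) in bins.items():
        pg = (g + eps) / (total_good + eps * k)
        pb = (b + eps) / (total_bad + eps * k)
        woe[label] = math.log(pg / pb)
        iv += (pg - pb) * woe[label]
    return woe, iv
```

Each original value is then replaced by the WoE of its class, and variables with low IV can be dropped before the stepwise stage.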
We carried out this procedure separately with the time-varying covariates lagged 3 months and with them lagged 12 months. That is, in eq. (1) the lag length, l, is taken to be either 3 months or 12 months.
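Lagging can be sketched as follows, assuming monthly values stored per account in duration order and macroeconomic series keyed by an integer calendar-month index. Both helper names are hypothetical, and the calendar mapping assumes that calendar month equals origination month plus duration.

```python
def lag_series(values, l):
    """Return a per-account monthly series lagged by l months.

    Position t of the result holds the value observed at t - l; the first l
    positions have no lagged value and are set to None.
    """
    n = len(values)
    return [None] * min(l, n) + list(values[: max(n - l, 0)])

def lagged_macro(macro_by_month, origination_month, t, l):
    """Macro value used at duration t: the value observed l months before
    calendar month origination_month + t (None if outside the series)."""
    calendar_month = origination_month + t
    return macro_by_month.get(calendar_month - l)
```

Running the same helpers with l = 3 and l = 12 produces the two design matrices behind the 'Lag 3' and 'Lag 12' models.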
Whilst the predictive horizon could be set to any length, we illustrate the predictive enhancement within a 12-month period and within a 3-month period. Twelve months is the standard time horizon for PD estimation in regulatory models such as IFRS 9, and 3 months is a common prediction period among practitioners. The final variables are shown in Table 4. 5

The sample was split into two non-overlapping, independent data sets: a modelling data set for estimating and validating the model, and a second data set for testing (i.e. out-of-time). The modelling data were split randomly, with 70% of accounts selected for training and the remaining 30% for validation. The models were parameterised using the training data and then assessed on the validation data set. The new behavioural web related variables for each account in the modelling data were observed until the account defaulted for the first time or until the end of the observation window. An account was considered to be in default if it had missed three consecutive payments. The new behavioural web related variables and the macroeconomic variables were lagged either by 3 months or by 12 months, and we estimated separate models for each lag length. Thus a model with covariates lagged 3 months (12 months) yields predictions 3 months (12 months) into the future. For convenience we denote models with covariates lagged 3 months (12 months) as 'Lag 3' ('Lag 12'). The modelling data set for models with Lag 12 corresponds to accounts opened from January 2013 to December 2015, with performance observed until December 2016. The testing data set for models with Lag 12 consists of accounts opened between January 2016 and October 2016 that did not default before October 2016. Their performance was observed until October 2017.
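The default definition and the account-level 70/30 split can be sketched as below. Treating the month of the third consecutive missed payment as the default month, and splitting on sorted account identifiers with a fixed seed, are assumptions; the function names are hypothetical. Splitting by account keeps all monthly rows of an account in the same set.

```python
import random

def first_default_month(payment_missed, run=3):
    """Duration month (1-based) in which an account first reaches `run`
    consecutive missed payments, or None if it never does.

    payment_missed: list of booleans, one per month in duration order.
    """
    streak = 0
    for t, missed in enumerate(payment_missed, start=1):
        streak = streak + 1 if missed else 0
        if streak == run:
            return t
    return None

def split_accounts(account_ids, train_share=0.7, seed=0):
    """Random account-level split into (training ids, validation ids)."""
    ids = sorted(account_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(train_share * len(ids)))
    return set(ids[:cut]), set(ids[cut:])
```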
For each Lag 12 test account, we computed a prediction of the probability that the account defaulted in any month between November 2016 and October 2017. These predictions were made by substituting the October 2016 values of the covariates into the parameterised eq. (1) to predict the survival probability over the 12-month period, where the change in the predicted probability between months was due only to the change in the duration time variable. Thus, the testing data are out-of-sample and (largely) out-of-time relative to the training data set, whereas the validation data are out-of-sample but in-time. We argue that using the October 2016 values of the covariates is justified because practitioners would typically use the latest covariate values available to them.
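The out-of-time prediction described above, where covariates are frozen at their latest values and only duration moves, can be sketched as a product of monthly survival terms: S = Π(1 − P_t) and PD over the horizon = 1 − S. The function name, and the split of the linear predictor into a fixed covariate part plus a baseline function, are assumptions for illustration.

```python
import math

def horizon_pd(eta_fixed, baseline, current_duration, horizon):
    """PD over the next `horizon` months for an account at `current_duration`.

    eta_fixed: covariate part of eq. (1), evaluated once at the latest covariate values.
    baseline:  function returning g(t) for duration t.
    Only the duration term changes from one future month to the next.
    """
    survival = 1.0
    for k in range(1, horizon + 1):
        t = current_duration + k
        p_t = 1.0 / (1.0 + math.exp(-(eta_fixed + baseline(t))))  # monthly hazard
        survival *= 1.0 - p_t  # probability of no default up to and including month t
    return 1.0 - survival
```

For a 12-month horizon the loop runs over the 12 months after the scoring date; for the Lag 3 models it would run over 3 months.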
Turning to models with Lag 3, the modelling data set corresponds to accounts opened from January 2013 to December 2015, with performance observed until April 2016. The test data set for models with Lag 3 consists of accounts opened from January 2016 to January 2017, provided they did not default before the end of January 2017. Their performance was observed until April 2017. For each Lag 3 test account, a prediction of the probability that the account defaulted in any month between February and April 2017 was computed using the same procedure as for Lag 12. The sizes of the training, validation and test samples for models with each lag length are listed in Table 5.
We constructed four survival models from application, transactional, bureau and new behavioural web related variables for Lag 3, and used different variables for Lag 12, depending on which were selected by the two-stage selection process described above. Details of the variables are given in Tables 1, 2 and 3. We chose four combinations of these variables for each lag, as detailed in Table 6, to assess the contribution of the new behavioural web related variables to predictive accuracy.
All four models include application, transactional and macroeconomic variables. Model A1 also includes bureau variables but no new behavioural web related variables. A2 includes both the bureau and the new web variables. A3 includes neither bureau nor new behavioural web related variables, and A4 adds only the new web variables. 6 To assess the performance of the new behavioural web related variables, we compare model A1 with model A2: both include the same application, transactional, macroeconomic and bureau variables, the only difference being the addition of the new web variables in A2. We also compare model A4 (which includes the new variables) with A3 (which omits them); bureau variables are omitted from both. The latter comparison allows us to consider the performance of the web variables for applicants who do not have a previous credit history.

Assessing performance
We compare the predictive performance of the competing models using three criteria. Firstly, we use two standard statistical measures that are very commonly used in the credit scoring literature [2,3,51–53]: the Kolmogorov-Smirnov statistic (KS) and the area under the Receiver Operating Characteristic (ROC) curve. Secondly, as is common in the literature (see the above references) and among practitioners, we use metrics that require each account to be predicted as either Good or Bad: accuracy and sensitivity (the proportion of Bad cases predicted to be Bad). The prediction is made by comparing the predicted survival probability over a given time horizon with a cut-off probability. The cut-off is computed from the training data set for each model such that the proportion of cases predicted to be Good over the horizon equals the proportion of cases observed to be Good over the same horizon. Cases in the training sample are ranked in ascending order of predicted hazard (i.e. PD); the case located at the percentile equal to the proportion of observed Good cases is taken, and its predicted hazard is the cut-off threshold. This cut-off is then applied to the out-of-time test set to make predictions: cases above the cut-off are classified as Bad and those below it as Good. A confusion matrix is created for the out-of-time test set using a different cut-off for each model. An advantage of setting the cut-off this way is that the target proportion of predicted Goods does not depend on the model estimated (since the observed proportions of Goods and Bads are independent of the model), nor on the corresponding holdout sample.
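The cut-off rule and the evaluation statistics can be sketched in plain Python. The sketch assumes predicted PDs and 0/1 Bad flags held as parallel lists and uses the rank-based (Mann-Whitney) form of the area under the ROC curve; the function names are hypothetical, and ties at the cut-off are resolved by classifying only cases strictly above it as Bad.

```python
def training_cutoff(train_pd, train_bad):
    """PD at the rank equal to the observed Good share, so the predicted Good share matches it."""
    ranked = sorted(train_pd)  # ascending predicted hazard
    good_share = 1.0 - sum(train_bad) / len(train_bad)
    idx = max(round(good_share * len(ranked)) - 1, 0)
    return ranked[idx]

def accuracy_sensitivity(pd_scores, is_bad, cutoff):
    """Accuracy and sensitivity (share of Bads predicted Bad) at a given cut-off."""
    pred_bad = [p > cutoff for p in pd_scores]  # above the cut-off -> Bad
    correct = sum(pb == bool(b) for pb, b in zip(pred_bad, is_bad))
    true_bad = sum(pb and bool(b) for pb, b in zip(pred_bad, is_bad))
    n_bad = sum(1 for b in is_bad if b)
    return correct / len(is_bad), true_bad / n_bad

def ks_statistic(scores, is_bad):
    """Maximum distance between the empirical score CDFs of Bads and Goods."""
    bads = [s for s, b in zip(scores, is_bad) if b]
    goods = [s for s, b in zip(scores, is_bad) if not b]
    best = 0.0
    for thr in sorted(set(scores)):
        f_bad = sum(s <= thr for s in bads) / len(bads)
        f_good = sum(s <= thr for s in goods) / len(goods)
        best = max(best, abs(f_good - f_bad))
    return best

def roc_auc(scores, is_bad):
    """Probability that a random Bad gets a higher PD than a random Good (ties count 1/2)."""
    bads = [s for s, b in zip(scores, is_bad) if b]
    goods = [s for s, b in zip(scores, is_bad) if not b]
    wins = sum(1.0 if sb > sg else 0.5 if sb == sg else 0.0 for sb in bads for sg in goods)
    return wins / (len(bads) * len(goods))
```

The cut-off is computed once on the training predictions and then reused unchanged on the test predictions, mirroring the procedure described above.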
Thirdly, we compare the models in terms of misclassification costs. It is well known [3,31,54,55] that the cost of misclassifying a Good account as Bad, equal to the opportunity cost of lost interest [3], is smaller than the cost of misclassifying a Bad account as Good, when the institution loses not only some or all of the interest but also the repayment of principal [56]. Therefore, to calculate the relative misclassification cost, we apply a cost function that penalises the Type II error (observed Bad predicted as Good) as follows: (1) a correctly classified case has no cost (cost = 0); (2) a Good case predicted as Bad has a cost of 1; and (3) a Bad case wrongly predicted as Good incurs a cost of 20. Observed values of these cost ratios are not reported in the literature, but these relative cost penalties have been used before [31] because they are believed to be realistic. To demonstrate robustness, results for relative costs of 15 and 25 are also reported for both sets of models, Lag 3 and Lag 12.
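The cost function above can be sketched as follows, with the Bad-as-Good penalty as a parameter so that the ratios 15, 20 and 25 can be compared. The function names are hypothetical, and the relative reduction is computed as (cost without − cost with) / cost without, one natural reading of a percentage cost reduction.

```python
def misclassification_cost(pd_scores, is_bad, cutoff, bad_as_good_cost=20.0, good_as_bad_cost=1.0):
    """Total cost: 0 for correct cases, good_as_bad_cost for a Good predicted Bad,
    bad_as_good_cost for a Bad predicted Good (Type II error)."""
    cost = 0.0
    for p, b in zip(pd_scores, is_bad):
        predicted_bad = p > cutoff
        if b and not predicted_bad:
            cost += bad_as_good_cost  # lost principal and interest
        elif not b and predicted_bad:
            cost += good_as_bad_cost  # opportunity cost of lost interest
    return cost

def relative_reduction(cost_without, cost_with):
    """Proportional cost saving of the model 'with' relative to the model 'without'."""
    return (cost_without - cost_with) / cost_without
```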

Baseline survival function
Fig. 1 shows the empirical baseline survival function (left-hand scale) and hazard function (right-hand scale) for the training data across the full time period of 48 months. The horizontal axis is the duration since account opening. The survival curve (blue line) shows a typical decline over time, consistent with the literature [36]. The hazard function (red line) shows that the hazard of default is higher between months 4 and 9 and then decreases sharply, especially between months 15 and 30. After month 30, the hazard has a roughly constant mean but fluctuates considerably, probably because relatively few cases remain in this region.
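Curves like those in Fig. 1 are usually obtained with life-table (Kaplan-Meier style) estimates in discrete time: the hazard in month t is the number of first defaults in t divided by the accounts still at risk, and survival is the running product of one minus the hazard. A minimal sketch, assuming each account is summarised by its last observed duration and a default flag; the function name is hypothetical.

```python
def empirical_hazard_survival(durations, defaulted, max_t):
    """Life-table estimates of the discrete-time hazard and survival curves.

    durations: last observed duration month per account (the default month if defaulted).
    defaulted: True if the account defaulted at its last observed month.
    h(t) = defaults at t / accounts at risk at t;  S(t) = prod_{k <= t} (1 - h(k)).
    """
    hazard, survival, s = [], [], 1.0
    for t in range(1, max_t + 1):
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, defaulted) if e and d == t)
        h = events / at_risk if at_risk else 0.0
        s *= 1.0 - h
        hazard.append(h)
        survival.append(s)
    return hazard, survival
```

Few accounts remain at risk at long durations, so the hazard estimate there is based on small denominators and becomes noisy, which is consistent with the fluctuation after month 30.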

Model assessment and predictive performance
Table 7 shows results for the models with Lag 12. Panel (a) compares the performance of the estimated models using proportions of cases, and panel (b) shows the predicted costs of Bad cases misclassified in the out-of-time data set when the cut-off is computed from the training data set. The same results are presented for models with Lag 3 in Table 8, panels (a) and (b). The row 'Input Variables' shows the number of variables selected from those in Table 4 using the selection procedure explained in Section 3.5. The row 'Significant Variables' gives the number of significant variables in the model.

We first consider models with Lag 12. The effect of including the behavioural web related variables is indicated by the performance of model A2 relative to that of A1, since both have the same application, transactional and macroeconomic variables but only A2 includes the new variables. Although the performance uplift from A1 to A2 is modest in terms of ROC and KS, A2 is clearly superior in terms of sensitivity and misclassification costs. Model A2 classifies 86.74% of Bad cases correctly, whereas model A1 correctly classifies 83.33%. The relative cost advantage of A2 over A1 is most evident for a relative cost of 20 or higher: for instance, model A2 reduces cost relative to model A1 by 6.5% and 8.3% at cost ratios 20 and 25 respectively.
If we compare the models with and without the new variables but without the bureau variables, we see that A4 (with the new variables) has higher ROC, KS and sensitivity values than A3 (without the new variables). Again, the model with the new variables has lower misclassification costs than the model without them; in this case, including the new web related variables reduces cost by 7.2% and 9.2% at cost ratios 20 and 25 respectively. In general, based on the KS and areas under the ROC curve for the test data, the most predictive model with Lag 12 is A2, which incorporates application, transactional, macroeconomic, bureau and the new behavioural web related variables. This is also confirmed by the ROC curves in Fig. 2, and the sensitivities yield the same conclusion. Note that the results in Table 7 panel (a) are consistent for both the KS and ROC statistics and across all three data sets: training, validation (in-time) and test (out-of-time, or scoring).
Turning to the results for models with Lag 3, Table 8 panels (a) and (b) show that web-based variables do not enhance predictive accuracy when only a 3-month prediction is required. Thus, we conclude that web browsing data enhance predictive accuracy in the long term but not in the short term. We argue that this better performance over a 12-month horizon is particularly useful, since the Basel Accord (see BIS 2015 [58]) requires PD to be predicted over this longer horizon, and because over a short period a particular borrower's behavioural pattern may be merely temporary or circumstantial.
While the results for models with Lag 12 are encouraging, we conducted additional robustness checks to support them. As a robustness check, we conduct a five-fold cross-validation [59,60] to evaluate the predictive power of the models. We randomly split the modelling sample into 5 equal-sized sub-samples, train the model on four of the five sub-samples and test it on the sub-sample left out. The test sample in this case is out-of-sample but in-time. We repeat this procedure so that each sub-sample is left out once, and then repeat the whole procedure using a different seed value for generating the random partitions. Thus, we train 10 models for each set of variables in A1, A2, A3 and A4. We do this for models with Lag 12 only, because that is the time horizon over which the web variables enhance predictive accuracy. We estimate the mean of four statistics: sensitivity, KS, ROC and misclassification costs. 7 Table 9 summarises the cross-validation results, which verify the conclusions from Table 7 that model A2 gives greater predictive accuracy than A1, and A4 greater accuracy than A3. In fact, the same variables are statistically significant in the cross-validation models as in the single-partition models (Table 7). Based on the averages of the KS, areas under the ROC curve, sensitivity and misclassification costs for the validation and test data sets (Table 9), and on the results for the test data in the single partition (Table 7), model A2 shows the highest predictive accuracy for models with Lag 12.
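The repeated cross-validation above (five folds, two random partitions, hence 10 fitted models per variable set) can be sketched as a generator of account-level splits. The seed values and the function name are hypothetical, and folds are built by striding through a shuffled list, which gives equal-sized folds whenever the number of accounts is divisible by five.

```python
import random

def repeated_kfold(account_ids, k=5, seeds=(1, 2)):
    """Yield (train_ids, test_ids) for k folds under each seed: k * len(seeds) splits.

    Splitting is done on account identifiers so that every monthly row of an
    account lands in the same fold.
    """
    ids = sorted(account_ids)
    for seed in seeds:
        shuffled = ids[:]
        random.Random(seed).shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # every k-th account, offset i
        for i in range(k):
            test = set(folds[i])
            train = set(shuffled) - test
            yield train, test
```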
Finally, we compute paired t-tests to assess the statistical significance of differences in model performance for each statistic. Table 10 shows these results.
A paired t-test is the most relevant test in our context, since we have pairs of measurements for each model that were obtained from the same sample. An overview of the evaluation of learning algorithms and a description of selected statistical significance tests can be found in [60]. Table 10 shows that we can reject the null hypothesis H0 (i.e. that the mean difference between pairs of measurements is zero) at the 0.05 and 0.01 levels of significance for models A1 and A2. This holds for ROC, sensitivity and misclassification costs for the three scenarios (i.e. 15, 20 and 25). We conclude that models A2 and A1 show a statistically significant performance difference on the validation and test data sets. A similar result is observed for models A3 and A4, which show a statistically significant performance difference on ROC, sensitivity and misclassification costs for two scenarios (i.e. 20 and 25) in the validation and test data sets. However, the observed differences in KS between models A1 and A2, and between A3 and A4, are not significant at 5%. Hence, the results strongly suggest that model A2 performs better than A1 and model A4 performs better than A3, since they have greater ROC and sensitivity and lower misclassification costs.
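The paired t statistic behind Table 10 can be illustrated with a minimal sketch. It assumes, for illustration only, that each of the 10 cross-validation runs yields one pair of measurements (for example, the ROC of A1 and of A2 on the same held-out fold), so the test has 10 - 1 = 9 degrees of freedom. The two-sided critical values 2.262 (α = 0.05) and 3.250 (α = 0.01) are the standard Student's t table values for 9 degrees of freedom. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided critical values of Student's t with 9 degrees of freedom (10 pairs).
T_CRIT_DF9 = {0.05: 2.262, 0.01: 3.250}


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom for H0: mean(a - b) = 0."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # std_error is zero (and the division fails) if every difference is identical.
    return statistics.mean(diffs) / std_error, len(diffs) - 1


def reject_h0(a: Sequence[float], b: Sequence[float], alpha: float = 0.05) -> bool:
    """Two-sided test at level alpha; the table above only covers exactly 10 pairs (df = 9)."""
    t_stat, df = paired_t(a, b)
    if df != 9:
        raise ValueError("critical values above are tabulated for df = 9 only")
    return abs(t_stat) > T_CRIT_DF9[alpha]
```

The test works on the differences within each pair, not on the two samples separately. This is why it suits measurements that come from the same folds.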
Notice also that the differences between pairs of measurements for ROC, sensitivity and misclassification costs are statistically significant between A2 and each of the other models. KS does not differ significantly between A1 and A2, or between A3 and A4. Even so, the web behaviour variables provide relevant information for assessing PD at account level, and models that include them outperform models that omit them. In addition, we checked the three assumptions⁸ of the t-test, all of which must hold for the paired t-test's results to be valid. In results not presented here, we found that all the assumptions of the t-test are met.
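Two of the statistics compared above, sensitivity and misclassification cost, depend on a PD cut-off and on relative error costs. This section does not state how the three scenarios 15, 20 and 25 are defined. Purely as an assumption, the sketch treats each one as the ratio of the cost of missing a defaulter (called a Type II error here, one common convention in credit scoring) to the cost of wrongly flagging a non-defaulter. The sketch also uses a fixed, hypothetical PD cut-off. The function names are illustrative.

```python
from typing import Dict, Sequence


def sensitivity(pd_scores: Sequence[float], defaults: Sequence[int], cutoff: float) -> float:
    """Share of actual defaulters flagged as defaulters (PD >= cutoff)."""
    flagged = [p >= cutoff for p, y in zip(pd_scores, defaults) if y == 1]
    return sum(flagged) / len(flagged)


def misclassification_cost(
    pd_scores: Sequence[float], defaults: Sequence[int], cutoff: float, cost_ratio: float
) -> float:
    """Average cost per account: a missed defaulter costs cost_ratio, a false alarm costs 1."""
    missed = sum(1 for p, y in zip(pd_scores, defaults) if y == 1 and p < cutoff)
    false_alarms = sum(1 for p, y in zip(pd_scores, defaults) if y == 0 and p >= cutoff)
    return (cost_ratio * missed + false_alarms) / len(defaults)


def costs_by_scenario(
    pd_scores: Sequence[float],
    defaults: Sequence[int],
    cutoff: float,
    ratios: Sequence[int] = (15, 20, 25),  # assumed interpretation of the three scenarios
) -> Dict[int, float]:
    """Misclassification cost under each assumed cost ratio."""
    return {r: misclassification_cost(pd_scores, defaults, cutoff, r) for r in ratios}
```

Because missed defaulters are weighted much more heavily than false alarms, a model can lower this cost noticeably even when its ROC improves only slightly.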

Model outputs
This section discusses some of the regression parameters of the most predictive model presented in Table 7, model A2. They are detailed in Table 11, which shows which new behavioural web-related variables are statistically significant and retained by the selection routine.
We notice that variables such as Median_num_of_paym, Max_num_of_payment and AverageperMonth_num_device are more significant than the other new variables, since they have larger Wald chi-square values. Other significant variables are socio-demographic variables (socioeconomic segment), consumer confidence (GfK), total outstanding balance/credit limit and a bureau variable.
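The Wald chi-square used to compare variables in Table 11 is the squared ratio of a coefficient to its standard error, referred to a chi-square distribution with one degree of freedom. The sketch below computes it and ranks covariates. The names `var_a`, `var_b` and `var_c` and their estimates are made-up inputs for illustration and are not taken from Table 11.

```python
import math
from typing import Dict, List, Tuple


def wald_chi_square(coef: float, std_error: float) -> Tuple[float, float]:
    """Wald statistic (coef / SE)^2 and its p-value from a chi-square with 1 df."""
    w = (coef / std_error) ** 2
    # For 1 df, P(chi2 > w) = 2 * (1 - Phi(sqrt(w))) = erfc(sqrt(w / 2)).
    return w, math.erfc(math.sqrt(w / 2.0))


def rank_by_wald(estimates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order covariate names from largest to smallest Wald chi-square."""
    return sorted(estimates, key=lambda name: wald_chi_square(*estimates[name])[0], reverse=True)


# Usage with hypothetical (coefficient, standard error) pairs.
ranking = rank_by_wald({"var_a": (0.4, 0.1), "var_b": (0.9, 0.15), "var_c": (-0.2, 0.2)})
```

A larger Wald value means a coefficient that lies further from zero relative to its sampling uncertainty, which is the sense in which the variables named above are "more significant".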
The variables in Table 11 are weights of evidence, which are typically not monotonic with respect to the raw underlying variable they relate to. It would be impractical to present this relationship for each variable, and doing so might breach commercial confidentiality. However, we can consider one variable in detail: Max_website, the maximum number of website visits per month. This example illustrates the nature of the insights the new web information can provide to lenders.
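The weight-of-evidence transformation can be sketched as follows, assuming the common convention WoE = ln(share of all non-defaulters in the bin / share of all defaulters in the bin). With this convention, a bin riskier than average receives a negative WoE. Some lenders invert the ratio, which only flips the sign. The optional smoothing count and the bin boundaries in `woe_for_value` are illustrative assumptions, not the paper's binning.

```python
import bisect
import math
from typing import List, Sequence


def weights_of_evidence(goods: Sequence[int], bads: Sequence[int], smoothing: float = 0.0) -> List[float]:
    """WoE per bin = ln(share of all goods in the bin / share of all bads in the bin).

    A positive `smoothing` count is added to every bin so empty cells do not produce log(0).
    """
    n_bins = len(goods)
    total_good = sum(goods) + smoothing * n_bins
    total_bad = sum(bads) + smoothing * n_bins
    return [
        math.log(((g + smoothing) / total_good) / ((b + smoothing) / total_bad))
        for g, b in zip(goods, bads)
    ]


def woe_for_value(value: float, upper_bounds: Sequence[float], woe: Sequence[float]) -> float:
    """Map a raw value to its bin's WoE; bins are (-inf, u0], (u0, u1], ..., (u_last, inf)."""
    return woe[bisect.bisect_left(upper_bounds, value)]
```

Because each bin gets its own value, the WoE can rise and fall as the raw variable increases. This is why the relationship with the raw variable is not monotonic in general.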
For all but those aged over 30 years, there is a positive relationship between the maximum number of website visits and the default rate. For younger age groups, the relationship is positive when the maximum number of visits per month is 3 or more. This might be explained by a theory of financial well-being: research has shown significant associations between problematic internet use and depression, anxiety and stress [63]. Customers who show emotional instability or higher levels of stress and/or anxiety tend to use the internet excessively [64,65], and excessive internet use may lead to behavioural problems if it becomes uncontrolled. In addition, unhealthy spending and poor saving behaviour are correlated with personal stress and anxiety [66]. Hojman et al. [67] found that depressive symptoms are higher for those who have been persistently over-indebted.⁹ Our results are consistent with these theories. Customers who visit the retailer's website more frequently have a higher PD, as shown by Fig. 3, panel (c), relative to panels (b) and (a).
This observation is more evident in young adults (customers between 18 and 25 years old) than in older groups. Age can also have a significant effect on debt: younger householders are more likely to be in debt than older householders [68,69]. There are several possible explanations. One is the life-cycle profile of income and consumption: the young borrow when expenditure exceeds income due to family commitments. Another is a lack of basic financial knowledge, which leads young adults to make poor financial decisions [70]. The overall default rate for the oldest group (i.e. customer_age > 47) is the lowest of the five groups (panel d). However, those in this age group who visit the website most frequently (panel c) have a default rate twice the average for their group (panel d). This rate (panel c) is close behind that of the youngest adults, and almost the same as that of customers between 25 and 38 years old who visit the website frequently. Increases in the maximum number of website visits per month (Max_website) from 3 to 6 visits to over 6 visits (Fig. 3, panel b vs panel c) are associated with increases in the default rate for all age groups. Interestingly, the default rate for heavy website users is relatively independent of age.
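A comparison such as "a default rate twice the average for their group" can be reproduced with a small sketch. The sketch assumes that the relative default rate in Fig. 3 means the default rate of a visit bucket divided by the overall default rate of the same age group. The bucket edges (<3, 3 to 6, >6 visits) loosely follow the thresholds mentioned in the text. The age-group labels and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def visit_bucket(max_visits: int) -> str:
    """Hypothetical buckets loosely following the 3 and 6 visit thresholds mentioned for Fig. 3."""
    if max_visits < 3:
        return "<3"
    if max_visits <= 6:
        return "3-6"
    return ">6"


def relative_default_rates(
    records: Iterable[Tuple[str, int, int]],
) -> Dict[Tuple[str, str], float]:
    """records: (age_group, max_visits_per_month, defaulted as 0/1).

    Returns, for each (age group, visit bucket), the bucket's default rate divided by
    the default rate of the whole age group (age groups with no defaults are skipped).
    """
    group_n: Dict[str, int] = defaultdict(int)
    group_d: Dict[str, int] = defaultdict(int)
    cell_n: Dict[Tuple[str, str], int] = defaultdict(int)
    cell_d: Dict[Tuple[str, str], int] = defaultdict(int)
    for age_group, max_visits, defaulted in records:
        cell = (age_group, visit_bucket(max_visits))
        group_n[age_group] += 1
        group_d[age_group] += defaulted
        cell_n[cell] += 1
        cell_d[cell] += defaulted
    rates = {}
    for (age_group, bucket), n in cell_n.items():
        if group_d[age_group] == 0:
            continue
        group_rate = group_d[age_group] / group_n[age_group]
        rates[(age_group, bucket)] = (cell_d[(age_group, bucket)] / n) / group_rate
    return rates
```

A value of 2.0 for the heaviest-visit bucket of a group corresponds to the "twice the group average" pattern described above.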

Models for applicants without credit history
In this section we show that if one omits credit history variables but uses web browsing data instead, one can obtain commercially acceptable levels of predictive performance. This is important because in many countries, significant proportions of the over-18 population either have no credit history or have only thin files. A thin file is problematic because it makes credit difficult to obtain. Demirguc-Kunt et al. [71] estimated that almost 2 billion people in the world did not have an account with a financial institution.
We train models with Lag 12 using several combinations of the weights of evidence of only the following variables: Number of views of the customer's account, Number of visits to the company's website, Number of terms and conditions checked and Number of mobile devices used. We also include their derived variables (i.e. average per month, maximum, median and mean), two application variables (customer age and brand type) and the macroeconomic variables. Five-fold cross-validation was again performed to provide robust results (see Section 4.2 for the procedure). The results for the model with the highest ROC are shown in Tables 12 and 13.
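The derived variables listed above summarise a window of monthly activity counts. The sketch below derives the maximum, median and mean from the last 12 monthly counts of one web activity. The 12-month window, the `Max_`/`Median_`/`Mean_` naming pattern and the function name are assumptions. The paper's separate "average per month" variable is left out because its exact definition is not given here. In the models, such values would then be binned and replaced by their weights of evidence.

```python
import statistics
from typing import Dict, Sequence


def monthly_aggregates(monthly_counts: Sequence[int], name: str, window: int = 12) -> Dict[str, float]:
    """Summarise the last `window` monthly counts of one web activity into derived features."""
    recent = list(monthly_counts)[-window:]
    if not recent:
        raise ValueError("no monthly observations")
    return {
        f"Max_{name}": float(max(recent)),
        f"Median_{name}": float(statistics.median(recent)),
        f"Mean_{name}": float(statistics.mean(recent)),
    }


# Usage: five months of (hypothetical) website-visit counts.
features = monthly_aggregates([0, 2, 4, 1, 3], "website")
```

Using several summaries of the same count lets the selection routine choose between peak behaviour (maximum) and typical behaviour (median, mean).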
The ROC of 0.7284 in Table 12 is acceptable by commercial standards, and other combinations of the application, macroeconomic and behavioural web-related variables gave similar predictive accuracy. Table 13 presents the regression results for this model. Turning to the number of website visits, the PD is positively associated with the weights of evidence of both the median and the maximum number of visits per month to the website. This is consistent with our previous finding that larger weights of evidence for the number of website visits per month are associated with a higher PD. Overall, based on the ROC, the coefficients and the odds ratios, we conclude that models trained on these web variables are predictive. The four website-interaction variables are Number of views of the customer's account, Number of visits to the company's website, Number of terms and conditions checked and Number of mobile devices used. These results suggest that models using them are viable alternatives for predicting credit risk when bureau and transactional data are not available.
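The ROC figure quoted above and the KS statistic used throughout can both be computed from predicted PDs and observed default outcomes. The sketch below uses two textbook definitions. The area under the ROC curve is computed as the probability that a randomly chosen defaulter receives a higher PD than a randomly chosen non-defaulter, with ties counting one half. KS is the largest gap between the empirical PD distributions of the two groups. The O(n²) pairwise loop is chosen for clarity rather than speed, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], defaults: Sequence[int]) -> Tuple[List[float], List[float]]:
    bad = [s for s, y in zip(scores, defaults) if y == 1]
    good = [s for s, y in zip(scores, defaults) if y == 0]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    return bad, good


def roc_auc(scores: Sequence[float], defaults: Sequence[int]) -> float:
    """P(a random defaulter scores higher than a random non-defaulter); ties count 1/2."""
    bad, good = _split(scores, defaults)
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def ks_statistic(scores: Sequence[float], defaults: Sequence[int]) -> float:
    """Largest vertical gap between the empirical score CDFs of defaulters and non-defaulters."""
    bad, good = _split(scores, defaults)
    gap = 0.0
    for cut in sorted(set(scores)):
        cdf_bad = sum(s <= cut for s in bad) / len(bad)
        cdf_good = sum(s <= cut for s in good) / len(good)
        gap = max(gap, abs(cdf_bad - cdf_good))
    return gap
```

An ROC area of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.7284 means a defaulter is ranked above a non-defaulter in roughly 73% of pairs.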
This result is important for financial inclusion because it suggests that this type of model could assess people with an open purchase account. This applies whether or not the account is used as a credit account, and regardless of how their application was evaluated in the first place. With the availability of open banking, lenders might be willing to accept thin-file customers using only application variables and very limited banking transaction variables, and offer them a small credit limit. These customers would then have been granted credit without either bureau data or transactional information held by the online retail lender. Over the following 12 months they would build up web behavioural data. Lenders could then use this type of model to decide whether to increase or decrease the credit limits of thin-file customers based on the new behavioural web-related variables. These new covariates provide an alternative to models that include bureau and transactional variables when handling applications for additional credit from customers who already have some. Incorporating these variables may also help lenders make responsible lending decisions.
⁸ The assumptions are that the samples come from normally distributed populations, that the samples are random, and that the populations have equal variances. For a more complete description of the t-test, see [61,62]. We use the Shapiro-Wilk normality test to check the normality assumption. The p-values are larger than the significance level α = 0.01 for all models across the different performance measures, so we conclude that the normality assumption holds for all models at α = 0.01. The resulting p-values are available on request. The second assumption is met since the samples for training, validation and testing were randomly drawn from their underlying population. The third assumption, equality of variances, was tested with Levene's test, and the results are similar to those for the first assumption. Thus, all the assumptions of the t-test were met.
⁹ It may be, however, that over-indebtedness precedes depressive symptoms rather than vice versa.

Conclusion
The aim of this paper is to show that using new web browsing variables as predictors in survival analysis enhances the predictive accuracy of a PD model at account level. These variables include the Number of website visits to a retailer, the Number of devices used to access the lender's site, the Number of account sessions and the Number of terms and conditions views.
Our results show, first, that including various transformations of four web variables in a survival scoring model enhances its predictive accuracy compared with a model containing only conventional application, bureau and transactional variables. The four variables are the Number of mobile devices used to make online purchases, the Number of visits to an organisation's website, the Number of sessions visiting the borrower's account and the Number of terms and conditions views. Adding the new web browsing variables as predictors of PD increases the area under the ROC curve by only 1.0%. Nevertheless, their inclusion reduces the relative misclassification costs of Type II errors across a range of alternative cost scenarios. This indicates that the new time-varying behavioural web-related variables not only have predictive power but also provide promising information for reducing the costs of misclassification.
Second, our results highlight that the default rate for heavy website users is relatively independent of age. This contrasts with the well-documented negative correlation between PD and age for application samples as a whole. We further find that heavy website users have a higher default probability than less frequent users. This finding is plausible given its consistency with psychology and financial well-being theories. Our results reveal that the new behavioural web variable Maximum number of website visits has a positive relationship with PD for individuals aged 30 years and over.
Third, we find that time-varying behavioural web-related variables boost predictive accuracy in the long term (over a 12 month horizon) but not in the short term (over a 3 month horizon). This result for a 12 month horizon provides a valuable tool for regulatory reporting because the Basel Accord requires PD to be estimated over a one-year horizon. Relevant frameworks include the Regulatory use of system-wide estimations of PD, LGD and EAD [72] and the Guidance on credit risk and accounting for expected credit losses issued by the Basel Committee on Banking Supervision [57]. We also argue that the 12 month result is particularly useful because a pattern observed over a longer period is more likely to indicate a personality trait. Over a short period, by contrast, a particular behavioural pattern may be merely temporary or circumstantial.
Fourth, we find that models in which predictors related to website interaction replace transactional and bureau variables achieve commercially acceptable predictive accuracy. Using this type of model could help a lender offer loans to some applicants who may not have a credit history.
Our findings are of interest to banks, retailers, lenders and the online retail credit industry in general. Our results underscore that incorporating new web-related behavioural variables (non-standard information) into credit risk models, as predictors of the probability of a borrower defaulting, increases the accuracy of PD predictions. Moreover, our results provide insights into other disciplines, including financial well-being, personal financial risk management and online consumer behaviour. Given the growing amount of information generated about individuals, web-related behavioural variables are likely to become more important than ever in the years to come.

Fig. 1.Baseline survival curve.Note.The units on both axes have been removed for commercial confidentiality reasons.

Fig. 3. Model A2 with Lag 12. Relative default rate by maximum website visits per age group.

Table 1
New behavioural web-related variable names and definitions available for selection.

Table 2
Transactional variables available for selection.

Table 3
Macroeconomic variables. ONS = Office for National Statistics, BOE = Bank of England, NBS = Nationwide Building Society, LSE = London Stock Exchange, GfK = Growth from Knowledge, a global consulting service for the consumer products industry.

Table 4
Covariates included in each set of models after the selection procedure was implemented.

Table 5
Sample sizes.

Table 6
List of models.

Table 9
Cross-validation results for models with Lag 12.

Table 10
Paired t-test statistic values for models with Lag 12.

Table 12
Model A5 with Lag 12, without transactional and bureau variables.