GENDER EFFECT ON THE DEFAULT RISK IN PEER-TO-PEER LENDING MARKETS: THE CASE OF THE LARGEST CHINESE PLATFORM

How to cite this paper: Lingnan, L. (2019). Gender effect on the default risk in peer-to-peer lending markets: The case of the largest Chinese platform. Risk Governance and Control: Financial Markets & Institutions, 9(3), 8-22. http://doi.org/10.22495/rgcv9i3p1 Copyright © 2019 The Authors This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). https://creativecommons.org/licenses/


INTRODUCTION
Online peer-to-peer (P2P) lending is an internet finance model that emerged with the advent of Web 2.0 technology. In the online P2P market, investors with surplus funds lend money to loan applicants over the internet instead of through conventional financial institutions such as banks. By bypassing these institutions and lowering transaction costs, online P2P lending offers higher returns to lenders and lower interest rates to borrowers. With these advantages, online P2P lending has expanded rapidly in recent years. The first online P2P platform, Zopa, was launched in March 2005 in the United Kingdom. Thereafter, online P2P platforms, represented by Prosper and Lending Club in the United States, sprang up in many countries. Because of their mature credit reporting systems, online P2P lending has grown rapidly in Western countries. Online P2P lending commenced in China in 2007. Accompanied by interest rate deregulation, financial disintermediation and the burst of informal lending, online P2P lending has grown explosively, and an abundance of online P2P platforms has emerged since 2010.
The goal of this paper is to estimate the gender effect on default risk in online P2P lending markets. Using data from the largest Chinese lending platform, RenRenDai, we test the effect of borrowers' gender on the probability of default with a multivariate Probit regression. The test results show that there is no significant gender effect on the probability of default. Males and females seem to have equal default risk in P2P lending, ceteris paribus. This finding holds across a variety of robustness checks. We can therefore be reasonably confident that no significant gender effect exists on RenRenDai.
Compared to existing research, this paper has several distinguishing features. First, our study focuses on the gender effect on default risk on P2P lending platforms, a topic that has received limited attention in the existing literature. Second, in contrast to analyses based on data from Prosper, Lending Club and other platforms in different countries, our data are collected with a web crawler that also captures the information submitted by loan applicants. At RenRenDai, applicants must submit a wide range of information for verification, including gender, age, monthly salary, education level, marital status, employment status and loan purpose. The submission of personal information is mandatory and far more extensive than on other P2P lending platforms such as Prosper and Lending Club. The abundance of information about individual characteristics allows us to control for more influencing factors, which makes our estimate of the gender effect on default risk more accurate.
The remainder of this paper is organized as follows. Section 2 reviews the existing literature in the field of P2P lending. Section 3 briefly introduces our test methodology and regression model. Section 4 gives an overview of lending and borrowing on RenRenDai and describes the data we use. Section 5 presents the econometric results from a two-step instrumental Probit regression. Section 6 reports several robustness checks that confirm our test results. In the last section, we draw the final conclusion and point out several reasons why generalizing this conclusion requires further research.

LITERATURE REVIEW
Peer-to-peer lending has been gaining popularity and attracting a growing body of research. Hulme and Wright (2006) studied the case of Zopa and concluded that the emergence of P2P lending reflects social trends and a need for reform of the financial system in the information era. Ashta and Assadi (2008) examined the role that internet techniques play in promoting social interactions and associations at lower cost for P2P lending. De Roure et al. (2018) developed a simple theoretical model of how P2P platforms compete with banks for loans and predicted that (i) P2P lending grows when some banks are faced with exogenously higher regulatory costs; (ii) P2P loans are riskier than bank loans; and (iii) the risk-adjusted interest rates on P2P loans are lower than those on bank loans.
A great number of researchers have investigated the determinants of default risk. Gomez and Santor (2003) found that default rates of peer group lending are lower than those of conventional individual lending in Canada. Iyer et al. (2009) found that lenders are able to use available information to infer one-third of the credit risk captured by borrowers' credit scores in P2P lending. Lin et al. (2013) analyzed the role of social connections in determining the default rate and found that borrowers' friendship with lenders is associated with lower ex-post default rates at Prosper, one of the largest P2P lending platforms in the United States. Polena and Regner (2018) studied the determinants of borrowers' default in P2P lending with a new data set consisting of 70,673 loan observations from Lending Club. They found that the debt-to-income ratio, inquiries in the past six months and a loan intended for a small business are positively correlated with the default rate, while annual income and credit card as loan purpose are negatively correlated with it.
There is also literature on the role of fundamental borrower characteristics and personal information in determining default risk. Barasinska and Schafer (2010) studied the effect of applicants' gender on funding success at Smava, the largest P2P lending platform in Germany, and found no gender discrimination in the P2P lending market. Using data from Prosper, Herzenstein et al. (2008) investigated the influence of demographic attributes, including gender, race and marital status, on funding success. Pope and Sydnor (2011) focused on the impact of gender on funding success and interest rate and found that there is a systematic underestimate of the default rate of African Americans. Duarte et al. (2012) discovered that borrowers whose appearances are more trustworthy have higher credit scores, more funding success and lower default risk on P2P lending platforms. Freedman and Jin (2017) examined whether social networks facilitate online markets using data from Prosper and found that borrowers with social ties are consistently more likely to have their loans funded and to receive lower interest rates. However, most borrowers with social ties are more likely to pay late or default.
Research that focuses on default risk at Chinese P2P lending platforms is limited. Liao Li et al. (2015) studied the case of RenRenDai and found that a higher education level is associated with higher self-constraint and a lower default rate, but that lenders are biased when identifying credit risk through education level. Therefore, our study is expected to shed some light on the relationship between gender and credit risk. Using data from a large P2P platform in China, Xuchen Lin et al. (2017) explored the factors that determine default risk based on the demographic characteristics of borrowers and proposed a credit risk evaluation model that can quantify the default risk of each P2P loan. Zhang and Chen (2017) used data from RenRenDai to examine the dynamic relationship between prior cumulative bids and current bids. They found that lenders appear to imitate each other's behavior and herd in the P2P lending market.

RESEARCH METHODOLOGY
As we mentioned before, credit risk is the main concern of the research on the P2P platforms. The credit risk can be depicted by the default rate of loans. For a borrower on a P2P platform, we can index him/her with $i$. A loan related to borrower $i$ can be depicted by three basic elements: nominal interest rate $I_i$, duration $D_i$ and loan amount $L_i$. Apparently, the borrower's probability of default $p_i$ is correlated with nominal interest rate $I_i$, duration $D_i$ and loan amount $L_i$. Lenders on the platform cannot observe $p_i$. However, they can derive $p_i$ from observable variables related to the borrower's characteristics. We can use vector $X_i$ to capture all the observable variables deemed to be necessary to enter the determinants of $p_i$.
The funding success of lending on P2P platforms is determined by the willingness of lenders to provide funds and how much lenders would provide. Whether or not lenders provide funds and the amount of funds actually provided is determined by the expected return from lending. Obviously, the expected return is closely related to nominal interest rate $I_i$, duration $D_i$, loan amount $L_i$ and default rate $p_i$.
Our research question is whether male and female borrowers have different default rates $p_i$, given that the loan terms and all the observable characteristics are the same. If so, then letting the default rate of male borrowers be $p$, the default rate of female borrowers would be $p + \delta$, with $\delta \neq 0$. A profit-maximizing lender would then have an incentive to charge a higher risk premium to borrowers with the higher default rate, which can markedly affect the probability of funding success. Lenders could also use this difference as a screening device to exclude borrowers of a particular gender, which constitutes another problem in P2P markets: the gender discrimination discussed in much of the literature in this area. Therefore, we test the following hypothesis:
H1: Assuming that borrowers of different sexes face the same loan terms and have similar observable characteristics, the probability of default differs across gender groups, ceteris paribus.
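The test of this hypothesis in the remainder of this paper will be based on the most popular discrete response model, the Probit model. According to this model, the probability of default for borrower $i$ relies on the following equation:

$$p_i = \Pr(\mathit{Default}_i = 1 \mid Z_i) = \Phi(Z_i'\beta) \qquad (1)$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function, and $Z_i' = (1, \mathit{Male}_i, I_i, D_i, L_i, X_i')$, $\beta' = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \gamma')$.
$X_i$ is a vector of variables capturing all the observable characteristics of borrowers, and $\gamma$ is a coefficient vector. Hence, the probability of no default can be expressed as:

$$1 - p_i = \Pr(\mathit{Default}_i = 0 \mid Z_i) = 1 - \Phi(Z_i'\beta) \qquad (2)$$

Using Eq. 1 and Eq. 2, we can write the likelihood function as follows:

$$\mathcal{L}(\beta) = \prod_{i=1}^{N} \Phi(Z_i'\beta)^{\mathbb{1}(\mathit{Default}_i = 1)} \left[1 - \Phi(Z_i'\beta)\right]^{1 - \mathbb{1}(\mathit{Default}_i = 1)}$$

where $\mathbb{1}(\cdot)$ is an indicator function signaling whether or not the borrower defaults.
Hence, our task is to solve the maximization problem of the log-likelihood function:

$$\max_{\beta} \ln \mathcal{L}(\beta) = \sum_{i=1}^{N} \left\{ \mathbb{1}(\mathit{Default}_i = 1) \ln \Phi(Z_i'\beta) + \left[1 - \mathbb{1}(\mathit{Default}_i = 1)\right] \ln\left[1 - \Phi(Z_i'\beta)\right] \right\}$$

The coefficients $\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \gamma$ are estimated through the above maximization program. The estimators can be written as $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4, \hat{\gamma}$.
The variable of interest is the dummy variable Male. If the borrower is male, Male = 1; otherwise Male = 0. The effect of gender on the probability of default is reflected by the coefficient estimator $\hat{\beta}_1$. More specifically, $\hat{\beta}_1 > 0$ ($\hat{\beta}_1 < 0$) indicates that males (females) have the larger default rate.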
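The maximum likelihood step described above can be illustrated with a minimal sketch. It assumes a Probit index with only an intercept and the Male dummy, uses plain gradient ascent instead of the optimizer of any particular statistics package, and runs on a small made-up sample (2 defaults among 10 females, 3 among 10 males); none of these choices or numbers come from the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, pdf is its density


def log_likelihood(beta, rows, defaults):
    """Probit log-likelihood: sum of y*ln(Phi(z'b)) + (1-y)*ln(1-Phi(z'b))."""
    total = 0.0
    for z, y in zip(rows, defaults):
        p = STD_NORMAL.cdf(sum(b * v for b, v in zip(beta, z)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_probit(rows, defaults, step=0.5, iterations=2000):
    """Maximize the log-likelihood by gradient ascent (an illustrative choice)."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        for z, y in zip(rows, defaults):
            index = sum(b * v for b, v in zip(beta, z))
            p = STD_NORMAL.cdf(index)
            # Score contribution: (y - p) * phi(index) / (p * (1 - p)) * z
            weight = (y - p) * STD_NORMAL.pdf(index) / (p * (1.0 - p))
            for j in range(k):
                grad[j] += weight * z[j]
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy sample: each row is (1, Male); defaults are 0/1 indicators.
rows = [(1.0, 0.0)] * 10 + [(1.0, 1.0)] * 10
defaults = [1] * 2 + [0] * 8 + [1] * 3 + [0] * 7

beta_hat = fit_probit(rows, defaults)  # [intercept, coefficient on Male]
print(beta_hat, log_likelihood(beta_hat, rows, defaults))
```

In this saturated toy model the fitted index reproduces the group default rates, so the estimated intercept equals the standard normal quantile of the female default rate and the coefficient on Male captures the gap between the two groups on the index scale. With the full covariate vector one would normally rely on an established Probit routine.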

An overview of RenRenDai
RenRenDai, short for RenRenDai Business Consultant (Beijing) Ltd., was founded in May 2010. RenRenDai is affiliated with Youxin Financial Co. and is an independent brand. It is one of the earliest and leading P2P platforms, and was on the list of Top 100 Chinese Internet Enterprises in 2015 and 2016. RenRenDai aims to provide high-quality, professional financial information services for clients and to build a trustworthy investment and credit lending platform.
According to the 2018 annual report disclosed on the internet, a total of 409,592 loans with a total volume of ¥ 30,191,885,600 had been transacted on RenRenDai, with year-on-year growth rates of 38.47% and 49.34%, respectively. At the end of 2018, there were 18,149,168 registered users on the platform in aggregate; the numbers of lenders and borrowers were 519,010 and 394,617, respectively. The number of transactions and the total amount increased continuously (see Figure 1 in Appendix).
Loan applications. In order to post a loan application on RenRenDai, borrowers have to pass two verifications: registration verification and loan application verification. First, they need to register on the platform and submit their personal information, including education level (see Table 1 in Appendix), monthly salary, marital status, whether or not they have outstanding debts, whether or not they own a house or car, and other required materials. The platform evaluates the submitted information and assigns a credit line and score to every applicant. After the first verification, borrowers can submit their loan applications. The platform then investigates the authenticity of the applications and decides whether or not to post them.
Credit ratings. Due to the imperfect credit system and the fragmentation of credit information in China, RenRenDai combines online and offline verification in order to control loan risks. The online verification is based on data analysis of the information submitted by applicants. The offline verification is conducted by employees of cooperating credit agencies through field investigation. RenRenDai evaluates applicants' creditworthiness through its individual credit risk analysis system, taking the online and offline verifications together. Then, RenRenDai assigns a final credit line and rating to each applicant. The borrowers' ratings are divided into 7 categories: AA, A, B, C, D, E, HR.
Risk reserve. To cover loan delinquencies, RenRenDai has set up a risk reserve account. When a loan applicant gets funded, RenRenDai charges a fraction of the loan based on the applicant's rating (see Table 2). This fund is deposited in the risk reserve account. When the payment of a loan is overdue for 30 days, RenRenDai uses the risk reserve fund to buy lenders' claims and ensure that lenders' principal is entirely repaid. Due to changes in regulatory policies, RenRenDai is turning to a third-party factoring model. Under this model, if a borrower remains in arrears for 30 days, the claims of lenders are sold to a collection agency or a commercial factoring agency. After buying the claims, agencies get the legal right to recover debts by all means at their disposal.

Data set
Our data set incorporates all the applications posted on RenRenDai from March 2016 to September 2016 that received at least one bid from lenders. These data were collected with a web crawler. All the data concerning loan applicants or borrowers are observable to lenders on the platform and to researchers, so their use raises no proprietary concerns.
According to the data set, a total of 81,223 individuals applied for loans and received a positive amount of funding. As we can see from Table 3, females accounted for 26,091 (32.12%) and males for 55,132 (67.88%) of all borrowers. The distributions of monthly salary and the definitions of variables are shown in Table 4 and Table 5, respectively. Descriptive statistics of the variables are given in Table 6. As we can see from Table 7, there are some discrepancies between borrowers of different genders. In general, females obtained larger loan amounts and longer loan durations than males and paid an interest rate 0.015 percent lower than males. There are also many gender differences in borrowers' personal characteristics. For example, the number of males who have a high school, undergraduate or graduate degree is larger than that of females, but more females have a college degree. The majority of borrowers have rating A; however, an obvious increase in the number of males can be detected as the rating category degrades from B to HR. Similarly, male borrowers earn, on average, more than females and are older than female borrowers. The gender differences in most employment statuses are significant, except that the number of females who are sole proprietors differs less from that of males than in other occupations. Figure 2 plots the distribution of borrowers by loan purpose. Generally speaking, liquidity consolidation and personal consumption account for the largest shares of loan purposes regardless of gender. There is also some evidence in accordance with popular stereotypes: males prevail in the categories related to car purchase, house purchase, marriage preparation, education and training, and furnishing, while females predominate in other areas. However, counterintuitively, more females than males seem to borrow to invest in innovation activities.

ECONOMETRIC RESULTS
We can use the methodology introduced in Section 3 to conduct a baseline regression. However, borrowers can offer higher interest rates and request lower loan amounts to increase their chance of funding success. The less creditworthy borrowers are, the more incentive they have to offer tempting loan terms in order to be fully funded, which results in a higher probability of default. Hence, the interest rate and loan amount would be endogenous in our model. The endogeneity problem is widely discussed in the statistical literature (Heckman, 1978). Endogenous regressors cause the model described above to produce biased estimates. This bias can be remedied through a two-step regression method in which the endogenous variables are replaced by instrumental variables.
In the first step, two auxiliary OLS regressions are employed. The interest rate and loan amount are each regressed on a set of exogenous variables including loan duration, ratings, education level, monthly salary, employment status, age, length of description, marital status, ownership of houses, ownership of cars and loan purpose. The estimation results of these two auxiliary regressions are shown in Panel A of Table 8. Most marginal effects of the exogenous variables on the interest rate and loan amount are significant except for gender, although the adjusted $R^2$ values are only 0.446 and 0.232, respectively.
After the first-step regression, the fitted values of the interest rate and loan amount can be used as instrumental variables in the second-step Probit model. In order to satisfy the just-identified order condition for instrumental variables, some of the exogenous variables in the first-step regression must be excluded from the second step. Employment status is selected for exclusion. The choice is based on the fact that employment status is clearly a factor influencing loan terms. For instance, a borrower employed by a private enterprise would be less predictable in repaying a loan than a civil servant because of the nature of the job. Thus, a higher interest rate and a lower loan amount are requested for a borrower working in a private enterprise. However, given that the credit risk caused by different employment statuses is compensated by the loan terms, the probability of default should be similar.
The estimation result is recorded in Panel B of Table 8. The bottom row of Panel B is a Wald test for the exogeneity of the interest rate and loan amount. The test statistic is insignificant, which means that we cannot reject the null hypothesis of exogeneity and that the endogeneity problem of the interest rate and loan amount can be neglected. Furthermore, the estimated coefficient of the variable Male is not significant in the two-step regression, which means that gender difference has barely any effect on the probability of default.
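The two-step procedure described in this section can be written out schematically as follows. This is only a sketch under assumed notation: $W_i$ (all exogenous variables of the first step, including the Male dummy and employment status), $\pi_I$, $\pi_L$, the error terms $u_i$, $v_i$ and $\tilde{X}_i$ (the observable characteristics without employment status) are symbols introduced here for illustration and do not appear in the paper.

```latex
\begin{align*}
% Step 1: auxiliary OLS regressions for the two endogenous loan terms
I_i &= W_i'\pi_I + u_i, & \hat{I}_i &= W_i'\hat{\pi}_I, \\
L_i &= W_i'\pi_L + v_i, & \hat{L}_i &= W_i'\hat{\pi}_L, \\
% Step 2: Probit with the fitted values; employment status is excluded
\Pr(\mathit{Default}_i = 1) &= \Phi\!\left(\beta_0 + \beta_1 \mathit{Male}_i
  + \beta_2 \hat{I}_i + \beta_3 D_i + \beta_4 \hat{L}_i
  + \tilde{X}_i'\gamma\right).
\end{align*}
```

Under this reading, the gender effect is still carried by $\beta_1$, and the Wald test in Panel B asks whether replacing $\hat{I}_i$ and $\hat{L}_i$ by the observed $I_i$ and $L_i$ would change the estimates materially.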

ROBUSTNESS CHECK

The number of control variables
The above two-step instrumental Probit regression shows that the problem of endogeneity is not serious. Therefore, we can use the ordinary Probit model to test whether the finding of an insignificant gender effect on default risk also holds there. Table 9 reports the coefficients of the independent variables determining the probability of default. Column (1) reports the outcomes of the baseline regression, which incorporates the dummy variable Male, a set of variables related to loan terms and a set of dummy variables generated from borrowers' ratings. Column (2) reports the results of the regression that adds monthly salary, age, description, ownership of houses and ownership of cars to the variables in the baseline regression. Columns (3), (4), (5) and (6) report the results of extended regressions that further incorporate education level, employment status, marital status and loan purpose, respectively.
Each regression predicts only a weak link between the loan term variables (interest rate, duration and loan amount) and the probability of default, in accordance with the results in Section 5. This is unusual compared with the situation in regular financial institutions. Usually, lenders associate longer durations and higher amounts with more uncertainty in loan repayment; hence, lenders require higher risk premia, which gives rise to a higher probability of default by borrowers. Our results therefore indicate that online P2P lending is quite distinct from the conventional financial market.
It is not surprising that each model shows a strong connection between borrowers' ratings and the default rate. The probability of default increases as the rating degrades from A to HR. Likewise, some borrower features are found to play a key role in the default rate. For example, the default rate decreases markedly as borrowers' education level rises. In comparison with borrowers who do not report their marital status, borrowers who are divorced, married or unmarried seem to have a higher default rate. This result is not straightforward to interpret. One possible explanation is that some individuals who borrow on the internet are more concerned about their privacy and refuse to submit their marital status to P2P platforms. These borrowers are likely to attach more importance to their creditworthiness, so that they have a lower probability of default. Moreover, we find a positive relationship between age and default rate. However, monthly salary, ownership of houses, ownership of cars, employment status and loan purpose appear to be irrelevant to the probability of default. This again supports our choice of employment status as the instrumental variable for the interest rate and loan amount in Section 5.
Note that in all specifications, the sign of the estimated coefficient of gender is positive. However, each regression predicts that gender has no significant effect on the probability of default. Even as we add the control variables to the regression step by step, gender remains insignificant in determining the default rate.

Effects of divergent observable characteristic variables
The estimation result from the first-step regression shows that males obtain higher interest rates and smaller loan amounts than females (Panel A of Table 8). Apart from the interest rate and loan amount, there are significant differences in observable characteristic variables, as Table 7 shows. Substantial differences in observable characteristics between the gender groups would render our ceteris paribus assumption invalid, and the estimate of the gender effect would be inconsistent. To address this sample imbalance, we use the method of propensity score matching designed for estimating treatment effects (Rosenbaum and Rubin, 1985). The goal is to estimate the gender effect with a sample of matched individuals. Except for gender, all the other observable characteristics of these individuals are almost the same.
The resemblance of borrowers is based on their propensity scores. A propensity score represents the probability that a borrower is male given the observable characteristics and loan terms, and is calculated with a logit regression model in which the dummy variable Male is regressed on all the other observable variables. The balance of the samples in the two gender groups, as indicated by standardized bias, is shown in Figure 3. We can see that most dissimilarities in observable variables (including loan term variables) are reduced remarkably after matching. The distribution of propensity scores is plotted in Figure 4. The distributions of propensity scores for the two groups are close to each other. Indeed, only 20 males and 19 females fall outside the common support of the propensity score, which means that approximately 0.05% of the overall sample remains unmatched and that there is a good chance of finding a near-identical "twin" of the opposite gender for every borrower.
After matching on the propensity score, we estimate the average gender effect on the probability of default using the kernel matching method (Heckman et al., 1998) with an Epanechnikov kernel to assign a weight to every matched sample. The 39 samples falling outside the common support are excluded from this calculation. At the same time, the bootstrap method is employed to estimate the standard error and p-value of the gender effect. The result shows that the average gender effect is equal to 0.0007 and is statistically insignificant. Hence, this robustness check again supports the result obtained earlier.
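The balance diagnostic and the common-support share mentioned above can be checked with a short sketch. The standardized bias below follows the usual Rosenbaum and Rubin form (difference in group means divided by the square root of the average of the two group variances, in percent); whether the paper uses exactly this denominator is an assumption, and the covariate values are made up for illustration.

```python
import math
import statistics


def standardized_bias(males, females):
    """Percent standardized bias of one covariate between the two gender groups."""
    pooled_sd = math.sqrt(
        (statistics.variance(males) + statistics.variance(females)) / 2.0
    )
    return 100.0 * (statistics.fmean(males) - statistics.fmean(females)) / pooled_sd


# Hypothetical covariate values for a handful of borrowers (not from the paper).
males_x = [2.0, 4.0, 6.0]
females_x = [1.0, 3.0, 5.0]
bias = standardized_bias(males_x, females_x)

# Share of borrowers outside the common support, using the counts in the text:
# 20 males and 19 females out of 81,223 borrowers.
off_support_share = 100.0 * (20 + 19) / 81223
print(bias, off_support_share)
```

A bias near zero after matching indicates that the covariate is balanced across the two groups; the off-support share computed from the reported counts is consistent with the roughly 0.05% stated in the text.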
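A minimal sketch of kernel matching with an Epanechnikov kernel and a bootstrap standard error is given below. It assumes that males are treated as the "treated" group and females as controls, that propensity scores have already been estimated, and that the bandwidth is 0.06; the paper states none of these details, and the sample data are hypothetical.

```python
import random
import statistics


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_matching_effect(treated, controls, bandwidth=0.06):
    """Average gap between each treated outcome and its kernel-weighted
    counterfactual; inputs are lists of (propensity_score, default_indicator)."""
    gaps = []
    for score_t, y_t in treated:
        weights = [epanechnikov((score_t - s) / bandwidth) for s, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no control within the bandwidth: skip this observation
        counterfactual = sum(w * y for w, (_, y) in zip(weights, controls)) / total
        gaps.append(y_t - counterfactual)
    if not gaps:
        raise ValueError("no treated observation has a control within the bandwidth")
    return statistics.fmean(gaps)


def bootstrap_se(treated, controls, bandwidth=0.06, reps=200, seed=0):
    """Bootstrap standard error: resample both groups with replacement."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(controls, k=len(controls))
        try:
            draws.append(kernel_matching_effect(t, c, bandwidth))
        except ValueError:
            continue  # resample without any usable match
    return statistics.stdev(draws)


# Hypothetical (propensity score, default) pairs; not taken from the paper.
males = [(0.50, 1), (0.50, 0), (0.52, 1)]
females = [(0.50, 0), (0.51, 1), (0.50, 1)]
effect = kernel_matching_effect(males, females)
se = bootstrap_se(males, females)
print(effect, se)
```

Each male's counterfactual default outcome is a weighted average over females with nearby propensity scores, so closer matches count more. A production analysis would also re-estimate the propensity score inside each bootstrap draw, which this sketch omits for brevity.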

CONCLUSIONS
The above analysis, based on samples from the largest P2P platform, RenRenDai, does not reveal any significant gender effect on the probability of default once all the observable characteristics and loan terms are controlled for. The result has been subjected to different robustness checks. Even if we take the endogeneity problem of the interest rate and loan amount into account and separate out the effects of observable characteristics, this result holds. Therefore, we can argue that there is no gender difference in the default rate on RenRenDai at this moment.
However, we must be very careful in generalizing this conclusion to other P2P lending platforms, for two reasons. The first stems from the limitation of our data set. The time span of our data is only two quarters due to data availability. However, people usually need to borrow money to smooth their consumption and finance other expenditures in the first and fourth quarters, around the Spring Festival. Fewer loans would take place in the second and third quarters, since most firms pay bonuses to employees at that time. Therefore, seasonal factors should be taken into account when we discuss default risk. Furthermore, most of the loans we use for analysis have not yet matured. The probability of default at maturity remains uncertain. The repayment state at maturity should be investigated, and default at maturity should be distinguished from default during repayment. In this sense, panel data are called for to test the validity of our conclusion regarding gender effects on credit risk.
Another reason to be cautious about generalization stems from the sampling limitation of this paper. Results obtained from different P2P lending platforms depend on the specific operational mechanisms of the platforms, and even on the regulatory environment in which each platform operates. Most current research, including this paper, is based on data from a single platform. It would be unsound to conclude that there is no gender effect on default risk on all P2P lending platforms. Therefore, a comparative analysis of different P2P lending platforms regarding the role of operational mechanisms and regulatory environments is necessary and remains to be carried out in further studies.

REFERENCES

Interest rate
Nominal interest rate the borrower should pay, ranging from 7-24, in % p.a.

Duration
Loan term, multiples of 3 ranging from 3-36, in months.

Loan amount
Loan amount, divided by 1000, in RMB.

Education
Categorical variable with 4 values representing the borrower's education level as defined in Table X.

Rating
Categorical variable with 6 values based on the credit scores RenRenDai assigned for borrowers through risk evaluation.(See Table X