Estimating credit default probabilities using stochastic optimisation

Banks and financial institutions all over the world manage portfolios containing tens of thousands of customers. Not all customers are highly creditworthy, and many pose varying degrees of risk to the banks or financial institutions that lend money to them. Hence, an assessment of default risk that is calibrated and reflective of actual credit risk is paramount in the field of credit risk management. This paper provides a detailed mathematical framework, using the concepts of the Binomial distribution and stochastic optimisation, to estimate the Probability of Default for credit ratings. The empirical results obtained from the study illustrate the method's potential application value and suggest that it performs better than other estimation models currently in practice.


Introduction
Probability of default (PD) is a financial risk management term describing the likelihood of a default over a particular time horizon. It provides an estimate of the likelihood that a borrower will be unable to meet its debt obligations. Under Basel II guidelines, formulated by the Basel Committee on Banking Supervision or BCBS (2001), PD is a key parameter used in the calculation of economic capital or regulatory capital for a banking institution. Banks that comply with the Basel II internal ratings-based approach are obliged to assign a PD value (usually a 1-year PD) to each of their clients, which is used as the basis for regulatory capital requirements.
A popular method of quantifying Probability of default is through credit ratings, where each entity in a bank's portfolio is assigned a rating grade depending on its past, current and future behaviour.
Entities with the same rating grade represent similar credit risk. Some banks have internal scorecards and rating models that assign ratings based on internal methodologies. On occasion, banks may use ratings supplied by external credit rating agencies such as Moody's and Standard & Poor's (S&P) instead of internal ratings. For example, S&P Global Ratings provides a rating system with 22 rating classes, and Moody's, another popular rating agency, classifies entities into 21 rating classes based on their creditworthiness. This is permissible because the Basel II accord allows banks to base their capital requirements on internal as well as external rating systems and risk profiles.
Large banks employ various mathematical techniques that use the historical data of customer-level or account-level ratings assigned by their internal ratings system (or an external rating system) to forecast the default probability of each entity. The number of grades used depends on the bank's individual preference. The "upper" grades represent lower default risk, and hence lower probability values are assigned to them, while the "lower" grades represent higher default risk and consequently are assigned higher probability values.
One of the important concerns in the financial industry is the estimation of a Probability of Default that captures, to the best extent possible, the true default risk associated with a portfolio. Empirically estimated PD values based on sample data often deviate from expectations in that the observed PD values of rating classes are non-monotonic (Pluto and Tasche, 2005). That is, $R' > R$ (where $R'$ is the "lower" grade) need not imply $\hat{\theta}_{R'} > \hat{\theta}_R$, where $\hat{\theta}_R$ is the estimated default rate for rating grade $R$. Tasche (2013) discussed this issue, which he termed the "inversion of default rates". Non-monotonicity is an issue because the observed default rates then do not reflect the true default risk associated with the rating class. Krahnen and Weber (2001) mention monotonicity as one of the most important requirements for a rating system. Without monotonicity, some customers can be misclassified into the wrong risk buckets, with under-estimation or over-estimation of their default probability. It also weakens the discriminatory power of a rating system, thereby hindering meaningful risk differentiation between low-risk and high-risk rating classes.
One reason for this deviation between expected values and observed estimates could be that the portfolio often has an uneven distribution of accounts across the rating classes (Tasche, 2013). In many empirical data sets there is insufficient information in some rating groups, which makes it impossible to obtain an accurate estimate of their default risk from the observed default rate alone. For example, Table 1 shows the observed default rates based on historical rating assignments (Long-Term Foreign-Currency) of corporate assets by Standard & Poor's Ratings Services for the years 2017 and 2016. Note that, of the two years, there is a break in monotonicity in the empirically obtained default rates for 2016, while 2017 maintains the expected trend in default rate. From this observation we can conclude that the default rates in 2016 do not give a true picture of the default risk in 2016, at least for two rating classes. Such inversions need not stem from a problem with the rank-ordering capacity of the rating methodology; they can be a result of randomness in the default rate of customers. The issuer-weighted long-run average grade-level default rates reported in the S&P report (2021) suggest that default rate inversions such as those in Table 1 might be an exception. In such cases, it is safe to assume that the observed default rate is different from the true default rate. A lack of sufficient data might make it difficult to calculate the true probability of default for such rating grades. The issue may come up not just in simple non-parametric models but also in classification models like logistic regression or decision tree models. Such models give us the probability of a customer or entity defaulting, given a set of customer-level or entity-level factors that influence the default risk (credit rating might be one of these factors). But once the model is built, there is no guarantee that the aggregate risk of customers, when grouped at a rating class level, follows a monotonic trend. Hence there is a need to calibrate the estimated default probabilities such that $R' > R \Rightarrow \hat{\theta}_{R'} > \hat{\theta}_R$.
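As a small illustration of what an inversion looks like in data, the following Python sketch flags adjacent rating grades whose observed default rates break the expected order. The function name and the counts are hypothetical and are not taken from Table 1.

```python
def find_inversions(defaults, obligors):
    """Return the indices i for which grade i+1 (the riskier grade) has a
    lower observed default rate than grade i, i.e. monotonicity is broken."""
    rates = [k / n for k, n in zip(defaults, obligors)]
    return [i for i in range(len(rates) - 1) if rates[i + 1] < rates[i]]


# Hypothetical counts for five grades, ordered from best to worst.
obligors = [500, 400, 300, 200, 100]
defaults = [0, 2, 1, 4, 6]

# The third grade has a lower observed rate than the second one.
print(find_inversions(defaults, obligors))
```

An empty list would indicate that the observed default rates are already monotonic and need no reordering.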
One of the early and most popular techniques used in the industry was developed by Pluto and Tasche (2005), who suggested a low-default portfolio calibration approach, termed the most prudent estimation, using an upper confidence bound with confidence level (1 − α) while guaranteeing an ordering of PDs that respects the differences in credit quality indicated by the rating grades. This method involves optimising an objective function (a cumulative binomial likelihood) by assuming a binomial distribution of defaults and a suitable value for the significance level α. Later, some alternatives to the method of Pluto and Tasche were proposed in the form of rating systems or score functions for low default portfolios; see Erlenmaier (2011) and Fernandes and Rocha (2011). Dwyer (2006) introduced a Bayesian adjustment to Pluto and Tasche's model with the use of a uniform prior distribution. His approach was later explored in greater detail in Kiefer (2009, 2010), where the use of prior distributions (like the Beta distribution) determined by expert judgement was considered. Other authors who considered the application of the Bayesian approach are Tasche (2013), Clifford et al. (2013), Chang and Yu (2014) and Kruger (2015). But most of these methods cannot be implemented on a rated system to ensure a monotonic PD trend estimate. Van der Burgt (2007) suggested a method for estimating low-default portfolio PD curves by using the cumulative accuracy profile (CAP), also known as the power curve or Lorenz curve, together with a mathematical function for modelling the CAP that ensures calibrated monotonic PD curves. This is also called the CAP Curve Calibration or VDB Calibration technique. Tasche (2009) proposed a two-parameter approach called Quasi Moment Matching (QMM), based on the Accuracy Ratio and the distribution of good accounts. Here, the PD for each rating is modelled as a mathematical function of the distribution of the non-defaulted population in the portfolio and solved by optimising an objective function of the two user-defined parameters.
A more recent work in the field is that of Surzhko (2017), who proposed a Bayesian method in which the portfolio default rate itself, rather than the rating-level default rate, is calibrated to the default rate of another portfolio, called the closest available portfolio, by adjusting the Bayesian prior density of the default rate parameter. Although the method can be adopted for a rating-level system, it involves identifying a closely related rated portfolio with reliable and monotonic default statistics, as well as a suitable prior distribution weight.
Almost all the methods that are widely used at present involve some level of subjectivity or assumptions in the theoretical framework (discussed briefly in section 3.3). In many cases, these methods are observed to under-predict or over-predict the estimates. Moreover, in some scenarios they have been observed to perform poorly and to produce results that are non-intuitive and illogical, such as uncalibrated default rates (estimated PDs that decrease as credit quality decreases) in certain types of portfolios. For example, Pluto and Tasche (2005) mention that a disadvantage of the most prudent approach is that it may lead to non-monotone PD estimates when there are many defaults in the high rating grades and fewer in the lower grades.
The main motivation for this study was the gaps in the estimation frameworks of the methods currently in practice (discussed in brief in section 3.3) and the possibility of overcoming them while improving performance. In short, the main objectives of this study are to:
A. Introduce an alternative PD estimation technique that:
a. Offers better results with minimum under-estimation or over-estimation.
b. Performs well under a wide variety of scenarios and different types of portfolios.
c. Reduces the subjectivity in the model and hence the sensitivity of estimates to subjective influences.
B. Evaluate and compare the performance of the PD technique considered in this study using real data.
C. Indicate open questions for further research.
This paper proposes an estimation method based on stochastic optimisation for default rate curve estimation and calibration. Using stochastic optimisation principles, we attempt to estimate the default parameter $\theta_R(t)$ for each rating class $R$ and period $t$ given the data. The estimated default rate parameter acts as a best-case estimate of the Probability of Default (PD) satisfying all the underlying conditions.
Stochastic optimisation refers to a collection of methods for minimising or maximising an objective function when randomness is present. These methods are widely used in science, business, engineering and other fields. A stochastic method is chosen on the assumption that the likelihood function of the default distribution of the rating classes, subject to a set of given restrictions, is an expression without any closed-form solution. Although the idea of stochastic optimisation is not new in general (Kirkpatrick et al., 1983), the method has seen limited application in the field of credit risk. Thus, the idea presented in this paper is novel, with promising avenues for further research. The reason for choosing stochastic optimisation is that, unlike other widely used techniques (such as GRG Nonlinear), which may return only a local maximum or minimum, stochastic methods can escape local optima and search for the global maximum or minimum.
The proposed estimation framework can be divided into two parts:
A. Defining a mathematical objective function of the default rate parameter $\theta_R$ that needs to be optimised.
B. Optimising the objective function using a suitable technique that gives the best-case estimates for PD.
The theoretical framework for the entire process is discussed in detail in section 2. The proposed methodology was implemented and tested using two different data sets, which are explained briefly in section 3.1. The results of the study are presented in section 3.2. The results are then compared with the outputs of other well-known approaches in section 3.3, which also explains in what respects the proposed method improves on them. A few limitations are discussed in section 4, and we conclude the article in section 5.

Optimisation function and constraints
The Binomial distribution is the distribution of the number of successes that occur in $N$ independent trials when the probability of success in each trial is $\theta$. If $N$ is the number of performing entities at the beginning of the period and $k$ is the number of those entities that have defaulted by the end, then the number of defaults can be treated as a binomially distributed random variable with likelihood

$$L(\theta \mid N, k) = \binom{N}{k}\,\theta^{k}\,(1-\theta)^{N-k},$$

where $\theta$ is the true probability of occurrence of a default event for a single entity. Given a set of rating grades $1, 2, 3, \dots, m$ with initial distribution of performing entities $\{N_1, N_2, \dots, N_m\}$ and distribution of defaulted entities $\{k_1, k_2, \dots, k_m\}$, we need to estimate the default rate parameters $\{\theta_1, \theta_2, \dots, \theta_m\}$, the true values of which satisfy the condition $g$ given by

$$g:\; 0 \le \theta_1 \le \theta_2 \le \dots \le \theta_m \le 1.$$

Under the assumption that the number of defaults is binomially distributed, $k_R \sim \mathrm{Bin}(N_R, \theta_R)$, and that default events are independent, i.e. default in one rating grade does not influence another rating grade, the maximum likelihood estimate $\theta_R^{0}$ for each rating grade is just the ratio of $k_R$ to $N_R$:

$$\theta_R^{0} = \frac{k_R}{N_R}.$$
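The binomial log-likelihood and the per-grade maximum likelihood estimate described above can be sketched in a few lines of Python. This is a minimal illustration only; the function names and the counts are assumptions made for the example.

```python
import math


def binom_loglik(k, n, theta):
    """Log of C(n, k) * theta^k * (1 - theta)^(n - k), for 0 <= theta <= 1."""
    if (theta <= 0.0 and k > 0) or (theta >= 1.0 and k < n):
        return -math.inf  # the data are impossible under this theta
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # Treat 0 * log(0) as 0 so that the boundary values are handled.
    term_k = k * math.log(theta) if k > 0 else 0.0
    term_nk = (n - k) * math.log(1.0 - theta) if k < n else 0.0
    return log_comb + term_k + term_nk


def mle_default_rates(defaults, obligors):
    """Unconstrained MLE per grade: the observed default rate k_R / N_R."""
    return [k / n for k, n in zip(defaults, obligors)]


# Hypothetical counts for two grades.
rates = mle_default_rates([0, 2], [50, 100])
print(rates)
print(binom_loglik(2, 100, rates[1]))
```

For any single grade, the log-likelihood is largest at the observed default rate, which is why the unconstrained estimates simply reproduce any inversion present in the data.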
It is assumed that the maximum likelihood estimates break the order because of the limitations discussed earlier. The solution proposed in this paper is to optimise the joint likelihood of observing the given default data $\{k_1, k_2, \dots, k_m\}$ for rating grades $1, 2, 3, \dots, m$ by varying the corresponding default rate parameters $\{\theta_1, \theta_2, \dots, \theta_m\}$ subject to condition $g$:

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid g). \tag{4}$$

Before we find a way to achieve such a solution, we need to impose an additional constraint on the parameters. If we solve our problem with only constraint $g$, it can occasionally lead to results which optimise the likelihood function but fail to quantify the true default risk. That is, the optimising algorithm may return unrealistically high or insignificantly low PD estimates as the optimal solution to the objective function while still satisfying condition $g$. For example, during trials with one of the data sets under study, it was observed that the solution converged to high PD values even for investment grade ratings (high quality ratings where the PD is expected to be < 1%).
Figure 1 displays such a case, where the estimated PD values begin at around 30% (grade 1) and end at around 80% (grade 5), while the observed default rates are significantly below these values. Recall that we have already assumed that the observed value of a parameter may differ from its true value, and that the variation is caused by randomness in the default events. It is prudent to extend that assumption with a further condition: for each rating, the likelihood of the data given the estimated default parameter $\hat{\theta}_R$ must lie within the $(1-\alpha)$ confidence region of the likelihood of the data given the maximum likelihood estimate $\theta_R^{0}$ of the parameter $\theta_R$. The purpose of this constraint is to rule out estimates that maximise the objective function and follow the rank order but lie far outside the expected range of results. Figure 2 shows an illustrative example, where the estimated value of the parameter $\theta_R$ is outside the $(1-\alpha)$ bounds of the MLE of the parameter. This new constraint can be implemented using the likelihood ratio test,

$$\lambda = \frac{L(\theta_R^{*} \mid N_R, k_R)}{L(\theta_R^{0} \mid N_R, k_R)},$$

where $\theta_R^{*}$ is the candidate $\theta_R$ and $\theta_R^{0}$ is the MLE of $\theta_R$. The test statistic $-2\log\lambda$ is approximately chi-squared distributed with 1 degree of freedom.
So, the estimate should be such that the likelihood ratio is within pre-defined thresholds, given by the condition $h$:

$$h:\; -2\log\lambda \le c,$$

where $c$ is the critical value of the $\chi^2_1$ distribution at significance level $\alpha$, so that the candidate is not rejected by the test. The problem is thus redefined as optimising the joint likelihood function subject to the constraints $g$ and $h$:

$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid g, h). \tag{8}$$

The likelihood of $m$ binomially distributed random variables is just the product of their binomial probability mass functions (this follows since the random variables are independent):

$$L(\theta) = \prod_{R=1}^{m} \binom{N_R}{k_R}\,\theta_R^{k_R}\,(1-\theta_R)^{N_R-k_R}.$$

For mathematical convenience, the product can be converted into a sum by taking logarithms. So instead of maximising the likelihood function, we may maximise the log of the likelihood:

$$\log L(\theta) = \sum_{R=1}^{m}\left[\log\binom{N_R}{k_R} + k_R\log\theta_R + (N_R-k_R)\log(1-\theta_R)\right].$$

Since we have constraints in the form of the conditions $g: 0 \le \theta_1 \le \theta_2 \le \dots \le \theta_m \le 1$ and $h: -2\log\lambda \le c$, we need to handle them in the objective function. A variety of constraint-handling methods have been suggested by researchers, each with its own advantages and disadvantages. The most popular among users is the penalty function method.
The penalty function assigns a static exterior penalty value (Homaifar et al. 1994) to the likelihood function in such a way that the likelihood is decreased for each breach of a constraint. Hence, for a number of constraints, our optimisation function may be rewritten as

$$f(\theta) = \log L(\theta) + \sum_{k} P_g\, I_{g_k} + \sum_{j} P_h\, I_{h_j}, \tag{11}$$

where $P_g < 0$ is the exterior penalty value for condition $g$ and $I_{g_k}$ is the indicator that the $k$th constraint in condition $g$ is breached. Similarly, $P_h < 0$ is the exterior penalty value for condition $h$ and $I_{h_j}$ is the indicator that the $j$th constraint in condition $h$ is breached. Each violation of a constraint therefore leads to a penalty being applied to the likelihood, thereby decreasing the function $f(\theta)$. Applying the negative penalties allows the algorithm to optimise over the region where no violation occurs, or where violations are as few as possible.
Therefore, the optimisation problem is formulated as

$$\hat{\theta} = \arg\max_{\theta} f(\theta). \tag{12}$$

This expression has no closed-form solution, and conventional optimisation techniques may not converge to a proper solution that satisfies all constraints. Hence, one can use simulation-based optimisation techniques, also called stochastic optimisation, to estimate the calibrated parameters that maximise our objective function.
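To make the penalised objective concrete, here is a minimal Python sketch of a log-likelihood with static exterior penalties for breaches of conditions g and h. The penalty values, function names and counts are assumptions chosen for illustration, and condition h is read as requiring the likelihood ratio statistic to stay below the chi-squared critical value. The binomial coefficient is dropped because it does not depend on θ and so does not change the maximiser.

```python
import math
from statistics import NormalDist


def loglik_kernel(k, n, theta):
    """k*log(theta) + (n-k)*log(1-theta); -inf if theta is not admissible."""
    if not 0.0 <= theta <= 1.0:
        return -math.inf
    if (theta == 0.0 and k > 0) or (theta == 1.0 and k < n):
        return -math.inf
    a = k * math.log(theta) if k > 0 else 0.0
    b = (n - k) * math.log(1.0 - theta) if k < n else 0.0
    return a + b


def penalised_objective(theta, defaults, obligors, alpha=0.025,
                        penalty_g=-1000.0, penalty_h=-1000.0):
    """Joint log-likelihood plus a negative penalty per breached constraint."""
    # Chi-squared (1 d.o.f.) critical value via chi2_1 = Z^2 for standard normal Z.
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    total = 0.0
    for t, k, n in zip(theta, defaults, obligors):
        ll = loglik_kernel(k, n, t)
        total += ll
        lr_stat = -2.0 * (ll - loglik_kernel(k, n, k / n))
        if lr_stat > critical:  # condition h breached for this grade
            total += penalty_h
    # Condition g: 0 <= theta_1 <= ... <= theta_m <= 1, one penalty per breach.
    path = [0.0] + list(theta) + [1.0]
    total += penalty_g * sum(a > b for a, b in zip(path, path[1:]))
    return total


# Hypothetical counts for three grades.
defaults, obligors = [1, 2, 5], [100, 100, 100]
print(penalised_objective([0.01, 0.02, 0.05], defaults, obligors))
print(penalised_objective([0.05, 0.02, 0.01], defaults, obligors))
```

The second call reverses the order of the parameters and is pushed far below the first one by the penalties, which is the behaviour the penalty method relies on.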

Optimisation using Simulated Annealing
Stochastic optimisation refers to a collection of methods for minimising or maximising an objective function when randomness is present. Unlike deterministic optimisation techniques such as gradient descent, which may return only a local maximum or minimum, stochastic methods can reach the global maximum or minimum. They are suitable for problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time. There are multiple methods for stochastic optimisation, each of which may be suitable for a specific problem. The solution discussed in this paper has been obtained through a method called Simulated Annealing.
Simulated annealing is a general probabilistic local search algorithm, proposed by Kirkpatrick et al. (1983) to solve difficult optimisation problems. The algorithm is based upon the annealing process used in materials science. Annealing is the process of heating a material (mostly metals) until it reaches a fixed temperature and then cooling it down slowly in order to bring the material to a desired structure. When the material is hot, the molecular structure is weaker and more susceptible to change. When the material cools down, the molecular structure is harder and less susceptible to change.
Simulated Annealing (SA) mimics the annealing process but is used for optimising the parameters of a model. It is very useful in situations where there are many local minima in which algorithms like gradient descent would get stuck. Simulated Annealing is similar to an MCMC process in that the next value in the iteration depends on the current sample generated. A neighbouring solution is found as a new candidate solution by applying a random perturbation to the current solution using a candidate generator function $\pi(\cdot)$. This randomness helps to prevent the algorithm from getting stuck in local minima. If the selected move improves the solution, it is always accepted. Otherwise, the algorithm makes the move with some probability less than 1. This probability decreases exponentially with the "badness" of the move, as given by the following equation:

$$P(\text{accept}) = \exp\left(-\frac{f(\theta_{\text{new}}) - f(\theta_{\text{current}})}{T}\right),$$

where $f(\theta)$ is the objective function to minimise. The parameter $T$ used in the equation is analogous to temperature in an actual annealing process. At higher values of $T$, uphill moves are more likely to occur. As $T$ tends to zero, they become more and more unlikely, until the algorithm behaves more or less like a local descent search near a minimum. The steps are described in section 2.3.
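The acceptance rule can be written as a short Python function. This is a sketch of the standard Metropolis-type criterion for a minimisation problem; the function name and the numbers in the usage lines are illustrative assumptions.

```python
import math


def acceptance_probability(f_current, f_candidate, temperature):
    """Probability of moving from the current solution to the candidate
    when minimising f: improvements are always accepted, worse moves are
    accepted with probability exp(-(increase in f) / T)."""
    if f_candidate <= f_current:
        return 1.0
    return math.exp(-(f_candidate - f_current) / temperature)


# The same uphill move becomes less likely as the temperature falls.
for temperature in (10.0, 1.0, 0.1):
    print(temperature, acceptance_probability(5.0, 6.0, temperature))
```

In an annealing loop, a move is then accepted when a uniform random number in [0, 1) falls below this probability.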
The main advantages of choosing Simulated Annealing are the following:
• It typically finds a good approximation to the global optimum within a reasonable computation time.
• It is easy to implement using a software language such as R.
• It can deal with complex cost functions.
The R code used for the process has been provided in Appendix B.

Sampling algorithm
The optimisation process is performed in an iterative, step-by-step manner. Note that instead of trying to maximise the objective function (Equation 12), we minimise its negative.
A. For each rating we first start with an initial solution $\theta_R = \theta_{R,\,j=0}$. We also start with an initial temperature $t = t_0$.
B. For $j = 1$ to $N$, propose a new candidate $\theta_R$.
Because the algorithm is sensitive to initial conditions, the starting point can affect the results, and the algorithm may not converge to an optimal solution. So, these steps were performed multiple times to gather the optimal results from each run, which were then compared to obtain the final results. This helps to maintain the objectivity of the output to a large extent.
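The loop outlined above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used for the results: the geometric cooling schedule, the single-coordinate Gaussian proposal, the starting point (running maximum of the observed default rates), the penalty value and all names are choices made for the example.

```python
import math
import random
from statistics import NormalDist


def loglik_kernel(k, n, theta):
    """k*log(theta) + (n-k)*log(1-theta); -inf if theta is not admissible."""
    if not 0.0 <= theta <= 1.0:
        return -math.inf
    if (theta == 0.0 and k > 0) or (theta == 1.0 and k < n):
        return -math.inf
    a = k * math.log(theta) if k > 0 else 0.0
    b = (n - k) * math.log(1.0 - theta) if k < n else 0.0
    return a + b


def objective(theta, defaults, obligors, critical, penalty=-1000.0):
    """Penalised log-likelihood to be maximised (assumed penalty value)."""
    total = 0.0
    for t, k, n in zip(theta, defaults, obligors):
        ll = loglik_kernel(k, n, t)
        total += ll
        if -2.0 * (ll - loglik_kernel(k, n, k / n)) > critical:  # condition h
            total += penalty
    path = [0.0] + list(theta) + [1.0]
    return total + penalty * sum(a > b for a, b in zip(path, path[1:]))  # g


def anneal(defaults, obligors, alpha=0.025, t0=1.0, cooling=0.999,
           n_iter=20000, sigma=0.005, seed=0):
    """Minimise the negative penalised log-likelihood by simulated annealing."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    # Assumed start: running maximum of the MLEs, monotone by construction.
    current, running = [], 0.0
    for k, n in zip(defaults, obligors):
        running = max(running, k / n)
        current.append(running)
    cost = -objective(current, defaults, obligors, critical)
    best, best_cost, temp = list(current), cost, t0
    for _ in range(n_iter):
        i = rng.randrange(len(current))                  # perturb one grade
        candidate = list(current)
        candidate[i] = rng.gauss(current[i], sigma)      # Normal(theta, sigma^2)
        cand_cost = -objective(candidate, defaults, obligors, critical)
        delta = cand_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= cooling                                  # geometric cooling
    return best, -best_cost


# Hypothetical counts for five grades, best to worst, with one inversion.
pd_estimates, value = anneal([0, 2, 1, 4, 6], [500, 400, 300, 200, 100])
print([round(p, 4) for p in pd_estimates], round(value, 3))
```

Running the function with several different seeds and comparing the returned objective values mirrors the repeated independent runs described above.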

Data
"Standard & Poor's Ratings Services" issues credit ratings for the debt of public and private companies, and of other public borrowers such as governments and governmental entities. It rates customers on a scale of 22 ratings based on their performance, with AAA being the best rating, indicating a minimal chance of default, and D being the worst rating, which stands for default. Intermediate ratings are offered at each level between AA and CCC (such as BBB+, BBB, and BBB−). For the purpose of the modelling, Long-Term issuer ratings (Foreign-Currency) data of corporate customers rated by S&P was obtained.¹ The data consisted of ratings for the years 2016 and 2017.
¹ Under SEC Regulation 17g-7, Nationally Recognized Statistical Rating Organizations (NRSROs) are required to report their historical rating assignments, upgrades, downgrades and withdrawals since 2010. Rating data are generally reported with a one-year delay. The CSV format of this data was sourced from the website www.ratingshistory.info.
For simplicity, and to overcome data challenges such as a lack of enough defaults in some rating grades, a binning process² was carried out wherein adjacent ratings were merged to form a new scale with 9 rating groups. For example, ratings BBB+, BBB and BBB− were merged to form a new group BBB. Also, ratings C and D were considered to be the default bucket, with rating order 9.
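One way to carry out such a binning is sketched below in Python. The mapping strips the "+" and "−" modifiers and assigns an order from 1 to 9. Treating CC as its own group (order 8) is an assumption made so that the scale has 9 groups; the function and constant names are likewise hypothetical.

```python
# Assumed ordering of the merged groups, from best (1) to worst (8).
GROUP_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC"]


def rating_group(rating):
    """Map a notched rating such as 'BBB+' to its merged group order.

    Ratings C and D are both mapped to the default bucket (order 9).
    Both the ASCII hyphen and the Unicode minus sign are accepted.
    """
    base = rating.strip().rstrip("+-\u2212")
    if base in ("C", "D"):
        return 9
    return GROUP_ORDER.index(base) + 1


for r in ["AAA", "BBB+", "BBB", "BBB-", "CCC+", "D"]:
    print(r, rating_group(r))
```

An unrecognised rating string raises a `ValueError`, which is useful for catching unexpected labels in the raw data.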
December snapshots for the years 2011 to 2017 were analysed, and the year 2016 was chosen for the modelling. A 12-month window was considered an ideal observation window because banks and financial institutions are usually interested in measuring the probability of default over a 1-year time horizon. For each snapshot period, the number of performing entities at the beginning and the number of default cases (from among the performing entities) at the end of the 12-month window were measured for each rating class separately. See Table 2 for details of obligor and default counts. The S&P data had rich default rates in many rating grades. The overall number of defaults was 124 for all 8 rating grades combined, so a low default sample data set was created by simulation (assuming a portfolio of only 8 ratings) for the purpose of testing the algorithm on a low default portfolio. The manually generated data is shown in Table 3 below. Note that there are only 4 default instances in this portfolio, and ratings 7 and 8, which carry the highest default risk, have no default instances. Such scenarios may seem far-fetched but are plausible in extremely low default portfolios such as sovereign portfolios.
The method discussed in section 2 was implemented in the R programming language and applied to the two data sets; the results are discussed in the following section.

Experimental results
Using the method of simulated annealing with constrained parameters, our objective function is maximised for the two data sets considered for the experiment. The significance level α for testing condition h was taken as 2.5%. The candidate distribution $\pi(\cdot)$ for determining the next sample was chosen as Normal(mean = θ, variance = σ²). The variance of the candidate distribution was chosen, by trial and error, as the one that gave optimal results. It was observed during trial runs that a high variance would often lead to a failure to converge to an optimal solution, as the search space is quite narrow (θ in (0, 1)) and additional constraints are applied on top of that. The calibrated results for the two scenarios are shown in Table 4 below. We can see from the table that the estimates from the code give monotonic PD values from the default rates for α = 2.5% and 5%. For α = 5%, the algorithm did not converge to an optimal solution and resulted in rejections for the majority of the samples. There appears to be significant variation in the estimated PD values for the 8th rating grade. For the rest of the ratings, the algorithm seems to give relatively close estimates. The results can further be visualised in Figure 3. The estimates of PD for the 2nd data set for α = 2.5% and 5% are shown in Table 5. The trend can be visualised in Figure 4.

Comparison with other methods
The results obtained from the method were compared with those of other popular calibration techniques in the industry. The following are the three major methods currently used for estimation and calibration of default rates.
A. The most prudent estimation of PDs suggested by Pluto and Tasche (2005).
This technique uses a cumulative binomial distribution function to estimate the probability of default for each rating class, as shown below:

$$1 - \alpha \le F(N_{\ge R},\, k_{\ge R},\, \theta_R),$$

where $F$ is the cumulative binomial distribution function, $N_{\ge R}$ is the cumulative number of customers in rating class R and $k_{\ge R}$ is the cumulative number of defaults in rating class R. $\alpha$ is the level of significance, which is subjectively chosen. $\theta_R$ is the PD value which solves the equation for the given $\alpha$.
One major drawback is that, during estimation, this binomial probability is solved using the cumulative distribution of customers and defaults in each bucket rather than the actual number of customers and defaults. The results also depend strongly on the value of $\alpha$, and there is no formal rule for choosing a value that gives an optimum result. Also, this method has been observed to produce non-intuitive results, such as trend reversal, in many cases.
Generally, this method is used for estimating PDs for low default portfolios only.
B. CAP Curve Calibration or VDB Calibration suggested by Van der Burgt (2008).
This method assumes an arbitrary functional form for the cumulative PD curve (Equation 15), whose shape parameter is a factor that depends on the AUC value of the cumulative accuracy profile, and in which $\Theta_R$ is the cumulative theoretical default rate at rating R. The objective function in this method is the sum of squared errors between the cumulative PD estimates and the cumulative actual default rates, taken over the m ratings. Note that Equation 15 is selected arbitrarily, based on the general trend of a cumulative accuracy profile, unlike the objective function defined in Equation 11.
Using this objective function, the method tries to fit the best possible cumulative accuracy profile by minimising the objective function, thereby calibrating the PD estimates. The drawback of this method is that it gives poor estimates if the accuracy profile of the actual portfolio itself is poor.
C. Quasi Moment Matching (QMM) proposed by Tasche (2009).
Similar to CAP Curve Calibration, this method uses cumulative distribution of non-defaults and defines the PD as a mathematical function of it.But unlike the former, it does not assign a functional form to the cumulative distribution.It rather gives a functional form to the PD value for each rating.
In this functional form, the PD value at rating R is expressed in terms of the cumulative actual non-default rate, and the two parameters of the functional form need to be optimised. The optimisation function also differs in that this method uses the sum of squared errors between the PD estimates and the actual default rates, taken over the m ratings.
As with the previous method, the drawback is that it gives poor estimates if the accuracy profile of the actual portfolio itself is poor.
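For method A above, the most prudent estimate can be sketched in Python by solving the cumulative binomial condition with bisection. The sketch follows the formulation of Pluto and Tasche (2005) as commonly stated, in which `confidence` denotes their confidence level and the counts of each grade are pooled with those of all worse grades; the function names and the example counts are assumptions. Requires Python 3.8 or later for `math.comb`.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Suitable for small k, as is
    typical for low default portfolios."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def most_prudent_pd(defaults, obligors, confidence=0.95):
    """Upper confidence bound per grade, using the pooled counts of the
    grade itself and all worse grades (grades ordered best to worst)."""
    estimates = []
    for r in range(len(obligors)):
        n = sum(obligors[r:])
        k = sum(defaults[r:])
        # Largest p with P(X <= k | n, p) >= 1 - confidence; the CDF is
        # decreasing in p, so bisection on [0, 1] applies.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if binom_cdf(k, n, mid) >= 1.0 - confidence:
                lo = mid
            else:
                hi = mid
        estimates.append(lo)
    return estimates


# Hypothetical low default portfolio with three grades.
print(most_prudent_pd([0, 0, 1], [100, 80, 40]))
```

With no defaults at all, the bound reduces to the closed form 1 − (1 − confidence)^(1/n), which gives a quick check on the bisection.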

S&P default data
The values of the PD estimates from each estimation technique are displayed in Table 6. From the table, we can observe significant variation among the different methods, especially in the worst rating grades. A visual comparison of the PD estimate trends can be seen in Figure 5. For the S&P data, the comparison exercise performed using Pluto-Tasche Calibration, QMM and the CAP calibration method suggests that the simulation-based approach provides reasonable results, in line with other industry best-practice techniques. The values neither under-predict nor over-predict in comparison to the other models. Both α = 5% and α = 2.5% gave close estimates. We can quantitatively analyse the performance of the PDs generated from the Simulated Annealing framework using a few statistical tests. Two tests were conducted to determine whether the observed default rate is statistically different from the estimated PD value.
• The Binomial Test is a hypothesis test for measuring whether the theoretically expected parameter is significantly different from the observed value of the parameter. It assumes the null hypothesis that the estimated value of the parameter is the true value. If the null hypothesis is supported by the observed data, then the observed parameter should fall within a (1 − α) confidence interval of the estimated parameter. This test has no test statistic.
• The Likelihood Ratio Test is a simple hypothesis test based on the ratio of the likelihoods under the null distribution and the alternative distribution. The null hypothesis is that the observed default rate is the true parameter, and the alternative hypothesis is that the estimated value is the true parameter. If the null hypothesis is supported by the observed data, the two likelihoods should not differ by more than sampling error and the test statistic will be small. The test statistic −2 log λ is approximately chi-squared distributed with 1 degree of freedom.
The numbers of ratings that failed the two tests are shown in Table 7. It can be inferred that the proposed method passes the performance criteria better than the competing techniques. Note that even though CAP calibration and QMM produced estimates in line with the expected trend, their results do not pass the success criteria for 4 to 5 rating groups. The results of the Binomial test and the Likelihood Ratio test are provided in Appendix A.
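Both tests can be sketched in Python for a single rating grade. The binomial test below uses one common exact formulation (equal-tailed, based on tail probabilities under the estimated PD), which is an assumption about how the check is carried out; the function names and counts are illustrative. The PD estimate is assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); equals 0 when k < 0."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))


def binomial_test_passes(k, n, pd_estimate, alpha=0.05):
    """True if the observed default count k is not in either alpha/2 tail
    of Binomial(n, pd_estimate)."""
    lower_tail = binom_cdf(k, n, pd_estimate)            # P(X <= k)
    upper_tail = 1.0 - binom_cdf(k - 1, n, pd_estimate)  # P(X >= k)
    return min(lower_tail, upper_tail) >= alpha / 2.0


def loglik_kernel(k, n, theta):
    """k*log(theta) + (n-k)*log(1-theta), with 0*log(0) taken as 0."""
    a = k * math.log(theta) if k > 0 else 0.0
    b = (n - k) * math.log(1.0 - theta) if k < n else 0.0
    return a + b


def lr_test_passes(k, n, pd_estimate, alpha=0.05):
    """True if -2 log(lambda) does not exceed the chi-squared (1 d.o.f.)
    critical value, computed via chi2_1 = Z^2 for a standard normal Z."""
    stat = -2.0 * (loglik_kernel(k, n, pd_estimate) - loglik_kernel(k, n, k / n))
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    return stat <= critical


# Hypothetical grade: 2 defaults among 100 obligors.
for pd_estimate in (0.02, 0.05, 0.30):
    print(pd_estimate,
          binomial_test_passes(2, 100, pd_estimate),
          lr_test_passes(2, 100, pd_estimate))
```

Counting, per method, the grades for which either function returns `False` gives a tally of the kind reported in the comparison tables.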
It is not entirely surprising that the proposed method performed this way, as we had already defined through one of the constraints of the objective function, that the estimate should be within a boundary of the MLE of the default rate parameter.

Simulated portfolio data
The values of the PD estimates from each estimation technique are displayed in Table 8. From the table, we can observe significant variation among the different methods. A visual comparison of the PD estimate trends can be seen in Figure 6. In this scenario, the proposed method using stochastic optimisation gave proper results with a monotonic trend. The results for α = 5% and α = 2.5% were close. The CAP calibration method delivered the expected trend, assuming an accuracy ratio of 50%. This value was subjectively chosen given the low default nature of the portfolio. The numbers of ratings that failed the two tests are shown in Table 9.
Comparison of the methods using hypothesis tests yielded no significant breaches in any of the rating grades. This is not unexpected, as the data has only a small sample size, which leads to wider confidence intervals.
From the comparison, it is evident that the proposed method is at par with, or more suitable than, the other methods currently used in the industry. It has the added advantage of providing proper results, without an unintuitive trend, in cases with extremely few defaults.

Limitations
Although this approach was simple and straightforward to implement, one drawback that affected the research was the accuracy of the results. Occasionally, the algorithm returned results that did not converge to the maximum likelihood. It was also observed to show moderate sensitivity (a significant increase or decrease in the negative log-likelihood) in the neighbourhood of some points. It was suspected that the initial values had a significant effect on the solution-seeking capability of the algorithm.
Also, as is already widely known, the performance of the algorithm depends strongly on the choice of the cooling schedule and the neighbourhood structure of the objective function. To alleviate the problem to an extent, multiple independent runs were performed, and the results from each independent run were combined and compared to arrive at the final results. Although this does not solve the problem completely, it does provide a reasonably solid framework for further development.
The starting temperature, the penalty for each constraint, the variance of the candidate distribution and similar settings can all have a significant impact on the output. Values for these settings were identified through trial and error and kept fixed for the purpose of obtaining the results. A different combination from the one presented here might provide a better optimisation.
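The idea of combining several independent runs can be illustrated with a small simulated annealing sketch. It is not the algorithm used in the study: the geometric cooling schedule, the log-normal candidate step, the penalty weight, the number of runs and the example data are all hypothetical choices, and the boundary constraint around the MLE is omitted for brevity. Only a monotonicity penalty on the Binomial negative log-likelihood is shown.

```python
import math
import random

def neg_log_lik(pds, defaults, counts):
    """Binomial negative log-likelihood over rating grades (constant terms dropped)."""
    total = 0.0
    for p, d, n in zip(pds, defaults, counts):
        total -= d * math.log(p) + (n - d) * math.log1p(-p)
    return total

def objective(pds, defaults, counts, penalty=1e4):
    """Negative log-likelihood plus a penalty on monotonicity violations
    (hypothetical penalty form: PDs should not decrease with worsening grade)."""
    violation = sum(max(0.0, a - b) for a, b in zip(pds, pds[1:]))
    return neg_log_lik(pds, defaults, counts) + penalty * violation

def anneal(defaults, counts, seed, n_iter=20000, t0=1.0, cooling=0.9995, step=0.1):
    """One annealing run from a random monotone starting point."""
    rng = random.Random(seed)
    k = len(counts)
    current = sorted(rng.uniform(1e-4, 0.2) for _ in range(k))
    cur_val = objective(current, defaults, counts)
    best, best_val = current[:], cur_val
    temp = t0
    for _ in range(n_iter):
        i = rng.randrange(k)
        cand = current[:]
        # Multiplicative perturbation keeps the candidate PD strictly positive.
        cand[i] = min(max(cand[i] * math.exp(rng.gauss(0.0, step)), 1e-8), 1 - 1e-8)
        cand_val = objective(cand, defaults, counts)
        delta = cand_val - cur_val
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_val = cand, cand_val
            if cur_val < best_val:
                best, best_val = current[:], cur_val
        temp *= cooling  # geometric cooling schedule (assumed)
    return best, best_val

def multi_start(defaults, counts, n_runs=5):
    """Run several independent chains and keep the one with the lowest objective."""
    runs = [anneal(defaults, counts, seed) for seed in range(n_runs)]
    return min(runs, key=lambda r: r[1])

# Illustrative data with a non-monotone observed default rate in the first two grades.
best, best_val = multi_start([5, 2, 20], [1000, 1000, 1000])
print(best, best_val)
```

Comparing the objective values across runs, rather than trusting a single chain, gives a rough check on the sensitivity to starting values discussed above.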

Conclusions
In this paper, we have discussed at length how observed default rates are sometimes not an accurate representation of the actual risk that is present. In the case of rating systems, this can lead to issues such as a non-monotonic trend in the estimates of Probability of Default, where a rating which is supposed to represent a higher risk has a lower PD estimate than ratings with lower risk (or vice versa). This severely affects the institution's risk management capabilities by hindering its ability to distinguish high risk from low risk customers. We have presented an alternative method for estimating Probability of Default for rating systems, which solves the issue of non-monotonicity that creeps in when default rates are determined empirically. The method is implemented by assuming that the default rate parameter is a random variable and that the defaults follow a Binomial distribution. A simulation algorithm is designed using the concepts of stochastic optimisation to obtain the maximum value of the likelihood function, which is our objective function. The method is demonstrated for two cases: first, a portfolio of corporate entities rated by the Standard and Poor's credit rating agency, and second, an imaginary portfolio with simulated defaults.
One advantage of this method is that it is conceptually simple and straightforward and has a minimal number of assumptions. The assumptions are also logical and easy to justify. The results obtained have been observed to be comparable to or better than those of other popular calibration methods, especially in situations of data scarcity or low default cases.
Although this approach is simple and straightforward, higher precision or a large number of ratings may require more simulations, which may be limited by the computational power available to the user. A number of measures have been adopted to tackle problems such as sensitivity to initial conditions, which in turn affects the accuracy of the estimates. However, the presence of a large number of model parameters and constraints makes the problem more difficult to solve. These are general issues affecting any simulation-based technique, and further research is needed on how to improve the framework through faster sampling and more accurate estimates. Using heuristic algorithms that are more problem-specific and make the best use of extra information about the system, while still taking advantage of the model framework laid out in section 2.1, may lead to improved results.

Figure 2. Spread of PD parameter θ showing a (1 − α) confidence interval (illustrative). The black line shows the Maximum Likelihood estimate of θ. The red line shows the estimated PD value for θ.

Figure 3. Calibrated PDs vs observed default rates for S&P data.

Figure 4. Calibrated PDs vs observed default rates for simulated data.

Figure 5. Comparison of results with estimates obtained from other methods for S&P data.

Table 1. S&P default frequencies and default rates (%) for corporate entities in 2016 and 2017 based on Long-Term Foreign-Currency issuer rating.

Table 2. S&P corporate credit rating (Long-Term Foreign-Currency) and corresponding numerical order.

Table 3. Sample data created by simulation.

Table 5. Simulated portfolio data: estimated PD values for 50,000 simulations.

Table 6. Comparison of PD results for S&P data.

Table 7. Number of ratings which failed performance tests.

Table 8. Comparison of PD results for simulated portfolio. For the simulated data with low default cases, it can be observed that both Pluto Tasche and QMM provided estimates with a reversal of trend (decreasing default rates with increasing risk).
Data Science in Finance and Economics, Volume 1, Issue 3, 253-271.

Table 9. Number of ratings which failed performance tests.