Abstract

We consider the distribution of the sum of Bernoulli mixtures under a general dependence structure. The level of dependence is measured in terms of a limiting conditional correlation between two of the Bernoulli random variables. The conditioning event is that the mixing random variable is larger than a threshold and the limit is with respect to the threshold tending to one. The large-sample distribution of the empirical frequency and its use in approximating the risk measures, value at risk and conditional tail expectation, are presented for a new class of models which we call double mixtures. Several illustrative examples with a Beta mixing distribution are given. In addition, data from the area of credit risk are fitted with the models, and the new models are compared with one another and with the classical Beta-binomial model.

1. Introduction

Since the U.S. subprime mortgage crisis, which led the world economy into the global recession of the late 2000s, the use of inappropriate models of dependent defaults for portfolio credit risk has been seriously criticized (see Donnelly and Embrechts [1]). Specifically, the Gaussian copula model (Basel II Accord [2] and Li [3]) had become the industry standard and was widely used both for the pricing of portfolio credit derivatives, such as collateralized debt obligations (CDOs) and mortgage-backed securities, and for credit risk management. Its inability to incorporate extreme tail dependence, however, led it to seriously understate default clustering, especially under stressful economic situations.

In the literature, several alternative approaches were proposed to cope with the issue of an insufficient level of dependency among defaults. As a simple extension of the Gaussian copula model, Andersen and Sidenius [4] and Burtschell et al. [5] considered stochastic asset correlation models. A more popular way of incorporating further dependence beyond the Gaussian copula model is the use of heavy tail copulas such as Archimedean copulas. Amongst many others, we mention Burtschell et al. [6] and Schönbucher and Schubert [7] for applications of copula models under the structural and the intensity-based credit risk model framework, respectively.

It is important to note that factor copula models under the typical conditional independence assumption have a close link with Bernoulli mixture models. Specifically, by de Finetti’s theorem, there exists a common mixing random variable on which the default indicator random variables, in an exchangeable credit portfolio, are dependent. The mixing random variable is essentially the random probability of default, which can be represented as a function of the common systematic factor in a factor copula model. Under the typical conditional independence assumption, however, the level of dependence between the default indicator random variables is controlled solely by the distributional property of the systematic factor, which can impose limitations for some applications where extreme levels of dependence are required. See Bluhm et al. [8], Cousin and Laurent [9], Frey and McNeil [10, 11], KMV [12], McNeil et al. [13], McNeil and Wendin [14], Moraux [15], and RiskMetrics Group [16] for more references on credit risk applications of the Bernoulli mixture models under the conditional independence modeling framework.

Formally, a Bernoulli mixture model is defined as follows. Let $X_1, \ldots, X_n$ be a sequence of identically distributed indicator random variables. We assume that each $X_i$, $i = 1, \ldots, n$, follows the Bernoulli distribution with the common default probability $\Theta$. The probability $\Theta$ is randomly drawn from a distribution with cumulative distribution function (cdf) $F(\theta)$, $0 \le \theta \le 1$. At this point, the dependency structure for $X_1, \ldots, X_n$ is still arbitrary.

In credit applications, the statistical properties of the sum of the Bernoulli random variables are the main interest, and the following assumption plays an important role in alleviating computational difficulties: $X_1, \ldots, X_n$ are conditionally independent given $\Theta$. In this case, given $\Theta = \theta$, the sum $S_n = \sum_{i=1}^{n} X_i$ follows the binomial distribution with mean $n\theta$ and variance $n\theta(1-\theta)$. Then, the unconditional probability mass function (pmf) is given as follows:

$$\Pr{}^{I}(S_n = k) = \int_0^1 \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k}\,dF(\theta), \quad k = 0, 1, \ldots, n, \tag{1}$$

where the superscript “I” on $\Pr$ indicates the assumed conditional independence. The most commonly used mixing distribution in modeling the credit risk of a homogeneous portfolio such as a mortgage or credit card portfolio is the Beta distribution. This model is referred to as the Beta-binomial (B-I) model.
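As a minimal sketch of the Beta-binomial pmf described above, the integral of the binomial pmf against a Beta mixing density has the well-known closed form $\binom{n}{k} B(a+k,\, b+n-k)/B(a,b)$. The function names and the parameter values below are illustrative assumptions, not taken from the paper.

```python
from math import comb, exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """Pr(S_n = k) when theta ~ Beta(a, b) and the X_i are iid Bernoulli(theta) given theta."""
    return comb(n, k) * exp(log_beta(a + k, b + n - k) - log_beta(a, b))


if __name__ == "__main__":
    n, a, b = 50, 2.0, 38.0  # hypothetical portfolio size and shape parameters
    pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
    print("total mass:", sum(pmf))
    print("mean:", sum(k * p for k, p in enumerate(pmf)), "vs n*a/(a+b):", n * a / (a + b))
```

Working on the log scale with `lgamma` keeps the ratio of Beta functions stable when $n$ is large.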

The present paper is concerned with the level of dependence incorporated in Bernoulli mixture models, especially under stress situations. Together with the asset correlation, the default correlation is used as the standard measure of dependence in portfolio credit risk models. In particular, we consider the behavior of the correlation between two arbitrary terms, as the random default probability $\Theta$ is conditioned to be larger than a threshold $q$, with $q$ tending to 1. Specifically, we define the conditional default correlation between two terms, $X_i$ and $X_j$, given $\Theta > q$, $0 < q < 1$, as follows (Bae and Iscoe [17] used an analogous definition to study the asset correlations under stress for various single-factor credit risk models, some of them dynamic, while Kalkbrener and Packham [18] also studied asset correlations under stress, for static, normal variance mixture models. In both studies, in place of the conditioning on $\Theta$, conditioning is done on an auxiliary risk factor which, for example, is related to a systematic factor):

$$\rho(q) = \frac{\operatorname{Cov}(X_i, X_j \mid \Theta > q)}{\sqrt{\operatorname{Var}(X_i \mid \Theta > q)\,\operatorname{Var}(X_j \mid \Theta > q)}}, \tag{2}$$

where the conditional covariance and variances are computed under $\Pr(\,\cdot \mid \Theta > q)$.

Then, the limit of the conditional correlation as $q$ tends to 1 (referred to as the limiting correlation) is defined as

$$\rho = \lim_{q \to 1} \rho(q), \tag{4}$$

provided the limit exists.

It is intuitive to expect that the correlations in (2) increase monotonically as the threshold $q$ approaches 1. Unfortunately, this is not the case for the Beta-binomial model. In fact, the conditional correlation always converges to zero regardless of the mixing distribution, as long as conditional independence is postulated (Corollary 2, under mild regularity conditions on the mixing distribution). This type of model failure can introduce a significant bias in the measurement and management of risk in stress situations.

The objectives of the present paper are threefold: (i) to derive the relationship between the limiting correlation and the probability structure; (ii) to construct general Bernoulli mixture models with nontrivial limiting correlations; (iii) to demonstrate implications of the constructed models, for tail risk measures such as value at risk and conditional tail expectation.

The theoretical results for these objectives are presented in Section 2. We first examine the limiting behavior of correlations for Bernoulli mixture models in Section 2.1. Then, Section 2.2 introduces a general model framework, the class of double mixture models, which allows further positive dependence between entities, beyond conditional independence (Theorem 4). Section 2.3 is devoted to the third objective: it derives the large-sample distribution of the sum of Bernoulli mixtures under the general model framework and discusses its application to the approximation of risk measures. As a specific case, Section 3 considers a Beta Bernoulli mixture, and Section 4 provides the results for an example with specific parameter values. Section 5 presents the results of fitting the double mixture models to a dataset from the area of credit risk. Finally, Section 6 concludes this paper with a summary. Technical details of some proofs are given in the appendices.

2. Main Results

2.1. Limiting Correlation for Bernoulli Mixtures

We denote by the regular conditional probability, given , that all terms , of a fixed sequence of Bernoulli random variables, , take on the value 1. (For this definition, we allow .) Formally, where . Note that, under an assumption of conditional independence, .

The following hypothesis on the properties of in a deleted neighbourhood of will be part of the assumptions in Theorem 1 and its Corollary 2 below.

Hypothesis H. is absolutely continuous in a neighbourhood of , with probability density function (pdf) thereon, such that (i) ;(ii) is continuous in a deleted neighbourhood of ;(iii) converges to a finite limit as tends to 1. The significance of (i) is clear: it is required for the conditioning on to be well defined. Parts (ii) and (iii) are technical assumptions which play a role in the proof of Theorem 1. Part (iii) of Hypothesis is satisfied in many cases of practical interest, where has a simple asymptotic behaviour near . For example, (iii) is satisfied if for some positive (finite) constant and some (finite) constants and (the relation “ ” denotes asymptotic equivalence, in the sense that the ratio tends to 1, as )

Here is the verification that (6) implies that (iii) holds, in the case that . (The case is similar but easier, so the details will be omitted.) Applying (6) in the first step and L’Hospital’s rule in the second step, as tends to 1, we have so (iii) is satisfied with the value of the limit being . It will be clear that the proof of Theorem 1 is easily generalized to allow the limit in (iii) to be infinite. However, it is not apparent, given the previous example, whether such a situation can actually occur. For this reason, we have taken the limit to be finite in the formulation of (iii).

Based on the conditional joint probability $\pi_2(\theta) = \Pr(X_i = 1, X_j = 1 \mid \Theta = \theta)$, we obtain the following result for the limiting correlation.

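Theorem 1. Let $X_i$ and $X_j$ be two Bernoulli mixture random variables with correlation, $\rho(q)$, as in (2). Suppose that Hypothesis H holds. One further assumes that $\pi_2(1) = 1$, $\pi_2(\theta)$ is differentiable for $\theta$ in a deleted neighbourhood of 1, and $\pi_2'(1) := \lim_{\theta \to 1} \pi_2'(\theta)$ exists. Then, the limiting correlation in (4) exists and satisfies

$$\rho = 2 - \pi_2'(1).$$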

Proof. See the appendices.

In the context of a sequence of Bernoulli mixture random variables, the notation “$\pi_2(\theta)$” suppresses the dependence on $i$ and $j$; more precisely, it should be written as $\pi_2^{(i,j)}(\theta)$. In order to have a result which does not actually depend on $i$ or $j$, an extra hypothesis on the Bernoulli mixtures must be imposed. We will return to this point in Section 2.2.

An interesting example is the case that both terms take on identical values (comonotonicity); that is, $\pi_2(\theta) = \theta$. (Here, comonotonicity refers to the case that all the identically distributed components of a Bernoulli random vector coincide in value with probability one, conditional on $\Theta$.) Under Hypothesis H, Theorem 1 implies that the limiting correlation for this case is 1. The following corollary, another important example, states that the limiting correlation between two arbitrary terms from an identically distributed Bernoulli mixture sequence is zero, under the assumption of conditional independence.

Corollary 2. Suppose that two Bernoulli mixture random variables $X_i$ and $X_j$ are conditionally independent given the common default probability $\Theta$. Then, under Hypothesis H, the limiting correlation in (4) exists and satisfies $\rho = 0$.

Proof. Under the conditional independence assumption, $\pi_2(\theta) = \theta^2$, so that $\pi_2'(1) = 2$. The desired result follows immediately from Theorem 1: $\rho = 2 - \pi_2'(1) = 0$.

Remark 3. Results similar to those of Theorem 1 and Corollary 2 can also be given for the other extreme, conditional on with tending to 0. One simply replaces, in Hypothesis and Theorem 1, the phrase “neighbourhood of ” with “neighbourhood of ” and (iii) with “ converges to a finite limit as tends to 0;” then, in Theorem 1 and Corollary 2, one replaces “ ” with “ .” (Note that is automatically 0 because .) However, in our applications, it is the case “ ” which is important, so we will not dwell further on the case “ .”
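The behaviour described in Theorem 1 and Corollary 2 can be illustrated numerically. The sketch below evaluates the conditional correlation given that the default probability exceeds a threshold `q`, for a Beta mixing density and a user-supplied conditional joint probability `pi2`. The Simpson quadrature, the function names, and the shape values are assumptions made for illustration, and the code assumes shape parameters of at least 1 so that the density is finite at the end points.

```python
def simpson(g, lo, hi, m=2000):
    """Composite Simpson rule on [lo, hi] with m (even) subintervals."""
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def conditional_correlation(q, pi2, a, b):
    """Corr(X_i, X_j | theta > q) under Beta(a, b) mixing, with a >= 1 and b >= 1.

    pi2(theta) is Pr(X_i = 1, X_j = 1 | theta); the Beta normalizing
    constant cancels, so an unnormalized density is sufficient.
    """
    dens = lambda t: t ** (a - 1) * (1 - t) ** (b - 1)
    mass = simpson(dens, q, 1.0)
    m1 = simpson(lambda t: t * dens(t), q, 1.0) / mass        # E[X_i | theta > q]
    m2 = simpson(lambda t: pi2(t) * dens(t), q, 1.0) / mass   # E[X_i X_j | theta > q]
    return (m2 - m1 * m1) / (m1 * (1 - m1))


if __name__ == "__main__":
    a, b = 2.0, 5.0  # hypothetical shape parameters
    for q in (0.0, 0.5, 0.9, 0.99, 0.999):
        indep = conditional_correlation(q, lambda t: t * t, a, b)
        mixed = conditional_correlation(q, lambda t: 0.3 * t + 0.7 * t * t, a, b)
        print(q, indep, mixed)
```

With `pi2(t) = t * t` (conditional independence) the printed correlations shrink towards zero as `q` approaches 1, whereas mixing in a comonotonic component with weight 0.3 leaves a correlation close to 0.3.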

2.2. Double Mixture Models

We now turn to the construction of a more general probability structure beyond conditional independence, a structure which reveals the fundamental role of the quantities $\pi_k(\theta)$, the conditional probabilities, given $\Theta = \theta$, that $k$ specified terms all take on the value 1. We require a stronger assumption on the sequence $X_1, \ldots, X_n$ than their being just conditionally identically distributed, which is a statement about the 1-dimensional marginals. We require that $X_1, \ldots, X_n$ be conditionally exchangeable, meaning that, under each $\Pr_\theta = \Pr(\,\cdot \mid \Theta = \theta)$, all permutations of $(X_1, \ldots, X_n)$ have a common joint distribution which may depend on $\theta$. (Note that a sequence of conditionally iid random variables is conditionally exchangeable, as is a comonotonic sequence.) In particular, conditional exchangeability is sufficient to guarantee that $\pi_2(\theta)$, $0 \le \theta \le 1$, and hence the limiting correlation in Theorem 1, are independent of $i$ and $j$.

We assume, for the remainder of the paper, that our sequence of Bernoulli mixture random variables is conditionally exchangeable. The quantities $\pi_k(\theta)$, $k = 1, \ldots, n$, then do not depend on the choice of subset of $k$ variables. The probability that any $k$ of the $n$ Bernoulli random variables take on the value 1 is given in the following theorem, which shows that the pmf of $S_n$ is essentially determined by the quantities $\pi_k(\theta)$, $k = 1, \ldots, n$, which in turn can be specified quite generally as input to the model.

Theorem 4. Let be a sequence of conditionally exchangeable Bernoulli mixture random variables. Then, for , where and .
Conversely, let be a family of probability measures on the interval , such that and is a Borel measurable function of , for each Borel set, . If is defined to be , then , as defined by the right-hand sides of (11) and (12), is a valid pmf. Moreover, suppose that follows a Bernoulli mixtures distribution with conditional distribution for all which have precisely components equal to 1 and the remaining components equal to 0. Then, is conditionally exchangeable; for all ; hence, has conditional pmf given by (11) and (12). Furthermore, therefore, for all which have precisely components equal to 1 and the remaining components equal to 0, and hence

Proof. See the appendices.

Remark 5. For a general characterization of multivariate Bernoulli random variables, see Sharakhmetov and Ibragimov [19]. With the exchangeability assumption, (12) can be deduced from Theorem 3 of the aforementioned paper.

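Remark 6. Under the assumption of conditional independence, the conditional pmf of $S_n$ given $\Theta = \theta$ reduces to the pmf of a binomial distribution (i.e., $\binom{n}{k}\theta^{k}(1-\theta)^{n-k}$, $k = 0, 1, \ldots, n$).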
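The way a pmf of the sum is recovered from the joint probabilities can be sketched with the standard inclusion-exclusion identity for exchangeable indicators: the probability of one specific outcome with $k$ ones and $n-k$ zeros is $\sum_{j=0}^{n-k} (-1)^j \binom{n-k}{j} \pi_{k+j}$, and there are $\binom{n}{k}$ such outcomes. We assume here that this is the form taken by (11) and (12); the function name and test values are hypothetical.

```python
from math import comb


def pmf_from_joint_probs(pi):
    """Conditional pmf of S_n from pi[k] = Pr(X_1 = ... = X_k = 1), k = 0..n, with pi[0] = 1."""
    n = len(pi) - 1
    pmf = []
    for k in range(n + 1):
        # Probability of one specific outcome with k ones and n - k zeros.
        p_specific = sum((-1) ** j * comb(n - k, j) * pi[k + j] for j in range(n - k + 1))
        # All comb(n, k) such outcomes are equally likely under exchangeability.
        pmf.append(comb(n, k) * p_specific)
    return pmf


if __name__ == "__main__":
    theta, n = 0.3, 6  # hypothetical values
    # Conditional independence: pi_k = theta**k, which should give the binomial pmf.
    print(pmf_from_joint_probs([theta ** k for k in range(n + 1)]))
```

Feeding in `pi[k] = theta ** k` reproduces the binomial pmf of Remark 6, which gives a quick sanity check on any other choice of joint probabilities.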

Remark 7. The converse part of the theorem is connected with the easy part of the classical Hausdorff moment problem and its application to the proof of de Finetti’s theorem for sequences of exchangeable random variables (see, e.g., Chapter VII in Feller [20]).

Definition 8. Let , be as in Theorem 4, and let , for . A Bernoulli mixture model with conditional distribution as specified in (14) (or equivalently, (12)) and (13) is called a Bernoulli double mixture model. It is determined by the following distributions: the family of measures , and the mixing distribution, , and it can be formally denoted as a - Bernoulli double mixture model. Similarly, model (16), for the sum, , of the - Bernoulli double mixture random variables, can be formally referred to as a - binomial double mixture model.
In concrete examples, we will employ some meaningful, worded description or an acronym, rather than through the formal notation of Definition 8. In addition, for the remainder of the paper, all considered models will be Bernoulli (or binomial) double mixture models; therefore, we may omit those words from the names of models, simply retaining the description of the building blocks, and .

The following simple example illustrates the converse of the theorem.

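Example 9 (ICM). We consider a weighted average of independence and comonotonicity. Specifically, for a specified weight $w \in [0, 1]$, the random $n$-vector $(X_1, \ldots, X_n)$ is a two-point mixture of a conditionally comonotonic $n$-vector of Bernoulli mixtures and a conditionally independent $n$-vector of Bernoulli mixtures (with common mixing distribution, $F$), with respective weights $w$ and $1 - w$. For such a model, the (conditional) joint probability, $\pi_k(\theta)$, $k \ge 1$, is expressed as the weighted average of the joint probability of the comonotonic case and that of the independence case:

$$\pi_k(\theta) = w\,\theta + (1 - w)\,\theta^{k}. \tag{17}$$

In practice, the weight parameter $w$ must be estimated from data or be chosen exogenously.
This model (with $F$ unspecified) will be denoted by the acronym ICM, standing for independent comonotonic mixture (with everything being implicitly conditional). It corresponds to the following choice for the probability measure $\mu_\theta$ on $[0, 1]$ whose $k$th moment is $\pi_k(\theta)$, which is a double mixture of point masses:

$$\mu_\theta = w\,[(1 - \theta)\,\delta_0 + \theta\,\delta_1] + (1 - w)\,\delta_\theta, \tag{18}$$

where $\delta_x$ denotes the point mass at $x$. Note that $\pi_0(\theta) = 1$ follows from the usual understanding that $0^0 = 1$.
Then, the limiting correlation is (cf. (4))

$$\rho = w, \tag{19}$$

and the unconditional pmf of the sum, $S_n$, is

$$\Pr(S_n = k) = w\,\big[(1 - E[\Theta])\,\mathbf{1}_{\{k = 0\}} + E[\Theta]\,\mathbf{1}_{\{k = n\}}\big] + (1 - w)\,\Pr{}^{I}(S_n = k), \tag{20}$$

with $\Pr^{I}(S_n = k)$ given in (1). The mean and variance of the sum can be easily obtained as follows:

$$E[S_n] = n\,E[\Theta], \qquad \operatorname{Var}(S_n) = w\,n^2 E[\Theta](1 - E[\Theta]) + (1 - w)\,\big\{ n\,E[\Theta(1 - \Theta)] + n^2 \operatorname{Var}(\Theta) \big\}.$$

Note that both the mean and variance are the weighted averages of those for the independent and comonotone cases. (In general, the $k$th moment is the weighted average of the $k$th moments of the independent and comonotone cases.)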
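The weighted-average structure of the ICM pmf, mean, and variance can be checked with a small sketch. To keep the example free of numerical integration, it uses a hypothetical two-point mixing distribution instead of a continuous one; the weight `w`, the support points, and the function names are illustrative assumptions.

```python
from math import comb


def binom_pmf(k, n, theta):
    """Binomial pmf with n trials and success probability theta."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def icm_pmf(n, w, thetas, probs):
    """Unconditional pmf of S_n under the ICM model with a discrete mixing distribution.

    thetas, probs: support points and probabilities of the mixing distribution.
    """
    mu = sum(t * p for t, p in zip(thetas, probs))  # mean default probability
    indep = [sum(p * binom_pmf(k, n, t) for t, p in zip(thetas, probs)) for k in range(n + 1)]
    pmf = [(1 - w) * x for x in indep]   # conditionally independent component
    pmf[0] += w * (1 - mu)               # comonotonic component: no defaults
    pmf[n] += w * mu                     # comonotonic component: all default
    return pmf


if __name__ == "__main__":
    n, w = 20, 0.1
    thetas, probs = [0.02, 0.30], [0.9, 0.1]  # hypothetical mixing distribution
    pmf = icm_pmf(n, w, thetas, probs)
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2
    print(mean, var)
```

Because the two components share the same mean, the variance of the mixture is simply the weighted average of the component variances, which is what the check confirms.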

2.3. Large-Sample Distribution of Empirical Frequency

The proportion of observed defaults to the total number of entities in a credit portfolio is of interest in practice. For example, the historical default rate for a homogeneous credit portfolio is used to estimate the probability of default for a generic counterparty in the portfolio.

For a fixed , let denote the empirical frequency, which can be considered as the percentage gross loss of a portfolio of loans in equal dollar amounts.

The probability distribution of is

For the general binomial double mixture model, based on a family of probability measures, , and mixing distribution, , we have the following result:

Here is the proof of (22). The case is trivial. For , first note that by (14) where, for a nonnegative number , represents the greatest integer less than or equal to . The integrand is the cdf, at , of where ; so, by the LLN, as , , a.s. Therefore, for , where denotes the indicator function of a set, . (The second term, with the factor “1/2,” comes from an application of the CLT to , when .) For , Integrating these two limiting results with respect to yields the first two cases of (22). The interchange of limit and integrals is justified by the bounded convergence theorem.

Example 10 (ICM reprise). For the -weighted average of conditionally comonotonic and conditionally independent Bernoulli mixtures described in Example 9, we can easily obtain the following limiting distribution (this is the weighted average of the obvious limit for the conditionally comonotonic component and the convergence result in Vasicek [21] for the conditionally independent component of the mixture): Then, the mean and variance of the limiting distribution are In general, the th moment of the limiting distribution is
The limiting result (27) can be applied to the area of financial credit risk management. The tail risk measures of a large-sample portfolio credit loss distribution are of particular interest for financial risk managers. The two well-known risk measures, value at risk (VaR) and conditional tail expectation (CTE), can be approximated by using the limiting distribution, (27). Formally, for a loss random variable with the cumulative distribution function , VaR and CTE at confidence level are defined as follows:

Note that this definition reduces to

if is a continuous random variable (see Hardy [22] or Acerbi and Tasche [23]).

Then, by the scaling property of these risk measures and the limiting result (27), for a large , the VaR of the sum of Bernoulli mixtures at level can be approximated as follows:

where .

Similarly, the CTE at level can be approximated as

3. Beta Mixing Distribution

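We now specialize the results of the preceding section to the case where the mixing distribution, $F$, is the Beta distribution with parameters $\alpha > 0$ and $\beta > 0$. Specifically, the density is given by

$$f(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1} = \frac{\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 < \theta < 1, \tag{34}$$

where $\Gamma(\cdot)$ and $B(\cdot, \cdot)$ are the Gamma and Beta functions, respectively.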

For a Beta Bernoulli mixture, several computationally convenient expressions are available. We first introduce a notation which is convenient for the cumulative distribution function: where is the incomplete Beta function. Equation (1), the pmf of the sum under the conditional independence assumption, can be evaluated as The th moments for and in this case are

The following lists several results for the ICM model with the Beta mixing distribution ( -ICM model).

Example 11 ( -ICM). The pmf of the binomial double mixture is given from (20) as The limiting distribution of the empirical frequency, (27), is Since the inverse cumulative Beta distribution function does not admit a closed-form expression, a numerical method is required to approximate the VaR of a large-sample credit portfolio. Given the VaR at level , the CTE can be written in a concise form. Specifically, where , and where .
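The numerical step mentioned in Example 11 can be sketched as follows. The limiting cdf of the empirical frequency under the ICM model with Beta mixing is taken to be $w(1-\mu) + (1-w)F(x)$ for $0 \le x < 1$, with $\mu$ the Beta mean, which follows from the weighted-average description in Example 10. The quadrature, the bisection, the function names, and the restriction to shape parameters larger than 1 are assumptions of this sketch. The CTE is computed as the mean loss beyond the VaR and only in the regime where the limiting cdf is continuous at the VaR; the treatment of other cases depends on the CTE convention and is left out.

```python
from math import exp, lgamma, log


def beta_pdf(t, a, b):
    """Beta(a, b) density; returns 0 at the end points, which is exact for a > 1, b > 1."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return exp((a - 1) * log(t) + (b - 1) * log(1 - t) + lgamma(a + b) - lgamma(a) - lgamma(b))


def integrate(g, lo, hi, m=2000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def beta_cdf(x, a, b):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return integrate(lambda t: beta_pdf(t, a, b), 0.0, x)


def icm_limit_var(alpha, w, a, b):
    """VaR at level alpha of the limiting empirical frequency (a fraction of the portfolio)."""
    mu = a / (a + b)
    if alpha <= w * (1 - mu):      # level absorbed by the point mass at 0
        return 0.0
    if alpha > 1 - w * mu:         # level reached only at the point mass at 1
        return 1.0
    target = (alpha - w * (1 - mu)) / (1 - w)  # adjusted level for the Beta quantile
    lo, hi = 0.0, 1.0
    for _ in range(50):            # bisection on the Beta cdf
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return hi


def icm_limit_cte(alpha, w, a, b):
    """Mean loss beyond VaR, valid when w*(1-mu) < alpha <= 1 - w*mu and alpha < 1."""
    mu = a / (a + b)
    if not (w * (1 - mu) < alpha <= 1 - w * mu and alpha < 1):
        raise ValueError("alpha outside the regime handled by this sketch")
    v = icm_limit_var(alpha, w, a, b)
    tail = integrate(lambda t: t * beta_pdf(t, a, b), v, 1.0)
    return (w * mu + (1 - w) * tail) / (1 - alpha)
```

For a portfolio of $n$ entities, the corresponding approximations for the sum are obtained by multiplying these fractions by $n$, in line with the scaling property noted in Section 2.3.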

As can be seen from (27), the limiting distribution of the empirical frequency under the -ICM model has point masses at both end points, 0 and 1 (when ). This may restrict the use of a certain parameter estimation method in the case that all observations are strictly inside the distributional range, which is often the case. In the following example, we consider a Beta distribution for the measures , , as well as a Beta mixing distribution for . We call the model a double Beta model with acronym - . The limiting empirical frequency for the - model does not have point masses at the boundary points, 0 and 1.

Example 12 ( - ). In this example, the choice for is the Beta distribution with the two shape parameters and : where parameter satisfies . Then, using the recursive property of the Beta function ( ), with the usual convention that a product over an empty set of indices equals 1. In particular, and thus the limiting correlation for this model is, by Theorem 1, (Note that this is identical to the result for an ICM model, with ; cf. (19) in Example 9.)

The pmf of the binomial double mixture is given, by Theorem 4, as

By (22), the limiting distribution of the empirical frequency is absolutely continuous, and its pdf is

Since (47) does not admit a closed-form expression, a numerical method is required to calculate the approximate VaR and CTE of a large-sample credit portfolio.

4. Numerical Examples

As an illustrative example, we set and for the Beta mixing distribution for both the -ICM and - models. (These prescribed parameter values are chosen from the authors’ previous experience in portfolio credit risk modeling.) This gives and .

4.1. -ICM Model

In Figure 1, we plot the cumulative distribution functions of the -ICM binomial double mixture, , for several different values of (the limiting correlation). The plot shows that both the left and right tails get thicker as the level of limiting correlation, , increases to 1.

In Figures 2 and 3, we display the approximate risk measures, VaR and CTE, for a scale of confidence levels and by the level of the limiting correlation, . The results show that both and are increasing in the level of limiting correlation, for . This is not the case for when the confidence level is less than , because is decreasing in when .

More specifically, Table 1 compares the approximated VaR and CTE at the level .

4.2. - Model

Recall that the limiting correlation of the - model is . For the purpose of comparison with the -ICM model, we reparametrize the - mixture model as follows: , such that the limiting correlation becomes .

Figure 4 displays the cumulative distribution functions of the - binomial double mixture, , for several different values of (the limiting correlation). The plot shows that has a heavier left tail under the - model than under the -ICM model, for each limiting correlation level. Note that the - model does not admit the two extreme cases: and .

Figures 5 and 6 show the approximate risk measures, VaR and CTE, for a scale of confidence levels and by the level of the limiting correlation, . The approximate risk measures are smaller than those of the -ICM model for each confidence level and limiting correlation; compare, for example, Figures 2 and 3. This indicates that the right tail of the - model is thinner than that of the -ICM model for the same value of .

5. Models Fit to Real Data

In addition to the previous examples with prescribed parameter values, we illustrate the two binomial double mixture models with real data: the Bloomberg mortgage delinquency rate index. U.S. residential mortgage loans are segmented into three buckets, Prime, Alt-A, and Subprime, based upon the loan type. Several delinquency rates for each bucket are calculated as the percentage of loans that are reported to be delinquent beyond a certain number of days (30, 60, or 90 days) or are classified as real estate owned (REO) or foreclosure. In this section, we only use the Alt-A 60+ days (60+90+REO+foreclosure) delinquency rate (Ticker: BBMDA60P) for the purpose of illustration. The same approach can be taken for other delinquency indexes such as the Prime (Tickers: BBMDP60P, BBMDPDLQ) or Subprime (Tickers: BBMDS60P, BBMDSDLQ) segments. Figure 7 is taken from the Bloomberg interactive chart, and it plots the monthly history of the delinquency rate from May 2007 to March 2012.

5.1. -ICM Model

Note that none of the observations has the value 0 or 1. In this case, the maximum likelihood estimate of the weight parameter is 0 because, for , , where is the density of the Beta distribution, (34). To avoid this issue, we estimate the three parameters in the -ICM model by means of the method of moments (MMEs) which uses the first three moments, (29) and (37), of the limiting distribution of the empirical frequency.
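One way to carry out the three-moment matching is sketched below. It relies on the $k$th moment of the limiting distribution being $w\mu + (1-w)E[\Theta^k]$, as in Example 10, together with the standard Beta moments. The closed-form inversion is our own algebra rather than a procedure stated in the paper, the function names are hypothetical, and the output should be checked for validity (positive shapes, weight in $[0,1]$), since sample moments need not be consistent with the model.

```python
def icm_moments(a, b, w):
    """First three moments of the limiting empirical frequency, Beta(a, b) mixing, weight w."""
    s = a + b
    mu = a / s
    e2 = mu * (a + 1) / (s + 1)   # E[theta^2] for the Beta distribution
    e3 = e2 * (a + 2) / (s + 2)   # E[theta^3] for the Beta distribution
    return mu, w * mu + (1 - w) * e2, w * mu + (1 - w) * e3


def icm_method_of_moments(m1, m2, m3):
    """Solve the three moment equations for (a, b, w).

    The ratio r = (m1 - m3) / (m1 - m2) does not involve w and equals
    (s * (1 + m1) + 3) / (s + 2) with s = a + b, which yields s directly.
    """
    r = (m1 - m3) / (m1 - m2)
    s = (3 - 2 * r) / (r - 1 - m1)
    a, b = m1 * s, (1 - m1) * s
    w = 1 - (m1 - m2) * (s + 1) / (m1 * b)
    return a, b, w


if __name__ == "__main__":
    # Round trip with hypothetical parameter values.
    print(icm_method_of_moments(*icm_moments(2.0, 8.0, 0.1)))
```

In practice the three arguments would be the first three sample moments of the observed delinquency rates.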

For comparison, we also estimate the two shape parameters of the Beta distribution—the limiting distribution of the classical Beta-binomial ( -I) model—by matching the first and second sample moments with the theoretical ones, using (28) and (37). Table 2 gives the parameter estimates.

The table shows that the MMEs for the two shape parameters of the Beta mixing distribution of the general model ( ) are larger than those of the reduced model ( ). As a result, the right tail of the fitted Beta distribution in the general model is slightly thinner than that of the fitted Beta distribution in the reduced model. This is explained as follows: in the general model, the tail thickness in the data is explained by the estimates of both the shape parameters and . On the other hand, the reduced model is less flexible, and the two shape parameter estimates attempt to capture the observed tail.

In order to identify the implications of the two fitted models (with and without ), for the risk measures, we plot approximate VaRs and CTEs of with at various confidence levels. (The chosen number of entities is purely nominal since we are using the large-sample results. According to the National Delinquency Survey issued by Mortgage Bankers Association, the average number of conventional subprime mortgage loans during 2011 is over four million.)

Figure 8 shows that the risk measures calculated under the conditional independence ( -I) model are larger than those of the more general model, the -ICM model (17), at confidence levels up to 0.996 and 0.945 for VaR and CTE, respectively. Note that the estimated is small to the extent that the effect is not well recognized in VaR until the confidence level takes an extremely high value. The conditional tail expectation, however, of the more general model becomes larger than that under the conditional independence assumption at confidence levels which are typically used in the calculation of economic capital for a credit portfolio. The nonzero allows further dependence between entities, and thus this result suggests that the use of the conditional independence model may result in the underestimation of extreme tail risk.

5.2. - Model

Here, we illustrate the - model of Example 12. Note that the large-sample distributions (of the empirical frequency) are continuous, for both the - and -I models, and none of the observations are either 0 or 1. Thus, the maximum likelihood estimation (MLE) method can be applied by using the limiting distribution of the empirical frequency, (47). However, the number of observations, 57 data points, may not be sufficient to yield statistically reliable MLE fitting results. For the sake of statistical precision and model comparison, we reduce the number of parameters in both the - model and the -I model, which corresponds to the case of conditional independence. Specifically, we use the following theoretical relationship between the first two moments of the Beta mixing distribution and the two shape parameters : where and denote the mean and variance of the Beta mixing distribution, which we estimate with the sample mean and variance, and , respectively, of the delinquency rate, . Thus, we use the following form for in the Beta density: to obtain a single-parameter family of Beta distributions.
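The relationship between the Beta shape parameters and the first two moments is standard and can be sketched as follows. The second helper shows one way to obtain a single-parameter family by pinning the mean at its sample value; whether this is exactly the parametrization used for Table 3 is an assumption of the sketch, as are the function names.

```python
def beta_shapes_from_moments(mean, var):
    """Shape parameters (a, b) of the Beta distribution with the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0   # equals a + b
    return mean * total, (1.0 - mean) * total


def second_shape_given_mean(a, mean):
    """Second shape parameter b such that Beta(a, b) has the prescribed mean."""
    return a * (1.0 - mean) / mean


if __name__ == "__main__":
    # Hypothetical moments: Beta(2, 8) has mean 0.2 and variance 16/1100.
    print(beta_shapes_from_moments(0.2, 16.0 / 1100.0))
    print(second_shape_given_mean(2.0, 0.2))
```

With the mean fixed, only the first shape parameter remains free, which reduces the likelihood maximization to a low-dimensional search.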

Table 3 gives the maximum likelihood estimates for the parameter of the Beta mixing distribution, the limiting correlation parameter , and the values of the log-likelihood functions under the - model and the -I model, respectively.

Figure 9 displays the differences in estimates of each of the two risk measures, VaR and CTE, under the - model and the -I model. For illustration, the approximate risk measures are calculated assuming that there are one million entities in the mortgage portfolio.

The result shows that both the VaRs and the CTEs under the - model are larger than those under the -I model, especially at high confidence levels, and the differences increase rapidly as the confidence level tends to 1.

This result demonstrates the role of the limiting correlation parameter in explaining the tail risk resulting from an interdependency among names within a credit portfolio.

6. Conclusion

In this paper, we study the conditional correlation and the distribution of the sum of Bernoulli mixtures under a general dependence structure. We show, in particular, under the typical conditional independence assumption, that the conditional correlation between two Bernoulli mixtures converges to 0, given that the mixing random variable is larger than a threshold tending to 1. We propose a method to construct a general dependence structure in the form of a double mixture model, in which the conditional iid assumption is replaced by the more general assumption of conditional exchangeability. As a simple illustration, we consider a weighted average of two cases: conditional independence and comonotonicity, for which the limiting correlation is included as a model parameter.

The large-sample distribution of the empirical frequency and its use in approximating the risk measures VaR and CTE are presented. Several tractable results for two Bernoulli double mixture models with Beta mixing distribution, the -ICM model and the - model, are given as illustrative numerical examples and also applied to real data.

Note that there is a strong demand for a credit risk model with an appropriate level of dependence in a stressed economic environment. The most popular model, the Beta-binomial mixture, however, cannot properly explain the empirically observed default correlation under stress. On the other hand, the model framework presented in this paper is simple but flexible enough to accommodate the required level of limiting correlation and thus can be effectively applied to portfolio credit risk models.

Future directions of research include the application of double mixture models to pricing of credit derivatives such as CDOs, on a completely homogeneous pool (e.g., Bae et al. [24]).

Appendices

A. Proof of Theorem 1

Let be the survival probability function. Then, from (2), Thus, the conditional correlation, given , , between and , is Denote and . By assumption, . Note that, by L’Hospital’s rule, Finally, to identify the limit of as , we apply L’Hospital’s rule two more times:

Remark. If it were possible that , then the previous proof is easily modified to cover that case. In the third-to-last line of the final derivation, one simply divides the numerator and denominator by before taking the limit, which then becomes

B. Proof of Theorem 4

As remarked after the statement of the theorem, the proof is based on parts of the proof of de Finetti’s theorem, as given in Feller [20], but for the case of a finite sequence. As it is well known that de Finetti’s theorem does not hold in general for finite sequences (see, e.g., Example 4 at the end of Section 4 of Feller [20]), we will take care to elucidate exactly the ingredients of the proof which are valid in our case.

To begin, we recall some identities for numerical sequences of finite length. For any numerical sequence , denote the difference operator by : Then, the th-order difference operator, denoted by , is defined recursively by It is easily shown by induction, using the elementary identity, , that for and , (This is equation (1.7) in Chapter VII of Feller [20].) Next, we cite the identity (1.9) from Chapter VII of Feller [20]:

Finally, if for some probability measure , , for all , then it is easily established by induction that (This is equation (3.3) in Chapter VII of Feller [20].) Putting (B.3) and (B.5) together yields

We now proceed with the proof of Theorem 4, starting with the verification of (11) and (12). Temporarily overriding the definition (12), denote Due to the exchangeability assumption, the conditional joint probability that the random vector takes on any permutation of the vector is . Therefore, and, thus, for (11) and (12), we must show that As shown in (4.6) in Feller [20] (identifying our and with his and , resp.), where, for the second equality, we have used the result (B.3) with .

For the converse part of the theorem, we first show that , regardless of how the sequence is chosen, as long as . By (B.3), we recognize that Therefore, by the identity (B.4), with ,

Therefore, to show that is a valid pmf, we must show that each term or equivalently is nonnegative. To achieve that, we simply apply (B.6), with , to , in case , , as in the hypothesis of the converse part of the theorem. As a byproduct, we obtain the conditional binomial representation

Finally, with the conditional pmf as described at (13), is clearly exchangeable. The proof of the converse will be complete as soon as it is shown that , for all . For , it follows by definition that . For fixed , denote that is, each in has precisely components equal to 1 and components equal to 0. Then, where the second line follows from the exchangeability of ; the second-to-last equality follows from the binomial expansion of .

Disclaimer

The views expressed in this paper are those of the authors and might not represent the views of the authors’ affiliated institutions.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors are grateful to Bloomberg for granting permission to use the delinquency data in the examples of Section 5. Taehan Bae is supported by the Discovery Grant Program of the Natural Sciences and Engineering Research Council of Canada.