The significance of claims fraud in microinsurance and a statistical method to channel limited fraud identification resources

In the past decade, the topic of microinsurance has received much attention from researchers around the world as the drive to alleviate persistent global poverty intensifies. Although microinsurance is a powerful tool that can assist in the fight against poverty by acting as a safety net for policyholders, the problem of claims fraud is a serious threat to its long-term sustainability. Analysis of the existing literature reveals a severe shortage of research into the problem of microinsurance claims fraud, even though we have found that it poses a greater threat in microinsurance than in regular insurance. In this paper we highlight the problem of claims fraud in low-income markets and explain how fraud has the potential to make microinsurance initiatives unsustainable. After establishing that action is needed to combat fraud in microinsurance, we briefly present a number of fraud mitigation techniques that have been successful in conventional insurance. However, certain characteristics that differentiate microinsurance from regular insurance mean that most of these fraud-combating approaches are not appropriate to microinsurance; the proportionately higher costs of identifying claims fraud relative to policy size, the lack of data and the lack of resources experienced by microinsurers render these methods impractical and unaffordable in the context of microinsurance. We proceed to demonstrate the workings of a statistical method known as Principal Component Analysis of Ridit Scores (the Pridit method), initially developed by Brockett et al. (2002), which has been shown to identify fraudulent claims effectively without the need for a training sample. The method can thus easily be applied by microinsurers to assist in the detection of claims fraud. While this method of fraud detection is not without limitations, it may provide a pragmatic and cost-effective way for microinsurers to begin tackling claims fraud.
In this paper, the method is clearly explained by means of a worked example to help microinsurers implement the method at low cost.


1.1
Against the backdrop of a difficult political past and the failure of the current government to successfully combat poverty, an overwhelming percentage of the South African population is poor. As of August 2009, approximately 52% of the South African population was living below the upper-bound poverty line, which is defined as R577 per month in March 2005 figures (Lehohla, 2012), approximately the equivalent of R942 in March 2014 terms. These vulnerable members of society struggle daily to make ends meet, let alone to put measures in place to cope with the risks that they face (Karla, 2010). Unexpected events such as the sudden death of a family breadwinner or damage to property can push these members of society further into poverty (ibid.).
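The conversion of the March 2005 poverty line into March 2014 terms is a straightforward CPI re-basing. A minimal sketch follows; the index values are assumptions chosen to reproduce the roughly 63% cumulative inflation implied by the figures in the text (actual Stats SA CPI data would be used in practice):

```python
# Hypothetical CPI index values (assumed for illustration only).
CPI_MARCH_2005 = 100.0
CPI_MARCH_2014 = 163.3  # implies the ~1.63x uplift the text reports

def adjust_for_inflation(amount, cpi_from, cpi_to):
    """Re-express a monetary amount in the price level of another date."""
    return amount * cpi_to / cpi_from

poverty_line_2005 = 577.0  # upper-bound poverty line, rand per month
poverty_line_2014 = adjust_for_inflation(
    poverty_line_2005, CPI_MARCH_2005, CPI_MARCH_2014)
print(round(poverty_line_2014))  # ≈ 942
```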

1.2
Microinsurance products have the ability to provide protection to low-income households against such unexpected events, allowing them to focus their limited resources on escaping from poverty (Cohen, McCord & Sebstad, 2005). Churchill (2007) describes microinsurance as the term used to define insurance arrangements aimed at protecting low-income members of society against specific perils in exchange for premium payments, which are proportionate in size to the likelihood and cost of the risk involved. Microinsurance is based largely on the same principles as regular insurance, but the needs of the target market are significantly different (Biener & Eling, 2012). 1 Typical microinsurance products include property insurance, life insurance and funeral cover.

1.3
From the definition above, it is clear that microinsurance is extremely relevant to South Africa, as more than half of the South African population is classified as low income. However, it is suggested that it is often the low-income segment of society that has the least access to insurance (Karla, 2010).

1 We recognise that there are variations of the exact definition of microinsurance. For example, some definitions specifically restrict the size of the policy to qualify as microinsurance. For the purposes of this research, the exact definition is not important, but we envisage a class of business where policyholders are highly price elastic and struggle to afford even the smallest of premiums. Thus, expenses need to be kept low by the insurer, reducing the resources to fight fraud.

1.4
For microinsurance to be successful in combating poverty, it must be both economically viable to insurers and affordable to the low-income market (Karla, 2010). Minimising the costs associated with offering microinsurance would contribute to achieving these conflicting objectives (ibid.).

1.5
Fraud committed by policyholders increases the costs of providing insurance. Cohen, McCord & Sebstad (2005) thus suggest that the sustainability of microinsurance is dependent (at least in part) on control systems that limit fraud and other costs.

1.6
The International Association of Insurance Supervisors defines insurance fraud as an act or omission intended to gain a dishonest advantage for the fraudster or other related parties (Yusuf & Babalola, 2009) and defines, in the context of insurance, five categories of fraud.

1.7
The focus of this research was on policyholder claims fraud, specifically in the field of microinsurance, defined as: "fraud [committed] against the insurer [by a policyholder] by obtaining wrongful coverage or payment [at claim stage]" (Yusuf & Babalola, 2009: 419). In subsequent sections of this paper, we use the word 'fraud' to refer exclusively to policyholder claims fraud. Policyholder claims fraud can manifest itself in a variety of ways, ranging from the complete fabrication of losses to the exaggeration of loss amounts (Tennyson, 2008). In addition, fraud can result from premeditation or opportunism on the part of the policyholder. Premeditation refers to a situation in which a person seeks insurance with the primary intention of committing fraud, whereas in the case of opportunism, fraud is not the primary purpose of seeking the insurance, but is committed as the opportunity presents itself.

1.8
This research had four aims:
- to determine whether the problem of fraud is more prevalent in microinsurance than in regular insurance;
- to investigate some of the approaches used by regular insurers to detect and deter fraud and to determine whether these approaches could be used in microinsurance;
- if these approaches were deemed unsuitable in a microinsurance context, to identify an alternative statistical approach that would assist in the detection of fraud; and
- to demonstrate the workings of that statistical approach, if found.

1.9
Sections 2 and 3 deal with each of the first two aims in turn. In Section 4 we explain in more detail a statistical method for fraud classification known as Principal Component Analysis of Ridit Scores (Pridit), developed by Brockett et al. (2002), which we identified in the literature and which appears to be a suitable statistical method for detecting fraud in microinsurance. In this section we lay out the steps of the methodology. 2 The method has been used in other contexts and built on by Ai, Brockett & Golden (2009) and Ai, Brockett, Golden & Guillén (2013). The value we add is to highlight the appropriateness of the method in the context of microinsurance. Sections 5 and 6 deal with practical considerations and the accuracy of the Pridit method respectively, and are followed by a discussion and conclusion in Section 7. Finally, in Section 8, we leave the reader with ideas for further research into this topic. In the appendices to the paper, we present an example of an application of the Pridit method to claims data from two microinsurance products sold in the South African market to illustrate how the method may be applied in practice.

THE PROBLEM FRAUD POSES IN MICROINSURANCE
In this section we discuss the business model followed by microinsurers, and then explain how fraud interferes with this model. We consider why fraud is such a significant problem in microinsurance and why it is also particularly challenging to combat. We give an example of the consequences of fraud in the agricultural microinsurance industry, before discussing an approach to combating fraud in microinsurance.

The Microinsurance Business Model
Insurers require a reasonable prospect of profit to consider the low-income market segment, just as they do for any initiative. Scale and cost minimisation are key to achieving this objective.

Scale
2.1.1.1 Churchill (2007) explains that microinsurance unit profits may be small in an attempt to make products affordable to low-income target markets, but when multiplied across a large number of policies, the overall profit figure may be attractive.
2.1.1.2 He further notes that large policy numbers are also desirable for pooling, resulting in increased stability and predictability of future claims experience. The increased stability has the potential to result in lower premiums due to the lower capital requirements.
2.1.1.3 A further reduction in premiums may be possible as the fixed expenses of the insurer are spread across more policies.
2.1.1.4 The increased affordability should in theory start a cycle of increased business volumes, more stable results and even more affordable premiums. A visual representation of this cycle is provided in Figure 1 below.

Keeping Costs Low
2.1.2.1 Scale is desirable, provided individual policies are profitable. Furthermore, fewer policies would be required for the business to be economically viable if the per-policy profit were maximised by increasing premiums and/or reducing expenses.
2.1.2.2 However, transaction costs in microinsurance are a major obstacle facing insurance providers due to many individuals in the target market lacking insurance awareness and not having bank accounts (Churchill, 2007).

2.2
How Fraud Interferes with the Business Model
2.2.1 The first and most obvious way that fraud interferes with the microinsurance business model (refer to Figure 1) is that it increases the average cost of providing insurance because more claims are paid.
2.2.2 Furthermore, claims fraud introduces uncertainty that reduces the stability and predictability of the claims experience because it distorts the random statistical process underlying the claims experience, making it more difficult to model claims.

2.2.3
Modelling techniques could potentially be used to continue to price accurately in the presence of fraud if fraud were constant over time. However, fraud levels tend to fluctuate with the state of the economy (Churchill, 2007), making modelling more challenging.
2.2.4 The increased premiums that result from larger or more frequent claim payouts, together with increased risk margins due to the uncertain level of fraud, lead to a reversal of the cycle shown in Figure 1.

2.3
An Example of the Consequences of Fraud in Microinsurance
2.3.1 Approximately half of the world's poor rely on agricultural activities as their primary source of income (Barnett & Mahul, 2007). For this reason, poor households are extremely susceptible to the financial consequences of weather-related events.
2.3.2 The authors explain that crop insurance allows poor households that depend on agricultural activities for their livelihoods to transfer the risks of weather-related events to insurers.
2.3.3 They are, however, quick to highlight that crop insurance markets in rural areas of many low- and middle-income countries are underdeveloped. One of the many reasons given for this underdevelopment is the problem of fraudulent behaviour, which they believe is prevalent in these markets. The availability of insurance is reduced as fraudulent behaviour increases the cost of offering insurance and jeopardises the solvency of insurers (Hoyt, Mustard & Powell, 2006).

FIGURE 1. Cycle of increasing volumes and decreasing premiums

2.3.4 The consequence of fraud is clear. Not only does it make business less profitable for insurers, but it also stunts the development of microinsurance, thus reducing the ability of low-income earners to transfer risk. Minimising claims fraud would thus help to overcome the challenges involved in providing insurance to the poor. Similar consequences hold irrespective of the type of microinsurance, for example agricultural, property, life or funeral. The next question considered is why claims fraud is common in microinsurance.

2.4
Drivers of Fraud in Microinsurance
2.4.1 While it is difficult to compare the incidence and cost of fraud in microinsurance against traditional insurance, primarily due to the costs associated with identifying and pursuing fraudulent claims, four arguments may be proposed for a higher incidence of fraud in microinsurance:
- policyholders are more accepting of fraud when premiums are financially onerous (Tennyson, 1997); this applies in microinsurance as low-income individuals view even the smallest of insurance premiums as unaffordable;
- a negative attitude towards insurers results in more widespread acceptance of fraud (Tennyson, 1997);
- a lack of education about the value of insurance (for example, its value during claim-free periods) increases fraud incidence (Churchill, 2007); and
- compulsory or forced cover, by law or contract, encourages policyholders to seek their money's worth, even at the cost of fraudulent activity (Tennyson, 1997). This is particularly the case when policyholders would not have purchased the insurance of their own accord.
2.4.2 Roth (2001) suggests that funeral insurance is extremely common in South Africa, especially amongst low-income earners. The Association for Savings and Investment South Africa (ASISA) states that the total value of known fraudulent funeral insurance claims amounted to an estimated R131,7 million for 2011 alone. 3 This is approximately 3.2% of the written premium for the South African funeral insurance industry in 2011. ASISA suggests that the known cases of fraud are just the tip of the iceberg and that the total cost of fraud could be as high as 12% of the written premium.
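The relationship between the quoted figures can be checked directly. The short sketch below back-calculates the implied industry premium; that implied figure is our own arithmetic, not a number published by ASISA:

```python
known_fraud = 131.7e6   # known fraudulent funeral claims, 2011 (rand)
known_share = 0.032     # stated share of written premium (~3.2%)

# Implied written premium for the funeral insurance industry:
implied_premium = known_fraud / known_share
# If total fraud were as high as 12% of written premium:
estimated_total_fraud = 0.12 * implied_premium

print(round(implied_premium / 1e9, 2))     # ≈ 4.12 (R billion)
print(round(estimated_total_fraud / 1e6))  # ≈ 494 (R million)
```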

2.5
Challenges of Combating Fraud in Microinsurance
2.5.1 Based on the abovementioned drivers, fraud may well be more common in microinsurance than in traditional insurance. Logically, this would call for thorough fraud-combating approaches. However, combating fraud in microinsurance is particularly difficult for the following reasons:
- it is relatively costly to verify claims, which is not conducive to affordable premiums for the low-income market, and minimal claims verification systems and processes produce an environment conducive to claims fraud (Yusuf & Babalola, 2009);
- repudiating claims may further reduce the trust of a fragile market, triggering a 'fraud spiral' as negative perceptions lead to more fraudulent claims; lower trust is also likely to reduce scale;
- insurers are inclined to pay claims quickly to increase the perceived value of insurance, reducing the time available to investigate claims; and
- justice systems to combat fraud are frequently inadequate: less than 3% of the fraudulent claims reported to authorities in South Africa secure convictions, and fraud prevalence levels remain unchanged. 4
2.5.2 The insurance fraud spiral illustrates how challenging it is to combat fraud in microinsurance. Allowing the problem of fraud to fester can interfere with the business model, potentially rendering microinsurance unviable to insurers.

2.5.3
The question should therefore be not whether fraud should be combatted, but how this should be done in a way that overcomes the abovementioned problems of the high cost of claims assessment, policyholders' acceptance of fraudulent behaviour and low trust in the insurance industry.
2.5.4 We have chosen to focus this paper on the task of identifying fraudulent claims because we believe identification to be the logical starting point in addressing the problem.

2.6
In the next section we present an overview of techniques that have been used to reduce fraud in conventional insurance. As we will see, many of these techniques are either not possible or practical to implement in microinsurance. In Section 4 we then summarise a statistical method developed by Brockett et al. (2002) for fraud identification known as Pridit that overcomes many of the problems involved in applying traditional fraud identification methods in microinsurance.

METHODS IN USE TO DETECT AND DETER CLAIMS FRAUD
Existing literature on the topic of insurance fraud reveals that past approaches used to address the problem of claims fraud can roughly be divided into two categories. The first category, referred to as ex-ante approaches, involves preventing fraud from happening, while the second category, referred to as ex-post approaches, involves the detection of fraud once committed. In the discussion that follows, we consider the ex-ante approaches of contract design and consumer education. We follow that with consideration of ex-post approaches: claims verification, data mining and statistical methods.

3.1
Contract Design Features

Limits on Claims to Prevent Over-claiming
3.1.1.1 The purpose of indemnity-type insurance contracts is to restore policyholders to the position that they would have been in had the insured event never occurred.
3.1.1.2 Yusuf & Babalola (2009) suggest that the introduction of limits on indemnity-type insurance contracts could deter fraud both in the form of exaggerated loss amounts and in the form of staged claim events. In the first case, any legitimate claim event that has resulted in a financial loss at or slightly below the limit would not present an opportunity for the insured to inflate the actual amount of the loss. In the second case, it is hypothesised by the authors that in the presence of limits on indemnity pay-outs, the insured would be afraid to stage a claim event for fear that the resulting loss would be greater than the limit.
3.1.1.3 The introduction of a claim excess would achieve a similar effect to decreasing limits, ensuring that, no matter the size of the claim, the policyholder feels at least some pain.
3.1.1.4 The danger with this contract design feature is the possibility that microinsurance products provide inadequate coverage and are less successful in meeting the needs of the target market, pushing down the level of trust in insurers (Karla, 2010).

Bonus-Malus Contracts
3.1.2.1 Moreno, Vázquez & Watt (2006) suggest that if an insured's future premiums are increased each time they make a claim then they would be less willing to file fraudulent claims. This type of insurance contract is referred to as a bonus-malus contract. This contract design feature is only relevant to insurance contracts where multiple claims are possible. As a large portion of microinsurance products are for single claim events, for example, funeral and micro-life products, this contract design feature is unlikely to be as effective in combating fraud in microinsurance as it is in regular insurance (Biener & Eling, 2012).
3.1.2.2 In addition, increasing future insurance premiums when dealing with the low-income segment of the market is unattractive to customers and ethically inappropriate, considering the affordability constraints in the market. An alternative approach would be to grant a premium discount to a policyholder if no claims are made. However, this would mean that the initial premium, before discounts, would be less affordable.
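The bonus-malus mechanics described above can be sketched as follows; the loading and discount factors are purely illustrative assumptions, not taken from any cited contract:

```python
def bonus_malus_premium(base_premium, claims_history,
                        malus_factor=1.25, bonus_factor=0.95):
    """Illustrative bonus-malus adjustment: each period with a claim loads
    the premium, each claim-free period earns a small no-claims discount.
    The factors are assumed values for illustration only."""
    premium = base_premium
    for claimed in claims_history:  # one entry per past period
        premium *= malus_factor if claimed else bonus_factor
    return premium

# A policyholder with one claim over three periods:
print(round(bonus_malus_premium(100.0, [False, True, False]), 2))  # 112.81
```

The same sketch shows why the design is unattractive in microinsurance: any path to an affordable discounted premium requires a less affordable starting premium.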
3.1.2.3 Cashback rewards programmes have traditionally been used by insurers to promote good behaviour and create incentives for policyholders not to claim (Yusuf & Babalola, 2009). Such an arrangement provides the policyholder with a reimbursement after a specified period of time over which the policyholder has not submitted a claim (ibid.). Although this contract feature was not originally or solely designed to deter claims fraud, fraud deterrence has proven to be a favourable consequence (Marzen, 2013).
3.1.2.4 Section 2.4 highlights that individuals who are compelled to take out insurance are more likely to be tolerant of claims fraud as they may feel that the insurance is unnecessary (Tennyson, 1997). In such situations, cashback reward programmes may prove to be more successful in deterring fraud as policyholders no longer feel that they have to make a claim to justify the existence of the insurance.
3.1.2.5 These programmes are ultimately funded via increased premiums but the potential for a reduction in fraud may offset the disadvantage of the increased costs of providing cashbacks.
Reducing Policyholders' Control over Claim Events
3.1.3.1 The product could also be designed to reduce the potential for policyholders to exert any control over the claim events. An example of such a product is weather-index crop insurance, which pays indemnities not based on actual losses sustained by crop farmers, but rather on realisations of a weather index, measured at a specific weather station in a given location, that is highly correlated with the actual losses sustained (Barnett & Mahul, 2007).
3.1.3.2 Weather-index crop insurance is a useful way of providing insurance to the poor. Among its advantages is its track record in combating fraud in crop insurance (Skees, 2008). As Turvey (1999) explains, the policyholder is unable to manipulate the amount of a claim or falsify a claim. The payout from weather-index insurance is independent of the policyholder's actions (Barnett & Mahul, 2007).
3.1.3.3 Perhaps the only realistic scope for fraud in weather-index crop insurance is the possibility of tampering with the equipment used to measure the relevant weather metrics (ibid.). This scope for fraud is admittedly higher in low and middle-income countries where the resources to secure the weather stations are limited.
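A stylised sketch of how such an index-linked payout might be computed follows. The trigger/exit structure and all parameter values are assumptions for illustration, not a description of any specific product:

```python
def index_payout(index_value, trigger, exit_level, max_payout):
    """Stylised weather-index payout for, say, a rainfall index where low
    rainfall means crop loss: nothing above the trigger, full payout at or
    below the exit level, linear in between. Independent of the
    policyholder's actual loss, so the claim cannot be manipulated."""
    if index_value >= trigger:
        return 0.0
    if index_value <= exit_level:
        return max_payout
    return max_payout * (trigger - index_value) / (trigger - exit_level)

# Rainfall of 80mm against a 100mm trigger, 50mm exit and R10,000 cover:
print(index_payout(80.0, trigger=100.0, exit_level=50.0, max_payout=10_000.0))
```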

3.2
Consumer Education
3.2.1
Yusuf & Babalola (2009) state that the majority of insurance providers, particularly in low- and middle-income countries, have largely not succeeded in educating consumers about the consequences of insurance fraud. As a result, most consumers are not aware of the collective savings possible if claims fraud were reduced.
3.2.2 The lack of consumer education by insurance providers is even more problematic when the number of claims being repudiated increases due to an aggressive fraud-combating approach (see ¶2.5). It is thus advisable for insurance providers, especially in the field of microinsurance, to combine fraud-combating methods with consumer awareness programmes. Yusuf & Babalola (2009) suggest that claims fraud education programmes could deter fraud, especially in the field of microinsurance.

3.2.3
The logical problem with the proposal is that the benefit of reduced fraud accrues to the pool of policyholders as a whole, not to the individual policyholder contemplating a fraudulent action. We turn now to consider three different ex-post approaches.

Claims Verification Processes
3.3.1 Tennyson & Salsas-Forn (2002) suggest that, in the presence of exaggerated loss amounts and fictitious claims, active verification through claims investigation and auditing is an important tool to detect fraud. This management tool is costly, however. Insurers may be forced to devise methods of rationing limited investigative resources across an extensive set of claims (ibid.). Usually claims for larger amounts and those that exhibit greater potential for opportunism are allocated a greater portion of a firm's investigative resources (ibid.). In many cases, efficient claims verification systems have proven successful in detecting fraudulent claims (Brockett et al., 2002).
3.3.2 In the context of microinsurance, adequate claims verification processes are often not possible because of the costs involved (see ¶2.5). In practice, claims verification processes in microinsurance are usually limited to questions surrounding the details of the claim (Cohen, McCord & Sebstad, 2005). Even then, Tennyson & Salsas-Forn (2002) suggest that most questions asked are primarily for record-keeping purposes, not to assess the validity of a claim.
3.3.3 We have classified claims verification processes under ex-post approaches, but Tennyson & Salsas-Forn (2002) argue that the presence of a claims verification process also acts as a deterrent to fraud. If policyholders are aware of the existence of a claims verification process, so the argument goes, then they are less likely to file fraudulent claims.
3.3.4 The presence of a claims verification process will quickly lose its effect as a fraud deterrent, however, if it becomes known to be largely ineffective.
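The rationing described above, allocating investigative resources to larger claims first, can be sketched minimally. Claim size is used here as the only suspicion signal; real systems would combine several, and the data are hypothetical:

```python
def select_claims_to_audit(claims, audit_budget):
    """Ration a fixed number of audits by claim size: a simple stand-in
    for the richer prioritisation rules insurers actually use."""
    ranked = sorted(claims, key=lambda c: c["amount"], reverse=True)
    return [c["id"] for c in ranked[:audit_budget]]

claims = [
    {"id": "A", "amount": 1_200},
    {"id": "B", "amount": 15_000},
    {"id": "C", "amount": 4_500},
    {"id": "D", "amount": 9_800},
]
print(select_claims_to_audit(claims, audit_budget=2))  # ['B', 'D']
```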

3.4
Data-mining Techniques
3.4.1 Data mining is defined as the identification of unexpected or useful patterns within large datasets (Hand, 2007). Marzen (2013) states that many insurers make use of data-mining techniques to identify useful patterns in insurance claims data, and suggests that this is a productive technique for allocating investigative resources.
3.4.2 Hand (2007) points out, however, that data mining is a specialised statistical field that requires expensive human and computing resources on an ongoing basis. This detracts from its usefulness in microinsurance because expenses are ultimately borne by consumers (Brockett et al., 2002). The Pridit method by Brockett et al., which we present in this paper, is a data-mining technique that is simple to implement and requires relatively little expertise.
3.4.3 Marzen (2013) documents the case of the Risk Management Agency (RMA) that utilised data-mining techniques in an innovative way to combat crop-insurance claims fraud. If any policyholder's claim exhibited abnormal characteristics compared to all other claims over a given period and within a given location, then the policyholder's farm was inspected. This fraud-detection approach is believed to have resulted in an estimated $838 million saving in the US between 2001 and 2010.
3.4.4 Simplified data-mining techniques that are cost effective and not time consuming to undertake could be coupled with similar approaches taken by the RMA to combat fraud in various other types of microinsurance markets (ibid.). This is indeed what the Pridit method of fraud identification seeks to achieve.
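An anomaly rule in the spirit of the RMA approach can be sketched with a simple z-score screen. The RMA's actual rules are not published, so this is only an illustration of the general idea, with hypothetical claim amounts:

```python
import statistics

def flag_anomalous_claims(amounts, threshold=2.0):
    """Flag claims whose size deviates from peers in the same region and
    period by more than `threshold` standard deviations."""
    mean = statistics.mean(amounts)
    sd = statistics.pstdev(amounts)
    if sd == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mean) / sd > threshold]

# Nine ordinary claims and one outlier (index 9):
amounts = [1000, 1100, 950, 1050, 980, 1020, 990, 1080, 960, 5000]
print(flag_anomalous_claims(amounts))  # [9]
```

Flagged claims would then be referred for inspection, as the RMA did with farm visits.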

3.5
Statistical Methods
3.5.1 Artis, Ayuso & Guillén (2002) testify to the success of discrete choice models in detecting fraudulent claims in automobile-insurance claims data. Such methods are used in other areas too, including credit card fraud detection (Jha, Guillen & Westland, 2012).
3.5.2 Essentially, discrete choice models are generalised linear regression models with a dichotomous response variable. In this case, the response variable would be defined as the presence or absence of fraud for a given claim.
3.5.3 One crucial requirement of this statistical method is the existence of a training sample: a collection of past claims data for which the response variable is known. These details are established through a claims verification process.
3.5.4 In essence, the model works by comparing the characteristics of incoming claims with the characteristics of past claims already identified as fraudulent. Any new claims exhibiting similar characteristics to past fraudulent claims would be assigned a high probability of being fraudulent. Investigative resources can then be used more efficiently by prioritising the investigation of claims with a high probability of being fraudulent (ibid.). Such methods are also known as supervised methods as they are informed by past data.
3.5.5 Supervised methods can be refined to focus on minimising the cost of fraudulent claims by more accurately identifying large fraudulent claims, rather than focussing on accurately identifying all fraudulent claims (Viaene et al., 2007).
3.5.6 The crucial requirement of a training sample mentioned above is a major obstacle to the success of such models in detecting claims fraud in microinsurance. As claims verification processes in microinsurance are minimal, the existence of a reliable training sample enabling the estimation of the parameters of discrete choice models is unlikely. 5
3.5.7 While it is possible to adapt supervised techniques when some data are misclassified (Artis, Ayuso & Guillén, 2002; Caudill, Ayuso & Guillén, 2005), supervised methods are simply not possible when a training sample is unavailable or when the cost of establishing one is prohibitive (Brockett et al., 2002), as is often the case in microinsurance.
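A discrete choice model of the kind described can be sketched as a logistic regression fitted by gradient descent. The training sample below is entirely hypothetical, and the single predictor (scaled time from policy inception to claim) is an assumption for illustration:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a simple logistic (discrete choice) model by gradient descent.
    `y` is the training sample's known fraud indicator (1 = fraudulent)."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted fraud probability
        w -= lr * X.T @ (p - y) / len(y)       # gradient of the log-loss
    return w

def fraud_probability(w, x):
    return 1.0 / (1.0 + np.exp(-(w[0] + w[1:] @ np.asarray(x))))

# Hypothetical training sample: fraudulent claims cluster at low values.
X = np.array([[0.1], [0.2], [0.15], [0.9], [0.8], [0.95]])
y = np.array([1, 1, 1, 0, 0, 0])
w = fit_logistic(X, y)
print(fraud_probability(w, [0.1]) > 0.5)  # new claim resembling past fraud
```

The sketch also makes the obstacle concrete: without the verified labels in `y`, the model cannot be fitted at all.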

3.6
Centralisation of Fraud Investigations
3.6.1 Given the abovementioned data challenges, Boyer (2000) makes the case for the centralisation of fraud investigations within countries. It is suggested that industry-wide fraud investigation units should be set up to allow for collaborative data collection and analytical efforts (ibid.). Boyer suggests that such an approach would allow economies of scale to reduce the costs of fraud detection across the insurance industry.
3.6.2 Yusuf & Babalola (2009) support this suggestion and believe that it would be especially successful in the case of microinsurance as the establishment of a single insurance fraud-combating institution within a country would allow smaller insurers with insufficient resources an opportunity to address the issue of claims fraud directly.
3.6.3 Jha, Guillen & Westland (2012) argue that credit card fraud needs to be assessed in clusters, rather than individual transactions, due to the tendency for fraudsters to commit multiple transactions. Arguably, this type of behaviour is more prevalent in online credit card transactions than insurance.

3.6.4
In South Africa, the Astute Financial Services Exchange 6 was launched in 2000 as a means for stakeholders within the long-term insurance industry to collaborate in a central electronic data exchange. Among other uses, the central database allows stakeholders to access information about policyholders' engagements with other stakeholders, including policy amendments and past claims, helping to prevent fraud.
3.6.5 The South African Insurance Crime Bureau (SAICB) was established in 2008 with the primary objective of combating organised crime in the short-term insurance industry. 7 Although the SAICB allows for the pooling of resources to combat organised crime committed by syndicates, it does not provide support against opportunistic fraud by policyholders (ibid.). The Pridit method presented in the next section does address opportunistic fraud.

4.1
Rethinking Fraud Identification Methods in Microinsurance
4.1.1 Churchill (2007) suggests that it is not sufficient for insurers to view microinsurance products merely as existing insurance products with smaller sums insured. Instead, microinsurance requires new approaches that differ from those of regular insurance. This suggestion by Churchill could be extended to include the approaches that have historically been used to identify fraudulent claims in traditional insurance. New approaches to identifying fraudulent claims in microinsurance need to be developed, as very few of the existing methods are practical or affordable.
4.1.2 In this section we introduce such a method of fraud identification: Principal Component Analysis of Ridit Scores, developed by Brockett et al. in 2002 to identify fraudulent claims when no training sample is present. This is ideal for microinsurers: a training sample is costly to obtain because of the costs of claims assessment, and without one, many statistical methods of fraud identification are impossible.
4.1.3 It is worth noting that the Pridit technique is one of a class of fraud detection techniques referred to as unsupervised methods. We have specifically focused on this method here as it has a number of desirable characteristics and can be implemented relatively easily.
4.1.4 In the discussion that follows we introduce the theoretical framework of the Pridit method, illustrated by a simplified worked example (remainder of Section 4), discuss some practical considerations (Section 5) and consider the accuracy of the method (Section 6). In the appendices, we present an application of the method to data obtained from a South African insurance company. It is our intention that this discussion should enable practitioners at microinsurance companies to implement the Pridit fraud identification method directly at low cost. 8

4.2
Introduction to the Pridit Method 4.2.1 Brockett et al. (2002) first presented the Pridit method for fraud classification. An overview of this technique is provided in this paper; for full details of the statistical theory underlying the method we refer the reader to Brockett et al. (2002) and Ai, Brockett & Golden (2009). 4.2.2 The Pridit method is based on a transformation method known as Ridit, 9 introduced by Bross in 1958. Ridit was developed by Bross to assist researchers in scientific studies in the biological and behavioural sciences when dealing with what he referred to as 'borderland' variables. 10 Brockett et al. (2002) slightly adjusted the calculation of the Ridit score and then applied an iterative weight-refining method to place more importance on variables that better explain the variability in claims data. Ai, Brockett & Golden (2009) then extended the method to allow for the inclusion of continuous fraud predictor variables in addition to categorical variables. Ai et al. (2013) further extended the method to estimate actual fraud probabilities, rather than simply ranking claims in decreasing likelihood of fraud; this extension is not considered in this paper.

4.2.3
The Pridit method aims to rank each claim in a claim file (say, all claims in the past week) in decreasing order of fraud probability (Brockett et al., 2002). This enables insurers to allocate limited claims-assessment resources to the claims with the highest fraud suspicion, resulting in a more efficient use of investigation resources (ibid.). A further advantage of this method is that insurers can pay claims with low fraud suspicion without delay. This may help microinsurance companies build trust, which is necessary to increase business volumes and reduce fraud (see Section 2.2 and ¶2.5). 4.2.4 The biggest advantage of this method is that it does not require a training sample, making it, we believe, most suitable for use in microinsurance. The method may, however, improve the ability of microinsurance companies to develop a training sample over time. 4.2.5 Brockett et al. (2002) and Ai, Brockett & Golden (2009) conducted a number of tests on past claim datasets where it was known whether claims were fraudulent or not. The Pridit method successfully identified fraudulent claims, both in absolute terms and when compared with the fraud identification ability of other statistical methods parameterised using past data.

4.2.6
We argue that the practicability of the method in the microinsurance context and its demonstrated usefulness on at least one insurance claims dataset provide sufficient evidence to suggest that the method be implemented by microinsurers as a step in identifying fraudulent claims, particularly by microinsurance companies that currently do not have any fraud identification strategy in place. In the remainder of Section 4 we explain the method using hypothetical examples and imaginary variables. In the appendices, we present an application of the method to a real dataset.
9 Ridit refers to a type of transformation of an identified empirical distribution, rather than a theoretical distribution where probit transformations are applicable. Refer to Bross (1958) for a full explanation of the name. 10 A response variable which may be on a subjective scale or where measurements are not easily reproducible, making comparison of measurements (say, between different experimenters) difficult (Bross, 1958).

4.3.1
Let us assume that we have a variable with the categories 'low', 'medium' and 'high' in increasing order of intensity, in other words, not just a random ordering of categories. While the categories are in increasing order of intensity, the exact definition of each of the categories is open to interpretation. In the context of claims fraud identification, this variable, which we call Variable 1, might be a variable stating 'how suspicious a policyholder sounded' when the claim was submitted, for example, a value subjectively allocated by the claims capturer. The variable is slightly far-fetched for practical use but serves to demonstrate the calculation of Ridit scores. More realistic fraud identification variables are available in the appendices of this paper. 4.3.2 A Ridit transformation provides a method for assigning a numerical value to each of our subjective categories (Bross, 1958). Each categorical response, previously described by a name (low, medium, high), is transformed into a number between -1 and 1. This number is referred to as the Ridit score. 4.3.3 The formula for the Ridit transformation used in the Pridit method for category i on a certain variable is:

Bi = (sum of pj for all j < i) - (sum of pj for all j > i)   (1)

where pj is the percentage of claims in our claim file exhibiting category j on that variable. The Ridit score for category i is thus the percentage of claims in a lower category than category i on the variable minus the percentage of claims in a higher category than category i on the variable. 4.3.4 The formula results in a value in the range [-1,1]. Bross' (1958) original Ridit transformation method transformed variables into the range [0,1], where a value close to 0 for category i indicated that category i was least extreme relative to other categories and a value close to 1 was most extreme. Brockett et al. (2002) adjusted the method slightly so that the transformation was to the range [-1,1], which has the desirable characteristic of having a midpoint of zero.
4.3.5 Before we interpret this Ridit formula, we present a simple example to help the reader gain some insight into the operation of the transformation. We adjust the numbers in the example as we proceed to assist our explanation of the interpretation of the resulting Ridit scores.

4.4
Example and Interpretation of Ridit Scores 4.4.1 Consider Variable 1, introduced in ¶4.3.1. Now imagine a set of 100 claims. Closer inspection of the set reveals that the percentage of claims in each of the low, medium, and high categories on our variable is 60%, 30% and 10% respectively. If we assign integer value subscripts to identify the categories (low 1 to high 3), then p 1 =0.6, p 2 =0.3 and p 3 =0.1.

4.4.2
We now calculate a Ridit score for each category using the percentages of claims falling into each category (the p's) as follows: B 1 = -0.4, B 2 = 0.5, and B 3 = 0.9, from formula (1).
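The calculation in ¶4.4.2 can be reproduced in a few lines of Python. This is a minimal sketch of formula (1) only, not code from Brockett et al.; the category proportions are those of the worked example.

```python
def ridit_scores(p):
    """Ridit scores in [-1, 1] (Brockett et al.'s adjustment of Bross'
    transformation) for an ordered list of category proportions p.
    Score for category i = (share of claims below i) - (share above i)."""
    return [sum(p[:i]) - sum(p[i + 1:]) for i in range(len(p))]

# Variable 1: 60% 'low', 30% 'medium', 10% 'high'
scores = [round(s, 10) for s in ridit_scores([0.6, 0.3, 0.1])]
print(scores)  # [-0.4, 0.5, 0.9], matching B1, B2 and B3 above
```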

4.4.3
The Ridit score for a given category is influenced by two factors: 11
- User-defined ordering of the categories: a category can never have a higher score than a category which is deemed to be less severe according to the user input. This allows appropriate subjectivity in setting the ordering of the categories, in other words, to specify which categories are more severe.
- The percentage of claims falling into each category: a smaller percentage of claims in a particular category results in the score for the category being similar to the scores in adjacent categories. Also, the smaller the percentage of claims in the extreme categories, the closer the Ridit scores of the extreme categories (lowest and highest) are to -1 and 1 respectively.

4.5
Applying Ridit Scores to more than One Variable 4.5.1 This transformation resulting in Ridit scores for each category can be applied to any variable with more than one category, and the interpretation remains the same. This is useful when comparing variables with different numbers of categories. For example, it might be difficult to compare a response of 5 on a variable with categories 1 to 5 (call this Variable 2) with a response of 'medium' on Variable 1, with categories 'low', 'medium' and 'high'. But if we knew that the proportions of claims falling into categories 1 to 5 on Variable 2 were equal to, say, 10%, 20%, 20%, 20% and 30% respectively, we would calculate the Ridit score for category 5 as B5 = 0.7 and note its proximity to the Ridit score of 0.5 for category 'medium' on Variable 1. The two responses are thus quite similar in severity on their variables' respective scales, even though 5 is the most severe category on Variable 2 and 'medium' is not the most severe category on Variable 1. 4.5.2 When computing Ridit scores on each category for more than one variable, we need to adjust the formula to specify which variable t the score applies to. The formula becomes (Brockett et al., 2002):

Bti = (sum of ptj for all j < i) - (sum of ptj for all j > i)   (2)

where i is a particular category on variable t, ptj is the percentage of claims exhibiting category j on variable t, and the number of variables is greater than 1.
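The cross-variable comparison in ¶4.5.1 can be checked numerically by applying the same transformation to each variable separately. A sketch using the hypothetical Variables 1 and 2 of the worked example:

```python
def ridit_scores(p):
    """Ridit scores in [-1, 1] for ordered category proportions p."""
    return [sum(p[:i]) - sum(p[i + 1:]) for i in range(len(p))]

var1 = ridit_scores([0.6, 0.3, 0.1])            # 'low', 'medium', 'high'
var2 = ridit_scores([0.1, 0.2, 0.2, 0.2, 0.3])  # categories 1 to 5

# Category 5 on Variable 2 scores 0.7, close to the 0.5 of 'medium' on
# Variable 1: similar severity on their respective scales.
print(round(var2[4], 10), round(var1[1], 10))  # 0.7 0.5
```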

4.6
Combining Ridit Scores to get a Total Score for Each Claim 4.6.1 Now that we have the Ridit score for each category on each variable, we need some way of ranking claims in order of least extreme to most extreme, by taking into account how extreme each claim is with respect to each variable. A claim is deemed to be more extreme if it exhibits characteristics that are extreme on a number of the variables in our model. 4.6.2 The simplest way to obtain an overall score of how extreme a particular claim is, is to sum across all variables the Ridit score of the category which the claim falls into on each variable. For example, a claim that exhibited 'medium' for Variable 1 and category 4 for Variable 2 will have a combined score equal to 0.7 (= 0.5 + 0.2), suggesting that this particular claim is not very extreme in either direction, low or high, when taking both variables into account. 4.6.3 To overcome the problem of variables cancelling each other out when ordered differently, Brockett et al. (2002) suggest that the categories within each variable should all be ranked from most indicative of fraud (category 1) down to least indicative of fraud. Scores close to minus one under this approach are then deemed to be more extreme towards a high probability of fraud, while scores closer to positive one are deemed to be more extreme towards a low probability of fraud. Thus, when claim scores are summed across variables, the lower the combined score, the more likely the claim is to be fraudulent. 4.6.4 In our example, we thus need to adjust our Variable 1 so that categories are ordered 'high', 'medium' and 'low', since a 'high' score for 'how suspicious a policyholder sounded' is most indicative of fraud. If we recalculate the Ridit scores, they are now B1 = -0.9, B2 = -0.5, B3 = 0.4. The claim type exhibiting 'high' for Variable 1 and 1 for Variable 2 would have a combined score of -1.8, a score more indicative of fraud than before, since -1.8 is close to -2, the smallest possible combined Ridit score over two variables. 4.6.5 This approach could be followed for each claim in the claim file and the claims could be ranked from lowest to highest combined score, that is, from most extreme in a fraudulent direction to least extreme. The microinsurer could then quickly settle claims which are not extreme in a fraudulent direction, say those claims with a combined score greater than 0 (Brockett et al., 2002), while investigating as many claims as resources allow, starting with the claim most likely to be fraudulent and working down the list.
11 Appendix 1 provides additional examples and discussion to help the reader understand the influences on Ridit scores.
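The reordering and summation described in ¶¶4.6.3–4.6.5 can be sketched as follows; the combined score of -1.8 for the 'high'/category-1 claim matches the worked example (again a sketch with the hypothetical variables, not the authors' code):

```python
def ridit_scores(p):
    """Ridit scores in [-1, 1] for ordered category proportions p."""
    return [sum(p[:i]) - sum(p[i + 1:]) for i in range(len(p))]

# Each variable ordered from most indicative of fraud (listed first) to
# least: Variable 1 becomes 'high', 'medium', 'low'; Variable 2 already
# has category 1 (most indicative) first.
var1 = ridit_scores([0.1, 0.3, 0.6])
var2 = ridit_scores([0.1, 0.2, 0.2, 0.2, 0.3])

# Claim with 'high' on Variable 1 and category 1 on Variable 2: the
# lower the combined score, the more suspicious the claim.
combined = round(var1[0] + var2[0], 10)
print(combined)  # -1.8
```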

4.7
Improvements to the Basic Ridit Method 4.7.1 The method as described above is the foundation of the Pridit method presented by Brockett et al. in 2002. There are two significant improvements to this basic method, though. The first improvement, suggested in the same paper by Brockett et al., is an iterative refinement of the weights assigned to each variable (discussed later in this section); the second, due to Ai, Brockett & Golden (2009), extends the transformation to continuous fraud predictor variables. 4.7.3.1 For a continuous variable, the Ridit transformation becomes:

B(x) = (percentage of claims with values below x) - (percentage of claims with values above x)   (3)

where X is the continuous variable and x is an observed value of X on a particular claim in the claim file. The subscript i, indicating category i, is no longer required. We calculate a score for each observed x in our claim file, where it is possible that x is different for each claim. 4.7.3.2 If we introduce fraud predictor Variable 3 as claim amount, a continuous variable, then for the first part of the Ridit formula, instead of asking, "What percentage of claims are in a category below category i?" we now ask, "What percentage of claims have claim amounts less than the claim amount on this claim?" Similar logic applies to the second part of the Ridit formula. Tables 1 and 2 show an unsorted set of claims and the same set after transformation to Ridit scores. Table 3 shows the benefit of reversing the components of the right-hand side of formula (3) to give lower Ridit scores to larger claims. Table 4 presents a summary of the Ridit scores on each of our three variables. Note that the Ridit scores for Variable 3 cannot be presented in this form because Variable 3 is a continuous variable: the Ridit score would need to be calculated for each and every claim in the dataset. The Ridit scores for Variable 3 are thus only shown in Table 5, which shows the Ridit scores for each claim, depending on the characteristics for each variable. In Table 5 we set out the information on ten claims, expanding on Tables 2 and 3. The Ridit scores for each claim can be looked up from Table 4 for the categorical Variables 1 and 2 and calculated for Variable 3 as in Table 3 above.
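Formula (3) for a continuous variable can be sketched directly from its definition. The claim amounts below are hypothetical (they are not taken from any table in this paper), and the sign is reversed so that larger, more suspicious claims receive lower scores, mirroring the reversal discussed with Table 3:

```python
def ridit_continuous(x, values):
    """Ridit score for observed value x of a continuous variable:
    proportion of claims with values below x minus the proportion
    with values above x (formula (3))."""
    below = sum(1 for v in values if v < x)
    above = sum(1 for v in values if v > x)
    return (below - above) / len(values)

# Hypothetical claim amounts for a continuous Variable 3.
amounts = [500, 1200, 800, 300, 5000]

# Reverse the sign so that the largest claim scores lowest
# (most indicative of fraud).
scores = [-ridit_continuous(a, amounts) for a in amounts]
print(scores)
```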
4.7.4.3 The final step is to rank claims from lowest total score to highest total score, so that claims are in order from most likely to be fraudulent to least likely to be fraudulent. 4.7.5.1 Brockett et al. (2002) propose a method by which the weights assigned to each variable when summing Ridit scores across variables can be refined, so that variables with higher discriminatory power are assigned a higher weight. 4.7.5.2 This is carried out by giving equal weight initially to each variable (simply summing Ridit scores across variables as in the example above) and then assessing the correlations between the Ridit scores on each of the variables and the total score, to see which variables give scores that are most consistent with the total combined score on each claim. These variables may be assigned higher weight (Brockett et al., 2002). Ai, Brockett & Golden (2009) explain that if a particular variable has a small score at the same time that the overall score is small, or a large score at the same time that the overall score is large, and this occurs consistently over all claims in the claim file, then the variable is better at predicting the overall score. A measure of this correlation is the Pearson correlation, which is closely related to the normalised inner product. For a particular variable, the inner (or dot) product is the transpose of the vector of the Ridit scores on the variable for each claim multiplied by the n × 1 vector of overall scores, S(0). 4.7.5.3 The variables that are more closely correlated with the final score are assigned more weight when summing Ridit scores across variables. The new weights give rise to a new set of total scores, against which the correlation of each of the variables is calculated. This process is repeated until the weights assigned to each variable converge.
Following convergence of the vector of weights, claims can then be ordered by the final scores, S(∞), in ascending order to help focus claims investigation resources as before. 4.7.5.4 Brockett et al. (2002) use matrix algebra to show that the converged vector of weights assigned to each variable is the eigenvector corresponding to the largest eigenvalue of the matrix F'F, where matrix F is the matrix of Ridit scores for each claim and variable (columns 2 to 4 of Table 6 above in our example). The matrix F has dimensions n by m, where n is the number of claims in our dataset and m is the number of predictor variables. F'F is thus an m by m matrix, that is, a square matrix with the number of rows and columns equal to the number of predictor variables. To understand this result better it will be helpful to understand what the matrix F'F represents and how to interpret the eigenvector corresponding to the largest eigenvalue of a matrix. 4.7.5.5 The steps involved in the method of weight refining are exactly the steps carried out in an iterative method of solving for the eigenvector corresponding to the largest eigenvalue of a matrix, known as the Power Method (Jolliffe, 2004). It is not surprising, then, that the result of the iterative weight-refining method is the dominant eigenvector of the matrix (see Table 7). 4.7.5.6 The fact that the most explanatory weights are derived as the dominant eigenvector of the matrix F'F has an interesting interpretation. Recall that the formula for the sample covariance between two variables X and Y is:

cov(X, Y) = (1/n) × (sum over all claims i of (xi - mean of X)(yi - mean of Y))

Noting that the variable scores are 'centred with mean zero' (see ¶4.3.4), we can see that the matrix F'F is (up to the factor n) the matrix containing sample covariances between the variables (Ai, Brockett & Golden, 2009). 4.7.5.7 Friedman & Weisberg (1981) explain that the R2 statistic is useful in the context of regression where a dependent variable is modelled on a number of independent variables.
But in the context where a researcher is trying to establish how several variables may influence the same underlying variable (likelihood of fraud in our case), the largest eigenvalue of the correlation matrix (which is the same as the largest eigenvalue of the covariance matrix) is the test statistic that indicates the "maximum amount of variance of the variables which can be accounted for with a linear model by a single underlying factor" (Friedman & Weisberg, 1981: 1). In this case that underlying factor would be the fraud dimension, a straight line in an m-dimensional space (m being the number of variables we have) running from least likely to be fraudulent to most likely to be fraudulent. 4.7.5.8 Brockett et al. (2002) and Ai, Brockett & Golden (2009) demonstrate that the weights obtained by this method are closely related to each variable's ability to discriminate between fraud and non-fraud claims, with lower-weighted variables having lower discriminatory power. This may be useful if it is costly for the company to collect certain data fields; data on variables with Pridit weights close to zero may not be worth collecting in future. Readers wishing to work through the derivations in Brockett et al. (2002) that deal with variable discriminatory power may also find Graybill (1983) helpful, in particular theorems 8.4.3 and 8.5.2. 4.7.5.9 Some practitioners may wish to compare these weights to their expectations of the predictive power of each variable in order to gain insight into underlying fraud patterns picked up by the model, potentially to reveal any misunderstandings about the nature of fraud. We remind readers, however, that no judgment is required to calculate these weights. This may be seen as a disadvantage, as user input is limited, but the user has the freedom to override the calculated weights if this is considered appropriate, notwithstanding the preference to rely on weights reflecting the intrinsic qualities of the data.
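The weight-refining iteration of ¶¶4.7.5.2–4.7.5.5 can be sketched in a few lines. The matrix F of Ridit scores below is hypothetical (it is not Table 6); the sketch simply shows that repeatedly scoring claims and re-weighting by inner products is the Power Method applied to F'F, so the weights settle on the dominant eigenvector:

```python
def refine_weights(F, iterations=200):
    """Iterative weight refining (a sketch of Brockett et al.'s scheme):
    score each claim with the current weights, then set each variable's
    weight proportional to the inner product of its column of Ridit
    scores with the score vector, and normalise. Each pass multiplies
    the weight vector by F'F, i.e. the Power Method."""
    n, m = len(F), len(F[0])
    w = [1.0] * m  # equal initial weights
    for _ in range(iterations):
        scores = [sum(F[i][j] * w[j] for j in range(m)) for i in range(n)]
        w = [sum(F[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]  # keep the weights bounded
    return w

# Hypothetical 5-claim x 3-variable matrix of Ridit scores.
F = [[-0.9, -0.9, -0.6],
     [-0.5,  0.2, -0.1],
     [ 0.4,  0.7,  0.3],
     [ 0.4, -0.6,  0.5],
     [-0.9,  0.2, -0.8]]

w = refine_weights(F)
final_scores = [sum(f * wt for f, wt in zip(row, w)) for row in F]
ranking = sorted(range(len(F)), key=lambda i: final_scores[i])
print([round(x, 3) for x in w])  # converged variable weights
print(ranking)                   # claims, most to least suspicious
```

Because F'F is positive semi-definite, the iteration cannot oscillate in sign, and the converged weights satisfy the eigenvector equation for the largest eigenvalue of F'F.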

PRACTICAL CONSIDERATIONS OF THE PRIDIT METHOD
Some of the advantages of the Pridit method may be evident from the descriptions in Section 4. These are summarised in the paragraph that follows. Thereafter, we discuss the question of how often to update the model to maximise the chance of detecting changing fraud patterns.

5.1
Key Advantages of the Pridit Method
- No training sample required. The method uses only the claim file. There is no need to parameterise the model using past data.
- Ability to rank claims from highest fraud likelihood to lowest fraud likelihood. This output is particularly useful for reasons discussed previously.
- Easy to implement with little expertise required. The method can easily be implemented in spreadsheet software such as Microsoft Excel, or in a more advanced programme, though arriving at a sensible ordering of categorical responses may require some prior experience.
- Two factors reduce the expertise required of the user in selecting variables: the model-assigned weightings for non-informative variables will be close to zero (Brockett et al., 2002); and if the user-defined ordering is reversed, from least likely to be fraudulent to most likely, the model will correct for this by assigning negative weights (Ai, Brockett & Golden, 2009).
- Does not need to be updated regularly unless it is suspected that the underlying nature of fraud has changed (Ai, Brockett & Golden, 2009).
5.2.1 Given that the Pridit model can be updated quite easily, we suggest that a regular exercise be conducted to determine whether recent data indicate the need to change the weights. This may provide clues as to changes in fraud patterns, facilitating a proactive approach by the insurer.

5.2.2
We suspect that fraud syndicates may provide unique challenges to unsupervised methods such as Pridit. Syndicates tend to submit a large number of claims, which may reduce the ability of unsupervised fraud detection methods to identify such claims as outliers. This may result in such methods being ineffective in identifying fraud committed by these syndicates. This is an important area for further research, as fraud committed by syndicates is usually more organised and conducted on a larger scale.

5.2.3
We suggest that practitioners test a new batch of claims under the old model parameters before updating the model and weights in case a syndicate has submitted a large number of claims with similar characteristics, distorting the model's view of what is 'usual' and hence its ability to identify the 'unusual'.

6.1
In the discussion that follows we consider the accuracy of the Pridit method in identifying fraudulent claims. If the method is practical, simple and robust, but is not accurate, then it is not a suitable method for identifying fraudulent claims.

6.2
Ai, Brockett & Golden (2009) explain that it is challenging to test unsupervised methods for the same reason that unsupervised methods are useful: there is no training sample. If a past dataset of claims information were available, where for each claim it were known whether the claims were fraudulent or not, then it would be possible to test how accurate the unsupervised method is in identifying fraudulent claims. However, in microinsurance, access to a training sample is unlikely.

6.3
While it may be some time before we can scientifically test the accuracy of the Pridit method in the context of microinsurance, it would be helpful to see the results of tests done in an insurance context where a training sample was available. In both Brockett et al. (2002) and Ai, Brockett & Golden (2009), the Pridit method was tested using a past claims dataset of 1 399 personal injury protection claims provided by the Automobile Insurance Bureau (AIB) in Massachusetts that had information on a number of variables (which could be used as predictor variables) and, importantly, whether each claim was fraudulent or not. The fraud status of each claim was obtained by expert claims assessors.

6.4
Tests were carried out to compare the performance of the Pridit method against supervised fraud identification methods, including logistic regression, support vector machines and Bayesian additive regression trees. The correlations between the Pridit scores and fraud likelihood measures generated by the supervised fraud identification methods were assessed. The correlations were high (ranging from 0.55 to 0.91) and statistically highly significant (p < 0.0001). "This suggests that while the Pridit method uses only the consistency of the internal structure of the predictor variables, its predictions of the suspicion levels are similar to those of the more information intensive supervised methods" (Ai, Brockett & Golden, 2009: 27).

6.5
A particularly interesting finding by Ai, Brockett & Golden (2009), when comparing the Pridit method to supervised fraud detection techniques, was that, while supervised techniques were more accurate overall in identifying both fraudulent and non-fraudulent claims, the Pridit method was more accurate when focusing on fraudulent claims only. This is desirable because, according to these authors, the risk of incorrectly classifying truly fraudulent claims as non-fraudulent (a false negative) is greater than the risk of classifying non-fraudulent claims as fraudulent (a false positive): the cost of failing to identify a fraudulent claim (the cost of paying the claim) will likely be higher than the cost of assessing a claim to determine whether it is fraudulent. This may be less applicable in microinsurance due to the low average claim size, but the greater accuracy in identifying fraudulent claims is still a desirable feature of the Pridit method.

6.6
Ai, Brockett & Golden (2009) conducted a number of tests to compare the accuracy of the Pridit method to two other unsupervised methods: Kohonen's feature map and cluster analysis. The results of the tests were that the three unsupervised methods gave the same classifications (fraud or non-fraud) on most claims. However, the two non-Pridit methods suffered certain disadvantages. According to Ai, Brockett & Golden, Kohonen's feature map produces graphical results that are difficult to interpret and is computationally intensive, while the main drawback of cluster analysis is that it cannot rank claims from highest fraud likelihood to lowest, which is a key feature of a fraud detection model.

6.7
Because of these disadvantages of the two competitor unsupervised methods, the similar performance of all three methods and the ease of implementation of the Pridit method (and other advantages listed in the previous section), we believe that the Pridit method is the best option among the unsupervised methods for a microinsurance company aiming to introduce a fraud identification method into its operations.

6.8
While these tests were not conducted in a microinsurance context, they do at least provide us with evidence of the success of the method in another insurance context. It should be explicitly noted that, as with all statistical methods, the Pridit method does not purport to identify fraudulent claims with complete accuracy. We suggest that there is sufficient evidence of success to give microinsurers confidence that the method would prove beneficial, particularly if they currently have no existing fraud identification strategy in place.

7.1
Microinsurance has the potential to make a big difference in local economies and in individuals' lives by helping households recover from losses caused by uncertain events. However, fraud poses a serious threat to the viability of the microinsurance market and hence the availability of insurance products for low-income earners. Traditional methods of combating fraud are often unaffordable in the context of microinsurance as premiums need to be kept to a minimum to make policies affordable to low-income earners.

7.2
A key part of the fraud mitigation process is identifying fraudulent claims. The Principal Component Analysis of Ridit scores method for fraud classification (Pridit), originally developed by Brockett et al. (2002), is a method that appears to be well suited to identifying fraud in microinsurance initiatives, where it is critical to keep costs low. The method is easy to implement and is statistically sound. Tests on conventional insurance claims data show that the method has been reasonably accurate in detecting fraud with at least one dataset.

7.3
One challenge with the method is the subjectivity required to select the predictor variables and the criteria for assigning values to them. Safety nets are available, however, that result in auto-correction of the method when variables are incorrectly categorised, assigning low weight to variables with low indication of fraud.

7.4
A number of advantages make the method appealing, not least the combination of low cost and satisfactory accuracy when compared to supposedly more sophisticated methods.

7.5
We are of the view that microinsurance companies that currently have no fraud identification strategy in place have nothing to lose by implementing this method. It can be implemented by staff who are not expert claims assessors. The output is provided in a useful form, allowing limited claim follow-up resources to be used optimally and non-suspicious claims to be paid immediately to improve the insurer's reputation in the typically fragile microinsurance market.

7.6
Because the method has not been adequately tested in a microinsurance context, there is a possibility that the Pridit method will prove to be less accurate in identifying fraudulent claims than supervised identification methods. However, even if this were the case, an effective fraud detection system cannot be developed overnight; it requires an ongoing and dynamic process that incorporates emerging information and trends over time. The Pridit method for fraud classification may be viewed as a step in this process. We expect that using the Pridit method will allow a microinsurance company to build up a training sample more rapidly than if it were to follow up on claims at random. The training sample could then be used in future to apply advanced and mathematically rigorous statistical techniques to minimise the cost of fraudulent claims. That being said, the Pridit method has certain advantages over supervised methods, such as the failure of supervised methods to adapt as fraudsters adopt new ways of committing fraud.

8.1
It is clear that the Pridit method needs to be tested more adequately to establish just how successful it is at identifying fraudulent claims. This is certainly an area for further research.

8.2
Similarly, Pridit may be more successful at identifying claims for certain types of products. A comparison of the effectiveness of Pridit in general insurance, life insurance and funeral insurance (and possibly other product types) would be useful.

8.3
As noted in ¶5.2.2, syndicates tend to submit a large number of claims, which may reduce the ability of unsupervised fraud detection methods to identify such claims as outliers, potentially rendering these methods ineffective in identifying fraud committed by syndicates. This is an important area of research.

8.4
An industry body that collects data on fraudulent claims from a number of microinsurers (possibly each applying the Pridit method to channel their limited claim checking resources and speed up the process of developing a training sample) could allow this research to be done for the benefit of all microinsurers. If such an initiative were to begin we would encourage microinsurers to participate for the benefit of the entire industry. We suggest that a good starting point would be to build on the work done by Astute FSE and SAICB, mentioned in ¶¶3.6.4 and 3.6.5.

APPENDIX 1 Additional Examples of the Influences on Ridit Scores

The following paragraphs provide some additional detail to clarify ¶4.4.3.
A1.1 Consider Variable 1, introduced in ¶4.3.1. Now imagine a set of 100 claims. Closer inspection of the set reveals that the percentage of claims in each of the low, medium, and high categories on our variable is 60%, 30% and 10% respectively. If we assign integer value subscripts to identify the categories (low 1 to high 3), then p 1 =0.6, p 2 =0.3 and p 3 =0.1.
A1.2 We now calculate a Ridit score for each category using the percentages of claims falling into each category (the p's) as follows: B 1 = -0.4, B 2 = 0.5, and B 3 = 0.9, from formula (1).
A1.3 Note that the Ridit score for category j is never greater than the Ridit score for category i where j > i, because the ordering of categories is maintained through the transformation. This allows appropriate subjectivity in setting the ordering of the categories, in other words, to specify which categories are more severe.
A1.4 It is helpful to consider the scenarios in which scores of 0, 1 and -1 are obtained before interpreting these results. If the percentage of responses that fall into categories below category i is the same as the percentage of responses that fall into categories above category i then the Ridit score for category i is zero. The category in this case lies in the middle of all categories, hence between the extreme values or categories of the variable in question. The extreme of 1 occurs when calculating the Ridit score for the most severe category where all responses are in categories below the most severe category. The extreme of -1 occurs when calculating the Ridit score for the least severe category (low in this case) where all responses are in categories above the least severe category.
A1.5 In our example, the Ridit score for the 'medium' category is higher than 0 and it is closer to the Ridit score for the 'high' category than it is to the Ridit score for the 'low' category. This indicates that it is closer to the high extreme than it is to the low extreme on the severity scale of Variable 1.
A1.6 If the percentages in the 'low' and 'high' categories were more similar, say 40% and 30% respectively, with the percentage in the 'medium' category remaining unchanged at 30%, the Ridit scores would be: B1 = -0.6, B2 = 0.1 and B3 = 0.7. Transferring some of the mass from the lowest category to the highest category results in a more even distribution of scores, with the score for the 'medium' category closer to 0, the point exactly midway between the two extremes of -1 and 1.
A1.7 If the percentages of claims in the low, medium and high categories of our variable were 5%, 60% and 35% respectively, instead of 60%, 30% and 10% as before, our Ridit scores for each category would be: B1 = -0.95, B2 = -0.3 and B3 = 0.65. Here we can see that our 'low' category has a score close to the low extreme of our severity interval of [-1,1].
Similarly, if the percentage of claims in the 'high' category was 5%, then the Ridit score for the 'high' category would be 0.95, which is close to the high extreme on our severity interval of [-1,1].
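The worked examples in this appendix can be reproduced with a few lines of code. The sketch below is our own minimal implementation of formula (1) (the function name ridit_scores is an assumption, not from the paper): the score for a category is the proportion of claims in the categories below it minus the proportion in the categories above it.

```python
def ridit_scores(proportions):
    """Ridit score for each category (formula (1)): the mass in the
    categories below it minus the mass in the categories above it."""
    scores = []
    for i in range(len(proportions)):
        below = sum(proportions[:i])
        above = sum(proportions[i + 1:])
        scores.append(round(below - above, 12))  # rounding trims float noise
    return scores

# The three scenarios from ¶¶A1.2, A1.6 and A1.7:
print(ridit_scores([0.60, 0.30, 0.10]))  # [-0.4, 0.5, 0.9]
print(ridit_scores([0.40, 0.30, 0.30]))  # [-0.6, 0.1, 0.7]
print(ridit_scores([0.05, 0.60, 0.35]))  # [-0.95, -0.3, 0.65]
```

Note that the scores always lie in [-1, 1] and preserve the ordering of the categories, as discussed in ¶A1.3 and ¶A1.4.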

APPENDIX 2 Data used for Example Application of Pridit to Microinsurance
A2.1 The data that we use to demonstrate the workings of the Pridit method for fraud classification were obtained from a South African insurance company, which offers two types of insurance products to the low-income market, referred to as product A and product B in this section. Both insurance products are distributed through an institution that sells consumer goods on credit. The institution requires that an individual purchasing goods on credit purchases at least insurance product A; product B is an extension of product A that offers more comprehensive cover. The purpose of this requirement is to limit the financial loss, to both the insured and the institution, in the event that one of the insured perils occurs, resulting in damage to or loss of the asset purchased on credit. The way in which this is achieved differs between the two insurance products and is explained in ¶A2.2 and ¶A2.3 respectively.
A2.2 When an individual purchases a good from the institution on credit, he/she is required to pay monthly instalments that include both interest and capital for an agreed term until the loan is repaid. Product A is required to be taken out at the same time as the purchase of the good and is paid for by monthly premiums that are added to the monthly loan instalments.
During the repayment term, if the good is damaged or stolen, product A will pay the outstanding balance on the loan at the date that the insured peril occurred, subject to the terms and conditions set out in the policy document. Product A eliminates the financial loss sustained by the insured because they do not have to pay any further monthly instalments for goods that they no longer have the use of. At the same time, it limits the financial loss to the institution because if the insurance was not in place, the purchaser would be less willing to continue paying the monthly instalment and hence default would be more likely.
A2.3 At the point of purchase, the individual can opt to purchase product B instead of product A. Product B operates similarly to product A but, instead of settling the outstanding balance on the loan at the date that the insured peril occurs, product B either provides for the repair of the good in the case of reparable damage or provides for the replacement of the good on a new-for-old basis in the case of theft or irreparable damage. This limits the financial loss to both the insured and the institution: the insured has the good repaired or replaced, while the institution benefits because the insured retains an incentive to continue paying the monthly instalments.
A2.4 The data that were used for the application of the Pridit method included both details of policies in force and details surrounding claims that arose from these policies during the investigation period from 1 January 2009 to 31 December 2010. This investigation period was chosen because it was believed that all claims that occurred during the period were fully run-off by the time that the analysis was performed. The data were extensively analysed for errors and omissions. Any records with errors or omissions that were identified were excluded from the dataset in order to reduce the chance of distortions in the final results.
A2.5 The dataset contained variables for each policy record and claim record during the investigation period. The variables relating to the policy details and the claims details that were used in the application of the statistical method are shown in Tables A2.1 and A2.2 respectively. The reasons why these variables were used in the application of the statistical method will become clearer in Appendix 3. The institution offering the insurance products had a total of 161 physical branches distributed throughout South Africa during the investigation period. The variable LOCATION refers to the branch at which the insurance policy was taken out. For reasons of confidentiality, the names of these locations are not revealed in this paper. The variable CLAIMTYPE refers to the cause of a claim. The cause of a claim was either damage or theft.

A2.7 Descriptive Statistical Analysis of the Data
A2.7.1 Product Type
The proportion of type A policies in force during the investigation period was 11,7%, while the proportion of type B policies in force was 88,3%. The proportion of the claims arising from type A policies during the investigation period was 12,3%, while the proportion of the claims arising from type B policies was 87,7%.

A2.7.2 Reporting Delay
The average reporting delay for the claims during the investigation period was approximately 35 days. This was obtained by taking the number of days between the INCIDENTDATE variable and the REPORTINGDATE variable for each claim, summing these values up and then dividing the result by the number of claims that occurred during the investigation period. The significance of the average reporting delay will be explained in Appendix 3.
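As a sketch, the average reporting delay could be computed from the claim records in the way described above. The INCIDENTDATE and REPORTINGDATE field names are those of Table A2.2; the two records below are invented for illustration.

```python
from datetime import date

# Two hypothetical claim records (invented for illustration)
claims = [
    {"INCIDENTDATE": date(2009, 3, 1), "REPORTINGDATE": date(2009, 4, 10)},
    {"INCIDENTDATE": date(2010, 7, 15), "REPORTINGDATE": date(2010, 8, 9)},
]

# Reporting delay per claim: days between incident and reporting dates
delays = [(c["REPORTINGDATE"] - c["INCIDENTDATE"]).days for c in claims]

# Average reporting delay: sum of the delays divided by the number of claims
average_delay = sum(delays) / len(claims)
print(average_delay)  # 32.5
```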

A2.7.3 Cause of Claim
Approximately 19,2% of claims were as a result of damage and 80,8% as a result of theft. During the investigation period, a total of 2 141 claims occurred.

A2.7.4 Claim Amounts
The average claim amount was R4 083-76. Since this is a relatively small amount, it is clear that these two insurance products can be classified as microinsurance products.

APPENDIX 3 Example of Pridit Application
A3.1 Using the data described in Appendix 2, we work through the five steps covered in Section 4:
- Step 1: selection of predictor variables;
- Step 2: assigning categories for predictor variables and determining ordering;
- Step 3: calculating Ridit scores;
- Step 4: optimising weights between variables by calculating the largest eigenvector of the matrix F'F; and
- Step 5: ranking claims from 'most likely' to be fraudulent to 'least likely' to allow channelling of limited resources.
It is important to realise that the discussions that follow are particular to the products in question. Different variables and considerations may apply to other products such as life microinsurance and funeral insurance.

A3.2 Step 1: Selection of Predictor Variables
A3.2.1 Recall from Section 4 that the first step in applying the Pridit method is to identify claim characteristics that can be used as predictor variables that may be indicative of fraud.
A3.2.2 The selection of variables may be based on past evidence of variables being associated with fraud-for example, Artis, Ayuso & Guillén (2002) found that the longer it takes for a policyholder to report a claim to the insurer, the greater chance there is of that claim being fraudulent. The reason suggested for this finding is that once an individual has staged a claim event, they may be hesitant to contact the insurer because they are fearful of being caught out. This results in a delayed notification until the policyholder finds the courage to notify the insurer of the claim event (ibid.). This suggests that reporting delay can be used as a predictor variable with longer reporting delays being more indicative of fraud.
A3.2.3 However, an insurer may have a number of variables for which there is no such evidence of fraud. In this case, the decision of which variables to choose may seem daunting, particularly for someone without a background in detecting fraudulent behaviour, but two properties of the method reduce the risk of error:
- If the user inputs a variable that is not closely related to fraud occurrence, then the measure of variable discriminatory ability developed by Brockett et al. (2002) will be close to zero for that variable and the corresponding weight assigned to the variable will be small (Ai, Brockett & Golden, 2009). There is thus virtually no consequence to the user adding variables to the model that are poor indicators of fraudulent claims. While as many variables as possible that give some clue of fraud likelihood should be included in the model, variables that give no such clue need not be manually excluded: the model will automatically 'exclude' them by assigning them lower weight if they turn out to add little to its classification ability.
- Ai, Brockett & Golden (2009) also explain that if the user input is incorrect, in that the relationship is opposite to what is expected (for example, it is suspected that a long reporting delay is indicative of fraud when in fact a short delay is more indicative of fraud), then the abovementioned measure of variable discriminatory power will be negative and the negative relationship will correct the ordering error made by the user. Thus, even though the user has the ordering incorrect, the method will still highlight claims that are more likely to be fraudulent.
To summarise, the user should rather include too many variables in the model in order to increase the chances of including significant indicators of fraud likelihood. Even if the wrong relationships are specified for certain variables, this will not detract from the ability of the model to identify fraudulent claims. It is thus clear that a high level of expertise is not required to calibrate the model, making it even more suitable for microinsurance companies striving to keep expenses to a minimum.
A3.2.4 In this exercise, apart from the literature support for the reporting-delay variable, the selection of the predictor variables was essentially an exercise of judgement guided by the results of the descriptive statistical analysis performed on the data. In other words, with the data that were available there was no scientific way of selecting the predictor variables, and someone else applying this method to the same products and data may select different predictor variables. Careful consideration was also given to the features of the two products that present opportunities for fraud in order to inform the selection of the predictor variables. A total of five predictor variables were selected for the application of this method.

A3.2.5 Product Type
It was explained in Section 2.4 that when individuals are compelled to take out insurance, they are more likely to be accepting of fraudulent behaviour (Tennyson, 1997). 'Product type' was thus chosen as a predictor variable because the institution required the individual to take out product A when purchasing goods on credit, whereas the individual had the option to select product B in place of product A. A claim on product A would thus have a higher suspicion of fraud than a claim on product B.

A3.2.6 Cause of Claim
The cause of claim, either theft or damage, was not used directly as a predictor variable in the model. It was, however, important to distinguish between the two causes in order for a method of assigning a value to the 'duration of policy at claim date' predictor variable to be established. The following section explains the reason for this.
A3.2.7 Duration of Policy at Claim Date
A3.2.7.1 The selection of 'duration of policy at claim date' as a predictor variable was based on the consideration of the features of each product. For product A, a theft or damage claim that occurred soon after the policy commenced would have a higher suspicion of fraud than a theft or damage claim that occurred towards the end of the term of the policy. This is because soon after the policy commenced the amount outstanding on the loan would be greater than what it would be towards the end of the term of the policy. In other words, the potential financial gain from submitting a fraudulent claim on product A is much greater at earlier durations of the policy.
A3.2.7.2 For product B, a damage claim that occurred soon after the policy commenced would have a lower suspicion of fraud than a damage claim that occurred towards the end of the term of the policy. This is because the financial gain from submitting a fraudulent damage claim soon after the policy was taken out is much smaller than what it would be from submitting a fraudulent damage claim towards the end of the term of the policy. At an earlier duration, the damaged good would either be repaired or replaced, if the damage was irreparable, but the policyholder would still have to continue paying the monthly instalments. At later durations, when the good is much older, the damaged good may be replaced with an updated version, since the policy operates on a new-for-old basis, or restored to its original condition in the case of repairable damage. The policyholder would in this case have only a few loan instalments remaining. The same, however, is not necessarily true for theft claims. A fraudulent theft claim at earlier durations would still have a significant financial gain because, even though the policyholder would be required to continue paying the monthly instalments, a replacement good would be provided and, in addition, the policyholder would still have the use of the original 'stolen' good.
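The reasoning in ¶A3.2.7.1 and ¶A3.2.7.2 amounts to a simple rule about whether an early or a late policy duration at the claim date carries the higher fraud suspicion for each product and cause of claim. The function below is our own encoding of that discussion, not code from the paper:

```python
def more_suspicious_duration(product, cause):
    """Return whether an 'early' or 'late' policy duration at the claim
    date is more indicative of fraud (our encoding of ¶A3.2.7)."""
    if product == "A":
        # More of the loan is outstanding early on, so the potential
        # gain from a fraudulent claim is greatest at early durations
        return "early"
    if cause == "damage":
        # Product B: new-for-old replacement is worth more to the
        # policyholder later, when few instalments remain
        return "late"
    # Product B theft: the policyholder keeps the 'stolen' good and
    # receives a replacement, so early claims still offer a large gain
    return "early"

print(more_suspicious_duration("A", "theft"))   # early
print(more_suspicious_duration("B", "damage"))  # late
print(more_suspicious_duration("B", "theft"))   # early
```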

A3.2.8 Reporting Delay
As described in ¶A3.2.2, reporting delay may be a useful indicator of fraudulent behaviour and is thus included in our model.

A3.2.9 Geographical Location
A3.2.9.1 Recall that the institution had a total of 161 physical branches distributed throughout South Africa. These data allowed a claim incidence rate to be calculated for each of the branches. This was done by taking the number of claims at each branch during the investigation period and dividing it by that branch's total exposed to risk. This calculation was performed in the standard actuarial way.
A3.2.9.2 The purpose of calculating a claim incidence rate for each of the branches was to identify branches that had an 'unusually high' claim incidence rate. The definition of 'unusually high' was relative to the average incidence rate across all the branches.
A3.2.9.3 The link between geographical region and suspicion of fraud is that claims from a region with an unusually high claim incidence rate have a higher suspicion of fraud than claims from a region where the incidence rate is deemed to be 'normal' or 'unusually low' (Artis, Ayuso & Guillén, 2002). The rationale behind this suggestion is that members within a community often exhibit similar attitudes towards insurance fraud, which may have a significant effect on the incidence of claims within that area (Tennyson, 1997). In addition, fraud is sometimes committed by organised syndicates that operate in specific geographical locations (ibid.).
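As a sketch, the branch-level calculation described in ¶A3.2.9.1 and ¶A3.2.9.2 might look as follows. The branch names, figures and the 1.5-times-average threshold for 'unusually high' are all our own illustrative assumptions:

```python
# Hypothetical per-branch claim counts and exposed-to-risk figures
claims_by_branch = {"Branch A": 120, "Branch B": 35, "Branch C": 95}
exposure_by_branch = {"Branch A": 800.0, "Branch B": 900.0, "Branch C": 400.0}

# Claim incidence rate per branch: claims divided by exposed to risk
rates = {b: claims_by_branch[b] / exposure_by_branch[b] for b in claims_by_branch}

# Overall average rate across all branches, used as the benchmark
average_rate = sum(claims_by_branch.values()) / sum(exposure_by_branch.values())

# Flag branches whose rate is 'unusually high', taken here (arbitrarily)
# as more than 1.5 times the overall average
flagged = [b for b, r in rates.items() if r > 1.5 * average_rate]
print(flagged)  # ['Branch C']
```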

A3.2.10 Claim Amount
The claim amount was selected as a predictor variable because it has been suggested by Brockett et al. (2002) that the prevalence of fraud varies with the amount of a claim. They hypothesise that the prevalence of fraud increases as the average claim amount increases. Tennyson (1997) suggests that the potential financial gain from committing fraud has to be large enough to outweigh the potential consequences of being caught out. This is perhaps the reason for the hypothesis proposed by Brockett et al.
A3.2.11 It is important to note that consideration of any of the above predictor variables in isolation is inappropriate and does not make sense (Brockett et al., 2002). For example, it does not make sense to suggest that a claim arising from product A is more likely to be fraudulent than a claim arising from product B when considering the 'product type' predictor variable in isolation. Instead, the predictor variables should be jointly considered. It is the overall fraud score produced by the Pridit method that should be considered before any conclusions are drawn (ibid.).

A3.3 Step 2: A Method for Assigning Values to the Predictor Variables
A3.3.1 Establishing a method for assigning values to the predictor variables is also an exercise of judgement, hence many different methods are possible (Brockett et al., 2002). The only requirement is that the predictor variables are treated consistently (ibid.).
A3.3.2 For each of the predictor variables, categories need to be created (ibid.), unless the variable is a continuous variable (Ai, Brockett & Golden, 2009). These categories are then ranked in order of decreasing fraud suspicion (Brockett et al., 2002). Once the categories have been arranged in order of decreasing fraud suspicion, an integer value in increasing order is assigned to each category (ibid.). In other words, the category with the highest fraud suspicion is assigned a value of one, the category with the second highest fraud suspicion is assigned a value of two, and so on until the last category, as we saw in Section 4.
A3.3.3 For example, on the 'product type' variable, a claim from Product A is assigned a value of one, while a claim from Product B is assigned a value of two. Product A receives the value of one because a claim from Product A is more indicative of fraud than a claim from Product B, since Product A is compulsory while Product B is not.
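As a brief illustration of the value assignment, the snippet below maps two of the predictor variables to category values. The mapping for 'product type' follows ¶A3.3.3; the banding for reporting delay is our own assumption, since the actual limits are those chosen in Table A3.2.

```python
def product_type_category(product):
    # Product A (compulsory) carries the higher fraud suspicion -> value 1
    return 1 if product == "A" else 2

def reporting_delay_category(delay_days):
    # Longer delays are more indicative of fraud, so they receive the
    # lower (more suspicious) values; the 30/60-day limits are assumed
    if delay_days > 60:
        return 1  # highest fraud suspicion
    elif delay_days > 30:
        return 2
    return 3      # lowest fraud suspicion

print(product_type_category("A"), reporting_delay_category(45))  # 1 2
```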
A3.3.4 The list of variables that have been used to define these limits is shown in Table A3.1. The chosen categories for the predictor variables in this exercise are shown in Table A3.2. It is important to note that the limits defining each of the categories have been subjectively chosen (ibid.). 'Subjectively' in this context is intended to mean that they are based on judgement and what is believed to be appropriate (ibid.). Another individual conducting this research might well have chosen different limits.
A3.3.5 Up until this point, the dataset has simply been used to inform the selection of predictor variables and the setting of appropriate limits for each of the categories. Although this method does not require that historical data be used for this purpose, it has been used here in order to help inform the subjectivity that has been exercised.

A3.4 Steps 3 and 4: Calculating Ridit Scores and Optimising Weights
A3.4.1 Each claim was assigned to a category on each predictor variable according to the limits in Table A3.2. The percentage of claims in each category on each variable was then calculated, allowing us to calculate Ridit scores for each category on each variable. For each claim, the corresponding Ridit scores were looked up depending on the characteristics of the claim. These values were then arranged in matrix form, with each row representing a single claim record and the column entries for that row representing the corresponding Ridit values of the predictor variables in the same order as presented in Table A3.2.
A3.4.2 The iterative method of weight refining was then carried out on the abovementioned matrix of Ridit scores.

A3.5 Step 5: Ranking Claims and Taking Action
A3.5.1 For the random sample of ten claims under investigation, the resultant fraud suspicion scores are shown in Table A3.3. These scores are ordered from smallest to largest.
A3.5.2 From the scores in Table A3.3, claims one and eight have the highest suspicion of fraud, while claims four and ten have the lowest suspicion of fraud. Brockett et al. (2002) suggest that a suitable method of deciding which claims to investigate further would be to investigate all claims that have a negative fraud suspicion score. In the above scenario, claims one and eight would be investigated further.
A3.5.3 Although the results of this method seem trivial, Brockett et al. (2002) propose that the decision to investigate specific claims based on the results of this method is a better alternative to randomly selecting claims from the set to investigate, given that all claims cannot be investigated due to limited resources.
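Steps 3 to 5 can be sketched end-to-end on a small matrix F of Ridit scores. The weights are the principal eigenvector of F'F, found here by power iteration (one simple way of carrying out the iterative weight refining), and each claim's fraud suspicion score is the weighted sum of its Ridit values. The matrix entries below are invented for illustration.

```python
import numpy as np

# Invented matrix F of Ridit scores: one row per claim,
# one column per predictor variable
F = np.array([
    [-0.4,  0.5, -0.6],
    [ 0.9, -0.3,  0.7],
    [-0.4, -0.3, -0.6],
    [ 0.5,  0.5,  0.7],
])

# Power iteration: repeatedly applying F'F converges on its principal
# eigenvector, which supplies the variable weights. (The eigenvector's
# sign is arbitrary; in practice it should be fixed so that less
# suspicious claims receive higher scores.)
w = np.ones(F.shape[1])
for _ in range(100):
    w = F.T @ F @ w
    w = w / np.linalg.norm(w)

# Fraud suspicion score per claim: the weighted sum of its Ridit values.
# Under the paper's convention, negative scores flag higher suspicion.
scores = F @ w
ranking = np.argsort(scores)  # most suspicious (lowest score) first
print(ranking)
```

Claims with negative scores (the first rows in the ranking here) would be the ones passed on for further investigation, as suggested in ¶A3.5.2.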