Duopoly insurers' incentives for data quality under a mandatory cyber data sharing regime

We study the impact of data sharing policies on cyber insurance markets. These policies have been proposed to address the scarcity of data about cyber threats, which is essential to manage cyber risks. We propose a Cournot duopoly competition model in which two insurers choose the number of policies they offer (i.e., their production level) and also the resources they invest to ensure the quality of data regarding the cost of claims (i.e., the data quality of their production cost). We find that enacting mandatory data sharing sometimes creates situations in which at most one of the two insurers invests in data quality, whereas both insurers would invest when information sharing is not mandatory. This raises concerns about the merits of making data sharing mandatory.

Not only is cyber risk increasingly found everywhere, but the interconnectedness and interdependency of the modern world also pose challenges of their own. As pointed out by Böhme and Kataria (2006), there are at least two important forms of interdependent cyber risk. First, firms are connected to each other. While this allows huge efficiency gains when exchanging information across complex supply chains, it also means that diligent security efforts at any one firm always risk being undermined by sloppy security somewhere else. The proverbial chain is never stronger than its weakest link. Second, many firms use the same systems, so a vulnerability found in a popular operating system, web browser, or encryption protocol may immediately put millions if not billions of machines at risk.
These difficulties are particularly relevant for insurers and reinsurers who underwrite cyber risks as part of cyber insurance offerings, and it has been repeatedly observed that interdependent cyber risk poses an important challenge to the development of a more mature and well-functioning cyber insurance market (Anderson and Moore, 2006; OECD, 2017, pp. 93-98). This is not the place to review the extensive literature on cyber insurance: a comprehensive but slightly dated literature review is offered by Böhme and Schwartz (2010), and more recent reviews are given by Marotta et al. (2017) and Barreto et al. (2021). Some notable complications with cyber insurance, in addition to the interdependence of cyber risks noted above, include unclear coverage, immature market offerings, various information asymmetries, and lack of cyber security experience and expertise on the part of insurers. In our context, however, the most important complication is lack of good actuarial data (see e.g. Biener et al., 2015; Franke, 2017; EIOPA, 2019; OECD, 2017, pp. 94-95).
To some extent, this lack of data reflects more general problems with cyber risk data, not limited to cyber insurance. A recent attempt to systematize quantitative studies of the consequences of cyber incidents by Woods and Böhme (2021) found several contradictory and sometimes spurious results, and cautioned against employing too simple statistical relationships. Similarly, a review of estimates of cyber risk likelihood found contradictory trends and emphasized the need for rigorous and transparent methods to avoid jumping to erroneous conclusions (Woods and Walter, 2022). In the insurance context, the difficulty of properly quantifying cyber risk forces expert-based or best-guess rather than actuarial pricing. Clearly, this may lead to undesirable outcomes, such as underpricing, where insurers unknowingly accept too much cyber risk; overpricing, where insureds pay too much for their risk transfer; or blanket exclusions of certain kinds of customers, who thus cannot reap the benefits of insurance (see, e.g., Gordon et al., 2003b; Mott et al., 2023). Against this background, it has been proposed that increased sharing of data between insurers might be beneficial.
A recent example is an analysis by the OECD (2020) on how to enhance the availability of data for cyber insurance underwriting. The report walks through existing practices such as cyber incident data being published by CERTs or regulators, information exchange (such as the CRO forum), commercial catastrophe models made available by firms such as AIR Worldwide and RMS, and reinsurer collections of aggregate data, but ultimately concludes that "[n]one of these data sources on their own provide sufficient information for underwriting coverage as incident data is seen to be incomplete, historical experience covers too few claims and models are relatively new and untested" (OECD, 2020, p. 9).
Instead, three recommendations for government action are made: (i) to remove legal obstacles to incident and claims data sharing, (ii) to encourage industry associations to establish mechanisms for incident and claims data sharing, and (iii) to encourage international collaboration.
Another recent example is a strategy note on cyber underwriting published by the European Insurance and Occupational Pensions Authority (EIOPA, 2020).
Here, lack of data is identified as a primary obstacle to the understanding of cyber risk, and accordingly, to appropriate coverage being offered on the market. It is also noted that the mandatory incident reporting regimes established by recent legislation such as the GDPR and the NIS directive will create relevant data. Against this background it is argued that access to a cyber incident database "could be seen as a public good and underpin the further development of the European cyber insurance industry and act as an enabler of the digital economy" (EIOPA, 2020, p. 3). The strategy delineated consists of EIOPA (i) promoting a harmonized cyber incident reporting taxonomy with "an aim to promote the development of a centralised (anonymised) database" (EIOPA, 2020, p. 4), (ii) engaging with the industry to understand their perspective, and (iii) encouraging data sharing initiatives.
The industry association Insurance Europe (2020), in a direct response to the EIOPA strategy, broadly welcomes the strategy's recognition that lack of data is a serious impediment to the growth of the European cyber insurance market. However, Insurance Europe also notes that there are trade-offs involved.
Specifically, it is cautioned that while a common cyber incident database should ideally be more detailed than the GDPR and NIS data it should at the same time not impose unnecessary burdens of additional reporting or IT system adaptation, and such a database should not distort competition. 1 Specifically, "if an insurer shares data it must gain access to an equal quantity and quality of data in return" (Insurance Europe, 2020, p. 2).
The relevance of data quality is underscored by recent empirical research on NIS incident reports. Based on all the mandatory NIS incident reports received by the responsible government agency in Sweden in 2020, Franke et al. (2021) find the economic aspects of the reports to be incomplete and sometimes difficult to interpret. Thus, it is concluded that "just making NIS reporting, as-is, available to insurers would not by itself solve the problem of lack of data for cyber insurance. Making the most of the reporting requires additional quality assurance mechanisms." It is this unfolding policy issue that motivates the research question of this article: What would happen to data quality under a mandatory cyber data sharing regime for insurers? To answer it, a game-theoretic model is constructed in which cyber insurers interact on a Cournot oligopoly market, but are uncertain about their (and their competitors') production costs, i.e., the true costs of the cyber incidents underwritten. When forced to share what information they do have, they cannot refuse, but they can choose whether to invest in improving the data quality of their own information, or just provide it as-is. The model can be seen as an attempt to formalize Insurance Europe's remark about sharing equal quantities and qualities of data: how would such sharing unfold? It should be stressed that both the OECD and EIOPA stop short of recommending mandatory cyber data sharing laws. Nevertheless, the question is implicitly on the table, and our investigation aims to bring one more perspective to this important issue.

1 Despite the enthusiasm of Insurance Europe about using GDPR and NIS incident reports to improve cyber insurance offerings, not everyone offering cyber insurance is even aware of this possibility, as shown by a study in Norway where the interviewed "insurers seem oblivious to this aspect of NIS" (Bahşi et al., 2019).
The rest of the paper is structured as follows. In the next section, some related work is discussed, and the contribution is positioned with respect to this literature. In Section 3 the formal model is introduced, and the main results are shown in Section 4. We find that mandatory data sharing changes the feasible Nash Equilibria (NE) and creates situations in which at most one of the two insurers invests in data quality, whereas both insurers would invest when information is not shared. The results are followed by a discussion of implications and conclusions in Section 6.

Related work
The general topic of cyber security information sharing is extensively addressed in the literature. A good starting point is the literature survey provided by Skopik et al. (2016), who offer a comprehensive and broad overview of legal, technical and organizational aspects. Koepke (2017) provides a more focused literature review on incentives and barriers, complemented by a survey of 25 respondents. Of particular interest in our context are the collaborative barriers related to "a lack of reciprocity from other stakeholders or the problem of free-riders. This barrier category also includes the risk of sharing with rivals/competitors who may use the shared information to enhance their competitive position" (Koepke, 2017, p. 4).
Turning to formal game theory, two classic treatments are offered by Gal-Or and Ghose (2005) and Gordon et al. (2003a), who show that information sharing may yield benefits to firms, but can also result in free-riding. Later works typically find similar results on whether or not to adopt a sharing regime, including its evolutionary stability (see e.g. Tosh et al., 2015, 2017). Whereas the previous treatments consider symmetric players (the potential victims of cyber attacks), asymmetric games have also been studied. Laube and Böhme (2016) devise a principal-agent model of mandatory security breach reporting to authorities (such as those mandated under the GDPR and the NIS directive). Assuming imperfect audits which cannot determine for certain whether the failure to report an incident is deliberate concealment or mere lack of knowledge, Laube and Böhme find that it may be difficult to enact the sanctions level needed for the breach notification law to be socially beneficial.
Whereas the works mentioned above treat for-profit parties interested in sharing and receiving information there are also non-profit actors who can participate in such arrangements. For instance, Dykstra et al. (2022) analyze information sharing of unclassified cyber threat information by a government institution. Such non-profit institutions may share unclassified information in order to improve social welfare, rather than maximize their own profits.
For a fuller literature review of game theory models of cyber security information sharing, see Laube and Böhme (2017), who not only summarize the literature, but also systematize it using an illuminating unified formal model. However, in our context, it is important to note that their review does not include any articles investigating information sharing among insurers. Thus, whereas information sharing between the firms at risk is a standard component in game-theoretic models of cyber risk (models which often include insurance), information sharing between the insurers underwriting the firms at risk has not yet been formally investigated using game theory, despite the policy attention described in Section 1.
Our model is inspired by the seminal work of Gal-Or (1986), which addresses information transmission in oligopolies. In her model, firms can share information about their production cost, which is unknown and different for each firm. Gal-Or (1986) finds that under Cournot competition, firms choose to share information, because they benefit when competing firms make an accurate estimate of their production cost. In our model the insurers have the same cost (e.g., they compete in the same market); hence, by sharing information a firm may reduce the uncertainty of its competitor (which is what we expect in an insurance market).
As mentioned above, we have not found any work formally investigating cyber security information sharing among insurers. Instead, the work that is most closely related to ours is a qualitative study by Nurse et al. (2020), who explore data use by cyber-underwriters in general, and the feasibility and utility of a 'pre-competitive dataset' shared within the industry in particular. Such a dataset is in fact precisely what is "encouraged" by the OECD (2020) and EIOPA (2020). However, the idea was met with considerable skepticism by the 12 cyber insurance professionals who participated in the focus groups conducted by Nurse et al. (2020). They were all concerned about the implications for competitiveness, asking why incumbents would jeopardize their advantage by sharing information with market entrants. Indeed, the very structure of such a dataset was deemed sensitive, as even proposal forms are considered proprietary, even though there are published studies based on such forms, see Woods et al. (2017). "People are insanely protective," remarked one participant (Nurse et al., 2020, p. 6).

Market model
We use a Cournot model to study situations in which two insurers compete in a market, given that the claims (risk level) are uncertain. Before giving the formal statement of the model, it is appropriate to discuss some of the modeling choices. First, the Cournot model is an oligopoly model. Thus, on the one hand, competition is not perfect: insurers make profits, which they would not if competition drove marginal prices down to equal marginal costs (Varian, 1992, pp. 180-181). To understand why competition is not perfect, recall that economies of scale and rigorous regulation raise barriers to entry, making it harder for new insurance companies to challenge the incumbents. On the other hand, insurers are not monopolists who can raise prices arbitrarily: there is competition even among oligopolists. For cyber insurance, this is confirmed by several studies: Nurse et al. (2020, p. 3) speak of "an extremely competitive cyber insurance market" and Woods and Moore (2019, p. 27) fear that "Competitive pressures drive a race to the bottom in risk assessment standards". Furthermore, the Cournot model is not an uncommon choice for modeling general (non-cyber) insurance markets (see, e.g., Gale et al., 2002; Wang et al., 2003; Cheng and Powers, 2008; Gao et al., 2016).
Second, production costs are uncertain-insurers do not know beforehand how much it will cost to produce their product, i.e. how large the indemnities owed will be. This reflects the uncertainty about cyber risk and lack of actuarial pricing described in Section 1: insurers underwriting cyber risks are uncertain about those risks.
Third, these uncertain production costs are assumed to be the same for all the market competitors. This reflects the interdependency of cyber risk described in Section 1. More precisely, for an insurer, cyber risk can only be managed up to a point by practices such as insuring customers in different geographical locations or from different industries. While such practices are effective against incident causes such as an outage at a payment service provider servicing a market of just one or a few countries, they are ineffective against other risks, such as the Heartbleed (see, e.g., Zhang et al., 2014) or Log4J (see, e.g., Srinivasa et al., 2022) vulnerabilities, or prolonged outages at major cloud service providers (Lloyd's, 2018). It is these risks, i.e., the ones that are difficult to manage, that are our concern here. With respect to these risks, then, insurers can be seen as essentially picking and insuring insureds from the very same set of eligible firms (with some firms being excluded by all insurers using similar rules of thumb). Thus, while the outcomes of claims in a particular year will certainly differ, it is not unreasonable to model these outcomes with a random variable representing a production cost that is the same for all market actors.
Indeed, such an assumption-in one form or another-is implicit in the entire discussion about data sharing.
Fourth, the uncertainties are modeled using normal random variables. The immediate rationale for this assumption is that it allows analytic calculations of conditional random variables. Therefore, it is almost always used in the extant literature on uncertainties on oligopoly markets (most prominently Gal-Or, 1986, and the secondary literature citing her). However, this should be seen as a convenient mathematical approximation, not an empirical claim. Indeed, the literature on the statistics of cyber risk instead typically suggests more heavy-tailed distributions (see, e.g., the review by Woods and Böhme, 2021), and our model does not question that. This approximation is further discussed in Section 5.
Turning to the formal model, in the Cournot competition two firms select their production levels, 2 which determine the market price of their goods. Let P = {1, 2} be the set of firms and the real quantity q_i ∈ R their production, for i ∈ P. In this case q_i represents the number of policies offered. We define the inverse demand function, i.e., the unitary price of a product, as

p_i(q_i, q_j) = a − b q_i − d q_j,    (1)

where a, b, d > 0. The value p_i represents the premium, i.e., the payment that the insurer i receives.

2 It may seem unrealistic that production levels have to be chosen this way. After all, an insurance policy differs from physical goods and is not subject to the same production constraints. However, this is too simplistic: an insurer cannot scale production arbitrarily fast. Even if the constraints are not identical to those of physical production, important constraints are indeed imposed by, e.g., the ability to hire underwriters and claims managers, by access to capital (whether from investors or from bank loans), by the capacity of the brokers who act as middlemen on the cyber insurance market (see, e.g., Franke, 2017; Woods and Moore, 2019), and by regulation.
We assume that each insurer has a linear production cost q_i C_i, where C_i is the marginal cost (the claims of each policy). For simplicity we assume that the insurers offer identical products (b = d), allowing the price to be written as a function of the total production p(q_i, q_j) = p(q_i + q_j) = a − b(q_i + q_j). We also assume that the insurers have the same marginal production cost (C_i = C), an unknown value with distribution C ∼ N(0, σ), where σ is the uncertainty about the production cost. 3 Note that the assumption that C has mean zero does not affect the results, but considerably simplifies the exposition. 4

The use of a single random cost variable merits some additional discussion. Importantly, it means that insurers do not benefit from the law of large numbers, as they would if there were instead a sum of q_i individual random cost variables, one per insured. To understand why this is reasonable, recall the discussion above about the fact that cyber risk can only be managed up to a point by practices such as insuring customers in different geographical locations or from different industries. Some risks remain, namely the ones stemming from irreducible interdependence, as discussed in Section 1. These risks, e.g., the risk that all of the insureds are hit by something like Heartbleed or Log4J, are precisely the ones that have prompted the policy interest in cyber insurance information sharing, i.e., the risks that are our concern here. These risks are well modeled by the use of a single random cost variable, and this is what the model is designed to reflect. Of course, this is not to deny that the law of large numbers works well for other risks, including (some) other cyber risks, and that this is a cornerstone of insurance. But the model developed here aims to reflect precisely the interdependent cyber risks that cannot be tamed by the law of large numbers.

3 While variance is often denoted σ^2, for simplicity we adhere to the notation of Gal-Or (1986) and denote it as σ.

4 Consider the marginal cost C′ = c̄ + C, where c̄ ∈ R and C ∼ N(0, σ) represent the fixed and variable components. With an inverse demand function of the form a′ − b q_i − d q_j the profit can be rewritten as (a′ − b q_i − d q_j − c̄ − C) q_i = (a − b q_i − d q_j − C) q_i. The last step results from setting a = a′ − c̄. We can set the parameter a′ large enough to guarantee that a > 0.
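To fix ideas, the market primitives described above can be sketched in a few lines of Python. The parameter values are our own illustration, not taken from the paper:

```python
# Sketch of the market primitives; parameter values are illustrative.
a, b = 10.0, 1.0          # demand intercept and slope (identical products, b = d)

def price(q_i, q_j):
    """Inverse demand: the premium p = a - b*(q_i + q_j)."""
    return a - b * (q_i + q_j)

def gross_profit(q_i, q_j, c):
    """Income minus production cost, for a realized common marginal cost c."""
    return (price(q_i, q_j) - c) * q_i

# Symmetric benchmark: with zero expected cost, each insurer offering a/(3b)
# policies earns the classic Cournot duopoly profit a**2 / (9*b).
q = a / (3 * b)
print(gross_profit(q, q, 0.0))
```

With the mean-zero cost normalization of footnote 4, the symmetric benchmark quantity a/(3b) yields the textbook Cournot profit a^2/(9b) per firm.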
In our model each insurer conducts a risk assessment and finds a noisy signal about the claims (i.e., the production cost), denoted

Z_i = C + E_i,  with E_i ∼ N(0, m_i).

Here m_i represents the uncertainty inherent in the signal. We assume that Z_i is a private signal that depends on investments to improve the risk assessment process and its output.
We do not explicitly model the mechanisms by which Z i can be improved. However, it is clear that many possibilities exist, ranging from better security audits before underwriting clients, to continuous SIEM-like monitoring of clients' systems, to improving DFIR (Digital Forensics and Incident Response) processes once incidents occur. It is equally clear that such possibilities entail costs. Note that as opposed to Gal-Or, we do not assume that firms deliberately garble any information. We do, however, assume that data quality is a real issue, and that it may be low in the absence of deliberate and costly efforts to improve it.
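Although the paper does not commit to a specific mechanism, the relation between investment and residual uncertainty can be sketched under an assumed exponential form m_i(h_i) = m_0 · α^(−h_i), consistent with a baseline uncertainty m_0 without investment and an efficiency parameter α > 1. The values below are illustrative assumptions:

```python
import math

m0, alpha = 2.0, 3.0   # assumed baseline uncertainty and investment efficiency

def uncertainty(h, m0=m0, alpha=alpha):
    # Assumed exponential decay of uncertainty with the investment h >= 0.
    return m0 * alpha ** (-h)

def investment(m, m0=m0, alpha=alpha):
    # Inverse relation: the spend needed to reach uncertainty m in (0, m0].
    return math.log(m0 / m) / math.log(alpha)

print(uncertainty(0.0))   # no investment leaves the baseline uncertainty m0
print(investment(m0))     # staying at the baseline costs nothing
# investment(m) grows without bound as m -> 0: eliminating all uncertainty
# is prohibitively expensive under this assumed form.
```

Under this sketch, each unit of investment divides the residual uncertainty by α, which captures the idea that data quality can be improved, but never perfected, at a cost.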
Recall the results from the study of Swedish NIS reports: "Making the most of the reporting requires additional quality assurance mechanisms" (Franke et al., 2021). Concretely, an investment h_i ≥ 0 leads to an uncertainty

m_i(h_i) = m_0 α^(−h_i),    (2)

where m_0 > 0 represents the uncertainty without any investments (e.g., when the assessment is made using publicly available data 5) and α > 1 represents the efficiency of the investments. Equivalently, reaching an uncertainty level m_i requires the investment h_i(m_i) = log(m_0/m_i)/log α. Eq. (2) implies that it is prohibitively expensive to have no uncertainty (m_i = 0). Now, the profit of firm i, denoted π_i(q_i, q_j), is equal to its income minus both production and risk assessment costs:

π_i(q_i, q_j) = (a − b(q_i + q_j) − C) q_i − h_i(m_i).    (3)

5 Public reports, like the studies conducted by NetDiligence or the Ponemon Institute, may offer some information about the cyber risks of different industries. In addition, information sharing may take place among business partners, e.g., between insurance firms and their reinsurance providers and/or third parties that offer technical support.

Game formulation

In our Cournot model the insurers make two decisions at different stages (see Fig. 1):

1. In the first stage insurers make investments and commit to an information sharing policy. This is equivalent to selecting the uncertainty level m_i, which in turn determines the investment h_i(m_i). We assume that the sharing policy is defined before the game starts.

2. In the second stage the marginal production cost C is realized and each firm gets an estimate Z_i. Then the information transmission takes place, and each firm uses the information available, represented as t_i, to select the production quantity q_i(t_i). The information available is t_i = (Z_i, Z_j) when insurers share their cost estimations; otherwise it is t_i = (Z_i).
Thus, the strategy of each firm has the form (m_i, q_i(t_i)); it constitutes a subgame perfect equilibrium if it is a Nash Equilibrium (NE) in each stage of the game. We start by analyzing the second-stage game to determine the production q_i(t_i) (see Fig. 1). With this result we build the game in the first stage and then formulate the problem of selecting the optimal uncertainty (noise level) of the data, m_i. Some of the results in this section resemble the findings of Gal-Or (1986), because the Cournot model there is similar (at a high level) to ours; however, the precise solution of our model is different. We find the precise profit function for each scenario in the next section.

Second stage (low level) game
In this stage m i and m j are given and each firm chooses a production q i (t i ) that maximizes their expected profits given the available information t i (Fig. 1).
We define the game in the second stage as

G_2 = ⟨P, (S_i)_{i∈P}, (W_i)_{i∈P}⟩,    (4)

where P is the set of players, S_i = R is the strategy space, and W_i is the payoff function of the i-th firm, which in this case corresponds to the expected profit in a Cournot competition (see Eq. (3)) given the signal t_i. The following result shows the form of W_i as a function of the optimal production q_i(t_i).

Lemma 1. The utility of the game G_2 defined in Eq. (4) is

W_i = b q_i(t_i)^2,    (5)

where the optimal production q_i(t_i) satisfies

q_i(t_i) = (a − Ĉ_i − b E{q_j(t_j)|t_i}) / (2b).

Proof. The expected profit in a Cournot competition (see Eq. (3), with the risk assessment cost h_i(m_i) already sunk at this stage) is

W_i = E{(a − b(q_i + q_j) − C) q_i | t_i}.    (6)

The expectation is taken with respect to the unknown parameters, such as the cost C (and the signal Z_j when the firms do not share information). Thus,

W_i = (a − b q_i − b E{q_j(t_j)|t_i} − Ĉ_i) q_i,

where Ĉ_i = E_C{C|t_i} is an estimate of the production cost and E{q_j(t_j)|t_i} is the estimated production of the adversary, given the available observation t_i.

The optimal production must satisfy the following first order condition (FOC):

∂W_i/∂q_i = a − 2b q_i − b E{q_j(t_j)|t_i} − Ĉ_i = 0.    (7)

In this case the optimal production is unique, since the expected profit is concave with respect to q_i: ∂^2 W_i/∂q_i^2 = −2b < 0. Now, from Eq. (7) the optimal production satisfies

q_i(t_i) = (a − Ĉ_i − b E{q_j(t_j)|t_i}) / (2b).    (8)

Replacing Eq. (8) in Eq. (6) we obtain W_i = b q_i(t_i)^2.
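The reduction of the conditional expected profit to b·q_i(t_i)^2 at the first-order condition can be checked numerically. The parameter values below are illustrative, and the closed forms follow our reconstruction of the lemma:

```python
# Numeric check: at the FOC solution, the conditional expected gross profit
# of insurer i reduces to b * q_i(t_i)**2.
a, b = 10.0, 1.0
C_hat = 1.3          # E[C | t_i], the insurer's cost estimate
E_qj = 2.7           # E[q_j(t_j) | t_i], estimated production of the competitor

def expected_profit(q):
    # Conditional expected gross profit of insurer i given t_i
    return (a - b * q - b * E_qj - C_hat) * q

q_star = (a - C_hat - b * E_qj) / (2 * b)   # unique FOC solution
assert abs(expected_profit(q_star) - b * q_star ** 2) < 1e-12
print(q_star, expected_profit(q_star))
```

The identity holds for any cost estimate and any belief about the competitor's production, since the FOC equates a − b·q − b·E{q_j|t_i} − Ĉ_i with b·q.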

First stage (upper level) game
We define the game in the first stage (see Fig. 1) as

G_1 = ⟨P, [0, m_0], (J_i)_{i∈P}⟩,    (9)

where insurers can select an uncertainty level m_i ∈ [0, m_0] and the utility J_i is the expected profit, given that firms choose q_i(t_i) in the second stage (see Lemma 1). In this stage t_i has not been realized; hence, the optimal production q_i(t_i) can be seen as a random variable. The following result shows the form of J_i.

Lemma 2. The utility of the game G_1 defined in Eq. (9) is

J_i(m_i, m_j) = b (V_i + M_i^2) − h_i(m_i),    (10)

where M_i = E{q_i(t_i)} and V_i = Var{q_i(t_i)} are the first and second (central) moments of the optimal production q_i(t_i), respectively (see Eq. (5)).

Proof. In the first stage the utility (expected profit) is

J_i(m_i, m_j) = E{b q_i(t_i)^2} − h_i(m_i).    (11)

The expectation is with respect to the signals Z_i and Z_j, which have not been realized at this stage. Let us define M_i = E{q_i(t_i)} and V_i = Var{q_i(t_i)}, so that E{q_i(t_i)^2} = V_i + M_i^2. Hence, the expected profit in Eq. (11) can be written as Eq. (10).
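The step from the expected squared production to mean and variance is the moment identity E{q^2} = Var(q) + (E{q})^2, which a quick simulation confirms (the distribution of q below is purely illustrative):

```python
import random

random.seed(1)
b = 1.0
# Draws standing in for the (not yet realized) optimal production q_i(t_i).
q = [random.gauss(3.3, 0.8) for _ in range(10_000)]

second_moment = sum(x * x for x in q) / len(q)
mean = sum(q) / len(q)
var = second_moment - mean * mean          # population variance of the sample

# The decomposition used to pass from Eq. (11) to Eq. (10):
assert abs(b * second_moment - b * (var + mean * mean)) < 1e-9
print(mean, var)
```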
Remark 1. We cannot guarantee that G_1 has a Nash Equilibrium (NE). Also, rather than finding the precise NE of G_1, we focus on finding the conditions under which investing in risk assessment is feasible or not. Thus, we classify the possible NE in the following categories:

• Neither insurer invests: (m_0, m_0).

• Only one insurer invests: (m, m_0), with m ∈ (0, m_0).

• Both insurers invest the same amount: (m, m), with m ∈ (0, m_0).

Cost Estimation
Since both the cost C and the noise E_i are normally distributed, the sample Z_i = C + E_i is also normally distributed, Z_i ∼ N(0, σ + m_i). This property makes it easier to find closed form expressions for the variables of interest. For instance, the estimate of the cost conditional on the sample Z_i is

E{C|Z_i} = δ_i Z_i,    (12)

with δ_i = σ/(σ + m_i). Moreover, we can find the expected cost given two observations, Z_i and Z_j, using the multivariate normal distribution:

E{C|Z_i, Z_j} = (σ m_j Z_i + σ m_i Z_j) / k_0,    (13)

where k_0 = σ m_i + σ m_j + m_i m_j. Fig. 2 shows the bivariate distribution of Z_i and Z_j with parameters m_i = m_j = 2 and σ = 4.

The samples Z_i and Z_j are correlated through the cost C; hence, Cov(Z_i, Z_j) = σ. For this reason insurer i can estimate the sample of its adversary Z_j as

E{Z_j|Z_i} = δ_i Z_i.    (14)

Note that sharing information creates a conflict, because although insurers may benefit, they may also help the competing insurer. Fig. 3 shows an example of the cost distribution given some observations.
In this case each additional observation reduces the uncertainty of the cost (variance). Note that the insurers use E[C|Z_i] to make decisions in the second stage, rather than the random variable (C|Z_i) depicted in Fig. 3.
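The Gaussian conditioning formulas can be verified by solving the 2×2 normal-equations system Cov(Z)·w = Cov(C, Z) by hand. Here σ denotes a variance, following the paper's notation, and the numbers are illustrative:

```python
# Check of the conditional-expectation coefficients for Z_k = C + E_k,
# with Var(C) = sigma and Var(E_k) = m_k.
sigma, m_i, m_j = 4.0, 2.0, 3.0

# One signal: E[C | Z_i] = delta_i * Z_i
delta_i = sigma / (sigma + m_i)

# Two signals: Cov(Z) = [[sigma+m_i, sigma], [sigma, sigma+m_j]],
# Cov(C, Z) = [sigma, sigma]; solve for the weights w_i, w_j by Cramer's rule.
det = (sigma + m_i) * (sigma + m_j) - sigma * sigma
w_i = (sigma * (sigma + m_j) - sigma * sigma) / det
w_j = ((sigma + m_i) * sigma - sigma * sigma) / det

k0 = sigma * (m_i + m_j) + m_i * m_j
assert abs(det - k0) < 1e-12                  # the determinant equals k_0
assert abs(w_i - sigma * m_j / k0) < 1e-12    # weight of Z_i: sigma*m_j / k_0
assert abs(w_j - sigma * m_i / k0) < 1e-12    # weight of Z_j: sigma*m_i / k_0
print(delta_i, w_i, w_j)
```

Note how the less noisy signal (smaller m) receives the larger weight, and how a single signal collapses to the familiar shrinkage factor δ_i = σ/(σ + m_i).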

Analysis of the market's equilibria
In this section we find the optimal production q i (t i ) and the utility function in the upper level game J i for each scenario (sharing and non-sharing information). Then we find the possible equilibria of the game and illustrate them with examples.

Market with information sharing
In this case the firms share their private information Z_i. Thus, they have the same signal t_i = t_j = t = (Z_i, Z_j), and therefore make the same cost estimate Ĉ_i = Ĉ_j = Ĉ (see Eq. (13)). Moreover, since the firms have the same characteristics and possess the same information, there is no uncertainty about the production of the adversary, because E{q_j(t_j)|t_i} = E{q_j(t)|t} = q_j(t).

In addition, they produce the same quantity q_i(t) = q_j(t). Thus, from Eq. (8) we get

q_i(t) = (a − Ĉ) / (3b).    (15)

Now, recall that the cost C, E_i, and E_j are independent normal random variables. Thus, in the first stage, when neither C, Z_i nor Z_j have been realized, we can see Ĉ as a random variable (see Eq. (13)): Ĉ = k_i Z_i + k_j Z_j, with k_i = σ m_j / k_0, k_j = σ m_i / k_0, and k_0 = σ(m_i + m_j) + m_i m_j. Thus, the cost estimate Ĉ is normally distributed, Ĉ ∼ N(0, σ̂), with

σ̂ = σ^2 (m_i + m_j) / k_0.    (16)

Now, from Eq. (15) the optimal production q_i can be seen as a random variable q_i ∼ N(a/(3b), σ̂/(9b^2)). Therefore, the profit of the game G_1 (see Eq. (10)) is

J_i(m_i, m_j) = (a^2 + σ̂) / (9b) − h_i(m_i).

The following result shows that only one firm may invest in risk assessment.
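The variance of the pooled estimate Ĉ = k_i Z_i + k_j Z_j can be cross-checked against the closed form σ̂ = σ^2(m_i + m_j)/k_0 (our reconstruction) by direct covariance algebra, using illustrative numbers:

```python
# Variance of the shared-information cost estimate C_hat = k_i*Z_i + k_j*Z_j.
sigma, m_i, m_j = 4.0, 2.0, 3.0
k0 = sigma * (m_i + m_j) + m_i * m_j
k_i, k_j = sigma * m_j / k0, sigma * m_i / k0

# Var(Z_i) = sigma + m_i, Var(Z_j) = sigma + m_j, Cov(Z_i, Z_j) = sigma.
var_chat = (k_i ** 2 * (sigma + m_i) + k_j ** 2 * (sigma + m_j)
            + 2 * k_i * k_j * sigma)

sigma_hat = sigma ** 2 * (m_i + m_j) / k0
assert abs(var_chat - sigma_hat) < 1e-12
print(sigma_hat)   # variance of C_hat; production is then q = (a - C_hat)/(3b)
```

Since σ̂ increases when either m decreases, better data (from either insurer) makes the shared estimate more informative, and the expected profit (a^2 + σ̂)/(9b) rises for both firms.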
Proposition 1. A duopoly in which insurers share information can have two types of Nash equilibria (but only one of these scenarios can occur in a given game):

• Neither firm invests (m_i = m_j = m_0) if m_0 < 36b/log α, or if m_0 > 36b/log α and σ < σ̃ = 36 b m_0 / (m_0 log α − 36b).

• Only one insurer invests (e.g., m_i ≤ m_0 and m_j = m_0) if m_0 > 36b/log α and σ > σ̃.

The reader can find the proof of this and the following results in the appendix. Fig. 5 shows examples of the two possible equilibria, (m_0, m_0) and (m, m_0). In this case, free-riding can occur because at most one insurer invests in data quality (when m_0 is large, i.e., when the data quality is low without any investment). In these examples we use the following parameters: a = 10, b = 1, α = 3, and m_0 = 1.5 · 36b/log α. Moreover, for the NE examples we use σ = 1.2 σ̃ (Fig. 5a) and σ = 0.9 σ̃ (Fig. 5b).

Market without information sharing
In this case the information available is t_i = (Z_i). From Gal-Or (1986) and Radner (1962), the decision rules must be affine in the vector of observations (in this case Z_i); therefore,

q_i(t_i) = α_0^i + α_1^i Z_i,    (17)

for some constants α_0^i and α_1^i. From Eq. (17) we can find the expected production of the adversary given the information t_i:

E{q_j(t_j)|t_i} = α_0^j + α_1^j E{Z_j|Z_i}.    (18)

We can use Eq. (14) to rewrite Eq. (18) as E{q_j(t_j)|t_i} = α_0^j + α_1^j δ_i Z_i. Substituting this and Ĉ_i = δ_i Z_i (Eq. (12)) in Eq. (8) yields

q_i(t_i) = (a − b α_0^j)/(2b) − δ_i (1 + b α_1^j)/(2b) Z_i.    (19)

Eq. (17) and Eq. (19) are equivalent; therefore the coefficients α_0^i and α_1^i must satisfy

α_0^i = (a − b α_0^j)/(2b),   α_1^i = −δ_i (1 + b α_1^j)/(2b),

which have the following solution:

α_0^i = a/(3b),   α_1^i = −δ_i (2 − δ_j) / (b (4 − δ_i δ_j)).

Now, in the first stage the optimal production q_i(t_i) can be seen as a random variable. Since Z_i is normal, the production is also normal (see Eq. (17)):

q_i ∼ N( a/(3b), (α_1^i)^2 (σ + m_i) ).

Then, the profit of G_1 is

J_i(m_i, m_j) = a^2/(9b) + b (α_1^i)^2 (σ + m_i) − h_i(m_i).

The following result states that the market can have three types of equilibria.
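The closed-form slopes (our reconstruction of the solution above) can be checked against the best-response system α_1^i = −δ_i(1 + b α_1^j)/(2b), again with illustrative numbers:

```python
# Check that the closed-form slopes solve the no-sharing best-response system.
b = 1.0
sigma, m_i, m_j = 4.0, 2.0, 3.0
d_i = sigma / (sigma + m_i)     # delta_i
d_j = sigma / (sigma + m_j)     # delta_j

a1_i = -d_i * (2 - d_j) / (b * (4 - d_i * d_j))
a1_j = -d_j * (2 - d_i) / (b * (4 - d_i * d_j))

# Each slope must be the best response to the other's slope.
assert abs(a1_i + d_i * (1 + b * a1_j) / (2 * b)) < 1e-12
assert abs(a1_j + d_j * (1 + b * a1_i) / (2 * b)) < 1e-12
print(a1_i, a1_j)
```

The slopes are negative, as expected: a higher private cost signal leads an insurer to underwrite fewer policies.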
Unlike the previous case, not sharing information creates the conditions to have both insurers investing in data quality.
Proposition 2. A duopoly in which insurers do not share information can have three types of equilibria, for different values of σ and m_0: neither insurer invests, (m_0, m_0); only one insurer invests, (m, m_0); or both insurers invest, (m, m). Investments in data quality occur only when m_0 is large, that is, when the data quality is low without any investment.

Fig. 7 shows two examples of the possible equilibria in the market without information sharing. In particular, Fig. 7b shows that, unlike in the market with information sharing enforced, one or both firms can invest in data quality in equilibrium. In these examples m_0 = 1.5 · 36b/log α (the same as in the example in Fig. 5). Fig. 7a has σ = 0.9 σ̃ and Fig. 7b σ = 1.2 σ̃.
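The free-riding mechanism behind these results can be illustrated numerically: under mandatory sharing, one insurer's investment in data quality raises the competitor's expected gross profit as well, so the competitor can benefit without paying. The sketch below uses our reconstructed first-stage gross profit (a^2 + σ̂)/(9b) and illustrative parameters:

```python
import math

# Free-riding externality under mandatory sharing (illustrative parameters,
# with m0 = 1.5 * 36b / log(alpha) as in the paper's examples).
a, b, alpha = 10.0, 1.0, 3.0
m0 = 1.5 * 36 * b / math.log(alpha)
sigma = 100.0                          # illustrative cost variance

def sigma_hat(m_i, m_j):
    k0 = sigma * (m_i + m_j) + m_i * m_j
    return sigma ** 2 * (m_i + m_j) / k0

def gross_profit_sharing(m_i, m_j):
    # Expected gross profit (a**2 + sigma_hat)/(9b), before investment costs.
    return (a ** 2 + sigma_hat(m_i, m_j)) / (9 * b)

# If insurer i unilaterally improves its data (m_i << m0), insurer j's gross
# profit rises too, although j invested nothing.
base = gross_profit_sharing(m0, m0)
after = gross_profit_sharing(m0 / 10, m0)
assert after > base
print(base, after)
```

Because the improvement in the shared estimate accrues to both firms while its cost is borne by one, investment becomes a public good within the sharing pool, which is the intuition for why mandatory sharing can support at most one investing insurer.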

Relaxing the normal approximation
In this section we argue that the previous results remain valid, to some extent, when we consider extreme events in the costs, i.e., cost distributions with heavier right tails than the normal distribution, as is often the case with cyber risks. Let us consider a cost distribution with pdf

f(x) = w_1 f_1(x) + w_2 f_2(x),

where w_1, w_2 ≥ 0, w_1 + w_2 = 1, f_1 is a normal pdf, and f_2 a Generalized Pareto Distribution (GPD). In this case, claims that exceed the threshold x_0 are modeled with the GPD. Now, an estimate of the cost given some information t_i is Ĉ_i = E{C|t_i}. Let us decompose the estimate into two terms, one corresponding to the most frequent events and one to the tail:

Ĉ_i = C̄_i + ǫ_i.

Here C̄_i and ǫ_i represent the lower and upper estimates of the cost. We assume that the first term C̄_i is close to the cost estimate obtained assuming a normal distribution. Intuitively, estimates assuming a normal distribution ignore the contribution of the tail.
The optimal amount of policies issued by each insurer (see Eq. (5)) then changes accordingly. Note that the optimal production is lower when we consider the costs from the tail. Now, let us express the optimal production as q_i(t_i) = q̄_i(t_i) + δ_i, where q̄_i(t_i) is the optimal production assuming a normal distribution. Substituting Eq. (21) into Eq. (20) and rewriting, the last step follows from using Lemma 1 to get the optimal production when we estimate the cost assuming a normal distribution (i.e., using the cost estimate C̄_i). From Eq. (21) and Eq. (22) we conclude Eq. (23), which expresses the impact of the tail on the optimal decision (in the second stage of the game). Now we can analyze how the tail affects the game in the first stage. Concretely, the utility function becomes Eq. (24) (see Lemma 2). Note that δ_i depends on both ε_i and ε_j. Thus, we argue that if the tail of the cost has finite mean and variance, then we can approximate Eq. (24) with J̄_i(m_i, m_j), where J̄_i(m_i, m_j) represents the utility in the first stage that we obtain using the normal approximation. If the tail of the cost has finite mean and variance, then we can find some bounds φ_i, φ̄_i > 0 such that the approximation error is bounded. It follows from Lemma 10 that the NE obtained assuming a normal distribution is close to the NE that we would obtain considering the tail of the distribution (see the discussion of ε-equilibria in Myerson, 1978; Fudenberg and Levine, 1986).
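To see qualitatively why a positive tail term lowers production, consider a standard linear Cournot stage. The demand specification and all numbers below are assumptions for illustration; the paper's Eq. (5) may differ in detail, but the monotonicity in the cost estimate is the point.

```python
def cournot_best_response(a, b, cost_estimate, q_other):
    """Best response q_i = (a - cost - b * q_other) / (2b) for inverse demand
    p = a - b (q_i + q_j); it is decreasing in the estimated unit cost."""
    return max(0.0, (a - cost_estimate - b * q_other) / (2.0 * b))

a, b = 100.0, 1.0   # illustrative demand intercept and slope
q_other = 20.0      # rival's production, held fixed
c_body = 30.0       # body-only cost estimate (the normal approximation)
eps_tail = 5.0      # positive tail contribution to the cost estimate

q_normal = cournot_best_response(a, b, c_body, q_other)           # ignores the tail
q_tail = cournot_best_response(a, b, c_body + eps_tail, q_other)  # includes it
delta = q_tail - q_normal  # the correction term; negative whenever eps_tail > 0
```

Here delta plays the role of δ_i: a larger tail estimate pushes the second-stage production below its normal-approximation value.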
How close depends on the size of the tail.

A second aspect is the normal approximation discussed in Section 5. A third aspect is that the model treats a duopoly rather than the more general oligopoly situation. Now, there is good reason to believe that our results should generalize to the oligopoly situation: Raith (1996, see 3.a, p. 263) has shown that for Cournot markets, the results from Gal-Or (1986) and similar duopoly studies are valid for oligopolies as well. A detailed investigation of the oligopoly case, however, is beyond the scope of this paper. Nonetheless, we would expect insurers to have lower incentives to invest in data quality, for two reasons: 1) the profit of firms will be lower with each additional competitor; and 2) imposing data sharing in an oligopoly will give each firm access to more data.

Discussion and Conclusions
In addition to the simplifications of the model, the NE concept also has some limitations. Concretely, the existence of a particular NE only says that if the players are there, none of them has anything to gain from unilaterally deviating. However, if they are not there, the NE concept does not provide any mechanism for how the NE could be reached. This means that it is not possible to say which outcome will actually occur when there are multiple NEs. Despite this, our analysis is important because it reveals strategic tensions between the players. For example, in practice, situations that create opportunities for free-riding may result in no investments, because each firm will try to free-ride.
Future research directions include analyzing whether information sharing policies benefit insurers and consumers (despite creating free-riding scenarios) and designing incentives that could improve data quality. It would also be interesting to consider other cost and risk-aversion functions on the part of the insurers, as well as extending the treatment of alternative cost distributions in Section 5.

Appendix A. Additional results and proofs
The following results show some properties of the decision of each firm. We use these results to analyze the equilibria of the market with different data sharing policies.
Lemma 3. Let m_i be a feasible decision in the game G_1 when the adversary selects m_j. Then the following is satisfied, where J'_i and J''_i represent the first and second derivatives of J_i with respect to m_i.

Proof of Lemma 3. We formulate the decision of each firm as the following optimization problem. The noise level m_i is a local maximum if the following necessary conditions hold, with μ_1, μ_2 ≥ 0. Moreover, the second-order sufficient condition must be satisfied. Now let us analyze the strategies that insurer i may choose: • m_i = m_0 is a valid solution if μ_2 ≥ 0, μ_1 = 0, and −J'_i(m_0, m_j) + μ_2 = 0, which means that μ_2 = J'_i(m_0, m_j) ≥ 0, together with the second-order condition.

The next result shows that, except in a special case, (m, m_0) and (m_0, m_0) cannot be NE simultaneously. Thus, we assume that only one of these NE can occur.

Lemma 5. If J'_i(m_i, m_j) is decreasing with respect to m_j, then the best response of insurer i, denoted m*_i(m_j), is decreasing with respect to m_j. In such cases, if (m_0, m_0) is a feasible NE, then no other NE exists.
Proof. Suppose that m*_i(m_j) < m_0 is the best response to the strategy m_j ∈ [0, m_0]. This means that m*_i(m_j) is a local maximum, i.e., J'_i(m*_i(m_j), m_j) = 0 and J''_i(m*_i(m_j), m_j) < 0 (see Lemma 3). Then the first-order condition applies for two values m̃_j and m̂_j; let us assume without loss of generality that m̃_j < m̂_j. If J'_i(m_i, m_j) is decreasing with respect to m_j, then Eq. (A.2) holds. Now, replacing Eq. (A.2) in Eq. (A.1) we obtain m*_i(m̃_j) ≥ m*_i(m̂_j). In other words, the best response function m*_i(m_j) is decreasing with respect to m_j.
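The monotonicity argument can be checked on a toy payoff whose derivative J'_i is decreasing in m_j; the quadratic functional form below is purely illustrative and is not the model's utility.

```python
def best_response(m_j, m0=1.0, grid_n=10_001):
    """Numerically maximize the toy concave payoff
    J_i(m_i, m_j) = -(m_i - (1 - 0.5 * m_j))**2 over m_i in [0, m0].
    Its derivative in m_i is decreasing in m_j, as Lemma 5 requires."""
    best_m, best_val = 0.0, float("-inf")
    for k in range(grid_n):
        m_i = m0 * k / (grid_n - 1)
        val = -(m_i - (1.0 - 0.5 * m_j)) ** 2
        if val > best_val:
            best_m, best_val = m_i, val
    return best_m

responses = [best_response(m_j) for m_j in (0.0, 0.25, 0.5, 0.75, 1.0)]
# Best responses 1.0, 0.875, 0.75, 0.625, 0.5: decreasing in m_j, as the lemma predicts
```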
Appendix A.1. Market with information sharing

Now we are ready to prove Proposition 1.
Proof of Proposition 1. In this proof we first find the form of the possible NE and then find the conditions that guarantee they are feasible. From Eq. (16), the first and second derivatives of the profit are as follows.
Let us show that a NE in which both firms invest (m_i, m_j < m_0) does not exist. Note that (m_i, m_j), with 0 < m_i, m_j < m_0, is a feasible equilibrium if it satisfies J'_i(m_i, m_j) = 0 for each firm. This FOC can be rewritten so that the right-hand side is identical for each firm, which implies m_i = m_j (this is the only possible solution). Replacing m_i = m_j = m, the second derivative evaluated at m_i = m, after substituting Eq. (A.5), leads to a quadratic equation of the form A m² + B m + C = 0, with A = 9b(σ + m_0)², B = 18bσm_0(σ + m_0) − σ²m_0² log α, and C = 9bσ²m_0². The solution of Eq. (A.6) has the well-known form m = (−B ± √(B² − 4AC)) / (2A). Now, let us investigate the conditions under which Eq. (A.7) has no valid solution.
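The feasibility check on this quadratic is easy to carry out numerically. The coefficients A, B, C below are exactly those stated in the proof; the parameter values (b, σ, m_0, log α) are illustrative assumptions, not calibrated values from the paper.

```python
import math

def interior_candidates(b, sigma, m0, log_alpha):
    """Roots of A m^2 + B m + C = 0 with the coefficients from the proof,
    kept only if they are real and lie in the open interval (0, m0)."""
    A = 9.0 * b * (sigma + m0) ** 2
    B = 18.0 * b * sigma * m0 * (sigma + m0) - sigma ** 2 * m0 ** 2 * log_alpha
    C = 9.0 * b * sigma ** 2 * m0 ** 2
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return []  # no real root: no interior symmetric candidate
    roots = [(-B + s * math.sqrt(disc)) / (2.0 * A) for s in (1.0, -1.0)]
    return [m for m in roots if 0.0 < m < m0]

# With a small log_alpha the discriminant is negative and no candidate survives;
# a larger log_alpha yields exactly one candidate inside (0, m0).
no_cand = interior_candidates(b=1.0, sigma=1.0, m0=1.0, log_alpha=40.0)
cand = interior_candidates(b=1.0, sigma=1.0, m0=1.0, log_alpha=100.0)
```

This mirrors the case analysis that follows: when Eq. (A.7) has no valid root, the only remaining equilibrium candidate is (m_0, m_0).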
In other words, these are cases in which J'_i(m_i, m_0) > 0 for all m_i ∈ [0, m_0]. If this happens, then the game can have a single NE, namely (m_0, m_0) (see Lemma 3).
Lemma 9. Consider a game G_1 without sharing information. If (m, σ²/(4m)) is a feasible solution, then (m, m) is also a feasible solution, but the converse is not necessarily true.

From Lemma 6 we know that the previous expression has a positive solution if 9 − γ < 0, that is, if m_0 > 9b/log α.
Lastly, the next result gives conditions for an ε-Nash equilibrium: