Making Better Decisions : Can Minimizing Frequentist Risk Help ?

The concept of shrinking bet size in Kelly betting to minimize estimated frequentist risk has recently been mooted. This rescaling appears to conflict with Bayesian decision theory through the likelihood principle and the complete class theorem; the Bayesian solution should already be optimal. We show theoretically and through examples that when the model determining the likelihood function is correct, the prior distribution (if not dominated by the data) is 'correct' in a frequentist sense, and the posterior distribution is proper, no further rescaling is required. However, if the model or the prior distribution is incorrect, or the posterior distribution improper, frequentist risk minimization can be a useful technique. We discuss how it might best be exploited. Another example, from maintenance, is used to show the wider applicability of the methodology; these conclusions apply generally to decision-making.

Baker and McHale (2013) introduced the concept of bet shrinkage in Kelly betting, for the case where the probability of winning is not accurately known. They derived approximations to the amount of shrinkage required, and showed that shrinking the bet size increased expected utility both in a simulated gambling situation and in tennis betting. Their methodology is frequentist, and can be applied in contexts other than Kelly betting. It will however doubtless raise Bayesian hackles, because Bayesian decision theory would seem to state that no such rescaling is required, and it is underpinned by two important results, the likelihood principle and the complete class theorem (see e.g. Robert, 2007). If Bayesian decision theory could in general be improved by a frequentist tweak, this would strike at the very foundations of decision theory. Can the circle be squared, so that bet shrinkage based on frequentist criteria is admitted as a valid procedure, whilst leaving Bayesian decision theory intact?
This paper gives an answer to this question, and clarifies the circumstances under which the type of adjustment recommended by Baker and McHale might be useful, and how it could best be used in decision making. We first briefly recap the logic of bet shrinkage and explain how it relates to Bayesian decision theory, before describing Monte-Carlo simulations and an analysis of a large tennis dataset. We also look at equipment replacement, deliberately choosing a decision problem quite different from that of Baker and McHale (2013), to see from a practical viewpoint whether the methodology could be of general use. Finally, we draw some conclusions as to how rescaling the decision variable might be used in decision theory.
The example that was used to motivate bet shrinkage is that of betting on tosses of a die. In the 17th century, the Chevalier de Méré famously made money by making equal-odds bets that he could throw an 'ace' (a six) in 4 tosses of a die, for which the probability is 1 − (5/6)^4 ≃ 0.5177 (e.g. David, 1987). He consistently lost money in a subsequent bet that he could throw a double six in 24 tosses of two dice (probability 1 − (35/36)^24 ≃ 0.4914), leading him to seek help from Blaise Pascal. Suppose we (wisely) decide to accept this second bet, but, like the Chevalier, are unable to compute the probability of winning. We might try a simulation, tossing two dice 24 times, repeating this procedure n times, and so estimating the probability θ of winning as θ̂ = m/n, where m is the number of times that a double six does not come up. This would give some information about θ, but not the very accurate knowledge we could gain from simple probability theory, where the only inexactness in our knowledge of θ arises from tiny inaccuracies in the manufacture of the dice. We focus on this example, although Baker and McHale (2013) also consider the general problem of deciding how much to bet when the bettor has only an inexact estimate (maybe a guess) of the probability of winning.
The decision we as investors face is how much of our bankroll to invest, given our knowledge (belief) about θ and the odds offered. For guidance on this, we could use the Kelly betting formula (Kelly, 1963), which is based on a logarithmic utility. Given odds of b (here b = 1), after betting a proportion δ of our bankroll, the utility from a unit bankroll becomes ln(1 + bδ) with probability θ and otherwise becomes ln(1 − δ). Maximising the expected utility gives δ = {(b + 1)θ − 1}/b. Maclean, Thorp and Ziemba (2012) include a comprehensive collection of papers on the Kelly criterion, and Poundstone (2005) also describes its use in finance.
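As a concrete check of the Kelly formula, a short calculation for the Chevalier's second bet at even odds (a sketch; the function names are ours):

```python
import math

def kelly_fraction(theta, b):
    """Kelly bet as a fraction of bankroll: delta = ((b+1)*theta - 1)/b,
    floored at zero so we never bet with a negative edge."""
    return max(((b + 1) * theta - 1) / b, 0.0)

def expected_log_utility(delta, theta, b):
    """Expected log growth of a unit bankroll after betting fraction delta."""
    return theta * math.log(1 + b * delta) + (1 - theta) * math.log(1 - delta)

# Betting against a double six in 24 throws of two dice, at even odds (b = 1):
theta = (35 / 36) ** 24           # our win probability, about 0.5086
delta = kelly_fraction(theta, 1)  # about 0.0172 of the bankroll
growth = expected_log_utility(delta, theta, 1)  # small but positive
```

Overbetting is penalized: staking, say, half the bankroll on this thin edge gives a sharply negative expected log utility.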
If we knew the value of θ, we could make an optimum bet of δ = 2θ − 1 ≃ 0.0172 of our bankroll. Because we have only an estimate of θ, we will bet more or less than the optimum if we use the Kelly formula with our 'plug-in' estimate θ̂, or with the mean of the posterior distribution of θ derived using say a Jeffreys prior.
Intuitively, we could imagine many copies of ourselves deciding how much to bet, and we could generate the results that they might obtain through dice tossing by bootstrapping (e.g. Efron and Tibshirani, 1993) the actual results, or, equivalently, by using the sampling distribution of θ̂. The finding is then that the average utility of these bootstrapped decisions can be increased by shrinking the bet size, and we apply this shrinkage factor to the bet we are about to make. This is the motivation behind bet shrinkage as described by Baker and McHale (2013). The concept of considering multiple copies has some affinity with the 'wisdom of many in one mind' proposed by Vul and Pashler (2008) and Herzog and Hertwig (2009), in that it seeks to improve an individual decision by splitting it into multiple decisions.
That bet shrinkage is reasonable can be seen very easily: if we carried out only, say, two sets of tosses for the gambling example, we would obtain θ̂ = 1 on roughly a quarter of occasions. The Kelly formula with the estimated probability of winning plugged in would then lead us to bet our entire bankroll, which we would lose with probability nearly one half. This illustrates the necessity of bet shrinkage.
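This small-sample disaster is easy to verify by simulation (a sketch; the number of repetitions and the seed are our illustrative choices):

```python
import random

# Plug-in Kelly betting after only two practice sets of 24 throws:
# theta_hat = 1 occurs with probability theta**2, and then the plug-in
# Kelly bet stakes the whole bankroll.
random.seed(1)
theta = (35 / 36) ** 24        # true win probability, about 0.5086
reps = 100_000
ruined = 0
for _ in range(reps):
    m = sum(random.random() < theta for _ in range(2))  # two practice trials
    delta = max(2 * (m / 2) - 1, 0.0)                   # plug-in even-odds Kelly bet
    if delta == 1.0 and random.random() >= theta:
        ruined += 1                                     # staked everything and lost
ruin_rate = ruined / reps      # close to theta**2 * (1 - theta), about 0.127
```

So roughly one bettor in eight is ruined outright by the unshrunk plug-in rule after two practice trials.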

Bayesian Decision Theory and Bet Shrinkage
We now examine what is implied in shrinking bet sizes from the viewpoint of Bayesian decision theory. This theory, like all Bayesian inference, is underpinned by the likelihood principle, which states that inference should depend only on the data actually observed, and by the complete class theorem, which states that Bayes estimators are admissible (no other estimator has smaller risk for all values of θ). Note however that there has always been some doubt over the likelihood principle, and Birnbaum's proof of it (see e.g. Robert, 2007) is currently disputed by Mayo (2010). Lehmann (2008) gives an intuitive summary of the complete class theorem: 'The most important link between the two approaches [Bayesian and frequentist] is the crucial result of Wald that, roughly speaking, any sensible statistical procedure is a Bayes procedure corresponding to some (proper or possibly improper) prior distribution.' Following chapter 2 of Robert (2007), given a loss function L(θ, δ) for carrying out action δ when the parameter space is Θ, the parameter value is θ, the prior distribution is π(θ) and the posterior distribution after observing x is π(θ|x), the expected posterior loss is

ρ(π, δ|x) = ∫_Θ L(θ, δ) π(θ|x) dθ.

A decision rule δ(x) is optimal if it minimizes the posterior expected loss (maximises the posterior expected utility), and is then said to be a Bayes decision rule.
The frequentist risk is

R(θ, δ) = ∫_X L(θ, δ(x)) f(x|θ) dx,

the integral over the sample space X, where f(x|θ) is the sampling distribution of x. This is what Baker and McHale (2013) sought to minimize once the decision rule δ had been determined, by rescaling δ → λδ. It was of course then necessary to plug in an estimate θ̂ of θ to make R(θ, δ) computable.
The integrated risk r(π, δ) is

r(π, δ) = ∫_X ρ(π, δ(x)|x) m(x) dx,    (1)

where m(x) is the marginal distribution of x. In the gambling case this risk is the average loss we would expect on repeated gambles where the winning probability is drawn from the prior distribution, and so is precisely what regular bettors wish to minimize. From Bayes' theorem π(θ|x)m(x) = f(x|θ)π(θ), and from this, on using Fubini's theorem (see, for example, Thomas and Finney, 1996) to change the order of integration, the integrated risk can be rewritten as

r(π, δ) = ∫_Θ R(θ, δ) π(θ) dθ.    (2)

From (1), any decision rule δ minimizing the expected posterior loss ρ(π, δ(x)|x) will also minimize the integrated risk, which from (2) can also be achieved by minimizing the frequentist risk. Yet if the integrated risk is precisely what we seek to minimize on repeated betting, and the Bayes rule already does this, how can bet shrinkage help?
Before answering this question, we quote a proof that shrinkage is required, essentially following Baker and McHale (2013), but recast in the standard notation of decision theory. We reduce x to θ̂, our (Bayesian or other) estimator of θ.
Theorem 1. Given an unbiased estimator θ̂ of the win probability, the frequentist risk R(θ, δ) can be decreased for all θ by shrinking the bet size.
Proof. To prove that risk can be decreased by shrinking the bet size, we show that dR(θ, λδ)/dλ|_{λ=1} > 0 for non-degenerate f(θ̂|θ). The loss function (negative utility) is

L(θ, λδ) = −θ ln(1 + bλδ) − (1 − θ) ln(1 − λδ).

The frequentist risk is then

R(θ, λδ) = −E[θ ln{1 + bλδ(θ̂)} + (1 − θ) ln{1 − λδ(θ̂)}],

where the expectation is over the sampling distribution f(θ̂|θ). Substituting δ(θ̂) = {(b + 1)θ̂ − 1}/b and differentiating under the expectation gives

dR(θ, λδ)/dλ|_{λ=1} = E[θ/{(b + 1)θ̂} + b(1 − θ)/{(b + 1)(1 − θ̂)}] − 1.

Since 1/θ̂ and 1/(1 − θ̂) are convex functions, Jensen's inequality applies, here that E(1/θ̂) > 1/E(θ̂) and E{1/(1 − θ̂)} > 1/{1 − E(θ̂)} for non-degenerate f(θ̂|θ). Then the fact that E(θ̂) = θ gives

dR(θ, λδ)/dλ|_{λ=1} > θ/{(b + 1)θ} + b(1 − θ)/{(b + 1)(1 − θ)} − 1 = 0,

the required result. When there is no uncertainty in the probability of winning, the inequality becomes an equality.
The optimum amount of shrinkage is an estimate, but some shrinkage is required, and a decision rule that shrank the bet size δ by ϵ unless f(θ̂|θ) were degenerate would dominate the plug-in decision rule, i.e. give lower frequentist risk for all values of θ.
Bet shrinking amounts to shrinking the mean of the posterior distribution towards 1/(b + 1), the bookmaker's implied probability of a win. For the gambling example, for a shrinkage factor λ, with the Jeffreys prior we could take θ̃ = λ(m + 1/2)/(n + 1) + (1 − λ)/2. The resolution of the apparent contradiction between the need for bet shrinkage and the optimality of δ from minimising the expected posterior loss is that the above proof hinges on θ̂ being an unbiased estimator, and a Bayes estimator cannot be unbiased except in the trivial case where the risk is zero (Lehmann and Casella, 1998, ch. 4). Hence the decision rule that the proof shows can be dominated is not a Bayes rule, and there is no contradiction.
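The equivalence between rescaling the bet and pulling the posterior mean towards the implied probability is easy to verify numerically (a sketch; m, n and λ are arbitrary illustrative values):

```python
def kelly_delta(theta, b=1.0):
    """Kelly fraction ((b+1)*theta - 1)/b."""
    return ((b + 1) * theta - 1) / b

def jeffreys_mean(m, n):
    """Posterior mean of theta under the Jeffreys Beta(1/2, 1/2) prior."""
    return (m + 0.5) / (n + 1)

def shrunken_mean(m, n, lam, b=1.0):
    """Pull the posterior mean towards the bookmaker's implied probability 1/(b+1)."""
    return lam * jeffreys_mean(m, n) + (1 - lam) / (b + 1)

# Shrinking the mean by lambda gives exactly the bet lambda * delta:
m, n, lam, b = 13, 20, 0.7, 1.0
direct = lam * kelly_delta(jeffreys_mean(m, n), b)
via_mean = kelly_delta(shrunken_mean(m, n, lam, b), b)
```

The identity holds for any odds b, since (b + 1)θ̃ − 1 = λ{(b + 1)θ̂ − 1} when θ̃ = λθ̂ + (1 − λ)/(b + 1).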
We can see in more detail how the solution to the paradox works in the dice example. Consider the prior distribution π(θ). Taking the Haldane (improper) conjugate prior π(θ) ∝ θ^{−1}(1 − θ)^{−1} leads to the sample estimated probability E(θ|x) = m/n, while the commonly-used Jeffreys prior π(θ) ∝ θ^{−1/2}(1 − θ)^{−1/2} gives E(θ|x) = (m + 1/2)/(n + 1). Thus the Haldane prior leads to an unbiased estimator of θ, for which the decision rule can be dominated by bet shrinkage. This is because the Haldane prior leads to improper posterior distributions: if m = 0 or m = n, so that θ̂ = 0 or 1, the posterior distribution collapses. The risk R(θ, δ) is then infinite, and (1) and (2) do not apply. We may wonder what would happen using a reference prior, but in one dimension the reference prior is the Jeffreys prior (Bernardo, 2005).
It is useful to consider what happens using other priors. If we imagine a computer simulation in which values of θ are generated from the prior, and win/lose data are then generated with win probability θ, then indeed no bet shrinkage is required. This can be seen from (2), but has also been verified by computation. One could say that the Bayesian estimator E(θ|x) is preshrunk towards the value θ = 1/2 by just the right amount to minimize the integrated risk. However, on carrying out the same experiment where θ is generated from a more peaked prior distribution but the Jeffreys prior is assumed in deriving the optimum betting amount δ, bet shrinkage is required: the frequentist risk in (2) is being averaged over the wrong prior distribution for θ. The problem with the Bayesian approach is that we are using a vague or uninformative prior distribution to reflect our ignorance of the value of θ, but in repeated betting, what we need in order to minimize frequentist risk is the prior distribution of the win probabilities for the sort of bet we habitually make. The prior ignorance assumption amounts to assuming a very broad distribution of win probabilities (large variance). If we really had such a broad distribution, with peaks at θ = 0 and 1, we could make good bets on the basis of very limited information.
The shrunken decision will have lower risk most of the time, but occasionally will not, so for priors other than the Haldane prior, the shrunken decision rule will not dominate the Bayes rule, in accordance with the complete class theorem.
Similarly, we may suspect that a 'wrong' model, which then yields a 'wrong' likelihood function, will also cause the Bayesian approach to perform badly. In the case of a wrong model, one can attempt to rectify the situation by developing a more comprehensive model, but when there are few data and the prior distribution matters, rescaling is useful.
In general, minimizing frequentist risk starts with a Bayesian choice of δ, based on maximising expected utility. For Kelly betting this also corresponds to using a plug-in estimate of θ, but this will not be true in general. Of course, one could also simply use a plug-in estimator, and seek to improve that through scaling. Although a frequentist transformation of some kind is applied to δ, rescaling is merely the simplest possibility. In Kelly betting, one could instead take δ → δ^γ, giving shrinkage when γ > 1; this choice however would not shrink large bets where δ ≃ 1. There is no general way to find the optimal transformation of δ. Broadly, positive decision variables can be shrunk or expanded, and variables defined on the whole real line could have a constant added.
The minimization of frequentist risk occurs occasionally in statistical theory, e.g. Beran (2005), where minimizing risk serves to select the best model from a class of candidate Bayes estimators. Entirely frequentist procedures such as Mallows' C_p (Mallows, 1973) also minimize risk under quadratic loss. There are two minimizations to be done: fitting a least-squares model, and then minimizing C_p over the discrete range of model choices. The same is true of bet shrinkage: one minimizes expected posterior loss to obtain δ(x), and then minimises frequentist risk to obtain λ.
Müller (2012) suggests modifying the posterior distribution in linear regression, using an 'artificial posterior' with the sandwich covariance matrix; this reduces asymptotic frequentist risk. Thus the concept of minimizing frequentist risk has been used successfully in statistics, albeit sporadically. In the following sections, we consider some examples to gain practical experience with rescaling.

Gambling
First, we study a gambling problem in which the results of between 8 and 20 trials of a gamble were simulated, and the bet was then made using the information gleaned from these trials about the winning probability. The probability of winning was in fact a random variate from a beta prior distribution. The even-odds Kelly bet is computed using the Jeffreys prior, and is shrunk using the approximate formula (3) given in Baker and McHale (2013), in which σ² is the estimated variance of the sampling distribution of θ̂. Table 1 shows the results. When win probabilities are simulated from the Jeffreys prior, so that the prior is 'correct', from (2) bet shrinkage can only increase the frequentist risk (decrease the expected utility); however, it does so by a tiny amount. Once the distribution of win probabilities diverges from the assumed prior, bet shrinkage starts to increase the expected utility. The shrinkage applied was not the optimal amount, but even the simple approximation (3) is worthwhile. This bears out the point that shrinkage is useful if the prior distribution is 'wrong'. The evaluation of a decision rule for a bettor with specified risk aversion by whether it would have 'made money' betting against the market or against a proxy bookmaker is precisely what is recommended by Johnstone et al (2013).
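A stripped-down version of this experiment can be sketched as follows (our own simplification: a single fixed shrinkage factor rather than the approximation (3), and illustrative parameter values):

```python
import math
import random

def mean_utility(lam, prior_a, prior_b, n_tosses=12, reps=20_000, seed=0):
    """Average log utility of an even-odds Kelly bet based on n_tosses practice
    trials, with the bet shrunk by factor lam.  Win probabilities are drawn from
    a Beta(prior_a, prior_b); the bettor always assumes a Jeffreys prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        theta = rng.betavariate(prior_a, prior_b)
        m = sum(rng.random() < theta for _ in range(n_tosses))
        theta_hat = (m + 0.5) / (n_tosses + 1)      # Jeffreys posterior mean
        delta = lam * max(2 * theta_hat - 1, 0.0)   # shrunken even-odds Kelly bet
        win = rng.random() < theta
        total += math.log(1 + delta) if win else math.log(1 - delta)
    return total / reps

# With a peaked 'true' prior far from the assumed Jeffreys prior,
# shrinking the bets should raise the average utility:
u_full = mean_utility(1.0, 6.0, 6.0)
u_shrunk = mean_utility(0.5, 6.0, 6.0)
```

Using the same seed for both runs pairs the simulated bets, so the comparison is not swamped by Monte-Carlo noise.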
Table 1. Expected utilities from 10000 simulations, computing bet size using Kelly betting and the approximate shrunken Kelly bet from Baker and McHale (2013). The Jeffreys prior is used, and bet win probabilities are generated from the distributions shown.

Tennis Betting
Next, we revisit tennis betting, where Baker and McHale (2013) found that bet shrinkage increased the expected utility. This conclusion can be refined. We obtained more comprehensive data on the results of matches from the top tier of men's professional tennis, the ATP Tour, for 2002-2013 (31530 matches) from www.tennis-data.co.uk. The data included the participants' names and ATP world rankings points at the time of the match, the match result, and up to six bookmakers' odds for each match, of which we used the one that was present most often. Following Boulier and Stekler (1999) and Clarke and Dyte (2000), we adopt a simple logistic regression model for estimating the probability of victory for the higher-ranked (better) player, using the natural logarithm of the ratio of the two players' world rankings points as a covariate. We also include the logarithm of the ratio of the bookmaker's odds for the two players. Including these two variables, their squares, and the cross (interaction) term gives a model with 6 parameters that is good enough to make betting worthwhile. Of course, much better models can be constructed using other covariates. We divided the data into 10 time slices, with the fit to all previous data being used to make bets for the current time slice. The details of the model used are not the point of this paper, but for completeness table 2 shows the fitted model parameters when the model is fitted to the whole dataset, and figure 1 shows the model graphically. The focus here is not how good the model is, but whether bet shrinkage is required. We first explore the advantage of a Bayesian formulation. With the 6 covariates x_1, …, x_6 and parameter values β_1, …, β_6, the model for the win probability θ is

ln{θ/(1 − θ)} = Σ_{i=1}^{6} β_i x_i.

Let the maximum-likelihood estimators be β̂, where asymptotically β̂ ∼ N(β, V), with V the inverse of the negative Hessian of the log-likelihood. Then, using the delta method (e.g. Oehlert, 1992), to O(1/n) under a vague prior for β the expectation of the posterior distribution is E(θ) ≃ θ̂ − (xᵀVx)θ̂(1 − θ̂)(θ̂ − 1/2), showing that the Bayesian estimate of win probability is shrunk slightly towards the centre of its range. The effect on winnings from Kelly betting is of course small, but figure 2 shows that the Bayesian adjustment improves the winnings slightly.
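The delta-method adjustment can be coded directly (a sketch; the numerical inputs are made up for illustration):

```python
import numpy as np

def bayes_win_prob(x, beta_hat, V):
    """First-order posterior mean of a logistic win probability under a vague
    prior: E(theta) ~= th - (x'Vx) * th*(1-th)*(th - 1/2), th = logistic(x'beta).
    This follows from a second-order Taylor expansion of the logistic function."""
    eta = float(x @ beta_hat)
    th = 1.0 / (1.0 + np.exp(-eta))
    return th - float(x @ V @ x) * th * (1.0 - th) * (th - 0.5)

# Hypothetical single-covariate fit: the adjustment pulls theta towards 1/2.
x = np.array([1.0])
p = bayes_win_prob(x, np.array([1.0]), np.array([[0.25]]))
```

When the fitted probability is exactly 1/2, or the parameter uncertainty V is zero, the adjustment vanishes.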
If the model is adequate, then given the large sample size the vague prior on β should pose no problem, and bet shrinkage should not be required. We examined the expected utility from the bets after shrinking all bets by the same factor, and found that indeed any shrinkage only decreased expected utility. When the model was forced to be inadequate, the finding was different. We took the model described in table 2 and discarded the cross term. Now performance was (perhaps surprisingly) poor: the final bankroll was only 0.462 units of money, giving an expected utility of −0.772, so betting using the model actually decreases expected utility. Shrinking all bets to 25% of their value however gives the best result, a final bankroll of 1.092 units with expected utility 0.088. Although we are winning or losing tiny amounts of money, bet shrinkage is needed with this inadequate model, and it converts a negative outcome into a positive one. As expected, bet shrinkage can only improve results when something is wrong in the problem formulation, i.e. a wrong prior distribution or, as here, an inadequate model.

The Power-Law Process
Our aim here was to apply the method of rescaling the decision variable to minimize the frequentist risk to another problem in decision theory, quite different from the Kelly betting application considered in the previous section.This allows us to explore the question of whether the method is of general applicability.
The nonhomogeneous Poisson process in which the expected number of events by time t is (t/α)^β, where α > 0, β > 0, is widely used in deciding when to replace ageing machinery; for example, Kumar and Klefsjö (1992) applied it to the replacement of load-haul-dump trucks. Given a cost c_r of repairing the system and a cost c_a > c_r of replacing it, we wish to replace at the age δ that minimizes the cost per unit time L = {c_r(δ/α)^β + c_a}/δ. For β > 1 we should replace at age

δ = α{c_a/((β − 1)c_r)}^{1/β},    (4)

and if β ≤ 1 reliability does not deteriorate with time and the system should never be replaced. Since a wrong decision never to replace predicts L = 0 but in fact results in infinite L, we impose the condition that the system be replaced at age T ≫ 1/α in any case; this would likely happen in practice because of technological obsolescence. We assume c_a/c_r = 100 and without loss of generality set c_a = 1.
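The optimal age (4) follows from setting dL/dδ = 0, which gives c_r(β − 1)(δ/α)^β = c_a. A sketch checking the closed form against direct numerical comparison, with the cost values used in the text (c_a = 1, c_a/c_r = 100) and an illustrative β:

```python
def cost_rate(delta, alpha, beta, c_r, c_a):
    """Expected cost per unit time when replacing at age delta:
    L = (c_r*(delta/alpha)**beta + c_a)/delta."""
    return (c_r * (delta / alpha) ** beta + c_a) / delta

def optimal_age(alpha, beta, c_r, c_a):
    """Closed-form minimizer of cost_rate for beta > 1 (equation (4))."""
    if beta <= 1:
        return float("inf")  # reliability does not deteriorate: never replace
    return alpha * (c_a / ((beta - 1) * c_r)) ** (1.0 / beta)

alpha, beta, c_r, c_a = 1.0, 1.5, 0.01, 1.0
d_star = optimal_age(alpha, beta, c_r, c_a)  # 200**(2/3), about 34.2
```

Perturbing δ in either direction away from d_star raises the cost rate, confirming the minimum.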
Of course, we do not know α and β, and so will make a suboptimal decision if we estimate these parameters from data. We consider three methodologies: plugging the maximum-likelihood estimators (MLEs) into (4); the Bayesian method of replacing the expected number of failures by age δ with the expectation of this quantity over the posterior distribution; and minimization of the frequentist risk by rescaling the optimum value of δ found using the Bayesian method. Bain and Englehardt (1991) give the MLEs α̂, β̂ and their sampling distribution, which is needed for the rescaling method, and Bar-Lev et al (1992) describe Bayesian inference. Data can be time-truncated or failure-truncated; we assumed failure truncation, so that observation stops after the nth failure. Failure times are t_1, …, t_n. With the commonly used (improper) prior π(α, β) = 1/(αβ), from the posterior distribution one can obtain the expectation

E{(δ/α)^β | data} = n{1 − ln(δ/t_n)/S}^{−(n−1)},    (5)

where S = Σ_{i=1}^{n} ln(t_n/t_i). This can be rewritten in terms of the MLEs α̂ = t_n/n^{1/β̂} and β̂ = n/S as

E{(δ/α)^β | data} = n[1 + {ln n − β̂ ln(δ/α̂)}/n]^{−(n−1)},    (6)

from which, using the product-limit form of the exponential function, it can be seen that the Bayesian expectation converges to the MLE plug-in value (δ/α̂)^β̂ as n → ∞. One cannot now solve analytically for the optimum value of δ as in the MLE plug-in case (4), so a function minimizer must be used.
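A numerical check that the posterior expectation of the number of failures by age δ, under the improper prior 1/(αβ), approaches the plug-in MLE value for large n (a sketch using our reconstruction of the closed form, with synthetic failure times chosen so that t_i ≈ i^{1/β}):

```python
import math

def bayes_expectation(delta, times):
    """Posterior expectation of (delta/alpha)**beta for a failure-truncated
    power-law process under the improper prior 1/(alpha*beta):
    n * (1 - ln(delta/t_n)/S)**-(n-1), with S = sum ln(t_n/t_i)."""
    n, t_n = len(times), times[-1]
    S = sum(math.log(t_n / t) for t in times)
    return n * (1.0 - math.log(delta / t_n) / S) ** (-(n - 1))

def mle_expectation(delta, times):
    """Plug-in (delta/alpha_hat)**beta_hat with the standard MLEs
    beta_hat = n / sum ln(t_n/t_i) and alpha_hat = t_n / n**(1/beta_hat)."""
    n, t_n = len(times), times[-1]
    beta_hat = n / sum(math.log(t_n / t) for t in times)
    alpha_hat = t_n / n ** (1.0 / beta_hat)
    return (delta / alpha_hat) ** beta_hat

# Synthetic failure times roughly matching a process with alpha = 1, beta = 1.5:
times = [i ** (1 / 1.5) for i in range(1, 201)]
ratio = bayes_expectation(5.0, times) / mle_expectation(5.0, times)  # near 1
```

The discrepancy shrinks as n grows, in line with the product-limit argument in the text.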
The approach of rescaling the decision variable (the replacement age δ) was studied using Monte-Carlo data representative of real data, e.g. the load-haul-dump truck data of Kumar and Klefsjö (1992) and the data on failures of marine engines from USS Grampus and USS Halfbeak cited by Ascher and Feingold (1984). Without loss of generality, the scale factor was set to α = 1, and β was chosen from 0.8 to 2.0; the region around β = 1 is particularly interesting. Replacement was made no later than T = 1000.
It is possible to improve the Bayesian estimator by adopting a more realistic prior distribution. For load-haul-dump trucks, the data suggest 1 < β < 2, whereas the prior π(α, β) = 1/(αβ) assigns the same probability for β to lie within the range (10000, 20000) as within the range (1, 2). Clearly, in general β must be small, for who would tolerate equipment whose failure rate increased so dramatically with age? Also, if β were nearly zero, the manufacturers would surely run the equipment for a burn-in period before unleashing it on the public.
It is straightforward to give the prior distribution a mean of unity (constant failure rate) and a realistic variance. For mathematical tractability, we used a gamma distribution for β with unit mean, so that π(α, β) = {γ^γ/Γ(γ)}β^{γ−1} exp(−γβ)/α. It is unlikely that β > 2, and with γ = 4 the probability of this is only 0.042. The probability that β < 0.75 is 0.352, which is rather high, as from the datasets examined it is unlikely that β < 0.75; for realism we would really need to use a different distributional form, such as a truncated normal distribution. However, with the gamma prior for β the result can be derived analytically, and the use of the gamma prior for β and this result appear to be new. The algebra is routine and is not shown. The effect of introducing γ is to lengthen the optimum replacement time, as huge values of β, which would necessitate frequent replacements, occur with lower probability.
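The tail probabilities quoted for the γ = 4 prior can be verified directly: for integer γ, the survival function of a Gamma(shape γ, rate γ) distribution is a finite Erlang sum (a sketch):

```python
import math

def gamma_survival(x, k):
    """P(B > x) for B ~ Gamma(shape=k, rate=k) with integer k, via the
    Erlang form: exp(-k*x) * sum_{j<k} (k*x)**j / j!."""
    lam = k * x
    return math.exp(-lam) * sum(lam ** j / math.factorial(j) for j in range(k))

p_beta_gt_2 = gamma_survival(2.0, 4)           # about 0.042
p_beta_lt_075 = 1.0 - gamma_survival(0.75, 4)  # about 0.352
```

Both values agree with those quoted in the text.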
We simulated 10000 realizations of possible data for each value of β for various sample sizes n, computed the MLEs α̂ and β̂ and the MLE replacement age from (4), and the replacement age from the Bayesian estimator (5), the latter using a robust function minimizer. From each simulated realization, 8000 values of estimated α and β were computed using the sampling-distribution formulae from Bain and Englehardt (1991), and the corresponding 8000 replacement ages derived from (6). The loss function (cost per unit time) was computed for a rescaling δ → λδ using (5). A function minimizer was then used to find the scaling that minimized the loss function, and applying this rescaling factor to the Bayesian optimal δ gave our third estimate of the replacement age. Finally, the true costs per unit time were evaluated for each of the 10000 realizations, using the values of α and β from which the realizations were simulated. We take the excess cost per unit time over that at the true optimum value of δ as the loss function, and this is reported in table 3. The results are correct to the accuracy shown, as was demonstrated by computing the variance of the loss functions, and also by scaling up the number of simulations until the computed table did not change.
It can be seen from table 3 first that the Bayesian solution is in general, but not always, somewhat superior to the MLE. Second, the rescaling to minimize frequentist risk performs fairly well, often giving a slightly lower cost than the Bayesian solution, but performing worse for higher values of β. Table 4 shows the performance of the methods with the γ = 0 prior. Here the Bayes method often performs worse than the MLE method, and the rescaling, being based on the Bayes solution, also performs poorly.
From equations (1) and (2) it is clear that the Bayesian decision rule minimizes a weighted mean of the frequentist risks, so if rescaling reduces the loss for some values of θ, it must increase it for others. The average rescaling factor λ̄ > 1, so that replacement is postponed longer when minimizing frequentist risk. As in the Kelly betting case, the loss function is asymmetric. In Kelly betting, betting too much can wipe out nearly the whole bankroll, and betting it all and losing would give an infinitely negative utility; refraining from betting merely gives zero utility. For equipment replacement similarly, replacing very often gives a huge disutility, but if β is not too large, not replacing often enough incurs only a small disutility. Hence in both cases rescaling δ makes sense, in the Kelly case scaling down the bet, and here scaling up the replacement age. As β increases, the asymmetry becomes smaller, and rescaling starts to perform badly. Estimating a further parameter λ from the data of course introduces noise, and so can increase loss rather than decreasing it if the rescaling required is not large.

Conclusions
We have studied the technique of rescaling decision variables to minimize frequentist risk, using the results of Bayesian decision theory, and also through Monte-Carlo simulations and analyses of real data. Besides Kelly betting, the quite different area of equipment replacement was studied, to address the question of the general applicability of the technique.
In drawing some conclusions for practical decision-making, it is first necessary to restate a remark about prior distributions. If, using our examples, we are betting repeatedly or replacing equipment on a regular basis, the frequentist properties of our (Bayesian) decisions are important. The complete class theorem guarantees that they will be good, but only if the prior distribution is right in a frequentist sense: if the prior is π(θ), the value θ should crop up a proportion π(θ) of the time in the bets we make. This is likely not to be the case when we have no prior knowledge and a 'vague prior' is used. For example, the Jeffreys prior π(θ) ∝ θ^{−1/2}(1 − θ)^{−1/2} implies that the probability of winning the bet is most likely to be near either one or zero; with a little data, we could surely guess which type of bet we are facing, and so make a nice profit. This is insanely optimistic, which is why the Kelly betting strategy performs poorly when we do not know θ. Unless we have a large sample size, there are therefore risks in using a vague prior. We also require the model to be correct, but we assume that incorrect models can be refined until adequate, so this is not a major problem.
The Kelly betting example is an unusual case: because the estimate θ̂ is assumed unbiased, there is no Bayes estimator, and the generalized Bayes estimator that is used instead, which minimizes posterior expected loss for every x, is not admissible (Robert, 2007, p. 64). In general, minimizing frequentist risk by rescaling the decision variable is effective only when the prior distribution is 'wrong' and strongly affects inference, or when the model is wrong but cannot easily be rectified.
With this understanding, how can the technique of rescaling decision variables to minimize frequentist risk best be exploited in practice? There are several options:
1. routinely use frequentist risk minimization, based on rescaling a naive Bayesian decision;
2. use it in one-off situations, but in general seek to improve the prior distribution π(θ) with experience, and then take a purely Bayesian approach;
3. use it purely as a diagnostic tool to check decisions;
4. never use it.
Our conclusion is that options 2 and 3 are appropriate. For a one-off bet or decision, we can probably make a better decision by rescaling the decision variable, unless we have a prior distribution with good frequentist properties. But a habitual bettor could adopt a prior distribution for the win probability θ such as a beta distribution with mean {1/(b + 1)}^γ and unknown variance. When γ = 1 this shrinks the bet size to zero, and when γ < 1 the expected probability of winning exceeds the probability implied by the bookmaker, and a bet is made. An empirical Bayes analysis of data used in making past bets, such as the dice bets described in table 1, would maximise the marginal probability m(x) to estimate the prior distribution to be used in future betting; a fully Bayesian analysis would use a hierarchical prior distribution.
Turning to equipment replacement, the same method could be used to find a prior distribution π(α, β) more representative of the values of the shape parameter β typically arising for the type of machinery for which replacement policies must be devised. To make better decisions, it is vital to improve the prior distribution, through experience or through attempts to elicit the experience of experts, e.g. Percy (2002). Table 3 shows that the replacement age should be increased over the range of β likely to be encountered. A realistic prior distribution with small probability that (say) β < 1/2 or β > 2 would improve the Bayesian results further.
Finally, model criticism is an area where Bayesian statistics routinely seeks help from the frequentist techniques used in model-checking, and this method of rescaling can be used as a tool to check how good our decisions are. It would be a little daunting to find that one would have made more money if bets had been routinely shrunk to 25%, or if equipment had been replaced at double the replacement age used. Hence the mean value λ̄ of the optimal scaling factor from a series of decisions is a statistic that measures model/prior adequacy for the task at hand.
In conclusion, minimizing frequentist risk has been used sporadically in statistics, for example in Mallows' C_p, and we think that the related concept of rescaling the decision variable to minimize risk can sometimes be useful in decision making, either directly or as a diagnostic tool to help improve Bayesian methods. It should however be used with caution, and the more reliable approach of making better decisions by improving Bayesian prior distributions with experience should be the norm.

Figure 1. The logistic model used in tennis betting. The lines correspond to log odds of −2.5, …, 4.5, going from the lowest line to the highest.

Figure 2. Cumulative bankroll from tennis betting, using the maximum-likelihood estimators of the model parameters, and also calculating the win probability as the expectation over the posterior distribution.

Table 2. Tennis betting model parameters: values, standard errors, Z scores and p-values for a Wald test that each parameter is zero.

Table 3. Shape parameter β, sample size n, and average loss functions for the maximum-likelihood, Bayes and rescaled methods, with the average rescaling factor λ̄. The Bayesian loss uses the γ = 4 prior.

Table 4. Shape parameter β, sample size n, and average loss functions for the maximum-likelihood, Bayes and rescaled methods, with the average rescaling factor λ̄. The Bayesian loss uses the unimproved prior with γ = 0.