Reflections on Bayesian inference and Markov chain Monte Carlo

Bayesian inference and Markov chain Monte Carlo methods are vigorous areas of statistical research. Here we reflect on some recent developments and future directions in these fields.


INTRODUCTION
On the occasion of the 50th anniversary of the Statistical Society of Canada, we offer some reflections on research in the highly twinned areas of Bayesian inference and Markov chain Monte Carlo (MCMC) methods. These areas continue to attract robust participation from the Canadian research community. As labelled by initials, each of Sections 2 through 4 offers thoughts from one of us. First, JSR and RVC survey computational topics related to the use of MCMC algorithms for Bayesian computation. Then, PG describes some Bayesian applications, specifically in the context of pandemic-related research.

MCMC IN HIGH DIMENSIONS (JSR)
MCMC algorithms are very widely used to explore and sample from a complicated high-dimensional target probability distribution π. Although other applications exist, the most common use of MCMC is in cases where π is a posterior distribution defined in terms of a prior density p(θ) and a sampling density f(y|θ), indexed by a parameter θ ∈ Θ ⊂ R^d and evaluated at the observed data y₀ ∈ 𝒴, namely
π(θ|y₀) = f(y₀|θ) p(θ) ∕ ∫_Θ f(y₀|θ′) p(θ′) dθ′.   (1)
When the dimension d is large, one way to ease the computational burden is to replace MCMC altogether with approximate methods such as variational Bayes (Blei, Kucukelbir & McAuliffe, 2017) or INLA (integrated nested Laplace approximation) (Rue et al., 2017). However, it is usually far preferable to find general methods of obtaining verifiably accurate and reliable MCMC samples even in high dimension. Indeed, I would argue that this is the central challenge of Bayesian computation today. One approach in this direction is to use the results of Roberts and coauthors on the optimal scaling of proposal distributions (e.g., Roberts, Gelman & Gilks, 1997; Roberts & Rosenthal, 1998). These results involve speeding up the original Metropolis algorithm by a fixed power of the dimension d (e.g., d or d^{1/3}) and then proving that, as d → ∞, the resulting Markov process converges to a diffusion limit whose speed can then be optimized. The limiting diffusion no longer depends on dimension, which suggests that the computational complexity of the algorithm equals the order by which the chain was sped up. This intuition was formalized in Roberts & Rosenthal (2016), where it was proven that traditional random-walk Metropolis algorithms converge to stationarity in O(d) iterations and Metropolis-adjusted Langevin algorithms in O(d^{1/3}) iterations. However, these results were established only under the very strong assumptions required for the original diffusion limits, including that the target distribution factors into i.i.d. components (although in simulations they do appear to hold approximately in much greater generality; Roberts & Rosenthal, 2001). These theoretical assumptions would never hold in practice, which raises questions about how generally the corresponding results apply in real MCMC applications.
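The flavour of these optimal scaling results is easy to see in a small simulation. The sketch below (my own toy illustration, using an i.i.d. standard normal target as in the diffusion-limit assumptions) runs random-walk Metropolis with proposal standard deviation ℓ/√d and reports the empirical acceptance rate; the theory predicts that ℓ ≈ 2.38 is approximately optimal, with a limiting acceptance rate near 0.234.

```python
import numpy as np

def rwm_acceptance_rate(d, ell, n_iters=20_000, seed=0):
    """Empirical acceptance rate of random-walk Metropolis on a standard
    normal target in dimension d, with proposal scale ell / sqrt(d)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    log_pi = lambda z: -0.5 * z @ z  # log-density of N(0, I_d), up to a constant
    accepts = 0
    for _ in range(n_iters):
        y = x + (ell / np.sqrt(d)) * rng.standard_normal(d)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
            accepts += 1
    return accepts / n_iters

# Theory (Roberts, Gelman & Gilks, 1997) predicts the optimal ell is about
# 2.38, with a limiting acceptance rate of about 0.234 as d grows.
for ell in (0.5, 2.38, 8.0):
    print(ell, rwm_acceptance_rate(d=50, ell=ell))
```

Steps that are too small are accepted almost always but move slowly; steps that are too large are almost never accepted; the intermediate scale balances the two.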
So where does this leave us? We can say without hesitation that there has been a tremendous amount of success in using and verifying MCMC on moderately sized problems. However, when MCMC is used on much larger problems, it tends to suffer from either too slow convergence, or lack of reliability, or incomplete theoretical justification, or inaccurate approximation, or unrealistic assumptions. Thus, the overall goal of obtaining fast, accurate, reliable, verifiable MCMC sampling algorithms in general, very high-dimensional problems is not yet completely resolved. On the positive side, this vexing challenge provides many new research directions, which will surely occupy the MCMC and Bayesian computation communities for many years to come.

The Workhorse of Bayesian Computation
Although it continues to be referred to as a last-resort computational method (e.g., Thompson, 2011) to be used only when other numerical methods are ineffective, MCMC sampling has been widely adopted because of its ease of implementation and available software. Expanding on the discourse of Section 2, consider Hastings' generalization of the Metropolis sampler (Hastings, 1970), the so-called Metropolis-Hastings (henceforth, MH) algorithm, which is probably the most widely used MCMC sampler.
Assume that the state of the chain at time t is θ_t. Given a user-defined proposal distribution q(⋅|θ_t), the update rule to construct θ_{t+1} is defined by the following two steps:
Step 1 Draw a proposal θ* from the proposal density q(θ|θ_t).
Step 2 Set θ_{t+1} = θ* with probability
α = min{1, [π(θ*|y₀) q(θ_t|θ*)] ∕ [π(θ_t|y₀) q(θ*|θ_t)]},   (3)
and θ_{t+1} = θ_t otherwise.
The pesky denominator in Equation (1) does not impede the calculation in Equation (3), since it cancels in the ratio. However, the latter requires the ability to calculate the sampling density f(y₀|θ) for any parameter value θ. The modern challenges posed by Bayesian computation have their roots in this apparently simple requirement.
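For concreteness, the two steps above can be sketched in a few lines of Python. The example below is a toy illustration (not tied to any particular model in this article) with a symmetric random-walk proposal, so the q-ratio cancels and only an unnormalized log-posterior is needed.

```python
import numpy as np

def metropolis_hastings(log_post, init, scale, n_iters, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal,
    so the acceptance ratio reduces to pi(proposal) / pi(current)."""
    theta = np.atleast_1d(np.asarray(init, dtype=float))
    chain = np.empty((n_iters, theta.size))
    lp = log_post(theta)
    for t in range(n_iters):
        # Step 1: draw a proposal from q(. | theta_t).
        prop = theta + scale * rng.standard_normal(theta.size)
        lp_prop = log_post(prop)
        # Step 2: accept with probability min(1, pi(prop) / pi(theta)).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

# Toy unnormalized posterior: a N(3, 1) density known only up to a constant.
rng = np.random.default_rng(1)
chain = metropolis_hastings(lambda th: -0.5 * np.sum((th - 3.0) ** 2),
                            init=0.0, scale=1.0, n_iters=20_000, rng=rng)
print(chain[5000:].mean())  # close to 3 after discarding burn-in
```

Only ratios of the target appear, which is exactly why the normalizing constant in Equation (1) is never needed.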

Challenges Posed by Big Data
The increase in data volume presents an important initial challenge to classical MCMC procedures. For a sample of size N in the hundreds of thousands or millions, running an MCMC sampler for thousands of iterations becomes inefficient at best and impossible at worst since most samplers use at least O(N) operations to update the underlying Markov chain. Occasionally, the data volume is too large to be stored on a personal computer, making it impossible to update the Markov chain using Equation (3).
In other areas of statistical computation, algorithm designers have taken advantage of parallelization strategies in which a task is divided between a number of parallel "workers," where a worker can be a processing unit, a computer core, etc. (e.g., Schervish, 1988; Schubert & Gertz, 2018). Alas, MCMC samplers are notoriously resistant to parallelization and require the design and adoption of new ideas and different approximation techniques. For instance, Suchard et al. (2010) propose using a GPU's multiple processors to speed up a block Gibbs algorithm. Their approach exploits the GPU's multiple cores, which allows computing tasks within each MCMC iteration to be parallelized. The proposed approach relies on the intrinsic synchronicity of the parallel tasks, i.e., all cores must complete their tasks before the chain can finish updating. This approach is difficult to generalize to other MCMC samplers where an update cannot be easily split among independent workers. Others have discussed parallel MCMC methods (Rosenthal, 2000; Laskey & Myers, 2003; Wilkinson, 2006), where each worker runs on the full dataset. However, these methods do not resolve memory overload and may require intensive communication between workers during the simulation, which can push the computation budget beyond realistic bounds. It has been recognized that synchronous coordination (i.e., workers must wait for each other to finish before moving on to the next step) or frequent communication between workers slows down computation significantly and should be avoided. Some progress on asynchronous updates in Gibbs samplers is reported in Terenin, Simpson & Draper (2020), albeit under relatively stringent conditions.
One strategy to reduce communication between workers and eliminate the need to store all the data on a single server is found in the embarrassingly parallel design in which the data are partitioned into K equally sized subsets called shards, each of which is analyzed independently by a different worker. Each worker uses an MCMC algorithm to draw samples from the posterior corresponding to that data shard, which is called a subposterior. Some essential MCMC-related questions regard (1) which subposterior distributions should be built for each shard and (2) how MCMC samples obtained from each subposterior should be combined to recover all information that would have been produced by an MCMC sample from the full posterior distribution. The subposteriors developed in the literature for the sth shard have the form
π_s(θ | y₀^{(s)}) ∝ p(θ)^{a_s} f(y₀^{(s)} | θ)^{b_s},
where y₀^{(s)} is the sth data shard and a_s, b_s ∈ R are user-defined. For instance, Wang & Dunson (2013), Neiswanger et al. (2013), Scott et al. (2016), and Nemeth & Sherlock (2018) use a_s = 1∕K and b_s = 1 for all s = 1, … , K, but differ in their approach to combining subsamples. Specifically, Neiswanger et al. (2013) approximate each subposterior using a kernel density estimator, while Nemeth & Sherlock (2018) use Gaussian processes. Another option is to use a_s = 1 and b_s = K for all s = 1, … , K, as in the likelihood-inflating sampling algorithm (LISA) developed in Entezari, Craiu & Rosenthal (2018) for Bayesian additive regression trees. More recently, Changye & Robert (2019) have proposed shard-specific choices of a_s and b_s and used random forests and reweighted importance sampling to approximate subposteriors. These divide-and-conquer methods are well justified when the posterior and subposteriors are Gaussian (or mixtures thereof), but theory is lacking otherwise. The most general "fix" is to use importance sampling so that each subposterior sample is properly weighted, but this strategy adds significantly to the computational burden.
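A minimal numerical sketch of the a_s = 1∕K, b_s = 1 strategy, in the one conjugate setting where everything is available in closed form (normal data with known variance and a flat prior, so each subposterior can be sampled exactly and the precision-weighted consensus combination in the style of Scott et al. (2016) is exact):

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, N, K = 1.5, 100_000, 10
y = rng.normal(theta_true, 1.0, size=N)
shards = np.array_split(y, K)  # K equally sized data shards

# Model: y_i ~ N(theta, 1) with a flat prior, so the sth subposterior
# (a_s = 1/K, b_s = 1) is N(shard mean, 1/n_s) in closed form.
n_draws = 5000
sub_draws = np.empty((K, n_draws))
weights = np.empty(K)  # subposterior precisions
for s, shard in enumerate(shards):
    n_s = shard.size
    sub_draws[s] = rng.normal(shard.mean(), 1.0 / np.sqrt(n_s), size=n_draws)
    weights[s] = n_s

# Consensus combination: precision-weighted average of the K draws.
consensus = (weights[:, None] * sub_draws).sum(axis=0) / weights.sum()
print(consensus.mean(), y.mean())          # both close to theta_true
print(consensus.std(), 1.0 / np.sqrt(N))   # full-posterior spread recovered
```

Outside this Gaussian setting the weighted average is only an approximation, which is precisely the theoretical gap noted above.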
Extending divide-and-conquer techniques to models for non-i.i.d. data is also challenging because batches are not independent or even exchangeable.
Because the size of the data poses serious challenges, it is reasonable to consider using only a subset of the sample to speed up MCMC computation. One essential aim is to not alter the statistical properties of the sample when trimming it. In its simplest and most intuitive form, sample reduction can be easily understood when the model admits a sufficient statistic and the posterior that conditions only on it, instead of the full sample, provides the same information about the parameter. Alas, such simplicity is almost never available in complex models, so new ideas are needed.
Consider the log-likelihood obtained by adding N independent terms, each corresponding to an independent item in the sample, l_N(θ) = ∑_{i=1}^{N} l_i(θ), where l_i(θ) = log f(y_{0,i} | θ). It is tempting to take a random subsample {i_1, … , i_r} of size r of the data items and use l_r(θ) = ∑_{j=1}^{r} l_{i_j}(θ). However, replacing l_N by l_r in the transition kernel of the Markov chain used to run, say, an MH sampling algorithm alters the target distribution of the chain and can lead to large inferential errors. This concern can be alleviated when an unbiased estimator of the likelihood is available, thanks to the pseudomarginal approach developed in Andrieu & Roberts (2009). The authors show that the target distribution of an MH sampling algorithm remains unchanged if the likelihood is replaced by an unbiased estimator when computing the acceptance ratio in Equation (3). An important addendum is that the sampling efficiency of the new algorithm degrades as the variance of the ratio of the likelihoods increases (Andrieu & Vihola, 2015), thus discouraging a choice of r much smaller than N.
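The danger of the naive substitution is easy to demonstrate. In the toy normal-mean sketch below (all names are illustrative), running the same random-walk sampler with an unscaled subsampled log-likelihood makes the chain target the posterior based on only r observations, inflating the posterior spread by roughly a factor of √(N∕r):

```python
import numpy as np

rng = np.random.default_rng(3)
N, r = 10_000, 100
y = rng.normal(2.0, 1.0, size=N)

def run_chain(loglik, init, scale, n_iters=30_000):
    """Random-walk Metropolis targeting exp(loglik), started at init."""
    th, lp = init, loglik(init)
    draws = np.empty(n_iters)
    for t in range(n_iters):
        prop = th + scale * rng.standard_normal()
        lp_prop = loglik(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            th, lp = prop, lp_prop
        draws[t] = th
    return draws

# Full-data chain: posterior standard deviation about 1/sqrt(N) = 0.01.
full = run_chain(lambda th: -0.5 * np.sum((y - th) ** 2),
                 init=y.mean(), scale=0.02)
# Naive substitution of l_r for l_N: the chain now targets the posterior
# built from only r observations, which is roughly 10 times wider.
sub = y[rng.choice(N, size=r, replace=False)]
naive = run_chain(lambda th: -0.5 * np.sum((sub - th) ** 2),
                  init=sub.mean(), scale=0.2)
print(full[5000:].std(), naive[5000:].std())  # roughly 0.01 versus 0.1
```

The pseudomarginal fix is different in kind: it requires an unbiased estimator of the likelihood itself (not of the log-likelihood), which the naive exponentiated l_r does not provide.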
Some of these challenges are tackled in Quiroz et al. (2018), which specifies a subsampling-based approach in which a different subsample of size r is used in each iteration of the MCMC sampler. The authors introduce an estimator of the likelihood that is approximately bias-corrected and, to keep the variance of the likelihood ratio in check, rely on control variates. Because the estimator is not exactly unbiased, the transition kernel of the MH chain deviates from the original one and has a perturbed target distribution. However, Quiroz et al. (2018) bound, for important general classes of models, the total variation distance between the perturbed chain's target and the posterior distribution of interest, π(θ|y₀) ∝ f(y₀|θ)p(θ), and make recommendations about choosing the size of the subsample.
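A sketch of the control-variate idea, in my own toy Poisson illustration rather than the exact construction of Quiroz et al. (2018): expand each log-likelihood term to second order around a fixed point θ̂, sum the expansions over all N terms (here done directly; in practice these sums are maintained cheaply), and subsample only the small residuals.

```python
import numpy as np

rng = np.random.default_rng(7)
N, r = 100_000, 200
y = rng.poisson(3.0, size=N)
theta_hat = y.mean()  # expansion point for the control variates

def terms(th):  # per-observation Poisson log-likelihood, constants dropped
    return y * np.log(th) - th

def q_terms(th):
    # Second-order Taylor expansion of each term around theta_hat:
    # the control variates, cheap to sum over all N observations.
    d = th - theta_hat
    return (terms(theta_hat) + (y / theta_hat - 1.0) * d
            - 0.5 * (y / theta_hat ** 2) * d * d)

def naive(th, idx):
    return (N / r) * terms(th)[idx].sum()

def with_cv(th, idx):
    # Exact sum of the expansions plus a subsampled residual correction.
    return q_terms(th).sum() + (N / r) * (terms(th) - q_terms(th))[idx].sum()

th = theta_hat + 0.1
idxs = [rng.choice(N, r, replace=False) for _ in range(200)]
print(np.std([naive(th, i) for i in idxs]))    # enormous noise
print(np.std([with_cv(th, i) for i in idxs]))  # orders of magnitude smaller
```

Both estimators are unbiased for l_N(θ); the control variates simply remove almost all of the variance near θ̂, which is what keeps the likelihood-ratio variance, and hence the perturbation of the target, under control.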
Keeping with the "data trimming" theme, Huggins, Campbell & Broderick (2016) considered a different approach based on the concept of a Bayesian coreset. Here, a single subset of the data is selected and treated as "the data" for inferential purposes. The working assumption is that large datasets are largely redundant, so it is possible to reduce their size while preserving their statistical properties. Bounds for the distance between the full-data likelihood and the one corresponding to the coreset are derived theoretically. For Gaussian models, Huggins, Campbell & Broderick (2016) also derived bounds for the discrepancy between the full- and reduced-data posteriors, but similar results are difficult to quantify theoretically in more general cases.
The Canadian Journal of Statistics / La revue canadienne de statistique

Challenges Posed by Intractable Likelihoods
The modern statistician must deal with workflows of increasing complexity. Large studies not only lead to large volumes of data but can also provide answers to more complex questions. The latter require complex models that elude closed forms. Computer simulators of complex phenomena, e.g., hurricane-driven surges (Plumlee et al., 2021), groundwater modelling (Cui et al., 2018) or climate-change scenarios (Oyebamiji et al., 2015), exemplify generative models in which data can be generated for every configuration of model parameters, but the corresponding likelihood is not available. When, for any parameter value θ ∈ R^q, synthetic data y ∼ f(y|θ) can be generated from the model, one can still conduct a Bayesian analysis. We discuss here two computational approaches that have gained considerable momentum in recent years: approximate Bayesian computation (ABC) (Marin et al., 2012; Baragatti & Pudlo, 2014; Drovandi, 2018; Sisson, Fan & Beaumont, 2018a) and Bayesian synthetic likelihood (BSL) (Wood, 2010; Drovandi et al., 2018; Price et al., 2018). Both approaches are effective when combined with MCMC sampling schemes to produce samples from an approximation of the posterior.
In its simplest form, ABC is an accept/reject sampler. Given observed data y₀, a user-defined threshold ε > 0, a distance d : R^p × R^p → R₊, and a summary statistic S(y) ∈ R^p, the algorithm has the following steps:
S1 Sample θ* ∼ p(θ) and synthetic data y ∼ f(y|θ*).
S2 If d(S(y), S(y₀)) ≤ ε, then accept θ* as a sample from the approximate posterior π_ε(θ | S(y₀)), the marginal of the joint distribution
π_ε(θ, y | y₀) ∝ p(θ) f(y|θ) 1{d(S(y), S(y₀)) ≤ ε}.   (4)
When S is a sufficient statistic and ε = 0, the approximate posterior is the true posterior, i.e., π₀(θ | S(y₀)) = π(θ|y₀). To verify, let us work under the simplifying assumption that both θ and y take discrete values. One can easily see that
π₀(θ | S(y₀)) ∝ p(θ) ∑_y f(y|θ) 1{S(y) = S(y₀)} = p(θ) Pr{S(y) = S(y₀) | θ} ∝ p(θ) f(y₀|θ),
which holds because S is sufficient and ε = 0.
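Steps S1 and S2 can be sketched in a few lines. The toy example below uses a normal-mean model whose sample mean is a sufficient summary, so as ε shrinks the ABC output approaches a posterior we can also compute by conjugacy (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true, n = 1.0, 50
y0 = rng.normal(theta_true, 1.0, size=n)
s0 = y0.mean()  # a sufficient summary statistic for the normal mean

def abc_rejection(eps, n_props=100_000):
    """S1: draw theta* from the prior and synthetic data from the model.
    S2: keep theta* when the summaries agree to within eps."""
    theta = rng.normal(0.0, 3.0, size=n_props)  # prior N(0, 9)
    synth = rng.normal(theta[:, None], 1.0, size=(n_props, n))
    keep = np.abs(synth.mean(axis=1) - s0) <= eps
    return theta[keep]

post = abc_rejection(eps=0.05)
# With a sufficient summary and small eps, the accepted draws approximate
# the conjugate posterior N(s0 * 9n/(9n+1), 9/(9n+1)).
print(post.size, post.mean(), s0 * 9 * n / (9 * n + 1))
```

Note the likelihood is never evaluated; only forward simulation from the model is used, which is the whole point.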
The remarkable feat of exploring an approximation of the posterior even when the likelihood is intractable has generated a lot of interest, as demonstrated by a large number of articles and ideas that simply cannot all be discussed here. Instead, I focus on a few essential developments. The practical implementation of the ABC algorithm requires choosing a number of simulation parameters, e.g., d, ε, or S. Theory-backed recommendations for the choice of S can be found in Fearnhead & Prangle (2012) and Prangle (2015). More radically, Bernton et al. (2017) bypass the need to select the statistic S by computing the Wasserstein distance between the empirical distributions of the observed and synthetic data.
In the absence of information about model parameters, the prior and posterior distributions may have nonoverlapping regions with significant mass. Hence, parameter values that are drawn from the prior, as in S1, will rarely be retained. Recognizing this, Marjoram et al. (2003) proposed an ABC-MCMC algorithm that relies on building an MH transition kernel with the state space {(θ, y) ∈ R^q × 𝒴^n}, proposal distribution q(θ|θ_t) f(y|θ) at iteration t, and the target π_ε(θ, y|y₀) given in Equation (4). The acceptance probability in Equation (3) can be computed exactly because the intractable terms involving the likelihood f(y|θ) cancel out. There are a few alternatives to Marjoram et al.'s sampler, motivated by low acceptance probabilities for small values of ε. For instance, Lee, Andrieu & Doucet (2012) simulate J ≥ 1 pseudo-datasets per iteration and replace π_ε(θ, y|y₀) in the calculation of the acceptance probability with an unbiased estimator based on J^{-1} ∑_{j=1}^{J} 1{d(S(y_j), S(y₀)) < ε}, where each y_j is independently simulated from f(y|θ). The acceptance probability tends to increase compared to Marjoram et al.'s sampler, but in order to keep the variance of the estimator under control, a large number of pseudo-data generations may be needed to produce stable estimates of this small matching probability when ε is small. Other MCMC designs suitable for ABC can be found in Bornn et al. (2014). Sequential Monte Carlo (SMC) samplers have also been successfully used for ABC (Sisson, Fan & Tanaka, 2007; Lee, 2012; Filippi et al., 2013) and rely on a user-specified decreasing sequence of thresholds ε₀ > ⋯ > ε_J. A comprehensive coverage of ABC-related theory and computational techniques can be found in Sisson, Fan & Beaumont (2018b) and the references therein.
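A minimal sketch of an ABC-MCMC update in the style of Marjoram et al. (2003), for the same toy normal-mean setting as before (illustrative names and numbers; with a symmetric proposal the acceptance probability reduces to the prior ratio times the matching indicator):

```python
import numpy as np

rng = np.random.default_rng(5)
n, eps = 50, 0.1
y0 = rng.normal(1.0, 1.0, size=n)
s0 = y0.mean()

def log_prior(th):  # N(0, 9) prior, up to a constant
    return -0.5 * th * th / 9.0

# Marjoram et al.-style kernel: propose theta, simulate a fresh synthetic
# dataset, and accept only when the summaries match to within eps.  The
# intractable likelihood never appears in the acceptance ratio.
th = 0.0
chain = []
for t in range(50_000):
    prop = th + 0.5 * rng.standard_normal()  # symmetric random walk
    synth = rng.normal(prop, 1.0, size=n)
    matches = abs(synth.mean() - s0) <= eps
    if matches and np.log(rng.uniform()) < log_prior(prop) - log_prior(th):
        th = prop
    chain.append(th)
chain = np.asarray(chain)
print(chain[10_000:].mean())  # close to s0 after burn-in
```

Unlike the rejection sampler, proposals come from a local random walk rather than the prior, which is what rescues the acceptance rate when prior and posterior barely overlap.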
An alternative approach to bypass the intractability of the sampling distribution is proposed in Wood (2010). This approach is based on the working assumption that the conditional distribution of a user-defined statistic S(y) given θ is Gaussian with mean μ_θ and covariance matrix Σ_θ. The synthetic likelihood (SL) procedure assigns to each θ the likelihood SL(θ|s₀) = 𝒩(s₀; μ_θ, Σ_θ), where s₀ = S(y₀) and 𝒩(x; μ, Σ) denotes the density at x of a normal random variable with mean μ and covariance Σ. The SL can be used for maximum likelihood estimation, as in Wood (2010), or within a Bayesian paradigm, as proposed in Drovandi et al. (2018) and Price et al. (2018). The latter work proposes sampling the approximate posterior generated by the BSL approach, π_L(θ|s₀) ∝ p(θ)SL(θ|s₀), using an MH sampler. Direct calculation of the acceptance probability is not possible because the conditional mean and covariance are unknown for any θ. However, both can be estimated from m statistics s₁, … , s_m sampled from their conditional distribution given θ. More precisely, after simulating y_i ∼ f(y|θ) and setting s_i = S(y_i), one computes
μ̂_θ = (1∕m) ∑_{i=1}^{m} s_i,  Σ̂_θ = (1∕(m−1)) ∑_{i=1}^{m} (s_i − μ̂_θ)(s_i − μ̂_θ)^⊤,   (6)
so the SL is SL(θ|s₀) = 𝒩(s₀; μ̂_θ, Σ̂_θ).
An MH algorithm designed to sample from the posterior π_L(θ|s₀) is updated through the following steps at iteration t:
SL1 Draw a proposal θ* from the proposal density q(θ|θ_t).
SL2 Simulate y₁, … , y_m ∼ f(y|θ*) and compute μ̂_{θ*} and Σ̂_{θ*} as in Equation (6) to obtain SL(θ*|s₀).
SL3 Set θ_{t+1} = θ* with probability
α = min{1, [p(θ*)SL(θ*|s₀)q(θ_t|θ*)] ∕ [p(θ_t)SL(θ_t|s₀)q(θ*|θ_t)]},
and θ_{t+1} = θ_t with probability 1 − α.
Running MCMC samplers for either ABC or BSL involves generating multiple pseudosamples and can become extremely costly in situations where data are high-dimensional and very large or expensive to generate, as in the hurricane path or climate change examples. Anticipating that complex models are usually motivated by big data, Levi & Craiu (2022) proposed strategies to minimize the amount of pseudodata that needs to be simulated. When computing MH acceptance ratios, the authors recommend reusing some of the proposals (θ, y) from the chain's history. This modification of the chain's kernel reduces computation time by orders of magnitude but perturbs the target distribution. Their theoretical developments demonstrate that this error can be controlled in independent Metropolis samplers.
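A compact sketch of steps SL1–SL3, using a toy normal model with illustrative summaries (in a genuine intractable-likelihood problem only the simulator would be available, and the prior and summary choices here are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 100, 40
y0 = rng.normal(2.0, 1.0, size=n)
summarize = lambda data: np.array([data.mean(), data.std()])
s0 = summarize(y0)
simulate = lambda th: rng.normal(th, 1.0, size=n)

def log_sl(th):
    """SL2 / Equation (6): simulate m datasets, fit a Gaussian to the
    summaries, and evaluate its log-density at s0 (constants dropped)."""
    stats = np.array([summarize(simulate(th)) for _ in range(m)])
    mu, cov = stats.mean(axis=0), np.cov(stats, rowvar=False)
    diff = s0 - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))

log_prior = lambda th: -0.5 * th * th / 25.0  # assumed N(0, 25) prior

# SL1-SL3: MH with the estimated synthetic likelihood in the ratio;
# the symmetric proposal makes the q-terms cancel.
th, lp = y0.mean(), log_sl(y0.mean()) + log_prior(y0.mean())
chain = np.empty(4000)
for t in range(chain.size):
    prop = th + 0.3 * rng.standard_normal()
    lp_prop = log_sl(prop) + log_prior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        th, lp = prop, lp_prop
    chain[t] = th
print(chain[1000:].mean())  # close to y0.mean()
```

Each iteration costs m forward simulations, which makes the expense of pseudodata generation, and the appeal of recycling proposals as in Levi & Craiu (2022), very concrete.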

Conclusion
The challenges posed by Bayesian computation are important and require classical methods to be significantly reframed. I have discussed a number of methods that address two types of challenges: big data and intractable likelihoods. These two challenges cannot always be neatly separated and, in fact, I expect to increasingly see problems that combine the two. My discussion has not included important classes of approximation for Bayesian inference that eliminate the need for Monte Carlo sampling altogether, e.g., variational Bayes (Blei, Kucukelbir & McAuliffe, 2017) or INLA (Rue et al., 2017) (both also mentioned in Section 2). While these methods can be extremely efficient in certain models or when aimed at particular applications, they lack the level of generality exhibited by some of the other methods discussed so far. Clearly, our efforts to expand the toolbox of Bayesian computation in the modern era are just revving up. Much remains to be explored, from new conceptual ideas to efficient and automatic implementations. I urge those willing to participate in this adventure to be more accepting of the idea that approximations are unavoidable when dealing with challenges like the ones I described here, but to also keep in mind that the errors incurred must be theoretically and practically controlled under any scenario of interest.

BAYES IN THE TIME OF COVID (PG)
In contemplating the direction of my contribution to this article, I considered looking back to comment on the evolution of Bayesian analysis over recent decades. I also pondered looking forward to speculate on where this field might be headed. In the end, however, I decided instead to home in on the present. The global pandemic has upended society, and Bayesian methods have proved useful within the scientific response to it. I briefly describe several examples of this utility, drawing on work I have been involved in as well as that of others.

Uncertainty Arising in Diagnostic Testing
Most diagnostic tests used in medicine are imperfect. At least qualitatively, it is well known that false-positive and false-negative test results can occur. Quantitatively, epidemiologists and statisticians strive to react appropriately to this reality. In typical-use settings, a diagnostic test is evaluated methodically before widespread use and the manufacturer states the test's performance characteristics as part of a "package insert." This would be expressed as values for sensitivity and specificity, the chance that a correct test result ensues for a true positive and a true negative subject, respectively.
At the start of the pandemic, diagnostic tests for COVID-19 were developed and widely deployed with amazing haste. For instance, in the case of PCR testing of nasopharyngeal swabs to diagnose current infection, Canadian provinces started posting daily reports on their case-finding efforts around the beginning of February 2020. Soon after, by April 2020, the first serological tests for prior infection were being used in scientific studies (so-called "sero-surveys"). In both cases, and because of the necessary haste, testing data were collected and analyzed with less than the typical understanding of the diagnostic test's performance. Thus, statistical efforts were made to acknowledge the uncertainty about performance, a task for which Bayesian methods are well, if not singularly, suited. Burstyn, Goldstein & Gustafson (2020) presented an early instance of an analysis acknowledging PCR test imperfection. (The first version of this article was posted to medrxiv.org on 11 April 2020.) The authors showed adjusted epidemic curves for both the province of Alberta and the city of Philadelphia for March 2020. More technically, they gave Bayesian point and interval estimates for the daily number of true positives among those receiving a test, as distinct from the daily number of reported positives.
In formulating the underlying Bayesian model, the authors made an important design choice to let the unknown test sensitivity vary by day, whereas the unknown test specificity was presumed static. To exemplify matching the prior specification to the scientific context at hand, I elaborate briefly on this point, with pertinent discussions in Burstyn, Goldstein & Gustafson (2020) and Günther et al. (2021). It is well understood that PCR technology has effectively perfect "technical" sensitivity, i.e., virtually any small amount of live virus on the swab will light up the machine. The swabbing, however, is the weak link. Even a swab done by a highly trained health practitioner may not capture any virus particles if the (truly infected) test subject has low virus levels in the nasal cavity. Consequently, effective sensitivity will be lower among recently infected individuals compared to those with more established infections. Extrapolating further, when testing a largely asymptomatic population, true positives will tend to be recently infected, so sensitivity will be lower. However, if only those with respiratory symptoms are eligible for a test, then the true positives will tend to have more established infections, resulting in a higher sensitivity. It is then not reasonable to expect a globally valid package-insert value for test sensitivity, since the nature of the population being tested is critical. Moreover, particularly early in the pandemic, there were temporal changes in eligibility for a test (based on symptoms, say) due to kit availability, lab capacity, and other considerations. Hence, a prior specification allowing a smooth change in test sensitivity over time was an important ingredient in the analysis. Turning now to testing for prior infection, in April 2020 a controversy arose in both scientific and lay circles regarding an early sero-survey conducted in Santa Clara, California. The initial preprint version of Bendavid et al. 
(2021) attracted much attention and criticism. A brief version of one concern is as follows. Package-insert values of test sensitivity and specificity, p̃_n and p̃_p say, are inputs to an analysis inferring lifetime prevalence of infection from a sample. With a truly rare infection, even if p̃_p is close to 1, the estimated prevalence can vary strongly under a small perturbation to p̃_p. Because of the urgent need to roll out tests and studies at warp speed, validation studies to determine diagnostic test properties were based on relatively few (known negative and known positive) test specimens, and a problem ensued.
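The arithmetic behind this concern is worth seeing explicitly. Under the classical misclassification identity (a standard result, not specific to the study in question), the apparent positive rate q relates to the true prevalence p via q = p × sensitivity + (1 − p) × (1 − specificity), which inverts to the Rogan–Gladen correction sketched below; the numbers are purely illustrative.

```python
def corrected_prevalence(q, se, sp):
    """Rogan-Gladen correction: invert q = p*se + (1 - p)*(1 - sp)
    to recover the true prevalence p from the apparent rate q."""
    return (q + sp - 1.0) / (se + sp - 1.0)

# With a rare infection (apparent positive rate 1.5%), a half-point or
# one-point change in the assumed specificity swings the estimate sharply.
q, se = 0.015, 0.80
for sp in (1.0, 0.995, 0.99):
    print(sp, corrected_prevalence(q, se, sp))
```

Moving the assumed specificity from 100% to 99% here cuts the estimated prevalence roughly threefold, which is exactly why uncertainty in the specificity must be propagated rather than plugged in.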
Gelman & Carpenter (2020) reviewed this challenging situation and provided a careful Bayesian analysis acknowledging the uncertainty in test specificity (and sensitivity). These authors demonstrated much greater a posteriori uncertainty about the lifetime prevalence in the study population relative to the initial version of Bendavid et al. (2021). Gelman and Carpenter also illustrated other modelling features that can be brought to bear quite simply in a Bayesian framework. Of note, they showed how a hierarchical Bayesian analysis can incorporate multiple validation studies, each of which involves similar but not identical test characteristics.

Uncertainty Arising from Unknown Testing Patterns
Early in the pandemic, it was particularly challenging to learn infection rates and consequently infection fatality rates. Even in jurisdictions able to enumerate deaths due to COVID-19 infection relatively well, there were probably many undiagnosed infections. In terms of the infection fatality rate then, the numerator could be estimated reliably. However, particularly before the availability of serological tests for past infection, reasonably estimating the denominator was challenging. Wu et al. (2020) presented a Bayesian analysis targeting the total number of infections in the United States up to mid-April 2020. As part of the authors' modelling of state-level testing data, they built a defensible prior distribution describing relationships between the severity of respiratory symptoms and the likelihood of being tested. In common with Gelman & Carpenter (2020), they also used a prior distribution to adjust for the imperfect diagnostic test.
The principal finding of Wu et al. (2020) was that the total number of cases was likely between 3 and 20 times greater than the number diagnosed. Of course, this is a very wide credible interval. However, this width is appropriate given the information available at the time. In common with Gelman & Carpenter (2020), a thoughtful Bayesian analysis of the available information did not yield a sharp answer. This is a feature, though, not a bug. The principled propagation of uncertainty afforded by Bayes' theorem implies that sharper inference would not be justified. In a related vein, Campbell et al. (2021) developed a Bayesian model admitting both surveillance data (tests for current infection within the confines and recommendations of public health surveillance at the time and place in question) and sero-survey data (tests for previous infection in a random sample conducted in the target time and place). The surveillance data were handled by modelling the association between infection status and testing status via an unknown parameter describing the extent to which the testing is preferential (i.e., homing in on those infected) rather than random. The authors ascribed a hierarchical prior distribution to these jurisdiction-specific preferential testing parameters. By applying this methodology to an evidence synthesis for European countries in Spring 2020, the infection fatality rate was estimated to be 0.53% (the posterior median) with an accompanying 95% credible interval of (0.39%, 0.69%). This inference is quite compatible with other estimates targeting European settings at about the same point in the pandemic but using different data sources and methods.

Just Turn the Bayesian Crank, or Understand it as Well?
In the examples alluded to above, Bayesian inference can be "plug and play." Once the heavy lifting is done in making defensible model and prior assertions, these can be encoded in a Bayesian software package (see Section 5). Then, data go in and posterior inference comes out. Scientifically, this can be the end of the narrative arc. One has a general assurance that reported estimates and posterior uncertainties are founded in principle based on the combined information content of the data, the supplied model, and the prior assertions.
As a specific example of ending the narrative, Burstyn, Goldstein & Gustafson (2020) presented the posterior distribution of the (time-varying) sensitivity and (static) specificity of the diagnostic test. Focusing on the Alberta data, the posterior distribution of test sensitivity was very similar to the prior distribution, i.e., the data did not provide additional information about the false-negative rate. On the other hand, and perhaps curiously, the posterior distribution of specificity was far more concentrated than the prior distribution. The former concentrated above 99.5% specificity with a mode of 100%, despite the latter being uniformly spread between 95% and 100%. The data were quite certain that the specificity was very close to perfect. An applied user can simply stop here and take this away as useful (or not so useful, in the case of the finding for sensitivity) knowledge. However, the mathematical scientist will naturally ask: why are the findings such?
It turns out that the explanation for the above findings arises from the partial identification that underlies this Bayesian model. The concept of partial identification is long studied, particularly through a frequentist lens and with an econometrics slant. Manski (2003) is a well-known reference. More recently, I have attempted to give a somewhat thorough Bayesian treatment of the topic (Gustafson, 2015). In essence, in lower information settings such as those described above, not all the parameters are uniquely determined by the law of the observable data. However, those not uniquely determined may be subject to inequalities in terms of those that are. By elucidating these inequalities, one can understand directly how the Alberta data said so little about test sensitivity and so much about specificity.
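To make one such inequality concrete, here is a schematic version of the type of constraint involved (my own simplified illustration, not the exact elucidation in Gustafson, 2015). The observable positive-test rate is q = p × Se + (1 − p)(1 − Sp); since p × Se ≥ 0, any (p, Sp) consistent with the data must satisfy Sp ≥ 1 − q∕(1 − p). A small observed q therefore pins the specificity near 1, while the sensitivity, entering only multiplied by the small p, remains essentially unconstrained.

```python
def specificity_lower_bound(q, p):
    """Bound implied by q = p*Se + (1 - p)*(1 - Sp) with Se >= 0:
    Sp >= 1 - q / (1 - p)."""
    return 1.0 - q / (1.0 - p)

# Illustrative numbers (not the Alberta data): an observed positive rate
# of 0.4% with true prevalence below 2% forces specificity above ~99.6%.
print(specificity_lower_bound(q=0.004, p=0.02))
```

This is the mechanism by which the data can say a great deal about specificity and almost nothing about sensitivity, matching the posterior behaviour described above.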
As it happens, such an understanding of what lurks behind the crank is also considered in Campbell et al. (2021). Tucked away in a supplement is a mathematical elucidation of the partial identification structure applicable to the surveillance data part of the model. (The sero-survey component is much simpler, with those tested presumed to be a random sample from the population.) Again, by elucidating inequalities in the parameter space, one sees that, depending on the configuration of infection rates and the extent of preferential sampling across jurisdictions, the surveillance data contribute differently to inference about the infection fatality rate. We can go beyond simply taking the posterior distribution as a fait accompli.

Looking Ahead
Even before the onset of the pandemic, the role of statistical modelling and inference in improving the human condition was, arguably, on the upswing. As we all fervently wish for the pandemic to appear in the rear-view mirror, it seems likely that recent experiences will expedite this trend further. Owing in large part to computational advances, the Bayesian toolbox proved itself to be at the ready as part of the pandemic response. This success should spur efforts to refine the toolbox further and to apply it more widely in the years to come.

DISCUSSION
To conclude this article, we briefly comment on the relationship between algorithms (the focus of Sections 2 and 3) and applications (the focus of Section 4). Clearly, the former enable the latter. That said, it is interesting to note how this enabling works. There are long-standing efforts to build software that permits users to compute posterior quantities without relying on expert knowledge of algorithms. While perhaps we are not fully there yet, the ideal is that an applied user will just declare a model and a prior, input the data, and press the "compute posterior" button. By no means do we attempt to mention all of the software for Bayesian inference here. We note, however, that the BUGS (Bayesian inference using Gibbs sampling) software package (Gilks, Thomas & Spiegelhalter, 1994; Lunn et al., 2000; Lunn et al., 2009) and its continuation (roughly speaking), the JAGS (just another Gibbs sampler) software package (https://mcmc-jags.sourceforge.io), represent three decades of evolution, with algorithmic developments to improve performance along the way.
More recently, specific algorithmic developments have spawned new software. The Stan software package (Carpenter et al., 2017), which has seen extremely rapid adoption in applied fields, implements Hamiltonian Monte Carlo algorithms (see Hoffman & Gelman (2014) and the many references therein). Even more recently, the Blang software package (Bouchard-Côté et al., 2019) uses implementations based on recent research in both SMC and nonreversible MCMC methods. Fortunately, the path proceeding from algorithm research to software implementation to scientific application is one marked by continual upgrades!