THEMIS: A Parameter Estimation Framework for the Event Horizon Telescope

The Event Horizon Telescope (EHT) provides the unprecedented ability to directly resolve the structure and dynamics of black hole emission regions on scales smaller than their horizons. This has the potential to critically probe the mechanisms by which black holes accrete and launch outflows, and the structure of supermassive black hole spacetimes. However, accessing this information is a formidable analysis challenge for two reasons. First, the EHT natively produces a variety of data types that encode information about the image structure in nontrivial ways; these are subject to a variety of systematic effects associated with very long baseline interferometry and are supplemented by a wide variety of auxiliary data on the primary EHT targets from decades of other observations. Second, models of the emission regions and their interaction with the black hole are complex, highly uncertain, and computationally expensive to construct. As a result, the scientific utilization of EHT observations requires a flexible, extensible, and powerful analysis framework. We present such a framework, Themis, which defines a set of interfaces between models, data, and sampling algorithms that facilitates future development. We describe the design and currently existing components of Themis, how Themis has been validated thus far, and present additional analyses made possible by Themis that illustrate its capabilities. Importantly, we demonstrate that Themis is able to reproduce prior EHT analyses, extend these, and do so in a computationally efficient manner that can efficiently exploit modern high-performance computing facilities. Themis has already been used extensively in the scientific analysis and interpretation of the first EHT observations of M87.


Introduction
The Event Horizon Telescope (EHT), a global array of millimeter and submillimeter radio telescopes, has resolved the horizons of at least two black holes (Doeleman et al. 2008(Doeleman et al. , 2012Doeleman 2010;Event Horizon Telescope Collaboration et al. 2019a). This provides a unique window on the high-energy astrophysical processes responsible for the substantial growth and inordinate influence of supermassive black holes (Fabian 2012;Heckman & Best 2014), the dynamics and thermodynamics of material in the strong gravity regime (Narayan et al. 1998;Yuan & Narayan 2014), and the fundamental nature of black holes (Broderick et al. 2014;Psaltis et al. 2016). However, efficiently and accurately extracting this information from the observational data presents numerous challenges, requiring the development of novel analysis tools tailored to the EHT data products, EHT-target properties, and auxiliary information.
The EHT achieves an extraordinary resolution of 13 μas, making it the highest resolution imaging instrument in the history of astronomy. It does this via very long baseline interferometry (VLBI), in which information from pairs of individual stations separated by Earth-size distances are combined to measure smallscale structure on the sky. The resulting data take the form of complex visibilities, directly related to the Fourier transform of the image. This can be performed in all four Stokes parameters, yielding complete information about the resolved polarization structures (e.g., . In the near future, this will be extended to multiple wavelengths (1.3 and 0.87 mm; Falcke 2017). Millimeter-VLBI observations of the primary EHT targets have already been carried out at multiple epochs, covering times ranging from 10 s to 10 yr (Doeleman et al. 2008;Fish et al. 2011Fish et al. , 2016. The first horizon-resolving images have been published, with ancillary scientific analyses (Event Horizon Telescope Collaboration et al. 2019aCollaboration et al. , 2019bCollaboration et al. , 2019cCollaboration et al. , 2019d hereafter Papers I, II, III, IV, V, and VI, respectively). These have revealed a resolved shadow with brightness maximum offset from the direction of the large-scale jet, broadly consistent with the model predictions in Broderick & Loeb (2009a), Dexter et al. (2012), and Mościbrodzka et al. (2016). Extracting quantitative information about these has required detailed model comparisons directly with the complex visibilities (see, e.g., Papers V, VI).
Difficulties in the phase calibration, and lesser-though still significant-complications in the amplitude calibration of these visibilities, have motivated the construction of a set of VLBI observables (e.g., visibility amplitudes and closure phases : Jennison 1958; closure amplitudes: Twiss et al. 1960Twiss et al. , 1962 visibility polarization fractions: Wardle 1971, Roberts et al. 1994, Fish et al. 2009) that probe the underlying image structure in nonintuitive ways. These have traditionally been interpreted within the context of a simple set of phenomenological models, e.g., multicomponent Gaussians. However, the substantial structure anticipated on horizon scales exhibited by the primary EHT targets has given rise to a broader modeling effort, which includes a variety of physical processes (Narayan et al. 1998;Falcke & Markoff 2000;Yuan et al. 2002;Broderick & Loeb 2006a;Chan et al. 2009Chan et al. , 2015Mościbrodzka et al. 2009Mościbrodzka et al. , 2014Broderick et al. 2011Broderick et al. , 2016Dolence et al. 2012;Yuan & Narayan 2014;Gold et al. 2017;Shiokawa et al. 2017;Chael et al. 2018a).
This modeling effort is further motivated by the large amount of ancillary data that exists for EHT targets. All EHT targets are necessarily bright radio sources and thus have been the object of substantial astronomical scrutiny. Both the Galactic center (SgrA * ) and M87 have been studied across the electromagnetic spectrum, from decameter wavelengths (de Gasperin et al. 2012) to very high-energy gamma-rays (>1 TeV; Eckart et al. 2006). Moreover, due to the close proximity to their central black holes, both are empirically highly variable, providing statistical information about the dynamics within the millimeter-wavelength emission regions and creating opportunities to probe this dynamics directly at multiple wavelengths (Eckart et al. 2008). Physical modeling of these sources provides the unique ability to synthesize all of these observations, which when combined with EHT data can provide a detailed description of the conditions and dynamics of material near black hole horizons.
There are substantial challenges to such a broad modeling effort. First and foremost, models of the near-horizon region are necessarily complicated, invoking multiple emission components (nonthermal and thermal emission regions with uncertain and potentially distinct locations; Özel et al. 2000;Dexter et al. 2010;Shcherbakov et al. 2012;Chandra et al. 2015;Ressler et al. 2015Ressler et al. , 2017Mao et al. 2017;Chael et al. 2018a;Davelaar et al. 2018), a variety of dynamical processes (orbital motion, accretion flow height, winds, jets, explosive events, etc.; Broderick & Loeb 2006a;Dolence et al. 2012;Dexter & Fragile 2013;Fraga-Encinas et al. 2016;Pu et al. 2016a;Roelofs et al. 2017;Medeiros et al. 2018;Pu & Broderick 2018;Jeter et al. 2020), strong lensing in a potentially uncertain spacetime (Kerr or beyond; Bambi & Freese 2009;Johannsen & Psaltis 2010;Johannsen 2013;Broderick et al. 2014;Johannsen et al. 2016;Mizuno et al. 2018), polarization transfer effects (e.g., Faraday rotation and conversion; Shcherbakov et al. 2012;Mościbrodzka et al. 2017;Jiménez-Rosales & Dexter 2018), and propagation effects (e.g., interstellar scattering; Johnson & Gwinn 2015;Johnson et al. 2018;Issaoun et al. 2019). To add to the complexity of this comparison, only Fourier modes along specific tracks in the two-dimensional Fourier domain are probed via Earth-aperture synthesis on timescales that can be comparable to intrinsic source variability . Thus, any tools constructed to make comparisons between physical models of EHT targets and the collection of EHT and auxiliary data must be extremely flexible.
Second, there are clear emission-model independent features in many images that arise from the structure of the underlying spacetime. These include the black hole shadow-the silhouette of the black hole determined by its photon orbits, first described by Hilbert (1917) and simulated by Luminet (1979). This is bounded by the photon ring, a bright ring arising from the stacking of multiple images, in which the gross features of the spacetime are encoded. Thus, there is substantial motivation to directly extract these generic features from the EHT data alone. Again, this is complicated by the indirect relationship between the VLBI observables and the image, resulting in frequently counterintuitive conclusions. Hence, ideally, any tools for assessing the presence and properties of image structures should be able to extend to phenomenological models as well.
Third, the nature of EHT data has evolved rapidly over the past decade, growing as the sensitivity and baseline coverage improved. It is far from clear that any particular set of EHT data types is optimal for a given astrophysical or gravitational question. In some cases, new data types have been developed based on both instrumental and observational limitations (e.g., visibility polarization fractions). Similarly, intrinsic source variability has motivated the development of sophisticated statistical descriptions of observable quantities (Kim et al. 2016). Given the broad range of EHT and ancillary data types, any model comparison effort must maintain substantial flexibility in the kinds of information that it can utilize.
Fourth, there are many facets of the intrinsic data calibration that are potentially degenerate with modeling efforts. Examples include the calibration of the complex station gains and full instrument polarization response. Often these are obtained via fitting models of the instrument while assuming simplified source structures and/or iterative procedures in which the calibration is performed between attempts at source reconstruction (e.g., self-calibration). However, both of these impact the accuracy and precision of model parameter reconstructions. Thus, ideally, these calibration steps would be incorporated directly into the model-fitting effort.
Finally, in many cases, the construction of physically realistic models is computationally expensive, requiring ray tracing (relatively cheap) and radiative transfer (often expensive) through model structures. This difficulty is compounded by the often multimodal nature of the reconstructed posterior parameter distributions (see, e.g., Broderick et al. 2016). As a result, any analysis tools must be both computationally efficient and be able to exploit the large investment in high-performance computing resources.
It is to address these challenges that we have begun the development of an analysis framework for EHT and ancillary data: THEMIS. THEMIS is designed to be modular, extensible, and highly parallel, enabling the extraction of increasingly detailed information from EHT observing campaigns, both individually and in aggregate. Here we present the underlying design philosophy, structure, and validation tests of THEMIS, including the reproduction of a variety of published analyses. We then demonstrate the ability of THEMIS to extend these, presenting new analyses of phenomenological models that include the full set of published EHT observations. THEMIS has already been employed in the analysis of the first horizon-resolving imaging observations of M87 by the EHT (Papers V, VI).
In Section 2, we summarize the algorithms, components, and implementation details of THEMIS. Individual features are described in Sections 3-7. Various tests used to validate THEMIS features are presented in Section 8. A handful of novel results enabled by THEMIS are collected in Section 9. The computational performance of THEMIS and its key components, including the implications for high-performance computing (HPC) systems, is addressed in Section 10. Finally, conclusions are summarized in Section 11.

Structure
The driving motivation for THEMIS is to produce a modular and easily extensible framework for unifying existing and developing future analyses of EHT and auxiliary data. Modularity reduces the practical bars to significant contribution substantially: would-be developers need only understand the relevant elements of the interfaces between modules. In the presence of rapidly evolving data sets and model types, this is critical to ensuring the longevity of prior development efforts. THEMIS consists of five distinct collections of components, each of which is designed to be interchangeable: Data Structures: Management and standardization of observational data throughout THEMIS. These facilitate the rapid introduction of new data products, expand the capability of existing data products, and define the objects for which predictions are ultimately made. Models: Any algorithm that produces a prediction for some data object given a list of parameters. Models may be physically motivated or purely phenomenological. They are directly tied to underlying data structures via the declaration of those for which predictions can be made.
Priors: Priors encapsulate preexisting information about parameter values. Likelihoods: Likelihoods provide a method for directly comparing model and data objects. Note that in many cases elements of the underlying model may be subsumed into a likelihood (e.g., nuisance parameters that can be analytically marginalized over). Samplers: Samplers provide methods for efficiently exploring the model parameter space, providing information about the model parameters.
In practice, there is some overlap between component classes (e.g., Models, Priors, and Likelihoods), which may be implemented in more than one way. Nevertheless, this has proven sufficiently modular to enable rapid and significant model development already.
All THEMIS-based analyses are structured in the following way: 1. Generate the desired data objects, e.g., by reading in existing data sets. 2. Create an appropriate model object, i.e., declare a model capable of making predictions for the data selected. 3. Specify prior probability distributions for each model parameter. 4. Construct the relevant likelihood objects, combining data sets as desired. 5. Execute a sampler, reporting sampler-specific parameter information (e.g., generate chains for Markov Chain Monte Carlo (MCMC) samplers).
With the execution of the analysis now conceptually modularized, variations in each stage may be made with little practical effort.

Implementation
The main function is kept concise and is the only element of THEMIS a user that is simply running THEMIS needs to modify. The user may choose interchangeably different EHT data set(s), theoretical model(s), priors, likelihoods, and samplers to employ. Conceptually, this function is organized in a fashion that closely follows the analysis pipeline listed at the end of the previous section to improve usability. THEMIS also allows users to add a whole new functionality, such as additional models, which can be included easily into a clear and well-established structure. An object-oriented programming framework, along with inheritance, permits a clear and concise definition of component interfaces. Examples of how these are propagated through various THEMIS components are explicitly illustrated in the inheritance diagrams shown in Figures 1 and 2. Of practical importance, as seen in Figure 1, are the various classes of predictions enabled by a particular model class propagated to subsequent child classes (in this case, image-based models, see Section 4.1); for more details, see Section 4.
THEMIS is under version control provided by git with a modern, state-of-the-art branching strategy including master, development, and feature branches. Users are encouraged to generate new code branches, develop, and contribute to the code in the form of a pull request that will be reviewed by the THEMIS core development team.
A suite of tests is run regularly via a script in an effort to identify bugs or regressions as early as possible. The script performs these tests and sends a report to the THEMIS core development team. These include short tests using EHT data as well as full-scale parameter estimation validation tests similar to the ones presented in Section 8.
The code is written in C++, making it maximally portable, and has been tested on a variety of systems. THEMIS is designed with minimal dependency on external libraries to avoid installation conflicts; currently, the only required external libraries are FFTW (Frigo & Johnson 2005) and the Message Passing Interface (MPI). 93 Up-to-date documentation is critical in a rapid development environment. To meet this challenge, THEMIS has integrated documentation comments which may be optionally rendered via Doxygen 94 to produce a comprehensive, cross-linked HTML and/or PDF document.
We now turn to describing each component collection independently.

THEMIS Data Structures
Within THEMIS, observational data are collected in typespecific data structures. Each has a singular data element defined (a datum object) and an associated plural data structure (a data object) that provide additional input/output facilities and element access functions. At a minimum, these provide access to the values and their uncertainties. Typically, they Figure 1. Inheritance diagram for the model_image object within THEMIS generated via Doxygen. These are models whose primary output is a raster image. Note that a number of models that are either analytically tractable or extend beyond a single, raster image are not shown. A full listing of THEMIS models can be found in the THEMIS documentation. 93 Information on the MPI 3.1 standard can be found at www.mpi-forum.org. 94 Information on Doxygen features, directives, and on how to obtain and install it may be found at www.doxygen.org. include a variety of additional "accoutrements," i.e., information necessary or useful in modeling the data. Importantly, these accoutrements are both data-type specific and extensible: information that only becomes useful in subsequent observations or analyses can be added without modifying the datamodel interface. For example, observed fluxes may initially include frequency as an accoutrement and later expand to include time, observation facility, etc.
Organizing data this way within THEMIS permits evolution in how data are employed in model comparisons and presents a simple way in which to include additional types of data that are currently unforeseen. This is especially important given the wide variety of auxiliary data that exist for EHT targets, most of which have yet to be fully utilized. This has already been implemented for a number of existing data types, including all for which EHT data have already been reported (see Table 1). We summarize each of these below.

Visibility Amplitudes
The primary product of VLBI observations is complex visibilities, corresponding to the Fourier modes of the image on the sky at spatial frequencies given by the projected baseline presented by pairs of VLBI stations. Specifically, in the absence of confounding effects, the complex visibility is given by where (u, v) is the two-dimensional projected baseline length between the ith and jth stations expressed in units of the observed wavelength, and I(l, m) is the spatial intensity distribution at angular position (l, m) (for a comprehensive introduction to radio interferometry, see, e.g., Thompson et al. 2017). 95 In practice, these are modified by a variety of observational complications, chief among which are atmospheric absorption and phase delays at individual stations, which impact the amplitude and complex phase of V ij . Of these, the latter is especially problematic, resulting in phase shifts of the V ij by many times 2π, appearing to randomize the phase on every baseline. As a result, often the magnitudes of the visibilities, | | V ij , are employed, which are subject only to a comparably modest uncertainty, 1%-20%, depending on station and atmospheric conditions (see, e.g., Lu et al. 2018), albeit containing less information on the structure of the image. The number of visibility amplitudes generated by an interferometer grows quadratically with the number of stations, N, scaling as ( ) µ -N N 1 2. Throughout an  observing campaign, the rotation of Earth produces a large number of independent measurements at different projected baselines. A large number of EHT visibility amplitudes have already been published for the primary EHT targets, including SgrA * (Doeleman et al. 2008;Fish et al. 2011;Lu et al. 2018) and M87 (Doeleman et al. 2012;Akiyama et al. 2015). We list these in Table 1.

Closure Phases
The atmospheric phase delays introduce random stationspecific phase errors that complicate the reconstruction of the phase of the complex visibilities. While strategies exist to model the phase errors, including self-calibration and phase referencing, 96 a simple and efficient way in which to obtain some information about these phases is via the closure phase, i.e., the sum of the phases of a triplet of visibilities measured on the baselines between some triplet of stations. 97 Because the baselines "close," i.e., ( , all station-specific phase errors vanish identically, leaving a quantity that is subject only to nonclosing errors and dominated by the image structure (Paper III). Of particular importance is the fact that closure phases are also insensitive to the image blurring induced by the diffractive component of the interstellar scattering. Closure phases are not unique-for an array with N stations only ( )( ) --N N 1 2 2 are independent-a result that is presaged by their independence from the phase delays.
Closure phases have been reported for a number of years by the EHT for SgrA * in Fish et al. (2011) andLu et al. (2018), and for M87 in Akiyama et al. (2015), as summarized in Table 1. More recently, a large set of closure phases has been reported for the 2017 April EHT observations of M87 (Paper III).

Closure Amplitudes
Station-specific amplitude calibration errors can also be mitigated by combining visibilities measured on multiple baselines. The closure amplitude is constructed from combinations of visibilities measured on four stations, and is insensitive to variations in the flux calibration and phase delays. Again, closure amplitudes are also insensitive to the image blurring induced by the diffractive component of the interstellar scattering. As with the closure phase, this comes at the price of uniqueness; there are only ( ) -N N 3 2 independent closure amplitudes.
Closure amplitudes constructed from EHT data have not yet been published, primarily due to the limited number of stations participating in early observations. However, recent observations have generated a number of trivial closure amplitudes, i.e., amplitudes for which one baseline is very short (10 km), as part of the calibration process (see, e.g., . More recently, a large set of closure amplitudes have been reported for the 2017 April EHT observations of M87 (Paper III).

Interferometric Polarization Fractions
All EHT sites observe simultaneously in two polarization states (Paper II). From these, it is possible to construct complex visibilities in all four Stokes parameters, (I, Q, U, V ) (see, e.g., . Independently, these can be used to construct visibility amplitudes, closure phases, and closure amplitudes. However, additional information may be obtained by combining observations made in different Stokes parameters. The interferometric polarization fraction, , are the visibilities associated with Stokes Q and U, and V ij is the visibility defined in Equation (1), is the extension of the familiar polarization fraction to the individual Fourier modes of the image.  m is not to be mistaken with the Fourier transform of the linear polarization fraction as measured in the image domain. Unlike the standard polarization fraction,  m ij may be larger than unity and can exhibit counterintuitive pathologies for even simple source models (see the discussion surrounding Figure S6 in the supplemental material in . If the circular polarization vanishes, like closure amplitudes, the interferometric polarization fractions are insensitive to station-specific flux calibration uncertainties and the diffractive component of the interstellar scattering. This approximation is frequently quite good: for the astronomical sources of primary interest to the EHT, the integrated circular polarization fraction is typically 1% (see, e.g., Muñoz et al. 2012;Bower et al. 2018).
Interferometric polarization fractions have been reported for SgrA * and indicate the presence of ordered horizon-scale polarization structures . We summarize these in Table 1.

Flux Measurements
A key auxiliary set of observations is the spectral energy density distributions (SED) for primary EHT targets, which typically place strong limits on the emitting particle distributions, which are otherwise uncertain. In addition, multiwavelength light curves are a key probe of the nature and origin of variability in the emission regions of the source. Both empirical constraints are intrinsically encoded in measurements of the unresolved source flux, F ν , effectively equivalent to the visibility amplitudes measured at "zero baseline," i.e.,a single dish. The distinction between these arises in the accoutrements associated with the data, e.g., the origin of the observation, wavelength, time, etc.
Multiple sets of flux measurement data for SgrA * and M87 exist. For SgrA * , one set is summarized in Table 1.

THEMIS Models
Within THEMIS, a model is any algorithm capable of generating a prediction for any THEMIS data type. Thus, THEMIS models are closely aligned with THEMIS data structures -for each data type, there is a corresponding base model type. Models can be based on multiple base model types, i.e., they are capable of generating predictions for more than one type of data. This enables a broad, easily extensible, and backwards compatible framework for defining models that permits incrementally increasing sophistication. Importantly, it provides a uniform interface for both phenomenological models, which are designed to make predictions for a handful of data types, and physically motivated models, which can simultaneously make predictions for a wide variety of data types.
The manner in which predictions are made is not prescribed. That is, where analytical expressions for the relevant data type exist (e.g., visibilities from simple geometric models), models are capable of employing these. For more complex models, numerical computations are often required. In anticipation of numerically produced predictions, THEMIS permits the passing of an accuracy parameter for each value that specifies the accuracy with which these must be generated; typically, setting this to 25% of the measurement uncertainty is sufficient to generate accurate parameter estimates (see Appendix A).

Image-based Models
Because the EHT directly probes the structure of horizonscale images, THEMIS contains an image-based model type; how this depends on the underlying data-based model types and some examples are shown in Figure 1. This provides a set of utilities for generating and manipulating visibility-based data from models that primarily generate images.
Because image generation is frequently computationally intensive, the image-based model introduces an additional position angle parameter, permitting the specific model implementations to dispense with trivial image rotations, leading to a substantial potential reduction in the time required to sample a broad range of parameters.
Once generated, images are padded with zeros by a factor of 8 by default to sinc-interpolate in the numerically computed complex visibilities. The complex visibilities are computed on a two-dimensional grid of (u, v) values via a two-dimensional fast Fourier transform (FFT) using the FFTW library (Frigo & Johnson 2005). There are no restrictions on the image dimensions, though it is expected that the image is computed on a rectilinear grid with uniform pixel size; dimensions that factor into small primes will be marginally faster.
Complex visibilities are then estimated at arbitrary (u, v) via interpolation. By default, THEMIS employs bicubic interpolation, though a user may specify bicubic spline interpolation if desired. From these, the closure phases are constructed via Equation (2). While visibility magnitudes may also be constructed from the interpolated complex visibilities, it is considerably more accurate to interpolate the visibility magnitudes directly. 98 These are then used directly or to compute closure amplitudes via Equation (3). The accuracy of this interpolation depends on the degree to which image features are captured by the image resolution and field of view. Ultimately, this must be empirically assessed on a model-by-model basis. For the compact radio sources like those anticipated to be relevant for EHT targets, these interpolation errors are typically a significantly subdominant source of uncertainty in model evaluation and parameter estimation.

Phenomenological Geometric Models
Within THEMIS, a number of phenomenological geometric models have been implemented. These are models for which no underlying physical emission mechanism is identified for the origin of the image structures. However, such models are capable of extracting signatures of geometric features associated with underlying physical processes of interest, e.g., black hole shadows. Currently implemented phenomenological models include the following.

Symmetric Gaussian
Historically, the first shadow size estimates from millimeter-VLBI observations of SgrA * and M87 arose from fitting symmetric Gaussians to visibility amplitude measurements (Doeleman et al. 2008). Therefore, we have implemented within THEMIS a model consisting of a single symmetric Gaussian component, characterized by a size, σ; and an amplitude, V 0 . This makes predictions for visibility amplitudes, closure phases (trivially zero), and closure amplitudes.

Asymmetric Gaussian
The introduction of asymmetry in millimeter-VLBI images was initially characterized by an asymmetric Gaussian. Within THEMIS, we have implemented such a Gaussian model parameterized as in Broderick et al. (2011), and characterized by a size, σ; an asymmetry parameter, A; the amplitude, V 0 ; and the position angle, ξ.

Multiple Symmetric Gaussians
THEMIS also includes a model consisting of an arbitrary number of symmetric Gaussian components, each characterized by a size, σ j ; location, (x j , y j ); and amplitude, V j .

Crescent Model
THEMIS includes an implementation of the crescent model described in Kamruddin & Dexter (2013), for which the image is obtained by subtracting two nonconcentric disks, with the smaller disk lying completely inside the larger one. The complex visibilities for this model can be obtained analytically and are given by Equation (3) of Kamruddin & Dexter (2013). As in Kamruddin & Dexter (2013), we reparameterize this in terms of an amplitude, V 0 ; overall size, R; relative thickness, ψ; degree of symmetry, τ; and the position angle, ξ. Both ψ and τ are defined on the unit interval. 4.2.5. The "Xsringauss" Model THEMIS also contains an implementation of the nineparameter xsringauss model proposed in Benkevitch et al. (2016). This model was constructed in an effort to mimic a more realistic black hole accretion image like the ones commonly obtained from physically motivated models. The xringaus image is the combination of an eccentric slashed ring (i.e., with a linear brightness gradient) and an elliptical Gaussian located in the thicker side of the ring. 98 The magnitude of the gradient of the complex visibility and the visibility amplitude are related via | | ( | |) 2 , and thus the former is generally smaller than the latter. As a result, the errors in interpolation at any order are typically smaller when interpolating visibility amplitudes directly. Alternatively, this permits considerably smaller image sizes when only amplitudes are required. This model is then described by nine parameters: the zerospacing flux, V 0 ; the external radius, R ex ; the internal radius, R in ; the distance between the centers of the circles, d; the "fading" parameter controlling the minimum brightness; the Gaussian axis sizes, a and b; the fraction of the total flux in the Gaussian, g q ; and the position angle, ξ. The complex visibilities for this model, in terms of these parameters, can be also obtained analytically. The reader is referred to Section 2 of Benkevitch et al. (2016) for a more detailed description.

Visual Binary
THEMIS also features a model of two radio emitting symmetric Gaussian components in orbit around each other. The model is characterized by 13 parameters including the total flux F i , size σ i , and spectral index α i of each component; the total mass of the system M, the binary mass ratio q1, the orbital separation R, the source distance d, the phase offset Φ 0 , the cosine of the inclination angle ( ) i cos of the orbital angular momentum vector, and the position angle in the sky ξ. This model includes (and therefore also takes advantage of) relativistic effects such as Doppler boost and relativistic aberration. It is explicitly time dependent while being fully analytic and thus fast to evaluate.
This model is to be compared to long-timescale monitoring campaigns of sources such as OJ 287 or other binary candidates. Details will be published in a separate paper that focuses on this topic.

Interstellar Scattering Models
Interstellar scattering modifies the intrinsic images of SgrA * by both blurring the image (diffractive component) and adding small-scale structures associated with a random realization of refractive modes that vary slowly throughout the night (refractive component; see, e.g., . These significantly modify visibilities on long baselines and must be included in analyses of EHT observations of SgrA * . In the ensemble-average limit, i.e., when many realizations of the fluctuations in the scattering screen may be averaged over, only the diffractive component is relevant. This appears as an image smoothing via convolution with a Gaussian kernel whose parameters depend on the details of the intervening scattering screen(s). THEMIS has implemented two models for addressing interstellar scattering, both in the ensemble-average limit, which we list below. In both, the impact of scattering is imposed directly on the visibilities, for which the convolution in image space reduces to a multiplicative factor. Within THEMIS, each is implemented as a model that modifies an existing intrinsic model, with the latter introducing additional parameters. Hence, scattering provides an explicit example of how the modular structure of THEMIS enables the rapid construction of new models.
While only a very simple set of scattering models has been implemented in THEMIS thus far, more complex models are available in the literature. Physically motivated models that exhibit a smooth transition from a quadratic to a general powerlaw wavelength dependence for the size of the scattering kernel may be found in Psaltis et al. (2018). In these, the wavelength at which the transition occurs is determined by the underlying physical parameters of the screen. Updated values of the scattering kernel size from long-wavelength measurements within the context of these models may be found in Johnson et al. (2018). Implementing these updated models is left for future development.

Default Diffractive Screen
Multiwavelength observations have produced a model for the scattering kernel that is asymmetric and wavelength dependent, consistent with that anticipated by models of the scattering screen that invoke Kolomogorov turbulence within a plasma sheet (Bower et al. 2006;Johnson et al. 2018;Psaltis et al. 2018). The associated semimajor and semiminor axis sizes are given by and are oriented such that the major axis lies along the position angle ξ=78°east of north.

Parameterized Diffractive Screen
Recently, it has been shown that even for thin scattering screens, the wavelength dependence of anisotropic scattering screens may be substantially more complicated . The main uncertainty is the inner scale of the turbulence within the screen, corresponding to the dissipative scale within the sheet. For some plausible values, the wavelength dependence could depart from that found in Bower et al. (2006) near 1.3 mm. As a result, a second scattering model has been implemented in which σ maj , σ min , and the position angle are all parameterized as power laws of wavelength with unknown coefficients and powers. That is, where the seven parameters, σ A , σ B , ξ 0 , ξ 1 , α, β, and γ, may be varied. The pivot wavelength, λ p , is not a parameter (being fully degenerate with s A,B and set by the user to a convenient value.

Native Physical Models
The past two decades have seen the development of a number of physically motivated models that employ ray tracing and radiative transfer in black hole spacetimes. These have two main components: the construction of photon trajectories within the spacetime under consideration and the radiative transfer through some emitting plasma distribution. Both elements are directly affected by variations in the spacetime structure, with the emission also depending on a number of astrophysical considerations.
While this class of models is substantially more complicated than geometric models, their physical origin presents a number of significant advantages. First, they are capable of making predictions for a wide range of observations, making it possible to bring far more empirical data to bear upon them. For example, they necessarily make simultaneous, self-consistent predictions for images, fluxes, variability, and polarization features of the EHT and auxiliary data (Bromley et al. 2001;Broderick & Loeb 2006b;Huang et al. 2009;Mościbrodzka et al. 2014;Chan et al. 2015;Pu et al. 2016a;Gold et al. 2017;Chael et al. 2018a). Hence, physical modeling enables a concordance fitting effort that promises far more power to constrain the nature of the emission region (Chan et al. 2015;Broderick et al. 2016;Gold et al. 2017). Second, the spacetime structure impacts the image in many ways beyond gravitational lensing. The dynamics of the material in the emission region modifies its optical depth and therefore appearance (Broderick & Blandford 2003Broderick & Loeb 2005, 2006a, 2009bMościbrodzka et al. 2009;Pu et al. 2016a;Gold et al. 2017;Bronzwaer et al. 2018;Chan et al. 2018;Mościbrodzka & Gammie 2018;Jeter et al. 2020). Thus, in principle, modeling the brightness distribution offers additional probes of gravity (Broderick & Loeb 2006a;Johannsen 2013;Broderick et al. 2014;Johannsen et al. 2016;Mizuno et al. 2018). Third, it provides direct information about the high-energy astrophysical processes responsible for the growth of black holes and the launching of jets (Broderick & Loeb 2009b;Levinson & Rieger 2011;Mościbrodzka et al. 2014;Broderick & Tchekhovskoy 2015;Hirotani & Pu 2016;Gold et al. 2017).
Within THEMIS, two general relativistic ray-tracing and radiative transfer packages are provided. The first of these is the vacuum ray-tracing and radiative transfer package Vacuum Ray Tracing and Radiative Transfer (VRT 2 ). VRT 2 is based on the plasma radiative transfer package described in Broderick & Blandford (2003 and provides a modular framework for adding novel plasma distributions, radiative transfer mechanism, and spacetime structures. It was the basis for the images generated in, e.g., Broderick et al. (2011) and used in the analysis of Broderick et al. (2016). It also natively interfaces with THEMIS, having been written in the same programming language (C++), in a similar style. Models based on VRT 2 within THEMIS include those listed below.
In addition, the vacuum ray-tracing and radiative transfer package Odyssey described in Pu et al. (2016b) has also been incorporated within THEMIS. Based on the ray-tracing algorithm in Fuerst & Wu (2004) and the radiative transfer formula presented in Younsi et al. (2012), Odyssey can exploit graphics processing unit (GPU) cards to realize substantial speed gains for models that employ it. It requires the compute unified device architecture (CUDA)-enabled GPU cards and the CUDA compiler nvcc. Again, like THEMIS, Odyssey is implemented in C/C++ and CUDA C/C++, making its integration straightforward.

SED-fitted RIAF
This is an image at a single wavelength associated with the radiatively inefficient accretion flow (RIAF) models described in Broderick & Loeb (2006b) and refined in Broderick et al. (2011). This model employs a tabulated set of accretion flow parameters, obtained at different black hole spins and inclinations, that reproduce the observed compact radio SED of SgrA * . The model parameters are the dimensionless spin magnitude, a (in the range [0, 1]); the cosine of the inclination, q cos , ([−1, 1]); and the position angle, ξ ([−180°, 180°], as part of a model image). The accretion flow angular momentum is assumed to be aligned with the black hole spin. The intensity normalization may be included via the likelihood (see Section 6.6). An example image from the THEMIS-integrated VRT 2 package is shown in Figure 3.

Extended RIAF
This is an extension of the SED-fitted RIAF model that permits a wide range of structural parameters in the RIAF model to vary. This consists of two populations of synchrotronemitting electrons, orbiting a Kerr black hole in the presence of a toroidal magnetic field. Specifically, the proper number density and temperature of a thermal population of electrons are given by where q = z r cos and J = R r sin , where r is the standard Boyer-Lindquist radius (measured in GM c 2 ) and ϑ is the Boyer-Lindquist polar angle. Similarly, the proper number density of the nonthermal electrons is given by and has a power-law distribution in microscopic Lorentz factor above γ min with a power law corresponding to an optically thin spectral index of α (i.e., 2α−1). These are emitting within a toroidal magnetic field with comoving strength and orbiting with a four-velocity outside of the innermost stable circular orbit (ISCO) given by where ℓ K is the specific Keplerian angular momentum and u t is determined by the standard normalization condition on u μ ; inside of the ISCO, the material plunges on ballistic orbits  (Cunningham 1975). Thus, there are 15 parameters: black hole spin, a; cosine of the black hole spin inclination, q cos ; black hole spin position angle, ξ; thermal electron density normalization, n e t , , radial power law, η t , and scale height, h t ; electron temperature normalization, T e ; radial power law, τ t ; nonthermal electron density normalization, n e,nt , radial power law, η nt , and scale height h nt ; minimum microscopic Lorentz factor, γ min , and spectral index, α; plasma beta, β, and sub-Keplerian fraction κ. Note that subsets of these may be held fixed or varied simultaneously via the definition of a wrapper model.

Orbiting Hot Spots
Major dissipative events within the accretion flow, such as magnetic reconnection events and shocks, can generate initially compact, orbiting, synchrotron-emitting hot spots. These may increase the emission of SgrA * by orders of magnitude before inducing any dynamical effects. Therefore, they may be roughly modeled as orbiting, Gaussian nonthermal particle overdensities that subsequently synchrotron emit in the radio and infrared, restricted to the equatorial plane (Broderick & Loeb 2005, 2006a. To model the velocity profile of the spot we use a two-parameter, ( ) [ ] a k Î , 0,1 r , four-velocity given by K ff subscripts denote Keplerian and freefall motion respectively, and Pu et al. 2016a). Equation (11) is a two-parameter description of the flow dynamics and can also be applied to the extended RIAF model in Section 4.4.2, thereby generalizing Equation (10). Thus, there are 10 parameters needed for this model: black hole spin, a; cosine of the black hole spin inclination q cos ; black hole spin position angle, ξ; central spot nonthermal electron density n e,spot ; spot radial size R s ; initial spot location in time, t 0 , radius, r 0 , and azimuthal angle f ; 0 the sub-Keplerian parameter, κ; and the radial infall parameter α r .

Shearing Hot Spots
In practice, hot spots will subsequently shear and cool. Thus, THEMIS also includes a shearing hot spot model (Jeter et al. 2020) that incorporates the expansion of the hot spots within a background accretion flow. The parameters of this model are identical to the orbiting spot model above.

External Physical Models
There is no intrinsic bar to including additional ray-tracing and radiative transfer packages within THEMIS. Doing so offers a number of benefits, including the ability to rapidly generate new models within THEMIS itself, efficient parallelization, and improved portability. However, native integration is not necessary. It is often initially faster, and occasionally necessary, to externally include modeling software. For THEMIS, this has been done for a number of existing packages: GRTRANS: A publicly available general relativistic, polarized radiative transfer code written in FORTRAN, described in Dexter (2016) and . GRTRANS and by extension also THEMIS are coupled to the HARM3D general relativistic, magnetohydrodynamic (GRMHD) code (Gammie et al. 2003;Noble et al. 2006;McKinney et al. 2014). ASTRORAY: A significantly extended version of the general relativistic polarized radiative transfer code written in C/C ++ based on Shcherbakov (2014) and substantially extended in Gold et al. (2017). ASTRORAY and by extension THEMIS are coupled to HARM3D (Gammie et al. 2003;McKinney et al. 2012McKinney et al. , 2014. Note that many of these are directly coupled to a variety of existing GRMHD simulation codes such as HARM3D and BHAC. As of now, THEMIS has successfully interfaced, in at least a limited form, with the vast majority of the imagegeneration tools employed by the EHT collaboration.

Priors
Preexisting information about parameters may be imposed via priors. THEMIS provides a number of potential priors for individual parameters. These may be imposed in two distinct ways: as "priors" that modify the likelihood and "transforms" that modify the parameter values. Within THEMIS, "priors" add a term associated with a given prior distribution. These are most easily implemented and understood. However, they can be inefficient, assuming that the sampler will efficiently incorporate the modified likelihood. In contrast, "transforms" impose priors indirectly by mapping the variable being sampled into the desired prior via a coordinate transformation. These are more complicated to implement, typically requiring the integration of the desired prior probability distribution. However, they are optimally efficient, permitting the sampler to apply a more natural distribution. Note that "transforms" may be implemented intrinsically within models by choosing a convenient set of parameters.
Likelihood evaluation is short-circuited on the evaluation of priors, i.e., where the prior has zero probability (e.g., outside the limits of a linear range), the likelihood is not evaluated but rather returns the appropriate vanishing value. This achieves two goals: first, THEMIS is made marginally more efficient by avoiding unnecessary computation, and second, permits priors to be used to avoid unphysical parameter combinations, where models may return nonsensical results, e.g., negative densities passed to a radiative transfer code or black hole spin outside the range permitted by general relativity.
Currently, THEMIS has only implemented priors and transforms of a single variable. This is sufficient for most situations. However, there are situations that may benefit from priors that depend on many parameters, e.g., enforcing an ordering among the intensities of multiple Gaussian components, thereby eliminating the trivial degeneracy associated with swapping components. Nevertheless, there is no reason that such a prior cannot be implemented within THEMIS.
Implemented priors include: 1. None:a flat prior without a boundary. 2. Linear: a flat prior given two bounding values. 3. Logarithmic: a logarithmic prior given two bounding values. 4. Gaussian: a Gaussian prior given a mean and standard deviation.

Likelihoods
Models and data are systematically compared via likelihoods, which express the probability that the data was obtained from the model. Within THEMIS, a likelihood is any method for taking a parameter vector, p, and constructing a log-likelihood, . When this is generated using a THEMIS data object (consisting of a number of individual values) and a THEMIS model object, the log-likelihood is the probability of obtaining the data given the model. Likelihoods can be combined with user-supplied weights, enabling the combination of various data sets. However, when doing so it is assumed that the model parameters are unchanged, i.e., the same set of model parameters are to be supplied to each likelihood being used. All likelihoods expect a matching data type and model type, e.g., visibility amplitude data and a model that generates visibility amplitude predictions.
The likelihood generally requires information about the underlying error distribution of the data, which is typically provided via an error estimate. It is not required within THEMIS to assume Gaussian errors, i.e., likelihood classes that assume alternative error distributions (e.g., Rice distributions, etc.) are possible. In some instances, this flexibility is important, e.g., for quantities constructed from quotients (closure amplitudes, Section 6.4, and interferometric polarization fractions, Section 6.5). It is, however, often convenient numerically to presume Gaussian errors when permissible, enabling analytical simplifications that greatly improve the efficiency of THEMIS (Sections 6.6 and 6.7). Similarly, all currently implemented likelihoods assume the data values are independent-this, too, may be relaxed in principle. An obvious example of both that is of considerable interest is the covariance induced by the refractive modes in the scattering screen, the implementation of which is left for future development.
Likelihoods also can incorporate model features. In many instances, a subset of model parameters may be analytically marginalized over and in the process subsumed into the likelihood itself. We have implemented a number of examples of such "marginalized" likelihoods, i.e., likelihoods in which sets of nuisance parameters have been treated analytically. It is natural to include key systematic uncertainties of the EHT, e.g., the structure of the refractive scattering screen, in this fashion, though this is left for future development.
The likelihoods currently implemented in THEMIS include the following.

Test Cases
To facilitate testing samplers, THEMIS includes five artificial likelihoods with given distributions. The first is a multidimensional Gaussian, with user-specified mean and size, see Section 8.1.3. The second, the egg box, is considerably more complicated, producing a highly multimodal likelihood function in five dimensions: The number of peaks can be set by the range over which the priors permit the parameters, p, to vary. This is typically used to assess the ability of a sampler to accurately find widely separated, high-likelihood regions. The results of the egg box are presented in Section 8.1.2. The third artificial likelihood is the 2D Rosenbrock function: The test results are presented in Section 8.1.4. The fourth artificial likelihood is the five-dimensional Griewank problem: It features an egg box component and a long-tailed parabolic contribution. The test results are presented in Section 8.1.5. The fifth artificial likelihood is a three-dimensional Cauchy/ Lorentz distribution: where d=3 is the chosen dimension and m=0.1 the width of the line profile. It features heavy and long tails and has no welldefined mean, and therefore makes for a challenging sampling problem. The test results are presented in Section 8.1.6.

Visibility Amplitudes
THEMIS includes a log-likelihood that assumes Gaussian errors for visibility amplitudes: where | | V j and σ j are the observed visibility amplitudes and their errors, and |ˆ| ( ) p V j are the model visibility amplitudes given parameters p. Note that the true visibility amplitude error distribution is given by the Rice distribution, and for low signal-to-noise ratio (S/N) is both biased and non-Gaussian (Thompson et al. 2017). However, when data are selected such that S/N2 and approximately debiased via , the visibility amplitude error distribution is within 8% of an unbiased Gaussian distribution at all | | V , and reproduces the mode to better than 1% and the 68% and 95% cumulative widths to better than 6% (see Appendix B). Currently, the user is expected to independently implement the debiasing procedure and S/N cut in the generation of the data tables prior to reading them in THEMIS. These were made for the THEMIS analyses in Papers V and VI.

Closure Phases
In principle, closure-phase errors are non-Gaussian and correlated among station triangles. However, this is simplified in practice for EHT observations by two facts. First, current and future EHT data frequently have high S/Ns. For closure phases constructed from complex visibility measurements with S/ N>2, the closure-phase errors are well approximated by Gaussians (Rogers et al. 1984;Ricarte & Dexter 2015). Second, existing and anticipated EHT observations are performed with a highly heterogeneous array, in which a nondegenerate set of closure phases may be selected such that a single station dominates the sensitivity of each triangle. In this case, the off-diagonal elements of the covariance become subdominant, and the closure-phase measurements effectively independent (Blackburn et al. 2020).
Therefore, THEMIS includes a log-likelihood that assumes Gaussian errors for closure phases: where Φ j and σ j are the observed closure phases and their errors, ( ) F p j is the model closure phase given parameters p, and Δ(x) is the angular difference in the range [−180°, 180°). This is similar to, but distinct from, the visibility amplitude likelihood in that the difference selects the branch that minimizes the angular difference. In the limit of small σ j , this is identical to the circular dispersion in Medeiros et al. (2017) and the circular statistics in Chael et al. (2018b); at large σ j , all of these approximations differ, though in this limit the closure phases become uninformative. We adopt this simpler prescription as it facilitates addressing scattering-induced closure-phase fluctuations later.
Again, the user is expected to independently implement the S/N cut in the generation of the data tables prior to reading them in THEMIS. These were made for the THEMIS analyses in Papers V and VI.

Closure Amplitudes
Closure amplitudes provide an example of a non-Gaussian likelihood within THEMIS. Because closure amplitudes are constructed via taking ratios of visibility amplitudes, the likelihood of a single value exhibits a significant asymmetry and extended tail toward large values, characteristic of quotient distributions (see Appendix B). For S/N4, this is well approximated by a Gaussian quotient distribution, 99 given in Equation (B17). In principle, this can make use of ancillary information in the form of station system equivalent flux densities (SEFDs), though this is left for future development. Thus, at present, we assume that the parameter ρ, defined in Equation (B14), is fixed to unity, for which the Gaussian quotient approximation is accurate at all  to better than 13% for S/N4. The associated log-likelihood is where the  j and σ j are the observed closure amplitudes, thê ( )  p j are the model visibility amplitudes given parameters p (with the functional dependence suppressed for clarity), and erf(x) is the error function. This differs from Equation (B17) by constant normalization factors. In the limit ofŝ   0 j j , the third term vanishes. However, for  j of order unity, Equation (18) does not reduce to a Gaussian distribution in any S/N limit.
Finally, we note that this approximation is significantly better when j is small. Generally, the closure amplitudes can be constructed such that <  1 j , approximating this requirement. Currently, the user is expected to independently define the set of closure amplitudes such that this is true prior to reading them into THEMIS.

Interferometric Polarization Fractions
The interferometric polarization fraction provides a second example of a non-Gaussian likelihood available in THEMIS. As with the closure amplitude, the source of the non-Gaussianity is the presence of the ratio in their definition. This leads to an asymmetric likelihood with an extended tail toward large  m that is also well approximated by a Gaussian quotient distribution for S/N2, given in Equation (B10). That  m is defined by the ratio of visibilities constructed simultaneously on the same baseline places an additional constraint on the likelihood, permitting it to accurately be described by a single noise parameter (Appendix B): for S/N2, the Gaussian quotient distribution is accurate at all  m to 13% for S/N=2 and 6% for S/N4. The associated log-likelihood is identical in form to Equation (18): were  m j and σ j are the observed polarization fraction and its uncertainty, andˆ( )  p m j are the model polarization fractions associated with parameters p (with the functional dependence suppressed for clarity). This differs from Equation (B10) by constant normalization factors. As with the closure amplitudes, this is non-Gaussian even in the limit of  s  m 0 j j .

Norm-marginalized Visibility Amplitudes
Variations in the total source flux can be directly incorporated into the likelihood. Assuming Gaussian errors for visibility amplitudes, it is possible to introduce and analytically marginalize over an over-all normalization, V 00 , presuming a flat prior (Broderick et al. 2014). This provides the maximum log-likelihood: which may be identified with the minimum χ 2 , occurring at More relevant for sampling is the marginalized log-likelihood:ˆ( with the corresponding marginalized normalization, = V 00,marg V 00,max . A similar procedure could be applied for a Gaussian prior (see the following section) with very minor modifications.
By breaking the visibility amplitude data into epochs with similar visibility normalizations, corresponding, e.g., to a variable accretion rate, this can substantially increase the efficiency of sampling the remaining parameter space.

Shift-marginalized Closure Phases
At lowest order, refractive scattering induces shifts in the closure phase. These, again, may be incorporated into an appropriately constructed likelihood. Assuming Gaussian errors for closure phases, it is possible to analytically marginalize over an over-all shift, f, presuming a user-supplied Gaussian prior ). This provides both the maximum log-likelihood, where the most likely phase offset is and again Δ(x) is taken on [−180°, 180°), minimizing the magnitude of the angular difference. The maximum loglikelihood is trivially related to the χ 2 and is relevant for fit quality assessment. More relevant for parameter estimation is the marginalized likelihood, for which the log-likelihood is given bȳ with an associated marginalized value of the closure-phase shift of¯( Here, σ Φ is the width of the Gaussian prior on f; it is indicative of the amplitude of the refraction or turbulence responsible for the interepoch closure-phase fluctuations. This marginalized log-likelihood is appropriate for sampling the remaining parameters. By breaking the closure-phase data into epochs with similar visibility normalizations, corresponding, e.g., to a variable accretion rate, this can substantially increase the efficiency of sampling the remaining parameter space.

Gain-marginalized Visibility Amplitude
Station gains present a dominant source of systematic error in the construction of visibility amplitudes. Typically, it is possible to calibrate these to 10%-20% (Paper III); in instances where redundant antennas exist, network calibration schemes can reduce this to 1%. Occasionally, much larger gain variations are possible. In either case, these systematic errors overwhelmingly dominate the remaining, random components, e.g., the thermal noise. Accessing the full sensitivity of the EHT via visibility amplitude comparisons requires an efficient method of addressing the reconstruction and marginalization over these uncertain gains.
This reconstruction is facilitated by the correlations among visibilities resulting from the repeated appearance of the gain factors: in a scan involving N stations, at most N gains must be reconstructed along the underlying model from N(N−1)/2 visibility amplitude measurements. It is complicated by the time-variable nature of the gains, which can vary significantly from scan to scan, i.e., on timescales of tens of minutes, and thus introduces many additional parameters to be modeled (see, e.g., Natarajan et al. 2017). Within THEMIS, this is addressed by directly marginalizing over the gains at the level of likelihood construction. That is, THEMIS effectively selfcalibrates the station gains to the model at each likelihood evaluation.
We assume that the gain at each station is well modeled by a set of constant corrections for each gain-reconstruction epoch.
where¯( ) u p V ; are the model amplitudes in the absence of gain corrections g A k , and g B k , , and the baselines u AB j , comprise those baselines connecting A to B for which measurements were made within the specified time frame. For each k, we reconstruct all of the gain corrections independently, subject to Gaussian priors on the ,...
, .... This is done in two steps: 1. By numerically maximizing the log-likelihood in Equation (16) at fixed p supplemented by Gaussian priors for the coupled set of g k to obtainḡ k . This is necessarily peaked, unimodal, and a modest multidimensional optimization problem for the number of antennas relevant for the EHT. We do this using a modified Levenberg-Marquardt algorithm that includes the priors (see Appendix C). 2. An approximate marginalization over all g k is performed via the Laplace approximation, i.e., by assuming that near g k , the log-likelihood is nearly quadratic, and thus the contribution to the log-likelihood from where C is the covariance of the prior-adjusted loglikelihood from the first step and || || C is its determinant.
The final log-likelihood is obtained by combining those over all gain-reconstruction epochs: The number of additional effective parameters introduced can be very large and depends on the particulars of the stations contributing during each gain-reconstruction epoch. Within each reconstruction epoch, the number of additional parameters is the minimum of the number of baselines and the number of stations participating. When all stations participate in N g gainreconstruction epochs, the number of additional model parameters isŃ N g , which for a typical EHT observation in 2017 can be of order 10 2 for a single band. By ignoring potential correlations between gain-reconstruction epochs, we reduce the dimension of the additional parameter space by a large factor. By sampling the log-likelihood after marginalization over the { } g k , we efficiently restrict the dimension of the parameter space that must be sampled, e.g., by the techniques described in Section 7, to that of the original model. Typically, this has the effect of reducing the number of parameters by many hundreds.
We present validation tests in Section 8.2.2 and Paper VI.

Samplers
The dependence of the likelihood on the model parameters, incorporating any priors on the parameter values, is assessed via samplers. The process of sampling is conceptually separated from the definitions of data and models through the standardization of the likelihood objects. Thus, within THEMIS, a sampler is any method for exploring the values of a likelihood for various choices of the parameter vector. There is no standard output or input for a sampler, which may even vary qualitatively depending on the goal of the sampling process. However, all samplers interface with data and model objects solely through the use of likelihood objects and thereby permit analyses of a wide variety of combinations of data and models. Implemented samplers include the following.

Grid Search
The conceptually simplest but least efficient is a simple grid search where the parameter space is probed in predetermined fixed steps in each dimension. While limited in computational efficiency, this scheme is often used to cross-check results obtained by other samplers for smaller parameter spaces that both schemes can handle.

Parallel-tempered, Affine-invariant Markov Chain Monte Carlo
A natural choice for high-dimensional models is to use MCMC. Having scalability in mind, we chose to implement ensemble sampling methods in which many MCMC chains sample the parameter space in parallel. The chains interact and use the information from their spatial distribution to effectively adjust their next jump proposals. This has the added benefit of being able to sample the unknown likelihood surfaces efficiently and with minimal user input. We have implemented two different ensemble sampling methods, namely, an affineinvariant method and a differential-evolution method; the latter is given in Section 7.3.
The affine-invariant method can sample likelihood functions that are related by affine transformations with the same efficiency (Goodman & Weare 2010). This means it is very efficient in sampling highly stretched likelihood distributions as long as the nonlinear correlations among parameters are sufficiently weak.
MCMC algorithms are generally not very efficient on highly multimodal distributions. In order to overcome this problem, we have implemented parallel tempering for each MCMC sampler. Parallel tempering makes copies of the log-likelihood () function that are made smoother through the introduction of a temperature parameter, the higher the temperature, the smoother the likelihood surface: The different temperatures are chosen from a temperature ladder such that   T T 1 i max . Then, we run a copy of our MCMC sampler for each tempered likelihood copy in parallel. The highest temperature chains can freely move in the parameter space, while the low-temperature chains can be trapped in local likelihood maxima. By allowing the different temperature chains to exchange their positions following some prescription, the low-temperature chains efficiently escape the local maxima and explore the entire parameter space. The lowest-temperature chain, which samples the original untempered likelihood, yields the posterior probability distribution.
An efficient parallel-tempering algorithm requires that the temperature ladder has to be chosen carefully. There are two main factors to consider. First, the highest temperature used should be large enough to let the chains move freely within the likelihood surface. Furthermore, the temperatures should not be too widely spaced as that could hinder efficient swaps between chains from adjacent temperatures and lead to inefficient tempering. The choice of an efficient temperature ladder depends on the likelihood surface and could be difficult to guess. To mitigate this problem, we have implemented a method to adaptively change the temperatures in order to get near optimal efficiency. Our method follows that of Vousden et al. (2016).
The temperatures of the top and bottom tempering levels are fixed to = ¥ T and T=1, respectively, ensuring that the top and bottom tempering levels sample the prior and posterior of interest. Intervening tempering levels are initialized to have a geometric ladder with the total number of levels, initial spacing ratio, and tempering schedule treated as hyperparameters. During sampling runs, the user is responsible for inspecting the evolution of the resulting MCMC chains, tempering level exchange rates, and tempering level temperature evolution to ensure that the tempering is performing well. Key aspects to ensure are that the tempering level exchange rates are similar across the ladder (including to the prior), that the tempering level temperatures have settled to asymptotic values, and that the minimum number of tempering levels necessary are being used (thereby minimizing the roundtrip time). These must be explored via the comparison of multiple sampling runs, though they are frequently clearly discernible early in the run.
We have implemented parallelism in different levels of the MCMC sampler. Tempering is parallelized, MCMC chains at each tempering level run in parallel, and the likelihood can itself use multiple threads to run. This allows for the effective use of large high-performance computing machines.
It is a well-known feature of MCMC schemes that there is an initial so-called "burn-in" phase when the sampling exhibits comparatively large changes in parameter predictions, followed by a phase where the walkers settle down to smaller and more consistent changes, before ultimately converging to a final answer. As is customary in this approach, we exclude a certain number of MCMC steps corresponding to the "burn-in" process at the beginning of the "chains," i.e., the history of an MCMC walker. This "burn-in" is identified by inspection, although tools and diagnostics (e.g., Vehtari et al. 2019) are in active development and will be available in a future release of THEMIS.

Parallel-tempered, Differential-evolution Markov Chain Monte Carlo
The second MCMC algorithm implemented in THEMIS is the parallel-tempered, differential-evolution algorithm (Braak 2006). The differential-evolution method adjusts the collective move of its chains in a way to achieve optimal acceptance rate during Monte Carlo steps. It has the added benefit of being able to jump between modes in multimodal problems even without any tempering, thus representing a better option for multimodal distributions. Our implementation follows that of Nelson et al. (2014), and it also makes use of the same parallel-tempering algorithm as the affine-invariant method. Nelson et al. (2014) uses the ensemble to drive the acceptance rate of the sampler to a predetermined value, typically 0.25. Additionally, every 100 steps, it proposes a large step to potentially enable mode swapping for multimodel posteriors. Note, however, that this depends on the posterior having a regular structure.

Bayesian Evidence
The MCMC sampling described above provides the posterior probability distribution on parameters within the context of a given model. This allows us to calculate expectation values for any quantity of interest and to assess the goodness of fit for a given model to the data. However, if we need to compare the plausibility of different models given the same data set, running conventional MCMC is insufficient. Potential ways to perform model comparisons include reversible jump MCMC, calculating the Bayesian evidence (via thermodynamic integration, nested sampling, or Laplace approximation), and the use of information criteria (Knuth et al. 2015).
In Bayesian probability theory the relative probability of two models, M 1 and M 2 , given the same data set, D, is related to the ratio of the Bayesian evidences for the two models, i.e., the Bayes factor or odds ratio. The relative posterior probability of the two models can be written as P D M 2 are the Bayesian evidence for the two models. In the absence of some strong prior preference, it is often assumed that the prior probability of the two models, ( ) P M 1 and ( ) P M 2 , are equal, and hence the ratio of the Bayesian evidence or the Bayes factor is all that must be computed.
THEMIS implements the thermodynamics integration method to calculate the Bayesian evidence (Lartillot & Philippe 2006). In order to do this many MCMC chains are run in parallel on tempered versions of the likelihood. The temperature ladder for this purpose is provided by the user. Given the posterior distributions and the values of the likelihood at these points, the Bayesian evidence (Z) is obtained from is the expectation value of the log-likelihood calculated using the posterior probability distribution of chains at a tempered level corresponding to T=1/β.
It is critical that the tempering ladder be sufficiently large to make contact with the prior, effectively sampled at = ¥ T (β=0). Moreover, for the sum of the tempering levels to provide an accurate approximation of the Bayesian evidence, the separation between tempering levels must be sufficiently small. Both of these restrict the application of the Bayesian evidence, and often we will still appeal to information criteria.

Validation Tests
We now turn to validating THEMIS. For this we focus on the sampling methods, for which the implementations are novel, and reproducing prior analyses of EHT observations of SgrA * . The variety in algorithmic improvements present in THEMIS results in considerable speed-up and simplicity in implementation in comparison to previous work to which we compareoften, analyses that took many months (e.g., Broderick et al. 2011Broderick et al. , 2014Broderick et al. , 2016 are now executed in days. Moreover, all of these tests have been integrated into THEMIS, both as validation tools and tutorials for future users. Note that the sampler parameters (e.g., number of walkers, number of MCMC steps, number of tempering levels, grid search resolution) differ among these tests. In some cases, these are determined approximately algorithmically (e.g., the number of walkers in ensemble samplers should exceed the number of parameters by a factor of 4 or more; Braak 2006; Goodman & Weare 2010) while others are assessed by inspection (e.g., length of MCMC chains and grid search resolution). In principle, for problems that exhibit strong nonlinear covariances between all parameters, the number of walkers may scale as the number of parameters squared. However, in practice, the covariances for most problems are dominantly diagonal, i.e., the off-diagonals are sparse, and thus necessitate far fewer walkers.

Validation of the Samplers
Here we test the sampling part of the code thoroughly. In particular, we demonstrate the ability of the affine sampler to reliably probe nontrivial parameter spaces. That is, we anticipate that the complex models ultimately of most interest in the context of EHT analyses will produce multimodal probability distributions in high-dimensional parameter spaces. It will be necessary, therefore, to consistently identify all of the high-likelihood islands and determine accurately their relative posterior probabilities.

Two-dimensional Gaussian Likelihood
If our model has a small number of parameters, the grid search sampler can be efficient in sampling the parameter space. In this test, the grid search sampler was used to sample a two-dimensional symmetric Gaussian likelihood. Figure 4 shows the log-likelihood recovered using the grid search sampler as well as the marginalized posterior distributions for the same likelihood sampled using MCMC methods.

Egg Box Test
In this test, a five-dimensional parameter space with a highly multimodal egg-box-like distribution is sampled. The likelihood is described in Section 6.1 and contains 5 5 =3125 extrema, of which 1563 are maxima at the tops of sharp peaks within the prior range: p i ä [−8, 8] for all i.
This presents a significant challenge to most sampling schemes. The narrowness of the peaks and the dimensionality of the parameter space preclude a grid search, which would require more than 3×10 12 samples to robustly detect all of them. The large dynamic range in the likelihood, i.e., the very low likelihoods between peaks, precludes typical MCMC schemes, which are unable to efficiently explore the full parameter space. Therefore, it provides a strong test of the ability of the parallel -tempered, affine-invariant, and differential-evolution parallel MCMC samplers to efficiently find and reconstruct the various high-likelihood regions.
As seen in Figure 5, both capture the gross features of the egg box likelihood. This run was executed employing the differential-evolution sampler with 8 tempering levels and took only 10minutes on a typical laptop. There were 128 walkers running for 10 5 steps, of which the first 10 4 were discarded as the burn-in period Every one of the 1563 maxima are populated, with an apparent average number of independent samples of only 10. This is sufficient to faithfully map out the diagonal elements of the covariance, which are reconstructed to better than 8% of the true value; the off-diagonal elements are restricted to less than 6% of the magnitude of the diagonal elements. While this is likely assisted by the highly symmetric nature of the peaks and their regular locations, it nevertheless demonstrates the ability of the samplers to effectively address highly multimodal likelihoods.

16 Gaussian Test
Here we show that the sampling scheme can correctly probe a two-dimensional parameter space with a likelihood consisting of 16 Gaussians and accurately reconstruct the relative posterior probabilities of different peaks. To do this, the likelihood of the Gaussian located at (x 0 , x 1 )=(20, 10) is chosen to be nine times higher than the others. This test was run using the affine-invariant sampler with four tempering levels. The number of MCMC steps in this case was 4000 steps, and the sampler used 100 walkers.
As shown in Figure 6, the sampler finds all of the Gaussian components. In addition, it recovers the nonuniformity of the  likelihood surface, accurately reconstructing the posterior weight of the appropriate component. Figure 7 shows the relative probability mass correctly recovered for these 16 Gaussian peaks.

2D Rosenbrock Test
In this test, we demonstrate the differential-evolution sampler's ability to probe a two-dimensional parameter space with a likelihood surface that is quite narrow along one axis but broad in another and that has long tails. The two-dimensional Rosenbrock function serves as a standard test of sampling schemes. It has a global extremum at ( ) ( ) = x x , 1,1 0 1 . We first ran the Rosenbrock test with flat priors and with three independent initial conditions for 100,000 steps with 24 walkers and temperatures. Upon inspection, these appear converged.
Given that sampling from the truncated Rosenbrock distribution is computationally much easier, we run again using a linear prior within the [−2, 2] test and 20 walkers and 20 tempering levels for 5000 MCMC steps. As shown in Figure 8, the sampler probes the Rosenbrock likelihood surface well and recovers the global extremum. The fact that the marginalized distribution for x 1 peaks away from the truth is a result of the pathological functional form causing large contributions from x 1 ∼0 regions. Note that the best fit maximizes the joint posterior distribution, not the marginalized distributions individually.

5D Griewank Test
This test involves elements of both the egg box test and the Rosenbrock test in that it features many locally peaked likelihoods as in the egg box test and a long-tail component in the form of a parabola. The sampler finds the global extremum at (x i =0) for i=0 K 4 and captures the parabolic falloff; see Figure 9. This test was performed by executing 3 independent runs using the differential-evolution sampler with 24 walkers and 24 tempering levels for 50,000 MCMC steps with flat priors. Upon inspection, the resulting chains appear well converged.

Multivariate Cauchy Distribution
The Cauchy (or Lorentz) distribution is a tough sampling problem due to its heavy tails and undefined mean. Its multivariate form, in particular, is therefore an effective and popular choice to test sampler performance. We ran the differential-evolution sampler on a three-dimensional Cauchy distribution using 24 walkers and temperatures for 100,000 MCMC steps with a flat prior, shown in Figure 10. While runs with a linear prior converge quickly, the flat prior runs require significantly longer evolutions. Even very long runs produce residual apparent structure in the resulting MCMC chains.  However, again, running an even slightly truncated version using linear priors (more akin to our typical challenges), convergence is achieved substantially more rapidly.

Self-tests with Simulated Data
Here, we demonstrate the ability of THEMIS to accurately reconstruct model parameters. This presents a simultaneous test of many of the components of THEMIS, including the data structures, models, likelihoods, and samplers. We generate simulated images using THEMIS' native model classes, from which the appropriate simulated data is constructed. Thermal noise is then included, producing a data set similar in character to that associated with a single night of the 2017 EHT campaign. The simulated data are then analyzed with THEMIS using the corresponding model. Note that this does not address model discrimination. For this purpose, we considered three models: the symmetric Gaussian and the geometric crescent models, for which visibilities can be computed analytically, and the SED-fitted RIAF model, which incorporates the ray-tracing components of VRT 2 and numerical data generation of THEMIS.

Gaussian Model
We generated simulated closure amplitude data from a compact, symmetric Gaussian with V 0 =2.5 Jy and σ=5 μas. We adopted a Gaussian for simplicity. The very compact size was selected to ensure high-S/N detections on even the longest baselines of the 2017 EHT campaign; such high S/Ns are typical of more complex models. We analyze these data with THEMIS' symmetric Gaussian model (Section 4.2.1) to assess potential biases associated with the non-Gaussian nature of the closure amplitude error distribution. We imposed an S/N minimum of 4 on the simulated closure amplitude.
For this analysis, we used the closure amplitude likelihood described in Section 6.4. We sampled the posterior distribution with the parallel-tempered, affine-invariant MCMC sampler, adopting linear priors on each model parameter. The analysis converged using 5 tempering levels with 128 walkers communicating every 50 MCMC steps and taking 5000 samples per walker.
As expected, the total intensity is not constrained by closure amplitudes, recovering our prior distribution. The resulting posterior distribution for the size of the Gaussian is shown in Figure 11. The reconstructed size is σ=5.0004±0.0004 μas, 100 consistent with the input value. Repeating the analysis with different realizations of the simulated data produces qualitatively similar results, though they do exhibit 2σ fluctuations marginally more often than anticipated. No experiment produced a deviation larger than 3σ. As a result, we conclude that the likelihood in Equation (18) does not fully eliminate the bias inherent in the closure amplitude error distribution, though does so at the 2σ level. Decreasing the S/N minimum increases this bias substantially, suggesting that additional development is required to fully exploit low-S/N data.
While the closure amplitude likelihood in Equation (18) is not Gaussian, and thus does not admit a well-defined χ 2 , we do construct an approximate expression via c = - 2 2 . In the limit of small closure amplitudes, this identification is well motivated. The associated reduced χ 2 is 0.97 with 1656 degrees of freedom, suggesting that this statistic will be informative of fit quality. The corresponding two-tailed p-value, assuming a χ 2 distribution, is 39%. 101 Figure 9. Validation of the differential-evolution sampler with a fivedimensional Griewank test with five artificial parameters x x x x , , , , 0 1 2 3 and x 4 . The sampler finds the correct extremum and explores the likelihood surface comprehensively. Figure 10. Validation of the differential-evolution sampler with a threedimensional Cauchy/Lorentz test with three artificial parameters x x , , 0 1 and x 2 . The sampler finds the correct extremum and explores the likelihood surface comprehensively. 100 Note that for all of our analysis, we report the marginalized values for the parameters of each model instead of the maximum values, which for complicated likelihood distributions may differ significantly. 101 To quantitatively assess a given χ 2 value, we employ the two-tailed pvalues, defined by is the appropriate one-sided cumulative χ 2 for a given number of degrees of freedom, dof. This penalizes both poor fits (χ 2 ? dof) and overfits (χ 2 = dof).

Gaussian Model with Gain Errors
We generated simulated visibility amplitude data from a compact, symmetric Gaussian with V 0 =2 Jy and σ=15 μas that incorporated known gain errors. The S/Ns, observing schedule, and baseline coverage approximate those associated with the 2017 April observations of M87. 102 Multiple types of gain errors were considered. Following the properties of the recent M87 observations, we permit potentially large errors on the LMT gains and modest errors on the remaining stations (Paper III). This presents a pathological situation, with one station considerably more poorly characterized. In all cases, the gains were assumed to be stationary across a scan.
The analysis employed the parallel-tempered, differentialevolution MCMC sampler, adopting linear priors on both. We adopt Gaussian priors on the reconstructed gain errors, with Σ g =0.2 for all stations with the exception of the LMT, for which Σ g =1.0. The analysis used 8 tempering levels, 128 walkers communicating every 50 MCMC steps, and converged rapidly.
The reconstructed gains for the best fit are shown in Figure 12 when the true gain errors vanish and when they are nonzero and vary throughout the night. In both cases, the same realization of the observational errors was employed. Modest fluctuations within the assumed gain-reconstruction priors driven by the thermal errors are present. This is clear from the analysis in which no gain errors were introduced. Patterns in the reconstructed gains persist across points within the MCMC chain and across different input gain-error realizations, indicating that they result from the overdetermined nature of the gain-reconstruction problem (see Section 6.8). Nevertheless, the reconstructions faithfully follow the introduced gain errors for all stations. This is most apparent for the LMT, which, by design, has the largest errors.
The process of reconstructing the gains does significantly expand the posteriors of the model parameters. This is clear in Figure 13, in which the red contours indicate the posterior without gain reconstruction. This is expected: the additional freedom associated with the gain corrections can marginally correct for larger deviations from the true model. Nevertheless, the true model parameters are well within the posteriors generally. This is shown explicitly in Figure 13 for both cases shown in Figure 12.

Crescent Model
We generated simulated visibility amplitude and closurephase data from a diffractively scattered crescent image with V 0 =2.24 Jy, R=28 μas, ψ=0.14, τ=0.07, and ξ=0°, and added thermal noise to it. We then analyze these data with THEMIS' crescent model (Section 4.2.4), demonstrating that THEMIS properly recovers the parameters of the original image.
For the analysis, we used the standard visibility amplitude and closure-phase likelihoods described in Sections 6.2 and  6.3, and modeled the effects of diffractive scattering with the default scattering model implemented in THEMIS and described in Section 4.3.1.
In this case, we sampled the posterior distribution with the parallel-tempered, differential-evolution MCMC sampler adopting linear priors on each parameter of the model. The analysis converged using 4 tempering levels with 16 walkers per level communicating every 50 MCMC steps and taking 10,000 samples per walker. The resulting posterior distributions for the parameters of this model are shown in Figure 14 where the blue lines represent the true parameter values of the original image.
Our analysis shows that the marginalized values for the parameters of the model are V 0 =2.2399±0.0001 Jy for the total flux, R=28.0064±0.0054 μas for the overall size of the crescent, with a relative thickness ψ=0.1404±0.0003, an asymmetry parameter τ=0.0691±0.0005, and a position angle ξ=0°. 040±0°. 023. Individually, these are consistent at the 2σ level with the true values of the original crescent image. The model gives a satisfactory fit to the data as confirmed by the reduced χ 2 of 0.98 with 1670 degrees of freedom, which implies that high-quality fits exist. The corresponding twotailed p-value is 59%, implying that the fit quality is well within the anticipated statistical range.

RIAF Model
We generated visibility amplitude and closure-phase data from a diffractively scattered RIAF image with (a, θ, ξ)=(0.10, 60°, 0°). We added thermal noise to the simulated data and then analyze it with THEMIS' SED-fitted RIAF model (Section 4.4.1) to show that THEMIS can properly recover the parameters of the original image.
For the analysis, the standard visibility amplitude and closure-phase likelihoods (Sections 6.2 and 6.3) were used, and the effects of diffractive scattering were modeled using THEMIS' default scattering model (Section 4.3.1). We used the parallel-tempered, differential-evolution MCMC sampler with 3 tempering levels, 14 walkers per level communicating every 50 MCMC steps, and took 5000 samples per walker.
The posterior distributions for the parameters of the model are shown in Figure 15. We find that the marginalized values for the black hole spin parameters are = 0 . 0017 0 .0199 0 .0223 . These parameter estimates are consistent at the 1σ level with the true values of the original RIAF image. In this case, we find a reduced χ 2 of 0.99 with 1664 degrees of freedom and corresponding twotailed p-value of 71%, indicating that high-quality fits were found.

Reproducing Previous Results
The variety of published analyses of EHT observations of SgrA * provides a natural validation test of THEMIS, as well as a demonstration of its flexibility. These include comparisons of purely phenomenological and physically motivated models of the image structure. In constructing these, we make use of the published EHT data sets listed in bold in Table 1, consisting of visibility amplitudes measured in 2007 and 2009 and closure phases measured between 2009 and 2013, inclusively.

Symmetric Gaussian
We analyze the visibility amplitude data from 2007 and 2009 using the symmetric Gaussian model described in Section 4.2.1 in order to show that THEMIS can reproduce previous modelfitting studies made to estimate the source size of SgrA * .
For the analysis, we employed the norm-marginalized visibility amplitude likelihood described in Section 6.6 to  account for variations in the total flux of SgrA * between observation nights. The effects of diffractive scattering were modeled using THEMIS' default scattering model (Section 4.3.1).
We employed the parallel-tempered affine-invariant sampler with 4 tempering levels with 32 walkers per level, and adopted linear priors on each parameter of this model. The posterior distribution for the intrinsic size of SgrA * after 10,000 MCMC iterations is shown in Figure 16. The reconstructed size is σ=15.73±0.25 (FWHM=37.05±0.60). The model gives a satisfactory fit to the data with an associated reduced χ 2 =1.15 with 65 degrees of freedom, with a corresponding two-tailed p-value of 38%. These results are in good agreement with the best fits found in Broderick et al. (2011) when all epochs are combined and the inferred sizes for each night reported in Doeleman et al. (2008) and Fish et al. (2011).

Asymmetric Gaussian
We also analyze the visibility amplitude data from 2007 and 2009 with the asymmetric Gaussian model described in Section 4.2.2 to show that THEMIS can reproduce previous studies made to probe the asymmetry of the emitting region of SgrA * .
For this analysis, we also analytically marginalized variations in the total flux of SgrA * between observation nights using the norm-marginalized visibility amplitude likelihood described in Section 6.6, and modeled the effects of diffractive scattering using THEMIS' default scattering model (Section 4.3.1).
We employed the affine-invariant sampler and adopted linear priors on each parameter of this model. The results converge using 4 tempering levels, with 32 walkers per level, and taking 20,000 samples per walker. The posterior distributions for the different parameters of the model are shown in Figure 17 2 . This model also gives a satisfactory fit to the data with an associated reduced χ 2 =0.75 with 63 degrees of freedom, with a corresponding two-tailed p-value of 14%. These results are in good agreement with the best fits found in Broderick et al. (2011) when all epochs are combined.

Crescent Model
We analyze the visibility amplitude data from 2007 and 2009 with the crescent model outlined in Section 4.2.4 in order to show that THEMIS can reproduce the earlier findings reported by Kamruddin & Dexter (2013).
We proceeded in a fashion similar to the analysis performed with the Gaussian models employing the norm-marginalized visibility amplitude likelihood described in Section 6.6 to account for variations in the total flux of SgrA * between days and modeling the effects of diffractive scattering with the default scattering model implemented in THEMIS (Section 4.3.1).
In this case, we sampled the posterior distributions with the differential-evolution MCMC sampler adopting linear priors on each parameter of the model. We used the reported values in Table 1 of Kamruddin & Dexter (2013) as initial guesses for the values of the parameters of this model. The analysis converged using 6 tempering levels with 32 walkers per level, and taking 100,000 samples per walker. The resulting posterior distributions for the parameters of this model are shown in Figure 18 119 . 4 13 .7 8 .7 , and a minimum reduced χ 2 =0.76 with 62 degrees of freedom, with a corresponding two-tailed p-value of 16%.
The recovered crescent parameters are consistent with those reported in Kamruddin & Dexter (2013), though the inferred 1σ errors are larger. In addition, we recover the expected bimodal distribution in position angle, resulting from the insensitivity of  the visibility amplitudes to a 180°rotation, and an additional weak bimodality in the crescent radius.

RIAF Model: Visibility Amplitude Analysis
We now turn to the first example of a physical model. First, we demonstrate THEMIS' ability to reproduce the analysis published in Broderick et al. (2011). For that purpose, we analyze the visibility amplitude data of SgrA * from 2007 and 2009 with the SED-fitted RIAF model described in Section 4.4.1, using a tabulated set of accretion flow parameters obtained at different black hole spins and inclinations-and distributed with THEMIS-that reproduces the observed SED of SgrA * .
For this analysis, we employed a set of linear priors for each parameter of the model and the norm-marginalized visibility amplitude likelihood described in Section 6.6 to account for variations in the total flux of SgrA * between observation nights. The effects of diffractive scattering were modeled using THEMIS' default scattering model (Section 4.3.1).
We used the parallel-tempered, differential-evolution MCMC sampler with 12 tempering levels and 8 walkers per level communicating every 50 MCMC steps. The test completed 20,000 MCMC iterations, and the posterior distributions for the black hole spin parameters are shown in Figure 19. We find a spin = We analyze the visibility amplitude and closure-phase data sets that are bolded in Table 1 with the SED-fitted RIAF model described in Section 4.4.1 using 128×128 pixel RIAF images to show that THEMIS successfully reproduces the results of the analysis published by Broderick et al. (2016).
For this analysis, we employed the norm-marginalized visibility amplitude likelihood described in Section 6.6 to account for variations in the total flux of SgrA * between observation nights. We also used the shift-marginalized closure-phase likelihood in Section 6.7 to model the effects of refractive scattering, while the effects of diffractive scattering were modeled using THEMIS' default scattering model (Section 4.3.1).
We used the parallel-tempered, differential-evolution MCMC sampler with 12 tempering levels and 8 walkers per level communicating every 50 MCMC steps. The MCMC chain was run for 20,000 steps, and the resulting posterior distributions for the parameters of this model are show in Figure 20 in comparison to the results of analysis with visibility amplitude data only discussed in the previous section. We find that the black hole spin parameters are similarly constrained after the inclusion of the closure-phase data, with = 165 . 13 4 .62 7 .42 . In this case, we find a reduced χ 2 of 1.08 with 231 degrees of freedom, with a corresponding two-tailed p-value of 39%, indicating that highquality fits were found. These results are consistent with best-fit parameters reported by Broderick et al. (2016).

New Results
In this section, we present novel results obtained with THEMIS. These include model comparisons to larger collections of legacy EHT data sets and/or more uniform comparisons than reported previously. Additional results obtained by combining a more complete combination of the data sets in Table 1 and applying additional model features will be reported elsewhere. Application to the 2017 EHT observations of M87 can be found in Papers V and VI.

Crescent Model
Prior analyses of the Kamruddin & Dexter (2013) crescent model utilized visibility amplitude data. With THEMIS, we extend these to include the closure-phase data sets that are bolded in Table 1. To account for refractive scattering, we employ the shiftmarginalized closure-phase likelihood (Section 6.7) when including the contribution to the total likelihood from closure phases. In all other respects, the analysis is similar to that presented in Section 8.3.3.
The inclusion of closure-phase data places strong new constraints on the crescent structure in a number of respects. The resulting posterior distributions for the parameters of this model are shown in Figure 21. The constraints on all of the crescent parameters are substantially improved quantitatively, often settling ambiguities in the previous analysis. The overall crescent size has been restricted to m = -+ R 46.3 as 1.5 1.4 , the relative thickness parameter is now y = -+ 0.41 0.04 0.07 , and the asymmetry parameter is t = -+ 0.23 0.14 0.21 . Individually, these are consistent at the 2σ level with the expectation based on visibility amplitudes alone.
Similarly, the position angle is also strongly constrained, with the prior degeneracy eliminated, finding x =  - +  179 . 4 9 .1 19 . 2 . Unlike the other parameters, this is inconsistent with the estimates from the visibility amplitudes alone at the 2σ level. This is apparent in the bottom panels of Figure 21. This is modestly disconcerting, given the qualitatively distinct natures   Table 1. For reference, the posteriors implied by the visibility amplitude data alone are shown in red. The contours show the 1σ, 2σ, and 3σ confidence regions for the spin magnitude, a; the cosine of the inclination, q cos ; and the position angle, ξ. All parameter constraints are consistent with the results of Broderick et al. (2016).  Table 1. For reference, the posterior distributions implied by the visibility amplitude data alone from Figure 18 are shown in red. The gray contours show the 1σ, 2σ, and 3σ confidence regions for the overall radius; the relative thickness; the degree of symmetry; and the position angle of the crescent. of the closure phases and visibility amplitudes. Nevertheless, the reduced χ 2 =1.07, with a corresponding two-tailed pvalue of 44%, implies that high-quality fits exist.

Extended RIAF Model
The SED-fitted RIAF model treats the comparisons to the EHT data and flux measurements differently, utilizing a prior set of compact radio SED fits. Here, we use THEMIS to relax this procedure, comparing RIAF models simultaneously to the flux and millimeter-VLBI measurements. In principle, this may broaden the black hole parameter estimates, trading worse compact radio SED fits for better structural fits. To explore this, we performed a new analysis, similar in spirit to that presented in Section 8.3.5, in which we analyze both data sets concurrently.
We performed a new analysis using the extended RIAF model described in Section 4.4.2, which generates flux measurements in addition to the millimeter-VLBI observations in a fashion identical to the SED-fitted RIAF model. In addition to the parameters describing the black hole spin (magnitude, inclination, and position angle), three additional parameters were introduced, describing the normalizations of the densities (n e t , , n e,nt ) and temperature (T e ) of the emitting electron population; all remaining parameters were held fixed at the values employed in the SED-fitted RIAF model: η t =−1.1, η nt =−2.02, τ t =−0.84, h t =h nt =1.0, α=1.25, γ min = 100, β=10, and κ=0. This model was compared to the flux and millimeter-VLBI data bolded in Table 1. For this run, the differential-evolution sampler with 12 tempering levels was used. There were 24 walkers used, and the MCMC chain was run for 8200 steps. The obtained reduced χ 2 =1.06, with a corresponding two-tailed p-value of 50%, implying that highquality fits were found.
In Figure 22, the resulting set of parameter constraints is presented in comparison to the prior analyses described in Sections 8.3.4 and 8.3.5. In all cases, the spacetime parameters are consistent with those found previously. Including the flux data produces a marginally stronger constraint on the black hole spin, = -+ a 0.1 0.08 0.19 , arising from the systematic decrease in the quality of the compact radio SED fits at higher spins (which was ignored in the prior analyses). Nevertheless, as anticipated, the inclination constraints are broadened, permitting q = -+ 62.2 4.6 5.3 and q = -+ 117.2 6.2 5.5 .

Code Performance
As seen in many of the validation tests and example analyses presented in Sections 8 and 9, even for models with modest numbers of parameters, it is typical for the posterior probability distributions to be multimodal. As the models increase in sophistication, introducing additional physical freedoms and addressing various systematic uncertainties, this problem will be compounded by the need to explore high-dimensional parameters spaces. This is further complicated by the computational expense of numerically generating images of realistic astrophysical models. As a result, THEMIS has been designed to exploit the proliferation of modern HPC systems.
Here we discuss the ways in which this has been, and may be, implemented, along with a description of THEMIS's scaling efficiency, demonstrating that it can run efficiently on very large machines. THEMIS explicitly supports parallelization via MPI and implicitly via OpenMP and CUDA. MPI parallelization has been implemented at a number of levels, including the samplers, likelihood evaluation, and model generation, permitting users maximum flexibility in distributing the computational workload of an analysis.
Both the parallel-tempered, affine-invariant and differentialevolution MCMC sampling algorithms are designed to exploit parallelization in two levels. First, the use of parallel-tempering levels may be further parallelized by assigning separate tempering levels to different collections of processors. Second, the use of ensemble methods may be trivially parallelized among the individual walkers. Our implementation of the ensemble sampler evolves half of the walkers simultaneously while using the other, nonevolving half to determine the next proposed jump. Each walker in the "active" set can be evolved on a separate CPU. Upon completion, the "active" and "passive" sets swap, and the process is repeated. The result is a set of samplers that can immediately utilize N N 2 T W processors, where N T is the number of tempering levels and N W is the number of walkers, typically many times the number of parameters.
Image generation is an intrinsically parallelizable task. The VRT 2 library already natively supports MPI parallelization and vectorization via OpenMP. On modern Xeon-based systems, VRT 2 can efficiently use N L =32 cores to produce 128×128 pixel images before ancillary memory and communication costs become significant. Odyssey employs GPUs via CUDA and provides an example of mixed MPI/GPU support within THEMIS. The performance of mixed MPI/GPU computation depends mainly on the number and specifications of the GPU cards and is less sensitive to the number of MPI cores. Users implementing new THEMIS models are provided an MPI communicator and are only responsible for determining if and how parallelization should be implemented in their instance; they will be able to automatically exploit parallelization at the other levels. Figure 23 shows the scaling of THEMIS on a representative sample problem with N T =4 tempering levels, N W =16 MCMC walkers, and N L =32 processors per likelihood evaluation. For this case, THEMIS scales with 94% efficient to 32, 88% efficient at 512 cores, and 84% at 1024 cores. Note that even modest increases in problem complexity involving larger images or higher-dimensional parameter spaces require a larger set of walkers and tempering levels, and allow more processors per likelihood evaluation. Thus, the scaling efficiency of THEMIS will improve with problem size. Already, THEMIS can run efficiently on several thousand cores.

Summary
THEMIS provides a new framework in which to develop and implement analyses of EHT observations. By focusing on the construction of interfaces, THEMIS enforces a modularity that facilitates rapid future development, ensuring flexibility and permitting extensibility. This flexibility is illustrated by the existing set of current THEMIS components, which span a wide variety of types of data, models, and sampling techniques. The clear definition of component inputs and outputs enables future developers to rapidly contribute additional components (e.g., image models) without the need for a global understanding of the internal structure of the code.
Implemented data types include both millimeter-VLBI observables (visibility amplitudes, closure phases, closure amplitudes, polarization fractions) and ancillary data (fluxes). The ability to easily add accoutrements to these data objects, e.g., time stamps, observing stations, atmospheric conditions, observation resolution, etc., significantly increases their flexibility and the potential sophistication of subsequent analyses.
The generic nature of the model interface produces a correspondingly broad array of acceptable models, solving a key difficulty with unifying prior EHT analyses. As a result, THEMIS can construct analyses of phenomenological (e.g., Gaussians) and physically motivated models (e.g., polarized images of synchrotron-emitting GRMHD simulations). It also naturally allows the inclusion of optional, additional, independent model features (e.g., interstellar scattering) in a uniform way. In principle, it can also facilitate in nonparametric modeling, e.g., image inversion, though this has yet to be implemented.
A number of likelihoods have been implemented, including likelihoods that analytically address nuisance parameters. This will become increasingly important as additional EHT systematics are considered, e.g., telescope gain corrections, refractive scattering, and intrinsic source variability. Similarly, a number of samplers have been implemented, including samplers that efficiently explore high-dimensional, multimodal likelihood surfaces.  Table 1. For reference, the posteriors implied by the visibility amplitude data alone and by the combined visibility amplitude and closure-phase data sets from Figures 19 and 20 are shown in red and blue, respectively. The contours show the 1σ, 2σ, and 3σ confidence regions for the spin magnitude, a; the cosine of the inclination, q cos ; and the position angle, ξ. All parameter constraints are consistent with the results of Broderick et al. (2016).
A key feature of THEMIS is the ability to mix and match the above, constructing new analyses via minor changes in the model used, data included, and sampler used. This will be critical to evaluating the robustness of features, teasing apart subtle interactions in aspects of complex models, and systematically assessing the impact of additional types of data. At the same time, this permits rapid, distributed development: as features are added in the service of one analysis, e.g., a new sampler or a new scattering model, they may be rapidly deployed to others.
In anticipation of increasingly complex, physically motivated emission models for EHT targets, THEMIS enables the implementation of parallelization at multiple levels via multiple schemes. At present, this is implemented in a number of samplers via MPI and models via MPI, OpenMP, and CUDA. As a result, for typical analyses, THEMIS scales efficiently to thousands of cores, depending on problem complexity, and can effectively exploit modern HPC systems. For the implemented samplers, this parallel-performance scaling improves with problem complexity (i.e., number of parameters), partially mitigating the introduction of additional physical features.
Both the individual components of THEMIS and their integration have been extensively tested. THEMIS is able to accurately and consistently explore high-dimensional multimodal posterior probability distributions. It is able to recover the parameters of models used to construct realistic simulated EHT data for both geometric and physically motivated RIAF models. It has accurately reproduced previous analyses of published EHT data. In the case of the RIAF models, it has done so in an order-of-magnitude less user time.
The extensibility of THEMIS is evident in the extension of these prior analyses. The Kamruddin & Dexter (2013)  . This is considerably larger than the size implied by the 1σ region obtained when only visibility amplitude is considered, though still consistent at 2σ. Nevertheless, high-quality fits of the combined closure-phase and prior visibility amplitude data sets do exist. Note that this implies a crescent diameter that is nearly twice as large as the 55 μas anticipated for SgrA * by identifying the crescent with the gravitationally lensed image of a geometrically thick accretion flow.
Where prior RIAF analyses have separated the fitting of the SED and EHT data for SgrA * , THEMIS now simplifies the process of fitting both simultaneously. While this may yield weaker parameter constraints in principle, in practice the black hole spin parameters are similarly constrained. Within the context of a similarly constrained RIAF model, i.e., a geometrically thick, Keplerian flow, we find = - 158 . 1 10 . 4 11 . 5 . Future analyses that will systematically explore the relaxation of assumptions about the structure of the inner accretion flow will be published elsewhere.
THEMIS is meant to facilitate continuous, vigorous development. Already, plans are underway to address refractive scattering in the interstellar medium, model stochastic variability in the intrinsic emission region, introduce jet models, exploit GRMHD simulations, and perform nonparametric analyses. Anticipated future data-type development will include polarized fluxes, Faraday rotation measurements, circular polarization, and visibility variances. Additionally, new samplers will be implemented to help with the known difficulties of ensemble methods presented here (Huijser et al. 2015). As a result, THEMIS is designed to facilitate the future scientific utilization of the new window on black hole physics now opened by the EHT.

Appendix A Prediction Accuracy Requirements
When predictions are made numerically, frequently the computational expense is strongly dependent on the accuracy of the theoretical estimate required. Thus, significant efficiencies can be realized by understanding and limiting the accuracy requested where possible. Generally, comparisons with data with large uncertainties require far less accurate theoretical estimates than with data that have small uncertainties. Here we determine the relationship with parameter estimation uncertainty and the accuracy of the theoretical estimates, thereby estimating the accuracy required by THEMIS. We begin by assuming that the measurement errors are Gaussian. We further assume that the posterior parameter probability is also nearly Gaussian and thus adopt a Fisher matrix approach to the estimation of the uncertainty of the parameter estimates. Finally, we assume that errors in the predicted values are Gaussian and uncorrelated. That is, we set the log-likelihood to where the predicted value is f j , d j is the error in the predicted value, y j are the data, and σ j are the observational uncertainties.
The assumption that the prediction errors are Gaussian corresponds to assuming that the d j are Gaussian random variables. This is not true in an absolute sense: each time a prediction is made for the same independent variables, the δ j does not change. However, in a statistical sense, we are assuming that at different independent variable values and for different parameter values, the δ j are well approximated by a random variable. It will be useful henceforth to characterize the size of the distribution of the δ j in terms of σ j , i.e., we set the variance of the prediction error in terms of the measurement uncertainty as d s á ñ = D j j 2 2 2 . This implies that higher prediction accuracy is possible for more accurately measured quantities.
The δ j modify the minimum  (and thus χ 2 ) expected: averaging over realizations of the data and the prediction errors, where we have further assumed the number of degrees of freedom is close to the number of data points, N. For a sufficiently large number of degrees of freedom and a large enough Δ 2 , the deviation will be statistically noticeable in the reduced χ 2 when ( ) D  N 8 1 4 . This expression grows slowly with N, though, and is therefore not a fundamental limit as far as parameter estimation is concerned.
The uncertainty in the estimate of the parameters is set by the inverse of the covariance matrix, given by However, a typical value will be modified by the presence of the linear term. That is, the variance in the inverse covariance is The covariance matrix, whose eigenvalues indicate the magnitude of the uncertainties, is given by in which μ is a Gaussian random variable with unit variance. This error term is suppressed by approximately a factor of -N 1 2 relative to the mean variance, and thus in the limit of large N becomes insignificant.
The error term contains two elements, associated with the measurement and prediction errors, respectively. The ratio of the latter to the former is Δ 2 /2. When Δ 2 is small and N is large, the uncertainty on the parameter estimates then grows by a multiplicative factor of Δ 2 /4. This is unconditionally small when Δ is small.
In THEMIS, we typically set Δ=0.25, for which Δ 2 /4= 0.016, which broadens the posterior parameter distributions by 1.6%. This does complicate the interpretation of fit quality for N2048, however.

Appendix B Error Distributions of Quantities Associated with Visibility Amplitudes
THEMIS has three data types associated with visibility amplitudes: the visibility amplitudes themselves, interferometric polarization fractions, and closure amplitudes. None of the underlying error distributions for any of these is Gaussian, and the latter two are poorly approximated by Gaussians. Here, we summarize what the relevant error distributions are and quantify how well they are approximated in THEMIS. In all cases, we will assume that the complex visibilities are well described by a Gaussian random variable with nonzero mean.

B.1. Visibility Amplitudes
The probability distribution of the magnitude of a complex Gaussian random variable, V, with mean V 0 and standard deviation σ, is given by the Rice distribution (see, e.g., Thompson et al. 2017): At high S/N (defined here by | | s V 0 ), a Gaussian with mean | | s + V 0 2 2 and standard deviation σ becomes an increasingly good proxy for the Rice distribution, with the quality of this approximation increasing with S/N. When S/N2, the biased Gaussian is within 8% of the maximum probability of the Rice distribution for all values of | | V . These are compared in the left panel of Figure B1 for various choices of S/N. We provide a set of quantitative estimates of the accuracy of the Gaussian approximation in Table B1 for various S/Ns.

B.2. Visibility Amplitude Products
Before discussing the data quantities of interest, we begin by considering the distribution of the product of visibility amplitudes, i.e., We do this both to illustrate the procedure by which we construct exact probability distributions for combinations of products and quotients of visibility amplitudes and to show explicitly that these are typically well approximated by a single Rice distribution.
We begin by exploiting the nonnegative behavior of | | V to define . This simplifies the construction of the product by reducing it to a sum, i.e., in terms of v A and v B , . The probability distribution of v is given in terms of the Rice distribution by In practice, this may be computed efficiently via FFT. In terms of f r , the characteristic function of the probability distribution of w is, ,0 , and thus, the probability distribution of w is These are shown in Figure B1. While formally the distribution of W is characterized by four parameters, in practice it is well approximated by a single Rice distribution with , 0 , 0 and s s s = +

A B
2 2 , differing by 17% of the maximum probability for S/N2. This comparison is also shown in the right panel of Figure B1, and estimates of the accuracy of the approximation are tabulated in Table B2 for various S/Ns. Note that this also implies that the product distribution is well fit by a Gaussian for sufficiently high S/N.

B.3. Polarization Fractions-Visibility Amplitude Quotients
We follow a similar procedure to that in the previous section to compute the distribution of the quotient of visibility amplitudes, i.e., Unfortunately, we find that the remarkable simplicity of the distribution of visibility products does not extend to quotients. The characteristic function of the logarithmic quotient distribution is , where the * denotes complex conjugation. The resulting logarithmic quotient probability distributions is This is directly applicable to the polarization fraction, for which this distribution is shown in the left panel of Figure B2.
Notes. a Maximum absolute difference, measured relative to probability maximum. b Fractional error in the location of the mode. c Fractional error in the width of the region containing 68%/95% of the cumulative probability.
The polarization fraction distribution clearly deviates from the Gaussian and Rice distributions in two key respects. First, even at high S/N, the distributions are asymmetric, with the probability maximum lying below , 0 , 0 . Second, there is a significant tail extending to high values of Q, containing sufficient weight to move the average Q above Q 0 for S/N2. More accurate are fitted (i.e., same mean and standard deviation) log-normal approximations, shown in the left panel of Figure B2, which recover the asymmetry, though still exhibit deviations for S/N4.
Combined with the accuracy of the Gaussian approximation to the Rice distribution, this motivates an exploration of better approximates to the quotient distribution of visibility amplitudes. For two Gaussian variables, with nonzero means, it is possible to analytically construct the quotient distribution analytically: and erf(x) is the standard error function. This is also shown in the left panel of Figure B2. While deviations from p q continue to exist, the Gaussian quotient (Gauss Q) distribution accurately reproduces the large high-Q tail. As with p p (W), p q (Q) depends on the S/N of both the numerator and denominator independently. Unlike the product distribution, the quotient distribution is not symmetric in this dependency, with the properties of the denominator controlling the asymmetry and tail. Therefore, characterizing this distribution by a single pair of numbers-a central value and widthwill result in a substantial uncertainty in the resulting quotient distribution. This is simplified for polarization fractions by the fact that the stations used to construct the visibility amplitudes in the numerator and denominator are the same, and thus both quantities have similar noise profiles in principle, i.e., σ A ≈σ B . This implies that , 0 , 0 and  s m is the uncertainty obtained by the standard error propagation formula. Note that these are the only two quantities required to fully specify the Gauss Q and quotient distributions. Combining this with Equation (B8), we obtain for the polarization fraction These are shown in the left panel of Figure B3. Similar to the polarization fractions, they are clearly asymmetric and exhibit large tails to high values, typical of quotient distributions. Formally, this requires knowledge of eight values to define. However, again, it is possible to accurately estimate  p with only a handful of combinations of these values. Due to the similarity between the amplitude product distribution (Appendix B.2) and the Rice distribution, both the numerator and the denominator can be effectively described by only two parameters each. As a result, the closure amplitude distribution is similar to the amplitude quotient distribution described in Appendix B.3, shown by the dashed lines in the left panel of Figure B3.
If the S/Ns of the denominator and numerator are independently known, this is well approximated by the Gauss Q distribution in Equation (B8). These may be reconstructed with knowledge of the total S/N ( s   ), and the ratio of the thermal uncertainties in the numerator and denominator, i.e., s s s = +  Figure B3. The latter are not independent, related by the repeated presence of each of the four stations required to produce a closure amplitude in the numerator and denominator. That is, identifying the baselines A, B, C, and D, with stations 1 and 2, 3 and 4, 1 and 4, and 2 and 3, respectively, assuming identical bandwidths and scan duration, where the S j are station-specific system-equivalent flux densities (SEFDs). Despite the appearance of four SEFDs, this is a function of only two variables: the ratios S 3 /S 1 and S 4 /S 2 . Where both of these ratios are of order unity, i.e., for a homogeneous array, ρ≈1. For highly heterogeneous arrays, in which more than one station is much more sensitive than the others, this can deviate from unity substantially, by an amount that depends on the second-lowest and second-highest SEFDs, S 2nd min and S 2nd max , respectively, For the 2017 EHT campaign, the station SEFDs ranged from 90 to 6000Jy, with most being near 5000Jy. The phased Atacama Large Millimeter/submillimeter Array (ALMA) is an extremely low-noise outlier at 90Jy, followed by the Large Millimeter Telescope at 600Jy (see the EHT imaging library, Chael et al. 2016Chael et al. , 2018b, and available at github.com/achael/ eht-imaging). As a result, ρ ranges from 0.3 to 3.3, with most of the potential closure amplitude squares having ρ within 11% of unity. Therefore, even without prior knowledge about the visibilities that comprise the numerator and denominator of the closure amplitude, a similar procedure to that used for the closure amplitudes, where ρ≈1 is assumed, is well motivated.
Given a value of ρ, either from the station SEFDs or setting it to unity,  Notes. a Ratio of V A,0 /σ A to V B,0 /σ B in the construction of the product distribution. b Maximum absolute difference, measured relative to probability maximum. c Fractional error in the location of the mode. d Fractional error in the width of the region containing 68%/95% of the cumulative probability. Figure B2. Left: comparison of the polarization fraction distribution (visibility amplitude quotient distribution) with the Gaussian quotient distribution approximate (Gauss Q) for various input values of the denominator S/N, S/N d . In all cases, the numerator S/N was set to 8. For comparison, a log-normal distribution is also shown. Right: comparison of the polarization fraction distribution (boundaries of the shaded region), the Gauss Q approximation in Equation (B10), and the lognormal model, for various total S/Ns (i.e.,   s m m ) at  = m 0.5 and 2.0. Quantitative estimates of the accuracy of the various approximations can be found in Table B3. be constructed via Quantitative assessments of the performance of this approximation for a number of variations in the distribution of S/Ns among the various components and for fixed and known ρ are listed in Table B4. In the Gauss Q cases with fixed ρ and for the log-normal distribution, we permit the true value of ρ to range from 0.3 to 3.3, reporting the maximum deviation for each measure independently. Knowledge of ρ significantly improves the quality of the approximation, which for S/N2 is accurate to 13% for <  1. The asymmetric impact of noise in the denominator and numerator of the closure amplitude is responsible for the worsening performance of the approximation when >  1; generally, closure amplitudes can be constructed so that   1. Notes. a Maximum absolute difference, measured relative to probability maximum. b Fractional error in the location of the mode. c Fractional error in the width of the region containing 68%/95% of the cumulative probability. Figure B3. Left: comparison of the closure amplitude distribution with the Gauss Q approximation in Equation (B17) for various choices of the denominator S/N, S/N d . In all cases, the numerator S/N was set to 8, divided equally among the visibility amplitudes in the numerator. For comparison, the visibility amplitude quotient (Rice Q) distribution is also shown. The range of the filled bands indicates the uncertainty associated with various choices of how S/N d is apportioned between the two visibility amplitudes in the denominator. In all cases, the value of ρ was set to the proper value in all models. Right: comparison of the closure amplitude distribution and the Gauss Q approximation with ρ=1, for =  0.5 0 . The colored bands indicate the range of closure amplitude distributions when the true ρ is varied within the permissible range for the 2017 EHT campaign, [0.3,0.33]. The S/N within the numerator and denominator is distributed uniformly. Quantitative estimates of the accuracy of the various approximations for illustrative cases can be found in Table B4.
The performance of the approximation in reconstructing the mode and width of the distribution is very good in this limit, better than 1%. When ρ is not known a priori, setting it to unity introduces an additional error in the approximation of the closure amplitude distribution. Nevertheless, even with excursions of a factor of 3, by S/N4, the Gauss Q approximation is accurate to 13% at all  .
In practice, the primary difficulty with applying Equation (B17) is the accuracy with which s n n and s d d can be reconstructed, which depends how close  is to  0 . At low S/N, this can lead to a significant error in the estimation of the likelihood. Where the estimate of σ n /n or σ d /d is higher than their true values, this makes little difference. However, where they are much lower than their true values, this can result in the construction of the closure amplitude distribution. d Fractional error in the location of the mode. e Range of input values of ρ explored in accuracy estimate. f Fractional error in the width of the region containing 68%/95% of the cumulative probability. g Gauss Q model with ρ set to input value. in a distribution that is considerably more narrowly concentrated about  0 than the true distribution, biasing any resulting parameter estimates. This ceases to be a significant bias for S/N > 4 for the 2017 campaign.
Finally, we remark on comparisons to a frequent alternative approximation for the closure amplitude error distribution, the log-normal distribution. This has a number of desirable features: it is simple to define and rapid to compute, it eliminates the conceptual distribution between the numerator and denominator in the definition of the closure amplitudes, and it naturally produces extended tails toward large values. A quantitative comparison to the Gauss Q approximation is presented in Table B4. To do so, we have constructed a biased log-normal distribution, i.e., with mean ( ) s +    log 0 2 0 and standard deviation s   0 . These perform more poorly than the Gauss Q approximation at all values of S/N, exhibiting significant biases in the maximum and median of the error distributions.
At high S/Ns, this distinction makes little difference; for modest S/Ns, this results in significant systematic biases in reconstructed structural parameters. This is clear in Figure B4, which presents the posterior distributions of the size of a Gaussian feature from simulated 2017 campaign data. The underlying image and thermal noise were chosen such that the S/Ns of the closure amplitudes are moderate, i.e., they range from below our cutoff of 4-18. The two posteriors shown are for two likelihoods that differ in the assumed closure amplitude error distribution. For both, excellent fits are obtained. However, in the latter case, when a log-normal distribution is assumed, a notable and systematic bias toward more compact structures is present. For this reason, we adopt the modestly more complicated Gauss Q distribution.

Appendix C Gaussian-prior-modified Levenberg-Marquardt Algorithm
To numerically maximize the likelihood during the reconstruction and marginalization over station gain reconstructions (Section 6.8), we employ a modified Levenberg-Marquardt algorithm that includes Gaussian priors on the gains. We follow Section 15 of Press et al. (1992), in which algorithms to minimize a χ 2 are presented. The appropriately modified expression for the χ 2 , accounting for a Gaussian prior on the parameters, is˜( where g, g A , and Σ A are all defined in Section 6.8, and the additional term is the Gaussian prior directly. The associated modified definitions of β A and α AB (defined in 15.5.8 of Press et al. 1992) are˜˜( The expression for a AB is simplified further in 15.5.11 of Press et al. (1992), though we continue to adopt the above modification. All other elements of the Levenberg-Marquardt algorithm are unchanged. Figure B4. Comparison of the posterior distribution of the reconstructed size of a Gaussian from simulated closure amplitude data assuming the Gauss Q (black) and log-normal approximations (red). The true size is 10 μas, indicated by the blue line. An S/N cut of S/N > 4 was applied to the data to exclude marginal detections and improve the fidelity of the assumed approximate error distributions.