Closure Statistics in Interferometric Data


Published 2020 May 1 © 2020. The American Astronomical Society. All rights reserved.
Citation: Lindy Blackburn et al. 2020 ApJ 894 31. DOI: 10.3847/1538-4357/ab8469


Abstract

Interferometric visibilities, reflecting the complex correlations between signals recorded at antennas in an interferometric array, carry information about the angular structure of a distant source. While unknown antenna gains in both amplitude and phase can prevent direct interpretation of these measurements, certain combinations of visibilities called closure phases and closure amplitudes are independent of antenna gains and provide a convenient set of robust observables. However, these closure quantities have subtle noise properties and are generally both linearly and statistically dependent. These complications have obstructed the proper use of closure quantities in interferometric analysis, and they have obscured the relationship between analysis with closure quantities and other analysis techniques such as self calibration. We review the statistics of closure quantities, noting common pitfalls that arise when approaching low signal to noise due to the nonlinear propagation of statistical errors. We then develop a strategy for isolating and fitting to the independent degrees of freedom captured by the closure quantities through explicit construction of linearly independent sets of quantities along with their noise covariance in the Gaussian limit, valid for moderate signal to noise, and we demonstrate that model fits have biased posteriors when this covariance is ignored. Finally, we introduce a unified procedure for fitting to both closure information and partially calibrated visibilities, and we demonstrate both analytically and numerically the direct equivalence of inference based on closure quantities to that based on self calibration of complex visibilities with unconstrained antenna gains.


1. Introduction

Interferometric observations allow diffraction-limited resolution on angular scales that are inaccessible to single-element systems. However, interferometers have the limitation of only sparsely sampling information in the so-called visibility domain. While measured visibilities have simple and deterministic thermal noise, they also have complex systematic errors. These systematic errors manifest as variations in visibility amplitudes and phases on many timescales, representing limitations imposed by a broad range of sources including the constituent interferometer elements, reference frequencies, and atmosphere.

The dominant systematic errors are station-based effects, corresponding to multiplicative complex gain factors. In this case, "closure" quantities can be constructed, which are independent of the station-based systematic calibration errors. Specifically, closure phases consist of a directed sum of visibility phases around a closed triangle joining three stations (Jennison 1958; Rogers et al. 1974), while closure amplitudes are the quotient of two visibility products involving four stations (Twiss et al. 1960; Readhead et al. 1980). These quantities have found particular utility in very long baseline radio interferometry (VLBI), where array sparsity motivates the use of model-independent observables. For an array with N stations, one can form ∼N³ closure phases and ∼N⁴ closure amplitudes from the original set of ∼N² visibilities. However, there are at most (N − 1)(N − 2)/2 degrees of freedom in the closure phases and N(N − 3)/2 degrees of freedom in the closure amplitudes. The necessary degeneracy between the full sets of closure quantities is captured by the structure of their covariance.
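These counting arguments are straightforward to verify numerically. The following sketch (our own construction; the function name is arbitrary) tabulates the full and nonredundant counts for a fully connected array:

```python
from math import comb

def closure_counts(N):
    """Full vs. nonredundant counts for a fully connected N-station array."""
    baselines = comb(N, 2)           # ~N^2 visibilities
    triangles = comb(N, 3)           # ~N^3 closure phases
    quadrangles = 3 * comb(N, 4)     # ~N^4 closure amplitudes (3 per 4-station subset)
    cp_dof = (N - 1) * (N - 2) // 2  # closure phase degrees of freedom
    ca_dof = N * (N - 3) // 2        # closure amplitude degrees of freedom
    return baselines, triangles, quadrangles, cp_dof, ca_dof

print(closure_counts(8))  # (28, 56, 210, 21, 20)
```

Already at N = 8 the full sets of closure quantities are several times larger than the number of independent degrees of freedom they contain.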

Closure quantities are useful for interferometric analysis, especially model fitting and imaging (e.g., Readhead & Wilkinson 1978; Chael et al. 2018), because they eliminate the need to model station gain systematics, and their error budget can be determined from first principles. Yet, despite the fundamental importance of closure quantities for interferometry, there is widespread variation in the literature concerning their properties, best practices when utilizing closure quantities, their relationship with standard analyses such as self calibration, and the role of linearly independent sets of closure quantities (which are not necessarily statistically independent). Moreover, most analyses to date ignore the covariance between closure quantities, which can be significant; although, covariance of closure phases has been studied in the optical interferometry community (e.g., Kulkarni et al. 1991; Martinache 2010; Ireland 2013).

Here, we provide a rigorous foundation for analysis using closure quantities, and we give procedures for selecting nonredundant sets of closure phases and amplitudes. We demonstrate that, when covariance is correctly accounted for, these nonredundant sets carry the full information of the complete sets of closure quantities. Moreover, in the limit of completely unconstrained station gains, we show that analysis of closure quantities is identical to analysis of complex visibilities with gain marginalization. We also give procedures for selecting nonredundant sets that minimize covariance, and we demonstrate the effects of covariance among closure products using simulated data from simple models.

We begin, in Section 2, by discussing thermal and systematic errors in interferometric measurements, and we assess the conditions under which errors on closure quantities can be approximated as Gaussian. Next, in Section 3, we evaluate the covariance among closure quantities and give prescriptions for selecting nonredundant sets of closure quantities. Then, in Section 4, we apply our results to simple model fits using closure quantities and demonstrate the role of nonredundant sets and covariance among closure products. We summarize our results in Section 5. The notation used throughout the paper is described in Table D1.

2. Closure Quantities and Errors

2.1. Interferometric Visibility and Gain

An interferometric array aims to measure the complex coherence function of the electric field, or visibility ${V}_{{ij}}=E\left[{{ \mathcal E }}_{i}{{ \mathcal E }}_{j}^{\ast }\right]$ (represented here in the frequency domain, with ${ \mathcal E }$ in units such that expectation value ${S}_{\nu }\sim E\left[| {{ \mathcal E }}_{i}{| }^{2}\right]$ is the electromagnetic flux spectral density), from a distant source at two locations i and j in the plane of propagation. Vij samples a Fourier component of the brightness distribution on the sky (via the van Cittert–Zernike theorem, van Cittert 1934; Zernike 1938; Thompson et al. 2017), with spatial frequency corresponding to the projected baseline length in units of observing wavelength. In the idealized case, the field is measured without any attenuation or propagation delays (e.g., through atmosphere). In practice, the measured complex signal vi at an antenna i can be modeled as the idealized incident aligned electric field ${{ \mathcal E }}_{i}$ subject to a linear complex gain factor γi, plus additive zero-mean circularly symmetric complex Gaussian noise ni

Equation (1): ${v}_{i}={\gamma }_{i}{{ \mathcal E }}_{i}+{n}_{i}$

In continuum-VLBI, the noise power typically exceeds signal power by a large factor: $E\left[| n{| }^{2}\right]\gg E\left[| \gamma { \mathcal E }{| }^{2}\right]$. The gain for a particular antenna feed is a function of time and frequency γ = γ(t, f), and while a variety of simplifying assumptions and factorizations can be made, the gain is often not known a priori to a high degree of precision. Thus, the fundamental observable is not source visibility Vij but cross-covariance rij between pairs of antennas:

Equation (2): ${r}_{{ij}}=E\left[{v}_{i}^{}{v}_{j}^{* }\right]={\gamma }_{i}^{}{\gamma }_{j}^{* }{V}_{{ij}}$

If the signals vi and vj are normalized by their noise power such that $E\left[| {n}_{i}{| }^{2}\right]=1$, rij is the correlation coefficient, and ${\mathrm{SEFD}}_{i}=1/| {\gamma }_{i}{| }^{2}$ represents the system-equivalent flux density (noise power in units of flux above the atmosphere).

Relating the measured correlation coefficients rij to source visibilities Vij is the process of calibration and may include estimating the magnitude of $| {\gamma }_{i}| $ through the observation of bright flux calibrators, measuring differential phase $\mathrm{Arg}\left[{\gamma }_{i}^{}{\gamma }_{j}^{* }\right]$ by observing phase calibrators with known structure, or by process of self calibration where the gains γi are solved simultaneously with unknown source model parameters. For a VLBI array at millimeter wavelengths, calibration is made difficult by the strong and rapidly changing atmospheric effects and by the lack of bright compact calibration sources of known structure. Amplitude and phase gain systematics often dominate over the thermal (statistical) noise that arises from estimating rij over finite time and bandwidth.

2.2. Closure Phase and Closure Amplitude

Closure quantities are special combinations of correlation measurements taken over closed loops in an antenna network. They are able to cancel out station-based gains γi, giving observables that depend only on intrinsic source parameters.

A closure phase is the sum of measured phases around a closed triangle of baselines,

Equation (3): ${\psi }_{123}={\phi }_{12}+{\phi }_{23}+{\phi }_{31}$

where ϕ12 is the phase on baseline 1–2,

Equation (4): ${\phi }_{12}=\mathrm{Arg}\left[{r}_{12}\right]$

Written as the phase of the complex bispectrum (triple product) V123 = V12V23V31, we see that a closure phase is independent of arbitrary phase gain $\mathrm{Arg}\left[{\gamma }_{i}\right]$,

Equation (5): ${\psi }_{123}=\mathrm{Arg}\left[{r}_{12}{r}_{23}{r}_{31}\right]=\mathrm{Arg}\left[{\gamma }_{1}^{}{\gamma }_{2}^{* }{V}_{12}\,{\gamma }_{2}^{}{\gamma }_{3}^{* }{V}_{23}\,{\gamma }_{3}^{}{\gamma }_{1}^{* }{V}_{31}\right]$

Equation (6): ${\psi }_{123}=\mathrm{Arg}\left[{V}_{12}{V}_{23}{V}_{31}\right]=\mathrm{Arg}\left[{V}_{123}\right]$

since every gain term on the right-hand side is multiplied by its complex conjugate.
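This cancellation is easy to confirm numerically. The noise-free sketch below (our own construction) applies arbitrary complex station gains to a triangle of visibilities and checks that the closure phase is unchanged:

```python
import numpy as np

rng = np.random.default_rng(42)

def cnormal(n):
    """Circularly symmetric complex Gaussian samples."""
    return rng.normal(size=n) + 1j * rng.normal(size=n)

V12, V23, V31 = cnormal(3)   # underlying source visibilities on the triangle
g1, g2, g3 = cnormal(3)      # arbitrary complex station gains

# noise-free measured correlations r_ij = g_i g_j^* V_ij
r12 = g1 * np.conj(g2) * V12
r23 = g2 * np.conj(g3) * V23
r31 = g3 * np.conj(g1) * V31

psi_meas = np.angle(r12 * r23 * r31)   # closure phase from measurements
psi_true = np.angle(V12 * V23 * V31)   # intrinsic closure phase
print(np.isclose(psi_meas, psi_true))  # True: gains cancel in the bispectrum
```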

A set of correlation coefficients connecting four sites in a closed quadrangle can be used to calculate a closure amplitude,

Equation (7): $| {A}_{1234}| =\dfrac{| {r}_{12}| | {r}_{34}| }{| {r}_{13}| | {r}_{24}| }=\dfrac{| {V}_{12}| | {V}_{34}| }{| {V}_{13}| | {V}_{24}| }$

In this case, we see that closure amplitude is independent of arbitrary amplitude gain $| {\gamma }_{i}| $ since each station gain amplitude term appears in both the numerator and denominator.
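The amplitude-gain cancellation can be verified the same way as for closure phase; this sketch (our own, noise-free, with sites labeled 0–3) forms a closure amplitude from gain-corrupted correlations:

```python
import numpy as np

rng = np.random.default_rng(7)
g = rng.normal(size=4) + 1j * rng.normal(size=4)  # complex station gains
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
V = {p: rng.normal() + 1j * rng.normal() for p in pairs}  # source visibilities
r = {(i, j): g[i] * np.conj(g[j]) * V[(i, j)] for (i, j) in pairs}

def closure_amplitude(x):
    """|x_01 x_23| / (|x_02| |x_13|): one quadrangle over sites 0-3."""
    return abs(x[(0, 1)]) * abs(x[(2, 3)]) / (abs(x[(0, 2)]) * abs(x[(1, 3)]))

print(np.isclose(closure_amplitude(r), closure_amplitude(V)))  # True: |g_i| cancel
```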

By canceling station gains, the closure quantities are able to isolate measurement degrees of freedom that are independent from gains, and they provide observables that are accurate to the thermal-noise limit or to residual baseline errors (Massi et al. 1991), which are typically much smaller than the station errors. Thus, they are particularly valuable when systematic gain uncertainty is much larger than statistical uncertainty. We will see in subsequent sections that the closure phases and closure amplitudes capture all of the gain-invariant degrees of freedom from the baseline visibilities, at the cost of removing any prior information about the gains.

2.3. Statistical Thermal Noise

When estimated from actual data, the closure quantities and associated correlation coefficients from Equations (2)–(7) must be averaged over a finite time and bandwidth in order to accumulate signal-to-noise ratio (S/N). A measurement of rij taken over integration time Δt and bandwidth Δν averages ${\rm{\Delta }}t\,{\rm{\Delta }}\nu $ independent complex samples (finite average denoted with $\langle \rangle $), and includes contributions from both the source and the independent zero-mean (and normalized) thermal noise at each antenna,

Equation (8): ${r}_{{ij}}=\langle {v}_{i}^{}{v}_{j}^{* }\rangle \approx {\breve{r}}_{{ij}}+\langle {n}_{i}^{}{n}_{j}^{* }\rangle $

We have introduced a breve accent ${\breve{r}}_{{ij}}={\breve{\gamma }}_{i}{\breve{\gamma }}_{j}^{* }{\breve{V}}_{{ij}}$ to distinguish underlying (ground-truth) values from those that are subject to statistical or systematic errors. Under the previously adopted normalization $E\left[| {n}_{i}{| }^{2}\right]=1$ (see Equation (2)), the variance for one sample of correlated complex noise is $E\left[| {n}_{i}^{}{n}_{j}^{* }{| }^{2}\right]=1$, and the variance in one component (real or imaginary) of the averaged complex noise correlation $\langle {n}_{i}^{}{n}_{j}^{* }\rangle $ is then

Equation (9): ${\sigma }_{r}^{2}=\dfrac{1}{2\,{\rm{\Delta }}t\,{\rm{\Delta }}\nu }$

where the amount of time-frequency averaging to reduce ${\sigma }_{r}^{2}$ is ultimately constrained by assumptions regarding station gain variability and source model variability.

The underlying signal-to-noise $\breve{\rho }$ of a correlation coefficient amplitude $| \breve{r}| $ under time-frequency averaging, and assuming negligible evolution of model visibility due to source structure or residual systematics, is

Equation (10): $\breve{\rho }=\dfrac{| \breve{r}| }{{\sigma }_{r}}=| \breve{r}| \sqrt{2\,{\rm{\Delta }}t\,{\rm{\Delta }}\nu }$

This is taken in the moderate-to-high $\breve{\rho }$ limit where it is meaningful to measure noise along just one of the complex components. A correlated flux density of 1 Jy with a typical geometric mean SEFD of 10⁴ Jy would give an expected correlation coefficient of 10⁻⁴ and an S/N of 4.5 over 1 GHz of bandwidth in 1 s of integration time. At $\breve{\rho }\lt 1$, the ability to measure correlation phase and amplitude degrades rapidly (Rogers et al. 1995), so that the minimum acceptable integration time is fundamentally limited by a combination of source strength, bandwidth, and collecting area.
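The arithmetic of this worked example can be sketched directly. The helper below is our own; it assumes the single-component noise convention ${\sigma }_{r}=1/\sqrt{2\,{\rm{\Delta }}t\,{\rm{\Delta }}\nu }$ used above and the approximation $| r| \sim S/\mathrm{SEFD}$ for a geometric-mean SEFD:

```python
import math

def correlation_snr(flux_jy, sefd_jy, bandwidth_hz, t_sec):
    """S/N of a vector-averaged correlation coefficient: rho = |r| sqrt(2 dt dnu)."""
    r = flux_jy / sefd_jy          # expected correlation coefficient
    return r * math.sqrt(2 * bandwidth_hz * t_sec)

# the example from the text: 1 Jy source, 10^4 Jy SEFD, 1 GHz bandwidth, 1 s
print(round(correlation_snr(1.0, 1e4, 1e9, 1.0), 1))  # 4.5
```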

At the same time, any uncompensated complex gain $\breve{\gamma }$ (from Equations (1) and (2)) must be stable over the averaging timescale to both ensure a meaningful measurement and to avoid phase decoherence while vector averaging complex baseline visibility. At millimeter and submillimeter observing wavelengths, phase decoherence due to atmospheric turbulence occurs on timescales of seconds. The requirement that $\breve{\rho }\gt 1$ over an averaging time Δt and bandwidth Δν where gain variation remains negligible sets the observational constraints where the use of closure quantities is particularly effective. Gain variation over the frequency bandpass is generally a stable instrumental effect that can be well measured and calibrated out. Similarly, relative complex gain between two orthogonal feeds in an antenna, e.g., γR/γL or γX/γY, is generally stable and can be calibrated, so that synthesized combinations of correlation products such as Stokes I = VRR + VLL are also characterized by station-based residual gains that close.

For observations at high radio frequencies, rapid phase gain variability in time due to the atmosphere is a primary driver of efforts to expand the collecting area and instantaneous bandwidth of mm-VLBI arrays such as the Event Horizon Telescope (EHT; Event Horizon Telescope Collaboration et al. 2019a, 2019b). So long as there exists at least one high-S/N baseline to a given site connecting it to the array phase center, the atmospheric phase variations can generally be solved for and removed. This allows for longer coherent integration on weak baselines to that site (Blackburn et al. 2019). In the case of the EHT, this condition is generally satisfied by the presence of the highly sensitive ALMA in the array.

In the following subsections, we discuss the consequences of low S/N on characterization of errors in phase and amplitude, and also the propagation of these errors across derived closure quantities. However, we do not explore optimal averaging strategies for the generation of closure phase and closure amplitudes. This would require assuming a prior model for gain variability. Rather, we assume there exist some Δt and Δν such that $\breve{\rho }\gt 1$ is maintained on all baselines, and over which $\breve{\gamma }$ can be made reasonably stable.

2.4. Non-Gaussian Errors at Low S/N

The observed correlation coefficients (Equations (8) and (9)) are subject to measurement noise that is complex and independent in real and imaginary components. While this implies that complex visibility is the natural measurement space for correlation observables, gain systematics are largely separable into amplitude factors (e.g., aperture efficiency) and phase factors (e.g., variable path delay), and this is reflected in the way closure amplitude and closure phase are formed.

The transformation from errors in real and imaginary coefficients to errors in amplitude and phase is only effectively linear for $\breve{\rho }\gg 1$. A consequence is that the statistical error budget of closure quantities becomes progressively non-Gaussian as $\breve{\rho }$ becomes small. This is particularly severe for the case of reciprocal amplitude, which is a necessary component of closure amplitude (Equation (7)). Heavy tails in the distribution for reciprocal amplitude are one motivation to move to log-closure amplitudes, which place the numerator and denominator of a closure amplitude on equal footing.
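The tail behavior is easy to reproduce by Monte Carlo. The sketch below (our own, mirroring the setup of Figure 2 with Rice-distributed amplitudes at S/N 8, 8, 5, 5) compares the excess kurtosis of the amplitude ratio with that of the log ratio:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def rice_amplitude(rho, n):
    """|rho + unit-variance-per-component complex noise|: Rice with S/N rho."""
    return np.abs(rho + rng.normal(size=n) + 1j * rng.normal(size=n))

A, B, C, D = (rice_amplitude(rho, n) for rho in (8, 8, 5, 5))

ratio = (A * B) / (C * D)                                  # closure amplitude
log_ratio = np.log(A) + np.log(B) - np.log(C) - np.log(D)  # log-closure amplitude

def excess_kurtosis(x):
    """Rough tail measure: 0 for a Gaussian, positive for heavy tails."""
    z = (x - x.mean()) / x.std()
    return float((z**4).mean() - 3.0)

# the plain ratio is far heavier-tailed than its log counterpart
print(excess_kurtosis(ratio) > excess_kurtosis(log_ratio))  # True
```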

We will primarily assume that measured amplitudes, log amplitudes, and phases for visibilities and for closure quantities can each be approximately, yet adequately, characterized as a Gaussian random process with assumed model mean and variance. This is the case for $\breve{\rho }\gtrsim \mathrm{few}$, which is typically achieved in continuum radio interferometry through sufficient time-frequency averaging in the weak-signal limit. Examples of closure phase and closure amplitude distributions, along with corresponding high-S/N normal distribution approximations, are shown in Figures 1 and 2.

Figure 1. Distribution of closure phases vs. Gaussian approximation from the high-S/N theoretical limit (dashed lines). For each closure phase, all three baseline visibilities are drawn from a complex normal distribution with mean value $\breve{\rho }$ and unity variance in each complex component.


Figure 2. Distributions of closure amplitudes vs. Gaussian approximations from the high-S/N theoretical limit. Baseline amplitudes A, B, C, D are drawn from a Rice distribution with noncentral amplitude 1 and ${\breve{\rho }}_{A},{\breve{\rho }}_{B},{\breve{\rho }}_{C},{\breve{\rho }}_{D}$ = (8, 8, 5, 5). There are large tails in the standard closure amplitude ratio due to amplitudes in the denominator that approach zero (top panel). The tail is mitigated somewhat by placing the lower-S/N measurements in the numerator (middle panel). However, using log-closure amplitude provides a better-behaved distribution overall (bottom panel).


These ensemble distributions for measured phase and amplitude are exactly calculable for a given model $\breve{\rho }$, even in the low $\breve{\rho }$ limit where the distributions become non-Gaussian. However, in practice, the underlying intrinsic signal-to-noise $\breve{\rho }$ is generally not known, which means the distribution from which a single measured ρ is drawn is also not known precisely. Unless $\breve{\rho }$ is either assumed under a complete forward model (incorporating model visibility and all forward gains) or based on additional averaging beyond the single measurement of r, any estimate will be subject to thermal noise. In addition to a general mischaracterization of errors, this can also lead to a self-selection bias if realizations that are randomly low amplitude are assigned larger errors or if they are preferentially flagged from the data.

An expanded description of phase and amplitude distributions is given in Appendix A. The distributions for phase, amplitude, and log amplitude can be reasonably approximated as Gaussian for $\breve{\rho }$ above 2–5. For log amplitude, a full characterization of the distribution under incoherent averaging of amplitudes is given in terms of moments. This is useful for estimating the a priori amplitude noise bias that becomes significant at low S/N. However, if a significant amount of informative data has low S/N, it may be advantageous to forward model complex gains and explicitly marginalize over their uncertainties, at least for the affected stations. This keeps data in the complex domain and their errors Gaussian.

3. Independence of Closure Quantities

The ∼N³ possible closure phases and ∼N⁴ closure amplitudes are formed using the original ∼N² baseline visibilities and become highly redundant at large N, where a much smaller subset of nonredundant quantities captures all source degrees of freedom (Readhead et al. 1980; Pearson & Readhead 1984). The codependence of redundant closure quantities and their initial construction from common baseline quantities leads to a general lack of statistical independence in their residual thermal noise (Kulkarni 1989). For closure phases and log-closure amplitudes in the Gaussian limit, the statistical dependence is fully characterized by a nonzero covariance.

In the following subsections, we detail the covariance structure for closure phases and log-closure amplitudes, and we demonstrate the relationship of the covariance to the unique and statistically independent degrees of freedom present in the quantities. We then present strategies for the construction of nonredundant but complete sets of quantities, and we discuss proper accounting of the number of gain-invariant degrees of freedom.

3.1. Closure Covariance due to Thermal Noise

Closure phases and log-closure amplitudes are formed from sums and differences of shared baseline quantities, so that the closure quantities do not have independent noise. Under the approximation that baseline observables are Gaussian random variables, the joint distribution of $T$ nonredundant closure phases ${\psi }_{{ijk}}$, for example, is characterized by a multivariate Gaussian distribution,

Equation (11): $p(\tilde{{\boldsymbol{\psi }}})={(2\pi )}^{-T/2}{\left(\det {{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}\right)}^{-1/2}\exp \left(-\tfrac{1}{2}{\tilde{{\boldsymbol{\psi }}}}^{\top }{{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}^{-1}\tilde{{\boldsymbol{\psi }}}\right)$

where residual closure phases $\tilde{{\boldsymbol{\psi }}}={\boldsymbol{\psi }}-\hat{{\boldsymbol{\psi }}}$ are taken about model values $\hat{{\boldsymbol{\psi }}}=\{{\hat{\psi }}_{{ijk}}\}$ and have covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$. This corresponds to the likelihood of observing the residuals $\tilde{{\boldsymbol{\psi }}}$ under the model hypothesis.

For a collection of all baseline phases measured among four sites, ${\boldsymbol{\phi }}=\{{\phi }_{12},{\phi }_{13},{\phi }_{14},{\phi }_{23},{\phi }_{24},{\phi }_{34}\}$ (Figure 3), the first three closure phases are,

Equation (12): ${\psi }_{123}={\phi }_{12}+{\phi }_{23}-{\phi }_{13},\quad {\psi }_{124}={\phi }_{12}+{\phi }_{24}-{\phi }_{14},\quad {\psi }_{134}={\phi }_{13}+{\phi }_{34}-{\phi }_{14}$

The final closure phase is redundant with the other three,

Equation (13): ${\psi }_{234}={\phi }_{23}+{\phi }_{34}-{\phi }_{24}={\psi }_{123}-{\psi }_{124}+{\psi }_{134}$
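This linear dependence can be checked numerically under the sign convention ${\psi }_{{ijk}}={\phi }_{{ij}}+{\phi }_{{jk}}-{\phi }_{{ik}}$ (the convention we assume in this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
phi = {p: rng.uniform(-np.pi, np.pi) for p in pairs}  # arbitrary baseline phases

def psi(i, j, k):
    # closure phase with the convention psi_ijk = phi_ij + phi_jk - phi_ik
    return phi[(i, j)] + phi[(j, k)] - phi[(i, k)]

# the fourth triangle is a linear combination of the three containing site 1
print(np.isclose(psi(2, 3, 4), psi(1, 2, 3) - psi(1, 2, 4) + psi(1, 3, 4)))  # True
```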

Figure 3. Network of four sites. There are six baselines, three nonredundant closure phases, and two nonredundant closure amplitudes.


We can represent the generation of closure phases as a linear operator (closure phase design matrix ${\boldsymbol{\Psi }}$) applied to the baseline phases: ${\boldsymbol{\psi }}={\boldsymbol{\Psi }}{\boldsymbol{\phi }}$,

Equation (14): $\left(\begin{array}{c}{\psi }_{123}\\ {\psi }_{124}\\ {\psi }_{134}\end{array}\right)=\left(\begin{array}{rrrrrr}1&-1&0&1&0&0\\ 1&0&-1&0&1&0\\ 0&1&-1&0&0&1\end{array}\right)\left(\begin{array}{c}{\phi }_{12}\\ {\phi }_{13}\\ {\phi }_{14}\\ {\phi }_{23}\\ {\phi }_{24}\\ {\phi }_{34}\end{array}\right)$

This closure phase design matrix is equivalent to the "phase closure operator" of Lannes (1990b) and the "phase compilation operator" of Lannes (1991).

The covariance matrix for the nonredundant set is ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}={\boldsymbol{\Psi }}{{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}{{\boldsymbol{\Psi }}}^{\top }$, where ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ is the covariance of the measured baseline phases. In general, ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ has a diagonal contribution from $B$ independent baseline thermal-noise contributions ${\boldsymbol{S}}=\mathrm{diag}({\sigma }_{00}^{2},\ ...,{\sigma }_{{BB}}^{2})$, plus diagonal and off-diagonal contributions from common systematic gain errors ${\sigma }_{\theta ,i}^{2}$. However, the common gain errors are ultimately eliminated through the formation of closure quantities. Therefore, ${\boldsymbol{\Psi }}{{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}{{\boldsymbol{\Psi }}}^{\top }={\boldsymbol{\Psi }}\,{\boldsymbol{S}}\,{{\boldsymbol{\Psi }}}^{\top }$, and

Equation (15): ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}=\left(\begin{array}{ccc}{\sigma }_{12}^{2}+{\sigma }_{13}^{2}+{\sigma }_{23}^{2}&{\sigma }_{12}^{2}&-{\sigma }_{13}^{2}\\ {\sigma }_{12}^{2}&{\sigma }_{12}^{2}+{\sigma }_{14}^{2}+{\sigma }_{24}^{2}&{\sigma }_{14}^{2}\\ -{\sigma }_{13}^{2}&{\sigma }_{14}^{2}&{\sigma }_{13}^{2}+{\sigma }_{14}^{2}+{\sigma }_{34}^{2}\end{array}\right)$

The cross terms of ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ are nonzero and are based on the sign of the shared baseline components of each closure phase.
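In code, the covariance follows directly from the design matrix. The sketch below assumes the baseline ordering $({\phi }_{12},{\phi }_{13},{\phi }_{14},{\phi }_{23},{\phi }_{24},{\phi }_{34})$ and the sign convention ${\psi }_{{ijk}}={\phi }_{{ij}}+{\phi }_{{jk}}-{\phi }_{{ik}}$; the per-baseline variances are illustrative:

```python
import numpy as np

# closure phase design matrix for psi_123, psi_124, psi_134
Psi = np.array([[1, -1,  0, 1, 0, 0],
                [1,  0, -1, 0, 1, 0],
                [0,  1, -1, 0, 0, 1]], dtype=float)

# diagonal per-baseline thermal phase variances (illustrative values)
S = np.diag([1.0, 2.0, 0.5, 1.5, 3.0, 0.8])

Sigma_psi = Psi @ S @ Psi.T
print(Sigma_psi)
# off-diagonal terms are +/- the variance of the shared baseline,
# e.g. Sigma_psi[0, 1] = sigma_12^2 and Sigma_psi[0, 2] = -sigma_13^2
```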

For the same network of four sites, the first two log-closure amplitudes are also based on sums and differences of log-baseline amplitudes ${\boldsymbol{a}}=\{{a}_{12},{a}_{13},{a}_{14},{a}_{23},{a}_{24},{a}_{34}\}$,

Equation (16): ${c}_{1234}={a}_{12}+{a}_{34}-{a}_{13}-{a}_{24},\quad {c}_{1342}={a}_{13}+{a}_{24}-{a}_{14}-{a}_{23}$

with a third closure amplitude that is redundant,

Equation (17): ${c}_{1423}={a}_{14}+{a}_{23}-{a}_{12}-{a}_{34}=-{c}_{1234}-{c}_{1342}$

By using log amplitude, the redundancy in closure amplitudes can be cast in terms of linear dependence, as is already the case for closure phases. Covariance terms are formed according to shared baselines, as was done for closure phases,

Equation (18): ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}=\left(\begin{array}{cc}{\sigma }_{12}^{2}+{\sigma }_{34}^{2}+{\sigma }_{13}^{2}+{\sigma }_{24}^{2}&-{\sigma }_{13}^{2}-{\sigma }_{24}^{2}\\ -{\sigma }_{13}^{2}-{\sigma }_{24}^{2}&{\sigma }_{13}^{2}+{\sigma }_{24}^{2}+{\sigma }_{14}^{2}+{\sigma }_{23}^{2}\end{array}\right)$

The likelihood of observing a set of measured residual log-closure amplitudes $\tilde{{\boldsymbol{c}}}={\boldsymbol{c}}-\hat{{\boldsymbol{c}}}$, given measurements ${\boldsymbol{c}}$ and model hypothesis $\hat{{\boldsymbol{c}}}$, parallels Equation (11) for closure phases,

Equation (19): $p(\tilde{{\boldsymbol{c}}})={(2\pi )}^{-Q/2}{\left(\det {{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}\right)}^{-1/2}\exp \left(-\tfrac{1}{2}{\tilde{{\boldsymbol{c}}}}^{\top }{{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}^{-1}\tilde{{\boldsymbol{c}}}\right)$

The covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}$ must be formed from a nonredundant set of $Q\leqslant {Q}_{\mathrm{minimal}}$ closure quantities—otherwise, the matrix will be rank deficient and not invertible. ${Q}_{\mathrm{minimal}}$ is the minimum size set that captures all available degrees of freedom, as well as the largest nonredundant set that can be formed (this is demonstrated in Section 3.2). The value ${\tilde{{\boldsymbol{c}}}}^{\top }{{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}^{-1}\tilde{{\boldsymbol{c}}}$ will then follow a χ2 distribution with Q degrees of freedom.

If we write the inverse covariance matrix as ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}^{-1}\,={{\boldsymbol{U}}}^{\top }{{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}},\mathrm{diag}}^{-1}{\boldsymbol{U}}$, we see that matrix U transforms a nonredundant set of ${Q}_{\mathrm{minimal}}$ closure quantities into a space of combinations of closure quantities with independent noise, and characterized by diagonal covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}},\mathrm{diag}}$. When applied to closure phases, this generates the so-called "kernel phases," first noted by Martinache (2010). The closure basis formed in this manner can be arbitrarily rotated by different choices of U, but all rotations capture the same ${Q}_{\mathrm{minimal}}$ degrees of freedom. Additional redundant closure quantities to this set will be perfectly degenerate with linear combinations of the closure basis, and they will not add additional information to the likelihood of a set of observations. Thus, the calculation of χ2 is unique and does not depend on the particular set of nonredundant closure quantities used (specific examples of this invariance are provided in Appendix C).
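A sketch of this change of basis (our own illustrative numbers): diagonalizing ${\boldsymbol{\Sigma }}$ yields combinations with independent noise, and the value of χ² is invariant under the rotation, as claimed:

```python
import numpy as np

# covariance of a nonredundant closure phase set (illustrative variances)
Psi = np.array([[1, -1,  0, 1, 0, 0],
                [1,  0, -1, 0, 1, 0],
                [0,  1, -1, 0, 0, 1]], dtype=float)
Sigma = Psi @ np.diag([1.0, 2.0, 0.5, 1.5, 3.0, 0.8]) @ Psi.T

w, U = np.linalg.eigh(Sigma)   # Sigma = U diag(w) U^T
# rotated quantities U^T psi have independent (diagonal) noise
assert np.allclose(U.T @ Sigma @ U, np.diag(w))

rng = np.random.default_rng(5)
resid = rng.normal(size=3)     # some residual closure phases

chi2_full = resid @ np.linalg.inv(Sigma) @ resid
rotated = U.T @ resid
chi2_rotated = np.sum(rotated**2 / w)
print(np.isclose(chi2_full, chi2_rotated))  # True: chi^2 is basis-independent
```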

In terms of the closure (log amplitude) design matrix ${\boldsymbol{C}}$, the factorization can also be written

Equation (20)

where ${{\boldsymbol{C}}}^{+}$ is the pseudo-inverse of ${\boldsymbol{C}}$, and ${{\boldsymbol{S}}}^{-1}$ is a diagonal matrix containing the reciprocal baseline thermal variances ${\sigma }_{{ij}}^{2}$. ${{\boldsymbol{C}}}^{+}$ itself does not depend on the actual baseline noise and can be readily computed via singular value decomposition (SVD). Redundant degrees of freedom will be reflected by singular values of zero and can be avoided by first removing the redundant closure quantities by matrix reduction or explicit construction (Section 3.2). In that case, the pseudo-inverse will be a true inverse. The advantage to inverting the design matrix rather than the covariance matrix (as in Equations (11) or (19)) is that the operation on the design matrix can be done once and then applied to different baseline noise prescriptions with little computational cost.
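The zero singular values associated with redundant rows are easy to exhibit. This sketch (our own) uses the four-station closure phase design matrix, including the redundant fourth triangle:

```python
import numpy as np

# all four triangles over four stations; the last row is a combination of the others
Psi_full = np.array([[1, -1,  0, 1,  0, 0],   # psi_123
                     [1,  0, -1, 0,  1, 0],   # psi_124
                     [0,  1, -1, 0,  0, 1],   # psi_134
                     [0,  0,  0, 1, -1, 1]],  # psi_234 (redundant)
                    dtype=float)

s = np.linalg.svd(Psi_full, compute_uv=False)
print(np.sum(s > 1e-10))   # 3: one singular value vanishes for the redundant row

# dropping the redundant row leaves a full-row-rank matrix, and its
# pseudo-inverse acts as a true (right) inverse
Psi_min = Psi_full[:3]
Psi_plus = np.linalg.pinv(Psi_min)
print(np.allclose(Psi_min @ Psi_plus, np.eye(3)))  # True
```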

3.2. Minimal Complete Sets

The total number, $T$, of triangles that can be constructed from a fully connected set of baselines across $N$ sites is

Equation (21): $T=\left(\genfrac{}{}{0em}{}{N}{3}\right)=\dfrac{N(N-1)(N-2)}{6}$

while the number of closure phase degrees of freedom is only the total number of baseline phases ($N(N-1)/2$) minus the number of degrees of freedom contained in site phase differences ($N-1$). These degrees of freedom should be captured by a nonredundant subset of closure phases of size

Equation (22): ${T}_{\mathrm{minimal}}=\dfrac{N(N-1)}{2}-(N-1)=\dfrac{(N-1)(N-2)}{2}$

For a large network, the set of all closure triangles will quickly outpace the number of independent measurements, resulting in a highly redundant set. One method for choosing a minimal set of closure triangles is given by Thompson et al. (2017) and shown in Figure 4. It involves selecting a single reference station and selecting the set of all triangles that contain it. Triangles that do not contain the reference station are formed as combinations of triangles from the minimal set.
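A minimal sketch of this reference-station construction (the function name is ours):

```python
from itertools import combinations

def minimal_triangles(stations, ref):
    """Minimal closure phase set: all triangles containing the reference station."""
    others = [s for s in stations if s != ref]
    return [(ref, a, b) for a, b in combinations(others, 2)]

stations = list(range(1, 8))            # a 7-station array
tris = minimal_triangles(stations, ref=1)
print(len(tris))                        # (N-1)(N-2)/2 = 15
```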

Figure 4. Construction of a minimal set of closure phases. Site 1 is used as a reference, from which there are $(N-1)(N-2)/2$ choices for the other two sites that build the set of all triangles containing site 1. Combinations of closure phases from this set can be used to form arbitrary triangles that do not contain the reference station, proving that the set is complete. This prescription is described in Thompson et al. (2017).


We introduce a corresponding diagrammatic procedure for selecting a minimal set of closure amplitudes (Figure 5). The independent closure amplitudes are formed by arranging $N$ sites on a ring and selecting all pairs of two adjacent nonoverlapping sites. One closure amplitude is formed from each four-site arrangement (quadrangle) from the baselines that span the pair. Because the order of the pair does not matter, this results in the formation of

Equation (23): ${Q}_{\mathrm{minimal}}=\dfrac{N(N-3)}{2}$

total closure amplitudes, equal to the $N(N-1)/2$ baseline degrees of freedom minus the $N$ unknown site gain factors. Combinations of closure amplitudes from the basis can be used to construct all remaining possible closure amplitudes, showing that the remaining closure amplitudes are redundant. Note that because adding a new station to the ring will necessarily break up one previous pair, the minimal set formed this way across $N$ stations is not a proper subset of that formed across $N+1$ stations. An alternative strategy for building the set of closure amplitudes in a staged matrix-driven approach is presented in Appendix B.3.

Figure 5. Construction of a minimal set of closure amplitudes. We begin with the set of closure amplitudes defined by choosing all sets of two nonoverlapping pairs of adjacent sites and forming one closure amplitude from each collection of four sites according to the baselines shown on the top left. The solid and dashed lines determine which baselines go in the numerator and denominator of the closure amplitude. Since there are $N$ choices for the placement of the first adjacent pair and $N-3$ choices for the placement of the second pair, there are $N(N-3)/2$ nonredundant closure amplitudes formed. By multiplying closure amplitudes from our set, we can construct arbitrary closure amplitudes containing nonadjacent sites. The set is therefore complete.


In practice, the full set of $N(N-1)/2$ baseline visibilities may not be available due to processing issues or by choice, which complicates the generation of a minimal set of closure quantities. For closure phases, so long as a missing baseline does not include the reference station (Figure 4), the closure triangle containing the missing baseline can be excluded from the minimal set. For closure amplitudes, missing exterior baselines between adjacent sites along the ring (Figure 5) appear in only one of the closure amplitude basis quadrangles. Removing the quadrangle with the missing exterior baseline will correctly exclude all derivative closure amplitudes from the set. For more complicated baseline unavailability, a site-based procedure for forming nonredundant closure quantities may not work. One alternative method for extracting the unique degrees of freedom from a partially redundant set of closure quantities is through SVD of the covariance matrix or design matrix (Equation (20), Figure 6). Alternatively, a minimal set of the original closure quantities can be identified and extracted by matrix reduction of the design matrix.

Figure 6. Singular value decomposition of the covariance matrix formed from a full set of 378 closure amplitudes over nine sites. The baseline noise prescription is random. There are 27 nonzero singular values corresponding to the $9\times (9-3)/2=27$ independent degrees of freedom represented in the closure amplitudes. SVD is particularly useful in situations of arbitrary missing baselines, which complicates the direct generation of a minimal set of closure quantities.
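The 27 independent degrees of freedom quoted in the caption can be reproduced directly by building the full 378-row log-closure-amplitude design matrix over nine sites and computing its rank. The construction below is our own (baseline and quadrangle orderings are arbitrary):

```python
import numpy as np
from itertools import combinations

N = 9
baselines = list(combinations(range(N), 2))
col = {bl: k for k, bl in enumerate(baselines)}   # 36 baseline columns

rows = []
for i, j, k, l in combinations(range(N), 4):
    # the three quadrangle closure amplitudes on each 4-station subset,
    # each a (+1, +1, -1, -1) pattern over four log baseline amplitudes
    for num, den in [(((i, j), (k, l)), ((i, k), (j, l))),
                     (((i, k), (j, l)), ((i, l), (j, k))),
                     (((i, l), (j, k)), ((i, j), (k, l)))]:
        row = np.zeros(len(baselines))
        for bl in num:
            row[col[bl]] += 1
        for bl in den:
            row[col[bl]] -= 1
        rows.append(row)

C = np.array(rows)
print(C.shape, np.linalg.matrix_rank(C))   # (378, 36) 27
```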


An upper limit on the number ${n}_{\psi }$ of different minimal closure phase subsets that exist for a fully connected array with N stations is given by the binomial coefficient

Equation (24)

$${n}_{\psi }\leqslant \left(\genfrac{}{}{0em}{}{\left(\genfrac{}{}{0em}{}{N}{3}\right)}{(N-1)(N-2)/2}\right)$$

This expression yields only an upper limit because for a given maximal set and N > 4, some selections of subsets with size equal to that of the minimal set will contain redundant closure phases, and so they will not themselves be valid minimal sets. An analogous upper limit holds for the number of nonredundant sets of closure amplitudes,

Equation (25)

$${n}_{c}\leqslant \left(\genfrac{}{}{0em}{}{3\left(\genfrac{}{}{0em}{}{N}{4}\right)}{N(N-3)/2}\right)$$

Both nψ and nc grow super-exponentially with N (see Table 1), and as the number of stations increases beyond a few, it quickly becomes prohibitive to search through all possible nonredundant subsets for the one that minimizes covariance. The minimal-covariance subset for both closure phases and log-closure amplitudes will generically depend on the specific baseline S/N distribution of the array, and we do not know of a general-purpose algorithm for selecting the optimal set. Instead, we consider rules of thumb for two limiting cases that approximate realistic array configurations: an array with uniform S/N on all baselines, and an array with S/N dominated by strong baselines to a single station or with other means to clearly identify weak baselines.

Table 1.  Unique Minimal Sets

N nψ nc
3 1 ...
4 4 3
5 125 1518
6 46620 351117922

Note. The number of unique minimal sets of closure phases and log-closure amplitudes for small arrays.

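The counts in Table 1 can be checked by brute force for small arrays: a candidate subset of closure triangles of the minimal-set size $(N-1)(N-2)/2$ is a valid minimal set exactly when its design matrix has full rank. A numpy sketch (the helper name is ours):

```python
import itertools
import numpy as np

def count_minimal_sets(n):
    """Count subsets of closure triangles, of the minimal-set size
    (n-1)(n-2)/2, whose design matrices have full rank."""
    baselines = {b: k for k, b in enumerate(itertools.combinations(range(n), 2))}
    rows = []
    for i, j, k in itertools.combinations(range(n), 3):
        row = np.zeros(len(baselines))
        row[baselines[(i, j)]] = 1    # psi_ijk = phi_ij + phi_jk - phi_ik
        row[baselines[(j, k)]] = 1
        row[baselines[(i, k)]] = -1
        rows.append(row)
    rows = np.array(rows)
    size = (n - 1) * (n - 2) // 2
    return sum(int(np.linalg.matrix_rank(rows[list(s)]) == size)
               for s in itertools.combinations(range(len(rows)), size))

print([count_minimal_sets(n) for n in (3, 4, 5)])  # [1, 4, 125] as in Table 1
```

The same enumeration with quadrangle rows reproduces the $n_c$ column, although the subset count grows quickly with $N$.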

We use the determinant λ of the correlation matrix ${\boldsymbol{\varrho }}$ to quantify the degree of independence for any specific choice of minimal subset,

Equation (26)

$$\lambda =\det {\boldsymbol{\varrho }}$$

Equation (27)

$${\varrho }_{{ij}}=\displaystyle \frac{{{\rm{\Sigma }}}_{{ij}}}{\sqrt{{{\rm{\Sigma }}}_{{ii}}{{\rm{\Sigma }}}_{{jj}}}}$$

where the elements of ${\boldsymbol{\varrho }}$ are related to the elements of the covariance matrix ${\boldsymbol{\Sigma }}$. The value of λ varies between zero and one, with λ = 1 corresponding to no correlation and λ = 0 corresponding to complete correlation.
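As a concrete illustration, λ can be computed directly from a covariance matrix, assuming the standard normalization ${\varrho }_{{ij}}={{\rm{\Sigma }}}_{{ij}}/\sqrt{{{\rm{\Sigma }}}_{{ii}}{{\rm{\Sigma }}}_{{jj}}}$ (a sketch with arbitrary example values):

```python
import numpy as np

def correlation_determinant(cov):
    """lambda = det(rho), where rho_ij = Sigma_ij / sqrt(Sigma_ii Sigma_jj)."""
    d = 1.0 / np.sqrt(np.diag(cov))
    rho = cov * np.outer(d, d)
    return np.linalg.det(rho)

# Uncorrelated measurements give lambda = 1 ...
print(correlation_determinant(np.diag([2.0, 5.0, 9.0])))            # 1.0
# ... while shared noise drives lambda toward 0.
print(correlation_determinant(np.array([[2.0, 1.0], [1.0, 2.0]])))  # 0.75
```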

For an array with a uniform S/N on all baselines (e.g., a homogeneous array observing a point source), the covariance is minimized (i.e., λ is maximized) when all stations are represented as nearly equally as possible in the minimal set of either closure phases or log-closure amplitudes. For example, an array with N = 6 stations has a minimal closure phase set size of 10, but of the nψ = 46,620 different choices of minimal set, only 12 equally represent all baselines. Similarly, an array with N = 5 stations has a minimal log-closure amplitude set size of five, but only six out of nc = 1518 minimal sets equally represent all baselines.

For an array with a high S/N on baselines to only one station (e.g., a heterogeneous array containing one highly sensitive station), the closure phase covariance is minimized when the minimal set is constructed using only triangles containing the reference station; that is, using the minimal set construction algorithm described earlier in this section produces an optimal set when the reference station dominates the array sensitivity. This is because weak baselines between two non-reference stations are then used only once in the construction. For log-closure amplitudes, placing the lowest S/N baselines on the ring as adjacent sites (as in Figure 5) accomplishes the same goal; the weakest baselines are used only once in the minimal set and, thus, do not contribute to the overall covariance.

3.3. Redundant Baselines

Some interferometric arrays have multiple baselines that are effectively redundant (dense arrays are often designed with this redundancy to aid calibration). A common case in VLBI is to have multiple sites that can effectively be considered colocated. For example, the CSO, JCMT, and SMA are all on Maunakea and have participated in EHT experiments; likewise, the APEX telescope is located within a few kilometers of the ALMA phased-array center. Baselines to these redundant sites sample the same visibility and source structure, and they can be combined to reduce thermal noise and to improve calibration. For example, the addition of ALMA to an EHT array that already includes APEX provides no new baselines, but it significantly reduces the thermal noise on baselines to Chile.

We have so far focused on the unique statistical degrees of freedom contained in the closure quantities, which do not depend on array geometry. Baseline redundancy does have a dramatic effect, however, on the unique source structure degrees of freedom measured by the array. For example, the addition of colocated sites to a VLBI network does not sample new nontrivial source information via closure phases even as the statistical degrees of freedom grow according to Equation (23), but it does increase the amount of source information measured via closure amplitudes. In the limit where every site has a redundant partner, all source visibility amplitude information is sampled via closure amplitudes apart from a single unknown degree of freedom for the total flux density.

To assess the independent degrees of freedom for an array with baseline redundancy, we introduce a redundancy matrix ${\boldsymbol{R}}$ of dimensions ${B}_{\mathrm{NR}}\times B$ that links multiple measurements from redundant baselines into a single degree of freedom, such that ${B}_{\mathrm{NR}}\leqslant B$ is the number of nonredundant geometric baselines that sample unique source structure. For each row corresponding to a unique geometric baseline, ${\boldsymbol{R}}$ contains a "1" in each column corresponding to a matching station pair. If there are no redundant baselines, ${\boldsymbol{R}}$ is the identity matrix. For the four-site network in Figure 3, if stations 1 and 2 are taken to be colocated, then of the six measured baselines $\{{V}_{12},{V}_{13},{V}_{14},{V}_{23},{V}_{24},{V}_{34}\}$, ${V}_{13}\sim {V}_{23}$ sample the same geometric baseline, as do ${V}_{14}\sim {V}_{24}$, so that

Equation (28)

$${\boldsymbol{R}}=\left(\begin{array}{cccccc}1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\end{array}\right)$$

with the "zero baseline" ${V}_{12}$ serving as one of the four unique geometric baselines.

The number of unique source degrees of freedom captured by closure quantities is found by taking the rank of the compound design matrix, which converts nonredundant amplitudes to closure quantities. For the previous four-station example with one colocated pair, this gives $\mathrm{rank}({\boldsymbol{\Psi }}{{\boldsymbol{R}}}^{\top })=2$ gain-independent phase structure degrees of freedom, and $\mathrm{rank}({{\boldsymbol{CR}}}^{\top })=1$ gain-independent amplitude structure degrees of freedom. A four-site array arranged in a square satisfies different constraints with ${V}_{12}\sim {V}_{34}$ and ${V}_{14}\sim {V}_{23}$. While there are still four unique geometric baselines, there are now $\mathrm{rank}({\boldsymbol{\Psi }}{{\boldsymbol{R}}}^{\top })=3$ structure closure phases and $\mathrm{rank}({{\boldsymbol{CR}}}^{\top })=2$ structure closure amplitudes, both equal to the corresponding number of linearly independent closure quantities.
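These rank computations can be verified explicitly. The sketch below builds one choice of minimal design matrices Ψ and C for a four-station array (other valid choices give the same ranks) along with the two redundancy matrices discussed above:

```python
import numpy as np

# Baseline order: 12, 13, 14, 23, 24, 34.
# A minimal closure phase design (reference station 1) and a minimal
# log-closure amplitude design for a four-station array.
Psi = np.array([[1, -1,  0, 1, 0, 0],   # phi_12 + phi_23 - phi_13
                [1,  0, -1, 0, 1, 0],   # phi_12 + phi_24 - phi_14
                [0,  1, -1, 0, 0, 1]])  # phi_13 + phi_34 - phi_14
C = np.array([[1, -1,  0,  0, -1, 1],   # ln|V12 V34 / (V13 V24)|
              [0,  1, -1, -1,  1, 0]])  # ln|V13 V24 / (V14 V23)|

# Stations 1 and 2 colocated: V13 ~ V23 and V14 ~ V24.
R_coloc = np.array([[1, 0, 0, 0, 0, 0],
                    [0, 1, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 0],
                    [0, 0, 0, 0, 0, 1]])
# Square array: V12 ~ V34 and V14 ~ V23.
R_square = np.array([[1, 0, 0, 0, 0, 1],
                     [0, 1, 0, 0, 0, 0],
                     [0, 0, 0, 0, 1, 0],
                     [0, 0, 1, 1, 0, 0]])

for R in (R_coloc, R_square):
    print(np.linalg.matrix_rank(Psi @ R.T), np.linalg.matrix_rank(C @ R.T))
# colocated pair: 2 1
# square array:   3 2
```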

The analysis indicates that by judicious use of redundancy, an interferometric array can reduce the overall complexity of the measurements (with $\mathrm{rank}({\boldsymbol{R}})$ as an indication of complexity) while not sacrificing measured gain-independent structure degrees of freedom. For example, $\mathrm{rank}({\boldsymbol{R}})-\mathrm{rank}({{\boldsymbol{CR}}}^{\top })$ can be taken as an indication of the number of "amplitude gains" that remain unconstrained. As some level of a priori gain information is generally available, a sparse array that samples the maximum number of unique geometric baselines is likely preferable over one that utilizes geometric redundancy for most situations. Colocated sites in particular can cause a significant loss in measured information. However, they do provide a link to zero baseline quantities (such as total flux), which are often known a priori and, thus, inform model independent calibration (Blackburn et al. 2019).

4. Model Fitting with Unknown Gains

In this section, we apply the closure construction procedures detailed in the appendices to perform a series of simple model fits to different simulated data products generated from the same underlying truth image. The goal of these tests is to demonstrate that the same model parameter posteriors can be recovered using different representations of the data products, so long as covariances between measurements are properly accounted for.

4.1. Visibility Covariance due to Gain Error

To connect model fitting to closure quantities with model fitting to baseline visibilities, we first introduce a parallel construction (to Section 3.1) for the covariance of visibility measurements in the presence of uncertainty in the station gains. In both cases, we characterize the covariance in a residual quantity ($\tilde{{\boldsymbol{\psi }}}$ or $\tilde{{\boldsymbol{c}}}$ for closure quantities, $\tilde{{\boldsymbol{\phi }}}$ or $\tilde{{\boldsymbol{a}}}$ for visibilities—see Table D1), reflecting the difference between the measured quantity and the model prediction. However, while the covariance between residual closure quantities arises from thermal error on shared baselines, the baseline thermal noise is independent across visibility quantities in the weak-source limit; there, the covariance instead arises from systematic error in the model gains at shared stations. A visibility measurement ${V}_{{ij}}$ contains contributions from both the source and the station gains,

Equation (29)

$${V}_{{ij}}={G}_{i}{G}_{j}^{* }\,{{ \mathcal V }}_{{ij}}$$

The multiplicative complex gains manifest as additive terms modifying the visibility phases and log visibility amplitudes,

Equation (30a)

$$\arg {V}_{{ij}}=\arg {{ \mathcal V }}_{{ij}}+{\theta }_{i}-{\theta }_{j},\qquad {\theta }_{i}\equiv \arg {G}_{i}$$

Equation (30b)

$$\mathrm{ln}| {V}_{{ij}}| =\mathrm{ln}| {{ \mathcal V }}_{{ij}}| +{g}_{i}+{g}_{j},\qquad {g}_{i}\equiv \mathrm{ln}| {G}_{i}| $$

where the sign differences in the second gain terms arise because complex conjugation negates phases but leaves amplitudes unchanged.
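A quick numerical check of this additivity (a sketch, with arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(42)
# Complex station gains near unity (example values).
g = 1 + 0.1 * rng.normal(size=2) + 0.1j * rng.normal(size=2)
V = 0.8 * np.exp(0.3j)              # true source visibility
V_meas = g[0] * np.conj(g[1]) * V   # gain-corrupted measurement

# Phase: the gain phases enter with opposite signs (conjugated station negated).
phase = np.angle(V) + np.angle(g[0]) - np.angle(g[1])
# Log amplitude: the gain log amplitudes enter with the same sign.
logamp = np.log(np.abs(V)) + np.log(np.abs(g[0])) + np.log(np.abs(g[1]))

assert np.isclose(np.angle(V_meas), phase)
assert np.isclose(np.log(np.abs(V_meas)), logamp)
```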

More generally, we can express the gain contributions to a collection of visibility phases or log visibility amplitudes in terms of design matrices ${\boldsymbol{\Phi }}$ or ${\boldsymbol{A}}$ operating on the vector of gain phases or log gain amplitudes,

Equation (31a)

$${{\boldsymbol{\phi }}}_{\mathrm{gain}}={\boldsymbol{\Phi }}{\boldsymbol{\theta }}$$

Equation (31b)

$${{\boldsymbol{a}}}_{\mathrm{gain}}={\boldsymbol{A}}{\boldsymbol{g}}$$

For example, the visibility phases measured on the baselines in Figure 3 can be expressed using

Equation (32)

$${\boldsymbol{\Phi }}=\left(\begin{array}{cccc}1 & -1 & 0 & 0\\ 1 & 0 & -1 & 0\\ 1 & 0 & 0 & -1\\ 0 & 1 & -1 & 0\\ 0 & 1 & 0 & -1\\ 0 & 0 & 1 & -1\end{array}\right)$$

while the log visibility amplitudes can be similarly expressed using

Equation (33)

$${\boldsymbol{A}}=\left(\begin{array}{cccc}1 & 1 & 0 & 0\\ 1 & 0 & 1 & 0\\ 1 & 0 & 0 & 1\\ 0 & 1 & 1 & 0\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 1\end{array}\right)$$

The visibility phase design matrix is equivalent to the "phase aberration operator" of Lannes (1990b), while the log visibility amplitude design matrix matches the "amplitude aberration operator" of Lannes (1990a, 1991).

This additivity makes it convenient to model the gain phases and log gain amplitudes as Gaussian distributed, so that their variances simply add to those of the corresponding visibility quantities. The baseline-based thermal variances are uncorrelated across baselines, and in the absence of gains, they would fully describe the visibility covariances via the diagonal matrices ${{\boldsymbol{S}}}_{{\boldsymbol{\phi }}}$ for visibility phases and Sa for log visibility amplitudes (see Tables B1 and B2 in Appendix B.1). The station-based gain variances do drive covariances in the visibility residuals for baselines that share a station, with the design matrices serving to map stations to baselines. The covariance matrices are then constructed as the sum of the baseline-based and station-based contributions,

Equation (34a)

$${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}={{\boldsymbol{S}}}_{{\boldsymbol{\phi }}}+{\boldsymbol{\Phi }}{{\boldsymbol{\Sigma }}}_{{\boldsymbol{\theta }}}{{\boldsymbol{\Phi }}}^{\top }$$

Equation (34b)

$${{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}={{\boldsymbol{S}}}_{{\boldsymbol{a}}}+{\boldsymbol{A}}{{\boldsymbol{\Sigma }}}_{{\boldsymbol{g}}}{{\boldsymbol{A}}}^{\top }$$

with the off-diagonal elements consisting of only station-based terms while the diagonal elements combine both station-based and baseline-based terms. The covariance matrix corresponding to the visibility phases in Equation (32) is given by

Equation (35)

$${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}=\left(\begin{array}{cccccc}{\sigma }_{12}^{2}+{\sigma }_{\theta ,1}^{2}+{\sigma }_{\theta ,2}^{2} & {\sigma }_{\theta ,1}^{2} & {\sigma }_{\theta ,1}^{2} & -{\sigma }_{\theta ,2}^{2} & -{\sigma }_{\theta ,2}^{2} & 0\\ {\sigma }_{\theta ,1}^{2} & {\sigma }_{13}^{2}+{\sigma }_{\theta ,1}^{2}+{\sigma }_{\theta ,3}^{2} & {\sigma }_{\theta ,1}^{2} & {\sigma }_{\theta ,3}^{2} & 0 & -{\sigma }_{\theta ,3}^{2}\\ {\sigma }_{\theta ,1}^{2} & {\sigma }_{\theta ,1}^{2} & {\sigma }_{14}^{2}+{\sigma }_{\theta ,1}^{2}+{\sigma }_{\theta ,4}^{2} & 0 & {\sigma }_{\theta ,4}^{2} & {\sigma }_{\theta ,4}^{2}\\ -{\sigma }_{\theta ,2}^{2} & {\sigma }_{\theta ,3}^{2} & 0 & {\sigma }_{23}^{2}+{\sigma }_{\theta ,2}^{2}+{\sigma }_{\theta ,3}^{2} & {\sigma }_{\theta ,2}^{2} & -{\sigma }_{\theta ,3}^{2}\\ -{\sigma }_{\theta ,2}^{2} & 0 & {\sigma }_{\theta ,4}^{2} & {\sigma }_{\theta ,2}^{2} & {\sigma }_{24}^{2}+{\sigma }_{\theta ,2}^{2}+{\sigma }_{\theta ,4}^{2} & {\sigma }_{\theta ,4}^{2}\\ 0 & -{\sigma }_{\theta ,3}^{2} & {\sigma }_{\theta ,4}^{2} & -{\sigma }_{\theta ,3}^{2} & {\sigma }_{\theta ,4}^{2} & {\sigma }_{34}^{2}+{\sigma }_{\theta ,3}^{2}+{\sigma }_{\theta ,4}^{2}\end{array}\right)$$

while the covariance matrix corresponding to the log visibility amplitudes in Equation (33) is structurally identical except for the off-diagonal term signs,

Equation (36)

$${{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}=\left(\begin{array}{cccccc}{\sigma }_{12}^{2}+{\sigma }_{g,1}^{2}+{\sigma }_{g,2}^{2} & {\sigma }_{g,1}^{2} & {\sigma }_{g,1}^{2} & {\sigma }_{g,2}^{2} & {\sigma }_{g,2}^{2} & 0\\ {\sigma }_{g,1}^{2} & {\sigma }_{13}^{2}+{\sigma }_{g,1}^{2}+{\sigma }_{g,3}^{2} & {\sigma }_{g,1}^{2} & {\sigma }_{g,3}^{2} & 0 & {\sigma }_{g,3}^{2}\\ {\sigma }_{g,1}^{2} & {\sigma }_{g,1}^{2} & {\sigma }_{14}^{2}+{\sigma }_{g,1}^{2}+{\sigma }_{g,4}^{2} & 0 & {\sigma }_{g,4}^{2} & {\sigma }_{g,4}^{2}\\ {\sigma }_{g,2}^{2} & {\sigma }_{g,3}^{2} & 0 & {\sigma }_{23}^{2}+{\sigma }_{g,2}^{2}+{\sigma }_{g,3}^{2} & {\sigma }_{g,2}^{2} & {\sigma }_{g,3}^{2}\\ {\sigma }_{g,2}^{2} & 0 & {\sigma }_{g,4}^{2} & {\sigma }_{g,2}^{2} & {\sigma }_{24}^{2}+{\sigma }_{g,2}^{2}+{\sigma }_{g,4}^{2} & {\sigma }_{g,4}^{2}\\ 0 & {\sigma }_{g,3}^{2} & {\sigma }_{g,4}^{2} & {\sigma }_{g,3}^{2} & {\sigma }_{g,4}^{2} & {\sigma }_{34}^{2}+{\sigma }_{g,3}^{2}+{\sigma }_{g,4}^{2}\end{array}\right)$$
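The sum of baseline-based and station-based contributions can be assembled programmatically; the $N=3$ result reproduces the structure of Table B1 (a sketch with arbitrary example variances; helper names are ours):

```python
import itertools
import numpy as np

def design_matrices(n):
    """Phase (Phi) and log-amplitude (A) design matrices mapping n station
    gain terms onto the n(n-1)/2 baseline terms."""
    baselines = list(itertools.combinations(range(n), 2))
    Phi = np.zeros((len(baselines), n))
    A = np.zeros((len(baselines), n))
    for r, (i, j) in enumerate(baselines):
        Phi[r, i], Phi[r, j] = 1.0, -1.0  # theta_i - theta_j
        A[r, i] = A[r, j] = 1.0           # g_i + g_j
    return Phi, A

Phi, A = design_matrices(3)
S = np.diag([0.1, 0.2, 0.3])           # baseline thermal variances (example values)
Sigma_gain = np.diag([1.0, 2.0, 3.0])  # station gain variances (example values)

# Diagonal: thermal plus both station variances; off-diagonal: shared station only,
# with signs that differ between the phase and log-amplitude cases.
Sigma_phi = S + Phi @ Sigma_gain @ Phi.T
Sigma_a = S + A @ Sigma_gain @ A.T
print(Sigma_phi)
print(Sigma_a)
```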

Table B1.  Visibility Phase Design and Covariance Matrices for Two- and Three-element Arrays, along with Matrices Relevant for Their Construction

    Number of Stations ($N$)
Matrix Shape $N$ = 2 $N=3$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\theta }}}$ $N$ × $N$ $\left(\begin{array}{cc}{\sigma }_{\theta ,1}^{2} & 0\\ 0 & {\sigma }_{\theta ,2}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{\theta ,1}^{2} & 0 & 0\\ 0 & {\sigma }_{\theta ,2}^{2} & 0\\ 0 & 0 & {\sigma }_{\theta ,3}^{2}\end{array}\right)$
${{\boldsymbol{S}}}_{{\boldsymbol{\phi }}}$ $B$ × $B$ $\left(\begin{array}{c}{\sigma }_{12}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{12}^{2} & 0 & 0\\ 0 & {\sigma }_{13}^{2} & 0\\ 0 & 0 & {\sigma }_{23}^{2}\end{array}\right)$
${\boldsymbol{\Phi }}$ $B$ × $N$ $\left(\begin{array}{cc}1 & -1\end{array}\right)$ $\left(\begin{array}{ccc}1 & -1 & 0\\ 1 & 0 & -1\\ 0 & 1 & -1\end{array}\right)$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ $B$ × $B$ $\left(\begin{array}{c}{\sigma }_{12}^{2}+{\sigma }_{\theta ,1}^{2}+{\sigma }_{\theta ,2}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{12}^{2}+{\sigma }_{\theta ,1}^{2}+{\sigma }_{\theta ,2}^{2} & {\sigma }_{\theta ,1}^{2} & -{\sigma }_{\theta ,2}^{2}\\ {\sigma }_{\theta ,1}^{2} & {\sigma }_{13}^{2}+{\sigma }_{\theta ,1}^{2}+{\sigma }_{\theta ,3}^{2} & {\sigma }_{\theta ,3}^{2}\\ -{\sigma }_{\theta ,2}^{2} & {\sigma }_{\theta ,3}^{2} & {\sigma }_{23}^{2}+{\sigma }_{\theta ,2}^{2}+{\sigma }_{\theta ,3}^{2}\end{array}\right)$

Note. Here, $B=\left(\displaystyle \genfrac{}{}{0em}{}{N}{2}\right)$ is the number of baselines.


Table B2.  Log Visibility Amplitude Design and Covariance Matrices for Two- and Three-station Arrays, along with Matrices Relevant for Their Construction

    Number of Stations ($N$)
Matrix Shape $N=2$ $N=3$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{g}}}$ $N$ × $N$ $\left(\begin{array}{cc}{\sigma }_{g,1}^{2} & 0\\ 0 & {\sigma }_{g,2}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{g,1}^{2} & 0 & 0\\ 0 & {\sigma }_{g,2}^{2} & 0\\ 0 & 0 & {\sigma }_{g,3}^{2}\end{array}\right)$
${{\boldsymbol{S}}}_{{\boldsymbol{a}}}$ $B$ × $B$ $\left(\begin{array}{c}{\sigma }_{12}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{12}^{2} & 0 & 0\\ 0 & {\sigma }_{13}^{2} & 0\\ 0 & 0 & {\sigma }_{23}^{2}\end{array}\right)$
${\boldsymbol{A}}$ $B$ × $N$ $\left(\begin{array}{cc}1 & 1\end{array}\right)$ $\left(\begin{array}{ccc}1 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 1\end{array}\right)$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}$ $B$ × $B$ $\left(\begin{array}{c}{\sigma }_{12}^{2}+{\sigma }_{g,1}^{2}+{\sigma }_{g,2}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{12}^{2}+{\sigma }_{g,1}^{2}+{\sigma }_{g,2}^{2} & {\sigma }_{g,1}^{2} & {\sigma }_{g,2}^{2}\\ {\sigma }_{g,1}^{2} & {\sigma }_{13}^{2}+{\sigma }_{g,1}^{2}+{\sigma }_{g,3}^{2} & {\sigma }_{g,3}^{2}\\ {\sigma }_{g,2}^{2} & {\sigma }_{g,3}^{2} & {\sigma }_{23}^{2}+{\sigma }_{g,2}^{2}+{\sigma }_{g,3}^{2}\end{array}\right)$

Note. Here, $B=\left(\displaystyle \genfrac{}{}{0em}{}{N}{2}\right)$ is the number of baselines.


The likelihood of observing a collection of $B=N(N-1)/2$ residual visibility phases under a given source and gain model is then

Equation (37)

$${ \mathcal L }(\tilde{{\boldsymbol{\phi }}})={\left[{(2\pi )}^{B}\det {{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}\right]}^{-1/2}\exp \left(-\frac{1}{2}{\tilde{{\boldsymbol{\phi }}}}^{\top }{{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}^{-1}\tilde{{\boldsymbol{\phi }}}\right)$$

with a similar construction for log visibility amplitudes ${\boldsymbol{a}}$. This likelihood reduces to the simple case of statistically independent measured visibilities in the limit of zero systematic gain error (i.e., perfectly calibrated data).
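In the Gaussian limit, this likelihood is evaluated with standard linear algebra; a minimal sketch (the helper name is ours):

```python
import numpy as np

def gaussian_loglike(resid, cov):
    """ln L for a residual vector under a zero-mean multivariate Gaussian."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * cov)
    return -0.5 * (resid @ np.linalg.solve(cov, resid) + logdet)

# Two residual visibility phases whose covariance includes a shared gain term.
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
resid = np.array([0.3, -0.2])
print(gaussian_loglike(resid, cov))
```

With a diagonal covariance (perfectly calibrated data), the same expression reduces to a sum of independent Gaussian terms.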

4.2. Model Specifications

We consider the simple geometric truth image shown in the left panel of Figure 7. This image is constructed from the sum of two elliptical Gaussian components that are symmetrically positioned about the origin with a mutual separation of $\xi =50$ μas and a position angle of η = 75 degrees east of north. Both components have major and minor axis Gaussian σ-values of 9 μas and 6 μas, respectively, and each has a flux density of 0.5 Jy. The major axis of the eastern component is oriented at 30 degrees east of north, while the western component has a −30 degree orientation. These specific choices of parameter values are largely arbitrary, and they serve primarily to give the image sufficient asymmetry to produce nontrivial visibility phases and sufficient compactness to produce nonzero visibility amplitudes.
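Because the model is a sum of elliptical Gaussians, its visibility function is analytic: each component contributes $F\exp (-2{\pi }^{2}\,{{\boldsymbol{u}}}^{\top }{\boldsymbol{C}}{\boldsymbol{u}})\,{e}^{-2\pi i\,{\boldsymbol{u}}\cdot {{\boldsymbol{x}}}_{0}}$ for image-domain covariance ${\boldsymbol{C}}$ and centroid ${{\boldsymbol{x}}}_{0}$. The sketch below implements one plausible convention (the rotation sense and east-of-north axis assignments are our assumptions, not taken from the paper) and recovers the total flux density of 1 Jy on the zero baseline:

```python
import numpy as np

UAS = np.pi / (180 * 3600e6)  # one microarcsecond in radians

def gauss_vis(uv, flux, sigma_maj, sigma_min, pa, x0):
    """Analytic visibility of an elliptical Gaussian:
    flux * exp(-2 pi^2 u^T C u) * exp(-2 pi i u . x0)."""
    c, s = np.cos(pa), np.sin(pa)
    R = np.array([[c, -s], [s, c]])  # assumed rotation convention
    C = R @ np.diag([sigma_maj**2, sigma_min**2]) @ R.T
    return flux * np.exp(-2 * np.pi**2 * (uv @ C @ uv) - 2j * np.pi * (uv @ x0))

# Two 0.5 Jy components separated by 50 uas at position angle 75 degrees.
eta = np.deg2rad(75.0)
offset = 25 * UAS * np.array([np.sin(eta), np.cos(eta)])  # east-of-north (assumed axes)

def model_vis(uv):
    return (gauss_vis(uv, 0.5, 9 * UAS, 6 * UAS, np.deg2rad(30.0), offset)
            + gauss_vis(uv, 0.5, 9 * UAS, 6 * UAS, np.deg2rad(-30.0), -offset))

print(abs(model_vis(np.zeros(2))))  # zero-baseline amplitude = total flux = 1.0
```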


Figure 7. Truth image (left panel) and (u, v) coverage (right panel) for the model considered in Section 4. The truth image contains two elliptical Gaussian components with an arbitrary but specific separation and relative orientation angle; all defining parameters are fixed during model fitting except for the component separation ξ and position angle η. The (u, v) coverage represents that from a single simultaneous observation with mutual visibility to all stations; we consider both an array with N = 8 stations of equal sensitivity as well as a subset containing only N = 5. The resulting baseline S/N measurements span a factor of ∼20.


To produce synthetic visibility data, we sample the Fourier transform of the truth image at discrete locations in (u, v) space. We consider two sets of (u, v) coverage, corresponding to (1) a single snapshot from an N = 8 station array with mutual visibility to all stations and (2) an N = 5 station subset of that array. Both sets of coverage are shown in the right panel of Figure 7. Visibility amplitudes and phases are given by the magnitude and argument of the complex visibilities from each (u, v) point. The visibilities are then multiplied by their associated station gains, which are simulated as complex Gaussian-distributed random variables with unit mean and standard deviation of 0.1 along each dimension. We add a single realization of Gaussian thermal noise to the visibilities corresponding to a median S/N of 10.2 and spanning an S/N range from 2.3 to 44.9. Closure phases are constructed from the visibility phases using Equation (C10), and log-closure amplitudes are constructed from the visibility amplitudes using Equation (C17).

We model the data as the sum of two elliptical Gaussians, with all parameters except for ξ and η held fixed at their corresponding truth values. By restricting the model to this two-dimensional subspace of its natural 12-dimensional parameter space, we simplify the fitting process while retaining enough model complexity to provide nontrivial parameter correlations. We perform parameter estimation using Gaussian likelihoods analogous to Equation (37) for all data products. Unless otherwise specified, we apply uniform priors on the range [40, 60] μas for ξ and [0, 180] degrees for η; when fitting gains, our "maximally uninformative" priors are log-uniform on the range [${10}^{-5}$, ${10}^{5}$] for all gain amplitudes and uniform priors on the range [0, 360) degrees for all gain phases. We use the Python nested sampling code dynesty (Speagle 2020) to produce parameter posteriors for all model fits.

4.3. Phase and Amplitude Modeling

We fit the model to our single realization of synthetic data represented in a variety of ways, starting with visibility phases under the assumption that the gain phases are perfectly known (or equivalently, that they are perfectly calibrated). The likelihood function for this representation is given by Equation (37), and because the gain phases are known, the visibility phase covariance matrix is diagonal. Figure 8 shows the two-dimensional (ξ, η) posteriors for such fits to the N = 8 array and N = 5 array data in black contours.


Figure 8. Joint posterior distributions for residual separation (ξ − ξ0) and position angle (η − η0) when fitting the model described in Section 4 to visibility phases with perfectly known gain phases (black contours), visibility phases with completely unknown gain phases (gray contours), and closure phases with covariant structure accounted for (red dashed contours). We also show the results from numerically marginalizing over the gains (thin black contours), which match the covariant treatment as expected (see Appendix C.5). The model is fitted to the eight-station array data on the left and to the five-station array data on the right; we can see that the relative loss of information when going from perfectly calibrated phases to closure phases increases for smaller arrays. In both cases, the closure phase fits accurately recover the posteriors derived from visibility phase fits, within sampling uncertainties. Contours enclose 50%, 90%, and 99% of the posterior probability.


We also fit the model to visibility phase data without assuming any a priori knowledge of the gain phases. The likelihood function remains Equation (37), but in this case, the covariance matrix is no longer diagonal. The gray contours in Figure 8 show the corresponding joint posteriors for (ξ, η), which exhibit the expected loss of constraining power compared to the posteriors derived from calibrated visibility phases. We can see that this loss becomes less severe as the number of stations increases, a consequence of the fact that the fraction of the visibility phase information required to constrain the gain phases decreases as 2/N.

The other phase data representations we consider are closure phases. For a minimal subset of closure phases described by covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ (see Appendix B.2), the likelihood function is given by Equation (11). Posteriors derived from this likelihood are shown as red dashed contours in Figure 8. Within the ∼1% numerical sampling uncertainties of our posterior contours, the closure phases provide parameter constraints that are identical to those imposed by the uncalibrated visibility phases.

We perform a corresponding set of model fits to visibility amplitudes and log-closure amplitudes rather than visibility phases and closure phases. The black contours in Figure 9 show the (ξ, η) posteriors for fits to the N = 8 and N = 5 array visibility amplitude data, using a likelihood analogous to Equation (37) under the assumption of perfectly calibrated gain amplitudes. The gray contours show fits to visibility amplitudes using the same likelihood but assuming no knowledge of the gain amplitudes. We again see the relative loss of information increasing as the number of stations decreases, becoming particularly severe for the case of N = 5 (in which there are only three degrees of freedom remaining in the data to constrain the model, compared to eight degrees of freedom when the gain amplitudes are calibrated).


Figure 9. Same as Figure 8 but for visibility amplitudes and log-closure amplitudes rather than visibility phases and closure phases. Contours enclose 50%, 90%, and 99% of the posterior probability.


We compare the visibility amplitude results to those obtained from fitting to log-closure amplitudes. For a minimal subset of log-closure amplitudes described by a covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}$ (see Appendix B.3), the likelihood function is given by Equation (19). The posteriors derived using this likelihood expression are plotted in Figure 9 as red dashed contours. As with the closure phases, we find that the log-closure amplitudes provide constraints that are identical to those provided by the uncalibrated visibility amplitudes.

We also consider two alternative treatments of the closure phases and log-closure amplitudes that attempt to avoid accounting for covariances, and we show here that these efforts fail. In the first such treatment, we use a minimal closure phase subset but assume all measurements are independent. This assumption amounts to using only the diagonal elements of ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ (i.e., all off-diagonal elements are set to zero), and the likelihood function remains Equation (11). The red and light blue contours in the left panel of Figure 10 show posteriors derived under this assumption, for two different choices of minimal closure phase subset constructed by ordering the stations from lowest to highest (red) and highest to lowest (light blue) mean baseline S/N. We can see that these contours systematically deviate from the visibility phase contour. In the second treatment, we use the maximal (redundant) set of closure phases (see Appendix B.2), but we retain the assumption that all measurements are independent. The likelihood is then simply the product of the individual measurement likelihoods taken over all closure phases in the maximal set. The dotted black contour in the left panel of Figure 10 represents the resulting posterior after scaling the individual measurement variances by

Equation (38)

$$\displaystyle \frac{\left(\genfrac{}{}{0em}{}{N}{3}\right)}{(N-1)(N-2)/2}=\frac{N}{3},$$

which is a redundancy factor that accounts for the fact that the maximal set contains an increased number of measurements without a corresponding increase in the number of degrees of freedom. Even after accounting for this redundancy, however, we see a similar systematic discrepancy in the posterior relative to those derived from the visibility phases. Note that for the unusual case of equal S/N on all baselines, this redundancy factor scaling does produce the correct likelihood (see Appendix C).
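Interpreting the redundancy factor as the ratio of the maximal-set size to the number of independent degrees of freedom, it can be computed for any $N$ (a sketch; the closed forms in the comments follow from the binomial counts):

```python
from math import comb

def closure_phase_redundancy(n):
    """Maximal-set size C(n, 3) over the (n-1)(n-2)/2 phase degrees of freedom."""
    return comb(n, 3) / ((n - 1) * (n - 2) // 2)   # == n / 3

def closure_amp_redundancy(n):
    """Maximal-set size 3*C(n, 4) over the n(n-3)/2 amplitude degrees of freedom."""
    return 3 * comb(n, 4) / (n * (n - 3) // 2)     # == (n - 1)(n - 2) / 4

for n in (5, 8):
    print(n, closure_phase_redundancy(n), closure_amp_redundancy(n))
```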


Figure 10. Left panel: same as the left panel of Figure 8 but showing additional posteriors as calculated for different closure phase (CP) likelihood constructions. The posteriors obtained when including closure phase covariance (dashed contours) demonstrate the most consistency with the uncalibrated visibility phase posterior (yellow contour). The solid blue and red contours show posteriors constructed from fitting a minimal closure phase set but ignoring covariance, while the dotted black contour shows the posterior constructed from a maximal (redundant) set of closure phases that has been corrected for redundancy factor (see Equation (38)). These three contours that do not account for covariance show artificial distortions in the confidence regions. Right panel: analogous to the left panel but for the log-closure amplitudes (LCA) rather than closure phases; the redundancy factor for the maximal set is given by Equation (39). The contours in both panels enclose 90% of the posterior probability.


In the right panel of Figure 10, we again compare the posteriors obtained using (1) a minimal set of log-closure amplitudes without accounting for covariance, and (2) a maximal set of log-closure amplitudes. In both cases, the likelihood function is the product of the individual measurement likelihoods, where the product is taken over all log-closure amplitudes in the minimal or maximal set, as appropriate. We again consider two choices of minimal subset, constructed via the same station-ordering scheme used for phases. For the posteriors derived from the maximal set, shown using a dotted black contour in the right panel of Figure 10, we have scaled the measurement variances by the redundancy factor

Equation (39)

$$\displaystyle \frac{3\left(\genfrac{}{}{0em}{}{N}{4}\right)}{N(N-3)/2}=\frac{(N-1)(N-2)}{4}.$$

Regardless of the redundancy correction, we find in both cases that the posterior distributions do not match those expected from fits to the visibility amplitudes.

4.4. Gain Uncertainty Modeling

We have shown that our level of knowledge about the gains dictates how much source information the closure quantities contain relative to the visibility quantities. For perfectly known (or, equivalently, perfectly calibrated) gains, the visibility quantities provide more information about the source than the closure quantities; for small arrays, this difference may be quite large (see, e.g., Figure 9). When the gains are completely unknown (or, equivalently, when the gains must be fully determined along with the source information), both the visibility and closure quantities contain identical source information. We now explore the case of partially known gains.

We quantify how well we know the gains by comparing our gain uncertainty, σi, to the uncertainties in the data (i.e., in the visibilities), σij, using

Equation (40)

$$\varepsilon \equiv \displaystyle \frac{\left\langle {\sigma }_{{ij}}\right\rangle }{\left\langle {\sigma }_{i}\right\rangle },$$

where $\left\langle \right\rangle $ denotes a sample average; the averages are taken over all stations and all baselines for the gain uncertainties and visibility uncertainties, respectively. The quantity ε tracks our knowledge of the gains; $\varepsilon \to 0$ when we have no information about the gains, and $\varepsilon \to \infty $ when the gains are perfectly calibrated. Note that both σi and σij refer to logarithmic uncertainties when considering amplitude data products, meaning that ε can also be thought of as the ratio of the "gain S/N" to the data S/N.

Within the context of our model-fitting procedure, the assumed level of gain knowledge can be straightforwardly incorporated using Gaussian priors on the gain parameters. Figure 11 shows the results of fitting to visibility quantities while varying the value of ε. We find that noticeable improvements in the posterior constraints start to occur for ε ≳ 1, and that for ε ≳ 10, the posteriors better approximate the perfect-knowledge case (black contours in Figure 11) than they do the no-knowledge case (gray contours). This matches our expectation that knowledge of gains begins to inform an overconstrained model as soon as its precision approaches that of the thermal uncertainties. For an underconstrained problem such as imaging using a sparse array, typical regularization imposes much weaker relationships across points in the (u, v) domain, and partial gain calibration can matter much earlier by providing unique information not sampled by the closure quantities.


Figure 11. Left panel: posterior contours for fits to visibility phases with varying degrees of prior gain phase knowledge assumed. The gray contour matches the posterior recovered when fitting to closure phases (see Figure 8). Right panel: same as the left panel but fitting to visibility amplitudes rather than visibility phases. The contours in both panels enclose 90% of the posterior probability.


While the demonstrations presented here are all done using simulated observations, the recent parameterization of the horizon-scale emission and shadow of the supermassive black hole in M87 by the Event Horizon Telescope Collaboration et al. (2019c) utilized cross-validation of results across several techniques to handle gain uncertainty. These included explicit semi-analytic marginalization of amplitude gains by Laplace approximation (via Themis; Broderick et al. 2020), minimization of closure phase covariance through selection of a highly sensitive reference antenna (ALMA), and use of diagonalized closure phases and log-closure amplitudes by accounting for covariance (via dynesty, as described in this work). The multiple approaches resulted in a high degree of consistency as reflected by their posterior parameter distributions. A detailed study of the effects of covariant interferometric errors on imaging and on parameter reconstruction is forthcoming (D. W. Pesce et al. 2020, in preparation).

5. Summary

We have explored in detail the statistics of closure phase and closure amplitude for S/N ≳ 1, characteristic of high-frequency radio interferometry where both phase and amplitude calibration have significant uncertainties, and where phase coherence timescales are short relative to the length of a continuous observation. The analysis unifies and clarifies several concepts that have been previously discussed in the literature regarding the independence of closure quantities, the nature and number of statistical degrees of freedom, best practices for constructing and fitting to closure quantities, and the relationship of closure quantities to self calibration and marginalization over unknown gains. Due to the large number of topics covered, we delineate the main statements and findings from this work across three primary topics.

(1) Formation of closure quantities and non-Gaussian errors:

  • 1.  
    Non-Gaussian errors become significant for S/N below ∼2–5 for phase, amplitude, and log amplitude. Reciprocal amplitude is unstable below S/N ∼5, which provides motivation to use log amplitude instead when there is a chance for low-S/N amplitudes to appear in the denominator of an amplitude ratio. (Appendix A)
  • 2.  
    The ensemble distribution of measured log amplitude for known S/N is fully characterized in terms of moments, from which expected distributions for log-closure amplitude can be derived. (Appendix A)
  • 3.  
    In practice, a noisy estimate of S/N prevents a reliable characterization of phase and amplitude errors, particularly for weak signals, and can lead to significant bias from self-selection of data. (Section 2.4)
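The first point above can be illustrated with a short Monte Carlo sketch (a hypothetical example assuming NumPy; the S/N values and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_rho(snr, n=200_000):
    """Measured visibility amplitudes rho = |r| / sigma_r for a given intrinsic
    S/N, with unit thermal noise in each complex component."""
    r = snr + rng.normal(size=n) + 1j * rng.normal(size=n)
    return np.abs(r)

for snr in (2.0, 5.0, 10.0):
    rho = sample_rho(snr)
    print(f"S/N={snr:4.1f}  std(log rho)={np.log(rho).std():.3f} (~1/snr={1/snr:.3f})"
          f"  std(1/rho)={(1/rho).std():.3f} (~1/snr^2={1/snr**2:.3f})")
```

At S/N = 2, occasional near-zero amplitudes inflate the spread of 1/ρ far beyond the nominal ${\breve{\rho }}^{-2}$, while log ρ remains well behaved with spread close to $1/\breve{\rho }$.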

(2) The covariance structure for closure quantities—fitting those quantities to a model and characterizing their fundamental degrees of freedom:

  • 1.  
    Closure quantities formed from a common set of baseline visibilities are covariant due to shared thermal noise. This covariance must be included in order to recover both proper χ2 statistics and a correct likelihood for a particular noise realization. (Section 3.1)
  • 2.  
    When covariant errors are included, both the χ2 and the likelihood (in the Gaussian limit) are independent of the specific minimal set of closure quantities used for the calculation. (Section 3.1)
  • 3.  
    In the limit of equal S/N on all baselines, the χ2 for a specific minimal set reduces to an evaluation over all closure quantities weighted equally, scaled to the appropriate degrees of freedom. (Appendix C)
  • 4.  
    If closure quantities are assumed to be independent and the covariance structure is ignored, results do depend on the specific choice of minimal set. Certain selections can be chosen to minimize off-diagonal terms in the covariance matrix, but this choice depends on the specific arrangement of baseline S/N. (Section 3.2, Figure 10 of Section 4)
  • 5.  
    Two different direct constructions for selecting nonredundant sets of closure amplitudes are given. They verify explicitly the expected N(N − 3)/2 independent degrees of freedom contained in the closure amplitudes. (Section 3.2 and Appendix B.3)
  • 6.  
    A unified matrix construction for creating visibilities and closure quantities is given, which systematically builds up design matrices for increasing station number. These are used to derive the covariance and other relationships across different quantities. (Appendix B)
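As a concrete sketch of points 1 and 2, the following hypothetical NumPy example builds two different minimal sets of closure phases for a four-station array from the maximal design matrix of Table B3 (the per-baseline variances and residuals are arbitrary illustrative numbers). With the full covariance the χ2 values agree exactly; the diagonal approximation generally depends on the chosen set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Maximal closure phase design matrix for N = 4 (baseline order
# phi_12, phi_13, phi_14, phi_23, phi_24, phi_34), as in Table B3
Psi_max = np.array([
    [1., -1.,  0., 1.,  0., 0.],   # triangle (1,2,3)
    [1.,  0., -1., 0.,  1., 0.],   # triangle (1,2,4)
    [0.,  1., -1., 0.,  0., 1.],   # triangle (1,3,4)
    [0.,  0.,  0., 1., -1., 1.],   # triangle (2,3,4)
])

S = np.diag(rng.uniform(0.5, 2.0, 6))   # thermal phase variances per baseline
dphi = rng.normal(size=6)               # one residual phase realization

def chi2(Psi):
    """chi^2 over a set of closure phases with the full noise covariance."""
    d = Psi @ dphi
    Sigma = Psi @ S @ Psi.T
    return d @ np.linalg.solve(Sigma, d)

set_a = Psi_max[[0, 1, 2]]   # minimal set: triangles containing station 1
set_b = Psi_max[[0, 1, 3]]   # a different minimal set
print(chi2(set_a), chi2(set_b))   # identical when covariance is included

def chi2_diag(Psi):
    """Same, but (incorrectly) ignoring the off-diagonal covariance."""
    d = Psi @ dphi
    return d @ (d / np.diag(Psi @ S @ Psi.T))

print(chi2_diag(set_a), chi2_diag(set_b))   # these generally differ
```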

(3) Relationship of closure information to station gain information, and behavior in the limit of completely known gains, partially known gains, and completely unknown gains:

  • 1.  
    Under a model for systematic station-based gain error, residual visibility phase and log amplitude also acquire a covariance structure with nonzero off-diagonal elements due to gain model error. (Section 4.1)
  • 2.  
    Using this covariance structure for visibilities is equivalent to explicitly marginalizing over additional free gain parameters under a Gaussian prior. We note however that wide log-amplitude gain priors will often be a poor characterization of expected telescope performance, which is bounded. A standard Bayesian approach of direct numerical marginalization over nuisance gain parameters would be needed to take full advantage of more realistic priors. (Appendix C.5)
  • 3.  
    In the limit of small thermal error compared to gain error, the χ2 derived from visibility measurements reduces to the χ2 derived from only closure quantities, after accounting for covariance. Thus, the closure quantities contain all non-station-based information. (Appendix C.4)
  • 4.  
    We apply the likelihood constructions introduced in this paper toward direct sampling of the posterior distribution of a simple source model and simulated observation. We confirm that the inferred parameter posterior derived using closure quantities matches that derived using baseline visibilities in the limit of unknown gains and that the uncertainties are larger than those derived under known gain calibration, reflecting the relative loss of information. (Section 4)
  • 5.  
    Under modeling of partially known gains, with systematic station gain uncertainty comparable to that from baseline thermal noise, we see that the model posterior distribution transitions smoothly between the case of perfectly calibrated, corresponding to zero gain error, and completely unknown calibration, using only closure quantities. (Section 4.4)
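The equivalence in point 2 can be sketched numerically for a three-station, phase-only example (hypothetical variances and residuals; phases are treated as unwrapped Gaussian variables, and the gains are marginalized by Monte Carlo over their Gaussian prior):

```python
import numpy as np

rng = np.random.default_rng(1)

# Phase design matrix for N = 3 (baseline order phi_12, phi_13, phi_23)
Phi = np.array([[1., -1., 0.],
                [1., 0., -1.],
                [0., 1., -1.]])

S = np.diag([0.3, 0.5, 0.4])         # thermal phase variances (example values)
Sig_th = np.diag([0.2, 0.1, 0.3])    # Gaussian gain phase prior variances
dphi = np.array([0.4, -0.2, 0.7])    # a residual visibility phase vector

def gauss(x, C):
    """Zero-mean multivariate normal density at x with covariance C."""
    norm = np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(C))
    return np.exp(-0.5 * x @ np.linalg.solve(C, x)) / norm

# Covariance form: dphi ~ N(0, S + Phi Sig_th Phi^T)
analytic = gauss(dphi, S + Phi @ Sig_th @ Phi.T)

# Explicit Monte Carlo marginalization over station gain phases theta
theta = rng.multivariate_normal(np.zeros(3), Sig_th, size=400_000)
resid = dphi[None, :] - theta @ Phi.T
chi2 = np.einsum('ij,ij->i', resid, resid / np.diag(S))
numeric = np.exp(-0.5 * chi2).mean() / np.sqrt((2 * np.pi) ** 3 * np.linalg.det(S))

print(analytic, numeric)   # agree to Monte Carlo precision
```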

The authors thank Jim Moran, Geoff Bower, Ramesh Narayan, Kazunori Akiyama, Katie Bouman, Christiaan Brinkerink, Avery Broderick, Andre Young, Josh Speagle, and the anonymous referee for helpful discussions, comments, and ideas. We thank the National Science Foundation (AST-1440254, AST-1614868) and the Gordon and Betty Moore Foundation (GBMF-5278) for financial support of this work. This work was supported in part by the Black Hole Initiative at Harvard University, which is supported by a grant from the John Templeton Foundation and the Gordon and Betty Moore Foundation.

Appendix A: Distributions due to Thermal Noise

Here, we discuss the statistical distributions of measured amplitude and phase quantities used for model fitting, the quality of the normal distribution approximations, and the influence of the estimation of an intrinsic signal-to-noise parameter $\breve{\rho }$. In the thermal-noise-dominated regime, the fundamental measured quantity, the complex correlation coefficient r, follows a circularly symmetric complex normal distribution with mean $\breve{\rho }{\sigma }_{r}$. Without loss of generality, we choose our coordinates such that the mean of the complex correlation distribution is real. An associated standard deviation of both real and imaginary components σr can be computed from first principles (Thompson et al. 2017). Hence, it is useful to work with the normalized complex random variable r/σr, with unit standard deviation (Figure A1). Probability densities for closure quantities, as shown in Figures 1 and 2, can then be derived from the ones presented here with elementary operations such as convolution.


Figure A1. Two thousand random realizations of the measured complex correlation coefficient r/σr given intrinsic signal-to-noise $\breve{\rho }$ = 2 and 5.


A.1. Phase and Amplitude Distributions

The correlation coefficient phase ϕ is the argument of a circular complex normal variable and, as such, obeys the following circular distribution (Thompson et al. 2017):

Equation (A1)

We choose the coordinates in such a way that the true visibility phase is zero, and we denote the error function with Erf. Examples showing the probability density $p(\phi | \breve{\rho })$ for different values of $\breve{\rho }$ are shown in the top left panel of Figure A2. This somewhat complicated distribution can be approximated either by a normal distribution (dashed lines in Figure A2),

Equation (A2)

or by the von Mises distribution (dotted lines in Figure A2; Christian & Psaltis 2019),

Equation (A3)

which outperforms the normal distribution for low S/N.


Figure A2. Analytic distributions of phase and amplitude quantities (continuous lines). Normal distribution approximations, exact in the $\breve{\rho }\to \infty $ limit, are shown with dashed lines. For the visibility phase (top left panel), the von Mises distribution approximation is shown with a dotted line. All presented approximations assume knowledge of a hidden parameter $\breve{\rho }$, which in general must be estimated from noisy measurements.


For a given model $\breve{\rho }$, the measured normalized correlation coefficient amplitude $\rho =| r/{\sigma }_{r}| \geqslant 0$ follows a Rice distribution,

Equation (A4)

where I0 is a modified Bessel function of the first kind with order zero. Distributions of visibility amplitude are shown in Figure A2 (top right panel). The dashed lines represent a normal distribution approximation with mean $m={({\breve{\rho }}^{2}+1)}^{1/2}$ and unit standard deviation, which is accurate in the limit $\breve{\rho }\to \infty $.

The normal approximation cannot properly handle strictly nonnegative random variables, which becomes a problem at low S/N. The mean of the correlation amplitude is also positively biased with respect to $\breve{\rho }$ due to its noise contribution: we find $E\left[\rho \right]=2.272$ for $\breve{\rho }=2$ and $E\left[\rho \right]=5.101$ for $\breve{\rho }=5$, which illustrates why debiasing is important for incoherent averaging over many realizations and for estimating $\breve{\rho }$ from low-S/N data.
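These bias values can be checked directly against the Rice distribution; a sketch using SciPy, whose `rice` distribution has shape parameter b equal to $\breve{\rho }$ when scale = 1:

```python
from scipy import stats

# Mean measured amplitude E[rho] under the Rice distribution, compared
# against the quoted values; also the debiased estimate sqrt(E[rho]^2 - 1)
for snr, quoted in [(2.0, 2.272), (5.0, 5.101)]:
    m = stats.rice.mean(snr)
    print(f"snr={snr}: E[rho]={m:.3f} (quoted {quoted}),"
          f" sqrt(E[rho]^2 - 1)={(m**2 - 1)**0.5:.3f}")
```

The debiased value lands close to the intrinsic $\breve{\rho }$ in both cases.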

When working with closure amplitudes, we need to utilize the reciprocal amplitudes y = 1/ρ, distributed according to

Equation (A5)

Although this distribution can be approximated at high S/N as a normal distribution with mean $m={({\breve{\rho }}^{2}+1)}^{-1/2}$ and standard deviation ${\breve{\rho }}^{-2}$ (Figure A2 bottom left panel), the probability distribution exhibits heavy tails at low S/N, related to the inversion of potentially arbitrarily small amplitudes. The fact that amplitude is always positive is one indication that log amplitude might be a more natural space in which to characterize the distribution. Another benefit of using log amplitude is that amplitude and squared amplitude (a more natural quantity for incoherent sums of Gaussian components) are simply related. The logarithm of the correlation amplitude $z=\mathrm{log}\rho $ ($\mathrm{log}$ denotes the natural logarithm) obeys the following log–Rice distribution:

Equation (A6)

The distributions of the logarithm of amplitude for different $\breve{\rho }$ are shown in Figure A2, bottom right panel. Moments of the log–Rice distribution are analytically tractable, and the distribution can be approximated with a normal distribution of mean $m=0.5\mathrm{log}({\breve{\rho }}^{2}+1)$ and standard deviation $1/\breve{\rho }$. A more general exact treatment of incoherent averages of M amplitude measurements follows.
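A quick Monte Carlo check of this normal approximation (assuming NumPy; the S/N values and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Monte Carlo check of the normal approximation to the log-Rice distribution:
# mean ~ 0.5*log(snr^2 + 1), standard deviation ~ 1/snr
for snr in (2.0, 5.0, 10.0):
    r = snr + rng.normal(size=500_000) + 1j * rng.normal(size=500_000)
    z = np.log(np.abs(r))
    print(f"snr={snr:5.1f}  mean={z.mean():+.4f} vs {0.5*np.log(snr**2+1):+.4f}"
          f"   std={z.std():.4f} vs {1/snr:.4f}")
```

The approximation degrades toward $\breve{\rho }=2$, where the sample mean sits visibly below $0.5\mathrm{log}({\breve{\rho }}^{2}+1)$.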

A.2. Log-amplitude Ensemble Distribution

Amplitude gain is a property of antenna efficiency and system noise and is generally quite stable when compared to variations in atmospheric phase. This leads to the concept of incoherent averaging for a series of amplitude measurements (Rogers et al. 1995; Johnson et al. 2015). Consider a set of M independent complex visibility measurements vi, where each complex component has unit thermal noise. Thus, ${v}_{i}={\breve{\rho }}_{i}+{n}_{i}$, where ${\breve{\rho }}_{i}$ is the expected signal-to-noise ratio of each measurement and ni is a complex Gaussian random variable with σ = 1 for each component. The sum of squared amplitudes $x={\sum }_{i}{\left|{v}_{i}\right|}^{2}$ follows a χ2 distribution with 2M degrees of freedom; this becomes a noncentral χ2 distribution when there is a nonzero expected source contribution,

Equation (A7)

where λ is the non-centrality parameter,

Equation (A8)

The expectation value of $\mathrm{log}x$ is

Equation (A9)

and g(·) is the function (Lapidoth & Moser 2003)

Equation (A10)

where γ ≈ 0.577 is the Euler–Mascheroni constant and $\mathrm{Ei}$ is the exponential integral. We have introduced σ for the case where amplitudes are uniformly scaled away from σ = 1. From this, the expectation value $E\left[\mathrm{log}\sqrt{x}\right]=E\left[\mathrm{log}x\right]/2$ is easy to calculate, for example, in the case of a single Rice-distributed complex visibility (where "measured" $\rho =| v| $)

Equation (A11)
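This expectation can be checked numerically for the M = 1 case. The sketch below assumes the normalization used here (unit noise per complex component, so $x={\rho }^{2}$ is a noncentral χ2 with two degrees of freedom and $\lambda ={\breve{\rho }}^{2}$), under which the expectation reduces to $E[\mathrm{log}x]=\mathrm{log}2+g(\lambda /2)$ with $g(\xi )=\mathrm{log}\xi -\mathrm{Ei}(-\xi )$; this is a sketch under those assumptions, not a restatement of Equation (A11):

```python
import numpy as np
from scipy.special import expi

rng = np.random.default_rng(3)

def g(xi):
    """g(xi) = log(xi) - Ei(-xi); g(0) = -gamma (Euler-Mascheroni constant)."""
    return np.log(xi) - expi(-xi) if xi > 0 else -np.euler_gamma

# M = 1: x = rho^2 is a noncentral chi^2 with 2 dof and lambda = snr^2,
# for which E[log x] = log(2) + g(lambda / 2) in this normalization
for snr in (0.5, 2.0, 5.0):
    x = np.abs(snr + rng.normal(size=500_000) + 1j * rng.normal(size=500_000)) ** 2
    print(f"snr={snr}: MC={np.log(x).mean():.4f}"
          f"  analytic={np.log(2) + g(snr**2 / 2):.4f}")
```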

The log-closure amplitude c is formed from the linear combination of four log amplitudes A, B, C, D,

Equation (A12)

so that the expectation value (from which a bias is derived) is trivial,

Equation (A13)

To characterize the distribution of measured closure amplitudes, we require additional moments beyond the first moment (bias). For a multivariate Gaussian approximation suitable for a least-squares fitting of log-closure quantities with known covariance, we need to estimate the second moment of each log amplitude. High-order moments of the log-noncentral χ2 distribution can be derived as Poisson-weighted infinite series of polygamma functions ψ(m)(z) (Pav 2015),

Equation (A14)

where ${\mu }_{k,2M+2j}^{{\prime} }$ is the kth moment (not the central moment) of a log chi-square (λ = 0) distribution with 2M + 2j degrees of freedom,

Equation (A15)

Equation (A16)

in terms of cumulants κn. Note that the second and third cumulants are equal to the corresponding central moments. The cumulants in terms of noncentral moments are

Equation (A17)

For a single log-central χ2 distribution of two degrees of freedom (i.e., an exponential distribution with mean value 2), the first and second cumulants are particularly simple,

Equation (A18)

Equation (A19)

which are the same as the cumulants for a log-Rayleigh distribution scaled appropriately by a factor of two.

Recurrence relationships for the polygamma functions can be used to quickly derive cumulants, including higher-order cumulants, of the log-central χ2 distribution with 2M degrees of freedom. Aside from κ1, the higher-order cumulants approach zero as $M\to \infty $. Calculating cumulants for an increasing number of degrees of freedom then amounts to simply adding one more term to the series:

Equation (A20)

Equation (A21)

Equation (A22)

Equation (A23)

or, more generally,

Equation (A24)
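One compact form consistent with Equations (A18) and (A19) writes these cumulants as ${\kappa }_{1}=\psi (M)+\mathrm{log}2$ and ${\kappa }_{n}={\psi }^{(n-1)}(M)$ for n ≥ 2; a hedged Monte Carlo check of that form (assuming SciPy's polygamma conventions):

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(4)

def log_chi2_cumulants(M, nmax=4):
    """Cumulants of log X, with X a central chi^2 with 2M degrees of freedom:
    kappa_1 = psi(M) + log 2, kappa_n = psi^(n-1)(M) for n >= 2."""
    return ([digamma(M) + np.log(2)]
            + [float(polygamma(n - 1, M)) for n in range(2, nmax + 1)])

for M in (1, 3):
    # Monte Carlo: sum of squares of 2M standard normals is chi^2 with 2M dof
    z = np.log((rng.normal(size=(400_000, 2 * M)) ** 2).sum(axis=1))
    k = log_chi2_cumulants(M)
    print(f"M={M}: MC mean/var = {z.mean():.4f}/{z.var():.4f}"
          f"  analytic = {k[0]:.4f}/{k[1]:.4f}")
```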

These relatively simple expressions are for the cumulants of a log central χ2 distribution. They must be converted into moments of a noncentral distribution by summing over the appropriate Poisson mixture (Equation (A14)), which depends on the non-centrality parameter. The noncentral moments can then be converted back into cumulants of the noncentral distribution to build cumulants of the log-closure amplitude distribution. For a log-closure amplitude c = A + B − C − D, the cumulants of c are formed as

Equation (A25)

Equation (A26)

Equation (A27)

Equation (A28)

and so on. Figure A3 shows the first four moments calculated this way, using a finite number of nonzero terms from the Poisson mixture, and compares them to a Monte Carlo estimate.


Figure A3. Moments of the log-noncentral χ2 distribution (two degrees of freedom) as a function of signal-to-noise. Blue dots correspond to a Monte Carlo estimation, while orange lines correspond to the moment expansion (Equation (A14)) over a finite number of Poisson terms. The noncentral χ2 distribution itself is not well captured by a small number of moments (due to the tail), but for log-closure amplitude, the propagated moments (Equations (A25)–(A28)) can be used to fit to good approximations such as an exponentially modified Gaussian distribution.


A.3. Quality of Distribution Approximations

The true underlying value of $\breve{\rho }$ remains generally unknown, and our ability to estimate $\breve{\rho }$ will influence the quality of our derived distribution for the measured value. This contributes a source of error in addition to any mismatch due to the approximations used. In Figure A4, we evaluate the influence of both effects using a χ2 test and by calculating the Kullback–Leibler divergence between the ground truth and a normal distribution characterized by the two approximated moments. At low S/N, uncertainties are typically underestimated, leading to large χ2 values. In the context of inferred model parameters, this leads to erroneously narrow derived posteriors.


Figure A4. The four panels show the quality of the normal approximation for different phase and amplitude distributions as a function of model S/N. The solid lines show the expected squared value of the normalized residual quantity (expected reduced χ2) from the legend, while the dashed lines show the relative entropy (Kullback–Leibler divergence) between the true distribution of each quantity and a standard normal distribution. For example, the values at $\breve{\rho }=2$ reflect an ensemble of complex visibilities with intrinsic $\breve{\rho }=2$ and measured $\rho =| r/{\sigma }_{r}| $ for each random realization (see Figure A1). The orange line in the top left panel thus corresponds to an expected squared deviation in measured phase $\phi =\mathrm{Arg}\left[r/{\sigma }_{r}\right]$ away from the truth value $\breve{\phi }$, where the deviation is normalized by an empirical error estimate σϕ = 1/ρ. Other curves show different error estimates based on the model $\breve{\rho }$ (which is typically not known in a real observation), or a noise-debiased estimate ${\rho }_{\mathrm{deb}}=\sqrt{{\rho }^{2}-1}$. For log amplitude, μ corresponds to the small expected bias from Equation (A11).


Given knowledge of the true $\breve{\rho }$, it is possible in principle to achieve perfect statistics through full knowledge of the distribution, rather than the high-S/N approximations used in the figure. However, this does not extend to the empirical (realistic) estimators. Furthermore, we see that the estimator with reduced ${\chi }^{2}$ closest to one is not always the estimator with the best Gaussianity according to the Kullback–Leibler divergence. Lastly, although reciprocal amplitude is very difficult to characterize due to values near zero, motivating the use of log amplitude, visibility amplitude itself is comparatively well behaved and easy to approximate, even at low S/N.

Appendix B: Design and Covariance Matrix Construction

B.1. Baseline Phase and Amplitude Matrices

A pair of complex visibilities may share a station, so station-based gain effects result in covariances between visibility measurements. The covariance between two visibility phase measurements ${\phi }_{{ij}}$ and ${\phi }_{k{\ell }}$ can be expressed as

Equation (B1)

where ${\sigma }_{{ij}}^{2}$ is the thermal variance of the visibility phase measurement ${\phi }_{{ij}}$, ${\sigma }_{\theta ,i}^{2}$ is the gain phase variance for station i, and δij is the Kronecker delta. A similar expression holds for the covariance between two log visibility amplitude measurements ${a}_{{ij}}$ and ${a}_{k{\ell }}$,

Equation (B2)

where ${\sigma }_{{ij}}^{2}$ is the thermal variance of the log visibility amplitude measurement ${a}_{{ij}}$ and ${\sigma }_{g,i}^{2}$ is the log gain amplitude variance for station i.

We can see from Equations (B1) and (B2) that the visibility measurement covariances separate into baseline-based and station-based terms. We can thus write the visibility phase covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ as

Equation (B3)

The ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\theta }}}$ and ${{\boldsymbol{S}}}_{{\boldsymbol{\phi }}}$ matrices are $N$ × $N$ and $B$ × $B$ diagonal matrices constructed from the individual station gain phase variances and visibility phase variances, respectively. The visibility phase "design matrix" ${\boldsymbol{\Phi }}$ is rectangular in general, with $B$ rows and $N$ columns, and provides a mapping from the station-based representation to the baseline-based representation. Each row of ${\boldsymbol{\Phi }}$ contains only two nonzero entries, the first being a 1 and the second being a −1. There are $B$ different ways of writing a length-$N$ row in this fashion, and these constitute the $B$ rows of the matrix. The ordering of these rows depends on the chosen baseline ordering scheme. In this section, we assume a "second station first" baseline ordering scheme, which increments the visibility phases via a nested loop method. The "inner loop" iterates through the second station in increasing order, and the "outer loop" iterates through the first station in increasing order. An example of such an ordering is (${\phi }_{12}$, ${\phi }_{13}$, ${\phi }_{14}$, ..., ${\phi }_{1N}$, ${\phi }_{23}$, ${\phi }_{24}$, ..., ${\phi }_{2N}$, ${\phi }_{34}$, ..., ${\phi }_{N-1,N}$), where the indices here correspond to the two stations forming each baseline.

For a general $N$-station array with $N$ > 2, we present the following recursive relationship for the visibility phase design matrix:

Equation (B4)

where $1$ is an ($N$ − 1) × 1 vector containing only 1s, $0$ is an $\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)\times 1$ vector containing only 0s, ${{\boldsymbol{I}}}_{N-1}$ is the identity matrix of rank $N-1$, and ${{\boldsymbol{\Phi }}}_{N-1}$ is the visibility phase design matrix for an array with $N-1$ stations. The rank of ${{\boldsymbol{\Phi }}}_{N}$ is equal to $N-1$. Table B1 lists examples of ${\boldsymbol{\Phi }}$ and the corresponding ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ matrices.

The log visibility amplitude design matrix ${\boldsymbol{A}}$ shares the same structure as the visibility phase design matrix, with the only difference being that the negative elements of ${\boldsymbol{\Phi }}$ become positive for ${\boldsymbol{A}}$. As a result, for $N\gt 2$, the rank of ${{\boldsymbol{A}}}_{N}$ for the log visibility amplitudes is equal to $N$. Table B2 lists examples of log visibility amplitude design and covariance matrices.
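The recursion of Equation (B4) and its all-positive amplitude counterpart can be sketched directly (a hypothetical NumPy implementation with rank checks):

```python
import numpy as np

def phase_design(N):
    """Visibility phase design matrix Phi_N built by the recursion of
    Equation (B4), with baselines in 'second station first' order."""
    if N == 2:
        return np.array([[1., -1.]])
    prev = phase_design(N - 1)
    top = np.hstack([np.ones((N - 1, 1)), -np.eye(N - 1)])
    bottom = np.hstack([np.zeros((prev.shape[0], 1)), prev])
    return np.vstack([top, bottom])

def amplitude_design(N):
    """Log visibility amplitude design matrix A_N: as Phi_N but all positive."""
    return np.abs(phase_design(N))

for N in (3, 4, 5, 6):
    Phi, A = phase_design(N), amplitude_design(N)
    assert Phi.shape == (N * (N - 1) // 2, N)    # B rows, N columns
    # gain phases enter as differences (rank N-1);
    # log gain amplitudes enter as sums (full rank N)
    print(N, np.linalg.matrix_rank(Phi), np.linalg.matrix_rank(A))
```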

B.2. Closure Phase Matrices

It is possible for two closure triangles to have a baseline in common, so in general, two closure phase measurements may be covariant. The covariance between two closure phase measurements ${\psi }_{{ijk}}$ and ${\psi }_{{\ell }{mn}}$ can be expressed as

Equation (B5)

This lengthy expression encodes two symmetries of closure phases. The first symmetry is a "cycling invariance",

Equation (B6)

which indicates that the choice of starting baseline does not affect the value of the closure phase. The second symmetry is a sign flip imparted upon reversing the direction of the sequence,

Equation (B7)

These symmetries are illustrated in Figure B1.

Figure B1.

Figure B1. Diagrams of closure phase symmetries for a single triangle containing stations i, j, and k, with baselines numbered in the sequence used to construct the closure phase. All closure phases in the left block of diagrams have the same value, and all closure phases in the right block of diagrams have the same value; however, the values corresponding to the two blocks of diagrams differ in sign. The left three diagrams illustrate closure phases constructed in a clockwise manner using a different starting baseline each time; the value of the closure phase is invariant to the choice of starting baseline (Equation (B6)). The left and right blocks of diagrams differ by a reversal in the direction of closure phase construction; the value of the closure phase changes sign upon direction reversal (Equation (B7)).

Standard image High-resolution image

As with the visibilities (see Appendix B.1), we can construct a design matrix ${\boldsymbol{\Psi }}$ that maps from the visibility phase space to the closure phase space,

Equation (B8)

where ${\boldsymbol{\phi }}$ and ${\boldsymbol{\psi }}$ are vectors of visibility phases and closure phases, respectively. This design matrix allows us to express the closure phase covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ in terms of the visibility phase covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$,

Equation (B9)

By construction, the closure phases use combinations of visibility phases for which the gain contributions cancel, a property referred to as "phase aberration annihilation" by Lannes (1991). This cancellation manifests in the design matrices as well: the product of closure phase and visibility phase design matrices evaluates to the zero matrix,

Equation (B10)

We can thus express the closure phase covariance matrix more simply in terms of the diagonal matrix containing only visibility phase thermal variances,

Equation (B11)

For a general $N$-station array with $N$ > 3, we present the following recursive relationship for constructing the design matrix corresponding to a maximal set of closure phases:

Equation (B12)

where ${{\boldsymbol{\Phi }}}_{N-1}$ is the visibility phase design matrix for an array with $N-1$ stations (see Equation (B4)), $0$ is an $\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{3}\right)\times (N-1)$ matrix containing only 0s, ${{\boldsymbol{I}}}_{\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)}$ is the identity matrix of rank $\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)$, and ${{\boldsymbol{\Psi }}}_{N-1,\max }$ is the maximal closure phase design matrix for an array with $N-1$ stations.

To obtain a minimal (nonredundant) set of closure phases for an $N$-station array, we can use a modified design matrix:

Equation (B13)

Table B3 lists example closure phase design and covariance matrices.

Table B3.  Closure Phase Design and Covariance Matrices for Three- and Four-element Arrays

    Number of Stations ($N$)
Matrix Shape $N=3$ $N=4$
${{\boldsymbol{\phi }}}^{\top }$ $1\times B$ $\left(\begin{array}{ccc}{\phi }_{12} & {\phi }_{13} & {\phi }_{23}\end{array}\right)$ $\left(\begin{array}{cccccc}{\phi }_{12} & {\phi }_{13} & {\phi }_{14} & {\phi }_{23} & {\phi }_{24} & {\phi }_{34}\end{array}\right)$
${{\boldsymbol{\Psi }}}_{\max }$ T × B $\left(\begin{array}{ccc}1 & -1 & 1\end{array}\right)$ $\left(\begin{array}{cccccc}1 & -1 & 0 & 1 & 0 & 0\\ 1 & 0 & -1 & 0 & 1 & 0\\ 0 & 1 & -1 & 0 & 0 & 1\\ 0 & 0 & 0 & 1 & -1 & 1\end{array}\right)$
${\boldsymbol{\Psi }}$ t × B $\left(\begin{array}{ccc}1 & -1 & 1\end{array}\right)$ $\left(\begin{array}{cccccc}1 & -1 & 0 & 1 & 0 & 0\\ 1 & 0 & -1 & 0 & 1 & 0\\ 0 & 1 & -1 & 0 & 0 & 1\end{array}\right)$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }},\max }$ T × T $\left(\begin{array}{c}{\sigma }_{12}^{2}+{\sigma }_{13}^{2}+{\sigma }_{23}^{2}\end{array}\right)$ $\left(\begin{array}{cccc}{\sigma }_{12}^{2}+{\sigma }_{13}^{2}+{\sigma }_{23}^{2} & {\sigma }_{12}^{2} & -{\sigma }_{13}^{2} & {\sigma }_{23}^{2}\\ {\sigma }_{12}^{2} & {\sigma }_{12}^{2}+{\sigma }_{14}^{2}+{\sigma }_{24}^{2} & {\sigma }_{14}^{2} & -{\sigma }_{24}^{2}\\ -{\sigma }_{13}^{2} & {\sigma }_{14}^{2} & {\sigma }_{13}^{2}+{\sigma }_{14}^{2}+{\sigma }_{34}^{2} & {\sigma }_{34}^{2}\\ {\sigma }_{23}^{2} & -{\sigma }_{24}^{2} & {\sigma }_{34}^{2} & {\sigma }_{23}^{2}+{\sigma }_{24}^{2}+{\sigma }_{34}^{2}\end{array}\right)$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ t × t $\left(\begin{array}{c}{\sigma }_{12}^{2}+{\sigma }_{13}^{2}+{\sigma }_{23}^{2}\end{array}\right)$ $\left(\begin{array}{ccc}{\sigma }_{12}^{2}+{\sigma }_{13}^{2}+{\sigma }_{23}^{2} & {\sigma }_{12}^{2} & -{\sigma }_{13}^{2}\\ {\sigma }_{12}^{2} & {\sigma }_{12}^{2}+{\sigma }_{14}^{2}+{\sigma }_{24}^{2} & {\sigma }_{14}^{2}\\ -{\sigma }_{13}^{2} & {\sigma }_{14}^{2} & {\sigma }_{13}^{2}+{\sigma }_{14}^{2}+{\sigma }_{34}^{2}\end{array}\right)$

Note. Here, $T=\left(\displaystyle \genfrac{}{}{0em}{}{N}{3}\right)$ is the number of triangles in a maximal set, $t=\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)$ is the number of triangles in a minimal set, and $B=\left(\displaystyle \genfrac{}{}{0em}{}{N}{2}\right)$ is the number of baselines.

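A sketch of the Equation (B12) recursion (hypothetical NumPy code; the per-baseline variances are arbitrary examples), verifying the gain cancellation of Equation (B10) and reproducing the covariance structure of Table B3:

```python
import numpy as np

def phase_design(N):
    """Visibility phase design matrix Phi_N (Equation (B4) recursion)."""
    if N == 2:
        return np.array([[1., -1.]])
    prev = phase_design(N - 1)
    return np.vstack([np.hstack([np.ones((N - 1, 1)), -np.eye(N - 1)]),
                      np.hstack([np.zeros((prev.shape[0], 1)), prev])])

def closure_phase_design(N):
    """Maximal closure phase design matrix Psi_N,max via the recursion of
    Equation (B12): triangles containing station 1, then the rest."""
    if N == 3:
        return np.array([[1., -1., 1.]])
    prev = closure_phase_design(N - 1)
    PhiNm1 = phase_design(N - 1)
    t = PhiNm1.shape[0]                     # triangles containing station 1
    top = np.hstack([PhiNm1, np.eye(t)])
    bottom = np.hstack([np.zeros((prev.shape[0], N - 1)), prev])
    return np.vstack([top, bottom])

Psi = closure_phase_design(4)
Phi = phase_design(4)
assert np.allclose(Psi @ Phi, 0)            # gain phases cancel (Equation (B10))

# Covariance of a maximal closure phase set from thermal variances only
sig2 = np.array([1., 2., 3., 4., 5., 6.])   # example per-baseline phase variances
Sigma_psi = Psi @ np.diag(sig2) @ Psi.T
print(Sigma_psi)                            # matches the pattern of Table B3
```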

B.3. Log-closure Amplitude Matrices

A pair of closure quadrangles can have up to two baselines in common, meaning that, in general, log-closure amplitudes will be covariant. The covariance between log-closure amplitude measurements ${c}_{{ijk}{\ell }}$ and ${c}_{{mnpq}}$ can be expressed as

Equation (B14)

where ${\sigma }_{{ij}}^{2}$ is the variance in the log visibility amplitude measurement ${a}_{{ij}}$. There are three symmetries encoded in the above expression. The first of these is a cycling invariance,

Equation (B15)

indicating that, as for the closure phases, the log-closure amplitude value does not change with choice of starting baseline. The second symmetry is a direction invariance,

Equation (B16)

showing that, unlike for closure phases, the log-closure amplitude value does not change when the sequence of baselines is reversed. The third symmetry is a sign flip imparted on the value of the log-closure amplitude upon swapping the numerator and denominator,

Equation (B17)

These symmetries are illustrated in Figure B2.


Figure B2. Diagrams of log-closure amplitude symmetries for a single quadrangle containing stations i, j, k, and ℓ, with baselines numbered in the sequence used to construct the log-closure amplitude; baselines in the numerator of the closure amplitude are filled in black, while those in the denominator are filled in white. All log-closure amplitudes in the left block of diagrams have the same value, and all log-closure amplitudes in the right block of diagrams have the same value; however, the values corresponding to the two blocks of diagrams differ in sign. Within a single block of diagrams, each row illustrates log-closure amplitudes constructed in the same cycle direction but using a different starting baseline; the value of the log-closure amplitude is invariant to the choice of starting baseline (Equation (B15)). Within a single block of diagrams, each column illustrates a reversal in the cycle direction of log-closure amplitude construction; the value of the log-closure amplitude is invariant upon direction reversal (Equation (B16)). The left and right blocks of diagrams differ by a swap of numerator and denominator; the value of the log-closure amplitude changes sign upon swapping numerator and denominator (Equation (B17)).


We construct a minimal design matrix ${\boldsymbol{C}}$ that maps from the log visibility amplitude space to the log-closure amplitude space,

Equation (B18)

where ${\boldsymbol{a}}$ and ${\boldsymbol{c}}$ are vectors of log visibility amplitudes and log-closure amplitudes, respectively. This log-closure amplitude design matrix is equivalent to the "amplitude closure operator" of Lannes (1990a) and the "alternate amplitude compilation operator" of Lannes (1991). We express the log-closure amplitude covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}$ in terms of this design matrix ${\boldsymbol{C}}$ and the log visibility amplitude covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}$,

Equation (B19)

where, as with Equation (B10), we have used the fact that ${\boldsymbol{CA}}=0$ to simplify the construction. This cancellation is referred to as "amplitude aberration annihilation" by Lannes (1991).

For an $N$-station array with $N\gt 4$, the design matrix for a minimal set of log-closure amplitudes can be constructed using

Equation (B20)

where ${{\boldsymbol{C}}}_{N-1}$ is the design matrix for an array with $N-1$ stations, $0$ is an $\left(\tfrac{(N-1)(N-4)}{2}\right)\times \left(N-1\right)$ matrix of all zeros,

Equation (B21)

and ${{\boldsymbol{Y}}}_{N}$ is an $(N-2)\times \left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)$ matrix constructed by "cycling" through pairs of baselines that do not contain the first station:

Equation (B22)

Here, matrix elements represented by a dot indicate zero-valued entries. Table B4 lists example log-closure amplitude design and covariance matrices.

Table B4.  Minimal Log-closure Amplitude Design and Covariance Matrices for a Four-element Array

    Number of Stations ($N$)
Matrix Shape $N=4$
${{\boldsymbol{a}}}^{\top }$ $1\times B$ $\left({a}_{12}\ {a}_{13}\ {a}_{14}\ {a}_{23}\ {a}_{24}\ {a}_{34}\right)$
${\boldsymbol{C}}$ q × $B$ $\left(\begin{array}{cccccc}0 & 1 & -1 & -1 & 1 & 0\\ 1 & 0 & -1 & -1 & 0 & 1\end{array}\right)$
${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}$ q × q $\left(\begin{array}{cc}{\sigma }_{13}^{2}+{\sigma }_{14}^{2}+{\sigma }_{23}^{2}+{\sigma }_{24}^{2} & {\sigma }_{14}^{2}+{\sigma }_{23}^{2}\\ {\sigma }_{14}^{2}+{\sigma }_{23}^{2} & {\sigma }_{12}^{2}+{\sigma }_{14}^{2}+{\sigma }_{23}^{2}+{\sigma }_{34}^{2}\end{array}\right)$

Note. Here, $q=\tfrac{N(N-3)}{2}$ is the number of quadrangles in a minimal set, and $B=\left(\displaystyle \genfrac{}{}{0em}{}{N}{2}\right)$ is the number of baselines.

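The four-station matrices of Table B4 can be verified directly (hypothetical NumPy code; the variances are arbitrary examples):

```python
import numpy as np

# Minimal log-closure amplitude design matrix for N = 4 (Table B4), acting
# on log amplitudes in the order (a_12, a_13, a_14, a_23, a_24, a_34)
C = np.array([[0., 1., -1., -1., 1., 0.],   # log(|V13||V24| / |V14||V23|)
              [1., 0., -1., -1., 0., 1.]])  # log(|V12||V34| / |V14||V23|)

# Log visibility amplitude design matrix A_4: each baseline picks up the
# sum of its two station log gain amplitudes
A = np.array([[1., 1., 0., 0.],
              [1., 0., 1., 0.],
              [1., 0., 0., 1.],
              [0., 1., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 1.]])

assert np.allclose(C @ A, 0)    # gain amplitudes cancel ("aberration annihilation")

sig2 = np.array([1., 2., 3., 4., 5., 6.])   # example thermal log amplitude variances
Sigma_c = C @ np.diag(sig2) @ C.T
print(Sigma_c)                  # matches the 2x2 covariance pattern of Table B4
```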

Appendix C: Worked Examples of Information Content

For an array of $N$ stations, an accounting of the nonredundant closure phases reveals that their number differs from the number of visibility phases by $N-1$. This offset, which is equal to the number of unique gain phases in the array, suggests that closure phases contain all of the source phase information and that the additional degrees of freedom afforded by the visibility phases only describe the gains. A similar situation holds for the closure amplitudes, where the number of nonredundant quadrangles differs from the number of visibility amplitudes by an amount equal to the number of gain amplitudes, $N$. In the limit where we have no a priori information about the gains, then, the information content in the visibility quantities should be identical to that contained within the closure quantities. In this section, we demonstrate this equality explicitly and show that it does not depend on the specific choice of nonredundant closure subset for some selected test cases.
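The counting above can be summarized explicitly in terms of the number of baselines $B$:

```latex
\begin{align}
  B &= \binom{N}{2} = \frac{N(N-1)}{2} && \text{(visibility phases or amplitudes)}, \\
  B - (N-1) &= \binom{N-1}{2} && \text{(independent closure phases)}, \\
  B - N &= \frac{N(N-3)}{2} && \text{(independent closure amplitudes)}.
\end{align}
```

For example, a four-station array has $B=6$ baselines, three independent closure phases, and two independent closure amplitudes, matching Tables B3 and B4.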

C.1. Closure Phase for N = 3 Stations

We consider a three-station interferometer with measured visibility phases (${\phi }_{12}$, ${\phi }_{13}$, ${\phi }_{23}$) and model visibility phases (${\hat{\phi }}_{12}$, ${\hat{\phi }}_{13}$, ${\hat{\phi }}_{23}$) related by the station gain phases (${\hat{\theta }}_{1}$, ${\hat{\theta }}_{2}$, ${\hat{\theta }}_{3}$) as

Equation (C1)

The information contained in the measured visibilities is captured by their joint likelihood distribution, ${ \mathcal L }$. If the visibility phases have Gaussian thermal variances (${\sigma }_{12}^{2}$, ${\sigma }_{13}^{2}$, ${\sigma }_{23}^{2}$), and if we assume that the gain contributions are also Gaussian distributed with variances (${\sigma }_{\theta ,1}^{2}$, ${\sigma }_{\theta ,2}^{2}$, ${\sigma }_{\theta ,3}^{2}$), then the likelihood of the measured visibility phases can be expressed as a multivariate Gaussian,

Equation (C2)

where

Equation (C3)

is the vector of visibility phase residuals, and ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ is the visibility phase covariance matrix (see Appendix B.1). Because the likelihood is Gaussian and the variances are constant-valued, the quantity

Equation (C4)

contains the same information as ${ \mathcal L }$ in a more compact form; we will thus proceed through the use of ${\chi }_{\phi }^{2}$ rather than ${ \mathcal L }$.

Our expectation is that in the high-S/N limit (i.e., when the thermal noise is negligible compared to gain variations), the information content in the visibility phases will be identical to that in the closure phases; equivalently, ${\chi }_{\phi }^{2}$ for the visibility phases should equal ${\chi }_{\psi }^{2}$ for the closure phases. To simplify the mathematics and notation, let us now suppose that the array is perfectly homogeneous such that we can denote ${\sigma }_{\theta ,1}^{2}\,={\sigma }_{\theta ,2}^{2}={\sigma }_{\theta ,3}^{2}\equiv {\sigma }^{2}$ and ${\sigma }_{12}^{2}={\sigma }_{13}^{2}={\sigma }_{23}^{2}\equiv {\varepsilon }^{2}{\sigma }^{2}$. The high-S/N limit thus corresponds to ${\varepsilon }^{2}\ll 1$. To leading order in ${\varepsilon }^{2}$, the inverse of the covariance matrix is

Equation (C5)

which, in the same limit, corresponds to a ${\chi }_{\phi }^{2}$ of

Equation (C6)

Because the array contains only $N=3$ stations, $\left(\displaystyle \genfrac{}{}{0em}{}{N}{3}\right)=\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)$ and the complete set of closure phases is equal to the nonredundant set, both of which contain only a single element. We can write the model closure phase as ${\hat{\psi }}_{123}={\hat{\phi }}_{12}-{\hat{\phi }}_{13}+{\hat{\phi }}_{23}$ and the measured closure phase as ${\psi }_{123}={\phi }_{12}-{\phi }_{13}+{\phi }_{23}$, with corresponding thermal noise given by ${\sigma }_{123}^{2}={\sigma }_{12}^{2}+{\sigma }_{13}^{2}+{\sigma }_{23}^{2}=3\,{\varepsilon }^{2}{\sigma }^{2}$. The value of ${\chi }_{\psi }^{2}$ is then written simply as

Equation (C7)

which we can see is identical to Equation (C6).
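This limit can also be illustrated numerically. The sketch below (not from the paper; numerical values are arbitrary and numpy is assumed) builds the full visibility phase covariance ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ for a homogeneous three-station array and verifies that ${\chi }_{\phi }^{2}\to {\chi }_{\psi }^{2}$ as ${\varepsilon }^{2}\to 0$:

```python
import numpy as np

sigma2 = 1.0   # gain-phase variance sigma^2
eps2 = 1e-8    # thermal-to-gain variance ratio epsilon^2 << 1 (high S/N)

# Visibility phase design matrix: phi_ij picks up theta_i - theta_j
# (baseline order 12, 13, 23).
Phi = np.array([[1, -1,  0],
                [1,  0, -1],
                [0,  1, -1]], dtype=float)

# Sigma_phi = thermal noise + gain-phase contributions.
Sigma_phi = eps2 * sigma2 * np.eye(3) + sigma2 * (Phi @ Phi.T)

rng = np.random.default_rng(0)
phi_res = rng.normal(size=3)               # arbitrary residual visibility phases

chi2_phi = phi_res @ np.linalg.solve(Sigma_phi, phi_res)

psi_res = phi_res[0] - phi_res[1] + phi_res[2]   # closure phase residual
chi2_psi = psi_res**2 / (3 * eps2 * sigma2)      # sigma_123^2 = 3 eps^2 sigma^2

assert np.isclose(chi2_phi, chi2_psi, rtol=1e-4)
```

The agreement improves as `eps2` shrinks, since the gain-dominated directions of ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ contribute negligibly to ${\chi }_{\phi }^{2}$ in that limit.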

C.2. Closure Phase for N = 4 Stations

We consider now a four-station interferometer with measured visibility phases (${\phi }_{12}$, ${\phi }_{13}$, ${\phi }_{14}$, ${\phi }_{23}$, ${\phi }_{24}$, ${\phi }_{34}$) and model visibility phases (${\hat{\phi }}_{12}$, ${\hat{\phi }}_{13}$, ${\hat{\phi }}_{14}$, ${\hat{\phi }}_{23}$, ${\hat{\phi }}_{24}$, ${\hat{\phi }}_{34}$) related by the station gain phases (${\hat{\theta }}_{1}$, ${\hat{\theta }}_{2}$, ${\hat{\theta }}_{3}$, ${\hat{\theta }}_{4}$) as specified in Equation (C1). Following the same procedure as in the previous section, the inverse of the covariance matrix in the high-S/N limit is

Equation (C8)

corresponding to

Equation (C9)

The four-station array has four closure phases in total, of which three are nonredundant. We specify a measured closure phase ${\psi }_{{ijk}}$ as

Equation (C10)

with an analogous specification for the corresponding model closure phase ${\hat{\psi }}_{{ijk}}$. For a particular choice of nonredundant closure phase subset, the value of ${\chi }_{\psi }^{2}$ will depend on the covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ for the closure phases (see Appendix B.2 and Equation (B9)) and on the vector $\tilde{{\boldsymbol{\psi }}}$ of closure phase residuals,

Equation (C11)

After computing the inverse of ${{\boldsymbol{\Sigma }}}_{\psi }$,

Equation (C12)

it is a tedious but straightforward algebraic exercise to obtain

Equation (C13)

Because closure phases are constructed purely from sums and differences of visibility phases, ${\tilde{\psi }}_{{ijk}}={\tilde{\phi }}_{{ij}}-{\tilde{\phi }}_{{ik}}+{\tilde{\phi }}_{{jk}}$, and thus, Equation (C13) is equivalent to Equation (C9). Furthermore, Equation (C13) no longer shows any signature of the original nonredundant closure phase subset choice; rather, each element of the full redundant set of four closure phases is represented equally, and the ${\chi }_{\psi }^{2}$ includes a 3/4 redundancy correction factor (see Equation (38)) corresponding to the ratio of linearly independent to total closure phases (note that ${\sigma }_{{ijk}}^{2}=3\,{\varepsilon }^{2}{\sigma }^{2}$).
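The independence of ${\chi }_{\psi }^{2}$ from the choice of nonredundant subset can be verified numerically, even for heterogeneous thermal noise. A sketch (not from the paper; numpy assumed, variances arbitrary) for the $N=4$ array:

```python
import numpy as np

# Closure phase design rows psi_ijk = phi_ij - phi_ik + phi_jk
# (baseline order 12, 13, 14, 23, 24, 34).
rows = {
    (1, 2, 3): [1, -1,  0, 1,  0, 0],
    (1, 2, 4): [1,  0, -1, 0,  1, 0],
    (1, 3, 4): [0,  1, -1, 0,  0, 1],
    (2, 3, 4): [0,  0,  0, 1, -1, 1],
}

rng = np.random.default_rng(1)
S = np.diag(rng.uniform(0.5, 2.0, size=6))   # heterogeneous thermal variances
phi_res = rng.normal(size=6)                 # residual visibility phases

def chi2(triangles):
    Psi = np.array([rows[t] for t in triangles], dtype=float)
    Sigma_psi = Psi @ S @ Psi.T              # closure phase covariance
    psi_res = Psi @ phi_res
    return psi_res @ np.linalg.solve(Sigma_psi, psi_res)

# Three different nonredundant subsets of the four closure phases.
sets = [[(1, 2, 3), (1, 2, 4), (1, 3, 4)],
        [(1, 2, 3), (1, 2, 4), (2, 3, 4)],
        [(1, 2, 4), (1, 3, 4), (2, 3, 4)]]
vals = [chi2(s) for s in sets]
assert np.allclose(vals, vals[0])
```

Any minimal set spans the same three-dimensional closure subspace (e.g., ${\psi }_{234}={\psi }_{123}-{\psi }_{124}+{\psi }_{134}$), so the resulting ${\chi }_{\psi }^{2}$ is identical.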

C.3. Closure Amplitude for N = 4 Stations

We consider again a four-station interferometer with measured log visibility amplitudes (${a}_{12}$, ${a}_{13}$, ${a}_{14}$, ${a}_{23}$, ${a}_{24}$, ${a}_{34}$) and model log visibility amplitudes (${\hat{a}}_{12}$, ${\hat{a}}_{13}$, ${\hat{a}}_{14}$, ${\hat{a}}_{23}$, ${\hat{a}}_{24}$, ${\hat{a}}_{34}$) related by the log station gain amplitudes (${\hat{g}}_{1}$, ${\hat{g}}_{2}$, ${\hat{g}}_{3}$, ${\hat{g}}_{4}$) as

Equation (C14)

As in Appendix C.1, if the measured log visibility amplitudes have Gaussian thermal variances (${\sigma }_{12}^{2}$, ${\sigma }_{13}^{2}$, ${\sigma }_{14}^{2}$, ${\sigma }_{23}^{2}$, ${\sigma }_{24}^{2}$, ${\sigma }_{34}^{2}$) and the log gain amplitude contributions are also Gaussian distributed with variances (${\sigma }_{g,1}^{2}$, ${\sigma }_{g,2}^{2}$, ${\sigma }_{g,3}^{2}$, ${\sigma }_{g,4}^{2}$), then the joint distribution of the measured log visibility amplitudes can be expressed as a multivariate Gaussian. The covariance matrix for this distribution can be constructed using the procedure described in Appendix B.1.

If we once again treat the array as perfectly homogeneous and take the high-S/N limit, then to leading order in ${\varepsilon }^{2}$, we find

Equation (C15)

The corresponding ${\chi }_{a}^{2}={\tilde{{\boldsymbol{a}}}}^{\top }{{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}^{-1}\tilde{{\boldsymbol{a}}}$ can then be written

Equation (C16)

The four-station array has three closure amplitudes in total, of which two are nonredundant. We specify a measured log-closure amplitude ${c}_{{ijk}{\ell }}$ as

Equation (C17)

with an analogous specification for the corresponding model log-closure amplitude ${\hat{c}}_{{ijk}{\ell }}$. For a particular choice of nonredundant closure amplitude subset, the value of ${\chi }_{c}^{2}$ will depend on the covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}$ for the log-closure amplitudes (see Appendix B.3 and Equation (B19)) and on the vector $\tilde{{\boldsymbol{c}}}$ of log-closure amplitude residuals,

Equation (C18)

Written out more explicitly, the covariance matrix is given by

Equation (C19)

with corresponding inverse

Equation (C20)

We thus obtain

Equation (C21)

which is equal to Equation (C16). As with the closure phases, we see that the initial choice of minimal log-closure amplitude subset has no bearing on the value of ${\chi }_{c}^{2}$, which accounts for the redundancy factor of total versus linearly independent closure amplitudes.
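The log-closure amplitude covariance construction ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}={\boldsymbol{C}}\,{\rm{diag}}({\sigma }_{{ij}}^{2})\,{{\boldsymbol{C}}}^{\top }$ underlying Table B4 can be checked in a few lines (a sketch, not from the paper; numpy assumed, variance values arbitrary):

```python
import numpy as np

# Thermal variances sigma_ij^2 in baseline order 12, 13, 14, 23, 24, 34.
s = np.array([1.1, 0.7, 1.3, 0.9, 1.5, 0.8])

# Minimal log-closure amplitude design matrix C for N = 4 (Table B4).
C = np.array([[0, 1, -1, -1, 1, 0],
              [1, 0, -1, -1, 0, 1]], dtype=float)

Sigma_c = C @ np.diag(s) @ C.T

# Entries quoted in Table B4:
# [[s13+s14+s23+s24, s14+s23], [s14+s23, s12+s14+s23+s34]]
expected = np.array([[s[1] + s[2] + s[3] + s[4], s[2] + s[3]],
                     [s[2] + s[3], s[0] + s[2] + s[3] + s[5]]])
assert np.allclose(Sigma_c, expected)
```

The off-diagonal terms arise from the baselines (14 and 23 here) shared between the two quadrangles.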

C.4. Closure Quantities for Arbitrary N

We introduce the notion of "mixed phases," which retain all of the information contained in the visibility phases but separate that information into two components: one captured by the closure phases, and a second capturing the remaining station-based effects. For an array with $N$ stations, the mixed phase design matrix operates on the $B$ baseline phases and is given by

Equation (C22)

${{\boldsymbol{I}}}_{N-1,B}$ is an $(N-1)\times B$ "rectangular identity matrix" that extracts the first $N-1$ baseline phases by combining ${{\boldsymbol{I}}}_{N-1}$, a standard square identity matrix of rank $N-1$, with $0$, an $(N-1)\times (B-N+1)$ matrix of all zeros. ${{\boldsymbol{\Psi }}}_{N}$ is the minimal closure phase design matrix for $N$ stations (see Equation (B13)), which can be expanded into the visibility phase design matrix ${{\boldsymbol{\Phi }}}_{N-1}$ for $N-1$ stations (see Equation (B4)) and a standard square identity matrix of rank $\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)$. This design matrix maps from the visibility phase space to the mixed phase space,

Equation (C23)

For example, the mixed phase design matrix for an array with N = 4 stations is given by

Equation (C24)

and the corresponding mixed phase vector is

Equation (C25)

The mixed phase covariance matrix is given by

Equation (C26)

Using the inverse of the mixed phase design matrix,

Equation (C27)

we can invert Equation (C26) to obtain an expression for the visibility phase covariance matrix in terms of the mixed phase covariance matrix,

Equation (C28)

where we note that the inverse transpose is equal to the transposed inverse for the mixed phase design matrix. We can use the above to substitute for ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ in our expression for the visibility phase ${\chi }^{2}$ (see Equation (C4)),

Equation (C29)

revealing that the ${\chi }^{2}$ constructed from mixed phases is equal to that constructed from visibility phases when all covariances are taken into account. This equality holds because the mixed phases are generated through a non-singular linear transformation of the visibility phases.
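The underlying fact is generic: ${\chi }^{2}$ is invariant under any non-singular linear transformation of the residual vector, provided the covariance is transformed consistently. A short numerical sketch (not from the paper; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
x = rng.normal(size=n)                    # residual vector
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)           # a positive-definite covariance

M = rng.normal(size=(n, n))               # random square matrix:
                                          # non-singular with probability 1

chi2_x = x @ np.linalg.solve(Sigma, x)

# Transform both the residuals and the covariance.
y = M @ x
chi2_y = y @ np.linalg.solve(M @ Sigma @ M.T, y)

assert np.isclose(chi2_x, chi2_y)
```

This is why the mixed phase construction loses nothing relative to the visibility phases.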

To see how the mixed phases reduce to purely closure phases in the high-S/N limit, it is convenient to consider the following decomposition of the mixed phase covariance matrix:

Equation (C30)

where ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}^{{\prime} }$ is the first (N − 1) × (N − 1) upper left subset of the full visibility phase covariance matrix ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$, ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ is the closure phase covariance matrix, and ${\boldsymbol{W}}={{\boldsymbol{\Phi }}}_{N-1}\,{{\boldsymbol{S}}}_{{\boldsymbol{\phi }}}^{{\prime} }$ is the covariance between the closure phases and the first $N-1$ visibility phases. Since the closure phases are independent of station gain, both ${\boldsymbol{W}}$ and ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ include only baseline thermal noise.

Using a strategy analogous to that employed in the previous sections, where the parameter ${\varepsilon }^{2}\sim { \mathcal O }({\sigma }_{{ij}}^{2}/{\sigma }_{\theta ,i}^{2})$ relates the statistical error in visibility phase to that from gain uncertainty, we examine the behavior of ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}+}^{-1}$ as $\varepsilon \to 0$. The sub-matrices of ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}+}$ scale with $\varepsilon $ as ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}^{{\prime} }\sim { \mathcal O }({\varepsilon }^{0})$ and ${\boldsymbol{W}},{{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}\sim { \mathcal O }({\varepsilon }^{2})$, since ${\boldsymbol{W}}$ and ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ contain only baseline thermal noise.

From the block matrix form of Equation (C30), we can write the inverse mixed phase covariance matrix as

Equation (C31)

These four sub-matrices scale with $\varepsilon $ as ${ \mathcal O }({\varepsilon }^{0})$ for the upper left and off-diagonal blocks and as ${ \mathcal O }({\varepsilon }^{-2})$ for the lower right block, so the ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}^{-1}$ term in the lower right sub-matrix dominates ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}+}^{-1}$. The product of ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}+}^{-1}$ with ${\tilde{{\boldsymbol{\psi }}}}_{+}$ in this limit will therefore serve to isolate the last $\left(\displaystyle \genfrac{}{}{0em}{}{N-1}{2}\right)$ terms of ${\tilde{{\boldsymbol{\psi }}}}_{+}$ (which are just the closure phases $\tilde{{\boldsymbol{\psi }}}$) and multiply them by ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}^{-1}$:

Equation (C32)

The final result is that the visibility phase ${\chi }^{2}$, in the limit where the uncertainty in the baseline-based quantities is much smaller than the uncertainty in the station-based quantities, is equal to the closure phase ${\chi }^{2}$.

The equivalence between ${\chi }_{a}^{2}$ derived from a complete set of log visibility amplitudes in the $\varepsilon \to 0$ limit and ${\chi }_{c}^{2}$ derived from log-closure amplitudes can be demonstrated in the same way. Here, the corresponding design matrix and mixed log amplitudes are

Equation (C33)

which draws from the first $N$ visibility log amplitudes followed by a minimal set of log-closure amplitudes. The transformation to mixed quantities is non-singular, as long as the first $N$ baselines drawn do not form any closed quadrangles (or the first $N-1$ baselines do not form any closed triangles, in the case of mixed phases). This condition is met by the baseline ordering convention used in this paper. The reduction in Equations (C30)–(C32) then follows under substitution of phase with log-amplitude quantities.
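The non-singularity of the mixed transformation can be made concrete for $N=4$ phases. In the sketch below (not from the paper; numpy assumed, baseline order 12, 13, 14, 23, 24, 34), the first $N-1=3$ rows extract ${\phi }_{12}$, ${\phi }_{13}$, ${\phi }_{14}$ (no closed triangles among them), and the remaining rows are a minimal closure phase set:

```python
import numpy as np

Psi_plus = np.array([[1,  0,  0, 0, 0, 0],    # phi_12
                     [0,  1,  0, 0, 0, 0],    # phi_13
                     [0,  0,  1, 0, 0, 0],    # phi_14
                     [1, -1,  0, 1, 0, 0],    # psi_123
                     [1,  0, -1, 0, 1, 0],    # psi_124
                     [0,  1, -1, 0, 0, 1]],   # psi_134
                    dtype=float)

# Block lower-triangular with identity diagonal blocks, so det = 1:
# the map from visibility phases to mixed phases is invertible.
assert np.isclose(np.linalg.det(Psi_plus), 1.0)
```

If the extracted baselines did form a closed triangle, the corresponding closure phase row would be linearly dependent on them and the determinant would vanish.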

C.5. Explicit Gain Marginalization

In Appendices C.1–C.4, we have shown that the information content of the closure quantities is equivalent to that of the baseline visibilities in the limit of completely unconstrained gains, so long as the covariance structure of the corresponding observables is taken into account. We now relate the use of covariance in the residual visibility likelihood construction (Equation (C2)) to explicit analytic marginalization over Gaussian uncertainties in station gain phase or log amplitude. In the limit of completely unconstrained gains, the use of closure quantities should therefore give identical results to explicit numerical marginalization over all possible gains; and, for the case of finite Gaussian uncertainties in station gain phase or log amplitude, the use of the residual visibility covariance should give identical results to explicit numerical marginalization over Gaussian priors on the gains.

For an array with N stations under modeled gain corrections, we can write Equation (C14) for all baselines using

Equation (C34)

where ${\boldsymbol{A}}\tilde{{\boldsymbol{g}}}$ is a residual vector of log visibility amplitude correction factors. For example, a three-station array would have

Equation (C35)

The Gaussian likelihood for calibrated log visibility amplitudes is then expressed as

Equation (C36)

where we note that ${{\boldsymbol{S}}}_{{\boldsymbol{a}}}$ contains only baseline thermal noise and is diagonal.

If we further impose independent zero-mean Gaussian priors on each of the model gain correction factors, we can express the joint prior as

Equation (C37)

We can use this prior to marginalize the log visibility amplitudes over the log gain amplitudes,

Equation (C38)

where ${{ \mathcal L }}_{a}$ represents the marginalized likelihood.

The integrand in Equation (C38) is a product of exponentials, which together contain several terms that depend on $\tilde{{\boldsymbol{g}}}$. To evaluate the integral, we would like to consolidate these terms. By defining

Equation (C39)

Equation (C40)

completing the square, and then pulling terms that do not depend on $\tilde{{\boldsymbol{g}}}$ out of the integral, we obtain

Equation (C41)

The integrand now contains only a single multivariate Gaussian in $\tilde{{\boldsymbol{g}}}$, with mean ${{\boldsymbol{M}}}^{-1}{\boldsymbol{\mu }}$ and covariance ${{\boldsymbol{M}}}^{-1}$. Integrating over all $\tilde{{\boldsymbol{g}}}$ thus yields the volume $\sqrt{\det \left(2\pi {{\boldsymbol{M}}}^{-1}\right)}$, so that

Equation (C42)

Upon expanding ${{\boldsymbol{\mu }}}^{\top }{{\boldsymbol{M}}}^{-1}{\boldsymbol{\mu }}$ and directly applying the Woodbury matrix inverse identity, we obtain

Equation (C43)

where we have obtained the log visibility amplitude covariance ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}={{\boldsymbol{S}}}_{{\boldsymbol{a}}}+{\boldsymbol{A}}{{\boldsymbol{\Sigma }}}_{{\boldsymbol{g}}}{{\boldsymbol{A}}}^{\top }$ analogous to Equation (B3).

For the determinants in the normalization constant,

Equation (C44)

Equation (C45)

Equation (C46)

Equation (C47)

Here, we have used the Weinstein–Aronszajn matrix identity $\det \left({\boldsymbol{I}}+{\boldsymbol{XY}}\right)=\det \left({\boldsymbol{I}}+{\boldsymbol{YX}}\right)$. The marginalized likelihood can thus be written as

Equation (C48)

showing that the marginalization over Gaussian priors in log gain amplitude is fully captured through the use of visibility covariance as in Equation (C2). The derivation applies to any linear transformation of independent Gaussian observables. In particular, it is the same for partially known visibility phases under the substitution $({\boldsymbol{a}},{\boldsymbol{A}},{\boldsymbol{g}})\to ({\boldsymbol{\phi }},{\boldsymbol{\Phi }},{\boldsymbol{\theta }})$.
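The algebra in Equations (C43)–(C47) rests on the Woodbury inverse and the Weinstein–Aronszajn determinant identity, both of which can be checked numerically. A sketch (not from the paper; numpy assumed, variance values arbitrary) for $N=4$ log amplitudes:

```python
import numpy as np

rng = np.random.default_rng(3)
N, B = 4, 6

# Log-amplitude design matrix A: a_ij picks up g_i + g_j
# (baseline order 12, 13, 14, 23, 24, 34).
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

S_a = np.diag(rng.uniform(0.5, 2.0, size=B))      # thermal-only, diagonal
Sigma_g = np.diag(rng.uniform(0.5, 2.0, size=N))  # Gaussian gain priors

# Marginalized covariance (analog of Equation (B3)).
Sigma_a = S_a + A @ Sigma_g @ A.T

# Woodbury route: M = Sigma_g^{-1} + A^T S_a^{-1} A.
S_inv = np.linalg.inv(S_a)
M = np.linalg.inv(Sigma_g) + A.T @ S_inv @ A
Sigma_a_inv = S_inv - S_inv @ A @ np.linalg.inv(M) @ A.T @ S_inv
assert np.allclose(Sigma_a_inv, np.linalg.inv(Sigma_a))

# Determinant factorization from the normalization constant:
# det(Sigma_a) = det(S_a) det(Sigma_g) det(M).
assert np.isclose(np.linalg.det(Sigma_a),
                  np.linalg.det(S_a) * np.linalg.det(Sigma_g) * np.linalg.det(M))
```

Together, these two identities reproduce the full marginalized Gaussian likelihood of Equation (C48).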

Appendix D: Notation

Table D1 lists the notation used throughout this paper. Vector quantities reflect values taken over a common set of recorded signals at $N$ antennas. Throughout, we distinguish measured values (no accent) from forward-model values (hat accent) and from model residuals (measured minus model value, tilde accent).

Table D1.  Notation Used in This Paper

    Measured Value   Model Parameter   Residual   Residual Error
Quantity Number Single Vector   Single Vector   Single Vector   Variance Covariance Design
Complex gain $N$ ${\gamma }_{i}$ $...$   ${\hat{\gamma }}_{i}$ $...$   ${\tilde{\gamma }}_{i}$ $...$   $...$ $...$ $...$
Complex visibility $B$ ${V}_{{ij}}$ $...$   ${\hat{V}}_{{ij}}$ $...$   ${\tilde{V}}_{{ij}}$ $...$   $2\,{\sigma }_{V,{ij}}^{2}$ (thermal only) $...$ $...$
Gain phase $N$ ${\theta }_{i}$ ${\boldsymbol{\theta }}$   ${\hat{\theta }}_{i}$ $\hat{{\boldsymbol{\theta }}}$   ${\tilde{\theta }}_{i}$ $\tilde{{\boldsymbol{\theta }}}$   ${\sigma }_{\theta ,i}^{2}$ ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\theta }}}$ $...$
Visibility phase $B$ ${\phi }_{{ij}}$ ${\boldsymbol{\phi }}$   ${\hat{\phi }}_{{ij}}$ $\hat{{\boldsymbol{\phi }}}$   ${\tilde{\phi }}_{{ij}}$ $\tilde{{\boldsymbol{\phi }}}$   ${\sigma }_{{ij}}^{2}+{\sigma }_{\theta ,i}^{2}+{\sigma }_{\theta ,j}^{2}$ ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\phi }}}$ ${\boldsymbol{\Phi }}$
Closure phase $T$ ${\psi }_{{ijk}}$ ${\boldsymbol{\psi }}$   ${\hat{\psi }}_{{ijk}}$ $\hat{{\boldsymbol{\psi }}}$   ${\tilde{\psi }}_{{ijk}}$ $\tilde{{\boldsymbol{\psi }}}$   ${\sigma }_{{ijk}}^{2}$ ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{\psi }}}$ ${\boldsymbol{\Psi }}$
Gain amplitude $N$ ${G}_{i}$ $...$   ${\hat{G}}_{i}$ $...$   ${\tilde{G}}_{i}$ $...$   $...$ $...$ $...$
Visibility amplitude $B$ ${A}_{{ij}}$ $...$   ${\hat{A}}_{{ij}}$ $...$   ${\tilde{A}}_{{ij}}$ $...$   $...$ $...$ $...$
Closure amplitude $Q$ ${C}_{{ijkl}}$ $...$   ${\hat{C}}_{{ijkl}}$ $...$   ${\tilde{C}}_{{ijkl}}$ $...$   $...$ $...$ $...$
Log gain amplitude $N$ ${g}_{i}$ ${\boldsymbol{g}}$   ${\hat{g}}_{i}$ $\hat{{\boldsymbol{g}}}$   ${\tilde{g}}_{i}$ $\tilde{{\boldsymbol{g}}}$   ${\sigma }_{g,i}^{2}$ ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{g}}}$ $...$
Log visibility amplitude $B$ ${a}_{{ij}}$ ${\boldsymbol{a}}$   ${\hat{a}}_{{ij}}$ $\hat{{\boldsymbol{a}}}$   ${\tilde{a}}_{{ij}}$ $\tilde{{\boldsymbol{a}}}$   ${\sigma }_{{ij}}^{2}+{\sigma }_{g,i}^{2}+{\sigma }_{g,j}^{2}$ ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{a}}}$ ${\boldsymbol{A}}$
Log-closure amplitude $Q$ ${c}_{{ijkl}}$ ${\boldsymbol{c}}$   ${\hat{c}}_{{ijkl}}$ $\hat{{\boldsymbol{c}}}$   ${\tilde{c}}_{{ijkl}}$ $\tilde{{\boldsymbol{c}}}$   ${\sigma }_{{ijkl}}^{2}$ ${{\boldsymbol{\Sigma }}}_{{\boldsymbol{c}}}$ ${\boldsymbol{C}}$

Note. Measured values reflect contributions from thermal noise and systematic errors. The residual error represents the expected covariance across a set of simultaneously observed residual quantities (measured minus model values). For visibilities, the residual covariance includes contributions from both thermal (statistical) and gain (systematic) errors in the general case. At high S/N, the variances for phase and log amplitude are equal; thus, we use the same symbol ${\sigma }_{{ij}}^{2}\approx {\sigma }_{V,{ij}}^{2}/{A}_{{ij}}^{2}$ for notational simplicity. ${\sigma }_{V,{ij}}^{2}$ reflects the thermal noise in one component of the complex visibility, and it is known a priori to high precision.


Footnotes

  • This quadrangle can be easily cast as a complex closure quantity, but doing so provides no phase information beyond the set of complex bispectra. When taken over the four polarization feeds of a single baseline, however, e.g., $({V}_{\mathrm{LR}}\,{V}_{\mathrm{RL}})/({V}_{\mathrm{LL}}\,{V}_{\mathrm{RR}})$, such a construction can provide some information about delay closure and/or polarization fraction.

  • In principle, the source contribution $E\left[{{ \mathcal E }}_{i}^{}{{ \mathcal E }}_{j}^{\ast }\right]$ is also subject to statistical fluctuations, i.e., the self noise of Kulkarni (1989), but these are strongly subdominant to uncertainties from thermal noise in the weak-signal limit $E\left[| n{| }^{2}\right]\gg E\left[| \gamma { \mathcal E }{| }^{2}\right]$ (Section 2.1).

  • Although we focus here on the covariance of thermal noise, which contributes to covariance in the residual measured quantities under a true source model, we note that the same relationships also hold for non-closing baseline errors (Massi et al. 1991). Such errors are particularly straightforward to incorporate into the analysis if modeled as additional independent Gaussian systematic error in baseline quantities. The same covariance relationships also hold for variations in structure closure phases, and the analysis is relevant for isolating independent structural variability degrees of freedom that are measured across the array.

  • The Gaussian limit is appropriate for S/N ≳ a few (Section 2.4). Even at low S/N, closure quantities formed from a single set of baseline visibilities will be dependent by construction, but their statistical dependence cannot be characterized by a multivariate Gaussian due to nonlinear effects. However, in the special case of coherent ensemble averages over a very large number of closure quantities that have more than one low-S/N baseline, the closure quantities become approximately statistically independent. This is often the case for bispectral averaging in optical interferometry, where the atmospheric coherence time is extremely short (e.g., Kulkarni et al. 1991).

  • 10 

    Note that it is almost never the case that strictly equal representation of all baselines is possible; for closure phases, only the N = 3 and N = 6 arrays can achieve perfect balance (with each baseline represented exactly once or twice, respectively), while log-closure amplitudes are limited to only the N = 5 array (with each baseline represented exactly twice) and N = 9 array (with each baseline represented exactly three times).

  • 11 

    An example such set is {${\psi }_{123}$, ${\psi }_{124}$, ${\psi }_{135}$, ${\psi }_{146}$, ${\psi }_{156}$, ${\psi }_{236}$, ${\psi }_{245}$, ${\psi }_{256}$, ${\psi }_{345}$, ${\psi }_{346}$}.

  • 12 

    An example such set is {${c}_{1234}$, ${c}_{1245}$, ${c}_{1352}$, ${c}_{1453}$, ${c}_{2345}$}.

  • 13 

    The construction does not impose a trivial phase for the zero baseline, but it can be assumed by explicitly removing the corresponding row from ${\boldsymbol{R}}$. Doing so leaves one remaining phase structure degree of freedom corresponding to the single open triangle. There is also a trivial closure amplitude for the case of a colocated pair of sites where each baseline in the numerator has a matching baseline in the denominator with the same amplitude. In this case, the trivial behavior is already fully captured by the closure amplitude design matrix ${\boldsymbol{C}}$ and does not need to be taken a priori.
