Characterizing and Mitigating Intraday Variability: Reconstructing Source Structure in Accreting Black Holes with mm-VLBI

The extraordinary physical resolution afforded by the Event Horizon Telescope (EHT) has opened a window onto the astrophysical phenomena unfolding on horizon scales in two known black holes, M87* and Sgr A*. However, with this leap in resolution has come a new set of practical complications. Sgr A* exhibits intraday variability that violates the assumptions underlying Earth aperture synthesis, limiting traditional image reconstruction methods to short timescales and data sets with very sparse (u, v) coverage. We present a new set of tools to detect and mitigate this variability. We develop a data-driven, model-agnostic procedure to detect and characterize the spatial structure of intraday variability. This method is calibrated against a large set of mock data sets, producing an empirical estimator of the spatial power spectrum of the brightness fluctuations. We present a novel Bayesian noise-modeling algorithm that simultaneously reconstructs an average image and a statistical measure of the fluctuations about it, using a parameterized form for the excess variance in the complex visibilities not otherwise explained by the statistical errors. These methods are validated using a variety of simulated data, including general relativistic magnetohydrodynamic simulations appropriate for Sgr A* and M87*. We find that the reconstructed source structure and variability are robust to changes in the underlying image model. We apply these methods to the 2017 EHT observations of M87*, finding evidence for variability across the EHT observing campaign. The variability mitigation strategies presented here are widely applicable to very long baseline interferometry observations of variable sources generally, for which they provide a data-informed averaging procedure and a natural characterization of inter-epoch image consistency.


Introduction
The Event Horizon Telescope (EHT) is a global very long baseline interferometry (VLBI) array observing at a wavelength of ∼1 mm, and it has been assembled with the primary goal of resolving the horizon-scale emission structures of both the M87* (Event Horizon Telescope Collaboration et al. 2019a, 2019b, 2019c, 2019d, 2019e, 2019f; hereafter M87* Papers I, II, III, IV, V, and VI, respectively) and Sgr A* (Event Horizon Telescope Collaboration et al. 2022a, 2022b, 2022c, 2022d, 2022e, 2022f; hereafter Papers I-VI, respectively) supermassive black holes (SMBHs). To date, all sources observed with the EHT during its 2017 campaign have been spatially resolved (Kim et al. 2020; Janssen et al. 2021), and many have exhibited structural variability across days. The most extreme of these variable sources is Sgr A*, which exhibits substantial variability across the electromagnetic spectrum (see, e.g., Boyce et al. 2019; Witzel et al. 2021; Wielgus et al. 2022, and references therein) and on horizon scales (see, e.g., Fish et al. 2011; Gravity Collaboration et al. 2018). The mass of Sgr A* is M ≈ 4 × 10^6 M☉ (Do et al. 2019; Gravity Collaboration et al. 2019, 2020), corresponding to a gravitational timescale of t_g = GM/c^3 ≈ 20 s, which is much shorter than the ∼10 hr duration of a typical EHT observing track.
Sources exhibiting substantial structural variability on timescales that are short compared to a VLBI observing track complicate the image reconstruction process. Because VLBI arrays have only sparse instantaneous coverage of the Fourier plane, standard practice is to use Earth-rotation aperture synthesis to improve the coverage by observing the source for several hours (Thompson et al. 2017); as the Earth rotates, so do the baselines in the array, which therefore populate new regions of the Fourier plane. However, image reconstruction using data acquired via this technique assumes that the source remains static for the duration of the observation, such that the rotated baselines continue to sample the same source structure. For rapidly variable sources like Sgr A*, this assumption is broken.
The potential presence of short-timescale variability raises two distinct challenges: characterization and mitigation. The latter necessarily depends on the former; an appropriate mitigation scheme can only be selected after the properties of the variability have been identified. Methods to detect variability intrinsic to the source, and not imposed by subsequent atmospheric or station-induced corruptions, may be constructed using closure phases (Roelofs et al. 2017; Satapathy et al. 2022). However, the interpretation of such measures is complicated by the nontrivial relationship between closure phases and source structure.
We introduce an additional, purely empirical method for detecting and, more importantly, characterizing structural variability based on visibility amplitudes. While this novel method must contend with systematic calibration uncertainties, the results may be immediately interpreted as the power spectrum of image fluctuations. This power spectrum naturally provides a prior for subsequent mitigation schemes.
There are at least three conceptually distinct approaches that have been applied (or proposed to be applied) for reconstructing variable sources from VLBI data:

1. The first approach, which we will refer to as "snapshot reconstruction," involves simply dividing the data into time segments that are short compared to the source variability timescale and then using standard reconstruction techniques. For sources whose variability timescales are longer than a few hours, the snapshot reconstruction approach is no different from multiepoch imaging, which has seen regular use in the VLBI community to make movies of astronomical sources (e.g., Matthews et al. 2010; Lister et al. 2018; Walker et al. 2018; Kim et al. 2020). However, for more rapidly variable sources the required data segmentation may lead to inadequate Fourier coverage that compromises image fidelity (e.g., Fomalont et al. 2001) or necessitates the use of restricted source models (e.g., Miller-Jones et al. 2019).

2. The second approach involves directly reconstructing a dynamic source model rather than a static one, and we will refer to this approach as "dynamical reconstruction." We draw a distinction between dynamical reconstruction and snapshot reconstruction in that the former attempts to recover the time-continuous source structure, rather than restricting itself to static "frames" at the points in time where snapshot data are available. Example dynamical reconstruction algorithms include dynamical imaging (Johnson et al. 2017), StarWarps (Bouman et al. 2018), and RESOLVE (Arras et al. 2019, 2022), all of which impose some manner of temporal coherence or regularization on the reconstructed source structure. Such temporal regularization is necessary both to interpolate between times when data are present and to enforce structural continuity in time. In principle, this capability for data taken at one point in time to inform the image structure at another point in time should mean that dynamical reconstruction algorithms have less stringent requirements on dense instantaneous Fourier coverage than snapshot reconstruction algorithms do. But in practice, current dynamical reconstruction algorithms struggle in similar regimes of data sparsity to snapshot reconstructions (see Section 9 of Paper III).

3. The third approach is "visibility stacking," whereby the source is observed over multiple epochs and the complex visibilities at each location in the Fourier plane are averaged in time. By the linearity of the Fourier transform, the image reconstructed from time-averaged visibilities should be the same as the average of images reconstructed from single-epoch visibilities, meaning that visibility stacking provides a method for recovering the average image structure from time-variable data. Lu et al. (2016) applied a visibility stacking technique, including light-curve normalization and temporal smoothing, to simulated EHT observations of Sgr A*, demonstrating that multiepoch averaging of the complex visibilities can substantially improve the reconstructed image fidelity compared to single-epoch imaging. However, such averaging requires access to calibrated visibilities, and in practice the visibility phases of EHT data are irretrievably corrupted by atmospheric effects (M87* Paper III).

To this list we add a fourth approach: statistical reconstruction of the variability alongside static image reconstruction of the average source structure, which may be deployed in Bayesian image reconstruction methods like those described in Broderick et al. (2020b) and Pesce (2021). Any variable structure can be decomposed, without loss of generality, into a static (average) component and a dynamic (variable) component. If the duration of the observation is much longer than the variability timescale, then the variable component can be usefully conceptualized as defining a distribution from which the measured visibilities at any instant are drawn. If this distribution is parameterized and an appropriate likelihood function constructed, then the data can be used to simultaneously constrain both the parameters describing the variability and the average source structure. For EHT observations of Sgr A*, in which the observing window (∼10 hr) is much longer than the expected variability timescale (∼minutes), this statistical approach to source reconstruction in the face of variability is well motivated.
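The statistical approach above can be sketched as a Gaussian likelihood in which the variability enters as excess "noise" added in quadrature to the thermal errors. The function below is an illustrative sketch of this idea only, not the specific likelihood adopted later in this paper; all names and signatures are ours.

```python
import numpy as np

def log_likelihood(V_obs, V_model, sigma_th, sigma_var):
    """Gaussian log-likelihood for complex visibilities in which the
    variability-induced fluctuations are treated as excess variance
    added in quadrature to the thermal errors (per-component std)."""
    sigma2 = sigma_th**2 + sigma_var**2          # total per-point variance
    resid2 = np.abs(V_obs - V_model)**2          # squared complex residual
    return float(np.sum(-resid2 / (2.0 * sigma2) - np.log(2.0 * np.pi * sigma2)))
```

Inflating the variance in this way penalizes deviations of the data from the average-image model less severely, which is precisely the sense in which the variability is marginalized over rather than reconstructed instant by instant.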
This paper is organized as follows: In Section 2 we provide a theoretical expectation for the variability behavior of EHT sources. In Section 3 we outline our proposed strategies for measuring and mitigating this variability during reconstruction. We describe a data-driven, model-agnostic procedure for characterizing and quantifying the amount of variability contained in the data in Section 4. In Section 5 we present an algorithm for simultaneously reconstructing both the average source structure and its parameterized variability. Both methods are demonstrated by application to the 2017 EHT M87* data in Section 6. Finally, we summarize and conclude in Section 7.

Expected Source Variability of EHT Targets
Structural variability in EHT observations of horizon-scale sources (i.e., Sgr A* and M87*) is anticipated to result from the turbulent physical processes associated with accretion and jet launching near the central black hole. Modeling these turbulent processes is currently performed via large-scale numerical computations, typically within the general relativistic magnetohydrodynamic (GRMHD) paradigm (see, e.g., M87* Paper V; Paper V, and references therein). These simulations may be post-processed through ray-tracing and radiative transfer computations to generate simulated images of the near-horizon emission regions (see, e.g., Gold et al. 2020, and references therein). Through this procedure, physically self-consistent and observationally credible predictions for the horizon-scale image structure and variability may be generated, from which the expected variability properties may be extracted. The resultant variability properties across the library of GRMHD simulations in Paper V are discussed in detail in Georgiev et al. (2022), and we only summarize them here.
We employ the library of simulation images constructed for EHT observations of Sgr A* to motivate the form of our variability mitigation strategy (Paper V). Each simulation generates a sequence of such images, or "movies." These movies can be decomposed into a mean image and a movie of random brightness fluctuations, both of which are of intrinsic interest for EHT sources. The spatial power spectral density (PSD) of the brightness fluctuations is computed and presented in Georgiev et al. (2022), which finds that they are universally well characterized by a "red-red" noise process. That is, the largest brightness fluctuations typically occur on the longest timescales and the largest spatial scales.
This procedure can be repeated for movies of different temporal lengths, generating families of PSDs that are indicative of the fluctuation spectra over the different observationally relevant timescales. In all cases, the PSDs are well described by a broken power law in spatial frequency, and therefore in baseline length, |u|. The location of this break ranges from u_0 ∼ 1 to 3 Gλ, corresponding to angular scales ranging from 70 to 200 μas and associated with the typical extent of the emission within the image. For |u| < u_0, the PSD asymptotes to a constant. For averaging times T ≳ 10^3 GM/c^3 the spatial PSDs are independent of T, indicating the lack of correlated variability on longer timescales in most simulations. For Sgr A* and M87*, 10^3 GM/c^3 corresponds roughly to 5.5 hr and 1 yr, respectively. For averaging times shorter than this timescale, the maximum of the PSD decreases roughly as T^(2-3) and the location of the break increases.
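A smoothly broken power law of this sort can be written down in several equivalent ways; the following is one minimal parameterization (our own, with illustrative parameter values) that is flat below the break u_0 and falls with index b above it.

```python
import numpy as np

def psd_broken_power_law(u, psd0=1.0, u0=2.0, b=2.5):
    """Illustrative broken power-law PSD in baseline length |u| (in Gλ):
    approximately constant (psd0) for u << u0, falling as (u/u0)^-b for
    u >> u0, with a smooth transition at the break u0."""
    u = np.asarray(u, dtype=float)
    return psd0 / (1.0 + (u / u0) ** b)
```

With u0 ∼ 1-3 Gλ and b ∼ 2-3, this form reproduces the qualitative behavior described above: a constant plateau at short baselines and a steep power-law falloff beyond the break.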
The variability in the brightness fluctuations can be substantially reduced by normalizing the visibility data by the total-flux light curve. This leverages the "red-red" nature of the brightness fluctuations and explicitly eliminates the variability on zero baselines. Because variability on |u| < u_0 is correlated with that at zero baselines, this light-curve normalization removes a significant fraction of the additional variability on nonzero baselines shorter than the break. Generically, at small |u| ≪ u_0 the PSDs of the light-curve-normalized brightness fluctuations scale as |u|^(2-4), depending on the assumed calibration details. For |u| ≳ u_0, the PSDs generally fall with power-law indices of 2-3.
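Operationally, the light-curve normalization amounts to dividing each complex visibility by the total-flux light curve interpolated to its timestamp. A minimal sketch (function and argument names are ours):

```python
import numpy as np

def lightcurve_normalize(t_vis, V, t_lc, F_lc):
    """Divide each complex visibility V (at times t_vis) by the total-flux
    light curve (F_lc sampled at t_lc), linearly interpolated to the
    visibility timestamps. This removes the zero-baseline variability
    and, because the fluctuations are 'red-red', much of the correlated
    variability at baselines shorter than the break."""
    F = np.interp(t_vis, t_lc, F_lc)   # light curve at visibility times
    return V / F
```

For a source whose structure simply scales with the total flux, the normalized visibilities are constant in time, which is the limiting case this normalization is designed to exploit.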
For GRMHD simulations viewed from close to the polar axis, the PSDs are approximately azimuthally symmetric. However, for simulations of accretion flows that are viewed nearly edge-on, which exhibit significant departures from azimuthal symmetry in their images, the PSD at 4 Gλ can vary by as much as 50%. Nevertheless, we will proceed with the approximation that the PSDs are functions of |u| alone, subject to the validation experiments performed in the following sections.

Summary of Intraday Variability Mitigation Strategy
Intraday variability presents a substantial challenge for VLBI generally, and especially so for the EHT specifically, which has sparse (u, v)-coverage, as shown in Figure 1. Methods that attempt image reconstruction and model-based parameter inference from the 2017 EHT observations of Sgr A* using segments of the data that span time ranges shorter than that associated with the variability must contend with at least two significant challenges:

1. A dramatically reduced data volume. Limiting the segmentation time to minutes reduces the number of independent VLBI (non-intrasite) measurements to 15 in the best case, in comparison to the many hundreds obtained throughout a night owing to Earth aperture synthesis. As a direct consequence, the number of independent parameters that can be effectively constrained is drastically limited.

2. The inherent complexity and stochasticity of the anticipated variability, based on the turbulence observed in GRMHD simulation movies.
The strategy presented here is to statistically address the putative variability so that standard VLBI analysis tools that leverage Earth aperture synthesis can still be applied despite the presence of significant intraday variability.In this sense, the properties of the average brightness distribution are simultaneously measured with their covariance, or "noise," over the observation window.This procedure is formally equivalent to marginalizing over realizations of the noise and is statistically well justified when the timescale of the variability is short in comparison to the observing window.
The additional variability-induced noise is an addition to, not a replacement for, the usual measurement uncertainties. Thermal noise is the dominant contribution to these uncertainties and depends solely on the telescope operations at individual sites, such as system temperature, bandwidth, integration time, etc. (Thompson et al. 2017). To a very good approximation, the thermal errors on the real and imaginary components of the complex visibilities are Gaussian. We will assume that thermal error is synonymous with the statistical error, which may include small additional contributions from phasing efficiency, phase coherence, etc. Importantly, these errors are independent for each measurement, known a priori, and insensitive to the intrinsic source structure and its variability.
This strategy comes with its own set of challenges. Typically, there is not a single variability timescale, but rather a variability spectrum, with some components of the variability appearing well within the observing window and others extending beyond it. This multitude of timescales is evident, e.g., in the variability properties of the light curve of Sgr A* during the 2017 EHT observations (Wielgus et al. 2022) and suggested by the GRMHD simulations described in the previous section (Georgiev et al. 2022). More important is the presence of correlations between measured visibilities, which will be present even in the absence of spatial correlations in the image. These correlations complicate the manner in which the noise is introduced.
In practice, we generate a model of the diagonal elements of the covariance that incorporates the additional deviations of the complex visibilities from a mean image that result from the structural variability within the source. These variances can be estimated directly from the data (Section 4) or obtained simultaneously with model comparisons and image reconstructions (Section 5). In the remainder of this section we discuss how correlations on different scales are addressed and how the procedures we present in this paper are validated.

Large-scale Correlations
The light curve of Sgr A* during the observing nights throughout the 2017 EHT campaign is dominated by fluctuations on the longest timescales (Paper V; Wielgus et al. 2022). This is consistent with other studies that cover much longer time spans, which suggest a potential break at a timescale between 2 and 8 hr (Dexter et al. 2014; Witzel et al. 2018, 2021).
This temporal behavior is consistent with the expectations from GRMHD simulations, described in Section 2 and reported in detail in Georgiev et al. (2022). Within the GRMHD simulations, the long-timescale variability is associated with variations in the largest-scale features of the image. In this sense, the spatiotemporal power spectrum is "red-red," i.e., the largest-amplitude fluctuations in the source brightness occur on the longest timescales and largest spatial scales. As a direct result, the largest correlations are driven by the longest temporal/largest spatial fluctuations.
We remove these a priori by light-curve-normalizing the visibility data. By construction, the normalized data have vanishing variance at zero baseline. For GRMHD simulations, the suppression of fluctuation power extends to |u| ≈ 2 Gλ (Georgiev et al. 2022). A direct consequence of this procedure is that the reconstructed variance is unable to address these omitted long temporal/large spatial scale modes.

Small-scale Fluctuations
GRMHD simulations exhibit turbulence in the emitting plasma, leading to short-time/small-scale stochastic fluctuations in the images (M87* Paper V; Paper V). These occur on all scales, ranging from the field of view of the EHT observations (200 μas) to well below the formal resolution of the EHT (20 μas; Georgiev et al. 2022). These structural fluctuations are the source of the additional noise that the mitigation scheme presented here addresses.

Methodology, Caveats, and Validation
The adopted methods model only the diagonal elements of the covariance of the complex visibility data associated with the underlying structural variability, thereby leaving out correlations. The largest correlations are avoided by fitting to light-curve-normalized complex visibilities. Nevertheless, other correlations, including the short-time/small-scale fluctuations, are not directly addressed, and we have no a priori reason to believe that they are negligible. Indeed, in Section 4 we see clear evidence for the existence of nontrivial correlations.
The implication of these residual correlations for the fidelity of image reconstructions and parameter estimates is not immediately clear. Thus, we specified the following set of validation experiments, applied where possible to the specific observation parameters (e.g., (u, v)-coverage, observation times) relevant for the analysis in question:

1. Demonstrated on simulated data sets with Gaussian noise added to individual complex visibility estimates.

2. Demonstrated on simulated data sets with Gaussian noise fluctuations with a specified power spectrum overlaid on the image. These are uncorrelated in the image domain but will have residual correlations in the (u, v)-domain owing to the finite source size.

3. Demonstrated on simulated data sets in which structural brightness fluctuations are added via the iNoisy prescription in a manner that is spatially correlated in the image domain (Lee & Gammie 2021).

4. Demonstrated on simulated data sets generated from imaging GRMHD simulations. Fluctuations are necessarily correlated in the image domain in a manner that reflects the self-consistent physical evolution of the emission region.
These experiments are progressive in that additional complications, related to the nature and strength of correlations, are incorporated in subsequent experiments. Thus, we treat them as cumulative: successful validation using GRMHD simulations implies a successful validation using the iNoisy sets, etc. All methods presented here are successfully validated via several of the above experiments, including on simulated data sets produced from GRMHD simulations.

Data-driven Characterization of Variability
VLBI data sets collected over many observation epochs contain within them a natural characterization of source variability (see Figure 1). We exploit this feature of the 2017 EHT observing campaign to assess the magnitude and spectrum of the intrinsic source variability.
Data sets that may be combined include those taken on multiple days, crossing tracks in the (u, v)-plane throughout the night, and in multiple observation bands (e.g., low- and high-band EHT data) where the spectral dependence of the source may be neglected across them. Additionally, lower limits on the intrinsic source size, α, induce correlations in the (u, v)-plane, meaning that neighboring points in the (u, v)-plane may also be compared. Expressed in EHT-relevant units, the typical distance within the (u, v)-plane over which the complex visibilities are strongly correlated is Δu ≈ 1.0 (α/200 μas)^(-1) Gλ. Within this scale, independent of the small-scale structure within the image, we may treat the visibilities as sampling similar aspects of the intrinsic source structure. In practice, we assume that the structure of the visibility may be modeled by a linear function within patches of this size, with the remaining suprathermal variations indicative of the combination of structural variability, refractive scattering, and residual systematic uncertainty.
Therefore, the strategy for obtaining a model-agnostic, data-driven characterization of variability at any given point (u_0, v_0) in the (u, v)-plane is as follows:

1. Select all data points across all relevant days and bands within a silo of diameter Δu = 1 Gλ about (u_0, v_0).
2. Detrend the selected data points with a linear model in (u, v) to disentangle the gross source structure from the fluctuations about it.

3. Compute debiased estimates of the mean of the trend and the variance of the residuals.

4. Average these estimates in azimuth.
The final step reduces the statistical uncertainty in the mean and variance, addresses sparse (u, v)-coverage, and facilitates direct comparisons with Section 2.
Insofar as the above strategy provides a measure of the variability in excess of that anticipated from the thermal errors, including the potential temporal correlations, it is directly comparable to the additional uncertainty required to fit static models to variable data sets. An additional debiasing step that addresses temporal correlations, necessary for comparison with theoretical expectations of the power spectrum of visibility amplitude fluctuations, is described in Section 4.5.

Data Selection
To avoid having to address the unknown station phase delays, i.e., an inherently uncertain self-calibration procedure in the absence of a known source model, we employ the visibility amplitudes. Visibility amplitudes constructed from complex visibilities with Gaussian (thermal) noise are distributed according to the Rice distribution, which is only approximately Gaussian. In the following analysis we therefore restrict our attention to scan-averaged data values and data points for which the signal-to-noise ratio is S/N ≳ 2, where the Rice distribution is approximated well by a Gaussian.
We combine the data sets from the high and low bands, centered on 227.1 and 229.1 GHz, each with 2 GHz bandwidth, and across all observation times, including through and across observation nights. All data are equally weighted, i.e., we do not attempt to address the different number and quality of data on different days. This combined data set is assumed to provide an ensemble of comparable observations from which we may estimate the desired statistical properties.
Station gains impose an important potential additional systematic uncertainty. In most cases, we marginalize over the credible range of station gains (see Section 4.3). However, there are three important preprocessing steps that we employ to reduce the impact of gain errors within the 2017 EHT observation campaign.
First, we employ the network calibration procedure described in M87 * Paper II to set the gain amplitudes at JCMT, SMA, ALMA, and APEX to within 1%.
Second, we calibrate the LMT gain amplitudes by fixing the visibility amplitudes on the LMT-SMT baseline to a Gaussian model with an FWHM of 60 μas, comparable to the second-moment estimates of M87* and Sgr A*. This does not eliminate uncertainty in the LMT gains, but rather reduces it to a combination of that associated with the SMT and the error in the applicability of the Gaussian model, eliminating the large potential swings (see, e.g., Figure 34 of M87* Paper IV). This calibration procedure does, however, artificially suppress the variance on baselines shorter than 2 Gλ, and thus we will focus our attention only on baselines long enough to be unaffected by the calibration procedure.
Third, to avoid significant losses at JCMT associated with poor phase coherence during scan averaging, prior to coherently averaging we phase-calibrate the JCMT to a point-source model.

Estimating Statistical Properties of a Single Patch
We begin with estimating the mean and variance of the visibility amplitudes within a radius Δu/2 of some point (u_0, v_0) within the (u, v)-plane, which we refer to as a "patch." Such estimates may be constructed at any point in the (u, v)-plane where a sufficient number of visibility amplitudes exist, i.e., N ≥ 4. For Δu = 1 Gλ, given the 2017 Sgr A* EHT campaign coverage, there are roughly 60 independent patches within |u| ≤ 9 Gλ containing sufficient data to generate local mean and variance estimates.

Detrending Visibilities
The visibility amplitudes within a patch will vary as a result of both variability and intrinsic structure. Because it is the former that is of interest here, to exclude the latter, we first estimate the linear trend across the patch. As long as Δu is sufficiently small, a linear model is an excellent approximation of the average visibility amplitudes across the patch.
The linear model is defined by

V(u, v) = V̄ + V_u (u − u_0) + V_v (v − v_0),

where (u_0, v_0) is the patch center. The detrending procedure is described in Appendix A. The reconstructions of the detrending parameters, V̄ and (V_u, V_v), are unbiased for Gaussian errors on the amplitudes. The fidelity of the trend reconstruction depends on the number and S/N of the amplitudes. The particular choice of (u_0, v_0) makes no significant change to the trend reconstruction.
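The linear detrending step amounts to a least-squares fit of the three parameters (V̄, V_u, V_v) within a patch. The sketch below uses unweighted ordinary least squares for brevity (the procedure described in Appendix A accounts for the measurement errors); the function and variable names are ours.

```python
import numpy as np

def detrend_patch(u, v, Vamp, u0, v0):
    """Fit the linear model Vbar + Vu*(u - u0) + Vv*(v - v0) to the
    visibility amplitudes in a patch. Returns the coefficients
    (Vbar, Vu, Vv) and the detrended residuals."""
    A = np.column_stack([np.ones_like(u), u - u0, v - v0])  # design matrix
    coef, *_ = np.linalg.lstsq(A, Vamp, rcond=None)
    resid = Vamp - A @ coef                # detrended amplitudes
    return coef, resid
```

With three fit parameters, a patch must contain at least four points for the residuals to carry any information, which is the origin of the N ≥ 4 requirement above.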
The outputs of this procedure are the mean amplitude within the patch, estimated at the patch center, V̄, and the detrended amplitudes within the patch, V̂_j. This is supplemented with the thermal errors at those locations, σ_th,j.

Statistical Estimators
Inherent within the detrended ensemble, {V̂_j}, is an estimate of the statistical properties of the structural fluctuations not accounted for by the linear trend. These are necessarily biased by the thermal fluctuations and structural variations, both of which must be removed.
We estimate the debiased variance by

σ² = (1/(N − 3)) | Σ_j ( V̂_j² − σ_th,j² ) |,    (1)

where the absolute value avoids pathologies associated with "over-debiasing," in which statistical fluctuations below σ_th,j² generate negative estimates of σ². The normalization of N − 3 is a consequence of having three degrees of freedom in the linear detrending model.
The debiased estimate of the mean is given by

μ = ( V̄² − σ² − Σ_j σ_th,j² / N )^(1/2),    (2)

where the mean estimate from the linear detrending procedure is corrected for the biases arising from the thermal and structural fluctuations, characterized by the patch variance estimate.
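The two estimators can be sketched compactly, taking the detrended residuals V̂_j and thermal errors σ_th,j as inputs. This is our own illustrative implementation of the procedure described in the text; the absolute value implements the guard against over-debiasing, and the exact form of the mean correction is schematic.

```python
import numpy as np

def debiased_patch_stats(Vbar, resid, sigma_th):
    """Debiased variance and mean estimates for a single patch.
    resid are the detrended amplitudes; sigma_th the thermal errors.
    The absolute value guards against 'over-debiasing' (negative
    variance estimates); N - 3 reflects the three linear-trend dof."""
    N = len(resid)
    sigma2 = np.abs(np.sum(resid**2 - sigma_th**2)) / (N - 3)
    # Correct the patch-mean amplitude for thermal + structural biases
    # (schematic form; the published expression may differ in detail).
    mu2 = Vbar**2 - sigma2 - np.mean(sigma_th**2)
    return np.sqrt(max(mu2, 0.0)), sigma2
```

For residuals whose variance genuinely exceeds the thermal floor, sigma2 converges to the excess (structural) variance as the number of points in the patch grows.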
To demonstrate these estimates, we generate a simple simulated data set composed of 10 complex visibilities, V_j, located at (u, v)-positions (u_j, v_j) randomly selected about a center (u_0, v_0). At each (u_j, v_j), the V_j is generated from a fixed linear model with V̄ = 1, V_u = 0.5, V_v = 0, and two complex zero-mean Gaussian random fluctuations, one representing the thermal uncertainty and the other the structural variability, both with amplitude 0.1. Repeating this procedure, holding the linear model and (u, v)-positions fixed, produces an ensemble of data sets, from which the distributions of the reconstructed μ and σ² are shown in red in Figure 2.
The estimate of μ is unbiased, with an accuracy that depends on the degree of variability and the number of visibility amplitudes. Even when only four points are used, this quantity is reasonably well behaved, appearing normal and centered on the "truth" value.
The distribution of the estimated variance is clearly not normal, matching a lognormal distribution more closely. The specific realization of the (u_j, v_j) does impact the distribution, with the degree of impact increasing as the number of amplitudes included decreases. When the impact of the structural variability is much smaller than the thermal errors, i.e., σ ≪ σ_th, the variance estimate can be inaccurate.

Error Estimates
We estimate the errors on the estimates μ and σ² for a single patch via a Monte Carlo procedure, Monte Carlo Errors (MCEs), that may be applied to the data directly. Doing so avoids the need to propagate the individual thermal errors through the detrending procedure and, as done in the following sections, permits the inclusion of additional sources of uncertainty (e.g., gain uncertainties, polarization leakage). This process begins with generating an ensemble of data sets produced by introducing two additional complex, zero-mean, unit-variance Gaussian random variables, x_j and y_j:

V′_j = V_j + σ_th,j x_j + σ y_j,

where σ² is the variance estimate from the observations. This procedure presumes that the thermal noise on the complex visibilities is Gaussian (well justified) and that the structural variations are independent (perhaps not well justified). Each generated set of V′_j is analyzed in a fashion identical to that described above: detrended, and μ and σ² estimated from Equations (2) and (1), respectively.
These must be further debiased to remove the impact of the original fluctuations inherent in the amplitude data; therefore, we generate an ensemble of estimates that are further debiased. The distribution of estimates from the MC ensemble is shown in blue in Figure 2. For both μ and σ², the MCE distributions are quantitatively similar to those generated from ensembles of simulated data. This similarity extends, for σ², to being biased toward large values. The additional debiasing, which eliminates the double counting of the induced and original fluctuations, does appear to satisfactorily remove any residual offsets between the true and reconstructed distributions.
We reiterate that this MCE procedure has the considerable virtue of being applicable to the actual data, providing a credible means by which to assign uncertainties to the patch-specific means and variances.
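The resampling-and-debiasing step can be sketched as below. This is a simplified illustration under stated assumptions: detrending is replaced by mean subtraction, the function name is hypothetical, and the debiasing constants follow the verbal description above rather than the elided equations.

```python
import numpy as np

rng = np.random.default_rng(7)

def mce_variance_draws(V, sigma_th, sigma2_est, n_draws=2000):
    """Monte Carlo Errors: perturb the observed complex visibilities with
    fresh thermal (x_j) and structural (y_j) Gaussian fluctuations,
    re-estimate the excess variance for each draw, and debias the
    double-counted induced and original fluctuations."""
    n = len(V)
    sig = np.sqrt(max(sigma2_est, 0.0))
    draws = np.empty(n_draws)
    for k in range(n_draws):
        x = rng.normal(size=n) + 1j * rng.normal(size=n)
        y = rng.normal(size=n) + 1j * rng.normal(size=n)
        Vp = V + sigma_th * x + sig * y
        resid = Vp - Vp.mean()                  # mean subtraction, not detrending
        var_tot = np.sum(np.abs(resid) ** 2) / (2 * (n - 1))
        # remove the injected thermal variance and the original fluctuations
        draws[k] = var_tot - 2 * sigma_th**2 - sigma2_est
    return draws

# Mock patch with known thermal and structural noise levels
n, sig_th, sig_s = 50, 0.1, 0.2
V = (1.0
     + rng.normal(0, sig_th, n) + 1j * rng.normal(0, sig_th, n)
     + rng.normal(0, sig_s, n) + 1j * rng.normal(0, sig_s, n))
resid = V - V.mean()
sigma2_est = np.sum(np.abs(resid) ** 2) / (2 * (n - 1)) - sig_th**2
draws = mce_variance_draws(V, sig_th, sigma2_est)
```

The spread of `draws` then serves as the uncertainty on the patch variance estimate; its mean recovers the original estimate after debiasing.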

Azimuthally Averaged Statistical Properties
The procedure outlined above may be used to produce estimates of the variance arising from structural variations in the source throughout the (u, v)-plane. However, in many instances, and explicitly for the 2017 EHT observations (M87* Paper III), the baseline coverage is very sparse. Combined with the expectation that the power spectrum of the structural variability is only modestly dependent on position angle, this sparsity motivates the construction of azimuthally averaged estimates of μ and σ², which are then functions of only the baseline length.

Averaged Statistical Estimators
To generate these, we first produce a polar grid of patch centers, in which the radial positions, ρ_s, are linearly spaced with separation Δu/2, so that neighboring patches overlap. At each radial position, a set of azimuthal positions φ_st are chosen so that the azimuthal spacing is Δu/2, providing a set of polar patch positions (ρ_s, φ_st). At each polar patch position, we generate estimates of the mean and variance of the detrended visibility amplitudes, μ_st and σ²_st. The location of these patches is illustrated in Figure 3. Each data point will contribute to multiple patches.
Azimuthally averaged quantities are generated by weighting each patch's estimate by the number of degrees of freedom, with patches containing too few points excluded via the standard Heaviside function, Θ(x). Because the variance estimates appear to be lognormal, and thus vary over many orders of magnitude, we average log(σ²) rather than σ² itself. Note that this estimate of σ²_s emphasizes those patches that have many data points and therefore are likely to be more accurate.
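The averaging step can be sketched as follows. The degree-of-freedom weights (n − 3 per patch, reflecting the three detrending parameters) and the minimum-points cut are plausible choices for illustration, not the paper's exact expressions.

```python
import numpy as np

def azimuthal_average(mu_st, var_st, n_st, n_min=4):
    """DOF-weighted azimuthal average at a single baseline length.
    Patches with fewer than n_min points are excluded (the Heaviside
    cut); the variance is averaged in the log because its distribution
    is approximately lognormal."""
    mu_st, var_st, n_st = map(np.asarray, (mu_st, var_st, n_st))
    keep = (n_st >= n_min) & (var_st > 0)
    w = n_st[keep] - 3.0                  # residual degrees of freedom
    mu_s = np.sum(w * mu_st[keep]) / np.sum(w)
    var_s = np.exp(np.sum(w * np.log(var_st[keep])) / np.sum(w))
    return mu_s, var_s

mu_s, var_s = azimuthal_average(
    mu_st=[1.0, 1.2, 0.9], var_st=[1e-2, 4e-2, 1e-2], n_st=[10, 10, 3])
# the third patch (3 points) is dropped; var_s is a weighted geometric mean
```

With equal weights on the two surviving patches, the averaged variance is the geometric mean of their estimates, as expected for log-averaging.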

Averaged Error Estimates
Neighboring grid points are not necessarily independent, and thus the uncertainty in the combination is expected to be significantly larger than would be inferred if they were. Therefore, the Monte Carlo estimates of the error in μ_s and σ²_s must be performed after the azimuthal averaging. Residual gain errors present an additional source of uncertainty on the reconstructed means and variances. Thus, the MCEs for the azimuthally averaged data sets follow a similar but distinct procedure to that for the individual patches.
We begin by constructing estimates of μ_s and σ²_s. With these, we generate an ensemble of data sets as described by Equation (3). Each ensemble element is subsequently modified by introducing gain amplitude fluctuations, producing a correlated fluctuation across the data set. For application to the 2017 EHT Sgr A* campaign, we adopt gain uncertainties consistent with those reported in Paper III. For sites for which network calibration is performed (ALMA, APEX, JCMT, and SMA) we assume a gain amplitude error of 1%; all other sites are assigned a gain amplitude error of 10%. These additional gain fluctuations are assumed to be normally distributed and applied on a per-scan basis. Finally, we normalize the ensemble data by the correlated flux on the intrasite baselines (ALMA-APEX, JCMT-SMA) as an approximation to the light curve.
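The per-scan gain-fluctuation step can be sketched as below. The station names and the 1%/10% gain errors follow the text; the data layout (`scan_id`, `st1`, `st2` arrays) is a hypothetical convention, and gain phases are ignored here.

```python
import numpy as np

rng = np.random.default_rng(1)

NETCAL = {"ALMA", "APEX", "JCMT", "SMA"}  # network-calibrated sites

def apply_gain_fluctuations(V, scan_id, st1, st2):
    """Multiply each visibility by per-scan, per-station gain-amplitude
    fluctuations: 1% for network-calibrated sites, 10% otherwise.
    Gains are drawn once per (scan, station) pair, so visibilities
    sharing a scan and baseline receive identical, correlated factors."""
    gains = {}
    out = np.array(V, complex)
    for j, (s, a, b) in enumerate(zip(scan_id, st1, st2)):
        for st in (a, b):
            key = (s, st)
            if key not in gains:
                err = 0.01 if st in NETCAL else 0.10
                gains[key] = 1.0 + rng.normal(0.0, err)
        out[j] *= gains[(s, a)] * gains[(s, b)]
    return out

V = np.array([1 + 0j, 1 + 0j, 2 + 0j])
out = apply_gain_fluctuations(V, scan_id=[0, 0, 1],
                              st1=["ALMA", "ALMA", "SMT"],
                              st2=["APEX", "APEX", "SPT"])
# rows 0 and 1 share a scan and baseline, so they receive identical gains
```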

Estimating Noise Model Parameters
We ultimately seek to compare the empirical noise estimates with those found in GRMHD simulations by Georgiev et al. (2022). To facilitate this comparison, we fit the azimuthally averaged σ²_s with a broken power-law model, consistent with that found to be applicable to GRMHD simulations. Because the azimuthally averaged variance estimates are approximately lognormal, we perform this fit on the logarithm of the reconstructed σ²_s. We find that the variance estimates themselves are susceptible to correlated errors, presumably due to the correlations in the fluctuations induced by noise imposed within the image domain. Therefore, we inflate the MCE estimates on the σ²_s by a factor of two, which safely encapsulates the discrepancy in the numerical validation experiments described in the following sections. In the resulting log-likelihood, err_s is the geometric mean of the two-sided 1σ quantiles from the MCE ensemble for σ²_s. The noise model parameters, {a, u_0, b, c}, are estimated by sampling this likelihood using the Python-based ensemble Markov Chain Monte Carlo package emcee (Foreman-Mackey et al. 2013). Typically, 10^5 steps are taken by 128 walkers, of which 10^4 are randomly selected from the latter half of the ensemble of chains. Beyond this point, the joint posteriors of all of the noise parameters are insensitive to the number of MCMC steps taken.
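The fitting step can be illustrated as follows. The smoothly broken power law below is a hypothetical stand-in for Equation (7) (rising as |u|^b below the break u_0 and falling as |u|^−c above it), and a minimal Metropolis sampler stands in for the emcee ensemble sampler used in the paper; the Gaussian likelihood in log-variance mirrors the description above.

```python
import numpy as np

def broken_power_law(u, a, u0, b, c):
    """Hypothetical smoothly broken power law standing in for Eq. (7)."""
    x = np.asarray(u) / u0
    return a * x**b / (1.0 + x**(b + c))

def log_likelihood(theta, u, log_var, log_err):
    """Gaussian likelihood in log-variance with (inflated) MCE errors."""
    a, u0, b, c = theta
    if a <= 0 or u0 <= 0:
        return -np.inf
    model = np.log(broken_power_law(u, a, u0, b, c))
    return -0.5 * np.sum(((log_var - model) / log_err) ** 2)

def metropolis(logp, theta0, steps, scale, rng):
    """Minimal Metropolis MCMC (stand-in for emcee's ensemble sampler)."""
    theta, lp = np.array(theta0, float), logp(theta0)
    chain = np.empty((steps, len(theta0)))
    for k in range(steps):
        prop = theta + scale * rng.normal(size=theta.size)
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        chain[k] = theta
    return chain

# Synthetic azimuthally averaged variances drawn from a known truth model
rng = np.random.default_rng(3)
truth = (4e-2, 2.0, 2.0, 2.0)                  # (a, u0, b, c)
u = np.linspace(2.0, 8.0, 13)                  # baseline lengths, Glambda
log_var = np.log(broken_power_law(u, *truth))
log_err = np.full_like(u, 0.3)
logp = lambda th: log_likelihood(th, u, log_var, log_err)
chain = metropolis(logp, truth, steps=2000, scale=1e-3, rng=rng)
```

In practice the ensemble sampler explores the joint posterior of {a, u_0, b, c}; the sketch above only demonstrates the likelihood geometry.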

Validation
We explore a number of validation exercises that follow the validation ladder laid out in Section 3. In summary, within the region for which we expect high-fidelity characterization of the structural variation, we find that we are able to do so. This encompasses baselines such that 2 Gλ < |u| < 6 Gλ, where baselines shorter than 2 Gλ are excluded by the LMT calibration procedure, and at baselines longer than 6 Gλ the structural variability is subdominant to the thermal noise.
In each validation exercise, the simulated data sets were processed as described in Section 4.1. The resulting data sets are then analyzed as described above with Δu = 1 Gλ. Note that we show results for a baseline length spacing of 0.5 Gλ, and thus subsequent values are formally correlated.
These are compared to the azimuthally averaged mean and variance of the visibilities computed directly from "truth" movies, without any attempt to restrict them to any subset of the (u, v)-plane, that is, without any of the constraints imposed by the actual observations.

iNoisy Validation Results
The images used as part of the validation scheme for the various image reconstructions in Paper III provide a natural set of simple examples in which the fluctuations are introduced in the image domain. In each, a geometric brightness map is modified by the imposition of a set of correlated fluctuations via the iNoisy prescription, in which temporally and spatially correlated brightness fluctuations are constructed in a fashion that mimics the turbulent structures within accretion disks (Lee & Gammie 2021).
This iNoisy method provides substantial control over the properties of the image variability while separating it from the complexity of the source structure. The structural variability induced by iNoisy in these images is tuned to match that exhibited by Sgr A*, as measured by the light curve and variations in the closure phase as quantified via the Q-metric, a noise-weighted measure of the closure phase variability described in Roelofs et al. (2017). The construction of the simulated iNoisy data sets is described in detail in Section 5 of Paper III. However, the apparent spatial correlations are typically confined to smaller scales than those characteristic of the geometric model.
The iNoisy data sets that we employed during the validation of the statistical estimators include a crescent, ring, double source, simple disk, elliptical disk, and ring with a 30-minute-period hot spot (see Paper III for further details). Each "truth" movie was resampled on each of the 4 days with different thermal noise, complex gain, scattering, and structural fluctuation realizations. Three examples, the crescent, point source, and elliptical disk, are shown in Figure 4.
In all cases, the estimated azimuthally averaged mean and variance maps match those associated with the "truth" movies well. The LMT calibration procedure does produce artifacts at |u| ≲ 2.5 Gλ, extending marginally above that. This is especially visible in the estimated variances. However, it is also visible in the estimated means, which hew closely to the 60 μas FWHM Gaussian assumed during the LMT calibration.
For |u| ≳ 6 Gλ, the structural variability is modestly subdominant to the thermal errors, as indicated by their average values. As anticipated, there are modest departures from the truths in this region.
The consistency extends to the range of fit broken power-law noise models. Because the break in the truth variances occurs below |u| ≈ 2.5 Gλ, there is little traction on the break location and the short-baseline power-law index. However, both the overall amplitude and the long-baseline power-law index are well constrained.

GRMHD Validation Results
GRMHD simulations provide a credible proxy for the source structure of astronomical sources on horizon scales (Paper V; Georgiev et al. 2022). While there is a limited range of potential image morphologies within the GRMHD libraries, they do provide a physically motivated and self-consistent set of intensity fluctuations. Importantly, these models exhibit the spatial and temporal correlations that are anticipated in accreting and jet-launching environments.
Each "truth" movie was resampled with (u, v)-coverage identical to that of Sgr A* during the 2017 EHT campaign on each of the four observation days with different realizations of thermal noise, complex gain, and scattering. The simulated data sets share many of the features of those associated with the iNoisy sets: the images are complex, contain significant image-domain correlations that translate into corresponding visibility-domain correlations, and contain an underlying noise spectrum that is not known a priori. The latter fact motivates a comparison to the data-driven nonparametric estimates from Section 4.
However, the GRMHD simulation data sets do differ in two important respects. First, they exhibit a much wider array of behaviors, extending from small-scale fluctuations like those in the iNoisy sets to large-scale arcing structures that differ from the iNoisy images. These arcing features induce long-timescale correlations that are not otherwise present in the iNoisy sets. Second, there is no attempt to restrict the simulations to those that reproduce the variability observed in Sgr A*. As a result, the variability present in the GRMHD simulations is typically larger than that exhibited by Sgr A* (Paper V).
Three typical GRMHD validation examples are shown in Figure 5. As with the iNoisy validation examples, the method accurately reconstructs the underlying truth variances associated with the full GRMHD movies for 2.5 Gλ < |u| < 6 Gλ, limited at shorter baselines by the LMT calibration procedure and at longer baselines by the thermal noise.
These are representative of 100 such GRMHD simulation comparisons.
As with the iNoisy models, the break in the truth variances occurs below |u| ≈ 2.5 Gλ, and therefore only the overall amplitude and long-baseline power-law index are well constrained.

Debiased Variability Measurement
Thus far, no attempt has been made to address clear artifacts in the normalized variance estimates in the examples shown in Figures 4 and 5 that are due to unaddressed temporal correlations in the simulated data sets. These correlations are real and anticipated in the actual Sgr A* data, arising from the fact that on a single night neighboring positions in the (u, v)-plane are observed at nearby times. Insofar as the variance estimate is used to quantify the excess noise required to accommodate fitting stationary models (including images, geometric models, and physical models) to data sets composed of a full observation night, the prior scheme is sufficient. However, it is possible to envision a debiasing procedure in which these correlations are corrected, which we present here.
The temporal-correlation artifacts are most prominent between 6 and 7.5 Gλ, within which, for every example case shown in Figures 4 and 5, the normalized variance estimates exhibit a deficit of variability. This range of baseline lengths is especially sparsely sampled by the 2017 Sgr A* baseline coverage, with only one baseline contributing on most days (see Figure 1). For models in which the stochastic image fluctuations exhibit red temporal power spectra, as anticipated by GRMHD models of Sgr A* and observed in the light curve of Sgr A*, this sparsity implies a strong correlation among the visibility amplitudes within (u, v)-bins within this baseline length range; as shown in Appendix B for a simplified case, such correlations suppress the recovered variance. In contrast, note that between 7.5 and 9 Gλ there are five separate baselines, four of which are "following baseline pairs" (LMT-SPT/SMA-SPT and SMT-SPT/PV-SPT) in which the two baselines traverse the same region in the (u, v)-plane but at times separated by hours (Paper IV). Thus, within this baseline range, there are more statistically independent samples than implied by the number of apparent baselines, and many more than between 6 and 7.5 Gλ. Correspondingly, the accuracy of the variance estimates improves dramatically beyond 7.5 Gλ.
The magnitude of the bias, β, depends on the temporal power spectrum of the fluctuation spectra and thus is entangled with the statistical property we wish to measure. In principle, we could construct an a priori debiasing function, β⁻¹(|u|), for a given set of observations as a function of baseline length and the temporal power spectrum of the putative fluctuations. In practice, it is simpler to utilize the libraries of simulated data to effectively "measure" the debiasing factor for specific classes of models that share similar temporal properties. We present such an empirical calibration here for the iNoisy and GRMHD models, of which the latter is expected to be more physically relevant and is utilized in Papers IV and V.
The procedure to estimate the debiasing functions is straightforward:
1. Generate an ensemble of simulated observations matching the 2017 EHT Sgr A* campaign.
2. Estimate the variance as described in Sections 4.1-4.3.
3. Construct the anticipated power spectrum, P_s, from the "truth" movie as described in Section 2 on each day.
4. Generate an estimated debiasing factor for each baseline length bin by minimizing the joint χ² among all models, observation days, and the estimates of the variance and its error.

We generate an estimate of the debiasing function using the 90 simulations employed in the M/D calibration in Paper IV. This set of mock observations is generated from GRMHD simulations appropriate for Sgr A*, each of which is composed of simulated visibility data for both bands and all days of the reported 2017 Sgr A* EHT campaign. The result is shown in Figure 6 and follows the pattern anticipated by the sparsity of the baselines: baseline lengths with few/many baselines contributing have large/small corrections.
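Step 4 admits a closed-form weighted least-squares solution per baseline-length bin, sketched here under the assumption (not spelled out in the excerpted χ²) that a single multiplicative factor β_s rescales the variance estimate to match the truth power spectrum:

```python
import numpy as np

def debias_factor(var_est, var_err, p_truth):
    """Per-bin debiasing factor beta minimizing
    chi^2 = sum_k ((beta * var_est_k - p_truth_k) / var_err_k)^2
    over all models and observation days, k, contributing to the bin.
    Returns the weighted least-squares solution."""
    w = 1.0 / np.asarray(var_err, float) ** 2
    v = np.asarray(var_est, float)
    p = np.asarray(p_truth, float)
    return np.sum(w * v * p) / np.sum(w * v * v)

# If every mock observation underestimates the truth by half,
# the recovered correction is a factor of two:
beta = debias_factor([1.0, 2.0, 3.0], [0.1, 0.1, 0.2], [2.0, 4.0, 6.0])
```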
The debiased variance estimates for the three GRMHD examples shown in Figure 5 are presented in Figure 7. These three simulations are taken from the 10 simulations used to validate the M/D calibration and are therefore distinct from those within the ensemble used to estimate the debiasing function. They show excellent agreement with the fluctuation power spectrum obtained directly from the simulated images.

General Application: Limiting Assumptions and Potential Extensions
While we demonstrate and validate the data-driven variability characterization presented in this section using the 2017 EHT Sgr A * campaign, the technique is more widely applicable, albeit subject to some nominal limitations.
First, this method relies on the existence of a sufficient density and resampling of visibilities within the (u, v)-plane. In the absence of more than three points per patch, it is not possible to reconstruct the detrended excess variance. In the absence of many such patches, the resulting estimates are highly uncertain.
Second, the debiasing procedure outlined in Section 4.2 only results in accurate estimates of the excess variance and its uncertainty when the former exceeds the variation due to the intrinsic thermal uncertainty.
Third, the method does not account for correlations that may be present among different regions in the (u, v)-plane. While not a formal limitation on the method's application, this nevertheless limits the interpretation of the reconstructed variances.
This method also makes a handful of specific choices that may be relaxed. The assumption of azimuthal symmetry, which was exploited to improve the precision of the variance estimates, may be removed when the (u, v)-coverage is sufficiently complete. The debiasing function presented in Section 4.5 was constructed for a specific set of observations and an anticipated mock image ensemble. An analogous debiasing function could be separately reconstructed for a different application, given an appropriate set of observations and mock images from which to derive it.
Finally, the interpretation of the reconstructed variances depends on the nature of the structural variability. Where it is well described by a set of stochastic fluctuations about a mean image, the measured variability may be directly interpreted as constraining the power spectrum of the fluctuations. Where the variability is associated with coherent motions, the interpretation is more complicated.

Bayesian Noise Reconstruction
Implicit in the noise estimates obtained in Section 4 is a model for the source structure and its variability that results in smoothly varying visibility amplitudes and noise that is stationary across the multiple observation days. No model for the station gains is assumed, and no effort to utilize the phase information is made. However, it is possible to simultaneously reconstruct the average source structure and the additional noise associated with the variability within standard maximum likelihood analysis schemes.
For Gaussian likelihoods, i.e., assuming Gaussian errors (as is appropriate when fitting complex visibilities), the log-likelihood is ln L = -(1/2) ΔV† C⁻¹ ΔV - (1/2) ln det(2πC), where ΔV is the difference between the observed and model complex visibilities and C is the covariance. This differs from the usual quadratic expression by the inclusion of the logarithmic normalization term, which will be critical in what follows.
The essence of Bayesian noise reconstruction is that, in addition to V̂(u, v), the data covariance C is also replaced by some parameterized model whose parameters are jointly optimized with those from the visibility model. In this strategy, the logarithmic normalization is critical, as it penalizes excess noise. That is, the optimal joint source and noise model is one that has just enough noise to properly normalize the log-likelihood, but no more. As described in Section 3, in what follows we will assume that C is diagonal, subject to the assumptions and limitations described there. In this case the normalization term reduces to a sum over -ln(2πσ_j²), where σ_j² is the jth diagonal element of C, corresponding to the effective uncertainty associated with the jth complex visibility.
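The role of the normalization term can be demonstrated directly. In this minimal sketch (diagonal covariance, per-component variance σ_j² for each complex visibility), inflating the noise beyond what the residuals require lowers the log-likelihood, as does underestimating it:

```python
import numpy as np

def gaussian_loglike(V_obs, V_model, sigma2):
    """Diagonal Gaussian log-likelihood for complex visibilities:
    the quadratic term plus the log-normalization that penalizes
    excess noise. sigma2 is the per-component variance."""
    dV2 = np.abs(V_obs - V_model) ** 2
    return np.sum(-dV2 / (2.0 * sigma2) - np.log(2.0 * np.pi * sigma2))

rng = np.random.default_rng(11)
n = 1000
V_model = np.ones(n, complex)
V_obs = V_model + 0.1 * (rng.normal(size=n) + 1j * rng.normal(size=n))

ll_true = gaussian_loglike(V_obs, V_model, 0.01)   # matched noise level
ll_over = gaussian_loglike(V_obs, V_model, 1.0)    # inflated noise
ll_under = gaussian_loglike(V_obs, V_model, 1e-4)  # underestimated noise
```

The matched noise level maximizes the log-likelihood: the quadratic term punishes too little noise, while the normalization term punishes too much.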

Implementation of Uncertainty Modeling
Multiple specific realizations of the noise modeling procedure outlined above have been implemented in the THEMIS analysis framework (Broderick et al. 2020a). In each case, the "noise" model, or uncertainty model, is characterized by a parameterized function that is added in quadrature to the existing uncertainties on the complex visibilities. That is, for the jth complex visibility we set its effective uncertainty to σ_j² = σ²_th,j + σ²(u_j, v_j; q), where σ²_th,j is the reported thermal error and σ²(u_j, v_j; q) is the additional component evaluated for some set of parameters q. The implementations differ in the particular form of σ²(u_j, v_j; q). The most general uncertainty model employed is constructed from three subcomponents.
The first is a systematic term associated with the residual nonclosing errors that is proportional to the visibility amplitudes, with a fractional coefficient f whose anticipated values are of order 1% (Paper II).
The second is a refractive scattering term that accounts for the long-baseline fluctuations resulting from the refractive scintillation toward the Galactic center (see, e.g., Johnson et al. 2018; Issaoun et al. 2021). Predictions for the amplitudes of the fluctuations due to refractive scattering are typically only weakly dependent on baseline length beyond |u| ≈ 1 Gλ (Paper III). Therefore, we incorporate a term, σ_ref², that imposes a uniform ((u, v)-independent) floor, where typical values of σ_T can be as high as 0.4%. This term only ever contributes significantly at longer baselines, when the other terms have fallen below σ_ref². Finally, the third is a contribution that accounts for the structural variability, set by n(|u|) as defined in Equation (7). While this specific functional form is motivated by GRMHD simulations (see Section 2), it is sufficiently flexible to encompass a wide variety of potential structural variability. The combined uncertainty model is then the quadrature sum of these three contributions, of which some or all parameters can be fixed, which is useful when modeling only a subset of the model components. This is then inserted into Equation (12) to obtain the uncertainties on the real and imaginary components of the complex visibilities.
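Gathering the three subcomponents, a sketch of the combined per-visibility noise budget follows. The default parameter values are the representative numbers quoted in the text (f ≈ 1%, σ_T up to 0.4%, and the case-3 validation values below), and the variability term again uses a hypothetical broken power law standing in for the n(|u|) of Equation (7).

```python
import numpy as np

def effective_sigma2(sigma_th2, V_amp, u_len, f=0.01, sigma_ref=0.004,
                     a=0.05, u0=2.0, b=3.0, c=2.0):
    """Effective per-visibility variance: thermal error + nonclosing
    systematic (f|V|)^2 + refractive floor sigma_ref^2 + structural
    variability, added in quadrature. The variability term is a
    hypothetical stand-in for Equation (7)."""
    x = np.asarray(u_len, float) / u0
    sigma_var = a * x**b / (1.0 + x**(b + c))
    return sigma_th2 + (f * V_amp) ** 2 + sigma_ref**2 + sigma_var**2

# With all additional terms switched off, only the thermal error remains:
s2 = effective_sigma2(0.01, 1.0, 4.0, f=0.0, sigma_ref=0.0, a=0.0)
```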

Validation
We report three sets of validation experiments of the Bayesian noise modeling implemented in THEMIS. These correspond to the different validation levels described in Section 3.3. In each, we demonstrate that the average source structure and noise power spectrum can be faithfully recovered.
All of the analyses presented here are performed in a manner analogous to the THEMIS-based analyses described in Papers III and IV. Specifically, they make use of the scan-averaged complex visibilities, and the LMT and JCMT pre-processing calibration steps described in Section 4.1 are applied. The complex station gains are reconstructed simultaneously with the image reconstruction.

Gaussian Noise Reconstructions
We begin with simulated data sets produced from a simple source structure and independent Gaussian fluctuations added directly to the visibilities. Note that in this set of tests there is no guarantee that the associated image is positive definite. However, the manner in which additional noise has been added to the data matches exactly that being modeled.
Simulated data sets are produced with baseline coverage and uncertainties similar to the 2017 April 7 EHT observations of Sgr A* for a Gaussian brightness map with a total flux of 1 Jy and an FWHM of 50 μas. Various assumptions are made about the inclusion of additional fluctuations, which are independently generated for the systematic, refractive, and variability components. We consider three cases here:
1. No additional noise is added: the true values of the relevant parameters are f = 0, σ_T = 0, and σ_var = 0.
2. Only systematic and refractive noise is added: the true values of the relevant parameters are f = 1%, σ_T = 0.7%, and σ_var = 0.
3. The full noise model is added: the true values of the relevant parameters are f = 1%, σ_T = 0.7%, a = 5%, u_0 = 2 Gλ, b = 3, and c = 2.
These are then fit with a Gaussian brightness map model, with and without employing the relevant model for the additional noise.
Figure 8 clearly indicates the consequences of failing to account for the additional fluctuations in the underlying data.
When no additional noise is included, the posteriors on the total flux and the FWHM faithfully recover the true values. However, when additional sources of noise are included but unmodeled, large biases are induced, i.e., many-σ deviations from the truth. Mitigating precisely these biases is the focus of this paper and critical to Papers III and IV.
In contrast, the joint posteriors when the model being fit includes the appropriate noise model are shown in Figure 9. In this case, the parameters of the image structure are faithfully recovered, albeit with an appropriate inflation of the uncertainty. In addition, the parameters of the underlying noise model are faithfully recovered as well.
A similar experiment is performed for a more complicated source model that provides good fits to the source structure in the 2017 EHT Sgr A* data (Paper IV), specifically, the "m-ring" construction from Johnson et al. (2020), in which the truth image is a narrow ring with an azimuthal flux distribution decomposed into a truncated Fourier series and then convolved with a circular Gaussian. The model parameters are the total flux, I, ring radius, R, Fourier mode amplitudes and phases, {A_m, φ_m}, and FWHM of the Gaussian smoothing kernel (for the detailed model specification see Section 4 of Paper IV, though note that we do not include the additional Gaussian component employed there for this test). For this test, we truncate the Fourier series at m = 2. Data are generated in a similar fashion to that for the Gaussian test and analyzed with and without the Bayesian noise modeling.
The top right panel of Figure 10 indicates the necessity of including the Bayesian noise reconstruction: when the excess noise is not modeled (open points), the structural parameters are biased by nearly 100σ in some cases; in contrast, including the noise modeling yields quantitatively accurate parameter estimates. Joint posteriors for the data set in which all additional sources of noise are added are shown in Figure 10. As with the simpler Gaussian image structure, the Bayesian noise reconstruction permits an accurate simultaneous recovery of the structural and noise parameters. This improvement is in part associated with the inflation of the parameter posterior widths resulting from the inclusion of additional contributions to the uncertainties of the individual complex visibilities.
We conclude that Bayesian noise reconstruction satisfactorily mitigates the presence of additional variability when directly applied to the complex visibilities.

iNoisy Reconstructions
The iNoisy data sets used to validate the image reconstructions in Paper III, described at the beginning of Section 4.4.1, provide a natural set of validation tests. Despite the artificial nature of the simulated images, there are some salient differences from the tests in the preceding subsection. First, the images are more complex, and we apply a more complicated source model, an adaptive bicubic raster model appropriate for image reconstruction (see Broderick et al. 2020b; Paper III), with an associated increase in model complexity. Second, because fluctuations are applied directly to the image, significant correlations are induced among the complex visibilities. Third, the underlying noise spectrum is not known a priori, and therefore the specific quantitative reconstructions of the noise parameters cannot be compared to some truth in the form shown in Figure 9. Thus, we compare to the data-driven nonparametric estimates from Section 4.
Simultaneous image and noise reconstructions are shown in Figure 11 for three examples of the iNoisy simulated data sets. The morphology of the reconstructed images faithfully matches that of the mean images in the underlying truth. A gross sense of the breadth of the posterior is indicated by the samples shown.
The normalized variance in each instance is shown in the bottom panels of Figure 11. The data-driven, nonparametric estimates serve two purposes in these analyses. First, they inform the priors on the noise model, which employ the data-driven amplitude at |u| = 4 Gλ and the long-baseline power-law index b. However, these are weakly informative, predominantly setting a lower bound on the noise. The range of normalized variances associated with the permitted noise models is reflected by the blue shaded regions. To prevent overinterpretation of the simulated data, we intentionally impose strong lower limits on the noise via these priors; only very weak, noninformative upper limits are enforced.
Second, the data-driven, nonparametric estimates provide a comparison by which to assess the posteriors on the parameters of the noise models, shown by the orange shaded regions in Figure 11. These posteriors are well matched to the data-driven estimates, both within and beyond the range over which the latter are well constrained. Note that the assumed priors include many values of the noise parameters for which this is not the case; in particular, parameters that exhibit large degrees of excess noise are strongly disfavored by the data.

GRMHD Simulation Reconstructions
Finally, we turn to GRMHD simulations, which exhibit physically motivated stochastic and coherent substructures in the simulated images. The simulated GRMHD data sets described in Section 4.4.2 for the April 6 and 7 (u, v)-coverage of the 2017 EHT Sgr A* campaign are used.
We perform two sets of validation experiments with the GRMHD simulation data sets, distinguished by the image model employed. In both sets of experiments we employ the same procedure to set the priors on the noise model using the data-driven, nonparametric estimates as that employed during the iNoisy tests. These priors are reflected in the blue shaded regions in Figures 12 and 13.
In the first, we reconstruct the simulated data sets with a parameterized ring model appropriate for comparison with the compact ring structures seen in M87* and Sgr A*: the mG-ring model described in Paper IV, a simple extension of the m-ring model introduced in Section 5.2.1 that differs only by the addition of a central Gaussian component to probe the magnitude of the central flux depression. This is similar to the geometric ring reconstructions of the Sgr A* data in Paper IV.
Unlike the test in Section 5.2.1, we do not have an unambiguous set of mG-ring-parameter truth values to validate against, and thus we limit ourselves to general image comparisons. The reconstructed images in Figure 12 are similar to those of the underlying average truth, blurred to the ostensible resolution of 15 μas, limited by the restricted freedom of the mG-ring model to produce radially asymmetric brightness profiles. In all cases, the reconstructed noise posteriors match those estimated by the data-driven nonparametric estimates. Specifically, the amplitudes near 4 Gλ and the power-law index between 2.5 and 6 Gλ are well reproduced.
The second validation experiment mirrors that of the iNoisy demonstration: we utilize an adaptive bicubic raster model appropriate for image reconstruction (Broderick et al. 2020b). This directly supports the Sgr A* image reconstructions presented in Paper III and is a procedure similar to that employed by the THEMIS reconstructions presented therein. The results for the same three GRMHD simulation data sets are shown in Figure 13. As found in the iNoisy case, the morphology of the reconstructed image faithfully matches that in the average truth after convolving both with 15 μas beams. The reconstructed noise estimate posteriors, shown by the orange bands, are again consistent with those from the data-driven nonparametric estimates. In particular, the amplitude at 4 Gλ and the long-baseline power-law index of the excess normalized variance are faithfully recovered.
The success found in both sets of experiments is notable given the much larger degree of additional noise present (by 1-2 orders of magnitude) than in the previous iNoisy experiments. More importantly, the noise estimates from the two experiments are also consistent with each other, indicating robustness to variations in the underlying image model specification.

General Application: Limiting Assumptions and Potential Extensions
We have made a number of specific choices regarding the form and properties of the noise models, with the goal of applying the noise modeling technique to horizon-resolving black hole imaging analyses. However, there are many other potential natural applications, e.g., the characterization of variability in microquasars and of subparsec radio emission from astrophysical jets, that may be accommodated with a handful of minor extensions.
The assumed broken power-law form of the noise model in Equation (7), while well justified for GRMHD accretion simulations and reasonably generic, may not apply in general. However, there is no obstacle to implementing alternative functional forms for the noise. Such alternatives could, e.g., relax the assumption of azimuthal symmetry by including a parameterized azimuthal structure. Similarly, the assumption of vanishing off-diagonal terms in the covariance, which may be generated, e.g., by coherent variations in the image structure, may also be relaxed; doing so would require a model for the particular structure of the anticipated off-diagonal terms.
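For concreteness, a minimal sketch of an azimuthally symmetric broken power-law excess-noise profile of the kind assumed here might look like the following. The parameter names (a, u0, b, c) follow the noise model discussed in this paper, but the smooth-break functional form and the normalization at the 4 Gλ reference baseline are illustrative assumptions, not the exact Equation (7):

```python
import numpy as np

def excess_noise(u, a, u0, b, c):
    """Illustrative azimuthally symmetric broken power-law noise profile.

    u  : baseline length(s) in Gλ
    a  : amplitude at the 4 Gλ reference baseline (an assumed normalization)
    u0 : break location in Gλ
    b  : short-baseline (rising) power-law index
    c  : long-baseline (falling) power-law index
    """
    def shape(x):
        # Rises as (x/u0)^b below the break and falls as (x/u0)^-c above it.
        x = np.asarray(x, dtype=float)
        return (x / u0) ** b / (1.0 + (x / u0) ** (b + c))

    return a * shape(u) / shape(4.0)
```

An azimuthally asymmetric variant could simply promote a (or u0) to a parameterized function of the baseline position angle.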
An important limitation of the procedure as presented is the lack of any model for temporal correlations. Again, this limitation might be relaxed at the expense of specifying an appropriate model. However, in the absence of such an extension to the model, we retain a requirement on the data: a sufficient number of statistically independent samples of the complex visibility function must be present at all locations in the (u, v)-plane. For Sgr A*, this requirement precludes single-day analyses, for which subhour temporal correlations can bias the noise model reconstructions significantly in the absence of additional source realizations. Combining at least 2 days, and thereby better conditioning the data to the underlying assumption of the model, is sufficient to mitigate these biases in the applications to the 2017 EHT Sgr A* campaign.
As with the data-driven variance estimation, the interpretation of Bayesian noise modeling depends on the origin of the underlying variability. Where it is due to stochastic fluctuations about a mean, the reconstructed noise model is again a direct measurement of the spatial power spectrum. Where coherent secular motions dominate the variability, the interpretation is more complicated. This ambiguity is similar to that surrounding the meaning of the reconstructed mean image.

Application to M87 *
As a first science demonstration we apply the data-driven variability characterization and Bayesian noise reconstruction to the 2017 EHT observations of M87*. The estimated gravitational timescale of M87* is 9 hr, and thus little intraday variability is expected, with the possibility of interday variability across the 7 days of the 2017 EHT M87* campaign (M87* Paper II; Satapathy et al. 2022). This expectation is supported by the image and model reconstructions of M87*, which show no significant variation from April 5 to 6 and from April 10 to 11, and modest variations between the two pairs of observing days (M87* Paper IV; M87* Paper VI).
The analyses presented here differ from those in M87* Papers I-VI in two important respects. First, we analyze the entire data set simultaneously, combining the information across multiple observation nights. Second, we explicitly disregard the overall compact flux via the light-curve normalization procedure described in Section 4.1. Because M87* does contain significant extended structure, which is presumably static across the entire observing campaign, we estimate the light curve associated with the compact structure via the intrasite baselines, i.e., SMA-JCMT and ALMA-APEX, using the variance-weighted mean for scans in which more than one intrasite visibility amplitude is available.
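For scans containing more than one intrasite measurement, the variance-weighted mean reduces to a standard inverse-variance-weighted average. A minimal sketch, with illustrative array names:

```python
import numpy as np

def variance_weighted_mean(amps, sigmas):
    """Inverse-variance-weighted mean of intrasite visibility amplitudes
    within a scan, together with its propagated 1-sigma uncertainty."""
    amps = np.asarray(amps, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2  # inverse-variance weights
    mean = np.sum(w * amps) / np.sum(w)
    return mean, 1.0 / np.sqrt(np.sum(w))
```

Measurements with smaller thermal errors dominate the estimate, as desired when the two intrasite baselines have very different sensitivities.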

Data-driven Variability Estimate
We begin with the application of the data-driven, model-agnostic characterization of the variability as a function of baseline length described in Section 4. The empirically derived mean visibility amplitudes and their variance across all 4 days are shown in Figure 14. The mean amplitudes exhibit the clear double-humped structure consistent with a narrow crescent, already apparent in the amplitudes themselves (M87* Paper III). The normalized amplitude variances are generally small, consistent with the typical thermal errors (though still much larger than the systematic uncertainty). That is, as anticipated, there is little evidence for stochastic variability in M87* from this model-agnostic, data-driven approach. We now move on to Bayesian source and noise reconstructions.
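The binned statistics underlying this comparison can be sketched as follows, assuming light-curve-normalized visibility amplitudes and radial bins of width Δu = 1 Gλ; the names and minimum-count cut are illustrative, and the full procedure additionally detrends and debiases the estimates as described in Section 4:

```python
import numpy as np

def binned_amplitude_statistics(u, v, amps, du=1.0):
    """Mean amplitude and normalized variance in radial baseline-length bins.

    u, v : baseline coordinates (Gλ)
    amps : light-curve-normalized visibility amplitudes
    du   : radial bin width (Gλ)
    """
    r = np.hypot(u, v)
    edges = np.arange(0.0, r.max() + du, du)
    centers, means, norm_vars = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (r >= lo) & (r < hi)
        if sel.sum() < 5:  # require more than four points, as in Figure 3
            continue
        m = amps[sel].mean()
        centers.append(0.5 * (lo + hi))
        means.append(m)
        # Sample variance about the bin mean, normalized by the mean squared.
        norm_vars.append(amps[sel].var(ddof=1) / m ** 2)
    return np.array(centers), np.array(means), np.array(norm_vars)
```

Comparing the resulting normalized variances against the per-bin thermal noise estimates then flags any excess attributable to source variability.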

Bayesian Noise Reconstructions
The expected absence of intraday variability precludes placing strong priors on the noise model. Thus, we do not apply priors based on the data-driven variance estimates from the preceding section, and instead assume broad linear priors on the noise model parameters. In addition to image reconstruction, we simultaneously fit the noise model to the data and find no evidence for any excess noise. The bottom left panel of Figure 15 shows that the posterior distribution of the required additional noise on 2017 April 10, indicated by the orange band, is generally small in comparison to both the thermal errors and the data-driven, model-agnostic estimates from all four observing days presented in Figure 14.
While there is little evidence for intraday variability, interday variability is obviously present in the image reconstructions in M87* Paper IV and M87* Paper VI. This conclusion is borne out by the multiday image reconstructions shown in the right panels of Figure 15. In stark contrast to the single-day analysis, when multiple observation days are combined, there is a clear need for additional noise, the posterior range of which is shown by the orange band in the bottom right panel, even though it is marginally subthermal. Above 4 Gλ, i.e., on characteristic scales smaller than 50 μas, the required additional noise matches that anticipated by the data-driven, model-agnostic analysis.
A quantitative comparison of the required additional variability obtained from the Bayesian noise modeling is presented in Figure 16. Consistent with the lack of significant Galactic scattering toward M87*, no substantial σ_T is required. Estimates of the additional systematic noise from the statistics of closure quantities are consistent with the roughly 1% values obtained (M87* Paper III). On a single day, no additional variability noise is required, i.e., a < 0.05% at 2σ. Across all observation days, a = 1.00% ± 0.25% ± 0.45% at 1σ and 2σ, respectively. No other parameters of the noise model are meaningfully constrained in this example application.

Summary and Conclusions
Intraday variability presents a novel challenge for high-frequency VLBI observations, like those performed by the EHT. Traditional image reconstruction algorithms are predicated on static source structures that permit the combination of VLBI observations throughout the observing night, i.e., Earth aperture synthesis, an assumption explicitly violated by variable targets. At the same time, the nature and properties of the source variability are of intrinsic interest, informing the underlying physical properties of the source (Georgiev et al. 2022).
For the sources of greatest interest to the EHT, Sgr A* and M87*, this variability is associated with stochastic fluctuations driven by MHD turbulence. Thus, we have developed methods to reconstruct the mean source structure and statistically characterize the variability around it. By doing so, we reap the benefits underlying Earth aperture synthesis (better (u, v) coverage and larger effective data sets, both of which increase the fidelity and volume of information that may be extracted) while addressing the variable nature of the source. These methods are more broadly applicable to any time-variable source, though they are most naturally applied to the imaging and/or modeling of sources exhibiting stochastic variations.
In GRMHD simulations, the strongest structural variability occurs on the longest timescales and largest spatial scales, i.e., the spatiotemporal power spectrum is "red-red." Therefore, Georgiev et al. (2022) find that it is possible to substantially reduce the degree of variability that must be mitigated by utilizing light-curve-normalized visibility data. For GRMHD simulations of Sgr A* and M87*, the remaining power spectrum is well described by a broken power law that typically peaks below 2 Gλ.
We have presented a data-driven nonparametric method for directly estimating the power spectrum of source image fluctuations from multiday sets of observations, like those by the EHT. For the 2017 EHT Sgr A* observing campaign, we have calibrated the resulting power spectrum estimates against GRMHD simulations. In particular, we have demonstrated the ability to faithfully extract the magnitude and long-baseline power-law index of the image fluctuation power spectrum near 4 Gλ, corresponding to scales of 50 μas. We further present a formalism, Bayesian noise reconstruction, by which the statistical properties of the variability are constrained simultaneously with the mean source structure. These make use of parameterized "noise models," in which an excess uncertainty budget is included within a self-consistent likelihood. In applying this procedure, we necessarily focus exclusively on the diagonal of the visibility covariance, ignoring correlations between the visibilities at different locations in the (u, v)-plane. This procedure has been implemented in the THEMIS parameter estimation framework (Broderick et al. 2020a) and validated with a number of mock data sets.
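Under the diagonal-covariance assumption, the self-consistent likelihood amounts to inflating each visibility's variance by the modeled excess while retaining the parameter-dependent normalization term. A minimal sketch (variable names are illustrative):

```python
import numpy as np

def log_likelihood(resid, sigma_th, sigma_var):
    """Diagonal Gaussian log-likelihood for complex visibility residuals.

    resid     : complex residuals (data minus model visibilities)
    sigma_th  : thermal noise estimates (per real/imaginary component)
    sigma_var : modeled excess "variability noise" at each (u, v) point
    """
    var = sigma_th ** 2 + sigma_var ** 2  # total per-component variance
    # Real and imaginary parts each contribute a Gaussian factor; the
    # log-normalization term must be kept, since var depends on the
    # noise model parameters being fit.
    return np.sum(-np.abs(resid) ** 2 / (2.0 * var) - np.log(2.0 * np.pi * var))
```

Because the normalization penalizes large var, the fit cannot simply inflate the noise to absorb all residuals; the excess is driven by the data.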
We conclude that the Bayesian noise reconstruction scheme is effective at mitigating the impact of even substantial variability on the problem of source reconstruction and provides self-consistent estimates of the intrinsic variability. We have tested the Bayesian noise reconstruction algorithm on simulated data tests that span a wide range of mean images and types of variability, extending to the simultaneous reconstruction of mean images and power spectra from GRMHD movies. In all cases, we are able to faithfully recover both the image structure and the excess variability noise. These reconstructions are robust in the sense that any model possessing a sufficient number of degrees of freedom recovers similar images and similar "noise" power spectra. Results of a variety of imaging and modeling analyses of the EHT Sgr A* observations are presented in Papers III and IV. As with the data-driven variability characterization, the recovered noise model from the Bayesian noise reconstruction directly informs models of variability within the emission region.
We have applied both methods to M87*, for which GM/c³ ≈ 9 hr. Image reconstructions of the EHT M87* data do not show significant variation among neighboring days, implying little intraday variability. We confirm this conclusion by analyzing the 2017 April 10 observation, finding no evidence for variability. Combining all four observing days, 2017 April 5, 6, 10, and 11, we recover the interday variability observed across the week. In particular, we find excellent agreement between the data-driven nonparametric estimates and the self-consistently obtained modeled variability noise. Moreover, the nonvariability components in the noise model, i.e., the refractive scattering and fractional systematic error components, return their anticipated values for M87*: negligible and 1%, respectively.
Both of the methods presented here have potential applications well beyond EHT data sets. As demonstrated with M87*, they provide a natural method for estimating the degree of variability for sources that vary across multiple observation epochs. Moreover, by simultaneously recovering the variability and the mean flux distribution, Bayesian noise reconstruction enables the self-consistent combination of data from multiple observations. That is, by construction, Bayesian noise modeling permits self-consistent, data-driven averaging across multiple observations, with the potential to produce more accurate mean image estimates.
A number of the assumptions made here regarding the particular form of the noise model are informed by our ultimate application to EHT analyses of Sgr A* and may be relaxed. Chief among these are azimuthal symmetry and the assumed vanishing of the off-diagonal elements of the covariance. The development of an appropriate noise model is source dependent. Nevertheless, it is clear that, in addition to enabling traditional VLBI analysis techniques, the ability to detect and characterize variability opens a new window onto the dynamical processes at work in VLBI targets.

Figure 1 .
Figure 1. Summary of the (u, v)-coverage across the four observation days, including both high and low bands, in the 2017 Sgr A* EHT campaign. Radii of constant baseline length are shown by the dashed gray circles.

Figure 2 .
Figure 2. Distributions of the mean and variance reconstructions for a specific realization of the (u, v)-positions of simulated data points with thermal and structural variability included. In red, the distributions from an ensemble of such experiments are shown, holding the underlying mean visibility map and (u, v) positions fixed. The truth values are indicated by the vertical black lines. The average mean and variance are indicated by the vertical solid red lines. The vertical dashed red line shows the average log variance. The gray shaded region indicates the region in which the noise is subthermal. The vertical dashed blue lines indicate the μ and σ² from the data realization used to generate the bootstrap error estimates, shown by the blue distributions and with means shown by the blue solid lines.

Figure 3 .
Figure 3. Locations of circular patches in the (u, v)-plane within which means and variances of the detrended data are computed. For reference, the locations of the 2017 EHT Sgr A* measurements are indicated in blue. Red patches contain more than four data points and therefore contribute to the azimuthally averaged statistical estimates; open gray patches do not contribute.

Figure 4 .
Figure 4. Data-driven estimates of the mean and variance for simulated data sets generated using the 2017 EHT Sgr A* campaign parameters assuming the scattered iNoisy crescent, double point source, and elliptical disk models from Paper III. Gray triangles show an estimate of the thermal noise in each radial baseline bin. The time- and azimuthally averaged visibility amplitudes and temporal variances associated with the idealized visibility functions of the input movies are shown for each of the four observation days by the red lines. The orange shaded region indicates the 95% confidence level region spanned by the broken power-law fit to the variance between 2.5 and 6 Gλ described in Section 4.3.3. For reference, the visibility amplitudes of the 60 μas FWHM Gaussian that was used to provide an intermediate calibration of the LMT as described in Section 4.1 are shown by the blue dashed lines.

Figure 5 .
Figure 5. Data-driven estimates of the mean and variance for simulated data sets generated using the 2017 EHT Sgr A* campaign parameters assuming three different GRMHD simulations from the validation set used in Paper IV. Gray triangles show an estimate of the thermal noise in each radial baseline bin. The time- and azimuthally averaged visibility amplitudes and temporal variances associated with the idealized visibility functions of the input movies are shown for each of the four observation days by the red lines. The orange shaded region indicates the 95% confidence level region spanned by the broken power-law fit to the variance between 2.5 and 6 Gλ described in Section 4.3.3. For reference, the visibility amplitudes of the 60 μas FWHM Gaussian that was used to provide an intermediate calibration of the LMT as described in Section 4.1 are shown by the blue dashed lines. Note that the vertical axis range of the bottom row differs from that in Figure 4, reflecting the greater degree of variability in the GRMHD simulations.

Figure 6 .
Figure 6. Debiasing function, β⁻¹(|u|), derived from a comparison of the estimated variances generated from simulated data sets associated with 90 GRMHD simulations to the visibility amplitude power spectra resulting from the GRMHD movies directly. Below |u| = 2 Gλ, the LMT calibration procedure precludes accurate variance estimates. Above 9 Gλ, there are no data. The region from 6 to 7.5 Gλ has poor (u, v)-coverage, resulting in significant underestimates of the variance.

Figure 7 .
Figure 7. Debiased data-driven estimates of the variance for simulated data sets generated using the 2017 EHT Sgr A* campaign parameters assuming three different GRMHD simulations from the validation set used in Paper IV. Gray triangles show an estimate of the thermal noise in each radial baseline bin. The time- and azimuthally averaged visibility variances associated with the idealized visibility functions of the input movies are shown for each of the four observation days by the red lines. The orange shaded region indicates the 95% confidence level region spanned by the broken power-law fit to the variance between 2.5 and 6 Gλ described in Section 4.3.3.

Figure 8 .
Figure 8. Joint posteriors on the normalized intensity and FWHM of a Gaussian fitted without any noise modeling to simulated data sets in which no additional noise was added (black), additional σ_sys and σ_ref were added as Gaussian random variables (green), and additional σ_sys, σ_ref, and σ_var were added as Gaussian random variables (blue). The impact of the unmodeled or misspecified noise is to substantially bias the reconstructed parameter estimates (i.e., by many σ). Filled contours correspond to the 50%, 90%, and 99% quantiles. The truth values used to generate the simulated data are shown in red.

Figure 9 .
Figure 9. Joint posteriors on the normalized intensity, FWHM, and relevant noise model parameters when a Gaussian with the appropriate noise model is fit to simulated data sets in which additional σ_sys and σ_ref were added as Gaussian random variables (green), and additional σ_sys, σ_ref, and σ_var were added as Gaussian random variables (blue). Filled contours correspond to the 50%, 90%, and 99% quantiles. Panels showing the joint posteriors among the noise model parameters are shaded light red. The truth values used to generate the simulated data are shown in red.

Figure 10 .
Figure 10. Lower left: joint posteriors on the normalized intensity, ring diameter, model amplitudes and phases, and smoothing scale for an mG-ring model (see Paper IV for a detailed definition), and relevant noise model parameters when an mG-ring model with the appropriate noise model is fit to simulated data sets in which additional σ_sys, σ_ref, and σ_var are added as Gaussian random variables (blue). Filled contours correspond to the 50%, 90%, and 99% quantiles. Panels showing the joint posteriors among the noise model parameters are shaded light red. The truth values used to generate the simulated data are shown in red. Top right: a summary of the absolute residuals of the reconstructed structural parameters, after marginalization over all other parameters and normalized by the one-dimensional posterior standard deviation, for a variety of mG-ring analyses. Shown are analyses of simulated data sets produced with no additional noise (black circles), with σ_sys and σ_ref added as Gaussian random variables (green squares), and with σ_sys, σ_ref, and σ_var added as Gaussian random variables (blue diamonds). Filled and open points show the results of analyses with and without the relevant noise model included, respectively. For reference, 1σ, 2σ, and 3σ lines are shown in red.

Figure 11 .
Figure 11. Reconstructed images (top) and noise (bottom) for the crescent (left), point (middle), and elliptical disk (right) iNoisy simulated data sets. In each, the top line shows the time-averaged truth image (left), maximum-likelihood (middle), and posterior-average (right) images, with nine additional random draws from the posterior shown below. To recover physical brightness units, all images have been renormalized by a common factor so that the total flux of the mean image matches 2.5 Jy. In the bottom panels, the data-driven nonparametric estimates from Section 4 are shown in black, and the allowed noise models permitted by the assumed priors on the noise model parameters are shown in blue. Finally, the reconstructed posterior of the noise is shown in orange.

Figure 12 .
Figure 12. Images (top) and noise posteriors (bottom) from mG-ring fits to synthetic data sets on multiple days (2017 April 6 and 7 EHT Sgr A* low-band (u, v)-coverage) generated from three GRMHD simulations. In each, the top line shows the time-averaged truth image (left), maximum-likelihood (middle), and posterior-average (right) images, with nine additional random draws from the posterior shown below. To recover physical brightness units, all images have been renormalized by a common factor so that the total flux of the mean image matches 2.5 Jy. In the bottom panels, the data-driven nonparametric estimates from Section 4 are shown in black, and the allowed noise models permitted by the assumed priors on the noise model parameters are shown in blue. Finally, the reconstructed posterior of the noise is shown in green.

Figure 13 .
Figure 13. Reconstructed images (top) and noise posteriors (bottom) for synthetic data sets on multiple days (2017 April 6 and 7 EHT Sgr A* combined high- and low-band (u, v)-coverage) generated from three GRMHD simulations. In each, the top line shows the time-averaged truth image (left), maximum-likelihood (middle), and posterior-average (right) images, with nine additional random draws from the posterior shown below. To recover physical brightness units, all images have been renormalized by a common factor so that the total flux of the mean image matches 2.5 Jy. In the bottom panels, the data-driven nonparametric estimates from Section 4 are shown in black, and the allowed noise models permitted by the assumed priors on the noise model parameters are shown in blue. The reconstructed posterior of the noise is shown in orange. For comparison, the posterior of the noise from the mG-ring fits in Figure 12 is reproduced in green.

Figure 14 .
Figure 14. Estimates of the mean (top) and variance (bottom) for M87*, sampled with Δu = 1 Gλ. In both cases, the data have been normalized by the intrasite baselines, and the LMT gains have been approximately calibrated assuming that the LMT-SMT baselines lie on a Gaussian with FWHM 60 μas. To avoid large phase decoherences, we have further phase-calibrated the JCMT. For reference, the range of visibility amplitudes from the crescent component of the xs-ringauss fit presented in Figure 5 of M87* Paper IV is shown by the shaded red region in the top panel. Variance estimates for |u| < 2.5 Gλ are excluded owing to the LMT calibration procedure.

Figure 15 .
Figure 15. Reconstructed images (top) and noise posteriors (bottom) for the EHT observations of M87* on a single day (2017 April 10; left) and combined across all days (2017 April 5, 6, 10, and 11; right). The top row shows maximum-likelihood (left), posterior-average (middle), and standard deviation (right) images, with nine additional random draws from the posterior shown below. To recover physical brightness units, all images have been renormalized by a common factor so that the total flux of the mean image matches the mean of the light curve. In the bottom panels, the reconstructed noise posterior (orange) is shown in comparison to the allowed noise models permitted by the assumed priors (blue, covering the entire range) and typical thermal noise estimates (gray triangles). For reference, the data-driven nonparametric variance estimates from Figure 14, constructed from the full 2017 EHT M87* campaign, are reproduced (black points).
For these fits we assume linear priors in the ranges σ_T ∈ [0, 0.02], f ∈ [0, 0.1], a ∈ [0, 0.5], u_0 ∈ [1 Gλ, 10 Gλ], b ∈ [0.5, 6], and c ∈ [0.5, 6]. The range of noise models is indicated by the blue shaded areas in Figure 15 and covers the entirety of the plot region. The absence of intraday variability is borne out by the Bayesian noise reconstruction on a single day. The left panels of Figure 15 show draws from the posterior of the reconstructed image of the EHT observations of M87* on 2017 April 10. This day has the fewest scans of any of the observations of M87* during the 2017 EHT campaign and is correspondingly more uncertain, a fact that is reflected in the variety of potential images in the imaging posterior (Figure 15, middle left panels).

Figure 16 .
Figure 16. Joint posteriors on the noise model parameters from the analysis of the 2017 EHT observations of M87*. Cases shown are when only the 2017 April 10 data set is analyzed (red, upper right triangle) and when all days are analyzed (blue, lower left triangle). Contours indicate the 50%, 90%, and 99% quantiles.
Results of applying this procedure to Sgr A * are contained in Paper IV, and the implications of the measured a and b noise model parameters for GRMHD models of Sgr A * are discussed in Paper V. These estimates of the Sgr A * variability are explicitly used to inform priors on the noise modeling employed by the THEMIS mG-ring analyses in Paper IV and the THEMIS image reconstructions in Paper III; the remaining imaging methods employed in Paper III survey over ranges of noise model parameters informed by these analyses.

Figure 18 .
Figure 18. Distributions of reconstructed V, V_u, and V_v for a specific realization of the (u, v)-positions, generated upon varying a set of 10 complex V_j within a complex error of magnitude 0.1 Jy with fixed (u_j, v_j). The underlying truth values are indicated by the vertical black lines.