UvA-DARE (Digital Academic Repository) Monitoring the Morphology of M87* in 2009-2017 with the Event Horizon Telescope

The Event Horizon Telescope ( EHT ) has recently delivered the ﬁ rst resolved images of M87 * , the supermassive black hole in the center of the M87 galaxy. These images were produced using 230 GHz observations performed in 2017 April. Additional observations are required to investigate the persistence of the primary image feature — a ring with azimuthal brightness asymmetry — and to quantify the image variability on event horizon scales. To address this need, we analyze M87 * data collected with prototype EHT arrays in 2009, 2011, 2012, and 2013. While these observations do not contain enough information to produce images, they are suf ﬁ cient to constrain simple geometric models. We develop a  modeling approach based on the framework utilized for the 2017 EHT data analysis and validate our procedures using synthetic data. Applying the same approach to the observational data sets, we ﬁ nd the M87 * morphology in 2009 – 2017 to be consistent with a  persistent asymmetric ring of ∼ 40 μ as diameter. The position angle of the peak intensity varies in time. In particular, we ﬁ nd a  signi ﬁ cant difference between the position angle measured in 2013 and 2017. These variations are in broad agreement with predictions of a  subset of general relativistic magnetohydrodynamic simulations. We show that quantifying the variability across multiple observational epochs has the potential to constrain the physical properties of the source, such as the accretion state or the black hole spin. ,


INTRODUCTION
The compact radio source in the center of the M87 galaxy, hereafter M87 * , has been observed at 1.3 millimeter wavelength (230 GHz frequency) using very long baseline interferometry (VLBI) since 2009.These observations, performed by early configurations of the Event Horizon Telescope (EHT, Doeleman et al. 2009) array, measured the size of the compact emission to be ∼ 40 µas, with large systematic uncertainties related to the limited baseline coverage (Doeleman et al. 2012;Akiyama et al. 2015).The addition of new sites and sensitivity improvements leading up to the April 2017 observations yielded the first resolved images of the source (Event Horizon Telescope Collaboration et al. 2019a,b,c,d,e,f, hereafter EHTC I-VI).These images revealed an asymmetric ring (a crescent) with a diameter d = 42±3 µas and a position angle of the bright side φ B between 150 • and 200 • east of north (counterclockwise from north/up as seen on the sky, EHTC VI), see the left panel of Figure 1.The apparent size and appearance of the observed ring agree with theoretical expectations for a 6.5 × 10 9 M black hole driving a magnetized accretion inflow/outflow system, inefficiently radiating via synchrotron emission (Yuan & Narayan 2014, EHTC V).Trajectories of the emitted photons are subject to strong deflection in the vicinity of the event horizon, resulting in a lensed ring-like feature seen by a distant observer -the anticipated shadow of a black hole (Bardeen 1973;Luminet 1979;Falcke et al. 2000;Broderick & Loeb 2009).
General relativistic magnetohydrodynamic (GRMHD) simulations of relativistic plasma in the accretion flow and jet-launching region close to the black hole (EHTC V, Porth et al. 2019) predict that the M87 * source structure will exhibit a prominent asymmetric ring throughout multiple years of observations, with a mean diameter d primarily determined by the black hole mass-to-distance ratio and a position angle φ B primarily determined by the orientation of the black hole spin axis.The detailed appearance of M87 * may also be influenced by many poorly constrained effects, such as the black hole spin magnitude, magnetic field structure in the accretion flow (Narayan et al. 2012, EHTC V), the electron heating mechanism (e.g., Mościbrodzka et al. 2016;Chael et al. 2018a), nonthermal electrons (e.g., Davelaar et al. 2019), and misalignment between the jet and the black hole spin (White et al. 2020;Chatterjee et al. 2020).Moreover, turbulence in the accretion flow, perhaps driven by the magnetorotational instability (Balbus & Hawley 1991), is expected to cause stochastic variability in the image with correlation timescales of up to a few weeks (∼ dynamical time for M87 * ).The model uncertainties and expected timedependent variability of these theoretical predictions A 42 µas circle is plotted with a dashed line for reference.The observed position angle of the approaching jet φjet is 288 • east of north (Walker et al. 2018).Under the assumed physical interpretation of the ring, we expect to find the bright side of the crescent on average approximately 90 • clockwise from φjet (EHTC V).We assume a convention φB,exp = 198 • , indicated with a blue dashed line.Right panel: a random snapshot (note that this is not a fit to the EHT image) from a GRMHD simulation adopting the expected properties of M87 * (Section 4.1).The spin vector of the black hole is partially directed into the page, counteraligned with the approaching jet (and aligned with the deboosted receding jet); its projection onto the observer's screen is located at the position angle of φspin = φjet − 180 • .strongly motivate the need for additional observations of M87 * , especially on timescales long enough to yield uncorrelated snapshots of the turbulent flow.
To this end, we analyze archival EHT observations of M87 * from observing campaigns in 2009, 2011, 2012, and 2013.While these observations do not have enough baseline coverage to form images (EHTC IV), they are sufficient to constrain simple geometrical models, following procedures similar to those presented in EHTC VI.We employ asymmetric ring models that are motivated by both results obtained with the mature 2017 array and the expectation from GRMHD simulations that the ring feature is persistent.
We begin, in Section 2, by summarizing the details of these archival observations with the "proto-EHT" arrays.In Section 3, we describe our procedure for fitting simple geometrical models to these observations.In Section 4, we test this procedure using synthetic proto-EHT observations of GRMHD snapshots and of the EHT images of M87 * .We then use the same procedure to fit models to the archival observations of M87 * in Section 5. We discuss the implications of these results for our theoretical understanding of M87 * in Section 6, and briefly summarize our findings in Section 7.

OBSERVATIONS AND DATA
Our analysis covers five separate 1.3 mm VLBI observing campaigns conducted in 2009, 2011, 2012, 2013, and 2017.The M87 * data from 2011 and 2013 have not been published previously.For all campaigns except 2012, M87 * was observed on multiple nights.For the proto-EHT data sets (2009)(2010)(2011)(2012)(2013) we simultaneously utilize the entire data set from each year, fitting to data from multiple days with a single source model, when available.This is motivated by the M87 * dynamical timescale argument, little visibility amplitude variation reported by EHTC III on a one-week timescale, as well as by the limited amount of available data and lack of evidence for interday variability in the proto-EHT data sets.We use incoherent averaging to estimate visibility amplitudes on each scan (∼ few minutes of continuous observation) and bispectral averaging to estimate closure phases (Rogers et al. 1995;Johnson et al. 2015;Fish et al. 2016).The frequency setup in 2009-2013 consisted of two 480 MHz bands, centered at 229.089 and 229.601GHz.Whenever both bands or both parallel-hand polarization components were available, we incoherently averaged all simultaneous visibility amplitudes.The data sets are summarized in Table 1, where the number of detections for nonredundant baselines of different projected baseline lengths is given, with the corresponding (u, v)-coverage shown in Figure 2. Redundant baselines yield independent observations of the same visibility.In Table 1 we also indicate the number of available nonredundant closure phases (CPs, not counting redundant and intrasite baselines, minimal set, see Blackburn et al. 2020).As is the case for non-phase-referenced VLBI observations (Thompson et al. 2017), we do not have access to absolute visibility phases.All visibility amplitudes observed in 2009-2013 are presented in Figure 3.
A more detailed summary of the observational setup of the proto-EHT array in 2009-2013 and the associated data reduction procedures can be found in Fish et al. (2016).All data sets discussed in this paper are publicly available1 .

2009-2012
Prior to 2013, the proto-EHT array included telescopes at three geographical locations: (1) the Combined Array for Research in Millimeter-wave Astronomy (CARMA, CA) in Cedar Flat, California, (2) the Submillimeter Telescope (SMT, AZ) on Mt.Graham in Arizona, and (3) the Submillimeter Array (SMA, SM), the James Clerk Maxwell Telescope (JCMT, JC), and the Caltech Submillimeter Observatory (CSO, CS) on Maunakea in Hawai'i.These arrays were strongly east-west oriented, and the longest projected baselines, between SMT and Hawai'i, reached about 3.5 Gλ, corresponding to the instrument resolution (maximum fringe spacing) of ∼ 60 µas.The 2011 observations of M87 * have not been published but follow the data reduction procedures described in Lu et al. (2013).The 2009 and 2012 observations and data processing of M87 * have been published in Doeleman et al. (2012) and Akiyama et al. (2015), respectively.However, our analysis uses a modified processing of the 2012 data because the original processing erroneously applied the same correction for atmospheric opacity at the SMT twice. 2 The SMT calibration procedures have been updated since then (Issaoun et al. 2017).
Each observation included multiple subarrays of CARMA as well as simultaneous measurements of the total source flux density with CARMA acting as a connected-element interferometer; these properties then allow the CARMA amplitude gains to be "network calibrated" (Fish et al. 2011;Johnson et al. 2015, EHTC III).Of these three observing campaigns, only 2012 provides closure phase information for M87 * , and all closure phases measured on the single, narrow triangle SMT-SMA-CARMA were consistent with zero to within 2 σ (Akiyama et al. 2015), see Figure 4.

2013
The 2013 observing epoch did not include the CSO, but added the Atacama Pathfinder Experiment facility (APEX, AP) in the Atacama Desert in Chile.This additional site brought for the first time the long (≈ 5 − 6 Gλ) baselines CARMA-APEX and SMT-APEX, that are roughly orthogonal to the CARMA-Hawai'i and SMT-Hawai'i baselines, see Figure 2. The addition of APEX increased the instrument resolution (maximum fringe spacing) to ∼ 35 µas.While the 2013 observations of Sgr A * were presented in several publications (Johnson et al. 2015;Fish et al. 2016;Lu et al. 2018), the M87 * observations obtained during the 2013 campaign have not been published previously.
The proto-EHT array observed M87 * on March 21st, 22nd, 23rd, and 26th 2013.CARMA-APEX detections were found on March 22nd (11 detections) and 23rd (7 detections) with a single SMT-APEX detection on March 23rd.March 23rd (MJD 36374) was the only day with detections on baselines to each of the four geographical sites.No detections between Hawai'i and APEX were found, and there were no simultaneous detections over a closed triangle that would allow for the measurement of closure phase.

2017
In 2017, the EHT observed M87 * with five geographical sites (EHTC I; EHTC II), without CSO and CARMA, but with the addition of the Large Millimeter Telescope Alfonso Serrano (LMT, LM) on the Volcán Sierra Negra in Mexico, the IRAM 30-m telescope (PV) on Pico Veleta in Spain, and the phased-up Atacama Large Millimeter/submillimeter Array (ALMA, AA, Matthews et al. 2018;Goddi et al. 2019).The expansion of the array resulted in significant improvements in (u, v)-coverage, shown with gray lines in Figure 2, and instrument resolution raised to ∼ 25 µas.In addition to hardware setup developments (EHTC II), the recorded bandwidth was increased from 2×0.5 GHz to 2×2 GHz (226)(227)(228)(229)(230). The 2017 data processing pipeline used ALMA as an anchor station (EHTC III).Its high sensitivity greatly improved the signal phase stability (Blackburn et al. 2019;Janssen et al. 2019, EHTC III) and enabled data analysis based on robustly detected closure quantities obtained from coherently averaged visibilities (EHTC IV, Blackburn et al. 2020) rather than on visibility amplitudes alone.These improvements allowed for an unambiguous analysis of the M87 * image by con- straining the set of physical (EHTC V) and geometric (EHTC VI) models representing the source morphology.
where the measured Fourier coefficients V (u, v) are referred to as "visibilities" (Thompson et al. 2017).When an array of N telescopes observes a source, N (N − 1)/2 independent visibility measurements are obtained, provided detections on all baselines are found.Certain properties of the geometry described by I(x, y) can be inferred directly from inspecting the visibility data.
In the top panel of Figure 3 we summarize all the M87 * detections obtained during 2009-2013 observations as a function of projected baseline length √ u 2 + v 2 .Dashed lines represent R, the analytic Fourier transform of an infinitely thin ring with a total intensity I 0 and a diameter d 0 , where J 0 is a zeroth-order Bessel function of the first kind.This simple analytic model predicts the visibility null located at and a wide plateau around the first maximum, located at recovering about 40% of the flux density seen on short baselines.In Figure 3 we use d 0 = 45.0 µas and show R curves corresponding to I 0 = 0.25, 0.5, 1.0, 1.5 to guide the eye.The behaviors of the visibility amplitudes, particularly the fall-off rate seen on medium-length baselines (1.0-3.6 Gλ) in all data sets and the flux density recovery on long baselines to APEX in 2013, are roughly consistent with that of a simple ring model.Moreover, all detections on baselines to APEX have a similar flux density of ∼ 0.2 Jy, while the projected baseline length varies between 5.2 and 6.1 Gλ.In the analytic thin ring model framework, this can be readily understood, because baselines to APEX sample the wide plateau around the maximum located at b 1 , Equation 4. The gray dots in Figure 3 correspond to the source model constructed based on the 2017 EHT observations -the mean of the four images reconstructed for April 5th, 6th, 10th and 11th 2017 with the eht-imaging pipeline (Chael et al. 2016, EHTC IV).In the 2017 model, east-west baselines, such as SMT-Hawai'i, probe a deep visibility null located around b 0 (Equation 3), where sampled amplitudes drop below 0.01 Jy.Northsouth baselines do not show a similar feature, which indicates source asymmetry.Irrespective of the orientation, visibility amplitudes flatten out around b 1 .Gray dots with black envelopes represent the 2017 source model sampled at the (u, v)-coordinates of the past observations, for which all medium-length baselines were oriented in the east-west direction.
One can immediately notice interesting discrepancies.The visibility amplitudes measured on long baselines to APEX (projected baseline length ∼ b 1 ) in 2013 were about a factor of 2 larger than the corresponding 2017 source model predictions.At the same time, the flux density on the short CARMA-SMT baseline is consistent between the 2013 measurements and the 2017 model predictions.This shows that the image on the sky has changed between 2013 and 2017 in a structural way, which cannot be explained with a simple total intensity scaling.We also notice that several detections obtained in 2009-2011, corresponding to projected (u, v)-distances of 3.2-3.5Gλ on Hawai'i-USA baselines, record flux density above 0.1 Jy.At the same time, the 2017 model predicts that these baselines sample a visibility null region around b 0 , with a flux density lower by an order of magnitude.However, the compact flux (on short baselines) did not change by more than a factor of two, remaining between 0.5 and 1.0 Jy throughout the 2009-2017 observations, see the bottom panel of Figure 3.This suggests that the null location in the past (if present) was different than that observed in 2017, which may correspond to a fluctuation of the crescent position angle or a changing degree of source symmetry.
Apart from the visibility amplitude data, a limited number of closure phases from the narrow triangle SMT-SMA-CARMA has been obtained from the 2012 data set (Akiyama et al. 2015).All of these closure phases are measured to be consistent with zero, which suggests a high degree of east-west symmetry in the geometry of the source observed in 2012.While the closure phases on this triangle were not observed in 2017 (CARMA was not part of the 2017 array), we can numerically resample the 2017 images (eht-imaging reconstructions, EHTC IV) to verify the consistency.In Figure 4 we show the closure phases obtained in 2012, averaged between bands, the two CARMA subarrays shown separately.Near-zero closure phases observed in 2012 are roughly consistent with at least some models from 2017.Unfortunately, technical difficulties that occurred during the 2012 campaign precluded obtaining measurements between UTC 7.5 and 10.5, where nonzero closure phases are predicted by all 2017 models.
Altogether, we see strong suggestions that the 2009-2013 data sets describe a similar geometry to the 2017 results, but there are also substantial hints that the detailed properties of the source structure evolved between observations.These differences can be quantified with geometric modeling of the source morphology.

MODELING APPROACH
The sparse nature of the pre-2017 data sets precludes reconstructing images in the manner employed for the 2017 data (EHTC IV).However, the earlier data are still capable of providing interesting constraints on more strictly parameterized classes of models.Figure 5 shows the 2013 data set overplotted with a best-fit ring model3 (in blue; 5 degrees of freedom) and asymmetric Gaussian model (in red; 4 degrees of freedom).Both models attain similar fit qualities, as determined by Bayesian and Akaike information criteria (see, e.g., Liddle 2007).(Akiyama et al. 2015) and numerically resampled source models constructed based on the 2017 observations.The predictions of the asymmetric ring models RT and RG fitted to 2012 observations are also given, see Section 3 and Section 5. Data corresponding to two CARMA subarrays, C1 and C2, are shown separately.
In the absence of prior information, we would be unable to confidently select a preferred model.However, the robust image morphology reconstructed from the 2017 data provides a natural and strong prior for selecting an appropriate parameterization.The "generalized crescent" (GC) geometric models considered in EHTC VI yielded fit qualities comparable to those of image reconstructions for the 2017 data, and in this paper, we apply two variants of the GC model to the pre-2017 data sets.Owing to data sparsity, we restrict the parameter space of the models to a subset of that considered in EHTC VI containing only a handful of parameters of interest.
Throughout this paper, we use perceptually uniform color maps from the ehtplot library4 to display the images.In some of the figures we present models blurred to a resolution of 15 µas, adopted in this paper as the effective resolution of the EHT.The EHT instrument resolution measured as the maximum fringe spacing in 2017 is about 25 µas, however, for the image reconstruction methods employed in EHTC IV a moderate effect of superresolution can be expected (Honma et al. 2014;Chael et al. 2016).The effect may be much more prominent for the geometric models, which are not fundamentally limited by the resolution.

Model specification
Given the image morphology inferred from the 2017 data, the primary parameters of interest we would like to constrain using the earlier data sets are the size of the source, the orientation of any asymmetry, and the presence or absence of a central flux depression.The analyses presented in this paper use two simple ringlike models -similar to those presented in Kamruddin & Dexter (2013) and Benkevitch et al. (2016) -both of which are subsets of the GC models from EHTC VI.
The first model we consider is a concentric "slashed" ring, where the ring intensity is modulated by a linear gradient, hereafter denoted as RT.In this model, the flux is contained within a circular annulus with the inner and outer radii R in and R out , respectively.The model is described by five parameters: 1. the mean diameter of the ring d = R in + R out , 2. the position angle of the bright side of the ring 0 ≤ φ B < 2π,

the fractional thickness of the ring
4. the total intensity 0 < I 0 < 2 Jy, and 5. β, an intensity gradient ("slash") across the ring in the direction given by φ B , corresponding to the ratio between the dimmest and brightest points on the ring, 0 < β < 1.A ring of uniform brightness has β = 1, while a ring with vanishing flux at the dimmest part has β = 0.
This model reduces to a slashed disk for ψ → 1.The assumed definition of mean diameter is consistent with the one used in EHTC VI, allowing for direct comparisons.Except where otherwise specified, we use this first model for the analyses discussed in this paper.
As a check against model-specific biases, we consider a second model consisting of an infinitesimally thin slashed ring, blurred with a Gaussian kernel (EHTC IV).The equivalent five parameters for this model are: 1. the mean diameter of the ring d = 2R in = 2R out , 2. the position angle of the bright side of the ring 3. the width of the Gaussian blurring kernel 0 < σ < 40 µas, 4. the total intensity 0 < I 0 < 2 Jy, and 5. the slash 0 < β < 1.
This second model, hereafter referred to as RG, reduces to a circular Gaussian for d σ.Both the RT and RG models provide a measure of the source diameter (d), the orientation of the brightness asymmetry (φ B ), and the presence of a central flux depression.We quantify the latter property using the following general measure of relative ring thickness (from EHTC VI) where R out = R in for the RG model and σ = 0 for the RT model.All data sets except 2009 contain observations from intrasite baselines ("zero baselines"), see Table 1.For M87 * , these baselines are sensitive to the flux from the extended jet emission on ∼ arcsecond scales (EHTC IV, see also the bottom panel of Figure 3) and do not directly inform us about the compact source structure on scales of ∼ tens of microarcseconds.However, the intrasite baselines still provide useful constraints on station gain parameters (see Section 3.2), and hence, we do not flag them.Rather, we parameterize this large-scale flux using a large symmetric Gaussian component consisting of two parameters, flux and size.This component is entirely resolved out on intersite baselines and thus has no direct impact on the compact source geometry.Ultimately, the models that we use have 5 geometric parameters for the 2009 data set and 5+2=7 geometric parameters in all other cases.

Fitting procedure and priors
We perform the parameter estimation for this paper using Themis, an analysis framework developed by Broderick et al. (2020) for the specific requirements of EHT data analysis.Themis operates within a Bayesian formalism, employing a differential evolution Markov chain Monte Carlo (MCMC) sampler to explore the posterior space.Prior to model fitting, the data products are prepared in a manner similar to that described in EHTC VI.Descriptions of the likelihood constructions for different classes of data products are given in Broderick et al. (2020).
One important difference between the 2017 and pre-2017 data sets is that the latter contain almost exclusively visibility amplitude information, rather than having access to the robust closure quantities in both phase and amplitude (Thompson et al. 2017;Blackburn et al. 2020) that aided interpretation of the 2017 data.In addition to thermal noise, visibility amplitudes suffer from uncertainties in the absolute flux calibration, including potential systematic effects such as losses related to telescope pointing imperfections (EHTC III).These uncertainties are parameterized within Themis using stationbased amplitude gain factors g i , representing the scaling between the geometric model amplitudes Model amplitudes | Vij | are then compared with the measured visibility amplitudes |V ij |.Within Themis, the number of amplitude gain parameters N g is equal to the number of (station, scan) pairs, i.e., the gains are assumed to be constant across a single scan but uncorrelated from one scan to another.By explicitly modeling station-based gains, we correctly account for the otherwise covariant algebraic structure of the visibility calibration errors (Blackburn et al. 2020).At each MCMC step, Themis marginalizes over the gain amplitude parameters (subject to Gaussian priors) using a quadratic expansion of the log-likelihood around its maximum given the current parameter vector; see Broderick et al. (2020) for details.For the analysis of the 2009-2013 data sets presented in this paper, we have adopted rather conservative 15% amplitude gain uncertainties for each station, represented by symmetric Gaussian priors with a mean value of 0.0 and standard deviation of 0.15.The width of these priors reflects our confidence in the flux density calibration rather than the statistical variation in the visibility data.
The RT model is parameterized within Themis in terms of R out , φ B , ψ, I 0 , and β.Uniform priors are used for each of these parameters, with ranges of [0, 200] µas for R out , [0, 2π] for φ B , [0, 1] for ψ, [0, 2] Jy for I 0 , and [0, 1] for β.We achieve the "infinitesimally thin" ring of the RG model within Themis by imposing a strict prior on ψ of [10 −7 , 10 −6 ], and the prior on σ is uniform in the range [0, 40] µas.Because d and f w are derived parameters, we do not impose their priors directly but rather infer them from appropriate transformations of the priors on R out , ψ, and σ.The effective prior on which is uniform within the range [0, 200] µas.For the RT model, the effective prior on which is not uniform but rather increases toward smaller values.For the RG model, the effective prior on f w = σ 2 ln(2)/R out is given by where α = 2 ln(2)/5 ≈ 0.235 for our specified priors on σ and R out ; this prior is uniform within the range [0, α].

Degeneracies and limitations
Modeling tests revealed the presence of a largediameter secondary ring mode in the posterior distributions for the 2009-2012 data sets, corresponding to the dashed green line in Figure 5.This mode is excluded by the detections on long baselines (APEX baselines in 2013, multiple baselines in 2017) and detections on medium-length (∼1.5 Gλ) baselines (LMT-SMT in 2017).Excising this secondary mode, as we do for the posteriors presented in Figure 6, effectively limits the diameter d to be less than ∼ 80 µas.In all cases, the prior range is sufficient to capture the entire volume of the primary posterior mode, corresponding to an emission region of radius ∼ 20 µas.We have verified numerically that this procedure produces the same results as restricting the priors on R out to [0, 45] µas for the analysis of the 2009-2012 data sets.
As a consequence of the Fourier symmetry of a real domain input signal, we have Hence, visibility amplitude data alone cannot break the degeneracy between the orientation of φ B and φ B = φ B + 180 • , and effectively, we only constrain the axis of the crescent asymmetry.This is how the reported uncertainties should be interpreted.Having that in mind, for the 2009, 2011, and 2013, consisting exclusively of the visibility amplitude data, we choose the reported φ B using the prior information about the position angle of the jet φ jet to select the φ B such that φ jet −180 • < φ B < φ jet , where φ jet = 288 • (Walker et al. 2018).In other words, between the orientations φ B and φ B , we choose the one that is closer to the expected bright side position φ B,exp = 198 • .This is motivated by the theoretical interpretation of the asymmetric ring feature (EHTC V).In the case of the 2012 data set, for which a very limited number of closure phases is available, we report the orientation φ B of the maximum likelihood (ML) estimators, noting the bimodal character of the posterior distributions and the aforementioned 180 • degeneracy.These caveats do not apply to the 2017 data set, for which substantial closure phase information is available and breaks the degeneracy.
It is important to recognize that the parameters of a geometric model have no direct relation to the physical parameters of the source, unlike direct fitting using GRMHD simulation snapshots (Dexter et al. 2010;Kim et al. 2016;Fromm et al. 2019, EHTC V) or raytraced geometric source models (Broderick & Loeb 2009;Broderick et al. 2016;Vincent et al. 2020) to the data.The crescent model is a phenomenological description of the source morphology in the observer's plane.If physical parameters (such as black hole mass) are to be extracted, additional calibration, in general affected by the details of the assumed theoretical model and the (u, v)-coverage, needs to be performed (EHTC VI).

MODELING SYNTHETIC DATA
In order to verify whether the (u, v)-coverage and signal-to-noise ratio of the 2009-2013 observations are sufficient to constrain source geometric properties with simple asymmetric ring models, we have designed tests using synthetic VLBI observations.The synthetic observations are generated with the eht-imaging software (Chael et al. 2016(Chael et al. , 2018b) by sampling four emission models (GRMHD1, GRMHD2, MODEL1, MODEL2) with the (u, v)-coverage and thermal error budget reported for past observations.Additionally, corruption from time-dependent station-based gain errors has been folded into the synthetic observations.The groundtruth images that we use correspond to ray-traced snapshots of a GRMHD simulation and published images of M87 * (EHTC IV), reconstructed based on the 2017 observations.

GRMHD snapshots
For the first two synthetic data tests, we use a random snapshot from a GRMHD simulation of a low magnetic flux standard and normal evolution (SANE) accretion disk (Narayan et al. 2012;Sądowski et al. 2013) around a black hole with spin a * ≡ Jc/GM 2 = 0.5, shown in Figure 1 (second panel), and in Figure 7 (first panel).The GRMHD simulation was performed with the iharm code (Gammie et al. 2003), and the ray tracing was done with ipole (Mościbrodzka & Gammie 2018).Following Mościbrodzka et al. (2016) and EHTC V, we assume a thermal electron energy distribution function and relate the local ratio of ion (T i ) and electron temperature (T e ) to the plasma parameter β p representing the ratio of gas to magnetic pressure with R high = 40 and R low = 1 for the considered snapshot.The prescription given by Equation 10 parameterizes complex plasma microphysics, allowing us to efficiently survey different models of electron heating, resulting in a different geometry of the radiating region.
As an example, for the SANE models with large R high the emission originates predominantly in the strongly magnetized jet base region, while for a small R high disk emission dominates (EHTC V).
The image considered here is a higher resolution version (1280×1280 pixels) of one of the images generated for the Image Library of EHTC V and corresponds to a 6.5 × 10 9 M black hole at a distance D = 16.9Mpc.This choice results in an M/D ratio 5 of 3.80 µas and an observed black hole shadow that is nearly circular with an angular diameter not substantially different from the Schwarzschild case, which is 2 √ 27M/D = 39.45 µas (Bardeen 1973).For reference, the dashed circles plotted in Figure 1 and Figure 7 have a diameter of 42.0 µas.These parameters were chosen to be consistent with the ones inferred from the EHT 2017 observations (EHTC I).The camera is oriented with an inclination angle of 22 • .The viewing angle was chosen to agree with the expected inclination of the M87 * jet (Walker et al. 2018).The choices of spin a * , electron temperature parameter R high , and the SANE accretion state are arbitrary.The choice of R low follows the assumptions made in EHTC V. We also assume that the accretion disk plane is perpendicular to the black hole jet (the disk is not tilted).The first image, GRMHD1, has been rotated in such a way that the projection of the simulated black hole spin axis counteraligns with the observed position angle of the approaching M87 * jet, φ jet = 288 • (Walker et al. 2018;Kim et al. 2018).The GRMHD2 test corresponds to the same snapshot, but rotated counterclockwise by 90 • to φ jet = 18 • , hence displaying a brightness asymmetry in the eastwest rather than in the north-south direction, see the second panel of Figure 7.Because the (u, v)-coverage in 2009-2013 was highly anisotropic, a dependence of the fidelity of the results on the image orientation may be expected.

M87 * images
5 Hereafter, we use the natural units in which G = c = 1.For additional synthetic data tests, we consider images of M87 * generated based on the 2017 EHT observations, published in EHTC IV.We consider two days of the 2017 observations with good coverage and reported structural source differences (EHTC III; EHTC IV, Arras et al. 2020) -2017 April 6th (MODEL1) and 2017 April 11th (MODEL2), see the first row of Figure 7. MODEL2 was also shown in the first panel of Figure 1.While these models are constructed based on the observational data, we resample them numerically to obtain synthetic data sets considered in this section.Synthetic closure phases on the SMT-SMA-CARMA triangle computed from these models were shown in Figure 4.Note that there is a subtle difference between resampling a model constructed based on the 2017 data with a numerical model of the 2017 array and direct modeling of the actual 2017 data, considered in Section 5.The sampled images were generated utilizing the eht-imaging pipeline through the procedure outlined in EHTC IV, with a resolution of 64×64 pixels.This test can be viewed as an attempt to evaluate what the outcome of the modeling efforts would have been had the 2017 EHT observations been carried out with one of the proto-EHT 2009-2013 arrays rather than with the mature 2017 array.

Results for the synthetic data sets
Figure 7 shows a summary of the maximum likelihood (ML) RT model fits to the synthetic data sets; each column shows the fits for a single ground-truth image, and each row shows the fits for a single array configuration.Though our simple ring models cannot fully reproduce the properties of the abundant and high signal-to-noise data sampled with the 2017 array (i.e., fits to these data sets are characterized by poor reducedχ 2 values of χ 2 n ∼ 5), they nevertheless recover diameter and orientation values that are reasonably consistent with those reported in EHTC VI for the 2017 observations, including the counterclockwise shift of the brightness position angle between 2017 April 6th (MODEL1) and 11th (MODEL2).We note that the underfitting of the data set sampled with the 2017 array results in an artificial narrowing of the parameter posteriors (the full posterior is captured in EHTC VI by considering a more complicated GC model).ML estimators for 2009-2013 data sets, on the other hand, typically fit the data much closer than the thermal error budget.Because we model time-dependent station gains in a small array (often only two to three telescopes observing at the same time in 2009-2013 with missing detections on some baselines), the number of model parameters may be formally larger than the number of data points, and this complicates our estimation of the number of effective degrees of freedom (see also Section 5.3).As a consequence, we cannot gen-erally utilize a χ 2 n goodness-of-fit statistic as was done in EHTC IV and EHTC VI. 6The relevant parameter estimates and uncertainties from the RT model fits are listed in Table 2.In Figure 6 we show the marginalized posteriors for the diameter and position angle parameters.For each synthetic data set we also indicate the image domain position angle η, defined as where I k is the intensity and φ k is the position angle of the kth pixel in the image.A similar image domain position angle estimator was considered in EHTC IV.
We notice that the image-and model-based estimators may occasionally display significant differences (e.g., GRMHD1).However, they are both sensitive to global properties of the brightness distribution, unlike some other estimators that could be considered, such as, e.g., the location of the brightest pixel.For the diameter d estimates reported in Table 2 we list both the median and ML values, with 68% and 95% confidence intervals, respectively.For the orientation angle φ B we list the ML values with 68% confidence intervals, and for the fractional thickness f w we list the 95th distribution percentile.Values of φ B contained in parentheses indicate that the 68% confidence interval exceeds 100 • , in which case we have concluded that the orientation is effectively unconstrained.We find that the diameter is well constrained in general, with the GRMHD data sets recovering a typical value of ∼44 µas and 95% confidence intervals that never exceed ±12 µas from this value; the analogous measurement for the MODEL data sets is 44 ± 9 µas.Biases related to the array orientation can be seen -particularly with the 2013 coverage, the GRMHD2 test estimates an appreciably larger diameter than GRMHD1, inconsistent within the 68% confidence interval.
The orientation φ B is poorly constrained, with posterior distributions that depend strongly on the details of the (u, v)-coverage.Nevertheless, the 2009-2013 ML estimates provide orientations of the axis of asymmetry that are consistent within ±35 • with the results obtained using the 2017 synthetic coverage.We note that in 3 out of 4 synthetic data sets, the limited number of closure phases provided by the simulated data sets with the 2012 coverage is enough to correctly break the degeneracy in the position angle φ B , discussed in Section 3.3.For the synthetic GRMHD data sets, the preference for the correct brightness position angle is very strong (see Figure 6).For the MODEL data sets, the effect of closure phases is much less prominent, the distributions remain bimodal, and in the case of the MODEL1 data   set, the ML estimator points at the wrong orientation, suggesting brightness located in the north.We also consider the fractional thickness f w of the ring, as defined in Equation 5.The fractional thickness provides a measure of whether the data support the presence of a central flux depression, a signature feature of the black hole shadow, or if it is consistent with a disklike morphology (i.e., f w ≈ 1 for the RT model).We find that f w is less well constrained than the diameter d, consistently with the conclusions of EHTC VI.In some cases the ML estimator corresponds to a limit of a disk-like source morphology without a central depression (see Figure 7).Only the 2013 and 2017 synthetic data sets allow us to confidently establish the presence of a central flux depression, with posterior distributions excluding f w > 0.7 for all synthetic data sets (see Table 2).For the 2009-2012 coverage synthetic data sets, f w is not constrained sufficiently well to permit similar statements.We find that the RG model produces re-sults that are typically consistent with those of the RT model (see Appendix A, Figure 16).

MODELING REAL DATA
Encouraged by the results of the tests on synthetic data sets, we performed the same analysis on the 2009-2013 proto-EHT M87 * observations.We also present the analysis of the 2017 observations with the RT and RG models.In the latter case only lower band data (2 GHz bandwidth centered at 227 GHz) were used.

Source geometry estimators
In the first row of Figure 8 we show the ML estimators obtained by fitting the RT model to each observational data set; in Figure 9 we show the same for the RG model.For the 2009, 2011, and 2013 data sets, which only constrain the axis of the crescent asymmetry, the orientation of the brightness peak was selected with a prior derived from the approaching jet orientation on the sky, φ B ∈ (φ jet − 180 • , φ jet ), Section 3.3.The 2012 data set, for which some closure phases are available, indicates a weak preference toward the brightness position located in the north rather than in the south (Figure 10, in the Appendix B).However, the posterior distribution remains bimodal and 47% of its volume remains consistent with the jet orientation based prior.Hence, the distinction is not very significant -it is entirely dependent on the sign of closure phases shown in Figure 4, which are all consistent with zero to within 2 σ.Closure phases predicted by the ML estimators for the RT and RG models are indicated in Figure 4.The 2017 data sets are consistent with the orientation imposed by the jet position prior.The second rows of Figures 8 and 9 show the ML estimators blurred to a resolution of 15 µas.In the third rows of Figures 8 and 9 we present "mean images" for each data set, obtained by sampling 2 × 10 4 sets of model parameters from the MCMC chains and averaging the corresponding images.The mean images highlight structure that is "typical" of a random draw from the posterior distribution, though we note that a mean image itself does not necessarily provide a good fit to the data.Because of the rotational degeneracy, the orientation is always assumed to be the one closer to the orientation given by the ML estimate of φ B for the construction of these images.

Estimated parameters
The marginalized posteriors for the mean diameter d and position angle φ B for the observational data sets are shown in Figure 10 for both the RT and RG models, and tabulated values of the relevant estimates for the RT model are given in Table 3.The posterior distributions for the 2009-2012 data sets have complex shapes, not all parameters are well constrained, and ML estimators do not necessarily coincide with the marginalized posteriors maxima of the individual model parameters, which   Similar to the case of the synthetic data sets, we find that the diameter d is well constrained; the RT model 95% confidence intervals across all observational data sets always fall within ±12 µas from d = 40 µas.The 2013 proto-EHT observations provide meaningful constraints on φ B , indicating that the source asymmetry in 2013 was in the east-west direction, rather than in the north-south direction, as in the case of the 2017 data set.The 2009 and 2011 data sets do not constrain the orientation well.
All ML estimators and mean images from the RT model fits show a clear shadow feature, indicating that a disk-like, filled-in structure is disfavored by all of the observations (however, for 2009-2012 it cannot be excluded with high confidence based on the relative thickness parameter f w distribution, Table 3 and Appendix B).This is contrary to the synthetic data results shown in Figure 7, where some of the ML estimators correspond to a disk-like morphology.On the other hand, the mean images for the 2009-2012 RG model fits show a significant flux density interior to the ring, indicating that these data sets are consistent with a symmetric Gaussian source model, having no central flux depression.This is a consequence of the resolution being limited by the lack of long baselines prior to 2013.Short and medium-length baselines alone provide insufficient information to fully exclude the Gaussian mode allowed by the RG model, or the disk-like mode allowed by the RT model.For the same reason we see flattened posterior distributions of the RG diameter for 2009-2012 in Figure 10 -these indicate consistency with a small, strongly blurred ring with f w > 1, becoming a Gaussian in the limit of σ d.The slash parameter β can be measured to be 0.3±0.1 for the 2013 data set, which is consistent with the fits to the 2017 data sets that give β ∼ 0.20 (RT) or β ∼ 0.35 (RG).Fits to the 2012 data set indicate preference toward more symmetric brightness distribution, 2009 and 2011 do not provide meaningful constraints on β.

Quality of the visibility amplitude fits
The quality of fits and their behavior in terms of χ 2 n are similar to the synthetic data sets (see the comments in Section 4.3 and Table 4).In Figures 11-12 we explicitly give the number of independent visibility amplitude observations for each data set N ob and the number of independent visibilities on nonzero (intersite) baselines N nz .Note that the latter is larger than the number of detections on nonzero, nonredundant baselines given in Table 1, as some detections are independent but redundant.We also provide the number of explicitly modeled amplitude gains N g for each data set (see Section 3.2 and Section 4.3).Given the pathologies in the χ 2 n metric described in Section 4.3, we characterize the quality of the ML estimator fits to data using the two following metrics In Equations 12-13 we follow the notation of Equation 6, that is, Vi represents the visibilities of the geometric model while Vi corresponds to the model modified by applying the estimated gains, representing the final fit to observations V i .Uncertainties σ i correspond to the thermal error budget.We only account for nonzero baselines, which describe the compact source properties.In the bottom rows of Figures 11-12 we indicate two error bars.Black error bars correspond to the thermal uncertainties σ i , while the red ones correspond to inflated uncertainties approximately capturing the uncertainty related to the amplitude gains.For all 2009-2013 data sets, the flexibility of the full model is sufficient to fit the sparse data to within the thermal uncertainty level with a best-fit ML estimator.

Consistency with the prior analysis
In order to assess model-related biases and verify to what extent our simplified models recover geometric parameters consistent with the ones reported by EHT, i.e., image domain results given in EHTC IV Tables 5 and 7, and geometric modeling results given in EHTC VI Tables 2 and 3, we gather these results in Table 4.For the details of the methods and algorithms, see explanations and references in EHTC IV and EHTC VI.We notice that (1) differences between methods may be as large as 30 • for the same data set, (2) models considered in this work measure diameter and position angle consistently with more complex crescent models (GC) and with the image domain methods within the expected intermodel variation, (3) RT and RG models are too simplistic to fully capture the properties of the 2017 data sets, resulting in underfitting, indicated by higher values of χ 2 n , and (4) posteriors of the RT and RG models are narrower for the 2017 data sets than the GC posteriors as an effect of the underfitting.source morphology and stability.While the constraining power of the 2009-2013 data sets is rather weak in comparison to the 2017 observations, we find evidence for a persistent ring structure that shows modest struc-tural variability.Both the persistence and variability offer important constraints for models of M87 * .We will now discuss our evidence and theoretical implications for the presence of the shadow feature in 2009-2017 (Sec- tion 6.1), the persistence of the source geometry as an argument for its theoretical interpretation (Section 6.2), and the variability of the source geometry (Section 6.3).We also summarize the limitations and caveats of the theoretical interpretation within the GRMHD framework (Section 6.4).

Presence of the shadow feature
In Section 5 we have discussed the fits of the asymmetric ring models to the M87 * observations, indicating that all data sets are consistent with such a geometry.Within the framework of a ring model, a key question for the archival observations is whether we unambiguously detect the inner flux depression seen in the 2017 results, the expected feature of a black hole shadow.While maximum likelihood estimators clearly indicate this feature in all cases (Figure 8), a detailed inspection of the posterior distributions shown in Appendix B allows us to conclude that only the 2013 archival data set provides a robust detection of the central flux depression, constraining the relative thickness of the ring f w to less than 0.5 (see Table 3).For the 2009-2012 data sets, the RT model posteriors indicate some preference toward the presence of a flux depression; however, a disk-like filledin geometry cannot be excluded with a high degree of confidence (Appendix B).This is not entirely surprising, as the 2009-2012 observations lack baselines with projected lengths > 3.6 Gλ, probing spatial frequencies higher than the visibility null b 0 (Section 2.4), which are more sensitive to differences between an empty ring, a filled-in disk, and a Gaussian.
However, the presence of the shadow feature is sensitive to changes in the optical depth.The total compact flux density in 2009-2012 was measured to be 0.8−0.9Jy, significantly higher than the 0.5 − 0.6 Jy observed in 2013 and 2017 (Figure 3, bottom panel).Mildly elevated levels of X-ray emission from the nucleus of M87 before 2016 were also reported (Sun et al. 2018).These measurements suggest a higher mass accretion rate in 2009-2012 and hence a larger density scale, in turn increasing the opacities and the optical depth, possibly changing the appearance of the black hole shadow (Mościbrodzka et al. 2012;Chan et al. 2015).Is it possible that the shadow feature had been obscured by a more optically thick medium in 2009-2012 than in the case of the more recent 2013 and 2017 observations?While the fully general answer to that question would require extensive testing of a variety of GRMHD models, we address this concern by analyzing a representative example from the library of simulated M87 * images.We consider a random snapshot from a SANE simulation with spin a * = 0.5, and electron temperature parameter R high = 20.The R low parameter is equal to 1, as it is throughout this paper.The snapshot is then repeatedly ray-traced with its density scale (and hence opacities and emissivities) adjusted to give a total compact flux density equal to 0.1, 0.3, 0.5, 0.7, 0.9, and 1.1 Jy.The resulting images, normalized by the brightness maximum, are shown in Figure 13.These findings indicate that variation in the total compact flux density between 0.1 and 1.1 Jy does not eliminate the central brightness depression.Judging from the similarity between blurred images, the flux density scaling is also not expected to influence the estimated diameter or the position angle appreciably.However, it is likely to influence measures of asymmetry, such as the slash parameter β in the RT/RG models, discussed in Section 3.1.Interestingly, we see a preference toward more symmetric source geometry (larger β) in the 2012 posteriors (Appendix B).We conclude that the central flux depression was most likely present throughout the 2009-2017 observations and that a lack of a high-confidence detection of this feature in 2009-2012 is presumably a consequence of the very limited (u, v)-coverage.
None of the EHT observations so far took place during an unambiguous flaring activity period of the M87 nucleus, such as the events discussed in Abramowski et al. (2012).We note that such an event could potentially influence the image morphology more strongly than the moderate increase of total brightness considered in Figure 13.Future simultaneous multiwavelength observational campaigns will shed light on the structural changes in the M87 * compact radio image in relation to the enhanced activity in different parts of the spectrum, allowing the site of particle acceleration to be localized.

Black hole shadow or a transient feature
As discussed in Section 3, the 2009-2013 data sets can be successfully modeled with both an asymmetric Gaussian and a ring model, with each giving a similar fit quality.However, the maximum likelihood Gaussian models are very inconsistent in size, shape, and orientation across different years, see Figure 14.In contrast, the best-fitting ring models, as seen in Figure 8, are similar in size over all epochs under the priors described in Sections 3.2 and 3.3.Even when considered separately from the 2017 results, this consistency supports our choice to interpret the 2009-2013 morphology as a ring and to draw conclusions from the results of fitting asymmetric ring models to the data.The 95% confidence intervals of all 2009-2017 diameter posteriors lie within 40±12 µas, while ML estimators of the diameter from all years lie within 43 ±5 µas, see Table 3.These values are also consistent with the M87 * diameter measurement of 42±3 µas reported by EHTC VI and with the expected size of an observed black hole of mass/distance corresponding to the stellar dynamics measurement (Blakeslee et al. 2009;Gebhardt et al. 2011).All 2009-2017 diameter measurements are inconsistent with the gas dynamics mass measurement (Walsh et al. 2013), which predicts a diameter roughly half as large.The consistency in the diameter across multiple observational epochs supports the interpretation of the ring-like feature as emission from the immediate surroundings of the supermassive black hole.
The four EHT observations of M87 * in 2017 spanned about a week, corresponding to a timescale of ∼ 15 M .With such a short time span, we cannot exclude a transient origin for the source morphology using 2017 data alone.However, such a feature would need to attain the geometry and apparent size expected of a shadow of a black hole (of independently measured mass-todistance ratio) through an unusual coincidence.Moreover, transient features are unlikely to persist with a similar geometry throughout multiple years of observations, corresponding to ∼ 10 3 − 10 4 M timescales (the total span of our analyzed observations is 7900 M ).For example, a lensed background source would need a low transverse velocity of v 40 km s −1 to travel 1 M between 2009 and 2017.This is much smaller than the gas velocities seen in the nucleus of M87 (e.g., Macchetto et al. 1997).A bright feature moving with the jet (v ∼ c) should travel 1 pc on that timescale, a factor of roughly 10 3 larger than the physical size of the ring itself.A bright knot or other stationary jet feature would need to persist with a similar location, flux density, and ring morphology to remain consistent with these results.The 8 yr span of the 2009-2017 monitoring is also much longer than the typical variability timescale of the M87 nucleus observed at 230 GHz, which is ∼ 50 days (Bower et al. 2015).While features remaining stationary for many years in otherwise rapidly flowing jets have been reported and interpreted as standing recollimation shocks (Lister et al. 2009), such a configuration would constitute one more unusual coincidence.Thus, we conclude that with multiple years of observations remaining consistent with a ∼ 40 µas ring model, it is highly unlikely that the origin of the observed geometry could be a transient feature.

Time variability of the source geometry
In addition to conclusions from the persistence of the ring structure, we can also draw inferences from the variability observed in the ring structure across the 2009-2017 data sets.In particular, the spread of the diameter and brightness position angle estimates (Table 3) are significantly larger than the spread for corresponding static synthetic data sets (Table 2).As a specific example, the circular standard deviation of the ML position angle estimators given in  , 4 • for GRMHD1, GRMHD2, MODEL1, and MODEL2, respectively.For the observational data (Table 3) the circular standard deviation is equal to 48 • .This larger spread suggests that we are detecting intrinsic structural variability despite the large uncertainties in the parameters estimated with pre-2017 observations.Moreover, unambiguous signatures of intrinsic variabil-ity on a timescale of years can be seen directly in the visibility data (Section 2.4).
Because GRMHD simulations naturally model both the source structure and its variability, they provide an important pathway for drawing conclusions from the observed variability.As a preliminary demonstration for a comprehensive study that will be published separately, we consider a small subset of the EHT Image Library (EHTC V).The simulations are parameterized with the black hole spin a * , the electron temperature parameter R high (see Section 4.1) and the accretion state -strongly magnetized magnetically arrested disk (MAD, Narayan et al. 2003), or standard and normal evolution (SANE) flow, such as those considered in Sections 4.1 and 6.1.Other parameters (e.g., total compact flux density of 0.5 Jy, inclination, jet position angle) are adjusted to match the observed properties of M87 * .For our exploratory study, we utilize the following four simulations: (S1) MAD, a * = 0.5, R high = 10 ; (S2) SANE, a * = 0.5, R high = 10 ; (S3) MAD, a * = −0.5,R high = 10; (S4) MAD, a * = 0.5, R high = 160.For each simulation, we take 500 ray-traced snapshots with 5 M separation in time.For each snapshot we calculate the position angle of the bright component η using Equation 11.We then construct histograms of η for each of the four simulations, shown in Figure 15.
Each of the simulation parameters influences the distribution of η, both in terms of its mean and spread.Some of these differences can be readily understood, for instance, in the case of prograde accretion onto a spinning black hole, the radiation is boosted both with Doppler and with frame-dragging effects.The position angle of the bright component is thus expected to be relatively more influenced by the geometry (assumed to be fixed) and not by the stochastic component than the retrograde case, in which Doppler effect and frame-dragging counteract and the geometry becomes relatively less important.Irrespective of the mechanism, some variants of simulations have great difficulties explaining either source orientation on 2017 April 6th (bright component too far clockwise), or in 2013 (too far counterclockwise), as indicated in Figure 15.Thus, continued EHT observations, with tight constraints on η spaced over multiple years, will constrain these types of models on the basis of variability in η.

Limitations of the current approach
There are caveats to the simple analysis outlined above that should be addressed in more focused future studies.The simulations that we consider do not capture the full extent of the physics relevant for M87 * .In particular, the electron temperature could be calculated in a more self-consistent fashion than via a temperature ratio prescription of Equation 10, by separately evolving the energy of ions and electrons (Sądowski et al. 2017).Then, one could evolve a population of nonthermal electrons (Chael et al. 2017), determine the electrons heat-  3, two results from 2017 are shown without their very narrow error bars).The gray area around φjet corresponds to the observed jet position angle variation (Walker et al. 2018).
ing with a well-motivated subgrid prescription (Chael et al. 2018a;Davelaar et al. 2019), or even employ nonideal MHD to model nonthermal emission caused by the particles accelerated through magnetic reconnection in a more self-consistent manner (Ripperda et al. 2019).In the current analysis we also make an assumption of no tilt between the plane of accretion and the black hole spin.If the tilt is present, an additional degree of freedom ("camera longitude") corresponding to a position angle of the black hole spin (misaligned with the jet position angle φ jet ) in the image plane will influence the observed crescent orientation (Chatterjee et al. 2020).
In that case, analysis of the position angle distributions could place joint constraints on the tilt magnitude, the longitude, and other parameters of the simulation.Large-scale parameter surveys with these extensions to the GRMHD setup are currently precluded by the immense computational costs.
A separate concern is whether the orientation of crescent models fitted to the VLBI data is consistent with the image domain η (Equation 11).In the case of the synthetic data sets considered in Section 4, the two GRMHD data sets exhibited quite large biases while the two MODEL data sets showed a high level of consistency, so this issue requires further study.Characterizing GRMHD simulations in terms of VLBI observables is the subject of continued work.

SUMMARY
We have performed geometric modeling of the 2009-2017 EHT observations of M87 * .Motivated by EHT imaging and modeling results using the 2017 observations and the stability of fits across the archival observations, we have used a simple asymmetric ring model.We found that the fitted ring diameter is stable throughout these observations, which strongly argues in favor of its association with the shadow of a supermassive black hole.We observe indications of modest intrinsic variability in the total flux density of the ring and in its position angle.
Specifically, we find the brightness asymmetry along the east-west direction in the previously unpublished 2013 observations, while all other data sets are consistent with the north-south asymmetry direction seen in the 2017 EHT images.This degree of position angle variation is seen in some GRMHD simulations of M87 * , while others do not show position angle variations as broad as those observed between 2013 and 2017.Thus, the source variability over these observations provides new constraints on the simulation parameters, including the black hole spin, accretion flow magnetization, and electron heating model.As an example, the GRMHD MAD model with spin a * = 0.5 and R high = 160 (last panel of Figure 15), which was determined to be viable by EHTC V, is inconsistent with the presented position angle measurements.We expect that unmodeled physical effects such as black hole and accretion flow spin misalignment may also be important in interpreting this variability.
Our results extend the temporal span of EHT constraints on the ring morphology by nearly three orders of magnitude, from ∼15 M over the 2017 observations to ∼7900 M between the 2009 and 2017 campaigns.Because the correlation timescale for M87 * is expected to be at least a few tens of M , the longer span is critical for decoupling stable image features such as the black hole shadow from transient features associated with the turbulent accretion flow.As continued EHT observations become available, the variation of the estimated posi-tion angle should allow us to discriminate between viable GRMHD models, providing constraints on the physical parameters of M87 * and opening an exciting new avenue for quantitative time-domain studies of structural variability in M87 * .

B. CORNER PLOTS FOR THE RT AND THE RG MODELS
We present the posterior probability distributions corresponding to fitting the 2009-2013 data sets with the RT model, Figure 17, and with the RG model, Figure 18.Similarly, for the 2017 data sets, we show the posterior distributions obtained for the RT model, Figure 19, and for the RG model, Figure 20.

Figure 1 .
Figure1.Left panel: one of the images of M87 * obtained in EHTC IV (see Section 4.2 for details).A 42 µas circle is plotted with a dashed line for reference.The observed position angle of the approaching jet φjet is 288 • east of north(Walker et al. 2018).Under the assumed physical interpretation of the ring, we expect to find the bright side of the crescent on average approximately 90 • clockwise from φjet (EHTC V).We assume a convention φB,exp = 198 • , indicated with a blue dashed line.Right panel: a random snapshot (note that this is not a fit to the EHT image) from a GRMHD simulation adopting the expected properties of M87 * (Section 4.1).The spin vector of the black hole is partially directed into the page, counteraligned with the approaching jet (and aligned with the deboosted receding jet); its projection onto the observer's screen is located at the position angle of φspin = φjet − 180 • .

Figure 2 .
Figure 2. (u, v)-coverage of the M87 * observations performed in 2009-2013 with various proto-EHT arrays.Gray lines indicate detections obtained during the 2017 observations with a mature EHT array, including several new sites, but without the baselines to CARMA.Dashed circles correspond to angular scales of 50 µas (inner) and 25 µas (outer).

Figure 3 .
Figure 3. Top: Visibility amplitudes of M87 * detections in 2009-2013 as a function of projected baseline length √ u 2 + v 2 .The source model derived from the EHT 2017 observations is shown with gray dots.Gray dots with black borders show the predicted visibility amplitudes of the source model at the baselines of the prior observations in 2009-2013.Dashed black lines correspond to the family of Fourier transforms of a symmetric, infinitely thin ring of diameter d0 = 45.0 µas.Bottom: total arcsecondscale flux density (on intrasite baselines, network-calibrated), and compact emission flux density from the short CARMA-SMT baseline.In the case of the short baselines in 2017, predictions of the source model are given.

Figure 5 .
Figure 5.Comparison of maximum likelihood (ML) asymmetric Gaussian and asymmetric ring models fitted to the 2013 data.Data are shown as points with error bars corresponding to the thermal errors.The shaded regions cover all amplitudes for a given model.The red and blue lines represent models evaluated at the (u, v)-coordinates of the observations.Both models offer a very similar fit quality.ML estimators are shown as inset figures.The model of a ring with roughly double the diameter (dashed curve) fits the intermediate baselines, but is excluded by long-baseline amplitudes.

Figure 6 .
Figure 6.Top two rows: marginalized distributions of the mean diameter d and brightness maximum position angle φB for the RT model fits to GRMHD simulation snapshots GRMHD1 (first row) and GRMHD2 (second row).The 2017 posteriors are contained within the dark gray bands.The dashed vertical line in the left panels denotes diameter of 2 √ 27M/D.The vertical dashed lines in the right panels denote the convention angle φB,exp, φB,exp − 90 • , and the approaching jet position angle φjet = φB,exp + 90 • .The range of (φjet − 180 • , φjet) is highlighted.Two bottom rows: similar as above, but for the RT model fitted to MODEL1 and MODEL2.Lightly shaded areas correspond to values reported in EHTC I, diameter d = 42 ± 3 µas and position angle 150 • < φB < 200 • .

Figure 7 .
Figure 7. ML estimators corresponding to the fits to four synthetic images, shown in the first row (no blurring).Estimators were obtained through synthetic VLBI observations with the (u, v)-coverage and uncertainties identical to those of the real observations performed in 2009-2017.The thick ring model (RT) was used, and the presented images of ML estimators were blurred to µas resolution.Blue dashed lines indicate the convention for the expected position angle of the bright component φB,exp.The gray bar represents the ML estimate of φB.For the 2009, 2011, and 2013 data sets, the orientation is determined assuming that |φB,exp − φB| < 90 • .The dashed circles correspond to a diameter of 42 µas.
with Equation 11, b using the (u, v)-coverage of 2017 April 6th

Figure 8 .Figure 9 .Figure 10 .
Figure 8.First row: ML estimators obtained from fitting the RT model to the 2009-2017 observations.The position angle φB is indicated with a bar.For the 2009, 2011, and 2013 data sets the orientation is determined assuming that φjet −180 • < φB < φjet, where φjet = 288 • .Position angle 68% confidence intervals are shown for the 2009-2013 data sets.Second row: RT models from the first row blurred to a 15 µas resolution, indicated with a beam circle in the bottom-right corner of the first panel.The dashed circle of 42 µas diameter is plotted for reference.Third row: mean of the 2×10 4 images drawn from the posterior of the RT model fits.2009

Figure 11 .Figure 12 .
Figure 11.Top row: visibility amplitudes measured in 2009-2012 are shown as black diamonds with error bars corresponding to thermal uncertainties.Blue shaded regions correspond to the range of visibility amplitudes of the asymmetric ring RT model ML estimators, shown in the first row of Figure 9.The total number of observed visibility amplitudes N ob is given, along with the number of nonzero baseline visibility amplitudes Nnz, and the number of modeled gains Ng.Bottom row: differences between measured amplitudes |Vi| and the geometric model amplitudes | Vi|.Black error bars correspond to thermal uncertainties, while red ones correspond to error budget inflated by adding systematics approximately capturing the gains uncertainties (Equation14).Two fit quality metrics, defined with Equations 12-13, are provided for each data set.

Figure 13 .
Figure 13.Top row: a single GRMHD simulation snapshot, ray-traced with total compact flux density adjusted to values between 0.1 Jy and 1.1 Jy.The brightness distribution in each panel has been normalized by the brightness maximum.EHT observations of M87 * found a total compact flux density of 0.8-0.9Jy in 2009-2012 and 0.5-0.6Jy in 2013 and 2017, see the bottom panel of Figure 3. Bottom row: same snapshots blurred to the resolution of 15 µas.The 42 µas diameter ring is indicated with a dashed circle.The position angle convention φB,exp is shown with a dashed blue bar.The gray bar indicates the image domain position angle η calculated using Equation 11, and values of η obtained are given in the bottom-right corner of each panel.

Figure 14 .
Figure 14.Year to year consistency of the best-fitting (ML) asymmetric Gaussian model to the M87 * data sets.

Figure 15 .
Figure 15.Histograms of the brightness position angle η, measured in the image domain for 500 ray-traced snapshots of GRMHD simulations from the EHT M87 * Simulation Library (EHTC V).In each panel, the blue histogram represents the same fixed model S1: MAD, a * = 0.5, R high = 10, and the red histogram represents a model in which a single parameter has been altered with respect to the fixed model (S2-S4).Orientations measured in the 2009-2017 observational data sets with an ML estimator are indicated, with 68% confidence intervals (Table3, two results from 2017 are shown without their very narrow error bars).The gray area around φjet corresponds to the observed jet position angle variation(Walker et al. 2018).

Figure 17 .
Figure 17.Posterior distributions of RT model parameters resulting from fitting to M87 * 2009-2013 data, obtained using Themis.The maximum likelihood estimate is indicated with red lines.Contours indicate 0.2, 0.5, 0.8 and 0.95 of the posterior volume.

Figure 18 .
Figure 18.Same as Figure 17, but for the RG model.

Figure 19 .
Figure 19.Same as Figure 17, but for the M87 * observations on 2017 April 6th and April 11th fitted with the RT model.

Figure 20 .
Figure 20.Same as Figure 17, but for the M87 * observations on 2017 April 6th and April 11th fitted with the RG model.

Table 1 .
M87 * data sets analyzed in this paper.
a theoretically available / with detections / nonredundant, nonzero with detections, b all / SMT-Hawai'i, c all / SMT-Chile, d single-day data set

Table 2 .
Parameter estimates from fitting the RT model to the synthetic data sets.

Table 3 .
Parameter estimates from fitting the RT model to the observational data sets.

Table 4 .
Comparison between parameters extraction results reported in This Paper, EHTC IV, and EHTC VI.