First M87 Event Horizon Telescope Results. VII. Polarization of the Ring

In 2017 April, the Event Horizon Telescope ( EHT ) observed the near-horizon region around the supermassive black hole at the core of the M87 galaxy. These 1.3 mm wavelength observations revealed a compact asymmetric ring-like source morphology. This structure originates from synchrotron emission produced by relativistic plasma located in the immediate vicinity of the black hole. Here we present the corresponding linear-polarimetric EHT images of the center of M87. We ﬁ nd that only a part of the ring is signi ﬁ cantly polarized. The resolved fractional linear polarization has a maximum located in the southwest part of the ring, where it rises to the level of ∼ 15%. The polarization position angles are arranged in a nearly azimuthal pattern. We perform quantitative measurements of relevant polarimetric properties of the compact emission and ﬁ nd evidence for the temporal evolution of the polarized source structure over one week of EHT observations. The details of the polarimetric data reduction and calibration methodology are provided. We carry out the data analysis using multiple independent imaging and modeling techniques, each of which is validated against a suite of synthetic data sets. The gross polarimetric structure and its apparent evolution with time are insensitive to the method used to reconstruct the image. These polarimetric images carry information about the structure of the magnetic ﬁ elds responsible for the synchrotron emission. Their physical interpretation is discussed in an accompanying publication.


Introduction
The Event Horizon Telescope (EHT) Collaboration has recently reported the first images of the event-horizon-scale structure around the supermassive black hole in the core of the massive elliptical galaxy M87, one of its two main targets. 130The EHT images of M87ʼs core at 230 GHz (1.3 mm wavelength) revealed a ring-like structure whose diameter of 42 μas, brightness temperature, shape, and asymmetry are interpreted as synchrotron emission from relativistic electrons gyrating around magnetic field lines in close vicinity to the event horizon.We have described the details of the EHT's instrumentation, data calibration pipelines, data analyses and imaging procedures, and the theoretical interpretation of these first images in a series of publications (Event Horizon Telescope Collaboration et al. 2019aCollaboration et al. , 2019bCollaboration et al. , 2019cCollaboration et al. , 2019d, 2019e, 2019f, 2019e, 2019f, hereafter Papers I, II, III, IV, V, VI, respectively).
In this Letter, we present the first polarimetric analysis of the 2017 EHT observations of M87 and the first images of the linearly polarized radiation surrounding the M87 black hole shadow.These polarimetric images provide essential new information about the structure of magnetic field lines near the event horizon of M87ʼs central supermassive black hole, and they put tight constraints on the theoretical interpretations of the nature of the ring and of relativistic jet-launching theories.The theoretical implications of these images and the constraints that they place on the magnetic field structure and accretion state of the black hole are discussed in an accompanying work (Event Horizon Telescope Collaboration et al. 2021, hereafter Paper VIII).Readers interested in the details of the data reduction, methodology, and validation can find a detailed index of this Letter in Section 1.2.Readers primarily interested in the results may skip directly to Section 5 and to subsequent discussion and conclusions in Section 6.

Previous Polarimetric Observations of the M87 Jet
The giant elliptical galaxy Messier 87 (M87, NGC 4486) is the central member of the Virgo cluster of galaxies and hosts a lowluminosity radio source (Virgo A, 3C 274, B1228+126).M87 is nearby and bright, and at its center is one of the best-studied active galactic nuclei (AGNs).M87 was the first galaxy in which an extragalactic jet (first described as a "narrow ray") extending from the nucleus was discovered (Curtis 1918).This kiloparsec-scale jet is visible, with remarkably similar morphology, at all wavelengths from radio to X-ray.The optical radiation from the jet on kpc scales was found to be linearly polarized by Baade (1956), which was confirmed by Hiltner (1959), suggesting that the emission mechanism is synchrotron radiation.
The central engine that powers the jet contains one of the most massive black holes known, measured from the central stellar velocity dispersion (Gebhardt et al. 2011; M =(6.6 ± 0.4) × 10 9 M e ) and directly from the size of the observed emitting Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence.Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
The M87 jet has been imaged at subarcsecond resolution in both total intensity and linear polarization at optical wavelengths with the Hubble Space Telescope (Thomson et al. 1995;Capetti et al. 1997), and at radio wavelengths with the Very Large Array (e.g., Owen et al. 1989).Observing the launching region of the jet closer to the black hole and the region surrounding the black hole requires milliarcsecond (mas) resolution or better, and hence very-long-baseline interferometry (VLBI) techniques used at the highest frequencies (e.g., Boccardi et al. 2017 and references therein).
Milliarcsecond-scale VLBI observations show that the core itself is unpolarized even at millimeter wavelengths.Zavala & Taylor (2002), observing at 8, 12, and 15 GHz, set upper limits on the fractional polarization of the compact core of m < 0.1%.About 20 mas downstream from the core, patchy linear polarization starts to become visible in the jet at the level of 5%-10%, although no large-scale coherent pattern to the electric-vector position angles (EVPAs) χ is apparent.However, at each patch in the downstream jet, the EVPAs exhibit a linear change with λ 2 , allowing the rotation measures (RMs) to be estimated.These RMs range from −4000 rad m −2 to 9000 rad m −2 (Zavala & Taylor 2002).The linear dependence of EVPA on λ 2 over several radians is important, as it shows that the Faraday-rotating plasma in the jet cannot be mixed in with the relativistic emitting particles (Burn 1966) but must be in a cooler (sub-relativistic) foreground screen.
On kiloparsec scales, Owen et al. (1990) found a complex distribution of RM.Over most of the source the RM is typically of order 1000 rad m −2 , but there are patches where values as high as 8000 rad m −2 are found.
More recently, Park et al. (2019) studied Faraday RMs in the jet using multifrequency Very Long Baseline Array (VLBA) data at 8 GHz.They found that the RM magnitude systematically decreases with increasing distance from 5,000 to 200,000 R s .The observed large (45°) EVPA rotations at various locations of the jet suggest that the dominant Faraday screen in this distance range would be external to the jet, similar to the conclusion of Zavala & Taylor (2002).Homan & Lister (2006), also observing at 15 GHz with the VLBA (as part of the MOJAVE project) found a tight upper limit on the fractional linear polarization of the core of <0.07%.They also detect circular polarization of (−0.49± 0.10)%.
At 43 GHz, Walker et al. (2018) presented results from 17 years of VLBA observations of M87, with polarimetric images presented at two epochs.These show significant polarization (up to 4%) in the jet near the 43 GHz core, but at the position of the total intensity peaks the fractional polarizations are only 1.5% and 1.1%.They interpret these fractions as coming from a mix of emission from the unresolved, unpolarized core and a more polarized inner jet.Hada et al. (2016) showed images at four epochs at 86 GHz made with the VLBA and the Green Bank Telescope.At this frequency, the resolution is about (0.4 × 0.1) mas, corresponding to (56 × 14) R s .Again, the core is unpolarized with no linear polarization detected at the position of the total intensity emission's peak, while there is a small patch of significant (3.5%) polarization located 0.1 mas downstream.At 0.4 mas downstream from the peak, there is another patch of significant polarization (20%).These results indicate that there are regions of significantly ordered magnetic field very close to the central engine.
Very recently, new observations by Kravchenko et al. (2020) using the VLBA at 22 and 43 GHz show two components of linear polarization and a smooth rotation of EVPA around the 43 GHz core.Comparison with earlier observations show that the global polarization pattern in the jet is largely stable over an 11 year timescale.They suggest that the polarization pattern is associated with the magnetic structure in a confining magnetohydrodynamic wind, which is also the source of the observed Faraday rotation.
The EHT presently observes at ∼230 GHz and has previously reported polarimetric measurements only for Sagittarius A * (Sgr A * ; Johnson et al. 2015).The only previous polarimetric measurements of M87 at this frequency were done by Kuo et al. (2014) using the Submillimeter Array (SMA) on Maunakea, Hawai'i, USA.The SMA is a compact array with a (1.2 × 0.8) arcsec beam, 10000 times larger than the EHT beam.Li et al. (2016) used the value from this work to calculate a limit on the accretion rate onto the M87 black hole.Most recently, Goddi et al. (2021) reported results on M87 around 230 GHz as part of the Atacama Large Millimeter/submillimeter Array (ALMA) interferometric connected-element array portion of the EHT observations in 2017.The ALMA-only 230 GHz observations (with a FWHM synthesized beam in the range 1″-2″, depending on the day) resolve the M87 inner region into a compact central core and a kpc-scale jet across approximately 25″.It has been found that the 230 GHz core at these scales has a total flux density of ∼1.3 Jy, a low linear polarization fraction |m| ∼ 2.7%, and even less circular polarization, |v| < 0.3%.Notably, ALMA-only observations show strong variability in the RM estimated based on four frequencies within ALMA Band 6 (four spectral windows centered at 213, 215, 227, and 229 GHz;Matthews et al. 2018).The RM difference is clear between the start of the EHT observing campaign on 2017 April 5 (RM ≈ 0.6 × 10 5 rad m −2 ) and the end on 2017 April 11 (RM ≈ − 0.4 × 10 5 rad m −2 ).Because these measurements were taken simultaneously with the EHT VLBI observations presented here, the ALMA-only linear polarization fraction measurements can be used as a point of reference, and we discuss possible implications of the strong RM evolution on the EHT polarimetric images of M87.

This Work
This Letter presents the details of the polarimetric data calibration, the procedures for polarimetric imaging, and the resulting images of the M87 core.In Section 2, we briefly overview the basics of polarimetric VLBI.In Section 3, we summarize the EHT 2017 observations, describe the initial data calibration procedure and validation tests, and describe the basic properties of the polarimetric data.In Section 4, we describe our methods, strategy, and test suite for our polarimetric calibration and imaging.In Section 5, we present and analyze the polarimetric images of the M87 ring and examine the calibration's impact on the polarimetric image.We discuss the results and summarize the work in Sections 6 and 7.
This Letter is supplemented with a number of appendices supporting our analysis and results.The appendices summarize: polarimetric data issues (Appendix A); novel VLBI closure data products (Appendix B); details of calibration and imaging methods (Appendix C); validation of polarimetric calibration for telescopes with an intra-site partner (Appendix D); fiducial leakage D-terms from M87 imaging (Appendix E); preliminary results of polarimetric imaging of M87 (Appendix F); polarimetric imaging scoring procedures (Appendix G); details of Monte Carlo D-term simulations (Appendix H); consistency of low-and high-band results for M87 (Appendix I); comparison to polarimetric properties of calibrator sources (Appendix J); and validations of assumptions made in polarimetric imaging of the main target and the calibrators (Appendix K).

Basic Definitions
A detailed introduction to polarimetric VLBI can be found in Thompson et al. (2017, their Chapter 4).Here we briefly introduce the basic concepts and notation necessary to understand the analysis presented throughout this Letter.The polarized state of the electromagnetic radiation at a given spatial coordinate x = (x, y) is described in terms of four Stokes parameters,  x ( ) (total intensity),  x ( ) (difference in horizontal and vertical linear polarization),  x ( ) (difference in linear polarization at 45°and −45°position angle), and  x ( ) (circular polarization).We define the complex linear polarization  as represents the (complex) fractional polarization, and c =  0.5 arg ( ) is the EVPA, measured from north to east.Total-intensity VLBI observations directly sample the Fourier transform  ˜as a function of the spatial frequency u = (u, v) of the total-intensity image; similarly, polarimetric VLBI observations also sample the Fourier transform of the other Stokes parameters    , , ˜˜˜.EHT data are represented in a circular basis, related to the Stokes visibility components by the following coordinate system transformation:

˜˜˜˜( )
for a baseline between two stations j and k.The notation R L j k * indicates the complex correlation (where the asterisk denotes conjugation) of the electric field components measured by the telescopes; in this example, the right-hand circularly polarized component R j measured by the telescope j and the left-hand circularly polarized component L k measured by the telescope k.Equation (2) defines the coherency matrix ρ jk .Following Johnson et al. (2015), we also define the fractional polarization in the visibility domain, Note that Equation (3) implies that u m ( )  and -u m ( )  constitute independent measurements for u ≠ 0.Moreover, u m ( )  and m(x) are not a Fourier pair.While the image-domain fractional polarization magnitude is restricted to values between 0 (unpolarized radiation) and 1 (full linear polarization), there is no such restriction on the absolute value of m .Useful relationships between m  and m are discussed in Johnson et al. (2015).
Imperfections in the instrumental response distort the relationship between the measured polarimetric visibilities and the source's intrinsic polarization.These imperfections can be conveniently described by a Jones matrix formalism (Jones 1941), and estimates of the Jones matrix coefficients can then be used to correct the distortions.The Jones matrix characterizing a particular station can be decomposed into a series of complex matrices G, D, and Φ (Thompson et al. 2017), Time-dependent field rotation matrices Φ ≡ Φ(t) are known a priori, with the field rotation angle f(t) dependent on the source's elevation θ el (t) and parallactic angle ψ par (t).The angle f takes the form where f off is a constant offset, and the coefficients f el and f par are specific to the receiver position type.The gain matrices G, containing complex station gains G R and G L , are estimated within the EHT's upstream calibration and total-intensity imaging pipeline; see Section 3.2.Estimation of the D-terms, the complex coefficients D R and D L of the leakage matrix D, generally requires simultaneous modeling of the resolved calibration source, and hence cannot be easily applied at the upstream data calibration stage.The details of the leakage calibration procedures adopted for the EHT polarimetric data sets analysis are described in Section 4. For a pair of VLBI stations j and k the measured coherency matrix r ¢ jk is related to the true-source coherency matrix ρ jk via the Radio Interferometer Measurement Equation (RIME; Hamaker et al. 1996;Smirnov 2011) where the dagger † symbol denotes conjugate transposition.
Once the Jones matrices for the stations j and k are well characterized, Equation (6) can be inverted to give the source coherency matrix ρ jk .From ρ jk , Stokes visibilities can be obtained by inverting Equation (2): The collection of Stokes visibilities sampled in (u, v) space by the VLBI array can finally be used to reconstruct the polarimetric images    x x x , , ( ) ( ) ( ), and  x ( ).The coherency matrices on a quadrangle of baselines can be combined to form "closure traces," data products that are insensitive to any calibration effects that can be described using Jones matrices.Appendix B defines these closure traces and outlines their utility for describing the EHT data.

Observations and Initial Processing
Eight observatories at six geographical locations participated in the 2017 EHT observing campaign: ALMA and the Atacama Pathfinder Experiment (APEX) in the Atacama Desert in Chile; the Large Millimeter Telescope Alfonso Serrano (LMT) on the Volcán Sierra Negra in Mexico; the South Pole Telescope (SPT) at the geographic south pole; the IRAM 30 m telescope (PV) on Pico Veleta in Spain; the Submillimeter Telescope (SMT) on Mt.Graham in Arizona, USA; SMA and the James Clerk Maxwell Telescope (JCMT) on Maunakea in Hawai'i, USA. 131he EHT observations were carried out on five nights between 2017 April 5 and 11.M87 was observed on April 5, 6, 10, and 11.Along with the main EHT targets M87 and Sgr A * , several other AGN sources were observed as science targets and calibrators.
Observations were conducted using two contiguous frequency bands of 2 GHz bandwidth each, centered at frequencies of 227.1 and 229.1 GHz, hereby referred to as low and high band, respectively.The observations were arranged in scans alternating different sources, with durations lasting between 3 and 7 minutes.Apart from the JCMT, which observed only a single polarization (right-circular polarization on 2017 April 5-7 and left-circular polarization on 2017 April 10-11), all stations observed in full polarization mode.ALMA is the only station to natively record data in a linear polarization basis.Visibilities measured on baselines to ALMA were converted from a mixed linear-circular basis to circular polarization after correlation using the PolConvert software (Martí-Vidal et al. 2016;Matthews et al. 2018;Goddi et al. 2019).A technical description of the EHT array is presented in Paper II and a summary of the 2017 observations and data reduction is presented in Paper III.

Correlation and Data Calibration
After the sky signal received at each telescope was mixed to baseband, digitized, and recorded directly to hard disk, the data from each station were sent to MIT Haystack Observatory and the Max-Planck-Institut für Radioastronomie (MPIfR) for correlation using the DiFX software correlators (Deller et al. 2011).The accumulation period adopted at correlation is 0.4 s, with a frequency resolution of 0.5 MHz.The clock model used during correlation to align the wavefronts arriving at different telescopes is imperfect, owing to an approximate a priori model for Earth's geometry as well as rapid stochastic variations in path length due to local atmospheric turbulence (Paper III).Before the data can be averaged coherently to build up signalto-noise ratio (S/N), these effects must be accurately measured and corrected.This process, referred to as fringe fitting, was conducted using three independent software packages: the Haystack Observatory Processing System (HOPS; Whitney et al. 2004;Blackburn et al. 2019); the Common Astronomy Software Applications package (CASA; McMullin et al. 2007;Janssen et al. 2019a); and the NRAO Astronomical Image Processing System (AIPS; Greisen 2003, Paper III).Automated reduction pipelines were designed specifically to address the unique challenges related to the heterogeneity, wide bandwidth, and high observing frequency of EHT data.The field rotation angle is corrected with Equations (4)-(5), using coefficients given in Table 1.Flux density (amplitude) calibration is applied via a common post-processing framework for all pipelines (Blackburn et al. 2019;Paper III), taking into account estimated station sensitivities (Issaoun et al. 2017;Janssen et al. 2019b).Under the assumption of zero circular polarization of the primary (solar system) calibrator sources, elevation-independent station gains possess independent statistical uncertainties for the right-hand-circular polarization (RCP) and left-handcircular polarization (LCP) signal paths, estimated to be ∼20% for the LMT and ∼10% for all other stations (Janssen et al. 2019b).
To remove the instrumental amplitude mismatch between the LL * and RR * visibility components (the R-L phases are correctly calibrated in all scans by using ALMA as the reference station), calibration of the complex polarimetric gain ratios (the ratios of the G R and G L terms in the G matrices) is performed.This is done by fitting global (multi-source, multi-days) piecewise polynomial gain ratios as functions of time.The aim of this approach is to preserve differences in LL * and RR * visibilities intrinsic to the source (Steel et al. 2019).After this step, preliminary polarimetric Stokes visibilities     , , , ˜˜˜˜c an be constructed.However, the gain calibration requires significant additional improvements.The final calibration of the station phase and amplitude gains takes place in a self-calibration step as part of imaging or modeling the Stokes  brightness distribution, preserving the complex polarimetric gain ratios (e.g., Papers IV, VI).Fully calibrating the D-terms requires modeling the polarized emission.
The Stokes  (total intensity) analysis of a subset of the 2017 observations (Science Release 1 (SR1)), including M87, was the subject of Papers I-VI.The quality of these Stokes  data was assured by a series of tests covering self-consistency over bands and parallel hand polarizations, and consistency of trivial closure quantities (Wielgus et al. 2019).Constraints on the residual non-closing errors were found to be at a 2% level.
For additional information on the calibration, data reduction, and validation procedures for EHT, see Paper III. Information about accessing SR1 data and the software used for analysis can be found on the EHT websiteʼs data portal. 132n this Letter, we utilize the HOPS pipeline full-polarization band-averaged (i.e., averaged over frequency within each band) and 10-second averaged data set from the same reduction path as SR1, but containing a larger sample of calibrator sources for polarimetric leakage studies.In addition, the ALMA linearpolarization observing mode allows us to measure and recover the absolute EVPA in the calibrated VLBI visibilities (Martí-Vidal et al. 2016;Goddi et al. 2019).Other minor subtleties in the handling of polarimetric data are presented in Appendix A.

Polarimetric Data Properties
In Figure 1 (top row), we show the (u, v) coverage and lowband interferometric polarization of our main target M87 as a function of the baseline (u, v) after the initial calibration stage but before D-term calibration.The colors code the scanaveraged amplitude of the complex fractional polarization m  (i.e., the fractional polarization in visibility space; for analysis of m  in another source, Sgr A * , see Johnson et al. 2015) that occur at (u, v) spacings where the Stokes  visibility amplitude enters a deep minimum.The fractional polarization m  of the M87 core is broadly consistent across the four days of observations and between low-and high-frequency bands, therefore high-band results are omitted in the display.
In Figure 1 (bottom row) we show the field rotation angles f for each station observing M87 on the four observing days.The data are corrected for this angle during the initial calibration stage, but the precision of the leakage calibration depends on how well this angle is covered and on the difference in the field angles at the two stations forming a baseline.In the M87 data the field rotation for stations forming long baselines (LMT, SMT, and PV) is frequently larger than 100°except for April 10, for which the (u, v) tracks are shorter.
In addition to the M87 data, a number of calibrators are utilized in this Letter for leakage calibration studies.To estimate D-terms for each of the EHT stations we use several EHT targets observed near in time to M87.In VLBI, weakly polarized sources are more sensitive to polarimetric calibration errors so they are preferred calibrators.For full-array leakage calibration, we focus on two additional sources: J1924-2914 and NRAO 530 (calibrators for the second EHT primary target, Sgr A * ), which are compact and relatively weakly polarized.The main calibrator for M87 in total intensity, 3C 279 (Kim et al. 2020;Paper IV), is bright and strongly polarized on longer baselines and is not used in this work.The properties and analysis of the calibrators are discussed in more detail in Appendices J and K.
The closure traces for M87 and the calibrators can be used both to probe the data for uncalibrated systematic effects (see Appendix B.2) and to ascertain the presence of polarized flux density in a calibration-insensitive manner (see Appendix B.3).
Unless otherwise stated, the following analysis is focused on the low-band half of the data sets.

Methods
Producing an image of the linearly polarized emission requires both solving for the sky distribution of Stokes parameters  and  and for the instrumental polarization of the antennas in the EHT array.In this work, we use several distinct methods to accomplish these tasks.Our approaches can be classified into three main categories: imaging via subcomponent fitting; imaging via regularized maximum likelihood; and imaging as posterior exploration.In this section we only briefly describe each method; fuller descriptions are presented in Appendix C.
The calibration of the instrumental polarization by subcomponent fitting was performed using three different codes (LPCAL, GPCAL, and polsolve) that depend on two standard software packages for interferometric data analysis: AIPS133 and CASA. 134In all of these methods, the Stokes  imaging step is performed using the CLEAN algorithm (Högbom 1974), and sub-components with constant complex fractional polarization in the range from 0 to 2. The data shown are derived from low-band visibilities after the initial calibration pipeline described in Section 3.2 but before any D-term calibration.The data points are coherently scan-averaged.Bottom row: M87 field rotation angle f for each station as a function of time (Equation ( 5)).
are then constructed from collections of the total intensity CLEAN components and fit to the data.In AIPS, two algorithms for D-term calibration are available: LPCAL (extensively used in VLBI polarimetry for more than 20 years; Leppänen et al. 1995) and GPCAL 135 (Park et al. 2021).In CASA, we use the polsolve algorithm (Martí-Vidal et al. 2021), which uses data from multiple calibrator sources simultaneously to fit polarimetric sub-components and allows for D-terms to be frequency dependent (see Appendix D).In all sub-component fitting and imaging methods, we assume that Stokes =  0. Further details on LPCAL, GPCAL, and polsolve can be found in Appendix C.1.
Image reconstruction via the Regularized Maximum Likelihood (RML) method was used in Paper IV along with CLEAN to produce the first total intensity images of the 230 GHz core in M87.RML algorithms find an image that maximizes an objective function composed of a likelihood term and regularizer terms that penalize or favor certain image features.In this work, we use the RML method implemented in the eht-imaging 136 software library (Chael et al. 2016(Chael et al. , 2018) ) to solve for images in both total intensity and linear polarization.Like the CLEAN-based methods, ehtimaging does not solve for Stokes  .Details on the specific imaging methods in eht-imaging used in the reconstructions presented in this work can be found in Appendix C.2.
Imaging as posterior exploration is carried out using two independent Markov chain Monte Carlo (MCMC) schemes: D-term Modeling Code (DMC) and THEMIS.Both codes simultaneously explore the posterior space of the full Stokes image (including Stokes  ) alongside the complex gains and leakages at every station; station gains are permitted to vary independently on every scan, while leakage parameters are modeled as constant in time throughout an observation.We provide more detailed model specifications for both codes in Appendix C.3 and in separate publications (Pesce 2021; A. E. Broderick et al. 2021, in preparation).
Hereafter, we often refer to eht-imaging, polsolve, and LPCAL methods as imaging methods/pipelines and to DMC and THEMIS methods as posterior exploration methods/ pipelines.

Leakage and Gain Calibration Strategy
In the imaging methods we divide the polarimetric calibration procedure for EHT data into two steps.In the first step, we calibrate the stations with an intra-site partner (ALMA-APEX, SMA-JCMT) using the assumption that sources are unresolved on intra-site baselines, where the brightness distribution can be approximated with a simple point source model.In the imaging pipelines we apply the D-terms for ALMA, APEX, and SMA to the data before polarimetric imaging and D-term calibration of the remaining stations.Baselines to the JCMT (which are redundant with SMA baselines) are removed from the data sets, to reduce complications from handling single-polarization data.The ALMA, APEX, and SMA D-terms are fixed in imaging with eht-imaging and polsolve; because LPCAL is unable to fix D-terms of specific stations to zero, it derives a residual leakage for these stations, which remains small. 137In the second step, we perform simultaneous imaging of the source brightness distribution and D-term calibration of stations for which only long, source-resolving baselines are available.In contrast, the posterior exploration pipelines do not use the D-terms derived using the intra-site baseline approach and instead solve for all D-terms (and station gains) starting with the base data product described in Section 3.2.
The point source assumption adopted in the imaging method intra-site baseline D-term calibration step is an extension to the intra-site redundancies already exploited in the EHT network calibration (Paper III), allowing us to obtain a modelindependent gain calibration for ALMA, APEX, SMA, and JCMT.For an unresolved, slowly evolving source we can assume the true parameters of the coherency matrix ρ jk in Equation (6) to be constant throughout a day of observations, as very low spatial frequencies u are sampled, ρ jk ≈ ρ jk (u = 0).Hence, only four intrinsic visibility components of ρ jk per source and four complex D-terms (two for each station) need to be determined from all the data on an available baseline.
We fit the D-terms of ALMA, APEX, JCMT, and SMA for each day using the multi-source feature of polsolve, combining band-averaged observations of multiple sources (3C 279, M87, J1924-2914, NRAO 530, 3C 273, 1055+018, OJ287, and Cen A as shown in Appendix D) on each day in one single fit per band.The results of these fits per station, polarization, day, and band are presented in Figure 2 (left panel), where we also plot the mean and standard deviation of the D-terms across all days and both bands for each station and polarization.In Appendix D, we provide tables with D-term values and further discuss the time and frequency dependence of D-terms and JCMT single polarization handling.In Appendix D we also present several validation tests of our intra-site baseline D-term estimation method carried out to motivate the use of band-averaged data products, comparisons to independent polarimetric source properties measured from simultaneous interferometric-ALMA observations (Goddi et al. 2021) near-in-time interferometric-SMA leakage estimates, and comparisons to results from a model fitting approach.
In addition to intra-site baseline D-term calibration in the imaging pipelines, we also account for residual station-based amplitude gain errors by calibrating the data to pre-determined fiducial Stokes  images of chosen calibrator sources.Given the extreme resolving power of the EHT array, all available calibrators are resolved on long baselines.Therefore, we must select sources that are best imaged by the EHT array; these are compact non-variable sources with sufficient (u, v) coverage.Four targets in the EHT 2017 observations fit these criteria: M87, 3C 279, J1924-2914, and NRAO 530.Stokes  images of M87 and 3C 279 have been published (Papers I-VI; Kim et al. 2020).Final Stokes  images for the Sgr A * calibrators NRAO 530 and J1924-2914 will be presented in upcoming publications (S.Issaoun et al. 2021, in preparation;S. Jorstad et al. 2021, in preparation) but the best available preliminary 135 GPCAL is a new automated pipeline written in Python and based on AIPS and the CLEAN imaging software Difmap.GPCAL adopts a similar calibration scheme to LPCAL but allows users to (i) fit the D-term model to multiple calibrators simultaneously and (ii) use more accurate linear polarization models of the calibrators for D-term estimation.In this Letter, we use GPCAL to complement the LPCAL analysis of the M87 data (Appendices G.3 and K) and the D-term estimation using calibrators (Appendix J).We do not show GPCAL results in the main text. 136https://github.com/achael/eht-imagingimages are used to self-calibrate our visibility data for D-term comparisons in this Letter (Appendix J).
For M87, although multiple imaging packages and pipelines were utilized in the Stokes  imaging process, the resulting final "fiducial" images from each method are highly consistent at the EHT instrumental resolution (e.g., Paper IV, Figure 15).Therefore, we selected a set of Stokes  images for selfcalibration from the RML-based SMILI imaging software pipeline (Akiyama et al. 2017a(Akiyama et al. , 2017b;;Paper IV).The images we use for self-calibration are at SMILIʼs native imaging resolution (∼10 μas), which provide the best fits to the data and are not convolved with any restoring beam.We self-calibrate our visibility data to these images, thereby accounting for residual station gain variations in the data that make imaging challenging.Using these self-calibrated data sets allows the imaging methods to focus on accurate reconstructions of the polarimetric Stokes  and  brightness distributions and D-term estimation.
Preliminary D-terms estimated by the three imaging methods before testing and optimizing imaging parameters on synthetic data are reported in Appendix F. The right panels of Figure 2 show the final D-terms for LMT, PV, and SMT derived from the imaging and posterior modeling methods after optimization on synthetic data (see Section 4.3).To quantify the agreement (or distance in the complex plane) between D-term estimates from different methods we calculate L 1 norms.The L 1 norms averaged over left and right (also real and imaginary) D-term components, over all stations and over the four observing days, are all less than 1% for each pair of imaging methods (see Figure 20 in Appendix E).The mean values of the D-terms from the posterior exploration methods correlate well with the D-terms estimated by the imaging methods.For each combination of imaging and posterior exploration method the station averaged L 1 norms range from 1.5% to 1.89%.
DMC is the only method that solves for independent leftand right-circular station gains.The ratio of these gains derived by DMC on April 11 is shown in Figure 3.As expected, the assumption made by all of the imaging pipelines and one of the posterior exploration pipelines (THEMIS), that right-and lefthand gains are equal for all stations at all times, holds.For  Individual station gain ratios are offset vertically for clarity, with the dashed horizontal lines indicating a unit ratio for each station (i.e., unity for amplitudes and zero for phases).Note that JCMT only observes one polarization at a time, and so provides no constraints on gain ratios.We see that the assumption made by the three imaging pipelines and one posterior exploration pipeline (THEMIS)namely, that the right-and left-hand gains are equal for all stations at all times -largely holds.The behavior in this plot is representative of that seen across days and bands.
verification purposes, we also estimate D-terms using data of several calibrator sources.We find that the D-terms derived by polarimetric imaging of these other sources are consistent with those of M87 (Appendix J).Finally, we note our estimated SMT D-terms are similar to those computed previously using early EHT observations of Sgr A * (Johnson et al. 2015).

Parameter Surveys and Validation on Synthetic Data
Each imaging and leakage calibration method has free parameters that must be set by the user before the optimization or posterior exploration takes place.Some of these parameters (e.g., field of view, number of pixels) are common to all methods, but many are unique to each method (e.g., the subcomponent definitions in LPCAL or polsolve, or the regularizer weights in eht-imaging).In VLBI imaging, these parameters are often simply set by the user given their experience on similar data sets, or based on what appears to produce an image that is a good fit to the data and free of noticeable imaging artifacts.In this work, we follow Paper IV in choosing the method parameters that we use in our final image reconstructions more objectively by surveying a portion of the parameter space available to each method.
We perform surveys over the different free parameters available to each method and attempt to choose an optimal set of parameters based on their performance in recovering the source structure and input D-terms from several synthetic data models.Appendix G provides more detail on the individual parameter surveys performed by each method.The parameter set that performs best on the synthetic data for each method is considered our "fiducial" parameter set for imaging M87 with that method. 138The corresponding images reconstructed from various data sets using these parameters are the method's "fiducial images." The synthetic data sets that we used for scoring the imaging parameter combinations consist of six synthetic EHT observations using the M87 2017 April 11 equivalent low-band (u, v) coverage.The source structure models used in the six sets vary from complex images generated using general relativistic magnetohydrodynamic (GRMHD) simulations of M87ʼs core and jet base (Models 1 and 2 from Chael et al. 2019) to simple geometrical models (a filled disk, Model 3, and simple rings with differing EVPA patterns, Models 4-6).The synthetic source models have varying degrees of fractional polarization and diverse EVPA structures.The synthetic source models blurred to the EHT nominal resolution are displayed in the first column of Figure 4.
All M87 synthetic data sets were generated using the synthetic data generation routines in eht-imaging.We followed the synthetic data generation procedure in Appendix C.2 of Paper IV, but with models featuring complex polarization structure.The synthetic visibilities sampled on EHT baselines are corrupted with thermal noise, phase and gain offsets, and polarimetric leakage terms.Mock D-terms for the SMT, LMT, and PV stations were chosen to be similar to those found by the initial exploration of the M87 EHT 2017 data reported in Appendix F. Random residual D-terms for ALMA, APEX, JCMT, and SMA (reflecting possible errors in the intra-site baseline calibration procedure) were drawn from normal distributions with 1% standard deviation.After generation, the phase and amplitude gains in the synthetic data were calibrated for use in imaging pipelines in the same way as the real M87 data; that is, they were self-calibrated to a Stokes  image reconstructed via the SMILI fiducial script for M87 developed in Paper IV.
In Figure 4, we present our fiducial set of images (in a uniform scale) from synthetic data surveys carried within each method.In each panel we report a correlation coefficient á ñ I I 0 • between recovered Stokes  and the ground-truth  0 images, This reflects the dot product of the two mean-subtracted images when treated as unit vectors.We also calculate a correlation coefficient for the reconstructed linear polarization image The real part is chosen to measure the degree of alignment of the polarization vectors   , ( ).In both cases, images are first shifted to give the maximum correlation coefficient for Stokes  .Because Stokes  image reconstructions are tightly constrained by an a priori known total image flux density, the Stokes  correlation coefficients are mean subtracted to increase the dynamic range of the comparison.This introduces a field-of-view dependence to the metric, as only spatial frequencies above (field of view) −1 are considered; up to the beam resolution.There is no such dependence in the linear polarization coefficient, which is not mean subtracted.
The correlation is equally strong independently of the employed method.The polarization structure is more difficult to recover for models with high or complex extended polarization (Models 1 and 2) for which correlation of the recovered polarization vectors is strong to moderate.In Figure 5 we present a uniform comparison of the recovered D-terms and the ground-truth D-terms for all synthetic data sets and methods.For all methods the recovered D-terms show a strong correlation with the model D-terms.To quantify the agreement (or distance in the complex plane) between D-term estimates and the ground-truth values D Truth in each approach, we calculate the L 1 ≡ |D i − D Truth | norm, where D i is a D-term component derived within a method i.Overall, for the fiducial set of parameters the agreement between the ground truth and the recovered D-terms in synthetic data measured using the L 1 norm is 1.3% on average (when averaging is done over stations, D-term components, and models).The reported averaged L 1 norms give us a sense of the expected discrepancies in D-terms between employed methods for their fiducial set of parameters.However, we notice again that the discrepancies do depend on source structure.For example, in models with no polarization substructure (e.g., Model 3) all methods had difficulty in recovering D-terms for PV (visible as large error bars for the station), a station forming only very long baselines on a short (u, v) track.If we exclude PV from the  L 1 metrics the expected L 1 norms for LMT and SMT alone for all methods are L 1 ∼ 0.6%-0.8%when averaged over models.

Fiducial Polarimetric Images of M87
In Figure 6, we present the fiducial M87 linear-polarimetric images produced by each method from the low-band data on all four observing days.The fiducial images from each method are broadly consistent with those from the preliminary imaging stage shown in Figure 21 of Appendix F.
Unless otherwise explicitly indicated, we display low-band results in the main text.The high-band results are given in Appendix I. We decided to keep the analysis of the high-and low-band data separate for several reasons.First, the main limitations in the dynamic range and image fidelity in EHT reconstructions arise from the sparse sampling of spatial frequencies, not the data S/N.Increasing the S/N by performing band averaging does not improve the dynamic range of the reconstructed images.Second, treating each band separately minimizes any potential chromatic effects that might add extra limitations to the dynamic range, such as intra-field differential Faraday rotation.Finally, separating the bands in the analysis allowed us to use the high-band results as a consistency check on the calibration of the instrumental polarization and image reconstruction for the low-band data.We perform this comparison of the results obtained at both bands in Appendix I. We conclude that both the recovered D-terms and main image structures are broadly consistent between the low and high bands.
The different reconstruction methods have different intrinsic resolution scales; for instance, the CLEAN reconstruction methods model the data as an array of point sources, while the RML and MCMC methods have a resolution scale set by the pixel size.In Figure 6, we display the fiducial images from each method at the same resolution scale by convolving each with a circular Gaussian kernel with a different FWHM.The FWHM for each method is set by maximizing the normalized cross-correlation of the blurred Stokes  image with the April 11 "consensus" image presented in Figure 15 of Paper IV.The blurring kernel FWHMs selected by this method are 19 μas for eht-imaging, DMC, and THEMIS, 20 μas for LPCAL, and 23 μas for polsolve.
The M87 emission ring is polarized only in its southwest region and the peak fractional polarization at ≈20 μas resolution is at the level of about 15%.The residual rms in linear polarization (as estimated from the CLEAN images) is between 1.10-1.30mJy/beam in all epochs, which implies a polarization dynamic range of ∼10.The nearly azimuthal EVPA pattern is a robust feature evident in all our reconstructions across time, frequency, and imaging method.The images show slight differences in the polarization structure between the first two days, 2017 April 5/6 and the last two, 2017 April 10/ 11.Notably, the southern part of the ring appears less polarized on the later days.This evolution in the polarized brightness is consistent with the evolution in the Stokes  image apparent in the underlying closure phase data (Paper III, Figure 14; Paper IV, Figure 23).However, as with the Stokes  image, the structural changes in the polarization images with time over this short timescale (6 days ≈16 GM/c 3 ) are relatively small, and it is difficult to disentangle which differences in the polarized images are robust and which are influenced by differences in the interferometric (u, v) coverage between April 5 and 11 (Paper IV, Section 8.3).
In Figure 7, we show the simple average of the five equivalently blurred fiducial images (one per method) for each of the four observed days.The averaging is done independently for each Stokes intensity distribution.These method-averaged images are consistent with the EHT closure traces, as shown in Figure 13 in Appendix B. We adopt the images in Figure 7 as a conservative representation of our final M87 polarimetric imaging results.

Azimuthal Distribution of the Polarization Brightness
While the overall pattern of the linearly polarized emission from M87 is consistent from method to method, the details of the emission pattern can depend sensitively on the remaining statistical uncertainties in our leakage calibration.In addition, the different assumptions and parameters used in each reconstruction method affect the recovered polarized intensity pattern, introducing an additional source of systematic uncertainty in our recovered images.In this section, we assess the consistency of the recovered polarized images across different D-term calibration solutions within and between methods.
We explore the consistency of our image reconstructions against the uncertainties in the calibrated D-terms by generating a sample of 1000 images for each method, each generated with a different D-term solution.For the imaging methods, we define complex normal distributions for each D-term based on the scatter in recovered D-terms in Figure 2 and reconstruct images after calibrating to each set of random D-terms without additional calibration.This procedure is explained in detail in Appendix H.For the posterior exploration methods we simply draw 1000 images from the posterior for each observing day.
In each method's set of 1000 image samples covering a range of D-term calibration solutions, we study the azimuthal distribution of the polarization brightness (p) and EVPA (χ) by performing intensity-weighted averages of these quantities over different angular sections along the ring.The width of the angular sections used in the averaging is set to Δj = 10°and the averages are computed from a position angle j = 0°to j = 360°, in steps of 1°.
Comparing angular averages of these quantities with a small moving window Δj avoids spurious features from the different pixel scales used in the different image reconstruction methods.The pixel coordinates of the image center are estimated (for each method) from the peak of the cross-correlation between the  images and the representative images of M87 used in the self-calibration.To avoid the effects of phase wrapping in the averaging (which biases the results for values of χ around ±90°), the quantity 〈χ〉 is computed coherently within each angular section, i.e., the averages are defined as In Figure 8, we show histograms of these quantities for two days, 2017 April 5 and 11, as a function of the orientation of the angular section used in the averaging (i.e., the position angle around the ring).We consider these two days because they have the best (u, v) coverage and span the full observation window; these results will thus include any effects of intrinsic source evolution in the recovered parameters.From Figure 8, it is evident that the difference in 〈p〉 between methods is larger than the widths of the 〈p〉 histograms in each method.This means that effects related to the residual instrumental polarization, giving rise to the dispersion seen in the histograms, are smaller than artifacts related to the deconvolution algorithms.In other words, the 〈p〉 images are limited by the image fidelity due to the sparse (u, v) coverage rather than by the D-terms.
Even though there are differences among methods in the p azimuthal distribution, some features are common to all our image reconstructions.The peak in the polarization brightness is located near the southwest on 2017 April 5 (at a position angle of 199°± 11°, averaged among all methods) and close to the west on 2017 April 11 (position angle of 244°± 10°).That is, the polarization peak appears to rotate counter-clockwise between the two observing days (see the dotted lines in Figure 8).On both days, the region of high polarization brightness is relatively wide, covering a large fraction of the southern portion of the image (position angles from around 100°-300°).
In the azimuthal distribution of 〈χ〉, all methods produce very similar values in the part of the image with the highest polarized brightness (the southwest region, between position angles of 180°and 270°).The EVPA varies almost linearly, from around 〈χ〉 = − 80°(in the south) up to around 〈χ〉 = 30°( in the east).The EVPAs on 2017 April 11 are slightly higher (i.e., rotated counter-clockwise) compared to those on 2017 April 5.This difference is clearly seen for eht-imaging, polsolve, and THEMIS, though the difference is smaller for DMC and LPCAL.We notice, though, that the differences in the EVPAs between days could also be affected by small shifts in the estimates of the image center on each day.Outside of the region with high polarization, the EVPA distributions for all methods start to depart from each other.There is a hint of a constant EVPA 〈χ〉 ∼ 0°in the northern region (i.e., position angles around 0°-50°) in polsolve and LPCAL on both days, but the other methods show larger uncertainties in this region.
The discrepancies in EVPA among all methods only appear in the regions with low brightness (i.e., around the northern part of the ring).Therefore, polarization quantities defined from intensity-weighted image averages, discussed in the next sections, will be dominated by the regions with higher brightness, for which all methods produce similar results.Image-averaged quantities are somewhat more robust to differences in the calibration and image reconstruction algorithms, though they are not immune to systematic errors.

Image-averaged Quantities
In comparing polarimetric images of M87, we are most interested in identifying acceptable ranges of three imageaveraged parameters that are used to distinguish between different accretion models in Paper VIII: the net linear polarization fraction of the image |m| net , the average polarization fraction in the resolved image at 20 μas resolution 〈|m|〉, and the m = 2 coefficient of the azimuthal mode decomposition  6).Method-average images for all four M87 observation days are shown, from left to right.These images show the low-band results; for a comparison between these images and the high-band results, see Figure 28 in Appendix I. We employ here two visualization schemes (top and bottom rows) to display our four method-average images.The images are all displayed with a field of view of 120 μas.Top row: total intensity, polarization fraction, and EVPA are plotted in the same manner as in Figure 6.Bottom row: polarization "field lines" plotted atop an underlying total intensity image.Treating the linear polarization as a vector field, the sweeping lines in the images represent streamlines of this field and thus trace the EVPA patterns in the image.To emphasize the regions with stronger polarization detections, we have scaled the length and opacity of these streamlines as the square of the polarized intensity.This visualization is inspired in part by Line Integral Convolution (Cabral & Leedom 1993)  of the polarized brightness β 2 .These parameters are defined below.
First, the net linear polarization fraction of the image is where the sum is over the pixels indexed by i. ALMA measured |m| net = 2.7% on 2017 April 11 (Goddi et al. 2021), but this measurement includes emission at large scales outside of the 120 μas field of view of the EHT images.We also consider the intensity-weighted average polarization fraction across the resolved EHT image: The value of 〈|m|〉 is determined by the intensity of the polarized emission at each point in the image, and thus it is sensitive to the resolution of the image and the choice of restoring beam.Specifically, images restored with beams of larger FWHM will tend to be more locally depolarized and thus have lower 〈|m|〉 than images restored with beams of smaller FHWM.In contrast, the integrated polarization fraction |m| net is insensitive to convolution.We quantify the polarization structure with a decomposition into azimuthal modes.In particular, Paper VIII considers the complex amplitude β 2 of the m = 2 mode defined in Palumbo et al. (2020), who found this mode to be the most important in distinguishing different modes of accretion from 230 GHz images produced by different GRMHD simulations.The β 2 azimuthal mode decomposition coefficient is defined as where (ρ, j) are polar coordinates in the image plane, and I ring is the Stokes  flux density in the ring between the minimum radius r min and the maximum radius r max .Because our reconstruction methods recover no significant extended brightness off the main ring, we take r = 0 min and extend r max to encompass the full-image field of view, so I ring is equal to the total Stokes  flux density in the image.
Note that both the amplitude |β 2 | and phase ∠β 2 depend on the choice of image center, image resolution, and restoring beam size.In the comparisons that follow, we convolve images from each method with circular Gaussian beams with the FWHMs specified in Section 5.1 chosen to bring all images to the same resolution scale.Furthermore, we center each reconstruction by finding the pixel offset that maximizes the cross-correlation between the blurred Stokes  image and the 2017 April 11 consensus Stokes  image from Paper IV.In general, we find these offsets to be small, and our results do not change significantly if we do not apply any centering procedure in calculating β 2 from our reconstructed images.
From the sets of 1000 images generated by each method to explore variations of the image structure with the D-term solution, we compute distributions of each key metric-|m| net , 〈|m|〉, |β 2 |, and ∠β 2 -that is used in Paper VIII for theoretical interpretation.These distributions are summarized in Figure 9, which displays the mean points and 1σ error bars from different D-term realizations for all four methods on both 2017 April 5 and 11.We present a more complete look at these distributions with histograms for each quantity from each method/day pair in Appendix H, Figures 25 and 26. Figure 9 shows results for low-band images only; we compare these results to results derived from the high-band data in Figure 29 in Appendix I.Because we derived and vetted our imaging procedures for the low-band data, we use only the low-band results in determining our final parameter measurements and use the high-band results in Appendix I as a consistency check.
On each observation day, the distributions of |m| net appear consistent between most pairs of reconstruction methods, with some notable exceptions.Many of the distributions of |m| net peak around the ALMA measured value of 2.7%, but the LPCAL distributions on both days and the eht-imaging distributions on 2017 April 11 are peaked closer to 1%.The distributions of 〈|m|〉 are peaked between 6% and 11% for all five methods across both days.On both days, the 〈|m|〉 distributions for eht-imaging, DMC, and THEMIS are peaked at values 2%-3% higher than the corresponding LPCAL or polsolve distributions.This systematic shift may indicate residual issues with bringing the reconstruction methods to the same resolution scale; in particular, the same circular Gaussian kernel was used to blur Stokes   , and  in each method, while the intrinsic resolution of the reconstruction in  and  may be lower than in total intensity.In each method, there appears to be a decrease in 〈|m|〉 of ≈1%-2% between 2017 April 5 and 11.Note that, because it is constrained to be positive and defined as the ratio of two uncertain flux densities, the mean of a distribution of fractional polarization m can have a positive bias.In Figure 25, we see that the distributions of |m| net and 〈|m|〉 both can have long tails; this is most evident on 2017 April 10, when the image reconstructions are the most uncertain due to poor (u, v) coverage.On 2017 April 5 and 11, we do not see prominent tails in the distributions of |m| net and 〈|m|〉.Furthermore, we expect any bias in the mean of these quantities in the measurement from a single method to be overwhelmed by the systematic uncertainty between different reconstruction methods.
The mean of the |β 2 | distribution is peaked between 0.04 and 0.07 for all methods on both days; however, the |β 2 | distributions from eht-imaging, DMC, and THEMIS have larger mean values on both days than the corresponding distributions for polsolve and LPCAL.Again, because |β 2 | is sensitive to the restoring beam size, this may be due to residual errors in bringing the polarized images to the same resolution scale.Similarly to the distributions of 〈|m|〉, there are indications of a shift downward in |β 2 | by an absolute value of ≈0.01 in all four methods between 2017 April 5 and 11.The distributions of the phase ∠β 2 are consistent between most pairs of methods with no obvious systematic difference  12)), the average polarization fraction 〈|m|〉 (Equation ( 13)), and the amplitude |β 2 | and phase ∠β 2 of the m = 2 azimuthal mode of the complex polarization brightness distribution (Equation ( 14)).The shaded bands show the consensus ranges (Table 2) incorporating both uncertainties in these parameters from the D-term calibration and systematic discrepancies between image reconstruction methods.A comparison with analogous highband results is provided in Figure 29.
between the sub-component methods (LPCAL, polsolve) and those that use a continuous image representation (ehtimaging, DMC, and THEMIS).Furthermore, there is no apparent systematic difference in the ∠β 2 results between 2017 April 5 and 11.
To score different accretion models from GRMHD simulations against constraints from the EHT data, Paper VIII uses a range for each quantity that incorporates both the uncertainties in the parameters from the D-term calibration process (the error bars for each method in Figure 9) and the systematic uncertainty across methods (the scatter in the points across methods).The final ranges used in Paper VIII for each parameter were set by taking the minimum/maximum of the 10 mean values minus/plus the 1σ error bars across both days and all methods.These parameter ranges are denoted by colored bands in Figure 9 and are presented in Table 2.

Discussion
We discuss several important effects in the polarimetric emission from M87 that are relevant for our analysis of the 230 GHz linear polarization structure in this work.In particular, we discuss implications of our results for source variability and Faraday rotation in M87.
Figure 8 demonstrates variability in both the total intensity and polarimetric images of M87 between 2017 April 5 and 11.It is unlikely that the polarimetric variability in the reconstructed images is due to the different (u, v) coverages on different days.The changes in the polarimetric images are consistent with signatures of the source intrinsic variability seen in the VLBI data themselves.In addition to the reconstructed images and calibrated data, we see variability in calibration-insensitive VLBI data products; these are introduced and discussed in Appendix B.
The total flux density from M87ʼs inner arcseconds measured on EHT intra-site baselines (ALMA-APEX) and by ALMA alone is F ∼ 1.2 Jy; this is a factor of 2 higher than the total flux density measured in the ring visible on EHT scales.Given that the net fractional polarization measured in the EHT images (|m| net ∼ 1%-3.7%) is consistent with that measured on arcseconds scales |m| ∼ 2.7% (Goddi et al. 2021), the net fractional polarization of any other emission component(s) in the ALMA field of view should be comparable to that of the ring resolved by the EHT.
The polarimetric image stability analysis (Section 5.2) provides a measurement of the integrated EVPA and associated D-term calibration uncertainty for each reconstruction method and observing day. Figure 10 shows that the total EVPA integrated over the EHT images ranges from χ ∼ − 70°to χ ∼ − 55°on 2017 April 5 and from χ ∼ − 25°to χ ∼ − 10°o n 2017 April 11, depending on the image reconstruction method.On all days, the EHT-measured EVPA in the core is significantly offset from the EVPA measured by ALMA on large scales.This offset implies that the extended component within the central arcseconds is polarized.We note that in both EHT and ALMA-only observations, the EVPA swings in a counter-clockwise direction from 2017 April 5 to 11.
Figure 10 also compares the image-integrated EVPA measured from the EHT high-and low-band images across all four days (see Appendix I for a full discussion of the highband results).For 2017 April 6, 10, and 11, the high-and lowband net EVPAs are consistent for each method within the 1σ error bars derived from the D-term calibration sample.On 2017 April 5, we see a systematic net EVPA offset between the lowand high-band images: Δχ |HI−LO| ≈ 20°-30°in all five methods.
If we were to interpret this systematic EVPA offset on 2017 April 5 as Faraday rotation from an external screen, it would correspond to |RM| ∼ 1-2 × 10 7 rad m −2 .As this is the largest effect that we observe between the bands, we can adopt this number as a conservative upper limit on the resolved RM.While on the other three days (2017 April 6, 10, and 11) there is no signature of an offset in the image-integrated EVPA, we do see intriguing offsets in the EVPAs in some portions of the resolved images.Such non-uniform rotations may be indicative of Faraday rotation occurring internally in the compact source, but because of the low significance of these detections we make no further effort to interpret them here.
The full implications of these results for Faraday rotation in the source depend on the magnitude, location, and nature of the Faraday screen.Goddi et al. (2021) report contemporaneous ALMA measurements of the RM in M87 on arcseconds scales; these range from 1.5 × 10 5 rad m −2 to − 0.4 × 10 5 rad m −2 .If we interpret the ALMA-only measurements as the result of a variable external Faraday screen, they would imply EVPA rotations from infinite frequency to 230 GHz of less than 15°, with day-to-day swings in the EVPA of up to 20°from variability in the external RM alone.However, Goddi et al. (2021) also consider a two-component model comprising compact and extended (arcseconds-scale) emission regions with separate, static Faraday screens.This two-component model is capable of reproducing both the magnitude and interday variability of the observed ALMA RMs, with the variability entirely in the underlying source, not the Faraday screen.In contrast to the direct ALMA measurement, this model suggests the RM relevant for the EHT images is of order − 5 × 10 5 rad m −2 .Intrinsic polarimetric evolution is also supported by the changes in Stokes  alone and by the changes in the distribution of polarized intensity in our polarimetric images; we do not see a simple uniform rotation of EVPAs between 2017 April 5 and 11 around the emission ring in Figure 7.
In reality, the net EVPA and resolved EVPA structure in both EHT bands are affected by the complicated interplay of intrinsic source structure and evolution, Faraday rotation internal to the emission region, and Faraday rotation from an external screen (if present).Paper VIII discusses all of these scenarios in more detail.Starting from 2018, the EHT observes simultaneously in 212.1-216.1 GHz and 226.1-230.1 GHz frequency bands (Paper II).This development should allow us to better quantify the resolved RM and to address the intrinsic polarimetric variability of M87 with better precision in the future.ote.The ranges are taken from the bands plotted in Figure 9 incorporating the ± 1σ error from each method's D-term calibration survey.
Finally, in this Letter, we discuss only the linear polarization images.Given its small magnitude, Stokes  is significantly more sensitive to calibration choices and residual errors than the linear polarization components.For that reason a full analysis of the circular polarization structure in M87 will be presented separately.

Summary
We present polarimetric calibration and polarimetric imaging of the EHT 2017 data on the 230 GHz core of M87 on scales comparable to the supermassive black hole event horizon.Our analysis follows up on the M87 total intensity data calibration, image reconstructions, and model fits presented in Paper III, Paper IV, and Paper VI.
We employ multiple distinct methods for polarimetric calibration and polarimetric imaging.All methods were first tested on a suite of synthetic data.When applied to M87, they consistently show that the polarized emission is predominantly from the southwest quadrant.In all reconstructions, the polarization vectors are organized into a similar coherent pattern roughly oriented along the ring.In all reconstructions, both the image-integrated net linear polarization fraction and the average resolved polarization fraction on the ring are consistent to within a few percent.We observe signatures of evolution in the ring's polarization from 2017 April 5 to 11, the full length of the EHT 2017 observing campaign.In this work, we demonstrate that the main polarimetric characteristics of the M87 ring are robust to D-term calibration uncertainties and to the choice of image-reconstruction algorithm, though the detailed source structure (particularly in low brightness regions) is still limited by the EHT's very sparse (u, v) coverage and thus depends sensitively on choices made in the image reconstruction and calibration process.
The high-angular-resolution observation with the EHT, on unprecedented scales of ∼20 μas ≈2.5 R S , allows us for the first time to reconstruct the geometry of magnetic fields in the immediate vicinity of the event horizon of the M87 supermassive black hole.The physical interpretation of our polarimetric images and the full discussion of horizon-scale magnetic field geometries consistent with the EHT images are presented in Paper VIII.

Appendix A Polarimetric Data Issues
In this section we describe station-specific issues and present the results of a set of validation tests and refinements in the calibration that have been performed on the EHT data, prior to the calibration of the instrumental polarization and the final reconstruction of the full-Stokes EHT images.

A.1. Instrumental Polarization of ALMA in VLBI Mode
Phased ALMA records the VLBI signals in a basis of linear polarization, which need a special treatment after the correlation (Martí-Vidal et al. 2016;Matthews et al. 2018).The postcorrelation conversion of the ALMA data from a linear basis into a circular basis has implications for the kind of instrumental polarization left after fringe fitting.As discussed in Goddi et al. (2019), any offset in the estimate of the phase difference between the X and Y signals of the ALMA antenna used as the phasing reference (an offset likely related to the presence of a non-zero Stokes  in the polarization calibrator) maps into a post-conversion polarization leakage that can be modeled as a symmetric, pure-imaginary D-term matrix (i.e., D R = D L = iΔ).The amplitude of the ALMA D-terms, Δ, can be approximated (to a first order) as the value of the phase offset between X and Y in radians (Goddi et al. 2019).Hence, we expect the D R and D L estimates for ALMA to be found along the imaginary axis and to be of similar amplitude.
Furthermore, the ALMA feeds in Band 6 (the frequency band used in the EHT observations) are rotated by 45°with respect to their projection on the focal plane.This introduces a phase offset between the RCP and LCP post-converted signals that has to be corrected after the fringe fitting.This offset can be applied as a global phase added (subtracted) to the RL * (LR * ) correlation products in all baselines (because ALMA has been used as the reference antenna in the construction of the global fringe-fitting solutions).We have applied this 45°rotation to all the visibilities before performing the analysis described in this Letter.Hence, the absolute position angles of the electric vectors (EVPA) derived from our EHT observations are properly rotated into the sky frame.This property of the ALMA-VLBI observations (see Appendix D) gives us absolute EVPA values instantaneously.

A.2. Instrumental Polarization of the LMT
The LMT shows an unexpectedly high leakage signal with a large delay of ∼1.5 ns, which affects the cross-polarization phase spectra of the baselines related to the LMT.As a consequence, all the baselines related to the LMT show spurious instrumental fringes in the RL * and LR * correlations, with amplitudes similar to (and even higher than, for the case of sources with low intrinsic polarization) that of the main fringe.These instrumental fringes are smallest in the parallel-hand correlations (RR * and LL * ), but relatively high in the crosspolarization hands and are related to strong polarization leakage likely due to reflections in the optical setup of the LMT receiver used in 2017 (Paper III).For the EHT observations on year 2018 and beyond, the special-purpose interim receiver used at the LMT was replaced by a dualpolarization sideband-separating 1.3 mm receiver, with better stability and full 64 Gbps sampling as for the rest of the EHT (Paper II), so future polarimetry analyses of the EHT may be free of this instrumental effect from the LMT.
If we take the frequency average over all intermediatefrequency (IF) sub-bands (the results presented in Paper I-Paper VI are based on this averaging), the effect of this leaked fringe is smeared out, as the average is equivalent to taking the value of the visibility at the peak of the main fringe.This main peak is only affected by the sidelobe of the delayed leaked fringe, with a relative amplitude that we estimate to be of 10%-20% of the cross-polarization main fringe.Therefore, the effect of the leaked fringe is small in comparison to the contribution from the ordinary instrumental polarization, which can especially dominate the cross-polarization signal for observations of sources with low polarization like M87, and can be ignored.

A.3. Instrumental Polarization of the SMA
The dual-polarization observations performed by the SMA use two independent receivers at each antenna to register the RCP and LCP signals.However, the visibility matrices of the baselines related to the SMA are built from the combination of the RCP and LCP streams as if they were registered with one single receiver.Therefore, some of the assumptions made in the RIME (see Equation ( 6)) for the polarimetry calibration (e.g., stable relative phases and amplitude between polarizations) may not apply for the SMA-related visibilities.However, the fringe fitting of the parallel-hand correlations related to the SMA, as well as the absolute amplitude calibration (both described in Paper III) did account for the drifts in crosspolarization phase and amplitude between the SMA receivers, which makes it possible to model the instrumental polarization using ordinary leakage matrices.
One extra correction that has to be applied to the D-terms of the SMA is a phase rotation between the RCP and LCP leakages, to account for the 45°rotation of the antenna feed with respect to the mount axes.The D-terms shown in Section 4.2 and in Appendix D are corrected for this rotation.

A.4. Instrumental Polarization of the JCMT
The JCMT was equipped with a single-polarization receiver for these observations, so that only one of the two polarizations can be used at each epoch.Therefore, only one of the two cross-polarization correlations can be computed in all the baselines related to the JCMT; depending on which product is computed, we can only solve for one of the two D-terms of the JCMT (i.e., D L if RCP is recorded; D R otherwise).

A.5. Cross-polarization Delays
As explained in Martí-Vidal et al. (2016), a byproduct of the use of polconvert in VLBI is the calibration of the absolute cross-polarization delays and phases in the stations with polconverted data, which allow for the reconstruction of the absolute EVPAs of the observed sources.The only condition to have this absolute R/L delay and phase calibration is to use the polconverted station (i.e., ALMA, in the case of the EHT) as the reference antenna in the fringe fitting.
In Figure 11, we show the difference of multi-band delays between RL * and LR * after the GFF calibration described in Paper III.All source scans and baselines with an S/N higher than 10 are shown for times when ALMA was participating in the observations.According to Martí-Vidal et al. (2016), the delay difference between RL * and LR * should be around zero when ALMA is the reference antenna.We see, though, hints of a small global residual delay difference after the GFF calibration (the points are not symmetrically distributed around zero).The weighted average of all the delay differences shown in Figure 11 is Δτ = − 28 ± 1 ps.This is a very small delay in absolute value (the amplitude losses due to this delay in each correlation product is lower than 1%), but still detectable at the remarkably high S/N level of the EHT observations.

B.1. Definition
Closure traces (Broderick & Pesce 2020) are calibrationinsensitive quantities constructed on station quadrangles from the coherency matrices ρ jk defined in Equation (2): These data products are a superset of the more familiar closure quantities (closure phases and closure amplitudes), with the additional property that they are independent of instrumental polarization.The closure traces are also independent of any other station-based effects that can be described in a Jones matrix formalism, including the definition of the polarization basis (e.g., the representation of the polarized quantities in terms of linear or circular feeds).The closure traces thus provide a powerful tool with which to make calibrationindependent statements regarding polarimetric data and intrinsic source structure.By analogy with trivial closure phases (see Paper III), trivial closure traces may be constructed on "boomerang" quadrangles, i.e., quadrangles in which a station is effectively repeated in such a way as to make the quadrangle area vanish (Broderick & Pesce 2020).Given two co-located stations i and ¢ i , the closure trace ¢  iji k reduces to unity.Each quadrangle ijkl has an associated "conjugate" quadrangle ilkj, constructed by reordering the baselines within the coherency matrix product. 139Conjugate closure trace products can be expressed as º = + -+ -

˜
. The  ijkl are identically unity in the absence of intrinsic source polarization and for point sources.Deviations from unity require non-constant interferometric polarization fractions on baselines in the quadrangle ijkl, and therefore closure trace products are a robust indicator of polarized source structures (Broderick & Pesce 2020).

B.2. Implications for Polarimetric Data Quality
For EHT observations, boomerang quadrangles are formed using the redundant baselines presented by ALMA and APEX. 140In the top panel of Figure 12, the phases of all of the trivial closure traces are shown for M87 (blue) and the calibrators (J1924-2914, NRAO 530, 3C 279; gray), constructed from scan-averaged visibility data.The values of these phases are clustered about zero, consistent with the expectation that the trivial closure traces are unity.
The distribution of the normalized residuals provide a direct assessment of the systematic error budget of the polarimetric data independent of the gain and leakage calibration.These residuals are shown in the bottom panel of Figure 12 for both M87 and calibrators.We find that the data match the anticipated unitvariance Gaussian, which is consistent with an absence of unidentified systematic uncertainties in the polarimetric data.

B.3. Calibration-insensitive Detection of Polarization
Figure 13 shows the phase of the conjugate closure trace product,  ijkl , for a quadrangle pair ALMA-APEX-LMT-SMT and ALMA-SMT-LMT-APEX.The presence of non-zero  ijkl is a calibration-insensitive indicator of significant polarized structures in the Stokes map.Because the uncertainties of the closure traces on conjugate quadrangles are correlated, the resulting uncertainty in the conjugate closure trace product is typically smaller than would be estimated from assuming independent uncertainties in the individual closure traces.The errors shown have been estimated using Monte Carlo sampling of the constituent visibilities.

B.4. Calibration-insensitive Detection of Evolving Source Structure
Closure trace phases are shown on a handful of non-trivial quadrangles in Figure 14 for M87.These phases are clearly nonzero and exhibit variations throughout the observing night, which is consistent with non-trivial source structure.The behavior of the closure trace evolution is similar across neighboring observation days (e.g., 2017 April 5/6, April 10/ 11) and consistent between quadrangles constructed using ALMA (filled markers) and APEX (open markers).The behavior of the closure trace evolution is dissimilar between the 2017 April 5/6 and April 10/11 observations, providing direct evidence for an evolving source structure in M87 independent of all station-based corrupting effects, including polarization leakage.Because ALMA, LMT, and SMT are nearly co-linear as seen from M87 for much of the observations, and because the closure traces are presumably tracing primarily the Stokes  emission, the closure trace phases in Figure 14 are very similar to the Stokes  closure phases shown in Figure 14 of Paper III.Note that the points plotted in Figure 14 have been averaged across both the high and low bands.

B.5. Calibration-insensitive Probe of Evolving Polarimetric Source Structure
Conjugate closure trace product phases are shown in Figure 15 for each observation day for the ALMA-PV-LMT-SMT and ALMA-SMT-LMT-PV quadrangle pair.There is the appearance of temporal evolution from April 5/6 to April 10/11, with an attendant implication for an evolution in the polarization map of M87 between those periods.However, the paucity of quadrangles exhibiting significant  ).The colored lines show the same conjugate closure trace product phase for each of the image reconstructions (see Figure 6), and the dark gray solid line shows the same for the fiducial image (see Figure 7).evolution renders this conclusion suggestive at best.Note that the points plotted in Figure 15 have been averaged across both the high and low bands.

Appendix C Detailed Description of Algorithms for the Calibration of Instrumental Polarization
In this Appendix we provide a brief description of each of the five image reconstruction and D-term calibration pipelines that we employ in this work.More complete descriptions of each method can be found in the individual method papers referenced for each individual pipeline.

C.1. Polarimetric Imaging via Sub-component Fitting:
polsolve, LPCAL, and GPCAL In the sub-component fitting method for polarimetric modeling, the instrumental polarization (i.e., the complex D-terms) and the source polarized brightness distribution are estimated simultaneously from the interferometric observables and a fixed estimate of the source brightness distribution  x ( ).The sub-component fitting calibration algorithms estimate the D-terms from Equations (4) and (7) by modeling the polarized source structure as a disjoint set of N "polarization sub-components,"  x i ( ), such that:  The fractional polarization of each sub-component is assumed to be constant, implying that: where q i and u i are real-valued constants.In the sub-component fitting method, we therefore assume that the polarized brightness is exactly proportional to  i for each source sub-component.This condition is known as the "similarity approximation" and may produce inaccurate estimates of the instrumental polarization for cases of strongly polarized and resolved calibrators (e.g., Cotton 1993) and/or if the sub-division of  into sub-components is not performed properly.Discussions about the self-similarity assumption can be found in Appendix K.
The polsolve, LPCAL, and GPCAL algorithms determine which values of q i , u i , D R , and D L minimize the difference between the calibrated visibility matrix and the Fouriertransformed model brightness matrix (Equations ( 6) and (4)).The total number of parameters used in this fit is equal to two times the the number of source sub-components (i.e., 2N, which correspond to q i and u i in Equation (C2)) plus four times the number of antennas (i.e., 4N a , accounting for the real and imaginary parts of the D R and D L of each antenna).The error function to be minimized is the sum of the χ 2 values computed for the cross-polarization matrix elements of the RIME, i.e.,  4)), whereas ˜and  ˜depend on q i and u i .The χ 2 minimization solves for the intrinsic source Stokes parameters and the instrumental polarization simultaneously.We note that the effects of instrumental polarization are constant in the frame of the antenna feed for Cassegrain-mounted feeds, whereas the intrinsic source polarization is defined in the sky frame; as a consequence, the changing feed angle of each antenna across the observations (i.e., the Earth rotation during the extent of the observations) allows the model fitting to decouple the antenna D-terms from the Stokes parameters of the source sub-components.For Nasmyth-mounted feeds, there is an additional rotation between cross-polarization introduced by the feed/optics and the telescope itself.In this case, we assume a minimal contribution from the antennas themselves, a reasonable assumption for these on-axis telescopes.Equation (C3) implies that there are several implicit assumptions in the polarimetric modeling of polsolve and LPCAL.First, RL *c and LR *c are computed by setting =  0 (i.e., any circular polarization in the calibrators is neglected, compared to Stokes  ).Furthermore, the real and the imaginary parts of the residual visibilities (i.e., either RL *c or LR *c minus the Fourier transforms of the corresponding model brightness distributions) are assumed to be statistically independent.
If the linear polarization structures of calibrators are not similar to their total intensity structures, a breakdown of the similarity approximation can occur.This can be a source of uncertainties in D-term estimation.In Appendix K, we discuss this effect on our results for different polarization calibrators reported in this Letter.

C.2. Polarimetric Imaging via Regularized Maximum
Likelihood: eht-imaging The package eht-imaging (Chael et al. 2016(Chael et al. , 2018) implements polarimetric image reconstruction via RML.ehtimaging solves for an image X by minimizing an objective function via gradient descent.The objective function J(X) is a weighted sum of data-consistency log-likelihood terms and regularizer terms that favor or penalize certain image features.That is, to find an image (in either total intensity, polarization, or both) we minimize å å Picking optimal values of the "hyperparameter" weights α i and β j in Equation ( C4) is an essential task in RML imaging.Here we describe the data terms and regularizers that we use for polarimetric imaging, and in Appendix G we describe our method for determining the hyperparameters using parameter surveys.
For polarized image reconstructions, we follow the method laid out in Chael et al. (2016), with the addition of iterative selfcalibration of any uncorrected station D-terms.We start with data that has had the overall time-dependent station amplitude and phase gains calibrated using the SMILI fiducial image from Paper IV, and the ALMA, APEX, SMA, and JCMT D-terms have been corrected using the zero-baseline solutions described in Section 4.2.The data are scan-averaged.We then reconstruct the Stokes  using the same fiducial imaging script for ehtimaging developed in Paper IV.We fix the image field of view at 120 μas and solve for a grid of 64 × 64 pixels.In the Stokes  imaging, the total flux density is constrained to be 0.6 Jy.We next (re)self-calibrate the station amplitude and phase gains (assuming G R = G L ) to our final Stokes I image.Having extensively explored the imaging parameter space for Stokes  imaging in Paper IV, we do not vary these parameters in our polarimetric imaging surveys.After self-calibrating to our final total intensity image, we drop zero baselines for the polarimetric imaging stage.
In defining an objective function of the form in Equation (C4) for the polarized image reconstruction, we consider two loglikelihood χ 2 terms; one computed using the RL * polarimetric visibility = +    i ˜˜, and one using the visibility domain polarimetric ratio  is immune to any residual station gain error left over from Stokes  imaging, while c p 2 ˜is not.We use two regularizers on the polarized intensity.First, the Holdaway-Wardle (Holdaway & Wardle 1990) regularizer S HW (Equation (13) of Chael et al. 2016) acts like an entropy term that prefers image pixels take a value less than = m 0.75 max .This regularizer encourages image pixels to stay below the theoretical maximum polarization fraction for synchrotron radiation, but it is not a hard limit.Second, the total variation (TV) regularizer S TV (Rudin et al. 1992) acts to minimize pixel-to-pixel image gradients in both the real and imaginary part of the complex polarization brightness distribution (Equation ( 15) of Chael et al. 2016).
Taken together, the objective function we minimize in polarimetric imaging is The relative weighting between the data constraints and the regularizer terms is set by the four hyperparameters α P , α m , β HW , and β TV .
We solve for the polarized intensity distribution that minimizes Equation (C5) parameterized by the fractional polarization m and EVPA ξ in each pixel.The Stokes  image is fixed in the polarimetric imaging step and defines the region where polarized emission is allowed.To ensure our solution respects 2 everywhere, we transform the fractional polarization m in each pixel from the range m ä [0, 1] to κ ä (−∞ , ∞ ) and solve for κ (See Appendix D of Chael et al. 2016).In the eht-imaging script for EHT M87 observations, we solve for the pixel values of κ and ξ that minimize the objective function by gradient descent, and we then transform ).We often restart the gradient descent process several times, using the output of the previous round of imaging blurred by a 20 μas Gaussian kernel as the new initial point.
In between rounds of polarimetric imaging with ehtimaging, we iteratively solve for the remaining D-terms by minimizing the χ 2 between the real (gain-calibrated) data and sampled data from the current image reconstruction corrupted with Jones matrices (Equation ( 4)).We do not use any linearized approximations of the effects of the Jones matrices when solving for the D-terms, but throughout we assume the model image has no circular polarization ( =  0).The ehtimaging pipeline thus alternates between rounds of polarimetric imaging and D-term calibration; often it takes many successive rounds of imaging and D-term calibration (n iter ≈ 50-100) for the process to converge on a stable D-term solution.

C.3. Polarimetric Imaging as Posterior Exploration: DMC and THEMIS
In this section we describe two MCMC schemes developed for polarimetric imaging.Both MCMC codes model the polarized emission structure on a Cartesian grid of intensity points, with the Stokes vector parameterized using a spherical (Poincaré) representation, where the index i runs over individual grid points, where we solve for the total intensiy  i , the fractional polarization ℓ i , and the two angles ξ i , ς i .Stokes visibilities are generated from the gridded emission structure via a direct Fourier transform (i.e., treating each grid point as a point source), and the visibilities are then multiplied with a smoothing kernel to impose image continuity.The parallel-and cross-hand visibilities on each baseline are then computed from the Stokes visibilities using Equation (7), and the gains and leakage terms are applied to the model visibilities using a Jones matrix formalism (see Equation ( 6)).The model and data visibilities are ultimately compared via complex Gaussian likelihood functions for each of the parallel-and cross-hand data products independently, with the total likelihood taken to be the product of the individual likelihoods for all parallel-and cross-hand data products.

C.3.1. DMC
We introduce a new DMC that utilizes the Hamiltonian Monte Carlo (HMC) sampler implemented in the PyMC3 probabilistic programming Python package (Salvatier et al. 2016) to perform posterior exploration.We briefly describe the relevant aspects of the DMC analysis in this section; a more thorough description of the software is provided in Pesce (2021).Prior to fitting, we coherently average the visibility data on a per-scan basis and flag the intra-site baselines.
Within the DMC framework, the  image axes are aligned with the equatorial coordinate axes.The pixel intensities are constrained to sum to a total flux density via the imposition of a flat Dirichlet prior, and the total flux density is restricted to be positive via a uniform prior on the range (0,2) Jy.The radial Stokes parameter (ℓ i in Equation (C6)) is sampled from a unit uniform prior, and the angular Stokes parameters (ξ i and ς i in Equation (C6)) are uniformly sampled on the sphere.We multiply the model visibilities by a circular Gaussian kernel to impose image smoothness.
In DMC, both the right-and left-hand complex station gains are modeled independently on every scan, save for a single reference station (chosen to be ALMA) that is constrained to have zero right-and left-hand gain phase at all times.We impose log-normal priors on the gain amplitudes and wrapped uniform priors141 on the gain phases.The right-and left-hand leakage amplitudes are sampled from a unit uniform prior, and the leakage phases are sampled from a wrapped uniform prior.
The DMC likelihood variances are set to the quadrature sum of the data thermal noise variances and a systematic component that is modeled as the square of a fraction of the Stokes  visibility amplitude; this fractional uncertainty parameter is sampled from a unit uniform prior.

C.3.2. THEMIS
The existing method for imaging via posterior exploration described in Broderick et al. (2020b) has been extended to polarization reconstructions.This makes use of a deterministic even-odd swap tempering scheme (Syed et al. 2019) using the HMC sampling kernel from the Stan package (Carpenter et al. 2017).Here we briefly summarize the implementation and assumptions underlying the THEMIS polarization map reconstructions; more detail on these points will be presented elsewhere (A.E. Broderick et al. 2021, in preparation).
As with DMC, all THEMIS analyses are performed on coherently scan-averaged visibility data.Unlike the DMC analysis, intra-site baselines are included to facilitate gain and leakage calibration.This is enabled by the inclusion of a large, uniformly polarized Gaussian to model the milliarcsecondscale structure (see, e.g., Broderick et al. 2020b;Paper IV) THEMIS models the polarized image as a small number of control points located on a rectilinear grid, from which the fields  , ℓ, ξ, and V cos( ) are constructed via an approximate cubic spline in a fashion similar to Broderick et al. (2020b).The field of view and orientation of the rectilinear grid are fit parameters and permitted to vary.In this way the effective resolution is reconstructed from the data itself.Logarithmic priors are adopted on  and ℓ, flat priors are adopted on ξ and V cos( ) with the natural limits.
Complex station gains are reconstructed via the Laplace approximation (see Section 6.8 of Broderick et al. 2020a).The right-and left-hand complex station gains are constrained to be equal, and permitted to vary independently on every scan.Lognormal priors are imposed on the station gain amplitudes.The real and imaginary components of the right-and left-hand leakages are treated as additional model parameters, with each component sampled uniformly on [−1,1].
Unless otherwise indicated, THEMIS analyses shown here used a 5 × 5 raster grid, which is consistent with that typically necessary to capture features on the scale of the EHT beam within the field of view imposed by the shortest intersite baselines.A 3% systematic noise component was added in quadrature to the thermal uncertainties to capture non-closing errors in the scan-averaged visibilities.These are similar to the magnitude of fractional systematic error inferred from the DMC analyses.

C.3.3. THEMIS−DMC Leakage Posterior Comparison
In Figure 16 we show a comparison between the leakage posteriors for all stations, as determined by DMC and THEMIS fits to the 2017 April 11 observations of M87.Despite the various different assumptions and model specifications, we find excellent agreement in both the means and shapes of the posterior distributions recovered from both methods.The modest discrepancies between the posteriors shown in Figure 16 are associated with the different treatment of systematic uncertainty and right/left gain ratios between the two methods; when these model choices are homogenized, the DMC and THEMIS fits to both synthetic data sets and to the M87 data return indistinguishable posteriors.Notably, both model treatments of the Stokes map appear to be comparably capable of capturing the source structure.

Appendix D Intra-site D-term Validation
We present the final D-terms for ALMA in Table 3 and for APEX, JCMT, and SMA in Table 4.These D-terms were fit to EHT intra-site baseline data from multiple calibrators simultaneously using the method described in Section 4.2, implemented in polsolve.
JCMT can only record one of two polarization channels at a given time; see Appendix A.4.Therefore, the coherency matrix given in Equation (2) is incomplete for the JCMT-SMA baseline; the missing cross-polarization components on all baselines to the JCMT imply that the relation between visibilities and polarized brightness distribution is an underdetermined problem.Fortunately, when fitted in combination with the ALMA-APEX baseline, the Stokes parameters of the unresolved source are determined by the latter.This information is used simultaneously to fit for the JCMT and SMA D-terms.In this fit, only the D-terms affecting the observed cross-polarization product can be estimated, which means that, for each JCMT polarization configuration, only one of the two D-terms of each station (SMA and JCMT) can be determined.
Most D-terms, being instrumental properties, are expected to remain constant across observations of different target sources and observations carried out across multiple days.In the case of ALMA, however, D-terms are generated from an offset in the relative phase calibration between the X and Y linear polarizations of the reference ALMA antenna in the VLBI phasing procedure (Martí-Vidal et al. 2016;Matthews et al. 2018;Goddi et al. 2019).The estimated relative X-Y phase at ALMA may change between epochs due to, e.g., a change in the reference antenna,142 a resetting of the ALMA delay calibration, or the use of different calibrators in the polarization calibration process.Therefore, day-to-day variations in ALMA D-terms are expected.D-terms are expected to have a frequency dependence for all stations, hence we obtain separate estimates for each 2 GHz band.
The D-terms fitted for ALMA are dominated by an imaginary component and indicate day-to-day variation along the imaginary axis, as expected from the physical understanding of the leakage origin (Martí-Vidal et al. 2016;Matthews et al. 2018;Goddi et al. 2019).The dispersion in D-term estimates between days and bands is remarkably low for APEX.Thus the fitting is consistent among days as expected: the APEX hardware appears stable across the whole EHT campaign.Similar to APEX, SMA and JCMT should not have varying D-terms across epochs.Therefore, we derive campaign-average D-terms for these three stations from the day-by-day estimates, combining bands.For ALMA, per-day/ band D-term estimates are used.
We validate our D-term calibration via intra-site baseline properties using three methods: comparing intra-site baseline source properties to interferometric-ALMA measurements; comparing SMA intra-site leakage estimates to interferometric-SMA estimates; and comparing polsolve leakage estimation to point-source polarimetric modeling with the eht-imaging library and DMC.We additionally motivate leakage calibration using band-averaged products from intraband leakage studies for ALMA-APEX.
Simultaneously to our VLBI observations, ALMA also observes as an interferometric array (referred to as ALMA-only) in a linear-polarization basis.This array data is used for ALMA-VLBI calibration in the Quality Assurance process at ALMA (QA2; Goddi et al. 2019), and provides source-integrated information for calibration refinement and validation, such as total flux densities or polarization properties.Given that our intrasite baselines do not resolve the observed sources, the sourceintegrated properties from ALMA-APEX, SMA-JCMT, and the core component of ALMA-only should match.We show our validation of the derived source polarimetric properties from the intra-site D-term fitting against QA2 ALMA-only estimates in the top (for ) and bottom (for  ) panels of Figure 17.There is a strong correlation between the Stokes parameters of all sources derived from the ALMA-only observations (Goddi et al. 2019) and the estimates from the ALMA-APEX intra-site VLBI baseline.This correlation can be seen as a further validation test of the quality of the EHT polarimetric calibration.
The polarimetric leakage of the SMA is well characterized, with D-terms of only a few percent expected for observations near the 233.0 GHz tuned frequency of the quarter-wave plates (Marrone 2006;Marrone & Rao 2008).In addition to historical measurements of leakage, near-in-time polarimetric observations of sources with the SMA also allowed us to compute quasisimultaneous leakage estimates that can be compared with our Note.The D-term posterior distributions are assumed to be circular Gaussians in the complex plane.intra-site method estimates.Observations of 3C 454.3, M87, and 3C 279 within a month of our EHT campaign provided an upper limit of 10% for D-terms, which is consistent with our intra-site method estimates, and stable across days at the 1% level.We also validated the polsolve leakage estimates using point-source modeling within the eht-imaging and DMC libraries.Both modeling schemes assume a constant polarization fraction and EVPA for the point source.The DMC model fits to cross-hand and parallel-hand visibilities by incorporating right-and left-hand station gains as model parameters, while the eht-imaging model fits to gain-independent "polarimetric closure" data products consisting of the ratio of crosshand visibilities to parallel-hand visibilities on a single baseline (see, e.g., Blackburn et al. 2020) Both models use Gaussian likelihood functions for their respective data products.
In Figure 18, we compare the multi-source polsolve leakage estimates to multi-source eht-imaging fits and single-source (using 3C 279) DMC fits; both eht-imaging and DMC have fit only to the ALMA-APEX baseline, while the polsolve estimates additionally fit to the SMA-JCMT baseline.We find that the leakage terms recovered by all three methods are consistent with one another, with an uncertainty-weighted mean absolute deviation across all days and bands of <1% in absolute leakage between any two methods.
Furthermore, our leakage estimation methods used bandaveraged data products.Given the high S/N of the detections between ALMA and APEX, there are strong detections in all four correlation products (i.e., RR  19.This test showed very stable D-term estimates across the entire band, motivating our use of band-averaging for our final results in Table 3.

Appendix E Fiducial Leakage D-terms from M87 Imaging
We provide fiducial M87 D-term estimates for each imaging or posterior exploration method in Table 5.These D-term results for LMT, SMT, and PV are depicted in the main text in the right panel of Figure 2. In Figure 20 we show an example of one-to-one software comparisons of the campaign-average D-terms for LMT, PV, and SMT.
Note.The imaging pipelines pre-calibrate the ALMA, APEX, SMA, and JCMT D-terms using intra-site baseline fitting (see Section 4.2), and so only the D-terms for stations forming long baselines (i.e., LMT, SMT, and PV) are reported for these approaches.LPCAL is an exception due to its inability to fix D-terms for the pre-corrected stations: derived residual D-terms are shown here.The posterior exploration pipelines do not precalibrate the zero-baseline D-terms (see Appendix C.3), and we report here "residual" leakage values-i.e., the excess leakage, as determined by the posterior exploration pipelines, over that obtained from zero-baseline fitting (given in Table 3 for ALMA and Table 4 for APEX, JCMT, and SMA).The uncertainties for each of the posterior exploration leakage estimates are quoted in parenthesis.D-terms for LMT, SMT, and PV are depicted in Figure 2.

Appendix F Preliminary Imaging Results for M87
In this Appendix we present the preliminary polarimetric results on M87 obtained using the three imaging methods.These preliminary images were generated "by hand," with manual tuning of free parameters in the imaging and calibration process, before full parameter surveys were done to choose parameters more objectively and evaluate uncertainties.These results are not blind tests in analogy to the initial stage of total intensity imaging (see Paper IV). Nonetheless, in this early stage of manual imaging we found a high degree of similarity in the recovered structure and D-terms between methods; these results guided the design of our synthetic data tests and parameter survey strategy that we pursued to obtain our final polarimetric images of M87.
In Figure 21, we present our recovered total intensity and preliminary polarimetric images of M87 on 2017 April 11 produced by the three methods available when preliminary image reconstructions were conducted.In Figure 21, we also show the D-terms associated with these images.Each method reproduces consistent D-terms for all three remaining longbaseline EHT stations.The preliminary polarimetric images are roughly consistent across methods.In all images, the M87 ringlike structure is predominantly polarized mostly in the southwest part with a fractional polarization up to |m| ∼ 15%.The EVPAs are organized into a coherent pattern along the ring.However, small differences in fractional polarization and polarized flux density are evident between the three packages.
The preliminary results in Figure 21 revealed the main structure of the linearly polarized source and suggested consistency between different imaging methods.They strongly motivate the need for conducting full parameter surveys for each method to optimize the chosen imaging parameters and validate the results on synthetic data.Section C.2.In the imaging stage, the critical parameters that influence the final reconstruction include the four hyperparameters α P , α m , β HW , and β TV that set the relative weighting in the objective function between the different data constraints and regularizing terms.In surveying different parameters in the eht-imaging survey, we fix α m = 1 and vary the other three hyperparameter weights.
In addition to the hyperparameter weights, an additional free parameter in our objective function is the amount of additional systematic noise to add to the data as a budget for non-leakage sources of systematic error (see Papers III, IV).To account for these systematics, we add a term equal to ´ f sys | ˜| in quadrature to our baseline thermal noise estimates, where f sys is an overall multiplicative factor.Finally, we also include as a parameter in our surveys the number of iterations n iter of alternating between imaging and calibrating the station D-terms.The full list of parameters that we vary in the ehtimaging parameter survey is listed in Table 6, with the fiducial parameters used in reconstructing images of M87 denoted in bold.
To select a fiducial set of parameters that performs best on all six synthetic data tests, we assign each parameter set p two scores on each synthetic data set a; one scoring the fidelity of the polarized image reconstruction s fid,a (p), and one scoring the accuracy of the D-term estimation s dterm,a (p).First, we score the image fidelity by computing the normalized cross-correlation ρ NX between the reconstructed and ground-truth polarimetric intensity distribution.That is, we use Equation (15) of Paper IV on the images of +   2 2 .Then the fidelity score for the parameter set p on the synthetic data set a is simply We compute the D-term estimation accuracy metric by first calculating the average ℓ 2 distance d D between the reconstructed D-terms and the ground truth for the data set.For M87, we average this distance over the three stations that we calibrate at this stage: SMT, LMT, and PV.Then we transform this distance to a score between 0 (bad) and 1 (good) by using a sigmoid function; where d tol is a threshold for the average distance between the ground truth and recovered D-terms beyond which we begin to heavily penalize the reconstruction.We set d tol = 5%.
Finally, having computed the two scores s fid,a and s dterm,a for the parameter set p on the synthetic data set a, we compute a final score s(p) for the parameter set by multiplying these individual scores together on all six synthetic data sets a: We then have a final score s for each parameter set incorporating its performance in accurately reconstructing the polarized flux distribution and input D-terms on six synthetic data sets.We take the parameter set p with the highest score as our fiducial parameter set.The fiducial set is indicated in bold in Table 6.

G.2. polsolve Parameter Survey
The polsolve algorithm is characterized by several degrees of freedom: the pixel's angular size (smaller values increase the astrometric accuracy of the CLEAN components and, hence, the quality of the deconvolution), the field of view (the effect of this parameter is minimum if a CLEAN masking is applied), the visibility weighting (mainly defined with the Briggs "robustness" parameter, r; Briggs 1995), and the method for division of the Stokes  model into subcomponents of constant fractional polarization (see Section C.1).
The first step in the polsolve procedure is to generate a first version of the  image (using the CASA task clean).Several iterations of phase and amplitude self-calibration (using tasks gaincal and applycal) may be applied to the data, in order to optimize the dynamic range of the  model.The selfcalibration gains are forced to be equal for the R and L polarizations at all antennas. 144Then, the  model is split into several sub-components and formatted for its use in polsolve, using the CASA task CCextract. 145Finally, polsolve estimates the D-terms of LMT, SMT, and PV, together with the fractional polarizations and EVPAs of all the source sub-components.The estimated D-terms are applied to the data and a final run of clean is peformed, to generate the final version of the full-polarization images.
In the polsolve parameter survey, the  division is done in two ways.In one approach, a centered square mask of 50 × 50 μas is created and divided into a regular grid of n × n cells (in this case, cells that do not contain CLEAN components are not used in the fit).Alternatively, a centered circular mask of 40-80 μas diameter is created and divided azimuthally into a regular set of n pieces.The full parameter survey with polsolve consists of an exploration of the following.
1.Both types of model sub-divisions (i.e., either regular grid or azimuthal cuts), with n running from 1 to 12. 2. Robustness parameter, from r = − 2 to r = 2 (i.e., from uniform to natural weighting) in steps of 0.2. 3. Relative weight of the ALMA antenna (which affects the shape of the point-spread function (PSF) of the instrument, especially for values of r far from −2), running from 0.1 to 1.0 in steps of 0.1.4. In all images, a circular CLEAN mask of diameter varying from 40 to 80 μas (in steps of 5 μas) is used.The size of the CLEAN mask is the same as the mask to define the  model sub-division.The pixel size is fixed to 1 μas and the image size covers 256 × 256 μas.We note that for models with extended emission (i.e., Models 1 and 2, see Figure 4) an additional CLEAN mask is added at the southern part of the image.
For each combination of parameters in the survey, we compute the L 1 norm of the differences between true and estimated D-terms, as well as the correlation coefficients, ρ s (for each Stokes parameter, s) between the CLEAN image reconstructions and true-source structures, properly convolved with the same beam.These quantities can be used to select the best combination of parameters for D-term calibration and image reconstruction (the fiducial imaging parameters for polsolve).Depending on the relative weight that is given to L 1 and the average image correlation, r r r r = + +    3 ( ) , we obtain slightly different fiducial parameters.
In Figure 22, we show an example plot from our polsolve parameter survey for Model 5 (see Figure 4; similar results are found for the rest of models in the survey).The chosen figure of merit is (1 − ρ)L 1 , which we show as a function of the robustness parameter r and the number of slices n in the azimuthal sub-division.From Figure 22, we note that the dependence of the figure of merit with n becomes weak for values of n larger than 3-4 and robustness parameters between −1 and −2.This behavior also happens if the regular gridding is used to generate the  i sub-components.A qualitative explanation of this effect may be that large values of n translate into sub-components of sizes smaller than the synthesized resolution.Therefore, increasing the number of sub-components does not improve the fit, because the small separations between neighboring sub-components correspond to spatial frequencies that are not sampled by the interferometer.
Conversely, the fitted fractional polarizations of close by subcomponents become highly correlated in the fit, and the L 1 norm of the D-terms saturate around a minimum value.
Based on the combined analysis of all the synthetic data sets, we determine the fiducial parameters for polsolve: a robustness parameter of −1 (though −2 produces similar results, especially for models 4 to 6, see Figure 4), a circular slicing with n = 8 sub-components (which produces results similar to 9-10 sub-components, and also similar to those from a regular gridding, n × n, with n = 3-5), relative ALMA weights of 1.0 (which produces similar results as for values between 0.5-1.0), and a circular CLEAN mask of 50 μas diameter.

G.3. LPCAL Parameter Survey
The standard procedure for D-term calibration using LPCAL is as follows.1.) Select a calibrator that has either a low fractional polarization or a simple polarization structure, and a wide range of parallactic angle coverage.This is M87 in our case.2.) Produce a total intensity CLEAN map of the calibrator with e.g., Difmap.3.) Split the CLEAN model into several sub-models with the task CCEDT in AIPS.LPCAL assumes that each sub-model has a constant fractional polarization and EVPA.Therefore, the more sub-models that we use to divide the Stokes I image, the more degrees of freedom we have for modeling the source linear polarization.4.) Run LPCAL using the sub-models.
We follow this standard procedure for the D-term calibration.In this work, we consider an additional parameter, the ALMA weight-scaling factor.Down-weighting the ALMA data can be useful when there are significant systematic uncertainties in the ALMA visibilities.In this case, the solutions for other stations can be distorted as the fitting would be dominated by the ALMA baselines due to its high sensitivity and the corresponding smaller formal error bars.In addition to the ALMA weight-scaling factor, we consider the number of CLEAN sub-models as the main parameter that may significantly affect D-term estimation with LPCAL.
We first performed a manual parameter survey using the synthetic data.We reconstructed D-terms with LPCAL by using different numbers of sub-models and ALMA weight-scaling factors, and compared with the ground-truth values.We conclude that using a relatively large number of sub-models (10) gives better reconstructions, while the results do not change much when more than 10 sub-models are used.Also, we find that the results are not sensitive to the ALMA weight re-scaling.The strategy and parameters that we adopted could reproduce the ground-truth D-terms in the synthetic data within an accuracy of ∼1% (Figure 5).
Next, we analyzed the low-band M87 data.We used a common approach as for the synthetic data tests, but we allowed several users to independently calibrate and image the real data with different parameter settings to test the robustness of the method.First, each user reconstructed the Stokes  image with CLEAN in Difmap.The CLEAN components were divided into several sub-models by the task CCEDT in AIPS.Each user then used LPCAL in AIPS to solve for D-terms for all antennas.This includes the possible residual D-terms of ALMA, APEX, and SMA (as LPCAL cannot set any station D-terms to zero).We let each user test different schemes for CLEAN and use different parameters that were not seen as sensitively impacting results in the synthetic data survey.However, all users split their CLEAN models into a number of sub-models (10), in accordance with our synthetic data parameter survey results.Different (u, v) weighting parameters, CLEAN windows, and CLEAN cutoffs were used.Users could choose to downweight ALMA baselines through ParselTongue (Kettenis et al. 2006), a Python interface to AIPS, or average the data in time for LPCAL.We selected fiducial D-terms for the LPCAL pipeline by taking the median of the real and imaginary parts of the surveyed D-terms of each station.This approach allows us to take into account the uncertainties in LPCAL that may be associated with the Stokes  image reconstruction and the parameters used for the different tasks.To get the final M87 image for LPCAL, we applied these fiducial D-terms to the data and imaged in.For the high-band results (Appendix I), we obtained D-term solutions and images from a single, best-bet pipeline using representative parameters (15 CLEAN sub-models, ALMA weight = 1.0) from our survey on the low-band data.
Additionally, we investigated the effects of the parameter selection on the real data with GPCAL, using the same parameters as we identify for LPCAL,146 to examine the improvements of the fit statistics when changing parameters.
Figure 23 shows the distribution of c red 2 for the two parameters from the survey on the M87 data for 2017 April 11.We explored the number of sub-models from five to 20 with an increment of one and the ALMA weight-scaling factor of (0.1, 0.2, 0.4, 0.7, 1.0, 2.0, 5.0, 10.0).Similar to our conclusions from the synthetic data analysis, we found that the fit statistics on the M87 data improve with a larger number of sub-models up to about 10 sub-models.The result is insensitive to changing the ALMA weight-scaling factors.This trend was seen for the M87 data for the other observation days as well.Therefore, we conclude that the parameters that we used for the real data analysis with LPCAL are reasonable.

G.4. DMC Parameter Survey
For the DMC image reconstructions, we surveyed two hyperparameters: the pixel separation and the image field of view.Because the DMC method fits for a systematic uncertainty term in addition to the image and calibration parameters, all fits are formally "good" from the perspective of, e.g., a χ 2 metric.We thus determine an acceptable fit for a particular data set to be the one that minimizes the number of model parameters (i.e., a combination of largest pixel separation and smallest image field of view) while recovering the expected level of systematic uncertainty (i.e., 0% for the synthetic data sets and 2% for the M87 data sets; see Paper III) within some threshold (taken to be the 3σ bounds determined by the posterior distribution).

G.5. THEMIS Parameter Survey
Associated with THEMIS reconstructions are two hyperparameters corresponding to the two-dimensional number of control points.These have natural values set by the number of independent modes that may be reconstructed; for the EHT, 5 × 5.Because this resulted in formally acceptable fits, i.e., reduced-χ 2 near unity, and based on similar considerations from Stokes  image reconstructions (see, e.g., Broderick et al. 2020b), no additional exploration was required for M87.For the Model 1 and 2 synthetic data reconstructions, the number of control points were incrementally increased until acceptable fits were obtained.

Appendix H D-term Monte Carlo Survey Details
In Sections 5.2 and Section 5.3 we generate a sample of 1000 images with different D-term calibration solutions for each method on each observing day to assess our uncertainty in the polarimetric image structure.In this Appendix, we discuss the procedure that we use to generate these samples and provide detailed histograms of the results for the various image-integrated quantities.

H.1. Method
To sample the effects of uncertainties in the D-terms on the reconstructed images in the posterior exploration methods (DMC and THEMIS), we can simply draw our 1000 image sample randomly directly from the full posterior.For the three non-MCMC imaging pipelines (eht-imaging, LPCAL, polsolve), our method is to use a simple Monte Carlo approach, similar to the analysis of Martí-Vidal et al. (2012).
For each method, we draw 1000 random sets of D-terms from normal distributions with means and (co)variances determined by considering the fiducial results from Table 5 across the four observing days.We assume for this test that the uncertainties in the D-terms are uncorrelated from station to station and between LCP and RCP.This represents a worstcase scenario test, as correlations between the D-terms would reduce the volume of the D-term parameter space surveyed by each method.The full sample of 1000 D-term sets sampled for SMT, LMT, and PV on 2017 April 11 for the ehtimaging, polsolve, and LPCAL pipelines are shown in Figure 24.In addition to the distributions shown in Figure 24, we also include D-terms on ALMA, APEX, SMA, and JCMT drawn from circular complex Gaussians with 1% standard deviation; these represent residual uncertainties left over from the zero-baseline D-term calibration of these stations.
After drawing a given set of random D-terms, we apply this calibration solution to the data and reconstruct a polarized image using the same fiducial imaging procedure described for each method in Appendix G. Our imaging scripts in this stage do not involve any simultaneous leakage calibration, but only reconstruct the Stokes  and  from the visibilities with the assumed calibration solution already applied. 147That is, in this procedure we draw a set of random D-terms from distributions reflecting our uncertainty in the recovered D-terms from the earlier stage of calibration and imaging, and then we reconstruct an image assuming that this D-term calibration is perfect with no need for further leakage calibration.

H.2. Distributions of Image-averaged Parameters
In Figures 25 and 26, we show histograms over each imaging method's sample of 1000 images with different D-term calibration solutions on all four observing days of the four image-integrated quantities used in Section 5.3.Figure 25 shows histograms of the image net polarization fraction |m| net (Equation ( 12), plotted in red) and the intensityweighted average polarization fraction at 20 μas resolution 〈|m|〉 (Equation ( 13), plotted in green).Figure 26 shows the amplitude |β 2 | (plotted in brown) and phase ∠β 2 (plotted in purple) of the β 2 coefficient of the azimuthal decomposition defined in Equation ( 14).Because the observations on 2017 April 5 and 11 have the highest-quality (u, v) coverage and bracket the observed time evolution of the source, we choose to define acceptable ranges for these parameters (the shaded bars in Figures 25, 26, taken from Table 2) using only these two days.In particular, the poor quality of the 2017 April 10 (u, v) coverage leads to broader distributions of the four key quantities with large systematic uncertainties between imaging methods (third columns of Figures 25, 26).The distributions on 2017 April 5 and 11 are summarized with mean and 1σ error bars in the main text Figure 9, and are discussed in Section 5.3.
Finally, Table 7 presents ranges of the image-integrated Stokes parameters    , , derived from the surveys over different D-term calibration solutions from each day of observations.The ranges in Table 7 were calculated by taking the minimum mean − 1σ and maximum mean + 1σ point from the five individual method surveys on each day.
Figure 24.Samples of 1000 SMT, LMT, and PV D-terms applied to the data and used in each of the three image reconstruction algorithms in the Monte Carlo procedure described in Section 5.2.Each D-term was drawn from a normal distribution with no correlations between the D-terms.The means and (co)variances of these distributions for each method were determined from the set of four fiducial D-term solutions found across the four observing days for each method.ehtimaging included covariance between the real and imaginary parts of each D-term in its approach, while LPCAL and polsolve did not.12)) and the image-averaged polarization fraction 〈|m|〉 (red: Equation (13)) from each method's survey over random D-term calibration solutions performed on the low-band data.From left to right, the four columns show histograms for 2017 April 5, 6, 10, and 11.In all panels the shaded bands represent the final parameter ranges reported in this work, incorporating the uncertainty both across D-term realizations and reconstruction methods.These ranges are presented in Table 2.Note that as a consequence of the poor (u, v) coverage and parallactic angle sampling, the 2017 April 10 image reconstructions from all methods show more systematic uncertainty in the derived parameters than on the other days.2.
polarization pattern is consistent between the bands on each day.

I.2. Image-averaged Quantities
To evaluate the consistency of the image-averaged quantities, we extend the analysis presented in Section 5.3 to the high-band data.In particular, we generate a sample of 1000 images from the high-band data for each method, exploring a range of different D-term calibration solutions (this procedure is described in Appendix H).
Figure 29 compares results for the key image-integrated metrics (|m| net , 〈|m|〉, β 2 ; see definitions in Section 5.3) derived from such image samples.High-band results for a given method are generally consistent with their low-band counterparts within 1σ.The only notable exception to the overall consistency in the low-and high-band results are the 2017 April 11 polsolve measurements; in particular, |m| net , |β 2 | and ∠β 2 show deivations of 2-5σ between the low and high band.We note that the reported error bars in Figure 29 are derived only from sampling uncertainties in the applied D-terms; they do not include systematic error in the choice of imaging hyperparameters, which were derived only once for each imaging method, on the low-band data.
The mean of the high-band results for all methods fall within the ranges established using the low-band images (Table 2), but the high-band mean -1 σ lower limits fall outside the established ranges for the eht-imaging and polsolve |m| net measurements on 2017 April 11, and for the polsolve measurement of |β 2 | on 2017 April 5.Because the imaging procedures and results for the low band were extensively validated with synthetic data tests, we choose to use the lowband results only in defining the parameter ranges in Table 2.Note that |m| net in particular can be quite sensitive to the choice of imaging hyperparameters used, and that these parameters were not re-derived for the high-band data.
In addition to the image-integrated quantities reported in Figure 29, we also computed image-integrated EVPAs for each method from the high-band D-term calibration survey.These are discussed along with the corresponding low-band results in the main text in Section 6.
Appendix J LMT, SMT, and PV D-terms using Calibrator Data: Synthetic Data Tests, Expected Uncertainties, and Convergence with M87 Results Together with M87, full-array polarimetric calibration and imaging was also attempted on three other sources: 3C 279, observed contemporaneously with M87; and J1924-2914 and NRAO 530, observed contemporaneously with our second EHT primary target, Sgr A * , in the second half of each observing day.3C 279 was observed on the same four days as M87, with the latter two days having the best (u, v) coverage with the addition of SPT.J1924-2914 was observed on all five days of the EHT campaign (the same four days as M87 with the addition of 2017 April 7), and NRAO 530 was observed on the first three days of the campaign (2017 April 5-7).Coverage and data quality vary from day to day, depending on the structures of the observations and, in the case of the Sgr A * calibrators, whether ALMA is observing.For optimal calibration and imaging, we make an initial cut based on (u, v) and field rotation angle coverage, and the presence of ALMA in the array.We exclude J1924-2914 and NRAO 530 observations on 2017 April 5, which do not have ALMA, and the 2017 April 10 two-scan snapshot observations of J1924-2914, which severely lack (u, v) coverage.
In Figure 30, bottom row, we show the field angle coverage on the three calibrators for their best-coverage day (2017 April 11 for 3C 279 and J1924-2914, and 2017 April 7 for NRAO 530).For comparison, the field angle coverage for M87 on 2017 April 11 is also shown.Compared to M87, the   three calibrators are at sufficiently low decl.to also be observed by the SPT, but the elevation stays constant for sources viewed from the South Pole and only a constant field angle is sampled.In Figure 30, top row, we present the m | |  structure in the (u, v) plane prior to D-term calibration for the best-coverage days of the calibrators; 2017 April 11 M87 is also shown for reference.High-polarization fractions are expected in M87 on baselines that probe our visibility minima in total intensity, but the source overall is weakly polarized.3C 279, on the other hand, has multiple baselines exhibiting high polarization fraction.The recovery of D-terms for a highly polarized source like 3C 279 would require an extremely accurate source model in both total intensity and polarization.However, 3C 279ʼs complex structure in both total and polarized intensity add to the difficulty of imaging and calibrating the source (Kim et al. 2020).Furthermore, interferometric-ALMA measurements taken contemporaneously to our EHT campaign found that 3C 279 may have non-negligible Stokes  (see Goddi et al. 2021), which breaks the Stokes =  0 assumptions made in most of our calibration and imaging pipelines.Based on these findings, 3C 279 is thus not the best choice for D-term comparisons with M87.
J1924-2914 and NRAO 530 exhibit low polarization fractions on most baselines (Figure 30) and have negligible Stokes  as measured by interferometric-ALMA (Goddi et al. 2021), making them ideal for D-term calibration and polarimetric imaging.Their total-intensity structure, however, is more uncertain and more complex than M87.Both sources are blazars with bright extended jets (e.g., Wills & Wills 1981;Preston et al. 1989;Shen et al. 1997;Bower & Backer 1998;Healey et al. 2008), and imaging with their current EHT coverage may not capture the complexity of the extended jets in these sources.Nevertheless, their weak polarization allows for better D-term estimates despite uncertainty in modeling their structure.
Following the same methodology as for M87, we generate synthetic data to optimize imaging and calibration parameters for all methods based on J1924-2914 and NRAO 530 lowband coverage.We use the same six ring-like synthetic models as for M87 (see Section 4.3) and add a seventh model constructed with 10 Gaussian sources of varying total and polarization intensity, with some polarization structure offset from Stokes  .This seventh data set is designed to mimic the basic structure seen in the preliminary polarimetric images of the two calibrators, for which the final images will be presented in forthcoming publications (S.Issaoun et al. 2021, in preparation;S. Jorstad et al. 2021, in preparation).We generate seven synthetic EHT observations for each source using their best EHT (u, v) coverage, 2017 April 11 and 2017 April 7 for J1924-2914 and NRAO 530, respectively.Parameter surveys are carried out for each method probing the same parameter space as for M87, and fiducial sets were selected with the same selection metrics; see Appendix G.
In Figures 31 and 32, we present the set of fiducial images from synthetic reconstructions using J1924-2914 and NRAO 530 bestday low-band coverage, respectively.In each panel, the correlations between the ground truth and reconstructed Stokes  and linear polarization P images are provided.Consistently with the results with M87 coverage, the Stokes  correlations are high for all models regardless of method and coverage, and P correlations seem to worsen for models with complex polarization structure or high polarization.
In Figure 33, we compare the recovered leakage D-terms to the ground-truth D-terms for the synthetic data sets with coverage from J1914-2914 (top row) and NRAO 530 (bottom row) and each method.Similarly to the M87 results, PV and SPT have the largest standard deviations for all methods.Their large deviations stem from all methods having difficulty recovering D-terms for models with no strong polarization substructure due to them being isolated stations with only long baselines.Overall, deviations of the D-terms measured via the L 1 norm (and its standard deviation) for the calibrators are comparable to those for M87 for all methods, but the standard deviation on each D-term estimate is noticeably wider for all stations, indicating that while overall image recovery is similar, the coverage differences between the M87 and the calibrator synthetic data do add uncertainty in the D-term recovery for the calibrators.
Finally, we estimate LMT, SMT, and PV D-terms via polarimetric imaging of the J1924-2914 and NRAO 530 EHT data.The polarimetric images of these two calibrators will be presented in forthcoming publications (S.Issaoun et al. 2021, in preparation;S. Jorstad et al. 2021, in preparation).Here, in Figure 34, we show that D-terms of LMT, SMT, and PV estimated by imaging the calibrators roughly agree with those of M87.We note that a better agreement is obtained between the M87 and J1924-2914 D-terms compared to between M87 and NRAO 530.The calibrators have sparser (u, v) coverage (fewer scans), a narrower field rotation range, and more complex Stokes  (extended structure and higher noise level) and polarimetric images compared to M87, which all impact the quality of our D-term estimation.Given these additional complexities, we argue that the calibrator D-terms are consistent with those of M87 (the D-term consistency within 2%-3% is expected for the calibrators; see also Appendix K) and that M87 itself is the best source for polarimetric leakage calibration.
Furthermore, while imaging calibrators we found that the quality of the Stokes  image is critical for calibration.Both NRAO 530 and J1924-2914, as blazar sources, have complex jet structure that is not fully recovered with the current EHT coverage, and thus our Stokes  reconstructions have larger uncertainties and noise levels that those of M87, due to unconstrained flux density on large scales not sampled by our array configuration.Assumptions about the Stokes  image affect the results of the polarimetric imaging and calibration methods, for example in the self-similarity assumption employed for CLEAN reconstructions in our sub-component methods (see Appendix K).

Appendix K Validation of the Similarity Approximation in CLEAN Algorithms
The D-term estimates using the M87 data with polsolve and LPCAL reported in Section 4.2 are based on the similarity approximation.In this approach, the Stokes  CLEAN models are divided into many sub-models to provide many degrees of freedom for modeling the source's linear polarization structure.Nevertheless, the complex linear polarization structure of M87 (Figure 6) may not be perfectly modeled with this approximation.This could be a source of uncertainty in the D-term estimation.
We investigate the effect of the similarity approximation by using the instrumental polarization self-calibration mode in GPCAL, which iterates (i) imaging of the source's linear polarization structures and (ii) solving for the D-terms using the images (Park et al. 2021).We ran GPCAL on the M87 data on 2017 April 11.The Stokes  CLEAN components are divided into 15 sub-models for initial D-term estimation using the similarity approximation.The D-terms of ALMA, APEX, and SMA are fixed to be zero for fitting, as they were already calibrated using the intra-site baselines (Section 4.2).The intra-site baselines are flagged, as the limited field of view of the EHT does not allow us to properly model the source structure observed on large scales.Instrumental polarization self-calibration was then performed with 10 iterations by employing Difmap for producing the Stokes and Stokes  images with CLEAN.
The left panel of Figure 35 compares the D-terms obtained by GPCAL with the the average of the fiducial imaging pipeline results on the same day (Figure 2).Both results using (i) the similarity approximation only (open squares) and (ii) the similarity approximation followed by 10 iterations of instrumental polarization self-calibration (filled circles) are shown.Both show a good consistency with the fiducial D-terms with the L 1 norms of ≈1.0%-1.2%,consistent with the deviations between values of different pipelines seen in Figure 2.This result indicates that the D-terms obtained by polsolve and LPCAL using the M87 data are robust against the similarity approximation.
However, this may not be the case for the calibrators.We ran GPCAL on the J1924-2914 data on 2017 April 11 and NRAO 530 data on 2017 April 7. Similar parameters to the M87 data analysis are used.The results are shown in the middle and right panels of Figure 35.The D-terms obtained with instrumental polarization self-calibration are more consistent with the fiducial D-terms than those obtained with the (2017 April 7) low-band data sets using the eht-imaging, polsolve, and GPCAL pipelines.In the first and third panel from the left the M87 D-terms are depicted with lighter symbols, while heavier symbols mark the calibrator D-terms.In the correlation plots shown in the second and fourth panels from the left, the D-terms for M87 and J1924-2914/NRAO 530 are averaged over different methods.LMT and SMT D-terms derived from J1924-2914are found to be highly consistent with those from M87.The D-terms derived from NRAO 530 imaging on average show larger deviation from M87 the D-terms; in particular, the PV D-terms estimated by ehtimaging show the largest deviation from all other estimates.The L 1 norms do not change much with instrumental polarization self-calibration for M87, while they are significantly improved for the calibrators, especially for J1924-2914.This indicates that the similarity approximation employed by polsolve and LPCAL for the D-term estimation from M87 (Section 4.2) is reasonable.The calibrators may have complex linear polarization structure and D-term estimation from those sources can be improved with instrumental polarization self-calibration.

Figure 1 .
Figure 1.Top row: (u, v) coverage of the four M87 observing days in the 2017 campaign.The color of the data points codes the fractional polarization amplitude m u v , | ( )| in the range from 0 to 2. The data shown are derived from low-band visibilities after the initial calibration pipeline described in Section 3.2 but before any D-term calibration.The data points are coherently scan-averaged.Bottom row: M87 field rotation angle f for each station as a function of time (Equation (5)).

Figure 2 .
Figure 2. Left panel: D-term estimates for ALMA, APEX, JCMT, and SMA from polsolve multi-source intra-site baseline fitting; one point per day and band (low and high) for each station across the EHT 2017 campaign.Both polarizations are shown for ALMA and APEX per day, but only one polarization is shown for JCMT and SMA per day due to JCMT polarization setup limitations.Station averages across days and high/low bands are shown as solid points with error bars.The depicted D-terms are provided in tabulated form in Appendix D. Right panels: fiducial D-terms for LMT, PV, and SMT derived from the low-band data via leakage calibration in tandem with polarimetric imaging methods and posterior modeling of M87 observations.We depict fiducial D-terms per day, where each point corresponds to one station, polarization, and method.Filled symbols depict D-terms from imaging methods and symbols for posterior exploration methods have error bars corresponding to the 1σ standard deviations estimated from the posterior distributions of the resulting D-terms.

Figure 3 .
Figure3.Amplitudes (left panel) and phases (right panel) of the ratio of R to L station gains from the DMC fit to M87 2017 April 11 low-band data.Individual station gain ratios are offset vertically for clarity, with the dashed horizontal lines indicating a unit ratio for each station (i.e., unity for amplitudes and zero for phases).Note that JCMT only observes one polarization at a time, and so provides no constraints on gain ratios.We see that the assumption made by the three imaging pipelines and one posterior exploration pipeline (THEMIS)namely, that the right-and left-hand gains are equal for all stations at all times -largely holds.The behavior in this plot is representative of that seen across days and bands.

Figure 4 .
Figure 4. Fiducial images from synthetic data model reconstructions using M87 2017 April 11 low-band (u, v) coverage.Rows from top to bottom correspond to six different synthetic data sets.Columns from left to right show ground-truth synthetic image (column 1) and the best image reconstructions by each method (columns 2-6).The polarization tick length reflects total linear polarization, while the color reflects fractional polarization from 0 to 0.3.The normalized overlap is calculated against the respective ground-truth image, and in the case of the total intensity it is mean-subtracted.

Figure 5 .
Figure5.A comparison of LMT, SMT, and PV D-term estimates to ground-truth values in the synthetic data sets 1 through 6 (shown in Figure4).Each panel shows correlation of the estimated and the truth D-terms for a single method.Each data point in each panel depicts an average and standard deviation for each D-term estimate derived from the six synthetic data sets.The norm L 1 ≡ |D − D Truth | is averaged over left, right, real, and imaginary components of the D-terms and over all shown EHT stations.Notice that each method recovers the ground-truth D-terms to within ∼1%, on average.

Figure 6 .
Figure 6.Fiducial polarimetric M87 images produced by five independent methods.The results from all imaging and posterior exploration pipelines are shown on the four M87 observation days for the low band data (the low-and high-band results are consistent, see Appendix I).Total intensity is shown in grayscale, polarization ticks indicate the EVPA, the tick length indicates linear polarization intensity magnitude (where a tick length of 10 μas corresponds to ∼30 μJy μas −2 of polarized flux density), and color indicates fractional linear polarization.The tick length is scaled according to the polarized brightness without renormalization to the maximum for each image.The contours mark the linear polarized intensity.The solid, dashed, and dotted contour levels correspond to linearly polarized intensity of 20, 10, and 5 μJy μas −2 , respectively.Cuts were made to omit all regions in the images where Stokes <  10% of the peak brightness and  < 20% of the peak polarized brightness.The images are all displayed with a field of view of 120 μas, and all images were brought to the same nominal resolution by convolution with the circular Gaussian kernel that maximized the cross-correlation of the blurred Stokes  image with the consensus Stokes  image of Paper IV.

Figure 7 .
Figure7.Fiducial M87 average images produced by averaging results from our five reconstruction methods (see Figure6).Method-average images for all four M87 observation days are shown, from left to right.These images show the low-band results; for a comparison between these images and the high-band results, see Figure28in Appendix I. We employ here two visualization schemes (top and bottom rows) to display our four method-average images.The images are all displayed with a field of view of 120 μas.Top row: total intensity, polarization fraction, and EVPA are plotted in the same manner as in Figure6.Bottom row: polarization "field lines" plotted atop an underlying total intensity image.Treating the linear polarization as a vector field, the sweeping lines in the images represent streamlines of this field and thus trace the EVPA patterns in the image.To emphasize the regions with stronger polarization detections, we have scaled the length and opacity of these streamlines as the square of the polarized intensity.This visualization is inspired in part by Line Integral Convolution(Cabral & Leedom 1993) representations of vector fields, and it aims to highlight the newly added polarization information on top of the standard visualization for our previously published Stokes  results (Papers I, IV).
Figure7.Fiducial M87 average images produced by averaging results from our five reconstruction methods (see Figure6).Method-average images for all four M87 observation days are shown, from left to right.These images show the low-band results; for a comparison between these images and the high-band results, see Figure28in Appendix I. We employ here two visualization schemes (top and bottom rows) to display our four method-average images.The images are all displayed with a field of view of 120 μas.Top row: total intensity, polarization fraction, and EVPA are plotted in the same manner as in Figure6.Bottom row: polarization "field lines" plotted atop an underlying total intensity image.Treating the linear polarization as a vector field, the sweeping lines in the images represent streamlines of this field and thus trace the EVPA patterns in the image.To emphasize the regions with stronger polarization detections, we have scaled the length and opacity of these streamlines as the square of the polarized intensity.This visualization is inspired in part by Line Integral Convolution(Cabral & Leedom 1993) representations of vector fields, and it aims to highlight the newly added polarization information on top of the standard visualization for our previously published Stokes  results (Papers I, IV).

Figure 8 .
Figure 8. Histograms of the azimuthal distributions of polarized intensity (left panel) and EVPA (right panel) obtained from the low-band data in the survey of different D-term solutions with all five imaging and posterior exploration methods.These quantities are estimated as the intensity-weighted averages within an angular section of a width of 10°.The position angle is measured counter-clockwise, starting from the north.The position angles with the highest average polarization brightness are marked with dotted lines for each method and day.

Figure 9 .
Figure 9. Summary of the key quantities used in Paper VIII measured by each method from the low-band data on both 2017 April 5 and 11.From left to right, the quantities are the integrated net polarization |m| net (Equation (12)), the average polarization fraction 〈|m|〉 (Equation (13)), and the amplitude |β 2 | and phase ∠β 2 of the m = 2 azimuthal mode of the complex polarization brightness distribution (Equation (14)).The shaded bands show the consensus ranges (Table2) incorporating both uncertainties in these parameters from the D-term calibration and systematic discrepancies between image reconstruction methods.A comparison with analogous highband results is provided in Figure29.

Figure 10 .
Figure 10.M87 net EVPA integrated within 120 μas on all four days for both high and low bands.The low-band results are indicated by circular markers and the high-band results are plotted in a lighter color with square markers.Vertical lines mark the ALMA-only EVPA measurements from Goddi et al. (2021), measured on arcseconds scales.(The error bars of ALMA-only measurements are very small so the shaded bands appear as vertical lines.)The imageintegrated EVPAs are consistent between high and low bands on all days except 2017 April 5, where all methods show an offset of ∼20°.Note that the image reconstructions and EVPA measurements on 2017 April 10 are poorly constrained due to the poor (u, v) coverage.

Figure 11 .
Figure 11.Differences between the RL* and LR * delays for all baselines and scans with S/N higher than 10 (only scans when ALMA is observing are shown).The level of zero delay is shown as a dashed line.

Figure 12 .
Figure12.Top panel: phases of "boomerang" closure traces for M87 (blue) and calibrators (J1924-2914, NRAO 530, 3C 279; gray), i.e., those with a repeated station (here ALMA/APEX) and thus expected to trivially vanish.Bottom panel: normalized residuals of the trivial closure traces on M87 and the calibrator sources in comparison to a unit-variance normal distribution.In both panels, high-and low-band values are shown for scan-averaged data.We see that these boomerang closure trace phases exhibit the expected clustering around zero.

Figure 13 .
Figure 13.Phase of the conjugate closure trace product constructed from the 2017 April 11 low-band observations on the APEX-ALMA-LMT-SMT and APEX-SMT-LMT-ALMA quadrangles.This quantity is sensitive solely to polarization structure; deviations from zero indicate the presence of non-trivial polarization structure (i.e., a polarization fraction that is not constant across the source; Broderick & Pesce 2020).The colored lines show the same conjugate closure trace product phase for each of the image reconstructions (see Figure6), and the dark gray solid line shows the same for the fiducial image (see Figure7).

Figure 14 .
Figure 14.Phases of M87 closure traces on four illustrative quadrangles for the different observation days.Closure traces constructed with ALMA and APEX are shown by filled and open points, respectively.The plotted closure traces represent an average across high-and low-band data.An S/N > 1 selection in the final closure trace phase has been applied.

Figure 15 .
Figure 15.Phase of conjugate closure trace products constructed on the ALMA-PV-LMT-SMT and ALMA-SMT-LMT-PV quadrangles, averaged across high and low bands for each of the days on which M87 was observed.
is the weight of the mth visibility, the index c stands for calibrated visibilities (corrected both for station gains and for instrumental polarization using the current estimate of the D-terms) and N v is the number of visibilities.The calibrated visibilities, *

Figure 16 .
Figure 16.Leakage posteriors for individual stations from DMC and THEMIS reconstructions of M87 on 2017 April 11.Because JCMT only records a single polarization, only left-hand D-terms are shown.The plotted contours enclose 50%, 90%, and 99% of the posterior probability, and show a large degree of overlap for all stations despite considerable differences in the underlying model specifications.

Figure 17 .
Figure 17.Comparison of source-integrated Stokes  and  estimates from intra-site EHT baselines using the multi-source fitting mode of polsolve to those from ALMA-only observations (Goddi et al. 2019).
* ) at each intermediate-frequency band. 143Therefore, we can use the high S/N on ALMA-APEX to estimate the D-terms at each intermediate-frequency band in a polsolve fit and study the possible frequency dependence of the instrumental polarization of ALMA and APEX.The results are shown in Figure

Figure 18 .
Figure18.Pairwise comparisons of leakage estimates for ALMA and APEX, obtained from point-source modeling of the intra-site baseline with polsolve, ehtimaging, and DMC.Both polsolve and eht-imaging leakages are derived from multi-source fits, while the DMC leakages are derived from fitting to 3C 279 only.Each panel aggregates leakage estimates from both stations (ALMA and APEX), both bands, and all four observing days.Values quoted in the lower right-hand corner of each panel are the uncertainty-weighted mean absolute deviation for the corresponding pair of fits.The dashed line on each plot marks where y = x.

Figure 19 .
Figure 19.ALMA (left panels) and APEX (right panels) D-term spectra recovered on 2017 April 11 using polsolve.Each band has a width of 2 GHz and is divided into 32 intermediate-frequency sub-bands (IFs) of equal width.

Figure 20 .
Figure 20.Example one-to-one software comparisons of the campaign-average D-term estimates for LMT, PV, and SMT.Left panel: comparison of eht-imaging estimates against polsolve estimates.Middle panel: comparison of LPCAL estimates against polsolve estimates.Right panel: comparison of LPCAL estimates against eht-imaging estimates.The L 1 norm is averaged over the left/right and real/imaginary components of the D-terms and over all three EHT stations shown.See Section 4.2 for a discussion of the D-term L 1 norm values between eht-imaging/polsolve/LPCAL and THEMIS/DMC.

Figure 22 .
Figure 22. of merit (1 − ρ)L 1 (lower values mean better results) for the polsolve algorithm (running on synthetic data Model 5), as a function of its two main degrees of freedom (i.e.,  sub-division and visibility weighting).

Figure 23 .
Figure 23.Distribution of the reduced chi-squares of the fits to M87 2017 April 11 low-band data with GPCAL, depending on the ALMA weight-scaling factor and the number of CLEAN sub-models.GPCAL parameters that produce nearly identical results to LPCAL were used.A larger number of sub-models provides a better fit, while the results are insensitive to the ALMA weightscaling factors.

Figure 27 .
Figure 27.Fiducial D-terms for LMT, PV, and SMT derived via leakage calibration in the eht-imaging, polsolve, LPCAL polarimetric imaging pipelines and the DMC/THEMIS posterior exploration of M87 data.The D-terms derived from the low band (lighter points) and the high band (heavier points) are consistent with one another within the systematic scatter among methods seen in the low-band results.All D-terms are displayed in the same manner as in the right panels in Figure 2.

Figure 28 .
Figure28.Fiducial M87 average images per day produced by averaging results from our five methods (see Figure6).Method-average images for all four M87 observation days are shown, from left to right.The top and bottom rows show high-and low-band results, respectively.The images are all displayed with a field of view of 120 μas, and all images were brought to the same nominal resolution by convolution with the circular Gaussian kernel that maximized the cross-correlation of the blurred Stokes  image with the consensus Stokes  image of Paper IV.Total intensity, polarization fraction, and EVPA are plotted in the same manner as in Figure7.

Figure 29 .
Figure 29.Comparison of high-and low-band results for the key quantities used in Paper VIII on 2017 April 5 and 11.The low-band results are indicated by circular markers, and the high-band results are plotted in a lighter color with square markers.The low-band results were presented in the main text in Figure 9.The vertical bands indicate the derived parameter ranges from the low-band results presented in Table 2.

Figure 30 .
Figure 30.Top row: comparison of the polarization, (u, v) coverage, and field rotation angle coverage of the main target and the calibrators.2017 April 11 is shown for M87, 3C 279, J1924-2914 while 2017 April 7 is shown for NRAO 530.Color scales indicates fractional polarization amplitude m | |  in the range from 0 to 2. Bottom row: sources field rotation angle f for each station as a function of time.The figure is analogous to Figure 1 for M87 on 2017 April 5, 6, 10 and 11.

Figure 31 .
Figure31.Fiducial images from synthetic data model reconstructions using J1924-2914 low-band (u, v) coverage on 2017 April 11.Polarization tick length reflects total polarization, while color reflects fractional polarization from 0 to 0.3.Normalized overlap is calculated against the respective ground-truth image, and for the case of total intensity is mean-subtracted.

Figure 32 .
Figure32.Fiducial images from synthetic data model reconstructions using NRAO 530 low-band (u, v) coverage on 2017 April 7. Polarization tick length reflects total polarization, while color reflects fractional polarization from 0 to 0.3.Normalized overlap is calculated against the respective ground-truth image, and for the case of total intensity is mean-subtracted.

Figure 33 .
Figure 33.D-terms for LMT, SMT, PV, and SPT derived from synthetic data sets.A comparison of estimates to ground-truth values is shown per software (ehtimaging, polsolve, LPCAL, and GPCAL results are shown in first through fourth columns, respectively) and per (u, v) coverage of the real observations (results for (u, v) coverage of J1924-2914 on 2017 April 11 and the (u, v) coverage of NRAO 530 on 2017 April 7 are shown in the top and bottom rows, respectively).Each data point represents the mean and standard deviation for each D-term estimate derived from synthetic data sets 1-7.The norm L 1 ≡ |D − D Truth | is averaged over left, right, real, and imaginary components of the D-terms and over the four EHT stations shown.

Figure 34 .
Figure34.Comparison of fiducial D-terms for the telescopes LMT, SMT, and PV estimated from M87 (2017 April 11),J1924-2914J1924-  (2017 April 11)  April 11)  and NRAO 530 (2017 April 7) low-band data sets using the eht-imaging, polsolve, and GPCAL pipelines.In the first and third panel from the left the M87 D-terms are depicted with lighter symbols, while heavier symbols mark the calibrator D-terms.In the correlation plots shown in the second and fourth panels from the left, the D-terms for M87 and J1924-2914/NRAO 530 are averaged over different methods.LMT and SMT D-terms derived from J1924-2914are found to be highly consistent with those from M87.The D-terms derived from NRAO 530 imaging on average show larger deviation from M87 the D-terms; in particular, the PV D-terms estimated by ehtimaging show the largest deviation from all other estimates.

Figure 35 .
Figure 35.Comparison of D-terms estimated with GPCAL with and without instrumental self-calibration.The D-term averages estimated with the imaging pipelines on the 2017 April 11 M87 data are shown on the x-axis.The GPCAL results with (filled circles) and without (open squares) instrumental polarization self-calibration are shown on the y − axis (see the text for more details).The left, middle, and right panels show the results for M87 on 2017 April 11, J1924-2914 on 2017 April 11, and NRAO 530 on 2017 April 7, respectively.The L 1 norms do not change much with instrumental polarization self-calibration for M87, while they are significantly improved for the calibrators, especially for J1924-2914.This indicates that the similarity approximation employed by polsolve and LPCAL for the D-term estimation from M87 (Section 4.2) is reasonable.The calibrators may have complex linear polarization structure and D-term estimation from those sources can be improved with instrumental polarization self-calibration.

Table 1
Field Rotation Parameters for the EHT Stations

Table 2
Final Parameter Ranges for the Quantities Used in Scoring GRMHD Models in Paper VIII

Table 3
Daily Average D-terms for ALMA Derived via the Multi-source Intra-site Method

Table 4
Campaign-average D-terms for APEX, JCMT, and SMA Derived via the Multi-source Intra-site Method

Table 5
Fiducial Set of Low-band D-terms for Each Station as Derived from M87 Data Via Polarimetric Imaging with Fiducial Polarimetric Survey Parameters

Table 6
The Parameters Surveyed by eht-imaging Note.The selected fiducial parameters are displayed in bold.

Table 7
Ranges of Image-integrated Stokes    , , Obtained from Each Method's D-term Calibration Survey on the Low-band Data Histograms of the net polarization fraction |m| net (green: Equation (