Long Term Evolution of Surface Features on the Red Supergiant AZ Cyg

We present H-band interferometric observations of the red supergiant (RSG) AZ Cyg made with the Michigan Infra-Red Combiner (MIRC) at the six-telescope Center for High Angular Resolution Astronomy (CHARA) Array. The observations span 5 years (2011-2016), offering insight into the short and long-term evolution of surface features on RSGs. Using a spectrum of AZ Cyg obtained with SpeX on the NASA InfraRed Telescope Facility (IRTF) and synthetic spectra calculated from spherical MARCS, spherical PHOENIX, and SAtlas model atmospheres, we derive $T_{\text{eff}}$ is between $3972 K$ and $4000 K$ and $\log~g$ between $-0.50$ and $0.00$, depending on the stellar model used. Using fits to the squared visibility and Gaia parallaxes we measure its average radius $R=911^{+57}_{-50}~R_{\odot}$. Reconstructions of the stellar surface using our model-independent imaging codes SQUEEZE and OITOOLS.jl show a complex surface with small bright features that appear to vary on a timescale of less than one year and larger features that persist for more than one year. 1D power spectra of these images suggest a characteristic size of $0.52-0.69~R_{\star}$ for the larger, long lived features. This is close to the values of $0.51-0.53~R_{\star}$ derived from 3D RHD models of stellar surfaces. We conclude that interferometric imaging of this star is in line with predictions of 3D RHD models but that short-term imaging is needed to more stringently test predictions of convection in RSGs.


INTRODUCTION
As stars of mass 9 to 25 transition to core He burning, they evolve toward the red supergiant (RSG) stage (Meynet et al. 2015). Reaching radii up to 1500 , stars in this evolutionary stage are cool (3400 − 4100 ) and have luminosities of 20, 000 to 300, 000 . RSGs display high mass-loss rates (10 −7 to 10 −4 yr −1 ) leading to complex circumstellar environments and a variety of evolutionary outcomes (van Loon et al. 2005;De Beck et al. 2010;Mauron & Josselin 2011). Stars may remain in this stage until they end in supernovae or they may evolve blueward, possibly to return once more to the RSG stage (Meynet et al. 2015). Understanding the evolution of RSGs has implications for a variety of astrophysical questions, including the chemical evolution of the Universe and the diversity of supernovae observed.
RSGs display semi-regular variations comprised of a short period of hundreds of days and a longer period lasting thousands of days (Kiss et al. 2006). Although pulsations have been linked to some of these variations, it is likely that the motions of large surface features also play a role. In the early 1970s, Stothers (1972) and Schwarzschild (1975) suggested that the observed periodicity was reminiscent of the variability displayed by convection in Sun-like stars but on a larger scale and longer timescale. Using standard mixing length theory, Stothers (1972) suggested that RSGs possess a small number of extremely large convection cells (∼ 0.5 − 1 ★ ) with lifetimes of 1000s of days. On the other hand, Schwarzschild (1975) extrapolated from the behavior of supergranulation in the Sun, predicting a surface occupied by dozens of large granules with lifetimes of 100s of days. Indeed, both types of features are observed in 3D radiative hydrodynamics (RHD) models of RSGs (Chiavassa et al. 2011b,a). Kravchenko et al. (2019) applied tomography to high resolution spectra of the RSG Cep finding that line-of-sight velocity variations exhibited a phase-shift behind temperature and photometric variations, a feature also found in 3D RHD simulations. The authors suggested that this was evidence that the shorter period of hundreds of days was related to convection within the star. The presence of a granulation-related signal in the irregular variability of Betelgeuse was noted by Ren & Jiang (2020) who determined a granulation time of 138 +8 −2 days by fitting the posterior of the power density spectrum (PSD) of American Association of Variable Star Observers (AAVSO) observations of the star to a modification of an equation Harvey (1985) first used to model solar granulation. However, convection is likely not the cause of the dominant short-term periodicity in RSGs. For example, Joyce et al. (2020) found that radial fundamental and first overtone pulsations, originating in the -mechanism, are the cause of Betelgeuse's short-term (< 400 d) variability.
In 2019, the dramatic dimming of Betelgeuse (Guinan & Wasatonic 2020) brought further attention to the variability of RSGs and its myriad causes. Dharmawardena et al. (2020) suggested the dimming was caused by the presence of a large cool spot in the photosphere of the star based on decreased flux at submillimeter wavelengths. On the other hand, other authors have suggested that spectrophotometry (Levesque & Massey 2020), polarimetry (Cotton et al. 2020), and interferometry (Montargès et al. 2021) provide evidence that the decrease in visual magnitude was the result of obscuring of the surface, likely due to large-grain dust from mass loss. This claim was bolstered by Dupree et al. (2020) who reported the presence of a large bright spot on Betelguese prior to its dimming event. Because the location of the spot was similar to the location of the dimmed portion of the star, they concluded that pulsation and convection elevated material outward and that this material later cooled, dimming the star. Indeed, the contribution of convection to mass-loss has been a long standing question. Using tomography, Josselin & Plez (2007) found velocity gradients indicative of vertical motion on time scales that match the lifetime of convection cells predicted by Schwarzschild (1975). Turbulent pressure from such motions would result in lifting that could play a role in mass loss. However, Arroyo-Torres et al. (2015) studied the extended atmosphere of three RSGs using spectro-interferometry and found that hydrostatic model atmospheres, 3D RHD simulations of convection, and 1D pulsation models each produced atmospheres that were more compact than observed. Whilst pointing out the limitations of current 3D convection and 1D pulsation models, they noted that these models do not take into account short-timescale velocity gradients, which may result from granulation-like motion in convection cells.
Because these stars are quite large and because their convection related features are predicted to be large relative to the stellar radius, studying convection in RSGs is amiable to imaging studies. Using aperture masking at the William Herschel Telescope, Buscher et al. (1990) and Tuthill et al. (1997) found evidence of hotspots via detection of asymmetries in visible wavelength observations of three large RSGs. Because the time scale of the variations matched predictions by Schwarzschild (1975), Tuthill et al. (1997) suggested that the hotspots were related to convection. Later observations of Betelgeuse by Young et al. (2000) showed that detection of these hotspots is wavelength dependent and that at some wavelengths, RSGs appear featureless. Observations by Haubois et al. (2009) of Betelgeuse in the H-band, where H − continuum opacities are at a minimum (1.6 m), suggested that large features do exist in the photosphere, as predicted by 3D RHD models (Chiavassa et al. 2010;Montargès et al. 2014). Using the Center for High Angular Resolution Astronomy (CHARA) Array, Baron et al. (2014) imaged the surface of T Per and RS Per in the H-band, finding more evidence that large surface features are ubiquitous. Looking at short timescale evolution, Montargès et al. (2018) monitored the evolving surface features of CE Tau between November and December 2016, finding evidence of short-term changes. As for longer timescale evolution, López Ariste et al. (2018) collected spectro-polarimetry of Betelgeuse between 2013 and 2018 and found evidence of warm regions of scale 0.6 ★ in the photosphere that persisted for up to 4 years, which the authors tied to giant convection cells. Climent et al. (2020) presented reconstructions the surface of V602 Carinae obtained with VLTI-PIONIER. These data, collected in 2016 and 2019, are limited by temporal and angular resolution in their ability to study the size of individual granules. However, they suggest that the surface does not change significantly over a period of at least 70 days and that 3D RHD models are able to reproduce surface features as observed in images of RSGs. Moreover, comparison of continuum images to 3D RHD models suggests that the large features present in these images are indeed due to convection.
Our work in many ways complements this study of V602 Carinae. In order to better grasp the timescale and spatial scale of surface variations on RSGs, we undertook a 5 year study of the RSG AZ Cyg using the Michigan Infrared Combiner (MIRC) on the CHARA Array. In section 2 we discuss the observations and data reduction of the interferometric and spectroscopic data we used in this study. In section 3 we present the fundamental stellar parameters of AZ Cyg. In section 4 we discuss our image reconstruction techniques and present images of AZ Cyg from the 5 years of observations. In section 5, we use 1D power spectra of the images to compare the reconstructions to predictions of surface feature size by 2D and 3D models. We conclude the paper in section 6 with a summary of the work and discussion about the need for observations at shorter time scales.  Kiss et al. (2010) with the Michigan InfraRed Combiner (MIRC) using four telescopes on the CHARA Array suggested the presence of complex structures on the surface of the star. These results inspired this more involved study using the updated MIRC after it was upgraded to combine the light from six telescopes, which enabled imaging of the surface structures.

CHARA/MIRC Observations
Over the course of 5 years, we observed AZ Cyg using MIRC at the Georgia State University's CHARA Array. The CHARA Array is located on Mount Wilson, CA and consists of six 1 m telescopes with 15 baselines ranging from 34 m to 331 m and angular resolutions reaching ∼0.5 mas in the H-band (ten Brummelaar et al. 2005).
The MIRC instrument, which has since been upgraded to MIRC-X, was a six beam combiner operating in the H-band (1.5-1.8 m). The instrument was capable of providing 15 visibilities, 10 independent closure phases, and 9 independent closure amplitudes per spectral channel. In the low spectral resolution mode used for our observations (R=30), eight 30 nm wide spectral channels are acquired (Monnier et al. 2004. We recorded 21 data blocks for AZ Cyg, as listed in Table 1, with each data block corresponding to a continuous observation of a target for 10 minutes. Note that because of issues with calibration frames taken during the observation run, we were unable to use data from 2012 in our analysis. To reduce data, we used the latest version of the official MIRC reduction pipeline (as of June 2017; Monnier et al. 2007). In the pipeline, we applied a 17 millisecond coherence time and a cross-talk correction for visibilities less than 0.1. Corrections for variations in atmospheric coherence time and optical changes in the beam path were obtained using calibrator stars noted in Table 1 and described in Table 2. Following the method of Monnier et al. (2012), we also corrected for systematic errors. We used a 6.6% multiplicative error correction and 2 × 10 −4 additive error correction. For triple amplitudes, we used a 10% multiplicative error correction and 1 × 10 −5 additive error correction. Following Zhao et al. (2011), we applied a 1 • error floor for closure phases.
Because we do not expect significant brightness variation or rotation over periods of less than one month, we merged calibrated data observed within three weeks of each other into single data files for each epoch  Zhao et al. (2008) prior to analysis and image reconstruction. The purpose of this was to build up (u,v) coverage and increase the number of data to points to assist in image reconstruction. In Figure 1, we display the (u,v) coverage for the 2011 epoch. Figure 11 in the Appendix contains plots of (u,v) coverage for all epochs discussed in this paper. Because of observing conditions or ongoing work on different telescopes, it was not always possible to record data with all six telescopes.

SpeX IRTF Spectroscopy
In addition to the interferometric data, we obtained a spectrum of AZ Cyg on 2016 September 6 using the short cross dispersed (SXD) mode on the SpeX spectrograph on the NASA Infrared Telescope Facility (IRTF) 3 meter telescope atop Mauna Kea ). SpeX's SXD mode covers 0.7-2.5 m at ∼ 2000 and is matched to a 0.3x15" slit. In order to correct for additive systematic effects such as dark current and sky background, we employed an "AB" method in which we observed the target at two different positions along the slit. To build up signal-to-noise (S/N), we collected 5 AB pairs. The total integration time on the target was 13.902 s. E1-E2 E1-W2 E2-W2 S1-E1 S1-E2 S1-W2 S2-E1 S2-E2 S2-S1 S2-W2 W1-E1 W1-E2 W1-S1 W1-S2 W1-W2 For purposes of flux calibration and correction of telluric features, we obtained 4 AB pairs of the A0 V standard HD 201320, which we selected using a list provided by the SpeX instrument team. The integration time on each exposure was 29.65 s. We used Spextool, an Interactive Data Language (IDL) package (Cushing et al. 2004) to reduce data. During this process, the standard star spectrum was used to generate a model telluric spectrum by adjusting a convolved model spectrum of Vega to match the standard star, as described in Vacca et al. (2003), and dividing the standard spectrum by this model so that only telluric lines remained. To get a spectrum corrected for telluric lines, we divided the AZ Cyg spectrum by this derived telluric spectrum. We shifted the telluric spectrum roughly 0.1-0.2 pixels in order to minimize systematic errors around sharp telluric lines. Telluric correction using this method within the SpeX pipeline calibrates flux to 10% accuracy (Rayner et al. 2009). Calibration frames also included a flat field and Argon arc lamp exposure. In the final step of data reduction, we merged the order separated spectra into a single spectrum. In SpeX spectra, the offset between neighboring orders is between 1-3% (Rayner et al. 2009).
Following reduction, we followed the procedure of Villaume et al. (2017) to absolutely flux calibrate our data. In short, we determined a calibration factor, that we used to scale the entire spectrum, whilst preserving the relative flux calibration between each order. In equation 1, ( ) = 10 ( AB −N obs )/2.5 where AB is the AB magnitude determined from the observed spectrum, N obs is a random normal distribution determined using 2MASS magnitudes and errors from Cutri et al. (2003), and are weights comprised of 1 1 deviations from ( ).

Angular Diameter
We determined uniform disk (UD) and power law limb darkened disk (LDD) diameters using 2 minimization of models to the squared visibilities. The UD complex visibilities are given by the equation ( ) = 2 1 ( ) where = , 1 is a Bessel function of the first kind, the spatial frequency at which the visibility function is sampled, and the angular diameter being fit. For the limb darkened disk diameter, we used the Hestroffer law (Hestroffer 1997): where is intensity, = √︁ 1 − (2 / ) 2 with the angular distance from the center of the star, the angular diameter of the photosphere, and is the limb darkening parameter. The complex visibility function corresponding to this law, as a function of = , is: where Γ is the Gamma function (Lacour et al. 2008).
We present the results of our fits in Table 3. Note that we did not observe any significant dependence on wavelength in the angular diameters when fitting using individual channels. The angular diameter and limb darkening parameter of the star seem to vary temporally. Because we did not limit our fits to the first lobe, the reduced 2 values for fits to the uniform disk are high. This is to be expected for an object which deviates from a uniform disk, such as a star with spots on it and a darkened limb. Although lower, reduced 2 values for fits to the limb darkened disk are likewise impacted by the presence of such features. In Figure 2 we plot the visibility curve of each epoch, denoted by different colors. Looking at the first two lobes, which provide the diameter information, one can see that the differences in diameter arise from the second lobe and lower visibilities of the first lobe. This suggests that the variation in measured diameters

Fundamental Stellar Parameters
To determine the fundamental stellar parameters of AZ Cyg, we made use of atlases of synthetic spectra based on spherical MARCS models (Gustafsson et al. 2008), SAtlas models (Lester & Neilson 2008), and synthetic spectra from the Lançon et al. (2007) spectral library, which is based on spherical PHOENIX models. Within the Lançon et al. (2007) grid, we selected the spectra with solar metallicity but 4 He, 12 C, 14 N, and 16 O abundances modified to match those of a = 20 RSG near the end of its life. The MARCS grid consisted of 280 models, covering eff = 3300 − 4500 , log = −0.5 ± 1.0, and [Fe/H]= 0.0 ± 1.0. All models were calculated for = 15 . To generate the model spectra from MARCS models we used TURBOSPECTRUM v19.1, a 1D LTE spectral synthesis code that incorporates numerous atomic and molecular features (Alvarez & Plez 1998;Plez 2012). We calculated spectra with turb = 2 km s −1 and 0.1Å steps from 650-5500 nm, in order to cover the complete range of our SpeX observations.
To generate the grid of SAtlas model atmospheres we first calculated a grid of plane parallel model atmospheres, starting with the initial grid of model atmospheres from Fiorella Castelli's website 1 which covered the parameter space eff = [3500, 3750, 4000, 4250, 4500] , log = [0, 0.5, 1.0], [Fe/H]=[-5.5:0.5 in 0.5 steps], and v turb = 2 km s −1 (Castelli & Kurucz 2003). We used the coefficients, the "new" opacity distribution functions recomputed with an updated H 2 O line list, and Kurucz's molecular and atomic line lists from Castelli's website. Using this grid, we calculated an expanded grid covering eff = 3000 − 4500 in 100 steps and log = −1.0 ± 1.0 in 0.25 steps at [Fe/H]=0.0 and v turb = 2 km s −1 . Using this grid of plane parallel models, we then calculated spherical models with radius = 100 − 900 , mass = 8, 10,15,20,30 , and luminosity determined by using the corresponding radius and temperatures from 3000 − 4300 in 100 steps. Prior to calculation we checked whether the L, R, and M fell into RSG parameter space, which we defined as 10000 < < 500000 and −1.0 < log < 1.0. This resulted in a grid of 542 model atmospheres. We then used the SYNTHE suite 2 to generate model spectra at resolution R=500,000 from 680 to 2700 nm using the Kurucz linelists as inputs (Kurucz 1993;Sbordone 2005).
We obtained stellar parameters by minimizing the 2 between the observed spectra and our model spectra (after convolving to the resolution of SpeX) using the Amoeba function within IDL, which uses the Nelder-Mead downhill simplex method for multi-dimensional minimization. We included a radial velocity correction of −3.88 km s −1 based on the measurement of the radial velocity from Gaia Data release 2 and corrected for reddening using a modified version of the Cardelli reddening law (Cardelli et al. 1989) as proposed by Massey et al. (2005), setting = 4.2 rather than the normal 3.1 in order to account for circumstellar dust. We allowed the color excess ( − ) to vary as a parameter in our fits. We limited our fit to 850 − 2290 m in order to avoid strong TiO and CN lines which emerge further in the atmosphere of the star and would drive the temperature lower (Davies et al. 2013). We report the results of our fitting in Table  4 and present an example of the fits in Figure 3, with the rest in Figure 12 in the Appendix. Keep in mind that the grids and input parameters for each model atmosphere atlas are somewhat different. Thus, in Table  4 luminosities and radii for the MARCS and PHOENIX rows are derived from the effective temperature, log , and mass of the best fitting model. For the SAtlas row, effective temperature and log are derived from the luminosity, radius, and mass of the best fitting model. Note also that the luminosities for the SAtlas models were originally derived from a grid of radii and effective temperatures; as a result, the luminosity grid does not correspond to rounded numbers.  We find that eff ranges from 3972 to 4000 and log from -0.5 to 0. The color excess we find, ranges from ( − ) = 0.54 to 0.56. Gaia (Gaia Collaboration et al. 2018; Gaia Collaboration 2018) reports eff = 3294 for AZ Cyg, which is close to the value of eff = 3300 given by Lancon & Rocca-Volmerange (1992). The temperature scale of Levesque et al. (2005), based on fits of TiO rich optical spectra to spherical MARCS models, gives eff = 3535 − 3660 for spectral types M2-4.5. On the other hand, van Belle et al. (2009) gives eff = 3513 ± 168 K to 3755 ± 194 for spectral types M2-M4.7, based on angular diameters derived from the Palomar Testbest Interferometer data and fits of SEDs to templates from Pickles (1998). The cooler temperatures reported previously likely result from the inclusion of regions of cool molecules further in the atmosphere of the star. Davies et al. (2013) found that measurements of RSG temperatures using spectral regions with strong molecular lines resulted in cooler temperatures. Using the limb-darkened diameters and Gaia distances from Bailer-Jones et al. (2021)  Interferometric imaging constitutes an ill-posed inverse problem, both due to sparse (u,v) coverage and due to the incompleteness of the phase information carried by closure phases. The problem is typically solved by regularized maximum likelihood (Thiébaut & Young 2017). The reconstructed image opt ∈ R × ≥0 is defined as the most probable × -pixel image given the data, under a set of prior assumptions about the image (such as a given level of smoothness or sparsity). The data are assumed to be normally distributed, following the OIFITS standard (Pauls et al. 2005). The priors are then imposed both by the optimization engine for image positivity, and by a regularization function ( ), so that opt is given by: where is a hyperparameter that sets the relative strength of the regularization with respect to the 2 . In this paper, we use two very different imaging codes to produce the reconstructions. Both exemplify two different optimization techniques to solve Eq. 4. OITOOLS.jl 3 uses a quasi-Newton method with positivity constraint (Thiebaut 2002), a fast minimization technique which makes use of the analytic expressions of the gradients 2 and ( ) . A major drawback of OITOOLS.jl is consequently its restricted choice of regularization function, which needs to be both differentiable and convex.
In contrast, SQUEEZE 4 ( Baron et al. 2010Baron et al. , 2012 employs slower, gradient-less, stochastic methods. In this paper we used its default simulated annealing mode. The image is modeled as a superposition of large number of flux elements (typically 1000-10000 elements). These elements are free to randomly move within the image, increasing or decreasing the 2 + functional as they do so and as the posterior distribution is explored. The evolution of the image as the elements move constitutes a Markov Chain. After a period of burn-in where elements are settling, the chain stabilizes: the posterior distribution is still being explored, but only close to the optimal solution. Averaging the chain over the last hundreds of iterations then provides a mean image that, while it does not strictly minimize Eq. 4, still characterizes the posterior around the solution. A standard deviation is also produced, that quantifies the spread around the mean and thus reflect the posterior sharpness.

Regularization
The best non-committal regularization in optical interferometry is generally admitted to be total variation (Renard et al. 2011), which acts on the image by imposing greater sparsity on its spatial gradient ∇ . The total variation functional TV is the ℓ 1 norm of the spatial gradient, and can be approximated by: where is chosen to be a small number close to numerical precision. This approximation is differentiable and can be readily used in OITOOLS.jl. Total variation's effectiveness can be traced to two main effects. First, it is very effective in regularizing the outside of the star, i.e., pixels with zero flux. As such it acts as a soft-support constraint, as opposed to a hard-support constraint enforced e.g., by a mask. Second, total variation prevents high frequency noise. Total variation, by design, also tends to favor zones of uniform flux separated by sharp boundaries. Since 3D simulations of stellar convection do show convection cells separated by black outline, total variation may be a good choice as long as remains reasonable.
In addition to total variation, we also employ a Laplacian regularizer and ℓ 0 regularizer in this study. The Laplacian regularizer is the ℓ 1 norm of the Laplacian: LA ( ) = ∇ 2 2 1 . Because it favors contours, it is particularly useful in imaging stellar surfaces with non-uniform flux. The ℓ 0 regularizer is a pseudo-norm, which is not a norm since it is non-convex:  where 1 R ≥0 is the indicator function of R ≥0 , the set of real non-negative numbers. The ℓ 0 pseudo-norm counts the number of non-zero elements in the vector it is applied to and hereby heavily penalizes stray flux outside of the stellar disk.

Results
Prior to reconstructing images of AZ Cyg, we ran an experiment to test the impact of different combinations of regularizers and hyperparameters on reconstructions of RSGs. To start, we selected a snapshot of a 3D RHD simulation from Chiavassa et al. (2011a), scaled the snapshot to the angular diameter of AZ Cyg, and copied the (u, v) coverage from each epoch to make four OIFITS files of simulated observations of a RSG. Using each simulated observation, we ran a large stack of reconstructions with SQUEEZE using different regularizers and hyperparameters. In each reconstruction, we used a mask of a 4-4.25 mas disk and an initial image corresponding to the best fitting limb darkened disk for a given epoch. Each reconstruction ran on 5 independent chains and we aligned and averaged the results of all the chains to produce a single final mean image. We used the ℓ 1 norm to compare each mean image to the source snapshot, convolved to the resolution of the reconstruction and rebinned to match a 4.0 mas star. Gomes et al. (2017) found the ℓ 1 norm to be the optimal metric for comparing reconstructions to source images.
For the reconstruction of images using real data from each epoch, we selected the regularizers and hyperparameters used in the reconstruction with the lowest ℓ 1 norm of the simulated data corresponding to that epoch. In Figure 4, we depict the source image and the best fitting reconstruction for the simulated observation based on the 2011 observations. In Table 5 we present the reconstruction parameters we used for each epoch, based on the results of this experiment. To produce our images, we ran SQUEEZE using the parameters from Table 5 on 10 chains for 500 iterations using 4500 elements on a 64x64 grid at a scale of 0.125 mas/pixel, which is roughly 1/8 th the resolution of our data. In each reconstruction we used an initial image of a 4.0-4.25 mas disk, which we smoothed with a Gaussian in order to enforce artificially sharp boundaries at the edges, and used as a prior image the best fitting limb darkened disk for a given epoch.
The use of multiple chains offers a chance to investigate the impact of variations in the reconstruction process on the final image. For each epoch, we determined mean and standard deviation images from the result of all chains. We found that the standard deviation image was not very informative, likely because of the use of a mask and initial image, and that the result from each chain was only marginally different.
We also tested our images for artifacts by simulating an observation of a uniform disk using the (u, v) coverage of our observations and reconstructing an image using exactly the same parameters. We depict the result of that experiment in Figure 5. Although the reconstructed images do have some noisy features, we note that in every case the patterns observed in the reconstructions of the true data are not present and that the pixel intensities are much lower than found in the images from the real data.
In the second row of Figure 5 we present SQUEEZE reconstructed images of AZ Cyg from 2011 to 2016. In Figure 6, we present the squared visibilities and closure phases of the 2011 data compared to synthetic squared visibilities and closure phases generated from the mean reconstructions. A figure presenting a comparison of data from all epochs is available in the Appendix in Figure 13. The reduced 2 of the reconstructions, based on comparison of the squared visibilities, triple amplitude, and closure phases of the data, are listed in Table 5. The reduced 2 in Table 5 and residuals in the bottom panels of Figure  13 indicate a close match between reconstructed images and data and low frequencies, but that there are notable deviations at higher frequencies and in datasets with longer temporal coverage.
Although the spectral resolution of the observations is rather low, we also reconstructed images for each wavelength channel in order to test for wavelength dependence. These images were produced using the same reconstruction parameters as used in reconstructing images using data from all wavelength channels. We present these images in Figure 14 in the Appendix.
In order to test the results of our reconstructions, we also used OITOOLS.jl to reconstruct the data using total variation and a centering prior. We used the same masks and initial images as used with SQUEEZE. We display the reconstructions made using OITOOLS.jl in the top row of Figure 5. We note that prominent features found in the images produced using SQUEEZE are present in the reconstructions made with OITOOLS.jl, which suggests that these features are not the result of a bias in the reconstruction process. In the following analysis section, we will use only the images produced using SQUEEZE.
As further tests of our images, we split up the observations from the 2016 epoch into two parts and reconstructed images with these split data. We find that the prominent features in the image produced with the complete dataset remain in the split data sets. We depict the result of this experiment in Figure 7.

DISCUSSION
In Figure 5 we can see that the photosphere of AZ Cyg is non-uniform and consists of features of varying size and intensity. The accompanying reconstructions of uniform disks based on simulated observations using the same (u,v) coverage do not show similar features as the reconstructions of AZ Cyg. Thus we posit that the features in the reconstructions of the star are not artifacts of observational coverage.
In Figure 14, it is clear that the images have a wavelength dependence. The spectrum of AZ Cyg is dominated by CN lines from roughly 1.450 to 1.560 m and between 1.560 and 1.640 m are several CO bands. Thus, it is likely that reconstructions in these channels display the appearance of the star at higher levels in the atmosphere where these lines form. On the other hand, the channels between 1.670 and 1.760 m cover a continuum region of the spectrum. Thus these reconstructions in these channels present the surface of the star.
In each epoch, there are a number of relatively small (< 30% R * ), bright features of various intensities, as well as larger medium intensity regions and smaller dark regions. The size of the small, bright features appears to be similar in every year, but as these features are less than the nominal resolution of the CHARA Array, it could be that this is the result of a limitation in our data. On the other hand, the medium intensity regions seem to be consistent in their size, occupying > 0.5 ★ , as was found in Chiavassa et al. (2010). The smaller, dark features are of comparable size to the bright regions.
3D RHD models, as well as spectroscopic and photometric evidence suggest that the surface of RSGs is occupied by large, long lived convection cells with spatial extent ≥ 0.5 ★ . Within these cells there are expected to be smaller granules with a lifetime on the order of months. In the reconstruction of the 3D RHD model presented in Figure 4, we see that the large convection cells in the source image appear as bright to medium intensity regions in the reconstruction, with the bright regions being granules of rising hot gas. The darker regions in the reconstruction correspond to the thin lanes of infalling gas between the large convection cells. Although the source image has four or five large cells, in the reconstruction, these regions blur together and it is difficult to distinguish more than three separate regions.
Carrying this same analysis to the reconstructions of AZ Cyg, we see a strikingly similar surface pattern. In 2011, the surface of AZ Cyg exhibited several distinct regions that may be convection cells bounded by dark lanes. One of these is in the northwest quadrant of the star, another is in the northeastern quadrant and one is in the southeastern quadrant. In addition, there are several bright regions that may be granules. The size of these granule-like features is at the limit of the interferometer, so these images provide an upper limit of 0.25 ★ on the granules.
The image from 2011 is separated too far in time from the 2014 image to make any temporal comparison. However, comparing the 2014-2016 images, allows us to make a rough estimate of the lifetime of these features. In 2014, there is a region in the northwest that may be a convection cell, a feature in the south that may be a separate cell, and a feature in the east that may be another. In 2015, the bright granules are no longer in the same location, as expected, but there remain a medium intensity region in the northwest and another in the south. The similarities between 2015 and 2016 are even more striking. The pattern in the southwest in the 2015 image remains in the 2016 image. This suggests that the lifetime of the larger features is at least one year and possibly two. The rotation periods of most RSGs are on the order of 20 years. Thus, it is not likely that yearly changes on the surface are the result of long lived features moving out of view. To better understand the lifetime of surface features will require data obtained at a shorter time interval, perhaps monthly.
In order to quantitatively analyze our images we first determined the root-mean-square (rms) relative intensity contrast (which we will call contrast throughout the rest this paper) of the surface granulation using the definition of Tremblay et al. (2013) which has found use by Wittkowski et al. (2017a), Paladini et al. (2018) and Montargès et al. (2018) in studies of evolved stars.
As noted in Tremblay et al. (2013), this quantity is a measure of the deviation of a star from planeparallel approximation and 3D models with the highest contrast are those with the least efficient convection (Trampedach et al. 2013). The contrast is correlated with the Mach number, which is the ratio of the flow and sound speeds, such that stars with high contrast ratios will have larger Mach numbers. This is explained by mixing length theory, which suggests that larger convective velocities correspond to larger temperature fluctuations between convective features and their surroundings. The intensity contrast ratio is correlated with density, with lower densities corresponding to higher Mach numbers. Because lower temperature stars have lower densities, there also exists a correlation between temperature and intensity contrast, with lower temperatures corresponding to larger intensity contrast ratios. This relation is expressed quantitatively (for giant stars) by Trampedach et al. (2013): rms /% = (20.81 ± 1.81) × log eff − (1.11 ± 0.14) × log − 55.46 ± 0.29.
We present the average intensity contrasts in Table 6 and Figure 8. To get these values, we took the average of the contrast measured on the mean image of each chain. This permits our measured values to incorporate the spread resulting from the reconstruction process. We also performed this measurement on the OITOOLS.jl reconstruction, which does not have an error value because the reconstruction process results in only one image.
We find that our contrast ranges from roughly 5.0% to 14.5% in the continuum wavelength channels. Contrasts in observations from 2014 are notably lower that other years. It is unclear if this is the result of an actual difference in the star, or the lower quality of that year's data. Using Equation 9 with the parameters from Table 4 gives 19.35±6.52% through 20.05±6.53. Paladini et al. (2018) found contrasts of 11.9 ± 0.4% to 13.1 ± 0.2% for an AGB of eff = 3200 and log = −0.4. Montargès et al. (2018) found contrasts of 5% to 6% for the RSG CE Tau with eff = 3820 ± 135 and log = 0.05 +0.11 −0.17 . Wittkowski et al. (2017b) found a contrast of 10±4% for the RSG V776 Cen with eff = 4290 ± 760 . Thus, our contrasts fall within the range of those found around similar stars. Based on the models described by Tremblay et al. (2013), which we note do not cover RSG parameter space, these intensity contrasts correspond roughly to Mach numbers between 0.25 and 0.75 at unity Rosseland optical depth (< > , = 1). In models of red giant stars, Tremblay et al. (2013) found relative intensity contrasts of roughly 18% and 22%. Though these models have similar temperature to AZ Cyg, they are of higher gravity (log = 1.0 and 1.5). A lower gravity would suggest a lower density and thus larger Mach number and larger intensity contrast, a trend also described by Equation 9. That we find a lower contrast rather than higher suggests some difference between red giants and red supergiants. If we focus on the Mach number as a guide, one explanation is that the convective flow velocity behaves differently. A lower contrast suggests a lower Mach number and thus a lower convective flow velocity, which could result if the depth of the cells was larger in RSGs than in red giants. A lower convective velocity would also suggest a larger advective time scale for the same pressure scale height and thus longer cell lifetimes. Tremblay et al. (2013) and Trampedach et al. (2013) also offer a way of measuring the characteristic size of features using the power spectra of the intensity maps they generated. This method was adapted by Paladini et al. (2018) to measure the characteristic size of granulation on the AGB 1 Gruis using images generated from Very Large Telescope Interferometer (VLTI) data. In order to derive the size of features on AZ Cyg, we followed a similar method. We first took the 1D spatial power spectra of each mean image. We then filtered the disk out of the images by setting the pixel values of an image from the disk boundary outwards equal to the median flux of the pixels within the star. This had the effect of smoothing out the sharp boundary at the edge of the disk and enabled detection of prominent power carrying features within the disk. To make identification of the peak easier, we padded this filtered image by 200 pixels. We identified the characteristic size of granulation as the remaining peak in a power spectrum. We present the power spectrum of each epoch in Figure 9. We also present a comparison of power spectra of images produced with SQUEEZE and OITOOLS.jl in Figure 10. We note the characteristic size of granulation in Table 7.

Spatial scales
Using mixing length theory with turbulence, Antia et al. (1984) predicted a small number of large granules on the surface of cool, evolved stars such as red giants and red supergiants. This was in line with the qualitative arguments by Schwarzschild (1975) which extrapolated from solar data to suggest that the atmospheric depth of the maximum of the superadiabatic temperature gradient was related to the vertical extent of a granule and thus also related to the horizontal size of such a feature. 2D and 3D models (Freytag et al. 1997;Trampedach et al. 2013;Tremblay et al. 2013) have found that granule size, gran is proportional to the scale pressure height, , a result of the fact that the mixing length, the length at which a convective element dissipates into its environment, is proportional to in current theories of convection. Interestingly, Freytag et al. (1997) found that the superadiabatic temperature gradient was less useful as a Table 8. Calculated Surface Properties of AZ Cyg. When ranges are reported they represent the lower and upper values determined using the parameters in Table 4 Year gran, obsv gran, Freytag gran, Tremblay gran, Trampedach gran, Chiavassa (10) Tremblay et al. (2013) found that the relation between gran and was not linear but varied based on Mach number. They noted that gran / might be related to the ratio between horizontal and vertical velocity, with larger velocity ratios corresponding to larger ratios of gran / . To describe the relationship between stellar parameter and gran they determined a parametrization using least squares fits to the granule size described by a power spectrum:
Finally, using 3D RHD models of red supergiants, Chiavassa et al. (2009) found the equation of Freytag et al. (1997) needed to account for turbulent pressure. Including this factor, they used with a parameter close to one, the adiabatic exponent, the sound speed, the average particle mass, the atomic mass of hydrogen, and turb the turbulent velocity. Using this equation and models of RSGs, Chiavassa et al. (2009) found that convective feature size could be obtained by multiplying Equation 10 by five. The feature size predicted by this equation matched that of the large convection features in their 3D RHD models. Chiavassa et al. (2011a) conducted further analysis of the relation between pressure scale height and large surface features, this time as suggested by the standard deviation of photocenter displacement. They found that both 3D RHD models and interferometric observations of RSGs suggests the relation between convection cell size and pressure scale height is different in RSGs than it is in giant stars. In Table 8, we display the parameters of the surface of AZ Cyg derived from Equations 10-13, which we determined using values from Table 4. We report ranges in calculated granule size for Equations 11 and 12 because these equations depend on log , for which we found a wide-range of possible values, depending on the stellar atmosphere code used in the fit (see Table 4). We did not use averages of the parameters in Table 4 because they resulted from very different stellar atmosphere codes. Based on these results, it appears that the features measured in the images of AZ Cyg presented in this paper are the same as those calculated by Equation 13. These features are likely the longer lived convection cells. If we assume that these cells last 3350 days, the length of the longest period observed by Kiss et al. (2006) and Chatys et al. (2019), the same cell could be present during the entirety of the observing cycle. The similarity in measured granule size between 2015 and 2016 suggests that features of that scale do last longer than a year. The difference between those years and the measured size in 2011 may indicate the emergence of a new cell, or it could indicate that there was a change in the size of a cell during its lifetime. Further, long term observations of the star, which we are pursuing, will be necessary to image the full life-time of a convection cell. On the other hand, the features derived using Equations 10, 11 and 12 could correspond to the smaller bright features visible in the images in Figure 5, which may represent smaller scale granulation with a shorter life time. These features may last on the order months, indicating a need for short-term observations as well as long term study.  Figure 10. Power spectra of the mean reconstructed images from OITOOLS in Figure 7 compared to the reconstructions from SQUEEZE in Figure 5, both with the limb darkened disk edge filtered out.

CONCLUSION
The observations presented here have allowed us to derive the fundamental stellar parameters of the RSG AZ Cyg. In addition, we have reconstructed H-band images of the star, finding that the surface varies from year to year with large features of scale ∼ 0.5 ★ , as predicted by 3D RHD simulations. We also find small bright spots on the surface which are perhaps akin to the granulation predicted by such models, but these features are of a scale which is below the resolution of the CHARA Array and thus analysis of their size beyond the scope of this work. Although it seems like the larger features last for longer than one year, the surface pattern of the small features varies substantially. Observations on a shorter time-scale, on the order of months, will be necessary to get a better understanding of the lifetime of these features. Overall, it seems like the surface of this RSG, and likely others, is dominated by a pattern of large, long lived convection cells and smaller features consisting of hot granules of rising gas which persist for months, but less than one year. ACKNOWLEDGMENTS RN would like to thank Dr. Carlos Lopez Carrillo for the suggestion to use mean images from each chain when measuring contrast and feature size in order to obtain an understanding of the spread in measurements.
FB and RN would like to acknowledge funding of this research through NSF Awards #1616483 and #1814777. RN would also like to acknowledge start-up funding from the Office for Research at New Mexico Tech. This work is based upon observations obtained with the Georgia State University Center for High Angular Resolution Astronomy Array at Mount Wilson Observatory. The CHARA Array is supported by the National Science Foundation under Grant No. AST-1636624 and AST-1715788. Institutional support has been provided from the GSU College of Arts and Sciences and the GSU Office of the Vice President for Research and Economic Development.
This work is based upon observations obtained with the Infrared Telescope Facility, which is operated by the University of Hawaii under contract NNH14CK55B with the National Aeronautics and Space Administration.
This research has made use of the Jean-Marie Mariotti Center OIFits Explorer 5 and Aspro 6 services.
This research has also made use of the SIMBAD database and the VizieR catalogue access tool, CDS, Strasbourg, France (DOI: 10.26093/cds/vizier). The original description of the VizieR service was published in Ochsenbein et al. (2000).
This publication makes use of data products from the Two Micron All Sky Survey, which is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology, funded by the National Aeronautics and Space Administration and the National Science Foundation.
This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www. cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https: //www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.