ELT Imaging of MWC 297 from the 23-m LBTI: Complex Disk Structure and a Companion Candidate

Herbig Ae/Be stars represent the early outcomes of star formation and the initial stages of planet formation at intermediate stellar masses. Understanding both of these processes requires detailed characterization of their disk structures and companion frequencies. We present new 3.7 micron imaging of the Herbig Be star MWC 297 from non-redundant masking observations on the phase-controlled, 23-m Large Binocular Telescope Interferometer. The images reveal complex disk structure on the scales of several au, as well as a companion candidate. We discuss physical interpretations for these features, and demonstrate that the imaging results are independent of choices such as priors, regularization hyperparameters, and error bar estimates. With an angular resolution of ~17 mas, these data provide the first robust ELT-resolution view of a distant young star.


INTRODUCTION
Herbig Ae/Be stars are young, intermediate-mass stars hosting protoplanetary disks (e.g. Herbig 1960;Hillenbrand et al. 1992), often thought to represent a transition in formation mechanism between high-and low-mass stars (e.g Vink et al. 2005). Observing their disks in detail and placing constraints on their companion occurrences presents an opportunity to study the physics of star formation, and to probe the initial stages of planet formation around massive stars. Interferometric studies of these objects have revealed extended millimeter emission (e.g. Piétu et al. 2003;Alonso-Albi et al. 2009), compact infrared circumstellar disks (e.g. Millan-Gabet et al. 2001;Eisner et al. 2004), and winds and outflows (e.g. Malbet et al. 2007). Furthermore, a variety of surveys using visual and spectroscopic techniques have revealed a high companion frequency (30 − 75%; e.g. Leinert et al. 1997;Bouvier & Corporon 2001;Corporon & Lagrange 1999;Baines et al. 2006), with a possibly higher frequency for Be Here we present new, spatially-resolved observations of the Herbig Be star MWC 297, which is located in the Aquila Rift. Its mass and age are estimated to be ∼ 17 M and ∼ 2.8 × 10 4 yr, respectively (e.g. Vioque et al. 2018). Its spectral energy distribution classifies it as a Meeus Group I object, indicating the presence of a circumstellar disk that contributes significantly to the infrared luminosity (e.g Meeus et al. 2001). Previous studies have yielded a variety of distance and extinction estimates to MWC 297, constrained its rotational velocity, and studied its circumstellar environment at a range of wavelengths. The following three subsections detail the various distance estimates (1.2), disk characterization efforts (1.3), and constraints on complex asymmetric structures (e.g. companions and/or winds) in the context of MWC 297's stellar properties (1.4).

The Distance to MWC 297
Estimates of MWC 297's distance have ranged from 250 pc to >450 pc. Its distance was initially constrained to be < 600 pc, based on a B0-or-later spectral type and and an extinction of A v ∼ 6.7 mag derived from HI line flux ratios (Thompson et al. 1977;Bergner et al. 1988). CO kinematic observations decreased the distance estimate to ∼ 450 pc (Canto et al. 1984). Followup optical spectroscopy constrained the spectral type, A v , and distance simultaneously, yielding B1.5V (within 0.5 subtypes), A v ∼ 7.8 mag, and 250 ± 50 pc (Drew et al. 1997). 1 In contrast, Gaia recently estimated the distance to MWC 297 to be 375 ± 20 pc (Vioque et al. 2018).
Of the distance measures above, the CO kinematic distance is highly uncertain, since this method is unreliable within 1 kpc (Canto et al. 1984). The remaining discrepancy lies between the photometric distance of 250±50 pc and the Gaia distance of 375 ± 20 pc. The photometric distance relies on estimates of MWC 297's V band magnitude and extinction, which are both uncertain. MWC 297 has been shown to be variable in V band, with apparent magnitudes ranging from ∼ 12.0 − 12.5 (Bergner et al. 1988). The majority of extinction estimates range from A v ∼ 7.8 − 8.3 mag (McGregor et al. 1984;Hillenbrand et al. 1992;Drew et al. 1997), with the exception of A v ∼ 6.7 mag from Thompson et al. (1977). However, the H I lines in Thompson et al. (1977) were not observed simultaneously, and MWC 297 is known to be variable in Hydrogen lines (e.g. Eisner et al. 2015), making this estimate less reliable.
Given the uncertainties for V magnitude and extinction, we explore whether the Gaia distance is consistent with a B1.5(±0.5)V star. Following the same procedure as in Drew et al. (1997), MWC 297's range in apparent V magnitude of 12.0 − 12.5 and its extinction estimates of A v = 7.8−8.3 yield M V = -3.0 to -4.0 mag for a distance of 375 pc, consistent with expected M V values for B0V -B1.5V spectral types (e.g. Straizys & Kuriliene 1981). This calculation is in agreement with more recent 0.3-1.3 µm spectral energy distribution modeling of MWC 297, which derived A v ∼ 7.7 for a Kurucz B1.5V model spectrum (Kurucz 1991) assuming a distance of 375 pc (Ubeira-Gabellini et al. 2020).
The remaining argument for a shorter distance of 250±50 pc is that as a young star MWC 297 is likely associated with the Aquila Rift. Photometry suggests that the closest edge of the cloud complex lies at 225 ± 55 pc, and that the complex has a thickness of ∼ 80 pc (Straižys et al. 2003). The far edge of the cloud would thus lie between 250 pc and 360 pc, making a 375 pc distance to MWC 297 an outlier at the 1σ level. However, VLBA parallax measurements to young stars in the vicinity of MWC 297 estimate their distances to be much larger (> 400 pc e.g. Ortiz-León et al. 2017). A 375 pc distance to MWC 297 is thus within the range of distance estimates to the Aquila Rift. For this reason and due to its agreement with MWC 297's extinction and spectral type estimates, we adopt the Gaia distance of 375 pc for this work, and adjust any spatial measurements from previous studies to this distance.

MWC 297's Circumstellar Disk
MWC 297 hosts a circumstellar disk that has been well studied in the radio. An early 5 GHz (6 cm) map revealed extended structure on ∼ 200−300 mas scales, and a north-south elongation (Drew et al. 1997). Given the large v sin i measured from He I lines (∼ 350 km/s), this study suggested that the north-south elongation may trace an edge-on circumstellar disk (Drew et al. 1997). Followup observations at 1.3 mm and 2.7 mm did not show the same complex structure (Alonso-Albi et al. 2009), but they had poorer angular resolution (beam sizes of 1.1" × 1.4" and 1.4" × 0.9", respectively, compared to 0.14" × 0.11"). A joint fit to these data and MWC 297's SED resulted in a best-fit outer disk radius of ∼ 43 au at 375 pc, similar to the extent of the 6 cm emission. The observations were well explained by either an i ∼ 5 • disk with an inner rim, or an i ∼ 80 • disk without an inner rim. The 1.3mm/2.7mm spectral slope suggested that a ring of large (∼ 1 cm) grains may also exist at radii of 200 − 300 au (Alonso-Albi et al. 2009).

Rotational Velocity, Complex Disk Structure, and Companion Scenarios
MWC 297's low inclination estimates are at odds with a spectrosopic v sin i measurement of 350 ± 50 km s −1 (Drew et al. 1997). For this v sin i, inclinations of 15 − 38 • correspond to rotational velocities 550 km s −1 . However, recent estimates of MWC 297's mass and radius, ∼ 17 M and 9.7 R , respectively (Vioque et al. 2018;Ubeira-Gabellini et al. 2020) suggest a breakup velocity of ∼ 480 km s −1 . If the stellar v sin i is indeed 350 km s −1 , then the stellar inclination must be 50 • in order for the rotational velocity to be below breakup.
For a star with a wind, some spectral features could be contaminated by wind material or formed in a location where outflow kinematics dominate (e.g Weigelt et al. 2011), which could lead to an overestimated stellar v sin i. MWC 297's v sin i was measured in two ways: (1) by broadening standard star spectral lines and comparing them to MWC 297, and (2) computing the widths of He I singlets at 4009Å and 4144Å (Drew et al. 1997). While MWC 297 is known to have a wind (Malbet et al. 2007;Weigelt et al. 2011), the He I singlet v sin i (∼ 320 km s −1 ) is unlikely to be contaminated by circumstellar material given the high line excitations.
If the v sin i measurement errors were underestimated, then the stellar rotation rate could be consistent with a 38 • inclination. If the true v sin i were ∼ 290 km s −1 then the stellar rotation would be below breakup for the most recent inclination estimate of 38 • (Kluska et al. 2020). This v sin i would correspond to a one pixel change in the He I line widths measured in Drew et al. (1997), using the Intermediate-dispersion Spectrograph and Imaging System on the William Herschel Telescope. It would also only be 10 km s −1 lower than the nominal 1σ bounds on the measurement (300 − 400 km s −1 ).
It is also possible that MWC 297's inclination is underestimated. The inclination constraints are based on long-baseline visibilities that indicate a symmetric brightness distribution on ∼ few milliarcsecond scales (e.g. Malbet et al. 2007;Weigelt et al. 2011;Hone et al. 2017). Recent VLTI data including phase information reveals asymmetric emission that could be explained by disk inclination effects for i ∼ 38 • (Kluska et al. 2020). However, it is also possible that complex circumstellar structure could cause a brightness distribution that appears relatively symmetric despite a higher stellar inclination. For example, a disk wind could produce this effect, since dust can become entrained in the wind (e.g. Bans & Königl 2012). This would cause the brightness distribution to appear more spherical on the sky even at a high disk inclination, which could lead to an underestimated disk inclination in geometric fitting.
An outer companion could cause a symmetric brightness distribution for a high stellar inclination by inducing a disk warp. This could cause the disk inclination to differ from the stellar inclination for at least some spatial separations. This scenario could be tested by searching for shadows or asymmetries in scattered light on the scales of a few au, and searching for companions that may be perturbing the disk on those scales. There are no reports of disk shadowing or warps in the innermost several au around MWC 297 in the literature.
VLT/SPHERE recently discovered a sub-stellar companion (M ∼ 0.1 − 0.5 M ) around MWC 297 with a separation of ∼ 245 au (Ubeira-Gabellini et al. 2020). However, given its mass and separation, following the first order perturbation theory in Terquem & Bertout (1993), this companion is not massive enough to cause large warps in the inner ∼ few to tens of au. A companion-induced disk warp would require a companion with a smaller separation and/or higher mass. The VLT/SPHERE observations did not detect any other companions down to their coronagraph inner working angle 150 mas (56 au at 375 pc). This inner working angle also makes these data poorly suited for searching for inner disk warps and shadows in the innermost few tens of au.

Outline of This Paper
Here we present 3.7 µm ELT-resolution imaging of the Herbig Be star MWC 297, from the co-phased Large Binocular Telescope Interferometer (LBTI; Hinz et al. 2008;Bailey et al. 2014). With a 23-meter effective aperture, LBTI can robustly image MWC 297's circumstellar emission down to spatial scales of several au. We place new constraints on the circumstellar disk geometry and detect a companion candidate with a separation of ∼ 22 au.
In Sections 2, 3 and 4 we describe the experimental design, observations, and data reduction, respectively. Sections 5 and 6 discuss the image reconstruction process and present its results. We discuss MWC 297's morphology in Section 7 and summarize the main conclusions in Section 8. Appendices A, B, and C provide additional details regarding data reduction, geometric modeling, and image reconstruction tests for exploring image fidelity.

EXPERIMENTAL DESIGN
The technique of non-redundant masking (NRM; Tuthill et al. 2000) has been demonstrated to provide useful constraints on Herbig Ae/Be stellar environments (e.g. Tuthill et al. 2001;. NRM uses a pupil-plane mask to transform a conventional telescope into an interferometric array, making the images on the detector the interference fringes formed by the mask. These images are Fourier transformed to calculate complex visibilities, which have both amplitude and phase. Since the baselines in the array are non-redundant (no repeated lengths and position angles), information from each baseline has a unique spatial frequency. From the complex visibilities we calculate squared visibilities, the powers associated with the mask baselines. We also calculate closure phases, sums of phases around baselines forming a triangle. Non-redundancy means that closure phases eliminate residual instrumental and atmospheric phase errors to first order, making them particularly powerful for close-in companion detection.
NRM provides moderate contrast down to separations of ∼ 0.5 λ/D (e.g. Ireland & Kraus 2008;Sallum & Skemer 2019), a factor of a few to several boost in resolution compared to high performance coronagraphy (e.g. Guyon et al. 2014;Ruane et al. 2017). Due to its sparse Fourier coverage, model fitting and/or image reconstruction (often used in conjunction) are required to constrain the source brightness distribution. Simulations have shown that with adequate (u, v) sampling and sky rotation coverage, robust, model-independent images can be reconstructed from NRM observations (e.g. .
The NRM observations presented here were taken with operational co-phasing of the two 8.4-m LBT mirrors. With co-phasing, the wavefront across each mirror is flattened using adaptive optics and the differential piston between the two mirrors is controlled using a cryogenic pathlength corrector. These observations used the flexible pyramid wavefront sensors at 600-900 nm to correct 153 Zernike modes at 990 Hz. Differential piston, tip and tilt were sensed at Ks-band using the phasecam fringe tracker run at 1 kHz. In this mode the LBT can be thought of as a segmented telescope with two 8meter mirror segments, providing ∼ 23-m resolution in one direction, and ∼ 8-m resolution in the perpendicular direction. While previous LBTI NRM observations have utilized inter-aperture baselines in "lucky" imaging mode , these are the first data with operational co-phasing.
Applied on an Extremely Large Telescope (ELT; D 25 m), 3.7 µm NRM observations resolve ∼ 15 mas angular scales, accessing the inner few to several au around distant ( 500 pc) young stars. The ∼ 1 − 23 meter baselines offered by the co-phased LBTI resolve angular separations down to ∼ 17 mas at L . The dense (u, v) coverage from the intra-aperture (B < 8 m) baselines, combined with the high resolution of the inter-aperture (B > 8 m) baselines allows us to robustly image MWC 297 on scales down to ∼ 6 au.

OBSERVATIONS
We observed MWC 297 and a reference PSF calibrator on May 31, 2018 using LBTI/LMIRCam (e.g. Leisenring et al. 2012) and the 12-hole non-redundant mask. We chose the calibrator HD 164259 due to its proximity on the sky to MWC 297, and its similar 2-4 µm fluxes (Table 1). HD 164259 is significantly brighter than MWC 297 at the wavefront sensing band (600-900 nm), and slightly fainter than MWC 297 at the phase-tracking band (Ks-band). As a result, the calibrator observa-tions had superior AO correction, but poorer co-phasing performance. We discuss this further in Appendix A.4, where we also present data reduction strategies for this situation.
We alternated pointings between MWC 297 and HD 164259, using the same integration time (∼ 0.15 s) for both (Table 2). Each pointing was split into two dithers on the upper and lower halves of the detector. To fill in the (u,v) plane, we observed in pupil-stabilized mode, with the sky rotating on the detector throughout the night. Altogether we obtained two pointings of each object, with a total of ∼ 19 • of parallactic angle evolution and ∼ 500s of integration time for MWC 297. Figure  1 shows the combined (u,v) coverage for the two MWC 297 pointings.  We reduce the data using an updated version of the pipeline presented in . This applies image calibrations, generates Fourier sampling coordinates for the mask, extracts squared visibilities and closure phases, and calibrates the target observables using reference PSF observations. Appendix A provides details on these steps, a description of error bar estimation for the observables, and differences between this reduction and the pipeline presented in . Figure 2 shows the final, calibrated visibilities and closure phases.

ANALYSIS
We reconstruct images using BSMEM (Buscher 1994;Baron & Young 2008), a gradient-descent algorithm with maximum entropy regularization (Frieden 1972). For all reconstructions, we use a pixel scale of 1 mas and a field-of-view of 800 mas. Changing these parameters does not change the images significantly, unless the field of view is made small compared to the resolution of the shortest baselines. In this case BSMEM cannot effectively use information from the short baselines. We run BSMEM in "classic Bayesian" mode, which chooses the entropy hyperparameter (α) automatically by assigning it a prior and evaluating the evidence to choose the most likely value (Baron & Young 2008). We also test other methods for choosing α, which are presented and discussed in Appendix C.
BSMEM can reconstruct images with a variety of built-in priors or with a user-specified prior image. We first reconstruct images with a generic, but physically  The raw data were combined using the p = 3 averaging scheme described in Appendix A.4, and error bars are assigned following Appendix A.6. motivated prior for imaging circumstellar material: a central compact component representing the star, and an extended component to allow for circumstellar structure in the reconstruction. Previous observations of MWC 297 suggest that it is well modeled by a ∼ few mas compact component and a 40 mas extended component (e.g. Acke et al. 2008;Malbet et al. 2007;Weigelt et al. 2011). We thus begin with a prior image consisting of two circular Gaussian functions -one central, compact component (F W HM = 2 mas, fractional flux of ∼ 0.8) and one more extended component (F W HM = 50 mas). We then demonstrate that the prior choice does not significantly change the recovered image by testing several other priors informed by geometric model fits to the data (Appendices B and C).
6. RESULTS Figure 3 shows the final reconstructed image using the simple two-Gaussian prior, zoomed to the innermost 250 mas, and Figure 4 shows the reconstructed observables plotted over the data. MWC 297's morphology is more complex than a simple star plus face-on disk model. The imaging reveals bright zones along position angles roughly ±30 • east of north, and flux deficits to the north and south (indicated by the dashed lines and (b) label in Figure 3). The emission to the immediate east of the star is ∼ 1.4 − 2 times as bright as that to the west. There is also a companion-like feature (indicated by the circle and (a) label in Figure 3) at ∼ 110 • east of north, with a separation of ∼ 60 mas and contrast of ∼ 2% (4.25 mag) relative to the central compact component. Appendices B.2 and C.3 present modeling that assesses the false positive probability of this signal, which we calculate to be < 0.08%. We carried out a variety of tests to demonstrate that the companion signal is independent of prior image choice. Figure 5 shows one of these, where we reconstruct images using priors made up of a central, compact component, an extended disk, and a compact companion signal. In one prior, the companion location is the same as the companion signal in Figure   other the position angle is offset. While only one example is shown in Figure 5, we tested a variety of prior companion position angle offsets. When the prior companion is at the same position as the signal in Figure  3, the reconstructed companion becomes more compact but contains the same fractional flux. For the offset prior companions, BSMEM does not introduce fake signals with significant fractional flux compared to either the observed companion or the noise levels in the data. The images and companion signal are robust to arbitrary prior choices such as these and others that are presented in Appendix C.1.
Appendix C presents the detailed results of additional image fidelity tests related to error bars and regularization, and we describe them briefly here. Reconstructions with scaled error bars and alternative entropy hyperparameters (α) show that small errors and/or small α lead BSMEM to introduce high frequency peaks throughout the field of view, improving its ability to perfectly match the observations (including noise). Conversely, large er- Over-regularized images (which closely resemble faceon disks) do not reproduce the observations even qualitatively. Furthermore, simulated reconstructed images of δ functions plus various disk models cannot reproduce the structure seen in Figure 3 (Appendix C.2). Gapped radiative transfer disk models (which are inconsistent with previous datasets but which we explore in Appendix C.3), also cannot reliably reproduce the reconstructed image. The images thus demonstrate that MWC 297's circumstellar emission is inconsistent with a simple, relatively face-on disk. Geometric modeling of the observations support this as well, with high reduced χ 2 values for all disk models, but better reduced χ 2 and Bayesian evidence values for increasingly complex geometric models (e.g. skewed disks, disks plus companions; Appendix B).
We note that some of the model squared visibilities in Figure 4 reach values slightly greater than 1. This is because BSMEM allows for an error on the zero spacing flux, which we left at its default value of 0.1 for this reconstruction. As a result, during BSMEM's automatic regularization, the data have enough weight compared to the regularizer that BSMEM introduces noise to try to better match the visibilities, which reach values greater than 1 due to imperfect calibration. This effect diminishes with increasing regularizer values (see Appendix C, Figures 16 and 17) and decreasing values of the zero spacing flux error. It also does not significantly change the morphology in the central ∼ 250 mas of the reconstruction, which can be seen by comparing the reconstruction in Figure

MWC 297's Circumstellar Disk
Here we use the reconstructed images to place constraints on MWC 297's circumstellar structure. The imaging reveals an unresolved central component with a large fractional flux; 0.74-0.78 on few mas scales. We use MWC 297's spectral energy distribution to estimate the fractional flux that can be associated with disk material. Assuming a stellar effective temperature and radius of ∼ 23, 700 K, and ∼ 9.17 R , respectively (Ubeira-Gabellini et al. 2020), the stellar contribution to MWC 297's total 3.7 µm flux is ∼ 5%. This suggests that ∼ 69 − 72% of the total 3.7 µm flux resides in a compact region that is unresolved by the LBTI observations. This is consistent with recent VLTI modeling and imaging, which showed that the 1.65 − 2.2 µm stellar fractional flux was ∼ 10 − 15%, and that the remainder was concentrated in an area with a characteristic size of a few mas (e.g. Malbet et al. 2007;Kluska et al. 2020).
We explore whether the remaining extended emission can be explained by the relatively face-on disk scenario suggested by previous VLTI observations (e.g. Kluska et al. 2020) For models of this type, the LBTI data prefer inner radii on the scale of several au; we thus test gapped disk models that can roughly match our observations without underestimating the VLTI squared visibilities or closure phases presented in Kluska et al. (2020). These tests show that gapped disk models significantly under-fit the LBTI closure phases, do not reproduce the reconstructed image in the absence of noise, and do not reliably reproduce the reconstructed image with appropriate levels of added noise (see Appendix C.3).
The remaining flux around MWC 297 is thus more complex than a simple, face-on disk with or without gaps. The central ∼ 3 − 7 au region of the reconstructed image is relatively symmetric, but further out (∼ 10 − 30 au) the flux is distributed in a butterfly pat-tern similar to near-infrared HST images of protostars with outflows (e.g. Padgett et al. 1999;Cotera et al. 2001;Wolf et al. 2008;Burrows et al. 1996). Herbig Ae/Be stars are thought to drive well-collimated outflows (opening angle 50 • ) at ages of less than a few × 10 4 yr, and poorly-collimated outflows at older ages (e.g Beuther & Shepherd 2005). Given MWC 297's young age (∼ 2.8 × 10 4 yr) one may expect relatively collimated outflows, which could cause this disk morphology. The nearly-vertical, conical dark zone in the reconstructed image could be caused by a well-collimated bipolar outflow, which would agree with the north-south elongation seen in a previously-published 5 GHz map (Drew et al. 1997).
The inner extent of the possible outflow cavity (∼ 7 au) is consistent with the inner reaches of outflows observed around Herbig stars (e.g. Devine et al. 2000). The spatial scale of the entire butterfly pattern (∼ 30 au north-south, ∼ 20 au east-west) is smaller than that observed in the infrared for protostars with outflow cavities ( 100 au). It is possible that the LBTI observations trace only the brightest regions of the disk and either would not be sensitive to, or would over-resolve fainter, more extended emission. It is also possible that MWC 297's higher mass, and thus higher photoionizing flux, leads to a more compact disk compared to the T Tauri stars where these nebulae have been observed. Lastly, a companion with a separation of tens of au (comparable to the companion candidate separation in the reconstructed image) could also truncate the disk, leading to the more compact emission seen here.
The observed brightness distribution is most similar to disk plus outflow models viewed at moderate inclination ( 50 − 60 • ; e.g Stark et al. 2006). This explanation requires that the disk inclination be higher than the ∼ 15 − 38 • estimated from previous long-baseline observations (e.g. Millan-Gabet et al. 2001;Eisner et al. 2004;Monnier et al. 2006;Malbet et al. 2007;Acke et al. 2008;Weigelt et al. 2011;Hone et al. 2017;Lazareff et al. 2017;Kluska et al. 2020). For all of these except Kluska et al. (2020), the inclination estimates were based on the aspect ratios of geometric fits to visibilities. Kluska et al. (2020) based their inclination estimate (∼ 38 • ) on a disk fit that reproduced the location and degree of the asymmetry in the reconstructed image. However, more complex structures than simple face-on disks may have relatively symmetric brightness distributions at such small scales, causing geometric and disk model fits to prefer face-on geometries. Indeed, the inner regions of simulated disks with outflow cavities have been shown to appear relatively symmetric for moderate inclinations ( 65 • ; e.g. Stark et al. 2006). We note that the models shown in Stark et al. (2006) are not entirely analogous to MWC 297, since they cannot account for the compact, high fractional flux suggested by the LBT and long-baseline observations. Recent VLTI observations do not show a decrease in brightness toward small radii, suggesting they do not resolve the inner radius of the circumstellar emission (Kluska et al. 2020). Furthermore, fits to previous VLTI datasets and MWC 297's spectral energy distribution prefer disk models with continuum emission inside the sublimation radius (e.g. Acke et al. 2008;Malbet et al. 2007;Weigelt et al. 2011). This suggests either the presence of free-free emission or refractory grains close to the star, neither of which are taken into account in Stark et al. (2006). While fully exploring such a complex system's parameter space is beyond the scope of this paper, either of these scenarios should still produce enough near-infrared radiation to cause extended emission from scattering by an outflow cavity.
The disk plus outflow scenario we propose here implies a higher inclination than the 38 • suggested by previous long baseline studies. If this were the case, one might expect to see close-in asymmetric emission on one side of the star from an inner disk rim located at ∼ 7 au (the dust sublimation radius). However, the presence of free-free emission or close-in refractory grains could result in a non detection of an asymmetric disk rim. In both cases the bright central component could increase the contrast of a close-in disk, reducing its closure phase signal. A refractory inner disk could also shadow a rim of non-refractory grains, increasing its contrast. Lastly, the higher inclination implied by the disk plus outflow scenario would be consistent with the observed v sin i of 350 km/s, which requires an i 50 • for the rotational velocity to be less than breakup (∼ 480 km/s). Future comprehensive modeling efforts could nail down the validity of this scenario and possibly measure the true inclination of MWC 297.

A Companion Candidate Around MWC 297
We detect a companion candidate around MWC 297 at a separation of ∼ 60 mas and a contrast of ∼ 2% = 4.25 mag relative to the unresolved flux. The observed contrast agrees with previous near-infrared visual binary searches that observed a trend of sharply increasing companion contrasts greater than ∆K ∼ 2 mag (e.g. Leinert et al. 1997). The projected spatial separation (∼ 22 au at a distance of 375 pc) lies in a region of the parameter space where aperture masking and long-baseline interferometry studies have collectively set a lower limit of ∼ 11% on the companion fraction, with detected companions having spatial separations of ∼ 1.4 − 30 au (e.g. Anthonioz et al. 2015;Smith et al. 2005;Kraus et al. 2012;Berger et al. 2011;Duchêne 2015). Here we discuss possible physical explanations for this companion signal.
The companion candidate fractional flux at 3.7µm is 1.5 ± 0.2%, corresponding to an intrinsic flux of ∼ 1.12 ± 0.17 Jy. Assuming the same age as MWC 297 (∼ 2.8 × 10 4 yr; Vioque et al. 2018), and that the emission comes from a bare photosphere, we use the solar-metallicity PARSEC (Bressan et al. 2012) premain-sequence evolutionary tracks to fit for a stellar mass, radius, and effective temperature. Models with M * = 2.1 ± 0.2 0.3 M , R * = 13.1 ± 0.7 1.4 R , and T eff = 4542 ± 150 40 K are consistent with the 3.7µm flux. A stellar companion such as this one would not be noticeable in previously-published H band to K band long-baseline interferometry visibilities. However, it may be noticeable in the H band closure phases published in Kluska et al. (2020), since the expected H band flux is 0.7 − 0.9 Jy after reddening.
To check this, we generate a δ + skewed Gaussian model image with the same stellar fractional flux, FWHM, and asymmetry as the reconstructed image in Kluska et al. (2020). We sample it with the same (u, v) coverage as Kluska et al. (2020), and compare the results for this image to one where we add a companion candidate with a flux of 0.8 Jy (corresponding to a contrast of ∼ 15%) and a 60 mas separation. The squared visibilities for these two cases are indistinguishable, but the closure phases for the model with the companion are on average 8 • larger than the δ + skewed Gaussian model, much larger than the typical error bars.
The inconsistency between a bare stellar photosphere and the previously-published VLTI closure phases suggests that the companion must be very red. To explore this, we increase the contrast of the companion in the model images described above, until the difference between the δ + skewed Gaussian model and the δ + skewed Gaussian + companion model is smaller than the typical VLTI closure phase error bar. The change associated with a contrast of ∼ 1% with respect to the central star is below these limits, corresponding to an H-band flux of ∼ 0.05 Jy and an H-L color > 4.8 mag (compared to ∼ 0.7 mag for MWC 297). At an age of ∼ 2.8 × 10 4 yr, companions with masses less than ∼ 0.1 − 0.5 M can satisfy this H-band upper limit, but cannot reproduce the color. However, companions surrounded by warm (∼ 700 K) dust can satisfy both the H and K band flux constraints if their extent is 3.5 au. A dust-shrouded companion with a mass 0.2 M could match this scenario, since its Hill radius would be 3.5 au for an orbital radius of ∼ 22 au.
Single epoch observations cannot distinguish between an orbiting, dusty companion and circumstellar material. MWC 297's mass (∼ 17 M ) and the companion separation (22 au) imply an orbital period of at least ∼ 25 years and evolution of up to ∼ 14 • yr −1 in position angle. Given the companion position angle uncertainty of a few degrees, followup observations in the coming years should be capable of observing or ruling out the expected orbital motion at high significance. An alternative explanation with no expected orbital motion is that the companion signal originates from a Herbig Haro object (Herbig 1951;Haro 1952) caused by an equatorial flow (assuming the north-south cavity in the imaging is caused by a bipolar flow). Modeling of radiation-driven winds has shown that both bipolar and equatorial outflows may exist around accreting early Herbig Be stars (e.g. Drew et al. 1998).

3.7µm Constraints on the Wide-Separation SPHERE Companion
VLT/SPHERE recently detected a companion around MWC 297 at a separation of ∼ 650 mas, corresponding to ∼ 245 au (Ubeira-Gabellini et al. 2020). We do not detect significant emission at the SPHERE companion location in the reconstructed images. Given the noise levels in the observations, the simulations presented in Appendix B suggest that the L companion contrast is greater than ∼ 5 magnitudes, corresponding to a flux less than ∼ 750 mJy. This is consistent with the best-fit BT-SETTL model presented in Ubeira-Gabellini et al. (2020), which predicts a ∼ 6 mJy 3.7 µm flux (∆L ∼ 10 mag).

CONCLUSIONS
We presented new, spatially resolved observations of the Herbig Be star MWC 297 from the co-phased LBTI. The imaging allows us to characterize MWC 297's disk geometry in detail. It reveals complex disk structure on scales of ∼ 10 − 30 au that closely resembles butterfly patterns associated with collimated outflows around young stars. Protostellar outflow models with moderate inclinations (∼ 50 − 65 • ) provide a good match to the data. This scenario is consistent with MWC 297's young age and spectral type, since early Be stars are expected to drive collimated outflows at ages a few × 10 4 yr.
The images resolve inconsistencies between low inclination estimates and high stellar v sin i measurements in previous studies. Previous inclination constraints (∼ 15 − 38 • ) were all based on simple disk models to primarily long-baseline visibility data. Indeed, the axes ratios of simple geometric disk fits to the LBTI data also imply a low inclination (Appendix B). However, the imaging may be better explained by a moderately inclined disk plus outflow, a scenario that has been demonstrated to look relatively symmetric through radiative transfer modeling. This larger inclination (∼ 50 − 65 • ) would be consistent with the large v sin i (350 km s −1 ), which requires i 50 • for the stellar rotation to be below breakup. While fully modeling this complex system is beyond the scope of this paper, future modeling efforts could confirm or refute this scenario and possibly measure the true inclination of MWC 297.
The images also show a companion-like feature at a separation of 60 mas, corresponding to 22 au at the distance of MWC 297. If this feature is indeed an orbiting companion, its infrared flux constraints cannot be explained by a bare stellar photosphere. They are better matched by warm ∼ 700 K dust on the scales of a few au, corresponding to the size of the Hill sphere for companions with M * 0.2 M . Multi-epoch observations will distinguish between a dusty companion scenario and alternatives, such as heating of disk material by an equatorial outflow.
The images of MWC 297 have ∼ 17 mas resolution at 3.7 µm. Unlike previous LBTI NRM datasets taken in "lucky" imaging mode ), here we demonstrate that the structure in the reconstructed images is independent of prior choices, regularization hyperparameters, and issues in error bar estimation. The imaging data provide a high-fidelity, ELTresolution view of several au scales around a distant young star. While studies applying geometric models provide some constraints on MWC 297's circumstellar environment, we demonstrate that robust imaging is required to characterize its morphology in detail and resolve modeling inconsistencies. Future observations with the 23-m LBTI and with upcoming facilities such as GMT and TMT will place new, similarly detailed constraints on disk structures and companions around young stellar objects like MWC 297. We first flat field the raw images using a flat constructed from the portions of science and calibrator frames that contain only sky background. We perform bias, dark, and sky subtraction for each target pointing by subtracting the median of the top dither from the bottom dither, and vice versa. We then apply a dewarping correction following the procedure described in Maire et al. (2015), using dewarp (Spalding & Stone 2019).

A.1.2. Additional Detector Systematics
LMIRCam is an H2RG detector with 64-pixel wide channels having different analog-to-digital converters. We measure and correct two types of systematic noise associated with these readout channels. The first is pattern noise that repeats over each 64-pixel channel. To characterize this, we separate the frame into 32 64 × 2048 channels and take the median of each pixel across the 32 channels (excluding channels that contain images of the star). We then subtract the median from all 32 readout channels. We also correct for different bias levels in each of the readout channels. We measure this by separating the frame into 32 64 × 2048 channels, and calculating the median value of each channel (excluding rows that contain images of the star). We then subtract the median from each channel in the image.

A.1.3. Bad Pixels
We lastly correct for bad pixels, replacing each pixel flagged as bad with the mean of the adjacent pixels. We flag pixels as bad in two ways: (1) using a pre-generated bad pixel map, and (2) by examining each science frame individually. We calculate the bad pixel map from the data we use to construct the sky flat. From the distribution of all pixel values in the master flat, we flag all 3σ outliers as bad pixels. We follow the same procedure with the master dark that was used to calibrate the sky flat, but using different σ cuts on each side of the distribution (2σ low, 3.5σ high) because of its skew.
For each science frame, we flag additional bad pixels after correcting those from the pre-generated map. We compare each pixel to those in a 3 × 3 surrounding box. We flag pixels that are greater than 2σ away from the mean of the box. We choose a 3 × 3 box as the largest one that would not contain too much complex structure from the mask PSF, which would systematically increase σ and make it more difficult to flag pixels. We tested a variety of σ cuts, and found that the correction did not change significantly for cuts of 2 − 3σ outliers.

A.2. Choosing Fourier Sampling Coordinates
We use the mask hole positions, diameters, and bandpasses to calculate synthetic power spectra for choosing Fourier sampling coordinates. We create a realistic mask PSF by combining monochromatic mask PSFs across the bandpass. We take the |FT| 2 of the mask PSF to calculate the power spectrum, and then find all pixels within ∼ 50% of the maximum for each baseline location. We sample the observed complex visibilities at those pixels when generating squared visibilities and closure phases.
We compare a variety of synthetic power spectra to the mean observed power spectrum to check the sampling quality. We allow for a non-zero mask rotation angle, and found that a small rotation (∼ 3.5 • ) provided the best match to the data. However, even with a small rotation a mismatch existed for the inter-aperture baselines that could not be corrected by a simple scaling (e.g. bandpass adjustment). Decreasing the horizontal separation (by ∼ 0.5 m) between holes on each of the two primaries corrected this mismatch. An effect like this could be caused by mask flexure, which would disproportionately affect the inter-aperture baselines.

A.3. Squared Visibility and Closure Phase Generation
We calculate the squared visibilities and closure phases by sampling the pixels described in Section A.2. For the squared visibilities, we sum the power (|FT| 2 ) for all pixels corresponding to each baseline. We subtract a bias that we calculate by taking the mean of all power spectrum pixels without signal. We save these raw squared visibility amplitudes for each science frame, and also save normalized squared visibilities, which we calculate by dividing the zero-spacing power into the raw visibilities.
To calculate closure phases, for each triangle of baselines, we find all triangles of pixels whose (u, v) coordinates sum to (0,0). We calculate a bispectrum for each pixel triangle by multiplying the FT values of the three pixels. We then calculate an average bispectrum for each triangle and frame by averaging the bispectra of all the pixel triangles that close. We describe our strategy for producing average squared visibilities and closure phases for each pointing in Section A.4.

A.4. Scan Averaging Strategy
Both the adaptive optics and co-phasing performance affect the relative quality of observables calculated for different frames and targets. AO performance affects the data quality for all baselines, impacting coherence for the intra-aperture fringes and the wavefront at the phase tracker. The phase tracking performance only influences the inter-aperture baselines. Since the wavefront sensing and fringe tracking occur at different bandpasses (600-900 nm and 2.0-2.3 µm, respectively), science and reference PSF objects with different colors may have different relative V 2 and CP quality as a function of baseline length.
While HD 164259 is significantly brighter than MWC 297 at 1 µm, its fainter Ks band flux leads to lowerquality long baseline observables than MWC 297. We use the fraction of squared-visibility power that resides in the inter-aperture baselines (f P OW ) as a representative measure of co-phasing performance for each frame. The worse co-phasing performance for HD 164259 can be seen in the larger number of low-f P OW frames in the top-row shaded histograms in Figure 6. An averaging scheme that weights all individual frames equally would thus result in lower inter-aperture visibilities for HD 164259 just due to co-phasing performance. When these are divided into MWC 297 squared visibilities during calibration, the calibrated MWC 297 visibilities would appear higher (under-resolved) for inter-aperture baselines (Figure 7, grey points). This would also cause a systematic error in the inter-aperture closure phases, since they would be noisier for HD 164259, which would degrade any real CP signals in the calibrated MWC 297 data.
To avoid these systematics in a more objective way than manual vetting, we apply a weighted averaging strategy aimed at evening the relative distributions of f P OW for MWC 297 and HD 164259. For the squared visibilities, we average the observables for the individual frames using the total visibility amplitudes in the inter-aperture baselines, raised to a power p, as a weight for each frame. For the closure phases, we use the sum of all bispectrum amplitudes for triangles made up of two inter-aperture baselines, also raised to a power p. We checked that this approach to the closure phase weighting is not significantly different than weighting both the squared visibilities and closure phases by simply the raw, inter-aperture visibility amplitudes. Since closure phases are already weighted by bispectrum amplitudes during averaging, the improvement of this weighted scan averaging strategy was more pronounced for the squared visibilities than for the closure phases.
We use the raw sum of inter-aperture amplitudes (as opposed to the sum of inter-aperture amplitudes divided by the sum for all baselines) as the averaging weights, since the intra-aperture power for individual frames can vary as well (e.g. due to variability in AO performance). Using the fractional inter-aperture power would thus unnecessarily upweight frames with equivalent inter-aperture power and lower intra-aperture power. We test values of p from 0 (no weighting) to 5 (aggressive up-weighting of coherent inter-aperture frames). Figure 6 shows the effective distributions of inter-aperture fractional power for each value of p. We reconstructed images for all of these weighting schemes and found comparable results except for p = 0 (see Figure 8). Furthermore, we compared calibrated datasets for this scheme to a vetting scheme where we dropped some fraction of frames with the lowest inter-aperture amplitudes. All values of p ≥ 1 were equivalent to dropping the lowest-quality 0.5 − 0.7 of the frames. Figure 7 compares the final (calibrated) squared visibilities for these different methods and parameters. We use the intermediate p = 3 weighted average as our representative dataset throughout the rest of the paper. This value of p leads to the lowest squared visibility errors estimated following the procedure in Appendix A.6, with lower p values suffering contamination from poorly co-phased frames, and higher p values giving too much weight to a small number of frames.

MWC297 2
Inter-aperture f POW , and fractional cutting (right) approaches described in Section A.4. In both panels, the largest grey points show the visibilities with no weighting scheme or cuts in the averaging process. Colors and sizes correspond to the power that the amplitudes are raised to in the weighting (left) and the fraction of lowest amplitude frames dropped (right). As the points become smaller and redder, the weighting and data cutting schemes become more aggressive.

A.5. Calibration
We calibrate both the squared visibilities and the closure phases using the "polycal" method described in . We fit a polynomial to the calibrator observations as a function of time, and calculate its value at each target observation time. We divide the resampled calibrator squared visibility into each science target squared visibility, and subtract each resampled calibrator closure phase from each science closure phase. We note that one should generally correct for the angular size of the calibrator, and thus any resolved calibrator visibility signals, before carrying out the visibility calibration. However, the calibrator angular size of 0.76 mas (Delfosse & Bonneau 2004) corresponds to a squared visibility of 0.9994 on 23-m baselines. We can thus treat the calibrator as an unresolved source.
Since only two calibrator pointings are available, we test 0 th -and 1 st -order polynomials, measuring the scatter in the calibrated data. We use the polynomial order that minimizes the scatter in the calibrated MWC 297 observations, which for both the squared visibilities and the closure phases was the 0 th -order (constant) function.

A.6. Error Bar Estimation
Systematic, rather than random noise sources dominate in NRM observations; the scatter in the final calibrated data is larger than the scatter across the cubes of images. Assigning error bars based on the random variation in the observables would underestimate the error bars. We thus use the distributions of calibrated closure phases and squared visibilities to estimate the errors on the data.
First, to remove any mean signal, we calculate the mean squared visibility and closure phase for each baseline and triangle, respectively, by averaging the two pointings. We then divide this out of the squared visibilities and subtract it from the closure phases. We fit separate Gaussian functions to the resulting distributions of all squared visibilities and all closure phases separately, and assign each best-fit standard deviation as the error bar for all observables of its type (0.05 for squared visibilities, and 3.0 • for the closure phases). Figure 2 shows the final calibrated observations with their assigned error bars.
This simple approach may underestimate the error bars for the inter-aperture baselines and overestimate them for the intra-aperture baselines. However, given the small number of observables we do not attempt to fit a more complex error model to the data. We do, however, test the effects of both increasing and decreasing the errors as a function of baseline length, and find that this does not affect the results significantly for linear scalings with baseline length. More aggressive error scalings, which do cause the reconstructed images to change, are unrealistic since they significantly upweight very small numbers of data points.

B.1. Model Fitting
We fit geometric models to the data to constrain MWC 297's morphology and to inform our image reconstruction tests. We use the results of previous long-baseline interferometric studies to inform our modeling assumptions. IOTA observations utilizing 21-m and 38-m baselines at 1.65 − 2.2 µm favored a Gaussian brightness distribution over a ring morphology for this object (Millan-Gabet et al. 2001). More recent VLTI datasets either showed no evidence for an inner clearing, or had a best-fit inner disk radius that would not be resolved by our observations (r in ∼ 1 − 2 mas; Acke et al. 2008;Weigelt et al. 2011;Lazareff et al. 2017). We thus explore simple δ function + Gaussian disk models (as opposed to δ + ring models).
We restrict the disk full-width at half-maximum (FWHM) to be less than ∼ 70 mas, slightly larger than the bestfit Gaussian size from 10.7 µm Fizeau observations (Monnier et al. 2009) and larger than the dominant Gaussian components in fits to long-baseline interferometry data (e.g. Acke et al. 2008). While the 10.7 µm observations suggested the presence of an extended halo (resolved by the shortest, ∼ 1.8m baselines in their array), it was only at the ∼ 2% level and thus we neglect it in the simple models presented here. We also force the aspect ratio to be > 0.5, since parametric fits to both the 10.7 µm observations and long-baseline data are relatively axisymmetric (r > 0.77; e.g. Malbet et al. 2007;Kluska et al. 2020).
We apply a model consisting of a central delta function containing fractional flux f * , and a skewed, Gaussian disk. The following equation defines the brightness distribution for the disk: and where (x, y) increase right and up in image space, respectively; θ is the position angle of the disk major axis, measured E of N; φ s is the peak skew position angle measured E of N; φ = arctan(y, x); A s is the skew amplitude, and r is the minor to major axis ratio. The full width at half maximum along the disk major axis is given by We explore δ + Gaussian disk models of increasing complexity by first fixing f * = 0,r = 1 and A s = 0, and then relaxing constraints on the unresolved flux, axes ratio and skew. We next allow for the presence of a companion-like feature in addition to the δ function and Gaussian disk, in the form of a second δ with a separation s c , position angle (measured E of N) θ c , and contrast c c relative to the central δ. We restrict the companion separation to less than ∼ 150 mas since recent SPHERE observations detected no companions down to that inner working angle (Ubeira-Gabellini et al. 2020).
Due to the large model parameter space, we use Markov-Chain Monte Carlo methods to explore possible models, rather than a grid search. We use the open-source package emcee (Foreman-Mackey et al. 2013) in parallel-tempering mode, with 100 walkers and 10 temperatures for each model type. This ensures that the space is well sampled even in the presence of local likelihood maxima. We take the 16% and 84% contours in the T=1 chain as the 1σ allowed range of model parameters. For each model, we also calculate reduced χ 2 to assess goodness of fit.
We use both likelihood ratios and Bayesian evidence ratios to estimate the likelihood that the model is preferred. The performance of two nested models can be compared by calculating the ratio of the likelihood values for the two models, which in this case is the same as the difference in best-fit χ 2 values (e.g. Neyman & Pearson 1933). This statistic is approximately χ 2 distributed with ∆ DOF degrees of freedom, where ∆ DOF is the difference in degrees of freedom between the two models. To calculate the likelihood ratio for any pair of models, we take the difference in log likelihood (the difference in χ 2 ) between the best fits for the two models. We calculate the significance level at which that model is preferred by comparing the ∆χ 2 to a distribution with ∆ DOF degrees of freedom. b Significance values for a particular model being preferred (based on ∆χ 2 ) compared to the model in the row above it.
Since log likelihood testing has a non-negative false positive probability (e.g. Jenkins & Peacock 2011), we also calculate Bayesian evidence values for each model (e.g Trotta 2008). Bayesian evidence is the marginalized likelihood over the parameter space of interest. For two different models, the odds of one model being preferred over another is proportional to the ratio of their evidence values. We use emcee to estimate the Bayesian evidence from the various temperature chains using thermodynamic integration (e.g. Goggans & Chi 2004). We compare evidence values for the different models by examining differences in log evidence (log Z). Table 3 lists the best-fit parameters for the geometric model fitting whose corresponding images are shown in Figure  9. Table 4 lists model selection metrics: minimum χ 2 values, reduced χ 2 values (for all obserables, for just closure phases, and for just squared visibilities), ∆χ 2 values and their corresponding significance, and log Z values. The ∆χ 2 values in Table 4 list the decrease in minimum χ 2 associated with that model, compared to the model in the row above it. The significance values show the significance with which one model is preferred compared to the simpler model above it, according to the likelihood ratio test.

B.2. Results
All of the models suggest the existence of a compact, unresolved component with a large fractional flux. The circular Gaussian model where f * = 0 prefers a FWHM of ∼ 13 mas, placing most of the flux in the central region that is unresolved by the observations. This model is significantly worse than the models where f * is allowed to vary, which can be seen in the relatively high reduced χ 2 and low log Z values. This suggests the combination of an unresolved component like that seen in near-infrared long-baseline observations (e.g. Weigelt et al. 2011;Lazareff et al. 2017), and an extended component similar to that seen at longer infrared wavelengths (e.g. Acke et al. 2008;Monnier et al. 2009).
Indeed, all of the models with a central delta function point to an additional extended structure; their reduced χ 2 and log Z values are much improved compared to the Circ. Gauss The extended component must be relatively centro-symmetric to match the observations. In particular, the squared visibilities would not show such consistent behavior with parallactic angle if the brightness distribution were highly asymmetric. However, allowing for a non-circular Gaussian disk component leads to a large increase in log Z and a modest improvement in χ 2 . While the ∆χ 2 value only compares the best fits for two models, log Z compares the quality of the fit over the entire parameter space for two models. This suggests that the two model sets have a similar quality best fit, but that a larger portion of the δ + Non-circ. Gauss. parameter space can match the observations. The non-zero closure phases prefer models that allow for the extended emission to be skewed, with a ∼ 25% asymmetry along φ s . This is evidenced by both the large ∆χ 2 and the large increase in log Z between the first two models, whose disks do not have skew, and the last two models, whose disks allow for skew. Furthermore, the δ + Skew Gauss. + Comp. model is strongly preferred by the data. It has a ∆χ 2 of ∼ 92 (corresponding a > 5σ significance with ∆ DOF = 3), and a large increase in log Z compared to the δ + Skew Gauss. model. The best-fit contrast for this companion relative to the central delta function component is 2.0 ± 0.3% or 4.25 ± 0.15 mag.
To test whether this companion signal could be caused by noise, we perform companion fits to Gaussian noise realizations. We draw random observables from distributions of squared visibilities and closure phases with no mean signal and with the same level of scatter in the data (σ V 2 = 0.05; σ CP = 3.0 • ). We calculate the false positive probability as the fraction of best fits that have the the same separation as the companion but lower contrast. Figure  10 shows the results; the companion contrast is 0.55 mag brighter than the brightest best fit to noise. Because of the limited number of simulations, we can constrain the false positive rate to < 0.08%. We test whether noise plus a circumstellar disk model can reproduce the observed companion signal in Appendix C.3.
All of the best-fit models have large reduced χ 2 values (1.85-2.05), suggesting that the data are under-fit, with a worse fit to the closure phases than to the squared visibilities. MWC 297's underlying brightness distribution is likely more complex and asymmetric than a simple disk or even a disk plus companion. The models all under-fit the closure phases in particular, with χ 2 r,CP ∼ 2 for all model types. This points to more complex asymmetry than a disk plus companion model can represent. We calculate the false positive probability as the fraction of best fits that have the the same separation as the companion but lower contrast. Inset: The purple histogram shows the cumulative distribution of contrasts from fits to noise where the best fit had a separation equal to the companion candidate's. The solid red line shows the best fit contrast for the companion candidate.
C. ADDITIONAL IMAGE RECONSTRUCTION TESTS

C.1. Choice of Prior
To test the image fidelity, we reconstructed images with five additional priors other than the two-Gaussian prior whose results shown in Figure 3: (1) a simple delta function, and (2-5) the best geometric fits from the models in Appendix B that allow for a central δ function component. Figure 11 presents the results. In the δ image ( Figure 11, first column), the prior is aggressive enough to cause a flux deficit around the central component in the reconstruction. However, the central, single pixel contains the same fractional flux as the central, beam-sized region in the other reconstructions. Their less aggressive priors allow for flux close to the unresolved component. Comparing the last two columns of Figure 11 illustrates this as well. Introducing a delta function in the prior at the location of the companion feature leads it to become more compact in the reconstruction.
To further test the influence of compactness in the prior, we convolve the 2 × δ + Skew Gauss model with a Gaussian before reconstruction. This leads to a prior image where the companion model feature and the central star both have > 1 pixel extents. Figure 12 shows the results. Indeed, as the two δ functions become more smeared out in the prior, BSMEM concentrates the flux less densely at that location in the resulting image. However, the total fractional flux in the central component and companion features does not change.

C.2. Simulated Reconstructions of Geometric Models
To further explore the impact of prior images, and to check whether simple geometric models could cause the structure in the reconstructions, we simulate image reconstructions for the models shown in Section B.2. We sample each model with the same Fourier coverage and sky rotation as the observations. We then add Gaussian noise to make the distributions of simulated observables match the data. We reconstruct each model using each of the priors applied in Figure 11. Figure 13 shows the results of these tests. The δ function prior imposes a cleared region around the central, bright pixel for all models, while the less-aggressive priors do not. The prior image can introduce a small amount of skew when noise is present in observations of centrosymmetric models; the increased skew in the right end of the δ + Circ.  Figure 11. BSMEM reconstructed images for priors informed by geometric modeling. From top to bottom, each column shows for a single reconstruction, the prior image used, the resulting reconstructed image, the model closure phases (purple points) plotted against the data (grey points with error bars), and the model sqared visibilities (purple points) plotted against the data (grey points with error bars). All of the priors are best fits from models presented in Tables 3 and 4, with the exception of the first column, which is just a simple δ function. Each reconstructed image panel shows the reduced χ 2 returned by BSMEM, which is defined as the χ 2 of the reconstructed observables divided by the number of data points.
Gauss. row shows this. However, a skewed prior does not introduce as much skew for centrosymmetric objects as it does for truly asymmetric ones. Comparing the images in the second column from the right shows this.
Including delta functions in a prior image always increases the flux in a single pixel at the same location in the reconstruction. However, when no flux is present at that location in the true brightness distribution, the bright pixel is not significant compared to the noise in the rest of the image. When the source does have significant emission at the prior delta function location, the single bright pixel in the reconstruction is significant. This can be seen in the rightmost column of Figure 13. All four images contain one pixel with a brightness enhancement due to the prior, but it is only visible for the reconstruction of the disk plus companion model (fractional flux of ∼ 0.019). The fractional fluxes in the same pixel for the other three images are ∼ 0.00011 − 0.00018, with the largest fractional flux in the reconstruction of the δ + Skew Gauss. model. In this case, BSMEM puts some of the asymmetric disk flux into a single bright pixel at the location of the δ function in the prior image. However, the fractional flux in this pixel is still lower than the one in the 2 × δ + Skew Gauss. reconstruction by a factor of ∼ 100.

C.3. Simulated Reconstructions of Radiative Transfer Disk Models
The simulations shown in Figure 13 show that simple geometric disk models cannot reproduce the data. Here we explore whether radiative transfer disk models, allowing for the presence of a disk rim, can match the observations without being inconsistent with previous VLTI datasets. We fit a coarse grid of radiative transfer models using the open-source software RADMC-3D (Dullemond 2012) and pdspy (Sheehan 2018). We then reconstruct images of the best-fit disk model by simulating observations with the same Fourier coverage as the data, and adding noise so that the simulated scatter matches the scatter in the data.
We explore two types of disk models: (1) a star + disk scenario with an inner radius allowed to vary, and (2) a star + inner disk from ∼ 0.2 − 2 au + outer disk with an inner radius allowed to vary. In all cases, we artificially increase the fractional flux of the star and/or inner disk, to account for the large compact fractional flux seen in the LBTI data and previous VLTI observations. The best fit models from each of these categories have similar reduced χ 2 values (∼ 2), but only models from the second category are consistent with the VLTI closure phases from Kluska et al. (2020).
We thus explore whether the best-fit gapped disk model could reproduce the observed LBTI reconstructed images. Figure 14 shows the best-fit disk model, its model observables plotted against the data, the simulated image reconstruction from those observables. The reduced χ 2 for this model is χ 2 r = 1.93. This model has an outer disk inner radius of 8 au and an inclination of 40 • , and provides a relatively good match to the squared visibilities (χ 2 r,V 2 = 1.28), but not to the closure phases (χ 2 r,CP = 2.14). This suggests that the circumstellar disk has more complex structure than a simple gapped model. This is supported by the simulated image reconstruction, which does have a central depression, but does not show the same complex structure as the observed reconstructed image.  Figure 13. Reconstructed images from simulated observations of the best-fit disk models presented in Appendix B.2. We sampled the disk models with the same Fourier coverage and sky rotation as the data, and added enough noise to the simulated observables so that the distributions of closure phases and squared visibilities matched the observations. We then reconstructed images of each input model (rows) with all five priors used on the observations (columns).
Injecting a 2% companion at the location of the candidate in the data does not change this -it only introduces flux at the companion candidate location in the reconstruction. A gapped disk plus companion model also still under-fits the data, with reduced χ 2 values of: χ 2 r = 1.90; χ 2 r,CP = 2.12; χ 2 r,V 2 = 1.21. Like the geometric models, the particularly high χ 2 r,CP suggests that the true circumstellar structure is more complex than a gapped disk plus companion. Generating simulated reconstructions under conservative noise assumptions shows that a gapped disk model plus noise does not reliably reproduce the elongated structure in the observed reconstructed image. We simulated a large number of observations of the gapped disk model, adding enough noise to match the distribution of squared visibilities and closure phases. We note that assuming a best-fit model with χ 2 r,CP ∼ 2 means that we must add a large amount of noise compared to the gapped disk signal (see Figure 14) to match the distribution of closure phases. For these simulated noise realizations, we estimate that less than ∼ 10% produced images with an asymmetric butterfly pattern similar to the observed reconstructed image. Reproducing the observations with this gapped disk scenario thus seems unlikely since it requires such a pathological noise realization.
We use these simulations to test whether a disk model plus noise could reproduce the companion signal in the observed reconstructed image. We reconstructed a large number of images from the best fit gapped disk model, with enough Gaussian noise added to match the scatter in the observed squared visibilities and closure phases. We then measured the fractional flux at the position of the companion candidate for each noise realization. All of the observed fractional fluxes were lower than the ∼ 2% level measured in the companion candidate, demonstrating that a gapped disk model cannot reproduce the companion signal. While this does not rule out extended circumstellar structure as the source of the companion candidate, it shows that reproducing the signal with circumstellar material requires a more complex morphology than a simple disk.

C.4. L-curve Reconstructed Images
In addition to BSMEM's automated hyperparameter optimization, for each prior image we also use the "L-curve" method to explore entropy hyperparameters (e.g. Hansen 1992;Thiébaut & Young 2017). This involves plotting the image regularizer function (entropy) versus χ 2 , which has an L-shape. The vertical portion of the L-curve, where χ 2 values do not change with regularization, is dominated by the likelihood function and is under-regularized. In contrast, the horizontal section of the L-curve, with rapidly-changing χ 2 values, is dominated by the regularization and is thus over-regularized. We use the elbow in the L-curve, which balances the influence of the likelihood and regularizer, as the final reconstruction parameters. For all prior images and using the error bars presented in Appendix A.6, the elbow corresponds to an entropy hyperparameter of α ∼ 500 ( Figure 15).  Figure 16 shows the reconstructed images for α = 500 using the error bars estimated in Appendix A.6, (which correspond roughly to reconstructed image reduced χ 2 values of 1, where reduced χ 2 is defined by BSMEM as the χ 2 of the reconstructed observables divided by the number of data points). The reconstructed image observables  Figure 16. BSMEM reconstructed images using L-curve α optimization and priors informed by geometric modeling. From top to bottom, each column shows for a single reconstruction, the prior image used, the resulting reconstructed image, the model closure phases (purple points) plotted against the data (grey points with error bars), and the model squared visibilities (purple points) plotted against the data (grey points with error bars). All of the priors are best fits from models presented in Tables  3 and 4, with the exception of the first column, which is just a simple δ function. Each reconstructed image panel shows the reduced χ 2 returned by BSMEM, which is defined as the χ 2 of the reconstructed observables divided by the number of data points.
(purple scattered points in the lower two rows) have significantly lower scatter than both the observations and the reconstructed observables for BSMEM's automated hyperparameter optimization (Figure 4). This is because the automated optimization chooses hyperparameter values of ∼ 10 − 15, which lie in the under-regularized region of the L-curve. However, the structure in the innermost ∼ 250 mas in these images is not significantly different from that shown in Figure 3.

C.5. Image Dependence on α and σ
We test the effects of changing the entropy hyperparameter and the estimated error bars on the reconstructions (Figure 17). For smaller hyperparameters (where the reconstruction is more dominated by the likelihood function than the regularizer), the reconstructed image better matches the scatter in the data. BSMEM accomplishes this by adding low levels of high-frequency signal throughout the field of view, which can create large closure phase signals. Conversely, very large hyperparameters reduce the information in the reconstructed image to the point that it no longer qualitatively matches the observations. Varying the entropy hyperparameter close to the elbow of the L-curve does not qualitatively change the reconstructed images.
Incorrect error bars change the location of the L-curve elbow and affect the quality of the reconstruction. Underestimated error bars move the L-curve elbow toward higher regularization hyperparameters and larger reduced χ 2 values. We demonstrate this in Figure 18, which shows L-curves for the nominal error bars, and error bars that are a factor  Figure 17. BSMEM reconstructed images using the δ + Skew Gauss prior with different entropy hyperparameters. The top row shows the reconstructed image, the middle row shows closure phases (purple points) plotted against the data (grey points with error bars), and the bottom row shows model squared visibilities (purple points) plotted against the data (grey points with error bars). Each reconstructed image panel shows the reduced χ 2 returned by BSMEM, which is defined as the χ 2 of the reconstructed observables divided by the number of data points. of two and four lower. For these reconstructions, BSMEM better matches the observations again by adding low levels of high-frequency signal to the image, without a significant qualitative change ( Figure 19).  Figure 19. BSMEM reconstructed images using a δ + Skew Gauss prior and the L-curve method for the error scalings shown in Figure 18. The top row shows the reconstructed image, the middle row shows closure phases (purple points) plotted against the data (grey points with error bars), and the bottom row shows model squared visibilities (purple points) plotted against the data (grey points with error bars). Each reconstructed image panel shows the reduced χ 2 returned by BSMEM, which is defined as the χ 2 of the reconstructed observables divided by the number of data points.