Type Ia Supernovae are Excellent Standard Candles in the Near-Infrared

We analyze a set of 89 Type Ia supernovae (SN Ia) that have both optical and near-infrared (NIR) photometry to derive distances and construct low redshift ($z<0.04$) Hubble diagrams. We construct mean light curve (LC) templates using a hierarchical Bayesian model. We explore both Gaussian process (GP) and template methods for fitting the LCs and estimating distances, while including peculiar velocity and photometric uncertainties. For the 56 SN Ia with both optical and NIR observations near maximum light, the GP method yields a NIR-only Hubble-diagram with a RMS of $0.117 \pm 0.014$ mag when referenced to the NIR maxima. For each NIR band, a comparable GP method RMS is obtained when referencing to NIR-max or B-max. Using NIR LC templates referenced to B-max yields a larger RMS value of $0.138 \pm 0.014$ mag. Fitting the corresponding optical data using standard LC fitters that use LC shape and color corrections yields larger RMS values of $0.179 \pm 0.018$ mag with SALT2 and $0.174 \pm 0.021$ mag with SNooPy. Applying our GP method to subsets of SN Ia NIR LCs at NIR maximum light, even without corrections for LC shape, color, or host-galaxy dust reddening, provides smaller RMS in the inferred distances, at the $\sim 2.3 - 4.1\sigma$ level, than standard optical methods that do correct for those effects. Our ongoing RAISIN program on the Hubble Space Telescope will exploit this promising infrared approach to limit systematic errors when measuring the expansion history of the universe to constrain dark energy.


INTRODUCTION
The increasing sample of high quality, low-redshift (low-z), near-infrared (NIR) light curves (LCs) of Type Ia supernovae (SN Ia) provides an opportunity to further investigate their utility as cosmological standard candles. Optical samples of SN Ia are large enough now that systematic uncertainties are major limitation to ac-aavelino@cfa.harvard.edu asf@ucsd.edu curate cosmological constraints. Infrared observations of SN Ia can help in that essential way because supernovae are more nearly standard candles in the NIR and the e↵ects of dust are diminished. This paper explores ways to use NIR observations of SN Ia to measure distances. This investigation is for a low-z sample, but we are working to extend this technique to cosmologicallyinteresting distances with the Hubble Space Telescope (HST).
Before NIR photometry became practical for large samples of SN Ia, photometry and spectroscopy of SN Ia at optical wavelengths enabled the unexpected 1998 dis-covery of cosmic acceleration (Riess et al. 1998;Schmidt et al. 1998;Perlmutter et al. 1999). Since then, a suite of independent cosmological methods has confirmed the SN Ia results (see Frieman et al. 2008;Weinberg et al. 2013 for reviews). The prevailing view is that the mechanism behind cosmic acceleration is some form of dark energy. The constraints on cosmological parameters from the SN Ia Pantheon sample ) combined with the Planck 2015/2018 Cosmic Microwave Background data (Planck Collaboration et al. 2016b, as well as Baryon Acoustic Oscillations (Alam et al. 2017) and local Hubble constant measurements (Riess et al. 2016(Riess et al. , 2018c are consistent with this view. Among the major cosmological techniques, SN Ia provide precise measurements of extragalactic distances and the most direct evidence for cosmic acceleration (see Goobar & Leibundgut 2011;Kirshner 2013;Goobar 2015;Davis & Parkinson 2016;Riess et al. 2018c for reviews).
Optical SN Ia LCs are known to be excellent standardizable candles that exploit correlations between intrinsic luminosity and LC shape and color (Phillips 1993;Phillips et al. 1999;Hamuy et al. 1996;Riess et al. 1996Riess et al. , 1998Perlmutter et al. 1997;Goldhaber et al. 2001;Tonry et al. 2003;Wang et al. 2003;Prieto et al. 2006;Jha et al. 2006Astier et al. 2006;Takanashi et al. 2008;Conley et al. 2008;Mandel et al. 2009;Guy et al. 2005Guy et al. , 2007Guy et al. , 2010Mandel et al. 2011Mandel et al. , 2017. Recent work has demonstrated that SN Ia in the NIR are more nearly standard candles, even before correction for LC shape or host galaxy dust reddening (e.g. Krisciunas et al. 2004a;Wood-Vasey et al. 2008;Mandel et al. 2009;Krisciunas et al. 2009;Friedman 2012;Kattner et al. 2012). NIR LCs are ⇠5-11 times less sensitive to dust extinction than optical B-band data (Cardelli et al. 1989). When constructing SN Ia Hubble diagrams using NIR data, the distance errors produced by extinction are small: ignoring dust would be fatal for optical studies, but nearly not as serious for NIR studies like Wood-Vasey et al. 2008 or the present work. An improved approach would use optical and infrared data simultaneously to determine the extinction .
Optical-only samples yield typical Hubble diagram intrinsic scatter of int ⇠ 0.12 mag and a RMS of 0.141 mag after applying light-curve shape, host-galaxy dust, and host-galaxy mass corrections, assuming a peculiarvelocity uncertainty of 250 km s 1 (e.g. Scolnic et al. 2018). For simplicity, we adopt a conservative peculiarvelocity uncertainty for the host galaxies in our sample of 150 km s 1 . If the typical redshifts in the sample were large enough, this would be of no consequence, but for our nearby sample, the inferred intrinsic scatter of the supernova luminosities depends on the value we choose. As a result, though we have confidence when comparing the RMS and intrinsic scatter for various subsamples containing the same SN with both optical and infrared data, the real value of the scatter should be determined from observations that are securely in the Hubble flow beyond 10,000 km s 1 . When including a peculiar-velocity uncertainty of 150 km s 1 , our best method yields intrinsic scatters as small as ⇠ 0.03-0.11 mag, depending on the NIR filter subset, and a RMS of ⇠ 0.087 ± 0.013 mag for the best NIR Y JH-band subset, confirming and strengthening previous results for NIR methods (Meikle 2000;Krisciunas et al. 2004aKrisciunas et al. , 2005Krisciunas et al. , 2007Folatelli et al. 2010;Burns et al. 2011;Wood-Vasey et al. 2008;Mandel et al. 2009Mandel et al. , 2011Kattner et al. 2012;Dhawan et al. 2015). Assuming a larger peculiar-velocity uncertainty, such as 250 km s 1 , makes our estimated intrinsic scatter even smaller. In addition, our best NIR method using any of the Y JHK s bands yields an RMS of only 0.117 ± 0.014 mag, compared to 0.179 ± 0.018 mag and 0.174 ± 0.021 mag for SALT2 and SNooPy fits to optical BV R data for the same 56 SN Ia, respectively. While using LC shape, color, and host galaxy dust corrections would likely lead to improvements, the simpler approaches in this paper are still remarkably e↵ective.
Overall, a substantial body of evidence indicates that rest-frame LCs of SN Ia in NIR are both better standard candles than at optical wavelengths and less sensitive to the confounding e↵ects of dust. When NIR data are combined with UBV RI photometry, this yields accurate and precise distance estimates (Krisciunas et al. 2004b(Krisciunas et al. , 2007Wood-Vasey et al. 2008;Folatelli et al. 2010;Burns et al. 2011;Friedman 2012;Phillips 2012;Kattner et al. 2012;Burns et al. 2014;Mandel et al. 2009Mandel et al. , 2011Mandel et al. , 2014Mandel et al. , 2017. This is significant for supernova cosmology because, along with photometric-calibration uncertainties (Scolnic et al. 2015;Foley et al. 2018), uncertain dust extinction estimates and the intrinsic variability of SN Ia colors present challenging and important systematic problems for dark energy measurements (Wang et al. 2006;Wood-Vasey et al. 2007;Hicken et al. 2009a;Guy et al. 2007Guy et al. , 2010Conley et al. 2007Conley et al. , 2011Komatsu et al. 2011;Campbell et al. 2013;Rest et al. 2013;Scolnic et al. 2013;Narayan 2013;Betoule et al. 2014;Rest et al. 2014;Mosher et al. 2014;Scolnic et al. 2014bScolnic et al. ,a, 2015Narayan et al. 2016;Scolnic et al. 2017;Mandel et al. 2017;Foley et al. 2018;Scolnic et al. 2018;Brout et al. 2018a;Kessler et al. 2018). Combining optical and NIR LCs promises to re-duce these systematic distance uncertainties (Folatelli et al. 2010;Burns et al. 2011;Kattner et al. 2012;Mandel et al. 2011Mandel et al. , 2014. This work is organized as follows. In §2, we review previous results with SN Ia in NIR, detail our analysis selection criteria, and discuss host galaxy redshifts. In §3, we outline our Gaussian process (GP) procedure to fit LCs and our hierarchical Bayesian model to construct mean Y JHK s LC templates. In §4, we use these templates and GP fits to individual LCs to construct Hubble diagrams in each NIR band, as well as a combined Y JHK s NIR Hubble diagram. We compare this to optical BV R Hubble diagrams for the very same set of 56 supernovae that use the SALT2 and SNooPy LC fitters. We end with §5 by documenting how, even without correcting for LC shape or dust, SN Ia in the NIR using our GP fits at NIR maximum are better standard candles than optical SN Ia observations corrected for these e↵ects. Mathematical details of the Gaussian process, the hierarchical Bayesian model, and the method for determining the intrinsic scatter are presented in the Appendices.

SN Ia IN NIR AS STANDARD CANDLES
Pioneering studies by Meikle (2000) and Krisciunas et al. (2004a) demonstrated that SN Ia have smaller luminosity variation in the NIR JHK s bands than in the optical BV bands at the time of B-band maximum light (t Bmax ). Krisciunas et al. (2004a) found that optical LC shape and intrinsic NIR luminosity were uncorrelated in a sample of 16 SN Ia, while measuring a NIR absolute magnitude scatter of J = 0.14, H = 0.18, and Ks = 0.12 mag. Following this, Wood-Vasey et al. (2008) used a homogeneously-observed sample of 18 spectroscopically-normal SN Ia in the JHK s bands, with intrinsic root-mean-square (RMS) absolute magnitudes of 0.15 mag in the H-band, without applying any reddening or LC shape corrections. By combining these 18 objects with 23 SN Ia from the literature, the sample in Wood-Vasey et al. (2008) yielded an H-band RMS of 0.16 mag, strengthening the evidence that normal SN Ia are excellent NIR standard candles. In the present work, we show that SN Ia in NIR yield a narrow distribution of Y JHK s peak magnitudes with RMS Hubble Diagram scatter as small as 0.087 ± 0.013 mag for the combined Y JH bands and as large as 0.179 ± 0.029 mag for the K s band, consistent with previous results.
Following Wood-Vasey et al. 2008, Mandel et al. 2009 developed a new hierarchical Bayesian model (BayeSN) and a template model to account for J-band LC shape variation to the existing SN Ia in NIR sample, finding a marginal scatter in the peak absolute magnitudes of 0.17, 0.11, and 0.19 mag, in JHK s , respectively, while finding that J-band LC shape does correlate with NIR intrinsic luminosity. Subsequent work by Folatelli et al. 2010 applied a di↵erent LC shape correction method, but found scatters of 0.12-0.16 mag in Y JHK s , consistent with the results of Mandel et al. (2009).
Additional work by Kattner et al. (2012) found an absolute magnitude scatter of 0.12, 0.12, and 0.09 mag in Y JH, respectively, by analyzing a subset of 13 wellsampled normal NIR SN Ia LCs with relatively little host galaxy dust extinction. Kattner et al. 2012 also showed evidence for a correlation between the JH-band absolute magnitudes at t Bmax and, m 15 (B), the lightcurve decline rate parameter in B-band after 15 days of t Bmax (Phillips 1993), with no evidence for strong correlation in the Y -band. This is also consistent with the results of Mandel et al. 2009, who found that J-band LC shape and luminosity are correlated.
Using a small data set of 12 SN Ia JH-band LCs, each with only 3-5 data points, Barone-Nugent et al. 2012 find a scatter of 0.116 mag and 0.085 mag in the J and H-bands, respectively. In the first data release of the SweetSpot survey, Weyant et al. 2014 present a similarly small sample of 13 low-z SN Ia, each with 1-3 LC points, finding an H-band scatter of 0.164 mag. This was followed by a second SweetSpot data release, which included a total of 33 SN Ia with 168 JHK s observations in the redshift range 0.02 < z < 0.09, well into the smooth Hubble flow, but which did not yet include NIR Hubble diagrams (Weyant et al. 2018).
By analyzing 45 NIR LCs with data near NIRmaximum, Stanishev et al. 2018 find an intrinsic Hubble diagram scatter of ⇠ 0.10 mag, after accounting for potential new correlations between light curve shape, color excess, and J H color at NIR-max. Stanishev et al. 2018 also present single-epoch JH photometry for 16 new SN Ia with z > 0.037. The Carnegie Supernova Project (CSP) final data release (CSP-I; Krisciunas et al. 2017), was recently analyzed in Burns et al. (2018), which found peculiar velocity corrected Hubble diagram dispersions of ⇠ 0.08 0.15 mag, depending on the subset of the 120 SN Ia they considered. Additional CSP-II photometric data, to be published in 2019, was recently described in Phillips et al. 2019. Hsiao et al. (2019 present an overview of the NIR SN Ia spectroscopy obtained by the CSP and the Center for Astrophysics (CfA) Supernova Group.
While the current sample of optical SN Ia LCs exceeds 1000 , and will be increased by orders of magnitude by ongoing and future surveys including the Dark Energy Survey (DES; DES Collaboration et al. 2018a,b;Brout et al. 2018b;D'Andrea et al. 2018), the Zwicky Transient Facility (ZTF;Smith et al. 2014), and the Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008;Zhan & Tyson 2017), the number of normal SN Ia with published NIR LCs is still less than 250. Nevertheless, the NIR sample has the potential to improve systematics compared to optical-only SN Ia cosmology samples, which are already systematics limited .
Overall, the growing sample of photometric data suggests that NIR observations of SN Ia present a promising path to standardize SN Ia for distance estimates (Dhawan et al. 2015;Shari↵ et al. 2016;Burns et al. 2018;Stanishev et al. 2018), Hubble constant estimates (Cartier et al. 2014;Efstathiou 2014;Riess et al. 2016;Cardona et al. 2017;Dhawan et al. 2018;Burns et al. 2018), and eventually, cosmological parameter estimates, when the nearby and high-z samples are combined as in the HST RAISIN program (RAISIN: Tracers of cosmic expansion with SN IA in the IR, PI. R. Kirshner, HST GO-13046, GO-14216).

Nearby SN Ia in NIR Sample and Data Cuts
This work analyzes a suitable subset including 89 objects from the current sample of low-redshift photometric data for SN Ia NIR Y JHK s -band LCs including data releases 1 and 2 from the Carnegie Supernova Project (Schweizer et al. 2008;Contreras et al. 2010;Stritzinger et al. 2010Stritzinger et al. , 2011Taddia et al. 2012), now superseded by CSP data release 3 (Krisciunas et al. 2017), the CfA (Wood-Vasey et al. 2008;Friedman 2012;Friedman et al. 2015), and other groups (e.g. Krisciunas et al. 2000Krisciunas et al. , 2004bKrisciunas et al. ,c, 2005Krisciunas et al. , 2007. We limit our analysis to spectroscopically normal SN Ia from Table 3 of Friedman et al. 2015, plus the definitive version of the CSP-I DR3 sample of low-z SN Ia (Krisciunas et al. 2017), and other groups. Additional CSP-II photometric data, to be published in 2019, was recently described in Phillips et al. 2019 and will be analyzed in future work. We apply the following data cuts to analyze a subset of 89 SN Ia with NIR data. Table 1 shows how the initial sample of 177 SN Ia decreases after applying the di↵erent cuts, and Table 2 lists the general properties of the remaining 89 SN Ia. We determine m 15 (B) and E(B V ) host with SNooPy.
• Optical light curve shape parameter 0.8 < m 15 (B) < 1.6, to consider normal SN Ia only (Hicken et al. 2009b). Objects must have accompanying B-band optical data to measure m 15 (B).
• Host galaxy reddening: 0.15 < E(B V ) host < 0.4. This cut is inspired by the standard SALT2 cut in color, 0.3 < c < 0.3, in optical-only analysis (Betoule et al. 2014;Scolnic et al. 2018) but with a less stringent cut considering that SN Ia in the NIR are less sensitive to dust.
• One advantage of the relative NIR insensitivity to dust reddening is that it also allows us to set a large threshold for Milky Way color excess: E(B V ) MW < 1 mag, to exclude highly reddened SN Ia. All 177 SN Ia in the sample passed this cut.
SN2006lf with E(B V ) MW = 0.8135 mag has the largest color excess in the initial sample.
• Redshift range: z < 0.04. The maximum redshift cut limits the e↵ects of Malmquist bias. Section 2.2 describes corrections to deal with SN Ia at z < 0.01, that su↵er from peculiar velocity bias.
• Duplicates: For a given supernova observed by multiple surveys, we use the CSP data (Krisciunas et al. 2017), which typically has smaller photometric uncertainties than the CfA PAIRITEL data (Friedman et al. 2015).
• We include only spectroscopically normal SN Ia as identified by the Supernova Identification Code (SNID) Blondin & Tonry (2007).
• At least 3 photometric points in a given band for each SN Ia LC. A large fraction of the NIR data from Barone-Nugent et al. (2012), Stanishev et al. (2018, and the SweetSPOT survey with WIYN (Weyant et al. 2014(Weyant et al. , 2018 did not meet this criterion, so we chose not to analyze these data in this work.

Host Galaxy Redshifts
Heliocentric galaxy recession velocities and CMB frame redshifts are shown in Tables 2 and 3. We obtained heliocentric host galaxy recession velocities using Notea Heliocentric redshift from NED or the literature using v helio from Table 3. b Redshift corrected to the CMB frame and using the C15 local flow model or redshift-independent distance information from Table 4. c Uncertainty in the theoretical distance modulus because of the peculiar velocity, defined in Eq. (8). d LC-data source. CfA: Wood-Vasey et al. 2008;Friedman et al. 2015, CSP: Krisciunas et al. 2017, Others: K04c: Krisciunas et al. 2004cV03: Valentini et al. 2003;K03: Krisciunas et al. 2003;P08: Pignata et al. 2008;St07: Stanishev et al. 2007;L09: Leloudas et al. 2009;K07: Krisciunas et al. 2007. Also see Table 3 of Friedman et al. 2015 for references. e Determined by fitting the optical and NIR LCs data with SNooPy. f LC shape parameter: apparent-magnitude decline between B-band peak luminosity and 15 days after peak. g Host-galaxy color excess, as measured by SNooPy fits to the optical and NIR LCs.  2). g The first reference listed is for v helio from NED or the literature. The second reference is for the e↵ective vCMB derived using either the C15 local flow model or independent distance information for nearby galaxies with v helio . 3000 km/s, special cases where host galaxy identification from NED is ambiguous, or some clusters which may have v > 3000km/s (see §2.2 and Table 4 NGC 3368 Riess et al. 2016, since H 0 0 = H0 and µ 0 = µ e↵ , e↵ective distance moduli µ e↵ are already on the Hubble scale used for this paper. We compute the e↵ective CMB frame recession velocity v e↵ in Table 3 via Eqs. 2-3 using µ 0 and H 0 0 (or equivalently µ e↵ and H0). This is then used to construct an e↵ective redshift or recession velocity for use in Hubble diagrams.  Table 3.
the NASA/IPAC Extragalactic Database (NED), using measurements with the smallest reported uncertainty. 1 If the host galaxy was anonymous or had no reported NED redshift, we used redshifts reported in the literature. When no uncertainties are available, we assume a recession velocity uncertainty of 50 km/s.
To further correct the CMB frame redshifts for local velocity flows and to estimate uncertainties, we used the model of Carrick et al. 2015. 2 Such corrections are most important for SN Ia with z < 0.01 (v < 3000 km/s), but we also use them for SN Ia further into the Hubble flow.
In special cases, we did not use the Carrick et al. 2015 flow model and instead used independent information for individual objects. For several SN Ia that have v helio > 3000 km/s, but are members of known galaxy clusters, to avoid large peculiar velocities from the cluster velocity dispersion, following Dhawan et al. 2018, we used the mean recession velocity of the cluster based on the cluster redshift from NED to estimate the CMB frame recession velocity for the SN Ia host galaxy. For SN 2008hs in Abell 347, we used v CMB = 5655 ± 13 km/s. For SN 2010ai in the Coma cluster, we used v CMB = 7166 ± 54 km/s (Pimbblet et al. 2014).
To further avoid peculiar velocity systematics for SN Ia with v helio < 3000 km/s, where available, we also used redshift-independent distance information from Cepheid variable stars, surface brightness fluctuations (SBF), or the Tully-Fisher method (TF) to estimate an e↵ective CMB frame redshift (see Tables 3-4 for  references).
Of the 19 SN Ia with Cepheid distances µ Ceph and uncertainties µ Ceph in the HST SHOES program (Table 5 of Riess et al. 2016), 7 with NIR data are included in our Table 3 ( SN 2001el, SN 2003du, SN 2005cf, SN 2007af, SN 2007sr, and SN 2011by). One other SN Ia (SN 1998bu) also has Cepheid distance from the HST Key Project (Table 4 of Freedman et al. 2001). 3 Lastly, 2 more SN Ia with NIR data (SN 2002dj, SN 2003hv) had redshift-independent host galaxy distance information from TF and/or SBF (Courtois & Tully 2012;Tully et al. 2013Tully et al. , 2016. of ⌦ M = 0.3 (implicitly ⌦ ⇤ = 0.7 for a flat universe), H 0 = 73.24 km s 1 Mpc 1 (Riess et al. 2016), = 0.43, and bulk flows of (Vx, Vy, Vz) = (89, 131, 17) km/s (Carrick et al. 2015). 3 We use the metallicity corrected values µ Z and Z from Table 4 of Freedman et al. 2001. For all of these methods, we convert the reported distance modulus µ 0 on a given Hubble scale H 0 0 to the Hubble scale of H 0 = 73.24 km s 1 Mpc 1 as measured by Riess et al. 2016 and use this value of H 0 throughout the rest of the paper. More specifically, for Hubble constants in units of km s 1 Mpc 1 , the distance modulus µ e↵ on our fiducial Hubble scale H 0 is given by See Table 4. For these objects, we convert the redshift independent distance modulus µ e↵ to an e↵ective CMB frame recession velocity with Hubble's law in the linear regime: with an uncertainty given by 4 For SN Ia with Cepheid distances, we assume that the only contribution to the recession velocity uncertainty comes from Eq. 3 and therefore adopt a peculiar velocity uncertainty of 0 km/s for these objects. For objects without Cepheid or other redshiftindependent distances, we assume a peculiar velocity uncertainty of pec = 150 km/s, following Radburn-Smith et al. 2004. 5 As shown in Section 5, the value of pec = 150 km/s yields a more conservative determination of the Hubble diagram intrinsic scatter compared with larger values of pec that tend to produce a misleadingly small value. However, statistics like the RMS, which we also use to compare various methods, are relatively insensitive to the assumed value of pec .

NIR LC TEMPLATES
We determine the normalized mean Y JHK s LC templates, as shown in Figure 1 and Table 5, using the SN Ia in Table 2 as follows. In each band, we convert the photometry from the observer-frame apparent magnitude to the rest-frame absolute magnitude. We further apply K-corrections to the rest-frame and correct for Milky Way dust extinction. These steps are detailed in §3.1. We then use a Gaussian process method, as described in §3.2, to fit the LC in each NIR band. Finally, in §3.3, 4 We do not propagate the uncertainty on H 0 in Eq. 2 because we have fixed the Hubble scale for this work.   Note-Mean NIR LC templates in the Y JHKs-bands using the hierarchical Bayesian model and Gaussian process method described in §3. These are referenced to the time of B-band maximum light, such that t ⇤ = 0 at tBmax. See Fig. 1.

Figure 1.
Upper and lower panels show the normalized mean Y JHKs templates and residual plots, respectively. By construction, we normalize the templates so that they have magnitude zero at t ⇤ = 0, with reference to the time of B-band maximum light. The numerical values of these templates are tabulated with 1-day sampling in Table 5. The black curves show the normalized mean magnitude ✓(t ⇤ ) vs. rest-frame phase t ⇤ , while the green and blue bands correspond to the population standard deviation, ✓ (t ⇤ ), and the uncertainty in ✓, respectively, determined using the hierarchical Bayesian model and Gaussian process method described in §3. We use 28, 67, 68 and 25 SN Ia that we can determine µ L as described in §3.2 to build the Y, J, H and Ks templates respectively.
using a hierarchical Bayesian model we average all the LCs in a given NIR band to determine the normalized mean LC template, its uncertainty, and the population standard deviation.

Rest-Frame Absolute Magnitudes
For a given supernova s observed through filter O, we convert the apparent magnitude m s datum observed at the modified Julian day (MJD) t MJD to the absolute magnitude M s at rest-frame phase t, via where z s is the spectroscopic redshift of the supernova s with respect to the CMB, including any local flow models (see Table 2). The phase t ⌘ (t MJD t Bmax,s )/(1 + z helio,s ) is the rest-frame observation time, corrected for cosmological time dilation, z helio,s is the heliocentric redshift, and t Bmax is the time of B-band maximum light. The term K OQ is the K-correction from the observed band O to the rest-frame band Q, and A O is the Milky Way foreground extinction defined as We assume a V -band total-to-selective extinction ratio for the Milky Way of R V = 3.1. We determine t Bmax and compute the K-correction K s OQ terms using a module in the SNooPy LC package (Burns et al. 2011), which uses the normal SN Ia spectroscopic template of Hsiao et al. 2007 that is "mangled" to match the actual colors derived from the data.
The theoretical distance modulus is defined as We assume the luminosity distance d L (z) for a spatially flat ⇤CDM Universe, ignoring radiation, is approximately given by where E(z) = p ⌦ m (1 + z) 3 + ⌦ ⇤ and c is the speed of light. We assume fiducial values for the matter and energy density fractions of ⌦ m = 0.28 and ⌦ ⇤ = 0.72 and a Hubble constant of H 0 = 73.24 km s 1 Mpc 1 (Riess et al. 2016).
Every value of M s has an error variance where m is the measurement uncertainty of the apparent magnitude m s , µpec is the uncertainty in the distance modulus µ ⇤CDM (z s ) due to the peculiar velocity and redshift uncertainties, given as 2 µpec,s (z s ) = ✓ 5 z s ln(10) For the SN in Table 4 with independent distance estimates, we use those corresponding distance modulus uncertainties. The term A in Eq. (7) is the uncertainty in the Milky Way dust extinction A O computed as, A = R O EBV , where EBV is the uncertainty in the Milky Way color excess E(B V ) MW , and Kcorr is the K-correction uncertainty estimated using Monte Carlo simulations of the full optical and NIR dataset {m s } for a given SN. In this Monte Carlo approach, for each photometric datum at a given MJD time and band, m s (T MJD ), we simulate a realization of this datum by drawing a random value from a Gaussian distribution with mean and standard deviation equal to the measured values m s and m . For each simulated full op-tical+NIR dataset, we compute the K-corrections and then determine the mean and standard deviation of the distribution of the K-correction values for each photometric datum at a given MJD time and band. We use this standard deviation as an estimation of the uncertainty of the K-correction value for that datum.

LC Fitting: Gaussian process
The Gaussian process technique is a non-parametric Bayesian method that we use to fit the NIR LCs for each SN Ia in Table 2. A Gaussian process defines a prior over functions. Realizations from a GP, evaluated on a discrete set of times, are random vectors drawn from a joint multivariate Gaussian distribution, N , of dimension equal to the number of components in the vector. Given a dataset, the GP formalism allows us to coherently determine the posterior mean function that fits the dataset along with its posterior covariance. The GP methodology is especially helpful in accounting for missing data (in our cases, phases with no observations), and when the data are correlated as in the case of the SN Ia LCs. Rasmussen & Williams (2006) provide an introduction to GPs for machine learning.
The following description applies to a LC of a single supernova in a given NIR band. We model the absolute magnitude M at phase t as a noisy measurement of the latent (true) absolute magnitude M at that phase, given by M (t) = M(t) + ✏, where ✏ ⇠ N (0, 2 M ). In vector notation we express the collection of absolute magnitude data of a given LC as where n ⇤ is the number of times in the grid determined from a sequence of phases between t min,s and t max,s in steps of 0.5 days, where t min,s and t max,s are the minimum and maximum phases in t ⇤ . Thus the number of times in the regular grid is n ⇤ = (t max,s t min,s )/0.5. In Appendix A, we provide the mathematical details to determine µ post and ⌃ post .

Normalization of the GP light curves
Our goal with the GP fitting is to determine the shape of the LC to be used later in Section 3.3 to construct NIR templates to fit the data and estimate distance moduli. So once we determine the posterior mean and covariance of the latent absolute magnitude LC for a given supernova s using GP, we normalize the LCs to extract the information about their shape regardless of their absolute magnitudes. The normalized LC L(t) is the function, over phase, of the di↵erence in magnitudes relative to the peak phase, so that L(t Bmax ) = 0.
To estimate the distance moduli, we choose to use the phase of B-band maximum light, t Bmax , as the reference time to derive the distances. In Section 4.2, we also implement the estimation of distance moduli using the time of NIR-band maximum light instead of t Bmax as the reference time.
We define the vector L, corresponding to the normalized LC derived from the latent absolute magnitude LC M ⇤ , evaluated on the phase grid t ⇤ , as where M 0 is the latent absolute magnitude at t Bmax and 1 is a vector of dimension n ⇤ with all its elements equal to one. Since this is a linear transformation of M ⇤ into L, and M ⇤ is Gaussian, therefore L is also Gaussian and described completely by its mean µ L and covariance ⌃ L . See Appendix A.1 for details. In the next section we use (µ L , ⌃ L ) to construct the NIR light curve templates.

Hierarchical Bayesian Model for the Normalized Magnitudes
In this section, we describe how we construct NIR LC templates for the Y , J, H, and K s bands that correspond to the mean shape of SN Ia LCs in each of these bands. To do so, we combine the normalized LCs described by (µ L , ⌃ L ), from all the supernovae at a given phase t ⇤ using a hierarchical Bayesian model to determine the mean normalized magnitude. Then we repeat the procedure described below over all the phases in t ⇤ to construct the final NIR LC templates.
First, we assume the normalized magnitude at phase t ⇤ , µ L s , for the supernova s is drawn from a Gaussian distribution with true value ⌘ s and standard deviation ⌘,s , where the value of 2 ⌘,s is given by the (t ⇤ s , t ⇤ s ) element in the diagonal of the covariance matrix ⌃ L s [see Eq. (A11)]. Next, we assume that the set of values {⌘ s } from all the N ⇤ T supernovae at phase t ⇤ , are independent draws from a Gaussian population distribution with population mean ✓ and variance 2 ✓ , In Appendix B, we write the expression for the joint posterior distribution of the hierarchical model and describe additional decompositions in order to make the computations more tractable to determine the posterior inference 6 of ({⌘ s }, ✓, ✓ ) given the data {(µ L s , ⌘,s )} at phase t ⇤ . We repeat the above procedure for all phases in the range t ⇤ = ( 10, 45) days, every 0.5 days, to determine (✓, ✓ ) for all t ⇤ in this range. Figure 1 shows the Y JHK s templates constructed with this procedure and Table 5 reports the numerical values of the templates. The posterior estimates of the population mean and variance of the normalized LC, (✓, 2 ✓ ), and the uncertainty in the determination of ✓, are shown in Figure 1 as black curves, green bands, and blue bands, respectively.

HUBBLE DIAGRAM
We implement two di↵erent methods to derive the distance modulus for each supernova from the NIR LCs. We call them the template method and the Gaussianprocess method (GP). The GP method requires data near the NIR maximum for all NIR bands being used, while the template method works for arbitrarily sampled data, even if the LC is sparse near maximum. For this reason, we have more objects in the template method Hubble diagrams. We describe these methods in more detail in the following sections.
Any of these NIR-only approaches approximately treat the information in each of the Y JHK s bands as independent. However, this simple approach does not take maximal advantage of the cross-band correlations between each of the NIR and optical bands, as is done using a more sophisticated hierarchical Bayesian model (e.g. BayeSN: Mandel et al. 2009Mandel et al. , 2011Mandel et al. , 2014. Nor does this approach use the fact that there is only one true distance to the supernova. To alleviate this problem, we also derive the distance modulus for each supernova from the combined distance moduli in each NIR band. However, instead of computing a simple average distance modulus from the individual distance moduli, we instead estimate the covariance matrix of the Y JHK s distance moduli (and submatrices of it) and then derive the weighted average distance modulus. The advantage of this procedure is that it takes into account the correlations among the magnitudes in the NIR bands and then derives more realistic mean distance moduli and their uncertainties. More details are in Section 4.3.
For our NIR-only Hubble diagrams, only NIR LCs are used to directly construct distance moduli. However, auxiliary optical data is used to estimate t Bmax , m 15 (B), and mangled K-corrections, and is employed in the input data selection cuts described in §2.1.

Distance Modulus: Template method
To determine the photometric distance modulus µ s of the supernova s in a given NIR band, we use the normalized mean template, ✓, computed in Section 3, to determine the apparent magnitude at phase zero, m 0,s ⌘ m s (t = 0), by fitting the template to the sometimes sparse photometric LC data {m s (t)}. We define the di↵erence where m s (t) and ✓(t) are the apparent magnitude and the magnitude of the normalized template at phase t, respectively. We can express this di↵erence for all the n phases in a given LC as the vector, Then, to determine m 0,s we minimize the negative of the log likelihood function L(m 0,s ) defined as 2 ln L(m 0,s ) = m > s · C 1 · m s + constant, (14) where C is the n-dimensional covariance matrix where the (t i , t j ) component is given by: where ✓ (t) is the population standard deviation of the sample distribution of magnitudes at phase t, determined from Eq. (B14) during the training process used to construct the mean LC template,ˆ 2 m,s (t i ) is the photometric error of the datum m s (t i ), and l is the hyperparameter of GP kernel determined from Eq. (A9) and with values shown in Table 15.
From Eq. (14), we can calculate an analytic expression for the maximum likelihood estimator (MLE) of the apparent magnitude at B-band maximum light,m 0,s , given by: with the MLE of the uncertainty ofm 0,s given aŝ which corresponds to the fitting error of the light curve. This error incorporates the photometric measurement error and the sparsity of the actual data points. Now, from the distribution of absolute magnitudes at phase zero estimated as M 0,s ⌘m 0,s µ ⇤CDM (z s ) (see Fig. 2), we compute the sample mean absolute magnitude, hM 0 i, and the sample standard deviation of the distribution, obtaining the values reported in Table 6. The sample standard deviation describes the total scatter of the absolute magnitude estimates. Below, we decompose this into the contributions from peculiar velocity-distance errors, measurement/fitting errors, and intrinsic dispersion.
Finally, we estimate the photometric distance modulus for supernova s in a given NIR band aŝ The uncertainty onμ s is composed of two sources of errors: the fitting uncertaintyˆ fit,s estimated in Eq. (18) for each individual supernova, and the intrinsic scatter, int , which primarily comes from the intrinsic variation of SN Ia absolute magnitudes and is estimated by fitting an entire sample of SN Ia on the Hubble diagram (see Appendix C for more details). So the variance of the photometric distance modulus is given aŝ 2 µ,s =ˆ 2 fit,s +ˆ 2 int .
The Hubble residual for supernova s is defined as µ s ⌘μ s µ ⇤CDM (z s ).
The uncertainty on µ ⇤CDM (z s ) is given by Eq. (8). The variance on the Hubble residual for supernova s, 2 ,s , comes from the propagation of uncertainties onμ s and µ ⇤CDM (z s ), it is, 2 ,s =ˆ 2 fit,s +ˆ 2 int + 2 µpec,s . Figure 2. Histograms of the absolute magnitudes at phase zero (t ⇤ = tBmax), defined as M0,s ⌘m0,s µ⇤CDM(zs) for the SN Ia sample using the template method. The sample mean, standard deviation, and the number of supernovae in each histogram are shown in Table 6.
In addition to int , to quantify the dispersion in the Hubble residuals, we also compute both the RMS and the inverse-variance weighted root-mean-square (wRMS, see Appendix C). The RMS and wRMS are measures of the total scatter in the Hubble Diagram. The wRMS is relatively insensitive to the assumed value of the peculiar velocity uncertainty, and the formula for the RMS does not depend on the asssumed value of pec at all and is therefore more straightforward to compare with other works.
For the template method, the values ofμ s , int and wRMS in the Hubble diagram residual for a given NIR band depend on the phase range of the NIR LC template used to determine the distance modulus. We found that phase range of t ⇤ = ( 8, 30) days in each of the Y JHK s bands minimized the scatter in the Hubble residual, as measured by int or wRMS. Table 11 reports the distance moduliμ s and their fitting uncertaintyˆ fit,s we obtain with this procedure for  Table 6. each supernova in each band, and Fig. 11 shows the Hubble diagram and residuals.

Distance Modulus: Gaussian-process Method
The nearby low-z NIR sample now contains a su cient number of SN Ia well-sampled around maximum light in the Y JHK s -bands, that we can explore referencing various distance estimation approaches to the times of these NIR maxima, rather than B-max, for which there has long been su ciently well sampled optical photometry.
An alternative approach that we implement to derive distance moduli is by determining the apparent magnitude at the time of NIR maximum light, t NIRmax , and B maximum light, t Bmax , using the GP technique to interpolate the LC data. The method follows the same procedure as the one described in Section 3.2, but instead of GP fitting the absolute magnitude LCs, {M s (t)}, we directly GP fit the apparent magnitude LCs, {m s (t)}. By doing this, we do not include µpec in the error budget for each m s (t) because we do not subtract µ ⇤CDM (z).  Table 6.
To determine the posterior mean of the apparent magnitude LC, {m s (t ⇤ )}, and the posterior covariance of a GP fit to {m s (t)} we use the Eqs. (A7) and (A8) where we set 2 µpec,s = 0. For each LC, we use the average of the apparent magnitude data as the GP prior mean, and use the same values for the hyperparameters of the GP kernel shown in Table 15, given that the shape and dispersion of the apparent magnitude LC data is very similar to the absolute magnitude LCs we fitted with GP in Section 3.2 for each supernova. We verified that the GP fits to the LCs are insensitive to these choices.
We only consider LCs that have data either around t NIRmax or t Bmax so that we can determine the GP fit at those references phases. By construction, t Bmax corresponds to the phase = 0 days. For the case of t NIRmax we limit the search for the maximum to the phase range 8.5 < t NIRmax < 2.5 days to remove cases where maximum of the posterior mean happens after t Bmax , which we found can be artifacts of the GP fit when there are too few data points before t Bmax . For the rest of this section we denote the subscripts "NIRmax" and "Bmax" simply as "max".
From each set {m s (t ⇤ )}, we estimate,m max,s , the GP interpolated apparent magnitude at t max,s . Then we estimate the distance modulus aŝ µ s =m max,s hM max i where hM max i is the mean absolute magnitude at t max,s from all the supernovae in a given NIR band (see Fig. 3), with M max,s ⌘m max,s µ ⇤CDM (z s ). The uncertainty on the photometric distance modulusμ s in this case iŝ fit,s , which is equal to the uncertainty in the apparent magnitude at t max,s inferred from the GP fit to the LC. Figure 12 shows Hubble diagrams constructed from the distance moduli inferred from the GP method for each of the Y JHK s bands, with numerical values reported in Table 12.

Distance modulus from the combined NIR bands
From the estimated distance moduli (μ Y s ,μ J s ,μ H s ,μ K s ) for a given supernova s determined from each NIR band using either of the three methods described above, we estimate the weighted average of the distance modulus µ s from each method. First we define the vector of resid- whereμ Y s ,μ J s ,μ H s ,μ K s are determined by either Eqs. (19) or (23), for the template or GP methods, respectively. Then, to estimate µ s , we minimize the negative of the likelihood function L(µ s ) defined as 2 ln L(µ s ) = µ > s · C 1 µ · µ s + constant, where C µ is the sample covariance matrix computed from the Hubble residuals (see Eq. (21)) { µ Y s , µ J s , µ H s , µ K s }, the collection of distancemodulus residuals from all SN Ia with observations in the four Y JHK s bands. For supernovae with observations in only three, two, or one bands, we construct the respective covariance matrices based on those supernova subsamples, and the vector defined in Eq. (24) becomes three, two, or one dimensional, respectively. In Appendix D, we provide numerical values of the covariance matrix C µ for these di↵erent subcases.
We derive an analytic expression for the minimization of Eq. (25) with respect to µ s and obtain the maximum likelihood estimate for the combined distance modulus given by,μ s = whereμ b s 2 {μ Y s ,μ J s ,μ H s ,μ K s } (the index b stands for band ), and Now, assuming that the uncertainties in the distance modulus estimated from each individual Y JHK s band,ˆ fit,s,Y ,ˆ fit,s,J ,ˆ fit,s,H ,ˆ fit,s,K , are independent between bands b and also independent of the intrinsic scatter int , then we can propagate the uncertainty in the combined distance modulus due to the fitting only as: whereˆ s,b ⌘ˆ fit,s,Y ,ˆ fit,s,J ,ˆ fit,s,H ,ˆ fit,s,K . The last column in Tables 11-13 show the combined distance moduli we obtain with this procedure for the template and GP methods respectively. The reported uncertainties correspond toˆ fit,s in all cases.

Distance modulus from optical bands
We wish to assess how well the SN Ia observed in NIR bands perform as standard candles, specifically when using t NIRmax as opposed to t Bmax , as the time reference to estimate their distance. To do so, we determine the distance moduli using only optical BV R-bands LCs for exactly the same 56 supernovae in the "any Y JHK s " Hubble diagram set that was used for the GP method (see left panel in Fig. 6 and the SN listed in Table 12). Then we can compare the intrinsic scatter and RMS or wRMS in the Hubble-diagram residuals between the optical-only and NIR-only Hubble diagrams. A smaller intrinsic scatter, wRMS, or RMS, including the uncertainties, would indicate evidence that SN Ia are better standard candles using that data and Hubble diagram construction method.

SALT2 distance modulus
We use the optical photometric data compiled in the public SNANA ) database 7 but replace the CMB redshift values in the SNANA photometric files with the z CMB values in Table 2. Using the latest SALT2 model (SALT2.JLA-B14) ) already trained on the JLA sample (Betoule et al. 2014), we fit the optical data and determine the SALT2 lightcurve fit parameters for each supernova. For the CSP data, we added an additional 0.01 mag in quadrature to the photometric errors to have a more conservative uncertainties on those values when fitting the data in SALT2. We use the SALT2 outputs including the apparent magnitude m B at B-band maximum light, the stretch parameter x 1 , and the color term c, as well as their correlations.
We convert the SALT2-fit parameters to distance moduli for each supernova using the Tripp formula (Tripp 1998), where M B is the expected absolute magnitude at Bband maximum light for a SN Ia with x 1 = 0, c = 0, while ↵ and are coe cients parametrizing correlations between luminosity and stretch or luminosity and color, respectively.
For the global parameters M, ↵, we use the values reported by Scolnic et al. (2018); ↵ = 0.147, = 3.00, and assume the fiducial values of H 0 = 73.24 km s 1 Mpc 1 and M B = 19.36 mag. We then adjust the latter to M B = 19.44 mag so that the weighted-average Hubble residual is zero.
The standard deviation of the measurement error fit,s from the SALT2 fitting comes from propagating the uncertainties on Eq. (29), including their correlations. Interestingly we found that for the supernovae with high Milky Way color excess E(B V ) > 0.2 the uncertainty on m B,s is larger than the propagated uncertainty on the SALT2 distance modulus, µ s , derived from optical bands. This evidence further emphasizes how SN Ia are more negatively a↵ected by dust when deriving distances using optical data, as compared to NIR observations. The variance of the photometric distance modulus is given by 2 µ,s = 2 fit,s + 2 int .
Using SALT2 in this way, we obtain an intrinsic scatter in the Hubble residuals of int = 0.133 ± 0.022, an inverse-variance weighted RMS of wRMS=0.174 ± 0.020 mag, and a simple RMS = 0.179 ± 0.018 mag. The third column of Table 14 and the left panel of Fig. 9 show the distance moduli derived from the SALT2 fits, along with the Hubble diagram and residuals, respectively. The uncertainties shown in Table 14 and Fig. 9 are the values of fit,s . Note that we are not applying the usual SALT2 cuts to this subsample of SN because we are interested in comparing the scatter in the Hubble residuals using exactly the same 56 SN Ia used in the "any Y JHK s " Hubble diagram for the GP method. We find that when applying the SALT2 cut on color, 0.3 < c < 0.3, there is only 1 SN Ia in the subsample that does not pass this cut. All SN Ia in the sample pass these SALT2 cuts: 3 < x 1 < 3, uncertainty in x 1 < 1, and uncertainty in t Bmax < 2 days. However, 21 SN Ia fail to pass the SALT2 cut requiring that the probability that the data are represented by the model, given the 2 per degree of freedom of the fit, is larger than 0.001 (a.k.a, FIT-PROB > 0.001). However, a low fit probability does not necessarily indicate a poor SN Ia light curve fit and may instead be an indication that the photometric uncertainties or the model uncertainties are unrealistically small. We visually inspected the light curve fits of these 21 SN Ia, finding that they are reasonably well-fit by the model and can therefore be used to yield accurate distance measurements.

SNooPy distance modulus
As a second cross check of the scatter in the opticalonly Hubble diagram, we also fit the BV R-bands LCs using the SNooPy LC fitting package's EBV_model where T Q (t, m 15,s ) is a light-curve template for the rest-frame band Q that depend on t and m 15,s , and M Q,s ( m 15,s ) is the absolute magnitude band Q. In this model, the free parameters that SNooPy estimates (along with their uncertainties) are µ s , m 15,s , E(B V ) host ,s and t Bmax . We consider the estimated uncertainty on µ s output by SNooPy as the fit,s in our analysis. We refer the reader to Burns et al. (2011) for details on how SNooPy estimates the uncertainty on µ s . We obtain an intrinsic scatter in the Hubble residuals of int = 0.128 ± 0.018, a wRMS= 0.159 ± 0.019 mag, and a RMS = 0.174 ± 0.021 mag. The fourth column of Table 14 and right panel of Fig. 9 show the distance moduli derived from the SNooPy fits, along with the Hubble diagram and residuals, respectively.

DISCUSSION
Tables 7 and 8 summarize the scatter in the Hubble residuals measured with the either the intrinsic scatter int , the wRMS, or the RMS. We compute these both for our fiducial peculiar velocity uncertainty of pec = 150 km/s as well as the value pec = 250 km/s used in Scolnic et al. (2018). While the formula for RMS in Eq. C23 does not depend on the assumed value of pec (see Appendix C), the value of int is quite sensitive to the assumed value of pec . In particular, larger assumed values of pec yield Note-We compare the Hubble residual scatter for the optical and NIR bands using exactly the same 56 supernovae for several methods. We compute the intrinsic scatter, int (see Appendix C), the inverse-variance weighted root-mean square (wRMS), and the simple RMS, using two standard LC fitters: SALT2 ) and SNooPy (Burns et al. 2011) to fit the optical BV R-band LC data, as well as the three NIR methods we implement in this work: NIR LC templates at B-max [Template], and GP regression at NIR-max [GP (NIR max) or B-band maximum [GP (B max)]. We are limited to 56 SN Ia because these are all the supernovae that we can fit using the GP method. Columns 4 and 5 show int, assuming pec = 150 and 250 km/s respectively. The estimated intrinsic scatter int decreases as the assumed peculiar velocity uncertainty pec increases from commonly assumed values of 150 km/s (Radburn-Smith et al. 2004) to 250 km/s , making int somewhat model dependent. By contrast, the wRMS value only changes by thousandths of a magnitude for pec in the same range. Column 6 shows the wRMS assuming pec = 150 km/s. Column 7 shows the simple RMS, which makes no assumptions about error weighting and does not depend on pec. Both optical methods apply LC shape and dust corrections but still yield a larger scatter than the NIR methods quantified with any of int, wRMS or RMS (see also smaller inferred values of int (see columns 4 and 5 of Tables 7 and 8).
The assumption of pec = 150 km/s in this work therefore yields a more conservative estimate of int compared with larger values of pec because, in the latter case, most of the scatter in the Hubble residuals can be explained as arising solely from peculiar velocities. For instance, the Hubble residuals using only H-band LCs from the GP (NIR max) method produce an intrinsic scatter of zero when assuming pec = 250 km/s. We found that wRMS is less sensitive than int to the assumed value of pec , producing di↵erences of ⇠ 0.001 mag between pec = 150 and pec = 250 km/s.
Of the three NIR methods used to derive distance moduli, the GP method at NIR max yields smaller RMS, wRMS, and intrinsic scatter in the Hubble residuals than the template and GP methods at B max methods applied to the same 56 SN Ia with data from any of the Y JHK s bands. When we combine the GP distance moduli for these same SN Ia referenced to the NIR maxima, we find an RMS = 0.117 ± 0.014, wRMS = 0.100±0.013, and intrinsic scatter of int = 0.047±0.018 mag. Using the GP method instead referenced to B-max for the same SN Ia yields RMS = 0.115 ± 0.011, wRMS = 0.106 ± 0.010, and int = 0.066 ± 0.016 mag. The NIR maxima thus yield comparable dispersion in the Hubble residuals than B-max for each individual NIR band subset with the GP method (see Table 10).
By comparison, when using the NIR template method referenced to B-max for these same SN Ia, we find a larger value of RMS = 0.138 ± 0.014, wRMS = 0.140 ± 0.016, and int = 0.112 ± 0.016 mag.
Overall, as shown in Table 9, depending on the NIR Y JHK s subset, the NIR-only GP method yields a RMS in the Hubble residuals that is as much as ⇠ 2.3-4.1 smaller than the SALT2 and SNooPy fits using opticalonly BV R data. Furthermore, our "any Y JHK s " set of 56 SN Ia yields a RMS for our GP method at NIR max that is 0.057 ± 0.025 mag smaller than SNooPy and 0.062 ± 0.023 mag smaller that SALT2 applied to the corresponding BV R data, again at the ⇠ 2.7 level. We interpret the smaller intrinsic scatter as additional evidence, at the ⇠ 2.5-3.1 level, that NIR SN Ia LCs at NIR maximum, without LC shape or dust corrections, are already better standard candles than opticalonly SN Ia LCs referenced to B-max that apply such corrections. In addition, it is possible that NIR data or a combination of NIR and optical could yield even smaller intrinsic scatter if employing a method that applies LC shape and dust corrections, for example, using  Table 7. Columns 4 and 5 show int, assuming pec = 150 and 250 km/s respectively. Note that by increasing the value of pec, the int decreases even to zero in some cases with uncertainty denoted by 1. For the GP method, we use exactly the same supernovae at B-max or NIR-max. For all NIR band subsets, the GP (NIR max) method produces the smallest scatter, followed by the GP (B-max) method, while the template method always yields the largest scatter and wRMS. Figs. 5-7 and 11-13 show Hubble diagrams and residuals for most of the NIR subsets listed in this table. ⇤ For pec = 250 km/s, the estimated value of int in these cases is consistent with zero. See the paragraph below Eq. (B.7) in Blondin et al. (2011) for the explicit approximation we use to estimate the uncertainty, which breaks down at int = 0.
a hierarchical Bayesian approach like BayeSN (Mandel et al. 2009. In Table 9, we note that the uncertainty on the di↵erence in the dispersion estimates between any two methods has been computed conservatively. The uncertainty of the dispersion of each individual method has been computed independently, and then the uncertainty in the di↵erence is found by adding in quadrature, assuming the independence of the samples and therefore the individual uncertainties. However, this ignores the fact that the supernovae in our optical sample are exactly the same ones as those in our NIR sample. Therefore, the actual peculiar velocity-distance errors must be the same in each sample (and not just the variance of these errors). Because of this common component of scatter, the dispersion estimate for the optical Hubble Diagram is (positively) correlated with that for the NIR Hubble Diagram in each comparison. The e↵ect of this positive correlation is to reduce the variance in the di↵erences in dispersion. Using our estimates of int , fit,s and µpec,s for the sample and each method, we have run simulations to account for this correlation and quantify this e↵ect. For example, we find that the uncertainty in RMS for "SNooPy -any Y JHK s " is ⇠ 30% smaller than naive uncertainty assuming independent samples, resulting in a significance greater than 3 .
For the Hubble diagrams created using just one of the Y JHK s bands, when using the GP method at NIR max, the Y band has the smallest scatter with a RMS of 0.111 ± 0.018 mag. When using the template method, the Y band has also the smallest scatter with RMS = 0.152 ± 0.016.
For every individual band and subset of NIR bands shown in Table 10, the GP method yields smaller intrinsic scatter when referencing to NIR max instead of B max, by mean amounts of up to ⇠ 0.03 mag for the same SN Ia at up to the ⇠1.0 level. While not as statistically significant as the NIR vs. optical comparison in Table 9, we note that the NIR maxima yield smaller intrinsic scatter int and wRMS than B max for all sub- wRMS, and RMS, where the first is defined as the di↵erence in Hubble residuals intrinsic scatter between the optical BV R data, fit using SALT2 or SNooPy, and the indicated subset of NIR data using the Gaussian process method at NIR max. The quantities wRMS and RMS are defined in a similar way to int but using wRMS and RMS instead of the intrinsic scatter, respectively. The uncertainties are given by the quadrature sum of the int, wRMS, or RMS, uncertainties from columns 4, 6, or 7, respectively of Tables 7 and 8 for pec = 150 km/s. Columns 3, 5, and 6, show ndefined as the number n of standard deviations by which the NIR data yields smaller int, wRMS, or RMS, than the optical data using these methods, respectively. Excluding the Ks-band on its own, where our LC compilation contains much less data than the Y JH bands, in general, NIR data subsets yield smaller RMS than the optical data at the ⇠ 1.3-4.1 level. In the best case, the JH, Y JH, and Y JHKs-bands perform ⇠ 2.3-4.1 better than either SALT2 or SNooPy fits to the BV R data in terms of the RMS, while in the worst case, J-band, still performs 1.3 better than optical data. For simplicity, the stated uncertainties on the di↵erence in dispersion estimates between any two methods ignores the fact that the actual peculiar velocity-distance errors are exactly the same between the optical and NIR samples, since they contain exactly the same SN. The e↵ect of accounting for this correlation is to decrease the uncertainty of the di↵erence, and increase the significance ( §5). wRMS, and RMS, defined here as the di↵erence in Hubble residuals scatter between the Gaussian process method referenced to B max or NIR max. As in Table 9, the uncertainties are given by the quadrature sum of the int or wRMS uncertainties from columns 4 or 6 of Tables 7 and 8 for pec = 150 km/s. Columns 3, 5, and 7, show ndefined as the number n of standard deviations by which the NIR data referenced to NIR max yields smaller intrinsic scatter, wRMS, or RMS, than when referenced to B-max, respectively. For every individual band and subset of NIR bands, the GP method yields smaller estimated intrinsic scatter when referencing to NIR max instead of B max, where the largest di↵erence is n-= 0.99 for JH band. This trend is also observed when comparing the wRMS values, again, excluding, Ks band, where our sample lacks enough data to draw meaningful conclusions. Figure 5. Y JHKs Hubble diagrams (top row) and residuals (bottom row) using the template method on the 89 supernovae that passed our cuts. The error bars plotted for each supernova correspond to the fitting uncertaintiesˆ fit,s . The left panel corresponds to the case when we determine a single distance modulus by combining any of the available 1, 2, 3, or 4 Y JHKs distance moduli for a given SN Ia. The right panel shows the case when we require only SN Ia with J and H-band data, which allows us to include the majority of data from the CfA and CSP samples. Points are color coded by NIR photometric data source, including the CfA (red; Wood-Vasey et al. 2008;Friedman et al. 2015), the CSP (blue; Krisciunas et al. 2017), and other data from the literature (green; see Table 2 and references therein). Note that only the CSP used a Y -band filter. Table 8 summarizes the intrinsic scatter in the Hubble diagrams, while Table 11 reports the numerical values of the distance moduli from this figure. Note-Distance moduli and their fitting uncertaintiesˆ fit,s , estimated from the di↵erent NIR bands, either alone (see columns 3-6) or combined (column 7), using the template method. Corresponding Hubble diagrams are shown in Figs. 5 and 11.   Table 13 shows numerical values of the distance moduli from this figure.
sets of the NIR data except for K s . 8 While NIR data at NIR max are better standard candles in comparison to optical data, they are also at least as good or better than when referenced to B-max. Therefore, future analyses should consider using t NIRmax as the reference time instead of the traditional t Bmax,s .
As an additional comparison between NIR and optical Hubble residuals, in Fig. 10, we plot the histograms (dashed lines) with their Gaussian approximation (left panel), and the cumulative distribution function (right panel) for Hubble residuals using the same 56 SN Ia used for the "any Y JHK s " GP method at NIR max (lower left panel on Fig. 6), SALT2 (lower left panel of Fig. 9), and SNooPy (lower right panel of Fig. 9). The Gaussian approximations of the histograms in the left panel of Fig. 10 show that the Hubble residuals are more narrowly distributed for the NIR data (solid red curve) compared to both optical methods (solid green and blue), while in the right panel of Fig. 10 the cumu-8 The only exception we tested is the Ks-band, which has only 14 SN Ia LCs, where we find an essentially equivalent wRMS ⇠ 0.163 mag when referenced to either NIR max or B-max. lative distribution function curve for the NIR Hubble residuals is steeper than for either optical curve. Both approaches suggest that the Hubble residual scatter is smaller in the NIR compared to the optical. A larger sample of SN Ia in the NIR would strengthen the evidence for this conclusion.

CONCLUSIONS
This work bolsters and confirms a growing body of evidence that SN Ia in NIR are excellent standard candles in the Y JHK s bands in comparison to the optical BV R bands. Depending on the NIR data subset, our GP method performs 2.3-2.7 better in RMS than either the SALT2 or SNooPy LC fitters for the same 56 SN Ia using BV R data and applying LC shape and color corrections. Using a suitable subset of the existing low-redshift sample including 89 spectroscopically normal SN Ia with NIR data, Y JHK s photometry alone already provides a simple means to estimate accurate and precise host galaxy distances in each band, without the LC shape or host galaxy dust reddening corrections required for optical data.  Fig. 6 and 7. Again, Table 7 summarizes the intrinsic scatter in the Hubble diagram while Table 11 shows numerical values of the distance moduli from this figure.
In this work, we employed a hierarchical Bayesian model, combined with a Gaussian process LC fitter, to construct new mean NIR LC templates. We then used these templates, along with Milky Way dust corrections, NIR K-corrections, and the measured spectroscopic redshifts (corrected for local velocity flows), and redshift independent distance information (e.g. Cepheids) for special cases, to estimate host galaxy distances and uncertainties and construct Hubble diagrams in each of the individual Y JHK s bands. When considering NIR-only methods, our GP method referenced to the time of NIR maximum yields slight smaller Hubble diagram intrinsic scatter and error weighted RMS than when referenced to B max and significantly smaller intrinsic scatter compared to the template method.
Our approach is intermediate in complexity between earlier analyses by our group by Wood-Vasey et al. 2008 and the BayeSN approach detailed in Mandel et al. 2009Mandel et al. , 2011. The BayeSN methodology presents a coherent, principled, hierarchical Bayesian model that takes into account the full correlation structure between all the input optical and NIR bandpasses, both in color and phase, in order to determine the posterior distribu-tions for distance moduli µ, host galaxy dust estimates A V , and separate R V values for each supernova. Nevertheless, BayeSN is considerably more complex to implement than the simpler analysis methods in this work, which perform quite well for our sample of NIR data.
Compared to optical LCs, NIR SN Ia LCs have a narrow luminosity distribution, and are less sensitive to host galaxy dust extinction. This could help to limit systematic galaxy distance errors that arise from the degeneracy between the intrinsic supernova colors and reddening of light by dust, that a↵ects optical-only SN Ia cosmology (Krisciunas et al. 2004a;Wood-Vasey et al. 2008;Folatelli et al. 2010;Burns et al. 2011;Burns et al. 2014;Kattner et al. 2012;Mandel et al. 2009Mandel et al. , 2011Mandel et al. , 2017Scolnic et al. 2014bScolnic et al. , 2017. Studies combining NIR and optical SN Ia photometry have already shown that the addition of NIR data is an extremely promising way to break the degeneracy between intrinsic color and dust reddening, allowing distance estimates to become increasingly insensitive to the assumptions behind individual LC fitting models (Mandel et al. , 2014. We have recently begun to augment the existing lowz SN Ia in NIR sample from the CfA, CSP, and other Figure 9. Hubble diagram (top row) and residuals (bottom row) using SALT2 and SNooPy to fit only the optical BV R-band LCs for exactly same sample of 56 SN Ia used for the "any Y JHKs " GP (NIR max) Hubble diagram shown in the left panel of Fig. 6 and listed in Table 12. As emphasized in Tables 7-9, the intrinsic scatter is clearly larger in these optical only Hubble diagrams compared with the GP NIR max ones constructed for the same 56 SN Ia. Table 14 shows numerical values of the distance moduli from this figure. Figure 10. Comparing the scatter in the Hubble residuals, { µs}, as defined in Eq. (21), using NIR and optical methods for the same 56 SN Ia. The red, green, and blue colors correspond to the Hubble residuals from the "any Y JHKs" GP (NIR max) method (lower left panel of Fig. 6), SALT2 (lower left panel of Fig. 9), and SNooPy (lower right panel of Fig. 9), respectively. The left panel shows histograms (dashed lines) and Gaussian approximation to the histograms (solid lines) of the Hubble residuals, where we observe that the distribution of the NIR Hubble residuals (red) is narrower than either optical distribution (blue or green). The right panel shows the corresponding cumulative probability distribution functions, where we also note that the slope of the NIR curve is steeper than the optical curves, asymptotic to 1 at a smaller value of µ, again indicating that the Hubble residual scatter is smaller in the NIR compared to the optical. Note-Distance moduli and their fitting uncertaintiesˆ fit,s , estimated from the di↵erent NIR bands, either alone (see columns 3-6) or combined (column 7) using the Gaussian-process method at NIR max. The Hubble diagrams from these data are shown in Figs. 6 and 12.  Brout et al. 2018b). Analysis of the RAISIN data will be presented in future work. The evidence from this work further emphasizes the promise of NIR wavelength observations not only for the ongoing HST RAISIN project, but also for future space studies of cosmic acceleration and dark energy (Gehrels 2010;Beaulieu et al. 2010;Astier et al. 2011;Hounsell et al. 2017;Riess et al. 2018c). Upcoming missions that could exploit nearby NIR data as a low-z anchor include the Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008), the NASA Wide-Field Infrared Survey Telescope (WFIRST-AFTA; Gehrels 2010; Spergel et al. 2015), the European Space Agency's EUCLID mission (Beaulieu et al. 2010;Wallner et al. 2017), as well as the NASA James Webb Space Telescope (JWST; Clampin 2011; Greenhouse 2016).
NIR photometry can also augment our knowledge of the spectral energy distribution of SN Ia, for example the Type Ia parametrized SALT2 model, which is currently poorly constrained at infrared wavelengths (Pierel et al. 2018b,a). This will dovetail nicely with the NIR capabilities of JWST and WFIRST and be useful for future SN Ia surveys.
Methods such as BayeSN (Mandel et al. 2009, SNooPy, and SALT2ext (Pierel et al. 2018b,a) that use empirical LC fitters and provide host galaxy distance estimates using both optical and NIR data can be extended to obtain cosmological inferences and dark energy constraints using both low-z and high-z samples.

A. GAUSSIAN PROCESS REGRESSION
Given the dataset of observations in an absolute magnitude NIR LC, (M, t) for a given supernova, we want to use this information to estimate the latent absolute magnitudes M ⇤ at a grid of phases t ⇤ described in Section 3.2. To do so, we define a Gaussian process with these data and variables.
To model the covariance Cov[M ⇤ , M ⇤> ] we choose the squared exponential GP kernel that is defined as where K and l are the GP kernel hyperparameters that we explain how to compute at the end of this section. We choose the GP kernel of Eq. (A1) because it is simple, produces smooth curves, and has the general properties we need to model the observed shapes of the NIR LCs: for two phases very close to each other, t i ⇠ t j , their covariance is close to 1, and for distant phases, t i ⌧ t j , then k(t i , t j ) ⇠ 0, such that they are almost uncorrelated. We also take into account the uncertainty associated with each datum M (t i ) in the variance 2 M = 2 m + 2 A + 2 Kcorr + 2 µpec [see Eq. (7) for details], by defining the total covariance between two data points as where ij is the Kronecker delta function, and we assume that the measurement and K-correction errors are independent between two di↵erent M i and M j , but that both the peculiar velocity-distance error and the Milky Way extinction error are not independent at di↵erent times because they are the same over the whole LC in a single filter for a given supernova. In matrix notation we can write Eq. (A2) for all the data M in a LC as where K(t, t) is a square matrix with elements given by Eq. (A1), W is a diagonal matrix of dimension n ⇥ n with elements given by W ij = ij 2 m + 2 Kcorr , and 1 is a vector of ones, so that the term ( 2 µpec + 2 A )1 · 1 > , is a square matrix of dimension n ⇥ n with elements all equal to ( 2 µpec + 2 A ). Following the standard GP formalism (e.g., Chapter 2 of Rasmussen & Williams (2006)), we first write the joint distribution of the observed absolute magnitudes, M, and latent absolute magnitudes, M ⇤ , with a constant prior mean as " K(t, t) + W + ( 2 µpec + 2 A )1 · 1 > K(t, t ⇤ ) where 1 and 1 ⇤ are vectors of ones and of dimensions n and n ⇤ respectively, and a is a scalar that we assign the value of 17.5, 17, 18 and 18 mag for the Y , J, H and K s bands, respectively. We assume these values of a just for computational convenience in the GP fitting, and verified that the final templates are insensitive to these choices over a wide range of values for a. The matrices K(t, t), K(t ⇤ , t), K(t, t ⇤ ), and K(t ⇤ , t ⇤ ), are of dimensions n ⇥ n, n ⇤ ⇥ n, n ⇥ n ⇤ and n ⇤ ⇥ n ⇤ respectively, with elements defined by Eq. (A1). The conditional distribution of M ⇤ given t, t ⇤ and M, can be written as M ⇤ |t, t ⇤ , M ⇠ N µ post , ⌃ post (A6) where the posterior mean µ post and posterior covariance ⌃ post are given as h K(t, t) + W + ( 2 µpec + 2 A )1 · 1 > i 1 K(t, t ⇤ ) (A8) The final values we obtain from the GP regression are the vector µ post and the matrix ⌃ post , that we estimate using Eqs. (A7) and (A8) respectively.
The coe cients K and l in Eq. (A1) are called the hyperparameters of the GP kernel that we determine by assuming that the LCs for all the SN in a given NIR band are independent of each other, and that the GP hyperparameters describe the population of the SN LCs in a given band rather than each individual LC. With these assumptions, we can write the global marginal likelihood distribution where the subindex s refers to quantities for supernova s, N T is the number of SN Ia used to construct the normalized LC template in a given NIR band, and "{ } s " means the collection of values from all the N T SN Ia. To compute the MLE values for ( K , l), we minimize the negative of the logarithm of Eq. (A9), obtaining the values shown in Table  15.

A.1. Normalization of the GP light curves
In Section 3.2.1, we explained that we are primarily interested in the shape of the light curves. For this reason, after determining the posterior light curve described by (µ post , ⌃ post ), we normalize the LC using t Bmax as the reference time where the light curve will have a value of zero.
First, for computational convenience, we rewrite the linear transformation of Eq. (9) as the matrix operation where A is a n ⇤ ⇥ n ⇤ square matrix defined as A ⌘ I V k , where I is the identity matrix, and V k is a matrix containing only 1s in the kth column and zeros everywhere else, assuming that the kth element of t ⇤ correspond to phase t ⇤ k = t Bmax . We compute the mean of the normalized LC as, µ L = E[L|D] = A E[M ⇤ |D] = Aµ post , where D ⌘ (t, t ⇤ , M) is the conditional data in Eq (A6). And the covariance is given by From these expressions at t ⇤ k = t Bmax , the posterior mean and variance of the normalized LCs are both identically zero: which is required for self-consistency with the definition of the normalized LC.

B. HIERARCHICAL BAYESIAN MODEL
Using Bayes' theorem, applying the product rule for probability, and assuming conditional independence of the means of the normalized LCs, µ L s 's, with respect to the population mean and variance (✓, 2 ✓ ), we can write the joint posterior distribution in our hierarchical model as Inserting Eqs. (10) and (11) into Eq. (B13), we obtain, where N ⇤ T is the number of supernovae for which we have determined the best fitting function at phase t ⇤ . Note that since each LC has a di↵erent number of photometric data points over di↵erent phase ranges, this implies that N ⇤ T is di↵erent for each phase t ⇤ . For computation convenience, following Gelman et al. (2014), we decompose the joint posterior distribution using the product rule as where the first factor to the right of the proportionality sign of Eq. (B15) can be written for the supernova s as where and The middle factor to the right of the proportionality sign of Eq. (B15) can be written as and Finally, the last term to the right of the proportionality sign can be written as where we are assuming a uniform prior distribution p( ✓ ) / 1. We use Eq. (B15) combined with Eqs. (B16)-(B22) to simultaneously determine the posterior best estimates of ({⌘ s }, ✓, ✓ ) at phase t ⇤ , given the data {µ L s , ⌘,s }, following the computational procedure described in Appendix C.3, subsection "Marginal and conditional simulation for the normal model", of Gelman et al. (2014). We use the R code presented there to build our R code to make the computations described in this work.

C. RMS, WEIGHTED RMS, AND THE INTRINSIC SCATTER
We use the RMS to quantify the scatter in the Hubble residuals because it is simple and straightforward to compute and compare with the Hubble residuals reported by other authors. The definition we use is where N SN is the total number of SN Ia in the Hubble diagram. We compute the uncertainty on RMS using bootstrap resampling.
To weight the root mean square (RMS) by the uncertainties in each SN distance modulus estimate in each NIR band, we compute the inverse-variance weighted root mean square (wRMS) of the residuals as where w s ⌘ 1/(ˆ 2 fit,s +ˆ 2 int + 2 µpec,s ) and µ s is defined in Eq. (21). We also compute the uncertainty on wRMS using bootstrap resampling.
We determine the intrinsic scatter, int , in the Hubble residual following the procedure described in Eqs. (B.6)-(B.7) in Appendix B of Blondin et al. (2011). This dispersion tries to quantify the scatter due to intrinsic di↵erences in the NIR SN Ia absolute magnitudes only and not due to the peculiar-velocity uncertainty of each SN. The intrinsic scatter corresponds to the remaining dispersion observed in the Hubble-diagram residuals after accounting for the uncertainty in distance modulus due to the peculiar-velocity uncertainty, 2 µpec,s , and the photometric errors {ˆ fit,s }. When comparing our notation to Eqs. (B.6)-(B.7) of Blondin et al. (2011), note that where we use fit,s , int and µpec,s , Blondin et al. (2011) instead uses the notation m,s , pred , and pec,s , respectively.

D. COVARIANCE MATRIX C µ OF HUBBLE RESIDUALS
In this section we provide the numerical values for di↵erent cases of the covariance matrix C µ . For the template method, we find the following values of the sample covariance matrix C µ for the Y JH bands: For the GP method, we find the following values for the sample covariance matrix for the Y JH bands: and for the JHK s bands:  Table 2). Note that only the CSP used a Y -band filter. In Table 11, we report the numerical values of the distance moduli shown in this figure. Table 8 shows the intrinsic scatter in the Hubble diagram. Figure 12. Individual Y JHKs Hubble diagrams (top row) and residuals (bottom row) using the Gaussian-process method at NIR max. See the caption of Fig. 11. In Table 12, we report the numerical values of the distance moduli shown in this figure. Figure 13. Individual Y JHKs Hubble diagrams (top row) and residuals (bottom row) using the Gaussian-process method at B max. See the caption of Fig. 11. In Table 13, we report the numerical values of the distance moduli shown in this figure.