Cosmological Results from the RAISIN Survey: Using Type Ia Supernovae in the Near Infrared as a Novel Path to Measure the Dark Energy Equation of State

Type Ia supernovae (SNe Ia) are more precise standardizable candles when measured in the near-infrared (NIR) than in the optical. With this motivation, from 2012-2017 we embarked on the RAISIN program with the Hubble Space Telescope (HST) to obtain rest-frame NIR light curves for a cosmologically distant sample of 37 SN Ia ($0.2 \lesssim z \lesssim 0.6$) discovered by Pan-STARRS and the Dark Energy Survey. By comparing higher-$z$ HST data with 42 SN Ia at $z<0.1$ observed in the NIR by the Carnegie Supernova Project, we construct a Hubble diagram from NIR observations (with only time of maximum light and some selection cuts from optical data) to pursue a unique avenue to constrain the dark energy equation of state parameter, $w$. We analyze the dependence of the full set of Hubble residuals on the SN Ia host galaxy mass and find Hubble residual steps of size $\sim$0.06-0.1~mag with 1.5- to 2.5-$\sigma$ significance depending on the method and step location. Combining our NIR sample with CMB constraints, we find $1+w=-0.17\pm0.12$ (stat$+$syst). The largest systematic errors are the redshift-dependent SN selection biases and the properties of the NIR mass step. We also use these data to measure $H_0=75.9\pm 2.2$ km s$^{-1}$ Mpc$^{-1}$ from stars with geometric distance calibration in the hosts of 8 SNe Ia observed in the NIR versus $H_0=71.2\pm3.8$ km s$^{-1}$ Mpc$^{-1}$ using an inverse distance ladder approach tied to Planck. Using optical data we find $1+w=-0.10\pm0.09$ and with optical and NIR data combined, we find $1+w=-0.06\pm0.07$; these shifts of up to 0.11 in $w$ could point to inconsistency in optical versus NIR SN models. There will be many opportunities to improve this NIR measurement and better understand systematic uncertainties through larger low-$z$ samples, new light-curve models, calibration improvements, and by building high-$z$ samples from the Roman Space Telescope.


INTRODUCTION
After five decades of development as cosmological distance probes (Kirshner 2010), Type Ia supernovae (SNe Ia) are now mature tools for understanding cosmic acceleration (Riess et al. 1998;Perlmutter et al. 1999) and systematic errors will soon become the largest source of uncertainty in the parameters of cosmic expansion (Betoule et al. 2014;Riess et al. 2016;Scolnic et al. 2018;Brout et al. 2019a;Jones et al. 2019;Brout et al. 2022). In the optical bands where most SN Ia data have been obtained, substantial deviations from the behavior of a standard candle are reduced by correcting for the observed relations between light curve shape and luminosity (Pskovskii 1977;Phillips 1993;Hamuy et al. 1996;Riess et al. 1996;Guy et al. 2007;Jha et al. 2007;Conley et al. 2008;Mandel et al. 2011;Burns et al. 2014), reddening by dust (Riess et al. 1996;Phillips et al. 1999;Burns et al. 2014; Thorp et al. 2021;Mandel et al. 2022), and the intrinsically redder color of less luminous SNe Ia (Tripp 1998;Mandel et al. 2017).
However, SNe Ia have been found to be more nearly standard candles in the near-infrared (NIR; wavelengths covered by the rest-frame zY JHK bands) and their light is also less sensitive to the effects of dust extinction at these wavelengths (Krisciunas et al. 2004(Krisciunas et al. , 2007Wood-Vasey et al. 2008;Mandel et al. 2009Mandel et al. , 2011Barone-Nugent et al. 2012;Phillips 2012;Avelino et al. 2019;Mandel et al. 2022). This paper reports an attempt to realize these advantages in a cosmologically distant SN Ia sample.
At low redshift, significant samples of SNe Ia have been observed in the NIR by the Carnegie Supernova Project (CSP-I; Contreras et al. 2010;Stritzinger et al. 2011;Krisciunas et al. 2017) at Las Campanas Observatory in Chile and the CfA Supernova Survey (Wood-Vasey et al. 2008;Friedman et al. 2015) using PARI-TEL at the F.L. Whipple Observatory in Arizona. NIR SN Ia light curves from SweetSpot (Weyant et al. 2018) using the WIYN telescope at Kitt Peak National Observatory in Arizona have also been assembled, and additional data from CSP-II Phillips et al. 2019), the VISTA Extragalactic Infrared Legacy Survey (VEILS) 1 , UKIRT (Konchady et al. 2021), and HST (the SIRAH program; HST-GO 15889, PI: Jha) are forthcoming.
Measurements of the dark energy equation of state parameter, w, require comparing low-z SNe to those at larger cosmological distances. Measurements of z > 0.1 SNe in the rest-frame NIR have only been obtained by Stanishev et al. (2018) (four SNe) and Freedman et al. (2009). The only previous measurement of the dark energy equation of state using NIR data was the pioneering study of Freedman et al. (2009), who found w = −1.05 ± 0.13 (stat) ± 0.09 (sys) by comparing 21 low-z SNe to 35 high-z SNe using templates as red as the rest-frame I band with observations in Y J from the Magellan Baade telescope. This analysis was unable to apply a number of corrections that are now commonly included in cosmology analyses but were less well studied or entirely unknown at the time that Freedman et al. (2009) was published, including corrections for distance (Malmquist) biases and the dependence of Hubble residuals on host galaxy mass (Kelly et al. 2010;Lampeitl et al. 2010;Sullivan et al. 2010).
Though high-z NIR SN Ia observations are rare, the value of NIR data is clear from low-z analyses. Avelino et al. (2019) used CSP-I and CfA data to show in headto-head comparisons of the same SNe that the scatter in the distances as measured in the infrared is smaller by 35% than distances determined from optical light curves. Studies such as Wood-Vasey et al. (2008); Mandel et al. (2009); Kattner et al. (2012); Mandel et al. (2022) have also shown evidence that SNe at NIR wavelengths yield more precise distance measurements.
Low-z NIR data have recently been used to demonstrate a significant relationship between SN Ia distance residuals and the mass of the SN Ia host galaxy (Ponder et al. 2021;Uddin et al. 2020), which was originally discovered at optical wavelengths (Kelly et al. 2010;Lampeitl et al. 2010;Sullivan et al. 2010), although Johansson et al. (2021) found mildly conflicting results with no evidence for a mass-step in the JH bands and ∼2-σ evidence for a step in the Y band. Burns et al. (2018) and Dhawan et al. (2018) used NIR data to measure the Hubble constant, H 0 , using this new SN Ia wavelength range to provide additional evidence for tension between Cepheid+SN distance ladder measurements of H 0 (e.g., Riess et al. 2021) and the cosmic microwave background (CMB;Planck Collaboration et al. 2018).
With the advent of the Nancy Grace Roman Space Telescope, developing the optimal ways to measure NIR distances at cosmological redshifts is now urgent. Although Roman will observe SNe Ia primarily at restframe optical wavelengths at maximum redshifts up to 2.5 (Hounsell et al. 2018), Roman SN Ia observations will be in the rest-frame NIR in the important regime at z 0.7 where Baryon Acoustic Oscillation constraints will be limited by the cosmic volume (Weinberg et al. 2013).
Here, we present NIR cosmological parameter measurements from the RAISIN survey. RAISIN (an anagram for "SNIA in the IR") used 23 SNe Ia discovered in the Medium Deep Survey (MDS) of Pan-STARRS  in Hawaii and another 23 discovered by the Dark Energy Survey (DES) at Cerro Tololo in Chile (The Dark Energy Survey Collaboration 2005). We triggered Hubble Space Telescope observations of those objects with WFC3-IR using the F 125W and F 160W filters. After applying a number of wellmotivated cuts to the data, we use 42 low-z SNe Ia from CSP-I (hereafter CSP) and 37 high-z (z > 0.2) SNe Ia from RAISIN to measure w.
In this work, we pursue a "NIR-only" cosmological analysis that uses a different wavelength range than previous cosmological analyses with SNe and will have reduced systematic uncertainties due to dust. An optimal combination of optical+NIR data would yield the most precise cosmological constraints, but we show that an NIR-only analysis offers additional independent distance information. Throughout this paper we enumerate areas that will be improved to shrink these errors in future NIR studies and compare our results to a combined optical+NIR measurement.
In Section 2 we describe the RAISIN data, including survey properties, classifications, and photometric measurements. In Section 3 we describe our analysis method. In Section 4 we present our baseline cosmological results and in Section 5 we discuss additional analysis variants and measure the correlation of Hubble residuals with host galaxy mass. In Section 6 we discuss the implications of our results for future missions such as the Roman Space Telescope. In Section 7 we conclude.

Overview and Strategy
The RAISIN program was carried out in cycle 20 through HST-GO 13046 (hereafter RAISIN1; PI: Kirshner) and cycle 23 through HST-GO 14216 (hereafter RAISIN2; PI: Kirshner). RAISIN1 followed 23 spectroscopically classified SNe from the Pan-STARRS MDS ) and RAISIN2 followed 23 spectroscopically classified SNe from DES (The Dark Energy Survey Collaboration 2005). RAISIN1 observed SNe in the redshift range of 0.22 ≤ z ≤ 0.50 and RAISIN2 observed SNe at 0.35 ≤ z ≤ 0.61 to observe at redshifts where the available HST filters overlap with the restframe Y JH filters to minimize K-correction uncertainties. The median redshifts of PS1 and DES SNe aligned well with these targeted redshift ranges.
Due to occasional poor weather during PS1 and DES observing seasons and the need for HST template imaging after each SN had faded, RAISIN observations for each program extended over a period of 1.5-2 years; RAISIN1 observations were taken from 2012 October 29 to 2014 June 17 and RAISIN2 observations occurred from 2015 September 28 to 2017 November 21st.
To select RAISIN candidates, we identified Pan-STARRS and DES SNe that were likely to be SNe Ia discovered before maximum light. These candidates were then classified spectroscopically; classifications for MDS and DES SNe were carried out either by the RAISIN team in collaboration with the survey teams or by the MDS or DES teams themselves. For RAISIN1, we selected candidates that had been discovered by the Pan-STARRS team and were consistent with having a light curve phase approximately 5-10 days before maximum light. For RAISIN2, we similarly selected candidates based on their apparent light curve phase and additionally required that the photometric redshift of their host galaxies were consistent with the target redshift range of ∼ 0.5 ± 0.1.
Most classifications were from Magellan (16 SNe) and Gemini South (13 SNe), with additional classifications from Gemini North (four SNe), the MMT (six SNe), the AAT (three SNe), and Keck (three SNe). Each classification was determined using the SN IDentification software (SNID; Blondin & Tonry 2007), which uses crosscorrelation matching to template SNe to yield SN types, light curve phases, and redshifts. Representative spectra for RAISIN targets are shown in Figure 1. Spectroscopically classified SNe Ia with rising light curves within the target redshift range were submitted to STScI for scheduling as non-disruptive Targets of Opportunity on HST. To improve the distance measurement, two additional epochs were obtained for each SN after the initial HST observation, spaced by approximately five restframe days. Template images for each SN were taken at least six months after the initial observations, with the exception of a single SN for which no template was needed because it was located in a region with no apparent galaxy light.

The Pan-STARRS Medium Deep Survey
The Pan-STARRS MDS, the source of RAISIN1 SNe, observed 70 square degrees in griz filters over approximately 4 years, with gr, i, and then z observations on successive nights. MDS also observed in the y filter, primarily during bright time, but the S/N of y-band observations for objects in our targeted redshift range was too low to be of use for SN Ia cosmology. Approximately 5200 SNe were discovered across the four years of the MDS, with over 3000 host galaxy redshifts measured (Jones et al. 2017). Five hundred twenty SNe were spectroscopically classified during the survey, with ∼350 of these being SNe Ia and having a median redshift of ∼0.35 . Cosmological parameter measurements from these data are presented in Scolnic et al. (2018) for the spectroscopically classified sample and Jones et al. (2018a) for the full sample. Further details regarding the MDS are given in Chambers et al. (2016).
The optical photometry for RAISIN1 comes from the Pantheon cosmological analysis , with the exception of five SNe that were not included in Scolnic et al. (2018) 2 . Light curves of those five SNe were published by Villar et al. (2020) and are included in the data release that accompanies this paper. Pan-STARRS SN photometry is from Photpipe (Rest et al. 2005), with details specific to the MDS given in Rest et al. (2014) and Scolnic et al. (2018). In brief, MDS images are astrometrically aligned and their zeropoints are measured using the PS1 catalog. PS1 zeropoints have been calibrated to a 3 mmag relative precision across the Figure 1. Representative classification spectra for RAISIN SNe (black) with the best template matches from SNID (red). Though the template match to PS1-490521 (SN 2006ot) is a peculiar Ia, the SNID matches to normal SNe Ia are also excellent.  sky (Schlafly et al. 2012;Scolnic et al. 2015;. Single-season, inverse variance-weighted template image stacks are then convolved and subtracted from the nightly images, and SNe are discovered and their light curves subsequently measured by performing DAOPHOT forced photometry on the resulting difference images (Stetson 1987).

The Dark Energy Survey
DES observed in griz filters over 27 square degrees with a cadence of approximately once per week. These 27 square degrees were split into eight 2.7 deg 2 "shallow" fields, with depths of ∼23.5 mag, and two "deep" fields, with depths of approximately 24.5 mag. DES spectroscopically classified 251 SNe Ia at redshifts 0.02 < z < 0.85 (Abbott et al. 2018). After sample cuts, their threeyear spectroscopic sample includes 207 SNe Ia at a median redshift of z = 0.36. Cosmological parameter measurements from the DES spectroscopic sample are given in Abbott et al. (2018) with additional publications describing the calibration (Burke et al. 2018;Lasker et al. 2019), primarily based on observations of the CALSPEC standard star C26202, bias corrections , photometry (Brout et al. 2019b), spectroscopic classification (Smith et al. 2020a), and systematic uncertainties (Brout et al. 2019a) of these data.
For RAISIN2, DES SN discovery uses a difference imaging procedure similar to the MDS, but final optical photometry is carried out using the scene-modeling algorithm presented in Brout et al. (2019b). Scene modeling (Holtzman et al. 2008) uses imaging data to build a pixel-based model of the galaxy+SN that is convolved with each night's measured point spread function (PSF) model. The amplitude of the SN is allowed to vary epoch-to-epoch while the galaxy brightness is fixed. The robustness of the algorithm has been tested with artificial sources and recovers fluxes to an accuracy of 3 mmag.

The Carnegie Supernova Project
There are two sources of well-sampled low-z NIR SN Ia data, the CfA and CSP samples (Friedman et al. 2015;Krisciunas et al. 2017). These were combined to yield a sample of 89 low-z, NIR-observed SNe in Avelino et al. (2019). However, most CfA SNe are either at redshifts where the effect of peculiar flows dominates the distance uncertainty (z 0.01) or lack data prior to maximum light, which makes the determination of the time of maximum light uncertain and increases distance uncertainties. Therefore, we restrict ourselves to the CSP data for the low-z SN sample used in this work, although the CfA data remain extremely useful for training SN standardization models in the NIR. The release of CSP-II data in the near future will add an additional 125 SNe with NIR light curves and 90 SNe with NIR spectra Phillips et al. 2019).
CSP uses DAOPHOT (Stetson 1987) to measure natural-system photometry on SN images after a template image, taken once the SN light has faded, has been subtracted. CSP images are calibrated with respect to primary standards BD +17 • 4708 (optical bandpasses) and Vega (NIR), with additional details given in Krisciunas et al. (2017).

Spectroscopic Classifications and Redshifts
Spectroscopic classifications and SN redshifts for each RAISIN SN were determined using SNID (Blondin & Tonry 2007). SNID determines a classification quality through a combination of the overlap between the observed spectrum and a template spectrum (lap), and the height of the cross-correlation peak (r). We ensure that the best template match has an rlap > 5, indicating a good match, and that the top three spectroscopic matches are all normal SNe Ia. For SN PS1-520107, the classification was unclear so we remove this SN from our sample. For SN PS1-490521 the best-matched spectrum was to the peculiar SN Ia 2006ot but, as discussed in Section 3.4, we choose to include 06bt-and 06ot-like SNe in our cosmology analysis as they cannot be reliably classified (or ruled out) from noisier high-z spectra.
The PS1 and DES teams measured host galaxy redshifts for these objects using cross-correlation matches to galaxy templates. For PS1, redshifts were estimated using the rvsao package (Kurtz & Mink 1998), and for DES the Marz package was used (Hinton et al. 2016).
For CSP classifications and redshifts, see Krisciunas et al. (2017) and references therein. SN classifications were measured from SNID, as well as SN redshifts. As discussed in Krisciunas et al. (2017), all CSP SNe have host galaxy redshifts, and nearly all of these are measured from the NASA/IPAC extragalactic database (NED 3 ). We also use updated group redshifts from the Pantheon+ analysis Scolnic et al. (2021) for SN 2005al and 2009ab, which differ significantly from the values in the latest CSP data release.

HST Data and Photometry
The HST data for RAISIN1 SNe were taken with three epochs each of F 125W (approximately the J band) and F 160W (approximately the H band). For RAISIN2 SNe, only the F 160W band was used for most SNe; five RAISIN2 SNe included data in both filters. Figure  2 illustrates the rest-frame wavelengths probed by the HST data as a function of redshift.
The first HST observations occurred at a median of 12.2 observer-frame days relative to B-band maximum light for RAISIN1 (+9.2 rest-frame days) and 13.9 observer-frame days (+9.5 rest-frame days) for RAISIN2 due to the time required for SN discovery, classification, and the latency in the HST scheduling. The median S/N is 12.5 for RAISIN1 and 12.9 for RAISIN2 with template observations taken at an average of 189 days after the first SN detection. See Appendix A for a discussion of how we correct for a small amount of late-time SN flux in some template images.
Examples of RAISIN observations are shown in Figure  3. Optical photometry and spectra of these SNe are provided in the online data release accompanying this paper to allow these measurements to be of use in future NIR analyses such as SIRAH (HST-GO 15889).
Our team measured photometry from the RAISIN images according to the following steps: 1. For each epoch, FLT images, which are images that have been bias-subtracted, dark-subtracted, and flat-fielded, are drizzled 4 together and the final (template) epoch is subtracted.
2. We then measured SN centroids from the difference images and performed 0.4 aperture photometry on the difference images with aperture corrections from HST. We verified these aperture corrections using drizzled images of the CALSPEC standard star P330E.
3. Using artificial stars injected into the data frames, we corrected the resulting measurements and uncertainties for host galaxy noise.
4. We calibrated the data using publicly available zeropoints that were measured from observations of four white dwarf standards (GD153, G191B2B, GD71, GRW+70D5824), and the G-type star P330E 5 .
We describe these steps in additional detail in Appendix A. Coordinates of the RAISIN SNe and the final RAISIN photometry are given in Appendix B.
In this section, we describe the NIR light curve models we used (Section 3.1), present our method for distance measurements (Section 3.2), create a dispersion model to measure distance-dependent biases (Section 3.3), apply sample selection cuts (Section 3.4), measure the dependence of distance on host galaxy mass (Section 3.6), and estimate each source of systematic uncertainty in our NIR measurement of w (Section 3.7). Our cosmological parameter measurements are then presented in Section 4.

NIR Light Curve Models
SN Ia standardization models rarely make use of the NIR due to the small number of well-calibrated, highcadence NIR light curves. In the last decade, only the SALT2 (Guy et al. 2007(Guy et al. , 2010 and SiFTO (Conley et al. 2008) light curve fitters have been used for measurements of w, with SALT2 being by far the most common model. SALT2, however, extends only to a central filter wavelength of approximately 7000Å, with the recentlydeveloped SALT3 model  now extending this wavelength to 8700Å to include the restframe z band.
However, there are now two well-vetted light-curve models extending to the NIR: the SNooPy model (Burns et al. 2011(Burns et al. , 2014 and the BayeSN model (Mandel et al. 2022;Thorp et al. 2021). SNooPy benefits from over a decade of use by the community and includes light curve templates for the uBV griY JH bands that have been trained using well-calibrated CSP light curves. BayeSN is a hierarchical Bayesian Spectral Energy Distribution (SED) model for SNe Ia (BayeSN-SED) that models time-dependent optical-NIR (to 1.8 µm) SN Ia SEDs as a combination of physically-distinct intrinsic spectral components and host galaxy dust effects. By having a SED-based model continuous in both time and wavelength, BayeSN removes the need to compare high-redshift photometric data to an approximate rest-frame filter and effectively allows a K-correction that varies with light-curve stretch. At the same time, BayeSN has yet to be integrated in publicly available tools like SNANA (Kessler et al. 2010), so the simulation of distance-dependent biases for a cosmology analysis is not yet feasible.
Though we incorporate both SNooPy and BayeSN in our analysis, we use SNooPy as our baseline method due to the greater ease of determining distance biases and because BayeSN was under development during the time when much of this analysis was carried out. The advantages of BayeSN's statistical framework are somewhat mitigated due to the limited number of NIR ob- Figure 4. Examples of RAISIN SN light curves. Apparent magnitude minus distance modulus (a proxy for absolute magnitude) versus phase is shown for three RAISIN SNe with observations in i (PS1 or DES), F 125W , and F 160W . Solid lines show the SNooPy model fit to the data. The optical bands, including i, are used to estimate the time of maximum light, while the HST NIR bands are used to measure the distance (we allow sBV fitting to show the best-fit NIR model in this example). Histograms with approximate absolute magnitude distributions for all RAISIN SNe are shown on the right after applying the sample selection cuts discussed in Section 3.4 below.
servational epochs in the RAISIN data. Still, we find good consistency between the distance measurements from these two models, as demonstrated in Section 5.2.1; we plan to use BayeSN in a future optical+NIR analysis of these data.

SNooPy Model Philosophy
The SNooPy model philosophy assumes that the observed variation in SN colors is caused by extinction in the SN host galaxy, but allows the selective-to-total extinction ratio R V to be specified by the user -the nominal method in this analysis -or fit by the data. We also allow SNooPy to fit negative extinctions for a more agnostic treatment of SN color; SNooPy assumes the luminosity versus color relationship to be linear with no minimum value (the same philosophy as the Tripp relation; Tripp 1998). This effectively means that R V should be interpreted as a luminosity versus color trend in this work, rather than a physical dust law.
SNooPy also includes options to correct for the dependence of SN luminosity on light curve shape. We adopted the "EBV model2", that uses the s BV parameter to determine the relation between SN light curve shape and luminosity . The s BV parameter is a stretch parameter that is insensitive to extinction because it takes advantage of the fact that the delay between time of maximum light and the time of reddest color is insensitive to reddening (Lira et al. 1998). To enable easier determinations of distance biases and systematic uncertainties in later stages of the analysis, we incorporated this SNooPy model into the SNANA software (Section 3.5; Kessler et al. 2010) and used the SNooPy iY JH templates generated from low-z CSP data to fit our NIR data and measure distances. Example RAISIN light curves with SNooPy fits are shown in Figure 4.

Distance Measurement Method
Because RAISIN NIR light curves include just three post-maximum light epochs per SN, we are unable to make robust shape and color corrections or determine the time of maximum light from NIR data alone; we find a Hubble residual RMS of 0.24 mag when fitting A V and 0.28 mag when fitting s BV from NIR-only RAISIN data (in Appendix C, we will use simulations to confirm that larger RMS values are expected given the small number of post-maximum light epochs per SN).
Therefore, our nominal distance measurement approach is to use the SNooPy model to fit only the NIR SN amplitude; we use optical data to determine the time of maximum light and fix the SN shape and color to a nominal value 6 . This approach yields a Hubble residual RMS of 0.18 mag.
We relax either the NIR-only aspect of this analysis or the nominal approach of fitting only the amplitude of 6 We adopt arbitrary values of s BV = 1 and A V = 0 for simplicity, as the bias correction stage will adjust the derived distances for the average s BV and A V of the data themselves. We have confirmed that adjusting the nominal value of these parameters in the distance fits does not change the scatter by more than ∼0.01 mag.
the NIR data -i.e., not fitting s BV or A V -in several places in this analysis as described below: 1. Selection cuts. Optical data are used to make sample selection cuts to remove SNe with high values of E(B − V ), which cannot be estimated from NIR data alone. This is discussed in Section 3.4.
2. Time of maximum light. Because we do not have NIR data near maximum light in the high-z sample, we must use optical data to constrain the phase of the SN observations. 3. Scatter model. SNe Ia were discovered and spectroscopically classified at optical wavelengths, so we determine the consequence of these selection effects on our sample. To estimate distancedependent selection effects for the SNooPy model, we use optical+NIR CSP data to estimate the covariance matrix of the distance residuals after estimating and correcting for s BV and A V . This is described below in Section 3.3.

Bias corrections:
To determine and correct for the redshift dependence of the intrinsic populations of s BV , we fit s BV using the NIR data. Though the resulting Hubble residual scatter increases, as discussed above, this is a useful tool to compare data distributions of s BV to simulated distributions of s BV . This is described in Appendix C.

5.
Wavelength-dependence of the mass step. In Section 5.1, we compare estimates of the host galaxy mass step between optical and NIR data in a self-consistent manner that accounts for the correlation between s BV and galaxy mass. Therefore, in this section we will correct the NIR Hubble residuals for their dependence on the optical+NIR-measured s BV parameter. We use both the SNooPy derived s BV -NIR luminosity correlation and the directly measured dependence of our NIR Hubble residuals on s BV . A s BVcorrected mass step is not used for cosmological parameter estimation but only to investigate the physics and wavelength-dependence of the mass step.
6. Comparing to optical and optical+NIR measurements of w. In Section 4.1, we compare the baseline results to results that fit the SNooPy model to optical and optical+NIR data.

Dispersion Model and Distance Biases
Because we compare SNe Ia across a large span of redshift, we must determine distance-dependent biases, that are caused by the increased likelihood of discovering higher-luminosity SNe at larger redshifts in magnitudelimited surveys; these can be up to ∼5% in distance for the PS1 and DES samples Brout et al. 2019a). To determine these biases, we build a dispersion covariance matrix from the optical to NIR to understand the ways in which optical SN Ia selection effects − which are caused by PS1 and DES selection criteria − affect distances measured in the NIR.
Because there has been no measurement of the covariance in magnitude between different SNooPy bands, we estimated this model from the low-z CSP data in our sample. To generate this model, we use low-z SNe at 0.015 < z < 0.1 to ensure that the results are minimally dependent on cosmological parameters while also mitigating the effect of peculiar velocity uncertainties at z < 0.015 (Peterson et al. 2021).
We then simultaneously fit the SNooPy model to all the optical and NIR bands in the low-z CSP sample to estimate the time of maximum light, s BV , and A V for each SN. Keeping these parameters fixed, we then fit the low-z data with the SNooPy model for each of the BV griY JH bands individually. We use the resulting Hubble residuals in each band to generate a band-to-band covariance matrix that can be used to generate large Monte Carlo simulations and determine wavelength-dependent distance biases. We note that the additional distance scatter predicted from a peculiar velocity uncertainty of 250 km s −1 ) is smaller than the scatter observed in each band at the median redshift z = 0.023, and we find that the observed optical-to-NIR covariances are very similar when restricting the sample to z > 0.025 or z > 0.03.
The resulting covariance matrix is shown in Figure  5 and is now included in the public SNANA software 7 . We find similar dispersion measurements in most bands, with values ranging from ∼0.116 to 0.136 mag; the maximum dispersion is in the H band and the minimum is Y but the band-to-band differences are not highly significant in this sample. Covariance is on the order of ∼0.015 mag 2 between neighboring optical bands (correlations of 0.95) and ∼0.010 mag 2 between NIR-to-NIR and optical-to-NIR bands (correlations of ∼0.78-0.86). These correlations can be understood as the correlation between Hubble residuals measured using one band at a time. Slightly higher correlations are found when we examine the Uddin et al. (2020) data, which might be due to the exclusion of fast-declining, red, and low-z SNe in the present analysis. NIR-derived distances in our sample are well-correlated with optical distances -implying a significant wavelength-independent component of SN Hubble residual scatter -but are less correlated with optical bands than optical bands are with each other, implying that NIR data contain additional information useful for constraining SN Ia distances.

Sample Selection Cuts
Next, we apply selection cuts to our sample to ensure that each SN has well-measured photometry and distances. First, as discussed in Appendix A.1, we remove five SNe with host galaxy noise contributions that add greater than 0.1 mag to the photometric errors, based on an injected star analysis, as these are indications of subtraction residuals. Due to PSF variation Note. The effect of selection cuts on the low-z and RAISIN datasets. These include requiring (1) photometric uncertainty from bright hosts < 0.1 mag, including systematic uncertainty due to subtraction residuals, (2) spectroscopic confirmation that the SN is Type Ia, (3) redshift > 0.01 to limit peculiar velocity errors, (4) data taken before maximum light, both before removing points with χ 2 > 10 from the light curve fit and (5) after removing those points (see text), (6) existence of NIR data, (7) successful convergence of the fit to the SNooPy light-curve model, caused by the breathing of HST, it is difficult to have clean subtractions on galaxies that are bright relative to the SN; this is because even few-percent variations in the PSF when multiplied by a bright galaxy core can cause large residuals relative to the brightness of the SN. This PSF variation will unavoidably bias our sample against SNe in high surface brightness environments, making it even more important to better characterize the SN-host galaxy environment correlation. We visually inspect each difference image to ensure that the remaining SNe appear to have high-quality subtractions. We also remove SN PS1-520107, which has an unclear spectroscopic classification. We note that SN 2006bt is peculiar (Foley et al. 2010), but we chose to include it for consistency across the redshift range as similar events would be impossible to identify and exclude in our high-z sample given the noisier spectra at those redshifts. However, 2006bt subsequently fails Chauvenet's criterion (see below). We further remove SNe that lack either optical or NIR data before maximum light as this makes it impossible to accurately determine the phase of a given observation. We also ensure that pre-maximum light data exist both before and after removing >10-σ SNooPy light-curve fit outliers from the data; if a SN had a mis-identified time of maximum light we found that a SN could appear to have pre-maximum light data when in reality it did not, but this issue was fixed once we removed outlying data points from the fit. Next, we remove SNe measured to have E(B − V ) > 0.3 mag from optical data. Though only a single SN in our sample fails this cut, these SNe could in principle cause large outliers on our NIR-only Hubble diagram given that we perform light-curve fits with fixed A V . High-A V SNe could also have some sensitivity to differences in R V that are difficult to measure due to the extrinsic/intrinsic degeneracy in SN colors (e.g., Mandel et al. 2022). We also remove SNe with extreme values of the s BV parameter that are not well-represented in the SNooPy training sample. These include s BV > 1.18 and s BV < 0.75, values for which SNooPy is not well trained. Cutting at s BV > 0.75 also excludes 91bg-like and transitional SNe Ia (see Burns et al. 2014;Gall et al. 2018), which removes SNe that would be poorly standardized by our baseline method that does not fit for s BV . At s BV > 1.18 in particular, only a single high-stretch SN is included in the SNooPy model training. SNe with s BV measurement uncertainties >0.2 are also removed, as we cannot reliably estimate whether they pass cuts. Finally, two SNe -SN 2006bt and DES15C1nhv -fail Chauvenet's criterion as 2.8-and 3.1-σ outliers on the Hubble diagram. A sample of 38 objects (the high-z sample) is expected to have just 0.07 3.1-σ (or greater) outliers on average, and a sample with 44 objects (the CSP sample) is expected to have just 0.2 >2.8-σ outliers; both numbers are below the Chauvenet threshold of 0.5. We treat the low-and high-z distributions differently in this calculation as they have substantially different photometric data coverage and scatter (Section 5.2.2).
The full set of distances, light curve parameter measurements (using optical data), and cuts on those data for both the low-z and RAISIN samples are given in Appendix E. For each subsample, selection cuts are summarized in Table 1.

Simulations and Bias Corrections
Using the dispersion model from Section 3.3, we use the SNANA (Kessler et al. 2010) simulation framework to generate realizations of each subsample in this analysis (CSP, RAISIN1, and RAISIN2). SNANA is the "industry standard" tool for accurately simulating large SN Ia samples for cosmological parameter measurements, and for this reason both the MDS and DES teams have already developed sophisticated SNANA simula- Figure 6. SNooPy model iY JH light curves as implemented in SNANA and as a function of the color-stretch parameter, sBV . The rest-frame Y band model in particular has very little sensitivity to sBV at the epochs of the RAISIN data. The phase range within which 68% of RAISIN observations fall are shown in grey bands (for CSP the 68% range covers phases from ∼ −3-40 days).
tions of their surveys that yield accurate realizations of their survey data. SNANA includes sky noise, survey zeropoints, survey cadences, host galaxy properties, realistic shape and color distributions for the SN population, and the quantitative criteria that were used to trigger followup observations. These simulations are presented in Appendix C. In brief, we based our simulations on pre-existing CSP , PS1 , and DES  simulations, but we applied them to a SNooPy model that we implemented within the SNANA framework ( Figure 6). We then estimated intrinsic population distributions of the s BV values using the aggregate NIR data and, although NIR data are only weakly dependent on A V , we simulate a nominal exponential A V scale of τ = 0.2 mag and vary this scale length in our systematic error budget. This exponential scale is well matched to our measurements from optical observations, from which we estimate τ 0.15−0.18 mag; preliminary optical+NIR analysis of the RAISIN sample from BayeSN also gives τ = 0.21 ± 0.04.
The simulations then give the distance bias: the average difference between simulated and measured distance as a function of redshift, shown in Figure 7. We correct the measured distances for the average bias at each SN redshift. The predicted optical distance biases have similar sizes to bias corrections generated for the Pantheon analysis ) using the Guy et al. (2010) optical scatter model and the SALT2 standardization model. The NIR bias corrections are dominated by the Results are shown for both the NIR-only light curve fits (blue), which assume zero reddening and sBV = 1.0, and the optical+NIR fits (orange), which include AV and sBV as free parameters. Predicted low-z (black triangles) and PS1 (grey triangles) distance biases from Pantheon are also shown in the top panel. Some bin-to-bin fluctuation in the bias corrections as a function of redshift is caused by simulating SNe at only the exact redshifts of RAISIN objects for a smaller, more computationally efficient simulation. Figure 8. Cumulative fraction of SNe as a function of host galaxy mass for CSP and RAISIN SNe. CSP SNe are found in significantly more massive host galaxies. difference in mean stretch and A V between the low-and high-z samples. The uncertainties on the differences in these distributions are used to determine systematic uncertainties on the bias corrections.

Host Galaxy Mass Dependence
The step-like dependence of SN Hubble residuals on host galaxy mass has been well established (Kelly et al. 2010;Lampeitl et al. 2010;Sullivan et al. 2010) and recently there have been reported -but somewhat conflicting -detections of the host mass step in the NIR (Ponder et al. 2021;Uddin et al. 2020;Johansson et al. 2021). Constraining the wavelength dependence of the mass step could help to distinguish among theories for the origin of the step, e.g., dust properties ) versus progenitor metallicity (Rose et al. 2021a). We attempt to measure and correct for a host galaxy mass step in the RAISIN data. We measure host galaxy masses ourselves to ensure consistency across the sample using a procedure described in Appendix D. We simultaneously estimate the mass step and the average Hubble residual in three redshift bins to ensure our measurement is independent of cosmological parameters. The resulting masses are shown in Figure 8.

Systematic Uncertainties
We treat systematic uncertainties in much the same way as previous cosmological analyses (e.g., Scolnic et al. 2018;Brout et al. 2019a;Jones et al. 2019): we determine 1-σ uncertainties in data and analysis parameters − e.g., peculiar velocities, Milky Way reddening, calibration, and bias corrections − apply each systematic to Figure 9. Selected systematic shifts in distance modulus as a function of redshift for this analysis, grouped into calibration systematics (top), bias correction systematics (middle) and other systematics (bottom). The systematics shown in the middle panel are a shift in the scale length of the AV distribution (left), a shift in the scale length of the AV distribution for the high-z data only (center), and 1-σ shifts in the high-z stretch distribution parameters (right). The systematics shown in the bottom panel are due to the uncertainty in the size of the mass step (left), the SNooPy model (center), and K-correction uncertainties (right). The weighted average of each distance vector is subtracted to show relative low-to high-z differences, even if the systematic only changes the distances in one part of the redshift range. For visualization purposes, only bins with at least 3 SNe are shown to reduce noise while illustrating z-dependent trends.
the data or analysis, and construct a covariance matrix of the distances resulting from each analysis choice or data modification. The systematic uncertainty covariance matrix is then added to the (diagonal-only) statistical covariance matrix. The systematic uncertainty covariance matrix is given by: for a given set of systematics S n applied to each light curve or distance measurement. For the nth systematic, ∂Sn is the change in distance after applying systematic S n to a SN at redshift z j . The size of each systematic uncertainty is given by σ(S n ). Details about individual systematic uncertainties are given below. Compared to previous analyses, our choice of the SNooPy model and the limited sample size allows us to omit a few second-order systematic uncertainties, including redshift evolution in nuisance parameters and redshift evolution of the host galaxy mass step. We also do not include the choice of SN Ia dispersion model as a systematic uncertainty because we have constrained this directly from the low-z Hubble diagram itself. We are therefore not subject to the G10/C11 scatter model degeneracy that causes significant uncertainties in opticalonly analyses (e.g., Brout et al. 2019a). The redshift dependence of a subset of our systematic uncertainties is shown in Figure 9.

Calibration
This analysis is limited to two photometric systems, HST and CSP (the Swope telescope). The HST CAL-SPEC calibration affects both the RAISIN and low-z datasets, while the CSP calibration to CALSPEC affects only the low-z data. The CALSPEC calibration was updated in Bohlin et al. (2020) with the revised uncertainty in the F 125W and F 160W bands now at the level of 0.5%. The CSP calibration is from Krisciunas et al. (2017); the JH calibration is tied to Vega while the optical calibration is tied to observations of BD+17 • 4708. We have updated the optical CSP zeropoints using the latest CALSPEC spectra of BD+17 • 4708 8 . The CSP JH calibration is uncertain at the level of approximately 2%, and we apply independent 2% JH systematic errors to the CSP magnitudes as a conservative estimate. We also assume a larger, 3% systematic error for the Y -band calibration as it was calibrated to Castelli & Kurucz (2003) atmospheric models rather than Persson standards (Persson et al. 1998;Krisciunas et al. 2017). Better calibration of low-z samples is a promising area for improvement to enable improved future ground-based, NIR measurements.

Bias Corrections
The uncertainty in the host galaxy bias corrections is dominated by uncertainty in the intrinsic distributions of s BV and A V in our samples. The method of Scolnic & Kessler (2016) gives 1σ uncertainties on the derived population parameters and we re-compute bias corrections after varying the s BV distribution parameters for the low-z and high-z data by their 1-σ uncertainties. The population parameters after 1-σ variations are shown in Appendix C. We include two additional extinction variants: first, we reduce the exponential scale length by 0.03 in E(B − V ) (from 0.13 to 0.1) and second, we reduce the exponential scale length of the extinction distribution by 0.03 in E(B − V ) for only the CSP sample. These E(B − V ) shifts are equivalent to a shift in mean A V of 0.05 for our default R V = 1.52 or 0.1 for R V = 3.1; they are greater than the measured difference in mean A V between the low-and high-z samples when optical+NIR data are used in the fitting.

NIR SN Standardization Model
We substitute BayeSN distances for SNooPy distances to estimate the systematic error in the SNooPy NIR SN Ia standardization model. Though BayeSN has not yet been implemented in SNANA, e.g., bias corrections cannot yet be estimated, we can still compare the uncorrected distances after fitting the RAISIN and low-z data with BayeSN. To ensure that the BayeSN distances are fit in the same way as the SNooPy distances, we keep the A V and shape parameter fixed to values equivalent to the SNooPy values of A V = 0 and s BV = 1 (BayeSN A V = 0, θ −1), and we apply the SNooPy bias corrections to the BayeSN distances, making the assumption that the SNooPy bias corrections remain valid for BayeSN measurements. SNooPy and BayeSN distances are well-correlated (see further discussion in Section 5.2.1) and the average Hubble residual between the low-and high-z samples changes by just 2 mmag when using BayeSN instead of SNooPy.

Mass Step
We adopt two variants of the host galaxy mass step. In the first, we shift the mass step by its 1-σ uncertainties from the maximum likelihood approach described in Appendix D. In the second, we adopt the NIR mass step location suggested by Ponder et al. (2021) of 10.44 dex: we measure the mass step at a location of 10.44 dex, recompute the step, and re-apply the step to the data.

K-corrections
Because the Hsiao et al. (2007) model is used to generate SNooPy's K-corrections, use of the SNooPy model requires an understanding of the uncertainties in K-corrections due to the limited number of SNe that were used to construct the Hsiao et al. (2007) model. To create a model for K-corrections with realistic uncertainties, we create composite spectra following the method of Siebert et al. (2019) using a limited number of individual NIR spectra from the Infrared Telescope Facility (Marion et al. 2009, with data from G. H. Marion, private communication) that were used to create the original Hsiao et al. (2007) template. Composite spectra are created at phases of approximately −9, −3, +7, +15, +39 days, epochs that match the phases of the individual spectra themselves, and then bootstrap-resampled 50 times to create estimated uncertainties. The +15-day composite spectrum is within a few days of the epoch of most RAISIN NIR data 9 . We then use the shape of the Hsiao et al. (2007) template light curve at each wavelength to interpolate between the phases covered by the composite spectra and use a Savitzky-Golay algorithm to smooth over the telluric regions. We then assume those uncertainties are representative of the uncertainties on the Hsiao et al. (2007) model and shift the model by the standard deviation of the bootstrap-resampled spectra. This will result in a systematic change in the measured distances caused by the statistical uncertainty in the Hsiao et al. (2007) model, particularly at high redshift.
We note some small slope differences between the Hsiao et al. (2007) template and our composite spectra, but find good consistency overall. We also inspect the 2011fe NIR spectra from Hsiao et al. (2013) − a SN with somewhat narrower than average stretch − and see some broadband color differences that could be attributed to relative calibration errors in the spectra or perhaps intrinsic variation between SNe. Methods such BayeSN, which effectively has a SN stretchdependent K-correction, is capable of modeling this intrinsic variation. Future, well-sampled, public NIR spectral databases will reduce this systematic uncertainty and are an important area to improve future analyses.  (2011) values by 5% according to the systematic uncertainty in the dust temperature correction. For peculiar velocities, we conservatively vary the massto-light bias parameter β in the 2M++ cosmic flow maps by its 5σ statistical uncertainty (Lavaux & Hudson 2011).

Cosmological Parameter Measurements
We use a combination of CosmoMC (Lewis & Bridle 2002) and CosmoSIS 10 (Zuntz et al. 2015) with likelihood chains from Planck Collaboration et al. (2018), including temperature, polarization, and lensing, to measure cosmological parameters from the vector of SN distances and the systematic uncertainty covariance matrix. To measure SN-only constraints we use CosmoMC to sample the likelihood, while for combining SNe with Planck, we use the CosmoSIS importance sampler to combine SN constraints with results from the Planck chains. Luminosity distances d L from the wCDM model, in Mpc, are given by: Here Ω m , Ω Λ , and Ω k are the cosmic matter density, dark energy density and spatial curvature, respectively.

Note.
Summary of systematic uncertainties on the wCDM model with constraints from SNe+Planck. ∆w is the shift in w resulting from applying each systematic uncertainty and ∆σw is the additional uncertainty that would need to be added in quadrature to the statistical error to yield the uncertainty on w from a given variant. Note that some variants shift the value of w but do not increase its uncertainties.
We use the following function to estimate cosmological parameters: where C is the covariance matrix from the combined statistical and systematic uncertainties,μ is the distance modulus measured from the data as described in Section 3.2 and including the bias corrections and host mass step as discussed in Sections 3.5 and 3.6. The model distance modulus µ wCDM = 5 log(d L ) − 5. ∆M is the offset between the assumed SNooPy absolute magnitude of a SN Ia, which is also degenerate with the assumed value of H 0 , and the best-fit global value.

RESULTS
After performing the analysis described in the previous section, we produce the RAISIN NIR-only Hubble diagram shown in Figure 10. The high-z distance measurements are presented in Appendix E. The total root mean square (RMS) of the Hubble residuals is 0.167 mag, 18% higher than the Pantheon Hubble residuals. However, the CSP scatter for those SNe with NIR data near maximum light is 0.136 mag, slightly lower Figure 10. The RAISIN Hubble diagram, with low-z SNe from CSP (green), and high-z samples from RAISIN1 (orange) and RAISIN2 (purple). We show photometric uncertainties only, neglecting the contribution of intrinsic scatter for visual clarity.  . Additional data in the gap between the lowz and RAISIN redshifts would be particularly beneficial in creating an extended NIR Hubble diagram.

Ω m and w
By using CosmoMC to constrain cosmological parameters from SNe alone, Figure 11 shows the constraints on non-flat CDM relative to Pantheon and the cosmic acceleration discovery sample from Riess et al. (1998). These data thereby confirm cosmic acceleration with high significance, using a new method and wavelength range that has less sensitivity to dust extinction uncertainties. In a flat universe, from RAISIN SNe alone, we find Ω m = 0.258 ± 0.090.
Statistical and systematic uncertainties in this analysis are approximately equal. The dominant systematic uncertainty in this measurement is the bias correctionpredominantly due to uncertainty in the intrinsic s BV distribution -which will improve with larger sample sizes and improved light curve fitting methods. The second-largest systematic uncertainty is the size of the NIR mass step (Section 5.1), which will similarly improve with larger sample sizes. The full list of systematic uncertainties on the wCDM model are shown in Table  2 and visually in Figure 13. Though we have attempted to be comprehensive in our accounting of systematic uncertainties, we note that there may be substantial additional uncertainties in this novel measurement that will become more apparent only with larger sample sizes. In addition to measuring w, we measure H 0 with NIR SN data alone. We use Cepheid distances with CSP and RAISIN SN data to measure the Hubble constant from the distance ladder approach and use SN+CMB data to constrain the H 0 parameter with an "inverse distance ladder" approach (e.g., Cuesta et al. 2015) that combines the angular size of the CMB sound horizon with wCDM constraints from CSP, RAISIN and Planck.

The Hubble Constant
To measure H 0 from the distance ladder, we use publicly available Cepheid or tip of the red giant branch (TRGB) distances for all ten SNe with CSP photometry and with independent TRGB or Cepheid measurements to calibrate the NIR luminosity of SNe Ia. Nine of these SNe -SNe 2006D, 2007af, 2007on, 2007sr, 2009Y, 2011iv, 2012fr, 2012ht, and 2015F 11 -have redshifts of z < 0.01 and therefore were not included in our baseline sample for measuring w. One SN, 2007A, was already included in our sample as it has a redshift of z = 0.017.    (Table 2). Bias correction and mass step uncertainties dominate the error budget, with photometric calibration and uncertainties in the peculiar velocities contributing significantly. Other systematic uncertainties are added in quadrature to give the "Other" contribution.
We use a combination of available Cepheid and TRGB distance calibrations (Freedman et al. 2019;Anand et al. 2021;Riess et al. 2021) and take the agnostic approach of giving all available distances as originally provided equal weight. We use Cepheid distances from Riess et al. (2021) for five SNe: 2006D, 2007A, 2009Y, 2012ht, and 2015F. We use the mean of available Cepheid and TRGB distances for three SNe: 2007af (Freedman et al. 2019;Riess et al. 2021), 2007sr (Freedman et al. 2019Anand et al. 2021;Riess et al. 2021), and 2012fr (Freedman et al. 2019;Anand et al. 2021;Riess et al. 2021). Finally, we use the mean TRGB distances for two SNe from Freedman et al. (2019) and Anand et al. (2021), SNe 2007on and 2011iv, which are in the same host galaxy of NGC 1404. These two SNe are fast decliners (Gall et al. 2018 find s BV = 0.57 ± 0.01 and 0.64 ± 0.01 for 2007on and 2011iv, respectively) with decline rates below our nominal s BV cuts so we provide results with and without them.
We use these Cepheid and TRGB distances to calibrate the SN luminosity of the CSP and RAISIN samples following the method of Riess et al. (2021). If we include SNe 2007on and 2011iv in our H 0 measurement, we measure H 0 = 77.5 ± 2.1 km s −1 Mpc −1 from Cepheid+TRGB and H 0 = 76.6 ± 2.6 km s −1 Mpc −1 from TRGB alone. Excluding the two fast-declining SNe in NGC 1404, however, we measure H 0 = 75.9 ± 2.2 km s −1 Mpc −1 from Cepheid+TRGB and H 0 = 72.9 ± 3.0 km s −1 Mpc −1 from TRGB distances alone. We note that the statistical significance of this TRGB measurement with a reduced sample is limited as we have just three TRGB calibrator galaxies and the measurement is statistically consistent with other recent TRGB-based measurements (Freedman et al. 2019;Anand et al. 2021). The majority of the uncertainty derives from the error in the mean of the modest samples of NIR SN calibrators.
Because our distance measurement method does not correct for any dependence of SN luminosity on s BV and our Hubble flow sample excludes such fast-declining objects (we require s BV > 0.75) it is possible these fastdecliners, which make up a large fraction of the calibrator sample, could bias our results; therefore, our baseline result excludes the fast-decliners.
From the inverse distance ladder in a wCDM model, we find a model parameter H 0 = 71.2 ± 3.8 km s −1 Mpc −1 (Figure 14). The value of H 0 is consistent with the conventional result of 67.4 ± 0.5 assuming ΛCDM with Planck data alone. Independent approaches to measuring SN distances at cosmological redshifts such as this one demonstrate the ways in which future measurements can help determine what role, if any, the present dark energy could play in the tension between local (e.g., Riess et al. 2016;Pesce et al. 2020;Huang et al. 2020;Wong et al. 2020;Freedman 2021;Riess et al. 2021) and CMB H 0 measurements (e.g., Dhawan et al. 2020).
Though the uncertainties on these measurements are large, we find that moving to NIR wavelengths does not appear to change the size of the difference between CMB and local H 0 measurements (see also Burns et al. 2018;Dhawan et al. 2018).

Comparing to Optical and Optical+NIR Measurements
Finally, Table 3 compares our NIR-only measurement of w to measurements using optical data alone and optical+NIR data from SNooPy. In computing the optical and optical+NIR measurements, we apply the same base set of systematic uncertainties but add calibration uncertainties at the level of 3 mmag for each PS1 band, 5 mmag for each DES band, and 1% for each optical CSP band Burke et al. 2018;Krisciunas et al. 2017).
Interestingly, we find substantial shifts when considering optical or optical+NIR data versus NIR data alone. In the NIR-only case versus the optical+NIR case in which NIR data primarily have the effect of constraining the SN color, w shifts by up to 0.11. Though the statistical significance of this shift is not entirely clear given the correlated data and some correlation in the systematic uncertainties, this shift could point to the need for revisions to the NIR or optical SN standardization model in future work, particularly given forthcoming surveys such as the Roman Space Telescope SN survey that will observe in the NIR (Rose et al. 2021b).
Although the sensitivity of the NIR-only measurement to bias correction uncertainties yields a result with significantly higher systematic errors in our baseline analysis, larger sample sizes would result in more precise bias estimations from a better constraint on the redshiftdependent stretch distribution of SNe. However, given that NIR-only approaches will always have difficulty constraining the potential evolution of dust extinction with redshift, we believe that the optimal approach in future work will be an optical+NIR measurement; Table 3 shows that this measurement has the lowest total systematic uncertainty budget of our three wavelength regimes. Though the overall improvement in the precision of w is small, we note that alternative distance measurement methods such as BayeSN (Mandel et al. 2022) appear to more optimally weight the NIR versus optical data and yield significantly reduced distance uncertainties when NIR data are included. Figure 14. Constraints on H0, w, and Ωm from RAISIN and CMB data combined using an inverse distance ladder method. This figure was generated with the corner package (Foreman-Mackey 2016).

EXPLORING ALTERNATIVE ANALYSIS METHODOLOGIES AND CONSTRAINING THE MASS STEP
In this section, we explore variations on our nominal analysis and examine potential systematic uncertainties in the cosmological parameter measurement.

The Mass Step
Using our nominal distance measurement method we find a maximum likelihood size of the mass step of ∆ M = 0.022±0.047 mag at a step location of 10 dex and 0.026 ± 0.040 mag at the Ponder et al. (2021) location of 10.44 dex (the step location is fixed during each fit). These results are used to generate the distance measurements in Section 4 above. However, the differences between the NIR and optical measurements are affected by the fact that stretch is correlated with host galaxy mass; our nominal distance method does not fit for stretch and there is a median difference of ∆s BV = −0.099 between the high-and low-mass subsets of our sample. Correcting for this stretch differential, as done in an optical-only analysis, would have the effect of making the mass step larger.
To measure the mass step in a way that is more comparable to the way optical mass steps are measured, we test the impact of applying an s BV correction to the NIR distances using the best-fit s BV from optical+NIR data. We see a strong trend between Hubble residual and optical+NIR-derived s BV as shown in Figure 15 and find that the scatter in our Hubble residuals is reduced by 10% by making this correction. We do not see a significant change in the dependence of Hubble residual on s BV in the low-z data (slope −0.622 ± 0.189) compared to the high-z data (slope −0.702 ± 0.244).
After s BV fitting we measure a larger mass step, as we would predict, of 0.072 ± 0.041. This result is shown in Figure 16. If instead we use the location of the mass step from Ponder et al. (2021) of 10.44 dex, we find a mass step of 0.057±0.035 mag. If we adopt the approximate linear, empirical correction between s BV and Hubble residual shown in Figure 15, instead of using the default SNooPy s BV -luminosity relation, we find consistent steps of 0.086 ± 0.043 mag and 0.094 ± 0.037 mag for 10 and 10.44 dex step locations, respectively. We conclude there is ∼2-σ evidence for a mass step in these data even if we do not a priori assume that the SNooPy luminosity-s BV relation is correct.
Using the high-z data alone, we measure a mass step of 0.114 ± 0.081 mag at a step location of 10 dex and a step of 0.122 ± 0.091 mag at 10.44 dex. Unfortunately, the sample is too small to constrain the size of the step with these high-z data alone, which yield a statistically insignificant 1-σ evidence for the NIR mass step.
If we use the optical data to measure the mass step with the same SNe, we find a step of 0.11 ± 0.03 mag from SNooPy and 0.08±0.03 mag from SALT3 (the same step size is measured from SALT2). These are consistent with optical mass steps in the literature, which can range from 0.04 ± 0.019 mag from DES (Smith et al. 2020b) to 0.119 ± 0.032 from the Nearby Supernova Factory (Aldering et al. 2002;Rigault et al. 2018); the largest dataset (Pantheon; Scolnic et al. 2018) yields a step of 0.072 ± 0.01 12 .
These mass step results qualitatively support − albeit with limited significance − the findings of Ponder et al. (2021) and Uddin et al. (2020), who also found evidence that the NIR mass step was of order the same size as the optical mass step.

NIR Distances from BayeSN
Although we have not yet simulated distance biases with the BayeSN SED model, we did compare our  and all data combined (bottom). Dark shading indicates the uncertainty on the mean for high-mass (red) and low-mass (blue) hosts, while light shading indicates the dispersion of each population. The shading of the points indicates the probability that a SN is in a low-mass host (blue) or high-mass host (red). In this figure (unlike in Figure 10), we correct individual distances for sBV to derive a measurement that is directly comparable to optical analyses. We also simultaneously fit for three z-dependent distance bins to remove the effect of cosmological parameter values on the mass step size. SNooPy distances before bias correction to raw BayeSN distances that assume a fixed stretch and zero host dust extinction A V , analogous to the approach we adopted for SNooPy. We fix the BayeSN primary intrinsic component θ = −1, approximately equivalent to our chosen SNooPy s BV = 1. We use the same time of B-band maximum light for both SNooPy and BayeSN and fit only the NIR light curve data to obtain photometric distance estimates.
As demonstrated in Figure 17, these distances are consistent across the redshift range of our sample, with the average difference between BayeSN and SNooPy distances changing by just 2 mmag from the low-z to high-z samples. As SNooPy has rarely been used at high redshift, this consistency between the two models gives us confidence that the raw distances that we measure with SNooPy are robust.

Optical Distances from SALT3 and SNooPy
In Table 4, we explore the differences between several methods of measuring NIR, optical, and optical+NIR distances. We compare the baseline NIR-only distances to optical+NIR distances with both the baseline R V = 1.5 and a Milky Way-like R V = 3.1. We also compare to optical distances from SALT3 and finally, we use an approach where optical+NIR data are first used to determine s BV and then s BV is fixed to that value in an NIR-only fit that measures the distance. This essentially allows the NIR distances to include the anticipated s BVluminosity relation.
We find Hubble residual differences of ∼1-2% between our baseline NIR method and methods using optical data and the default SNooPy value of R V = 1.5 (Folatelli et al. 2010). With R V = 3.1, the difference in distances is much larger, at 0.065 mag between low-and high-redshift, though we note that using this value of R V increases the Hubble residual scatter by 28%. In a NIR-only analysis, however, using R V = 3.1 changes the distances by just ∼0.02-0.04 mag due to the lower sensitivity of the NIR to the value of R V . Because we allow negative extinction in this analysis and because SNooPy does not model the intrinsic correlation between color and luminosity, the R V parameter is more analogous to the nuisance parameter β in the Tripp relation than to a physical dust law.
The difference between our baseline distances and SALT3 distances is just 0.005 ± 0.038 mag. Although this difference in distance is small, the bias corrections shown in Figure 7 are ∼0.05 mag smaller in the NIR-only distances, increasing the potential discrepancy between SALT3 and the baseline NIR measurement. However, given the large uncertainties on the distance in measurement, a SALT3-based (or SALT2-based) analysis would still yield a consistent measurement of w.
Finally, the procedure of adjusting the NIR distances by a value of s BV measured in the optical results in a statistically insignificant −0.016 ± 0.027 mag change. The lack of a small s BV correction in our baseline analysis therefore does not appear to have a significant impact on the resulting cosmological parameter estimation. We also explore altering the semi-arbitrary s BV cuts in our baseline distance measurements, particularly given that the s BV values tend to cluster near ∼1.18, perhaps due to a limited high-s BV training sample in SNooPy. Fortunately, we find that our analysis is relatively insensitive to the minimum and maximum s BV ; both a "loose" cut of 0.6 < s BV < 1.3 and a "tight" cut of 0.8 < s BV < 1.15 change the average Hubble residual by just 7 mmag when comparing SNe at z < 0.1 to those Note. For optical+NIR analysis methods without bias corrections, the difference between low-and highz distances relative to the baseline NIR-only method: at z > 0.1. The mass step measurement is changed by less than 0.01 mag.
In Figure 18, we examine the dispersion of Hubble residuals for different methods that use optical data. We find that the Hubble residual dispersion is lowest when using the optical+NIR with R V = 1.5 (0.140 mag) or the SALT3 model (0.130 mag). SALT3 appears to standardize SNe Ia more precisely than SNooPy at high redshift, while SNooPy distances are more precise than SALT3 distances for the CSP data on which SNooPy was trained. The SALT3 results are nearly the same as those from SALT2, which was used for most previous optical-only analyses.
If an optical versus NIR difference becomes statistically significant in future work, it could point to unrecognized systematic uncertainties in analyses with optical data, NIR data, or both. In optical analyses, the G10 scatter model is a source of significant uncertainty in correcting for distance-dependent biases Brout et al. 2019a) and proposed alternative scatter models could shift w by 4% ). In the NIR, differences in stretch-luminosity correlations between Burns et al. (2014) and Burns et al. (2018), for example, may point to systematic uncertainties in SNooPy NIR standardizations. These results show that NIR observations constitute an extremely useful consistency check on optical analyses.

The Late-Time NIR Model
Our Hubble diagram compares SNe observed near maximum light in the low-z CSP sample to SNe observed only at later phases in the higher-z RAISIN sample; the initial NIR epoch in RAISIN occurred at +9 rest-frame days on average. This could introduce systematic uncertainty if the SN standardization model is not as accurate at late times as it is near maximum light where most previous NIR data exist. Although we account for this systematic uncertainty by using two SN Ia standardization models, both use much of the same training data.
To investigate potential biases due to the phase at which each SN was observed in the NIR, we investigated whether there is a trend in Hubble residuals as a function of the median phase at which each SN was observed. By comparing the Hubble residuals of SNe observed at a median phase of < +15 days to those observed at a median phase of > +15 days (+15 days is approximately the sample median), we detected a "step" in the NIR Hubble residuals at 3-σ significance. This step appears to show that RAISIN SNe Ia observed at a phase of > +15 days are fainter by 0.164±0.051 mag than RAISIN SNe Ia observed at a median phase of < +15 days. Though this trend is potentially concerning, we note that the difference in stretch between the early versus late-observed samples is 0.12, which could explain ∼0.05 mag of the ∼0.16-mag difference ( Figure 15). This selection bias could be caused by the fact that low-stretch SNe are intrinsically fainter, so it may have been harder to classify them and subsequently trigger HST observations prior to maximum light.
To inspect potential biases in the SNooPy model as a function of phase, in Figure 19 we show the SNooPy Y -band template alongside the CSP and RAISIN observations after the observed data have been converted to the rest frame. Both the high-and low-z data agree relatively well with the SNooPy model, albeit with slight offsets at +10 < phase < +20 days of ∼ 0.05 mag. These offsets suggest that the late-phase NIR SNooPy model should be adjusted as more data become available, but would only shift the mean high-z distance measured here by approximately 0.02 mag. The variance of the NIR data also appears to increase slightly near the phase at which most of the later RAISIN observations occur, while observations near maximum light have the lowest scatter. The slightly higher scatter of the RAISIN magnitudes compared to CSP might also be an indication of K-correction uncertainties in the highz distances. We observe similar trends when comparing the J-band SNooPy model to our data.
The apparent dependence of Hubble residual on the phase at which a SN was observed could also be due to subtle errors in the relationship between SN luminosity and s BV or that the NIR SN color-luminosity dependence differs significantly from the relation that would be expected from dust extinction alone. Although retraining the SNooPy model is beyond the scope of this work, and will likely require significant additional restframe NIR data to be robustly re-determined, additional Figure 18. Hubble residual scatter for different analysis variants. From left to right, we show our baseline NIR-only Hubble residuals, optical+NIR SNooPy distances with RV = 1.52 (Folatelli et al. 2010), optical+NIR SNooPy Hubble residuals with RV = 3.1, NIR SNooPy Hubble residuals after applying the sBV parameters measured from optical+NIR data, and optical SALT3 Hubble residuals. The SALT3 panel removes two CSP SNe with colors that are too red (c > 0.3) to pass standard SALT cosmology cuts; with these SNe included, scatter increases negligibly to 0.132 mag in the CSP sample. The SNooPy model (black) compared to CSP (blue) and RAISIN (red) measurements after using the SNooPy K-corrections and the cosmological distance to convert the observations to rest-frame absolute magnitude. Solid lines in the top panel show the rolling means of the CSP and RAISIN absolute magnitudes and the rolling standard deviation, with the average peculiar velocity error and photometric noise subtracted in quadrature, is shown in the bottom panel. NIR SN Ia observations appear to have the lowest RMS near maximum light and slightly higher RMS from ∼10-25 days.
low-z data and model retraining are important ways in which future NIR analyses could be improved.

DISCUSSION
Although this study reaffirms the existence of cosmic acceleration and establishes the value of NIR SN Ia data for cosmological measurements, much of the value of this dataset will be in preparing for NIR SN Ia samples from the Roman Space Telescope. Although the SN survey strategy is not yet finalized, Roman will observe in the rest-frame NIR to z ≈ 0.7, the regime in which cosmological parameter constraints from SNe Ia will likely be more precise than volume-limited Baryon Acoustic Oscillations (Weinberg et al. 2013). Measuring the most precise cosmological constraints from these data will be the path to obtaining the best possible constraints on cosmological parameters.
With Roman on the horizon, the systematic error budget presented here suggests a path forward for improved NIR cosmological constraints. Here's what we need: 1. Larger photometric NIR samples with coverage near maximum light across multiple photometric systems to improve calibration systematics.
2. Larger photometric and spectroscopic NIR samples to improve training of models, and opticalto-NIR measurements of the SN Ia dispersion to constrain any redshift-dependence of the value of R V .
3. Optical-to-NIR SN Ia SED models and standardization methods that separate SN Ia intrinsic dispersion from the effects of host galaxy dust as a function of wavelength, and improve the modeling of high-stretch (slowly declining) SNe.
4. Improved constraints on the host galaxy mass step and other host galaxy dependences as a function of wavelength to break degeneracies with dust properties.
5. An untargeted NIR low-z sample with wellunderstood selection effects.
These improvements would improve the accuracy of the cosmological measurement presented here even without obtaining additional high-z data. A public release of CSP-II NIR spectra and photometry will address several of these key points Phillips et al. 2019). As we enter the Roman era, new data will also reduce bias correction systematics by constraining the distribution of SN shape and color as a function of redshift. We elaborate on a number of these potential improvements below.

NIR Dispersion Model
The NIR dispersion model measured both in this work and in the BayeSN SED model show that NIR distances offer significant independent leverage to constrain distance measurements compared to those obtained only with optical data. Further, NIR data, in combination with optical, provide additional leverage to improve constraints on both R V and A V , as demonstrated in recent analyses that use NIR data to break the degeneracy between host galaxy dust and intrinsic SN physics (Mandel et al. 2022;Thorp et al. 2021).
In this study, we see that even without shape and color corrections, just three epochs of NIR data at ∼10-20 days after maximum light are sufficient to constrain dark energy properties. It is possible that even fewer epochs or filters would still yield competitive cosmological constraints, though additional epochs and filters give valuable ability to avoid data outliers and to constrain dust properties. Much of the systematic uncertainty in this analysis stems from the fact that these SNe were selected for spectroscopic followup in the optical. In large, untargeted NIR samples like those from Roman, this source of systematic uncertainty will vanish.

Building a New, Well-Calibrated NIR SED Model
SN Ia analyses and standardization models in the NIR are currently limited by the sample sizes of wellcalibrated datasets, both photometric and spectroscopic, which introduce substantial calibration systematics and uncertainties in the precision of the standardization model training itself. New, well-calibrated NIR samples from CSP-II, SIRAH, VISTA, UKIRT (Konchady et al. 2021), and eventually Roman will be needed for improved model training. Simultaneous NIR spectroscopy will also be important for a better understanding of K-corrections. The SIRAH sample in particular, with extremely well-calibrated photometry and grism spectroscopy tied to ground-based NIR observations, will be a powerful link between precise NIR cosmological analyses from space and the ground.
This analysis shows that the NIR+optical measurement of w has similar precision to the optical measurement alone. Although we find that systematic uncertainties on w are lowest when we use NIR data in combination with optical, there is no improvement in the precision of cosmological parameter measurements when adding NIR data in this analysis. This is likely because our light curve fitting approaches are not yet optimally weighting the SN data at different wavelengths, a problem that previous studies have found can be addressed with improved uncertainty treatments in SN standardization models (Mandel et al. 2011(Mandel et al. , 2022.

The Mass Step
The underlying cause of the host galaxy mass step is a subject of debate, with numerous explorations of alternative global host galaxy dependencies (Hayden et al. 2013;Pan et al. 2014) or relationships between Hubble residuals and local host galaxy properties near the SN location (Rigault et al. 2013(Rigault et al. , 2015Jones et al. 2015aJones et al. , 2018aKim et al. 2018;Rigault et al. 2018;Roman et al. 2018;Kelsey et al. 2021). The degeneracy between physical effects, potential data reduction artifacts (Solak et al. 2021), and dust has made measuring the wavelength dependence of the host galaxy mass step, and other host galaxy dependencies, extremely important for next-generation cosmological measurements. Measurement of whether a step-like dependence of NIR distance measurement on global or local galaxy color, e.g., Roman et al. (2018); Jones et al. (2018a); Kelsey et al. (2021), would be particularly interesting to understand the effects of dust or progenitor ages on SN distance measurements. Previous measurements of the mass step in the NIR have been limited by sparse data (Ponder et al. 2021) and the need to include optical data with the NIR observations to constrain the best-fit parameters such as s BV and A V (Uddin et al. 2020). Flexible simulation-based approaches, e.g. Pierel et al. (2021), will be necessary to fully understand the impact of different SN-host relationships on cosmological parameter measurements.
Although the galaxy targeted nature of the CSP sample has made measuring the mass step in this work difficult (nearly all CSP galaxies are in the "high-mass" regime), we do find evidence suggesting that the mass step in the NIR is about the size of the mass step in the optical. We note that our results are not in tension with other recent NIR mass step results, including those of Johansson et al. (2021), who saw that the mass step was insignificant in the JH bands and less significant (∼2σ) in the Y band. Our distances primarily probe the rest-frame Y band, albeit with some i and J-band overlap, where Johansson et al. (2021) find a mass step of approximately 0.07 ± 0.03 mag. However, our results do not favor the conclusion of  that the mass step is caused by variations in dust properties, as the effect of variation in R V should decrease by a factor of ∼3 between e.g., rest-frame V and restframe Y . Further study of the origin of the mass step is clearly needed and larger infrared samples may prove useful.
Future measurements of this and other host galaxy dependencies as a function of wavelength may be necessary to separate intrinsic color, dust, other SN Ia physics, or even multiple SN Ia progenitor models. For example, when measuring the mass step, we did not fit for individual R V values as was done in Johansson et al. (2021), or simultaneously constrain the population distribution of R V values as was done in Thorp et al. (2021); these methods can help to disentangle the role of dust in the mass step's size. Although SNe Ia are used as an empirical tool for measuring distance, understanding the objects and their connection to stellar populations is also valuable in itself. The evolution of stellar populations and of SN Ia progenitors across the long span of cosmic time that Roman will provide gives an opportunity to understand the nature of SN progenitors and their environments and to use them for more precise cosmological measurements.

CONCLUSIONS
We present photometric measurements, a Hubble diagram, and NIR-only cosmological constraints from the RAISIN survey. SN candidates were observed in cycles 20 and 23 by HST and combined with low-z, NIRobserved SNe from CSP to yield a NIR Hubble diagram and a better understanding of cosmological parameter measurements in the NIR. From CSP data we build a new, optical-to-NIR dispersion model for SNe Ia to predict distance-dependent biases and measure the independence of SN Ia distances in the optical and NIR. We find that NIR-derived distances are well correlated with distances derived from optical data, but offer some independent information. The dispersion of our NIRonly distances is 0.18 mag, ∼25% larger than the Pantheon sample at 0.141 mag but not standardized using shape or color information. When measuring the host galaxy mass step after applying shape corrections to our distances, in order to match the procedures used in optical analyses, we measure a NIR mass step of 0.072 ± 0.041 mag at a step location of 10 dex and 0.123 ± 0.034 mag at a step location of 10.44 dex.
In combination with CMB constraints from Planck Collaboration et al. (2018) and assuming a flat wCDM model, we measure 1 + w = −0.17 ± 0.12 (stat+sys), consistent with the ΛCDM expectation of w = −1. This NIR measurement of w may embed unknown systematic uncertainties that will become apparent when future, larger NIR samples are analyzed. The dominant systematic uncertainties in this analysis stem from bias corrections, which will be improved by future SN standardization models that extend to the NIR; building these models will be facilitated by larger NIR sample sizes on the way from CSP-II, SIRAH, VISTA, and UKIRT. We find that combined cosmological parameter measurements from optical+NIR data appear to have lower systematic uncertainties than when using optical data alone. We also see shifts in w of up to 0.1 between optical, optical+NIR and NIR data alone, pointing to the possibility of inconsistency in the optical versus NIR SN standardization models.
We also measure H 0 = 75.9 ± 2.2 km s −1 Mpc −1 from a local distance ladder approach tied to 8 NIRobserved SNe Ia with Cepheid or TRGB distances versus H 0 = 71.2±3.8 km s −1 Mpc −1 using a so-called "inverse distance ladder" approach anchored to CMB measurements of the angular size of the sound horizon. Future NIR measurements can help to constrain whether the present dark energy density or systematic uncertainties due to SN Ia dust could play any role in the H 0 tension.
Finally, these data will help to better understand the behavior and standardization of SNe Ia in the NIR and assist the community in preparing for data from the Roman Space Telescope. Improved dispersion models will help improve distances by using an optical+NIR combination, and wavelength-dependent measurements of the dependence of SN Ia distance measurements on host galaxy properties will break the degeneracies with dust properties. When combined with future datasets from SIRAH and improved SN Ia standardization models that can be trained with this dataset, we hope the data published here will have legacy value for the SN Ia community in the Roman Space Telescope era 13 .

ACKNOWLEDGMENTS
We would like to acknowledge that much of the data and analysis in this manuscript were made possible through the work of our friend and colleague Dr. Andrew S. Friedman, who passed away during the final stages of this project. We would also like to thank G. Narayan and N. Drakos for useful discussions and their assistance with HST photometry and K-corrections. We would like to thank R. Kessler for implementing the latest SNooPy model in SNANA and C. Rojas-Bravo for assistance with the spectra. We would also like to thank the anonymous referee for many helpful suggestions. This paper includes data gathered with the 6.5 meter Magellan Telescopes located at Las Campanas Observatory, Chile. Some of the observations reported here were also obtained at the MMT Observatory, a joint facility of the University of Arizona and the Smithsonian Institution, and the W.M. Keck Observatory, which is operated as a scientific partnership among the California Institute of Technology, the University of California and the National Aeronautics and Space Administration. The Observatory was made possible by the generous financial support of the W.M. Keck Foundation. The authors also wish to recognize and acknowledge the very significant cultural role and reverence that the summit of Maunakea has always had within the indigenous Hawaiian community. We are most fortunate to have the opportunity to conduct observations from this mountain. This paper also uses observations obtained at the international Gemini Observatory, a program of NSF's NOIRLab, which is managed by the Association of Universities for Research in Astronomy (AURA) under a cooperative agreement with the National Science Foundation on behalf of the Gemini Observatory partnership: the National Science Foundation (United States), National Research Council (Canada), Agencia Nacional de Investigación y Desarrollo (Chile), Ministerio de Ciencia, Tecnología e Innovación (Argentina), Ministério da Ciência, Tecnologia, Inovações e Comunicações (Brazil), and Korea Astronomy and Space Science Institute (Republic of Korea). Finally, we include data acquired at the Anglo-Australian Telescope. We acknowledge the traditional owners of the land on which the AAT stands, the Gamilaroi people, and pay our respects to elders past and present. In this appendix, we describe the photometric measurement of RAISIN SNe from HST. The photometric measurements themselves are given in Appendix B.
First, we used AstroDrizzle to combine RAISIN FLT images downloaded from the Mikulski Archive for Space Telescopes (MAST) and create geometric distortion-corrected images for each epoch of each SN. For a given SN, each epoch of observations is drizzled, then the output, drizzled images are aligned to a reference image via tweakreg so that SN coordinates are consistent in all epochs, and then the data from each epoch are re-drizzled. Finally, we subtract the drizzled template image from each image containing the SN. We make use of sndrizpipe 14 , a pipeline for HST drizzling and image subtraction originally written for the CANDELS and CLASH HST Multi-Cycle Treasury programs (Grogin et al. 2011;Postman et al. 2012) that has continued being actively developed in the last few years. Images were drizzled to 0.11 pix −1 , slightly smaller than the native pixel scale of WFC3, to improve the resolution of the HST PSF (we found minimal differences between 0.09-0.13 pix −1 choices). The photometric zeropoints that we use are derived from the latest HST calibrations of Bohlin et al. (2020).
SN positions from discovery data or optical imaging are known to approximately 1 precision or better and are given in Table 5. We use centroiding algorithms from the hstphot 15 and PythonPhot (Jones et al. 2015b) packages to refine those positions. The best centroid was given by the weighted average of coordinates measured from F125W images for RAISIN1 and F160W images for RAISIN2. Because SN Ia are fainter in redder bands and host galaxies are often brighter, coordinates from F125W images tend to be more reliable when available.
Using the measured SN coordinates, we performed aperture photometry with a fixed 0.4 radius using the zeropoints and aperture corrections determined from standard star observations 16 . We found that due to the undersampled and somewhat variable HST PSF, aperture photometry was more reliable than PSF photometry for these images. There are few stars in a typical RAISIN frame from which to determine the PSF on an epoch-by-epoch basis and, likely due to the breathing of HST and the undersampled WFC3 PSF, we found the PSF was too variable for a PSF model based on archival data to be sufficient. By performing artificial star tests we found we were unable to perform PSF photometry without introducing mmag-level biases.
As the drizzling process produces pixels with correlated noise, uncertainties were estimated by planting 1000 fake stars per epoch at random locations. Fake stars were generated using a P330E PSF model (again created using hstphot) and each fake star had a magnitude equal to the measured SN mag in a given epoch. The dispersion in these fake star magnitudes was added in quadrature to the Poisson noise from the SN to give the total uncertainty due to sky background variation, correlated pixels, and Poisson noise. Given that our final pixel scale is near the native pixel size of WFC3-IR, the noise in the recovered stars is likely dominated by Poisson noise of the sky background.
The expected bias in the photometry given the centroid uncertainties derived above is derived by Rest et al. (2014), their Appendix D. This requires an estimate of how much the magnitude changes when the coordinate is incorrect by 1 pixel. P330E images show that this is approximately 0.022 mag for a 0.11 pixel scale, but typical centroid uncertainties are closer to ∼0.02 pix. We corrected for these biases, but in practice find that they are typically <1 mmag.

A.1. Host Galaxy Noise
SNe Ia on top of bright host galaxies can have biased measurements or underestimated errors. The additional noise introduced by a bright host galaxy tends to create significant subtraction residuals by increasing the effect of small PSF changes between the template and SN images. We occasionally see significant dipole-like subtraction artifacts.
To model this effect we used TinyTim to create up to 25 fake stars near the same host galaxy as each SN. We require that the fake stars are at least 0.7 arcsec from the SN position and from each other, which is >4 times the full width at half maximum of the F160W PSF. We cannot measure the noise at the exact location of the SN, but we can gain a statistical sense of the noise as a function of host galaxy surface brightness from these fake stars.
The results of this test are shown in Figure 20. Small error bars give the uncertainty on the average magnitude in each surface brightness bin, and large error bars give the dispersion of the fake magnitudes in each bin. To compute more realistic uncertainties, we then subtract the measured magnitude uncertainties in quadrature from the total magnitude dispersion of the fake stars. These quantities give the extra uncertainty that should be added in quadrature to each SN magnitude as a function of the brightness of the host galaxy underneath the SN. SNe with estimated host galaxy uncertainties of greater than 0.1 mag are removed from the sample; we chose this cut based on visual inspection, which shows that such SNe typically have subtraction artifacts likely due to misalignments or "breathing" of HST and are unlikely to provide reliable, unbiased photometry. Our team attempted applying convolutional subtraction algorithms Figure 20. Dispersion of fake star magnitudes as a function of host galaxy surface brightness at the SN location for F 125W (left) and F 160W (right). Large error errors (orange) show the dispersion while small errors (green) show the error on the mean. We restrict to the sample of SNe with predicted host galaxy error less than 0.1 mag and verify with visual inspections that the subtracted images for these SNe appear reliable.
to the data including HOTPANTS (Becker 2015) and ZOGY (Zackay et al. 2016) but did not achieve improved results for these bright hosts. Five SNe were removed from the final analysis based on excessive host galaxy noise.

A.2. Residual SN flux in Template Imaging
The RAISIN programs took HST template imaging between 129 to 285 days after the estimated time of maximum light for subtracting from the HST images with SN light. To test that our time interval between SN images and template images was sufficient, we used early-and late-time observations of SN 2012fr, which has Y J imaging at ∼150 days after maximum light from Contreras et al. (2018). With these data, we can extrapolate to later times to estimate the residual flux of RAISIN SNe at the time the template was taken.
For all RAISIN SNe, we use this estimated template flux to correct the magnitudes of RAISIN SN observations for which the closest rest-frame band is Y . Due to the faster decline of SNe in the J band, the J-band correction is expected to be negligible. To account for variability in the typical decline rate of SNe at late times in the NIR, we created a version of our analysis in which we conservatively increase the predicted late-time template flux by 0.5 mag for each RAISIN SN and correct the photometry for this revised template flux prediction. We include this analysis variant in our systematic error budget ( §3.7). Unfortunately, SN 2012fr is the only SN for which high-quality NIR data exist at ∼six months after maximum light. We predict that 36 of 47 RAISIN SNe would have declined by < 5 mag, giving expected magnitude corrections that are more than 1% for these SNe with a maximum predicted correction of ∼3.7%. We note that this source of uncertainty could be eliminated by re-visiting the sites of RAISIN SNe to obtain templates with zero flux from the SNe.

A.3. End-to-end Fake Star Tests
Finally, to verify both our photometric measurement technique and our drizzling procedure, we used TinyTim to simulate fake, geometrically-distorted stars in the FLT images. Then, we drizzled those images, subtracted them by a template, and measured output photometry on the sources detected in those images.
The results of these tests are shown in Figure 21. The median F125W and F160W biases are ≤1 mmag for bright sources, and ∼1% for all sources. However, because these tests use only a single epoch of data, this 1% bias may be due to inaccurate fake star centroids. In the real RAISIN data this bias is likely not present because we use multiple epochs to determine the precise PSF centroid. Figure 21. Measured fake star magnitudes minus simulated magnitudes as a function of mag for F 125W (left) and F 160W (right) with the median bias shown in red. We find just 1 mmag biases for sources brighter than 22 mag and ∼1% median biases overall likely due to centroiding errors that are corrected in the real data thanks to multiple observational epochs.

C. SIMULATING THE RAISIN SAMPLE
Here, we describe our simulations of the RAISIN sample. These simulations rely on an optical selection function that was determined for the PS1 and DES analyses Kessler et al. 2019). We modeled the sample under the assumption that the RAISIN SNe were selected in an unbiased way from the subset of Pan-STARRS and DES SNe with spectroscopic classifications, and the typical S/N of the RAISIN optical light curves compared to the full PS1/DES spectroscopic samples shows this to be a reasonable approximation. Though RAISIN targeted SNe in specific redshift ranges, those redshift ranges -0.25 < z < 0.45 for RAISIN1 and 0.4 < z < 0.6 for RAISIN2 -were within one standard deviation of the mean redshifts of the PS1 and DES SN spectroscopic surveys. Therefore, much of the necessary work for building these simulations for RAISIN was already undertaken by the MDS and DES teams. The main difference for RAISIN is the use of a new SN model, SNooPy, with the new optical-to-NIR dispersion model discussed in Section 3.3. We implemented the SNooPy model in SNANA by using the Python-based SNooPy code to generate a grid of model realizations as a function of s BV and A V , which SNANA may then use for both simulations and light curve fitting ( Figure 6). We then added a module to the SNANA simulation engine containing the SNooPy dispersion model, which generates correlated-random band-to-band offsets to be applied to the simulated magnitudes; this results in simulated distance-dependent biases. Both the SNooPy models and the SNooPy dispersion model are publicly available in SNANA.
For the low-z simulations, we do not have sky noise and zeropoint information for the original NIR observations. However, SNANA is able to estimate these quantities by using the magnitudes and uncertainties in the data themselves to create a "simulation library". Other low-z survey characteristics and selection effects were modeled in the Pantheon analysis ) and we use the magnitude-based SN detection efficiency determined in that work to simulate the low-z CSP sample here. For the high-z simulations, our simulation library is built from observations of the RAISIN SNe themselves, which allows us to simulate SNe at the specific redshifts of the RAISIN objects and control for possible differences between the mean survey observing conditions and the conditions during which the RAISIN SNe were observed.
Finally, we require that simulated distributions of x 1 and c parameters for SALT2 are replaced by distributions of the stretch parameter s BV and the extinction A V . Even though the light curves in NIR bands are less sensitive to s BV and A V than those in optical bands, a NIR-only approach with the RAISIN data does not allow shape/color to be corrected for and therefore the results will be very sensitive to sample-to-sample differences in these parameters and the optical selection effects that will bias the sample towards, e.g., high s BV and low A V . We therefore adapt the method of Scolnic & Kessler (2016) to determine the intrinsic s BV distribution for the low-z and high-z samples. The distribution of the s BV parameter is treated as an asymmetric Gaussian following Scolnic & Kessler (2016) and we estimate the s BV distribution by fitting s BV with our NIR data.
For A V , our baseline analysis assumes the mean intrinsic dust extinction is independent of redshift, but we vary this assumption in our systematic error budget. We simulate an exponential dust distribution with τ A V = 0.2 mag, which we find is a good approximate match to the optical+NIR data in Figure 22 and matches nominal dust distributions from the SN rates analysis of (Rodney et al. 2014). We also find that varying τ globally has a minimal impact on the redshift-dependence of our predicted bias corrections. Our two A V -related systematic uncertainty analysis variants reduce the A V scale length first by 0.05 for both samples (a small additional uncertainty) and second by 0.05 for the low-z sample only.
The low-z sample from CSP used a galaxy targeted approach to find SNe, which gives a sample of predominantly massive host galaxies and, as a result, increases the fraction of fast-declining SNe in the sample (Childress et al. 2013). For this reason we assume a priori that the low-z and RAISIN samples have different intrinsic stretch and color distributions. However, because the DES and MDS samples were selected in much the same way and to avoid statistical noise, we assume that the intrinsic distributions of shape and color in these SN samples is the same. This is an approximation that is necessary due to limited statistics, but the Scolnic & Kessler (2016) measurements of intrinsic population parameters in high-z samples show that this may be a good approximation. However, some evolution in population parameters (Nicolas et al. 2021) is expected, and larger samples both at high-and low-z in future analyses will allow constraints on this evolution as well as a more robust determination of the stretch and color distributions themselves.
The resulting stretch and color distributions are shown in Figure 22. We find that both the s BV and A V distributions are statistically consistent between low-and high-z samples, albeit with relatively large uncertainties. Figure 22. RAISIN simulations (red) compared to the data (black). Histograms of Hubble residuals, AV , sBV and maximum S/N are shown for the CSP sample (top), the RAISIN1 sample (middle) and the RAISIN2 sample (bottom). Various combinations of data are used for comparison purposes as indicated above each column. Simulations after applying a 1σ shift to parameters of the AV and sBV distributions are shown in blue and orange, respectively. Some discrepancy in maximum S/N may be due to statistical fluctuation due to the small sample sizes in this analysis or uncertainty in the chosen SNooPy model.
Finally, throughout this analysis we assume the default SNooPy total-to-selective extinction ratio of R V = 1.52 from Folatelli et al. (their Table 8;2010), which reduces dispersion about the Hubble diagram compared to choosing the Milky Way value of R V = 3.1 (Figure 18). The lower-than-expected value of R V may be due to the lack of an intrinsic color variation component of the SNooPy model (Mandel et al. 2022;Thorp et al. 2021) and because extinction is allowed to be (unphysically) negative in this analysis. By using the CSP third data release and allowing SNooPy to fit for R V , we find that SNooPy gives a median R V 2, though unfortunately we find the high-z RAISIN data are insufficient to constrain R V with SNooPy.
Because data at NIR wavelengths are a factor of ∼4-5 less sensitive to dust than the optical, distance error due to uncertainty in the value of R V is a sub-percent level effect and will not be a significant component of the error budget in this analysis. We note that in the NIR fitting itself, we treat A V as a constant, and therefore this effect will only change the distances in the bias correction stage.
The maximum S/N distributions are shown on the right side of Figure 22. Generally, the simulations expect slightly higher S/N than is observed in the data for high-z. Because the larger PS1 and DES samples have a similar S/N near maximum light compared to the subset with RAISIN observations − nearly identical for PS1 and marginally higher S/N in the DES RAISIN subset − we choose not to modify the simulations here. The low-z S/N, on the other hand, has an unusual shape that we were unable to reproduce, perhaps due to small sample sizes; in the full SNooPy sample, before applying our selection cuts, we do not see the same concentration around a narrow range of S/N. In future, larger NIR samples it will be much easier to understand such artifacts in the data. We are unsure whether the NIR sample was selected in the same way as the rest of the CSP sample, but we assume that for this analysis; the simulations developed for past analyses are therefore sufficient given systematic uncertainties on the s BV and A V distributions. A low-z sample that selects NIR-observed SNe in an unbiased way is a key potential improvement for future cosmological analyses.

C.1. Distance Measurement Methodologies
Using the RAISIN simulations, we test the assumption that NIR-only distance measurements are most precise when s BV and A V are fixed to a constant value. From simulations, we measure a Hubble residual RMS of 0.16 mag, slightly lower than the 0.19 mag measured from the real RAISIN data, when s BV and A V are fixed to a constant. When we attempt to use the NIR data alone to fit for A V , the Hubble residual RMS increases to 0.28 mag, and when we fit for both s BV and A V we measure an RMS of 0.25 mag. However, when we use the NIR data alone to fit for s BV we find that the RMS is unchanged from the s BV fixed case; this may be because the simulated SNooPy model does not include the increase in scatter at later phases that we observe and discuss in Section 5.3. However, despite this modest discrepancy, both the simulations and real RAISIN data appear to show no benefit from fitting to A V and/or s BV with NIR data alone.

D. HOST MASS MEASUREMENT
Host galaxy masses were reported by the CSP, DES, and MDS teams (Krisciunas et al. 2017;Smith et al. 2020b;Jones et al. 2018a). However, to ensure consistency between all methods, we estimated the host galaxy masses of SNe in the RAISIN and low-z samples ourselves. For the low-z sample we used data from GALEX (Martin et al. 2005), SDSS DR16 (Ahumada et al. 2019), PS1 DR2 (Flewelling et al. 2016), 2MASS (Skrutskie et al. 2006), while for the high-z sample we used only SDSS and PS1 as the other catalogs do not have sufficient depth for the faint, high-z RAISIN galaxies. For RAISIN2, we also used SN-free DES photometry from (Wiseman et al. 2020). Finally, because PS1 3π images may be contaminated by SN light, we used MDS single-season template images for RAISIN1 SNe. The stacked images make it possible to detect log(M * /M ) = 10 SN host galaxies with sufficient depth to measure masses on either side of the typical mass step location.
To find the host galaxy for each SN, we use SExtractor to determine the "directional light radius" (DLR) of between potential host galaxies and each SN (Sullivan et al. 2006;Gupta et al. 2016), a method that incorporates the size and orientation of each galaxy to determine which galaxy is the most likely host. Each most probable host was confirmed by eye and, thanks to HST imaging, we were able to determine which galaxies were the host without significant ambiguity. We then used SExtractor to measure the elliptical parameters of each galaxy in the r band, and then used elliptical aperture photometry to measure the magnitudes of each galaxy in each available bandpass. The aperture size was chosen to extend slightly beyond the isophotal radius determined by SExtractor, and was also extended to account for the increased PSF sizes of 2MASS and WISE. Any contamination by foreground stars was removed by using SExtractor to identify possible contaminants and masking those objects by setting the value of the pixels SExtractor deemed as belonging to those objects to the median value of the nearby pixels.
Once aperture photometry was measured, we used LePHARE (Arnouts & Ilbert 2011) with Bruzual & Charlot (2003) spectral templates and a Chabrier initial mass function (Chabrier 2003) to determine the stellar masses of each host galaxy. The templates include 9 exponentially decreasing star formation histories in three metallicity bins, and we allow E(B − V ) to vary from 0 to 0.4 in steps of 0.1 mag with a range of extinction laws. Uncertainties on these masses were estimated by Monte Carlo sampling of the photometry using the photometric uncertainties for each band and assuming a 1% error floor for bright galaxies. We note that mass estimation requires determining an absolute magnitude, making these mass estimates dependent on an assumed cosmology and the SN brightness residual. However, none of our mass estimates would change from <10 dex to >10 dex with modest adjustments in cosmological parameters; a 20% shift in w at a redshift of 0.6 would result in a systematic shift of just 0.03 dex relative to a low-z mass.
We compare these masses to estimates from Roman et al. (2018) for the low-z sample and Smith et al. (2020b) for the high-z sample. Out of the 15 SNe with host masses in Roman et al. (2018), only one SN, SN 2004ey, disagrees with our high versus low-mass designations. We do see a large median offset of −0.5 dex when subtracting the Roman et al. (2018) masses from ours, but find that this is due to our addition of GALEX and 2MASS data; running LePHARE on our optical-only measurements gives a marginally significant median difference of +0.21 ± 0.08 dex for these SNe. For DES masses, we find a median difference of just 0.06 dex between our masses and those of Smith et al. (2020b), with no disagreement between our high versus low-mass designations. Figure 8 shows histograms of the masses for CSP, RAISIN1 and RAISIN2 SNe; due to the targeted nature of the low-z CSP data, the CSP SNe are found in significantly more massive host galaxies.
For a given mass step location, with log(M * /M ) = 10 as the default, we assume the uncertainties on the mass estimates are Gaussian and use these uncertainties to compute the probability that each SN has a mass greater than or less than the value at the step location. We then built a maximum likelihood, Gaussian-mixture model that includes free parameters of the intrinsic dispersions for CSP, RAISIN1, and RAISIN2 (RAISIN2 has higher dispersion as it uses one filter instead of two for most SNe), median Hubble residuals at low, medium, and high redshifts to remove sensitivity to the cosmological model, and a single mass step parameter. This procedure largely follows Jones et al. (2018b). We correct for the measured mass step in our data and apply two systematic uncertainties based on the value and location of the mass step as discussed in Section 3.7.4.

E. DISTANCE MEASUREMENTS FOR RAISIN AND LOW-Z SNE
This appendix contains NIR distances and optical+NIR stretch (s BV ) and A V measurements for RAISIN, computed using SNooPy and assuming an R V of 1.52. Note. Distance moduli from RAISIN SNe, calibrated to our best-fit H0 of 75.4 km s −1 Mpc −1 . The "Raw Distance" (column 3) does not include the bias correction (column 4), which is added to the raw distance prior to cosmological parameter fitting. The "Rest-frame Bands" column indicates the rest-frame SNooPy templates that were used to fit the RAISIN observations.