ATOCA: an algorithm to treat order contamination. Application to the NIRISS SOSS mode

After a successful launch, the James Webb Space Telescope is preparing to undertake one of its principal missions, the characterization of the atmospheres of exoplanets. The Single Object Slitless Spectroscopy (SOSS) mode of the Near Infrared Imager and Slitless Spectrograph (NIRISS) is the only observing mode that has been specifically designed for this objective. It features a wide simultaneous spectral range (0.6--2.8\,\micron) through two spectral diffraction orders. However, due to mechanical constraints, these two orders overlap slightly over a short range, potentially introducing a ``contamination'' signal in the extracted spectrum. We show that for a typical box extraction, this contaminating signal amounts to 1\% or less over the 1.6--2.8\,\micron\ range (order 1), and up to 1\% over the 0.85--0.95\,\micron\ range (order 2). For observations of exoplanet atmospheres (transits, eclipses or phase curves) where only temporal variations in flux matter, the contamination signal typically biases the results by order of 1\% of the planetary atmosphere spectral features strength. To address this problem, we developed the Algorithm to Treat Order ContAmination (ATOCA). By constructing a linear model of each pixel on the detector, treating the underlying incident spectrum as a free variable, ATOCA is able to perform a simultaneous extraction of both orders. We show that, given appropriate estimates of the spatial trace profiles, the throughputs, the wavelength solutions, as well as the spectral resolution kernels for each order, it is possible to obtain an extracted spectrum accurate to within 10\,ppm over the full spectral range.


INTRODUCTION
One of the key observing modes of the Near Infrared and Slitless Spectrograph (NIRISS, Doyon et al., in prep) onboard the James Webb Space Telescope (JWST) is the Single Object Slitless Spectroscopy (SOSS) mode (Albert et al., in prep). It enables timeseries spectroscopy in the 0.6-2.8 µm range for bright targets, which is of particular use for exoplanet transit spectroscopy. Indeed, simulations have demonstrated SOSS as a key mode to use with JWST on the brightest exoplanet targets (Greene et al. 2016;Batalha & Line 2017;Louie et al. 2018;Schlawin et al. 2018) and it has been selected by multiple Cycle 1 programs, including the Early Release Science program DD-ERS 1366 (Batalha et al. 2017;Bean et al. 2018), as well as many Guaranteed Time Observations, e.g., GTO 1201 (Lafreniere 2017) and the General Observer Programs 1935 (Kempton et al. 2021), 2062 (Mayo et al. 2021), 2113 (Espinoza et al. 2021b), 2589 (Lim et al. 2021), 2594 (Spake et al. 2021) and 2722 (Madhusudhan et al. 2021). SOSS uses the GR700XD cross-dispersion grating prism (Doyon et al., in prep) in the pupil wheel of NIRISS to produce a series of three spectral traces: order 1 (0.83-2.8 µm), order 2 (0.6-1.4 µm) and order 3 (0.6-0.95 µm). In practice, order 3 does not warrant much consideration due to its faint signal and the fact that it does not increase the wavelength domain. A slight (22 pixel wide) defocus along the spatial axis is purposely included to enable observations of bright targets without saturating the detector pixels. Mechanical constraints in the thickness of the GR700XD element at the design phase prevented the first and second orders from being fully separated, resulting in an overlap by about half the trace width towards the red wavelength ends of the traces (See Figure 1) As a result, established methods for spectrum extraction cannot be applied directly to the regions affected by contamination. Typically, at high signal-to-noise, a box-extraction method (de Boer & Snijders 1981) is preferred, which is performed by simply summing over the spatial axis all pixels located within a fixed-width aperture. This technique has been utilized in many spacebased relative spectral measurements of exoplanets to date, for example: transits, eclipses or phase curves (e.g., Deming et al. 2013;Wakeford et al. 2013;Sing et al. 2015;Evans et al. 2017) as well as in many groundbased observations (e.g., Jordán et al. 2013;Diamond-Lowe et al. 2018). The main advantages are the fact that a box extraction is easy to implement and that it is less prone to modelling errors. On the other hand, at lower signal-to-noise, an optimal extraction (Horne 1986;Robertson 1986;Marsh 1989) is often better. Indeed, by weighting the pixels according to their relative contribution to the signal, a better precision can be reached. This requires the determination of a spatial profile, which is a delicate task that can introduce biases in the resulting spectra (Horne 1986;Jordán et al. 2013). Nevertheless, it is still used in the exoplanet community to perform spectrophotometric measurements from space (e.g., Kreidberg et al. 2014;Stevenson & Fowler 2019) or from the ground (e.g., Berta et al. 2011;Stevenson et al. 2014), with comparable results. However, these two methods have no mechanism to distinguish between contributions from overlapping traces from different sources or diffraction orders.
Yet, the challenge of extracting spectra from blended sources is not unprecedented. It was needed notably in the context of long-slit spectroscopy for several science applications, such as the observation of galactic nuclei (Lucy & Walsh 2003) or crowded star fields (Hynes 2002). In fact, a task as common as a simple sky subtraction is by itself a type of decontamination. Hence, various techniques have been proposed over the past two decades (e.g., Hynes 2002;Khmil & Surdej 2002;Lucy & Walsh 2003;Bolton & Schlegel 2010). However, due to the particularity of the NIRISS SOSS mode and the precision it requires, it was necessary to develop a dedicated algorithm.
In this article, we present ATOCA; an algorithm designed to properly decontaminate and extract overlapping orders. Though the methods that make up the ATOCA algorithm can be applied generally to the problem of extracting overlapping spectral orders, we focus here on the NIRISS/SOSS mode of JWST. Proper extraction of SOSS observations was our primary motivation for creating ATOCA, and the algorithm has been made part of the official JWST pipeline 1 -the data management system (DMS) -as part of the stage 2 spectral extraction step.
The article is divided as follow: Section 2 presents an estimate of the level of contamination that is expected with the NIRISS/SOSS mode. Then, the algorithm and its implementation are presented in sections 3 and 4, followed by section 5 where we evaluate the performance of the decontamination.

THE SOSS TRACE OVERLAP PROBLEM
The optics of the SOSS mode were designed with strong mechanical constraints, one of which, the total thickness of the GR700XD element, prevented the cross-dispersing prism from being sufficiently inclined to cleanly separate spectral orders 1 and 2 (see Albert et al. in prep). As a result, the red end of order 2 (λ ≥ 1.1µm) partially overlaps with that of order 1 (λ ≥ 2.2µm) (see Figure 1). This cross contamination of the signals is a major issue during spectrum extraction and will bias results using the simple aperture-based methods discussed in Section 1. The amount of contamination can be characterized by measuring the contaminating signal present in the extraction region, F contam , allowing for the definition of a contamination factor, c order , via the following ratio: Here, F order is the flux that would be extracted from the targeted order if there was no contamination. For example, for the first order, F contam is given by the signal from the second order that is overlapping with the region of interest. Based on laboratory measurements of the blaze function, detector efficiency, and end-to-end throughput of the telescope and NIRISS instrument, it is possible to simulate this contamination factor for any given extraction method. Figure 2 shows the contamination factors, c, for a standard box extraction using a 25-pixel aperture. The simulations were made using ATOCA and are described in Section 5.1. The aperture width was determined by minimizing the dispersion of the white light-curve over the contaminated band-pass (2.0-2.8 µm). To get a realistic estimation of the dispersion, we had to consider the expected jitter on the telescope pointing ∼ 5 mas (see Albert et al., in prep.) which has the effect to increase the width of the required aperture. The incident fluxes at different stellar effective temperatures were modeled using PHOENIX HiRes synthetic spectra (Husser et al. 2013). Note also that the contamination factor is presented in % to make a distinction with the transit depth (which is is typically in ppm), both being relative quantities.
For the first diffraction order, most of the affected wavelength range exhibits levels of contamination below 0.1 %, until 2.6 µm, after which c 1 increases exponentially up to 2 %. The stellar effective temperature also has a significant effect, since the second order covers shorter wavelengths (0.6 ≤ λ ≤ 1.4 µm) than the first order (0.83 ≤ λ ≤ 2.8 µm), hence a star with a stronger relative flux contribution at short wavelengths will be more affected. The second order, on the other hand, suffers more drastically from contamination, reaching levels above 100%. Fortunately, the wavelength domain affected (λ ≥ 0.85 µm) is shared with order 1 (and is within the region where order 1 does not suffer from contamination), so very little information is lost. The wavelengths towards the blue end of the spectrum (λ ≤ 0.85 µm) and complementary to order 1 correspond to the part of the detector where the orders' spatial positions deviate from one another, creating a drop in the contamination levels.
The above discussion holds for any absolute flux measurement, but what is the impact for the intended application of the SOSS mode -exoplanet time-serieswhose measurements are relative?
The fluxes measured in order 1 will be a combination of the true flux in that order, F 1 , contaminated by some flux from order 2, F contam . Assuming a simplified tophat model for an exoplanet transit (no limb darkening, instantaneous ingress and egress, non-grazing) this flux can be calculated, for cases in and out of transit, via: (3) where F out and F in are respectively the mean flux outside of transit and during transit measured by an extraction around the first order's trace, d is the transit depth due to the opaque planet (i.e., without considering an atmosphere) and δ 1 (λ 1 ) and δ 2 (λ 2 ) are the wavelengthdependent transit depths due to the planet's atmosphere for orders 1 and 2, respectively. λ 1 and λ 2 are the wavelength solutions at each order, which are both a function of the column position, x, such that λ 1 (x) and λ 2 (x). The transit depth measured on a contaminated trace is, by definition: Recalling the order contamination factor from equation 1, then the transit depth can be written: In the case where there is no chromatic variation in the atmospheric signal (i.e., a flat transmission spectrum), δ 2 (λ 2 ) = δ 1 (λ 1 ) so D = d + δ 1 (λ 1 ). Therefore, contamination has no bearing on the retrieved transit 0.6 0.7 0.8 0.9 1.0  Figure 1. High signal-to-noise stack of a SOSS mode observation on a tungsten lamp obtained at cryogenic vacuum testing (CV3) of the telescope and instruments. Order 1 is the most apparent feature extending from 0.83 µm on the right to 2.8 µm on the left portion of the image. The second order can be seen starting at 0.6 µm at the top right part of the image and extends out to 1.4 µm. At approximately 1.1 µm the second order is significantly blended with the first order. The faint third order can be observed above the 2nd order. The overlap between the spectral orders 1 and 2 on the left side of the image complicates the spectrum extraction and motivates this paper. Since the order 2 covers shorter wavelengths than order 1, this problem should be even more striking in actual astrophysical targets which are warmer (T ≥ 3000 K) than the tungsten filament used in the laboratory (1500 K). The vertical lines were added to mark the wavelengths, from right to left, at 0.9, 1.2, 1.6, 2.0, 2.4, 2.8 µm for order 1 (in black) and at 0.6, 0.8, 1.0, 1.2, 1.4 µm for order 2 (in red).   depth. In other words, the second order contaminating signal changes by exactly the same relative amount during transit as the first order signal.
In the case where the atmospheric signal is different at the two overlapping wavelengths, then the difference modulated by c 1 /(1 + c 1 ) will affect the transit depth (i.e., the last term in equation 5). To make a distinction with the contamination factor c, we will use the name "transit contamination" to refer to this last term of equation 5. Generally, ∆ will be about the same order of magnitude as δ 1 and δ 2 , so a good estimation can be drawn simply from the contamination term. Moreover, for order 1, c will be small, hence c/(c + 1) ≈ c. So, concerning the first order's relative measurement, the spu-rious signal can be approximated to less than 1% (i.e., c 1 ) of the chromatic contribution of the transit signal, δ 1 . For example, if we take a hypothetical transmission spectrum with a spectral feature for the first order of δ 1 (2.7µm) = 300 ppm above the mean transit depth d. Let's also assume that there is a spectral feature from the second order at the corresponding columns (see Figure 1) of δ 2 (1.35µm) = −200 ppm, i.e., 200 ppm below the mean transit depth. This would result in a difference of ∆(x) = 500 ppm and the resulting contamination signal will be around 1.5 ppm, considering a contamination factor of c 1 ≈ 3 % (see Figure 2).
Nevertheless, to fully grasp the importance of this effect, we computed the resulting contamination signal in transmission for a variety of exoplanets, most of them being part of the NIRISS Exploration of the Atmospheric diversity of Transiting exoplanets (NEAT)  . Expected contamination levels during transit for a diverse sample of exoplanets. Each target is shifted vertically by 2 ppm for the first order (left panel) and by 50 ppm for the second order (right panel). A horizontal dotted line is drawn for each model to mark its zero point. The color of the lines refers to the effective temperature of the host star, which influences the level of contamination. As in Figure 2, these estimates were made assuming a 25-pixel-wide box extraction and the wavelength domain was limited for the same reasons.
GTO program (Lafreniere 2017). The results are shown in Figure 3. The transit models for each planet were produced using the SCARLET atmosphere framework (Benneke & Seager 2012, 2013Benneke 2015) assuming, for simplicity, cloud-free atmospheres with solar elemental abundances and chemical equilibrium. These assumptions generally lead to stronger signals, hence upper limits on the estimates. For the first order, the transit contamination is constrained below 8 ppm for all targets except WASP-107 b, for which it reaches almost −12 ppm. This corresponds to ≈1% of the planet atmospheric signal, as expected. For example, the transmission spectrum of WASP-107 b presents spectroscopic variations around 2500 ppm (see Figure 9), which would lead to an expected transit contamination signal of 25 ppm, not far from 12 ppm value. On the other hand, the second order is much more affected, with levels around 100 ppm or more in the longer wavelength range shown in Figure 3. This contamination comes from the wings of the first order's spatial profile. The longer wavelengths are not presented since the second order becomes almost completely diluted into the first order (see Figure 1). On the other side, below 0.8 µm, where the second order contributes unique wavelength coverage, the transit contamination seems to vanish. However, this drastic drop in contamination is attributable to the simulations, as discussed in Section 5.2. Whilst the systematic error on the transit signal may seems small, it must not be taken lightly and should be prevented using the ATOCA extraction method presented in the next section. We also want to emphasize that this is a systematic error, and not a randomly dis-tributed source of noise such as shot noise. These estimations could also be worsened by any other relative signals that depends on wavelength; like limb darkening and stellar contamination from unocculted regions (e.g. Rackham et al. 2018;Genest et al. 2022). Moreover, the examples presented here assume that the trace shape is perfectly stable within a whole time-series and that the trace position is varying within the expectation, i.e., following a random normal distribution with a dispersion of 5 mas. In the context of real observations, these assumptions may not be true due to the finite pointing precision of the Fine Guidance Sensor (FGS) and possible variations in the point spread function (breathing effects, wavefront variations, etc.). Therefore, to obtain the most stable transmission spectrum, one may need to increase the width of the aperture (along the spatial axis) used for the box extraction (e.g., Diamond-Lowe et al. 2018;Mikal-Evans et al. 2021) in order to minimize the variation due to the signal moving in and out of the aperture, hence increase the contribution of the contaminating order. Furthermore, even in the context of standard extractions, ATOCA will help to calibrate and extract the one-dimensional spectra. The contamination would also need to be properly characterized by identifying the contribution of each order to ensure the reliability of any results. Finally, extraction using ATOCA will ensure that science applications needing absolute flux calibration can be realized with SOSS.

DESCRIPTION OF THE EXTRACTION METHOD
The core idea behind ATOCA is to determine the underlying flux by fitting each order directly and simultaneously on the detector image, pixel by pixel. To do so, we first need to establish a linear model of each individual pixel with the flux as independent variable. More precisely, we need to discretize the flux by evaluating it on a wavelength grid. Each of the nodes (or elements) of the resulting flux array is an independent variable. Then, by minimizing the χ 2 with respect to each of these nodes, we are able to express the flux as the solution of a linear system, which ultimately enables us to explicitly extract it. Hence, no forward modelling of the flux is needed for an extraction. However, to be accurate, it requires a thorough knowledge of the detector's properties. The formalism of the algorithm is described below.
For the following equations, each pixel that will be used for the fit will be labeled by the index i and the total number of relevant pixels is given by N i . There is no need to account for the two-dimensional nature of the detector with additional indices. Each diffraction order also needs to be identified, here using the index n. Now, to determine the flux falling on each pixel, we need for each order: 1) the wavelength solution, 2) the spectral throughput 2 , and 3) the spatial throughput. From the wavelength solution, we can define the central wavelength of a pixel i at order n as λ ni . It will also be useful to define λ + ni and λ − ni ; the wavelengths at the pixel borders in the spectral direction, and ∆λ ni = λ + ni − λ − ni ; the pixel spectral coverage. The spectral and spatial throughputs are given respectively by the function T n (λ) and the constant P ni . Take notice that the former depends on the wavelength but not explicitly on the pixel, whereas the opposite is true for the latter. Finally, let the spectral flux density of the target incident upon the spectrograph be f (λ).
It is also important to consider that the flux density is seen by each order at a different resolution, so for an order n, we need an additional input: 4) the spectral resolution kernel κ(λ, λ). It specifies the convolved flux f in relation with an incident flux through the equatioñ Figure 4 presents some visualizations of the aforementioned quantities. The difference between the two orders after convolution with the resolution kernels becomes apparent in the overlapping wavelength range, where the first order (blue curve) is not superimposed perfectly on the second order (orange curve).

The model
With all this in place, we are now able to define a model of the detector. The number of photo-electrons detected by pixel i can be represented, up to a multiplicative constant, as where a ni (λ) accounts for all the coefficients that are not the flux. Note that the summation is made over all orders n that contribute to the signal measured by a given pixel i. In the case of the NIRISS/SOSS mode, the index n covers only the first and second orders; the third order is not considered since it does not cover the same pixels. To translate this model into a numerical form,  we can define a grid where f is projected, labeled by the index k so that f (λ k ) = f k and ∆λ k = λ k+1 − λ k . The length of the discretized grid would then be given by N k , so that 1 ≤ k ≤ N k . Similarly, Nk is the length of the convolved fluxf , so that 1 ≤k ≤ Nk. We also need a numerical form of this integral. There are multiple ways to do this, but for ATOCA, we use the trapezoidal method on a specified grid as illustrated in Figure 5. The details of this method are in Section B. Independently of the chosen integration technique, the numerical form of the integral will look like, with w ink given by the integration method. To link the diffraction orders, we want to write these equations according to the underlying flux f (λ), following equation 7. In the numerical form, with κ nkk being the coefficients of the convolution kernel at order n. The numerical integration method as well as the kernel are comprised in them.
Finally, we have that This equation can be written in a more intuitive matrix form as (12) To simplify the notation again, we can put all the coefficients for each order in single matrices b n with dimen- and add them together in one matrix B to have a final model of each valid pixel given by This result is one of the main utilities of ATOCA, which is a linear model of the full NIRISS/SOSS detector. One could use it to generate quick simulations, given a model of the incident flux.

Solving for f
Now that we have a model of the intensity at each relevant pixel, we can link their individual measured intensity D i , with f i by fitting directly the pixel model on the detector using a χ 2 minimization. Given the following equation, with D being the array of measured intensities on each pixels i and σ the array of their uncertainties, then the best solution can be found by imposing The detailed calculations found in the Appendix A lead to the following system of equations: This system can now be solved for f . A comparison with the optimal extraction method (Horne 1986) is presented in Appendix C.
To precisely estimate the integral representing each pixel, an oversampled numerical grid is required (see Figure 5 and paragraph Grid sampling of Section 3.4). Hence, for a given pixel, the solution of equation 17 would be highly degenerate. In many situations, the system will still be invertible since many pixels can cover slightly different wavelength ranges, but the solutions will then be extremely unstable. This is an illconditioned system, where a slight change in the observation vector D could cause large differences in the solution f . To circumvent this problem, the system needs to be regularized.

Regularization
There are multiple ways to perform regularization. The one chosen here is Tikhonov regularization (Tikhonov 1963). It is also referred as Phillip-Twomey's regularization (Phillips 1962) and ridgeregression (Horel 1962). This technique has been used in astrophysics in a variety of similar situations requiring the inversion of an integral equation (e.g., Kunasz et al. 1973;Thompson 1990). In fact, it is part of some advanced spectral extraction methods for fiber-fed spectrographs that were identified among the most effective methods (Min et al. 2020). It is also used in the context of spline interpolation of noisy data (Green & Silverman 1993;Hastie & Tibshirani 1990). The main idea is to add a regularization term to the linear χ 2 (equation 15), yielding the following equation: Here, α is a Lagrange multiplier and Γ is a linear operator that adds a "cost" depending on the nature of the solution. Generally, it is used to favor smoother solutions, and hence reducing the level of overfitting. We can obtain the new solution by minimizing the system in the same fashion as before, differentiating with respect to f k , with the following result: In this case, the Tikhonov matrix Γ was chosen to be the first derivative operator as in Li et al. (2015) or Piskunov et al. (2021). The choice of α can be done in many different ways. Usually, the general idea is to find a good balance between the regularization term and the χ 2 which can be done using the L-curve criterion (e.g., Hansen 1992) or generalized cross-validation techniques, GCV (Golub et al. 1979;Wahba 1977). The L-curve technique has the advantage of being robust, especially against correlated noise (Hansen & O'Leary 1993) compared to the GCV. However, it tends to over-smooth the solution, which is not optimal. As for the GCV, it is too computationally intensive in the context of largescale problems. However, in the present situation, the objective is not the direct product of the regularization, i.e., the underlying flux, but rather the modeling of the pixel after re-integration (this will be discussed in sections 3.4 paragraph Proper output and 4). Thus, it is not necessary to find the most physically accurate solution. In fact, what is needed is the smoothest solution that, once re-projected on the detector, fits the observations within the uncertainties. Consequently, we defined custom criteria to determine α with the objective to keep the dependency of the solution's sensitivity to the scaling factor lower than the expected noise. Mathematically, this comes back to defining a threshold on the derivative of the χ 2 with respect to log α. The same convergence criterion was used by Khmil & Surdej (2002) in a similar context. subject to computational errors. One way to define the grid would be to use a grid representative of the native pixel sampling and to oversample each interval of this grid by a certain factor. This will however create unnecessarily large systems and increase the computation time. Moreover, for the regularization method to be well-behaved, it is better to have errors of the same order for each node. Indeed, as highlighted by Puetter et al. (2005), the dynamical range needs to be constrained, lest some regions will be overfitted and others underfitted. Given all this, we opted for an irregular grid designed to make the magnitude of the integration error between subsequent nodes more uniform. To estimate the integration error on each node, we compared the result of a trapezoidal integration (as it is done in ATOCA) with a more precise Romberg's integration. The intervals with an estimated error higher than a specified tolerance were oversampled by a factor of two iteratively until the tolerance was satisfied. This method only requires an estimate of the function to be integrated, which can be given by a user or directly estimated from the data. An example of an uneven grid is shown in Figure 5, represented by the vertical dotted lines.

Other considerations
Proper output -As mentioned before, the underlying flux f is not the end product of an extraction. Indeed, since it has a resolution higher than both orders on the detector, f will be degenerate. In fact, solving for f is a deconvolution, which is subject to instabilities or artifacts (Bolton & Schlegel 2010). Thus, one will have to reconvolve the result to get rid of these effects. In this case, the underlying flux has to be integrated on bins representing a pixel grid. This can be done by invok-ing equation 14 and reconstructing the detector, which can be used to assess the quality of the detector modeling by examining the residuals. It could also be used to rebuild each order independently using the b n matrices from equation 13. Furthermore, it is possible to get a one-dimensional spectrum by re-integrating on a grid representative of the pixel sampling. In fact, this is equivalent to reconstructing a single row of the detector. In this context, without any dimension in the cross-dispersion axis, the spatial profile would not be relevant anymore and its value should be set to unity.
Wavelength distortions -In a variety of situations, the wavelength solution will not be constant along the axis perpendicular to the dispersion. Generally, this can be due to differential refraction in the Earth's atmosphere and imperfect spectrograph optics (Horne 1986). This effect is seen in many spectrographs (e.g. Bolton & Schlegel 2010;Piskunov et al. 2021). It can also be caused by observing techniques like the spatial-scan mode of the Hubble Space Telescope's Wide Field Camera 3 (Deming et al. 2013). In the case of the NIRISS SOSS mode, a tilt is present due to the grism configuration (Albert et al., in prep.). Standard extraction procedures will either neglect this distortion (Sing et al. 2015) or resample the detector image by interpolating along the dispersion axis (e.g., Kreidberg et al. 2014). ATOCA has the advantage of implicitly accounting for this distortion by treating each pixel individually, and using the full 2D wavelength solution.

IMPLEMENTATION
ATOCA has been implemented in python 3.8 as an option for the spectral extraction of NIRISS/SOSS observations in the JWST Data Management System (DMS). It is part of the Extract1dStep 3 of the spec2pipeline calibration step. A description of the inputs and an example Jupyter Notebook can be found at https://github.com/AntoineDarveau/atoca demo.
However, the end product is not the spectrum extracted with this method, as described in the Section 3.4 paragraph Proper output. Indeed, one caveat regarding the product of an extraction with ATOCA is the dependency on the accuracy of the model. Just like the optimal extraction, the method needs a relative precision on the spatial profile much better than the corresponding data in order to make an unbiased spectral extraction (Horne 1986). Moreover, inconsistencies between the spectral orders in the wavelength  . Decontamination steps for the first and second spectral orders. From top to bottom: A) processed image of the detector with only signal from the targeted object, B) selection of relevant pixels, C) reconstructed image of the contaminating orders after a fit using ATOCA, D) decontaminated image of the first order after subtraction of the contaminating order; the data is now ready for a standard box extraction around first order (delimited by the green lines). Panels E and F are the same steps as C and D, but the extraction of the second order. The color scale is logarithmic, with the lowest intensity in black and the highest in yellow. The pixels that are not considered in the analysis are in white. solution, the throughput or the resolution kernels could result in an intermediate solution between the two orders that will not satisfy the expected accuracy. However, in the case of contaminated spectra, it is possible to circumvent this problem by reconstructing the trace of each individual order with the b n matrices of equation 13, which are then used to decontaminate the detector image; the process is shown in Figure 6. This allows for a more classical extraction, like a box extraction, to be performed afterwards on each decontaminated trace. This technique has the advantage of reducing the dependency on the accuracy of the model. For instance, as shown in Figure 6, the modelling will only affect contaminated columns of the first order, leaving the rest untouched. This technique was preferred in the context of NIRISS/SOSS mode, although the ATOCA spectra are also available as a byproduct of the detector fit. Another output of the ATOCA is the model of each order (1st and 2nd) that were used for the decontamination. This could be useful to assess the quality of the fit by simply looking at the residuals. It would also be required to quantify the level of contamination (see Section 2). Furthermore, this model of the detector can be used to assign values to bad pixels located inside the aperture without the need for a separate outlier correction routine.
As shown in Figure 6, only the well-behaved pixels that will be extracted are considered in the application of ATOCA. This is in order to reduce the computational time and to make sure that the detector fit is not biased by superfluous regions of the detector. However, even with pixel selection, the size of the matrix B in equation A5 is still considerable, with a size of N pixel × N k . For example, in the realistic scenario of a 40-pixel aperture and a reasonable oversampling of the wavelength grid (tolerance of 10 −3 per pixel), the dimension of the matrix will be around 150, 000×6000 = 9×10 8 . To overcome this problem, we took advantage of the fact that most elements of the matrix are null. Indeed, each order (or source) modeled by the matrix B is represented as a block-diagonal that can be shifted with respect to the main diagonal. This enabled us to use the scipy (Virtanen et al. 2020) sparse matrices and drastically lessen the computational time and the amount of memory needed.
The implementation required the addition of new reference files to the Calibration Reference Data System (CRDS) 4 used by the DMS, since ATOCA needs, for each order, the two-dimensional wavelength solutions, 4 https://jwst-crds.stsci.edu/ the spectral resolution kernels, the spatial profiles, and the spectral throughputs. The one-dimensional wavelength solutions and the throughputs are already a product of the standard calibrations, so the two-dimensional wavelength maps can be created simply by applying the tilt to the existing solution. The convolution kernels were determined from monochromatic point spread functions generated with WebbPSF. The resulting images were rectified to remove the tilt and summed over the spatial axis to keep only the spectral dependency. The most demanding input is still the determination of the spatial profile, but techniques typically used for optimal extractions could be applied directly to most of the spectral ranges, except for the overlapping parts. An algorithm to estimate the spatial profiles of both orders within the contaminated region is currently in development (Radica et al., in prep), and will be made available to complement ATOCA before the release of Cycle 1 data.
As mentioned in section 3.4, ATOCA needs an estimate of the underlying flux f k to generate an oversampled grid. In the context of the NIRISS/SOSS mode, this is done by extracting the underlying flux f k for each order separately with only the most contaminated pixels masked. These rough extractions are done over a grid at native pixel sampling, which ensures the stability of the solution and removes the need for any regularization. Precision will be lost in this process and the contamination will not be treated correctly, but it is sufficient to generate the estimate. The latter is also used to provide a first estimate of the regularization factor, which is refined afterwards as described in section 3.3. This step can take some time depending of the precision needed for the oversampled grid and the number of relevant pixels. Fortunately, it only needs to be performed once for a given time series, since the underlying spectrum will not vary enough to justify a different level of regularization.
In realistic observations, the position of the trace will change slightly between visits due to the angle of the pupil wheel and variation in the target acquisition, which will alter the wavelength solution as well as the spatial profile. These changes will effectively take the form of a rotation and a spatial shift of these reference files and are implemented as input parameters (rotation and translation) captured by the keyword soss transform = [x, y, theta]. These parameters can be either specified by the user or determined within Extract1dStep by fitting the measured trace centroid.

Simulations
Two types of simulations were used in the context of this work: simulations produced with ATOCA itself (ATOCA simulations), and those produced by the instrument development team (IDT simulations). The first type takes advantage of the ATOCA capability to directly model the detector image using equation A5, given an input spectrum. Hence it is possible to use it to generate simplistic simulations to validate the internal consistency of the method. To make sure that the numerical precision was not an issue, the wavelength grid used for the simulation was oversampled to limit the integration error over each pixel below ∼1 ppm. We only included photon and background noise, so neither bad pixels, 1/f noise, nor cosmic ray hits were taken into account. The reference files were based on the best current knowledge of the NIRISS/SOSS mode, as described in section 4. We used the throughput and the one-dimensional wavelength solution estimated or measured in the lab by the NIRISS instrument development team. The spatial profiles were determined from the IDT simulations The IDT simulations (see Albert et al, in prep) are made by distributing an incident flux directly on an oversampled image using a trace as wide as a single oversampled pixel. This signal is then convolved in two dimensions with a grid of monochromatic kernels from WebbPSF and re-sampled at the native pixel resolution. These simulations were used to test the versatility and robustness of ATOCA on more realistic simulations. They also include the 1/f noise, the effects of flatfielding and bad pixels. In all situations, the incident stellar fluxes are taken from high-resolution PHOENIX synthetic spectra (Husser et al. 2013) and the transit models from SCARLET (Benneke 2015).

Results
For the first series of validations, the simulations were made with ATOCA itself to assess the performances of the decontamination and internal consistency. For each decontamination, the estimated relative tolerance of the wavelength grid had to be less than 10 −3 , a level of precision that implies a reasonable grid length, and hence moderate computational time. We also do not expect the precision of a single pixel to get higher than this, which corresponds to a SNR of 1000.
A first test was done on a single exposure by comparing each extraction to an equivalent simulation free of contamination. Different maximum pixel signal-tonoise ratios were tested, ranging from 200 to 10,000. This verifies our ability to decontaminate observations with the most extreme levels of precision planned for the NIRISS/SOSS mode. An example of the results is shown in Figure 7 for a star with an effective temperature of 2300 K and for a SNR of 10,000. The decontamination performance proved to be very effective at removing the contamination from the second order. In the case presented in Figure 7, ATOCA was able to go from a contamination of ∼ 10000 ppm, equivalent to a hundred times the expected uncertainties, to virtually no contamination. The residuals fall within the expected uncertainties (<100 ppm) and seem free of any correlated noise. The same conclusions hold for all stellar temperatures. It is also interesting to note the clear cut around 0.85 µm in the contamination levels (before decontamination). This is an artefact of the simulations since the monochromatic kernels used for the two-dimensional convolution only cover 128 native pixels. Thus, there is a threshold at 69 rows around the center of the trace where the wings of the spatial profile are not modeled. In the context of real observations, the contamination levels should extend below 0.85 µm, while continuing to decrease.
The second and third tests were inspired by the commissioning programs Martel et al. (2020) and Espinoza et al. (2021a). The former is a flat time series observation comprising 876 integrations on the standard A1V star, BD+601753. This will quantify the stability of frame-by-frame decontamination for representative SNRs (177 at maximum pixel). The results are presented in Figure 8. The flat sensitivity spectrum shows that the frame-by-frame decontamination is stable. The combined spectrum reaches a precision of less than 100 ppm at best and between 150 and 400 ppm in the regions subject to contamination. The measured standard deviation along the time series is in good agreement with the expected uncertainties in the entire spectral range, as evidenced by the sensitivity curve. Note that these results only accounts for photon noise; in realistic observations, other sources of noise like 1/f, jitter or other detector effects might become dominant at certain wavelengths.
The latter program consists of another time-series observation of an expected featureless transit, evaluating the precision of relative measurements. Our simulation was based on the primary target of this program, the massive hot-Jupiter HAT-P-14 b, which was simplified to a step-transit (no limb darkening, instantaneous ingress and egress) with a flat transmission spectrum. We also neglected the effect of non-linearity in the ramps and forced a signal-to-noise ratio of 400 per pixel at maximum. This is above the capability of a SOSS-mode single integration, but it allows to push the decontamination at higher levels. The same framework was also applied to a transit spectrum of an exoplanet similar to WASP-107 b in a fourth verification (same transmission spectrum and same star, but different magnitude). In this case however, the signal-to-noise was artificially increased to 1000 for the same reason as mentioned above. Both validations led to the same conclusions, so the results of the flat transit are not shown here. The WASP-107 b-like transmission spectrum is presented in Figure 9. It confirms that the procedure can reach the expected precision and accuracy on relative measurements. Even with a required precision on ATOCA's wavelength grid of 10 −3 per pixel for the integration-by-integration decontamination, the combination of all extractions reaches an accuracy of less than 100 ppm in the order 1 contaminated region with, again, no evidence of systematic bias. The performance is even more obvious when it comes to the second order, where the contamination reaches levels of 1000 ppm. For comparison, Figure 10 presents the results of the WASP-107 b time series without any decontamination. We can see that the polluting signal follows the expected curve (red) taken from equation 5. It also confirms that the levels of contamination for the first order are small compared        to the actual signal. The result of the flat transit is not shown here since it leads to the same conclusions. Based on these four tests, some remarks can be made. First, it is interesting to note that they entail qualitatively different high-resolution PHOENIX synthetic spectra (Husser et al. 2013) at T ef f = 9400 K (BD+601753), T ef f = 6700 K (HAT-P-14), T ef f = 4500 K (WASP-107) and T ef f = 2300 K; the hottest spectrum showing well-defined absorption features on a smooth continuum and the coldest one containing features with noise-like behaviour. This is an assessment of the robustness of ATOCA regarding the nature of the underlying spectrum f k .
Second, the tests were initially run assuming that the spatial profiles, the wavelength solutions, and the throughputs were exactly known, resulting in errors consistent with the expected noise limits for each application of ATOCA. However, the same tests were repeated with the reference files slightly shifted from their nominal values to confirm the robustness of the decontamination by applying a rotation and spatial shift (see end of Section 4). We found no evidence that it affected the extraction for reasonable values, i.e., within the expected precision of the reference files. More precisely, we tested for shift in the dispersion direction up to 0.5 pixel, for shift in the spatial axis up to 0.1 pixel and for rotations up to 0.01 deg.
Third, based on the apparent agreement between the expected and measured transit contamination seen in Figure 10, it would seem that the correction for contamination could be made after a standard extraction, directly on the one-dimensional spectra using equation 5. However, these examples used an idealized transit, with a perfectly stable stellar spectrum and without considering any limb darkening. Further analysis should be done before making any conclusion on this possible alternative to correct for order contamination.
ATOCA was also applied on the realistic time series simulation from the IDT to test the robustness of the algorithm. We present here an example on WASP-52. In this case, the tests were designed to assess the quality of integration-by-integration decontamination as well as the stability of the decontamination. Only the stellar spectrum was included since adding a transit would only add complexity to the interpretation of the results, without bringing additional information. The time series comprises a total of 103 integrations with a signal-tonoise ratio per pixel reaching up to ∼ 250. Contrary to the more simplistic simulations, we did not have access to each individual order, which is more representative of the context of real observations. Therefore, we used the residual of the full detector model (combined orders) for individual integrations as an indicator of the quality of the decontamination. The logic being that, if the model is able to represent correctly the overlapping region as well as the pixels covering the wavelength domain shared between both orders, then we can be confident that the overall model is accurate. Figure 11 presents the residuals for a single integration, given by the equation In this situation, since the input spectrum is perfectly stable, the uncertainties could be determined empirically using the standard deviation of each pixel throughout the full time series. The result is consistent with Gaussian noise and there is no evidence of any correlated features.
Concerning the stability of the decontamination, we had to do a similar comparison as in Figure 8. However, this time, the spectrum extracted with the standard technique could not be used directly to avoid a possible contribution from the bad pixel modeling. Instead, we used the standard deviation of the pixels along all integrations, as it was done above to estimate the uncertainties, and then computed the summation in quadrature with the weight specific to the extraction method. In this manner, the bad pixels are not included in the summation. This was done for the time series before and after decontamination. The results are presented in Figure 12. The sensitivity increases slightly at longer wavelengths where the contamination is at its peak, but it remains contained below 5% for practically all of the domain of both spectral orders. This means that the precision, which is around 1000 ppm in the current example as shown in the top panel, would differ by only 50 ppm.
Based on these two tests, we can conclude that ATOCA can model the detector image within the uncertainties and at a low cost in terms of noise. This shows again that the algorithm is robust to inexact reference files.
On a different note, it is important to mention that the quality of the modeling is strongly influenced by the intertwining between both orders in their shared wavelength domain. This effect is accentuated in regions where the signal from both orders is strong, in which case a poor representation of the relation between the two can lead to over-and under-estimation. It can also be compensated with lower regularization factors, i.e., over-fitting. This was seen in many situations where the reference files were biased on purpose, as well as in the realistic simulations. This effect can be overcome by a proper estimation of the reference files. Thankfully, the regions that require higher precision are practically free of contamination, so the calibrations of the spatial profiles, the wavelength solutions and the throughput are relatively straightforward. Conversely, in the overlapping region, the throughput from the second order drops considerably, leaving the solution of the underlying flux dominated by the corresponding wavelength from the first order. This means that the model of the order 2 in the overlapping region is more permissive.

FUTURE IMPROVEMENTS
Hyper-parameters -The choice of regularization parameters could benefit from further improvements. Indeed, the current criterion could lead to unstable solutions, which would be very effective at modeling each valid pixel of the detector, but mediocre when it comes to accurately estimating any pixels that are not included in the fit. The latter objective could be achieved using other criteria. For example, non-exhaustive crossvalidation techniques (e.g., k-fold, Monte-Carlo) would be a judicious choice since their primary objective is to be able to simulate a set of data points that are voluntarily excluded from the fit.
Choice of Tikhonov matrix -The injection-recovery tests on our simulations pointed towards comparable performances with the first and second derivative operator. The former was preferred due to its slightly lower complexity. However, the latter could end up being a more appropriate choice. Indeed, the second derivative is used in spline interpolation on noisy data to smooth out the solutions, which is not far from the problem we are facing here. It would also be a more physical explanation, given that the finite resolution of observations enforces a smooth solution with only small variations of the second derivative.
Background fitting -One forthcoming challenge with the NIRISS/SOSS mode is the background subtraction. While the spatial spread of the trace profile is very effective to improve the precision of the measurements, it also greatly reduces the number of pixels available to measure and remove the background contribution. This problem is even more concerning in the SUBSTRIP96 observing mode, where the two traces cover the entire range of rows for some columns. ATOCA could circumvent this problem by directly including the background in the fitting, adding the parameters needed to model the background at each column to the solution vector.

CONCLUSION
In this work, we presented an alternative spectral extraction method to solve the overlap problem pertaining to the NIRISS/SOSS mode. We first characterized the extent of the contamination for the first and second diffraction orders. It was found that for absolute measurements, the levels were kept below 0.1% for most of the wavelength domain of the first order, except for wavelengths greater than 2.6 µm where they reach 1%. For the second order, the effect is much more important, but concerns mainly the wavelength range already covered by order 1. For relative measurements, for which the SOSS mode is specifically designed, the same levels of contamination are expected, but only on the chromatic differences of the signal (e.g., the difference in transit depth). This means that the systematic error due to the overlap should not be the dominant source of noise in the first diffraction order, although one should always assess its importance.
Nevertheless, it is still important to provide a way to disentangle each order's contribution to at least quantify the contamination, but also to allow a proper extraction for any absolute measurements or scenarios where the relative contamination becomes non-negligible. Consequently, we developed ATOCA, an algorithm that enables the modeling and extraction of overlapping orders (or sources), and decontamination the detector image. We showed that, given reasonable estimates of the spatial profiles, the wavelength solutions, and the spectral throughputs, ATOCA was able to decontaminate the data up to the required precision. We also characterized the robustness of the decontamination by introducing errors in the reference files and by applying it to realistic simulations. A first version of the algorithm is available in the JWST official pipeline. A development version is also available on github 5 .
ATOCA is a promising technique to disentangle the contributions of overlapping spectral traces. Its framework might be transferable to other contexts, like field decontamination or multi-object slitless spectroscopy. ATOCA might also provide a powerful alternative to manage distorted wavelength solutions. All this potential has yet to be vetted throughout real observations which should come soon with the upcoming commissioning of NIRISS/SOSS. TLFQ. 1998, in Dictionnaire historique du français québécois (Québec: Presses de l'Université Laval) Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature Methods, 17, 261 Wahba, G. 1977 Wakeford, H. R., Sing, D. K., Deming, D., et al. 2013, Monthly Notices of the Royal Astronomical Society, 435, 3481