NA-SODINN: a deep learning algorithm for exoplanet image detection based on residual noise regimes

Supervised deep learning was recently introduced in high-contrast imaging (HCI) through the SODINN algorithm, a convolutional neural network designed for exoplanet detection in angular differential imaging (ADI) data sets. The benchmarking of HCI algorithms within the Exoplanet Imaging Data Challenge (EIDC) showed that (i) SODINN can produce a high number of false positives in the final detection maps, and (ii) algorithms processing images in a more local manner perform better. This work aims to improve the SODINN detection performance by introducing new local processing approaches and adapting its learning process accordingly. We propose NA-SODINN, a new deep learning binary classifier based on a convolutional neural network (CNN) that better captures image noise correlations in ADI-processed frames by identifying noise regimes. Our new approach was tested against its predecessor, as well as two SODINN-based hybrid models and a more standard annular-PCA approach, through local receiver operating characteristic (ROC) analysis of ADI sequences from the VLT/SPHERE and Keck/NIRC-2 instruments. Results show that NA-SODINN enhances SODINN in both sensitivity and specificity, especially in the speckle-dominated noise regime. NA-SODINN is also benchmarked against the complete set of detection algorithms submitted to EIDC, where its final detection score matches or outperforms the most powerful detection algorithms. Through the supervised machine learning case, this study illustrates and reinforces the importance of adapting the detection task to the local content of processed images.


Introduction
The direct imaging of exoplanets through 10-m class ground-based telescopes is now a reality of modern astrophysics (e.g., Bohn et al. 2021; Chauvin et al. 2017; Keppler et al. 2018; Marois et al. 2008b, 2010; Rameau et al. 2013; Wagner et al. 2016). Reaching this milestone is the result of significant advances in the field of high-contrast imaging (HCI). For instance, extreme adaptive optics (AO) are routinely used during observations to correct image degradation caused by the Earth's atmosphere (Snik et al. 2018). In the same way, dedicated HCI instruments, such as Subaru/SCExAO (Lozi et al. 2018) or VLT/SPHERE (Beuzit et al. 2019), make use of state-of-the-art coronagraphs (Soummer 2005; Mawet et al. 2009) to block out the starlight and mitigate the huge flux ratio (or contrast) between a host star and its companions. Despite all these approaches, a high-contrast image is still affected by speckle noise, due to residual aberrations that arise in the optical train of the telescope and instrument (Males et al. 2021). Speckles are scattered starlight blobs in the image which can mimic the expected signal of an exoplanet in both shape and contrast. Therefore, beyond dedicated instrumental developments, powerful image post-processing algorithms are needed to disentangle true companions from speckles. To help algorithms achieve this goal, different observing strategies have been proposed, the most popular being angular differential imaging (ADI, Marois et al. 2006). An ADI data set consists of a sequence of high-contrast images acquired in pupil-stabilized mode, where the instrument derotator tracks the telescope pupil instead of the field, in such a way that the instrument and optics in the telescope stay aligned while the image rotates in time due to the Earth's rotation. As a result, speckles associated with the telescope and instrument optical train remain mostly fixed in the focal plane while the astrophysical signal rotates around the star as a function of the parallactic angle.
Send offprint requests to: C. Cantero, e-mail: ccantero@uliege.be
Currently, there exists a plethora of post-processing detection algorithms that work on ADI sequences. Most of these algorithms belong to the PSF-subtraction family, which aims to model the speckle field and subtract it from each frame in the ADI sequence, de-rotate the residual images according to the parallactic angles, and finally collapse them into a final frame (Marois et al. 2008a), commonly referred to as the processed frame. Examples of these techniques are the locally optimized combination of images (LOCI, Lafreniere et al. 2007) and its variants TLOCI (Marois et al. 2014) and MLOCI (Wahhaj et al. 2015), principal component analysis (PCA, Soummer et al. 2012; Amara & Quanz 2012), the low-rank plus sparse decomposition (LLSG, Gomez Gonzalez et al. 2016), and the non-negative matrix factorization (NMF, Ren et al. 2018). PSF subtraction is usually followed by a detection algorithm, which can be either based on an S/N map (Mawet et al. 2014) or on a more advanced technique, such as the standardized trajectory intensity mean (STIM, Pairet et al. 2019) or the regime-switching model (RSM, Dahlqvist et al. 2020). Another family of algorithms, based on an inverse problem approach, relies on directly modeling the expected planetary signal and tracking it along the ADI sequence. This is typically done by estimating the contrast of the potential planetary signal via maximum likelihood estimation. Examples of these methods include ANDROMEDA (Cantalloube et al. 2015), the forward model matched filter (FMMF, Ruffio et al. 2017), the exoplanet detection based on patch covariances (PACO, Flasseur et al. 2018), and the temporal reference analysis of planets (TRAP, Samland et al. 2021). Recently, a new post-processing approach based on machine learning has emerged in HCI. In particular, SODIRF and SODINN (Gomez Gonzalez et al. 2018) are two binary classifiers that use a random forest and a convolutional neural network, respectively, to distinguish between companion signatures and residual noise in processed frames. More recently, Gebhard et al. (2022) proposed a modified version of the half-sibling regression by Schölkopf et al. (2016) using a ridge regression with generalized cross-validation.
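The PSF-subtraction workflow described above (model the speckle field, subtract it, de-rotate, collapse) can be illustrated with a minimal full-frame PCA sketch. This is not the implementation used in this work: the function name `pca_adi`, the median collapse, the bilinear interpolation, and the sign convention of the de-rotation are all assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import rotate


def pca_adi(cube, angles, n_comp):
    """Full-frame PCA PSF subtraction for an ADI cube of shape (n_frames, ny, nx)."""
    n_frames, ny, nx = cube.shape
    # Flatten each frame into a row and remove the temporal mean of every pixel.
    matrix = cube.reshape(n_frames, ny * nx)
    matrix = matrix - matrix.mean(axis=0)
    # The leading right-singular vectors act as principal components of the speckle field.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    basis = vt[:n_comp]
    # Project every frame onto the basis to get its speckle model, then subtract it.
    model = matrix @ basis.T @ basis
    residuals = (matrix - model).reshape(n_frames, ny, nx)
    # De-rotate each residual frame (sign convention assumed) and collapse with a median.
    derotated = [rotate(frame, -angle, reshape=False, order=1)
                 for frame, angle in zip(residuals, angles)]
    return np.median(derotated, axis=0)
```

Calling `pca_adi(cube, parallactic_angles, n_comp)` for several values of `n_comp` yields one processed frame per number of principal components.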
Most of these techniques were benchmarked in the context of the Exoplanet Imaging Data Challenge (EIDC, Cantalloube et al. 2020, 2022), the first platform designed for a fair and common comparison of processing algorithms for exoplanet detection and characterization in high-contrast imaging. From the whole set of conclusions provided by the first EIDC phase (Cantalloube et al. 2020), we rely on two to motivate this paper. First, detection algorithms that exploit the local behaviour of image noise obtained the highest detection scores in the challenge leaderboard. Second, supervised machine learning algorithms produced a relatively high number of false positives compared with more standard algorithms. With the aim of enhancing supervised machine learning models, we therefore explore in this paper a new local noise approach, through which they can better exploit noise statistics in the ADI data set. This approach relies on the existence of two noise regimes in the processed frame: a speckle-dominated residual noise regime close to the star, and a background-dominated noise regime further away. Our goal is to spatially define these regimes in the processed frame through the study of their statistical properties, and then adapt the SODINN neural network to work separately in each of them in order to improve its detection performance. Therefore, in Sect. 2 we first revisit noise statistics in HCI and present a novel statistical method that allows us to empirically delimit noise regimes in processed frames. Then, in Sect. 3, we introduce the NA-SODINN detection algorithm, a neural network architecture optimized to work on noise regimes. Our deep learning method is also fed with local discriminators, such as S/N curves, that contain additional physically motivated features and help the trained model to better disentangle an exoplanet signature from speckle noise. In Sect. 4, NA-SODINN is evaluated through local ROC analysis on a series of ADI data sets obtained with various instruments. During the evaluation, NA-SODINN is benchmarked against other state-of-the-art HCI detection algorithms. Section 5 concludes the paper.

Noise regimes in processed ADI images
The term local is often used in image processing to describe a process applicable to a smaller portion of the image, such as the neighborhood of a pixel, in which pixel values exhibit a certain amount of correlation. In HCI, defining image locality thus implies a good comprehension of the physical information captured in the image. A common manner to define locality is linked to the understanding of the noise distribution along the image field-of-view, and how this can prevent the detection of exoplanets. For example, after some pre-processing steps (including background subtraction), a high-contrast image is composed of three independent components: (1) residual starlight in the form of speckles, (2) the signal of possible companions, and (3) the statistical noise associated with all light sources within the field-of-view, generally dominated by background noise in infrared observations. In these raw images, exoplanets are hidden because starlight speckles and/or background residuals dominate at all angular separations, and act as a noise source for the detection task. According to their origin, starlight speckles can be classified as instrumental speckles (Hinkley et al. 2007; Goebel et al. 2016), which are generally long-lived and therefore referred to as quasi-static speckles, and atmospheric speckles, which have a much shorter lifetime (Males et al. 2021). Speckle intensity is known to follow a modified Rician probability distribution (Soummer et al. 2007). Here, the locality of the noise is driven by the distance to the host star (Marois et al. 2008a), which already gives an indication of how local noise will be defined in a processed image. Consequently, a large fraction of post-processing algorithms currently work and process noise on concentric annuli around the star. For example, the annular-PCA algorithm (Absil et al. 2013; Gomez Gonzalez et al. 2016) performs PSF subtraction with PCA on concentric annuli. Nevertheless, more sophisticated local approaches have recently been proposed in the literature. For instance, both the TRAP algorithm (Samland et al. 2021) and the half-sibling regression algorithm (Gebhard et al. 2022) take into account the symmetrical behavior of speckles around the star when defining pixel predictors for the model.
In this section, we aim to introduce an alternative local processing, well-suited for the SODINN framework as explained later in the paper, based on the spatial division of the processed frame into (at least) two noise regimes. For illustrative purposes, we make use, in this section, of two ADI sequences chosen from the set of nine ADI sequences used in EIDC (Cantalloube et al. 2020; see Table A.1 for more information about the EIDC data sets). Our two ADI sequences, referred to as sph2 and nirc3, were respectively obtained with the VLT/SPHERE instrument (Beuzit et al. 2019) and the Keck/NIRC-2 instrument (Serabyn et al. 2017). They have the advantage of not containing any confirmed or injected companion, which makes them appropriate for algorithm development and tests that rely on the injection of exoplanet signatures in the image.

Spatial noise structure after ADI processing
Performing PSF subtraction on each high-contrast image in an ADI sequence generates a sequence of residual images where speckle noise is significantly reduced, and partly whitened (Mawet et al. 2014). After de-rotating these residual images based on their parallactic angle, and combining them into a final frame, the remaining speckles are further attenuated and whitened. This final frame is commonly referred to as the processed frame. Because of the different post-processing steps and the whitening operator that removes correlation effects, the majority of HCI detection algorithms make use of the central limit theorem to state that residual noise in processed frames follows a Gaussian distribution, an assumption that even today has not been proven experimentally. From practice, it is known that this Gaussian assumption leads to high false positive detection rates (Marois et al. 2008a; Mawet et al. 2014), since residual speckle noise in processed frames is never perfectly Gaussian and still dominates at small angular separations. Pairet et al. (2019) found experimentally that the tail decay of residual noise close to the star is better explained by a Laplacian distribution than by a Gaussian distribution. Later, Dahlqvist et al. (2020) reached the same conclusion by applying a Gaussian and a Laplacian fit to the residuals of PCA-, NMF-, and LLSG-processed frames. These experimental results suggest the presence of two residual noise regimes in the processed frame: a non-Gaussian noise regime close to the star, dominated by residual speckle noise, and a Gaussian regime further away, dominated by background noise.
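As a rough illustration of the Gaussian versus Laplacian comparison mentioned above, the sketch below fits both distributions to a set of residual pixel values and returns their maximised log-likelihoods. It is a simplification: the cited studies focus on the tail decay, whereas this example compares the overall likelihood, and the function name `compare_gaussian_laplacian` is hypothetical.

```python
import numpy as np
from scipy import stats


def compare_gaussian_laplacian(samples):
    """Return the maximised log-likelihoods of a Gaussian and a Laplacian fit."""
    samples = np.asarray(samples, dtype=float)
    mu, sigma = stats.norm.fit(samples)      # maximum-likelihood mean and std
    loc, scale = stats.laplace.fit(samples)  # maximum-likelihood location and scale
    loglike_gauss = stats.norm.logpdf(samples, mu, sigma).sum()
    loglike_laplace = stats.laplace.logpdf(samples, loc, scale).sum()
    return loglike_gauss, loglike_laplace
```

Both models have two free parameters, so the larger log-likelihood directly indicates the better-fitting distribution for the given samples.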

Identification of noise regimes
Based on the current understanding of the local statistics of noise in a processed frame, we now aim to spatially delimit both noise regimes in the image. To do so, we look for the radial distance from the star at which residual speckle noise starts to become negligible compared to background noise, which is uniform over the whole field-of-view (Fig. 1).

Paving the image field-of-view
In order to find the radius at which background noise starts to dominate in the image, we study the evolution of noise statistics as a function of angular separation. We first pave the full image field-of-view with concentric annuli of λ/D width (Fig. 1). Each annulus contains pixels that are expected to be drawn from the same parent population (Marois et al. 2008a). Note that, in the presence of residual speckles, pixels that contain information from the same speckle are all spatially correlated. When background noise dominates over residual speckle noise, we can instead assume that all pixels in an annulus are independent, since photon noise occurs on a pixel-wise basis. In HCI, a common procedure to guarantee the independence of pixel samples when performing statistical analysis is to integrate pixel intensities over non-overlapping circular apertures of λ/D diameter within the annulus (Mawet et al. 2014), as shown in Fig. 2. This procedure is based on the characteristic spatial scale of residual speckles (∼ λ/D in size). However, Bonse et al. (2022) have recently shown that, in the presence of speckle noise, this independence assumption on non-overlapping apertures is incorrect. Instead, they propose to (i) only consider the central pixel value in each circular aperture, to produce a more statistically independent set of pixels, and (ii) possibly repeat the experiment with various spatial arrangements of the non-overlapping apertures to reduce statistical noise in the measured quantities. We follow this recommendation and therefore, for the rest of this study, we define our annulus samples by only taking the central pixel value of each non-overlapping circular aperture (Fig. 2). One limitation in using non-overlapping apertures is the small sample statistics problem, especially at small angular distances (Mawet et al. 2014). With small samples, statistical analyses lack significance, so the derived conclusions are statistically weak. In order to avoid this issue, we propose to use the concept of a rolling annulus (Fig. 2) that always contains a minimum number of independent pixels N. It can be understood as an annular window around the star for which the inner boundary moves in steps of 1 λ/D, while the outer boundary is set to meet the criterion on the minimum number of independent pixels. An example of this process with N = 100 pixels is shown in Fig. 2, where the first rolling annulus that meets the condition, composed of all central pixels of the non-overlapping apertures between 1 and 6 λ/D, is displayed in red over the processed frame. Then, the rolling annulus moves away from the star, changing its boundaries as illustrated with the black line at the bottom of Fig. 2. For example, the ninth rolling annulus (in blue) with N = 100 is located between 9 and 10 λ/D, and the eighteenth rolling annulus (in green) is at a distance of 18 λ/D, meeting the N = 100 condition without the need to expand the region to another annulus. In this paper, we select N = 100 minimum samples, considered to be the minimum number of samples required to reach a reliable statistical power and significance for our statistical analysis.
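The boundaries of the rolling annuli can be reproduced with a short sketch. It assumes that an annulus at radius r (in units of λ/D) holds floor(2πr) non-overlapping apertures of λ/D diameter, the usual small-sample counting; the function names are hypothetical and the extraction of the central pixel values themselves is not shown.

```python
import math


def apertures_in_annulus(radius):
    """Number of non-overlapping apertures of 1 lambda/D diameter at a radius given in lambda/D."""
    return int(math.floor(2 * math.pi * radius))


def rolling_annuli(max_radius, n_min=100):
    """Return the (inner, outer) radii in lambda/D of each rolling annulus."""
    annuli = []
    for inner in range(1, max_radius + 1):
        outer, count = inner, apertures_in_annulus(inner)
        # Extend the outer boundary until the window holds at least n_min apertures.
        while count < n_min and outer < max_radius:
            outer += 1
            count += apertures_in_annulus(outer)
        if count >= n_min:
            annuli.append((inner, outer))
    return annuli
```

Under this counting assumption and with N = 100, the first, ninth, and eighteenth windows span 1 to 6, 9 to 10, and 18 to 18 λ/D, consistent with the example described in the text.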

Statistical moments
Once the processed frame is paved, we first study the evolution of different statistical moments as a function of the angular separation to the star: the variance (amount of energy/power), the skewness (distribution symmetry), and the excess kurtosis (distribution tails). Figure 3 shows this evolution for the sph2 (top row) and nirc3 (bottom row) data sets, on which we apply annular-PCA to produce the processed frames. We observe that the variance decreases as the rolling annulus moves away from the star. This trend is common to both data sets and is what we would expect in physical terms, as the intensity of residual speckles varies rapidly with angular separation, especially at short distance. We also see that this behaviour is damped when using a larger number of principal components (PCs), which leads to a more effective speckle subtraction. Regarding the skewness analysis, we adopt the convention of Bulmer (1979), which states that a distribution is symmetrical when its skewness ranges from −0.5 to 0.5. For both data sets, we clearly observe a loss of symmetry at small angular separations. The presence of speckles can cause this distribution asymmetry due to their higher intensity values in comparison with the background. Looking now at the excess kurtosis in Fig. 3, we observe a strong leptokurtic trend (heavier tails than a Gaussian) for the entire set of PCs at small angular separations, and for both data sets. This matches the fact that a Laplacian distribution better fits the tail decay of residual noise (Pairet et al. 2019), since it is, by definition, leptokurtic. At larger angular separations instead, we observe differences between the two data sets. In the sph2 processed frames, we detect one mesokurtic regime approximately between 6 and 13 λ/D, followed by a weaker leptokurtic regime approximately between 14 and 19 λ/D. For nirc3, we only observe one mesokurtic regime at large distance from the star, beyond about 3-6 λ/D (Fig. 3).
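For a single rolling annulus, the three moments discussed above can be computed as in the following sketch. The function name `annulus_moments` is an assumption, and the input is taken to be the array of central-pixel values of that annulus.

```python
import numpy as np
from scipy import stats


def annulus_moments(samples):
    """Variance, skewness and excess kurtosis of the central-pixel samples of one rolling annulus."""
    samples = np.asarray(samples, dtype=float)
    variance = samples.var(ddof=1)
    skewness = stats.skew(samples)
    excess_kurtosis = stats.kurtosis(samples, fisher=True)  # 0 for a Gaussian
    symmetric = abs(skewness) <= 0.5  # Bulmer (1979) convention quoted in the text
    return variance, skewness, excess_kurtosis, symmetric
```

A positive excess kurtosis indicates a leptokurtic sample, while values near zero correspond to the mesokurtic (Gaussian-like) case.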

Normality test combination analysis
Another way to explore the spatial distribution of noise is to use hypothesis testing. Assuming that residual speckle noise is non-Gaussian by nature, while background noise is Gaussian (see Sect. 2.1), we can assess the probability of the null hypothesis H0 that the data are normally distributed, i.e., explained solely by background noise. We rely on a combination of normality tests, making use of four of the most powerful ones: the Shapiro-Wilk test (sw, Shapiro & Wilk 1965), the Anderson-Darling test (ad, Anderson & Darling 1952), the D'Agostino-K2 test (ak, D'Agostino & Pearson 1973), and the Lilliefors test (li, Lilliefors 1967). This choice is motivated by the fact that they have been well tested in many studies, including Monte-Carlo simulations (Yap & Sim 2011; Marmolejo-Ramos & González-Burgos 2013; Ahmad & Khan 2015; Patrício et al. 2017; Wijekularathna et al. 2019; Uhm & Yi 2021). Note that the goal is not to benchmark the robustness of all these tests. Our purpose, instead, is to collect a larger amount of statistical evidence for the same hypothesis, which can then be combined to increase the statistical power when making a decision regarding the null hypothesis. Regarding the statistical requirements, the only constraints to be verified before using these tests are the independence and sufficient size of the sample. In terms of sample size, Jensen-Clem et al. (2017) show that normality tests can exhibit lower statistical power with sample sizes under 100 observations. Here, the independence and size constraints are met by the proposed approach to pave the field-of-view, using the central pixels of non-overlapping apertures within rolling annuli of N = 100 apertures. Additionally, we follow for this analysis the recommendation of Bonse et al. (2022) to perform our statistical tests with various spatial arrangements of the non-overlapping apertures. We leverage the fact that different aperture arrangements within the same annulus contain valuable noise diversity that can directly benefit the analysis when making a decision about the null hypothesis.
Our analysis is thus composed as follows. Given a processed frame, we test the null hypothesis H0 in a specific rolling annulus through the following consecutive steps:
1. Randomly select a normality test t from T = {sw, ad, ak, li}.
2. Randomly select an angular displacement θ of the circular apertures for each single annulus within the rolling annulus. Assuming N_ann single annuli, Θ = {θ_i}, i = 1, ..., N_ann, represents a random aperture arrangement.
3. Define the sample of central pixels X(Θ).
4. Using the selected statistical test t, compute the p-value associated with the null hypothesis for the sample X(Θ), denoted as p(t, Θ).
5. Repeat steps 1-4 m times. Because the m p-values computed in step 4 are not statistically independent, we use the harmonic mean, as proposed by Vovk & Wang (2020), to combine them into a global p-value denoted p.
6. Compare p with a predefined significance threshold α, and reject H0 if p < α.
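The six steps above can be sketched as follows. Several simplifications are assumed: only the two tests for which SciPy returns a p-value directly (Shapiro-Wilk and D'Agostino-K2) are used instead of the four tests of the text, the plain harmonic mean is computed without any calibration constant, and `draw_sample` is a hypothetical callable that returns the central-pixel sample X(Θ) for one random aperture arrangement.

```python
import numpy as np
from scipy import stats

# Assumption: only two of the four tests are included in this sketch.
TESTS = {
    "sw": lambda x: stats.shapiro(x).pvalue,
    "ak": lambda x: stats.normaltest(x).pvalue,
}


def combined_pvalue(draw_sample, m=30, alpha=0.01, seed=0):
    """Harmonic-mean combination of m p-values from random tests and arrangements.

    draw_sample(rng) is a hypothetical callable returning the sample X(Theta)
    for one random aperture arrangement Theta.
    """
    rng = np.random.default_rng(seed)
    names = list(TESTS)
    pvalues = []
    for _ in range(m):
        test = TESTS[names[rng.integers(len(names))]]  # step 1: random test
        sample = draw_sample(rng)                      # steps 2-3: random arrangement
        pvalues.append(test(sample))                   # step 4: p-value
    pvalues = np.clip(pvalues, 1e-300, 1.0)            # guard against division by zero
    p_bar = len(pvalues) / np.sum(1.0 / pvalues)       # step 5: plain harmonic mean
    return p_bar, bool(p_bar < alpha)                  # step 6: decision on H0
```

Running this function for every rolling annulus and every number of principal components would fill one cell of a p-value map per call.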
By repeating steps 1-6 for each rolling annulus in the processed frame and for various numbers of principal components in our annular-PCA post-processing algorithm, we can build what we call a PCA p-value map, or PCA-pmap for short. Figures 4 and 5 show examples of PCA-pmaps for the sph2 and nirc3 data sets, respectively. For both, we only considered the first 29 principal components to produce the annular-PCA space (y-axis in the figures). Each cell in a PCA-pmap shows, through the number in white and its background color, the combined p-value p computed in step 5 with m = 30. P-values below the pre-defined threshold α are marked with yellow stars in the figures. In order to minimize the Type I error (false rejection of the null hypothesis), we selected a conservative threshold value α = 0.01 in Figs. 4 and 5. In the case of sph2 (Fig. 4), we clearly observe the presence of three noise regimes: a first regime dominated by non-Gaussian noise due to residual speckles between 1 and 5 λ/D, a second regime between 5 and 12 λ/D where noise is more consistent with Gaussian statistics and probably dominated by background noise, and finally, a third regime with non-Gaussian noise beyond 12 λ/D, where speckles dominate again as we approach the limit of the well-corrected area produced by the SPHERE adaptive optics (Cantalloube et al. 2019). This would also explain the slightly leptokurtic behavior observed at those separations in Fig. 3. For the nirc3 data set (Fig. 5), we see two noise regimes, with speckle noise dominating approximately between 1 and 3 λ/D, and background noise dominating beyond 3 λ/D.
In addition to the detection of noise regimes, a PCA-pmap can also be used to choose an optimal PCA-space according to the residual noise, instead of other commonly used metrics such as the cumulative explained variance ratio (CEVR, Gomez Gonzalez et al. 2018). For each rolling annulus in Figs. 4 and 5, we plot the 90% CEVR with a white dashed curve, a common variance limit used in the literature to capture relevant information in the data. Hence, this curve indicates which principal component should be the lowest for each annulus to avoid adding useless independent information. In order to complement the added value of the CEVR over the PCA-pmap, we provide information on the evolution of the exoplanet signature along the PCA-space. By injecting a number of fake companions in each rolling annulus, with random coordinates and a random flux level between 1 and 3 times the estimated noise level, and computing their S/N, we can estimate for which principal component the companion S/N is maximum. Beyond this principal component, self-subtraction of the exoplanetary signal starts to increase more rapidly than noise suppression (Gonzalez et al. 2017). We indicate the principal component where the exoplanet S/N is maximum on average through white circles in Figs. 4 and 5. By comparing the plausibility of the null hypothesis together with the CEVR metric and the principal component at which the exoplanet S/N is maximum, we can define the PCA-space in our ADI sequence based on a more complex analysis of noise. We make use of this particular approach in PCA-pmaps later in Sect. 4.
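The CEVR criterion mentioned above can be illustrated with a short sketch that returns the smallest number of principal components reaching a given variance fraction. For simplicity it operates on a whole image cube rather than on individual annuli, and the function name `n_components_for_cevr` is hypothetical.

```python
import numpy as np


def n_components_for_cevr(cube, threshold=0.90):
    """Smallest number of principal components whose CEVR reaches the threshold."""
    n_frames = cube.shape[0]
    matrix = cube.reshape(n_frames, -1)
    matrix = matrix - matrix.mean(axis=0)
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    # Each squared singular value is proportional to the variance explained by one component.
    explained = singular_values**2 / np.sum(singular_values**2)
    cevr = np.cumsum(explained)
    # Small tolerance so rounding in the cumulative sum cannot miss the threshold.
    n_comp = int(np.searchsorted(cevr, threshold - 1e-12) + 1)
    return n_comp, cevr
```

Restricting `cube` to the pixels of one annulus before the call would give a per-annulus value comparable to the dashed curve described in the text.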

Field-of-view splitting strategy
At this point, we can see that, for both sph2 and nirc3, similar estimations of the noise regimes are reached using the two proposed methods: the study of statistical moments and the PCA-pmaps. Figure 3 provides a first insight into the spatial structure of residual noise and thereby brings us closer to estimating the radius split (Fig. 1) in the processed frame. Indeed, the significant increase of the variance, together with the leptokurtic behaviour and the positively skewed trend at small angular separations, suggests that this regime is still dominated by residual starlight speckles. On the other hand, PCA-pmaps contain more statistical diversity through the combination of p-values, with which very similar regime estimations are reached. Thus, both analyses are statistically complementary. Yet, from now on, we elect to use PCA-pmaps as a baseline to define noise regimes, since they can also be used for other purposes.
The noise analysis described above suggests that there can be more than two noise regimes in the processed frame, depending on the structure of the data. Although this is not enough to draw a general conclusion for all HCI instruments, it suggests that noise regions should be defined on a case-by-case basis. Regarding the nature of residual noise in a processed frame, our tests do not necessarily mean that residual speckle noise is non-Gaussian in the innermost, individual annuli. Instead, compound distributions could be at the origin of the non-Gaussian noise behavior in the innermost rolling annuli. Compound distributions refer to the sampling of random variables that are not independent and identically distributed. For large angular separations (e.g., the green annulus in Fig. 2), we generally observe that the variance is approximately the same for all central pixels. Because they can be considered as independent random variables, we can apply the central limit theorem to state that these samples follow a Gaussian distribution, as expected for background noise. However, for small angular separations (red annulus in Fig. 2), where residual speckle noise dominates over background noise, the samples are taken from distributions that might be Gaussian, but with different variances. If they are Gaussian and their variance follows an exponential distribution, then according to Gneiting (1997), the compound distribution follows a Laplacian, as observed by Pairet et al. (2019). This explanation, which is not a proof, would reconcile our findings with the belief that residual speckle noise should be locally Gaussian. Because of small sample statistics, there is however no proper way to test this interpretation on individual annuli in the innermost regions. Likewise, the compound distribution problem could also explain why we observe a non-Gaussian behavior in the outermost annuli of the sph2 processed frame. At those separations, our rolling annuli contain samples drawn at exactly the same radial distance, which would lead us to assume that the variances of the underlying distributions are all identical. However, due to different physical reasons, such as the possible presence of a wind-driven halo or of telescope spiders, there is no guarantee of perfect circular symmetry inside this speckle-dominated annulus. In such a scenario, the compound distribution problem combined with the variability in the variances of the corresponding samples leads to a non-Gaussian behavior as well. For all these reasons, we believe that splitting the processed frame field-of-view into different noise regimes is duly motivated and, in the next sections, we detail how we have implemented this splitting to improve the detection of exoplanets.
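The compound-distribution argument can be checked numerically with a small simulation, given here as an illustrative sketch rather than as part of the analysis. Gaussian samples are drawn with exponentially distributed variances, and two Laplacian signatures are measured: the excess kurtosis, which equals 3 for a Laplacian, and the Laplacian scale, which for this mixture equals the square root of half the mean variance. The function name and the sample size are assumptions.

```python
import numpy as np
from scipy import stats


def compound_gaussian(n, mean_variance=1.0, seed=0):
    """Draw zero-mean Gaussian samples whose variance is itself exponentially distributed."""
    rng = np.random.default_rng(seed)
    variances = rng.exponential(mean_variance, size=n)
    return rng.normal(0.0, np.sqrt(variances))


samples = compound_gaussian(400_000)
excess_kurtosis = stats.kurtosis(samples)
# Maximum-likelihood Laplacian scale: mean absolute deviation from the median.
laplace_scale = np.mean(np.abs(samples - np.median(samples)))
print(excess_kurtosis, laplace_scale, np.sqrt(0.5))
```

With a unit mean variance, the measured scale should be close to sqrt(1/2) and the excess kurtosis close to 3, up to sampling noise.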

Implementation
So far, we have focused on understanding the spatial structure of residual noise in the processed frame, which has allowed us to empirically define the regions dominated by speckle and background noise. Now, we aim to use this local noise approach to help post-processing algorithms enhance their detection performance. Most HCI algorithms have the potential of being applied separately on different noise regimes. Here, we are particularly interested in the case of deep learning. Neural networks are good candidates to capture image noise dependencies due to their ability to recognize hidden underlying relationships in the data and make complex decisions. In order to maximize the added value of working in noise regimes and show its benefits for the detection task, we propose to revisit SODINN (Gomez Gonzalez et al. 2018), the first supervised deep learning algorithm for exoplanet imaging. In this section, we first provide a brief overview of SODINN, and then present our novel NA-SODINN algorithm, an adaptation of SODINN working on noise regimes, aided with additional handcrafted features.

Baseline model: the SODINN algorithm
SODINN stands for Supervised exOplanet detection via Direct Imaging with deep Neural Network. It is a binary classifier that uses a convolutional neural network (CNN) to distinguish between two classes of square image sequences: sequences that contain an exoplanet signature (c+, the positive class), and sequences that contain only residual noise (c−, the negative class). Figure 6 (bottom) shows an example sequence for each class, where the individual images are produced with various numbers of principal components. Gomez Gonzalez et al. (2018) refer to these image sequences as Multi-level Low-rank Approximation Residual (MLAR) samples.
The first step in SODINN is to build a training data set composed of thousands of different $c^+$ and $c^-$ MLAR sequences. A $c^+$ sequence is formed through three consecutive steps that are summarized in Fig. 6. (i) First, a PSF-like source is injected at a random pixel within a given annulus of the ADI sequence. The flux of this injection is the result of multiplying the normalized off-axis PSF by a scale factor randomly chosen from a pre-estimated flux range that corresponds to a pre-defined range of S/N in the processed frame. (ii) Singular value decomposition (SVD, Halko et al. 2011) is then used on this synthetic ADI sequence to perform PSF subtraction for different numbers of singular vectors (or principal components), thereby producing a series of processed frames. (iii) Finally, square patches are cropped around the injection coordinates for each processed frame. This forms a $c^+$ MLAR sequence, which contains the injected companion signature for different numbers of principal components. The patch size is usually defined as two times the FWHM of the PSF. Likewise, we construct a $c^-$ sequence by extracting MLAR sequences for pixels where no fake companion injection is performed. The number and order of singular vectors are the same as those used for the $c^+$ sequences. For the case of $c^-$ sequences, SODINN must deal with the fact that a single ADI sequence yields a single realization of the residual noise, so that the number of $c^-$ sequences available per annulus is not enough to train the neural network without overfitting. SODINN solves this problem by increasing the number of $c^-$ sequences in a given annulus through the use of data augmentation techniques, such as random rotations, shifts, and averaging. This procedure of generating $c^+$ and $c^-$ sequences is repeated thousands of times for each annulus in the field-of-view. When the entire field-of-view is covered, MLAR sequences of the same class from all annuli are mixed and the balanced training set (same number of $c^+$ and $c^-$ samples) is built.
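The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SODINN implementation: the function names `lowrank_residuals` and `mlar_sample` are hypothetical, the injection step is omitted, and the processed frame is obtained by a plain median over residual frames without the derotation that a real ADI reduction applies.

```python
import numpy as np

def lowrank_residuals(cube, n_comp):
    """Subtract a rank-n_comp PCA model from a cube of shape (n_frames, ny, nx)."""
    n_frames, ny, nx = cube.shape
    matrix = cube.reshape(n_frames, ny * nx)
    matrix = matrix - matrix.mean(axis=0)        # center each pixel over time
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    basis = vt[:n_comp]                          # first n_comp principal components
    model = matrix @ basis.T @ basis             # low-rank reconstruction
    return (matrix - model).reshape(n_frames, ny, nx)

def mlar_sample(cube, center_yx, pcs, patch_size):
    """Stack one square patch per number of principal components."""
    half = patch_size // 2
    y, x = center_yx
    patches = []
    for n_comp in pcs:
        residuals = lowrank_residuals(cube, n_comp)
        # Simplifying assumption: median-combine WITHOUT derotating the frames.
        frame = np.median(residuals, axis=0)
        patches.append(frame[y - half:y + half + 1, x - half:x + half + 1])
    return np.stack(patches)

rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 31, 31))             # toy noise-only cube
sample = mlar_sample(cube, center_yx=(15, 15), pcs=[1, 2, 3, 5], patch_size=7)
```

Here `sample` has one patch per entry of `pcs`, which mirrors the structure of an MLAR sequence; a $c^+$ sample would be built the same way after injecting a scaled PSF into `cube`.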
The training set is then used to train the SODINN neural network. This produces a detection model that is specific to the ADI sequence from which the MLAR sequences were generated. The SODINN network architecture is composed of two concatenated convolutional blocks. The first block contains a convolutional-LSTM (Shi et al. 2015) layer with 40 filters, and kernel and stride size of (1,1), followed by a spatial 3D dropout (Srivastava et al. 2014) and a MaxPooling-3D (Boureau et al. 2010). The second block is identical except that it has 80 filters, and kernel and stride size of (2,2). These two blocks extract the feature maps capturing the spatio-temporal correlations between pixels of MLAR sequences. The feature maps are then flattened and sent to a fully connected dense layer of 128 hidden units. A rectified linear unit (ReLU, Nair & Hinton 2010) is applied to the output of this layer, followed by a dropout regularization layer. Finally, the output layer of the network consists of a sigmoid unit. The network weights are initialized randomly using a Xavier uniform initializer, and are learned by back-propagation with a binary cross-entropy cost function. SODINN uses an Adam optimizer with a step size of 0.003, and mini-batches of 64 training samples. An early stopping condition monitors the validation loss. The number of epochs is usually set to 15, with which SODINN generally reaches ∼99% validation accuracy (Gomez Gonzalez et al. 2018).
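As a hedged illustration of the last stages of this architecture only (dense layer of 128 units, ReLU, sigmoid output, Xavier uniform initialization, binary cross-entropy), the following NumPy sketch runs one forward pass on a mini-batch of 64 samples. It does not reproduce the convolutional-LSTM blocks or the dropout layers, and the flattened feature size `n_features` is a hypothetical value chosen for the example.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform initializer: U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def head_forward(features, w1, b1, w2, b2):
    hidden = np.maximum(features @ w1 + b1, 0.0)   # dense layer followed by ReLU
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))           # sigmoid output unit

rng = np.random.default_rng(0)
n_features = 320                                   # hypothetical flattened size
w1, b1 = xavier_uniform(n_features, 128, rng), np.zeros(128)
w2, b2 = xavier_uniform(128, 1, rng), np.zeros(1)

batch = rng.normal(size=(64, n_features))          # one mini-batch of 64 samples
labels = rng.integers(0, 2, size=64).astype(float) # 1 for c+, 0 for c-
probs = head_forward(batch, w1, b1, w2, b2).ravel()
loss = binary_cross_entropy(labels, probs)
```

In practice the weights would be updated by an optimizer such as Adam; the sketch stops at the loss evaluation to keep the example self-contained.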
Once the detection model is trained and validated, it is finally used to find real exoplanets in the same ADI sequence. Because the input of the model is an MLAR structure, we first map the entire field-of-view by creating MLAR samples (with no injection) centered on each pixel. The goal of the trained model is therefore to assign to each of these new MLAR sequences a probability of belonging to the $c^+$ class. Computing a probability for each individual pixel leads to a probability map, from which exoplanet detection can be performed by choosing a detection threshold.
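The inference step described above amounts to a sliding-window evaluation of the model. The sketch below assumes that the processed frames for all principal components are already available as an array of shape (n_pc, ny, nx); `toy_model` is a hypothetical stand-in for the trained network and is only there to make the example runnable.

```python
import numpy as np

def probability_map(frames, model, patch_size):
    """Evaluate `model` on the patch stack centered on every interior pixel."""
    n_pc, ny, nx = frames.shape
    half = patch_size // 2
    pmap = np.zeros((ny, nx))
    for y in range(half, ny - half):
        for x in range(half, nx - half):
            sample = frames[:, y - half:y + half + 1, x - half:x + half + 1]
            pmap[y, x] = model(sample)
    return pmap

def toy_model(sample):
    # Hypothetical classifier: squashes the mean flux of the central pixel.
    c = sample.shape[-1] // 2
    return 1.0 / (1.0 + np.exp(-sample[:, c, c].mean()))

frames = np.zeros((3, 15, 15))
frames[:, 7, 7] = 10.0                      # bright source in every processed frame
pmap = probability_map(frames, toy_model, patch_size=5)
detections = np.argwhere(pmap >= 0.9)       # pixels above the detection threshold
```

Pixels closer to the edge than half a patch are left at zero here; how the borders are handled is a design choice that this sketch does not try to settle.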

Model adaptation: the NA-SODINN algorithm
In SODINN, the training set is built by mixing all MLAR sequences of the same class, generated on every annulus in the field-of-view. In the presence of different noise regimes, this procedure can complicate the training of the model, as the statistics of an MLAR sequence generated in the speckle-dominated regime differ from those of a sequence of the same class generated in the background-dominated regime. In order to deal with this, we train an independent SODINN detection model per noise regime instead of a unique model for the full-frame field-of-view. Each detection model is thus trained only with MLAR sequences whose statistical properties follow the same (or a similar) probability distribution. As a consequence, the region of interest in the field-of-view is now smaller: the number of pixels available to generate MLAR sequences is reduced, and we therefore lose noise diversity in comparison with a model that is trained on the full frame. However, this diversity loss comes with the benefit of better capturing the statistics of noise within a given noise regime, which improves the training.
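A minimal sketch of this spatial split is given below. It assumes that the regime boundaries are already known as a list of radii in pixels (the values used here are hypothetical), builds one boolean mask per regime, and shows how maps computed independently per regime could later be merged into a single frame.

```python
import numpy as np

def radial_distance(shape):
    """Distance of every pixel to the frame center, in pixels."""
    ny, nx = shape
    y, x = np.indices(shape)
    return np.hypot(y - (ny - 1) / 2.0, x - (nx - 1) / 2.0)

def regime_masks(shape, boundaries):
    """boundaries: increasing radii [r0, r1, ..., rk] -> k annular boolean masks."""
    r = radial_distance(shape)
    return [(r >= lo) & (r < hi) for lo, hi in zip(boundaries[:-1], boundaries[1:])]

def join_maps(regime_maps, masks):
    """Fill each regime region with the map produced by that regime's model."""
    final = np.zeros(masks[0].shape)
    for pmap, mask in zip(regime_maps, masks):
        final[mask] = pmap[mask]
    return final

masks = regime_masks((101, 101), boundaries=[5, 25, 50])   # hypothetical radii
regime_maps = [np.full((101, 101), 0.2), np.full((101, 101), 0.7)]
final = join_maps(regime_maps, masks)
```

Training samples would be assigned to a regime with the same masks, so that each model only sees patches centered on pixels of its own region.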
In order to compensate for the noise diversity loss associated with the training on individual noise regimes, we attempt to reinforce the training by means of new handcrafted features. An interesting discriminator between the $c^+$ and $c^-$ classes, which is also physically motivated, comes from their behavior in terms of signal-to-noise ratio (S/N). The most widely used S/N definition in the HCI literature is from Mawet et al. (2014). It states that, given a 1λ/D wide annulus in a processed frame at distance r (in λ/D units) from the star, paved with N = 2πr non-overlapping circular apertures (see Fig. 2), the S/N for one of these apertures is defined as
$$\mathrm{S/N} = \frac{x_t - \bar{x}_{N-1}}{\sigma_{N-1}\sqrt{1 + \frac{1}{N-1}}}, \tag{1}$$
where $x_t$ is the aperture photometry in the considered test aperture, $\bar{x}_{N-1}$ the average intensity over the remaining N − 1 apertures in the annulus, and $\sigma_{N-1}$ their standard deviation. In order to maximize the S/N, image processing detection algorithms need to be tuned by finding the optimal configuration of their parameters (see e.g., Dahlqvist et al. 2021b). Here, rather than optimizing the algorithm parameters, we leverage the behavior of the S/N as a function of some of the algorithm parameters in our deep learning approach. This is especially the case for the number of principal components used in the PSF subtraction. We define an S/N curve as the evolution of the S/N computed for a given circular aperture as a function of the number of principal components (Gonzalez et al. 2017). Fig. 7 shows an example of 1,000 S/N curves generated from the sph2 ADI sequence. We clearly see in Fig. 7 that, in the presence of an exoplanet signature (blue curves), the S/N curve first increases and then decreases, which leads to the appearance of a peak at a given number of principal components. This behavior, capturing the competition between noise subtraction and signal self-subtraction, was already documented elsewhere (e.g., Gonzalez et al. 2017). The peak in the S/N curve indicates the number of principal components for which the contrast between the companion and the residual noise in the annulus is maximum.
For a given 1-FWHM circular aperture, the MLAR sequence (no matter the class) and the S/N curve are linked from a physical point of view. Indeed, the evolution of the S/N as a function of the number of principal components can be readily extracted from intermediate products used in the production of the training data set. Therefore, the information conveyed through the S/N curve is already partly contained in the MLAR patches. But while the MLAR sequence contains localized information on the signal and noise behavior, the S/N curve conveys annulus-wise information, obtained through aperture photometry. Each aperture S/N estimation depends on the noise in the rest of the annulus (Eq. 1), so that it also contains information that connects with other circular apertures at the same angular separation from the star. This dependency is not captured in MLAR sequences. S/N curves make these rich summary statistics directly available to the neural network to improve its training. One complication in using S/N curves in the training relates to data augmentation, which is mandatory to build up a sufficiently large training data set for SODINN. Because these augmentation operations modify the intensity and distribution of pixels in the MLAR sequence, there is no direct way to compute the associated S/N curve of an augmented MLAR sequence through Eq. 1. To deal with this, we make simplifying assumptions for each augmentation operation in SODINN: (i) image rotations do not affect the S/N curve as the same pixels are kept in the final sequence, (ii) averaging two sequences can be approximated as averaging their S/N curves, and (iii) image shifts do not affect the S/N curve as long as the shift is sufficiently small.
By adding the noise regimes approach and the S/N curves to SODINN, we are building a new detection algorithm. We refer to this novel framework, depicted in Fig. 8, as Noise-Adaptive SODINN, or NA-SODINN for short. Like its predecessor, NA-SODINN is composed of three steps: (i) producing the training set from an ADI sequence, (ii) training a detection model with this training set, and (iii) applying the model to find companions in the same ADI sequence. However, in the first step, NA-SODINN generates as many training sets as detected residual noise regimes. Each of these sets is composed of MLAR sequences and their corresponding S/N curves generated from the corresponding noise regime, including data augmentation. In the second step, NA-SODINN trains an independent detection model for each regime by using its corresponding training set. For each MLAR sequence in the training set, the feature maps created through the convolutional blocks are now concatenated with their respective S/N curves after the flattened layer (Fig. 8). In the last step, NA-SODINN performs inference in individual noise regimes: it applies the trained model of each regime to infer the probability map of that regime (Fig. 8). Finally, NA-SODINN builds the final probability detection map by joining all regime probability maps inferred with each detection model. Thus, our NA-SODINN algorithm is conceived to keep the main characteristics of the pioneering SODINN algorithm (Gomez Gonzalez et al. 2018), such as its architecture, and to adapt its optimization process to our local noise approach.
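The S/N definition of Eq. 1 and the resulting S/N curve can be sketched as follows. The sketch assumes that the aperture photometry of the N apertures paving an annulus has already been measured for each number of principal components; the function names are hypothetical, and the sample standard deviation (`ddof=1`) is used, which is an assumption of this example. The last function illustrates approximation (ii) for averaged samples.

```python
import numpy as np

def snr_mawet(fluxes, test_index):
    """S/N of one aperture against the N - 1 other apertures of the annulus."""
    fluxes = np.asarray(fluxes, dtype=float)
    x_t = fluxes[test_index]
    others = np.delete(fluxes, test_index)
    n_other = others.size                        # N - 1 reference apertures
    sigma = others.std(ddof=1)                   # assumed: sample standard deviation
    return (x_t - others.mean()) / (sigma * np.sqrt(1.0 + 1.0 / n_other))

def snr_curve(fluxes_per_pc, test_index):
    """S/N of the same aperture for each number of principal components."""
    return np.array([snr_mawet(f, test_index) for f in fluxes_per_pc])

def averaged_curve(curve_a, curve_b):
    """Approximation (ii): the curve of an averaged sample is the mean curve."""
    return 0.5 * (np.asarray(curve_a, float) + np.asarray(curve_b, float))

# Toy photometry: first aperture is the test aperture, three reference apertures.
fluxes_per_pc = [[10.0, 1.0, 2.0, 3.0], [6.0, 1.0, 2.0, 3.0]]
curve = snr_curve(fluxes_per_pc, test_index=0)
pc_index_of_peak = int(np.argmax(curve))
```

In NA-SODINN the curve is a feature vector with one entry per principal component, concatenated to the flattened feature maps; the toy curve above has only two entries for brevity.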

Model evaluation
Now that NA-SODINN has been introduced, we aim to thoroughly evaluate its detection ability. In the first part of this section, we explain the evaluation strategy and benchmark NA-SODINN with respect to its predecessor SODINN using the same sph2 and nrc3 ADI sequences. Then, in the second part, we apply NA-SODINN to the first phase of EIDC (Cantalloube et al. 2020), providing probability maps for each ADI sequence in the data challenge and running the same statistical analysis to compare the NA-SODINN performance with that of the other HCI algorithms.

Performance assessment
The evaluation of HCI detection algorithms consists of minimizing the false positive rate (FPR) while maximizing the true positive rate (TPR) at different detection thresholds applied to the final detection map. This information is summarized by a curve in the receiver operating characteristic (ROC) space, where each point of the curve captures both metrics at a given threshold value (Gomez Gonzalez et al. 2018; Dahlqvist et al. 2020). In order to produce ROC curves for various versions of SODINN applied to a given ADI sequence D, we first build the evaluation set $D_{\rm eval} = \{D_1, D_2, D_3, \ldots, D_s\}$ containing s synthetic data sets $D_i$, where each synthetic data set is a copy of D with one fake companion injection per noise regime. Here, we limit the number of injected companions to one at a time, as having more than one companion per data cube is unnecessary given that our approach detects exoplanets locally. The coordinates of these injections are randomly selected within the considered noise regime boundaries, and their fluxes are randomly set within a pre-defined range of fluxes that corresponds to an S/N range between one and two in the processed frame. Hence, each algorithm provides s final detection maps, from which true positive (TP) and false positive (FP) indicators are computed across the whole noise regime field-of-view at different detection thresholds. Then, all these indicators are averaged and the corresponding ROC curve for the considered noise regime is produced. Instead of using the FPR as in standard ROC curves, here we use the mean number of FPs within the whole field-of-view, which is more representative of the HCI detection task and facilitates the interpretation of our performance simulations.
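The averaging over the s detection maps can be sketched as below. The sketch assumes that each map has already been reduced to two quantities, which is a simplification introduced for this example: the peak value found at the injection position, and the list of peak values of all other blobs in the regime. The names `injection_peaks` and `noise_peaks` are hypothetical.

```python
import numpy as np

def roc_points(injection_peaks, noise_peaks, thresholds):
    """TPR and mean number of FPs per map, for each detection threshold."""
    injection_peaks = np.asarray(injection_peaks, dtype=float)
    tpr, mean_fps = [], []
    for t in thresholds:
        # Fraction of maps in which the injected companion is recovered.
        tpr.append(np.mean(injection_peaks >= t))
        # Number of spurious blobs above threshold, averaged over the maps.
        mean_fps.append(np.mean([np.sum(np.asarray(p) >= t) for p in noise_peaks]))
    return np.array(tpr), np.array(mean_fps)

# Toy evaluation set with s = 4 maps.
injection_peaks = [0.95, 0.40, 0.99, 0.80]
noise_peaks = [[0.30, 0.60], [0.20], [], [0.92, 0.10, 0.55]]
thresholds = np.array([0.5, 0.9])
tpr, mean_fps = roc_points(injection_peaks, noise_peaks, thresholds)
```

Plotting `tpr` against `mean_fps` gives one ROC curve per algorithm and per noise regime.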
We perform the proposed ROC curve analysis on both sph2 and nrc3 ADI sequences with s = 100 for each. For this assessment, a detection is defined as a blob in the final detection map with at least one pixel above the threshold inside a circular aperture of diameter equal to the FWHM centered at the position of each injection of both $D^{\rm sph2}_{\rm eval}$ and $D^{\rm nrc3}_{\rm eval}$. With the aim of benchmarking NA-SODINN, we include in this evaluation the annular-PCA algorithm (Absil et al. 2013), as implemented in the VIP Python package (Gonzalez et al. 2017; Christiaens et al. 2023), the SODINN framework by Gomez Gonzalez et al. (2018), and two hybrid detection models. These hybrid models are modifications of SODINN that include only one of the two additional features introduced in NA-SODINN: the adaptation to noise regimes, or the addition of S/N curves in the training. Hereafter, we refer to them respectively as SODINN+Split and SODINN+S/N. In the same spirit as an ablation study, these two hybrid models are included in our evaluation in order to provide information about the added value of each approach separately for the task of detection.
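The detection criterion stated above translates into a short test on the detection map. The following is a minimal sketch with a hypothetical function name, assuming pixel coordinates given as (y, x) and a FWHM expressed in pixels.

```python
import numpy as np

def is_detected(det_map, center_yx, fwhm, threshold):
    """True if any pixel within FWHM/2 of the injection exceeds the threshold."""
    y, x = np.indices(det_map.shape)
    inside = np.hypot(y - center_yx[0], x - center_yx[1]) <= fwhm / 2.0
    return bool(np.any(det_map[inside] > threshold))

det_map = np.zeros((21, 21))
det_map[10, 12] = 0.95                      # a peak two pixels from the injection
hit = is_detected(det_map, (10, 10), fwhm=4.0, threshold=0.9)
miss = is_detected(det_map, (10, 10), fwhm=3.0, threshold=0.9)
```

With a 4-pixel FWHM the peak lies inside the aperture and counts as a detection; with a 3-pixel FWHM it falls outside and would instead be counted among the false positives.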
An important aspect to consider when comparing algorithms in ROC space is the optimal choice of their model parameters. In the case of annular-PCA, we use five principal components for each annulus as a good compromise to get a high S/N for injected companions, especially in the speckle-dominated regime. For the various versions of SODINN, we need to define two main parameters: the list of principal components $PC = (pc_1, pc_2, \ldots, pc_m)$ that are used to produce each sample in both the MLAR sequence and S/N curve, and the level of injected fluxes used for making $c^+$ class samples (see Sect. 3.1). For SODINN, we use the criterion based on the cumulative explained variance ratio (CEVR), as proposed by Gomez Gonzalez et al. (2018), to define the range of PC. For NA-SODINN and the hybrid models, we instead rely on the novel PCA-pmaps technique presented in Sect. 2, and we choose a list of m = 13 principal components centered around the principal component where the maximum S/N is reached ($pc_{\rm peak}$ hereafter, denoted by a white circle in the PCA-pmap). By comparing $pc_{\rm peak}$ with the principal component where the 90% CEVR is reached in PCA-pmaps for both sph2 and nrc3 ADI sequences (Figs. 4 and 5), we observe that at some angular separations, the S/N peak is not well captured by the CEVR metric. This suggests that the use of CEVR as a figure of merit for choosing the PC list is not always optimal for the training. Regarding the injected fake companion fluxes, we choose for all SODINN-based models a range of fluxes that corresponds to an S/N between one and three in the PCA-processed frame. This range of fluxes generally does not lead to class overlap, where $c^+$ and $c^-$ class samples would look too similar. However, in order to avoid FPs in the final detection map, the user may consider higher flux ranges in those data sets where the level of noise is higher. Finally, to build the ROC curve, we consider a list of S/N thresholds ranging from 0.1 to 4.4 in steps of 0.01 for annular-PCA, while for the SODINN-based models we use a list of probability thresholds from 0.09 to 0.99 in steps of 0.01. All SODINN-based models are trained on balanced training sets containing around $10^5$ samples for each class.
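The two ways of choosing the PC list can be compared with a short sketch. The first function returns the number of components at which a given CEVR level is reached; the second builds a list of m components centered on $pc_{\rm peak}$. How the list is clipped when the peak sits near the first or last available component is not specified in the text, so the clipping rule below is an assumption of this example, as are the function names.

```python
import numpy as np

def n_components_for_cevr(matrix, level=0.90):
    """Smallest number of principal components whose CEVR reaches `level`."""
    centered = matrix - matrix.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    cevr = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cevr, level) + 1)

def pc_list_around_peak(pc_peak, m=13, pc_max=None):
    """m consecutive component numbers centered on pc_peak (clipped, by assumption)."""
    start = max(1, pc_peak - m // 2)
    if pc_max is not None:
        start = max(1, min(start, pc_max - m + 1))
    return list(range(start, start + m))

# Toy data whose two principal directions carry 90% and 10% of the variance.
data = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
n_80 = n_components_for_cevr(data, level=0.80)
n_95 = n_components_for_cevr(data, level=0.95)
pcs = pc_list_around_peak(10)
```

When the component count returned by the CEVR criterion is far from `pc_peak`, a list built around the former would miss the S/N peak, which is the situation described above.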
Figures 9 and 10 display a series of ROC spaces (one for each detected noise regime), respectively for the sph2 and nrc3 ADI sequences. Each of these ROC spaces displays one ROC curve per algorithm, which shows its detection performance in that specific noise regime for different thresholds. We observe from both figures that NA-SODINN outperforms both its predecessor and the hybrid models, especially for the noise regimes dominated by residual speckle noise. In the case of the sph2 noise regime spanning 12-19 λ/D, corresponding to the outer edge of the SPHERE well-corrected region, we observe that SODINN presents a significant number of FPs. We attribute this trend to the fact that this noise regime contains a significant number of residual speckles with intensities similar to those used for the injected fake companions during training, causing a class overlap situation. This is partly due to the much larger number of independent statistical samples at these larger separations, which increases the chances of finding stronger outliers in the noise. In order to overcome this problem, the user can increase the S/N range hyper-parameter used to generate $c^+$ sequences, at the expense of decreasing the ability to find faint companions. Despite the complexity of this noise regime, NA-SODINN manages to reduce the FPs to sufficiently low levels while at the same time improving the TPR, which is always above 90% at every threshold. This behavior is further illustrated in Figs. B.1 and B.2 of Appendix B, where the NA-SODINN and SODINN probability maps are compared at different threshold levels. Regarding hybrid models, we generally observe that they land between the SODINN and NA-SODINN detection performance, with SODINN+S/N generally being the best hybrid model. These results thus suggest that both working with separate noise regimes and adding S/N curves in the neural network significantly enhance the detection performance of SODINN. When these approaches are used in synergy, as in NA-SODINN, the improvement is even more significant.

NA-SODINN in EIDC
By design, the Exoplanet Imaging Data Challenge (EIDC, Cantalloube et al. 2020) can be used as a laboratory to compare and evaluate new detection algorithms against other state-of-the-art HCI detection algorithms. For instance, Dahlqvist et al. (2021a) used the EIDC to highlight the improvement of the automated version of their RSM algorithm. Here, we use the first sub-challenge of the EIDC to generalize the ROC analysis presented above, and evaluate how NA-SODINN performs with respect to the state-of-the-art HCI algorithms that entered the data challenge. Besides the sph2 and nrc3 data sets used so far, the first EIDC sub-challenge includes seven additional ADI sequences, in which a total of 20 planetary signals with different contrasts and position coordinates were injected. Two of these seven ADI sequences are from the SPHERE instrument (Beuzit et al. 2019), identified as sph1 and sph3, two more from the NIRC-2 instrument (Serabyn et al. 2017), identified as nrc1 and nrc2, and the remaining three from the LMIRCam instrument (Skrutskie et al. 2010), with lmr1, lmr2 and lmr3 ID names. For each of these nine data sets, EIDC provides a pre-processed temporal cube of images, the parallactic angle variation corrected for true north, a non-coronagraphic PSF of the instrument, and the pixel scale of the detector. Each algorithm entering the EIDC had to provide a detection map for each ADI sequence. The following standard metrics are then used to assess the detection performance on each submitted detection map:
- True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$,
- False Positive Rate: $\mathrm{FPR} = \frac{FP}{FP + TN}$,
- False Discovery Rate: $\mathrm{FDR} = \frac{FP}{FP + TP}$,
- F1-score: $F1 = \frac{2\,TP}{2\,TP + FP + FN}$.
We apply our NA-SODINN framework to the EIDC, and as in the ROC analysis, we use PCA-pmaps as a tool for both estimating residual noise regimes and choosing the list of principal components PC at each angular separation. For the injection flux ranges, we use an S/N range between one and four times the level of noise in the processed frame. Each model is trained with balanced training sets that contain around $10^5$ samples per class. Because all three LMIRCam cubes contain more than 3,000 frames (Table A.1), we decided to reduce this number to around 250-300 frames to limit the computational time. To do so, we average a certain number of consecutive frames along the time axis of the sequence. Figure 11 shows a grid of all resulting NA-SODINN probability maps from EIDC ADI sequences where we observe, by visual inspection, that NA-SODINN finds most of the injected fake companions, while producing only faint false positives that all fall below our default detection threshold τ = 0.9. In order to quantify this information, we follow the same approach as in Cantalloube et al. (2020) by considering the area under the curve (AUC) for the TPR, FPR, and FDR as a function of the threshold, which mitigates the arbitrariness of the threshold selection by considering their evolution over a pre-defined range. The $\mathrm{AUC}_{\rm TPR}$ should be as close as possible to one and the $\mathrm{AUC}_{\rm FPR}$ and $\mathrm{AUC}_{\rm FDR}$ as close as possible to zero. The F1-score ranges between zero and one, where one corresponds to a perfect algorithm, and is computed only at a single threshold $\tau_{\rm sub}$ that is chosen by the participant.
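The four metrics and their AUCs can be computed as in the sketch below. It is an illustration with hypothetical function names, not the EIDC scoring code: undefined ratios (zero denominator) are returned as NaN, and the AUC is obtained with the trapezoidal rule over the threshold grid, both of which are assumptions of this example.

```python
import numpy as np

def detection_metrics(tp, fp, fn, tn):
    """Return (TPR, FPR, FDR, F1); a ratio with zero denominator is NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")
    tpr = ratio(tp, tp + fn)
    fpr = ratio(fp, fp + tn)
    fdr = ratio(fp, fp + tp)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    return tpr, fpr, fdr, f1

def auc(thresholds, values):
    """Area under a metric-versus-threshold curve (trapezoidal rule)."""
    t = np.asarray(thresholds, dtype=float)
    v = np.asarray(values, dtype=float)
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

tpr, fpr, fdr, f1 = detection_metrics(tp=8, fp=2, fn=2, tn=88)
auc_tpr = auc([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])   # toy TPR curve
```

Over a threshold range of unit width, an ideal algorithm would give an AUC of one for the TPR and zero for the FPR and FDR.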
Figure C.1 shows the result of this analysis for all NA-SODINN probability maps of Fig. 11, in which all TPR, FPR, and FDR metrics (and their respective AUCs) are computed for different probability threshold values ranging from zero to one. Here, we mainly see that the $\mathrm{AUC}_{\rm FDR}$ is generally higher for NIRC-2 and LMIRCam than for SPHERE data sets, the $\mathrm{AUC}_{\rm FPR}$ is close to zero for all data sets, and the $\mathrm{AUC}_{\rm TPR}$ is almost perfect for SPHERE data sets. To compute the F1-score, we choose a $\tau_{\rm sub} = 0.9$ probability threshold. From our tests with NA-SODINN, we consider this value as the minimum probability threshold for which one can rely on the significance of detections, maximizing TPs while minimizing FPs. Thus, any pixel signal above this $\tau_{\rm sub}$ on each probability map of Fig. 11 is considered as a detection for the computation of the F1-score. Finally, through the $\mathrm{AUC}_{\rm TPR}$, $\mathrm{AUC}_{\rm FDR}$ and F1-score metrics obtained with the NA-SODINN algorithm, we are able to update the general EIDC leader-board (Cantalloube et al. 2020). Figure 12 shows how NA-SODINN ranks compared to the algorithms originally submitted to the EIDC, for each considered metric. We clearly observe that NA-SODINN ranks at the top, or close to the top, for each of the EIDC metrics, with results generally on par with the RSM algorithm by Dahlqvist et al. (2020). In particular, NA-SODINN provides the highest area under the TPR curve, while preserving a low false discovery rate.

Conclusions
In this paper, we explore the possibility of enhancing exoplanet detection in the field of HCI by training a supervised classification model that takes into account the noise structure in the PCA-processed frame. SODINN (Gomez Gonzalez et al. 2018), the pioneering deep learning detection algorithm in HCI, is adapted to learn from different noise regimes in the processed frame and from local discriminators between the exoplanet and noise, such as S/N curves. With these two approaches working in synergy, we build a new detection algorithm, referred to as Noise-Adaptive SODINN, or NA-SODINN for short. Although our findings related to the spatial structure of noise distributions are showcased by adapting the SODINN detection framework, we believe that other algorithms dealing with processed frames could be adapted in a similar way.
The NA-SODINN detection capabilities are tested through two distinct analyses. First, we perform a performance assessment based on ROC curves using two ADI sequences provided by the VLT/SPHERE and Keck/NIRC-2 instruments. Here, NA-SODINN is evaluated with respect to annular-PCA, the original SODINN, and two SODINN-based hybrid models that use only one of the two proposed approaches, i.e., the noise regime splitting or the addition of S/N curves. We find that hybrid models improve the detection performance of SODINN in all noise regimes, which demonstrates the interest of the local noise approaches considered in this paper. Moreover, we find that NA-SODINN reaches even higher detection performance, especially in the speckle noise regime, by combining both approaches in the same framework. Next, in order to benchmark NA-SODINN against other state-of-the-art HCI algorithms, we apply NA-SODINN to the first phase of the Exoplanet Imaging Data Challenge (Cantalloube et al. 2020), a community-wide effort meant to offer a platform for a fair and common comparison of exoplanet detection algorithms. In this analysis, we observe that NA-SODINN is ranked at the top (first or second position) of the challenge leader-board for all considered evaluation metrics, providing in particular the highest true positive rate among all entries, while still keeping a low false discovery rate. Our new NA-SODINN framework therefore opens the door to more accurate searches for new and/or non-confirmed worlds in individual HCI data sets, as well as in large HCI surveys.

Fig. 1. Processed frame from the sph2 data set with both speckle-dominated and background-dominated residual noise regimes and their annular split (black circle). The best approximation of this split is what we aim to find in this section.

Fig. 2. Rolling annulus with N = 100 over the processed frame of Fig. 1. The first (in red), ninth (in blue) and eighteenth (in green) rolling annuli are displayed over the central pixel pavement in the image. The full list of rolling annuli is shown on the black line below.

Fig. 3. Evolution of the statistical moments based on a rolling annulus which paves the full annular-PCA processed frame. The top and bottom rows refer, respectively, to the sph2 and nrc3 ADI sequences. Each colored curve in a subplot refers to a different number of principal components.

Fig. 4. PCA-pmap of the sph2 ADI sequence, showing the combined p-value p both as a color code and as values, as a function of the distance to the star through the rolling annulus (x axis) and the number of principal components used in the PCA-based PSF subtraction (y axis). Yellow star markers indicate when the null hypothesis $H_0$ (Gaussian noise) is rejected. The white dashed line shows the 90% CEVR at each rolling annulus. White circles in bold highlight the principal component that maximizes the S/N of fake companion recoveries.

Fig. 6. SODINN labeling stage. Top: steps for generating MLAR samples (see text for more details). $N_f$ is the number of frames in the ADI sequence and $N_{pc}$ is the number of principal components in the cube of processed frames and therefore in the final MLAR sequence. Bottom: example of an MLAR sequence of each class.

Fig. 7. S/N curves generated from the sph2 cube of processed frames at an 8λ/D distance from the star. Curves in blue contain the exoplanet signature and curves in red just residual noise. The flux of injections is randomly selected from a range that is between one and three times the level of noise. Dotted curves over the populations show the mean of each class.

Fig. 8. Illustration of the three steps within the NA-SODINN algorithm workflow. Left: generation of the training set. NA-SODINN uses the annular-PCA algorithm to perform PSF subtraction and produce the cube of processed frames. Then, it detects residual noise regimes by applying the PCA-pmap technique to this cube and builds both the training and inference data sets for each regime, which are composed of both MLAR samples and S/N curves. Middle: model training. NA-SODINN trains as many detection models as detected noise regimes using their respective training data sets (note that for the sake of simplicity, we have not duplicated the central deep neural network). This case contains two regimes, the speckle- and background-dominated noise regimes, so that two models are trained. Right: detection map. Finally, NA-SODINN uses each trained model to assign to each pixel of the corresponding noise regime field-of-view a probability of belonging to the $c^+$ class.

Fig. 9. ROC analysis per noise regime for the sph2 data set showing the performance of SODINN, NA-SODINN, annular-PCA, and hybrid SODINN models.The values plotted alongside each curve highlight some of the selected thresholds.

Fig. 11. NA-SODINN probability maps obtained on the whole set of EIDC ADI sequences (Table A.1). For the submitted probability threshold τ = 0.90, we highlight with green circles the correct detection of injected companions (true positives), and with red circles the non-detection of injected companions (false negatives). No false positive is reported in our maps, as all the remaining non-circled peaks in the probability maps are below the threshold. Large white circles delineate the noise regimes in each case.

Fig. 12. Updated EIDC leader-board after the NA-SODINN submission. Ranking based on the F1-score (top), the AUC of the TPR (middle) and the AUC of the FDR (bottom). Colors refer to HCI detection algorithm families: PSF-based subtraction techniques providing residual maps (red) or detection maps (orange), inverse problems (blue) and supervised machine learning (green). The light, medium and dark tonalities correspond to SPHERE, NIRC-2, and LMIRCam data sets respectively.

Fig. C.1. TPR, FDR and FPR metrics computed from the probability maps of Fig. 11 for a range of probability thresholds varying from zero to one. Their respective AUCs are shown in each legend. The F1-score is computed at the threshold submitted to the challenge, $\tau_{\rm sub} = 0.9$ (vertical dashed line), and is shown at the top of each subplot. When the data set contains injections, TPR and FDR decrease stepwise with threshold while FPR decreases monotonically. An ideal algorithm would thus provide TPR = 1, FPR = 0 and FDR = 0 for any threshold and therefore $\mathrm{AUC}_{\rm TPR} = 1$, $\mathrm{AUC}_{\rm FPR} = 0$ and $\mathrm{AUC}_{\rm FDR} = 0$. However, when the data set does not have injections, the FPR is the only metric that can be defined, as it does not depend on TPs.