Near-real-time detection of unexpected atmospheric events using principal component analysis on the Infrared Atmospheric Sounding Interferometer (IASI) radiances

The three Infrared Atmospheric Sounding Interferometer (IASI) instruments on board the Metop family of satellites have been sounding the atmospheric composition since 2006. More than 30 atmospheric gases can be measured from the IASI radiance spectra, allowing the improvement of weather forecasting and the monitoring of atmospheric chemistry and climate variables. The early detection of extreme events such as fires, pollution episodes, volcanic eruptions, or industrial releases is key to taking safety measures to protect the inhabitants and the environment in the impacted areas. With its near-real-time observations and good horizontal coverage, IASI can contribute to the series of monitoring systems for the systematic and continuous detection


Introduction
Atmospheric composition is changing fast locally and globally, under combined natural and anthropogenic influences. Fire activity and local urban pollution are likely to increase in a warming climate (Hart et al., 2022). Given their potential consequences for society and health, monitoring the events that impact atmospheric composition is becoming increasingly important.
Since the end of 2006, the Infrared Atmospheric Sounding Interferometer (IASI) mission has been probing the troposphere from on board three successive Metop satellites to monitor the atmospheric composition globally (Clerbaux et al., 2009). Observation records and trends are available for several infrared absorbing species, such as methane (CH4; García et al., 2018), carbon monoxide (CO; George et al., 2009), ammonia (NH3; Van Damme et al., 2021), ozone (O3; Dufour et al., 2018; Wespes et al., 2019), and dust (Capelle et al., 2014; Clarisse et al., 2019). As the first goal of this mission is to feed meteorological forecasts using data assimilation, radiance level 1C (L1C) data are received in near-real time, around 2-3 h after the overpass of the satellite. This makes it possible to detect exceptional events, potentially right after they occur, such as large biomass burning fires (Turquety et al., 2009; R'Honi et al., 2013), anthropogenic pollution episodes (Boynard et al., 2014), or volcanic eruptions (Wright et al., 2022). With more than 1.2 × 10^6 radiance spectra per instrument per day, the search for local extreme events in near-real time is not straightforward. A further limitation is the lack of data when clouds are present in the field of view, as the usual retrieval algorithms fail to properly derive atmospheric concentrations for trace gases. Cloudy data are hence filtered out.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Soon after the launch of the first IASI instrument, it was suggested to use the principal component analysis (PCA) method to reduce data volumes by reconstructing the radiances using only the leading eigenvectors (Matricardi, 2010). This compression not only heavily decreases the data volume but also eases the data dissemination. Now available through the EUMETSAT (EUropean organisation for the exploitation of METeorological SATellites) Advanced Retransmission Service (EARS-IASI), the PCA method allows meteorological centers to directly assimilate the principal components (Collard et al., 2010; Matricardi and McNally, 2014; Guedj et al., 2015). It was also demonstrated that using reconstructed IASI radiances results in a substantial reduction in the random instrument noise for the analysis of trace gases such as sulfur dioxide (SO2) or NH3 (Atkinson et al., 2010). However, it was decided to continue the distribution of the entire radiance spectra (8461 spectral channels), as one of the concerns in the use of the PCA method for atmospheric chemistry studies was the detection, in the reconstructed spectra, of spectral features associated with minor trace gases linked with rare events. Examples are volcanic eruptions, which all differ in terms of the gas and type of ash emitted, so that not enough representative cases were available in the training set. The same holds for biomass burning fires, which release different amounts of specific species depending on the type of vegetation burned. With the advent of the second and third IASI instruments, together with the improvement of retrieval algorithms over time, a number of short- and long-lived trace gases were identified in the IASI spectra up- or downwind of strong emission sources (Clarisse et al., 2011; De Longueville et al., 2021).
This paper describes the potential of the PCA applied to the IASI L1C (apodized radiance) data for the automatic, near-real-time detection and characterization of exceptional events. The paper is organized as follows: Sect. 2 describes the IASI instrument and the dataset used in this study. Section 3 describes the PCA method. In Sect. 4, an innovative approach based on the PCA method and IASI data granules is presented, which allows the spectral characterization of species in near-real time. In Sect. 5, different case studies of exceptional past events are discussed, such as volcanic, fire, and anthropogenic pollution episodes, along with industrial accidents, detected by IASI/Metop-A and Metop-B. Finally, conclusions are given in Sect. 6.
2 The IASI radiance data
IASI is a Fourier transform infrared spectrometer, which records the thermal infrared (TIR) radiation emitted by the Earth and the atmosphere, between 645 and 2760 cm^-1, with 8461 channels sampled every 0.25 cm^-1 and a spectral resolution of 0.5 cm^-1. An example of an IASI spectrum, along with the absorption bands of several species, is shown in Fig. 1.
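As a quick consistency check on the figures quoted above, the channel count follows from the spectral range and the sampling step. The short sketch below is illustrative only; the function names and the 0-based channel indexing are assumptions, not part of the IASI processing chain.

```python
# Spectral grid quoted in the text: 645 to 2760 cm^-1, sampled every 0.25 cm^-1.
START_CM1 = 645.0
END_CM1 = 2760.0
STEP_CM1 = 0.25


def channel_count(start=START_CM1, end=END_CM1, step=STEP_CM1):
    """Number of samples on an evenly spaced grid that includes both ends."""
    return int(round((end - start) / step)) + 1


def channel_wavenumber(index, start=START_CM1, step=STEP_CM1):
    """Wavenumber (cm^-1) of a channel, assuming 0-based indexing (hypothetical)."""
    return start + index * step


print(channel_count())           # 8461
print(channel_wavenumber(8460))  # 2760.0
```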
In this work, IASI-A and IASI-B are used as a combined dataset. The IASI-A dataset is used for the study of events before the launch of IASI-B and for creating the PCA training database (described hereafter), and the IASI-B complete dataset is used for data from 2013 to the present. The two datasets have been shown to be highly consistent, with no significant drifts over time (García et al., 2016).
Each IASI instrument provides more than 1.2 × 10^6 spectra per day. IASI L1C data are disseminated by EUMETSAT in 3 min files (hereafter called a granule) less than 3 h after each overpass. Each granule contains 22 or 23 IASI scan lines, with 120 pixels per line. With a swath width of ∼2200 km, global observations are provided twice a day, at 09:30 and 21:30 LT. IASI has an instantaneous field of view (FOV) at nadir, with a spatial resolution of 50 km × 50 km, composed of 2 × 2 circular pixels (IFOV), each corresponding to a 12 km diameter footprint on the ground at nadir (Clerbaux et al., 2009).
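The data volumes quoted above can be cross-checked with simple arithmetic. The sketch below assumes back-to-back 3 min granules over a full day, which is a simplification made only for this illustration.

```python
PIXELS_PER_LINE = 120   # from the text
GRANULE_MINUTES = 3     # from the text


def spectra_per_granule(scan_lines):
    """Spectra in one granule for a given number of scan lines (22 or 23)."""
    return scan_lines * PIXELS_PER_LINE


def spectra_per_day(scan_lines):
    """Daily total, assuming contiguous granules covering 24 h (assumption)."""
    granules_per_day = 24 * 60 // GRANULE_MINUTES
    return granules_per_day * spectra_per_granule(scan_lines)


for lines in (22, 23):
    print(lines, spectra_per_granule(lines), spectra_per_day(lines))
```

Both cases give roughly 2700 spectra per granule and between 1.2 × 10^6 and 1.4 × 10^6 spectra per day, in line with the numbers given in the text.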
3 The principal component analysis method

Basic concepts
The PCA method for high spectral resolution sounders, such as IASI, is described in Atkinson et al. (2008). This method is well suited to efficiently represent the amount of information contained in the 8461 IASI channels. It relies on a dataset of thousands of spectra representing the full range of atmospheric conditions, from which the principal components are calculated (the so-called training database).
One considers an ensemble of $n$ IASI radiance spectra $y$ of dimension $m$ (where $m$ is the number of channels, and $n$ is the number of observations). Let us denote $\overline{N^{-1} y}$ as the mean and $S$ ($m \times m$) as the covariance of the normalized ensemble of spectra. $N$ is the noise normalization matrix and is defined as the square root of $S_y$ ($m \times m$), which is the instrument noise covariance matrix associated with the IASI spectra.
The PCA is based on the eigenvalue decomposition of the matrix $S$, as follows:

$$S = E \Lambda E^{T} \qquad (1)$$

where $E$ is the $m \times m$ matrix of eigenvectors and $\Lambda$ the diagonal matrix of their associated eigenvalues.
The representation of a measured spectrum $y$ in the eigenspace $E$ is obtained by the following:

$$p = E^{T} \left( N^{-1} y - \overline{N^{-1} y} \right) \qquad (2)$$

where $p$ (dimension $m$) is the vector of the principal component scores.
The analysis consists of representing the multidimensional IASI spectra in a lower-dimensional space, which accounts for most of the variance seen in the data. This space is spanned by a truncated set of the eigenvectors of the data covariance matrix. Noise-normalizing the spectra prior to the application of the PCA enhances the ability to fit the data, because it avoids giving too much weight to variance caused by noise. Given $m^{*}$, the number of the most significant eigenvectors of $S$, one can represent the spectrum in the eigenspace with a truncated vector of principal component scores, $p^{*}$, with the rank $m^{*}$ ($m^{*} < m$). $p^{*}$ is thus a compressed representation of $y$. The reconstructed spectrum, $\tilde{y}$ (dimension $m$), is given by the following:

$$\tilde{y} = N \left( E^{*} p^{*} + \overline{N^{-1} y} \right) \qquad (3)$$

where $E^{*}$ is the matrix of the $m^{*}$ first eigenvectors or principal components. We define the noise-normalized residual vector $r$ (dimension $m$) of the reconstruction with the following:

$$r = N^{-1} \left( y - \tilde{y} \right) \qquad (4)$$

By definition, if $m^{*}$ is taken equal to $m$, then $\tilde{y} = y$, and the residual is the null vector. In nominal cases, if the truncation rank is carefully chosen, then $r$ essentially contains noise.
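The steps above (noise normalization, eigendecomposition, truncated reconstruction, and residual) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the operational implementation: the noise matrix $N$ is taken to be diagonal and passed as a vector `noise_std`, the function names are hypothetical, and the demo uses small synthetic data rather than IASI spectra.

```python
import numpy as np


def train_pca(spectra, noise_std):
    """Mean, eigenvalues and eigenvectors of the noise-normalized training set.

    spectra: (n, m) radiances; noise_std: (m,) noise level per channel
    (a diagonal N is an assumption made for this sketch).
    """
    normalized = spectra / noise_std               # N^-1 y
    mean = normalized.mean(axis=0)                 # mean of N^-1 y
    cov = np.cov(normalized, rowvar=False)         # S, shape (m, m)
    eigvals, eigvecs = np.linalg.eigh(cov)         # returned in ascending order
    order = np.argsort(eigvals)[::-1]              # sort in descending order
    return mean, eigvals[order], eigvecs[:, order]


def noise_normalized_residual(y, noise_std, mean, eigvecs, rank):
    """Residual r = N^-1 (y - y_rec) for a truncation rank m* = rank."""
    e_star = eigvecs[:, :rank]                     # E*
    p_star = e_star.T @ (y / noise_std - mean)     # truncated PC scores p*
    y_rec = noise_std * (e_star @ p_star + mean)   # reconstructed spectrum
    return (y - y_rec) / noise_std


# Synthetic demo (not IASI data): 3 latent signals plus noise on 20 channels.
rng = np.random.default_rng(0)
n_obs, n_chan = 500, 20
noise_std = np.full(n_chan, 0.5)
signal = rng.normal(size=(n_obs, 3)) @ rng.normal(size=(3, n_chan)) * 5.0
spectra = 100.0 + signal + rng.normal(scale=noise_std, size=(n_obs, n_chan))

mean, eigvals, eigvecs = train_pca(spectra, noise_std)
r_full = noise_normalized_residual(spectra[0], noise_std, mean, eigvecs, rank=n_chan)
r_trunc = noise_normalized_residual(spectra[0], noise_std, mean, eigvecs, rank=3)
print(np.abs(r_full).max(), np.abs(r_trunc).max())
```

With the full rank, the residual vanishes up to rounding error, as stated in the text; with a truncated rank, the residual is orthogonal to the retained eigenvectors and, for a nominal spectrum, is dominated by noise.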
Several techniques exist to estimate $m^{*}$ in order to keep the essential part of the atmospheric signal and to remove the eigenvectors containing mainly the measurement noise (e.g., Antonelli et al., 2004; Atkinson et al., 2010). The individual components of vector $r$ are used later to define the reconstruction score, and they are denoted $r_i$.
In the following, the noise-normalized residual, which is calculated for each IASI IFOV, is called the IFOV-residual.
between 90 and 75°, only one spectrum is selected; between 75 and 60°, two spectra are selected; between 60 and 45°, three spectra are selected; between 45 and 30°, four spectra are selected; between 30 and 15°, five spectra are selected; and between 15 and 0°, six spectra are selected.
To reach a sufficient but reasonable number of IASI spectra/IFOVs (1.3 × 10^6 spectra per day; 4.7 × 10^8 per year), 120 000 IFOVs for the year 2013 were randomly chosen to represent all atmospheric/surface situations (air masses, land/sea, day/night, and clear/cloudy) and acquisition conditions (IASI scan mirror position and pixel number).

Number of eigenvectors
Several techniques exist to estimate $m^{*}$ in order to keep the essential part of the atmospheric signal and to remove the eigenvectors containing mainly the measurement noise. Antonelli et al. (2004) define a criterion based on the spectral root mean square (rms) reconstruction residuals, finding the optimal truncation rank when this value approaches the spectral rms of the instrument noise. Other methods directly test the behavior of the reconstruction score $\frac{1}{m} \sum_{i=1}^{m} r_i^2$ as a function of the truncation rank, by looking at its second derivative (e.g., Hultberg, 2009) or by plotting the spatial correlation of the principal component scores ($p$) as a function of the eigenvector rank (Atkinson et al., 2009). In this study, the estimation of $m^{*}$ is based on the analysis of the eigenvalues. The eigenvalues (sorted in descending order) quantify the variability explained by the corresponding eigenvectors, and the optimal number of eigenvectors needed to reproduce the signal in the raw radiances can be determined by analyzing their magnitude and behavior. In the present implementation of the PCA method, we process the full IASI spectrum and use a simple method for selecting the truncation rank. The plot of the eigenvalues was examined, and principal components (PCs) were selected up to the point where the slope of the curve stabilized. This leads to choosing the first 150 eigenvectors, as done in Atkinson et al. (2010). Sensitivity tests of the impact of different values (from 120 to 250) on the reconstruction scores obtained for several atmospheric events (the fire and volcano cases discussed in the next sections) confirm this value.
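The reconstruction score and an eigenvalue-based rank selection can be illustrated as follows. In the study the eigenvalue curve was examined visually; the automatic criterion below (the last position where the drop in the logarithm of consecutive eigenvalues exceeds a tolerance `tol`) is a hypothetical stand-in for that inspection, and both the function names and the tolerance are assumptions.

```python
import numpy as np


def reconstruction_score(residual):
    """Mean squared noise-normalized residual, (1/m) * sum of r_i^2."""
    r = np.asarray(residual, dtype=float)
    return float(np.mean(r ** 2))


def rank_from_eigenvalue_slope(eigenvalues, tol=0.01):
    """Rank after which the eigenvalue curve is flat (hypothetical criterion).

    eigenvalues must be positive and sorted in descending order.
    """
    log_ev = np.log(np.asarray(eigenvalues, dtype=float))
    drop = -np.diff(log_ev)                  # decrease between neighbours
    steep = np.nonzero(drop >= tol)[0]       # positions still on the slope
    # Keep every eigenvector up to and including the last steep drop.
    return int(steep[-1]) + 1 if steep.size else 1


# Toy spectrum of eigenvalues: three signal components above a flat noise floor.
print(rank_from_eigenvalue_slope([100.0, 50.0, 10.0, 1.0, 1.0, 1.0]))  # 3
print(reconstruction_score([1.0, -1.0, 1.0, -1.0]))                    # 1.0
```

For noise-normalized spectra, a reconstruction score close to 1 indicates that the residual is at the level of the instrument noise.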

Granule maxima and minima
The near-real-time detection of exceptional events is performed on the IASI granule. Applying the method to the granule is convenient for the near-real-time aspect, as it represents 3 min of IASI data, which are received every 1-2 h by the antenna. Each granule contains ∼2700 radiance spectra from which the corresponding IFOV-residuals are computed, based on the IASI-PCA method. For each granule, the largest positive and negative residual values for each spectral channel are recorded in two arrays, which are hereafter called granule maxima (GMA) and granule minima (GMI). GMI and GMA are defined as pseudo-residuals of dimension 8461 (the number of radiance channels) and represent the spectral envelope of the statistics of residuals over the granule. Physically, the GMI (GMA) pseudo-residual is associated with reconstruction errors in spectral absorption (emission) lines. Since the method is based on the granule extrema (GMI and GMA), it is called IASI-PCA-GE, where GE stands for granule extrema. It is important to note that these pseudo-residuals associated with a granule are different from the individual IFOV-residual associated with each IFOV.
Figure 2 illustrates an example of GMA and GMI pseudo-residuals for an intense fire event that occurred in Australia on 1 January 2020. The GMI pseudo-residual (bottom panel) is characterized by detectable spectral features associated with a poor reconstruction around 700, 950, 1100, or 2100 cm^-1. Using a spectroscopic database allows us to associate some of these strong peaks with the contributions of different atmospheric components (see Sect. 5 for the identification of the molecules). Similar spectral features can be seen in the GMA pseudo-residual (top panel), albeit in emission and less intense.
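The definition of the two pseudo-residuals amounts to a per-channel maximum and minimum over all IFOV-residuals of a granule. A minimal NumPy sketch follows, with a hypothetical function name and a toy array standing in for the ∼2700 × 8461 residual matrix of a real granule.

```python
import numpy as np


def granule_extrema(ifov_residuals):
    """Return (GMA, GMI) for one granule.

    ifov_residuals: array of shape (n_ifov, n_channels) holding the
    noise-normalized residual of every IFOV in the granule.
    """
    r = np.asarray(ifov_residuals, dtype=float)
    gma = r.max(axis=0)   # largest positive residual per channel (emission side)
    gmi = r.min(axis=0)   # largest negative residual per channel (absorption side)
    return gma, gmi


# Toy granule: 3 IFOVs, 3 channels; channel 1 carries a strong absorption outlier.
toy = [[0.5, -1.0, 0.2],
       [-0.3, -8.0, 0.1],
       [1.2, 0.4, -0.6]]
gma, gmi = granule_extrema(toy)
print(gma.tolist())  # [1.2, 0.4, 0.2]
print(gmi.tolist())  # [-0.3, -8.0, -0.6]
```

Note that GMA and GMI mix values from different IFOVs, which is why they are pseudo-residuals and differ from any individual IFOV-residual.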

Detection thresholds
Two detection thresholds are defined in order to select (1) the granules associated with outliers only (which saves computation time) and (2) the IFOV-residuals associated with reconstruction errors. For the definition of the detection thresholds, a dataset of 43 000 IASI/Metop-B granules (21 500 granules for daytime and 21 500 for nighttime) containing outlier and regular spectra and chosen randomly on the first of each month between April 2013 and April 2021 is used. Note that this dataset differs from that generated for the principal component calculation, as the detection method is applied on a granule basis. From this dataset, 21 500 GMI and 21 500 GMA pseudo-residuals are calculated for both day- and nighttime conditions.
Figure 3 shows the statistical distribution of the largest minimum and maximum values for each of the 43 000 GMI/GMA pseudo-residuals for all channels. The lower and upper limits of the blue box represent the 25th and the 75th percentiles of the data, respectively. The red line represents the median. The black lines represent the upper adjacent value (UAV) and lower adjacent value (LAV), and the red crosses were considered to be outliers in a first analysis of the dataset. Using the UAV and LAV as thresholds was observed to be too restrictive. After several tests, a decision was made to use the 25th percentile of the data to keep the granules associated with potential outliers (F1 threshold). All granules associated with GMA or GMI minimum and maximum values (in absolute values) larger than the 25th percentile of the datasets are then selected, thus avoiding the need to process granules without interesting anomalies.
A second threshold (F2 threshold) was defined for each spectral channel, based on the 99th percentile value of the GMI and GMA pseudo-residuals calculated from the 43 000 granules (21 500 for daytime conditions and 21 500 for nighttime conditions). This F2 threshold is used in the processing of each granule selected after applying the F1 threshold. It is applied only to channels of interest that are associated with a strong absorption of a molecule, which are identified in Table 1. For those channels, all IFOV-residuals associated with values larger than the F2 threshold values are selected. The choice of the 99th percentile as the threshold value is the result of extensive tests performed on both the ensemble of statistically representative scenes (the 43 000 granules) and on specific atmospheric situations of fires and volcanoes. It corresponds to the empirical compromise allowing (1) a reasonable rate of detection of extreme events (below 4 %) for the processed scenes, (2) the minimization of false positive detections in the statistically representative scenes (false positive detections are empirically identified as spatially noisy, i.e., isolated IFOVs), and (3) the unambiguous detection of well-identified fire and volcanic events. Values of the F2 thresholds used for the channels of interest are provided in Table 1. In the detection processing for each selected IFOV-residual, the spectral channel associated with the detection (and thus the corresponding spectral interval and associated molecule as defined in Table 1) is recorded, along with the corresponding IFOV-residual value, the latitude, and the longitude. This step allows us to localize (IFOV latitude and longitude) and characterize (spectral position and corresponding IFOV-residual value) the outliers.
https://doi.org/10.5194/amt-16-2107-2023 Atmos. Meas. Tech., 16, 2107-2127, 2023
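The two-stage filtering described above can be sketched as follows. This is an illustration under simplifying assumptions, not the operational code: absolute values are used so that GMA and GMI are handled together, the reference set is a tiny toy array instead of the 43 000 granules, and the function names are hypothetical. The actual F2 values are those of Table 1, which is not reproduced here.

```python
import numpy as np


def f1_threshold(reference, percentile=25.0):
    """Scalar threshold on the largest |pseudo-residual| of each granule.

    reference: (n_granules, n_channels) GMA or GMI pseudo-residuals.
    """
    peak_per_granule = np.abs(reference).max(axis=1)
    return float(np.percentile(peak_per_granule, percentile))


def f2_thresholds(reference, percentile=99.0):
    """Per-channel threshold from the reference pseudo-residuals."""
    return np.percentile(np.abs(reference), percentile, axis=0)


def detect(ifov_residuals, f1, f2, channels_of_interest):
    """List of (ifov index, channel, residual) exceeding F2 in a granule kept by F1."""
    r = np.asarray(ifov_residuals, dtype=float)
    if np.abs(r).max() <= f1:          # granule without anomaly: skip it
        return []
    hits = []
    for ch in channels_of_interest:
        for ifov in np.nonzero(np.abs(r[:, ch]) > f2[ch])[0]:
            hits.append((int(ifov), int(ch), float(r[ifov, ch])))
    return hits


# Toy reference set: 4 granules, 2 channels.
reference = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
f1 = f1_threshold(reference)
f2 = f2_thresholds(reference)

granule = [[0.1, -5.0], [0.2, 0.3]]    # IFOV 0 has an outlier in channel 1
print(detect(granule, f1, f2, channels_of_interest=[1]))  # [(0, 1, -5.0)]
```

In a real setting, each hit would additionally carry the latitude and longitude of the IFOV, as described in the text.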

Towards a detection of extreme events in near-real time
Right after the reception of each IASI 3 min granule, the two GMA/GMI pseudo-residuals, in addition to other statistics of the residual over the granule, are calculated. Then the two thresholds defined in Sect. 4.2 are applied to the GMA/GMI pseudo-residuals in order to localize the pixels potentially associated with an event and the associated channels. In the case of anomalies (i.e., threshold overrun) in the GMA/GMI pseudo-residuals, an alert is raised and the targeted channels are identified. The corresponding absorbing species and their spectral range are then identified, together with the peak position of the associated channel, and a spatial distribution map of the detected pixels in the 3 min granule is produced. This allows us to visualize and further study exceptional events. The IASI-PCA-GE method was validated for past and documented events, four of which are described hereafter. It is now running continuously, delivering email alerts on a routine basis, using the near-real-time IASI L1C radiance data. Most of these alerts are associated with fires and volcanic eruptions.
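The step from a detected channel to a candidate species can be illustrated with a simple lookup. The sketch below is hypothetical: the peak list is a small illustrative subset built from peak positions quoted in the case studies of this paper, not the content of Table 1, and the 0-based channel indexing, the matching tolerance, and the function names are assumptions.

```python
FIRST_WAVENUMBER = 645.0   # cm^-1, start of the IASI spectral range
STEP = 0.25                # cm^-1, spectral sampling

# Illustrative peak positions (cm^-1) quoted in the case studies; not Table 1.
PEAKS_CM1 = {
    "HCN": [712.50],
    "C2H2": [729.50],
    "C2H4": [949.25],
    "NH3": [931.00, 967.00],
    "SO2": [1371.50],
}


def wavenumber(channel):
    """Wavenumber of a 0-based channel index (indexing convention assumed)."""
    return FIRST_WAVENUMBER + STEP * channel


def identify(channel, tolerance=0.5):
    """Species whose illustrative peak lies within `tolerance` cm^-1 of the channel."""
    nu = wavenumber(channel)
    return sorted(name for name, peaks in PEAKS_CM1.items()
                  if any(abs(nu - p) <= tolerance for p in peaks))


def build_alert(hits):
    """Group hits (lat, lon, channel, residual) by candidate species."""
    alert = {}
    for lat, lon, channel, residual in hits:
        for species in identify(channel):
            alert.setdefault(species, []).append(
                (lat, lon, wavenumber(channel), residual))
    return alert


print(identify(2906))                              # ['SO2']
print(build_alert([(-16.3, -70.9, 2906, -150.0)]))
```

A channel that matches no entry yields an empty list, which in this sketch simply means that no alert is attached to that detection.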

Case studies
This section presents a demonstration of the IASI-PCA-GE method for several past extreme events. The method is applied to the IASI/Metop-A and the IASI/Metop-B L1C radiance data. Table 2 gives a brief description of the case studies presented hereafter.
For each event, we apply the IASI-PCA-GE method over a granule and identify the molecules in the outliers through an analysis of the residual statistics, assigning each spectroscopic feature to the corresponding species. We also provide maps to illustrate the spatial distribution of the target event. When available, the maps are compared to the existing retrieved IASI products (CO in Hurtmans et al., 2012; NH3 in Clarisse et al., 2012).

Volcanic eruption events
Volcanic eruptions have a major impact on the atmospheric composition. SO2, which has several strong absorption bands in the TIR spectral range, is the most common molecule observed in the volcanic plume (Clarisse et al., 2012). Several other species were previously observed by satellites in volcanic eruptions, such as hydrochloric acid (HCl; Clarisse et al., 2020), hydrogen sulfide (H2S; Clarisse et al., 2011), and sulfuric acid (H2SO4; Ackerman et al., 1994; Karagulian et al., 2010), which can be injected into the stratosphere in the case of a high-altitude eruption (Rose et al., 2006; Millard et al., 2006). The IASI-PCA-GE method was applied to several volcanic eruptions. Here, we illustrate the findings for the eruption in Ubinas, Peru, on 20 July 2019 (Venzke, 2019). The Instituto Geofísico del Perú (IGP; Geophysics Institute of Peru) mentioned that seismic activity suddenly increased during June 2019 and remained high during July 2019, with important ash emissions causing the evacuation of the population in some areas affected by ashfall. Figure 4 illustrates the normalized GMI pseudo-residual obtained during this volcanic eruption, which corresponds to a granule taken in the area of the plume during daytime. A large difference between the reconstructed spectra and raw spectra is seen in the SO2 ν3 band around ∼1371 and ∼1377 cm^-1, which is in agreement with the results of Clarisse et al. (2008, 2012) showing the sensitivity of the ν3 band. Indeed, the peak found at 1371.50 cm^-1 is associated with the presence of the SO2 plume in the upper troposphere/lower stratosphere (∼14 km; 150 hPa) between 0.5 and 200 DU (saturation; Clarisse et al., 2011). Such a detection is expected in this case due to the high quantity of SO2 emitted. It is worth noting that other peaks in the GMI pseudo-residual also show strong absorptions, which were initially associated with HNO3. These peaks appear in the GMI at ∼763, ∼879 and ∼897, and ∼1326 cm^-1, associated with the ν8, ν5, 2ν9, ν3, and ν2 nitric acid absorption bands, respectively. Although this constituent has previously been reported in the plumes of some actively degassing volcanoes (Mather et al., 2004), it has never been observed by remote sensing before. As the analysis of the IASI HNO3 L2 products shows no HNO3 enhancement, further investigations were performed to identify where the signature comes from.
The HNO3 detection by the IASI-PCA-GE method was further investigated by applying the whitening method proposed by De Longueville et al. (2021). The use of a covariance matrix, calculated from a set of IASI spectra, shows similar results to those found with the IASI-PCA-GE method. However, using a covariance matrix excluding the SO2 absorption band, no HNO3 spectral feature was found. This suggests that no nitric acid is present in the plume. The features found in the HNO3 absorption band by the IASI-PCA-GE method are likely related to SO2 features, given that the SO2 ν3 absorption band overlaps with the HNO3 ν3 band.
Furthermore, other spectral signatures remain difficult to characterize in the 1200-1300 cm^-1 spectral domain. This spectral range corresponds to the absorption of different volcanic compounds such as ash, aerosols, and other possible volcanic molecules such as H2S or H2SO4 (Karagulian et al., 2010) but is also sensitive to strong H2O absorptions.
After applying the threshold filters defined in Sect. 4.2 to the GMI pseudo-residual, the spatial distribution of the pixels associated with outliers can be mapped. Figure 5 shows a plume of SO2 (left) in southeastern South America, with large signal intensity values reaching around −150 in the center of the plume. The spatial distribution of the retrieved IASI SO2 L2 operational products (right) also shows the plume located in southeastern South America and is in excellent agreement with the SO2 plume detected from the IASI-PCA-GE method.

Volcanic eruption archive for IASI/Metop-B
The IASI-PCA-GE method is applied to the IASI/Metop-B global dataset over the 2013-2022 period to derive a time series of SO2 detections. Figure 6 shows the comparison of the SO2 IASI-PCA-GE signal intensity with the SO2 hyperspectral range index (HRI) product at 5 km (Bauduin et al., 2016). HRIs at 5 km are chosen because of a good sensitivity around this altitude (Clarisse et al., 2014), compared to the L2 SO2 concentration data that show concentrations above 5 km (likely high-intensity volcanism). Only daily SO2 extrema of both the IASI-PCA-GE method and HRI product are compared. They are spatially co-located and associated with documented volcanic events from the Global Volcanism Program, Smithsonian Institution (https://volcano.si.edu/, last access: 19 April 2023). It is observed that both methods are able to detect not only intense eruptions but also moderate or degassing volcanic events. The largest volcanic eruptions detected during this period by both methods are Calbuco on 22 April 2015, Raikoke on 22 June 2019, and Ubinas on 19 July 2019 (Sennert, 2015, 2019a, b). Furthermore, for all major events (corresponding to 2810 d out of 3373 d in total), an excellent correlation (R^2 = 0.96) is found between the HRIs and the IASI-PCA-GE signal intensity.
In order to analyze and understand the differences between the two records, the correlation between the latitudes of both datasets shown in Fig. 6 is plotted (see Fig. 7). An excellent location correlation between the HRI and IASI-PCA-GE methods is observed for high-intensity detections. However, some discrepancies are found in the case of low-intensity events, corresponding to commonly active degassing volcanoes.
Some specific latitudes associated with degassing volcanoes, such as Sabancaya (Moussallam et al., 2017), the Vanuatu island arc with Ambae (Bani et al., 2012), Colima and Popocatepetl in Mexico (Varley and Taran, 2003), and the
The daily maxima located around 56° N have been investigated and found to be associated with Kamchatka degassing volcanoes. Dispersed latitudes of IASI-PCA-GE daily maxima are not consistent with the co-registered HRI maxima. These differences between the IASI-PCA-GE and HRI methods can also be explained by the relation between the plume altitude and temperature not being represented in the principal components, which also affects the spectral reconstruction. As a result, the location of daily maxima can be different in the case of low-intensity detections because of the PCA overestimation (or underestimation) of atmospheric anomalies. This also results from the nonlinear relationship between retrieved concentrations and PCA intensities.
It is interesting to note that both IASI-PCA-GE and HRI detections observed at around 30 and 65° N are associated with anthropogenic emissions in the region of the Sarcheshmeh Copper Complex, one of the largest industrial mining complexes for copper, which emits about 789.9 t of SO2 per day (Amirtaimoori et al., 2014), and over the city of Norilsk, which is also well known for its mining and smelting industries (Bauduin et al., 2016). That finding illustrates the capacity of both methods to detect industrial emissions.
It is found that the relation between the concentration and signal intensity is not linear, and the PCA-based results cannot be used for an accurate quantification of SO2 concentrations. Indeed, IASI-PCA-GE signals depend not only on the molecule concentration but also on thermal contrast and other surface parameters and atmospheric conditions. This is why discrepancies are found at high latitudes between the location of IASI-PCA-GE and HRI maxima, which are associated with eruptions in the Kamchatka region.

The Australia case study
In Australia, fire events known as bushfires occur every year. Coupled with global warming and the lack of rainfall in 2019-2020, the fires were particularly intense, with burned areas covering more than 186 000 km^2. It was shown that pyroconvection allowed the plume to reach the lower stratosphere at around 15-16 km (Khaykin et al., 2020). Many species were observed by the Atmospheric Chemistry Experiment Fourier Transform Spectrometer (ACE-FTS) during that episode (e.g., Boone et al., 2020).
The IASI-PCA-GE method was applied to the IASI/Metop-B L1C data on 1 January 2020. Figure 8 illustrates an example of a normalized GMI pseudo-residual obtained during the Australia fire event. As expected, peaks relative to the CO absorption lines are found in the 2050-2200 cm^-1 spectral domain. Other peaks associated with the absorption of molecules are also visible, including HCN, with a peak at 712.50 cm^-1, furan (C4H4O) at 744.50 cm^-1, C2H2 at 729.50 cm^-1, C2H4 at 949.25 cm^-1, HCOOH at 1105.00 and 1777.00 cm^-1, CH3OH at 1033.50 cm^-1, and peaks associated with NH3 at 931.00 and 967.00 cm^-1.
Figure 9 (left column) shows the spatial distribution of the residual values associated with the detected species in the GMI pseudo-residual. Despite their different lifetimes, the plumes for the different species are located in the same region (around 180° E in the Pacific Ocean).
Carbon monoxide is retrieved in near-real time (George et al., 2009) from IASI L1C and is used for monitoring fires (Turquety et al., 2009). In Fig. 9, CO is observed with both the IASI-PCA-GE and the L2 retrieval methods. However, some discrepancies are found in terms of location and intensity. A few pixels are detected by the IASI-PCA-GE method in the southeast of Australia, which is in agreement with the CO operational L2 product. However, the retrieval method is able to detect a larger plume over Australia compared to the IASI-PCA-GE method. Furthermore, a large plume is also detected over the Pacific Ocean but is missed by the IASI-PCA-GE method. Note that the high-intensity CO peaks are clearly detected in the residuals (see Fig. 10). However, most of the missing pixels in the PCA detection results are located above the sea. That could be due to the combination of the database chosen in the PCA method and the high variability in this spectral domain. Indeed, a higher thermal contrast variability is observed above land (Clerbaux et al., 2009), but the database contains spectra representing the natural variability without differentiating between sea and land pixels. As a result, the spectral reconstruction above the sea with the PCA method will be less sensitive to spectral variations, causing a reduced sensitivity above the sea. Furthermore, the spectral region between 2050 and 2200 cm^-1 has shown a large statistical distribution of extrema signals within the 21 500 granules used for threshold calculation in Sect. 4.2, leading us to set a restrictive threshold for the outlier detection for CO. That restriction will also impact the number of detected pixels. The sensitivity of PCA reconstruction outliers to strong CO concentrations in fires should be more deeply investigated in further studies.
NH3 is also retrieved in near-real time (Van Damme et al., 2017) and observed with a low concentration and occurrence above Australia on 1 January 2020 in the L2 retrievals and with a low signal and occurrence in the IASI-PCA-GE method. Some pixels are detected by the IASI-PCA-GE method but are not spatially correlated with the NH3 total column L2 data. A less frequent detection of NH3 is expected since only low-intensity peaks of NH3 are found in the GMI pseudo-residual, but two plumes are observed above both land and sea, while L2 retrievals only show many isolated pixels.
However, for other indicators, the size of the plume differs; large plumes are found for C2H2, C2H4, and HCOOH, while smaller plumes are found for HCN, C4H4O, and CH3OH. Those differences can be explained by the difference between both methods. Indeed, the column maps include the effects of radiative transfer (thermal contrast in particular), and the presence of clouds can also induce differences between both products, as the retrievals are highly sensitive to clouds. For the IASI-PCA-GE method, the sensitivity to molecule detection highly depends on the selection of spectra to construct the database and the thresholds chosen for the detection.

Fire archive for IASI/Metop-B
Figure 10 illustrates the time series of the ethylene (C2H4) detections from the IASI-PCA-GE method, based on the IASI/Metop-B L1C data for the 2013-2022 period. C2H4 is a weak absorber often detected at 949.25 cm−1 in the case of high-intensity fires, and the time series shows many high-intensity peaks attributed to fire events. In the figure, the most intense fires are labeled by location (names indicated in black in Fig. 10). The presence of fires was validated by comparing the C2H4 detections with the IASI L2 CO, which has been shown to be a good fire tracer (Logan et al., 1981). The seasonality of fires clearly appears, with peaks during the Northern Hemisphere summer, mainly related to fires in Canada and Russia, and during the Southern Hemisphere summer, with annual Australian and Indonesian fires. One of the largest detections of the 2013-2022 period is associated with the 2019-2020 Australian bushfires discussed in Sect. 5.2.1. Note that the highest C2H4 intensity, observed on 29 July 2021 with a signal of 56, could not be associated with biomass burning, as no other indicators are present in the PCA residuals. The source of this C2H4 enhancement, like that of some other maxima that are all located in Iran near the Iraqi border, is likely linked to anthropogenic activities. This will be further discussed in Sect. 5.3.3.
The episode was caused by the presence of anthropogenic emissions combined with low wind speed and a low-altitude boundary layer, leading to weak mixing and dispersion of pollutants. The ability of IASI to detect high concentrations of trace gases such as CO, SO2, and NH3, as well as ammonium sulfate aerosol ((NH4)2SO4), during the nighttime was demonstrated in the case of large negative thermal contrasts related to the winter season and coal burning in China for domestic heating. The IASI-PCA-GE method was applied on 13 January 2013 during the nighttime. The normalized GMA pseudo-residual obtained during this anthropogenic pollution episode in China is illustrated in Fig. 11. In order to optimize the sensitivity of the method for a low-intensity event, the F2 thresholds were set to F2 = 5 for both day- and nighttime conditions for the three species of interest (CO, NH3, and SO2). We clearly see a signal associated with CO, NH3, and SO2 spectral emissions, with the largest signal for SO2 (value reaching ∼ 18). The detection of SO2 around ∼ 1345 cm−1 is less frequent here than similar detections of SO2 during volcanic eruptions. This result suggests that the SO2 absorption features around ∼ 1345 cm−1 also allow the detection of SO2 during anthropogenic pollution episodes, which is in agreement with the findings of Bauduin et al. (2014, 2016). Finally, the spectral features around 1180-1200 cm−1, which show a low signal intensity, are likely due to the IASI detector band 1-band 2 inter-band domain that is well captured in the IASI-PCA-GE method and should not be associated with an anomalous atmospheric constituent.
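The thresholding step described above can be sketched as follows. This is only an illustration under stated assumptions, not the operational code: the function `detect_species`, the CO and NH3 channel positions, and the synthetic residual are hypothetical, and only the SO2 channel near 1345 cm−1 and the value F2 = 5 are taken from the text.

```python
import numpy as np

# IASI L1C spectral grid: 645-2760 cm^-1 sampled every 0.25 cm^-1 (8461 channels).
WAVENUMBERS = 645.0 + 0.25 * np.arange(8461)

# One illustrative channel per species. Only the SO2 position is quoted in the
# text; the CO and NH3 positions are assumptions made for this sketch.
SPECIES_CHANNELS = {"SO2": 1345.00, "NH3": 967.25, "CO": 2111.50}

F2 = 5.0  # threshold used for this low-intensity case (day and night)


def detect_species(residual, channels=SPECIES_CHANNELS, threshold=F2):
    """Return {species: signal} where |normalized residual| exceeds the threshold."""
    detections = {}
    for species, wavenumber in channels.items():
        index = int(np.argmin(np.abs(WAVENUMBERS - wavenumber)))
        signal = float(residual[index])
        if abs(signal) > threshold:
            detections[species] = signal
    return detections


# Synthetic normalized GMA pseudo-residual: weak noise plus one SO2 peak.
rng = np.random.default_rng(0)
residual = rng.normal(0.0, 0.5, WAVENUMBERS.size)
residual[np.argmin(np.abs(WAVENUMBERS - 1345.00))] = 18.0

print(detect_species(residual))
```

In this toy case only SO2 passes the threshold, which mirrors the situation reported for the January 2013 episode, where the CO and NH3 signals remain close to or below F2.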
The spatial distribution of the residual values associated with the detected species in the GMA pseudo-residual (see Fig. 11) is presented in Fig. 12 (left). The IASI-PCA-GE method allows the spectral detection of NH3, SO2, and CO. However, only a few pixels are detected for NH3, which is due to the very low (< 5) signal intensity found for that species. We see the same behavior for CO. In contrast, a clear SO2 plume characterized by a signal reaching ∼ 18 (at 1345.00 cm−1; see Fig. 11) is found by the IASI-PCA-GE method.
Figure 12 (right) illustrates the spatial distribution of the NH3 and CO total columns and the SO2 plume altitude L2 data retrieved from the IASI/Metop-A L1C data (Clarisse et al., 2012). The retrieval and IASI-PCA-GE methods show different patterns. We clearly see two plumes for the SO2 plume altitude and CO concentrations, but only a few pixels of detection are found for NH3.

SO2 released by a sulfur plant
During the period extending from 20 to 27 October 2016, a sulfur mine burned in Al-Mishraq near Mosul, Iraq. This fire at the sulfur plant, which was set by members of the Islamic State in Iraq and the Levant (ISIL), caused a large emission of SO2 and other sulfur-containing species into the atmosphere, which was observed by several satellite instruments (Björnham et al., 2017). A similar plant fire occurred in June 2003 and lasted 4 weeks, with approximately 600 kt of SO2 emitted (Carn et al., 2004). This was a major health hazard (Baird et al., 2012). Nearly 1000 people were intoxicated due to toxic fire plumes, and two Iraqis died.
Figure 13 illustrates the normalized GMI pseudo-residual obtained during the Iraqi industrial disaster on 24 October 2016 (PM orbit). The GMI pseudo-residual is characterized by an absorption peak at ∼ 1326.00 cm−1 that could be assigned to HNO3 and two absorption peaks associated with SO2 at 1345.00 and 1371.00 cm−1. The signal intensity is about −14 for SO2, which suggests that the event is of low to medium intensity. However, the SO2 peaks found around ∼ 1371 and ∼ 1377 cm−1 are mostly seen in the case of intense volcanic eruptions, suggesting that the SO2 concentrations are larger than those found above most degassing volcanoes. The suggestion of an industrial origin is well supported by Fig. 14, which shows SO2 total columns up to 5 DU.
The detection at ∼ 1326 cm−1 is not associated with HNO3 but is due to the contribution of SO2 and aerosols, as already discussed in the case of the Ubinas eruption (see Sect. 5.1.1).
The spatial distribution of the residual values associated with SO2 detections is illustrated in Fig. 14. The IASI-PCA-GE method allows the spectral detection of this molecule in the region of interest 4 d after the fire started, thus showing the transport of the plume in the eastern part of the country. Fewer pixels are detected by the IASI-PCA-GE method than by the L2 retrieval method. This can be explained by the fact that the SO2 thresholds associated with the IASI-PCA-GE method were empirically chosen to minimize false positive detections, and thus, low-intensity residuals can be missed.

C2H4 sporadic emission at the border of Iran/Iraq
In Sect. 5.2.2, we reported that the IASI-PCA-GE method is well suited to the detection of biomass burning by using the C2H4 indicator found in conjunction with other signatures of molecules usually associated with fire activity. Among the events that we detected, on a few occasions, we found intense signatures in the Iran/Iraq region with no absorption other than C2H4, which suggests that sources other than biomass burning, likely anthropogenic activities, are at play. The main event occurred in July 2021, and some weaker ones are also identified in Fig. 10. By averaging IASI data over time and using a super-sampling technique, Franco et al. (2022) uncovered and identified over 300 worldwide emitters of C2H4 associated with petrochemical clusters, steel plants, coal-related industries, and megacities. However, no C2H4 point source was formally identified in this Iran/Iraq region. The method described in this paper, by contrast, is also well suited to the detection of sporadic events, as opposed to the continuous emissions identified by Franco et al. (2022). Indeed, oversampling methods are well suited for the detection of regular, even weak, anthropogenic sources but typically miss transient sources lasting less than 24 h. A new analysis was therefore performed on the events spotted by the IASI-PCA-GE method, which led to the identification of plumes lasting for only a few hours (see Fig. 15) and for specific days, as identified in Fig. 10. Although visible satellite imagery and independent online information indicate the presence of oil and gas activities in that area, no firm identification was possible, and further investigation is needed to identify the potential sources of these sporadic emissions.
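The contrast between time averaging and a per-day extremum search can be illustrated with simple arithmetic. The sketch below is a toy example, not the oversampling technique of Franco et al. (2022): the function `dilution_by_averaging`, the background level of 1, and the one-year window are arbitrary assumptions, and only the peak value of 56 is taken from the text.

```python
import numpy as np


def dilution_by_averaging(background, peak, n_days):
    """Return (long-term mean, daily maximum) for a series with one transient day."""
    daily_signal = np.full(n_days, float(background))
    daily_signal[n_days // 2] = peak  # plume present on a single day only
    return daily_signal.mean(), daily_signal.max()


time_mean, daily_max = dilution_by_averaging(background=1.0, peak=56.0, n_days=365)
print(f"mean over the year: {time_mean:.2f}, daily maximum: {daily_max:.0f}")
```

With these numbers the yearly mean rises only from 1 to about 1.15, whereas the daily maximum retains the full value of 56, which is why a transient release can be invisible in averaged maps yet stand out in a daily extrema time series.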

Conclusions and perspectives
This paper presents an innovative approach, based on a PCA method applied to the IASI radiance spectra, that allows the detection and characterization of exceptional events in near-real time. This new method, the IASI-PCA granule extrema (GE) method, focuses on extrema calculated within a given geographical region. A statistical selection is made by focusing on the anomalous variability in IASI channels (detection of outliers) in order to identify the contribution of specific molecules from different types of events. The method is applied to the standard 3 min granules of IASI observations, thus allowing the near-real-time detection of a series of short-lived trace gases. Using a dataset representing the full range of atmospheric conditions, we show that the PCA method is well suited to detecting outliers efficiently. The analysis of the outliers allows the identification of spectral features exceeding the natural variability in several absorbing species, especially for weak absorbers, emitted during fires, volcanic eruptions, anthropogenic pollution, or industrial disasters. The method is more robust than previous retrieval methods when the spectra are contaminated by clouds.
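The principle summarized above can be sketched in a few lines. This is a minimal illustration under several assumptions, not the operational implementation: the PCA is computed here with a plain SVD on synthetic spectra, no noise normalization is applied, the GMI and GMA pseudo-residuals are assumed to be the per-channel minimum and maximum of the reconstruction residuals over a granule, and all function names and array sizes are hypothetical.

```python
import numpy as np


def fit_pca(training_spectra, n_components):
    """Return the mean spectrum and the leading principal directions."""
    mean = training_spectra.mean(axis=0)
    # Rows of vt are orthonormal principal directions in channel space.
    _, _, vt = np.linalg.svd(training_spectra - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_residuals(spectra, mean, components):
    """Observed spectra minus their truncated PCA reconstruction."""
    centered = spectra - mean
    scores = centered @ components.T
    return centered - scores @ components


def granule_extrema(residuals):
    """Per-channel minimum (GMI) and maximum (GMA) over all spectra of a granule."""
    return residuals.min(axis=0), residuals.max(axis=0)


# Synthetic data: spectra driven by a few modes of variability plus weak noise.
rng = np.random.default_rng(1)
n_channels, n_modes = 200, 5
basis = rng.normal(size=(n_modes, n_channels))
training = rng.normal(size=(2000, n_modes)) @ basis
training += 0.01 * rng.normal(size=training.shape)
mean, components = fit_pca(training, n_components=n_modes)

granule = rng.normal(size=(120, n_modes)) @ basis
granule += 0.01 * rng.normal(size=granule.shape)
granule[7, 50] -= 1.0  # narrow absorption feature absent from the training set

gmi, gma = granule_extrema(reconstruction_residuals(granule, mean, components))
print("channel of strongest negative residual:", int(np.argmin(gmi)))
```

Because the perturbed channel is not represented by the retained components, a single anomalous spectrum is enough to produce a strong negative extremum in the GMI at that channel, while the rest of the granule is reconstructed down to the noise level.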
The analysis of several case studies shows a good sensitivity of the IASI-PCA-GE method, which is able to detect weak absorbers such as SO2, HCN, C2H2, C2H4, CH3OH, C4H4O, and NH3. We also showed that the method is well suited to the detection of transient events that last only a few hours or days.
Our work shows that, within a granule, the negative part of the residuals (GMI) contains more information than the positive part represented by the GMA. However, the latter contains relevant information in the case of negative thermal contrasts, thus allowing the detection of specific events such as the recurrent anthropogenic pollution events occurring in China in winter.
The IASI-PCA-GE method is better suited to the detection of sporadically emitted species. In this study, only species associated with narrow spectral features (such as the Q branches of C2H2 and C2H4) have been considered. Species such as PAN, CH3COOH, and CH3COCH3, characterized by broadband absorption features, are more difficult to detect with the IASI-PCA-GE method. Also, inconclusive results were obtained for CO because its high variability, from background conditions (50 ppb) to highly polluted areas (4000 ppb), is already well captured by a truncated reconstruction. Finally, as explained above, concerning SO2 and HNO3, the spectral coincidence of some of the intense spectral features of these two species can affect the reconstruction of one when the other is present in large amounts. In the frame of this study, this is the only identified example of a confounding situation (i.e., an unusual perturbation in a limited number of channels impacts the reconstruction residual in other channels) leading to false detection. Considering the high number and diversity of detections and extreme situations analyzed in this work, such confounding situations are rare, and PCA-based detection of atmospheric events can be effectively and efficiently exploited.
Overall, this paper shows the capacity of PCA detection for identifying different species from one event to another, especially in the case of fire events, which suggests the possibility of categorizing fire events based on judicious combinations of species. The method also proves useful for deriving consistent, long-term records of fire and volcanic events, and the data will continue to accumulate over time, as the method is now routinely implemented. Further work is still needed to avoid false detections, such as those associated with HNO3. These are due to the correlation between different absorption bands of the same molecule, one of which is likely interfering with the SO2 present in the volcanic or industrial plumes.
A first version of this method is currently running continuously, delivering email alerts on a routine basis, using the near-real-time IASI L1C radiance data. Although the method is still being tested, it is planned to be used as an online tool for the early and systematic detection of extreme events.

Figure 1. (a) Example of IASI spectrum. (b, c) Radiative transfer simulations for the main and weaker infrared absorbers, respectively.

Figure 3. Distribution of normalized GMI and GMA extrema in absolute values calculated from 43 000 granules (21 500 for daytime conditions and 21 500 for nighttime conditions). The lower and upper limits of the blue box represent the 25th and 75th percentiles of the data. The red line represents the median. The black lines represent the upper adjacent value (UAV) and lower adjacent value (LAV), and the red crosses are considered to be outliers in the dataset. The dashed magenta line represents the F1 threshold.
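The box plot quantities named in this caption can be computed as in the following sketch. It assumes the conventional Tukey whisker factor of 1.5 times the interquartile range, which the caption does not state; the function `boxplot_statistics` and the sample values are hypothetical.

```python
import numpy as np


def boxplot_statistics(values, whisker=1.5):
    """Quartiles, adjacent values, and outliers of a one-dimensional sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lav": inside.min(),  # lower adjacent value: smallest point within the fences
        "uav": inside.max(),  # upper adjacent value: largest point within the fences
        "outliers": values[(values < lower_fence) | (values > upper_fence)],
    }


# Toy sample of absolute granule extrema with one clearly anomalous granule.
extrema = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 30.0])
stats = boxplot_statistics(extrema)
print(stats["lav"], stats["uav"], stats["outliers"])
```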

Figure 4. (a) Example of GMI pseudo-residual calculated from IASI/Metop-B L1C data during a volcanic eruption in Ubinas, Peru, on 20 July 2019 in the morning (daytime or AM orbit). (b) The HITRAN spectroscopic parameters associated with the absorption of HNO3 and SO2 (Gordon et al., 2017, 2022) are shown in blue and in orange, respectively.

Figure 5. (a) Spatial distribution of the residual values associated with SO2 IASI-PCA-GE detections, using IASI/Metop-B radiance data recorded on 20 July 2019 in the morning (daytime or AM orbit). (b) SO2 total column retrievals in Dobson units.

Figure 6. Time series of SO2 detections from the IASI-PCA-GE method (gray) and the SO2 HRIs at 5 km (orange), based on the IASI/Metop-B L1C data for the 2013-2022 period. Only the daily extrema are shown in the time series.

Figure 7. Comparison of latitudes corresponding to the daily maxima detected for both the IASI-PCA-GE SO2 signal intensity and HRI product between 2013 and 2022 with IASI-B L1C data during the day. The dashed lines show location discrepancies.

Figure 8. (a) Example of GMI pseudo-residual calculated from IASI/Metop-B L1C data during the intense fire event in Australia on 1 January 2020 in the morning (daytime or AM orbit). (b) The HITRAN spectroscopic parameters associated with the absorption (Gordon et al., 2017, 2022) of different species are shown with colors.

Figure 9.

Figure 10. Time series of C2H4 detections from the IASI-PCA-GE method based on the IASI/Metop-B L1C data for the 2013-2022 period. Only the daily extrema are shown in the time series. For clarity, the time series are separated into two periods, namely 2013-2017 (a) and 2018-2022 (b). Some events (blue dots) are associated with sporadic industrial releases.
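Reducing per-granule signals to the daily extrema shown in such a time series can be sketched as below. The sketch assumes that the daily extremum is the largest absolute signal recorded on each calendar day; the function `daily_extrema` and the sample values (apart from the 29 July 2021 peak of 56 quoted in the text) are hypothetical.

```python
import numpy as np


def daily_extrema(timestamps, signals):
    """Return (days, largest absolute signal recorded on each day)."""
    days = np.asarray(timestamps).astype("datetime64[D]")
    unique_days, inverse = np.unique(days, return_inverse=True)
    peaks = np.full(unique_days.size, -np.inf)
    # Unbuffered in-place maximum: each signal updates the slot of its own day.
    np.maximum.at(peaks, inverse, np.abs(np.asarray(signals, dtype=float)))
    return unique_days, peaks


timestamps = np.array(
    ["2021-07-29T06:00", "2021-07-29T18:00", "2021-07-30T06:00"],
    dtype="datetime64[m]",
)
signals = np.array([-12.0, 56.0, 3.0])
days, peaks = daily_extrema(timestamps, signals)
print(days, peaks)
```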

Figure 11. (a) Example of GMA pseudo-residual calculated from IASI/Metop-A L1C data during an anthropogenic pollution event occurring in China on 13 January 2013 in the evening (PM orbit). (b) The HITRAN spectroscopic parameters associated with the absorption of different species (Gordon et al., 2017, 2022) are shown with colors.

Figure 12. Analysis of an anthropogenic pollution event in China on 13 January 2013 in the evening (nighttime or PM orbit) based on IASI/Metop-A L1C data. (a, c, e) Spatial distribution of residual values associated with SO2, CO, and NH3. (b, d, f) SO2 plume altitude retrievals (km) and CO and NH3 total column retrievals (molec. cm−2).

Figure 13. (a) Example of GMI pseudo-residual calculated from IASI/Metop-B L1C data during a sulfur plant fire event occurring in Iraq on 24 October 2016 in the evening (nighttime or PM orbit). (b) The HITRAN spectroscopic parameters associated with the absorption of different species (Gordon et al., 2017, 2022) are shown in colors.

Figure 14. Analysis of the sulfur plant fire event in Iraq on 24 October 2016 in the evening (nighttime or PM orbit) based on IASI/Metop-A L1C data. (a) Spatial distribution of residual values associated with SO2. (b) SO2 total column in Dobson units.

Figure 15. Analysis of the ethylene sporadic emission event in Iraq on 29 July 2021 based on IASI/Metop-A L1C data. (a) Spatial distribution of residual values associated with C2H4 during the morning orbit. (b) Spatial distribution of residual values associated with C2H4 during the evening orbit.

Table 1. Signal intensity thresholds (F2) for several species for day- and nighttime conditions obtained from the 99th percentile of the GMA or GMI pseudo-residuals. The thresholds are defined based on the most intense peaks associated with each molecule. Since IASI-PCA sensitivity is generally lower during nighttime than during daytime, which is mainly due to thermal contrast, different thresholds for day- and nighttime conditions were defined.
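A percentile-based threshold of this kind can be derived as in the following sketch. It assumes that the percentile is taken over the absolute signal at a species' most intense peak, separately for day and night; the function `f2_threshold` and the synthetic signals are hypothetical and do not reproduce the values of the table.

```python
import numpy as np


def f2_threshold(peak_signals, percentile=99.0):
    """Percentile of the absolute pseudo-residual signal at a species' main peak."""
    return float(np.percentile(np.abs(peak_signals), percentile))


# Synthetic background signals, with a narrower nighttime distribution to mimic
# the lower sensitivity under weak thermal contrast (an illustrative choice).
rng = np.random.default_rng(2)
day_signals = rng.normal(0.0, 1.0, 21_500)
night_signals = rng.normal(0.0, 0.6, 21_500)

thresholds = {
    "day": f2_threshold(day_signals),
    "night": f2_threshold(night_signals),
}
print(thresholds)
```

By construction, about 1 % of the background signals exceed each threshold, so the choice of percentile directly sets the trade-off between sensitivity and false positive detections.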

Table 2. Brief description of the four case studies analyzed in this section.