Toward instrument combination for boundary layer classification

To handle the complexity of the atmospheric boundary layer (ABL) and perform accurate feature detection (top height, low‐level jets, inversions, etc.), a necessary prior step is to identify the type of boundary layer. This study proposes a new method to identify the boundary layer type through unsupervised classification and the synergistic use of ground‐based remote sensing. Unsupervised classification is used to lighten the human supervision. The new classification was applied to a 1‐day case study collected during wintertime in the Arve River valley near Chamonix–Mont‐Blanc during the Passy‐2015 field experiment. The ABL classification obtained from the combination of microwave radiometer and ceilometer observations (ground‐based remote sensors [GBReS]) is compared with high‐frequency radiosoundings (RS) data and the outputs of the French convective‐scale AROME model. Classifications from RS and GBReS broadly agree, demonstrating the good behavior of the method, whereas AROME leads to different results at night. The difference with AROME is likely due to the different nature of the data (model fields are smoother and include forecasting errors). The results show the ability of unsupervised classification to segment relevant objects in the boundary layer and the benefit of using a combination of GBReS.


| INTRODUCTION
The atmospheric boundary layer (ABL) is classically defined as the part of the atmosphere in contact with the earth's surface that undergoes its influence within a time scale of the order of 1 h (Seibert et al., 2000; Stull, 1988). It is the seat of complex phenomena such as turbulence, exchanges with the surface, complex terrain forcings, local circulations (breezes, low-level jets), fog, and so forth. Most of these phenomena are directly linked to high-stakes issues (e.g., air quality, land and air transportation, urban climate). The ABL is thus a key topic for weather and climate forecasting, because it influences the upper atmosphere through many physical processes that are currently not well described in numerical models (Couvreux et al., 2016). Observing the ABL is also a challenge because of its high spatio-temporal variability and the limited resolution of satellite data in this part of the atmosphere, which has led to many collaborative actions to build large networks of observation sites (see, e.g., Illingworth et al. 2019; Haefele et al. 2016 and references therein). The ABL is usually described by a typical diurnal cycle whose complexity is due to its high dependence on the land cover, the orography, the latitude, and large-scale conditions (de Arruda Moreira et al., 2022; Garratt, 1994; Lehner & Rotach, 2018).
Abbreviations: ABL, atmospheric boundary layer; FA, free atmosphere; GBReS, ground-based remote sensors; IOP, intensive observation period; MWR, microwave radiometer; RS, radiosoundings; SBL, stable boundary layer.
The ABL state usually evolves from stable conditions at night, with temperature inversions and low-level jets, to convective conditions in the afternoon that lead to turbulent mixing. To retrieve a more advanced product such as the ABL height, identifying the ABL type is a prerequisite, as the most relevant retrieval method depends on the situation and the ABL state (see, e.g., Collaud Coen et al. 2014 and references therein). As a consequence, ABL classification has been identified as one of the main products to answer end-user needs within the European COST action PROBE (Cimini et al., 2020).
Previous studies on ABL classification are usually part of broader works on ABL height detection (Pal et al., 2013; Rieutord et al., 2021; Toledo et al., 2014), parameterization schemes (Harvey et al., 2013; Lock et al., 2000), air quality alerts (Liao et al., 2018; Sun et al., 2021), or cloud characterization (Hogan & O'Connor, 2004; Kotthaus & Grimmond, 2018), for example. Many classifications rely on carefully tuned thresholds and device-dependent decision trees (Hogan & O'Connor, 2004; Kotthaus & Grimmond, 2018; Manninen et al., 2018; Pal et al., 2013). Harvey et al. (2013) describe a profile-based classification with a probabilistic decision tree, lowering the sensitivity to hard thresholds. Using Doppler lidar observations, Lock et al. (2000) distinguish nine ABL classes, each triggering a different ABL mixing scheme in the UK Met Office model. Manninen et al. (2018) demonstrate that a pixel-based classification with a threshold-based decision tree was able to identify the source of local turbulence at high temporal and spatial resolution using Doppler lidar observations. Using potential temperature profiles retrieved from radiosoundings (RS) and unsupervised classification based on self-organizing maps, Liao et al. (2018) showed that nine ABL types could be detected and linked with air quality regimes in Beijing. Rieutord et al. (2021) perform a binary classification (inside/outside the ABL) with the K-means and AdaBoost algorithms in order to retrieve the boundary layer height from lidar measurements.
This study presents a pixel-based classification (as in Manninen et al. 2018) with unsupervised machine learning algorithms (as in Liao et al. 2018 but with a different algorithm). Unsupervised classification removes the need for an a priori list of ABL types (as is required in Harvey et al. 2013 and Manninen et al. 2018). The present study aims to provide a detailed description of the ABL with instrument synergy, whereas Liao et al. (2018) aim to identify ABL types relevant for air quality based on RS only. The K-means method used in Rieutord et al. (2021) is also unsupervised, and the present study can be seen as an extension of that work to the multi-class problem and to multiple instruments. The flexibility of this novel approach is demonstrated by applying the ABL classification to several datasets: model forecasts, RS data, and a combination of ground-based remote sensors (GBReS) (a microwave radiometer [MWR] and a ceilometer). Data and methodology are described in Section 2. Results of the case study are presented in Section 3, followed by conclusions and prospects.

| Material
Observational data come from the Passy-2015 experiment, which took place in the Northern Alps and focused on boundary layer meteorology and air quality in a section of the Arve river valley near Chamonix-Mont-Blanc in wintertime (Paci et al., 2017; Sabatier et al., 2018). The case study chosen here to illustrate the main code functionality is February 19, 2015, which is part of the second Intensive Observation Period (IOP) of Passy-2015. It was chosen for the high concentration of remote sensors, high-frequency RS, and kilometer-scale numerical modeling, which make it particularly suited to testing innovative methods (see, e.g., Martinet et al. 2017; Chemel et al. 2016). The second IOP was characterized by a short anticyclonic period between two synoptic disturbances, with little snow cover. Meteorological conditions within the valley were marked by pronounced nighttime boundary layer inversions that disappeared during daytime due to the sun-driven heating of the ground.
FIGURE 1 Original data used in the case study. (a) Absolute temperature (T) from radiosoundings (RS); (b) relative humidity from RS; (c) T from AROME; (d) relative humidity from AROME; (e) T from microwave radiometer; and (f) aerosol backscatter from ceilometer and measured cloud base height (black line).
Three sources of data are used in this study: AROME model forecasts (Brousseau et al., 2016), high-frequency RS, and GBReS (temperature profiles retrieved from a microwave radiometer and ceilometer backscatter profiles). They provide several parameters on a time-altitude grid, as shown in Figure 1. All data refer to the same site.
For AROME, 1-h forecasts of absolute temperature and specific humidity profiles were used. The 35 vertical levels of AROME within the ABL start at approximately 3.28 m above ground level and are linearly stretched so that the resolution decreases with altitude (Δz = 11.3 m near the ground and Δz = 115.6 m near 2000 m altitude). The horizontal grid resolution is 1.3 km.
For RS, values of absolute temperature and relative humidity were used. Twelve launches were made that day, so the average time between two launches is about Δt = 2 h. However, the launches were irregularly spaced, with more launches in the afternoon. The spacing between vertical levels of RS is Δz = 9.47 m on average.
For GBReS, we combine a ceilometer (Vaisala CT25K) and a microwave radiometer (RPG HATPRO G3; Rose et al., 2005). Range-corrected backscatter intensity was taken from the ceilometer, and retrieved absolute temperature was taken from the radiometer. The ceilometer data have resolutions of Δt = 15 s and Δz = 15 m. The microwave radiometer data have Δt = 12 min and a stretched vertical grid (Δz = 50 m close to the surface, Δz = 200 m at z = 2000 m). An interpolation is therefore necessary to put these data on a common grid. The target grid, chosen after a short sensitivity analysis (not shown), has a resolution of Δz = 40 m and Δt = 30 min.
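The regridding step can be sketched with SciPy's `RegularGridInterpolator`, using linear interpolation onto the Δt = 30 min, Δz = 40 m target grid. This is a minimal sketch, not the authors' code: the function name `to_common_grid`, the 0-2000 m vertical extent, and the MWR-like level heights and synthetic field are assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def to_common_grid(t_src, z_src, values, t_new, z_new):
    """Linearly interpolate a (time, altitude) field onto a target grid.

    t_src, z_src: strictly increasing 1-D source coordinates
    values: array of shape (len(t_src), len(z_src))
    Returns an array of shape (len(t_new), len(z_new)); target points outside
    the source grid are set to NaN.
    """
    interp = RegularGridInterpolator(
        (t_src, z_src), values, method="linear", bounds_error=False, fill_value=np.nan
    )
    tt, zz = np.meshgrid(t_new, z_new, indexing="ij")
    points = np.column_stack([tt.ravel(), zz.ravel()])
    return interp(points).reshape(tt.shape)


# Target grid from the text: dt = 30 min, dz = 40 m (0-2000 m extent is assumed)
t_new = np.arange(0.0, 24.0, 0.5)        # hours
z_new = np.arange(0.0, 2001.0, 40.0)     # meters above ground

# Hypothetical MWR-like source grid: dt = 12 min and a stretched vertical grid
# (these level heights are illustrative, not the actual HATPRO levels)
t_mwr = np.arange(0.0, 24.0, 0.2)
z_mwr = np.array([0, 50, 100, 175, 275, 400, 550, 750, 1000, 1300, 1600, 1800, 2000.0])
T_mwr = 270.0 + 0.1 * t_mwr[:, None] - 0.005 * z_mwr[None, :]   # synthetic field

T_common = to_common_grid(t_mwr, z_mwr, T_mwr, t_new, z_new)
```

For the 15-s ceilometer profiles, pure linear interpolation to 30-min steps effectively subsamples the signal; averaging within each window beforehand would be an alternative design choice, not one stated in the text.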

| Method
Classification algorithms aim to assign a class to individuals. In our case, individuals are pixels in a time-altitude grid, and classes are boundary layer types. An individual is represented by a vector of features $x = (x_1, \ldots, x_p)$, with $p$ the number of atmospheric variables used for the classification. For example, with RS data, we have $p = 2$ features: absolute temperature and relative humidity. For GBReS data, $p = 1$ when the instruments are used separately (for the MWR alone, only absolute temperature is used because of the low information content of the MWR regarding humidity in the ABL; for the ceilometer alone, only backscatter intensity is available). When they are used together, $p = 2$, but the two atmospheric variables are originally available on different grids, so an interpolation step is required to get all atmospheric variables on the same grid. A class is represented by a label $y$, an integer between 1 and $K$, $K$ being the number of classes.
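Concretely, once all variables share one time-altitude grid, each pixel becomes one row $x = (x_1, \ldots, x_p)$ of an $N \times p$ matrix. The sketch below shows this reshaping and its inverse; the helper names and the choice to drop pixels with missing values (marked -1 on the output grid) are assumptions, not details given in the text.

```python
import numpy as np


def grids_to_features(*fields):
    """Turn p gridded variables of shape (n_t, n_z) into an (N, p) feature matrix.

    Each row is one pixel x = (x_1, ..., x_p). Pixels with a missing value
    (NaN) in any variable are dropped; the boolean mask of kept pixels is
    returned so that labels can be put back on the grid.
    """
    X = np.stack([np.asarray(f, dtype=float).ravel() for f in fields], axis=1)
    valid = ~np.isnan(X).any(axis=1)
    return X[valid], valid


def labels_to_grid(labels, valid, shape):
    """Inverse operation: place one label per kept pixel back on the grid."""
    grid = np.full(shape, -1, dtype=int)   # -1 marks unclassified pixels
    flat = grid.reshape(-1)                # view on the same memory
    flat[valid] = labels
    return grid


# Example with p = 2 features (e.g. RS temperature and relative humidity)
T = np.array([[270.0, 268.0, 266.0], [272.0, np.nan, 267.0]])
RH = np.array([[80.0, 70.0, 40.0], [75.0, 65.0, 35.0]])
X, valid = grids_to_features(T, RH)     # one NaN pixel dropped: X has 5 rows
grid = labels_to_grid(np.array([0, 0, 1, 0, 1]), valid, T.shape)
```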
A classifier is a function $C$ such that $C(x) = y$. Classifiers are built upon a set of individuals $X_{\text{train}} = (x_1, \ldots, x_N)^T$ for which trustworthy class labels $y_{\text{train}} = (y_1, \ldots, y_N)$ are available, $N$ being the number of pixels. The proposed method to build an automatic classifier $\hat{C}$ is iterative. At the beginning, $X_{\text{train}}$ and $y_{\text{train}}$ are empty, and the available classifier $\hat{C}$ comes from unsupervised classification.
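1. Choose a set of individuals (for example, 1 day of measurements): $X_{\text{new}} = (x_1, \ldots, x_N)^T$.
2. Use the available classifier to provide class labels: $\hat{y}_{\text{new}} = \hat{C}(X_{\text{new}})$.
3. Manually interpret the given class labels and add the human-validated individuals to the training set: $X_{\text{train}} \leftarrow X_{\text{train}} \cup X_{\text{new}}^{\text{valid}}$, $y_{\text{train}} \leftarrow y_{\text{train}} \cup \hat{y}_{\text{new}}^{\text{valid}}$.
4. Repeat from step 1 until the supervised classifier reaches satisfactory performance.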
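The iterative procedure above can be sketched as a loop in which the first batch is labeled by clustering and later batches by a supervised classifier trained on the accumulated validated pixels. This is a hedged sketch, not the authors' implementation: `human_validate` is a hypothetical callback standing in for the manual interpretation, and k-nearest neighbours is an assumed choice of supervised classifier.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import KNeighborsClassifier


def iterative_training(batches, n_clusters, human_validate):
    """Sketch of the iterative classifier-building procedure.

    batches: sequence of (N_i, p) feature matrices, e.g. one per day.
    human_validate: HYPOTHETICAL callback standing in for the manual
        interpretation. It receives (X_new, y_new) and returns the labels the
        expert accepts, with -1 for rejected pixels (at least one pixel must
        be accepted per batch).
    k-nearest neighbours as supervised classifier is an assumption.
    """
    X_parts, y_parts = [], []
    clf = None                                   # no supervised classifier yet
    for X_new in batches:                        # choose a new set of pixels
        # Label the new batch with the currently available classifier
        if clf is None:
            # Empty training set: labels come from unsupervised clustering
            y_new = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X_new)
        else:
            y_new = clf.predict(X_new)
        # Keep only the human-validated pixels and add them to the training set
        y_checked = np.asarray(human_validate(X_new, y_new))
        keep = y_checked >= 0
        X_parts.append(X_new[keep])
        y_parts.append(y_checked[keep])
        X_train, y_train = np.vstack(X_parts), np.concatenate(y_parts)
        clf = KNeighborsClassifier(n_neighbors=min(5, len(y_train))).fit(X_train, y_train)
    # The stopping rule ("satisfactory performance") is left to the user
    return clf


rng = np.random.default_rng(0)


def make_batch():
    # Two well-separated synthetic groups of pixels (illustrative only)
    return np.vstack([rng.normal((0, 0), 0.5, (20, 2)), rng.normal((10, 10), 0.5, (20, 2))])


def accept_all(X, y):
    # Stand-in for the expert: accepts every proposed label
    return y


clf = iterative_training([make_batch(), make_batch()], n_clusters=2, human_validate=accept_all)
```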
This study focuses on the first steps of this iterative process, namely the unsupervised classification at the very start. All algorithms were run with the Scikit-learn open-source Python package (Pedregosa et al., 2011), and the code is fully available at https://github.com/ThomasRieutord/blclassification.
Unsupervised classifiers are based on distances between individuals, and they automatically gather the closest individuals into the same class. More specifically, we used clustering algorithms (Friedman et al., 2001; Jain et al., 1999); setting details are given when the results are presented. Note that the unsupervised classification is only a way to lighten the human supervision: the physical consistency of the classes is ensured by human supervision, not by the classification settings.
The identification of ABL types is made by examining the characteristics of the formed groups, according to Stull (1988). Given that (i) the physical definitions are idealized and (ii) the groups are formed from the data and not from the physical definitions, the names given to the groups are somewhat subjective and certainly debatable. The following types are attributed to the groups of pixels if they gather enough of the listed characteristics.
• Mixed layer: warm, in contact with the ground, moderate to high concentration.
• Stable layer: cold, under a temperature inversion, possibly high concentration.
• Free atmosphere: aloft, low concentration.
• Cloud: concentration gradient strong enough to trigger the cloud base height detection of the ceilometer.
The word concentration is used as a proxy for the relative humidity (water vapor concentration) or the backscatter intensity (which depends on the aerosol concentration combined with the aerosol optical size). To highlight the difference with the usual definitions of ABL types, the groups given by the classification are referred to as potential ABL types.
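To support the manual examination of the groups against the characteristics listed above, one can summarize each cluster by its mean altitude, temperature, and concentration, and by whether it touches the lowest level. This is a minimal sketch under assumptions: the helper name, the chosen statistics, and the convention that altitude index 0 is the lowest level are illustrative, and the mapping from statistics to ABL type names remains a human decision.

```python
import numpy as np


def summarize_clusters(labels_grid, z, temperature, concentration):
    """Per-cluster statistics to support manual identification of ABL types.

    labels_grid, temperature, concentration: arrays of shape (n_t, n_z)
    z: altitudes (m) of shape (n_z,), with index 0 the lowest level (assumed)
    Returns {label: dict} with mean altitude, mean temperature, mean
    concentration, and whether the cluster touches the lowest level.
    """
    zz = np.broadcast_to(z, labels_grid.shape)
    summary = {}
    for k in np.unique(labels_grid):
        if k < 0:
            continue                     # skip unclassified pixels
        m = labels_grid == k
        summary[int(k)] = {
            "mean_altitude": float(zz[m].mean()),
            "mean_temperature": float(temperature[m].mean()),
            "mean_concentration": float(concentration[m].mean()),
            "touches_ground": bool(m[:, 0].any()),
        }
    return summary


# Tiny illustrative grid: 2 times x 3 levels
labels = np.array([[0, 0, 1], [0, 1, 1]])
z = np.array([0.0, 100.0, 200.0])
temp = np.array([[270.0, 269.0, 268.0], [272.0, 266.0, 265.0]])
conc = np.array([[80.0, 75.0, 30.0], [85.0, 35.0, 25.0]])
summary = summarize_clusters(labels, z, temp, conc)
```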
First, we apply unsupervised classification to each GBReS instrument separately (Section 3.1). Then, the same unsupervised classification is applied to the GBReS combined (Section 3.2). Finally, the classification is applied to the different sources of data: AROME, RS, and GBReS (Section 3.3).

| Unsupervised classification without combination of GBReS
Figure 2 shows the results of unsupervised classification applied to the microwave radiometer and to the ceilometer separately. In both panels, the unsupervised classification is made with average-linkage hierarchical clustering, with a maximum of four clusters. The results of the classification are class labels for pixels, from 0 to 3, shown as colors on a time-altitude grid. Note that the same color does not correspond to the same ABL type in each panel. This inconsistent attribution of colors, which directly reflects the cluster labels, illustrates the arbitrary attribution of cluster labels in unsupervised classification.
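A single-instrument classification of this kind can be sketched with scikit-learn's `AgglomerativeClustering` (average linkage, four clusters). The sketch also illustrates why colors are not comparable across panels: re-running on the same pixels in another order recovers the same groups, but the integer attached to each group may differ. The synthetic temperatures are assumptions; the adjusted Rand index is used here only to check that two labelings describe the same partition.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# Synthetic single-instrument data (p = 1), e.g. MWR temperature alone:
# four well-separated temperature levels stand in for distinct layers.
X = np.concatenate(
    [rng.normal(mu, 0.3, 50) for mu in (265.0, 270.0, 275.0, 280.0)]
).reshape(-1, 1)

# Average-linkage hierarchical clustering cut at four clusters
model = AgglomerativeClustering(n_clusters=4, linkage="average")
labels = model.fit_predict(X)            # one label in {0, 1, 2, 3} per pixel

# Same pixels in another order: same groups, possibly different label numbers
perm = rng.permutation(len(X))
labels_perm = model.fit_predict(X[perm])
# The adjusted Rand index ignores label numbering; it equals 1 for identical partitions
agreement = adjusted_rand_score(labels[perm], labels_perm)
```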
First, the classification with the ceilometer only (Figure 2a) detects a potential stable boundary layer (SBL) (red area) below 400 m between 00:00 and 09:00, topped by a potential cloud layer (green area). The presence of the cloud is confirmed by the automatic detection of cloud base height by the ceilometer, which reports a cloud base at this location. One of the clusters (blue area) does not correspond to any boundary layer object and is likely due to low signal strength.
Second, the classification with the radiometer only (Figure 2b) also detects a potential SBL (orange area) below 800 m between 00:00 and 09:00, thicker than the one observed by the ceilometer alone. The structure of a potential convective boundary layer is also detected from 11:00 to 21:00. The red area corresponds to the description of a well-mixed layer. The green area corresponds to an intermediate layer, not as warm as the possible mixed layer but still warmer than the surrounding pixels. However, the orange area is also present above 1600 m, in the same group as the potential SBL (which does not necessarily imply that it is also a potential stable boundary layer).
This comparison demonstrates that each instrument is sensitive to different objects in the boundary layer, so the two instruments are complementary as expected. For example, the cloud layer is detected by the ceilometer but not by the radiometer alone. The potential convective boundary layer is detected by the microwave radiometer but not by the ceilometer alone.

| Hierarchical clustering with combination of GBReS
Figure 3 shows the results of unsupervised classification applied to the combination of GBReS. This classification was made with hierarchical clustering, the "cityblock" distance, and average linkage. Figure 3a-f shows the results of hierarchical classification when it is configured to form 2-7 clusters.
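Because hierarchical clustering builds a single dendrogram, the 2-7 cluster results are cuts of the same tree and are therefore nested: each cluster at level k+1 lies entirely inside one cluster at level k. The sketch below uses SciPy's `linkage` with `method="average"` and `metric="cityblock"`, then `fcluster` with the `maxclust` criterion. The random two-column data are a structureless stand-in for the combined MWR temperature and ceilometer backscatter features; only the mechanics matter here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)
# Synthetic stand-in for the combined GBReS features (p = 2):
# column 0 ~ MWR temperature, column 1 ~ ceilometer backscatter (arbitrary units)
X = rng.normal(size=(300, 2))

# Build the dendrogram once: average linkage with the cityblock (L1) distance
Z = linkage(X, method="average", metric="cityblock")

# Cut the same tree into 2..7 clusters; fcluster labels start at 1
partitions = {k: fcluster(Z, t=k, criterion="maxclust") for k in range(2, 8)}
```

This nesting is what allows the level of detail to be chosen after the fact without re-running the clustering.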
• Figure 3a (two clusters): the algorithm distinguishes between a potential SBL (blue) and another part (orange).
• Figure 3b (three clusters): the other part is divided into a potential mixed layer (orange) and the potential free atmosphere (FA) (green).
• Figure 3c (four clusters): the potential SBL is divided into a potential cloud layer (blue) and another potential stable boundary layer.
These results show the potential of hierarchical classification to describe the boundary layer with a chosen level of detail. Figure 3c is comparable to Figure 2a,b because they all display four clusters obtained with the same clustering algorithm. In Figure 3c, one can see the potential SBL (orange area, visible in Figure 2a,b), the potential cloud layer (blue area, visible in Figure 2a but not in Figure 2b), and the potential diurnal boundary layer (green area, visible in Figure 2b but not in Figure 2a). Figure 3c thus combines the strengths of both instruments and integrates the information present in Figure 2a,b. The artifacts visible in Figure 2a,b are also removed in Figure 3c. This demonstrates the significant benefit of using GBReS in combination for our boundary layer classification objective.

| Comparison between AROME, RS, and remote sensors
The classification is now applied to data from all sources in order to check whether the same objects are visible in model data, RS data, and ground-based remote sensing data. For the RS and AROME sources, the backscatter information was replaced by another concentration variable: the relative humidity. Figure 4 shows the cluster labels on a time-altitude grid for the three sources of data. This classification was made with hierarchical clustering with Euclidean distance and Ward linkage, cut at three clusters.
FIGURE 3 Result of classification for GBReS data for 2-7 clusters: hierarchical clustering with "cityblock" distance, average linkage, and linear interpolation on the grid (Δt = 30 min, Δz = 40 m).
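Applying the same settings to each source amounts to running one function per (temperature, concentration) pair of grids. The sketch below uses Ward linkage (which implies the Euclidean distance) cut at three clusters. Standardizing each feature is an assumption of this sketch, added so that kelvins and percent contribute comparably to the distance; the text does not state which scaling, if any, was used. The synthetic grid with three regimes is also illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler


def classify_source(temperature, concentration, n_clusters=3):
    """Ward-linkage (Euclidean) clustering of one data source on its own grid.

    temperature, concentration: (n_t, n_z) grids without missing values,
    e.g. RS or AROME temperature and relative humidity, or GBReS
    temperature and backscatter.
    Feature standardization is an ASSUMPTION of this sketch.
    """
    X = np.column_stack([temperature.ravel(), concentration.ravel()])
    X = StandardScaler().fit_transform(X)
    labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(X)
    return labels.reshape(temperature.shape)


rng = np.random.default_rng(2)
n_t, n_z = 24, 20
# Synthetic grid: aloft is dry, morning near-ground is cold and moist,
# afternoon near-ground is warm (values are illustrative only)
T = np.full((n_t, n_z), 272.0)
RH = np.full((n_t, n_z), 30.0)
T[:12, :10], RH[:12, :10] = 265.0, 90.0
T[12:, :10], RH[12:, :10] = 278.0, 60.0
T += rng.normal(0.0, 0.1, T.shape)
RH += rng.normal(0.0, 0.5, RH.shape)

labels = classify_source(T, RH)
```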
One can see in all classification outputs two clusters in contact with the ground, one in the morning and one in the afternoon, and a third cluster above them. We identify the morning ground-connected cluster as a potential SBL, the afternoon ground-connected cluster as a potential mixed layer (ML), and the aloft cluster as the potential FA.
The locations of the clusters vary depending on the data source. However, they are very consistent between GBReS observations and RS data, demonstrating the ability of the classification algorithm to run on various types of data. In addition, the benefit of the higher temporal and vertical resolution of the remote sensing instruments in GBReS is very clear compared with the sparser RS data. Another inference from Figure 4 is that the data preparation, although different between GBReS (which require interpolation) and RS (which do not), does not prevent the classification from giving similar results. This suggests that the interpolation applied to GBReS data is not disruptive.
The classification from AROME data shows larger discrepancies compared with the two other classifications. With AROME forecasts, the potential SBL is not very realistic, as it extends too high in the atmosphere (above 1250 m at 01:00). The potential ML develops earlier for AROME (around 08:00) than for the other data sources (around 12:00), and its maximum vertical extension is limited to 1000 m. In Figure 4b,c, the green area near the ground between 18:00 and 00:00 could be a shallow stable layer not seen by the AROME model. However, many of these differences are already visible in the input data (Figure 1). They are therefore more likely due to model limitations (e.g., subgrid physics and topography representation) than to an erroneous classification of the model data by the algorithm.
When the parameters of the classification change (data not shown), the potential stable layer and the potential mixed layer are identified with all sources of data. However, the cloud is detected neither by RS nor by AROME, whereas it is detected by GBReS (see Figures 2a and 3c-f). The separation of the potential mixed layer into two sublayers (e.g., as in Figure 3e) can also differ from one source to another. These differences probably arise from the differences in resolution and from the fact that the relative increase in backscatter intensity in clouds is stronger and can therefore be picked up by the clustering algorithm. Therefore, the classifications from different sources can only be compared for the potential stable layer and the potential mixed layer.

| Conclusions
The current study presents a new method to perform pixel-based boundary layer classification using machine learning techniques. It assumes that relevant atmospheric variables are available on a time-altitude grid. Pixels of the time-altitude grid are grouped by hierarchical clustering into classes with similar atmospheric values.
The method was applied to AROME model forecast data, high-frequency RS data, and a combination of GBReS data for a wintertime valley boundary layer case observed on February 19, 2015, during the second IOP of the Passy-2015 experiment. The study yields the following results:
• Hierarchical clustering is able to facilitate boundary layer classification with instrument combination. The resulting classification combines the strengths of both instruments and provides better results than could be deduced from applying the method to each instrument individually.
• The detailed examination of the hierarchical structure of clusters for GBReS shows potential to describe the boundary layer with a chosen level of detail.
• The comparison between AROME, RS, and GBReS shows that the classification is applicable to all sources (similar results), with GBReS having the best temporal and vertical resolution.
FIGURE 4 Results of the classification from AROME, RS, and GBReS data: hierarchical clustering with Ward linkage cut at three clusters.

| Prospects
The purpose of the present study is to explain a new methodology and give an insight into its potential, but additional work is needed to investigate more deeply the benefits of this method for automatic boundary layer classification. First, the study should be extended to a broader dataset. The interesting objects detected by the hierarchical classification here (like the cloud layer in Figure 3c or the transition layer in Figure 3f) may not be detected on another day. The ability of the classification to deal with various situations, including unprecedented ones, must be investigated in order to quantify the information brought by the classification and its reliability. The recent field experiment SOFOG3D (Burnet et al., 2020) could be used, as it operated co-located microwave radiometers and ceilometers for 6 months at 6 different sites, together with AROME simulations with fine vertical resolution and frequent RS during fog events.
Second, the identification of the ABL types in the output of the classification raises some questions. Given that (i) the physical definitions are idealized and (ii) the groups are formed from the data and not from the physical definitions, the names given to the groups are somewhat subjective and certainly debatable. We believe that unsupervised classification brings a new approach to this debate: defining the boundary layer types from the data rather than from a priori theory. Instead of having hard definitions (all characteristics must be satisfied) for the boundary layer types and finding the corresponding groups in the data, the unsupervised classification looks for groups in the data, and these groups are then identified with soft definitions (enough characteristics are satisfied). In doing so, the subjectivity of the ABL type identification would lie in the use of these soft definitions and in the choice of the input features (e.g., using virtual temperature instead of absolute temperature).
Third, evaluating such classification results is not straightforward. Internal scores exist (e.g., the silhouette score [Rousseeuw, 1987] or the Calinski-Harabasz index [Caliński & Harabasz, 1974]). They are usually based on distance ratios, but they are not connected with the usefulness of the classification and might not be the most appropriate evaluation methods. A comparison with existing boundary layer classifications has two main handicaps: (i) it requires ensuring that the output classes are comparable, which brings back the identification issue raised in the previous paragraph; and (ii) in the absence of a reliable reference, the result would only highlight the differences between classifications without telling which one is better. User-oriented scores would be a better way to assess the classification, but they still need to be defined and built.
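For reference, both internal scores are available in scikit-learn (`silhouette_score`, `calinski_harabasz_score`). The sketch below computes them for several numbers of clusters on synthetic data with three well-separated groups, where both scores peak at three. The data and the helper name are assumptions; on real ABL pixels, such geometric scores say nothing about physical usefulness, which is exactly the limitation noted above.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import calinski_harabasz_score, silhouette_score


def internal_scores(X, k_values, linkage="ward"):
    """Silhouette and Calinski-Harabasz scores of hierarchical clustering for several k."""
    scores = {}
    for k in k_values:
        labels = AgglomerativeClustering(n_clusters=k, linkage=linkage).fit_predict(X)
        scores[k] = (silhouette_score(X, labels), calinski_harabasz_score(X, labels))
    return scores


rng = np.random.default_rng(3)
# Synthetic features with three well-separated groups (illustrative only)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in ((0, 0), (5, 0), (0, 5))])
scores = internal_scores(X, range(2, 7))
best_k_silhouette = max(scores, key=lambda k: scores[k][0])
best_k_ch = max(scores, key=lambda k: scores[k][1])
```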