A station‐based evaluation of near‐surface south foehn evolution in COSMO‐1

This study investigates the skill of the Consortium for Small-scale Modeling (COSMO) model (v5.7) at 1.1 km horizontal grid size in simulating the near-surface foehn properties and evolution for five south foehn events and a five-year-long climatology. A significant near-surface cold bias is found during foehn, with an average bias of −3 K in the Rhine Valley in the five foehn cases and −1.8 K in the major northern foehn valleys in the five-year foehn climatology. The cold bias tends to be larger in the stronger and moister deep foehn events. Sensitivity experiments are carried out to examine the possible causes of the cold bias, including changes to the parameterization of the land-atmosphere interaction, 1D turbulence parameterization, and horizontal grid spacing. Most sensitivity experiments have only a very minor impact on the cold bias, except for the model run with a horizontal grid spacing of 550 m. The 550-m COSMO run shows a reduced cold bias during foehn hours and also an improvement in the simulated foehn duration and northward foehn extent. By inspecting the vertical dimension, we found that the near-surface cold bias downstream might partly originate upstream. A further contribution to the downstream cold bias is likely due to insufficient vertical mixing in the foehn flow. The latter is possibly enhanced in the 550-m model run, leading to a less stably stratified atmosphere in the lowest few hundred meters and a reduction of the reported model cold bias.

hazards (Richner & Hächler, 2013). It may also affect the local air quality by advecting or dispersing pollutants (Baumann-Stanzer et al., 2001; Gohm et al., 2009) or by modifying the atmospheric stratification (Li et al., 2020). When the foehn flow blows over a snow- or ice-covered region, it encourages melting, which has earned it the name "snow eater". Foehn-like flow has also recently been identified as a driver of ice-shelf melting in the Antarctic Peninsula (King et al., 2017; Turton et al., 2020).
In the last decades, the European Alps have been a hotspot for foehn research (Seibert, 2012). While foehn flow has been observed to originate from different directions, the south foehn (the foehn originating from the south of the Alps) is the most studied type, probably due to its frequent occurrence and strong impacts on local weather (Drobinski et al., 2007). Seibert (1990) summarized several south foehn studies in the Wipp and Inn Valleys since the Alpine Experiment (ALPEX) in the early 1980s. There, two foehn types, differing in their warming mechanisms, are discussed: the classical "thermodynamic foehn" (Hann, 1866; Hann, 1885) and foehn with stable stratification on the south side and dry and stably stratified foehn flow which descends, forming dense isentropes on the north side (later called "isentropic drawdown" in Elvidge & Renfrew, 2016).
Since the introduction of the Mesoscale Alpine Programme (MAP) in the mid-1990s, the Rhine Valley has received attention in the scientific community due to its unique topography and flow patterns. The valley is north-south oriented, with its north end opening to the Alpine Foreland. In addition, the Rhine Valley is flanked by several tributary valleys. Previous studies have shown the occurrence of a flow-splitting pattern at the intersection of the Rhine and Seez (see Figure 1b,c) during south foehn events (Drobinski et al., 2001; Drobinski et al., 2007; Zängl et al., 2004). Katabatic drainage flow from the Walgau and Prättigau regions has been reported to merge into the Rhine Valley and can cause local wind maxima in a shallow foehn event (Drobinski et al., 2007). Steinacker (2006), in the frame of the project "Foehn in the Rhine valley during MAP" (FORM), summarized the previous studies in the Austrian and Swiss Alps. The author suggested that "thermodynamic foehn" is more often observed in the Swiss Alps (Swiss foehn), while "isentropic drawdown" is more common in the Austrian Alps (Austrian foehn). Würsch and Sprenger (2015), for the first time, confirmed the idea with a trajectory study using a three-year-long analysis as an input, and emphasized the existence of both foehn types at both locations. In addition to the two warming mechanisms mentioned above, Elvidge and Renfrew (2016) introduced two further local warming mechanisms with the help of a Lagrangian heat budget: turbulent mixing, which heats up the cold air pool (CAP) from the top down and destabilizes the foehn flow, and radiative heating, that is, an upward surface sensible heat flux boosted by the "foehn window", which heats the CAP from the bottom up.
Subsequently, Miltenberger et al. (2016) investigated the contribution of the four warming mechanisms in two south foehn cases in the Rhine Valley, with a quantitative focus on the thermodynamic foehn and isentropic drawdown. The authors identified two primary branches of trajectories attributed to the thermodynamic and isentropic drawdown mechanisms, respectively. These branches coexist (referred to as vertical "scrambling" of air parcels by the authors) and contribute simultaneously to foehn warming. However, their relative impacts can vary at different stages or periods during the foehn event. Recently, Jansing and Sprenger (2022) quantitatively studied the variability of the warming mechanisms within the airstreams arriving in six major Alpine valleys. Their study validated the previous findings of Miltenberger et al. (2016) and emphasized that the observed warming is a combination of several warming mechanisms with varying relative contributions. Importantly, the relative contributions of these warming mechanisms are influenced by the synoptic situation and geographical location of the valleys.
Alongside the foehn warming mechanisms, the variability of the synoptic situation, up- and downstream local stability, cloudiness and precipitation, and the large variability of the local foehn intensity among foehn events inspire studies on foehn classification. Gerstgrasser (2017) summarized five south foehn types based on the synoptic to mesoscale conditions: (1) the classical foehn, which is driven by a low-pressure system over the British Isles or the Bay of Biscay, which facilitates cold-air blocking on the south side and warm-air advection on the northern side of the Alps, causing a hydrostatic pressure gradient across the Alps that initiates the foehn flow; (2) the anticyclonic foehn, in which the low-pressure system is further west, resulting in no precipitation and low humidity on both sides of the Alps (analogous to the Austrian foehn type); (3) the shallow foehn, which occurs with only weak support by the synoptic situation, driven by a deep cold pool on the south side of the Alps flowing through a mountain pass (the so-called "waterfall theory"); (4) the Güller/Gegenstrom foehn as a special type of shallow foehn, in which the airflow above the mountain top is against the low-level foehn flow direction (Güller, 1977); and (5) the Dimmer foehn, in which clouds and precipitation can exist on the north side of the mountain due to a strong cross-mountain pressure gradient. Jansing et al. (2022) summarized more foehn classification methods in detail.
In contrast to the previously mentioned projects, which focus on the well-developed phase of the foehn flow, the Penetration and Interruption of Alpine Foehn (PIANO) research project investigated the foehn's initial and final stages. The project focuses on studying the interaction between the foehn and the pre-existing CAP in the Inn Valley using extensive observations combined with numerical models (Haid et al., 2020; Umek et al., 2021; Umek et al., 2022). Haid et al. (2020) provided evidence of interaction between the pre-foehn westerlies in the CAP and the foehn flow. Improvement of the simulated TKE can be achieved by employing even higher horizontal resolutions. However, even with a horizontal resolution of 13 m, the LES fails to replicate the correct timing of the foehn breakthrough. The authors hypothesized that this resolution is still insufficient to resolve adequately the small-scale eddies dominant at the interface between the foehn flow and the CAP, as well as within the CAP itself. Despite the differences in turbulence characteristics, the mean features of the foehn flow are rather similar in LES simulations with various horizontal and vertical resolutions.
A similar deficiency has been reported in the numerical weather prediction (NWP) Consortium for Small-scale Modeling (COSMO) model (Sandner, 2020; Wilhelm, 2012). The COSMO model has been used operationally in several national meteorological services, including the Swiss Federal Office for Meteorology and Climatology (MeteoSwiss), where the model has shown solid performance. However, foehn forecasters and previous investigations have noticed systematic biases during south foehn events, such as a too-early or delayed foehn onset/breakthrough at the valley floor, a cold temperature bias, and a high relative humidity bias after the breakthrough (Sandner, 2020; Wilhelm, 2012). The reasons for these biases are not yet clear. Based on the above-mentioned warming mechanisms, some hypotheses for the cold bias in the COSMO model can be proposed. For instance, a too-strong coupling between the cold soil layer and the atmosphere in the land-surface parameterization may lead to a cold bias during foehn hours. The choice of planetary boundary-layer scheme and the height of the lowest model level may affect the representation of the potential temperature profile and, consequently, the near-surface temperature (Zängl et al., 2008). A too stably stratified foehn flow may contribute to a cold bias, as suggested in Umek et al. (2021) for the elevated stations. An inaccurate distribution of clouds and precipitation may influence the surface energy budget and thus the near-surface atmospheric properties. Additionally, the filtering of the orography required for numerical stability may lead to a bias in the diagnosed 2-m temperature and an insufficient dry-adiabatic descent across the mountain (Umek et al., 2021).
The objective of this study is to evaluate the near-surface south foehn evolution simulated with the COSMO model with 1.1 km horizontal grid spacing (COSMO-1) based on surface observations from multiple sources. For this, an event-based evaluation of five south foehn cases is complemented with a climatological analysis for a five-year period. We perform several sensitivity experiments in order to evaluate potential causes of the temperature bias within the prevailing foehn flow. The article is structured as follows. In Section 2, we introduce the COSMO model, the setup of the control and sensitivity experiments, and the five-year COSMO analysis data. The observations, foehn detection methods, and foehn classification methods are also presented. Section 3 is divided into five subsections. In Section 3.1, the synoptic situations of the five foehn cases are introduced. In Section 3.2, the simulation results of the strong and long-lasting south foehn case are described and evaluated in detail with station-based observations. In Section 3.3, the first case is compared with four other cases to find similarities and differences in the model biases. In Section 3.4, we confirm the representativeness of the model biases for the five cases using a five-year COSMO-1 analysis dataset. In addition, biases for different foehn types are investigated. In Section 3.5, a subset of possible causes of the model biases mentioned above is investigated with additional analysis and sensitivity experiments. Lastly, discussions, conclusions, and open questions are given in Section 4.

The COSMO model
The COSMO model (version 5.07) is a nonhydrostatic limited-area numerical weather prediction model. The model solves the fully compressible hydrothermodynamic equations formulated in rotated geographical coordinates on an Arakawa C-grid with a time-splitting third-order Runge-Kutta scheme for time integration (Wicker & Skamarock, 2002). We use a COSMO-
The physical parameterizations used in our setup include a radiative transfer scheme based on the δ-two-stream approach from Ritter and Geleyn (1992), a modified 1D TKE-based prognostic closure for vertical turbulent diffusion (horizontal diffusion is not considered by default; Buzzi et al., 2011; Mellor & Yamada, 1982), a corresponding TKE-based surface transfer scheme, and a multilayer soil and vegetation model TERRA_ML (Schulz et al., 2016). We use seven soil layers in TERRA_ML with a total depth of 14.58 m. The model employs an updated targeted diffusion for the numerically induced cold extremes that are caused by the fifth-order advection scheme over steep topography in very rare instances (Langhans et al., 2012). In addition, we use the new soil thermal conductivity implemented by Schulz et al. (2016). Unlike previous model versions that assumed a constant median value, the updated model includes the impact of soil moisture and soil ice on soil thermal conductivity.
The geographical datasets for describing the earth surface in our simulations are as follows: the Advanced Spaceborne Thermal Emission and Reflection Radiometer global digital elevation map (ASTER) with a horizontal resolution of 30 m for the topography, GlobCover (GC2009) with a horizontal resolution of 300 m for land use, and the Harmonized World Soil Database (HWSD) with a horizontal resolution of 1 km for the soil type. Model outputs are instantaneous values and are written every hour for 2D (surface) fields and every 3 hr for 3D fields.

Experiments
The model setup described in Section 2.1 is referred to as the reference setup, REF, hereafter. Five additional sensitivity experiments are conducted for all the selected foehn cases to test possible causes of the cold bias observed in the REF simulations (Table 1). These include the following.
(1) EVAP, a newly implemented bare-soil evaporation scheme based on a resistance formulation. It has been shown to reduce the systematic bias in latent heat flux in COSMO (Schulz & Vogel, 2020).
(2) SKIN, an implementation of a skin layer to represent the insulating effect of vegetation (Schulz & Vogel, 2020; Viterbo & Beljaars, 1995).
(3) SURF, an increased resistance (from 1 to 10) for heat transfer in the laminar sublayer (Khain et al., 2015).
(4) TKESH, the inclusion of a horizontal shear production term in the prognostic TKE equation.
(5) DX550, a reduction of the horizontal grid spacing to 550 m, while maintaining the same vertical resolution. The time step for dynamics is reduced to 5 s in DX550 to ensure the same Courant-Friedrichs-Lewy condition for numerical stability.
Appendix A presents a detailed description of the experiments.
In addition to the case studies, a foehn forecasting climatology is presented based on a five-year-long COSMO-1 analysis dataset. The COSMO-1 analysis for the five-year period, from October 29, 2015 to October 29, 2020, was produced with a consistent data assimilation system at MeteoSwiss, based on the observation-nudging technique (Schraff & Hess, 2018). This system runs continuously and nudges the prognostic fields towards the observations every hour. The observations used in the analysis include 2-m dewpoint temperature and surface pressure from the surface synoptic observations (SYNOP) in Switzerland, France, Germany, Austria, Slovenia/Croatia, and Italy. They also include radiosonde observations, wind profilers, Aircraft Meteorological Data Relay (AMDAR), and a few marine observations from ships and buoys in the Ligurian Sea and Adriatic Sea. Radar-derived surface precipitation is included through latent heat nudging (Stephan et al., 2008). The snow cover is derived from Spinning Enhanced Visible and InfraRed Imager (SEVIRI) satellite data. However, observations of 2-m temperature and surface winds are not assimilated.

Observations
Surface observations from three measurement networks are used for the model evaluation (see Figure 1b,c). Hourly station observations from SwissMetNet were downloaded from the data portal of MeteoSwiss. The database includes an hourly 10-min averaged 10-m wind speed, hourly averaged 2-m temperature, relative humidity, and station pressure. For specific humidity and dewpoint temperature, only instantaneous values at 10-min intervals are available. This dataset serves three purposes: (1) climatological analysis of the model biases during foehn events in the northern foehn valleys (black diamonds in Figure 1b); (2) detailed analysis of the case studies in the (3) evaluation of the cross-Alpine pressure gradient during foehn events (orange diamonds in Figure 1b). Note that we utilize hourly-averaged temperature and relative humidity and 10-min averaged wind speed to evaluate instantaneous model output. This is legitimate, as the model does not resolve the high temporal and spatial variability associated with small-scale turbulent motions.
In addition, we use 10-min data from the semiautomatic weather stations (TAWES, Teilautomatische Wetterstation) from GeoSphere Austria, formerly known as the Central Institution for Meteorology and Geodynamics, ZAMG (Zentralanstalt für Meteorologie und Geodynamik), and 10-min observations from MeteoGroup/DTN (previously known as Telvent DTN, Data Transmission Network and Dateline). The data are retrieved every full hour to match the model output frequency. For TAWES data, specific humidity is not available; hence, it is calculated from the dewpoint temperature and station pressure, following eq. 4.24 in Stull (2015). For the MeteoGroup data, the station barometric pressure is not available at all stations and is therefore calculated from the reduced pressure at mean sea level, whenever such data are available. Tables B.1 to B.3 list these stations and the availability of the processed data.
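The conversion from dewpoint temperature and station pressure to specific humidity can be sketched as follows. This is only an illustration: it uses a Magnus-type approximation for the saturation vapour pressure (an assumed choice with commonly used coefficients) and the standard relation between vapour pressure and specific humidity, which is not necessarily identical in form to the equation in Stull (2015). The function names are hypothetical.

```python
import math

EPSILON = 0.622  # ratio of the gas constants of dry air and water vapour


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Saturation vapour pressure over liquid water in hPa.

    Magnus-type approximation; the coefficients are an assumption made
    for this sketch, not values taken from the study.
    """
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def specific_humidity(dewpoint_c: float, pressure_hpa: float) -> float:
    """Specific humidity in kg/kg from dewpoint (degC) and station pressure (hPa)."""
    # The actual vapour pressure equals the saturation value at the dewpoint.
    e = saturation_vapor_pressure_hpa(dewpoint_c)
    return EPSILON * e / (pressure_hpa - (1.0 - EPSILON) * e)


if __name__ == "__main__":
    # Example values are invented for illustration only.
    q = specific_humidity(dewpoint_c=2.0, pressure_hpa=960.0)
    print(f"q = {1000.0 * q:.2f} g/kg")
```

Using the station pressure rather than the pressure reduced to mean sea level matters here, because the same dewpoint corresponds to a larger specific humidity at lower pressure.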
To evaluate the upstream model bias, we use twice-daily radio-sounding data (0000 and 1200 UTC) from Milan (yellow triangle in Figure 1b), retrieved from the University of Wyoming website. For the model evaluation, the grid points closest to the observation sites were selected based on the optimal distance, that is, both horizontal and vertical distances are considered, following Kaufmann (2008).
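A minimal sketch of a grid-point selection that accounts for both horizontal and vertical distance is given below. The exact cost function of Kaufmann (2008) is not reproduced in the text, so the linear combination and the weight `vertical_weight` used here are assumptions for illustration, as are the function names and the sample coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0


def horizontal_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_grid_point(station, grid_points, vertical_weight=500.0):
    """Return the index of the grid point with the smallest combined distance.

    station and each grid point are (lat_deg, lon_deg, height_m) tuples.
    The cost is horizontal distance (km) plus vertical_weight times the
    height difference (km); the weight is a hypothetical tuning value.
    """
    def cost(point):
        dh = horizontal_distance_km(station[0], station[1], point[0], point[1])
        dz_km = abs(station[2] - point[2]) / 1000.0
        return dh + vertical_weight * dz_km

    return min(range(len(grid_points)), key=lambda i: cost(grid_points[i]))


if __name__ == "__main__":
    station = (47.0, 9.5, 450.0)  # invented valley-floor station
    grid = [(47.0, 9.5, 900.0), (47.01, 9.5, 460.0)]
    print(select_grid_point(station, grid))  # picks the point at similar height
```

In steep terrain, the horizontally nearest grid point can sit several hundred meters above the valley floor, which is why a height penalty of some kind is useful when comparing 2-m quantities.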

Foehn detection method
For this study, we aim to find a foehn detection method that can be applied easily to both observations and simulations across various foehn station locations. Foehn detection methods commonly include deterministic methods using statistical thresholds derived from long-term observations, such as Dürr (2008), or probabilistic methods using, for example, the statistical mixture model (Plavcan et al., 2014). The latter has been shown to offer more advantages than the former in a community foehn detection experiment (Mayr et al., 2018). Both foehn detection methods selectively consider local parameters, such as relative humidity, wind speed, and direction, and nonlocal parameters, such as the potential temperature difference between valley stations and a crest station. The potential temperature difference measures the contribution of adiabatic descent, which differentiates foehn from thermally driven downvalley/downslope winds.
As both methods require manual selection for individual stations and long-term data to some extent, they are not suitable for the present study. Therefore, we combine a preselection of foehn cases/hours based on existing foehn indices with local foehn criteria to achieve a similar result. For the preselection, an enhanced offline version of the operational station-based foehn index maintained by the Alpine Research Group Foehn Rhine Valley/Lake Constance (Arbeitsgemeinschaft Föhnforschung Rheintal-Bodensee, AGF) is used (Jansing et al., 2022). As for local foehn criteria, we utilize a simplified version of the AGF foehn definition. It is noteworthy that this method does not provide a superior diagnosis of foehn events compared with the existing foehn detection methods. Rather, its primary purpose is to avoid the need for defining different location-specific thresholds for observations and simulations, and to establish a "common ground" for comparing simulations and observations for this specific study.
Table 2. List of criteria for the definition of south foehn for stations in the Rhine Valley according to the Arbeitsgemeinschaft Föhnforschung Rheintal-Bodensee (AGF). Column headings: Criterion, Valley, Foreland, Beginning, Duration, Ending.
The AGF foehn definition is intended for foehn events taking place in the lower Rhine Valley and Lake Constance region. The full criteria are shown in Table 2, where e1, e2, and e3 serve as termination criteria for a foehn event.
The foehn event ends when there is a significant change in e1 (wind direction), or in both e2 and e3 (wind speed and wind gust), accompanied by a decrease in temperature and an increase in relative humidity. The "must" in the beginning and duration columns indicates mandatory criteria, while "x" denotes criteria that depend on specific conditions. At least three out of the four "x" criteria must be met to initiate a foehn event, and at least two out of the three "x" criteria must be met for the foehn duration. For an easy extension to various foehn locations and model data, we simplify the AGF definition by keeping only the criteria for wind speed and relative humidity. We thus refer to this as "the simplified AGF foehn criteria" in the remaining sections. This method results in two time indices of foehn hours based on the simulations and the observations. On one hand, the timing biases of the model, specifically the start and end times of foehn events, can be evaluated by comparing the corresponding time indices. However, caution must be exercised when interpreting the results if there are systematic biases in the simulated relative humidity or wind speed due to the application of unified thresholds to both observations and model outputs. On the other hand, during foehn periods, we can assess the model's performance by considering the intersection of the two time indices. This approach helps exclude any undesirable model biases arising from inconsistent timing between the observations and simulations. The resulting time index is subsequently referred to as "common foehn hours".
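The idea of applying unified thresholds to observations and model output and then evaluating only the common foehn hours can be sketched as below. The threshold values `min_speed` and `max_rh` are placeholders chosen for illustration, not the values of the AGF definition, and the function names and sample series are hypothetical.

```python
def foehn_hours(wind_speed, rel_hum, min_speed=2.0, max_rh=60.0):
    """Return the set of hour indices that satisfy both simplified criteria.

    min_speed (m/s) and max_rh (%) are placeholder thresholds assumed for
    this sketch; the same values are applied to observations and model.
    """
    return {
        hour
        for hour, (ff, rh) in enumerate(zip(wind_speed, rel_hum))
        if ff >= min_speed and rh <= max_rh
    }


def mean_bias(model, obs, hours):
    """Mean of (model - obs) over the given hour indices."""
    if not hours:
        raise ValueError("no hours to evaluate")
    return sum(model[h] - obs[h] for h in hours) / len(hours)


if __name__ == "__main__":
    # Invented hourly series for one station.
    obs_ff, obs_rh = [1.0, 5.0, 6.0, 7.0, 1.5], [80, 40, 35, 38, 75]
    mod_ff, mod_rh = [1.0, 1.5, 6.0, 6.5, 5.0], [85, 70, 45, 44, 50]
    obs_t, mod_t = [5.0, 12.0, 14.0, 15.0, 8.0], [5.0, 8.0, 11.0, 12.0, 10.0]

    obs_idx = foehn_hours(obs_ff, obs_rh)
    mod_idx = foehn_hours(mod_ff, mod_rh)
    common = obs_idx & mod_idx  # "common foehn hours"
    print(sorted(common), mean_bias(mod_t, obs_t, common))
```

Comparing `obs_idx` with `mod_idx` gives the timing differences, while the bias over `common` is not contaminated by hours in which only one of the two datasets shows foehn.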
To test the validity of the simplified AGF foehn criteria, we apply them to the five foehn cases. The resulting time indices determined from observations are compared with the enhanced offline version of the operational station-based foehn index (hereafter called the "enhanced foehn index"; Jansing et al., 2022). The enhanced foehn index is based on the operational foehn index available at MeteoSwiss with a time resolution of 10 min, where three foehn labels are defined: (0) no foehn, (1) foehn mixed air, and (2) foehn. The offline enhanced foehn index extends the operational one with two additional labels for Dimmer foehn and Gegenstrom foehn, as well as a probability index at 10-min temporal resolution. We aggregate the 10-min values (0, 1, or 2) within the time slot [0 min, 50 min] to the full hour. When the aggregated value is equal to or larger than six, the full hour is considered as a foehn hour. The simplified AGF foehn criteria agree well with the hourly-aggregated enhanced foehn index, with an average overlap ratio of 95% for the spring and late autumn cases and 88% for the early summer case (the five foehn cases are introduced in Section 3.1), averaged over a Reuss Valley station, Altdorf, and the Rhine Valley stations. The lower value for early summer is understandable, since the foehn signal is typically weaker in summer due to the stronger solar radiation and diurnal cycle. In rare cases, such as weak foehn/shallow foehn, downvalley winds due to night-time cooling might be falsely classified as foehn winds, as reported by Plavcan et al. (2014) for the statistical mixture model.
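The hourly aggregation of the 10-min foehn labels can be illustrated with a short sketch. It assumes that "aggregate" means summing the six label values (0, 1, or 2) at minutes 0 to 50 of each hour, which is our reading of the description above; the function name and the sample sequence are hypothetical.

```python
def hourly_foehn_flags(index_10min, threshold=6):
    """Aggregate 10-min foehn labels (0, 1, 2) to hourly foehn flags.

    The input must start at minute 0 of an hour and contain six values
    per hour. An hour counts as a foehn hour when the sum of its six
    labels is at least `threshold` (summation is an assumed reading).
    """
    if len(index_10min) % 6 != 0:
        raise ValueError("expected six 10-min values per hour")
    return [
        sum(index_10min[start:start + 6]) >= threshold
        for start in range(0, len(index_10min), 6)
    ]


if __name__ == "__main__":
    labels = [2, 2, 2, 0, 0, 0,   # sum 6: foehn hour
              1, 1, 1, 1, 1, 0,   # sum 5: no foehn hour
              0, 0, 0, 0, 0, 0]   # sum 0: no foehn hour
    print(hourly_foehn_flags(labels))
```

With this rule, an hour qualifies with three 10-min intervals of full foehn or with six intervals of foehn mixed air, so brief foehn interruptions within an hour do not remove the hour from the index.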

Foehn classification
Following Jansing et al. (2022), we classified all the foehn events that occurred within the five-year period into five categories, including three deep foehn types: dry foehn, moist foehn, and Dimmer foehn, and other two foehn
Table 3. An overview of all the selected foehn cases, their duration recorded by AGF at station Vaduz, their initialization times in the COSMO simulations, and their distinguishing features.

RESULTS
For the event-based evaluation, five south foehn cases are selected from the AGF foehn case list, covering different seasons, synoptic situations, precipitation conditions, foehn intensities, and durations (see Table 3). First, a brief introduction to the synoptic situation for the five cases is given (Section 3.1). The five cases are then evaluated in Sections 3.2 and 3.3, followed by a climatological analysis in Section 3.4. A set of sensitivity tests is presented in Section 3.5.

Synoptic overview for five cases
The synoptic situation at the beginning of the 2016nov case is representative of a typical deep south foehn event. At 500 hPa, there is a trough over the British Isles (Figure 2a) alongside a high-pressure system extending from Russia to northern Italy. At lower levels, densely distributed isobars are formed along the Alps, that is, the "foehn knee" (Richner & Hächler, 2013). The wind blows across the Alps as a result of this pressure gradient. Subsequently, a cut-off low develops, temporarily weakening the cross-Alpine pressure gradient and the cross-mountain wind from 0600 UTC on November 22 to 1800 UTC on November 23. Later, the foehn signal reaches a second maximum at around 0600 UTC on November 24, driven by the cut-off low (Figure 2b). The foehn signal lasts until 1700 UTC on November 24, when the foehn event ends due to the weakening of the low-pressure system. Continuous precipitation is observed on the southern side of the Alps during the whole foehn event. Based on the foehn classification method from Jansing et al. (2022) (see Section 2.5), this foehn case is classified as a moist deep foehn with a few hours classified as Dimmer foehn (16 out of 108 hr).
While the 2017feb and 2017mar events are two consecutive south foehn cases, the synoptic situations are slightly different. In the 2017feb case, the low-pressure system is further north, resulting in a stronger zonal flow component at 500 hPa over the Alps (Figure 2c). Despite the stronger zonal wind component, this foehn case is classified as a dry foehn. This is due to the high-level wind speed, which is too strong to be categorized as a shallow foehn. Additionally, the wind direction is not sufficiently zonal to be recognized as a Gegenstrom foehn. As the low-pressure system moves towards the southeast, this foehn event transitions into a moist foehn. Later, the foehn event ceases with the arrival of a cold front in the Alpine valleys. In contrast, for the 2017mar case, the low-pressure system is located further south (Figure 2d). As a result, the foehn flow extends to high levels and the cross-Alpine pressure gradient is stronger than for the 2017feb case (not shown). Precipitation occurs only on the second day (March 4, 2017) after 0900 UTC on the southern side of the Alps.
A detailed description of the synoptic situation for the 2013mar and 2013may cases can be found in Miltenberger et al. (2016). Both cases correspond to deep foehn events driven by a synoptic-scale low-pressure system, which is off the coast of Portugal for the 2013mar case and over the British Isles for the 2013may case. A concomitant strong pressure gradient develops across the Alps. In the 2013mar case (the dry case), no precipitation is observed until 0000 UTC on March 6. For the 2013may case (the moist case), on the other hand, there is continuous precipitation on the southern side of the Alps.

Evaluation of a strong foehn case: November 2016
We first present the 2016nov case to illustrate the near-surface characteristics and evolution of a typical foehn event. Figure 3 shows the time series of 2-m temperature and relative humidity at six MeteoSwiss surface stations (red stations in Figure 1b). The gray shading and the blue bar denote the foehn hours detected by the simplified AGF foehn criteria based on the observations (OBS) and the reference simulation (REF), respectively. In OBS, an abrupt increase in near-surface temperature and a decrease in relative humidity mark the breakthrough of the foehn flow to the surface, this being most pronounced at Altdorf (ALT), Vaduz (VAD), and Altenrhein (ARH). At the same time, the wind speed increases abruptly and remains high during the foehn hours. In addition, the wind direction turns parallel to the valley axis and remains steady (not shown). The foehn signal is strongest and most persistent for ALT and VAD. In contrast, Chur (CHU) and Bad Ragaz (RAG) show a weaker foehn signal and a more pronounced diurnal cycle, presumably affected by a combination of foehn warming and diurnal temperature variation. For the stations further north, the foehn is less persistent in OBS. The foehn signal is temporarily suspended at Oberriet (OBR), and it ends on November 22 at ARH.
While REF reproduces the main features of the foehn breakthrough (rise in temperature, drop in relative humidity, and increase in wind speed), biases are evident in the near-surface temperature after the foehn breakthrough, as well as in the timing of the foehn event. Specifically, at ALT, RAG, and VAD, REF exhibits a longer foehn signal at the end of the event, lasting approximately one day longer than in OBS. For the two northernmost stations, REF fails to capture the intermittency of the foehn signal at OBR and the very short duration at ARH. During the common foehn hours (intersection between gray boxes and blue bars in Figure 3), the near-surface temperature is lower and the relative humidity higher in the model than in the observations, by 3 K and 5.6%, respectively, averaged over the six stations. The specific humidity, however, is slightly underestimated in the model by an average of 0.2 g⋅kg⁻¹ (not shown), which indicates that the overestimation of relative humidity is a result of the cold model bias, for this foehn case at least. Before and after the foehn period, the simulation and the observations agree much better.
Figure 4 shows the temporal evolution of the potential temperature at the five Rhine Valley stations. In OBS (Figure 4a), the potential temperature increases sharply first at the central Rhine Valley station, VAD, at 0000 UTC on November 20. This is followed by a rapid rise in temperature at RAG, CHU, and OBR at 0200 UTC on November 20, indicating that the foehn air is extending gradually towards both the upper and lower Rhine Valley. At the lower Rhine Valley station, ARH, the breakthrough does not occur until 2300 UTC. The potential temperature after the foehn breakthrough is warmest at ARH, followed by OBR and VAD, then RAG, and is coolest at CHU. This may indicate that the foehn flow arriving at ARH originates from a higher level and experiences a stronger adiabatic heating. The spatial heterogeneity among the Rhine Valley stations, however, is not reproduced well in REF. As seen in Figure 4b, the foehn flow breaks through at all stations almost simultaneously and the associated temperature change appears milder than in OBS.
Figure 5 shows the horizontal extent (in REF and OBS) and the vertical structure (in REF) of the foehn flow. At the time when the foehn signal is strongest (Figure 5a), REF captures the far northward extent of the foehn flow well. The potential temperature, however, is underestimated at almost all stations, especially at CHU and RAG. A possible reason can be inferred from Figure 5b. The foehn flow in REF descends strongly after passing over RAG, leaving the air at CHU less affected and colder than at RAG/VAD. This suggests a too-weak foehn descent at CHU. However, the reason for this is unclear. It is worth noting that the strong descent over VAD has been observed by Drobinski et al. (2001) with wind lidar and simulated by Zängl et al. (2004) with the fifth-generation Pennsylvania State University-NCAR Mesoscale Model MM5. The latter study explains that foehn subsidence at VAD is enhanced by flow splitting at the Seez Valley junction due to mass continuity.
During the temporary retreat (see Figure 5c), the foehn front retreated to the middle of the Rhine Valley in OBS, whereas in REF it remains further north. The vertical cross-section illustrates that the mountain wave that facilitates foehn descent is amplified after passing over the elevated topography upstream of ARH. This amplification of the wave strength helps to maintain the too-far northward extent of the foehn front in REF, suggesting that the model may overestimate the strength of the mountain wave. In fact, Doyle et al. (2011) have shown that NWP models at 1 km horizontal resolution show limited predictability when simulating fine-scale mountain wave structures, especially those induced by high mountains.
Alternatively, the too-far northward extent of the warm foehn air might be related to the overestimation of the cross-alpine pressure gradient. The simulated pressure gradient is compared with observations in Figure 6. The pressure gradient is calculated from three surface stations: one northern station, Zürich-Fluntern (SMA), and two southern stations, Lugano (LUG) and Stabio (SBO). The station pressure from the three surface stations is first converted to a common reference height assuming a constant temperature lapse rate of −6.5 K⋅km⁻¹ (see Figure 6a,b) before calculating the cross-alpine pressure gradient (see Figure 6c,d). While the converted REF surface pressure at SMA and SBO is in reasonable agreement with OBS, the model tends to overestimate the surface pressure at LUG (Figure 6a), probably owing to the surrounding topography, which is more complex than at SMA and SBO. The systematic pressure bias at LUG results in an overestimation of the pressure gradient between LUG and SMA (Figure 6c), whereas the pressure gradient between SBO and SMA is simulated well (Figure 6d). Although it is too soon to draw a definitive conclusion, the overestimation of the downstream foehn extent is not likely to be caused by a systematic bias in the large-scale cross-alpine pressure gradient.
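The height conversion described above can be sketched as follows. This is a minimal illustration, assuming a hydrostatic layer with the constant lapse rate quoted in the text. The function names, the reference height of 500 m, and all station values are hypothetical and are not taken from the study.

```python
G = 9.80665        # gravitational acceleration (m s^-2)
R_D = 287.05       # gas constant for dry air (J kg^-1 K^-1)
GAMMA = -6.5e-3    # dT/dz (K m^-1): the constant lapse rate quoted in the text


def reduce_pressure(p_station, t_station, z_station, z_ref, gamma=GAMMA):
    """Convert station pressure (hPa) to the height z_ref (m AMSL).

    Assumes T(z) = t_station + gamma * (z - z_station) and hydrostatic balance.
    t_station is in K, heights are in m.
    """
    t_ref = t_station + gamma * (z_ref - z_station)
    return p_station * (t_ref / t_station) ** (-G / (R_D * gamma))


def pressure_gradient(p_south, p_north):
    """Cross-Alpine pressure difference (hPa), defined here as south minus north."""
    return p_south - p_north


# Hypothetical station values, for illustration only.
Z_REF = 500.0  # assumed common reference height (m AMSL)
p_north = reduce_pressure(950.0, 283.15, 556.0, Z_REF)
p_south = reduce_pressure(985.0, 285.15, 273.0, Z_REF)
dp = pressure_gradient(p_south, p_north)
```

Applying the same conversion to both the model and the observed station pressure keeps the comparison consistent, so that a remaining difference reflects the pressure bias rather than the choice of reference height.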

Comparison of the five foehn cases
In the following, rather than going through each case, as we did in Section 3.1, we focus more on the differences and similarities among the five foehn cases. To avoid redundancy, Figure 7 shows the time series of only 2-m temperature at VAD and ARH for the five cases. We choose VAD, as it usually presents the strongest foehn signal among all the Rhine Valley stations, and ARH, as it represents the foehn flow in the lower Rhine Valley. Figure 7a,b shows the time series of 2-m temperature for the case 2016nov, the same as in Figure 3d,f, for comparison. The shallow foehn case 2017feb is relatively short in duration (Figure 7c). This is understandable, as previous studies show that a shallow foehn is often found as a transitional stage to deep foehn events or as a transient phenomenon due to the strengthening of upstream blocking of the cold air (Krieger et al., 2018; Mayr et al., 2004). The foehn timing at VAD is captured well in REF. At ARH (Figure 7d), foehn hours appear shorter, as the wind speed did not increase until 1900 UTC on February 27, when the temperature reached its second peak in OBS. REF underestimates the 2-m temperature during the 2017feb case at both VAD and ARH, particularly at ARH, where a delayed foehn onset is present. For the Dimmer foehn case 2017mar, the foehn timing is simulated relatively well. However, a significant cold bias is present at VAD and ARH (Figure 7e,f).
For the 2013mar case, REF captures the foehn timing at VAD and ARH well (Figure 7g,h), but there is a large cold bias during the foehn period. The 2013may case shows a weak foehn signal at both VAD and ARH (Figure 7i,j). The reason might be twofold. On the one hand, Miltenberger et al. (2016) show a smaller fraction of the trajectories descending to the valley bottom on the northern side of the Alps in comparison with the 2013mar case. On the other hand, the higher solar radiation in May intensifies the diurnal cycle of temperature and humidity and weakens the foehn signal. At ARH, the simplified AGF foehn criteria detect a longer foehn duration in REF, in contrast to no foehn hours in OBS. This is due to the low wind speed at ARH throughout the entire foehn event in OBS.
A larger discrepancy in foehn timing at ARH is observed in several cases (the 2016nov case, the 2013mar case, and the 2013may case). This indicates that the model struggles to predict the downstream extent of the foehn flow. One possible explanation is that the strength of the night-time CAP is not reproduced adequately in the REF simulations, leading to an inaccurate representation of the CAP-foehn interaction, as suggested in Umek et al. (2021). In summary, despite the characteristics highlighted above for the individual cases, some similarities are shared. Consistent with the 2016nov case, a cold bias is found at VAD, ARH, and the other stations in all the cases during the foehn hours (Figure 7). As a result, relative humidity is overestimated in all the cases (not shown). The magnitudes of the biases differ from case to case. The cold bias, averaged over common foehn hours and over the six stations, is smallest in the 2013may case (−1.6 K) and largest in the 2013mar case (−5.2 K). The smallest cold bias for 2013may can be attributed to the stronger diurnal cycle in May compared with the other cases.
For relative humidity, the bias ranges from a minimum of 5.6% for the 2016nov case to a maximum of 22% for the 2013mar case. The averaged specific humidity bias, however, varies among the five cases, ranging from dry biases of −0.2 and −0.3 g⋅kg⁻¹ in the 2016nov and 2017mar cases, respectively, to moist biases of 0.2 g⋅kg⁻¹ for 2017feb, 0.8 g⋅kg⁻¹ for 2013mar, and 0.4 g⋅kg⁻¹ for 2013may. Prior to and after the foehn events, the biases are generally smaller.

Climatological evaluation of south foehn for a five-year period
To investigate the representativeness of the results for the five cases, we proceed with a climatological evaluation of foehn in a five-year-long period. As 2-m temperature and 10-m wind from SYNOP stations are not assimilated in the period considered, the COSMO-1 values of these quantities are likely to reflect COSMO model errors. The time series of the COSMO-1 analysis are shown in Figures 3 and 7 (ANA, red lines) for the cases 2016nov, 2017feb, and 2017mar. Overall, the results for ANA are very similar to those for REF. It is noteworthy that the model bias in REF is unlikely to originate from the near-surface COSMO-1 analysis, since the simulations are initialized before the foehn events when discrepancies between OBS and ANA are minimal. Benefiting from the archived COSMO-1 analysis dataset at MeteoSwiss, we are thus able to extend our foehn evaluation to a longer period. In this section, we will consider not only the Rhine Valley stations and ALT but also the other major foehn valley stations (for information on these northern valley stations, see Tables B.1-B.3).

Surface variables
FIGURE 7 As in Figure 3, but only for surface temperature at VAD (left) and ARH (right) for the five cases.

As was shown above, the simplified AGF foehn criteria work reasonably well for detecting foehn (with an averaged overlap ratio of 93.6% among the five cases, see Section 2.4). Nevertheless, they are less reliable when a station exhibits stronger diurnal variations, such as for the 2013may case. Therefore, before we apply the simplified AGF foehn criteria to the five-year-long COSMO-1 analysis, we first preselect a time window based on the foehn detection and classification method from Jansing et al. (2022). As shown in Figure 8, the ALT time window includes foehn hours in the enhanced foehn index at ALT, including 24 hr before and after the foehn period (11,954 hr). The foehn index at ALT is selected for the climatological study, due to its central location among the five valleys investigated. It is also very similar to the corresponding foehn index series at VAD. Next, we apply the simplified AGF foehn criteria to the preselected hours for both ANA and OBS. The resulting common foehn hours in the five-year period for a station X are denoted by the orange area in Figure 8. The blue area represents the common nonfoehn hours and the white half-moon area the mismatch between ANA and OBS. The sample sizes of common foehn hours (orange), common nonfoehn hours (blue), and mismatched hours (white) at the 25 major valley stations are shown with a stacked bar plot in Figure 9a. The well-known foehn stations, such as ALT and VAD, exhibit longer foehn hours. Figure 9b-e shows the model biases calculated for common foehn and nonfoehn hours by ΔΦ = Φ_ANA − Φ_OBS, where Φ stands for 2-m surface temperature, 2-m relative humidity, 2-m specific humidity, and 10-m wind speed.
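The partition into common foehn, common nonfoehn, and mismatched hours, together with the bias ΔΦ = Φ_ANA − Φ_OBS, can be sketched as below. This is only an illustration under stated assumptions: the hourly foehn flags and the preselected window are taken as given boolean arrays, the function names are hypothetical, and the sample values are synthetic.

```python
import numpy as np


def split_hours(foehn_ana, foehn_obs, window):
    """Return boolean masks for common foehn, common nonfoehn, and mismatched hours.

    All inputs are hourly boolean series of equal length; `window` marks the
    preselected hours (foehn at ALT plus 24 hr before and after).
    """
    foehn_ana = np.asarray(foehn_ana, dtype=bool)
    foehn_obs = np.asarray(foehn_obs, dtype=bool)
    window = np.asarray(window, dtype=bool)
    common_foehn = window & foehn_ana & foehn_obs
    common_nonfoehn = window & ~foehn_ana & ~foehn_obs
    mismatch = window & (foehn_ana ^ foehn_obs)
    return common_foehn, common_nonfoehn, mismatch


def mean_bias(phi_ana, phi_obs, mask):
    """Mean of phi_ANA - phi_OBS over the hours selected by mask (NaN if none)."""
    diff = np.asarray(phi_ana, dtype=float) - np.asarray(phi_obs, dtype=float)
    return float(np.nanmean(diff[mask])) if mask.any() else float("nan")


# Synthetic six-hour example (not data from the study).
window = [1, 1, 1, 1, 1, 0]
ana = [1, 1, 0, 0, 1, 1]
obs = [1, 0, 0, 1, 1, 1]
t_ana = [10.0, 11.0, 5.0, 6.0, 12.0, 13.0]
t_obs = [12.0, 12.0, 5.0, 7.0, 15.0, 14.0]

foehn, nonfoehn, mismatch = split_hours(ana, obs, window)
bias_foehn = mean_bias(t_ana, t_obs, foehn)
bias_nonfoehn = mean_bias(t_ana, t_obs, nonfoehn)
```

Restricting the bias to hours on which ANA and OBS agree separates errors in the foehn air mass itself from errors in foehn timing, which end up in the mismatched hours.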
[Figure 8 caption fragment: … and common nonfoehn hours (blue) for a specific station X. The ALT foehn hours are derived from the enhanced foehn index based on observations. The foehn hours of the station X are defined according to the simplified AGF foehn criteria.]

[Figure caption fragment: Properties of the boxes and whiskers are the same as in Figure 9.]

As expected, almost all the selected valley stations experience a cold bias during the common foehn hours (−1.8 K averaged over the 25 stations). By contrast, many stations show a slight warm bias during common nonfoehn hours (0.5 K: Figure 9b). The relative humidity bias (Figure 9c) is consistent with the previous results, with a mean bias of 4.2% (moist bias) during foehn hours and −5.7% (dry bias) during nonfoehn hours.
The actual water-vapor content is nearly unbiased during foehn, as the mean bias over the 25 stations is close to zero (Figure 9d). In contrast, there is a slight dry bias during nonfoehn hours, with a mean value of −0.17 g⋅kg⁻¹. This suggests that the moist relative humidity bias is mainly a result of the cold bias during foehn, and the dry relative humidity bias during nonfoehn hours is due to a combination of a warm bias and a dry specific humidity bias.
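The link between a cold bias and a moist relative humidity bias at unchanged specific humidity can be illustrated with a short calculation. The sketch below assumes the Bolton (1980) form of the Magnus formula for saturation vapor pressure, which is a choice made here and not a method described in the study. The pressure, temperature, and humidity values are hypothetical.

```python
import math

EPS = 0.622  # ratio of the gas constants of dry air and water vapor


def saturation_vapor_pressure(t_celsius):
    """Saturation vapor pressure over water (hPa), Bolton (1980) Magnus form."""
    return 6.112 * math.exp(17.67 * t_celsius / (t_celsius + 243.5))


def relative_humidity(q, p_hpa, t_celsius):
    """Relative humidity (%) from specific humidity q (kg/kg), pressure, temperature."""
    e = q * p_hpa / (EPS + (1.0 - EPS) * q)  # vapor pressure (hPa)
    return 100.0 * e / saturation_vapor_pressure(t_celsius)


# Hypothetical foehn conditions: same q and p, model 1.8 K colder than observed.
P = 950.0      # hPa
Q = 4.0e-3     # kg/kg
T_OBS = 15.0   # deg C
T_MOD = T_OBS - 1.8

rh_obs = relative_humidity(Q, P, T_OBS)
rh_mod = relative_humidity(Q, P, T_MOD)
rh_bias = rh_mod - rh_obs  # positive: moist relative humidity bias
```

With identical moisture content, the colder model air is closer to saturation, so the relative humidity bias is positive even though the specific humidity bias is zero.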
The wind-speed bias shows a more significant spatial variability. On average, the wind speed tends to be more overestimated in the Rhine Valley than in the other valleys, especially during foehn (Figure 9e). The mean wind-speed bias for the seven Rhine Valley stations (for the other valley stations) is 1.5 m⋅s⁻¹ (−0.3 m⋅s⁻¹) during common foehn hours and 0.7 m⋅s⁻¹ (0.4 m⋅s⁻¹) during nonfoehn hours.
In the climatology, the cold bias and the corresponding relative humidity bias appear smaller than in the case study. This could be attributed to several factors. Assimilated observations from sources such as radiosondes might contribute indirectly to a less-biased surface temperature. The difference in model biases may also be due to the rather stochastic sampling of the five foehn cases, which might not be fully representative of the five-year climatology.

Model biases for different foehn types
We investigate the dependence of the biases on foehn type.
Five different foehn categories are identified by applying the foehn classification method mentioned in Section 2.5 to the COSMO-1 analysis data at ALT. Next, we take the intersection between the ALT foehn types and the common foehn hours (orange area in Figure 8) to get the common foehn hours for the different foehn types for each of the 25 foehn stations. The number of common foehn hours at all foehn stations is summed up for different foehn types and shown in Figure 10a. To evaluate differences in model biases between the Rhine Valley and the other valleys, we divided the stations into "Rhine Valley" stations (light orange boxes) and "other valleys" stations (dark orange boxes) in Figure 10b-d. Within the five-year period, there are 8756 hr of moist deep foehn, 1671 hr of dry deep foehn, 573 hr of shallow foehn, 186 hr of Gegenstrom foehn, and 1229 hr of Dimmer foehn, aggregated over the 25 foehn stations. There is a cold bias for all foehn types and valleys. Among the three deep foehn types, that is, dry deep foehn, moist deep foehn, and Dimmer foehn, the moister the foehn type, the larger the cold bias (Figure 10b). Shallow foehn, in general, has a smaller cold bias, similar to dry deep foehn. In terms of specific humidity, among the deep foehn types, the moister the foehn case, the smaller the moist bias. In the Dimmer foehn cases, where clouds and precipitation extend to the lee side of the mountain, the model underestimated the near-surface moisture, leading to a slightly dry bias in the model (Figure 10c). The difference in wind-speed bias between the foehn types is not pronounced (Figure 10d).
Furthermore, a difference in the magnitude of biases is found between the Rhine Valley and the other valleys. The model exhibits comparatively larger cold and moist bias at the Rhine Valley stations, except for the Dimmer foehn type, which displays a smaller dry bias. The Rhine Valley stations present a larger overestimation of the wind speed, which is consistent with the previous finding in Figure 9.

Further analysis and sensitivity experiments
In this section, we evaluate some of the hypotheses concerning the cause of the model cold bias during foehn hours.

Global radiation
One of the possible and most straightforward causes of a near-surface cold bias in foehn valleys would be an underestimated global downward radiation. For example, this could be due to a misrepresentation of downstream cloudiness. Figure 11 shows, for both the averaged global downward radiation bias and the averaged 2-m temperature bias, the difference between foehn hours and nonfoehn hours (ΔSWD_bias and ΔT_bias, respectively), and how the two differences depend on each other. Each average is taken over the common nonfoehn/foehn hours during the entire simulation period for one of the five foehn cases and one of the 25 northern foehn valley stations. Consistent with the previous sections, ΔT_bias is negative for the majority of stations and cases. ΔSWD_bias, however, is evenly distributed around zero and shows no correlation with ΔT_bias. This indicates that there is no significant difference in the model skill with respect to global downward radiation between foehn and nonfoehn hours and that any global downward radiation bias, if present, has no direct impact on the cold bias during foehn hours.
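The comparison of the two bias differences can be sketched as follows. The code assumes that each difference is defined as the foehn-hour bias minus the nonfoehn-hour bias, which is consistent with the negative ΔT_bias reported above but is not stated explicitly in the text. The function names are hypothetical and the data are synthetic random numbers, one value per case and station.

```python
import numpy as np


def bias_difference(bias_foehn, bias_nonfoehn):
    """Difference of mean biases, foehn minus nonfoehn (sign convention assumed)."""
    return np.asarray(bias_foehn, dtype=float) - np.asarray(bias_nonfoehn, dtype=float)


def pearson_r(x, y):
    """Pearson correlation coefficient, ignoring pairs with missing values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)
    return float(np.corrcoef(x[ok], y[ok])[0, 1])


# Synthetic sample: 5 cases x 25 stations = 125 points (not data from the study).
rng = np.random.default_rng(0)
t_bias_foehn = rng.normal(-2.0, 1.0, 125)      # K
t_bias_nonfoehn = rng.normal(0.5, 1.0, 125)    # K
swd_bias_foehn = rng.normal(0.0, 20.0, 125)    # W m^-2
swd_bias_nonfoehn = rng.normal(0.0, 20.0, 125)

delta_t = bias_difference(t_bias_foehn, t_bias_nonfoehn)
delta_swd = bias_difference(swd_bias_foehn, swd_bias_nonfoehn)
r = pearson_r(delta_swd, delta_t)
```

A correlation coefficient close to zero, combined with ΔSWD_bias values scattered around zero, is the pattern that would support the conclusion drawn from Figure 11.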

Land-atmosphere coupling
A possible impact of land-atmosphere coupling on the cold bias during foehn is tested by the three parameterization changes introduced in Section 2. […] decrease the coupling strength between the atmosphere and the surface. The result of these sensitivity tests in terms of 2-m temperature is shown in Figure 12 for the 2016nov case and in Figure A1 for all five foehn cases.
As shown with the green lines in Figure 12, EVAP has only a very minor impact on the near-surface cold bias during foehn (ΔT = +0.02 K). It shows a more pronounced warming effect during nonfoehn hours (ΔT = +0.36 K). This holds true for the other foehn cases, as shown in Figure A1b,d, where, for most of the stations and cases, EVAP results in a slightly higher 2-m temperature, especially during nonfoehn hours.
To investigate the impact of EVAP on the near-surface temperature, the surface energy balance (SEB) is calculated at the five Rhine Valley stations (Figure A2). During foehn hours, a downward sensible heat flux and an upward latent heat flux can be observed due to the warm and dry near-surface foehn flow (Ward et al., 2022). This tendency is further enhanced by EVAP for CHU, RAG, and VAD in the 2016nov case. Despite the impact of EVAP on the magnitude of the surface sensible heat fluxes at those stations, the impact on the 2-m temperature during foehn is negligible.
Similar to EVAP, SKIN has a very minor impact on the 2-m temperature (ΔT = +0.02 K) for the 2016nov case (orange lines in Figure 12). However, during nonfoehn hours, especially the prefoehn CAP period, SKIN significantly reduces the 2-m temperature by 0.4 K. This is true for the other foehn cases as well, as can be seen in Figure A1d. By reducing the temperature at night or during the prefoehn CAP period, SKIN magnifies the range of the diurnal temperature cycle, which corresponds to the findings of Schulz and Vogel (2020). SURF also only has a very minor impact on the simulated 2-m temperature (red lines in Figure 12). As for SEB, the modification by SKIN and SURF is different from station to station and time to time (Figure A2). Thus, we will not continue in this direction.
To sum up, during the foehn hours, EVAP, SKIN, and SURF reduce the cold bias by only a negligible amount. During the nonfoehn hours, the average temperatures in EVAP and SURF are slightly warmer than those in REF, while in SKIN the average temperature is lower at night-time or during the prefoehn CAP period. In general, these experiments have larger impacts on the 2-m temperature during nonfoehn hours than during foehn hours, indicating that the cold bias during foehn hours is unlikely to be related to biases in the representation of the land-atmosphere interaction.

Turbulence
Turbulence is exhibited within the shear flow between CAP and the overlying foehn flow, and within the foehn flow itself. Therefore, it can play a crucial role in CAP removal and local warming before and during a foehn event. Insufficient turbulent mixing between the foehn flow and CAP, as well as within the foehn flow, may result in false foehn timings and an underestimation of the temperature in the lower part of the foehn flow. Previous studies have shown the benefit of turbulence-resolving simulations when dealing with atmospheric processes over complex terrains (e.g., Umek et al., 2021; Umek et al., 2022; and references in Serafin et al., 2018), which, however, are computationally too expensive for operational use. Nevertheless, more accurate simulations may be achieved by improving the turbulence parameterization (e.g., Goger et al., 2019). As a very first step, the sensitivity to a simple parameterization of turbulence production by horizontal shear is tested, as in Goger et al. (2018), and denoted as TKESH. This experiment shows the lowest impact on the 2-m temperature among all the sensitivity experiments (purple lines in Figures 12, A1b, and A1d).
Further investigation shows a rather limited impact of this option on the TKE values in the Rhine Valley. There is no overall increase of TKE, but instead random changes of TKE (≤ 0.4 m²⋅s⁻²). Predictably, this only results in very minor changes in the near-surface temperature.

Upstream conditions
FIGURE 13 Evaluation of upstream temperature bias at Milan during common foehn hours for the five foehn cases. Samples were taken at the hours when the Milan sounding is available (twice a day, at 0000 and 1200 UTC), and when it is a common foehn hour based on the simplified AGF foehn criteria at VAD. The model data and the Milan soundings are interpolated to a unified pressure coordinate with a spacing of 5 hPa from 1000-700 hPa, 10 hPa from 700-500 hPa, and 20 hPa from 500-300 hPa, in total 91 levels. Properties of the boxes and whiskers are the same as in Figure 9.

To test whether the cold bias could have its origin in the upstream conditions, soundings from the station Milan are investigated. Figure 13 depicts the temperature bias over the five foehn cases as a function of height. Near the surface, the model underestimates the air temperature by about 2 K, on average, whereas no strong bias is found above 800 hPa. Previous Lagrangian studies have shown that the foehn air/trajectories which arrive in the northern valleys originate from a large range of locations and altitudes. From a three-year climatological study, Würsch and Sprenger (2015) found that the trajectories that arrived at Altdorf mostly originated from around 1 km above mean sea level (AMSL) in the Po Valley (see fig. 11 in Würsch & Sprenger, 2015). In the 2013may case, the trajectories that descended to the bottom of the Rhine Valley (see cyan trajectories in fig. 9a,c from Miltenberger et al., 2016) came mainly from above 1.2 km AMSL. Jansing and Sprenger (2022) studied the 2016nov case and found that the origins of the foehn flow that arrives in the different Alpine valleys can be very diverse, both spatially and temporally. For instance, most of the trajectories arriving in the western valleys travel northwestwards along the Po Valley and experience significant ascent (latent heating) in the middle phase of the 2016nov case (November 21-23, 2016). Meanwhile, all the trajectories arriving in the eastern valleys originated from higher altitudes and experienced adiabatic warming during descent (see figs. 7, 10, and 11 from Jansing & Sprenger, 2022).
Assuming that Milan is representative of the atmospheric conditions in the Po Valley and that the foehn air comes from a minimum height of 1 km (around 900 hPa), the cold bias above this level is smaller than 1.5 K in magnitude, which is smaller than the bias observed in the northern foehn valleys. Thus the low-level upstream temperature bias may be a partial, but not complete, explanation for the cold bias in the foehn valleys. It is important to acknowledge that it is uncertain whether the air parcels that pass Milan above 1 km actually descend into the Rhine Valley. Moreover, the model temperature bias in the lower atmosphere upstream could alter the foehn trajectories. Lastly, the current conclusion is limited to our five foehn cases. Unfortunately, we are unable to support our hypothesis with the COSMO-1 analysis dataset, as the assimilation of radiosonde data in the COSMO-1 analysis leads to nearly no bias in the potential temperature profile.
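The unified pressure coordinate used for the Milan comparison (5 hPa spacing from 1000 to 700 hPa, 10 hPa from 700 to 500 hPa, 20 hPa from 500 to 300 hPa, 91 levels in total) can be reproduced as in the sketch below. The interpolation is done linearly in the logarithm of pressure, which is an assumption made here because the interpolation method is not specified in the text. The function names and the two-point test profile are hypothetical.

```python
import numpy as np


def unified_pressure_levels():
    """Build the 91 pressure levels (hPa), ordered from 1000 down to 300."""
    lower = np.arange(1000, 700, -5)    # 1000 ... 705 (60 levels)
    middle = np.arange(700, 500, -10)   # 700 ... 510 (20 levels)
    upper = np.arange(500, 299, -20)    # 500 ... 300 (11 levels)
    return np.concatenate([lower, middle, upper]).astype(float)


def interp_to_levels(p_profile, t_profile, levels):
    """Interpolate a temperature profile to `levels`, linear in ln(p).

    Levels outside the pressure range of the profile are set to NaN.
    """
    p = np.asarray(p_profile, dtype=float)
    t = np.asarray(t_profile, dtype=float)
    order = np.argsort(p)  # np.interp needs increasing x values
    return np.interp(np.log(levels), np.log(p[order]), t[order],
                     left=np.nan, right=np.nan)


levels = unified_pressure_levels()

# Hypothetical two-point profile, used only to exercise the functions.
t_on_levels = interp_to_levels([1000.0, 500.0], [290.0, 250.0], levels)
```

Once the model profile and the sounding are on the same levels, the bias profile is the level-by-level difference between the two interpolated arrays.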

Vertical structure of the atmosphere in the Rhine Valley

Since there are no available sounding data in the Rhine Valley, we use a pseudovertical profile to display the potential temperature profile averaged over the common foehn hours for each foehn case. The pseudo profile combines the potential temperature from surface stations at different altitudes, and it can be a good approximation of the true potential temperature profile despite local effects near the surface (Gohm et al., 2015). The latter are shown in Figure 14, where the REF profile at VAD (thick blue line) and the pseudovertical profile (thin blue line) show a similar stratification, that is, temperature gradient, despite differences in the absolute values and local variability. Therefore, except for the cold bias at the surface, the pseudo profile can provide a fairly good picture for assessing atmospheric stratification in REF.
A comparison with OBS shows that in reality the atmosphere seems less stratified than in the model in all five foehn cases, especially close to the surface (Figure 14). While the cold bias at elevated stations (about 1 km AMSL) is relatively small, a more stably stratified atmosphere leads to a more pronounced cold bias near the surface. While these results should be taken with caution, they suggest that a too stable atmosphere in the foehn valleys may partially explain the surface cold bias during foehn events. Furthermore, assuming that the near-surface atmosphere is well mixed in reality, and therefore the observed station values are representative for a vertical profile, the model profiles above 800 m AMSL are nearly unbiased for three of the five cases (2016nov, 2017feb, and 2013mar).
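A pseudovertical profile of the kind used above can be sketched by computing the potential temperature at each surface station and ordering the stations by altitude. This is a minimal illustration: the function names are hypothetical, the station values are invented, and the least-squares slope is only one simple way to summarize the stratification, not necessarily the measure used in the study.

```python
import numpy as np

P0 = 1000.0     # reference pressure (hPa)
KAPPA = 0.2857  # R_d / c_p for dry air


def potential_temperature(t_kelvin, p_hpa):
    """Potential temperature (K) from temperature (K) and pressure (hPa)."""
    return np.asarray(t_kelvin, dtype=float) * (P0 / np.asarray(p_hpa, dtype=float)) ** KAPPA


def pseudo_profile(altitude_m, t_kelvin, p_hpa):
    """Return station altitudes and potential temperatures sorted by altitude."""
    z = np.asarray(altitude_m, dtype=float)
    theta = potential_temperature(t_kelvin, p_hpa)
    order = np.argsort(z)
    return z[order], theta[order]


def bulk_stratification(z, theta):
    """Least-squares slope d(theta)/dz in K per km; larger means more stable."""
    slope = np.polyfit(z, theta, 1)[0]
    return float(slope * 1000.0)


# Invented station values (altitude in m AMSL, T in K, p in hPa).
z, theta = pseudo_profile([1000.0, 450.0, 1600.0],
                          [284.0, 288.0, 280.5],
                          [900.0, 960.0, 835.0])
dtheta_dz = bulk_stratification(z, theta)
```

Computing the same slope from the model values at the station locations and from the observations gives a direct comparison of the stratification in the two pseudo profiles.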

Horizontal resolution
FIGURE 15 Same as in Figure 5, but for DX550.

Geographical features can be represented better with a finer grid spacing, which gives the potential for a more accurate simulation of the mountain boundary layer (Schmidli et al., 2018). Thus an additional simulation with a horizontal grid spacing of 550 m is performed (DX550). Among all the sensitivity experiments, DX550 shows the most significant improvements (brown curves in Figure 12). The cold bias during foehn is reduced by 0.52 K for the 2016nov case and by 0.75 K averaged over all five cases (Figure A1). It can also be observed in Figure 15a, where the upper Rhine Valley is overall warmer in DX550 than in REF. At ARH, the 2-m temperature switches between foehn and nonfoehn states during the temporary retreat and the second phase due to the reduced northward foehn extent in DX550 (Figure 15c). DX550 exhibits the capability to resolve mountain waves with smaller wavelengths compared with REF (Figures 5b,d and 15b,d). This phenomenon is particularly pronounced in the vicinity of the elevated region situated between OBR and ARH. During the peak foehn period, the mountain wave immediately downstream of the elevated region in DX550 exhibits a wavelength approximately half of that observed in REF (Figures 5b and 15b). Subsequently, during the temporary retreat, the mountain wave is much weaker and fails to reach the ground at ARH in DX550 (Figure 15d), consistent with the shortened northern extent in Figure 15c. Furthermore, DX550 exhibits more temporal and spatial variations compared with REF throughout the whole foehn event (Figure 4), especially for the onset period, where a clearer distinction among stations, e.g., VAD (yellow) and OBR (green), can be observed. As for the atmospheric stratification, Figure 14 shows a less stably stratified atmosphere below 1000 m AMSL in DX550 (thin brown lines) in comparison with REF, resulting in a reduced cold bias in the near-surface temperature in DX550.

DISCUSSIONS AND CONCLUSIONS
The near-surface properties of south foehn in the COSMO model are evaluated for five foehn cases in the Rhine Valley and a five-year climatology in five foehn valleys. The five cases are simulated using COSMO-1. The climatology is based on the COSMO-1 analysis from MeteoSwiss. The bias assessment is based on 25 stations in five foehn valleys. Additional stations are used to assess the northward foehn extent. It is shown that the COSMO-1 analysis and our COSMO-1 simulations have very similar biases for the five cases, implying that the results of the case studies presented here can be generalized using the COSMO-1 analysis. Even though the foehn cases differ widely, some similarities can be found in both the case studies and the climatological study. The main findings are as follows.
• During foehn, a cold bias is observed in all five foehn cases in the Rhine Valley (average bias of −3.5 K: Figures 3 and 7). The 2013may case is an exception, with a smaller average cold bias of −1.6 K, which could be related to the effect of strong solar radiation and a weaker foehn signal near the surface. The cold bias is supported by the five-year COSMO-1 analysis dataset with an average cold bias of −3 K in the Rhine Valley and −1.8 K over 25 northern valley stations (Figure 9).
• While there is a bias in relative humidity during foehn, the bias in absolute moisture content is small. This implies that the relative humidity bias is primarily due to the cold bias (Figures 3 and 9).
• The downstream foehn extent is likely to be associated with mountain-wave structures. In the 2016nov case, the strength of the mountain wave is likely to be overestimated, especially at the foehn front, resulting in a too-far northward extent of the foehn flow, especially for the second phase. No significant bias is found in the synoptic-scale cross-Alpine pressure gradient in COSMO-1, thus excluding errors in the simulated cross-Alpine pressure gradient as a likely cause of the bias in the foehn extent.
• The cold bias is found for all foehn types. No significant difference in the model biases is found between deep, Gegenstrom, and shallow foehn. Within the deep foehn category (dry deep, moist deep, Dimmer foehn), the moister the deep foehn case, the larger the cold bias (Figure 10b). The bias in absolute moisture content is small. Although small, the moisture bias transitions from a positive moist bias for dry deep foehn to a negative dry bias for moist deep foehn (Figure 10c). The wind-speed bias does not differ among foehn types, but among the valleys: the wind speed is on average overestimated in the Rhine Valley, whereas in the other valleys the wind-speed bias is close to zero.
Our findings confirm the cold and moist biases found in previous studies (Sandner, 2020; Umek et al., 2021; Wilhelm, 2012). The model biases in foehn timings, wind speed, and wind direction are rather dependent on the specific valleys and station locations.
The observed differences in temperature and moisture biases across various foehn types, especially among the three deep foehn types, constitute an intriguing result. Using the same dataset as presented in this article, Jansing et al. (2022) identify three clusters of trajectories, characterized primarily by different thermodynamic histories. Subsequently, the authors analyze the percentage of these clusters within each foehn type and discover that dry deep foehn, shallow foehn, and Gegenstrom foehn exhibit comparable cluster percentage patterns. In contrast, the moist foehn type contains a higher proportion of trajectories originating from lower altitudes in the Po Valley and undergoing substantial ascent and diabatic heating, with the Dimmer foehn type displaying an even greater prevalence of these trajectories. These foehn types present larger cold bias and smaller moist bias (even dry bias in the Dimmer foehn type), suggesting that the COSMO-1 model might simulate too-strong moisture loss/condensation on the up- and downstream sides of the Alps. Additionally, the more pronounced cold bias in the foehn valleys in the moister foehn types indicates that either insufficient diabatic heating is occurring upstream or evaporative cooling is too strong on the up- and downstream sides of the Alps.
It is important to note that the simplified AGF criteria are not intended to diagnose foehn hours precisely. Instead, their purpose is to serve as a foundation for both observations and simulations, and to enable a comprehensive examination of model biases during foehn events across different cases, foehn types, and foehn valleys. In an extended study, the sensitivity of the model biases to the chosen relative humidity and wind-speed thresholds is tested. The results reveal only a limited impact on the magnitude of the model biases diagnosed. In short, the main findings are insensitive to the choice of thresholds.
In order to investigate the above systematic cold bias during foehn, further analysis and sensitivity experiments were carried out. The main conclusions and open points for discussion are the following.
• No significant difference of the incoming solar radiation bias in the downstream foehn valleys is found between foehn and nonfoehn hours. Therefore the incoming radiation is unlikely to be the cause of the near-surface cold bias in the downstream foehn valleys during foehn.
• Sensitivity experiments regarding the land-atmosphere coupling result in only very minor improvements of the near-surface temperature (Figure 12). A relatively larger impact of the land-surface representation can be found at CHU, where the foehn signal is weaker and the diurnal cycle of temperature is larger.
• Based on the routine sounding at Milan, an average cold bias of about 1-1.5 K is found upstream below 850 hPa in the five foehn cases, which might partly contribute to the cold bias downstream (Figure 13).
• Compared with observed vertical pseudo profiles, REF (COSMO-1) shows a too stably stratified near-surface atmosphere during foehn hours (Figure 14).This might be related to insufficient turbulent mixing produced in the 1D turbulence parameterization scheme.However, considering a simple 3D effect by including horizontal shear production does not yield an improved result.The reason remains unclear.We conclude that the too stably stratified low-level atmosphere during foehn results in the cold temperature bias near the surface.
• DX550 (COSMO-550m) shows significantly better results in terms of temperature, humidity, wind speed, and foehn extent (Figures 4,12,14,and 15).However, the exact reason is not clear.Possible reasons might include stronger vertical mixing resulting in a warmer near-surface temperature (see Figure 14) and a more accurate representation of the underlying surface, in particular the orography.A further investigation of this issue is left to a future study.
In summary, a subset of possible reasons for the near-surface cold bias during foehn in COSMO-1 mentioned in previous studies (Sandner, 2020; Wilhelm, 2012) is examined. Insufficient incoming solar radiation, too strong coupling between land and atmosphere, and the deficiency of the default bare-soil evaporation parameterization are not likely to be the main causes of the cold bias. Instead, it might be partly transported from upstream, as indicated by the cold bias in the Milan sounding, and partly related to a misrepresentation of the local temperature structure in the lowest few hundred meters in the foehn valleys themselves, as suggested by the pseudo profiles in the Rhine Valley (too stable near-surface atmosphere in the model). This points to the turbulence parameterization as a potential candidate for improvement. COSMO-550m (DX550) shows the most significant improvement among all tests. However, further sensitivity experiments as suggested above are needed for a better understanding.

Limited by the detection method, this article does not cover the performance of the model in predicting the timing of foehn onsets and terminations and in producing false alarms, which are of relevance in operational foehn forecasting. To gain a deeper and more accurate insight into the spatial structure of foehn in the Rhine Valley, more detailed observations, especially in the vertical dimension, are required. Alternatively, intercomparison of NWP models with similar grid spacings and large eddy simulation may help to identify those physical processes represented poorly in the operational mesoscale numerical weather prediction model.

A.1 Bare soil evaporation (EVAP)
Evapotranspiration plays an important role in surface hydrological processes. It also impacts the surface/near-surface temperature and moisture fields. TERRA_ML considers four processes: evaporation from the interception reservoir, evaporation from the snow reservoir, bare-soil evaporation, and vegetation transpiration. The total evapotranspiration is a weighted sum of these components.
By default, the COSMO model uses the bare-soil evaporation scheme adapted from the Biosphere-Atmosphere Transfer Scheme (BATS; Dickinson, 1984). BATS uses a force-restore method; specifically, the water contents of the upper three soil layers are aggregated to represent the water content in the upper soil layer of a former two-layer scheme. In a previous study, Schulz and Vogel (2020) showed a systematic bias in bare-soil evaporation using the BATS scheme. They implemented a new option for bare-soil evaporation in COSMO 5.4 based on a resistance formulation analogous to Ohm's law: $r_s = r_{s,\min}$ … where $r_a$, $r_s$, and $r_{s,\min}$ are the aerodynamic resistance, soil resistance, and minimum soil resistance, respectively. The soil-moisture terms with subscripts 1, min, and max are the volumetric soil water content of the uppermost soil layer, the permanent wilting point, and the field capacity, respectively. Schulz and Vogel (2020) showed that, under wet (dry) conditions, the overestimation (underestimation) of bare-soil evaporation in the former BATS scheme is reduced by the new resistance formulation, resulting in a decreased (increased) latent heat flux. This leads to a higher (lower) daytime surface temperature and a better agreement of the diurnal temperature cycle with observations.
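The Ohm's law analogy can be illustrated with a generic resistance sketch, in which the evaporation flux equals a driving humidity difference divided by the sum of two resistances in series. This is an assumed textbook form and not the exact formulation of Schulz and Vogel (2020): the soil resistance is passed in as a given number because its dependence on soil moisture is not reproduced here, and all function and parameter names are hypothetical.

```python
LV = 2.5e6  # latent heat of vaporization (J/kg), approximate


def bare_soil_evaporation(rho_air, q_sat_surf, q_air, r_a, r_s):
    """Evaporation flux (kg m-2 s-1) from a series-resistance analogy.

    rho_air: air density (kg/m3)
    q_sat_surf: saturation specific humidity at the surface (kg/kg)
    q_air: specific humidity of the air (kg/kg)
    r_a, r_s: aerodynamic and soil resistance (s/m), assumed to be given
    """
    return rho_air * (q_sat_surf - q_air) / (r_a + r_s)


def latent_heat_flux(evaporation):
    """Latent heat flux (W/m2) corresponding to an evaporation flux."""
    return LV * evaporation
```

In this sketch, a larger soil resistance lowers the evaporation and hence the latent heat flux for the same humidity difference, which is the qualitative mechanism by which an overestimated evaporation under wet conditions can be reduced.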
A.2 Skin layer (SKIN)
Schulz and Vogel (2020) implemented a skin layer above the uppermost soil layer in the COSMO model to represent the effect of vegetation and litter on the bare soil, based on Viterbo and Beljaars (1995). When the parameterization is activated, the skin temperature and the uppermost soil temperature are considered in the surface heat budget equation instead of solely the soil temperature (Schulz & Vogel, 2020). The formulation follows:

$$\Lambda_{sk}\,(T_{sk} - T_s) = SW_{net} + LW_{net} + LE + H,$$

where the four terms on the right-hand side represent the net shortwave and longwave radiation flux, and the latent and sensible heat flux, respectively, and $\Lambda_{sk}$ is the skin layer conductivity. The larger the value of $\Lambda_{sk}$, the stronger the coupling between the skin temperature $T_{sk}$ and the surface temperature $T_s$. Schulz and Vogel (2020) noted that, when using $\Lambda_{sk} = 10\ \mathrm{W\,m^{-2}\,K^{-1}}$, the underestimation of the diurnal 2-m temperature range, especially the warm bias at night, is improved. In addition, as a consequence of the shading effect, the overestimation of the diurnal soil temperature range is reduced (Schulz & Vogel, 2020).

A.3 Heat resistance of the laminar layer (SURF)
In COSMO, the layer between the solid surface of the earth and the lowest atmosphere level is defined as the "surface transfer layer" (Buzzi, 2008). It includes, from bottom to top, a laminar layer, the turbulent roughness layer, and the surface layer or the Prandtl layer. The first two layers have basically no vertical extension (Buzzi, 2008). In the laminar layer, the resistance for momentum is assumed to be zero, and the resistance for heat can be adjusted through a tuning parameter, rlam_heat. The larger the value of rlam_heat, the lower the diagnosed 2-m temperature is expected to be under unstable atmospheric conditions (Khain et al., 2015).

A.4 Horizontal shear production of turbulence (TKESH)
As described in Section 2.1, the COSMO model uses a 1D turbulence parameterization with a prognostic TKE closure at level 2.5 by default (Mellor & Yamada, 1982). By assuming horizontal homogeneity, only the vertical turbulent flux of a generic variable $\psi$ is parameterized, as $\overline{w'\psi'} = -K_\psi\,\partial\psi/\partial z$. The vertical diffusion coefficients $K_\psi$ are parameterized as the product of a stability function, a turbulent length scale, and a parameter $q$ representing the turbulent velocity, where $q = \sqrt{2e}$ and $e$ is the TKE. $q$ is predicted using the TKE prognostic equation (eq. 3 in Goger et al., 2018), which only considers the vertical shear of the horizontal wind, $(\partial u/\partial z)^2 + (\partial v/\partial z)^2$. COSMO provides an option (itype_sher=2) to include a horizontal shear term in the TKE tendency equation. The horizontal shear term is calculated using a Smagorinsky closure (eq. 5 in Goger et al., 2018). In addition, the horizontal shear of the vertical wind is calculated by multiplying the horizontal gradients of the vertical wind ($\partial w/\partial x$ and $\partial w/\partial y$) by an isotropic horizontal diffusion coefficient. These two components are then added to the total TKE tendency. Although this is far from a full 3D turbulence closure, it is a first step towards 3D shear production in the TKE prognostic equation.
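The structure of this closure can be made concrete with a small sketch that evaluates the diffusion coefficient and the resulting vertical flux for given inputs. It is only an illustration of the relations described above, assuming a down-gradient sign convention and a finite-difference gradient between two levels; the stability function and the length scale are passed in as plain numbers, and the function names are hypothetical rather than taken from the COSMO code.

```python
import math


def turbulent_velocity(tke):
    """Turbulent velocity scale q = sqrt(2 e), with e the TKE (m2/s2)."""
    return math.sqrt(2.0 * tke)


def diffusion_coefficient(stability_fn, length_scale, tke):
    """Vertical diffusion coefficient K (m2/s).

    K = stability function * turbulent length scale (m) * q (m/s).
    """
    return stability_fn * length_scale * turbulent_velocity(tke)


def vertical_flux(k, psi_upper, psi_lower, dz):
    """Down-gradient vertical turbulent flux, -K * d(psi)/dz."""
    return -k * (psi_upper - psi_lower) / dz
```

For a stably stratified layer, where potential temperature increases with height, the flux is negative (downward heat transport), and a smaller TKE yields a smaller coefficient and therefore weaker mixing of warm foehn air towards the surface.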

A.5 COSMO-550m (DX550)
In COSMO-550m, the horizontal grid spacing is set to 550 m, while the vertical level spacing remains the same as in COSMO-1. The time step is reduced to 5 s to ensure numerical stability. We adopt a smaller model domain for COSMO-550m, with 1200 × 1000 horizontal grid points for all the chosen foehn cases. Other parameters that are adjusted to maintain numerical stability can be found in Table 1.
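The link between grid spacing and time step can be illustrated with a simple advective Courant number, $C = u\,\Delta t/\Delta x$. The sketch below is a simplification, since it ignores the details of the model's time integration scheme, and the wind speed used in the example is an arbitrary illustrative value; the helper names are hypothetical.

```python
def courant_number(wind_speed, dt, dx):
    """Advective Courant number u * dt / dx (dimensionless)."""
    return wind_speed * dt / dx


def domain_extent_km(nx, ny, dx_m):
    """Approximate domain size (km) for nx x ny grid points with spacing dx_m."""
    return nx * dx_m / 1000.0, ny * dx_m / 1000.0


# Halving both the grid spacing and the time step keeps the Courant number
# unchanged for the same (illustrative) wind speed of 30 m/s.
c_ref = courant_number(30.0, 10.0, 1100.0)   # COSMO-1: dx = 1.1 km, dt = 10 s
c_dx550 = courant_number(30.0, 5.0, 550.0)   # COSMO-550m: dx = 550 m, dt = 5 s
extent = domain_extent_km(1200, 1000, 550.0)  # about 660 km x 550 km
```

Under this simple measure, the 5 s time step of COSMO-550m corresponds to the same Courant number as the 10 s time step at 1.1 km grid spacing, and the 1200 × 1000 grid covers roughly 660 km × 550 km.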
FIGURE 1 (a) The model domains of the REF experiments for the cases 2016nov, 2017feb, and 2017mar are shown by the colored area (topography height, unit: m) and for the cases 2013mar and 2013may by the black box; the orange box represents the model domain for DX550; the red box shows the area of (b). (b) Station locations of surface observations from SwissMetNet (diamond markers) and the Milan sounding station (yellow triangle marker); vertical cross-sections are taken along the red polygonal line; the blue box shows the area of (c). (c) Surface stations from SwissMetNet, TAWES (cross markers), and MeteoGroup (circle markers), which are used to depict the foehn front (see Figures 5a, 5c, and 15) and to construct pseudovertical profiles (Figure 14). Three side valleys, Prättigau, Seez, and Walgau, are shown in (b) and (c) with numbers 1, 2, and 3, respectively.
DX550: resolution of 550 m, vertical levels are the same as in REF; dlon = dlat = 0.005, dt = 5.0, hd_corr_u_in = 0.1, nrdtau = 2, rlwidth = 30000.0, betasw = 0.8.

… Rhine Valley and the prototype foehn station Altdorf in the Reuss valley (red diamonds in Figure 1b,c);

Note: The simplified AGF foehn criteria used in this study are based on the subset of the wind speed and relative humidity (denoted by bold font). Onset and ending of a foehn event are determined by a combination of these criteria (see text).
a Either e1 or (e2 and e3) change significantly.
b Number of conditions fulfilled (×).

Time series of 2-m temperature (T, unit: K), 2-m relative humidity (RH, unit: %), and 10-m wind speed (WS, unit: m s⁻¹) at Altdorf and Rhine Valley stations in observation (black), reference (blue), and analysis (red) for the 2016nov case. The gray shading represents foehn hours based on observations and the blue bar denotes foehn hours based on simulations. Both are calculated with the simplified AGF criteria introduced in Section 2.4.

FIGURE 4 Time series of potential temperature (unit: K) at the Rhine Valley stations for the 2016nov case for (a) OBS, (b) REF, and (c) DX550.

Snapshot of the foehn flow at (a,b) 1200 UTC on November 21, 2016, when the foehn flow is strongest and extends the furthest north, and at (c,d) 1500 UTC on November 22, 2016, when the foehn flow retreats temporarily. (a,c) Horizontal cross-section of potential temperature at the lowest model level (coloring, unit: K) and observed potential temperature at the stations in Figure 1c (colored circles, unit: K); (b,d) vertical cross-section of horizontal wind speed (shading, unit: m s⁻¹) and potential temperature (black solid contours, unit: K) along the polygonal line in Figure 1b. The topography height is indicated as black contours at 1000, 2000, and 3000 m in (a,c) and as gray shading in (b,d). Simulated and observed potential temperature in (a,c) share the same color scale. The blue shading in (b,d) indicates relative humidity ≥ 99%, representing cloud existence, and the blue bar at the bottom shows total precipitation in the previous hour (unit: mm).

Time series of pressure (upper row) and pressure gradient (lower row) between Lugano (LUG) and Zürich-Fluntern (SMA) (left column) and Stabio (SBO) and SMA (right column) in OBS (black), REF (blue), and ANA (red) for the 2016nov case. All station pressures are converted to the same reference height.
Flowchart labels: 43849 hours between 01.10.2015 and 31.12.2020; ALT foehn hours + 24 hours before and after; definition of the common foehn hours (orange).

FIGURE 9 A climatological evaluation of south foehn in the COSMO-1 analysis for the period from October 29, 2015 to October 29, 2020. (a) Sample sizes (in percentage) of common foehn hours (orange), nonfoehn hours (blue), and mismatched hours (white) at the 25 foehn stations. The sum equals 11,954 hr as in Figure 8. (b) Surface temperature bias, (c) relative humidity bias, (d) specific humidity bias, and (e) wind-speed bias during common foehn hours (orange) and common nonfoehn hours (blue). The boxes are drawn from the first quartile to the third quartile, with the white lines denoting the median and the white circles denoting the mean. Whiskers are drawn on both sides for a distance of 1.5 times the interquartile range. Outliers are marked with dots. Zero bias is shown with dashed lines. The valleys, separated with different background colors, from left to right, are Rhone, Aare, Reuss, Linth, and Rhine.

FIGURE 10 A climatological evaluation of different south foehn types in the COSMO-1 analysis for the period from October 29, 2015 to October 29, 2020. (a) Aggregated occurrence hours of the five foehn types (summed over common foehn hours at all 25 stations in Figure 9) and their corresponding percentage in the total common foehn hours; (b) surface temperature bias, (c) specific humidity bias, and (d) wind-speed bias during common foehn hours for the stations in the Rhine Valley (light orange) and for the other four valleys (dark orange).

… 2 and Appendix A (EVAP, SKIN, and SURF). While EVAP changes the surface heat fluxes by improving the representation of evaporation from the soil, SKIN and SURF …

FIGURE 11 Relationship between ΔT_bias (averaged 2-m temperature bias) and ΔSWD_bias (averaged global downward radiation bias), with Δ being the difference of values between foehn hours and nonfoehn hours.

FIGURE 12 Time series of 2-m surface temperature (T, unit: K) for the Rhine Valley stations in OBS (black), REF (blue), and sensitivity experiments, including SKIN (orange), EVAP (green), SURF (red), TKESH (purple), and DX550 (brown) for the 2016nov case.

FIGURE 14 Simulated profiles of potential temperature at VAD (bold lines; blue for REF and brown for DX550), observed pseudo profiles of potential temperature (black bold lines with markers; OBS), and simulated pseudo profiles (thin lines with markers; blue for REF and brown for DX550). Profiles are averaged over common foehn hours of VAD, for (a) 2016nov, (b) 2017feb, (c) 2017mar, (d) 2013mar, and (e) 2013may. The markers are the same as in Figure 1: diamonds for SwissMetNet, crosses for TAWES, and circles for MeteoGroup. Stations with altitudes below 500 m AGL are not shown in the pseudo profiles, except for VAD in OBS.

Heatmap of (a) mean 2-m temperature bias (2-m T, unit: K) of the six experiments (REF, EVAP, SKIN, SURF, TKESH, and DX550) over common foehn hours; (b) difference of mean 2-m temperature of the five sensitivity experiments with respect to REF. Data are shown for five Rhine Valley stations (CHU, RAG, VAD, OBR, and ARH) in all five foehn cases (from left to right: 2016nov, 2017feb, 2017mar, 2013mar, 2013may); (c) and (d) as for (a) and (b), but for nonfoehn hours.

FIGURE A2 Time series of the surface energy balance (SEB) in REF, EVAP, SKIN, and SURF experiments at the five Rhine Valley stations for the 2016nov case.
… the foehn flow and the underlying CAP manifest at different scales. The latter are smaller and intermittent. Both require a reasonable representation to capture the timing of the foehn breakthrough correctly. The authors investigate LES with horizontal resolutions from 200 to 13 m. They found that the LES at 200 m can already partially resolve the Kelvin-Helmholtz (K-H) instability and the associated turbulent kinetic energy (TKE …

Shear flow instability is found in the transition zone, as documented by Doppler wind lidars during the second Intensive Observation Period. Umek … a too-early foehn breakthrough in the mesoscale mode. Despite the remarkable improvement in comparison with the mesoscale simulation, the LES with a horizontal grid spacing of 40 m still fails to capture the re-establishment of the CAP during the second night of the event adequately. In a subsequent study, Umek et al. (2022) noted that turbulent eddies within …

… the COSMO-2 (COSMO model with Δx = 2.2 km) analysis is employed for the foehn cases in 2013 (as the COSMO-1 analysis is only available since 2015). Note that the COSMO-2 analysis has a slightly smaller domain than the COSMO-1 analysis. Consequently, the foehn cases in 2013 were simulated with a domain that consists of 1000 × 630 horizontal grid points, while those in 2016 and 2017 were simulated with a larger domain consisting of 1100 × 750 grid points, as illustrated in Figure 1. Boundary conditions are updated every hour for all simulations. The prognostic equations use a time step of 10 s.