General circulation models simulate negative liquid water path–droplet number correlations, but anthropogenic aerosols still increase simulated liquid water path

. General circulation models’ (GCMs) estimates of the liquid water path adjustment to anthropogenic aerosol emissions differ in sign from other lines of evidence. This reduces confidence in estimates of the effective radiative forcing of the climate by aerosol–cloud interactions (ERFaci). The discrepancy is thought to stem in part from GCMs’ inability to represent the turbulence–microphysics interactions in cloud-top entrainment, a mechanism that leads to a reduction in liquid water in response to an anthropogenic increase in aerosols. In the real atmosphere, enhanced cloud-top entrainment is thought to be the 5 dominant adjustment mechanism for liquid water path, weakening the overall ERFaci. We show that the latest generation of GCMs includes models that produce a negative correlation between present-day cloud droplet number and liquid water path, a key piece of observational evidence supporting liquid water path reduction by anthropogenic aerosols and one that earlier-generation GCMs could not reproduce. However, even in GCMs with this negative correlation, the increase in anthropogenic aerosols from preindustrial to present-day values still leads to an increase in simulated liquid water path due to the param-10 eterized precipitation-suppression mechanism. This adds to the evidence that correlations in the present-day climate are not necessarily causal. We investigate sources of confounding to explain the noncausal correlation between liquid water path and droplet number. These results are a reminder that assessments of climate parameters based on multiple lines of evidence must carefully consider the complementary strengths of different lines when the lines disagree.


Introduction
Aerosol-cloud interactions (ACI) remain the greatest source of uncertainty in our estimates of anthropogenic perturbations to Earth's energy budget (Boucher et al., 2014;Forster et al., 2021).In liquid clouds, an anthropogenic aerosol perturbation essentially instantaneously alters the number of cloud droplets (  ), changing cloud reflectance and thus the shortwave radiation absorbed by the climate system, which exerts a radiative forcing on climate ("radiative forcing by aerosol-cloud interactions" or RFaci; Twomey, 1977;Boucher et al., 2014).While our knowledge of RFaci is uncertain (Quaas et al., 2020), an even thornier issue is cloud adjustments to the   perturbation, where multiple processes acting at different scales from cloud droplet to planetary circulation (Stevens and Feingold, 2009) result in a multiscale dynamics prediction problem that is impervious to any one "silver bullet" solution (Mülmenstädt and Feingold, 2018).Estimates of ACI adjustments are, therefore, based on multiple, and often conflicting, lines of evidence (Boucher et al., 2014;Bellouin et al., 2020;Forster et al., 2021).
Those lines of evidence are, broadly, modeling at the cloud process scale ("large eddy simulation" or LES), global modeling, and observations at different scales.
In the following, we focus on stratocumulus (Sc) clouds, which play a large role in the energy budget due to their high albedo and frequent occurrence.Our understanding of adjustments in Sc is that two effects compete: an anthropogenic increase in   suppresses precipitation (Albrecht, 1989), increasing cloud liquid water path (L); but the   increase also promotes increasing turbulent entrainment of subsaturated air at cloud top (Ackerman et al., 2004;Bretherton et al., 2007), decreasing L. These mechanisms are regime dependent; precipitation suppression only plays a role in clouds that would have precipitated in the absence of the aerosol perturbation, and the entrainment mechanisms depend strongly on the turbulence generation mechanisms, for example cloud-top radiative cooling.The regime dependence of the underlying processes leads to "process fingerprints" in   -L space in LES (Hoffmann et al., 2020) for the very limited set of boundary conditions where LES is available.Similar bifurcation behavior appears in satellite observations, where mean L as a function of   first increases in precipitating clouds, next reaches a peak that roughly coincides with the transition to nonprecipitating clouds, and then decreases again (Gryspeerdt et al., 2019).There is evidence that this "inverted v" relationship between L and   overestimates the strength of the causal effect of   on L (Gryspeerdt et al., 2019;Arola et al., 2022;Fons et al., 2023), but qualitatively it is consistent with process understanding from LES. Integrated over all meteorological boundary conditions, the overall satellite correlation between L and   is negative.The satellite inverted v, satellite observations of natural laboratories (Christensen et al., 2022) where the origin of the perturbation is evident (Malavelle et al., 2017;Toll et al., 2019), and process-modeling lines of evidence lead to the assessment that the adjustment of L to anthropogenic aerosol is a reduction of L, that is, a positive contribution to the effective radiative forcing by ACI (ERFaci; Bellouin et al., 2020;Forster et al., 2021).
Global climate models -which, currently, means general circulation models (GCMs) run at roughly 1 • latitude-longitude spatial resolution -tell a different story.They would project an increase, rather than a decrease, in L when aerosols are increased from preindustrial (PI) to present-day (PD) concentrations (Gryspeerdt et al., 2020).The GCM line of evidence is https://doi.org/10.5194/egusphere-2024-4Preprint.Discussion started: 9 January 2024 c Author(s) 2024.CC BY 4.0 License.discounted in multiline assessments because it conflicts with the other lines and because those lines are assumed to provide more reliable information.This assumption rests on the representation of the relevant processes in GCMs.In these models, precipitation is initiated by a microphysical parameterization with an explicit dependence on   (or, largely equivalent, droplet size), so that the L increase by precipitation suppression is explicitly parameterized.Reduced L by enhanced evaporation, on the other hand, depends critically on meter-scale or smaller interactions between turbulence, radiation, and microphysics at the cloud edge.These interactions fall between several parameterizations and are therefore tricky to formulate in GCMs.(As a perverse consequence, this causes us to fret that GCMs may be structurally incapable of representing turbulent entrainment scales, while we often mistakenly consider the many-orders-of-magnitude-smaller-scale precipitation processes a parametric problem; e.g., Mülmenstädt et al., 2020Mülmenstädt et al., , 2021)).
In this work, we show that some Coupled Model Intercomparison Project 6 (CMIP6) era GCMs, unlike earlier model generations, are capable of producing inverted v   -L relationships in agreement with global observations and LES.Based on these PD correlations and on the   change between PI and PD (i.e., mimicking the information available to observationsbased ERFaci estimates), these models predict a reduction in L, which is consistent with assessments that use multiple lines of evidence.However, the causal effect of anthropogenic   changes on L, as diagnosed by model experiments where all climatic boundary conditions apart from aerosols are held fixed, remains as in previous GCM generations: an anthropogenic   increase leads to an increase in average L, consistent with a dominant role for the precipitation suppression mechanism parameterized in the model microphysics.

Data and methods
We use an ensemble of GCMs to perform fixed-sea surface temperature model experiments with PD and PI emissions, archive instantaneous aerosol and cloud information with sufficient frequency (3 h) to resolve the diurnal cycle and with sufficient length (1-5 years with the large-scale winds nudged to PD meteorology) to draw statistically robust conclusions.The model ensembles used are the CMIP5-era AeroCom indirect effect experiment (AeroCom IND3) simulations on the one hand and four newer-version models prepared for CMIP6 on the other.The AeroCom models are described in Zhang et al. (2016); Ghan et al. (2016).The CMIP6-era models are the U.S. Department of Energy Exascale Earth System Model (E3SM) version 1 Atmosphere Model (EAMv1; Rasch et al., 2019), the NASA Goddard Institute for Space Studies (GISS) ModelE3 (Cesana et al., 2019(Cesana et al., , 2021) ) configuration Tun1, the Geophysical Fluid Dynamics Laboratory (GFDL) Atmospheric Model AM4.0 (Zhao et al., 2018), and the Community Earth System Model version 2 Community Atmosphere Model version 6 (CESM2-CAM6; Gettelman et al., 2019).The CMIP6-era models were run for one year for the baseline experiment.E3SM was further run for 5 years for additional experiments that needed more data to perform stratification by confounding variables (see Sect. 3.3).For E3SM, the Cloud Feedback Model Intercomparison Project ObservaTon Simulator Package (COSP) satellite simulator (Pincus et al., 2012;Swales et al., 2018) mimicking the MODerate Resolution Imaging Spectroradiometer (MODIS) cloud retrievals (Platnick et al., 2017) and a number of vertically resolved fields were archived over a limited area over the northeast Pacific (NEP) Sc region for further analysis of confounders.https://doi.org/10.5194/egusphere-2024-4Preprint.Discussion started: 9 January 2024 c Author(s) 2024.CC BY 4.0 License.

Cloud selection
From the model output, we select liquid clouds, defined by the absence of ice (ice water path < 10 −3 kg m −2 and ice cloud cover = 0) in the column.To mimic passive satellite analyses, as well as to simplify the application of entrainment diagnostics in part 2 of the series (Mülmenstädt et al., in prep.), we require near-overcast (liquid cloud cover > 0.9) conditions.For these liquid clouds, we calculate "in-cloud" cloud-top   and L by dividing the grid-mean   and L by the projected cloud cover.
Only clouds over ocean are considered in this analysis.We refer to these clouds as "overcast clouds".
In addition to these globally occurring overcast clouds, we also study smaller cloud subsets defined by dynamical regime following Medeiros and Stevens (2011).In this classification, the stratocumulus regime is based on vertical velocity  at 700 and 500 hPa ( 700 > 10 hPa d −1 and  500 > 10 hPa d −1 ) and lower tropospheric stability (LTS), which we define here as the difference in potential temperature  between 1000 hPa and 700 hPa ( 700 −  1000 > 18.55 K).We further restrict the clouds to occur in grid boxes where these conditions are met at least 30% of the time, which serves to select the subtropical Sc regions.
The occurrence fraction  Sc of these conditions is shown in Fig. 1.In addition to the Medeiros and Stevens (2011) requirements, all of the above-mentioned warm cloud criteria are applied.We refer to these clouds as "Sc regime clouds".

Analysis methods
From L and   , we construct the conditional probability (L|  ) following Gryspeerdt et al. (2019).For ease of comparison among models and configurations, we collapse the two-dimensional (L|  ) into one dimension by calculating the geometricmean L in each   bin, also following Gryspeerdt et al. (2019).
For the MODIS simulator analysis in Sect.3.3.3,we transform the simulated  and droplet effective radius (  ) into   and L using a power-law relationship for adiabatic updrafts with constant   (Brenguier et al., 2000;Bennartz, 2007;Painemal and Zuidema, 2011;Grosvenor et al., 2018): where we take the ratio  = (  /  ) 3 between volumetric mean radius   cubed and effective radius cubed to be 1, subadiabatic factor  ad = 1, scattering efficiency  = 2, and adiabatic condensation rate Γ = 2 × 10 −6 kg m −4 .These assumptions minimize the complications involved in showing results that are mostly power-law behavior independent of these constant factors.(This does neglect important modifications that can arise if these factors are not, in fact, constant; Varble et al., 2023).
To analyze confounding by planetary boundary layer (PBL) depth (Sect.3. Table 1 summarizes the emissions, cloud selection, and model run duration for each experiment.

Results and discussion
In Fig. 2, we show the behavior of the AeroCom IND3 (CMIP5-era) GCMs in   -L space: with the exception of one model, L increases monotonically as a function of   .In some models, the slope decreases at high   , but only one model (HadGEM) has quantitatively similar behavior to the inverted v satellite   -L plot.The behavior of these models (with the exception of HadGEM) is consistent with the interpretation that the predominant mechanism linking L and   is precipitation suppression.

CMIP6-era models produce inverted v 𝑵 𝒅 -L relationships
A funny thing happened on the way to CMIP6: three of the four US CMIP6-era GCMs have an inverted v with a pronounced negative slope.The behavior of these models is contrasted with the AeroCom models' behavior in Fig. 3.The geographic distribution of the regression slope between log L and log   is predominantly negative in the models with an inverted v (Fig. 4), as Gryspeerdt et al. (2019) found in satellite retrievals.
One of these models (ModelE) was designed to better represent the entrainment behavior to which the negative slope is attributed in process-scale modeling.The other two (CAM6 and EAMv1), however, were not; if the negative slope is due to an entrainment ACI mechanism, it is an emergent behavior not explicitly parameterized into the turbulence scheme.It is doubly surprising that these models produce a negative slope considering that their closely related predecessor, CAM5.3-CLUBB-MG2, was part of the AeroCom ensemble and showed, at best, a slightly negative relationship between   and L.

3.2
The negative correlation between   and L does not predict the sign of PI to PD change in L The bulk of the   population lies in the part of the inverted v with a negative   -L correlation.If we regarded this relationship as indicative of a causal influence of   on L -that is, that an increase in   causes L to decrease -then we would predict a decrease in L as   increases from its PI value to its PD value due to anthropogenic emissions.
We can compare the change in L predicted by the   -L correlation in PD internal variability to the outcome of a model experiment designed to measure the causal effect of   on L. This experiment fixes all climatic boundary conditions affecting cloud state (i.e., solar constant, greenhouse gases, and sea-surface temperature) with the exception of anthropogenic aerosols.
The change in L in this experiment can therefore only be due to the anthropogenic aerosol emissions change.This model experiment shows that the causal effect of the   increase is to increase L on average, contradicting the prediction of a decrease in L based on PD internal variability (Fig. 5).The correlation seen in PD internal variability in these models therefore cannot be causal.Plotting the correlations within PD and PI, as shown in This behavior has not been documented in three-dimensional GCM experiments, but CAM6 and EAMv1 use related cloudturbulence (Bogenschutz et al., 2013;Larson, 2022) and cloud-microphysics (Gettelman, 2015) schemes, so it is conceivable that   -dependent entrainment mechanisms contribute to the   -L relationship in these three models.A deeper investigation of this question merits a separate paper (part 2 of this series, Mülmenstädt et al., in prep.).

Sources of covariability that produce noncausal 𝑵 𝒅 -L relationships
Noncausal relationships between two variables often originate from a third (possibly unobserved) variable that exerts a causal relationship on the two variables being correlated.This third variable is termed a "confounding variable" (Pearl and Mackenzie, 2018).In its most striking form, confounding can lead to a sign reversal between causation and correlation, for example in Simpson's paradox (Simpson, 1951;Feingold et al., 2022).Cloud properties respond strongly to the circulation at the scales of the Sc cellular organization (mesoscale) and greater.Thus, the meso-to synoptic-scale circulation is a natural place to look for confounding variables that lead to noncausal correlations between cloud properties.

Mesoscale cloud regimes
Mesoscale circulation manifests as cloud "regimes" (e.g., Rossow et al., 2005;Gryspeerdt and Stier, 2012;Muhlbauer et al., 2014;Unglaub et al., 2020).ACI mechanisms likely differ between cloud regimes (e.g., Mülmenstädt and Feingold, 2018;Possner et al., 2020;Dipu et al., 2022).This could result in different  Figure 7 shows the results.The model, perhaps unrealistically, produces clouds that generate surface-reaching rain at all droplet concentrations; however, in bins with higher rain rates, the   distribution is noticeably lower, as might be expected from the negative-exponent power law that parameterizes the autoconversion of cloud water to rain, and as is expected from observations (Pawlowska and Brenguier, 2003;Comstock et al., 2004).At the same time, L is higher in bins with higher rain rates, again as might be expected from the parameterized autoconversion and accretion.Superimposing the bin-mean   and L for each rain-rate bin on the unbinned   -L distribution, we find that the negative correlation among the bin means echoes the unbinned correlation.This is the case even though, in very classic Simpson (1951) fashion, the correlations within five out of the six  bins are positive.Thus, the opposing influences of   and L on rain rate can, without any involvement of entrainment or evaporation mechanisms, generate a noncausal negative correlation between   and L.
We note that the mechanism generating this noncausal correlation is unusual.The contrast with the causal precipitation suppression mechanism is clear: there, causation runs from   to autoconversion to L. Here, causation runs jointly from both   and L to precipitation.How or whether the causal chain then returns from precipitation to   and L, as in the classic confounding mechanism, is an open question.
We further note that precipitation already appears to have a qualitative effect on the model's   -L relationship at rain rates far below the CloudSat sensitivity threshold: even in the second-lowest  bin, the correlation between   and L is already positive.This suggests that the parameterized precipitation may exert such a strong influence on ACI even for clouds with low precipitation rate that other ACI adjustment mechanisms, while they may in principle be represented in the model, could be so overwhelmed by the parameterized precipitation suppression that their effect is not discernible in the climate response.

Synoptic-scale airmass advection
At the synoptic to planetary scales, covariability between cloud and aerosol properties can lead to spurious correlations in ACI metrics (Grandey and Stier, 2010).Synoptic-scale meteorological covariability can take the form of continental versus marine airmass advection.When an airmass originates over land, it typically has higher temperature, lower relative humidity (contributing to lower L), and higher aerosol concentration (contributing to higher   ) than when an airmass originates over ocean.This contrast between airmasses creates an anticorrelation between   and L even in the absence of any causal effect of   on L (Brenguier et al., 2003).Additionally, sea surface temperature is coldest and climatological subsidence strongest, near the coast, resulting in shallow marine boundary layers.The model's conception of this synoptic-scale covariability in space can be seen in deeper boundary layers with low CCN farther offshore.A similar covariability exists at particular locations in time.Figure 9 illustrates the mechanism in the NEP Sc region: at any given location, PBL depth and CCN concentration are strongly linked via the synoptic-scale circulation.Presumably the position of the anticyclonic subtropical subsidence governs both the PBL depth and whether continental or maritime air is advected.
To assess the synoptic meteorological confounding effect, we stratify the E3SM model clouds in the NEP Sc region by PBL depth.We choose PBL depth as the confounding variable because it appears to act as a proxy for airmass "continentality" in the model (Fig. 8), without a direct parameterized relationship to either aerosols or cloud.PBL depth is nevertheless strongly correlated with both CCN concentration (temporal-and regional-mean vertical profiles are shown in Fig. 10) and L. For a fairly wide range of PBL depths (representing the central 90% of the PBL depth distribution for Sc-regime cloud columns), the relationship between mean   and mean L stratified by PBL depth mimics the slope of the unstratified   -L relationship quite closely (Fig. 11).Based on this, it is plausible that synoptic-scale meteorological covariability contributes substantially to the overall negative   -L correlation in the model.
We note several caveats.The synoptic-scale covariability of aerosol advection and PBL depth is a feature of the general circulation and can therefore be expected to be modeled reasonably well in a GCM.However, the interaction of the advected aerosol with boundary-layer clouds depends on mixing between the free troposphere and the boundary layer, which is likely much less well represented in GCMs that have coarse vertical resolution.Whether the synoptic-scale confounding signature in the model mimics the real atmosphere is therefore uncertain.Further, the synoptic-scale covariability differs depending on the geographic particulars of each Sc basin; we have only analyzed the NEP Sc in detail.Finally, while the PBL depth-stratified negative   -L relationship in the model is consistent with observational analyses (e.g., Fons et al., 2023), the model does not reproduce the weakening of the   -L correlation within each PBL depth bin (not shown) that is found in observations (Possner et al., 2020;Fons et al., 2023).

Phase-space boundaries
Correlations between   and L can also arise simply because not all parts of the   -L phase space are equally accessible to clouds.This can be illustrated by applying the MODIS simulator (Pincus et al., 2012) to the model.The MODIS simulator provides optical thickness  and droplet effective radius   diagnosed consistently with the MODIS satellite cloud retrievals (Platnick et al., 2017).Power-law adiabatic relationships L (  , ) and   (  , ) can be used to transform the MODIS output into   -L space (e.g., Dipu et al., 2022).In logarithmic coordinates, this is a linear transformation, yielding the correlation shown in Fig. 12.This S-curve correlation, like the model-native   -L correlation, shows a steep rise in L at low   and a steep drop at moderate   .It also shows another steep rise at high   that the model may hint at but does not exhibit clearly.
Investigating the data before and after the coordinate transformation to   -L space is instructive.In log -log  space, the MODIS simulator output falls within a rectangle, bounded by the limits the model prescribes on its clouds.Upon transformation to log L-log   space, the population is bounded by a parallelogram (see the isolines in Fig. 12).These limits on the phase space strongly sculpt the behavior of the mean log L as a function of log   , because the parts of phase space that are not populated do not contribute to the mean L as a function of   .

Persistent disagreement with other lines of evidence
Before these results, it was only logical to discount the GCM evidence on the basis that it could not reproduce the observed the   -L relationship in PD internal variability.Now that some GCMs match the other lines of evidence in PD internal variability, what do we make of the fact that the disagreement on the sign of the causal climatic L adjustment to RFaci persists?
In observations, it is more difficult to establish causality than in the GCMs, where it is as simple as changing the aerosol emissions while fixing all other boundary conditions.The most reliable causal evidence in observations comes from observational natural laboratories where the aerosol perturbation is known and an unperturbed control can be identified clearly (Christensen et al., 2022).Such laboratories indicate unchanged or reduced L in the perturbed clouds (Malavelle et al., 2017;Toll et al., 2019;Diamond et al., 2020).But such laboratories are rare, and there is no rigorous extrapolation from these laboratories to the full diversity of cloud regimes found in the climate.The most representative observations, that is, the global satellite-retrieved inverted v correlations, have the opposite problem: they are representative, but are the correlations causal?The correlation is more negative than the estimate of causal interannual L response to   perturbations using an effusive volcano as a laboratory (Gryspeerdt et al., 2019, albeit for shallow Cu rather than Sc).Arola et al. (2022) argue that satellite   -L correlations are negatively biased not only by covariability confounding but also by retrieval errors.Fons et al. (2023) applied a causal network approach to the temporal evolution in geostationary satellite data and found that the causal negative   -L relationship is weaker than the   -L correlation.Strong regional increasing and declining trends on multidecadal timescales in the satellite record may also contribute to disentangling covariability and causality (Quaas et al., 2022).
In LES, as in GCMs, causality can be established by varying aerosol concentration while keeping the other boundary conditions constant.This provides very clear evidence that precipitation suppression and entrainment feedbacks lead to process fingerprints of positive and negative L tendencies in   -Lspace (Hoffmann et al., 2020) that translate into steady Sc states (Glassmeier et al., 2021).But these LES experiments are expensive, so boundary conditions are carefully curated to a very small subset of the high-dimensional space of meteorological conditions present in the climate.We simply do not know whether the process fingerprints would be as unambiguous if a broader spectrum of boundary conditions were simulated or if the clouds were able to interact with larger scales in the multiscale climate problem (Kazil et al., 2021) instead of evolving to a steady state.
In summary, GCMs are still the odd ones out in their negative L adjustment component of ERFaci.The observational and LES modeling lines of evidence have clear confounding and representativeness problems.Are these problems severe enough to flip the sign of the adjustment?It seems unlikely, but our GCM results show that it is possible; addressing the representativeness and confounder questions in the other lines of evidence thus takes on a renewed urgency.

Conclusions
Mülmenstädt and Wilcox (2021) expressed the hope that global models, after a long stretch of playing the odd line of evidence out in assessments of global energy budget problems (Bellouin et al., 2020;Sherwood et al., 2020), might be returning to a more equal role in the balance and struggle between conflicting lines of evidence.One way in which the global model perspective Causality (or, in this case, lack of causality) is easy to establish in a model experiment but very difficult in observations.
Where the noncausal correlation originates is another question that models can, in principle, answer definitively by shutting off confounding model mechanisms in mechanism-denial experiments.In part 2 of this series, we will more fully use the power of models as hypothesis testers by performing perturbed-physics and mechanism-denial experiments.In this paper, we have restricted ourselves to slice-and-dice analyses that could, in principle, also be performed on observations.We hope that they will be performed on observations, especially if Lagrangian investigation of cloud life cycle (e.g., Eastman et al., 2022;Christensen et al., 2023) and observational fingerprints of loss processes (e.g., Varble et al., 2023) can be included.
Whether lack of causality in the model system implies lack of causality in the real atmosphere is a question that models alone cannot address, so we do not yet know how worried we need to be about the sign difference between correlation and causation in the model   -L relationship.When it comes to the non-GCM lines of evidence, one can quibble with the representativeness of the causal evidence and with causality in the representative evidence -at the very least, these model results are a flashing red warning sign hanging over our interpretation of the L adjustment component of ERFaci.
3.2), we identify the top of the Sc-like boundary layer by the first model level where temperature increases with height in Sc-regime overcast columns.This produces wellmixed profiles of liquid-water potential temperature   and total water mixing ratio   .(Other definitions of PBL top, i.e., the model level of greatest gradient in   or   , yield very similar results.)As we will see inSect.3.3.2,cloud and aerosol      properties are remarkably stratified by PBL depth in E3SM; to keep the properties as distinct as possible as a function of PBL depth, we retain the native model vertical discretization instead of converting the hybrid pressure levels to pressure or geometric height.
Fig. 6, provides a glimpse at what is happening instead: a secular increase in   does not lead to a secular reduction in L by shifting the L population along the correlation line, as would be expected for a causal relationship.Instead, the correlation line shifts along with the secular shifts in   and https://doi.org/10.5194/egusphere-2024-4Preprint.Discussion started: 9 January 2024 c Author(s) 2024.CC BY 4.0 License.L ( mostly to the right given that the change in   is far greater than the change in L) in a way that is not predicted by the correlation line itself.This contradiction raises three questions.First, what produces the noncausal negative   -L correlation?We provide a few hypotheses in the following section.Second, considering that these models can replicate the observed PD correlation, what can we infer about the causality of the relationship in observations, where we are unable to conduct direct experimental tests of causality?We discuss this question in Sect.3.4.Third, is any part of the negative relationship between   and L in the models causal?Any such causal mechanism would have to involve a direct or indirect   -dependence in cloud-top entrainment.In ModelE, the Bretherton and Park (2009) turbulence scheme provides an explicit entrainment closure.Guo et al. (2011) have shown that the combination of the Cloud Layers Unified By Binormals (CLUBB;Larson and Golaz, 2005;Golaz et al., 2007) cloud and turbulence scheme and the Morrison-Gettelman microphysics scheme(Morrison and Gettelman, 2008;Salzmann et al., 2010) can reproduce entrainment-mediated enhanced evaporation at high   in single-column experiments.
-L slopes in open-or closed-cell Sc or shallow cumulus or, as the positive-and negative-sloped legs of the inverted v relationship perhaps show, in precipitating and nonprecipitating cloud regimes.Due to GCMs' coarse resolution, it is doubtful that they can correctly represent these mesoscale cloud regimes, their ACI mechanisms, or their coupling to the circulation.Nevertheless -or perhaps precisely because we can probably discount cloud-scale causal links between   and L due to the mismatch with the GCM resolved scale -we can use GCMs to test whether the existence of cloud regimes is, on its own, a confounding mechanism for the   -L relationship.To assess whether regime-induced confounding effects may exist in the model   -L relationship, we stratify the E3SM model clouds by surface rain rate.These bins of rain rate are our stand-in for precipitation regimes.We focus on the surface https://doi.org/10.5194/egusphere-2024-4Preprint.Discussion started: 9 January 2024 c Author(s) 2024.CC BY 4.0 License.rain rate because, unlike mesoscale morphological regime definitions (which are subgrid scale in the GCM), the precipitatingnonprecipitating regime delineation has a somewhat clear analog in the GCM.Because the model rain rate has a very long low tail, we do not attempt to define a binary nonprecipitating versus precipitating categorization but rather divide the cloud sample into quantiles of rain rate.Specifically, we use sextiles, balancing the need for a meaningful range of rain rates with the need to maintain a large sample of clouds within each bin.The CloudSat precipitation detection sensitivity at the GCM spatial resolution (≈ 0.01 mm d −1 ; Stephens et al., 2010) falls roughly into the third rain rate bin, so, by this definition, half the bins approximately represent precipitating and half nonprecipitating clouds.
https://doi.org/10.5194/egusphere-2024-4Preprint.Discussion started: 9 January 2024 c Author(s) 2024.CC BY 4.0 License.shores up the strength of the multiline assessment by providing information not available from the other lines of evidence: being able to test causality and showing that PD internal variability may not even correctly predict the sign of the causal cloud water adjustment to the anthropogenic cloud droplet perturbation.

Table 1 .
GCM experiments used in this analysis.