Cloud Patterns in the Trades Have Four Interpretable Dimensions

Shallow cloud fields over the subtropical ocean exhibit many spatial patterns. The frequency of occurrence of these patterns can change under global warming. Hence, they may influence subtropical marine clouds’ climate feedback. While numerous metrics have been proposed to quantify cloud patterns, a systematic, widely accepted description is still missing. Therefore, this study suggests one. We compute 21 metrics for 5,000 satellite scenes of shallow clouds over the subtropical Atlantic Ocean and translate the resulting data set to its principal components (PCs). This yields a unimodal, continuous distribution without distinct classes, whose first four PCs explain 82% of all 21 metrics’ variance. The PCs correspond to four interpretable dimensions: Characteristic length, void size, directional alignment, and horizontal cloud top height variance. These dimensions span a space in which an effective pattern description can be given, which may be used to better understand the patterns’ underlying physics and feedback on climate.

pattern measures (Denby, 2020). A third, more traditional approach is to compute one or more human-defined metrics; these are both interpretable and objective and are therefore considered in this paper.
Cloud patterns are often associated with a quantity called "organization." This term has consequently taken on numerous interpretations. It is often synonymous with "aggregation" in studies of deep convection (Holloway et al., 2017; Tobin et al., 2012; White et al., 2018), sometimes characterized as the regular, random or clustered structure of nearest-neighbor distances of cloud objects (Seifert & Heus, 2013; Tompkins & Semie, 2017; Weger et al., 1992), or connected to cloud scale (Bony et al., 2020; Neggers et al., 2019). However, cloud field organization has also been defined by metrics of fractal analysis (Cahalan & Joseph, 1989), directional alignment (Brune et al., 2018), subcritical percolation (Windmiller, 2017), or spatial variance (de Roode et al., 2004; Wood & Hartmann, 2006). While this makes it difficult to objectively define and discuss organization, all these interpretations share the same aim: Quantifying cloud patterns. Hence, this diversity can potentially also be harnessed to distinguish between different patterns.
The aim of this study is therefore to systematically extract the independent information encapsulated by the set of metrics associated with "cloud field organization" in literature, and to use this information to describe and interpret cloud patterns as effectively as possible. We first compute 21 diverse metrics for 5,000 satellite observations of mesoscale cloud fields in the trades and synthesize these in a multivariate distribution (Section 2). Next, we show that the metrics vary primarily along four Principal Components (PCs), allowing drastic dimensionality reduction (Section 3.1). Analysis of these main PCs results in a pattern description that is remarkably effective, in addition to being objective and interpretable (Section 3.2). We then highlight several approaches to approximate the PCs that balance the description's complexity and accuracy (Section 3.3). Finally, we demonstrate and discuss our description's ability to characterize previously diagnosed and novel pattern regimes (Section 3.4), before concluding (Section 4).

Data
Following Stevens et al. (2019) and Bony et al. (2020), we concentrate on shallow, subtropical clouds in the North Atlantic Ocean east of Barbados (20°-30°N, 48°-58°W), which are representative of the entire trades (Medeiros & Nuijens, 2016). Our cloud fields stem from the MODIS instrument borne by NASA's Aqua and Terra satellites. Specifically, we sample daytime overpasses during December-May 2002-2020 and directly use the level 2 cloud water path (CWP), cloud top height (CTH), and cloud mask products at 1 km resolution (Platnick et al., 2015) as the basis for our metrics. By comparing the multivariate metric data set to a corresponding data set constructed using coarse-grained cloud products (Loudin & Miettinen, 2003), Figure S1 verifies that our results are not overly sensitive to instrument resolution. We only interpret pixels classified as "confidently cloudy" by the cloud mask algorithm as cloud.
Our data points are scenes of cloud fields, which are 512 × 512 km subsets sampled within the 10° × 10° observation region. To boost the size of our data set, scenes are allowed to overlap by 256 km. We attempt to minimize the impact of errors and biases in remotely sensed cloud products by rejecting scenes with (i) high clouds such as cirrus, if more than 20% of the clouds' tops lie above 5 km; (ii) an overly large sensor zenith angle, if this angle exceeds 45°, following, for example, Wood and Field (2011); and (iii) sunlight errors, manually excluding scenes where these are visually found to influence the cloud mask. A set of 5,004 scenes remains.
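The sampling and rejection steps described above can be sketched as follows. This is a minimal illustration, not the processing code used for the study: the function names, the array layout (one pixel per km) and the choice to test the maximum zenith angle within a scene are assumptions, and the manual criterion (iii) is not represented.

```python
import numpy as np

SCENE_PX = 512   # scene edge in pixels (1 km resolution, so 512 km)
STRIDE_PX = 256  # neighbouring scenes overlap by 256 km


def scene_origins(n_rows, n_cols, size=SCENE_PX, stride=STRIDE_PX):
    """Top-left pixel index of every complete scene inside a larger field."""
    return [(r, c)
            for r in range(0, n_rows - size + 1, stride)
            for c in range(0, n_cols - size + 1, stride)]


def keep_scene(cloud_mask, cth_m, zenith_deg,
               max_high_frac=0.2, high_top_m=5000.0, max_zenith_deg=45.0):
    """Apply rejection criteria (i) and (ii); criterion (iii) is manual."""
    cloudy = np.asarray(cloud_mask, dtype=bool)
    if cloudy.any():
        # (i) fraction of cloudy pixels whose top lies above 5 km
        high_frac = np.mean(cth_m[cloudy] > high_top_m)
        if high_frac > max_high_frac:
            return False
    # (ii) assumption: reject if any pixel exceeds the zenith-angle limit
    if np.nanmax(zenith_deg) > max_zenith_deg:
        return False
    return True
```

For a hypothetical 1,024 × 1,024 pixel field, `scene_origins` returns nine overlapping scene origins on a 3 × 3 grid.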

Metrics and Dimensionality Reduction
To appropriately capture the body of existing organization metrics, we require them to meet either of the following two criteria: (i) Are they perceived to capture a unique aspect of the patterns? or (ii) do they frequently recur or recently first appear in literature? Additionally, they must be easy to interpret. This procedure (see Table S1 for details) diagnoses 21 metrics, which broadly divide into three methodological categories: Statistical moments of physical cloud field properties, object-based metrics, and attributes of scale decompositions. The metrics are briefly introduced below, visually presented in Figure 1 and further detailed in Text S1.
Statistical moments of cloud field properties comprise measures of typical cloud mass and area: The cloud mask's coverage fraction (cloud fraction), the CWP's scene integral (cloud water) and standard deviation (σ(CWP)) and the variance ratio for "mesoscale aggregation" of moisture proposed by Bretherton and Blossey (2017) (CWP var. ratio), here applied only to cloud water. Furthermore, this class contains measures of the clouds' vertical extent: The mean and standard deviation of cloud top height (CTH and σ(CTH) respectively).
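As a rough illustration of the simplest statistical moments listed above, the sketch below computes five of them from a single scene. The function name and several conventions are assumptions made only for this example: CWP is set to zero in clear pixels, σ(CWP) is taken over all pixels, and the CTH statistics are taken over cloudy pixels only. The CWP variance ratio is omitted.

```python
import numpy as np


def moment_metrics(cloud_mask, cwp, cth, pixel_area_km2=1.0):
    """Scene-level statistical moments of the cloud mask, CWP and CTH fields."""
    cloudy = np.asarray(cloud_mask, dtype=bool)
    cwp_cloudy = np.where(cloudy, cwp, 0.0)  # assumption: no water in clear sky
    has_cloud = cloudy.any()
    return {
        "cloud fraction": cloudy.mean(),
        "cloud water": cwp_cloudy.sum() * pixel_area_km2,  # scene integral
        "sigma(CWP)": cwp_cloudy.std(),
        "CTH": cth[cloudy].mean() if has_cloud else np.nan,
        "sigma(CTH)": cth[cloudy].std() if has_cloud else np.nan,
    }
```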
Object-based metrics measure size, shape, and relative positioning of individual cloud segments, which are identified from cloud mask fields by connecting cloudy pixels that neighbor each other vertically and horizontally (4-connectivity labeling). To avoid artifacts at the resolution scale, objects of a smaller dimension than four times the instrument resolution are ignored. Our results are not sensitive to the chosen connectivity scheme or minimum object size (Figure S1). The resulting metrics further divide into two categories: Scene statistics of individual object properties and measures of the spatial distribution of the objects. The first category includes the mean and maximum object length (mean length, max length), the number of objects (cloud number) and the mean object perimeter (perimeter); the second comprises the Simple Convective Aggregation Index (SCAI) (Tobin et al., 2012), Convective Organization Potential (COP) (White et al., 2018), the peak of the average radial distribution function (Rasp et al., 2018) (max RDF), the degree variance (degree var) of the cloud objects' nearest-neighbor network representation (Glassmeier & Feingold, 2017) and the Organization Index (I_org) (Weger et al., 1992), of which we include two versions. The first, most commonly applied form compares the cloud field Nearest-Neighbor Cumulative Density Function (NNCDF) to a Weibull distribution. The second variant (I*_org) compares it to an NNCDF that accounts for object size and is therefore less likely to erroneously predict regularity in the cloud fields (Benner & Curry, 1998). This metric is similar to that introduced by Pscheidt et al. (2019).
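The object identification step can be illustrated with a plain flood fill. The sketch below is not the implementation used for the paper. In particular, it assumes that an object's "dimension" is the square root of its pixel area, so that objects with a length below four pixels are dropped; the text leaves this definition open. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def label_objects(cloud_mask, min_length_px=4.0):
    """Label 4-connected cloud objects and drop those shorter than min_length_px."""
    mask = np.asarray(cloud_mask, dtype=bool)
    n_rows, n_cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)   # 0 means "no object"
    visited = np.zeros(mask.shape, dtype=bool)
    n_objects = 0
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first flood fill over vertical and horizontal neighbours.
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and mask[rr, cc] and not visited[rr, cc]):
                        visited[rr, cc] = True
                        queue.append((rr, cc))
            # Assumed size criterion: length = sqrt(area) in pixels.
            if np.sqrt(len(pixels)) >= min_length_px:
                n_objects += 1
                for r, c in pixels:
                    labels[r, c] = n_objects
    return labels, n_objects


def object_lengths(labels, n_objects):
    """Length of each retained object, taken as the square root of its area."""
    areas = np.bincount(labels.ravel(), minlength=n_objects + 1)[1:]
    return np.sqrt(areas)
```

With these two helpers, cloud number is `n_objects`, and mean length and max length are the mean and maximum of `object_lengths(labels, n_objects)`.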
We compute four metrics from scale decompositions: The size exponent of the cloud object size distribution modeled as a power law (size exponent), the box-counting dimension of cloud boundaries in the cloud mask field (fractal dim.), the spectral length scale as defined by Jonker et al. (1999) and the deviation of variance from the mean in the horizontal, vertical or diagonal orientations of the cloud water field's stationary wavelet spectrum (WOI_3) (Brune et al., 2018). In this paper, we use these metrics as discriminators between individual cloud fields, not to measure their cumulative scaling properties. Finally, we introduce a novel metric: A scene's largest, rectangular, contiguous cloud-free area (clear sky), as a simple measure of lacunarity, the degree to which continuous areas without clouds dominate a scene.
We describe patterns as a linear combination of the computed metrics. To weight each metric equally, we first standardize them by setting their mean to zero and variance to one. Since many metrics in Figure 1 strongly correlate (see Figure S2), we conduct a Principal Component Analysis (PCA, e.g., Abdi & Williams, 2010). This transforms the metrics to an orthogonal basis whose components (principal components, PCs) explain the maximum variance in the data set. If a small number of PCs (orthogonal dimensions) can accurately capture the metric set's variance, these form an effective pattern description. Figure 2 shows uni- and bivariate kernel density estimates on planes spanned by the first four PCs of the distribution of metric values, annotated with the fractional variance of the data set explained by each PC (explained variance ratio, EVR). It reveals that multiple PCs (dimensions) are needed to capture the multivariate distribution's cumulative EVR (CEVR) appropriately. However, the first PC is by far the most influential (EVR = 0.49, the widest distribution). Furthermore, the CEVR of the first two PCs already rises to 0.66, while retaining three and four PCs, out of the 21 original dimensions, explains 75% and 82% of the data set's variance, respectively. After the fourth PC, EVR quickly drops to 0.04, 0.03, 0.03, 0.02, 0.02 for PCs 5-9, and falls below 0.01 after the tenth PC (Figure S3). These statistics show that four PCs effectively capture the information in all 21 metrics. Therefore, we reduce our 21-dimensional metric set to four PCs.
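The standardization and PCA steps can be sketched with a singular value decomposition. The example below runs on a synthetic table of 21 columns driven by four hidden factors, which is purely an assumption chosen to mimic the shape of the metric data set. It does not reproduce the EVR values reported above, and the function names are hypothetical.

```python
import numpy as np


def standardize(metrics):
    """Give every metric (column) zero mean and unit variance."""
    return (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)


def principal_components(metrics):
    """Return PC scores, loadings (rows are PCs) and explained variance ratios."""
    z = standardize(np.asarray(metrics, dtype=float))
    # Rows of vt are the orthonormal PC directions, sorted by explained variance.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    evr = singular_values**2 / np.sum(singular_values**2)
    scores = z @ vt.T  # coordinates of each scene along each PC
    return scores, vt, evr


# Synthetic stand-in for the scene-by-metric table (NOT the real data).
rng = np.random.default_rng(0)
factors = rng.standard_normal((5004, 4))
mixing = rng.standard_normal((4, 21))
table = factors @ mixing + 0.1 * rng.standard_normal((5004, 21))

scores, loadings, evr = principal_components(table)
cevr = np.cumsum(evr)  # cumulative explained variance ratio
print("EVR of the first four PCs:", np.round(evr[:4], 2))
print("CEVR after four PCs:", round(float(cevr[3]), 2))
```

Truncating to four PCs then amounts to keeping `scores[:, :4]` and `loadings[:4]`.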

A Four-Dimensional Pattern Distribution
Of course, truncating the PCA after precisely four components remains somewhat arbitrary. Yet, this choice strikes a useful balance between including enough dimensions to effectively describe patterns and sufficiently few dimensions to interpret them. This claim is visually supported by  .
Figure 2. Univariate (diagonal, density on y-axis) and bivariate (off-diagonal, density in color) Gaussian kernel density estimates of the first four principal components (PCs) of the pattern distribution. The annotations EVR and CEVR denote the individual and cumulative explained variance ratio of each PC, respectively. Bandwidths for the Gaussian kernels are computed using Scott's rule (Scott, 1992). EVR, explained variance ratio; CEVR, cumulative explained variance ratio.

An Interpretable Pattern Description
Our four-dimensional pattern description is not only effective; by relating the PCs to their underpinning metrics, it can also be interpreted. This interpretation is facilitated by Figures 3c and 3d, which show normalized metric values (filled contours) and mean gradients (arrows) of metrics that predominantly vary in the planes depicted in Figures 3a and 3b, respectively. To further aid the interpretation, we identify "meaningful directions" (arrows in Figures 3a and 3b), by manually grouping similarly varying metrics and computing their mean gradient. Using these meaningful directions, we name the PCs and relate them to several common interpretations of organization.
Strikingly, 17 of the 21 metrics mainly describe variations in the first two PCs (Figure 3c, see also Figure S3). These metrics derive from all three methodological categories (statistical moments, object metrics and scale decomposition metrics) and have a rather continuous spectrum of orientations, such that remarkably many meaningful directions can be used to interpret this plane:
1. Coverage (arrow in Figure 3a represents the mean gradient of cloud fraction, max length, and cloud water)
2. Space filling (fractal dim., I*_org)
3. Characteristic length (spectral length scale, size exponent, mean length)
4. Void size (clear sky)
5. Aggregation or clustering (I_org, SCAI, cloud number, max RDF), as commonly associated with deep convective organization (Tobin et al., 2012; Tompkins & Semie, 2017)
We adopt the two meaningful directions that best align with the PCs as names for our pattern description's first two dimensions: Characteristic length and void size. We find it both intuitive and beautiful that these two dimensions, which respectively measure the typical scale of clouds and the complementary clear sky space between them, naturally emerge from our approach.
Linear combinations of the first two PCs can construct different meaningful directions. For instance, clustering/aggregation differs only subtly from characteristic length, assigning slightly more importance to voids between cloud clusters. Space filling weights voids even more heavily. Finally, coverage distinguishes itself from void size by assigning marginally more importance to characteristic length. Hence, the same aspects of the patterns in Figure 3a can be described with different pairs of meaningful directions.
Several such pairs are already indirectly recognized as central traits of "organization." For instance, Seifert and Heus (2013) suggest that both a spectral length scale (characteristic length) and I_org (clustering) may be needed to discriminate between various modes of organization; Neggers et al. (2019) identify organization as a combination of maximum cloud size (coverage) and typical nearest-neighbor distances between smaller clouds (space filling); chapter 5 of van Laar (2019) distinguishes "cloud field characteristics" (cloud fraction and maximum cloud size, i.e., coverage) from "organization parameters" (I_org, SCAI and COP, i.e., clustering); and Bony et al. (2020) span their 2D description of organization with mean length (characteristic length) and I_org (clustering). With so many valid interpretations of "organization," a consistent understanding of the term is predicated on an awareness of how the various interpretations relate. The arrows in Figure 3a provide exactly these relationships, and can therefore advance such understanding.
Our four-dimensional pattern description also goes beyond the common, two-dimensional interpretations of organization. Figure 3d shows that PC3 quantifies variations in the degree to which clouds are directionally aligned (WOI_3), a characteristic that strongly correlates with the cloud water variance in a scene's largest scales (CWP var. ratio). PC4 distinguishes between scenes with different horizontal variance of vertical cloud development (σ(CTH)). Hence, variations in PC3 and PC4 can be understood as combinations of directional alignment and cloud top height variance.
In conclusion, the 4 PCs of our cloud pattern description have meaningful interpretations: Characteristic length, void size, directional alignment and cloud top height variance. In combination with the description's effectiveness, this leads us to recommend using the PCs of a large metric set to describe cloud patterns.

Metric Subset Approximations
While we need only a few metrics to interpret our four PCs, computing them still requires a full loading matrix, with input from all metrics. Since one may not always want to compute as many metrics as we do here, this section investigates how well one can approximate our original PCs with a smaller subset of contributing metrics. Unfortunately, techniques which optimize a cost function that explicitly balances the accuracy of the approximate PCs with how many metrics contribute to them, such as sparse PCA (Zou et al., 2006), prove unable to robustly indicate metric subsets (see Figure S5). Hence, it is not obvious that a clearly optimal metric subset exists.
One practical way to compose a subset nonetheless is to choose the one metric that most closely correlates with each PC (Cadima & Jolliffe, 1995). This approach (see Figure S3) selects spectral length scale, clear sky, WOI_3 and σ(CTH) (CEVR = 0.59, computed using the approach from Zou et al. (2006)) and is a reasonable approximation of the four PCs (CEVR = 0.82). If one's primary interest is in the first two dimensions of the pattern distribution, several roughly orthogonal metric pairs competently estimate the plane in Figure 3a. Examples include spectral length scale and clear sky (CEVR = 0.31), cloud fraction and fractal dim. (CEVR = 0.31) or perimeter and I*_org (CEVR = 0.30). All three pairs sacrifice explained variance compared to two PCs (CEVR = 0.66). Yet, they capture far more information than various metric combinations considered in the literature, for example, mean length and I_org (Bony et al., 2020, CEVR = 0.18), I_org and fractal dim. (Denby, 2020, CEVR = 0.20), spectral length scale and I_org (Seifert & Heus, 2013, CEVR = 0.19) or I_org, SCAI, COP, and max RDF (van Laar, 2019, CEVR = 0.26). Using the metric subsets identified here can therefore already promote the orthogonality, variance capture and ultimately the effectiveness of subset approximations, even if they remain substantially less expressive than our four PCs.
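The selection rule of picking, for each PC, the single metric that correlates most closely with it can be sketched as below. This is only an illustration under assumptions: the function name and the metric names passed to it are hypothetical, and the sketch does not compute the adjusted CEVR of Zou et al. (2006) quoted above.

```python
import numpy as np


def best_metric_per_pc(metrics, names, n_pcs=4):
    """For each leading PC, return the name of the most strongly correlated metric."""
    metrics = np.asarray(metrics, dtype=float)
    z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_pcs].T
    picks = []
    for k in range(n_pcs):
        # Absolute correlation, because the sign of a PC is arbitrary.
        corr = [abs(np.corrcoef(z[:, j], scores[:, k])[0, 1])
                for j in range(z.shape[1])]
        picks.append(names[int(np.argmax(corr))])
    return picks
```

Nothing in this rule prevents the same metric from being selected for two PCs, so in practice the resulting subset should be checked for duplicates and for approximate orthogonality.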

Regimes of Patterns
Asking how many dimensions cloud patterns possess is not the same as asking how many fundamental types of cloud patterns exist. Dividing clouds into distinct classes (e.g., cumulus or cirrus) is a classical approach, which recently inspired efforts to also classify shallow cloud patterns, using both the human eye (Stevens et al., 2019) and metrics (Bony et al., 2020). We search for these classes ("sugar," "gravel," "fish," and "flowers") in our four-dimensional pattern description by segmenting it into k-means clusters (Figure 4). When we set k = 7, four of these clusters roughly match the proposed classes, as indicated by the lettered labels in Figure 4a.
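A k-means segmentation of the PC scores can be sketched with Lloyd's algorithm. Several choices below are assumptions made for illustration rather than details taken from the study: the PCs are rescaled to unit variance before clustering, the centroids are initialized deterministically by farthest-point selection, and the function names are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [x[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen centroid.
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def cluster_scenes(pc_scores, k=7):
    """Cluster scenes in the space of the leading PCs, scaled to unit variance."""
    x = np.asarray(pc_scores, dtype=float)
    x = x / x.std(axis=0)
    return kmeans(x, k)
```

Because k-means always returns k clusters, a result such as this one says nothing by itself about whether the underlying distribution contains distinct classes.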
Scenes arguably dominated by "sugar" and "gravel" reside in clusters 5 (brown) and 3 (maroon). These patterns should, in the terminology from Section 3.2, be understood as small-scale with rather small voids (or disaggregated/unclustered); "gravel" distinguishes itself through its higher cloud top height variance and low directional alignment (see also left side of Figure 3b). Cluster 1 (navy) comprises, among others, "fish," which share gravel's void size, but have larger characteristic lengths, cloud top height variance and directional alignment. Finally, one may see "flowers" in cluster 7 (blue), as large-scale, aggregated structures with little directional alignment and low cloud top height variance.
JANSSENS ET AL.
10.1029/2020GL091001
Figure 4. Seven regimes of the 4D pattern description, identified as k-means clusters of different color: (a) Scenes scattered over planes defined by the first four PCs, each normalized to unit variance, named using the convention from Section 3.2. Pluses and crosses indicate the distribution's mean and mode, respectively. S, G, Fi, and Fl suggest typical locations for the "sugar," "gravel," "fish" and "flowers" patterns diagnosed by Stevens et al. (2019), in the two planes shown in Figure 3, determined by eye. (b) Seven examples of scenes in each regime.
The fact that sugar, gravel, fish, and flowers occupy different regimes of our systematically constructed distribution is encouragingly consistent with human pattern classification (Stevens et al., 2019) and solidifies the conclusion of Bony et al. (2020) that these patterns can be objectively identified. However, even in an unrealistic scenario where all scenes in the four clusters discussed in the previous paragraph could unambiguously be labeled sugar, gravel, fish, or flowers, they would contain only 52% of the observations in our data set. Figure 4 indicates several other regimes that differ in important regards. For instance, many scenes possess vast voids (cluster 6, sea green). In this regime, clouds likely affect the region's climate much less than sugar, gravel, fish or flowers, which all have higher cloud cover. Comprehensive analyses of the climate sensitivity of cloud patterns should consider this and the other regimes too.
In fact, pattern classification is itself an approximation. The pattern distribution is unimodal and continuous (Figure 2), and therefore does not inherently possess multiple "classes," "clusters," or "modes." Breaking the continuum into four classes or seven clusters is thus rather arbitrary and neglects subtly different patterns within a cluster. For instance, the band-like subregime at high directional alignment in Figure 3b falls within cluster 4 (peach) in Figure 4, even if this subregime is visually distinct from all displayed scenes in cluster 4. To capture such subtleties, we recommend shifting focus from regimes, classes or clusters of patterns to a more fitting, continuous representation.

Conclusion and Outlook
Research on the climate feedback of patterns in shallow trade-wind cloud fields requires a consistently understood description of those patterns. In this study, we have systematically developed such a description for square, approximately 500 km × 500 km satellite-observed cloud fields east of Barbados. By projecting one new and 20 previously developed organization metrics onto a set of PCs, we show that cloud patterns can be effectively described as a four-dimensional, linear combination of characteristic length, void size, directional alignment and cloud top height variance. This description is objective and interpretable, in contrast to direct unsupervised machine learning (objective, not usually interpretable) or human pattern identification (interpretable, not objective). It also demonstrates that patterns follow a continuous, unimodal distribution without distinct classes and that visually striking patterns are extreme, rather than typical. Future studies of the physics behind and climate impact of shallow cloud patterns can therefore rely either on our PCs or, if accuracy is less important, on metrics that correlate closely with them.
The effectiveness of our approach may well extend to descriptions of deep convective organization. Many relationships between our metrics are consistent with those found for deep convective cloud fields (Brueck et al., 2020; Rempel et al., 2017), suggesting that an effective, low-dimensional description of deep convective organization is attainable. Our pattern description could also be used for forecast verification (Jolliffe & Stephenson, 2012), using the pattern distribution's dimensions as matching criteria between model and observation in similar fashion to, for example, the criteria developed by Wernli et al. (2008). In turn, the forecast verification community may offer useful insights into descriptions of cloud patterns.
Finally, our approach can itself be refined in several regards. First, using predefined metrics to describe patterns potentially leaves undiscovered information out of the description. Therefore, it may be fruitful to compare our approach to fully unsupervised machine learning (e.g., Denby, 2020). However, the completeness of a pattern description should ideally be assessed in terms of how fully the underlying processes are separated. This requires process-resolving numerical simulations and/or temporally evolving observations, which link the evolution of a pattern to that of the atmospheric state. Next, our conclusions are tied to our observation scales (1-500 km), meaning that we may inadequately capture this scale window's extremes and cannot capture pattern formation processes on much larger scales. Furthermore, we treat this scale window in an integral sense and ignore patterns that appear on one scale, but may be canceled by another (Nair et al., 1998). Hence, a further refinement could be to consider pattern distributions on a per-scale basis. Lastly, some subjectivity will likely remain in how different researchers interpret "organization." This attests to the richness of the underlying patterns, which we hope remains appreciated.