Convective cloud regimes from the classification of object-based CloudSat observations over Asian-Australian monsoon areas

The present study objectively classifies the convective cloud objects detected by the space-borne CloudSat radar over the Asian-Australian monsoon region using the hierarchical agglomerative clustering algorithm. Based on key properties representing the morphological features and convective intensity of the systems, five distinct convective cloud regimes are derived. The unique Coastal-Intense (CI) regime exhibits the most expansive horizontal scales (> 1000 km), high convective strength, the strongest cloud radiative effects, the highest probability of extreme rainfall, and a significant coupling with the sharp onset of the Asian summer monsoon circulation. The Coastal regime comprises smaller but also highly organized coastal convection, with the strongest convective strength. Less than 10% of the systems in the CI and Coastal regimes overlap with tropical cyclones. The remaining three regimes represent less organized convection at various life cycle stages, mainly over land areas, with small seasonal variation in their occurrence.


Introduction
The Asian-Australian monsoon region is one of the largest monsoon systems in the world. During the Asian summer monsoon season, a planetary-scale low-level southwesterly jet develops over the Arabian Sea and extends to the western Pacific region; this jet is a major source of moisture for the Asian summer monsoon. The overturning circulation over the longitudinal bands of the Asian-Australian monsoon is the major contributor to the dynamics of the zonal mean Hadley Cell in boreal summer (Hoskins et al., 2020).
The occurrence of intense rainfall and mesoscale convective systems (MCSs) is highly coupled with the seasonal transition of monsoon circulation and can exhibit multi-scale variabilities. The hotspots of the MCSs are over the Maritime Continent (MC), tropical Indian Ocean, and tropical western Pacific region during boreal winter and shift to the off-equatorial latitudes during boreal summer (Yuan & Houze, 2010).
These are also the regions that exhibit the maxima of variability in outgoing longwave radiation (OLR) and meridional wind during the boreal summer (Hoskins et al., 2019). The MCSs contribute 56% of the total precipitation over the tropics (Yuan & Houze, 2010). They are also the major contributor to extreme precipitation events over the Asian-Australian monsoon area (Hamada et al., 2014), and the flash floods and lightning activity associated with these organized convective systems have caused severe loss of life and property over the densely populated coastal regions.
The operational early warning and the future projection of the occurrence of the intense MCSs over the Asian-Australian monsoon regions in a changing climate will critically rely on improving our knowledge of their spatial-temporal variability as well as the associated environmental conditions.
Long-term satellite observations of cloud and precipitation have facilitated a detailed understanding of the structures and distribution of tropical and monsoonal convective systems. Previous studies have subjectively applied a combination of morphological features to identify and classify the MCSs. For example, Yuan and Houze (2010) defined the MCSs by the system's brightness temperature and precipitation features, and classified the MCSs by the system size and the number of raining cores based on multiple satellite observations. Large separated and large connected MCSs are the two major MCS regimes and are frequently observed over the MC, Indochina, and the coastal region.
Other studies used objective classification/clustering algorithms to distinguish cloud/weather regimes based on the statistical distribution of satellite observables. Examples include Jakob and Tselioudis (2003) and Rossow et al. (2005). With Moderate Resolution Imaging Spectroradiometer (MODIS) gridded data, 13 tropical cloud regimes (TCRs) have been classified (Jin et al., 2020). TCR1, TCR2, and TCR3 are associated with convective systems: TCR1 is considered convective core-dominated; TCR2 has a relatively low cloud top and lower cloud optical depth; TCR3 is considered anvil-dominated. In their study, the contiguous pixels of TCR1 with TCR2 and/or TCR3 at the same observation time are further defined as convective aggregates in order to examine the convective systems' characteristics and precipitation features. The gridded data provide large numbers of samples for the clustering analysis and give a more straightforward picture of the cloud system. However, the 1°-by-1° gridded data are still relatively coarse for detecting a small convective system.
A heuristic approach adopted in this study is to combine the domain knowledge of MCSs with a data-driven approach, identifying convective cloud regimes by applying the agglomerative clustering algorithm to vertical convective cloud objects. We diagnosed the cloud objects by connecting the cloudy profiles of the CloudSat level 2 cloud mask. The cluster analysis is based on key morphological and physical properties featuring mesoscale development, quantified from the convective cloud objects. This approach provides new insights into the convective cloud regimes that can be objectively delineated by the physical properties of the convective systems, as well as their spatial-temporal occurrence. The manuscript is organized as follows. Section 2 explains the data sets and analytic methodology.
The Cloud Profiling Radar (CPR) at 94 GHz onboard CloudSat detects vertical profiles of cloud hydrometeors in the atmosphere with a vertical resolution of 240 m (125 bins) and a horizontal resolution of 1.4 km across and 1.8 km along the track (Stephens et al., 2008). CloudSat overpasses the tropics at around 1:30 pm and 1:30 am local time. The radar reflectivity and cloud mask from the level-2 2B-GEOPROF R04 product (Marchand et al., 2008) for the years 2006-2015 are used in this study (the data are only available for daytime after 2010 due to battery issues). Cloudy pixels are defined by a cloud mask value ≥ 20, following Takahashi et al. (2017), Riley and Mapes (2009), and Bacmeister and Stephens (2011).
To understand the statistics of the rainfall spectrum and radiative properties associated with the identified convective cloud regimes, the precipitation data from the TRMM 3B42 version 7 (3-hourly; 0.25-degree spatial resolution) and the National Oceanic and Atmospheric Administration (NOAA) interpolated daily OLR observations (Liebmann & Smith, 1996) were analyzed. The 850-hPa wind fields and water vapor in the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis (Hersbach et al., 2020), and the SST in the Optimum Interpolation Sea Surface Temperature (OISST) version 2 (Reynolds et al., 2007) were also used.
The International Best Track Archive for Climate Stewardship (IBTrACS) version 4 (Knapp et al., 2010) datasets were also used to analyze the relationship between Tropical Cyclones (TCs) and the convective cloud objects. A TC is defined here as a system whose 10-minute sustained wind speed is higher than 34 knots.
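As an illustration of how a cloud object could be matched against best-track data, the sketch below combines the TC definition above with the proximity criterion applied later in the regime analysis (a TC center within 500 km and within ±12 hours of the object). It is a minimal sketch under stated assumptions, not the authors' code: the function names, the tuple layout of `track_fixes`, and the use of a spherical-Earth haversine distance are all hypothetical choices made for the example.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_tc_related(obj_time, obj_lat, obj_lon, track_fixes,
                  max_km=500.0, max_hours=12.0, min_wind_kt=34.0):
    """Return True if any qualifying TC fix lies close to the object.

    track_fixes is a hypothetical list of (time, lat, lon, wind_kt) tuples,
    e.g. extracted beforehand from a best-track file.
    """
    window = timedelta(hours=max_hours)
    for fix_time, fix_lat, fix_lon, wind_kt in track_fixes:
        if wind_kt <= min_wind_kt:
            continue  # below the TC wind threshold used in the text
        if abs(fix_time - obj_time) > window:
            continue  # outside the +/- 12 hour window
        if great_circle_km(obj_lat, obj_lon, fix_lat, fix_lon) < max_km:
            return True
    return False
```

For example, an object observed at 15°N, 120°E would be flagged by a 50-knot fix located one degree to the northeast six hours later, but not by a fix that is weaker than the wind threshold or more than 12 hours away in time.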

Definition of Convective Cloud Objects and Feature Selection
CloudSat cloud mask observations were first processed with 4-way connectivity to identify cloud objects (Tsai & Wu, 2017; Chen et al., 2019a; Su et al., 2019). The connecting algorithm is provided in the supplementary material. We subjectively define convective cloud objects as those with a continuous vertical extent between 2 km and 6 km altitude. As shown in the example of Fig. 1(a), five object-based key physical properties can be diagnosed once the convective cloud objects are defined, and these are selected as the features for the clustering analysis, namely: (1) cloud size, (2) horizontal scale, (3) cloud top, (4) whether the observed 0 dBZ echo height is above 10 km, and (5) whether the observed 10 dBZ echo height is above 10 km. Cloud size and horizontal scale both represent the scale of cloud objects, but their combination can be used to infer the shape of the cloud objects, particularly the coverage of the anvil parts.
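The object identification described above can be sketched as a 4-way connected-component labeling of a height-by-track cloud mask. This is a minimal pure-Python illustration, not the algorithm from the supplementary material: the function names are hypothetical, the convective test assumes that "continuous vertical extent between 2 km and 6 km" means at least one profile in which every bin in that layer belongs to the object, and the pixel dimensions `dx_km` and `dz_km` are left as inputs rather than assumed.

```python
from collections import deque


def label_objects(mask):
    """Label 4-way connected cloudy pixels. mask[z][x] is True where cloudy.

    Returns (labels, count), where labels[z][x] is 0 for clear pixels.
    """
    nz, nx = len(mask), len(mask[0])
    labels = [[0] * nx for _ in range(nz)]
    count = 0
    for z0 in range(nz):
        for x0 in range(nx):
            if not mask[z0][x0] or labels[z0][x0]:
                continue
            count += 1
            labels[z0][x0] = count
            queue = deque([(z0, x0)])
            while queue:  # breadth-first flood fill over up/down/left/right
                z, x = queue.popleft()
                for dz, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    zz, xx = z + dz, x + dx
                    if 0 <= zz < nz and 0 <= xx < nx and mask[zz][xx] and not labels[zz][xx]:
                        labels[zz][xx] = count
                        queue.append((zz, xx))
    return labels, count


def is_convective(labels, obj_id, heights_km, base_km=2.0, top_km=6.0):
    """Assumed criterion: some profile is filled by the object from base to top."""
    layer = [z for z, h in enumerate(heights_km) if base_km <= h <= top_km]
    nx = len(labels[0])
    return bool(layer) and any(all(labels[z][x] == obj_id for z in layer) for x in range(nx))


def object_features(labels, obj_id, heights_km, dx_km, dz_km):
    """Cloud size (cross-section area), along-track scale, and cloud top."""
    pixels = [(z, x) for z, row in enumerate(labels) for x, v in enumerate(row) if v == obj_id]
    xs = [x for _, x in pixels]
    return {
        "size_km2": len(pixels) * dx_km * dz_km,
        "horizontal_scale_km": (max(xs) - min(xs) + 1) * dx_km,
        "cloud_top_km": max(heights_km[z] for z, _ in pixels),
    }
```

Because only the four edge-sharing neighbors are considered, two cloudy pixels that touch only at a corner end up in different objects, which is the practical difference between 4-way and 8-way connectivity.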
The cloud top represents how high cloud objects can develop. Whether the observed 0 dBZ and 10 dBZ echo heights are above 10 km represents the convective strength of cloud objects. If both features are "yes", the cloud objects have relatively strong convective strength. If only the 0 dBZ echo height is above 10 km, the convective strength is medium. If both features are "no", the cloud objects exhibit relatively weak convective strength.
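The three strength categories above amount to a small decision rule on the two binary echo-height features. The sketch below is illustrative only: the function names and the `(dBZ, height)` pixel layout are hypothetical, and the combination "10 dBZ above 10 km but 0 dBZ not" is returned as unclassified because the text does not cover it (it should not arise when both echo tops come from the same reflectivity field).

```python
def echo_top_km(pixels, threshold_dbz):
    """Highest altitude at which reflectivity reaches the threshold.

    pixels is a hypothetical list of (dbz, height_km) pairs for one object.
    Returns None if no pixel reaches the threshold.
    """
    tops = [h for dbz, h in pixels if dbz >= threshold_dbz]
    return max(tops) if tops else None


def convective_strength(echo0_above_10km, echo10_above_10km):
    """Map the two binary features to the strength categories in the text."""
    if echo0_above_10km and echo10_above_10km:
        return "strong"
    if echo0_above_10km:
        return "medium"
    if not echo10_above_10km:
        return "weak"
    return "unclassified"  # case not described in the text


def strength_from_pixels(pixels, level_km=10.0):
    """Derive both features from an object's pixels, then classify."""
    top0 = echo_top_km(pixels, 0.0)
    top10 = echo_top_km(pixels, 10.0)
    return convective_strength(
        top0 is not None and top0 > level_km,
        top10 is not None and top10 > level_km,
    )
```

An object with 10 dBZ echoes reaching 11 km would be labeled strong, one whose 10 dBZ echoes stop at 8 km while 0 dBZ echoes reach 12.5 km would be medium, and one with no echoes of either kind above 10 km would be weak.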

Agglomerative clustering analysis
Data volume reduction is one of the advantages of using convective cloud objects as clustering targets. There are a total of 44,000 swaths (each swath is an X-Z array of size 37000 by 125) in the original CloudSat observations over the study period. The data volume takes up to one terabyte for storage alone, not to mention the computational resources required if cluster analysis were performed on the raw data.
After the convective cloud object identification, 0.11 million convective cloud objects are identified over the tropical Asian-Australian monsoon areas, and the five selected features are used to represent these convective cloud objects. The data volume drops significantly to 17 megabytes, which is far more manageable for further analysis.
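The scale of this reduction can be checked with a few lines of arithmetic. The counts below come from the text; the 4 bytes per stored value is an assumption made only to show that the raw array is of terabyte order, and no attempt is made to reproduce the 17-megabyte figure, since the storage format of the feature table is not stated.

```python
# Counts taken from the text
n_swaths = 44_000
profiles_per_swath = 37_000
bins_per_profile = 125
n_objects = 110_000      # "0.11 million" convective cloud objects
n_features = 5

# Assumption for illustration only: each raw value stored as 4 bytes
bytes_per_value = 4

raw_values = n_swaths * profiles_per_swath * bins_per_profile
raw_terabytes = raw_values * bytes_per_value / 1e12
feature_values = n_objects * n_features
reduction_factor = raw_values / feature_values

print(f"raw values per field : {raw_values:,}")
print(f"raw size (assumed)   : {raw_terabytes:.3f} TB per field")
print(f"feature values       : {feature_values:,}")
print(f"reduction in values  : {reduction_factor:,.0f}x")
```

Under that assumption a single field occupies roughly 0.8 TB, consistent with the "up to one terabyte" quoted above, while the object-based representation holds about 370,000 times fewer values.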
The present study adopts the hierarchical agglomerative clustering (HAC; Ward, 1963) algorithm to discover regimes of convective clouds. HAC is a bottom-up clustering algorithm for unlabeled data, and the number of clusters need not be specified in advance, unlike in k-means clustering. The algorithm calculates similarities between objects' input features and proceeds as follows: (1) each object starts as its own cluster; (2) the two most similar clusters are merged and the similarities are recalculated; and (3) step 2 is repeated until all data are merged into one cluster. The clustering process can be visualized as a dendrogram, from which the optimal number of clusters can be inferred. In this study, we use the HAC implemented in the scikit-learn library (Pedregosa et al., 2011).
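The three steps above can be written out directly. The following is a deliberately naive pure-Python sketch for illustration, not the scikit-learn implementation used in the study: it assumes Ward's minimum-variance criterion as the similarity measure (suggested by the Ward, 1963 citation but not otherwise specified here), and the function names are hypothetical.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    ca = [sum(col) / na for col in zip(*a)]  # centroid of a
    cb = [sum(col) / nb for col in zip(*b)]  # centroid of b
    d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return na * nb / (na + nb) * d2


def agglomerate(points, n_clusters=1):
    """Merge clusters bottom-up until n_clusters remain.

    Returns (clusters, merge_costs); merge_costs gives the dendrogram heights.
    """
    clusters = [[tuple(p)] for p in points]  # step 1: one cluster per object
    merge_costs = []
    while len(clusters) > n_clusters:        # step 3: repeat step 2
        best = None
        for i in range(len(clusters)):       # step 2: find the most similar pair
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merge_costs.append(cost)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merge_costs


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
    groups, costs = agglomerate(demo, n_clusters=3)
    print(groups)
    print(costs)
```

The exhaustive pair search makes this sketch far too slow for 0.11 million objects; it is meant only to make the merge logic concrete. In scikit-learn, `AgglomerativeClustering(n_clusters=5, linkage="ward")` is the analogous call, although whether these exact settings, or any feature scaling beforehand, match the authors' configuration is not stated in the text.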

The Convective Cloud Regimes
The cluster analysis yields five distinct types of convective cloud regimes. The example selected for each regime is visualized in Fig. 1(b). We also examine whether the two highly organized and expansive regimes (CI and Coastal) are TCs. If the distance between the geolocation of the convective cloud object and the center of a TC from IBTrACS v4 within ±12 hours is less than 500 km, the object is defined as highly associated with TCs. Only 364 (7.4%) of the CI objects are related to TCs.

OLR and Extreme Precipitation Associated with Each Regime
As CloudSat only samples the cloud system along a narrow (~1 km) swath, here we examine the OLR and precipitation using observations collected within the 6° × 6° areas surrounding the convective cloud objects (Fig. 3). A similar statistical analysis of areal mean column water vapor (CWV) and SST has also been carried out using the ERA5 reanalysis data and OISST data, respectively (figure not shown). Most of the convective cloud objects occur in a highly moist environment with a warm surface (CWV > 45 mm; SST > 27 °C), and both the intra-regime spread and inter-regime differences are small, except that some objects in the Weak regime appear in environments with CWV < 45 mm.
manuscript submitted to Geophysical Research Letters

Discussions
The objective classification method of the present study is similar to those of Rossow et al. (2005), Luo et al. (2017), and Jin et al. (2020) (e.g., k-means clustering), but uses different "features" to delineate the clusters/regimes from the satellite cloud products. These previous studies classified the joint histograms of cloud optical depth, cloud top height, or radar reflectivity, which are detailed statistics of the pixel properties collected over specific domain/grid areas to represent weather conditions or cloud composition, rather than individual convective systems. Our study identifies the regimes by the morphological and physical characteristics of the connected cloud objects.
These features draw on the key empirical properties applied in the literature to identify mesoscale convective systems over the tropics and subtropics (e.g., Yuan and Houze, 2010; Houze Jr., 2004; Houze Jr. et al., 2015; Hamada et al., 2014), but the data-driven approach removes the need to pre-define thresholds to subjectively select systems of a certain scale or convective intensity. Overall, the CI objects usually have highly continuous cloud structures and may contain several strong convective cores. The cross-sectional appearance of the Coastal objects is similar to that of the CI objects but with smaller horizontal scales. Objects of the Weak regime are usually small and low, indicating that they are not well developed or are in the early stage of the life cycle, while the Medium and Strong regimes are more like mature convective systems with an anvil. Both CI and Coastal regimes occur more frequently over coastal and ocean areas than over the land/islands, while the opposite is true for the three smaller regimes.
The relationship between the multi-scale convective systems and the large-scale circulation over the Asian monsoon region is showcased by the seasonal evolution of the CI and Coastal regimes (Figs. 4(a) and (b)). The CI regime exhibits the sharpest seasonal transition among all regimes. During the boreal winter, the convective cloud objects mostly occur in the southern hemisphere, to the south of 5°N. After the precipitation sharply increases over the northern hemisphere around mid-May, corresponding to the onset of the Asian summer monsoon (the enhancement of the low-level southwesterly winds over the northern hemisphere), the hotspots of convective cloud objects shift to the northern hemisphere, and they shift back to lower latitudes after September. The CI regime is more active over the northern hemisphere in boreal summer than in any other season and is well centered over the latitudes where the averaged precipitation is above 9 mm day⁻¹, indicating that the CI cloud objects are major contributors to the intense summer monsoon precipitation. The hotspots of the CI cloud objects in Fig. 2(a), combined with the high occurrences over the latitudes where the southwesterly wind speed at 850 hPa exceeds 5 m s⁻¹, indicate that the development of CI cloud objects is closely related to the interaction between the prevailing wind and the topography. Moreover, the Coastal regime occurs more frequently around the edge of the 9 mm day⁻¹ contour (Fig. 4(b)) and shows no significant increase in occurrence over the latitudes where the low-level southwesterlies are enhanced. Compared to the CI regime, the Coastal objects also exhibit prominent seasonal variability but are more prominent during the transitional periods (e.g., Mar.-May and Sep.-Nov.). The occurrence hotspots of the CI and Coastal regimes coincide with regions of maximum OLR and meridional wind variability in boreal summer (Hoskins et al., 2019). We hypothesize that the diabatic heating of these highly organized systems can contribute significantly to the overturning circulation of the local or even the zonally averaged Hadley cell. The other three regimes show less seasonal variation; only the Medium and Strong regimes (Figs. 4(d) and (e)) are somewhat enhanced during boreal winter in the low latitudes, likely associated with the more active Madden-Julian Oscillation and other tropical waves during this period.
Recent studies have found that the cloud vertical structure and the associated diabatic heating are tightly connected to the modulation of the large-scale environment during the transition of intraseasonal variability (e.g., Riley et al., 2011; Del Genio et al., 2012; Del Genio & Chen, 2015; Ciesielski et al., 2017; Hung et al., 2020) and monsoon onset (e.g., Chen et al., 2019a, 2019b). In the future, other satellite products from the A-Train, the Global Precipitation Measurement mission, or the geostationary Himawari-8/9 satellites can be collocated to provide additional properties of the convective systems. The object-based radiative and latent heating effects associated with the various convective cloud regimes can be investigated to understand the interaction between the organization of convective systems and the large-scale circulation transition.

Conclusions
In this study, the convective cloud systems over the Asian-Australian monsoon region were identified using an object-based machine learning classification from the multi-year CloudSat observations. The 4-way connected vertical continuity method is applied to obtain convective cloud objects and their object-based properties from the CloudSat cloud mask retrievals.
Jakob and Tselioudis (2003) and Rossow et al. (2005) classified the joint histograms of cloud top pressure and cloud optical depth from the International Satellite Cloud Climatology Project (ISCCP) using the k-means clustering algorithm, and six tropical weather states (WSs) were identified. Three of these are "convectively active" WSs associated with a specific composition of cloud mixture over the tropics and the Asian monsoon region, each with distinct regional or land-ocean characteristics of occurrence. However, the resolution of the ISCCP data (2.5°-by-2.5°) limits more detailed analysis of the structures of convective cloud systems. Luo et al. (2017) also applied the k-means clustering algorithm to multiple satellite radar-lidar products over the tropics. The H-dBZ joint histograms, which statistically synthesize the cloud vertical development and hydrometeor distribution from profiles over a certain domain, are classified into four cloud regimes; two of them are convection-related: one represents the mature MCSs and occurs relatively frequently over the MC and tropical ocean, while the other represents the dissipating MCSs over the western MC, South China Sea (SCS), Indochina, and the open ocean. Jin et al. (2020) applied the k-means clustering algorithm to the joint histograms of cloud top pressure and cloud optical depth from Moderate Resolution Imaging Spectroradiometer (MODIS) gridded data.
Section 3 presents the clustering results and the statistics of each cloud regime, while Sections 4 and 5 provide the discussion and conclusion, respectively.
CloudSat cloud mask data over the low-latitude regions of the Asian-Australian monsoon areas (25°S-25°N, 70°E-150°E) during the years 2006-2015 are analyzed here, covering the main areas of the Indian, East Asian, and Australian monsoons, as well as the MC. CloudSat is a National Aeronautics and Space Administration (NASA) A-Train polar-orbiting satellite launched on 28 April 2006.
Fig. 1(b) also shows the statistics of the five key features considered in the cluster analysis. Fig. 2 shows the spatial distribution of the occurrence of each regime from 2006 to 2015, binned on 2°-by-2° latitude-longitude grids based on the geolocation of the objects' centroids. The regimes are named according to their dominant spatial distribution and/or intensity. The first regime (Coastal Intense; CI hereafter; n = 4901) is convectively highly organized, exhibiting the most expansive horizontal extent among all regimes (averaged scale = 1190 km) and deep vertical development (averaged cloud top = 15.7 km). Among the objects classified as CI, the probability of the 0 dBZ (10 dBZ) echo height being above 10 km is 99% (64%), indicating relatively strong convective strength. The CI objects frequently occur over coastal areas between 0° and 25°N (Fig. 2(a)), concentrating over the west coast of Indochina, Sumatra, Borneo, and the Philippines, as well as the ocean areas of the Western Pacific. The second regime (Coastal; n = 5309) is also very organized, with an averaged horizontal scale of 641 km and a deep cloud top averaging 15.7 km. The Coastal regime is smaller horizontally than the CI regime but has similar cloud top development and even stronger convective strength, as all Coastal objects have 10 dBZ echo heights above 10 km. Coastal objects occur frequently over the ocean and coastal regions of the deep tropics (10°S-10°N), as shown in Fig. 2(b).
Likewise, only 143 (2.7%) of the Coastal objects are related to TCs. The Weak (n = 65529), Medium (n = 18816), and Strong (n = 18827) regimes are smaller in size and are differentiated mainly by their convective strength. Figures 2(c) to (e) show that these three regimes have a similar spatial distribution, mainly over the Maritime Continent; the hotspots over the large islands become more prominent with increasing convective strength. The Weak regime exhibits the smallest size and lowest cloud top overall. Nearly 85% of the objects in this regime exhibit a cloud top lower than 10 km, with an averaged horizontal scale of 70 km. The remaining 15% of the objects have an averaged scale of 203 km with anvil structures, indicating they are in the dissipating stage. The Medium regime is wider and deeper than the Weak regime (averaged horizontal scale of 243 km and averaged cloud top of 13.7 km), while the Strong regime has a similar horizontal extent (181 km) but is deeper in the vertical (14.7 km). Note that the horizontal scales of most objects in the CI, Coastal, Medium, and Strong regimes are above 150 km, and our classification results show that these tropical MCSs further emerge from the data as distinct regimes by their physical properties. In general, the physical characteristics and spatial distribution of our CI and Coastal regimes fall within the tropical weather regimes or convective cloud regimes reported in previous studies that feature large mesoscale systems concentrated over coastal areas with prominent seasonal variation, such as WS1 in Rossow et al. (2005), parts of Cluster 1 and Cluster 2 in Luo et al. (2017), the large connected and some of the large separated MCSs in Yuan and Houze (2010), and the large convective aggregates in Jin et al. (2020). Our data-driven approach further distinguishes these coastal MCSs into two categories (CI and Coastal) based on the physical characteristics of the cloud objects.
Fig. 3(a) compares the boxplots of the areal mean OLR of the five regimes. Over 80% of the objects in the CI regime exhibit areal mean OLR below 200 W m⁻², indicating they are well-developed systems with very high cloud tops. Some objects in the CI regime are associated with relatively higher mean OLR (above 200 W m⁻²); considering their wide horizontal scale, these may be related to mid-latitude frontal systems penetrating to lower latitudes. Only 65% of the Coastal objects have areal mean OLR below 200 W m⁻². Considering that the Coastal objects exhibit similar convective strength and cloud top height to the CI objects, the difference in their areal mean OLR may be due to the smaller horizontal scale, and thus the smaller coverage of high cloud tops, of the Coastal objects. The linkage between the convective cloud regimes and precipitation extremes is demonstrated in Fig. 3(b), which presents the probability distribution of maximum precipitation within the 6° × 6° areas surrounding the convective cloud objects from TRMM 3B42 (see the figure caption for details of how the statistics are derived). The CI and Coastal regimes have a higher probability of maximum rainfall > 12 mm hr⁻¹, with spectral peaks around 10-13 mm hr⁻¹ and a 99th percentile of 39-40 mm hr⁻¹. The CI objects show the highest probability of occurrence of maximum precipitation > 24 mm hr⁻¹ among all five regimes, indicating that these highly organized, well-developed convective systems are more likely to be associated with extreme rainfall. For the Weak, Medium, and Strong regimes, as the horizontal scales of these objects are smaller, the mean OLR over the 6° × 6° area can include both cloudy-sky and clear-sky areas. The Weak regime exhibits the highest variability in OLR, and over 70% of the Weak objects have mean OLR above 200 W m⁻², corresponding to the lack of anvil coverage. The objects in the Medium and Strong regimes exhibit lower OLR values, consistent with their larger horizontal scale and higher cloud top than those in the Weak regime. The intensity spectra of the Weak, Medium, and Strong regimes are similar, with peaks around 4-7 mm hr⁻¹ and a 99th percentile of maximum precipitation around 30 mm hr⁻¹.
Five distinct convective cloud regimes were classified by the agglomerative clustering algorithm, based on properties associated with cloud shape and convective strength. Two regimes (CI and Coastal) are highly organized convective systems over the coastal regions, and less than 8% of them are associated with Tropical Cyclones. The other three regimes (Weak, Medium, and Strong) are less organized systems but more numerous. They share a similar spatial distribution, mainly over the major islands. The CI regime exhibits the most expansive horizontal scale, the lowest areal mean OLR, and the highest probability of extreme rainfall. The Coastal regime, on the other hand, shows stronger convective strength than the CI regime but with a smaller horizontal size, higher OLR, and less extreme rainfall. The occurrence of the CI and Coastal regimes closely follows the onset of the monsoon low-level circulation and precipitation. The Coastal regime appears more often during the transition periods, while the CI regime is most active during the peak precipitation period of the boreal summer monsoon. Overall, the CI and Coastal regimes are characterized by very extensive stratiform areas, multiple convective cores, and strong cloud radiative effects on the regional OLR, in contrast to the other three regimes, which are less organized and have smaller radiative impacts. The seasonal occurrence of these regimes can contribute to the seasonal variation of the contrast in OLR between land and coastal oceans in the Asian-Australian monsoon region.

Figure 1. (a) Example of a convective cloud object identified from CloudSat (observed on Aug. 28, 2008 over the west coast of Luzon Island). Colored shading is the radar reflectivity (dBZ). The size (cross-section area) of this object is 8700 km², while its horizontal scale is 830 km along the track. The cloud top height is 15.8 km. Both 0 dBZ and 10 dBZ reflectivity occurred above 10 km altitude. (b) Examples (top row; color shading is radar reflectivity) and statistics of the key features (bottom row) of all convective cloud objects classified into the (1) Coastal Intense, (2) Coastal, (3) Weak, (4) Medium, and (5) Strong regimes, respectively. The snapshot examples are taken from Jul. 15, 2006 over the east coast of India, May 15, 2007 over the south coast of Vietnam, Jan. 10, 2008 over the Indian Ocean, Jun. 2, 2007 over the Bay of Bengal, and Jun. 27, 2007 over the Sulu Sea, respectively.