Global airborne bacterial community—interactions with Earth’s microbiomes and anthropogenic activities

Significance Understanding the interactions of planetary microbiomes and their ecological and health consequences requires in-depth knowledge of bacterial communities in the atmosphere, which is the most untouched microbial habitat on the Earth. By establishing a comprehensive atlas of global airborne bacteria, we found that half of the airborne bacteria originate from surrounding environments and are mainly influenced by local meteorological and air quality conditions. One feature of the airborne bacteria in urban areas is that an increasing proportion consists of potential pathogens from human-related sources. The present study defines the aerial microbial world and its origins in a changing climate, and contributes to assessments of the health impact in atmospheric environments.


S1.1 Acquisition of Environmental Data
In this study, all environmental variables were divided into three major groups: meteorological condition, air quality, and earth surface type. The global hourly meteorological data in each site during the sampling process, containing air temperature, air pressure, relative humidity, wind speed, and wind direction; and the corresponding air quality data including AQI, PM10, PM2.5, SO2, NO2, O3, and CO, were obtained from diverse national public websites and the environmental monitoring centers of local governments. The global hourly meteorological data in each site during the sampling process were downloaded from the official website of the National Climatic Data Center (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/) and Weather Underground (https://www.wunderground.com/). Since there is still no globally uniform air quality monitoring system like that of NOAA (the National Oceanic and Atmospheric Administration) for air pollutant records, we compiled air quality data obtained from the environmental monitoring centers of the respective national governments, including the United States Environmental Protection Agency The quantification of types of land cover in a diameter range of the sampling sites (50 km) was performed with a MODIS (Moderate-resolution imaging spectroradiometer) land cover approach (5′ × 5′ resolution). Using the IGBP (International Geosphere-Biosphere Programme) system (MCD12Q1-1), the different MODIS land coverages were described (1). The relative contribution of each type of landscape to the aerial emission of bacterial cells was predicted by weighting these relative surfaces by their associated bacterial cell concentration, as reported earlier (2).

S1.2 Real-Time qPCR Quantification of Targeted Genes
The total bacterial loading was approximated by the concentration of 16S rRNA gene copies in the air. The 16S rRNA gene was amplified on a StepOnePlus Real-Time qPCR System (Applied Biosystems) with the following primer sequences: 5'-TCCTACGGGAGGCAGCAGT-3' as the forward primer and 5'-GGACTACCAGGGTATCTAATCCTGTT-3' as the reverse primer. To quantify the absolute number of copies of 16S rRNA genes in DNA samples, a seven-point standard curve (including a blank standard) in a 10-fold serial dilution was run with samples. All of the samples, standards, and blanks were run in triplicate with a 90% -110% application efficiency. Finally, the specificity of the amplicons was verified by a melt curve analysis. The 20-μL qPCR reaction mixture contained 10 µL of Power SYBRTM 5 Green PCR Master Mix (Life Technologies, CA, USA), 1 µL of template DNA, 0.5 μL of each primer (100 nM), and RNAse-free water to complete the final 20 µL volume. The 16S rRNA gene was amplified according to the following protocol: an initial step at 95 °C for 10 min for enzyme activation, then 40 cycles of 10 s at 95°C, and 1 min at 60°C for hybridizations and elongations. The amplicon length was around 400-500 bp. In addition, to reduce the variations in the 16S rRNA gene copies caused by particle size, the mean ratios of bacterial loadings in PM2.5 with PM10 (1: 1.56±0.74) and with TSP (1: 7.44±3.86) collected in the same sites during the same sampling period were used to modify the data drawn from the literature (Fig. S10).

S1.3 Pathogen Identification Based on Metagenomes
The limited DNA content (~50 ng) in each collected ambient air sample was used to construct a low-input library, followed by shotgun sequencing on an Illumina Hiseq X Ten platform with a 150bp paired-end read length. The clean data were uploaded to the NCBI with the accession number PRJNA858396. The raw reads on metagenomic sequencing were processed to filter out low-quality S6 reads using fastp with default parameters (3). The filtered clean metagenomic reads were taxonomically profiled using Kraken 2 (v2.0.8-beta) (4) and Bracken (v2.5.0) (5), using the standard Kraken 2 database. Human pathogens, especially the nosocomial ESKAPE pathogens that exhibit multidrug resistance and virulence (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp.), were identified on the species level to verify the results of the identification of pathogens in the 16SPIP pipeline.

S1.4 Chemical Analysis
After acid pretreatment for filter samples, the final solutions were analyzed by inductively coupled plasma -optical emission spectrometry (ICP-OES) to detect the total concentrations of various elements. Organic carbon (OC) and element carbon (EC) concentrations were analyzed by the DRI Thermal/Optical Carbon Analyzer (Model 2001) and calculated based on the result of thermal optical reflectance (TOR) protocol (6). Soluble anions (NO3and SO4 2-) and cations (NH4 + ) in filtrates were analyzed by Ion Chromatography.

S1.5.1 Core Bacteria Identification.
The determination of a global core set of airborne bacteria was based on their abundance and occupancy in regional and temporal variations with reference to multiple reported methods. First, 166 OTUs with a high mean relative abundance (> 0.01%) across 370 samples were selected as overall abundant OTUs (7). Then, 68 OTUs among the overall abundant OTUs with an occurrence frequency in all samples of more than 80% were filtered out as widespread OTUs (8). Finally, the dominant OTUs in more than half of the samples were further filtered out and identified as the final core airborne bacterial community. In sum, the process was as follows: OTUs were sorted based on their abundance in each sample, and the accumulated sum of the relative abundance was calculated one by one; those OTUs that made up the top 80% of the bacterial community in each sample, as well as those that occurred in more than half of the samples, were chosen as the final core bacteria.

S1.5.2 Diversity Analyses and Correlations with Environments.
To compare the structure of the bacterial community across different regions, α-diversity and β-diversity were computed using the "Picante" package (9) in R based on the original OTU table directly generated by sequence processing. In addition, to minimize deviations in the number of bacterial taxa (i.e., richness) caused by different particle sizes, we used the mean ratio of richness in PM2.5 to PM10 (1: 1.73 ±0.63) and TSP (1: 2.14±0.84) collected in the same sites during the same sampling period to modify the richness data in the literature (Fig. S10). The Bray-Curtis dissimilarity matrix for the airborne bacterial community structure (OTU abundance-based) between pairs of samples was calculated to estimate the taxonomic β-diversity using the "vegdis" function in the "vegan" R package (10).
Another R package, "geosphere", was applied to calculate the geographic distance between any two sampling sites across the globe, based on geographic coordinates (11). For each environmental variable, a partial Mantel test with 999 permutations was performed to examine the correlation (Pearson's rank correlation) between the environmental variable matrix and the composition of the bacterial community using the "vegan" R package (10).
Also, we used the Akaike information criterion (AIC) to select the best model to fit the relationship between two variables. Given a collection of models for the data, AIC can be used to estimate the quality of each model, relative to each of the other models. AIC has been widely used for statistical inference and is gradually forming the basis of a paradigm for the foundations of statistics.

S1.5.3 Network Construction, Topological Property Calculation, and Keystone Species
Identification. To filter out rare OTUs, those OTUs with a mean relative abundance of over 0.1% in all samples and a relative abundance of over 0.5% in any one sample were retained for network S7 construction. The co-occurrence patterns were based on the Spearman correlation matrix performed with the WGCNA package (12). The final network was constructed with statistically significant relationships (p < 0.01, R > 0.6) (13). The nodes in this network represent OTUs, whereas the edges (that is, connections) correspond to robust and significant associations between OTUs.
Topological properties were respectively calculated for each node in the resulting network with the "igraph" R package (14), and those OTUs with high degree (> 72, 75% of the highest degree) and low betweenness centrality scores (< 2000, 10% of the highest betweenness centrality) were recognized as the keystone taxa (15). Included in the set of topological properties for each node were degree, betweenness centrality, closeness centrality, and transitivity. Betweenness centrality reveals the role of a node as a bridge between components of a network, while degree reveals the role of a node with direct connections with other OTUs in the whole community. Thus, the two important indexes were normally selected as critical criteria for the identification of keystone species in the overall co-occurrence network (16).
Except for each node, the topological properties were also calculated in the whole co-occurrence network to further understand the complex web of interconnected bacteria (15). These topological indexes contained the number of nodes, the number of edges, the average degree, the average shortest path length, the average connectivity, and the average clustering coefficient (Table S3).
The "Smallworldness" Index, another important index that represents the correlation compactness of individuals in a network, was computed based on the transitivity (any pairs of nodes with direct or indirect connections could be transitive) and the average shortest path length (the average number of steps along the shortest paths for all possible pairs of network nodes) in the "qgraph" R package. A network can be recognized as "small-world" if its "smallworldness" is higher than one (a stricter rule is "smallworldness" ≥ 3) (17). The judgment criteria of "small-world" network is to determine if it has a transitivity that is substantially higher than that of comparable random networks and an average shortest path length that is similar to or higher (but not many times higher) than that computed on random networks (18). Edge weights, signs, and directions are ignored in the computation of the indices.

S1.5.4 Estimation of the Richness of Global Microbiomes.
Scaling laws describe the functional relationship between two physical quantities, i.e., the total abundance (NT) and the abundance of the most abundant species (Nmax), that scale with each other over a significant interval, underpin unifying theories of biodiversity, and are among the most predictively powerful relationships in biology. We used scaling laws to predict global airborne bacterial richness (S) based on the lognormal species abundance model (19). S could be predicted in terms of NT, Nmax, and the assumption that the rarest species is a singleton (Nmin = 1), and by using the following equation: where a could be numerically solved by the following:

S8
The total airborne bacterial abundance was estimated using the qPCR quantification results of 16S rRNA gene copies from this study and published data. Thus, the NT (global airborne bacterial abundance in the troposphere) is about 2.69 × 10 25 . Details of the calculation process are given below. In addition, Nmax was inferred based on the proportion of the typically most abundant genus or using the dominance-abundance scaling law: The most dominant taxonomic unit (based on a 97% 16S rRNA sequence similarity) in the troposphere was typically predicted to be a member of the Bacillus genus and accounted for around 1.2% of 16S rRNA gene reads in the whole dataset. Nmax would then be approximately 3.2 × 10 23 , which was close to the prediction by the scaling law, Nmax = 1.78 × 10 23 . The same method was also applied to microbial communities in global soil, global freshwater, and global leaf surfaces.
The approximate values of NT and Nmax for global microbiomes were calculated as follows: (1) Troposphere. The exact upper boundary of the atmosphere (biosphere) was a Gordian knot for estimating the total number of airborne bacteria in the whole biosphere. The traditionally cited highest altitude for aerobiology is 77 km, due to the detection of microbes on the surface of one rocket; however, many researchers doubted that the microbes came from the rocket itself (in that study there was no detailed description of sterilized operations or of the steps taken to prevent contamination) or from the flying soil caused by the rocket making landfall (20). Nevertheless, it is certain that the troposphere contains approximately 80% of the total mass of the atmosphere; also, the temperature was -56 o C and the humidity was nearly zero at the frontier between troposphere and stratosphere, where microbiomes can hardly survive (21). In addition, human activities mainly proceeded in the troposphere. As a result, our study focused on airborne bacterial communities in the troposphere, beginning at the land surface and extending to between 17 km at the equator and 7 km at the poles, with a mean altitude of 12 km (21). First, we assumed that the earth was a sphere with a radius of 6,371 km, and overlooked surface effects such as mountains and valleys. Second, we divided the troposphere into three circles based on the elevation: 0-1 km, 1-8 km, and 8-12 km According to the qPCR quantification results of the 16S rRNA gene copy from this study and published data, we predicted the mean bacterial density in the three circles: 4.8×10 5 cells/m 3 (0-1 km); 6.8×10 5 cells/m 3 (1-8 km); and 5.1×10 3 cells/m 3 (8-12 km) based on the measurements in this study and published studies (locations of these air samples can be found in Fig. S11). We assumed that the air was kept in the same conditions within cycles, ignoring the intra-circle variations in temperature and air pressure. The total bacterial loading in each circle was calculated by multiplying the total air volume and the mean bacterial density in the corresponding circle. Finally, we added them up to denote the total abundance of airborne bacteria in the troposphere. As a result, the NT (global airborne bacterial abundance in the troposphere) is about 2.69×10 25 , and Nmax would be inferred to be 3.2×10 23 or 1.78×10 23 respectively according to the proportion of the most dominant genus and the scaling law (19).
(2) Global soil. The most dominant genus-level candidate (based on a 97% 16S rRNA sequence similarity) in global soil is the Mycobacterium, with an estimated proportion of 0.61% in each sample on average (22). As for the total number of bacteria cells in the global soil, we used data from the literature indicating that there are about 9.4×10 28 microbial cells in global soil ecosystems (23). The detailed steps in determining the total number of bacterial cells in the global soil were as follows: First, the global soil was categorized into 12 classes according to types of ecosystems, and the total area on the Earth's surface was calculated. Then, mean bacterial densities with different depths (0-1 m and 1-8 m) were calculated based on as many measurements made around the world as possible. Next, boreal forest and tundra and alpine soils were assumed to be 1 m deep, but other classes of soil were 8 m deep. The soil volume could be calculated by multiplying surface areas and depths. The total number of bacterial cells in the global soil was estimated by the sum of the results from 12 classes of soil. The Nmax of the global soil would be approximately 5.7×10 26 , which is close to the estimated value of Nmax using the dominance scaling law (19), 3.5×10 26 .

S9
(3) Global freshwater. We used the value 4.7×10 25 as the number of microbial cells in global freshwater, including both rivers and lakes (23,24). In addition, the most abundant taxonomic unit (based on a 97% sequence similarity in 16S rRNA reads) in global freshwater is typically a member of the Pseudomonas genus, and accounted for around 1.57% of the 16S rRNA gene reads in a sample based on the EMP database (25); the Nmax of the global freshwater would then be 7.4×10 23 or be estimated by the scaling law (19) as 3.0×10 23 .

S1.5.5 Interactions of the Bacterial Community Composition of Air with Other Bacterial
Habitats. The bacterial abundance table used in this study, containing 5,166 global samples from multiple habitats, was obtained from the EMP database (25). We also followed the unified standard workflow employed by the EMP to analyze our airborne bacterial sequence data based on a closed reference against Greengenes 13.8 in Qiime2 (26). The processed airborne bacterial OTU table was merged with the EMP OTU table, including samples collected from soil, rhizosphere, freshwater, ocean, air, human, and animal-associated habitats. The non-metric multidimensional scaling (NMDS) analysis was first performed based on the Bray-Curtis dissimilarity matrix for a comparison of microbial community compositions across habitats. In addition, the derived OTU table was used as the input file to estimate the proportion of each airborne bacterial sample attributable to various habitats on the genus level by using "SourceTracker" (27). To explore the patterns of interconnection and the coexistence of bacterial communities across various habitats at the global scale, the Earth's metacommunity co-occurrence network consisting of a communal catalog was also constructed with robust correlations (ρ > 0.7, p-value < 0.01).

S1.5.6 Quantifying the Relative Importance of the Microbial Community Assembly Process.
A phylogenetic-bin-based null model analysis (iCAMP) was used to quantify the microbial community assembly process (28), which included three major steps: phylogenetic binning, conducting a null model analysis within each bin, and integrating the results of different bins to assess the relative importance of each process. Based on iCAMP, we were able to obtain quantitative results on community assembly processes containing heterogeneous selection (HeS), homogeneous selection (HoS), dispersal limitation (DL), homogenizing dispersal (HD), and drift (DR) from the statistical perspective.

S1.5.7 Multivariate Analysis.
PCoA was used to visualize spatial and temporal differences in the keystone and core bacterial communities between the samples, based on the "Euclidean" index, and to evaluate the impacts of some environmental variables on these communities. We also used redundancy analysis (RDA) to identify the relationships between keystone or core bacterial genera and soluble ions or heavy metals in PM2.5. This was performed by the "vegan" R package.
To quantify the relative contributions of the environment effect (separately or jointly) versus the distance effect on β-diversity, a variance partitioning analysis (VPA) was performed based on the RDA algorithm. We first selected a subset of explanatory variables from the set of all variables for constrained ordination, the goal of which was to reduce the number of explanatory variables entering the analysis while keeping the variance explained by them to the maximum. The standard method is stepwise selection, the combination of two approaches: forward selection (adding explanatory variables one by one) and backward selection (starting from the full model and deleting variables of which the least decreases the total explained variance). In addition, the initial set in our study included three explanatory groups: air quality (e.g., AQI, PM10, PM2.5, NO2, SO2, and CO), meteorological conditions (e.g., air temperature, air pressure, wind speed, wind direction, and relative humidity), and land cover type (e.g., water/sea, forest, shrubs, grassland, cropland, and built-up areas). After a forward selection of environmental factors, the variables that remained for a VPA analysis were: NO2, CO, PM10, PM2.5, wind speed, air temperature, air temperature, water/sea, cropland, and grassland (core bacteria); SO2, NO2, CO, O 3 , PM10, PM2.5, air pressure, air temperature, relative humidity, wind speed, wind direction, water/sea, forest, shrubs, cropland, grassland, and built-up areas (keystone bacteria).
Moreover, to further explore the direct and indirect relationships among geographic locations, environmental variables, and bacterial communities, the structural equation model (SEM) was built S10 with the "lavaan" package (29). The prior model was set using all hypothesized reasonable indirect and direct links among the variables in SEM based on their pairwise correlations. We subsequently removed non-significant relationships and variables or created new links between other terms, i.e., the post hoc model modification, until all quantitative indices met the overall goodness of fit. The composition of airborne keystone bacterial and core bacterial communities was indicated by their first principal coordinates (PCoA1). The SEM evaluation was based on the fit indices for the test of a non-significant chi-square test (p > 0.05), the root means square error of approximation (RMSEA) < 0.08, the standardized root means square residual (SRMR) < 0.05, the Tucker-Lewis index (TLI) > 0.90, and the comparative fit index (CIF) > 0.95.

S2.1 Small-world Characteristics in the Global Air, Soil, and Marine Bacterial Co-occurrence Network
In general ecology, real networks, including biological networks, have been proven to have the "small-world" property (30), which means that individuals are more connected to each other than in a random network. In this first attempt to construct an airborne bacterial network, we also computed the "smallworldness" index (17) by relying on the global transitivity of the network (31) and its average shortest path length. This indicated that the global airborne bacterial community network was not a "small-world" network ("smallworldness" index = 0.51 < 1) (Fig. 1E). In other words, the common "small-world" network in real ecosystems was not observed in our dataset. Conversely, the two other global bacterial datasets on topsoil and marine ecosystems both met the properties of a "small-world" network (soil "smallworldness" index = 5.82 > 3 for a stricter rule; marine "smallworldness" index = 1.21 > 1 for the general rule). The "smallworldness" index of the bacterial community network showed a decreased gradient from soil, marine, to air habitats, which was consistent with the variations in other topological properties (Table S3), such as the average shortest path length (3.03 < 3.97 < 5.24), diameter (9 < 10 <15), and clustering coefficient (0.48 < 0.58 < 0.67). Furthermore, the higher "smallworldness" index and shorter average shortest path length indicated a closer relationship between OTUs. This could be interpreted as increasing the speed of the response of the network to perturbations, finally leading to a more stable community network structure. All of the above results indicated that the airborne bacterial community network was not as stable as the soil and marine ecosystems and could be more easily affected by external factors.

S2.2.1 Differentia of Taxonomic Composition in Whole, Core, and Keystone Bacterial
Communities on a Global Scale. The structure of the airborne bacterial community exhibited great dissimilarity from those of other ecosystems (Fig. 3B); in essence, a core set of 24 bacteria and 19 keystone species was exclusively determined to exist in the unique and huge airborne bacterial communities (Tables S1 and S4). In addition, OTU22, OTU94, and OTU159 overlapped both core bacteria and keystone species (Fig. S4), showing the great importance of whole communities due to hyper dominance and tight connections with other members. These could be recognized as the top three key species. What they have in common is that they are all grampositive bacteria and belong to the same phylum, Actinobacteria.
In addition, the differences in the community composition structures of keystone, core, and all OTUs were further compared. As one of the most widespread phyla in airborne bacterial communities (24.8% in whole communities), Firmicutes has been well documented as producing endospores that are resistant to extreme desiccation for surviving in extreme conditions. However, no members of Firmicutes were identified as keystone species because there were few interactions with communities. By contrast, Actinobacteria did not exhibit overwhelming superiority with regard to abundance (18.1% in whole communities and 19.6% in the core set); yet they were closely related to the whole airborne bacterial communities and even exceeded Proteobacteria, as embodied in the finding that Actinobacteria made up a great share (72.2%) of keystone communities ( Fig. S4 and Table S4). In summary, the composition of the whole global airborne bacterial community was similar to that of the core set, while both displayed wide variations from keystone communities (Fig. S4).

S2.2.2 Keystone Taxa Associated with Evolutional and Ecological Functions.
According to the rich-gets-richer preferential attachment process of growth in a scale-free network (32), the highly connected nodes, namely, the keystone species, could acquire more links, contributing to the establishment of the whole microbial network, which means that keystone nodes are recognized as initial components in networks. Thus, in evolutionary terms, this suggests that keystone species emerged earlier than other species, and that their lineages might have a longer evolutionary history S12 in microbial co-occurrence networks (15). This has important implications for exploring the origins of microbes in the atmosphere and even in other ecosystems. For example, an important keystone taxon, Frankiales (OTU19 & OTU22 in Table S4), was assumed to have an adaptable ancestral bacterium that evolved to occupy many different ecological niches, including the root nodules of woody dicots, hot springs, rocky surfaces, gamma-irradiated substrates, activated sludge, compost, and soils (33), which could be assumed to be the crucial ancestor of many airborne bacteria. In addition, the close relationship of Frankiales with soluble ions in RDA also revealed its wide adaptation (Fig. S12).
The keystone concept was derived from food-web ecology (34), while the interspecies relationships, including mutualism, commensalism, parasitism, competition, and others, were mainly manifested as trophic relationships. Thus, the functions of most keystone species were related to nutrition and metabolism, which was also reflected in our study. For instance, Rhizobiales (OTU63 & OTU55 in Table S4) occupies a unique physiological function (biological nitrogenfixation) and has been recognized as key to the global cycling of nitrogen (35). Similarly, Burkholderiaceae (OTU250 in Table S4) can produce secondary metabolites and significantly affect microbial interactions in the network (36), and Gaiellales (OTU372 & OTU721 in Table S4) can utilize several organic compounds (37) and plays an important role in the whole process of the cycling of nutrition in microbial communities. The integrity of the overall community and its unaltered persistence through time, namely its stability, have been authenticated to be profoundly affected by the activities and abundance of these individual populations, which make up the keystone set (38).
Moreover, for insight into the functions of keystone species, the microbial keystone taxa in various ecosystems were also summarized from other studies (Table S5). The point worth emphasizing here is that diverse members of the Rhizobiales and Burkholderiales orders have been consistently recognized as keystone taxa in various ecosystems and habitats, from soil to aquatic systems and from equatorial to polar habitats (Table S5). Rhizobiales contains not only nitrogen-fixing bacteria (Rhizobium spp. and Bradyrhizobium spp.), but also methanotrophs (Methylobacterium) with high abundance in the phyllosphere (35). In addition, Burkholderiales includes several well-known pathogenic bacteria, such as Bordetella, Ralstonia, Oxalobacter, as well as Burkholderia, one of the most versatile and virtually ubiquitous terrestrial microbial groups (36). Although Rhizobiales and Burkholderiales have been well documented as playing an important role in keystone bacterial communities in the natural world, the computational identification results have shown that not all of their members could be considered keystone taxa. Plenty of its subordinate taxa in the two orders had no significant impacts on community composition or function. Their common keystone roles, which occur in diverse habitats, might also be caused by their large abundance and wide occupancy in natural environments. Despite that, the possibility is still high that members of Rhizobiales and Burkholderiales could become keystone taxa, and future studies should further evaluate their roles as potential keystone taxa in microbial functions and interactions.

S2.2.3 Airborne Bacterial Diversity Pattern in Mid-latitudinal Regions.
The global latitudinal diversity pattern in whole airborne bacterial communities showed a clear trend of richness peaking at intermediate latitudes and declining towards the equator and the two poles ( Fig. 2A). However, there were still some dots that deviated from the fitting line in the above latitudinal pattern, especially in the mid-latitudinal regions (35 o -45 o ). In order to explain the unknown large deviation, mid-latitudinal samples (n = 64) were selected for further study. We discovered that the bacterial richness strongly correlated with both PM10 concentration (R 2 = 0.549, p < 1 × 10 -10 ) and PM2.5 concentration (R 2 = 0.517, p < 1 × 10 -7 ) (Fig. S13). The richness was higher in moderately polluted air, but lower with good air quality and heavy pollution, which might be attributed to the fact that pollutants could also be considered a source of nutrients for the microbiome. In addition, populations are distributed unevenly across the globe, a situation that is much more obvious in midlatitudinal regions. In the more developed cities and economic circles, populations are much more dense, while they are sparsely populated in deserts, polar regions, and high mountains. For instance, Hong Kong is densely populated, with 6,544 people per square kilometer. Its population density is far above that of Mt. Ailao, which is at a similar latitude. However, there was no significant S13 difference in richness between the two sites. In conclusion, from a global perspective, airborne bacterial diversity follows a downward opening parabola-shaped latitudinal pattern and is hardly influenced by human distribution and the intensity of human activities.

S2.3 Impacts of Urbanization on Airborne Bacterial Genotypes
The structure (i.e., cell shape and cell management) of airborne bacteria changes a great deal due to urbanization, as can be seen from the finding that the percentages of Bacilli and of bacteria existing in clusters in urban areas were higher than those in terrestrial background and offshore areas (Fig. S14). Although air temperature is mainly driven by latitude, the relatively higher air temperature in urban areas than in terrestrial background and offshore areas in general caused there to be a correspondingly higher optimal temperature range for airborne bacteria in urban areas (Fig. S15a). For instance, the relative abundance of thermophilic bacteria was higher in the urban areas of Guangzhou (5.25%) than in a background area with a similar latitude, Mt. Ailao (2.39%). Nevertheless, tolerance of airborne bacteria to the environment did not change with urbanization; concretely, no difference was observed in the environmental resistance caused by sporulation and the surviving modes (symbiotic or free-living) of bacteria between urban and background air (Figs.  S15 b and c). Similarly, no great difference was found in the ratio of gram-positive to gramnegative airborne bacteria (and pathogens) in urban, terrestrial background, and offshore areas, further illustrating that urbanization did not change the likelihood of human illness related to Gram-and/or Gram+ pathogens (Fig. S16). The motility of airborne bacteria in urban areas was higher than in natural areas (terrestrial background and offshore areas), aligning with the ratio of bacteria carrying flagella. This might enlarge the spread of airborne pathogens and further increase the likelihood of pathogenic infections in cities (Fig. S17).

S2.4.1 Relationships between Environmental Factors and Airborne Bacterial Diversity and
Biomass. The variations in the diversity of global airborne bacteria with latitude were similar to that in the AQI index scores and concentrations of PM10, PM2.5, SO2, and CO, and were also related to other meteorological parameters such as air temperature, air pressure, wind direction, wind speed, and relative humidity (Fig. S5). The pairwise correlation analysis suggested that the relationships of richness with other geographic locations, namely altitude (R 2 = 0.019, p = 0.209) and distance to coast (R 2 = 0.019, p = 0.209), were either weak or uncorrelated. Besides, different pollutants showed different relationships with bacterial diversity, mainly classified as parabolic fitting curve associations (AQI, PM2.5, and PM10), positive correlations (SO2, and CO), and independent relationships (NO2, and O3). Among the above meteorological conditions, the best-predicted factor of bacterial richness was relative humidity (R 2 = 0.190, p < 10 -7 ).
The correlations between the total airborne bacteria biomass (the number of copies of 16S rRNA genes) and environmental factors were also analyzed (Fig. S7); however, three factors were significantly related to bacterial biomass: concentration of NO2 (R 2 = 0.109, p < 10 -5 ), wind speed (R 2 = 0.173, p < 10 -8 ), and wind direction (R 2 = 0.189, p < 10 -6 ). According to the parabolic fitting curve relationship, the biomass was much higher with a southerly wind blowing than with winds blowing from other directions. At this point, we originally hypothesized that warm temperature and low latitude were hospitable to bacterial survival and diversity, because most samples were collected in the northern hemisphere, and southerly winds could increase the local air temperature to a certain extent. However, this hypothesis conflicted with another discovery that there were no relationships between biomass and either latitude or air temperature, so the hypothesis was soon disavowed. As a result, there must be some unknown mechanisms of wind direction driving airborne bacterial biomass, which might be related to atmospheric circulation or other geographic effects on microbial communities.

S2.4.2 Environmental Filtering Impacting Each OTU in Whole Global Airborne Bacterial
Communities. To further understand the impacts of environmental factors on whole airborne bacterial communities, we examined one by one the correlations between 11 typical environmental S14 variables with a total of 10,897 OTUs and drew up a summary. More than half of the OTUs (57.7%) showed no connections with the environmental variables. Also, among the few existing relationships, the degree of correlation of the majority (97.4%) was not high (absolute Spearman's ρ < 0.5), indicating that the impacts of environmental filtering on the abundance of each specific OTU were weak (Fig. S18a). In addition, the total number of significant correlations of each OTU with air pollutants (n = 6,440) was much larger than with meteorological conditions (n = 2,284), particularly for CO, PM10, and PM2.5, which were respectively correlated with 14.4%, 12.9%, and 12.5% members of whole airborne bacterial communities (Fig. S18b). Moreover, most of the impacts of air pollutants on airborne bacteria (78.0% in keystone bacteria, 72.5% in core bacteria, and 69.0% in all OTUs) were positive (Fig. S18c), which indicated that it was easier for the structure of airborne bacterial communities to be positively influenced by moderate air pollution. Air pressure and relative humidity also contributed to shaping community structure and could directly affect 9.8% and 8.2% of bacteria, respectively. Other meteorological conditions, however, had almost no impacts on bacterial abundance.
Although environmental factors had less of an effect on the total biomass (Fig. S7), bacterial richness was associated with most air pollutants and climatic conditions (Fig. S5). This determined the patterns of diversity of airborne bacterial communities worldwide, although some of these environmental factors were inter-related (Fig. S6a). Abiotic factors, namely environmental variables, affected airborne bacterial richness in various regions but had less influence on the abundance of each specific OTU, with a low ratio (7.28%) of the number of existing significant relationships (p < 0.05) to the number of all possible connections between OTUs and environmental variables. Thus, there must be some unknown connections between the environmental variables and the overall community structure. Here, the significant effect frequencies of environmental factors on each core and keystone OTU were quantified by a multiple regression analysis, showing the ratios of 24.6% and 44.0%, respectively (Fig. S17c). This revealed that the strength of the impact of environmental filtering on these two crucial bacterial communities largely outweighed that of other normal OTUs. This suggested that keystone species and core bacteria were much more affected by the environmental filtering process and played a key role in shaping the composition of the whole community, which could be considered as one of the biotic factors shaping whole communities.

S2.4.3 Impacts of Environmental Variables on Keystone and Core Bacterial Communities.
To investigate the temporal and spatial variability of the two crucial airborne bacterial communities (24 core OTUs and 19 keystone species), we analyzed their correlations with other potential environmental factors by multiple regressions. The results showed that all keystone species and most of the core bacteria were significantly (p < 0.05) correlated with at least one out of the 11 environmental factors (Figs. S19 c and d). VPA showed that air quality affected the keystone bacterial communities most and a subset of air pollutants including SO2, NO2, CO, O3, PM10, and PM2.5 together explained 53.3% of the structural variations. This was substantially higher than the figures for meteorological condition (26.54%) and landscape coverage type (18.3%) (Fig. S19f). This result was consistent with the high frequency with which air pollutants were related to keystone bacterial communities (26.3%) and the comparatively low frequency with which they were related to meteorological conditions (17.7%) (Fig. S18). Regarding the core bacteria, the contribution of land coverage type (15.9%) was far lower than that of the other two groups (36.5% and 31.9%) (Fig. S19e). Notably, the three major groups had a significant impact on whole communities, explaining over 70% of the variation in both the core (70.6%) and keystone (80.1%) communities.
In addition, as well-known antimicrobial agents, heavy metals in PM2.5 were also explored for their effects on bacterial communities. However, the heavy metals in the air showed little effect on temporal or regional variations in the structure of keystone and core bacterial communities, as reflected by the similar distributions of all samples (RDA, Fig. S12). As for species differentia, keystone taxa were all negatively affected by heavy metals, while core bacteria had various responses to heavy metals. For example, around half of core OTUs were positively correlated with heavy metals, probably because these abundant core bacteria were adaptable to atmospheric environments, while metal-induced toxicity was negligible. The remaining core bacteria showed S15 weak or negative correlations with heavy metals. The mechanisms of the metals affecting airborne bacteria were complex, and further studies involving discussions of specific conditions and toxicity experiments of cells are needed. S16 Fig. S1. Airborne bacterial abundance distributions across the globe (n = 370). (a) Phylogenetic tree of dominant airborne OTUs. The center is a phylogenetic tree of bacteria abundant in the worldwide atmosphere. The middle ring corresponds to body sites at which the various taxa are abundant and color-coded at the phyla level. The airborne bacteria are dominated by four phyla: Proteobacteria (green), Firmicutes (yellow), Actinobacteria (red), and Bacteroidetes (blue). In the external middle ring, the most abundant bacteria (mean relative abundance > 1%) are indicated by purple rectangles, and the relatively inadequate bacteria (mean relative abundance < 1%) are indicated by yellow triangles. The heights of blue bars outside the circle correspond to the abundance of taxa at the body site of greatest prevalence. (b) Phyla-level (class-level for Proteobacteria) community compositions across the globe. Each column represents a phylum (subphylum). Each row represents a sample collected from various environments at different times, which are clustered based on the Bray-Curtis similarity of phylum-level compositions. The taxonomic annotation is labeled with text at the bottom. The relative abundance of each phylum in specific samples is depicted by the color of the box with reference to the legend (unit: %) at the upper right. S17 Fig. S2. Abundance-occupancy relationship (AOR): mean relative abundance (x-axis) and occupancy (y-axis) plot after combining the OTUs with the same annotation (n = 10,897). The mean relative abundances were estimated by averaging the relative abundances of each OTU in all samples; occupancy represents the number of samples in which the OTU was detected. The fitted model (sigmoid curve) was occupancy versus logarithm of abundance for each species, and the red solid line is the global fit to all species and samples. S18 Fig. S3. Comparison of networked bacterial communities in the global atmosphere, topsoil, and top marine layer. Co-occurrence network: The connection (edge) stands for a strong (Spearman's ρ > 0.6) and significant (P-value < 0.01) correlation. The nodes represent the combined OTUs with a unique annotation for the genus level in the datasets. The size of each node is proportional to the mean relative abundance across all samples. Nodes are colored by the phyla of the bacteria. Network topology: The degree and centrality (betweenness and closeness) of each node from the networks were measured. Degree represents the number of direct connections of a node with other OTUs in the whole community. Betweenness centrality reveals the role of a node as a bridge between components of a network. Closeness centrality is a measure of the average shortest distance from each node to each other node. Power-law degree distribution: The node-degree distribution shows a power-law behavior for an airborne bacterial community co-occurrence network in airborne bacterial communities (R 2 = 0.984, p < 0.001), marine layer bacterial communities (R 2 = 0.897, p < 0.001), and topsoil bacterial communities (R 2 = 0.937, p < 0.001), respectively.          The number of asterisks indicates the significance levels (two-sided) of the Spearman's rank correlation coefficients (*** p < 0.001, ** p < 0.01, * p < 0.05). Each row represents the correlation of a specific OTU and environmental variables in 370 biologically independent air samples, which are clustered based on Spearman's rank correlation coefficients. (e and f) VPA showing relative contributions of air quality (PM10, PM2.5, SO2, NO2, O3, and CO), meteorological condition (air temperature, air pressure, relative humidity, wind speed, and wind direction), and landscape coverage type (water/sea, urban, grassland, cropland, forest, and shrubs) to community variations in core bacteria and keystone species. The overlap represents the joint effect explained by two or three factor groups together, while the percentage number below each group name represents the variance explained by one group alone. "Unexplained" denotes a variance that could not be explained by any one of these three groups.