The gut microbiome of Baka forager-horticulturalists from Cameroon is optimized for wild plant foods

Summary
The human gut microbiome is losing biodiversity, due to the “microbiome modernization process” that occurs with urbanization. To track this process, we applied shotgun metagenomics to the gut microbiome of the Baka, a group of forager-horticulturalists from Cameroon, who combine hunting and gathering with growing a few crops and working for neighboring Bantu-speaking farmers. We analyzed the gut microbiome of individuals with different access to and use of wild plant and processed foods, to explore the variation of their gut microbiome along the cline from hunter-gatherer to agricultural subsistence patterns. We found that 26 species-level genome bins from our cohort were pivotal for the degradation of the wild plant food substrates. These microbes include Old Friend species and encode genes that are no longer present in industrialized gut microbiomes. Our results highlight the potential relevance of these genes to human biology and health, in relation to lifestyle.


INTRODUCTION
The human gut microbiome (GM) is capable of acquiring structural and functional configurations that reflect differences in modes of living.3–14 Groups with traditional, non-industrialized lifestyles are characterized by a GM ecosystem with significantly higher biodiversity, an extraordinarily complex glycome, and the presence of Prevotella, Succinivibrio, and Treponema, commonly referred to as “old friends” bacteria.15,16 In contrast, urban and industrialized groups more commonly display reduced ecosystem diversity, a complex resistome, and a considerable number and complexity of genes specifically related to the metabolism of xenobiotic compounds. The differences between these two ends of the microbiome spectrum may provide glimpses of a possible adaptive GM response at the holobiont level, where the GM complements the limited plasticity of our genomes, providing the necessary phenotypic plasticity to adapt to the various lifestyles.17 For example, the increased structural and functional diversity typical of the GM from communities practicing gathering, small-scale horticulture, and pastoralism is likely a response to the diverse and refractory plant polysaccharides.18 In contrast, the GM from industrialized urban societies is more specialized for the metabolism of simple sugars and is more able to adapt to or detoxify the xenobiotic substances that they regularly encounter.16 However, numerous studies have also indicated that the GM among increasingly industrialized populations undergoes several deleterious changes, including a reduction in diversity and an increase in functional specialization, that lead to reduced resilience, high risk of dysbiotic transitions, and increased burden of non-communicable diseases (e.g., 19). This raises the important concern of the “microbiome modernization process,”16 a progressive and maladaptive shrinkage of the phylogenetic and functional diversity that is occurring along with human urbanization and modernization.
By studying the diversity of the human GM globally and at the metapopulation level, we therefore gain insight into how these communities of bacteria, viruses, and fungi change in the human population, contributing to human health and our ability to succeed while engaging in a large diversity of lifeways or including, in some circumstances, possible maladaptive changes. Exploration of the GM profiles of groups outside urban industrialized settings is of particular importance, for two reasons. First, the full diversity of the human GM is still largely unexplored, with limited knowledge of its variation among rural and traditional populations, which may still represent an untapped source of probiotic functions lost in urbanized contexts. Second, sociopolitical pressures have meant that many groups practicing foraging, horticulture, and pastoralist lifestyles are increasingly adopting aspects of the urban industrial lifestyle, including consumption of mass-produced foods, regular use of antibiotics, and greater reliance on a smaller number and variety of food items.8,12 There is therefore an increased urgency to capture information about the unique GM configurations across as wide a spread of different lifeways as possible, in order to highlight and protect their strategic functional traits providing selected and important phenotypes. Furthermore, by exploring the GM configurations of groups who rely heavily on the consumption of diverse plants, we may highlight the importance of a more sustainable plant-based diet in industrialized urban populations, which would result not only in the recovery of the strains/genes necessary for the exploitation of complex plant polysaccharides but also in the concomitant gain of associated probiotic functions and/or metabolites, with important benefits in terms of human health.
In this context, we partnered with the Baka, a group of forager-horticulturalists from southeastern Cameroon, who combine hunting and gathering with growing a small number of crops and working for the neighboring Bantu-speaking farmers (the Nzime).20 Part of the food consumed by the Baka, particularly cassava (Manihot esculenta) and plantain (Musa paradisiaca), comes from agricultural fields,21 with the addition of only a few processed foods (i.e., stock cubes, tomato sauce, and, rarely, sardines). On the other hand, a wide variety of key nutrients come from game (meat) and forest foods, including edible wild plants and nuts.22 We characterized the GM of Baka who spent a large amount of their time in a forest camp (Baka forest group), Baka individuals who mostly live in a village along the logging road (Baka settled group), and Nzime farmers (Nzime village group). The Baka forest group consumed more wild plants and fewer processed foods than the settled Baka, whereas the Nzime village group relied more on processed and traded foods. By recruiting a cohort of individuals with different access to wild plant foods and processed foods, we were able to highlight the GM features associated with the specificities of the three rural lifestyles, with the Baka forest group relying the most on western African rainforest wild foods. Metagenomic profiles from this cohort were interpreted across subsistence strategies and integrated with available data from worldwide populations with varying degrees of traditional or industrialized lifestyles. We explored the variation of the GM along the transition from hunter-gatherer to agricultural communities at a finer functional resolution than previous efforts, from the microbiome network topology to genome-scale metabolic models, down to species-level genome bins. Our work uncovered specific adaptive gradients associated with the consumption of western African rainforest wild plant foods, at both taxonomic and functional scale. Finally, the results herein offer a complete description of reference genomes of microbes associated with wild plant food consumption and their potential relevance for human health.

Baka, Nzime, and lifestyle gradient
In this study, we asked for volunteers from three different communities, representing the two major ethnic groups in the region, the Baka and the Nzime. The Baka are Ubangian-speaking people who practice foraging and small-scale cultivation, whereas the Nzime are Bantu-speaking people who practice subsistence-level agriculture, including, to a reduced extent, animal husbandry. Both groups live in the same area, primarily in villages clustered along logging roads (Figure 1). The two groups regularly interact, with the Baka trading forest foods (plants and game) for agricultural crops grown by the Nzime. The Baka also engage in wage labor for the Nzime in their agricultural fields and for logging companies, and collect forest products for sale to traders along the logging roads. Small market stalls that sell canned goods, candy, alcohol, and other items are also present along the logging roads in the Nzime villages, and the Baka have some access to these resources.20 Although many Baka have homes and fields in one of the villages, some regularly live in forest camps that are located several hours' walk from the logging roads. These forest camps are sometimes used only seasonally (e.g., during the rainy seasons when forest nuts are collected), but some individuals choose to spend the majority of their time in the forest as well. In this study, we compared the GM profiles from 26 Baka participants, including 16 individuals from the forest camp (Baka forest) and 10 individuals from one of the settled villages (Baka settlement, see Figure 1). Additionally, 18 individuals were recruited from the neighboring Nzime (Nzime village).

Baka and western African rainforest wild plant foods
The Baka use a variety of wild plant foods in their meals: (1) dark green leaves (e.g., Gnetum africanum, Baka name: koko), which are rich in amino acids and are likely an important source of protein; (2) oil from nuts, particularly Irvingia spp. (bush mango, Baka name: payo, pekoe), Baillonella toxisperma (Baka name: màbè), and Panda oleosa (Baka name: kanà), which are rich in fat and used for cooking; and (3) spices such as Afrostyrax lepidophyllus (Baka name: ngimbà) and a variety of fruits, mushrooms, bark, and other plant species. Evidence from our previous research in this community21 indicated that Baka villagers settled closest to the market town consumed legumes, nuts, and seeds, but not specifically from the wild. On the other hand, Baka who spent more time in the forest have more access to wild plant foods and consumed them more regularly and in higher quantity. Finally, individuals from the Nzime village possess more money, which is reflected in wider access to other foods and less interest in wild plant food consumption.

Baka GM varies on the basis of lifestyle
In order to assess whether the GM varies across lifestyle, we characterized the samples by shotgun metagenomics, with an average of 8.1 M (±1.9 M) high-quality reads per sample (Table S1). Starting from the 44 metagenomes, we were able to reconstruct 628 metagenome-assembled genomes (MAGs), which were dereplicated to 161 species-level genome bins (SGBs, i.e., clusters of MAGs spanning 5% genetic diversity; see the “STAR Methods” section and Figure S1 for further details). Then, we mapped these 161 representative genomes against the previously available SGBs database from Pasolli et al.,25 which described >150,000 MAGs from the GM of different individuals, spanning age, geography, and lifestyle. In total, 132 of our SGBs (82%) clustered together with at least one known SGB from Pasolli et al.25 (Table S2); the remaining fraction (29 SGBs, 18%) showed >5% genetic distance to any SGB of the available database, representing candidate new taxa. Based on the 161 SGBs, comparison of community structure in the three groups (Baka forest, Baka settled, and Nzime village), using weighted and unweighted UniFrac distances, showed that the GM varies across groups (p = 0.001, permutation test with pseudo-F ratio). In particular, visualization of these distances using principal coordinates analysis (PCoA) revealed separation between the Baka forest and Nzime village individuals (p = 0.001, permutation test with pseudo-F ratio), with the Baka settled group closely associated with the Baka forest group, but slightly shifted toward the Nzime village group, reflecting the changes in lifestyle (Figure 2A).
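The permutation test with pseudo-F ratio mentioned above can be illustrated with a minimal sketch. This is not the code used in the study (the analysis relied on existing tools); the function names `pseudo_f` and `permanova`, the default of 999 permutations, and the seed are assumptions made for illustration only. The input is any square distance matrix, such as a UniFrac matrix, plus one group label per sample.

```python
import random


def pseudo_f(dist, labels):
    """Pseudo-F ratio computed from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all sample pairs, divided by n.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity, computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p value.

    Group labels are shuffled while the distance matrix stays fixed; the p value
    is the share of permutations with a pseudo-F at least as large as observed.
    """
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

With this formulation the smallest attainable p value is 1/(permutations + 1), so a reported p = 0.001 would be the floor of a 999-permutation run, if that number of permutations was used (an assumption; the text does not state it).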
Consistently, several SGBs mirror this trend: SGBs for Roseburia inulinivorans, Collinsella, and Lachnospira sp000437735 significantly decreased from Bantu-speaking individuals to Baka forest, with Baka settled showing an intermediate abundance, whereas SGBs for Enterousia, unknown Ruminococcaceae, and Phascolarctobacterium sp90055135 showed the opposite trend (Figure 2B, p < 0.05, Kruskal-Wallis rank-sum test; see the figure panel for the exact p values). When the two Baka groups were combined into one and compared with the Nzime, all the differences mentioned earlier were generally confirmed, with the only exception of the two SGBs assigned to Enterousia, which were characteristic only of the Baka forest group. However, additional differences were also observed, such as Agathobacter rectalis, significantly higher in the Nzime, and Faecousia and unknown Ruminococcaceae, characterizing the Baka (Figure S3).

Identification of SGBs involved with western African rainforest wild plant food consumption and their contribution to GM structure
To capture the functional diversity of the GM, encoding the metabolic functions able to use wild plant foods as substrate, we applied a de novo functional screening of genome-scale metabolic networks (GSMNs) to the full set of SGBs characterized within this study. In particular, we used Metage2Metabo (M2M),26 a resource that allows the identification of the key microbiome components for specific substrate usage, with a particular emphasis on metabolic cooperation. Based on the frequency of wild plant species appearing in the dietary recalls of our previous work (N = 2377),22 we ran M2M for the five wild plant foods that showed an intake frequency >1%, which included Gnetum africanum (40.7%), Irvingia spp. (8.4%), Baillonella toxisperma (5.9%), Afrostyrax lepidophyllus (1.5%), and Panda oleosa (1.3%).
We found a module of 26 cooperating SGBs, out of 161, to be essential for the metabolism of the western African rainforest wild plant food substrates (wpSGBs). In particular, these wpSGBs included taxa that are usually associated with a rural lifestyle, such as Treponema, Succinivibrio, and Prevotella, together with other taxa that are common components of the GM also in industrialized contexts, such as Butyricicoccus, Dialister, Escherichia coli, Phascolarctobacterium, and Ruminococcus. Notably, Treponema, Succinivibrio, and Prevotella are Old Friend species, usually considered part of the human GM in our ancestors before the adoption of agriculture,27 and often absent in “Western” populations.18 Consistent with this, and supporting the connection between the wpSGBs module and African rainforest wild food consumption, the cumulative abundance of wpSGBs was significantly higher in the Baka forest group than in individuals of the Nzime village (p = 0.02, Wilcoxon rank-sum test), with the Baka settled group showing an intermediate abundance (Figure 3).
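The group comparison of cumulative wpSGB abundance can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function names are hypothetical, the p value uses a normal approximation without tie or continuity correction (an assumption; statistical packages often use exact or corrected variants), and any input abundances are placeholders rather than study data.

```python
import math


def cumulative_abundance(sample_abundance, module_ids):
    """Sum the abundances of the module members (e.g., wpSGBs) in one sample."""
    return sum(sample_abundance.get(sgb, 0.0) for sgb in module_ids)


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test with midranks.

    Returns the U statistic of the first group and a two-sided p value from
    the normal approximation (no tie or continuity correction).
    """
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank shared by tied values
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

In use, `cumulative_abundance` would be applied to each sample's SGB abundance table with the 26 wpSGB identifiers, and the resulting per-sample totals of two groups passed to `rank_sum_test`.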
Twenty-four wpSGBs had a representative (kSGBs) in the SGBs database from Pasolli et al.25 However, most of them (23/24) are still rather uncharacterized species, as they represent sequenced genomes assigned at genus, family, or order level without any species name. Many of these unknown wpSGBs were from the Clostridia class (10 kSGBs). Further, the 2 remaining wpSGBs, which fall within previously uncharacterized genomes (i.e., those showing >5% genetic distance to any SGB of the database), were assigned to Collinsella and to an uncharacterized species of the class Bradymonadia. A full list of the wpSGBs and their taxonomic assignment is available in Table S3.
We next investigated how these 26 wpSGBs contributed to the overall GM structure and community topology in our dataset. To this end, we constructed a heatmap based on the Kendall's tau correlation coefficients between the 150 SGBs with a minimum of 10 genome copies per million reads in at least two samples. We then grouped correlated bacterial species into seven clusters of SGBs, represented by different colors, whose interactions are represented by a Wiggum plot, where SGB abundance is proportional to the circle diameter (Figures 4A, 4B, and S4, and Table S4). The dominant SGBs for each cluster were taxonomically assigned to Bacteroides fragilis (gray), Cryptobacteroides (brown), Prevotella (pink, Prevotella cluster 1), Prevotella (blue, Prevotella cluster 2), Phascolarctobacterium (red), Succinivibrio (cyan), and Treponema (green). The topological data analysis indicated that Cryptobacteroides, Prevotella (from cluster 2), and Treponema are keystone taxa in the GM network structure, showing the highest connectivity, due to the combination of high values of (1) closeness centrality (0.50, 0.55, and 0.46, respectively), (2) betweenness centrality (0.03, 0.02, and 0.04, respectively), and (3) degree (29, 48, and 20, respectively), with normalized genome counts per million reads >50. Notably, two wpSGBs were assigned to two of these keystone taxa (Treponema and Prevotella).
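The steps described above (prevalence filter, Kendall correlation, and network centrality) can be sketched in a few functions. This is a simplified illustration under stated assumptions: the correlation cutoff `tau_cutoff` used to draw an edge is hypothetical (the text does not give one), closeness is computed only over the nodes reachable from the source, betweenness centrality is omitted for brevity, and all names are illustrative.

```python
import math
from collections import deque
from itertools import combinations


def passes_prevalence(abundances, min_gcpm=10.0, min_samples=2):
    """Keep an SGB with at least `min_gcpm` genome copies per million reads in `min_samples` samples."""
    return sum(a >= min_gcpm for a in abundances) >= min_samples


def kendall_tau_b(x, y):
    """Kendall's tau-b between two abundance profiles (ties handled in the denominator)."""
    concordant = discordant = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        untied_x += int(dx != 0)
        untied_y += int(dy != 0)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else 0.0


def build_network(profiles, tau_cutoff=0.5):
    """Link SGBs whose profiles correlate at or above `tau_cutoff` (hypothetical cutoff)."""
    kept = {name: values for name, values in profiles.items() if passes_prevalence(values)}
    graph = {name: set() for name in kept}
    for a, b in combinations(kept, 2):
        if kendall_tau_b(kept[a], kept[b]) >= tau_cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph, node):
    """Number of direct neighbors of a node."""
    return len(graph[node])


def closeness(graph, source):
    """Closeness centrality within the connected component of `source` (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

Keystone candidates would then be the nodes that combine high degree and high closeness, alongside betweenness computed with a dedicated graph library.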
The clusters changed in abundance across the three groups, and the 26 wpSGBs associated with wild plant food consumption showed a higher representation in the Cryptobacteroides cluster (n. wpSGBs = 9) than in the Prevotella cluster 2 (6), Phascolarctobacterium (3), Treponema (3), Succinivibrio (3), and Bacteroides fragilis (2) clusters (Figures 4C and S4, and Table S4). In particular, the GM of the Baka forest group was characterized by a Cryptobacteroides-centered cluster, with a relevant contribution in terms of abundance from the Phascolarctobacterium, Treponema, and Succinivibrio clusters. Conversely, the GM of the Nzime group was found to be centered around the Prevotella cluster 2 and the Bacteroides fragilis cluster.
As expected, the GM of the Baka settled group was characterized by an intermediate configuration between the Baka forest and the Nzime village groups, consistent with a lifestyle that is intermediate between the other two groups. Indeed, we observed a strong resilience of members of the Cryptobacteroides cluster, at values comparable with the Baka forest group, together with the emergence of some members of the Prevotella cluster 2. Notably, this group was also characterized by a higher abundance of members of the Treponema cluster than the other two groups.
Collectively, the different accessibility to wild plant foods was probably the reason for the modifications of GM structure in Baka individuals, as revealed in the Baka settled group with respect to the Baka forest group, with new emerging GM traits that are shared with Nzime agriculturalists (Nzime village group). In order to rule out possible strain transfer between individuals from the Baka settled and Nzime village groups, we applied StrainPhlAn328 to the most abundant wpSGBs. We found that no strains were shared, suggesting that wpSGBs from different groups were different strains or, at least, did not derive from recent transmission events between individuals.
Although the abundance of the wpSGB module decreased when comparing Baka forest and Baka settled groups, wpSGBs were preserved in almost all clusters and samples, including the Nzime village group, even if at lower abundance. This persistence likely had two contributing causes: (1) western African rainforest wild plant food consumption only decreased across these three groups, but never completely ceased, and (2) the metabolic capabilities of wpSGBs were very varied and not exclusively limited to the degradation of substrates derived from wild plant foods, but possibly also provided additional probiotic functions relevant to maintaining health.
Wild-plant-food-associated taxa contain genes that are not present in the industrialized GM
In order to search for the specific wpSGB functional features associated with the metabolism of wild plant food substrates (i.e., genes involved in degradation of the molecules contained within Gnetum africanum, Irvingia sp., Baillonella toxisperma, Afrostyrax lepidophyllus, and Panda oleosa), we first screened wpSGBs for genes for the degradation of polysaccharides and phytochemicals (e.g., polyphenols and essential oils) that were not present in the remaining 135 SGBs from this study (see STAR Methods for more details). We found 29 genes from 7 different wpSGBs with these characteristics (Table 1). In particular, the full list contains 22 genes from E. coli, together with seven genes belonging to six different wpSGBs, encoding urease (wpSGB taxonomy: unclassified Clostridia), chloronitrobenzene-nitroreductase (Duodenibacillus), pullulanase (Treponema), dihydrolipoyl dehydrogenase (unclassified Sphaerochaetaceae), sialidase (Faecousia), mannosyltransferase, and a protein assigned to the CBM57 module (unclassified Kiritimatiellae). We then verified the presence of these genes within the gut metagenomes of 970 individuals of different geographical origin that relied on a rural or industrial lifestyle (Figure 5A; Table S5). The distribution of these functional features in the human gut metagenomes suggests, on the one hand, that the E. coli-related genes are widespread and, on the other hand, that the remaining functions are most likely exclusive to the rural GM, irrespective of geographic origin, with the exception of the chloronitrobenzene-nitroreductase, which is exclusively present in our cohort.

[Figure legend fragment: the most dominant clusters identified are highlighted by different colored boxes and were confirmed by permutation tests with pseudo-F ratios (p < 0.05, adonis of the R package vegan). One setting was used for cluster analysis (gray dashed lines), which identified seven clusters. The Cryptobacteroides cluster is highlighted in brown, the Treponema cluster in green, the Succinivibrio cluster in cyan, Phascolarctobacterium in red, Prevotella (cluster 1) in pink, Bacteroides fragilis in gray, and Prevotella (cluster 2) in blue.]

Further, we hypothesized that some wpSGBs could encode additional features not connected to the degradation of plant substrates but relevant to microbiome-microbiome and microbiome-host communication, with unexplored impact on human health.29,30 In this direction, we investigated the presence of biosynthetic gene clusters (BGCs) for the production of secondary metabolites within the genomes of wpSGBs. Such BGCs can produce a wide variety of natural products, including antibiotics, antifungals, and other bioactive compounds, with possible importance in host protection.31 We found 34 BGCs encoded by 16 wpSGBs (Table 2). In particular, these secondary metabolites are ranthipeptides, arylpolyenes, terpenes, betalactones, thiopeptide, type I polyketides, non-ribosomal peptides (NRPs), ribosomally synthesized and post-translationally modified peptides (RiPPs), and RiPP precursor peptide recognition elements (RREs). These molecules include specific and broad-spectrum antimicrobials, plant-ground mediators, and molecules that participate in microbial quorum sensing. In order to explore the global distribution of these BGCs in the human gut metagenome, we screened the same metagenomes we previously used for the genes involved in wild plant food degradation. We found that the 34 BGCs are present in the gut metagenomes of individuals relying on a rural lifestyle, with only a few exceptions, mainly related to BGCs of the wpSGB assigned to E. coli (Figure 5B). Conversely, BGCs associated with the production of arylpolyenes, type I polyketides, terpenes, and lactones were exclusively present in the rural individuals, irrespective of their geographic origin, as if their presence were linked to lifestyle. When looking at BGCs specifically present in our cohorts, we found that arylpolyenes produced by Duodenibacillus sp900767875, together with arylpolyenes and terpenes produced by Merdousia sp002437405, and BGCs for NRPs of Butyricicoccus A sp002395695, were highly specific to Baka and Nzime individuals. We are tempted to speculate that their presence could be connected to the ingestion and consumption of western African rainforest wild plant foods.

DISCUSSION
Baka communities are increasingly faced with challenges to their culture and livelihood through influences such as land displacement for exploitation of natural resources, government policies that favor agricultural societies, and climate change with drought that may disrupt traditional patterns of migration and make it difficult to find food and water resources.20 These favor the transition from foraging and small-scale cultivation to settled agriculture and industrialization, with an increase in consumption of processed foods and a decrease in wild plant foods and game.32 Here, we demonstrated that the consumption of wild plant foods is associated with the presence of a specific microbial module of 26 wpSGBs, providing taxa and functionalities that are preserved almost in their entirety across other rural populations and lost in industrialized populations. Consistently, part of these microbes encode genes that are no longer detected in the industrialized GM, such as urease, pullulanase, dihydrolipoyl dehydrogenase, sialidase, mannosyltransferase, and a protein assigned to the CBM57 module, together with BGCs for the production of arylpolyenes, polyketides, terpenes, NRPs, and lactones. Such enzymes are mainly involved in substrate degradation and signaling, whereas the secondary metabolites are more connected to the microbe-host crosstalk, with unexplored effects on the human host.
Specifically, pullulanase catalyzes the hydrolysis of pullulan, a polysaccharide composed of maltotriose units, into smaller molecules, including glucose, leading to the production of short-chain fatty acids (SCFAs) such as propionate, which have been shown to have anti-inflammatory properties and can help to promote gut health.33 Pullulanase-producing bacteria may thus play a role in the metabolism of dietary fiber and other complex carbohydrates, which can be difficult for humans to digest on their own. In particular, by breaking down these complex molecules, pullulanase-producing bacteria can help to release nutrients that would otherwise be inaccessible to the human body. When we considered the carbohydrate-binding module 57 (CBM57), we found that it is associated with bacterial enzymes involved in the breakdown of complex carbohydrates, including plant-derived lignocellulose.34,35 Consistently, mannosyltransferase was also previously identified as one of the microbial CAZymes involved in the degradation of complex microbial polymers.36 Taken together, the presence of pullulanase, CBM57, and mannosyltransferase is consistent with the ingestion of wild plant foods, rich in fiber and complex carbohydrates, which are not absorbable by the human host and therefore potentially usable by bacteria possessing at least one of these three genes.38,39 For this reason, further experiments are necessary to disentangle their peculiarities with respect to the analogous enzymes present in the industrial GM. Interestingly, through the selection of the wpSGBs, western African rainforest wild plant foods would also result in the provision of a pool of wpSGB-associated BGCs, specific to our cohort and to other rural populations.41,42 These molecules, produced in the human gut, could have important effects on our health, as highlighted in recent studies. For instance, a previous work conducted by Masyita et al. explored the potential role of terpenes and terpenoids in human health and the food industry, showing their possible application as antianxiety, anticancer, anti-inflammatory, and analgesic molecules and also as antimicrobials and food preservatives.43,44 The same can be said of betalactones, such as tetrahydrolipstatin and salinosporamide A, which have been described as molecules with potent bioactivity against bacteria, fungi, or human cancer cell lines.45 Also, type I polyketides, such as erythromycin and jamaicamide, are characterized by a diverse range of chemical structures and biological activities, and they have been the subject of extensive research for their potential use as antibiotics, anticancer agents, and other therapeutics.46 Finally, arylpolyenes increase protection from oxidative stress and contribute to biofilm formation, and for this reason their biosynthesis pathway has been explored to prevent biofilm formation by multidrug-resistant pathogens.47 It is tempting to speculate that the diversity of such bioactive secondary metabolites in the intestine of Baka and Nzime individuals (and other rural populations) may be an additional benefit of the consumption of wild plant foods, which, by selecting for wpSGBs, also provides the associated and diverse pool of BGCs, possibly supplying a range of bioactive metabolites that support better gut health. This hypothesis fits well with the low prevalence of non-communicable diseases, including metabolic disorders and cancers, in such populations.48 However, these results need to be further investigated through the isolation of the specific bacteria and the characterization of the chemical structure of the secondary metabolites, to gain more insight into their contributions to human health.
Taken together, our results shed further light on the portion of the microbiome associated with the consumption of western African rainforest wild plant foods and traditional lifestyles, highlighting the genetic characteristics that this component carries in its genomes, with particular attention to those genes that are no longer present in the microbiome of industrialized individuals. The work emphasizes the value of exploring microbiome diversity in traditional populations to identify the important functionalities to be protected, as strategic for the extension of our phenotypic landscape,49 as providing access to specific plant-based foods, and as important for maintaining gut homeostasis and safeguarding our health. Further, by shedding light on unexplored services provided by the GM to humans who rely on a rural lifestyle and consume western African rainforest wild plant foods, we also have the opportunity to evaluate the impact of modernization on the human GM and health. Finally, our work, through the evidence of microbes containing BGCs whose presence is associated with the ingestion and/or gathering of wild plant foods, nurtures the hypothesis that the GM biodiversity loss linked to industrialization may also be connected to eating predominantly processed and sterile industrial foods.

subsamples of the feces to test for parasite eggs in the presence of the participant, showing the process under the microscope, and providing information about fecal parasites and their transmission. When we identified parasite eggs, we informed the participant and instructed them to discuss it with the local medical service. In our final field season, we returned to the communities and shared with them the preliminary results of the microbiome study. In this presentation, we showed them the processing methods and photos of the team, and explained how their (community) GM was different from other communities that were studied previously.

DNA extraction and shotgun sequencing
Metagenomic DNA libraries were prepared using the QIAseq FX DNA Library Kit, following the manufacturer's instructions (QIAGEN). Briefly, for each sample, 100 ng of DNA were fragmented to 450-bp size, end-repaired, and A-tailed using FX Enzyme Mix with the following thermal cycle: 4 °C for 1 min, 32 °C for 8 min, and 65 °C for 30 min. Illumina adapter barcodes were attached through a 15-min incubation at 20 °C in the presence of the DNA ligase enzyme. After two purification steps with Agencourt AMPure XP magnetic beads (Beckman Coulter), 10-cycle PCR amplification, and a further purification step as above, samples were pooled at an equimolar concentration of 4 nM. Sequencing was performed on an Illumina NextSeq 500 platform using a 2 × 150 bp paired-end protocol, following the manufacturer's instructions (Illumina). A sequencing control from DNA extraction to library preparation was performed and sequenced by 16S rRNA sequencing on an Illumina platform to detect any contamination. Only 99 reads, mainly assigned to Aeromonadaceae and an unclassified genus of the family Lachnospiraceae (47 and 32 reads, respectively), were detected by our analysis (Table S7).
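Equimolar pooling at 4 nM requires converting each library's mass concentration into molarity. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; it is not taken from the kit protocol. The function names, the example final volume of 20 µL, and the use of 450 bp as the mean library length (final libraries are longer once adapters are ligated) are assumptions for illustration.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp, bp_mass=660.0):
    """Convert a dsDNA concentration (ng/uL) to nanomolar.

    Uses nM = (ng/uL * 1e6) / (660 g/mol per bp * fragment length in bp).
    """
    return conc_ng_per_ul * 1e6 / (bp_mass * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm=4.0, final_ul=20.0):
    """Return (library volume, diluent volume) in uL to reach `target_nm` (C1*V1 = C2*V2)."""
    if stock_nm < target_nm:
        raise ValueError("library is already more dilute than the target")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul
```

For example, a hypothetical library at 2.97 ng/µL with a 450-bp mean length corresponds to about 10 nM, so 8 µL of library plus 12 µL of diluent would give 20 µL at 4 nM.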

Species-level genome bins (SGBs) identification and analysis
Raw reads were filtered to remove human DNA and low-quality sequences using the human sequence removal pipeline and the WGS read processing procedure of the Human Microbiome Project (HMP). 81 High-quality reads were de novo assembled into longer sequences (contigs), and contigs were binned into metagenome-assembled genomes (MAGs) using the metawrap pipeline, 71 with metaspades, 70 maxbin2, 68 metabat2, 69 and concoct. 63 Quality metrics (completeness, contamination, genome size (bp), number of contigs, contig N50 values, and mean contig length) were assessed using the lineage-specific workflow in CheckM with default settings and are reported in Table S8. 62 Only MAGs with a completeness above 50% and a contamination lower than 5% were retained and then dereplicated into species-level genome bins (SGBs) using the dRep dereplicate command (dRep version 3.2.2) 65 and the following parameters: "-ignoreGenomeQuality -pa 0.90 -sa 0.95 -nc 0.30 -cm larger -centW 0". GTDB-Tk was used for taxonomic assignment with default parameters. 67 The abundance of SGBs in each sample was estimated with the metawrap quant_bins module, 71 and the genome annotation was retrieved with prokka, 73 also using the dbCAN 82 and XenoPath databases (https://github.com/TessaTi/XenoPath). The sharing of genes across SGBs was determined using roary, 75 with the following parameters: "-I 90 -cd 17 -e -g 1000000". A phylogenetic tree including all the SGBs was built by applying phylophlan 72 with the default parameters and was used for measuring UniFrac distances among samples in the PCoA analysis.
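The MAG retention rule described above (completeness above 50% and contamination below 5%) can be illustrated with a minimal Python sketch. It is not the authors' code: the `MagQuality` record, the bin names, and the example values are hypothetical, and in practice the two percentages would be read from the CheckM output table.

```python
from dataclasses import dataclass


@dataclass
class MagQuality:
    name: str             # hypothetical MAG identifier
    completeness: float   # CheckM completeness, in percent
    contamination: float  # CheckM contamination, in percent


def passes_quality(mag, min_completeness=50.0, max_contamination=5.0):
    """True if the MAG is strictly above 50% complete and below 5% contaminated."""
    return mag.completeness > min_completeness and mag.contamination < max_contamination


def filter_mags(mags):
    """Keep only the MAGs that would go on to dereplication."""
    return [m for m in mags if passes_quality(m)]


# Hypothetical example values, for illustration only.
example_mags = [
    MagQuality("bin.1", 92.3, 1.1),  # retained
    MagQuality("bin.2", 48.0, 0.5),  # too incomplete
    MagQuality("bin.3", 75.0, 6.2),  # too contaminated
    MagQuality("bin.4", 50.0, 0.0),  # exactly 50% is not "above 50%"
]
kept_names = [m.name for m in filter_mags(example_mags)]
print(kept_names)
```

The strict inequalities follow the wording "above 50%" and "lower than 5%"; whether boundary values were kept in the original analysis is not stated.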

Detection of strain-sharing events between individuals of the Baka settled and Nzime village groups working in the same fields
To gain a deeper insight into potential sharing of microbiome components across human metagenomes, we looked at the strain-level population structure using StrainPhlAn3 as previously illustrated. 83 We performed the analysis on the most abundant wpSGBs, i.e., those that were represented by at least 5 MAGs and whose abundance was > 5 gcpm (genome copies per million reads) in at least one individual from both the Baka settled and Nzime village groups. For each species analyzed, custom wpSGB marker databases were constructed by first selecting the core genes for each specific wpSGB from the roary output (i.e., the genes that were present only in the examined SGB and absent in the rest of the dataset). The MAGs comprised within each specific wpSGB were divided into 150-nucleotide fragments and aligned against their core genes using bowtie2 (version 2.3.4.3; -sensitive option). A core gene was considered valuable as a marker gene for a wpSGB if at least 90% of MAGs mapped against it, covering >50% of the gene's length. To infer strain sharing, strain-level phylogenies were then reconstructed using bowtie2 (-sensitive option) and StrainPhlAn3 with the parameters "-marker_in_n_samples 10 --phylophlan_mode accurate" and the parameter "-sample_with_n_markers" set to retain only samples with at least 10 marker genes.
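The marker-gene criterion above (a core gene is kept when at least 90% of the MAGs in a wpSGB cover more than 50% of its length) can be written as a short Python sketch. The function name and the input format are assumptions made for illustration: here each gene is represented by a list holding, for every MAG, the fraction of the gene's length covered by that MAG's aligned fragments, a quantity that would have to be derived from the bowtie2 alignments.

```python
def is_marker_gene(coverage_per_mag, min_gene_coverage=0.5, min_mag_fraction=0.9):
    """Apply the 90% of MAGs / >50% of gene length rule to one core gene.

    coverage_per_mag: one value per MAG, the fraction (0 to 1) of the gene
    length covered by that MAG's fragments (hypothetical input format).
    """
    if not coverage_per_mag:
        return False
    n_covering = sum(1 for c in coverage_per_mag if c > min_gene_coverage)
    return n_covering / len(coverage_per_mag) >= min_mag_fraction


# Hypothetical coverage values for two core genes across 10 MAGs.
gene_a = [0.8] * 9 + [0.2]   # 9 of 10 MAGs cover more than half of the gene
gene_b = [0.8] * 8 + [0.2] * 2  # only 8 of 10 MAGs do
gene_a_is_marker = is_marker_gene(gene_a)
gene_b_is_marker = is_marker_gene(gene_b)
print(gene_a_is_marker, gene_b_is_marker)
```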
To detect strain-sharing events, we first set wpSGB-specific normalized phylogenetic distance (nGD) thresholds that optimally separated the same-group strain retention (same strain) nGD distribution from the unrelated-individuals (different strain) nGD distribution. For this purpose, we compared Baka settled metagenomes with data from a previous study characterizing the microbiomes of the Hadza from Tanzania. 6 nGDs were calculated as leaf-to-leaf branch lengths normalized by total tree branch length in phylogenetic trees produced by StrainPhlAn, which are built on marker gene alignments. nGD thresholds were then defined by maximizing Youden's index and limiting to 5% the fraction of unrelated individuals inferred to share the same strain, as a bound on the false discovery rate.
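The threshold selection described above can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' implementation: it assumes that a pair is called "same strain" when its nGD is at or below the threshold, that candidate thresholds are the observed nGD values, and that the 5% bound is enforced by discarding candidates that would call more than 5% of unrelated pairs "same strain". All names and example distances are hypothetical.

```python
def normalized_distance(leaf_to_leaf_length, total_branch_length):
    """nGD: leaf-to-leaf branch length divided by the total tree branch length."""
    return leaf_to_leaf_length / total_branch_length


def choose_ngd_threshold(same_strain, unrelated, max_false_fraction=0.05):
    """Pick the nGD cutoff maximizing Youden's J under a cap on false calls.

    same_strain: nGDs from pairs expected to carry the same strain.
    unrelated:   nGDs from pairs of unrelated individuals.
    Returns None if no candidate satisfies the cap.
    """
    best_threshold, best_j = None, float("-inf")
    for t in sorted(set(same_strain) | set(unrelated)):
        sensitivity = sum(d <= t for d in same_strain) / len(same_strain)
        false_fraction = sum(d <= t for d in unrelated) / len(unrelated)
        if false_fraction > max_false_fraction:
            continue  # would exceed the 5% bound on unrelated pairs
        youden_j = sensitivity - false_fraction  # = sensitivity + specificity - 1
        if youden_j > best_j:
            best_threshold, best_j = t, youden_j
    return best_threshold


# Hypothetical nGD values, for illustration only.
same = [0.001, 0.002, 0.003, 0.02]
unrel = [0.015, 0.05, 0.08, 0.10, 0.12]
threshold = choose_ngd_threshold(same, unrel)
print(threshold)
```

With only five unrelated pairs in the toy data, a single false call already exceeds the 5% cap, so the selected cutoff stays below the smallest unrelated distance.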

Genome scale metabolic models for western African rainforest wild plant foods substrate degradation
Microbiome-scale metabolic complementarity for the identification of key species devoted to the degradation of wild plant food substrates was obtained by applying carveme 61 and Metage2Metabo. 26 Specifically, carveme was applied to each SGB using the prokka outputs (.faa files) as input and the default options, in order to build the specific genome-scale metabolic model (GSMM) for each SGB. Metage2Metabo, with the command "metacom", was used to create a single metabolic network combining all the GSMMs and to retrieve the list of SGBs essential for the degradation of the western African rainforest wild plant foods (wpSGBs). 5][86][87][88] The list of the wpSGBs was compiled by selecting the bacteria that are involved in the metabolism of at least one of these wild plant foods.

Spread of wpSGB features in the global populations
By further examining the shared features in the roary output, we identified those genes that were present specifically in wpSGBs and not contained in other SGBs. From these genes, we selected those annotated in the dbCAN or XenoPath databases, because they are of interest for the degradation of plant substrates. We then applied antismash 6.0 59 to the wpSGBs to select any BGCs that were potentially connected to plant consumption or involved in microbe-plant crosstalk. The identified features were used to build a database, to which 970 metagenomes from populations from all over the world (Table S5) were aligned using bowtie2 with the -end-to-end tag. 60 The number of aligned reads for each sample was retrieved using samtools 77 and normalized for sequencing depth and length of the references, obtaining reads per kilobase of gene per million reads mapped (RPKM) as the unit of measurement.
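The RPKM normalization mentioned above follows the standard formula RPKM = (reads mapped to the feature × 10^9) / (feature length in bp × total reads). A minimal Python sketch is given below; the function name, feature names, and counts are hypothetical, and whether the denominator used total sequenced reads or total mapped reads per sample is an assumption, since the text specifies only "per million reads mapped".

```python
def rpkm(feature_reads, feature_length_bp, total_reads):
    """Reads per kilobase of feature per million reads.

    feature_reads:     reads aligned to one gene or BGC in one sample
    feature_length_bp: length of that reference sequence, in base pairs
    total_reads:       per-sample read total used for depth normalization
                       (assumed here to be the mapped-read total)
    """
    if feature_length_bp <= 0 or total_reads <= 0:
        raise ValueError("length and read total must be positive")
    return feature_reads * 1e9 / (feature_length_bp * total_reads)


# Hypothetical counts for one sample, for illustration only.
read_counts = {"geneA": 50, "bgc1": 200}
lengths_bp = {"geneA": 1000, "bgc1": 20000}
sample_total = 1_000_000
rpkm_values = {k: rpkm(read_counts[k], lengths_bp[k], sample_total) for k in read_counts}
print(rpkm_values)
```

In this toy sample the longer BGC receives four times as many reads as the gene but ends up with a lower RPKM, which is the length effect the normalization is meant to remove.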

Biostatistics and graphical representation
All statistical analyses and graphical representations were performed using the R software (v.4.2.0, www.r-project.org) with the packages vegan (version 2.6-2), 79 RColorBrewer (version 1.1-3), 76 gplots (version 3.1.3), 66 viridis (version 0.6.2), 80 reshape2 (version 1.4.4), 74 and tidyverse (version 1.3.2). 78 Data separation in the Principal Coordinates Analysis (PCoA) was evaluated using a permutation test with pseudo-F ratios (function adonis in the vegan package). The Kruskal-Wallis test was used to assess significant differences between groups. p values, when necessary, were corrected for multiple testing by means of the Benjamini-Hochberg method, with a false discovery rate (FDR) ≤ 0.05 considered to be statistically significant.
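The Benjamini-Hochberg correction referred to above can be illustrated with a short, dependency-free Python sketch. The analyses in the paper were run in R; this block is only a re-statement of the standard step-up procedure, and the function name and example p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values, for illustration only.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
print(adj_p, significant)
```

In this toy example only the first test remains at or below the 0.05 FDR cutoff after correction, although three of the four raw p values were below 0.05.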

Figure 1. Geographic map of sample collection locations
Baka and Nzime live in the southeastern part of Cameroon, as indicated in the box on the top-right of the figure. Fecal samples from Baka adults were collected in the Kungu forest camp (Baka forest) and in the Le Bosquet village (Baka settlement). Samples from Nzime individuals were collected in the Nkeadinako village. These locations are indicated by red dots on the map.

Figure 2. Differences in GM compositions among individuals of the Baka forest (green), Baka settled (light green), and Nzime village (yellow-green) groups
(A) PCoA plots based on unweighted and weighted UniFrac distances and (B) boxplots of SGB abundances, in terms of genome copies per million sequenced reads. p values were obtained using the Kruskal-Wallis test. See also Figure S2 for the distribution of all the SGBs across the entire cohort and Figure S3 for the same analysis combining Baka forest and Baka settled into a single group.

Figure 3. Comparison of cumulative abundances of wpSGBs
Comparison between fecal samples from individuals of the Baka forest (green), Baka settled (light green), and Nzime village (yellow-green) groups, represented by boxplots. Values are in genome copies per million reads. *p = 0.02, Kruskal-Wallis rank-sum test.

Figure 4. Co-abundance analysis highlights distinct bacterial networks characterizing the three groups
(A) A network heatmap based on Kendall's correlation coefficient and GM data was generated using the most abundant SGBs across all samples (see complete list of taxa and their abundance in Table S4). The most dominant clusters identified are highlighted by different colored boxes and were confirmed by permutation tests with pseudo-F ratios (p < 0.05, adonis of the R package vegan). One setting was used for cluster analysis (gray dashed lines), which identified seven clusters. The Cryptobacteroides cluster is highlighted in brown, the Treponema cluster in green, the Succinivibrio cluster in cyan, Phascolarctobacterium in red, Prevotella (cluster 1) in pink, Bacteroides fragilis in gray, and Prevotella (cluster 2) in blue.
(B) Network scheme illustrating the relationships between bacterial clusters. The leading taxa in each network are highlighted. A positive correlation is shown with a gray line and a negative correlation with a red line. Clusters are colored as in (A).
(C) Cumulative relative abundance of the different clusters of taxa among the three groups (*p < 0.01, **p < 0.001; Wilcoxon test). See also Figure S4.

Figure 5. Prevalence of wpSGB genes and BGCs across human populations
Heatmaps showing the prevalence of the 29 wpSGB genes with a propensity for the degradation of polysaccharides and phytochemicals (A) and of 34 BGCs for secondary metabolites (B), which were not present in the other SGBs. Datasets comprised individuals with rural or industrialized lifestyles from different geographical origins (see also Table S5 for further details). SWE, Sweden; ITA, Italy (industrial); DEU, Germany (industrial); USA, USA (industrial); IND, India (industrial); CHN, China (industrial); BRA, Brazil (rural); PER, Peru (rural); TZA, Tanzania (rural).

Table 1. List of the 29 wpSGB genes showing a propensity for the degradation of polysaccharides and phytochemicals that were not present in the other SGBs
