From hype to hope: Considerations in conducting robust microbiome science

, especially biomedical science, with implications for how we understand and ultimately treat a wide range of human disorders. However, like many great scientific frontiers in human history, the pioneering nature of microbiome research comes with a multitude of challenges and potential pitfalls. These include the reproducibility and robustness of microbiome science, especially in its applications to human health outcomes. In this article, we address the enormous promise of microbiome science and its many challenges, proposing constructive solutions to enhance the reproducibility and robustness of research in this nascent field. The optimisation of microbiome science spans research design, implementation and analysis, and we discuss specific aspects such as the importance of ecological principles and functionality, challenges with microbiome-modulating therapies and the consideration of confounding, alternative options for microbiome sequencing, and the potential of machine learning and computational science to advance the field. The power of microbiome science promises to revolutionise our understanding of many diseases and provide new approaches to prevention, early diagnosis, and treatment.


The history and hype of gut microbiome research
The term "microbiome" is relatively new to science (Lederberg and McCray, 2001); however, the concepts that underpin this field have their roots in the late 1800s. Ecologists acknowledged that microbes must be studied within their natural environments, replete with the complexity of microbe-microbe and microbe-environment interactions (Dworkin, 2012; Savage, 1977). Culture-based studies of microbial habitats increased in popularity throughout the 1900s, spurred on by the development of anaerobic culturing techniques early in the century (Finegold, 1993). Discrepancies between microscopic cell counts and colony-forming units of bacteria, referred to as the "great plate count anomaly" (Staley and Konopka, 1985), motivated the development of sequencing-based approaches to identify uncultivable organisms. These motivations culminated in the first use of 16S ribosomal RNA (rRNA) gene polymerase chain reaction (PCR) to identify bacteria in 1989 (Bottger, 1989). Microbiome research was broadly popularised in 2006 following the publication of a seminal study that provided some evidence to suggest that the obesity phenotype could be transferred from humans to germ-free mice via faecal microbiome transplantation (FMT) (Turnbaugh et al., 2006). Nearly two decades later, there has been widespread uptake of microbiome research across numerous fields of science, remarkable research investment, thousands of publications, and enthusiastic public interest in understanding the role of microbiomes in human health.
Between 2007 and 2016 the United States invested more than US$1 billion in microbiome research, including funding the Human Microbiome Project (NIH Human Microbiome Portfolio Analysis Team, 2019). Soaring industry investment, including within innovation and agriculture sectors, has gone hand in hand with this public research funding, with microbiome-related applications anticipated to remain ongoing major contributors to the global economy (Li et al., 2020; Meisner et al., 2022; Proctor, 2019). The human gut microbiome has received the most attention (Fig. 1), and many companies now offer personalised analysis of the faecal microbiome, promising helpful information and actionable strategies to improve "gut health". The intense public interest in such diagnostic and interventional products is not surprising, given that 94% of media coverage of microbiome research describes health benefits associated with the microbiome, and 79% of media coverage espouses actions that may be taken to acquire these benefits (Marcon et al., 2021).
However, the hype of human microbiome research appears to have outpaced the evidence that underlies it, within both public and academic spheres. Indeed, an extraordinary breadth of human health conditions has been associated with host microbiomes (Fan and Pedersen, 2021; Vijay and Valdes, 2022). Whilst the media and the public may attribute this phenomenon to the importance of these microbiomes, some assumptions of causality may be premature. A review of FMT in humanised gnotobiotic rodents found that 95% of studies reported the successful transfer of disease phenotypes from donors to recipients (Walter et al., 2020). The authors assert the highly implausible nature of such findings and suggest that this overinflation of success rates is likely related to methodological and publication biases (Walter et al., 2020).
Additionally, most human microbiome research to date is observational; thus, it is possible that the apparent causative links between microbiomes and host health outcomes are exaggerated within the literature. There are also fundamental questions within the field that remain unanswered; for instance, there is no agreed-upon definition of a "healthy" microbiome, nor of an impaired one, and the undefined term "dysbiosis" lacks scientific utility (Brussow, 2020). As a result, interventions applied to improve the health of microbiomes are based on poorly defined outcomes, which may adversely affect the field's ability to transition into a translatable science (Brussow, 2020).
After an initial period of hype and excitement, the field now finds itself in need of a period of conservative contemplation. Reflecting on learnings from the past decade is essential to better inform future research strategies. Increasingly, it is becoming clear that ecological frameworks, computational and systems biology approaches and multiomics techniques are necessary to infer causality between microbiomes and human health outcomes. The purpose of this review is to briefly highlight some key learnings, ongoing challenges, and future considerations for the next steps in microbiome research, acknowledging that a vast amount of additional work has been achieved in this field beyond the present synthesis. In particular, we strongly advocate for the development of rigorous and critical methods in the field with reproducible and transparent reporting, and for collaboration between researchers with a range of necessary expertise across multiple disciplines.

It takes a 'microbial village'
There are conflicting philosophical perspectives regarding how to define the mutualistic relationships between humans and their resident microbiomes (Inkpen, 2019). On the one hand, microbiomes have been considered functional organs of the human body, together comprising a highly integrated ecosystem, known as a "holobiont" (Hutter et al., 2015; Simon et al., 2019). On the other hand, microbiomes have been considered distinct ecosystems that interact with the human host via an "ecosystem services" paradigm, with microbiome-host reciprocity rather than full integration (Douglas and Werren, 2016; Foster et al., 2017; Sharp and Foster, 2022). Elucidating the true nature of the host-microbe relationship is likely essential to fully determine the contribution of microbiomes to human health outcomes.
Fig. 1. Growth of microbiome research over the past two decades. The number of papers published each year containing "gut / stool microbiome", "oral microbiome", "skin microbiome", "vaginal microbiome", "lung / respiratory microbiome", "breast milk / human milk microbiome", and "nasal / nasopharyngeal microbiome" in the title or abstract. Data extracted from PubMed, March 2023.
There has been substantial public, philanthropic, and commercial research investment into the identification of specific features of microbiomes that correlate with health outcomes. Identification of such key features has been promising in fields such as psychiatry, where diagnostic and prognostic biomarkers are lacking (Garcia-Gutierrez et al., 2020; Hyman, 2012), and in fields such as oncology, where biomarkers may improve early detection and targeted microbiome-modulating therapies may enhance treatment response (Cammarota et al., 2020). However, despite a plethora of preclinical and clinical work, accurate and consistently replicable microbial signatures that associate specifically with health outcomes in humans are yet to be discovered.
To understand the contribution of microbiomes to health outcomes, there is a need to understand the fundamental ecological processes that contribute to their temporal and spatial dynamics (Costello et al., 2012; Gilbert and Lynch, 2019). Underpinning this is the need to understand the processes driving host-microbe and microbe-microbe interactions both across individuals and over time. Most human microbiome research to date has focused on the diversity, relationships, and interactions of microbes within an environment (i.e., microbial ecology). However, the investigation of large-scale patterns and mechanisms of microbial distributions and abundances (i.e., macroecological concepts) and the interactions between local microbial communities (i.e., metacommunity theory) is essential (Miller et al., 2018; Shade et al., 2018). Applying these concepts to human microbiomes is complex, as the processes that shape and drive microbial community compositions (i.e., community assembly) are likely influenced by many factors. These include environmental dynamics or between-species interactions (i.e., deterministic processes) as well as more unpredictable factors (i.e., stochastic processes) such as the random fluctuations in species abundances over time (i.e., ecological drift) and the movement or transfer of microbial species between communities (i.e., dispersal). Strengthening our understanding of these factors may afford a better understanding of how microbiomes develop and, subsequently, how they respond to disturbances (Costello et al., 2012; Gilbert and Lynch, 2019).
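The two stochastic processes named above, ecological drift and dispersal, can be illustrated with a toy neutral model. The following is a minimal sketch under strong simplifying assumptions (fixed community size, identical fitness for all taxa, one death and one replacement per step); the function name, parameters, and taxon labels are hypothetical and are not drawn from any of the cited studies.

```python
import random

def simulate_neutral_community(community, metacommunity, steps, dispersal_rate, seed=0):
    """Toy neutral model: at each step one randomly chosen individual dies and is replaced.

    With probability `dispersal_rate` the replacement is an immigrant drawn from
    `metacommunity` (dispersal); otherwise it is a copy of a randomly chosen
    member of the local community (ecological drift).
    """
    rng = random.Random(seed)      # seeded so that runs are repeatable
    community = list(community)    # work on a copy; the input is not modified
    for _ in range(steps):
        dead = rng.randrange(len(community))
        if rng.random() < dispersal_rate:
            community[dead] = rng.choice(metacommunity)
        else:
            community[dead] = rng.choice(community)
    return community

if __name__ == "__main__":
    # Hypothetical local community and metacommunity (illustrative labels only).
    local = ["taxon_a"] * 50 + ["taxon_b"] * 50
    pool = ["taxon_a", "taxon_b", "taxon_c"]
    for rate in (0.0, 0.05):
        final = simulate_neutral_community(local, pool, steps=5000, dispersal_rate=rate)
        counts = {taxon: final.count(taxon) for taxon in sorted(set(final))}
        print(f"dispersal_rate={rate}: {counts}")
```

Without dispersal, abundances wander at random and no new taxon can appear; with dispersal, taxa from the metacommunity can enter the local community. Deterministic processes such as environmental selection are deliberately left out of this sketch.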
Metacommunity theory has been applied to gut (Ma, 2021a), skin (Manus, 2022), vaginal (Ma, 2021b), and lung (Conrad et al., 2013; Dickson et al., 2014; Willis et al., 2020) microbiome research. However, whilst many researchers across disciplines have grasped concepts such as α- and β-diversity, metacommunity theory studies are difficult for non-ecologists to read, and conducting such studies will necessitate inter-disciplinary cooperation. Densely sampled time-series data, and computational resources to manage these longitudinal, repeated-measures data, are also required (Gilbert and Lynch, 2019). Study of a small, but densely sampled, population of humans and mice has provided novel insights into short-term fluctuations, long-term stability, and dietary response of the gut microbiome by applying metacommunity theory (Ji et al., 2020). Ongoing technological advances and accessibility within the field will hopefully make such informative studies more commonplace in the future.
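For readers less familiar with the α- and β-diversity concepts mentioned above, the sketch below computes one common metric of each kind: the Shannon index (within-sample diversity) and Bray-Curtis dissimilarity (between-sample difference). These two metrics are chosen here for illustration only; the text does not prescribe particular metrics, and the function names and example counts are hypothetical.

```python
import math

def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples listed over the same taxa.

    Counts are first converted to relative abundances, so the result ranges
    from 0 (identical profiles) to 1 (no shared taxa).
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    rel_a = [c / total_a for c in counts_a]
    rel_b = [c / total_b for c in counts_b]
    shared = sum(min(x, y) for x, y in zip(rel_a, rel_b))
    return 1.0 - shared

if __name__ == "__main__":
    # Hypothetical read counts for four taxa in two samples.
    sample_1 = [40, 30, 20, 10]
    sample_2 = [70, 0, 20, 10]
    print("Shannon, sample 1:", shannon_diversity(sample_1))
    print("Shannon, sample 2:", shannon_diversity(sample_2))
    print("Bray-Curtis:", bray_curtis(sample_1, sample_2))
```

Both metrics summarise a single snapshot; the metacommunity analyses described above additionally require repeated sampling over time and across sites.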
Community assembly processes such as dispersal and ecological drift have also been investigated in the context of viral and fungal communities (Ma and Mei, 2022; Tong et al., 2019). Despite recognition of the contribution of non-bacterial microbes (such as viruses, fungi, archaea, and protists) to microbiomes, these fields of research remain in their nascency. It is appreciated that these other microorganisms form complex intra- and inter-kingdom co-occurrence networks with their more commonly studied bacterial counterparts (Dhingra et al., 2018; Matchado et al., 2021); however, we are yet to scratch the surface of understanding these relationships. Although the lack of replicable and consistent findings within the microbiome field may leave some feeling deflated, there remains great opportunity for this relatively young field to further our understanding of human health outcomes.

Purpose is just as important as presence
Numerous studies have shown that profound variability in the taxonomic structure of communities can occur in the absence of functional differences (Louca et al., 2016a). Drawing on findings from environmental ecology, the metabolic functional potential of microbial communities has been observed to be closely related to environmental factors, whereas these same factors poorly explained taxonomic composition (Louca et al., 2016b; Nelson et al., 2016; Raes et al., 2011). In humans, environmental factors such as diet (e.g., dietary fibre and short-chain fatty acid production) and niche (e.g., gut, skin, mouth) are known to influence both the taxonomic structure of microbiomes and microbial functional capacity (Reynoso-García et al., 2022). Whilst there appear to be core pathways shared across microbiome niches, other functional pathways appear to be more niche-specific, such as enrichment of pathways involved in sugar or oxidative metabolism in the oral cavity, or enrichment of pathways for cellulose metabolism in the gut (Bornigen et al., 2013). This raises the important question of purpose versus presence, and how this relates to the notion of "best practice" in microbiome science.
It is generally accepted that species with similar traits may execute very similar roles (Weissman et al., 2021). Yet, as we continue to sequence microbiomes more deeply, strain-level differences in functionality are becoming more apparent (Yan et al., 2020). Some functional information can be found for specific strains in published literature, and culture-based studies have been instrumental in identifying putative microbial functions. However, the unculturable nature of many bacteria has been a major limitation; thus, the evolution of genetic sequencing approaches to link taxonomic features with potential functional attributes has been a welcome addition. Inference-based tools (e.g., PICRUSt2) can leverage 16S rRNA sequencing data to suggest potential functionality (Douglas et al., 2020), with even greater insights into potential biological functions, and thus the metabolic potential of a microbial community, achievable using whole-genome shotgun metagenomic sequencing (Sharpton, 2014). Finely curated databases of microbial traits across human bodily niches have also been developed using sophisticated machine learning and network analysis approaches (Weissman et al., 2021). Additionally, the emergence of bacterial single-cell RNA sequencing techniques to study microbial communities constitutes a promising approach to understand how the communities are organised in specific configurations, as well as the genetic mechanisms governing their stability and resilience against environmental stressors (Ma et al., 2023).
One of the key limitations of the aforementioned methods that measure potential functionality from genes is that gene presence does not necessarily equate to a subsequent functional outcome. The integration of multi-omics techniques, including metagenomic sequencing (to inform which microbial features and genes are present in a particular environment), metatranscriptomic sequencing (to inform whether these genes are being transcribed into messenger RNA), and metaproteomics (to inform whether these expressed genes are subsequently being translated into proteins), is required to provide greater insight into the functional cellular and metabolic pathways characterising microbial ecosystems (Aguiar-Pulido et al., 2016; Zhang et al., 2019). As these approaches continue to become more accessible, their co-utilisation is occurring more frequently both in the context of human health (Ali et al., 2023; Mills et al., 2022; Worby et al., 2022) and in other disciplines such as food science (Ferrocino et al., 2023). Integration of these techniques using systems biology approaches may assist with the generation of mechanistic hypotheses related to microbiomes, increase functional predictivity, and allow for a greater conceptualisation of the function, or dysfunction, of microbial ecosystems (Dugourd et al., 2021; Ferrocino et al., 2023). However, understanding where these functions originate or how they relate to the physiology and metabolism of the microorganisms that comprise microbiomes will require further investigation.
Microorganisms also produce and modify molecules that can act both locally and systemically. Metabolomic analyses of human biospecimens (e.g., stool, blood) are progressively being used to complement microbiome analyses to better understand their functional potential, ranging from highly targeted analyses (e.g., short-chain fatty acid quantification) to untargeted profiling (i.e., metabolite discovery panels). Several studies combining metagenomic and metabolomic analyses have identified plasma metabolites that robustly associate with microbial diversity and composition; however, the plasma variance explained by the gut microbiome varies across metabolites (Chen et al., 2022; Dekkers et al., 2022; Visconti et al., 2019). There are also microbial features that associate with multiple metabolites and vice versa, underscoring the fact that different microbiome compositions can ultimately serve the same purpose through functional redundancy. Understanding these functional attributes and interactions is essential for developing strategies to promote optimal microbiome function and to treat microbiome-related diseases. It is time to move beyond merely cataloguing taxa, and towards a functional, ecological framework for understanding the human microbiome.

The complications of confounding
The multitude of different protocols and methodologies, from data collection to data analysis, that have been developed within the field of microbiome science has led to significant technical variation between studies. An understanding of these technical confounders is important when interpreting and comparing studies, or when seeking to conduct meta-analyses that combine results from different studies. Although there are methods that are considered "gold-standard", often these methods are difficult or impractical to implement in human-based research. For example, stool microbiome composition is considered a proxy for lower gut microbiome composition and remains the primary method of measuring this microbial niche, despite some limitations such as exposure to contamination by human DNA (Tang et al., 2020). The gold standard for stool sample collection is considered to be immediate freezing of stool post-evacuation (Costea et al., 2017). However, this method relies on people adequately storing stool samples in their home freezer or immediately returning them to a laboratory, which adds complexity due to the need for appropriate cold-chain transportation; this need can be avoided with the use of stabilisation buffers. Fortunately, the effect of stool sample collection method on microbiome composition is relatively small compared to other sources of variation (Bartolomaeus et al., 2021; Wang et al., 2018), allowing for greater consideration of participant burden with minimal loss of sample integrity.
On the other hand, the DNA extraction protocol used has been identified as the greatest technical source of microbiome composition variation, with effect sizes comparable to those observed for nutritional and clinical data (Bartolomaeus et al., 2021). The mechanical lysis step has been implicated in driving these differences (Costea et al., 2017; Knudsen et al., 2016), with larger beads appearing to be less effective at breaking apart bacterial cell walls, particularly for Gram-positive bacteria (Costea et al., 2017; Santiago et al., 2014; Smith et al., 2011). These differences in protocols may impact the clinical interpretation of results (Costea et al., 2017), and may in part be contributing to the issues relating to reproducibility within the field. Thus, the clear reporting of microbiome processing methods is essential for the proper scientific interpretation of these studies, and a discussion of the impact of technical variance should be included where possible.
The identification of microbial features that characterise a particular disease state has been a key area of research (Strati et al., 2022). Studies that utilise case-control designs commonly match healthy and diseased persons for confounding factors such as age and sex (Pearce, 2016). However, there are numerous other factors that are consistently associated with microbiome compositions or that explain compositional variance. One factor that is particularly difficult to match in case-control studies is medication, especially in the case of polypharmacy. It has been repeatedly shown that medications can have a significant effect on the composition of microbiomes, sometimes greater than the effect of a disease itself (Eckenberger et al., 2022; Falony et al., 2016; Forslund et al., 2015; Nagata et al., 2022). Many commonly used medications, such as proton pump inhibitors, statins, and metformin, have been linked to changes in gut microbiome composition (Forslund et al., 2015; Freedberg et al., 2015; Jackson et al., 2016; Vieira-Silva et al., 2020), and non-antibiotic medications have been shown to have antimicrobial effects (Maier et al., 2018). The consistently observed strong effect of medication on the gut microbiome not only complicates observational studies, but also adds complexity to clinical trials. For ethical reasons, many medications cannot be stopped prior to clinical trial participation, and indeed may have prolonged effects on the gut environment. Trials must often decide whether to recruit only non-medicated individuals, which may not be a representative population, or include those on medications, which may confound study results. This remains a key challenge within the field of microbiome research, and consideration of medications and other covariates, along with transparency regarding such factors, is essential in the discussion and interpretation of findings.
Another complex and difficult covariate to consider is diet. Particularly with regard to the gut, dietary factors are a key modulator of microbiome composition (David et al., 2014), and indeed many studies utilise diet as an intervention due to this effect. Both overall dietary patterns and dietary components (i.e., micronutrients, macronutrients, and even non-nutritive components of the food matrix) have been shown to influence gut microbiome composition (Chassaing et al., 2022; Rinninella et al., 2019; Snelson et al., 2021a; Snelson et al., 2021b). Habitual dietary patterns can differ substantially between individuals; thus, diet may be an important confounding factor. Controlling for diet could include providing all food during a trial, as is done in some dietary trials; however, this is an expensive and logistically difficult approach which may not be feasible for most microbiome trials. An alternative is the use of household controls, who are not only consuming similar diets but will also share similar environmental exposures, such as the presence of pets or geographical factors that associate with microbiome composition (Dill-McFarland et al., 2019; Valles-Colomer et al., 2023; Yatsunenko et al., 2012); however, this may hinder the ability to match by age and sex. Indeed, this study design has recently been used to dissect the alterations in microbial metabolic activity that may be a contributing factor in multiple sclerosis (iMSMS Consortium, 2020, 2022). Given that diet is a critical modulator of the composition of gut microbiomes, and likely other microbiomes, careful consideration of diet is essential during the study design phase.

Microbiome modulation as a treatment strategy is challenging
Given the accumulating evidence associating microbiome compositions and functions with human disease, it is unsurprising that there has been substantial interest in using microbiome-modulating therapies that leverage microbiome plasticity to address human health issues. Microbiome modulation has multiple potential avenues of utility, including improved understanding of host-microbiome interactions, development of new treatments for various disease conditions, and enhanced understanding of personalised responses to inform precision medicine. Several different approaches to modulating microbiomes exist, including a range of biotic therapies (i.e., pre-, pro-, syn-, and post-biotics), FMT, and dietary interventions. The use of biotic therapies, particularly probiotics, has been promising; however, results are not always reproducible, likely due to factors such as the use of different dosages and strains across studies (Dudek-Wicher et al., 2020; McFarland, 2021) as well as individual factors such as genes, lifestyle, and baseline gut microbiome composition that may influence response (Jain, 2020). The experimental evidence testing other treatment strategies such as FMT for disorders other than Clostridioides difficile infection in a clinical context is generally still in its early phases (Green et al., 2020), with several barriers to widespread implementation (Nigam et al., 2022). The mechanisms of action of FMT are also unclear, with only weak evidence of an association between donor strain engraftment and clinical outcome success commonly observed across studies (Ianiro et al., 2022; Schmidt et al., 2022). Whilst many studies have looked at associations between dietary patterns and gut microbiome composition, surprisingly few randomised controlled trials investigating the effect of dietary change on gut microbiome composition have been conducted (Ghosh et al., 2020; Wilson et al., 2020). Further, these findings are complicated by "loss-of-function" experimental studies of antibiotics, some of which have shown no substantial phenotypic change despite major changes to gut microbiome composition (Reijnders et al., 2016). Addressing some of these gaps in our knowledge is likely critical to fully leverage the potential benefits of these therapies for human health.
The lack of regulation of microbiome-modulating therapies remains an ongoing concern. Commercially available prebiotics and probiotics are poorly regulated in most jurisdictions, and being marketed as a food or dietary supplement means that they bypass the need for approval by drug or therapeutic goods administrators. There is commonly limited testing for bacterial content and concentration within these products, and many commercially available products do not meet the cell counts reported on their labels or the thresholds to be considered a probiotic (Naissinger da Silva et al., 2021). This lack of regulation raises concerns regarding the efficacy, safety, and quality of probiotic products that are deemed safe, and indeed marketed as such, to the public (de Simone, 2019; Naissinger da Silva et al., 2021). This may be of even greater relevance to immunocompromised patients; for example, over-the-counter probiotic use has been shown to potentially reduce immunotherapy efficacy (Spencer et al., 2021), with different strains of commonly used probiotics such as Bifidobacterium bifidum potentially having differential effects on cancer treatment response (Lee et al., 2021). Regulatory concerns also exist for FMT; indeed, stool is neither a consistent nor a standardised product, and rigorous screening of donor stool is required (Costello and Bryant, 2019). Of major concern are reports of individuals attempting "do-it-yourself" home FMTs based on online videos, books, and anecdotal experiences (Smith et al., 2014), the dangers of which have been highlighted by recent deaths associated with the transfer of antimicrobial resistance genes (Green et al., 2021). However, the ongoing development of "stool banks" and standardised synthetic microbiomes, alongside regulatory frameworks and guidelines for FMT globally, is promising (Cammarota et al., 2019), and points towards a future where FMT will be more accessible.
Compounding these regulatory and safety concerns, there is very little consensus regarding dosing strategies for biotics or FMT. There is an urgent need for the uptake of more rigorous phase I dose escalation studies to optimise the efficacy of microbiome-targeting therapeutics. This must be based on a strong foundation of preclinical development using standardised methodologies and reporting requirements to enhance reproducibility and rigour (Gheorghe et al., 2021; Secombe et al., 2021). Such approaches have been used to inform the safety and tolerability, whilst also informing appropriate doses for adequate colonisation, of live biotherapeutic products consisting of defined consortia of bacterial isolates (Dsouza et al., 2022). As the lack of efficacy of microbiome-modulating treatments in many clinical trials may be an artefact of improper dosing regimens, such dose-response studies would be of great utility to the field and may bring us closer to consistent and translatable treatment options.
Another challenge within the field is the complex interaction of host indigenous microbes with the incoming microbiome-modulating therapies. The success of incoming taxa is dependent on existing community structure, an aspect of community assembly known as "priority effects" (Debray et al., 2022; Ojima et al., 2022). Resident microbes can influence the bioavailability, and therefore the efficacy, of incoming therapeutics via direct and indirect mechanisms (Cussotto et al., 2021; Scher et al., 2020; Zitvogel et al., 2018). As such, baseline microbiome compositions may influence the success of microbiome-modulating therapies, such as antibiotics and probiotics (Rashidi et al., 2021; Zmora et al., 2018), and dietary interventions (Klimenko et al., 2022). Interestingly, recipient gut microbiome composition has been identified as a greater determinant of FMT strain colonisation than donor factors, which is counterintuitive to the "super-donor" hypothesis (Schmidt et al., 2022). Such findings suggest that microbiomes may be leveraged for the development of personalised medicine strategies in the future.

Development of new and more affordable techniques
Continuing to develop our understanding of microbiomes, in terms of both composition and function, is paramount in our quest to improve human health. To achieve this, ongoing improvements and advancements of current techniques are necessary to obtain deeper and more accurate data sets, and to enable translational progress. Recently, new and more affordable options have been developed for long-read sequencing through Pacific Biosciences (PacBio), Oxford Nanopore and other commercial technologies (Albertsen, 2023). While beneficial for the quality of data, the availability of multiple sequencing options does introduce challenges when pooling data across studies and across different sequencing eras.
Over the last decade, short-read methods have been instrumental in enabling some major microbiome-related discoveries (Dogra et al., 2021). However, the use of only one or two hypervariable regions of the 16S rRNA gene limits taxonomic resolution, largely to the family or genus level. While these technologies are useful for hypothesis generation, they are limited in their ability to elucidate changes in a community. For instance, if treatment with a certain drug causes one species of a particular genus to be eradicated, while another species of the same genus increases in abundance, the drug will appear to have no effect at the genus level. Long-read sequencing platforms, which can sequence the whole 16S rRNA gene rather than just specific regions, can give greater resolution to the species level, although some genera cannot be differentiated based on their 16S rRNA gene. These technologies are now as affordable and easy to use as short-read technologies; however, uptake within the field has been slow. The utility of long-read sequencing has been explored, and this method has shown advantages in being able to cover regions of the genome, such as mobile and highly conserved sequences, that have previously been difficult to measure using other methods (Kim et al., 2022). Sequencing whole bacterial genomes using whole genome shotgun sequencing generates massive amounts of high-quality data that can provide strain-level resolution, in addition to providing information on the functional potential of bacteria. Whole genome sequencing can outperform amplicon sequencing approaches in price per genome and quality (Albertsen, 2023), and uptake of these techniques is steadily increasing. Importantly, the use of genome sequences allows researchers to go beyond composition, towards an understanding of the potential function of a community, in addition to producing data on all other microbes present in a habitat, not just bacteria. Additionally, other techniques such as target-enriched long-read sequencing (TELSeq) can be used for better detection of antimicrobial resistance genes, including those of low abundance, filling a critical gap in the ability to advance understanding of antimicrobial resistance, with potential applications in public health and clinical decision-making (Slizovskiy et al., 2022).
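The genus-level masking described in the drug example above can be illustrated with a minimal sketch. The species names and read counts below are invented purely for illustration and are not drawn from any study; the point is only that summing species within a genus can hide opposing species-level shifts.

```python
# Hypothetical read counts keyed by (genus, species), before and after a drug.
# All names and numbers are illustrative assumptions, not real data.
before = {
    ("Bacteroides", "species_1"): 400,
    ("Bacteroides", "species_2"): 100,
    ("Prevotella", "species_3"): 500,
}
after = {
    ("Bacteroides", "species_1"): 0,    # eradicated by the drug
    ("Bacteroides", "species_2"): 500,  # expands within the same genus
    ("Prevotella", "species_3"): 500,   # unchanged
}


def collapse_to_genus(counts):
    """Sum species-level counts within each genus (genus-level resolution)."""
    genus_counts = {}
    for (genus, _species), n in counts.items():
        genus_counts[genus] = genus_counts.get(genus, 0) + n
    return genus_counts


genus_before = collapse_to_genus(before)
genus_after = collapse_to_genus(after)

# Taxa whose counts differ between the two time points at each resolution.
species_changed = [key for key in before if before[key] != after[key]]
genus_changed = [g for g in genus_before if genus_before[g] != genus_after[g]]

print("Species that changed:", species_changed)
print("Genera that changed:", genus_changed)
```

In this toy case two species change markedly, yet no genus-level total differs, so a genus-resolution analysis would report no effect of the drug.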
Both amplicon and whole-genome sequencing techniques have become the primary methods used in the field to characterise a microbial community. However, whilst they have been revolutionary, these technologies are not without a number of shortcomings. DNA sequencing techniques are unable to differentiate between viable and non-viable cells, as both contain DNA, making it impossible to distinguish living and dead components within a microbiome sample (Emerson et al., 2017). There are also variations in 16S rRNA gene copy number between (and even within) different bacterial species (Vetrovsky and Baldrian, 2013). This variation introduces bias into bacterial abundance estimates, which is then further exaggerated during the amplification process. Indeed, perhaps one of the greatest issues in the field is the use of relative abundance in place of absolute abundance. The absolute abundance of each taxon is biologically relevant, yet data generated by amplicon sequencing are interpreted in terms of relative abundance due to the compositional nature of the data (Gloor et al., 2017). In a compositional dataset, an increase in one taxon's abundance by definition causes a compensating decrease in the relative abundance of the other taxa. This issue inherently leads to a high rate of false positives in differential abundance analyses (Barlow et al., 2020; Hawinkel et al., 2019; Weiss et al., 2017). Additionally, calculation of relative abundance does not accurately determine the direction or magnitude of change (i.e., did taxon A increase and taxon B decrease, or did they both increase, just at different magnitudes), which may further be hindering the replicability and translatability of microbiome research (Barlow et al., 2020). Thus, despite these advancements, a "back to basics" approach may help to resolve some of these sequencing limitations. Affordable and simple techniques such as culture and quantitative PCR (using a single copy target) address many of these issues and can be layered on top of sequencing data to bolster findings and verify interpretation. Further, new statistical methods are being developed to allow integration of data generated from such methods with sequencing data, such as modelling taxon-specific efficiencies, which will facilitate more accurate and meaningful findings (Williamson et al., 2022).
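The compositional effect described above can be made concrete with a small worked sketch. The taxon names, abundances, and the `to_absolute` helper are hypothetical, and the example assumes, for simplicity, that sequencing proportions are unbiased (it ignores 16S copy number variation and amplification efficiency). It shows how a taxon that does not change in absolute terms can appear to decline in relative terms, and how an independent measure of total load, such as quantitative PCR, could anchor the proportions.

```python
def relative_abundance(counts):
    """Convert abundances to proportions that sum to 1 (compositional data)."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def to_absolute(proportions, total_load):
    """Rescale proportions by an independently measured total load.

    total_load stands in for a qPCR-style estimate of total cells;
    treating it as exact is an assumption made for illustration.
    """
    return {taxon: p * total_load for taxon, p in proportions.items()}


# Hypothetical absolute abundances (arbitrary units, e.g. cells per gram).
absolute_before = {"taxon_A": 1000, "taxon_B": 1000}
absolute_after = {"taxon_A": 3000, "taxon_B": 1000}  # A triples, B unchanged

rel_before = relative_abundance(absolute_before)
rel_after = relative_abundance(absolute_after)

# In relative terms taxon_B appears to halve, although it did not change.
print("taxon_B relative:", rel_before["taxon_B"], "->", rel_after["taxon_B"])

# Anchoring the proportions to the measured total load recovers the
# absolute picture: taxon_B is stable and only taxon_A has increased.
recovered_after = to_absolute(rel_after, total_load=4000)
print("taxon_B absolute after:", recovered_after["taxon_B"])
```

A differential abundance test run on the proportions alone would flag taxon_B as decreased, which is the kind of false positive the text describes; the total-load anchor is what distinguishes "B decreased" from "A increased".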
Novel techniques with enhanced sequencing potential are exciting and promise new discoveries in our field and beyond. Yet, we must be cautious of existing microbiome data becoming redundant alongside the sequencing techniques used to produce them. Pooling of data from old and new technologies cannot always be achieved, so long-term storage of sample data may be important to ensure the ability to re-analyse sequences against updated reference databases. Furthermore, the need for adequately skilled bioinformaticians to analyse data from these evolving techniques must be front of mind to ensure appropriate handling of data and meaningful interpretation (Clavel et al., 2022).

Large cohort studies with machine learning/predictive modelling
Understanding microbiome-phenotype associations, or how microbiomes influence or implicate host physiological states, underlies the promise of providing microbiome-based solutions in personalised medicine. Evaluating the complexities of host-microbiome interactions requires the integration of several layers of information, often in the form of "omics" datasets, as well as environmental parameters and host-associated characteristics. Extracting knowledge to make accurate predictions from these complex datasets has been boosted by the implementation of advanced statistical methods such as machine learning-based predictive tools.
In the microbiome field, machine-learning algorithms have been used in different facets, including microbial systematics (Parks et al., 2018; Rinke et al., 2021), antimicrobial peptide prediction (Ma et al., 2022), and viral discovery (Nayfach et al., 2021; Neri et al., 2022). However, the most popular implementation is for the identification of microbiome-related attributes that can be used as markers for early diagnosis of human diseases, prediction of treatment response, and disease progression trajectories. Unsupervised methods in the form of cluster analysis have provided new insights into the ecology of the human gut microbiome by proposing that the gut microbiome is structured in stable states or enterotypes, which represent distinct taxonomic and functional entities (Arumugam et al., 2011). Enterotypes have allowed the stratification of individuals based on their gut microbiome and have been associated with various host parameters, prediction of treatment response, as well as prediction of current gut microbiome composition based on historical factors (Costea et al., 2018; Qiao et al., 2023; Si et al., 2022; Valles-Colomer et al., 2023). Supervised learning approaches have also been extensively used, demonstrating, for example, that the gut microbiome contributes to glycemic responses to diet (Rein et al., 2022; Zeevi et al., 2015) and can be used to predict incidence of several disease types (Caussy et al., 2019; Kartal et al., 2022; Ruuskanen et al., 2022; Wirbel et al., 2019). A third machine-learning paradigm known as reinforcement learning has also been used, for example, to identify biomarkers of obesity (Liu et al., 2022b). Additionally, in a step towards achieving causality within microbiome research, Mendelian randomisation has been used to identify taxonomic and functional modules of microbiomes causally linked to human pathology and modulation of metabolic traits (Liu et al., 2023; Liu et al., 2022a; Moitinho-Silva et al., 2022; Sanna et al., 2019). Models predicting causal effects of the gut microbiome on host phenotype pave the way for developing new treatment options to limit exposure to gut microbiome-related risk factors.
With sequencing technologies and metabolic phenotyping becoming more affordable for researchers, we envisage that machine learning algorithms will be pivotal in the analysis of future cohort studies and randomised clinical trials to inform microbiome precision medicine strategies. However, it is crucial that generated models provide robust, interpretable, and reproducible evidence for microbiome-mediated clinical phenotypes, and, to that end, some considerations must be made before embarking on using machine-learning algorithms. Machine learning models are often predictive, which is not always applicable to all research questions, and they do not necessarily model the full complexity of microbiomes. Cautious application and statistical support are required to prevent inaccurate findings that will further exacerbate the lack of reproducibility within the field (Volovici et al., 2022). Technical expertise is also required to ensure the complex methodological assumptions and requirements that underpin machine learning techniques are adhered to in order to maintain the scientific accuracy and integrity of the results (Teschendorff, 2019). Fortunately, there are ongoing efforts to avoid common pitfalls associated with machine learning, including the development of machine-learning reporting guidelines, as well as machine-learning frameworks and data repositories specifically created for microbiome research (Salim et al., 2023; Stevens et al., 2020). Minimising the risk of bias, increasing reproducibility, and promoting the construction of large learning repositories as well as data sharing through multi-institutional collaborative efforts, will help to build better machine-learning models to quickly progress towards harnessing microbiomes to benefit our health.

'Cut the crap' from microbiome research
Despite the exponential increase in the interest in human microbiomes and the quantity of research outputs across the scientific literature, there remain relatively few well-replicated findings. Methodological variability must explain a proportion of the heterogeneity of findings, as highlighted by authors of systematic reviews from a number of microbiome sub-fields, from psychiatry (McGuinness et al., 2022) to cardiovascular disease (Attaye et al., 2022). In microbiome research, the protocols, procedures, and materials used across the pipeline provide a particularly large number of sources of heterogeneity, each with separate and interacting impacts on the results (Nearing et al., 2022). These sources are in addition to the roles of study design, selection of participants, the measurement of numerous covariates and potentially confounding factors as previously discussed, methodological reporting, and the interpretations of statistical outputs that are largely unfamiliar to many. Given these factors and the overall microbiome hype, publication bias within the field seems likely (Secombe et al., 2021; Walter et al., 2020).
The importance of methodology standards for quality and consistency has been widely acknowledged. The International Human Microbiome Standards project (http://www.microbiome-standards.org/, 2011-2015) was funded by the European Commission to develop standards and guidelines for microbiome-related methodologies, and has published protocols for the collection, processing, sequencing, and analysis of human microbiome samples. Some similar undertakings include the UK National Institute for Biological Standards and Control reagent reference standards (Amos et al., 2020) and the Quadram Institute's "Best Practice in Microbiome Research" protocols (https://quadram.ac.uk/best-practice-in-microbiome-research/), as well as independent efforts (Bharti and Grimm, 2021). Unfortunately, it is unclear whether any group has taken the mantle internationally to provide updates that are comprehensive and keep pace with the rapid development and innovation in methods.
In the absence of static consensus regarding methods to use, a large consortium of microbiome researchers contributed to the Strengthening The Organization and Reporting of Microbiome Studies (STORMS) Checklist (Mirzayi et al., 2021), which provides a minimum list of elements required for the adequate reporting of microbiome-related methods and results within publications. The overarching aim of this checklist is to enhance the transparency and reproducibility of microbiome research, rather than guiding methodological choices. Indeed, it is unlikely that there is one perfect methodological approach that is free from bias and both logistically and ethically feasible in all circumstances; therefore, multiple lines of high-quality evidence are important to guide our understanding. Promisingly, the STORMS checklist has been adopted by Nature Publishing Group and Gigascience Press and may become an industry standard, in a similar fashion to the way in which the STROBE and PRISMA checklists have become standards for observational studies and systematic reviews, respectively.
Whilst strides are being made to improve reporting, next steps may include improving the understanding and interpretation of microbiome studies across fields where statistical and scientific terminologies may differ, as well as for clinicians and public health practitioners interpreting microbiome studies. Considering the current challenges and limitations within the field, caution must guide the clinical interpretation of observational and intervention microbiome studies. An understanding of the issues regarding generalisability of findings is crucial, particularly in light of the myriad confounding factors, such as medication use and diet, that may impact microbiome compositions and intervention efficacy. To this end, the consideration of microbiomes within evidence-based medicine is likely premature, but an exciting future prospect.
Another safeguard against the reproducibility crisis and publication biases experienced in other domains (as well as the microbiome field) is pre-registration (https://aspredicted.org/; Baker, 2016; Nosek et al., 2018; Warmbrunn et al., 2022). As discussed above, every methodological decision, from sample storage method to multiple comparison correction where performed (hopefully every time), will affect the final results and interpretation. Pre-registration need not be restrictive (https://www.cos.io/blog/preregistration-plan-not-prison); updating is possible, particularly on the basis of evolving methods and unforeseen changes in a study (e.g., smaller than expected sample size). In this critical establishment phase of the field, let us ensure that the foundational "truths" we discover hold up to the best methods available from the history of science more broadly. As additional motivation, badges that enable journals to publicly acknowledge pre-registration are available to add to publications (https://www.cos.io/initiatives/badges).
The increasing support for open science, and widespread availability of data hosting and sharing platforms and standards, will also help with reproducibility, particularly if raw data are made available so that pipelines can be replicated. Data availability is also critical for meta-analyses since microbial pipelines need to be harmonised to quantitatively compare and synthesise findings across studies. The former coordinator of the Human Microbiome Project called for formal management of microbiome data sharing, which requires investment in scientific infrastructure to enable all the determinants of reproducibility, not just for the research itself. There are numerous microbiome centres around the world, and cooperation in a common goal, such as reproducibility, would be a rising tide that floats all boats. Whilst national (e.g., Canadian Microbiome Initiative; HMP; Alliance Promotion Microbiote) and international (IHMC, and MicrobiomeSupport, funded by European Union's Horizon 2020 research and innovation programme) efforts exist, a global coalition of them would be greater than the sum of its parts.
Closer to the ground, how should we, as microbiome scientists, ensure that everyone is sufficiently expert in the requisite laboratory, bioinformatic, and statistical methods of microbiomes? Indeed, what is a "microbiome scientist"? One of the major strengths and interests of the field of microbiome research is that it is fundamentally interdisciplinary and draws people from across a range of areas of expertise. It is surely impossible (and inefficient) for each individual to be an expert in all microbiome-related methods, from sample collection to analytical interpretation, but providing educational opportunities from many angles will only improve standards for all.

Key take-home messages for the future of microbiome science
Microbiome science is an exciting field that is still in its infancy. The technological breakthroughs in recent decades, including amplicon and whole genome sequencing approaches, have led to the exponential growth of this nascent field. Such a novel and dynamic field attracts researchers from a range of disciplines, many of whom may be oblivious (at least initially) to the many technical challenges and pitfalls in the field. Some of the challenges outlined in this article are somewhat unique to the microbiome field; however, other challenges around data reproducibility and robustness are relevant to many other fields of science. Maintenance of internationally agreed-upon reporting standards, and rigorous approaches to all aspects of research design, implementation, analysis, and statistics, can help lift the field and accelerate the discovery of novel therapeutic approaches. The future of microbiome science is brimming with opportunities for prevention, early diagnosis, and treatment for complex human diseases. By balancing hype with hope, robust microbiome science may lead to a clinical revolution in the prevention and management of human disease.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. AJ McGuinness is funded through the NHMRC supported Centre for Research Excellence for the Development of Innovative Therapies (CREDIT CRE). CSM Cowan is supported by a National Health and Medical Research Council Investigator Grant (APP1196783). HR Wardill is supported by the Hospital Research Foundation Group. M Snelson is supported by a National Heart Foundation Postdoctoral Fellowship (106698). LF Stinson is supported by the Stan Perron Charitable Foundation and an unrestricted grant provided by Medela AG, administered through The University of Western Australia. AJ Hannan is supported by an NHMRC Ideas Grant and EU-JPND Grant.