Experimental infrastructure requirements for quantitative research on microbial communities

Natural microbial communities are composed of a large diversity of interacting microorganisms, each with a specific role in the functional properties of the ecosystem. The objectives of microbial ecology research are to identify, understand, and explore the role of these different microorganisms. Because of the rapidly increasing power of DNA sequencing and the rapid increase of genomic data, the main focus of microbial ecology research has shifted from cultivation-oriented studies towards metagenomic studies. Despite these efforts, the direct link between the molecular properties and the measurable changes in the functional performance of the ecosystem is often poorly documented. A quantitative understanding of functional properties in relation to the molecular changes requires effective integration, standardization, and parallelization of experiments. High-resolution functional characterization is a prerequisite for interpretation of changes in metagenomic properties, and will improve our understanding of microbial communities and facilitate their exploration for health and circular economy related objectives.


Introduction
In natural and man-made environments, microorganisms virtually never thrive as single species. Instead, they flourish as microbial communities of various complexity.
In these microbial communities each species has a specific role, and their combined effort results in an overall functional performance characterized by the catalysis of different redox reactions and an overall conversion of substrates into products. In our society we use our knowledge of these microbial communities to understand and control food fermentations, soil conditioning, host-microbe interactions, and numerous environmental engineering applications [1,2,3,4].
Microbial communities are intrinsically more complicated to investigate than single strains since they are composed of two up to thousands of different types of microorganisms. The functional performance of a microbial ecosystem (the sum of all conversions catalysed) therefore depends on the combined activity of all these microorganisms. A wide range of factors determines the complexity of microbial communities, but the most commonly described include (i) the catalysis of sequential conversions, (ii) the catalysis of parallel conversions, or (iii) the existence of (redox) gradients resulting in specific space-dependent ecological niches and associated different types of microorganisms (Figure 1). Additional complexity arises from multi-way cross-feeding on a wide variety of excreted metabolites [5], interspecies microbial fusion [6], dynamic properties of environmental ecosystems like day-night rhythms [4], and other forms of competition and cooperation. The enormous array of environmental conditions in nature has facilitated the evolution of a tremendous microbial diversity that has been found to inhabit virtually all ecological niches on earth.
Microbial ecology research aims to relate functional system dynamics to the changes in the molecular properties of the system. Changes in molecular properties may occur at various levels and time scales: (i) changes in the microbial community structure, (ii) changes in gene expression in the system as reflected in the metatranscriptome and metaproteome, and (iii) direct changes in fluxes due to metabolic flux control. A wide range of experimental tools is currently available to transfer ecological research questions to a laboratory experimental system, and to analyse both the functional and the molecular properties of the ecosystem.
The rapid development of experimental tools for conducting research on microbial communities has given rise to an intense discussion on how to approach research questions in the field of microbial ecology [2,10,11,12,13].
Here we have taken these considerations into account and analysed the current status of the different aspects of microbial ecology research. On the basis of this analysis we propose a strategy to overcome the limitations encountered in our current approach to microbial community research. The potential impact of this strategy is discussed.

Microbial ecosystems in the laboratory
Research on microbial communities in natural and full-scale engineered ecosystems is often complicated by the difficulty of functionally characterizing the conversions occurring. For instance, the operation of full-scale wastewater treatment plants cannot be changed merely to see how this affects the process performance and microbial community structure, since it may compromise the treatment performance. Similarly, the sampling and identification of substrate and product fluxes in human microbiome studies are hard to conduct. Human microbiome research therefore relies strongly on molecular systems analysis using stool samples, which yields limited insight into the actual conversions catalyzed. In general, it is undesirable or impossible to expose natural microbial ecosystems to specific changes in environmental conditions with the objective of investigating the impact on the system.
To overcome the limitations of in-vivo research on microbial ecosystems, microbial communities are transferred to the laboratory and investigated at various levels of complexity, ranging from direct measurements on environmental microbial consortia, via mesocosms and microbial enrichment studies in bioreactors, to the eventual isolation and characterization of key players from the ecosystem under investigation [14][15][16]. Depending on the research question at hand, choices are made on the required degree of simplification of the ecosystem, taking into account that the experimental resolution increases as system complexity decreases.
Typically, the basic tool for mimicking specific environmental conditions in the laboratory is the bioreactor, ranging in complexity from simple batch bottles to high-tech continuous bioreactors equipped with on-line measuring facilities. In these bioreactors the functional response of a microbial community can be analysed and, when combined with appropriate molecular tools for biomass characterization, related to the molecular changes in the microbial community. Integration of these experimental data allows for development of quantitative system-based models for relating functional to molecular changes. This integrated approach generates improved knowledge and understanding of the system at hand, and allows for identification of the impact of changes in environmental conditions on the system. Eventually this enables the understanding and prediction of changes in the natural or man-made ecosystem the experiment aims to mimic. A schematic representation of a typical sequence of events in laboratory research on microbial communities is shown in Figure 2.
In the next sections we will elaborate on the recent developments in both functional and molecular characterization of microbial communities in order to identify fundamental shortcomings and opportunities.

Functional system characterization using process dynamics and on-line measurements
Figure 1: … [7], parallel nitrate reduction to dinitrogen and ammonium (middle) [8], and sequential conversions in the two-step nitrification process (right) [9]. Size bars from left to right indicate 100, 10, and 50 µm.
Figure 2: Typical work flow for microbial community research: (1) a research question originating from any microbial ecosystem is translated to a laboratory cultivation experiment (2), that allows for identification of the response of the microbial community to changes in environmental conditions (3), and understanding of the molecular drivers responsible for the response observed (4).
Detailed functional characterization of microbial processes concerns the identification and quantification of the redox reactions catalysed in a microbial ecosystem in relation to the development of the biomass concentration and composition. Measurement of these variables enables the identification of biomass specific fluxes in relation to the thermodynamic driving forces. One of the major challenges in the identification of biomass specific fluxes is the dependency of quantitative data on the microbial community structure, which will be discussed in the next section on microbial community analysis. Here we will discuss the added value of three developments in functional characterization of microbial communities: (i) on-line rate measurements, (ii) process dynamics, and (iii) uncoupling of solid and liquid retention times:
1. On-line off-gas composition measurements combined with the supply of an inert gas (e.g. dinitrogen or argon gas) enable the identification of process rates at each moment in time. This approach has been used to identify different competitive strategies in pulse-fed aerobic bioreactors (Figure 3) [17,18]. Off-gas oxygen concentration measurements enabled on-line oxygen respiration rate measurements, providing detailed insight into the process when linked to other on-line measurements such as the off-gas carbon dioxide concentration and the acid/base dosage rate for pH control. In the example shown in Figure 3, the oxygen respiration rates in three comparable systems were shown to be significantly different, corresponding to clearly different ecological strategies [18]. It is furthermore evident that the information density obtained from the on-line oxygen uptake rate measurements (Figure 3b) is significantly higher than that obtained from the off-line substrate concentration measurements (Figure 3a). The functional dynamics of a system during long-term cultivation studies can subsequently be analysed through definition of key variables that can be identified from the on-line measurements for each operational cycle. In the example shown in Figure 3 we used (i) the length of the period of substrate presence, (ii) the oxygen uptake before and after substrate depletion, and (iii) the increase in oxygen uptake rate during the presence of substrate as key variables for analysing the development of the functional properties of the system over a period of more than 100 generations [18].
2. The high information density of the on-line rate measurements described under point 1 only holds true if some dynamics in operation of the process are established. Continuous Stirred Tank Reactors (CSTR) operated at a constant dilution rate are typically substrate limited, and even though the extent of substrate limitation may vary, this will hardly be reflected in the conversion rates observed. This means that even though major changes in the actual stoichiometric and kinetic capacities of the microbial community may occur, they cannot be identified from standard measurements. Hence, some form of operational dynamics needs to be implemented in order to increase the information density of the generated data. This can be established in pulse-fed experiments, as used in the experiment described under point 1, or through a periodic increase of the dilution rate, for example. Combined with on-line rate measurements, the dependency of the fluxes in the process can be identified as a function of the actual process conditions during these pulses, and the long-term functional properties of the system can effectively be monitored [19][20][21].
3. A third important aspect of current microbial community cultivation methods is the possibility to uncouple solid (biomass) and liquid retention times in the process through application of a solid-liquid separating membrane in the reactor outlet. Whereas in the past solid retention in laboratory bioreactors was achieved through periodic settling or through formation of biofilms, solid-liquid separating membranes provide a more controlled method for solid retention in the process. Membrane bioreactors enable experiments at the relatively low substrate and product concentrations and low growth rates often encountered in natural environments, while maintaining well-measurable biomass concentrations [22,23].
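The on-line rate calculation described under point 1 above can be illustrated with a minimal sketch. It assumes dry gas, a well-mixed gas phase at quasi-steady state, and an inert gas that is neither produced nor consumed, so that the inert balance fixes the unknown off-gas flow. The function name, argument names, and all numbers are illustrative assumptions and are not taken from the studies cited.

```python
def off_gas_rates(flow_in, y_in, y_out, volume):
    """Estimate oxygen uptake and CO2 evolution rates from off-gas data.

    flow_in : molar gas flow entering the reactor (mol/h)
    y_in    : inlet mole fractions, keys "inert", "O2", "CO2"
    y_out   : off-gas mole fractions, same keys
    volume  : liquid volume (L)
    Returns (OUR, CER) in mol/(L*h).
    """
    # Inert balance: the inert gas that enters must leave,
    # which gives the off-gas flow without measuring it directly.
    flow_out = flow_in * y_in["inert"] / y_out["inert"]
    our = (flow_in * y_in["O2"] - flow_out * y_out["O2"]) / volume
    cer = (flow_out * y_out["CO2"] - flow_in * y_in["CO2"]) / volume
    return our, cer


# Hypothetical readings at one moment in time (not measured data).
gas_in = {"inert": 0.79, "O2": 0.21, "CO2": 0.00}
gas_out = {"inert": 0.80, "O2": 0.19, "CO2": 0.01}

our, cer = off_gas_rates(flow_in=1.0, y_in=gas_in, y_out=gas_out, volume=2.0)
print(f"OUR = {our:.5f} mol/(L*h), CER = {cer:.5f} mol/(L*h)")
```

Applying the same function to every off-gas reading would yield a rate profile over each operational cycle, from which key variables such as the length of the period of substrate presence could then be derived.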

Microbial community structure and function analysis
As described in the previous section, current methods enable us to identify accurately the overall process stoichiometry and rates in laboratory microbial ecosystems. This includes the redox reactions catalysed as well as the total biomass produced in the process. One of the key remaining challenges in functional characterisation of a microbial community is related to the biomass composition in the process. In order to identify biomass specific fluxes (so-called q-rates [24]), a quantitative description of the microbial community structure in terms of functional groups of microorganisms is required. Only when the community structure is known and fluxes can be attributed to specific groups of microorganisms can we claim to understand the system and investigate how specific experimental variables affect both the conversions and the microbial community.
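To make the dependency of q-rates on the community structure concrete, the following minimal sketch divides a volumetric conversion rate by the biomass concentration of the functional group assumed to catalyse it. The function name, units, and numbers are hypothetical and serve only to show how an error in the estimated group fraction propagates directly into the biomass specific rate.

```python
def biomass_specific_rate(volumetric_rate, biomass_conc, group_fraction):
    """Biomass specific rate (q-rate) of one functional group.

    volumetric_rate : conversion rate attributed to the group (mol/(L*h))
    biomass_conc    : total biomass concentration (g/L)
    group_fraction  : fraction of total biomass belonging to the group
    Returns the q-rate in mol/(g*h).
    """
    if not 0.0 < group_fraction <= 1.0:
        raise ValueError("group_fraction must be in the interval (0, 1]")
    return volumetric_rate / (biomass_conc * group_fraction)


# Hypothetical example: the same measured rate and total biomass,
# but two different estimates of the functional group's share.
rate = 2.0e-3      # mol/(L*h), assumed
biomass = 1.0      # g/L, assumed
q_if_20_percent = biomass_specific_rate(rate, biomass, 0.20)
q_if_2_percent = biomass_specific_rate(rate, biomass, 0.02)
print(q_if_20_percent, q_if_2_percent)
```

A tenfold error in the estimated fraction gives a tenfold error in the q-rate, which is why the community structure has to be quantified in terms of biomass rather than gene copies.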
The currently most widely applied method for identifying the microbial community structure is based on high-throughput sequence analysis of the 16S ribosomal RNA (rRNA) gene pool that is PCR-amplified from biomass DNA. This method typically generates compositional data, that is, information on the presence of specific microbial taxa as well as their relative abundance in the community. However, such compositional data do not allow absolute quantification of the population sizes of different microbial taxa. Furthermore, a number of technical biases related to DNA-extraction efficiency, PCR specificity, variations in copy numbers of the targeted gene, and cell size have been identified. Consequently, measured community structure data can differ by orders of magnitude from the actual microbial community composition in terms of the distribution of protein or cell dry weight in a sample. The quantitative representation of the community structure (typically in bar charts) can be misleading since it suggests insight into the actual community structure, even though the biases are such that this cannot be claimed [25][26][27][28]. The key value of 16S rRNA gene amplicon sequencing lies in the determination of relative changes in microbial community structure over time, which can be analysed adequately if experimental procedures are applied consistently.
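One of the biases named above, variation in 16S rRNA gene copy number, can be illustrated with an idealized sketch. It assumes that read fractions are exactly proportional to the number of gene copies in the sample and deliberately ignores extraction, PCR, and cell size effects. The taxon names and copy numbers are hypothetical.

```python
def amplicon_fractions(cell_counts, copies_per_cell):
    """Idealized read fractions: proportional to cells x gene copies per cell."""
    templates = {t: cell_counts[t] * copies_per_cell[t] for t in cell_counts}
    total = sum(templates.values())
    return {t: n / total for t, n in templates.items()}


def copy_number_corrected(read_fractions, copies_per_cell):
    """Rescale read fractions by copy number to estimate cell fractions."""
    per_cell = {t: f / copies_per_cell[t] for t, f in read_fractions.items()}
    total = sum(per_cell.values())
    return {t: v / total for t, v in per_cell.items()}


# Two hypothetical taxa with equal cell numbers but different copy numbers.
cells = {"taxon_A": 1000, "taxon_B": 1000}
copies = {"taxon_A": 1, "taxon_B": 10}

reads = amplicon_fractions(cells, copies)
corrected = copy_number_corrected(reads, copies)
for taxon in cells:
    print(f"{taxon}: reads {reads[taxon]:.3f}, corrected {corrected[taxon]:.3f}")
```

Even in this idealized case the uncorrected read fractions misrepresent a community of equal cell numbers, and the correction only works when the copy numbers are known. The corrected values are still relative abundances, so they say nothing about absolute population sizes.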
Modified experimental procedures such as the inclusion of internal standards may contribute to absolute quantification of the microbial community structure [29]. Combination of amplicon sequencing data with more quantitative methods like fluorescent in situ hybridization (FISH, see Figure 1), quantitative real-time PCR, or flow cytometry has been proposed to obtain more quantitative community structure data [30][31][32][33]. It should be noted, however, that these methods are also not without bias, as has been shown by comparing quantitative microbiome profiles obtained by qPCR and flow cytometry [34]. Furthermore, current metagenomic and/or metaproteomic methods are potentially less biased than microbial community structure analysis based on 16S rRNA gene amplicon sequencing, and together with metatranscriptomic analysis, these methods provide additional insights into functional capacity and activity [2,35]. Overall, the quantitative coupling of measurable fluxes to specific guilds of microorganisms remains challenging.
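The general principle behind an internal standard can be sketched as follows: a known number of copies of a foreign template is added before DNA extraction, and read counts are scaled by the copies-per-read ratio observed for that standard. This is a generic illustration and not the procedure of reference [29]. It assumes that the standard is extracted and amplified with the same efficiency as native templates, and all names and numbers are hypothetical.

```python
def absolute_abundance(read_counts, standard_id, standard_copies_added):
    """Scale read counts to gene copies using a spiked internal standard.

    read_counts           : dict of taxon -> number of reads (incl. standard)
    standard_id           : key of the internal standard in read_counts
    standard_copies_added : gene copies of the standard added to the sample
    Returns dict of taxon -> estimated gene copies (standard excluded).
    """
    standard_reads = read_counts[standard_id]
    if standard_reads <= 0:
        raise ValueError("internal standard was not detected")
    copies_per_read = standard_copies_added / standard_reads
    return {
        taxon: reads * copies_per_read
        for taxon, reads in read_counts.items()
        if taxon != standard_id
    }


# Two hypothetical samples with the same A:B read ratio (2:1)
# but a different share of reads going to the standard.
sample_1 = {"taxon_A": 6000, "taxon_B": 3000, "spike": 1000}
sample_2 = {"taxon_A": 3000, "taxon_B": 1500, "spike": 5000}

abs_1 = absolute_abundance(sample_1, "spike", standard_copies_added=1.0e6)
abs_2 = absolute_abundance(sample_2, "spike", standard_copies_added=1.0e6)
print(abs_1)
print(abs_2)
```

Both samples show the same relative composition, yet the estimated absolute load in the second sample is tenfold lower. The result is expressed in gene copies, so converting it to cells or biomass still requires copy number and cell size information.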

Towards the integration of methods for effective and quantitative research on microbial communities
It is evident that microbial ecology research requires a wide range of fields of expertise. Nevertheless, the historic focus of research groups on individual aspects of microbial ecology has hampered the integration of the available tools that is required for an increased understanding of the microbial world. Furthermore, the financial implications of the infrastructure requirements for conducting integrated research on microbial communities suggest that effective cooperation is strictly necessary. Even though the prices for sequencing-dependent molecular analysis of microbial communities have gone down rapidly, sample preparation and data processing still require a considerable amount of labour and computing power. Thus, in order to move microbial ecological research beyond the state of the art, major investments are required.
Overall we have identified three key aspects that are required to achieve more effective and quantitative research on microbial communities:
1. Integration. Effective laboratory research on microbial communities requires integration of different fields of expertise such as microbial physiology, molecular ecology, bioprocess engineering, biochemistry, process modelling, and bioinformatics. Depending on the actual research question, a combination of the different fields of research is required. In individual research groups not all these fields of expertise are available, imposing a direct need for collaboration. For example, researchers working on detailed molecular characterisation of microbial communities do not have the same degree of expertise on functional characterisation of microbial communities in the laboratory, and vice versa. In order to focus on one's own field of expertise and the related research questions, one would want to be sure that the other aspects are dealt with in an optimized manner, which requires a significant degree of integration of research fields. In summary, effective collaboration and use of each other's expertise are prerequisites for efficient research: they allow researchers to focus on their topic of interest and to search effectively for answers to the research questions asked.
2. Multiplication. A second limitation of current research infrastructure is the scale at which experiments can be conducted. Laboratory-scale bioreactor operation is a time- and resource-consuming method for characterising microbial communities and analysing their response to different cultivation conditions. On-line rate measurements as previously discussed can reduce the effort required for conducting an experiment, but typically researchers can operate no more than one or two laboratory reactors at a time. This imposes a range of limitations: (i) only one operational variation can be investigated at a time, (ii) replicate runs are typically not conducted, and (iii) since multiple values for a specific operational variable are investigated in sequence, the starting point (i.e. the inoculum) of each experiment is different. Therefore, in order to investigate the impact of a specific operational variable one would want to increase the number of bioreactors that are operated in parallel, enabling the same starting point (inoculum) in each bioreactor. Parallel operation of more (e.g. five to ten) bioreactors combined with on-line rate measurements enables (i) high-resolution investigation of the effect of different operational variables on the system development, (ii) inclusion of replicates, (iii) use of the same inoculum (starting point) in all experiments, and (iv) inclusion of comparative molecular analysis of the development of microbial community composition and functional properties in time.
3. Standardization. A final concern in the field of microbial ecology research is the poor reproducibility of experimental results and the limited accessibility of experimental data. These issues can to a large extent be overcome by including standard protocols for (i) definition of experimental setups and protocols, (ii) data handling and storage, and (iii) adequate storage of microbial communities established in experiments. Integrated storage of (raw and processed) data on functional system properties (including media, etc.) together with molecular data makes data directly available for inter-experimental comparison, post-processing of data, and (metabolic) modelling efforts.
These three main considerations form the basis of the research infrastructure project entitled Unlock, which was granted by the Dutch Science Foundation (NWO) in the spring of 2020 and which we will be implementing in the coming years. Unlock consists of three experimental laboratories and a common data platform for investigating microbial communities at different levels of complexity:
The modular bioreactor platform, for investigating the most complex microbial communities in their environment. This lab includes specific elements like constructed wetlands, bioelectrochemical systems, and bioreactors that can be operated at high temperatures and pressures for mimicking extreme conditions microbial ecosystems can be exposed to.
The parallel cultivation platform, consisting of forty identical lab-scale bioreactors equipped with state-of-the-art on-line analytical equipment for high-resolution functional characterisation of microbial communities of variable complexity upon their exposure to a variety of operational conditions.
The biodiscovery platform, consisting of microbioreactor and microfluidics units for screening and characterising defined cocultures of microorganisms, and suspended bead-based systems for single cell genomics. A high-throughput, largely automated biomass sample processing unit will also be implemented for conducting a wide range of molecular analyses using standardized approaches.
The FAIR data platform, an open-source, scalable data storage and processing facility. The data platform will be FAIR by design to maximize data reuse, and focuses on data ownership and intellectual property rights while aiming for optimal public data availability [36].
The objective of the Unlock research infrastructure is to raise the baseline level at which integrated research on microbial communities can be conducted.

Conclusions
Microbial community research depends on the effective integration of different fields of expertise. Traditionally, research either emphasizes the development of functional properties of microbial communities or focuses on their molecular characterization and development. Even though mostly some degree of integration of both fields of expertise is established, the true integration of state-of-the-art methods and expertise on functional system characterization, molecular system characterization, and derivatives thereof including process modelling cannot be achieved within one single research project, nor by one specific research group.
To overcome these limitations, we propose that microbial ecology research should aim for integration of the different research fields, multiplication for a scale increase of the experimental facilities, and standardization of the methodologies. We are aiming for integration of experimental facilities for both functional and molecular characterization of microbial communities with data processing and storage. The objective is thereby to generate high-quality data in all aspects of microbial ecology research and to manage the generated data efficiently. Integration allows researchers to focus on their key research questions and develop their specific field of expertise, with the assurance that all other aspects of the research are dealt with by specialists in the corresponding field. We have united our forces in achieving these objectives by organizing the research infrastructure entitled Unlock (www.m-unlock.nl), which is designed to facilitate ground-breaking fundamental and applied research on microbial communities and their use in a wide range of applications.