MetaMIS: a metagenomic microbial interaction simulator based on microbial community profiles

Background The complexity and dynamics of microbial communities are major factors in the ecology of a system. With the NGS technique, metagenomics data provides a new way to explore microbial interactions. Lotka-Volterra models, which have been widely used to infer animal interactions in dynamic systems, have recently been applied to the analysis of metagenomic data. Results In this paper, we present the Lotka-Volterra model based tool, the Metagenomic Microbial Interacticon Simulator (MetaMIS), which is designed to analyze the time series data of microbial community profiles. MetaMIS first infers underlying microbial interactions from abundance tables for operational taxonomic units (OTUs) and then interprets interaction networks using the Lotka-Volterra model. We also embed a Bray-Curtis dissimilarity method in MetaMIS in order to evaluate the similarity to biological reality. MetaMIS is designed to tolerate a high level of missing data, and can estimate interaction information without the influence of rare microbes. For each interaction network, MetaMIS systematically examines interaction patterns (such as mutualism or competition) and refines the biotic role within microbes. As a case study, we collect a human male fecal microbiome and show that Micrococcaceae, a relatively low abundance OTU, is highly connected with 13 dominant OTUs and seems to play a critical role. MetaMIS is able to organize multiple interaction networks into a consensus network for comparative studies; thus we as a case study have also identified a consensus interaction network between female and male fecal microbiomes. Conclusions MetaMIS provides an efficient and user-friendly platform that may reveal new insights into metagenomics data. MetaMIS is freely available at: https://sourceforge.net/projects/metamis/. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1359-0) contains supplementary material, which is available to authorized users.


Data format
MetaMIS accepts a temporal microbial abundance profile as input in which microbial communities should be obtained in a series of temporal samples. A dataset with long time lapses between consecutive data points may have potential problem, the sparse data may cause false results. To simplify the process of data input, we have defined the input file format as follows. First, a temporal microbial abundance profile needs to be processed in advance according to a user-defined taxonomic level.
The data must be in a tab separated text format, described as follows (Fig. 1).
1. OTU identifiers are listed in the first column. The first cell can be empty.
Duplicated OTU identifiers are not allowed.
2. The first row is reserved for time-series sample identifiers that must be numeric; and the remaining spaces should be filled with the microbial abundance values, a numeric matrix.
3. Reads or relative abundance tables are both acceptable. Figure 1 Data input format.

Test data:
1. Data description: A gut meategnomic study containing temporal microbiota from a male and a female was used [1]. Using greengenes taxonomy, the number of total taxa assigned at family level were 92 and 69 for male and female microbiomes. The two datasets were further checked by gene copy number correction program [2]. There are two main blocks in MetaMIS (Fig. 2).
1. Data loading and preprocess blocks.

Pipeline for data flow and parameter selection:
A typical analysis workflow may contain three steps: uploading of formulated data file(s) (Step 1: File selection), specification of the parameters (Step 2: Parameter selection), and performing the model calculations. Input parameters are optional for users or can be set by default.
Select「Open」to open a processed dataset, e.g. F_gut.mat. Figure 3 The exemplification of creating a new project.

Step 2: Parameter selection
⊙Microbial abundance: 「Raw abundance」: Time-series samples retain the original input reads.
「Relative abundance」: All samples are adjusted to the common total reads. ⊙OTU ranking strategy: 「Abundance」: OTUs are ranked based on the ratio of average abundance across samples to the total abundance. The default setting of the ratio is >1% for high abundance, <0.1% for rare, and the remaining for low abundance non-rare OTUs. However, the minimum number of  Using generalized Lotka-Volterra equations [3], the inferred microbial interactions can be sued to regenerate abundance profile. In this regeneration process, there are two parameters to be set.

Step 3: model calculations
「Run model」 : Press this button to perform model calculations based on Lotka-Volterra equations to get abundance-ranking interaction networks.

Example of Test data:
After F_gut.txt is imported into MetaMIS (Fig. 3), MetaMIS immediately generates the basic data description: there are 124 time-series samples, 9 high abundance, 13 low-abundance non-rare, and 47 rare OTUs in this dataset ( Fig. 4(A)).

Case 1:
While pressing the button of 「Default setting」, parameters are automatically determined as follows ( Fig. 4(A)).
•「Initial time point」is set to be the first time point.
•「Stop condition」is set to be 80% of terminal time points.

Case 2:
Pressing the button of 「Change the setting」, users can easily modify the parameters. Then, MetaMIS will update the data structure ( Fig. 4(B)). The threshold of high abundance OTUs was set to 1 in this test data, and the other parameters adopted default values. MetaMIS will automatically adjust the high abundance threshold to make sure that there are at least three high abundance OTUs in an interaction network.
Once the parameter setting is completed, please press the button 「Run model」 to manipulate the Lotka-Volterra equations for inferring microbial interactions.

Five tab panels for MetaMIS outputs
After the implement of MetaMIS on the female gut communities [1], we got totally 17 interaction networks. As shown in Figure 5, the output page contains five main tab panels to display the selected interaction network. The rightest two panels convey the systematic information from all produced interaction networks. More  Bray-Curtis dissimilarity, ranged from 0 to 1, measures the difference between the original and predicted abundance profiles. The lower the Bray-Curtis (BC) score, the more similar the two abundance profiles. In the test case, interaction network with 22 OTUs conveyed a BC score 0.184, reflecting the reliability of the generated dataset.

Panel of "Microbial interactions"
Inferred microbial interactions are displayed in two manners. The tabular structure shows the quantitative interactive relationships. The network topology is to visualize the microbial interactions (Fig. 6). Furthermore, the interaction strengths can be optionally chosen for users. There are three strength types (Fig .7) in MetaMIS. If different interaction strength type is chosen, the interaction network will be generated based on corresponding strengths.  To keep track of visual-spatial information, we provided four kinds of network topological layouts, such as circle, force, layered, and subspace for optional visualization (Fig. 8). (B) 「Arrow size」: The arrow means the direction of influence from a source OTU to a target one.
Users can fine-tune the arrow size to optimize the network visualization.
(C) 「Edge width」: The edge width is proportional to the microbial interactive strength. The larger (E) 「Specific node」: "Specific node" allowed users to partially observe the connective behaviors of a specific OTU (Fig. 9). The default setting is to show the interactive network of all nodes in a global view.

(F) 「N(MIs)」:
How many interaction pairs are selected to display in the network (Fig. 9).
(G) 「Node In/Out」: Select the option of "Node In" (or "Node Out" ), and microbial interactions will converge to (or diverge from) a specific OTU (Fig. 10).

(H) 「Interaction」:
If the positive or negative interactions are concerned, this is an optional item to change the status (Fig. 11).

Panel of "Overview of interaction pairs"
The main purpose of this panel is to identify potential key OTUs by observing the interactive behaviors between one OTU and others. If there are N OTUs in an interaction network, each OTU will have at most N-1 interaction-pair relationships.
The distribution of six interaction patterns, including mutualism (+/+), competition (-/-), parasitism/predation (+/-), commensalism (+/0), amensalism (-/0), and no effect (0/0), is provided for each OTU (Fig. 12(A)). Users can fine-tune the threshold by turning weaker interactions to zero to reveal the influence of weaker interactions on the total distribution of interaction pairs (Fig. 12(B)). The average or summation of all absolute interaction strength from each interaction pattern is illustrated in Fig.   12(C). The two-dimensional principal component analysis (PCA) plot is supported to identify the major interaction pairs of an OTU (Fig. 12(D)). Figure 12 Panel of "Overview of interaction pairs" shows the different types of interaction pairs. PCA plot is supported to reveal the key OTUs among interaction patterns.

Panel of "Systematic view"
To compare outputs from different interaction networks, a systematic perspective is shown in Fig. 13. The distribution of three interaction patterns, including mutualism (+/+), competition (-/-), and parasitism/predation (+/-), is shown in Figure 11. The meaning of symbols is illustrated bellow. For any two sequential interaction networks, one OTU may play a critical role in the system. The removal or addition of this lowest abundance OTU may influence the successful or failed outcomes. This column of "Status change" is used to record the status change between two sequential interaction networks (Table 1). Figure 13 The panel of "systematic view" to examine all interaction networks. Table 2, Lactobacillaceae did not exist in 9-OTU interaction network but in 10-OTU interaction network. The two interaction networks were denoted as success conveying BC scores, 0.1714 and 0.1715 respectively.

As exemplified in
Consequently, Lactobacillaceae would be denoted as "Not changed". On the other hand, Porphyromonadaceae was involved in 18-OTU interaction network but absent in 17-OTU interaction network. The previous one resulted in a successful outcome (BC=0.3141) but the latter was failed (BC=0). Consequently Porphyromonadaceae should be annotated as "Status change: failure -> success".

Panel of "Consensus network"
The panel of "Consensus network" is aimed at providing consensus interactions from all interaction networks (Fig. 14). For an interaction pair, Mij, there were + and − interaction networks producing positive and negative outcomes when the interactive direction was fixed. When the ratio of + to the summation of + and − was statistically significantly greater than the user-defined threshold for this study, i.e., 90% represented by P(concordant pairs)>=0.9 in this panel, we were able to conclude that this interaction relation was concordant among networks and directed positively, and vice versa. One sample z-test for proportions was used to measure the concordance of predicted interactive relations among networks. This item is used to extract an interaction subnetwork with a specific range of interaction strengths. There are two columns to specify the range of interaction strengths.
For example, if we intended to display the strongest 10% of interactions, the upper and lower bound should be "0" % and "10" % (Fig. 14). On the contrary, the weakest 10% of interactions are extracted by the settings of "90" % and "100" % ( Fig. 15).
(B) 「Layout」: Similar to the layout options in the panel of "Microbial interactions", there are four options, including circle, force, layered, or subspace, in this item. Figure 15 The optional item of "Top MIs" was set to extract the weakest 10% of interaction strengths.
[MetaMIS] 21 21 (C) 「P(concordant pairs)」: P(concordant pairs) is a threshold to determine the interaction consistency from multiple interaction networks. For example, P(concordant pairs)>=0.9 means to collect a set of interaction pairs, usually a source and a target OTUs, with 90% of interactive outcomes being consistent (Fig. 14). If the threshold is raised, fewer interactions are selected (Fig. 16). Figure 16 The optional item of "P(concordant pairs)" was set to 0.95 to increase the criteria of selecting consensus interactions.

Save to a mat file
After the implement of MetaMIS, all results and parameter settings can be saved into a .mat file, which is 「Save」on the left side of MetaMIS ( Fig. 2). This .mat file can be loaded directly into the MetaMIS via the button of 「Open」.

Export interaction tables
Exporting the inferred microbial interaction table is an easy step. Choose 「This table」 to export the selected interaction table. Choose 「All tables」 to export all interaction tables (Fig .6). There are two kinds of tab separated files to produce.
F_gut_EDGE_9.txt. Each OTU identifier has an average abundance value and a core ratio across samples, been reserved in the file of Filename_NODE_N(OTUs).txt, e.g. F_gut_NODE_9.txt.

How to import an interaction table to Gephi [4]?
Step 1: Double click the icon Step 2: Create a new project (Fig. 17) Figure 17 Create a new project in Gephi.
[MetaMIS] 23 23 Step 3: To import the interactions, please follow the indicators in the Fig. 18 Step 4: Users can follow similar steps to import F_gut_NODE_9.txt, which provides OTU information about its average abundance and core ratio among samples. OTU information may help users to understand the topological network. Figure 18 The process of importing an interaction table into Gephi.

How to import an interaction table to Cytoscape [5]?
Step 1: Double click the icon Step 2: Create a new project and follow the process in Figure 19. Figure 19 The process of importing an interaction table into Cytoscape.

Bugs or problems
Encounter bugs, problems, or have any suggestions? Please contact Grace Tzun-Wen Shaw (tzunwen@gmail.com)