Prediction of linear B-cell epitopes of hepatitis C virus for vaccine development

Huang, Wen-Lin; Tsai, Ming-Ju; Hsu, Kai-Ti; Wang, Jyun-Rong; Chen, Yi-Hsiung; Ho, Shinn-Ying

doi:10.1186/1755-8794-8-S4-S3

Volume 8 Supplement 4

Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Medical Genomics

Research
Open access
Published: 09 December 2015


Wen-Lin Huang¹,
Ming-Ju Tsai²,
Kai-Ti Hsu²,
Jyun-Rong Wang²,
Yi-Hsiung Chen² &
…
Shinn-Ying Ho^2,3

BMC Medical Genomics volume 8, Article number: S3 (2015)

Abstract

Background

High genetic heterogeneity in the hepatitis C virus (HCV) is the major challenge in developing an effective vaccine. Existing studies on HCV vaccines have mainly focused on the T-cell immune response. However, identifying linear B-cell epitopes that can stimulate a B-cell response is one of the major tasks of peptide-based vaccine development. Owing to the variability in B-cell epitope length, predicting B-cell epitopes is much more complex than predicting T-cell epitopes. Furthermore, the motifs of linear B-cell epitopes differ considerably between pathogens (e.g., HCV and hepatitis B virus). To cope with this challenge, this work proposes an HCV-customized, sequence-based prediction method to identify B-cell epitopes of HCV.

Results

This work establishes an experimentally verified B-cell response dataset for HCV consisting of 774 linear B-cell epitopes and 774 non B-cell epitopes from the Immune Epitope Database. An interpretable rule mining system of B-cell epitopes (IRMS-BE) is proposed to select informative physicochemical properties (PCPs) and then extract if-then rule-based knowledge for identifying B-cell epitopes. A web server, Bcell-HCV, was implemented using an SVM with the 34 informative PCPs; it achieved a training accuracy of 79.7% and a test accuracy of 70.7%, better than the other SVM-based methods for identifying B-cell epitopes of HCV and the two general-purpose methods. Further analysis of the 34 informative properties indicates that the most effective property is the alpha-helix structure of epitopes, which influences the connection between host cells and the E2 proteins of HCV. Furthermore, 12 interpretable rules are acquired from the top-five PCPs and achieve a sensitivity of 75.6% and a specificity of 71.3%. Finally, a conserved promising vaccine candidate, PDREMVLYQE, is identified for inclusion in a vaccine against HCV.

Conclusions

This work proposes an interpretable rule mining system, IRMS-BE, for extracting interpretable rules using informative physicochemical properties, and a web server, Bcell-HCV, for predicting linear B-cell epitopes of HCV. IRMS-BE may also be applied, without significant modification, to predict B-cell epitopes of other viruses, which would benefit vaccine development for these viruses. Bcell-HCV is useful for identifying B-cell epitopes of HCV antigens to help vaccine development and is available at http://e045.life.nctu.edu.tw/BcellHCV.

Background

Infection with the hepatitis C virus (HCV) often results in chronic hepatitis, liver cirrhosis, and hepatocellular carcinoma [1]. HCV presents high genetic heterogeneity [2], and HCV species are currently classified into 11 genotypes with 80 subtypes within each genotype [3]. Therefore, no vaccine is currently available [4]; however, some therapies have proven effective against some, but not all, genotypes [5]. HCV is an enveloped virus with two types of surface glycoproteins, E1 and E2. Epitopes on these two glycoproteins are targets of neutralizing antibody responses [6, 7]. Some recent approaches to vaccine development have focused on HCV envelope structures [5, 6, 8].

Previously, the development of HCV vaccines has mainly focused on the T-cell immune response [4, 9–12]. Prabdial-Sing et al. performed sequence-based in silico analysis of HCV epitopes using algorithms to predict the immunogenicity of their variants from other, less studied genotypes [13]. Li et al. found that two HLA epitopes may contribute to the design of an HCV vaccine for the Chinese population [4], and Aqsa et al. reported that glycoprotein 2 of HCV-3a is an ideal target for vaccine design [10]. Although identifying linear B-cell epitopes that can stimulate a B-cell response is one of the major tasks in designing a peptide-based vaccine, only a few studies have analyzed the B-cell immune response of HCV. Furthermore, designing a predictor for B-cell epitopes, which have highly variable lengths, is more complex than designing a predictor for T-cell epitopes [14].

On the other hand, several computational methods (Table 1) have been developed for predicting linear B-cell epitopes. These methods mainly focus on peptides of a fixed length and use them as input to various machine learning models, including the hidden Markov model (HMM), the artificial neural network (ANN), and the support vector machine (SVM) [14–19]. However, the underperformance of these general-purpose methods [20, 21] and the significantly different sequence context of HCV compared with its hepatitis B virus counterpart (Figure 1) motivate this work to develop a specific method and tool for identifying B-cell epitopes of HCV.

Table 1 Representative peptide-based methods for predicting linear B-cell epitopes.


This work retrieved an experimentally validated B-cell response dataset for HCV (BR-HCV) from the Immune Epitope Database (IEDB) [22]. To provide insight into the mechanism of B-cell epitopes of HCV and to improve prediction accuracy, an interpretable rule mining system of B-cell epitopes (IRMS-BE) is proposed. It consists of a physicochemical property (PCP) mining module, which selects informative PCPs, and a knowledge acquisition module, which extracts if-then rule-based knowledge for predicting B-cell epitopes in HCV (Figure 2). A web server, Bcell-HCV, for predicting linear B-cell epitopes in HCV was implemented using the 34 informative PCPs and yields a test accuracy of 70.7%, which is superior to that of the other SVM-based methods (66.5%) for identifying B-cell epitopes of HCV and the two general-purpose methods (49.9%). This work uses a feature knockout procedure [23] to analyze the effectiveness of the 34 PCP features in predicting antigenic epitopes in HCV. The three most important properties (AAindex IDs: GEIM800102, ISOY800107, and SNEP660101) show the same accuracy difference (5.36%) in the feature knockout procedure. Among these three properties, principal component I (AAindex ID: SNEP660101) is related to aromatic structures. The alpha-helix structure (AAindex ID: GEIM800102) plays a significant role in connecting HCV to the host cell and in facilitating HCV entry into host cells.

Furthermore, the knowledge acquisition module obtains 12 interpretable rules from the top-five PCPs, with a prediction accuracy of 73.4% and a sensitivity of 75.6%.

Finally, a conserved promising vaccine candidate, PDREMVLYQE, is identified from the top-50 B-cell epitopes of HCV for inclusion in a vaccine against HCV. The candidate is chosen by simultaneously considering the ability to induce antibodies and a neutralization range that is as broad as possible.

The benefit of IRMS-BE lies in the incorporation of informative physicochemical properties and rule-based knowledge. Future research will be aimed at extending the IRMS-BE method to the prediction of B-cell epitopes in other viruses. Also, Bcell-HCV has been implemented as a web server and is available at http://e045.life.nctu.edu.tw/BcellHCV.

Results

Performance comparison with the sequence-based methods

Using m = 34 physicochemical properties, the proposed prediction system Bcell-HCV produced training and independent test accuracies of 79.7% and 70.7%, respectively, where (C, γ) = (2², 2¹). Its training and test MCC values are 0.600 and 0.417, respectively. For comparison, four SVM-based classifiers using 531 PCP features (SVM-PCP), 20 amino acid compositions (SVM-AAC), 400 dipeptide compositions (SVM-DPC), and 8000 tripeptide compositions (SVM-TPC) were evaluated in terms of prediction accuracy in 10-fold CV using the BR-HCV dataset. The best values of the parameters γ and C in each SVM-based classifier were determined using a step-wise search over γ ∈ {2^-15, 2^-13, ..., 2¹⁶} and C ∈ {2^-15, 2^-13, ..., 2¹⁶}. As shown in Table 2, the SVM-PCP classifier achieved accuracies of 74.80% and 65.50% on the BR-HCV^Tr and BR-HCV^Te datasets, respectively, where (C, γ) = (2¹, 2^-3). Among the four SVM-based classifiers, SVM-PCP presented the second-best performance, after that of Bcell-HCV. These experimental results demonstrate that Bcell-HCV outperforms the SVM-based methods (70.7% compared with 66.5% test accuracy). Furthermore, two general-purpose methods for predicting linear B-cell epitopes, ABCpred [12] and LBtope [24], were compared with Bcell-HCV using the BR-HCV^Te dataset. The results indicate that Bcell-HCV performs better than both the SVM-based methods for B-cell epitopes of HCV and the two general-purpose methods (Table 2). To avoid threshold bias, the ROC curves are plotted using the BR-HCV^Te dataset (Figure 3).

Table 2 Prediction performance comparisons between Bcell-HCV and representative methods using the BR-HCV dataset.


Ranking the identified physicochemical properties

This work adopted a feature-knockout approach [23] to rank the effectiveness of the m = 34 physicochemical properties (PCPs) in predicting linear B-cell epitopes. The most effective PCP has the maximum accuracy difference D_j = Acc - Acc_j (j = 1, 2, ..., m) between the overall accuracy and the feature-knockout accuracy. The overall accuracy Acc is obtained using all m = 34 PCPs, ρ = {ρ₁, ρ₂, ..., ρ_m}. The feature-knockout accuracy Acc_j is obtained by employing an SVM with the m-1 PCPs that remain in ρ after ρ_j is removed. Figure 4 displays the m = 34 accuracy differences ranked in decreasing order. The top-10 PCPs are listed in Table 3, and information on all 34 PCPs is listed in Supplementary Table 1. The top-three physicochemical properties are GEIM800102, ISOY800107, and SNEP660101, each of which achieves the maximum accuracy difference of 5.36%.
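The feature-knockout ranking can be illustrated with a minimal Python sketch. The function name `knockout_ranking`, the callable `accuracy_of`, and the toy accuracy model are assumptions made for illustration; in the setting described above, `accuracy_of` would train and evaluate an SVM on the given subset of PCPs.

```python
def knockout_ranking(features, accuracy_of):
    """Rank features by D_j = Acc - Acc_j, largest difference first.

    features    : list of feature identifiers (the m selected PCPs)
    accuracy_of : callable that takes a list of features and returns an accuracy
                  (hypothetical stand-in for training and testing an SVM)
    """
    overall = accuracy_of(features)                 # Acc with all m features
    diffs = {}
    for j, name in enumerate(features):
        reduced = features[:j] + features[j + 1:]   # all features except the j-th
        diffs[name] = overall - accuracy_of(reduced)
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in for the SVM: accuracy is 0.5 plus a made-up gain per feature.
toy_gain = {"A": 0.10, "B": 0.02, "C": 0.05}
toy_accuracy = lambda feats: 0.5 + sum(toy_gain[f] for f in feats)
ranking = knockout_ranking(list(toy_gain), toy_accuracy)
```

Each knockout requires one additional model evaluation, so ranking m features costs m + 1 evaluations in total.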

Table 3 Definition of the top-10 properties ranked by the accuracy differences.


The 34 properties contain 10 related to secondary structure, which are marked with "s", as shown in Additional File 1: Table S1. These include the conformational parameter of beta structure (AAindex ID BEGF750102), which is ranked fifth. The hydrophobicity coefficient in RP-HPLC, C8 with 0.1%TFA/MeCN/H2O (AAindex ID: WILM950102) is ranked tenth, which is in agreement with results obtained using AAP [15], suggesting that this property is efficient in the discrimination of linear B-cell epitopes and non B-cell epitopes.

Rule-based knowledge

This work presents a knowledge acquisition module based on the decision tree method C5.0, an improved version of C4.5 [25], to obtain insight into HCV antigenic epitopes. Knowledge can be obtained from two aspects: 1) the identification of informative physicochemical properties, and 2) if-then rules for distinguishing between B-cell and non B-cell epitopes. The top-five most influential features for predicting B-cell epitopes of HCV are used to acquire the rule set and a corresponding decision tree. Figure 5 shows a constructed decision tree with pruning confidence level higher than 25%. The accuracy of classifying the training dataset using the constructed decision tree is 73.4%. Furthermore, a set of 12 interpretable rules, six for identifying B-cell epitopes and six for identifying non B-cell epitopes, is derived from this tree (Additional File 2: Table S2).

Table 4 presents the six interpretable rules for identifying B-cell epitopes of HCV using the top-five physicochemical properties; the rules comprise different numbers of criteria. If a query sequence meets all of the criteria in one rule, it is identified as a B-cell epitope of HCV. The first rule, which covers 259 samples with an accuracy of 78.3%, is constructed from three properties (SNEP660101, BEGF750102, and CHOP780215). This rule has three criteria: (1) Principal component I (property SNEP660101), which is related to aromatic structures, is less than or equal to 0.364754, which means a query sequence with a low ratio of aromatic structures. (2) The conformational parameter of beta structure (property BEGF750102) is less than or equal to -0.11677, which means a query sequence with a high ratio of beta structures. (3) The normalized value of CHOP780215 is less than or equal to -0.1979, which means a query sequence with a low frequency of the fourth residue in turn.

Table 4 Six if-then rules for identifying B-cell epitope of HCV using C5.0 and top-five important physicochemical properties.


The second rule, which covers 9 samples with an accuracy of 77.8%, is constructed from five properties (SNEP660101, BEGF750102, CHOP780215, GEIM800102, and ISOY800107) and comprises the following five criteria: (1) The normalized value of SNEP660101 is less than or equal to 0.364754, which means a query sequence with a low ratio of aromatic structures. (2) The conformational parameter of beta structure (property BEGF750102) is less than or equal to -0.11677, which means a query sequence with a high ratio of beta structures. (3) The normalized value of CHOP780215 is larger than -0.1979, which indicates a query sequence with a high frequency of the fourth residue in turn. (4) The normalized value of GEIM800102 is larger than 0.026614, which means a query sequence with an intermediate ratio of alpha-helix. (5) The normalized relative frequency of double bend (property ISOY800107) is less than or equal to -0.47739, which means a query sequence with a low rate of double bend.

The third rule, which covers 65 samples with an accuracy of 76.9%, is constructed from four properties (SNEP660101, BEGF750102, GEIM800102, and CHOP780215). This rule has four criteria: (1) The normalized value of SNEP660101 is less than or equal to 0.364754, which means a query sequence with a low ratio of aromatic structures. (2) The conformational parameter of beta structure (property BEGF750102) is larger than -0.11677 and less than or equal to 0.15908, which means a query sequence with a high ratio of beta structures. (3) The normalized value of GEIM800102 is less than or equal to 0.055132, which means a query sequence with a low rate of alpha-helix. (4) The normalized value of CHOP780215 is less than or equal to -0.26309, which means a query sequence with a low frequency of the fourth residue in turn.

The fourth rule, which covers 89 samples with an accuracy of 76.9%, is constructed from four properties (SNEP660101, BEGF750102, CHOP780215, and GEIM800102). This rule has four criteria: (1) The normalized value of SNEP660101 is larger than -0.14618 and less than or equal to 0.364754, which means a query sequence with a low ratio of aromatic structures. (2) The conformational parameter of beta structure (property BEGF750102) is less than or equal to -0.11677, which means a query sequence with a high ratio of beta structures. (3) The normalized value of CHOP780215 is larger than -0.1979, which indicates a query sequence with a high frequency of the fourth residue in turn. (4) The normalized value of GEIM800102 is less than or equal to 0.026614, which means a query sequence with a low ratio of alpha-helix.

The fifth rule, which covers 20 samples with an accuracy of 70.0%, is constructed from three properties (SNEP660101, BEGF750102, and GEIM800102). This rule has three criteria: (1) Principal component I (AAindex ID: SNEP660101), which is related to aromatic structures, is less than or equal to 0.364754, which means a query sequence with a low ratio of aromatic structures. (2) The conformational parameter of beta structure (AAindex ID: BEGF750102) is larger than -0.11677, which means a query sequence with a high ratio of beta structures. (3) The normalized value of GEIM800102 is greater than 0.51531, which means a query sequence with a high rate of alpha-helix.

The sixth rule, which covers 96 samples with an accuracy of 54.2%, is constructed by four properties (SNEP660101, BEGF750102, CHOP780215, and GEIM800102). This rule has three criteria: (1) Principal component I (AAindex ID: SNEP660101), which is related to aromatic structures, is less than or equal to -0.14618, which means a query sequence with a low ratio of aromatic structures. (2) The conformational parameter of beta structure (AAindex ID: BEGF750102) is less than or equal to -0.11677, which means a query sequence with a low ratio of beta structures. (3) The normalized value of GEIM800102 is less than -0.13308, which means a query sequence with a low ratio of alpha-helix formation.
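The six rules described above can be written as data and checked mechanically. The following Python sketch transcribes the thresholds from the text; the names `EPITOPE_RULES`, `matching_rules`, and `is_predicted_epitope` are hypothetical, and the example feature values are made up rather than computed from a real peptide.

```python
import operator

LE, GT, LT = operator.le, operator.gt, operator.lt

# Each rule is a list of (AAindex ID, comparison, threshold) criteria taken from
# the text; feature values are assumed to be normalized PCP compositions.
EPITOPE_RULES = {
    1: [("SNEP660101", LE, 0.364754), ("BEGF750102", LE, -0.11677),
        ("CHOP780215", LE, -0.1979)],
    2: [("SNEP660101", LE, 0.364754), ("BEGF750102", LE, -0.11677),
        ("CHOP780215", GT, -0.1979), ("GEIM800102", GT, 0.026614),
        ("ISOY800107", LE, -0.47739)],
    3: [("SNEP660101", LE, 0.364754), ("BEGF750102", GT, -0.11677),
        ("BEGF750102", LE, 0.15908), ("GEIM800102", LE, 0.055132),
        ("CHOP780215", LE, -0.26309)],
    4: [("SNEP660101", GT, -0.14618), ("SNEP660101", LE, 0.364754),
        ("BEGF750102", LE, -0.11677), ("CHOP780215", GT, -0.1979),
        ("GEIM800102", LE, 0.026614)],
    5: [("SNEP660101", LE, 0.364754), ("BEGF750102", GT, -0.11677),
        ("GEIM800102", GT, 0.51531)],
    # Rule 6 is encoded with the three criteria that the text lists.
    6: [("SNEP660101", LE, -0.14618), ("BEGF750102", LE, -0.11677),
        ("GEIM800102", LT, -0.13308)],
}


def matching_rules(features):
    """Return the numbers of all rules whose criteria are all satisfied."""
    return [number for number, criteria in EPITOPE_RULES.items()
            if all(op(features[prop], thr) for prop, op, thr in criteria)]


def is_predicted_epitope(features):
    """A query is called a B-cell epitope if at least one rule matches."""
    return bool(matching_rules(features))


# Made-up normalized feature values for one hypothetical query peptide.
query = {"SNEP660101": 0.1, "BEGF750102": -0.3, "CHOP780215": -0.5,
         "GEIM800102": 0.0, "ISOY800107": 0.0}
hits = matching_rules(query)
```

Keeping the rules as data rather than nested if-statements makes it easy to report which rule fired for a given query.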

Identifying promising vaccine candidates

Sequence variability of neutralizing epitopes is considered to be a major obstacle to vaccine development [26]. Owing to the rapid change of the antigenic profile of HCV, the promising vaccine candidate is identified from B-cell epitopes of HCV by a two-stage procedure: 1) making the neutralization range as broad as possible and 2) maximizing the ability to induce antibodies. The first stage analyzes the ranges of the top-n B-cell epitopes of HCV using a phylogenetic tree, as follows.

Step 1) Select the top-n B-cell epitopes in accordance with prediction scores of HCV using the prediction system Bcell-HCV.

Step 2) Use these n B-cell epitopes of HCV to generate a phylogenetic tree (Figure 6) by applying the BLOSUM62 scoring matrix with the Jalview tool [27].

Step 3) Calculate the divergence by summing the distances of all edges in the corresponding phylogenetic tree.

Step 4) (Termination test) If n != k, then set n = n + 10 and go to Step 1. Otherwise, stop the algorithm. The value of k is determined by the specified threshold (T) for the average score of the top-n linear B-cell epitopes. In this work, T = 0.95 is used.

In this work, the phylogenetic tree of the top-50 epitopes has the highest divergence (963.33), which denotes the neutralization range of the identified vaccine candidate (Additional File 3: Table S3). Detailed information on the top-50 B-cell epitopes of HCV is listed in Additional File 4: Table S4.

The following procedure performs the second stage for maximizing the ability to induce antibodies:

Step 1) Use the MAFFT tool [28] to obtain the conserved motif (PDRE-VLYQE) from the top-50 B-cell epitopes of HCV, as shown in Figure 7.

Step 2) Insert each of the 20 amino acids into the gap of the conserved motif (PDRE-VLYQE) to generate 20 peptides.

Step 3) Select the peptide with the highest Bcell-HCV prediction score among the 20 peptides as the vaccine candidate (PDREMVLYQE; Table 5).
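Steps 2 and 3 of this second stage amount to a small enumeration. A minimal Python sketch follows; `fill_gap` and `best_candidate` are hypothetical helper names, and the `score` argument stands in for the Bcell-HCV prediction score, which is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def fill_gap(motif, gap="-"):
    """Return the 20 peptides obtained by placing each residue in the single gap."""
    if motif.count(gap) != 1:
        raise ValueError("motif must contain exactly one gap")
    return [motif.replace(gap, aa) for aa in AMINO_ACIDS]


def best_candidate(motif, score):
    """Pick the gap-filled peptide with the highest value of `score`.

    `score` is any callable mapping a peptide string to a number; in the paper's
    setting it would be the Bcell-HCV prediction score (assumed, not implemented).
    """
    return max(fill_gap(motif), key=score)


candidates = fill_gap("PDRE-VLYQE")
```

With a real scorer plugged in as `score`, `best_candidate("PDRE-VLYQE", score)` returns the top-ranked peptide among the 20 candidates.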

Table 5 Prediction scores of 20 vaccine candidates from the conserved motif.


Discussion

One of the major challenges in designing a peptide-based vaccine system is related to the collection of non-epitopes. Some residues in epitopes are incorrectly annotated as non-epitopes, which leads to overestimation of the false positive rate. Most researchers [21, 29, 30] dealt with this overestimation problem by extracting a random sample of the protein sequences in Swiss-Prot [31]. This work addresses the problem by using experimentally verified non B-cell epitopes derived from IEDB [22]. The sequence patterns of linear B-cell epitopes differ considerably even among similar pathogens, leading to the underperformance of general-purpose computational methods [20, 21] for predicting B-cell epitopes in HCV, which is consistent with the independent test results of the two general-purpose tools [12, 24]. Hence, developing a virus-specific tool is important for accurately identifying linear B-cell epitopes in a particular virus. To our knowledge, this is the first HCV-customized report that covers predicting antigenic epitopes in HCV, analyzing informative physicochemical properties, and identifying a promising vaccine candidate from two viewpoints: inducing antibodies and neutralizing as broadly as possible.

Analyzing the relationships among the 34 identified physicochemical properties and the estimated epitope sequences provided some insight into linear B-cell epitopes in HCV. As shown in Table 3, the three most influential properties (AAindex IDs GEIM800102, ISOY800107, and SNEP660101) have the same difference in accuracy (5.36%), which indicates their significant contribution to the prediction of HCV antigenic epitopes. These include alpha and turn propensities, hydrophobicity, and aromatic properties.

The E2 segment plays a significant role in HCV connection and entry into host cells, and the alpha-helix structure (GEIM800102) of epitopes has a strong influence on this connection. For example, one of the central binding regions in the E2 protein is formed by hydrophobic interactions on the alpha-helix, which is located at the C-terminal [32]. Furthermore, the spatial arrangement of the components at E2 of HCV is found to deviate significantly from the corresponding complexes with neutralizing antibodies [33].

Antibodies target the two hypervariable regions of the E2 segment, HVR1 and HVR2 [34]. These two parts of the E2 segment contain conserved residues, such as Thr², Gly⁶, Gly²³, and Gln²⁶, which are polar amino acids that form hydrogen bonds [35]. Furthermore, in a work by Kong et al. that analyzed antibodies of HCV, the major antibody binding sites were found to be conserved, and the residues that were substituted in those sites showed similar hydrophobicity [36]. For HCV, the interaction between the antibody and epitope depends on specific residues from the hydrophobic face of the epitope. Accordingly, replacing these with polar or charged residues could weaken or eliminate the interaction between the antibody and the antigen [36].

To investigate the property of the alpha-helix structure (GEIM800102), the ps2 protein structure prediction server (http://ps2.life.nctu.edu.tw/) was used to predict the 3D structure of the query epitopes. The PyMOL (http://sourceforge.net/projects/pymol/) molecular visualization system was then used to visualize the structures in 3D. For example, in Figure 8(a), "YPGHVSGHRMAWDMM" is a linear B-cell epitope which mainly forms a helix structure (blue), whereas "RLWHYPCTINYTIFKI" is a non B-cell epitope that possesses an alpha structure (red) as well as a helix structure (blue).

The property of principal component I (AAindex ID: SNEP660101) is related to aromatic structures, which were discovered to play a fundamental role in the interaction between antibodies and epitopes. The side chains of two aromatic residues, Phe⁴⁴² and Tyr⁴⁴³, are exposed on the same side of the helix in the E2 protein [37] and are close to the binding residues Leu⁴³⁸, Ala⁴³⁹, and Leu⁴⁴¹. Systematic mutagenesis studies have shown that only Leu⁴³⁸ and Ala⁴³⁹ can tolerate mutations. This means that mutation of those two residues should not damage the ability of the virus to bind to host cells, whereas a mutation in one of the two aromatic residues would render HCV non-functional [8, 37].

In conclusion, the binding sites of HCV antibodies are located in a predominantly hydrophobic cavity with aromatic residues that play a critical role in the interaction with antibodies [36]. Some residues can adjust the shape space of epitopes in the connection between antibodies and epitopes. For instance, the small amino acids glycine and proline are found interspersed between aromatic residues, which can alter the geometry of the hydrophobic region to fit with various antibodies [38].

Conclusions

Development of an effective vaccine against hepatitis C virus (HCV) is a complex task owing to the variability of this RNA virus. Recently, development of HCV vaccines has mainly focused on the T-cell immune response. However, identifying B-cell epitopes that can stimulate a B-cell response is one of the major tasks of peptide-based vaccine development. This work proposes an interpretable rule mining system, IRMS-BE, for extracting interpretable rules using informative physicochemical properties, and a web server, Bcell-HCV, for predicting linear B-cell epitopes of HCV. Finally, a conserved promising vaccine candidate, PDREMVLYQE, is identified for inclusion in a vaccine against HCV.

Methods

The block diagram in Figure 2 outlines the steps involved in modeling with the proposed IRMS-BE system, including 1) datasets, 2) representing PCP composition, 3) IRMS-BE system, 4) PCP mining module, 5) knowledge acquisition module, 6) predicting B-cell epitopes of HCV, and 7) interpretable rule set. These steps were applied in this work, and detailed descriptions are given in the following sub-sections.

Datasets

A B-cell response of HCV dataset (referred to as BR-HCV) was established to evaluate IRMS-BE and Bcell-HCV. The BR-HCV dataset was collected from the Immune Epitope Database (IEDB) [22] (version 2.12, released on Dec. 16, 2013), which contains data related to antibodies and T-cell epitopes in humans, non-human primates, rodents, and other animal species. The latest version (version 2.3) was released on July 2, 2014. The BR-HCV dataset was created as follows.

Step 1) The source organism "Hepatitis C virus" was used to find the B-cell response sequences of HCV. This search yielded 8009 B-cell response sequences, including 4041 linear B-cell epitopes (positives) and 3968 non B-cell epitopes (negatives), as shown in Additional File 5: Figure S1.

Step 2) Epitopes with lengths of 10 to 20 residues were selected. The epitopes annotated with a greater number of 'positive' (i.e., as opposed to 'negative') results were regarded as positive samples. Conversely, peptide sequences with more 'negative' results were considered negative. For example, Additional File 6: Figure S2 (a) and (b) show that the epitope 'YLLPRRGPRL' was considered a positive sample because most of the experimental results were positive. The epitope 'DLMGYIPLV', shown in Additional File 6: Figure S2 (c), was considered a negative sample because most of its results were negative.

Step 3) Redundant samples (i.e., epitopes that shared 20% or more sequence identity with any other peptides in the same subset) were removed from the benchmark datasets to create a non-redundant dataset. This work used the PISCES [39] culling program, which resulted in the collection of 1548 B-cell response sequences, 774 positives and 774 negatives, to establish the BR-HCV dataset.

Step 4) The BR-HCV dataset was divided into two parts (BR-HCV^Tr and BR-HCV^Te) for training and independent test. The BR-HCV^Tr and BR-HCV^Te datasets, which were kept at a ratio of 2:1, contained 1032 (516 positives and 516 negatives) and 516 (258 positives and 258 negatives) B-cell response sequences, respectively.
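The majority-vote labeling of Step 2 and the 2:1 split of Step 4 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the per-class shuffling with a fixed seed, and the treatment of ties are assumptions, since the text does not specify how the split was drawn or how ties were handled.

```python
import random


def majority_label(n_positive, n_negative):
    """Label a peptide by the majority of its assay outcomes (Step 2).

    Ties return None; the text does not say how ties were handled.
    """
    if n_positive > n_negative:
        return 1
    if n_negative > n_positive:
        return 0
    return None


def split_two_to_one(samples, seed=0):
    """Shuffle the samples of one class and split them 2:1 into train and test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


# 774 placeholder identifiers per class, matching the BR-HCV class sizes.
positives = [f"pos{i}" for i in range(774)]
negatives = [f"neg{i}" for i in range(774)]
pos_tr, pos_te = split_two_to_one(positives)
neg_tr, neg_te = split_two_to_one(negatives)
```

Splitting each class separately keeps both parts balanced, which reproduces the 516/516 training and 258/258 test counts reported above.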

Representation of PCP composition

Physicochemical properties (PCPs), also referred to as propensities, are the most intuitive features associated with biochemical reactions and are widely used in the field of bioinformatics. This work represents each peptide sequence of length l as an l-dimensional profile, where the value of each amino acid is obtained from the AAindex [40] database to encode a particular PCP feature. The l-dimensional profiles are transformed into N_AAindex-dimensional feature vectors (referred to as PCP composition), where the N_AAindex = 531 physicochemical properties can be obtained from http://www.genome.ad.jp/aaindex [40]. In this work, l = 10, 11, ..., 20 are used. Finally, all values of the feature vectors are normalized into [-1, 1] before being input into the SVM.
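As a rough illustration of this encoding, the Python sketch below reduces each per-residue profile to a single value per property and rescales values to [-1, 1]. Averaging over the peptide and min-max scaling are assumptions, because the text states only that the profile is transformed and normalized; the property table is a made-up toy, not real AAindex data.

```python
def pcp_composition(peptide, property_tables):
    """Average each property's per-residue values over the peptide.

    property_tables maps a property ID to a dict {residue: value}.
    Averaging is an assumption made for this sketch.
    """
    return {pid: sum(table[aa] for aa in peptide) / len(peptide)
            for pid, table in property_tables.items()}


def scale_to_unit_range(values):
    """Linearly map a list of values onto [-1, 1] (minimum to -1, maximum to 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant feature carries no information
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


# Made-up two-residue property table, not real AAindex values.
toy_tables = {"TOY000001": {"A": 1.0, "G": 3.0}}
profile = pcp_composition("AAGG", toy_tables)
scaled = scale_to_unit_range([1.0, 2.0, 3.0])
```

Because averaging removes the dependence on peptide length, peptides of 10 to 20 residues all map to vectors of the same dimension.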

IRMS-BE system

An interpretable rule mining system of B-cell epitopes (IRMS-BE), comprising a physicochemical property (PCP) mining module and a knowledge acquisition module, is proposed. The PCP mining module selects 34 informative PCPs from 531 physicochemical properties and simultaneously determines the values of C and γ of the SVM, based on an inheritable bi-objective genetic algorithm [41]. The knowledge acquisition module uses the 34 informative PCPs and the decision tree method C5.0 [42] to extract if-then rule-based knowledge that helps biologists understand the mechanism of B-cell epitopes in HCV.

PCP mining module

Identifying a minimal number (m) of the 531 PCP features while establishing an SVM-based training classifier with maximal accuracy (referred to as PCP mining) is a bi-objective combinatorial optimization problem [43]. The physicochemical property mining module (PCP mining module) solves this optimization problem by utilizing an inheritable bi-objective combinatorial genetic algorithm (IBCGA). The PCP mining module considers internal correlations among relevant features rather than focusing on individual features [44].

Selecting a minimal set of m informative PCPs from n = 531 PCPs while maximizing the prediction accuracy of an SVM classifier built on these m features is a bi-objective combinatorial optimization problem C(n, m). To cope with this large parameter optimization problem, the inheritable bi-objective combinatorial genetic algorithm (IBCGA) is used [9]. The IBCGA can obtain a set of solutions S_r, where r = r_start, r_start+1, ..., r_end, in a single run, using an inheritance mechanism that efficiently searches for a solution S_r+1 to C(n, r+1) by inheriting a good solution S_r to C(n, r). In addition, the IBCGA uses an intelligent evolutionary algorithm (IEA) [10], which can efficiently solve large parameter optimization problems and is well suited to deriving an optimized SVM model with feature selection. The high performance of IEA mainly arises from using an orthogonal-array-based crossover operation with systematic reasoning ability instead of the traditional crossover operation with a generate-and-go mechanism. Details of the method are given in [9]. Because this work involves both model selection and estimation of prediction errors, a cross-validation scheme is used with the SVM-based classifier of the PCP mining module. To reduce computational costs, the PCP mining module uses the prediction accuracy (ACC) of 10-fold CV on the entire training set as the fitness function of IBCGA [24]. The evaluation of binary predictions involves several quality measures: accuracy (ACC), sensitivity (SE), specificity (SP), and the Matthews correlation coefficient (MCC):

\begin{gathered} ACC = (TP + TN) / (TP + FP + TN + FN) \\ SE = TP / (TP + FN) \\ SP = TN / (TN + FP) \\ MCC = (TP \times TN - FP \times FN) / \sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)} \end{gathered}

(1)

where TP and TN are the numbers of correctly predicted linear B-cell epitopes and non B-cell epitopes, respectively. FP is the number of non B-cell epitopes incorrectly predicted as B-cell epitopes, and FN is the number of linear B-cell epitopes incorrectly predicted as non B-cell epitopes. MCC is often used to evaluate the balance of model prediction.
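The four measures in Equation (1) can be computed directly from the confusion-matrix counts. The Python sketch below is illustrative; the function name `binary_metrics` and the example counts are made up, and returning MCC = 0 when the denominator vanishes is a convention assumed here rather than something stated in the text.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return ACC, SE, SP, and MCC from confusion-matrix counts (Equation 1)."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    se = tp / (tp + fn)          # sensitivity: fraction of epitopes recovered
    sp = tn / (tn + fp)          # specificity: fraction of non-epitopes rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report MCC = 0 when the denominator is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}


# Made-up counts for a balanced test set of 200 peptides.
example = binary_metrics(tp=80, fp=30, tn=70, fn=20)
```

For these counts, ACC is 0.75, SE is 0.8, SP is 0.7, and MCC is roughly 0.50.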

The input of the PCP mining module is a training set of protein sequences belonging to two classes: positives and negatives. The output contains a set of m selected PCP features and an SVM-based classifier with associated parameter settings, γ and C. Protocols for the PCP mining module are as follows:

Step 1) Each sample is represented as an n-dimensional feature vector P = [p₁, p₂, ..., p_n] using the composition of PCPs

Step 2) The IBCGA chromosome consists of n binary genes f_i for selecting PCP features and two 4-bit genes that encode the kernel parameter γ and the cost parameter C. The corresponding feature p_i (the i-th PCP feature) is excluded from the SVM classifier if f_i = 0 and included if f_i = 1. Let m be the sum of the f_i. The kernel parameter γ > 0 determines how the samples are transformed into a high-dimensional feature space. The cost parameter C > 0 of the SVM classifier adjusts the penalty on the total error. Both parameters must be tuned to obtain the best prediction performance. In this work, γ ∈ {2^-15, 2^-13, ..., 2¹⁶} and C ∈ {2^-15, 2^-13, ..., 2¹⁶}.

Step 3) The fitness function is the prediction accuracy of 10-fold CV using the LIBSVM classifier [47] with the m selected PCP features, γ, and C obtained by decoding the IBCGA chromosome. In this work, the widely used radial basis function (RBF) kernel, exp(−γ ||xⁱ − x^j||²), is adopted, where xⁱ and x^j are training samples and γ is the kernel parameter. The parameter settings of the IBCGA are given in Additional File 7: Table S5.

Step 4) The solutions S_r for r = r_start, ..., r_end are obtained using the IBCGA. Let S_m be the most accurate of these solutions, with m selected PCP features, over the search spaces C(n, r_start) to C(n, r_end). In this work, r_start = 10 and r_end = 40 are used.

Step 5) The IBCGA relies on randomization and is therefore non-deterministic: individual runs do not always give the same result. Steps 3) and 4) are therefore performed for R independent runs, yielding the best solution of each run and R solutions in total. In this work, R = 30 is used.
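
The chromosome decoding in Step 2 and the kernel in Step 3 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the gene order (feature genes first, then γ, then C) is assumed, and mapping a 4-bit gene value k to 2^(−15 + 2k) is an assumption that gives 16 values from 2^-15 to 2^15, so it does not reproduce the upper bound of 2¹⁶ stated in Step 2.

```python
import math

def decode_gene(bits, low_exp=-15, step=2):
    """Map a 4-bit gene to a power of two.

    Assumed encoding: the gene value k (0..15) selects 2**(low_exp + step*k).
    """
    k = int("".join(str(b) for b in bits), 2)
    return 2.0 ** (low_exp + step * k)

def decode_chromosome(chrom, n):
    """Split a chromosome into selected feature indices, gamma and C.

    Assumed layout: n binary feature genes, then 4 bits for gamma, 4 bits for C.
    """
    mask, gamma_bits, c_bits = chrom[:n], chrom[n:n + 4], chrom[n + 4:n + 8]
    selected = [i for i, f in enumerate(mask) if f == 1]  # f_i = 1 keeps p_i
    return selected, decode_gene(gamma_bits), decode_gene(c_bits)

def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel exp(-gamma * ||x_i - x_j||^2) for two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

# Toy chromosome with n = 6 feature genes (the study uses n = 531).
chrom = [1, 0, 1, 0, 0, 1] + [0, 1, 1, 1] + [1, 0, 0, 0]
selected, gamma, C = decode_chromosome(chrom, n=6)
m = len(selected)  # number of selected features, the sum of f_i

# Project two hypothetical samples onto the selected features.
p_a = [0.2, 0.9, 0.4, 0.1, 0.7, 0.5]
p_b = [0.1, 0.3, 0.6, 0.8, 0.2, 0.5]
x_a = [p_a[i] for i in selected]
x_b = [p_b[i] for i in selected]
print(selected, gamma, C, rbf_kernel(x_a, x_b, gamma))
```

In a full fitness evaluation, the decoded feature subset, γ, and C would be passed to an SVM trainer and scored by 10-fold CV accuracy; that step is omitted here because it depends on the SVM library used.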

The knowledge acquisition module

Decision tree algorithms are valuable for obtaining rule-based knowledge because a tree can be converted into if-then rules. In this work, the C5.0 method is employed to construct a decision tree classifier and to acquire an interpretable rule set for analyzing the hepatitis C virus. The decision tree is constructed from properties ranked by information gain, and these ranks can in turn be used to select properties. Nevertheless, this ranking does not account for interactions between properties; the influence of each property is considered individually. To obtain general and interpretable rules, pruning is applied to avoid over-fitting, with the confidence threshold set to 25%. The final tree is converted into if-then rules, with one rule corresponding to each leaf node. The samples covered by a rule are the samples in its leaf node.
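
The conversion from a tree to if-then rules, one rule per leaf, can be sketched as below. This is not C5.0 itself: the tree is a hand-built toy, and the property names (PCP_A, PCP_B), thresholds, and sample counts are hypothetical values chosen only to show how the path from the root to each leaf becomes the condition part of a rule.

```python
def tree_to_rules(node, conditions=()):
    """Return one (conditions, label, n_covered) rule per leaf node."""
    if "label" in node:  # leaf: the accumulated path is the rule's IF part
        return [(list(conditions), node["label"], node["covered"])]
    feat, thr = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + (f"{feat} <= {thr}",))
    right = tree_to_rules(node["right"], conditions + (f"{feat} > {thr}",))
    return left + right

# Hypothetical two-level tree; "covered" is the number of samples in the leaf.
toy_tree = {
    "feature": "PCP_A", "threshold": 0.5,
    "left": {"label": "non-epitope", "covered": 40},
    "right": {
        "feature": "PCP_B", "threshold": 1.2,
        "left": {"label": "epitope", "covered": 35},
        "right": {"label": "non-epitope", "covered": 25},
    },
}

rules = tree_to_rules(toy_tree)
for conds, label, covered in rules:
    print(f"IF {' AND '.join(conds)} THEN {label} (covers {covered} samples)")
```

Each leaf yields exactly one rule, so the toy tree with three leaves produces three rules, and the samples covered by a rule are those that reach its leaf.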

Prediction system of B-cell epitopes of HCV

To provide a prediction service to the scientific community, we have developed a user-friendly web server, Bcell-HCV, based on an SVM classifier that uses the 34 informative physicochemical properties and the optimized parameters (C, γ) obtained in this study. The 34 informative physicochemical properties were selected by the PCP mining module.

Availability

The web server Bcell-HCV accepts amino acid sequences of HCV antigens in FASTA format. Users can set the size of the sliding window and the prediction-score threshold used to screen high-confidence putative epitopes. The output lists the identified B-cell epitopes with their locations and prediction scores. A higher score implies a greater probability that the peptide is a B-cell epitope of HCV. The web server is freely available at http://e045.life.nctu.edu.tw/BcellHCV/
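
The sliding-window screening described above can be sketched as follows. This is an assumption-laden illustration, not the server's code: the function names are hypothetical, `toy_score` is a placeholder that merely counts charged residues and is unrelated to the Bcell-HCV SVM model, and the input is a toy sequence that embeds the peptide PDREMVLYQE between arbitrary alanines.

```python
def scan_sequence(sequence, window, threshold, score_fn):
    """Slide a fixed-size window over the sequence and keep high-scoring peptides.

    Returns (start, end, peptide, score) tuples with 1-based inclusive positions.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        peptide = sequence[start:start + window]
        score = score_fn(peptide)
        if score > threshold:
            hits.append((start + 1, start + window, peptide, score))
    return hits

def toy_score(peptide):
    """Placeholder scorer: fraction of charged residues (NOT the real model)."""
    return sum(res in "DEKR" for res in peptide) / len(peptide)

# Toy input; window size and threshold are user-chosen values.
hits = scan_sequence("AAPDREMVLYQEAA", window=10, threshold=0.35,
                     score_fn=toy_score)
for start, end, peptide, score in hits:
    print(f"{start}-{end}\t{peptide}\t{score:.2f}")
```

In practice, `score_fn` would wrap the trained classifier's prediction score for a peptide, and the threshold would be raised to retain only high-confidence candidates.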

References

Lavanchy D: Evolving epidemiology of hepatitis C virus. Clinical Microbiology and Infection. 2011, 17 (2): 107-115.
Bukh J, Miller RH, Purcell RH: Genetic Heterogeneity of Hepatitis C Virus: Quasispecies and Genotypes. Seminars in Liver Disease. 1995, 15 (1): 41-63.
Alexopoulou A, Dourakis SP: Genetic heterogeneity of hepatitis C virus and its clinical significance. Curr Drug Targets Inflamm Allergy. 2005, 4 (1): 47-55.
Li D, Pan L, Xie YM, Xue XP, Zhao FT, Ma L, et al: Prediction and Identification-Based Prediction of Chinese Hepatitis C Viral-Specific Cytotoxic T Lymphocyte Epitopes. Journal of Medical Virology. 2011, 83 (8): 1315-1320.
Khan AG, Whidby J, Miller MT, Scarborough H, Zatorski AV, Cygan A, et al: Structure of the core ectodomain of the hepatitis C virus envelope glycoprotein 2. Nature. 2014, 509 (7500): 381-384.
Keck Z-y, Xia J, Wang Y, Wang W, Krey T, Prentoe J, et al: Human Monoclonal Antibodies to a Novel Cluster of Conformational Epitopes on HCV E2 with Resistance to Neutralization Escape in a Genotype 2a Isolate. PLoS Pathog. 2012, 8 (4): e1002653-
Sautto G, Tarr AW, Mancini N, Clementi M: Structural and Antigenic Definition of Hepatitis C Virus E2 Glycoprotein Epitopes Targeted by Monoclonal Antibodies. Clinical and Developmental Immunology. 2013, 2013: 450963-
Krey T, Meola A, Keck ZY, Damier-Piolle L, Foung SK, Rey FA: Structural basis of HCV neutralization by human monoclonal antibodies resistant to viral neutralization escape. PLoS Pathog. 2013, 9 (5): e1003364-
Tavis JE, Donlin MJ, Aurora R, Fan XF, Di Bisceglie AM: Prospects for personalizing antiviral therapy for hepatitis C virus with pharmacogenetics. Genome Medicine. 2011, 3 (2):
Ikram A, Anjum S, Tahir M: In Silico Identification and Conservation Analysis of B-cell and T-Cell Epitopes of Hepatitis C Virus 3a Genotype Enveloped Glycoprotein 2 From Pakistan: A Step Towards Heterologous Vaccine Design. Hepatitis Monthly. 2014, 14 (6): e9832-
Andriulli A, Nardi A, Di Marco V, Ippolito AM, Gavrila C, Aghemo A, et al: An a priori prediction model of response to peginterferon plus ribavirin dual therapy in naive patients with genotype 1 chronic hepatitis C. Digestive and Liver Disease. 2014, 46 (9): 818-825.
Saha S, Raghava GP: Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins. 2006, 65 (1): 40-48.
Prabdial-Sing N, Puren AJ, Bowyer SM: Sequence-based in silico analysis of well studied Hepatitis C Virus epitopes and their variants in other genotypes (particularly genotype 5a) against South African human leukocyte antigen backgrounds. BMC Immunology. 2012, 13 (1): 67-
Larsen J, Lund O, Nielsen M: Improved method for predicting linear B-cell epitopes. Immunome Research. 2006, 2 (1): 2-
Chen J, Liu H, Yang J, Chou KC: Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids. 2007, 33 (3): 423-428.
El-Manzalawy Y, Dobbs D, Honavar V: Predicting linear B-cell epitopes using string kernels. Journal of Molecular Recognition. 2008, 21 (4): 243-255.
Su CH, Pal NR, Lin KL, Chung IF: Identification of Amino Acid Propensities That Are Strong Determinants of Linear B-cell Epitope Using Neural Networks. PLoS One. 2012, 7 (2): e30617-
Wee LJ, Simarmata D, Kam YW, Ng LF, Tong JC: SVM-based prediction of linear B-cell epitopes using Bayes Feature Extraction. BMC Genomics. 2010, 11 Suppl 4: S21-
Yao B, Zhang L, Liang S, Zhang C: SVMTriP: A Method to Predict Antigenic Epitopes Using Support Vector Machine to Integrate Tri-Peptide Similarity and Propensity. PLoS One. 2012, 7 (9): e45152-
Greenbaum JA, Andersen PH, Blythe M, Bui HH, Cachau RE, Crowe J, et al: Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. Journal of Molecular Recognition. 2007, 20 (2): 75-82.
Soria-Guerra RE, Nieto-Gomez R, Govea-Alonso DO, Rosales-Mendoza S: An overview of bioinformatics tools for epitope prediction: Implications on vaccine development. Journal of Biomedical Informatics. 2015, 53: 405-414.
Vita R, Zarebski L, Greenbaum JA, Emami H, Hoof I: The immune epitope database 2.0. Nucleic Acids Res. 2010, 38 (Database issue): D854-D862.
Wolf L, Martin I: Regularization Through Feature Knock Out. Massachusetts institute of technology -- computer science and artificial intelligence laboratory. 2004, CBCL Memo 242:
Singh H, Ansari HR, Raghava GPS: Improved Method for Linear B-Cell Epitope Prediction Using Antigen's Primary Sequence. PLoS One. 2013, 8 (5):
Quinlan JR: C4.5: Programs for Machine Learning. 1993, Morgan Kaufmann Publishers Inc
Burton DR, Poignard P, Stanfield RL, Wilson IA: Broadly Neutralizing Antibodies Present New Prospects to Counter Highly Antigenically Diverse Viruses. Science. 2012, 337 (6091): 183-186.
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ: Jalview Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009, 25 (9): 1189-1191.
Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30 (14): 3059-3066.
Saha S, Raghava GPS: Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins. 2006, 65 (1): 40-48.
Sollner J, Mayer B: Machine learning approaches for prediction of linear B-cell epitopes on proteins. Journal of Molecular Recognition. 2006, 19 (3): 200-208.
Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28 (1): 45-48.
Deng Y, Guan J, Wen B, Zhu N, Chen H, Song J, et al: Induction of broadly neutralising HCV antibodies in mice by integration-deficient lentiviral vector-based pseudotyped particles. PLoS One. 2013, 8 (4): e62684-
Deng L, Ma L, Virata-Theimer ML, Zhong L, Yan H, Zhao Z, et al: Discrete conformations of epitope II on the hepatitis C virus E2 protein for antibody-mediated neutralization and nonneutralization. Proc Natl Acad Sci U S A. 2014, 111 (29): 10690-10695.
Suzuki Y, Gojobori T: Positively selected amino acid sites in the entire coding region of hepatitis C virus subtype 1b. Gene. 2001, 276 (1-2): 83-87.
Penin F, Combet C, Germanidis G, Frainais PO, Deleage G, Pawlotsky JM: Conservation of the conformation and positive charges of hepatitis C virus E2 envelope glycoprotein hypervariable region 1 points to a role in cell attachment. J Virol. 2001, 75 (12): 5703-5710.
Kong L, Giang E, Robbins JB, Stanfield RL, Burton DR, Wilson IA, Law M: Structural basis of hepatitis C virus neutralization by broadly neutralizing antibody HCV1. Proc Natl Acad Sci U S A. 2012, 109 (24): 9499-9504.
Drummer HE, Boo I, Maerz AL, Poumbourios P: A conserved Gly436-Trp-Leu-Ala-Gly-Leu-Phe-Tyr motif in hepatitis C virus glycoprotein E2 is a determinant of CD81 binding and viral entry. J Virol. 2006, 80 (16): 7844-7853.
Ma Y, Thomas MG, Okamoto M, Bogdanos DP, Nagl S, Kerkar N, et al: Key residues of a major cytochrome P4502D6 epitope are located on the surface of the molecule. J Immunol. 2002, 169 (1): 277-285.
Wang G, Dunbrack RL: PISCES: a protein sequence culling server. Bioinformatics. 2003, 19 (12): 1589-1591.
Kawashima S, Ogata H, Kanehisa M: AAindex: Amino Acid index database. Nucleic Acids Res. 1999, 27 (1): 368-369.
Ho SY, Chen JH, Huang MH: Inheritable genetic algorithm for biobjective 0/1 combinatorial optimization problems and its applications. IEEE Trans Syst Man Cybern B Cybern. 2004, 34 (1): 609-620.
Quinlan JR: C4.5: Programs for Machine Learning. 1993, Morgan Kaufmann Publishers
Huang WL, Tung CW, Huang HL, Hwang SF, Ho SY: ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. BioSystems. 2007, 90 (2): 573-581.
Huang WL, Tung CW, Ho SW, Hwang SF, Ho SY: ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization. BMC Bioinformatics. 2008, 9: 80-
Ho SY, Chen JH, Huang MH: Inheritable genetic algorithm for biobjective 0/1 combinatorial optimization problems and its applications. IEEE Trans Syst Man Cybern B Cybern. 2004, 34 (1): 609-620.
Ho SY, Shu LS, Chen JH: Intelligent evolutionary algorithms for large parameter optimization problems. Evolutionary Computation, IEEE Transactions on. 2004, 8 (6): 522-541.
Chang CC, Lin CJ: LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology. 2011, 2 (3):
Vacic V, Iakoucheva LM, Radivojac P: Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments. Bioinformatics. 2006, 22 (12): 1536-1537.
Tomii K, Kanehisa M: Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Engineering. 1996, 9 (1): 27-36.

Acknowledgements

We are thankful to the associate editor and two anonymous reviewers, whose suggestions have greatly improved the quality of this manuscript, and we also would like to thank Ying-Ying Yu for great suggestions. This work was funded by National Science Council of Taiwan under the contract number MOST-103-2221-E-009-117-, and "Center for Bioinformatics Research of Aiming for the Top University Program" of the National Chiao Tung University and Ministry of Education, Taiwan, R.O.C. for the project 104W962. This work was also supported in part by UST-UCSD International Center of Excellence in Advanced Bioengineering sponsored by the Ministry of Science and Technology with I-RiCE Program under Grant Number: MOST 103-2911-I-009-101-.

Declaration

Publication charges for this article have been funded by the National Science Council of Taiwan under the contract number MOST-103-2221-E-009-117-

This article has been published as part of BMC Medical Genomics Volume 8 Supplement 4, 2015: Joint 26th Genome Informatics Workshop and 14th International Conference on Bioinformatics: Medical genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcmedgenomics/supplements/8/S4.

Author information

Authors and Affiliations

Department and Institute of Industrial Engineering and Management, Minghsin University of Science and Technology, Xinfeng Hsinchu, Taiwan
Wen-Lin Huang
Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
Ming-Ju Tsai, Kai-Ti Hsu, Jyun-Rong Wang, Yi-Hsiung Chen & Shinn-Ying Ho
Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
Shinn-Ying Ho

Corresponding author

Correspondence to Shinn-Ying Ho.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

Wen-Lin Huang (WLH) and Ming-Ju Tsai (MJT) designed the system, participated in manuscript preparation, and carried out the detail work. MJT, Kai-Ti Hsu, Jyun-Rong Wang (JRW) and Yi-Hsiung Chen (YHC) designed the system and implemented programs. Also, Shinn-Ying Ho (SYH) and WLH supervised the whole project and participated in manuscript preparation. All authors have read and approved the final manuscript.

Electronic supplementary material

Additional file 1: Table S1. Definition of the 34 properties ranked by the accuracy differences. (PDF 82 KB)

Additional file 2: Table S2. Rule-based knowledge of Bcell-HCV prediction. (PDF 75 KB)

Additional file 3: Table S3. Statistics of the top-k epitopes for the threshold > 0.95. (PDF 62 KB)

Additional file 4: Table S4. The top-50 B-cell epitopes of HCV for analyzing the conserved motif and constructing a phylogenetic tree are listed. The order of peptide IDs are sorted using the prediction score. (PDF 74 KB)

Additional file 5: Figure S1. Figure S1 (a) HCV source organism. (b) A total of 8009 non-redundant linear B-cell epitopes and non B-cell epitopes are obtained. (PDF 448 KB)

Additional file 6: Figure S2. Collection of relevant antigenic sequences with (a) B-cell assay data, based on the (b) positive samples and (c) negative samples. (PDF 843 KB)

Additional file 7: Table S5. The control parameters of IBCGA used. (PDF 93 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Huang, WL., Tsai, MJ., Hsu, KT. et al. Prediction of linear B-cell epitopes of hepatitis C virus for vaccine development. BMC Med Genomics 8 (Suppl 4), S3 (2015). https://doi.org/10.1186/1755-8794-8-S4-S3

Published: 09 December 2015
DOI: https://doi.org/10.1186/1755-8794-8-S4-S3
