B-cell epitope prediction for peptide-based vaccine design: towards a paradigm of biological outcomes for global health

Global health must address a rapidly evolving burden of disease, hence the urgent need for versatile generic technologies exemplified by peptide-based vaccines. B-cell epitope prediction is crucial for designing such vaccines; yet this approach has thus far been largely unsuccessful, prompting further inquiry into the underlying reasons for its apparent inadequacy. Two major obstacles to the development of B-cell epitope prediction for peptide-based vaccine design are (1) the prevailing binary classification paradigm, which mandates the problematic dichotomization of continuous outcome variables, and (2) failure to explicitly model biological consequences of immunization that are relevant to practical considerations of safety and efficacy. The first obstacle is eliminated by redefining the predictive task as quantitative estimation of empirically observable biological effects of antibody-antigen binding, such that prediction is benchmarked using measures of correlation between continuous rather than dichotomous variables; but this alternative approach by itself fails to address the second obstacle even if benchmark data are selected to exclusively reflect functionally relevant cross-reactivity of antipeptide antibodies with protein antigens (as evidenced by antibody-modulated protein biological activity), particularly where only antibody-antigen binding is actually predicted as a surrogate for its biological effects. To overcome the second obstacle, the prerequisite is deliberate effort to predict, a priori, biological outcomes that are of immediate practical significance from the perspective of vaccination. This demands a much broader and deeper systems view of immunobiology than has hitherto been invoked for B-cell epitope prediction. Such a view would facilitate comprehension of many crucial yet largely neglected aspects of the vaccine-design problem. 
Of these, immunodominance among B-cell epitopes is a central unifying theme that subsumes immune phenomena of tolerance, imprinting and refocusing; but it is meaningful for vaccine design only in the light of disease-specific pathophysiology, which for infectious processes is complicated by host-pathogen coevolution. To better support peptide-based vaccine design, B-cell epitope prediction would entail individualized quantitative estimation of biological outcomes relevant to safety and efficacy. Passive-immunization experiments could serve as an important initial proving ground for B-cell epitope prediction en route to vaccine-design applications, by restricting biological complexity to render epitope-prediction problems more computationally tractable.

Global health has been defined as "collaborative transnational research and action for promoting health for all" [22]. It thus transcends the traditionally parochial pursuit of public health by nation-states within their respective geographic territories. Furthermore, it encompasses both human and veterinary medicine, which are inextricably linked through the concept of "One Health" (i.e., health among interacting human and animal populations as tightly integrated components of a shared global ecosystem) [23][24][25][26]. This concept is key to comprehending disease in terms of epidemiologic transition theory [27,28].
Classic epidemiologic transition theory [27] holds that, during the shift from agricultural to industrial society, human population growth accelerates rapidly due to declining death rates born of widespread health-promoting developments (e.g., public-health interventions against infectious disease) but subsequently decelerates due to declining birth rates born of multiple factors (e.g., deferred and less frequent childbearing consequent to increased educational and employment opportunities); by this account, the classic epidemiologic transition is a shift in the human disease burden from infections resulting in premature death to chronic conditions developing with cultural and environmental changes that accompany industrialization. The chronic conditions whose incidence tends to increase with industrialization include disorders of immune function, notably those characterized by hypersensitivity in the form of allergic and autoimmune reactions [29][30][31][32]. This trend reflects immune [...]
An extended framework of epidemiologic transition theory [28] identifies the classic epidemiologic transition as the second of three epidemiologic transitions. Of these transitions, the first accompanies the shift from hunter-gatherer to agricultural society and is characterized by a rise in infectious diseases favored by poorer nutrition (due to decreased diversity of dietary sources of nutrients), increasing human population density (particularly with the advent of sedentism and subsequent urbanization), and greater contact between humans and domesticated animals (leading to various zoonoses). The third epidemiologic transition, already underway, is characterized by a resurgence of infectious diseases due to accelerated globalization of human disease ecologies that increasingly favors pandemics of emerging and reemerging pathogens.
This ongoing epidemiologic transition reflects the current socio-ecological regime, which motivates the unsustainable pursuit of unlimited growth of material production and consumption [44]; the resulting environmental degradation compromises the global ecosystem as a whole, for example, by way of climate change that expands the geographic ranges of pathogens and disease vectors [26].
Epidemiologic transitions have thus resulted in a rapidly evolving double burden of infectious and non-infectious diseases [45][46][47], hence the urgent need for versatile generic technologies exemplified by peptide-based vaccines [48][49][50], which elicit antipeptide antibodies that modulate protein function to prevent or treat disease. B-cell epitope prediction is employed in the design of such vaccines to presumptively identify segments of protein sequence that as peptides induce beneficial antibody responses [12,14]; each segment thus identified contains one or more putative B-cell epitopes, i.e., molecular substructures whose defining feature is their capacity for binding by antibodies [51].
A B-cell epitope is therefore antigenic (i.e., potentially recognizable by the immune system) by virtue of its potential to be bound by antibodies (and, more generally, by immunoglobulins, which include both B-cell surface immunoglobulins and antibodies); if it can also induce the production of such antibodies (e.g., consequent to its binding by B-cell surface immunoglobulins), it is immunogenic as well. Accordingly, the properties of being antigenic and of being immunogenic are antigenicity and immunogenicity, respectively. One B-cell epitope may be more or less immunogenic than another under a given set of circumstances, for which reason immunogenicity (in the sense of potential to induce an immune response of a certain intensity) is better represented as a continuous rather than dichotomous variable. Furthermore, if an immunogenic B-cell epitope thus induces the production of antibodies, these antibodies may also bind a structurally different B-cell epitope (which may be non-immunogenic or of low immunogenicity) provided that the two B-cell epitopes share sufficient structural similarity and are both physically accessible to the antibodies, which are then said to react with the first (immunogenic) B-cell epitope and to cross-react with the second (possibly non-immunogenic or poorly immunogenic) B-cell epitope; this phenomenon (i.e., binding of one B-cell epitope by antibodies elicited by another) is a form of antigenic cross-reactivity (i.e., recognition of different molecular features by a common immune-system component).
Such cross-reactivity is the basis for peptide-based vaccination whereby vaccine peptides induce the production of antipeptide antibodies that react with B-cell epitopes of the peptides and also cross-react with B-cell epitopes of proteins (e.g., toxins as well as pathogen-associated adhesion molecules for binding to host surfaces) that are the intended antibody targets (noting that a peptide B-cell epitope and a protein B-cell epitope are structurally distinct even if they share the same amino-acid sequence, owing to differences in overall molecular context that invariably arise but are often overlooked) [51]. Additionally, one B-cell epitope may be more immunogenic than another under a given set of circumstances (e.g., if both are present on the same molecule, and especially so if they physically overlap with one another), such that the more immunogenic one (which is thus described as immunodominant in relation to the other) may induce production of antibodies to itself while effectively suppressing production of antibodies to the other; this phenomenon (i.e., bias of antibody production towards an immunodominant B-cell epitope) is known as immunodominance, which is highly context-dependent in that a B-cell epitope may appear to be either immunodominant or non-immunodominant depending on its relationships to other B-cell epitopes (e.g., if it is more or less immunogenic than others to which it is physically linked) in the course of an individual host immune response. Among peptides, for example, immunodominance can arise in the setting of neighboring or overlapping B-cell epitopes (i.e., with proximate or overlapping amino-acid sequences), particularly where antibody production becomes biased towards a B-cell epitope due to its prior induction of specific B-cell clonal expansion [52].
Hence, a peptide B-cell epitope may be worth incorporating into a vaccine if it is immunodominant (insofar as it induces adequate production of antibody to itself as part of a vaccine peptide) and induces antipeptide antibodies that cross-react with a target protein so as to produce a biological effect that is beneficial rather than harmful; paradoxically, potentially harmful effects may result, as exemplified by autoimmune antibody responses (i.e., targeting of autologous or self biomolecules) and also by antibody-dependent infection enhancement, which is the amplification of infectious processes by antibodies (typically to pathogens or products thereof).
Various computational methods for B-cell epitope prediction have thus been proposed. The prototype of these methods [53] assigns a numeric hydrophilicity value to each of the 20 canonical proteinogenic amino acids and evaluates the arithmetic-mean hydrophilicity over a sliding window several residues in width along the entire sequence of a polypeptide chain, generating a sequence profile whose peaks correspond to putative B-cell epitopes. Many variant methods have been devised using alternatives to the original hydrophilicity values and averaging scheme [54,55], in turn giving rise to composite predictive approaches based on consensus among such methods [56][57][58]. These classical sequence-profiling methods are both computationally expedient and applicable where only protein sequence is known, but they have been criticized for their unrealistic unidimensional view of B-cell epitope prediction [55,59]. This criticism is avoided by other methods that either explicitly consider detailed three-dimensional protein structure rather than sequence alone [60][61][62][63][64][65][66][67] or aim to enhance predictive power by using computational techniques for sequence analysis that are of much greater sophistication than the classical sequence-profiling methods [66][68][69][70][71][72][73][74][75][76][77][78][79][80]. Unfortunately, the benchmarking of B-cell epitope prediction methods is commonly subject to uncertainties that potentially confound efforts to systematically refine the methods to better support the design of peptide-based vaccines [81].
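The classical sequence-profiling idea described above can be illustrated with a minimal sketch. The per-residue values below are those commonly tabulated for one published hydrophilicity scale and are included here only as an assumption for illustration; the six-residue window, the function names and the example sequence are likewise hypothetical choices rather than details taken from the cited methods.

```python
from typing import Dict, List

# Illustrative per-residue hydrophilicity values (an assumption for this
# sketch; verify against the primary source before any real use).
HYDROPHILICITY: Dict[str, float] = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3,
    "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4,
    "H": -0.5, "A": -0.5, "C": -1.0, "M": -1.3, "V": -1.5,
    "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(sequence: str, window: int = 6) -> List[float]:
    """Arithmetic-mean hydrophilicity of each window along the sequence.

    Element i of the result is the mean over residues i .. i + window - 1.
    """
    residues = sequence.upper()
    if window < 1 or window > len(residues):
        raise ValueError("window must be between 1 and the sequence length")
    unknown = set(residues) - set(HYDROPHILICITY)
    if unknown:
        raise ValueError(f"non-canonical residues: {sorted(unknown)}")
    values = [HYDROPHILICITY[r] for r in residues]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def peak_window(profile: List[float]) -> int:
    """Start index of the highest-scoring window (first one on ties)."""
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    # Hypothetical sequence: a charged stretch flanked by leucines.
    seq = "LLLLLLKKDDEELLLLLL"
    profile = hydrophilicity_profile(seq)
    start = peak_window(profile)
    print(start, seq[start:start + 6], profile[start])
```

The peak of such a profile marks only a putative B-cell epitope; as the criticism cited above indicates, the profile itself carries no information on three-dimensional structure or on biological effects of antibody binding.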
Despite more than a quarter century of attempts to enable successful development of peptide-based vaccines that is based on analysis of available biomolecular data, only marginal progress has been realized towards clinically proven applications [48,82,83]. This may largely reflect multiple technical problems, such as peptide instability in vivo and low adjuvanticity, which might yet be solved through appropriate revision of protocols for immunogen preparation and immunization [48,84,85]. However, it also raises serious concerns regarding the feasibility of peptide-based vaccine design as currently conceptualized under an overly simplistic paradigm of B-cell epitope prediction focused on antibody binding per se rather than biological consequences thereof [81,86,87]. The present work examines key limitations of this prevailing paradigm to outline a proposed alternative that effectively circumvents them, thereby facilitating the systematic refinement of methods for B-cell epitope prediction as applied to peptide-based vaccine design, with cautious regard for the complexities of global health.

Paradigm evolution

Basic requirements
Reasoning entirely from first principles is unlikely to ever yield a practicable solution to the problem of B-cell epitope prediction, considering both the computational demands of rigorous quantum-mechanical calculations for antibody binding and the uncertainties introduced by any simplifying approximations [88]. A more realistic way to develop methods for predicting B-cell epitopes is through iterative cycles of incremental refinement guided by parallel benchmarking against subsets of judiciously selected empirical data that are partitioned according to the complexity of the predictive task (e.g., due to localization of proteins in biological membranes, viral capsids or other supramolecular complexes), such that the methods are systematically revised to perform more reliably under increasingly challenging circumstances [81]. Prerequisites for this are rational benchmark-data selection criteria [67,89] and a corresponding objective function that correlates predictions with benchmark data to express reliability [81].
Consistent with the intended application of peptide-based vaccine design, the fundamental criterion for selecting benchmark data is their generation through experiments that are adequate to detect cross-reactions of polyclonal antipeptide antibodies with proteins [67,89]; moreover, at least some of the benchmark data must reflect functionally relevant cross-reactivity manifest as antibody-mediated modulation of protein biological activity (e.g., enzyme inhibition by antibodies) [81]. Full definition of the selection criteria depends on paradigm-specific details, as does the choice of objective function.

Binary classification
Binary classification underlies the prevailing paradigm (hereafter referred to as the binary classification paradigm), the essential feature of which is a presupposed dichotomy between sequences with and without potential to serve as B-cell epitopes under given sets of conditions [90][91][92]; accordingly, both predictions and benchmark data are either positive or negative. A positive prediction regarding a sequence often implies predicted potential to serve as a B-cell epitope in two distinct capacities: first, as part of an immunizing peptide, to elicit antipeptide antibodies; and second, as part of a cognate protein of the peptide, to mediate cross-reaction of the antibodies with the protein [67]. This follows from the convention that the immunizing peptide sequence is a subsequence of the cognate protein, notwithstanding the potential of antibodies elicited by an immunizing peptide to cross-react with a protein of apparently unrelated sequence [93][94][95].
As the vast majority of published benchmark data pertains to peptides and proteins whose B-cell epitopes have not been (and possibly cannot be) precisely mapped as sequences [51], predictions are generally amenable to benchmarking only if they apply to peptide-protein pairs rather than individual B-cell epitopes [67], in which case immunodominance must be accounted for. Among B-cell epitopes of immunizing peptides, immunodominance is the bias of antipeptide antibody responses towards so-called immunodominant B-cell epitopes, which is probably at least partly due to mutual steric exclusion between competing immunoglobulins (in particular, between rapidly produced antibodies to immunodominant B-cell epitopes on the one hand and, on the other, B-cell surface immunoglobulins that can recognize non-immunodominant B-cell epitopes) such that early and stable binding of immunodominant B-cell epitopes by antibodies suppresses the production of antibodies to non-immunodominant B-cell epitopes (consistent with peptide-immunization studies [52] and modeled by a simple affinity-based hierarchical steric-exclusion scheme for predicting B-cell epitope immunodominance in peptides [67]). If a positive prediction is rendered for a certain combination of immunizing peptide and cognate protein, this implies that the peptide contains at least one putative immunodominant B-cell epitope whose sequence both elicits antipeptide antibodies and, as part of the protein, mediates their cross-reaction with the protein. Likewise, positive benchmark data are instances of empirically confirmed cross-reaction between antipeptide antibodies and a cognate protein of the immunizing peptide which was used to elicit the antibodies.
Under the binary classification paradigm, two major criteria may be defined for selecting benchmark data [81]. The first major criterion confirms positive benchmark data; it requires evidence of functionally relevant cross-reactivity (posited in the preceding subsection, "Basic requirements"). The second major criterion confirms negative benchmark data; it requires evidence of genuine absence of cross-reactivity, which is established by a negative result in a fluid-phase immunoassay (e.g., immunoprecipitation) to avoid the confounding apparent absence of cross-reactivity due to artifactual inaccessibility of B-cell epitopes that arises in solid-phase immunoassays (e.g., enzyme-linked immunosorbent assays) [96][97][98]. In addition, a minor criterion further restricts the admissible positive benchmark data to those for which the immunizing peptide bears a lone predicted immunodominant B-cell epitope whose sequence occurs as part of the cognate protein in exactly one distinct structural context, thereby avoiding ambiguity in the attribution of cross-reactivity to individual putative B-cell epitopes [81]. The exact form of this minor criterion depends on details of the chosen method of B-cell epitope prediction, notably assumed lengths of B-cell epitope sequences, such that positive benchmark data are excluded for excessively long immunizing peptide sequences.
To evaluate the objective function, predictions must first be qualitatively appraised in relation to the benchmark data. By convention, predictions are deemed either true if they agree with the benchmark data or false if otherwise, such that every prediction falls into one of four mutually exclusive categories, namely true-positive, true-negative, false-positive and false-negative; the numbers of predictions falling within these categories are often denoted by TP, TN, FP and FN, respectively. This allows calculation of sensitivity as TP/(TP+FN) and specificity as TN/(TN+FP), which both range from 0 to 1 and would both be equal to 1 for a perfect predictive method [99]. Insofar as predictions are rendered by dichotomizing a continuous variable (e.g., local average hydrophilicity value [53] or estimated free energy change of antibody binding [62,67]) using an arbitrary cut point (i.e., threshold value), both sensitivity and specificity can be calculated over a range of cut points. Most commonly, a receiver operating characteristic curve (ROCC) is generated that reflects the inherent tradeoff between sensitivity and specificity [100]; by convention, the ROCC is a plot of the true-positive rate (TPR, i.e., sensitivity) against the false-positive rate (FPR, equal to 1 - specificity), and the objective function is the area under the ROCC (AUROCC). Typically, the ROCC of a useful predictive method extends from the origin to the point where both TPR and FPR equal 1, above the diagonal line defined by TPR=FPR, such that AUROCC exceeds 0.5; and the goal is to approach as closely as possible the theoretical maximum AUROCC value of 1.
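The quantities just defined can be made concrete with a brief sketch. It assumes that a prediction is positive when its continuous score is at or above the cut point, that both positive and negative benchmark data are present, and that the ROCC is integrated by the trapezoidal rule; the function names and the toy data are hypothetical and are not drawn from any cited method.

```python
from typing import List, Sequence, Tuple


def confusion_counts(
    scores: Sequence[float], labels: Sequence[bool], cut: float
) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN); a score >= cut is a positive prediction."""
    tp = tn = fp = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= cut
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[bool], cut: float
) -> Tuple[float, float]:
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) at one cut point.

    Assumes at least one positive and one negative benchmark datum.
    """
    tp, tn, fp, fn = confusion_counts(scores, labels, cut)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(
    scores: Sequence[float], labels: Sequence[bool]
) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from the origin to (1, 1), one per distinct score."""
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, cut)
        points.append((1.0 - spec, sens))
    return points


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6]          # hypothetical predictions
    toy_labels = [True, False, True, False]    # hypothetical benchmark data
    print(sensitivity_specificity(toy_scores, toy_labels, 0.75))
    print(auroc(toy_scores, toy_labels))
```

Sweeping the cut point over every distinct score, as done here, is one simple way to trace the curve; it makes explicit that each point on the ROCC corresponds to a different dichotomization of the same continuous predictions.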
Problems of the binary classification paradigm are mainly due to the dichotomization of benchmark data. Just as qualitative predictions are obtained by dichotomizing continuous variables, so are qualitative benchmark data. Because the cut points used for this are arbitrary, unstandardized and often unknown (e.g., as implicitly set by the undetermined limit of detection for a qualitative immunoassay), the dichotomization of benchmark data entails loss of information [101] and introduces the possibility of investigator selection bias (e.g., in the designation of weak yet statistically significant experimental results as positive).
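The dependence of qualitative benchmark data on the cut point can be shown with a small sketch. The measurements and the two cut points below are invented purely for illustration, and the helper name is hypothetical.

```python
from typing import List, Sequence


def dichotomize(values: Sequence[float], cut: float) -> List[bool]:
    """Label each continuous measurement positive if it is >= cut."""
    return [value >= cut for value in values]


if __name__ == "__main__":
    # Hypothetical normalized effect sizes from four experiments.
    measurements = [0.05, 0.20, 0.45, 0.90]
    lenient = dichotomize(measurements, 0.10)  # weak effects count as positive
    strict = dichotomize(measurements, 0.50)   # only strong effects count
    print(lenient)
    print(strict)
```

Under the lenient cut point the values 0.20, 0.45 and 0.90 all collapse into the same positive label, whereas under the strict one only 0.90 remains positive; in both cases the graded differences among the measurements are lost, and the resulting benchmark data depend on a choice that is often unreported.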

Continuous outcomes
The above-mentioned problems of the binary classification paradigm are avoided by dispensing with dichotomization in both the rendering of predictions and the definition of benchmark data. This redefines the predictive task as quantitative estimation of biological effects due to binding by antibody. Mathematically, such an effect may be expressed in terms of a continuous variable A that is a measure of some biological activity (e.g., enzyme activity); if A is cast as a function of antibody concentration C, the absence of antibody-mediated biological effects is associated with the value A0 of A for which C is 0, such that division by A0 normalizes A. Generalizing this approach, benchmark data comprise values of continuous variables that represent measurements of biological activities, such that normalization of both prediction results and benchmark data facilitates combined analyses accommodating many diverse forms of biological activity [81].
By thus replacing the binary classification paradigm with an alternative based on correlation between prediction results and benchmark data that are both expressed in terms of continuous rather than dichotomous variables, the dichotomy of positive and negative benchmark data is rendered obsolete. Consequently, selection criteria are necessary for only a single category of benchmark data that reflect functionally relevant cross-reactivity, albeit over a continuous range of possible values. Of the selection criteria formulated under the binary classification paradigm, the criterion for negative benchmark data loses all meaning; but the major and minor criteria for positive benchmark data remain useful if applied to continuous rather than dichotomous outcome variables. As to the choice of objective function, the Pearson correlation coefficient (PCC) may be used in place of AUROCC [81], with the goal of approaching as closely as possible the theoretical maximum PCC value of 1.
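A minimal sketch of this alternative is given below, assuming that measured and predicted activities are available as paired lists and that each activity is normalized by division by its antibody-free value A0. The function names and the numerical values are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_activity(activity: Sequence[float], a0: float) -> List[float]:
    """Divide each activity value A by the antibody-free value A0."""
    if a0 == 0:
        raise ValueError("A0 must be nonzero to normalize")
    return [a / a0 for a in activity]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant samples")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical enzyme activities at increasing antibody concentration,
    # with A0 = 100 measured in the absence of antibody.
    observed = normalize_activity([100.0, 80.0, 55.0, 30.0], a0=100.0)
    predicted = [1.00, 0.85, 0.60, 0.25]  # hypothetical normalized predictions
    print(pearson(predicted, observed))
```

Because both series are expressed as fractions of the antibody-free activity, data on different biological activities could in principle be pooled before computing the coefficient, which is the combined analysis that normalization is meant to facilitate.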
While the paradigm shift just described resolves the issue of dichotomization, it fails to solve a much deeper problem, namely the failure to explicitly model biological consequences of immunization (as opposed to binding by antibodies irrespective of its biological consequences). This deficiency is masked by the use of benchmark data on functionally relevant cross-reactivity, when in fact the predictions fall short of committing to definite biological outcomes (e.g., a certain fractional enzyme inhibition for a given set of conditions such as enzyme and antibody concentrations) because only binding by antibodies is actually predicted as a surrogate for its biological effects. This opens the possibility of investigator selection bias, particularly where prospective benchmark data might be excluded on the grounds that the observed biological effects are too weak to be informative; because binding by antibodies does not uniformly produce biological effects, benchmarking against data on weak biological effects risks exposing as unreliable a method that purports to predict biological effects but in reality predicts only binding by antibodies as a surrogate.
Nevertheless, preliminary steps have already been taken to address biological effects more explicitly in B-cell epitope prediction for vaccine design, as notably exemplified by the development of machine-learning approaches with benchmarking against experimental data on HIV that reflect biological activity relevant to vaccine efficacy [70]. Continued progress in this direction, particularly to further elaborate on the biological outlook in ways that address both safety and efficacy, might be realized as described below.

Biological outlook
To avoid problems that arise where binding by antibodies is treated as a surrogate for its biological effects, these effects themselves could be the express object of B-cell epitope prediction for the design of peptide-based vaccines. In practice, this could serve to prioritize the prediction of biological outcomes that are practically relevant to both safety and efficacy (e.g., hypersensitivity reactions versus prophylactic activity) [102,103]. These biological outcomes may reflect both direct and indirect consequences of binding by antibodies. The direct consequences are due to binding by antibodies per se, as in the case of enzyme inhibition resulting from occlusion of enzyme active sites by antibodies. In contrast, the indirect consequences are realized through downstream immune effector mechanisms that are initiated by antibodies already complexed with their targets (e.g., pathogens). The downstream mechanisms include complement activation, which in turn can lead to both lysis of target membranes and enhanced phagocytic uptake of targets. To the extent that they result in protection against pathogens, the direct and indirect consequences respectively underlie what have been designated as class-I and class-II protectivities [73]. Like the direct consequences, the indirect consequences may be regarded as instances of antibody-mediated modulation of protein biological activity, in the sense that binding by antibodies extends the function of the target proteins to encompass biological outcomes of activating downstream immune effector mechanisms. However, maintenance of native target protein conformations tends to be a prerequisite more for the direct than the indirect consequences; the latter may be realized even where target proteins are denatured, provided that they are bound by antibodies.
The explicit prediction of biological outcomes completes the evolutionary development of a new paradigm, the essence of which is uncompromising focus on continuous variables that correspond directly to the biological effects of binding by antibodies, for both B-cell epitope prediction and the benchmarking thereof. Interestingly, this circumvents the problem posed by ambiguous attribution of functionally relevant cross-reactivity among putative B-cell epitopes, which is otherwise avoided by applying the minor criterion for selecting benchmark data (invoked in the preceding subsections, "Binary classification" and "Continuous outcomes"). In retrospect, the problem exists only where binding by antibodies is predicted as a surrogate for its biological effects; if these effects are themselves predicted, the burden of resolving ambiguities is properly assigned to the predictive method, such that the minor criterion is rendered obsolete. Upon discarding the minor criterion, more benchmark data (e.g., on immunizing peptides of arbitrarily long sequence) are admissible as their selection is unconstrained by peculiarities of chosen predictive methods. What then remains as the sole criterion for selecting benchmark data is the basic requirement for evidence of functionally relevant cross-reactivity.

Systems view
Historically, B-cell epitope prediction has been attempted primarily through superficial analyses of protein structure alone, mostly using sequence as a surrogate for detailed three-dimensional structure [53][54][55][59]; yet structural analysis is merely an initial step towards modeling the consequences of vaccination under the paradigm of biological outcomes which is developed in the preceding section. Migration to this paradigm thus calls for greater emphasis on functional correlates of structure [87].
Initially, one could resort to predicting functionally important structural features of proteins that are prospective vaccine targets (e.g., putative enzyme active sites) [88]; such a first approach might be useful for special cases wherein biological activities of interest are fortuitously inferred from analyses of protein structures in isolation from one another, but not where the activities are mainly emergent properties that arise through the association of proteins with their interaction partners (e.g., of pathogen proteins that are virulence factors by virtue of their interactions with host biomolecules). In general, the pervasive interdependence of biological activities via exquisitely complex and dynamic molecular interaction networks compels the study of target proteins in multiple contexts that reflect numerous combinations of possible interaction partners and various potential roles at progressively higher levels of structural and functional organization [104], inevitably requiring a full transition from descriptive bioinformatics to systems biology [105]. This is implicit in the current functional annotation of B-cell epitope data, which addresses both direct and indirect consequences of binding by antibody (discussed in the preceding subsection, "Biological outlook"); notably, the indirect consequences entail extended target protein function that encompasses biological effects of activating downstream immune effector mechanisms, which epitomizes the conceptual domain of systems biology. Systems analysis of antibody-mediated biological phenomena thus presents unprecedented opportunities to enrich the functional annotation of B-cell epitope data in ways that ultimately enhance computational capabilities for vaccine design. This endeavor has already begun to yield promising new results that may serve to guide vaccine development [106].
Once a systems view is thus adopted for B-cell epitope prediction, the fundamental problem of elucidating biologically relevant antigen structures on which to base predictions is more fully appreciated. This problem may often be inadequately addressed by exclusive reliance on direct structural elucidation of antigens ex vivo (e.g., by crystallography), particularly in their purified forms. Purified antigens may yield structural models that can be misinterpreted as suggesting the absence of interactions with other biomolecules, such that apparently accessible antigen surfaces are actually inaccessible in vivo due to localization in biological membranes and other supramolecular complexes [81]; conversely, purified antigens may be found to exist in oligomerization states that are biologically irrelevant [107][108][109][110], such that actual B-cell epitopes may be mistaken for buried sites. More generally, experimentally observed statistical distributions of antigen conformational states may be of uncertain biological relevance. For instance, conformational freedom may be artificially restricted by packing constraints in a protein crystal lattice [111], and the addition of ligands to promote protein crystallization may yield crystals wherein the observed protein conformations are unrepresentative of ligand-free proteins [112]. In extreme cases, intrinsic disorder (i.e., dynamic random-coil behavior in the native state) might be masked by folding of natively disordered protein segments that is induced through the formation of ligand-protein and crystal contacts [113]. Alternatively, natively folded structures might fail to form from their disordered precursors if appropriate interaction partners that can induce folding are unavailable [114].
Considering the issues just enumerated, experimentally derived (e.g., crystallographic) antigen structural models per se may therefore be suboptimal bases for B-cell epitope prediction applied to vaccine design, in which case they might be better utilized indirectly to devise alternative structural models that more accurately represent the antigens as biologically relevant targets of antibody-mediated immunity. The experimentally derived models might thus serve as starting points for molecular dynamics simulations under conditions of the actual molecular milieu in vivo (e.g., aqueous solution containing potential interaction partners and other solutes at physiologic concentrations). Such an approach may, for example, allow for relaxation of a crystal structure with realistic local unfolding of protein segments that are natively disordered, thereby rendering them more accessible for binding by antibodies. A much more challenging potential application would be simulation of flavivirus maturation wherein a critical target of neutralizing antibodies is the conserved E-glycoprotein fusion loop, which they preferentially recognize among immature rather than mature virions due to its decreasing accessibility in the course of virion maturation [115,116]. The affinity of the antibodies is thus lower for mature virions than for immature virions, such that the antibodies may actually enhance infection of certain cell types (e.g., macrophages) by mature virions via Fc receptor-mediated entry, particularly where the occupancy of potential antibody-binding sites on the virions is low [117][118][119][120]. Interestingly, this antibody-mediated infection enhancement can be suppressed by complement-based mechanisms. For example, complement component C1q can bind Fc regions of certain virion-bound IgG-class antibodies to restrict antibody-mediated infection enhancement [121] by crosslinking the antibodies in a manner that potentiates their capacity for neutralizing infectivity [122]. Additionally, mannose-binding lectin (MBL) can bind mannose-bearing N-linked glycans on virions to restrict antibody-mediated infection enhancement via complement activation leading to neutralization of infectivity and immune clearance [123].
Antibody-mediated infection enhancement (often referred to as antibody-dependent enhancement of infection) has been documented among infections due to taxonomically diverse viruses [124][125][126][127][128], including HIV [129-132], and even cellular pathogens such as bacteria and protozoa [133,134]. Fc receptor-mediated entry is thus frequently exploited by viruses to infect host cells, but it can also serve as a means for host-cell internalization of virulence factors (e.g., toxins) elaborated by cellular pathogens [135,136]. In such cases, the role of complement is highly context-dependent; in the case of HIV infection, complement appears to initially restrict but subsequently promote antibody-mediated infection enhancement as host immune status deteriorates [132]. Antibody-mediated infection enhancement may also occur via mechanisms other than Fc receptor-mediated entry, as in the case of the Panton-Valentine leukocidin secreted by Staphylococcus aureus; binding of this cytotoxin by antibodies can prevent it from activating certain innate immune mechanisms, thereby impairing the overall host immune response to the pathogen [137]. In view of these phenomena, vaccine-induced antibody responses may produce biological effects that paradoxically harm rather than protect the host, which complicates the task of vaccine development [138]. This realization underscores the need for a systems view of immunobiology to support vaccine design.

Unifying themes
For a systems view of immunobiology to cost-effectively support vaccine design, it should enable sufficiently accurate modeling of biological outcomes that is based on available empirical data, most especially those data which are efficiently generated in bulk using high-throughput technologies (e.g., for genome sequencing). Building upon existing foundations of structural biology, this would ultimately relate molecular structure to biological activity at organismal and even higher levels (e.g., of populations and ecosystems). A powerful unifying theme therein is the invariance of physicochemical constraints imposed by thermodynamics and reaction kinetics on biological activity. As reaction kinetics may be formulated in thermodynamic terms of transition state theory [139] and diffusion control [140], biological activity conceivably can be predicted largely from molecular structure by way of a link to thermodynamics. Such a link is found in structural energetics, which relates the structure of biomolecules to the thermodynamics of their folding and binding [141][142][143].
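For orientation, the thermodynamic formulation of kinetics mentioned above can be written out in standard textbook form. The expressions below are general physical chemistry rather than results specific to the cited works, and all symbols are introduced here for illustration: the Eyring equation of transition state theory (with transmission coefficient \kappa), the relation between the dissociation constant and the standard free energy of binding (with standard concentration c^{\circ} = 1 M), and the Smoluchowski estimate of the diffusion-controlled association rate constant for two spherical reactants in the absence of long-range forces.

```latex
\begin{aligned}
k &= \kappa \, \frac{k_{\mathrm{B}} T}{h} \exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right), \\
K_{\mathrm{d}} &= \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} = c^{\circ} \exp\!\left(\frac{\Delta G^{\circ}_{\mathrm{bind}}}{RT}\right), \\
k_{\mathrm{D}} &= 4 \pi N_{\mathrm{A}} \left(D_{\mathrm{A}} + D_{\mathrm{B}}\right) \left(r_{\mathrm{A}} + r_{\mathrm{B}}\right).
\end{aligned}
```

Here D and r denote the diffusion coefficients and effective radii of the two binding partners, and a more negative \Delta G^{\circ}_{\mathrm{bind}} corresponds to a smaller K_d (i.e., higher affinity).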
Structural energetics provides a means to estimate affinities and kinetic rate constants for the binding of putative B-cell epitopes by antibodies, in turn to predict the extent of binding by antibodies over time [81]. If this scheme is to be maximally utilized in support of B-cell epitope prediction for peptide-based vaccine design [62,67], structural energetics must be extended to more adequately describe binding kinetics vis-a-vis biological activity. A particularly instructive case in point concerns immunodominance (as defined above in the "Binary classification" subsection): it is positively correlated with affinity for antibody in primary antibody responses [144] but is arguably the result of differential binding kinetics, as suggested by the positive correlation between kinetic on-rate and clonal selection of B cells in T cell-dependent secondary antibody responses [145]. For a B cell to competitively recruit T-cell help [146], kinetics should permit sufficiently rapid and stable binding of antigen by surface immunoglobulin to favor receptor-mediated endocytosis; yet excessively stable (i.e., high-affinity) binding that translates to very slow dissociation of antibody-antigen complexes limits antigen availability for surface immunoglobulin, which may explain the apparent upper limit of affinity for sustainable T cell-dependent clonal selection of B cells [147]. Binding kinetics that facilitates endocytic uptake of antigen may also entail B-cell receptor-mediated signal transduction that decreases cellular death rates [148] and increases B-cell ability to bind and acquire antigen from follicular dendritic cells [149], thereby further promoting positive selection of B cells that recognize immunodominant B-cell epitopes. Detailed modeling of these processes is imperative, as immunodominance is a central unifying theme in immunobiology [14].
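The idea of predicting the extent of binding over time from affinities and rate constants can be illustrated with a minimal sketch. It assumes the simplest possible case, which is an assumption of this example and not the scheme of [81]: reversible single-site binding with antibody in large excess over epitope, so that the free antibody concentration stays constant (pseudo-first-order kinetics). The function names and the parameter values in the usage note are hypothetical.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ/(mol*K)


def kd_from_binding_free_energy(delta_g_kj_per_mol, temperature_k=310.15):
    """Dissociation constant (M) from the standard free energy of binding.

    Assumes a 1 M standard state; delta_g is negative for favorable binding.
    """
    return math.exp(delta_g_kj_per_mol / (GAS_CONSTANT_KJ * temperature_k))


def fraction_epitope_bound(time_s, antibody_conc_m, k_on, k_off):
    """Fraction of epitope bound at time_s, starting from no binding.

    Assumption: antibody is in excess, so its free concentration is constant.
    k_on is in 1/(M*s) and k_off is in 1/s.
    """
    k_obs = k_on * antibody_conc_m + k_off           # observed relaxation rate
    fraction_at_equilibrium = k_on * antibody_conc_m / k_obs
    return fraction_at_equilibrium * (1.0 - math.exp(-k_obs * time_s))
```

For instance, with illustrative values k_on = 1e5 per M per s and k_off = 1e-4 per s (K_d = 1 nM), calling `fraction_epitope_bound(t, 1e-9, 1e5, 1e-4)` approaches one half at long times, because the antibody concentration equals K_d. The sketch also shows why two antibodies of equal affinity can differ in how quickly they bind: the same K_d can arise from very different pairs of on- and off-rates.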
The origins of immunodominance can be fully comprehended only in relation to mechanisms that render B-cell epitopes non-immunodominant. One mechanism already alluded to herein is positive selection of B cells whose surface immunoglobulins bind immunodominant B-cell epitopes, whereby activation of other B cells is suppressed (e.g., through preemptive binding of immunodominant B-cell epitopes by antibodies that renders other B-cell epitopes sterically inaccessible). Another mechanism is tolerance, i.e., difficulty of eliciting antibodies to certain B-cell epitopes even in the absence of immunodominant competitors, due to negative selection against either B cells themselves or T cells that might otherwise provide T-cell help to the B cells; this normally suppresses production of antibodies to self molecules and food components, thereby preventing deleterious hypersensitivity reactions [150][151][152][153][154]. These mechanisms are subject to imprinting, that is, the influence of past immune responses on future ones; as an immune response selects certain lymphocyte clones at the expense of others, it biases subsequent immune responses towards certain B-cell epitopes and away from others. For example, a prior antibody response to some B-cell epitope X may bias subsequent antibody responses towards X such that antibody responses to another B-cell epitope Y are suppressed if Y is accompanied by X, as in the phenomenon of original antigenic sin [155][156][157]; additionally, tolerance to B-cell epitopes may be either induced or broken in the course of an immune response [39,158]. Immunologic imprinting thus alters the potential for mounting antibody responses against particular B-cell epitopes, possibly in highly context-dependent ways (e.g., enabling T-cell independent activation of memory B cells to produce antibodies against viral envelope proteins in response to whole virions but not soluble monomeric forms of the proteins [159]). 
Such imprinting points to the plasticity of immune systems.
The plasticity of host immune systems has evolved to defend against pathogens, yet it is exploited by pathogens through deceptive imprinting that suppresses antibody responses to B-cell epitopes whose binding by antibodies interferes with infection [160][161][162]; pathogens thus present highly immunodominant B-cell epitopes as decoys [163][164][165] and even mimic normally tolerated B-cell epitopes (e.g., of host biomolecules) [166], although attempts at such mimicry may break normal tolerance to induce deleterious hypersensitivity reactions (e.g., antibody-mediated autoimmune destruction of host biomolecules) [151,158]. In principle, deceptive imprinting and its sequelae can be overcome with vaccines that refocus antibody responses towards critical B-cell epitopes to suppress, disrupt or otherwise circumvent pathophysiological mechanisms (e.g., of infection and autoimmunity) [156,164,167]; but to actually design such vaccines demands comprehensive knowledge of disease-specific pathophysiology, which for infectious processes is complicated by host-pathogen coevolution [81].

Unique histories
In a broad sense that encompasses somatic evolution (e.g., through somatic hypermutation during B-cell affinity maturation), host-pathogen coevolution entails host immune responses that impair pathogen function and thereby direct pathogen evolution to evade them, thus initiating new cycles of host response and pathogen counter-response. This is most readily apparent among infections due to rapidly evolving viral quasispecies [156,168], for which the mutation rates are so high that each round of viral replication may yield a population of distinct genomic variants from a single progenitor genome. In an individual host, the many possible outcomes of infection (e.g., latent infection and chronic disease) and paths between these [169][170][171] are challenging to model, as each instance of infection represents a unique history of ongoing host-pathogen coevolution and prior imprinting (e.g., due to passive and active immunization, including past infection), which impact vaccine safety and efficacy yet are neglected by overly simplified attempts at vaccine design based on static molecular views of both host and pathogen.
The significance of unique histories among prospective vaccinees extends to vaccine design for both infectious and noninfectious diseases, as biological outcomes of vaccination are invariably influenced by genetics and imprinting, which are themselves inextricably linked to the much wider socioeconomic and environmental contexts of individuals. All this tends to be obscured by conventional statistical analysis of vaccinee populations as if they were homogeneous groups, yet it must be properly addressed in accordance with emerging ethical regimes that increasingly reject traditional reductionist notions of biomedicine [172].
The preceding considerations transcend the notion of vaccines as highly standardized products intended for mass administration on a global scale, arguing instead for at least some degree of customization in vaccine design to suit individual circumstances. Such customization might selectively favor the production of antibodies against key B-cell epitopes to confer protective immunity without inducing deleterious responses (e.g., autoimmune and other forms of hypersensitivity reactions). However, active immunization with individually customized vaccines fails to completely exclude the possibility of imprinting that may subsequently contribute to antibody-mediated pathophysiological processes, for example, via antibody-dependent enhancement as occurs in dengue disease [173]. To avoid this inherent risk of active immunization, passive immunization might be performed instead by administering exogenous antibodies of predefined immunological characteristics (e.g., of antigenic specificity and immune effector function) to confer highly specific forms of protective immunity while avoiding exposure to the target B-cell epitopes themselves. Among humans, this might be accomplished using appropriately humanized monoclonal antibodies of non-human animal origin. Such an approach would largely obviate concerns over human safety in the prediction of B-cell epitopes.
With regard to issues of both safety and efficacy, the distinction between active and passive immunization strategies further clarifies the scope of B-cell epitope prediction for peptide-based vaccine design. To guide the design of peptide-based immunogens that induce antipeptide antibodies for passive immunization, B-cell epitope prediction must explicitly apply to functionally relevant cross-reactivity in the context of clinically important biological outcomes (i.e., adverse and beneficial effects of exogenously supplied antipeptide antibodies), which occur in response to administration of the antibodies rather than active immunization with the peptide-based immunogens; in principle, the antibodies could be purified to some desired degree of functional homogeneity (e.g., as monoclonal antibodies), and their biological effects could be modeled along the lines of conventional pharmacokinetics and pharmacodynamics (e.g., with calculation of time-dependent antibody concentrations in blood plasma and other compartments). However, if the immunogens are used as vaccines for active immunization to elicit endogenous antipeptide antibodies, B-cell epitope prediction must explicitly apply much more comprehensively to the entire course of events from the administration of the initial dose of vaccine onwards, as influenced by genetic background, prior immunologic imprinting and other factors. A major challenge therein would be the estimation of time-dependent concentrations of vaccine-induced antibodies to self and nonself B-cell epitopes, as a prerequisite to predicting biological outcomes of vaccination (e.g., prophylaxis against or enhancement of infection at successive post-vaccination time points).
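For the passive-immunization case, the calculation of time-dependent antibody concentrations can be sketched with the simplest conventional pharmacokinetic model. The example assumes a single intravenous bolus, one well-mixed compartment and first-order elimination; real antibody pharmacokinetics typically calls for multi-compartment models, and the function names and numerical values here are hypothetical.

```python
import math


def antibody_concentration(time_days, dose_mg, volume_l, half_life_days):
    """Plasma antibody concentration (mg/L) at time_days after an IV bolus.

    Assumptions: one compartment of fixed volume, first-order elimination.
    """
    elimination_rate = math.log(2) / half_life_days  # per day
    initial_concentration = dose_mg / volume_l
    return initial_concentration * math.exp(-elimination_rate * time_days)


def days_above_threshold(dose_mg, volume_l, half_life_days, threshold_mg_per_l):
    """Time (days) during which concentration stays at or above a threshold."""
    initial_concentration = dose_mg / volume_l
    if initial_concentration <= threshold_mg_per_l:
        return 0.0
    # Each half-life halves the concentration, hence the base-2 logarithm.
    return half_life_days * math.log2(initial_concentration / threshold_mg_per_l)
```

A concentration profile of this kind could be fed into a binding model to estimate epitope occupancy over time, which is one way to connect pharmacokinetics to the biological outcomes discussed above. The corresponding calculation for active immunization is much harder because the antibody source term is itself an outcome of the immune response.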

Future directions
In view of the extent to which the prevailing binary classification paradigm dominates the theory and practice of B-cell epitope prediction for peptide-based vaccine design, the most immediately pressing problem is the current unavailability of benchmark data that reflect functionally relevant cross-reactivity as continuous variables (e.g., fractional enzyme inhibition) qualified in terms of other pertinent continuous variables (e.g., in relation to fractional enzyme inhibition, the concentrations of both enzyme and antibody). This problem is unlikely to be solved anytime soon, if at all, by the classic default strategy of gleaning benchmark data from published literature, hence the urgent need to generate new empirical results that are sufficiently qualified for use as benchmark data. Until the demand for such data is thus met, dichotomous benchmark data might be used instead [81], subject to the loss of information and possibility of investigator selection bias that are discussed above in the "Binary classification" subsection.
Yet, the shift from dichotomous- to continuous-variable benchmark data entails the problem of biological variability in antibody responses among individuals even where populations are essentially homogeneous (e.g., among genetically identical animals reared under rigorously controlled laboratory conditions); a completely analogous problem is actually encountered with dichotomous variables, in that both positive and negative experimental results are sometimes obtained for purported replicate trials despite diligent efforts to minimize biological variability. This necessitates the acquisition of continuous-variable data in quantities (as gauged by sample sizes) that are adequate to compensate for biological variability, which may reflect inherent stochastic processes (e.g., in the somatic rearrangement of immunoglobulin genes) and be more pronounced in real-world situations (e.g., among human and wild-animal populations) than under laboratory conditions, so as to avoid errors due to reliance on insufficiently sampled data.
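As a rough guide to what "adequate" sample sizes might mean when prediction is benchmarked by correlation, the sketch below uses the standard Fisher z-transformation approximation for the number of paired observations needed to detect a given Pearson correlation. This is a generic statistical rule of thumb, not a procedure from the works cited here; it assumes approximately bivariate-normal data, and the function name and default values are hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed_for_correlation(expected_r, alpha=0.05, power=0.80):
    """Approximate number of paired observations needed to detect expected_r.

    Uses the Fisher z-transformation with a two-sided test at level alpha.
    """
    if not 0.0 < abs(expected_r) < 1.0:
        raise ValueError("expected_r must lie strictly between 0 and 1 in magnitude")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = standard_normal.inv_cdf(power)
    fisher_z = math.atanh(abs(expected_r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)
```

Greater biological variability weakens the attainable correlation between predicted and observed outcomes, and the required sample size rises steeply as the expected correlation falls, which is the practical consequence of the point made above.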
Meanwhile, development of computational tools could anticipate future availability of more appropriate benchmark data by aiming to actually predict biological outcomes in terms of changes in molecular, supramolecular and higher-order structure with respect to time [174,175]. Pending definitive empirical validation, this would be provisionally justified if based on well-established physicochemical premises (thermodynamics, reaction kinetics, etc.) and pursued within the framework of systems biology to gain deeper insights into key phenomena (e.g., immunodominance and host-pathogen coevolution) with a strong emphasis on individual uniqueness in terms of genetic background, imprinting and other factors that impact vaccine safety and efficacy. B-cell epitope prediction might thus be initially developed in two successive phases: The first phase would focus on developing an approach to support the design of peptide-based immunogens that induce antipeptide antibodies for passive immunization, and the second phase would extend the approach to support the design of peptide-based vaccines for active immunization based on endogenous antipeptide antibodies. The second phase could be facilitated by more detailed and comprehensive modeling of immune function, with due emphasis on key molecular events that lead to B-cell activation [149,176-178].
What has been outlined thus far is but one of many conceivable paths towards further development of B-cell epitope prediction for peptide-based vaccine design. This particular path is most immediately useful for illustrating how progress might be realized within an evolving paradigm of biological outcomes as proposed in the preceding section, "Paradigm evolution," mainly to inform the development of alternative paths for expediting vaccine design without demanding comprehensive and mechanistically detailed computational simulation of biological systems. Such alternative paths could be charted by extending certain machine-learning approaches that have served to advance the understanding of B-cell epitope prediction for vaccine design. One pioneering example of these approaches [73] has pointed to the importance of sequence variability and posttranslational modification as confounding factors, which has been interpreted as arguing for the exclusion of variable and posttranslationally modified sequences in the design of vaccine peptides; this could, however, also be interpreted as arguing for the development of more sophisticated computational methods to predict both immunological cross-reaction among variable sequences and immunogenicity of posttranslationally modified sequences, rather than simply discarding such sequences from the outset. More generally, factors that complicate B-cell epitope prediction tend to limit the utility of predictive methods, suggesting that systematic refinement of the methods is contingent upon the appreciation of such factors as they relate to the prediction problem [81].
As a corollary to the concept of factors that complicate B-cell epitope prediction, available predictive methods may optimally complement one another if each is selectively applied only in circumstances under which it outperforms all others. This argues for the pursuit of multiple alternative paths towards B-cell epitope prediction and of diversification within each path to develop a versatile repertoire of prediction methods (e.g., that are customized for various categories of proteins, as might be defined by structural characteristics and biological localization). Accordingly, comparative benchmarking of predictive methods plays a vital role in identifying subsets of benchmark data (e.g., on categories of pathogen proteins for various host-pathogen combinations) for which a particular method is best suited; where performance is found to be relatively poorer for certain subsets of benchmark data, individual methods could be revised to address the deficiencies and possibly extend the applicability of the entire repertoire to a broader spectrum of input data. Iterative cycles of such revision, supported by a continually growing body of benchmark data, might eventually enable the routine design of safe and efficacious peptide-based vaccines. This could be founded upon the rich variety of extant methods for B-cell epitope prediction, representative examples of which are briefly referenced above in the "Background" section. As a rule, these methods actually render predictions as quantitative scores, which are subsequently dichotomized for compatibility with available qualitative benchmark data under the prevailing binary classification paradigm. The quantitative scores could otherwise be used to render predictions on biological effects of binding by antibody in terms of continuous variables, and the resulting predictions could be directly compared with corresponding benchmark data that are likewise expressed in terms of continuous variables under the full-fledged paradigm of biological outcomes.
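The contrast between benchmarking with continuous variables and benchmarking after dichotomization can be made concrete with a small sketch. It computes the Pearson correlation between predicted scores and an observed continuous outcome, then repeats the calculation after thresholding both. The data are invented purely for illustration, the 0.5 cutoff is arbitrary, and Pearson correlation is only one of several possible measures (a rank-based measure may suit nonlinear relationships better).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def dichotomize(values, cutoff):
    """Map each value to 1 if it meets the cutoff, else 0."""
    return [1 if v >= cutoff else 0 for v in values]


# Hypothetical data: predictor scores and observed fractional enzyme inhibition.
predicted_scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.95]
fractional_inhibition = [0.05, 0.30, 0.55, 0.50, 0.85, 0.90]

r_continuous = pearson_r(predicted_scores, fractional_inhibition)
r_dichotomized = pearson_r(
    dichotomize(predicted_scores, 0.5),
    dichotomize(fractional_inhibition, 0.5),
)
print(f"continuous: {r_continuous:.3f}, dichotomized: {r_dichotomized:.3f}")
```

After dichotomization, every value on the same side of the cutoff becomes indistinguishable, so the result depends on where the cutoff is placed rather than on how closely predictions track the measured effect.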
With regard to the general practice of computationally aided design of peptide-based vaccines, functional protein annotation will increasingly suggest new potential vaccine targets as it is progressively enriched through the advancement of systems biology. These potential targets may include those that elicit antibodies in the natural course of disease and, perhaps more importantly, those that do not. The latter command special attention in that protective immunity may be conferred by vaccine-induced antibodies that bind them, although such antibodies might just as well produce deleterious effects (e.g., autoimmune and other hypersensitivity reactions). This reasoning may be extended to predicted B-cell epitopes: if they correspond to potential vaccine target regions that fail to elicit antibodies in the natural course of disease, they merit further scrutiny with respect to both safety and efficacy as possible vaccine epitopes.
At any rate, caution is warranted in all attempts to induce antibody responses against particular B-cell epitopes. This holds where antibody-dependent enhancement of pathogenetic processes is cause for concern and also more generally where antibody-independent immune mechanisms may be beneficial. Most current vaccination strategies rely mainly on antibody responses, yet antibody-independent immune mechanisms (notably those mediated by T cells) can serve as a complementary or alternative basis for protective immunity [179]. For example, although currently available influenza vaccines induce antibodies that are protective against only a relatively small subset of viral strains, this limitation could conceivably be overcome by alternative vaccines that induce immunity based on cytotoxic T cells capable of recognizing epitopes present in a broad range of viral strains [180]. Furthermore, antibody responses can interfere with the induction of antibody-independent immune mechanisms, biasing helper T-cell responses towards excessive Th2 function at the expense of Th1 function and thereby impairing cell-mediated elimination of intracellular pathogens [134]. Similar helper T-cell dysfunction may also result from other factors such as nutritional imbalances [181] and helminthic parasitism [41], which remain widely prevalent among developing regions of the world. B-cell epitope prediction may therefore find practical application less in vaccine design and more in the design of immunogens to produce antibodies for passive immunization, for both prophylaxis against and treatment of disease. 
Moreover, B-cell epitope prediction may ultimately play an even more important practical role in the development of immunodiagnostics, for which binding per se rather than biological consequences thereof is typically the primary consideration (and the benchmarking problem thus may be more readily tractable, although still decidedly non-trivial in view of the biological variability of antibody responses in both health and disease).

Conclusions
To advance the development of B-cell epitope prediction for peptide-based vaccine design, efforts must focus on predicting clinically relevant biological outcomes expressed in terms of continuous rather than artificially dichotomized variables. For this purpose, appropriate benchmark data must be generated to empirically validate the requisite predictive methods, which themselves may be provisionally justified on the basis of physicochemical arguments articulated from the perspective of systems biology. Most importantly, issues of safety and efficacy should be addressed with due attention to the individual uniqueness of prospective vaccine recipients. As epidemiologic transitions play out in the course of climate change and other environmental transformations, vaccine-design initiatives as a whole must broaden in scope to encompass a growing number of diseases among human and animal populations, within an expanding ecological framework that increasingly enables cost-effective intervention based on systems analyses. Such analyses would be better enabled by further extending the application of immunoinformatics and computational immunology to progressively higher levels of biological organization, so as to capture key emergent phenomena (e.g., host immunomodulation by commensals and parasites) that arise in real-world scenarios. In this context, B-cell epitope prediction must complement T-cell epitope prediction to support the design of peptide-based vaccines that confer protective immunity while avoiding vaccine-induced adverse reactions.
Notwithstanding the overall bias of currently employed vaccines towards induction of antibody responses, active immunization may result in harm due to antibody-dependent enhancement of pathogenetic processes, in which case carefully controlled passive immunization with exogenously supplied antibodies might offer a safer path to protective immunity. Such passive immunization could serve as an important initial proving ground for B-cell epitope prediction en route to application in the design of peptide-based vaccines: B-cell epitope prediction could aid in the design of peptide-based immunogens to induce the production of antipeptide antibodies for passive immunization, which in turn would provide an opportunity for empirical validation of predictions on clinically relevant biological outcomes in a setting of reduced complexity relative to active immunization. This scheme could facilitate the preliminary development of B-cell epitope prediction for peptide-based vaccine design, specifically by restricting biological complexity (e.g., due to prior immunologic imprinting) in passive-immunization experiments to render epitope-prediction problems more computationally tractable. At the same time, the passive-immunization protocols thus devised might themselves be useful for prophylaxis against or treatment of disease.

Competing interests
The author declares that he has no competing interests.