In-silico identification of linear B-cell epitopes in specific proteins of Bartonella bacilliformis for the serological diagnosis of Carrion’s disease

Carrion's disease (CD) is caused by Bartonella bacilliformis, a Gram-negative pleomorphic bacterium. B. bacilliformis is transmitted by Lutzomyia verrucarum in endemic areas of the Peruvian inter-Andean valleys. Its pathogenicity involves an initial infection of erythrocytes followed by infection of endothelial cells, and the disease mainly affects children and pregnant women from rural areas of extreme poverty. The implementation of serological diagnostic methods and the development of candidate vaccines for the control of CD could be facilitated by the bioinformatic prediction of linear B-cell epitopes in specific proteins of B. bacilliformis. In this study, we performed an in-silico analysis employing six web servers to identify epitopes in proteins of B. bacilliformis. The selection of B. bacilliformis-specific proteins and their epitope analysis yielded seven protein candidates that are expected to have high antigenic activity.


Introduction
Carrion's disease (CD) is endemic to Andean countries such as Peru and Ecuador, and was formerly reported in Colombia [1,2]. CD represents one of the main public health challenges in endemic localities because of poverty and poor sanitation, and it affects children with chronic malnutrition. Bartonella bacilliformis, the etiological agent of CD, is a Gram-negative pleomorphic bacterium transmitted by sand flies of the Lutzomyia genus, especially L. verrucarum [2]. Climate change and the variability in inter-Andean valley rainfall associated with the El Niño-Southern Oscillation (ENSO) contribute to the reproduction of the vector and to the emergence of local CD outbreaks with a non-negligible number of cases [3].
The clinical manifestations of CD are diverse. The microorganism parasitizes human erythrocytes, generating an acute phase named Oroya fever that is characterized by anemia and febrile illness [2]. Nevertheless, the initial symptoms may be confused with those of other infectious diseases, such as malaria or dengue. This is of special relevance because the absence or delay of adequate treatment may result in fatal outcomes: mortality rates of up to 88% have been described among untreated or inadequately treated patients [2,4]. In Peru, the overall lethality of Oroya fever ranges between 0.5 and 3%, with about 10% of severe cases attending reference hospitals having a fatal outcome [2]. Several weeks after the acute phase, patients display a non-life-threatening eruptive phase termed the verrucose phase [1,2]. The eruptive phase is characterized by skin eruptions and may occur in the absence of a previous acute infection [1,2]. Patients with asymptomatic bacteremia have also been identified; they are considered potential reservoirs of B. bacilliformis and a potential source of infection for susceptible persons [3].
Regarding the diagnosis of CD, bacteriological cultures are accurate but time-consuming because B. bacilliformis forms colonies slowly (about 4-15 days), which limits their usefulness for diagnostic purposes. Blood smear detection is widely used due to its accuracy and low cost, but it has low sensitivity, which can generate false negatives that affect the confirmation of the diagnosis [5,6]. Molecular methods based on the amplification of specific gene regions such as gltA, ribC and ialB have been proposed. However, implementing these techniques in rural areas is challenging due to the lack of equipment, and some of these genes are not specific to B. bacilliformis, causing cross-reactivity with other pathogens [7]. Furthermore, the available serological technique for the diagnosis of CD employs a soluble protein lysate and displays limited specificity and potential cross-reactivity with other Bartonellaceae or other microorganisms, similar to what has been reported for different immunological approaches for the detection of Bartonella quintana and Bartonella henselae infections [8,9]. Regarding rapid diagnostic tools that can be used in endemic areas, different antigenic candidates have been proposed, but none has been introduced into clinical practice [10,11]. Data about the immune response to B. bacilliformis are scarce, although antibody build-up appears to contribute to the development of partial immunity [10,12]. Thus, immunoglobulin M (IgM) is considered a biomarker of the acute phase and immunoglobulin G (IgG) a marker of previous exposure [10,12].
Identifying antigenic determinants is crucial for designing immune interventions such as vaccines against infectious diseases, and for avoiding cross-reactivity of the antibodies used in diagnostic methods [13]. Immunoinformatics addresses the molecular interactions of potential binding sites by computational methods [14] and is the best approach to recognizing epitopes. It is inexpensive, accurate and fast, allowing the design and synthesis of a molecule that can be used as an antigen [13]. Hence, the present study aimed to identify linear B-cell epitopes in specific proteins of B. bacilliformis.

Data preparation
Proteins highly enriched in linear B-cell epitopes were identified in B. bacilliformis based on the complete genome of strain KC584 (GenBank accession CP045671.1) [15]. The selected genome contains 1,411,655 base pairs, has 500.0× genome coverage, and was sequenced with PacBio Sequel and Illumina MiSeq [15]. Coding sequences (CDSs) were downloaded in FASTA format, and all headers were edited to retain the name, orientation and position in the genome.

Selection of non-homologous proteins
To obtain only proteins exclusive to B. bacilliformis, we used the BLAST+ tool to identify non-homologous proteins, comparing the proteins encoded by the essential genes of B. bacilliformis with the proteomes of Homo sapiens, Mus musculus, febrile illness-associated bacterial species and other Bartonella species. First, we identified proteins that were non-homologous to those of Homo sapiens (assembly GRCh38.p13) and Mus musculus (assembly GRCm39), to avoid cross-reactivity when human samples and animal-model samples, respectively, are used. The selected parameters were bit-score < 100, identity < 35% and coverage < 35%.
Finally, percentage identity was used as the criterion for selecting B. bacilliformis-specific proteins: proteins with less than 80% identity to other bartonellae and less than 70% identity to fever-causing bacteria were retained.
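The threshold-based screening described in this section can be illustrated with a minimal sketch. It assumes that alignment hits have already been parsed into records (for example from BLAST+ tabular output) and that a protein is treated as non-homologous only when every one of its hits falls below all three thresholds; how the thresholds were actually combined is not stated in the text, so this rule is an assumption. The names Hit, non_homologous and specific_by_identity, as well as the protein identifiers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One alignment hit of a B. bacilliformis protein against a reference proteome."""
    query_id: str
    identity: float   # percent identity
    coverage: float   # percent query coverage
    bitscore: float


def is_below_thresholds(hit, max_bitscore=100.0, max_identity=35.0, max_coverage=35.0):
    # Assumption: all three criteria must hold for a hit to count as non-homologous.
    return (hit.bitscore < max_bitscore
            and hit.identity < max_identity
            and hit.coverage < max_coverage)


def non_homologous(protein_ids, hits, **thresholds):
    """Keep proteins with no hit at or above any threshold (proteins without hits are kept)."""
    flagged = {h.query_id for h in hits if not is_below_thresholds(h, **thresholds)}
    return [p for p in protein_ids if p not in flagged]


def specific_by_identity(protein_ids, best_identity, max_identity):
    """Keep proteins whose best percent identity against a reference group is below max_identity."""
    return [p for p in protein_ids if best_identity.get(p, 0.0) < max_identity]


# Toy data, not taken from the study.
proteins = ["prot_a", "prot_b", "prot_c"]
host_hits = [
    Hit("prot_a", identity=28.0, coverage=20.0, bitscore=45.0),
    Hit("prot_b", identity=62.0, coverage=90.0, bitscore=310.0),
]
kept = non_homologous(proteins, host_hits)

# Second stage: < 80% identity to other bartonellae (toy best-identity values).
specific = specific_by_identity(kept, {"prot_a": 85.0, "prot_c": 41.0}, max_identity=80.0)
print(kept, specific)
```

The same specific_by_identity call with max_identity=70.0 would cover the comparison against fever-causing bacteria.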

Results
This study aimed to identify linear B-cell epitopes in highly specific proteins of B. bacilliformis for further use in serological diagnosis. The identification of essential genes yielded 646 proteins, which were screened for non-homologous proteins. Comparison of the B. bacilliformis essential proteins with the proteomes of Homo sapiens and Mus musculus yielded 323 proteins, and 131 proteins were identified as non-homologous to febrile illness-associated bacterial species and other Bartonella species (Fig 1). Percentage identity was then applied as the selection criterion for B. bacilliformis-specific proteins, leaving 29 proteins that were used to identify linear B-cell epitopes (Table 1). These specific proteins include 13 cytoplasmic proteins, 12 cytoplasmic/membrane proteins, 2 outer membrane proteins and 2 proteins of unknown localization; functionally, they are involved more in metabolic activities than in structural roles.
Six web servers were employed to predict the linear epitopes in each B. bacilliformis-specific protein, using signal-peptide-free sequences. The number of predictions per position was then plotted to facilitate the identification of protein regions with a higher number of epitopes. Overall, epitopes were predicted in more than 50% of the entire protein sequence, except in prot 492 (Fig 2). The seven proteins displaying the most linear B-cell epitopes were selected for further analysis.
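The tally of predictions per position can be sketched as follows. This is an illustrative example rather than the procedure used in the study: it assumes that each web server's output has been reduced to a list of (start, end) intervals in 1-based inclusive coordinates on the signal-peptide-free sequence, and the function names and tool labels are hypothetical.

```python
def predictions_per_position(length, epitopes_by_tool):
    """Count, for each residue, how many tools place it inside a predicted epitope.

    epitopes_by_tool maps a tool label to a list of (start, end) intervals,
    1-based and inclusive. Each tool contributes at most 1 per position,
    even when its own intervals overlap.
    """
    counts = [0] * length
    for intervals in epitopes_by_tool.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(max(start, 1), min(end, length) + 1))
        for pos in covered:
            counts[pos - 1] += 1
    return counts


def covered_fraction(counts):
    """Fraction of the sequence covered by at least one predicted epitope."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > 0) / len(counts)


# Toy example for a 10-residue sequence and three hypothetical tools.
toy = {
    "tool_a": [(2, 5)],
    "tool_b": [(4, 8)],
    "tool_c": [(4, 6), (5, 7)],
}
counts = predictions_per_position(10, toy)
print(counts)                    # [0, 1, 1, 3, 3, 2, 2, 1, 0, 0]
print(covered_fraction(counts))  # 0.7
```

Plotting counts against residue position gives the kind of per-protein profile described above, and covered_fraction corresponds to the share of the sequence in which epitopes were predicted.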
The linear B-cell epitopes were highlighted in the simulated 3D structures of the top seven proteins (Fig 3); no computational simulations of the linear epitopes themselves were carried out during 3D modeling. The 3D structures provide insight into the possible arrangement of the epitopes in the predicted proteins in their native state.
Linear B-cell epitopes were identified using six bioinformatics tools, and the results were plotted to reveal the areas that tend to contain more linear B-cell epitopes; this study considers as regions of interest those identified by at least three predictors (Table 2 and Fig 2). Prot 447, predicted to be a peptidoglycan DD-metalloendopeptidase, has regions highly enriched in linear B-cell epitopes and was the only protein containing linear B-cell epitopes predicted by all six tools. Likewise, regions highly enriched in linear B-cell epitopes were identified in prot 81, prot 288, prot 492, prot 504, prot 612 and prot 689. Prot 81 is predicted to be a protein with tyrosine recombinase activity that participates in the integration of phage genetic material; prot 288, a transport protein with an ABC domain; prot 492, a protein involved in cell division; prot 504, a beta-barrel protein responsible for locating proteins in the outer membrane; prot 612, a protein that participates in DNA replication and has a single-stranded DNA-binding domain; and prot 689, a lipoprotein A with lytic transglycosylase activity. The size and number of linear B-cell epitopes varied among proteins; Table 2 shows the characteristics of the epitopes.
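The criterion of agreement among at least three predictors can be expressed as a short sketch. It assumes a per-residue list of prediction counts (one integer per position, giving the number of tools that predicted an epitope there) and returns contiguous stretches that meet the threshold. The function name and the min_length parameter are hypothetical; min_length is included only to show how very short stretches could be discarded and is not a value taken from the study.

```python
def consensus_regions(counts, min_tools=3, min_length=1):
    """Return (start, end) regions, 1-based inclusive, where at least
    min_tools predictors agree on every residue of the region."""
    regions = []
    start = None
    for pos, count in enumerate(counts, start=1):
        if count >= min_tools:
            if start is None:
                start = pos          # a new region opens here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None             # the region closed at pos - 1
    # Close a region that runs to the end of the sequence.
    if start is not None and len(counts) - start + 1 >= min_length:
        regions.append((start, len(counts)))
    return regions


# Toy per-residue counts for a 12-residue sequence.
toy_counts = [0, 1, 3, 3, 4, 2, 3, 3, 3, 3, 3, 3]
print(consensus_regions(toy_counts))                # [(3, 5), (7, 12)]
print(consensus_regions(toy_counts, min_length=6))  # [(7, 12)]
print(consensus_regions(toy_counts, min_tools=6))   # []
```

Setting min_tools=6 would select only residues supported by all six tools, the level of agreement reported here for prot 447.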

Discussion
The in-silico identification of linear B-cell epitopes in specific proteins of B. bacilliformis for application in immunological methods is a strategy that several researchers have considered to address the need for serological diagnosis. In this study, the bioinformatic analysis aimed to identify proteins non-homologous to those of Homo sapiens, Mus musculus, other bartonellae, and a subset of bacterial agents of febrile syndromes. This approach is similar to the methodology used by Dichter et al., who identified potentially immunoreactive proteins by predicting putative antigenic proteins in-silico from the genomic sequences of B. bacilliformis and evaluating homologies that could lead to cross-reactivity with other Bartonella spp. [33]. All proteins identified in this study have less than 80% sequence identity with other species of the Bartonella genus and less than 70% with other genera associated with fever. The in-silico identification of linear B-cell epitopes in specific proteins of B. bacilliformis was carried out using six epitope prediction tools available on web servers, which rely on algorithms such as support vector machines (SVM), random forests (RF) and neural networks (NN), as well as on physicochemical characteristics. The performance of these methodologies has been analyzed simultaneously in different studies [34,35], and they have been applied to different Gram-negative bacteria such as Coxiella burnetii [36], Vibrio vulnificus [37], Treponema pallidum [38] and Anaplasma phagocytophilum [39], among others. Furthermore, many of the proteins or peptides obtained in those studies have shown their utility in serological assays and/or in the generation of antibodies in murine models for vaccine development. In that regard, Dichter et al. identified three immunodominant proteins that were evaluated by ELISA and displayed a sensitivity of 81% and a specificity of 95% when porin B and autotransporter E were combined [33]. Likewise, Padilla et al. presented a vaccine candidate protein designed using epitopes predicted by in-silico analysis [40].
The in-silico analysis allowed the identification of seven specific proteins of B. bacilliformis. Prot 81 was predicted to be a phage integrase with cytoplasmic localization. However, this does not rule out investigating this protein, since the ELISA performed at the National Institute of Health of Peru is produced using total proteins of B. bacilliformis strains. Prot 288 is predicted to belong to the ABC (ATP-binding cassette) transporter family and contains three continuous linear epitopes according to our analysis (Table 2). According to GenBank, the predicted region of this protein is involved in metal cation exchange, but it has not yet been characterized in Bartonella bacilliformis. However, Hina et al. identified a protein from the membrane transporter family (hemin ABC transporter) as a candidate for the design of vaccines against Bartonella bacilliformis using bioinformatic analysis [41].
A similar in-silico approach was taken by Dichter et al., who employed only the Vaxign software and identified the peptidoglycan-binding protein (LysM) [33], the same protein that we describe in our study as prot 447. In our research, prot 447 was predicted to display 7 linear epitopes (6-30 amino acids; Table 2) and minor sections that could not be considered epitopes due to their size (2 to 5 amino acids). According to previous studies, clusters of linear epitopes and minor sections could be part of a conformational epitope [42,43]; likewise, it has been suggested that conformational epitopes are formed by linear epitopes [44,45]. It should be noted that, unlike in the study by Dichter et al., more than three tools coincided on the same regions when predicting epitopes within this protein in our study, which supports our results and the consideration of prot 447 as a good antigenic candidate for the development of serological diagnostic methods. Furthermore, a previous study identified a homolog of prot 447 by screening heterologous proteins with patient sera [46]. The homolog of prot 447 has also been identified in western blot assays [46] and has been expressed in vitro in Escherichia coli for ELISA-type serological assays [40]. As in the previous study, prot 447 was predicted here to be a 43 kDa lipoprotein with metallopeptidase activity and a lysin motif (LysM) domain responsible for peptidoglycan binding. The findings on the homolog of prot 447 support our bioinformatics analysis, considering that the prot 447 predicted by us displayed several linear B-cell epitopes.
Likewise, prot 504 was predicted to be an outer membrane protein with a beta-barrel domain. This kind of protein has been shown to generate immunity against Pasteurella multocida [47] and to have potential use as a vaccine against Haemophilus influenzae [48,49] and Leptospira sp. [50]. The remaining five identified proteins are predicted to be involved in cell division, peptidoglycan binding or transglycosylase activity; these proteins have not been reported previously, and experimental analysis is required.
The finding of specific outer membrane proteins (OMPs) of B. bacilliformis, such as the lipoprotein prot 447 and prot 504 with its beta-barrel domain, provides the basis for the future implementation of an accurate and sensitive serological assay for the diagnosis of CD, as OMPs can activate the immune response and contribute to virulence by mediating pathogen-host interactions [51]. In Gram-negative bacteria, the tertiary structure of OMPs includes a beta-barrel structure composed of a variable number of beta-strands. Another type of exposed OMP, known as outer membrane lipoproteins, is also involved [52,53]. In addition, both outer-membrane beta-barrels (OMBB) and outer-membrane lipoproteins (OMLP) are considered good candidates for the development of vaccines and as targets for immunological tests, and could promote a better understanding of the pathogenicity of B. bacilliformis, facilitating the identification of therapeutic molecules for in-silico and experimental assays [54].
Prot 612, predicted to be a single-strand binding protein, has not been described previously; the information available in GenBank indicates that this protein could be involved in DNA replication, DNA repair and DNA recombination. Prot 689 was predicted to be a lytic transglycosylase, annotated in GenBank as a member of the septal ring lytic transglycosylase RlpA family (rare lipoprotein A). This kind of protein has been studied in Gram-negative bacteria and described as participating in cell division [55]; hence it can be inferred that prot 689 has a major role in bacterial survival. No previous studies have performed serological analyses using homologs of prot 612 or prot 689.

Perspectives and implications of this study
Adequate medical interventions rely on disease diagnosis. CD includes an acute phase, characterized by anemia and febrile illness, and a chronic phase distinguished by skin eruptions [2]. Furthermore, asymptomatic carriers have been reported and are considered reservoirs and sources of the disease [2,10]. It is therefore important that a serological test for CD be not only sensitive and specific but also able to detect early (acute) and late (chronic) infections, as well as asymptomatic carriers. Detection of the latter is of special interest for advancing towards the eradication of CD [2]. The development and implementation of IgM (acute phase) and IgG (chronic phase and previously exposed population) antibody tests is important, with ELISA, immunoblot and immunochromatographic lateral flow assays being good alternatives. The validation of these methods should include samples representing the different endemic areas of Ecuador and Peru. Many of the B. bacilliformis proteins identified in this study could be used as therapeutic targets, given their high specificity (e.g., for inhibiting the secretion of virulence proteins or the maintenance of the cell wall), or as vaccine candidates (by designing a mix of complete antigens, multiepitope proteins, or outer-membrane vesicles containing virulence factors). Future studies should consider both 1) the genetic variability of B. bacilliformis and 2) the genetic variability of human immune components (HLA-I, HLA-II), and should be carried out using molecular dynamics approaches.

Conclusions
We report the in-silico identification of B. bacilliformis proteins with a high number of predicted linear B-cell epitopes, obtained by exploring a combination of predictors at the genome level. The seven protein candidates identified in this study could be used for the development of serological diagnostic tests, the production of monoclonal antibodies and the development of vaccine candidates for the control of CD.