In silico assessment data of allergenicity and cross-reactivity of NP24 epitopes from Solanum lycopersicum (Tomato) fruit

This paper describes data on food allergies caused by vegetables and their negative impact on the nutritional balance of the human body. Allergic responses to vegetables such as tomato, capsicum and spinach rank next to those caused by fish, eggs and nuts. NP24 is a salt-induced allergenic protein of the thaumatin-like protein (TLP) family. The mechanism of allergenicity of the TLP found in Solanum lycopersicum (Tomato) fruit is poorly studied. Here we describe the allergenicity conferred by the NP24 protein found in Tomato. The data on the cross-reactivity of the NP24 protein were generated using the Allergen Online and Allermatch tools. The Tomato allergenic protein shows significant identity with allergens reported in Capsicum, Olive, Kiwi, Tobacco and Banana. The datasets of sequences, comparative analysis and homology epitope mapping over three-dimensional (3D) structures revealed that NP24 has higher cross-reactivity to Capsicum and Tobacco proteins. These data may therefore guide the planning of wet-lab experiments.


Data
This paper describes data on plant pathogenesis-related proteins (PR-1 to PR-17), especially those found in vegetables. The allergenic thaumatin-like protein from Solanum lycopersicum (Tomato) belongs to the PR-5 group. This group shows functional diversity in allergenicity and kinase function. The Tomato fruit NP24 protein has seven different types of IgE epitopes. Mapping of the IgE epitopes shows that NP24 is allergenic and cross-reactive.
Nutritious food is required for growth, development and maintenance of health. Commonly consumed foods such as fish, eggs, fruits and nuts are widely accepted and rank among the most nutritious foods. On the contrary, intake of such foods can cause allergy in some individuals. Allergies are caused by an overreaction of the body's immune defence system. A study of the prevalence of sensitization to foods in Europe tested 4522 individuals living in 13 countries for IgE against 24 foods. The survey reported that, in most of the countries, sensitization to vegetables, fruits and nuts was more prevalent than sensitization to eggs, milk and seafood. Allergic sensitivity to nuts was 7%, whereas it was 0.2% and 0.4% to fish and eggs, respectively. Tomato is one of the most commonly consumed vegetables, next to apple and wheat; Solanum lycopersicum fruits are classed as vegetables by nutritionists. Food allergy to Tomato fruit showed a significant prevalence of 3.3% [1]. The prevalence of oral food challenge (OFC)-proven food allergy in preschool children in developed countries has been reported to be as high as 10%, compared with 7% in Asian countries such as China and India [2]. However, urbanization, increased consumption of processed food and a stressful lifestyle have resulted in reduced immunity and increased food allergies, especially in children [3,4].
The hypersensitive response is one of the most efficient mechanisms for conferring immunity against phytopathogens, which include fungi, bacteria and viruses. Pathogenesis-related proteins (PR proteins) are defensive molecules that protect plants. They belong to the family of "stress-inducible" proteins. They were first discovered in Nicotiana tabacum (Tobacco) plants reacting hypersensitively to infection by Tobacco Mosaic Virus [5,6]. Later, many PR proteins were detected in other plant species [7][8][9][10][11].
The pathogenesis-related protein families are broadly classified into 17 groups. Among them, the thaumatin-like proteins (PR-5) form the fifth group of the PR protein family, with molecular weights ranging from 20 to 26 kDa. They were named thaumatin-like proteins because their amino acid sequence is homologous to thaumatin, a sweet-tasting protein derived from Thaumatococcus daniellii [12]. Thaumatin and thaumatin-like proteins have also been identified in animals [13] and fungi [14,15]. Proteins of the thaumatin family have eight disulphide bridges [16,17]. Despite the lack of atopic individuals (in humans), pollen and food TLPs have been identified as inhalant and ingestant allergens, respectively [18]. Reports show that 39.2% of children monosensitized to grass pollen also have IgE antibodies to Tomato fruit [19]. In 1988, Ortolani et al. confirmed that the association between Tomato fruit oral allergy syndrome (OAS) and grass pollen allergy is statistically significant [20]. It has also been reported that anaphylaxis sometimes arises within a few hours of consuming Tomato fruit [21]. Several allergens in Tomato fruit have been described, such as Sol-l1, Sol-l2, Sol-l3, Sol-l chitinase, Sol-l-Glucanase, Sol-l-peroxidase and Sola-TLP (NP24), but the clinical relevance of each of these allergens is not yet clear. Only a limited number of TLPs have been identified from plant pollens and foods. Among fruits, only hybrid forms of TLPs have been shown to give a hypersensitive reaction with IgE [22][23][24]. So far, out of 15 allergenic TLPs, only 7 have been crystallized and had their 3D structures elucidated [25]. However, a comparative analysis of the structural features of allergenic TLPs in the context of IgE epitope prediction has not been reported so far.
Protein NP24 is a salt-induced TLP (molecular weight 24 kDa) of 247 amino acids found in Tomato fruit tissues [26]. NP24 was first isolated, purified and crystallized from Tomato fruits [27]. Later studies reported two isoforms (I and II) of the thaumatin-like protein NP24 [19]. Isoform I is expressed mainly in the outer pericarp of healthy Tomato fruits; its level is low in green fruits and increases during ripening. Isoform II, on the other hand, is relatively high in green Tomato fruits; its concentration rises as the fruit turns pink and subsequently decreases as the fruit turns red. Fully ripened (mature) Tomato fruit therefore contains mainly isoform I, whereas half-ripened fruit contains both isoforms (I and II) in significant quantity.
Detecting allergy in individuals, either in vivo or in vitro using molecular biology techniques, is difficult, time-consuming and costly [28]. The risk of severe allergic reactions during testing and the lack of sensitivity are the major drawbacks of in vivo and in vitro methods. These shortcomings make computational methods a good approach for the identification of epitopes and allergenicity. From the above discussion, it may be inferred that vegetable allergies, especially those caused by consumption of Tomato fruit, are more prevalent than one would expect. The present work was therefore undertaken to analyze the NP24 protein from Tomato fruits computationally in order to identify allergenic components such as IgE epitopes, their positions and their possible cross-reactivity with other food and pollen TLPs.

Experimental design, materials, and methods
The protein sequence data of the NP24/Thaumatin-like protein
The NP24 protein sequences were retrieved from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/). The search yielded around 5,359 hits from different organisms. Narrowing the search to Solanum lycopersicum yielded 44 results. We used the complete sequence of 247 amino acids (NCBI accession number P12670).

Prediction of cross-reactivity of NP24 protein
The cross-reactivity of the NP24 protein sequence was determined using free tools such as Allergen Online (www.allergome.org) and Allermatch (http://allermatch.org/). Both online tools give comparative data on the cross-reactivity and IgE-binding properties of the clinically important NP24 protein.

Prediction of 3-dimensional (3D) conformation of NP24 protein
The HHpred method was used to predict the 3D structure from homologous sequences. HHpred uses a Hidden Markov Model-Hidden Markov Model (HMM-HMM) comparison algorithm (http://toolkit.tuebingen.mpg.de/) to identify the 3D structure. The modeling process involves the following steps: (a) database search and E-value filtering, in which homologous sequences detected at greater than 90% and with a very low E-value are considered; (b) checking the similarity between the secondary structures of the sequences; (c) identification of possible conserved motifs for designing the structure; (d) alignment of the target sequence with the template structure; and (e) realignment of the sequences.

Data on prediction and characterization of antigenic determinants of NP24 protein
Antigenic determinants are the parts of an antigen that are recognized by antibodies, B cells or T cells, which react with them and produce hypersensitivity reactions. Accurate prediction of linear B-cell epitopes is a challenging step in designing immunotherapy. The BepiPred prediction method was adopted to identify and locate B-cell epitopes; it combines a hidden Markov model with the Parker & Levitt propensity scale algorithm. The AlgPred tool (http://www.imtech.res.in/raghava/algpred/submission.html) was used to predict antigenic determinants binding to IgE. To predict allergens, BLAST was first used to identify sequences, which were then aligned using Allergen Representative Peptides. The Multiple EM for Motif Elicitation (MEME) tool was used to discover motifs in a group of related protein sequences, followed by statistical analysis. Phylogenetic tree analysis was done using MEGA6.

IEDB homology mapping of NP24 protein
Mapping gives the best representation of antigenic determinants over the 3D structure. It also provides information about the position of epitopes, i.e. whether the epitope regions lie inside the structure or at the surface, and how these epitopes are distributed over sequences similar to the query sequence. The IEDB epitope homology mapping tool (http://tools.immuneepitope.org/tools/bcell/iedb) detects PDB structures that are homologous to the epitope source sequence.

Epitope Conservancy Analysis (ECA)
The ECA tool computes the degree of conservancy of an epitope within a given protein sequence set at a given identity level. The results are presented as a summary view (for all epitope sequences) and a detail view (for an individual epitope). The summary view shows, for each epitope, the degree of conservancy (the percentage of protein sequences that match at the specified identity level) and the minimum and maximum matching identity levels within the protein sequence set. The detail view of an epitope shows the positions and the matching protein sub-sequences for all sequences in the protein dataset.
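The conservancy measure described above can be illustrated with a minimal Python sketch. This is not the IEDB implementation: it assumes ungapped comparison of the epitope against every window of each protein, and the three toy protein strings are hypothetical. Only the epitope sequence (Epitope 1, AAGTASARFWGRT) is taken from this article.

```python
def best_identity(epitope: str, protein: str) -> float:
    """Highest fraction of identical residues over all ungapped windows."""
    k = len(epitope)
    best = 0.0
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope: str, proteins: list, identity_level: float = 1.0) -> dict:
    """Fraction of proteins containing the epitope at >= identity_level,
    plus the minimum and maximum best-match identity in the set."""
    identities = [best_identity(epitope, p) for p in proteins]
    conserved = sum(x >= identity_level for x in identities)
    return {
        "conservancy": conserved / len(identities),
        "min_identity": min(identities),
        "max_identity": max(identities),
    }


# Epitope 1 from the article; the protein set below is made up for illustration.
EPITOPE_1 = "AAGTASARFWGRT"
TOY_PROTEINS = [
    "MKAAGTASARFWGRTLL",  # exact copy of the epitope
    "MKAAGTASARFWARTLL",  # one substitution (12 of 13 residues identical)
    "MKLLLLLLLLLLLLLLL",  # no similarity
]

summary_100 = conservancy(EPITOPE_1, TOY_PROTEINS, identity_level=1.0)
summary_90 = conservancy(EPITOPE_1, TOY_PROTEINS, identity_level=0.9)
print(summary_100)
print(summary_90)
```

Lowering the identity level from 100% to 90% raises the conservancy of the toy set from one in three sequences to two in three, which is the kind of trade-off the summary view reports.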

Datasets of cross-reactivity of NP24 protein
The Allergen Online, Allermatch and FARRP tools and databases gave significant information regarding cross-reactivity. In 2000, Aalberse hypothesized that greater than 70% identity between a query sequence and homologous known allergen sequences indicates significant cross-reactivity, while proteins with less than 50% identity are unlikely to be cross-reactive. Accordingly, in the full-length alignment, a query sharing greater than 50% identity with an allergen sequence was considered potentially cross-reactive. In the 80-mer sliding window method, a similarity search was performed for every 80-amino-acid segment of the query sequence. The cutoff value was greater than 35% identity (as per the FAO/WHO 2001 expert panel recommendation), which indicates possible cross-reactivity of allergens. In the 8-amino-acid exact match (8-mer) method, any exact match to the query was taken to identify the protein as a potential cross-reactive allergen. Cross-reactivity analysis using all three methods in both FARRP and Allermatch gives in-depth cross-reactivity information. Table 1 shows the results from the Allergen Online database. In the full-length alignment method, the food allergens from Kiwi, Olive and Sapota showed significant identity to NP24 (above 65%), whereas pollen allergens such as those of Japanese Cedar and White Cedar showed an identity of approximately 50%. In the 80-mer sliding window analysis of NP24, the number of 80-mer windows was 168, and 29 sequences showed matches over 80-amino-acid stretches with the allergens deposited in the databases. In the 8-mer analysis, the total number of 8-mers was 240, and 15 sequences had at least one 8-mer match to allergens found in the database.
In the full-length alignment method, interestingly, NP24 shows more than 50% identity to the TLPs of Capsicum, Kiwi fruit, Banana, Apple, White Cedar and Cupressus sempervirens. The 80-mer sliding window and exact match methods gave results for NP24 similar to those of the full-length alignment (Table 2).
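The window counts quoted above follow directly from the sequence length: a sequence of length L contains L - w + 1 overlapping windows of width w. The Python sketch below checks this arithmetic and illustrates the 8-mer exact-match and 35% criteria. It is a simplification, not the FARRP or Allermatch implementation (both tools run FASTA alignments inside each window), and the toy sequences and identity values are hypothetical.

```python
def count_windows(seq_len: int, size: int) -> int:
    """Number of overlapping windows of width `size` in a sequence."""
    return max(seq_len - size + 1, 0)


def kmers(seq: str, k: int) -> set:
    """All distinct contiguous k-residue substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(query: str, allergen: str, k: int = 8) -> list:
    """Exact k-mer matches between a query and a known allergen."""
    return sorted(kmers(query, k) & kmers(allergen, k))


def is_flagged(window_identities, exact_hits, cutoff: float = 35.0) -> bool:
    """Flag possible cross-reactivity if any 80-mer window identity (in %)
    exceeds the cutoff, or if at least one exact 8-mer match exists."""
    return any(pid > cutoff for pid in window_identities) or bool(exact_hits)


# NP24 has 247 residues (from the article).
n_80mers = count_windows(247, 80)
n_8mers = count_windows(247, 8)
print(n_80mers, n_8mers)  # 168 240

# Hypothetical toy sequences, used only to show the exact-match step.
toy_query = "MGYFTSAAGTASARFWGRT"
toy_allergen = "PPTASARFWGKK"
hits = shared_kmers(toy_query, toy_allergen)
print(hits)
print(is_flagged([20.0, 33.8], hits))
```

Keeping the identity values as an input, rather than computing them here, avoids pretending that a simple residue-by-residue comparison reproduces a FASTA alignment.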

Recognition and assessing antigenic determinants of NP24 protein
The NP24 protein sequence was retrieved from the SwissProt database (accession number P12670) and analyzed for B-cell epitopes using the IEDB tool. Prediction scores for each residue of NP24 were obtained with IEDB BepiPred (Fig. 1). Residues with scores above the cutoff value (> 35%) were predicted to be part of an epitope and are highlighted in yellow. In total, 12 epitopes were identified. Of these, epitopes 10, 6, 2 and 11 showed high scores, followed by epitopes 8, 7 and 4 (intermediate scores) and then by epitopes 1, 3, 5, 9 and 12.
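The step from per-residue scores to discrete epitopes can be sketched as follows. This is a minimal illustration, not the BepiPred code: it groups consecutive residues whose score exceeds a cutoff and ranks the resulting segments by mean score. The score list is invented, and the cutoff of 0.35 is chosen only to mirror the 35% cutoff quoted above; how the tool scales its scores is an assumption.

```python
def segments_above(scores, cutoff, min_len=1):
    """Return 1-based inclusive (start, end) runs of residues scoring > cutoff."""
    segments = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = pos          # open a new run
        elif start is not None:
            if pos - start >= min_len:
                segments.append((start, pos - 1))
            start = None             # close the current run
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


def rank_by_mean_score(scores, segments):
    """Order segments from highest to lowest mean per-residue score."""
    def mean_score(segment):
        first, last = segment
        values = scores[first - 1:last]
        return sum(values) / len(values)
    return sorted(segments, key=mean_score, reverse=True)


# Hypothetical per-residue scores for an 8-residue toy peptide.
toy_scores = [0.1, 0.4, 0.5, 0.2, 0.36, 0.9, 0.8, 0.1]
toy_segments = segments_above(toy_scores, cutoff=0.35)
toy_ranked = rank_by_mean_score(toy_scores, toy_segments)
print(toy_segments)
print(toy_ranked)
```

Ranking by mean score is one plausible way to obtain the high, intermediate and low groups of epitopes; the article does not state which summary statistic was used.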

Epitope conservancy analysis and distribution of epitopes
The epitope conservancy analysis tool calculates the degree of conservancy of an epitope within a given protein sequence set at different degrees of sequence identity. The degree of conservation is defined as "the fraction of protein sequences containing the epitope at a given identity level". Epitope conservancy analysis was performed on the 7 epitopes identified by the AlgPred tool. IgE epitopes 1, 3 and 2 showed the highest degree of conservancy. The detailed analysis in Table 4 shows the epitope sequence, starting position, ending position and percentage identity to the query sequence. Epitope regions of NP24 were predicted by the different tools discussed earlier. Table 5 shows positions 50 to 73 of the NP24 sequence.

Phylogenetic analysis of NP24 protein
The phylogenetic tree was inferred using the MEGA6 tool to understand the evolutionary pattern of homologous sequences that share IgE epitopes with NP24 (Fig. 2). Homologous sequences obtained from the epitope conservancy analysis were aligned using MUSCLE, and the phylogenetic tree was constructed with MEGA 6.0. Protein NP24 is closely related to the osmotin-like protein of Capsicum annuum (Fig. 3), with which it shares epitopes 1 and 2. This suggests that a person with Tomato fruit allergy might also be allergic to Capsicum.

Table 2 Allergenicity assessment of NP24 protein using the Allermatch tool. a] Results of the FASTA alignment of the input sequence against the UniProt and WHO-IUIS databases (the number indicates percentage identity). b] (i) Percentage of identical amino acids in the aligned 80-aa sliding window, (ii) the number of hits the input sequence had with this allergen, (iii) the percentage of windows analyzed for this input sequence hitting this allergen and (iv) results of a FASTA alignment of the complete input sequence against this database sequence; the first number is the percentage identity and the second is the length of sequence over which FASTA aligned. c] The percentage of exact hits the input sequence has with this allergen sequence.

Table 2 columns: Protein ID; Species name; Full length a (%); 80-mer window b (%); 6 amino acid match c (%)

Fig. 2. Both IgE epitope surfaces overlap with the thaumatin family signatures. Further, B-cell epitope residues with high scores (FNAAGRGTCQTGDC) and medium scores (PRGTK) were found in IgE epitopes 1 and 2, which indicates their higher accessibility for antibody recognition.

Data on computational modeling of NP24 protein
The crystal structure of NP24-I (PDB ID: 2i0w) was determined by X-ray diffraction at a resolution of 2.5 Å and deposited in the PDB in 2006. An in silico model of NP24-I was compared with the crystal structure of NP24 using the DALI tool. The homology method was used to develop the model of protein NP24. In total, 11 crystallographic structures of TLPs were available in the RCSB PDB database. To construct the structure of NP24, one can choose any one of these structures, or a combination of them, as templates. On the basis of alignment score and sequence coverage, 6 structures were selected. The 3D model was then generated using Modeller via HHpred (Fig. 4).
To assess the quality of the model and recognize errors in it, the PROSA tool was used. The PROSA result shows overall and local model quality, assessed using the Z-score. The obtained Z-score of -4.43 lies within the acceptable range (Fig. 5). Furthermore, we aligned the model structure with the crystal structure of NP24-I (PDB ID: 2i0w) using the DaliLite tool. The two structures are highly similar, with a Z-score of 35.4 and a C-alpha RMSD of 0.6 Å (Fig. 6). Structural analysis of the modeled protein shows that NP24-I has five helices, 19 strands, 34 turns, 128 hydrogen bonds and eight disulphide bridges, which stabilize the entire structure. All interactions, including disulphide bridges, are shown in Fig. 7. Ten clefts were observed, of which four have higher volumes and are more significant. The residues lining the clefts were also grouped according to their amino acid properties. The first cleft is the largest, with a volume of 2582 Å³; it contains one cysteine residue, three positively charged and two negatively charged atoms, and a larger number of aliphatic residues. The second cleft, with a volume of 1766 Å³, contains two cysteine residues, which indicates a more stabilized structure. An important feature of the second cleft is the presence of a higher number of neutral and aromatic residues. The IgE epitopes of NP24 predicted by the AlgPred tool are shown in stick representation.
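For readers unfamiliar with the C-alpha deviation reported above, the quantity is the root-mean-square distance between paired C-alpha atoms of two structures. The Python sketch below computes it under a strong assumption: the two coordinate lists are already superposed and paired residue by residue. DaliLite performs the structural alignment and superposition itself, so this is only an illustration of the final arithmetic, and the coordinates are invented.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superposed and the atoms are paired
    in order; no fitting or alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates (in Å): the second set is shifted 0.6 Å along z.
model_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal_ca = [(0.0, 0.0, 0.6), (3.8, 0.0, 0.6), (7.6, 0.0, 0.6)]
toy_rmsd = ca_rmsd(model_ca, crystal_ca)
print(round(toy_rmsd, 2))
```

A uniform 0.6 Å shift gives an RMSD of 0.6 Å, which conveys how small the reported deviation between model and crystal structure is relative to the roughly 3.8 Å spacing of consecutive C-alpha atoms.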

IgE Epitope mapping using IEDB's homology mapping tool
Predicted IgE epitopes of the NP24 protein were mapped onto the 3D structures and sequences of some TLPs. The mapping shows where the source sequences contain regions similar to the epitope sequence. The epitope locations annotated by IEDB are highlighted in green and orange, which indicate perfect sequence matches, while other matches (identity > 80%, overlap ≥ 80% and no more than one gap) are indicated in light grey. Epitope mapping resulted in 19 hits with epitope matches to the structures. These 19 hits have sequence similarity greater than 39%. The mapping also shows the number of epitope residues that are exactly conserved in the homologous protein structures. Pathogenesis-related (PR) proteins are a specific group of plant proteins, found especially in vegetables, with types ranging from PR-1 to PR-17 [29,30]. Protein sequences showing maximum similarity to thaumatin protein (TP) belong to the PR-5 group. Most PR families show functional diversity in allergenicity and kinase function despite their sequence similarity [31].
Current understanding of the cross-reactivity of food allergens from Tomato fruit is very limited. NP24 protein alignment and analysis using the 80-mer window methods (FARRP and Allermatch) shows > 60% identity with thaumatin-like proteins (TLPs) from plant foods such as Kiwi, Bell Pepper, Banana, Wheat and Peach. The number of pollen TLPs showing considerable similarity was very low [32]. The number of 8-mer matches with NP24 found by FARRP and Allermatch was high for Bell Pepper, Olive and Kiwi. AlgPred epitope analysis of NP24 indicates the presence of seven (7) different types of IgE epitopes in the homologous sequences. Among TLPs, a few protein sequences have only four epitopes and others are devoid of epitopes. IgE epitopes 1 and 2 predominate in food TLPs, whereas IgE epitopes 1, 2 and 3 occur frequently in most pollen TLPs. Secondary structure analysis shows that the structure of NP24 is very similar to that of other TLPs. The NP24 sequence contains high percentages of amino acids such as glycine 28/247 (11.34%), threonine 25/247 (10.12%) and proline 21/247 (8.50%).
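The composition percentages above are simple ratios of residue counts to the sequence length of 247. A short Python check is given below; the counts are those quoted in the text, while the `composition` helper and its toy input are illustrative only.

```python
from collections import Counter


def percent(count: int, total: int) -> float:
    """Percentage of `count` in `total`."""
    return 100.0 * count / total


def composition(sequence: str) -> dict:
    """Map each residue to (count, percentage) for a one-letter sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, percent(n, total)) for aa, n in counts.items()}


# Residue counts for NP24 as quoted in the article (sequence length 247).
NP24_LENGTH = 247
quoted_counts = {"glycine": 28, "threonine": 25, "proline": 21}
quoted_percent = {
    name: round(percent(n, NP24_LENGTH), 2) for name, n in quoted_counts.items()
}
print(quoted_percent)

# Toy four-residue peptide to show the helper on a raw sequence.
toy_composition = composition("GGTP")
print(toy_composition)
```

The same helper can be applied to the full P12670 sequence once it has been downloaded, which would let the quoted counts be verified directly.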
Mapping of NP24 protein epitopes: The two predicted IgE epitopes (1 and 2) of the NP24 protein were mapped. Some residues in IgE epitopes 1 and 2 were buried, but most residues were readily accessible, and NP24-specific IgE can produce allergic reactions.
The protein NP24 is a commonly found component of Tomato fruits and Spinach leaves. It closely matches the allergenic TLPs of Capsicum, Kiwi, White Cedar, Cupressus sempervirens and Banana, suggesting cross-allergic reactions. Two unique IgE epitopes of NP24 were identified, viz. Epitope 1 (AAGTASARFWGRT) and Epitope 2 (TFDASGKGSCQTG), at positions 58-70 and 73-85, respectively. Among the seven IgE epitopes, epitopes 1, 2 and 3 showed a greater degree of conservancy within sequences homologous to NP24. Phylogenetic analysis of NP24 with other TLPs revealed that Capsicum shows the highest allergic cross-reactivity with the Tomato fruit NP24 protein.