WHO/IUIS Allergen Nomenclature: Providing a common language

Highlights
• Beginning and evolution of the official Allergen Nomenclature system 1980–2018.
• Allergen Names abbreviated genus, species and number.
• Expected data including characterization of protein amino acid sequence, cDNA, human serum donors and experimental data.
• Challenges of identifying allergens including exposure and complex human exposure and immunity.
• Complexity of new methods including “omics”.

ABSTRACT
A systematic nomenclature for allergens originated in the early 1980s, when few protein allergens had been described. A group of scientists led by Dr. David G. Marsh developed a nomenclature based on the Linnaean taxonomy, and further established the World Health Organization/International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature Sub‐Committee in 1986. Its stated aim was to standardize the names given to the antigens (allergens) that caused IgE‐mediated allergies in humans. The Sub‐Committee first published a revised list of allergen names in 1986, which continued to grow with rare publications until 1994. Between 1994 and 2007 the database was a text table online, then converted to a more readily updated website. The allergen list became the Allergen Nomenclature database (www.allergen.org), which currently includes approximately 880 proteins from a wide variety of sources. The Sub‐Committee includes experts on clinical and molecular allergology. They review submissions of allergen candidates, using evidence‐based criteria developed by the Sub‐Committee. The review process assesses the biochemical analysis and the proof of allergenicity submitted, and aims to assign allergen names prior to publication. The Sub‐Committee maintains and revises the database, and addresses continuous challenges as new “omics” technologies provide increasing data about potential new allergens.
Most journals publishing information on new allergens require an official allergen name, which involves submission of confidential data to the WHO/IUIS Allergen Nomenclature Sub‐Committee, sufficient to demonstrate binding of IgE from allergic subjects to the purified protein.


Introduction
Early reports of adverse health problems that are thought to represent allergies and asthma date back more than 3000 years from Egypt, Rome and China, but these are not well documented. Likely cases of allergy to various inhalation sources or insect bites or stings are often dismissed as the nature and cause of the reactions were unknown. Allergic diseases are complex and broad in terms of organs affected and severity, ranging from highly prevalent allergic rhinitis to life-threatening anaphylaxis (Cohen and Zelaya-Quesada, 2002). Progress in the allergy field started in 1869 when Dr. Charles H. Blackley demonstrated that pollen was the apparent cause of his hay fever (Taylor and Walker, 1973), and continued throughout the 1900s with the establishment of immunotherapy using whole allergen extracts as a routine practice by the 1970s (summarized in Fig. 1). A need to develop a systematic nomenclature for allergens became apparent in the early 1980s, when only a few allergens had been identified and allergen names were used inconsistently in publications. This article focuses on allergen nomenclature in the context of Immunoglobulin E (IgE)-mediated allergies, directed against apparently harmless substances, mostly proteins and glycoproteins from diverse biological origins. Allergen sources include pollen, mites, animal epithelia and saliva, fungi, insect venoms and a variety of plant and animal foods.
Allergen extracts contain many other proteins and components in addition to allergenic proteins. The first identified allergenic protein, antigen E (AgE), was isolated from pollen of short ragweed (Ambrosia artemisiifolia) by Norman in 1962 (Marsh et al., 1981). Other ragweed allergens identified at an early stage included Ra3 and Ra5 (Marsh et al., 1981). Around the same time, proteins from ryegrass pollen (Rye I and Rye II, later called Lol p I and Lol p II) were identified as prominent allergens by Johnson and Marsh as reviewed in Freidhoff et al. (1986). In the course of this and subsequent discovery work, which originally aimed at a better understanding of HLA-associations with allergic immune responses, the potential of using specific allergenic molecules for more precise diagnosis and possibly for immunotherapy gradually emerged (Yunginger and Gleich, 1972; Baer et al., 1980). During the past decades, advances in protein biochemistry and molecular biology have accelerated the discovery and characterization of allergens, which are now generated by recombinant DNA technology for a variety of applications, including basic and clinical research, allergen product standardization, allergy diagnostics and development of novel therapeutic approaches. Investigation of individual patient sensitization profiles has recently become possible by applying patient sera to solid-phase purified allergens in single assays or on microarrays with over 100 purified allergens, to accurately identify IgE-binding proteins and the sources that likely cause the symptoms. Clear IgE binding to 2S albumins of peanut or soybean or to oleosins in peanut is likely to indicate a higher risk of severe reactions (Beyer et al., 2015; Ebisawa et al., 2013; Schwager et al., 2017). Measuring specific IgE patterns can also help guide clinicians to treat patients with allergen immunotherapy (Sastre et al., 2012).
Clinicians seeing patients allergic to bee and wasp venom may also improve diagnostic and therapeutic success for patients with so-called double-sensitizations using valuable molecular markers to prove primary sensitizations to the culprit venom (Seyfarth et al., 2017). In the future, individual immunotherapeutic reagents and prescriptions may be available for improved therapy. These developments further underpin the need for a consistent and unambiguous nomenclature of allergens. In parallel, the Allergen Nomenclature Sub-Committee has adapted to these changes by updating the criteria for defining a new allergen and the information requested in the submission form. This article provides an update on these criteria and challenges facing the existing system. Publishing the criteria ensures consistency and transparency of the process. Researchers are strongly encouraged to address them, with support of appropriate data reported confidentially to the WHO/IUIS Sub-Committee, to demonstrate evidence of allergenicity in order to receive an official allergen name prior to publication.
A. Pomés et al. Molecular Immunology xxx (xxxx) xxx-xxx

2. The beginning of the systematic Allergen Nomenclature: three men in a boat 1980
The idea for the current allergen nomenclature system originated from a discussion among Drs. David Marsh (USA), Henning Løwenstein (Denmark) and Thomas Platts-Mills (UK) during a boat ride on Lake Constance (Bodensee), Konstanz, Germany, during the 13th Symposium of the Collegium Internationale Allergologicum in July 1980 (Chapman, 2004). The revised nomenclature system was first described in the Bulletin of the World Health Organization by the committee of clinicians who joined the International Union of Immunological Societies (IUIS) Sub-Committee for Allergen Nomenclature, with David Marsh as Chair. Many of the Sub-Committee scientists listed in the 1986 publication have been active in the evolution of rules and decisions on proposed allergen nomenclature. Other members have chaired the Sub-Committee over time (Wayne Thomas, Heimo Breiteneder and now Richard E. Goodman), with Jørgen N. Larsen pioneering the development of a web site, which was refined by John Wise at the University of Nebraska. In 2017, there were 22 active members and five members at large (listed on the website). The website http://allergen.org/ originally showed allergens and information as a simple table; a searchable database was established in 2007, and entries have since been added as they are agreed upon by the Sub-Committee. Allergen names are assigned by the WHO/IUIS Allergen Nomenclature Sub-Committee through a defined submission process as described in Section 5. This process ensures that an appropriate, approved and non-redundant name is assigned to the allergen for any further publication. The benefit of this nomenclature system is that allergens are named in a properly documented, consistent and unambiguous manner, creating and maintaining clarity amongst the scientific, clinical and regulatory communities.

Evolution of allergen names
During the early 1980s, the Sub-Committee established guidelines for naming allergens based on the taxonomic name of the source organism. Original allergen names comprised the first three letters of the genus and the first letter of the species epithet (both in italics) followed by a Roman numeral to indicate the allergen in the chronological order of isolation from the same source. The publication described the data that should be provided as a prerequisite for naming allergens, including molecular weight (estimated mass by SDS-PAGE or gel-filtration), isoelectric point, amino acid composition or sequence, and immunochemical characterization, i.e. allergen recognition by patient IgE and/or animal antisera.
Epidemiology of allergens was called for in early publications of the Sub-Committee in terms of frequency of positive IgE binding from 20 to 30 human subjects. Over time, as biochemical, immunochemical and biophysical methods have improved, the expectations for description of allergens have grown and several nomenclature conventions have been changed including a requirement of fewer subjects.
In 1994, the nomenclature was revised, so that allergen names would not be in italics (reserved for genes), and the Roman numbers were replaced by Arabic ones (e.g. Amb a 1, Der p 1) (King et al., 1994). The 1994 update included a statement that sequence information would be required for new allergens (described in Radauer et al., 2014). The focus was on the amino acid (AA) sequence of the protein which is the relevant target for IgE. However, criteria for acceptance of sequences to the database have been adjusted over time as methods of protein determination have improved. A few new allergens were identified in a 1995 WHO/IUIS update (King et al., 1995).
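The 1994 change from Roman to Arabic numerals is mechanical enough to sketch in a few lines. The following is only an illustration, not a tool used by the Sub-Committee: the function names `roman_to_int` and `modernize_name` are hypothetical, and the pattern assumes the plain "genus abbreviation, species letter, Roman numeral" layout described above.

```python
import re

# Values of the Roman digits needed for allergen group numbers (assumed < 90).
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'IV' or 'XII' to an integer."""
    total = 0
    for i, char in enumerate(numeral):
        value = ROMAN_VALUES[char]
        # A smaller digit placed before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def modernize_name(old_name: str) -> str:
    """Rewrite a pre-1994 name such as 'Lol p I' in the post-1994 style 'Lol p 1'."""
    match = re.fullmatch(r"([A-Z][a-z]{2,3}) ([a-z]{1,2}) ([IVXL]+)", old_name)
    if match is None:
        raise ValueError(f"not a Roman-numeral allergen name: {old_name!r}")
    genus, species, numeral = match.groups()
    return f"{genus} {species} {roman_to_int(numeral)}"


if __name__ == "__main__":
    for old in ("Lol p I", "Lol p II", "Amb a I"):
        print(old, "->", modernize_name(old))
```

Italic formatting, which the 1994 revision also removed, is a typesetting matter and is not represented in this plain-string sketch.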
In the 1990s, many allergens were being cloned and expressed as recombinant proteins and that led to the identification of multiple, highly homologous sequences. Then the WHO/IUIS Allergen Nomenclature Sub-Committee recognized the need to name homologous allergens in the same organism as either isoallergens or variants (isoforms), as described below.

Current allergen nomenclature conventions
As defined in the 1994 revision, allergen names consist of the first three letters from the genus, one letter from the species epithet, followed by an Arabic numeral. Occasionally, a letter is added to the genus (Sola for tomato, Solanum lycopersicum, as Sol was used for multiple ant species, Solenopsis sp.) or to the species (Hel as 1 for a snail Helix aspersa as Hel a was used for sunflower, Helianthus annuus) abbreviation to differentiate otherwise identical names of allergens from different species. Sometimes, the taxonomic name of the species is changed (e.g. Betula verrucosa became Betula pendula), but the Sub-Committee might decide to keep the same original allergen name if it has been widely used in the literature (e.g. Bet v 1 for the major birch pollen allergen).
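As a rough sketch of the abbreviation rule just described, the helper below derives a name from a taxonomic binomial. The function `allergen_name` and its `genus_len` and `species_len` parameters are hypothetical; in practice the Sub-Committee decides case by case when an extra letter is needed or when a historical abbreviation is retained.

```python
def allergen_name(genus: str, species: str, number: int,
                  genus_len: int = 3, species_len: int = 1) -> str:
    """Build an allergen name from genus, species epithet and group number.

    By default three letters of the genus and one of the species are used.
    The optional lengths model the occasional extra letter that keeps names
    from different species distinct.
    """
    genus_part = genus[:genus_len].capitalize()
    species_part = species[:species_len].lower()
    return f"{genus_part} {species_part} {number}"


if __name__ == "__main__":
    # Default rule: short ragweed, Ambrosia artemisiifolia.
    print(allergen_name("Ambrosia", "artemisiifolia", 1))           # Amb a 1
    # Extra genus letter: tomato, to avoid a clash with Solenopsis (Sol).
    print(allergen_name("Solanum", "lycopersicum", 1, genus_len=4))  # Sola l 1
    # Extra species letter: Helix aspersa, to avoid a clash with sunflower (Hel a).
    print(allergen_name("Helix", "aspersa", 1, species_len=2))       # Hel as 1
    # Historical name kept although the species is now Betula pendula.
    print(allergen_name("Betula", "verrucosa", 1))                   # Bet v 1
```

The last call shows why the taxonomic name cannot always be used directly: the abbreviation follows the name under which the allergen became established in the literature.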
Allergens from the same source are classified in biochemical groups (designated by an Arabic numeral), usually according to the order in which they were identified (e.g. Der p 1 or Bet v 1). When possible, allergens from different species, related up to the taxonomic level of family or order, that belong to the same biochemical protein family will be assigned the same number across species. However, due to the historical naming convention, some allergens of the same biochemical protein family have been assigned different numbers (e.g. pectate lyases Amb a 1 and Art v 6, PR-10 proteins Bet v 1 and Ara h 8, or the profilins Amb a 8, Bet v 2, Phl p 12 and Art v 4). Within each group, allergens can be isoallergens or variants (isoforms) depending on their amino acid sequence identity. Isoallergens are homologous allergens that share the following common biochemical properties: similar molecular size, similar or identical biological function, if known, and an amino acid sequence identity of at least 67% (a guideline that is not always followed if a deviation is justified). Each isoallergen may have multiple forms of highly identical sequences (> 90% identity, typically differing in only a few amino acids), which are designated as variants (or isoforms). Isoallergens and their variants defined in this way are distinguished by numerical suffixes following a dot after the allergen number. The first two numerals 01-99 refer to a particular isoallergen (e.g. Amb a 1.01 and Amb a 1.02), and the two subsequent numerals 01-99 define each variant of a particular isoallergen (e.g. Amb a 1.0101 and Amb a 1.0102). Generally, allergen names should always be spelled out in full, and partial abbreviation of allergen names (e.g. Ara h 1/2 or Ara h 1 and 2) is discouraged.
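The naming scheme and the identity guidelines above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the regular expression, the `AllergenName` type and the function names are not part of the official database, and the 67% and 90% values are applied as hard thresholds only for simplicity, whereas the Sub-Committee treats them as guidelines.

```python
import re
from typing import NamedTuple, Optional


class AllergenName(NamedTuple):
    genus: str                  # e.g. "Amb" (or four letters, e.g. "Sola")
    species: str                # e.g. "a" (or two letters, e.g. "as")
    group: int                  # allergen number, e.g. 1
    isoallergen: Optional[int]  # first two digits after the dot, if present
    variant: Optional[int]      # last two digits after the dot, if present


# Genus abbreviation, species abbreviation, group number, optional ".IIVV" suffix.
_NAME_PATTERN = re.compile(
    r"([A-Z][a-z]{2,3}) ([a-z]{1,2}) (\d{1,2})(?:\.(\d{2})(\d{2})?)?"
)


def parse_allergen_name(name: str) -> AllergenName:
    """Split a fully spelled-out name such as 'Amb a 1.0101' into its parts."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a fully spelled-out allergen name: {name!r}")
    genus, species, group, iso, var = match.groups()
    return AllergenName(
        genus, species, int(group),
        int(iso) if iso else None,
        int(var) if var else None,
    )


def suggest_relationship(identity_percent: float) -> str:
    """Suggest a designation from pairwise amino acid identity (guideline only)."""
    if identity_percent > 90:
        return "variant"
    if identity_percent >= 67:
        return "isoallergen"
    return "below isoallergen guideline"


if __name__ == "__main__":
    print(parse_allergen_name("Amb a 1.0102"))
    print(parse_allergen_name("Phl p 12"))
    print(suggest_relationship(82.0))
```

Because the pattern requires a complete name, a partial abbreviation such as "Ara h 1/2" is rejected with a `ValueError`, which mirrors the recommendation to spell names out in full.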

Review process by the Allergen Nomenclature Sub-Committee
The review process of a candidate allergen is shown in Fig. 2. The scientist submitting a protein as a candidate allergen should complete the submission form (www.allergen.org) and forward it to the Sub-Committee Chair by email. The scientist may suggest an allergen name (including isoallergen and variant number) based on current WHO/IUIS nomenclature.
Each submission is assigned for review to two or three members of the Sub-Committee, selected by the Chair of the Sub-Committee according to their area of expertise. Any possible conflict of interest with a potential reviewer can be indicated by the scientist submitting the form on the first page. The submission is held confidential within the Sub-Committee until a final decision is made. In case of uncertainty regarding particular criteria, descriptions or methods used, additional information may be requested. Final decisions are made by the Sub-Committee, based on the current criteria for allergen nomenclature, and communicated to the investigators who should then use the assigned name in their publication and in sequence databases. The process normally requires two weeks from submission to review, but may require longer depending on a need for additional data.
6. Essential criteria for acceptance of a protein as an allergen in the database
Acceptance of a protein as an allergen in the database essentially relies on the demonstration of its presence in the source that causes allergies, characterization of the protein by standard methods of biochemistry and molecular biology (Section 6.1), and, most importantly, the proof of specific recognition by relevant human serum IgE (Section 6.2). The main criteria for acceptance of a candidate allergen, summarized in the decision chart of Fig. 3, are:
• Description of the allergenic source, including taxonomic name.
• Information on the purification and characterization of the candidate allergenic protein including amino acid sequence.
• Description of the allergic human serum donors used to test IgE binding to the candidate protein.
• Demonstration of specific IgE binding with five sera of relevant patients.
Over time, the WHO/IUIS Sub-Committee has refined the criteria for inclusion of proteins in the allergen database, as methods of testing by investigators and knowledge of allergen structures have improved (Chapman et al., 2007; Breiteneder and Chapman, 2014). The database currently includes 880 allergens, about 100 of which have had their three-dimensional structure determined (Pomés et al., 2015). Demonstrated biochemical function and tertiary structure are not necessary to obtain a name from the Sub-Committee, but may help in the overall understanding of allergenicity and potential cross-reactivity.

Analysis of candidate allergens by biochemistry and molecular biology
The WHO/IUIS allergen database includes only proteins and glycoprotein allergens. Ideally, the full amino acid and nucleotide sequence of the allergen should be provided on the submission form. If the full sequence is not available (e.g. for purified native allergens that have not been cloned), a partial amino acid sequence must be determined (e.g. by mass spectrometry).
The apparent MW of the protein should be estimated based on migration compared to standards in SDS-PAGE, by size exclusion chromatography, by mass spectrometry, or by calculation based on the amino acid sequence. Mass spectrometry determines the absolute molecular mass and is more definitive than SDS-PAGE, but its use is not a mandatory requirement.
The amino acid sequence of the protein should be deposited in a general protein database (UniProt or GenBank Protein) and the accession number specified in the application form. The experimentally determined protein sequence, as well as the determined DNA sequence, should be entered in the submission form. If the amino acid sequence is incomplete (due to gaps for example), each amino acid gap should be entered as "xxx". In cases where only a partial sequence was determined by N-terminal or LC-MS/MS sequencing, which is identical to a published protein sequence from the same species, the accession number of the published sequence should be added with a clear note in the submission form. If the sequence has not been published before, the authors have the option to request that a non-public sequence remains confidential until publication.
To enable evaluation of potentially confounding data, scientists submitting a name are encouraged to provide additional information such as glycosylation state and purity of the protein isolated from the natural source or recombinant allergen. For glycosylated proteins, although not required, possible involvement of the glycan in IgE binding could be assessed following deglycosylation or by performing an inhibition assay using a high glycan source such as phytohemagglutinin (PHA) from Phaseolus vulgaris. Alternatively, IgE reactivity with a nonglycosylated recombinant allergen protein provides evidence that carbohydrate binding is not essential for allergen recognition if the recombinant protein has been demonstrated to fold like the native allergen. If the protein identity has been verified by monoclonal antibodies, the identity of the hybridoma and the assay method used should be described to allow the Sub-Committee to evaluate the quality and relevance of the data presented. If the cDNA sequence was determined by polymerase chain reaction (PCR) the sequence and position of primers should be described. If the sequence has already been published, the reference should be provided. If the sequence and description was already presented at a scientific conference, the abstract and poster or relevant presentation slides should be provided.
Finally, IgE-mediated allergies are also induced by small molecules if bound to carrier proteins (e.g. human serum albumin). These are low-MW organic compounds such as various drugs (e.g. antibiotics, muscle relaxants, opioids) and various organic compounds that can act as IgE-binding haptens and cause clinical reactions (Deak et al., 2016). There are also certain carbohydrate structures, such as specific plant- or insect-derived asparagine-linked glycans, commonly called cross-reactive carbohydrate determinants (CCD), that can bind IgE but rarely or never elicit allergic reactions (Van der Veen et al., 1997; Foetisch et al., 2003; Mari et al., 2008). The carbohydrate structure galactose alpha-1,3 galactose (alpha-gal), present on many non-primate mammalian proteins, has recently been found to cause severe delayed IgE-mediated reactions following consumption of red meat in those who have been sensitized by salivary proteins from tick bites (Commins et al., 2009; Commins and Platts-Mills, 2013). Similar relationships between tick bites and meat allergy have been found in Europe (by the tick Ixodes ricinus; Hamsten et al., 2013a; Hamsten et al., 2013b; Wilson et al., 2017), as well as Asia (Chinuki et al., 2016). These glycoproteins are unusual targets of IgE binding. Since the antibody recognition is directed to the glycan structure rather than the protein structure, they are not included in the WHO/IUIS database as allergens unless IgE binding has been demonstrated against the deglycosylated (unglycosylated) protein.

Proof of allergenicity of the newly identified proteins
Allergens are usually only accepted in the database if the purified protein is demonstrated to specifically bind IgE antibodies from at least five patients allergic to the respective allergen source. Specificity is demonstrated by lack of IgE binding in atopic and non-atopic control donors. Exceptions to the number of patients may be made in particularly justified cases (such as in the field of occupational allergy or allergens in novel foods or new products), where it might be difficult to find five specifically allergic patients to prove allergenicity. In these cases, additional proof of allergenicity might then be requested, such as IgE immunoblots, dot blots, ELISA, basophil activation tests or mediator release assays (e.g. RBL) with the pure allergen. The IgE binding should be tested in a described and well controlled assay with the purified (natural or recombinant) allergen. Data from in vivo provocation of patients is not required, but adds value in demonstrating mast cell degranulation. Such tests should only be performed following informed consent of subjects and ethical review panel approval, using appropriately controlled, sterile antigens.
Proof of allergy to the allergen source is also preferred. In general, the route of exposure might contribute to identify symptoms. For example, the natural route of pollen, animal dander, fungal spores and airborne occupational allergen exposure is inhalation. Thus, rhinoconjunctivitis or asthma is the expected clinical outcome. Food allergens should be linked to objective symptoms of oral allergy syndrome, angioedema of the larynx, symptoms from the gastro-intestinal tract or systemic symptoms such as urticaria, asthma or anaphylaxis after consumption of the food. Insect bites or stings should generally include urticaria plus angioedema as well as other possible systemic reactions. However, exceptions will always occur. Skin symptoms after ingestion of allergens are quite common, and the same is true for respiratory symptoms. Patients with urticaria might not be the best candidates for skin testing. For some allergens (e.g. from mites) it is difficult to relate clinical symptoms to natural exposure. Also, some individuals with high levels of allergen specific IgE or skin test reactivity are asymptomatic upon natural exposure (Abraham et al., 2007). Therefore, in some cases larger samples might be required to demonstrate the association with clinically relevant allergy.
Sensitization and/or elicitation of an allergic response to the source (allergen extract) may be demonstrated in vitro by basophil activation or in vivo through inhalation, ingestion, intradermal injection, prick-to-prick tests or mucosal contact if performed under institutional review board guidance and following ethical approval.
Positive reactions should be based on criteria such as a wheal at least 3 mm larger than the negative SPT control for allergen extracts, or a value above an established cut-off (e.g. three times the standard deviation of negative control subjects for in vitro IgE binding by ELISA) for extracts or the purified protein. The primary questions of the submission form are designed to allow the reviewers to understand what was done and how complete the characterization of the allergen candidate was, covering the protein sequence and physical characteristics, the selection of serum donors, tests for IgE binding as well as biological tests (SPT or BAT). The scientists submitting the form should provide enough information to demonstrate that their protein is most likely to be an allergen.
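The two example criteria for positivity can be written down as a small sketch. The text only mentions "three times the standard deviation of negative control subjects"; adding that value to the mean of the negative controls is a common convention and is an assumption here, as are the function names and the example readings.

```python
from statistics import mean, stdev
from typing import Sequence


def elisa_cutoff(negative_controls: Sequence[float], n_sd: float = 3.0) -> float:
    """Cut-off as mean of negative controls plus n_sd sample standard deviations.

    Assumption: the cut-off is anchored at the mean of the negative controls.
    At least two control values are needed to compute a standard deviation.
    """
    return mean(negative_controls) + n_sd * stdev(negative_controls)


def is_positive_spt(wheal_mm: float, negative_control_mm: float,
                    min_difference_mm: float = 3.0) -> bool:
    """True if the wheal exceeds the negative SPT control by at least 3 mm."""
    return wheal_mm - negative_control_mm >= min_difference_mm


if __name__ == "__main__":
    # Hypothetical optical densities from non-allergic control sera.
    controls = [0.10, 0.12, 0.08, 0.10]
    cutoff = elisa_cutoff(controls)
    patient_reading = 0.45  # hypothetical patient serum reading
    print(f"cut-off = {cutoff:.3f}, positive = {patient_reading > cutoff}")
    print(is_positive_spt(wheal_mm=5.0, negative_control_mm=1.0))
```

Whichever criterion is used, the submission form should state it explicitly so that reviewers can judge how the positive and negative calls were made.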

7. Challenges to the Allergen Nomenclature Sub-Committee

Revision of allergen names
Due to the risk of confusion and the fact that some names have already been published, the WHO/IUIS Sub-Committee generally takes a conservative position regarding changing allergen names. However, some allergen names have been revised or further specified as the original characterization was incomplete or associations were determined to be wrong based on new evidence.
1. The seed storage proteins of soybean provide two examples. The Gly m 5 protein is a complex of three beta-conglycinin subunits that were not characterized at the time the allergen name was assigned. One subunit is now known as beta-conglycinin alpha (67 kDa), the second as beta-conglycinin alpha' (71 kDa) and the third as beta-conglycinin beta (50 kDa) as described by Holzhauser et al. (2009). The alpha- and alpha′-subunits are approximately 82% identical. The sequence of the beta-subunit is only 76% identical to the two alpha subunits. The three subunits fit together to form a trimeric structure. However, different ratios of Gly m 5 subunits form multimeric associations in protein bodies of the soybean (Maruyama et al., 2002). IgE from soybean allergic subjects may bind to one, two or three subunits of Gly m 5. The protein complex was originally named Gly m 5, but following analysis of intact complexes, various combinations of beta-conglycinins were identified to form the multimeric complexes (Holzhauser et al., 2009). Therefore, the individual subunits were finally named as isoallergens Gly m 5.01, Gly m 5.02 and Gly m 5.03 in 2009, with minor variants (e.g. Gly m 5.0301 and Gly m 5.0302).
2. Similarly, glycinin (Gly m 6) was also named with an understanding that the whole protein has a complex hexameric structure, made of combinations of different subunits (Nielsen et al., 1989). With further gene and protein characterization, it was recognized that at least five genes encode the subunits (Nielsen et al., 1989) and that the allergen can be an arrangement of these five proteins, which have individual IgE binding properties (Holzhauser et al., 2009) and which were renamed as five isoallergens in 2009.
3. Bovine milk casein was originally named Bos d 8, but full characterization has demonstrated that it is composed of complexes made up of four primary proteins (Willis et al., 1982). One is now called Bos d 9 (alpha S1 casein) as described by Nagao et al. (1984). The other caseins are defined as Bos d 10 (alpha S2 casein), Bos d 11 (beta casein) and Bos d 12 (kappa casein) with IgE binding defined by Natale et al. (2004). These proteins have high sequence diversity and they form complexes associated with calcium ions involved in a micellar structure. IgE antibodies from cow's milk allergic subjects may bind to any combination of these subunits. Additional IgE binding from cow's milk allergic individuals involves beta-lactoglobulin, alpha-lactalbumin and, for some, IgG or lactoferrin.
4. There are examples of homologous allergen domains or whole allergens from grass pollen of related species (Lolium sp., Dactylis sp. and Phleum sp.) with proteins sharing considerable sequence identity which are difficult to assign to either the same or a different group. For example, Lol p 2 and Lol p 3 share approximately 60% identity. Dac g 2 is 60% identical to Lol p 3, 65% identical to Dac g 3 and 65% identical to Phl p 2, which is within the differences in isoforms accepted by the Sub-Committee. However, grass pollen groups 2 and 3 were kept separate as there is high conservation across species and a marked reduction in sequence identity between group 2 and group 3 proteins among grass pollen. The group 2 and 3 grass pollen allergens also show some homology to group 1 beta-expansins, though the major group 1 beta-expansin grass pollen allergens, including Lol p 1, are much longer proteins and share little IgE cross-reactivity (Devanaboyina et al., 2014).
5. Two examples exist of allergens that were renamed due to high sequence identities. One is the ragweed allergen Amb a 2 that has a high amino acid sequence identity to group 1, and was therefore renamed as Amb a 1.0501. The other is a seed storage glycinin allergen in peanut, Ara h 4, which is > 90% identical to Ara h 3.0101 and was therefore renamed as Ara h 3.0201 in 2012.

Difficulty in naming certain allergens
Some allergens are difficult to name for different reasons.

Complexity
The major cat allergen Fel d 1 (Morgenstern et al., 1991) has a crystal structure similar to that of secretoglobin (Kaiser et al., 2007). This allergen is a tetrameric protein formed by two heterodimers of chain 1 and chain 2, which are encoded by separate genes. Unlike Gly m 5 and Gly m 6 mentioned above, two different proteins are under one allergen name, Fel d 1.0101 (King et al., 1995). The allergen has a unique structure and there is no evidence of IgE cross-reactivity with secretoglobins of other species. A revision of this name in the future is possible but unlikely, given the widespread use of Fel d 1 in the scientific community.

Low amino acid identity
Glutathione S-transferases (GST) from cockroach are structurally and functionally related and belong to different GST classes. GSTs attach a glutathione molecule to highly different substrates, leading to the high GST diversity regarding function and sequence. The question is whether allergens with a similar biochemical function deserve the same group number, despite highly different sequences. It seems that identical group numbers within a species should generally alert scientists to the possibility of cross-reactivity. Nevertheless, cockroach GSTs from P. americana and B. germanica were assigned to the same group 5, despite their low amino acid identity (15.7% between Bla g 5 and Per a 5, well below the suggested threshold of 67% for isoallergens in the same group). More recently, the Sub-Committee has tried to reserve the same number for homologous proteins from different species.

Hydrophobic nature of some allergen parts
Oleosins are hydrophobic allergens with an unusual hydrophobic core and few hydrophilic sections responsible for IgE antibody binding. Assignment of names to oleosins was difficult, because the molecular structure was taken into consideration to dissect and compare only the hydrophilic portions of the protein. Other proteins challenging to analyze for the purpose of assigning an allergen name include vitellogenins of chicken (Walsh et al., 1988). Vitellogenins are encoded as polyproteins and expressed in the liver as phospholipoproteins, then transported via blood and taken up and processed in egg yolk. The process includes proteolysis into independent shorter proteins, one of which has now been listed as an allergen by WHO/IUIS (Gal d 6) (De Silva et al., 2016). Similar cases were reported earlier for fish roe proteins (Shimizu et al., 2009). The proteins are complex and assignment of IgE binding has often been to the holoprotein.

Assignment of names to isoallergens or variants (isoforms)
Isoallergens or variants (isoforms) are currently defined based on their amino acid sequence identity, but they may originate in different ways (Table 1). In some cases, information on whether they are encoded by separate but related genes or the same gene with polymorphisms could assist in the identification. For example, there are 7 gene loci that encode beta expansins in Johnson grass pollen, including Sor h 1. The two allergens Sor h 1.0101 and Sor h 1.0201 are encoded by separate gene loci (Campbell et al., 2015). Similarly, the cockroach allergens Bla g 1.0101 and Bla g 1.0201 are listed as isoallergens due to the high sequence similarity, but recent genomic evidence suggests they are distinct gene products (Fig. 4). Investigators are not required to provide information on gene structure or location, but as this information becomes available it might help to improve the understanding of protein structures and potential name assignments.
New isoallergens will only be accepted in the database if IgE binding to the isoallergen has been demonstrated. New variants will only be accepted if there is justified evidence that the variant is expressed in the organism and tissue causing allergies. The use of PCR-based cloning techniques and genome analysis has increased the risk of identifying erroneous sequences. The committee would like investigators to sequence multiple clones from independent experiments and to provide accurate information on the source and methods used to determine sequences. Ideally, the existence of the protein in the natural source should be verified by mass spectrometry or by N-terminal sequencing of sufficient length to verify that the protein sequence matches the nucleic acid sequence. Additional information for inclusion in the database includes the characterization of the purified recombinant variant, confirmation of the source and identity of the organism, and IgE reactivity with the recombinant protein.
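The naming pattern used in the examples above (Sor h 1.0101, Sor h 1.0201) can be illustrated with a minimal sketch. It assumes that a name consists of an abbreviated genus, an abbreviated species and an allergen number, optionally followed by a four-digit suffix in which the first two digits designate the isoallergen and the last two the variant. The regular expression, the `AllergenName` type and the function names are hypothetical illustrations, not tools provided by the Sub-Committee, and the letter-count limits in the pattern are assumptions based on the names cited in this article.

```python
import re
from typing import NamedTuple, Optional


class AllergenName(NamedTuple):
    genus: str                  # abbreviated genus, e.g. "Sor"
    species: str                # abbreviated species, e.g. "h"
    number: int                 # allergen number, e.g. 1
    isoallergen: Optional[int]  # first two digits of the suffix, if present
    variant: Optional[int]      # last two digits of the suffix, if present


# Assumed pattern: 3-4 letter genus, 1-2 letter species, a number,
# and an optional ".XXYY" suffix (XX = isoallergen, YY = variant).
_NAME_RE = re.compile(
    r"^([A-Z][a-z]{2,3}) ([a-z]{1,2}) (\d+)(?:\.(\d{2})(\d{2}))?$"
)


def parse_allergen_name(name: str) -> AllergenName:
    """Split a name such as 'Sor h 1.0201' into its components."""
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not in the assumed allergen name format: {name!r}")
    genus, species, number, iso, var = match.groups()
    return AllergenName(
        genus,
        species,
        int(number),
        int(iso) if iso is not None else None,
        int(var) if var is not None else None,
    )


def are_isoallergens(a: str, b: str) -> bool:
    """True if both names share genus, species and allergen number
    but carry different isoallergen numbers."""
    x, y = parse_allergen_name(a), parse_allergen_name(b)
    if x.isoallergen is None or y.isoallergen is None:
        return False
    return x[:3] == y[:3] and x.isoallergen != y.isoallergen


if __name__ == "__main__":
    print(parse_allergen_name("Sor h 1.0101"))
    print(are_isoallergens("Sor h 1.0101", "Sor h 1.0201"))
```

A check of this kind only tests the format of a name. It says nothing about whether the name has been officially assigned or whether IgE binding has been demonstrated, which remains a matter of the evidence reviewed by the Sub-Committee.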

Cross-reactivity might impair the identification of source-specific allergens
Considerations include whether proteins that are homologs of allergenic proteins are themselves allergens that bind IgE and, if so, whether patients show symptoms upon exposure. Antibody cross-reactivity can occur with only a few amino acids (1-3) in common between homologous proteins (Glesner et al., 2017), but it does not necessarily translate into clinical cross-reactivity. Serum IgE cross-reactivity has been observed between PR-10 related allergens from birch pollen (Bet v 1), apple (Mal d 1) and other plant-derived foods. However, birch pollen is usually the primary sensitizer and elicitor of the allergic immune response, whereas cross-reactive IgE binding to Mal d 1 is often of lower avidity, or may involve a single IgE epitope, and therefore may or may not be clinically relevant for individual patients. Similarly, there are patient-specific patterns of IgE cross-reactivity to fish parvalbumins coupled with clinical sensitivity to some fish, but not to other fish species (Kuehn et al., 2014) or to chicken (fish-chicken syndrome). There are numerous examples of reactivity to homologous allergens from multiple allergen sources that can be explained by IgE cross-reactivity. For example, there is cross-reactivity among the lipocalins Can f 6 of dog dander, Fel d 4 of cat dander and Equ c 1 of horse dander (Nilsson et al., 2012). Each of these three allergens can act as primary sensitizer, co-sensitizer or cross-reacting molecule, depending on the exposure profile of the patient. Cross-reactivity with Can f 6 was responsible for a clinically relevant positive specific IgE to dog dander in a horse-allergic patient (Jakob et al., 2013). Data on sequence and structural similarity combined with serum IgE binding must be carefully evaluated, as demonstrated by the studies of IgE binding and allergy of Ole e 1 (olive pollen allergen), Fra e 1 (ash pollen allergen) and Pla l 1 (English plantain pollen allergen). The overall structures are similar, as could be predicted to some extent from sequence, yet there is a clear gradation of IgE binding among patients, with unique clinical responses (Stemeseder et al., 2017). Thus, sequence or structural information alone is judged insufficient to name a protein as an allergen.
[Table 1 fragment and footnotes: Gly m 5, Gly m 6: IgE antibody recognition may be affected depending on epitope distribution. a Polyploidy (more than two haplotypes) will further increase the number of genes (each of which can harbor different allelic isoallergens/variants). b This kind of modification does not lead to different isoallergens or variants.]

Is the Sub-Committee naming too many allergens?
The WHO/IUIS Allergen Nomenclature database lists 17 allergens in peanut (Arachis hypogaea), 31 in the house dust mite Dermatophagoides farinae, 20 in Dermatophagoides pteronyssinus and 23 in Aspergillus fumigatus. Some of those are likely minor allergens or only hypothetical proteins from cDNA (such as Tri a 42 and Tri a 43 in wheat). Some simply have a low IgE binding capacity for some subjects that is only detected when they are presented at high concentration in in vitro assays, and might not contribute to IgE binding to the source because the allergen might not be well represented in the allergenic material or allergenic extracts (Casset et al., 2012). Others, like Der p 23, turned out to be significant allergens (Weghofer et al., 2013). In contrast, some allergens probably should not have been accepted by the Sub-Committee. One example is Der f 22, which shows a low identity (∼40%) to Der f 2 and for which no publications demonstrating IgE binding or reactivity have appeared in the literature for nearly 10 years. Der f 22 will be re-reviewed and might be removed from the database unless new evidence is uncovered by the Sub-Committee in the future. The Sub-Committee is considering removing other allergens from the database. Some allergens (such as the tomato allergen Sola l 2) bind IgE only through carbohydrates, and this IgE is not clinically relevant. In other cases, IgE binding to high concentrations of purified proteins that are naturally low in abundance can mislead the assessment of a protein as an allergen. In these situations, the use of non-glycosylated recombinant proteins at concentrations mimicking natural protein abundance would assist in assessing IgE binding to the desired protein.

Integration of big data from 'omics' research
The application of massively parallel DNA and transcriptome sequencing, proteomics, bioinformatics, mass spectrometry and protein chemistry processes coupled with sensitive high or multi-allergen array immunoassays is markedly increasing the capacity for data generation and discovery. Studies of whole allergen source transcriptomics and genomics make it evident that multiple gene copies exist in a given source for a single protein family that contains allergenic as well as non-allergenic members. Thus, not all putative allergen gene sequences actually encode allergens. There are a variety of reasons for this, including that some gene loci are pseudogenes, and that not all "genes" are transcribed, translated and processed in the relevant tissue at the time when, or the location where, sensitized individuals are exposed. For example, most (six of seven) of the genes encoding Johnson grass pollen beta-expansin proteins can be confirmed within the pollen proteome, whilst only two of these gene loci encode IgE-reactive protein isoforms. Similar findings have been observed and/or proposed for the multiple beta-expansin gene loci of rice (Devis et al., 2017) and Sorghum bicolor (Paterson et al., 2009) and for the profilins of rice (Devis et al., 2017). Other examples are the polcalcin protein family with the allergen Phl p 7, the parvalbumins of fish (e.g. Gad m 1) and the PR-10 family, which includes Bet v 1 from birch pollen (Erler et al., 2011).
Currently, there is good knowledge of many protein families that contain allergens (Radauer et al., 2008). Yet many proteins within the families are not known to cause allergy. Although there are numerous publications that attempt to predict allergens using bioinformatics with genomic and transcriptomic data, detection of the proteins in the allergenic source material is essential. Predictive models of allergenicity based on sequence identity are not reliable due to the wide variations in the percentage of amino acid sequence identity that is associated with cross-reactivity, and the lack of knowledge regarding the actual epitopes for these proteins. Other factors are important for allergenicity, such as abundance and structural stability for food allergens or solubility for aeroallergens. Thus the WHO/IUIS Sub-Committee strives to stimulate appropriate characterization of allergenic proteins and demonstration of IgE binding from appropriate human volunteers, to demonstrate that a given protein is likely to cause allergy.

Assignment of names before publication
The goal of the WHO/IUIS Allergen Nomenclature Sub-Committee is to assign names to allergens before publication. The purpose is to associate a name with the original publication. However, the danger of this process is that the data used to characterize the allergen have not been evaluated through peer-reviewed publication. In the last four years, the Sub-Committee has been asking for more comprehensive information before accepting a candidate as an allergen. The submission form has been modified accordingly and will likely continue to evolve. One goal of this publication is to encourage investigators to meet the need for more relevant information prior to submission.
Journals cannot rely on reviewers to make sure that correct allergen nomenclature is used in publications unless it is emphasized by the editorial office and forms part of the "Instructions to Authors". Allergen names that do not comply with the official WHO/IUIS nomenclature are sometimes published, causing inconsistencies in the literature. Currently, journals (especially those that are not focused solely on allergy) still sometimes fail to use the names accepted by the Sub-Committee. To help improve the correct assignment and use of allergen names, the WHO/IUIS Allergen Nomenclature Sub-Committee contacted the editorial offices of the main journals that publish allergen studies in 2017, to request the implementation of a policy requiring the use of the official allergen nomenclature in their publications. There has been a positive response from the journals, and most of them have already changed their instructions to authors accordingly. Some allergens not recognized and listed by the WHO/IUIS have been published in journals indexed in PubMed. Even more commonly, genomics, transcriptomics and proteomics projects are annotating proteins in sequence databases (e.g. GenBank) with "allergen-like" names, or even listing proteins or genes with a WHO/IUIS-like name, without having provided evidence-based data on protein allergenicity. Authors are encouraged to submit such allergens to the Sub-Committee for its consideration so that an official name can be assigned.

Other allergen sequence and structure databases
Several publicly available allergen databases exist with different purposes, strengths and weaknesses (reviewed by Radauer, 2017). The WHO/IUIS Allergen Nomenclature database provides a systematic and official allergen nomenclature for many allergens. Allergome (www.allergome.org) comprises a large amount of partially filtered information and references on allergens, public references for the proteins and links to many reference sources. AllergenOnline (www.allergenonline.org) is a peer-reviewed risk assessment tool for food safety and includes allergens from all sources (Goodman et al., 2016). AllergenOnline is updated annually with reviews of individual proteins and data on IgE binding and clinical responses after publication of data on the allergens and allergenicity. The COMPARE database (comparedatabase.org), hosted by the Health and Environmental Sciences Institute, was derived from the AllergenOnline.org database (> 99% of entries in COMPARE) and shows the same "year adopted" for entries as in the AllergenOnline database. It does not yet have a sequence search system, but is intended to eventually be a risk assessment tool similar to AllergenOnline. The Structural Database of Allergenic Proteins (SDAP, fermi.utmb.edu) is linked to a structural database source and is intended as a risk assessment tool. The Immune Epitope Database (IEDB, www.iedb.org) provides information on T-cell and B-cell epitopes as well as MHC binding data for many proteins, including those from pathogens, viruses, bacteria and some allergens. The AllFam database (www.meduniwien.ac.at/allfam) focuses on protein families of allergens based on information from publications and from the WHO/IUIS, AllergenOnline and Pfam databases. Users should understand the information in, the intent of, and the limitations of each of these databases.

Future directions and new initiatives
A. The Sub-Committee has had recognition and some financial support from the International Union of Immunological Societies (IUIS) for a number of years. In 2016 the Sub-Committee gained additional support from the European Academy of Allergy and Clinical Immunology (EAACI), which recognized it as one of its committees. In 2017, support was gained from the American Academy of Allergy, Asthma and Immunology (AAAAI).
B. [...] Program server at the University of Nebraska-Lincoln (USA). The WHO/IUIS database was transferred to the EAACI computer server in November 2017 and the cost of maintaining the database will be shared by the IUIS, EAACI and AAAAI. The WHO/IUIS Allergen Nomenclature Sub-Committee is now an official committee in the EAACI organization.
C. The Sub-Committee is reviewing and improving the information and tools available on the database to help researchers compare their newly identified candidate allergens in the context of the database. The Sub-Committee welcomes suggestions, which may be submitted to the Chairman or Sub-Committee members.
D. As advances are made in gene sequencing, transcriptomics and proteomics technology, more questionable allergen candidates are being submitted to the WHO/IUIS Sub-Committee and to other sequence databases. The Sub-Committee encourages scientists to revise and remove "unproven" allergens from protein databases (GenBank Protein and UniProt). However, it is important to note that, as a committee on nomenclature, its mission is to provide a framework for consistent allergen identification and not to pass judgement on whether an allergen is a major, mid-tier or minor allergen. The scientific discussion about the merits of a particular allergen is simply facilitated by accurate nomenclature to identify the molecule in question.
E. How the Allergen Nomenclature Sub-Committee responds to and evaluates the increasing numbers of putative allergens will be an increasing challenge. Not all putative allergen-like sequences are demonstrably allergenic. Processes and criteria for assessing allergenicity might need to be continually evaluated and revised. Research and critical thinking will need to address the capacity to assimilate many different data sources into useful and clinically relevant information that aids the diagnosis and treatment of allergic patients.
F. A full online review process analogous to the ones for manuscripts can be envisioned. Name assignments to allergen candidates are performed by the Chair of the Sub-Committee and two to three reviewers. Currently the Word file submissions are maintained on the server. Once an online entry form is developed, submissions should be handled with less chance of manual errors, although the committee will review all entries before they become public.

Conclusions
The WHO/IUIS Allergen Nomenclature Sub-Committee serves as an expert professional group that reviews supporting evidence of allergenicity of proteins, assigns names and manages an official list of recognized allergens. The Sub-Committee also clarifies the understanding of the qualities of allergen molecules and focuses current thinking about the criteria for allergenicity of proteins from molecular, biochemical, in vitro immunoassay-based and clinical perspectives. Investigators are encouraged to follow the criteria set by the Sub-Committee and to provide evidence-based data on allergenicity when submitting an allergen. Ultimately, whether a protein is an allergen for a particular patient depends on the level of sensitisation, exposure and clinical allergy status. As the introduction of allergen components has had a major impact on analytical sensitivity, specificity and allergy diagnosis (Matricardi et al., 2016), allergen nomenclature will remain important, and will become increasingly so as the wide array of "omics" technologies is applied in allergy research. In the future, the WHO/IUIS Allergen Nomenclature Sub-Committee will continue to provide expertise and criteria to evaluate evidence of the specificity of IgE binding for candidate allergens. These criteria will most likely evolve as additional data are generated from new technologies. Recently, there has been an explosion of "Big Data" from various "omics" techniques that will affect submissions to the database. The continued activities of the WHO/IUIS Allergen Nomenclature Sub-Committee are an important component of the review process to assign names to allergens before peer-reviewed publication. These activities help scientists and reviewers to focus on relevant data supporting the characterization of allergens and IgE binding, and stimulate more rigorous review of the clinical implications attributed to allergens.
The assignment of reliable names to allergens is useful for pharmaceutical companies making diagnostic and therapeutic materials for patients, and helps provide information as the medical field of allergology continues to move from crude extracts to the use of individual allergens. However, database users should remember that a WHO/IUIS name designation relies on data provided by the scientist submitting the candidate allergen, before peer-reviewed publication. The Sub-Committee contributes to improving continued interactions between scientists and clinicians in the effort to improve diagnosis, therapy and risk avoidance for those who have specific allergies.