Positive selection at the receptor-binding site of haemagglutinin H5 in viral sequences derived from human tissues

Received 28 March 2008
Accepted 2 May 2008

Highly pathogenic H5N1 avian influenza virus has spread through at least 45 countries on three continents. Despite its ability to infect and cause severe disease in humans, the virus cannot transmit efficiently from human to human. The lack of efficient transmission indicates that adaptation of the avian virus to the new host species is incomplete. The mutations required for complete adaptation and the emergence of a potential pandemic virus are likely to originate and be selected within infected human tissues. Differential receptor preference plays an important role in the species tropism of avian influenza. We have analysed quasispecies of sequences covering the receptor-binding domain of the haemagglutinin gene of H5N1 viruses derived from fatal human cases. We employed a likelihood ratio test to identify positive-selection sites within the quasispecies. Nine of seventeen positive-selection sites identified in our analyses were found to be located within or flanking the receptor-binding domain. Some of these mutations are known to alter receptor-binding specificity. This suggests that our approach could be used to screen for mutations with significant functional impact. Our data provide new candidate mutations for viral adaptation to a human host, and a new approach to search for genetic markers of potential pandemic viruses.


INTRODUCTION
The high replication error rate of RNA viruses, such as influenza virus, results in a mixed viral population with many variants, referred to as quasispecies (Eigen, 1996). Although most sequence variations are neutral and offer no competitive advantage, the quasispecies provides multiple variants, which can be readily selected if there is a change in selective pressure. Transmission of the virus to a new host species provides new selective pressures and can result in the expansion of the 'best fit' minor variants for adaptation to the new environment.
In general, human and avian influenza A viruses differ in their recognition of host cell receptors. Human influenza viruses preferentially recognize α-2,6-linked sialic acid, while avian influenza viruses recognize α-2,3-linked sialic acid (Rogers & Paulson, 1983). However, the highly pathogenic avian influenza viruses of subtype H5N1 can transmit directly from avian species to humans (Subbarao et al., 1998; Tran et al., 2004). Even though the H5N1 viruses can infect and cause severe disease in humans, they do not bind the α-2,6-linked sialic acid receptor with high affinity (Ha et al., 2001; Stevens et al., 2006). This property is believed to be one of the major factors that prevent the H5N1 virus from transmitting efficiently amongst humans and causing a pandemic. Amino acid substitutions in the haemagglutinin (HA) can shift the receptor-binding preference of the virus from α-2,3-linked to α-2,6-linked sialic acid (Auewarakul et al., 2007; Stevens et al., 2006; Yamada et al., 2006). This would enable avian H5N1 viruses to recognize human-type host cell receptors and could potentially enable the virus to transmit efficiently within the human population and cause a catastrophic pandemic. Therefore, it is extremely important to monitor the viral changes that may lead to the emergence of pandemic viruses.
The earliest possible detection of selected mutants is by looking at sequences within viral quasispecies before they expand and become the dominant virus. In order to detect mutants that might have altered phenotypes, we studied the viral sequences at the level of quasispecies. We conducted a study in which the viral sequence was directly amplified, cloned and sequenced from a nasopharyngeal aspirate or tissue specimens. The specimens were obtained from fatal human cases in Thailand.
Selection at the protein level can be measured by ω (dN/dS), in which dN = non-synonymous substitution rate (non-synonymous changes per non-synonymous site) and dS = synonymous substitution rate (synonymous changes per synonymous site). If amino acid changes provide better fitness, non-synonymous mutations will be fixed at a higher rate than synonymous mutations. This results in dN > dS and ω > 1. Originally, ω was calculated as an average for the whole gene, which does not allow sensitive detection of individual amino acid residues under positive selection. Subsequently, codon-based models that allow the ω ratio to vary amongst sites were developed (Nielsen & Yang, 1998; Yang & Nielsen, 2000). These models describe the distribution of the ω ratio amongst sites: M0 assumes one constant ω for all sites; M3 classifies sites into discrete classes with different ω; M7 allows ω to vary according to a distribution that represents negative or neutral selection; M8 adds on top of M7 a discrete ω class for sites with positive selection (ω > 1). A likelihood ratio test is used as a statistical test of goodness-of-fit to compare two nested models and test whether the more complex model, e.g. M8 or M3, fits the dataset significantly better than the simpler model, e.g. M7 or M0. An empirical Bayes approach is then used to calculate the posterior probability that each site is from a particular site class, and sites with high posterior probabilities of coming from the class with ω > 1 are inferred to be under positive selection (Yang et al., 2005).
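The likelihood ratio test described above can be illustrated with a minimal Python sketch. It assumes the usual chi-square approximation with 2 degrees of freedom for the M7 versus M8 comparison (M8 adds two free parameters). The log-likelihood values are hypothetical placeholders, not results from this study, and the function names are chosen here for illustration only.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form is only valid for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    # sf(x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, p-value) for a nested model comparison."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for M7 (null) and M8 (alternative).
stat, p = likelihood_ratio_test(lnl_null=-1250.4, lnl_alt=-1243.1, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p:.2e}")
```

A small p-value would favour the model that includes a class of sites with dN/dS above 1. The closed-form tail probability avoids an external dependency; a statistics library could be used instead.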

METHODS
Patients. The first patient (patient A) was a non-autopsy case, previously reported (Auewarakul et al., 2007). Patient A was a 5-year-old boy who had progressive viral pneumonia that led to respiratory failure and death 12 days after the onset of illness. In December 2005, he developed a fever, stomach ache, nausea and vomiting and was admitted to the hospital 10 days later when he developed dyspnea and a chest radiograph showed patchy infiltration at the right middle lobe. The dyspnea worsened to respiratory failure and the pulmonary infiltration spread to both lungs on the following day. Avian influenza was suspected on day 12. Treatment with the antiviral drug oseltamivir was started, and the patient died on the same day. The patient was not known to have had direct contact with any sick or dying birds, but he played in a yard where birds were often present. Nasopharyngeal aspirate was collected on day 12 after the onset of illness.
Autopsies were conducted on the second (patient B) and third (patient C) patients. Patient B, previously reported (Uiprasertkul et al., 2007), was a 48-year-old man who had progressive viral pneumonia in October 2005. He had a fever, cough, runny nose, myalgia and chest pain at the onset of illness. Dyspnea developed on day 2 of the illness and a chest radiograph showed interstitial infiltrations at the right upper and left middle lung fields and a mass-like infiltration at the right middle lung field. Avian influenza was suspected on day 4 of the illness after a history of direct contact with dying chickens was revealed. Respiratory secretions were then sent to national laboratories and were confirmed positive for influenza (H5N1) virus. The patient died on day 6 of the illness.
Patient C, previously reported (Uiprasertkul et al., 2005), was a 6-year-old boy who had progressive viral pneumonia in January 2004. He was initially treated with multiple broad-spectrum anti-microbial agents. Virological diagnosis of H5N1 infection was made on day 7 of the illness. After oseltamivir became available in Thailand, he was treated with this agent from day 15 of his illness until he died. He was also treated with methylprednisolone from day 15 until death and with granulocyte colony-stimulating factor for leukopenia from day 5 to day 10 of the illness. The patient died on day 17 of the illness.
The use of the patients' specimens was approved by the Ethics Committee of the Faculty of Medicine Siriraj Hospital.
Viral RNA, cloning and quasispecies analysis. For patient A, total RNA from the nasopharyngeal specimen was extracted according to the manufacturer's protocol (QIAamp RNA mini kit; Qiagen). For patients B and C, total RNA was extracted by using Trizol from paraffin-embedded blocks of lung and intestine tissue samples and then purified using the Qiagen RNeasy kit according to the manufacturer's instructions.
A fragment of the HA gene covering the receptor-binding site (nt 413-905) was amplified from RNA extracted from the specimen by using the high-fidelity enzyme Pfu (Promega) and the primers HHAf2 (5′-GGTCCAGTCATGAAGCCTCA-3′) and HA-H5r12 (5′-TTTATCGCCCCCATTGGAGT-3′). The PCR product was cloned into pGEM T-Easy. One hundred clones of each sample were picked and sequenced.
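As a quick sanity check on the primer pair and target region, the following Python sketch computes primer length, GC content, the reverse complement of the reverse primer, and the length of the stated nucleotide span. It assumes the coordinates nt 413-905 are inclusive, and the primer strings are typed without any line-break hyphenation. The helper names are illustrative only.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


forward = "GGTCCAGTCATGAAGCCTCA"   # HHAf2
reverse = "TTTATCGCCCCCATTGGAGT"   # HA-H5r12

# Assumption: both coordinates are inclusive.
amplicon_span = 905 - 413 + 1

for name, primer in (("HHAf2", forward), ("HA-H5r12", reverse)):
    print(f"{name}: {len(primer)} nt, GC = {gc_fraction(primer):.0%}")
print("Reverse primer on the sense strand:", reverse_complement(reverse))
print("Span of nt 413-905:", amplicon_span, "nt")
```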
The selective pressures acting on the receptor-binding region were estimated by using the CODEML program in the PAML package. We used models M7 and M8, where M7 contains 10 ω categories to describe ω amongst sites, all constrained to be < 1; M8 differs from M7 only in that it estimates ω for an extra class of sites (p10) at which ω can be > 1 (Yang, 1997). Models were compared using a likelihood ratio test, and the Bayes Empirical Bayes (BEB) method was used for a posteriori estimation of individual codons under positive selection (Yang et al., 2005).
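For readers who wish to set up a comparable analysis, the Python sketch below writes a CODEML control file that fits site models M7 and M8 in one run. Only the choice of M7 and M8 comes from the text above. The file names are hypothetical, and the remaining settings (codon frequency model, starting values, user-supplied tree) are assumptions for illustration, not the settings used in this study.

```python
def codeml_control(seqfile: str, treefile: str, outfile: str) -> str:
    """Build the text of a codeml control file for site models M7 and M8."""
    settings = [
        ("seqfile", seqfile, "codon alignment (hypothetical file name)"),
        ("treefile", treefile, "tree file (hypothetical file name)"),
        ("outfile", outfile, "main result file (hypothetical file name)"),
        ("noisy", 3, "amount of screen output"),
        ("verbose", 1, "detailed output"),
        ("runmode", 0, "use the user-supplied tree (assumption)"),
        ("seqtype", 1, "codon sequences"),
        ("CodonFreq", 2, "F3x4 codon frequencies (assumption)"),
        ("model", 0, "one omega ratio across branches"),
        ("NSsites", "7 8", "site models M7 (beta) and M8 (beta & omega)"),
        ("icode", 0, "universal genetic code"),
        ("fix_kappa", 0, "estimate kappa"),
        ("kappa", 2, "initial kappa (assumption)"),
        ("fix_omega", 0, "estimate omega"),
        ("omega", 0.4, "initial omega (assumption)"),
        ("cleandata", 0, "keep sites with gaps or ambiguity codes"),
    ]
    # In PAML control files, text after '*' on a line is treated as a comment.
    return "\n".join(f"{key} = {value}  * {note}" for key, value, note in settings) + "\n"


ctl = codeml_control("ha_clones.phy", "ha_clones.tre", "m7_m8.out")
print(ctl)
```

The returned string can be saved as `codeml.ctl` and passed to the `codeml` executable. The log-likelihoods of the two models in the output are then compared with a likelihood ratio test.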

RESULTS
We analysed the HA sequences either by individual patient or by organ (lung or intestine) within a patient. In a phylogenetic analysis (data not shown), the consensus wild-type sequences of the patients were found to be similar to one another and closely related to other clade 1 sequences from Thailand. Altogether, 17 positive-selection sites were identified (Table 1) within the sequenced region spanning 143 aa residues (128-270). At these positions, the majority sequence in all the samples carries the same amino acids as the consensus sequence of all human H5N1 viruses available in the GenBank database, except for positions 133 and 138, where mutant amino acids were found at higher frequencies. Total numbers of synonymous and non-synonymous substitutions, as well as lists of non-synonymous substitutions in all the RNA samples, are shown in Table 2. Some of these mutations with low frequencies were not picked up as positive-selection sites by the BEB analysis. Amongst the positive-selection sites, six sites were repeatedly found when each patient was analysed individually. These sites were 133, 138, 161, 186, 222 and 227 (H3 numbering system). Of these, all but positions 133 and 161 are in the known receptor-binding domain. The receptor-binding site of HA at the tip of the HA1 globular domain is composed of three secondary structure elements: the 190 helix (residues 190-198), the 130 loop (residues 135-138) and the 220 loop (residues 221-228), forming the sides of each site; and the base made up of the conserved residues Tyr98, Trp153, His183 and Tyr195 (Skehel & Wiley, 2000). Although positions 133 and 230 are not in the receptor-binding domain, they flank the 130 and 220 loops and they showed positive selection in our analyses, suggesting that mutations at these sites might also contribute to the receptor-binding adaptation. A138V, N186K and S227N mutations were previously reported to confer α-2,6-linked sialic acid binding to H5N1 virus (Auewarakul et al., 2007; Gambaryan et al., 2006; Shinya et al., 2005; Yamada et al., 2006). The mutant sequences in our analyses match these mutations at positions 138 and 227, whereas our mutation at position 186 is N186D. Amongst the 17 positive-selection sites that were identified in our study, eight sites are not known to be related to the receptor-binding domain. However, these sites all showed only low mutation frequencies. In other words, all strong positive-selection sites with high mutation frequencies are related to the receptor-binding domain.

Table 1 legend: Sources of sequences, wild-type and mutant amino acids, positions (H3 numbering), frequencies of mutant amino acids and site-specific ω (dN/dS) ± SEM are shown (ω > 1 indicates positive selection). The wild-type amino acids are from the consensus sequence of all human H5N1 viruses available in the GenBank database. Bold-typed residues are related to the receptor-binding domain. The residues that have been shown to carry receptor preference determinants for H5N1 are underlined.

In patients A, B and C, we found seven, five and ten positive-selection sites, respectively. Of these, five, three and five sites, respectively, lie within or flank the receptor-binding domain (Table 1). The higher number of positive-selection sites in patient C may be related to the fact that this patient died on day 17 of the illness, while the samples from the other two patients were collected earlier in the course of the illness. The longer duration of infection provided a longer period under selective pressure and might have allowed the virus to accumulate more adaptive changes. In patient C, there were markedly more positive-selection sites that are not related to the receptor-binding domain, including the N-linked glycosylation site at position 158. This suggests that other selective pressures, such as the immune response, might be involved in late phases of the disease.
When each type of tissue from patients B and C was analysed, four and seven positive-selection sites were identified in the lung and intestine, respectively, of which two and four sites are related to the receptor-binding domain (Table 1). There are some differences in the positive-selection sites from different tissues. In particular, the S227N mutation was found at high frequency (36.7 %) in the intestine of patient C and at low frequency (3 %) in the intestine of patient B, while it was absent in the lung tissue from both cases. This suggests a strong selection for this mutation and a compartmentalization of the viral population within the patients.
Most of the identified positive-selection sites were concentrated in the N-terminal two-thirds of the sequences. Sixteen sites were identified in the region covering the receptor-binding domain from residue 128 to 230, whereas only two sites were identified in the rest of the sequences (residues 231-270).
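Mutant frequencies of the kind quoted above are simply the fraction of sequenced clones that differ from the consensus at a given position. The Python sketch below shows one way to tabulate them from aligned amino acid sequences. The sequences are toy data, and the positions are plain alignment indices rather than H3 numbering. The function name and the `start` parameter are hypothetical.

```python
from collections import Counter


def mutant_frequencies(clones, consensus, start=1):
    """Map (position, wild-type residue, mutant residue) to its clone fraction.

    `clones` must already be aligned to `consensus` (same length, no indels).
    `start` is the number assigned to the first alignment column.
    """
    if not clones:
        raise ValueError("no clone sequences given")
    if any(len(clone) != len(consensus) for clone in clones):
        raise ValueError("clones must be aligned to the consensus")
    counts = Counter()
    for clone in clones:
        for offset, (ref, obs) in enumerate(zip(consensus, clone)):
            if obs != ref:
                counts[(start + offset, ref, obs)] += 1
    return {key: n / len(clones) for key, n in counts.items()}


# Toy data: five clones of a four-residue region (not real H5 sequences).
consensus = "GSNA"
clones = ["GSNA", "GSNA", "GNNA", "GNNA", "GSNV"]
freqs = mutant_frequencies(clones, consensus)
for (pos, ref, obs), fraction in sorted(freqs.items()):
    print(f"{ref}{pos}{obs}: {fraction:.1%}")
```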
The observed differences in the viral quasispecies in lung and intestine suggest that there might be a compartmentalization of viral infection and that the selective pressure might differ among tissues. Human lung has been shown to contain α-2,3-linked sialic acid in alveolar epithelial cells, whereas human intestinal epithelium lacks this receptor. In human intestine, the α-2,3-linked sialic acid receptor was identified only on neurons (Yao et al., 2008). It is not clear whether the difference in sialic acid receptor distribution between the two tissues contributed to the different selection of viral sequences.
Our data demonstrate adaptation of the receptor-binding domain of H5N1 virus in infected human tissues. If transmitted further to other human hosts, the mutants would be likely to be selected further and expand, and eventually give rise to a potential pandemic virus. Understanding the adaptation is therefore of utmost importance. Adaptation of avian influenza virus to a human host involves multiple mechanisms. However, a change in receptor usage preference is likely to be a major step in the adaptation process. Although there have been reports of mutations that altered the receptor-binding specificity of H5, those reported mutations only conferred partial switching from α-2,3-linkage tropism to dual tropism (Auewarakul et al., 2007; Gambaryan et al., 2006; Stevens et al., 2006; Yamada et al., 2006). It is likely that these mutations are not sufficient and that full switching to α-2,6-linkage tropism is needed for efficient transmission in the human population. A mutation that can cause complete switching of H5 is not known. The fact that our analyses could pick up mutations that are known to change the receptor-binding property of H5N1 viruses indicates that they can be used to screen for mutations with significant functional effects. Our analyses offer an approach to find candidate mutations, which should be studied further to determine their functional effects. Finding mutations with pandemic potential before the actual emergence of such viruses will provide genetic markers for vigilant monitoring, which will hopefully help us to avoid a pandemic.

Table 1. Positive-selection sites on the HA gene from residue 128 to 270

Table 2. Numbers of all synonymous and non-synonymous substitutions and a list of non-synonymous substitutions in each dataset