Interpreting missense mutations in Human TRIM5alpha by computational methods

Background: The human restriction factor TRIM5α may play an important role in the regulation of the human immunodeficiency virus (HIV). It is unclear whether non-synonymous single nucleotide polymorphisms (nsSNPs) in TRIM5α affect the clinical course of HIV infection.
Findings: We surveyed the literature for TRIM5α nsSNPs and used comparative sequence analysis to predict the effect of each polymorphism on protein function. Twenty-eight nsSNPs with available functional data, clinical data, or both were identified. The four comparative methods assessed were SIFT, PolyPhen, A-GVGD, and average BLOSUM62 pairwise score. Two common polymorphisms, H43Y and R136Q, were predicted to be benign based on comparative sequence analysis. The nsSNPs P323R, K324N, I328M, G330Q, R332P, I348V, and T369S were all predicted to affect protein function.
Conclusion: Comparative sequence analysis offers a practical tool for analyzing nsSNPs of unknown effect in TRIM5α.


Background
Human immunodeficiency virus type 1 (HIV-1) infection depends on both viral and human genetic factors [1]. Single nucleotide polymorphisms (SNPs) in different immune-modulation genes have been shown to affect susceptibility to and progression of disease. Of these, the restriction factors APOBEC3F [2], APOBEC3G [3], and TRIM5α [4] are innate immune proteins that affect post-entry steps in HIV-1 replication and confer resistance to retroviruses in other species.
The tripartite motif restriction factor TRIM5α is a cytoplasmic and nucleolic protein that restricts viral infection by interfering with the capsid protein and promoting its premature disassembly [5]. TRIM5α has been studied in primates, where it has been shown to be extremely effective at inhibiting HIV-1 and other lentiviruses [6,7]. The restriction factor is composed of several regions: the RING domain, B-boxes, a coiled-coil domain, and a carboxy-terminal SPRY (B30.2) domain [8]. The SPRY domain defines the antiretroviral activity of TRIM5α, and amino acids in this region show a high degree of positive selection based on sequence comparison in primates [9,10]. Variation in the SPRY domain is responsible for the ability of TRIM5α in non-human primates, but not in humans, to effectively restrict HIV-1. The RING domain contributes to the antiviral activity, but its exact function remains unknown [8,10,11]. Different studies have analyzed TRIM5α nsSNPs in both HIV-1 infected and non-infected populations [12-16]. The effect of TRIM5α polymorphisms on protein function and on the clinical course of HIV-1 remains controversial. Studies have produced conflicting results because of variations in functional assays, lack of power in clinical cohorts, and possible linkage disequilibrium between alleles.
Comparative sequence analysis is a powerful technique that can predict whether an nsSNP is likely to affect protein function. These methods rely on the fact that residues critical for function are conserved across different genomes and should not vary [17-19]. These amino acids may participate directly in an enzymatic reaction or have an important role in secondary or tertiary structure. Likewise, residues that are not vital would be subject to increased variation with little to no effect on protein function. Recently, we studied the accuracy of four methods that use comparative sequence analysis to predict the effect of nsSNPs on protein function [20]: (1) SIFT (Sorting Intolerant from Tolerant, http://blocks.fhcrc.org/sift/SIFT.html) [21]; (2) PolyPhen (Polymorphism Phenotyping, http://genetics.bwh.harvard.edu/pph) [22]; (3) A-GVGD (Grantham Variance-Grantham Difference, http://agvgd.iarc.fr) [19]; (4) Average BLOSUM62 pairwise score [20].
The accuracy of any one method for predicting a non-synonymous SNP as either deleterious (affecting protein function) or tolerant (benign) is approximately 80%. When all four methods agree, the predictive value is greater than 90% [20]. The goals of this study are to: (1) Predict the effect of TRIM5α nsSNPs using comparative sequence analysis and compare our results to known in-vitro and clinical data; (2) Identify mutations that are likely to affect TRIM5α protein function and may warrant further investigation in clinical cohorts and functional assays.
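Agreement among the four methods for a single nsSNP can be quantified with the per-item agreement term P_i from Fleiss' kappa, which is the quantity reported in the Table 2 footnote. The following Python sketch illustrates the arithmetic only and is not the code used in this study; the function name, the label strings, and the rule that a majority requires more than half of the methods are assumptions made for illustration.

```python
from collections import Counter


def per_item_agreement(calls):
    """Return (majority_label_or_None, P_i) for one nsSNP.

    calls: one label per prediction method, e.g. "deleterious" or "tolerant".
    P_i is the fraction of method pairs that gave the same label.
    """
    n = len(calls)
    if n < 2:
        raise ValueError("need at least two methods")
    counts = Counter(calls)
    # Number of ordered agreeing pairs divided by all ordered pairs.
    p_i = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    top_label, top_count = counts.most_common(1)[0]
    # Assumption: report a consensus label only for a strict majority.
    majority = top_label if top_count > n / 2 else None
    return majority, p_i
```

With four methods, a three-to-one split gives P_i = 0.50 and unanimous agreement gives P_i = 1.00; a two-to-two split gives P_i of about 0.33 and no consensus label.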

Creation of multiple sequence alignments
Amino acid sequence alignments were constructed using the standard program ClustalW as previously described [18]. Homologs of the gene of interest were retrieved from GenBank after BLAST searches using the human sequence as the query. Alignments are available as supplemental material. TRIM5α nsSNPs were evaluated by four publicly available methods as previously described [20]: 1) Average BLOSUM62 pairwise; 2) SIFT; 3) PolyPhen; 4) A-GVGD. The predictions of these computational methods were compared with known clinical and functional data. Literature containing TRIM5α nsSNPs was identified by searching PubMed [12,13,15,16,23]. The agreement of the four methods was assessed for overall consistency using Fleiss' kappa [24,25].
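The Fleiss' kappa calculation can be sketched in Python as follows. This is a minimal illustration of the standard formula, assuming each nsSNP is summarized as a row of counts (how many of the four methods called it deleterious and how many called it tolerant); the function name and the input layout are assumptions, and the study's own data are not reproduced here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    methods that assigned item i (an nsSNP) to category j."""
    if not counts:
        raise ValueError("no items")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_items = len(counts)
    n_categories = len(counts[0])
    # Observed agreement: mean over items of the per-item agreement P_i.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2
        for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)
```

For example, a hypothetical table `[[4, 0], [0, 4]]` (two nsSNPs, each rated unanimously but in different categories) yields a kappa of 1.0, whereas tables with many split decisions yield values near or below zero.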

Results and discussion
Comparative sequence analysis is a powerful tool for the analysis of nsSNPs in the human genome. When the organisms selected for sequence analysis are highly diverse, fewer sequences are needed to make inferences, because long divergence times allow more mutations to accumulate [20,26]. Too little variation can make residues appear overly conserved and produce 'false positive' results, i.e. a residue may seem to be critical for protein function when it is not. In our study, the organisms were not highly diverse, but there was sufficient variation for comparative analysis [18,26]. With proper alignment, comparative sequence analysis programs have been shown to be accurate over 90% of the time [20]. The TRIM5α sequence alignment was based on available BLAST data from 40 species with 2550 variants. This met the previously established threshold for statistical significance [18,26]. All sequences were from eukaryotes and included primates, mouse, rat, and cow [27].
Twenty-eight amino acid mutations in TRIM5α were identified in the literature, twenty-one of which are known to be nsSNPs in the human population (Table 1). Eight other nsSNPs had in-vitro functional data and are located in the critical SPRY domain of TRIM5α, but are not found in humans. The most common nsSNPs found in both HIV-1 infected and uninfected people were H43Y (frequency 6 to 43%) and R136Q (11 to 38%). Other common nsSNPs included V112F (1 to 11%), G249D (6 to 27%), and H419Y (1 to 8%).
The nsSNP H43Y in the RING region may be important for protein function, specifically E3 ligase activity [13,16,23]. Up to 43% of certain populations carry this polymorphism [23]. Some functional data have shown that H43Y retains restriction activity [12,14,15], whereas other results show decreased activity [13,23]. Individuals homozygous for the H43Y mutation may develop X4-tropic virus more rapidly than those who are not and may progress to AIDS at a faster rate [16]. To further investigate the effect of H43Y and other polymorphisms on protein function, the twenty-eight TRIM5α mutations were analyzed using SIFT, PolyPhen, A-GVGD, and average BLOSUM62 pairwise score (Table 2). Three of the four computational methods (SIFT, PolyPhen, A-GVGD) suggested that H43Y is a tolerated mutation that does not affect protein function. Although PolyPhen and SIFT do not require aligned sequences, we have previously shown that using a sequence alignment of curated data is superior to using a single query sequence alone [20]. In this case, regardless of the sequence(s) entered, SIFT classifies the mutation as tolerant. The BLOSUM62 pairwise program predicted H43Y as deleterious, but it does not distinguish specific mutations at a given codon and instead makes general predictions based on overall conservation at a given position. This may be a less specific algorithm for detecting individual mutations, but it is still as accurate as the other methods. For H43Y, the agreement between programs suggests a greater than 70% accuracy for the 'tolerant' prediction [20]. This supports the evidence that H43Y does not affect TRIM5α function and is likely a benign mutation.
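The position-level behaviour of the average BLOSUM62 pairwise score can be illustrated with a short Python sketch. It assumes one plausible reading of the method, namely the mean BLOSUM62 score over all unordered pairs of residues in an alignment column; the exact definition and the cutoff used in [20] are not reproduced here. The function names and the example columns are hypothetical, and only the BLOSUM62 entries needed for the example are included (a full matrix is available, for instance, through Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`).

```python
from itertools import combinations

# Subset of the BLOSUM62 matrix covering only H, Y, Q and R.
BLOSUM62_SUBSET = {
    ("H", "H"): 8, ("Y", "Y"): 7, ("Q", "Q"): 5, ("R", "R"): 5,
    ("H", "Y"): 2, ("H", "Q"): 0, ("H", "R"): 0,
    ("Y", "Q"): -1, ("Y", "R"): -2, ("Q", "R"): 1,
}


def pair_score(a, b):
    """Look up a symmetric BLOSUM62 score; KeyError if not in the subset."""
    try:
        return BLOSUM62_SUBSET[(a, b)]
    except KeyError:
        return BLOSUM62_SUBSET[(b, a)]


def average_pairwise_score(column):
    """Mean BLOSUM62 score over all unordered residue pairs in one
    alignment column (one residue per aligned sequence)."""
    pairs = list(combinations(column, 2))
    if not pairs:
        raise ValueError("column needs at least two residues")
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)
```

A fully conserved hypothetical column such as `"HHHH"` scores 8.0, while a variable column such as `"HYQR"` scores 0.0. Because the score depends only on the column, every substitution at that position receives the same prediction, which is why this method cannot distinguish H43Y from any other change at codon 43.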
Similar to H43Y, data regarding R136Q have shown conflicting results (Table 1). This amino acid resides in the coiled-coil domain and may participate in TRIM5α oligomerization [8,13]. Clinical studies have shown that this mutation is more frequent in HIV-infected patients than in non-infected individuals (OR = 5.49, 95% CI 1.83-16.45, p = 0.002) [15], but in-vitro data show that R136Q retains functional activity [15,23]. Furthermore, other clinical studies have shown that R136Q may have a protective effect against HIV-1 [13]. One reason for the conflicting results may be that this mutation is in linkage disequilibrium with other alleles that do play a role in HIV progression or susceptibility [15,16]. A questionable protective effect of R136Q has also been observed in people with X4-tropic virus [16]. Comparative sequence analysis shows that all four methods agree that R136Q is tolerant, a prediction with an accuracy of greater than 90% [20].
Table 1 (fragment; remaining rows not recovered): Linker 1, < 1% [13], ND, ND. G110E, B-box 2, < 1 to 2% [13,15], Functional/Slightly decreased [15], No effect.
Table 2 (fragment; remaining rows not recovered): PROBABLY DAMAGING, DELETERIOUS, NEUTRAL, DELETERIOUS; Likely Deleterious (P_i = 0.50).
Table 2 footnotes: † SIFT output is based on manually aligned sequences. Query-only sequence data are not shown. ‡ Predictions are based on P_i, the extent to which SIFT (aligned), PolyPhen, A-GVGD, and BLOSUM62 pairwise agree for a given nsSNP. When all four methods agree (P_i = 1.00), the overall predictive value exceeds 90% [20].
Two other nsSNPs, G249D and H419Y, have also shown ambiguous data, with either no effect on clinical outcomes [13,15] or a slower progression of disease [12]. Functional data show that neither of these nsSNPs has an effect on TRIM5α function [12,13,15,23]. Three of four methods using comparative sequence analysis suggest that G249D is a benign polymorphism, and all four agree that H419Y is benign (Table 2).
Several other TRIM5α nsSNPs of interest were also identified. The polymorphisms C58Y, R119W, Q143R, R238W, and V438G are all observed in different human populations and are all predicted to be deleterious by the four comparative sequence methods (Table 2). Only R119W has been evaluated in clinical studies, where it had no effect on HIV outcomes [15]. Both R119W and R238W are functional based on in-vitro studies [15,23]. Given the conflicting data, these nsSNPs, along with C58Y, Q143R, and V438G, should be studied further to assess their effect on protein function and their association with clinical HIV disease.
The SPRY region of TRIM5α is a critical region involved in species-specific restriction of HIV-1 [8,10,28] and contains codons under high degrees of positive selection [9,28]. A number of TRIM5α mutations have been studied in the SPRY region of the protein (Table 1). Of interest, amino acid residues 325 to 344 lie in a segment of this domain that differs between humans and other primates [8]. This 'hypervariable' region has been shown to be responsible, at least in part, for the ability to specifically target HIV-1 [10]. Although no nsSNPs have yet been observed in this region in the human population, mutations at this site may confer a protective benefit against HIV-1 infection. In-vitro studies have demonstrated that single amino acid changes in this region, specifically R332P and to a lesser extent K324N, may be able to effectively restrict HIV-1 [10,29]. With the exception of I348V, these mutations were predicted to be tolerant by all four methods, in agreement with the functional data. For I348V, three methods predicted the mutation as tolerant, while only the average BLOSUM62 pairwise score predicted it as deleterious.
Overall, the four computational methods agreed the majority of the time (κ = 0.53, moderate agreement [25]). Clinical studies on TRIM5α nsSNPs have shown conflicting results [12,13,15,16], but in-vitro assays clearly demonstrate activity in inhibiting HIV-1. More studies are needed to define the interaction of TRIM5α with immune regulatory genes and with DNA sequences that may be in linkage disequilibrium. Attention should be given to the possibility that TRIM5α may affect certain populations differently, specifically people with X4-dominant HIV infection or particular ethnic groups.
Two limitations of this study are the paucity of available TRIM5α gene sequences and the lack of structural data on TRIM5α. Although the number of sequences is sufficient as discussed above, a greater variety of species would allow better alignments based on sensitivity and specificity plots [20]. Furthermore, programs such as PolyPhen rely on structural databases, none of which presently contains a structure for TRIM5α.

Conclusion
Comparative sequence analysis suggests that neither H43Y nor R136Q affects TRIM5α protein function. We identified other nsSNPs that may affect TRIM5α activity and should be analyzed in further clinical and laboratory studies.