Introduction

Nonhuman primates are important animal models for biodefense, transplant immunology, and infectious disease research (Gardner and Luciw 2008; Berger et al. 2009; Patterson and Carrion 2005). In particular, macaque infection with pathogenic strains of simian immunodeficiency virus (SIV) or chimeric SIV/HIV (SHIV) serves as the primary model system for understanding HIV pathogenesis (Pratt et al. 2006; Baroncelli et al. 2008; Valentine and Watkins 2008). Major histocompatibility complex (MHC) class I-restricted CD8+ T-cell responses are crucial in determining the host adaptive immune response against infection by viruses like HIV/SIV (Goulder and Watkins 2008). However, characterizing CD8+ T-cell responses requires detailed knowledge of MHC class I alleles present in infected macaques. MHC class I genotyping in macaques is complicated by the fact that macaque MHC class I loci have undergone a complex series of segmental duplications—genomic sequencing of the MHC region shows that at least 22 functional MHC class I genes are encoded in both rhesus (Daza-Vamenta et al. 2004) and cynomolgus macaques (Watanabe et al. 2007).

Indian rhesus macaques (Macaca mulatta) have historically been the preferred macaque population for modeling infectious disease; as such, they are to date the best-characterized population with regard to known MHC class I allele sequences (Baroncelli et al. 2008). In this population, sequences for over 600 MHC classical and nonclassical class I alleles have been at least partially characterized (www.ebi.ac.uk/imgt/mhc; Robinson et al. 2003). While cynomolgus macaques (Macaca fascicularis) provide alternative models for biomedical research, pig-tailed macaques (Macaca nemestrina) are emerging as important models for HIV infection. Significantly, pig-tailed macaques can be infected not only with SIV/SHIV (Buch et al. 2002; Polacino et al. 2008) but also with minimally modified HIV-1. Unlike rhesus and cynomolgus macaques, in which the functional TRIM5α protein is a barrier to HIV-1 replication, pig-tailed macaques express a nonfunctional TRIM5α variant that makes them susceptible to HIV-1 infection (Brennan et al. 2007; Igarashi et al. 2007; Brennan et al. 2008; Newman et al. 2008). The recent success in challenging pig-tailed macaques with minimally modified HIV-1 containing only SIV-derived Vif sequences (Hatziioannou et al. 2009) suggests that pig-tailed macaques will become more widely used in HIV pathogenesis and vaccine research.

MHC class I alleles can both influence the course of infection following viral challenge and confound interpretation of vaccination effects (Yant et al. 2006; Florese et al. 2008; Loffredo et al. 2008; Sauermann et al. 2008; Mee et al. 2009). Thorough investigation of cellular immune responses and correlates of protection against infection in pig-tailed macaques is hindered, however, by limited knowledge of MHC class I genetics. Only 28 Mane-A and 22 Mane-B complementary DNA (cDNA) sequences have been partially or fully described (www.ebi.ac.uk/imgt/mhc; Robinson et al. 2003). Restriction of an SIV epitope has been defined for a single allele, Mane-A1*08401 (previously known as Mane-A*10, accession numbers AY557348, DQ916064, and EF010518), expression of which has been correlated to lower viral loads following challenge with SIVmac239 (Smith et al. 2005; Mankowski et al. 2008).

Recently, we introduced cDNA amplicon Roche/454 pyrosequencing as a method to rapidly determine MHC class I transcript profiles in macaques (Wiseman et al. 2009). Taking advantage of the high throughput and sensitivity of GS-FLX pyrosequencing, we sequenced a 190-bp cDNA amplicon spanning a portion of the highly polymorphic peptide-binding region of MHC class I transcripts. In a cohort of 12 pig-tailed macaques, we detected 24 previously described Mane-A and Mane-B sequences or lineages, along with 98 putative novel Mane-A and Mane-B-like sequences (Wiseman et al. 2009). This preponderance of putative novel sequences indicated that the existing Mane class I allele database is incomplete.

Here, we describe amplicon pyrosequencing for MHC class I genotyping in pig-tailed macaques using a 367-bp cDNA amplicon that encodes the MHC class I peptide-binding domain. This amplicon provides improved resolution of closely related class I sequences, more clearly illuminating shared sequences among animals and independent cohorts; additionally, spanning intron 2 of the MHC transcript eliminates the possibility of genomic DNA contamination. The comprehensive genotypes we obtained by this method elucidated MHC class I diversity within individual animals and also highlighted the need to characterize full-length Mane sequences to confirm that these novel cDNA amplicon sequences represent functional MHC class I transcripts.

Taking advantage of amplicon pyrosequencing data to pre-screen and prioritize individual animals for allele discovery by cDNA cloning and Sanger sequencing, we characterized 66 novel Mane sequences and extended the known sequences for five previously characterized transcripts. This full-length characterization of novel Mane sequences provides a necessary confirmation of the diversity of novel sequences identified by amplicon pyrosequencing. More importantly, elucidation of these novel full-length sequences adds value to the pig-tailed macaque as a model organism in biomedical research; these full-length cDNA sequences can serve as reagents for a variety of immunological assays that will aid in investigating mechanisms underlying protective MHC class I-restricted immune responses in pig-tailed macaques.

Materials and methods

Pig-tailed macaque samples

We genotyped the MHC class I region by 367-bp amplicon pyrosequencing in 24 pig-tailed macaques from two distinct breeding centers. Cellular RNA and genomic DNA for 12 macaques (PT029–PT040) were provided by investigators at Johns Hopkins University (Baltimore, MD); RNA, peripheral blood mononuclear cells (PBMC), T cells, or bone marrow samples were provided for an independent cohort of 12 pig-tailed macaques (PT044–PT055) from the Fred Hutchinson Cancer Research Center (Seattle, WA). We obtained full-length cDNA sequences from 24 pig-tailed macaques for which we had comprehensive pyrosequencing genotypes. Cellular RNA for PT029–PT040, macaques genotyped in this report by 367-bp amplicon pyrosequencing, was provided by Johns Hopkins University researchers. The other 12 pig-tailed macaques used here for allele discovery were previously genotyped by 190-bp amplicon pyrosequencing (Wiseman et al. 2009): PBMC samples from nine of these pig-tailed macaques (PT020–PT028) were obtained from the University of Pennsylvania (Philadelphia, PA), while cellular RNA from PT009, PT010, and PT019 was obtained from Johns Hopkins University. All animals were cared for according to the regulations and guidelines of the Institutional Animal Care and Use Committees at their respective institutions.

Preparation and pyrosequencing of 367-bp amplicons

Samples were prepared as described previously (Wiseman et al. 2009). If necessary, RNA was isolated using the MagNA Pure LC RNA Isolation Kit (Roche Applied Sciences, Indianapolis, IN). RNA was reverse-transcribed to cDNA using the Superscript™III First-Strand Synthesis System (Invitrogen, Carlsbad, CA). We generated PCR amplicons from cDNA using high-fidelity Phusion polymerase (New England Biolabs, Ipswich, MA). For the 367-bp amplicon, we used the previously described exon two forward primer, SBT190F, paired with a reverse primer, SBT367R (5′-TCCCACTTSCGCTGGGT-3′), which binds a conserved region of exon 3. Forward and reverse PCR primer pairs contained one of 12 multiplex identifier (MID) tags, unique 10 bp sequences annealed to the 5′ end of the primer, along with GS-A or GS-B adaptor sequences required for emulsion PCR: 17 bp sequences annealed to the 5′ end of the MID tag-primer oligonucleotide (Supplemental Table 1). The total amplicon length including adaptor and MID sequences is 421 bp.
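The stated total length follows directly from the component lengths given above. The arithmetic can be sketched as follows; the function and constant names are illustrative only, and the sketch assumes that the 367-bp figure already includes the gene-specific primer sequences and that each end carries one MID tag and one adaptor.

```python
# Component lengths in bp, taken from the text above.
TARGET_AMPLICON_BP = 367  # assumed to include the gene-specific primers
MID_TAG_BP = 10           # one MID tag on each primer
ADAPTOR_BP = 17           # GS-A on one end, GS-B on the other


def tagged_amplicon_length(target=TARGET_AMPLICON_BP,
                           mid=MID_TAG_BP,
                           adaptor=ADAPTOR_BP):
    """Length of the product once both ends carry a MID tag and an adaptor."""
    return target + 2 * (mid + adaptor)


print(tagged_amplicon_length())  # 421
```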

We generated primary amplicons from cDNA using the following PCR program on an MJ Research Tetrad Thermocycler (Bio-Rad Laboratories, Hercules, CA): initial denaturation, 98°C for 3 min; amplification over 23 cycles of 98°C for 5 s, 60°C for 1 s, and 72°C for 20 s; and final extension of 72°C for 5 min. Aliquots from each reaction were run on a FlashGel DNA cassette (Lonza Walkersville Inc., Walkersville, MD) to check for sufficient amplification; if necessary, the reaction was put back on the thermocycler for three to six additional PCR cycles to generate sufficient product. PCR products were then separated using a 1% agarose gel in 1× Tris-acetate-EDTA buffer, purified using the MinElute Gel Extraction Kit (Qiagen, Valencia, CA), and quantified with a Qubit Quantitation Platform using the Quant-iT dsDNA HS Assay fluorescence kit (Invitrogen). We normalized amplicons to equimolar concentrations and then pooled samples, confirming purity of amplicons using a 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA).
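The normalization to equimolar concentrations can be illustrated with a short calculation. This is a minimal sketch, not the laboratory's actual worksheet: it assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, uses the 421-bp tagged amplicon length, and the sample names and target amount in the usage note are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; a widely used dsDNA approximation


def ng_per_ul_to_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def volumes_for_equimolar_pool(concs_ng_per_ul, length_bp=421, fmol_each=10.0):
    """Return the volume (uL) of each sample that contributes `fmol_each` fmol.

    1 nM equals 1 fmol/uL, so volume = target fmol / molarity in nM.
    """
    volumes = {}
    for sample, conc in concs_ng_per_ul.items():
        volumes[sample] = fmol_each / ng_per_ul_to_nM(conc, length_bp)
    return volumes
```

For example, `volumes_for_equimolar_pool({"sample_A": 2.8, "sample_B": 5.6})` returns roughly twice the volume for `sample_A` as for `sample_B`, since the more dilute sample must contribute more volume to supply the same number of molecules.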

The emulsion PCR, bead recovery, and pyrosequencing steps were performed following the manufacturer’s GS FLX protocols at the University of Illinois—Urbana Champaign Sequencing Center. The two 367-bp amplicon pools were sequenced in two independent instrument runs on 1/16th regions of a 70 × 75 PicoTiterPlate.

Data analysis

We associated sequence reads with individual animals by binning high quality reads according to MID tag. Because the sequence of the 367-bp amplicon exceeds the maximum read length for the GS-FLX instrument, we first assembled forward and reverse reads, averaging about 250 bp in length, into 100% identical unidirectional contigs using the SeqMan Pro assembler (DNASTAR, Madison, WI). We further examined only sequences that assembled into contigs of two or more identical reads. Following assembly of reads into unidirectional contigs, we performed manual editing as described in Wiseman et al. (2009) to remove pyrosequencing-associated artifacts (chiefly insertions or deletions in homopolymers) and to identify single base and primer mismatches introduced during the amplification processes. An increased frequency of insertion/deletion-type pyrosequencing artifacts resulted from inclusion of a conserved guanine homopolymer region, at the beginning of exon 3, in the 367-bp product for a subset of class I sequences; as a result, a somewhat higher percentage of total sequencing reads were identified as artifacts following manual editing than was observed with 190-bp amplicon pyrosequencing. We then used CodonCode Aligner (Dedham, MA) software to assemble forward and reverse contigs into bidirectional reads based on 100% identity in the overlapping region of approximately 120 bases. Requiring 100% identity over the entire overlapping region was sufficient to unambiguously pair forward and reverse reads for almost all sequences within an individual animal. In a small minority of cases, a definitive 367-bp sequence could not be identified because contigs of distinct forward reads associated with a single contig of reverse reads, or vice-versa; these ambiguous sequences were excluded in our final data analysis. 
All sequences we analyzed subsequently represent spliced messenger RNA (mRNA) transcripts as the presence of any genomic contamination from the second intron of MHC class I genes would be evident in the assembled bidirectional sequences. To minimize the likelihood of erroneously identifying sequencing artifacts as novel class I sequences, we required at least two identical sequencing reads in each orientation, for a minimum of four reads, to consider a putative novel class I sequence as present within an animal; we determined this to be an appropriate limit of detection given the stringent artifact analysis and manual assembly of bidirectional reads to ensure that only unambiguously paired forward and reverse unidirectional contigs were included in the data presented.
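The filtering logic described in this section can be sketched in a few functions. This is a simplified illustration rather than the SeqMan Pro and CodonCode Aligner workflow actually used: it assumes reads begin with the 10-bp MID tag, that reverse contigs have already been reverse-complemented, and that the overlap length is fixed (the text gives approximately 120 bases). All function names and the MID lookup table are hypothetical, and the manual artifact editing step is not modeled.

```python
from collections import Counter, defaultdict

MIN_READS_PER_ORIENTATION = 2  # two forward plus two reverse = four reads


def bin_by_mid(reads, mid_to_animal, mid_len=10):
    """Group reads by animal using the leading MID tag, trimming the tag."""
    binned = defaultdict(list)
    for read in reads:
        animal = mid_to_animal.get(read[:mid_len])
        if animal is not None:  # reads with unrecognized tags are dropped
            binned[animal].append(read[mid_len:])
    return binned


def collapse_identical(reads, min_count=MIN_READS_PER_ORIENTATION):
    """Keep only sequences seen in at least `min_count` identical reads."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def pair_contigs(forward, reverse_rc, overlap):
    """Join forward and reverse contigs that are 100% identical over `overlap`.

    Pairings are discarded when one forward contig matches several reverse
    contigs or vice versa, mirroring the exclusion of ambiguous sequences.
    """
    fwd_hits = defaultdict(list)
    rev_hits = defaultdict(list)
    for f in forward:
        for r in reverse_rc:
            if f[-overlap:] == r[:overlap]:
                fwd_hits[f].append(r)
                rev_hits[r].append(f)
    merged = []
    for f, matches in fwd_hits.items():
        if len(matches) == 1 and len(rev_hits[matches[0]]) == 1:
            merged.append(f + matches[0][overlap:])
    return merged
```

Applying `collapse_identical` separately to the forward and reverse reads of each animal before calling `pair_contigs` enforces the minimum of two identical reads in each orientation.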

Confirmation of initial 367-bp amplicon pyrosequencing results by microsatellite analysis

We performed microsatellite analysis for PT029–PT040, the first pool of macaques genotyped by pyrosequencing of the 367-bp amplicon. As previously described (Wiseman et al. 2007; Karl et al. 2008), genomic DNA served as template for PCR using a panel of 16 microsatellite markers that span the 5-Mb MHC region in macaques. Sizes of the resulting fluorescently labeled products were determined by capillary electrophoresis. We scored peaks using Data Acquisition and Data Analysis Software (Van Mierlo Software Co., Eindhoven, The Netherlands). All markers gave a single peak value per haplotype, with the exception of the P03-193435 marker in the MHC class IB region that yielded a variable number of peaks per haplotype.

cDNA cloning and sequencing to characterize full-length MHC class IB alleles

We selected two cohorts of 12 pig-tailed macaques each for allele discovery; comprehensive genotyping data from amplicon pyrosequencing existed for both cohorts (Wiseman et al. 2009; this report). The MHC class I cDNA cloning and Sanger sequencing method follows the protocol described by Karl et al. (2009). RNA was isolated, and cDNA was generated, as described above. We performed PCR using primers specific for the untranslated regions of MHC class IB alleles to preferentially amplify full-length MHC class IB cDNAs. The sense primer for cDNA-PCR, 5′MHC_UTR_CY1_MIDx (5′-AGAGTCTCCTCAGACGCCGAG-3′), was tagged with MID sequences as previously described in Karl et al. (2009); the antisense primer, 3′MHC_UTR_CY1-MIDx (5′-GGCTGTCTCTCCACCTCCTCAC-3′), was likewise tagged with MID sequences (Supplemental Table 1). PCR for amplification of MHC class I cDNA sequences, ligation and transformation of purified product into chemically competent E. coli, and preparation of plasmid DNA were done as described by Karl et al. (2009). Where the concentration of cDNA-PCR product allowed, we ligated multiple, uniquely MID-tagged samples into vector as a pooled, equimolar sample. At least 96 colonies were isolated per transformation. We Sanger-sequenced all clones with a single primer, as previously described (Karl et al. 2009), to identify potentially novel cDNA clones; we used the T7 sequencing primer for pooled samples or the 5′Refstrand_v2 primer for individual samples. Sequence analysis was performed using CodonCode Aligner and Lasergene (DNASTAR) software. We sequenced novel class I transcripts detected in at least three clones by single-pass sequencing with a total of five primers (Supplemental Table 1), as described by Campbell et al. (2009); for the full-length cDNAs characterized, we obtained overlapping sequence coverage for an average of 1,221 nucleotides between the 5′ and 3′ UTR primers.

The 71 full-length sequences described in this report are available in GenBank under the following accession numbers: FJ875218-FJ875276, GQ274880, GQ153465, GQ153484, GQ153471, GQ274894, GQ274896, GQ153467, GQ153511, GQ281749, GQ153468, GQ274887, and GQ274890. They were also submitted to the IMGT/MHC Non-human Primate Immuno Polymorphism Database-MHC (IPD) for official nomenclature assignments (Robinson et al. 2003). Nomenclature for pig-tailed macaque MHC class I sequences has been recently updated based on homology to rhesus macaques, and the most recent IPD designations for previously described cDNAs are given in Supplemental Table 2.

Results

Comprehensive MHC class I genotyping by amplicon pyrosequencing

The sequences encoding the peptide-binding domain of MHC class I proteins are highly polymorphic; therefore, obtaining partial sequence coverage of this region of class I transcripts allows us to distinguish specific sequences or lineages. Utilizing amplicon pyrosequencing for sequence-based typing applications is a method to rapidly and comprehensively genotype the MHC class I transcripts of macaques used for biomedical research. Although originally designed based on alignment of known rhesus and cynomolgus macaque MHC class I sequences, the 367-bp primers employed here for pyrosequencing effectively amplify pig-tailed macaque MHC class I transcripts as well, binding highly conserved regions of MHC class I transcripts that flank regions of great nucleotide variability in the peptide-binding region (Fig. 1).

Fig. 1

Schematic of ~1.2 kb MHC class I product and PCR primers. Approximate primer positions are indicated. SBT190F was paired with SBT190R to generate the 190-bp amplicon (Wiseman et al. 2009) and with SBT367R for 367-bp amplicon for pyrosequencing. We used UTR primers to preferentially amplify full-length MHC class IB cDNAs. Relative nucleotide variability along the length of the transcript is calculated based on alignment, using MUSCLE (Edgar 2004), of previously known pig-tailed macaque MHC class I sequences, as well as the novel full-length pig-tailed macaque MHC class I sequences described here

We evaluated an average of 856 sequencing reads for each of the 24 pig-tailed macaques genotyped by 367-bp amplicon pyrosequencing. We distinguished approximately 14 distinct Mane sequences per animal, with a minimum of two Mane-A and six Mane-B transcripts identified in each animal (Fig. 2). A total of 119 unique Mane sequences were distinguished. We detected 16 previously described Mane-A sequences but only ten known Mane-B sequences. Putative novel MHC class I sequences predominated; we observed 21 Mane-A and 68 Mane-B sequences previously unreported. In addition, three previously characterized Mane-I sequences and one novel Mane-I sequence were identified. We determined that approximately 82% of Mane sequences, including the novel cDNA sequences described in this report, are uniquely resolved within the sequence of this amplicon. Of interest to those investigating the protective effects of Mane-A1*08401 against SIV disease progression, the 367-bp amplicon allows resolution of Mane-A1*08401 from closely related sequence variants. Within the amplicon sequence, Mane-A1*08401 is distinct from Mane-A1*08402 and also from two putative novel variants that we detected: Mn-A*nov013 and Mn-A*nov030, which differ by one and three nucleotide substitutions, respectively, from Mane-A1*08401 in the peptide-binding region.

Fig. 2

Amplicon pyrosequencing MHC class I genotypes for 24 pig-tailed macaques. Samples from two cohorts of macaques were obtained from distinct breeding centers and genotyped with the 367-bp amplicon in two separate GS-FLX instrument runs: PT029–PT040 in the first pool and PT044–PT055 in the second. Abundance of Mane sequences is given as a percentage of the total reads analyzed per animal, following removal of pyrosequencing artifacts. Where multiple sequences are noted, the class I genotype is ambiguous due to sequence identity within the 367-bp region examined. The frequency column to the far right indicates the number of animals in which each transcript was observed. Names of novel class I transcripts for which full-length sequences are here described are in bold; all transcripts in bold are novel unless (FL) indicates that previously published sequence has been extended. GenBank accession numbers for full-length novel sequences are given in Table 2; partial sequences for identified “nov” transcripts were also submitted to GenBank (Supplemental Table 3)

We observed widespread sharing of MHC class I sequences within and between the independent pig-tailed macaque cohorts that were genotyped. Although pedigree data were not available for most animals, we deduced putative Mane-B haplotypes on the basis that particular combinations of three or more MHC class IB sequences were observed in two or more macaques with similar profiles of transcript abundance. In the 24 pig-tailed macaques reported here, we observed eight shared Mane-B haplotypes (Fig. 2); together with our previous amplicon pyrosequencing study (Wiseman et al. 2009), we inferred a total of 12 distinct Mane-B haplotypes (Table 1). The similar transcript profiles we observed among animals deduced to share a haplotype suggest that genotyping by amplicon pyrosequencing offers a semi-quantitative measure of relative transcript levels. This reproducibility in shared transcript profiles is exemplified in Fig. 3 for two Mane-B haplotypes (Pt4b and Pt7) that were shared among two and three animals, respectively, from distinct breeding centers. The correspondence between shared Mane-B haplotypes deduced by microsatellite analysis and amplicon pyrosequencing data for PT029–PT040 serves as confirmation of inferred haplotypes within this cohort, validating the predictions made based on sequence sharing among two or more animals (Fig. 4). Partial breeding records available for PT031, PT032, PT033, and PT034 were also consistent with the haplotype segregation inferred by both microsatellite analysis and amplicon pyrosequencing genotypes; this provides further support for the notion that deduced Mane-B haplotypes based on observations of shared sequences and transcript profiles in two or more animals are not simply chance arrangements. Finally, for the animals illustrated in Fig. 4, microsatellite profiles suggest sharing of extended MHC haplotypes despite the predicted amplicon pyrosequencing haplotypes being determined based only on shared Mane-B transcripts.
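The first criterion used to deduce haplotypes, a combination of three or more class IB sequences seen in two or more animals, can be expressed as a simple set comparison. The sketch below is only a first-pass screen under stated assumptions: the input layout and function name are hypothetical, and it does not model the second criterion (similar transcript abundance profiles) or the microsatellite confirmation.

```python
from itertools import combinations


def shared_sequence_sets(genotypes, min_shared=3):
    """Find animal pairs sharing at least `min_shared` Mane-B sequences.

    genotypes: dict mapping an animal ID to the set of Mane-B sequence
    names detected in that animal.
    Returns a dict mapping (animal_a, animal_b) to the shared set.
    """
    shared = {}
    for a, b in combinations(sorted(genotypes), 2):
        common = genotypes[a] & genotypes[b]
        if len(common) >= min_shared:
            shared[(a, b)] = common
    return shared
```

Candidate sets recovered this way would still need to be checked against abundance profiles and, where available, pedigree or microsatellite data before being treated as haplotypes.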

Table 1 Identification of twelve deduced Mane-B haplotypes in pig-tailed macaques

Fig. 3

MHC class IB transcript profiles of animals inferred to share Mane-B haplotypes. Similarity in relative sequence abundance profiles among animals inferred to share haplotypes suggests comparable levels of expression for these transcripts. The Pt7 and Pt4b haplotypes were observed in multiple animals from both cohorts genotyped. The Pt4b haplotype is a variant of haplotype Pt4a originally observed in a separate cohort using the 190-bp genotyping amplicon (Wiseman et al. 2009). Haplotypes Pt11, Pt3, and Pt12 were also observed in multiple animals from distinct cohorts (Fig. 2)

Fig. 4

Microsatellite analysis confirms shared haplotypes identified by amplicon pyrosequencing. Based on combinations of shared Mane-B sequences observed in the genotyping data, PT031, PT032, and PT033 were inferred to share a common haplotype designated Pt7. PT032 also shares haplotype Pt12 with PT034. These findings are consistent with partial breeding colony records available for these animals, indicating that they share a common sire and/or grandsire. Pt8 appears to be conserved between different breeding centers, since these animals were derived from unrelated parents and originated from distinct breeding colonies. Sequences associated with both haplotypes present in each animal are inferred based on amplicon pyrosequencing results; only shared Mane-B sequences have been designated with haplotype names. Novel full-length cDNA sequences characterized in this report are indicated in bold. Failed amplification resulted in no data, indicated by nd

Full-length MHC class IB allele discovery

MHC class I genotyping by amplicon pyrosequencing indicated that most novel MHC class I sequences in pig-tailed macaques are Mane-B sequences; therefore, we focused our allele discovery effort on characterizing novel Mane-B transcripts. We characterized 66 novel, full-length MHC class I cDNA sequences. In addition, we obtained full-length cDNA sequences for five previously reported Mane-B and Mane-I transcripts, extending each of these known sequences by at least 100 bp to obtain sequences inclusive of both start and stop codons (Table 2). We prioritized cDNA cloning and sequencing for macaques inferred by amplicon pyrosequencing to share Mane-B haplotypes or express highly abundant novel Mane-B transcripts. We did not observe any full-length cDNA sequences that were not detected by amplicon pyrosequencing; identification of the novel Mane-B transcripts correlated strongly with the relative abundance determined by amplicon pyrosequencing. Thirty-five of the full-length Mane-B transcripts were sequences we observed at frequencies greater than 5% of sequence reads obtained per animal in the genotyping experiment. We observed 21 of these novel Mane-B transcripts at intermediate levels, between 1% and 5% of analyzed pyrosequencing reads. In contrast, we characterized only four of the full-length novel sequences that were detected at less than 1% of the total pyrosequencing reads per animal.
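The abundance categories used above (greater than 5%, between 1% and 5%, and less than 1% of reads per animal) can be computed from per-transcript read counts. The following is a minimal sketch; the function names and category labels are illustrative, and the treatment of values falling exactly on a boundary is an assumption, since the text does not specify it.

```python
def abundance_percent(read_counts):
    """Convert per-transcript read counts for one animal to percentages."""
    total = sum(read_counts.values())
    return {name: 100.0 * n / total for name, n in read_counts.items()}


def abundance_class(pct):
    """Assign a transcript to an abundance category.

    Boundary handling (exactly 1% or 5%) is an assumption of this sketch.
    """
    if pct > 5.0:
        return "high"
    if pct >= 1.0:
        return "intermediate"
    return "low"


def classify_transcripts(read_counts):
    """Map each transcript name to its abundance category."""
    return {name: abundance_class(pct)
            for name, pct in abundance_percent(read_counts).items()}
```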

Table 2 Seventy-one full-length MHC class I cDNA sequences identified in pig-tailed macaques

Given the limited number of MHC class IB alleles characterized previously in pig-tailed macaques, we compared these novel Mane-B transcripts to the more extensively characterized Mamu and Mafa sequences. The official nomenclature (Robinson et al. 2003) assigned to our novel Mane-B sequences suggests that we identified 15 lineage groups, each consisting of two or more Mane-B transcripts (Table 2). At least three unique cDNA sequences were characterized for five Mane-B lineages. To illuminate possible structural or functional similarities to other known macaque MHC sequences, we performed BLASTP analysis using conceptual translations for these Mane-B transcripts. We analyzed similarity across species for the predicted protein products of these full-length novel Mane-B transcripts (averaging 362 amino acids) as well as for the peptide-binding region encoded by exons two and three (predicted to be 182 amino acids). Over a third of predicted proteins encoded by the novel Mane-B sequences characterized in this report have 100% amino acid identity within the peptide-binding region to previously described rhesus and cynomolgus macaque gene products (Table 2). Eight novel pig-tailed macaque MHC class I gene products are amino acid identical to complete cynomolgus macaque predicted proteins; four others have 100% identity to complete rhesus macaque proteins.
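The notion of percent amino acid identity used in these comparisons can be illustrated with a position-by-position comparison. This sketch is not BLASTP and performs no alignment or scoring: it assumes the two protein sequences have already been aligned to equal length, and the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)
```

Restricting both inputs to the residues encoded by exons two and three before calling the function would give identity within the peptide-binding region rather than across the complete protein.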

Discussion

Amplicon pyrosequencing and full-length cDNA sequencing as complementary methods

The coordinated use of amplicon pyrosequencing with cDNA cloning and Sanger sequencing to characterize novel MHC class I transcripts offers advantages for genotyping and allele discovery in species for which our knowledge of MHC genetics is limited. First, full-length characterization of novel Mane sequences originally identified by amplicon pyrosequencing confirms the authenticity of the novel amplicon sequences as functional class I transcripts. While a major concern for pyrosequencing-based genotyping remains the relatively high incidence of sequencing artifacts, the concordance of the described cloning and sequencing results with the MHC class I transcript profiles generated by amplicon pyrosequencing is an important confirmation that amplicon pyrosequencing, when employed with appropriately stringent data analysis, is a reliable method to rapidly genotype macaques. Adding to this, the use of the 367-bp amplicon spanning intron 2 of the MHC transcript provides assurance that analyzed sequence reads represent spliced mRNAs and not genomic DNA sequence. The second advantage of this combined method is in using comprehensive genotypes to guide full-length allele discovery. Use of the 367-bp genotyping amplicon provides sufficient sequence resolution to aid in the unique identification of closely related sequence variants, adding sensitivity to our ability to detect novel Mane sequences. These genotyping data enabled us to target specific loci and focus allele discovery efforts primarily on animals predicted to carry shared haplotypes or highly abundant novel sequences.

Applications and limitations of MHC class I genotyping by amplicon pyrosequencing

The genotyping data generated by amplicon pyrosequencing are useful for a variety of applications. Comprehensive MHC class I genotyping in cohorts of pig-tailed macaques can simplify interpretation of experimental results by illuminating shared and unique haplotypes or individual transcripts. This may be particularly relevant in examining immune responses made against specific infectious agents and in interpretation of vaccination results, as genetic correlates to disease susceptibility and resistance are poorly understood in pig-tailed macaques. MHC class I genotyping by amplicon pyrosequencing also holds promise for design and management of breeding colonies. Because the MHC class I region recombines rarely during meiosis (Penedo et al. 2005), inheritance of known haplotypes can be readily traced from parents to offspring; availability of comprehensive genotypes for breeding sires and dams allows for simplified MHC class I genotyping of offspring by alternative techniques, such as microsatellite analysis.

While this deep sequencing method provides a comprehensive genotype overview, utility of the genotyping data is more limited in species for which few MHC class I sequences have been characterized. The short amplicon sequences obtained by pyrosequencing do not provide template material suitable for further investigation of the structure and function of novel MHC class I transcripts. Thus, full-length cDNA cloning and sequencing to characterize novel MHC class I sequences remains of great importance to researchers investigating MHC class I-restricted immune responses in less well-characterized nonhuman primate species. Additionally, the presence of certain MHC class I transcripts is likely masked due to either low expression or mismatches to the amplification primers used in pyrosequencing. Although all of the novel full-length cDNAs that we characterized were detected by amplicon pyrosequencing, four of our novel sequences were originally detected at relatively low abundance. Two of these (Mane-B*09801 and Mane-B*05102) were detected in four or more distinct animals and may be commonly expressed at low transcript levels. In contrast, the full-length sequences of Mane-B*01101 and Mane-B*03004 revealed mismatches under the forward pyrosequencing primer, which may have caused underrepresentation in our amplicon pyrosequencing results. Augmenting the library of full-length cDNA sequences enables us to fine-tune our amplicon pyrosequencing primers to capture a wider diversity of MHC class I sequences specifically in pig-tailed macaques.

Characterization of novel Mane transcripts

MHC class I transcripts are largely distinct even among geographically distinct populations of the same species (Karl et al. 2008; Campbell et al. 2009). The data described here, however, make a case for conservation of common classical MHC class I transcripts and lineage groups across species. The predicted amino acid sequences of the Mane transcripts described in this report exhibit a high degree of homology to MHC transcripts characterized in rhesus and cynomolgus macaques, despite the fact that pig-tailed macaques belong to a distinct evolutionary clade (Macaca silenus) that diverged over five million years ago from the Macaca fascicularis clade, which gave rise to both rhesus and cynomolgus macaques (Tosi et al. 2000; Deinard and Smith 2001; Li et al. 2009). Previously, the most notable example of conserved MHC class I protein sequences between pig-tailed macaques and the more closely related rhesus and cynomolgus macaques was the observed sequence homology of nonclassical MHC class I sequences (Lafont et al. 2003; Lafont et al. 2004). Among the novel Mane-B transcripts characterized here, 20 are 100% identical in the peptide binding region to Mamu or Mafa gene products, and 12 of these transcripts are identical to their Mamu or Mafa homologues throughout the complete protein translation.

While such similarity may not be unexpected given the similar susceptibility of these macaque species to certain infectious diseases, it has not previously been possible to compare such a diversity of Mane-B sequences with MHC class IB sequences expressed in other macaques. One striking similarity among species is the apparent evolutionary conservation of a sequence lineage known to give rise to alternatively spliced transcriptional variants. We sequenced two novel Mane-B*11901 transcripts, one of which appears to be an alternatively spliced variant encoding a truncated protein. Sequence homology between Mane-B*11901 and rhesus and cynomolgus macaque transcripts known to have splicing variants (Mamu-B*07402 and Mafa-B*03901) makes it likely that this alternative splice site is conserved in all three species. Furthermore, certain abundantly expressed MHC class I haplotypes may be conserved across species, a possibility exemplified by the combination of Mane-B*03003 and Mn-B*nov060 on the Pt4b haplotype, which was detected in a third of the pig-tailed macaques described in this report. Mane-B*03003 is 100% identical in the peptide binding region to the gene product of Mamu-B*03003 (Table 2), while Mn-B*nov060, for which sequence covers only the peptide-binding region, is 100% amino acid identical to Mamu-B*02702 (Supplemental Table 3); these two rhesus macaque transcripts are linked on recently described Indian rhesus macaque haplotypes (Sauermann et al. 2008; Wiseman et al. 2009).

The importance of amino acid homology in class I sequences from distinct species may be more fully understood by considering the amino acid homology that exists between two novel MHC class IB sequences characterized here and two MHC alleles previously shown to be protective in SIV infection. Mane-B*01703 is 99% similar at the amino acid level to Mamu-B*01701, and Mane-B*04701 has 100% identity to the Mamu-B*04701 transcript. Mamu-B*01701 is correlated with decreased viral loads following infection with SIVmac239 (Yant et al. 2006), while haplotypes containing Mamu-B*04701 are similarly correlated with slow disease progression (Sauermann et al. 2008). Mane-B*01703 differs from Mamu-B*01701 by three amino acids; however, these residues are located at the start of the α2 domain in positions predicted to be highly variable (Parham et al. 1988). Additionally, recent studies show that even disparate MHC class IB transcripts can present highly similar peptides (Loffredo et al. 2009). Mane-B*04701, whose protein product is identical to that of Mamu-B*04701, is identified here as a component of a shared haplotype, designated Pt12 (Figs. 2 and 3, Table 1). While functional studies are required to determine whether the role of Mane-B*04701 or Mane-B*01703 during SIV infection of pig-tailed macaques is similar to the protective role of the homologous rhesus macaque alleles, the degree of identity remains striking. Furthermore, in the case of Mane-B*04701, the existence of a corresponding haplotype makes it possible to consider more than single-allele effects on disease progression.
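As an illustration of how identity figures of this kind can be derived, the following minimal Python sketch computes percent amino acid identity between two pre-aligned protein sequences and lists the positions at which they differ. It is not the analysis pipeline used in this report. The function names (percent_identity, mismatches), the 300-residue toy sequences, and the substitution positions are all hypothetical and do not correspond to real Mane or Mamu sequences.

```python
from typing import List, Tuple


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions carrying the same residue.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def mismatches(seq_a: str, seq_b: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue in seq_a, residue in seq_b) for each difference."""
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical toy sequences: a 300-residue reference built from a repeated
# motif, and a variant carrying three substitutions. These are NOT real
# MHC class I sequences.
reference = "GSHSMRYFYT" * 30
variant_residues = list(reference)
for index, new_residue in ((90, "A"), (94, "L"), (98, "F")):
    variant_residues[index] = new_residue
variant = "".join(variant_residues)

identity = percent_identity(reference, variant)
differences = mismatches(reference, variant)
print(f"{identity:.1f}% identity; differences: {differences}")
```

In this toy case, three substitutions across 300 aligned residues give 99.0% identity, which shows how a small number of differences clustered in one region remains compatible with a high overall identity value.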

Advancing pig-tailed macaques as model organisms for biomedical research

Using the coordinated approach of amplicon pyrosequencing to generate comprehensive genotypes with full-length cDNA cloning and sequencing for allele discovery, we have characterized 12 distinct Mane-B haplotypes and obtained full-length sequence for 66 novel as well as five previously described Mane transcripts. Previously, only 50 classical MHC class I cDNAs and ten nonclassical MHC class I cDNAs had been characterized in pig-tailed macaques (Lafont et al. 2003; Smith et al. 2005; Lafont et al. 2004; Pratt et al. 2006; Lafont et al. 2007); thus, this report represents a significant increase in our knowledge of MHC genetics in this important animal model for HIV and other infectious disease research. The comprehensive genotypes generated by amplicon pyrosequencing have broad applications for biomedical research using pig-tailed macaques, while the full-length characterization of these novel alleles makes it possible to generate reagents necessary for functional immunological studies in pig-tailed macaques, such as MHC class I transferrants to determine CD8+ T-cell restriction and tetramer constructs for sensitive detection of specific CD8+ T-cell populations. Additional improvements in the use of pyrosequencing for MHC class I genotyping, as well as continued full-length sequencing of potential novel alleles, will make the pig-tailed macaque an increasingly valuable animal model for biomedical studies, including HIV vaccine development.