The Major Histocompatibility Complex of Old World Camels—A Synopsis

This study brings new information on major histocompatibility complex (MHC) class III sub-region genes in Old World camels and integrates current knowledge of the MHC region into a comprehensive overview for Old World camels. Out of the MHC class III genes characterized, TNFA and the LY6 gene family showed high levels of conservation, characteristic for MHC class III loci in general. For comparison, an MHC class II gene TAP1, not coding for antigen presenting molecules but functionally related to MHC antigen presenting functions was studied. TAP1 had many SNPs, even higher than the MHC class I and II genes encoding antigen presenting molecules. Based on this knowledge and using new camel genomic resources, we constructed an improved genomic map of the entire MHC region of Old World camels. The MHC class III sub-region shows a standard organization similar to that of pig or cattle. The overall genomic structure of the camel MHC is more similar to pig MHC than to cattle MHC. This conclusion is supported by differences in the organization of the MHC class II sub-region, absence of functional DY genes, different organization of MIC genes in the MHC class I sub-region, and generally closer evolutionary relationships of camel and porcine MHC gene sequences analyzed so far.


Introduction
The major histocompatibility complex (MHC) is a genomic region of critical importance for vertebrate immune functions. Class I and II MHC molecules are responsible for the presentation of peptides to T cells of intracellular and extracellular origin, respectively [1]. Different MHC molecules encoded by different alleles of multiple MHC genes can bind and present different groups of structurally similar antigenic oligopeptides. Compared to highly specific T-and B-cell receptors, MHC molecules are less specific. This phenomenon allows MHC molecules to cope with the immense diversity of protein antigens based on inherited genetic variation. Different antigen presenting molecules are encoded by different allelic variants of MHC class I and class II genes. The MHC region is characterized by many loci with immune as well as with non-immune functions. Of these, multiple loci code for antigen presenting molecules that possess many alleles and have higher heterozygosity typically than to determine variation of gene encoding non-antigen presenting molecules, and to compare it with variation observed for the studied MHC class II [24] and MHC class III genes.

Analysis of Selected Non-Antigen Presenting MHC Genes
The camel DNA samples used in this study come from collections at the Research Institute of Wildlife Ecology, Vetmeduni Vienna and UPVS Brno. Bactrian camel samples were collected at three different locations in Mongolia and one sample was collected from a breeder in Austria, while dromedary samples were from Jordan, Saudi Arabia, UAE, Qatar, Sudan, Kenya, Kazakhstan and Nigeria. Numbers of analyzed camels of both species are presented in Table 1. All samples were collected commensally during veterinary procedures for previous research projects (GACR 523/09/1972; PI: P. Horin; FWF P1084-B17 and P24706-B25; PI: P. Burger). An MHC class II gene TAP1, and MHC class III TNFA, and the LY6G6 gene family were selected for this study. All these genes were already annotated. We amplified and sequenced them as described previously for MHC class I genes [25]. Briefly, individual gene sequences were extracted from sequence resources at NCBI, as seen in Tables S1-S6. Primers were designed using either the Primer3 Webtool or NCBI PrimerBLAST in Table 2. PCR was performed according to standard protocol using either KAPA 2G Robust HS or KAPA LongRange HS (KapaBiosystems, Wilmington, DE, USA). Sequencing was performed on Illumina MiSeq platform (San Diego, CA, USA) using 500 cycles PE chemistry. Data analysis was performed as previously described in Plasil et al. [25]. Table 2. List of primers used in amplification of selected major histocompatibility complex (MHC) genes in camels.

New Annotation of the MHC Region and Phylogenetic Analyses
We used the newly available resource of the C. dromedarius genome assembly CamDro3 [26], which improves upon work published by Elbers et al. [27], for refining our previous characterization of the MHC region to produce a detailed map of all three MHC class regions [24,25]. The CamDro3 assembly is a result of upgrading the CamDro2 assembly, similarly to Elbers et al. [27], where CamDro1 assembly was upgraded to CamDro2 [28]. The CamDro2 assembly was re-scaffolded using the original Dovetail Chicago and Hi-C reads with the HiRise pipeline [29] in an attempt to fix local misassemblies. We then filled in gaps using our PacBio long-reads (SRA accession: SRP050586) [27] with PBJelly v. 15.8.24 [30] twice instead of one time, which was done for CamDro2. Instead of polishing the assembly with Pilon [31], we used a standard variant calling workflow, which increased the RNA-Seq mapping rates relative to the Pilon-polished assembly. Briefly, we first mapped, trimmed and error-corrected Illumina short-insert sequences (SRA accession: SRR2002493) [28], using BBMap v. 38.12 (https://sourceforge.net/projects/bbmap/) with the vslow and usejni settings to the PBJelly assembly. We then sorted and indexed the resulting BAM file with Sambamba v. 0.6.7 [32], and called variants with CallVariants v. 38.12 (https://sourceforge.net/projects/bbmap/). We finally used BCFtools v. 1.2 (http://samtools.github.io/bcftools/) to generate a consensus sequence, for which we filled in gaps using ABYSS Sealer v. 2.1.0 [33], using default settings except for a bloom filter size of 40 GB, and multiple K values from 90 to 20 in increments of 10.
For phylogenetic analyses of the MHC genes studied here, we included annotated sequences from other mammalian species available at NCBI, as seen in Tables S1-S6. Phylogenetic trees were constructed in MEGA7 using the maximum-likelihood (ML) approach with 1000 bootstraps and the best-fitting model was tested according to the Bayesian information criterion (BIC) [34]. I.e., the best evolutionary model for TAP1, TNFA, LY6G6E and LY6G6F followed the Tamura 3-parameter with gamma distribution (5 categories with parameters 0.5098, 0.1968, 0.5569 and 0.8816, respectively) [35]. The LY6G6C and LY6G6D trees were based on the Jukes-Cantor and Kimura 2-parameter [36,37], respectively.

The MHC Class II Gene TAP 1
Compared to the available annotated coding sequences (CDSs) of Old World camels (XM_010994582.1:25-2265), one substitution was found in the CDS of the TAP1 gene. Out of the total of 19 SNPs within the CDS, 12 were non-synonymous and 7 were shared between Camelus bactrianus and Camelus dromedaries, as seen in Table S7, Figure S1. We detected a discrepancy between the here developed TAP1 CDS and the sequences available at NCBI, as seen in Table S1. While the alignment of the published dromedary and Bactrian camel sequences showed significant differences between sequence positions 520-585 of the CDS, we could not confirm them in our sequences as all analysed sequences bear close similarity to the sequence of C. dromedaries, as seen in Figure S2. A potential deletion between positions 265 and 306 bp was identified in the CDS of C. ferus obtained from NCBI, as seen in Table S1.
A phylogenetic analysis of the camel TAP1 sequences showed high level of conservation within Camelidae. Despite its lower bootstrap support, it suggested a relatively close evolutionary relationship with the domestic pig, Sus scrofa, as seen in Figure 1. Additional information regarding the sequence similarity of the analyzed TAP1 CDSs is available in the Table S1.

The TNFA Gene
The three SNPs found within the TNFA CDS in exon 2 (1 non-synonymous) and exon 3 (1 synonymous, 1 non-synonymous) were shared between C. bactrianus and C. dromedaries, as seen in Figure S3. The haplotypes were conserved across the entire CDS, i.e., the three SNPs were either in homozygous or heterozygous constitution, and the frequency of heterozygote individuals was 0.48 and 0.27 in dromedaries and Bactrian camels, respectively. A comparison with other publicly available camelid sequences did not reveal any additional potential SNPs, as seen in Figure S4. The phylogenetic tree showed an overall high level of conservation within Camelidae, and closer evolutionary relationships with sequences from cattle and goat, as seen in Figure 2. Additional information regarding sequence similarity of the analyzed TNFA CDSs is available in the Table S2.

The LY6G6 Gene Family
Four genes of this family, LY6G6C, LY6G6D, LY6G6E, and LY6G6F were analyzed and have been annotated in the three species of Old World camelids. While we discovered only one non-synonymous SNP within the LY6G6C CDS at position 218 (G/A) present in C. dromedaries, as seen in Figure S5, the sequences retrieved from NCBI showed another non-synonymous SNP at position 235 (C/A) as seen in Figure S6. However, this SNP was not confirmed by our dataset. The frequency of heterozygotes for the LY6G6C SNP 218 in C. dromedarius was 0.2. The CDS of the LY6G6D contained one synonymous SNP (G/T at position 405) observed only in C. bactrianus, as seen in Figure S7. The frequency of heterozygotes for this SNP was 0.2 in C. bactrianus. Comparison with other available sequences from NCBI did not reveal any additional SNPs in Old World camelids, as seen in Figure S8.
In the family Camelidae, the LY6G6E gene has been annotated as sperm acrosome membrane-associated protein 4-like. Figure S9 shows our analyses, as well as sequences retrieved from NCBI, that the LY6G6E CDSs were monomorphic in the subfamily Camelini. However, a comparison with the CDS of Vicugna pacos (Lamini) revealed two SNP positions (319 T/C and 358 G/A), where the latter was non-synonymous. Comparison with the CDS of Bos taurus showed that the CDSs compared were not of the same length.
One synonymous (A/G, at position 150) and one non-synonymous SNP (C/T, at position 659) were identified in the CDS of LY6G6F, both shared between C. bactrianus and C. dromedaries, as seen in Figure S10. The frequencies of heterozygotes in C. bactrianus and C. dromedarius, for both SNPs were 0.3 and 0.5, respectively. Individuals of both species were either heterozygous or homozygous in both SNPs, but never homozygous for one and heterozygous for the other. Comparison of the sequences available on NCBI confirmed both SNPs, as seen in Figure S11.
The results of the phylogenetic analyses summarized in Figures 3-6 document a high level of conservation within the LY6G6 family CDSs in the Camelidae. LY6G6C showed the closest evolutionary relationship with the sequence of Sus scrofa, while LY6G6D was closer to Bos taurus and Capra hircus. LY6G6E was close to sequences of all other included Cetartyodactyla Sus scrofa, Bos taurus and Capra hircus, and LY6G6F was also closest to Sus scrofa. In humans, LY6G6E has a status of pseudogene. Sequence similarities of CDSs used for constructing the phylogenetic trees are provided in Tables S3-S6.

New Annotation of the MHC Region in the Dromedary
A new version of the organization of the MHC region in the dromedary is shown in Figure 7. A complete annotation record will be available in Lado et al. [26]. A graphical summary of the evolutionary relationships for the genes studied is presented in Figure 8. The synteny plot for available camelid MHC regions is shown in Figure 9.

Discussion
This study brings new information on MHC class III sub-region genes in Old World camels. These data, along with new camel genomic resources, allowed us to construct an improved genomic map of the entire MHC region of Old World camels.
The MHC class III region, where no genes coding for antigen presenting molecules are located, represented a gap in our knowledge of the camel MHC. For bridging this gap in the map, we selected different types of class III genes with diverse immunological functions. Based on this, we could make a comparison of their variability with so far known genes from other MHC regions previously studied in the same groups of dromedary and Bactrian camels [24,25]. The data then primarily served for allowing us an overall characterization of the camelid MHC. As such, they do not represent a detailed analysis of the MHC class III region, which still is our pending task.
The TNFA (tumor necrosis factor alpha) is a cytokine playing a major role in the activation of both acute and chronical inflammatory processes, and contributing to the activation cascade of mother cytokines and chemokines connected with inflammation [42]. Besides this role, the TNFA also contributes to the regulation of apoptosis and carcinogenesis [43,44]. While there are no reports on the role of TNF-alpha and/or of the TNFA gene in diseases of camels, in cattle, a closely-related species, TNFA polymorphisms have been associated with multiple infectious diseases and with fertility [45][46][47][48]. The SNPs identified in this report thus represent potentially useful markers for analyzing various camel diseases. TNFA, subject to strong purifying selection [42], may be also used as a marker of a conserved part of the MHC in close physical proximity of MHC genes under positive selection pressure [48].
The LY6G6 (lymphocyte antigen 6 family, member G6) gene family is located within the MHC class III region, spanning approximately 20 kb. In humans (assembly GRCh38.p12) and mice (assembly GRCm38.p4), the family contains four individual genes, LY6G6C, LY6G6D, LY6G6E, and LY6G6F. LY6 proteins containing cysteine-rich domain are attached to the cell surface by a GPI anchor, which is involved in signal transduction. LY6 proteins might play a role in the hematopoietic differentiation [49].
This is a first report on the existence of this group of MHC class III genes in camels. Although little is currently known about their functions, they represent suitable anchor and marker loci for the MHC class III region. Sequences of all four selected genes showed high levels of conservation within Old World camels. SNPs were rather rare and were often observed in only one camel species; LY6G6E was even completely monomorphic in both species. Interestingly, while the LY6G6E sequence has the status of pseudogene in humans, we did not find any features of non-functional LY6G6E sequences in the two camel species. These findings are in agreement with those of other closely related mammals, such as Bos taurus and Sus scrofa, suggesting that this gene is probably functionally important for the Cetartiodactyla.
Having characterized selected MHC class III genes in Old World camels, we were able to build upon our previous studies of MHC class I and MHC class II regions and to update the general organization of the MHC region, describe its similarities with New World camelids, as well as with relevant mammalian species, and to point out its unique features.
The MHC class III genes studied here do not encode antigen presenting molecules and are not involved in the mechanisms of antigen presentation. For comparison, we chose TAP1, a gene not coding for antigen presenting molecules, but playing an important role in the process of antigen loading of the MHC class I molecules, located in the MHC class II region [50]. Certain TAP1 variants may have a large impact on MHC class I expression and interaction with pathogens, at least in some species [51,52]. While TNFA and the gene family LY6 showed high levels of conservation characteristic for MHC class III loci in general, the camel TAP1 was not only polymorphic, but the numbers of SNPs detected were higher than in the MHC class I and class II genes examined so far [24,25].

MHC Region Organization and Diversity in Old World Camels
The presented MHC organization is primarily based on dromedary sequences, but due to the high conservation and similarity of this region between dromedaries and Bactrian camels, we can take firm conclusions about the latter's MHC. The overall nucleotide and amino acid sequence similarities between the two species ranged across the MHC region (delimited by the genes TRIM27 and RXRB) on chromosome 20 of the new dromedary assembly CamDro3 [26], with no major differences between the three sub-regions, as seen in Figure 9. Extensive trans-species polymorphism (TSP), i.e., allele sharing, is also a typical feature of the MHC of camelids, including llamas [24,25].
The diversity of the MHC class I, class I-related and class II genes is generally lower than expected, based on a comparative analysis of multiple vertebrate species [53]. This finding is in agreement with a lower genome-wide diversity in dromedaries than in the wild and domestic Bactrian camels [28]. We were unable to make a reliable estimate of polymorphisms of class II DQB genes. Their exon 2 encoded amino acid sequences containing the antigen-binding site are by four amino-acids longer due to a 12 bp insertion in both camel species. They contain multiple SNPs, but we were unable to identify individual loci [24]. It is possible that certain specificities in DQB sequences, such as GC rich regions and/or long repeats cause technical problems as described, e.g., for the horse DQB region [54]. Interestingly, some of the genes encoding other than antigen presenting molecules and located in the MHC class I and II regions, such as MICA and TAP1, were found to have higher polymorphism than classical MHC class I and II genes. On the other hand, the class II DRA gene, monomorphic in most mammalian species, is polymorphic both in dromedaries and Bactrian camels [24]. Recently identified MHC-linked microsatellite loci also showed unusually low diversity both in dromedaries and bactrians, often with only two or three alleles [55]. Bottlenecks experienced in the evolutionary history of the three species, and also in recent times due to domestication, might be the cause of the reduced genome-wide variability in dromedaries. However, it is not clear to what extent these processes contributed to the low diversity of the immunogenome. Additionally, the reasons for higher numbers of SNPs observed in genes coding for other than antigen presenting MHC molecules remains especially puzzling.

Cross-Species Comparisons with New World Camelids and Other Mammals
A direct comparison with alpaca sequences presented here, as well as in our previous work [24,25], showed that the studied loci are generally highly conserved in camelids. Based on the data available so far, as seen in Figure 9, it seems that the organization of the MHC genomic region in Vicugna pacos is highly similar if not nearly identical to Old World camels.
When compared to other species, the physical location and the overall organization of the MHC in camels follows the common general structure of the mammalian MHC region [24]. However, phylogenetic relationships of camel MHC genes do not always follow relationships based on other (neutral) nuclear genes [24,25,38,56], suggesting that different MHC sub-regions might have followed different evolutionary pathways. In addition, so far, we have been unable to identify a DYA locus, specific for the Bovidae, the phylogenetically closest family. Although DYA-like sequences were previously annotated in camelid genomes, as seen in Figure 7, a more focused analysis showed that they are more similar to DQA rather than to DYA sequences, and that it was impossible to identify this region within the camel MHC physical map [38]. Like in camels, the porcine MHC class II region contains a non-functional fragment of DY gene (GenBank Gene ID: 100135048), although it is annotated as DYB, while the camel sequence is annotated as DYA. Moreover, in cattle, the physical position of the TAP1 gene is different from other mammalian species, including camels, as seen in Figure 8. Overall, the organization of the camel MHC class II region is more similar to Sus scrofa rather than to Bos taurus.
Nucleotide sequences of genes located in the MHC class II sub-region are generally closer to dog and human, with the exception of DQB and TAP1, which are closely related to their porcine counterparts, as seen in Figure 7 [38]. On the other hand, the MHC class III genes and the LY6G6 gene family, are more related to the corresponding porcine sequences, with the exception of LY6G6D and TNFA, which are closer to the bovine sequences than to its porcine counterpart. Despite a probably common origin and close physical proximity of the LY6G6 gene family, the LY6G6D does not behave identically in phylogenetic trees, as seen in Figures 3-6 [57]. It thus seems that different MHC sub-regions have not always the same evolutionary history, reflecting probably the fact that the selection pressures exerted on the immune system are different between camels and cattle.
The evolution of the MHC class I sub-region appears to be even more complex, as shown in our previous study [25]. The MHC class I locus B-67 is closely related to the canine locus DLA-88, the locus BL3-7 is related to the porcine SLA-11, while MICA is closest to the relevant human sequences. While the camel MIC genes follow the pattern of the human MICA/B genes, cattle have at least three functional MIC genes with unique sequences [25,58]. In contrast to these differences between camel and cattle, high similarities with the pig MHC class I genes were often observed. Especially, the MHC class I gene BL3-7, a locus of unclear status, highly similar to the annotated sequence BL3-6 in alpacas, is also closely related to the locus SLA-11 in pig, and is one MHC locus with unknown function and unusual structure [25,59]. Interestingly, we have also found close similarities between camels and pigs for two other complex immunogenomic regions, the natural killer complex (NKC) and the leukocyte receptor complex (LRC), encoding natural killer cell receptor molecules. Both regions differ from their cattle counterparts, resembling more the pig NKC and LRC, respectively [60]. However, in this context, it is important to note that the annotation of MHC class I loci from CamDro3, as presented in Figure 7, is based on calculations of their sequence similarities with other species. Taking into consideration that sequence similarities alone are often not informative for a correct assignment of their classical or non-classical status in non-model organisms, we do not have enough information for definitive conclusions about this status for class I genes in camels either.

Conclusions
This first characterization of specific MHC class III genes added to the general structure and variability of the MHC in Old World camels, allowing a synopsis of the current knowledge of the MHC region of Old World camelids. Novel phylogenetic analyses between camelids and other domestic mammals showed high conservation of this region among Old and New World camels, and a closer evolutionary relationship with pigs rather than with cattle. The knowledge of the MHCs structure, function and evolution in camels is important for understanding their immune response to diseases in their specific and extreme environments. Funding: This work was supported by the Austrian Science Foundation (FWF) project P29623-B25 and by CEITEC -Central European Institute of Technology with research infrastructure supported by the project CZ.1.05/1.1.00/02.0068 financed from European Regional Development Fund. J.P. Elbers is funded by the FWF project P29623-B25 granted to P.A. Burger.