Data characterizing the genomic structure of the T cell receptor (TRB) locus in Camelus dromedarius

These data are presented in support of the structural and evolutionary analysis reported in the published article entitled “The occurrence of three D-J-C clusters within the dromedary TRB locus highlights a shared evolution in Tylopoda, Ruminantia and Suina” (Antonacci et al., 2017) [1]. Here we describe the genomic structure and the gene content of the T cell receptor beta chain (TRB) locus in Camelus dromedarius. As in other mammalian species, the general genomic organization of the dromedary TRB locus consists of a pool of TRBV genes located upstream of tandemly arranged TRBD-J-C clusters, followed by a TRBV gene with an inverted transcriptional orientation. A peculiarity of the dromedary TRB locus structure is the presence of three TRBD-J-C clusters, a feature shared with the sheep, cattle and pig sequences.

Keywords: T cell receptor; TRB locus; Dromedary genome; Camelus dromedarius; IMGT

The characterization of the dromedary TRB locus can be used to improve the understanding of the evolution of Camelidae and to help resolve the relative placement of this species within the order Artiodactyla.
The availability of the sequence of the dromedary TRB locus allows researchers to concentrate on functional studies and makes this species a valuable model for immunological research.

Data
Data presented in the text include tables and figures giving information on the genomic structure and the gene content of the TRB locus of the dromedary, a mammalian species belonging to the genus Camelus. This information was obtained by integrating the sequence data deduced from the public genomic assembly [2] with sequences obtained by PCR experiments conducted in our laboratory. Table 1 describes the position, classification and functionality of the TRB genes retrieved from the dromedary public genome assembly. Table 2 describes the dromedary TRBV pseudogenes. Table 3 describes the position, classification and functionality of the genes unrelated to TRB that were recovered from the dromedary public genome assembly. Fig. 1 shows the deduced amino acid sequences of the dromedary TRBV genes according to IMGT unique numbering for the V-REGION [6]. Table 4 lists the genomic clones of the dromedary TRBD-J-C region with the primer pairs used and the PCR conditions. Fig. 2 shows the TRBD, TRBJ and TRBC gene sequences.
We used the draft genome sequence of the Arabian camel recently submitted to NCBI (BioProject PRJNA234474) [2] to identify the TRB locus in this species. A standard BLAST search (Basic Local Alignment Search Tool, http://blast.ncbi.nlm.nih.gov/Blast.cgi) of the dromedary genomic resource was performed using human and sheep TRB gene sequences as queries to assess the physical location of their counterparts in the dromedary genome. We directly retrieved a sequence of 457871 bp (gaps included) from the PRJNA234474_Ca_dromedarius_V1.0 assembly that corresponds to eight distinct, unplaced and non-contiguous scaffolds (Fig. 1 in [1]). The sequence comprises the MOXD2 and EPHB6 genes, which flank the 5′ and 3′ ends, respectively, of all mammalian TRB loci studied to date. All dromedary TRB genes were identified and annotated using both the human sequence and the sheep genomic D-J-C region as references [3][4][5] (Table 1).
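The localization step described above can be illustrated with a minimal Python sketch. It assumes that BLAST hits are available in the standard 12-column tabular format (as produced by the BLAST+ option `-outfmt 6`), which is an assumption for illustration since the search reported here was run through the NCBI web interface. The query names, scaffold names, coordinates and the E-value cutoff are all hypothetical.

```python
from collections import defaultdict

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_hits(lines):
    """Parse tab-separated BLAST lines into dicts with numeric coordinates."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        row["sstart"], row["send"] = int(row["sstart"]), int(row["send"])
        row["evalue"] = float(row["evalue"])
        hits.append(row)
    return hits


def locate_genes(hits, max_evalue=1e-10):
    """Group significant hits by scaffold and sort them by position."""
    by_scaffold = defaultdict(list)
    for h in hits:
        if h["evalue"] > max_evalue:  # hypothetical significance cutoff
            continue
        # In tabular output, a minus-strand hit has sstart > send.
        strand = "+" if h["sstart"] <= h["send"] else "-"
        start, end = sorted((h["sstart"], h["send"]))
        by_scaffold[h["sseqid"]].append((start, end, strand, h["qseqid"]))
    return {scaffold: sorted(genes) for scaffold, genes in by_scaffold.items()}


# Hypothetical hits, invented for illustration only (not data from this study).
example = [
    "humanTRBV_a\tscaffold_X\t85.0\t280\t42\t0\t1\t280\t5000\t5279\t1e-60\t250",
    "sheepTRBC_b\tscaffold_X\t90.0\t300\t30\t0\t1\t300\t90000\t90299\t1e-80\t300",
    "humanTRBV_c\tscaffold_X\t82.0\t270\t48\t0\t1\t270\t99270\t99001\t1e-50\t200",
    "humanTRBV_d\tscaffold_Y\t70.0\t50\t15\t0\t1\t50\t10\t59\t0.5\t30",
]
located = locate_genes(parse_hits(example))
for scaffold, genes in located.items():
    for start, end, strand, query in genes:
        print(scaffold, start, end, strand, query)
```

In this toy input, the last hit on scaffold_X is reported on the minus strand, which is how a gene with an inverted transcriptional orientation would appear in such a listing. Gene boundaries and functionality still require the manual annotation described in the text.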
The functionality of the V, J and C genes was predicted through manual alignment of the sequences, adopting the following criteria: (a) identification of the leader sequence at the 5′ end of the TRBV genes; (b) determination of proper recombination signal (RS) sequences located at the 3′ end of the TRBV genes, at the 5′ end of the TRBJ genes, and at both the 3′ and 5′ ends of the TRBD genes; (c) determination of correct acceptor and donor splicing sites; (d) estimation of the expected length of the coding regions; (e) absence of frameshifts and stop signals in the coding regions of the genes. We annotated 33 TRBV germline genes (25 functional genes and 8 pseudogenes) (Table 2), one TRBD gene, 13 TRBJ genes, and two complete and one incomplete TRBC genes. The analysis of the 3′ part of the locus revealed the potential presence of three D-J-C clusters similar to the clusters found in sheep [4,5].
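As an illustration of criteria (c) and (e) only, the following Python sketch checks a two-exon gene for canonical GT/AG intron boundaries and for in-frame stop codons in the spliced coding sequence. It is a simplified stand-in for the manual alignment used here, not the authors' procedure: the function names and the toy sequences are hypothetical, and the leader, RS and frameshift criteria are not covered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cds):
    """Return 1-based codon positions of stop codons read in frame 1."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [i // 3 + 1 for i in range(0, usable, 3)
            if cds[i:i + 3] in STOP_CODONS]


def check_two_exon_gene(exon1, intron, exon2):
    """Apply simplified versions of criteria (c) and (e) to one gene."""
    intron = intron.upper()
    problems = []
    if not intron.startswith("GT"):
        problems.append("non-canonical donor splice site")
    if not intron.endswith("AG"):
        problems.append("non-canonical acceptor splice site")
    stops = in_frame_stops(exon1 + exon2)  # spliced coding sequence
    if stops:
        problems.append(f"in-frame stop codon(s) at codon(s) {stops}")
    if not problems:
        return "putatively functional"
    return "pseudogene candidate: " + "; ".join(problems)


# Toy sequences, invented for illustration only.
print(check_two_exon_gene("ATGGGCACC", "GTAAGTCCCTTTCAG", "AGGCTGCTC"))
print(check_two_exon_gene("ATGGGCACC", "GTAAGTCCCTTTCAG", "AGGTGACTC"))
print(check_two_exon_gene("ATGGGCACC", "CTAAGTCCCTTTCAG", "AGGCTGCTC"))
```

A sequence flagged by such a check would still need inspection by eye, since the final classification in Tables 1 and 2 rests on all five criteria taken together.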
We also identified and annotated four trypsin-like serine protease (TRY) genes (Table 3). Downstream of the TRBV1 gene, proceeding from 5′ to 3′, we found two protease genes, as in humans, which we tentatively designated, according to their genomic position, as TRY1 (alias PRSS58 or TRYX3) and TRY2 (alias TRY2P), respectively. A third TRY gene, named TRY3, was found within the unplaced scaffold NW_011623391 and is homologous to the gene located after the TRY2P gene in humans. Extrapolation of the synteny with the human sequence predicts that the NW_011623391 scaffold should be placed within the dromedary TRB locus, upstream of the TRBV3 gene (Fig. 1 in [1]). An additional TRY gene, classified as TRY4, was found before the D-J-C region. Thus, unlike humans, only one TRY gene encompasses the array of the TRBV genes. All dromedary TRY genes appear putatively functional, with correct acceptor and donor splicing sites and no frameshifts or stop codons in their coding regions. The genomic structure of the MOXD2 and EPHB6 genes, which delimit the TRB locus, was also defined (Table 3).

Protein display of the dromedary TRBV genes
The deduced amino acid sequences of the germline TRBV genes were manually aligned according to IMGT unique numbering for the V-REGION [6] to maximize the percentage of identity (Fig. 1). Only    . Conversely, the CDR-IMGT vary in amino acid composition and length. It should be noted that the TRBV21 genes differ in length by one amino acid in the FR3, corresponding to a C′′ strand that is shorter and has a different amino acid sequence in TRBV21S2 compared with TRBV21S1.

Isolation of the dromedary TRBD-J-C region and analysis of the gene content
To isolate the entire TRBD-J-C region, we set up six different PCRs to produce six consecutive amplicons covering the region between the first TRBJ and the last TRBC gene. For most amplifications, the primer pair consisted of a gene-specific primer designed on the sequence of the TRBJ genes identified within the cDNA clones (see [1]) and a conserved primer designed on the first exon of the TRBC genes. For the isolation of the TRBC2 gene, a 3′UTR lower primer derived from the sequence of the genomic assembly was used. Amplification consisted of an initial denaturation step at 93°C for 2 min, followed by 10 amplification cycles that each comprised a denaturation step at 93°C for 10 s, an annealing step at a low temperature (53-56°C, according to the melting temperature of the primers) for 30 s and an extension step at 68°C for 7 min. These were followed by 25 cycles with a higher annealing temperature (55-58°C, according to the melting temperature of the primers) and an extension time that was gradually increased by 20 s, and by a final incubation at 68°C for 7 min. A 3′-deoxyadenosine overhang was added to the blunt-ended amplicons by incubation with 1.0 unit of Platinum Taq DNA Polymerase (Invitrogen) at 72°C for 10 min. These products were purified and cloned into the StrataClone TA-vector according to the manufacturer's instructions. For each sample, 6 to 10 colonies were propagated and bi-directionally sequenced using M13 and T7 vector-specific primers. All plasmid sequence data were manually analysed. The clones, the primer pairs used and the PCR conditions are listed in Table 4. All the obtained amplicons were sequenced (Acc. no. LT837971). The sequenced region is schematically illustrated in Fig. 3 in [1].
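The cycling profile described above can be written out as a short Python sketch, which makes the two-phase structure explicit and gives an estimate of the programmed hold time. Two points are assumptions rather than statements from the text: the second phase is taken to keep the same denaturation (10 s) and annealing (30 s) times as the first, and the extension time is taken to grow by 20 s per cycle from the 7 min baseline. The names `Step` and `build_program` and the chosen annealing temperatures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    label: str
    temp_c: float
    seconds: int


def build_program(anneal_low_c, anneal_high_c, extension_increment_s=20):
    """Expand the two-phase cycling profile into a flat list of steps."""
    steps = [Step("initial denaturation", 93, 120)]
    # Phase 1: 10 cycles with the lower annealing temperature.
    for _ in range(10):
        steps += [
            Step("denaturation", 93, 10),
            Step("annealing", anneal_low_c, 30),
            Step("extension", 68, 420),
        ]
    # Phase 2: 25 cycles with the higher annealing temperature.
    # ASSUMPTION: extension grows by a fixed increment in every cycle.
    for cycle in range(1, 26):
        steps += [
            Step("denaturation", 93, 10),
            Step("annealing", anneal_high_c, 30),
            Step("extension", 68, 420 + extension_increment_s * cycle),
        ]
    steps.append(Step("final extension", 68, 420))
    return steps


# Example with the lower bound of each annealing range given in the text.
program = build_program(anneal_low_c=53, anneal_high_c=55)
total_s = sum(step.seconds for step in program)
print(len(program), "steps,", total_s, "s of programmed hold time")
```

Under these assumptions the holds add up to 23140 s, roughly 6.4 h, not counting ramp times between temperatures. The annealing temperatures actually used for each amplicon are those given in Table 4.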
The nucleotide and deduced amino acid sequences of the TRBD, TRBJ and TRBC genes classified according to the similarity to the sheep sequence are shown in Fig. 2.

Transparency document. Supplementary material
Transparency document associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2017.08.002.