HLA Polymorphisms and Haplotype Diversity in Transylvania, Romania

Transylvania is a historical region in the northwestern part of Romania with a rather heterogeneous population. Our study is the first to determine human leukocyte antigen (HLA) profiles in a large population sample from this region and to compare them with other European population groups. HLA genes were examined in 2,794 individuals using the Sequence-Specific Primer-Polymerase Chain Reaction (SSP-PCR) and Polymerase Chain Reaction Sequence-Specific Oligonucleotide (PCR-SSO) methods. All samples were tested for the HLA-A locus, 2,773 for HLA-B, 1,847 for HLA-C, and 2,719 for HLA-DRB1. HLA gene frequency data from several European population groups (as presented in studies involving more than 1,000 individuals) served as reference for comparison with the local sample. The distribution of HLA genes in the studied population group was heterogeneous, as the departure from Hardy-Weinberg equilibrium was statistically significant (P value < 0.01). The most common genes found in our sample group were A*02 (0.27%), B*35 (0.14%), C*07 (0.25%), and DRB1*11 (0.19%). The most common haplotype was A*01~B*08~C*07~DRB1*03 (1.26% in 1,770 individuals with complete data). This analysis confirmed the known heterogeneity of the Transylvanian population. The study indicates that the European population groups located in close vicinity (those from Serbia, Hungary, Wallachia, and Croatia) are genetically closest to the Transylvanian population.


Introduction
The major histocompatibility complex (MHC) is a large gene complex (approximately 3.5 million base pairs) with an integral role in the immune system. Known as the human leukocyte antigen (HLA), the human MHC-encoded glycoproteins are vital in the body's immune defence, being specialized in the presentation of short peptides to T cells [1].
The 4 Mb region of the human chromosome 6p21 designated as the MHC is the densest and most polymorphic region of the human genome [2]. The extensive polymorphism at the HLA loci reflects the importance of the encoded molecules in human transplantation and autoimmunity, as well as in drug response and susceptibility to infection [3]. Compatibility in the HLA system is vital in hematopoietic stem cell transplantation [4].
The distribution of the HLA genes is a particularity of any given ethnic group. Functional differences at the HLA loci observed between related populations, better explained by changes in the frequency of an existing allele than by migration history, contribute to a divergent immune response. Genetic drifts are generally eliminated through selection processes [5].
Analysis of the HLA profiles in various ethnic groups is important since a significant number of human diseases are more common among individuals carrying certain HLA genes. HLA disparity between mating partners provides a more favorable genetic endowment for their offspring. The HLA genes may influence the human lifespan, but such an association depends on both the genetic background and environmental influences [6].
Transylvania ("terra ultra silvam" in Latin, i.e., the land beyond the forest) is a historical region of 100,293 km² (42% of Romania's surface area), geographically located within the Carpathian arch. It neighbours Ukraine in the north, Hungary in the west, and Serbia in the southwest. Across the Carpathian Mountains lie Wallachia (south) and Moldavia (east), the other two major Romanian provinces [7-9].
The territory of Transylvania has been inhabited since pre-Christian times by the Dacians, part of the Thracian population formed around 2000 BC from the blending of the Indo-European migratory populations with the native Neolithic population [7,8,10]. The Romans conquered Dacia in 106 AD and administered it until 271 AD, an intense process of Roman colonization taking place during that period [7,8,10]. In the seven centuries following the withdrawal of the Roman administration, the intracarpathic territory was successively invaded by Visigoths, Huns, Gepids, Avars, Slavs, Bulgars, Magyars, and Pechenegs [7,8,10]. As a result, some local population groups found shelter in the mountainous areas, where they survived through the centuries, somewhat isolated from the rest of the population [7].
In contrast to the neighbouring territories, the massive early Slav populations migrating in the 6th and 7th centuries were virtually assimilated here by the Dacian-Roman natives [8]. However, the historical evolution of the province was significantly marked by the establishment of the Magyars and the Szeklers in Pannonia in the 9th century, their gradual advance eastwards into the territory of Transylvania paralleling an extensive process of colonization with Saxons in the 11th-13th centuries. Several successive waves of Roma populations also settled here following the Mongol invasion in 1241 [7,8,10].
Transylvania became an independent principality under the sovereignty of the Ottoman Empire following the occupation of the Hungarian Kingdom by the Turks (1526) and, later on, an autonomous principality part of the Habsburg Empire (1699-1867, during which time the colonization of the southwestern Transylvania region with Swabians of German origin was observed). From 1867 to 1918, it was incorporated into the Austro-Hungarian Empire under the
Table 3: The most frequent 50 HLA-A~B~DRB1 haplotypes from the 2,708 analyzed individuals presented in descending order of their frequency.
Our study is aimed at determining the HLA profiles (HLA-A, HLA-B, HLA-C, and HLA-DRB1) in a Transylvanian population group. This is the first attempt to analyze the frequencies of these HLA genes in a large population sample from this region.

Subjects.
A total of 2,794 individuals of Transylvanian origin were enrolled in the study between January 2010 and December 2017. Of these, 2,262 were recruited from the participants in the Romanian Bone Marrow Volunteer Donors Program registered in the Romanian Bone Marrow Donor Registry by the Institute of Urology and Renal Transplant in Cluj-Napoca. Related persons were excluded based on Registry evidence. A further 532 unrelated individuals (mothers and presumptive fathers) subjected to paternity testing at the Institute of Forensic Medicine in Cluj-Napoca were included in the study to improve the chances of detecting low-frequency genes. Figure 1 highlights the counties of origin for the Transylvanian population sample. Informed consent was obtained from all individual subjects included in the study. Data regarding HLA gene frequencies in several European countries neighboring Transylvania or in various historically linked regions served as reference in comparisons with the local population group, provided that they were the result of population studies involving more than 1,000 individuals (Table 1). Data were extracted from the Allele Frequency Net Database (http://www.allelefrequencies.net) [11], as well as from studies on the population groups of Wallachia [12], Hungary [13], Serbia [14], Croatia [15], and Italy [16]. As only a few of these studies concerned haplotype analysis, we took into consideration additional data from other studies regarding population groups in Germany [17], Bulgaria [18,19], Macedonia [20], and Greece [21].
2.2. DNA Extraction. Two ml of peripheral venous blood was collected from each person subjected to paternity testing, and DNA was extracted using a Ready DNA Spin Kit (Inno-Train Diagnostik GmbH, Kronberg, Germany) according to the manufacturer's instructions. DNA concentration and purity were quantified by nanophotometric readings against a reference Tris buffer. When the value of the A260/280 absorbance ratio was outside the 1.6-2.0 range, the DNA was purified using an Epicentre MasterPure™ Complete DNA and RNA Purification Kit (Illumina Company, Madison, WI, USA) according to the manufacturer's instructions.
The same amount of blood was collected from the volunteers included in the Bone Marrow Donors National
n: number of analyzed subjects; X: not statistically significant P value. Resources presented in Table 1 provided the gene frequency references.

Statistical Analysis.
The relative frequencies of the HLA-A, HLA-B, HLA-C, and HLA-DRB1 genes were expressed as ratios of their absolute frequencies (direct counting) to the total number of genes. Since the gametic phase was unknown, deviation from the Hardy-Weinberg equilibrium was assessed with a test analogous to Fisher's exact test, performed locus by locus on a contingency table extended to arbitrary size [22,23]. The average distance between two populations was computed as the distance between each pair of gene frequencies using the fixation index (F_ST) formula provided by Rosenberg et al. [24]. We hypothesized that the F_ST distance is inversely related to the genetic relatedness of two populations. For each gene, we produced a relatedness hierarchy based on F_ST distance, with the top-ranked countries exhibiting the smallest average F_ST distance to the Transylvanian sample. A multiple correspondence analysis of the F_ST distances for the HLA-A, HLA-B, HLA-C, and HLA-DRB1 loci generated the overall relationships between the populations analyzed in this study. We determined the 4-locus and 3-locus haplotypes present in our sample. However, only the 50 most frequent 3-locus haplotypes (HLA-A, HLA-B, and HLA-DRB1) were used in the comparative analysis since they are more frequently reported in the literature.
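As an illustration of the direct-counting and distance steps described above, the following is a minimal sketch rather than the procedure actually used in the study. The formula of Rosenberg et al. [24] is not reproduced in the text, so a generic Wright-style F_ST for one locus and two equally weighted populations is swapped in as an assumption; the function names and the toy genotypes are hypothetical.

```python
from collections import Counter


def gene_frequencies(genotypes):
    """Relative gene frequencies by direct counting.

    genotypes: list of (gene1, gene2) tuples, one tuple per individual,
    so every individual contributes two gene copies to the total.
    """
    counts = Counter(gene for pair in genotypes for gene in pair)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def fst_two_populations(freq_a, freq_b):
    """Generic F_ST = (H_T - H_S) / H_T for one locus (assumed formula).

    H_S is the mean expected heterozygosity within the two populations,
    H_T the expected heterozygosity of their equally weighted pool.
    """
    genes = set(freq_a) | set(freq_b)
    h_a = 1.0 - sum(freq_a.get(g, 0.0) ** 2 for g in genes)
    h_b = 1.0 - sum(freq_b.get(g, 0.0) ** 2 for g in genes)
    h_s = (h_a + h_b) / 2.0
    pooled = {g: (freq_a.get(g, 0.0) + freq_b.get(g, 0.0)) / 2.0 for g in genes}
    h_t = 1.0 - sum(p ** 2 for p in pooled.values())
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# Hypothetical toy genotypes, not data from the study.
toy_sample = [("A*02", "A*01"), ("A*02", "A*02"), ("A*03", "A*01")]
toy_freqs = gene_frequencies(toy_sample)
print(toy_freqs)
print(fst_two_populations(toy_freqs, {"A*02": 0.5, "A*01": 0.5}))
```

Under this sketch, identical frequency profiles give a distance of 0 and populations fixed for different genes give 1, which matches the reading that a smaller F_ST indicates closer relatedness.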
The frequencies published by Constantinescu et al. [12], Lebedeva [11], and Rendine et al. [16] were taken into consideration when calculating the sample size. In order to be able to detect low gene frequencies (0.01%), the sample size was increased by adding 532 subjects to the initial 2,262.
The chi-square test was used to compare the allele frequencies between different samples; Fisher's exact test was used instead when any of the expected values in the 2 × 2 tables was <5. P values were adjusted using the Bonferroni correction, considering the number of comparisons performed.
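The comparison rule above can be sketched as follows. This is a hedged, standard-library-only illustration, not the software used in the study: the function names are hypothetical, the chi-square test is computed without continuity correction (an assumption), and the two-sided Fisher p-value sums all tables no more probable than the observed one.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square p-value for a 2 x 2 table (1 df, no correction)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value from the hypergeometric distribution."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_gene(count_a, total_a, count_b, total_b):
    """Pick the test by the smallest expected cell, as described in the text."""
    a, b = count_a, total_a - count_a
    c, d = count_b, total_b - count_b
    n = a + b + c + d
    min_expected = min(r * k / n for r in (a + b, c + d) for k in (a + c, b + d))
    if min_expected < 5:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)


def bonferroni(p_values, n_comparisons):
    """Bonferroni adjustment: multiply by the number of comparisons, cap at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical counts of gene copies, not data from the study.
print(compare_gene(20, 50, 30, 50))
print(compare_gene(2, 100, 3, 100))
```

Where SciPy is available, `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` provide equivalent tests; the hand-written versions are used here only to keep the sketch self-contained.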

Results
We identified 18 different HLA-A genes in 2,794 subjects, 30 HLA-B genes in 2,773 individuals, 13 HLA-C genes in 1,847 individuals, and 13 HLA-DRB1 genes in 2,719 individuals. Their frequencies are presented in Table 2. For all the considered HLA genes of the Transylvanian subjects, the exact test showed a statistically significant departure from Hardy-Weinberg equilibrium (Table 2).
2,832 different haplotypes were identified in the 2,708 individuals for whom we had obtained complete data for 3 loci only (HLA-A, HLA-B, and HLA-DRB1), 221 of which

Discussion
The purpose of this study, to analyze HLA frequencies in the Transylvanian population and to compare them with European population groups of over 1,000 individuals, was achieved. In our pursuit, we considered it of interest to compare Transylvania with its neighboring countries and regions: Ukraine (north), Moldavia and the Republic of Moldova (east), Wallachia and Bulgaria (south), and Serbia and Hungary (west). Unfortunately, we did not find representative studies (n > 1,000) concerning the populations of the Moldavia province, the Republic of Moldova, Bulgaria, or Ukraine.
In regard to the most frequent genes, our results are similar to those reported in several other European population studies, in particular to those from neighbouring Serbia [14], Hungary [13], Wallachia [12], or Croatia [15].
HLA-A*02, the most frequent HLA-A gene in the Transylvanian sample, was also the most frequent one in the Czech, Polish, Portuguese, Russian, and Italian populations, with no statistically significant differences being observed when their frequencies were compared (Table 4). However, the observed frequency of this gene was below the frequencies reported for Croatia, Serbia, Germany, or Hungary. HLA-A*01 was the second most frequent gene in the analyzed sample, as well as in the population groups from Serbia, Czech Republic, Slovakia, Poland, Italy, Hungary, and Portugal. Its frequency was significantly different from those observed in Southern European populations such as the Greeks, Italians, and Portuguese, as well as in the Croatian and Wallachian ones. Although the HLA-B genes analyzed in our study exhibited the highest degree of polymorphism, we managed to find several similarities with the results reported in other population studies in Europe (Table 5). HLA-B*35, the most frequent HLA-B gene in the Transylvanian sample, was also found as the most frequent gene in the Wallachian, Greek, Serbian, Croatian, Slovakian, Polish, and Italian populations, its frequency not differing significantly from those observed in the Serbian, Croatian, Wallachian, and Portuguese ones (Table 5). Lower frequencies of the HLA-B*35 gene were observed in Northern Europe.
Table 6: Comparison of HLA-C gene frequencies in the Transylvanian (n = 1,847) vs. other European population groups. Columns: Gene; Czech Republic (n = 4,669); Poland (n = 2,907); Russia (n = 2,650); France (n = 6,094).
The HLA-B*18 gene was the second most frequent in our sample, as well as in the Wallachian group, its frequency not differing significantly from those observed in the Wallachian, Greek, Serbian, Italian, Hungarian, or Slovakian population groups. Concerning B*07, B*08, or B*44, the Transylvanian group exhibited lower frequencies than those reported for Northern and Central Europe. However, they were higher than those observed in Italy or Greece. Although no statistically significant differences in terms of frequencies were observed in either case, these data suggest an increasing trend from Southeastern towards Northwestern Europe. A reverse pattern (in accordance with our findings) is observed for the HLA-B*51 gene, more frequent in Southeastern than in Northwestern Europe. Of note, the frequencies we found for HLA-B*50 and B*53 displayed the lowest values when compared to all studied groups.
Regarding the HLA-C locus, the scarceness of relevant data (studies with n > 1,000) allowed statistical analyses with only four other European population groups (Table 6). Since HLA-C*07 was the most frequent HLA-C gene in the Transylvanian sample, as in all four considered groups, one might conclude it is the most frequent HLA-C gene in the European populations (Table 6). The second most frequent gene was C*04, as in the Czech, Polish, and French populations.
The most frequent HLA-DRB1 gene in the Transylvanian sample, DRB1*11, was also the most frequent one in the Hungarian, Greek, Serbian, Croatian, Slovakian, and Italian populations, its frequency not differing significantly from the ones in the Serbian and Wallachian populations; while smaller than those observed in Italy and Greece, the DRB1*11 frequency in our sample was higher than that in all other groups (Table 7).
As a conclusion of this univariate analysis, no statistically significant differences were found between the Transylvanian population and the Serbian one when considering 51 out of the 61 genes analyzed (a concordance of 83.6%). High concordance was also noted in comparisons with the Hungarian (50 genes, 82.0%), Wallachian (48 genes, 78.7%), and Croatian (47 genes, 77.0%) populations (Tables 4-7).
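The concordance figures quoted above follow from simple arithmetic: the number of genes without a statistically significant difference divided by the 61 genes analyzed. A short check, with variable names chosen here purely for illustration:

```python
TOTAL_GENES = 61  # genes analyzed across the four loci, as stated in the text

# Genes with no statistically significant difference vs. the Transylvanian sample.
non_significant = {"Serbia": 51, "Hungary": 50, "Wallachia": 48, "Croatia": 47}

# Concordance as a percentage, rounded to one decimal place.
concordance = {
    population: round(100 * count / TOTAL_GENES, 1)
    for population, count in non_significant.items()
}

for population, percent in concordance.items():
    print(f"{population}: {percent}%")
```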
The multivariate analysis confirmed the results of the univariate analysis: Serbians, Hungarians, Wallachians, and Croatians were shown to be genetically closest to the Transylvanian population. The genetic F_ST distances between the gene frequencies of the Transylvanian sample and those of all other population groups considered in this study, obtained by multiple correspondence analysis for the HLA-A, HLA-B, HLA-C, and HLA-DRB1 loci, are shown in Figure 2. Our data are consistent with two studies in Hungary [13] and Serbia [14], both indicating that the Serbs were genetically the most related to the Romanians. Our study was the first to determine the haplotypes in this population group, allowing possible comparisons with ancient genomes. Since most literature data refer to three-locus haplotypes (HLA-A~B~DRB1), only the 50 most frequent three-locus haplotypes are presented here (see Table 3).
Regarding the most common haplotypes, our results are similar to those reported for the neighboring populations. The most frequently observed haplotype, A*01~B*08~DRB1*03, was in a similar position in several other European populations: Serbian [14], German [17], or Croatian [15]. The second most frequent in our sample, as well as in the Croatian [15] and Serbian [14] populations, the A*02~B*18~DRB1*11 haplotype was reported to be the most common in two Bulgarian studies [18,19] and fairly common in both Greek and Macedonian populations [20,21]. In contrast, it was only ranked 23rd in a German population group [17]. Our third most frequent haplotype, A*02~B*35~DRB1*11, was reported only in 50th place in the German population (0.3%) [17] and was not listed in the Bulgarian top 12 [19], in the Bulgarian top 16 [18], in the Serbian top 10 [14], or in the Croatian top 50 most frequent haplotypes [15]. However, this haplotype was reported as common in certain Southern European populations such as the Greeks and Lombardy Italians (1.4%) [11].
For an equiprobable population, the probability of an individual being homozygous in a sample of 1,770 individuals is 4.8% for HLA-A, 2.8% for HLA-B, 7.14% for HLA-C, and 7.7% for HLA-DRB1. In our sample, we found 3 times more individuals homozygous for the HLA-A locus than expected, 2.8 times more for the HLA-B locus, 2.2 times more for the HLA-C locus, and 1.6 times more for the HLA-DRB1 locus than in the equiprobable population sample. A consanguineous environment in isolated mountain communities might account for this [26,27], a hypothesis supported by the statistically significant departure from Hardy-Weinberg equilibrium found in our sample. Literature data revealed similarities for the HLA-A and HLA-B loci in a German population [28].
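The equiprobable expectations quoted above are consistent with a simple rule: if a locus has k equally frequent genes, the probability that an individual carries two copies of the same gene is the sum of the squared frequencies, k × (1/k)² = 1/k. The sketch below assumes k = 21, 36, 14, and 13 for HLA-A, HLA-B, HLA-C, and HLA-DRB1 (gene counts that appear elsewhere in the text; using them here is an inference, not something the paper states). It also reads "n times more" as a plain multiple of the expected value, which is a further assumption, so the implied observed rates are illustrative only.

```python
# Assumed numbers of distinct genes per locus (inferred from the text).
GENES_PER_LOCUS = {"HLA-A": 21, "HLA-B": 36, "HLA-C": 14, "HLA-DRB1": 13}

# Fold excess of homozygous individuals reported in the text.
REPORTED_FOLD_EXCESS = {"HLA-A": 3.0, "HLA-B": 2.8, "HLA-C": 2.2, "HLA-DRB1": 1.6}


def equiprobable_homozygosity(k):
    """Sum of p_i^2 for k equally frequent genes: k * (1/k)^2 = 1/k."""
    return 1.0 / k


expected = {
    locus: equiprobable_homozygosity(k) for locus, k in GENES_PER_LOCUS.items()
}

# Homozygosity implied by the reported fold excess (not a reported value).
implied_observed = {
    locus: expected[locus] * REPORTED_FOLD_EXCESS[locus] for locus in expected
}

for locus in GENES_PER_LOCUS:
    print(
        f"{locus}: expected {100 * expected[locus]:.2f}%, "
        f"implied observed {100 * implied_observed[locus]:.1f}%"
    )
```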
Another explanation for the statistically significant departure from the Hardy-Weinberg exact equilibrium could be the existence of two major ethnically distinct groups within the population: the Romanians (70.62%) and the Hungarians (17.92%). To confirm such an assumption, the two groups should be analyzed separately in future studies.
Although a sample size calculation was performed and more than double the required number of individuals were enrolled in the study, the sample size was found to be too small for some rare genes (1 out of the 21 HLA-A genes, 5 of the 36 HLA-B genes, and 1 out of the 14 HLA-C genes). Some genes might not be present in the Transylvanian population at all (e.g., HLA-A*80, which was not found in a considerably larger study on 159,311 Italians [16]). We recommend that further studies consider larger sample sizes.
Another limitation of our study is that the population sample was not randomly selected from the general population, the selection process including only volunteer donors and paternity subjects consenting to participate in this study. However, taking into account the highly diverse origin of the Transylvanian sample, we consider that this aspect did not interfere significantly with our results.

Conclusions
This study provides information on a genetically imbalanced population subjected to intense migration in a continuously inhabited territory relatively isolated by a mountain chain.
Our findings indicate that the populations genetically closest to the Transylvanian sample are the neighbouring ones from Serbia, Hungary, and Wallachia.
The data derived from this study can be considered an incipient database helpful for subsequent population and disease association studies or for donor recruitment planning at the regional level.

Data Availability
All relevant data are within the paper. All raw data remain in the possession of the authors of the article.