Mapping the Human Leukocyte Antigen Diversity among Croatian Regions: Implication in Transplantation

In the present study, HLA allele and haplotype frequencies were studied using the HLA data of 9277 Croatian unrelated individuals, typed using high-resolution methods for the HLA-A, -B, -C, and -DRB1 loci. The total numbers of observed alleles were 47 for HLA-A, 88 for HLA-B, 34 for HLA-C, and 53 for HLA-DRB1. HLA-A∗02:01 (29.5%), B∗51:01 (10.5%), C∗04:01 (15.8%), and DRB1∗16:01 (10.4%) were the most frequent alleles in the Croatian general population. The three most frequent haplotypes were HLA-A∗01:01~C∗07:01~B∗08:01~DRB1∗03:01 (4.7%), HLA-A∗03:01~C∗07:02~B∗07:02~DRB1∗15:01 (1.7%), and HLA-A∗02:01~C∗07:01~B∗18:01~DRB1∗11:04 (1.5%). Allele and haplotype frequencies were compared between national and regional data, and differences were observed, particularly in the North Croatia region. The data has potential use in refining donor recruitment strategies for national registries of volunteer hematopoietic stem cell donors, solid organ allocation schemes, and the design of future disease and anthropological studies.


Introduction
The Human Leukocyte Antigen (HLA) genes have been the focus of numerous studies in the past decades owing to their key role in processes of immune recognition, on the one hand, and their extensive polymorphism, reflected in both the large number of genes and their immense allelic variety, on the other. A very large proportion of these studies have been population studies, since knowledge about the HLA polymorphism in a given population has extensive applications, of which solid organ transplantation and allogeneic hematopoietic stem cell transplantation (allo-HSCT) from an unrelated donor are among the most important.
The importance of HLA polymorphism in solid organ transplantation arises from the direct correlation between the transplantation outcome and HLA matching of the recipient and the donor. Since the probability of a matched donor grows with the higher level of the HLA diversity among the deceased donor pool, the need for establishing international organisations for allocation and cross-border exchange of deceased donor organs, such as Eurotransplant, became evident early on.
The development of the allo-HSCT program is highly dependent on the existence of volunteer HSC donor registries, since these registries provide an HLA-matched unrelated donor (MUD) for those patients who do not have an HLA-identical sibling. Moreover, the number of HSCTs performed from an unrelated donor is constantly increasing worldwide, as well as in Croatia, and it is closely followed by the parallel improvement and expansion of volunteer HSC donor registries. Currently, there are more than 37 million volunteer HSC donors and cord blood units (CBUs) around the world, recruited in almost 80 national registries and more than 50 cord blood banks which list their donors/CBUs in the World Marrow Donor Association (WMDA) database [1]. The Croatian Bone Marrow Donors Registry (CBMDR) was founded in 1993 and joined the Bone Marrow Donors Worldwide (BMDW) organisation in the same year. As of August 2020, CBMDR lists almost 60000 unrelated HSC donors.
The national HSC donor registry databases with HLA profiles of enrolled donors are a valuable source of information regarding the HLA polymorphism of a population to which these donors belong and can be used to enhance the strategies for registry improvement and development. Namely, one of the critical questions in the policy of each registry is that of size or more precisely of the sufficient/optimal number of donors needed for an efficient national registry. The second question is how to obtain this number, which leads to the necessity of developing optimised recruitment plans for increasing the number of donors [2][3][4][5]. Aside from increasing the number of donors in general, numerous strategies can be employed depending on the end-goal. Enrolment of younger male individuals is an example of such a strategy. This policy has been adopted by various registries, prompted by studies which have shown a correlation between donor age and survival after HSCT, e.g., patients who received HSC from younger donors had a better survival rate [6]. The advantages of male versus female donors have not been unequivocally proven by similar studies thus far, although in theory, female donors are less preferable due to possible HLA sensitisation as a result of pregnancy.
On the other hand, an effective recruitment and selection strategy based on HLA allele and haplotype frequencies can be established. Moreover, differences in HLA profiles of donors recruited in different donor recruitment centres may be representative of regional diversity [2,3]. Population studies as well as registry data demonstrated that a detailed characterization of the HLA polymorphisms in different populations worldwide is important and required in the field of allo-HSCT [7][8][9]. Several previous studies focused on HLA diversity at the regional level in national registries demonstrated that HLA frequencies vary across different geographic regions and are correlated with geography [9,10]. Conversely, other similar studies have been published with reports of no such regional HLA differences observed [11]. These results suggest that recruitment strategies may differ from one country to another. Motivated by these opposing data, the question was raised of whether regional differences in HLA polymorphism exist in Croatia. Croatia's placement on the Balkan Peninsula as well as the observed influence of various other populations on the Croatian population in certain parts of the country would suggest that such differences could be expected. Croatians migrated from the Baltic to South East Europe in the 7th century. In that early period, a part of the Croatian population settled on the Adriatic coast which was followed by mixing with the South East Europe's autochthons (Illyrians, Thracians). Afterward, in the last few centuries, northern and central parts of Croatia were influenced by Austrians, Hungarians, and Germans while the southern part was influenced by Italians, as well as by Turks [12].
Regarding Croatia's regions, the contemporary regional division of the country into the northern and southern parts is essentially dictated by the country's geographical features. The northern, predominantly lowland part of the country is divided into Central Croatia, East Croatia, and North Croatia, while the coastal, southern part of Croatia is usually subdivided into two regions: Istria & Primorje and Dalmatia (Figure 1). Along with the geographic features, this division also takes into account the influence of different populations on these regions throughout history [12].
In order to expand the current knowledge about the HLA polymorphism in the Croatian population and explore the possibility of regional differences in HLA allele and haplotype distribution, data from CBMDR were used in the present study. Although information about HLA allele and haplotype frequencies among Croatians has already been reported in a few publications, those analyses focused only on the Croatian population as a whole, without taking into account the different regions of Croatia [13,14].

Subjects.
A total of 9277 volunteer unrelated donors from the CBMDR were included in the dataset (data extracted on 01-12-2018). The donors were recruited and originate from 5 different regions of Croatia (1532 from Dalmatia, 865 from Istria & Primorje, 1021 from Central Croatia, 1877 from East Croatia, and 1049 from North Croatia) as well as from Zagreb (N = 2933), as illustrated by Figure 1. The number of samples chosen for each region correlates with its population size and represents 0.2% of the number of inhabitants for a specific region. The number of towns included for each region is as follows: 11 from Dalmatia, 8 from Istria & Primorje, 10 from Central Croatia, 9 from East Croatia, and 8 from North Croatia. Donors from the same region were then selected based on their residence in different towns situated in a given region, in such a way that all towns were adequately represented. Finally, individuals residing in the same town and carrying the same surname were excluded from the sample. The regions do not represent official Croatian counties, but rather a geographical division of Croatia. One possible disadvantage of the sample selection method used in this study is the potential gene flow due to population migrations, which regularly occur in Croatia in the direction of a regional centre of each region. For that reason, the capital city of Croatia, Zagreb, was excluded from the sample of the Central Croatia region since constant migration to Zagreb occurs from all Croatian regions. For the same reason, the sample including subjects residing in Zagreb was chosen as a sample representative of the Croatian population in general. The study was approved by the Ethics Committee of the University Hospital Zagreb, and it was performed in line with the Helsinki Declaration.
Sweden), and results were processed using the Helmberg SCORE 5 software [16].
HLA ambiguous typing results were retested by employing the sequence-based typing (PCR-SBT) method using a commercial Olerup SBT Resolver kit (CareDx Pty Ltd, Fremantle, Western Australia, Australia) [17]. Data obtained by the SBT method were evaluated using the Olerup Assign SBT v. 4.7.1 program. The IPD-IMGT/HLA database releases 3.31.0-3.35.0 were used for analysis. In cases when HLA ambiguous results were still present, a decision was made based on the CWD allele data [18]. The list of HLA ambiguities that could not be resolved using the abovementioned methods is available in Supplementary Table 1. It should be noted that this approach may have failed to detect some rare or very rare alleles, but the projected percentage of such cases is very low.

Statistical Analysis.
Allele frequencies were calculated using the GeneRate program [19] and also by direct counting, with no difference observed between the results of the two approaches. In cases when only one allele was present at a given locus, the individual was counted as homozygous. PyPop (Python for Population Genomics, version 0.7.0) was used to test for Hardy-Weinberg equilibrium (HWE), to conduct the Ewens-Watterson homozygosity analysis, and to estimate four-locus haplotype frequencies [20]. The significance of differences in allele and haplotype frequencies between regions was evaluated using the chi-square test, while Fisher's exact test with Yates correction was used if any of the values in the 2 × 2 tables were <5 (GraphPad QuickCalcs online software, https://www.graphpad.com/). The P value was corrected for the number of alleles observed at each locus (Pcorr). P values obtained for haplotype analysis were also corrected for multiple testing.
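The counting and testing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeneRate, PyPop, or GraphPad implementation: the function names are hypothetical, the 2 × 2 table is assumed to hold counts of one allele versus all other alleles in two regions, Pcorr is assumed to be the raw P value multiplied by the number of alleles at the locus and capped at 1, and no continuity correction is applied.

```python
import math
from collections import Counter


def allele_frequencies(genotypes):
    """Direct counting over (allele1, allele2) pairs for one locus.

    allele2 may be None when only one allele was detected; the individual
    is then counted as homozygous, as stated in the text.
    """
    counts = Counter()
    for first, second in genotypes:
        counts[first] += 1
        counts[second if second is not None else first] += 1
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in counts.items()}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (df = 1, uncorrected)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum over tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def compare_allele(a, b, c, d, n_alleles_at_locus):
    """Return (P, Pcorr); Fisher's test is used when any cell is below 5."""
    if min(a, b, c, d) < 5:
        p = fisher_exact_2x2(a, b, c, d)
    else:
        _, p = chi_square_2x2(a, b, c, d)
    # Assumed correction: multiply by the number of alleles at the locus.
    return p, min(1.0, p * n_alleles_at_locus)
```

For example, `compare_allele(a, b, c, d, 47)` would apply the assumed correction for the 47 alleles observed at HLA-A.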

Results
The expected and observed allele frequencies for the alleles at tested HLA loci did not differ significantly, and populations from all regions as well as the population from Zagreb were found to be in Hardy-Weinberg equilibrium (Table 1). As the first part of the analysis, we compared the alleles observed among individuals included in the present study with the most recent catalogue of common and well-documented (CWD) alleles by the European Federation for Immunogenetics (EFI) [18]. This comparison revealed that 165 (81.7%) of HLA alleles detected in this study have been included in the EFI CWD catalogue. Conversely, 37 (18.3%) alleles found in the Croatian population have not been reported as CWD alleles in the EFI CWD catalogue thus far.
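The idea behind the Hardy-Weinberg check reported at the start of this section can be illustrated with a simple goodness-of-fit statistic. The sketch below is an assumption-laden illustration, not PyPop's procedure: the function name is hypothetical, it returns only the chi-square statistic, and for loci as polymorphic as HLA a real analysis would need to pool sparse genotype classes or use an exact test.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square statistic comparing observed and HWE-expected genotype counts.

    genotypes: list of (allele1, allele2) tuples for a single locus.
    """
    n = len(genotypes)
    # Treat (A, B) and (B, A) as the same genotype.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freq = {allele: c / (2 * n) for allele, c in allele_counts.items()}

    chi2 = 0.0
    for a1, a2 in combinations_with_replacement(sorted(freq), 2):
        if a1 == a2:
            expected = n * freq[a1] ** 2            # homozygote: p^2
        else:
            expected = n * 2 * freq[a1] * freq[a2]  # heterozygote: 2pq
        chi2 += (observed[(a1, a2)] - expected) ** 2 / expected
    return chi2
```

A statistic close to zero indicates that the observed genotype counts are close to those expected from the allele frequencies.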
In the second part of the study, the analysis of the HLA allele frequency distribution was performed. Ten most frequent alleles at tested HLA loci in Zagreb and their respective frequencies in each region are listed in Table 2.
A comparison of the data obtained for the sample from Zagreb with the results from the previously published study for the Croatian population did not reveal any statistically significant difference, which justified our choice of Zagreb data as reference data for our population in general [13].
A total of 47 HLA-A alleles were found in our entire sample (N = 9277), among which the most frequent allele in the Zagreb population was A*02:01 (29.5%). This allele was also the most frequent allele in all five analyzed regions, with a frequency ranging from 28.2% to 31.5%. The second-ranked allele by frequency in Zagreb was HLA-A*01:01 (12.9%), which appeared among individuals from the five regions with a frequency ranging from 11.7% to 14.0%. Finally, with a frequency of 11.8%, which placed it third in the Zagreb sample, the allele HLA-A*03:01 was detected in the five regions with a frequency ranging from 10.1% to 11.6%. The HLA-B locus exhibited the highest polymorphism with 88 detected alleles, of which the three most frequent alleles among Zagreb residents were HLA-B*51:01 (10.5%; ranging from 8.3% to 12.6% in the five regions), HLA-B*18:01 (8.0%; ranging from 7.5% to 9.5% in the five regions), and HLA-B*07:02 (7.5%; ranging from 6.6% to 8.0% in the five regions). Among 34 different alleles observed at the HLA-C locus, two alleles, HLA-C*04:01 (ranging from 13.5% to 16.7%) and C*07:01 (ranging from 14.1% to 17.0%), were present in more than 10.0% of the tested subjects in each region, while the third HLA-C allele ranked by frequency in the Zagreb sample, HLA-C*12:03 (11.6%), exhibited a frequency range from 8.5% to 14.9% among subjects from the five regions. Fifty-three different alleles were determined at the HLA-DRB1 locus. The three most frequent alleles among Zagreb citizens were HLA-DRB1*16:01 (10.4%; ranging from 7.2% to 13.0% in the five regions), HLA-DRB1*03:01 (10.1%; ranging from 9.9% to 11.7% in the five regions), and HLA-DRB1*01:01 (10.0%; ranging from 8.1% to 10.7% in the five regions). The distribution of all observed alleles at four tested HLA loci in the Zagreb population as well as in each region is listed in Supplementary Table 2.
Figure 2 summarizes the data about 17 alleles whose frequencies significantly deviated (were either increased or decreased) in one region in comparison to at least three other regions (Zagreb is excluded). The highest number of these alleles (N = 10) was detected in the North Croatia region; for four of them (HLA-C*07:04, C*12:03, DRB1*04:02, and DRB1*16:01), the observed differences remained significant even after correction of the P value. It is interesting to note that the occurrence of two alleles belonging to the same gene group (HLA-DRB1*04:01 and DRB1*04:02) was significantly different in this region in comparison to the other parts of Croatia. This investigation also revealed that the difference in the HLA-A distribution between regions can be attributed to only two alleles (HLA-A*66:01 and A*68:02). These alleles were present with a significantly different frequency in Dalmatia in comparison to other regions. The results of the analysis also revealed that different alleles of the same gene group exhibit significant variation in frequency among subjects from the five Croatian regions. Examples of such variation are the HLA-B*35 alleles: the HLA-B*35:01 allele was significantly more frequent in North Croatia, while the HLA-B*35:03 allele was more frequent in Dalmatia. The HLA-B*35:08 allele was observed with a significantly higher frequency in Istria & Primorje but only in comparison to Dalmatia and North Croatia, and therefore, it was not included in Figure 2. In contrast, the HLA-B*35:02 allele was present with similar frequencies in all regions (from 1.1% in East Croatia to 1.7% in North Croatia). Another example is the frequency of the HLA-B*44:03 allele, which was significantly higher in Central Croatia and East Croatia than in the rest of Croatia.
The fourth aim of the present study was to analyze the distribution of non-CWD alleles (according to data presented in the EFI CWD catalogue, version 1.0) and to evaluate their distribution in different Croatian regions [18]. As suggested in a previous study and to avoid possible misinterpretations, we used an additional term, "local" (LOC), to categorize the alleles which occurred ≥3 times in our sample but are not present in the current EFI CWD catalogue [21]. Figure 3 lists LOC alleles, but also all other HLA alleles observed ≤2 times in this study. Among 15 HLA alleles observed only once, one third (5 out of 15) were found in East Croatia, three alleles were detected in Central Croatia, two in North Croatia, one each in Dalmatia and Istria & Primorje, and the remaining two alleles with one occurrence appeared among the residents of Zagreb. In this group of HLA alleles observed only once in our study, four are classified as rare (HLA-A*01:08, A*24:41, B*38:08, and DRB1*01:31), while DRB1*12:39 is categorized as very rare according to the Rare Alleles Detector (RAD) [22]. It is interesting to mention that among twelve non-CWD alleles classified as LOC alleles, only three alleles (HLA-B*39:31, C*08:03, and DRB1  [18,23].
The three most frequent HLA-A~C~B~DRB1 haplotypes with a frequency >1.0% in all regions were HLA-A*01:01~C*07:01~B*08:01~DRB1*03:01 (range from 4.3%, Central Croatia to 6.4%, Dalmatia), HLA-A*03:01~C*07:02~B*07:02~DRB1*15:01 (range from 1.3%, Dalmatia to 1.8%, Central Croatia), and HLA-A*02:01~C*07:01~B*18:01~DRB1*11:04 (range from 1.4%, North Croatia to 2.1%, Central Croatia). The distribution of the remaining 20 most frequent HLA-A~C~B~DRB1 haplotypes in Zagreb, which represents the Croatian population in total, is presented for each region in Table 3. The remaining HLA-A~C~B~DRB1 haplotypes found ≥4 times in each region are shown in Supplementary Table 3.
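The LOC rule defined earlier in this section (an allele observed ≥3 times in the sample but absent from the EFI CWD catalogue) amounts to a simple classification. The following sketch is illustrative only: the function name, the category labels, and the toy catalogue and counts in the usage lines are assumptions, not the study data.

```python
from collections import Counter


def classify_alleles(observed_alleles, cwd_catalogue, loc_threshold=3):
    """Split observed alleles into CWD, LOC, and other non-CWD alleles.

    observed_alleles: one entry per observation of an allele in the sample.
    cwd_catalogue: set of allele names listed as CWD.
    """
    counts = Counter(observed_alleles)
    groups = {"CWD": [], "LOC": [], "other": []}
    for allele, n in sorted(counts.items()):
        if allele in cwd_catalogue:
            groups["CWD"].append(allele)
        elif n >= loc_threshold:
            groups["LOC"].append(allele)    # non-CWD, seen at least 3 times
        else:
            groups["other"].append(allele)  # non-CWD, seen at most twice
    return groups


# Toy input: the counts and the one-entry catalogue are made up for illustration.
toy_observations = ["A*02:01"] * 5 + ["B*39:31"] * 3 + ["A*01:08"]
toy_groups = classify_alleles(toy_observations, cwd_catalogue={"A*02:01"})
```

With this toy input, `toy_groups["LOC"]` contains only `"B*39:31"`, because it is the only non-catalogue allele that reaches the threshold of three observations.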

Journal of Immunology Research
For five out of the 20 most frequent HLA-A~C~B~DRB1 haplotypes among Croatians in general, a significantly different frequency was observed in one region in comparison to at least three other regions (Figure 4). Again, the highest number of such haplotypes was found in the North Croatia region.

Discussion
The present study is the first analysis of the HLA polymorphism in different regions of Croatia. The regions included in this analysis correspond to the established regional division of Croatia, and the number of samples pertaining to each region was adjusted according to the population size for that particular region. The comparison of HLA allele and haplotype distribution was performed between different regions as well as between each region and the Croatian population in general (as represented by the Zagreb sample).
As no deviation from the HWE was detected, our registry sample may be considered as representative for the regional population as suggested by different authors [24,25].
Comparison of allele frequencies at tested HLA loci between the Zagreb sample and our previous study, which included 4000 individuals from different cities, has not revealed any significant difference and therefore additionally supports our hypothesis that Zagreb data can be used as reference data for the Croatian population in total [13]. At the same time, some differences were observed between five Croatian regions as well as between regions and the Zagreb data.
The observed HLA heterogeneity of the Croatian population is probably a result of a very turbulent history during which numerous influences on different populations in different regions of Croatia occurred. For example, prior to the arrival of the Slavic population as a part of the Avar migration, the area of Dalmatia was inhabited by different Illyrian tribes which mixed with Greek colonists, especially on the islands [26]. This substratum was later Romanised as part of the Roman Empire. As part of the coastal area in the vicinity of Italy, Istria was constantly exposed to the influences and admixing which arrived from the Apennine area, even after the fall of the Roman Empire [12]. In the northern parts of Croatia, the Illyrian population mixed with the Celts before the Romans, and after the fall of the Roman Empire, traces of the migration period (of different Germanic tribes) remained in this area. This area of Croatia was a part of different political associations in the later periods, and as a result, traces of the former Austro-Hungarian Monarchy population are visible (Germans, Hungarians, Austrians, and Czechs). The central part of Croatia as well as the easternmost parts shows traces of the long-term Ottoman presence. An additional possible cause for the genetic heterogeneity of the Croatians is the country's location in South-Eastern Europe on the corridor between Southern and Northern Europe. Genetic drift probably also added to the variation among the different regions. Finally, the variation could likewise be a result of selection caused by the exposure to different pathogens and a subsequent better or worse adaptation of individuals with specific HLA genes.
The ten most frequently observed alleles represent approximately 90% of the cumulative frequency at the HLA-A locus; this percentage was nearly 65% at the HLA-B locus, around 85% for the HLA-C locus, while at the HLA-DRB1 locus, their frequency amounted to 78%.
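The cumulative frequency of the most frequent alleles at a locus is a sum over a ranked frequency table. The sketch below shows the calculation with a hypothetical helper; the three HLA-A values are the Zagreb frequencies quoted in the Results, and since the full allele tables are in the supplementary material, only the top two are summed here for illustration.

```python
def cumulative_top_n(frequencies, n=10):
    """Sum of the n largest allele frequencies at one locus."""
    return sum(sorted(frequencies.values(), reverse=True)[:n])


# Zagreb HLA-A frequencies reported in the Results (as proportions).
zagreb_hla_a = {"A*02:01": 0.295, "A*01:01": 0.129, "A*03:01": 0.118}
top_two = cumulative_top_n(zagreb_hla_a, n=2)  # 0.295 + 0.129
```

Applied with `n=10` to a complete per-locus frequency table, the same function would give the kind of cumulative percentage reported above.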
The HLA-A*02:01 allele was the most frequent allele at the HLA-A locus in all regions, and the comparison of this allele between our and neighbouring populations did not reveal any significant difference [22]. Only six out of 88 different HLA-B alleles showed a frequency >5.0%, and the HLA-B*51:01 allele was the most frequent one in almost all regions (except North Croatia). This study supports the data from a previous investigation which showed that HLA-B*51:01 is the most frequent among HLA-B alleles. The same finding was reported for some other populations in the south of Europe [22]. Regarding the HLA-C locus, the HLA-C*07:01 allele was more frequent in Dalmatia, Istria & Primorje, and East Croatia compared to Central and North Croatia, whereas the HLA-C*04:01 allele was more common in Central and North Croatia in comparison to the rest of the country. In general, the frequency of the HLA-C*07:01 allele increases from Southern (Spain, 11.0%) to Northern Europe (United Kingdom, 19.0%), and the frequencies detected for this allele among individuals from Croatian regions fall somewhere in the middle of these values (around 15%). The opposite observation was made for the HLA-C*04:01 allele, whose frequency decreases in the same south-north direction, and, once again, the Croatian regional frequencies from 13.5% to 16.7% fit well in that range [22].
One of the interesting results pertaining to the HLA-DRB1 locus is the low frequency of the HLA-DRB1*16:01 allele (7.2%) in Istria & Primorje, which is perhaps caused by the marked influence of the Italian population on this region. Namely, the frequency of the HLA-DRB1*16:01 allele among Italians is around 5.0% [22]. This allele occurs among individuals from the remaining regions of Croatia with a frequency of around 10.0% or higher (13.0% in North Croatia). This frequency distribution fits well with results from other population studies which state that the DRB1*16:01 allele can be found with the highest frequency in Southern European countries (e.g., 13.7% among Greeks, 14.9% among North Macedonians, or as much as 15.5% among Bulgarians) [22]. North Croatia is also a very peculiar region regarding the distribution of the two most frequent HLA-DRB1*04 alleles (HLA-DRB1*04:01 and DRB1*04:02).
HLA allele frequency distribution demonstrated that differences between the regions are not large; however, it nevertheless disclosed a number of HLA alleles with significant differences among the regions. Seven alleles were present in North Croatia with a significant difference in frequency when compared to all other regions as well as in comparison to the Croatian population in general. This finding is probably associated with the geographic specificity of that region and its relative isolation from the other regions. Namely, even in a small country such as Croatia, there was some migration of population in the past, for example, from Dalmatia to East Croatia. For the North Croatia region, however, there were never reports of substantial migration in either direction.
According to the EFI CWD catalogue ver. 1, the percentage of non-CWD HLA alleles in the Croatian population was 17.9%. Among these non-CWD alleles, a few alleles (HLA-B*39:31, C*08:03, and DRB1*11:12) are present in all Croatian regions and fall into the LOC allele category for our population, despite the fact that they are not even classified as WD in the abovementioned catalogue. The data about non-CWD alleles in our population raise the possibility that we failed to detect some rare and very rare HLA alleles, but data published for other populations so far suggest that this percentage is very low.
This discrepancy might exist due to the criteria for establishing common status in the EFI CWD catalogue and further corroborates the suggestion that populations where these alleles are commonly observed are underrepresented in the EFI CWD catalogue since no data for them were available at the time the catalogue was established. This is especially the case for the populations from the South-Eastern region of Europe. The matter of population size could also be involved in the explanation of the fact that some LOC alleles in Croatia are not even listed as WD alleles in EFI CWD catalogue [18]. Namely, there is a considerable difference in the sample size of the Croatian population represented in the EFI CWD catalogue and the one reported in the present study. The sample of the current study is at least twice as large, and it is expected that less frequent alleles will have a better chance of being detected in a larger population study. Our study points out some of the differences in the lists of available CWD catalogues published so far [18,23]. Also, it highlights that different population data pools provide different information about the categorization of HLA alleles in the group of CWD alleles [18,23]. After all, this is the reason why the inclusion of more population data for HLA alleles is important for obtaining a precise image about CWD alleles and for gaining more information on HLA diversity.
Nonetheless, as described in the results, some HLA four-locus haplotypes exhibit specific regional characteristics. For example, three haplotypes were significantly more frequent in North Croatia in comparison to all other regions. Haplotype HLA-A*02:01~C*03:04~B*15:01~DRB1*04:01, ranked 20th in the total population, was located in the 7th place, haplotype HLA-A*02:01~C*07:04~B*44:27~DRB1*16:01 was second, while haplotype HLA-A*02:01~C*02:02~B*27:02~DRB1*16:01 was third. The data about HLA polymorphisms obtained in this study are valuable for resolving the HLA diversity in different regions of Croatia, and they could be a valid tool for developing a new recruitment strategy for CBMDR. Moreover, in the solid organ transplantation setting, our recent study emphasized the importance of including populations with different HLA profiles in international organ exchange programs [28]. More precisely, that study suggested that, for example, patients on a waiting list for kidney transplantation who are HLA-B*18 positive had a greater chance of receiving a kidney graft from a Croatian deceased donor than from the Eurotransplant donor pool. Finally, the presented data have potential for use in designing HLA disease and anthropology studies as well.
To conclude, regardless of the fact that the Croatian population, in the global context, represents a very small population, HLA diversity can still be observed and therefore should be considered and documented.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.