High Resolution Class I HLA -A, -B, and -C Diversity in Eastern and Southern African Populations

Africa remains significantly underrepresented in high-resolution Human Leukocyte Antigen (HLA) data, despite being one of the most genetically diverse regions in the world. This critical gap in genetic information poses a substantial barrier to HLA-based research on the continent. In this study, Class I HLA data from Eastern and Southern African populations were analysed to assess genetic diversity across the region. We examined allele and haplotype frequency distributions, deviations from Hardy-Weinberg Equilibrium (HWE), linkage disequilibrium (LD), and conducted neutrality tests of homozygosity across various populations. Additionally, the African HLA data were compared to those of Caucasian and African American populations using the Jaccard index and multidimensional scaling (MDS) methods. The study revealed that South African populations exhibited 50.4% more genetic diversity within the Class I HLA region compared to other African populations. Zambia showed an estimated 36.5% genetic diversity, with Kenya, Rwanda and Uganda showing 35.7%, 34.2%, and 31.1%, respectively. Furthermore, an analysis of in-country diversity among different tribes indicated an average Class I HLA diversity of 25.7% in Kenya, 17% in Rwanda, 2.8% in South Africa, 13.6% in Uganda, and 6.5% in Zambia. The study also highlighted the genetic distinctness of Caucasian and African American populations compared to African populations. Notably, the differential frequencies of disease-promoting and disease-preventing HLA alleles across these populations emphasize the urgent need to generate high-quality HLA data for all regions of Africa and its major ethnic groups. Such efforts will be crucial in enhancing healthcare outcomes across the continent.


Abbreviations: RSA - Republic of South Africa; CAU - Caucasian; HF - Haplotype Frequency; AFAM - African American; LD - Linkage Disequilibrium

Author Summary
This study investigated the diversity of class I HLA in the eastern and southern regions of the African continent using a population genetics approach. Analysis of HLA data at both country and tribal levels revealed significant genetic differences and the unique characteristics of these populations compared to Caucasian and African American populations in the United States. The differential frequencies of disease-promoting and disease-preventing HLA alleles across these populations suggest that large-scale vaccine administration may be ineffective without a thorough understanding of the HLA composition of each population. This study highlights the urgent need to generate high-quality HLA data across all regions of Africa and its major ethnic groups. Such comprehensive data collection is essential for optimizing vaccine design, deepening our understanding of HLA-disease associations, and ultimately improving healthcare outcomes across the continent.

INTRODUCTION
The Human Leukocyte Antigen (HLA) complex consists of highly polymorphic genes that code for surface proteins responsible for presenting antigens to T cells as part of an immune response to infections [1]. According to the IPD-IMGT/HLA database, more than 40,000 HLA alleles have been identified, and the total HLA allele variation is estimated to be several millions across the different populations around the world [2]. Africa, often referred to as the cradle of humankind [3,4], has the highest levels of human genetic diversity in the world [5]. This rich genetic diversity is a result of the continent's long evolutionary history, complex demographic processes and genetic admixture that have shaped its populations over time [4,6]. However, population databases such as the Allele Frequency Net Database (AFND), a freely available repository of frequency data on alleles, genes, haplotypes, and genotypes, report very limited HLA frequency data for African populations [7-9]. Moreover, many ethnic populations in Sub-Saharan Africa are underrepresented in medical genomics studies due to limited research, particularly on HLA alleles, compared to developed countries [10]. This discrepancy in HLA allele data is also reflected in the IPD-IMGT/HLA database, where most submissions originate from Europe, America, and Australia (IPD-IMGT/HLA Database, release of July 2024, ebi.ac.uk). This indicates a significant lack of HLA typing infrastructure in Sub-Saharan Africa, further contributing to the scarcity of HLA data for these populations.
The HLA genes have been widely studied over the years due to their extensive allelic variability across diverse populations and their importance in host immune responses, therapy and organ transplantation [2,8,11-13]. In addition, some HLA alleles have been associated with either protection against or susceptibility to a wide range of autoimmune and infectious diseases as well as drug-induced hypersensitivity and cancer [14]. For instance, HLA class I alleles such as HLA-B*27, HLA-B*52, HLA-B*57 and HLA-B*81 have been linked to protection against HIV disease progression (protective alleles), whereas HLA-B*35, HLA-B*51:01, and HLA-B*58:02 have been linked to rapid disease progression (disease-susceptible alleles) [15,16]. However, HLA alleles and haplotypes do not occur at the same frequency in different populations. For example, HLA-B*58:02, which is linked to HIV disease susceptibility, is mainly absent in Caucasians but highly prevalent in the African population [15]. Similarly, the protective allele HLA-B*57:01 is highly prevalent in the Caucasian population but largely absent in the African population [15,17,18].
Leveraging HLA diversity data can lead to more tailored therapies and inform the rational design of T cell-based vaccines that will be efficacious across different populations. In this study, population genetics approaches were used to understand HLA genetic diversity in the eastern and southern African regions. The study provides insight into the extensive diversity of allele and haplotype frequencies within five African populations and compares them to the Caucasian and African American populations.

Genetic Diversity Between African and U.S. Populations
To address the extent of HLA differences between the African populations and the US populations, we compared HLA data from the African sub-regions to the Caucasian and the African American populations. Although Caucasian and African American HLA studies have received some attention in the literature [19-21], this study demonstrated that HLA data from the US populations are not truly representative of African HLA population data. We computed allele frequencies across all populations in this study to help identify complex genetic traits and discover HLA disease associations [22,23]. Allele frequencies were estimated by direct counting. The full list of alleles and their frequencies across populations is detailed in Supplementary Tables 1, 2 and 3. Allelic frequency distributions vary across populations. Some alleles occur at high frequency in one population and low frequency in another, while others are present in some populations and absent from others. Allele frequencies (for countries) were sorted in descending order within each population, and alleles with frequencies of at least 5% were plotted and presented in Figure 1 for all loci.
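Direct counting can be illustrated with a minimal Python sketch. It is not the PyPop implementation used in the study; the function name `allele_frequencies` and the toy genotypes are hypothetical, and the only assumption is that each individual contributes two allele copies at a locus.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by direct counting.

    `genotypes` is a list of (allele_1, allele_2) pairs, one per individual,
    so the denominator is 2N allele copies.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical toy genotypes at HLA-A (not data from this study).
toy_genotypes = [
    ("A*02:01", "A*68:02"),
    ("A*02:01", "A*02:01"),
    ("A*30:01", "A*68:02"),
]
freqs = allele_frequencies(toy_genotypes)

# Keep only alleles at or above the 5% plotting threshold, sorted descending.
common = sorted(
    ((a, f) for a, f in freqs.items() if f >= 0.05),
    key=lambda item: item[1],
    reverse=True,
)
```

The final filter mirrors the 5% cut-off used for plotting; homozygous individuals simply contribute the same allele twice.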

HLA-A
HLA-A*02:01 is among the alleles with a frequency of at least 5% in all the populations, at varying levels, and is highest in Caucasians at 25%. However, this allele was relatively infrequent in the South African (RSA) population, with a frequency of 5.1%, as presented in Figure 1. Also, HLA-A*68:02 was observed at a relatively high frequency (> 8%) in the African populations but at a lower frequency (6%) in AFAM. Interestingly, this allele did not reach the 5% threshold in the Caucasian population. HLA-A*68:01, A*02:02, A*02:05, and A*11:01 each reached the 5% threshold in only one population: Kenya (5%), Rwanda (8%), RSA (5%), and Caucasian (6.4%), respectively (Figure 1 and Supplementary Table 1).

HLA-C
At locus C, HLA-C*04:01, HLA-C*06:02 and HLA-C*07:01 ranked among the five most frequent alleles (of those with frequencies of at least 5%) in all the populations (Figure 1). However, HLA-C*02:10 reached the 5% threshold in the African and AFAM populations but not in the Caucasian population. Also, HLA-C*17:01 reached the 5% threshold only in the African populations. In the Caucasian population, HLA-C*05:01 and HLA-C*12:03 were present at 8% and 6% respectively but did not reach the 5% threshold in other populations (Figure 1, Supplementary Table 3).
The Jaccard index was used to quantify the similarity (or dissimilarity) between two populations in terms of allele composition. The index was obtained by determining the alleles that are simultaneously present in two populations, relative to all alleles observed in either population. The resulting Jaccard similarity indices were converted into percentages and drawn as a non-clustered heatmap. The darker the red colour in the heatmap, the more similar the two corresponding populations. Generally, Figure 2C shows a low level of genetic similarity between the African and US populations. The Caucasian population had the lowest similarity index to the African populations at all the HLA loci considered. In contrast, the African American population showed relatively higher similarity indices to the African populations than the Caucasian population did at all loci.
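As an illustration of the calculation, the sketch below computes the Jaccard similarity between two allele sets as a percentage. The function name `jaccard_percent` and the toy allele sets are hypothetical and are not taken from the study data.

```python
def jaccard_percent(alleles_a, alleles_b):
    """Jaccard similarity (in %) between the allele sets of two populations.

    Shared alleles divided by all alleles seen in either population.
    Only presence or absence matters; frequencies are ignored.
    """
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        return 0.0
    return 100.0 * len(a & b) / len(union)

# Hypothetical toy allele sets at one locus (not data from this study).
pop_1 = {"A*02:01", "A*68:02", "A*30:01"}
pop_2 = {"A*02:01", "A*68:02", "A*11:01", "A*01:01"}

similarity = jaccard_percent(pop_1, pop_2)  # 2 shared of 5 distinct alleles
```

Because the index depends only on which alleles are observed, it is sensitive to sample size: a larger sample tends to reveal more rare alleles and can change the overlap.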
In addition to the individual HLA alleles, the study determined the extent to which haplotypes (specific combinations of alleles inherited together on the same chromosome) overlap between populations. This was done using MDS to visualize the genetic distances (cartograph) for all haplotype locus combinations. The analysis was carried out on the frequency of haplotypes in each population relative to the other populations. Haplotype frequencies from each population were dimensionally reduced using MDS to create a 2-dimensional genetic cartograph. Two countries are close to each other on the map if the distributions of haplotypes in the two populations are close to each other, relative to the distributions observed in the other countries. In Figure 3, the African American population is relatively closer to the African populations than the Caucasian population, which lies farther away from the African populations for the global haplotypes.
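A minimal sketch of classical MDS on haplotype frequencies is given below, using NumPy. The distance between populations is taken here to be the Euclidean distance between frequency vectors, which is an assumption for illustration only (the distance actually used is not stated above); the function names and the toy frequencies are likewise hypothetical.

```python
import numpy as np

def frequency_distances(freq_table):
    """Pairwise Euclidean distances between population frequency vectors.

    `freq_table` maps population -> {haplotype: frequency}. Haplotypes
    missing from a population are treated as frequency 0.
    """
    pops = sorted(freq_table)
    haps = sorted({h for p in pops for h in freq_table[p]})
    x = np.array([[freq_table[p].get(h, 0.0) for h in haps] for p in pops])
    diff = x[:, None, :] - x[None, :, :]
    return pops, np.sqrt((diff ** 2).sum(axis=2))

def classical_mds(distances, n_components=2):
    """Classical (Torgerson) MDS: double-centre squared distances, then
    keep the eigenvectors with the largest eigenvalues as coordinates."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical toy haplotype frequencies (not data from this study).
toy_hf = {
    "PopX": {"A*02:01~B*07:02": 0.6, "A*30:01~B*42:01": 0.4},
    "PopY": {"A*02:01~B*07:02": 0.5, "A*30:01~B*42:01": 0.5},
    "PopZ": {"A*02:01~B*07:02": 0.1, "A*68:02~B*15:10": 0.9},
}
pops, dist = frequency_distances(toy_hf)
coords = classical_mds(dist)  # one (x, y) row per population, in `pops` order
```

Only the relative positions on the resulting map are meaningful; the axes themselves have no direct biological interpretation, and the map may be rotated or reflected without changing the distances.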

Genetic Diversity Within African Populations
To better define HLA genetic diversity among African populations at both country and tribal levels, we computed within-population diversity using the Shannon and Simpson diversity indices. Similarly, the Jaccard index in Figure 2C provides a measure for comparing HLA differences between African populations. The Shannon and Simpson indices were determined at each locus and across African populations (see Supplementary Figure 1 for tribes). Figure 2A provides a summary of the Shannon index, which accounts for both allele richness and the evenness of allele abundance. Figure 2B provides a summary of the Simpson index, which gives the probability that two alleles taken from the sample at random are of different types. In Figures 2A and 2B, all populations show the natural polymorphic structure of the HLA alleles except for South Africa, where HLA-A is slightly more evenly distributed than HLA-B. Generally, the high values observed in all populations at the different loci indicate a high level of genetic diversity within each of the African populations. A similar trend was observed in the tribal populations of each country in the African sub-region (Supplementary Figures 1A and 1B).
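The two indices can be sketched in a few lines of Python. The sketch assumes the natural-logarithm form of the Shannon index and the Gini-Simpson form (one minus the sum of squared frequencies) of the Simpson index; the function names and toy frequencies are hypothetical.

```python
import math

def shannon_index(freqs):
    """Shannon entropy H = -sum(p * ln p) over allele frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def gini_simpson_index(freqs):
    """Probability that two alleles drawn at random are of different types."""
    return 1.0 - sum(p * p for p in freqs)

# Hypothetical toy loci (not data from this study): four evenly
# distributed alleles versus six alleles dominated by one.
even_locus = [0.25, 0.25, 0.25, 0.25]
skewed_locus = [0.90, 0.02, 0.02, 0.02, 0.02, 0.02]

h_even, h_skewed = shannon_index(even_locus), shannon_index(skewed_locus)
s_even, s_skewed = gini_simpson_index(even_locus), gini_simpson_index(skewed_locus)
```

In this toy case the locus with fewer but evenly distributed alleles scores higher on both indices than the locus with more alleles dominated by a single one, which is the kind of pattern described for HLA-A versus HLA-B in South Africa.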
In Figure 2C, the highest values of the Jaccard index among the eastern African countries were observed between Rwanda and Uganda, with similarity indices of 68%, 79% and 91% at loci A, B and C respectively. South Africa and Zambia had similarity indices of 57%, 50% and 64% at loci A, B and C respectively. Interestingly, Uganda and Zambia had the highest similarity index of 77% at locus A. Conversely, South Africa had the lowest similarities with other African countries at all loci except with Zambia at locus C, where the similarity index was relatively higher (64%). At the tribal level (Supplementary Figure 1C), the Jaccard index also varies across tribes within the African countries. The Zulu tribe showed low similarity (between 24% and 56%) to other African tribes at all loci except with the Bemba tribe from Zambia (62% at locus C). Only a few tribes from some of the African countries had similarity above 80%. For instance, a high similarity of 82% was observed between the Nsenga tribe of Zambia and the Munyankole tribe of Uganda at locus A.
The cartograph also shows that, at loci B:C, South Africa (RSA) was farther away from the other African countries (Figure 3C). Similarly, Kenya is far from the rest of the African countries at loci A:B:C (Figure 3D). Zambia and Kenya were closer at loci A:B and A:C (Figures 3A and 3B), while Uganda and Rwanda were closer at loci A:B and A:B:C (Figures 3A and 3D). We also observed close genetic distances between Zambia and Rwanda as well as between Uganda and Kenya at loci B:C (Figure 3C). Furthermore, the cartograph at tribal level, presented in Supplementary Figure 2, shows that the Kenyan tribes (Kikuyu and Luhya) are far apart from each other at all loci. The Ugandan tribes in this study (Muganda, Munyankole and Munyarwanda) were far apart from each other at all loci except for loci B:C, where they appear relatively closer. Also, the Lozi tribe in Zambia did not exhibit the same genetic closeness as the other Zambian tribes at all loci (Supplementary Figure 2).
The rarefaction curves in Figure 4 were employed to evaluate the completeness of samples and the uniqueness of alleles for each population (country and tribe) as a function of the number of participants. This was determined by creating a subsample of size n and counting the number of unique HLA alleles included in the subsample for any given (HLA alleles, population) pair.
Subsampling was done at random without replacement and was repeated for different values of n = 1, ..., N, where N denotes the number of participants from the selected population. We next plotted the number of unique HLA alleles as a function of the number of participants by population. The overall numbers and percentage frequencies of alleles observed across populations at each locus are presented in Table 1. The rarefaction curves in Figure 4 typically rise quickly at first, as unique alleles are observed, and then level off as only a few rare alleles remain to be observed. The curves also reflect the natural diversity structure of the alleles at each locus [24]. HLA-B had the highest allelic diversity in all populations (countries and tribes), followed by HLA-A, while HLA-C had the least allelic diversity. Visual inspection of the curves also supports the genetic diversities observed in Table 1 across the African countries. In Table 1, Kenya has the highest allelic diversity at all loci, followed by Rwanda, then Uganda and Zambia, while South Africa has the least diversity at all loci among the African countries.
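The subsampling procedure behind the rarefaction curves can be sketched as follows. Averaging over repeated random subsamples at each size is an assumption made here to smooth the curve; the function name, the number of repeats and the toy genotypes are hypothetical.

```python
import random

def rarefaction_curve(genotypes, n_repeats=100, seed=0):
    """Mean number of unique alleles seen in subsamples of size 1..N.

    `genotypes` is a list of (allele_1, allele_2) pairs, one per participant.
    Participants are drawn at random without replacement.
    """
    rng = random.Random(seed)
    total = len(genotypes)
    curve = []
    for n in range(1, total + 1):
        counts = []
        for _ in range(n_repeats):
            subsample = rng.sample(genotypes, n)  # without replacement
            counts.append(len({a for pair in subsample for a in pair}))
        curve.append(sum(counts) / n_repeats)
    return curve

# Hypothetical toy genotypes at one locus (not data from this study).
toy_genotypes = [
    ("B*07:02", "B*42:01"),
    ("B*07:02", "B*15:10"),
    ("B*42:01", "B*42:01"),
    ("B*58:02", "B*07:02"),
    ("B*15:10", "B*42:01"),
]
curve = rarefaction_curve(toy_genotypes)
```

A curve that is still rising at n = N suggests that further participants would reveal additional alleles, whereas a plateau suggests that most of the common alleles have already been observed.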

HLA Alleles Linked to Immune Responses and Disease Outcomes
Additionally, we investigated immune/disease associations of some known alleles at each locus to determine the important differences in the HLA alleles that are associated with immune/disease outcomes in the different populations. This will assist in understanding the genetic basis of immune responses and the association with diseases such as HIV, leading to improved diagnostics, treatments, and preventive strategies [25,26]. According to the literature, some alleles are grouped as either Bw4 or Bw6 [27]. Bw4 and Bw6 are epitopes found on most HLA-B and a few HLA-A proteins, which play a role in immune responses [28]. In addition, other alleles are classified as either protective or disease susceptible depending on the population and study [27-31]. This study observed the frequencies of the known immune and HIV disease associated alleles in each of the populations, and the results are presented in Table [...]
[...] principles in population genetics and serve several important purposes in diversity studies [32,33]. The HWE and neutrality tests were both performed at the different HLA loci on the African populations. Significant deviations from expected HWE heterozygosity were observed in the distribution of genotypes of HLA-C in South Africa (Table 3) and of HLA-B in the Ngoni tribe (Table 4). Also, the neutrality test of homozygosity showed significant deviations from expected homozygosity in Kenya, South Africa and Uganda at locus A and in Rwanda and Uganda at locus B (Table 5). At the tribal level, similar deviations were observed at locus B for Muganda, Nsenga and Tumbuka (Table 6). The Chewa and Muganda tribes also had significant deviations from expected homozygosity at locus C (Table 6).
Haplotype analysis helps in understanding the genetic variation of alleles that are inherited together on the same chromosome, and linkage disequilibrium analysis helps in understanding non-random associations among alleles at different loci. The haplotypic associations of the HLA class I region were also investigated. While the full list of haplotypes is detailed in the Supplementary Tables [...] 7). Between the two loci, the strongest estimated associations were those between alleles of HLA-B and C (Table 8).

Pairwise linkage disequilibrium, measured by Hedrick's and Cramér's statistics, was statistically significant (p < 0.001) at all locus pairs across populations, as presented in Table 8. A few locus pairs, such as A:B in Lozi, Tonga and Tumbuka and A:C in Tonga, showed random (non-significant) association between alleles (Table 9). Additionally, this study investigated alleles that are unique to different populations (see Supplementary Tables 8 and 9). Given the sample sizes of each population in this study, certain alleles were observed in only one population. Furthermore, alluvial plots were employed to present the inheritance patterns of alleles observed in this study. Each block size in the alluvial plot represents the frequency of the corresponding allele, and the thickness of the flow streams denotes the frequency of the allele inheritance pattern. These plots help in predicting the likelihood of inheriting specific traits or conditions [34,35]. Figure 5 presents inheritance patterns of HLA-B alleles as observed in the African populations (see Supplementary Figures 3, 4

DISCUSSION
This study adopted several population genetic diversity approaches to investigate class I HLA diversity in the eastern and southern African populations compared to the US populations.
This study observed differences in allele frequencies in all populations. The distribution of allele frequencies is influenced by several factors such as genetic drift, gene flow, mutation, population history and natural selection, making each population genetically unique [36-38]. Allele frequencies vary across populations, and the alleles with frequencies of at least 5% reported in this study have also been reported in other studies [19,39-41] at higher or lower frequencies. For example, HLA-A*02:01 is a common allele of the HLA-A gene, playing a crucial role in the immune system [42,43]. A high prevalence of HLA-A*02:01 in a population has been linked to a higher risk of certain cancers [44]. These differences in allele frequencies across populations are a testament to the diversity of HLA genetic architecture among the populations.
The Jaccard index heatmap shows various levels of allelic similarity among populations (regions, countries and tribes). The heatmap indicated that the Caucasian and AFAM populations are dissimilar to the African populations, given the extremely low similarity indices observed at all loci. This affirms the allelic diversity between the African and US populations and suggests that many alleles are not shared between the two groups. Population comparisons based on haplotype frequencies using MDS showed distinct genetic differences both within African populations and between African and US populations. The cartograph clearly shows the distinction between the African and United States populations. The Caucasian population shows large genetic distances to the African populations at all loci, which indicates high diversity between the two populations. The African American population, though genetically close to the African populations due to its historical background [45], still maintains a level of distinction, which suggests that it is not representative of the African populations.
The Shannon and Simpson indices affirm the polymorphic status of each HLA locus and suggest different levels of diversity within each population. Interestingly, despite the lower number of HLA-A alleles detected in the South African populations, the Shannon index shows that this locus had a relatively even allele distribution, which resulted in higher diversity than at HLA-B, which had more alleles.
The highest values of the Jaccard index in the African populations were observed within the eastern African region at all loci. This suggests high similarity in terms of their combination of common alleles. The majority of the southern African countries had low similarity due to alleles not shared between them. At the tribal level, the Zulu tribe also exhibited low similarity to other tribes within the African region but maintained relatively high similarities with the Zambian tribes at all loci. This affirms the closeness between the two populations at both country and tribal levels.
In summary, the Jaccard indices observed at the tribal level also affirm the existence of allelic diversity among tribes of the same countries within African populations. The genetic distances observed in the cartograph suggest allelic diversity among the populations, as previously established by other analyses in this study. Higher diversity was observed at some HLA loci (A:B and A:C) than at others (B:C and A:B:C) among African countries. Countries from the same region tended to be in the same location on the cartograph for haplotypes A~B and B~C. This suggests similar genetic diversities between countries in the same region. Also, South Africa seemed to have a close genetic distance to Zambia at loci A:B and A:C compared to other countries and a closer genetic distance at loci A:B:C. Interestingly, there was a wide genetic distance between the two countries at loci B:C. This could be linked to some allelic bias towards South Africa compared to Zambia, even though both countries are from the same African region.
Kenya showed a closer genetic distance to Zambia at loci A:B and A:C than to any other African country. Similar closeness was also observed at loci B:C between Kenya and Uganda. This suggests low diversity between the two countries at those loci. Furthermore, there was high diversity between Kenya and all other African populations at loci A:B:C. We also observed distinct levels of diversity between Rwanda and Uganda at different loci. While high diversity was observed at loci A:C, relatively low diversity was observed between the two countries at loci A:B, B:C and A:B:C. Additionally, the rarefaction curves support the comparison of allelic diversity between populations (countries and tribes). Although the shape of the curves indicates that many allelic variants have been observed within the given number of samples, more participants need to be HLA typed to observe the unique alleles in all the African populations.
Furthermore, diversity was most pronounced at locus B across populations, reflecting the highly polymorphic nature of the alleles at that locus [46].
Generally, the absence of Bw4 and Bw6 alleles in African populations indicates non-expression of serological markers at the respective locus, which can affect organ transplantation compatibility, immune responses and disease susceptibility within the continent [47,48].
Similarly, the absence of HLA-B*27:05 in African populations is consistent with the uncommon presentation of ankylosing spondylitis (AS) [49]. HLA-B*27:05 has been reported in the literature to be associated with AS [50,51], and the high prevalence of AS in Caucasian and African American populations is said to be associated with HLA-B*27:05 [52,53]. The study also observed a significant deviation from Hardy-Weinberg equilibrium in the South African population at locus C. A similar deviation was observed in the Ngoni tribe at locus B. Potential causes of significant deviation from Hardy-Weinberg equilibrium have been discussed in the literature [19,40,41]. Deviations from HWE at these loci in the two populations might indicate inbreeding, which can reduce genetic diversity and the population's ability to adapt to environmental changes at these loci [54,55]. However, due to the retrospective nature of this study, we acknowledge allelic bias and/or HLA genotyping error as major potential causes of the deviations, as also reported in the literature [56]. The Ewens-Watterson neutrality test of homozygosity was significant for different populations at different loci. These significant deviations suggest balancing selection, which helps preserve multiple alleles at each locus, contributing to genetic diversity [57-60]. This is vital for the adaptability and long-term survival of populations, enabling them to cope with changing environments and disease pressures that are associated with alleles at that locus [61]. Also, the top haplotypes shared between populations affirm the closeness among such populations at the respective loci. There was strong LD between all locus pairs across populations in this study except for Lozi, Tonga, and Tumbuka at loci A:B and Tonga at loci A:C. Haplotype frequencies are reported to be influenced by allele frequencies, LD, sample sizes, completeness of HLA data and other factors [62-64]. The results show genetic variants in strong non-random association, which are less likely to be separated by a recombination event, so that their alleles are inherited together more often than expected [65,66]. Hedrick's D′ weights alleles in each haplotype, and Cramér's V statistic is a multi-allelic correlation measure between pairs of loci [41]. Also, haplotype diversity coupled with highly significant LD might provide insight into negative (or purifying) selection in the HLA genomic region [67]. This could also be linked to background selection, where linked allelic variations are lost during the negative selection process [68]. A similar pattern of results was observed at the tribal level, which also indicates genetic diversity among tribes at all loci.
The discrepancies in the unique alleles observed across the population groups might be due to the sample sizes of the populations in this study. Hence, larger sample sizes from more African countries need to be studied to get a comprehensive picture of HLA genetic diversity across Africa. This study only looked at classical Class I HLA genes, and the patterns of allele inheritance observed at each locus need to be studied in more detail. Also, more research is needed on non-classical genes and Class II genes, as it will help unravel the genetic profiles of each population in terms of disease susceptibility and protection. This will assist in understanding genetic diversity and inform HLA population-based therapeutic development for each country.

Limitations of study
This study was limited by the sample sizes of the tribal populations; larger samples could have improved the understanding of diversity among the tribes within each country. Hence, large sample sizes of HLA data at the tribal level are needed to fully understand their respective diversity. Additionally, the imbalanced sample sizes among populations might have influenced the number of alleles and the allele and haplotype frequencies within each population. However, these limitations do not diminish the importance of understanding HLA diversity in the African subregion as presented in this study.

Conclusion
In this study, we have established HLA diversity in the Eastern and Southern African regions of the African continent. Comparison of the HLA data at both country and tribal levels suggests genetic differences within the African populations and the uniqueness of the Eastern and Southern African populations relative to the US-based African populations. These analyses demonstrate the limitations of applying HLA data from one region to another, reinforcing the necessity of collecting high-quality HLA data from all regions of Africa and its varied ethnicities.
Comprehensive data collection is crucial for enhancing vaccine design and advancing our understanding of HLA disease associations, ultimately improving healthcare outcomes across the continent. Finally, due to genetic admixture, caution must be exercised when extrapolating HLA data from other continents to inform African vaccine development.

Population and Sample
The Class I HLA data used in this study were obtained from a preliminary study of our HLA typing project and from our collaborators across five distinct cohorts within African populations and two ethnic groups in the United States, all of which are part of HIV research cohorts. The African cohorts comprise the Centre for the AIDS Programme of Research in South Africa (CAPRISA), the International AIDS Vaccine Initiative (IAVI), Females Rising through Education, Support and Health (FRESH), and Sinikithemba in South Africa. The ethnic groups from the US are African Americans (AFAM) and Caucasians (CAU) [20]. Necessary approvals were granted for all the HLA studies across the different cohorts. The present study includes 2,718 anonymous samples from apparently unrelated subjects across the different cohorts. African samples were obtained from three eastern and two southern African countries and are distributed as follows: Kenya (n = 106), Rwanda (n = 173), Uganda (n = 231), South Africa - RSA (n = 1640) and Zambia (n = 565). Of the five countries sampled within the African sub-region, tribal information was obtained from four; Rwanda was excluded owing to historical developments.

Data Cleaning and Validation
The HLA data used in this study were examined for inconsistencies, and an in silico method (expert knowledge) [70] was used to resolve the ambiguities encountered. A few samples were duplicated with similar allelic information; in such cases the sample with more allelic information was retained for the study. Where the duplicates carried the same allelic information, only one sample was retained. Duplicate samples with different allelic information and samples with partially or entirely missing allelic information were excluded from the analysis. Furthermore, the HLA data were analysed at 4-digit resolution in this study.
All the HLA data used in this study were checked for allele validity, and all allele names reported prior to 2010 were updated using the current nomenclature conversion tables and conversion tools provided by the IMGT/HLA database (IMGT/HLA Database, IPD-IMGT/HLA 3.56, release of January 2024, https://www.ebi.ac.uk/ipd/imgt/hla/alleles/). Similarly, haplotype nomenclature followed the 2013 report [71] aimed at organizing and discriminating phased genes, genotypes, and ambiguous assignments.
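For illustration, reducing allele names to 4-digit resolution corresponds to keeping the first two colon-separated fields in the current nomenclature. The helper below is a hypothetical sketch, not the cleaning pipeline used in the study; it assumes names in the `LOCUS*field:field:...` form and does not handle expression suffixes (such as N or L) on longer names.

```python
def to_two_field(allele):
    """Truncate an HLA allele name to two fields (4-digit resolution).

    Example: "A*02:01:01:01" becomes "A*02:01".
    """
    if "*" not in allele:
        raise ValueError(f"not a locus*field allele name: {allele!r}")
    locus, _, fields = allele.partition("*")
    return locus + "*" + ":".join(fields.split(":")[:2])

# Hypothetical example names at mixed resolution.
raw = ["A*02:01:01:01", "B*57:01", "C*04:01:01"]
cleaned = [to_two_field(name) for name in raw]
```

Names already at two fields pass through unchanged, so the function can be applied uniformly to a mixed-resolution data set.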

Statistical Analysis
Allele frequencies were estimated by direct counting using Python for population genomics (PyPop) version 1.0.0 [72]. Haplotypes and haplotype frequencies (HF) were estimated by resolving phase and allelic ambiguities using the expectation-maximization (EM) steps with a progressive insertion algorithm, setting the posterior probability to 0.0001 in the haplo.stats version 1.9.5.1 R package [73]. The HLA data were converted to Arlequin version 3.5.2 software [74] input files using CREATE software version 1.37 [75] to examine deviations from Hardy-Weinberg equilibrium (HWE), adopting a modification of the Markov random walk algorithm with 100 000 dememorization steps [76]. The relative delta and Cramer's V statistic values, which measure pairwise linkage disequilibrium (LD) between pairs of alleles at different loci, and their statistical significance were calculated using Hedrick's [77] and Cramer's [78] estimators as previously described in the literature [39,79]. The Ewens-Watterson neutrality test of homozygosity was performed in PyPop using Slatkin's implementation [80,81].
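To make the two LD measures concrete, the following minimal sketch computes locus-pair summaries from a table of two-locus haplotype frequencies. It assumes the commonly used multiallelic definitions: the normalized disequilibrium is the sum of |D_ij / D_max,ij| weighted by p_i q_j, and Cramer's V is the square root of the sum of D_ij² / (p_i q_j) divided by min(k, l) − 1. The function name `ld_summary` and the input layout are hypothetical, and the sketch does not reproduce the significance testing performed in the study.

```python
import math
from collections import defaultdict


def ld_summary(hap_freqs):
    """hap_freqs: {(allele_locus1, allele_locus2): frequency}, summing to 1.

    Returns (normalized disequilibrium D', Cramer's V) for the locus pair.
    """
    p = defaultdict(float)  # allele frequencies at the first locus
    q = defaultdict(float)  # allele frequencies at the second locus
    for (a, b), h in hap_freqs.items():
        p[a] += h
        q[b] += h

    d_prime = 0.0
    chi_sq_term = 0.0
    for a, pa in p.items():
        for b, qb in q.items():
            if pa * qb == 0:
                continue  # allele absent from the sample; nothing to add
            d = hap_freqs.get((a, b), 0.0) - pa * qb
            if d < 0:
                d_max = min(pa * qb, (1 - pa) * (1 - qb))
            else:
                d_max = min(pa * (1 - qb), (1 - pa) * qb)
            if d_max > 0:
                d_prime += pa * qb * abs(d) / d_max
            chi_sq_term += d * d / (pa * qb)

    k = min(len(p), len(q))
    cramers_v = math.sqrt(chi_sq_term / (k - 1)) if k > 1 else 0.0
    return d_prime, cramers_v
```

Both values range from 0 (alleles at the two loci associate at random) to 1 (each allele at one locus is found with a single allele at the other).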

Multiple comparisons in both the LD and neutrality tests of homozygosity were addressed using the Benjamini & Hochberg correction method [82]. Alpha diversity indices, namely species richness (number of alleles) [83,84], the Shannon index (entropy) [85], and the Simpson (Gini-Simpson) index (the probability that two alleles taken at random from the sample are of different types) [86,87], were used to measure within-population diversity. The Jaccard similarity index [88], a measure of beta diversity, was employed to determine heterogeneity between the populations. Furthermore, a rarefaction analysis was performed to gain quantitative insight into the number of alleles observed in each population as a function of the number of participants. Similarly, as a measure of genetic distance between populations, haplotype frequency data from each country were reduced in dimensionality using classical multidimensional scaling (MDS) to create a 2-dimensional genetic cartograph. On this map, two countries lie close to each other if their distributions of HLA alleles are similar, relative to the distributions observed in the other countries.
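The within- and between-population indices named above can be sketched as follows. This is an illustration rather than the code used in the study: the function names are hypothetical, the Shannon index is assumed to use the natural logarithm, and the Gini-Simpson index is assumed to be the with-replacement form, one minus the sum of squared allele frequencies.

```python
import math
from collections import Counter


def richness(counts):
    """Number of distinct alleles observed at a locus."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon entropy of the allele frequency distribution (natural log)."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)


def gini_simpson(counts):
    """Probability that two alleles drawn with replacement differ."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def jaccard(alleles_a, alleles_b):
    """Shared alleles divided by all alleles seen in either population."""
    a, b = set(alleles_a), set(alleles_b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Toy allele calls for two hypothetical populations at one locus.
pop_1 = Counter(["B*15:03", "B*15:03", "B*58:02", "B*42:01"])
pop_2 = Counter(["B*58:02", "B*53:01", "B*53:01", "B*07:02"])
print(richness(pop_1), shannon(pop_1), gini_simpson(pop_1))
print(jaccard(pop_1, pop_2))
```

The alpha indices depend on allele counts within one population, whereas the Jaccard index uses only the presence or absence of alleles, so it is sensitive to rare alleles and to differences in sample size.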

Figure 1| Most frequent (≥ 5%) HLA alleles within each population. Differences in allele frequencies reflect the diversity of HLA genetic architecture among the populations. HLA-B alleles have relatively lower frequencies.
It was observed that HLA-B*15:03, HLA-B*58:01 and HLA-B*58:02 were among the most frequent (≥ 5%) alleles in all populations (though at different frequency levels) except the US populations (Figure 1). Also, among the most frequent alleles, HLA-B*57:03 was only present in AFAM (10.8%) and Zambia (5.3%). HLA-B*27:05 and HLA-B*57:01 were only present in the Caucasian population, whereas HLA-B*81:01 was only present in the AFAM population among the most frequent alleles. HLA-B*07:02 was among the most frequent alleles in all populations except Zambia (Figure 1, Supplementary Table

Figure 2| Graphs of Shannon (A) and Simpson (B) indices across African populations and (C) non-clustered heatmap of the similarity index (Jaccard) among populations. A and B explain the in-country diversity. The higher the index value, the greater the diversity of the population at that locus. C quantifies the genetic similarities (in %) among populations.

Figure 3| Cartography of the genetic distance in global haplotypes between populations. (A), (B), (C) and (D) represent haplotypes A~B, A~C, B~C, and A~B~C, respectively. The figure visualizes the genetic distance between African and US ethnic populations.
The Zulu tribe from South Africa exhibited genetic closeness to tribes in Zambia and Uganda at all loci except B:C where it was relatively farther from other African tribes.

Figure 4| Rarefaction curves by HLA gene and population, estimating allelic diversity (richness). The curves show how more allelic variants are detected at each locus as more participants are sampled in each population.

The topmost estimated two- and three-locus haplotypes in each population are summarized in Table 7. At loci A:B, A:C and A:B:C, haplotypes A*30:01~B*42:01, A*30:01~C*17:01 and A*30:01~B*42:01~C*17:01 were the topmost in the South African and Zambian populations. Similarly, haplotypes A*02:01~B*15:03 and A*02:01~B*15:03~C*02:10 were detected at similar frequencies as the topmost haplotypes in the Rwandan and Ugandan populations. South Africa and Rwanda reported the same top haplotype at locus pair B:C (B*58:02~C*06:02) at similar frequencies. All the populations reported different topmost haplotypes at the three-locus association (Table , and 5 for full list). In Kenya, it was observed that HLA-B*07:02 was inherited more often with HLA-B*45:01 than with other alleles. Also, HLA-B*45:01 (Allele_1 and Allele_2) was observed to be inherited more often with HLA-B*15:10 and HLA-B*58:02. Furthermore, HLA-B*42:01 was observed to be inherited more often with the HLA-B*53:01 and HLA-B*58:01 alleles. Similarly, the HLA-B*15:03 and HLA-B*58:02 alleles were observed to be inherited together with most of the alleles in Rwanda. In the Ugandan and Zambian populations, HLA-B*53:01 had the highest inheritance pattern with other alleles. HLA-B*58:02 had the highest pattern of inheritance in both Rwanda and South Africa, followed by HLA-B*15:03 in Rwanda and HLA-B*42:01 in South Africa.

Figure 5| Plots showing how frequently HLA-B alleles were inherited together by participants in each country.

Table 1|
Number of Class I alleles according to country of subjects

2. It was observed that at loci A and B, Bw4 group alleles were either observed at an extremely low frequency or not observed at all in any of the populations. HLA-A*74:01 (a protective allele) was observed at relatively higher frequencies in the Kenyan, Rwandan, Ugandan, Zambian and African American populations than in South Africa, and at a very low frequency in the Caucasian population. Also, HLA-A*25:01 was not observed in any of the African populations and had a very low frequency in the African American population. The HIV disease-susceptible allele at locus A (A*36:01) was observed at a relatively low frequency in the South African and Caucasian populations compared to other populations. Similarly, the Bw4 and Bw6 alleles were technically not observed in the African populations at locus B, although HLA-B*39:01 was observed in the Zambian population at an extremely low frequency. Among the protective alleles, HLA-B*27:05 was observed at a low frequency in South Africa among the African populations compared to the Caucasian and African American populations. Conversely, HLA-B*42:01 and B*44:03 were observed at high frequencies in both the South African and Zambian populations. HLA-B*52:01 was only observed in Kenya and South Africa among the African populations, at low frequencies. While HLA-B*57:01 was observed at extremely low frequencies in African populations, it was observed at a high frequency

B*07:02 was observed at a relatively high frequency in Kenya compared to other African countries, while the Caucasian population had the highest frequency of B*07:02 among all populations.

Also, among the HIV disease-susceptible alleles at locus B, HLA-B*08:01 was observed at a relatively high frequency in the Caucasian population compared to other populations. Similarly, this allele was at a high frequency in South Africa compared to other African populations. Interestingly, HLA-B*58:02 was observed at higher frequencies in the African populations compared to the US populations. This allele seems to have a very high frequency in South Africa (11.4%) and Rwanda (11.3%) compared to other African countries.

Table 2|
Grouping of alleles and allele frequencies among populations at different loci

Genetic Basis for Observed Differences
Furthermore, this study employed the Hardy-Weinberg equilibrium (HWE), the neutrality test of homozygosity, haplotypes, the pairwise linkage disequilibrium test, and inheritance patterns of alleles at different loci to gain insight into the basis for the observed genetic diversity in each population at different loci. The HWE and neutrality test of homozygosity are fundamental

Table 3|
Exact test using Markov chain for HWE parameters for the five countries.

Table 4|
Exact test using Markov chain for HWE parameters for tribes. (*Statistically significant). Obs. Het., observed heterozygosity; Exp. Het., expected heterozygosity; No. of Gen., number of genotypes.

Table 5|
Slatkin's implementation of the EW homozygosity test of neutrality for the five African countries. *Statistically significant.

Table 6|
Slatkin's implementation of the EW homozygosity test of neutrality for tribes. *Statistically significant.

Table 7|
Topmost haplotypes at different loci across populations

Table 8|
Pairwise linkage disequilibrium across countries

Table 9|
Pairwise linkage disequilibrium across tribes