A comprehensive ethnic-based analysis of alpha thalassaemia allele frequency in northern Thailand

Alpha (α)-thalassaemia is one of the most prevalent hereditary blood disorders, commonly affecting Southeast Asian people, with the highest incidence (30–40%) being seen in northern Thailand. However, this high incidence was estimated without consideration of the variations between ethnic populations or of their geographical location. To address this issue, a total of 688 samples from 13 different northern Thai ethnic groups (30 villages), categorized into three linguistic groups, were genotyped for deletional alpha-thalassaemia (-α3.7, -α4.2, --SEA and --THAI) and/or non-deletional alpha-thalassaemia (αCS and αPS) via multiplex gap-PCR and dot-blot hybridization, respectively. The α+-thalassaemia (-α3.7, -α4.2, αCS and αPS) and α°-thalassaemia (--SEA and --THAI) allele frequencies (with 95% confidence intervals) were the highest in the Sino-Tibetan group [0.13 (0.08–0.18)] and the Tai-Kadai group [0.03 (0.02–0.05)], respectively. With regard to ethnicity, the α+- and α°-thalassaemia allele frequencies varied amongst the ethnic groups. The highest α+-thalassaemia allele frequency was found in the Paluang [0.21 (0.10–0.37)], while the α°-thalassaemia allele frequency was the highest in the Yuan [0.04 (0.01–0.10)]. These detailed results on alpha thalassaemia allele frequency and genetic diversity amongst the northern Thai ethnic groups demonstrate the need for ethnicity-based thalassaemia prevention programs.

carrier state, while loss of three (α+/α°) results in Hb H disease, in which the pathology is primarily mediated by the relative excess of β-chains; these can form tetramers of β-globin (β4) that promote oxidative haemolysis. Loss of four α-globin genes (α°/α°) results in fatal Hb Bart's hydrops fetalis syndrome. Where loss of three α-globin genes occurs through inheritance of a combination of deletional and non-deletional α-thalassaemia, presentation can be more severe than that resulting from inheritance of deletional α-thalassaemia only. Consequently, the clinical presentation of a person with inherited α-thalassaemia alleles ranges from asymptomatic to blood transfusion dependence to premature death in infancy, depending on the number of α-globin alleles affected 8.
The morbidity and mortality of α-thalassaemia associated with significant clinical symptoms are therefore observed in haemoglobin H disease (Hb H, three missing functional α-globin alleles) and haemoglobin Bart's hydrops fetalis syndrome (Hb Bart's, a complete loss of functional α-globin alleles) 10. In Thailand, due to the high prevalence of α-thalassaemia carriers, there is a significant number of patients with Hb H disease (7/1,000 newborns) 11. More importantly, in northern Thailand, 0.33% of 52,625 fetuses were reported to have Hb Bart's hydrops fetalis 12. These figures confirm the necessity for accurate and effective management of α-thalassaemia in this part of the world.
Several α-thalassaemia surveys in the northern part of Thailand have demonstrated a high (15-40%) prevalence of α-thalassaemia alleles in the northern Thai population 13,14. However, population sampling in most surveys was conducted on couples who went to hospitals for screening, so the prevalence observed was primarily determined from the overall population of the upper northern part of Thailand. Interestingly, a recent population-based study in the northern Thai population showed for the first time that the overall prevalence of α-thalassaemia in upper northern Thailand was 24% (33 of 141) and, more importantly, highlighted significantly different prevalences of α-thalassaemia amongst ethnic groups, ranging from 0 to 50% of the populations examined 15. However, that study was limited by a low number of samples and sampling areas for some ethnic groups, and in particular, no hill-tribe groups belonging to the Sino-Tibetan and Hmong-Mien linguistic families were included. To address these issues, this study analysed a large cohort comprising ethnic populations from numerous sampling areas throughout the northern part of Thailand, including northern minorities such as the Shan, Karen and Htin. Thus, the objective of this study is to provide more comprehensive and meaningful data on the frequency of common α-thalassaemia alleles in northern Thai people and, in particular, in each ethnic population. This information will serve as a more practical basis for developing genetic counseling in the long-term effort to reduce the burden of Hb H disease and Hb Bart's hydrops fetalis syndrome in the country.
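The link between the number of functional α-globin genes and the clinical category can be sketched in a few lines of Python. This is a simplified illustration and not part of the study: the function name, the dictionary layout and the wording of the two carrier labels are assumptions, and a plain gene count ignores the point that non-deletional alleles can give a more severe picture than the count suggests.

```python
# Illustrative sketch only; names and labels are assumptions, not the study's software.
# Functional α-globin genes remaining on one chromosome for each haplotype screened.
FUNCTIONAL_GENES = {
    "αα": 2,      # normal haplotype
    "-α3.7": 1,   # α+ : single-gene deletion
    "-α4.2": 1,   # α+ : single-gene deletion
    "αCS": 1,     # α+ : non-deletional (Hb Constant Spring)
    "αPS": 1,     # α+ : non-deletional (Hb Pakse)
    "--SEA": 0,   # α° : both genes deleted
    "--THAI": 0,  # α° : both genes deleted
}

# Clinical category by total number of functional genes (0 to 4).
CATEGORY = {
    4: "normal",
    3: "carrier (one gene affected)",
    2: "carrier (two genes affected)",
    1: "Hb H disease",
    0: "Hb Bart's hydrops fetalis",
}


def classify(hap1: str, hap2: str) -> str:
    """Return the clinical category for a genotype given as two haplotypes."""
    return CATEGORY[FUNCTIONAL_GENES[hap1] + FUNCTIONAL_GENES[hap2]]


print(classify("-α3.7", "--SEA"))
print(classify("--SEA", "αα"))
```

The first call corresponds to a compound heterozygote for an α+ and an α° allele, which leaves a single functional gene.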

Results
A total of 688 DNA samples from people belonging to 13 ethnic groups, classified into three linguistic groups (Tai-Kadai, Austro-Asiatic and Sino-Tibetan), were analysed for four common deletional α-thalassaemias (-α3.7, -α4.2, --SEA, --THAI) by multiplex gap-PCR with nine specific primers (Fig. 1a), and 350 of the 688 samples were additionally analysed for two mutational α-thalassaemias (αCS and αPS). Of the six common α-thalassaemia types screened for, three deletions (-α3.7, -α4.2, --SEA) and one point mutation (αCS) were found in this cohort.
The overall prevalence of the six common α-thalassaemia types assessed in this cohort of 13 ethnic groups is 19.51% (Table 1), with a frequency of 0.1008 (0.0788-0.1247) (Table 2). Almost all the α-thalassaemia detected in this study was heterozygous, except for one case of Hb H disease (-α3.7/--SEA), which was detected in one sample from the Yong ethnic group (Table 1).
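Prevalence counts affected people, whereas allele frequency counts affected chromosomes, so when nearly all carriers are heterozygous the allele frequency is expected to be close to half the prevalence. The sketch below illustrates the two calculations on a small invented cohort. The function names and genotype counts are hypothetical, and it assumes every sample was screened for every allele type, which was not the case here because only 350 of the 688 samples were tested for point mutations.

```python
# Illustrative sketch with invented counts; not the study's data or software.

def prevalence(genotypes, normal="αα"):
    """Fraction of individuals carrying at least one non-normal haplotype."""
    affected = sum(1 for h1, h2 in genotypes if h1 != normal or h2 != normal)
    return affected / len(genotypes)


def allele_frequency(genotypes, normal="αα"):
    """Fraction of chromosomes (two per individual) carrying a non-normal haplotype."""
    mutant = sum((h1 != normal) + (h2 != normal) for h1, h2 in genotypes)
    return mutant / (2 * len(genotypes))


# Hypothetical cohort of 100 people: 80 normal, 19 heterozygous carriers,
# and 1 compound heterozygote (two affected chromosomes).
cohort = (
    [("αα", "αα")] * 80
    + [("-α3.7", "αα")] * 19
    + [("-α3.7", "--SEA")]
)

print(prevalence(cohort))        # 0.2   (20 of 100 people)
print(allele_frequency(cohort))  # 0.105 (21 of 200 chromosomes)
```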
The combined analysis of α+- and α°-thalassaemia allele frequency may overestimate the incidence of the disease in the population. More importantly, the allele frequency of α°-thalassaemia determines the burden of significant α-thalassaemia syndromes. Therefore, the α+-thalassaemia allele frequency was analysed separately from the α°-thalassaemia allele frequency. The results showed that the α+-thalassaemia allele frequency was the highest in the Sino-Tibetan group [0.1262 (0.0841-0.1794)] (Table 2).
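One way to see why the α° frequency drives the burden of severe disease is to assume Hardy-Weinberg proportions (random mating within a group and no selection), an assumption that this study does not test. Writing $q_+$ and $q_0$ for the α+ and α° allele frequencies (symbols introduced here for illustration only), the expected genotype frequencies at conception would be:

```latex
\begin{align}
P(\alpha^{0}/\alpha^{0}) &= q_0^{2}        && \text{(Hb Bart's hydrops fetalis)} \\
P(\alpha^{+}/\alpha^{0}) &= 2\,q_{+}\,q_0  && \text{(Hb H disease)}
\end{align}
```

For example, an α° frequency of $q_0 = 0.03$ would give $q_0^{2} = 0.0009$, or about 0.9 per 1,000 conceptions, whereas a group with $q_0 = 0$ would be expected to have neither syndrome, however high its α+ frequency.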

Discussion
Alpha thalassaemia is a global health problem and a growing burden 16,17, particularly in Southeast Asian ethnic groups. The high prevalence (30%) of α-thalassaemia previously reported in northern Thailand 13,14,18 has been shown to vary from region to region and by ethnic group 15. However, few studies describing the frequencies of α-thalassaemia in Thai ethnic groups have been conducted, and the data were limited by small sample sizes and by the screening methods used 9,14 (Table 3). Our first survey, which used molecular analysis to identify α-thalassaemia amongst 8 ethnic groups, had a small sample size 15 but showed distinct variations between ethnic groups. Furthermore, the population of upper northern Thailand comprises a number of ethnic groups that can be categorized into three major linguistic groups: the Tai-Kadai group, who make up the majority of the present-day northern Thai population; the Austro-Asiatic group, who are recognized as the descendants of the prehistoric inhabitants of northern Thailand and mostly reside in remote areas; and the hill-tribes group, which comprises ethnic groups belonging to the Sino-Tibetan and Hmong-Mien linguistic families. Within this last group, the Karen have the largest population amongst the hill-tribes of northern Thailand 19,20. This is further complicated by the occurrence of diverse genetic backgrounds amongst northern ethnic groups 21,22, and therefore the overall incidence of α-thalassaemia from previous surveys might not represent the situation accurately. Thus it was of interest to conduct a larger survey to determine the prevalence more accurately. Therefore, in this study, a larger cohort comprising individuals from 13 ethnic groups residing in northern Thailand was surveyed for six common α-thalassaemia types. The overall frequency of the six types of α-thalassaemia investigated in this study is 0.1008 (0.0788-0.1247) (Table 2), representing a prevalence of 19.51% (Table 1). The prevalence data from this and our previous cohort 15 are comparable, but lower than previous reports from the general Thai population, which put the prevalence of α-thalassaemia at 26.42% (28/106) 9 (Table 3).
Table 1. The number of affected persons according to the genotype analysis, with the prevalence (%) of α-thalassaemia in the population residing in northern Thailand. ND = point-mutational alpha-globin gene anomalies (αCS, αPS) not verified by the dot-blot hybridization method. *Samples enrolled in this study that were screened for four deletional alpha-thalassaemia types (-α3.7, -α4.2, --SEA and --THAI). **Samples enrolled in this study that were screened for four deletional (-α3.7, -α4.2, --SEA and --THAI) and two mutational (αCS, αPS) alpha-thalassaemia types.
In accordance with our previous study 15 and the findings of other studies 14,16,18, the -α3.7 deletion is the most common α-thalassaemia present amongst Thais and Thai ethnic groups, followed by --SEA 6,7. Interestingly, the heterozygous --SEA deletion is very common in the Tai-Kadai linguistic group. The data are also consistent with a study undertaken in the Yunnan province of southwestern China, which showed that the --SEA deletion is the most common α-thalassaemia type 23, and support evidence that the Tai-Kadai speaking people living in northern Thailand migrated there from the southwest of China 24,25. They also support the genetic diversity of this abnormal gene between population groups.
Similarly, evidence for the genetic diversity of this gene was found with the αCS mutation. While the overall allele frequency of the αCS allele was 0.0100 (0.0040-0.0205) (Table 2), it was found only in the Tai-Kadai linguistic group (Yuan, Lue and Yong) and was not detected in the Austro-Asiatic groups. However, αCS is the most prevalent α-globin variant in the Southeast Asian population 26
Table 2. The allele frequency of α-thalassaemia in the population residing in northern Thailand. ND = point-mutational alpha-globin gene anomalies (αCS, αPS) not verified by the dot-blot hybridization method. *Samples enrolled in this study that were screened for four deletional alpha-thalassaemia types (-α3.7, -α4.2, --SEA and --THAI). **Samples enrolled in this study that were screened for four deletional (-α3.7, -α4.2, --SEA and --THAI) and two mutational (αCS, αPS) alpha-thalassaemia types.
Table 3. Previous reports of α-thalassaemia prevalence in the population residing in northern Thailand.
Table 4. Linguistic group, ethnicity, location and number of samples of the 13 ethnic groups. *Previously reported groups screened for 6 types of deletion and point mutation of the α-thalassaemia gene, included for comparison 15.
Scientific Reports | 7: 4690 | DOI:10.1038/s41598-017-04957-2
α°-thalassaemia, a more severe presentation than deletional Hb H disease can occur 26. In contrast to the native Thai population, we did not find the --THAI deletion or the Hb Pakse α-globin variant in this cohort. This latter observation is in accordance with an earlier study which showed that Hb Pakse was not found in the population residing in northern Thailand 3. With regard to the frequency of α-thalassaemia observed in each linguistic group, this study detected considerable variation amongst the different ethnic groups. The highest frequency of α-thalassaemia [0.1262 (0.0841-0.1794): α+ = 0.1262 (0.0841-0.1794), α° = 0] was seen in the Sino-Tibetan (Karen) linguistic group, and the -α3.7 deletion was the only α-thalassaemia type found in this group. Importantly, the Paluang, Karen and Shan ethnic groups showed a very high frequency of the -α3.7 deletion. Since these three peoples live along the Thailand-Myanmar border 27, which is a malaria-endemic area 28, the high frequency of -α3.7 may reflect natural selection due to protection against severe malaria infection 29. Moreover, the -α3.7 deletion is present in all three Karen ethnic groups (Skaw, Pwo and Padong) at very similar levels, supporting the common origin of these ethnic groups and suggesting that the Karen have a homogeneous genetic background. The frequency of α-thalassaemia is also high in the Tai-Kadai linguistic group. Interestingly, this linguistic group shows the highest frequency of heterozygous α-thalassaemia 1 (--SEA), which is characterized by deletion of two α-globin genes, and this was supported by the detection of one individual with Hb H disease in this group. In contrast to the other ethnic groups in the Tai-Kadai linguistic group, the Lue show significant gene diversity, with 4 types of α-thalassaemia detected in this ethnic group. This is likely to be the result of a founder effect and/or inter-ethnic marriage between the Lue and other ethnic groups during their migration through Laos. The predominance of the --SEA and αCS types in the Tai-Kadai linguistic group elevates their risk of conceiving fetuses with Hb Bart's hydrops fetalis or Hb H-CS disease. The lowest frequency was recorded in the Austro-Asiatic linguistic group, since no α-thalassaemia was detected in any of the 48 Lawa people investigated.
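The risk to the offspring of carrier couples follows from simple Mendelian segregation, which the sketch below enumerates. It is an illustration rather than part of the study: the function name is hypothetical, each parent is assumed to pass on either haplotype with probability 1/2, and new mutations are ignored.

```python
# Illustrative sketch only; the function name and data layout are assumptions.
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for a couple.

    Each parent is a pair of haplotypes; each haplotype is transmitted with
    probability 1/2, so the four combinations are equally likely.
    """
    counts = Counter(tuple(sorted(pair)) for pair in product(parent1, parent2))
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


# Both partners heterozygous for the --SEA deletion.
couple_a = offspring_genotypes(("--SEA", "αα"), ("--SEA", "αα"))
print(couple_a[("--SEA", "--SEA")])  # 1/4: no functional α-globin genes

# One partner carries --SEA, the other carries αCS.
couple_b = offspring_genotypes(("--SEA", "αα"), ("αCS", "αα"))
print(couple_b[("--SEA", "αCS")])    # 1/4: α° combined with αCS
```

In the first couple, one conception in four is expected to inherit no functional α-globin gene; in the second, one in four is expected to inherit the α°/αCS combination.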

Conclusion
Our study presents the results of screening a large cohort representing 13 ethnic groups from northern Thailand for α-thalassaemia. As the prevalence of α-thalassaemia is relatively high and the majority of these groups are still unaware of their thalassaemia status, couples who are members of ethnic populations at particular risk of α°-thalassaemia (--SEA, --THAI), such as the Yuan, Shan, Lue and Yong, should be advised to undergo haematological screening before planning a pregnancy in order to control the severe types of α-thalassaemia. Future studies could examine the whole α-globin locus to determine whether novel α-globin gene abnormalities exist that are unique to a particular ethnic group.

Materials and Methods
Study populations. Northern Thailand has 18 officially recognized ethnic populations 19,20. For this survey, samples were obtained from 13 ethnic groups from 30 villages distributed across five provinces of northern Thailand. The cohort comprised (a) 278 newly genotyped samples and (b) 269 subjects previously genotyped for hemoglobin E whose α-thalassaemia genotypes have not been reported 30. In addition, (c) α-thalassaemia genotypic data from 141 subjects, as previously reported 15, were included, giving a total of 688 samples. The criteria for population sampling were as described elsewhere 22,24,30,31. Briefly, all volunteers enrolled in this study were healthy, over 20 years of age, unrelated, and recognized as members of the study ethnic population for at least three generations with no admixture from other populations. The target sample size was 30 samples per ethnic group. Some difficulties arose in obtaining an appropriate number of samples from certain ethnic groups: the Padong Karen, who practice endogamous marriage; the Palaung and Blang, who have small population sizes; and the Khuen, who traditionally marry people from other ethnic groups (interethnic marriage), giving admixed offspring who could not be recruited for this study. Nevertheless, the sample sizes for these groups are still close to the range recommended for population analyses by Jobling et al., 2013 (20-50 individuals per population) 32. The locations of the sampling areas and further details are shown in Table 4 and Fig. 3. All subjects in categories (a) to (c) were enrolled after informed consent. Ethical approval of all methods and experimental protocols, in accordance with the guidelines, was as follows: the Yong ethnic group in category (a) and all subjects in category (b) were approved by the Human Experimentation Committee, Research Institute for Health Sciences, Chiang Mai University, Thailand. All subjects in category (c) were approved by the Policy Review Board of the Pan Asia SNP consortium as described elsewhere 33. It should be noted that both the Lue and the Htin ethnic population samples of category (a) were collected more than 10 years ago, and therefore oral informed consent was obtained with the assistance of the head of each village.
DNA extraction. Five milliliters of peripheral blood was collected from each subject after individual informed consent, and total genomic DNA was extracted using an inorganic salting-out protocol as described elsewhere 34. The quality and quantity of the extracted genomic DNA from all samples were examined by 1% agarose gel electrophoresis and spectrophotometry (OD260/OD280). All samples were kept at -20 °C until use.
Multiplex gap-PCR analysis of the deletional alpha globin gene. The four most common deletional α-thalassaemias in the Thai population (-α3.7, -α4.2, --SEA and --THAI) were investigated in this study. All samples were genotyped for the four deletional types of α-thalassaemia by multiplex gap-polymerase chain reaction, modified from Chong and colleagues 35. Briefly, nine specific primers were used in the PCR reaction: α2/3.7-F, 3.7-R, α2-R, 4.2-F, 4.2-R, SEA-F, SEA-R, THAI-F and THAI-R. Each PCR reaction was performed in a single tube for simultaneous amplification of the different amplicons, with an initial denaturation at 95 °C for 15 minutes followed by thirty-five cycles of denaturation at 98 °C for 45 seconds, annealing at 60 °C for 1.30 minutes and extension at 72 °C for 2.15 minutes, with a final extension at 72 °C for 5 minutes after the last cycle. PCR products were analysed by 1.5% agarose gel electrophoresis and compared with positive controls, as shown in Fig. 1a. To ensure the genotyping accuracy of the multiplex gap-PCR, every round of PCR amplification of unknown samples was performed in parallel with the positive controls (Fig. 1a; lanes 1, 2, 3 and 4 are --THAI/αα, --SEA/αα, -α4.2/αα and -α3.7/αα, respectively) and a negative control (lane 5, genotyped as αα/αα). The DNA sample from each unknown individual was genotyped at least in duplicate.
Dot-blot hybridization analysis of the mutational alpha globin gene. A total of 350 samples were screened for two common types of non-deletional α-thalassaemia (αCS and αPS). The dot-blot hybridization method was employed as described elsewhere 36. The α-globin gene was amplified by PCR using primers αF and α2 R. The 331 bp PCR products were validated by 1.5% agarose gel electrophoresis and were subsequently hybridized with specific probes for αCS and αPS as well as a normal probe. The genotype of each unknown sample was interpreted in parallel with controls, which consisted of a normal sample and homozygous αCS and αPS samples. A blue spot was interpreted as a positive signal (Fig. 1b). The genotyping quality of the dot-blot hybridization was ensured by these controls (Fig. 1b; the positive controls are samples of known αCS-homozygous and αPS-homozygous genotype, while the negative control is αα/αα). Unknown samples were always tested in parallel with the controls, and the analysis was conducted in duplicate.
Statistical Methods. All allele frequencies were calculated using Microsoft Excel (version 2016, Microsoft Corporation, USA), and the exact binomial 95% confidence intervals were computed with the add-in functions BinomLow and BinomHigh derived from JavaStat. The bar graph was generated with PRISM software (version 7.00, GraphPad Software, Inc., USA).
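For readers without the Excel add-ins, an exact binomial interval of the Clopper-Pearson kind can be sketched in plain Python. This is an illustrative stand-in rather than the add-in code used in the study: the function names are hypothetical, the bounds are found by simple bisection on the binomial tail probabilities, and the counts in the usage line (26 mutant alleles among 206 chromosomes) are assumed for demonstration and are not taken from the paper's tables.

```python
# Illustrative sketch of an exact (Clopper-Pearson style) binomial interval.
# Function names and example counts are assumptions, not the study's code.
import math


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1, summed via log-pmf."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve(func, target, increasing, iterations=60):
    """Bisection on (0, 1) for the p at which a monotone func(p) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, confidence=0.95):
    """Exact two-sided interval for k mutant alleles among n chromosomes."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    tail = (1.0 - confidence) / 2
    # Lower bound: smallest p with P(X >= k) = tail (0 when k == 0).
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1.0 - _binom_cdf(k - 1, n, p), tail, increasing=True)
    # Upper bound: largest p with P(X <= k) = tail (1 when k == n).
    upper = 1.0 if k == n else _solve(
        lambda p: _binom_cdf(k, n, p), tail, increasing=False)
    return lower, upper


# Hypothetical counts: 26 mutant alleles among 206 chromosomes (103 people).
low, high = exact_binomial_ci(26, 206)
print(f"{26 / 206:.4f} ({low:.4f}-{high:.4f})")
```

When no mutant allele is observed, the lower bound is zero and the upper bound has the closed form 1 - (α/2)^(1/n), which offers a quick check on any implementation.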
Data availability statement. The data sets generated and analysed during the current study are available within the paper.