Comprehensive Screening of Genetic Variants in the Coding Region of F8 in Severe Hemophilia A Reveals a Relationship with Disease Severity in a Colombian Cohort

Hemophilia A is an X-linked disorder characterized by quantitative deficiency of coagulation factor VIII (FVIII) caused by pathogenic variants in the factor 8 (F8) gene. Our study’s primary objective was to identify genetic variants within the exonic region of F8 in 50 Colombian male participants with severe hemophilia A (HA). Whole-exome sequencing and bioinformatics analyses were performed, and bivariate analysis was used to evaluate the relationship between the identified variants, disease severity, and the risk of inhibitor formation. Of the 50 participants, 21 carried a total of 17 different pathogenic F8 variants (var). Of these variants, 70% (var = 12) were premature truncation variants (nonsense, frameshift), 17.6% (var = 3) were missense variants, and 11.7% (var = 2) were splice-site variants. Interestingly, 35% (var = 6) of the identified variants have not been previously reported in the literature. All patients with a history of positive inhibitors (n = 4) had high-impact genetic variants (nonsense and frameshift). When investigating the relationship between variant location (heavy versus light chain) and inhibitor risk, 75% (n = 3) of the participants with inhibitors had variants located in the F8 light chain (p = 0.075), suggesting that conserved domains are associated with higher inhibitor risk. In summary, we identified genetic variants within F8 that may influence inhibitor development in Colombian patients with severe HA. Our results provide a basis for future studies and the development of further personalized treatment strategies in this population.


Introduction
Hemophilia A (HA) is an inherited bleeding disorder characterized by quantitative deficiency of coagulation factor VIII (FVIII) and caused by pathogenic variants in the factor 8 (F8) gene located on the X chromosome (Xq28) [1][2][3][4]. The F8 gene has 26 exons and 187,000 base pairs (bp), and encodes a high-molecular-weight glycoprotein that is 2351 amino acids (aa) long [3,5,6]. Mature FVIII protein consists of six domains arranged in the following order: (A1-A2-B)-(A3-C1-C2) from the amino terminus to the carboxyl terminus [5]. The first group of domains is known as the "heavy chain" and the second as the "light chain". Currently, there are more than 3,756 reported variants within the F8 gene known to be associated with HA [7][8][9].
Hemophilia A is estimated to affect one in 5000-10,000 live male births [5,10]. According to the 2023 Colombian national registry, developed by the Colombian Fund for High-Cost Diseases (CAC), the incidence and prevalence of HA in Colombia were 0.83 and 4.68 for every 10,000 people, respectively. Out of 2421 reported patients with HA, 52.78% were categorized as having severe disease [11].
Standard treatment for patients with severe HA (residual FVIII level less than 1%) includes the use of FVIII concentrates, with the primary goal of achieving a minimum plasma concentration greater than or equal to 1% between scheduled FVIII infusions [12]. The efficacy of replacement-based therapies can be affected by the development of immunoglobulin G (IgG) FVIII-neutralizing alloantibodies. These inhibitors can develop in up to 25-30% of patients with severe HA. The presence of inhibitors makes treatment with FVIII concentrates ineffective, increases the incidence of bleeding events, and affects the overall quality of life of HA patients [1,13]. Inhibitors are classified based on their peak inhibitory titers, with those with historic levels < 5 Bethesda units (BU) classified as low-responding (LR) inhibitors and those with ≥ 5 BU as high-responding (HR) inhibitors. As of 2023, only 6% of Colombian HA patients were reported as having positive inhibitor titers, 50% of them being LR cases [11].
Given the genetic nature of HA, identifying F8 variants could allow for the detection of variants that may be associated with the severity of the condition and inhibitor formation [14]. Prior studies have usually focused on F8 mutations localized within intron 22 and intron 1, as they account for approximately 45% and 9% of HA cases, respectively, and are the most common mutations reported in HA patients [15]. Considering that the F8 gene-coding region could have a significant number of other mutations, this study aimed to identify variants in the exonic region of F8 in a cohort of Colombian patients with severe HA using whole-exome sequencing (WES), and to examine their relationship with disease severity and risk for inhibitor formation.

Study Design
This cross-sectional study was performed via open invitation to 50 male participants with severe HA (FVIII concentration < 1%) affiliated with Integral Solutions SD SAS, a specialized medical care center in Colombia. The study was approved by the Institutional Human Research Ethics Committee at CES University (Project ID: 1065).

Life 2024, 14, 1041

Sample Collection and Sequencing
A total of 5 mL of peripheral whole blood was collected from each participant. DNA was quantified via spectrophotometry (Nanodrop Lite) by measuring the absorbance at 260 nm, following the manufacturer's instructions. Library preparation was performed using an Agilent SureSelect M kit (v6.0). Sequencing was performed on a NovaSeq6000 platform (Illumina, San Diego, CA, USA), following the manufacturer's instructions. Variants were prioritized with the filtering algorithm in Varseq 2.3.0, applying the following strategy: quality (PASS and Missing); depth ≥ 20×; genotype quality ≥ 20.2; type of variant and frequency, filtered via zygosity; effect (LOF, Missense); and a pseudocontrol population frequency ≤ 1% according to gnomAD exomes, gnomAD genomes, 1000G, ESP, Kaviar, Beacoz, and Bravo. Pathogenicity prediction was performed by filtering with the thresholds CADD ≥ 14 (phred) and Revel ≥ 0.75 to identify variants predicted to be damaging by in silico bioinformatics tools (Figure 1). Pathogenicity assignment for the variants remaining after filtering was performed according to the 2015 American College of Medical Genetics and Genomics (ACMG) recommendations. Only variants classified as likely pathogenic and/or variants of unknown significance (VUS) were reported. To analyze copy number variation (CNV), our filtering strategy included deletions, duplications, p-value < 0.05, a span of 10,0000 bp, and clinical interpretations of "likely pathogenic" and "pathogenic".
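The thresholds listed above can be read as a chain of simple predicates. The following Python sketch is illustrative only and is not the Varseq implementation: the `Variant` record, its field names, the toy records, and the rule that a missing frequency or score does not exclude a variant are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical per-variant record; the field names are assumptions."""
    name: str
    depth: int                      # read depth at the site
    genotype_quality: float
    effect: str                     # e.g. "LOF", "Missense", "Synonymous"
    max_pop_freq: Optional[float]   # highest frequency across population databases
    cadd_phred: Optional[float]
    revel: Optional[float]


def passes_filters(v: Variant) -> bool:
    """Apply the thresholds quoted in the text, one after another."""
    if v.depth < 20 or v.genotype_quality < 20.2:
        return False
    if v.effect not in {"LOF", "Missense"}:
        return False
    # Assumption: a variant absent from every population database is kept.
    if v.max_pop_freq is not None and v.max_pop_freq > 0.01:
        return False
    # Assumption: an unavailable score does not exclude the variant.
    if v.cadd_phred is not None and v.cadd_phred < 14:
        return False
    if v.revel is not None and v.revel < 0.75:
        return False
    return True


# Toy records, not study data.
candidates = [
    Variant("toy_lof", 45, 99.0, "LOF", None, 35.0, None),
    Variant("toy_common", 60, 99.0, "Missense", 0.12, 22.0, 0.80),
    Variant("toy_low_depth", 8, 99.0, "LOF", None, 35.0, None),
    Variant("toy_benign_missense", 50, 99.0, "Missense", 0.001, 9.0, 0.20),
]
kept = [v.name for v in candidates if passes_filters(v)]
print(kept)
```

Writing each criterion as its own early return keeps the order of the filters visible and makes it easy to count how many variants each step discards.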

Protein Structure Prediction
The canonical amino acid sequence of F8 was obtained from the UniProt database [16]. Using this sequence as the reference, we modified it according to the amino acid changes found in each of the included patients. The FVIII protein carrying the corresponding changes was then modeled using the Swiss-Model Workspace [17]. In total, 15 variants were modeled. The generated models were downloaded in PDB format, together with the PDB file of the wild-type FVIII protein [16,18,19], and loaded into PyMOL [20] for visualization and editing. All downloaded predictions were then visualized, coloring the mutated protein domains with the same range of colors as the wild-type domains.
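The sequence-editing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure the authors ran: the helper names `apply_missense` and `apply_nonsense` are hypothetical, the sequence is a short toy string rather than the UniProt F8 entry, and frameshift variants are not handled because they require the coding DNA sequence.

```python
# Three-letter to one-letter amino acid codes (standard 20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def _check_reference(seq: str, pos: int, ref: str) -> None:
    """Verify that the 1-based position holds the expected reference residue."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != THREE_TO_ONE[ref]:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")


def apply_missense(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return the sequence with residue `pos` changed from `ref` to `alt`."""
    _check_reference(seq, pos, ref)
    return seq[: pos - 1] + THREE_TO_ONE[alt] + seq[pos:]


def apply_nonsense(seq: str, pos: int, ref: str) -> str:
    """Return the sequence truncated just before the new stop codon at `pos`."""
    _check_reference(seq, pos, ref)
    return seq[: pos - 1]


toy = "MKTAYDWC"  # toy sequence, not FVIII
missense_seq = apply_missense(toy, 6, "Asp", "Tyr")
truncated_seq = apply_nonsense(toy, 7, "Trp")
print(missense_seq, truncated_seq)
```

Checking the reference residue before editing guards against off-by-one errors and against mixing numbering schemes.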

Statistical Analysis
For univariate (descriptive) analyses, we applied frequency statistics to describe the behavior of each variable analyzed. For bivariate analyses, we applied Fisher's exact test, which is suited to small sample sizes. All estimates were performed at a confidence level of 95%. All statistical analyses were performed using SPSS Statistics for Windows version 23.
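For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2 × 2 table can be written directly from the hypergeometric distribution. The sketch below is a standard-library illustration, not the SPSS procedure used in the study. The function name is hypothetical, and the counts are an assumed reading of the text (3 of 7 light-chain and 1 of 12 heavy-chain participants with inhibitors, splice-site participants excluded) rather than a reproduction of Table 6, so the result need not match the reported p-value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, total)


# Assumed layout: rows = light chain, heavy chain;
# columns = inhibitor history, no inhibitor history.
p_value = fisher_exact_two_sided(3, 4, 1, 11)
print(round(p_value, 4))
```

The exact test is preferred over a chi-square approximation here because several expected cell counts fall below 5.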

F8 Variants Identified in Study Participants
Whole-exome sequencing analysis identified potentially pathogenic F8 variants in 21 of the 50 participants, with a total of 17 different F8 variants (Figure 2). The majority of participants had frameshift variants (52.4%, n = 11), followed by nonsense variants (23.8%, n = 5), missense variants (14.3%, n = 3), and splice-site variants (9.5%, n = 2) (Table 1). Some 57% (n = 12) of participants had variants located within the heavy chain, and 33.3% (n = 7) within the light chain. Two (9.5%) participants presented splicing variants (Tables 1 and 2). Out of the 21 participants with mutations, 16 had mutations resulting in early termination of protein expression. Of these, 5 presented variants generated by the change of a single nitrogenous base, leading to an immediate termination codon (nonsense mutation). Eleven participants presented deletions or duplications of one or more nitrogenous bases, which changed the reading frame (frameshift). No CNVs were detected.
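The percentages above follow directly from the participant counts. As a quick arithmetic check, the short Python snippet below recomputes them from the counts given in the text; the variable names are illustrative.

```python
# Participants per variant type, as reported in the text (21 carriers in total).
counts = {"frameshift": 11, "nonsense": 5, "missense": 3, "splice-site": 2}
total = sum(counts.values())

# Share of carriers for each variant type, rounded to one decimal place.
percentages = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

# Frameshift and nonsense variants both lead to early termination.
truncating = counts["frameshift"] + counts["nonsense"]
truncating_pct = round(100 * truncating / total, 1)

print(total, percentages, truncating, truncating_pct)
```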
Variants in the light chain were mainly found in the A3 and C2 domains. Most of these variants caused early termination of protein synthesis, destabilizing the overall structure of FVIII, which can lead to a loss of affinity and interaction between FVIII and other hemostatic proteins involved in the coagulation process. This was observed, for example, in the c.6972C>A p.(Tyr2324Ter) variant, which truncates the protein within amino acid positions 2303-2332, the carboxyl-terminal sequence that contains one of the 3 VWF binding regions; its loss will lead to early FVIII proteolysis [21]. The c.6666G>A p.(Trp2222Ter) and c.6721C>T p.(Gln2241Ter) variants exhibit early termination in the C2 domain, which is also associated with a decrease in affinity to VWF. These mutations also cause the loss of the aminophospholipid-binding sites that recognize phosphatidylserine residues on the surface of platelets, affecting platelet packaging [22]. The c.6045G>A p.(Trp2015Ter) and c.5882G> p.(Trp1961Ter) variants present early termination of protein expression in the A3 domain, leading to the loss of FVIII function and of multiple sites that contribute to the binding between LRP (low-density lipoprotein receptor-related protein) and FVIII [5]. Finally, the c.5447_5448dupGG p.(Gln1817GlyfsTer55) variant truncates protein synthesis and affects the FVIII region spanning amino acids 1803-1818, which is critical for the function of the cofactor [23].
Variants in the heavy chain, such as c.2724delT p.(Pro909HisfsTer15), c.4296_4300delTTCTC p.(His1434SerfsTer6), c.2609delC p.(Pro870LeufsTer7), c.2945dupA p.(Asn982LysfsTer9), and c.4379delA p.(Asn1460IlefsTer5), exhibit early termination in the FVIII B domain which, although cleaved upon FVIII activation, is important in the structure of FVIII to allow appropriate interaction with activated FIX (FIXa) and the tenase complex (X) [23]. Two additional missense variants were identified in the A2 domain: c.1795G>T p.(Asp599Tyr) and c.1946G>C p.(Cys649Ser). In the A1 domain, the most frequently identified variant was c.298dupT p.(Tyr100LeufsTer2), which is known to lead to premature termination of FVIII protein expression (Figure 3).
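Assigning a protein-level variant to a domain and chain is a lookup on the affected residue number. The Python sketch below illustrates the idea under explicit assumptions: the domain boundaries are approximate values in mature-protein numbering (published boundaries differ slightly between sources), the variant names are assumed to count from the start of the 2351-residue precursor so that the 19-residue signal peptide is subtracted first, and the `locate` helper and its regular expression are hypothetical.

```python
import re

SIGNAL_PEPTIDE_LENGTH = 19  # 2351-residue precursor vs 2332-residue mature FVIII

# Approximate domain boundaries in mature numbering (assumed; sources vary).
DOMAINS = [
    ("A1", 1, 336), ("a1", 337, 372), ("A2", 373, 710), ("a2", 711, 740),
    ("B", 741, 1648), ("a3", 1649, 1689), ("A3", 1690, 2019),
    ("C1", 2020, 2172), ("C2", 2173, 2332),
]
LIGHT_CHAIN_START = 1649  # assumed first residue of the light chain

# Captures the first affected residue, e.g. "Tyr" and "2324" in p.(Tyr2324Ter).
HGVS_PROTEIN = re.compile(r"p\.\s*\(?([A-Z][a-z]{2})(\d+)")


def locate(hgvs_p: str) -> tuple[str, str]:
    """Return (domain, chain) for a protein-level variant description."""
    match = HGVS_PROTEIN.search(hgvs_p)
    if match is None:
        raise ValueError(f"cannot parse {hgvs_p!r}")
    mature_pos = int(match.group(2)) - SIGNAL_PEPTIDE_LENGTH
    if mature_pos < 1:
        return ("signal peptide", "none")
    for name, start, end in DOMAINS:
        if start <= mature_pos <= end:
            chain = "light" if mature_pos >= LIGHT_CHAIN_START else "heavy"
            return (name, chain)
    raise ValueError(f"position {mature_pos} is outside the mature protein")


for variant in ["p.(Tyr2324Ter)", "p.(Trp2015Ter)", "p.(Asp599Tyr)",
                "p.(Pro909HisfsTer15)", "p.(Tyr100LeufsTer2)"]:
    print(variant, locate(variant))
```

Under these assumed boundaries, the lookup places the five example variants in the same domains as described in the text.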

Relationship between F8 Identified Variants and Inhibitor Risk
Out of the 21 participants, 4 (19%) had a history of FVIII inhibitors; 2 were LR and 2 were HR. Inhibitor development in all 4 participants occurred after exposure to high doses (>80 IU/kg/day, for 3 to 7 days) of plasma-derived FVIII concentrates for the control of acute bleeding episodes. All 4 participants exhibited variants that cause early termination of FVIII synthesis (Table 4); that is, high-impact variants (nonsense, frameshift) were found in 100% of the patients with inhibitors. However, no statistically significant relationship was found between coding impact and inhibitor development (p = 0.45) (Table 5). When investigating the relationship between variant location (heavy versus light chain) and inhibitor risk, 75% (n = 3) of the participants with inhibitors (n = 4) were found to have variants located within the light chain (p = 0.075, Fisher's exact test; Table 6). Additional patients harbored splicing variants in the absence of inhibitors.
The c.298dupT variant was present in three participants. Two of them have received primary prophylaxis for more than 15 years without a history of inhibitors or joint damage. The third participant was an older adult who had previously received tertiary prophylaxis and currently has high-response inhibitors following a regimen, with no bleeding in the last 12 months. Variants c.389-2A>G, c.1795G, and c.2609delC were all present in one participant, who had no joint damage or inhibitors.
The c.2724delT variant was found in two participants currently on a prophylaxis regimen without joint damage or history of inhibitors. Finally, the c.5447_5448dupGG variant was present in two other participants currently receiving tertiary prophylaxis with no significant bleeding phenotype in the last 12 months, although one of them currently has low-response inhibitors.

Three-Dimensional Structure of the F8 Protein
In silico protein modeling allowed us to identify and study the mechanism by which the variants detected in F8 in our patients affected the structure, localization, and interaction of the FVIII protein and other proteins. Figure 4 shows the 3D structure of the canonical FVIII protein, where the domains are differentiated by color and the affected amino acids are labeled. Figure 5 represents the tertiary structure models of FVIII, showing where the nine variants seem to affect the domains in the heavy chain, which was truncated in most cases. Figure 6 shows the tertiary structure models of FVIII in the light chain, where the protein was truncated in all cases.

Discussion
Several studies have investigated the role of F8 variants in patients with HA [9,24]. Their findings suggest that the frequency and type of F8 variants, as well as their potential association with the risk of inhibitor formation, depend on the specific population studied. Due to its known high prevalence, prior HA studies in Colombia have mostly focused on analyzing the role of intron 22 inversion, but have not investigated the prevalence and role of variants located within the exonic region of F8 [15].
In our cohort, and similar to the systematic review by Gouw et al. [25], frameshift variants were the most common type of identified F8 variants (52%), followed by nonsense variants, present in 24% of study participants. Interestingly, missense variants were only present in 14% of our cohort. This finding differs from those reported by Gouw and Atik [25,26], where the prevalence of this type of variant ranged between 34% and 45% (Table 7). Nonsense mutations had a similar reported prevalence in comparison to these two studies (24% versus 23% and 25%, respectively). Splice-site variants were more frequently encountered in our cohort. The data in Table 7 were adapted from a meta-analysis by Gouw et al. [25] and a mutation study by Atik et al. [26], taking only the types of variation found in the study cohort as 100%.
When looking at variants associated with an early termination of the FVIII protein (frameshift and nonsense mutations), their frequency was higher in our cohort compared to the results reported by Gouw and Atik (76% versus 59% and 50%, respectively). Interestingly, in our cohort, most of the participants (75%) with positive inhibitor history had their variants located in the light chain. This finding also differs from the one reported by Gouw et al. (75% versus 52%, respectively) [8]. We can therefore hypothesize, as reported by Oldenburg et al. [27], Carcao and Goudemans [28], and Gensana et al. [29], that the more domains are conserved, the greater the possibility of generating inhibitors. This may be because when more domains are conserved, more epitope sites become available to generate inhibitors, triggering an immune response to exogenous FVIII [30].
Another important difference worth highlighting is that, despite IgG epitopes having been most commonly reported to be localized in the A2 and C2 domains, where missense variants increase the risk of developing inhibitors up to four-fold [31], in our cohort, variants located in the A2 domain had the lowest prevalence (11.7%, n = 2). These variants were all missense mutations without associated inhibitor risk. This finding could possibly explain the low incidence (5.7%) of patients with hemophilia A and inhibitors reported in the Colombian national registry [11] in comparison to the 25-30% inhibitor rate reported in several other studies [32].
Our study identified six mutations that had not been previously reported in the literature. This finding might suggest that in Colombia, for every ten identified F8 variants, three might differ from those reported in other populations. This genetic characteristic could possibly explain the different phenotype and inhibitor prevalence observed in Colombian patients with HA. Our findings underscore the need for future studies exploring the role of specific treatment regimens based on individual patient genotype, as this strategy might not only lead to improved treatment outcomes but might also impact the cost-effectiveness of these personalized therapies.
Nevertheless, we need to acknowledge that one of the limitations of WES analysis is that it can only detect variants in the exonic region of F8. In hemophilia, only about 50% of patients have variants in the exonic region; for the remainder, other methods that allow for the detection of intron 1 and intron 22 inversions are required.

Conclusions
Our study shows differences in the frequency and type of pathogenic F8 variants in a cohort of Colombian patients with HA compared to other previously reported HA populations. Furthermore, more than a third of the identified F8 variants have not been previously described in the literature. Interestingly, none of these novel variants were associated with a risk for FVIII inhibitor formation. In our cohort, inhibitors were also more frequent among participants with F8 variants located in the FVIII light chain than among those with variants in the heavy chain, which could indicate that the more conserved the domains are, the greater the probability of inhibitor generation. Moreover, variants located in the C2 domain seemed to have the highest risk for inhibitor formation. Based on our findings, we propose that population-specific F8 genotype information, along with patient bleeding phenotype, can help to develop personalized treatment regimens to further optimize their effectiveness and safety. Population-specific F8 genotyping can also allow for the identification of HA patients with the highest risk for inhibitor formation. These are two important steps towards advancing precision medicine adapted to diverse patient populations.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Figure 1. Workflow for variant prioritization of single nucleotide variants and small insertion/deletion variants. The filtering algorithm strategy used in Varseq 2.3.0 for prioritizing variants in F8 and co-expression-associated genes is shown. This figure was made using BioRender.

Figure 2. Location of mutations in F8 variants (variant nomenclature). The F8 gene contains two regions, F8A and F8B, in intron 22 that act as a bidirectional promoter, transcribing in the forward and reverse directions. * indicates splice variants.

Figure 3. Locations of the protein mutations per domain (protein change nomenclature). This figure shows the location of mutations in the variants of F8. The variants c.670+1G>T and c.389-2A>G are not shown because they are splicing variants.

Figure 4. Location of the affected amino acids in the three-dimensional protein structure of FVIII resulting from mutations in the coding region of the F8 gene. This figure was made by the authors using the PyMOL tool.

Figure 5. Models of the three-dimensional structure of the FVIII protein. The changes in the amino acids in the heavy chain are highlighted. All models were made by the authors using the PyMOL tool.

Figure 6. Models of the three-dimensional structure of the FVIII protein. The changes in the amino acids in the light chain are highlighted. All models were made by the authors using the PyMOL tool.

Table 1. Causative variants identified using whole-exome sequencing in Colombian patients with severe hemophilia A (hg38).

Table 2. Identified Variants by Type and Location.

Table 3. Specific F8 Variants Identified in the Study Cohort.

Table 4. Observed Variants in Patients with Inhibitors.

Table 5. Statistical test between coding impact and the presence of inhibitors.

Table 6. Variant Location and Inhibitor Risk.

Table 7. Comparison of the results reported by meta-analyses.