Prediction and in silico validation of MYH7 gene missense variants in the Iranian population: a bioinformatics analysis based on the Iranome database

Identifying disease-causing genetic variants in a particular population improves the molecular diagnosis of genetic disorders, and national genome databases provide valuable information for this purpose. This study aimed to investigate the genomic variants of the MYH7 gene, which is associated with a common heart disease, hereditary cardiomyopathy. MYH7 gene variants were extracted from the Iranome database and loaded into SPSS software. The variants were then filtered according to their specifications, with emphasis on identifying missense changes. Using predictive algorithms, different aspects of the changes, such as allele frequency and functional defects, were investigated. Our results showed that 41 (17.4%) coding variants were synonymous, compared with 18 (7.7%) missense alterations. The missense variants were mostly observed in exons 20–40, which encode the MyHC α-helical rod tail. The variants p.Pro211Leu, p.Arg787His, p.Val964Leu, p.Arg1277Gln, and p.Ala1603Thr were already known to be associated with inherited cardiomyopathy. Four of the missense variants, p.Asn1623Ser, p.Arg1588His, p.Phe1498Tyr, and p.Arg1129Ser, were located in the MyHC α-helical rod tail, and none of them was annotated in the dbSNP or gnomAD databases. Our study identified several MYH7 variants that may be associated with the disease in the Iranian population. The results emphasize the importance of analyzing the exons encoding the MyHC α-helical rod tail. Investigating genomic databases can be a cost-effective strategy for guiding targeted mutation detection. The efficacy of this prediction method should be evaluated in further studies of patient cohorts.


Background
Recent progress in the detection of molecular genetic defects has led to major developments in the diagnosis and treatment of diseases. Decoding the human genome has provided important clues about the genetic diversity of diseases and paved the way for more specialized prevention, diagnostic, and therapeutic strategies. As a high-throughput technology, next-generation sequencing (NGS) has generated a large amount of genomic data and has been widely used over the past decade [1].
NGS can generally be used to sequence genes regardless of their size and complexity and can cover the whole genome. This broad coverage has made mutation detection more sensitive than conventional approaches. The causative variants of many single-gene disorders have now been identified with NGS-based methods. At the clinical level, however, determining the effect of genetic variants on cell function and pathogenesis is essential. Various software and web-based bioinformatics tools have therefore been developed for variant evaluation [2].
Hereditary cardiomyopathies are a group of diseases that affect the heart muscle [3]. Their most common manifestations are thickening of the heart muscle and dilation of the ventricles, which characterize hypertrophic (HCM) and dilated (DCM) cardiomyopathies, respectively [4]. Importantly, patients may be asymptomatic or have mild non-specific symptoms. For this reason, heart failure can progress to sudden cardiac arrest in a seemingly healthy individual. Because cardiomyopathies run in families, rapid and accurate molecular diagnosis can be of great value in preventing disease progression in individuals with a positive family history [5].
One of the genes associated with cardiomyopathies is the β-myosin heavy chain gene, MYH7, whose mutations are reported in 14-25% of all cardiomyopathy cases [6]. The MYH7 gene is located at chromosomal position 14q11-12 and consists of 40 exons. The myosin heavy chain (MyHC) protein is predominantly expressed in heart muscle and, together with myosin light chains, forms the hexamers that make up the thick filaments. The protein has 1934 amino acids; the assembled molecule consists of two globular heads followed by an extended α-helical rod tail, joined at the neck region [7].
Because early studies of cardiomyopathies mostly reported mutations in the head region, the importance of the rod tail region is often underestimated.
Because conventional analysis of the MYH7 gene is time-consuming and costly, regional studies have been limited to analyzing the exons encoding the MyHC head domain. Consequently, they have had limited success in mutation detection.
The Iranian Genome Database (Iranome) provides genomic information on 800 individuals regardless of their disease or health status [8,9]. The distribution of the reported variants could help predict the occurrence of mutations in the related pathological conditions, since variants reported in Iranome may also occur among the corresponding patients. Although this link is not straightforward, it can be a key to predicting pathogenic mutations. We therefore performed further bioinformatics analyses of MYH7 variants in the Iranome database, with the objective of identifying variants that could be disease-causing. By detecting these variants, further clinical validation studies can focus on the exons most likely to harbor mutations in the Iranian population.
Analysis of the exon-intron distribution of the variants showed that intron 22 had the highest rate of changes. The synonymous alterations were distributed almost uniformly across all exons, and two nonsense changes were reported, in exons 3 and 33. Interestingly, the missense variants were mostly observed in exons 20-40, which encode the MyHC α-helical rod tail (Fig. 2).
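To make the regional tally concrete, the sketch below counts missense variants inside versus outside exons 20-40. It assumes each variant has already been annotated with an exon number and a consequence label; the records are hypothetical toy values for illustration only, not Iranome data.

```python
from collections import Counter

# Hypothetical toy records (exon number, consequence); NOT Iranome data.
variants = [
    (3, "nonsense"),
    (3, "synonymous"),
    (14, "missense"),
    (23, "missense"),
    (27, "missense"),
    (32, "missense"),
    (33, "nonsense"),
    (35, "synonymous"),
]

# Exons 20-40 inclusive, described in the text as encoding the rod tail.
ROD_EXONS = range(20, 41)


def missense_by_region(records):
    """Count missense variants falling in exons 20-40 versus all other exons."""
    counts = Counter()
    for exon, consequence in records:
        if consequence != "missense":
            continue
        region = "rod (exons 20-40)" if exon in ROD_EXONS else "other exons"
        counts[region] += 1
    return counts


counts = missense_by_region(variants)
print(dict(counts))
```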

MYH7 missense variants
To identify variants that could be pathogenic in the Iranian population, missense substitutions were studied in more detail. Exonic variants that change the amino acid sequence of MyHC were filtered. This filtering identified 18 missense alterations, including p.Pro211Leu, p.Arg787His, p.Val964Leu, p.Arg1277Gln, and p.Ala1603Thr, which were already known to be associated with inherited cardiomyopathy. Some substitutions, such as p.Ala26Val and p.Arg1662His, had previously been identified as causative mutations in cardiomyopathies, although subsequent studies did not confirm their pathogenicity. Owing to their high prevalence in human genome databases and the results of clinical and bioinformatics studies, two variants, p.Asn1257Ser and p.Ser1491Cys, were previously considered polymorphisms [10]. The variants p.Ala1191Thr, p.Ser1366Leu, p.Ser1596Leu, p.Asn1824Asp, and p.Asn1824Ser had relatively rare allele frequencies in the dbSNP or gnomAD databases. However, they have not been reported in association with any disease and are generally classified as variants of uncertain significance (Table 1).
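The missense filter relies on reading protein-level HGVS notation such as p.Arg787His. The following is a minimal sketch, assuming variants are available as simple single-residue HGVS protein strings; it is not the authors' pipeline and does not handle frameshifts, insertions, or deletions. It also tolerates a stray hyphen or space (as in "p.Pro211-Leu"), a common artifact of text extraction.

```python
import re

# The 20 standard amino acids in three-letter code.
AA3 = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Simple substitutions: p.Arg787His, p.Arg1129=, p.Trp10Ter.
# An optional hyphen or whitespace between position and alternate residue is tolerated.
PATTERN = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)[-\s]?([A-Z][a-z]{2}|=)$")


def classify(hgvs_p):
    """Return (residue position, consequence) for a simple HGVS protein substitution."""
    m = PATTERN.match(hgvs_p.strip())
    if m is None:
        raise ValueError(f"unsupported notation: {hgvs_p}")
    ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
    if ref not in AA3:
        raise ValueError(f"unknown reference residue: {ref}")
    if alt == "=" or alt == ref:
        return pos, "synonymous"
    if alt == "Ter":
        return pos, "nonsense"
    if alt in AA3:
        return pos, "missense"
    raise ValueError(f"unknown alternate residue: {alt}")


for v in ["p.Pro211-Leu", "p.Arg787His", "p.Arg1129 ="]:
    print(v, classify(v))
```

Keeping only records whose consequence is "missense" reproduces the filtering idea, and the returned positions allow the variants to be sorted along the protein.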
Most of the variants were detected in the heterozygous state in only one of the 800 individuals, indicating that they are very rare (allele frequency of 0.000625). Three variants were each found in the heterozygous state in two different individuals. The variant p.Arg1277Gln was found in four heterozygous individuals (allele frequency of 0.0025). The most common variant was p.Ser1491Cys, with 22 heterozygous carriers and an allele frequency of 0.01375, which implies that it is a population polymorphism.
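The allele frequencies above follow from counting alleles in a diploid sample of 800 individuals (1,600 alleles). The sketch below reproduces that arithmetic; it assumes no homozygous carriers, since only heterozygotes are reported, and uses the conventional, not universal, 1% cut-off for calling a variant a polymorphism.

```python
def allele_frequency(n_het, n_hom_alt, n_individuals):
    """AF = alternate allele count / total allele count in a diploid sample."""
    allele_number = 2 * n_individuals      # each individual carries two alleles
    allele_count = n_het + 2 * n_hom_alt   # heterozygotes add one allele, homozygotes two
    return allele_count / allele_number


N = 800  # individuals in Iranome

for label, n_het in [("single carrier", 1), ("two carriers", 2),
                     ("p.Arg1277Gln", 4), ("p.Ser1491Cys", 22)]:
    af = allele_frequency(n_het, 0, N)
    # 1% is a conventional threshold for a polymorphism, not a clinical classification rule.
    tag = "polymorphism (>= 1%)" if af >= 0.01 else "rare"
    print(f"{label}: AF = {af:.6f} ({tag})")
```

With two carriers the frequency is 0.00125, and only p.Ser1491Cys crosses the 1% line.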
The pathogenicity results from the databases and in silico analyses are presented in Table 2. As shown in the table, the results obtained from different sources were not always consistent, and some were conflicting. The variants with the most evidence of being disease-causing were p.Val964Leu, p.Arg1277Gln, and p.Ala1603Thr.

Interpretation of unannotated MYH7 missense variants
As indicated in Table 1, four of the reported missense variants, p.Asn1623Ser, p.Arg1588His, p.Phe1498Tyr, and p.Arg1129Ser, were not annotated in the dbSNP or gnomAD databases. All four variants were located in the MyHC α-helical rod tail (Fig. 3). Except for p.Asn1623Ser, none of these variants has been reported in ClinVar.
p.Arg1129Ser (c.3387G>C), located in MYH7 exon 27, was predicted to be damaging by FATHMM. A different substitution at the same nucleotide (c.3387G>A) has been reported in ClinVar. This synonymous change, which does not alter the amino acid (p.Arg1129=), has been reported in cardiomyopathy and classified as likely benign [11].
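Why c.3387G>C changes arginine to serine while c.3387G>A does not can be checked with a small codon lookup. The sketch below maps a coding-DNA position to its codon number and position within the codon, then applies the substitution using the standard genetic code. The reference codon AGG is an assumption, inferred because it is the only arginine codon whose third-position G>C change yields serine (CGG>CGC stays arginine).

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(c_pos):
    """Map a coding-DNA position c.N (N >= 1) to (codon number, position 1-3 within codon)."""
    return (c_pos + 2) // 3, (c_pos - 1) % 3 + 1


def apply_snv(ref_codon, pos_in_codon, ref_base, alt_base):
    """Substitute one base in a codon; return (ref_aa, alt_aa) as one-letter codes."""
    idx = pos_in_codon - 1
    if ref_codon[idx] != ref_base:
        raise ValueError("reference base does not match codon")
    alt_codon = ref_codon[:idx] + alt_base + ref_codon[idx + 1:]
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


codon, pos = codon_position(3387)
print(codon, pos)                       # 1129 3
# Assumed reference codon AGG (inferred, see lead-in).
print(apply_snv("AGG", pos, "G", "C"))  # ('R', 'S') -> p.Arg1129Ser
print(apply_snv("AGG", pos, "G", "A"))  # ('R', 'R') -> p.Arg1129= (synonymous)
print(codon_position(4763))             # (1588, 2): second base of codon 1588
```

The same mapping places c.4763G>A at the second base of codon 1588, consistent with the p.Arg1588His annotation.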
p.Phe1498Tyr is located in exon 32 and was predicted to be damaging by most of the algorithms, except MutationAssessor and FATHMM, which interpreted this variant as tolerated (Table 3).
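When predictors disagree, a simple tally makes the majority call and the degree of disagreement explicit. The sketch below is one hypothetical way to summarize such calls, not the scoring scheme used in this study. Apart from MutationAssessor and FATHMM, the tool names in the example are placeholders, because the full set of tools is listed in Table 3 rather than here.

```python
from collections import Counter


def consensus(calls):
    """Majority call across predictors.

    `calls` maps tool name -> "damaging" or "tolerated" (must be non-empty).
    Returns (call, n_agreeing, n_total); an exact tie is reported as "conflicting".
    """
    tally = Counter(calls.values())
    (top, n_top), *rest = tally.most_common()
    if rest and rest[0][1] == n_top:
        return "conflicting", n_top, len(calls)
    return top, n_top, len(calls)


# Hypothetical example: tool_A..tool_C are placeholder names.
calls = {
    "tool_A": "damaging",
    "tool_B": "damaging",
    "tool_C": "damaging",
    "MutationAssessor": "tolerated",
    "FATHMM": "tolerated",
}
print(consensus(calls))
```

Reporting the agreement count alongside the majority call keeps the conflicting outcomes visible rather than hiding them behind a single label.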
With a score of 34, p.Arg1588His (c.4763G>A) has the highest combined annotation-dependent depletion (CADD) score.
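CADD reports PHRED-scaled scores, defined as −10·log10 of a variant's rank fraction among all possible single-nucleotide substitutions, so 10 marks the top 10%, 20 the top 1%, and 30 the top 0.1%. The short sketch below inverts that scaling to show what a score of 34 means: roughly the top 0.04% of possible substitutions.

```python
def cadd_top_fraction(phred):
    """Fraction of all possible SNVs expected to score at or above this CADD PHRED value."""
    return 10 ** (-phred / 10)


for score in (10, 20, 30, 34):
    print(f"CADD {score}: top {100 * cadd_top_fraction(score):.3f}% of possible substitutions")
```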

Discussion
Using various predictive algorithms, we evaluated the MYH7 gene variants reported on the Iranome website. After the filtering steps, 18 missense MYH7 variants were found that could be related to the pathogenesis of cardiomyopathies. Located in exon 3, p.Ala26Val was previously reported in HCM and DCM probands from families of Asian origin [15]. Further studies revealed that the alanine 26 substitution is likely benign, as it occurs at a poorly conserved amino acid. Furthermore, it has an allele frequency of 0.55 in the East Asian population, which, according to the ClinGen Inherited Cardiomyopathy Expert Panel, is above the threshold and should be considered benign [16]. Another variant, p.Pro211Leu, has been identified in several studies of cardiomyopathies [17,18]. It has been reported in several patients as a compound heterozygous alteration along with other MYH7 missense mutations [19]. Notably, mutations at positions adjacent to Pro211 have been reported to be involved in disease pathogenesis. Its low prevalence is a further reason to consider it a disease-causing mutation.
In a previous study, p.Arg787His was described as a mutation that can cause phenotypes of varying severity [20]. This mutation has been reported in several studies from India, whereas in the Iranome database it was identified in the heterozygous state in a Persian Gulf Islander. Given the geographic proximity, a founder effect could be hypothesized, although studies from India have identified this mutation as de novo [21].
A variant that should be considered seriously in Iranian cardiomyopathy patients is p.Val964Leu, located in exon 23. In Iranome, two individuals, one of Turkmen and one of Persian ethnicity, carried this substitution. p.Val964Leu has been linked to cardiomyopathies, either HCM or DCM, in numerous studies [22][23][24]. However, ClinVar lists this variant with conflicting interpretations of pathogenicity because of its relatively high frequency in the European population (0.08%). Val964 is located in the neck region of MyHC and is highly conserved; its substitution with leucine was therefore predicted to be pathogenic [25].
Another variant of uncertain significance is p.Arg1277Gln, a semi-conservative amino acid substitution. It is located in exon 34 and has been reported from different parts of the world [26,27].
p.Ala1603Thr is another alteration that should be considered in Iranian studies. In silico testing, including protein predictors and evolutionary conservation, indicated that p.Ala1603Thr may be pathogenic. This variant was first reported, using the high-resolution melting (HRM) method, in a cohort of HCM patients [28]. In a recent study, p.Ala1603Thr was also reported in an HCM patient and was deemed pathogenic in that population study [29].
The next variant of uncertain significance is p.Arg1662His, which has been found in both HCM and DCM [30,31]. Notably, histidine is the wild-type amino acid at this position in several other species.
Fig. 3 The location of the four unreported variants on the MyHC rod tail

Conclusion
In our study, four amino acid substitutions in the MyHC protein were examined in detail. These variants occurred in the rod tail region of the protein and were predicted to be disease-causing by most prediction tools. Among them, p.Asn1623Ser was reported in ClinVar and suggested to be deleterious by a computational algorithm developed to evaluate the pathogenicity of MYH7 gene variants. The other three variants were not present in the dbSNP or gnomAD databases and, according to the literature, have not been reported in individuals with MYH7-related cardiomyopathy. This could be evidence of their pathogenicity in the Iranian population; however, this finding should be confirmed by molecular studies of potential patients. In summary, studies of Iranian patients should prioritize the evaluation of MYH7 exons 20-40. Given the high cost of molecular diagnosis and its vital importance for many patients, national databases should be regarded as a valuable resource. Such information can also help avoid untargeted studies and benefit future genetics research.

Data extraction
Based on a literature review and an extensive search of available databases, MYH7 was selected because of its major contribution to hereditary cardiomyopathies. All national reports describing MYH7 mutations and their association with cardiac disease were screened. Next, all MYH7 variants reported on the Iranome website were extracted and loaded into SPSS version 20.0 and Excel 2010. The Iranome database includes NGS results for 800 genomes from Iranian individuals over 35 years of age, 100 from each of 8 Iranian ethnic groups. The Iranome website provides search tools based on gene name, genomic region, transcript, and multi-allelic variants, and its contents are continually updated with new genomic data. Most of the reported variants are shared with other populations, while 30% (422,000) of these genetic changes are unique to the Iranian population.

Filtering strategy
To identify MYH7 gene variants associated with cardiomyopathies, the Iranome data were filtered in several steps. All variants that occurred in exons and led to an amino acid change were selected and studied. To assess their pathogenic effects, the variants were divided into two groups: previously reported and unannotated variants. An "unannotated variant" was defined as an alteration not previously reported in the dbSNP or gnomAD databases. Published articles and documents related to the reported mutations were also analyzed, and variants associated with a cardiomyopathy phenotype were identified.
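The two-group split can be expressed as a small filter. The sketch below assumes each exported row carries a consequence label, a dbSNP identifier, and a gnomAD allele frequency; the field names and the example rows are hypothetical and do not reflect the actual Iranome export format. A missense variant counts as unannotated when it has neither a dbSNP identifier nor a gnomAD entry, following the definition above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    # Hypothetical fields for an exported variant row; not the Iranome schema.
    hgvs_p: str
    consequence: str              # e.g. "missense_variant", "synonymous_variant"
    rsid: Optional[str]           # dbSNP identifier, None if absent
    gnomad_af: Optional[float]    # gnomAD allele frequency, None if absent


def split_missense(variants):
    """Keep missense variants and split them into (annotated, unannotated)."""
    annotated, unannotated = [], []
    for v in variants:
        if v.consequence != "missense_variant":
            continue  # drop synonymous, intronic, nonsense, etc.
        if v.rsid is None and v.gnomad_af is None:
            unannotated.append(v)  # absent from both dbSNP and gnomAD
        else:
            annotated.append(v)
    return annotated, unannotated


# Hypothetical rows; identifiers are placeholders.
records = [
    Variant("p.Ser1491Cys", "missense_variant", "rs_placeholder_1", None),
    Variant("p.Arg1588His", "missense_variant", None, None),
    Variant("p.Leu100=", "synonymous_variant", None, None),
]
annotated, unannotated = split_missense(records)
print([v.hgvs_p for v in annotated], [v.hgvs_p for v in unannotated])
```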

Bioinformatics
Bioinformatics analysis of the putative MYH7 nucleotide substitutions selected in the filtering steps was performed using the following databases and online resources.