Whole Exome Sequencing Reveals Clustering of Variants of Known Vitiligo Genes in Multiplex Consanguineous Pakistani Families

Vitiligo is an autoimmune complex pigmentation disease characterized by non-pigmented patches on the surface of the skin that affect approximately 0.5–2% population worldwide. The exact etiology is still unknown; however, vitiligo is hypothesized to be a multifactorial and genetically heterogeneous condition. Therefore, the current study is designed to investigate the anthropometric presentation and genetic spectrum of vitiligo in fifteen consanguineous Pakistani families. The clinical evaluation of participating individuals revealed varying degrees of disease severity, with 23 years as the average age of disease onset. The majority of the affected individuals had non-segmental vitiligo (NSV). Whole exome sequencing analysis revealed clustering of rare variants of known vitiligo-associated genes. For instance, in the affected individuals of family VF-12, we identified three novel rare variants of PTPN22 (c.1108C>A), NRROS (c.197C>T) and HERC2 (c.10969G>A) genes. All three variants replaced evolutionarily conserved amino acid residues in encoded proteins, which are predicted to impact the ionic interactions in the secondary structure. Although various in silico algorithms predicted low effect sizes for these variants individually, the clustering of them in affected individuals increases the polygenic burden of risk alleles. To our knowledge, this is the first study that highlights the complex etiology of vitiligo and genetic heterogeneity in multiplex consanguineous Pakistani families.


Introduction
Vitiligo is a common skin pigmentation disorder with remarkable genetic and clinical heterogeneity. Vitiligo is characterized by the appearance of colorless patches on the epidermal surface of the skin. Discrepancies in the production and distribution of melanin in affected areas result in the onset of this socially stigmatizing and psychologically devastating disease. Vitiligo may affect one or both halves of the body. The vitiligo European task force assessment (VETFa) has broadly classified vitiligo into two main types, i.e., non-segmental vitiligo (NSV) or generalized vitiligo (GV) and Segmental Vitiligo (SV).

Family Enrollment and Clinical Classification
The proceedings of the current study were approved by committees of the Institutional Review Board (IRB) at the University of Maryland, School of Medicine, Baltimore, MD, USA (HP-00061036) and Pir Mher Ali Shah Arid Agriculture University Rawalpindi, Pakistan. The tenets of the Declaration of Helsinki were followed for the use of human subjects in the current study. Informed consent in writing was obtained from all participants. Fifteen vitiligo families were recruited from different regions of the Punjab province of Pakistan in this study.
Detailed family histories were obtained to document disease onset, affected area of skin, disease progression status, smoking/non-smoking status, chemical exposure, general lifestyle, eating habits, treatment etc. Family pedigrees were drawn based on the information received and verified by multiple participating family members.

Sample Collection and Isolation of DNA
From all participating individuals, 3 mL of venous blood samples were collected in an EDTA Orsin vacutainer (formerly imuMed ® ) and stored at 4 • C until further use. The genomic DNA was isolated from the blood samples of participants using the Thermo Scientific™ GeneJET genomic DNA purification kit (Catalog #: K0722). The concentration of DNA was quantified by NanoDrop ® ND-1000 UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and DNA quality was assessed by 260/280 and 260/230 ratios.

Whole Exome Sequencing
Whole exome sequencing was performed on the probands of families VF-01, VF-02-, VF-03, VF-09, VF-10, VF-12 and VF-15. Genomic libraries were recovered for exome enrichment by using the Agilent SureSelect Human Expanded All Exon V5 kit and sequenced on an Illumina HiSeq4000 (Illumina, San Diego, CA, USA) with an average of 100× coverage [10]. Broad Institute's Toolkit for Genome Analysis was used to analyze the sequencing data, as described previously [11,12], with the following modifications: (a) allele frequen-cies cut off was set to ≤0.003; (b) potential pathogenic prediction was achieved by at least 2 in silico algorithms; and (c) CADD score ≥ 6.0. Primers were designed using the online resource Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/, accessed on 23 October 2018) for the variants that passed our filtration criteria; targeted DNA amplification by conventional PCR followed by Sanger sequencing was carried out for segregation studies [13].

Clinical Evaluation of Vitiligo in Multiplex Pakistani Families
After IRB approval, a total of 15 multiplex consanguineous families segregating vitiligo ( Figure 1) were enrolled in this study from five different cities (Rawalpindi, Chakwal, Mianwali, Abbottabad, Islamabad) in Pakistan (Table 1). A total of 130 individuals from these families participated in our study, including 72 males (55%) and 58 females (45%). Of these enrolled subjects, 53 (33 males and 20 females) exhibited the vitiligo phenotype, while 77 individuals had no skin problem and, thus, were used as controls in our downstream genetic studies. The age range for onset vitiligo macules was between 2 to 68 years, with an average of 23 years. Among the enrolled females, non-segmental vitiligo (NSV) was the most noticeable clinical phenotype noted in forty-eight females (91%), three females (6%) had vitiligo universalis (VU), while two females (3%) reported complete recovery from their vitiligo phenotype. Almost all affected individuals reported having no other autoimmune diseases or any other comorbidity besides vitiligo, except two affected individuals that had type 1 diabetes mellitus (Table 1).
Inter-and intra-familial variability in the areas of skin discoloration among the affected individuals was identified (Table 1). For instance, family VF-02 is an extended vitiligo family having nine affected individuals ( Figure 1). Three of the affected individuals (VF02_15, _16, _17) had 90-100% of their body with vitiligo patches and thus present VU (Table 1), while the other affected individuals had much fewer patches (Table 1). Intriguingly, according to family history interviews, one affected female (VH02_07) recovered from vitiligo without any medication. In contrast, the three affected individuals of family VF-04 had relatively less progressive disease severity (Table 1).  Inter-and intra-familial variability in the areas of skin discoloration among the affected individuals was identified (Table 1). For instance, family VF-02 is an extended vitiligo family having nine affected individuals ( Figure 1). Three of the affected individuals (VF02_15, _16, _17) had 90-100% of their body with vitiligo patches and thus present VU (Table 1), while the other affected individuals had much fewer patches (Table 1). Intriguingly, according to family history interviews, one affected female (VH02_07) recovered from vitiligo without any medication. In contrast, the three affected individuals of family VF-04 had relatively less progressive disease severity (Table 1).

Genetic Analysis of Families with Vitiligo
To decipher the genetic risk factors contributing to the vitiligo phenotype in the enrolled families, we performed whole exome sequencing on index patients from seven families along with ethnically matched normal control samples. After quality control, we used in silico panel to filter variants in the known vitiligo-associated genes [14]. Next, we selected variants of the genes that had (a) an allele frequency of ≤0.001; (b) a CADD score of ≥7.0; and (c) predicted damage by at least one in silico algorithm. Table 2 lists all variants of known vitiligo genes that passed these criteria. Although various in silico algorithms predicted low effect size for these variants individually, the clustering of these variants in affected individuals increases the polygenic burden of risk alleles (Table 2); we found potentially pathogenic variants of three known vitiligo-associated genes, PTPN22, HERC2 and NRROS, in a varying combination of zygosity among affected individuals of family VF-12 ( Figure 2). Proband III-2, having the most severe phenotype, was found heterozygous for the rare missense variants (

Genetic Analysis of Families with Vitiligo
To decipher the genetic risk factors contributing to the vitiligo phenotype in the enrolled families, we performed whole exome sequencing on index patients from seven families along with ethnically matched normal control samples. After quality control, we used in silico panel to filter variants in the known vitiligo-associated genes [14]. Next, we selected variants of the genes that had (a) an allele frequency of ≤0.001; (b) a CADD score of ≥7.0; and (c) predicted damage by at least one in silico algorithm. Table 2 lists all variants of known vitiligo genes that passed these criteria. Although various in silico algorithms predicted low effect size for these variants individually, the clustering of these variants in affected individuals increases the polygenic burden of risk alleles (Table 2); we found potentially pathogenic variants of three known vitiligo-associated genes, PTPN22, HERC2 and NRROS, in a varying combination of zygosity among affected individuals of family VF-12 ( Figure 2). Proband III-2, having the most severe phenotype, was found heterozygous for the rare missense variants (Figure 2) of PTPN22 (c.1108C>A; p.(His370Asn)), HERC2 (c.10969G>A; p. (Val3657Ile)), and NRROS (c.197C>T; p.(Ala66Val)). Different family members inherited different combinations of these potentially pathogenic variants, and hence disease severity and allelic burden vary among these individuals ( Figure 2 and Table 1). Variants found in the PTPN22, HERC2 and NRROS genes have been previously known for their association with NSV in different ethnicities around the world [15][16][17][18].  p.(Ala66Val)). Different family members inherited different combinations of these potentially pathogenic variants, and hence disease severity and allelic burden vary among these individuals ( Figure 2 and Table 1). Variants found in the PTPN22, HERC2 and NRROS genes have been previously known for their association with NSV in different ethnicities around the world [15][16][17][18].  Amino acid conservation studies have shown that the variants identified in PTPN22, HERC2 and NRROS genes are present in the evolutionarily conserved regions of the encoded proteins ( Figure 2C). Next, we performed molecular modeling analysis using the Phyre2 online resource for retrieving structural information based on protein homology present in the protein database. Using this approach, we were able to model the two missense variants found in the HERC2 and NROSS proteins (Figure 3). The lack of available crystalline structural information for PTPN22 restrained further analysis from predicting the effect of the mutation on protein structure and function. A different pattern of interresidual bonding interaction between wild type and mutant protein was observed for the p.Val 3657Ile variant of HERC2 ( Figure 3A). The modeling data predicted that p.Val3657Ile substitution might impact protein folding due to the formation of an additional H-bond between p.Ile3657 and p.Leu3661 ( Figure 3A). Similarly, the p.Ala66Val variant of NRROS is predicted to greatly hamper the microscopic interaction in the surrounding area of 10 Å by differential H-bonding patterns ( Figure 3B). Surrounding the p.Ala66 residue in WT NRROS, there are four prominent hydrogen bonds: between p.Gly45 and p.Asp65 (2.837 Å); p.Cys91 and Leu64 (0.3129 Å); p.Cys91 and p.His89 (0.374 Å); and p.Ala66 and p.His89 (2.921 Å; Figure 3B). In contrast, the p.Ala66Val substitution exhibits a different set of H-bonding capabilities, resulting in the loss of H-bonds between p.Gly45 and p.Asp65 and the formation of two new H-bond bonds between p.Val66 and p.His89 (2.962 Å), and between p.Gly113 and p.Ser90 (2.293 Å; Figure 3B). This new pattern of interaction may induce torsion, leading to improper protein folding and properties, which may have an effect on function. A noticeable decrease in Gibbs free energy was recorded in PTPN22 (−0.73516404), HERC2 (−0.52853749) and NRROS (−0.22229897) by MUpro, indicating a decrease in protein 3D structure stability, as corroborated by the I-Mutant prediction program. Overall, the three identified variants have been anticipated to have deleterious effects on protein structures, rendering the proteins unstable ( Figure 3C).  Using this in silico approach, we also modeled six missense variants found in the affected individuals of our vitiligo families (Table 2) and predicted their potential pathogenic impact on the encoded proteins ( Figure 4). These include p.Asn201Thr (GSTP1), p.Arg63Gln (CDH1), p.Arg161Trp (CASP7), p.Pro695Leu (NOS2), p.Glu518Lys (UBASH3A), and p.Asp442Gly (PTPRC) variants ( Figure 4). The p.Asn201Thr variant of GSTP1 is located within the GST C-terminal domain, and due to different sizes and ionic properties, it might impact the folding of the domain (Figure 4). A similar impact was predicted for the p.Arg63Gln variant of CDH1, p.Arg161Trp of CASP7, and p.Asp442Gly of PTPRC (Figure 4). The p.Pro695Leu of NOS2 is predicted to form a new H-bond with p.Ile692 (Figure 4). Moreover, the WT Proline residue is known to be very rigid and therefore induce a special backbone conformation; therefore, substitution with Leucine would affect this conformation. Finally, the p.Glu518Lys variant of UBASH3A is predicted to alter the salt bridge with Arginine at position 406 and Arginine at position 485 (Figure 4), and thus it ultimately affects protein folding and function. Overall, the above missense variants are predicted to have an impact on protein folding and secondary structure, which might lead to impaired function.

Discussion
Vitiligo is a disease of complex etiology, whereby the involvement of genetic contributors in disease pathogenesis has been emphasized greatly since the advent of modern genetic technologies. Previously, several genetic association studies have identified risk alleles responsible for various types of vitiligo in different ethnicities [17,[19][20][21][22][23][24]. In the present study, we reported clinical and genetic findings in fifteen highly inbred consanguineous Pakistani families. Our study highlights the genetic heterogeneity of the disease in terms of risk allele clustering as well as associated risk allele burden in affected individuals. We report the identification of several rare variants of known vitiligo-associated genes, including three novel missense variants in PTPN22 (c.1108C>A; p.(His370Asn)), NRROS (c.197C>T; p.(Ala66Val)) and HERC2 (c.10969G>A; p. (Val3657Ile)) in family VF-12. In silico analysis of the risk variants for pathogenicity determination predicted a low to moderate pathogenic impact of all the identified variants; however, clustering of mutated alleles in an individual collectively contributed to the onset of vitiligo.
Autoimmune-mediated melanocyte destruction is a common phenomenon in vitiligo pathogenesis that involves intrinsic defects within melanocytes and an autoimmune system that targets these cells [25]. PTPN22, which encrypts genetic information for

Discussion
Vitiligo is a disease of complex etiology, whereby the involvement of genetic contributors in disease pathogenesis has been emphasized greatly since the advent of modern genetic technologies. Previously, several genetic association studies have identified risk alleles responsible for various types of vitiligo in different ethnicities [17,[19][20][21][22][23][24]. In the present study, we reported clinical and genetic findings in fifteen highly inbred consanguineous Pakistani families. Our study highlights the genetic heterogeneity of the disease in terms of risk allele clustering as well as associated risk allele burden in affected individuals. We report the identification of several rare variants of known vitiligo-associated genes, including three novel missense variants in PTPN22 (c.1108C>A; p.(His370Asn)), NRROS (c.197C>T; p.(Ala66Val)) and HERC2 (c.10969G>A; p. (Val3657Ile)) in family VF-12. In silico analysis of the risk variants for pathogenicity determination predicted a low to moderate pathogenic impact of all the identified variants; however, clustering of mutated alleles in an individual collectively contributed to the onset of vitiligo.
Autoimmune-mediated melanocyte destruction is a common phenomenon in vitiligo pathogenesis that involves intrinsic defects within melanocytes and an autoimmune system that targets these cells [25]. PTPN22, which encrypts genetic information for lymphoidspecific intracellular phosphatase, is the main inhibitor of T-cell activation [19]. Many autoimmune diseases, including type 1 diabetes, arthritis, systemic lupus and vitiligo, have also been previously associated with PTPN22 variants [4]. NSV was found to be strongly associated with PTPN22 (c.1858C>T; p. (Arg620Trp)) in the cohorts from the United Kingdom, Gujarat, India and Romania in three different case-control studies [15,26]. Similarly, HERC2, which encodes for the E3 ubiquitin-protein, provides protection against ionizing radiation and DNA damage related to radiation. Previous studies have shown that an elevated risk of vitiligo is associated with HERC2 variations, which have a significant impact on determining skin and iris color besides the OCA2. Previous studies on Caucasian and Chinese Han cohorts also reported a significant association between the T allele of rs6583331 SNP of NRROS and vitiligo [17]. However, in our cohort, a novel vitiligoassociated variant of NRROS was identified. Our outcomes are in agreement with previous findings, thus indicating an increased risk for intra-familial NSV development linked to NRROS.
Besides the above-mentioned genes, as shown in Table 2, the majority of the variants identified in our cohort are present in proteins that function in innate immune responses, inflammation reaction, control of cell apoptosis, anti-oxidative damage, and have a role in detoxification reactions. For instance, IFIH1 is an interferon-induced helicase C domaincontaining protein, an innate immune receptor that acts as a cytoplasmic sensor of viral nucleic acids and plays a major role in sensing viral infection and in the activation of a cascade of antiviral responses, including the induction of type I interferons and proinflammatory cytokines [25]. The IL1RAPL1-Interleukin-1 receptor accessory protein-like 1 regulates secretion and presynaptic differentiation through inhibition of the activity of the N-type voltage-gated calcium channel and is also involved in the activation of the MAP kinase JNK. TLR4 is a transmembrane protein, a member of the Toll-like receptor family, which belongs to the pattern recognition receptor (PRR) family. TLR4 participates in intracellular signaling NF-κB and inflammatory cytokine production, which is responsible for activating the innate immune system. CASP7 encodes a member of the Cysteine-Aspartic acid protease (caspase) family and plays a central role in the execution phase of cell apoptosis and survival of melanocytes [4]. Similarly, the BCL2L12 protein family and their interactors include inhibitors and inducers of cell death, regulate and mediate the process by which mitochondria contribute to cell death known as the intrinsic apoptosis pathway, and thus regulate the survival of pigmented cells.
In conclusion, the convergence theory of vitiligo development bridges all previously known etiologies and describes vitiligo as a multivariate, complex, and multifactorial disorder [9]. Vitiligo is not caused merely by the presence of an underlying genetic variation or only by exposure to some specific external factor such as stimuli or defective melanocytes and an internal autoimmune condition but is a result of an intricate combination of all these etiological contributors. Therefore, it is important to acknowledge that the underlying molecular pathways affected in one patient do not always lead to the same level of pathogenicity in other individuals. Such attributes of vitiligo define that identification of individual-specific pathways is significant for the targeted development of more effective therapeutics [26]. Polygenic Risk Score (PRS) estimation is a trending approach for the genetic description of complex disorders, including vitiligo. However, there always remains quite a fair chance for the involvement of additional genes or novel variants in the known vitiligo-associated genes. In this regard, it has been recently described that the phenomenon of clustering of risk alleles in an inbred family increases polygenic risk burden [5], and our study further supports this notion.