A novel gene ZNF862 causes hereditary gingival fibromatosis

Hereditary gingival fibromatosis (HGF) is the most common genetic form of gingival fibromatosis which is featured as a localized or generalized overgrowth of gingivae. Currently two genes (SOS1 and REST), as well as four loci (2p22.1, 2p23.3–p22.3, 5q13–q22, and 11p15), have been identified as associated with HGF in a dominant inheritance pattern. Here, we report 13 individuals with autosomal-dominant HGF from a four-generation Chinese family. Whole-exome sequencing followed by further genetic co-segregation analysis was performed for the family members across three generations. A novel heterozygous missense mutation (c.2812G > A) in zinc finger protein 862 gene (ZNF862) was identified, and it is absent among the population as per the Genome Aggregation Database. The functional study supports a biological role of ZNF862 for increasing the profibrotic factors particularly COL1A1 synthesis and hence resulting in HGF. Here, for the first time we identify the physiological role of ZNF862 for the association with the HGF.

Introduction as an autosomal-dominant inheritance pattern. Nevertheless, sporadic cases and autosomal-recessive inheritance pedigrees have also been previously described (Majumder et al., 2013).
As reported previously, to date, four loci (2p22.  (Xiao et al., 2001;Ye et al., 2005;Zhu et al., 2007;Xiao et al., 2000); besides a heterozygous frameshift mutation in SOS1 (MIM: 182530) has been identified as causative for the autosomal-dominant HGF in a Brazilian family (Hart et al., 2002); and protein truncating mutations in REST (MIM: 600571) have been reported as the genetic cause of an autosomal-dominant inheritance pattern HGF across three independent families (Bayram et al., 2017). The estimated incidence of HGF is 1 per 175,000 of the population (Ahmed and Ali, 2015), and equal between males and females (Gawron et al., 2016;Odessey et al., 2006). Periodontal surgery including gingivectomy, gingivoplasty, and flap surgery can be applied to patients with HGF due to the aesthetic and functional concern, whereas the recurrence of hyperplasia is relatively high potentially owing to the underlying genetic predisposition (Chaurasia, 2014;Zhou et al., 2011). Therefore, it is important to explore the genetic causes and etiology accounting for HGF with expectations of the precise genetic diagnosis and of the potential gene therapy development, to help individuals who desire to circumvent the sufferings and improve their lives. We report a large Chinese family with 13 affected individuals clinically diagnosed with nonsyndromic HGF, and with 12 unaffected family members. Genetic analysis of this large family led to the identification of a heterozygous mutations in a novel gene ZNF862.

Results and discussion
This four-generation pedigree with HGF was shown in Figure 1A. Visual inspection indicated severe or mild generalized GF in all affected members, whereas not in any unaffected member, from this family. No individuals in this family was ever exposed to the medication that may result in GF according to our investigation. No other congenital abnormality has been found for all family members. The clinical data of all individuals in this family are summarized in Table 1, and clinical photos of one control and three patients from each generation are available in Figure 1B. From a clinical perspective, the proband (IV-2) had the mild recurrence of hyperplasia during the second year after the periodontal surgery of gingivectomy, gingivoplasty, and flap surgery; while the patient II-2 had an additional clinical finding (hypertension, with no medication of calcium channel blockers treated).
Whole-exome sequencing (WES) was performed for the proband and nine other members (numbers indicated in red color in Figure 1A) in the family according to previously described protocols (Zhang et al., 2015). After analyzing all single nucleotide variants (SNVs) and indels for each gene including the promoter region, exons, splicing sites, introns, and untranslated region (UTR), only three variants ATP7B (c.3403G > A), CDADC1 (c.83-13G > T), ZNF862 (c.2812G > A) were identified to be co-segregated with the phenotype among the 10 family members who underwent WES. To further screen and validate the potential disease-causing variants co-segregation with the phenotype in this pedigree, conventional PCR was performed for sanger sequencing evaluation. Eventually, only ZNF862 (c.2812G > A), no any other variant, was validated to be co-segregated with the phenotype in all 23 alive members in this family, as is shown in Figure 1C.
ZNF862 gene harbors eight exons, encodes a 1169-amino acid protein, and is located on chromosome 7q36.1. It does not escape our notice that ZNF862 does not map to any one of the loci that were reported previously as be associated with HGF. ZNF862 is a predicted intracellular protein which function, to the best of our knowledge, is not yet clearly identified, as a zinc finger protein it may be involved in transcriptional regulation. ZNF862 is expressed ubiquitously across tissues (Fagerberg et al., 2014), it may play various roles under different physiological condition. Schwartz et al., 2019, suggested a plausible role of ZNF862 in matrix-producing metaplastic carcinoma. Peng et al., 2018, reported ZNF862 associated with IgE-mediated type-I hypersensitivity in children. The WES-identified heterozygous variant (p.A938T) in our study located close to and upstream of the Dimer_Tnp_hAT (hAT family C-terminal dimerization region) domain of ZNF862 protein (Figure 1).
The pathophysiological mechanisms underlying HGF remain largely elusive. Nevertheless, overproduction of extracellular matrix, particularly the major component collagen type I (COL1A1), may account for the gingival fibroblast overgrowth phenotype; meanwhile increased synthesis of TIMP-1 seems to be associated with COL1A1 excessive accumulation in HGF gingival fibroblasts (Gawron et al., 2018;Roman-Malo et al., 2019 of TGF-β1 and IL-6 in gingival fibroblasts play pivotal roles in increasing the synthesis of the COL1A1 along with other specific growth factors (Martelli-Junior et al., 2003). Systematically, transcriptomic analysis of HGF patients and controls performed by Han et al., 2019, demonstrated that regulatory network connection of TGF-β/SMAD signaling pathway and craniofacial development processes contributes to the molecular mechanism of clinical-pathological manifestations. Aiming to explore the underlying mechanism related to the ZNF862 mutation, RNA sequencing was performed according to previously described protocols (Liang et al., 2019). The primary fibroblasts were isolated from fibromatic gingival specimens obtained from two patients who underwent gingivectomy (IV-1 and IV-2) using standard explant culture as described previously (de Andrade et al., 2001), while control cells were obtained from four independent age-and gender-matched controls who underwent crown lengthening surgery for the restorative purpose. The genes expression profiling      Table 1 continued and changes in patients over controls were summarized in Supplementary file 1, the samples average counts per million mapped reads (CPM) for each gene is shown, the expression fold-change (FC) and their statistical significance false discovery rate (FDR) are also provided. The expression profiling of attested profibrotic genes (Gao et al., 2019), which may be involved in HGF, was interrogated ( Figure 2A). A part of such genes, including COL1A1, TGFB1/2, and SMAD1, was up-regulated in HGF fibroblasts compared with controls; whereas another part was down-regulated, suggesting that the special HGF in our study was attributed to a corresponding part of profibrotic factors, including TGF-β/SMAD1 signaling pathway and COL1A1. Besides RNA sequencing, the in vitro cultured gingival fibroblasts from control 1 underwent adenovirus-mediated delivery of short hairpin RNA (shRNA) targeting ZNF862. The expression of four most significant genes ACTA2, FOS, SMAD1, AGT, as well as COL1A1, under shRNA treatment were investigated ( Figure 2B), the knockdown of ZNF862 in control gingival fibroblasts leads to the profibrotic genes up-regulation, reminiscent of the expression in the patients. Here, we speculate that the ZNF862 mutation in patients attenuate the protein function, similar with the ZNF862 shRNA, and promote special profibrotic genes expression to result in the HGF trait. Meanwhile, it did not escape our notice that the causative gene SOS1 was mildly downregulated, which may be associated with the phenotype (Figure 2 and Figure 2-figure supplement  3). Nevertheless, the proliferation in control gingival fibroblasts was not significantly stimulated by the knockdown of ZNF862 during 5 days' culture period (Figure 2-figure supplement 2), which may be partially resulting from the too short time course to observe the sophisticated physiological effect.
To screen all the genes differentially expressed (DE) between patients and controls, we quantified genes whose expression changed over twofold (|log 2 FC| ≥ 1.0), and we employed a cut-off of FDR < 0.05 due to the statistical power limitation of the small number of entire samples (N = 6). Finally, the transcriptomic comparison of HGF patients and controls yielded 597 DE transcripts, 355 up-regulated while 242 down-regulated ( Figure S1). Of which, the 100 most up-regulated DE genes and 100 most down-regulated DE genes (Supplementary file 2) were shown in Figure 2C, furthermore the top 10 genes were illustrated accordingly. These DE genes probably correlated with the HGF trait. The 20 most significantly enriched functional pathways of all DE genes were shown ( Figure 2D). Thereinto, lipoic acid metabolism is the most enriched cluster; notably the MAPK signaling pathway is associated with IL-6 pathway, besides several infection-related clusters in this chart is related with IL-6-induced autophagy pathway.
Taken together we can speculate that ZNF862 may execute as transcriptional repressors since ZNF862 has the zinc-finger DNA-binding domain, meanwhile this missense mutation may attenuate biological function of ZNF862 and hereby enhance the TGF-β/SMAD1 signaling pathway that increases profibrotic factors, particularly COL1A1 accumulation in gingival tissue and results in HGF. Nevertheless, the underlying mechanism of this novel mutation leading to HGF remains enigmatic, with no more specific experiment designed for exploring the biological function of ZNF862. Further studies are expected to expand the mutational spectrum of ZNF862 and the associations with phenotypes, and to investigate the physiological role of ZNF862 in the process of HGF.
In this study, we identify a missense variant (c.2812G > A) in a novel gene ZNF862 that causes autosomal-dominant HGF in a four-generation Chinese family. The functional study supports a biological role of ZNF862 for increasing the profibrotic factors, particularly COL1A1 synthesis in gingiva and hence resulting in HGF. This variant has been absent among the large population according to the report of Exome Aggregation Consortium (ExAC) database, 1000 Genomes, and Genome Aggregation Database. The log of the odds (LOD) score is calculated as 3.6 based on the dominant segregations. According to the combining criteria set guidelines by the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen) (Oza et al., 2018;Richards et al., 2015), the missense variant is rated as 'Pathogenic' with the following evidence rules: PP1_Strong (LOD > 3), PS3 (in vitro functional study support), and PM2 (absent from controls). We can speculate that this finding may shed lights on the precise genetic diagnosis and on the potential therapy development, helping individuals continue their lives better.

Samples collection and WES
The participants from this family were recruited from Nanjing Stomatological Hospital. The genomic DNA was extracted from the peripheral blood. WES was performed for probands and nine other members (numbers indicated in red color in Figure 1A) in the family. Average WES sequencing depth of target region for each sample was higher than 100, and 20× coverage for more than 95% targeted bases was achieved for each sample. Sequencing reads were aligned to the human genome reference sequence (hg19) through the in-house programs. Genome Analysis Toolkit (GATK) was used to call SNVs and indels; and Annotation of Genetic Variants (ANNOVAR) was used for the annotation. A custom Perl script (Source code 1) was used for retrieving the variants corresponding to the inheritance model of this family and that do not appear in unaffected individuals. All the WES data underwent the same quality control filtering and pruning procedures to maximize parity. The following primers are used to amplify products for Sanger sequencing: ZNF862 (NM_001099220.

Transcriptome sequencing
The primary fibroblasts were isolated from fibromatic gingival specimens obtained from the patients who underwent gingivectomy (IV-1 as patient 1 and IV-2 as patient 2) using standard explant culture, while control cells were obtained from independent age-and gender-matched controls (two males at 25 and 26 years of age as controls 1 and 2, respectively, and two females at 21 and 23 years of age as controls 3 and 4, respectively) who underwent crown lengthening surgery for the restorative purpose. Total RNA samples were extracted from the cultured fibroblast cells using Trizol, RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, CA). A total amount of 1 μg RNA per sample was used as input material for the RNA sample preparations. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in First Strand Synthesis Reaction Buffer. First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H-). Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3' ends of DNA fragments, adaptor with hairpin loop structure was ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 370-420 bp in length, the library fragments were purified with AMPure XP system (Beckman Coulter, Beverly, MA). Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index Primer. At last, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. The library preparations were sequenced on an Illumina Novaseq platform and 150 bp paired-end reads were generated.
Raw data were processed through an in-house bioinformatics workflow. Paired-end clean reads were aligned to the reference genome ensembl release-97 (http://asia.ensembl.org/Homo_sapiens/ Info/Index) using Hisat2 v2.0.5, featureCounts v1.5.0-p3 was used to count the reads numbers mapped to each gene. And then the CPM and fragments per kilobase of transcript per million mapped reads (FPKM) of each gene was calculated based on the length of the gene and reads count mapped to shRNA 2 (S2) groups, respectively. Results are given as means ± SEM, dots indicate the relative values of gene expression levels. *p < 0.05, **p < 0.01 in comparison with Sc group. N.S. means not significant. (C) Heatmap of the 100 most up-regulated differentially expressed (DE) genes and 100 most down-regulated DE genes expression profiling in RNA sequencing. The color key is the same with (A). (D) Functional annotation chart of DE genes, the 20 most significantly enriched functions of DE genes were plotted. Functional pathways enrichment tests were based on KEGG database. Enrichment denotes the proportion of DE genes among the indicated pathway.
The online version of this article includes the following figure supplement(s) for figure 2:    this gene. At least 45 million clean mapped reads per sample were yielded. The clean bases after filtration for each sample are higher than 6.7 Giga-base, with the effective rate higher than 99.7%, the Q20 bases is higher than 94% for each sample. The GC content for each sample is ranging from 51% to 55%. DE analysis of two groups (patients and controls) was performed using the edgeR R package (3.22.5). The resulting p-values were adjusted using the Benjamini and Hochberg's approach for controlling the FDR. Corrected p-value of 0.05 and absolute FC of 2 were set as the threshold for significantly DE. We used clusterProfiler R package to test the statistical enrichment of DE genes in KEGG pathways, corrected p-value less than 0.05 was considered significantly enriched. KEGG is a database resource for understanding high-level functions and utilities of the biological system (http:// www.genome.jp/kegg/).

ShRNA interference
Besides RNA sequencing, the in vitro cultured fibroblasts (from control 1, control 2, and control 3, respectively) underwent adenovirus-mediated delivery of shRNA targeting ZNF862, the following sequences were used for the ZNF862 shRNA interference experiments, ZNF862 shRNA 1: GCTC TGTT CTGC TCTG CTTGC; ZNF862 shRNA 2: GGAT TTAC ATCC GCTA CTTCA; scrambler: GCAC CCAG TCCG CCCT GAGCAAA, at 72 hr after the adenovirus-mediated U6 promoter-driven shRNA delivered into the cells (MOI = 70) real-time PCR was performed using 18s rRNA as an internal control. Cells between the third and sixth passages were used for all abovementioned experiments. The following primers are used to amplify products for the real-

Proliferation assay
Cell Counting Kit 8 (CCK8, Dojindo Molecular Technologies) proliferation assay was performed with the above-mentioned in vitro cultured fibroblasts. The cells in the logarithmic growth phase were plated overnight in a 96-well plate at an initial density of 2000 per well. Starting with adenovirus infection (MOI = 70), 10 μl CCK8 per well was added to the culture medium on the days 1, 2, 3, 4, and 5. The fluorescent optical density value was measured using a microplate reader. The in vitro cultured fibroblasts underwent adenovirus-mediated delivery of ZNF862 siRNA (NM_001099220.3).

Statistical analysis
All data from the semi-quantitative real-time PCR analyses are representative of at least three independent experiments. Data are presented as the mean ± SEM of at least three independent experiments and differences (in comparison with Sc group) are considered statistically significant at p < 0.05 using Student's t-test. *p < 0.05, **p < 0.01. N.S. means not significant. All analyses were performed using R v4.0.2.

Data availability
The sequencing data supporting this study have been deposited in the China Genebank Nucleotide Sequence Archive (https://db.cngb.org/cnsa, accession number CNP0000995).
The following dataset was generated: Author (