In-Depth Bioinformatic Study of the CLDN16 Gene and Protein: Prediction of Subcellular Localization to Mitochondria

Background and Objectives: Defects in the CLDN16 gene are a cause of primary hypomagnesemia (FHHNC), which is characterized by massive renal magnesium wasting, resulting in nephrocalcinosis and renal failure. The mutations occur throughout the gene's coding region and can impair either the intracellular trafficking of the protein or its paracellular pore-forming function. To better understand the mechanisms by which CLDN16 mutations can induce FHHNC, we performed an in-depth computational analysis of the CLDN16 gene and protein, focusing specifically on the prediction of the latter's subcellular localization. Materials and Methods: The complete nucleotide or amino acid sequence of CLDN16 in FASTA format was submitted to and processed by 14 databases. Results: One CpG island was identified. Twenty-five promoters/enhancers were predicted. The CLDN16 interactome was found to consist of 20 genes, mainly involved in kidney diseases. No signal peptide cleavage site was identified. A probability of export to mitochondria equal to 0.9740 and a cleavable mitochondrial localization signal at the N-terminus of the CLDN16 protein were predicted. The secondary structure prediction was visualized. No phosphorylation sites were identified within the CLDN16 protein region by applying DISPHOS to the functional class of transport. The KnotProt database did not predict any knot or slipknot in the protein structure of CLDN16. Seven putative miRNA binding sites within the 3'-UTR of CLDN16 were identified. Conclusions: This is the first study to identify mitochondria as a probable cytoplasmic compartment for CLDN16 localization, thus providing new insights into the protein's intracellular transport. The results on the CLDN16 interactome underline its role in renal pathophysiology and highlight the functional dependence of CLDN10, CLDN14, CLDN16 and CLDN19. The predictions pertaining to the miRNAs, promoters/enhancers and CpG islands of the CLDN16 gene indicate a strict regulation of its expression both transcriptionally and post-transcriptionally.


Introduction
The CLDN16 gene is located on chromosome 3q28 and encodes the claudin-16 protein, which is found primarily in the kidneys, specifically in the thick ascending limb (TAL) of the loop of Henle, where it regulates the paracellular reabsorption of magnesium ions. Defects in the CLDN16 gene are a cause of primary hypomagnesemia (FHHNC) (OMIM # 248250; HOMG3), which is characterized by massive renal magnesium wasting, resulting in nephrocalcinosis and renal failure.

Materials and Methods
The nucleotide sequence of the CLDN16 gene was downloaded in FASTA format from the Ensembl database. The EMBOSS_CpGplot tool [7] was employed to identify putative CpG islands using the criteria of Takai and Jones [8]: observed/expected CpG ratio >0.65; GC content (percent C + percent G) >55.00; length >500 bp.
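The three thresholds listed above can be illustrated for a single candidate region with a short Python sketch. This is not the EMBOSS implementation: the function names are hypothetical, no sliding window is applied, and the observed/expected ratio is assumed to follow the commonly used formula (number of CpG dinucleotides x region length) / (number of C x number of G).

```python
def cpg_stats(seq):
    """Return (GC percent, observed/expected CpG ratio) for one region."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # Assumed formula: observed CpG count divided by the expected count (C * G / N)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def meets_takai_jones(seq, min_len=500, min_gc=55.0, min_oe=0.65):
    """Check the thresholds quoted in the text for a single region."""
    gc_percent, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_percent > min_gc and obs_exp > min_oe


if __name__ == "__main__":
    toy_region = "CG" * 300  # artificial 600-nt sequence, not CLDN16 data
    print(cpg_stats(toy_region), meets_takai_jones(toy_region))
```

The toy sequence is artificial and only demonstrates the arithmetic; a real analysis would scan the gene in windows, as the CpGplot tool does.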
The prediction of promoters and enhancers was performed with FPROM [9]. GeneMANIA was used for the prediction of the CLDN16 gene interactors [10]. The description of the retrieved genes was obtained from the GeneCards database [11]. The functional enrichment analysis of the disease-related gene ontology (GO) annotations of the CLDN16 interactome was performed using ToppFun, an application of the ToppGene Suite [12].
Similarly, the amino acid sequence of the CLDN16 protein was downloaded in FASTA format from the UniProt database (UniProtKB ID: Q9Y5I7). The ProtParam tool was used to predict the protein's physicochemical properties [13]. The SignalP v.4.1 Server was used to predict the presence and location of signal peptide cleavage sites [14]. The combined transmembrane topology and signal peptide prediction was performed with Phobius [15]. The protein's subcellular location was predicted by WoLF PSORT [16]. MitoProt II and MitoFates were used to identify putative mitochondrial targeting sequences and cleavage sites [17,18]. The secondary structure was predicted with PSIPRED v.3.3, which also allows disordered protein regions to be identified through the disorder predictor DISOPRED2 [19]. The prediction of protein knots was conducted using the KnotProt v.2.0 database [20]. DISPHOS v.1.3 was used to predict serine, threonine and tyrosine phosphorylation sites within the CLDN16 protein, with transport selected as the protein's functional category [21].
Predicted and validated CLDN16-miRNA targets were identified with the miRWalk v.2.0 database using the default parameters (3'UTR, start position of miRNA seed=position 1, minimum seed length=7) [22].
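The seed parameters quoted above (start position 1, minimum seed length 7) can be illustrated with a minimal Python sketch of perfect Watson-Crick seed matching. This is an assumption-laden simplification and not the miRWalk algorithm: the function names are hypothetical, and both sequences in the usage example are arbitrary toy strings, not the CLDN16 3'-UTR or any miRNA reported in this study.

```python
# RNA complement table (A<->U, C<->G)
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=1, length=7):
    """Return the mRNA site (5'->3') that pairs perfectly with the miRNA seed."""
    rna = mirna.upper().replace("T", "U")
    seed = rna[start - 1:start - 1 + length]  # 1-based seed start
    return seed.translate(COMPLEMENT)[::-1]   # reverse complement


def find_seed_matches(utr, mirna, start=1, length=7):
    """Return 1-based positions in the UTR where the seed site occurs."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna, start, length)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos + 1)
        pos = utr.find(site, pos + 1)
    return hits


if __name__ == "__main__":
    toy_mirna = "UGAGGUAGUAGG"            # toy sequence (assumption)
    toy_utr = "AAAUACCUCAGGGUACCUCAUU"    # toy sequence (assumption)
    print(seed_site(toy_mirna), find_seed_matches(toy_utr, toy_mirna))
```

Dedicated target-prediction tools combine seed matching with further criteria, so the sketch only conveys what "seed length" and "seed start position" refer to.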
The aforementioned bioinformatic tools were selected mainly on the basis of two factors: performance and quality of documentation. All analyses were performed in September 2018. Specific versions are indicated where more than one version of a database is available. The workflow of the in-silico methodology used in this study is visualized in Figure 1.

Results

CLDN16 Gene
One CpG island was identified in the nucleotide sequence of the CLDN16 gene (Figure 2). Twenty-five promoters/enhancers were predicted in the FPROM analysis (Table 1). The GeneMANIA database identified 20 genes as possible interactors of the CLDN16 gene (Table 2). According to ToppFun, the CLDN16 interactome is mainly involved in kidney disorders (Table 3).

CLDN16 Protein
The ProtParam tool predicted that the CLDN16 protein has a theoretical isoelectric point of 8.26 and an instability index of 35.14. The latter value, being below 40, classifies the protein as stable. No signal peptide cleavage site was identified by either the SignalP v.4.1 Server or Phobius. The results of the combined transmembrane topology and signal peptide prediction retrieved from Phobius are presented in Figure 3. WoLF PSORT identified 32 nearest neighbors (plasma membrane: 19, extracellular: 8, mitochondrial: 2, nuclear: 1, peroxisomal: 1, Golgi: 1). MitoProt II predicted a probability of export to mitochondria equal to 0.9740, while MitoFates identified a cleavable localization signal at the N-terminus of the CLDN16 protein (MPP cleavage site at position 22). The secondary structure prediction is shown in Figure 4. The KnotProt v.2.0 database did not predict any knot or slipknot in the protein structure of CLDN16. No phosphorylation sites were identified within the CLDN16 protein region by applying DISPHOS 1.3 to the functional class of transport (Figure 5).
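The nearest-neighbor counts reported above can be turned into proportions with a few lines of Python. This is only an arithmetic sketch: the function name is hypothetical, the counts are copied from the text, and the raw fractions are not the scores that the localization predictor itself reports.

```python
# Nearest-neighbor counts as reported in the text
NEIGHBORS = {
    "plasma membrane": 19,
    "extracellular": 8,
    "mitochondrial": 2,
    "nuclear": 1,
    "peroxisomal": 1,
    "Golgi": 1,
}


def neighbor_fractions(counts):
    """Return the total number of neighbors and the fraction per location."""
    total = sum(counts.values())
    fractions = {loc: n / total for loc, n in counts.items()}
    return total, fractions


if __name__ == "__main__":
    total, fractions = neighbor_fractions(NEIGHBORS)
    print(f"total neighbors: {total}")
    # List locations from the most to the least represented
    for loc, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{loc}: {frac:.1%}")
```

Tabulating the neighbors this way makes it easy to compare the relative support for each predicted compartment across tools.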

miRNA Analysis
The miRWalk v.2.0 database predicted that CLDN16 was most probably a target of the miRNAs presented in Table 4.

Discussion
In this study, we performed an in-depth computational analysis of the CLDN16 gene and protein to gain a better understanding of the latter's involvement in the pathophysiology of renal disorders. Because mislocalized CLDN16 mutants lose their function [6], the study focused specifically on the prediction of the protein's subcellular location. Our results, which were reproduced by two independent tools (MitoProt II and MitoFates), are the first to identify mitochondria as a probable cytoplasmic compartment for CLDN16 localization, thus providing new insights into the protein's intracellular transport. Additionally, no signal peptide cleavage site was identified. Integral membrane proteins carry a signal peptide and/or a transmembrane domain that mediates their insertion into the endoplasmic reticulum, from where they exit to reach the Golgi apparatus and the plasma membrane [23]. Two questions arise from the above findings: 1. What is the functional role of the CLDN16 protein in mitochondria, assuming that it actually localizes there? 2. Which processes control the protein's translocation?
The regulation of protein trafficking relies on information that is encoded within the protein sequence and occurs by two major mechanisms, namely co-translational and post-translational translocation [24]. CLDN16 is located in the tight junctions (TJs), but the mechanism regulating its localization is unclear [25]. It has been reported that in renal tubular epithelial cells the tight junctional localization of CLDN16 is regulated by Syntaxin 8 (STX8) [6]. In addition, the association between the two proteins requires the phosphorylation of CLDN16 [6]. The dephosphorylation of CLDN16 increases its intracellular distribution and decreases paracellular Mg2+ permeability [25]. The RING finger- and PDZ domain-containing protein PDZRN3 mediates the endocytosis of dephosphorylated CLDN16, thus representing an important component of the CLDN16-trafficking machinery in the kidney [25]. Notably, in this study, no phosphorylation sites were identified within the CLDN16 protein region by applying DISPHOS to the functional protein category of transport. This is important, as DISPHOS uses disorder information to improve the distinction between phosphorylation and non-phosphorylation sites [21]. Based on both our findings and the existing literature, we speculate that the dephosphorylated form of the CLDN16 protein may translocate to the mitochondria, although cellular subfractionation studies are needed to test this hypothesis. In addition, it would be of interest to see whether the protein as a whole or only a fraction of it, as in the case of the retinoblastoma protein, reaches the mitochondrial compartment [26].
The presence of mutations leading to defects in protein trafficking is an acknowledged pathogenetic mechanism observed in an increasing number of disorders, including approximately one third of monogenic diseases affecting the kidneys [27]. In the case of FHHNC, various mutations in the CLDN16 gene can lead either to the retention of the protein product in the endoplasmic reticulum and Golgi compartments or to its mislocalization to lysosomes [27]. Notably, the CLDN16 gene interaction network appears to be associated with Bartter syndrome type 4, which results from mutations in the BSND gene that also affect the trafficking and function of ClC-K channels [28,29]. The remaining results of the functional analysis are also intriguing, as they link the CLDN16 interactome with disorders of water, electrolyte and acid-base metabolism. The epithelial cells in the TAL form a water-impermeable barrier, actively transport Na+ and Cl− via the transcellular route and provide a paracellular pathway for the selective reabsorption of Mg2+ and Ca2+ [30]. An interesting study of salt and acid-base metabolism in CLDN16 knockdown mice revealed that the loss of CLDN16 results in increased urinary flow, reduced HCO3− excretion and lower urine pH [31].
The identification, in this work, of CLDN19 and CLDN14 as members of the CLDN16 gene regulatory network denotes the functional interplay of these genes, which has been confirmed in previous studies [30,32]. It has been reported that the CLDN14 protein blocks the paracellular cation channel formed by the CLDN16-CLDN19 protein complex that is critical for Ca2+ reabsorption in the TAL [32]. Of interest, the expression of CLDN14 is regulated at the post-transcriptional level by two microRNAs (miR-9 and miR-374), which directly target the 3'-UTR of the CLDN14 mRNA, inducing its decay and translational repression [30,32]. The Ca2+ sensing receptor (CaSR) acts upstream of the microRNA-CLDN14 axis [32], thus providing a regulatory loop that maintains Ca2+ homeostasis in the kidney [30]. Another finding pertaining to the CLDN16 gene interaction network that deserves comment is the identification of the CLDN10 gene, as it was recently reported that its deletion rescues CLDN16-deficient mice from hypomagnesemia and hypercalciuria [33]. It is worth noting that the four aforementioned CLDN genes have been included in a list of 16 genes (both differentially expressed and differentially methylated) which ranked in the top 15% of the nodes of an integrated gene regulatory network in kidney renal clear cell carcinoma [34]. We are currently performing an in-silico transcriptomic analysis of the CLDN16 interactome in kidney cancer to examine possible associations.
With respect to the miRNA analysis, this study predicted seven putative miRNA binding sites within the 3'-UTR of CLDN16. miRNAs partake in the regulation of almost every cellular process and are associated with many human pathologies, including kidney diseases [35]. In the TAL of the loop of Henle, miRNAs regulate not only Ca2+ metabolism, as mentioned above, but also salt and fluid handling [35]. This was evidenced in a study which demonstrated that miR-192 suppresses the β-1 subunit of Na(+)/K(+)-ATPase, the enzyme that provides the driving force for tubular transport [36]. To our knowledge, no literature exists that links the miRNAs identified in this study to FHHNC. Recently, a group from Spain standardized the protocol conditions for the identification of differentially expressed miRNAs in urinary exosome-like vesicles of patients with FHHNC and other renal diseases characterized by polyuria [37].
CpG islands are sites of transcription initiation [38] and have recently been characterized as "hotspots for global gene regulation" [39]. At the same time, promoters and enhancers are DNA regulatory regions responsible for ensuring proper spatiotemporal expression patterns of eukaryotic genes [40]. Most genes have multiple promoters, and 72% of human promoters are associated with CpG islands [41]. The frequency of TATA box-containing promoters among human protein-coding genes has been reported to be 10-20%, which means that the majority of protein-coding genes are regulated by TATA-less promoters [41]. The FPROM method, which was used in this study to identify potential transcription start positions, has been shown to predict both types of promoters with high accuracy [9]. The identification in this study of seven miRNA binding sites, along with the prediction of one CpG island and twenty-five promoters/enhancers within the nucleotide sequence of the CLDN16 gene, suggests that the latter's expression is possibly strictly regulated at the transcriptional and post-transcriptional levels.
A limitation of our study is that the results are mainly based on predictions. Bioinformatics should be combined with experimentation to generate more accurate and reliable interpretations; however, in-silico analysis allows researchers to make an informed decision before proceeding to expensive and time-consuming experiments for further validation [42]. The bioinformatics tools used in this study have been tested for their performance, as evidenced by the publications cited for each of them.

Conclusions
This study performed a thorough bioinformatic analysis of the CLDN16 gene and protein. Our main finding is the prediction that mitochondria are a probable subcellular compartment for the localization of the CLDN16 protein. The conditions under which the CLDN16 protein (or a fraction of it) reaches this organelle, along with its possible functional role there, must be further investigated at the experimental level. Our results with respect to the CLDN16 interactome underline its role in renal pathophysiology and highlight the functional dependence of the CLDN16, CLDN19, CLDN14 and CLDN10 genes. The predictions pertaining to the miRNAs, promoters/enhancers and CpG islands of the CLDN16 gene indicate a strict regulation of its expression both transcriptionally and post-transcriptionally. Our report encourages the study of both the potential translocation of the CLDN16 protein to mitochondria and the functional role of the CLDN16 gene regulatory network in kidney disorders other than FHHNC.