Structural insights and characterization of human Npas4 protein

Npas4 is an activity-dependent transcription factor that regulates the expression of target genes involved in neurotransmission. Despite the importance of Npas4 in many neuronal diseases, information on the tertiary structure of the Npas4 protein and its physicochemical properties is limited. In the current study, we first performed a phylogenetic analysis of Npas4 and determined its content of hydrophobic, flexible and order-disorder promoting amino acids. The protein binding regions, post-translational modifications and crystallization propensity of Npas4 were predicted through different in silico methods. The three-dimensional model of Npas4 was predicted through LOMETS, SPARKS-X, I-TASSER, RaptorX, MUSTER and Phyre, and the best model was selected on the basis of the Ramachandran plot, PROSA and Qmean scores. The best model was then subjected to further refinement through MODREFINER. Finally, the interacting partners of Npas4 were identified through the STRING database. The phylogenetic analysis showed the human Npas4 gene to be closely related to that of other primates such as chimpanzees, monkeys and gibbons. The physicochemical properties of Npas4 showed that it is an intrinsically disordered protein with an ordered N-terminal region. The post-translational modification analyses indicated the absence of acetylation and mannosylation sites. Three potential phosphorylation sites (S108, T130 and T136) were found in the PAS A domain, whilst a single phosphorylation site (S273) was present in the PAS B domain. The predicted tertiary structure of Npas4 showed that the bHLH and PAS domains possess tertiary structure, while the rest of the protein is disordered. Protein-protein interaction analysis revealed that Npas4 interacts with various proteins that are mainly involved in nuclear trafficking of proteins to the cytoplasm, activity-regulated gene transcription and neurodevelopmental disorders. Moreover, the analysis highlighted a direct relation to proteins involved in promoting neuronal survival and plasticity and to cAMP responsive element binding proteins. The current study helps in understanding the physicochemical properties of Npas4 and reveals its neuro-modulatory role in crucial pathways involved in neuronal survival and neural signalling homeostasis.


INTRODUCTION
Npas4 belongs to the basic helix-loop-helix/Per-Arnt-Sim (bHLH/PAS) transcription factor family (Shamloo et al., 2006). Members of this family regulate the expression of target genes either positively or negatively (Spiegel et al., 2014). Classically, these transcription factors harbor a DNA binding motif and a dimerisation motif called the PAS homology domain (Gu, Hogenesch & Bradfield, 2000). The term 'PAS' derives from the first three proteins in which the domain was identified: period (Per), aryl hydrocarbon receptor nuclear translocator (Arnt) and single-minded (Sim) (Ooe, Saito & Kaneko, 2009). Functionally, Npas4 is categorized as an immediate early gene (IEG), responsible for direct control of a large number of activity-dependent genes whose expression changes in response to the sensory stimuli they receive (Madabhushi et al., 2015; Sun & Lin, 2016). Npas4 is also known to contribute to the development of glutamatergic and GABAergic synapses in neurons, suggesting a crucial role in neural circuit homeostasis and memory formation (Spiegel et al., 2014). Deletion of Npas4 may lead to the development of several neuronal plasticity disorders and disorganised management of sensory input (Maya-Vetencourt, 2013). Previous studies have established its functional role in cerebral ischemia, memory developmental disorders and related pathologies such as autism spectrum disorders and neuropsychiatric disorders (Choy et al., 2015a; Choy et al., 2015b; Ebert & Greenberg, 2013; Shepard, Heslin & Coutellier, 2017). Few studies clearly delineate how Npas4 exerts its reported functions of neuroprotection and homeostasis of synaptic connectivity. Information on the tertiary structure of the Npas4 protein and its physicochemical properties is scarce. The current study aimed mainly to provide structural insight into the Npas4 protein and its possible interactions with other proteins.

MATERIALS AND METHODS
The flow chart of methodology is shown in Fig. 1.

Sequence retrieval and homology search
The Npas4 protein sequence was retrieved from the UniProt database in FASTA format and was subsequently used for a homology search with the protein BLAST program at NCBI (BLASTP). One hundred orthologous sequences of Npas4 were retrieved. Of these 100 sequences, 24 full-length sequences with homology >97% were selected for phylogenetic analysis.
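The selection step described above can be sketched as a simple filter. This is a minimal illustration, assuming the BLASTP hits have already been parsed into records; the `BlastHit` type, its field names, the use of query coverage as a stand-in for "full length", and the example hits are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str           # hypothetical identifier of the hit
    percent_identity: float  # percent identity to the query sequence
    query_coverage: float    # percent of the query covered by the hit


def select_orthologs(hits, min_identity=97.0, min_coverage=100.0):
    """Keep hits above the identity cutoff that also cover the whole query.

    Treating 100% query coverage as "full length" is an assumption of this sketch.
    """
    return [
        h for h in hits
        if h.percent_identity > min_identity and h.query_coverage >= min_coverage
    ]


# Hypothetical hits, for illustration only.
hits = [
    BlastHit("hypothetical_A", 99.1, 100.0),
    BlastHit("hypothetical_B", 97.4, 82.0),   # fails the coverage criterion
    BlastHit("hypothetical_C", 91.0, 100.0),  # fails the identity criterion
]
selected = select_orthologs(hits)
print([h.accession for h in selected])
```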

Multiple sequence alignment and phylogenetic analysis
Multiple sequence alignment of the orthologous sequences was performed using ClustalX (version 2.0) with all default parameters. For phylogenetic analysis, the tree was constructed in MEGA 7 (Kumar, Stecher & Tamura, 2016) using the neighbour-joining (NJ) method with 1,000 bootstrap replicates. The evolutionary distances were computed using the Poisson correction method. All other parameters were set to their defaults.

Figure 1 Flowchart used for functional annotation of Npas4. Sequence retrieval of Npas4 was followed by homology search, multiple sequence alignment and phylogenetic analysis. Physiological properties and post-translational modifications of Npas4 were predicted using the primary sequence. Subsequently, secondary and tertiary structures were predicted through ab initio modeling.
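The Poisson correction mentioned in the phylogenetic methods converts the observed proportion of differing sites, p, between two aligned sequences into a distance d = -ln(1 - p). The sketch below illustrates that formula only; it is not the MEGA implementation, and skipping any column that contains a gap (a pairwise-deletion style treatment) is an assumption of the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not Npas4 sequences).
print(poisson_distance("MKLV", "MKLA"))
```

A matrix of such pairwise distances is the input that a neighbour-joining procedure works from.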

Protein order-disorder prediction
To check whether Npas4 is an intrinsically disordered protein, predictions were computed using the MobiDB web tool and the MetaDisorder web server (Kozlowski & Bujnicki, 2012; Piovesan et al., 2018). MobiDB predicts the disordered regions of proteins using six different tools (DisEMBL, ESpritz, GlobPlot, IUPred, JRONN and VSL2b) (Piovesan et al., 2018). MetaDisorder predicts the disordered regions of proteins using 13 different tools and provides the consensus of all these tools (Kozlowski & Bujnicki, 2012). To predict the type of disorder in Npas4, charge-hydropathy plot and cumulative distribution function (CDF) analyses were performed through the PONDR server (Xue et al., 2010).
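How a consensus can be formed from several per-residue predictors is sketched below. This is a plain majority vote over binary calls, shown only to illustrate the idea; it is not the algorithm used by MobiDB or MetaDisorder, and the tool names and calls in the example are hypothetical.

```python
def consensus_disorder(predictions, min_votes=None):
    """Combine per-residue disorder calls (1 = disordered, 0 = ordered).

    predictions maps a tool name to a list of 0/1 calls, one per residue.
    By default a residue is called disordered when a strict majority agrees.
    """
    tracks = list(predictions.values())
    if not tracks:
        raise ValueError("at least one prediction track is required")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all tracks must cover the same number of residues")
    if min_votes is None:
        min_votes = len(tracks) // 2 + 1
    return [1 if sum(column) >= min_votes else 0 for column in zip(*tracks)]


# Hypothetical calls from three tools over a six-residue fragment.
calls = {
    "tool_1": [0, 0, 1, 1, 1, 0],
    "tool_2": [0, 1, 1, 1, 0, 0],
    "tool_3": [0, 0, 0, 1, 1, 1],
}
print(consensus_disorder(calls))
```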

Tertiary structure prediction
Npas4 (UniProt ID: Q8IUM7) was subjected to a BLAST search, which found less than 25% homology with the crystal structures of the NPAS3-ARNT complex (5SY7) and the NPAS1-ARNT complex (5SY5) (Figure S1). In the absence of a suitable structural homologue, homology modeling could not be performed, and a threading approach to protein structure prediction was used instead. Threading is a fold recognition method for predicting the 3D structure of proteins. I-TASSER (http://zhanglab.ccmb.med.umich.edu/I-TASSER), LOMETS (https://zhanglab.ccmb.med.umich.edu/LOMETS/), MUSTER (https://zhanglab.ccmb.med.umich.edu/MUSTER/), SPARKS-X (http://sparks-lab.org/yueyang/server/SPARKS-X/), RaptorX (http://raptorx.uchicago.edu/) and Phyre 2.0 (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) were used for three-dimensional structure prediction of human Npas4. LOMETS uses a local meta-threading approach to predict the tertiary structure of a protein. It has nine locally installed threading programs and generates its output on the basis of high-scoring target-template alignments. I-TASSER is an iterative threading program that builds the 3D structure using a hierarchical method (Roy, Kucukural & Zhang, 2010). MUSTER is a multi-source threading program that identifies structural templates from the Protein Data Bank, performs profile-profile alignment and generates the 3D structure (Wu & Zhang, 2008). SPARKS-X is a fold recognition method for predicting 3D structure (Yang et al., 2011). Phyre searches for a structural template for the query sequence, followed by multiple sequence alignment and a search against hidden Markov models of known structures to find the best structure (Kelley et al., 2015).

Protein-protein interaction analysis
The STRING database (https://string-db.org/) was used to predict the interactions of Npas4 with other cellular proteins. STRING is a database of known and predicted protein-protein interactions. To obtain the interacting partners of Npas4, the medium confidence score of 0.4 was used, with a maximum of 20 interactions for the first shell and a maximum of 10 interactions for the second shell. The interactions may be direct physical interactions or indirect functional associations (Szklarczyk et al., 2017).
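A comparable query could also be posed programmatically. The sketch below only builds a request URL for the public STRING REST interface; the endpoint name and parameter names (`interaction_partners`, `identifiers`, `species`, `required_score`, `limit`) are given as we understand the STRING API and should be checked against its documentation, and nothing here implies that the study used the API rather than the web interface. STRING expresses scores on a 0-1000 scale, so a confidence of 0.4 becomes 400; the second-shell setting is not covered by this sketch.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def build_partner_query(identifier, species=9606, min_score=0.4, limit=20, fmt="tsv"):
    """Build a STRING interaction-partner request URL (no network access here).

    species=9606 is the NCBI taxonomy identifier for human.
    """
    params = {
        "identifiers": identifier,
        "species": species,
        "required_score": int(round(min_score * 1000)),  # 0.4 -> 400
        "limit": limit,  # maximum number of first-shell partners
    }
    return f"{STRING_API}/{fmt}/interaction_partners?{urlencode(params)}"


url = build_partner_query("NPAS4")
print(url)
```

The resulting URL can be fetched with any HTTP client; the tab-separated response lists one interaction per line.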

RESULTS
Multiple sequence alignment and phylogenetic analysis
The multiple sequence alignment of the human Npas4 sequence with those of other species is shown in Figure S2. The phylogenetic analysis of the aligned Npas4 sequences suggests that Npas4 is evolutionarily conserved. The analysis of the taxonomic classification of all organisms (Table S1) revealed human Npas4 to be closely related to that of other primates, including chimpanzees (T.N), monkeys and gibbons. The next closest neighbors were rodents and other small mammals (marmots, elephant-shrews, moles, guinea pigs). Also related to this group were artiodactyls such as camels and alpacas. The farthest neighbors among the Npas4 sequences were lagomorphs, including rabbits and pikas (Fig. 2).

Disorder prediction of Npas4
Disorder prediction was carried out using Metadisorder and MobiDB, and both indicated that Npas4 is an intrinsically disordered protein. According to the Metadisorder plot, the C-terminal half of the protein showed strong disorder, while the N-terminal region is ordered (positions 15-350 a.a.). CDF (cumulative distribution function) analysis showed that Npas4 is a mixture of ordered and disordered regions, as its curve intersects the boundary (Fig. 3). Primary sequence analysis showed that Npas4 is enriched in disorder-promoting amino acids (Pro, Ser, Glu, Gln, Ala and Gly; Table 1). These amino acids tend to prevent protein folding (67-70). Of these, Pro and Ser have the highest probability. Amino acid sequence analysis of the Npas4 protein showed that almost 50% of the protein is composed of disordered regions. Flexibility analysis of Npas4 was performed using the Composition Profiler and DynaMine servers, and both showed that Npas4 is enriched in flexible amino acids such as Pro, Ser and Gln (Fig. 4A). Low hydrophobicity and high hydrophilicity promote disorder in proteins (Dyson & Wright, 2016). The Composition Profiler analysis suggests that Npas4 is enriched in hydrophilic amino acids such as Gln, Ser, Pro and Thr (Fig. 4B).
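The kind of composition check described above can be illustrated with a short sketch that computes the fraction of disorder-promoting residues in a sequence. The residue set (Pro, Ser, Glu, Gln, Ala, Gly) follows the list given in the text; the function names and the toy sequence are hypothetical, and this is not the method used by the Composition Profiler server, which compares a query against a background distribution.

```python
from collections import Counter

# Disorder-promoting residues as listed in the text: Pro, Ser, Glu, Gln, Ala, Gly.
DISORDER_PROMOTING = set("PSEQAG")


def composition(sequence):
    """Fractional amino acid composition of a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def disorder_promoting_fraction(sequence):
    """Fraction of residues that belong to the disorder-promoting set."""
    comp = composition(sequence)
    return sum(frac for aa, frac in comp.items() if aa in DISORDER_PROMOTING)


# Toy sequence (hypothetical, not the Npas4 sequence).
toy = "MPSSEQAGLW"
print(round(disorder_promoting_fraction(toy), 3))
```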

Physico-chemical properties of Npas4
The physico-chemical properties were determined with the ProtScale server (Fig. 5). A higher score suggested a higher probability of that particular property for Npas4. Bulkiness, the ratio of the side chain volume to the length of an amino acid, may affect the local structure of a protein. The data showed that Npas4 possesses polarity. Mutability, which is the probability that an amino acid will change over a given evolutionary period, was determined through the relative mutability score (Fig. 5D); the values lie between 46.000 (position 295 aa) and 102.889 (position 18 aa), showing that Npas4 has mutability potential. The signal peptide of Npas4 was computed using the SignalP server. The presence of a signal peptide was assessed through the C-score (raw cleavage site score), S-score (signal peptide score) and Y-score (combined cleavage site score). The S-score estimates the likelihood of a signal peptide, while the D-score is the average of the mean S-score and the maximum Y-score and discriminates signal peptides from non-signal peptides. In Npas4, the D-score was 0.450, which is below the cut-off value of 0.5 and indicates the absence of a signal peptide (Fig. 6). The secretory nature of Npas4 was predicted through SecretomeP, which gave an NN score of 0.5.
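The D-score rule described above amounts to a simple average followed by a threshold test, as the following sketch shows. Only the definition (average of the mean S-score and the maximum Y-score) and the 0.5 cut-off come from the text; the function names are hypothetical, and the two input values are invented for illustration because the mean S-score and maximum Y-score are not reported.

```python
def d_score(mean_s, max_y):
    """D-score: average of the mean S-score and the maximum Y-score."""
    return (mean_s + max_y) / 2.0


def has_signal_peptide(d, cutoff=0.5):
    """A signal peptide is called only when the D-score exceeds the cut-off."""
    return d > cutoff


# Hypothetical inputs (not reported in the study), chosen so that their
# average equals the reported D-score of 0.450.
d = d_score(mean_s=0.40, max_y=0.50)
print(round(d, 3), has_signal_peptide(d))
```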

Crystallization propensity of Npas4
Crystallization propensity was predicted through multiple resources (ParCrys, CRYSTALP2, FDETECT and PPCpred) in order to obtain a validated prediction. The ParCrys server predicted Npas4 to be 'Recalcitrant to Crystallisation'. According to CRYSTALP2, Npas4 is non-crystallizable with 0.447 confidence. The probability of crystallization was also studied through FDETECT, which gave a score of 0.64, meaning that this protein is difficult to crystallize. PPCpred gave a crystallization probability score of 0.122 for Npas4; because a score above 0.4 indicates that a protein is able to crystallize (Mizianty & Kurgan, 2011), this likewise suggests difficulty in crystallization.

Protein binding regions in Npas4
Many disordered proteins bind to other proteins, undergo a disorder-to-order transition and thereby perform their function (Dyson & Wright, 2016). ANCHOR predicted 10 binding sites in the disordered region of Npas4. Three of these 10 regions (positions 585-599, 662-720 and 746-792) were present in the transactivation domain of Npas4 (Table 2).

Post-translational modification sites
Attempts were made to predict the protein modification sites that could occur in Npas4. The results indicated the absence of acetylation and mannosylation sites. However, two potential N-glycosylation sites were found in the disordered region of Npas4 at amino acid positions 556 (NPTK) and 671 (NLSL) (Fig. 7). A total of 80 O-linked glycosylation sites were present in Npas4. However, only 56 sites were found to be available for glycosylation according to the GlyCamserver.
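For orientation, candidate N-glycosylation sites are commonly located by scanning for the sequon N-X-S/T. The sketch below is a generic regular-expression scan, not the algorithm of the prediction server used in the study. The canonical definition excludes proline at the X position; the `allow_proline` switch, the function name and the toy sequence are assumptions added for illustration.

```python
import re

# Canonical sequon: Asn, any residue except Pro, then Ser or Thr.
STRICT_SEQUON = re.compile(r"(?=(N[^P][ST]))")
# Relaxed variant that also reports motifs with Pro at the middle position.
RELAXED_SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(sequence, allow_proline=False):
    """Return (1-based position of Asn, tripeptide) for each sequon found."""
    pattern = RELAXED_SEQUON if allow_proline else STRICT_SEQUON
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Toy sequence (hypothetical) containing one NLS and one NPT motif.
toy = "AANLSLGGNPTKAA"
print(find_sequons(toy))
print(find_sequons(toy, allow_proline=True))
```

Under the strict pattern a motif beginning N-P-T is not reported, so whether such a site is counted depends on the rule a given predictor applies.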
According to NetPhos 3.1 server predictions, 34 threonine phosphorylation sites, 53 serine phosphorylation sites and four tyrosine-specific phosphorylation sites were present in Npas4 (Fig. 8). NetSurfP prediction revealed that 43 serine, 28 threonine and three tyrosine residues were exposed for phosphorylation. Table S2 shows the 24 serine, 13 threonine and one tyrosine kinase-specific sites in Npas4. Two phosphorylation sites (S38, S44) were present in the bHLH domain, three sites (S98, S100 and T136) were observed in the PAS A domain and one site (S273) was seen in the PAS B domain. Six Ser and four Thr phosphorylation sites were present in the disordered transactivation domain (Table 3).

Secondary structure of Npas4
The secondary structure of Npas4 showed that the protein consists of alpha helices, coils and beta sheets (Figure S3).

Tertiary structure of Npas4
The three-dimensional structure of the Npas4 protein has not been determined to date. Because no suitable structural template is available, homology modeling cannot be used (Figure S1). To obtain a high-quality structure of Npas4, ab initio and threading approaches were used. With these approaches, five models were generated through I-TASSER, 10 models through LOMETS, one model through RaptorX, 10 models through MUSTER, one model through Phyre and 10 models through SPARKS-X (Table 5). All 37 models were then subjected to validation through PROSA, Qmean and the Ramachandran plot. The models with 96% of residues in the favoured region were selected for further refinement (Table 6, Fig. 9). All models suggested that Npas4 is a disordered protein with ordered bHLH and PAS domains. Based on the Qmean, PROSA Z-score and Ramachandran scores, model 9 generated by LOMETS was the best predicted three-dimensional structure of Npas4. The possible phosphorylation sites in the PAS A and PAS B domains of Npas4 are shown in Fig. 10.
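The model selection described above can be pictured as a filter followed by a ranking. The sketch below is illustrative only: the 96% favoured-region threshold comes from the text, but the ranking rule (higher Qmean first, then more negative PROSA Z-score) is an assumption, since the text does not state how the three scores were combined, and all model names and score values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelScore:
    name: str
    rama_favoured: float  # percent of residues in favoured Ramachandran regions
    qmean: float          # Qmean score (higher treated as better here)
    prosa_z: float        # PROSA Z-score (more negative treated as better here)


def shortlist(models, min_favoured=96.0):
    """Keep models meeting the Ramachandran favoured-region threshold."""
    return [m for m in models if m.rama_favoured >= min_favoured]


def rank(models):
    """Order models by Qmean (descending), then PROSA Z-score (ascending)."""
    return sorted(models, key=lambda m: (-m.qmean, m.prosa_z))


# Hypothetical scores, for illustration only.
models = [
    ModelScore("model_a", 96.4, -2.1, -6.8),
    ModelScore("model_b", 93.0, -1.5, -7.2),  # below the 96% threshold
    ModelScore("model_c", 97.1, -3.0, -5.9),
]
ranked = rank(shortlist(models))
print([m.name for m in ranked])
```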

DISCUSSION
The Npas4 gene has been under discussion for more than a decade; however, structural information on it is scarce. Npas4 expression is primarily traced to neural cells, where it has been reported to be an important contributor to dendritic growth during neuronal development and to the modulation of limbic patterning and function (Moser et al., 2004). Recently, it has also been reported to be expressed in pancreatic cells (Speckmann et al., 2016). Functionally, Npas4 is an immediate early gene which, upon activation, triggers a battery of genes involved in regulating brain plasticity and cognition, an effect attributed mainly to its interaction with a number of transcription factors. Npas4 belongs to the bHLH-PAS family of transcription factors, and the other Npas members (1 and 3) are also linked to numerous psychiatric disorders, namely autism, bipolar disorders, schizophrenia and depressive disorders (Adachi et al., 2014; Kamnasaran et al., 2003). A simulated dimerised structure of Npas4 with ARNT was reported previously to discuss the potential implications of gene variants (Bersten et al., 2014). The current findings address the physicochemical properties of the human Npas4 protein and present a plausible 3D model, providing useful information about amino acid characteristics and the possible identification of interacting proteins. Human Npas4 is located on human chromosome 11, reference genomic contig NC_000011.10, mapping to the chromosomal position 11q13.2. It possesses 11 exons that encode a 2406 bp mRNA, which translates into an 802 amino acid protein with a molecular weight of 87.1 kDa (https://www.ncbi.nlm.nih.gov/gene/266743).
Full-size DOI: 10.7717/peerj.4978/fig-11
In the current study, the functional characterization of Npas4 has been carried out using in silico approaches. The MSA and phylogenetic analysis showed that Npas4 is evolutionarily conserved, with the highest homology to primates such as chimpanzees and monkeys.
Npas4 does not have a signal peptide; hence it may not be classically secreted extracellularly. The NN score of SecretomeP also verifies Npas4 to be a non-classically secreted protein.
The physicochemical and functional properties of proteins are affected by post-translational modifications. Phosphorylation, acetylation and glycosylation are the most common types of post-translational modification of proteins (Koh et al., 2012).
Acetylation is one of the major post-translational modifications (PTMs) and is important in determining the cellular localization of proteins (Qin, Pang & Zhou, 2011). No acetylation site is present in the Npas4 protein.
Glycosylation, the addition of a glycosyl moiety to a protein, is the most common post-translational modification in eukaryotes. N-linked and O-linked glycosylations are common, while C-linked glycosylation (mannosylation) is rare (Koh et al., 2012). No C-linked glycosylation sites were present in Npas4. Two N-linked glycosylation sites were present in Npas4, but these sites were not available for glycosylation due to the absence of a signal peptide.
Phosphorylation is another important PTM. Phosphorylation changes the conformation of a protein, making it active or inactive or modifying its function (Raghava et al., 2014). In Npas4 there were 91 potential phosphorylation sites, comprising 34 threonine, 53 serine and four tyrosine phosphorylation sites. However, not all of these sites were available for phosphorylation: 24 serine, 13 threonine and one tyrosine site were available. The important phosphorylation sites in the PAS A and PAS B domains were found at positions 130 and 136 (PAS A) and 273 (PAS B). Further experimental studies are needed to explore the role of these sites in Npas4 function.
Prediction analysis in the current study suggested the presence of intrinsically disordered regions (IDRs) at the C-terminus of the Npas4 sequence, which can have a profound effect on its functional versatility. Conventionally, proteins at physiological temperature exhibit a particular conformational ensemble based on optimal thermal accessibility of the ensemble to various molecular crowders and interacting proteins (Wright & Dyson, 1999). However, intrinsically disordered proteins, or IDRs within a protein, enable the protein to interconvert among various topological conformations (Dyson & Wright, 2005). These IDRs can therefore give Npas4 the flexibility to engage multiple targets. Moreover, IDRs can aid interactions through efficient use of fewer residues, which will enhance spontaneous dissociation or displacement in neuronal pathways. It has previously been reported that IDRs lack structural homogeneity as a result of their varied composition of Gly, Pro and charged residues (Wright & Dyson, 2015), as is the case for Npas4, which may in turn reduce protein folding. This may further enhance the conformational plasticity of Npas4, helping it respond differently to various transcription factors and broadening its functional repertoire. The presence of IDRs in the C-terminal regulatory domain has also been reported in other bHLH proteins (Fribourgh & Partch, 2017).
Moreover, the previously reported behavior of IDRs suggests that the presence of more charged residues enhances electrostatic interactions, which augments the propensity for post-translational modifications (Hofmann et al., 2012; Mao et al., 2010). This in turn results in altered binding affinities and an increased range of structural heterogeneity derived from disorder-to-order transitions that alter protein compactness. The function of Npas4 as an 'immediate early gene' should therefore be considered together with the predicted and implied consequences of harboring IDRs in its sequence. The PAS domains, which carry multi-ligand binding properties, can help Npas4 act as a 'hub protein' in forming various complexes and as a key mediator in neuronal signaling.
The presence of IDRs within the Npas4 sequence can make it difficult to determine the protein structure through X-ray crystallography, which may be attributable to hindrance of crystal packing by the promiscuous conformations of disordered regions (Wells et al., 2008). This could be one of the reasons why the crystallization propensity of Npas4 received very low scores from all servers used.
The function of a protein can best be predicted if we have insight into its tertiary structure. The Npas4 sequence has less than 25% homology with the crystal structures of the PAS domains of Npas3 and Npas1. In the absence of any structural template, an ab initio modeling approach was chosen for Npas4 structure prediction. The modeled structure was composed of alpha helices and beta sheets in the bHLH domain and the PAS A and B domains, while the transactivation domain does not possess any tertiary structure due to the presence of IDRs. Because the PAS domain has a defined tertiary structure, it can take part in heterodimerization with its partner (ARNT). The current study of amino acid properties suggested a preponderance of hydrophobic amino acids in both the PAS A and B domains. These findings are in accordance with other bHLH proteins such as Npas1, Npas3, BMAL1 and CLOCK, in which the PAS domains harbor internal hydrophobic cavities (Wu et al., 2016).
Npas4 was also subjected to prediction of interacting proteins, which reflected its interaction with ARNT, its main partner for dimerization. The STRING analysis also showed some other important interactions. The analysis showed that the functional interactions of Npas4 can be divided into four clusters or groups. Cluster 1 involves proteins that form a large protein complex (the SMC5-SMC6 complex). This protein complex is responsible for inducing double-stranded DNA breaks (DSBs) under neuronal stimulation, which is crucial for expression from downstream promoters such as that of BDNF (Madabhushi et al., 2015). The interaction of Npas4 with CDK5, which has a key role in BDNF-induced dendrite development, implicates Npas4 in memory plasticity through regulation of the balance of inhibition in neuronal synapses. This interaction may also be related to Rett syndrome, a disorder in which activity-dependent BDNF transcription is hampered (Liang et al., 2015). Cluster 2 (Fig. 8) suggested that Npas4 interacts with nuclear RNA export factor (NXF) proteins, which are RNA binding proteins with various paralogs reported to be found in dendritic granules (Mamon et al., 2017), suggesting the active involvement of Npas4 in the nuclear export of different mRNAs and in translational control over different proteins. Cluster 3 comprises proteins reported to have altered expression in autism spectrum disorders (ASD), which may result from knockdown of activity-regulated gene transcription mediated by Npas4 (Morrow et al., 2008). Cluster 4 represented the interaction of Npas4 with cyclic AMP responsive element binding (CREB) proteins. In their phosphorylated state, these proteins can alter chromatin structure through histone acetylation, which speeds up RNA polymerase II recruitment and triggers the gene transcriptional programs involved in synapse development (Cohen & Greenberg, 2008; Greer & Greenberg, 2008).
CREB, along with Npas4, is also involved in regulating the inhibitory synapses of excitatory neurons by activating BDNF transcription (Hong, McCord & Greenberg, 2008; Lin et al., 2008). Interestingly, we found three binding sites in the transactivation domain that fall within the IDR. This provides a plausible rationale that CREB may interact with any of these residues in the transactivation domain through the transcriptional coactivator CBP, which is known to mediate such interactions through IDRs (Dyson & Wright, 2016).

CONCLUSION
Npas4 is an intrinsically disordered protein with ordered bHLH and PAS domains. The amino acid sequence analysis showed that Npas4 has a high proportion of flexible and hydrophilic amino acids, which promote disorder in proteins. Model 9 predicted through LOMETS is the best structure of Npas4 and can be used for further analysis. This protein is difficult to crystallize, so NMR and other related techniques can be used to determine its tertiary structure. It has strong interactions with the NXF 2B and NXF 3 proteins, which implicate a potential role in cytoplasmic export from the nucleus, thus influencing protein translation. The current study also elucidates Npas4 activation with CREB proteins, suggesting the reliance of Npas4 on Ca2+ dependent kinases (CDKs). The information compiled in this research can be useful for identifying new drug targets that can modulate synaptic homeostasis in neuropsychiatric and neurodevelopmental disorders. Moreover, the role of the possible interaction with the nuclear RNA export factor (NXF) protein family needs further elaboration. The results reported in our study are structural and theoretical in nature; however, they may help in biophysical, NMR and crystallographic studies directed towards Npas4. Moreover, these results may help in future studies designed to understand the binding interactions of Npas4.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
The authors received no funding for this work.