NPAS1-ARNT and NPAS3-ARNT crystal structures implicate the bHLH-PAS family as multi-ligand binding transcription factors

The neuronal PAS domain proteins NPAS1 and NPAS3 are members of the basic helix-loop-helix-PER-ARNT-SIM (bHLH-PAS) family, and their genetic deficiencies are linked to a variety of human psychiatric disorders including schizophrenia, autism spectrum disorders and bipolar disease. NPAS1 and NPAS3 must each heterodimerize with the aryl hydrocarbon receptor nuclear translocator (ARNT), to form functional transcription complexes capable of DNA binding and gene regulation. Here we examined the crystal structures of multi-domain NPAS1-ARNT and NPAS3-ARNT-DNA complexes, discovering each to contain four putative ligand-binding pockets. Through expanded architectural comparisons between these complexes and HIF-1α-ARNT, HIF-2α-ARNT and CLOCK-BMAL1, we show the wider mammalian bHLH-PAS family is capable of multi-ligand-binding and presents as an ideal class of transcription factors for direct targeting by small-molecule drugs. DOI: http://dx.doi.org/10.7554/eLife.18790.001


Introduction
The mammalian bHLH-PAS transcription factors share a common protein architecture consisting of a conserved bHLH DNA-binding domain, tandem PAS domains (PAS-A and PAS-B), and a variable activation or repression domain. These factors can be grouped into two classes based on their heterodimerization patterns (McIntosh et al., 2010; Bersten et al., 2013) (Figure 1A and Figure 1-figure supplement 1). Class I includes the three hypoxia-inducible factor (HIF)-α proteins (HIF-1α, HIF-2α and HIF-3α), four neuronal PAS domain proteins (NPAS1-4), aryl hydrocarbon receptor (AHR), AHR repressor (AHRR), single-minded proteins (SIM1, SIM2) and clock circadian regulator (CLOCK), while class II includes aryl hydrocarbon receptor nuclear translocator (ARNT, also called HIF-1β), ARNT2, brain and muscle ARNT-like protein 1 (BMAL1, also called ARNTL) and BMAL2 (ARNTL2). Heterodimerization between class I and class II members produces functional transcription factors capable of DNA binding and target gene regulation.
Many bHLH-PAS proteins are significantly involved in human disease processes and would be ideal drug targets if their architectures could accommodate binding and modulation by small molecules. Members of the HIF-α subgroup mediate the response to hypoxia and regulate key genetic programs required for tumor initiation, progression, invasion and metastasis (Semenza, 2012). AHR controls T cell differentiation (Stevens and Bradfield, 2008) and maintains intestinal immune homeostasis (Leavy, 2011), making it a potential target for alleviating inflammation and autoimmune diseases. Loss-of-function mutations in SIM1 have been linked to severe obesity in human populations (Bonnefond et al., 2013), and defects in SIM2 are associated with cancers. CLOCK and BMAL1 together establish molecular circadian rhythms, and their functional disruption can lead to a variety of metabolic diseases (Gamble et al., 2014).
The NPAS genes are highly expressed in the nervous system (Zhou et al., 1997;Brunskill et al., 1999;Ooe et al., 2004). In mice, genetic deficiencies in NPAS1 and NPAS3 are associated with behavioral abnormalities including diminished startle response, social recognition deficit and locomotor hyperactivity (Erbel-Sieler et al., 2004;Brunskill et al., 2005). NPAS2 is highly related to CLOCK in protein sequence (Reick et al., 2001), and altered patterns of sleep and behavioral adaptability have been observed in NPAS2-deficient mice (Dudley et al., 2003). NPAS4 deficiency is associated with impairment of long-term contextual memory formation (Ramamoorthi et al., 2011). In humans, alterations in all four NPAS genes have been linked to neuropsychiatric diseases including schizophrenia, autism spectrum disorders, bipolar disease and seasonal depression disorders (Kamnasaran et al., 2003;Pieper et al., 2005;Partonen et al., 2007;Pickard et al., 2009;Huang et al., 2010a;Bersten et al., 2014;Stanco et al., 2014). Structural information has not been available for any NPAS proteins to show if they could bind drug-like molecules for treating psychiatric diseases.
A crystal structure was previously reported for the CLOCK-BMAL1 heterodimer (Huang et al., 2012), and we recently reported crystal structures for both HIF-2α-ARNT and HIF-1α-ARNT heterodimers (Wu et al., 2015). In all these complexes, the conserved bHLH-PAS-A-PAS-B protein segments were visualized. While no internal cavities were reported within the CLOCK-BMAL1 architecture, we identified multiple hydrophobic pockets within HIF-1α-ARNT and HIF-2α-ARNT heterodimers. Discrete pockets were encapsulated within each of the four PAS domains of their heterodimers (two within ARNT and two within each HIF-α protein) (Wu et al., 2015). Beyond the first structural characterizations of NPAS1-ARNT and NPAS3-ARNT complexes presented here, we further addressed whether ligand-binding cavities are a common feature of mammalian bHLH-PAS proteins. A comparison of these two structures with those of CLOCK-BMAL1, HIF-1α-ARNT and HIF-2α-ARNT heterodimers unveils the larger mammalian bHLH-PAS family as ligand-binding transcription factors with internal pockets appropriate for the selective binding of lipophilic molecules and drug-like compounds.

eLife digest
Transcription factors are proteins that can bind to DNA to regulate the activity of genes. One family of transcription factors in mammals is known as the bHLH-PAS family, which consists of sixteen members including NPAS1 and NPAS3. These two proteins are both found in nerve cells, and genetic mutations that affect NPAS1 or NPAS3 have been linked to psychiatric conditions in humans. Therefore, researchers would like to discover new drugs that can bind to these proteins and control their activities in nerve cells. Understanding the three-dimensional structure of a protein can aid the discovery of small molecules that can bind to it and act as drugs. Proteins in the bHLH-PAS family have to form pairs in order to bind to DNA: NPAS1 and NPAS3 both interact with another bHLH-PAS protein called ARNT, but it is not clear exactly how this works.
In 2015, a team of researchers described the shapes that ARNT adopts when it forms pairs with two other bHLH-PAS proteins that are important for sensing when oxygen levels drop in cells. Here, Wu et al., including many of the researchers involved in the earlier work, have used a technique called X-ray crystallography to determine the three-dimensional shapes of NPAS1 when it is bound to ARNT, and NPAS3 when it is bound to both ARNT and DNA. The experiments show that each of these structures contains four distinct pockets that certain small molecules might be able to bind to. The NPAS1 and NPAS3 structures are similar to each other and to the previously discovered bHLH-PAS structures involved in oxygen sensing.
Further analysis of other bHLH-PAS proteins suggests that all the members of this protein family are likely to be able to bind to small molecules and should therefore be considered as potential drug targets. The next step following on from this work is to identify small molecules that bind to bHLH-PAS proteins, which will help to reveal the genes that are regulated by this family. In the future, these small molecules may have the potential to be developed into new drugs to treat psychiatric conditions and other diseases in humans.

NPAS1-ARNT and NPAS3-ARNT architectures
We employed the contiguous bHLH-PAS-A-PAS-B segments of NPAS1, NPAS3 and ARNT proteins for our crystallographic studies (Figure 1B). For the NPAS1-ARNT heterodimer, we obtained crystals that diffracted to 3.2 Å resolution (Table 1). The quaternary organization of the NPAS1-ARNT complex is shown in Figure 1C. We found that the bHLH, PAS-A and PAS-B domains of ARNT twist along the outside surface of the NPAS1 protein. Figure 1D shows that NPAS1-ARNT and HIF-2α-ARNT heterodimers are very similar in overall architecture, but the PAS-B domain of ARNT is slightly displaced in the NPAS1 heterodimeric complex. This observation indicates that the ARNT architecture can display flexibility in accommodating its different class I partners.
We could not obtain crystals of apo NPAS3-ARNT and instead pursued its DNA complex. The response element for NPAS1-ARNT is known to match the consensus hypoxia response element (HRE) (Ohsawa et al., 2005; Teh et al., 2007), but the response element for NPAS3-ARNT was not previously characterized. NPAS1 and NPAS3 share closely related amino-acid sequences within their bHLH domains (75% identity, see Figure 1B and Figure 1-figure supplement 2), including conservation of the residues that recognize DNA base pairs, based on our observations of the HIF-2α-ARNT-DNA complex (Wu et al., 2015). Therefore, we tested if NPAS3-ARNT could efficiently bind to the same consensus HRE sequence used by NPAS1-ARNT and HIF-α-ARNT heterodimers. We measured the dissociation constants (Kd) using a DNA duplex containing a central HRE sequence (5'-TACGTG-3') and found similar Kd values of ~20 nM for both NPAS1-ARNT and NPAS3-ARNT (Figure 2A). These binding constants indicate a somewhat higher affinity than that of the HIF-2α-ARNT heterodimer (Kd ~40 nM) (Wu et al., 2015). We then obtained crystals for NPAS3-ARNT bound to this HRE element and solved the structure at 4.2 Å resolution (Figure 2B and Table 1). The resolution made possible an architectural comparison, at the quaternary level, between the NPAS3-ARNT, NPAS1-ARNT, HIF-1α-ARNT and HIF-2α-ARNT heterodimers. All of these complexes share a similar overall architecture (Figure 2C,D) stabilized by the same six domain-domain junctions (see below). Furthermore, the cooperation between the two subunits of NPAS3-ARNT also creates a DNA-reading head that is similar to that of the HIF-α-ARNT-DNA complexes, allowing direct readout of the HRE site (Figure 2C).
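To put the reported dissociation constants in perspective, the following Python sketch computes the fraction of DNA probe bound at several protein concentrations. It is an illustration only and not the fitting procedure used in the study (which follows Wu et al., 2015): it assumes a simple 1:1 equilibrium between heterodimer and DNA, the function name `fraction_dna_bound` is hypothetical, and the protein concentrations are arbitrary example values. The 2 nM probe concentration is the one stated in the fluorescence polarization method.

```python
import math


def fraction_dna_bound(protein_nM: float, dna_nM: float, kd_nM: float) -> float:
    """Fraction of DNA bound at equilibrium for an assumed 1:1 complex.

    Solves the quadratic mass-action expression, so probe depletion is
    accounted for (no assumption that free protein equals total protein).
    """
    if dna_nM <= 0 or kd_nM <= 0 or protein_nM < 0:
        raise ValueError("concentrations must be positive (protein may be zero)")
    total = protein_nM + dna_nM + kd_nM
    complex_nM = (total - math.sqrt(total * total - 4.0 * protein_nM * dna_nM)) / 2.0
    return complex_nM / dna_nM


DNA_PROBE_NM = 2.0  # probe concentration stated in the binding assay method

# Kd values quoted in the text; protein concentrations are example values only.
for label, kd in (("NPAS1-ARNT / NPAS3-ARNT", 20.0), ("HIF-2alpha-ARNT", 40.0)):
    for protein in (5.0, 20.0, 80.0, 320.0):
        bound = fraction_dna_bound(protein, DNA_PROBE_NM, kd)
        print(f"{label:>24}  Kd={kd:>4.0f} nM  protein={protein:>5.0f} nM  bound={bound:.2f}")
```

Under this model, a two-fold lower Kd gives a higher fraction bound at every protein concentration, which is the sense in which the NPAS heterodimers bind the HRE more tightly than HIF-2α-ARNT.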

Domain-domain arrangements
We next tested if the six domain-domain interfaces observed in NPAS1-ARNT and NPAS3-ARNT ( Figure 3A) are important for maintaining the stability of their full-length heterodimers within cells. For our study, we used co-immunoprecipitation (co-IP) experiments in HEK293T cells, together with a series of single or double mutations positioned within ARNT ( Figure 3B). Mutations in interfaces 1-4 were found to significantly destabilize ARNT's heterodimeric interactions with both NPAS1 and NPAS3, as predicted from their crystal structures. Point mutations within NPAS1 at interfaces 5 and 6 also destabilized heterodimerization with ARNT ( Figure 3C). We additionally tested the effects of  these destabilizing mutations on the transcriptional function of NPAS1-ARNT. NPAS1 has been shown to function as a transcriptional repressor on its target gene tyrosine hydroxylase (TH), since it lacks a functional transactivation domain (Teh et al., 2006). We confirmed this repression activity in both HRE-driven and TH-driven reporter assays, and further found that mutations that destabilized the heterodimer also compromised the transcriptional repression ( Figure 3D).
In discovering that the NPAS1-ARNT, NPAS3-ARNT and HIF-α-ARNT heterodimers utilize the same six domain-domain interfaces to create a shared quaternary structure (Figures 1D and 2D), we also found that this type of architecture is clearly distinct from that of the CLOCK-BMAL1 complex (Huang et al., 2012) (Figure 1-figure supplement 3). Therefore, we conclude that there are at least two distinct architectural forms within the mammalian bHLH-PAS family. Since the ARNT heterodimers form a larger group than the BMAL1 heterodimers (Figure 1A), we asked if other ARNT heterodimers rely on the same six domain-domain junctions we observed in NPAS1-ARNT, NPAS3-ARNT and HIF-α-ARNT heterodimers. We could experimentally interrogate the degree of architectural variation within the ARNT heterodimer class by using our panel of ARNT mutations that were able to block its heterodimerization with NPAS1 and NPAS3.
Using co-IP studies, we tested if these same ARNT mutations would disrupt its heterodimeric complexes with SIM1, NPAS4 and AHR (Figure 3B). The SIM1-ARNT heterodimer stability was indeed compromised by these ARNT mutations, indicating that this heterodimer shares the same overall architecture as NPAS1-ARNT, NPAS3-ARNT and the two HIF-α-ARNT heterodimers. However, NPAS4-ARNT and AHR-ARNT heterodimer stabilities were not impacted in the same manner by these ARNT mutations, particularly when the mutations were located at interfaces 3 and 4. Therefore, we believe there is greater architectural variation within the ARNT heterodimer group than what has been crystallographically observed to date. A protein amino-acid sequence alignment further indicates that residues observed to stabilize domain-domain junctions in the NPAS1/3-ARNT and HIF-α-ARNT heterodimers are much more conserved in SIM1 than in NPAS4 and AHR (Figure 1-figure supplements 1 and 2).
The ARNT protein is ubiquitously expressed in mammalian cells, but its paralog ARNT2 is more specifically enriched in brain and kidney tissues (Hirose et al., 1996). ARNT and ARNT2 have both unique and overlapping cellular functions (Keith et al., 2001), and ARNT2 has been suggested as the preferred physiological partner for class I members SIM1 (Michaud et al., 2000) and NPAS4 (Bersten et al., 2014). Since the amino-acid sequence identity in the bHLH-PAS region is nearly 80% between ARNT and ARNT2, and since all the ARNT residues observed to participate at dimerization interfaces are fully conserved in ARNT2 (Figure 3-figure supplement 1), we predicted that ARNT2 heterodimers would display overall quaternary architectures similar to those of their ARNT counterparts. To test this prediction, we mutated two ARNT2 residues at each of the four dimer interfaces (Figure 3E). Compared with the ARNT mutations (Figure 3B), the corresponding ARNT2 mutations (both single and double) had similar effects on dimerization with not only SIM1 but also NPAS4 (Figure 3E), indicating that ARNT2 dimerizes with SIM1 and NPAS4 in the same way as ARNT does.

NPAS1-ARNT and NPAS3-ARNT cavities
In both HIF-1α-ARNT and HIF-2α-ARNT heterodimers, we previously identified hydrophobic pockets encapsulated within the two PAS domains of ARNT, and within the two PAS domains of each HIF-α protein (Wu et al., 2015). Here we asked if NPAS1-ARNT and NPAS3-ARNT harbored similarly positioned pockets. NPAS1 and NPAS3 are genetically associated with a wide range of human neuropsychiatric disorders (Kamnasaran et al., 2003; Pieper et al., 2005; Stanco et al., 2014). Thus, the discovery of ligand-binding cavities could lead to the future discovery of therapeutic molecules for these illnesses. We show that the PAS-A and PAS-B domains of NPAS1 do contain internal cavities with volumes measuring 190 Å³ and 180 Å³, respectively (Figure 4A,B and Table 2). Similarly positioned pockets are seen in the NPAS3 protein, measuring 100 Å³ and 230 Å³, respectively. Each of these PAS domains resembles a baseball catcher's mitt, with the beta strands forming the palm and short alpha-helices forming the opposing thumb to enclose a central pocket.
Through heterodimerization, ARNT further brings its own two pockets to join those of each of its class I partners. We examined if the two ARNT pockets alter their shape when ARNT forms different heterodimeric complexes. We could not detect any major change in the cavity size or shape of ARNT's PAS-A and PAS-B pockets, whether this protein was in a complex with HIF-2α or NPAS1 (Figure 4C). ARNT is ubiquitously expressed in many cell types and is predominantly nuclear (Reyes et al., 1992), whereas its class I heterodimerization partners are signal activated and/or tissue restricted. Therefore, the pockets located in NPAS1, NPAS3, HIF-1α and HIF-2α should allow more selective actions of therapeutic drugs than the pockets within ARNT (Figure 4B-D).

Variations in protein pockets
Our findings of two similarly positioned pockets within each of NPAS1, NPAS3, HIF-1α, HIF-2α and ARNT led us to consider if any other bHLH-PAS family members could also harbor putative ligand-binding cavities. The crystal structure of CLOCK-BMAL1 has been reported, but no internal cavities were described in its published analysis (Huang et al., 2012). Therefore, we closely examined this structure and found that both CLOCK and BMAL1 proteins contained discrete pockets within each of their PAS-A and PAS-B domains (Figure 4E-G). The PAS-A and PAS-B pockets of BMAL1 (measuring 200 Å³ and 220 Å³, respectively) are larger than those of CLOCK (measuring 120 Å³ and 140 Å³, respectively) (Table 2). Crystal structures are unavailable for SIM1 and SIM2 proteins; however, the close evolutionary and amino-acid sequence similarity with the NPAS1 protein led us to generate plausible models for their PAS-A and PAS-B pockets (Figure 4H, Figure 1-figure supplements 1 and 2).
The pockets of AHR and its repressor AHRR are more difficult to model based on our existing crystal structures, because of greater sequence and evolutionary divergence of these proteins (Figure 1-figure supplement 1). Interestingly, multiple classes of ligands, including halogenated aromatic hydrocarbons, tetrapyrroles and several tryptophan derivatives, were previously identified as direct AHR-binding ligands (Stejskalova et al., 2011). These molecules display significant variations in their chemical structures and sizes, but still bind with high affinities to AHR (Kd values of 0.1-100 nM). High-affinity, multi-ligand binding can be better accounted for by the use of four discrete pockets within AHR-ARNT, as shown here for other ARNT heterodimers, than by just one pocket as suggested previously (Pandini et al., 2007). Figure 4 shows side-by-side comparisons of the PAS-A and PAS-B pockets from multiple bHLH-PAS proteins. Importantly, each PAS domain relies on a constellation of different amino acids to form its interior cavity, allowing the pockets to bind selectively to different repertoires of endogenous ligands. Moreover, within subgroups such as HIF-1α/2α, NPAS1/3, and SIM1/2, the proteins share pocket residues more closely with each other than they do between subgroups. This observation indicates close ligand preferences within subgroups, suggesting they could recognize similar metabolites or signaling molecules derived from the same biosynthetic pathway. Importantly, in all the PAS-A and PAS-B pockets, the amino-acid residues lining interior cavities are predominantly hydrophobic. This property would allow favorable van der Waals interactions with lipophilic ligands. The desolvation of hydrophobic ligands can further contribute to favorable energetics of ligand binding inside these cavities. The internal pocket volumes in the bHLH-PAS family (100-600 Å³) (Table 2) are in line with pocket sizes observed in other classes of ligand-binding proteins (100-1000 Å³) (Liang et al., 1998). Moreover, we found that ligand binding can increase the size of a pocket significantly. For example, the HIF-2α-specific inhibitor 0X3 enlarged its PAS-B pocket volume from 370 Å³ to 560 Å³ (Wu et al., 2015). These observations suggest a high degree of adaptability in PAS domain pockets, and should encourage future drug-discovery efforts seeking multiple classes of synthetic modulators for bHLH-PAS proteins. An analogy can be made with the nuclear receptor family, where diverse classes of small molecules can bind to the same pocket, and ligand binding can expand pocket sizes (Huang et al., 2010b).
Figure 4 continued. Residues with identical counterparts in NPAS3 are in cyan and labelled in black, while those with non-conserved NPAS3 counterparts (indicated in parentheses) are in magenta and labelled in red (B). ARNT pocket residues from the apo HIF-2α-ARNT complex (PDB: 4ZP4) (in wheat) and those from the NPAS1-ARNT complex (in green) are superposed for comparison (C). The HIF-2α residues with identical counterparts in HIF-1α are in magenta, while those with non-conserved HIF-1α counterparts (indicated in parentheses) are in green (D). Structures of CLOCK (F) and BMAL1 (G) are from the CLOCK-BMAL1 complex (PDB: 4F3L) in brown and lime, respectively. The SIM1 structure in orange was modelled from NPAS1 and HIF-2α, and the pocket residues not conserved in SIM2 are in green with their SIM2 counterparts indicated in parentheses (H). DOI: 10.7554/eLife.18790.011
Table 2. Pocket volumes (Å³) were calculated using the CASTp program (Dundas et al., 2006).
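As a quick arithmetic check on the pocket sizes discussed in this section, the short Python sketch below collects the volumes quoted in the text and computes the relative expansion of the HIF-2α PAS-B pocket on inhibitor binding. It is a convenience calculation, not part of the published analysis; the dictionary layout and the helper names `percent_expansion` and `within_range` are illustrative assumptions, and only volumes stated in the text are included (Table 2 itself is not reproduced here).

```python
# Pocket volumes in cubic angstroms, as quoted in the text (apo structures).
POCKET_VOLUMES = {
    ("NPAS1", "PAS-A"): 190, ("NPAS1", "PAS-B"): 180,
    ("NPAS3", "PAS-A"): 100, ("NPAS3", "PAS-B"): 230,
    ("CLOCK", "PAS-A"): 120, ("CLOCK", "PAS-B"): 140,
    ("BMAL1", "PAS-A"): 200, ("BMAL1", "PAS-B"): 220,
}

# HIF-2alpha PAS-B pocket without and with the inhibitor 0X3 bound.
HIF2A_PASB_APO = 370
HIF2A_PASB_LIGANDED = 560

# Range reported for the bHLH-PAS family in the text.
FAMILY_RANGE = (100, 600)


def percent_expansion(apo: float, liganded: float) -> float:
    """Relative volume increase on ligand binding, in percent."""
    if apo <= 0:
        raise ValueError("apo volume must be positive")
    return 100.0 * (liganded - apo) / apo


def within_range(volume: float, bounds=FAMILY_RANGE) -> bool:
    """True if a volume lies inside the inclusive family range."""
    low, high = bounds
    return low <= volume <= high


for (protein, domain), volume in sorted(POCKET_VOLUMES.items()):
    print(f"{protein:6s} {domain}: {volume} A^3, in family range: {within_range(volume)}")

growth = percent_expansion(HIF2A_PASB_APO, HIF2A_PASB_LIGANDED)
print(f"HIF-2alpha PAS-B expansion on 0X3 binding: {growth:.0f}%")
```

The expansion works out to roughly half again the empty pocket volume, which illustrates why empty-pocket volumes alone may underestimate the size of ligand a PAS domain can accommodate.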

Discussion
Previously, the nuclear receptors were believed to form the only transcription factor family in higher eukaryotes with conserved ligand-binding capabilities. We showed here that the mammalian bHLH-PAS proteins constitute a second independent family of transcription factors with the appropriate and conserved molecular frameworks required for multi-ligand binding. Our findings are based on the direct crystallographic analysis of seven distinct bHLH-PAS family members (ARNT, HIF-1α, HIF-2α, NPAS1, NPAS3, CLOCK and BMAL1). In all these proteins, internal hydrophobic cavities were clearly observed within both their PAS-A and PAS-B domains. These PAS domains become interwoven in bHLH-PAS heterodimers through at least two highly distinct forms of architecture. Aside from the seven members we crystallographically examined, three additional members, AHR (Stejskalova et al., 2011), HIF-3α (Fala et al., 2015) and NPAS2 (Dioum et al., 2002), are known to have ligand-binding capabilities associated with at least one of their PAS domains, bringing the experimentally confirmed number of ligand-pocket-containing members to ten. Our sequence-structure comparative analyses further implicate three more members, ARNT2, SIM1 and SIM2, as likely to harbor pockets due to considerable amino-acid conservation with ARNT and NPAS1.
Our unmasking of the wider bHLH-PAS protein family as a group of transcription factors with multi-ligand-binding capabilities should fuel the future search for their physiological and endogenous ligands. The protein pockets, in each case, make these factors well suited to high-throughput screening campaigns to find small-molecule therapeutics for a variety of human diseases, including psychiatric illnesses, cancers and metabolic diseases. The distinctive amino-acid residues inside their pockets predict successful outcomes for screens seeking highly selective modulators for each protein. These pockets are also highly pliable and can accommodate significantly larger ligands than their empty volumes alone suggest.
The precise mechanism through which endogenous and/or pharmacologic ligands can modulate transcriptional activities of bHLH-PAS proteins has not yet been fully revealed, and could differ for each family member. For example, ligand binding to AHR can displace heat shock protein 90 and initiate the nuclear translocation of AHR. Some small-molecule HIF-2α antagonists that bind to the PAS-B domain can disrupt the dimerization between HIF-2α and ARNT (Scheuermann et al., 2013). Beyond these initial observations to date, and without the availability of small-molecule tools to probe other bHLH-PAS proteins, the mechanistic links between ligand binding and transcriptional regulation remain to be discovered.
It is interesting that while dimeric nuclear receptors harbor two pockets in their functional architectures, the bHLH-PAS proteins present four distinct pockets. Notably, a fifth pocket was also observed in HIF-2α-ARNT heterodimers, located between the two subunits, allowing proflavine to promote subunit dissociation (Wu et al., 2015). The latter finding further suggests that crevices formed between the PAS domains within the quaternary architectures of family members could form additional ligand-binding and modulation sites. The availability of so many distinctive pockets within bHLH-PAS proteins could allow a variety of biosynthetic or metabolic signals derived from cellular pathways to be integrated into a single unified functional response. Alternatively, each pocket may allow a bHLH-PAS protein to be the site of a unique ligand within each cell type. The identification and validation of endogenous ligands for this family will help define the genetic programs controlled by each bHLH-PAS member.

Crystallization and X-ray data collection
Crystallization of the NPAS1-ARNT complex was carried out using the sitting-drop vapor diffusion method at 16 °C, by mixing equal volumes of protein (4 mg/ml) and reservoir solution containing 2% Tacsimate pH 7.0 and 3% PEG 3350. Before being flash-frozen in liquid nitrogen, crystals were soaked in reservoir solution plus 30% glycerol as the cryoprotectant. NPAS3-ARNT-DNA crystals were grown at 16 °C in sitting drops formed by equal volumes of complex (4 mg/ml) and reservoir solution consisting of 100 mM NH4F and 9% PEG 3350, and were then transferred stepwise to a cryoprotectant of 100 mM NH4F, 10% PEG 3350 and up to 30% PEG 400 (5% increase each step) prior to flash freezing. Diffraction data were collected at the Argonne National Laboratory SBC-CAT 19ID beamline at 100 K.

Fluorescence polarization binding assay
The 21-mer fluoresceinated double-stranded DNA was prepared by annealing the 6-FAM labelled forward strand (5'-GGCTGCGTACGTGCGGGTCGT-3') with the unlabeled reverse strand (5'-ACGACCCGCACGTACGCAGCC-3') in a buffer consisting of 10 mM Tris pH 7.5, 1 mM EDTA and 2 mM MgCl2. For the binding assay, 2 nM DNA was incubated with purified proteins for 30 min, and final protein concentrations were varied by serial dilution in binding buffer (20 mM Tris pH 8.0, 50 mM NaCl and 10 mM DTT). The fluorescence polarization signals were recorded and processed as described previously (Wu et al., 2015).
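The probe design can be sanity-checked with a few lines of Python. The sketch below, offered only as an illustration, confirms that the two strands given for the 21-mer are exact reverse complements and locates the HRE core (5'-TACGTG-3') within the forward strand. The sequences are copied from the text (ignoring any line-break hyphenation); the helper name `reverse_complement` is our own.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FORWARD = "GGCTGCGTACGTGCGGGTCGT"  # 6-FAM labelled strand, 5' to 3'
REVERSE = "ACGACCCGCACGTACGCAGCC"  # unlabeled strand, 5' to 3'
HRE_CORE = "TACGTG"                # consensus HRE sequence from the text

hre_start = FORWARD.find(HRE_CORE)  # 0-based index, or -1 if absent
flank_5p = hre_start
flank_3p = len(FORWARD) - (hre_start + len(HRE_CORE))

print("probe length:", len(FORWARD))
print("strands are reverse complements:", reverse_complement(FORWARD) == REVERSE)
print(f"HRE core at positions {hre_start + 1}-{hre_start + len(HRE_CORE)} "
      f"with {flank_5p} nt 5' and {flank_3p} nt 3' flanks")
```

The nearly equal flanks on either side of the core are consistent with the description of the HRE as central to the duplex.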

Luciferase reporter assay
HEK293T cells were seeded in 24-well plates, and one day later transfected with 200 ng of pCMV-Tag4-NPAS1 (WT, mutants or empty plasmid), 1 ng of pRL (control Renilla luciferase), and 100 ng of HRE-luc reporter (Ao et al., 2008) or TH-luc reporter containing the tyrosine hydroxylase promoter sequence (Thiel et al., 2005), using 0.6 µL jetPRIME reagent for each well. Medium was refreshed after overnight transfection, and luciferase activity was measured another 24 hr later using the Dual-Glo Luciferase Assay System (Promega, #E2920). Final data were normalized by the relative ratio of firefly and Renilla luciferase activity.
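The normalization step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the firefly/Renilla ratio comes from the method above, but expressing each condition relative to the empty-plasmid control is our assumption, the function names are hypothetical, and the well readings are invented numbers used only to show the arithmetic.

```python
from statistics import mean


def relative_luciferase(firefly: float, renilla: float) -> float:
    """Firefly signal divided by the Renilla control signal for one well."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_vs_control(sample_wells, control_wells) -> float:
    """Mean firefly/Renilla ratio of a sample relative to control wells.

    Each argument is a sequence of (firefly, renilla) readings from
    replicate wells. Values below 1 indicate repression of the reporter.
    """
    sample = mean(relative_luciferase(f, r) for f, r in sample_wells)
    control = mean(relative_luciferase(f, r) for f, r in control_wells)
    return sample / control


# Invented example readings (arbitrary luminescence units), not study data.
empty_plasmid = [(1000.0, 100.0), (1100.0, 110.0), (900.0, 90.0)]
npas1_wt = [(480.0, 96.0), (520.0, 104.0), (500.0, 100.0)]

print(f"NPAS1 WT relative to empty plasmid: {fold_vs_control(npas1_wt, empty_plasmid):.2f}")
```

Dividing by the Renilla signal corrects for well-to-well differences in transfection efficiency and cell number before conditions are compared.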