The complete structure of the human TFIIH core complex

Transcription factor IIH (TFIIH) is a heterodecameric protein complex critical for transcription initiation by RNA polymerase II and nucleotide excision DNA repair. The TFIIH core complex is sufficient for its repair functions and harbors the XPB and XPD DNA-dependent ATPase/helicase subunits, which are affected by human disease mutations. Transcription initiation additionally requires the CdK activating kinase subcomplex. Previous structural work has provided only partial insight into the architecture of TFIIH and its interactions within transcription pre-initiation complexes. Here, we present the complete structure of the human TFIIH core complex, determined by phase-plate cryo-electron microscopy at 3.7 Å resolution. The structure uncovers the molecular basis of TFIIH assembly, revealing how the recruitment of XPB by p52 depends on a pseudo-symmetric dimer of homologous domains in these two proteins. The structure also suggests a function for p62 in the regulation of XPD, and allows the mapping of previously unresolved human disease mutations.


Introduction
Transcription factor IIH (TFIIH) is a 10-subunit protein complex with a total molecular weight of 0.5 MDa that serves a dual role: as a general transcription factor for transcription initiation by eukaryotic RNA polymerase II (Pol II), and as a DNA helicase complex in nucleotide excision DNA repair (NER) (Compe and Egly, 2016;Sainsbury et al., 2015). Mutations in TFIIH subunits cause the inherited autosomal recessive disorders xeroderma pigmentosum (XP), trichothiodystrophy (TTD), and Cockayne syndrome (CS), which are characterized by a high incidence of cancer or by premature ageing (Cleaver et al., 1999;Rapin, 2013). Furthermore, TFIIH is a possible target for anti-cancer compounds (Berico and Coin, 2018). TFIIH is therefore of great importance for human health and disease.
The TFIIH core complex is composed of the seven subunits XPB, XPD, p62, p52, p44, p34, and p8, and is the form of TFIIH active in DNA repair (Svejstrup et al., 1995), where TFIIH serves as a DNA damage verification factor (Li et al., 2015;Mathieu et al., 2013) and is responsible for opening a repair bubble around damaged nucleotides. This activity depends on both the SF2-family DNA-dependent ATPase XPB and the DNA helicase activity of XPD (Evans et al., 1997;Kuper et al., 2014). TFIIH function in transcription initiation requires the double-stranded DNA translocase activity of XPB to regulate opening of the transcription bubble (Alekseev et al., 2017;Fishburn et al., 2015;Grünberg et al., 2012), and additionally requires the CdK activating kinase (CAK) complex, which harbors the kinase activity of CDK7 as well as the Cyclin H and MAT1 subunits (Devault et al., 1995;Fisher et al., 1995;Fisher and Morgan, 1994;Shiekhattar et al., 1995;Svejstrup et al., 1995). Targets of human CDK7 include the C-terminal heptapeptide repeat domain of the largest subunit of Pol II, as well as cell-cycle regulating CDKs (Fisher and Morgan, 1994;Shiekhattar et al., 1995). MAT1 serves as a bridging subunit that promotes CAK subcomplex formation by interacting with Cyclin H and CDK7 (Devault et al., 1995;Fisher et al., 1995), recruits the CAK to the core complex through interactions with XPD and XPB (Abdulrahman et al., 2013;Busso et al., 2000;Greber et al., 2017;Rossignol et al., 1997), and also aids in formation of the Pol II pre-initiation complex (PIC) by interacting with the core PIC (He et al., 2013;He et al., 2016;Schilbach et al., 2017). MAT1 inhibits the helicase activity of XPD (Abdulrahman et al., 2013;Sandrock and Egly, 2001), but the mechanism of this inhibition is not fully understood. While the enzymatic activity of XPD is not required for transcription initiation, it is critical for the DNA repair function of TFIIH (Dubaele et al., 2003;Evans et al., 1997;Kuper et al., 2014). Therefore, NER requires the release of the CAK subcomplex from the core complex (Coin et al., 2008).
The activities of both XPB and XPD are regulated by interactions with additional TFIIH components, including that of p44 with XPD (Coin et al., 1998;Dubaele et al., 2003;Kim et al., 2015), and those of the p52-p8 module with XPB (Coin et al., 2006;Jawhari et al., 2002;Kainov et al., 2008). These interactions are likely to be crucial for TFIIH function, as some are affected by disease mutations (Cleaver et al., 1999), but they have been characterized only partially at the mechanistic level.
Our previous structure of the TFIIH core-MAT1 complex at 4.4 Å resolution (Greber et al., 2017) allowed modeling of TFIIH in the best-resolved parts of the density map, but several functionally important regions remained unassigned or only partially interpreted, because reliable de novo tracing of entire domains was not possible in the absence of existing structural models. Here, we present the complete structure of the human TFIIH core complex in association with the CAK subunit MAT1, determined by phase plate cryo-electron microscopy (cryo-EM) at 3.7 Å resolution. Our structure reveals the complete architecture of the TFIIH core complex and provides detailed insight into the interactions that govern its assembly. Additionally, our cryo-EM maps define the molecular contacts that regulate the XPB and XPD subunits of TFIIH, including the critical p52-XPB interaction and an extensive regulatory network around XPD formed by XPB, p62, p44, and MAT1.

eLife digest
The DNA inside a cell carries the instructions it needs to survive. Living cells use many different proteins to read and maintain this store of information. For example, a group of ten proteins collectively called TFIIH is often involved in both reading and repairing the DNA. Proteins in the TFIIH complex include p52, p62, XPB and XPD.
Understanding the structure of the proteins in TFIIH could reveal much about how it works and how changes to its structure contribute to various medical conditions. Yet TFIIH is a dynamic assembly of molecules and includes many proteins, which makes examining its structure challenging. An ideal protein structure should provide an accurate map of the positions of all the atoms in a protein. Previously, it has not been possible to get this level of detail for TFIIH. Greber et al. used an approach called cryo-electron microscopy (also called cryo-EM) to reveal the structure of TFIIH collected from human cells. The structure revealed several new details, including how p52 helps XPB attach to the rest of TFIIH, and that p62 helps to control the activity of XPD. With such a detailed structure, Greber et al. could link changes in TFIIH that are seen in different human diseases to specific parts of the complex.
Examining the atomic details of proteins can reveal a lot about how they work and the changes that occur during different diseases. These structures can also help to reveal aspects of how DNA is read and repaired, and may help to design new approaches to treat diseases in the future.

Structure determination of TFIIH
To determine the complete structure of the human TFIIH core complex, we collected several large cryo-EM datasets (Supplementary file 1) of TFIIH immuno-purified from HeLa cells, using an electron microscope equipped with a Volta phase plate (VPP) and a direct electron detector camera mounted behind an energy filter. From a homogeneous subset of approximately 140,000 TFIIH particle images identified by 3D classification (Scheres, 2010), we reconstructed a 3D cryo-EM density map at 3.7 Å resolution (Figure 1-figure supplements 1 and 2A-C). This VPP-based cryo-EM map was substantially improved compared to our previous maps obtained without phase plate, both in resolution and interpretability (Figure 1-figure supplement 2D-G), and enabled building, refinement, and full validation of an atomic model of the TFIIH core complex and the MAT1 subunit of the CAK subcomplex (Figure 1A-C, Figure 1-figure supplement 2B,C, Supplementary files 2 and 3). The remainder of the CAK subcomplex is not visible in our map because it is flexibly tethered to the TFIIH core complex. Tracing and sequence register assignment of protein components modeled de novo was facilitated by density maps obtained from multibody refinement with three sub-volumes (Nakane et al., 2018), which improved the interpretability of each sub-volume and slightly improved the resolution of the XPD-MAT1 region to 3.6 Å. Both the overall and multibody-refined maps showed clear side chain information (Figure 1-figure supplement 6A-D).
Furthermore, our model was corroborated by existing chemical crosslinking-mass spectrometry (CX-MS) data of human TFIIH (Luo et al., 2015) and site-specific crosslinks from yeast TFIIH (Warfield et al., 2016).

Detailed architecture of TFIIH and structure of p62
Our structure of the TFIIH core complex shows its horseshoe-like overall shape (Figure 1A-C, Video 1), as observed in previous lower-resolution reconstructions of free and PIC-bound TFIIH (Gibbons et al., 2012;Greber et al., 2017;He et al., 2016;Murakami et al., 2015;Schilbach et al., 2017), and allows us to define, directly from our structure, the complete set of inter-subunit interactions that lead to the formation of the TFIIH core complex (Figure 1D).
The largest subunits of the complex, the SF2-family DNA-dependent ATPases XPB and XPD, each contain two RecA-like domains (RecA1 and RecA2). They interact directly (Greber et al., 2017), lie on one side of the complex, and are additionally bridged by MAT1 (Figure 1B), which has been shown to interact with either ATPase in isolation (Busso et al., 2000). On the side facing away from MAT1, XPD interacts with the von Willebrand Factor A (vWFA) domain of p44 (Coin et al., 1998;Dubaele et al., 2003;He et al., 2016;Kim et al., 2015), which in turn forms a tight interaction with p34 via interlocking eZnF domains (Schilbach et al., 2017) and a p44 RING domain interaction (Radu et al., 2017) (Figure 1-figure supplement 6J,K), consistent with the formation of a multivalent interaction network between p34 and p44 (Radu et al., 2017). The vWFA domain of p34 recruits p52 through a three-way interaction that involves the most N-terminal winged helix domain of p52 and a helical segment of p62 (Schilbach et al., 2017) (Figure 1-figure supplement 6L). The p52 C-terminal region comprises two domains: first, the 'clutch' that interacts with XPB (Jawhari et al., 2002), and second, a dimerization module that binds p8 (Kainov et al., 2008). Through these domains, the p52 C-terminal region recruits XPB to TFIIH and cradles XPB RecA2 (see below). In addition to this structural framework formed by folded domains, our cryo-EM map reveals several interactions involving extended protein segments, including several formed by p62 (Figure 2) and one between the p44 N-terminal extension (NTE) and the N-terminal domain (NTD) of XPB (Figure 1B). To form this interaction, approximately 15 residues of p44 span the distance between the p44 vWFA domain and the XPB NTD, where a small helical motif in p44 contacts XPB residues 72-75, 95-102, and 139-143, in agreement with CX-MS data (Luo et al., 2015) (Figure 1-figure supplement 6E).
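The statement that roughly 15 residues of p44 bridge the vWFA domain and the XPB NTD can be checked against a back-of-envelope upper bound on linker reach. The sketch below assumes general polypeptide geometry (about 3.8 Å between consecutive Cα atoms and about 3.3 Å rise per residue in a fully extended strand); these values are textbook assumptions, not measurements from this map.

```python
# Back-of-envelope bound on how far a short peptide linker can reach.
# Assumed geometry (general polypeptide values, not measured in this map):
#   - consecutive C-alpha atoms are ~3.8 A apart (trans peptide bond)
#   - a fully extended beta-strand advances ~3.3 A per residue
CA_CA_DISTANCE = 3.8  # Angstrom
BETA_RISE = 3.3       # Angstrom per residue (approximate)


def max_span(n_residues: int, step: float) -> float:
    """Upper bound on the C-alpha to C-alpha distance covered by n residues.

    n residues are connected by n - 1 steps; real linkers are rarely
    fully extended, so the true end-to-end distance is usually shorter.
    """
    if n_residues < 1:
        raise ValueError("need at least one residue")
    return (n_residues - 1) * step


n_linker = 15  # approximate p44 NTE segment length quoted in the text
print(f"contour-length bound:  {max_span(n_linker, CA_CA_DISTANCE):.1f} A")
print(f"extended-strand bound: {max_span(n_linker, BETA_RISE):.1f} A")
```

Under these assumptions a 15-residue segment could bridge at most roughly 45-55 Å, which would bound the separation between its two anchor points on the p44 vWFA domain and the XPB NTD.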
Partial deletion of the p44 NTE in yeast causes a slow-growth phenotype, suggesting a functional role for this p44-XPB interaction (Warfield et al., 2016).
Video 1. Architecture of the TFIIH core complex. Rotating structure of the TFIIH core complex, followed by views that highlight the interactions of p62 near the nucleotide-binding pocket of XPD and near the substrate-binding cleft of XPD (binding sites are indicated by a flashing ADP molecule and DNA strand, respectively). Bound substrates, which are not present in our structure, were superposed from PDB ID 6FWS (Cheng and Wigley, 2018). DOI: https://doi.org/10.7554/eLife.44771.010
The p62 subunit is almost completely resolved in our structure and exhibits a complex beads-on-a-string-like topology. It fully encircles the top surface of TFIIH (Figures 1C and 2A, Figure 2-figure supplement 1), interacting with XPD, p52, p44, and p34, in agreement with previous structural findings (Greber et al., 2017;Schilbach et al., 2017). Based on these interactions, p62 can be subdivided into three functional regions: (i) the N-terminal PH domain, disordered in our structure, mediates interactions with components of the core transcriptional machinery (Di Lello et al., 2008;He et al., 2016;Schilbach et al., 2017), transcriptional regulators (Di Lello et al., 2006), and DNA repair pathways (Gervais et al., 2004;Lafrance-Vanasse et al., 2013;Okuda et al., 2017); (ii) residues 108-148 and 454-548 of p62, including the first BSD (BTF2-like, synapse-associated, DOS2-like) domain (BSD1) and the C-terminal 3-helix bundle, play an architectural role by binding to p34 and the extended zinc finger (eZnF) domain of p44; and (iii) the central region of p62 interacts with XPD. Specifically, p62 residues 160-365 form three structural elements that interact with XPD (Figure 2B, Video 1), in agreement with previous biochemical, structural, and CX-MS data (Figure 1-figure supplement 6G) (Luo et al., 2015;Schilbach et al., 2017).
First, an α-helix formed by p62 residues 295-318 binds directly to XPD RecA2 and thereby recruits residues 160-258 of p62, comprising the BSD2 domain and adjacent sequence elements, to this surface of XPD RecA2 (Figure 2A) (Schilbach et al., 2017). [...] This inserted p62 segment directly blocks a DNA-binding site on XPD RecA1 (Figure 2D) and localizes near the access path to a pore-like structure between the XPD FeS and ARCH domains. While p62 does not directly contact the DNA-binding surface on XPD RecA2, it may still sterically interfere with DNA binding or with access to the helicase elements of XPD in this region (Figure 2-figure supplement 1E). Therefore, this segment of p62 may need to move away when XPD binds and unwinds DNA. Third, p62 residues 350-358 form a short α-helix that binds in a cleft between the two RecA-like domains of XPD (Figure 2C). This helix not only closes the entrance to the nucleotide-binding pocket in XPD RecA1 (Figure 2-figure supplement 1F), but also partially overlaps with the predicted location of the nucleotide itself (Figure 2C), strongly suggesting a role for this p62 sequence element in XPD regulation. The density for these structural elements of p62 (residues 260-300 and 346-365) is weaker in our cryo-EM map than for the remainder of the complex, suggesting a dynamic interaction with XPD that would enable them to modulate access to the nucleotide-binding pocket, the DNA-binding cavity, and the DNA-translocating pore of XPD, depending on the functional state of TFIIH. 3D reconstructions of TFIIH classified for these regions of p62 (Figure 1-figure supplement 3) show globally intact TFIIH both in the presence and in the absence of the p62 segments at these XPD sites (Figure 2-figure supplement 1G-J). This supports our hypothesis of dynamic regulation, rather than the alternative hypothesis that p62 binding to XPD is required for TFIIH stability (Luo et al., 2015).
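The residue ranges given for p62 in this section can be collected into a small lookup table, for example to check which of the described segments a residue of interest (such as a mutation site) falls into. This is a hypothetical convenience sketch: the ranges are copied from the text, the labels paraphrase it, and regions without a stated range (such as the PH domain) are deliberately left unassigned.

```python
# Hypothetical lookup table of the p62 segments described in this section.
# Ranges are inclusive residue numbers taken from the text; labels paraphrase it.
P62_SEGMENTS = [
    (108, 148, "architectural: binds p34 and the p44 eZnF domain"),
    (160, 258, "BSD2 and adjacent elements, recruited to XPD RecA2"),
    (295, 318, "alpha-helix bound to XPD RecA2"),
    (350, 358, "alpha-helix in the cleft between the XPD RecA-like domains"),
    (454, 548, "architectural: binds p34 and the p44 eZnF domain"),
]

# Elements with weaker density, interpreted in the text as dynamic XPD contacts.
WEAK_DENSITY = [(260, 300), (346, 365)]


def annotate(residue: int) -> list[str]:
    """Return every described p62 segment (and density note) covering `residue`."""
    labels = [label for start, end, label in P62_SEGMENTS if start <= residue <= end]
    if any(start <= residue <= end for start, end in WEAK_DENSITY):
        labels.append("weak density: possibly dynamic XPD contact")
    return labels or ["not covered by this summary"]


for res in (50, 120, 200, 297, 355):
    print(res, annotate(res))
```

A flat list of intervals is sufficient here because p62 has only a handful of described segments; overlapping annotations (for example, residue 297 lies both in the RecA2-binding helix and in a weak-density region) are all reported.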

Molecular basis of XPB recruitment by p52
Our structure of TFIIH resolves the structure and interactions of all four folded domains of human XPB: two RecA-like domains that form the SF2-family helicase cassette, a domain resembling the DNA damage recognition domain (DRD), and the N-terminal domain (NTD). Existing biochemical data show that the XPB NTD is required for integration of XPB into TFIIH (Jawhari et al., 2002) by forming an interaction with p52 that has been referred to as the 'clutch' (Schilbach et al., 2017). In our structure, the p52 contribution to the clutch encompasses p52 residues 306-399, which, strikingly, assume the same overall fold as the XPB NTD (Figure 3B), as hypothesized previously (He et al., 2016;Luo et al., 2015), thereby forming a pseudo-symmetric dimer of structurally homologous domains. The two domains interact through their β-sheets, via both hydrophobic and charged interactions. Our structural findings rationalize biochemical data showing that deletion of XPB residues 1-207, but not deletion of residues 1-44, impairs the p52-XPB interaction (Jawhari et al., 2002) (Figure 3C). Our structure is also consistent with data indicating that p52 residues 304-381 are critical for the XPB-p52 interaction (Jawhari et al., 2002), but does not show any contacts that could explain the reported binding of XPB to p52 residues 1-135 or 1-304 (Jawhari et al., 2002) (Figure 3-figure supplement 2D,E). The interaction between p52 and XPB not only recruits XPB to TFIIH, but also stimulates its ATPase activity in vitro. Because our structure does not show any elements of p52 approaching the XPB nucleotide-binding pocket, we propose that this effect is likely induced by the interactions of p52 with the XPB NTD and RecA2, which may, together with p8 (Coin et al., 2006), properly arrange the XPB helicase cassette to bind and hydrolyze ATP (Figure 3D) (Grünberg et al., 2012).
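The comparison between p52 constructs from binding studies and the clutch seen in the structure reduces to checking overlaps between inclusive residue ranges. The sketch below uses only the p52 ranges quoted above; it illustrates the reasoning and is not an analysis performed in the study.

```python
# Compare p52 constructs from binding studies with the clutch region seen
# in the structure. Ranges are inclusive residue numbers from the text.
def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo + 1)


CLUTCH = (306, 399)  # p52 residues forming the XPB-binding clutch
CLUTCH_SIZE = CLUTCH[1] - CLUTCH[0] + 1  # 94 residues

constructs = {"304-381": (304, 381), "1-135": (1, 135), "1-304": (1, 304)}
for name, rng in constructs.items():
    shared = overlap(rng, CLUTCH)
    print(f"p52 {name}: {shared}/{CLUTCH_SIZE} clutch residues")
```

Construct 304-381 contains 76 of the 94 clutch residues, whereas constructs 1-135 and 1-304 contain none, which is why the structure offers no contacts that would explain XPB binding to the latter two.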
The XPB NTD is the site of the two human disease mutations F99S and T119P, which cause XP and TTD, respectively (Cleaver et al., 1999). Our structure shows that neither of these residues is in direct contact with p52 or with the RecA-like domains of XPB, suggesting that the F99S and T119P mutations exert their detrimental effects through structural perturbation of the XPB NTD. [...] This conservation suggests that a threonine at this position is important for the efficient folding of this domain in general, and that the T119P mutation may destabilize it, resulting in lower levels of active enzyme in TTD patients. Lower overall levels of properly assembled TFIIH have been shown to be a hallmark of TTD (Botta et al., 2002;Dubaele et al., 2003;Giglia-Mari et al., 2004) and could explain the disease-causing effect of T119P in vivo, even though recombinant TFIIH carrying this mutation retains some activity in both transcription initiation and NER. A less likely alternative, given the conservation of the equivalent residue in the p52 clutch, is that T119 is involved in an interaction with a factor that is critical for cellular function, for example in NER.
The F99S mutation affects a residue that is conserved throughout eukaryotic XPB [...] the XPB contact site with the p44 N-terminal extension (Figure 3-figure supplement 2I). This mutation is likely to impair the stability and folding of the XPB NTD. Unlike T119P, this mutation leads to impaired DNA opening in NER assays, reduced interaction with p52, reduced ATPase activity, and strong impairment in DNA damage repair (Riou et al., 1999), suggesting a severe effect on the structure of the XPB NTD.
Natural and synthetic mutations in the Drosophila melanogaster homolog of p52 that lead to disease-like phenotypes in flies, and that cause similar defects when introduced into human cells (Fregoso et al., 2007), map directly to the p52-XPB interface, explaining their detrimental phenotypes (Figure 3C, Figure 3-figure supplement 2A).
Our structure assigns XPB residues 165-300 to a DRD-like domain that connects the NTD to the RecA-like domain (Figure 3A, Figure 3-figure supplement 1A); deletion of this domain is lethal in yeast (Warfield et al., 2016). The DRD is a DNA-binding domain found in DNA repair enzymes and chromatin remodelers (Mason et al., 2014;Obmolova et al., 2000) and has been implicated in DNA damage recognition in archaeal XPB (Fan et al., 2006;Rouillon and White, 2010). Our 3.7 Å resolution map of TFIIH reveals that in eukaryotic XPB, one β-strand of the archaeal XPB DRD is replaced by an insertion of approximately 70 residues with relatively low sequence conservation (Figure 3E, Figure 3-figure supplement 1B), which shifts the domain boundaries of the human XPB DRD-like domain with respect to previous sequence alignments (Fan et al., 2006;Oksenych et al., 2009). The part of this insertion resolved in our map consists of a negatively charged linker and an α-helical element that contacts XPD directly (Figure 3E,F). The surface on XPD involved in this interaction has been implicated in the initial step of DNA substrate binding by XPD (Constantinescu-Aruxandei et al., 2016;Kuper et al., 2012). Density features and secondary structure prediction indicate the presence of several aromatic side chains of XPB near the interface (Figure 3E), where they might form contacts resembling those of the nucleobases of XPD-bound DNA substrates (Figure 3F). Thus, XPB may modulate substrate binding by XPD, further reinforcing the idea that XPD activity is regulated by several other components of TFIIH.

Conformational dynamics of the TFIIH core complex
In order to investigate the dynamics of TFIIH, we analyzed the conformational landscape of the particles in our cryo-EM dataset [...]. This conformational change in TFIIH upon PIC entry also appears to break the interaction between MAT1 and the XPB DRD-like domain (Figure 4C), which in turn might serve to position the CDK7-cyclin H dimer within the CAK subcomplex at the appropriate location for phosphorylation of the Pol II C-terminal domain (CTD) in the Mediator-bound Pol II-PIC (Figure 4D) (Robinson et al., 2016;Schilbach et al., 2017). Our structural comparison also reveals that a TFIIE-XPB interaction that has been implicated in XPB regulation (Schilbach et al., 2017) [...]

Structure of XPD
The structure of XPD shows the conserved domain arrangement of two RecA-like domains (RecA1 and RecA2), with the FeS and ARCH domain insertions in RecA1 (Constantinescu-Aruxandei et al., 2016;Fan et al., 2008;Kuper et al., 2012). The quality of the map allowed us to interpret the density for the N- and C-termini of XPD, which closely approach each other near the nucleotide-binding site within RecA1 (Figure 5A). The N-terminus of XPD forms a short two-stranded β-sheet near the [...] (Cleaver et al., 1999).
Before XPD-bound DNA reaches the helicase motifs in the RecA-like domains, it passes through a pore-like structure next to the 4Fe-4S cluster at the interface between the FeS and ARCH domains (Figure 5-figure supplement 1E,F) (Cheng and Wigley, 2018;Constantinescu-Aruxandei et al., 2016;Kuper et al., 2012;Liu et al., 2008;Wolski et al., 2008). This region was poorly defined in previous TFIIH reconstructions, but our cryo-EM map now shows side-chain densities for the aromatic residues Y158, F161, and F193, which are critical for the DNA-binding, ATPase, and helicase activities of XPD (Kuper et al., 2014), as well as for residues Y192 and R196, which form part of a DNA lesion recognition pocket (Mathieu et al., 2013) (Figure 5B). This functionally important region is only partially conserved in archaeal XPD homologs (Figure 5-figure supplement 1G-I) (Fan et al., 2008;Kuper et al., 2012;Wolski et al., 2008). A eukaryote-specific loop insertion in the XPD ARCH domain (Greber et al., 2017;Schilbach et al., 2017) closely approaches this binding pocket (Figure 5B) and may serve to regulate the binding of DNA in the lesion recognition pocket so as to prevent untimely access of substrates to the XPD pore.

Interactions and regulation of XPD
Our structure of TFIIH shows that XPD forms architectural and regulatory interactions with four other TFIIH subunits: XPB, p62, p44, and MAT1, which together form a cradle-like structure around XPD (Figure 5C). We described above two interactions that could potentially regulate XPD activity: the newly defined interaction of an insertion element in the XPB DRD with a DNA-binding site in XPD (Figure 3E,F), and the interactions of p62 with the nucleotide- and DNA-binding sites of XPD (Figure 2). Together, these observations implicate p62, as well as XPB, in XPD regulation. Additionally, it is known that the helicase activity of XPD is inhibited by the CAK subcomplex (Araújo et al., 2000;Sandrock and Egly, 2001). The contacts we see between MAT1 and XPD localize to the ARCH domain of XPD and to the N-terminal RING domain and helical bundle of MAT1 (residues 1-130) (Figure 5C), in agreement with previous structural (Greber et al., 2017;Schilbach et al., 2017) and biochemical analyses (Abdulrahman et al., 2013;Luo et al., 2015;Warfield et al., 2016). The interaction between the XPD ARCH domain and the MAT1 helical bundle is characterized by charge complementarity (Figure 5-figure supplement 1J-M). This interface is highly dynamic, enabling the release of MAT1 and the entire CAK subcomplex from TFIIH during NER, as well as its subsequent re-association to regenerate a transcription-competent TFIIH (Coin et al., 2008).
Figure 5 (partial caption). [...] The region corresponding to the view in this panel (but viewed from the back side) is indicated in (A). (C) Interaction network of XPD with surrounding TFIIH subunits (interacting regions colored, remainder grey). (D) Cartoon model for repression and de-repression of XPD by MAT1, XPB, and p62. (E) XPD-p44 interacting regions (defined as residues within <4 Å of the neighboring protein) are colored in dark green (XPD) and dark red (p44). Residues discussed in the text are shown as sticks; those with mutation data (natural variants or experimental constructs) are colored yellow on XPD and teal on p44. The remainder of the β4-α5 loop harboring the synthetic p44 mutations is colored teal as well. DOI: https://doi.org/10.7554/eLife.44771.020
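The Figure 5E caption defines interacting regions as residues within 4 Å of the neighboring protein. A minimal sketch of that distance criterion follows; it assumes atoms are given as plain tuples with made-up coordinates for two hypothetical chains, and it is not the software used by the authors.

```python
import math
from itertools import product

# An atom is (chain, residue_number, x, y, z), coordinates in Angstrom.
Atom = tuple[str, int, float, float, float]


def interface_residues(atoms_a: list[Atom], atoms_b: list[Atom], cutoff: float = 4.0):
    """Residues of each partner with at least one atom closer than `cutoff`
    to any atom of the other partner (brute force, O(len(a) * len(b)))."""
    res_a: set[tuple[str, int]] = set()
    res_b: set[tuple[str, int]] = set()
    for atom_a, atom_b in product(atoms_a, atoms_b):
        if math.dist(atom_a[2:], atom_b[2:]) < cutoff:
            res_a.add(atom_a[:2])  # (chain, residue_number)
            res_b.add(atom_b[:2])
    return res_a, res_b


# Toy coordinates for two hypothetical chains "A" and "B" (not TFIIH data).
chain_a = [("A", 1, 0.0, 0.0, 0.0), ("A", 2, 20.0, 0.0, 0.0)]
chain_b = [("B", 7, 3.0, 0.0, 0.0), ("B", 8, 0.0, 30.0, 0.0)]
print(interface_residues(chain_a, chain_b))  # ({('A', 1)}, {('B', 7)})
```

The strict `<` comparison mirrors the "<4 Å" wording. For complete structures the double loop becomes slow; a spatial index such as Biopython's `NeighborSearch` (in `Bio.PDB`) is the usual way to apply the same cutoff efficiently.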
Insertion of substrate DNA into the pore between the XPD ARCH and FeS domains requires the flexibility of the XPD ARCH domain (Constantinescu-Aruxandei et al., 2016), and large domain motions have been observed upon nucleotide binding in the structure of the DNA-bound homologous helicase DinG (Cheng and Wigley, 2018). This suggests a role for the mobility of the ARCH domain in both DNA loading and DNA translocation by the XPD helicase. Our structure suggests that binding of the MAT1 helical bundle and RING domain to the ARCH domain may prevent such motion, and therefore subsequent substrate loading and XPD helicase activity (Figure 5D), in agreement with biochemical data showing XPD inhibition upon MAT1 binding (Sandrock and Egly, 2001), as well as reduced single-stranded DNA affinity of TFIIH in the presence of the CAK (Li et al., 2015). Conversely, release of MAT1 from XPD might allow the ARCH domain to move more freely, thereby de-repressing XPD. Furthermore, displacement of the MAT1 α-helix that connects XPD to XPB may allow XPB to move away from XPD, thereby unmasking the substrate-binding site on XPD RecA2 that is otherwise occluded by the DRD insertion element (Figure 5D). This latter conformational change would be similar, overall, to that seen for TFIIH upon incorporation into the Pol II-PIC, where XPD and XPB move apart and density for the MAT1 helix is missing (Figure 4C) (He et al., 2016;Schilbach et al., 2017). We propose that the unmasking of the XPD substrate-binding site and the enhanced flexibility of the XPD ARCH domain may together contribute to de-repression of the XPD helicase upon release of MAT1. This mechanism of XPD inhibition by MAT1 does not exclude additional repression of NER activity by the CAK subcomplex through phosphorylation of NER pathway components (Araújo et al., 2000).
Our structure also resolves in detail the XPD-p44 interaction, a known regulatory interface (Dubaele et al., 2003;Kim et al., 2015;Kuper et al., 2014) affected by numerous disease mutations (Cleaver et al., 1999;Greber et al., 2017;Kuper et al., 2014) (Figure 5E). The relatively small interaction surface between p44 and XPD, of just 940 Å² (Figure 5-figure supplement 2A), contrasts with the much larger buried surfaces of 3300 Å² between XPD and p62 and of 1580 Å² for the p52-XPB interaction. This smaller interaction surface may result in higher sensitivity to mutations that localize at the XPD-p44 interface. Our structure thus rationalizes the deleterious effect of a number of natural and synthetic mutations in this interface (see Appendix 1 and Figure 5-figure supplement 2B-E), including mutations L174W and T175R in the β4-α5 loop of p44 (Figure 5E) (Seroz et al., 2000), which may lead to steric clashes in the densely packed interface (Figure 5-figure supplement 2B), and the XPD R722W mutation (Kuper et al., 2014), which disrupts the salt bridge with D75 in p44 and may additionally cause steric clashes with neighboring p44 residues due to the bulky tryptophan side chain (Figure 5-figure supplement 2C). Our structure also shows that, in contrast to a previously proposed model (Luo et al., 2015), the XPD R616P, D673G, and G675R disease mutations act either via disruption of the XPD structure or of the XPD-p44 interface, but not via disruption of the interaction with p62 (see Appendix 1 and Figure 5-figure supplement 2D). Notably, the p44-dependent stimulation of XPD activity does not depend on the presence of p62 (Dubaele et al., 2003;Kim et al., 2015;Kuper et al., 2014). We were also able to map a number of disease mutations onto our XPD structure (Figure 6A, Video 3, Appendix 2) and analyze in detail the interactions involving the affected residues (example shown in Figure 6B).
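Buried surface area is commonly defined from the solvent-accessible surface area (SASA) of each isolated partner and of the complex, as below. The text does not state which program or convention produced the quoted values (some tools report half of this quantity as the interface area), so this is a general definition rather than the authors' exact procedure; the ratios use only the numbers given above and assume all three were computed with the same convention.

```latex
\mathrm{BSA}_{AB} = \mathrm{SASA}_{A} + \mathrm{SASA}_{B} - \mathrm{SASA}_{AB}
\qquad
\frac{\mathrm{BSA}_{\text{XPD-p62}}}{\mathrm{BSA}_{\text{XPD-p44}}} = \frac{3300}{940} \approx 3.5,
\qquad
\frac{\mathrm{BSA}_{\text{p52-XPB}}}{\mathrm{BSA}_{\text{XPD-p44}}} = \frac{1580}{940} \approx 1.7
```

On this basis the XPD-p44 contact is roughly a third the size of the XPD-p62 contact and a little over half that of the p52-XPB clutch.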
Our analysis confirms that XP mutations mostly localize near the helicase substrate-binding or active sites, while TTD mutations predominantly localize to the periphery of XPD (Figure 6, Figure 6-figure supplement 1) (Fan et al., 2008;Liu et al., 2008), where they disrupt TFIIH assembly and cause the transcription defects that are a hallmark of this disease (Dubaele et al., 2003) (Appendix 2).

Discussion
Our study reveals the complete structure of the TFIIH core complex and provides mechanistic insights into the regulation of its two component helicases. Specifically, it shows XPD wrapped by numerous interactions with XPB, p62, p44, and MAT1 ( Figure 5C,D), indicating how its activity can be tightly controlled and de-repressed only when its enzymatic function is needed. XPD activity is not needed and most likely inhibited during transcription initiation, but it may also be tightly controlled during NER, when repair bubble opening and lesion verification need to be coordinated with the recruitment and activation of the damage recognition and processing machinery (Figure 7).
While the regulation of XPD by MAT1 and p44 has been studied in some detail, and the domain motions in TFIIH suggest a straightforward mechanism for liberating the substrate-binding site on XPD RecA2, less was known about the interplay between XPD and p62. Our structure now shows how p62 is able to impede both substrate and nucleotide binding in XPD RecA1, and hints at dynamic structural changes of p62 during de-repression and enzymatic activity of XPD, possibly regulated by other components of the transcription or NER pathways.
Our results allow us to put extensive biochemical data on the NER pathway into a structural context (Figure 7). Depending on whether XPB binds to the damaged (Figure 7A) or the undamaged (Figure 7B) strand, the combined action of XPD and XPB could lead either to the extrusion of a DNA bubble (Figure 7A) or to the tracking of the entire complex towards the lesion (Figure 7B), which is initially located 3' of the TFIIH binding site (Sugasawa et al., 2009). The latter hypothesis is attractive in the context of biochemical data showing that XPD tracks along the damaged strand in the 5' to 3' direction until it encounters the DNA lesion, thereby verifying the presence of a bona fide NER substrate (Buechner et al., 2014; Li et al., 2015; Mathieu et al., 2013; Naegeli et al., 1993; Sugasawa et al., 2009; Wirth et al., 2016). It is worth noting that the DNA fragments excised during NER are approx. 29 nt long, with 22 nt located 5' and 5 nt located 3' of a thymine dimer lesion (Huang et al., 1992). According to our structure, the 22 nt 5'-fragment corresponds well to the estimated 20 nt of DNA required to span the distance from the DNA damage verification pocket in XPD (Mathieu et al., 2013) to the helicase elements of XPB. This is compatible with a model in which TFIIH sitting on the open repair bubble might track towards the lesion, where it would stop due to inhibition of XPD (Li et al., 2015; Mathieu et al., 2013; Naegeli et al., 1993), at which point double incision could be initiated. However, this model (Figure 7B) would require strong DNA bending before both XPB and XPD could be loaded. Additionally, it has not been fully resolved whether XPB participates in DNA translocation or unwinding during TFIIH activity in NER, as would be required in the tracking model (Li et al., 2015), or whether it exclusively acts to anchor the complex in the vicinity of the DNA lesion (Oksenych et al., 2009).
Figure 6. Disease mutations in XPD. (A) Residues affected by disease mutations are shown as spheres (XP purple; TTD yellow; CS-XP orange). Conserved helicase elements are shown in blue (DNA binding), green (nucleotide binding and hydrolysis), and brown (coupling of nucleotide hydrolysis to DNA translocation). DNA superposed from Cheng and Wigley (2018). (B) Salt bridge between R658 (RecA2) and D240 (RecA1), visualized in our structure, that is affected by the temperature-sensitive TTD mutation R658C (Vermeulen et al., 2001).
Video 3. Visualization of disease mutations mapped onto the structure of the TFIIH core complex. The human disease mutations discussed in the manuscript are shown in the context of the TFIIH structure. The areas depicted are: (i) the interaction interface between p8 and XPB, affected by a TTD mutation; (ii) the XPB N-terminal domain, affected by XP and TTD mutations; (iii) the active site region of XPD, affected mostly by XP and XP/CS mutations; (iv) the DNA-binding cleft of XPD, affected mostly by XP mutations; (v) the interaction site between XPD and p44, affected mostly by TTD mutations; (vi) the interaction site between MAT1 and XPD, affected by a TTD mutation (see text for further details). DOI: https://doi.org/10.7554/eLife.44771.025
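The fragment-length argument in the discussion above can be checked with simple bookkeeping. This is a minimal sketch; it assumes that the thymine dimer itself accounts for the remaining 2 nt of the approx. 29-nt fragment (22 + 2 + 5), and the constant and function names are illustrative, not taken from the original work.

```python
# Bookkeeping for the NER excision fragment discussed above (Huang et al., 1992).
NT_5_PRIME = 22        # nt located 5' of the lesion
NT_3_PRIME = 5         # nt located 3' of the lesion
LESION_NT = 2          # assumption: a thymine dimer spans two nucleotides
XPD_TO_XPB_NT = 20     # estimated span from the XPD verification pocket to XPB


def fragment_length(five_prime: int, lesion: int, three_prime: int) -> int:
    """Length of the excised oligonucleotide, lesion included."""
    return five_prime + lesion + three_prime


total = fragment_length(NT_5_PRIME, LESION_NT, NT_3_PRIME)
print(f"excised fragment: {total} nt")                                     # 29 nt
print(f"5' arm beyond the XPD-XPB span: {NT_5_PRIME - XPD_TO_XPB_NT} nt")  # 2 nt
```

The small surplus of the 5' arm over the estimated XPD-to-XPB span is what makes the correspondence described in the text plausible, given that the 20-nt span is itself an estimate.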
Independently of the orientation of the repair bubble, our structural data are compatible with the literature data introduced above and with a model (Figure 7C) that places XPG near XPD and p62 (site of 3'-incision), XPF-ERCC1 near XPB (site of 5'-incision), and RPA on the non-damaged strand (Fagbemi et al., 2011). We have not included XPA in this model because its interactions with distinct partners and its participation in various processes, such as CAK release (Coin et al., 2008), binding to p8 (Ziani et al., 2014), and helicase stalling after lesion recognition (Li et al., 2015), suggest that it localizes to various, often distant sites on TFIIH or on the repair bubble in general (Sugitani et al., 2016).
In summary, our structure of the human TFIIH core complex reveals the interactions that govern the architecture and function of this molecular machine, provides new insights into the regulation of its enzymatic subunits, and thus constitutes an excellent framework for further mechanistic studies of TFIIH in the context of larger DNA repair and transcription assemblies.

Materials and methods
TFIIH purification, cryo-EM specimen preparation, and data collection
TFIIH was purified and cryo-EM grids were prepared on carbon-coated C-flat CF 4/2 holey carbon grids (Protochips) using a Thermo Fisher Scientific Vitrobot Mk. IV, as previously described (Greber et al., 2017). Our previous 4.4 Å cryo-EM map of human TFIIH (Greber et al., 2017) was based on four cryo-EM datasets (three of which were retained in the 4.4 Å reconstruction; datasets 8-10 in Supplementary file 1) collected on a low-base Titan microscope (Thermo Fisher Scientific) equipped with a side-entry holder (Gatan) and a K2 Summit direct electron detector (Gatan). To improve on this map, we collected new data (dataset 7 in Supplementary file 1) on a Titan Krios microscope (Thermo Fisher Scientific) operated at 300 kV acceleration voltage and equipped with a Cs corrector, a K2 Summit direct electron detector (Gatan) operated in super-resolution counting mode, and a Quantum energy filter (Gatan). Apart from the change of microscope, this dataset was collected under the same imaging conditions as our previous data (i.e. 37,879× magnification, resulting in a 1.32 Å pixel size, and a total exposure of 40 e⁻/Å²). Datasets 7-10 could be combined to yield a cryo-EM map at 4.3 Å resolution (not shown); however, this did not substantially improve map quality, suggesting that the quality of particle alignment was limiting. We therefore collected further data on a Titan Krios electron microscope (Thermo Fisher Scientific) operated at 300 kV acceleration voltage and equipped with a Volta Phase Plate (VPP), a Gatan Quantum energy filter (operated at 20 eV slit width), and a Gatan K2 Summit direct electron detector (operated in super-resolution counting mode).
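As a consistency check, the quoted pixel sizes follow from dividing the detector pixel pitch by the nominal magnification. The sketch below assumes a physical pixel pitch of 5 µm for the K2 Summit (this value is not stated in the text) and uses the 37,879× magnification given above and the 43,478× VPP magnification given in the next paragraph; the function name is hypothetical. In super-resolution mode the recorded pixels are half this size; the values quoted in the text refer to the physical pixel.

```python
DETECTOR_PIXEL_UM = 5.0   # assumed K2 Summit physical pixel pitch (not stated in the text)


def pixel_size_angstrom(magnification: float,
                        detector_pixel_um: float = DETECTOR_PIXEL_UM) -> float:
    """Physical pixel size on the specimen (object) scale, in angstrom."""
    return detector_pixel_um * 1e4 / magnification   # 1 um = 10^4 A


for label, mag in [("datasets 7-10 (no VPP)", 37_879), ("datasets 1-6 (VPP)", 43_478)]:
    print(f"{label}: {pixel_size_angstrom(mag):.2f} A/pixel")
```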
VPP data (datasets 1-6 in Supplementary file 1) were collected using the defocus acquisition technique (Khoshouei et al., 2017) at 43,478× magnification, resulting in a physical pixel size of 1.15 Å on the object scale, with a total electron exposure of 50 e⁻/Å² at an exposure rate of 6.1 e⁻/Å²/s during an exposure time of 8.25 s, dose-fractionated into 33 movie frames (50 frames for dataset 6). Data collection was monitored on the fly using FOCUS (Biyani et al., 2017) to ensure proper evolution of the VPP-induced phase shift.
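The exposure figures quoted above are mutually consistent, as a short calculation shows: 6.1 e⁻/Å²/s over 8.25 s gives about 50 e⁻/Å². This is a minimal sketch with hypothetical names; it assumes that dataset 6 used the same 8.25 s exposure, which the text does not state explicitly.

```python
EXPOSURE_RATE = 6.1       # e-/A^2/s, as stated above
EXPOSURE_TIME_S = 8.25    # s


def total_exposure(rate: float, time_s: float) -> float:
    """Accumulated electron exposure in e-/A^2."""
    return rate * time_s


def per_frame(total: float, n_frames: int) -> float:
    """Share of a total (exposure or time) falling into each movie frame."""
    return total / n_frames


total = total_exposure(EXPOSURE_RATE, EXPOSURE_TIME_S)
print(f"total exposure: {total:.1f} e-/A^2")   # 50.3, i.e. the ~50 e-/A^2 quoted
for n_frames in (33, 50):  # 50 frames for dataset 6 (same 8.25 s exposure assumed)
    dose = per_frame(50.0, n_frames)
    frame_ms = per_frame(EXPOSURE_TIME_S, n_frames) * 1000
    print(f"{n_frames} frames: {dose:.2f} e-/A^2 and {frame_ms:.0f} ms per frame")
```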
Initially, we used data collected in 10 microscopy sessions (six with and four without the VPP), resulting in >30,000 micrographs in total, of which approx. 16,000 were retained after inspection of Thon ring quality and CTF fitting (for details, see Supplementary file 1). Movie stacks were aligned and dose-weighted using MOTIONCOR2 (Zheng et al., 2017). The aligned, dose-weighted sums from the datasets collected at 1.32 Å pixel size (datasets 7-10) were up-sampled to 1.15 Å per pixel to match the scale of the micrographs collected using the VPP (datasets 1-6), after calibrating the two magnifications against each other based on 3D reconstructions computed from the two types of data. CTF parameters were estimated using GCTF (Zhang, 2016), and particles were picked using GAUTOMATCH (Kai Zhang, MRC Laboratory of Molecular Biology, Cambridge UK) or RELION (Scheres, 2015) with templates generated from a preliminary reference-free picking run. All subsequent data processing was performed in RELION 2 (Kimanius et al., 2016; Scheres, 2012) or RELION 3 (Zivanov et al., 2018).
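Up-sampling the 1.32 Å/pixel sums to 1.15 Å/pixel amounts to enlarging each image by a factor of about 1.32/1.15 ≈ 1.148. The text does not say which resampling method was used, and the factor actually applied came from the calibration against 3D reconstructions rather than from the nominal values. The sketch below illustrates one common option, zero-padding the centred Fourier transform, with a hypothetical function name. Because the new box size must be an integer, the effective output pixel size deviates slightly from the target, so the sketch returns it for checking.

```python
import numpy as np


def fourier_resample(image: np.ndarray, pixel_in: float, pixel_out: float):
    """Up-sample a 2D image to a smaller pixel size by zero-padding its spectrum.

    Returns the resampled image and the effective (y, x) pixel sizes, which can
    differ slightly from pixel_out because the new box size is rounded."""
    ny, nx = image.shape
    scale = pixel_in / pixel_out                      # e.g. 1.32 / 1.15 ~ 1.148
    new_ny, new_nx = int(round(ny * scale)), int(round(nx * scale))
    if new_ny < ny or new_nx < nx:
        raise ValueError("this sketch only handles up-sampling")
    spectrum = np.fft.fftshift(np.fft.fft2(image))    # zero frequency at the centre
    padded = np.zeros((new_ny, new_nx), dtype=complex)
    # Offsets keep the zero-frequency pixel centred for both odd and even box sizes.
    y0, x0 = new_ny // 2 - ny // 2, new_nx // 2 - nx // 2
    padded[y0:y0 + ny, x0:x0 + nx] = spectrum
    # ifft2 normalizes by the new number of pixels; rescale to preserve the mean.
    resampled = np.fft.ifft2(np.fft.ifftshift(padded)).real * (new_ny * new_nx) / (ny * nx)
    return resampled, (pixel_in * ny / new_ny, pixel_in * nx / new_nx)


resampled, (py, px) = fourier_resample(np.ones((64, 64)), 1.32, 1.15)
print(resampled.shape, round(px, 4))
```

Fourier padding is only one choice; real-space interpolation would serve the same purpose. Taking the real part discards the small imaginary residue that the unpaired Nyquist component can leave for even box sizes.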
To remove false-positive particle picks and broken particles, an initial 3D classification at low resolution (7.5° angular sampling) was performed on each dataset individually (datasets 3, 4, and 5) or on pooled datasets where appropriate: datasets 1 and 2 were pooled because they used the same batch of specimen; the non-VPP datasets 7-10 were joined because only few micrographs were retained under more stringent quality criteria than in our previous study; and dataset 6 was initially classified together with particles from dataset 4 to compensate for particle orientation bias in dataset 6 (see Figure 1-figure supplement 1). In total, >2,000,000 initial particle picks were subjected to this low-resolution 3D classification, identifying approx. 820,000 intact particles for further processing. 3D auto-refinement was followed by another round of 3D classification, performed separately for the VPP and non-VPP data because combined RELION 3D classification runs spuriously separated the two data types into distinct classes. The best classes from these high-resolution 3D classifications (one each from the VPP and non-VPP data) were refined according to the gold-standard refinement procedure (fully independent half-sets), resulting in a 3.9 Å-resolution reconstruction according to the FSC = 0.143 criterion (Rosenthal and Henderson, 2003; Scheres and Chen, 2012). Beam tilt refinement in RELION 3 (Zivanov et al., 2018) improved the map computed from the final subset of VPP data (138,659 particle images) to 3.7 Å resolution. The non-VPP data no longer improved the reconstruction after beam tilt correction and were therefore discarded at this point. The final map was post-processed by applying a B-factor of −142 Å² and low-pass filtering to the nominal 3.7 Å resolution for visualization and subsequent coordinate refinement.
Figure 7. Implications for assembly of the repair bubble during NER. (A) Schematic of DNA-bound TFIIH (DNA damage verification pocket in XPD and DNA 5'-phosphates indicated by purple and orange spheres, respectively). Binding of both XPB and XPD to the damaged strand would lead to extrusion of a bubble when XPD scans in the 5' to 3' direction, while XPB may be stationary or contribute to bubble extrusion if translocating in the 3' to 5' direction (DNA superposed from PDB IDs 6FWR and 5OQJ; Cheng and Wigley, 2018; Schilbach et al., 2017). (B) Binding of XPB to the undamaged strand would enable the entire complex to scan in the 5' to 3' direction, given the opposing polarities of the two ATPases/helicases involved. (C) Model for the assembled repair bubble. Positions of NER factors are approximate. XPG-p62 PH domain interaction according to Gervais et al. (2004). See Discussion for details. DOI: https://doi.org/10.7554/eLife.44771.026
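Post-processing with a negative B-factor boosts high-frequency amplitudes by exp(−B s²/4), where s is the spatial frequency in Å⁻¹. With B = −142 Å², this is a factor of about 13 at 3.7 Å, compared with a factor of about 2 at 6 Å for the milder −100 Å² sharpening used for model building. The sketch below applies this weighting together with a low-pass filter. It is an illustration with hypothetical function names, not the post-processing program's code; it uses a hard cutoff, whereas post-processing programs typically apply a soft-edged filter.

```python
import numpy as np


def bfactor_weight(s, b_factor: float):
    """Amplitude weight exp(-B s^2 / 4) for spatial frequency s in 1/A (B < 0 sharpens)."""
    return np.exp(-b_factor * np.asarray(s) ** 2 / 4.0)


def sharpen_and_lowpass(volume: np.ndarray, pixel_size: float,
                        b_factor: float, resolution: float) -> np.ndarray:
    """Apply B-factor weighting and a hard low-pass cutoff at 1/resolution."""
    # Spatial frequency of every Fourier voxel along each axis, in 1/A.
    axes = [np.fft.fftfreq(n, d=pixel_size) for n in volume.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    s = np.sqrt(sum(g ** 2 for g in grids))
    weight = bfactor_weight(s, b_factor)
    weight[s > 1.0 / resolution] = 0.0      # discard frequencies beyond the cutoff
    return np.fft.ifftn(np.fft.fftn(volume) * weight).real


print(f"{bfactor_weight(1 / 3.7, -142.0):.1f}")   # 13.4
print(f"{bfactor_weight(1 / 6.0, -100.0):.1f}")   # 2.0
```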
We note that even though the final reconstruction comprises only a relatively small fraction of the total particle picks, the first 3D refinement from 786,755 VPP particle images (Figure 1-figure supplement 1) resulted in a 4.3 Å-resolution map that is in excellent agreement with the final map, apart from its lower resolution and poorer map quality caused by residual heterogeneity, which was addressed in the subsequent 3D classification step to yield the final set of 138,659 particle images. We therefore conclude that our final reconstruction is representative of the overall particle population in the dataset.
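The resolution values quoted in this section (3.9, 3.7, and 4.3 Å) are based on the FSC = 0.143 criterion applied to independently refined half-maps. The sketch below shows how such a curve can be computed and where the threshold is read off. It is a simplified illustration and not the RELION implementation: it assumes a cubic box, applies no masking or correction for mask-induced correlation, and interpolates linearly between shells. The function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray, pixel_size: float):
    """Fourier shell correlation between two half-maps in a cubic box.

    Returns shell spatial frequencies (1/A) and FSC values."""
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    k = np.fft.fftfreq(n)                                   # cycles per pixel
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kz**2 + ky**2 + kx**2) * n).astype(int).ravel()
    n_shells = n // 2
    inside = shell < n_shells                               # drop corners beyond Nyquist
    idx = shell[inside]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real.ravel()[inside], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2).ravel()[inside], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2).ravel()[inside], minlength=n_shells)
    freqs = np.arange(n_shells) / (n * pixel_size)
    return freqs, cross / np.sqrt(p1 * p2)


def resolution_at(freqs: np.ndarray, fsc: np.ndarray, threshold: float = 0.143) -> float:
    """Resolution (A) where the FSC first drops below the threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return 1.0 / freqs[-1]                              # correlated to the last shell
    i = int(below[0]) + 1
    f_a, f_b, c_a, c_b = freqs[i - 1], freqs[i], fsc[i - 1], fsc[i]
    f_cross = f_a + (c_a - threshold) * (f_b - f_a) / (c_a - c_b)
    return 1.0 / f_cross
```

The half-maps must come from fully independent refinements, as in the gold-standard procedure described above; otherwise noise correlates between them and the curve overestimates resolution.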
To facilitate the interpretation of less ordered or only partially occupied parts of the structure, including the p62 BSD2 domain, the MAT1 RING domain, the MAT1 three-helix bundle at the XPD arch domain, and the N-terminus of XPD, we used signal-subtracted classification (Nguyen et al., 2015) (p62 BSD2 domain, Figure 1-figure supplement 3), focused classification (MAT1 RING domain, Figure 1-figure supplement 4), and multibody refinement (MAT1 three-helix bundle and XPD N-terminus, Figure 1-figure supplement 5). For these procedures, we used only the VPP data because 3D classification separated VPP and conventional cryo-EM data into distinct classes, rendering combined classification ineffective. Multibody refinement improved the resolution of the XPD-MAT1 body only slightly (to 3.6 Å) relative to the overall refined best map, and only during the first two iterations, likely because the relatively small size of the individual bodies limited the signal available for alignment. However, the above-mentioned structural elements showed improved density features (Figure 1-figure supplement 5B) and could be more reliably interpreted in the multibody-refined XPD-MAT1 map (green in

Model building and refinement
The previous structures of the human TFIIH core complex (Greber et al., 2017) and of yeast TFIIH in the Pol II-PIC (Schilbach et al., 2017) were docked into the cryo-EM map and used as the basis for atomic modeling in O (Jones et al., 1991) and COOT (Emsley et al., 2010). In addition, the structure of the human p34 VWFA-p44 RING domain complex (Radu et al., 2017), the N-terminal RING domain of MAT1 (Gervais et al., 2001), the C-terminal RecA-like domain of human XPB (Hilario et al., 2013), and several homology models of the p52 winged-helix domains, generated with the PHYRE2 web server (Kelley et al., 2015) based on templates PDB IDs 3F6O and 1STZ (Liu et al., 2005), were used for model building.

Building of p52
The structure of p52 was traced and built completely de novo, with the exception of the winged-helix domains and the very C-terminal domain, for which a homology model was placed into the cryo-EM map together with the p8 structure (Kainov et al., 2008; Vitorino et al., 2007) and adjusted to the density.

XPB
The structure of the XPB NTD was also built de novo, the structure of the DRD was extensively rebuilt, and the RecA-like domains were rebuilt to improve the fit to the density.

XPD
The improved cryo-EM map enabled detailed re-building of XPD, including correction of register shifts in the more poorly ordered regions of the protein and extension of the N-and C-termini.

MAT1
The N-terminal MAT1 RING domain (Gervais et al., 2001) was first docked into the density obtained by focused classification and then combined with the rest of the TFIIH model, where, together with density features of large side chains in the helical regions, it helped guide the assignment of the sequence register of the MAT1 model.

Building of p34
The human p34 structure (Radu et al., 2017) was docked into the map as is, extended near the interaction site with p52, and combined with a completely re-built model of the C-terminal eZnF domain (Schilbach et al., 2017).

Building of p44
The p44 VWFA fold needed only minor rebuilding and was combined with the eZnF domain and the C-terminal human RING domain model (Radu et al., 2017;Schilbach et al., 2017). The p44 NTE was built according to the density at the contact site with XPB and guided by CX-MS data (Luo et al., 2015). Both the features of the cryo-EM map and crosslinks of the p44 NTE to p34, p52, and XPB (Figure 1-figure supplement 6E) unambiguously confirm the tracing of this segment towards the XPB NTD, rather than alternative tracing towards XPD (this density is now assigned to p62, in agreement with p62-XPD crosslinks; see below and

Building of p62
The p62 protein was modeled based on the placement of the BSD domains (PDB ID 2DII), secondary structure prediction, extension of docked coordinates (Greber et al., 2017; Schilbach et al., 2017), and new tracing of the protein chain. Placement of the regions near XPD, where density is weak overall, was guided by matching the succession of secondary structure elements along the p62 sequence with helical densities in the cryo-EM map (Figure 2-figure supplement 1A), and corroborated by CX-MS data (Luo et al., 2015), which showed excellent agreement of p62-XPD crosslinks with the structure (Figure 1-figure supplement 6G). Crosslinks between p62 and p44 showed a relatively large proportion of outliers (Supplementary file 4), possibly because the sequence register of the p62 segments to which these crosslinks map is not well constrained. These segments were modeled as poly-alanine and deposited without sequence assignment (UNK; Supplementary file 2). Maps low-pass filtered to 6 Å and sharpened with a B-factor of only −100 Å² were used to guide docking of domains and to assess the continuity of the density in poorly ordered regions of the protein (Figure 2-figure supplement 1B,C).
The resulting coordinate model (Supplementary file 2) was refined against the final overall reconstruction at 3.7 Å resolution using the real space refinement program in PHENIX (Adams et al., 2010; Afonine et al., 2018) and validated using the MTRIAGE program in PHENIX and the MOLPROBITY web server. Ramachandran, Cβ, rotamer, and secondary structure restraints were used throughout refinement to ensure good model geometry at the given resolution. Spatial frequencies beyond the nominal 3.7 Å resolution of the cryo-EM map were excluded from refinement to prevent over-fitting. Additionally, by monitoring the bond length and bond angle r.m.s.d. values, the real space refinement program in PHENIX automatically estimates the relative weighting of the restraints and the map data to maintain good model geometry and to prevent over-refinement of the structure (Adams et al., 2010; Afonine et al., 2018). Because the automatically determined weight fluctuated between approximately 3 and 6 during a typical refinement run, we used the average value of 4.5 for the final refinement (five macro-cycles of global optimization and B-factor refinement). The N-terminal RING domain of MAT1 and the BSD1 domain of p62, for which only poorly resolved density is present in the final cryo-EM map, were additionally restrained by reference restraints (Headd et al., 2012) using the NMR structures of the corresponding domains (PDB IDs 1G25 and 2DII, respectively) (Gervais et al., 2001). The side chains of these two domains (with the exception of residues involved in zinc finger formation and of prolines) were truncated at the Cβ position to reflect the lower resolution of the corresponding densities. The FSC curve between the refined coordinate model and the cryo-EM map extends to 3.9 Å, and the distribution of B-factors in the refined coordinate model (Figure 1-figure supplement 2C) mirrors the local resolution of the cryo-EM map (Figure 1-figure supplement 2D), as expected.
Refinement statistics are given in Supplementary file 3 and are typical for structures in this resolution range (100th percentile for MOLPROBITY clash score and overall score).
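Truncating side chains at the Cβ position while sparing zinc-coordinating residues and prolines amounts to a simple filtering rule. The text does not say which tool performed this step. The sketch below expresses the rule over a hypothetical minimal atom representation (chain, residue number, residue name, atom name); the function name and the residue numbers in the usage example are invented for illustration.

```python
KEEP_ATOMS = {"N", "CA", "C", "O", "OXT", "CB"}   # backbone plus C-beta


def truncate_to_cbeta(atoms, zinc_ligands):
    """Remove side-chain atoms beyond C-beta.

    atoms: iterable of (chain, resnum, resname, atom_name) tuples.
    zinc_ligands: set of (chain, resnum) pairs left intact (zinc-coordinating residues).
    Prolines are always left intact as well."""
    kept = []
    for chain, resnum, resname, name in atoms:
        if name in KEEP_ATOMS or resname == "PRO" or (chain, resnum) in zinc_ligands:
            kept.append((chain, resnum, resname, name))
    return kept


# Hypothetical example: Cys 5 coordinates zinc, Lys 6 is truncated, Pro 7 is kept intact.
atoms = [
    ("A", 5, "CYS", "CB"), ("A", 5, "CYS", "SG"),
    ("A", 6, "LYS", "CA"), ("A", 6, "LYS", "CB"), ("A", 6, "LYS", "CG"), ("A", 6, "LYS", "NZ"),
    ("A", 7, "PRO", "CG"), ("A", 7, "PRO", "CD"),
]
for atom in truncate_to_cbeta(atoms, zinc_ligands={("A", 5)}):
    print(atom)
```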

Flexibility analysis
For the analysis of the conformational dynamics of TFIIH, VPP datasets 1 and 2 were subjected to multibody refinement in RELION 3 using six masks (Figure 4-figure supplement 1). After completion of multibody refinement, we used RELION 3 to run a principal component analysis to identify the principal modes of motion of the bodies relative to each other. The volume series for the first 12 principal components were reconstructed, and difference densities (green and purple in Figure 4-figure supplement 1) were computed between the most extreme states in each series. Subsequently, for selected principal components, roughly 20,000 particles corresponding to both ends of the distribution were subjected to 3D refinement, resulting in maps of approx. 10 Å resolution. Importantly, the particles used for these refinements, and for the subsequent analysis shown in Figure 4A, were the original, un-subtracted particle images containing the entire TFIIH. These refinements are therefore not directly affected by any limitations on alignment accuracy that would arise from aligning smaller sub-volumes of TFIIH. We also repeated this analysis for two different data subsets (the final 138,659-particle subset that gave rise to the 3.7 Å-resolution reconstruction and the complete set of 786,755 VPP particles resulting from the initial 3D classification) using only three bodies for multibody refinement (providing more signal for alignment per body) and obtained consistent results overall, except that the ranking of the principal components changed in some instances.
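The principal component analysis described above operates on the per-particle rigid-body parameters (relative rotations and translations of each body) obtained from multibody refinement. RELION 3 performs this analysis internally. The sketch below is only a generic numpy illustration of the idea, with hypothetical function names and an arbitrary parameter layout. It shows how components are ranked by explained variance and how particles from both ends of one component can be selected for separate refinement.

```python
import numpy as np


def principal_components(params: np.ndarray):
    """PCA of per-particle rigid-body parameters, shape (n_particles, n_params).

    Returns the variance along each component (largest first), the component
    vectors (rows), and each particle's projection onto every component."""
    centred = params - params.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values ** 2 / (len(params) - 1)
    projections = centred @ components.T
    return variances, components, projections


def extreme_subsets(projections: np.ndarray, component: int, n_each: int):
    """Indices of the n_each particles at each end of one component's distribution."""
    order = np.argsort(projections[:, component])
    return order[:n_each], order[-n_each:]


# Synthetic stand-in for multibody output: most variance along the first parameter.
rng = np.random.default_rng(0)
params = rng.standard_normal((5000, 6)) * np.array([5.0, 2.0, 1.0, 1.0, 1.0, 1.0])
variances, components, proj = principal_components(params)
low, high = extreme_subsets(proj, component=0, n_each=500)
print(variances.round(1))
```

Refining the two extreme subsets separately, as described in the text, turns a continuous distribution along one component into two maps whose differences can be interpreted structurally.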
The refined atomic model of the TFIIH core complex, subdivided into suitable rigid bodies, was then rigid-body refined into these volumes using PHENIX real space refinement (Afonine et al., 2018) and coordinate displacement between the two resulting models for each principal component was plotted to obtain an initial assessment of the modes of motion present in the TFIIH dataset ( Figure 4A). For actual structural interpretation ( Figure 4B,C), the final cryo-EM maps of the TFIIH core complex (this work) and TFIIH in the context of the Pol II-PIC (Schilbach et al., 2017) were used.
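For each principal component, the displacement plot compares the two rigid-body-refined models atom by atom. A minimal sketch of that comparison follows. It assumes that the two models contain the same atoms in the same order and share a common reference frame, so no superposition is applied. The function names and body labels are hypothetical.

```python
import numpy as np


def per_atom_displacement(coords_a: np.ndarray, coords_b: np.ndarray) -> np.ndarray:
    """Euclidean displacement (A) of matched atoms between two models, shape (n, 3)."""
    if coords_a.shape != coords_b.shape:
        raise ValueError("models must contain the same atoms in the same order")
    return np.linalg.norm(coords_b - coords_a, axis=1)


def displacement_by_body(displacement: np.ndarray, body_labels) -> dict:
    """Mean displacement for each rigid-body label."""
    labels = np.asarray(body_labels)
    return {str(label): float(displacement[labels == label].mean())
            for label in np.unique(labels)}


model_a = np.zeros((4, 3))
model_b = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
disp = per_atom_displacement(model_a, model_b)
print(displacement_by_body(disp, ["XPD", "XPD", "XPB", "XPB"]))
```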

Other
Figures were created using PyMOL (The PyMOL Molecular Graphics System, Version 1.8, Schrödinger, LLC.) and the UCSF Chimera package from the Computer Graphics Laboratory, University of California, San Francisco (supported by NIH P41 RR-01081) (Pettersen et al., 2004). Protein-protein interface statistics were determined using PISA (Krissinel and Henrick, 2007). Multiple sequence alignments were performed with Clustal Omega (Sievers et al., 2011).

Data availability
The cryo-EM map of the human TFIIH core complex at 3.7 Å and the refined coordinate model have been deposited to the EMDB and PDB with accession codes EMD-0452 and PDB-6NMI, respectively. Additional cryo-EM maps resulting from the classification of the dataset for presence of the MAT1 RING domain and for the p62 BSD2 domain (both presence and absence) have been deposited to the EMDB with accession codes EMD-0587, EMD-0589, and EMD-0588, respectively. The multibody-refined maps for XPD-MAT1, XPB-p8-p52 (clutch, CTD), and p44-p34-p62-p52 (N-terminal region) have been deposited with accession codes EMD-0602, EMD-0603, and EMD-0604, respectively.
GM63072, P01-GM063210, and R35-GM127018 to EN. BJG was supported by fellowships from the Swiss National Science Foundation (projects P300PA_160983 and P300PA_174355). EN is a Howard Hughes Medical Institute Investigator. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Additional files

Supplementary files
Supplementary file 1. Data collection statistics. All datasets were acquired on Gatan K2 Summit direct electron detectors mounted on 300 kV electron microscopes with three-condenser electron optics. The high rejection rate for the data collected without the VPP was due to poorer CTF resolution estimates compared to the VPP data, likely because a low-base Titan with a less stable side-entry holder was used for most of these data. Abbreviations: TEM, transmission electron microscope; VPP, Volta phase plate; S, sum.
In summary, our structure pinpoints the locations of XPD mutations that provide insight into human disease mechanisms and, in most cases, resolves the density for the side chains of the affected residues, thereby providing a structural framework for the detailed analysis of the interactions of these residues and the effects of their mutation.