Structural flexibility of human α‐dystroglycan

Dystroglycan (DG), composed of α and β subunits, belongs to the dystrophin‐associated glycoprotein complex. α‐DG is an extracellular matrix protein that undergoes a complex post‐translational glycosylation process. The bifunctional glycosyltransferase like‐acetylglucosaminyltransferase (LARGE) plays a crucial role in the maturation of α‐DG, enabling its binding to laminin. We have already structurally analyzed the N‐terminal region of murine α‐DG (α‐DG‐Nt) and of a pathological single point mutant that may affect recognition of LARGE, although the structural features of the potential interaction between LARGE and DG remain elusive. We now report the crystal structure of wild‐type human α‐DG‐Nt, which has allowed us to assess the reliability of our murine crystallographic structure as a general model of α‐DG‐Nt. Moreover, we address for the first time both structures in solution. Interestingly, small‐angle X‐ray scattering (SAXS) reveals the existence of two main conformational ensembles. The predominant species is reminiscent of the crystal structure, while the less populated one assumes a more extended fold. A comparative analysis of the human and murine α‐DG‐Nt solution structures reveals that the two proteins share a common interdomain flexibility and population distribution of the two conformers. This is confirmed by the very similar stability displayed by the two orthologs as assessed by biochemical and biophysical experiments. These results highlight the need to take into account the molecular plasticity of α‐DG‐Nt in solution, as it can play an important role in the functional interactions with other binding partners.

Abbreviations: DG, dystroglycan; D_max, maximum size of the particle; DSF, differential scanning fluorimetry; ECM, extracellular matrix; EOM, ensemble optimization method; hα-DG-Nt, human α-DG-Nt; Ig-like, immunoglobulin-like; LARGE, like-acetylglucosaminyltransferase; MMexp, experimental molecular mass of the solute; mα-DG-Nt, murine α-DG-Nt; NSD, normalized spatial discrepancy; R_g, radius of gyration; rmsd, root mean square deviation; S6 domain, small subunit ribosomal protein S6 of Thermus thermophilus; SAXS, small-angle X-ray scattering; T_m, melting temperature; V_p, excluded volume of the hydrated particle; α-DG-Nt, N-terminal region of α-DG; α-DG, α-dystroglycan; β-DG, β-dystroglycan.
Dystroglycan (DG) is a heterodimeric transmembrane glycoprotein that is part of the multimeric dystrophin-glycoprotein complex and plays a crucial role in the association of cells with basement membranes [1]. DG links the basal lamina with the cytoskeleton by bridging the intracellular dystrophin to a plethora of extracellular matrix (ECM) proteins, that is, laminin, agrin, and perlecan, thus offering stability to tissues. DG is highly expressed in skeletal muscle, where it was first discovered [2] and where it confers structural stability to the sarcolemma during contraction, but it is also strongly expressed in heart, brain, and peripheral nerves, where it is involved in various physiological processes [3]. Moreover, DG has also been associated with Old World arenaviral infections, acting as a receptor for virus anchoring [4].
Dystroglycan is encoded by a single gene (DAG1) [2], and the corresponding precursor is proteolytically cleaved within the endoplasmic reticulum, resulting in the formation of the extracellular α-dystroglycan (α-DG) and the transmembrane β-dystroglycan (β-DG). In their mature forms, α- and β-DG are linked together through noncovalent interactions involving the N-terminal region of β-DG and the C-terminal region of α-DG. α-DG undergoes a complex and still not fully understood post-translational glycosylation process, which involves several enzymes at various stages of α-DG maturation [5] in both the endoplasmic reticulum and the Golgi apparatus. Correct α-DG glycosylation has been shown to be critical for its physiological functions. In a family of neuromuscular diseases called secondary dystroglycanopathies, the hypoglycosylated forms of α-DG, resulting from defects in the enzymes responsible for α-DG glycosylation, display limited binding capabilities toward laminin, with severe implications for health [6]. A low degree of α-DG glycosylation has also been found in rare diseases caused by single point mutations in the DG gene [7-9]. In recent years, it has been discovered that the Ca2+-dependent interaction of α-DG with its main binding partner in the ECM, the LG domain, is specifically mediated by a novel polysaccharide [10]. The α-DG glycosylation biosynthetic pathways involve a kinase and several glycosyltransferases [5], among them the bifunctional glycosyltransferase like-acetylglucosaminyltransferase (LARGE), which adds the repeating disaccharide unit (-α3-GlcA-β3-Xyl-) to a glycan anchored at the site defined by Thr317 and Thr319 [11]. Indeed, it has been reported that the elongation of the glycan operated by LARGE requires the presence of the N-terminal region of α-DG (α-DG-Nt), which is thought to act as a recognition site for LARGE before being processed by a furin-like proprotein convertase [12,13].
It has been proposed that α-DG-Nt may be able to bind other partners in the ECM [14], but the biological implications of these potential interactions remain elusive [15].
We aimed at characterizing the biophysical and biochemical bases behind the biological function of α-DG, and our efforts have been focused on understanding the molecular determinants that modulate its binding abilities [16]. An electron microscopy study showed that α-DG assumes a dumbbell-like shape, with two globular N-terminal and C-terminal regions at the extremes of a mucin-like region [17]. Furthermore, the crystal structure of the murine N-terminal region disclosed a modular architecture composed of an immunoglobulin-like (Ig-like) domain and a second domain similar to the small subunit ribosomal protein S6 of Thermus thermophilus (S6 domain) [18]. Although the experimental structure of the C-terminal region of α-DG is still unknown, homology modeling suggests that its fold is likely to be a second Ig-like structure [19]. According to the crystallographic structure, murine α-DG-Nt (mα-DG-Nt) assumes a rather compact overall fold, with the Ig-like and S6 domains interacting with each other and linked together by a flexible loop [18]. The same organization has been observed in the crystal structure of the murine α-DG missense pathological mutant T190M, which displays the same fold, mutual orientation, and interaction between the Ig-like and S6 domains observed for WT α-DG-Nt [20]. Despite such a high degree of structural similarity, it has been proposed that T190M might have a reduced ability to assist LARGE in its glycan elongation action, and the resulting α-DG hypoglycosylation leads to a form of limb-girdle muscular dystrophy [7]. The comparison of the WT and T190M crystallographic models ruled out any effect of the T190M mutation on the folding of the Ig-like and S6 domains, as earlier proposed by a computational study [21]. Therefore, we considered whether the murine crystallographic structure could be reliably used as a general structural model for α-DG-Nt.
In addition, we have explored the propensity of α-DG-Nt for plasticity in solution by small-angle X-ray scattering (SAXS) analysis of both the murine and human proteins. The high-resolution crystal structure of human α-DG-Nt (hα-DG-Nt) has also been determined in order to assess the structural similarities between the two orthologs, which are 93% identical in their amino acid sequence. The combination of the high- and low-resolution structural data (in the crystals and in solution, respectively) with biochemical and biophysical experiments shows that the murine and human proteins share a highly conserved overall architecture as well as a strikingly similar structural flexibility in solution.
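As a hedged illustration of how a figure such as the 93% sequence identity can be obtained, the sketch below counts identical residues over the ungapped columns of a pairwise alignment. The function name `percent_identity` and the two toy fragments are hypothetical; they are not the real DAG1 sequences, and other conventions for the denominator (for example, the full alignment length) exist.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_mouse = "MKT-AVLPNQ"
toy_human = "MKTSAVLPHQ"
toy_identity = percent_identity(toy_mouse, toy_human)
print(f"{toy_identity:.1f}% identity over ungapped columns")
```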

Results
Conformational stability of the N-terminal domains of murine and human α-dystroglycan
Conformational stabilities of murine and human α-DG-Nt were compared by biochemical and biophysical experiments, namely differential scanning fluorimetry (DSF) and limited proteolysis assays.
In the DSF assay, thermally induced protein denaturation is monitored by the increase in SYPRO Orange fluorescence upon exposure of hydrophobic patches during protein unfolding. The comparison of the resulting thermal unfolding profiles and melting temperatures (T_m) can be used to infer differences in the conformation and therefore in the thermal stability of the proteins [22]. Figure 1 shows the changes in fluorescence emission of the murine and human proteins upon thermal unfolding in the presence of the dye.
Both the murine and human variants showed very low background fluorescence in the pretransition region, which is quite flat. Their denaturation curves are almost superimposable and suggest a two-transition unfolding process typical of proteins containing two independently folded domains. The T_m values for the first transition obtained from the Boltzmann sigmoid fitting of the data (43.76 ± 0.08 °C and 43.94 ± 0.11 °C for the murine and human variants, respectively) did not show any significant difference, supporting the hypothesis that the two proteins share a very similar conformational stability in solution. A similar thermal stability of the two proteins is also clear from the inspection of the second transition, although in this case the T_m could not be calculated because it was not possible to reach the aggregation region even when increasing the final temperature to the upper limit of the instrument.
Limited proteolysis was also used to probe conformational stability, assuming that proteolytic recognition sites become accessible upon unfolding. Indeed, this technique is widely used to examine flexible and exposed regions, considering that proteolysis occurs exclusively at 'hinges and fringes' [23] and that conformational parameters such as accessibility and segmental mobility correlate quite well with exposed proteolytic sites [24]. Limited proteolysis experiments (data not shown) with a panel of seven different proteases did not reveal any significant difference in conformational stability or flexibility between mα-DG-Nt and hα-DG-Nt.
Crystallographic structure of human α-dystroglycan N-terminal domain
The crystallographic structure of hα-DG-Nt has been determined at a resolution of 1.8 Å. Upon completion of the crystallographic refinement, the final R-factor was 0.163 (R_free = 0.195), with residues 52-60, 163-179, and 305-315 missing in the final model as no reliable electron density could be detected for these regions. A lower-quality electron density was also observed for the flexible regions encompassing residues 89-91 and 181-185. The region comprising residues 159-162 shows signs of multiple conformations, but any attempt to model it during the refinement did not improve the 2Fo-Fc and Fo-Fc density maps, nor the refinement quality indicators.
The Ig-like domain (residues 62-160) and the S6 domain (residues 182-305) assume the same relative orientation as observed in mα-DG-Nt [18] (Fig. S1A,B), with a root mean square deviation (rmsd) between the murine and human α-DG-Nt crystallographic models equal to 0.468 Å (calculated on 225 common Cα atoms). Differences between the primary structures of human and murine α-DG (see Table 1 for alignment) were easily identified in hα-DG-Nt, according to the 2Fo-Fc and Fo-Fc density maps (see Fig. S2 for selected examples).
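The rmsd quoted above is a root mean square of distances between paired Cα positions. A minimal sketch of that formula follows, assuming the two coordinate sets have already been optimally superposed and paired atom by atom; the superposition step itself is not shown, and the coordinates are toy values rather than data from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired, pre-superposed coordinate sets."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy C-alpha traces in angstroms (hypothetical): the second is shifted by 0.5 along z.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 0.5), (3.8, 0.0, 0.5), (7.6, 0.0, 0.5)]
toy_rmsd = rmsd(toy_a, toy_b)
print(f"rmsd = {toy_rmsd:.3f} A")
```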
The differences in the primary structures all map onto the protein surface of hα-DG-Nt (Fig. 2 and Fig. S3), with residues fully or partially exposed to the bulk solvent.
The residues that differ between hα-DG-Nt and mα-DG-Nt are clustered in four small patches (Table 2) that are longitudinally distributed along one edge of the proteins (Fig. 2). Such an uneven distribution in α-DG-Nt seems to be an intrinsic property of this protein region (Fig. S4). Patches P1 and P2 are located on the Ig-like domain, whereas patches P3 and P4 are on the S6 domain (Fig. S3). Patches P2 and P3 face each other in a large cleft lined by the β-strands B, D, and E of the Ig-like domain and by the α-helices H2 and H3 of the S6 domain (see [18] and Fig. S3B). Patches P2 and P3 do not show any explicit mutual interaction.
[Fig. 1 caption, partial] The increase in fluorescence emission at 516 nm indicates the association of SYPRO Orange with exposed hydrophobic residues as the protein unfolds. Experiments were performed in triplicate; a single representative curve is shown for clarity.
While the global rmsd is quite low, small but significant deviations between superposed Cα atoms (around 0.7-1.2 Å against average values of ~0.1-0.3 Å and a maximum-likelihood error estimate of 0.233 Å) are observed in the zones encompassing residues 111-114, 134-145, and 155-162. The zone defined by residues 134-145, which includes the N- and C-terminal regions of β-strands F and G and the turn connecting them [18], does not show any remarkable structural variation, and the above-average Cα deviations are probably due to the intrinsic dynamics of the turn connecting the two strands. Apart from the flexible linker (residues 161-181) connecting the Ig-like and the S6 domains, the zone encompassing residues 111-114 shows the highest deviations between superposed Cα atoms (Fig. S2A), which is probably due to the very different nature of the residues occupying the same topological position in the two proteins (Pro110 in mα-DG-Nt and Ser112 in hα-DG-Nt). It is well known that prolines reduce the conformational freedom of the protein backbone [25], and it is likely that the substitution of this proline with Ser112 in the corresponding position of hα-DG-Nt affects the local main chain conformation. Indeed, in hα-DG-Nt, residues 112-114 assume a 3₁₀-helix conformation, instead of the turn observed in mα-DG-Nt. We do not observe any relevant backbone deviation between mα-DG-Nt and hα-DG-Nt for the residues belonging to patches P3 and P4. Nonetheless, the interaction network of the residues spatially close to His212 and Arg215 (patch P3) is different from that observed in the corresponding region of mα-DG-Nt (Fig. 3A-B). This discrepancy is likely due to the different chemical nature of the positively charged His and Arg residues (hα-DG-Nt) with respect to the polar Asn and Gln residues (mα-DG-Nt). This notion is further supported by the comparison of the electrostatic potentials of mα-DG-Nt and hα-DG-Nt, which are locally different around residues 212 and 215 (Fig. 3A-B), while overall being quite similar (Fig. S1A-B).
The hα-DG-Nt and mα-DG-Nt crystal structures display the most significant structural differences in the flexible linker (see Fig. S2A,B) connecting the Ig-like and S6 domains. According to the refined hα-DG-Nt model, the conformation of the N-terminal part of the linker, the only one reliably modeled in both structures, differs from that observed in the mα-DG-Nt crystal structure. This finding is not surprising, as the linker is very flexible [20] and probably plays a pivotal role in α-DG structural plasticity, as discussed in the next paragraphs. As mentioned above, the linker is also one of the α-DG zones with the highest sequence variability among different species (see [26] and Table S1). Besides, the presence of Leu164 in hα-DG-Nt (not modeled in the hα-DG-Nt crystallographic structure) instead of Pro162 in the corresponding position of mα-DG-Nt may influence hα-DG-Nt conformational variability with respect to mα-DG-Nt. Indeed, residues 159-161 assume a 3₁₀-helix conformation in hα-DG-Nt, whereas they display a turn/coil conformation in mα-DG-Nt.
[Table 1: alignment of the Mus musculus and Homo sapiens α-DG sequences]

Association state and overall parameters of the N-terminal domains of murine and human α-dystroglycan in solution
Small-angle X-ray scattering experiments were performed to compare the conformations in solution of mα-DG-Nt and hα-DG-Nt. The protein solutions were analyzed at different concentrations (Table 3), and in both cases, no systematic changes with the solute concentration could be observed, although the murine protein showed a certain degree of aggregation, likely due to the higher concentration of the stock solution.
The experimental SAXS curves, obtained at the highest concentration of mα-DG-Nt and hα-DG-Nt, are displayed in Fig. 4A,B, respectively, and the computed p(r) distance distribution functions are displayed in Fig. 5; the overall parameters extracted from the SAXS data are summarized in Table 3 (additional details on SAXS structural parameters are reported in Table S2).
The molecular mass (MM) of the proteins, estimated from the relative forward scattering intensities (at s = 0, with s the scattering vector), suggests that both proteins are monomeric in solution under all conditions tested and is in good agreement with the value estimated from the primary sequences (around 28.5 kDa). This is further corroborated by the excluded volume of the hydrated protein molecules (V_p), consistent with the empirical finding for globular proteins that the hydrated volume expressed in nm³ should numerically be about twice the MM in kDa. The experimental radius of gyration (R_g) and maximum size (D_max; Table 3) point to an elongated shape of the proteins, and the two p(r) functions, which overlap closely (Fig. 5), display an asymmetric tail typical of elongated particles.
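The two mass estimates described above reduce to simple proportionalities, sketched below under stated assumptions. The first function scales the concentration-normalized forward scattering against a reference protein of known mass; the identity of the reference, and every numerical value used here, is hypothetical and not taken from Table 3. The second applies the empirical factor of about two between V_p (nm³) and MM (kDa) mentioned in the text.

```python
def mm_from_forward_scattering(i0_sample, conc_sample, i0_ref, conc_ref, mm_ref_kda):
    """Estimate MM (kDa) from I(0)/c of the sample relative to a reference standard."""
    if conc_sample <= 0 or conc_ref <= 0 or i0_ref <= 0:
        raise ValueError("concentrations and reference intensity must be positive")
    return mm_ref_kda * (i0_sample / conc_sample) / (i0_ref / conc_ref)


def mm_from_hydrated_volume(vp_nm3, volume_to_mass_ratio=2.0):
    """Estimate MM (kDa) from the hydrated volume V_p (nm^3) with the empirical ratio."""
    return vp_nm3 / volume_to_mass_ratio


# Hypothetical inputs: a 66 kDa reference and arbitrary intensity units.
mm_i0 = mm_from_forward_scattering(
    i0_sample=10.0, conc_sample=2.0, i0_ref=33.0, conc_ref=3.0, mm_ref_kda=66.0
)
mm_vp = mm_from_hydrated_volume(57.0)
print(f"MM from I(0): {mm_i0:.1f} kDa; MM from V_p: {mm_vp:.1f} kDa")
```

Agreement of both estimates with the sequence-derived mass is what supports a monomeric state; a dimer would roughly double both values.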
It is interesting to note that in both cases, the scattering curves computed by the CRYSOL program [27] based on the crystallographic models (PDB ID 1U2C and 5LLK for mα-DG-Nt and hα-DG-Nt, respectively) give a poor fit to the respective experimental data (data not shown). Even upon reconstruction of the missing regions (around 10 amino acids at both the N-terminal and C-terminal regions and the missing linker between the two domains, which are kept fixed) using CORAL [28], the fit is not improved (χ_crystal in Table 3 and 'fit crystal' in Fig. 4, with the zoomed portions at low angles in the insets). It can thus be concluded that both murine and human α-DG-Nt are monomeric in solution, even at relatively high concentrations, but they show a significantly more extended conformation than in the crystallographic models.

[Table 3 notations] MMexp, experimental molecular mass of the solute; χ_crystal, χ_CORAL, χ_ab initio, and χ_EOM, discrepancy (chi-square value) for the fit from the crystallographic structures with the missing regions reconstructed by CORAL keeping the two domains fixed, from rigid body modeling using CORAL, from ab initio modeling using DAMMIN, and from EOM, respectively.
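For readers unfamiliar with the discrepancy values, the sketch below shows one common way a chi-square between an experimental and a model scattering curve is computed: the model curve is scaled by a least-squares factor and the error-weighted squared residuals are averaged. This is the generic textbook form and is only assumed to match what the fitting programs report; the intensities below are toy numbers.

```python
def chi_square(i_exp, i_calc, sigma):
    """Return (chi2, scale) for a model curve scaled onto experimental data."""
    n = len(i_exp)
    if n < 2 or len(i_calc) != n or len(sigma) != n:
        raise ValueError("need at least two points and equal-length inputs")
    # Least-squares scale factor that minimizes the weighted residuals.
    numerator = sum(e * c / (s * s) for e, c, s in zip(i_exp, i_calc, sigma))
    denominator = sum(c * c / (s * s) for c, s in zip(i_calc, sigma))
    scale = numerator / denominator
    chi2 = sum(((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma)) / (n - 1)
    return chi2, scale


# Toy curves (hypothetical): the model differs from the data only by a factor of 2.
toy_exp = [2.0, 4.0, 6.0]
toy_calc = [1.0, 2.0, 3.0]
toy_sigma = [1.0, 1.0, 1.0]
toy_chi2, toy_scale = chi_square(toy_exp, toy_calc, toy_sigma)
print(f"chi2 = {toy_chi2:.3f}, scale = {toy_scale:.3f}")
```

Values near 1 indicate agreement within the experimental errors, whereas systematically larger values, as for the rigid crystal-derived models here, indicate a poor fit.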

Molecular shape reconstruction of the N-terminal domains of murine and human α-dystroglycan in solution
Two different strategies have been employed to reconstruct the macromolecular shapes of the two proteins in solution. First, low-resolution three-dimensional models of mα-DG-Nt and hα-DG-Nt were reconstructed from the experimental X-ray scattering data using the ab initio modeling program DAMMIN [29] (Fig. 6), with all models providing an excellent fit to the experimental data (Table 3, χ_ab initio). The final DAMMIN models are the result of analyzing and averaging 10 independent solutions. The normalized spatial discrepancy (NSD) value, which describes the similarity between the different models produced by the program [30], is low for both the murine and human models (0.478 ± 0.029 and 0.547 ± 0.014, respectively), indicating that the multiple solutions built by the program are very similar to each other. The visible similarity in the shapes of the two models suggests that no differences between the two proteins can be detected at low resolution.
A second approach to molecular shape reconstruction consisted of rigid body modeling of the two proteins from the scattering data using the program CORAL [28]. High-resolution models of the individual Ig-like and S6 domains in the corresponding crystal structures were combined with different conformations of flexible dummy residue linkers. The relative orientations of the two domains and the reconstruction of the flexible missing loops linking the Ig-like and the S6 domains were optimized. The CORAL models fit the scattering curves well (χ_CORAL in Table 3 and 'fit CORAL' in Fig. 4, with the zoomed portions at low angles in the insets) and superpose well onto the SAXS envelopes of the ab initio models (Fig. 6). It is interesting to notice that, probably due to stabilizing packing interactions of the crystal lattice, the two crystallographic structures are more compact than the respective conformations in solution, whose more elongated shapes are most evident in the asymmetric tail at higher r of their p(r) distributions (Fig. 5). As a quantitative measure of structure compactness, the distances between the centers of mass of the Ig-like and S6 domains in the crystallographic models (29.7 and 29.9 Å in the human and murine models, respectively) have been compared to those of the CORAL models (33.6 and 34.4 Å in the human and murine models, respectively), confirming the existence of more extended conformations in solution. The most straightforward explanation is that the solution structures of the N-terminal regions of murine and human α-DG display a conformation that is more flexible than the one inferred from their crystal structures.
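The compactness measure used above, the distance between the centers of mass of the two domains, can be sketched as follows. The atom lists are toy values with hypothetical masses and coordinates; in practice they would be read from the Ig-like and S6 domain residues of each model, which is not shown here.

```python
import math


def center_of_mass(atoms):
    """Mass-weighted centroid of atoms given as (mass, x, y, z) tuples."""
    total_mass = sum(mass for mass, _, _, _ in atoms)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    return tuple(
        sum(atom[0] * atom[axis] for atom in atoms) / total_mass for axis in (1, 2, 3)
    )


def domain_separation(domain_a, domain_b):
    """Distance between the centers of mass of two domains (same units as input)."""
    return math.dist(center_of_mass(domain_a), center_of_mass(domain_b))


# Toy domains in angstroms (hypothetical): two carbon-like atoms and one distant atom.
toy_ig_like = [(12.0, 0.0, 0.0, 0.0), (12.0, 2.0, 0.0, 0.0)]
toy_s6 = [(12.0, 1.0, 0.0, 30.0)]
toy_separation = domain_separation(toy_ig_like, toy_s6)
print(f"interdomain distance = {toy_separation:.1f} A")
```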

Interdomain flexibility of the N-terminal domains of murine and human α-dystroglycan in solution
More extended mα-DG-Nt and hα-DG-Nt solution structures are in agreement with the evidence of a disordered region linking the Ig-like and S6 domains, as suggested by the crystallographic analysis. Indeed, linker flexibility could allow variability in the relative orientation of the individual domains, resulting in structural plasticity.
An analysis of the interdomain flexibility and size distribution of possible multiple configurations in solution was conducted by using the ensemble optimization method (EOM) [31], obtaining typical optimized ensembles that fit the measured scattering data well (Table 3). The EOM analyses of mα-DG-Nt and hα-DG-Nt are presented in Fig. 7 as a size distribution, plotting the R_g of the structures forming the random pool and the selected ensembles against their relative frequencies.
The R_g distributions of these ensembles (Fig. 7, solid lines) are very similar to each other and nearly as broad as the distribution of randomly generated models (Fig. 7, dashed lines), supporting the hypothesis of a certain degree of interdomain flexibility. Moreover, these R_g distributions are both characterized by a bimodal profile. Indeed, the predominant fractions (around 60%) represent relatively compact models with R_g of about 21-25 Å, while a smaller fraction (around 15%) of models with R_g of about 30-35 Å corresponds to more elongated configurations; two molecular structures representative of the different conformations are shown in Fig. 7 (compact conformations on the left and extended conformations on the right). This interdomain flexibility may confer structural plasticity, which in turn may represent the molecular basis regulating α-DG maturation and/or modulating the interactions with other physiological binding partners.
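The population fractions quoted above can be read as the share of selected ensemble members whose R_g falls in a given window. The sketch below illustrates that bookkeeping on a toy list of R_g values constructed to reproduce a 60%/15% split; it is not the output of the EOM analysis, and the window boundaries are simply the ranges named in the text.

```python
def fraction_in_range(rg_values, low, high):
    """Fraction of R_g values (angstroms) lying in the closed interval [low, high]."""
    if not rg_values:
        raise ValueError("rg_values must not be empty")
    inside = sum(1 for rg in rg_values if low <= rg <= high)
    return inside / len(rg_values)


# Toy selected ensemble (hypothetical): 12 compact, 5 intermediate, 3 extended models.
toy_rg = [
    21.5, 22.0, 22.4, 22.9, 23.1, 23.3, 23.8, 24.0, 24.2, 24.5, 24.7, 24.9,
    26.0, 26.8, 27.5, 28.3, 29.1,
    30.5, 32.0, 34.2,
]
compact_fraction = fraction_in_range(toy_rg, 21.0, 25.0)
extended_fraction = fraction_in_range(toy_rg, 30.0, 35.0)
print(f"compact: {compact_fraction:.0%}, extended: {extended_fraction:.0%}")
```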

Discussion
The aim of this study was to investigate the comprehensive structure of α-DG-Nt in solution. By exploring for the first time the conformational landscape of human and mouse α-DG-Nt in solution by means of SAXS experiments, we disclosed unexpected shared features of α-DG-Nt that may help to elucidate the molecular basis of the physiological and pathological functions of α-DG. Besides, the analysis of the SAXS and crystallographic data of the two orthologs validates previous results and the structure of mα-DG-Nt as a fully descriptive model for hα-DG-Nt. This notion is further supported by the comparison of the conformational stability of the two orthologs as assessed by biochemical and biophysical experiments.
The hα-DG-Nt crystal structure displays the same fold as mα-DG-Nt, with few structural differences. It is interesting to note that when the nonconserved amino acids are mapped on the high-resolution 3D structure, they cluster in four distinct patches, which span only one edge along the longest axis of α-DG-Nt (Figs 2 and S4). Such clustering might have a biological significance; that is, these patches might represent 'hotspots' for transient species-specific protein-ligand interactions. On the other hand, the conserved surface, involving especially the S6 domain (Fig. S1), may be of functional relevance for LARGE recognition along the α-DG maturation pathway. Moreover, it is interesting to note that the patch involving residues 112 and 113 in hα-DG-Nt immediately follows Asp111, whose mutation to Asn has been related to pathological α-DG hypoglycosylation [8], suggesting that this residue is involved in the complex mechanism leading to the mature, fully glycosylated α-DG.
In order to explore the molecular structure of α-DG-Nt in solution, we have undertaken a SAXS study. SAXS is the technique of choice for low-resolution structural studies in solution and is especially valuable for flexible systems, whose conformational variability cannot be described by crystallography [32]. While SAXS cannot resolve the molecular structure at the atomic level as X-ray crystallography usually can, it offers the unique opportunity to obtain, albeit at lower resolution, a structural model in solution, free of the packing forces that may instead influence a crystallographic model [33]. Indeed, while packing effects are not expected to affect the compact folds of the single domains, they could in this case have an impact on the relative orientation of the Ig-like and the S6 domains. The mutual orientation of the domains may be assisted by the flexible 20-residue-long linker connecting them, which may influence the overall conformation of α-DG-Nt. Combined with the crystallographic models of mα-DG-Nt and hα-DG-Nt, SAXS analysis may provide a reliable low-resolution model of α-DG at near-physiological conditions [34].
The SAXS study presented here points to structural models significantly different from those observed in the mα-DG-Nt and hα-DG-Nt crystal structures. According to their solution structures, obtained by SAXS data analysis exploiting the respective crystal structures, mα-DG-Nt and hα-DG-Nt both assume a remarkably less compact structure in solution than that observed in the crystal structures. The comparison of the mα-DG-Nt and hα-DG-Nt p(r) distributions points to a more extended conformation, a feature that is notably similar for the two orthologs in solution (Fig. 5). Furthermore, the mα-DG-Nt and hα-DG-Nt SAXS models fit well into the low-resolution molecular envelopes obtained by the DAMMIN ab initio method. Employing the program CORAL [28], apart from rigid body fitting of the Ig-like and S6 domains, we have been able to reconstruct the missing parts of both mα-DG-Nt and hα-DG-Nt. While the N-terminal zone of the Ig-like domain and the C-terminal zone of the S6 domain assume extended conformations, the linker connecting the two domains displays a more compact structure, rather similar in the two proteins despite important differences in their amino acid sequences. It is interesting to note that, comparing the crystal structures with the CORAL models, the S6 domain appears to be rotated by about 90° around the Ig-like domain. Even if this observation might suggest some functional implications for this assembly, it must be taken into account that the CORAL models have some intrinsic limitations due to the assumption that the models are rigid. To overcome this bias and to gain further information on the conformational variability of mα-DG-Nt and hα-DG-Nt in solution, the interdomain flexibility has been assessed by using EOM [31]. According to our analysis, mα-DG-Nt and hα-DG-Nt share a common behavior in solution, as shown by the similarity in their bimodal R_g distribution curves, with comparable maxima and slightly different frequencies.
Indeed, both mα-DG-Nt and hα-DG-Nt in solution are partitioned among a few principal populations, which differ in their compactness (Fig. 7). More extended conformations seem to be present, but their relative abundance is low when compared with the most frequent ones. The common conformational characteristics of mα-DG-Nt and hα-DG-Nt suggest that in solution α-DG may have a more complex behavior than expected on the basis of its crystal structures alone.
According to the present SAXS study, the structural plasticity of α-DG-Nt seems to be a general property of this protein, as mα-DG-Nt and hα-DG-Nt show common structural features. Such unexpected conformational variability of α-DG-Nt is of great interest, and it may play a functional role in its ability to interact with different partners, either inside the cell along its maturation pathway or at the level of the ECM, or in both. It has been proposed that α-DG-Nt can assist the bifunctional glycosyltransferase LARGE in its complex enzymatic actions, a function that may require α-DG-Nt to assume different, functionally relevant conformational states. It is also tempting to speculate that the functional flexibility of α-DG-Nt was positively selected as it conferred a strong advantage for the multistep maturation pathway. Along this pathway, concerted connections must be established between a plethora of glycosyltransferases and regulatory enzymes that extensively decorate the mucin-like region within the Golgi lumen [35]. Further biochemical and structural work is warranted in order to assess the possibility of direct interactions of α-DG-Nt with some of these enzymes. Furthermore, the conformational variability in α-DG highlights the consolidated notion that α-DG displays a distinct structural modularity, in line with the recent analysis of α-DG conserved domain organization in metazoans [36]. The autonomously folding, modular nature of the entire α-DG-Nt [26] prompted its possible use as a serum biomarker in DMD patients [37] or in human uterine fluid to determine uterine receptivity [38].
In line with these results, the molecular plasticity of α-DG in solution should be considered and investigated to enhance our understanding of the molecular basis of the physiological and pathological role of a central component of the dystrophin-glycoprotein complex.

Materials and methods
Cloning, expression, and purification
As previously reported for the murine a-DG N-terminal region, ma-DG-Nt(50-313)R168H [18], we cloned the DNA fragment encoding its human counterpart, ha-DG-Nt(52-315), into the bacterial vector pHis-Trx for expression of the protein as a thioredoxin fusion product with an N-terminal His6 tag and a thrombin cleavage site. We also introduced the additional mutation R170H into ha-DG-Nt(52-315) in order to make the protein more resistant to proteolysis [22]. The recombinant ha-DG-Nt(52-315)R170H was obtained as previously reported [20]. Namely, it was expressed as a fusion protein in Escherichia coli BL21 (DE3) Codon Plus RIL, purified by nickel affinity chromatography, cleaved by thrombin, and further purified by anion-exchange and gel filtration chromatography.

Differential scanning fluorimetry (DSF)
Differential scanning fluorimetry experiments were carried out on a CFX96 Touch Real-time PCR instrument (Bio-Rad, Hercules, CA, USA). Measurements were taken using an excitation wavelength of 470-505 nm and an emission wavelength of 540-700 nm. Data were acquired using a temperature gradient from 20 to 90 °C in 0.2 °C·min⁻¹ increments. The samples contained 0.5 mg·mL⁻¹ ma-DG-Nt or ha-DG-Nt protein in 20 mM Tris, 150 mM NaCl pH 7.5, and 90× SYPRO Orange (Sigma-Aldrich, St. Louis, MO, USA) in a total volume of 25 µL. The melting curves represent the fluorescence increase arising from the association of SYPRO Orange with exposed hydrophobic residues as the protein unfolds with increasing temperature [33]. Experiments were performed in triplicate. Fluorescence data were analyzed and the Tm values, represented by the inflection points of the transition curves, were calculated using the Boltzmann sigmoid fit [22].
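The Boltzmann sigmoid fit mentioned above can be illustrated with a minimal sketch. It assumes the common four-parameter form F(T) = F_min + (F_max - F_min) / (1 + exp((Tm - T) / slope)); the fitting software actually used is not stated here, so the function names, the grid-search strategy, and the default parameter ranges below are illustrative assumptions rather than the procedure followed in this work.

```python
import math

def boltzmann(t, f_min, f_max, tm, slope):
    """Boltzmann sigmoid: fluorescence at temperature t (assumed model form)."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - t) / slope))

def fit_tm(temps, fluo, tm_range=(20.0, 90.0), slope_range=(0.5, 10.0),
           steps=30, rounds=5):
    """Least-squares Boltzmann fit by coarse-to-fine grid search (hypothetical helper).

    For a trial (tm, slope) the plateaus f_min and f_max enter linearly,
    so they follow in closed form from simple linear regression.
    Returns a dict with tm, slope, f_min, f_max and the residual sse.
    """
    n = len(temps)
    mean_f = sum(fluo) / n
    tm_lo, tm_hi = tm_range
    sl_lo, sl_hi = slope_range
    best = None
    for _ in range(rounds):
        tm_step = (tm_hi - tm_lo) / steps
        sl_step = (sl_hi - sl_lo) / steps
        for i in range(steps + 1):
            tm = tm_lo + i * tm_step
            for j in range(steps + 1):
                slope = sl_lo + j * sl_step
                g = [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]
                mean_g = sum(g) / n
                sxx = sum((v - mean_g) ** 2 for v in g)
                if sxx < 1e-12:
                    continue  # sigmoid is flat over the data: no information
                sxy = sum((v - mean_g) * (f - mean_f) for v, f in zip(g, fluo))
                b = sxy / sxx            # amplitude, f_max - f_min
                a = mean_f - b * mean_g  # lower plateau, f_min
                sse = sum((a + b * v - f) ** 2 for v, f in zip(g, fluo))
                if best is None or sse < best["sse"]:
                    best = {"tm": tm, "slope": slope, "f_min": a,
                            "f_max": a + b, "sse": sse}
        # Shrink the search window around the current best point.
        tm_lo, tm_hi = best["tm"] - 3 * tm_step, best["tm"] + 3 * tm_step
        sl_lo = max(0.2, best["slope"] - 3 * sl_step)
        sl_hi = best["slope"] + 3 * sl_step
    return best
```

Calling `fit_tm(temps, fluo)` on one melting curve returns the fitted parameters; the `tm` entry is the inflection point of the transition. In practice a nonlinear least-squares routine from a numerical library would usually replace the grid search, which is used here only to keep the sketch dependency-free.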
Limited proteolysis
ma-DG-Nt and ha-DG-Nt were subjected to limited proteolysis at 37 °C at a final concentration of 30 µM in 25 mM Tris pH 7.5, 150 mM NaCl buffer. A panel of proteases from Proti-Ace kits (Hampton Research, Aliso Viejo, CA, USA), that is, bromelain, proteinase K, subtilisin, thermolysin, trypsin, endoproteinase Glu-C, and clostripain (endoproteinase Arg-C), were tested at a final concentration of 2 µg·mL⁻¹. The reactions were stopped after 1, 5, 10, 20, 40, and 60 min by adding SDS sample buffer to aliquots of the reaction mixtures. The samples were analyzed by 15% SDS/PAGE and Coomassie staining.
Crystallization, data collection, structure solution, and refinement
Attempts to grow crystals of ha-DG-Nt by the hanging-drop vapor diffusion method, under conditions similar to those previously reported for both wild-type ma-DG-Nt and its mutant T190M [18,20], were not successful. While exploring new crystal growth conditions with commercial high-throughput screening kits (100 + 100 nL of protein and precipitant solution at both 277 and 297 K), we also attempted cross-seeding methods starting from already-grown crystals of the ma-DG-Nt mutant. While the screenings did not reveal new crystallization conditions, cross-seeding, based on well-established protocols [39,40], resulted in the growth of well-shaped crystals. Precipitant conditions (0.6-1.4 M sodium citrate buffer) and pH (6.8-7.2) were screened by mixing 1 µL of protein solution (10 mg·mL⁻¹ ha-DG-Nt in 150 mM NaCl, 25 mM Tris, pH 7.5) with 1 µL of precipitant solution; the drops were equilibrated at 277 K for 3-6 days before seeding. Fully grown crystals were obtained 2 weeks after seeding in the optimized growth conditions (0.8 M sodium citrate buffer, pH 7.2). Both streak-seeding and microseeding methods were attempted with seed stock prepared following the manufacturer's protocol (Hampton Research, HR2-320 User Guide). The best-shaped crystals were obtained by using the streak-seeding method. Repeated streak seeding (two to three times), following the same protocol but at optimal precipitant and pH conditions, increased crystal dimensions and improved their diffraction quality.
Data collections were carried out at the XRD1 beamline at ELETTRA (Trieste, Italy) [41,42] using a Pilatus-2M (Dectris Ltd., Baden, Switzerland) detector and a wavelength of 1.00 Å. Data were collected at 100 K after the crystals had been quickly dipped into a cryoprotectant solution (25% ethylene glycol added to the precipitant solution) and frozen in liquid nitrogen. Indexing, integration, and data reduction of the diffraction data were carried out with the XDS program [43]. Data reduction statistics of the ha-DG-Nt dataset are reported in Table 4.
The structure solution of ha-DG-Nt was obtained by Patterson search methods, using the PHASER software, implemented in the PHENIX crystallographic package [44]. The crystal structure of WT ma-DG-Nt (PDB ID: 1U2C [18]) was used as a search template. Rigid body refinement was initially carried out, followed by a simulated annealing step. The following cycles of the crystallographic refinement included positional refinement and translation-libration-screw (TLS) model parameterization before the individual B-factor refinement. All the refinement cycles were carried out by using PHENIX.REFINE [45] and were alternated with the manual rebuilding of the structure by using the COOT software [46]. Solvent molecules were located by using the automatic search protocol available in PHENIX.REFINE and manually checked before being included in the final model. The occupancies of residues Asp160, His161, Ser162, Ala184, and Asp185, which display poor electron density and high Biso values, were also refined. Protein stereochemistry was monitored throughout the refinement process and during manual rebuilding with MolProbity [47]. Statistics of the crystallographic refinement are reported in Table 4. Coordinates and structure factors have been deposited in the PDB, with accession number 5LLK. Molecular diagrams were created with PYMOL [48], and the STRIDE web server [49] was used for ha-DG-Nt secondary structure assignments. Protein structure superposition and rmsd estimation were carried out with ProFit (Martin, A.C.R., http://www.bioinf.org.uk/software/profit).

Small-angle X-ray scattering (SAXS)
Small-angle X-ray scattering data for ma-DG-Nt were collected on the BM29 beamline [50] at the European Synchrotron Radiation Facility (ESRF, Grenoble, France) as 10 × 1 s exposures using a Pilatus 1M detector, a sample-detector distance of 2.87 m, and a wavelength of 0.99 Å. SAXS measurements for ha-DG-Nt were taken on the P12 beamline EMBL SAXS-WAXS at PETRAIII/DESY (Hamburg, Germany) [51] as 20 × 0.05 s exposures using a Pilatus 2M detector, a sample-detector distance of 3.00 m, and a wavelength of 1.24 Å. Scattering profiles for the collected frames were compared to detect radiation damage.
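As an illustration of the frame comparison step, the sketch below flags frames that deviate from the first exposure and averages the remaining ones. The criterion used at the beamlines is not specified in the text, so the reduced chi-square test, the threshold of 1.5, and the function names are assumptions made only for this example.

```python
import math

def reduced_chi2(frame_a, frame_b):
    """Reduced chi-square between two frames, each a list of (intensity, sigma)."""
    total = 0.0
    for (ia, sa), (ib, sb) in zip(frame_a, frame_b):
        total += (ia - ib) ** 2 / (sa ** 2 + sb ** 2)
    return total / len(frame_a)

def merge_undamaged(frames, threshold=1.5):
    """Average the frames that agree with the first one (assumed criterion).

    Returns (kept_indices, merged), where merged is a list of
    (mean intensity, propagated sigma) per scattering point.
    """
    reference = frames[0]
    kept = [0] + [k for k in range(1, len(frames))
                  if reduced_chi2(reference, frames[k]) <= threshold]
    n = len(kept)
    merged = []
    for j in range(len(reference)):
        mean_i = sum(frames[k][j][0] for k in kept) / n
        sigma = math.sqrt(sum(frames[k][j][1] ** 2 for k in kept)) / n
        merged.append((mean_i, sigma))
    return kept, merged
```

Radiation damage typically shows up as a progressive change at the smallest angles, so a frame whose low-angle intensities drift away from the first exposure would be excluded before merging.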
Measurements were taken at six different concentrations (the ranges are reported in Table 3; additional details on SAXS structural parameters are reported in Table S2) in 20 mM Tris, 150 mM NaCl, pH 7.5.
After normalization to the intensity of the transmitted beam, frames were merged for each sample. Subtraction of the buffer's contribution to the scattering and further processing steps were performed using PRIMUS [52] from the ATSAS 2.6.0 program package [28]. The forward scattering I(0) and the Rg were evaluated using the Guinier approximation [53], assuming that at very small angles (s < 1.3/Rg), the intensity is represented as:
$$I(s) = I(0)\,\exp\left(-\frac{(s R_g)^2}{3}\right)$$
Pair distance distribution functions of the particles p(r) and the maximum sizes Dmax were computed using GNOM [54]. MM was estimated by comparison of the calculated forward scattering I(0) of the samples with that of the standard solution of bovine serum albumin (MM 66 kDa). Vp was calculated using the Porod approximation [55]:
$$V_p = \frac{2\pi^2 I(0)}{\int_0^\infty I_{exp}(s)\,s^2\,ds}$$
The program DAMMIN [29] was employed to construct low-resolution ab initio bead models of ma-DG-Nt and ha-DG-Nt that best fit the scattering data. It employs a simulated annealing procedure to build a compact bead configuration inside a sphere with the diameter Dmax that fits the experimental data I_exp(s) to minimize the discrepancy:
$$\chi^2 = \frac{1}{N-1}\sum_j \left[\frac{I_{exp}(s_j) - c\,I_{calc}(s_j)}{\sigma(s_j)}\right]^2$$
where N is the number of experimental points, c is a scaling factor, and I_calc(s_j) and σ(s_j) are the calculated intensity and the experimental error at the momentum transfer s_j, respectively. Ten independent DAMMIN runs were performed for each scattering profile in the 'slow' mode, using default parameters and no symmetry assumptions (P1 symmetry). The models resulting from independent runs were superimposed using the program SUPCOMB [30], and aligned models were averaged using DAMAVER [56] to generate a consensus three-dimensional shape.
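The Guinier analysis and the MM estimate described above can be sketched as follows. The code assumes the standard Guinier form, ln I(s) = ln I(0) - (s Rg)^2 / 3, fitted by linear regression within s·Rg < 1.3, and assumes that the comparison with bovine serum albumin is made on concentration-normalized forward scattering. The function names and the iterative range selection are illustrative and are not the PRIMUS implementation.

```python
import math

def guinier_fit(s, intensity, s_rg_max=1.3, max_iter=20):
    """Linear fit of ln I(s) against s^2 over the range s * Rg < s_rg_max.

    s must be sorted in ascending order and all intensities must be positive.
    The fitting range is refined iteratively, because the limit depends on Rg.
    Returns (Rg, I0), with Rg in the reciprocal units of s.
    """
    n_use = len(s)
    rg = i0 = None
    for _ in range(max_iter):
        x = [v * v for v in s[:n_use]]
        y = [math.log(v) for v in intensity[:n_use]]
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((a - mean_x) ** 2 for a in x)
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        slope = sxy / sxx                      # equals -Rg^2 / 3
        if slope >= 0:
            raise ValueError("ln I(s) does not decrease with s^2")
        rg = math.sqrt(-3.0 * slope)
        i0 = math.exp(mean_y - slope * mean_x)  # intercept gives ln I(0)
        n_new = max(3, sum(1 for v in s if v * rg < s_rg_max))
        if n_new == n_use:
            break                               # fitting range is self-consistent
        n_use = n_new
    return rg, i0

def mm_from_i0(i0_sample, conc_sample, i0_std, conc_std, mm_std=66.0):
    """Molecular mass (kDa) from forward scattering relative to a standard.

    Assumes both I(0) values are on the same scale and are normalized by
    the respective concentrations (same units for sample and standard).
    """
    return mm_std * (i0_sample / conc_sample) / (i0_std / conc_std)
```

With a buffer-subtracted profile, `guinier_fit(s, intensity)` returns Rg and I(0), and `mm_from_i0` converts I(0) into an MM estimate given the corresponding values for the albumin standard.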
A simulated annealing protocol implemented in CORAL [28] was employed to find the optimal positions and orientations of the available high-resolution models of the Ig-like and S6 domains of ma-DG-Nt and ha-DG-Nt. In addition, the program also generated approximate clash-free conformations of the missing portions of the polypeptide chain (around 10 amino acids at both the N-terminal and C-terminal regions and the missing linker between the Ig-like and S6 domains). The fit of the X-ray crystal structures (PDB ID: 1U2C for ma-DG-Nt and 5LLK for ha-DG-Nt) to the SAXS data was calculated using CRYSOL [27]. Analysis of the interdomain flexibility and size distribution of possible conformers, consistent with the measured scattering data for ma-DG-Nt and ha-DG-Nt, was conducted using the ensemble optimization method (EOM) [31]. This method selects an ensemble of possible conformations from a pool of 10 000 randomly generated models constructed from rigid domains linked by randomly generated flexible linkers. The program CRYSOL is used to calculate the theoretical scattering profiles of these models, and a genetic algorithm, GAJOE, is used to select an ensemble of conformations whose combined scattering profiles best fit the experimental data. The crystal structures of the Ig-like and S6 domains of ma-DG-Nt and ha-DG-Nt were used as rigid bodies for the analysis of the scattering data, employing ensemble optimization. Linkers between the domains and missing N-terminal and C-terminal stretches were represented as a flexible chain of dummy residues.
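The principle of ensemble selection can be illustrated with a toy sketch. It assumes that the scattering of an ensemble is the equally weighted average of the profiles of its members and that the fit is scored with the conventional scaled chi-square, (1/(N-1)) Σ [(I_exp - c·I_calc)/σ]^2. An exhaustive search over a very small pool stands in for the genetic algorithm GAJOE, which is not reproduced here, and all function names are hypothetical.

```python
from itertools import combinations

def ensemble_profile(profiles):
    """Average theoretical intensities over equally weighted conformers."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def chi_squared(i_exp, sigma, i_calc):
    """Scaled discrepancy between experimental and calculated intensities.

    The scale factor c is the weighted least-squares optimum.
    """
    num = sum(e * k / (s * s) for e, k, s in zip(i_exp, i_calc, sigma))
    den = sum(k * k / (s * s) for k, s in zip(i_calc, sigma))
    c = num / den
    resid = sum(((e - c * k) / s) ** 2 for e, k, s in zip(i_exp, i_calc, sigma))
    return resid / (len(i_exp) - 1)

def select_ensemble(pool, i_exp, sigma, size):
    """Exhaustive search for the sub-ensemble of given size with the lowest chi^2.

    Feasible only for tiny pools; a pool of 10 000 models requires a
    heuristic search such as a genetic algorithm.
    """
    best_idx, best_chi2 = None, float("inf")
    for idx in combinations(range(len(pool)), size):
        value = chi_squared(i_exp, sigma, ensemble_profile([pool[k] for k in idx]))
        if value < best_chi2:
            best_idx, best_chi2 = idx, value
    return best_idx, best_chi2
```

The size distribution of the selected conformers, compared with that of the whole pool, is what indicates whether compact or extended species dominate in solution.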

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article:
Table S1. Multiple alignment of selected mammalian sequences of the N-terminal region of a-DG.
Table S2. SAXS structural parameters.
Figure S1. Electrostatic potential maps of ma-DG-Nt and ha-DG-Nt.
Figure S2. Superimposition of the ha-DG-Nt and ma-DG-Nt selected regions.
Figure S3. Amino acid differences between mouse and human a-DG-Nt mapped onto the ha-DG-Nt accessible surface.
Figure S4. Amino acid differences mapped onto the ha-DG-Nt accessible surface.