The cryo-EM structure of the human uromodulin filament core reveals a unique assembly mechanism

The glycoprotein uromodulin (UMOD) is the most abundant protein in human urine and forms filamentous homopolymers that encapsulate and aggregate uropathogens, promoting pathogen clearance by urine excretion. Despite its critical role in the innate immune response against urinary tract infections, the structural basis and mechanism of UMOD polymerization remained unknown. Here, we present the cryo-EM structure of the UMOD filament core at 3.5 Å resolution, comprised of the bipartite zona pellucida (ZP) module in a helical arrangement with a rise of ~65 Å and a twist of ~180°. The immunoglobulin-like ZPN and ZPC subdomains of each monomer are separated by a long linker that interacts with the preceding ZPC and following ZPN subdomains by β-sheet complementation. The unique filament architecture suggests an assembly mechanism in which subunit incorporation could be synchronized with proteolytic cleavage of the C-terminal pro-peptide that anchors assembly-incompetent UMOD precursors to the membrane.

The most abundant protein in human urine, the glycoprotein uromodulin (UMOD; also Tamm-Horsfall protein, THP), is conserved throughout Mammalia and produced primarily in the epithelial cells lining the thick ascending limb (TAL) of the loop of Henle in the kidney. There, UMOD is important for maintaining the water-impermeable layer, regulating salt transport, and concentrating urine (Devuyst et al., 2017). Farther along the urinary tract, UMOD has been shown to act as a soluble adhesion antagonist against uropathogenic E. coli (UPEC) (Pak et al., 2001; Weiss et al., 2020).
UMOD precursors traffic through the secretory pathway in the assembly-incompetent pro-UMOD form, where they become glycosylated at eight potential N-glycosylation sites and are eventually attached to the apical membrane surface via their glycosylphosphatidylinositol (GPI)-anchored C-terminal pro-peptide (CTP) (Rindler et al., 1990; van Rooijen et al., 1999; Weiss et al., 2020). The protease hepsin then cleaves the CTP, and UMOD assembles into homopolymeric filaments with an average length of 2.5 µm in the urine (Brunati et al., 2015; Porter and Tamm, 1955).
Recently, the general architecture of the intrinsically flexible UMOD filaments was uncovered via cryo-electron tomography (Weiss et al., 2020). The results, together with previous data, showed that UMOD assembles into a zigzag-shaped, linear polymer, where the core of the filament is formed by its bipartite zona pellucida (ZP) module with the subdomains ZPN and ZPC. The N-terminal domains, epidermal growth factor-like (EGF) I–III and the following cysteine-rich D8C domain, protrude as arms alternating from opposite sides of the filament, and the EGF IV domain connects the arms to the filament core (Figure 1A; Jovine et al., 2002; Schaeffer et al., 2009; Weiss et al., 2020). UMOD filaments encapsulate piliated uropathogens through multivalent binding via their N-glycans to the lectins at the tips of bacterial pili. This mechanism not only prevents pathogen adhesion to target glycans on epithelial cells of the host, but also facilitates pathogen aggregation and clearance by urine excretion (Weiss et al., 2020). Despite these insights, the exact structural organization of UMOD filaments remained unknown.
We set out to solve the structure of the UMOD filament by single-particle cryo-EM. Isolated native UMOD filaments from a healthy individual appeared on the grid in two major views: a fishbone-like 'front' view with alternately protruding arms, and a zigzag 'side' view (Figure 1B). Marked conformational variation in both the filament core and the angles between the peripheral arms and the core was visible. Additionally, flexible N-glycans lining the filament core could already be seen in 2D class averages (Figure 1-figure supplement 1). This intrinsic flexibility and glycan display are prerequisites for pathogen encapsulation by UMOD filaments; however, they considerably complicated structure determination by single-particle cryo-EM, restricting high-resolution analysis to the filament core.
Ab initio 3D reconstruction of 723,000 particles was followed by extensive classification and focused refinement together with the non-uniform refinement strategy in cryoSPARC in order to obtain a reconstruction of the UMOD core at 3.5 Å (Figure 1-video 1). This high-resolution, unsymmetrized map was used to rigid-body fit the individual UMOD ZP subdomains, ZPN and ZPC, from the previously published crystal structure of a recombinant, assembly-incompetent pro-UMOD fragment (termed rEGF-ZP-CTP, consisting of the EGF IV domain, ZP module, and CTP; Bokhove et al., 2016; PDB: 4WRN). The resolution of the cryoSPARC map at the filament core allowed direct model building of the immunoglobulin-like (IG) ZP subdomain folds. The ZPN/ZPC asymmetric unit was then extended by rigid-body fitting within a lower-resolution map (4.7 Å), refined in cisTEM, that contained more repeats. This fitting showed a helical architecture of the filament with a ~65 Å rise and ~180° twist, which is consistent with parameters obtained by sub-tomogram averaging (Weiss et al., 2020). From here, we were able to ascertain an unexpected, extended conformation of the ZP monomers in the filament core, wherein a long, 28-residue linker (UMOD residues L429–F456) between the ZPN and ZPC subdomains of each ZP monomer spanned the ZPC from the previous module (ZPC−1) and the ZPN from the subsequent module (ZPN+1) (Figure 1C–D). The core region including the linker is well resolved, showing clear sidechain map features (Figure 1E–F). The linker between ZPN and ZPC of the monomers spans ~103 Å and interacts tightly with the two intervening ZP subdomains (Figure 1D–F, Figure 1-figure supplement 2). Moreover, Coulomb potentials for prominent N-glycans at N396 and N513 were observed and allowed visualization of the core pentasaccharide and disaccharide, respectively (Figure 1C,D,G).
Figure 1. Cryo-EM structure of the human UMOD filament core. (A) Domain architecture of the membrane-anchored pro-UMOD monomer, composed of four EGF-like domains (I–IV, gray), the cysteine-rich D8C domain (green), and the bipartite ZP module (ZPN and ZPC, blue). The fold of the C-terminal ZPC subdomain is extended by the GPI-anchored, C-terminal pro-peptide (CTP, red) that is cleaved by hepsin as a prerequisite of UMOD polymerization. The hepsin cleavage site (peptide bond 587–588) is indicated by scissors. The previously crystallized UMOD segment rEGF-ZP-CTP is also indicated. The positions of the eight verified N-glycosylation sites in mature UMOD filaments are indicated above the respective domains as hexagons (filled hexagons: complex type N-glycan; open hexagon: high mannose type N-glycan). Amino acid numbering according to pre-pro-UMOD including signal sequence. (B) Representative cryo-electron micrograph showing the two major views of UMOD filaments: front view (1) and side view (2). Scale bar: 50 nm. Schematic representations of the subdomain organization of these two views are shown on the right. (C) Segmented Coulomb potential map of the mature ZP module in the filament, shown over the low-pass filtered reconstruction (at ~9 Å resolution, gray). The extended linker connecting ZPN and ZPC of the blue ZP module complements the folds of ZPN of the following and ZPC of the preceding ZP modules (ZPN+1 and ZPC−1; plum and gold, respectively) by β-sheet complementation. The map features show a helical rise of 65 Å and a twist of 180°. The locations of the EGF IV and D8C domains from the filament arms are also indicated. The resolved N-glycan map features (dark gray) are indicated with arrows. (D) Front and side views of the refined model of the filament core within the obtained high-resolution cryo-EM map. Resolved monosaccharide units of N-glycans are shown as gray stick models. (E) Detailed view of the map features at selected regions in ZPN (blue) and ZPC (gold), illustrating the quality of the final cryo-EM model. (F) Contiguous Coulomb potential map (mesh) around the extended linker segment that harbors the β-strands Lβ1–Lβ3. Lβ1 and Lβ2 complement the ZPC fold of the preceding UMOD subunit, and Lβ3 complements ZPN of the following UMOD subunit. (G) Coulomb potential map around the core pentasaccharide of the complex type N-glycan attached to N396 in ZPN. Blue squares: N-acetylglucosamine; green circles: mannose.
The intrinsic flexibility of the UMOD fibers could be directly observed in the collected micrographs in the range of their curvatures (Figure 1B). Slight deviations in the helical rise along the chain and the intrinsic flexibility hampered particle classification and alignment and limited the resolution of the reconstructions. To analyze this flexibility, we measured the relative angles between subdomains after rigid-body docking into multiple, heterogeneous 3D class averages (Figure 1-figure supplement 3). Similar maximum angular changes of ~7° were obtained at both unique interfaces between the ZP subdomains, indicating that both interfaces contribute equally to the flexibility of the filaments. Additionally, we utilized the 3D variability analysis from cryoSPARC (based on principal component analysis (PCA); Punjani and Fleet, 2020), which showed a similar degree of filament bending (component 1) and an elongation of the filament along the z-axis (component 2) based on a relative rotation of ZP modules at the subdomain interfaces (Figure 1-video 2).
The elongated ZP linker establishes an intricate and extensively enchained scaffold for filament build-up from mature UMOD monomers. Reminiscent of donor strand complementation in subunit-subunit interactions of pili assembled via the chaperone-usher pathway (Waksman, 2017) or the more recently discovered type V pili (Shibata et al., 2020), formation of the UMOD filament involves a β-sheet complementation mechanism extending across enchained ZP modules (Figure 2A). Specifically, the extended linker between ZPN and ZPC in each monomer is composed of three separate β-strands: Lβ1, Lβ2, and Lβ3 (residues 430–435, 438–441, and 446–452, respectively), where Lβ1 and Lβ2 of subunit n complement the fold of ZPC of subunit n−1 (ZPC−1), and Lβ3 extends the IG-fold of ZPN of subunit n+1 (ZPN+1) (Figure 2B). The complementation site (CS) between Lβ1, Lβ2, and ZPC−1 (denoted CSA) is formed by the antiparallel insertion of Lβ1 relative to the ZPC βF strand, and the shorter, parallel complementation of the ZPC βA' strand by the Lβ2 strand (Figure 2C). Lβ3, the longest β-strand in the linker segment, binds parallel along the ZPN+1 βG strand at complementation site B (CSB) (Figure 2D).
The intermolecular interface between the ZPN subdomain of subunit n and the ZPC subdomain of the preceding subunit (ZPC−1) (interface A, IA) is formed by mostly hydrophobic residues from both subunits (Figure 2E), creating a buried surface area of ~2000 Å² (PISA server; Krissinel and Henrick, 2007). Specifically, residues Y402, Y427, and L429 from ZPN at IA cover a prominent hydrophobic patch on the surface of ZPC−1, composed of L491, F499, F456, and L570 (Figure 2-figure supplement 1). The second unique intermolecular interface (IB) along the elongated UMOD monomer is formed between ZPC−1 and ZPN+1, and is spanned by the linker (Figure 2F). Surface charge complementarity between the two ZP subdomains provides the basis for IB, highlighted by the insertion of R415 on ZPN+1 into a negatively charged pocket on ZPC−1 harboring D532. IB is further stabilized by hydrophobic interactions between ZPC−1 (F555, F553), ZPN+1 (I413, I414), and the linker (P441, V443) (Figure 2-figure supplement 1).
We utilized the Genome Aggregation Database (gnomAD) to parse missense variants in the UMOD ZP module detected in ~125,000 exome and ~15,000 whole-genome sequences from unrelated healthy individuals from various population studies (Karczewski et al., 2020). Given the selective pressure for high levels of functional UMOD filaments in the urine (Ghirotto et al., 2016), we reasoned that residues or regions of UMOD that are required for filament assembly should show less genetic variation in healthy individuals. Here, we found that both the linker segment and those residues important for creating IA and IB are depleted in genetic variants, relative to the rest of the ZP module.
The previously reported structure of the pro-UMOD fragment rEGF-ZP-CTP, which crystallized as a homodimer in which only the ZPN subdomains interact (Bokhove et al., 2016), provides the nearest comparison for the transition from pro-UMOD (CTP intact) to mature UMOD in the filament. Figure 3 shows the major conformational differences between rEGF-ZP-CTP and the ZP module in the mature filament. The most prominent difference is the separation of ZPN and ZPC in the mature filament, where individual Cα atoms (e.g. ZPN residue S364) change their relative position by up to 95 Å (Figure 3A). In addition, formation of the 103 Å long linker segment in mature UMOD involves i) the excision of strand βA from ZPC, which becomes the linker strand Lβ3 in the mature ZP module, and ii) the unwinding and extension of the α-helical segment connecting ZPN and ZPC in pro-UMOD that forms linker strands Lβ1 and Lβ2 in the mature ZP module (Figure 3B; Figure 3-figure supplement 1). The overall folds of the ZPN and ZPC subdomains, however, remain the same (with Cα RMSD values of 2.0 and 1.2 Å, respectively). Notably, linker strand Lβ2 occupies the same position in ZPC−1 as the β-strand segment (559–605) from the CTP in rEGF-ZP-CTP.
In the mature filament, the residues that constitute helix αEF of rEGF-ZP-CTP are extended and the involved aromatic residues undergo a rearrangement that allows binding of Lβ2 (Figure 3-figure supplement 2). In addition, the insertion of the linker strand Lβ1 into ZPC−1 only becomes possible after the excision of its strand βA, which blocks Lβ1 binding in rEGF-ZP-CTP (Figure 3C–E). Overall, the binding pocket for the CTP β-strand in ZPC of rEGF-ZP-CTP is shorter than that accommodating Lβ1 and Lβ2 in the mature ZPC subdomain, causing a kink in the CTP peptide along the ZPC subdomain (Figure 3E). The difference observed in comparison of the ZPN domains is less dramatic; here, the same surface complemented by the rEGF-ZP-CTP strand βG from the apposing ZPN in the crystal structure homodimer is occupied in the filament by the Lβ3 strand from subunit n−1 (Figure 3F,G).
The cryo-EM structure of the UMOD filament core raises the intriguing question of how UMOD filaments assemble from pro-UMOD subunits attached to the cell surface. Figure 4 shows the simplest conceivable model of UMOD filament assembly. Assembly requires the release of mature UMOD monomers from the membrane by proteolytic removal of the GPI-anchored CTP by hepsin, thus making a ZPC module competent for binding the Lβ1 and Lβ2 strands of another UMOD. Assembly starts by binding of a pro-UMOD ZPN module to strand Lβ3 in a neighboring pro-UMOD (step 1). Hepsin cleavage may then occur at the elongating filament (step 2). Each subsequent UMOD incorporation step (steps 3, 4, etc.) would then be characterized by the intercalation of the ZPN from an incoming pro-UMOD between the two ZPC modules of the filament, leading to a more stable binding of the incoming ZPN compared to that in step 1. As a final cleavage step to release the assembled filaments from the membrane, an as yet unidentified hydrolase may act upon UMOD, as small amounts of the CTP could still be detected in MS/MS spectra after trypsin digestion of mature filaments (Figure 4-figure supplement 1). This UMOD assembly model is analogous to the recently proposed assembly mechanism of filamentous type V pili, in which assembly is linked with proteolytic release of pilus subunits from the outer bacterial membrane (Shibata et al., 2020). During the preparation of this manuscript, a related preprint article on the cryo-EM structure of the UMOD filament core was published (Stsiapanava et al., 2020). In said study, an alternative model of UMOD assembly was proposed, based on the assumption that assembly starts from membrane-bound pro-UMOD homodimers.
The enchained β-sheet complementation mechanism observed in the UMOD filament core may also be valid for other proteins containing membrane-anchored ZP modules and undergoing filament formation. For instance, another human ZP protein, alpha-tectorin (TECTA), forms filaments creating the basis for the apical extracellular matrix (ECM) on the cochlear supporting cells and shares important features with UMOD: a highly similar linker region and a conserved, C-terminal protease cleavage site (Figure 4-figure supplement 2). Recently, Kim and colleagues proposed the '3D printing model' for surface-tethered, TECTA-mediated ECM organization (Kim et al., 2019), which is in agreement with our proposed model of UMOD polymerization at the surface of TAL cells in the kidney tubule. Thus, the cryo-EM structure of the UMOD filament core might be representative of the core structure of multiple proteins with a C-terminal ZP module that become functional after polymerization.
Figure 4. (Step 1) Binding of ZPN from a pro-UMOD monomer (blue) to the Lβ3 segment of an extended neighbor pro-UMOD (gold) may start the assembly. (Step 2) In the resulting, asymmetric pro-UMOD dimer, hepsin (red scissors) cleaves the GPI-anchored CTP (red) from the ZPC subdomain. The released ZPC subdomain then binds to Lβ1 and Lβ2 of the incoming pro-UMOD. (Step 3) ZPN from a third pro-UMOD (plum) binds to the Lβ3 segment between the two ZPC segments of the growing filament. (Step 4) Again, hepsin cleaves off the CTP from the pro-UMOD in which the ZPC is complexed with ZPN. Steps 3 and 4 are consistently repeated until filament assembly is completed.

Purification of human UMOD
Human UMOD fibers were purified from a healthy donor as described in Weiss et al., 2020. Briefly, urine was purified using a diatomaceous earth filter, concentrated, and dialyzed overnight against 0.5 mM EDTA-NaOH, pH 8.2 (300 kDa cut-off, Spectrum Laboratories). Aliquots were flash-frozen in liquid nitrogen and stored at −20 °C until further use. Only the second morning micturition was utilized for urine collection. Protein concentrations were determined as previously described (Weiss et al., 2020).

Cryo-EM data collection
Protein samples (3.5 µL of 2.5 µM UMOD) were applied to glow-discharged lacey carbon grids with an ultrathin carbon coating (Electron Microscopy Sciences), automatically blotted from the back side of the grid for 13.5 s (100% humidity, 9 °C), and plunge-frozen in liquid ethane-propane using a Vitrobot Mark IV (Thermo Fisher Scientific). Micrographs were acquired on a Titan Krios microscope (Thermo Fisher Scientific) operated at 300 kV with a Gatan K2 Summit direct electron detector using a slit width of 20 eV on a GIF-Quantum energy filter. A total of 4679 and 4864 movies were recorded over two data collections from two separate grids and subsequently merged. Images were recorded with EPU software (Thermo Fisher Scientific) at 130,000× nominal magnification in counting mode with a calibrated pixel size of 1.084 Å. The defocus target ranges were −1.2 µm to −3.3 µm and −0.8 µm to −2.0 µm for the first and second data collection, respectively. Each micrograph was dose-fractionated to 40 frames under a dose rate of 7.5 e⁻/Å²/s, with an exposure time of 6 s, resulting in a total dose of approximately 45 e⁻/Å².
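The dose figures quoted above are related by simple arithmetic, which the following minimal sketch reproduces. It assumes the dose rate is expressed per Å² per second and that the dose is spread evenly over the frames; the function names are illustrative and not taken from the authors' scripts.

```python
def total_dose(dose_rate_e_per_A2_s: float, exposure_s: float) -> float:
    """Accumulated dose (e-/A^2) for a constant dose rate and exposure time."""
    return dose_rate_e_per_A2_s * exposure_s


def dose_per_frame(total_dose_e_per_A2: float, n_frames: int) -> float:
    """Dose per movie frame, assuming the dose is spread evenly over all frames."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return total_dose_e_per_A2 / n_frames


if __name__ == "__main__":
    # Values quoted in the text: 7.5 e-/A^2/s, 6 s exposure, 40 frames.
    dose = total_dose(7.5, 6.0)
    print(f"total dose: {dose:.1f} e-/A^2")
    print(f"dose per frame: {dose_per_frame(dose, 40):.3f} e-/A^2")
```

With the stated settings this gives the reported total of about 45 e⁻/Å², or roughly 1.1 e⁻/Å² per frame.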

Cryo-EM image processing
The collected movies were motion corrected with MotionCor2 (Zheng et al., 2017). The CTF parameters of the micrographs were estimated using Gctf 1.06 (Zhang, 2016). The following image processing steps were done in cryoSPARC v2.15 (Punjani et al., 2017) and with in-house Python scripts for data administration and interpretation (https://github.com/dzyla/umod_processing_scripts). Approximately 600 particles were manually picked from a random selection of micrographs and used to train a convolutional neural network picking model for TOPAZ (Bepler et al., 2019). A total of 1.3 million particles were picked via TOPAZ from all micrographs. For the first steps of processing, particles were extracted with a 320 px box size and binned to 2.71 Å/px. After several rounds of 2D classification, 723,000 particles, corresponding to well-resolved, clear classes, were selected for further processing. Three ab initio initial models were generated and the single best one was used as a 3D reference for the 3D-heterogeneous refinement. 3D-heterogeneous refinement with 10 classes resulted in multiple distinct 3D class-averages, showing intrinsic flexibility of the specimen. The most populated 3D class-average of 145,000 particles was chosen for further refinements. Particles from the selected class were then re-extracted at the original pixel size and subjected to a 3D-homogeneous refinement. A mask was created in UCSF Chimera (Pettersen et al., 2004) for the center segments of the 3D reconstruction and used for several rounds of local refinement using a non-uniform refinement strategy. This pipeline yielded a high-resolution 3D reconstruction of the central asymmetric unit (AU; ZPN, ZPC, and the inset linker) at 3.5 Å.
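The extraction and binning values above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' scripts; it assumes the 320 px box refers to the unbinned pixel size of 1.084 Å and uses the standard Nyquist limit of twice the pixel size.

```python
RAW_PIXEL_A = 1.084      # calibrated pixel size from data collection (A/px)
BINNED_PIXEL_A = 2.71    # pixel size after binning (A/px)
BOX_PX = 320             # extraction box, assumed to be in unbinned pixels


def binning_factor(raw_pixel_a: float, binned_pixel_a: float) -> float:
    """Ratio between the binned and the raw pixel size."""
    return binned_pixel_a / raw_pixel_a


def binned_box(box_px: int, factor: float) -> int:
    """Box size in pixels after binning, rounded to the nearest integer."""
    return round(box_px / factor)


def nyquist_limit_a(pixel_a: float) -> float:
    """Best resolution (A) representable at a given pixel size."""
    return 2.0 * pixel_a


if __name__ == "__main__":
    factor = binning_factor(RAW_PIXEL_A, BINNED_PIXEL_A)
    print(f"binning factor: {factor:.2f}")
    print(f"binned box: {binned_box(BOX_PX, factor)} px")
    print(f"box edge: {BOX_PX * RAW_PIXEL_A:.1f} A")
    print(f"Nyquist at binned pixel size: {nyquist_limit_a(BINNED_PIXEL_A):.2f} A")
```

Under these assumptions the binned data cannot carry information beyond about 5.4 Å, which is why particles were re-extracted at the original pixel size before the final refinements.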
To obtain a map focused on several repeat units, an alternative approach was applied using cisTEM for refinement and classification: all micrographs were manually selected, resulting in a total of 8543 movies, which were then also motion corrected using MotionCor2, with CTF values estimated using Gctf. 485,000 particles were manually picked as helices from a random selection of 3600 micrographs in RELION v3.0.8 (Zivanov et al., 2018) and extracted at the original pixel size. The particle stack was then subjected to 2D classification in cisTEM (Grant et al., 2018). 402,000 good particles were selected, and multiple rounds of focused classification were performed with the imported best cryoSPARC 3D class-average as a reference (described above) and a spherical (120 Å radius) mask. Two of the resulting 3D class-averages were merged and another round of 3D classification with the same spherical mask was undertaken, resulting in a map resolved to 4.7 Å from 330,000 particles. The map was filtered to 4.7 Å and sharpened using cisTEM default settings (pre-cut-off B-factor −90.0, resolution cut-off 4.7 Å). Reported helical parameters were estimated in real space from the cisTEM map using relion_helix_toolbox (He and Scheres, 2017) with search ranges for rise (60–67 Å) and twist (175–185°). Schematic workflows and details are shown in Figure 1-figure supplement 1, Supplementary file 1B,D. For comparative purposes, the cisTEM map FSC curves and the local resolution maps were calculated in cryoSPARC for all generated half-maps.
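To illustrate what a rise of ~65 Å and a twist of ~180° mean geometrically, the following sketch applies the helical operator to a single point. It is not the procedure used by relion_helix_toolbox; it assumes the helical axis coincides with the z-axis through the origin and uses the nominal, rounded parameters, and all function names are hypothetical.

```python
import math


def helical_copy(point, k, rise_a=65.0, twist_deg=180.0):
    """Position of `point` in subunit k: rotate by k*twist about z, shift by k*rise."""
    x, y, z = point
    angle = math.radians(k * twist_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (x * cos_a - y * sin_a, x * sin_a + y * cos_a, z + k * rise_a)


def axial_repeat(rise_a=65.0, twist_deg=180.0):
    """Distance along the axis after which the rotation returns to 360 degrees."""
    return rise_a * 360.0 / twist_deg


if __name__ == "__main__":
    reference = (10.0, 0.0, 0.0)  # arbitrary point 10 A off the axis
    for k in range(-1, 3):
        x, y, z = helical_copy(reference, k)
        print(f"subunit {k:+d}: x={x:7.2f}  y={y:7.2f}  z={z:7.2f}")
    print(f"axial repeat: {axial_repeat():.0f} A")
```

With a 180° twist, consecutive subunits alternate between opposite sides of the axis, which matches the zigzag appearance of the filament, and the arrangement repeats every two subunits along the axis.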

Model building and structure refinement
The individual ZP subdomains of the UMOD crystal structure (PDB: 4WRN; Bokhove et al., 2016) were used as initial models (ZPN: aa 428–528, ZPC: aa 541–710, according to the 4WRN numbering). First, the individual domains were rigid-body fitted to the map obtained in cryoSPARC using UCSF Chimera, followed by manual building in Coot 0.9 and ISOLDE (Croll, 2018; Emsley et al., 2010). The asymmetric unit (AU, defined as a single ZPN and ZPC and the inset linker) of UMOD was refined using the phenix.real_space_refine program (Adams et al., 2010) with the default settings, which apply Ramachandran restraints. To reduce clashes between subunit interfaces, the AU model was used to create a model with four consecutive ZP subdomains, which was refined against the cisTEM map. The structure of the UMOD ZP module was then subjected to RosettaCM refinement with the symmetry derived from the cisTEM map (Song et al., 2013), with the 'auto' strategy applied in the refinement. The central ZPN/ZPC module in the refined structure was manually adjusted to the high-resolution cryoSPARC map in Coot and was subsequently used to generate the final UMOD model. This updated model was subjected to the next round of RosettaCM refinement, with a higher RMS model restraint applied (r.m.s. = 0.2). Overall, three rounds of interactive manual and RosettaCM refinements were performed, and the geometry of the final AU model was further improved using phenix.real_space_refine with tight reference restraints (sigma = 0.025). The sidechain residues 462–473 in the AU model could not be built with confidence and were omitted from the final model. For the final model of the UMOD filament, a structure was generated with three copies of the final AU model by applying the symmetry derived from the cisTEM map. The central ZPN/ZPC subunit of the core part of the structure was subjected to a real-space refinement with tight reference restraints (sigma = 0.025) using the cryoSPARC map. The center ZPN/ZPC subunit in the refined model was then used to generate the new UMOD filament structure. Clashes in the subunit interface were further reduced by performing another round of phenix.real_space_refine with tighter reference restraints (sigma = 0.02).
Model validations were performed using the Comprehensive Validation tool in PHENIX. Final model statistics were generated using the MolProbity tool via PHENIX. Figures of the structure were prepared with PyMOL v2.4 (The PyMOL Molecular Graphics System, Version 2.4, Schrödinger, LLC) and EM map figures were generated with UCSF ChimeraX 0.93 (Goddard et al., 2018).

Cryo-EM model 3D flexibility analysis
The well-resolved reconstructions from 3D classification steps performed independently in cryoSPARC and cisTEM were rigid-body fitted with ZP subdomains derived from the final model. Each subdomain was first placed at the correct position and then fitted by the Fit in Map tool in Chimera 1.13.1. In total, six ZP subdomains were fitted into nine maps where the subdomain Coulomb potential was well defined (four classes from cryoSPARC and five classes from cisTEM) and saved with fixed positions relative to each other as a 'filament'. Next, using the PyMOL align command, all nine filament structures were aligned to the center ZPN subdomain. In Chimera, the define axis command was used to give each subunit a centroid and a central axis. For subunits at positions ZPC−1 and ZPC, the maximum angle was determined with the angle command.
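The angle measurement described above amounts to comparing axis directions of the same subdomain across the aligned models. The sketch below shows one way to compute this outside Chimera; it assumes each axis is available as a direction vector with consistent orientation, and the function names and example vectors are hypothetical rather than measured values.

```python
import math
from itertools import combinations


def axis_angle_deg(u, v):
    """Angle in degrees between two direction vectors given as (x, y, z)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def max_pairwise_angle(axes):
    """Largest angle between any two axes of the same subdomain across models."""
    return max(axis_angle_deg(u, v) for u, v in combinations(axes, 2))


if __name__ == "__main__":
    # Placeholder axes tilted by 0, 3 and 7 degrees from z (illustrative only).
    tilts = [0.0, 3.0, 7.0]
    axes = [(0.0, math.sin(math.radians(t)), math.cos(math.radians(t))) for t in tilts]
    print(f"maximum angular change: {max_pairwise_angle(axes):.1f} degrees")
```

Because all models are first aligned on the central ZPN subdomain, the spread of the neighboring subdomain axes reflects the relative motion at the interfaces rather than the overall orientation of each map.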

3D variability analysis of UMOD
3D variability analysis (3DVA) was performed in cryoSPARC using ~723,000 particles extracted at a pixel size of 4.34 Å, 10 modes, and a filter resolution of 9.2 Å. The 3DVA display job was run in the simple output mode with 20 intermediate frames. Of the 10 generated components, the first two exhibited the same yaw/pitch movement (Figure 1-video 2), the third showed an artifact at the box edge, and the fourth showed the elongation along the filament axis (Figure 1-video 2). The higher components showed various movements in the arms (not shown).

Assessing genetic variation using gnomAD
Genome Aggregation Database (gnomAD) v2.1.1 (https://gnomad.broadinstitute.org/) is aligned against the GRCh37 genome build and was released in March 2019. The dataset comprises 125,748 exomes and 15,708 genomes sequenced as part of various disease-specific and population genetic studies, totaling 141,456 unrelated individuals from eight major populations (Karczewski et al., 2020). Results for UMOD were filtered for missense variants and checked for consistency with the UMOD transcript ENST00000302509.8. Missense variants corresponding to a non-canonical transcript were manually removed. UMOD variants were parsed using an in-house Python script (https://github.com/dzyla/umod_processing_scripts) and plotted on the structure. Residues with no mutations were set to 1.0 and for all non-zero variants the values were normalized between 0 and 1, based on the PAM250 matrix. The structure was visualized by spectrum coloring in PyMOL 2.4. For Supplementary file 1A, all linker residues are shown and the interfaces are defined by atoms that are within 8 Å of one another, as defined by COCOMAPS (Vangone et al., 2011).
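The per-residue scoring described above can be sketched as a min-max normalization. This is an illustration under stated assumptions and not the authors' script: the raw substitution score for each residue is assumed to have been looked up in PAM250 beforehand and passed in by the caller, higher scores are assumed to mean more conservative substitutions, and the example values are placeholders rather than real PAM250 entries or gnomAD data.

```python
from typing import Dict, Optional


def normalize_variant_scores(raw: Dict[int, Optional[float]]) -> Dict[int, float]:
    """Map residue number -> value in [0, 1] suitable for spectrum coloring.

    Residues without any variant (None) are set to 1.0; residues with a variant
    are min-max normalized over all residues that carry a score.
    """
    scored = [s for s in raw.values() if s is not None]
    low = min(scored) if scored else 0.0
    high = max(scored) if scored else 0.0
    span = high - low

    normalized = {}
    for residue, score in raw.items():
        if score is None:
            normalized[residue] = 1.0
        elif span == 0.0:
            # All scored residues are identical; min-max is undefined, use 0.0.
            normalized[residue] = 0.0
        else:
            normalized[residue] = (score - low) / span
    return normalized


if __name__ == "__main__":
    # Placeholder input: residue number -> raw substitution score (or None).
    example = {430: None, 431: -3.0, 432: 1.0, 433: 5.0}
    for residue, value in sorted(normalize_variant_scores(example).items()):
        print(residue, f"{value:.2f}")
```

The resulting values could, for example, be written into the B-factor column of the model so that spectrum coloring in PyMOL reflects variant depletion; how multiple variants at one residue are combined is left open here because the text does not specify it.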