Visualizing the functional 3D shape and topography of long noncoding RNAs by single-particle atomic force microscopy and in-solution hydrodynamic techniques

Long noncoding RNAs (lncRNAs) are recently discovered transcripts that regulate vital cellular processes, such as cellular differentiation and DNA replication, and are crucially connected to diseases. Although the 3D structures of lncRNAs are key determinants of their function, the unprecedented molecular complexity of lncRNAs has so far precluded their 3D structural characterization at high resolution. It is thus paramount to develop novel approaches for biochemical and biophysical characterization of these challenging targets. Here, we present a protocol that integrates non-denaturing lncRNA purification with in-solution hydrodynamic analysis and single-particle atomic force microscopy (AFM) imaging to produce highly homogeneous lncRNA preparations and visualize their 3D topology at ~15-Å resolution. Our protocol is suitable for imaging lncRNAs in biologically active conformations and for measuring structural defects of functionally inactive mutants that have been identified by cell-based functional assays. Once optimized for the specific target lncRNA of choice, our protocol leads from cloning to AFM imaging within 3–4 weeks and can be implemented using state-of-the-art biochemical and biophysical instrumentation by trained researchers familiar with RNA handling and supported by AFM and small-angle X-ray scattering (SAXS) experts. This protocol describes how to characterize the 3D topology of long noncoding RNAs. The authors provide detailed step-by-step procedures for the complete workflow from lncRNA isolation to AFM imaging and SAXS analysis.


Introduction
Long noncoding RNAs (lncRNAs) are non-protein-coding transcripts that regulate fundamental cellular processes, such as cell differentiation and replication 1, and are directly connected to severe diseases, such as cancer and neurological or cardiovascular defects 2. LncRNAs are thus of fundamental interest in a wide range of research areas, including epigenetics, neurobiology, oncology, plant biology and infection biology. However, the molecular mechanisms of lncRNAs are poorly characterized to date. For a quantitative and comprehensive mechanistic understanding of lncRNAs, characterizing their three-dimensional (3D) structures is now of utmost importance, because 3D structure has been shown to play a crucial role in lncRNA biological functions 3,4. Specifically, many lncRNAs scaffold nuclear proteins, such as chromatin remodeling enzymes and transcription factors, or shape subnuclear bodies, such as speckles or paraspeckles 5-7. Thus, it can be expected that lncRNA tertiary structures may guide specific and selective protein binding or ensure correct chromatin targeting and, consequently, efficient gene expression regulation 3. Indeed, long-range tertiary interactions have already been identified in the lncRNAs RepA 8, XIST 9, and MEG3 10. However, high-resolution structural studies on lncRNAs are limited to the characterization of extremely small domains, that is, a 14-nt-long stem loop of the ~17,000-nt-long XIST 11 and a 76-nt-long triple helix motif of the ~8,400-nt-long MALAT1 12. In contrast, 3D structures have never been determined for any full-length lncRNA, because the size and complexity of these transcripts, which generally span 1,000-10,000 nucleotides (nt), present unique and unprecedented challenges for biochemical and biophysical characterization 4.

Development of the protocol
In our lab, we recently developed an approach that enabled us to visualize the structural organization of full-length lncRNAs. For our studies, we have integrated small-angle X-ray scattering (SAXS) and single-particle AFM imaging with a detailed functional characterization of the chemically probed lncRNA secondary structures 10. Optimization of our non-denaturing lncRNA purification pipeline 13 with an enlarged set of transcription buffers and quality control assays (Bioanalyzer, electrophoresis, and static and dynamic light scattering (DLS)) was essential to obtain highly homogeneous conformations of our target. The identification of functional tertiary interactions via robust phenotypic assays provided further useful guidance for biophysical studies and, crucially, a rationale for designing mutants with perturbed structural architectures that served as invaluable test samples with which to benchmark our imaging method. Finally, specific screening of AFM surfaces, sample deposition procedures, and data processing methods, as we describe in detail in our protocol, enabled us to adapt AFM imaging to lncRNAs, which, to our knowledge, have never before been studied with such a method. Under the AFM conditions that we have used, we can achieve a resolution of ~1.5 nm, which enables us to visualize the global topography of our targets, including their homogeneity, size, and compactness. We can clearly distinguish between different folding states of a wild-type lncRNA (unfolded, partially folded, and folded) and we can capture topological differences (i.e., differences in 3D shape) between lncRNA mutants that possess wild-type versus disrupted long-range tertiary contacts.
In our work, we have focused on the human lncRNA called 'maternally expressed gene 3' (MEG3) 10, which promotes neuronal differentiation and stimulates p53, preventing neurodevelopmental syndromes and intracranial tumors 14,15. Using our approach, we could specifically prove that evolutionarily conserved intramolecular long-range tertiary structure interactions called pseudoknots or 'kissing loops', which are required for MEG3-dependent p53 stimulation, are also strictly required for MEG3 folding 10. Our work established that physiologically relevant long-range RNA tertiary structure interactions guide the biological function of lncRNAs. In this protocol, we provide a detailed workflow that can be broadly applied to characterize the 3D topology of any lncRNA of interest and will thus enable the characterization of important molecular properties of many of these medically relevant targets.

Overview of the procedure
In this protocol, we describe how to study lncRNAs with a spectrum of complementary biophysical methods in solution, how to image lncRNAs with AFM, and how to correlate their structural features to functionally relevant conformations.
The strengths of our protocol are that (i) we integrate hydrodynamic and AFM analyses, ensuring that the molecular shape and size adopted by the target on the AFM support is compatible with its dimensions in solution; (ii) we use known structured and unstructured RNAs as controls for benchmarking the protocol; and (iii) we can visualize and compare functionally active and inactive constructs of the same lncRNA target to correlate structural changes with specific functional states. The protocol is compatible with and complementary to chemical or enzymatic probing approaches, for example, selective 2ʹ-hydroxyl acylation analyzed by primer extension (SHAPE), as previously reported 13,17,18 , which is a robust technique for mapping the secondary structural architecture of the target RNA. We strongly recommend that any data obtained using the approach described in this protocol be backed up and validated by functional experiments in cells. The relevant functional assays are, however, target specific and will therefore not be described in this protocol.

Applications of the method
Our protocol is applicable to a vast number of newly discovered biological targets because it has recently been discovered that lncRNAs are numerous, especially in mammals. For instance, it is estimated that humans may express as many as 30,000 lncRNAs 19 versus ~20,000 protein-coding genes 20. Moreover, recent studies on the secondary structure architecture of functional lncRNAs by chemical and enzymatic probing revealed that these transcripts are as complex as rRNAs or ribozymes 21 and that their structures are evolutionarily conserved 22-26. Furthermore, our protocol enables the visualization of functional and non-functional lncRNA conformations and folding states, which is of immediate interest to a broad spectrum of researchers, including epigeneticists, developmental biologists, plant biologists, and oncologists. Indeed, structured lncRNAs participate in diverse and fundamental biological processes. For instance, HOTAIR scaffolds Polycomb repressive complexes in trans for epigenetic differentiation of skin tissues 26; MEG3 participates in cell cycle regulation and tumor suppression 10,27; COOLAIR controls flowering and vernalization in plants 22; NEAT1 and lincRNA-p21 shape the expression landscape in response to environmental conditions modulating the stress response in paraspeckles 23,25; Braveheart regulates cardiomyocyte differentiation 24; and RepA, roX, and XIST ensure gene-dosage compensation during sex determination 8,28,29.
The profound medical implications of lncRNAs raise the prospect of gaining valuable insights from applying our method to the field of medicinal chemistry. Large investments are currently being made by private and academic labs to identify compounds designed to modulate RNA function for potential clinical use 30. In this context, our protocol could be used to screen the effects of small molecules on lncRNA structural conformations. Moreover, considering that our study on MEG3 enabled us to observe specific structural differences between the functional wild-type lncRNA and non-functional constructs carrying point mutations, our protocol could also be used for screening the structural impact of lncRNA single-nucleotide polymorphisms (SNPs) known to cause pathological phenotypes.
Finally, our protocol opens up future directions for biochemists and RNA structural biologists. For instance, the protocol can be adapted to perform optical tweezer experiments and AFM experiments in 'force mode', that is, by measuring interaction forces in single molecules. These studies would precisely determine the energetics of lncRNA folding, as has already been done for short viral RNA motifs 31 or for de novo-designed DNA and RNA origami structures 32.

Comparison with other methods
Our protocol has the advantages of (i) integrating widely accessible biochemical and biophysical techniques; (ii) making use of instrumentation, such as analytical ultracentrifuges (AUCs), SAXS beamlines, and AFM microscopes, that can be operated by any researcher familiar with RNA handling, including scientists at early stages of their careers (e.g., PhD students), if appropriately supervised by experts when operating costly equipment; and (iii) being implementable with affordable investments in house or via user-oriented facilities, such as synchrotron beamlines. The protocol is also compatible with parallel characterization of multiple lncRNA targets.
Moreover, our protocol is unique in the detailed description of the biochemical and biophysical pipeline for lncRNA production and for the experimental assessment of their purity and homogeneity. Complementarily to an approach recently used to image the lncRNA Braveheart by SAXS 33, our protocol reveals how integrated structural biology can powerfully enable the characterization of novel classes of large and difficult-to-handle biological macromolecules such as lncRNAs. In this respect, one of our protocol's distinctive features is the implementation of single-particle imaging by AFM, in addition to in-solution hydrodynamic analysis by SAXS, to visualize lncRNA functional conformations (Figs. 2-4).
Fig. 1 | Workflow for visualizing the 3D topology of lncRNAs. First, a highly pure preparation of the lncRNA target is produced (Steps 1-19). Optimal folding conditions are then identified by AUC (Steps 20-38). The sample is then checked for monodispersity in different folding conditions (Step 39A-C). Finally, lncRNA volumes are reconstructed by SAXS, and lncRNA single particles are imaged by AFM (Steps 65-100). These procedures enable comparison of lncRNAs in different folding conditions and in functional/non-functional states. The time required for each step is indicated in blue, pause points are indicated in green, critical checkpoints in gold, and preparative steps in gray. LS, light scattering; MW, molecular weight; RI, refractive index; S, Svedberg; UV, ultraviolet. Images adapted from Uroda et al. 10 under a Creative Commons Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/legalcode).
Single-particle AFM imaging has previously been used to visualize short nucleic acids 34 and their ribonucleoprotein complexes 35, revealing their mechanistic details, interaction interfaces and conformational dynamics 36-38. For instance, AFM imaging has enabled morphological analysis of short structured motifs of genomic RNA of the turnip yellow mosaic virus (TYMV) 39, internal ribosome entry site (IRES) elements of genomic RNA of hepatitis C virus (HCV) 40, and de novo-designed small RNA nanostructures with potential biomedical applications 35,41. However, the length and complexity of lncRNAs present unique challenges with respect to such short, structured RNA motifs. The specific molecular properties of lncRNAs require completely different production, stabilization, and handling of the targets and have specific implications for data processing, as we describe in detail in our protocol.
a,b, Graphical depiction of how characteristic length scales relate to the spatial frequency of specific features of the PSD. a, Schematic of a particle, characterized by an overall molecular size X (corresponding to spatial frequency ξ) and by an intramolecular feature Z (corresponding to spatial frequency ψ). b, The corresponding PSD of such a particle. c,d, PSD plots from simulated 60-nm (c) and 10-nm (d) particles. Convolution with a tip with a 10-nm curvature radius is simulated. Each plot is the average of 10 particles; error bars = SEM. Representative simulated particles are depicted in the bottom left inset in each plot. e, Selection of an ROI (yellow box) around an lncRNA particle in a prototypical AFM acquisition (Step 95). xy scale bar, 200 nm. f, Detail of the lncRNA molecule within the ROI. xy scale bar, 100 nm. The z color scale bar applies to both e and f. g, PSD of the particle displayed in f, taken along the fast-scanning axis (x) (Step 96). h, Average PSD from 100 such ROIs, displayed with superimposed fits to the low-frequency plateau and the two regions of the f^(−α) decay (Steps 97-100). Intercepts at 25 nm and 80 nm relate to intramolecular domain size and overall molecule diameter (2D projected), respectively. Error bars = SEM.
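The link between a characteristic length and a spatial frequency in the PSD can be illustrated with a minimal sketch. It is not the authors' processing pipeline (which fits a low-frequency plateau and power-law decays to averaged PSDs); it only computes a plain periodogram of one height profile along the fast-scanning axis and reports the length that corresponds to the strongest frequency. The function names and the synthetic sinusoidal profile with a 25-nm period are assumptions made for illustration.

```python
import cmath
import math


def psd_1d(profile, pixel_size_nm):
    """Periodogram of a 1D height profile.

    Returns (spatial frequencies in 1/nm, power) for the positive,
    non-zero frequencies below the Nyquist limit.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [h - mean for h in profile]  # remove the height offset
    freqs, power = [], []
    for k in range(1, n // 2):
        coeff = sum(
            centered[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)
        )
        freqs.append(k / (n * pixel_size_nm))
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def dominant_length_nm(profile, pixel_size_nm):
    """Length scale (1 / spatial frequency) of the strongest PSD component."""
    freqs, power = psd_1d(profile, pixel_size_nm)
    strongest = max(range(len(power)), key=power.__getitem__)
    return 1.0 / freqs[strongest]


# Hypothetical test profile: a 256-px line from a 1-um, 1,024-px scan,
# carrying a synthetic height modulation with a 25-nm period.
pixel_nm = 1000.0 / 1024
profile = [math.sin(2 * math.pi * j * pixel_nm / 25.0) for j in range(256)]
print(f"dominant length: {dominant_length_nm(profile, pixel_nm):.1f} nm")
```

On real ROIs the spectrum is broad rather than a single peak, so averaging many particles and fitting the plateau and decay regions, as described in the caption, remains necessary.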
Electron microscopy (EM) is an alternative approach to AFM for single-particle imaging of biological macromolecules, even at high resolution (i.e., 2-5 Å). This technique has recently gained momentum, especially since the advent of single-electron detectors 42. However, for still largely unclear reasons, EM has so far not been used successfully in the characterization of pure RNA molecules; that is, EM structures exist only for RNAs in complex with protein partners. We have attempted visualization of the lncRNA MEG3 by both negative-staining EM and cryo-EM, but we encountered technical challenges that we have not yet overcome. For instance, we have imaged MEG3 at increasing concentrations of magnesium in the same range as for AFM imaging (10-25 mM Mg2+). By negative-staining EM, we could visualize individual particles on the grids, but the predominant aggregates prevented us from obtaining well-defined reference-free 2D class averages (Extended Data Fig. 1a,b). Furthermore, by cryo-EM on holey grids with and without continuous carbon support, we could observe only MEG3 aggregates (Extended Data Fig. 2). Different glow-discharging conditions (negative and positive polarity with varying current and duration) did not improve our EM imaging results. It thus seems apparent that specific EM optimization is required to image lncRNAs in the future. It will probably be necessary to optimize grid preparation, for example, by using different support substrates such as graphene or graphene oxide 43,44, and vitrification, for example, by using blot-free and sample-spraying-based vitrification methods 45,46. Besides grid preparation, a strong focus should, however, be applied to the optimization of the target.
Production of minimal functional cores of lncRNAs, encompassing only the most highly structured regions, will probably be necessary to obtain samples that ensure the highest image contrast, that are resistant to grid preparation and staining/freezing, and that yield particles that can be classified, averaged, and reconstructed in 3D. In this context, our current protocol will serve as an invaluable reference for optimizing targets and experimental conditions, which will crucially enable high-resolution lncRNA imaging by cryo-EM in the coming years.

Limitations of the approach
Limitations of our protocol are the throughput of our analysis, the imaging resolution, and the fact that we cannot yet provide an atomic description of the 3D structures of our targets. First, regarding throughput, the following considerations should be taken into account. With our approach, we have visualized ~100 target particles per condition. Although this number provides sufficient statistical sampling to visualize differences between lncRNA folding states 10, it could be beneficially increased, for example, by using high-speed microscopes 47, to capture an even more complete spectrum of lncRNA conformations. Importantly, the sample preparation and biochemical characterization presented in our protocol (Steps 1-68) are fully compatible with higher-speed AFM imaging, but sample adsorption to the mica may require optimization, that is, of sample concentration and adsorption time (Steps 69-74). In addition, the operation of high-speed microscopes is different from that of the Multimode microscopes presented here (Steps 75-86) and will thus need to be carried out according to the manufacturer's specifications and under the supervision of facility managers and AFM experts.
Second, regarding AFM image resolution, the following aspects should be considered. In AFM, 'resolution' can be defined in two ways. First, 'sampling resolution' determines the spacing between recorded data points. In our case, most images were acquired at a 1-µm scan size with 1,024 pixels (px) per line. Thus, our sampling resolution was 0.98 nm/px. Consequently, no molecular features smaller than 0.98 nm could be determined. Second, and analogously to the optical Rayleigh criterion, 'image resolution' can also be defined as half of the distance at which two different AFM topographic features can be distinguished. In this respect, at our sampling resolution (0.98 nm/px), we could easily determine distances between structural features of ~3 nm, and we thus estimate our image resolution to be ~1.5 nm. On the basis of these considerations, we can state that, although AFM does not offer atomic resolution for lncRNAs, it can unambiguously characterize their topography and their degree of structural compaction at nanometer resolution.
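The two resolution figures quoted above follow from simple arithmetic, sketched here for convenience. The function names are illustrative assumptions; the input values (1-µm scan, 1,024 px per line, ~3-nm resolved feature spacing) are those stated in the text.

```python
def sampling_resolution_nm_per_px(scan_size_nm: float, pixels_per_line: int) -> float:
    """Spacing between recorded data points along one scan line."""
    return scan_size_nm / pixels_per_line


def image_resolution_nm(smallest_resolved_distance_nm: float) -> float:
    """Half of the smallest distance at which two features are still distinguished."""
    return smallest_resolved_distance_nm / 2.0


# 1-um scan at 1,024 px per line, as used for most images in the text
print(f"{sampling_resolution_nm_per_px(1000.0, 1024):.2f} nm/px")  # 0.98 nm/px
# ~3 nm between distinguishable features gives ~1.5 nm image resolution
print(f"{image_resolution_nm(3.0):.1f} nm")  # 1.5 nm
```

The same function shows that a 2-µm scan at 1,024 px per line roughly doubles the pixel spacing, which is worth keeping in mind when choosing the scan size for single-particle analysis.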
Last but not least, regarding the molecular description of our targets, it has to be considered that our lncRNA single-particle images are topographic AFM images that rely on contact-based AFM. On the one hand, contact-based AFM images are potentially affected by the convolution of sample topography and AFM tip geometry. Because the AFM tip geometry cannot be experimentally determined with sufficient precision for samples that are a few nanometers in height, true deconvolution methods cannot be applied to our image processing pipeline 48. To reduce the tip-convolution effect, dedicated image processing techniques that are in development could be used instead 49,50. However, deconvolution is not necessary for our protocol, whose main application is to image conformational differences of lncRNAs in their functional and non-functional states and in folded and unfolded forms. In this context, considering the height of lncRNA molecules on the mica (a few nanometers 10), the maximum convolution effect that our images suffer from is close to the nominal size of the AFM tip apex (2 nm, in our case). This small effect will not perturb the length measurement of lncRNA molecules, which are 30-85 nm in size, according to our SAXS, AUC, and power spectral density (PSD) analyses 10. On the other hand, topographic AFM images are orientation dependent and cannot be used for reconstructing the atomic 3D coordinates of the target, especially not at our imaging resolution (~1.5 nm, see above). Nonetheless, lncRNA molecules will typically adsorb to the mica in all possible 3D orientations, so our AFM images capture different views of the lncRNA particles, and these views are accounted for in our PSD analysis.
Moreover, our sample deposition and imaging procedures produce particles that have dimensions similar to those in solution, as determined by AUC and SAXS 10, and it is known that viral particles or other nucleic acids are also adsorbed to the mica in conformations that closely mimic their conformations in solution (±10-15%) 51. In the future, AFM topographic single-particle images could be used as experimental constraints to guide reconstruction of lncRNA 3D volumes by integrative structural biology methods, as can already be done for proteins 52-54.

LncRNA selection
Our protocol is applicable to lncRNAs possessing a broad spectrum of lengths and sequence compositions. Indeed, our non-denaturing purification method is compatible with lncRNAs ranging from several hundred to several thousand nucleotides long 10,25,26,55, provided that the appropriate size-exclusion chromatography resin is used, as described in the 'Equipment setup' section.
For any new lncRNA of interest, we recommend performing secondary structure probing before the SAXS/AFM structural analysis 13 . Chemical or enzymatic secondary structure probing methods have the potential to distinguish highly structured from loosely structured lncRNAs/lncRNA motifs 17 . Loosely structured lncRNAs-of which the RepE motif of XIST could be one 28 -are unlikely to be suitable candidates for topographic or shape analysis. However, because lncRNAs are typically modular, one could determine the boundaries between lncRNA domains by secondary structure probing and then focus on the analysis of only the most structured domains.
Finally, for any new target under investigation, we also advise developing functional assays as early as possible, that is, phenotypic assays in cells 10 or in model organisms 24,33, or in vitro assays such as protein binding experiments 26. These assays will promote the design of biologically relevant mutants, ensuring that the structural analysis focuses on functionally meaningful features of the target and provides useful mechanistic insights.

LncRNA preparation and folding: important physical-chemical parameters and the key role of magnesium ions
Key parameters that affect lncRNA stability and folding are the temperature, the folding strategy, and the ionic conditions. First, we strongly recommend carrying out lncRNA purification at room temperature (~20-25°C). Cooling or freezing the lncRNA typically 'traps' the target in heterogeneous conformations, which compromise imaging.
Second, we strongly recommend purifying lncRNAs under non-denaturing conditions. Alternative protocols, which involve denaturation and refolding by heating and annealing and which are widely used for short RNAs, are typically more time consuming and produce lower yields. Most importantly, such protocols introduce heterogeneity into the sample, compromising imaging (Extended Data Fig. 3).
Third, it is essential to optimize ionic conditions experimentally for each new lncRNA target. It is emerging that lncRNAs can be studied at near-physiological concentrations of magnesium (1-10 mM) 10,26,33 , but it is paramount to characterize the specific behavior of each target within this range of concentrations, because magnesium ions crucially determine the functional and structural properties of lncRNAs.
We recommend performing folding studies by titrating magnesium concentrations using sedimentation-velocity analytical ultracentrifugation (SV-AUC, Steps 20-38; ref. 13), which measures the velocity at which an RNA molecule sediments to the bottom of a closed compartment under high angular velocities 56. For our study, we used a Beckman Coulter Analytical XL-A/XL-I instrument equipped with an An-50 Ti analytical rotor, and Nanolytics Instruments cells and counterbalance (see also ref. 13). SV-AUC determines four important properties of the target lncRNA. First, SV-AUC determines the homogeneity of the sample at each magnesium concentration (Step 35). Coexisting species could be oligomers of the RNA molecule of interest, or unspecific aggregation products, and generally appear at high magnesium concentrations (Extended Data Fig. 4). Should these aggregated species dominate the particle distribution of the sample, further investigation of the folding conditions is necessary, because lncRNA aggregation will otherwise preclude biophysical characterization, particularly ab initio 3D shape determination by SAXS. Second, SV-AUC measures the sedimentation coefficient (s) and sedimentation coefficient distribution [c(s)] of the lncRNA of interest (Step 35). By appropriate conversion of the experimental sedimentation coefficient (s) into a theoretical sedimentation coefficient at standard conditions (i.e., in water at 20°C, s(20,w)), it is possible to compare the hydrodynamic properties of different samples (Step 35). Third, AUC determines the lncRNA frictional ratio (f/f0), which is indicative of the shape of the target lncRNA (axial ratios for oblate and prolate ellipsoid models, Step 35; ref. 57). The f/f0 value can be calculated in SEDFIT, assuming that the partial specific volume and the hydration of the lncRNA are 0.53 mL/g and 0.59 g/g, respectively 58.
Fourth, AUC determines the lncRNA hydrodynamic or Stokes radius (RH), which is defined as the radius of an equivalent hard sphere diffusing at the same rate as the molecule under observation (Step 36). Compact folded molecules have smaller RH values, higher sedimentation coefficients, and a frictional ratio closer to 1 than unfolded ones. By specifically plotting RH values at increasing magnesium concentrations and fitting the graph to a Hill equation (Step 37), one can derive the Hill coefficient, which represents an estimate of the magnesium cooperativity for folding, and the kMg1/2 value, which is the magnesium concentration at which the target molecule reaches 50% of its maximal compaction.
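As an illustration of the Hill analysis of magnesium-dependent compaction, the following minimal sketch fits the hydrodynamic radius to a two-plateau Hill model by a grid search over the half-maximal magnesium concentration and the Hill coefficient, solving the two plateau values by linear least squares. It is an assumption-laden sketch, not the fitting routine used in the protocol: the model form, function names, and grid ranges are illustrative choices, and the data are synthetic, noise-free values generated from the model itself (only the 6.9 mM half-maximal value is taken from the text; the Hill coefficient of 2 and the 20-nm and 10-nm plateaus are hypothetical).

```python
def hill_fraction(mg_mM, k_half_mM, n):
    """Fraction of maximal compaction at a given Mg2+ concentration."""
    if mg_mM <= 0.0:
        return 0.0
    x = (mg_mM / k_half_mM) ** n
    return x / (1.0 + x)


def fit_hill(mg_mM, rh_nm, k_grid, n_grid):
    """Grid search over (k_half, n); plateau radii come from linear least squares.

    Model assumed here: RH = RH_unfolded + (RH_folded - RH_unfolded) * fraction.
    """
    count = len(mg_mM)
    mean_rh = sum(rh_nm) / count
    best = None
    for k_half in k_grid:
        for n in n_grid:
            theta = [hill_fraction(m, k_half, n) for m in mg_mM]
            mean_t = sum(theta) / count
            sxx = sum((t - mean_t) ** 2 for t in theta)
            if sxx == 0.0:
                continue  # no variation in theta: the line is undefined
            slope = sum(
                (t - mean_t) * (r - mean_rh) for t, r in zip(theta, rh_nm)
            ) / sxx
            rh_unfolded = mean_rh - slope * mean_t
            sse = sum(
                (rh_unfolded + slope * t - r) ** 2 for t, r in zip(theta, rh_nm)
            )
            if best is None or sse < best["sse"]:
                best = {
                    "k_half_mM": k_half,
                    "n": n,
                    "rh_unfolded_nm": rh_unfolded,
                    "rh_folded_nm": rh_unfolded + slope,
                    "sse": sse,
                }
    return best


# Synthetic titration (hypothetical values, generated from the model)
mg = [0.0, 1.0, 2.0, 5.0, 10.0, 15.0, 25.0, 50.0]
rh = [20.0 - 10.0 * hill_fraction(m, 6.9, 2.0) for m in mg]

k_grid = [i / 10 for i in range(10, 201)]  # 1.0 to 20.0 mM in 0.1 mM steps
n_grid = [j / 10 for j in range(5, 51)]    # Hill coefficient 0.5 to 5.0
fit = fit_hill(mg, rh, k_grid, n_grid)
print(fit["k_half_mM"], fit["n"], fit["rh_unfolded_nm"], fit["rh_folded_nm"])
```

With real, noisy data a dedicated nonlinear least-squares routine and an estimate of parameter uncertainty would be preferable; the grid search is used here only to keep the sketch dependency-free.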

LncRNA quality control
Homogeneity and folding can additionally be tested by native gel electrophoresis, DLS, and MALLS (Step 39A/B/C).
Native gel electrophoresis (Step 39A(i-viii)) offers the advantage of a direct visualization of the migration pattern of the target and its potential to fold into alternative conformations. Our procedures describe how to analyze the folding state of an lncRNA such as MEG3 variant 1 (MEG3 v1), which possesses a kMg1/2 value of 6.9 mM as experimentally determined by AUC 10. Conditions at lower magnesium concentrations will result in partially folded species with lower electrophoretic mobility and conditions at higher magnesium concentrations will result in folded species with higher electrophoretic mobility. The magnesium concentration values we have used for MEG3 v1 (2, 5 and 10 mM MgCl2) 10 can be used as initial references for magnesium concentrations but should be adjusted around the kMg1/2 value determined by AUC for each lncRNA of interest.
DLS (Step 39B(i-xiii)) is performed to assess polydispersity of the target lncRNA preparation. DLS procedures that we describe here for lncRNA studies derive from well-established DLS protocols optimized for other biological macromolecules 59,60.
Finally, MALLS (Step 39C(iii-xv)) is based on the principle of proportionality between the intensity of the scattered light from biological macromolecules and their molar mass. For lncRNAs, we recommend coupling MALLS with size-exclusion chromatography (SEC-MALLS) to maximize sample homogeneity 60. Our procedure specifically describes how to perform SEC-MALLS using self-packed high-performance columns and an apparatus comprising a UV detector, a Wyatt Dawn HELEOS II light-scattering detector and a Wyatt Optilab T-rEX refractometer. The SEC-MALLS results (Step 39C(iii-xi)) inform about the experimental molar mass and polydispersity (Mw/Mn) of the target lncRNA. The experimental molar mass of the target should be similar (±5%) to the theoretical molar mass calculated from the target lncRNA sequence or a multiple n (integer number) thereof. If n = 1, the sample is monomeric; if n > 1, the sample is an oligomer. If the experimental molar mass is smaller than the calculated mass, this probably indicates sample degradation or incomplete transcription (see 'Troubleshooting' section). If the experimental molar mass is larger than the calculated mass, but n is not an integer, the sample is probably inhomogeneous and in equilibrium between multiple oligomeric states. Although this property may have biological relevance, it has to be carefully considered in the interpretation of AUC, SEC-SAXS, DLS, and AFM results. Moreover, a sample with monodisperse behavior will present a polydispersity value of 1. Any deviation indicates polydispersity, in which case the sample preparation should be optimized (see 'Troubleshooting' section). Monodispersity can also be visually detected by the overlap of the UV intensity, refractive index, and Rayleigh signal peaks (or of the UV intensity and SAXS scattering intensity peaks) over the elution volume. A perfect overlap of the UV and the Rayleigh ratio curves indicates monodisperse behavior (Step 39C(xv)).
In addition, the molar mass distribution of a monodisperse sample will be constant over the elution volume (Extended Data Figs. 3a-c and 4d).
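The interpretation rules for SEC-MALLS results given above can be summarized in a short sketch. The function names, the returned labels, and the example masses are hypothetical; the ±5% tolerance and the decision logic follow the text, and Mw/Mn is computed with the standard definitions of weight-average and number-average molar mass from per-slice molar masses and mass concentrations across the elution peak.

```python
def polydispersity(molar_masses, mass_concentrations):
    """Mw/Mn from per-slice molar masses M_i and mass concentrations c_i."""
    total_c = sum(mass_concentrations)
    mw = sum(c * m for c, m in zip(mass_concentrations, molar_masses)) / total_c
    mn = total_c / sum(c / m for c, m in zip(mass_concentrations, molar_masses))
    return mw / mn


def classify_molar_mass(experimental, theoretical, tolerance=0.05):
    """Compare the experimental molar mass with the sequence-derived value."""
    if experimental < theoretical * (1.0 - tolerance):
        return "below monomer mass: possible degradation or incomplete transcription"
    n = max(1, round(experimental / theoretical))
    if abs(experimental - n * theoretical) <= tolerance * n * theoretical:
        return "monomer" if n == 1 else f"oligomer (n = {n})"
    return "non-integer multiple: possibly mixed oligomeric states"


# Hypothetical example: theoretical molar mass of 500,000 g/mol
theoretical = 500_000.0
for measured in (510_000.0, 1_000_000.0, 750_000.0, 400_000.0):
    print(measured, "->", classify_molar_mass(measured, theoretical))

# A peak whose slices all share one molar mass is monodisperse (Mw/Mn = 1)
print(polydispersity([500_000.0, 500_000.0], [1.0, 2.0]))
```

In practice these values are reported by the light-scattering analysis software; the sketch only makes the decision rules explicit.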

LncRNA 3D shape determination by SEC-SAXS
Small-angle X-ray scattering (SAXS, Steps 40-64) is used to derive low-resolution information about the 3D shape of lncRNAs in solution 61. SAXS, like MALLS, is also based on the proportionality between the intensity of scattered light and the mass of the molecular targets. If the angular dependence of the scattered light is measured in the horizontal plane, it is possible to determine the size of the molecule. This size measurement is known as the radius of gyration (Rg) and is a measure of the size of the molecule weighted by the mass distribution around its center of mass (Steps 62 and 63). SAXS additionally informs about the maximal pairwise interatomic distance in the target lncRNA (Dmax), and its overall volume (Steps 62 and 63). If the ratio between Rg and the hydrodynamic radius RH (Rg/RH) is between 0.8 and 1.1, the lncRNA is globular, whereas if the Rg/RH ratio is >1.1, the sample is elongated.
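One common way to estimate the radius of gyration from the low-angle part of a scattering curve is the Guinier approximation, ln I(q) = ln I(0) − (Rg² q²)/3, which is not described explicitly in this section and is shown here only as a hedged illustration. The function name and the synthetic, noise-free curve (Rg of 10 nm, I(0) of 100, both hypothetical) are assumptions; the approximation is usually restricted to small q·Rg (roughly below 1.3 for compact particles).

```python
import math


def guinier_fit(q_values, intensities):
    """Linear fit of ln I versus q^2; returns (Rg, I0).

    q in 1/nm gives Rg in nm. Only the low-q (Guinier) region should be passed in.
    """
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    if slope >= 0.0:
        raise ValueError("non-negative Guinier slope: Rg cannot be estimated")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic Guinier region generated from the approximation itself
true_rg_nm, true_i0 = 10.0, 100.0
q = [0.01 * k for k in range(1, 11)]  # 0.01 to 0.10 1/nm, so q*Rg <= 1.0
intensity = [true_i0 * math.exp(-((qi * true_rg_nm) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
print(f"Rg = {rg:.2f} nm, I(0) = {i0:.1f}")
```

Dedicated SAXS software performs this analysis with automatic selection of the fitting range and error estimates; the sketch only shows where the Rg value comes from.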
As for MALLS, we specifically recommend coupling SAXS experiments with SEC, to maximize sample homogeneity. Homogeneity is particularly critical in SAXS, where small percentages of aggregates can affect the calculation of R g and prevent accurate ab initio shape determination 62 . We performed SEC-SAXS experiments at the BioSAXS BM29 beamline at ESRF 63 . When ab initio shape determination is possible, the derived 3D volumes can be compared to atomistic models of the target lncRNA predicted in silico from experimentally determined secondary structure maps 33 . It has to be considered, however, that current software is not yet powerful enough to accurately predict high-resolution 3D structures and generally can process only lncRNA sequences up to a few hundred nucleotides long. Such predictions and the associated fit to SAXS-derived volumes can thus yield only qualitative information, not high-resolution 3D experimental models.
LncRNA topographic imaging by AFM
AFM (Steps 65-94) is used to visualize individual particles of the target lncRNA with the objective of capturing conformational differences in their functional and non-functional states or in different folding states. First, a large-view-field AFM image (5 × 5 μm, Step 83) gives an immediate estimate of sample homogeneity on the mica, of the density of the particles, and of the presence of any undesired features along with the target of interest (e.g., salt crystals, see 'Troubleshooting' section). If the large-view-field image is of good quality, one can proceed with higher-resolution scans (1 or 2 µm at 1,024 px per line, Step 85). From these scans, an immediate type of analysis consists of correlating particle size with the dimensions measured in solution by hydrodynamic techniques (e.g., D max measured by SAXS, Steps 62 and 63). A more systematic acquisition of high-resolution scans should aim at imaging a large number of particles (i.e., at least ~100 particles per sample, per condition, as we did for MEG3 v1; ref. 10 ). On these particles, PSD analysis can then be carried out (see below).
The data acquisition procedures described below (Steps 75-86) are appropriate for a Multimode 8 Nanoscope V equipped with NanoScope 9.2 software but need to be customized for microscopes from different vendors. Readers should refer to the operational instructions for their specific instrument and crucially seek advice from the microscope operations manager or from an expert AFM microscopist before use. Importantly, on open-loop AFM systems such as Multimodes, the piezo scanner must be calibrated. Calibration should be done at the appropriate scanning size (≤5 µm) for imaging single molecules such as lncRNAs. By default, AFM scanners are calibrated by manufacturers on the full piezo scanner length, which is ~100 µm in the case of Multimode J scanners. Refer to the manufacturer's instruction manual for details on how to perform the dedicated calibration at lower voltages.
Finally, it must be considered that during adsorption to the mica, cations act as bridges between the lncRNA and the mica surface. Thus, the magnesium concentration used to image folded conformations of lncRNAs by AFM should be higher than the k 1/2Mg value, not only to ensure folding, but also to account for this adsorption effect. However, if the k 1/2Mg value of the lncRNA of interest is too high (i.e., >25 mM), thick particulate matter may deposit on the mica, compromising the image quality (see 'Troubleshooting' section and Extended Data Fig. 5).
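The scan sizes quoted above for high-resolution imaging (1 or 2 µm at 1,024 px per line) fix the pixel size and therefore the finest length scale that an image can sample. The short sketch below works through that arithmetic; the function names are hypothetical, and the Nyquist relation is a general sampling property rather than a statement from this protocol.

```python
def pixel_size_nm(scan_size_um, pixels_per_line):
    """Edge length of one pixel, in nanometers, for a square scan."""
    return scan_size_um * 1000.0 / pixels_per_line

def nyquist_frequency_per_nm(pixel_nm):
    """Highest spatial frequency (cycles per nm) that the sampling can represent."""
    return 1.0 / (2.0 * pixel_nm)

for scan_um in (1.0, 2.0):
    px = pixel_size_nm(scan_um, 1024)
    print(f"{scan_um:.0f} µm scan: {px:.2f} nm/px, "
          f"Nyquist frequency {nyquist_frequency_per_nm(px):.3f} nm^-1")
```

With these settings a 1-µm scan samples the surface roughly every nanometer and a 2-µm scan roughly every 2 nm; the effective resolution also depends on the tip and on the sample, which this calculation does not capture.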
LncRNA image processing using PSD analysis
PSD analysis is performed on square AFM topographic images to extract characteristic length scales present and recurring in the image. Typically, these characteristic length scales are associated with interparticle distances, particle size and any other topographic feature that recurs in the particles, such as protrusions, branches, or globular features. This ability of the PSD analysis derives from it being a form of Fourier analysis of the signal, leading to an emphasis on characteristic frequencies reappearing in the signal. The PSD is built by taking the squared modulus of the Fourier transform of the signal, scaled by a prefactor that depends on the pixel size (Δx) and the size of the image (L). Here, FFT denotes the fast Fourier transform of the image, the result of an algorithm to compute a discrete, pixel-based version of the Fourier transform, which is otherwise a continuous function 64 . Notably, PSD units depend on the software used to calculate the PSD and on the dimensionality of the source data. For the calculation of characteristic distances, this scaling effect is completely immaterial. In our case, the software Gwyddion outputs PSD amplitudes as volumes; i.e., their units are m 3 . In our specific case, for each line of the topographic image, an FFT is calculated, the corresponding PSD computed, and then the PSDs of all lines are averaged together to yield a global PSD of the image. The PSDs are collected along the fast scanning axis of the microscope to avoid artifacts due to line-to-line offset. Selecting small regions of interest (ROIs) surrounding individual lncRNA particles enables focusing on particle features rather than interparticle distances in the PSDs. In the case of a periodic signal, the PSD displays a characteristic peak at the corresponding frequency, and the interpretation is straightforward. In the case of lncRNA particles, how can we relate PSD to particle size?
We shall note here that qualitatively the PSD of a typical lncRNA particle displays a low-frequency plateau (f) followed by a power-law decay toward high frequency, of the form f^−α. Intuitively, this means that there is a flat region outside the particle (low-frequency plateau) and then a knee where the sequence of higher frequencies of decremental density (following a power law), arising from the length scales associated with the actual particle, gives rise to the decay. Quantitatively, each change of slope of the power-law decay relates to a characteristic length scale associated with the particle (Fig. 3a). The intersection between the plateau and the power-law decay defines the average maximal particle size (as highlighted by simulated data, Fig. 3b), whereas other changes of slope relate to intra-particle features.
The characteristic frequencies can be obtained by fitting the power-law-decay regime(s) of the PSD and looking at the intersections of the distinct power law regimes (including the low-frequency plateau).
Operationally, the PSD for all the particles imaged under the same experimental conditions is computed, and the average PSD for the specific experimental conditions is plotted against the spatial frequencies ω, where ω = 2πf. The corresponding spatial distances (x) can be easily calculated, taking into account that ω = 2π/x. Although, in principle, it is always possible to measure the dimension of each particle individually, the rich variety of 2D projections of complex structural conformations, together with the bias induced by manual selection of profiles or contours, makes PSD analysis a powerful and rapid tool for assessing these structural parameters.
The PSD analysis can provide information at two levels. First, the presence of multiple slopes in the f^−α region provides immediate qualitative indication of a structured morphology of the lncRNA. Second, knowing the positions of the intercepts of the distinct power-law decays enables estimation of the physical size of the structural domains.
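The line-by-line procedure described above (one Fourier transform per scan line, squared modulus, then averaging over all lines) can be sketched in a few lines of Python. This is an illustration only, not the Gwyddion implementation: the function names are hypothetical, a naive discrete Fourier transform stands in for an FFT, and the prefactor Δx/N (equivalent to Δx²/L with L = NΔx) is an assumed normalization chosen so that heights and pixel size in meters give a PSD in m³. The synthetic image is periodic, which is the simple case in which the PSD shows a single peak; real lncRNA particles instead require fitting the plateau and power-law regimes.

```python
import cmath
import math

def line_psd(heights, dx):
    """One-sided PSD of a single scan line; returns (frequencies, psd)."""
    n = len(heights)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for spatial frequency k / (n * dx).
        coeff = sum(h * cmath.exp(-2j * math.pi * k * j / n) for j, h in enumerate(heights))
        freqs.append(k / (n * dx))
        psd.append((dx / n) * abs(coeff) ** 2)  # assumed normalization
    return freqs, psd

def image_psd(image, dx):
    """Average the line PSDs over all rows (rows taken as the fast scan axis)."""
    per_line = [line_psd(row, dx) for row in image]
    freqs = per_line[0][0]
    mean_psd = [sum(p[k] for _, p in per_line) / len(per_line) for k in range(len(freqs))]
    return freqs, mean_psd

def peak_length(freqs, psd):
    """Length scale x = 1/f (equivalently 2π/ω) of the strongest non-zero-frequency peak."""
    k = max(range(1, len(psd)), key=lambda i: psd[i])
    return 1.0 / freqs[k]

# Synthetic test image: 4 lines of 64 px, 1 nm/px, height modulated with an 8-nm period.
dx = 1.0
image = [[math.sin(2 * math.pi * j * dx / 8.0 + 0.3 * r) for j in range(64)] for r in range(4)]
freqs, psd = image_psd(image, dx)
length = peak_length(freqs, psd)
print(f"characteristic length: {length:.1f} nm")
```

For large images, an FFT routine from a numerical library would be the practical replacement for the naive transform used here.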

Biological materials
• Expression plasmid coding for the target lncRNA downstream of a T7 promoter sequence and immediately upstream of a single restriction enzyme cutting site 13 . The plasmid carries antibiotic resistance to ampicillin. In our example shown in this protocol we use plasmid pTU1 10 , which can be obtained from the corresponding author upon request.

Reagents
! CAUTION We advise wearing protective clothing (lab coats and gloves) when handling any reagent mentioned in this protocol. Recycling or disposal of solid and liquid waste should be done according to local and institutional regulations.
c CRITICAL All reagents should be stored and prepared according to the manufacturer's recommendations, if not otherwise indicated. We recommend using these reagents for RNA work only, avoiding the use of spatulas to weigh them, and changing gloves frequently to avoid RNase contamination.

Reagent setup
c CRITICAL All reagents must be prepared according to good RNA-handling practices. In particular, clean gloves should be worn at all times, and glassware should be baked in the oven at 180°C for 4 h before use. Only DEPC-treated water should be used to prepare solutions and buffers. Immediately after preparation, all solutions are filtered through 0.22-μm membrane filters or 0.22-μm Ultrafree GV Durapore filters.

DEPC-treated water
This solution is 0.1% (vol/vol) DEPC in Milli-Q water. Measure 0.5 mL DEPC and adjust to a total volume of 500 mL with Milli-Q water. Incubate at 37°C for 2 h and autoclave twice. This solution can be prepared in advance and stored at room temperature for several months.

Ribonucleotide stock solutions
These are 100 mM solutions of each ribonucleotide in DEPC-treated water. Weigh ~150 mg of each ribonucleotide powder and dissolve it in ~0.3 mL of DEPC-treated water. Adjust the pH by successively adding NaOH and measuring the pH with pH strips until reaching pH ~7.0. Measure the concentration in a NanoDrop spectrophotometer at 260 nm and adjust the concentration to 100 mM with DEPC-treated water, using the corresponding extinction coefficient values provided in Table 1. These solutions can be prepared in advance and stored at −20°C for several months.

DTT, 1 M solution
Weigh 1.54 g DTT and dissolve in a total volume of 10 mL DEPC-treated water. This solution can be prepared in advance and stored in 100-µL aliquots at −20°C for years. Avoid multiple freeze-thaw cycles.

MgCl 2 , 1 M solution
Weigh 2.03 g MgCl 2 ·6H 2 O and dissolve in a total volume of 10 mL DEPC-treated water. Correct for the decrease in the apparent density of the solution (due to the hygroscopicity of the salt) by weighing 1 mL of solution on a precision balance and adding salt until reaching the weight for the corresponding concentration (1.07 g/mL at 20°C). This solution can be prepared in advance and stored at 4°C for several months.

Tris-HCl, pH 7.5 and 8.0, 1 M solutions
Weigh 60.55 g Tris base and dissolve in 450 mL deionized water and adjust the pH to 7.5 or 8.0 by adding HCl. Adjust the volume to 500 mL with deionized water. These solutions can be prepared in advance and stored at room temperature for several months.

Spermidine, 2 M solution
Weigh 2.9 g spermidine and dissolve in a total volume of 10 mL DEPC-treated water. This solution can be prepared in advance and stored as small aliquots at −20°C for several months.

NaCl, 5 M solution
Weigh 146.1 g NaCl and dissolve in a total volume of 500 mL DEPC-treated water. This solution can be prepared in advance and stored at room temperature for months.

KCl, 2 M solution
Weigh 74.55 g KCl and dissolve in a total volume of 500 mL DEPC-treated water. This solution can be prepared in advance and stored at room temperature for months.

EDTA-Na, pH 8.0, 0.5 M solution
Dissolve 18.61 g EDTA in 80 mL DEPC-treated water and adjust the pH to 8.0 by adding 5 M NaOH. Adjust the volume to 100 mL with DEPC-treated water. This solution can be prepared in advance and stored at 4°C for several months. EDTA disodium salt will not fully dissolve until the pH of the solution is adjusted to ~8.0 by the addition of NaOH.

MOPS-K buffer, pH 6.5, 1 M solution
Dissolve 20.93 g MOPS in 75 mL DEPC-treated water and adjust the pH to 6.5 by adding 4 M KOH. Adjust the volume to 100 mL with DEPC-treated water. This solution can be prepared in advance and stored at room temperature for several months. When diluted to 8 mM, the pH of this solution will decrease to ~6.0.
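The masses given for the molar stock solutions above follow from mass = molar mass × molarity × volume. The sketch below recomputes a few of them so that the recipes can be rescaled to other volumes. The helper name is hypothetical, and the molar masses are standard reference values that are assumptions here (they are not listed in the protocol); in particular, the EDTA entry assumes the disodium salt dihydrate, which is the form consistent with the 18.61 g quoted above.

```python
def grams_needed(molar_mass_g_per_mol, molarity_mol_per_l, volume_ml):
    """Mass of solute (g) required for a solution of the given molarity and final volume."""
    return molar_mass_g_per_mol * molarity_mol_per_l * volume_ml / 1000.0

# Assumed molar masses (g/mol); check the values on the reagent bottle before use.
MOLAR_MASS = {
    "DTT": 154.25,
    "MgCl2·6H2O": 203.30,
    "Tris base": 121.14,
    "NaCl": 58.44,
    "KCl": 74.55,
    "EDTA disodium dihydrate": 372.24,
    "MOPS": 209.26,
}

# (reagent, molarity in mol/L, final volume in mL) as stated in the recipes.
recipes = [
    ("DTT", 1.0, 10.0),
    ("MgCl2·6H2O", 1.0, 10.0),
    ("Tris base", 1.0, 500.0),
    ("NaCl", 5.0, 500.0),
    ("KCl", 2.0, 500.0),
    ("EDTA disodium dihydrate", 0.5, 100.0),
    ("MOPS", 1.0, 100.0),
]

for name, molarity, volume_ml in recipes:
    grams = grams_needed(MOLAR_MASS[name], molarity, volume_ml)
    print(f"{name}: {grams:.2f} g for {volume_ml:.0f} mL at {molarity} M")
```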

Filtration buffer, 1×
This solution is composed of 8 mM MOPS buffer, pH 6.5, 100 mM KCl, and 0.1 mM EDTA-Na, pH 8.0, in DEPC-treated water. Mix 4 mL 1 M MOPS buffer, pH 6.5, 25 mL 2 M KCl, and 0.1 mL 0.5 M EDTA-Na, pH 8.0, in a total volume of 500 mL DEPC-treated water. Filter the filtration buffer through a 0.22-µm filter using a vacuum pump before use in an FPLC system. This solution can be prepared in advance and stored at room temperature for several months.
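The stock volumes in this recipe follow from the dilution relation C1 × V1 = C2 × V2. A minimal sketch, using a hypothetical helper name and concentrations expressed in mM, reproduces the volumes stated above and can be reused to scale the buffer to a different final volume.

```python
def stock_volume_ml(stock_mm, final_mm, final_volume_ml):
    """Volume of stock (mL) needed to reach final_mm in final_volume_ml (C1*V1 = C2*V2)."""
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mm * final_volume_ml / stock_mm

final_volume_ml = 500.0
filtration_buffer = {
    "1 M MOPS buffer, pH 6.5": stock_volume_ml(1000.0, 8.0, final_volume_ml),
    "2 M KCl": stock_volume_ml(2000.0, 100.0, final_volume_ml),
    "0.5 M EDTA-Na, pH 8.0": stock_volume_ml(500.0, 0.1, final_volume_ml),
}

for stock, volume in filtration_buffer.items():
    print(f"{stock}: {volume:g} mL")
```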

TE, pH 8, solution
Tris-EDTA (TE) solution is 10 mM Tris-HCl, pH 8.0, and 1 mM EDTA-Na, pH 8.0, in DEPC-treated water. Mix 1 mL 1 M Tris-HCl, pH 8.0, and 0.2 mL 0.5 M EDTA-Na, pH 8.0, in a total volume of 100 mL DEPC-treated water. This solution can be prepared in advance and stored at room temperature for several months.

Transcription buffers (10×)
The components of the transcription buffers used in this protocol are listed in Table 2. All solutions for transcription buffers 1-4 can be prepared in advance and stored as small aliquots (i.e., 100 µL) at −20°C for several months. BSA must be added fresh to transcription reactions that use transcription buffer 4.

Proteinase K storage buffer (1×)
This solution is 10 mM Tris-HCl, pH 7.5, 1 mM CaCl 2 , and 40% (vol/vol) glycerol. Mix 0.1 mL 1 M Tris-HCl, pH 7.5, 10 µL 1 M CaCl 2 , and 4 mL glycerol in a total volume of 10 mL DEPC-treated water. This solution can be prepared in advance and stored at 4°C for several months.

CaCl 2 , 1 M solution
Weigh 1.11 g CaCl 2 and dissolve in a total volume of 10 mL DEPC-treated water. This solution can be prepared in advance and stored at 4°C for several months.

Proteinase K, 30 mg/mL suspension
This solution is a 30 mg/mL suspension of lyophilized proteinase K powder in proteinase K storage buffer. Weigh 30 mg proteinase K using an analytical scale and dissolve it in a total volume of 1 mL proteinase K storage buffer. This solution can be prepared in advance and stored at −20°C for several months.

Orange G solution, 2% (wt/vol)
Weigh 1 g orange G and dissolve it in 50 mL of DEPC-treated water. This solution can be prepared in advance and stored at room temperature for several months.

RNA native loading dye (6×)
This solution is 0.5% (wt/vol) orange G, 0.5× TB buffer, and 40% (wt/vol) sucrose in DEPC-treated water. Weigh 20 g sucrose and dissolve it in 30 mL of DEPC-treated water. Add 12.5 mL 2% (wt/vol) orange G, and 2.5 mL of 10× TB buffer. Fill to a total volume of 50 mL with DEPC-treated water. This solution can be prepared in advance and stored at 4°C for several months.

DNA native loading dye (6×)
This solution is 0.5% (wt/vol) orange G, 10 mM EDTA and 50% (vol/vol) glycerol in DEPC-treated water. Mix 12.5 mL of 2% (wt/vol) orange G, 1 mL of 0.5 M EDTA-Na, pH 8.0, and 25 mL glycerol. Fill to a total volume of 50 mL with DEPC-treated water. This solution can be prepared in advance and stored at 4°C for several months.

MgCl 2 solutions (10×)
These solutions are 20, 50, and 100 mM MgCl 2 in DEPC-treated water. Measure 2 mL, 5 mL, and 10 mL 1 M MgCl 2 and mix each into a total volume of 100 mL DEPC-treated water. These solutions can be prepared in advance and stored at room temperature for several months.

SYBR Safe staining solution (1×)
This solution is 1:10,000 SYBR Safe in 1× TB buffer. Measure 5 μL of SYBR Safe dye into a total volume of 50 mL 1× TB buffer. This solution should be prepared fresh and can be reused up to three times.

Equipment setup

FPLC purification system with a self-packed high-performance column
Any FPLC system equipped with wavelength detection at 260 nm can be used. We use a Tricorn 10/300 empty high-performance column self-packed by gravity with Sephacryl S-500 HR resin for RNAs ranging from 1,000 to 2,000 nt long, although any new RNA being purified should ideally be tested on different size-exclusion resins to determine the best separation range. For additional references, we advise consulting the following website: gelifesciences.com/en/us/shop/chromatography/resins/size-exclusion. Before the first use with RNA, the FPLC system must be thoroughly cleaned by passing successively 0.5 L filtered DEPC-treated water supplemented with RNaseZap and 0.5 L of pure DEPC-treated water. Before and after each use, the column should be equilibrated with 3 column volumes (CV) of filtration buffer and 3 CV of DEPC-treated water, respectively.
c CRITICAL The first time the self-packed column is connected to the system, the resin will compact slightly because of the increase in pressure. The adaptor of the column must then be screwed in until the resin is in contact with the top coarse filter before the column can be used.

Cloning, in vitro transcription, and lncRNA purification • Timing 3 d (cloning); 3 h (in vitro transcription); 3 h (purification)
1 Clone the RNA of interest in a high-copy-number vector immediately downstream of a T7 promoter sequence. We have used the MEG3 v1 sequence as deposited in NCBI (NR_002766.2) and cloned it into a modified pBluescript vector 70,71 by sequence and ligation-independent cloning (SLIC) 72 . We named this vector pTU1 10 . For cloning, use LB Agar Petri dishes with the appropriate antibiotic (ampicillin, in the case of pTU1).
2 Linearize 100 µg vector with the appropriate restriction enzyme overnight.
j PAUSE POINT Linearized DNA can be stored at −20°C for months.
3 Set up an initial screening of in vitro transcription reactions by mixing the linearized vector from Step 2 or 12 with T7 polymerase and various buffers (transcription buffers 1-4, see Table 2) in a total reaction volume of 25 µL per transcription condition. This initial screening will determine which transcription condition provides the highest yield and homogeneity for a given lncRNA (Table 3). At this stage, estimate the yield qualitatively, as described in Steps 4 and 5.
c CRITICAL STEP Transcription buffers 1-4 contain different magnesium and salt concentrations, and the choice of one over the others is empirical and depends on the particular lncRNA of interest. For example, for MEG3 v1 the highest transcription yield was obtained with transcription buffers 3 and 4 (Extended Data Fig. 6a).
4 After transcription, spin down the reaction at 20°C and 21,130g (15,000 r.p.m.) for 5 min to pellet the precipitated pyrophosphate.
5 Mix 10 μL of the transcription reaction with 6× RNA native loading dye and load it onto a 1% (wt/vol) native agarose gel containing 1:10,000 (vol/vol) SYBR Safe solution (3 μL for a 30-mL agarose gel), together with 5 μL of Quick-Load purple 2-log DNA ladder. Run the gel at 100 V for 30-45 min, depending on the size of your RNA. When the run is complete, visualize the gel under UV light, using a Gel Doc or Chemidoc device. Although the DNA marker will not provide accurate size estimations, it will serve as an indication of the level of compaction of the RNA transcribed under the different transcription buffers, the relative yield, and the presence of any degradation subproducts.
Transcription yield can be considered satisfactory, that is, the yield is sufficient for the applications described downstream (Steps 20-100), when the lncRNA band on the gel is more intense than each individual band of the DNA ladder (provided that the exact volumes of sample and ladder as described above have been loaded onto the gel). We routinely obtain sample bands that are 10-100 times more intense than those of the ladder. Examples of agarose gels for MEG3 transcription screenings are shown in Extended Data Fig. 6a.
c CRITICAL STEP When the transcription yield is sufficient (see above), continue directly to Step 14. If the yield is low (i.e., sample bands on agarose gels no more intense than the DNA ladder), we recommend purifying the linearized DNA as described in Steps 6-12 and repeating the initial screening of transcription conditions as described in Step 13. Purification of the linearized vector causes partial loss of template DNA but may increase the yield of RNA transcription.
? TROUBLESHOOTING
6 (Optional) Purification of the linearized DNA (Steps 6-13). Extract the restriction reaction from Step 2 with one volume of phenol/chloroform/isoamyl alcohol (25:24:1 vol/vol/vol) followed by a second extraction with one volume of chloroform/isoamyl alcohol (24:1 vol/vol). For each extraction, mix the sample with an equal volume of solvent, vortex, and let decant on the bench until the water-soluble and solvent phases separate. Carefully aspirate the upper water-soluble phase containing the linearized DNA and transfer it to a clean tube.
c CRITICAL STEP Be conservative in the recovery of the water-soluble phase, especially in the second extraction. It is preferable to lose some DNA than to carry over chloroform/isoamyl alcohol contamination, which will inhibit the RNA polymerase and prevent transcription.
j PAUSE POINT Once optimal transcription conditions are identified, upscaling of transcription can be performed at a later date.
14 Repeat the transcription in the selected condition as described in Step 3, but in a larger volume, for example, 100 µL-1 mL.
15 After pyrophosphate precipitation (see Step 4), collect the supernatant in a clean Eppendorf tube.
16 Purify the resulting RNA following a non-denaturing protocol, as previously described 13 . Briefly, add 50 µL TURBO DNase to 1 mL transcription reaction and incubate for 30 min at 37°C. After this step, add 50 µL of the 30 mg/mL proteinase K suspension directly to the reaction tube and incubate for 45 min at 37°C. Finally, add 26.4 µL of EDTA-Na, pH 8.0, to eliminate free magnesium, which will otherwise produce unspecific aggregation of the RNA in the next purification step.
17 Separate the transcribed lncRNA from unreacted nucleotides, digested protein peptides, proteinase K, and all other transcription components by filtering with Amicon Ultra-0.5 centrifugal filter units with a 100-kDa molecular weight cut-off and filtration buffer to concentrate and rebuffer the sample successively. We typically split the 1-mL transcription reaction among several filter units and concentrate for at least three successive rounds of centrifugation for 5 min at 2,348g (5,000 r.p.m.) and 20°C.
18 Isolate a homogeneous form of the RNA of interest from other conformations, potential aggregates, and prematurely terminated transcripts by SEC using an FPLC purification system with a self-packed high-performance column equilibrated in filtration buffer. Use a constant flow rate of 0.5 mL/min to avoid changes in the compaction of the resin, which will decrease the reproducibility of the separation, and obtain fractions of 0.5 mL each. Keep the fraction corresponding to the highest peak of the chromatogram for further applications.
Measure its concentration on a spectrophotometer using the Beer-Lambert law and the extinction coefficient appropriate to the lncRNA of interest, as calculated from its sequence and the extinction coefficients of individual nucleotides reported in Table 1.
? TROUBLESHOOTING
19 Check the integrity of the RNA with a 2100 Bioanalyzer using the RNA 6000 Nano Kit. To do so, prepare an aliquot in the range of 25-500 ng/µL and run it against the provided Agilent RNA 6000 ladder to confirm that a single peak, corresponding to the expected size of the target lncRNA, is visualized. An example of a Bioanalyzer trace of purified MEG3 is shown in Extended Data Fig. 6b.
c CRITICAL STEP Do not denature the RNA, for example, by gel electrophoresis or ethanol precipitation. Refolding of lncRNAs results in highly heterogeneous preparations (Extended Data Fig. 3).
j PAUSE POINT Do not cool or freeze the RNA. Store at room temperature, if necessary. Follow-up experiments can be performed on different days; however, although the RNA is stable at room temperature for several days, we advise using newly transcribed, fresh RNA each time.
-38). A detailed description of the data analysis procedure has been provided elsewhere 13,73 . Briefly, when the run is finished, proceed to data analysis, which we routinely perform in SEDFIT 69 . Alternatives to SEDFIT are DCDT+ (http://www.jphilo.mailway.com/dcdt+.htm), SedAnal (http://www.sedanal.org/), Heteroanalysis (https://core.uconn.edu/files/auf/ha-help/HA-Help.htm), Svedberg (http://www.jphilo.mailway.com/svedberg.htm), and UltraScan (http://ultrascan.aucsolutions.com) 74 . The performance of some of these software alternatives for analyzing structured nucleic acids has been compared 57 .
32 Load the files acquired by the XL-A/XL-I user interface software into the SEDFIT software. Set the meniscus and cell bottom limits to indicate the limits of the analysis.
33 Set up a Continuous c(s) distribution analysis.
34 Introduce the following parameters for the fitting: the molar mass of the target lncRNA; the range of s values to display; the resolution (100 is default); the partial specific volume, which is 0.53 mL/g for RNA molecules 58 ; the buffer density and viscosity as calculated by SEDNTERP (http://www.jphilo.mailway.com/download.htm); and the frictional coefficient initial value (we recommend starting with a value of 2 for lncRNA analysis).
35 Click on the Run and then on the Fit menu functions to start optimizing the parameters. Repeat the fit with optimized parameters, as described previously 13 . Derive the sedimentation coefficient (s), the equivalent sedimentation coefficient at standard conditions (water, 20°C) [s(20,w)], and the frictional coefficient (f/f 0 ).
? TROUBLESHOOTING
36 Calculate the Stokes radius (R H ) of the lncRNA at each magnesium concentration, using the calculator option in SEDFIT.
37 Represent the Stokes radii as a function of the magnesium concentration and fit the curve to a Hill equation, using Prism 6:
$$R_H = R_{H,0} + \left(R_{H,f} - R_{H,0}\right)\frac{[\mathrm{Mg}^{2+}]^{n}}{K_{Mg}^{n} + [\mathrm{Mg}^{2+}]^{n}}$$
where R H is the Stokes radius (in ångströms), R H,0 and R H,f are the Stokes radii for unfolded and folded RNA, which correspond to the R H values determined at ~0 mM magnesium and at a concentration of magnesium higher than physiological (i.e., ≥50 mM), respectively. The K Mg value is the concentration of magnesium at which 50% of the RNA is folded, and n is the Hill coefficient, which indicates the cooperativity of the folding transition 58,75 . An example of a magnesium titration by AUC for MEG3 v1 is provided in Extended Data Fig. 4c.
38 To ensure folding in follow-up experiments, use a magnesium concentration higher than the K Mg calculated above.
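For readers who wish to check a Hill fit outside Prism, the sketch below fits K Mg and n to a magnesium titration, assuming the standard Hill form R H = R H,0 + (R H,f − R H,0) × [Mg]^n / (K Mg^n + [Mg]^n). It is an illustration only: the function names are hypothetical, a coarse grid search stands in for nonlinear regression, and the titration values are synthetic numbers generated from the model itself, not measurements.

```python
def hill_radius(mg_mm, r0, rf, k_mg, n):
    """Stokes radius predicted by the assumed Hill model at a given Mg2+ concentration (mM)."""
    if mg_mm <= 0.0:
        return r0
    folded_fraction = mg_mm ** n / (k_mg ** n + mg_mm ** n)
    return r0 + (rf - r0) * folded_fraction

def fit_hill(mg_values, radii, r0, rf, k_grid, n_grid):
    """Grid search for the (K_Mg, n) pair that minimizes the sum of squared residuals."""
    best = None
    for k_mg in k_grid:
        for n in n_grid:
            sse = sum((hill_radius(m, r0, rf, k_mg, n) - r) ** 2
                      for m, r in zip(mg_values, radii))
            if best is None or sse < best[0]:
                best = (sse, k_mg, n)
    return best[1], best[2]

# Synthetic titration (hypothetical): unfolded 120 Å, folded 80 Å, K_Mg = 5 mM, n = 2.
R0, RF = 120.0, 80.0
mg = [0.0, 1.0, 2.5, 5.0, 10.0, 25.0, 50.0]
radii = [hill_radius(m, R0, RF, 5.0, 2.0) for m in mg]

k_grid = [0.5 * i for i in range(1, 61)]   # 0.5 to 30 mM
n_grid = [0.25 * i for i in range(2, 17)]  # 0.5 to 4
k_fit, n_fit = fit_hill(mg, radii, R0, RF, k_grid, n_grid)
print(f"K_Mg = {k_fit} mM, n = {n_fit}")
```

With experimental data, R H,0 and R H,f would be taken from the radii measured at ~0 mM and at ≥50 mM magnesium, and a proper nonlinear regression with confidence intervals is preferable to the grid search used here.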
c CRITICAL STEP Accurate experimental determination of the magnesium concentrations required for folding the target lncRNA, as obtained via Steps 20-38, is extremely important because cations, and magnesium in particular, crucially affect folding of single-stranded structured RNAs 70,76-78 .
c CRITICAL STEP One can use straight pipette tips to reach the bottom of the cuvette more easily. Cuvettes of different volumes can be used, after ensuring compatibility with the operating instrument.
(viii) Analyze over a concentration range, for example, from 0.5 μM to 5.0 μM, starting from the highest concentration and diluting the sample at each measurement by adding filtered buffer to the cuvette, followed by gentle mixing. Perform all measurements at ambient temperature.
(ix) Open the Zetasizer Nano S software.
Step 39C(iii-xi), 3h;

Characterization of lncRNA monodispersity in solution
Step 39C(xii-xv), 0.5 h
(i) Data acquisition (Step 39C(i-xi)). To avoid interference of any impurities of the buffers with the assay, filter 1 L of DEPC-treated water and all the required buffers through a 0.1-µm filter using a vacuum pump.
(ii) To remove impurities present in the resin, which could interfere with the scattered light, equilibrate the self-packed column on the same FPLC device used for RNA purification (see Step 18) for 12 h with filtration buffer at a 0.1-mL/min rate.
(iii) Before hanging the column on the MALLS system, wash the system (including the injection loop) at 5 mL/min with 20 mL DEPC-treated water, followed by 20 mL 1% (vol/vol) RNaseZap in DEPC-treated water, 20 mL DEPC-treated water, and 20 mL filtration buffer.
(iv) Connect the column and wash for an additional 4 h in filtration buffer at a 0.4-mL/min rate. At this stage, the light scattering and refraction detectors should display a flat baseline.
(v) Prepare a sample of RNA eluted by SEC (Step 18) with a concentration ranging between 5 and 0.32 μM. This is an indicative concentration, which should be optimized for each target lncRNA. The sample should be filtered with a 0.22-μm Ultrafree GV Durapore filter to eliminate any aggregates. The goal is to obtain UV, scattering intensity, and index of refraction signals displaying sufficient signal-to-noise ratio without saturating the detectors.
j PAUSE POINT After the end of the run, data analysis can be performed at a later date.
(xii) Data processing (Step 39C(xii-xv)). When the run is finished, in the Astra software set the baseline for the light scattering and refraction index signals.
(xiii) Find the scattering peaks either using the Autofind Peaks option or manually by eliminating unwanted peaks and dragging peak bars around the peaks of interest.
(xiv) Obtain the molar mass and radius of gyration from the MALLS detector, considering that the refractive index increment (dn/dc) for RNA is 0.17.
(xv) Export and plot the distributions of UV intensity, refraction index, Rayleigh ratio, and molar mass versus the elution volume. Examples of SEC-MALLS chromatograms for MEG3 are shown in Extended Data Figs. 3a-c and 4d.
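Once the molar mass has been exported as a function of elution volume, the flatness of the molar mass trace across the peak, which is the signature of a monodisperse sample, can be checked numerically. The sketch below is a minimal illustration: the function names, the 5% tolerance, and the example traces are all hypothetical and should be adapted to the noise level of the actual data.

```python
import statistics

def molar_mass_spread(molar_masses):
    """Relative spread (population standard deviation / mean) of molar mass across a peak."""
    mean = statistics.fmean(molar_masses)
    return statistics.pstdev(molar_masses) / mean

def looks_monodisperse(molar_masses, max_rel_spread=0.05):
    """True if the molar mass stays within the assumed tolerance over the elution peak."""
    return molar_mass_spread(molar_masses) <= max_rel_spread

# Hypothetical molar mass traces (g/mol) sampled across one elution peak.
flat_trace = [5.45e5, 5.50e5, 5.48e5, 5.52e5, 5.47e5, 5.49e5]
sloped_trace = [9.0e5, 8.1e5, 7.2e5, 6.4e5, 5.6e5, 5.0e5]

print("flat trace monodisperse:", looks_monodisperse(flat_trace))
print("sloped trace monodisperse:", looks_monodisperse(sloped_trace))
```

A molar mass that decreases steadily across the peak, as in the second trace, would suggest co-eluting species of different sizes rather than a single homogeneous population.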

SAXS analysis • Timing 3-4 h (data acquisition); 3-4 h (data analysis)
c CRITICAL To remove impurities in the resin, which could interfere with the scattered light, equilibrate the self-packed column on the same FPLC device used for RNA purification (see Step 39C(i,ii)) for 12 h with filtration buffer at a 0.1 mL/min rate. Generally, a column with a column volume of 25 mL should be suitable for loading the amount of RNA required for the SEC-SAXS experiments (0.5-5 µM, see below). The same self-packed columns as previously described for MALLS can be used.
c CRITICAL Alternative procedures for SAXS analysis of lncRNAs also exist and were reported elsewhere during peer review of this manuscript 33 .
In ATSAS 79 , localize the raw scattering data files (.dat); these are text files that can be opened in a text editor or spreadsheet program, for example, Microsoft Excel, if needed. They contain three columns reporting scattering vector, experimental intensity, and experimental errors.
49 Among the raw data files, identify files to be used for background subtraction. These files should correspond to raw scattering data files with no signal, as judged from scattering intensity plots that show the scattering intensity of each frame versus the corresponding frame number.
50 Load background files in PRIMUS and click on Average.
c CRITICAL STEP Average and Merge perform similar operations with the following differences: Average requires the same number of data points in all frames and is best used to combine frames that are all on the same scale; Merge has no limitations in terms of number of data points per frame, scales the data, and minimizes noise, and it is thus best used for frames having different intensities, that is, the frames collected during elution from a SEC-SAXS run.
51 Click on 'Save' and save as 'averaged background file'.
52 Load all frames displaying scattering intensity.
c CRITICAL STEP Use only frames that correspond to the scattering of particles with constant R g value.
53 Subtract 'averaged background file' from each scattering file.
! CAUTION Do not subtract 'averaged background file' from averaged scattering files.
c CRITICAL STEP This step is better done with the following command in the ATSAS module DATOP: $ datop SUB sample.dat background.dat -o subtracted.dat
54 Average all subtracted files (see Step 50; use the same procedure as for the background files).
55 Click on 'Save' and save as 'averaged data file'.
56 Open 'averaged data file' in PRIMUS.
57 Inspect the resulting processing curves. To compare different datasets, use the SCATTER program.
58 Define an appropriate Guinier region.
59 Restrict the data range from the beginning of the Guinier region until the highest resolution range at which the data are not yet too noisy (typically until ~3 nm −1 for MEG3).
60 Save the 'cropped averaged data file'.
61 Open 'cropped averaged data file' in PRIMUS.
62 Determine R g and D max , by optimizing the smoothing factor alpha and D max to obtain the most accurate fit with the scattering curve.
63 Save the corresponding output file (.out), from which to plot relevant curves for reporting, for example, Log(I) versus s, and Guinier, Kratky, and pair distance distribution function (P(r)) plots. An example of SAXS data for purified MEG3 is shown in Extended Data Fig. 7a-c.
64 Ab initio shape determination (Step 64). Perform ab initio shape determination in the ATSAS module DAMMIF. Important output files produced by DAMMIF are:
• damfilt.pdb, which contains coordinate points present in all models.
• damaver.pdb, which contains coordinate points present in at least one model. damaver.pdb is always bigger than damfilt.pdb.
• damsel.log, which includes a cross-correlation analysis of all models. The model with the lowest normalized spatial discrepancy (NSD) value should be taken as the reference model, and it is the most accurate model to use for comparing with other techniques (e.g., AFM).
• damstart.pdb, which describes the occupancy of the coordinate points across generated models.
High-occupancy coordinates constitute the core of a flexible molecule; low-occupancy coordinates constitute the flexible regions. An example of the shape of MEG3 determined using DAMMIF is shown in Extended Data Fig. 7d. c CRITICAL STEP DAMMIF should be used on samples displaying high homogeneity and unambiguous calculations of R g and D max . Multiple runs with the same dataset testing different D max values from PRIMUS or multiple datasets of the same sample should result in similar structural models.
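The frame handling in Steps 50-54 and the Guinier estimate of Rg (Step 58) can be sketched numerically as follows. This is a minimal illustration on synthetic three-column data (scattering vector, intensity, error), not a replacement for PRIMUS or DATOP. The function names, the synthetic values (Rg = 5 nm, I0 = 100, buffer level = 2), and the Gaussian error propagation are assumptions made for the example; they do not describe what ATSAS does internally. The Guinier fit uses the standard approximation ln I(s) = ln I0 − (Rg^2 s^2)/3.

```python
import numpy as np

def average_frames(frames):
    """Average frames that share the same s grid (cf. 'Average' in Step 50).
    Errors are propagated as sqrt(sum(err^2)) / N (an assumption of this sketch)."""
    frames = np.asarray(frames, dtype=float)  # shape: (n_frames, n_points, 3)
    s = frames[0, :, 0]
    if not np.allclose(frames[:, :, 0], s):
        raise ValueError("all frames must share the same scattering-vector grid")
    intensity = frames[:, :, 1].mean(axis=0)
    error = np.sqrt((frames[:, :, 2] ** 2).sum(axis=0)) / len(frames)
    return np.column_stack([s, intensity, error])

def subtract_background(sample, background):
    """Subtract the averaged background from ONE sample frame (Step 53)."""
    out = sample.copy()
    out[:, 1] = sample[:, 1] - background[:, 1]
    out[:, 2] = np.sqrt(sample[:, 2] ** 2 + background[:, 2] ** 2)
    return out

def guinier_fit(curve, s_min, s_max):
    """Fit ln I = ln I0 - (Rg^2 / 3) s^2 over [s_min, s_max]; return (Rg, I0)."""
    s, intensity = curve[:, 0], curve[:, 1]
    mask = (s >= s_min) & (s <= s_max) & (intensity > 0)
    slope, intercept = np.polyfit(s[mask] ** 2, np.log(intensity[mask]), 1)
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; check the selected range")
    return np.sqrt(-3.0 * slope), np.exp(intercept)

# Synthetic, noise-free example data (arbitrary values, for illustration only).
s = np.linspace(0.01, 0.5, 50)                      # nm^-1
rg_true, i0_true, buffer_level = 5.0, 100.0, 2.0
particle = i0_true * np.exp(-(rg_true * s) ** 2 / 3.0)
err = np.full_like(s, 0.1)
background_frames = [np.column_stack([s, np.full_like(s, buffer_level), err])
                     for _ in range(5)]
sample_frames = [np.column_stack([s, particle + buffer_level, err])
                 for _ in range(3)]

averaged_background = average_frames(background_frames)           # Steps 50-51
subtracted = [subtract_background(f, averaged_background)         # Step 53
              for f in sample_frames]
averaged_data = average_frames(subtracted)                        # Step 54
rg, i0 = guinier_fit(averaged_data, s_min=0.01, s_max=0.2)        # Step 58
print(f"Rg = {rg:.3f} nm, I0 = {i0:.3f}")
```

Note that the background is subtracted from each frame before averaging, in line with the caution in Step 53. Real .dat files may carry header or footer lines that need to be skipped when loading them.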
Atomic force microscopy sample preparation • Timing 1 h
65 Sample preparation (Steps 65-68). Prepare the target lncRNA in partially folded forms and fully folded forms as described above. After SEC (Step 18), incubate with the appropriate magnesium concentration as for AUC (Step 38). We have used a concentration of 10 mM MgCl2 to ensure folding of MEG3 v1 in our experiments 10 .
CRITICAL STEP The concentration of magnesium is target dependent and needs to be determined experimentally, that is, by AUC, as described in Steps 20-38. Too-low magnesium concentrations (i.e., below the K_Mg value determined in Step 38 by AUC) will not ensure folding. Too-high magnesium concentrations may induce sample aggregation (visible also in the AUC plots, Step 38) and formation of salt crystals on the mica, which would compromise data acquisition (see also Extended Data Fig. 5).
66 To obtain denatured samples, precipitate the target lncRNA with isopropanol overnight at −20 °C, then resuspend it in deionized formamide, and finally dilute it with ethanol to reach the same final concentration as that of the samples diluted in buffer.
67 Use poly(A) RNA, dissolved in the same buffers as the target lncRNA, as a negative control. A concentration of 0.3 µg/mL poly(A) RNA is generally appropriate for imaging.
68 Use a highly structured RNA as a positive control. We used the Oceanobacillus iheyensis group II intron ribozyme, which has been crystallized 71,76-78,80 . Prepare the intron in the same way as the target lncRNA.
AFM data acquisition • Timing 0.5 h (mica preparation and cantilever mounting); 0.5 h (sample adsorption to the mica); ~2 d per condition (data acquisition)
69 Mica preparation and cantilever mounting (Steps 69-73). Glue a mica disk to the steel disk, using a double-sided adhesive tape.
CRITICAL STEP We recommend the use of freshly cleaved non-derivatized mica, which gave the best-quality images in our case (lncRNA MEG3). Mica derivatization with divalent metal ions (nickel) or silane chemistry ((3-aminopropyl)triethoxysilane (APTES)) is also possible and would allow for stronger RNA adsorption, but in our case it yielded lower-quality images (see Extended Data Fig. 5d).
70 Position the AFM probe holder on a flat surface, with the tip fastening mechanism facing you.
71 Gently grip an AFM probe from the middle with the tweezers of choice, noting where the cantilevers you need are located, because sometimes both sides of the probe contain cantilevers. We recommend using straight-edged blunt-ended tweezers.
72 Push down the spring-releasing mechanism of the probe holder and insert the probe beneath the fastening mechanism.
73 Make sure the probe is straight in the lodging space of the probe holder. A stereomicroscope is of particular
84 If necessary, turn off some automated ScanAsyst options, such as setpoint or gain. In fact, it is usually necessary to have some control over the imaging parameters.
85 Image in PFT mode at a ~1-Hz rate (at piezo oscillations of 2 kHz), with 512- or 1,024-px sampling (depending on the scanning size). With a small scan size (≤2 µm), use a setpoint of ~200 pN.
CRITICAL STEP When scanning at minimal forces, any vibration of the AFM can cause the AFM tip to lose contact with the sample surface. Use an efficient vibration-isolated AFM setup; for example, antivibration tables or heavy platforms supported by bungees. Acoustic isolation of the AFM can be efficiently achieved using an acoustic foam-coated casing for the microscope.
86 At high magnification, and because of the size of the samples (a few nanometers tall), the vertical range of the z piezoelectric scanner can be reduced (e.g., from 6 µm to 1 µm) to enhance the vertical resolution of the digitized signal.
AFM image analysis • Timing 5-6 h
95 Open the AFM images using Gwyddion. Correct them according to the protocol described in Steps 87-94. Draw a square ROI of 256 px (corresponding to ~256 nm in the reported imaging conditions) around each particle (Fig. 3e), so that only one particle is present in the ROI. Save the ROI as a new image (Fig. 3f).
96 Calculate the PSD within the ROI of the image. If using Gwyddion, choose 1D Statistical Functions - PSDF and choose the fast scan axis (x in our case) for the analysis (Fig. 3g).
97 Export the PSDF function for the chosen ROI as a text file in ASCII format. The text file contains two columns, one for the spatial frequencies and the other for the PSD amplitudes.
98 Average the selected ROIs in the software of your choice. We typically select 100 ROIs and average them using Excel in Microsoft Office. For averaging, place the columns corresponding to the PSD amplitudes into a new Excel sheet (one column per ROI) and then average the values row by row.
99 Plot PSD versus spatial frequencies of individual particles or after averaging in the software of choice. We typically use IgorPro. In IgorPro, change both axes to log-scale. Use the cursors function in the plot to select what now appear as linear decays in the f^−α region of the PSD (see equations in the 'Experimental design' section above). Use Analysis - Curve Fitting - power as the fitting function, selecting the option fit between cursors to fit within the desired range. Hold the offset at 0, effectively fitting Ax^−b to the data, with A and b as free parameters. Repeat the procedure for the other linear ranges in the PSD, including the low-frequency plateau.
100 The intersection points between the linear fits yield the characteristic frequencies of the particles, which correspond to the average diameter and the average size of any intra-particle feature, where present (Fig. 3h).
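Steps 98-100 can also be scripted rather than performed in a spreadsheet and IgorPro. The sketch below averages PSD columns row by row, fits Ax^−b within a chosen frequency window, and computes the intersection of two fits. It swaps in an ordinary linear regression in log-log space for IgorPro's nonlinear power fit, which is equivalent when the offset is held at 0 and the data are noise-free. The function names and the synthetic PSD (a plateau followed by an f^−2 decay with a corner at 1/80 nm^−1) are assumptions for illustration, and reading the inverse of a corner frequency as a length scale is a simplification.

```python
import numpy as np

def average_psd(psd_columns):
    """Row-by-row mean of PSD amplitudes, one column per ROI (Step 98)."""
    return np.asarray(psd_columns, dtype=float).mean(axis=1)

def fit_power_law(freq, psd, f_lo, f_hi):
    """Fit psd = A * f**(-b) for f_lo <= f <= f_hi (Step 99).
    Uses a straight-line fit in log-log space; returns (A, b)."""
    mask = (freq >= f_lo) & (freq <= f_hi)
    slope, intercept = np.polyfit(np.log10(freq[mask]), np.log10(psd[mask]), 1)
    return 10.0 ** intercept, -slope

def corner_frequency(fit1, fit2):
    """Frequency at which A1 * f**(-b1) equals A2 * f**(-b2) (Step 100)."""
    (a1, b1), (a2, b2) = fit1, fit2
    if np.isclose(b1, b2):
        raise ValueError("fits are parallel in log-log space; no intersection")
    return (a1 / a2) ** (1.0 / (b1 - b2))

# Synthetic PSD: flat plateau below the corner, f^-2 decay above it.
freq = np.logspace(-3, 0, 200)          # spatial frequency, nm^-1
f_corner_true = 1.0 / 80.0
single = np.where(freq < f_corner_true, 1.0, (freq / f_corner_true) ** -2.0)

# Three hypothetical ROIs differing only in overall amplitude.
psd_columns = np.column_stack([0.9 * single, 1.0 * single, 1.1 * single])
mean_psd = average_psd(psd_columns)

plateau = fit_power_law(freq, mean_psd, 1e-3, 5e-3)   # low-frequency plateau
decay = fit_power_law(freq, mean_psd, 5e-2, 5e-1)     # power-law decay
f_corner = corner_frequency(plateau, decay)
size_nm = 1.0 / f_corner
print(f"corner frequency = {f_corner:.5f} nm^-1, length scale = {size_nm:.1f} nm")
```

With experimental data, the fitting windows play the role of the IgorPro cursors and should be chosen by inspecting the log-log plot; noisy data will make the log-space and nonlinear fits differ slightly.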

Troubleshooting
Troubleshooting advice can be found in Table 4.

Anticipated results
This protocol typically produces milligram amounts of lncRNA (i.e., 2-3 mg pure MEG3 v1 per 1-mL transcription reaction) via a non-denaturing purification method (Steps 1-19). Appropriate lncRNA folding conditions are determined by SV-AUC (Steps 20-38). A successful SV-AUC run can be recognized by looking at the raw data distribution and residuals in the SEDFIT software. The heat map of the scans should display a gradient from blue to red colors, without a predominance of dark-blue scans (sample not completely sedimented) or red scans (in the latter case, fewer scans are enough to sediment the sample). If there has not been leakage, the meniscus should be located at about the 6-cm position, whereas the bottom of the cell should be at 7.2 cm, when using a Beckman Coulter Analytical XL-A/XL-I instrument equipped with an An-50 Ti analytical rotor, and Nanolytics Instruments cells and counterbalance. An example of SV-AUC data obtained for MEG3 v1 is shown in Extended Data Figs. 3a-c and 4d.
Folded lncRNAs should run as a single homogeneous sharp band on non-denaturing agarose gels. Additional sharp bands may also form, particularly at low magnesium concentrations; these probably indicate alternative discrete conformations of the sample (see 'Troubleshooting' section and Extended Data Fig. 4e). These bands should disappear, or at least substantially decrease in intensity, at higher magnesium concentrations. If the purified lncRNA shows smearing on the gel, this generally indicates unstructured lncRNAs or lncRNAs prone to degradation. Unstructured or degraded samples are not suitable for further structural investigation. Sample homogeneity, purity, and integrity can also be assessed by DLS (Extended Data Fig. 3d) and SEC-MALLS (Extended Data Figs. 3a-c and 4d). Homogeneous pure lncRNAs are then structurally analyzed in solution by SEC-SAXS, which enables determination of the Rg and Dmax values, as well as globularity of the targets (Step 63, Extended Data Fig. 7a-c) and of their 3D shape (Step 64, Extended Data Fig. 7d).

Table 4 | Troubleshooting (partial)
Solution: Measure PSD along the slow scanning axis after equalizing the background of the fast-scan lines using the Align rows function and median correction algorithm in Gwyddion.
Problem: Fit to power law not possible. Possible reason: Wrong range selected. Solution: Provide initial guesses for the fit, for example, −2 < α < −1.
Problem: No low-frequency plateau present. Possible reason: ROI too small around the particle. Solution: Depending on lncRNA particle density, select an ROI that contains only one particle of at least 256 nm.
Finally, the 3D topography of lncRNAs is visualized by AFM in different folding states (Steps 65-100; Fig. 4). Crucially, AFM analysis is guided by the use of specific controls. Polyadenylic acid serves as a control for unfolded RNAs, because although it forms agglomerates, polyA does not fold into a compact structure even in the presence of magnesium ions (Fig. 4). Instead, the group II intron ribozyme, which has been previously crystallized 71,76,77 , can serve as a control for folded RNAs (Fig. 4). Ribosomal RNAs could be used as an alternative to the group II intron.
Specific structural features of the AFM topographic images can be analyzed by PSD analysis. The corner frequencies of the PSD (intersection between low-frequency plateau and f^−α decay and intersections between different regions of the f^−α decay) provide a measure of the diameter and overall size of lncRNA molecules and of their structural subdomains (Fig. 3h). The following example illustrates what to expect from PSD analysis. A set of high-resolution AFM images of lncRNA particles on mica substrates will typically yield a collection of single-particle images such as the one depicted in Fig. 3d, which clearly displays a complex morphology. Qualitatively, the particle displays a central core, about 10 nm in diameter and higher than the rest, and three features branching out radially, at about 120° from each other, each spanning again about 10 nm outward. The entire particle can be circumscribed by a circle of radius ~50 nm in this case. The PSD analysis provides a quantitative estimate of this morphological complexity (Fig. 3f). Here, the pooled data from hundreds of particles yield an average curve that exhibits a set of characteristic frequencies, corresponding to length scales of ~80 nm and ~25 nm, which respectively reflect the average lateral size of the entire lncRNA particle on the mica and the presence of smaller features, on the order of a few tens of nanometers, which give rise to the aforementioned internal morphology. Although more complex choices for characterizing the morphology of the particles are certainly possible, PSD analysis is thus a robust and straightforward way to appraise and quantitate the basic morphological features shared among a large dataset.
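To make the link between a lateral length scale in the image and a feature in the PSD concrete, the following sketch computes a one-dimensional PSD along the fast-scan axis of a height map and averages it over all scan lines. It is an illustration under stated assumptions, not a reimplementation of Gwyddion's PSDF: the function name, the normalization, and the synthetic test image (a sinusoidal corrugation with a 32-nm period at 1 nm per pixel) are all chosen for the example, and Gwyddion may use a different normalization or windowing.

```python
import numpy as np

def psd_fast_axis(height, pixel_size_nm):
    """1D power spectral density along the fast-scan (row) axis,
    averaged over all rows. Returns (spatial frequency in nm^-1, PSD).
    The normalization used here is one common convention (an assumption)."""
    height = np.asarray(height, dtype=float)
    n = height.shape[1]
    # Remove the mean of each scan line so the zero-frequency term vanishes.
    rows = height - height.mean(axis=1, keepdims=True)
    spectrum = np.fft.rfft(rows, axis=1)
    psd = (np.abs(spectrum) ** 2).mean(axis=0) * pixel_size_nm / n
    freq = np.fft.rfftfreq(n, d=pixel_size_nm)
    return freq[1:], psd[1:]  # drop the zero-frequency bin

# Synthetic 256 x 256 px ROI at 1 nm/px with a 32-nm periodic corrugation.
pixel_size_nm = 1.0
x = np.arange(256) * pixel_size_nm
height = np.tile(np.sin(2.0 * np.pi * x / 32.0), (256, 1))

freq, psd = psd_fast_axis(height, pixel_size_nm)
peak_freq = freq[np.argmax(psd)]
print(f"PSD peak at {peak_freq:.5f} nm^-1, i.e. a period of {1.0 / peak_freq:.1f} nm")
```

A periodic test pattern gives a sharp peak, whereas an isolated particle gives the plateau-and-decay shape described above; in both cases the inverse of the characteristic frequency reports the lateral length scale.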

Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All data generated or analyzed during this study are included in the paper and its Supplementary Information and are available from the corresponding author on request.

Corresponding author(s): Marco MARCIA. Last updated by author(s): Mar 18, 2020.