Led-Seq: ligation-enhanced double-end sequence-based structure analysis of RNA

Abstract Structural analysis of RNA is an important and versatile tool to investigate the function of this type of molecules in the cell as well as in vitro. Several robust and reliable procedures are available, relying on chemical modification inducing RT stops or nucleotide misincorporations during reverse transcription. Others are based on cleavage reactions and RT stop signals. However, these methods address only one side of the RT stop or misincorporation position. Here, we describe Led-Seq, a new approach based on lead-induced cleavage of unpaired RNA positions, where both resulting cleavage products are investigated. The RNA fragments carrying 2′, 3′-cyclic phosphate or 5′-OH ends are selectively ligated to oligonucleotide adapters by specific RNA ligases. In a deep sequencing analysis, the cleavage sites are identified as ligation positions, avoiding possible false positive signals based on premature RT stops. With a benchmark set of transcripts in Escherichia coli, we show that Led-Seq is an improved and reliable approach based on metal ion-induced phosphodiester hydrolysis to investigate RNA structures in vivo.


INTRODUCTION
RN A is an extremel y versatile molecule, despite its limited number of building blocks. It performs various tasks in many biological processes ranging from encoding genetic information as mRNAs to deli v ering amino acids to the ribosome as tRN As, catal yzing chemical r eactions, r egulating gene expression and shielding viral RNA from degradation by host RNases, just to name examples (1)(2)(3). The functional di v ersity of RNA is based on helical arrangements comprising stacking interactions and base pairing that form both local structural motifs and long range interactions ( 4 ).
RNA structure formation is energetically dominated by canonical (Watson-Crick and GU-wobble) pairs forming helical stem regions separated by unpaired stretches of nucleotides. Such secondary structures also appear as intermediates during the folding process before additional interactions stabilize the three-dimensional structure ( 5 ). RNA secondary structure, i.e. the arrangement of canonical base pairs, can be computed based on an energy model that considers sequence-specific stacking of adjacent base pairs and entropy-dri v en, destabilizing contributions of loops ( 6 ). Efficient dynamic programming algorithms have been devised that compute minimum free energy (MFE) structures ( 7 ) and partition functions ( 8 ). Deviations from this simple nearest neighbor model, inaccuracies of energy parameters in particular large and multi-branched loops, tertiary interactions including pseudoknots, the limited understanding of the effects of salt concentrations and temperature limit the accuracy of the thermodynamic model. Additional factors such as interactions with proteins, metal ions and other ligands as well as the cellular localization of RNA molecules like wise hav e an impact on the conformational sta tes ( 9 , 10 ). Tha t makes computa tional prediction dif ficult and highlights the need for additional data to infer reliable RNA structures. Chemical probing methods provide information on the base pairing status of individual nucleotides. While it is well known that this information alone is insufficient to uniquely determine a secondary structure, it can be readily combined with thermodynamic prediction algorithms in the form of position-specific constraints or 'bonus energies' to guide the reconstruction of the biolo gicall y r elevant structur e (11)(12)(13). It is crucial, ther efor e, to de v elop and improve methods to determine RNA structure in vivo to deepen our understanding of how it is forming and how this structure can be predicted.
A rapid de v elopment has tak en place in the last tw o decades regarding the determination of RNA secondary and tertiary structure (re vie wed in (14)(15)(16)). Although biophysical methods such as X-ray crystallo gra phy, NMR and cryo-EM also have the potential to provide detailed insights into RNA structure, they are limited to in vitro applications. The possibility to structurally investigate the entirety of RNAs in vivo was pioneered with the de v elopment of SHAPE ( 17 ). This method and its deri vati v es e volv ed ov er the years utilizing different reagents, like NMIA, 1M7 or BzCN, to modify RNA depending on its structure ( 18 ). The coupling of SHAPE and other previously established structure probing methods with capillary electrophoresis and later high throughput sequencing allowed massi v ely parallel investigation of RNA structures, enabling a global assessment of the ' in vivo structurome' (19)(20)(21). Most methods target nucleotides in single-stranded and conformationally fle xib le RNA regions that are more accessible to chemical modification than base-paired regions. Chemical modification gi v es rise to either re v erse transcription (RT) stop (-seq approaches) or mutation signals (-MaP approaches) in cDNA. These signals can be read out as a reactivity score and be used to infer RNA structure ( 10 ). Structuredependent cleavage is an attracti v e alternati v e to the covalent modification strategy. Compared to enzymatic cleavage, which is to date limited to in vitr o applica tions ( 22 , 23 ), divalent metal ions promise to be an interesting alternati v e for in vivo investigations.
In particular Pb 2+ is well established in RNA structure probing both invitro and invivo ( 24 ). Briefly, this method relies on structure dependent cleavage catalyzed by hydrated Pb 2+ ions that abstract the proton from the ribose 2 -OH group. In an in-line configuration, the 2 -O − group attacks the phosphorous in the sugar-phosphate backbone, resulting in cleavage of the RNA with fragments carrying 2 ,3cyclic phosphate (2 ,3 -cP) and 5 -OH ends. As fle xib le regions ar e mor e likely to adopt the in-line conformation, this reaction occurs predominantly in single-stranded regions. Enhanced cleavage can also occur at metal ion binding sites ( 25 , 26 ). Lead-seq ( 27 ) combines the established in vivo Pb 2+ probing approach ( 28 ) with next generation sequencing (NGS) to allow its use in a high-throughput setting. It serves as a promising starting point to broaden the spectrum of methods to investigate the RNA structurome in vivo . The incorporation of a lead cleavage score as 'bonus energy' in the minimum free energy structure computation resulted in a distinct improvement of the predicted structure of se v eral tRNAs. Like most NGS-coupled probing approaches, Lead-seq uses random fragmentation to generate a more homo geneousl y sized pool of RNAs. This introduces additional strand breaks, which may obscure the actual cleavage signal e v en though a negati v e control can reduce this effect to a certain degree. The 5 phosphorylation and 3 dephosphorylation of the RNAs with T4 polynucleotide kinase prior to adapter ligation impedes the distinction of the 5 and 3 phosphorylation status of the ligated RNAs. In this manner, all cellular RNAs are potentially in-cluded in the libraries, which negati v ely impacts the specificity of the method. Lead-seq thus is a promising concept with ample potential for improvement.
Here we propose a modified procedure that improves the specificity and validity of this method. Cleavage by divalent metal ions introduces strand breaks tha t genera te fragments with distinct end groups, 2 ,3 -cP and 5 -OH. For the specific capture of these fragments, we utilized the unique features of two RNA ligases to mark the cleavage positions via ligation of sequencing adapters. The 5 fragments, carrying a 2 ,3 -cP, ar e captur ed by an Arabidopsis thaliana tRNA ligase ( Ath RNL) variant. The wild-type enzyme is able to ligate 2 ,3 -cP and 5 -OH via its 2 ,3 cyclic phosphodiesterase and 5 kinase activity ( 29 , 30 ). To ensure specific ligation of a pre-adenylated adapter to the substrate RNA, two mutations were introduced tha t inactiva te the enzyme's ability to phosphorylate and adenylate 5 -OH groups ( 31 ). This Ath RNL AA double m utant recentl y pro ved to k eep its specificity for 2 ,3 -cP carrying substrates in a ribozyme activity screen ( 32 ). The corresponding 3 fragments, carrying 5 -OH ends, ar e captur ed in a similar manner utilizing Esc heric hia coli ( Eco ) RtcB. This ligase and its homologs also have the ability to directly ligate 5 -OH and 2 , 3 -cP but exhibit a unique mechanism without the necessity of phosphorylating the RNA 5 end ( 33 ). It also proved to be applicable to library preparation in previous studies ( 34 , 35 ). The combination of two separate ligation approaches and analysis of cleavage positions from both sides allows mutual validation of the identified cleavage sites. Moreover, the use of reads from both the 5 and 3 side ensures that one of the libraries remains informati v e close to the transcript ends, wher e the r eads from the other library are too short for unambiguous mapping.

Protein purification
Ath RNL K152A D726A was expressed with 6xHisTag from a pET28a vector kindly gifted by Christina Weinberg in E. coli BL21 (DE3) codon PLUS RIPL cells. Sacchar om y ces cer evisiae Tpt1 was expressed from a pET-24b vector in E. coli BL21 (DE3) cells. Both enzymes were expressed and purified as described ( 32 ). T4 RNL 2 truncated KQ and TS2126 RNL expr ession plasmids wer e gifted by Jan Medenbach and the proteins were expressed and purified as described previously ( 36 , 37 ). Eco RtcB was expressed with 6xHisTag from a pET-53 vector (Addgene #51282) in E. coli BL21 (DE3) according to ( 38 ). Briefly, transformed cells were cultivated in TB medium with 100 g / ml ampicillin at 37 • C to an OD 600 of 0.6. Cultures were chilled on ice for 30 min before induction with 0.1 mM IPTG and addition of ethanol to a final concentration of 2%. The induced cells were cultivated for 18 h at 16 • C and harvested by centrifugation. Pellets were stored at −80 • C until use. Cell pellets were resuspended in 25 ml ice-cold lysis buffer (50 mM Tris-HCl, pH 7.4, 250 mM NaCl, 10% (w / v) sucrose, 0.2 mg / ml lysozyme) and incubated for 1 h at 4 • C. Then Triton-X100 was added to a final concentration of 0.1%. Cells were disrupted by sonica tion a t 70% intensity, 7 × 10 s with 20 s breaks. The lysate was centrifuged, the supernatant sterile-filtered and used for purification. All

PAGE 3 OF 15
Nucleic Acids Research, 2023, Vol. 51, No. 11 e63 further purification steps were carried out on an Ä KTA pure protein purification system starting with metal ion af finity chroma to gra phy on a HisTra pFF 1 ml column (Cytiva). To this end, the column was equilibrated with 5 column volumes (CV) of binding buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 10% glycerol) containing 25 mM imidazole. After loading the sample, the column w as w ashed with 10 CV binding buffer with 25 mM imidazole, followed by a second wash step with 10 CV wash buffer (50 mM Tris-HCl, pH 7.4, 2 M KCl).
Step-wise elution was carried out using 5 CV binding buffer containing 100, 300 and 500 mM imidazole. Fractions containing Eco RtcB ligase were identified via SDS-PAGE and Coomassie staining, pooled and subsequently purified by size exclusion chromato gra phy on a HiLoad 16 / 60 Super de x 75 pg column with SEC buffer (10 mM Tris-HCl, pH 8.0, 350 mM NaCl, 1 mM DTT). Fractions containing the desired protein were identified via SDS-PAGE, pooled and concentrated with a Vivaspin 6 column (MWCO 10 kDa) and stored at −80 • C until use.
T4 polynucleotide kinase R38A was expressed with 6xHisTag from a pET-53 vector in E. coli BL21 (DE3) according to ( 39 ). Transformed cells were grown in TB medium containing 100 g / ml ampicillin at 37 • C to an OD 600 of 0.6. The cultures were chilled on ice for 30 min before induction with 0.3 mM IPTG and cultiva tion a t 16 • C for 18 h. Cells were harvested via centrifugation and stored at −80 • C until use. The cells were resuspended in 25 ml ice-cold lysis buffer (50 mM Tris-HCl, pH 7.5, 1.2 M NaCl, 15 mM imidazole, 10% (v / v) glycerol, 0.2 mM phen ylmethylsulphon yl fluoride, 1 mg / ml lysozyme) and incubated 1 h at 4 • C. Triton-X100 was added to a final concentration of 0.1%. Cells were disrupted by sonica tion a t 70% intensity, 7 × 10 s with 20 s breaks. The lysate was centrifuged, the supernatant sterile-filtered and used for purifica tion. Metal ion af finity chroma to gra phy was carried out on an Ä KTA pure protein purification system using a HisTrapFF 1 ml column (Cytiva). The binding buffer contained 50 mM Tris-HCl, pH 7.5, 200 mM NaCl and 10% glycerol. For step-wise elution, the 5 CV buffer additionally contained 125, 300 and 500 mM imidazole. The fractions with T4 PNK R38A were identified by SDS-PAGE, subsequently pooled and dialyzed twice against 1 l dialysis buffer (10 mM Tris-HCl, pH 7.4, 50 mM KCl, 1 mM DTT). Glycerol mix was added to reach storage conditions (10 mM Tris-HCl, pH 7.5, 50% (v / v) glycerol, 0.2 mM EDTA, 1 mM DTT, 50 mM KCl, 0.2 M ATP). Aliquots of the protein solution wer e stor ed at -80 • C. Plasmids encoding the enzymes used in this work are either available at Ad dgene (ad dgene.org) or from the authors upon request.

Oligonucleotide pr epar ation
Oligonucleotides were ordered from biomers (Ulm, Germany) and Microsynth (Balgach, Switzerland). 3 Adapter, RT-Primer and circularization RT-Primer were 5 labeled using [ ␥ 32 P]-ATP and T4 PNK (NEB) and purified via denaturing PAGE and ethanol precipitation. Nonradioacti v el y labeled ada pters and RT-Primers wer e pr epared in the same way with ATP and T4 PNK, using T4 PNK (3 phosphatase minus) for the 5 adapter. The 5 phos-phorylated and 3 blocked 3 adapter was pre-adenylated with TS2126 RNL. In one 20 l reaction, 2.5 M TS2126 RNL, 100 pmol adapter (10 pmol end-labeled with 32 P) and 0.5 mM ATP were incubated in 1 × adenyla tion buf fer (50 mM MOPS, pH 7.5, 10 mM KCl, 5 mM MgCl 2 ) supplemented with 2.5 mM MnCl 2 and 1 mM DTT for 1 h at 60 • C. The reaction was stopped by heat inactivation for 10 min at 80 • C followed by preparati v e denaturing PAGE and ethanol precipitation. For details on oligomers, see Supplementary Information.

Sample pr epar ation f or in vivo lead probing
In vivo lead probing was performed as described ( 27 , 28 , 40 ). Briefly, LB medium was inoculated with an E. coli DH5a overnight culture to a starting OD 600 of 0.06, and cells were grown to an OD 600 of 0.5 at 37 • C. Lead-II-acetate solutions were freshly prepared by mixing 3 volumes of a lead-II-acetate stock solution with 1 volume 4 × LB medium and pre-warming to 37 • C. After reaching the desired density (OD 600 0.5), 20 ml of each main culture were mixed with LB-lead-II-acetate solutions to a final concentration of 75 mM and incubated for 7 min. In Pb 2+ (-) samples, lead-II-acetate was replaced by autoclaved, deionized water (dH 2 O). The reaction was stopped by adding 10 ml icecold 500 mM EDTA. Cells were immediately pelleted and RNA was isolated using peqGOLD TriFast ® (VWR) according to the manufacturer's instructions and precipitated from the aqueous phase by adding twice the volume of icecold isopropanol. The pellet was resuspended in dH 2 O and incubated with 2 U DNase I (NEB). RNA was r ecover ed by phenol / chloroform extraction and ethanol precipitation ( 41 ). Recovered RNA was redissolved in dH 2 O and stored a t -80 • C .

Specific adapter ligation
Total RNA was used for 2', 3'-cP mapping via specific adapter ligation with Ath RNL K152A D726A and for 5 -OH mapping via 3 dephosphorylation with T4 PNK R38A and subsequent 5 -OH specific adapter ligation with Eco RtcB.
2 , 3 -cP capture. 35 ng total RNA wer e pr e-incubated with 20 pmol pre-adenylated 3 adapter for 5 min at 65 • C and immediately put on ice for at least 1 min. Subsequently, the mixture was incubated in 1 × reaction buffer (20 mM Tris-HCl, pH 7.5, 5 mM MgCl 2 , 2.5 mM spermidine, 100 M DTT) and 20% (v / v) PEG8000 with 12 pmol Ath RNL K152A D726A for 2 h at 25 • C in a volume of 16 l. In the ligation reaction, the 2', 3'-cP is converted into a 2'-P group that can interfere with re v erse transcription. To remove this obstacle, 10 pmol S. cerevisiae tRNA 2'-phosphotr ansfer ase Tpt1 and 1 mM NAD were added to the ligation mixture. The volume was adjusted to 20 l with dH 2 O and reaction buffer and the samples were incubated for 30 min a t 30 • C . The liga ted and 2'-dephosphorylated RNA was r ecover ed using the Monarch ® RNA clean up kit (NEB) and used as template for re v erse transcription. 5 -OH capture. 70 ng total RNA were incubated in 1 × PNK buffer (NEB) with 1 mM ATP and 10 pmol T4 PNK R38A for 30 min at 37 • C in 15 l total volume. The 3 dephosphorylated RNA was r ecover ed with the Monar ch ® RNA clean up kit (NEB) using the protocol to bind RNA down to 15 nt length. The RNA was eluted in 6 l dH 2 O and 3 l thereof were pre-incubated with 20 pmol 5 adapter for 5 min at 65 • C and put on ice for 1 min. Subsequently, the samples were incubated in 50 mM Tris-HCl (pH 7.4), 2 mM MnCl 2 , 100 M GTP and 20% (v / v) PEG8000 with 50 pmol Eco RtcB for 1 h at 37 • C in 20 l total volume. The ligated RNA was r ecover ed again with the Monarch ® RNA clean up kit and used for 3 adapter ligation. To this end, the ligated RNA was mixed with 20 pmol preadenyla ted 3 adapter, pre-incuba ted for 5 min a t 65 • C and put on ice for 1 min. The mixture was incubated in 1 × T4 ligase buffer (NEB) and 20% PEG8000 (v / v) with 20 pmol T4 RNL 2 truncated KQ for 2 h at 25 • C. The ligated RNA was extracted using the Monarch ® RNA cleanup kit and used as template for re v erse transcription.

Reverse transcription
The ligated and r ecover ed RNA from both strategies was re v erse transcribed with Superscript IV re v erse transcriptase (Thermo). Reactions of 20 l were set up according to the manufacturer, using RT-Primer for 5 -OH samples. For 2 ,3 -cP samples, a biotin-dNTP-Mix (final concentration 500 M dATP / dGTP / dTTP each, 350 M dCTP, 150 M biotin-16-dCTP (Jena Bioscience)) and circularization RT-Primer was used. For both strategies, 20 pmol of the respecti v e primer were used containing trace amounts of the same 5 labeled primer for product detection. After incubation at 55 • C for 10 min, the template RNA in the reaction mixes was degraded by adding NaOH to a final concentration of 250 mM and incubation for 3 min a t 95 • C . The reaction mix was neutralized with 250 mM HCl, cDNA was extracted and size-selected via preparati v e denaturing PAGE. A 32 Plabeled size standard was used to identify cDNA above the size of the used RT-Primer + UMI sequence. The cDNA was eluted from the gel and precipitated with ethanol and 10 g / ml LPA (Thermo) as carrier. Samples of the 2 ,3 -cP strategy wer e cir cularized (see below), 5 -OH cDNA was r edissolved in 20 l dH 2 O and directly used for amplification and introduction of flow cell linkers and indices.

Circularization of cDNA and streptavidin bead cleanup
The cDNA for 2 , 3 -cP library construction, redissolved in 20 l dH 2 O, was incubated for 2 h at 60 • C in 1 × adenyla tion buf fer supplemented with 2.5 mM MnCl 2 , 10 mM DTT and 50 M ATP using 2.5 M TS2126 RNL ( 42 ). The mixture was hea t-inactiva ted a t 80 • C for 10 min, adjusted to 100 l with 1 × wash / binding buffer (20 mM Tris-HCl, pH 7.5, 1 mM EDTA, 0.5 M NaCl) and directly used for purification with Hydrophilic Streptavidin Magnetic Beads (NEB). Per reaction, 16 l beads were prepared: the beads were washed three times by resuspending them in 160 l wash / binding buffer each time and removing the supernatant while placing the tubes on a magnetic rack.
The washed beads wer e r esuspended with the circularization r eaction mixtur e and incubated for 20 min at room temperatur e with car eful mixing e v ery 5 min. After incubation, samples were spun down, supernatants were discarded and the beads were resuspended by pipetting. The beads where washed three times with 500 l wash / binding buffer. Each wash included the following steps: resuspension of the beads by pipetting, brief centrifugation in a desktop centrifuge, again car eful r esuspension of the bead pellet by pipetting, transfer of the sample to a new tube (this step turned out to be crucial to avoid carry over of nonbiotinylated cDNA), using a magnetic rack when removing the supernatants. The beads were finally resuspended in 20 l dH 2 O each and directly used for amplification and introduction of flow cell linkers and indices.

Introduction of flow cell linkers and indices
Flow cell binding sequences and indices were introduced via PCR. One reaction mix contained 2.5 l of the cDNA template solution, 1 × Phusion HF buffer (Thermo), 200 M dNTPs, 0,5 M Illumina PCR and Index Primer and 0.02 U / l Phusion ® high-fidelity polymerase (Thermo) in a volume of 25 l. To samples containing circularized cDNA templates, 5 M of a PNA clamp were added to reduce the accumulation of a side product resulting from r esidual prolonged cir culariza tion RT-Primer as templa te in this reaction. Cycling conditions were as follows: initial dena tura tion a t 98 • C for 30 s, followed by 15 (5 -OH strategy) / 18 (2 ,3 -cP strategy) cycles of 98 • C f or 10 s, 80 • C f or 20 s, 60 • C for 20 s and 72 • C for 20 s. The additional annealing step at 80 • C was introduced to ensure optimal PNA binding in the 2 , 3 -cP libraries ( 43 ). The amplified libraries were purified by preparati v e nati v e PAGE, e xcised and eluted from the gel, and precipitated with ethanol and 10 g / ml LPA as carrier. The final libraries were sequenced on an Illumina NovaSeq 6000 (Azenta). The experimental protocol is illustrated in Figure 1 .

Read preprocessing and mapping
Sequencing quality of pair ed r eads was evaluated using MultiQC ( 44 ). 3 adapter sequences were removed from read1 and read2, and pairs were filtered for a correct UMI sequence in read2 with cutadapt v2.10 ( 45 ). A minimum read length of 12 nt was set to enable unambiguous mapping. An additional fixed sequence between the UMI and RNA insert was removed from the 5 -end of read2 and if necessary from the 3 -end of read1 by cutadapt . Pr eprocessed r eads wer e mapped to the E. coli genome (NZ CP026085.1) with segemehl v0.3.4 ( 46 , 47 ). Libraries were deduplicated with umi tools v1.0.1 ( 48 ) and filtered for primary hits of properly mapped read pairs. After an initial sample composition analysis, selected (multi-copy) genes were masked by substitution with 'N' and one copy was attached to the end of the genome to facilitate unique mapping of reads. As we wanted to focus our analysis on highly r epr esented transcripts, this procedur e included all tRNA and rRNA genes. For details on the generated genome and corresponding transcriptome annotation file, see Supplementary Information. Mapping and deduplication steps were repeated accordingly. Left: Mapping of 2 ,3 -cP carrying cleava ge fra gments via specific liga tion of a pre-adenyla ted RNA 3 adapter using Ath RNL AA. The 3 adapter carries an 8 nt unique molecular identifier (UMI) to account for PCR bias. After ligation, RNA is re v erse transcribed using the circularization RT-Primer and biotinylated dCTP (B-dCTP). The resulting cDNA is circularized by TS2126 RNL and extracted with hydrophilic streptavidin magnetic beads (brown circular symbol denoted 'A'), removing residual circularization RT-Primer in the process. Circularized, bead-bound cDNA is used as a PCR template to introduce flow cell linkers (P5, P7) and 6 nt indices. A PNA PCR clamp is utilized to minimize the amplification of circularized products without cDN A insert. Right: Ma pping of 5 -OH carrying cleavage fragments via specific ligation of a biphosphorylated RN A 5 ada pter using Eco RtcB. RN A was previousl y 3 dephosphorylated with T4 PNK R38A to pre v ent ligation of RNA fragments with 2 ,3 -cP to 5 -OH ends. The 3 adapter is subsequently ligated to the RNA using T4 RNL II KQ truncated, followed by re v erse transcription and amplification via PCR.

Intensity of the probing signal
We filtered for reads that mapped uniquely in proper pairs and subsequently considered only hits that mapped to nonoverlapping annotated regions ( bedtools v2.27.1 ( 49 )) to ensure an unambiguous signal. In the 2 -3 -cP libraries, the last nucleotide of a ligated RNA fragment r epr esents the position immediately upstream of the cleavage site in the RNA backbone. Correspondingly, the first nucleotide of a fragment in 5 -OH libraries r epr esents the position downstr eam of the cleavage site. The raw probing signal was obtained for each nucleotide of the transcriptome by counting the number of read starts (or start-1, respecti v ely) a t tha t position. Normalization of the raw signal was performed separately for each transcript. According to Low and Weeks ( 50 ), we divided the raw read count by the average count of the 90 th to 98 th percentile of the signal. This is motiv ated b y considering the largest 2% of the signals as outliers. We denote the normalized signal for position i by S i . Its range is limited to the interval [0,7] because very high values of outliers were capped at 7. Where applicable, mean values of replicates were used for all downstream analyses. The workflow for the computation of the normalized probing signal from raw sequencing reads is implemented as a snakemake v3.13.3 pipeline ( 51 ), which is available at github.com / xamiiii / Led-Seq.

Estimating the probability to be unpaired
We use a bayesian approach to estimate the probability q i that position i is unpaired based on the normalized signal S i . To this end, we employ a collection of r efer ence structures comprising 32 RNAs of lengths 74 nt -682 nt. This set includes non-coding RNAs that belong to Rfam families ( 52 ) and that are sufficiently r epr esented by our data, see coverage criteria below. The small-subunit rRNA (16S) and the large-subunit rRNA (23S) were divided into smaller domains as described ( 53 , 54 ). Secondary structures for the resulting 40 sequences were taken from the RNAcentral data base (see Supplementary Table S1 and Supplementary Inf ormation f or full details). Denote by n u ( S ) and n ( S ), the number of unpaired positions and the total number of positions, respecti v ely, that exhibit a normalized signal in the calibration set that falls within a bin, i.e. an interval of signal values, centered at S . The probability that a position with a signal that falls within this bin is unpaired can then be estimated by In a more elaborate model, we combine the two signals S cP and S OH of the 2 , 3 -cP and 5 -OH libraries. We then consider where n u ( S cP , S OH ) and n ( S cP , S OH ) are the counts of unpaired and all positions, respecti v ely, in the calibration set that have signal values for the 2 , 3 -cP and 5 -OH libraries within intervals centered at S cP and S OH , respecti v ely. In order to reduce the effects of inaccuracies in the r efer ence structures and other noise in the data we approximate p ( S ) and p ( S cP , S OH ) by fitting sigmoidal functions of the form to the binned data, see Figure 2 . To estimate parameters  a , b , c , and a 1 , a 2 , b , c , respecti v ely, we used the function curvefit of the python v3.6.12 library scipy v1.5.2.
Since cleavage sites in CA dinucleotides were found to behave differ ently compar ed to the other dinucleotides, differ-ent parameters were fitted for this special case. As a control we randomized the sequence positions and, as expected, obtained a flat response curve, see Supplementary Figure S1. The 2 ,3 -cP libr ary produces sequence fr agments that are too short for reliable mapping close to the 5 end of each transcript. Consequently, no signal S cP exists for the first 11 nucleotides of each transcript. Analo gousl y, the 5 -OH library is uninformati v e close to the 3 end (last 12 nt). Thus we use p( S OH i ) or p( S cP i ) for positions i at the ends of a transcript and p( S cP i , S OH i ) for its interior.

Secondary structure prediction with probing data
The conversion of the probability of being unpaired, i.e.
, into a pseudo-energy in essence follows the scheme proposed by Zarringhalam et al. ( 55 ).
Howe v er, we associate pseudo-energies only with the unpaired nucleotides. It is not difficult to see that it suffices to associate pseudo-energies only with paired or only with unpaired nucleotides as long as one is not interested in the absolute value of the partition function, see ( 12 ).
To incorporate the probing data into secondary structure prediction algorithms, we converted q i into pseudo free energy terms that can be interpreted as log-likelihoods for nucleotides being unpaired ( 56 ). In addition, we compensate for the fact that the a priori probability p 0 that a base is unpaired differs from 1 / 2. This yields where R is the gas constant, T is the absolute temperature, and c is a constant that allows to tune the relati v e importance or trust in the probing data. Throughout this contribution, we used c = 1.2. The value of p 0 = 0.42 was determined from the calibration set. The position-dependent pseudo-energies are used as soft constraints ( 12 ) in the program RNAfold of the ViennaRNA package v2.4.15 ( 57 ) as described ( 11 ). A detailed structural analysis r equir es sufficient cleavage signal across the transcript. We r equir e rather stringent conditions: (i) at least 75% of a transcript must be r epr esented by reads, and (ii) there must be at least 2.5 read starts per position on average.
To assess prediction quality, we calculated positi v e predicti v e value (PPV), sensitivity (SEN) and the Matthews correla tion coef ficient (MCC) (definitions see Supplementary Information). Plots were generated using the python package matplotlib v3.3.1. Secondary structures with mapped pseudo-energies were visualized using the forna Web server ( 58 ).

Lead probing protocol
Metal ion cleavage is an established approach to probe RN A structure ( 28 , 59 ). Recentl y, Twittenhoff et al. showed that in vivo probing with lead(II) ions coupled with next generation sequencing is suitable to investigate RNA structures on a transcriptome-wide le v el ( 27 ). Here, we present a novel lead-based approach where NGS adapters are selecti v ely ligated to the resulting fragments of lead-induced  Figure 1 ). Accordingly, both cleavage ends (2 , 3 -cP and 5 -OH) are converted into sequencing libraries. During the de v elopment of this method, we noticed two major side products in the sequencing data of the 2 , 3 -cP libraries, both identified as results from the circularization reaction with TS2126 RNL. These originated from leftover 3 adapter and circularization RT-Primer in the re v erse transcription reaction which subsequently gave rise to the pr esence of thr ee RT-Primer based DNA species. In addition to circularization RT-Primer+UMI+cDNA, also circularization RT-Primer+UMI and RT-Primer alone were present, despite performing a size selection via preparati v e denaturing PAGE after re v erse transcription. These side products caused a considerable loss in usable sequences. To reduce these side products, we combined two strategies. First, cDN A was internall y labeled with biotin-dCTP and extracted using magnetic streptavidin beads after circulariza tion to elimina te unused circulariza tion RT-Primer. Second, as PNA oligonucleotides exhibit a very tight binding to complementary DNA sequences with high thermal duplex stability ( 43 , 60 ), we designed a PNA clamp binding to the constant 3 -region of the UMI-containing oligonucleotide and to the 5 adapter-deri v ed cDNA sequence. These regions are only adjacent if the circularization RT-Primer elongated by the sequence of the 3 adapter was circularized without insert. Hence, the PNA clamp blocked the amplification of this side product. Both strategies led to a drastically reduced formation of by-products (see Supplementary Figure S2, Supplementary Figure S3).
We generated two independent biological replicates of both Pb 2+ (+) libraries. In addition, negati v e controls in the absence of lead (Pb 2+ (-)) were prepared and sequenced. One of these controls is shown here as a r epr esentati v e e xample. After read preprocessing, mapping and removal of duplicated reads, we obtained 4.8 -16.8 million uniquely mapping reads per library (see Supplementary Table S2). Figure 3 summarizes the composition of mapped read ends in terms of the annotated biotypes. As expected, the ma- jority of reads originates from rRN A and tRN As. The distributions are similar for lead-treated and negati v e control libraries and also differ only moderately between the 2 , 3 -cP and 5 -OH libraries.
Between 221 and 465 RNAs in the Pb 2+ (+) libraries (and 104 to 171 RNAs in the Pb 2+ (-) libraries) are covered sufficiently by reads to meet our criteria for structural analysis (described in Methods section). Of these, we set up a benchmark set of 32 transcripts that we analyzed in more detail. We found that probing signals are highly reproducible with a Pearson correlation coefficient of 0.83 for the two 2 ,3 -cP libraries and 0.80 for 5 -OH libraries (calculated over all sufficiently covered transcripts).

Signal corresponds to structure
Based on r efer ence structur es, we investigated distributions of the signal for paired and unpaired nucleotides separately ( Figure 4 ). Consistent with our expectations, unpaired positions display significantly higher cleavage le v els (two-sided Mann-Whitney U test p = 3.0 × 10 −210 for 2 , 3 -cP and p = 2.9 × 10 −200 for 5 -OH). We also observed that most positions exhibit low signal regardless of the structure. High signal values are therefore informati v e for unpaired positions. Howe v er, not all nucleotides that are unpaired in the r efer ence structur es ar e associated with high cleavage activity. Thus, low signals do not provide a reliable predictor for pairedness.
To visualize the correlation between probing intensity and structure, the profile for tRNA Leu(CAG) is displayed in Figure 5 . D-loop, anticodon loop, variable loop, T-loop as well as the unpaired 3 end consisting of the CCA sequence and the discriminator base are reflected in the peaks of 2 ,3 -cP as well as 5 -OH libraries. Since we capture both RNA cleava ge fra gments in our protocol, the information for the entire transcript is retained e v en though both library types hav e one 'b lind' end of 11 (or 12) nucleotides. To further investigate and quantify the structural information within the probing signal, we converted the scores into probabilities to be unpaired q i , see Figure 2 in the Methods section.
Since the mapped reads determine the exact cleavage position, we considered the possibility of sequence-specific biases. We indeed found a strong bias towards CA dinucleotides and a milder bias towards UA in 2 ,3 -cP libraries ( Figure 6 ). This observation cannot be explained by genomic overr epr esentation of these dinucleotides or the influence of only a few outliers (see Supplementary Figure S4). Cleavage between C and A (but not between U and   A) is clearly less correlated to an unpaired structural state compared to all other dinucleotides. We account for this effect by estimating the probability of unpairedness q i separately for CA and all other dinucleotides, see Figure 2 above. The de viations observ ed for UA in the 2 ,3 -cP libraries were small enough to be neglected.

Comparison of Pb 2+ (+) and Pb 2+ (-) libraries
The signal profiles of Pb 2+ -treated and non-treated samples in a subregion of the non-coding RNase P RNA are shown as an example in Figure 7 . RNase P RNA harbors se v eral high affinity metal ion binding sites that were identified already in previous studies ( 26 , 61 , 62 ). Two major (Ia and Ib) and one minor (IIa) high affinity metal ion binding site that are present in the illustrated subregion yielded strong signals in the Pb 2+ -treated but not in the H 2 O control libraries (Figure 7 A wise, Figure 7 C illustra tes tha t ther e ar e many pronounced peaks in the profiles after lead treatment that are absent in the control samples. These findings emphasize the successful application and utility of lead-induced cleavage in combination with end-specific ligation reactions to determine RNA secondary structure. The high quality of the data is illustrated in detail in Supplementary Figure S6B. We noticed that the Pb 2+ (-) libraries were also highly PAGE 9 OF 15 Nucleic Acids Research, 2023, Vol. 51, No. 11 e63 correlated with both Pb 2+ (+) signal and the unpaired positions of the known r efer ence structur es (Figur e 7 ). Overall, the distribution of normalized signals in the H 2 O-treated sample is similar to that of the Pb 2+ -treated samples (Supplementary Figure S5A), although the separation of unpaired and paired positions is less pronounced compared to Figure 4 . We find that the patterns are close matches for both the 2 , 3 -cP libraries (Figure 7 A) and the 5 -OH libraries (Figure 7 B). The normalized signals are also highly correlated across the entire data set (Figure 7 C, Supplementary Figure S5B). This similarity between Pb 2+ -treated and H 2 O-treated libraries persists when considering the unpaired positions of our calibration set. As shown in Figure 7 D (Supplementary Figure S5C), the Pb 2+ (-) signal can also be converted into a probability q i that behaves similarly to the one obtained from the Pb 2+ -treated samples. We ther efor e tested whether an additional 'bonus energy' based on Pb 2+ (-) as independent source of information results in an increase of secondary structure inference over the use of Pb 2+ (+) only. Since this was not the case, the negati v e control libraries were not used in further downstream analysis.

Experimental signal impr ov es RNA secondary structure prediction
Probing data by definition offer only information whether a gi v en nucleotide is pair ed or unpair ed and ther efor e provide experimental constraints on RNA secondary structures rather then determining the structure unambiguously. We ther efor e assessed the usefulness of the Led-Seq data by testing whether they are capable of improving the secondary structures over computational predictions based on the rules of the thermodynamic standard model. To this end, we calculated pseudo-energy contributions for positions with q i greater than the average probability to be unpaired p 0 . These values served as soft constraints for the computation of MFE structures with RNAfold . We observed that for the vast majority of the transcripts in our benchmark set there is indeed an improvement. More pr ecisely, the MFE structur es computed with the experimentally determined pseudo-energies are close to the refer ence structur es (Figur e 8 ). All r efer ence structur es from RNAcentral as well as predictions by construction do not include pseudoknots. Pseudoknots ther efor e appear as 'unpaired' in the reference structures, see e.g. Supplementary Figure S6. Table 1 summarizes the results quantitatively in terms of the quality metrics PPVs , SENs , and MCCs. Notabl y, a pplying the experimental signal of only one type of probing library (2 , 3 -cP or 5 -OH) already yields substantially incr eased pr ediction accur acy. This demonstr ates the high performance of both libraries individually. As expected, best accuracy is achie v ed by incorporating both libraries. In particular, improvements close to both ends of a transcript r equir e both libraries since each library type is informati v e only for either the 5 or the 3 terminal r egion. Figur e 9 gi v es three illustrati v e e xamples (5S rRN A, 23S rRN A (domain IV) and tRNA Ile(GAU) ) sho wing ho w incorporation of experimental pseudo-energies guides structure prediction. In particular, positions engaged in false-positi v e (stacked) base pairs predicted by the thermodynamic model are re- shows an open conf ormation f or the variable loop, while it is base-paired in the reference. As the resulting terminal loop consists of three individual residues, it is very likely that this poses a considerable tension on the short helical part that, due to the central U-A pair, is not highly stable. As a consequence, this region probably shows a certain fraying, resulting in a single-stranded conformation that is recognized by Led-Seq. For more details, see Supplementary Information.

mRNA analysis
The secondary structures of mRNAs in the vicinity of the translation start site have repeatedly been reported to be subject to selecti v e pr essur es. Such effects can be detected by considering patterns of accessibility, i.e. the profile of the position-specific unpairedness q i , see e.g. ( 23 , 27 , 67 , 68 ). We collected all mRNAs for which a stretch of 50 nucleotides upstream of the AUG start codon does not intersect another annotated gene and that are sufficiently covered by the probing data ( n = 89 for 2 ,3 -cP libraries and n = 146 for 5 -OH libraries). Sequences were aligned at AUG start codons and mean values were calculated for e v ery position in the 5 untranslated region (UTR, positions −48 to −1) and the beginning of the coding sequence (CDS, positions 1 to 180, 60 codons). As there is no valid signal for the first 11 nucleotides within the 2 ,3 -cP libraries, these positions where excluded. The resulting profiles are displayed in Figure 10 . The signal shows a pronounced peak . We further assessed the mean structural signal of first, second and third positions within codons. Triplets were found to exhibit a significant periodicity in CDSs. No periodicity is detectable in the 5 -UTRs. Both libraries suggest that, on average, last codon positions ar e less structur ed than first codon positions (see Supplementary Figure S9).

Utility and limitations of probing methods for RNA structure elucidation
Here we show that Led-Seq is capable of probing RNA secondary structures in vivo with highly reproducible results between biological replicates. A major advantage of the method is that the interrogation of both cleavage products in the 2 ,3 -cP and 5 -OH libraries not only increases the reliability of the data and provides an internal quality control, but also ensures that the full length of transcripts can be probed in a high throughput setting.
As with other probing methods, including SHAPE, lead cleavage does not unambiguously distinguish between paired and unpaired positions but provides quantitati v e evidence that can be converted into a probability that a nucleotide is unpaired. We emphasize that this is not a methodological shortcoming but an ine vitab le consequence of the fact that RNAs form a free-energy weighted ensemble of structures rather than a single, unambiguous secondary structur e ( 8 , 69 , 70 ). Indeed, r ecently methods have become availab le that deconvolv e multiple r epr esentati v e structures from a probing signal (70)(71)(72). The 'known' r efer ence structur es ar e ther efor e necessaril y a pproxima tions ra ther than a perfect gold standard.
The probing signal is confounded further by the fact that the chemical reactions underlying a probing method are influenced by the detailed local conformation of the target, metal ion or protein binding, and other factors that go beyond the pairing status of nucleotides. It is ther efor e not surprising that the lead probing signal does not distinguish paired from unpaired positions in an all-or-nothing fashion. Instead, one observes distributions of signals S that are biased towards larger signals for unpaired positions. The distribution of lead signals in Figure 4 are indeed very similar to corresponding distributions for SHAPE data, see e.g. ( 73 ).
A general issue of probing methods is that low signals, her e for pair ed nucleotides, cannot be distinguished from missing data due to other causes, such as limited accessibility to the reagent, see e.g. ( 74 , 75 ). Based on this ambiguity, we cannot safely use low values of S and thus p ( S ) as support for pair edness. We ther efor e include pseudo-energies only if p ( S ) > p 0 , i.e., if the signal provides evidence that a position in unpaired. Lifting this restriction indeed did not improve the prediction quality.
We observed that the incorporation of probing data did not result in improved predictions in a subset of the benchmark structures, and in a few cases the prediction accuracy e v en decreased, albeit only slightly. This is not unexpected. First, in many cases the thermodynamic model produces rather accurate structures e v en without additional experimental evidence ( 76 , 77 ). In these cases, additional experimental data confirm rather than modify the structure. Mor eover, the r efer ence data from RNAcentral have been obtained with the help of templa tes tha t, in turn, are informed either by in vitro structures obtained from NMR or X-ray data, or have been constructed as consensus structures over a large set of phylo geneticall y related molecules ( 63 ). They cannot account for fluctuations in the structures of actual RNAs with (temporarily) open segments as implied by probing data of the transcripts with decreased folding accuracy. Taken together, it is reassuring that inclusion of the probing data increases consistency with the r efer ence models. At the same time there is no reason to expect the probing data to reproduce the r efer ence structur e perfectly. Figure 8 can thus be interpreted as strong support for the proper functioning and usefulness of Led-Seq.

Negativ e contr ols as inf ormation sour ce in Led-Seq
We observed that negati v e controls without lead treatment (Pb 2+ (-)) produce a signal that is similar to the Pb 2+ (+) libraries. This observation can be explained by the reacti v e intracellular environment. For example, [Zn(H 2 O) 5 OH] + and [Cu(H 2 O) 5 OH] + with pK a values of 8 to 9 ( 78 ) were also shown to be able to hydrolyze E. coli RNase P RNA at neutral pH ( 79 ). Considering that Zn 2+ and Cu 2+ are natural trace elements, it is reasonable to assume that RNA fragments generated by endogenous transition metal ion hydrates entered the 2 ,3 -cP and 5 -OH libraries of the H 2 O control samples. Although lead-treated libraries are more e63 Nucleic Acids Research, 2023, Vol. 51, No. 11 PAGE 12 OF 15 informati v e owing to the identification of additional cleavage sites and a better separation of paired and unpaired signals, the Pb 2+ (-)-libraries also convey structural information. This implies that the Pb 2+ (-) libraries are not genuine backgr ound contr ols, thus questioning the concept to determine differences or ratios such as S Pb 2 + /S H 2 O for quantifying the results of in vivo lead probing experiments. We emphasize that the lack of a 'control', or rather a background signal is not an issue that might invalidate the method. In fact, genome-wide data sets often include sufficient information on the background already in the foreground data to render negati v e controls redundant. In the case of Led-Seq, the relevant information on the background signal is implicitly provided by the distribution of normalized signals for paired positions in the reference structures. We therefore advocate to utilize the control libraries in Led-Seq experiments as an additional source of RNA structure information and to assess the quality and integrity of Led-Seq data by computing the distributions of normalized signals and the functions p ( S ) using a few r efer ence structur es, pr eferably well-characterized non-coding RNAs.

Bias f or cleav age betw een p yrimidine and adenosine in 2 ,3 -cP libraries
In general, RNA cleavage induced by Pb 2+ has been reported to have no overall sequence bias ( 27 ). Ne v ertheless, a considerable bias towards the r epr esentation of cleavage sites between CA, and to a lesser extent UA, dinucleotides was evident in our 2 ,3 -cP libraries. At present, we can only speculate about the origin of this effect. In se v eral small RNA sequencing libraries, ligation biases were described, and the secondary structures of the ligation partners seem to r epr esent a major cause of such biases ( 80 ). In our libraries, howe v er, the CA (UA) bias is presumably not induced by secondary structures, because the remaining three dinucleotides starting with C or U did not show this behavior. In addition, we did not observe any sequence-rela ted ef fect in liga tion ef ficiency. Theoretically, the observed pattern could also be caused by an endonuclease with a specific recognition sequence that leaves 2 ,3 -cP residues after cleavage. E. coli expresses such enzymes, most notably MazF ( 81 , 82 ). In a previous study targeting cyclic phosphate-containing RNAs in mice, a similar effect was observed predominantly in mRNA and attributed to a hypothetical enzymatic cleavage e v ent ( 83 ). In following studies, the authors reported a similar effect in human cell lines and explained it as the product of ANG cleavage ( 84 ). In Bombyx mori cells, cleavage with BmRNase ( 85 ) was identified as a possible cause. If a similar mechanism is at work on E. coli RNA, we would expect to observe the overr epr esentation of the CA dinucleotide in both, 2 ,3 -cP and 5 -OH libraries. This, howe v er, is not the case.
Hence, the molecular basis for this phenomenon is currently unclear. To our knowledge, no bacterial enzyme is known that cleaves RNAs specifically at pyrimidine-A dinucleotides leaving 2 ,3 -cP but not 5 -OH groups ( 82 ). The effect is also not caused by an incr eased occurr ence of the CA dinucleotide in the E. coli transcriptome and our data do not contain one or a few 'hotspot' RN As w hose overr epr esentation might explain the CA bias. Interestingly, UA and CA phosphodiester-bonds have been described as more susceptible to hydrolysis in general. However, this effect was later reported to be too small and unsystematic and also highly dependent on the other neighboring nucleotides in the investigated oligomers. Eventually, it was attributed to other structural effects, such as stacking interactions, that enhance or reduce cleavage (86)(87)(88)(89). Further investigation is needed to determine the cause for this dinucleotide bias in 2 ,3 -cP libraries. From another perspecti v e, our observation of a CA bias in the 2 ,3 -cP libraries, but essentially not in the 5 -OH libraries, again illustrates the strength of our dual Led-Seq approach. The observed discrepancy suggests unresolved technical reasons and pre v ented us from drawing pr ematur e conclusions on the biological significance of this finding.

Analysis of mRNA structures
Led-Seq can also be used to evaluate mRNA structure. Lead treatment of cells resulted in a larger relati v e abundance of reads mapping to mRNAs in both libraries (Figure 3 ). The coverage of mRNAs (or other non-coding RNAs) could be improved further by implementing an rRNA depletion step. Since we aimed to demonstrate the general applicability of Led-Seq in this work, we did not include any selection step for certain RN As, especiall y since rRN As and tRN As are also part of our benchmark set. In first experiments on the applicability of commercially available rRNA depletion kits we could already observe that both RNase H-based (NEBNext ® , NEB) and bead-based (riboPOOLs, siTOOLs) kits are compatible with our approach and allow a considerable reduction of the rRNA content (data not shown). With the possibility of an individual adaptation of the used probes to other targets, such as tRNAs, we expect a general applicability of these kits for the depletion of a variety of RNAs. Ne v ertheless, the application of these methods can also bring disadvantages. For example, RNase H-based depletion methods have already been shown to display off-target effects that can negati v ely impact ribosome profiling data ( 90 ). Averaging the normalized probing signals of mRNAs aligned at the start codon (position 1) re v ealed a local minimum around −9 nt and a local maximum around −17 nt in both libraries. These signals ar e r emar kab ly consistent with earlier reports based on parallel analysis of RNA structure (PARS) probing data ( 67 ). Del Campo et al. postulated that the unstructured region 20 nt upstream of the start codon serves as a nonspecific docking site of the 30S ribosomal subunit and described it as a general feature of E. coli genes. They interpreted the low signal near nucleotide position -10 as an effect of the Shine-Dalgarno sequence. We also observed a substantially increased signal immediately preceding the start of CDS, implying an open conformation. This observation also conforms to previous findings ( 67 , 68 , 91 ). Comparison of the average signal intensity of UTRs and CDSs showed a significant increase in the first 60 nt of the CDS.
We also noticed a periodicity of the signal in the mRNA coding regions, while such a signal is absent in the UTR. Interestingly, the 2 , 3 -cP mapping shows a significantly higher signal for e v ery 3r d position of a codon. In contrast, 5 -OH data show no difference between 2nd and 3rd position, but a significantly lower signal for the first position. Taken together, this leads to the conclusion that the third codon position is more susceptible to Pb 2+ -induced cleavage than the first position and thus appears less likely to be 'structured'. The same effect was already observed in PARS data of E. coli ( 67 ), as well as in lead-based structure analysis in Yersinia pseudotuberculosis ( 27 ). Other studies also recognized a periodic pattern in eukaryotic mRNA.
Howe v er, in DMS-seq data of A. thaliana ( 68 ), PARS data for S. cerevisiae ( 23 ), and CIRS-seq data for mice ( 92 ), the first nucleotide was least likely to be 'structured'. Intriguingly, no periodicity was observed in SHAPE-MaP data for E. coli ( 93 ). The authors provided se v eral e xplanations for this discrepancy, which we would like to address with an emphasis on our approach: The authors argued that the RNases used in the PARS approach are known to exhibit a certain sequence bias in their cleavage efficiency. In contrast, Pb 2+ -induced cleavage occurs sequence-independent ( 27 ). Another potential cause are artifactual signals generated either by methodical or cellular processes. While we cannot rule out a methodical cause entirely, we would not expect to see a periodical pattern restricted to coding regions. Moreover, cotranslational decay by exonucleases as described by Pelechano et al. ( 94 ) would result in fragments that are not mapped by our approach. To our knowledge, no known exonuclease leaves 2 ,3 -cP and 5 -OH ends. The periodicity effect observable in our data is also not assignable to in vitro probing conditions, as we used an in vivo probing approach. The only suggested explanation for the mentioned discrepancy between the methods, that applies to Led-Seq, is that it relies on the enzymatic ligation of adapters to cleavage fragments to infer structural inf ormation. Theref ore it is, in principle, prone to a bias based on the ligation properties of the used enzymes. Despite the enhanced CA dinucleotide cleavage described above, where a ligation bias can be excluded, we could not identify a ligation bias matching the average nucleotide composition of the investigated mR-NAs. The periodicity of the genetic code is a known feature based on the fact that in coding sequences no truly random nucleotide / triplet composition is present due to the inherent bias of amino acid-coding triplets ( 95 ). Our results suggest that the observed structural periodicity is in fact caused by this intrinsic feature of coding sequences in DNA and ther efor e in mRNA. It is concei vab le that the small size allows the lead ions to enter acti v ely translating ribosomes, where they might have access to the mRNA codons engaged in tRNA binding. The weak wobble interaction at codon position three would then be less protected from cleavage, resulting in the observed periodicity pattern. Further investigation is needed to explain the discrepancy between SHAPE-MaP results and other previous findings concerning CDS structural periodicity.

CONCLUSION
The Led-Seq approach described here offers the unique advantage that both sites of the metal ion-induced cleavage position in RNA are mapped, increasing the reliability of the observed signals. Furthermore, the interrogation of both cleavage sites allows for the structural analysis of RNA regions close to the 5 and 3 ends of transcripts, as the two separate libraries can m utuall y compensate for in-forma tion loss a t the end of transcripts, which is caused by inaccurate mapping of short cDNAs to the genome. This poses an advantage over other sequencing based methods that lose the 3 -end information because of a missing compensa tion option. W hile muta tional pr ofiling appr oaches elegantly circumvent that information loss, RT reactions generally r epr esent an err or-pr one enzymatic step and inherent RT stops and nucleotide misincorporations may result in a loss of information. Using our double-end approach, we minimize artificially introduced signals by a redundant design of the method. Nonetheless, Led-Seq and SHAPE based approaches complement each other well, as they involve entirely different chemistries. An additional advantage resulting from the use of metal ion-based cleavage is its potential applicability for the in vivo investigation of the structurome of psychrophiles and thermophiles, as the exploited probing reaction is theoretically suitable for a wide range of temperatures. Taken together, the doubleend structure investigation of Led-Seq r epr esents a very useful approach to characterize RNA structures in vivo as well as in vitro , expanding our technical arsenal to investigate structur e-function r elations of RNA.

DA T A A V AILABILITY
The data for this study have been deposited in the European Nucleotide Archi v e (ENA) at EMBL-EBI under accession number PRJEB58715, see www.ebi.ac.uk/ena/browser/ view/PRJEB58715 . A computational pipeline is accessible at github.com / xamiiii / Led-Seq and https://doi.org/10. 5281/zenodo.7821447 .