RiboDiffusion: tertiary structure-based RNA inverse folding with generative diffusion models

Abstract

Motivation: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformation.

Results: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.

Availability and implementation: The source code is available at https://github.com/ml4bio/RiboDiffusion.


Introduction
The design of RNA molecules is an emerging tool in synthetic biology (Chappell et al., 2015; McKeague et al., 2016) and therapeutics (Zhu et al., 2022), enabling the engineering of specific functions in various biological processes. There have been various explorations into RNA-based biotechnology, such as translational RNA regulators for gene expression (Laganà et al., 2015; Chappell et al., 2017), aptamers for diagnostic or therapeutic applications (Espah Borujeni et al., 2016; Findeiß et al., 2017), and catalysis by ribozymes (Dotu et al., 2014; Park et al., 2019). While the tertiary structure determines how RNA molecules function, one fundamental challenge in RNA design is to create functional RNA sequences that can fold into the desired structure, also known as the inverse RNA folding problem (Hofacker et al., 1994).
Most early computational methods for inverse RNA folding focus on folding into RNA secondary structures (Churkin et al., 2018). Some programs use efficient local search strategies to optimize a single seed sequence for the desired folding properties, guided by an energy function (Hofacker et al., 1994; Andronescu et al., 2004; Busch and Backofen, 2006; Garcia-Martin et al., 2013).
Others attempt to solve the problem globally by modeling the sequence distribution or directly manipulating diverse candidates (Taneda, 2010; Kleinkauf et al., 2015; Yang et al., 2017; Runge et al., 2019). However, without considering the 3D structures of RNA, these methods cannot meet accurate functional structure constraints, since RNA secondary structures only partially determine tertiary structures (Vicens and Kieft, 2022). The pioneering work of Yesselman and Das (2015) applies a physically-based approach to optimize RNA sequences to match fixed backbones, but it is still constrained by its local design strategy and computational efficiency.
Recent advances in deep learning and the accumulation of biomolecular structural data have enabled computational methods to model the mapping between sequences and 3D structures with extraordinary performance, as demonstrated by remarkable results in protein 3D structure prediction (Jumper et al., 2021; Lin et al., 2023) and inverse design (Dauparas et al., 2022). Inspired by this, the development of geometric learning methods for RNA structures has received increasing research interest. On the one hand, many studies have explored RNA tertiary structure prediction using machine learning models with limited data (Shen et al., 2022; Baek et al., 2022; Li et al., 2023). On the other hand, although deep learning has promising potential to narrow down the immense sequence space for inverse folding, developing an appropriate model for RNA inverse folding remains an open problem: it requires capturing the geometric features of flexible RNA conformations, handling the non-unique mappings between structures and sequences, and providing alternative options for different design preferences.
In this study, we introduce RiboDiffusion, a generative diffusion model for RNA inverse folding based on tertiary structures. We formulate the RNA inverse folding problem as learning the sequence distribution conditioned on fixed backbone structures, using a generative diffusion model (Yang et al., 2022). Unlike previous methods that predict the most probable sequence for a given backbone (Ingraham et al., 2019; Jing et al., 2021; Gao et al., 2023; Joshi et al., 2023), our method captures multiple mappings from 3D structures to sequences through distribution learning. With a generative denoising process for sampling, our model iteratively transforms random initial RNA sequences into desired candidates under tertiary structure conditioning. This global iterative generation distinguishes our model from autoregressive models and local updating methods, enabling it to better search for sequences that satisfy global geometric constraints. We parameterize the diffusion model as a cascade of a structure module and a sequence module to capture the mutual dependencies between sequence and structure. The structure module, based on graph neural networks, extracts SE(3)-invariant geometric features from fixed 3D RNA backbones, while the sequence module, based on Transformer-like layers, captures the internal correlations of RNA primary structures. To train the model, we randomly drop the structure module to learn both the conditional and unconditional RNA sequence distributions. We also mix the conditional and unconditional distributions in the sampling procedure to balance sequence recovery and diversity for more candidates.
We use RNA tertiary structures from the PDB database (Bank, 1971) to construct the benchmark dataset and augment it with predicted structures from an RNA structure prediction model (Shen et al., 2022). We split test sets based on RNA clustering using different sequence or structure similarity cut-offs. Our model achieves an 11% higher recovery rate than the machine learning baselines on benchmarks based on sequence similarity, and 16% higher on benchmarks based on structure similarity. RiboDiffusion also performs consistently well across different RNA lengths and types. Further analysis reveals its strong performance in cross-family generalization and in-silico folding. Our method could be a powerful tool for RNA design, exploring a wide sequence space and finding novel solutions to 3D structural constraints.

Methodology
This section explains RiboDiffusion in detail: a deep generative model for RNA inverse folding based on fixed 3D backbones. An overview is shown in Fig. 1. We first introduce the preliminaries of diffusion models and our formulation of the RNA inverse folding problem. We then describe the design of the neural networks that parameterize the diffusion model and explain the sequence sampling procedure.

Diffusion Model
As a powerful genre of generative models, diffusion models (Sohl-Dickstein et al., 2015) have been successfully applied to the distribution learning of diverse data, including images (Ho et al., 2020; Song et al., 2021), graphs (Huang et al., 2022, 2023a), and molecular geometry (Watson et al., 2023; Huang et al., 2023b). As the first step of setting up a diffusion model, a forward diffusion process is constructed to perturb data with a sequence of noise, converting the data distribution into a known prior distribution. With random variables $x_0 \in \mathbb{R}^d$ and a forward process $\{x_t\}_{t\in[0,T]}$, a Gaussian transition kernel is set as
$$ q(x_t \mid x_0) = \mathcal{N}(x_t;\ \alpha_t x_0,\ \sigma_t^2 I), \quad (1) $$
where $\alpha_t, \sigma_t \in \mathbb{R}^+$ are time-dependent differentiable functions that are usually chosen to ensure a strictly decreasing signal-to-noise ratio (SNR) $\alpha_t^2/\sigma_t^2$ and a final distribution $q_T(x_T) \approx \mathcal{N}(0, I)$ (Kingma et al., 2021). Diffusion models can generate new samples starting from the prior distribution, after learning to reverse the forward process. This reverse-time denoising process from time $T$ to time $0$ can be described by a stochastic differential equation (SDE) (Yang et al., 2022):
$$ dx_t = \left[ f(t)\, x_t - g(t)^2\, \nabla_x \log p_t(x_t) \right] dt + g(t)\, d\bar{w}_t, \quad (2) $$
where $\nabla_x \log p_t(x_t)$ is the so-called score function, $\bar{w}_t$ is the standard reverse-time Wiener process, $f(t)$ is the drift coefficient, and $g(t)$ is the diffusion coefficient (Kingma et al., 2021). Deep neural networks are used to parameterize the score function in two similar forms, i.e., the noise prediction model $\epsilon_\theta(x_t, t)$ and the data prediction model $d_\theta(x_t, t)$. In this study, we focus on the widely used data prediction model, which directly predicts the original data $x_0$ from $x_t$.
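To make the forward process concrete, here is a minimal NumPy sketch of a Gaussian perturbation kernel with a strictly decreasing SNR. The variance-preserving schedule and its `beta_min`/`beta_max` values are common defaults in the diffusion literature, not parameters stated in this paper.

```python
import numpy as np

def vp_schedule(t, beta_min=0.1, beta_max=20.0):
    """Variance-preserving noise schedule (hypothetical parameter values).

    Returns (alpha_t, sigma_t) such that q(x_t | x_0) = N(alpha_t * x_0, sigma_t^2 I)
    and the SNR alpha_t^2 / sigma_t^2 is strictly decreasing in t on [0, 1].
    """
    log_mean_coeff = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha_t = np.exp(log_mean_coeff)
    sigma_t = np.sqrt(1.0 - alpha_t**2)
    return alpha_t, sigma_t

def perturb(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) from the linear-Gaussian transition kernel."""
    alpha_t, sigma_t = vp_schedule(t)
    eps = rng.standard_normal(x0.shape)
    return alpha_t * x0 + sigma_t * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)
xt = perturb(x0, 0.5, rng)
assert xt.shape == x0.shape

snr = lambda t: (lambda a, s: a**2 / s**2)(*vp_schedule(t))
assert snr(0.1) > snr(0.5) > snr(0.9)   # SNR strictly decreasing
a_T, s_T = vp_schedule(1.0)
# at t = T the marginal is close to the standard normal prior
assert a_T < 0.01 and abs(s_T - 1.0) < 1e-3
```

With this schedule, samples at `t ≈ 1` are essentially pure Gaussian noise, which is exactly the prior the reverse-time SDE starts from.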

RNA Inverse Folding
Inverse folding aims to explore sequences that can fold into a predefined structure, which is specified here as the fixed sugar-phosphate backbone of an RNA tertiary structure. For an RNA molecule with N nucleotides of four types, A (Adenine), U (Uracil), C (Cytosine), and G (Guanine), the sequence can be defined as S ∈ {A, U, C, G}^N. Among the backbone atoms, we choose a three-atom coarse-grained representation including the atom coordinates of C4', C1', and N1 (pyrimidine) or N9 (purine) for every nucleotide. The simplified backbone structure can be denoted as X ∈ R^{3N×3}. Note that there are various alternative schemes for coarse-graining RNA 3D backbones, including using more atoms for more precise representations (Dawson et al., 2016). We explore a concise representation with regular structural patterns (Shen et al., 2022).
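The three-atom coarse-graining above can be sketched as follows, assuming the per-nucleotide atom coordinates have already been parsed (e.g. from a PDB file); the input data structure here is purely illustrative, not the paper's implementation.

```python
import numpy as np

PYRIMIDINES = {"C", "U"}  # N1 is the glycosidic nitrogen; purines (A, G) use N9

def coarse_grain(nucleotides):
    """nucleotides: list of (base, {atom_name: (x, y, z)}) tuples.

    Returns an (N, 3, 3) array of [C4', C1', N1-or-N9] coordinates per
    nucleotide, plus the sequence string.
    """
    coords, seq = [], []
    for base, atoms in nucleotides:
        glyco_n = "N1" if base in PYRIMIDINES else "N9"
        coords.append([atoms["C4'"], atoms["C1'"], atoms[glyco_n]])
        seq.append(base)
    return np.asarray(coords, dtype=float), "".join(seq)

# toy example with fabricated coordinates
nts = [
    ("A", {"C4'": (0, 0, 0), "C1'": (1, 0, 0), "N9": (2, 0, 0)}),
    ("U", {"C4'": (0, 1, 0), "C1'": (1, 1, 0), "N1": (2, 1, 0)}),
]
X, seq = coarse_grain(nts)
assert X.shape == (2, 3, 3) and seq == "AU"
```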
Formally, we consider the RNA inverse folding problem as modeling the conditional distribution p(S|X), i.e., the sequence distribution conditioned on RNA backbone structures, and establish a diffusion model to learn this conditional distribution. To take advantage of the convenience of defining diffusion models in continuous data spaces (Chen et al., 2023; Dieleman et al., 2022), discrete nucleotide types in the sequence are represented by one-hot encodings and relaxed into the real number space as S ∈ R^{4N}. The continuous-time forward diffusion process in the sequence space R^{4N} can be described by the forward SDE with t ∈ [0, T] as dS_t = f(t) S_t dt + g(t) dw. Under this forward SDE, the original sequence at time t = 0 is gradually corrupted by adding Gaussian noise. With the linear Gaussian transition kernel in Eq. (1), derived from the forward SDE (Yang et al., 2022), we can conveniently sample S_t = α_t S_0 + σ_t ε_S at any time t for training, where ε_S is Gaussian noise in the sequence space. For the generative denoising process, the corresponding reverse-time SDE from time T to 0 can be derived from Eq. (2) as
$$ dS_t = \left[ f(t)\, S_t - g(t)^2\, \nabla_S \log p_t(S_t \mid X) \right] dt + g(t)\, d\bar{w}_t, \quad (3) $$
where p_t(S_t|X) is the marginal distribution of sequences given X, and the score function ∇_S log p_t(S_t|X) represents the gradient field of the logarithmic marginal distribution.
Once the score function is parameterized, we can numerically solve this reverse SDE to convert random samples from the prior distribution N(0, I) into the desired sequences. We establish a data prediction model to achieve the score function parameterization, learning to reverse the forward diffusion process. Specifically, we feed the noised sequence data S_t, the log signal-to-noise ratio λ_t = log(α_t²/σ_t²), and the conditioning RNA backbone structure X to the data prediction model d_θ(S_t, λ_t, X). We optimize the data prediction model with a simple weighted squared error objective:
$$ \mathcal{L} = \mathbb{E}_{t,\, S_0,\, \epsilon_S} \left[ w(\lambda_t)\, \left\| d_\theta(S_t, \lambda_t, X) - S_0 \right\|_2^2 \right], \quad (4) $$
where w(λ_t) is a time-dependent weighting function. This objective can be viewed as optimizing a weighted variational lower bound on the data log-likelihood, or as a form of denoising score matching (Ho et al., 2020; Song et al., 2021; Kingma et al., 2021).
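A single training step of this objective can be sketched in a few lines. The sketch below stands in a trivial callable for the real data prediction network (which in RiboDiffusion is the GNN-plus-Transformer model described later); the schedule values and weighting are placeholders.

```python
import numpy as np

def diffusion_loss(d_theta, s0, alpha_t, sigma_t, weight, rng):
    """Weighted squared error between the data prediction and the clean
    (one-hot, continuously relaxed) sequence s0."""
    eps = rng.standard_normal(s0.shape)
    s_t = alpha_t * s0 + sigma_t * eps            # forward perturbation
    lam = np.log(alpha_t**2 / sigma_t**2)         # log-SNR fed to the model
    pred = d_theta(s_t, lam)
    return weight * np.mean((pred - s0) ** 2)

rng = np.random.default_rng(0)
s0 = np.eye(4)[rng.integers(0, 4, size=30)]       # toy one-hot "sequence"
oracle = lambda s_t, lam: s0                      # perfect predictor -> zero loss
assert diffusion_loss(oracle, s0, 0.7, np.sqrt(1 - 0.49), 1.0, rng) == 0.0
```

In training, the gradient of this loss with respect to the network parameters (not shown here) drives `d_theta` toward predicting `s0` from its noisy version at every noise level.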

Model Architecture
The architecture of the data prediction model largely determines the learning quality of the diffusion model. We propose a two-module model to predict the original nucleotide types: a structure module to capture geometric features and a sequence module to capture intra-sequential correlations.

Structure Module
Geometric deep learning models aim to extract equivariant or invariant features from 3D data and achieve impressive performance in the protein inverse folding task (Ingraham et al., 2019; Jing et al., 2021; Gao et al., 2023). Our structure module is based on the GVP-GNN architecture (Jing et al., 2021), adapted for RNA backbone structures.
The fixed RNA backbone is first represented as a geometric graph G = (V, E), where each node v_i ∈ V corresponds to a nucleotide and connects to its top-k nearest neighbors according to the distance between C1' atoms. Scalar and vector features are extracted from the 3D coordinates as node and edge attributes, describing the local geometry of each nucleotide and the relative geometry between nucleotides. Specifically, the scalar node features are obtained from dihedral angles, while the vector node features consist of forward and reverse vectors between sequential C1' atoms, as well as the local orientation vectors from C1' to C4' and to N1/N9. The initial embedding of each edge consists of the direction vector between the connected C1' atoms, a Gaussian radial basis encoding of their Euclidean distance, and a sinusoidal position encoding (Vaswani et al., 2017) of their relative distance in the sequence. In addition to geometric information, we also append the corrupted one-hot encoding of nucleotide types S_t to the node scalar features. Furthermore, inspired by the widely used self-conditioning technique in diffusion models (Chen et al., 2023; Watson et al., 2023; Huang et al., 2023b), the previously predicted sequence output, denoted as Ŝ_0, is also included in the node embeddings to enhance the utilization of model capacity. To update the node embeddings, the nucleotide graph employs standard message passing (Gilmer et al., 2017): neighboring node and edge features are combined through GVP layers, where scalar and vector features interact via gating to create messages, and the resulting messages are transmitted across the graph to update the scalar and vector node representations.
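The graph construction above can be illustrated with a small NumPy sketch of the two plainest ingredients: the k-nearest-neighbor edge list over C1' atoms and the Gaussian radial basis encoding of edge distances. The values of `k`, the distance range, and the number of basis functions are illustrative assumptions, not the paper's hyperparameters.

```python
import numpy as np

def knn_graph(c1_coords, k=3):
    """Directed edges (i, j) connecting each nucleotide i to its k nearest
    neighbours j by C1'-C1' Euclidean distance."""
    d = np.linalg.norm(c1_coords[:, None] - c1_coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # no self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [(i, j) for i in range(len(c1_coords)) for j in nbrs[i]]

def rbf_encode(dist, d_min=0.0, d_max=20.0, n_bins=16):
    """Gaussian radial basis encoding of an edge distance (bin range assumed,
    in angstroms)."""
    centers = np.linspace(d_min, d_max, n_bins)
    width = (d_max - d_min) / n_bins
    return np.exp(-((dist - centers) / width) ** 2)

coords = np.random.default_rng(0).normal(size=(10, 3)) * 5
edges = knn_graph(coords, k=3)
assert len(edges) == 10 * 3
assert rbf_encode(7.5).shape == (16,)
```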

Sequence Module
The sequential correlation in RNA primary structures is crucial for inverse folding and for obtaining high-quality RNA sequences even with imprecise 3D coordinates; the same holds in protein inverse folding (Hsu et al., 2022; Zheng et al., 2023). The sequence module takes f-dimensional nucleotide-level embeddings h_0 ∈ R^{N×f} as tokens, which consist of the SE(3)-invariant scalar node representations from the structure module and the corrupted sequence data. During training, we randomly add self-conditioning sequence data, similar to the structure module, and drop structural features to model both the conditional and unconditional sequence distributions for further application.
Our sequence module architecture is modified from the Transformer block (Vaswani et al., 2017) to inject diffusion context such as the log-SNR λ or other potential conditional features (e.g., RNA types) (Dhariwal and Nichol, 2021; Peebles and Xie, 2023). The context input C affects sequence tokens through adaptive normalization and activation layers, denoted as the adaLN and act functions:
$$ \mathrm{adaLN}(h, C) = (1 + \gamma_1) \odot \mathrm{LN}(h) + \beta_1, \qquad \mathrm{act}(h, C) = \gamma_2 \odot h, \qquad [\gamma_1, \beta_1, \gamma_2] = \mathrm{MLP}(C), $$
where LN(·) is the layer normalization and MLP(·) is a multi-layer perceptron that learns the shift and scale parameters. The l-th Transformer block is then defined as
$$ h' = h^{l-1} + \mathrm{act}\big(\mathrm{MHA}(\mathrm{adaLN}(h^{l-1}, C)), C\big), \qquad h^{l} = h' + \mathrm{act}\big(\mathrm{FFN}(\mathrm{adaLN}(h', C)), C\big), $$
where MHA(·) is the multi-head attention layer and FFN(·) is the feed-forward network (Vaswani et al., 2017). Finally, the sequence module output h_L is projected to nucleotide one-hot encodings via an extra MLP. The detailed training procedure is given in Algorithm 1.
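A minimal sketch of the adaptive layer normalization idea, assuming the DiT-style form where the context supplies a per-channel shift and scale (the exact parameterization used by RiboDiffusion may differ, and the context MLP is replaced here by a precomputed vector):

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Plain layer norm over the channel (last) dimension."""
    mu = h.mean(-1, keepdims=True)
    var = h.var(-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def ada_ln(h, context_params):
    """Adaptive layer norm: context supplies shift and scale per channel.
    `context_params` stands in for the output of an MLP over the context C."""
    shift, scale = np.split(context_params, 2, axis=-1)
    return layer_norm(h) * (1.0 + scale) + shift

rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))           # 5 tokens, 8 channels
ctx = rng.normal(size=(1, 16))        # [shift, scale] from a context MLP
out = ada_ln(h, ctx)
assert out.shape == h.shape
# with zero context, adaLN reduces to plain layer norm
assert np.allclose(ada_ln(h, np.zeros((1, 16))), layer_norm(h))
```

This is how a scalar context such as the log-SNR can modulate every token of the sequence without being concatenated into the token embeddings.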

Sequence Sampling
To generate RNA sequences that are likely to fold into the given backbone, we construct a generative denoising process based on the parameterized reverse-time SDE with the optimized data prediction model d_θ, as described in Eq. (3). Various numerical solvers can be employed for sampling, such as ancestral sampling or the Euler-Maruyama method. We apply convenient ancestral sampling combined with the data prediction model and self-conditioning to generate sequences. Algorithm 2 outlines the specific sampling procedure. For more details on the noise schedule parameters, including α_t and σ_t, refer to (Kingma et al., 2021). Intuitively, the denoising process works as follows: we start by sampling noisy data from a Gaussian distribution, representing a random nucleotide sequence, and iteratively transform it towards the desired candidates under the conditioning of the given RNA 3D backbone.
Exploring novel RNA sequences that fold into well-defined 3D conformations distinct from the natural sequence is also an essential goal for RNA design, as it has the potential to introduce new functional sequences. This task requires the model not only to generate sequences that satisfy the folding constraints but also to increase diversity for subsequent screening. During the generative denoising process, our model can balance the proportions of the unconditional and conditional sequence distributions by adjusting the output of the data prediction model. Let w be the conditional scaling weight; the data prediction model is then modified as
$$ \tilde{d}_\theta(S_t, \lambda_t, X) = w \cdot d_\theta(S_t, \lambda_t, X) + (1 - w) \cdot d_\theta(S_t, \lambda_t), $$
where d_θ(S_t, λ_t) is the unconditional model obtained by dropping the structural features. Setting w = 1 recovers the original conditional data prediction model, while decreasing w below 1 weakens the effect of the conditional information and strengthens sequence diversity. In this way, we achieve a trade-off between recovering the original sequence and ensuring diversity. This distribution weighting technique is also used in diffusion models for text-to-image generation (Ho and Salimans, 2022; Saharia et al., 2022).
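The weighting step itself is a one-line interpolation, sketched below with toy prediction vectors; the linear-mixing form is assumed by analogy with classifier-free guidance, and the two "model outputs" here are fabricated placeholders.

```python
import numpy as np

def guided_prediction(d_cond, d_uncond, w):
    """Mix conditional and unconditional data predictions with scaling
    weight w. w = 1 recovers the purely conditional model; w < 1 shifts the
    prediction toward the unconditional distribution, increasing diversity."""
    return w * d_cond + (1.0 - w) * d_uncond

d_c = np.array([0.9, 0.05, 0.03, 0.02])   # toy conditional prediction (A, U, C, G)
d_u = np.full(4, 0.25)                    # toy unconditional prediction
assert np.allclose(guided_prediction(d_c, d_u, 1.0), d_c)
mixed = guided_prediction(d_c, d_u, 0.5)
assert d_u[0] < mixed[0] < d_c[0]         # interpolates toward uniform
```

At each denoising step, the sampler would use `guided_prediction` in place of the raw conditional output, so a single trained model serves the whole recovery-diversity spectrum.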

Dataset Construction
We gather a dataset of RNA tertiary structures from the PDB database for RNA inverse folding. The dataset contains individual RNA structures and single-stranded RNA structures extracted from complexes. After filtering for sequence lengths ranging from 20 to 280, there are a total of 7,322 RNA tertiary structures with 2,527 unique sequences. In addition to experimentally determined data, we construct augmented training data by predicting structures with RhoFold (Shen et al., 2022). The structures predicted from RNAcentral sequences (Sweeney et al., 2019) are filtered by pLDDT to keep only high-quality predictions, resulting in 17,000 structures.
To comprehensively evaluate models, we divide the experimentally determined structures into training, validation, and test sets based on sequence similarity and structure similarity with different clustering thresholds. We use PSI-CD-HIT (Fu et al., 2012) to cluster sequences based on nucleotide similarity, setting the threshold at 0.8/0.6/0.4 to obtain 1,252/1,157/1,114 clusters, respectively. For structure similarity clustering, we calculate the TM-score matrix using US-align (Zhang et al., 2022) and apply the agglomerative clustering algorithm from scipy (Virtanen et al., 2020) to the similarity matrix, obtaining 2,036/1,659/1,302 clusters with TM-score thresholds of 0.6/0.5/0.4. We randomly split the clusters into three groups: 15% for testing, 10% for validation, and the remainder for training. For each split strategy, we perform 4 random splits with non-overlapping testing and validation sets to evaluate models. The augmented training data is also strictly filtered by the similarity threshold against the validation and test sets of each split.
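The structure-similarity clustering can be illustrated with a small self-contained sketch. The paper uses scipy's agglomerative clustering; the union-find code below is a minimal pure-NumPy equivalent for the single-linkage case (the linkage criterion actually used is not stated here), joining any two structures whose TM-score exceeds the threshold.

```python
import numpy as np

def cluster_by_tm_score(tm_matrix, threshold):
    """Single-linkage clustering sketch on a symmetric pairwise TM-score
    matrix: structures are merged whenever their TM-score > threshold."""
    n = tm_matrix.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:           # path-compressed root lookup
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if tm_matrix[i, j] > threshold:
                parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])

# toy matrix: structures 0/1 are similar (TM = 0.9), structure 2 is distinct
tm = np.array([[1.0, 0.9, 0.2],
               [0.9, 1.0, 0.2],
               [0.2, 0.2, 1.0]])
labels = cluster_by_tm_score(tm, threshold=0.5)
assert labels[0] == labels[1] and labels[0] != labels[2]
```

Splitting by cluster label (rather than by individual structure) is what prevents near-duplicate structures from leaking between the training and test sets.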

RNA Inverse Folding Benchmarking
Baselines. We compare our model with four machine learning baselines that take tertiary structure input: gRNAde (Joshi et al., 2023), PiFold (Gao et al., 2023), StructGNN (Ingraham et al., 2019), and GVP-GNN (Jing et al., 2021). While gRNAde is a concurrent graph-based RNA inverse folding method, PiFold, StructGNN, and GVP-GNN are representative deep-learning methods for protein inverse folding, modified here to be compatible with RNA. Implementation details of these modifications are in the supplementary material. These methods use the same 3-atom RNA backbone representation. We also include RNA inverse folding methods that take secondary structures as input for comparison. RNAinverse (Hofacker et al., 1994) is an energy-based local search algorithm for secondary structure constraints. MCTS-RNA (Yang et al., 2017) searches candidates based on Monte Carlo tree search. LEARNA and Meta-LEARNA are deep reinforcement learning approaches (Runge et al., 2019) to design RNA that folds into given secondary structures. Each method generates one sequence per RNA backbone for benchmarking.
Metrics. The recovery rate is a commonly used metric in inverse folding that measures how much of the sequence generated by the model matches the original native sequence. While similar sequences have a higher chance of achieving the correct fold, the recovery rate is not a direct measure of structural fitness. We therefore further evaluate with two metrics: the F1 score, which assesses the alignment between the predicted secondary structure of the generated sequence (via RNAfold (Gruber et al., 2008)) and the secondary structure extracted from the input tertiary structure, and a measure of how well generated sequences retain the family information of the input RNA.

We present recovery rate results in Table 1, which contains the average and standard deviation over four non-overlapping test sets for each model in the different cluster settings. Our model outperforms the second-best method by 11% on average for sequence similarity splits and 16% for structure similarity splits. RiboDiffusion consistently achieves better recovery rates on RNA with varying degrees of sequence or structural difference from the training data. Methods based on tertiary structures outperform those based on secondary structures, as the latter contain less structural information. Extra results are shown in Table 2. It is worth noting that the tools used for these two metrics may contain errors. Our proposed method outperforms or matches the baseline methods in secondary structure alignment and more effectively retains family information from the input RNA.
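The recovery rate above is a simple position-wise match rate; since generated and native sequences come from the same fixed backbone, they have equal length and need no alignment.

```python
def recovery_rate(generated, native):
    """Fraction of positions where the generated sequence matches the
    native sequence (sequences assumed equal-length, position-aligned)."""
    assert len(generated) == len(native)
    return sum(g == n for g, n in zip(generated, native)) / len(native)

assert recovery_rate("AUCG", "AUCG") == 1.0
assert recovery_rate("AUGG", "AUCG") == 0.75
```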
We further classify the RNA in the test set by length and type to compare model performance differences more thoroughly. First, we divide RNA into three categories based on the number of nucleotides (nt): Short (50 nt or less), Medium (more than 50 nt but less than 100 nt), and Long (100 nt or more). As shown in Table 1, RiboDiffusion maintains its performance advantage across RNA lengths. Short RNAs present a challenge for recovering the original sequence due to their flexible conformations, causing a relatively low recovery rate compared to medium-length RNAs. A more detailed correlation of RiboDiffusion performance with RNA length is shown in the supplementary material. Each split shows similar patterns: RiboDiffusion has higher variance in short RNA inverse folding, and the model's performance becomes limited as RNA length increases. Moreover, Fig. 2 shows the recovery rate distribution for different RNA types with over 10 structures in the test sets, including rRNA, tRNA, sRNA, ribozymes, etc. The RNA type information is collected from (Sweeney et al., 2019). Compared to the baselines, RiboDiffusion again shows a better recovery rate distribution across RNA types. Through this comprehensive benchmarking, we observe a remarkable performance improvement in tertiary structure-based RNA inverse folding achieved by RiboDiffusion.

Analysis of RiboDiffusion
We dive into a more comprehensive analysis of RiboDiffusion.
Cross-family performance. We repartition the dataset in a cross-family setting to further verify the generalization of our model. We obtain the RNA family corresponding to each tertiary structure from (Kalvari et al., 2021), then randomly select four families for testing and use the others for training. The experimental results of 4 non-overlapping splits are shown in Fig. 3. The average recovery rate of RiboDiffusion in each family generally ranges between 0.4 and 0.6. Notably, our model performs well on RF02540, whose sequence lengths far exceed those in the training set. Although the performance is slightly worse than for the other splits in Table 1, these results still illustrate that our model can handle RNA families that do not appear in the training data, considering that the cross-family setting is inherently more difficult.
In-silico tertiary structure folding validation. To verify whether RiboDiffusion-generated sequences can fold into a given RNA 3D backbone, we use computational structure prediction methods (i.e., RhoFold (Shen et al., 2022) and DRFold (Li et al., 2023)) to obtain their tertiary structures. Structure prediction models with single-sequence input are used because it is difficult to find homologous sequences for generated sequences and perform multiple sequence alignment. We take the TM-score over C1' backbone atoms to measure the similarity between the predicted structure of a generated sequence and the given fixed backbone. Note that in-silico folding validation contains two sources of error: the structure prediction error of the folding method itself, and the quality of the sequences generated by RiboDiffusion. Therefore, we also predict the structure from the original native sequence using the same folding method and compare it to the given RNA backbone as an error and uncertainty reference.
As depicted in Fig. 4 (a), sequences generated by RiboDiffusion exhibit promising folding results for the fixed backbones of medium-length and long RNAs. However, the performance for short RNAs is relatively poor, affected both by the unsatisfactory recovery rate of our model on short RNAs and by the limitations of RhoFold itself. We also show the folding performance using DRFold in Fig. 4 (b), where RiboDiffusion exhibits distribution shapes similar to those obtained with RhoFold. Here, due to the limited inference speed of DRFold, we only test on the representative sequence of each cluster instead of the entire test set. We further present in-silico folding case studies (with RhoFold) of rRNA, tRNA, and a riboswitch in Fig. 4 (e). RiboDiffusion generates new sequences that are different but still tend to fold into similar geometries. To alleviate concerns about the independence of the structure prediction and inverse folding models, we provide results from alternative tools and evaluations on structures independent of the current datasets as an extra reference in the supplementary material.
Trade-off between sequence recovery and diversity. Exploring novel RNA sequences that have the potential to fold into a fixed backbone while remaining distinct from native sequences is a realistic demand for RNA design. However, there is a trade-off between the diversity and the recovery rate of the generated sequences. RiboDiffusion can achieve this balance by controlling the conditional scaling weight. For the representative input backbone of each cluster, we generate 8 sequences in total to report diversity. The diversity within the generated set of sequences G is defined as IntDiv (Benhenda, 2017), where the function Sim compares two sequences by calculating the ratio of the length of the aligned subsequence to the length of the shorter sequence. In Fig. 4 (c), it is evident that the mean diversity of generated sequences in the test sets begins to increase when the conditional scaling weight is set to 0.5, while the recovery rate and the F1 score decrease to some extent. Therefore, we recommend using a value between 0.35 and 0.5 to adjust the sequence diversity.
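The IntDiv computation can be sketched as one minus the mean pairwise similarity over the generated set. Here Sim is approximated with a longest-common-subsequence ratio; the exact alignment used in the paper may differ, so this is an illustrative stand-in.

```python
from itertools import combinations

def sim(a, b):
    """Similarity sketch: longest-common-subsequence length divided by the
    shorter sequence's length (a stand-in for the paper's aligned-subsequence
    ratio Sim)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / min(m, n)

def int_div(seqs):
    """IntDiv: one minus the mean pairwise similarity within the set."""
    pairs = list(combinations(seqs, 2))
    return 1.0 - sum(sim(a, b) for a, b in pairs) / len(pairs)

assert int_div(["AUCG", "AUCG"]) == 0.0   # identical set -> no diversity
assert int_div(["AAAA", "UUUU"]) == 1.0   # fully dissimilar set
```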
Training data augmentation analysis. Augmenting the training data is primarily motivated by the scarcity and limited diversity of RNA structures available in PDB. Table 3 indicates that incorporating additional RhoFold predictions improves the overall quality of the generated sequences. This augmentation also enhances RiboDiffusion's ability to adjust sequence diversity, as shown in Fig. 4 (d), where the sequence diversity of the model without augmented data remains relatively low. Notably, the noisy nature of the augmented data requires appropriate preprocessing and filtering for quality assurance.

Conclusion
We propose RiboDiffusion, a generative diffusion model for RNA inverse folding based on tertiary structures. By benchmarking methods on sequence and structure similarity splits, comparing performance across RNA lengths and types, and validating with in silico folding, we demonstrate the effectiveness of our model. Our model can also trade off recovery against diversity and handle cross-family inverse folding. In future work, we aim to expand the scope of RiboDiffusion by exploring RNA sequences that span much larger sizes and by integrating contact information from complexes into the model. Our ultimate objective is to use the model to design functional RNA such as ribozymes, riboswitches, and aptamers, and to verify its effectiveness in wet-lab experiments.

B.5. Extra In-silico Tertiary Structure Folding Results
To alleviate concerns about the independence of the structure prediction tool and the inverse folding model, we use two extra computational tools, trRosettaRNA (Wang et al., 2023) and SimRNA (Boniecki et al., 2016), to obtain tertiary structures of generated RNA sequences. We also use these tools to predict tertiary structures from the original native sequences; the results are depicted in Figure 3. Besides RhoFold (Shen et al., 2022), we also provide 3D visualized results from DRFold (Li et al., 2023) and trRosettaRNA, which are shown in Figure 6.

B.6. Results on New RNA Structures
We evaluate newly published RNA structures from 2023 to 2024 as an additional reference for our model. After removing redundancy and RNAs similar to the training set, we present 8 structures that have been seen by neither RiboDiffusion nor RhoFold. The results are displayed in Table 3, where Native represents structures predicted from the original sequences of the given backbones as references, while Generated represents structures predicted from the generated sequences.

B.7. Performance on CASP15
To assess the generalizability of the model, RiboDiffusion is tested on six natural RNAs from CASP15 that have no overlap with the training set. As shown in Figure 4 (a) and (b), the performance of RiboDiffusion on these complex RNA backbone structures is impressive, demonstrated by an average recovery rate of 0.56. Furthermore, the TM-score values of the generated sequences are similar to those of the native sequences. However, it is important to note that the in-silico folding results on CASP15 need more follow-up validation, as the TM-score values used as references are not satisfactory.

B.8. Ablation Studies
We perform additional ablation studies to validate the necessity of the sequence module. We train the models on a sequence similarity split and a structure similarity split and report the results in Table 1. In our diffusion model formulation, adding the sequence module improves performance.

The inference time of diffusion-based models depends largely on the number of steps in the sampling process. For the runtime analysis, we use 50 steps, identical to our other experiments. On a GeForce RTX 3090 GPU, we report wall-clock times of RiboDiffusion generation for different RNA lengths and different numbers of sequences generated simultaneously in Figure 5. RiboDiffusion can finish the inverse folding of a 200 nt RNA in about one second when generating a single sequence. However, when generating 128 sequences simultaneously, RiboDiffusion experiences a significant increase in processing time, which limits its scalability. We believe the running speed of RiboDiffusion can be further improved by accelerating the diffusion sampling process, currently an active topic in machine learning.

We report extra results of secondary structure-based inverse folding methods in Table 2. These methods obtain high F1 scores because they directly optimize energy to obtain sequences, making a comparison with other methods unfair. Due to the information loss of secondary structure input compared to tertiary structure input, it is difficult for these methods to generate new sequences within the same family, even for tRNA, which has a relatively conserved shape.
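For context, a secondary-structure F1 score like the one reported in Table 2 can be computed from the base-pair sets of the refolded and reference structures. The sketch below is our illustration of such a metric, not the exact evaluation code used in the paper:

```python
def base_pair_f1(pred_pairs, ref_pairs):
    """F1 score between predicted and reference base-pair sets.
    Pairs are (i, j) index tuples, normalized so that i < j."""
    pred = {tuple(sorted(p)) for p in pred_pairs}
    ref = {tuple(sorted(p)) for p in ref_pairs}
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)          # correctly recovered base pairs
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because energy-based designers optimize the secondary structure directly, they can drive this score up without preserving the tertiary-structure or family-specific signal.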

Fig. 1: Overview of RiboDiffusion for tertiary structure-based RNA inverse folding. We construct a dataset with experimentally determined RNA structures from PDB, supplemented with additional structures predicted by an RNA structure prediction model. We cluster RNA with different cut-offs for sequence or structure similarity and make cross-splits to evaluate models. RiboDiffusion trains a neural network with a structure module and a sequence module to recover the original sequence from a noisy sequence and a coarse-grained RNA backbone extracted from the tertiary structure. RiboDiffusion then uses the trained network to iteratively refine random initial sequences until they match the target structure. We present a comprehensive evaluation and analysis of the proposed method.

Fig. 2 :
Fig. 2: Violin plots of the recovery rate distribution of each method for different types of RNA, including tRNA, rRNA, sRNA, ribozyme, snRNA, SRP RNA, hammerhead ribozyme, and pre-miRNA.

Fig. 3 :
Fig. 3: Performance of RiboDiffusion on different RNA families under the cross-family setting. The average length and number of tertiary structures for each family are marked above the violin plots.

Fig. 4 :
Fig. 4: Analysis of RiboDiffusion. (a)-(b) In-silico folding validation results showing the TM-score between structures predicted by RhoFold or DRFold and the given fixed RNA backbones (on the Seq. 0.4 split). Native represents structures predicted from original sequences of given backbones as references, while Generated represents structures predicted from generated sequences. (c)-(d) Trade-offs between the diversity of generated sequences and the recovery rate, as well as the refolding F1-score (including models with and without augmented data). (e) Visualization of input RNA structures (pink) and predicted structures (green) of generated sequences. The generated sequences and the corresponding native sequences are shown below the structure visualization, where differing nucleotide types are marked in red.

Fig. 2 :
Fig. 2: The correlation between different mutation rates and free energy (with random mutation and RiboDiffusion).
As shown in Figure 3(a), generated and native sequences have similar TM-score distributions when predicted by trRosettaRNA. The SimRNA results are shown in Figure 3(b). SimRNA performs relatively poorly, which indicates that although the generated sequences have a TM-score distribution similar to that of the native sequences, refolding evaluation based on SimRNA may carry large error and uncertainty.

Fig. 3 :
Fig. 3: In-silico folding validation results of trRosettaRNA and SimRNA, showing the TM-score between structures predicted by trRosettaRNA or SimRNA and the given fixed RNA backbones (on the Seq. 0.4 split). Native represents structures predicted from original sequences of given backbones as references, while Generated represents structures predicted from generated sequences.

Fig. 4 :
Fig. 4: Performance on CASP15. (a) A bar chart shows the recovery rate of RiboDiffusion on six natural RNAs in CASP15. (b) A bar chart displays the TM-score between predicted structures of RiboDiffusion-generated sequences and the given RNA backbones. The TM-score of structures predicted from native sequences is displayed as a reference.

Fig. 5 :
Fig. 5: Running time and scalability analysis. A line chart shows the relationship between running time and RNA sequence length when predicting different numbers of RNA sequences simultaneously.

Table 2 .
Comparison of secondary structure similarity and success rate of family preservation. F1: F1 score. Suc.: success rate of family preservation, determined by Rfam's covariance model (Kalvari et al., 2021), which evaluates the preservation of family-specific information in the generated sequences, indicating conserved structures and functions. Average success rates across families are reported.

Table 3 .
Ablation study on data augmentation. Rec.: recovery rate.

Table 2 .
Comparison of secondary structure similarity and success rate of family preservation. The F1 score is an unfair metric for energy-optimized methods.

Table 3 .
Results on newly published RNA structures. TM-score (generated) is calculated between the given structure and the refolded structure from the RiboDiffusion-RhoFold pipeline. TM-score (native) is calculated between the given structure and the structure predicted by RhoFold from the original native sequence.