Genome-scale quantification and prediction of pathogenic stop codon readthrough by small molecules

Premature termination codons (PTCs) cause ~10–20% of inherited diseases and are a major mechanism of tumor suppressor gene inactivation in cancer. A general strategy to alleviate the effects of PTCs would be to promote translational readthrough. Nonsense suppression by small molecules has proven effective in diverse disease models, but translation into the clinic is hampered by ineffective readthrough of many PTCs. Here we directly tackle the challenge of defining drug efficacy by quantifying the readthrough of ~5,800 human pathogenic stop codons by eight drugs. We find that different drugs promote the readthrough of complementary subsets of PTCs defined by local sequence context. This allows us to build interpretable models that accurately predict drug-induced readthrough genome-wide, and we validate these models by quantifying endogenous stop codon readthrough. Accurate readthrough quantification and prediction will empower clinical trial design and the development of personalized nonsense suppression therapies.

To assess the performance of our DMS sort-seq readthrough assay, we validated the DMS readthrough values of 15 variants under SRI treatment by integrating each variant individually into the genome and measuring its readthrough by flow cytometry (see the single-variant validation section of the Methods; Fig. 1d). The correlation is very high (Pearson r=0.95) when a single outlier is removed (14/15 variants); including this outlier lowers the correlation to r=0.8. Ten more variants spanning readthrough values around the outlier were tested individually, confirming the loss of accuracy in the upper readthrough range (Extended Data Fig. 1h).
The disagreement between sort-seq and validation readthrough values for these high-readthrough variants reflects a well-known sort-seq problem (ref. 1). Variants whose readthrough distributions are narrower than the width of the sorting gate are more likely to be miscalled. Our gates were designed on a logarithmic scale, so high-readthrough gates are wider than low-readthrough gates. This holds especially for the highest gate, which was designed to include all cells above the upper edge of the second-highest gate so as not to lose any high-readthrough cells. As a result, only very high-readthrough variants are affected, and our assay has a readthrough upper bound of 6% (under SRI treatment), which prevents us from quantifying the exact readthrough efficiency of variants with readthrough >6% (Fig. 1d). Note that we underestimate the readthrough of these variants: their true readthrough is equal to or higher than (but not lower than) our DMS estimate.
For how many variants is readthrough underestimated? Comparing the DMS estimates with the individual measurements, we found that DMS readthrough estimates start to saturate for variants with >90% of reads in the last gate. We therefore computed the percentage of variants in each treatment with more than 90% of reads in the highest gate. This percentage is around 1% (0.41, 0.22, 1.54, 1.73, 1.04, 0.54, 0.57 and 1.31% for CC90009, clitocine, DAP, FUr, G418, gentamicin, SJ6986 and SRI, respectively), meaning that ~99% of the library lies within the optimal dynamic range of the assay. We used the average readthrough of these saturated variants as the assay upper limit for each treatment: 2.6%, 2.8%, 4.5%, 1.5%, 3.2%, 1.2%, 6.3% and 5.8% for CC90009, clitocine, DAP, FUr, G418, gentamicin, SJ6986 and SRI, respectively. The same calculations were performed for the NTC library to define the upper limit for each drug condition, with 0.75%, 0.25%, 0.6%, 0.31% and 0.98% of variants having >90% of reads in the last gate and readthrough values saturating at 2.6%, 2.9%, 3%, 5.1% and 3.6% for clitocine, DAP, G418, SJ6986 and SRI, respectively.
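The saturation check described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the paper: the input layout (per-variant read counts ordered from the lowest to the highest gate, paired with a readthrough estimate), the function names and the toy numbers are all assumptions made for the example. Only the 90% threshold and the use of the mean readthrough of the flagged variants come from the text.

```python
from statistics import mean

# Threshold taken from the text: variants with >90% of reads in the
# highest gate are treated as saturated.
SATURATION_THRESHOLD = 0.90


def top_gate_fraction(gate_counts):
    """Fraction of a variant's reads that fall in the highest (last) gate."""
    total = sum(gate_counts)
    return gate_counts[-1] / total if total else 0.0


def saturation_summary(variants):
    """Summarise saturation for one treatment.

    variants: non-empty dict mapping a variant name to a tuple
              (gate_counts, readthrough_percent); hypothetical layout.
    Returns (percentage of saturated variants,
             mean readthrough of saturated variants or None).
    """
    saturated = [
        readthrough
        for gate_counts, readthrough in variants.values()
        if top_gate_fraction(gate_counts) > SATURATION_THRESHOLD
    ]
    percent_saturated = 100.0 * len(saturated) / len(variants)
    upper_limit = mean(saturated) if saturated else None
    return percent_saturated, upper_limit


# Toy data (invented): read counts per gate, lowest to highest gate.
toy_variants = {
    "v1": ([500, 300, 150, 50], 0.4),
    "v2": ([10, 40, 250, 700], 2.1),
    "v3": ([0, 5, 45, 950], 5.6),
    "v4": ([0, 0, 20, 980], 6.0),
}

percent_saturated, upper_limit = saturation_summary(toy_variants)
print(percent_saturated, upper_limit)
```

In this toy set, v3 and v4 exceed the threshold, so half of the variants are flagged and their mean readthrough serves as the illustrative upper limit. In practice the same summary would be computed separately for each treatment and library.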

Supplementary Note 2: Sequence effects are preserved across drug concentrations
The optimal drug concentration might differ in other assays and clinical applications. We therefore measured the library at two additional SJ6986 concentrations (0.5 μM and 20 μM) to test whether the sequence effects are concentration-specific. The data were of high quality (inter-replicate correlations r=0.92 and r=0.91 for 0.5 μM and 20 μM, respectively) and showed a shift in the readthrough distribution of the library relative to the 5 μM condition (mean readthrough across all variants of 0.93%, 1.64% and 1.81% for the 0.5 μM, 5 μM and 20 μM conditions, respectively). All three pairwise comparisons showed very good correlations, indicating the absence of interaction effects between drug concentration and sequence context (Extended Data Fig. 2k,l). The 5 μM and 20 μM readthrough efficiencies show an excellent linear correlation (r=0.92), whereas 0.5 μM shows a slightly lower linear correlation with each of them (r=0.88 with both 5 μM and 20 μM). This indicates a non-linear trend in which, in the 5 μM and 20 μM conditions, very high readthrough variants display lower readthrough than predicted by a linear model (Extended Data Fig. 2l). In other words, the readthrough of variants that already have very high readthrough at 0.5 μM increases less than a linear model would predict upon 5 μM and 20 μM treatment. In summary, variants are ranked consistently across drug concentrations, but their readthrough efficiencies do not always scale linearly.
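The distinction between preserved ranking and non-linear scaling can be illustrated by comparing a linear (Pearson) with a rank-based (Spearman) correlation. The sketch below is an illustration under stated assumptions, not the analysis performed here: the readthrough values are invented, and the saturating response used to generate the high-dose values is a hypothetical stand-in for the trend described above.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented readthrough values (%) at a low dose, and a hypothetical
# saturating but monotonic response at a higher dose.
low_dose = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
high_dose = [6.0 * v / (v + 1.0) for v in low_dose]

r_linear = pearson(low_dose, high_dose)
r_rank = spearman(low_dose, high_dose)
print(r_linear, r_rank)
```

Because the toy response is monotonic, the rank correlation is 1 while the linear correlation is lower, which is the pattern expected when variants keep their ranking but high-readthrough variants increase less than a linear model would predict.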

Supplementary Note 3: From readthrough signatures to mechanism of action (MOA)
Some of the sequence preferences of particular drugs can be understood from their MOAs. For example, DAP interferes with cytosine 34 modification in tRNATrp and makes it more prone to near-cognate codon pairing (ref. 2). The only near-cognate stop codon to tryptophan (UGG) is UGA, which is consistent with DAP showing the highest UGA-specificity amongst all drugs tested. As another example, in our data the adenosine analog clitocine induces readthrough of UAA and UGA variants. The insertion of clitocine at position 3 (the wobble position) of the codon increases near-cognate pairing, whereas position 2 is intolerant to mispairing (ref. 3), potentially preventing clitocine-induced readthrough of UAG variants. Future research might consider the sequence-drug specificities described here to gain additional insights into drug MOAs.

Supplementary Note 4: Comparison of sequence context effects
Sequence downstream of the PTC: For all drugs, the preferred +1 nucleotide is C, but the remaining nucleotides show distinct preferences across drugs. For example, U is the +1 nucleotide least sensitive to readthrough for SRI and SJ6986, whereas it is the second most sensitive for G418 and gentamicin. Interestingly, drugs with a similar preference for particular stop codons can differ in their preference for the +1 nt. G418 and SRI both promote readthrough of UGA codons over UAG and UAA, but the G418 preferences at the +1 nt are C>U>A>G whereas those of SRI are C>A>G>U (all comparisons adjusted p<1e-05, Wilcoxon). Drugs also show quantitative differences in their +1 nt preferences. For example, readthrough by all drugs is greater for C (n=542) than for G (n=1074) at the +1 nt, but this preference is stronger for SRI, SJ6986 and G418 (3-, 2.5- and 2.8-fold, respectively) than for clitocine and DAP (1.2- and 1.5-fold, respectively) (adjusted p<2e-16) (Fig. 2e, Extended Data Fig. 2d).
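The +1 nucleotide comparison above amounts to grouping variants by the first nucleotide downstream of the stop codon and comparing the groups. The sketch below shows one way to obtain a preference order and a C-over-G fold change for a single drug. It is a hedged illustration: the text does not state which summary statistic underlies the reported fold changes, so the group median is an assumption, and the function names, input layout and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def group_by_plus1(variants):
    """Group readthrough values by the +1 nucleotide.

    variants: iterable of (downstream_sequence, readthrough_percent),
              where downstream_sequence starts at the +1 position.
    """
    groups = defaultdict(list)
    for downstream, readthrough in variants:
        groups[downstream[0].upper()].append(readthrough)
    return groups


def fold_preference(groups, preferred="C", reference="G"):
    """Ratio of group medians (assumed statistic, not from the text)."""
    return median(groups[preferred]) / median(groups[reference])


def rank_nucleotides(groups):
    """Nucleotides ordered from highest to lowest median readthrough."""
    return sorted(groups, key=lambda nt: median(groups[nt]), reverse=True)


# Invented readthrough values (%) for one hypothetical drug.
toy_variants = [
    ("CUG", 3.0), ("CAA", 2.4), ("CGU", 3.6),
    ("GAU", 1.0), ("GCC", 0.8), ("GUA", 1.2),
    ("UAC", 1.8), ("UGG", 2.0),
    ("AAG", 1.5), ("AUC", 1.3),
]

groups = group_by_plus1(toy_variants)
fold_c_over_g = fold_preference(groups)
preference_order = rank_nucleotides(groups)
print(fold_c_over_g, preference_order)
```

Running the same grouping per drug would give one preference order and one fold change per treatment, which could then be compared across drugs. Significance testing, such as the Wilcoxon tests cited above, is deliberately left out of this sketch.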
Sequence upstream of the PTC: Clustering the sequences in our library by the upstream codon reveals upstream preferences for each of the drugs (p<2e-16, Kruskal-Wallis test) (Fig. 2h, Extended Data Fig. 2b).