An AsCas12f-based compact genome-editing tool derived by deep mutational scanning and structural analysis


INTRODUCTION
CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) systems provide adaptive immunity against mobile genetic elements in bacteria and archaea and are divided into two classes (classes 1 and 2) and six types (types I-VI).¹,² Cas9 (type II) from Streptococcus pyogenes (SpCas9) associates with dual RNA guides (CRISPR RNA [crRNA] and trans-activating crRNA [tracrRNA] or their artificially connected single-guide RNA [sgRNA]) and cleaves double-stranded DNA (dsDNA) targets flanked by an NGG (where N is any nucleotide) protospacer adjacent motif (PAM), using its HNH and RuvC nuclease domains.³,⁴ By contrast, among the diverse type V Cas12 enzymes, Cas12a from Acidaminococcus sp. (AsCas12a) binds a crRNA and cleaves dsDNA targets with TTTV (where V is A, G, or C) PAMs, using a single RuvC nuclease domain.⁵ As SpCas9 and AsCas12a exhibit robust nuclease activities in eukaryotic cells, they are widely used as versatile genome engineering tools.⁵,⁶ However, neither SpCas9 (1,368 amino acids) nor AsCas12a (1,307 amino acids) can be efficiently packaged into a single adeno-associated virus (AAV) vector due to their large gene sizes, which hampers their clinical application to in vivo gene therapy.
Recent studies demonstrated that type V-F Cas12 effectors are exceptionally compact (400-700 amino acids) RNA-guided DNA endonucleases.⁷,⁸ Cas12f associates with dual crRNA:tracrRNA guides and cleaves target DNAs with T-rich PAMs. Previous structural studies of Cas12f from an uncultured archaeon (UnCas12f), with 529 amino acids, revealed that UnCas12f functions as a dimer to compensate for its small size.⁹,¹⁰ However, UnCas12f cleaves DNA targets only under low salt conditions in vitro and lacks activity in human cells, which limits its application as a genome-editing tool. The minimal Cas12f from Acidibacillus sulfuroxidans (AsCas12f), which consists of only 422 amino acids, can cleave DNA targets with a TTR (where R is A or G) PAM under physiological conditions in vitro.⁸ In addition, AsCas12f exhibits low but detectable genome-editing activity in human cells.¹¹ Therefore, AsCas12f shows great promise as a miniature genome-editing tool that can be packaged into a single AAV vector.
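The size argument above can be made concrete with a small calculation. The following is a minimal sketch, not part of the study: it converts the protein lengths quoted in the text into coding-sequence lengths and compares them with an assumed AAV packaging capacity of about 4.7 kb (a commonly cited approximate figure, not a value from this paper). The function and constant names are illustrative, and the promoter, polyadenylation signal, sgRNA cassette, and inverted terminal repeats are deliberately left out.

```python
# Rough coding-sequence sizes versus an assumed AAV packaging limit.
# AAV_CAPACITY_BP is an assumption (~4.7 kb is a commonly cited figure);
# regulatory elements and the sgRNA cassette are not counted here.
AAV_CAPACITY_BP = 4700

# Protein lengths (amino acids) as quoted in the text.
EFFECTOR_LENGTHS_AA = {
    "SpCas9": 1368,
    "AsCas12a": 1307,
    "UnCas12f": 529,
    "AsCas12f": 422,
}


def coding_length_bp(n_residues: int) -> int:
    """Length of the open reading frame in bp, including one stop codon."""
    return 3 * (n_residues + 1)


def remaining_capacity_bp(n_residues: int, capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Base pairs left for promoters, sgRNAs, and partner genes."""
    return capacity_bp - coding_length_bp(n_residues)


for name, n_aa in EFFECTOR_LENGTHS_AA.items():
    print(f"{name}: ORF {coding_length_bp(n_aa)} bp, "
          f"{remaining_capacity_bp(n_aa)} bp remaining")
```

Under these assumptions, the AsCas12f open reading frame leaves roughly 3.4 kb for other elements, whereas SpCas9 leaves under 0.6 kb, which is consistent with the packaging limitation described above.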
In this study, we present a comprehensive effort to improve the AsCas12f system for genome editing. By combining structural analysis and deep mutational scanning (DMS) methods, we revealed the molecular basis and identified a detailed landscape of amino acid substitutions that greatly augment the nuclease activity of AsCas12f. The synergistic effects of these mutations, coupled with guide RNA engineering, remarkably enhanced the genome-editing efficiency of AsCas12f in human cells to levels comparable with those of both SpCas9 and engineered AsCas12a effectors. The compact size of AsCas12f makes it attractive for AAV delivery together with sgRNAs and partner genes, such as base editors and epigenome modifiers. Therefore, our newly engineered AsCas12f systems could be a promising genome-editing platform.

RESULTS
Cryo-EM structure of the AsCas12f-sgRNA-target DNA ternary complex
To understand the molecular mechanism of AsCas12f, we used cryoelectron microscopy (cryo-EM) to analyze the structure of AsCas12f in complex with a 222-nucleotide (nt) sgRNA (natural 29-nt crRNA and 169-nt tracrRNA sequences connected by a GAAA tetraloop) and a 38-nt dsDNA containing a TTA PAM, with phosphorothioate modifications of the DNA backbone around the cleavage site, and obtained a reconstruction at an overall resolution of 3.1 Å (Figures 1A-1C and S1A-S1D; Table 1). The structure revealed that two AsCas12f molecules (AsCas12f.1 and AsCas12f.2) assemble with one sgRNA molecule to form an asymmetric homodimer, as previously observed in UnCas12f.⁹,¹⁰ The Cas12f dimer adopts a bilobed architecture comprising a recognition (REC) lobe and a nuclease (NUC) lobe, with the guide RNA-target DNA heteroduplex bound to the central channel between the two lobes. The REC lobe consists of the wedge (WED) and REC domains of both AsCas12f.1 and AsCas12f.2 (WED.1/WED.2/REC.1/REC.2), whereas the NUC lobe includes the RuvC and target nucleic-acid-binding (TNB) domains of both AsCas12f.1 and AsCas12f.2 (RuvC.1/RuvC.2/TNB.1/TNB.2).
AsCas12f is over 100 residues shorter than UnCas12f, whereas its cognate sgRNA is about 40 nucleotides longer than that of UnCas12f. A structural comparison of AsCas12f with UnCas12f revealed that, although the domain configurations are similar, AsCas12f lacks the zinc-finger (ZF) domain inserted between the WED and REC domains, which is observed in UnCas12f (Figure S2A). Instead, in the AsCas12f structure, the ZF domain is replaced with PK 2 and stem 3 of the sgRNA, which are not present in UnCas12f (Figure S2A). These findings account for the miniaturization of AsCas12f, which is compensated for by its longer sgRNA.
The assembly of the sgRNA scaffold with AsCas12f is facilitated by both base-specific and non-specific interactions (Figures 2A-2E). The continuous helix consisting of stem 1 and PK 1 is primarily recognized and accommodated within the groove formed by the WED.1 and RuvC.1 domains (Figure 2C). Stem 2 passes through the gap between AsCas12f.1 and AsCas12f.2 and extensively interacts with both protomers, likely reinforcing their dimerization, as described above (Figures S2F and S2G). PK 2 is recognized by the WED.1 and REC.1 domains through sugar-phosphate backbone interactions and further stabilized by the coordination of metal ions (Figure 2D). By contrast, stems 3 and 4 are exposed to the solvent and have minimal interactions with AsCas12f.
The guide RNA-target DNA heteroduplex is accommodated within the positively charged central channel and recognized by AsCas12f through base non-specific interactions (Figures 2B and 2E). The PAM-proximal region of the heteroduplex (G1:dC18 to G12:dC7) is mainly recognized by AsCas12f.1, whereas the PAM-distal region of the heteroduplex (C13:dG6 to G18:dC1) is recognized by AsCas12f.2 (Figure 2E). Notably, P240.2 in RuvC.2 stacks with the G18:dC1 base pair in the heteroduplex, indicating that 18 nucleotides in the spacer sequence function as a guide segment (Figure 2E). A structural comparison with UnCas12f revealed that the position of RuvC.1 in AsCas12f is similar to that of RuvC.1 in UnCas12f, which cleaves both the target and non-target strands (Figure 2F). These structural observations suggest that AsCas12f.1 is responsible for cleaving the target DNA, whereas AsCas12f.2 plays a crucial role in recognizing the PAM-distal region of the heteroduplex.

Engineering of AsCas12f to enhance genome-editing efficiency in mammalian cells
Previous studies revealed that AsCas12f shows limited genome-editing activity in human cells.⁸,¹¹ To expand the utility of this compact protein, we sought to engineer an AsCas12f variant with enhanced activity. We first performed DMS to determine how all amino acid substitutions impact the genome-editing efficiency in HEK293T cells. We designed a plasmid that expresses both EGFP and AsCas12f and created an AsCas12f library that encompasses all 20 amino acid substitutions at each position of the whole sequence (M1-K422) (Figure 3A). The sgRNA targeting GFP was transduced into HEK293T cells, and subsequently, the AsCas12f library packaged within a lentivirus was expressed at a multiplicity of infection (MOI) of less than 0.2, ensuring that no more than one mutant AsCas12f was expressed per cell (Figure 3B).¹² Given that the full-length sgRNA exhibited suboptimal genome-editing efficiency, a stem 5-deleted variant (sgRNA_ΔS5), which exhibited enhanced genome editing in human cells, was used for the screening process. We selected lentivirus-infected cells by sorting GFP-positive cells on day 2 post-infection and cultured them for a further 5 days. On day 7 post-infection, we separately extracted mRNAs from GFP-positive and GFP-negative cells and performed deep sequencing analysis to identify mutations (Figure 3B). The editing efficiency of each mutation was defined as the ratio of the GFP-negative read count to the total read count and normalized by the value of wild-type (WT) AsCas12f (Figure 3C). DMS experiments were performed in duplicate and yielded similar results (R² [coefficient of determination] = ~0.6), as in our previous study (Figure S3A).¹³
We identified over 200 single amino acid substitutions that resulted in a more than 20% increase in editing efficiency compared with the WT (Figures 3C, 3D, and S3B). Among them, we first focused on the S188H and D195K mutations, which are located close to the nucleic acids in our structure and could form additional interactions to reinforce the target DNA binding by AsCas12f, and adopted them for further experiments. We next developed a split-GFP reporter system, which is connected by a frameshift linker containing the vascular endothelial growth factor A (VEGFA) sequence, and evaluated the genome-editing activity of WT and several mutants by measuring the ratio of frameshifted GFP-positive cells (Figure S3C).¹⁴ Consistent with the DMS analysis, the S188H and D195K mutants exhibited higher genome-editing activities than WT (Figure 3E). To create variants with higher genome-editing activity than the single S188H or D195K variants, we combined the S188H or D195K substitution with other substitutions selected based on the DMS results and our structural perspective. Although some double mutants showed decreased genome-editing activity, S188H/V232A and D195K/V232A exhibited improved activity (Figures 3E and S3D). To further increase their activities, we introduced additional mutations that enhanced the activity when combined with S188H or D195K, resulting in the generation of seven triple mutants. The S188H/V232A/E316M and D195K/D208R/V232A triple mutants showed further improved genome-editing activities (Figures 3E and S3E). Lastly, we constructed 29 quadruple mutants by combining the S188H/V232A/E316M or D195K/D208R/V232A triple mutants with other effective substitutions identified through the DMS analysis and assessed their genome-editing activities. Most of these variants showed comparable or lower activities than the triple mutants, but the F48Y/S188H/V232A/E316M and I123H/D195K/D208R/V232A quadruple mutants showed higher activities (Figure S3F). We designated the F48Y/S188H/V232A/E316M and I123H/D195K/D208R/V232A variants as AsCas12f-YHAM and AsCas12f-HKRA, respectively.
Figure legend (continued): To differentiate between the non-target strand (NTS) and the target strand (TS) within the manuscript, asterisks are used to denote NTS. (E) Structure of the sgRNA and target DNA complex. The dotted arrows represent the backbone direction, from 5′ to 3′. See also Figures S1 and S2.
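The DMS scoring described in this section (the ratio of GFP-negative reads to total reads, normalized to WT, with a 20% improvement cutoff) can be sketched in a few lines. This is an illustrative sketch only and not the authors' analysis pipeline: the variant labels and read counts are hypothetical, "total" is assumed to mean GFP-negative plus GFP-positive reads, and the replicate agreement is computed as the squared Pearson correlation, which is one common reading of R².

```python
def editing_efficiency(gfp_negative_reads: int, gfp_positive_reads: int) -> float:
    """Fraction of reads for a variant that come from GFP-negative cells.

    Assumption: total reads = GFP-negative + GFP-positive reads.
    """
    total = gfp_negative_reads + gfp_positive_reads
    if total == 0:
        raise ValueError("no reads for this variant")
    return gfp_negative_reads / total


def normalized_scores(counts: dict, wild_type: str = "WT") -> dict:
    """Editing efficiency of each variant divided by the wild-type value."""
    wt_eff = editing_efficiency(*counts[wild_type])
    return {name: editing_efficiency(*c) / wt_eff for name, c in counts.items()}


def improved_variants(scores: dict, threshold: float = 1.2, wild_type: str = "WT") -> list:
    """Variants whose normalized score exceeds the threshold (1.2 = +20%)."""
    return sorted(n for n, s in scores.items() if n != wild_type and s > threshold)


def r_squared(x: list, y: list) -> float:
    """Squared Pearson correlation between two replicate score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical read counts: (GFP-negative reads, GFP-positive reads).
example_counts = {
    "WT": (100, 900),
    "variant_A": (300, 700),
    "variant_B": (250, 750),
    "variant_C": (50, 950),
}
scores = normalized_scores(example_counts)
print(improved_variants(scores))
```

With these made-up counts, variant_A and variant_B pass the 20% cutoff and variant_C does not.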

Optimization of sgRNA for AsCas12f
A recent study reported that deleting stem 5 from the sgRNA enhances the genome-editing activity of AsCas12f.¹⁵ Consistent with this finding, stem 5 is completely disordered in our structure. Our present structure also revealed that, although PK1, PK2, stem 1, and stem 2 form extensive interactions with the AsCas12f protein, stems 3 and 4 are exposed to the solvent and form few interactions with the protein (Figure 2A). Thus, we sought to further engineer the sgRNA by truncating these domains. We designed seven sgRNA variants (sgRNA_ΔS3-5_v1-v7) to eliminate stems 3, 4, and 5 while connecting the original architecture of the remaining regions by different linker sequences (Figure S4A). As expected, all the sgRNA variants exhibited increased genome-editing activities compared with sgRNA_ΔS5, with sgRNA_ΔS3-5_v7 being the most effective, albeit by a slight margin (Figure S4B). In our initial DMS experiment, the sgRNA transduced by a multi-copy lentivirus led to higher genome-editing activities than that by a single-copy lentivirus. Therefore, we hypothesized that the expression level of sgRNA in cells may influence the activity of AsCas12f. Indeed, the expression level of sgRNA_ΔS3-5_v7 was 4.5-fold higher than that of sgRNA_ΔS5 in HEK293 cells (Figure S4C), suggesting that shortening the sgRNA sequence led to an increase in its expression, and consequently enhanced the genome-editing activity. To further elucidate the effects of our sgRNA modification, we transduced the GFP-targeting sgRNA_ΔS5 and sgRNA_ΔS3-5_v7 by single-copy lentivirus infection and evaluated the deletion rate of GFP. sgRNA_ΔS3-5_v7 significantly improved the GFP deletion efficiencies of both AsCas12f-WT and AsCas12f-HKRA, compared with sgRNA_ΔS5 (Figures 3F and S4D), suggesting that sgRNA_ΔS3-5_v7 is effective in situations with a limited expression level, such as in vivo gene delivery.

Cryo-EM structures of AsCas12f variants with optimized sgRNA
To gain mechanistic insights into the enhanced DNA cleavage activity exhibited by our AsCas12f variants, we determined the cryo-EM structures of the AsCas12f-YHAM-sgRNA_ΔS3-5_v7-target DNA and AsCas12f-HKRA-sgRNA_ΔS3-5_v7-target DNA complexes, both at overall resolutions of 2.9 Å (Figures 4A and S1E-S1L; Table 1). The overall structures of the AsCas12f-YHAM and AsCas12f-HKRA variants are similar to that of WT AsCas12f, revealing that the mutations introduced by our DMS experiment do not substantially affect the overall structures of the complexes.
In the AsCas12f-YHAM structure, in addition to the hydrophobic interactions observed in WT AsCas12f, Tyr48.1 (F48Y.1), located at the dimer interface, may form a hydrogen bond with the main chain of G57.2, likely promoting more stable dimerization (Figure 4B). The side chain of His188.1 (S188H.1) is stabilized by Ile2 and interacts with the backbone phosphate group between dT19 and dC18, suggesting that the S188H mutation promotes the unwinding of the target DNA (Figure 4C). Met316.1 and Met316.2 (E316M.1 and E316M.2) form additional hydrophobic interactions with Thr239 and Ala241, thereby potentially enhancing the overall stability (Figure 4D).
Despite the presence of the V232A mutation in both the AsCas12f-YHAM and AsCas12f-HKRA variants, our structures do not provide clear insight into its contribution to the enhanced DNA cleavage activities of both variants. These findings suggest that DMS-based engineering approaches offer significant potential for generating highly active mutants that cannot be predicted from structural information alone.

Characterization of AsCas12f variants in human cells
To assess the PAM specificity of enAsCas12f variants toward a broad range of targets, we developed a self-targeting library composed of both the sgRNA and its target sequence and measured insertion-deletion (indel) formations induced by WT AsCas12f (referred to as AsCas12f for simplicity), AsCas12f-YHAM, and AsCas12f-HKRA toward 750 different spacer sequences with 16 NTTN PAMs (Figures 5A and S5A). The deep sequencing analysis demonstrated that the three enzymes induced indels at the NTTR PAM, but not the NTTY (where Y is T or C) PAM, indicating that they commonly recognize the NTTR sequences as the PAM (Figure 5B). Our cryo-EM structures revealed that the nucleobases of dT(−3*) and dT(−2*) in the PAM duplex form hydrophobic interactions with Tyr76.1, whereas the N6 of dA(−1*) forms a hydrogen bond with His72.1 (Figure S5B). When dA(−1*) is replaced by dG, the O6 of dG(−1*) can form a hydrogen bond with His72.1, which explains the NTTR PAM preference. At the NTTR PAM, AsCas12f induced indels at 3.0% on average, whereas AsCas12f-YHAM and AsCas12f-HKRA induced indels at 40.8% and 44.7% on average, respectively (Figure 5B). These results established that the enAsCas12f variants exhibit genome-editing activities much higher than that of AsCas12f at various target sequences with the NTTR PAM.
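The PAM logic described above (16 possible NTTN PAMs, of which the NTTR subset is permissive) can be illustrated with a short sketch. This is a hypothetical helper, not code from the study: it enumerates the NTTN PAMs, classifies them as NTTR or NTTY, and scans one strand of a sequence for 5′-NTTR PAMs followed by a protospacer. The 20-nt protospacer length is an assumption for illustration, and the example sequence is made up.

```python
from itertools import product

PURINES = {"A", "G"}  # IUPAC code R


def all_nttn_pams() -> list:
    """Enumerate the 16 four-letter PAMs of the form N-T-T-N."""
    return [f"{n1}TT{n4}" for n1, n4 in product("ACGT", repeat=2)]


def is_nttr(pam: str) -> bool:
    """True if the PAM matches NTTR (R = A or G)."""
    pam = pam.upper()
    return (len(pam) == 4 and pam[0] in "ACGT"
            and pam[1:3] == "TT" and pam[3] in PURINES)


def find_targets(seq: str, spacer_len: int = 20) -> list:
    """Scan the given strand for a 5'-NTTR PAM followed by a protospacer.

    Returns (PAM start index, PAM, protospacer) tuples. Only the given
    strand is scanned; spacer_len = 20 is an illustrative assumption.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 4 - spacer_len + 1):
        pam = seq[i:i + 4]
        if is_nttr(pam):
            hits.append((i, pam, seq[i + 4:i + 4 + spacer_len]))
    return hits


pams = all_nttn_pams()
permissive = [p for p in pams if is_nttr(p)]
print(len(pams), len(permissive))

example = "CCATTGA" + "ACGT" * 5  # made-up sequence with one ATTG PAM
print(find_targets(example))
```

Half of the 16 NTTN PAMs fall into the NTTR class, which matches the split between edited (NTTR) and unedited (NTTY) PAMs reported in the text.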
Next, we compared the genome-editing efficiencies of AsCas12f and enAsCas12f variants with those of AsCas12a and AsCas12a variants (UltraCas12a and enAsCas12a),¹⁶,¹⁷ at five target sites with the NTTG PAM in HEK293T cells. AsCas12f, AsCas12f-YHAM, AsCas12f-HKRA, AsCas12a, UltraCas12a, and enAsCas12a generated indels at these five sites with 8.5%, 22.3%, 14.3%, 9.0%, 11.0%, and 11.1% frequencies on average, respectively (Figure 5C). These results indicated that the enAsCas12f variants, but not AsCas12f, can induce indels at efficiencies comparable with those of AsCas12a and the AsCas12a variants. We also compared the genome-editing efficiencies with those of SpCas9 at eight target sites, which had the same spacer sequences with a 5′-TTTG PAM for AsCas12f and an NGG-3′ PAM for SpCas9, in HEK293T cells (Figure 5D). AsCas12f, AsCas12f-YHAM, AsCas12f-HKRA, and SpCas9 generated indels at the eight target sites with 1.7%, 14.8%, 16.9%, and 22.9% frequencies on average, respectively (Figure 5D). Furthermore, we measured indel formations at three therapeutic targets, PCSK9 and ANGPTL3 for atherosclerosis¹⁸ and TTR for transthyretin amyloidosis,¹⁹ in HEK293T cells. We observed higher indel frequencies for AsCas12f-YHAM and AsCas12f-HKRA than those for SpCas9 at two of the three target sites (Figure 5E). Similar results were obtained at other target sites and in other human cell lines, Huh-7 hepatoma cells, and HT-1080 sarcoma cells (Figures S5C-S5F), indicating that both AsCas12f-YHAM and AsCas12f-HKRA exhibit genome-editing activities comparable with those of SpCas9 and AsCas12a variants regardless of the target sites or cell lines.
Figure legend (continued): (F) Time course of GFP deletion induced by AsCas12f and AsCas12f-HKRA in HEK293T cells expressing d2EGFP (n = 3, mean ± SD). sgRNA_ΔS5 and sgRNA_ΔS3-5_v7 were transduced by a single-copy lentivirus infection. See also Figures S3 and S4.
To examine the specificities of enAsCas12f variants, we investigated their mismatch tolerance. Both enAsCas12f variants showed broad tolerance of a single mismatch but negligible tolerance of double mismatches, except for those at the PAM-distal region (positions 16-20), as observed with Cas12a and other Cas12fs (Figure S6A).¹⁵,²⁰,²¹ We also examined the genome-wide specificities of AsCas12f-HKRA and SpCas9 using GUIDE-seq (genome-wide, unbiased identification of double-stranded breaks [DSBs] enabled by sequencing)²²⁻²⁴ at nine target sites. AsCas12f-HKRA and SpCas9 had comparable numbers of off-target sites, and AsCas12f-HKRA tolerated mismatches at the PAM-distal 5-nt region, consistent with our mismatch experiments (Figures S6B and S6C). These results demonstrated that enAsCas12f variants with sgRNA_ΔS3-5_v7 exhibit genome-editing activities and specificities comparable with those of SpCas9 and AsCas12a, despite their extremely compact size (Figure S5F).
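To make the mismatch description above concrete, the sketch below counts mismatches between a spacer and a candidate site and reports whether they all fall in the PAM-distal region (positions 16-20). It is an illustration under stated assumptions, not the analysis used in the study: position 1 is taken to be the PAM-proximal end of a 20-nt protospacer, the function names are hypothetical, and the sequences are synthetic.

```python
def mismatch_positions(spacer: str, site: str) -> list:
    """1-based positions where the site differs from the spacer.

    Assumption: position 1 is the PAM-proximal end of the protospacer.
    """
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(spacer.upper(), site.upper()))
            if a != b]


def mismatch_profile(spacer: str, site: str, distal_start: int = 16) -> dict:
    """Summarize the mismatches and whether all of them are PAM-distal."""
    positions = mismatch_positions(spacer, site)
    return {
        "n_mismatches": len(positions),
        "positions": positions,
        "all_pam_distal": bool(positions) and all(p >= distal_start for p in positions),
    }


# Synthetic 20-nt example sequences (not from the study).
spacer = "ACGTACGTACGTACGTACGT"
distal_double = spacer[:16] + "T" + spacer[17] + "A" + spacer[19]  # positions 17, 19
proximal_single = spacer[:2] + "A" + spacer[3:]                    # position 3

print(mismatch_profile(spacer, distal_double))
print(mismatch_profile(spacer, proximal_single))
```

In the terms used above, a site like the first example (two mismatches, both in positions 16-20) belongs to the class reported as tolerated, whereas double mismatches elsewhere showed negligible tolerance.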

Therapeutic potential of AsCas12f-mediated genome editing
To evaluate the potential of the variants for therapeutic applications, we generated induced pluripotent stem cells (iPSCs) with the deletion of Duchenne muscular dystrophy (DMD) exon 44 (hDMDΔEx44), which is the major mutation of DMD, and differentiated them into cardiomyocytes (iPSC-CMs) (Figures S7A-S7D). Dystrophin deficiency causes myopathy and cardiomyopathy, and the skipping of exon 45 to restore the DMD protein is now in clinical trials.²⁵,²⁶ We first expressed AsCas12f-HKRA and paired sgRNAs designed to skip DMD exon 45, using an all-in-one AAV serotype 6 vector in WT iPSC-CMs, and found that AsCas12f-HKRA efficiently reduced the amount of dystrophin protein (Figure 6A). Furthermore, this all-in-one treatment partially restored the dystrophin protein in hDMDΔEx44-iPSC-CMs (Figures 6B and 6C), confirming the potential of AsCas12f variants to treat DMD using AAV gene delivery.
Next, to investigate the therapeutic potential of AsCas12f in vivo, we applied AsCas12f-HKRA to mouse liver genome editing. We constructed an AAV vector encoding AsCas12f or AsCas12f-HKRA under the HCRhAAT promoter and a U6 promoter-driven sgRNA targeting the TTR gene for transthyretin amyloidosis (Figure 6D).²⁷ We injected 7-week-old mice with 3 × 10¹¹ or 1 × 10¹² vg of the hepatotropic AAV serotype 8 and evaluated the expression level of plasma transthyretin. AsCas12f failed to modulate the plasma transthyretin level, whereas AsCas12f-HKRA reduced it in a dose-dependent manner (Figure 6E). Eight weeks after AAV-based gene delivery, the target locus was analyzed by targeted amplicon deep sequencing, which showed that a high editing rate (66.3%) was achieved by 1 × 10¹² vg of AsCas12f-HKRA (Figure 6F).
We then evaluated the in vivo knock-in efficiency of the AsCas12f system to insert the EGFP gene at the mAlb 3′ UTR. The knock-in of genes at the Alb locus is an intensively explored platform to ectopically produce therapeutic proteins, including coagulation factors and lysosomal enzymes, from the liver.²⁸,²⁹ We prepared two AAV vectors, one expressing AsCas12f or AsCas12f-HKRA and the sgRNA targeting the mAlb 3′ UTR, and the other providing a donor template to knock in EGFP at the DSB only by homology-directed repair (HDR) (Figure 6G). We intraperitoneally injected these two AAV vectors into C57BL/6 WT neonatal mice and assessed EGFP expression in the liver at 4 weeks after the vector injection by immunofluorescence microscopy. We confirmed a significant increase in EGFP-positive liver cells with the injection of the AAV vector harboring AsCas12f-HKRA (Figures 6H and 6I). We next replaced the EGFP gene of the donor vector with the coagulation factor IX (F9) cDNA with the Padua mutation and then injected the vectors into neonatal hemophilia B mice (F9-deficient mice). The knock-in of the F9 gene at the Alb locus using AsCas12f-HKRA, but not AsCas12f, significantly increased the plasma coagulation factor IX (FIX) activity beyond the therapeutic range (Figure 6J). These data indicate that our engineered AsCas12f variants with the optimal sgRNA can be harnessed for in vivo gene therapies (e.g., for hemophilia).

Applications of compact enAsCas12f
Given its small size, the AsCas12f gene can be packaged into an all-in-one AAV vector along with multiple sgRNAs and/or large partner genes, enabling its application to genome-editing treatments that were not possible with previous genome-editing tools. We designed a single AAV vector, encoding AsCas12f (or AsCas12f-HKRA) under a liver-tropic Ttr promoter, a donor sequence, and a U6 promoter-driven sgRNA, to insert the F9 cDNA with the Padua mutation into the Alb 3′ UTR locus (Figure 7A).³⁰ We injected the single AAV serotype 8 vector into neonatal hemophilia B mice and found that both the plasma FIX activity (FIX:C) and antigen (FIX:Ag) were significantly increased into the therapeutic range by the AAV vector harboring AsCas12f-HKRA, but not by that carrying AsCas12f (Figures 7B and 7C). Consistent with these results, the coagulation time assessed by activated partial thromboplastin time (APTT) and the expression level of F9 mRNA assessed by quantitative real-time PCR (RT-PCR) were significantly improved (Figures 7D and 7E). The agarose gel analysis of the mRNA PCR fragments revealed that cDNA insertion via HDR had occurred, although some insertions mediated by non-homologous end joining (NHEJ) were also observed (Figure 7F).
Finally, to investigate the utility of enAsCas12f for epigenome editing, we performed a transcriptional activation assay in Huh-7 cells, which stably express luciferase driven by the minimal cytomegalovirus (CMV) promoter with two HEXA sgRNA recognition sites (Figure 7G). To enhance the transcriptional activity of enAsCas12f, we engineered an sgRNA with the MS2 aptamer inserted into its stem loops.³¹ We transfected the Huh-7 cells with the plasmids expressing catalytically inactive AsCas12f-HKRA (dAsCas12f-HKRA) conjugated with VP64 and an MS2-fused activator (MS2-p65-HSF1), as well as engineered sgRNAs targeting HEXA (Figure 7G). As shown in Figures 7H and S7E, dAsCas12f-HKRA in combination with the MS2 aptamer-inserted sgRNA significantly enhanced luciferase expression. The conjugation of VP64 at the end of dAsCas12f-HKRA was most effective for transcriptional activation, although the direct conjugation of VPR (VP64, p65, and Rta) failed to enhance the transcription by MS2-p65-HSF1 (Figures S7F and S7G). In addition, we transduced Huh-7 cells with the all-in-one AAV serotype 6 vector, encoding dAsCas12f or dAsCas12f-HKRA conjugated with VP64, MS2-p65-HSF1, and sgRNA, and confirmed a significant, dose-dependent increase in luciferase expression (Figures 7I and S7H). Lastly, to investigate the utility of enAsCas12f-mediated transcriptional activation in vivo, we administered the AAV vector to mice (Figure S7I). We observed a significant increase in luciferase activity in mice treated with dAsCas12f-HKRA, but not with dAsCas12f (Figures 7J and 7K). These results indicated that enAsCas12f conjugated with a transcriptional activator can be harnessed as a transcriptional activation tool deliverable using a single AAV vector.

DISCUSSION
In this study, we determined the cryo-EM structures of the AsCas12f-sgRNA-target DNA ternary complex and revealed that two AsCas12f molecules (AsCas12f.1 and AsCas12f.2) assemble with a single sgRNA to form an asymmetric homodimer, similar to UnCas12f. This observation suggests that the compact type V-F CRISPR-Cas nucleases commonly function as dimers.
We applied the DMS technique to CRISPR-Cas effectors and successfully engineered compact AsCas12f variants with enhanced activities. The DMS approach combines exhaustive protein mutagenesis and functional screening with deep sequencing, enabling the assessment of the effects of thousands of mutations in a single experiment. One of the typical applications of DMS is the introduction of the library into the yeast surface display system and the evaluation of the binding affinity of ligands, including antibodies and viral glycoproteins.³² In this study, to evaluate genome-editing activities, we applied the yeast screening system to mammalian cell-based screening. We constructed a library that encompasses all 20 single amino acid substitutions at each position in the whole sequence of AsCas12f (422 amino acids) and identified over 200 effective mutations at various positions. To efficiently explore the effective combinations of these mutants, we employed a structural perspective. In the absence of an experimentally determined structure, structure prediction models like AlphaFold would be a useful approach for efficiently selecting mutations identified through DMS. Various CRISPR-Cas variants have been engineered with the assistance of structural information³³,³⁴ and directed evolution with random mutagenesis.³⁵ However, the former approach is limited in the number of mutations that can be tested, whereas the latter method carries the risk of introducing unnecessary mutations that do not contribute to activity improvement. By contrast, by combining comprehensive DMS with rational, structure-informed design, our approach enables the development of variants with enhanced activity in a more reliable and efficient manner. By applying this powerful approach to other Cas enzymes with different PAM sequences, we can potentially generate efficient genome-editing enzymes capable of targeting a wide range of genes. Moreover, with suitable adaptations to the evaluation system, this approach can be applied to enzymes beyond the scope of genome editing.
Figure legend (continued): (C) Indel efficiencies by AsCas12f, AsCas12f-YHAM, AsCas12f-HKRA, AsCas12a, Ultra, and enAsCas12a at five endogenous target loci in HEK293T cells (n = 3). (D) Indel efficiencies by AsCas12f, AsCas12f-YHAM, AsCas12f-HKRA, and SpCas9 at eight endogenous target loci in HEK293T cells. The same spacer sequences with a 5′-TTTG PAM for AsCas12f and an NGG-3′ PAM for SpCas9 were used (n = 3). (E) Indel efficiencies by AsCas12f, AsCas12f-YHAM, AsCas12f-HKRA, and SpCas9 at three therapeutic target loci in HEK293T cells (n = 3). See also Figures S5 and S6.

Limitations of the study
Although our DMS technique discovered numerous effective substitutions, it is not clear whether the best set of mutations was selected in the final optimized variants, due to the lack of a reliable method for selecting the optimal combination of effective mutations. One future solution is to use computational modeling to predict effective combinations without compromising protein stability.³⁶ Another is a machine learning-based approach. By training a machine learning algorithm using a dataset from the DMS containing multiple mutations,³⁷ the resultant model could predict genome-editing activity values for all possible mutation combinations.

Oligonucleotides
Sequences used for structural analysis, see Table S1 | This paper | Table S1
Spacer and designed full sgRNA sequences, see Table S2 | This paper | Table S2
Primers for qPCR and NGS, see Table S3 | This paper | Table S3
Recombinant

RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Osamu Nureki (nureki@bs.s.u-tokyo.ac.jp).

Materials availability
All unique/stable reagents generated in this study are available from the lead contact with a completed Materials Transfer Agreement.

Data and code availability
• Microscopy data reported in this paper will be shared by the lead contact upon request.
• The data of unprocessed image files have been deposited in the Mendeley Data repository (https://doi.org/10.17632/dn9h5k4tpf.1).
• This paper does not report original code.
• Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

METHOD DETAILS
Protein and RNA preparation for structural analysis
The N-terminally His₆-tagged wild-type AsCas12f, AsCas12f-YHAM, and AsCas12f-HKRA proteins were expressed in Escherichia coli Rosetta2 (DE3). Transformed E. coli cells were cultured at 37°C until the OD₆₀₀ reached 0.8, and protein expression was then induced by the addition of 0.1 mM isopropyl β-D-thiogalactopyranoside (Nacalai Tesque). E. coli cells were further cultured at 20°C overnight and harvested by centrifugation. The cells were then resuspended in buffer A (20 mM HEPES-NaOH, pH 7.6, 20 mM imidazole, and 1 M NaCl), lysed by sonication, and centrifuged. The supernatant was mixed with 3 ml Ni-NTA Superflow resin (QIAGEN), and the mixture was loaded into an Econo-Column (Bio-Rad). Proteins were eluted with buffer B (20 mM HEPES-NaOH, pH 7.6, 0.3 M imidazole, 0.5 M NaCl) and then loaded onto a 5-ml HiTrap Heparin HP column (GE Healthcare) equilibrated with buffer C (20 mM HEPES-NaOH, pH 7.6, and 0.5 M NaCl). The proteins were eluted with a linear gradient of 0.5-2 M NaCl and further purified by chromatography on a HiLoad 16/600 Superdex 200 column (GE Healthcare) equilibrated in buffer D (20 mM HEPES-NaOH, pH 7.6, 0.5 M NaCl). The purified proteins were stored at −80°C until use. The wild-type sgRNA and sgRNA_ΔS3-5_v7 were transcribed in vitro with T7 RNA polymerase and purified by 10% denaturing (7 M urea) polyacrylamide gel electrophoresis.

Electron microscopy sample preparation and data collection
The AsCas12f-sgRNA-target DNA ternary complex was reconstituted by mixing purified AsCas12f, the 222-nt sgRNA, the 38-nt target DNA, and the 38-nt non-target DNA at a molar ratio of 1:1:1:1. Each DNA strand has phosphorothioate modifications within the phosphate backbone around the cleavage site to inhibit DNA hydrolysis (Table S1). The AsCas12f-YHAM-sgRNA_ΔS3-5_v7-target DNA and AsCas12f-HKRA-sgRNA_ΔS3-5_v7-target DNA ternary complexes were reconstituted in the same way. The ternary complexes were purified by size-exclusion chromatography on a Superdex 200 Increase 10/300 column (GE Healthcare). The purified complex solution (absorbance at 260 nm [A₂₆₀] = 10 for the wild type and 4 for the mutants) was then applied to Au 300-mesh R1.2/1.3 grids (Quantifoil), which had been glow-discharged after the addition of 3 μl of amylamine, in a Vitrobot Mark IV (FEI) at 4°C, with a waiting time of 10 s and a blotting time of 4 s under 100% humidity conditions. The grids were plunge-frozen in liquid ethane cooled to the temperature of liquid nitrogen.
Micrographs for all datasets were collected with a Titan Krios G3i microscope (Thermo Fisher Scientific) running at 300 kV and equipped with a Gatan Quantum-LS Energy Filter (GIF) and a Gatan K3 Summit direct electron detector in the electron counting mode (The University of Tokyo, Japan). Datasets of the AsCas12f-sgRNA-target DNA ternary complex were collected with a total dose of approximately 50 electrons per Å² per 48 frames by the standard mode, and datasets of the AsCas12f-YHAM-sgRNA_DS3-5_v7-target DNA and AsCas12f-HKRA-sgRNA_DS3-5_v7-target DNA ternary complexes were collected with a total dose of approximately 50 electrons per Å² per 64 frames by the CDS mode, using the EPU software (Thermo Fisher Scientific). The dose-fractionated movies were subjected to beam-induced motion correction and dose weighting using Patch Motion Correction, and the contrast transfer function (CTF) parameters were estimated using Patch-based CTF estimation in cryoSPARC v3.3.2.
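The dose fractionation described above implies a per-frame exposure that follows directly from the stated totals. The short sketch below works through that arithmetic, assuming the dose is spread evenly across frames; the function name `dose_per_frame` is hypothetical and is not part of EPU or cryoSPARC.

```python
def dose_per_frame(total_dose_e_per_A2: float, n_frames: int) -> float:
    """Average electron dose per frame (e-/Å^2), assuming an even split across frames."""
    if n_frames <= 0:
        raise ValueError("n_frames must be a positive integer")
    return total_dose_e_per_A2 / n_frames


# Totals and frame counts as stated in the text (approximate values).
standard_mode = dose_per_frame(50.0, 48)  # wild-type complex, standard mode
cds_mode = dose_per_frame(50.0, 64)       # YHAM and HKRA complexes, CDS mode

print(f"standard mode: {standard_mode:.2f} e-/Å^2 per frame")
print(f"CDS mode:      {cds_mode:.2f} e-/Å^2 per frame")
```

Under this assumption, each frame carries roughly 1.04 electrons per Å² in the standard mode and roughly 0.78 electrons per Å² in the CDS mode.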
Single-particle cryo-EM data processing
Data were processed using cryoSPARC v3.3.2 (ref. 38). For the AsCas12f-sgRNA-target DNA ternary complex, 1,304,079 particles were initially selected from the 4,027 motion-corrected and dose-weighted micrographs using Blob picker and Topaz, and extracted at a pixel size of 3.32 Å. These particles were subjected to several rounds of 2D classification to curate particle sets. The particles were further curated by heterogeneous refinement, using a map derived from ab initio reconstruction as the template. The selected 150,071 particles were then re-extracted at a pixel size of 1.33 Å and subjected to 3D variability analysis. The resulting maps with different conformations were used for subsequent heterogeneous refinement. The selected particles after heterogeneous refinement were refined using non-uniform refinement with optimization of the CTF value (ref. 43), yielding a map at 3.08 Å, according to the Fourier shell correlation (FSC) criterion of 0.143 (ref. 44). The datasets for the AsCas12f-YHAM-sgRNA_DS3-5_v7-target DNA and AsCas12f-HKRA-sgRNA_DS3-5_v7-target DNA ternary complexes were processed using cryoSPARC, in a similar manner to that used for the wild-type AsCas12f-sgRNA-target DNA ternary complex. For data processing details, see Figure S1.
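The FSC = 0.143 criterion reports the resolution at which the correlation between two independently refined half maps first falls below 0.143. The sketch below illustrates how such a crossing could be read from an FSC curve by linear interpolation. It is not the cryoSPARC implementation; the function name `fsc_resolution` and the toy curve are hypothetical and are not data from this study.

```python
from typing import Optional, Sequence


def fsc_resolution(
    spatial_freqs: Sequence[float],
    fsc_values: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Resolution (Å) at which the FSC curve first drops below `threshold`.

    `spatial_freqs` are in 1/Å and must be increasing. The crossing point is
    found by linear interpolation between the two bracketing shells.
    Returns None if the curve never drops below the threshold.
    """
    shells = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(shells, shells[1:]):
        if c0 >= threshold > c1:
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Toy FSC curve (hypothetical values for illustration only).
toy_freqs = [0.10, 0.20, 0.30, 0.40]  # 1/Å
toy_fsc = [0.99, 0.90, 0.30, 0.00]
toy_resolution = fsc_resolution(toy_freqs, toy_fsc)
print(f"toy resolution at FSC = 0.143: {toy_resolution:.2f} Å")
```

Interpolating between shells avoids snapping the estimate to the nearest sampled frequency, which matters when the shells are coarse.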

Plasmid generation for cell biological experiments
Using a plasmid encoding both AsCas12f and sgRNA (Addgene #171614), AsCas12f and its mutants were cloned into pcDNA4TO (Thermo Fisher Scientific) using a NEBuilder HiFi DNA Assembly Kit (New England Biolabs). The sgRNAs were separately cloned into pBS or lentiGuide (Addgene #52963). Other Cas enzymes, including AsCas12a, UnCas12f, SpCas9 and their mutants, were mutagenized and cloned into pcDNA4TO (Thermo Fisher Scientific) with the following plasmids: Addgene #114091, #176269, and #98290, respectively. The engineered sgRNA was designed by extracting stem sequences. Both the 5′ and 3′ side fragments were cloned into pBS. The reporter for genome editing was constructed with GFP1-10, GFP11 and the linker containing the target sequence. This sequence was cloned into pLenti CMV (Addgene #17448).

Immunofluorescent staining of dystrophin
We also replated cardiomyocytes for an immunofluorescent analysis to confirm the restoration of dystrophin in cardiomyocytes. Immunofluorescence staining of DMD was performed as described previously (ref. 51). Cells were fixed with ice-cold acetone for 10 minutes at −20 °C, and then blocked with 10% donkey serum (S30-100ML, Sigma-Aldrich) / TBS-T (T9141, Takara Bio) for 1 h at room temperature. Primary and secondary antibodies were added to cells in blocking buffer for 2 h and 1 h, respectively. Nuclei were counterstained using Hoechst 33342 (62249, Thermo Fisher Scientific). Antibodies used in this article were dystrophin (MANDYS8, D8168, Sigma-Aldrich, 1:800 dilution) and Donkey anti-Mouse IgG (H+L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor™ Plus 488 (A32766, Thermo Fisher Scientific, 1:500 dilution). All imaging was performed using an Olympus FV1000 laser scanning confocal microscope.

Animal experimentation
All animal experimental procedures were approved by The Institutional Animal Care and Concern Committee of Jichi Medical University (permission number: 20051-8), and animal care was conducted in accordance with the committee's guidelines and ARRIVE guidelines. Coagulation factor IX (FIX)-deficient mice (B6.129P2-F9tm1Dws) were obtained from The Jackson Laboratory (Sacramento, CA). C57BL/6 mice were purchased from SLC Japan (Shizuoka, Japan). To obtain plasma samples, mice were anesthetized with isoflurane (1-3%), and the blood sample was drawn from the jugular vein using a 29G micro-syringe (TERUMO) containing 1/10 (volume/volume) sodium citrate. Platelet-poor plasma was obtained by centrifugation and then frozen and stored at −80 °C until analysis. The AAV vector was administered intravenously through the jugular vein (100-150 μL) and intraperitoneally (10 μL) in adult and neonatal mice, respectively.

Quantitative RT-PCR
Total RNA was isolated from cells with an RNeasy Mini Kit (QIAGEN). The RNA samples were reverse-transcribed using a PrimeScript RT Reagent Kit (Takara Bio). Quantitative real-time PCR was performed using TaqMan Gene Expression Assays (F9: Hs01592597_m1; Gapdh: Mm99999915_g1; Thermo Fisher Scientific) and THUNDERBIRD Probe qPCR Mix (TOYOBO) in a QuantStudio 12K Flex Real-Time PCR system (Thermo Fisher Scientific). Reactions were analyzed in duplicate, and F9 expression levels were normalized to Gapdh mRNA levels.
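The text does not state which formula was used for the normalization. One common approach is the 2^(−ΔCt) calculation, sketched below on the assumptions that duplicate Ct values are averaged and that amplification efficiency is close to 100% for both assays. The function name and the Ct values are hypothetical and are not data from this study.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct: Sequence[float], reference_ct: Sequence[float]) -> float:
    """Target expression relative to a reference gene by the 2^(-dCt) method.

    Replicate Ct values are averaged first. The calculation assumes roughly
    100% amplification efficiency for both assays (an assumption, not a
    statement from the source).
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)


# Hypothetical duplicate Ct values for F9 (target) and Gapdh (reference).
f9_ct = [24.9, 25.1]
gapdh_ct = [19.9, 20.1]
f9_relative = relative_expression(f9_ct, gapdh_ct)
print(f"F9 relative to Gapdh: {f9_relative:.5f}")
```

With a ΔCt of about 5 cycles, the target is about 2^5 = 32-fold less abundant than the reference in this toy example.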

FIX activity and ELISA
Human FIX activity (FIX:C) was measured with a one-stage clotting-time assay and an automated coagulation analyzer (Sysmex CS-1600). Plasma coagulation time was measured as activated partial thromboplastin time (APTT), using an automated coagulation analyzer (Sysmex CA-500). The FIX antigen (FIX:Ag) was measured as follows: microtiter plates were coated with an anti-human FIX antibody (CEDARLANE). After blocking with 5% casein, diluted plasma samples were incubated for 1 h at 37 °C. The antigen binding was detected with the anti-human FIX antibody conjugated with horseradish peroxidase (Affinity Biologicals) and ABTS microwell peroxidase substrate (Seracare). Plasma transthyretin was measured with a Prealbumin ELISA Kit (Aviva Systems, San Diego, CA), according to the manufacturer's recommendations.

Immunohistochemistry
Anesthetized mice were perfused with phosphate buffered saline (PBS). The isolated tissues were fixed with 4% paraformaldehyde, incubated with PBS containing sucrose, and then frozen in optimal cutting temperature compound (Sakura Finetek Japan, Tokyo, Japan). The tissue sections were blocked with 5% donkey serum and then incubated overnight at 4 °C with an anti-GFP antibody (MBL Co., Aichi, Japan) and an anti-CD146 antibody. The sections were incubated with anti-rabbit IgG conjugated with AlexaFluor 488 (Thermo Fisher Scientific) and anti-rat IgG conjugated with AlexaFluor 594 (Thermo Fisher Scientific) for 2 h at 4 °C. The sections were mounted with VECTASHIELD Mounting Medium with DAPI (Vector Laboratories, Burlingame, CA, USA). Immunofluorescence staining was observed and photographed using an all-in-one microscope (BZ-X700, Keyence, Tokyo, Japan). The EGFP-positive cells were quantified with the BZ-X700 imaging software (Keyence).

Measurement of luciferase activity
The luciferase gene, driven by the minimal cytomegalovirus promoter conjugated with two HEXA sgRNA-binding sequences, was cloned into the pcDNA3 vector (Thermo Fisher Scientific). The linearized plasmid was transfected into Huh-7 cells, using Lipofectamine 3000 (Thermo Fisher Scientific). To select the transfected cell clones, G418 (Nacalai Tesque) was added to the culture medium. For the luciferase activity measurements, the cells were lysed with 100 μL of lysis reagent (Promega), and then aliquots (10 μL) were added into the wells of a 96-well plate. The 96-well plate was placed in a luminometer (Centro LB 960, BERTHOLD Technologies), and 50 μL of the Luciferase Assay Reagent (Promega) was injected into each well, using the automatic injector.

Measurement of luciferase activity by in vivo imaging in mice
The anesthetized mice were injected intraperitoneally with the luciferin substrate (3 mg/body). We injected two AAV serotype 8 vectors, one encoding dAsCas12f or dAsCas12f-HKRA conjugated with VP64, MS2-p65-HSF1, and sgRNA, and the other encoding luciferase with a HEXA sgRNA binding site. Photons derived from luciferase activity were measured by an IVIS® Imaging System and Living Image software (Xenogen Corp., Alameda, CA). Quantitative data were expressed as photon units (photons/second).

QUANTIFICATION AND STATISTICAL ANALYSIS
All data are expressed as mean ± standard deviation (SD). No statistical methods were used to predetermine sample size. Sample size was based on experimental feasibility and sample availability. Samples were processed in random order. Statistical analyses were performed using GraphPad Prism 10 (GraphPad Software, San Diego, CA). Statistical significance was analyzed by two-tailed Student's t test, one-way ANOVA with post hoc Tukey's multiple comparison test, or two-way ANOVA with post hoc Sidak's multiple comparison test. P < 0.05 was considered statistically significant.
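As an illustration of the summary statistics and the two-group comparison named above, the sketch below computes the mean ± SD and the pooled-variance (Student's) t statistic using only the Python standard library. The analyses in this study were run in GraphPad Prism; the function names and the toy measurements here are hypothetical. Obtaining a P value additionally requires the t distribution, which a statistics package would provide.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Toy measurements for two groups (hypothetical, n = 3 each).
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]

m_a, sd_a = mean_sd(group_a)
t_stat, dof = student_t(group_a, group_b)
print(f"group A: {m_a:.2f} ± {sd_a:.2f} (mean ± SD)")
print(f"t = {t_stat:.3f}, df = {dof}")
```

The two-tailed P value is then the probability that |T| exceeds |t| under a t distribution with `dof` degrees of freedom, compared against the 0.05 threshold.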

Figure 1. Cryo-EM structure of the AsCas12f-sgRNA-target DNA ternary complex (A) Domain structure of AsCas12f. (B and C) Cryo-EM maps (B) and structural models (C) of the AsCas12f-sgRNA-target DNA ternary complex. The zinc and magnesium ions in the TNB domain and sgRNA scaffold are shown as gray spheres. (D) Schematic of sgRNA and the target DNA. The disordered regions are enclosed by dashed boxes. TS, target strand; NTS, non-target strand; PK, pseudoknot. To differentiate between NTS and TS within the manuscript, asterisks are used to denote NTS. (E) Structure of the sgRNA and target DNA complex. The dotted arrows represent the backbone direction, from 5′ to 3′. See also Figures S1 and S2.

Figure 2. Recognition of the guide RNA and target DNA (A) Recognition sites of the guide RNA scaffold and target DNA. (B) Electrostatic surface potential of AsCas12f. The sgRNA-target DNA heteroduplex and the stem 1, stem 2, and PK 1 regions of the sgRNA scaffold are accommodated within the positively charged grooves of the AsCas12f dimer. (C-E) Recognition of stem 1 and PK 1 (C), stem 3 (D), and the guide RNA-target DNA heteroduplex (E). (F) Structural comparison of the TS, NTS, and RuvC active site of AsCas12f with those of UnCas12f (Cas12f from an uncultured archaeon) (PDB: 7C7L). The positions of the RuvC.1 active site relative to the target DNA are similar in both structures, suggesting that AsCas12f also cleaves target DNA by using its RuvC.1 domain. See also Figure S2.

Figure 3. Engineering AsCas12f variants with deep mutational scanning (A) The design of the deep mutational scanning (DMS) library for AsCas12f. (B) A schematic of the DMS approach to evaluate the genome-editing efficiency in the context of the GFP gene deletion. (C) Heatmap illustrating how all single mutations affect the genome-editing activity. Squares are colored by mutational effect according to the scale bars on the right, with blue indicating deleterious mutations. Squares with a diagonal line through them indicate the wild-type amino acid. (D) Close-up view of the selected amino acid substitutions. Squares are colored by mutational effect according to the scale bars on the right, with blue indicating deleterious mutations. Mutations selected for two quadruple mutants (AsCas12f-YHAM and AsCas12f-HKRA) are highlighted with gray-colored borders. (E) The genome-editing activities of AsCas12f mutants carrying editing efficiency-enhancing mutations, determined by flow cytometry (n = 4). (F) Time course of GFP deletion induced by AsCas12f and AsCas12f-HKRA in HEK293T cells expressing d2EGFP (n = 3, mean ± SD). sgRNA_DS5 and sgRNA_DS3-5_v7 were transduced by a single-copy lentivirus infection. See also Figures S3 and S4.

Figure 6. Applications of AsCas12f-HKRA for therapeutic mouse and human models (A) Western blot analysis of dystrophin protein in wild-type iPSC-CMs (AR21-5) after AAV transduction at a titer of 1 × 10⁵ vg/cell. (B and C) In hDMDΔEx44-iPSC-CMs, AsCas12f-HKRA (5 × 10⁵ vg/cell) partially restored dystrophin, which was confirmed by western blotting (B) and immunostaining (C). Scale bar represents 40 μm. (D) A schematic of the AAV vector used for mouse liver genome editing. NLS, SV40; pA, poly A sequence. (E) Time course of plasma transthyretin levels (n = 3-4, mean ± SD). (F) Indel frequency at the Ttr locus in mouse livers (n = 3-4, mean ± SD). Statistical significance between AsCas12f and AsCas12f-HKRA was analyzed by two-tailed Student's t test. (G) A schematic of the AAV vectors for EGFP or coagulation factor IX (F9) knockin at the Alb 3′ UTR. NLS, SV40; pA, poly A sequence. (H) Immunohistochemistry (IHC) staining of livers from AAV-injected mice at 4 weeks after the vector injection. Endothelial cells were stained with antibodies specific to CD146 (red). Nuclei were counterstained with DAPI (blue). Scale bars, 200 μm. (I) The ratio of EGFP-positive cells. Statistical significance between AsCas12f and AsCas12f-HKRA was analyzed by two-tailed Student's t test. (J) The increase in plasma factor IX activity (FIX:C). Statistical significance between donor only and donor plus AsCas12f-HKRA was analyzed by two-tailed Student's t test. See also Figure S7.

Figure 7. Application of enAsCas12f for knock-in and transcriptional activation by a single AAV vector (A) A schematic representation of the knock-in strategy targeting the Alb locus by a single AAV vector in vivo. Neonatal hemophilia B mice were treated with the AAV vector (3 × 10¹¹ vg/body). NLS, SV40; pA, poly A sequence. (B and C) Increase in plasma factor IX activity (FIX:C) (B) and antigen (FIX:Ag) (C) (n = 6-8, mean ± SD).

Table 1. Data collection, processing, model refinement, and validation

Detailed methods are provided in the online version of this paper and include the following:

supported by AMED grant numbers JP233fa627001 and JP19am0401005, the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research [BINDS]) from AMED under grant number JP23am121002 (support number 3272) and JP23ama121012 (support number 4848, A.H.), and the Cabinet Office, the Government of Japan, and the Public/Private R&D Investment Strategic Expansion Program (PRISM) grant number JPJ008000.

AUTHOR CONTRIBUTIONS
T.H., S.T., Y.K., and A.H. conducted deep mutational scanning and subsequent data analysis with assistance from S.M. T.H. and S.T. performed in vitro assays to identify enhanced mutants. A.H. performed self-targeting library experiments. H.Y., Y.O., D.M., and H.M. contributed to the data analysis. T.H. performed the GUIDE-seq analysis. S.I., M.K., H.A., and N.Y. contributed to data collection. T.T. and H.U. conducted the iPSC-derived cardiomyocyte experiments. T.T., T.H., and T.O. conducted AAV preparation and mouse experiments. S.N.O. and R.N. performed biochemical and structural analyses with assistance from S.N.T., H.H., Y.K., and Y.I. T.K. and V.S. conceived the project. S.N.O., R.N., H.U., T.O., A.H., and O.N. wrote the manuscript with help from all authors. T.O., A.H., and O.N. supervised the research.

DECLARATION OF INTERESTS
T.H., S.N.O., R.N., Y.K., S.M., T.O., A.H., and O.N. have filed a patent application related to this work.