CASB: a concanavalin A‐based sample barcoding strategy for single‐cell sequencing

Abstract Sample multiplexing facilitates single‐cell sequencing by reducing costs, revealing subtle differences between similar samples, and identifying artifacts such as cell doublets. However, universal and cost‐effective strategies are rather limited. Here, we report a concanavalin A‐based sample barcoding strategy (CASB), which can be followed by both single‐cell mRNA and ATAC (assay for transposase‐accessible chromatin) sequencing techniques. The method involves minimal sample processing, thereby preserving intact transcriptomic or epigenomic patterns. We demonstrated its high labeling efficiency, high accuracy in assigning cells/nuclei to samples regardless of cell type and genetic background, and high sensitivity in detecting doublets by three applications: 1) CASB followed by scRNA‐seq to track the transcriptomic dynamics of a cancer cell line perturbed by multiple drugs, which revealed compound‐specific heterogeneous response; 2) CASB together with both snATAC‐seq and scRNA‐seq to illustrate the IFN‐γ‐mediated dynamic changes in epigenome and transcriptome profiles, which identified the transcription factor underlying heterogeneous IFN‐γ response; and 3) combinatorial indexing by CASB, which demonstrated its high scalability.

Transaction Report: (Note: With the exception of the correction of typographical or spelling errors that could be a source of ambiguity, letters and reports are not edited. Depending on transfer agreements, referee reports obtained elsewhere may or may not be included in this compilation. Referee reports are anonymous unless the Referee chooses to sign their reports.)

19th Oct 2020 1st Editorial Decision
Thank you for submitting your manuscript "A concanavalin A-based sample barcoding strategy for single-cell sequencing" to Molecular Systems Biology.
I have now read your manuscript and discussed it with the team, and we think that the presented approach seems interesting. I am glad to inform you that we have decided to send the manuscript out for review.
However, before we send the manuscript to potential referees, I think that some more direct comparisons of CASB to alternative existing approaches would need to be included. Even if this is only by discussion, we think that the manuscript would benefit from clearer descriptions of the advantages and superiority of CASB compared to specific alternative methods. This would be quite useful for the referees.

18th Nov 2020 1st Revision - Editorial Decision
Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the three referees who agreed to evaluate your study. Overall, the reviewers acknowledge that the proposed method seems to be a relevant contribution to the single-cell biology field. However, they raise a series of concerns, which we would ask you to address in a major revision.
Without repeating all the points listed below, some of the more fundamental issues are the following:
-The reviewers point out that a direct and systematic comparison to other methods is lacking. Such a comparison needs to be included and the advantages of CASB need to be clearly demonstrated. As suggested by the reviewers, a table comparing the different methods would indeed be helpful.
-The applications of CASB to combinatorial indexing and combined RNA-seq/ATAC-seq assays should ideally be demonstrated experimentally (as reviewer #1 recommends). This would add value to the study.
-Potential artifacts caused by ConA (e.g. aggregation or effects on the transcriptome), should be examined in further detail. Related to this, reviewer #2 also points out the high rate of doublets in some of the experiments, which needs to be carefully assessed.
-Reviewers #1 and #2 recommend expanding and clarifying the results of the specific application of CASB to examine drug responses, to make a stronger case for the success of the method.
-Reviewer #3 raises an important point (point #1) related to the potential bias of the method towards certain cell subpopulations in a mixture.
-In line with point #4 of reviewer #3 we would ask you to make sure that the code is made available. The methodology should be described in detail, and the manuscript would benefit from including protocols using our Structured Methods format.

Reviewer #1:
Here the authors describe a Concanavalin A based sample barcoding strategy (CASB) to label samples. As a proof of concept, they used CASB to label cells with different treatments and show it can be used in both scRNA-seq and scATAC-seq. Together with other recently described methods including MULTI-seq, sci-PLEX and ClickTags, CASB provides an alternative way of sample indexing. However, we are not convinced that CASB is comparable or better than current methods with current data.
Here are a few major concerns that we would like the authors to address in any revision:
A more systematic discussion/comparison with other techniques. The authors mentioned other techniques for sample multiplexing have limitations including scalability, universality and potential to introduce artifacts. Can the authors provide a table or extended discussion on both the disadvantages and advantages of each method? From our perspective, compared to CASB, MULTI-seq, sci-PLEX and ClickTags all provide more highly scalable methods to label samples. It's therefore unfair to focus on CITE-seq as a comparator for scalability as the authors do in the discussion. On the other hand, one potential advantage of CASB over other techniques is that it can be used in scATAC-seq and it is simpler than many existing strategies for multiplexing single-cell samples. The strengths and limitations of CASB treatment should be thoroughly discussed and compared fairly with existing methods.
Combinatorial indexing is a key strategy in sample multiplexing. This will likely work well for CASB, but it is not demonstrated in the manuscript. Similar claims about RNA/ATAC co-assays, while highly plausible, were also not demonstrated. Ideally, the authors would demonstrate this. If this is not possible, they should make it clear that these could work in principle but have not yet been demonstrated.
ConA treatment could introduce transcriptional artifacts and agglutination (aggregation) of cells. The data in figure 1EV1-H are convincing for this cell type, but users will want to be sure that ConA treatment does not induce transcriptional artifacts in their cell type of interest. Because this concern is in line with the known biological activity of ConA and other lectins, how could potential users know that CASB will not generate artifacts?
In the scATAC-seq experiment, a ~10% doublet rate (305/2890 cells) was observed at low nuclei loading rate. Is this due to agglutination of nuclei prior to sample loading? Images of single-cell and single-nuclei populations before and after ConA treatment would be one way to demonstrate that ConA does not cause cells or nuclei to aggregate.
More thorough analyses of biological experiments. Gene differential expression analysis and an investigation of the transcriptional effects of the perturbations shown in Figure 2E-G would improve the manuscript and demonstrate the success of the method. We don't have a sense of the extent of perturbation performed.
Minor issues:
It will be really helpful if the authors can show the signaling pathway of IFN-γ in Figure 3.
Line 34,35: When discussing combinatorial indexing, papers including Cusanovich, Darren A., et al.

Reviewer #2:
Fang et al. utilized biotinylated Concanavalin A to label samples with specific DNA barcodes. They demonstrated that this method could be applied to both single-cell RNA-seq and ATAC-seq pipelines. Using this sample multiplexing strategy, they revealed the transcriptomic dynamics and chromatin accessibility changes in the cell lines treated with multiple drugs and cytokines. While CASB could be a useful universal tool to the field, I have several concerns where more clarity is needed:
1. As this method is used for sample multiplexing, rigorous assessment of cell doublet rate is essential to prove the cleanness of the assay and to reveal any potential issues. However, except in the first cell-line-mixture experiment (page 6, the last paragraph), where the high doublet rate (3962/12068*2) was likely caused by cell overloading (~12k cells input), no explanation is provided to justify the high doublet rates in other experiments (305/2890*2 in Fig. 3B and 294/3407*2 in Fig. EV4B), where a typical 2~3% doublet rate should be expected with 3k cells input.
2. Regarding the three clusters identified on MDA-MB-231 (Fig. 2E), what are the markers of each of the clusters? Is there any experimental evidence (e.g., double immunostaining) to support the clustering result? There is also evidence of sub-cluster within cluster 1. How was the clustering resolution determined?
3. In Fig 4D, it's not clear what each point represents just from the figure legend. But it seems that, despite being lower than their counterparts in cluster 2, the expression of NF-kB target genes in cluster 0 at later time points (8h and 12h) is actually higher than that in cluster 2 at earlier time points (4h and 6h). It contradicts the pattern of CXCL10 shown in Fig. 4E, which is only expressed in cluster 2, regardless of the time point, but not in any of the cells in cluster 0. It's also different from Fig. 3E, where the highest NF-kB regulatory activities are mostly evidenced in the early (0h) and the late (12h) time points.
Additional points:
1. Could the authors explain why nuclei have higher tagging efficiency than cells (120,000 in Fig. EV1A vs. 50,000 in Fig. 1B), given much smaller membrane area?
3. For sample indexing in the snATAC-seq experiment, correlation analysis should be added to compare the data quality generated from untreated and indexed samples.
4. A table to summarize the advantages (UMI capture efficiency, cost, etc.) of the CASB method over other platforms will help the readers to get its potential.

Reviewer #3:
Fang et al. introduce a novel tagging strategy for multiplexed single-cell analysis. Several strategies for tagging cells with barcoded oligos have been developed in the past years, including tagging of fixed cells with a click reaction (clickTag), antibody-based cell hashing, and lipid-based anchoring of barcodes (multi-seq). The goal of these strategies is to allow overloading of single-cell platforms such as 10x genomics (reducing costs) and, most importantly, to allow the simultaneous processing of samples in a way that minimizes/eliminates batch effects (e.g. different treatments or developmental stages). The authors' strategy radically differs from all previous methods, using concanavalin A-biotin to bind glycoproteins on the cell surface. The concanavalin A-biotin is assembled first into a labeling complex with streptavidin and a biotinylated oligo containing the sample barcode, then this complex is briefly incubated with cells or nuclei. This design confers great flexibility to the system and allows very rapid barcoding of different types of samples. Most importantly, the method allows barcoding not only for scRNA-seq but also scATAC-seq, being the first such strategy amenable to this application. The authors apply their method first to a breast cancer cell line to simultaneously profile scRNA-seq from 5 different drug perturbation assays. Cross-species labeling assays demonstrate the doublet-detection capacity of their method, as well as the signal scalability of the tagging strategy and the lack of strong batch effects associated with this labeling. Overall, this novel method represents an important development for single-cell biology and it makes this work a strong candidate for publication.
I have one major concern and a few additional comments:
1. My main concern is about performance of this tagging strategy in different species and, most importantly, in different cell types within the same sample. Some smaller cells or cells with different glycoprotein surface composition may systematically fail to incorporate enough concanavalin-based tags, and therefore become literally invisible when overloading single cells (de facto inadvertently mixing their RNA/accessibility signal with other cells). The authors must systematically address this by using different species and cell populations. Of course, this does not need to be measured through repeated time-consuming and expensive single-cell RNA-seq/ATAC-seq experiments, but simply by assaying barcode detection by qPCR.
2. The authors must report the number of reads employed in each experiment and, especially, which fraction of reads correspond to barcodes versus transcripts/tagmented DNA. The lack of control over barcode sequencing (since no splitting of the libraries is performed) may result in an extremely cost-ineffective experiment, so these statistics are necessary to evaluate this possibility.
3. The way scATAC data is analysed and presented is a bit unusual. For example, the authors should show the accessibility tracks for the different clusters/cell populations and additional QC stats should be reported.
4. Before publication it is essential that the authors release all the code used in the analyses, as well as any dedicated scripts. Similarly, a detailed protocol must be included in order to ensure the broad applicability of the method.
5. Although not exactly a tagging strategy, the authors should acknowledge and discuss that multiplexing scATAC methods exist (e.g. based on barcoded Tn5 tagmentation or barcode ligation after tagmentation).

3rd Feb 2021 2nd Authors' Response to Reviewers

Reviewer #1 (Comments to the Author):
Here the authors describe a Concanavalin A based sample barcoding strategy (CASB) to label samples. As a proof of concept, they used CASB to label cells with different treatments and show it can be used in both scRNA-seq and scATAC-seq. Together with other recently described methods including MULTI-seq, sci-PLEX and ClickTags, CASB provides an alternative way of sample indexing. However, we are not convinced that CASB is comparable or better than current methods with current data.
Here are a few major concerns that we would like the authors to address in any revision:
A more systematic discussion/comparison with other techniques. The authors mentioned other techniques for sample multiplexing have limitations including scalability, universality and potential to introduce artifacts. Can the authors provide a table or extended discussion on both the disadvantages and advantages of each method? From our perspective, compared to CASB, MULTI-seq, sci-PLEX and ClickTags all provide more highly scalable methods to label samples. It's therefore unfair to focus on CITE-seq as a comparator for scalability as the authors do in the discussion. On the other hand, one potential advantage of CASB over other techniques is that it can be used in scATAC-seq and it is simpler than many existing strategies for multiplexing single-cell samples. The strengths and limitations of CASB treatment should be thoroughly discussed and compared fairly with existing methods.

Answer：
Thanks for the suggestion. We now provide a table (Table EV1) to list the advantages and limitations of each method.
Indeed, it is unfair to focus on CITE-seq as a comparator for scalability; therefore, we removed the sentence. (Line 310-315)

Combinatorial indexing is a key strategy in sample multiplexing. This will likely work well for CASB, but it is not demonstrated in the manuscript. Similar claims about RNA/ATAC co-assays, while highly plausible, were also not demonstrated. Ideally, the authors would demonstrate this. If this is not possible, they should make it clear that these could work in principle but have not yet been demonstrated.

Answer：
Indeed, we have been working on combinatorial indexing using CASB. We have now included the data of such an experiment, where we applied two strategies, i.e., combinatorial barcoding and split-pool barcoding. In this experiment, a four-by-four combinatorial barcoding allowed 16 cell types from seven different species to be indexed, whereas another round of split-pool indexing with four additional barcodes increased the number of cell indexes to 64. As shown in Figure 5, 64 indexes with different barcode combinations allowed us to assign cell origin as well as to further increase the efficiency of doublet detection in the scRNA-seq experiment. For the detailed results, please refer to line 271-302 and Figure 5 and EV7.
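The index counts quoted in this answer can be reproduced with a minimal sketch. It assumes that each sample receives one barcode from each of two sets of four (the four-by-four step) and that each cell later receives one of four split-pool barcodes; the barcode names A1-A4, B1-B4 and C1-C4 are hypothetical placeholders, not the oligos used in the study.

```python
from itertools import product

# Hypothetical barcode names; the real oligo identifiers are not given here.
set_a = [f"A{i}" for i in range(1, 5)]  # first barcode of the 4x4 scheme
set_b = [f"B{i}" for i in range(1, 5)]  # second barcode of the 4x4 scheme
set_c = [f"C{i}" for i in range(1, 5)]  # split-pool barcodes added after pooling

# Every (A, B) pair identifies one sample: 4 x 4 = 16 sample indexes.
sample_indexes = list(product(set_a, set_b))

# Adding a split-pool barcode gives 16 x 4 = 64 cell indexes.
cell_indexes = list(product(set_a, set_b, set_c))

print(len(sample_indexes), len(cell_indexes))  # 16 64
```

Only the first two barcodes carry sample identity in this sketch; the third is added after pooling and therefore only enlarges the number of distinguishable cell indexes.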
We do not have the data to demonstrate the compatibility of CASB with RNA/ATAC co-assays, but, given the similar workflow between the RNA/ATAC co-assay and snATAC-seq provided by 10X Genomics, we think it should work in principle. Nevertheless, we rephrased our text as suggested by the reviewer (Line 333-336).
ConA treatment could introduce transcriptional artifacts and agglutination (aggregation) of cells. The data in figure 1EV1-H are convincing for this cell type, but users will want to be sure that ConA treatment does not induce transcriptional artifacts in their cell type of interest. Because this concern is in line with the known biological activity of ConA and other lectins, how could potential users know that CASB will not generate artifacts?

Answer：
To investigate the potential effect of CASB labelling on cell transcriptome in other cell types, in the combinatorial indexing experiment, we indexed eight cell types from human (HEK-293T, RPE-1, Jurkat, K-562, HCT116, HepG2, HeLa and 786-0), three from mouse (RAW264.7, CT26 and 4T1), and one each from rat (REF), dog (MDCK), hamster (CHO), monkey (Vero) and drosophila (S2) using the CASB technique. Meanwhile, before loading on the 10X system, an equal number of unlabeled cells from each cell type were pooled together with the labeled cells.
Notably, given ConA is a T cell mitogen, we intentionally included 'Jurkat', a T cell leukemia cell line. As shown in Figure 5H

In the scATAC-seq experiment, a ~10% doublet rate (305/2890 cells) was observed at low nuclei loading rate. Is this due to agglutination of nuclei prior to sample loading? Images of single-cell and single-nuclei populations before and after ConA treatment would be one way to demonstrate that ConA does not cause cells or nuclei to aggregate.

Answer：
To investigate whether ConA could cause cells or nuclei to aggregate, we took images of single-cell and single-nuclei populations with or without ConA treatment, as suggested by the reviewer, and also quantified the singlet rate by flow cytometry. As shown in Fig EV1B&C, ConA did not induce cell or nucleus aggregation.
The unexpectedly high doublet rate is likely due to the suboptimal condition of our 10X Genomics equipment during the particular period when these two experiments were performed. In the latest combinatorial indexing experiment, where we multiplexed 16 different cell lines and loaded the droplet system with a standard number of cells (about 8000 cells were captured with sufficient reads), we got a doublet rate of 12%, which is within the expected range. As suggested in the user guide of 10X Genomics, when 8000 cells are recovered, the doublet rate is around 6.1%. Since this doublet rate is calculated by mixing equal numbers of human and mouse cells, the true doublet rate should be 2 × 6.1%, which is very close to the doublet rate in this experiment.
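The factor of two in this argument can be written out as a short sketch. It assumes equally sized samples and random pairing of cells in droplets, so that a doublet is only detectable when its two cells carry different labels; the function names are illustrative and not part of any published pipeline.

```python
def detectable_fraction(n_samples: int) -> float:
    """Fraction of doublets whose two cells come from different samples,
    assuming equal-sized samples and random pairing."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return 1.0 - 1.0 / n_samples


def total_doublet_rate(observed_rate: float, n_samples: int) -> float:
    """Scale an observed (between-sample) doublet rate up to the total rate."""
    fraction = detectable_fraction(n_samples)
    if fraction == 0.0:
        raise ValueError("with a single sample no doublet is detectable by label")
    return observed_rate / fraction


# Two species mixed 1:1: only half of all doublets are human-mouse doublets.
print(detectable_fraction(2))                    # 0.5
print(round(total_doublet_rate(0.061, 2), 3))    # 0.122
# With 16 equally sized samples, 15/16 of doublets are between samples.
print(detectable_fraction(16))                   # 0.9375
```

Under these assumptions a 16-sample pool exposes almost all doublets by their labels, which is why its observed rate is compared with twice the two-species figure.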
More thorough analyses of biological experiments. Gene differential expression analysis and an investigation of the transcriptional effects of the perturbations shown in Figure 2E-G would improve the manuscript and demonstrate the success of the method. We don't have a sense of the extent of perturbation performed.

Answer：
We have performed gene differential expression analysis for sensitive cell population after drug perturbation, which revealed that OSI-027, Niraparib and Rucaparib induced expression alteration of 613, 365 and 296 genes (|logFC| > 0.25, P-value < 0.05, Dataset EV1) in the sensitive cell population, respectively, which are highly enriched in cell death and survival pathway (Fig EV3F), again demonstrating the sensitivity of this cell population to these treatments.
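The thresholds quoted in this answer (|logFC| > 0.25, P-value < 0.05) amount to a simple filter, sketched below. The gene names and values are made-up toy entries, and the field names are hypothetical; the actual results are in Dataset EV1 and the tool used to compute them is not stated here.

```python
# Toy per-gene results for illustration only; not data from the study.
results = [
    {"gene": "GENE_A", "logFC": 0.80, "p_value": 0.001},
    {"gene": "GENE_B", "logFC": -0.30, "p_value": 0.020},
    {"gene": "GENE_C", "logFC": 0.10, "p_value": 0.001},  # effect too small
    {"gene": "GENE_D", "logFC": 1.20, "p_value": 0.200},  # not significant
]


def is_altered(row, min_abs_logfc=0.25, max_p=0.05):
    """Apply the two cut-offs quoted in the response."""
    return abs(row["logFC"]) > min_abs_logfc and row["p_value"] < max_p


altered = [row["gene"] for row in results if is_altered(row)]
print(altered)  # ['GENE_A', 'GENE_B']
```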

Minor issues:
It will be really helpful if the authors can show the signaling pathway of IFN-γ in Figure 3.

Answer：
Thanks for the suggestion. We have included a scheme of the IFN-γ pathway as Figure 3B.

Reviewer #2 (Comments to the Author):
Using this sample multiplexing strategy, they revealed the transcriptomic dynamics and chromatin accessibility changes in the cell lines treated with multiple drugs and cytokines. While CASB could be a useful universal tool to the field, I have several concerns where more clarity is needed:
1. As this method is used for sample multiplexing, rigorous assessment of cell doublet rate is essential to prove the cleanness of the assay and to reveal any potential issues. However, except in the first cell-line-mixture experiment (page 6, the last paragraph), where the high doublet rate (3962/12068*2) was likely caused by cell overloading (~12k cells input), no explanation is provided to justify the high doublet rates in other experiments (305/2890*2 in Fig. 3B and 294/3407*2 in Fig. EV4B), where a typical 2~3% doublet rate should be expected with 3k cells input.

Answer：
The unexpectedly high doublet rate is likely due to the suboptimal condition of our 10X Genomics equipment during the particular period when these two experiments were performed. In the latest combinatorial indexing experiment, where we multiplexed 16 different cell lines and loaded the droplet system with a standard number of cells (about 8000 cells were captured with sufficient reads), we got a doublet rate of 12%, which is within the expected range. As suggested in the user guide of 10X Genomics, when 8000 cells are recovered, the doublet rate is around 6.1%. Since this doublet rate is calculated by mixing equal numbers of human and mouse cells, the true doublet rate should be 2 × 6.1%, which is very close to the doublet rate in this experiment.
To investigate whether ConA could cause cells or nuclei to aggregate, as suggested by the first reviewer, we took images of single-cell and single-nuclei populations with or without ConA treatment and also quantified the singlet rate by flow cytometry. As shown in Fig EV1B&C, ConA did not induce cell or nucleus aggregation.
2. Regarding the three clusters identified on MDA-MB-231 (Fig. 2E), what are the markers of each of the clusters? Is there any experimental evidence (e.g., double immunostaining) to support the clustering result? There is also evidence of sub-cluster within cluster 1. How was the clustering resolution determined?

Answer：
Indeed, specific marker genes could be found for each cell cluster in scRNA-seq data, which we have now added to the revised manuscript as Fig EV4. Given that one of the key cellular programs differentially activated in cluster 0 vs. cluster 1/2 cells is cellular movement, we focused on VIM and KRT18, two cytoskeleton proteins, for immunofluorescent analysis.
As suggested by the reviewer, a sub-cluster could indeed be observed within Cluster 1 in the UMAP projection (Fig 2E), but it is mainly due to the different level of total UMI count rather than a different transcriptome profile (shown below, also in Fig EV3B).

Based on Fig 3E and F (now Fig 3F and G), as pointed out in the manuscript, the NF-kB activity is not significantly changed across the different time points; instead, it is highly variable at all time points, likely due to the existence of two cell populations with low and high NF-kB activity, respectively. To check whether such heterogeneous NF-kB activity can have an effect on gene expression, we predicted the NF-kB target genes based on the presence of active ATAC peaks containing an NF-kB motif in their vicinity (within 100 kb). Although this strategy is often used to associate genes with ATAC peaks, we have to admit that such prediction would result in a list of target genes with many false positives as well as false negatives. Furthermore, as we know, the relationship between TF binding and target gene expression is often not straightforward, i.e., the expression of "target" genes may not solely depend on the TF binding. Given the two uncertainties, we could only draw conclusions based on the global trend as shown in Figure 4D.
As suggested by the reviewer, some predicted NF-kB target genes were also expressed in cluster 0 cells with low NF-kB activity. These genes might be false positive predictions, or could be induced by other transcription factors induced by IFN-γ. For example, among the 1030 predicted NF-kB target genes, 297 and 234 were also predicted as IRF and STAT target genes, respectively. In contrast, for well-known NF-kB target genes whose expression absolutely requires NF-kB activity under IFN-γ stimulation, such as CXCL10 and CXCL11, IFN-γ treatment was unable to trigger expression in cells with low NF-kB activity.
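The proximity rule used for target prediction in this answer (a gene counts as a predicted target if a motif-containing ATAC peak lies within 100 kb) can be illustrated with a minimal sketch. It assumes that distance is measured from a single reference position per gene, such as the transcription start site, which the response does not specify; gene names, coordinates and field names are hypothetical toy values.

```python
WINDOW = 100_000  # 100 kb, the vicinity quoted in the response

# Toy annotation: gene -> (chromosome, reference position). Assumed, not from the study.
genes = {
    "GENE_X": ("chr1", 500_000),
    "GENE_Y": ("chr1", 900_000),
    "GENE_Z": ("chr2", 500_000),
}

# Toy ATAC peaks; has_motif marks peaks containing the motif of interest.
peaks = [
    {"chrom": "chr1", "start": 420_000, "end": 420_500, "has_motif": True},
    {"chrom": "chr1", "start": 650_000, "end": 650_500, "has_motif": True},
    {"chrom": "chr2", "start": 480_000, "end": 480_500, "has_motif": False},
]


def distance_to_peak(position, peak):
    """Distance in bp from a position to the nearest edge of a peak (0 if inside)."""
    if peak["start"] <= position <= peak["end"]:
        return 0
    return min(abs(position - peak["start"]), abs(position - peak["end"]))


def predict_targets(genes, peaks, window=WINDOW):
    """Return genes with at least one motif-containing peak within the window."""
    targets = set()
    for name, (chrom, position) in genes.items():
        for peak in peaks:
            if (peak["has_motif"] and peak["chrom"] == chrom
                    and distance_to_peak(position, peak) <= window):
                targets.add(name)
                break
    return sorted(targets)


print(predict_targets(genes, peaks))  # ['GENE_X']
```

A rule of this kind ignores whether the peak actually regulates the gene, which is consistent with the false positives and false negatives acknowledged in the response.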
Additional points: 1. Could the authors explain why nuclei have higher tagging efficiency than cells (120,000 in Fig. EV1A vs. 50,000 in Fig. 1B), given much smaller membrane area?

Answer：
Indeed, we were also surprised by this phenomenon. But, of note, we had not saturated cellular or nuclear surface with CASB barcodes in these experiments, thus we did not know whether the maximal tagging efficiency of the nuclear surface is also higher than that of the cellular surface. This result only suggested that, at the same concentration of CASB labeling complex, nuclei were more efficiently tagged.
To answer the question, we performed a fluorescent labeling experiment in which we first saturated the cellular or nuclear surface with an excessive amount of biotinylated ConA, and then labeled cells and nuclei with an excessive amount of streptavidin-conjugated fluorophore. As revealed by flow cytometry quantification (Figure below), nuclei again demonstrated a much higher labeling efficiency than cells. We speculate this could potentially be due to the more complex environment on the cellular membrane, where other large molecules may hinder the binding of the CASB complex to the glycoprotein.

Fig 2G would help rule out sequencing depth, or low cell quality (after drug treatment) caused effects.

Answer：
Thanks for the suggestion. We have included UMAP with UMI counts of each cell as Fig EV3B, in which UMI count of individual cells was indicated and did not show cluster or treatment bias.
3. For sample indexing in the snATAC-seq experiment, correlation analysis should be added to compare the data quality generated from untreated and indexed samples.

Answer：
As suggested by the reviewer, we have performed such a control snATAC-seq experiment. In this experiment, CASB labeled HAP1 cells were pooled with the same number of unlabeled HAP1 cells and subjected to bulk tagmentation, FACS sorting into two 96-well plates and subsequent ATAC-seq library preparation. In total, 184 out of 192 cells were obtained with sufficient reads (Fig EV5B). We then separated CASB labeled HAP1 cells from unlabeled cells based on the number of CASB barcode reads in individual cells (Fig EV5B). CASB labeled and unlabeled cells were intermingled in the UMAP projection according to the ATAC signal (Fig EV5C), suggesting no influence of CASB labeling on epigenomic profile. This was further confirmed when the cumulative ATAC signal of the labeled and unlabeled cells was compared: the correlation between the labeled and unlabeled cells was similar to that between the two plates (Fig EV5D). (Line 213-223)

4. A table to summarize the advantages (UMI capture efficiency, cost, etc.) of the CASB method over other platforms will help the readers to get its potential.

Answer：
Thanks for the suggestion. We now provide a table as Table EV1 to list the advantages and limitations of each method.

Actually, for scRNA-seq experiments, CASB barcode and transcriptome library were separated by size selection before next-generation sequencing library construction, which enables the sequencing of the two libraries separately at a user-defined depth.
3. The way scATAC data is analyzed and presented is a bit unusual. For example, the authors should show the accessibility tracks for the different clusters/cell populations and additional QC stats should be reported.

Answer：
We have added additional information in the manuscript. The number of ATAC peaks detected in individual cells is shown in Fig EV5F. The cumulative ATAC signal around the CXCL10 and CXCL11 genes in two cell clusters with different NF-kB activity at different time points is shown in Fig EV6D.

4. Before publication it is essential that the authors release all the code used in the analyses, as well as any dedicated scripts. Similarly, a detailed protocol must be included in order to ensure the broad applicability of the method.

Answer：
All next-generation sequencing data were submitted to GEO under the accession number GSE153116. Scripts used for CASB barcode analysis are publicly available at https://github.com/GuipengLi/CASB. The detailed protocol was also included in the Method section in 'Structured Methods' format.

5. Although not exactly a tagging strategy, the authors should acknowledge and discuss that multiplexing scATAC methods exist (e.g. based on barcoded Tn5 tagmentation or barcode ligation after tagmentation).

Answer：
Thanks for pointing it out. We have cited the relevant references in the new version of the manuscript. (Line 41-42)

26th Feb 2021 2nd Revision - Editorial Decision
Thank you for sending us your revised manuscript. We have now heard back from the three reviewers who were asked to evaluate your study. Overall, the reviewers appreciate the thorough response to their concerns and think that the study has improved as a result of the performed revisions. However, as you will see below, reviewers #1 and #2 still list a few remaining concerns, which we would ask you to address in a revision. Most of them can be addressed by providing clarifications and/or discussions.
When you revise your manuscript, we would also ask you to address the following pending editorial issues.

Reviewer #1:
The authors performed a thorough response to reviewer comments and addressed our concerns from the first submission. The paper is now thorough and fairly complete. Overall, the method seems to perform quite well for RNA-seq and reasonably well for ATAC-seq. We only have one remaining comment related to a new section on combinatorial indexing.
The authors perform a 16-sample multiplexing experiment that uses 4x4 sample multiplexing. Then, they pool all samples, split the mixture into four groups, and add an additional third multiplexing oligo. It's good to see a demonstration of combinatorial multiplexing, but we are confused by the pool-and-split step. Once the samples are pooled, they cannot be uniquely labeled, so the 3rd oligo does not add any sample identifying information. This is clear from Figure 5E. While this step does not invalidate the experiment, its purpose and value is unclear, and should be clarified.
Other than that, the authors performed well-controlled experiments, especially including unlabeled cells along with labeled cells in the same experiment, and they thoroughly explored the data.

Reviewer #2:
1. My biggest concern is still about the high doublet rates described here, which are pretty unusual in single-cell analysis. Although the imaging study and flow cytometry suggested that cell/nuclei aggregation might not be a problem, it would be better if the authors can directly show that a low cell doublet rate can be achieved from a lower input (2000~5000 cell/nuclei) single-cell RNA/ATAC-seq data when applying CASB.
2. In Figure 5F, there are 4 clusters (in brown, in the left bottom part of the figure) that seem to come from the same cell line (HEK?). But in Figure 5C
3. The Pearson's correlation coefficient shown in Figure EV5D is very low. Is that due to the sparsity of ATAC-seq? Can the authors aggregate nearby peaks (over genomic bins) to mitigate the sparsity issue and do the plot again, and see if there is any improvement?

Reviewer #3: The authors satisfactorily addressed my concerns about tagging efficiency in different cell types and species. Congratulations on an important innovation in the single-cell genomics field.

General response to the reviewers:
We thank again the three reviewers for their time and appreciate their comments. During this revision, we have carefully addressed questions from reviewers #1 and #2, as shown below marked in red.

Reviewer #1 (Comments to the Author):
The authors performed a thorough response to reviewer comments and addressed our concerns from the first submission. The paper is now thorough and fairly complete. Overall, the method seems to perform quite well for RNA-seq and reasonably well for ATAC-seq. We only have one remaining comment related to a new section on combinatorial indexing.
The authors perform a 16-sample multiplexing experiment that uses 4x4 sample multiplexing. Then, they pool all samples, split the mixture into four groups, and add an additional third multiplexing oligo.
It's good to see a demonstration of combinatorial multiplexing, but we are confused by the pool-and-split step. Once the samples are pooled, they cannot be uniquely labeled, so the 3rd oligo does not add any sample identifying information. This is clear from Figure 5E. While this step does not invalidate the experiment, its purpose and value is unclear, and should be clarified.
Other than that, the authors performed well-controlled experiments, especially including unlabeled cells along with labeled cells in the same experiment, and they thoroughly explored the data.

Answer：
Thanks for the question. Indeed, the 3rd oligo from the pool-and-split step did not uniquely label the 16 different cell lines and was therefore not helpful for identifying cell types. However, it increased the number of barcode combinations from 16 to 64 for the whole cell pool, which could help to identify cell doublets formed by the same cell type (referred to as 'doublets within sample'). In our experiment, the 3rd oligo helped to identify 26 doublets within sample (Fig. EV7A). Given the high multiplexity and the standard loading rate of the droplet system in this particular experiment, we do not expect to observe many cell doublets within sample.
However, this strategy will help to efficiently identify cell doublets within sample when only a few samples are multiplexed in a superloading experiment. We have now clarified our intention of using the 3rd oligo in the manuscript (Lines 280-283).

Reviewer #2 (Comments to the Author):
1. My biggest concern is still about the high doublet rates described here, which are pretty unusual in single-cell analysis. Although the imaging study and flow cytometry suggested that cell/nuclei aggregation might not be a problem, it would be better if the authors can directly show that a low cell doublet rate can be achieved from a lower input (2000~5000 cell/nuclei) single-cell RNA/ATAC-seq data when applying CASB.

Answer：
In the latest combinatorial indexing experiment, where we loaded the droplet system with a standard number of cells (about 8000 cells were captured with sufficient reads), we obtained a doublet rate of 12% (calculated according to CASB barcodes), which is within the expected range.
According to the 10x Genomics user guide, when 8000 cells are recovered, the doublet rate is around 6.1%. Since this doublet rate is calculated by mixing equal numbers of human and mouse cells, it counts only cross-species doublets, which make up about half of all doublets. The true doublet rate should therefore be 2 x 6.1%, which is very close to the doublet rate we calculated according to CASB barcodes.
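The correction from a cross-species doublet rate to a total doublet rate can be written out as a short calculation. This is only an illustrative sketch: the function name and the random-pairing assumption are ours, not part of the manuscript or of any 10x Genomics tool.

```python
def total_doublet_rate(cross_species_rate, fraction_species_a=0.5):
    """Scale a cross-species doublet rate up to an estimate of all doublets.

    Assumes cells pair at random, so a doublet mixes the two species with
    probability 2 * p * (1 - p), where p is the fraction of species A.
    """
    p = fraction_species_a
    detectable_fraction = 2 * p * (1 - p)
    return cross_species_rate / detectable_fraction


# 6.1% is the figure quoted above for 8000 recovered cells.
estimated_total = total_doublet_rate(0.061)
print(f"Estimated total doublet rate: {estimated_total:.1%}")
```

With an equal mix of the two species, half of all doublets are cross-species, so the estimate is twice the quoted value (about 12.2%).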
Since the cell doublet rate can be affected by many different factors, such as the loading rate, sample properties, hands-on experience, and technical and equipment fluctuation, the fairest way to test whether CASB increases the cell doublet rate is to compare the doublet rates of CASB-labeled and unlabeled cells in a single run of a single-cell experiment. Therefore, to further exclude the possibility that CASB causes a higher cell doublet rate, we compared the doublet rates of CASB-labeled and unlabeled cells based on our combinatorial indexing experiment, where 16 different cell lines were multiplexed. Here, we considered only cell doublets consisting of samples from different species, which could be easily detected based on the commonly used genomic mapping information. Then we counted the number of cell doublets formed by 'two labeled cells' or 'two unlabeled cells', respectively. As an example, there were 120 unlabeled and 94 labeled human-mouse doublets. The doublet rates were 4.27% (120/(1931+757+120)) and 3.72% (94/(1663+770+94)), respectively (Fig R1A,B).
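The two percentages above can be reproduced directly from the quoted counts. In this sketch the function and argument names are hypothetical, and reading the two non-doublet terms in each denominator as single-species cell counts is our assumption.

```python
def doublet_rate(n_doublets, n_cells_species_a, n_cells_species_b):
    """Fraction of events that are cross-species doublets.

    The denominator is the sum of the two single-species counts and the
    doublet count, mirroring the fractions quoted in the response.
    """
    total_events = n_cells_species_a + n_cells_species_b + n_doublets
    return n_doublets / total_events


unlabeled_rate = doublet_rate(120, 1931, 757)  # 120 / 2808
labeled_rate = doublet_rate(94, 1663, 770)     # 94 / 2527

print(f"Unlabeled human-mouse doublet rate: {unlabeled_rate:.2%}")
print(f"Labeled human-mouse doublet rate:   {labeled_rate:.2%}")
```

This yields 4.27% for unlabeled cells and 3.72% for labeled cells, matching the values reported in the response.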
The doublet rates of human-rat, -monkey, -hamster, -dog and -fly doublets were then calculated in the same way. As shown in Fig R1C, Genetics volume 52, pages 1208-1218 (2020)).

3. The Pearson's correlation coefficient shown in Figure EV5D is very low. Is that due to the sparsity of ATAC-seq? Can the authors aggregate nearby peaks (over genomic bins) to mitigate the sparsity issue and do the plot again, and see if there is any improvement?

Answer：
Indeed, it is due to the sparsity of snATAC-seq data from fewer than 100 cells. Following the suggestion, we have now aggregated 3, 5, 7 or 9 nearby peaks and repeated the correlation analysis for each. As expected, the Pearson's correlation coefficients between labeled and unlabeled cells increased from the original 0.276 to 0.435, 0.500, 0.535 and 0.556, respectively, while those between the two plates increased from the original 0.143 to 0.290, 0.354, 0.389 and 0.411, respectively.

unlabeled as well as those between cells collected in plate 1 and 2. Due to the sparsity of snATAC-seq, 9 nearby peaks were aggregated to perform the correlation analysis. The correlation between the labeled and unlabeled cells was similar to that between the two plates.
"R" means Pearson's correlation coefficient. Each dot represents the accumulated intensity of 9 nearby ATAC peaks.

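The peak aggregation described in the answer to comment 3 can be sketched on toy data. Everything here is an assumption made for illustration: non-overlapping windows of consecutive peaks, simulated counts from 50 cells, and neighbouring peaks that share a regional accessibility level. It is not the authors' pipeline and does not reproduce their coefficients.

```python
import math
import random


def aggregate_peaks(counts, k):
    """Sum counts over non-overlapping windows of k consecutive peaks."""
    usable = len(counts) - len(counts) % k  # drop the incomplete last window
    return [sum(counts[i:i + k]) for i in range(0, usable, k)]


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)
N_CELLS = 50  # hypothetical number of nuclei per group

# Toy assumption: each region has one accessibility level shared by 9 peaks.
region_openness = [random.uniform(0.0, 0.05) for _ in range(300)]
peak_openness = [p for p in region_openness for _ in range(9)]


def simulate_counts(openness):
    """Count, per peak, how many of N_CELLS nuclei show a read there."""
    return [sum(random.random() < p for _ in range(N_CELLS)) for p in openness]


labeled = simulate_counts(peak_openness)
unlabeled = simulate_counts(peak_openness)

r_by_k = {
    k: pearson_r(aggregate_peaks(labeled, k), aggregate_peaks(unlabeled, k))
    for k in (1, 3, 5, 7, 9)
}
for k, r in r_by_k.items():
    print(f"{k} peaks per window: R = {r:.3f}")
```

Aggregation raises the correlation in this toy setting only because neighbouring peaks share signal while their sampling noise is independent; if neighbouring peaks were unrelated, summing them would not be expected to help.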
Reviewer #3 (Comments to the Author):
For animal studies, include a statement about randomization even if no randomization was used.
4.a. Were any steps taken to minimize the effects of subjective bias during group allocation or/and when assessing results (e.g. blinding of the investigator)? If yes please describe. The data shown in figures should satisfy the following conditions: Source Data should be included to report the data underlying graphs. Please follow the guidelines set out in the authorship guidelines on Data Presentation.
a specification of the experimental system investigated (eg cell line, species name).
Sample sizes were not selected a priori. Instead, single cells passing quality-control filtering were utilized to demonstrate key aspects of CASB methodology performance. Biological interpretations of single-cell RNA or ATAC sequencing data were constrained by statistical significance.
graphs include clearly labeled error bars for independent experiments and sample sizes. Unless justified, error bars should not be shown for technical replicates.
if n < 5, the individual data points from each experiment should be plotted and any statistical test employed should be justified.
the exact sample size (n) for each experimental group/condition, given as a number, not a range;
Each figure caption should contain the following information, for each panel where they are relevant:

B-Statistics and general methods
the assay(s) and method(s) used to carry out the reported observations and measurements
an explicit mention of the biological and chemical entity(ies) that are being measured.
an explicit mention of the biological and chemical entity(ies) that are altered/varied/perturbed in a controlled manner.
a statement of how many times the experiment shown was independently replicated in the laboratory.
Any descriptions too long for the figure legend should be included in the methods section and/or with the source data.
In the pink boxes below, please ensure that the answers to the following questions are reported in the manuscript itself. Every question should be answered. If the question is not relevant to your research, please write NA (non applicable). We encourage you to include a specific subsection in the methods section for statistics, reagents, animal models and human subjects.

definitions of statistical methods and measures:
a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.).

Reporting Checklist For Life Sciences Articles (Rev. June 2017)
This checklist is used to ensure good reporting standards and to improve the reproducibility of published results. These guidelines are consistent with the Principles and Guidelines for Reporting Preclinical Research issued by the NIH in 2014. Please follow the journal's authorship guidelines in preparing your manuscript.