Association of human breast cancer CD44-/CD24- cells with delayed distant metastasis

Tumor metastasis, especially delayed distant metastasis, remains the main cause of breast cancer-related deaths. The current study assessed the frequency of CD44-/CD24- breast cancer cells in 576 tissue specimens for associations with clinicopathological features and metastasis, and investigated the underlying molecular mechanisms. The results indicated that a higher frequency (≥19.5%) of CD44-/CD24- cells was associated with delayed postoperative breast cancer metastasis. Furthermore, CD44-/CD24- triple-negative breast cancer (TNBC) cells spontaneously converted into CD44+/CD24- cancer stem cells (CSCs) with properties similar to those of CD44+/CD24- CSCs from primary human breast cancer cells and parental TNBC cells in terms of stemness marker expression, self-renewal, differentiation, tumorigenicity, and lung metastasis in vitro and in NOD/SCID mice. RNA sequencing identified several differentially expressed genes (DEGs) in newly converted CSCs, and the expression of RHBDL2, one of the DEGs, was upregulated. More importantly, RHBDL2 silencing inhibited YAP1/USP31/NF-κB signaling and attenuated the spontaneous conversion of CD44-/CD24- cells into CSCs and their mammosphere formation. These findings suggest that the frequency of CD44-/CD24- tumor cells and RHBDL2 may be valuable for the prognosis of delayed breast cancer metastasis, particularly for TNBC.


Sample-size estimation
• You should state whether an appropriate sample size was computed when the study was being designed
• You should state the statistical method of sample size computation and any required assumptions
• If no explicit power analysis was used, you should describe how you decided what sample (replicate) size (number) to use
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
We describe the source and number of samples in the Patients and tissue specimens section, and the statistical method used to calculate the sample size in the Statistical analysis section.

Replicates
• You should report how often each experiment was performed
• You should include a definition of biological versus technical replication
• The data obtained should be provided and sufficient information should be provided to indicate the number of independent biological and/or technical replicates
• If you encountered any outliers, you should describe how these were handled
• Criteria for exclusion/inclusion of data should be clearly stated
• High-throughput sequence data should be uploaded before submission, with a private link for reviewers provided (these are available from both GEO and ArrayExpress)
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
We describe how the experiments were repeated in the Methods and figure legends. The cell experiments (including WB, PCR, etc.) were each repeated independently more than three times, and the animal experiments were repeated 5-8 times in each group.

Statistical reporting
• Statistical analysis methods should be described and justified
• Raw data should be presented in figures whenever informative to do so (typically when N per group is less than 10)
• For each experiment, you should identify the statistical tests used, exact values of N, definitions of center, methods of multiple test correction, and dispersion and precision measures (e.g., mean, median, SD, SEM, confidence intervals) and, for the major substantive results, a measure of effect size (e.g., Pearson's r, Cohen's d)
• Report exact p-values wherever possible alongside the summary statistics and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
(For large datasets, or papers with a very large number of statistical tests, you may upload a single table file with tests, Ns, etc., with reference to sections in the manuscript.)
We state in the Statistical analysis section of the Methods that the data are expressed as the mean ± standard deviation (SD) and that differences between groups were analyzed by chi-squared or Student's t tests, as applicable.

Group allocation
• Indicate how samples were allocated into experimental groups (in the case of clinical studies, please specify allocation to treatment method); if randomization was used, please also state if restricted randomization was applied
• Indicate if masking was used during group allocation, data collection and/or data analysis
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
Our study does not involve any clinical trials, so we have not described this information in the manuscript.

Additional data files ("source data")
• We encourage you to upload relevant additional data files, such as numerical data that are represented as a graph in a figure, or as a summary table
• Where provided, these should be in the most useful format, and they can be uploaded as "Source data" files linked to a main figure or table
• Include model definition files including the full list of parameters used
• Include code used for data analysis (e.g., R, MatLab)
• Avoid stating that data files are "available upon request"
Please indicate the figures or tables for which source data files have been provided:
We have uploaded the source data for all patients used in the statistical analysis as an Excel file.
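
The analyses named in the statistical reporting answer (mean ± SD, chi-squared test, Student's t test) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the example values, and the choice of a 2x2 table without continuity correction are all assumptions made for illustration, and only the 19.5% cut-off is taken from the abstract. The sketch uses the Python standard library only, so the t test returns the statistic and degrees of freedom without a p-value.

```python
import math
import statistics

THRESHOLD = 19.5  # percent of CD44-/CD24- cells; cut-off quoted in the abstract


def mean_sd(values):
    """Return (mean, sample standard deviation) for reporting as mean ± SD."""
    return statistics.mean(values), statistics.stdev(values)


def student_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )
    return t, na + nb - 2


def contingency(frequencies, delayed_metastasis):
    """Cross-tabulate high/low frequency (rows) against metastasis yes/no (columns)."""
    table = [[0, 0], [0, 0]]
    for freq, event in zip(frequencies, delayed_metastasis):
        table[0 if freq >= THRESHOLD else 1][0 if event else 1] += 1
    return table


def chi_squared_2x2(table):
    """Pearson chi-squared for a 2x2 table, no continuity correction; returns (chi2, p)."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical example values, NOT data from the study.
example_freqs = [25.0, 30.1, 21.4, 8.2, 12.5, 5.0, 22.0, 9.9]
example_events = [True, True, False, False, False, True, True, False]

example_table = contingency(example_freqs, example_events)
example_chi2, example_p = chi_squared_2x2(example_table)
print("2x2 table:", example_table)
print("chi-squared = %.3f, p = %.4f" % (example_chi2, example_p))
print("mean ± SD = %.2f ± %.2f" % mean_sd(example_freqs))
```

With real data, a statistics package would normally replace the hand-written functions, and exact p-values, N per group, and effect sizes would be reported alongside the summary statistics, as the form requests.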