Machine-Learning-Based Single-Molecule Quantification of Circulating MicroRNA Mixtures

MicroRNAs (miRs) are small noncoding RNAs that regulate gene expression and are emerging as powerful indicators of diseases. MiRs are secreted in blood plasma and thus may report on systemic aberrations at an early stage via liquid biopsy analysis. We present a method for multiplexed single-molecule detection and quantification of a selected panel of miRs. The proposed assay does not depend on sequencing, requires less than 1 mL of blood, and provides fast results by direct analysis of native, unamplified miRs. This is enabled by a novel combination of compact spectral imaging and a machine learning-based detection scheme that allows simultaneous multiplexed classification of multiple miR targets per sample. The proposed end-to-end pipeline is extremely time efficient and cost-effective. We benchmark our method with synthetic mixtures of three target miRs, showcasing the ability to quantify and distinguish subtle ratio changes between miR targets.


Reporter probe sequences:
• Capture probe for hsa-miR-15b-5p (ATTO488-ATTO647N): /5ATTO488K/TT AGT TGT AAA CCA TGA TGT GCT GCT AAT GTA /3ATTO647NN/

[...] 1 in the main text). The three lasers used for exciting the four fluorophores are displayed as solid vertical lines. The plot is stretched according to the non-linear dispersion curve of the optical system (Figure S6), showing the theoretical pixel displacement (bottom X-axis) of each wavelength (top X-axis) in the fluorophores' spectra.

[...] The colored circles correspond to the maximal emission wavelengths of the fluorophores used for designing the three probes used in the experiment, while the colored patches stand for the multiband filter's transmission channels.

Crop size selection:
Crop dimensions of 24⨉10 pixels were selected so that each crop contains the complete point-spread function (PSF) information of the three miR targets' spectral signatures. The top blob of each TS detected PSF was therefore centered at the 6th pixel from the top of the crop and at the 5th pixel from the left (horizontally centered), leaving 18 pixels below it to encapsulate the bottom blob (maximal anticipated peak distance of 11 pixels, see Figure 1C in the main text). The empirical standard deviation of the blobs was determined to be ~1.6 pixels, so the chosen dimensions allow each crop to contain the complete information of a single PSF while minimizing the chance of detecting multiple PSFs within the same crop.
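The crop arithmetic above can be checked with a short sketch. It assumes 1-based pixel indices and measures each margin as the number of whole pixels between a blob center and the crop edge; both conventions are assumptions made for illustration, and the function name is hypothetical.

```python
# Crop geometry taken from the text; 1-based pixel indexing is an assumption.
CROP_ROWS, CROP_COLS = 24, 10
TOP_BLOB_ROW, TOP_BLOB_COL = 6, 5
MAX_PEAK_DISTANCE = 11   # largest anticipated top-to-bottom blob separation (pixels)
BLOB_SIGMA = 1.6         # empirical blob standard deviation (pixels)


def crop_margins():
    """Return pixel margins between the blob centers and the crop edges."""
    bottom_blob_row = TOP_BLOB_ROW + MAX_PEAK_DISTANCE
    return {
        "rows_below_top_blob": CROP_ROWS - TOP_BLOB_ROW,
        "bottom_blob_row": bottom_blob_row,
        "above_top_blob": TOP_BLOB_ROW - 1,
        "below_bottom_blob": CROP_ROWS - bottom_blob_row,
        "left_of_blob": TOP_BLOB_COL - 1,
        "right_of_blob": CROP_COLS - TOP_BLOB_COL,
    }


margins = crop_margins()
for name in ("above_top_blob", "below_bottom_blob", "left_of_blob", "right_of_blob"):
    # Express each margin in units of the blob standard deviation.
    print(f"{name}: {margins[name]} px = {margins[name] / BLOB_SIGMA:.2f} sigma")
```

Under these conventions the farthest bottom blob sits at row 17, and the tightest margin is the 4 pixels to the left of the blob center, about 2.5 standard deviations.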
Figure S1 - Fluorescently labeled single-stranded DNA (ssDNA) capture probe does not bind non-specifically to PEG and PEG-biotin passivated surfaces. 100 µL of 100 pM ssDNA labeled with Alexa 647 (ssDNA-A647) was incubated on a PEG-biotinylated (PEG:PEG-biotin = 100:1) glass surface for one hour and then washed 3 times with 100 µL PBS before imaging. A field of view (FOV) is shown with (A) 633 nm and (B) 561 nm laser excitation. The FOV with 633 nm excitation shows that the ssDNA-A647 probe does not stick to the passivated surface and is washed away effectively by three gentle washes. The bright particles in (A) are not ssDNA-A647 but impurities, which are also seen in (B) with 561 nm laser excitation.

Figure S2 - ssDNA capture probes partially bind to the S9.6 anti-DNA:RNA hybrid antibody. 100 µL of 100 pM ssDNA labeled with Alexa 647 (ssDNA-A647) was incubated on a PEG-biotinylated (PEG:PEG-biotin = 100:1) glass surface with immobilized S9.6 antibody for one hour and then washed 3 times with 100 µL PBS before imaging. A field of view (FOV) is shown with (A) 633 nm and (B) 561 nm laser excitation. The FOV with 633 nm excitation shows that the ssDNA-A647 probes indeed partially bind to S9.6-immobilized, PEG-passivated surfaces. The bright particles in (A) are ssDNA capture probes; the very few bright spots seen in (B) with 561 nm laser excitation confirm that the single particles seen in (A) are indeed ssDNA-A647 capture probes.

c. S 9 .
Figure S3 - S9.6 anti-DNA:RNA hybrid antibody does not capture single-stranded microRNAs. 100 µL of 100 pM synthetic miR (hsa-miR-15b-5p) labeled with Alexa 488 (miR-A488) was incubated on a PEG-biotinylated (PEG:PEG-biotin = 100:1) glass surface with immobilized S9.6 antibody for one hour and then washed 3 times with 100 µL PBS before imaging. A field of view (FOV) is shown with (A) 488 nm and (B) 561 nm laser excitation. The FOV with 488 nm excitation shows that the synthetic miR-A488 has only a small affinity for the immobilized S9.6 antibody and the PEG-passivated surface.

d. S9.6-specific miR:probe capture control

Figure S4 - S9.6 anti-DNA:RNA hybrid antibody specifically captures DNA:RNA duplexes and does not capture either ss-miR or ss-DNA capture probes in the presence of DNA:RNA duplexes. CoCoS image acquired with a single 800 ms exposure under both 488 nm and 633 nm excitation, at RPA 175 for optimal detection of dispersed emission. The ssDNA-A647 capture probe was in (A) onefold excess and (B) sevenfold excess over the DNA:RNA hybrids during incubation on the immobilized S9.6 antibody, and the surface was then washed 3x with PBS before imaging. The DNA:RNA hybrids are detected as doubly labeled single molecules because the ss-DNA capture probe is labeled with Alexa 647 and the target hsa-miR-15b-5p is labeled with AF488 (see Experimental methods for more details). In both (A) and (B), more than 98% of the detectable signal originates from duplexes (two blobs separated by 12 pixels), and singly labeled ss-DNA or hsa-miR-15b-5p is not observed.

Figure S7 - Same curve as in Figure S6, with the maximal emission wavelengths of all fluorophores used for simulating PSFs of optional combinations for future probes.

Figure S8 - PSF simulations with varying noise distributions. Eleven fluorophore pairs with distinct PSFs were simulated in Matlab according to the fluorophores' spectra and the dispersion profile of CoCoS with RPA=177.5. Each row shows simulated Gaussian noise (G.N) added to the same PSF data as in the first row. The G.N has a mean of 0.3 and the variance indicated on the left (corresponding roughly to SNRs of ~25, 7, and 1.7, from top to bottom). Poisson-distributed noise was added in all noise cases. As can be seen, the Gaussian noise, which corresponds to the fluorescence background and pixel readout noise, is the main limiting factor for efficient classification of multiple miR targets.

Figure S9 - V-TIMDER visual explanation. Left: the V-TIMDER GUI in "mixture" mode, illustrated on a miR-126 PSF at the normal-context crop size (24⨉10 pixels). In "binary" mode, the miR type toggle is removed. Right: examples of a large-context (48⨉20 pixels) crop of a miR-15b PSF (top), an extra-large-context (240⨉100 pixels) crop of miR-155 (center), and a large-context single-spot noise crop (bottom).

Figure S10 - Mixtures' absolute count distributions. The left y-axis represents the total number of counts produced by the classifier, whereas the right y-axis represents the number of crops visually classified by users with V-TIMDER.

Figure S11 - Mixtures' fraction distributions. The total number of classified crops for each distribution is given in the legend.

Figure S12 - Visualization of miR mixture distributions on the concentration 2-simplex: a ternary plot of miR concentration distributions for the two experimental mixtures, 1:1:1 (0.33:0.33:0.33) and 2:5:3 (0.2:0.5:0.3). Color represents the binned counts of simulated concentration values according to the multinomial and Gaussian error estimation described in the Methods. The white contour lines represent Gaussians fitted to each of the distributions. The plot was generated using "Ternary Plots" from the Matlab File Exchange (ref. 2).
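To make the simplex picture concrete, the following is a minimal sketch of the multinomial part of such a simulation only; the Gaussian error term described in the Methods is not reproduced here. The number of crops per simulated measurement (`N_CROPS`), the number of repeats, and the function names are hypothetical. Each simulated measurement yields three fractions that sum to one, which are then mapped to the Cartesian coordinates of a ternary plot.

```python
import math
import random


def sample_fractions(true_fractions, n_crops, rng):
    """Draw n_crops classifications from a multinomial; return the observed fractions."""
    draws = rng.choices(range(len(true_fractions)), weights=true_fractions, k=n_crops)
    counts = [0] * len(true_fractions)
    for target in draws:
        counts[target] += 1
    return [count / n_crops for count in counts]


def ternary_to_xy(a, b, c):
    """Map a composition (a, b, c) to Cartesian coordinates in a unit-side triangle.

    Vertices: pure a at (0, 0), pure b at (1, 0), pure c at (0.5, sqrt(3)/2).
    """
    total = a + b + c
    return (b + 0.5 * c) / total, (math.sqrt(3) / 2) * c / total


rng = random.Random(0)   # fixed seed for reproducibility
N_CROPS = 1000           # hypothetical number of classified crops per measurement
N_REPEATS = 200          # hypothetical number of simulated measurements

# Simulate the 2:5:3 mixture and project every result onto the ternary plane.
simulated = [sample_fractions((0.2, 0.5, 0.3), N_CROPS, rng) for _ in range(N_REPEATS)]
points = [ternary_to_xy(*fractions) for fractions in simulated]
```

Binning `points` on a 2D grid would give a count map of the kind shown in the figure; the spread of the cloud shrinks as `N_CROPS` grows.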

Figure S13 - Ten spectral PSF combinations and their classification, demonstrated with 100 nm silica beads labeled with four fluorophores. A) Spectral FOV of multi-color beads as registered by CoCoS with RPA=177°. B) The same FOV sequentially excited by four different lasers (405 nm, 488 nm, 561 nm, and 638 nm), registered separately, false-colored, and overlaid to generate a four-color image. C) Example PSFs cropped and placed side by side, showcasing the ten different PSFs according to their fluorophore combinations.

Figure S14 - Amplification-free detection of miR-15b-5p and miR-155 in small RNA extracted from 500 μL plasma. A) An example raw image of an FOV containing the two miR targets. B) Zoom-in of the boxed region in A, highlighting the two different PSFs, the identities of their corresponding miR targets, and their fluorophore pairs (in parentheses).

Figure S15 - Empirical PSF estimation crops. Empirical reference crops were calculated as the pixel-wise median of ~10^5 noisy crops. These reference images were input to V-TIMDER (see Figure S9) to facilitate the visual PSF classification process.
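The pixel-wise median can be sketched as follows, assuming each crop is stored as a list of rows of equal size; the function name and the tiny 2⨉2 toy crops are hypothetical stand-ins for the real 24⨉10 crops.

```python
from statistics import median


def median_reference(crops):
    """Return the pixel-wise median of equally sized 2D crops (lists of rows)."""
    n_rows, n_cols = len(crops[0]), len(crops[0][0])
    return [
        [median(crop[r][c] for crop in crops) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three toy crops; outlier pixels (5, 0, 9, 7) do not affect the median.
toy_crops = [
    [[1, 2], [3, 4]],
    [[5, 0], [3, 9]],
    [[2, 2], [7, 4]],
]
reference = median_reference(toy_crops)
print(reference)
```

The median is less sensitive than the mean to occasional bright outliers, such as a second emitter falling inside a crop, which is one plausible reason to prefer it for building reference images.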

Figure S16 - A schematic diagram of the classifier's pipeline.

Figure S17 - Crop augmentation by addition of weak Gaussian noise according to the parameters in Table S5. Left column: three examples of denoised miR crops. Right column: the same crops as in the left column with Gaussian noise added. For each crop, six Gaussian-augmented realizations were generated (see Methods).
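A minimal sketch of this kind of augmentation is given below. The six realizations per crop come from the caption; the noise mean and standard deviation are hypothetical placeholders (the actual parameters are those of Table S5), and the function name and the toy two-blob crop are illustrative assumptions.

```python
import random

N_REALIZATIONS = 6    # from the caption: six augmented realizations per crop
NOISE_MEAN = 0.0      # hypothetical; the actual value is listed in Table S5
NOISE_SIGMA = 0.05    # hypothetical; the actual value is listed in Table S5


def augment(crop, rng, n_realizations=N_REALIZATIONS, mean=NOISE_MEAN, sigma=NOISE_SIGMA):
    """Return noisy copies of a 2D crop, each with independent Gaussian pixel noise."""
    return [
        [[pixel + rng.gauss(mean, sigma) for pixel in row] for row in crop]
        for _ in range(n_realizations)
    ]


rng = random.Random(0)   # fixed seed for reproducibility

# Toy stand-in for a denoised 24x10 crop: two single-pixel "blobs" on a dark background.
denoised = [[0.0] * 10 for _ in range(24)]
denoised[5][4] = 1.0
denoised[16][4] = 0.8

augmented = augment(denoised, rng)
```

The input crop is left unchanged, and each of the six outputs carries a different noise realization, so a single denoised crop contributes several distinct training examples.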

Figure S18 - The first 20 PCA components generated by the unsupervised PCA analysis. For visual clarity, the components were symmetrically mirrored to generate 24⨉10 pixel crops (instead of the actual 24⨉5 pixels generated by the PCA after the symmetrization preprocessing in the classifier's pipeline, see Methods). The three spectral PSFs (5-, 7-, and 11-pixel distances) are clearly represented in the first 6 components, whereas the rest are probably attributable to noise. The diverging colormap used to present the PCA components was generated using the 'lbmap' function downloaded from the Matlab File Exchange (ref. 3).
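The relation between the 24⨉5 and 24⨉10 representations can be sketched as below. The exact symmetrization used in the classifier's pipeline is described in the Methods and is not reproduced here; this sketch assumes the fold averages each row's left half with its mirrored right half, and both function names are hypothetical.

```python
def fold_left_right(crop):
    """Average each row's left half with its mirrored right half (width 10 -> 5)."""
    half = len(crop[0]) // 2
    return [
        [(row[c] + row[-1 - c]) / 2 for c in range(half)]
        for row in crop
    ]


def mirror_for_display(half_crop):
    """Rebuild a full-width image by appending the mirrored half to each row."""
    return [row + row[::-1] for row in half_crop]


# A left-right symmetric 24x10 toy crop survives a fold/mirror round trip unchanged.
symmetric_crop = [[0, 1, 2, 3, 4, 4, 3, 2, 1, 0] for _ in range(24)]
folded = fold_left_right(symmetric_crop)
restored = mirror_for_display(folded)
```

Folding halves the number of pixels passed to the PCA, and mirroring is then only needed to display the components at the original crop width.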

Figure S19 - Confusion matrix results for the validation set.
1. Cost of goods analysis (July 2023)

Table S1 - Costs per sample for multiplexed detection of 5 miR reporters. The cost analysis excludes the RNA extraction kit, as this step can be skipped in future implementations.