Algorithm-improved high speed and non-invasive confocal Raman imaging of two-dimensional materials

Confocal Raman microscopy is important for characterizing two-dimensional (2D) materials, but its low throughput significantly hinders its applications. For metastable materials such as graphene oxide (GO), the low throughput is aggravated by the requirement of an extremely low laser dose to avoid sample damage. Here we introduce algorithm-improved Confocal Raman Microscopy (ai-CRM), which increases the Raman scanning rate by one to two orders of magnitude with respect to state-of-the-art works for a variety of 2D materials. Meanwhile, GO can be imaged at a laser dose that is 2 to 3 orders of magnitude lower than previously reported, such that laser-induced variations of the material properties can be avoided. ai-CRM also enables fast and spatially resolved quantitative analysis, and is readily extended to three-dimensional mapping of composite materials. Since ai-CRM is based on general mathematical principles, it is cost-effective, easy to implement and universally applicable to other hyperspectral imaging methods.


Introduction
Confocal Raman microscopy (CRM) has become one of the most widely used analytical methods to investigate the physico-chemical properties of two-dimensional (2D) materials 1,2. Compared to other methods such as optical microscopy 3,4,5,6, atomic force microscopy 3,7, and fluorescence quenching microscopy 8,9, Raman microscopy has the advantage of providing label-free, spatially resolved, compositional and structural information of the probed material on arbitrary substrates.
Raman microscopy has enabled studies on the quality 10, defects 11, number of layers 10,12, crystal boundaries 13, strain 5, oxidation state 14, and electron-phonon interactions 15 of 2D materials. For example, to determine the quality of fabricated graphene, it is common to present the characteristic Raman spectrum showing an intense G′ peak (~2700 cm⁻¹), which is also referred to as the 2D peak (to avoid confusion with "two-dimensional", we denote it as G′), together with a weak or absent D band (~1350 cm⁻¹) 10,16. The quality can be further quantified by the ratio of the G and D band intensities 17. For graphene oxide (GO), the changes in the intensity and peak position of the D and the G (~1600 cm⁻¹) bands can be used to characterize its thermal reduction behavior 18. For many other 2D materials such as BN 19,20, transition metal dichalcogenides 2,21,22, and phosphorene 23, Raman microscopy has also become an important characterization tool.
However, the potential of Raman microscopy is severely hindered by its low throughput, which stems from the extremely small Raman scattering cross section: on average, 1 out of 10 million incident photons is Raman scattered. Typically, a single Raman spectrum of graphene takes hundreds of milliseconds to tens of seconds to acquire 24,25,26,27,28,29,30. Consequently, a diffraction-limited map across a 50×50 µm² region, created by raster-scanning focal spots at a typical integration time of 1 s, would take half a day. As a result, many previous studies only collected single spectra from a few spots, despite the fact that large-area scanning and volumetric scanning are highly desired for accurate and systematic studies of material properties. In principle, the Raman signal intensity per pixel is proportional to the light dose, which is the product of the laser power and the measurement time per pixel (and is therefore inversely proportional to the scanning rate). A better signal-to-noise ratio (SNR) could be achieved by increasing the laser power. However, the laser power cannot generally be increased without considering potential light-induced damage to the sample. For graphene, structural changes are observed in response to laser irradiation in the mW range 31,32. To mitigate this problem, an electron-multiplying charge-coupled device (EMCCD), available in many modern Raman instruments, has been used to improve the SNR such that a higher scanning speed of several tens of milliseconds per spectrum can be delivered at 1 mW laser power 30,33. Higher speed, however, is still needed for large-scale volumetric scanning. Recently, wide-field Raman imaging was introduced to map large-area graphene sheets in a few seconds 34. However, this technique requires additional hardware components and is typically limited to one or a few frequency bands, undermining its performance for quantitative characterization of many physico-chemical properties such as defect density.
The applicability in depth-resolved volumetric 3D scanning also remains challenging. For GO, the low throughput problem is even more severe due to the requirement of an extremely low laser dose to suppress reduction. A recent study suggested that even a laser dose of 48 µJ, i.e. 48 µW in 1 s, is still too high to prevent sample damage 35 .
In recent decades, numerous studies demonstrated that imaging-related challenges can often be addressed with post-acquisition data analysis using classical or modern algorithms 36,37,38. Principal component analysis (PCA), an algorithm that is widely used in signal processing and machine learning to find common features in a dataset 39, has been applied to improve the SNR of hyperspectral datasets 40,41. The idea is that, by analyzing the variance between spectra within the whole dataset, PCA can distinguish signal features from noise features and thereby allows a reconstruction of the dataset with predominantly signal features. The performance of PCA-guided denoising generally increases with the size of the dataset, because larger datasets enable a more thorough extraction of the signal features. Therefore, it is ideal for large-scale hyperspectral analysis.
In this work, we introduce an algorithm-improved confocal Raman microscopy (ai-CRM), combining PCA and EMCCD, to image 2D materials. Briefly, we first collect spectra with an EMCCD at high scanning speed and low laser power. The combination of short measurement time per pixel and low laser power results in "noisy" spectra with an SNR below one. Then we recover the faint signal, invisible in the noise, with the help of PCA. With this technique, Raman mapping of GO can be performed with an extremely low laser power of 5 µW, close to the hardware limit, together with short integration times of 10 ms per spectrum. Meanwhile, we demonstrate that such a low laser dose per spectrum can effectively prevent GO reduction. Graphene can be mapped at a hardware-limited scanning rate of 1 ms per spectrum at 1 mW laser power. For graphene and GO, the power-averaged scanning rate (scanning rate divided by power for fair comparison) in our work is more than 1 order of magnitude higher than in state-of-the-art works.
Finally, we demonstrate that ai-CRM can be extended to fast imaging of other 2D materials including MoS₂, WS₂, and BN, and to fast volumetric imaging of composite materials.

Fast mapping of GO
We first demonstrate our protocol (Fig. 1) with a typical Raman mapping of GO nano-sheets. Confocal Raman mapping (Fig. 1a) was conducted by raster scanning over 25×25 µm² with a step size of 0.25 µm, using a laser with a wavelength of λ = 532 nm. To avoid sample damage, the laser power was kept at 5 µW underneath the objective. A 100× objective with a numerical aperture (NA) of 0.9 was used, giving an estimated laser spot size of d = 1.22λ/NA = 0.72 µm and a corresponding power density of 12.3 µW/µm². The diffraction-limited resolution is d/2 = 0.36 µm. An EMCCD was used to collect the spectra with an integration time of 20 ms for each spectrum. After the scan, the Raman spectra from all pixels are assembled into a data matrix in which each row contains one spectrum (Fig. 1b). The spectral range from 41 to 3692 cm⁻¹ was selected for analysis. The collected spectra are extremely noisy, as shown by the representative spectrum in Fig. 1c, and the Raman signal of GO cannot be identified. Subsequently, we apply PCA to decompose the dataset into its principal components (PCs), whose number equals the number of wavenumber steps (1571 in this case). Note that these components are not real spectra, but spectra of the variance in the dataset. The PCs are ranked according to the percentage of the total variance they describe, which represents their importance to the dataset 39. Since the noise in the spectra is random, while signals are a recurring feature in a fraction of the pixels in the dataset, the PCs containing signals contribute much more to the variance than those containing mostly noise (Supplementary Figure 1). Typically, only the first few PCs contain obvious signal information. We use a Scree test to determine the number of PCs (here: 6) that need to be retained for further analysis (Supplementary Figure 1) 42. Relatively clear band-like features are observed in these first PCs (Fig. 1d), but not in the subsequent ones, which contain mostly noise.
Using the first 6 PCs to reconstruct the dataset and rejecting the 1565 "noise-dominated PCs" out of the total of 1571 PCs, we obtain Raman spectra with dramatically improved SNR that clearly display the distinct D and G bands (Fig. 1f). The criterion for choosing the number of PCs is loose: a few more or fewer than suggested by the Scree test is typically acceptable (Supplementary Figure 2). The spectra in Fig. 1c and Fig. 1f correspond to the same pixel in the image.
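The decomposition and reconstruction steps described above can be illustrated with a short Python/NumPy sketch. It is not the MATLAB code referred to in the supporting information: the function name `pca_denoise`, the mean-centring step and the synthetic test data are assumptions made only for this illustration, and the number of retained PCs still has to be chosen by the user, for example from a Scree plot of the returned variance fractions.

```python
import numpy as np


def pca_denoise(spectra, n_components):
    """Rebuild a (n_pixels, n_wavenumbers) matrix from its leading PCs.

    Returns the denoised matrix and the fraction of variance per PC.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean_spectrum = spectra.mean(axis=0)
    centred = spectra - mean_spectrum  # assumption: PCA on mean-centred data
    # Thin SVD of the centred data; the rows of vt are the PCs.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # values one would inspect in a Scree plot
    k = n_components
    denoised = (u[:, :k] * s[:k]) @ vt[:k] + mean_spectrum
    return denoised, explained


# Synthetic stand-in for a map: one band of varying strength buried in noise.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(41.0, 3692.0, 300)
band = np.exp(-0.5 * ((wavenumbers - 1600.0) / 60.0) ** 2)
clean = rng.uniform(0.0, 1.0, size=(4000, 1)) * band
noisy = clean + rng.normal(0.0, 1.0, size=clean.shape)

denoised, explained = pca_denoise(noisy, n_components=1)
rms_before = np.sqrt(np.mean((noisy - clean) ** 2))
rms_after = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMS error before: {rms_before:.3f}, after: {rms_after:.3f}")
```

For a real map, `spectra` would be the data matrix of Fig. 1b (one row per pixel) and `n_components` would be the number suggested by the Scree test, 6 in the GO example.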
To demonstrate the efficiency of our method for mapping, we imaged the GO nano-sheet with an atomic force microscope (AFM) (Fig. 2a), and compared it with the Raman images (Fig. 2b and Fig. 2c) created by integrating the G band, without and with ai-CRM, respectively. For clarity, peak intensities will be denoted by I and integrated band area intensities will be denoted by A.
The ai-CRM data also allow quantitative analysis of the properties of GO. For example, the results (Fig. 2g) show that the integral intensity of the G band is linearly related to the number of layers and the integration time, which ranges here from 5 to 35 ms. This stems from the fact that the Raman intensity scales linearly with the volume of material within the confocal laser spot. This also implies that the 2L and 3L GO here are weakly coupled, suggesting that ai-CRM could be used for fast counting of layer numbers in the current system.
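As an illustration of how an integrated band area such as A(G) can be obtained from a data matrix, the following Python sketch integrates every spectrum over a wavenumber window with the trapezoidal rule. The function name `band_area` and the window limits are hypothetical; the integration limits actually used for the G band are not stated in the text.

```python
import numpy as np


def band_area(wavenumbers, spectra, lo, hi):
    """Trapezoidal integral of each spectrum between lo and hi (in cm^-1)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    sel = (wavenumbers >= lo) & (wavenumbers <= hi)
    x = wavenumbers[sel]
    y = spectra[:, sel]
    # Sum of trapezoids between neighbouring wavenumber points.
    return np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(x), axis=1)


# Toy check of the linear scaling: band strength proportional to layer number.
w = np.arange(1000.0, 2001.0)
g_band = np.exp(-0.5 * ((w - 1600.0) / 30.0) ** 2)
layers = np.array([1.0, 2.0, 3.0])
areas = band_area(w, np.outer(layers, g_band), 1500.0, 1700.0)  # assumed window
slope, intercept = np.polyfit(layers, areas, 1)
print(areas, slope, intercept)
```

Reshaping the returned vector to the scan dimensions (for example 100×100 pixels) gives the band-area image; a straight-line fit of area against layer number, as in the toy example, is one simple way to test the linear relation described above.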
To quantify the SNR enhancement by our method, the average spectrum and the standard deviation were calculated for each integration time. The SNR was then determined from:
$$\mathrm{SNR} = \frac{I(G)}{\Delta I_{\mathrm{noise}}} \qquad (1)$$
where I(G) is the peak intensity of the G band in the averaged spectrum and ΔI_noise is the standard deviation calculated from the same spectrum in the wavenumber region 2000-2200 cm⁻¹, a region where no bands of these materials occur. Fig. 2h shows that with CRM, the SNR is lower than 1 for all integration times from 5 to 500 ms, leading to noisy images (Fig. 2b, Fig. 2e and Supplementary Figure 3a). Note that an SNR below 1 implies that the signal cannot be clearly distinguished from noise, and therefore the values of SNR are only a rough estimation.
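A minimal Python sketch of this SNR estimate is given below. The silent region of 2000-2200 cm⁻¹ is taken from the text; the function name `snr_peak`, the G band search window of 1500-1700 cm⁻¹ and the subtraction of the mean of the silent region as a baseline are assumptions made for this illustration.

```python
import numpy as np


def snr_peak(wavenumbers, spectrum, band=(1500.0, 1700.0), silent=(2000.0, 2200.0)):
    """Peak height in `band` divided by the standard deviation in `silent`."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    in_band = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
    in_silent = (wavenumbers >= silent[0]) & (wavenumbers <= silent[1])
    baseline = spectrum[in_silent].mean()         # assumption: flat baseline
    peak = spectrum[in_band].max() - baseline     # I(G) above the baseline
    noise = spectrum[in_silent].std(ddof=1)       # Delta I_noise
    return peak / noise


# Example on a synthetic averaged spectrum.
rng = np.random.default_rng(0)
w = np.linspace(41.0, 3692.0, 1571)
spectrum = 5.0 * np.exp(-0.5 * ((w - 1600.0) / 30.0) ** 2) + rng.normal(0.0, 1.0, w.size)
print(snr_peak(w, spectrum))
```

Because the maximum inside the band window also picks up noise, this estimate is biased upwards when the true SNR is close to or below 1, which is consistent with the remark that such values are only rough estimates.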
With ai-CRM, however, the SNR increases dramatically and rises progressively with integration time. Importantly, even the SNR for 10 ms integration is much higher than for any of the integration times without denoising. This suggests that our method can increase the scanning speed by more than 50 times, from 500 ms to 10 ms per spectrum, at an improved SNR. In addition, a literature review suggests that the power-averaged scanning rate (scanning rate divided by power, for a fairer comparison since the Raman signal intensity is proportional to the dose) in our work is 2-7 orders of magnitude higher than in previous works (Supplementary Figure 5a).
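The power-averaged scanning rate can be made concrete with a few lines of Python. The sketch below assumes that the scanning rate is the number of spectra per second (the inverse of the integration time) and compares the GO settings quoted in this work (10 ms at 5 µW) with the 48 µW, 1 s condition cited from ref. 35; the function name is hypothetical and readout and stage overheads are ignored.

```python
def power_averaged_rate(integration_time_s, power_w):
    """Spectra per second per watt: (1 / integration time) / laser power."""
    return (1.0 / integration_time_s) / power_w


this_work = power_averaged_rate(10e-3, 5e-6)   # GO: 10 ms per spectrum at 5 uW
reference = power_averaged_rate(1.0, 48e-6)    # 1 s per spectrum at 48 uW
print(this_work / reference)
```

Since the dose per spectrum is the product of power and integration time, the same ratio also expresses how much lower the dose per spectrum is in the first setting than in the second.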

Non-invasive mapping of GO
For GO, reliable Raman characterization has been a major challenge due to laser-induced sample damage 35,43. To mitigate this problem, the laser dose needs to be reduced as much as possible 11,35,43. Many previous works used mW-scale power to characterize GO 44,45,46,47,48. A recent study 35 suggested that laser-induced reduction of GO cannot be prevented even when the laser dose is as low as 8 × 10⁷ J/m², which corresponds to 48 µW during 1 second in the confocal spot. Such a dose is already a few orders of magnitude lower than in preceding studies 47,48. Further reduction of the laser power, however, resulted in Raman spectra with insufficient SNR. In contrast, the results in Fig. 2 suggest that ai-CRM enables characterization of GO at a laser dose that is two orders of magnitude lower (tens of ms with a laser power of 5 µW).
For a comparative study with ai-CRM, we selected two levels of laser power, 4 µW and 4 mW (Fig. 3), to determine the influence of laser power on the Raman signal of GO. Note that for the 4 mW case, a normal CCD was used instead of the EMCCD, because at such a high laser power the intensity of the silicon peak exceeds the upper detection limit in EMCCD mode. This has no effect on the sample damage analysis. in Raman intensity is also seen in the average spectra for each image (insets in Fig. 3a to 3f).
Laser-induced reduction (to reduced graphene oxide, rGO) during Raman imaging has been observed on all GO nano-sheets (see e.g. Supplementary Figure 6). This was further confirmed by measuring Raman spectra of a single spot during 500 s of continuous laser irradiation at 4 mW, with a spectrum obtained every 50 ms. The Raman intensity decreased rapidly by ~70 percent in the first 100 s and then gradually decreased further to a weak plateau value of residual scattering (Fig. 3d). In addition, the full width at half maximum (FWHM) of the G band decreased monotonically with time (inset Fig. 3d), and the I(D)/I(G) ratio changed with time (Supplementary Figure 6a), which is a typical consequence of the reduction of GO 35. In contrast, the intensity loss is substantially reduced when the sample is illuminated with 4 µW of laser power (a dose of 4.9 × 10⁵ J/m² per spectrum) for 500 s with 50 ms per spectrum (Fig. 3h and its insets). In addition, scanning did not cause any obvious optical changes to the bright-field image of the sample (Supplementary Figure 6). After 500 s of exposure at the same pixel, the change in I(G) is less than 10%. The change in A(G) during the illumination time per pixel in a Raman image is then less than 1 in 10⁵, which is negligible for most purposes. The change in FWHM is also negligible, as it is close to the wavenumber resolution of the instrument (~2 cm⁻¹). These results clearly confirm the effectiveness of ai-CRM in suppressing sample damage through reduction of the experimental laser intensity.
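The spot size, power density and dose values quoted in this work can be reproduced approximately with the following Python sketch. It assumes a circular spot of diameter d = 1.22λ/NA with the power spread uniformly over its area; this simplification and the function names are assumptions of the illustration, although the resulting numbers are close to those stated in the text.

```python
import math


def spot_diameter_um(wavelength_um, numerical_aperture):
    """Estimated laser spot diameter d = 1.22 * lambda / NA, in micrometres."""
    return 1.22 * wavelength_um / numerical_aperture


def spot_area_um2(diameter_um):
    """Area of a circular spot of the given diameter, in square micrometres."""
    return math.pi * (diameter_um / 2.0) ** 2


def power_density_uw_per_um2(power_uw, diameter_um):
    """Laser power divided by the spot area (uniform-spot assumption)."""
    return power_uw / spot_area_um2(diameter_um)


def dose_j_per_m2(power_w, exposure_s, diameter_um):
    """Energy delivered per unit area during one exposure."""
    area_m2 = spot_area_um2(diameter_um) * 1e-12  # 1 um^2 = 1e-12 m^2
    return power_w * exposure_s / area_m2


d = spot_diameter_um(0.532, 0.9)             # about 0.72 um
print(d, power_density_uw_per_um2(5.0, d))   # about 12 uW/um^2 at 5 uW
print(dose_j_per_m2(4e-6, 50e-3, d))         # about 4.9e5 J/m^2 for 4 uW, 50 ms
```

The last line indicates that the quoted value of 4.9 × 10⁵ J/m² corresponds to a single 50 ms exposure at 4 µW. The "1 in 10⁵" estimate follows from the same time scales: a change of less than 10% over 500 s corresponds to less than 0.1 × (0.05 s / 500 s) = 10⁻⁵ per 50 ms exposure, assuming the change accumulates linearly in time.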

Fast mapping of graphene and its use in sample quality analysis
Our method can also be applied to fast Raman mapping of graphene. To demonstrate this, we made a scratch on a single layer of graphene grown by chemical vapor deposition (CVD) on a 300 nm SiO₂/Si wafer. Then we performed a Raman scan across 25×25 µm² with a step size of 0.25 µm at 1 mW laser power, which was previously confirmed to be non-destructive for graphene 33. Similar to the results for GO, a significant SNR improvement was observed, as shown by the Raman images of the integral G′ band intensity generated at 5 ms integration time before and after applying ai-CRM (Fig. 4a and b). Before denoising, the D, G, and G′ bands can hardly be seen (bottom spectrum in Fig. 4a). Although different layer numbers can still be distinguished, an accurate quantitative analysis is hardly possible. After denoising, a weak D band, a sharp G band, and a sharp and strong G′ band are clearly resolved (bottom spectrum in Fig. 4b), indicating the high quality of the graphene sample (refer Supplementary and I(D)) of the double layer graphene (labeled as 2×1L in Fig. 4a) is roughly twice that of the single layer graphene, confirming that the double layer is composed of two weakly coupled single layers.
Importantly, the ai-CRM data enable quantitative analysis of the sample quality. The distance between defects, L_D, can be estimated from the ratio of I(G) to I(D) 17 (Eq. 2, with the laser wavelength λ in nanometers). The distribution of L_D is then plotted in Fig. 4c. It is observed that the graphene sample has a relatively uniform L_D of ~20 nm across the scanned area, again confirming its high quality. Note that regions having L_D < 10 nm are not plotted (scale bar in
Furthermore, to quantify the improvement in scanning rate, we scanned the same sample using different integration times from 1 ms to 500 ms, and calculated the SNR for all datasets before and after applying ai-CRM. The results in Fig. 4d show that the SNR at 500 ms without denoising is comparable to the SNR at 5 ms with denoising, and is much lower than that at 10 ms with denoising. Similar to the results for GO, this demonstrates again that an increase of more than 50 times in scanning rate can be achieved when ai-CRM is applied. Compared to the literature, the power-averaged scanning rate here is around two orders of magnitude higher than in state-of-the-art works (Supplementary Figure 5b).
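Equation (2) itself is not reproduced in this text. A widely used relation of this kind, reported by Cançado and co-workers for graphene with well-separated point defects, is L_D² (nm²) = (1.8 ± 0.5) × 10⁻⁹ λ⁴ I(G)/I(D), with λ in nanometers. The Python sketch below assumes this form; whether it coincides exactly with Eq. 2, including the prefactor, is an assumption, and the function name is hypothetical.

```python
import numpy as np


def defect_distance_nm(i_d, i_g, wavelength_nm=532.0, prefactor=1.8e-9):
    """L_D in nm from the assumed relation L_D^2 = prefactor * lambda^4 * I(G)/I(D)."""
    i_d = np.asarray(i_d, dtype=float)
    i_g = np.asarray(i_g, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ld_squared = prefactor * wavelength_nm**4 * i_g / i_d
    return np.sqrt(ld_squared)


# Peak-intensity maps would normally come from the denoised spectra.
i_d_map = np.array([[0.36, 0.40], [0.30, 2.00]])
i_g_map = np.ones_like(i_d_map)
ld_map = defect_distance_nm(i_d_map, i_g_map)
# Mirror the text: values below 10 nm are excluded from the plotted map.
ld_plotted = np.where(ld_map >= 10.0, ld_map, np.nan)
print(ld_plotted)
```

With this assumed relation and λ = 532 nm, an I(D)/I(G) ratio of about 0.36 corresponds to L_D of about 20 nm.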

Fast mapping of other 2D materials
ai-CRM was also tested on mechanically exfoliated MoS₂, WS₂, and BN nanosheets (optical and AFM images in Supplementary Figure 7). MoS₂ and WS₂ can be imaged at 50 ms integration time with 20 µW laser power (Fig. 5a and b). BN, which has a smaller Raman cross section, can be imaged at 50 ms integration time with 500 µW laser power (Fig. 5c). Note that the ai-CRM maps and spectra of these materials generated at 50 ms integration time have similar or better quality than normal CRM maps and spectra at 500 ms integration time (compare Supplementary Figure 8), suggesting that ai-CRM increases the scanning speed by at least 10 times, still with an improvement in SNR.

Fast volumetric imaging of a rGO composite
Volumetric imaging is another advantage of CRM, which can help, for example, to assess the properties of graphenic materials inside a composite or a device. However, this potential has not been exploited in previous studies, due to the long measurement times required.
Since one 2D Raman image across tens of micrometers at diffraction-limited spatial resolution could take half a day, a volumetric Raman image created from stacked 2D images would take several days. With our method, it is now possible to reduce the time to around 10 minutes. To demonstrate this, a GO dispersion was mixed with aqueous polyacrylic acid (PAA), and the mixture was cured overnight in an oven at 80 °C. After curing, GO is moderately reduced. Using ai-CRM with 0.75 mW laser power (a higher laser power is used because the GO is already reduced) at 1 ms integration time, both rGO and PAA (bottom left and bottom right spectrum in Fig. 6a, respectively) show their characteristic Raman spectra, with the CH stretching band located at around 2930 cm⁻¹ in the PAA spectrum. We subsequently scanned a vertical stack of 20 images across a depth of 10 µm. Each image in the stack covers 50×50 µm² with a step size of 0.5 µm and 1 ms integration time (Fig. 6a). The imaging thus took 100 × 100 × 0.001 × 20 = 200 seconds of pure measurement time, or ~720 seconds when the camera readout time and stage translation time are taken into account. Afterwards, the three-dimensional distribution of rGO is plotted (Fig. 6b), showing how rGO is blended into the composite. Slices extracted from arbitrary positions within the volumetric view can be visualized, as shown by the two cross-section images in the bottom insets of Fig. 6b, whose positions are labeled by the dashed green and blue lines.
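The measurement-time arithmetic above generalizes to other scan geometries. The following Python sketch, with a hypothetical function name, computes only the pure integration time; camera readout and stage translation are not modelled.

```python
def pure_scan_time_s(width_um, height_um, step_um, integration_s, n_slices=1):
    """Pixels per image times integration time times number of stacked images."""
    n_x = round(width_um / step_um)
    n_y = round(height_um / step_um)
    return n_x * n_y * integration_s * n_slices


# Volumetric scan from the text: 50x50 um^2, 0.5 um step, 1 ms, 20 slices.
print(pure_scan_time_s(50.0, 50.0, 0.5, 1e-3, n_slices=20))
# Conventional 2D map at 1 s per spectrum, assuming a 0.25 um step.
print(pure_scan_time_s(50.0, 50.0, 0.25, 1.0) / 3600.0)
```

The first call reproduces the 200 s quoted above; the reported total of ~720 s implies an overhead factor of about 3.6 for this scan. The second call uses an assumed step size of 0.25 µm (the value used for the 2D maps in this work) and gives roughly 11 hours, in line with the "half a day" estimate for a conventional map.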

Fast mapping of graphene and GO on arbitrary substrates
Compared to other characterization methods such as fluorescence quenching microscopy and bright-field optical microscopy, CRM has the advantage of universal applicability on arbitrary substrates. As an example, we used ai-CRM to image CVD-grown graphene on a glass substrate at 20 ms integration time with 1 mW laser power across a 25×25 µm² area with a step size of 0.25 µm (Supplementary Figure 9). GO can also be imaged on a glass or calcium fluoride substrate with low laser power (10 µW) and short integration time (20 ms), as shown in Supplementary Figure 9. The wrinkles and folds on the GO sheets are clearly visible.

Conclusions
In this work, we demonstrate that ai-CRM significantly improves the SNR of the Raman spectra of various 2D nanomaterials such as graphene, GO, WS₂, MoS₂, and BN. Thereby, it increases scanning rates by more than 50 times with respect to conventional state-of-the-art CRM (Fig. 2h and 4d). With this improvement, sensitive samples such as GO can be mapped faster at an extremely low laser power of just several µW. This minimizes laser-induced sample damage and enables reliable and quantitative characterization of physico-chemical properties of graphenic nano-sheets, such as layer number and defect density. Compared to other characterization tools, CRM has the advantage of providing label-free, substrate-independent, and three-dimensional spatial information. Since the denoising performance increases with the size of the dataset, even higher scanning rates are expected when the Raman mapping area increases further. This property is in high demand, because large-scale industrial production of 2D materials and their devices requires scalable characterization methods.
While other techniques such as surface-enhanced Raman spectroscopy 49 and stimulated Raman scattering 50 may give rise to even higher SNR improvement, these techniques are either technically much more involved or require specific substrates and samples and/or do not allow for volumetric imaging.
Since ai-CRM is based on a purely mathematical framework, it can complement rather than replace the above techniques, and it can also be applied to other hyperspectral microscopy methods, such as hyperspectral infrared microscopy and photoluminescence microscopy. We therefore expect ai-CRM to strengthen the use of hyperspectral imaging as a fast, reliable, quantitative, and spatially resolved characterization tool in the fabrication and broad application of 2D materials.

Raman measurements
Raman measurements were carried out using a WITec alpha 300R Raman microscope connected to a 532 nm laser. A 600 g/mm grating was used, which provided a spectral resolution of around 2 cm⁻¹.

Denoising
Denoising was performed in MATLAB (version R2017b) with home-written code, without any pretreatment. A sample code and a sample data set for graphene are given in the supporting information. Typically, denoising 10 thousand spectra takes only around 10-20 seconds on a normal office computer. After denoising, cosmic rays are also removed, because of their random nature.

Figure 3 (d, h): Variation of the normalized intensity of the G peak of GO, obtained from single-pixel time series data for 4 mW (d) and for 4 µW (h). The insets show the variation of the spectra as well as the FWHM of the G peak. All data were recorded at 50 ms integration time per pixel. All scale bars correspond to 5 µm and all Raman maps have a resolution of 50×50 pixels.

Figure 4: Fast Raman mapping and quality analysis of graphene with ai-CRM. (a) Normal CRM map of CVD-grown graphene on a 300 nm SiO₂/Si wafer at 1 mW laser power and 5 ms integration time. The inset shows the spectra corresponding to the markers on a double layer (blue) and a single layer (yellow). (b) ai-CRM map of the same region. The spectra show the evolution of the D, G and G′ peaks after denoising, useful for quantitative analysis. (c) Color-coded defect density map of the same region, derived from the peak intensity ratio of the D and G bands (Eq. 2), showing that the average distance between defects is around 20 nm, which indicates the high quality of the CVD-grown graphene. (d) Variation of the SNR of the G′ peak for various integration times, illustrating the amplification of the SNR with ai-CRM.