A Statistical Approach to Raman Analysis of Graphene-Related Materials: Implications for Quality Control

A statistical method to determine the number of measurements required from nanomaterials to ensure reliable and robust analysis is described. Commercial products utilizing graphene are in their infancy, and recent investigations of commercial graphene manufacture have attributed this to the lack of robust metrology and standards by which graphene and related carbon materials can be measured and compared. Raman spectroscopy is known to be a useful tool in carbon nanomaterial characterization, but to provide meaningful information, in particular for quality control or management, multiple spectra are needed. Herein we present a statistical method to quantify the number of different spectra or other microscale measurements that should be taken to reliably characterize a graphene material. We have recorded a large number of Raman measurements and studied the statistical convergence of these data sets. We use a graphical approach to monitor the change in summary statistics and a Monte Carlo based bootstrapping method to computationally resample the data, demonstrating the effects of underanalyzing a material; for example, graphene nanoplatelets may require over 500 spectra before information about the exfoliation efficiency, particle size, layer number, and chemical functionalization is accurately obtained.


■ INTRODUCTION
Since the landmark isolation of single-layer graphene, there have been many investigations into its record properties, including electron mobility, 1,2 room-temperature quantum Hall effect, 3,4 thermal conductivity, 5 and high strength, 6,7 all of which are highly desirable in real-world devices and applications. 8 Despite this interest, commercial products utilizing graphene are still rare, and recent publications have pointed to the lack of robust metrology and standards by which graphene and related carbon materials can be measured and compared between industrial manufacturers. 9−17 The number of companies producing and using graphene has expanded rapidly in recent years, and at the time of writing, grapheneinfo (a graphene news aggregator) lists 223 companies: 92 in Europe, 73 in North America, 45 in Asia, and 13 in Australia. 18 These companies use different manufacturing routes and produce a range of materials with irregular sizes, shapes, and chemical functionality. 19 These differences in properties make it difficult to substitute an alternative commercial graphene source in an application without extensive prior testing and reformulation. Characterization of graphene from 60 different suppliers has shown that many companies manufacture graphite nanoplatelets with a wider range of properties than the ISO description of graphene would suggest. 9,20 Although standardization work on graphene will help, it is worth noting that in some applications, such as composite materials, the use of graphite platelets rather than single-layer graphene can be beneficial. 21 A robust analysis tool is needed by both academia and industry to inform quality control and industrial standards if graphene is to fulfil its full potential.
This urgent problem has been highlighted in recent publications, 9−12 and potential high-throughput quality control techniques for graphene materials have been suggested, including static light scattering, surface area measurements, and wide-area optical contrast imaging. 10,16 Unfortunately, while many of these techniques serve one specific manufacturing route, they cannot be applied to a material without a significant amount of prior knowledge of its origin. The lack of a widely adopted, high-throughput, low-cost, simple, and rigorous analytical method suggests that developing a "catch all" approach may be difficult; instead, a compromise between rapid screening and detailed analysis may be required to ensure that quality control measures can meet required standards. These standards may well depend on the application and will remain a question for key stakeholders. To inform such work, we present a statistical method for identifying when detailed microscale measurements can be considered representative of a macroscale material and thereby approach an answer to the following question: how close can we get to high throughput?
We focus on Raman spectroscopy; among the many techniques for graphene analysis, Raman remains the most versatile because of the depth of information that can be readily extracted with very little sample preparation, including exfoliation efficiency, particle size, layer number, and chemical functionalization. Additionally, Raman metrics have been reported for the analysis of other 2D materials that share many of the problems observed for graphene: inefficient exfoliation and irregular particle sizes. 22 In materials science, Raman data are usually acquired by Raman microscopy, which couples a Raman spectrometer to an optical microscope, allowing high-magnification visualization of a sample and Raman analysis with a microscopic laser spot. As with other microscale techniques, a single spectrum should not be used to characterize a macroscopic material. While it is common for a representative Raman spectrum of a carbon sample to be reported, 23−26 this is undesirable because of the variations likely to be present. Figure 1a highlights this with three different Raman spectra from the same material. Based on the evidence of only one spectrum, this material could be highly oxidized graphite or few-layer graphene, whereas in fact the sample is mostly graphitized carbon.
One potential solution adopted by some is to collect multiple spectra from a sample and build a statistically meaningful picture of the material being analyzed. 27−31 This is effective, albeit very time-consuming, and is the method we build on here to develop a robust quality control measure. To make the approach more acceptable and useful, it is beneficial to consider how many data points are required to establish a meaningful analysis. Most reports that use microscopic techniques, whether Raman microanalysis, electron microscopy, or scanning probe microscopy, generally state a mean and standard deviation, assuming that the graphene material follows a Gaussian model. 32−34 Unfortunately, this assumption rarely holds for graphene materials, as shown by careful analysis of flake sizes by Kouroupis-Agalou et al. 35,36 In addition, many important properties can be heavily influenced by minority components within a material. 37,38 The same exciting properties, such as nanoparticle size, shape, and complex chemical environment, that promise the most remarkable applications require similarly complex characterization, which rules out the use of single averaged values. Rather than finding another distribution model, it is easier and more reliable to report the entire distribution of values measured, or at least to compare a range of summary statistics such as the mean (μ), upper and lower quartiles (Q75, Q25), and 10th and 90th percentiles (P10, P90).

Figure 1. (a) An indication of the range of single spectra that can be obtained from a carbon material (in this example, produced by high-temperature carbonization) and how poorly any one spectrum represents the bulk sample. (b) Distribution shown as a bivariate histogram; the population of bins defined by both the ID/IG ratio (x-axis) and the I2D/IG ratio (y-axis) is denoted as a color map. Colored crosshairs link bins to a typical spectrum with those features, shown in (a). The bottom (green) spectrum is consistent with graphitized carbon and represents the bulk of material present; the middle (red) spectrum shows a small phase of highly crystalline graphite, while the top (blue) spectrum shows the most defective amorphous carbon present.

Scheme 1. Analysis of Different Graphene Materials Undertaken on a Number of Powder Samples. Raman data sets were collected on the randomly mixed powders, Raman data were then fitted to extract peak parameters, and statistical testing was used to investigate the size of the data set required.

ACS Applied Nano Materials, https://dx.doi.org/10.1021/acsanm.0c02361
Here we investigate the size of data sets required for a reliable, reproducible evaluation of nanomaterials by proposing a statistical method of visualizing data convergence that can, in principle, be applied to any microscopic technique. The most pressing concern, however, is graphene metrology, and for this we promote statistically robust Raman data sets, following the process outlined in Scheme 1, because of the flexibility, depth of information, and ease of use of the technique. To this end, we have investigated the size of Raman data sets required for different industrially relevant graphene materials, as well as the effect of the signal-to-noise ratio on the reliability of these data sets, and we also provide a computer program written specifically for the analysis of carbon nanomaterials that can reliably fit large data sets. Peak fitting of large Raman data sets yields many parameters for every spectrum, specifically peak positions, widths, and heights. To handle these data, we highlight 3D bivariate histograms, as shown in Figure 1b; these allow the full distribution of key parameters such as ID/IG and I2D/IG to be shown on the same axes in a simple plot that can be compared with other similar materials and act as a fingerprint. These are not the only metrics that can be used, but they can be widely applied to any graphene material; other metrics require specific sample preparation or have a more specific focus. 30,31 It is unlikely that a single technique will meet all the disparate demands of a field as broad as 2D materials; however, Raman spectroscopy is a nondestructive, relatively straightforward technique that requires little sample preparation and is especially useful for carbon materials.

■ EXPERIMENTAL SECTION
Raman Spectroscopy. Raman spectra were recorded with a Horiba LabRam Evolution using a 532 nm laser (17.2 mW, M² < 1.1, beam divergence <0.45 mrad) and a ×50 long working distance objective lens (Leica HCX PL FLUOTAR, WD = 8 mm, NA = 0.55); samples were ground to a fine powder and pressed into pellets to provide a smooth, flat surface to focus onto. Three samples were taken from each type of material, and automatic Raman maps were collected over an 80 × 80 μm² square at 2.5 μm intervals to produce three data sets of 1024 points (32 × 32); the acquisition time and number of repeat scans were varied by sample to maximize the signal-to-noise ratio. The instrument was calibrated to the 520.7 cm⁻¹ Raman signal of silicon before every map was recorded.
Commercial samples were sourced from different companies and analyzed as received; the use of "high" and "low" quality labels to differentiate the two graphene nanoplatelet samples was based on the marketing materials of the products from their manufacturers.
Exfoliation. Graphite (25 g) and sodium cholate hydrate (2.5 g) were mixed in distilled water (500 mL) in a jacketed glass vessel. A Silverson L5M high shear mixer equipped with a 32 mm rotor and a stator with 96 square holes (2 × 2 mm²; rotor-stator gap 136 μm) was run at 8000 rpm for 90 min while the dispersion was cooled to 0 °C.
The resulting dispersion was centrifuged at 500 rpm (32 g) for 45 min, and the supernatant was collected to remove the unexfoliated graphite. This supernatant was centrifuged at 1000 rpm (129 g) for 45 min, and the resulting supernatant was then centrifuged at 10000 rpm (12857 g) for 45 min to remove tiny fragments, and the sediment was collected. The sediment was dispersed in water and filtered (0.2 μm polycarbonate); 3 L of distilled water was then used to remove the residual sodium cholate, followed by 100 mL of ethanol and acetone. The filter paper was dried in a vacuum oven (∼1 Pa, 60 °C) for 24 h, and the few-layer graphene was then removed from it.
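The rotor speeds and relative centrifugal forces (RCF) quoted above are mutually consistent with the widely used approximation RCF ≈ 1.118 × 10⁻⁵ · r · N², where r is the rotor radius in cm and N is the speed in rpm. The sketch below assumes a rotor radius of about 11.5 cm. This value is our inference from the quoted rpm/g pairs; the centrifuge and rotor are not specified in the text.

```python
def rcf_from_rpm(rpm: float, radius_cm: float = 11.5) -> float:
    """Relative centrifugal force (multiples of g) from rotor speed.

    Uses the common approximation RCF = 1.118e-5 * r * N**2 with r in cm.
    The 11.5 cm default radius is an assumption inferred from the
    rpm/g pairs quoted in the exfoliation procedure.
    """
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float = 11.5) -> float:
    """Inverse: rotor speed (rpm) needed to reach a target RCF."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


if __name__ == "__main__":
    for rpm in (500, 1000, 10000):
        # Prints roughly 32 g, 129 g, and 12857 g.
        print(f"{rpm:>6} rpm -> {rcf_from_rpm(rpm):8.0f} g")
```

Because RCF scales with the square of the speed, doubling the speed from 500 to 1000 rpm quadruples the force. This is also why the final 10000 rpm step is roughly 400 times harsher than the first.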
Graphitization. Graphitic foam was produced following a method similar to that of Kiciński et al.; 39 resorcinol (2.5 g) and iron(III) chloride (8.3 g) were dissolved in a mixture of water (23 mL) and methanol (24 mL), to which furfural (5.0 g) was added. This mixture was placed in a centrifuge tube, mixed with a vortex mixer, and then placed in an oven for 24 h at 60 °C for polymerization. After a further 3 days of drying, the solid polymer was placed in an alumina boat for high-temperature processing. The alumina boat was placed inside a quartz work tube (I.D. 29 mm) in a Carbolite tube furnace (MTF 12/38/400); the system was purged with argon (150 mL min⁻¹) for 30 min, and then hydrogen (8.72 mL min⁻¹) was added to the flow at atmospheric pressure. The argon flow was measured with a volumetric flow meter, while hydrogen was controlled by a Brooks 5850 TR series mass flow controller in totalizer mode. The furnace was heated to 1050 °C at 10 °C min⁻¹ and held at this temperature for 2 h. After high-temperature processing, the graphitized foam was washed in 6 M HCl for 48 h before being filtered (0.2 μm, polycarbonate) and washed with copious water until the washings were neutral.
Chemically Reduced GO. Graphite oxide was produced by the Hummers method; 40 typically, sodium nitrate (5 g) was dissolved in sulfuric acid (230 mL) at 0 °C before graphite (10 g) was added, followed by the slow addition of potassium permanganate (30 g), ensuring that the temperature did not exceed 10 °C. The mixture was then heated to 35 °C for 2 h, after which ice-cold deionized water (460 mL) was slowly added to quench the reaction; the brown solution was further diluted (1.4 L), and hydrogen peroxide was added until effervescence ceased. The acidic solution was washed by centrifugation until neutral, followed by a 6 M HCl wash and a further neutralization.
A 20 mL aliquot of the graphite oxide solution was then sonicated (20 min, 30% amplitude, 5 s pulse) by using a 750 W Cole-Parmer ultrasonic processor. This mixture was centrifuged at 1000 rpm for 30 min, and then the supernatant was freeze-dried (SP Scientific BenchTop Pro). The graphene oxide (40 mg) thus prepared was dispersed into water (50 mL) and reduced by the addition of hydrazine monohydrate (20 μL) and heating to 60°C for 24 h; the resulting black dispersion was filtered (0.2 μm nylon) and redispersed by gentle sonication into water (20 mL) and freeze-dried (SP Scientific BenchTop Pro).

■ METHODOLOGY OF STATISTICAL ANALYSIS
A selection of graphene-related materials has been prepared and analyzed using three independent sets of 1024 Raman spectra collected from powder samples, all following the workflow in Scheme 2, to develop guidance for the robust collection of statistically meaningful data. The materials were chosen to cover the full range of interesting and topical carbon nanomaterials, specifically graphite, liquid-exfoliated graphene, reduced graphene oxide, and high-temperature graphitized carbon. In addition, two commercial GNP (graphene nanoplatelet) samples and a commercial MWCNT (multiwalled carbon nanotube) sample were analyzed; full details are provided in the Supporting Information. The samples were mixed, ground into uniform fine powders, and pressed into crude pellets to ensure a random distribution of flakes with minimal spatial dependence; independent Raman spectra were then collected from points 2.5 μm apart over an area of 80 × 80 μm². Typically, this took around 16 h. These measurements were taken three times for every material (six for MWCNTs and graphitized carbon), on a different sample each time.
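The map geometry implies a 32 × 32 raster. With a 2.5 μm step, 32 positions per row span 77.5 μm from first to last point, which fills the nominal 80 μm square if each point represents one 2.5 μm cell. The sketch below is a hypothetical helper, not instrument software. It generates such a grid and estimates the average time per spectrum from the ~16 h total. That figure lumps together any stage-movement and autofocus overheads, which are not reported.

```python
def map_grid(side_um: float = 80.0, step_um: float = 2.5):
    """Return (x, y) stage positions in micrometres for a square raster map.

    Assumes one point per step-sized cell, i.e. side_um / step_um points
    per row, so an 80 um side with 2.5 um steps gives 32 x 32 points.
    """
    n = int(round(side_um / step_um))
    return [(i * step_um, j * step_um) for j in range(n) for i in range(n)]


def seconds_per_point(total_hours: float, n_points: int) -> float:
    """Average time per spectrum, including any unreported overheads."""
    return total_hours * 3600.0 / n_points


if __name__ == "__main__":
    points = map_grid()
    print(len(points))                           # 1024
    print(points[-1])                            # (77.5, 77.5)
    print(seconds_per_point(16.0, len(points)))  # 56.25
```

Roughly a minute per spectrum means that halving a map to 512 points saves about 8 h. This is the practical motivation for asking how many spectra are actually needed.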
Spectra were assigned and fitted by applying well-established protocols. 41 A full treatment of Raman spectroscopy for carbon materials is beyond the scope of this work; in summary, however, inelastic Raman scattering is caused by the production of phonons (vibrational excitation modes within the crystal structure of the carbon material), so information about the crystal and electronic structure of the carbon can be inferred (a typical spectrum can be seen in Figure S1). Present in all graphitic carbon materials, the G band at a Raman shift of ∼1580 cm⁻¹ is caused by the degenerate iTO (in-plane transverse optical) and LO (longitudinal optical) phonon modes, the symmetrical opposing motion of carbon atoms in a single direction within the plane of the graphene sheets. 41−43 The D peak is dispersive and more variable in wavenumber but generally appears around 1350 cm⁻¹; it is caused by scattering from the same iTO phonon mode around the K point of the Brillouin zone. However, this transition requires a second scattering event from a symmetry-breaking "defect" bonded to the six-membered carbon ring, hence the label "defect" peak. 44 The D peak is commonly used to probe functionalization, sheet size, and defect density by crudely correlating them with the peak ratio ID/IG. 45−49 The other major peak is the 2D peak, observed at higher Raman shifts around 2700 cm⁻¹ and commonly used to probe graphene thickness. The sensitivity of this peak arises from the double- or triple-resonance condition of the interaction; 41,42,50,51 the incident photon produces a photoexcited electron−hole pair within the material that undergoes two distinct inelastic scattering events before a photon with a different energy is re-emitted.
The similarity in energy between the photons and the excitation transition makes the process resonant and therefore sensitive to changes in the band structure, such as the difference between monolayer and bilayer graphene, which alters the shape, position, and intensity of the peak. Commonly, the I2D/IG ratio and the 2D peak FWHM are extracted and used as measures of graphene thickness or exfoliation efficiency. 52−54 Graphene thin-film devices prepared by CVD growth or mechanical exfoliation make excellent use of these parameters, and methods for spatial mapping of pristine graphene films by Raman and optical microscopy have provided insight into defect concentration and electrical performance. 55−57 Although these methods require sufficient data points or spectra for a statistically meaningful spatial map, many commercial graphene products and graphene-containing materials use graphene in powder form (such as GNPs), for which spatial maps provide very little useful information.
To investigate the size of data sets for powdered samples, the spectra collected were fitted, and the peak height ratios ID/IG and I2D/IG were used in the statistical analysis. Although we chose to use peak intensities, as they are the most widely applicable to powders, other peak parameters such as the FWHM of the 2D band can also be used to test statistical sample sizes (see the Supporting Information). To elaborate briefly on the program developed to analyze large Raman data sets: every spectrum is treated as independent and fitted in turn. Using the freely available lmfit 58 Python package for least-squares regression, a background is fitted, and additional spectral features are then added sequentially, with the principle of Occam's razor applied each time to validate the new function. Specifically, if the addition of a spectral feature accounts for <2.5% of the variation in the data, that feature is excluded from the final model; this is important for obtaining a reliable estimate of the uncertainty in the fitted parameters. The fitted parameters for every spectrum are then reported, providing the full range of peak parameters measured across the entire data set. Full details are provided in the Supporting Information. This program is freely available at https://github.com/SGoldie4/RamanMapAnalysis.
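The feature-acceptance rule can be expressed as a comparison of explained variance. The sketch below rests on two assumptions of ours. First, "variation in the data" is taken to mean the total sum of squares about the mean intensity. Second, a candidate feature is kept only if it lowers the residual sum of squares by at least 2.5% of that total. The function names are hypothetical, and this is not the code in the RamanMapAnalysis repository.

```python
def _sum_sq(values):
    """Sum of squared values."""
    return sum(v * v for v in values)


def explained_fraction(y, fit_without, fit_with):
    """Fraction of the total variation in y explained by adding one feature.

    Variation is taken as the sum of squares about the mean of y (assumption).
    """
    mean_y = sum(y) / len(y)
    total = _sum_sq([v - mean_y for v in y])
    if total == 0:
        return 0.0
    rss_without = _sum_sq([v - f for v, f in zip(y, fit_without)])
    rss_with = _sum_sq([v - f for v, f in zip(y, fit_with)])
    return (rss_without - rss_with) / total


def keep_feature(y, fit_without, fit_with, threshold=0.025):
    """Occam's razor test: keep the new feature only if it explains >= 2.5%."""
    return explained_fraction(y, fit_without, fit_with) >= threshold


if __name__ == "__main__":
    y = [0.0, 1.0, 5.0, 1.0, 0.0]      # toy "spectrum" with one peak
    flat = [1.4] * 5                    # background-only fit (the mean)
    print(keep_feature(y, flat, y))     # True: the peak explains all variation
    print(keep_feature(y, flat, flat))  # False: the added feature changes nothing
```

In a real pipeline, `fit_without` and `fit_with` would be the model curves returned by two successive least-squares fits of the same spectrum. A peak that merely absorbs noise then barely lowers the residual and is rejected.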
The parameters returned from each measurement are analyzed according to the workflow in Scheme 2, initially for data convergence as the map was collected. This process considers the entire distribution, making no assumptions about statistical models, and we visualize how key summary statistics vary as more points are added, starting from a single data point. The summary statistics chosen were the mean (μ), the interquartile range (Q25, Q75), and the 10th and 90th percentiles (P10, P90), each as a function of sample size. This visualizes the change in the distribution of data as more Raman spectra are collected and also shows the approximate sample size at which the distribution stops changing and the data set can be considered to have converged. This is demonstrated in Figure 2, which shows the summary statistics initially changing dramatically as new points are added before becoming smoother; these regimes are highlighted by three colored regions. Beyond this convergence point there is little to be gained by collecting further spectra, other than increasing the resolution of the distribution. However, collecting fewer data points than this convergence point could result in erroneous distributions of the Raman peak parameters. The tolerance to errors in the Raman analysis will depend on its purpose; if one simply wished to look for any significant change to a material following some treatment or process, far fewer points could be collected, whereas checking for minority phases and impurities in a bulk powder would require a more comprehensive data set.
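The convergence plots can be generated by recomputing the summary statistics each time a spectrum is added. The sketch below uses only the Python standard library. `statistics.quantiles` with `n=4` returns the quartiles, and with `n=10` it returns the deciles, whose first and last cut points are P10 and P90. The `converged_at` helper is a hypothetical automated stand-in for the visual judgement described above. It returns the smallest sample size from which a statistic stays within a chosen relative tolerance of its final value. The 2% tolerance is an arbitrary example.

```python
import random
import statistics


def running_summary(values):
    """Summary statistics of the first n values, for n = 2 .. len(values).

    statistics.quantiles needs at least two points, so the series starts at n = 2.
    """
    rows = []
    for n in range(2, len(values) + 1):
        head = values[:n]
        q25, _, q75 = statistics.quantiles(head, n=4)
        deciles = statistics.quantiles(head, n=10)
        rows.append({
            "n": n,
            "mean": statistics.fmean(head),
            "q25": q25,
            "q75": q75,
            "p10": deciles[0],   # 10th percentile
            "p90": deciles[-1],  # 90th percentile
        })
    return rows


def converged_at(rows, key, rel_tol=0.02):
    """Smallest n from which rows[key] stays within rel_tol of its final value.

    A hypothetical automated criterion; the paper judges convergence visually.
    """
    final = rows[-1][key]
    n_conv = rows[-1]["n"]
    for row in reversed(rows):
        if abs(row[key] - final) > rel_tol * abs(final):
            break
        n_conv = row["n"]
    return n_conv


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic log-normal ID/IG values standing in for one measured map.
    ratios = [rng.lognormvariate(-1.0, 0.5) for _ in range(1024)]
    rows = running_summary(ratios)
    for key in ("mean", "q25", "q75", "p10", "p90"):
        print(key, converged_at(rows, key))
```

The tails (P10, P90) typically settle later than the mean. This mirrors the observation that the full distribution needs more spectra than a single average.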
The flexibility to deal with any distribution, rather than assuming one specific statistical model, is important because some materials contain a random mixture with a normal distribution, others are dominated by a log-normal distribution, and yet others have irregular distributions containing two material phases. 35 In addition to the convergence plots described above, a method known as bootstrapping was used to better understand the effect of underanalyzing a material. Bootstrapping repeatedly analyzes smaller subsamples of the original data set collected and can be used to analyze the scatter of many such subsamples about the value of the larger data distribution. This is done by using a Monte Carlo type method to randomly select values from the data set and place them into a subsample; the subsample can then be analyzed to find its mean or even its full distribution before the process is repeated to generate and analyze another random subsample. As the subsample size approaches that of the full data set, its mean and distribution are expected to resemble the full population distribution more closely, as illustrated in Figure 3. Conversely, the variation and noise in smaller subsamples indicate the analysis that would result from collecting only smaller Raman data sets.

Figure 2. Full distribution of ID/IG ratios measured from the sample, shown as a histogram. By calculating summary statistics as each point was recorded, one can see the change in distribution with overall sample size. The left shaded region highlights highly variable data; any analysis undertaken with so few data points will be unreliable. The middle region shows where the data are starting to converge, although noise is observed. The lightest shaded area, on the right, shows where the data have converged within error, after which the addition of new data points will make little difference.
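The subsampling procedure can be sketched as a small Monte Carlo resampler. This is an illustration, not the authors' code. It draws each subsample with replacement, which is the usual bootstrap convention; the text does not state whether sampling was with or without replacement. For each subsample size, it reports how much the subsample mean scatters. The same loop could instead collect full subsample distributions. A large scatter at a given size suggests that a Raman data set of that size would not characterize the material reproducibly.

```python
import random
import statistics


def bootstrap_means(data, subsample_size, n_draws=200, seed=None):
    """Means of n_draws random subsamples, each drawn with replacement from data."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(data, k=subsample_size))
        for _ in range(n_draws)
    ]


def scatter_by_size(data, sizes, n_draws=200, seed=0):
    """Standard deviation of the bootstrap subsample means for each subsample size.

    A large value means data sets of that size would give irreproducible results.
    """
    return {
        k: statistics.stdev(bootstrap_means(data, k, n_draws, seed))
        for k in sizes
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical ID/IG values: a major low-ratio phase plus a minor high-ratio phase.
    data = [rng.gauss(0.2, 0.05) for _ in range(900)]
    data += [rng.gauss(1.0, 0.2) for _ in range(124)]
    for k, sd in scatter_by_size(data, [10, 50, 100, 300, 1024]).items():
        print(f"subsample size {k:>4}: scatter of mean = {sd:.3f}")
```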
Finally, to understand the role that random noise plays in the convergence and final distribution of peak parameters, a large model data set was produced based on real data. A polynomial background and ideal Lorentzian peaks were described mathematically before a pseudorandom number generator was used to add random noise to the signal. A series of "maps" was produced containing the same spectral peaks but with random noise added at a constant level across each map, with each successive map having a higher noise level than the last. By fitting these model maps, we can crudely separate the scatter caused by experimental noise from the scatter caused by polydisperse samples.
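A model map of this kind can be sketched with a polynomial background, Lorentzian D, G, and 2D peaks, and Gaussian noise from a seeded pseudorandom generator. The peak positions follow the approximate values given earlier (∼1350, ∼1580, and ∼2700 cm⁻¹). The widths, heights, background coefficients, spectral range, and Gaussian noise model are illustrative assumptions. They are not the parameters used to build the published model data set.

```python
import random


def lorentzian(x, center, fwhm, height):
    """Lorentzian line shape parameterized by peak height and FWHM."""
    half = fwhm / 2.0
    return height * half ** 2 / ((x - center) ** 2 + half ** 2)


# Illustrative (center cm^-1, FWHM cm^-1, height) values; assumptions only.
PEAKS = {
    "D": (1350.0, 40.0, 0.3),
    "G": (1580.0, 25.0, 1.0),
    "2D": (2700.0, 60.0, 0.5),
}
# Illustrative quadratic background coefficients c0 + c1*x + c2*x**2.
BACKGROUND = (0.05, 1e-5, -2e-9)


def model_spectrum(shifts, noise_sd, rng):
    """Polynomial background + Lorentzian peaks + Gaussian noise of fixed SD."""
    spectrum = []
    for x in shifts:
        background = sum(c * x ** k for k, c in enumerate(BACKGROUND))
        peaks = sum(lorentzian(x, *p) for p in PEAKS.values())
        spectrum.append(background + peaks + rng.gauss(0.0, noise_sd))
    return spectrum


def model_maps(noise_levels, n_spectra=1024, seed=0):
    """One map per noise level: identical peaks, constant noise within a map."""
    rng = random.Random(seed)
    shifts = [1000.0 + 2.0 * i for i in range(1001)]  # 1000 to 3000 cm^-1
    maps = {
        sd: [model_spectrum(shifts, sd, rng) for _ in range(n_spectra)]
        for sd in noise_levels
    }
    return shifts, maps


if __name__ == "__main__":
    shifts, maps = model_maps([0.01, 0.05, 0.1], n_spectra=10)
    print({sd: len(spectra) for sd, spectra in maps.items()})
```

The true peak parameters of every model spectrum are known exactly. Any scatter in the values fitted from one map (here the true ratios are ID/IG = 0.3 and I2D/IG = 0.5) can therefore be attributed to noise alone.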

■ RESULTS AND DISCUSSION
To attempt any analysis of large data sets containing many Raman spectra, it is necessary to fit mathematical models and use the fitted parameters in place of each spectrum. To achieve this, we developed a simple computer program (further details in the Supporting Information) and have shown that it successfully processes large Raman data sets. Every spectrum is fitted independently of all others, ensuring that noisy spectra or minor components do not affect the parameters reported from other spectra. The peaks included in every fit are also validated to ensure that they are present and not simply the result of random noise.
The data fitting package used provides estimates of the error in every fitted parameter, which is useful for identifying possible outliers. Our program, freely available with this work, has been successfully used to fit Raman data from various carbon materials and returns all peak parameters for further analysis. The exact parameters required will depend on the analysis, but ID/IG and I2D/IG are most commonly used to characterize graphene and related materials; for this purpose, we recommend a 3D bivariate histogram (heat map), which displays the full distribution of both parameters simultaneously in one graph and allows easy comparison with other materials. Having validated the program used to fit Raman data, we could then consider the size of Raman data sets and its effect on reliability. It should be stressed that, although our examples use Raman data collected from graphene and related materials, the same approach can be applied to any analytical method for nanomaterials that measures different regions of a heterogeneous material. By visualizing the trends in summary statistics and the evolution of the subsample distributions, it has proved possible to understand the analysis and justify the use of a specified sample size. It is clear that different materials will require bespoke analyses with rigorous justification; too many data points, in this case Raman spectra, will waste valuable time and increase costs for manufacturers, but too few points can result in erroneous analyses.
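The bivariate histogram recommended here is a two-dimensional count of spectra binned by ID/IG and I2D/IG. Below is a minimal standard-library sketch in which the bin edges are supplied by the user; the edges in the usage lines are arbitrary examples. Plotting the resulting count matrix as a heat map gives the kind of fingerprint plot described in the text. This could be done with matplotlib's `pcolormesh`, for instance, or the counts could be computed directly with `numpy.histogram2d`.

```python
from bisect import bisect_right


def _bin_index(value, edges):
    """Index of the bin containing value, or None if outside [edges[0], edges[-1]]."""
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:          # include the upper edge in the last bin
        return len(edges) - 2
    return bisect_right(edges, value) - 1


def hist2d(xs, ys, x_edges, y_edges):
    """Count (x, y) pairs, e.g. (ID/IG, I2D/IG), into a grid: counts[iy][ix]."""
    counts = [[0] * (len(x_edges) - 1) for _ in range(len(y_edges) - 1)]
    for x, y in zip(xs, ys):
        ix, iy = _bin_index(x, x_edges), _bin_index(y, y_edges)
        if ix is not None and iy is not None:
            counts[iy][ix] += 1
    return counts


if __name__ == "__main__":
    id_ig = [0.10, 0.15, 1.20]      # hypothetical fitted ratios
    i2d_ig = [0.40, 0.45, 0.90]
    print(hist2d(id_ig, i2d_ig, [0.0, 0.5, 1.0, 1.5], [0.0, 0.5, 1.0]))
    # [[2, 0, 0], [0, 0, 1]]
```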
To ensure that 1024 points are sufficient to draw such conclusions, the three independent samples for each material are compared. As expected, there is noise; however, all data sets from the same material are large enough to be indistinguishable from each other, and each is therefore assumed to represent the "true" distribution of that material. Any remaining differences between the data and the "true" (population) distribution can be ignored; full data sets are shown and discussed in the Supporting Information.

Figure 4. Key statistical plots highlighting the difference between two GNP samples; the top row (a, b) is from a more homogeneous product, and the bottom row (c, d) is from a more irregular product. The bootstrap analysis is shown on the left (a, c); the distributions of different subsamples (colored lines) containing different numbers of data points (displayed on the plot) were randomly generated from one complete data set of each material. Where the lines differ substantially, a small number of measured points should not be relied upon to reproducibly characterize the material. The plots on the right (b, d) show the change in summary statistics (solid = mean; dashed = Q25, Q75; dotted = P10, P90) as more data points are added to the analysis; the different colored lines indicate different data sets from the same material. The shaded colored regions highlight different convergence regimes as described previously; these regions were used to inform the sample sizes used for the bootstrap analysis.

While these data sets are consistent, the difference between materials is marked. To illustrate this point, we contrast two commercial GNPs using the statistical method; this case study highlights the methodology and the difference in analysis between materials. The two were described as different grades, but no information about flake size, chemical functionalization, or exfoliation efficiency was provided with either material, and both were listed with very similar Raman spectra. However, with large Raman data sets the two materials are unmistakably different. One GNP sample is highly monodisperse, with sharp peaks in both the I2D/IG and ID/IG distributions very similar to bulk graphite, while the other is very polydisperse, with the Raman data indicating a large range of graphite flake sizes, as illustrated in Figure 4.
The difference between the materials also extends to the analyses they require; the monodisperse sample gives very consistent Raman spectra, so a relatively small sample size would suffice to plot the complete distribution of material present. In contrast, the polydisperse sample requires a much larger data set to approach complete characterization. These effects are illustrated in Figure 4, where key bootstrap and convergence plots from the two materials are shown. The monodisperse sample has a much sharper distribution in the bootstrap plots (Figure 4a), even with a small subsample size; these plots visualize how different the analysis would be using sample sizes smaller than the 1024 recorded. In the case of the monodisperse material, if only 100 Raman spectra were recorded, there could be some uncertainty due to the shift in mean and intensity between the yellow and green lines. However, the general trend is consistent, and by 300 data points the different subsamples are indistinguishable from each other.
The convergence plot (Figure 4b) shows the same trends; with fewer than 200 data points, the mean and distribution width change substantially as new spectra are included. After this point the distribution becomes more established, although some fluctuations remain as the sample size increases. Once 400 spectra are included, the distribution shows little change beyond random noise and slight variations due to possible outliers or minor components already included in the analysis. In contrast, the other commercial GNP sample (Figure 4d) has a much wider range of ID/IG values that cause significant variation even after hundreds of points have been recorded; indeed, 500 points are required before even the mean becomes stable, and many more are needed if the full distribution is required, in this case 750 data points to stabilize the percentiles and interquartile range.
Considering the bootstrap analysis of this material (Figure 4c), the effect of underanalyzing the material can be seen; these plots also make clear the asymmetrical shape of the distribution of ID/IG values. This analysis shows that the heterogeneous sample contains two components: a major fraction of material with a low ID/IG and a minor fraction with a much higher ID/IG. The exact number of measurements required would depend on the question being asked; however, these plots illustrate the difficulty of establishing the exact distribution even with hundreds of points. Were these materials to be compared based on only a few Raman spectra, they might appear indistinguishable.
This statistically inspired approach has also been successfully demonstrated with other carbon materials, many of which behave differently and would require different sample sizes for their analysis (full details in the Supporting Information). One notable example was the characterization of material produced by high-shear liquid-phase exfoliation of graphite to produce graphene, a process commonly used in industry and in many research laboratories. Here we deliberately avoided a multistep cascade centrifugation process to purify the material and used only a single centrifugation step. Without the extensive purification process, a significant population of graphite was found in the Raman analysis of the material. However, evidence of exfoliation, in the form of an increase in I2D/IG accompanied by an increase in ID/IG, was observed in some components within the bulk powder (Figure S33). These signals could be difficult to detect from a single Raman spectrum.
We have shown that it is possible, and useful, to display the entire distribution obtained from nanomaterial analysis. The sample size needed for such distributions to be considered representative can be justified with the methodology described. We have focused on detailed discussions of GNPs here, but the approach has also been applied to other carbon materials. The sample sizes required for the materials measured are summarized in Table 1; for further details, see the Supporting Information. These numbers are not definitive for the types of carbon discussed, as similar materials of different origin may behave differently. What matters is the methodology and its application in determining the number of data points needed for convergence.
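Where a numerical rule is preferred to reading the plot by eye, one possible criterion is the smallest $n$ after which a running statistic stays within a relative tolerance of its final value. This is a hypothetical criterion, not the one used in the paper, and the 5% tolerance and synthetic data are assumptions for illustration.

```python
import random
from itertools import accumulate


def convergence_point(running_values, rel_tol=0.05):
    """Smallest n from which every running value stays within rel_tol of the final one.

    Hypothetical numerical criterion; the text judges convergence graphically.
    """
    final = running_values[-1]
    n_conv = len(running_values)
    # Walk backwards from the end until a value falls outside the tolerance band.
    for i in range(len(running_values) - 1, -1, -1):
        if abs(running_values[i] - final) > rel_tol * abs(final):
            break
        n_conv = i + 1
    return n_conv


# Synthetic I_D/I_G values (illustrative only).
rng = random.Random(2)
values = [rng.gauss(0.2, 0.05) for _ in range(850)] + [rng.gauss(1.0, 0.2) for _ in range(150)]
rng.shuffle(values)
running_mean = [total / (i + 1) for i, total in enumerate(accumulate(values))]
print(convergence_point(running_mean, rel_tol=0.05))
```

Because the reference is the final value of the same data set, this only indicates stability within the points already collected; the tolerance, and which statistic to monitor (mean, percentiles, interquartile range), depend on the question being asked.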
Experimental noise is inevitable when collecting Raman spectra, so to understand its effect, a model data set was created from real peak parameters and a controlled level of random noise was added. The model spectra were fitted, and the convergence and distribution of the fitted peak parameters from many model spectra with the same noise level were compared across increasing noise levels (Figure S6). The first observation was that the fitted values agreed with the model values used, despite being treated independently, confirming that the program written to fit the Raman data sets correctly fits the expected mathematical forms. Crucially, there was little correlation between the number of points required for a data set to converge and its signal-to-noise ratio, as seen in Figure 5. While the distribution of data clearly becomes wider as more noise is present, the rate of convergence remains steady (in this example requiring 300 points) irrespective of the noise level (Figures S8 and S9). This convergence with the number of data points, $n$, follows a $1/\sqrt{n}$ relationship; hence, adding points produces the same relative reduction in the uncertainty of the data set, regardless of the initial uncertainty due to spectral noise. A map containing many noisy spectra may be difficult to fit, but this does not alter the number of points required, which depends on the components and inhomogeneity of the sample. The effect on small peaks that become lost in background noise is discussed later.
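The $1/\sqrt{n}$ behavior is that of the standard error of a mean. Assuming the $n$ spectra are independent draws with an overall standard deviation $\sigma$ (material variation and noise combined), a short worked form shows why the relative gain from adding points does not depend on $\sigma$:

```latex
\mathrm{SE}(n) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{\mathrm{SE}(n_2)}{\mathrm{SE}(n_1)} = \sqrt{\frac{n_1}{n_2}}, \qquad
\text{e.g. } \frac{\mathrm{SE}(400)}{\mathrm{SE}(100)} = \sqrt{\frac{100}{400}} = \frac{1}{2}.
```

Noise enlarges $\sigma$ and therefore the absolute uncertainty at every $n$, but quadrupling the number of spectra halves that uncertainty at any noise level.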
Signal-to-noise has a much greater impact on the variation in fitted peak parameters. Even though the same underlying spectrum is being fitted, random noise creates considerably more uncertainty in the values obtained from any single spectrum. This is visualized in Figure 6, where the standard deviation clearly increases with decreasing signal-to-noise; however, the mean $I_D/I_G$ agrees with the expected value from the model data until the signal-to-noise ratio falls below about 10.
To understand this effect further, the standard deviation is normalized to the mean of the peak ratio being calculated, giving the coefficient of variation. This expresses the scatter as a percentage and so accounts for the difference in magnitude between $I_{2D}/I_G$ and $I_D/I_G$; it is plotted against the noise level. Figure 6b shows a strongly linear relationship between the average noise level within a set of experimental data and the random scatter to be expected in the fitted outputs. This normally distributed noise is unavoidable in real data sets; however, it is also clear that the coefficient of variation arising from noise remains below 5% when the signal-to-noise is good. Real materials often produce significantly wider and less well-defined distributions, so these can reasonably be attributed to polydispersity within the sample.
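The noise experiment can be imitated with a small Monte Carlo sketch. Everything here is an assumption for illustration: the Lorentzian peak parameters are invented, the noise is Gaussian, signal-to-noise is defined as the weakest peak's height divided by the noise standard deviation, and, unlike a full fit, only the peak heights are refined (by linear least squares) while positions and widths are held at their model values. For each noise level it reports the mean, standard deviation, and coefficient of variation (standard deviation divided by mean) of $I_D/I_G$.

```python
import random
import statistics

# Hypothetical model: (centre / cm^-1, half-width at half-maximum / cm^-1, height).
# Values are illustrative, not the peak parameters used in the paper.
PEAKS = {"D": (1350.0, 20.0, 0.5), "G": (1580.0, 12.0, 1.0), "2D": (2700.0, 30.0, 0.3)}
GRID = [1200.0 + 2.0 * i for i in range(851)]  # 1200-2900 cm^-1 in 2 cm^-1 steps


def lorentzian(x, centre, hwhm):
    """Unit-height Lorentzian line shape."""
    return hwhm**2 / ((x - centre) ** 2 + hwhm**2)


# Basis curves, the noise-free spectrum and the normal-equation matrix are fixed.
BASIS = [[lorentzian(x, c, w) for x in GRID] for c, w, _ in PEAKS.values()]
CLEAN = [sum(h * lorentzian(x, c, w) for c, w, h in PEAKS.values()) for x in GRID]
ATA = [[sum(p * q for p, q in zip(bi, bj)) for bj in BASIS] for bi in BASIS]


def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_heights(spectrum):
    """Linear least-squares peak heights, with centres and widths held at model values."""
    aty = [sum(p * y for p, y in zip(b, spectrum)) for b in BASIS]
    return dict(zip(PEAKS, solve(ATA, aty)))


def id_ig_scatter(snr, n_spectra=500, seed=0):
    """Mean, SD and coefficient of variation (%) of fitted I_D/I_G at one S/N.

    S/N is taken as the weakest peak height divided by the noise SD (an assumption).
    """
    rng = random.Random(seed)
    sigma = min(h for _, _, h in PEAKS.values()) / snr
    ratios = []
    for _ in range(n_spectra):
        noisy = [y + rng.gauss(0.0, sigma) for y in CLEAN]
        h = fit_heights(noisy)
        ratios.append(h["D"] / h["G"])
    mean = statistics.fmean(ratios)
    sd = statistics.stdev(ratios)
    return mean, sd, 100.0 * sd / mean


for snr in (100, 20, 5, 2):
    mean, sd, cv = id_ig_scatter(snr)
    print(f"S/N {snr:>3}: mean I_D/I_G = {mean:.3f}, SD = {sd:.4f}, CV = {cv:.1f}%")
```

Under these assumptions the coefficient of variation should scale roughly as the inverse of the signal-to-noise, while the mean stays close to the model ratio of 0.5. Because peak positions are fixed, this simplified fit will not reproduce the instability that appears when a weak peak must itself be located in the noise.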
These linear trends were observed to break down as the intensity of the weakest peak (in this model data set, the 2D peak) approaches the noise present in the background. In this limit the analysis and peak fitting become unstable as the peak is lost in the noise, which occurs at signal-to-noise values of around 1.5 (further discussion in the Supporting Information). We suggest that this represents the limit of usability for extracting even approximate average values from Raman map data. Signal-to-noise ratios greater than two can, however, be used to estimate mean values of material properties from large Raman data sets of hundreds of points. At this level the scatter in the data is still significant, so the signal-to-noise of the weakest spectral feature should be above 20 when probing the polydispersity of a nanomaterial powder.
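The thresholds above can be turned into a rough screening step. This is a hypothetical sketch: the signal-to-noise definition (peak height above a peak-free baseline region divided by that region's standard deviation) is an assumption, since the paper's exact definition is given in the Supporting Information, and the example spectrum is made up.

```python
import statistics


def estimate_snr(intensities, peak_index, baseline):
    """S/N of one peak: height above the local baseline divided by the baseline noise SD.

    `baseline` is a slice covering a peak-free region (hypothetical choice of definition).
    """
    background = intensities[baseline]
    level = statistics.fmean(background)
    noise = statistics.stdev(background)
    return (intensities[peak_index] - level) / noise


def snr_usability(snr_weakest_peak):
    """Rough guide based on the thresholds quoted in the text (about 1.5, 2 and 20)."""
    if snr_weakest_peak < 1.5:
        return "unusable: fitting unstable, peak lost in noise"
    if snr_weakest_peak <= 2:
        return "marginal: only vague averages"
    if snr_weakest_peak <= 20:
        return "mean values from large maps"
    return "mean values and distribution (polydispersity)"


# Made-up intensities around a weak peak at index 5.
spectrum = [0.02, -0.01, 0.01, -0.02, 0.00, 0.35, 0.01, -0.01]
snr = estimate_snr(spectrum, peak_index=5, baseline=slice(0, 5))
print(round(snr, 1), snr_usability(snr))
```

In practice the check would be applied to the weakest feature of interest in each spectrum of a map, since that feature sets the limit.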
It is acknowledged that Figure 6b provides no predictive power; however, because it spans spectra containing no noise through to spectra in which peaks are smaller than the background scatter, it confirms that the variation caused by noise remains smaller than the variation that is clearly a feature of nanomaterials. There is also close agreement with experimental data collected many times from the same spot on a control graphite sample. This indicates that the broad trends, the approximate values, and the normal distribution curves generated from model data apply to experimental data. This allows us to distinguish the width and character of distributions arising from a material's inhomogeneity from the effect of spectral noise. Importantly, the effect of noise on a distribution should be relatively small compared with the material properties we are actually trying to measure, as long as the signal-to-noise of the weakest features remains above 1.5.

■ CONCLUSION
In summary, we have investigated a range of carbon nanomaterials by applying a fitting algorithm to extract peak parameters and a statistical approach, using summary statistics and bootstrap analysis, to visualize the convergence of the data sets. This statistical method allowed us to quantify the number of different spectra needed to characterize a material without relying on conventional statistical models, which often do not apply to nanomaterials. This method for accurately characterizing nanomaterials will have a significant impact on graphene metrology and can be applied across the growing graphene industry, both in manufacture, through quality control, and in the formulation of graphene products, through consistency of supply. We also considered the effect of signal noise on the spectral analysis: while noisy data sets were confirmed to increase uncertainty, they have very little effect on the sample size required for a given level of precision. Such critical reflection on measurement techniques is required for nanomaterial metrology to mature and to allow graphene and other exciting and technologically relevant 2D materials to transition from the lab scale to an industrial setting. This work demonstrates that large data sets should be collected and complete distributions reported because of the polydispersity of many nanomaterials. Typically, a few hundred data points were found to be sufficient to establish the mean value of a given metric, while accurate distributions that identify minor components often required over 400 data points. Such insights should inform the development of nanomaterial metrology, which would allow greater confidence in the manufacture of graphene products. We suggest that users undertake their own analysis following this methodology for their materials and quality control purposes.

Figure 5. Convergence plot showing the effect of worsening signal-to-noise on the convergence of large Raman data sets. While complex in appearance, the key message of this plot is the collective behavior of the lines as the sample size changes. The colored lines indicate the noise level on a rainbow scale from red (high noise) to blue (low noise). The lower noise levels are harder to see because of their much smaller spread, but the rate of convergence and the fluctuations in the lines are consistent across all noise levels; this is seen in the general trend of fluctuating lines of all colors before the convergence point at around 300, after which all lines become relatively smooth.

ACS Applied Nano Materials www.acsanm.org Article https://dx.doi.org/10.1021/acsanm.0c02361 ACS Appl. Nano Mater. XXXX, XXX, XXX−XXX
Details of the program for fitting Raman spectral data; further discussion of signal-to-noise; full data sets and statistical analysis of carbon samples (graphite, exfoliated graphite, commercial GNPs, commercial MWCNTs, reduced graphene oxide, and a high-temperature graphitized sample); details of the Raman graphite control (PDF)

■ ACKNOWLEDGMENTS
S.J.G. thanks the EPSRC for a PhD studentship (1743232). S.B. acknowledges support from the EPSRC Centre for Doctoral Training in Soft Matter and Functional Interfaces (EP/L015536/1).

■ ABBREVIATIONS AND GLOSSARY
$P_{10}$, 10th percentile; $P_{90}$, 90th percentile; $Q_{25}$, first quartile; $Q_{75}$, third quartile; GNP, graphite nanoplatelet; MWCNT, multiwalled carbon nanotube; bootstrap, an established statistical methodology for resampling a data set by forming smaller subsets that can be used for many other tests; in this case it is used to show convergence graphically.

Figure 6. Model data containing 500 unique spectra with nominally the same peaks were produced with random noise; this was repeated for increasing noise levels. (a) Average and standard deviation of $I_D/I_G$ returned for each noise level; the deviation clearly increases while the mean remains reasonably constant and close to the theoretical value, shown as a gray dashed line. (b) Relationship between the coefficient of variation and the calculated signal-to-noise, showing that noisy spectra almost never contribute more variation than the variation inherent in nanomaterials. (c) Distribution of $I_D/I_G$ ratios fitted to the data set at the noise level indicated by the text label.