Predicting Compressive Strength of Consolidated Molecular Solids Using Computer Vision and Deep Learning

We explore the application of computer vision and machine learning (ML) techniques to predict material properties (e.g., compressive strength) based on SEM images of material microstructure. We show that it is possible to train ML models to predict materials performance based on SEM images alone, demonstrating this capability on the real-world problem of predicting uniaxially compressed peak stress of consolidated molecular solids (i.e. TATB) samples. Our image-based ML approach reduces root mean square error (RMSE) by an average of 51% over a non-image-based baseline. We compared two complementary approaches to this problem: (1) a traditional ML approach, random forest (RF), using state-of-the-art computer vision features and (2) an end-to-end deep learning (DL) approach, where features are learned automatically from raw images. We demonstrate the complementarity of these approaches, showing that RF performs best in the “small data” regime in which many real-world scientific applications reside (up to 24% lower RMSE than DL), whereas DL outpaces RF in the “big data” regime, where abundant training samples are available (up to 24% lower RMSE than RF).


Background
Materials characterization is a cornerstone of materials science, providing insights into materials' structures and properties to further the understanding of fundamental phenomena and guide the materials optimization process for application development. Visualization techniques, such as scanning and transmission electron microscopy, electron diffraction, X-ray computed tomography, and magnetic resonance imaging, among others, are widely used to provide high-spatial-resolution images of atomic arrangements, morphologies, particle shapes, and microstructure information, including defects and voids within materials. With significant improvements in the science and technology of materials characterization methods, these visualization tools have also advanced, providing higher resolution and faster data collection. With these breakthroughs, the bottleneck in materials characterization will no longer be the capability of the characterization tools themselves, but rather the ability to rapidly analyze and interpret large amounts of high-quality data.
One of the most beneficial aspects of visualization characterization techniques is the immediate feedback one receives upon analyzing one's samples. For example, scanning electron microscopy with energy-dispersive spectroscopy (SEM-EDS) gives users immediate information regarding size, morphology, composition, and other microstructure information. However, as samples become more complex and heterogeneous, the immediate feedback is no longer definitive, as the information becomes qualitative rather than quantitative. To obtain more quantitative values, additional analyses are needed, especially for heterogeneous samples. As images or micrographs collected by SEM and other visualization tools become more complex (high-dimensional data), interpretation of the data relies significantly on the user's experience and intuition to infer and impart significance to the data. Although human interpretation is often sufficient to elucidate the significance of the visual data, it can also introduce personal bias, which can cause potentially important information to be overlooked or neglected.

Related Work
To reduce human workload and to accelerate extraction of quantifiable values from SEM images of heterogeneous samples, computer vision techniques can be applied for feature detection and extraction. Computer vision techniques have been widely used for object identification, medical imaging, satellite image analysis, and numerous other applications. They are well established in labor-intensive processes, where they accelerate identification of objects and automate feature extraction. Computer-vision-assisted techniques have also been utilized in materials science for microstructure characterization and recognition [1]-[3], including powder characterization for additive manufacturing [4]. Computer vision feature detection techniques such as Harris-Laplace [5], Difference of Gaussian [6], Haralick texture features [7], and histogram of oriented gradients [1] have been previously utilized. In particular, the "bag of visual words" image representation employed by Holm et al. [8] to create "fingerprint" microstructures is a good example of using computer vision techniques to extract information from micrograph images. In addition, more recent cognitive neural network based approaches have also been utilized [9] to help identify molecular assemblies on surfaces and microstructures. These previous works show promise for significantly shortening image analysis time, and bringing big-data tools into materials science can significantly increase the throughput of labor-intensive and time-consuming processes.
While prior work has focused on characterization, this work takes computer-assisted image processing a step further, to correlate image features with materials performance. In order to demonstrate this capability, we focus on correlating features from SEM images of organic crystal microstructures (i.e., size, morphology, defects, etc.) to uniaxially compressed peak stress of consolidated TATB (2,4,6-triamino-1,3,5-trinitrobenzene) samples. TATB is an insensitive high explosive compound of interest to both the Department of Energy and the Department of Defense [10].

Technical Approach: Computer Vision and Deep Learning
Deep learning (DL) has demonstrated advantages over traditional machine learning (ML) and computer vision (CV) techniques for a variety of applications, most notably improved predictive performance and automated learning of feature representations with minimal human guidance. However, important limitations remain. In particular, DL typically requires more labeled training examples than traditional ML approaches, and it is often difficult to explain model performance. In order to assess the application of computational tools to materials science, we chose to compare two approaches: (1) a traditional ML approach (random forest) using state-of-the-art computer vision features and (2) an end-to-end deep learning approach.

Computer Vision
A wide range of features have been produced by the computer vision and image processing communities that can be used to classify images or perform regression on them [8]. We do not know a priori which of these features will be most useful in performance prediction or physical measurement correlations. Ultimately, what is desired is a set of features that is complete (i.e., captures all materials attributes of interest) and concise (i.e., minimizes redundant features). To that end, we chose two complementary state-of-the-art image feature extractors: (1) Bag-of-Visual-Words [11] to capture local shapes and (2) Binarized Statistical Image Features [17] to capture image textures.
A common technique known to perform well for general image classification is the Bag-of-Visual-Words (BoVW) [11] using Scale-Invariant-Feature-Transform (SIFT) vectors [12]. SIFT captures local shapes (i.e., edges, corners, blobs, ridges) and is robust to changes in scale, rotation, illumination, and viewpoint. This technique has been successfully applied to materials science with microstructural image data [1], [2], [8], and we hypothesize that BoVW will capture the relevant microstructure in TATB SEM images as well. The algorithm works by computing SIFT features on all images and then clustering these features using k-means to establish the visual "words." A description vector is then formed for each image by assigning the output SIFT vectors to a cluster and computing the histogram (i.e., how many vectors are in each cluster for a given image).
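The histogram step described above can be sketched in a few lines. This is a minimal illustration, assuming the visual words (cluster centers) have already been found by k-means; the function name `bovw_histogram` and the toy two-dimensional descriptors are hypothetical (real SIFT descriptors are much higher-dimensional), and the code is kept to the Python standard library.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(descriptors: List[Sequence[float]],
                   words: List[Sequence[float]]) -> List[int]:
    """Count how many descriptors fall nearest to each visual word."""
    histogram = [0] * len(words)
    for desc in descriptors:
        nearest = min(range(len(words)),
                      key=lambda k: squared_distance(desc, words[k]))
        histogram[nearest] += 1
    return histogram


# Toy example (hypothetical values): three visual words, five descriptors.
words = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1), (0.0, 0.2)]
hist = bovw_histogram(descriptors, words)
```

The resulting vector has one entry per visual word, so its length is fixed by the dictionary size regardless of how many key points an image produces.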
An extension of BoVW known as Vector-of-Locally-Aggregated-Descriptors (VLAD) [13] utilizes a similar feature encoding pipeline to BoVW but characterizes the distribution within the cluster through the cumulative residuals in each dimension. The feature vector becomes more expressive because it reflects the spatial distribution within each cluster rather than cluster assignment alone. This step aims to mitigate the assumption carried over from Bag-of-Words (BoW) that each cluster (i.e., "word") is a single point with zero area. Furthermore, VLAD commonly replaces the k-means clustering algorithm with a Gaussian Mixture Model (GMM) for soft cluster assignment.
Soft assignment allows overlap amongst cluster distributions, which can be considered during residual calculations for the VLAD encoding. Finally, KAZE [14] is used as a replacement for SIFT due to comparable, if not better, performance in detection and description, as well as ease of use in recent versions of OpenCV [15]. The VLAD-encoded description vector is created by flattening the set of cluster residuals, producing a description vector of length k × d, where k is the number of clusters and d is the dimension of the KAZE feature.
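To make the k × d encoding concrete, the sketch below accumulates residuals per cluster and flattens them. It is a simplified illustration under stated assumptions: it uses hard nearest-center assignment rather than the GMM soft assignment described in the text, the cluster centers are taken as given, and the name `vlad_encode` and the toy values are hypothetical.

```python
from typing import List, Sequence


def vlad_encode(descriptors: List[Sequence[float]],
                centers: List[Sequence[float]]) -> List[float]:
    """Sum the residuals (descriptor - center) per cluster and flatten."""
    k, d = len(centers), len(centers[0])
    residuals = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Hard assignment to the nearest center (a simplification of GMM
        # soft assignment, assumed here for brevity).
        nearest = min(
            range(k),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(desc, centers[c])),
        )
        for j in range(d):
            residuals[nearest][j] += desc[j] - centers[nearest][j]
    # Flatten the k x d residual matrix into one vector of length k * d.
    return [value for row in residuals for value in row]


# Toy example (hypothetical values): k = 2 clusters, d = 2 dimensions.
centers = [(0.0, 0.0), (4.0, 4.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (5.0, 4.0)]
vlad = vlad_encode(descriptors, centers)
```

Unlike a BoVW histogram, which only counts assignments, each cluster here contributes d numbers describing where its descriptors lie relative to the center. VLAD vectors are often normalized after flattening; that step is omitted here.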
With VLAD able to capture local shape information, we turn to image texture features as a way to capture differences in surface appearances. The computer vision literature is full of methods for capturing image texture features [16]. The technique of Binarized Statistical Image Features (BSIF) [17] is a relatively recent and robust algorithm for separating distinct textures. BSIF works by binarizing convolutional responses to pre-learned filters and outputting the responses in a histogram, resulting in a 255-length vector descriptor for each image. The filters are computed by way of independent component analysis on a large set of sample images.
Given VLAD and BSIF features, we use supervised machine learning to train a regression model that can predict materials performance given a corresponding SEM image. Given training SEM images, labeled with known material performance values, the training procedure is: (1) extract VLAD and BSIF features from the image and (2) train a random forest (RF) regressor using these image features. The result of this training procedure is an RF regression model that will output a material performance prediction when given a new SEM image as input. We chose RF as a representative ML algorithm because it requires no meta-parameter tuning and has been shown to perform well for a wide range of ML problems.
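The binarize-and-histogram idea behind BSIF can be illustrated with a toy sketch. The two 2×2 filters below are hypothetical gradient filters chosen only for readability; they are not the pre-learned ICA filters, and the sketch evaluates only fully covered patches without padding, which is an assumption of this example.

```python
from typing import List

Image = List[List[float]]


def filter_response(image: Image, kernel: Image, row: int, col: int) -> float:
    """Response of one kernel on the patch whose top-left corner is (row, col)."""
    return sum(
        kernel[i][j] * image[row + i][col + j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def bsif_histogram(image: Image, filters: List[Image]) -> List[int]:
    """Binarize each filter response and histogram the resulting bit codes."""
    size = len(filters[0])
    histogram = [0] * (2 ** len(filters))
    for row in range(len(image) - size + 1):
        for col in range(len(image[0]) - size + 1):
            code = 0
            for bit, kernel in enumerate(filters):
                # One bit per filter: 1 if the response is positive.
                if filter_response(image, kernel, row, col) > 0:
                    code |= 1 << bit
            histogram[code] += 1
    return histogram


# Hypothetical filters: horizontal and vertical intensity gradients.
filters = [
    [[-1.0, 1.0], [-1.0, 1.0]],
    [[-1.0, -1.0], [1.0, 1.0]],
]
# Toy image whose intensity increases from left to right only.
image = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
texture = bsif_histogram(image, filters)
```

Every patch in the toy image triggers only the horizontal-gradient bit, so all counts land in a single bin; a more varied texture would spread counts across bins, which is what makes the histogram usable as a texture descriptor.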

Deep Learning
As an alternative to training a traditional ML model (e.g., RF) using a fixed set of extracted CV features, we consider an end-to-end deep learning (DL) solution. Again, we employ a supervised training procedure using labeled examples of SEM images. However, the end-to-end DL solution does not require a separate image feature extraction step. Instead, the DL algorithm requires only the raw SEM image pixels (plus the supervised materials performance labels) as inputs. Image features are extracted automatically based on characteristics of the data as part of the DL algorithm. Our DL approach consists of: (1) pretraining a deep convolutional neural network (CNN) on ImageNet [18] data, followed by (2) a supervised training phase using raw SEM images as input.

Background
Mechanical properties of consolidated molecular solids are important performance criteria for their applications. Molecular solids, such as active pharmaceutical ingredients (API) and high energetic (HE) compounds, are often used in their consolidated forms (i.e., tablets and pressed parts) [19], [20]. Many factors govern the mechanical performance of consolidated parts, but none more so than the characteristics of the starting crystals (or particles), including crystal size, shape, surface texture (i.e., roughness), and density. In turn, these crystal characteristics influence the overall microstructures of the consolidated parts [21], [22].
Figure 1 shows typical SEM images of TATB crystals. Depending on the synthesis reaction conditions, different sets of TATB crystals can be synthesized. Quantifying TATB crystal characteristics and inferring the significance of these very different looking TATB crystals require significant prior knowledge and experience. To aid in this difficult task, computer vision techniques can be applied to extract image features from SEM images to provide quantifiable TATB crystal features. The extracted features can then be correlated to mechanical performance of consolidated TATB created from different lots of TATB crystals using machine learning (with an assumption that all subsequent processing conditions are identical). With a robust regression model correlating TATB features to mechanical performance, one can determine the key TATB features that dominate the mechanical properties of the consolidated TATB parts. This may provide insights into the fundamental TATB microstructures that contribute to the mechanical performance of consolidated TATB parts and guide synthesis processes to achieve the desired TATB crystal features. Herein, we report our efforts to develop a method to predict a figure-of-merit (compressive peak stress) for various lots of TATB, based solely on SEM images, by leveraging computer vision and machine learning.

Material Sample Preparation
The TATB lots were selected to span a wide range of compression performance. Figure 2 shows the "ground truth" compression performance measurements for each lot. TATB powder from each lot was uniaxially pressed in a cylindrical die at ambient to 0.5 in. diameter by 1 in. height, with a nominal density of 1.800 g cc⁻¹. Strain-controlled compression tests were run in duplicate at 23 °C at a ramp rate of 0.0001 s⁻¹ on an MTS Mini-Bionix servohydraulic test system model 858 with a pair of 0.5-inch gauge length extensometers to collect strain data. From the obtained stress-strain curve, only the peak stress values were considered as the outputs of the machine learning models.

Image Data Preparation
For capturing microstructure information of TATB using SEM image analysis techniques, it is imperative that preparation and collection are uniform. TATB powder is adhered to SEM stubs by double-stick carbon tape on the stub that is placed gently into a reservoir of TATB powder. The excess loose TATB powder is gently blown off with compressed air. The samples are coated with nominally 3.3 nm of gold prior to imaging.
The SEM images are collected with a Zeiss Sigma HD VP using a 30.00 µm aperture, 2.00 keV beam energy, and ca. 5.1 mm working distance. The program 'Atlas' is used to automate the image collection and works by selecting a large area for the program to collect as smaller tiles with a slight overlap to later stitch together to create a large mosaic. In this analysis, we use the individual tiles as our image population. The image tiles are set to have a field of view of 256.19 µm × 256.19 µm with a pixel size of 250.18 nm × 250.18 nm and to autofocus every 20th tile. The images used in this analysis are collected using the SE2 secondary electron detector. The brightness and contrast levels are held constant across all images and samples.
In all, we collected 69,894 sample images from 30 lots of TATB (an average of 2,330 images per lot).Each image is associated with the single peak-stress value for the lot, measured mechanically as described in the previous subsection.

Machine Learning Implementation
Unless otherwise specified, we use default settings for all software libraries. We implemented VLAD in Python using OpenCV [15] for the KAZE image descriptor and scikit-learn for GMM clustering. We set k = 20 and d = 64, where k is the number of clusters and d is the dimension, to keep dimensionality low and create a small dictionary, motivated by the homogeneous nature of the TATB lots. During the KAZE key point extraction we consider only the top 128 key points based on the response value, providing 128 KAZE features per image to be fed into the clustering algorithm.
For BSIF, we use the code and pre-computed filters available on the authors' website [23]. We translated the original Matlab code to Python for use within our learning framework.
We train random forest regressors using Python scikit-learn [24] with n_estimators=100, max_depth=32, and max_features=1/3 (standard for random forest regression). Note that the scikit-learn RandomForestRegressor default for max_features is N, which is actually a degenerate random forest with no feature sampling.
Our DL approach consists of training deep convolutional neural networks (CNNs) using the Python Caffe framework [25]. We started with an ImageNet [18] pretrained DenseNet 121 network [26]. The SEM image is gray-scale, but it is converted to RGB in order to make it compatible with the DenseNet network. The target material performance values are normalized so that they range from approximately -1.0 to 1.0. The network has an input size of 352×352, but the SEM images are scaled to 384×384 using bilinear interpolation (standard OpenCV resizing). This allows us to do random cropping during training. We used mean subtraction preprocessing, using the mean derived from ImageNet. This seems to work better in practice than computing a dataset-specific mean for each new dataset. The initial learning rate is 0.01 using an exponential rate step down. We train for 20,000 iterations so that our step size is 200 iterations with a learning rate decay of 0.94 for each step. Mini-batch size is 32. We use standard stochastic gradient descent training with momentum set to 0.9 and weight decay set to 0.0002.
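The learning-rate schedule stated above can be written out explicitly. The sketch below assumes the schedule is a step-wise exponential decay, in which the rate is multiplied by 0.94 once every 200 iterations; the function name `learning_rate` is hypothetical, and the mapping onto a particular solver setting is an assumption rather than something the text specifies.

```python
def learning_rate(iteration: int,
                  base_lr: float = 0.01,
                  decay: float = 0.94,
                  step_size: int = 200) -> float:
    """Step-wise exponential decay: multiply by `decay` every `step_size` iterations."""
    return base_lr * decay ** (iteration // step_size)


# Rate at the start, just before and after the first step, midway, and at the end.
schedule = [learning_rate(i) for i in (0, 199, 200, 10_000, 20_000)]
```

With 20,000 iterations and a step size of 200, the rate is reduced 100 times in total, which under this assumption takes it from 0.01 down to the order of 10⁻⁵ by the end of training.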

Machine Learning Experiments
To evaluate the ability of the proposed machine learning approaches to generalize to previously unseen materials, we employ a leave-one-lot-out cross-validation procedure. For each of the 30 lots L, we train a model on all lots other than L and then evaluate the trained model on lot L only. We use the trained model to predict peak stress for each image in the evaluation lot and then calculate a single peak stress prediction for the lot as the median prediction over all images in the lot.
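The leave-one-lot-out procedure with median aggregation can be sketched as follows. This is an illustration under stated assumptions: `train` stands for any learner (RF or DL) that returns a per-image predictor, the placeholder `train_mean_model` and the tiny three-lot dataset are hypothetical, and all names are introduced here for the example.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]


def leave_one_lot_out(
    images_by_lot: Dict[str, List[Features]],
    stress_by_lot: Dict[str, float],
    train: Callable[[List[Features], List[float]], Model],
) -> Dict[str, float]:
    """One peak-stress prediction per lot, from a model that never saw that lot."""
    predictions = {}
    for held_out in images_by_lot:
        train_x: List[Features] = []
        train_y: List[float] = []
        for lot, images in images_by_lot.items():
            if lot == held_out:
                continue
            train_x.extend(images)
            # Every image inherits the single peak-stress label of its lot.
            train_y.extend([stress_by_lot[lot]] * len(images))
        model = train(train_x, train_y)
        per_image = [model(x) for x in images_by_lot[held_out]]
        # Aggregate per-image predictions into one lot-level prediction.
        predictions[held_out] = median(per_image)
    return predictions


def train_mean_model(xs: List[Features], ys: List[float]) -> Model:
    """Placeholder learner (hypothetical): always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


images_by_lot = {"A": [[0.1], [0.2]], "B": [[0.5]], "C": [[0.9], [1.0]]}
stress_by_lot = {"A": 1000.0, "B": 1500.0, "C": 2000.0}
lot_predictions = leave_one_lot_out(images_by_lot, stress_by_lot, train_mean_model)
```

Splitting by lot rather than by image matters because all images from a lot share one label; an image-level split would let the model see the test lot's label during training.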

Peak-stress Prediction: Random Forest vs. Deep Learning
Figures 2 and 3 show, respectively, the peak stress predictions and mean absolute percentage error (MAPE) for RF and DL on each lot. Overall, DL outperforms RF, achieving 206 root mean square error (RMSE)/10% MAPE across all lots vs. 271 RMSE/13% MAPE for RF. However, Figure 3 highlights several exceptions, where RF error is lower than DL. These are lots E, F, AX, AT, V, and AW. We also note the large discrepancy in performance between RF and DL on lot R (we are currently investigating the source of the large error observed for lot R). However, even removing lot R from the evaluation, DL remains the clear winner overall (200 RMSE/9% MAPE vs. 235/11% for RF). Figure 3 also shows the performance of a simple baseline approach that doesn't use the image data at all but makes use of the distribution of peak stress values, in this case by always predicting the mean peak stress value across all lots (1,580 psi). This type of baseline approach is standard practice in the machine learning community and helps differentiate the effects of distributional information available in training labels from the feature information available in training images. This baseline approach achieves 419 RMSE/26% MAPE overall.

Figure 2: Lot-by-lot predicted peak stress values for both the CV/RF approach and the DL approach, as well as the observed ground truth peak stress values from mechanical testing. Closer to ground truth prediction is better.

Figure 3: Lot-by-lot prediction error for RF, DL, and a non-image baseline (always predict mean peak stress). Error is measured using mean absolute percentage error (MAPE). Lots are ordered by increasing RF error to make differences between RF and DL clearer. Lower error is better.
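The two error measures and the non-image baseline reported in this subsection can be sketched as below. The per-lot peak stress values are hypothetical placeholders, not the measured data, and the helper names `rmse` and `mape` are introduced for this example; the baseline here takes its mean over all listed lots, which is an assumption about how the baseline was computed.

```python
from math import sqrt
from typing import Sequence


def rmse(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the target (psi here)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth))


def mape(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(truth, predicted)) / len(truth)


# Hypothetical per-lot peak stresses (psi); not the measured values.
truth = [1000.0, 1500.0, 2000.0]

# Non-image baseline: always predict the mean peak stress.
baseline = [sum(truth) / len(truth)] * len(truth)
baseline_rmse = rmse(truth, baseline)
baseline_mape = mape(truth, baseline)
```

RMSE penalizes large misses more heavily, while MAPE weights each lot by its relative error, which is why the two measures can rank individual lots differently.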

Learning Curves
In order to understand how model performance is affected by the availability of training data, we generated learning curves for RF and DL using the following cross-validation procedure with training set subsampling, which allows us to both (a) vary the number of training lots available and (b) evaluate on each test lot exactly once:
    For each value of T = (5, 7, 9, …, 29):
        For each of 30 lots L:
            L is the test lot and is excluded from training
            Randomly select additional lots for exclusion until exactly T training lots remain
            Train a model M on these T training lots
            Test Err = evaluate M on held-out test lot L
        Report mean Test Error over each lot for T
Figure 4 shows two versions of the resulting learning curves: Figure 4a shows the raw data, where each point represents an average over 30 cross-validation folds (as described above). Figure 4b shows a smoothed version of the same plot, transformed by a central moving average filter with a window size of 3. This smoothing is effectively a low-pass filter, which removes high-frequency fluctuations (i.e., trial-to-trial variance) and exposes the underlying trend. For both RF and DL, generalization error drops as we add more training lots. Therefore, it appears that variance (i.e., overfitting) is a significant source of error for both models. RF is less subject to overfitting than DL for small training sizes since it is a lower-complexity model. However, as more training lots are provided, RF performance plateaus as model capacity is exhausted and RF begins to underfit the data. DL performance continues to improve steadily right up to training on 29 of 30 lots, indicating that even 30 lots is not enough to take advantage of the full DL model capacity. This strongly suggests that DL performance will continue to improve as more material lots become available.
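The subsampling procedure and the smoothing step can be sketched together. In this illustration, `evaluate` stands in for "train on the given lots, then score on the held-out lot"; the stand-in `toy_evaluate`, the fixed random seed, and the handling of the curve's end points (averaging over whichever neighbors exist) are assumptions made for the example.

```python
import random
from typing import Callable, Dict, List


def learning_curve(
    lots: List[str],
    sizes: List[int],
    evaluate: Callable[[List[str], str], float],
    seed: int = 0,
) -> Dict[int, float]:
    """Mean test error per training-set size, testing on every lot exactly once."""
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        errors = []
        for test_lot in lots:
            candidates = [lot for lot in lots if lot != test_lot]
            # Keeping `size` random lots is equivalent to excluding the rest.
            train_lots = rng.sample(candidates, size)
            errors.append(evaluate(train_lots, test_lot))
        curve[size] = sum(errors) / len(errors)
    return curve


def central_moving_average(values: List[float], window: int = 3) -> List[float]:
    """Smooth with a centered window; edge points average the available neighbors."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def toy_evaluate(train_lots: List[str], test_lot: str) -> float:
    """Hypothetical stand-in for training and scoring a real model."""
    return 1.0 / len(train_lots)


lots = [f"lot{i}" for i in range(30)]
curve = learning_curve(lots, list(range(5, 30, 2)), toy_evaluate)
smoothed = central_moving_average([curve[t] for t in sorted(curve)])
```

Because the test lot is always removed before sampling, the largest usable training size is 29, matching the upper end of the range of T.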

Discussion
Here we discuss several aspects of our findings in more detail.
Computer vision is an effective approach for correlating material microstructure with performance. Figure 3 shows that: (a) the SEM images do in fact contain information that is correlated with material performance and (b) both engineered computer vision features and automatically learned DL features are effective in extracting this information from the images.
Synthesizing more material lots improves performance. Figure 4 shows that both RF and DL achieve lower generalization error by training over a diverse set of lots. The performance of DL, in particular, improves sharply as more training lots are added. The results indicate that this trend will continue as more lots are collected. Therefore, we expect DL prediction performance to continue to improve as more material lots become available.
Deep Learning is the more powerful method. In addition to the convenience and robustness of DL's automated image feature extraction, we have demonstrated that DL is the best performer overall. Specifically, Figures 3 and 4 show that: (a) DL outperforms RF given sufficient training data (≈20 material lots or more) and (b) the performance gap between RF and DL increases with the number of training lots available. Since DL is a higher-complexity model, it is able to fit whatever data is provided (at least up to 30 lots), whereas RF performance begins to plateau around 15 training lots due to underfitting.
The more powerful method is not always the best. The one area where RF consistently outperforms DL is in the "small data" regime, where training lots are scarce (see Figure 4). This is a crucial caveat to the dominance of DL because scientific applications often fall within this small data regime due to the time, effort, and expense required to conduct experiments. Our results provide an important reminder to always compare to simpler methods as baselines, especially when data is scarce. The more powerful method is not always the best.

Future work.
The above discussion suggests a number of avenues for further improving model performance for this task: (1) We expect synthesis of more material lots to automatically lead to gains in DL performance without any changes to the modelling approach. At what point DL performance eventually plateaus is a question for further investigation. (2) In addition to performance prediction, an important goal for experimentalists is extracting insights from machine-learned models. We want to understand what specific characteristics of the material microstructure are contributing to performance and, ultimately, how these characteristics are influenced by synthesis parameters. Model interpretability or explainability is an open area of research in machine learning. We plan to investigate materials-specific approaches to explainability. (3) Finally, since our results indicate that DL performance will continue to improve with more data, but more data is sometimes impossible or impractical to obtain, we will investigate methods for augmenting SEM images with other data sources (e.g., particle size analysis and surface area measurements) as well as artificially generated SEM images.

Conclusion
Rapid advancements in computer science tools are changing the landscape of data science. Application of computer vision, machine learning, and deep learning in materials science can provide powerful tools to analyze, query, and automate scientific data analysis. As scientific capabilities progress and generate large amounts of data, advanced data analytics tools must be implemented. To that end, we explored the application of computer vision and machine learning to quantify materials properties (i.e., compressive strength) based on SEM images of materials microstructure. We showed that it is possible to train machine learning models to predict materials performance based on SEM images alone, demonstrating this capability on the real-world problem of predicting uniaxially compressed peak stress of consolidated TATB samples.
We explored two complementary approaches to this problem: (1) a traditional machine learning approach (random forest) using state-of-the-art computer vision features and (2) an end-to-end deep learning approach, where features are learned automatically from raw images. We demonstrated the complementarity of these approaches, showing that random forest performs best in the "small data" regime in which many real-world scientific applications reside, whereas deep learning outpaces random forest in the "big data" regime, where abundant training samples are available.
Based on our findings, we outlined several future research directions in order to further improve the utility of the approach. These include: (1) synthesizing new material lots to better understand the performance ceiling of the deep learning approach, (2) exploring questions of model explainability to extract experimental insights from trained models, and (3) data augmentation approaches to overcome the limited availability of materials samples.

Figure 1: Examples of different TATB crystal structures with varying synthesis conditions at identical magnifications.

Figure 5: Lot-by-lot prediction error for RF trained on different feature sets: VLAD only, BSIF only, and a combined model using both BSIF and VLAD. Error is measured using mean absolute percentage error (MAPE). Lots are ordered by increasing BSIF error to make differences between feature sets clearer. Lower error is better.

Figure 4: Raw (4a) and smoothed (4b) learning curves for RF and DL. DL overfits early on, but the fit improves steadily with more data. RF is less subject to overfitting for small training sizes, but eventually underfits as model capacity is exhausted. Lower error is better.
The baseline error for this task is 419 RMSE/26% MAPE, achieved by a simple approach which ignores image data completely and makes use of only the distribution of peak stress values. Compared to this baseline, RF reduces RMSE by 35% and MAPE by 50% on average, and DL reduces RMSE by 51% and MAPE by 62% on average. Therefore, there is clearly some signal in the image data that both RF and DL are able to effectively correlate with peak stress.
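As a quick check of these percentages, the relative reductions can be recomputed from the overall errors reported in the text (419/26% for the baseline, 271/13% for RF, 206/10% for DL). The helper name `relative_reduction` is hypothetical, and computing the reduction from overall values rather than lot by lot is an assumption of this sketch.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` error is lower than `baseline` error."""
    return 100.0 * (baseline - model) / baseline


# Overall errors as reported in the text: RMSE and MAPE (in percent).
baseline_rmse, baseline_mape = 419.0, 26.0
rf_rmse, rf_mape = 271.0, 13.0
dl_rmse, dl_mape = 206.0, 10.0

# (RMSE reduction, MAPE reduction) relative to the non-image baseline.
rf_gain = (relative_reduction(baseline_rmse, rf_rmse),
           relative_reduction(baseline_mape, rf_mape))
dl_gain = (relative_reduction(baseline_rmse, dl_rmse),
           relative_reduction(baseline_mape, dl_mape))
```

Rounded to whole percentages, these values agree with the reductions quoted above.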