Deep Learning-Powered Optical Microscopy for Steel Research

The success of machine learning (ML) models in object or pattern recognition naturally leads to ML being employed in the classification of the microstructure of steel surfaces. Light optical microscopy (LOM) is the traditional imaging process in this field. However, the increasing use of ML to extract or relate more aspects of the aforementioned materials and the limitations of LOM motivated us to provide an improvement to the established image acquisition process. In essence, we perform style transfer from LOM to scanning electron microscopy (SEM) combined with “intelligent” upscaling. This is achieved by employing an ML model trained on a multimodal dataset to generate an SEM-like image from the corresponding LOM image. This transformation, in our opinion, which is corroborated by a detailed analysis of the source, target and prediction, successfully pushes the limits of LOM in the case of steel surfaces. The expected consequence is the improvement of the precise characterization of advanced multiphase steels’ structure based on these transformed LOM images.


Introduction
In the steel industry, challenges include producing alloys with tailored properties and minimizing waste. The mechanical properties for a given chemical composition are determined by the micro-/nanostructure of the steel, i.e., the presence of different steel phases, their size and their shape. This fact implies the need to analyze the structure and possibly correlate (the preparation process and) the structural data and properties. The correlation can be "studied" using machine learning (ML) models, sometimes also referred to by the fancier term "artificial intelligence" (AI) in the broader context; see, e.g., the recent review paper [1]. The aforementioned challenges and the (potential) use of ML put restrictions on imaging techniques and push for high-quality and detailed information about the micro-/nanostructure.
The physical limitations, in particular the resolution, of light optical microscopy (LOM) can be partially circumvented by using state-of-the-art machines or advanced "super-resolution microscopy" techniques; see, e.g., Ref. [6] for a review of some of them. Unfortunately, these techniques, utilized very often in biology, may not always be applicable to a given objective. Quite often, LOM is used together with other methods (see, e.g., Refs. [7-10]) to acquire more complete information about the system studied. Sometimes, its capabilities complicate the task at hand [11,12]. Of course, one can improve the phase contrast by using special etching and carefully adapting other steps in the sample preparation before the actual imaging of the sample in question.
However, can something else be done to "push towards" higher-quality images when using LOM as the single input? We believe so, and we introduce a new approach that is primarily aimed at the precise characterization of advanced multiphase steels. We employ a multimodal approach, well established in biology, by imaging the same field of view with different techniques. The core of the multimodal approach is to use several probes (light and electrons) and/or detection techniques to acquire more complete information about the system investigated and then train an ML model on the data to transform a LOM image. Here, we use this approach to successfully push the limits of LOM in the case of steel surfaces.
We are well aware that it is fundamentally impossible to create high-resolution images with more information than was stored in the original data without "hallucinating" the details. However, some portion of the higher-resolution information may only be hidden from human perception by the image blurriness, deformations and noise. We propose that the rest of the missing information can be completed from the knowledge of general properties of the investigated materials and their high-resolution images. This hypothesis is validated using the latter input in experiments of transforming LOM images into SEM images using deep learning techniques trained on an extensive image dataset of steels. The training data consist of LOM-SEM pairs of images of the same field of view. It is implicitly assumed that the ML model is able to learn the necessary general properties. This is a natural consequence of the fact that ML finds statistical patterns that generalize the data outside the training dataset. Thus, the above-mentioned software-based transformation of the LOM images represents more than a simple "style transfer".
We tested different models and trained each of them to distill all of the important information from low-resolution images in order to combine it with general knowledge of the investigated materials (learned by the model during its training) and thus generate high-resolution images. We started with a so-called U-Net neural network architecture [13] but eventually switched to a model based on generative adversarial networks (GANs) [14]. This allowed us to achieve high precision and consistency in the generation process. Please see Section 2.5 for more details. Extra care was taken to prevent the model(s) from creating any details not present in the original low-quality data. This made the resulting pseudo-SEM images suitable for further processing such as segmentation or phase classification. To the best of our knowledge, there are no publicly available GAN-based models for converting LOM images to SEM images. Thus, we expect the present findings to be of high value to both experimentalists and the steel industry.

Materials
Four types of steel were investigated. Their chemical composition is shown in Table 1. The first two chemical compositions were measured on an optical emission spectrometer; the third is declared by the manufacturer [15] and the fourth by the corresponding standard [16]. Each steel was wet-cut on a Struers Secotom-60 rip saw to the final sample dimensions with a maximum area of 100 mm². The samples were then hot-mounted in a Struers CitoPress-1 press. Mounting was followed by wet grinding on a Struers Tegramin-20 on MD Piano diamond discs of 220, 500 and 1200 grit and SiC abrasive foils of 2000 and 4000 grit for 3 to 5 min. Mechanical polishing followed on the same apparatus on MD Dac cloth with Struers diamond paste with a grain size of 3 µm and on MD Nap cloth with Struers diamond paste with grain sizes of 1 and 0.25 µm. Cooling of the pad was achieved by the addition of isopropyl alcohol. The samples were chemically etched in a solution of 100 mL ethanol and 4 mL concentrated nitric acid (Nital 4%) for visualization of the structure and removal of the deformed layer after mechanical preparation. The etching time was 3 s, except for the S355J2 sample, which was etched for 6 s. We intentionally selected a rather standard preparation procedure (without special etching chemicals that may enhance contrast) in order to ensure the procedure can be easily reproduced in the largest number of laboratories.

Microscope Equipment
LOM images were acquired on a fully automated Zeiss Axio Observer 7 materials inverted light microscope equipped with EC Epiplan-Neofluar objectives. The lens with which the bright-field images were collected has a 100× magnification, a numerical aperture of 0.9 and a working distance of 1 mm. All LOM images were taken at a 1000× magnification. The microscope is equipped with a microLED illuminator with a color temperature of 5700 K and a color-rendering index of >90. Image quality is ensured by a 5 megapixel Zeiss Axiocam 305 color camera with CMOS Global Shutter technology [17]. Autofocus was turned on in most of the cases.
Confocal laser scanning microscopy (CLSM) images were acquired on a VK-X1000 microscope by KEYENCE (residing in Mechelen, Belgium). It is equipped with an X1100 head unit with a 404 nm violet semiconductor laser. The Nikon CF IC EPI Plan ApoDeluxe objective with which the laser images were collected has a magnification of 150×, a numerical aperture of 0.95 and a working distance of 0.2 mm. All CLSM images were acquired at a magnification of 1500× [18].
In order to obtain high-resolution images, we performed a series of measurements of our test samples on a scanning electron microscope (SEM). We used an ultra-high-resolution Magellan 400 FEG SEM with an Elstar column (Thermo Fisher Scientific Inc., Waltham, MA, USA). The microscope is equipped with several in-lens and out-lens detectors and can operate in ultra-high-resolution, high-resolution (HR) and beam deceleration modes. Our experiments utilized a circular backscatter segmented (CBS) detector, which was placed under the objective lens. The CBS detector is an annular detector (see Figure 1). Additional data from the Everhart-Thornley detector (ETD) were acquired simultaneously with the CBS data. We imaged the sample with the following parameters: primary beam energy E_P = 5 keV, beam current I_P = 0.8 nA, working distance WD = 8 mm, signal from all segments of the CBS detector, HR mode.

Data Collecting
The task of locating specific regions of interest in a sample area measuring just a few micrometers by utilizing various imaging techniques can be notably difficult. This challenge is particularly pronounced when employing instruments from various manufacturers, as previously described. By adopting a colocalization grid, we developed a method to systematically capture a large volume of images from targeted regions using our equipment. We opted to simplify the navigation process by introducing a grid onto the finely etched metallographic sample. Initially, we glued a TEM grid to the sample surface (this pertains to the TRIP2 dataset only, the earliest data). After several iterations, an improved method for colocalization by an engraved navigation grid was utilized instead (this pertains to the TRIP1, USIBOR and S355J2 datasets). See Figure 2 for a visual example of the two navigation grids, which we used independently of each other. An auxiliary grid of either of the two types described above facilitates correlative mapping of relatively large microstructural areas, with each grid cell measuring approximately 500 × 500 µm. This enables detailed examination at the required magnification across different microscopy modalities. The areas of interest were captured using images that partially overlapped (by 10% to 20%, depending on the technique) to facilitate subsequent processing, as outlined in the dataset workflow. For illustration, a single exemplary grid cell was mapped using 20 LOM images at a magnification of 1000×, 48 CLSM images at 1500× magnification and 48 SEM images at 1200× magnification. Further details are provided in Table 2.
Table 2. Description of the as-measured data. We note the final dataset consists of grayscale (GS) images, i.e., of a single channel, of size 1024 × 1024 pixels with bit depth equal to 8.

Creating Datasets from Raw Images
The images obtained from each microscope present challenges in alignment due to their varying fields of view, aspect ratios, resolutions or degrees of overlap. In order to address these challenges and to simplify the alignment process, we adopted the workflow outlined in Figure 3, as detailed in our publication [19]. While this method facilitates rapid alignment, discrepancies may still occur because of the significant differences across imaging modalities, even after fine-tuning the alignment parameters. As a result, a meticulous review of the compiled dataset by an expert was indispensable. Due to factors such as microstructure complexity, imaging conditions, stitching artifacts or contamination, a significant portion of the images in the USIBOR dataset required thorough scrutiny. In contrast, nearly 90% of the images registered in the TRIP2 dataset were deemed satisfactory. Table 3 contains the quantity and distribution of training examples in the final dataset.
In order to demonstrate the model's capabilities, an ETD dataset was prepared for the TRIP1 steel. The workflow used for image registration was identical to that for CBS images with the exception that intensity-inverted ETD images replaced the CBS images. Intensity inversion was employed to achieve a better alignment, because the significant visual differences between the grayscale LOM image and the original ETD images hindered proper automatic registration of the as-is images. After alignment, the intensity of the inverted ETD images was reverted to its original values.

Method Alias ML Models
We used two models: a "vanilla" U-Net and a GAN-based model. A simplified description of the U-Net architecture is shown in Figure 4, and more details are present in Section 2.5.1. An illustration of the architecture of the GAN is displayed in Figure 5. Traditional GANs are generative models that learn a mapping from a noise vector z to an image y, G : z → y [14]. Another well-known architecture called a conditional GAN [20] is trained to map an input image x and a noise vector z to an output image y, G : {x, z} → y. The purpose of having the noise vector z in the input is to achieve "creativity" of the generative process. Some image-to-image models (see, e.g., Ref. [21]), attempting to achieve a good balance between creativity and determinism, propose to train just a mapping from an image to another image, G : x → y, and compensate for the lack of stochasticity by adding dropout layers [22]. For our purposes, we need to avoid creativity (i.e., "hallucinations") and stochasticity as much as possible; hence, we skip both the noise vector and dropout and train G : x → y without noise. This results in less diverse but more consistent and predictable results.

Generator
Our generator G is based on the U-Net neural network architecture with an altered output layer. U-Net [13] is a fully convolutional neural network [23] that uses skip connections [24] to avoid the vanishing gradient problem. Our implementation of a U-Net consists of 10 convolutional layers in the contracting path and 10 convolutional layers in the expanding path. In contrast to the original implementation of the U-Net, we use a combination of up-sampling and convolution instead of single up-convolution layers, which turns out to be more resistant against creating so-called checkerboard artifacts in the output images [25].
The most important difference between our implementation and the original U-Net is in the last layer. Instead of the sigmoid activation function with the binary cross-entropy loss function designed for the segmentation task, we use the hyperbolic tangent activation function with mean absolute error (MAE) as the loss function. We use MAE instead of the more commonly used mean squared error (MSE) because it typically produces sharper output images [26].
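The up-sampling followed by convolution mentioned above can be illustrated with a minimal NumPy sketch. It is not the TensorFlow implementation used in this work: the nearest-neighbour interpolation, the factor of 2, the 3 × 3 averaging kernel and the helper names `upsample_nearest`, `conv2d_same` and `upsample_block` are assumptions chosen purely for illustration.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour up-sampling of a 2D feature map (assumed interpolation)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x.astype(float), ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(x.shape, dtype=float)
    # Accumulate one shifted copy of the input per kernel element.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def upsample_block(x, kernel, factor=2):
    """Up-sampling followed by convolution, replacing a single up-convolution."""
    return conv2d_same(upsample_nearest(x, factor), kernel)

# Hypothetical 4 x 4 feature map and a 3 x 3 averaging kernel.
feature_map = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.full((3, 3), 1.0 / 9.0)
out = upsample_block(feature_map, kernel)
print(out.shape)  # (8, 8)
```

In this arrangement every interior output pixel receives the same number of kernel contributions, whereas a strided up-convolution can overlap its kernel unevenly, which is the usual explanation given for checkerboard artifacts [25].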

Discriminator
Discriminators in GAN architectures are typically neural network-based binary classifiers trained to distinguish fake (generated) images from real ones. In our model, we used a deep convolutional network consisting of 5 convolutional layers with batch normalization [27] and a leaky ReLU (LReLU) activation function, which turned out to be a better choice than standard ReLU [27]. We also employed the idea of a PatchGAN discriminator [21], which applies the discriminator on small patches of the investigated image and then computes the average loss. This method helps to improve the quality of the resulting images when working with a high resolution.

GAN Objective
The generator and the discriminator are trained in an adversarial way, i.e., in two steps. For each batch of training examples, we first optimize the loss function of the discriminator:

$$L_D = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ L_{BCE}\big(D(y)_{ij}, 1\big) + L_{BCE}\big(D(G(x))_{ij}, 0\big) \right],$$

where $L_{BCE}$ stands for the binary cross-entropy loss function [28] and m × n are the dimensions of the patch matrix. The formula can be broken down into two parts: the first part, where the discriminator is optimized to predict ones for true output images y, and the second part, where the discriminator is optimized to predict zeros for fake (generated) output images.
Then we freeze the weights of the discriminator and optimize the loss function of the generator:

$$L_G = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} L_{BCE}\big(D(G(x))_{ij}, 1\big) + \lambda \, L_{MAE}\big(G(x), y\big), \tag{1}$$

where $L_{MAE}$ is the pixel-wise mean absolute error and λ is a scalar coefficient. The first part of the loss function is designed such that the generator is trained to fool the discriminator. In the second part, we train the generator to minimize the MAE of the generated and true output images. The coefficient λ controls the relative importance of these two parts.
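The two-step objective described above can be sketched in plain NumPy, operating on patch prediction matrices that a discriminator would produce. This is a sketch under stated assumptions rather than the training code used in this work: the function names, the clipping constant `EPS`, the 30 × 30 patch matrix and the placeholder value `lam=100.0` are all hypothetical (the text does not state the value of λ), and the discriminator is assumed to output probabilities in (0, 1).

```python
import numpy as np

EPS = 1e-7  # assumed numerical guard for log()

def bce(pred, target):
    """Binary cross-entropy averaged over all entries of an m x n patch matrix."""
    pred = np.clip(pred, EPS, 1.0 - EPS)
    losses = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(np.mean(losses))

def discriminator_loss(d_real, d_fake):
    """Step 1: predict ones for true images and zeros for generated images."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake, generated, target, lam=100.0):
    """Step 2 (discriminator frozen): adversarial term plus lam times the MAE."""
    adversarial = bce(d_fake, np.ones_like(d_fake))
    mae = float(np.mean(np.abs(generated - target)))
    return adversarial + lam * mae

# Hypothetical discriminator outputs and images in the standardized range [-1, 1].
rng = np.random.default_rng(0)
d_real = rng.uniform(0.6, 0.9, size=(30, 30))
d_fake = rng.uniform(0.1, 0.4, size=(30, 30))
target = rng.uniform(-1.0, 1.0, size=(512, 512))
generated = np.clip(target + rng.normal(0.0, 0.1, size=target.shape), -1.0, 1.0)

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake, generated, target))
```

A larger `lam` pulls the generator towards pixel-wise fidelity to the SEM target, while a smaller one gives more weight to fooling the discriminator.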

Experiments
A large dataset of LOM-SEM image pairs was used in the experiments. We collected 847 grayscale image pairs with a resolution of 1024 × 1024 pixels for the CBS detector and 206 image pairs in the same resolution for the ETD detector.
For testing the CBS detector, we used various types of steel materials. For testing the ETD detector, only TRIP1 steel was used. The constitution of both the CBS and ETD datasets is described in Table 3.
Before training, the dataset was randomly split into training and validation datasets with proportions of 90% for training and 10% for validation. To increase the diversity of the training dataset, we applied several augmentation techniques. The augmentation procedure is defined as follows:
1. Choose a random image pair from the training dataset.
2. Perform a random crop of the paired images, resulting in an image size of 512 × 512 pixels.
3. With a probability of 0.5, apply a horizontal flip.
4. With a probability of 0.5, apply a vertical flip.
This procedure is applied to obtain all the samples forming each training batch. We trained the CBS and ETD models separately. All models were trained for 10,000 epochs with a batch size of 8 on a Tesla V100 GPU unit with 16 gigabytes of internal memory. The training of each model took approximately 10 days.
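The four augmentation steps above can be sketched as follows. This is an illustrative NumPy version, not the pipeline used for training; the function names `augment_pair` and `make_batch`, the random-number generator and the synthetic image pairs are assumptions. The essential point is that the same crop window and the same flips are applied to both images of a pair so that they stay registered.

```python
import numpy as np

def augment_pair(lom, sem, rng, crop=512):
    """Apply an identical random crop and identical random flips to a LOM-SEM pair."""
    h, w = lom.shape
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    lom_c = lom[top:top + crop, left:left + crop]
    sem_c = sem[top:top + crop, left:left + crop]
    if rng.random() < 0.5:  # horizontal flip
        lom_c, sem_c = lom_c[:, ::-1], sem_c[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        lom_c, sem_c = lom_c[::-1, :], sem_c[::-1, :]
    return lom_c.copy(), sem_c.copy()

def make_batch(pairs, rng, batch_size=8):
    """Draw random pairs (with replacement) and augment each one independently."""
    indices = rng.integers(0, len(pairs), size=batch_size)
    samples = [augment_pair(pairs[i][0], pairs[i][1], rng) for i in indices]
    lom_batch = np.stack([s[0] for s in samples])
    sem_batch = np.stack([s[1] for s in samples])
    return lom_batch, sem_batch

# Synthetic stand-ins for registered 1024 x 1024 grayscale image pairs.
rng = np.random.default_rng(42)
pairs = [
    (rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8),
     rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8))
    for _ in range(3)
]
lom_batch, sem_batch = make_batch(pairs, rng)
print(lom_batch.shape, sem_batch.shape)  # (8, 512, 512) (8, 512, 512)
```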
The pixel grayscale 8-bit values were normalized to the interval [−1, 1] before training and validation. The standardization mapping $S_b$ and its inverse (employed when finalizing the predicted data) are presented in Equation (2):

$$S_b(x) = \frac{x}{s_b} - 1, \qquad S_b^{-1}(\tilde{x}) = s_b \, (\tilde{x} + 1), \tag{2}$$

where the scale $s_b = (2^b - 1)/2$ is half of the maximal achievable value $2^b - 1$ in the original 0-based integer data corresponding to the bit depth b = 8.
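A minimal sketch of this standardization and its inverse, assuming the linear form implied by the text (the scale is half of the maximal value 2^b − 1, so that 0 maps to −1 and 255 maps to 1 for b = 8). The function names, the rounding and the clipping applied in the inverse are assumptions added for illustration.

```python
import numpy as np

def scale(bit_depth=8):
    """Half of the maximal achievable integer value for the given bit depth."""
    return (2 ** bit_depth - 1) / 2.0

def standardize(img, bit_depth=8):
    """Map 0-based integer gray levels linearly onto [-1, 1]."""
    return img.astype(np.float32) / scale(bit_depth) - 1.0

def destandardize(x, bit_depth=8):
    """Inverse mapping back to integers; rounding and clipping are assumed here."""
    values = np.rint((x + 1.0) * scale(bit_depth))
    dtype = np.uint8 if bit_depth <= 8 else np.uint16
    return np.clip(values, 0, 2 ** bit_depth - 1).astype(dtype)

levels = np.arange(256, dtype=np.uint8)
standardized = standardize(levels)
print(standardized.min(), standardized.max())
print(np.array_equal(destandardize(standardized), levels))
```

The clipping only matters for network outputs that slightly leave the [−1, 1] range; the hyperbolic tangent output layer described earlier already keeps predictions inside it.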

Results and Discussion
Quantitative evaluation of the level of image improvement is quite complicated. We used a standard evaluation metric called root mean squared error (RMSE), originally designed for the evaluation of regression models. Specifically, we measured the square root of the mean of the squared differences between the pixels of the SEM and predicted images.
Using an 8-bit depth reduces the memory footprint of each batch in the GPU RAM, and we are convinced that a 16-bit depth is unnecessarily large.
In order to demonstrate the benefits of the GAN architecture, we compared a vanilla U-Net model with the GAN model for the CBS dataset. The vanilla U-Net model has exactly the same architecture as the standalone generator from the GAN model. We obtained RMSE = 0.2109 for the vanilla U-Net and RMSE = 0.2059 for the GAN model after 10,000 epochs of training. Both RMSE values correspond to the standardized data (see Equation (2)).
Preliminary visual examination of the CBS predictions, of which only a selection is displayed in the images below, shows that the GAN model significantly outperforms the vanilla U-Net model on the CBS data. A similar procedure was repeated for the ETD data but only in the case of the GAN model.
Pearlite is composed of alternating layers of ferrite and cementite that form a lamellar structure. The lamellar structure is very fine and invisible (or hardly visible) in our light optical microscope; see Figure 6. The lamellas in the LOM micrographs merge with each other, and the result is only a dark, blurred area. Reliable identification of a pearlite phase in the LOM micrographs is impossible. The U-Net prediction slightly improves the visibility of the pearlite internal structure, but lamellas are still invisible. GAN predictions are markedly realistic, and they enable us to identify the pearlite phase. Let us note that the pearlite structure is visible in LOM images reported in Ref. [11]. However, the details of data acquisition are not explicitly mentioned, and we conclude that a coarser pearlite structure can be imaged using LOM.
Obviously, the LOM micrographs are hardly suitable for visualization of the complex microstructure of TRIP steel consisting of a ferrite-bainite matrix and secondary phases arising from the matrix (as a consequence of selective etching), such as martensite, retained austenite and martensite-austenite constituents. The secondary phases are very fine and hence partly blurred in the LOM. The U-Net and GAN predictions are able to depict the secondary phases and better define matrix properties. The GAN pictures present the structure more realistically than the U-Net; see, e.g., the region marked by the second arrow from the top in Figure 7.
The insufficiency of the simple U-Net model is clearly seen in Figures 8 and 9, where the U-Net manages to approximate the boundaries among the phases but the visual contrast among the phases is significantly suppressed. On the other hand, the GAN model retains the visual distinction of the secondary phases and at the same time represents the phase boundaries in a better way, which can be observed in the finer features.
The displayed region of the USIBOR sample (see Figure 10) consists mostly of the martensitic phase. We see that this pure martensite is the most difficult to describe for the models presented here.
In order to demonstrate a real-life application of LOM micrograph enhancement, the original LOM image was transformed into a CBS-like image using a GAN. The input RGB LOM image was automatically converted to an 8-bit grayscale format and then upscaled from its original size to a resolution approximately matching that of an SEM image, using bilinear interpolation. The upscaled image was then divided into 1024 × 1024 px tiles, which were suitable as input for model prediction. After the prediction of all tiles, they were stitched together using the OpenCV library to match the field of view of the original LOM image but with the pixel resolution of an SEM image. The original RGB LOM image and the CBS-like prediction can be compared in Figure 11, which illustrates the possible output of our model. We note that this LOM field of view has no corresponding SEM data measured.
The preparation of the dataset for training of the ML model revealed that, e.g., a high value of MS-SSIM does not necessarily guarantee well-aligned images. We decided not to base the discussion of the transformed LOM images solely on quantitative results of metrics measuring their "distance" from the target SEM (either CBS or ETD) images. We describe the transformation in terms of a steel microstructure analysis by the naked eye of an expert, one of the coauthors. Nevertheless, the metrics are still a useful tool in the postprocessing of the images, e.g., they clearly indicate that the GAN model performs better than the U-Net one.
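The tile-and-stitch part of this workflow can be sketched as below. This is a simplified illustration and not the code used for Figure 11: plain array placement replaces the OpenCV-based stitching, the bilinear upscaling step is left out, the grayscale weights (ITU-R BT.601) and the reflect padding are assumptions, and `predict_tile` is a hypothetical stand-in for the trained generator.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an RGB image to 8-bit grayscale (BT.601 weights, an assumption)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb[..., :3] @ weights).astype(np.uint8)

def predict_tiled(image, predict_tile, tile=1024):
    """Pad to a multiple of the tile size, predict tile by tile, crop back."""
    h, w = image.shape
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="reflect")
    out = np.zeros(padded.shape, dtype=np.float32)
    for top in range(0, padded.shape[0], tile):
        for left in range(0, padded.shape[1], tile):
            patch = padded[top:top + tile, left:left + tile]
            # Each predicted tile is written back at its original position.
            out[top:top + tile, left:left + tile] = predict_tile(patch)
    return out[:h, :w]

# Hypothetical stand-in for the generator: here it only inverts the intensity.
def predict_tile(patch):
    return 255.0 - patch.astype(np.float32)

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(600, 900, 3), dtype=np.uint8)
gray = rgb_to_gray(rgb)
prediction = predict_tiled(gray, predict_tile, tile=256)
print(gray.shape, prediction.shape)  # (600, 900) (600, 900)
```

Non-overlapping tiles keep the sketch short; overlapping tiles with blending would be one way to suppress seams at tile borders, at the cost of more predictions.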
Let us comment on the metrics and their use in more detail. As already described in Section 2.6, the data from pictures were standardized to the interval [−1, 1] before training using the simple linear transformation in Equation (2). Consider a metric proportional to a power of the absolute value of the difference of pixel values, i.e., of the following type:

$$M_{p,q}(X, Y) = C \left( \sum_{\text{pixels}} |X - Y|^{p} \right)^{1/q}, \tag{3}$$

where the power is understood element-wise and C is a constant of proportionality which may be related to the pixel count $N_{pixels}$. Such a general case covers both the (R)MSE (p = 2) and MAE (q = p = 1) metrics. Consider a linear transformation, specified by means of two scalar coefficients a and b, of the independent variables (such as the standardization mapping $S_b$ and its inverse $S_b^{-1}$ in Equation (2)). Then it follows that

$$M_{p,q}(aX + b, aY + b) = |a|^{p/q} \, M_{p,q}(X, Y).$$

The above considerations show that the same linear transformation applied to both images affects some of the metrics only by an overall multiplicative factor. This means that it does not alter the performance order of the different models when evaluated by metrics of the type in Equation (3). The above statement is not valid in the case of several other metrics tested. Some of them are not implemented in the case of noninteger input data (namely, MS-SSIM), and some produce a different order of the models for as-is and standardized values.
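The scaling argument above can be checked numerically. The sketch below assumes metrics of the form C (Σ|X − Y|^p)^(1/q), with RMSE given by p = q = 2 and MAE by p = q = 1; the function names and the synthetic images are illustrative assumptions, not data from this work.

```python
import numpy as np

def metric(x, y, p, q, c):
    """Generic metric C * (sum |x - y|^p)^(1/q), power taken element-wise."""
    return c * np.sum(np.abs(x - y) ** p) ** (1.0 / q)

def rmse(x, y):
    return metric(x, y, p=2, q=2, c=x.size ** -0.5)

def mae(x, y):
    return metric(x, y, p=1, q=1, c=1.0 / x.size)

S_B = (2 ** 8 - 1) / 2.0  # scale for 8-bit data, i.e., 127.5

def standardize(values):
    """Linear map of 8-bit gray levels onto [-1, 1]."""
    return values / S_B - 1.0

# Synthetic target and two hypothetical predictions of different quality.
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 255.0, size=(64, 64))
pred_a = target + rng.normal(0.0, 5.0, size=target.shape)
pred_b = target + rng.normal(0.0, 20.0, size=target.shape)

for name, fn in (("RMSE", rmse), ("MAE", mae)):
    raw = [fn(target, p) for p in (pred_a, pred_b)]
    std = [fn(standardize(target), standardize(p)) for p in (pred_a, pred_b)]
    # The ratio equals the scale factor, so the ranking of models is unchanged.
    print(name, "ratios raw/standardized:", raw[0] / std[0], raw[1] / std[1])
```

For p = q the factor is simply |a|, so RMSE and MAE values reported on standardized data convert to 8-bit gray levels by multiplication with 127.5; the standardized RMSE values of 0.2109 and 0.2059 then correspond to roughly 26.9 and 26.3 gray levels, respectively.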
Let us consider the images displayed in Figure 6 through Figure 10 as a small sample of the test dataset, with each image containing some 262,000 pixels. We use several of the reasonable metrics as implemented in the Python libraries scikit-image [29], version 0.22, and sewar [30], version 0.4.6. We calculate the values of metrics for all the SEM-based predictions and the as-is LOM image. The results are presented in Appendix A in the case of images displayed in this paper, with Figure 9 excluded since the TRIP2 steel is already represented.
Using these data only, we find that the (R)MSE and universal quality index (UQI) metrics prefer the U-Net model except on the TRIP2 steel (512-10_T5-15-TRIP2). See Tables A1 and A2 for the detailed values. The differences in RMSE values are less than 10%. Quite surprisingly, the MAE has the lowest values in the case of as-is LOM images, which we disregard, followed by the GAN-CBS model. We attribute the better performance of GAN-CBS models over U-Net-CBS to the fact that the MAE was used in the generator training; see Equation (1).
Thus far, we have discussed the metrics of the type described in Equation (3) except for the UQI. Let us consider other metrics that take into account features other than the differences in the individual pixels. Such metrics include, as their names suggest, SSIM and MS-SSIM. These two do prefer the GAN-CBS model except for the case of TRIP1 steel; the differences in the SSIM values do not exceed 6%.
Thus, the differences among the U-Net-CBS and GAN-CBS models as measured by the above-discussed metrics do not seem to be very large. Two SSIM-based metrics that take into account "the larger picture" and not only the differences in individual pixels prefer the GAN-based model. In other words, these two metrics indicate that the predictions of GAN-based models are somewhat better.
The above-described conclusion based on the metric values was visually corroborated when examining the images in Figure 6 through Figure 10 in detail. This is illustrated in Figures 12 through 14, described below. This means that GAN performs better in the style-transfer part of the processing.
A zoom of a pearlite-heavy region is displayed in Figure 12. It shows that the boundaries between the pearlite and ferritic matrix are sharper in the case of both the U-Net and GAN models than in the case of the original LOM image. Both the LOM image and the U-Net prediction miss information about pearlite's inner structure, i.e., cementite laths are invisible. In the GAN prediction, on the other hand, an indication of this inner structure is present, although the orientation is mostly incorrect. Nevertheless, this indication is enough to provide a hint that pearlite is observed. This shows that apart from more training data for pearlite, a separate model may be needed. Furthermore, Figure 14 clearly shows the superiority of the GAN model over U-Net; the former presents improvement in the contrast and visibility of the secondary phases. The secondary phases become easier to separate from the matrix, and their shape is more precise and closer to SEM micrographs.
The USIBOR material is clearly the hardest to describe; see Figure 10. This is due to two factors. First, this mostly martensitic steel has the richest microstructure. Second, the dataset is not fully balanced, as indicated in Table 3; the USIBOR represents the smallest part of the training dataset. We decided not to artificially increase the augmentation of this particular material to avoid performance degradation on the other, more frequently occurring materials.
Now, let us close this section by discussing the robustness of the presented ML model. We intentionally used the "simplest" etching that is widely available. Furthermore, the range of settings used in imaging using LOM and SEM was rather narrow, though the use of the autofocus tool in the case of the LOM images ensured a certain degree of variability in the imaging conditions. On the one hand, this means this model is not very robust against such changes. On the other hand, it implies that employing the typical standard sample preparation procedure, widely available at low cost, should provide the best results. We believe that this represents a fair trade-off between demands on laboratory costs and skills of the operators (including but not limited to knowledge of which etching to use on which material to achieve the best contrast among the phases) and the applicability of the model.

Conclusions and Outlook
We presented a software-based transformation of LOM images trained on pairs of LOM and corresponding high-resolution SEM images acquired after a standard sample preparation technique (polishing and chemical etching with Nital). The resulting output of the neural network exceeds a simple "style transfer" by making some features, previously obscured in the as-acquired LOM images, more pronounced in the predicted output, i.e., it can be regarded as a super-resolution (pixel upscaling of the original LOM). The quality of the style transfer was measured with three relevant metrics (MAE, NRMSE and SSIM), comparing the predictions to the corresponding testing SEM data, which implied that the vanilla U-Net performance is worse. Furthermore, the data were analyzed by the naked eye of experts, and the findings clearly indicate improvements such as deblurring and denoising of the phase boundaries.
Thus, we are confident that the reported GAN-based transformation can improve any subsequent processing of the resulting transformed images provided the sample preparation procedure and imaging settings are reasonably close to those described in this paper. This, of course, includes semantic segmentation. As a result, we expect improvements in techniques such as machine learning-based prediction of material properties utilizing datasets combining knowledge of both the microstructure (analysis of surface micrographs) and mechanical properties of the samples. Because we kept the steel processing to a common standard, most notably etching with Nital, we believe the presented model could be successfully applied to LOM data measured by a wide range of metallographic laboratories.
A possible continuation of this work can include exploring different sample preparation techniques, attempting to improve the transformation model itself or training the model on a larger dataset when more data are acquired. A natural extension of the work presented here is to proceed with the semantic segmentation (our original motivation) and to compare the results from as-acquired LOM images to those from the predicted transformed images.

RMSE: Root MSE
SSIM: Structural similarity index measure
UQI: Universal quality index

Appendix A. Several Metrics Calculated on the Example Figures
We present values of metrics calculated for the images in Figure 6 through Figure 10 (except for Figure 9). The reference image is always the target modality, i.e., SEM (either CBS or ETD).

Figure 1. Schematics of the CBS and ETD detectors' arrangement in the HR mode for collecting backscattered electrons (BSEs) emitted from the sample.

Figure 2. Illustrations of different navigation grids: (a) a TEM grid that was glued onto the sample (early data only), and (b) a picosecond laser-engraved navigation grid; a single square subgrid and its elemental square are highlighted in red in the top-left corner. Republished from Ref. [19] with permission.

Figure 3. This schematic illustrates the step-by-step workflow that was employed to generate a final dataset of correlative images of a bulk metallographic sample captured using SEM, CLSM and LOM modalities. The process begins with the engraving of a navigation grid and culminates in the creation of the final dataset. Republished from Ref. [19] with permission.
the quantity and distribution of training examples in the final dataset.

Figure 4. The U-Net neural network architecture. Both input and output are grayscale images, and their three dimensions (pixels in both directions and the number of channels) are explicitly provided. All other single values beside individual components represent the number of features/filters in the corresponding convolutional layers except where otherwise noted. The number of pixels is clear from the operations performed; convolution is always paired with padding (using TensorFlow's function Conv2D with its parameter padding set to value 'same') at the edges to prevent pixel-count reduction. Two inputs are joined by a simple layer concatenation indicated by a circle.

Figure 5. Visualization of the discriminator part of the GAN architecture. The U-Net serves as the generator (see Figure 4) and a basic CNN (displayed) as the discriminator. The meaning of symbols is as in Figure 4. The "zoom" in the left-most part indicates an internal batching process.

Figure 6. The first row displays as-measured preprocessed data; the second row comprises transformed LOM images. As the labels indicate, the first column represents the U-Net results and the rest are GAN results. We marked some occurrences of several phases which may include ferrite (red "F"), pearlite (red "P") and (other) secondary phases. The displayed field of view represents one corner-aligned 512 × 512 tile, a quarter of a single 1024 × 1024 image in the dataset. The material is construction steel S355J2.

Figure 7. The same as Figure 6 in the case of TRIP1 steel. We marked some occurrences of several phases which may include bainite (red "B"), ferrite (red "F") and secondary phases (yellow "SP"); each highlighted example is indicated by an arrow.

Figure 8. The same as Figure 6 in the case of TRIP2 steel. Secondary phases are marked by a yellow "SP"; each highlighted example is indicated by an arrow.

Figure 9. The same as Figure 6 in the case of TRIP2 steel (another dataset). Secondary phases are marked by a yellow "SP"; each highlighted example is indicated by an arrow.

Figure 10. The same as Figure 6 in the case of boron steel for hot-stamping USIBOR.

Figure 11. Comparison of an original RGB LOM image (2464 × 2056 pixels) with the CBS-like prediction (8069 × 6745 pixels). Both images were cropped to the same field of view in order to remove minor artifacts in the prediction (due to inelastic stitching of individual predicted tiles).

Figure 12. Zoom of a region of interest in the case of S355J2 steel, top-right corner of segments in Figure 6.

Figure 13 shows that the ETD is described by the GAN model accurately.

Figure 13. The same as Figure 12 in the case of TRIP1 steel, top-right corner of segments in Figure 7. The arrows highlight visual improvements over LOM. We note that pure U-Net ETD prediction is missing as this model was not trained on ETD data.

Figure 14. The same as Figure 12 in the case of TRIP2 steel, slightly below the center in Figure 9. The arrows highlight visual improvements over LOM.

Table 1. Chemical composition [wt.%] of the steel samples considered.

Table 3. Distribution of the final training dataset (847 grayscale 8-bit LOM/CBS image pairs and 206 grayscale 8-bit LOM/ETD image pairs, all 1024 × 1024 px): absolute and relative numbers of LOM-SEM pairs in the case of the two modes, CBS and ETD.

Table A1. Values of selected metrics in the case of the CBS-based images; the data correspond to Figures 6-10.

Table A2. Values of selected metrics in the case of the ETD-based images; the data correspond to