Deep learning-based monitoring of laser powder bed fusion process on variable time-scales using heterogeneous sensing and operando X-ray radiography guidance Manufacturing

Harnessing the full potential of the metal-based Laser Powder Bed Fusion (LPBF) process relies heavily on how effectively the overall reliability and stability of the manufactured part can be ensured. To this aim, recent advances in sensorization and in the processing of the associated signals using Machine Learning (ML) techniques have made in situ monitoring a viable alternative to post-mortem techniques such as X-ray tomography or ultrasound for the assessment of parts. Indeed, the primary advantage of in situ monitoring over post-mortem analysis is that the process can be stopped in case of discrepancies, saving resources. Additionally, mitigations to repair the discrepancies can also be performed. However, the in situ monitoring strategies based on classifying processing regimes reported in the literature so far operate on signals of fixed length in time, which constrains the generalization of the trained ML model because processes with heterogeneous laser scanning strategies cannot be monitored. As a part of this work, we try to bridge this gap by developing a hybrid Deep Learning (DL) model that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) and can operate over variable time-scales. The proposed hybrid DL model was trained on signals obtained from a heterogeneous time-synced sensing system consisting of four sensors, namely back reflection (BR), Visible, Infra-Red (IR), and structure-borne Acoustic Emission (AE). The signals captured different phenomena related to the LPBF process zone and were used to classify three regimes: Lack of Fusion (LoF), conduction mode, and Keyhole. Specifically, these three regimes were induced by printing cubes out of austenitic stainless steel (316 L) on a mini-LPBF device with operando high-speed synchrotron X-ray imaging and signal acquisition with the developed heterogeneous sensing system. The operando X-ray imaging analysis ensured that the regimes correlated with the defined process parameters. During the validation procedure of the trained hybrid DL model, the model predicted the three regimes with an accuracy of about 98% across various time scales, ranging from 0.5 ms to 4 ms. In addition to tracking the model performance, a sensitivity analysis of the trained hybrid model was conducted, which showed that the BR and AE sensors carried more relevant information to guide the decision-making process than the other two sensors used in this work.


Introduction
All businesses whose production lines require prototypical high-mix, low-volume output, such as the aerospace, biomedical, automotive, and tooling industries, have banked on metal-based Additive Manufacturing (AM) techniques such as the Laser Powder Bed Fusion (LPBF) process [1]. LPBF is indeed emerging as a commercial manufacturing technology due to its ability to produce complex geometries, customized parts, and open-cell structures (3D lattices) with little material waste [2,3], resulting in a paradigm shift in the manufacturing domain [4]. The LPBF process can generally be described as follows: a metal powder layer is deposited on a build plate; upon irradiation from a laser source, this powder is heated and melts to form a melt pool which, upon rapid cooling, solidifies. After the laser scanning of the part's cross-section (also called a slice), the building platform is lowered by a predefined distance, followed by a new powder layer deposition. This process is repeated until the part is complete [5].
Though the process looks conceptually simple, the utilization of high-powered lasers to fully melt a layer of powder particles brings about complex hydrodynamic phenomena such as multiphase interaction between vapor plume, atmospheric gas, and the material, all of which are not very well understood. These interactions often increase the complications of the process via complex melt flow dynamics and the formation of by-products (i.e., spatter particles and condensates), resulting in the generation of defects in the final consolidated parts. Additionally, extremely high local heating and cooling rates result in numerous physical and thermo-mechanical phenomena taking place locally at very short time scales. Specifically, the occurrence of defects can be directly correlated to the melt pool characteristics, such as its geometry and depression morphology, and they are, in turn, a result of laser irradiation parameters and material properties [1]. Therefore, the manufacturing of parts with desired quality is contingent upon utilising carefully derived sets of process parameters (also called the process window) balancing the complex phenomena occurring during the build process [6,7]. Furthermore, finding the appropriate process window is a rather time-consuming procedure, given that it is both material- and machine-dependent. This issue represents a significant challenge for industrial application, for which the "reduced lead time" promised by additive manufacturing techniques may be questioned [8,9].
Under optimized process parameters, the melt pool is in the so-called conduction mode regime, where we observe a stable melt pool formation, corresponding stable melt flow dynamics, and subsequent stable solidification. The consolidated parts that went through conduction mode usually result in high density and low defects. Conversely, the melt pool lifetime is shortened under insufficient laser energy densities, resulting in insufficient time for forming a stable melt pool, fully adjoining and wetting the underlying solidified layer. Consequently, upon solidification, Lack of Fusion (LoF) defects appear due to incomplete overlap between adjacent melt pools or the underneath layers [10]. On the other hand, under high laser energy densities, excessive evaporation results in the formation of a deep depression zone (Keyhole) with complex melt flow dynamics, which could be followed by depression collapse and entrapment of pores in the final consolidated part under the name of keyhole porosity [11,12].
Given the above, it is clear that the applicability of AM for industrial applications depends mainly on the level of part quality assurance that can be provided [13]. The conventional characterization techniques to examine the build quality in parts/components built by LPBF include X-ray Computed Tomography (CT) or destructive microscopical analysis on the polished cross-sections. These analyses are carried out offline after the fabrication of the part. Consequently, their disadvantages are twofold: first, they are highly time-consuming, and second, as the detection of defects occurs post-mortem, the resources in terms of material and machine time are unavoidably wasted for a part that eventually fails to pass the quality check [14]. Hence, AM research communities have a solid motivation to develop real-time/in situ quality monitoring as an alternative. The primary advantage of in situ monitoring over post-mortem analysis is that the process can be stopped in case of defects, saving resources. Additionally, suitable actions can be taken to repair the defects after their occurrence. To this aim, recent approaches take advantage of the developments in sensorization and associated Artificial Intelligence (AI) algorithms for building monitoring systems capable of detecting, localizing, and helping to correct defects in real-time during the build process [15].
As seen, with numerous events happening during parts fabrication, the introduction of several sensors to capture the secondary thermal, optical, and Acoustic Emissions (AE) from the region where the laser interacts with the powder (the process zone) is fundamental, as is the interpretation of the data to understand and monitor the process [16][17][18]. Indeed, correlating undesirable events in LPBF with sensor signals will enable the development of strategies to suppress them. Furthermore, as the associated physical phenomena happening during the process could be very short-lived, the sensors used to monitor them should have a fast response time and not be susceptible to dynamic changes. To cope with that, the most common sensing technologies reported in the literature for monitoring laser-material interactions are pyrometers, high-speed cameras, near-infrared (NIR) spectrum thermography cameras, photodiodes, and AE sensors.
In the literature, several approaches can be found, differing mainly in the adopted sensing strategy. For example, Chivel and Smurov developed a monitoring system based on pyrometers [19], which allows tracking the surface temperature profile variations of the process zone [20]. The surface temperature measurements carried out by pyrometers can also help understand the melt pool's solidification mechanism [21,22]. However, their main downside is the inability to provide spatially resolved information, as the provided temperature reading is integrated over a region (the so-called field of view). Unlike pyrometers, camera-based sensing systems with CCD or CMOS detectors provide more comprehensive information about the melt pool morphology and temperature profile [23][24][25][26]. Additionally, high-speed imaging techniques can also be used to gain a fundamental understanding of dynamic fluctuations such as spatter or melt pool size fluctuations [27,28]. For these reasons, camera detectors capable of capturing the IR radiation with high data capture rates have started replacing pyrometers [17]. Infrared cameras have been used to detect defects, such as LoF pores caused by insufficient heat dissipation [29]. For example, Bartlett et al. [30] have demonstrated that full-field infrared (IR) thermography can effectively predict LoF defects. With the help of infrared imaging of the melt pool, it has also been demonstrated that unstable behaviors can be identified [24,31]. Similar to camera imaging, photodiodes have been reported to monitor the LPBF process in both off-axis and coaxial modes [32,33]. Moreover, photodiode sensors are preferred over camera imaging due to their higher sampling rate, minimal computation requirements, and lower cost [34,35]. For example, Egan et al. employed two coaxial photodiodes to capture plasma and IR emissions and confirmed that their signals are correlated with defective layers containing pores created during the LPBF process [36]. In addition, the existence of a linear relationship between the energy deposited to create the melt pool and the photodiode sensor data has been reported in the literature [37][38][39]. Besides, the correlation between tensile properties of builds produced using AM and off-axial photodiode readings has been established [40]. Berumen et al. [35] also demonstrated that machine faults that occur during the printing job of the part could be detected with the help of a photodiode.
With melt pools 50-250 µm wide [41,42] appearing over time scales of roughly 10-100 μs in the LPBF process [43][44][45], in situ monitoring systems based on visual and optical sensors require high spatial and temporal resolution, which makes them very expensive. In this regard, AE sensors, both structure-borne and air-borne, are a suitable alternative for monitoring as they possess good temporal resolution and are also economical [46][47][48]. Gutknecht et al. [49], as an example, showed that acoustic signals from the process zone of the AM processes tend to be 40 times more sensitive compared to imaging techniques and are still 15 times more sensitive than pyrometer techniques. Pandiyan et al. [50] demonstrated that information inside the air-borne acoustic signals from different LPBF regimes exhibited different characteristics in time, frequency, and time-frequency domains. Furthermore, the identification of the location of micro defects such as pores and microcracks in the LPBF process has been demonstrated based on burst-type AE events [51]. Finally, more recent works started to focus on multiple sensing systems, such as the combination of photodiodes, pyrometry, and camera systems, to capture the advantages of each of them [17,52]. A comprehensive review of in situ sensing methodologies for AM techniques can be found in the literature [17,18,53,54].
Understanding patterns in the multifaceted information from the LPBF process zone extracted through the sensors will help characterize the physics of these mechanisms and build a comprehensive monitoring system with high reliability [55]. However, the highly dynamic nature of laser-material interaction makes it difficult for human operators to extract patterns from the raw sensor data and make the right decision in real-time, given the high dimensionality of the space in which these data live [56]. On the contrary, Machine Learning (ML) algorithms can accurately model complex nonlinear problems, even by deriving knowledge from raw sensor data [57]. A comprehensive review of ML algorithms trained in supervised, semi-supervised, and unsupervised manners for monitoring the AM processes has been reported in the literature [58]. Specifically, data from optical and thermal sensors have been used to train conventional ML algorithms such as K-Nearest Neighbor (KNN) and Decision Tree (DT) to estimate the part quality [14]. Khanzadeh et al. [59] proposed a multilinear principal component analysis (MPCA) approach to extract low dimensional features from thermal maps to monitor the AM process. Gobert et al. [60] confirmed that a linear support vector machine (SVM) could determine the quality of the individual layer based on digital images. Gaussian Mixture Models (GMM) have also been reported to identify the build quality using randomized Singular Value Decomposition (SVD) features extracted from a photodiode sensor [61]. A linear SVM classifier effectively classified different processing regimes in LPBF using statistical features computed on acoustic signals [62]. Unlike conventional ML algorithms that require preprocessing of the data prior to training, algorithms based on Deep Learning (DL) have been used recently to monitor the LPBF process [63,64].
DL algorithms based on Convolutional Neural Networks (CNNs) have been demonstrated to detect defects such as delamination and spattering based on captured IR images [65]. Similarly, CNNs have been trained with melt pool images corresponding to various processing regimes to detect the associated defects [66]. A CNN trained on the powder bed images before laser scanning has been reported to predict anomalies induced by the recoater blade in the LPBF process [67,68]. A spectral CNN was trained on acoustic signals corresponding to different process parameters that induce different concentrations of pores to distinguish the corresponding build qualities [16]. Semi-supervised DL algorithms have also been implemented to detect and classify anomalies in the LPBF process [69,70]. A defect detection system based on a deep belief network (DBN) and microphone data has been successfully developed to classify several processing regimes [71]. Digital camera images were combined with a deep residual neural network and region proposal network to detect several defects that may occur during fabrication, namely warpage, delamination, and short feed [72]. Caggiano et al. [73] have combined the powder bed and process zone images with a Bi-stream CNN to evaluate the process quality.
It has to be noted that most monitoring strategies reported in the literature using ML are based on signals of fixed length in time (fixed input size), which constrains the application of the trained models across different scanning lengths. As a part of this work, we try to bridge this gap by using a hybrid DL model consisting of CNNs and Long-Short term memory (LSTM). Apart from the AI algorithm that operates over various time scales, this work also introduces a monitoring system comprised of a heterogeneous sensing system consisting of four sensors, namely back reflection (BR), Visible, IR, and AE, measuring different aspects of the LPBF process when printing cubes out of 316 L stainless steel powder. In addition, high-speed synchrotron X-ray imaging was used to validate the occurrence of different operating regimes. The paper is organized into 5 Sections. Section 1 presents a brief literature review of the LPBF processing regimes, sensing techniques, and ML algorithms used for real-time LPBF process monitoring. Section 2 gives an overview of the proposed hybrid DL model. Section 3 describes the LPBF experimental setup, processing parameters, and data acquisition setup. Section 4 presents and discusses the prediction results using the heterogeneous sensing system using hybrid DL architectures. Finally, Section 5 summarizes this investigation's findings and the future works on in situ monitoring for the LPBF process.

Theoretical basis
One of the most impressive aspects of the convolutional operation in a neural network is the ability to exploit the degree of invariance in translation that most signals naturally have. Indeed, it is intuitively evident that, once a meaningful representation has been found in a specific portion of a signal, the same representation should be searched for everywhere else in the signal, as it represents a distinct pattern. CNNs embody this idea by applying the same linear local transformation (the convolution) to the totality of the signal, using trainable filters [74]. As a result, a trained CNN model tends to have a well-represented hierarchy of features for data distributions close to the ones seen during training [75]. However, the major drawback of neural networks based on CNNs is that they only accept input data with a fixed size, and they process them all at once to produce a fixed amount of output data each time. This processing scheme implies that they cannot be employed for data with different lengths [75]. Unlike CNNs, Recurrent Neural Networks (RNNs) do not process all the input data simultaneously. Instead, they process the input data one data point (the smallest unit into which a signal can be divided) at a time, treating the input signal as a sequence. Indeed, the RNN performs its computation on the first element of the input sequence before producing an output [76]. The output, known as the hidden state, is then combined with the following input in the sequence to produce another output. This computation continues until the model has encountered all the elements in the sequence, so that the final output depends on all the sequence's elements. The computational unit that performs the operations on the current sequence element and hidden state is called the RNN cell, and it is reused at each time step. This mechanism enables RNNs to exploit dynamically changing temporal information from the input sequences for decision-making [77].
There are many variants of RNNs, such as Vanilla RNNs, Gated Recurrent Units (GRU), LSTM, and Bi-directional LSTMs [78].
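As a minimal sketch of the recurrence described above, assuming a vanilla RNN cell with a tanh activation (the symbols $W_x$, $W_h$, and $b$ are illustrative and are not taken from this work), the hidden state at step $t$ can be written as:

```latex
h_t = \tanh\left(W_x x_t + W_h h_{t-1} + b\right), \qquad t = 1, \dots, T, \qquad h_0 = 0
```

Because the same weights are reused at every step, the number of steps $T$ does not need to be fixed in advance, and the final state $h_T$ depends on every element of the input sequence.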
Given the CNNs' ability to find patterns in the input data and the RNNs' capacity to discover temporal relationships regardless of the sequence duration, combining both helps develop hybrid DL models with interesting properties [79]. The combination of such networks has been applied to a variety of tasks such as forecasting [80], classification [81], and sentiment analysis [82]. As far as this work is concerned, we have built a DL architecture combining a CNN and an LSTM block, namely CNN-LSTM, a network that can flexibly operate over variable time scales. The LSTM network was chosen over other variants of RNN as it can learn very long-term dependencies [83]. The proposed hybrid DL model is schematized in Fig. 1. As can be seen, the CNN acts as the front-end for the proposed model by processing the input data to extract features from them. As CNNs preserve the signal structure, the processed data is flattened (converted into a vector) before being fed to the RNN. The RNN then learns the temporal relationship in the data irrespective of the vector size and performs the decision-making task by outputting a class. Notice that, by combining a CNN (susceptible to the input data size) and an RNN (not affected by the input size), we were able to achieve one of our goals, which is to have an ML model that can predict on inputs with variable time-scales.
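The idea of a convolutional front-end feeding a recurrent block can be illustrated with the following minimal sketch. It is not the architecture of Fig. 1: it assumes a single input channel, one convolution filter, a small LSTM cell, and random untrained weights, and all names (`conv1d`, `lstm_last_hidden`, `classify`) are hypothetical. Its only purpose is to show that inputs of different lengths yield feature sequences of different lengths but a class-score vector of fixed size.

```python
import math
import random

def conv1d(signal, kernel, stride):
    """Valid 1D convolution (cross-correlation) followed by a ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(0, len(signal) - k + 1, stride)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_last_hidden(sequence, weights, hidden_size):
    """Run one LSTM cell over a sequence of scalars; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        z = [x] + h + [1.0]  # current input, previous hidden state, bias term
        gates = {name: [sum(w * v for w, v in zip(row, z)) for row in weights[name]]
                 for name in ("i", "f", "o", "g")}
        for n in range(hidden_size):
            i_g = sigmoid(gates["i"][n])   # input gate
            f_g = sigmoid(gates["f"][n])   # forget gate
            o_g = sigmoid(gates["o"][n])   # output gate
            g = math.tanh(gates["g"][n])   # candidate cell state
            c[n] = f_g * c[n] + i_g * g
            h[n] = o_g * math.tanh(c[n])
    return h

def classify(signal, kernel, weights, out_weights, hidden_size):
    """CNN front-end, LSTM over the feature sequence, linear read-out per class."""
    features = conv1d(signal, kernel, stride=4)
    h = lstm_last_hidden(features, weights, hidden_size)
    return [sum(w * v for w, v in zip(row, h)) for row in out_weights]

# Random, untrained weights (assumption: illustration only).
rng = random.Random(0)
H = 4
kernel = [rng.uniform(-1, 1) for _ in range(8)]
weights = {name: [[rng.uniform(-0.5, 0.5) for _ in range(H + 2)] for _ in range(H)]
           for name in ("i", "f", "o", "g")}
out_weights = [[rng.uniform(-1, 1) for _ in range(H)] for _ in range(3)]

short_signal = [math.sin(0.1 * t) for t in range(200)]
long_signal = [math.sin(0.1 * t) for t in range(800)]
scores_short = classify(short_signal, kernel, weights, out_weights, H)
scores_long = classify(long_signal, kernel, weights, out_weights, H)
```

Both calls return three scores (one per regime), even though the convolution produces 49 feature values for the short signal and 199 for the long one.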

Experimental setup
The operando X-ray imaging experiments were carried out at the TOmographic Microscopy and Coherent rAdiology experimenTs (TOMCAT) beamline at the Swiss Light Source (SLS) utilizing a mini-LPBF device designed and built at the Paul Scherrer Institute (PSI) [84,85]. This setup was developed in order to be implemented at synchrotron X-ray diffraction and radiography beamlines for operando experiments while printing 3D structures under conditions very close to commercially available LPBF devices. The mini-LPBF device is equipped with two glassy carbon windows in the front and back of the chamber, transparent to high-energy X-rays, allowing the incoming X-ray beam to access the powder bed through the back window and the transmitted X-ray beam to reach the detector placed outside the chamber via the front window. Fig. 2 displays the main components of the machine. The chamber has a continuously pulsed laser beam, with a pulse repetition rate of 250 kHz (redPOWER, SPI Lasers Ltd, UK), operating at a 1070 ± 10 nm wavelength with a maximum power of 500 W and a beam quality factor of M² < 1.1. The laser beam is collimated as a parallel Gaussian beam into a 2-axis deflection scanning unit (SuperScan III, Raylase GmbH, Germany). The latter employs two fused silica mirror galvanometers to scan the laser beam over the powder bed. An F-theta lens (Sill Optics, Germany) with 163 mm focal length is used to focus the laser beam to a spot size of ø 45 µm at 1/e². The laser and scanning unit is piloted using an SP-ICE-3 board and WeldMARK software (Raylase GmbH, Germany). During and before the operation, the chamber is continuously flushed with high purity Argon gas (99.996%), and the oxygen level is monitored, reaching concentrations as low as 0.2%. The chamber employs a re-coater mechanism for powder deposition between each layer. A more detailed description of this mini-LPBF device can be found in [84,85]. The miniaturized LPBF device is mounted on a devoted stage and tilted by 20 degrees with respect to the X-ray beam direction (Fig. 2). As illustrated in Fig. 2(c), the edge of the powder bed is illuminated by a parallel X-ray beam with energies ranging between 10 and 55 keV. The transmitted beam is recorded with a custom-made microscope with 4x magnification [86] coupled to the in-house developed GigaFRoST detector [87]. The experiments are performed at an acquisition frequency of 10 kHz. All the acquired data was processed using ImageJ software.
Fig. 2. Schematic view of the operando radiography setup at the TOMCAT beamline. a), b) Build chamber of the mini-LPBF machine and the camera in two different views, and c) zoom on the build plate and the printed sample to highlight the volume that is probed during the operando experiment. During LPBF processing, a 2-axis scanning head (1) deflects the laser beam (2) onto a 12 × 12 mm² build plate (3) and the sample (4). A parallel X-ray beam (5) passes through the sample (c) and reaches the microscope (6).

Material, processing conditions, and operando X-ray analysis
A gas atomized 316 L stainless steel powder acquired from Oerlikon Metco, with the chemical composition listed in Table 1 and a particle size distribution ranging from 15 to 45 µm, was used in this study. The morphology of the 316 L feedstock powder was dominantly spherical with the presence of occasional satellites.
A cuboid sample (width: 2 mm, length: 8 mm) was printed using the previously described mini-LPBF device. The first 30 layers were built with optimized parameters (see Table 2) for minimal porosity content. The build-up of these initial layers was required to reach a sufficient height for the X-ray beam to be transmitted through the sample without being obstructed by the edge of the build plate, as illustrated in Fig. 2(c). Following this preliminary "build-up" step, three different processing regimes were investigated by building successive layers on the same sample, in the following order: Keyhole, conduction mode, LoF. This order was selected carefully so that the signatures of each regime would not be removed by a higher energy regime (corresponding to a deeper melt pool) in subsequent layers. A bidirectional and parallel scanning strategy was employed with the layer thickness set to 30 µm. This value provides an appropriate balance between powder particle size and melt pool depth, allowing good sintering between layer n and layer n-1. Three sets of parameters were defined for each regime to cover an appropriate region of the process parameter space. To easily observe the different regimes by X-ray radiography, a low range of scanning speeds was chosen. For each regime, mainly the changes in power and speed were investigated. The hatch distance was adapted according to theoretical melt pool sizes to reach the desired regime. Each parameter set was repeated four times to have adequate data for further processing.
The process parameters such as power and speed were chosen using the concept of normalized enthalpy ΔH (Eq. (1)) as a way to predict the normalized melt pool depth d (Eq. (2)) and associated processing regime [12,88].
where ΔH/Δh is the ratio between the input and the dissipated energy, α is the absorptivity of the powder material, and P is the laser power [...]. From Fig. 3, it was determined that the LoF, conduction mode, and keyhole regimes correspond to normalized enthalpy values below 21-22, between 26 and 28, and above 40, respectively. The values of normalized enthalpy used in the present experiment for each regime are reported on the processing map shown in Fig. 3(a). The operando radiography measurements confirmed the occurrence of the three processing regimes. Representative radiographs are shown in Fig. 3(b) for the different regimes. Additionally, videos of keyhole formation are available at https://c4science.ch/diffusion/12010/.
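The mapping from normalized enthalpy to processing regime stated above can be written as a simple lookup. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the upper LoF bound is taken as 21 (the conservative end of the quoted "21-22"), and values falling between the quoted ranges are returned as unassigned because the text does not attribute them to a regime.

```python
def regime_from_normalized_enthalpy(value, lof_max=21.0,
                                    conduction_range=(26.0, 28.0),
                                    keyhole_min=40.0):
    """Return the regime label for a normalized enthalpy value, or None."""
    if value < lof_max:
        return "LoF"
    if conduction_range[0] <= value <= conduction_range[1]:
        return "conduction"
    if value > keyhole_min:
        return "keyhole"
    return None  # between the quoted ranges: not assigned in the text

labels = [regime_from_normalized_enthalpy(v) for v in (15.0, 27.0, 33.0, 45.0)]
```

Here `labels` evaluates to `["LoF", "conduction", None, "keyhole"]`.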

Post-mortem tomography and microscopy analysis
Post-mortem analysis of the manufactured samples was performed to confirm each regime's occurrence and correlate the process parameters to the presence of specific types of defects. First, X-ray tomography (Fig. 4(a) and (b)) was carried out on the sample to evaluate the distribution of defects in its entire volume. Then, a microstructure analysis was performed by sectioning the sample perpendicular to the scanning direction, followed by grinding and polishing of the surface down to 1 µm. The melt pools were revealed by etching using aqua regia (100 mL H₂O, 75 mL HCl, 25 mL HNO₃) for 180 s. Representative optical micrographs of each regime are shown in Fig. 4(c). As illustrated in Fig. 4, deep melt pools and numerous spherical pores typical of the keyhole regime are visible in the region where high normalized enthalpies were applied. The melt pools are shallower in the conduction mode region, and defects are significantly reduced. Finally, large LoF pores are observed in the upper part of the specimen, indicating insufficient energy input delivered to the material. These results confirm that the chosen parameters based on a pre-existing processing map (Fig. 4) successfully induced each of the three processing regimes (Keyhole, conduction mode, and LoF).

Heterogeneous sensing and data acquisition
In this work, we perform in situ sensing for predicting three processing regimes (Keyhole, conduction, LoF, see Section 1) via four sensors, namely an AE sensor and three photodiode detectors. The purpose of using multiple sensors was to utilize the secondary thermal, optical, and acoustic emissions from the process zone for decision making. Fig. 5 shows the mini-LPBF setup used in this work equipped with sensors. The AE sensor PICO HF-1.2 (Physical Instruments, US) is a lightweight miniature structure-borne sensor. It was mounted at the bottom of the base plate in reasonable proximity to the laser-material interaction zone to capture AE events inside the material, as shown in Fig. 5. The sensitivity range of the sensor is 500-1850 kHz. Three photodiode detectors and corresponding optics from Thorlabs were used to look at different aspects of the optical and thermal emissions originating from the top surface. A fixed focus collimator (F220SMA-980) was installed in an off-axial configuration, as shown in Fig. 5, to collect the optical emissions from the melt pool. The collimator was pointed towards the region to be monitored (i.e., the top surface of the build plate) at an angle of 30° and a distance of 5 cm. The collected optical [...] splitter, whose ends were connected to InGaAs (PDA20CS2, 800-1700 nm) and Si (PDA100A2) photodiodes, respectively. The InGaAs photodiode, operated at an amplification level of 30 dB, was equipped with a long-pass optical filter (FELH1100) to allow sensing wavelengths above the laser radiation. On the other hand, the Si photodiode was used to detect optical emission in the visible and NIR range, thanks to a short-pass optical filter (FESH0950). Due to the weak signals in this spectral range, a high amplification level of 60 dB was used. The output of all three photodiode detectors was an analog voltage with a dynamic range of ± 5 V.
All photodiodes have programmable gain that was optimized based on the processing conditions. The schematic of the heterogeneous sensor setup is given in Fig. 6. The signals from the four sensors were acquired separately using four channels of the Advantech Data Acquisition (DAQ) card at a sampling rate of 3 MHz using customized software developed based on the C# framework. The sampling rate of 3 MHz was chosen across all four channels to ensure that the Nyquist-Shannon theorem [90] is satisfied for the signals considered in this work. The sensitivity of the PICO HF-1.2 AE sensor was considered up to 1500 kHz in this work; thus, a sampling frequency of 3 MHz (2 × 1500 kHz, satisfying the Nyquist-Shannon theorem) was used. Out of the four channels in the DAQ card, channels 0, 1, and 2 were assigned to capture the analog signals from the photodetectors configured to capture the BR, IR, and visible range, respectively. Channel 3 was assigned to record the AE signals from the structure-borne sensor. In addition, the signal acquisition was automatically triggered for all channels based on the photodiode detector that oversees the BR light. The underlying principle of the trigger is that, once the laser radiation hits the powder bed, the photodiode detector catches the increase in light intensity around the wavelength of the incident laser and produces an analog voltage. As soon as the voltage crosses a user-specified threshold (in this work, 1.0 V), the data acquisition starts, and the files are stored separately for further offline analysis. Since the acquisition starts in parallel across all four channels, they are synchronized by default.
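The sampling-rate check and the threshold trigger described above can be sketched as follows. This is an offline illustration only, not the C# acquisition software: the function names and the toy voltages are hypothetical, while the 3 MHz sampling rate, the 1500 kHz upper frequency, and the 1.0 V threshold are the values quoted in the text.

```python
SAMPLING_RATE_HZ = 3_000_000   # DAQ sampling rate stated in the text
MAX_AE_FREQ_HZ = 1_500_000     # highest AE frequency considered in the text
TRIGGER_THRESHOLD_V = 1.0      # trigger level stated in the text

def satisfies_nyquist(fs_hz, f_max_hz):
    # Equality is accepted here, mirroring the text (3 MHz = 2 x 1500 kHz).
    return fs_hz >= 2 * f_max_hz

def trigger_index(br_voltage, threshold_v=TRIGGER_THRESHOLD_V):
    """Index of the first BR sample above the threshold, or None if never crossed."""
    for i, v in enumerate(br_voltage):
        if v > threshold_v:
            return i
    return None

def acquire(channels, threshold_v=TRIGGER_THRESHOLD_V):
    """Cut every channel at the BR trigger so that all channels share one start."""
    start = trigger_index(channels["BR"], threshold_v)
    if start is None:
        return {name: [] for name in channels}
    return {name: samples[start:] for name, samples in channels.items()}

# Hypothetical toy recording: the BR voltage exceeds 1.0 V at sample 3.
raw = {
    "BR": [0.1, 0.2, 0.9, 1.2, 1.5, 1.4],
    "IR": [0.0, 0.0, 0.1, 0.6, 0.8, 0.7],
    "Visible": [0.0, 0.0, 0.0, 0.3, 0.4, 0.4],
    "AE": [0.0, 0.01, -0.02, 0.3, -0.25, 0.2],
}
synced = acquire(raw)
```

Because all four channels are cut at the same index, the retained samples stay aligned in time, which is the property the common trigger is meant to provide.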

Fig. 5. Mini-SLM setup as described in [84,85] equipped with optical detectors and AE sensor.
Fig. 6. Schematics of the proposed heterogeneous sensing system, comprising photodiode detectors and an AE sensor.
Fig. 7 shows the mean trend line computed across ten windows of the BR signals corresponding to the three regimes in the data set, with a window size of 1.65 ms. Comparing the mean and standard deviation computed across the regimes in Fig. 7 suggests that the statistical distributions of the BR signals are distinct. The increase in the intensity of the BR signal collected in the keyhole regime compared to the LoF and conduction mode could be attributed to the fact that, after multiple reflections of the laser wavelength in the vapor cavity, more of the emission is incident on the collimator in the off-axial configuration.

Dataset and methodology
The distribution of the Root Mean Square (RMS) feature, computed for each 1.65 ms window of the emissions in the visible wavelength range and shown in Fig. 8(a), suggests an evident statistical difference among the three regimes. Similarly, the skewness feature computed for the IR signals with a window size of 1.65 ms showed a distinct distribution for each of the three regimes, as shown via bar plots in Fig. 8(b).
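As an illustration of the two window-level features mentioned above, the following sketch computes the RMS and the skewness of a single window of samples. The skewness estimator used here is the biased moment ratio m3 / m2^1.5, which is an assumption: the source does not state which estimator was used.

```python
import math


def rms(window):
    """Root Mean Square of one window of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def skewness(window):
    """Biased sample skewness m3 / m2**1.5 (assumed estimator)."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n
    m3 = sum((x - mean) ** 3 for x in window) / n
    # A constant window has zero variance; its skewness is reported as 0.
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0
```

Applied to every 1.65 ms window of a channel, each function yields one feature value per window, and it is the distribution of these values that can be compared across regimes.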
Furthermore, the AE signal energy was decomposed into five arbitrarily chosen energy bands using the periodogram method. From Figs. 7-9, it is evident that the multifaceted information extracted from the LPBF process zone differs across regimes and could be used as an input to train models for in situ monitoring of the process state.
Data preparation is imperative to develop a CNN-LSTM model that can work regardless of the time duration of the input signals. Therefore, the time-synced signals from the four sensors, corresponding to each printed layer and representing the three different build regimes, were split into four different running windows (w1, w2, w3, and w4), whose time durations are 0.83, 1.65, 2.50, and 3.30 ms, respectively. The windows of the respective lengths were computed across all three processing regimes without overlaps. The window lengths, i.e., the sizes of the inputs used to train the hybrid DL network, were chosen based on the authors' previous works [50,70,91]. The choice of window length, i.e., the number of data points used to train the CNN-LSTM network, is not related to the sampling frequency but rather to the resolution of the monitoring strategy. The data preparation was performed as an offline process. The workflow to build the dataset from operando experiments is illustrated in Fig. 10. A detailed description of the dataset is given in Table 3.
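The non-overlapping windowing step can be sketched as follows. The sample counts per window (2'500, 5'000, 7'500, and 10'000) are taken from the architecture description later in the paper; the function name `split_layer` and the choice to drop an incomplete trailing window are assumptions made for illustration.

```python
# Samples per window for w1..w4 at the 3 MHz sampling rate.
WINDOW_SAMPLES = {"w1": 2500, "w2": 5000, "w3": 7500, "w4": 10000}


def split_layer(layer, n_samples):
    """Split one layer's time-synced channels into non-overlapping windows.

    layer: list of channels (e.g. BR, IR, visible, AE), each a list of samples.
    Returns a list of windows, each holding one n_samples-long slice per channel.
    An incomplete trailing window is dropped (assumption).
    """
    length = min(len(channel) for channel in layer)
    n_windows = length // n_samples
    return [
        [channel[i * n_samples:(i + 1) * n_samples] for channel in layer]
        for i in range(n_windows)
    ]
```

Slicing all channels with the same indices keeps the four sensors aligned within each window, so a window can be labeled with the regime of the layer it came from.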
The development of the hybrid model consists of two phases: training and testing. Before model training, the time-synced signals corresponding to the four window lengths (w1, w2, w3, and w4) from the whole dataset with ground truths are concatenated to form a four-dimensional tensor, which is the input for the CNN-LSTM model. The dataset is split stochastically into 70% for training and 30% for testing. The full dataset, including all the windows of all time durations, is served by four different data loaders (DL 1, DL 2, DL 3, and DL 4), each dedicated to loading windows of a single time duration. All four data loaders are put into a randomizer so that only one data loader is used at a time after initialization. In the training phase for differentiating the three processing regimes, the randomizer is initialized separately at each epoch (a full pass over the complete training set), so the model is trained with windows of only a single time duration at each epoch. Since the randomizer is independent of the data, all four data loaders share the total number of epochs to train the model equally. As this is a classification problem, the cross-entropy loss is backpropagated to alter the weights to minimize the classification loss. As stated in Section 2, the coupling of the CNN with the LSTM allows the model to be trained on windows of variable time duration. After training, the 30% of the data reserved for testing is used to check the prediction accuracy of the model. The schematic flow of the training strategy is illustrated in Fig. 11.
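The per-epoch randomizer can be sketched as a uniform random choice among the four data loaders. This is a minimal illustration, assuming a uniform draw at the start of each epoch; the function name `loader_schedule`, the loader labels, and the seed are hypothetical and do not come from the authors' code.

```python
import random
from collections import Counter


def loader_schedule(n_epochs, loader_names, seed=0):
    """Pick one data loader per epoch, uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(loader_names) for _ in range(n_epochs)]


# One loader per window duration; a full training run uses 800 epochs.
schedule = loader_schedule(800, ["DL1", "DL2", "DL3", "DL4"])
share = Counter(schedule)  # number of epochs assigned to each loader
```

With a uniform draw, each loader receives roughly a quarter of the epochs, without the split being forced to be exactly equal.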

Architecture and training
The architecture proposed in this work consists of two neural network types, namely CNN and LSTM (see Section 2), which were trained together. The design of the CNN network consists of five convolution layers, as illustrated in Fig. 12. During training, the first layer of the CNN model takes an input tensor of size 200 (batch size) × 4 (number of sensors) × one of [2'500, 5'000, 7'500, 10'000], depending on the specific data loader that is randomly chosen at the beginning of each epoch. A kernel size of 16 was used across all five 1D convolution layers. The number of kernels of the first convolution was four and was subsequently doubled up to the fourth layer. Finally, the fifth layer was configured to use 10 kernels. The output of the CNN was then passed into the LSTM block. The LSTM block, with one recurrent layer consisting of ninety hidden states, operates on the output of the CNN as a sequence. The newly computed hidden states from the LSTM are further connected through a linear layer for classifying the input signal into three categories labeled LoF, conduction mode, and Keyhole. The cross-entropy was used as the loss function since we are dealing with a classification problem. The total number of parameters to be trained in the hybrid CNN-LSTM model was around 91 thousand. Both CNN and LSTM models were developed with the PyTorch framework, and the 1D convolutions, LSTM blocks, data loaders, activations, and max pooling operations were performed using the built-in PyTorch libraries [92]. Nonlinearity was introduced into the model using the Rectified Linear Unit (ReLU) as the activation function.
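To see why the same network can accept all four window lengths, it helps to track the length of the sequence that the CNN hands to the LSTM. The sketch below does this with plain arithmetic. The kernel size of 16 and the five layers come from the text; the stride of 1, the absence of padding, and a max pooling of size 2 after every convolution are assumptions made purely for illustration.

```python
def conv1d_out_len(length, kernel=16, stride=1, padding=0):
    """Output length of a 1D convolution (standard formula, dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def cnn_seq_len(length, n_layers=5, kernel=16, pool=2):
    """Sequence length after n_layers of convolution followed by max pooling."""
    for _ in range(n_layers):
        length = conv1d_out_len(length, kernel)
        length //= pool  # max pooling with kernel = stride = pool (assumed)
    return length


# Sequence lengths seen by the LSTM for the four training window sizes.
seq_lens = {n: cnn_seq_len(n) for n in (2500, 5000, 7500, 10000)}
```

Under these assumptions the CNN output has a fixed number of channels (10 kernels in the fifth layer) but a length that grows with the window. An LSTM processes sequences of arbitrary length, so only its final hidden state, of fixed size, needs to reach the linear classifier.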
A batch size of 200 was selected during the model training. The training parameters of the hybrid CNN-LSTM model are listed in Table 4. Batch normalization was applied across the CNN layers to reduce overfitting and speed up training. Furthermore, for the same reasons, the datasets were shuffled across epochs, and a dropout of 0.2 was applied during training. The number of windows per class differs among classes, i.e., the dataset is imbalanced, as shown in Table 3. The keyhole regime class has a higher count than the other two classes (LoF and conduction mode), biasing learning towards the dominant keyhole class. To improve the training of the hybrid model and remove the bias towards the more frequent class, the classes are balanced using a weighted loss. In the loss term, the weights of the less frequent classes are scaled to a higher value and those of the more frequent classes to a lower value, based on Eq. (3). Weights were computed in this way for the three classes: LoF, conduction mode, and keyhole regime. The optimizer for the training was stochastic gradient descent with a momentum of 0.9 and a learning rate of 0.01, and the total number of epochs was 800. Additionally, the model's training was stabilized by reducing the learning rate by 70% after every 100 epochs, as shown in Fig. 13(a). At the beginning of each epoch, a randomizer randomly selected a single data loader among the four available ones (differing by the time duration of the loaded windows). During the training, it was also ensured that the randomizer gave equal weight to each window length across the epochs. Fig. 13(b) shows that, after the training of the hybrid model, the 800 epochs were almost equally shared across the four different window lengths.
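The step-wise learning rate decay and the idea behind the class weighting can be sketched as follows. The decay follows the text (initial rate 0.01, reduced by 70% every 100 epochs). The weighting formula is an assumption: Eq. (3) is not reproduced in this section, so the common inverse-frequency rule is used as a stand-in, and the class counts in the usage example are hypothetical.

```python
def learning_rate(epoch, base_lr=0.01, drop=0.7, step=100):
    """Learning rate at a given (0-indexed) epoch with a 70% drop every 100 epochs."""
    return base_lr * (1.0 - drop) ** (epoch // step)


def class_weights(counts):
    """Inverse-frequency class weights: total / (n_classes * count).

    Stand-in for Eq. (3), which is not available here (assumption).
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Hypothetical window counts, for illustration only.
weights = class_weights({"LoF": 100, "conduction": 100, "keyhole": 200})
```

With these illustrative counts, the dominant keyhole class receives a weight below 1 and the two rarer classes a weight above 1, which is the behavior the text describes for the weighted loss.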
Furthermore, elastic net regularization was introduced to guide the model towards learning a less complex mapping based on the parameters listed in Table 4. Weight initialization was also performed to prevent the layer activation outputs from exploding and the gradients from vanishing. Since ReLU activation was used, the network was initialized with the Kaiming initialization method [93]. The hyperparameters for the model were determined after an exhaustive search. The model was trained on two hardware-accelerated Graphical Processing Units (GPUs), namely Nvidia RTX Titan, with a dedicated memory of 24 gigabytes, integrated inside a Lambda (Lambda Labs, US) workstation. Training the CNN-LSTM took 4.5 h. Fig. 14 shows the accuracy and loss curves of the CNN-LSTM model trained on the data from the heterogeneous system with four window lengths. From the loss and accuracy curves over the 800 epochs, it is evident that the CNN-LSTM model has learned the distributions corresponding to the three ground-truth labels over the four considered signals. The accuracy and loss trends saturate after 400 epochs, with no significant performance improvement thereafter. However, there are occasional peaks in the loss curves, which we suspect are due to the change in window length from one epoch to the next. Table 5 shows the classification accuracy of the trained CNN-LSTM on windows of four different time durations simultaneously using 3-by-3 confusion matrices for the three regimes. 30% of the labelled [...] Table 5, is 98.2%. The classification error is 1% with the category conduction mode and 0.1% with Keyhole. As shown in the confusion matrices, the model's overall accuracy ranged from 98.2% to 99.9%. Comparing accuracies across window lengths, it was found that the accuracy with the smallest time duration (0.83 ms) was lower than with the larger window lengths (1.65, 2.50, and 3.30 ms).
Conversely, the accuracies increased with the window length. This analysis shows that there is information loss in smaller window lengths for processing regime prediction compared to the larger window lengths.

The generalization of the trained CNN-LSTM model with respect to the window length was validated by analyzing the model's prediction accuracy with window lengths that were not used during training. For this study, five window lengths (0.5, 1.33, 2, 3, and 4 ms) were arbitrarily selected. Of these, three window lengths of 1.33, 2, and 3 ms, corresponding to 4'000, 6'000, and 9'000 data points, were within the range of window lengths that the CNN-LSTM model was trained on. The remaining two window lengths, 0.5 and 4 ms, corresponding to 1'500 and 12'000 data points at the 3 MHz sampling rate, were outside the range of window lengths the model was trained on. The 30% test split was also divided into these five window lengths with ground-truth labels separately to check the model's prediction. Table 6 shows the 3-by-3 confusion matrices depicting the classification accuracy of the trained CNN-LSTM on the five different arbitrarily selected windows. The overall prediction accuracy of the model in these window lengths ranged from 96.5% to 99.9%. Consistent with the previous findings reported in Table 5, there was a drop in the accuracy as the window length decreased. However, the global accuracy on different window lengths suggests that the model generalizes well across the different lengths.

Sensor ranking
The main advantage of such multimodal analysis (made up of multiple sensors, four in our case) is a better adaptation to different situations and more comprehensive decision-making. Indeed, a sensor can provide more information in specific conditions and be less informative in others. As shown by Shevchik et al. [15], the highest accuracy is obtained when combining multiple sensors at once. Nonetheless, it is worthwhile to quantify which sensor carries the most information in the decision-making process. To this end, the saliency of each input sample is computed as the absolute gradient of the maximally excited output unit with respect to the input sample:

$$\mathrm{saliency} = \left| \nabla_{\mathrm{inputSample}} \max(\mathrm{outUnits}) \right|$$

where $\mathrm{outUnits} \in \mathbb{R}^{N}$, and $N$ is the number of classes considered (3, in this case). $\mathrm{inputSample}, \mathrm{saliency} \in \mathbb{R}^{M \times T}$, where $M$ is the number of sensors (4, in this case), and $T$ is the window length (2'500, 5'000, 7'500, or 10'000, corresponding to 0.83, 1.65, 2.50, and 3.30 ms, see Section 3.5). $\nabla_{\mathrm{inputSample}}$ emphasizes that the gradient is computed with respect to the input sample and not to the model's parameters as in training, and $|\cdot|$ denotes the element-wise absolute value. Given that differentiation preserves the input sample dimensionality, the median per sensor is applied to the saliency of every input sample (reducing the dimensionality from $M \times T$ to $M$). The result is then scaled by the median absolute amplitude per sensor of the input sample. This operation calculates the relative amplitude of the derivative of the maximally excited output unit with respect to the input sample, which is insensitive to the signal amplitude and makes it possible to compute a score per sensor for each input sample denoting the importance of each sensor in the decision-making process. Fig. 15 shows the distribution of the relative amplitude of the median saliency per sensor on our test set for each of the four window lengths. As stated, the computed score (the derivative relative amplitude) denotes the importance of each sensor: the further the derivative distribution of a sensor is shifted to the right, the higher the sensor importance.
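The per-sensor score described above can be sketched on a toy example. To keep the example free of any autograd dependency, a hypothetical linear classifier is used, for which the gradient of the maximally excited output unit with respect to the input is simply the weight matrix of the winning class. Reading "scaled by" as a multiplication is also an assumption, chosen because it makes the score insensitive to the signal amplitude.

```python
from statistics import median


def toy_gradient(sample, weights):
    """Gradient of the largest logit w.r.t. the input for a linear toy model.

    sample: M lists of T samples; weights: one M x T matrix per class.
    For logits[c] = sum(weights[c] * sample), the gradient is weights[argmax].
    """
    logits = [
        sum(w * x for w_row, x_row in zip(w_c, sample) for w, x in zip(w_row, x_row))
        for w_c in weights
    ]
    winner = max(range(len(logits)), key=logits.__getitem__)
    return weights[winner]


def sensor_scores(sample, grad):
    """Median |gradient| per sensor, scaled by the median |amplitude| of that sensor."""
    scores = []
    for x_row, g_row in zip(sample, grad):
        med_saliency = median(abs(g) for g in g_row)
        med_amplitude = median(abs(x) for x in x_row)
        scores.append(med_saliency * med_amplitude)  # multiplication is assumed
    return scores
```

If a sensor's signal is multiplied by a constant and the model compensates by dividing its weights by the same constant, the product stays unchanged. This is the amplitude insensitivity that the text attributes to the relative amplitude of the derivative.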
In this scenario, the sensors carrying the most informative content are the BR and AE sensors. Interestingly, as the window length reduces, the importance of AE over BR increases. This behavior can be explained considering that the AE sensor can probably capture more time-resolved events that contribute to the decision-making process, even with small windows. However, these events appear to be less reliable for the classification task, as seen in the lower accuracy obtained with smaller windows. In contrast, the BR requires a more extended integration period, which, once granted (with a bigger time window), guarantees more stable and reliable results and gives a higher accuracy. BR contributes more to the model's decision-making because BR is more directly correlated with the regimes than the other secondary AE and optical emissions considered in this work, as depicted in Fig. 7. Also, AE contributes to the decision-making because the AE sensor was in closer proximity to the process zone than the collimator, resulting in less information loss.

Table 6. Classification accuracy results on the five window lengths. The classification results in each cell are listed from top to bottom in the following order: 0.50, 1.33, 2.00, 3.00, and 4.00 ms. All values are in %.

Conclusions
We have demonstrated a novel monitoring strategy for LPBF processes that consists of developing and training a hybrid CNN-LSTM model that can classify regimes across different time scales based on heterogeneous sensing data. Specifically, the heterogeneous time-synced sensing system utilized for the hybrid model training included signals from four sensors, namely BR, Visible, IR, and structure-borne AE, measuring different aspects of the LPBF process zone in stainless steel (316 L) manufactured with a mini-LPBF device. The experiments were performed in an operando high-speed X-ray imaging environment to confirm the occurrence of three processing regimes: LoF, conduction mode, and Keyhole, which were subsequently classified with the proposed DL architecture. The following generalized conclusions are drawn based on the experimental results:
• The combination of two neural network architectures, namely CNN and LSTM, proved advantageous, allowing the creation of a single monitoring model that can predict the processing regime using input signals whose duration extends from 0.5 to 4.0 ms with high prediction accuracy ranging from 95.9% to 100%.
• Secondly, due to the usage of signals with heterogeneous time duration during training, the developed model was able to generalize its high-accuracy prediction to input data of time duration not seen during training.
• Based on the model prediction accuracy over the different input data time durations, it was seen that there was a drop in model prediction accuracy as the signal lengths decreased.
• Finally, saliency map-based sensor ranking computation revealed that signals from BR and AE sensors influenced the decision-making process more than others. Also, as the length of the window decreases, the AE sensor tends to have higher relevance than BR.
In general, the outcomes of this research confirm that the proposed approach allows the generalization of the model's predictions to data of different time scales. Although the primary investigation in this work was performed by printing simple cubes, the model's efficacy in more demanding situations, such as more complex geometries and scanning paths, remains to be validated and is part of our planned research direction. Out of many events in the laser interaction zone, only three process regimes are evaluated in this research work. The application of such strategies to other types of defects is part of our future work. Our future work will also include optimizing the sensors' hardware and the data collection pipeline, and including physics-based inference from the trained models. The data and codes for this work are available in the following repository (https://c4science.ch/diffusion/12010/).

Fig. 15. The computed score (the derivative relative amplitude) denotes the importance of each sensor: the further the derivative distribution of a sensor is shifted to the right, the higher the sensor importance. Analysis performed on the test set. Depending on the window length, the sensors carrying most of the information are the BR and the AE.