Automatic Fault Detection for Selective Laser Melting using Semi-Supervised Machine Learning

Risk-averse areas such as the medical, aerospace and energy sectors have been somewhat slow to accept and apply Additive Manufacturing (AM) in many of their value chains. This is partly because there are still significant uncertainties concerning the quality of AM builds. This paper introduces a machine learning algorithm for the automatic detection of faults in AM products. The approach is semi-supervised in that, during training, it is able to use data from both builds where the resulting components were certified and builds where the quality of the resulting components is unknown. This makes the approach cost efficient, particularly in scenarios where part certification is costly and time consuming. The study specifically analyses Selective Laser Melting (SLM) builds. Key features are extracted from large sets of photodiode data, obtained during the building of 49 tensile test bars. Ultimate tensile strength (UTS) tests were then used to categorise each bar as 'faulty' or 'acceptable'. A fully supervised approach identified faulty specimens with a 77% success rate while the semi-supervised approach was able to consistently achieve similar results, despite being trained on a fraction of the available certification data. The results show that semi-supervised learning is a promising approach for the automatic certification of AM builds that can be implemented at a fraction of the cost currently required.

∗Corresponding author. Email address: plgreen@liverpool.ac.uk (Peter L. Green)
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 September 2018 | doi:10.20944/preprints201809.0346.v1
© 2018 by the author(s). Distributed under a Creative Commons CC BY license.


Introduction
There is a growing demand for efficient manufacturing technologies [1]. Additive Manufacturing (AM) has huge potential in healthcare for custom implants and in aerospace for lightweight designs [2]. However, uncertainties surrounding part quality prevent the full adoption of AM technology in such sectors. Moreover, certification of AM parts is challenging (as faults may occur internal to the parts) and often requires expensive CT scans.
The current paper specifically considers Selective Laser Melting (SLM). SLM is a 3D printing technology that has become very popular in recent times due to its ability, relative to traditional methods, to produce complex metal geometries. The SLM process involves layer-by-layer construction of a build by repeatedly channelling laser beams onto a thin layer of metal powder deposited on a fusion bed [3]. Powder deposition and sintering are repeated until a desired product is made to specification.
For the work detailed herein, data related to the SLM process was gathered using high precision photodiodes, which were installed axial to the laser. These sensors were placed behind filters that were designed to eliminate the reflected laser light, thus allowing the reflected light intensity to be measured during the builds. The photodiodes provide process measurements from which, potentially, it may be possible to determine the quality of AM products¹. Understanding these data is a challenging area. However, advances in machine learning have made it possible to create and apply intelligent algorithms to large datasets for decision making [4]. Such algorithms can identify patterns in large data, after being trained. The current work is based on the hypothesis that, using large amounts of process measurements from SLM machines, machine learning can be used to quickly and cheaply classify the success of SLM builds.
Classification algorithms can be broadly categorised as supervised, semi-supervised or unsupervised (for a theoretical review of these methods, [5][6][7] are recommended). With a supervised approach, the algorithm is presented with labelled data: a set of input vectors, each of which is associated with an observed output value (or 'label'). Unsupervised learning can be thought of as finding patterns in only unlabelled data (clustering, for example, is one form of unsupervised learning). With a semi-supervised approach, the user provides some labelled data and some unlabelled data at the same time. The model then attempts to establish a decision boundary and to classify the data into clusters, based on the characteristics of the provided labelled and unlabelled information [8][9].
In the current context, input vectors consist of data that was gathered during AM builds and the labels are used to indicate whether each particular build was 'acceptable' or 'faulty' (in this paper, for example, labels are defined based on ultimate tensile strength values). Consequently, before the application of supervised machine learning, one would have to conduct and certify a large number of AM builds (see [10], for example, where 100s of parts were produced to generate the data needed to train a support vector machine). This procedure would have to be repeated for each new type of component or material. However, in many practical applications, completely labelled information is not available [11]. It is more common to find few labelled data and relatively large amounts of unlabelled data. In the current study, for instance, process measurements are generated whenever a component is manufactured, but cost constraints prevent labels from being assigned to most of these data. This study explores how machine learning could help to automatically detect defects in situations where there is a large amount of unlabelled data (builds that were not certified) and a small amount of labelled data (builds that were certified). Furthermore, it illustrates the application of a probabilistic methodology, an important aspect of the approach, which allows one to quantify the uncertainties associated with the machine-learnt assessments of AM builds.
The paper makes 3 main contributions:
1. It is illustrated how a Randomised Singular Value Decomposition can be used to extract key features from large sets of SLM process measurements.
2. The feasibility of using machine learning to detect unsuccessful SLM builds from process measurements is demonstrated. This highlights how signal-based process monitoring, which is adopted in risk analysis and industrial statistics frameworks [12], could be extended to AM applications.
3. It is shown that, using semi-supervised learning, the number of costly certification experiments associated with such an approach can be significantly reduced.
It is important to note that this study does not aim to draw links between specific SLM process parameters and the quality of the resulting builds. Rather, it details a purely data-based approach whereby a machine learning algorithm is used to classify SLM build quality based only on the patterns that are contained within sets of photodiode measurements.
The paper is structured as follows: Section 2 discusses the current state of the art and highlights the contributions of the paper, Section 3 discusses the semi-supervised model derivation and formulation, Section 4 demonstrates the model using a case study and Section 5 concludes the work. It is noted that Section 3 is included so that the machine learning approach is not presented as a 'black box'. Section 3 can, however, be skipped by those who are purely interested in the case study.

Literature review
This section highlights key relevant contributions before establishing where the current paper fits amongst other literature in the field.

Key process parameters
Local defects may occur during layer-by-layer construction of an AM part. The root causes could be traced to improper process parameters, insufficient supports, a non-homogeneous powder deposition, improper heat exchanges and/or material contaminations [4][13][14]. The effects of process parameters (namely laser power, scanning speed, hatch spacing, layer thickness and powder temperature) on the tensile strength of AM products are reported in [15][16]. Specifically for SLM, it has been shown that four key parameters (namely part bed temperature, laser power, scan speed and scan spacing) have a significant effect on the mechanical properties and quality of an SLM product [17][18][19]. Sensitivity analyses of SLM process parameters have revealed that both the scan speed of the laser and the scan spacing can be used to facilitate effective improvement in mechanical properties [20].
The type of laser employed determines, to a great extent, the behaviour of the powdered particles during SLM processing [21]. This is attributed to the dependence of material laser absorptivity on the wavelength of the laser used. It has also been discovered that particle size, size distribution, tap density, oxide film thickness, surface chemistry and impurity concentration have little effect on the sintering behaviour of aluminium powders [22]. Debroy et al. [15] pointed out that during laser sintering of metals, the alloying elements vaporise from the surface of the molten pool and, as a result, the surface area-to-volume ratio is one of the crucial factors for determining the magnitude of composition change.

Towards feedback control for SLM
While much work has been conducted to identify key parameters that affect SLM build quality, it can still be difficult to relate this knowledge to the development of effective control strategies. This is particularly evident when one considers developing control strategies for new materials. While proof-of-concept controllers have been developed in [23][24] (using measurements from high-speed cameras and/or photodiodes to control laser power) and the works [25][26] detail an approach whereby geometrical accuracy was improved by varying beam offset and scan speed, the adaptation of these methods to new materials can be prohibitively time consuming and/or expensive. In [25], for example, it is stated that 'the benchmarking process is time consuming' and that 'a change of material used will require identification of a new process benchmark as the properties of different materials influence the fabrication parameters in the process'.
It is worth noting that Finite Element models can, potentially, aid controller development by relating key process parameters to the microstructure of builds (see [27], for example). Unfortunately, these models tend to be very specific in terms of part design and can take a long time to develop and/or implement.

Machine learning approaches
Through data-based approaches, facilitated by machine learning algorithms, it may be possible to overcome the challenges associated with inferring build quality from knowledge of key parameters and/or the results of Finite Element models. Work has shown that data-driven methods can, from build data, model how process parameters affect the quality of final parts [28][29][30]. Approaches that utilise build data are advantageous because they provide great opportunities for digitalisation and smart process control, otherwise known as 'smart manufacturing' [31][32][33].
Broadly, machine-learning approaches can be categorised as being either 'supervised' or 'unsupervised'. Supervised approaches involve training an algorithm on a set of data, whereby each training point has a 'label' attached to it. This label indicates the particular class that the training point belongs to (for example, in the current context, the label could indicate whether a particular set of build data corresponds to a build that had been found to be 'acceptable' or 'faulty'). Supervised algorithms then attempt to infer decision boundaries that separate these classes. Unsupervised approaches, on the other hand, are used to identify key patterns in data that is unlabelled (cluster analysis, for example, is a well-known example of unsupervised learning). The following 2 subsections highlight relevant applications of unsupervised and supervised approaches within the context of SLM. Particular attention is given to describing the data acquisition process and/or assumptions involved in each example, as this motivates the use of semi-supervised learning in the current study (Section 2.4).

Unsupervised learning
In [34], to automatically detect local overheating phenomena, the k-means algorithm was used to cluster features that had been extracted from images (in the visible spectrum) of SLM builds. This unsupervised approach clustered the data before further assumptions were used to relate the results to build quality (it was, for example, assumed that data from 'normal' and 'abnormal' builds would be best represented by 2 and 3 clusters respectively). The authors of [35] used an unsupervised cluster-based approach to relate melt pool characteristics to build porosity. As with [34], a set of assumptions was then used to relate the clustering results to build quality (specifically, it was assumed that the number of 'abnormal' melt pools would be small compared to the number of 'normal' melt pools).
In a recent study [36], anomaly detection and classification of SLM specimens was implemented using an unsupervised machine learning algorithm operating on a training database of image patches. The algorithm functioned well as a post-build analysis tool, allowing a user to identify failure modes and locate regions within a final part that may contain millimeter-scale flaws. However, the algorithm was not designed to classify a mixture of labelled images and unlabelled images simultaneously. The image patches were manually selected from a secondary database, based on a pre-determined rule which clearly distinguished the patches.

Supervised learning
Ref. [37] describes a variety of approaches that can be used to infer a relationship between melt pool characteristics and part porosity. To label melt pool data, 3D CT scans were used to empirically locate part defects before algorithm training could begin. In [38] a Gaussian process was used to infer a mapping from laser power and scan speed to part porosity. To facilitate this approach, data was generated by conducting experiments across a grid of laser power and scan speed values before porosity was measured using Archimedes' principle. In [39], a support vector machine was used to classify images of build layers that had been obtained using a high-resolution digital camera. Training data was obtained using 3D CT scans, which were used to identify discontinuities in parts post-build.
It is worth noting that, whilst optical methods have been gaining popularity in recent times, the feasibility of applying supervised learning to acoustic emission data for in-situ quality monitoring has also been investigated. Ref. [40], for example, used neural networks to classify features of in-build acoustic signals to one of three quality ratings (defined based on part porosity). To generate data for algorithm training, an SLM build was conducted using different laser scanning parameters which, through a visual analysis, were shown to influence porosity. Ref. [41] used acoustic signals to train deep belief networks (a neural network algorithm, sometimes referred to under the banner of 'deep learning'). During data acquisition, process parameters were varied to deliberately induce different types of build flaw, leading to 5 different classes ('balling', 'slight balling', 'normal', 'slight overheating' and 'overheating'). This labelled data was then used to train the parameters of the neural network.

Contribution of the current work
The works listed above highlight the advantages and disadvantages that can be encountered when implementing both supervised and unsupervised approaches.
Supervised approaches require sufficient quantities of labelled data. Unfortunately, the assignment of labels to data often requires a significant amount of additional resources. Refs. [37][39], for example, assigned labels based on the outcomes of CT scans while [38] utilised the results from component porosity tests. To circumvent the requirement for such additional testing, [40] and [41] used pre-existing knowledge regarding the relationship between process parameters and build defects to label data. Such an approach, however, relies on the availability of relatively in-depth knowledge regarding process-defect relationships. This information may be difficult to obtain, particularly when new materials are being analysed.
Unsupervised approaches do not need labelled data and, consequently, are often cheaper to implement. However, the relationship between the results of an unsupervised analysis and build quality has to be built upon an additional set of assumptions. For example, both [34] and [35] had to make assumptions about the number / relative size of the data clusters revealed by their analyses. While the results reported in [34] and [35] are encouraging, it is likely that the validity of these assumptions will come into question if such an approach were used to guarantee build quality for applications in risk-averse disciplines. Ref. [36] used an unsupervised approach, but only after image data had been manually selected from a database, a process which, it must be assumed, was fairly time consuming.
In the current work it is suggested that an efficient approach should be able to utilise data that is both labelled and unlabelled. This is because, in the author's experience, developing new materials in SLM often leads to a large amount of unlabelled data and a small amount of labelled data. It is, for example, relatively easy to conduct (and obtain measurements from) a large number of builds but, because of cost constraints, only a relatively small number of these can be 'labelled' according to build quality. The work herein hypothesises that the large amounts of unlabelled data (i.e. process measurements where the final build has not been certified) should not be wasted and should be analysed alongside the more limited set of labelled data. This semi-supervised approach is especially suited to situations where there are few labelled data and much unlabelled data. It therefore has the potential to reduce the number of costly and time consuming certification experiments that are required in the development of machine-learnt models of SLM build quality.

Model Derivation and Formulation
The proposed semi-supervised method uses a Gaussian Mixture Model (GMM) to classify 'acceptable' and 'faulty' AM builds. GMMs are often relatively time-efficient as their parameters can be estimated with the Expectation Maximization (EM) algorithm [42], [43] (described briefly in Section 3.2.1). A description of GMMs is included here for the sake of completeness and to highlight the application of semi-supervised learning in the current context. For more information on GMMs the book [44] is recommended.
Essentially, a GMM algorithm clusters data based on the assumption that each data point is a sample from a mixture of Gaussian distributions, such that the probability distribution over each data point can be described as a weighted sum of Gaussian components [45]. This is elaborated further below, where we first describe how a GMM can be applied to labelled data (supervised learning), before then describing its application to unlabelled data (unsupervised learning). This then helps to establish how a GMM can be used to address situations involving both labelled and unlabelled data.

Supervised Learning with a GMM
In the following, we use $x$ to represent an input vector, i.e. a vector of features that have been extracted from the process measurements of an SLM build.
As stated previously, a GMM assumes that each vector, $x$, was sampled from a mixture of Gaussian distributions [44] such that

$$p(x \mid \mu, \Sigma, \pi) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$$

where $\mu = \{\mu_1, ..., \mu_K\}$ represents the means of the Gaussians, $\Sigma = \{\Sigma_1, ..., \Sigma_K\}$ are the covariance matrices of the Gaussians and $\pi = \{\pi_1, ..., \pi_K\}$ are referred to as the mixture proportions. $N$ is the number of available data points and $K$ represents the number of Gaussian distributions that are considered in the mixture. The model parameters that need to be estimated during algorithm training are

$$\theta = \{\mu, \Sigma, \pi\}$$

For supervised learning, each input vector, $x$, is already labelled; in other words, the user already knows which of the Gaussians in the mixture was used to generate each sample. In such a circumstance, identifying the parameters $\theta$ is very easy; it is shown here for illustrative purposes and to establish notation. Using $X_k$ to denote the set of $N_k$ samples that were generated from the $k$th Gaussian, the mean of each Gaussian can be estimated by

$$\mu_k = \frac{1}{N_k} \sum_{x \in X_k} x$$

The covariance matrices are estimated using

$$\Sigma_k = \frac{1}{N_k} \sum_{x \in X_k} (x - \mu_k)(x - \mu_k)^T$$

while the mixture proportions are set according to

$$\pi_k = \frac{N_k}{N}$$

where $N$, as before, represents the total number of data points (such that $N = \sum_k N_k$).
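The closed-form estimates described above (per-class sample mean, per-class sample covariance and class fraction) can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: the function name `fit_labelled_gmm`, the integer label encoding and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_labelled_gmm(X, labels, K):
    """Maximum likelihood GMM parameters when every point is labelled.

    X      : (N, d) array of feature vectors
    labels : (N,) integer array, labels[i] = k if x_i came from Gaussian k
    K      : number of Gaussians in the mixture
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N, d = X.shape
    means = np.zeros((K, d))
    covs = np.zeros((K, d, d))
    weights = np.zeros(K)
    for k in range(K):
        Xk = X[labels == k]            # X_k: samples generated by the kth Gaussian
        Nk = Xk.shape[0]
        means[k] = Xk.mean(axis=0)     # sample mean of class k
        diff = Xk - means[k]
        covs[k] = diff.T @ diff / Nk   # sample covariance (normalised by N_k)
        weights[k] = Nk / N            # mixture proportion N_k / N
    return means, covs, weights

# Hypothetical two-class toy data (not taken from the paper)
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [11.0, 13.0]])
labels = np.array([0, 0, 1, 1, 1])
means, covs, weights = fit_labelled_gmm(X, labels, K=2)
```

Note that the covariance is normalised by the class count rather than by the class count minus one, which matches the maximum likelihood estimate.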

Unsupervised Learning
With unsupervised learning each data point is unlabelled. For a GMM this means that, while it is assumed that each point is a sample from one of the Gaussian distributions in the mixture, the specific Gaussian distribution from which each data point was sampled is not known. In such a case the labels are described as latent variables, as they are hidden from the user when an analysis is conducted. This makes the problem much more difficult relative to the supervised case, as it is now necessary to estimate both the parameters of the Gaussian distributions in the mixture and the labels associated with each data point. Difficulties arise because the parameters of the Gaussian distributions and the labels are interdependent: the geometry of the Gaussian distributions can only be estimated if the labels are known, while the labels can only be estimated if the geometry of the Gaussian distributions is known.
At this point, it is convenient to write the latent variables using what is known as a 1-of-K representation. Specifically, each data point ($x_i$, for example) is associated with a $K$-dimensional vector, $z_i$. One element of $z_i$ is always equal to 1 while all the other elements of $z_i$ are set equal to 0. This means that, by stating that $z_{ik} = 1$ indicates that $x_i$ was generated from the $k$th Gaussian in the mixture, the set $Z = \{z_1, ..., z_N\}$ can be used to represent the latent variables in the problem. Further analysis can be used to show that the mixture proportions can be defined as (see [44] for example)

$$\pi_k = p(z_{ik} = 1)$$

while the probability of observing the point $x_i$ conditional on $z_i$ and $\theta$ is

$$p(x_i \mid z_i, \theta) = \prod_{k=1}^{K} \mathcal{N}(x_i; \mu_k, \Sigma_k)^{z_{ik}}$$

Assuming uncorrelated samples, one can then write that

$$p(X, Z \mid \theta) = \prod_{i=1}^{N} \prod_{k=1}^{K} \left[ \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k) \right]^{z_{ik}} \qquad (8)$$

where $X = \{x_1, ..., x_N\}$ is the set of all observed data. Furthermore, the posterior probability of $Z$ can be derived using Bayes' theorem:

$$p(Z \mid X, \theta) = \frac{p(X, Z \mid \theta)}{p(X \mid \theta)} \propto \prod_{i=1}^{N} \prod_{k=1}^{K} \left[ \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k) \right]^{z_{ik}} \qquad (9)$$

Equation (8) allows the maximum likelihood $\theta$ to be identified, conditional on the latent variables. Likewise, equation (9) allows a probabilistic analysis of the latent variables, conditional on $\theta$. This allows $\theta$ and $Z$ to be estimated in a two-step procedure, known as the Expectation Maximization (EM) algorithm.
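As a small illustration of the 1-of-K representation described above, the sketch below converts integer class indices into indicator vectors. The helper name `one_of_k` and the example labels are hypothetical and are used for illustration only.

```python
import numpy as np

def one_of_k(labels, K):
    """Return an (N, K) array Z with Z[i, k] = 1 if point i belongs to Gaussian k."""
    labels = np.asarray(labels)
    Z = np.zeros((labels.size, K))
    Z[np.arange(labels.size), labels] = 1.0   # exactly one non-zero entry per row
    return Z

# Four hypothetical points: the first and last from Gaussian 0, the others from Gaussian 1
Z = one_of_k([0, 1, 1, 0], K=2)

# When the indicators are known, averaging them recovers the fraction of
# points assigned to each Gaussian
proportions = Z.mean(axis=0)
```

In the unsupervised setting these indicator vectors are not observed, which is why their expected values are used in their place.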

Expectation Maximization
As the name implies, the EM algorithm starts with an expectation step. Simplifying matters slightly, this is essentially where the model parameters $\theta$ are held fixed and the expected values of the latent variables $Z$ are computed. Using equation (9) it can be shown that

$$E[z_{ik}] = \frac{\pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j)} \qquad (10)$$

This step is followed by the maximization step, where the latent variables $Z$ are held equal to their expected values and the maximum likelihood estimates of the model parameters, $\theta$, are computed. By evaluating the derivative of equation (8) and setting the resulting expression equal to 0 it can be shown that, subject to the appropriate constraints, the maximum likelihood parameters are

$$\mu_k = \frac{\sum_{i=1}^{N} E[z_{ik}] \, x_i}{\sum_{i=1}^{N} E[z_{ik}]}$$

$$\Sigma_k = \frac{\sum_{i=1}^{N} E[z_{ik}] \, (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_{i=1}^{N} E[z_{ik}]}$$

$$\pi_k = \frac{1}{N} \sum_{i=1}^{N} E[z_{ik}]$$

The sequence of EM steps is repeated until convergence of the likelihood, equation (8), is observed. The reader may consult [44][46] for more details about the EM algorithm.
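The alternating expectation and maximization steps can be sketched as follows. This is a textbook-style illustration under stated assumptions, not the code used in the study: the function names, the small `reg` term added to each covariance for numerical stability, the use of the observed-data log-likelihood as the convergence monitor and the synthetic data are all choices made for the example.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def em_gmm(X, means, covs, weights, max_iter=200, tol=1e-8, reg=1e-6):
    """EM for a GMM, starting from user-supplied initial parameters."""
    X = np.asarray(X, dtype=float)
    means = np.array(means, dtype=float)      # copies, so inputs are not modified
    covs = np.array(covs, dtype=float)
    weights = np.array(weights, dtype=float)
    N, d = X.shape
    K = weights.size
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: expected latent variables with the parameters held fixed
        log_w = np.column_stack([np.log(weights[k]) + log_gaussian(X, means[k], covs[k])
                                 for k in range(K)])
        shift = log_w.max(axis=1, keepdims=True)
        log_norm = shift + np.log(np.exp(log_w - shift).sum(axis=1, keepdims=True))
        resp = np.exp(log_w - log_norm)       # rows sum to one
        # M-step: parameters updated with the expected latent variables held fixed
        Nk = resp.sum(axis=0)
        means = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
        weights = Nk / N
        # Stop once the log-likelihood (under the pre-update parameters) stalls
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return means, covs, weights, resp

# Synthetic, well-separated two-cluster data (hypothetical)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(6.0, 1.0, size=(100, 2))])
init_means = np.array([[1.0, 1.0], [5.0, 5.0]])
init_covs = np.array([np.eye(2), np.eye(2)])
means, covs, weights, resp = em_gmm(X, init_means, init_covs, np.array([0.5, 0.5]))
```

The responsibilities are computed in log space so that very small densities do not underflow, which is a common practical safeguard rather than part of the derivation.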

Semi-Supervised Model Formulation
In semi-supervised learning, the full data set consists of labelled and unlabelled data. The aim is to classify future data using the labelled information, while also using information contained in the unlabelled data. This approach is essentially a combination of the supervised and unsupervised formulations described in the previous sections.
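For the labelled data, it is now convenient to introduce a 1-of-K representation of each label. Specifically, each labelled point $x_i$ is associated with a vector $y_i$ where, in a similar manner to our definition of the latent variables, one element of $y_i$ is always equal to 1 while all the other elements of $y_i$ are set equal to 0 (thus indicating the Gaussian that was used to generate the data point). For simplicity it is assumed that the data is ordered such that the first $L$ points are labelled, while the remaining points are unlabelled. This allows the sets of labelled and unlabelled data to be written as

$$X_l = \{(x_1, y_1), ..., (x_L, y_L)\}$$

and

$$X_u = \{x_{L+1}, ..., x_N\}$$

respectively. The probability of witnessing the data conditional on the GMM parameters is therefore:

$$p(X_l, X_u \mid \theta) = \prod_{i=1}^{L} \prod_{k=1}^{K} \left[ \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k) \right]^{y_{ik}} \prod_{i=L+1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i; \mu_k, \Sigma_k)$$

from which it is possible to show that the maximum likelihood values of $\theta$ are

$$\mu_k = \frac{\underbrace{\sum_{i=1}^{L} y_{ik} \, x_i}_{\text{labelled}} + \underbrace{\sum_{i=L+1}^{N} E[z_{ik}] \, x_i}_{\text{unlabelled}}}{\sum_{i=1}^{L} y_{ik} + \sum_{i=L+1}^{N} E[z_{ik}]} \qquad (18)$$

$$\Sigma_k = \frac{\underbrace{\sum_{i=1}^{L} y_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^T}_{\text{labelled}} + \underbrace{\sum_{i=L+1}^{N} E[z_{ik}] \, (x_i - \mu_k)(x_i - \mu_k)^T}_{\text{unlabelled}}}{\sum_{i=1}^{L} y_{ik} + \sum_{i=L+1}^{N} E[z_{ik}]} \qquad (19)$$

$$\pi_k = \frac{1}{N} \Big( \underbrace{\sum_{i=1}^{L} y_{ik}}_{\text{labelled}} + \underbrace{\sum_{i=L+1}^{N} E[z_{ik}]}_{\text{unlabelled}} \Big) \qquad (20)$$

where underbraces have been used to highlight which parts of equations (18), (19) and (20) arise because of the labelled and unlabelled data.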
The expected values of the latent variables, conditional on $\theta$, are found using equation (10), evaluated for the unlabelled data only. Consequently, the EM algorithm can be applied in this context (whereby the expected labels and maximum likelihood parameter estimates are updated sequentially, over a number of iterations).
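A minimal sketch of the semi-supervised procedure is given below: known labels are held fixed, expected labels are recomputed for the unlabelled points only, and the parameters are then updated using both sets. The function names, the initialisation from the labelled data alone, the fixed iteration count, the `reg` stabilisation term and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def m_step(X, resp, reg):
    """Weighted maximum likelihood update of means, covariances and proportions."""
    N, d = X.shape
    K = resp.shape[1]
    Nk = resp.sum(axis=0)
    means = (resp.T @ X) / Nk[:, None]
    covs = np.empty((K, d, d))
    for k in range(K):
        diff = X - means[k]
        covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return means, covs, Nk / N

def semi_supervised_gmm(X_lab, Y_lab, X_unl, n_iter=100, reg=1e-6):
    """Y_lab holds 1-of-K labels for X_lab; X_unl is unlabelled."""
    X_lab = np.asarray(X_lab, dtype=float)
    Y_lab = np.asarray(Y_lab, dtype=float)
    X_unl = np.asarray(X_unl, dtype=float)
    X_all = np.vstack([X_lab, X_unl])
    K = Y_lab.shape[1]
    # Assumption: initial parameters are estimated from the labelled data alone
    means, covs, weights = m_step(X_lab, Y_lab, reg)
    for _ in range(n_iter):
        # E-step on the unlabelled points only (labelled points keep their labels)
        log_w = np.column_stack([np.log(weights[k]) + log_gaussian(X_unl, means[k], covs[k])
                                 for k in range(K)])
        log_w -= log_w.max(axis=1, keepdims=True)
        resp_unl = np.exp(log_w)
        resp_unl /= resp_unl.sum(axis=1, keepdims=True)
        # M-step on all points: known labels stacked with expected labels
        means, covs, weights = m_step(X_all, np.vstack([Y_lab, resp_unl]), reg)
    return means, covs, weights, resp_unl

# Synthetic example: 10 labelled and 90 unlabelled points per class (hypothetical)
rng = np.random.default_rng(1)
class0 = rng.normal(0.0, 1.0, size=(100, 2))
class1 = rng.normal(8.0, 1.0, size=(100, 2))
X_lab = np.vstack([class0[:10], class1[:10]])
Y_lab = np.vstack([np.tile([1.0, 0.0], (10, 1)), np.tile([0.0, 1.0], (10, 1))])
X_unl = np.vstack([class0[10:], class1[10:]])
true_unl = np.array([0] * 90 + [1] * 90)
means, covs, weights, resp_unl = semi_supervised_gmm(X_lab, Y_lab, X_unl)
accuracy = np.mean(resp_unl.argmax(axis=1) == true_unl)
```

Each row of `resp_unl` is a probability over the classes, so the output can be read as a probabilistic assessment of each unlabelled point rather than only a hard classification.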

Case Study
A Renishaw RenAM 500M SLM machine was used to construct two builds, each consisting of 25 individual tensile test bars. Each build involved the printing of approximately 3600 layers. Herein, these builds are referred to as B4739 and B4741 respectively.
All samples for this study were produced from a single batch of Inconel 718. Inconel 718 has a nickel mass fraction of up to 55% alloyed with iron up to 21% and chromium up to 21%. Typical properties include high strength, excellent corrosion resistance and a working temperature range between −250 °C and 650 °C. It has a wide range of applications within industry and is suitable for applications where good tensile, creep, and rupture strength is required. In particular, it is often used in situations where corrosion and oxidation resistance at high temperatures is needed. Its excellent welding characteristics and resistance to cracking make it an ideal material for AM.
Figure 1 shows a schematic of the machine and optical system used to control the movement of the nominal 80 µm diameter focused laser spot. Samples were built in a layer-wise fashion on a substrate plate. The plate is connected to an elevator which moves vertically downwards, allowing the controlled deposition of powder layers at 60 µm intervals.
A commercially available laser processing parameter set (supplied by Renishaw) was used throughout the experiments. These parameters were derived from standard process optimisation methods used in the AM industry. Post build, the test pieces were removed from the substrate plates using wire erosion. The tensile test bars were machined to ASTM E8-15a specification to a nominal diameter of 6.0 mm and parallel length equal to 36.0 mm.
Each specimen was instrumented with a dual averaging extensometer and tested at ambient temperature using an Instron tensile test machine. Tests were conducted with a 100 kN load cell under strain rate control at the first rate (0.005 strain/min) to beyond yield, at which point the second rate (0.05 strain/min) was adopted, following the removal of the extensometry equipment.
Figure 1 illustrates the photodiode sensing system (MeltVIEW) that was used during each build. Light from the melt pool enters the optical mirror before being reflected into the MeltVIEW module by the galvanometer mirror. A semi-transparent mirror is then used to reflect light to photodiode 1 (labelled as 4 in Figure 1) before a fully opaque mirror reflects light to photodiode 2 (labelled as 5 in Figure 1). Photodiode 1 is designed to detect plasma emissions (between 700 and 1050 nm) while photodiode 2 is designed to detect thermal radiation from the melt pool (between 1100 and 1700 nm). Time histories of the photodiode measurements and laser position were output to a series of DAT files. Each DAT file corresponded to a layer of the build and contained approximately 115 KB of data. During processing, no missing values were identified.
Using the MeltVIEW sensing system, the task is to extract significant information about build quality from the photodiode measurements. In this work, quality is defined using the results from an Ultimate Tensile Strength (UTS) test of each bar. Here, a UTS value of 1400 MPa or above represents an acceptable part while UTS values below 1400 MPa represent a faulty part (this definition is sufficient for demonstrating the feasibility of the proposed approach although it is noted that more complex criteria can be utilised in the future). Figure 2 shows the x-y coordinates of the laser during the build of 1 layer on the fusion bed.
Regarding the choice of sensing system, the general consensus amongst current literature is that data regarding melt pool characteristics will be closely related to build quality. Photodiode data is used in the current work as it is known to be closely correlated to properties of the melt pool (see [24], for example). While it has been hypothesised that, relative to thermal imaging systems, photodiodes may be able to capture data from a larger zone around the melt pool, the significance of this difference is currently unclear and could be investigated as future work.
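The labelling rule described above can be written as a simple threshold on the measured UTS. In the sketch below the function name and the UTS values are hypothetical, and treating a value of exactly 1400 MPa as acceptable is an assumed reading of the rule.

```python
import numpy as np

UTS_THRESHOLD_MPA = 1400.0   # threshold quoted in the text

def label_bars(uts_mpa, threshold=UTS_THRESHOLD_MPA):
    """Label each bar 'acceptable' if its UTS meets the threshold, else 'faulty'."""
    uts = np.asarray(uts_mpa, dtype=float)
    return np.where(uts >= threshold, "acceptable", "faulty")

# Hypothetical UTS results in MPa (not measurements from the study)
uts_values = [1432.0, 1398.5, 1400.0, 1371.2]
labels = label_bars(uts_values)
```

More complex acceptance criteria could replace the single comparison without changing how the resulting labels are used downstream.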

Feature Extraction
As described previously, 2 SLM builds were conducted as part of the study. This led to the construction of a total of 50 tensile test bars (i.e. 25 bars per build). During each build, the x and y position of the laser was collected alongside time history measurements from 2 photodiode sensors (sample frequency equal to 100 kHz, resulting in approximately 400 GB of data per build). Here it is described how, from these large data, key features were extracted. This was based on the hypothesis that, from the photodiode measurements, it would be possible to extract relatively low dimensional features that give a statistically significant indication of build quality. It is also demonstrated how, because of the size of the data being utilised, feature extraction from SLM process measurements must be conducted using methods that are appropriate for large data sets.

Initial data processing / reduction was conducted in two steps. Figure 3 graphically demonstrates this process for a single build (noting that, while the same procedure was applied to measurements from both photodiodes, Figure 3 illustrates the process for data from a single photodiode only). In Step 1, a downsampling procedure was used such that only the data from every 10th layer of the build was used in subsequent analyses². Note that only measurements taken when the laser was active were considered. In Step 2, for each layer that was analysed, the x-y position of the laser was used to identify which parts of each photodiode measurement time history corresponded to the building of a particular tensile test bar. This data was then collected together into an $m \times n$ data matrix, $A$, where the first column of $A$ corresponded to measurements associated with bar one, the second column of $A$ corresponded to measurements associated with bar two etc. The transpose of $A$ is illustrated graphically at the bottom of Figure 3.

In the final step of the feature extraction procedure the intention was to apply a Singular Value Decomposition (SVD) to the data matrix, allowing $A$ to be written as the product of 3 matrices:

$$A = U D V^T$$

where $U$ is an $m \times n$ matrix with orthonormal columns, $V$ is an $n \times n$ orthogonal matrix and

$$D = \mathrm{diag}(\sigma_1, \sigma_2, ..., \sigma_n)$$

where $\sigma_1, \sigma_2, ...$ are constants (the singular values of $A$, given by the square roots of the eigenvalues of $A^T A$) that, typically, are ordered such that $\sigma_1 \geq \sigma_2 \geq ... \geq \sigma_n$. The SVD allows each of the columns in $A$ to be written as a linear combination of basis vectors. Specifically, writing $B = D V^T$, it can be shown that

$$a_j = \sum_{p=1}^{n} B_{pj} \, u_p \qquad (24)$$

where $a_j$ is the $j$th column in $A$ and $u_p$ is the $p$th column in $U$. From equation (24) it can be seen that each column of $A$ is now associated with $n$ constants ($a_j$ is associated with $B_{1j}, B_{2j}, ..., B_{nj}$). It is these constants that can be used as features, i.e. inputs to the machine learning algorithm.
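The relationship between the SVD factors and the per-bar features can be checked numerically with a small sketch. The matrix below is random and far smaller than the data matrix described in the text; its dimensions and all variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data matrix: m measurements (rows) for each of n bars (columns)
rng = np.random.default_rng(0)
m, n = 200, 6
A = rng.standard_normal((m, n))

# Thin SVD: U is m x n, s holds the n singular values, Vt is V transposed (n x n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# B = D V^T, so column j of B holds the coefficients that rebuild column j of A
B = np.diag(s) @ Vt
A_rebuilt = U @ B

# One feature vector per bar (one row per column of A)
features = B.T

# The squared singular values equal the eigenvalues of A^T A
# (eigvalsh returns them in ascending order, hence the reversal)
eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]
```

Here `np.linalg.svd` returns the singular values already sorted in descending order, which matches the ordering assumed in the text.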
In fact, by ordering the SVD results such that σ 1 ≥ σ 2 ≥ ... ≥ σ m , close approximations of A can be realised without using the full set of basis vectors.Specifically, if a new matrix, Ã, is formed whose jth column is then Ã will form a low-rank approximation of A. Using Ã instead of A can therefore facilitate a reduction in the size of the feature space (in other words, the number of constants associated with each column of Ã will be less than the number of constants associated with each column of A).
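The decomposition and truncation described above can be illustrated with a minimal NumPy sketch. The matrix below is random synthetic data standing in for A (it is not the study's data), and the names `features`, `A_tilde` and the truncation rank `r` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data matrix A (NOT the study's data):
# m samples per bar (rows) and n bars (columns).
m, n = 2000, 25
A = rng.standard_normal((m, n))

# Thin SVD, A = U @ diag(s) @ Vt, with s returned in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# B = D V^T: column j of B holds the n constants associated with bar j.
B = np.diag(s) @ Vt

# Each column of A is a linear combination of the columns of U.
j = 3
a_j = U @ B[:, j]

# Keep only the first r basis vectors to form a low-rank approximation.
r = 1
A_tilde = U[:, :r] @ B[:r, :]

# The r retained constants per bar are the candidate features.
features = B[:r, :].T  # shape (n, r)

# Frobenius-norm error of the truncation; it equals the root sum of
# squares of the discarded singular values.
truncation_error = np.linalg.norm(A - A_tilde)
```

Because the singular values are sorted in descending order, the truncation error is governed entirely by the discarded values, which is why a small r can be adequate when the leading singular values dominate.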
Unfortunately, it was found that the matrix A was prohibitively large for analysis via a standard SVD. To circumvent this issue, A was instead decomposed using a Randomised SVD. A brief outline of this procedure is given in the following text; for more information, readers may consult [47][48][49].
A Randomised SVD first involves the generation of an orthogonal projection matrix G which, when applied to the data matrix, reduces dimensionality while approximately preserving the pairwise distances between each of the projected vectors. To avoid the large computational costs that can be associated with this procedure, the columns of the projection matrix are sampled from a zero-mean, unit-variance multivariate Gaussian distribution. This ensures that, on average, the required properties of the projection matrix are obtained. Once G has been formed, A is projected onto G to realise the matrix H (such that H = AG). An iterative procedure, described in [48], is then used to increase the differences between the large and small singular values of H. This decreases the computational cost of the next stage of the process, whereby a QR-decomposition is used to orthonormalise the column vectors of H. The QR-decomposition is used to account for the fact that, potentially, the randomly generated projection matrix G may not be perfectly orthonormal. Having been orthonormalised, H is then used to realise a final, low-rank approximation of A. A standard SVD is then applied to this approximation.
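The steps outlined above can be sketched in a few lines of NumPy. This is a generic randomised SVD in the spirit of the cited references, not the implementation used in the study: the function name `randomised_svd`, the oversampling parameter, the number of power iterations and the synthetic test matrix are all assumptions made for illustration.

```python
import numpy as np

def randomised_svd(A, rank, n_oversample=10, n_power_iter=2, seed=0):
    """Approximate the leading `rank` singular triplets of A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = min(rank + n_oversample, n)

    # Projection matrix with zero-mean, unit-variance Gaussian entries.
    G = rng.standard_normal((n, k))
    H = A @ G  # H = AG

    # Power iterations sharpen the gap between large and small singular
    # values; QR steps keep the intermediate matrices well conditioned.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(H)
        Z, _ = np.linalg.qr(A.T @ Q)
        H = A @ Z

    # Orthonormalise the columns of H.
    Q, _ = np.linalg.qr(H)

    # Small k x n matrix; Q @ A_small is the low-rank approximation of A.
    A_small = Q.T @ A
    U_small, s, Vt = np.linalg.svd(A_small, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage on a synthetic matrix of exact rank 5 (NOT the study's data).
rng = np.random.default_rng(1)
A_demo = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 40))
U_demo, s_demo, Vt_demo = randomised_svd(A_demo, rank=5)
```

The expensive standard SVD is applied only to the small k × n matrix, which is what makes the approach practical when the number of rows of A is very large.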
In the current work, for each tensile test bar, the measurement time histories from photodiodes 1 and 2 were each projected onto a single basis vector only. As a consequence, each specimen becomes associated with a 2-dimensional 'feature vector'. The first element of the feature vector represents the projection of measurements from photodiode 1 onto a single basis vector, while the second element is the projection of measurements from photodiode 2 onto a single basis vector. Inevitably, some information is lost in this projection process. Figures 4 and 5 respectively compare a segment of the measurements from photodiodes 1 and 2 for a single specimen, before and after the projection onto a single basis vector. If this level of information loss were deemed unsatisfactory, one could choose to project these measurement time histories onto a greater number of basis vectors (although this would, in turn, increase the dimensionality of the space within which the machine learning algorithm must be applied). In the current study, however, it was found that projecting onto a single basis vector made it possible to distinguish between acceptable and faulty builds with sufficient accuracy. The potential benefits of projecting onto more than 1 basis vector will be investigated in future work.
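As a rough illustration of how a 2-dimensional feature vector could be assembled, the sketch below projects each bar's measurements onto the leading basis vector for each photodiode. The data are synthetic, a standard SVD is used for brevity (the study used a Randomised SVD), and the names `project_onto_leading_vector`, `A_pd1` and `A_pd2` are hypothetical.

```python
import numpy as np

def project_onto_leading_vector(A):
    """Return the leading left singular vector of A and the coefficient
    obtained by projecting each column of A onto it."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    u1 = U[:, 0]
    coeffs = u1 @ A  # one coefficient per column (bar)
    return u1, coeffs

# Synthetic stand-ins for the two photodiode data matrices
# (rows: samples, columns: bars); NOT the study's data.
rng = np.random.default_rng(0)
n_samples, n_bars = 1000, 25
A_pd1 = rng.standard_normal((n_samples, n_bars))
A_pd2 = rng.standard_normal((n_samples, n_bars))

u1_pd1, c_pd1 = project_onto_leading_vector(A_pd1)
u1_pd2, c_pd2 = project_onto_leading_vector(A_pd2)

# One 2-D feature vector per bar: (photodiode 1, photodiode 2).
feature_vectors = np.column_stack([c_pd1, c_pd2])

# Time history of the first bar after projection onto a single basis
# vector, i.e. the kind of curve compared with the raw data in Figure 4.
compressed_bar1 = c_pd1[0] * u1_pd1
```

The sign of a singular vector is arbitrary, so the sign of each feature may flip between SVD implementations; the flip applies to all bars at once and therefore does not change their relative positions in the feature space.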

Semi-Supervised Learning Application
Tensile tests were performed on the builds using a standard Instron tensile machine at room temperature. As detailed in Section 4.1, the ultimate tensile strength (UTS) of each bar was used to define the bar as 'acceptable' or 'faulty'. Semi-supervised learning was applied to the features extracted from each of the bars. However, bar 22 from build B4741 was not considered because its ultimate tensile strength could not be obtained. As a result, 49 specimens were considered in this analysis. Figure 6 shows the position of each specimen in the feature space and the associated labels. With the aim of distinguishing between 'acceptable' and 'faulty' cases, a GMM with two Gaussian distributions was employed.
In the following, when assessing new data, specimens are labelled as faulty if the probability that they are faulty is greater than 0.5. This was considered sufficient for analysing the feasibility of the approach such that, once established, future work can aim to further exploit the uncertainty information contained in such probabilistic outputs. In the author's opinion, it is important that an uncertainty quantification framework is built into the proposed approach from the outset as, for approaches that are purely data based, knowing when a diagnosis is uncertain and where human intervention may be required will be crucial. It is noted that, in the following, the algorithm is always initialised using the results of a purely supervised approach. Specifically, the first iteration ignores unlabelled samples and produces an initial estimate of the GMM parameters using the labelled samples only (employing equations (11), (12), (13) and (14)).
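The supervised initialisation and the 0.5 decision threshold can be sketched as follows. Equations (11) to (14) are not reproduced in this section, so the sketch assumes the usual maximum-likelihood estimates (class proportions, sample means and sample covariances); the function names and the synthetic data are illustrative only.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** X.shape[1] * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def supervised_init(X_lab, y_lab):
    """Estimate (mixing proportion, mean, covariance) per class from
    labelled samples only. Class 0: acceptable, class 1: faulty."""
    params = []
    for c in (0, 1):
        Xc = X_lab[y_lab == c]
        mean = Xc.mean(axis=0)
        diff = Xc - mean
        params.append((len(Xc) / len(X_lab), mean, diff.T @ diff / len(Xc)))
    return params

def prob_faulty(X, params):
    """Posterior probability that each row of X belongs to class 1."""
    weighted = np.column_stack(
        [pi * gaussian_pdf(X, mean, cov) for pi, mean, cov in params]
    )
    return weighted[:, 1] / weighted.sum(axis=1)

# Synthetic 2-D features (NOT the study's data).
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal([0, 0], 1.0, (20, 2)),
                   rng.normal([8, 8], 1.0, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)

params = supervised_init(X_lab, y_lab)
X_new = np.array([[0.0, 0.0], [8.0, 8.0]])
p_new = prob_faulty(X_new, params)
faulty_flags = p_new > 0.5  # labelled faulty if probability exceeds 0.5
```

Keeping `p_new` alongside the thresholded labels preserves the uncertainty information: probabilities near 0.5 flag specimens for which human intervention may be warranted.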

Results
Initial runs concentrated on a single case where 25 of the specimens were labelled while the remaining 24 were unlabelled. For this case, the unlabelled points were selected randomly, leading to the training data shown in Figure 7. The semi-supervised GMM was then trained, before being used to classify all 49 specimens. Using the training data shown in Figure 7, faulty specimens were identified with a 77% success rate. These results are illustrated in Figure 8, where red and green contours illustrate the positions of the two Gaussians in the mixture model, circles represent the true labels that were assigned to each specimen and triangles show the labels inferred by the algorithm. Note that the inferred labels are colour-coded depending on the probabilities that were assigned by the algorithm: purely green triangles correspond to a probability of a faulty specimen equal to zero, while purely red triangles correspond to a probability of a faulty specimen equal to one.

The results in Figure 8 represent the algorithm's outputs for a single set of training data only. To better gauge overall performance, a Monte Carlo analysis was conducted: 1000 analyses were undertaken where, at each Monte Carlo iteration, the 24 unlabelled points were selected randomly. The resulting positions of the two Gaussian distributions were found to be relatively insensitive to the choice of unlabelled points. This is illustrated in Figure 9, which shows the results that were obtained for six runs of the Monte Carlo analysis. Furthermore, the algorithm success rate was also found to be relatively insensitive to the assignment of unlabelled data; the histogram in Figure 10 shows success rates that are closely clustered around 77%. It is important to note that, by giving a probabilistic estimate of each specimen's label, uncertainty quantification is embedded into the approach. This is useful as it can illustrate, to the user, when a particular specimen is difficult to label (i.e. when it is not clear which cluster the data point belongs to).

To analyse how the algorithm's performance degrades as less labelled data is used, similar Monte Carlo simulations were conducted using different amounts of labelled and unlabelled data. Figure 11 shows results ranging from the case where there are 48 labelled points (and 1 unlabelled point) to the case where there are 20 labelled points (and 28 unlabelled points). While lower success rates are more frequently observed when the number of labelled data points is reduced (as one would expect), it is encouraging to note that algorithm performance does not drop off sharply. It can be seen, for example, that the number of labelled data points can be halved without significantly altering the resulting success rates. While, in this example, labels were relatively cheap to obtain (using tensile tests), the cost savings associated with the semi-supervised approach will clearly increase when more thorough and/or expensive certification methods are used. For example, in the author's experience, a CT scan of a typical component usually costs between £500 and £1000.
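The Monte Carlo procedure can be sketched as below. This is a generic semi-supervised EM scheme for a two-component GMM, written under several assumptions: the data are synthetic and well separated (so the resulting rates say nothing about the study's results), 'success rate' is taken to mean the fraction of specimens whose inferred label matches the true label, a small covariance regularisation term `reg` is added for numerical safety, and all function and variable names are hypothetical.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    diff = X - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** X.shape[1] * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def fit_and_classify(X, y, labelled, n_iter=50, reg=1e-6):
    """Semi-supervised EM for a 2-component GMM. Labels in y are used
    only where `labelled` is True. Returns P(faulty) for every row."""
    n, d = X.shape
    # Responsibilities: one-hot for labelled points, zero for unlabelled
    # points, so the first M-step uses the labelled samples only.
    R = np.zeros((n, 2))
    idx = np.flatnonzero(labelled)
    R[idx, y[idx]] = 1.0
    for _ in range(n_iter):
        # M-step: weighted proportions, means and covariances.
        Nk = R.sum(axis=0)
        pis = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        dens = np.empty((n, 2))
        for k in range(2):
            diff = X - means[k]
            cov = (R[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
            dens[:, k] = pis[k] * gaussian_pdf(X, means[k], cov)
        # E-step: update responsibilities of unlabelled points only.
        post = dens / dens.sum(axis=1, keepdims=True)
        R[~labelled] = post[~labelled]
    return post[:, 1]

# Synthetic, well-separated 2-D features (NOT the study's data).
rng = np.random.default_rng(0)
n_acceptable, n_faulty = 30, 19
X = np.vstack([rng.normal([0, 0], 1.0, (n_acceptable, 2)),
               rng.normal([6, 6], 1.0, (n_faulty, 2))])
y = np.array([0] * n_acceptable + [1] * n_faulty)

# Monte Carlo: re-draw the unlabelled points at random for each run.
n_runs, n_unlabelled = 200, 24
rates = np.empty(n_runs)
for run in range(n_runs):
    labelled = np.ones(len(X), dtype=bool)
    labelled[rng.choice(len(X), size=n_unlabelled, replace=False)] = False
    p_faulty = fit_and_classify(X, y, labelled)
    rates[run] = np.mean((p_faulty > 0.5) == (y == 1))
```

A histogram of `rates`, repeated for several values of `n_unlabelled`, would give the kind of summary shown in Figures 10 and 11.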

Conclusion and Future Work
Additive Manufacturing (AM) is a digital approach for manufacturing highly customised components. However, uncertainties surrounding part quality hinder the adoption of AM technology in many risk-averse sectors. This paper is the outcome of a feasibility study wherein a semi-supervised machine learning algorithm was developed and applied to a large amount of AM process data (photodiode measurements, generated during SLM builds of tensile test bars). Key features were extracted from these large datasets using a Randomised Singular Value Decomposition, before a Gaussian Mixture Model was trained to recognise builds that had been identified as 'faulty'. The semi-supervised approach allowed this to be conducted using a reduced number of certification experiments and, even when the number of labelled data points was halved, could consistently identify faulty builds with a success rate close to 77%. Key contributions are summarised as follows:
1. In this work it was demonstrated how, when using machine learning to infer part quality from SLM process measurements, the large quantity of available data can prevent the application of 'conventional' feature extraction methods. It was illustrated how this challenge can be overcome using methods that are suitable for large datasets (a Randomised Singular Value Decomposition in this case).
2. By classifying builds with a 77% success rate, the feasibility of identifying faulty SLM builds using a purely data-based analysis of photodiode measurement time histories has been demonstrated.
3. It has been demonstrated that, through a semi-supervised approach, the number of costly certification experiments required in the implementation of machine-learnt build classification can be significantly reduced.
The paper has led to several avenues of future work.
Firstly, the authors are currently investigating whether the results reported in the current manuscript can be improved through the use of additional basis vectors. This will reduce the amount of information lost during feature extraction but will also increase the dimensionality of the feature space within which machine learning must be performed. Secondly, with regard to sensing systems, the current paper utilised data from photodiode sensors (which have been shown to be closely related to properties of the melt pool [24]). Future work aims to investigate whether classification can be improved through the use of additional, complementary sensing systems (acoustic sensors and thermal imaging cameras, for example). Finally, the authors are currently developing a version of the semi-supervised algorithm described in the current paper that is suitable for layer-by-layer defect detection, using data provided from CT scans. Ultimately, the aim of this work is to establish machine-learnt control strategies that can de-risk AM technology, facilitate its wider adoption and reduce the time associated with new materials innovation.

Figure 2: x-y coordinates of the laser as a single layer of a build is being constructed. Red areas indicate the positions of the 25 tensile test bars while blue represents the laser path. Note that x-y coordinates are calculated from galvanometer measurements and that, for confidentiality reasons, units of position have been left as arbitrary.

Figure 3: Initial analysis of data from a single photodiode sensor, for a single build.

Figure 5: Outputs for photodiode 2, for the first tensile test bar of build B4739. Black represents the uncompressed measurements, red represents measurements after they have been projected onto a single basis vector. Note that, for confidentiality reasons, unitless photodiode measurements are shown here.

Figure 6: The position of each specimen in the feature space. The triangular points are for bars from build B4741 and the circular points are for bars from build B4739. The colour green represents acceptable specimens and red represents faulty specimens.

Figure 7: Training data. Example labelled and unlabelled specimens, in the feature space, before the application of semi-supervised learning.

Figure 8: Example semi-supervised learning results. Red and green contours show the inferred geometry of the two Gaussian distributions in the mixture. Circles represent the true labels that were assigned to each specimen, while triangles show the inferred labels.

Figure 9: Semi-supervised learning results obtained for 6 runs of a Monte Carlo simulation where, for each run, 24 unlabelled points are selected randomly.

Figure 10: Histogram of algorithm success rates, obtained over 1000 runs of Monte Carlo simulation where, for each run, 24 unlabelled points are selected randomly.

Figure 11: Histogram of algorithm success rates, obtained over 1000 runs of Monte Carlo simulation, as a function of the number of labelled data points.

Figure 4: Outputs for photodiode 1, for the first tensile test bar of build B4739. Black represents the uncompressed measurements, red represents measurements after they have been projected onto a single basis vector. Note that, for confidentiality reasons, unitless photodiode measurements are shown here.