Image-based dataset of artifact surfaces fabricated by additive manufacturing with applications in machine learning

Abstract
Fused Deposition Modeling (FDM), also known as Fused Filament Fabrication (FFF), is the most widely used type of Additive Manufacturing (AM) technology at the consumer level. This technology suffers severely from a lack of online quality assessment and process adjustment. To fill this gap, a high-speed 2D Laser Profiler (KEYENCE LJ-V7000 series) is mounted above an FDM machine and performs a scan after each printed layer. The point cloud of the upper surface is processed and transformed into a 2D depth map to analyze in-plane anomalies during the FDM fabrication process. The authors used these data to categorize the surface quality into four categories: under printing, over printing, normal, and empty regions. The authors demonstrated the effectiveness of the data in detecting print anomalies; further work can apply more advanced algorithms to improve detection accuracy, or approach the data in novel ways to detect a broader range of anomalies.

Data Description
The dataset presented in this article consists of 434 scan files from the top surfaces of the artifacts under study. Each scan file is pre-analyzed and thresholded to generate color map data that better represent the surface quality. All the scan files are summarized and stored in a struct file with the following fields:

1) Heightmap image of size [300 × 300 × 3 uint8]
2) Labels [10 × 10 double]
3) Part's name or number as a string

Each heightmap is divided into a 10 × 10 grid (100 segments in total), and professionals labeled each segment into four categories: (a) Over Printing Situation, (b) Normally Printed Situation, (c) Under Printing Situation, (d) Empty. After acquiring the labels, each image was cropped into 100 smaller images, which were then stored in separate folders based on their labels. Table 1 summarizes the amount of data available for each category and the proportion used for training, validation, and testing. Four directories containing all the segments are also attached to the dataset. In addition, a MATLAB-based User Interface (U.I.) developed for labeling this dataset is attached. Fig. 1 shows the structure of the dataset.
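For readers who want to work with the struct file programmatically, the following MATLAB sketch loads one scan and splits its heightmap into the 10 × 10 grid of segments. The file name, field names (data, heightmap, labels), and the numeric label coding are illustrative assumptions, not documented names from the dataset.

```matlab
% Minimal sketch of reading the dataset struct and splitting one scan
% into its 10-by-10 grid of labeled segments. File and field names are
% assumptions; adjust them to match the shared struct file.
S = load('dataset.mat');                 % hypothetical file name
scan = S.data(1);                        % one of the 434 scans (assumed field)

img = scan.heightmap;                    % 300 x 300 x 3 uint8 color depth map
lbl = scan.labels;                       % 10 x 10 double segment labels

% Split the 300 x 300 image into 100 segments of 30 x 30 pixels each.
segSize = size(img, 1) / 10;             % 30 pixels per grid cell
segments = mat2cell(img, repmat(segSize, 1, 10), repmat(segSize, 1, 10), 3);

% Fetch the segment and its label at grid position (row, col).
row = 4; col = 7;
seg   = segments{row, col};              % 30 x 30 x 3 uint8 crop
label = lbl(row, col);                   % category code for this segment
```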

Experimental Design, Materials and Methods
The dataset presented in this study is captured by an online laser-based process monitoring and anomaly identification system for the FDM machine. This system is equipped with a non-contact laser scanner that obtains the top surface profile of each layer during the FDM fabrication process. This section describes the experimental setup, the pre-processing steps for the dataset, and the data labeling steps.

Experimental Setup
In this study, the FDM machine used is a Creality Ender 5 with a 220 mm × 220 mm × 300 mm build volume. The schematic of the online laser-based process monitoring system is shown in Fig. 2. A high-speed 2D Laser Profiler (KEYENCE LJ-V7000 series) is mounted above the FDM machine. The laser scanner comprises a laser line emitter and a built-in camera packaged with a fixed relative position. The scanner is factory-calibrated and has a 1 μm resolution in the Z direction. The measurement range of the laser scanner is ±48 mm from the reference position, which is 200 mm below the bottom of the sensor. An artifact is designed in SOLIDWORKS 3D CAD software for each print and exported as a Standard Tessellation Language (STL) file in American Standard Code for Information Interchange (ASCII) encoding. Simplify3D software is used to slice the STL model and generate the G-code of the artifact. The FDM machine then starts printing after receiving the G-code from the workstation. After each layer is completed, the scanner frame motion mechanism drives the laser scanner above the fabricated layer at constant speed. At the same time, an angle encoder continuously triggers the scanner to perform measurements. Limit switches ensure that the scan is restricted to the platform area of the FDM machine. The 3D point cloud of the upper surface is then acquired from the laser scanner. The experimental setup is shown in Fig. 3.

Point cloud processing
The 3D point cloud acquired from the laser scanner includes noise that must be removed before further analysis. This noise is commonly caused by mechanical vibration during the scanning process and by the shadow effect characteristic of the optical scanning sensor. This study uses three steps to pre-process the 3D point cloud: replacing missing measurements, removing measurement noise, and coordinate transformation.

Step 1: replace missing measurement values
When the target is out of the detection range during the laser scanning process, or not enough information is captured, the laser scanner generates missing measurement values, flagged as NaN by this scanner. The raw data obtained from the laser scanner is a 2D depth matrix of size m × n. To replace the missing measurement values, a median operator with a 3 × 3 window is utilized [2]. If a missing value is surrounded entirely by other missing values, the pixel is removed. The filled 2D depth dataset is then converted to point cloud form, represented as (x, y, z) spatial coordinates.
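A minimal MATLAB sketch of this filling step, under the assumption that Z holds the raw m × n depth matrix with NaN flags, is given below; the lateral pitch used for the point-cloud conversion is illustrative.

```matlab
% Replace each NaN in the depth matrix Z with the median of its valid
% 3-by-3 neighbors; pixels whose whole neighborhood is missing stay NaN
% and are dropped during the point-cloud conversion below.
Zf = Z;
[rows, cols] = find(isnan(Z));
for k = 1:numel(rows)
    r = rows(k); c = cols(k);
    win = Z(max(r-1,1):min(r+1,end), max(c-1,1):min(c+1,end));
    nb  = win(~isnan(win));                   % valid neighbors only
    if ~isempty(nb)
        Zf(r, c) = median(nb);
    end
end

% Convert the filled depth map to (x, y, z) points. The lateral pitch is
% the scanner's sample spacing; the value here is illustrative.
pitch = 0.05;                                  % mm, assumed
[X, Y] = meshgrid((0:size(Zf,2)-1)*pitch, (0:size(Zf,1)-1)*pitch);
valid = ~isnan(Zf);
pts = pointCloud([X(valid), Y(valid), Zf(valid)]);
```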

Step 2: remove measurement noise
Due to the design of the laser scanner, the shadow effect can cause measurement noise in the raw data. When the scanner moves over areas of rapid height change, these areas can be identified algorithmically as outliers based on the surrounding topography. The statistical outlier removal algorithm and the region growing algorithm in the Point Cloud Library (PCL) [3] are customized and utilized to reduce the noise in the point cloud dataset. The effect of applying these algorithms is shown in Fig. 4. Fig. 4(a) highlights the noisy point cloud, which concentrates around the step area. In Fig. 4(b), to better display the point cloud after applying the statistical outlier removal algorithm, points at different heights are rendered in different colors. It shows that the algorithm effectively removes the noise between steps. Fig. 4(c) shows noise points forming an island apart from the main body. This extra plastic is deposited as the print head travels through open air, and the laser scanner captures it; the phenomenon is commonly caused by incorrect retraction settings or excessive extruder temperatures. Since the study focuses on the upper surface, the point cloud of the extra material is treated as noise and removed using a region-growing algorithm.
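The original pipeline uses the C++ routines from PCL. As a rough MATLAB analogue (Computer Vision Toolbox), statistical outlier removal corresponds to pcdenoise, and the island removal can be approximated by Euclidean clustering with pcsegdist, keeping only the largest cluster; all parameter values are illustrative.

```matlab
% Statistical outlier removal: drop points whose mean distance to their
% neighbors deviates from the global average (analogous to PCL's
% StatisticalOutlierRemoval filter).
ptsClean = pcdenoise(pts, 'NumNeighbors', 8, 'Threshold', 1.0);

% Island removal: cluster by Euclidean distance and keep only the
% largest cluster, discarding detached blobs of stray extruded plastic
% (a simple stand-in for PCL's region growing).
labels = pcsegdist(ptsClean, 0.5);            % 0.5 mm cluster tolerance, assumed
mainCluster = mode(labels);                   % label of the largest cluster
ptsMain = select(ptsClean, find(labels == mainCluster));
```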

Step 3: point cloud transformation
After removing the noise data, the point cloud is transformed into the FDM printer coordinate system so that all scan results share the same spatial frame. Therefore, a calibration process is necessary to find the spatial relationship between the FDM printer and the laser scanner [4]. Spherical targets are used as calibration reference markers, and the calibration method is summarized in Table 2. Four hemisphere targets are fabricated by the FDM printer. Since the fabrication coordinates of the hemisphere centers are known from the G-code file, once the scan coordinates of the fabricated targets are obtained, the spatial relationship between the FDM printer and the laser scanner can be calculated and represented as a 3D affine transformation matrix. To reduce the computational burden, a transformation matrix R_1 is first calculated to make the platform plane of the point cloud parallel to the x-y plane of the FDM machine. A second transformation matrix R_2 is then obtained by transferring one of the hemisphere centers to its design coordinate. Finally, the iterative closest point (ICP) algorithm, which is widely used to minimize the difference between two point clouds, is applied to fine-tune the alignment of all hemisphere centers to the design coordinates.

Table 2
Calibration algorithm for laser scanner.

Calibration for the 3D printer and laser scanner system
Input: point cloud of the calibration target pt_scan; vector of hemisphere centers in the 3D printer coordinate system v_target
Output: transformation matrix R
1. Denoise pt_scan.
2. Using RANSAC, segment the platform plane plane_platform from pt_scan and obtain its normal vector v_platform.
3. Obtain the transformation matrix R_1 from v_platform to (0, 0, 1).
4. Apply R_1 to pt_scan to obtain pt_1.
5. Segment the hemispheres from pt_1 and obtain the vector of hemisphere centers v_1.
6. Obtain the transformation matrix R_2 from v_1 to v_target.
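A hedged MATLAB sketch of this calibration chain (plane fit, center alignment, ICP refinement) is shown below; extractSphereCenters is a hypothetical helper (e.g., built on pcfitsphere per segmented target), rigidtform3d requires a recent MATLAB release (R2022b+), and all tolerances are illustrative.

```matlab
% Steps 1-3: fit the platform plane by RANSAC-style plane fitting and
% rotate its normal onto (0, 0, 1) via the Rodrigues formula.
[plane, ~] = pcfitplane(ptScan, 0.15);               % 0.15 mm tolerance, assumed
n = plane.Normal / norm(plane.Normal);
v = cross(n, [0 0 1]); s = norm(v); c = dot(n, [0 0 1]);
Vx = [0 -v(3) v(2); v(3) 0 -v(1); -v(2) v(1) 0];     % skew-symmetric matrix of v
if s < eps
    R1 = eye(3);                                     % normal already aligned
else
    R1 = eye(3) + Vx + Vx^2 * (1 - c) / s^2;         % Rodrigues rotation
end
pt1 = pctransform(ptScan, rigidtform3d(R1, [0 0 0]));

% Steps 5-6: locate the scanned hemisphere centers (hypothetical helper)
% and translate one center onto its known G-code design coordinate.
v1 = extractSphereCenters(pt1);                      % 4 x 3 scanned centers
t  = vTarget(1, :) - v1(1, :);                       % vTarget: design centers
pt2 = pctransform(pt1, rigidtform3d(eye(3), t));

% Final refinement: ICP aligns all scanned centers to the design centers.
tformICP  = pcregistericp(pointCloud(v1 + t), pointCloud(vTarget));
ptAligned = pctransform(pt2, tformICP);
```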
The point cloud of interest, the upper surface, is then segmented from the pre-processed point cloud. As shown in Fig. 5, the Random Sample Consensus (RANSAC) algorithm [5] is used to segment the planes of the point cloud. Using this algorithm, only points between two parallel planes within a distance threshold δ are considered inliers. In this study, δ is set to half of the layer thickness. The majority of the upper-surface point cloud can thus be selected, shown as green points in Fig. 5. The plane segmented by RANSAC is identified as the upper-surface virtual plane. The threshold δ is then increased to the full layer thickness to include more points, which are used as the upper surface in the subsequent processing.
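Continuing from the calibrated cloud ptAligned of the previous sketch, the two-pass inlier selection can be approximated in MATLAB with pcfitplane (which implements MSAC, a RANSAC variant); this is a sketch of the idea, not the authors' implementation.

```matlab
layerThickness = 0.3;                                % mm, from this study

% First pass: fit the upper-surface virtual plane with a tight band of
% delta = half the layer thickness on either side of the plane.
delta = layerThickness / 2;
upperPlane = pcfitplane(ptAligned, delta);

% Second pass: widen the band to the full layer thickness around the
% fitted plane and keep every point inside it as the upper surface.
p = ptAligned.Location;
d = abs(p * upperPlane.Normal' + upperPlane.Parameters(4)) ...
    / norm(upperPlane.Normal);                       % point-to-plane distance
upperSurface = select(ptAligned, find(d <= layerThickness));
```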
Finally, to generate the 2D depth image dataset shared in this paper, the point cloud of the upper surface is projected onto a 2D plane using the rasterization method [4]. The color of each pixel in the 2D depth image is determined by the ratio of the accumulated depth of the pixel to the layer thickness. For example, if a pixel is shown as green (0, 255, 0), the accumulated depth of this pixel is between -20% and +20% of the layer thickness, which is 0.3 mm in this study. The color mapping rule is shown in Fig. 6.
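Only the green band is specified explicitly in the text; a minimal sketch of such a ratio-to-color rule, with the red and blue bands as assumptions mirroring the over/under-printing colors used for the labels, might look as follows.

```matlab
function rgb = depthToColor(accumDepth, layerThickness)
% Map the accumulated depth of one pixel to an RGB color based on its
% ratio to the layer thickness. Only the green band (-20% to +20%) is
% specified in the paper; the red (over printing) and blue (under
% printing) bands below are illustrative assumptions.
ratio = accumDepth / layerThickness;
if isnan(ratio)
    rgb = [255 255 255];      % empty / missing pixel -> white
elseif ratio > 0.2
    rgb = [255 0 0];          % too much material -> red (assumed)
elseif ratio < -0.2
    rgb = [0 0 255];          % too little material -> blue (assumed)
else
    rgb = [0 255 0];          % within +/-20% of layer thickness -> green
end
end
```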

Database Structure
The database is generated after printing multiple artifacts, as shown in Fig. 7. Each artifact was printed with different settings, and various surface qualities were observed. The process parameters are shown in Table 3; all other parameters are kept constant. Each observation was then studied and processed. In total, we gathered 434 scans, and by dividing each image into a 10 × 10 grid of segments, we generated 43,400 labeled image segments.

Labeling the Dataset
Labeling a dataset in many cases requires significant investment and effort from professionals to make sure the data is labeled correctly and in a repeatable, unbiased manner. In this dataset, to simplify the process, four label categories are selected: 1) Over Printing Situation, 2) Normally Printed Situation, 3) Under Printing Situation, 4) Empty Region. To better visualize the labeled data, a solid color is assigned to each grid cell based on its label: red, green, blue, and white, respectively. Due to the complexity of the labeling task and the large volume of data, a MATLAB-based User Interface was developed to minimize user error in labeling and to store the labels, taking advantage of a visual representation of the data and labels. This Interface allows researchers to load their data, define the number of segments and labels needed, and ease the labeling process. With the review functionality, professionals can go over previously labeled data to double-check and visually verify its accuracy. In Fig. 8, a sample of the labeling U.I. is shown in Testing mode. This U.I. is also shared along with the data.
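As an illustration of this color convention, the sketch below renders a 10 × 10 label matrix lbl (as stored in the struct file) as the solid-color overlay described above; the numeric coding 1 = over printing, 2 = normal, 3 = under printing, 4 = empty is an assumption.

```matlab
% Render a 10 x 10 label matrix as a solid-color overlay: red = over
% printing, green = normal, blue = under printing, white = empty.
% The numeric coding 1..4 is assumed, not documented by the dataset.
palette = [255 0 0; 0 255 0; 0 0 255; 255 255 255];  % one RGB row per label
overlay = zeros(300, 300, 3, 'uint8');
for r = 1:10
    for c = 1:10
        rows = (r-1)*30 + (1:30);                    % 30 x 30 pixel cell
        cols = (c-1)*30 + (1:30);
        color = reshape(palette(lbl(r, c), :), 1, 1, 3);
        overlay(rows, cols, :) = repmat(color, 30, 30, 1);
    end
end
imshow(overlay);                                     % visualize the labels
```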