Thermal error modelling of a gantry-type 5-axis machine tool using a Grey Neural Network Model

This paper presents a new modelling methodology for compensation of the thermal errors on a gantry-type 5-axis CNC machine tool. The method uses a “Grey Neural Network Model with Convolution Integral” (GNNMCI(1, N)), which makes full use of the similarities and complementarity between Grey system models and artificial neural networks (ANNs) to overcome the disadvantage of applying either model in isolation. A Particle Swarm Optimisation (PSO) algorithm is also employed to optimise the proposed Grey neural network. The size of the data pairs is crucial when the generation of data is a costly affair, since the machine downtime necessary to acquire the data is often considered prohibitive. Under such circumstances, optimisation of the number of data pairs used for training is of prime concern for calibrating a physical model or training a black-box model. A Grey Accumulated Generating Operation (AGO), which is a basis of the Grey system theory, is used to transform the original data to a monotonic series of data, which has less randomness than the original series of data. The choice of inputs to the thermal model is a non-trivial decision which is ultimately a compromise between the ability to obtain data that sufficiently correlates with the thermal distortion and the cost of implementation of the necessary feedback sensors. In this study, temperature measurement at key locations was supplemented by direct distortion measurement at accessible locations. This form of data fusion simplifies the modelling process, enhances the accuracy of the system and reduces the overall number of inputs to the model, since otherwise a much larger number of thermal sensors would be required to cover the entire structure. The Z-axis heating test, C-axis heating test, and the combined (helical) movement are considered in this work. The compensation values calculated by the GNNMCI(1, N) model were sent to the controller for live error compensation.
Test results show that an 85% reduction in thermal errors was achieved after compensation. Crown Copyright © 2016 Published by Elsevier Ltd on behalf of The Society of Manufacturing Engineers. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).


Introduction
Much current research focuses on high production rates on small machine tools. However, large machine tools are of great importance because of the significant demand for large high-accuracy parts, such as impellers, engine blocks, aeroplane sections, aerofoils, etc. The accuracy of a gantry-type 5-axis machine tool capable of manufacturing large parts is usually not as high as that of small, three-axis machine tools because there are a greater number of error sources, which are amplified by bigger volumes and longer axis strokes. High accuracy for smaller machines is often achievable by improved design or other "error avoidance" strategies. However, the same reductions in error are not always technically or commercially viable for larger machines.
acteristics. However, building a numerical model can be a great challenge due to problems of establishing the boundary conditions and accurately obtaining the characteristic of heat transfer. Therefore, testing of the machine tool is still required to calibrate the model for successful application of the technique.
In contrast, other techniques use empirical modelling, where the model is based on the experimental measurements of the machine tool, rather than calibrating an existing model. Different model structures have been used to predict thermal errors in machine tools such as multiple regression analysis [4], types of artificial neural networks [5], fuzzy logic [6], an adaptive neuro-fuzzy inference system [7,8], Grey system theory [9] and a combination of several different modelling methods [10,11].
Early work by Chen et al. [4] used both a multiple regression analysis (MRA) model and an artificial neural network (ANN) model for thermal error compensation of a horizontal machining centre. To build their models, 810 data sets were collected from five different tests; each test was run for 6 h for a heating cycle and then stopped for 10 h for a cooling down cycle. With their experimental results, the thermal error was reduced from 196 to 8 μm. Wang [10] used a Hierarchy-Genetic-Algorithm (HGA) trained neural network in order to map the temperature change against the thermal response of the machine tool. Wang [8] also proposed a thermal model by using an Adaptive Neuro Fuzzy Inference System (ANFIS) and optimised the number of sensors by Grey system model GM (1,m). A hybrid learning method, which is a combination of both steepest descent and the least-squares estimator methods, was used in the learning algorithms. Experimental results indicated that the thermal error compensation model could reduce the thermal error to less than 9 μm under real cutting conditions. Wang in Refs. [10,8] used 150 min and 480 min of data acquisition in order to build HGA and ANFIS models, respectively. However, both models require training cycles to teach the model how to respond to various changes in input conditions. Eskandari et al. [12] presented a method to compensate for positional, geometric, and thermally induced errors of a three-axis CNC milling machine using an offline technique. Thermal errors were modelled by three empirical models: MRA, ANN, and ANFIS. To build their models, the experimental data were collected every 10 min while the machine was running for 120 min. The experimental data were divided into training and checking data sets. Their validated results on a free form show a significant average improvement of 41% in the errors. Abdulshahed et al. [13] proposed a thermal model by using an ANFIS with fuzzy c-means clustering.
Different groups of key temperature points were identified from thermal images using a novel schema based on a GM (0, N) model and fuzzy c-means clustering. Experimental results indicated that the thermal error compensation model could reduce the thermal error to less than 2 μm. Similar work has also been carried out by the same authors in Refs. [11,14,15].
Wang et al. [9] proposed a systematic methodology for the thermal error compensation of a machine tool. The thermal response was modelled using a Grey model based on Grey system theory to predict the thermal errors with only 30 min of measured data. Unfortunately, their model lacks the ability of self-learning, self-adaption, self-organisation, and consideration of feedback correction. Therefore, their model obtained under one particular operating condition is not robust under other operating conditions. Gomez-Acedo et al. [16] proposed a parametric state space model for the compensation of thermal distortions in large machine tools. Only two temperature sensors and spindle speed were used as model inputs. A small number of thermal sensors, however, might lead to poor prediction accuracy.
Whilst empirical models can be good at predicting thermal errors, they require a large amount of data with different working conditions to determine the governing laws of the system. However, a realistic governing law may not exist even when a large amount of data has been measured. Furthermore, the process of obtaining such data can take several hours for internal heating tests and several days or more for the environmental test.
The growing complexity of manufacturing systems drives research to develop techniques to imitate the underlying functionality of the system. In the past, the model had to be kept as simple as possible. For instance, although the ANN models are more accurate than the regression models, the calibration of the regression model coefficients is simpler (least squares approach). Nevertheless, there is still a strong argument for simplicity, where possible, to avoid over-constraining the system and introducing instability. Extensive research has also explored a number of metamodels, e.g. polynomial models, radial basis function (RBF), and ANN models. Metamodeling involves (i) choosing an experimental design, (ii) choosing a model, and then (iii) training/calibrating the model to the experimental data [17]. There are several options for each of these steps as illustrated in Ref. [17]. Hussain et al. [18] have used a metamodeling technique based on radial basis functions, which explored using factorial and Latin hypercube designs. The resulting metamodel was tested on seven different data sets, obtained from known input-output relationships. Simulation results indicate that the factorial designs generally provided better fit compared with Latin hypercube designs for metamodels using RBF, except in some instances near the centre of design space.
Properly designed experiments should be used to obtain an accurate model. The number of samples can vary greatly depending on the complexity of the system under consideration [19]. However, many other statistical models have been trained successfully with small amounts of training data [20,21,19]. Buragohain and Mahanta [19] have proposed an ANFIS-based modelling method where the number of data samples employed for training was minimised by application of an engineering statistical technique called full factorial design. Furthermore, in Refs. [20,21] they have applied another method called the V-Fold technique. Although their techniques were able to construct a model with a small number of training samples (as few as 7), they still used all the experimental samples in order to select the optimal ones. Data transformation can also change the smoothness and comparability of the data. For instance, Huang and Chu [22] have proposed a data transformation technique to simplify the fuzzy modelling procedures. The transformation method allows all of the raw data to be mapped to another domain such that there is no need to adjust the membership functions, and the fuzzification process simply takes place on the fixed ones. Shmilovici and Aguilar-Martin [23] have also utilised the Box-Cox transform to improve the quality of the fuzzy model before parameter optimisation occurs. Therefore, optimisation of the number of training patterns and of the data domain used for training is of prime concern in the field of modelling.
To supplement the proposed model, we use the AGO to increase the linear characteristics and reduce the randomness of the measured samples. This simple but effective technique allows us to build the thermal model from a small amount of training data. In short, the proposed model incorporates the AGO method into the modelling process to improve its prediction accuracy and robustness with minimal effort.
The hysteresis effect is defined as a system that has memory, where the effects of the current input to the system are experienced with a certain delay in time [24]. Due to varying thermal time constant, thermal effects on CNC machine tools have the characteristic of memorising the previous thermal status. Therefore, the errors in a machine tool are not only dependent on the current thermal status measured at the surface, but also influenced by the previous conditions of the machine. The hysteresis behaviour will introduce error in each cycle, which in a worst case scenario can be seen in large machine tools with bigger volumes, longer strokes and heavier cutting loads [25]. This hysteresis phenomenon makes the static/instantaneous modelling approach less robust. The characterisation of structural material which exhibits thermal hysteresis needs a special consideration. This is more evident when the rate of temperature change is low as compared with the speed of response of thermal displacement and also where surface-mounted sensors do not reflect the slower-changing internal temperature. Therefore, most of the above-mentioned methods require a large amount of measured data during heating/cooling cycles. Methods that require a calibrated model to predict thermal errors are expected to be confounded by the very large variety of working conditions that exist in a machine tool. Furthermore, attention is often drawn to the prohibitive downtime required to conduct the experiments in an ordinary machine shop [26].
Accurate and reliable measurements of key variables of the machine tool are very important. The information from these variables will be used for model training/calibration; therefore, they should contain the most relevant feedback information. In most of the thermal error models of machine tools, temperature sensors are used as inputs to estimate thermal deformation [9,10,12]. However, spindle speed, axis feedrate, machining time and other parameters of the machine can also be taken into consideration because they are responsible for major heat sources [27]. In some cases [28,29] no direct temperature measurement is taken and only the spindle speed and feedrate are used as inputs. However, this strategy is limited, because the model obtained under one particular operation condition is not robust under other operation conditions. Therefore, error reduction needs greater understanding of the machine tool properties and error sources. This results in the need for a machine tool structural monitoring system.
Fibre Bragg Grating (FBG) sensors are used for strain measurement purposes [30]. They have several advantages over other sensors in terms of sensitivity and quality [30] and could be embedded in a future, commercialised system. In literature, the common applications of FBG are damage detection, structure health monitoring and strain measurement in harsh environments [31,32]. FBG can be employed to observe the change in the strain of the structure with respect to variation in temperature to provide a new response of the system. By using these sensors, the modelling process can become simpler, more robust and more efficient since the number of thermal sensors can be reduced and the effects of thermal hysteresis minimised.
Huang et al. [33] used FBG to investigate the effect of temperature variations of a heavy-duty machine tool on the shop floor. The variations of ambient temperature were measured by the FBG sensors and the spindle thermal shift errors were monitored by laser displacement sensors simultaneously. Experimental results indicate that the spindle thermal errors have a similar change trend following the ambient temperature. Based on acquired data by FBG sensors and thermal error, the authors suggested that a thermal error compensation model could be built by using several modelling techniques such as multiple linear regressions, neural network, and other system identification methods; however, no implementation has been done in this regard.
This section has highlighted that many thermal error models of machine tools used temperature sensors as inputs to estimate thermal deformations. The development of a compensation system using other parameters of the machine is discussed and investigated in this work.
This paper develops an error compensation system for the gantry-type 5-axis machine tool. A novel prediction model "Grey Neural Network Model with Convolution Integral (GNNMCI(1, N))" is proposed, which makes full use of the similarities and complementarity between Grey system models and artificial neural networks to overcome the disadvantage of applying either a Grey model or an artificial neural network individually. Its most significant advantage is that it needs a small amount of experimental data for accurate prediction, and the requirement for the data distribution is also low. A Particle Swarm Optimisation (PSO) algorithm is also employed to optimise the Grey neural network. Different physical inputs will be applied to the proposed model, which are capable of simplifying the system prediction model. This is because different physical inputs (temperature and strain) have different correlation efficiency and their effective and cooperative fusion is expected to produce better prediction results. The experimental results show that the proposed model has an excellent performance in terms of the accuracy of its predictive ability and reduction of machine downtime when compared against traditional and other self-learning techniques.

Modelling the thermal error using a Grey neural network
The Grey system theory, established by Deng in Ref. [34], is a methodology that focuses on solving problems involving incomplete information or small samples. The technique can be applied to uncertain systems with partially known information by generating, mining, and extracting useful information from available data so that system behaviours and their hidden laws of evolution can be accurately described. It uses a Black-Grey-White colour scheme to describe complex systems [35]; the concepts of a Grey system are illustrated in Fig. 1. A grey number is a number for which only the range of values is known, not the exact value. This number can be an interval or a general number set to represent the degree of uncertainty of information. GM(1, N) is the most widely used implementation in the literature [36]; it establishes a first-order differential equation featuring comprehensive and dynamic analysis of the relationship between system parameters. The Accumulated Generating Operation (AGO) is the most important characteristic of the Grey system theory, and its benefit is to increase the linear characteristics and reduce the randomness of the samples. Based on the existing GM(1, N) model, Tien [36] proposed a GMC(1, N) model, which is an improved Grey prediction model. The modelling values by GM(1, N) are corrected by including a convolution integral. Traditionally, these models have been calibrated by the least squares method. However, due to the nonlinearity of the problem, the least squares solution may not meet the expectation.
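The effect of the AGO can be illustrated with a short sketch. It assumes the usual definition of the first-order AGO as a running cumulative sum, with the inverse operation (IAGO) taking successive differences; the function names `ago` and `iago` and the sample readings are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def ago(series):
    """First-order Accumulated Generating Operation: running cumulative sum."""
    return list(accumulate(series))


def iago(accumulated):
    """Inverse AGO: successive differences recover the original series."""
    if not accumulated:
        return []
    restored = [accumulated[0]]
    restored.extend(b - a for a, b in zip(accumulated, accumulated[1:]))
    return restored


# Hypothetical non-negative readings (e.g. temperature rise above a reference).
raw = [2, 5, 3, 6, 4]
accumulated = ago(raw)        # fluctuating input becomes a non-decreasing series
restored = iago(accumulated)  # the round trip returns the raw readings
```

For non-negative inputs the accumulated series never decreases, which is the monotonic, less random behaviour that the Grey model relies on.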
Compared with other empirical models, artificial neural networks have a strong capacity for processing information, parallel processing, and self-learning. However, they have some disadvantages such as: the need for a large number of learning samples; the long training computation time; and the "black box" results are non-interpretable, meaning that a non-physically realistic solution can be reached but not identified. In addition, the working conditions of machine tools are in general complex and susceptible to unexpected perturbation on input signals. Therefore, ANN models in isolation have significant drawbacks as a modelling approach for thermal error compensation [37].
Because the way of presenting information for neural network and Grey models has some commonality in format, the two methods can be fused. Two levels can be added: an initial Grey level to process the input information and a subsequent whitening level to process the output information to obtain good results [38]. Therefore, the Grey meaning is contained in the neural network. The advantages of both can be used to build a high-performance neural network model with a minimum amount of training data. The main difference between Grey neural network modelling and conventional neural network modelling is that the hidden layers and their nodes are determined precisely by the Grey system theory in the Grey neural network method, while the conventional neural network methodology defines them through tedious trial and error work. Although the radial basis function (RBF) method has been widely used in time-series prediction with small training datasets, it is still difficult to select an appropriate network structure. Therefore, the proposed method derives from Grey system theory, which is relatively similar to fuzzy mathematical tools.
Currently, most neural networks are based on the standard back propagation (BP) learning algorithm. As the standard BP algorithm uses the gradient descent method, it easily falls into local minima and has poor generalisation performance. The PSO algorithm was introduced by Eberhart and Kennedy in Ref. [39] as an alternative to other evolutionary techniques. The PSO algorithm is inspired by the behaviours of natural swarms, such as the formation of flocks of birds and schools of fish. The advantage of the PSO algorithm is that it does not require the objective function to be differentiable, as the gradient descent method does, and it makes few assumptions about the problem to be solved [40]. Furthermore, it has a simple structure and its optimisation method has a clear physical meaning. PSO consists of a population formed by individuals called particles, where each one represents a possible solution of the problem. Each particle tries to search for the best position with time in D-dimensional space (solution space). While swimming or flying, each particle adjusts its "flying" or "swimming" in light of its own experience and its companions' experience, including the current position, velocity and the best previous position experienced by itself and its companions. Therefore, instead of using the standard algorithms, the PSO algorithm is employed to optimise the Grey neural network parameters in this study.

GNNMCI(1, N) architecture
The fusion model of Grey system and neural network is employed in the modelling of the thermal error of machine tools. The model can reveal the long-term trend of data and, by driving the model with the AGO data rather than the raw data, can minimise the effect of some of the random occurrences. Therefore, the first step for building GNNMCI(1, N) is to carry out 1-AGO (first-order Accumulated Generating Operation) on the data, so as to increase the linear characteristics and reduce the randomness of the measured samples (see Appendix A). To illustrate this property, Fig. 3 shows the original (temperature changes) and converted series of data. The PSO algorithm, with its capability to optimise complex numerical functions [39], is adopted to train the GNNMCI(1, N) model. Finally, an IAGO (inverse Accumulated Generating Operation) is performed to predict the thermal error and generate the final compensation values. The model takes full advantage of neural networks and Grey models and overcomes their disadvantages, achieving the goal of effective, efficient and accurate modelling. The modelling detail is described as follows. The Grey prediction model with convolution integral GMC(1, N) [36] is:

$$\frac{dX_1^{(1)}(t)}{dt} + b_1 X_1^{(1)}(t) = b_2 X_2^{(1)}(t) + b_3 X_3^{(1)}(t) + \cdots + b_N X_N^{(1)}(t) + u$$

where $X_1^{(1)}$ is the 1-AGO data of the predicted series and $X_i^{(1)}$, $i = 2, 3, \ldots, N$, are the corresponding 1-AGO data of the associated series, $b_1$ is the development coefficient, $b_i$ ($i = 2, 3, \ldots, N$) are the driving coefficients, and $u$ is the Grey control parameter. Therefore, time response sequences can be obtained.
To calculate the coefficients $b_i$ and $u$, the neural network method can be used to map Eq. (2.2) to a neural network. Then, the neural network model is trained until the performance is satisfactory. Finally, the optimal corresponding weights are used as the Grey neural network weights to predict the thermal error; a similar procedure can be seen in Ref. [38].
Where k is the serial number of the input parameters. In this study, $x_1(k + 1)$ is chosen as the dependent variable (network output) and $x^{(1)}_{N-1}(k + 1)$ as the independent variables (N is the number of network inputs); $w_{11}, w_{21}, w_{22}, \ldots, w_{2n}; w_{31}, w_{32}, \ldots, w_{3n}$ are the weights of the network; and Layer A, Layer B, Layer C, and Layer D are the four layers of the network.
Where the corresponding neural network weights can be assigned as follows: Let us assume that
The transfer function of Layer B is a sigmoid function, $f(x) = \frac{1}{1 + e^{-x}}$; the transfer functions of the neurons in the other layers are linear, $f(x) = x$.
Step 2: In order to avoid entrapment in a local minimum, a PSO algorithm is adopted to train the GNNMCI(1, N) model. Here, a particle refers to a weight in the model that changes its position from one move to another based on velocity updates. The flowchart for the PSO implementation is given in Fig. 4, and the mathematical description of the PSO algorithm is as follows. Suppose that the search space is D-dimensional; then the current position and velocity of the ith particle can be represented by $W_i = [w_{i1}, w_{i2}, \ldots, w_{iD}]^T$ and $V_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]^T$, respectively, where D is the number of variables. Afterwards, particle i adjusts its velocity for iteration k + 1 according to the local and global best positions, as well as the velocity and position of iteration k, as follows:

$$V_i^{k+1} = \omega V_i^{k} + C_1 R_1 \left(P_i^{k} - W_i^{k}\right) + C_2 R_2 \left(P_g^{k} - W_i^{k}\right) \qquad (2.7)$$

where $P_i^{k}$ and $P_g^{k}$ denote the local and global best positions at iteration k, and $\omega$ is the inertia factor, which is used to manipulate the impact of the previous velocities on the current velocity. $C_1$ and $C_2$ are the self-confidence factor and the swarm-confidence factor, respectively. $R_1$ and $R_2$ are uniformly distributed random real numbers that can take any value between 0 and 1.
With the updated velocity, the position of particle i in iteration k + 1 can be obtained as follows:

$$W_i^{k+1} = W_i^{k} + V_i^{k+1} \qquad (2.8)$$

The fitness of a particle is measured using a fitness function that quantifies the distance between the particle and its optimal solution, where f is the fitness value, $x^{(0)}(k)$ is the target output, and $\hat{x}^{(0)}(k)$ is the predicted output based on connection weight (particle) updating.
Step 3: update the velocity and position of each particle based on Eqs. (2.7) and (2.8).
Adjusting the connection weights between layers:
- Adjust the connection weights from LA to LC.
- Adjust the bias value.
Step 4: If the value of the error meets the requirement of the model, or a pre-determined number of epochs has passed, then the network training ends; if not, return to Step 3.
Step 5: Export the optimal solution W i .
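The training loop in the steps above can be sketched as a basic global-best PSO. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the velocity and position updates follow the standard PSO form, the default parameter values follow those reported later for this study (90 particles, C1 = 1.5, C2 = 2, inertia decreasing linearly from 0.9 to 0.4, 100 epochs), and the function `pso_minimise`, the search bounds, and the toy linear fitness function with its made-up samples are all hypothetical stand-ins for the GNNMCI(1, N) error.

```python
import random


def pso_minimise(fitness, dim, n_particles=90, n_iter=100,
                 c1=1.5, c2=2.0, w_start=0.9, w_end=0.4,
                 bounds=(-1.0, 1.0), seed=0):
    """Minimise `fitness` over `dim` weights with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                # personal best positions
    best_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]    # global best so far

    for k in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * k / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy fitness: squared error of a linear model on made-up (inputs, target) samples.
samples = [((1.0, 2.0), 5.0), ((2.0, 1.0), 4.0), ((3.0, 3.0), 9.0)]


def squared_error(weights):
    return sum((sum(w * x for w, x in zip(weights, xs)) - y) ** 2
               for xs, y in samples)


weights, error = pso_minimise(squared_error, dim=2, bounds=(-5.0, 5.0))
```

Because the update uses only fitness values, the objective does not need to be differentiable, which is the property that motivates PSO over gradient-based back propagation here.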

Experimental setup and approach
In this study, the machine under investigation is a 5-axis gantry milling machine as shown in Fig. 5. The machine is constructed of three linear axes X, Y, Z, and two rotary axes B and C. The tool-carrying spindle is mounted on the B axis and, for this configuration, all axes move the tool. The maximum speeds along the X axis, Y axis, and Z axis of the machining centre are 75 m/min, 75 m/min, and 70 m/min, and the travels are 2.5 m, 1.2 m, and 0.7 m, respectively. The spindle has a maximum rotational speed of 3200 revolutions per minute. This machine has linear scale feedback for the three linear axes and directly mounted rotary encoders for the B and C axes.
The first step in modelling the thermal errors of this machine was to perform an initial assessment to identify machine structural elements and heat sources that contribute most significantly to the machine errors. A thermal imaging camera was used to record temperature distributions across the machine structure during "dry" operations, i.e. without coolant present. The two main contributors to thermal error were C-axis rotation and Z-axis movement of the ram. These two errors are therefore analysed in this paper. MATLAB processing routines have been devised in Ref. [41] to generate "virtual" temperature sensors from the thermographic images, which were used to identify the optimal position to install surface-mount temperature sensors on the surface of the structure (see Fig. 6). From related work on this aspect [13] and the initial tests, a total of twelve temperature sensors were placed on the machine. Six sensors were located on or near the major heat sources: one measured the surface temperature of the ram near the C-axis motor (T1); one (T2) measured the surface temperature of the lower bearing of the ball screw; two monitored the gradient from the end of the ram (T3 and T4); and two measured the surface temperature of the Z-axis motor (T5 and T6). Another six temperature sensors were placed around the machine to pick up the ambient temperature changes. Four laser displacement sensors were used to measure the displacement of a test bar (attached to the spindle) caused by the thermal distortion of the machine: two measured displacement of the test bar in the Y-axis and Z-axis directions (this study did not consider the X-axis direction due to symmetry of the machine); two measured any tilt. A general overview of the experimental setup is shown in Fig. 7.
To improve the accuracy of the predicted model, and to avoid the need for a large number of temperature sensors, additional feedback information is supplied by Fibre Bragg Gratings (FBGs) as shown in Figs. 5 and 6. These can detect the change in length by measuring the detectable strain. However, the FBG sensor itself is also affected by temperature by a factor that equates to 8.64 μm/m/°C. One method of compensating for temperature is to use an unconstrained grating to measure temperature. Nevertheless, this was unviable for this application because it would require additional gratings to be mounted, incurring additional cost and requiring additional mounting space, which was not readily available. Instead, the low-cost temperature sensors used for the temperature-based model were used to correct for change in the grating temperature. Three FBG sensors were placed on the ram structure in order to measure the distortion of each side of the structure. Another four FBG sensors were placed on the cross-beam structure to monitor the thermal response with change in the ambient temperature. These on-line measures will be used as input to the proposed model in order to predict the growth of the ram along the Z-axis direction. Fig. 8 shows test results from a cycle of two hours of heating up followed by two hours of cooling down; details of the test are given in Section 4. Results show that the temperature of the machine tool (T2 Ram rear) changed with a certain delay relative to variation in the machine displacement and FBG sensors (FBG-1 and FBG-2).
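The correction of the FBG readings for grating temperature described above can be illustrated with a simple linear model. The sketch assumes that the apparent strain scales linearly with the temperature change, using the quoted sensitivity read here as 8.64 μm/m per °C; the study does not state the exact formula used, and the function name, reference temperature and sample values are hypothetical.

```python
# Apparent strain per degree of grating temperature change, in um/m per degC
# (value quoted in the text; the linear correction itself is an assumption).
FBG_THERMAL_SENSITIVITY = 8.64


def compensate_fbg_strain(measured_microstrain, grating_temp_c, reference_temp_c):
    """Remove the temperature-induced apparent strain from an FBG reading."""
    delta_t = grating_temp_c - reference_temp_c
    return measured_microstrain - FBG_THERMAL_SENSITIVITY * delta_t


# Hypothetical reading: 50 um/m measured while the nearby temperature sensor
# reports 22.5 degC against a 20.0 degC reference, i.e. about 28.4 um/m of
# mechanical strain after the correction.
mechanical_strain = compensate_fbg_strain(50.0, 22.5, 20.0)
```

Using the existing low-cost temperature sensor as the source of `grating_temp_c` mirrors the design choice in the text of avoiding additional unconstrained gratings.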

Hysteresis effect
Furthermore, Fig. 9 shows hysteresis plots of different sensors; it can be clearly seen that the FBG sensors located on the machine ram exhibit lower hysteresis. For example, the FBG-1 and FBG-2 sensors respond in an almost linear fashion, whether the machine is being heated or cooled. It can also be observed that the temperature at the point of measurement (T2 Ram rear) possesses slightly higher hysteresis behaviour relative to other sensors; there is a latency of approximately 10 min. By using FBG sensors, the effect of thermal hysteresis could be minimised. Therefore, the application of FBG sensors could allow for a more accurate prediction of thermal error.
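One simple way to put a number on such a latency is to shift one signal against the other and keep the shift with the smallest mismatch. The sketch below is only an illustration of that idea, not the procedure used in the study; the function `estimate_lag`, the one-minute sampling interval and the normalised signals are all made up.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the shift (in samples) at which `delayed` best matches `reference`.

    Both sequences are assumed to be sampled at the same instants and scaled
    to a comparable range beforehand; `max_lag` must be smaller than their length.
    """
    def mean_sq_error(lag):
        pairs = list(zip(reference, delayed[lag:]))
        return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

    return min(range(max_lag + 1), key=mean_sq_error)


# Made-up normalised signals sampled once per minute: `temperature` repeats
# `displacement` three samples (3 min) later.
displacement = [0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 0.8, 0.5, 0.3, 0.1, 0.0, 0.0, 0.0]
temperature = [0.0, 0.0, 0.0] + displacement[:-3]
lag_samples = estimate_lag(displacement, temperature, max_lag=5)
```

A sensor with little hysteresis, such as the FBG signals described above, would be expected to give a lag close to zero under this kind of check.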

Error compensation model
The model designers often want to know which heat sources have a dominant effect and which exert less influence on thermal response of the machine tool. Poor location and a small number of thermal sensors will lead to poor prediction accuracy. However, a large number of thermal sensors may have a negative influence on a model's robustness because each thermal sensor may bring noise to the model as well as bringing useful information. Furthermore, issues relating to sensor reliability are commercially sensitive; the fewer sensors installed the fewer potential failures. The optimal sensor locations were selected based on our work in Refs. [15,13]. MATLAB has been used successfully in numerous other applications [42-46]. Thus, the thermal compensation model is designed and simulated in the MATLAB environment. The integrated model was designed as follows:
Step 1: A 1-AGO (first-order Accumulated Generating Operation) is applied to the raw data to increase the linear characteristics and reduce the randomness of the measured samples.
Step 2: The GNNMCI(1, N) model is trained with a PSO algorithm as discussed in Section 2.2.1.
Step 3: An IAGO (Inverse Accumulated Generating Operation) is performed to calculate the thermal error and generate the final compensation value.
To demonstrate the modelling of thermal error using the GNNMCI(1, N) model, five independent variables (temperature and strain) were selected based on their influence coefficient value using the Grey model [13]. Three FBG sensors (FBG-1, FBG-2, and FBG-3) were placed on the ram structure in order to measure the distortion of each side of the structure. Another sensor, FBG-4, was placed on the cross-beam structure to monitor the thermal response with change in the ambient temperature. Additionally, a temperature sensor was placed on the ram structure (T2 Ram rear). These on-line measures will be used as input to the proposed model in order to predict the growth of the ram along the Z-axis direction.
In this paper, two compensation procedures were used to predict the thermal errors. The first was to obtain the GNNMCI(1, N) model during the first stage of the test regime, and then to use this model to predict the machine movement during the remainder of the same test or for other regimes. The second was to obtain the model parameters during a short test, and then to predict the thermal displacement for all other tests. The advantage of using a short test to calibrate the model is that it reduces non-productive downtime of the machine. The potential disadvantage is a lack of model accuracy due to the limited training experience. In order to optimise the GNNMCI(1, N) parameters (weights), the experimental data sets were divided into a training set (afterwards also used for direct validation), a validation set (cross validation), and a testing set. An example training data set from a short test of five samples is illustrated in Table 1; four FBG sensors and one temperature sensor are used as inputs, and Y-axis displacement as output.
Table 1 The training data from first 5 readings.
In the PSO algorithm, the number of particles is set to 90, whilst the self-confidence factor and the swarm-confidence factor are C1 = 1.5 and C2 = 2, respectively. The inertia weight was taken as a linearly decreasing function of the iteration index k, from 0.9 to 0.4; these are the values suggested in other papers [47,48] and do not depend on the problem. After 100 training epochs, the total error was at an acceptable level, and the Grey neural network weights were obtained using the PSO algorithm. Training and validation errors diminish through the initial phase of the training stage. The first test checked whether the model is able to reproduce the dataset used in the training stage (direct validation). Subsequently, cross validation was applied to check the model validity. The most appropriate model is achieved when the validation error reaches its minimum. The predictions of the next six values of thermal error derived from these weights with the GNNMCI(1, 6) model are listed in Table 2.
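The PSO settings quoted above can be illustrated with a minimal global-best PSO sketch. It is not the authors' MATLAB implementation: the function name `pso_minimise`, the search bounds, the random seed, the six-dimensional weight vector and the `sphere` stand-in cost are all assumptions made for illustration. In the paper's setting, the cost would be the training error of the Grey neural network and the particle position would be its weight vector.

```python
import random


def pso_minimise(cost, dim, n_particles=90, n_iter=100,
                 c1=1.5, c2=2.0, w_start=0.9, w_end=0.4,
                 bounds=(-1.0, 1.0), seed=0):
    """Basic global-best PSO; returns (best position, best cost, cost history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]

    for k in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * k / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # self-confidence term
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # swarm-confidence term
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history


def sphere(x):
    """Hypothetical stand-in cost; replace with the network training error."""
    return sum(v * v for v in x)


best_weights, best_cost, cost_history = pso_minimise(sphere, dim=6)
print(best_cost)
```

The best cost found can only decrease or stay constant from one iteration to the next, which is why the history is a convenient check that training is progressing.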
The final GNNMCI(1, 6) model trained and validated in this work was then tested on a new, unseen dataset. The independent variables are shown in Fig. 10(a). Simulation results show that the thermal error in the Z direction can be reduced to less than ±5 µm on the testing dataset (see Fig. 10(b)). Furthermore, this result shows that the PSO algorithm can act as an alternative training algorithm for a Grey neural network used for thermal error compensation.
The modelling approach described in this section is preliminary work, which is extended in the next sections by considering further modelling methods such as a modular approach.

Results and discussion
Several experiments were conducted on the 5-axis milling machine. The primary motivation of these experiments was to compensate the deformation taking place in the ram of the machine in the Z-axis direction as a result of heat induced by rotation of the C-axis and by motion of the Z-axis. The Z-axis heating test, C-axis heating test, and the combined (helical) movement are considered in this paper. Detailed procedures and results are as follows:

Case 1: Z-axis heating test
In this test, the ram reciprocates at a speed of 70 m/min 10 times before dwelling for 10 s (to allow stable measurement) to excite the thermal behaviour in the ram. This cycle is repeated throughout the two-hour "heating" cycle. The axes then remain stationary for a subsequent two-hour cooling cycle. The temperature variation is measured by the temperature sensors, and the change in strain of the ram and cross-beam is measured with FBG sensors. The data are given in Fig. 11(a). The heat sources on the ram structure are friction in the two support bearings of the Z-axis ballscrew, friction in the ballnut and the power loss of the Z-axis motor. Additionally, the change in ambient temperature affects the whole structure of the machine. Laser position sensors were used to measure the growth of the ram along the Z-axis direction. It can be seen that the rise in temperature measured by the selected sensors correlates with an error in the Z-axis of more than 100 µm.
The simulation result shows that the GNNMCI(1, 6) model can predict the error accurately and can also track sudden changes of thermal error precisely (the maximum residual is approximately 16 µm, an 85% improvement; see Fig. 11(b)), even with such a short training period. Indeed, the greatest loss in model accuracy occurs more than one hour after the "heating" cycle. The majority of this thermal error derives from a reaction to ambient changes, for which the model has not been trained. This effect may not be significant in practice, since it could be argued that the machine will not be producing parts if the axes are not being used. Nevertheless, this issue will be addressed in further work for those situations where the machining regime excites different parts of the structure during various operations.

Case 2: C-axis heating test
In this test, the C-axis rotates at 2500 rpm ten times before dwelling for 10 s (for measurement) to excite the thermal behaviour in the machine ram. This cycle is repeated throughout the two-hour "heating" cycle. The axes then remain stationary for a subsequent two-hour cooling cycle. Data collected from the temperature sensors and FBG sensors are shown in Fig. 12(a). The heat sources in this test are the friction in the C-axis bearings and the losses from the motor located inside the ram structure (near the location of temperature sensor T1). Therefore, T1 shows the highest temperature (rising by 7 °C). The maximum rise of T2 is lower (4 °C), and the rises of T3 and T4 are the lowest (1 °C) because they are further from the heat source. The Z-axis thermal error was greater than 80 µm.
As with the Z-axis heating test, the model weights were obtained during the first stage of the test regime. Simulation results show that the GNNMCI(1, 6) model provides a good prediction. Fig. 12(b) presents the comparison between the measured thermal displacement and the output of the model. It can be seen that the prediction ability of the model is excellent, and that the model reduces the error from 80 µm to ±8 µm.

Case 3: combined axis (helical) test
In this test, the C-axis rotates while the Z-axis oscillates simultaneously (helical test). The purpose was to validate the thermal error compensation model that was trained from the previous two cases (Case 1 and Case 2). This was to demonstrate that the thermal model can be built up in a modular form and so is extensible to the remainder of the structure.
The four-hour validation test was again divided equally into heating and cooling cycles. Fig. 13(a) shows the temperature/strain change during the test regime, which induces thermal expansion in the Z-axis direction of approximately 95 µm. The model weights were obtained from the previous independent C-axis and Z-axis tests. Fig. 13(b) shows a reduction in error from 95 µm to ±9 µm, with the loss in performance again being most prevalent some time after the heating part of the cycle. This study validates the modular approach, which means that the training data from the individual tests can be superimposed in one model.
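As a quick arithmetic check on the three cases, the relative reduction can be computed from the rounded peak figures quoted in the text. This is only a sketch: the helper name `reduction_percent` is hypothetical, and because Case 1 is reported as "more than 100 µm", using 100 µm as the uncompensated value gives a slightly conservative figure compared with the 85% quoted there.

```python
def reduction_percent(uncompensated_um, residual_um):
    """Percentage reduction of the peak thermal error magnitude."""
    return 100.0 * (1.0 - abs(residual_um) / abs(uncompensated_um))


# (uncompensated peak error, residual after compensation), both in micrometres,
# taken from the rounded values quoted for each test.
cases = {
    "Case 1 (Z-axis heating)": (100.0, 16.0),
    "Case 2 (C-axis heating)": (80.0, 8.0),
    "Case 3 (helical)": (95.0, 9.0),
}

for name, (before, after) in cases.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% reduction")
```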

Comparison with other models
In order to assess the ability of the GNNMCI(1, N) model relative to that of a neural network model and a conventional GMC(1, N) model, two further models were constructed using the same five input variables as the GNNMCI(1, N) model. The feed-forward Multilayer Perceptron (MLP) is a widely used ANN model for thermal error compensation [5,49], so it was selected as the benchmark. In this model, 70% of the dataset was assigned as the training set, while the remaining 30% was used for testing the performance of the model prediction. ANN models usually have three layers: input, hidden and output. An ANN model with three layers was used in this study: the input layer has five input variables and the output layer has one neuron (the thermal response in the Z-axis direction). Although an ANN model is able to learn the relationships between inputs and output, the optimal number of neurons in the hidden layer has to be found. The selection of this number is a trial-and-error process, and the number may change during optimisation. The number of neurons in the hidden layer was varied from 1 to 15 with two different transfer functions, namely logarithmic sigmoid and tangent sigmoid. We started with one hidden neuron, and trained and tested the ANN. The number of hidden neurons was then increased, and the training process repeated for as long as the overall training and testing results improved. After a series of simulations to find the best architecture, an ANN model with 10 neurons in the hidden layer and a logarithmic sigmoid transfer function was constructed to predict the thermal response in the Z-axis direction.
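The two transfer functions and the size of the architecture search described above can be sketched as follows. The function names follow the common MATLAB naming (`logsig`, `tansig`), which is an assumption about the toolbox used; the `candidates` list is a hypothetical illustration of the search grid, not the authors' script.

```python
import math


def logsig(x):
    """Logarithmic sigmoid: maps any real input to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig(x):
    """Tangent sigmoid: maps any real input to (-1, 1); equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


# Every (hidden neurons, transfer function) pair in the trial-and-error search:
# 1 to 15 hidden neurons combined with the two transfer functions.
candidates = [(n, f) for n in range(1, 16) for f in (logsig, tansig)]
print(len(candidates), "candidate architectures")
```

Each candidate would be trained and tested in turn, keeping the architecture with the best overall training and testing results.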
Generally, ANNs are trained by adjusting the weights, using a suitable learning algorithm, so that a particular input leads to a specific target, until the ANN output matches the target. The training process stops when the error falls below a pre-determined value or the maximum number of epochs is reached. Among supervised gradient-based training methods, Levenberg-Marquardt is commonly used because it combines the advantages of the Gauss-Newton method and the steepest gradient descent algorithm. The best ANN architecture is given in Table 3.
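For reference, the standard textbook form of the Levenberg-Marquardt weight update makes this combination explicit. The notation here is an assumption rather than the paper's own: w is the weight vector, e the vector of output errors, J the Jacobian of e with respect to w, and μ a damping factor adjusted during training.

```latex
\Delta \mathbf{w} = -\left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu \mathbf{I}\right)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{e}
```

When μ is small the step approaches the Gauss-Newton step, and when μ is large it approaches a short steepest-descent step of size 1/μ along the negative gradient.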
A second comparison model, a Grey model, was developed in which the unknown parameters of Eq. (2.2) were determined by the traditional Least Squares (LS) method. A similar model was used earlier by Wang et al. [9] for thermal error compensation on a CNC machine tool. The three models were further verified using the unseen combined axis test (Section 4.3), which was not used during the training, validation and testing stages. Predictive results using the ANN and Grey models are shown in Fig. 14(a) and (b), respectively, and can be compared to Fig. 13(b), which is the result from the method proposed in this paper. The performance of the three thermal prediction models is compared in Table 4, where the three models are validated by the same testing dataset. According to the predictive results and evaluation criteria values in Table 4, the GNNMCI(1, N) model has a smaller Root Mean Square Error (RMSE) and residual value (±9 µm), and a higher efficiency coefficient (E) and correlation coefficient (R), than the ANN and Grey models. The ANN model performed better than the Grey model for predicting thermal error in the Z-direction. It can also be observed from Table 4 that the models developed using artificial intelligence techniques outperformed the statistical model (Grey model with LS). However, although the ANN model does reduce the residual value to less than ±15 µm, it requires a large, high-quality dataset for training. Furthermore, the ANN model needs proper optimisation to predict effectively. For instance, the ANN model needs 10 neurons in the hidden layer, a number that was difficult to optimise. The proposed GNNMCI(1, N) model therefore exhibits better performance than the conventional models, with far fewer training samples.
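The three evaluation criteria can be sketched as below. The paper does not restate their formulas in this section, so this sketch assumes the usual definitions: RMSE, the Nash-Sutcliffe form of the efficiency coefficient E, and the Pearson correlation coefficient R. The sample values are invented purely to exercise the functions and are not measurements from the paper.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted series."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def efficiency(measured, predicted):
    """Efficiency coefficient E (Nash-Sutcliffe form, assumed); 1.0 is a perfect fit."""
    mean_m = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst


def correlation(measured, predicted):
    """Pearson correlation coefficient R between the two series."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Hypothetical thermal errors in micrometres (illustrative only).
measured = [0.0, 10.0, 20.0, 30.0]
predicted = [1.0, 9.0, 21.0, 29.0]
print(rmse(measured, predicted),
      efficiency(measured, predicted),
      correlation(measured, predicted))
```

A lower RMSE together with E and R closer to 1 indicates a better model, which is the sense in which Table 4 ranks the three models.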
Consequently, this paper develops a simple, less computationally intensive and lower-cost approach with a high adaptation rate. This work develops an error compensation model for a gantry-type 5-axis machine tool. The machine operates in an environment that is not temperature controlled. Changes in temperature cause the machine to change shape and result in a loss of accuracy. In the initial work on this machine, only temperature sensors were used as inputs to the model. A model based only on temperature sensors has a high residual value on this machine because of the complexity of its thermal behaviour, which results from its larger volume and longer strokes. The model was improved by fusing data from temperature sensors with direct strain measurement from FBG sensors. Additionally, another model was built up from two component modules. Validation with combined thermal inputs was shown to be as effective as validation of the individual elements.
Unlike existing deterministic models, the proposed method is easily extensible to other physical variables. This means that alternative or additional sensors can be deployed with minimal retraining. Furthermore, other machine or machining parameters can be acquired directly from the controller to provide feedforward information and to minimise the effects of thermal hysteresis. Examples are the spindle speed and axis feedrate, although other significant factors can also be considered. It is worth noting that changes in motor behaviour over its lifetime will affect the thermal output at a given speed. For this reason, including such primary parameters is non-trivial when long-term accuracy is required from the model, and it can be more robust to include only the derived values that directly affect accuracy.
One of the major problems for thermal error modelling is the complex way in which the machine tool distorts due to environmental change combined with duty cycle effects. The machine will never reach a true thermal equilibrium condition. Future studies will also investigate applying this modelling technique to a machine tool under different conditions: different Environmental Temperature Variation Error (ETVE) tests (summer and winter) and more complex duty cycles.

Conclusions
This research work proposes a thermal error modelling method that combines Grey system theory and the learning ability of the artificial neural network in a single system. The number of sensors used in this model was minimised by fusing temperature measurement with direct strain measurement from Fibre Bragg Grating (FBG) sensors. We have shown that a model combining direct strain measurement and temperature sensors can minimise the hysteresis effect with much more sensitivity. The model was built up from two component modules and so is shown to be extensible to the remainder of the structure by adding further models. This is important where changes to the structure are possible, since it means that only that part of the model needs to be retrained. It also means that other structural elements can be conveniently included in the model, depending upon the precision required. The compensation system using the GNNMCI(1, N) model has been found to be flexible, quick and efficient to implement, and has been used to reduce thermal errors from heating of the C and Z axes of a gantry machine by over 85% using a quick heating test to calibrate the model. This dramatically reduces the amount of experimental data required, and so reduces the downtime needed to implement the compensation model. The proposed model was compared to two other architectures and demonstrated better performance than the ANN model and the Grey model, with far fewer training samples.
It can therefore be concluded that the thermal error compensation model using GNNMCI(1, N) introduced in this study can be applied in modular form to any CNC machine tool, because the model does not rely on a parametric model of the thermal error behaviour. In addition, this model is open to extension to other physical inputs, meaning that alternative sensors can be deployed with minimal retraining. There is still considerable room to enhance the proposed model by including more machining parameters from the controller to provide feedforward information, and by trying different hybrid AI tools to optimise the model parameters for thermal error modelling. Future studies will also concentrate on validating the proposed model with different CNC machine tool configurations under more complex operating conditions.
one of the original series as the second value of the new series, selecting the sum of the first three values of the original series as the third value of the new series, and so on, as follows: $X^{(1)} = \{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n-1), x^{(1)}(n)\}$. (A.2) By so doing, we obtain the new 1-AGO series $X^{(1)}$ of the original data $X^{(0)}$, which is a more regular series and therefore easier to model than the original data.
Step 3: 1-IAGO can be applied to recover the original series from the 1-AGO series, taking the first value of the 1-AGO series as the first value of the recovered series, the second value minus the first value of the 1-AGO series as the second value of the recovered series, the third value minus the second value of the 1-AGO series as the third value of the recovered series, and so on. The mathematical expressions are as follows: $x^{(0)}(k) = x^{(1)}(k) - x^{(1)}(k-1)$, where $k = 2, 3, \ldots, n$, and $x^{(0)}(1) = x^{(1)}(1)$. By applying the AGO transformation, the following important advantages are obtained: (i) extreme fluctuation and noise are removed, so that the new series is more stable for modelling; (ii) the new series has a linear characteristic, which makes it easier to model than the original data; and (iii) it helps to determine realistic governing laws from the available data [50,51]. The emphasis is on discovering the true properties of the system under the condition of a small amount of training data.
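The accumulation and its inverse described in this appendix can be sketched in a few lines. The function names `ago` and `iago` and the sample series are illustrative assumptions; the operations themselves are the running sum and the first difference described above.

```python
from itertools import accumulate


def ago(series):
    """1-AGO: the k-th output is the sum of the first k values of the input."""
    return list(accumulate(series))


def iago(ago_series):
    """1-IAGO: keep the first value, then take successive differences."""
    if not ago_series:
        return []
    return [ago_series[0]] + [b - a for a, b in zip(ago_series, ago_series[1:])]


raw = [2, 3, 1, 4]           # hypothetical non-negative raw readings
accumulated = ago(raw)       # running sums of the raw readings
recovered = iago(accumulated)
print(accumulated, recovered)
```

For non-negative raw data the accumulated series never decreases, which is the monotonic, less random behaviour that makes it easier to model, and applying `iago` to it returns the raw series unchanged.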