Mathematical approach to the validation of surface texture filtration software

A novel method for the validation of surface texture filtration software is introduced. Mathematically traceable reference pairs for linear Gaussian filtration are developed, utilising Fourier series surface definitions in conjunction with the frequency dependent transmission characteristic of the linear Gaussian filter. The novel method is demonstrated using a library of reference pairs to validate the performance of five surface texture analysis software packages. Investigations into the effects of different surface properties are made in relation to the deviation of the software-obtained results from the traceable reference values. Analysis of variance tests are used to verify the statistical significance of the results.


Introduction
The analysis of surface texture is an important aspect of surface topography characterisation in the field of precision manufacturing [1][2][3]. Surface texture can have a significant effect on the function of a part, and attributes such as friction and wear can impact the overall lifetime of a component and contribute toward energy consumption and efficiency [4][5][6].
Surface texture analysis is usually performed using surface texture field parameters; numerical descriptors of a surface that provide statistical information about the distribution of heights across a measured area [1,7,8]. Obtaining meaningful surface texture parameters from a surface topography measurement is achieved by first performing adequate filtration operations in order to extract a finite spatial frequency band from the surface measurement data [9]. Appropriate a priori knowledge is required to determine the scale-limited surface relevant to the application at hand, enabling efficient and effective analysis [6].
Surface texture analysis software is used to perform filtration operations, and it is important that such software is validated against reference values. The current state of the art uses reference software, developed by national metrology institutes (NMIs), to calculate parameter values for a given input surface topography dataset against which software-obtained values can be compared [10][11][12]. However, these reference software packages are developed using numerical, discrete algorithms similar to those used in commercial software and are, therefore, subject to the same sources of error. Algorithms can vary between software implementations, which can lead to variation in the resulting output values. Previous work has compared the output parameter values of different reference software packages and shown significant variations in the results [13][14][15]. These differences justify the need for an alternative approach to surface texture analysis software validation that moves away from discrete software-based algorithm implementations and instead utilises mathematically-defined traceable references.
Previous work has addressed the calculation of areal surface texture parameter values using a mathematical foundation [16,17]. By defining a surface analytically, surface texture parameter values can be obtained that are accurate and mathematically traceable [18]. These mathematically defined parameter values can then be used as traceable references against which software can be compared.
The work presented in this paper builds on the previous work by applying the same mathematical approach to the filtration operations required prior to parameter calculation. A technique is presented that enables the creation of mathematically traceable reference pairs, each comprising one pre-filter analytical surface and one post-filter analytical surface, that can be used to assess the performance of software-implemented filtration operations. A limitation of this method is that it can only be used with surface filtration methods that have an explicit, mathematically continuous definition, such as the Gaussian filter.
Other popular filtration methods that are discrete in nature, such as the spline filter or morphological filter, cannot be used. In these cases, an extrapolation of the discrete case to the continuous case would be required, causing a variation from the definition in the international standard and therefore nullifying the benefit of a mathematical approach. Nevertheless, the Gaussian filter is one of the most popular filters used in surface texture analysis, and is often the default filtration option in many software packages, making this new technique still a worthwhile contribution to the state of the art. This paper focusses on the linear profile and areal Gaussian filters defined in international standards ISO 16610-21 and ISO 16610-61 [19,20], respectively. Section 2 details the proposed method of creating reference pairs using surfaces defined by their spatial frequency in connection with knowledge of the spatial frequency transmission characteristics of the Gaussian filters. Section 3 presents a range of example reference pairs created using this method and uses the reference pairs to perform an assessment on the filtration operations of five surface texture analysis software packages. Section 3 then uses the results of the software assessment to identify significant factors in the reference surfaces that contribute to the errors found in the software outputs.

Reference pairs
The use of reference pairs is an established method for the testing of software [21]. For a given input, the software under test will perform an operation, or a series of operations, upon that input to produce an output. With a reference pair, the output has been calculated in a traceable manner. The reference input can then be operated on by the software under test, and the software output can be compared to the reference output to assess the software error.
For surface texture filtration, the reference pair is a set of topographical surfaces that determine vertical distance from a mean plane as a function of lateral location, z(x, y); one pre-filter surface, and one surface upon which a traceable filtration operation is applied. While surface texture analysis software requires discrete datasets as inputs, this work will define reference pairs as analytical mathematical functions in order to retain mathematical traceability. Datasets can then be sampled from these definitions using whichever method and resolution that best suits the application of the software under test, taking care that sampling choices are clearly stated alongside the software assessment results.

Input mathematical surface
The first component of the reference pair is the input reference. This is an analytical mathematical expression that corresponds to a pre-filter surface height representation, $z(x,y)$. Such an expression is achieved using a Fourier series approach, wherein a spatially variant signal can be constructed using a summation of weighted cosine terms, each one of a defined spatial frequency. Defining the input reference surface in terms of its spatial frequency components is of particular value when obtaining a mathematically traceable post-filter reference surface for a linear Gaussian filter, for reasons that will be explained in section 2.2. A general expression for an areal surface defined using this Fourier series approach can be presented in the form
$$z(x,y)=\sum_{n=1}^{N}\sum_{m=1}^{M}A_{n,m}\cos\left(\frac{2\pi m f_s}{N_x}x+\phi_x\right)\cos\left(\frac{2\pi n f_s}{N_y}y+\phi_y\right), \qquad (1)$$
where $N\times M$ is the total number of terms; $m$ and $n$ denote the $x$ and $y$ spatial frequency components of each term, respectively; $A_{n,m}$ defines the amplitude of each term; $N_x$ and $N_y$ represent the size of the surface in the $x$ and $y$ directions as multiples of the scaling length $1/f_s$; and $\phi_x$ and $\phi_y$ are the phase shifts of the cosine term in the $x$ and $y$ directions, respectively. $N_x/(m f_s)$ and $N_y/(n f_s)$ are equivalent to the $x$ and $y$ wavelength components, respectively.
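As an illustration, a surface of this kind can be evaluated numerically at any location. The sketch below assumes a separable product-of-cosines form in which the term with indices $(n, m)$ has $x$ wavelength $N_x/(m f_s)$ and $y$ wavelength $N_y/(n f_s)$, with indices starting at 1. The function name `fourier_surface` and its argument layout are illustrative choices and are not taken from the original work.

```python
import math

def fourier_surface(x, y, amplitudes, n_x, n_y, f_s=1.0, phi_x=0.0, phi_y=0.0):
    """Evaluate z(x, y) as a double sum of weighted cosine products.

    amplitudes[n - 1][m - 1] holds A_{n,m}; m indexes the x frequency and
    n indexes the y frequency (an assumed storage layout).
    """
    z = 0.0
    for n, row in enumerate(amplitudes, start=1):
        for m, a_nm in enumerate(row, start=1):
            # x wavelength is n_x / (m * f_s), y wavelength is n_y / (n * f_s)
            z += (a_nm
                  * math.cos(2.0 * math.pi * m * f_s * x / n_x + phi_x)
                  * math.cos(2.0 * math.pi * n * f_s * y / n_y + phi_y))
    return z
```

Because the surface is defined analytically, the same function can be sampled at whatever resolution suits the software under test.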

Transmission characteristics
The second component of the reference pair, the output reference, is a post-filtered surface obtained by applying a mathematically traceable filtration operation to the input reference. The linear Gaussian filter for an open profile is defined in ISO 16610-21 [19] by its weighting function,
$$s(x)=\frac{1}{\alpha\lambda_c}\exp\left[-\pi\left(\frac{x}{\alpha\lambda_c}\right)^2\right], \qquad (2)$$
where $\lambda_c$ is the filter cut-off wavelength and $\alpha=\sqrt{\ln 2/\pi}\approx 0.4697$. By performing a Fourier transform on the weighting function, the low-pass transmission characteristics can be obtained [22]:
$$W(\lambda)=\frac{A_1}{A_0}=\exp\left[-\pi\left(\frac{\alpha\lambda_c}{\lambda}\right)^2\right], \qquad (3)$$
where $\lambda$ is the spatial wavelength component of the surface, $A_0$ is the amplitude of the spatial wavelength component before filtration and $A_1$ is the amplitude of the spatial wavelength component after filtration. The transmission characteristic, therefore, defines the relative magnitude of transmission of a surface through the filter, as a function of the spatial wavelength, relative to the cut-off wavelength, $\lambda_c$. By defining the input reference surface directly in terms of its spatial frequency, or wavelength, components, it becomes straightforward to obtain an output reference surface that is mathematically traceable; the amplitude of each Fourier series term is adjusted based on its spatial frequency, as defined by the transmission characteristic. The transmission characteristic for the linear Gaussian filter defined in equation (3) is shown in figure 1. The transmission characteristic is defined here for a low-pass filter, or L-filter in areal surface texture analysis nomenclature, resulting in a surface with high spatial frequency components attenuated. In the context of profile surface texture analysis, this corresponds to obtaining the waviness profile. To obtain the roughness profile, the transmission characteristic is simply $1-W(\lambda)$.
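The amplitude adjustment described here can be sketched in a few lines. The example assumes the standard ISO 16610-21 low-pass transmission, $\exp[-\pi(\alpha\lambda_c/\lambda)^2]$ with $\alpha=\sqrt{\ln 2/\pi}$, and represents a profile as a list of (amplitude, wavelength) cosine terms. The names `transmission` and `filter_terms`, and the example wavelengths, are illustrative only.

```python
import math

ALPHA = math.sqrt(math.log(2.0) / math.pi)  # ISO 16610-21 constant, about 0.4697

def transmission(wavelength, cutoff):
    """Low-pass amplitude transmission A1/A0 of the linear Gaussian filter."""
    return math.exp(-math.pi * (ALPHA * cutoff / wavelength) ** 2)

def filter_terms(terms, cutoff):
    """Scale each (amplitude, wavelength) cosine term by its transmission."""
    return [(a * transmission(lam, cutoff), lam) for a, lam in terms]

# Hypothetical pre-filter profile: amplitudes in um, wavelengths in mm.
pre = [(1.0, 8.0), (1.0, 0.8), (1.0, 0.25)]
post = filter_terms(pre, cutoff=0.8)
```

With this definition a component at the cut-off wavelength is transmitted at exactly 50%, while the 0.25 mm component is attenuated by roughly three orders of magnitude. The high-pass (roughness) amplitudes would follow from `1 - transmission(...)`.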
The expressions here are given for the profile case. However, as the areal linear Gaussian filter is separable, it is straightforward to apply the same approach to the areal case [9]. For a separable input Fourier series surface expression, the transmission characteristic function can be applied in the same manner as for the profile case; once in the x direction and once in the y direction. This enables both profile and areal Gaussian filtration algorithms in surface texture analysis software to be assessed using the same technique.
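For a term that is a product of one cosine in $x$ and one cosine in $y$, separability means the areal attenuation is the product of the two profile transmissions. The sketch below assumes this product form and the same cut-off in both directions; `areal_transmission` is a hypothetical helper name.

```python
import math

LN2 = math.log(2.0)

def transmission(wavelength, cutoff):
    """Profile transmission; equals exp(-pi * (alpha * cutoff / wavelength)**2)
    because alpha**2 = ln(2) / pi."""
    return math.exp(-LN2 * (cutoff / wavelength) ** 2)

def areal_transmission(lam_x, lam_y, cutoff):
    """Attenuation of a cos(x term) * cos(y term) product under a separable filter."""
    return transmission(lam_x, cutoff) * transmission(lam_y, cutoff)
```

A term with both wavelength components equal to the cut-off is therefore transmitted at 25%, and a term that is constant in $y$ (infinite $y$ wavelength) reduces to the profile case.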

Assessment of existing software
To showcase this new method of mathematically traceable software validation for surface texture filtration, a selection of reference pairs was created to be used to assess the performance of five surface texture analysis software packages. The pre-filter, input references were then sampled into discrete datasets, and the software packages were used to perform linear Gaussian filtration operations. The resulting outputs were then compared to the post-filter reference surface to assess the performance of the software.

Creation of reference surfaces
In order to fully test the software, a total of 124 reference pairs were created. This range ensures each property of the surface definition that could have an influence on the filtration operation is varied, in order to identify which are the most significant factors that could affect the results from the software under test. Table 1 details the surface properties that were varied. Amplitude values of 1 μm were given to each surface cosine term. The wavelength values for both $\lambda_c$ and the surface spatial frequency components were chosen in line with the recommended default values given in ISO 4288 [23], as these are likely to be the most popular filter settings used in the software under test. A maximum areal resolution of 700×700 was chosen due to a dataset file size limitation for one of the five software packages tested, and so all software packages were given the same resolution to facilitate more accurate comparisons.
With the pre-filter reference surfaces defined using cosine terms, the associated post-filter reference surfaces were calculated using the method described in section 2.2, wherein the amplitude of each cosine term was adjusted as a function of the cut-off value, $\lambda_c$, and the wavelength of the cosine term. Datasets were created by sampling the surface equation at discrete locations between, and including, the surface region boundaries, with the sampling interval in one dimension determined by
$$\Delta=\frac{h-l}{n-1}, \qquad (4)$$
where $h$ and $l$ are the upper and lower boundary regions in that dimension, respectively, and $n$ is the number of sampling points in that dimension.
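The sampling step can be sketched as follows. Because both boundaries are included, $n$ points span $n-1$ intervals, so the interval is taken here as $(h-l)/(n-1)$. The function name `sample_positions` and the example values are illustrative.

```python
def sample_positions(lower, upper, n_points):
    """Return n_points equally spaced positions including both boundaries."""
    if n_points < 2:
        raise ValueError("at least two sampling points are needed")
    step = (upper - lower) / (n_points - 1)
    return [lower + i * step for i in range(n_points)]

# Hypothetical 2 mm profile sampled at only five points for readability.
xs = sample_positions(0.0, 2.0, 5)
```

The analytical surface definition can then be evaluated at each returned position to build the input dataset, and the chosen boundaries and point count should be reported with the assessment results.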

Truncated surfaces
The method described in section 2.2 applies the transmission characteristic in a straightforward manner for continuous surfaces defined using a finite number of cosine terms. When sampling a continuous surface to produce a dataset, a surface representation of finite length is produced. Therefore, depending on the implementation of the filter, the finite length of the surface may cause end effects that are intrinsic to the surface definition, and not due to errors in the software algorithms. Truncated versions of each surface were included in this analysis to account for this potential effect and identify the impact that it may have on the software results. The truncated definitions of each surface, $z_T(x,y)$, are defined by
$$z_T(x,y)=\begin{cases}z(x,y), & x_l\le x\le x_h \text{ and } y_l\le y\le y_h,\\ 0, & \text{otherwise},\end{cases} \qquad (5)$$
where $x_l$, $y_l$ and $x_h$, $y_h$ are the lower and upper boundary regions of the surface in the $x$ and $y$ dimensions. For the profile case, the same condition holds, but in one dimension. To calculate the post-filter reference surfaces for these truncated surfaces, the straightforward amplitude adjustment approach used for the continuous case is no longer applicable, as the boundary regions cause a discontinuity in the surface which is not well represented by a finite Fourier series. Instead, a more traditional approach of performing a convolution between the surface equation and the Gaussian filter weighting function, as described in equation (2), is applied. This was achieved using the computer algebra system, Wolfram Mathematica 11.3.
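The effect of truncation on the convolution can be illustrated numerically. The sketch below is not the symbolic calculation used in this work; it swaps in a simple trapezoidal quadrature and assumes a single-cosine profile that is zero outside its boundaries, with the standard ISO 16610-21 weighting function. All names and parameter values are illustrative.

```python
import math

ALPHA = math.sqrt(math.log(2.0) / math.pi)

def weight(u, cutoff):
    """Gaussian weighting function s(u) of ISO 16610-21."""
    return math.exp(-math.pi * (u / (ALPHA * cutoff)) ** 2) / (ALPHA * cutoff)

def filtered_truncated_cosine(x, wavelength, cutoff, x_lo, x_hi, steps=4000):
    """Trapezoidal estimate of the convolution of s with a cosine profile
    that is zero outside [x_lo, x_hi]."""
    h = (x_hi - x_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = x_lo + i * h
        end_weight = 0.5 if i in (0, steps) else 1.0
        total += end_weight * math.cos(2.0 * math.pi * u / wavelength) * weight(x - u, cutoff)
    return total * h

# Hypothetical 2 mm profile, 0.5 mm wavelength, 0.25 mm cut-off.
centre = filtered_truncated_cosine(1.0, 0.5, 0.25, 0.0, 2.0)
edge = filtered_truncated_cosine(0.0, 0.5, 0.25, 0.0, 2.0)
```

Far from the boundaries the result matches the transmission-characteristic value, whereas at the boundary only half of the weighting function overlaps the surface and the filtered height is roughly halved. This is the intrinsic end effect that the truncated reference pairs are intended to capture.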

Software comparisons
In order to efficiently compare the software outputs to the post-filter reference surface for a large number of surfaces, a statistical quantifier was needed. This was chosen to be the standard deviation of the deviation of the software-obtained height values from the post-filter reference surface height values. The post-filter reference surface height values were first subtracted from the software output at matching $x$, $y$ locations, creating an array, $D$, of deviation values, $D_i$. The standard deviation, $S$, of the resulting deviation map was then calculated according to the definition
$$S=\sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(D_i-\mu\right)^2}, \qquad (6)$$
where $K$ is the total number of deviation values, and $\mu$ is the mean of all values of $D$. The deviation values are calculated such that their mean value is zero; however, the standard deviation has been chosen here instead of another statistical quantity, such as the mean absolute deviation, in order to give higher weighting to larger deviation values. In addition, all comparisons are scaled such that the reference surface is of order unity. This makes the standard deviation metric scale invariant so that the amplitude scale of the surface is not a contributing factor to the performance metric; only the deviations of the software from the reference are used. Using this approach, the deviation for a software dataset from the post-filter reference can be given as a single value, enabling much easier comparisons across multiple surfaces. In addition, the same approach can be used for both profile and areal surfaces, allowing performance comparisons between the two types of surface. Figures 2 and 3 show the standard deviation from reference for a range of profile and areal surfaces, respectively, for five surface texture analysis software packages, wherein the lower the deviation value, the greater the degree of agreement with the mathematical reference. The post-filter surfaces selected were calculated using the continuous pre-filter definitions and the transmission characteristic.
An end removal of $\lambda_c/2$ and a surface length of 2 mm were also selected. Note here that software E was only able to process profile surfaces, and so is not present in the areal results throughout this paper. These results demonstrate the level of agreement with the mathematical reference for a large variety of surfaces, and enable easy comparison between multiple software packages. For surface texture analysis software developers, this approach can be used to test multiple filtration algorithms against a known, traceable reference to easily identify which algorithms are most accurate. For end users, this approach allows for the validation of surface texture software, whether it be a commercial package or a software measurement standard made available by an NMI, in order to understand the accuracy of the software in relation to a mathematically traceable reference value and determine whether it is fit for purpose.
Figures 4 to 6 show individual results in more detail; in each, the first two graphs show the pre-filter and post-filter reference pairs. The third graph shows the post-filter surface obtained by the software, and the following graphs show the difference between the software and the reference surface. In addition, table 2 provides the pre-filter and post-filter amplitudes of the reference pairs. Figure 4 shows the result for a typical surface with good agreement between software and reference. It is clear that the majority of the deviation from reference is due to end effects still present on the surface. By increasing the end removal length from $\lambda_c/2$ to $\lambda_c$, as shown in the lower two graphs, a much closer agreement to the reference surface is achieved. This effect is even more pronounced in figure 5, which shows end effects with much larger amplitudes. This is because of the greater difference in amplitude between the pre-filter and post-filter references. The ends of the software result are still roughly equivalent in amplitude to that of the pre-filter reference surface, leading to the large deviation. When cropping in, much greater agreement is observed.
Figure 6 shows the results of a reference pair with an extreme amplitude suppression of around a factor of 30. In this example, the software is not able to export a surface to such a precision, resulting in a significant difference between the surfaces.
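The single-value deviation metric used throughout this section can be sketched as below. The example assumes the population form of the standard deviation (division by the number of deviation values, $K$) and that both datasets are already flattened to lists sampled at matching locations; the function name `deviation_std` is illustrative.

```python
import math

def deviation_std(software_heights, reference_heights):
    """Standard deviation S of the pointwise deviations D_i = software - reference."""
    if len(software_heights) != len(reference_heights):
        raise ValueError("datasets must be sampled at matching locations")
    d = [s - r for s, r in zip(software_heights, reference_heights)]
    mu = sum(d) / len(d)
    # Assumed normalisation: divide by K rather than K - 1.
    return math.sqrt(sum((di - mu) ** 2 for di in d) / len(d))
```

Because the mean deviation is subtracted, a constant height offset between the software output and the reference does not contribute to the metric; only the shape of the deviation map does.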

Effect of end removal length
One of the factors included in the surface creation section was the length that is removed from each end of the surface in order to account for potential end effects. To investigate this, two popular end removal lengths were chosen, $\lambda_c/2$ and $\lambda_c$. The ratios of the standard deviation from reference values for surfaces with differing end removal lengths, with all other surface properties kept constant, are presented in figure 7. The results are presented as the ratio of the value obtained with an end removal length of $\lambda_c/2$ to the value obtained with an end removal length of $\lambda_c$, $S_{\lambda_c/2}/S_{\lambda_c}$, meaning that a larger value corresponds to a reduction in the deviation from reference when increasing the length of the end removal, i.e. the larger the value, the greater the improvement due to removing more of the ends. It should be noted here that some of the software packages tested did not give the option to remove end effects within the software. For these cases, end removal was performed manually in order to enable meaningful comparisons across all of the software tested. Figure 7 shows that a clear improvement is seen for software A and B, with reductions in the standard deviation from reference on the order of $10^3$, indicating that both produce end effects that result in deviations from the mathematical reference that are significant in comparison to any deviations present in the central region of the surface. This result is seen for both profile and areal surfaces. Software C, D and E, however, show very little change in the deviation from reference, suggesting that these software packages do not produce significant end effects. This may be because their implementations of the Gaussian filter include methods to account for end effects, for example, by artificially extending the length of the surface, and then cutting back down after the application of the filter.
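A manual end removal and the resulting ratio can be sketched for the profile case as follows. The example assumes a uniformly sampled list of deviation values, the population standard deviation, and a ratio defined as the value for the shorter removal divided by the value for the longer removal. All function names are hypothetical.

```python
import math

def crop_ends(values, spacing, removal_length):
    """Drop the samples lying within removal_length of either end of a profile."""
    k = int(round(removal_length / spacing))
    return values[k:len(values) - k] if k > 0 else list(values)

def std_dev(values):
    """Population standard deviation of a list of values."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def end_removal_ratio(deviations, spacing, cutoff):
    """S after removing cutoff/2 per end divided by S after removing cutoff per end."""
    s_half = std_dev(crop_ends(deviations, spacing, cutoff / 2.0))
    s_full = std_dev(crop_ends(deviations, spacing, cutoff))
    return s_half / s_full  # undefined if the longer crop leaves zero deviation
```

A ratio well above one indicates that most of the deviation sits in the region between the two crop lengths, which is the signature of end effects; a ratio near one indicates that the ends contribute little.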
To verify the visual trends seen in figure 7, a one-way analysis of variance (ANOVA) test was performed using IBM SPSS Statistics on each software result to determine whether the increase in end removal length was a statistically significant factor in the variation in standard deviation from the reference. Statistical significance is taken to require rejection of the null hypothesis, $H_0$, at the 0.05 level, i.e. $p<0.05$. Extreme deviation results >0.1 μm were omitted from the analysis in order to focus on the effect of end removal length and avoid influencing the results with additional factors. The results are given in table 3. In line with the visual data, only software A and B obtain p-values <0.05, implying that end removal length is a significant factor in the deviation from reference values for those packages, and not for software C, D and E.
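For readers without access to a statistics package, the quantity underlying a one-way ANOVA can be sketched directly. The example below computes only the F statistic and its degrees of freedom from groups of deviation values (for instance, one group per end removal length); it does not reproduce the SPSS analysis, and the p-value would still have to be obtained from the F distribution with the returned degrees of freedom. The function name is illustrative.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations about their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value means the between-group variation dominates the within-group variation. When the within-group spread is large, as discussed for some software later in this paper, F is small and the null hypothesis is not rejected even if the group means differ visibly.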

Effect of surface truncation
As mentioned in section 3.1.1, the act of sampling the continuous surface to produce a finite width surface representation may introduce end effects due to the implementation of a finite width filtering window. Similar to the approach in section 3.3, reference pairs were created with both continuous definitions and truncated definitions, and the ratios of the resulting software standard deviations from reference are shown in figure 8. Here, a consistent improvement in deviation from reference is seen for software A for a variety of surfaces, indicating the filter implementation better agrees with a truncated reference pair than a continuous reference pair. The remaining software packages show no such improvement, suggesting their filter implementations either assume continuous frequency components within a surface and apply the filter accordingly, or that the effect of truncation on the surface is insignificant compared to other error sources in the filtration implementation. The results for software B, however, suggest that its deviation from reference is improved when using a continuously defined reference pair. This supports the idea that its filtration implementation assumes a continuous frequency component within the surface, and filters based on that assumption. Table 4 shows the ANOVA test results for the effect of the surface creation method on the standard deviation from the reference, and supports the visual result that the surface creation method is statistically significant for software A. The results show a failure to reject the null hypothesis (p>0.05) for software B, however. The reason for this failure to reject is unclear, given the visual results of figure 8(a), but may be related to the high variance seen in the results, supported by the box plot shown in figure 8(b). This suggests the deviation within the groups (truncated and continuous) is more significant than the deviation between the groups, and is due to another surface property not factored into the ANOVA, thus restricting the ability to claim statistical significance for the 'between group' variation.

Identification of high deviation surfaces
Throughout the assessment of software using this method, multiple surfaces arose for which all software packages tested obtained large deviation values from the reference. Examples of this can be seen for surfaces 6, 10, 11 and 12 in figure 2, and surface 10 in figure 3. As these results are found for all software tested, it is expected that this is due to some property of the surfaces that limits the effectiveness of all discrete filtration algorithms. Figure 9 shows the spread of standard deviation from reference values for each software, split into the ratios of highest $x$ wavelength surface component against cut-off. This combination was chosen to investigate the effect of the transmission function on the software filtration results, with the transmission function being a function of both surface wavelength, $\lambda$, and cut-off wavelength, $\lambda_c$. The results show a clear increasing trend in the deviation values for low $\lambda/\lambda_c$ values across all tested software, with a particular increase below $\lambda/\lambda_c=1$. Referring to the transmission characteristic shown in figure 1, the high deviation results correspond to the left-hand side of the curve, with values $0.1\le\lambda/\lambda_c<1$. In these regions, the amplitude transmission, $A_1/A_0$, drops rapidly to zero, or close to zero. For example, evaluated at $\lambda/\lambda_c=0.1$,
$$\frac{A_1}{A_0}=\exp\left[-\pi\left(\frac{\alpha}{0.1}\right)^2\right]=2^{-100}\approx 7.9\times10^{-31}.$$
Within this region, rounding errors can play a significant part in the final results of discrete filtration implementations, and the accuracy of calculations is limited by the working precision of the software [18]. In addition, the finite resolution of the input dataset becomes increasingly significant when representing high frequency components, as there are fewer data values within each frequency period. Combining both of these factors, it is unsurprising to see high deviation values from the software under test. Table 5 displays the ANOVA test results for the effect of $\lambda/\lambda_c$ on deviation values.
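The two limiting factors named here, working precision and samples per period, can be illustrated numerically. The sketch below assumes IEEE double-precision arithmetic, as used by Python floats, and the standard Gaussian transmission with $\pi\alpha^2=\ln 2$; the helper `samples_per_period` is a hypothetical name.

```python
import math
import sys

ratio = 0.1  # lambda / lambda_c
transmission = math.exp(-math.log(2.0) / ratio ** 2)  # equals 2**-100

# Relative spacing of double-precision numbers near 1.0 (about 2.2e-16).
eps = sys.float_info.epsilon

# A post-filter component this small vanishes when added to order-unity heights.
lost = (1.0 + transmission) == 1.0

def samples_per_period(wavelength, length, n_points):
    """Number of sampling intervals spanning one wavelength of a component."""
    return wavelength * (n_points - 1) / length
```

Since the transmitted amplitude is many orders of magnitude below the double-precision spacing near unity, any discrete filter working at that precision cannot be expected to reproduce it once it is combined with larger height values. For a hypothetical 2 mm profile of 700 points, a 0.08 mm component spans only about 28 sampling intervals.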
In line with the visual results in figure 9, the transmission ratio is a significant factor for four of the five software packages tested. While very close (p=0.07), the software C results failed to reject the null hypothesis. This is due to the larger interquartile ranges displayed in figure 9, indicating high degrees of variation within the groups that is not explained by the groupings of $\lambda/\lambda_c$.
Figure 9. Spread of the standard deviation from reference values for each software package as a function of $\lambda/\lambda_c$, for all test surfaces, including both end removal lengths and surface creation types.

Effect of surface dimension
Figure 10 shows a box plot of the spread in standard deviation from reference values for each software, grouped into profile surfaces and areal surfaces. The aim of this analysis is to identify differences in the implementation of profile and areal Gaussian filters, and what effect this has on the software's ability to obtain a post-filter surface that agrees with the mathematical reference surface. The results show highly similar spreads of results for both software A and software B, implying a similar filtration operation has been implemented for both surface types. This is to be expected, because ISO 16610-21 and ISO 16610-61 define identical transmission characteristics for the profile and areal linear Gaussian filters, respectively [19,20]. Furthermore, the areal linear Gaussian filter is separable, meaning it is possible to achieve an areal filtration by performing two successive profile filtrations; once in the $x$ direction and once in the $y$ direction. The results for software A and B suggest such an implementation has been used. Software C and D show differing results depending on the surface dimensionality. Software C obtains notably smaller deviation values when performing profile filtrations, implying the added dimensional complexity of the areal filtration operation has increased the software error. It is possible that two separable profile filtration operations are performed in this case, and any small errors introduced when performing the first filtration operation are compounded when acted on again by the second filtration operation. Software D, however, displays a slight reduction in deviation from reference when performing areal filtration. The reason for this result is not clear.
Table 6 shows the ANOVA test results for the effect of surface dimension on deviation from the mathematical reference. In line with figure 10, significance is achieved (p<0.05) for software C and D.

Effect of areal surface resolution
Two resolutions were chosen for the areal surface datasets to be input into the software under test: 700×700 and 512×512. The 700×700 resolution was chosen as the highest available resolution that could be input into all of the software tested. The 512×512 resolution was chosen as the highest usable resolution that is a power of two, $2^n\times2^n$. This was chosen due to its compatibility with the binary numeral system, which can lead to more accurate computer algorithms. The aim of this test was to identify whether the use of data that is easily factored by two can have a significant effect on the results from any of the tested software. Figure 11 shows the ratio between standard deviation from reference values of 700×700 resolution surfaces and 512×512 resolution surfaces. Here, a value greater than one represents a reduction in deviation when a 700×700 surface is used over a 512×512 surface. Software C and D show consistent values greater than one, suggesting the added information in the higher resolution surface leads to a more accurate filtration result, in relation to the mathematical reference values. The effect is more pronounced for software C, with an approximately 40% improvement for the higher resolution surface, compared to an approximately 20% improvement for software D. Software B shows a low degree of variance between results, averaging approximately unity, meaning the resolution of the surface had little effect on the results of the filtration operation, and that other surface properties were more significant. This result suggests that software B is able to extract a similar degree of frequency component information from the lower resolution surface compared to the higher resolution surface, potentially using some form of interpolation. Figure 11 also shows that the filtration operation of software A is improved when using the lower resolution, power of two surface dataset, despite the slight reduction in information present in that dataset. This result supports the idea that software A is using filtration implementation methods that are more accurate when given a dataset with power of two resolution. The box plot also reveals a large spread in the results, meaning this effect is not consistent across all surfaces. Table 7 presents the ANOVA results for the statistical significance of variance due to resolution. As p<0.05 was not satisfied for any software, no statistical significance was found for surface resolution. This is likely due to the variance found within the groups (512×512 and 700×700 resolutions). Nonetheless, visual inspection does seem to deliver insight into the implementation of filtration in each software package tested.
Figure 10. Spread of the standard deviation from reference values for each software package as a function of surface dimensionality. High deviation values >0.1 were omitted from the analysis to avoid distortion of the spread due to external influence factors unrelated to dimensionality.

Conclusions
This paper introduces a novel method for the validation of surface texture filtration methods in software. Mathematical reference pairs are created that are traceable, and in line with the linear Gaussian filtration definitions laid out in ISO 16610-21 and ISO 16610-61. The use of the transmission curve alongside mathematical surfaces defined using Fourier series allows for straightforward post-filter surface calculations that can be applied to both profile and areal linear Gaussian filtration.
The use of this method was showcased with the creation of 124 reference pairs used to validate the performance of five available surface texture analysis software packages. The mathematically traceable reference pairs were used as a reference against which the software results were compared, enabling insights to be gained about the implementation of the filtration software. Investigations were performed for a variety of surface properties, including end removal length, resolution, and dimensionality, and attempts were made to use the software deviations from reference to understand potential errors in the software. As a result of the investigations performed, it was found that the influence of many surface properties on the accuracy of the filtration operation is dependent on the particular software used. Therefore, it is recommended that for optimal accuracy, a good understanding of the software used to perform surface texture filtration is needed. In the event of collaboration efforts that use different surface texture analysis software implementations, or to attempt to achieve the highest degree of accuracy from the largest variety of surface texture analysis software, certain choices can be made. It is recommended that the end removal length is chosen to be $\lambda_c$ to adequately avoid the inclusion of end effects, and the resolution of the surface dataset is set to be a power of two, to best utilise the software algorithms that are designed to operate in the binary numeral system. These properties have been shown to have no negative effect on the results of software that does not benefit from them, but can significantly improve the results of software that does.
Figure 11. Left: Ratio of the standard deviation from reference values for 512×512 and 700×700 resolution surfaces. Right: Spread of the ratio results for each software package tested. Only continuously defined reference pairs were used in this analysis.
This mathematical approach to surface texture filtration validation was intended to also be applied to the spline filter defined in ISO 16610-22 [24]. In contrast to the Gaussian filter, however, the spline filter is defined in terms of a discrete, dataset-based algorithm, and so a mathematical reference pair based on continuous definitions is not applicable. Conversion of the discrete definition given in ISO 16610-22 into a continuous definition has been investigated, but no solutions have been found.
This mathematical approach to software validation can be used by software developers to test their software to improve accuracy. This will benefit end users by enabling higher accuracy surface texture measurements with increased confidence in the results, and will facilitate more effective collaborations between users of different software by providing a traceable reference against which all software can be compared.

Future work
This work follows on from previous work that introduced the use of mathematical references for the validation of field and functional surface texture parameters [16,17]. By combining this newly introduced work into the mathematical reference framework, traceable validation of surface texture analysis software is closer to realisation. The remaining component of this framework is the development of mathematical references for surface form removal. Once complete, all ISO recommended stages of surface texture analysis will have a traceable reference.