Quantification of scar collagen texture and prediction of scar development via second harmonic generation images and a generative adversarial network

The texture of human scar tissue, which is widely used in medical analysis, is irregular and varies widely between scar types. Quantitative detection and analysis of scar texture by image analysis is therefore of great significance to clinical practice. However, existing methods have shortcomings, such as the inability to fully extract texture features. This study therefore proposes integrating second harmonic generation (SHG) imaging with a deep learning algorithm. Combined with Tamura texture features, the approach yields a regression model of scar texture that forms a new computer-aided diagnosis method to assist clinicians. The model, based on the wavelet packet transform (WPT) and a generative adversarial network (GAN), is trained with scar texture images of different ages. Gradient Boosted Regression Trees (GBRT) are then used for regression analysis, in which the extracted features predict scar age. In our experiments, the proposed model outperforms previously published methods. The work thus contributes to a better understanding of the mechanism behind scar development and may support the further development of SHG for skin analysis and clinical practice.


Introduction
A scar is a general term for the cosmetic and histopathological changes caused to normal skin tissue by various traumas, and it is an inevitable outcome of the trauma repair process in humans [1]. Scars are generally classified as normal or pathological. Pathological scars, which are commonly caused by trauma, are often characterized by excessive deposition of collagen and additional connective tissue matrix [2]. Image recognition of collagen is therefore vitally important in medicine [3]. As a product of wound healing, normal scars can provide a morphological reference for studying the treatment of pathological scars [4][5][6]. With recent advances in optical technology, second harmonic generation (SHG) has been widely applied to image collagen in scars [7]. SHG is a second-order nonlinear optical process in which photons interacting with a nonlinear material are effectively 'combined' to generate new photons with twice the frequency of the initial photons [8]. Because it interacts with highly non-centrosymmetric molecular assemblies, SHG enables direct imaging of anisotropic biological structures such as collagen, which makes it an ideal means of in vivo imaging, and it shows great potential in various biomedical settings [9,10]. However, most research based on SHG imaging has neglected quantitative methods [11]. In addition, medical images contain complex pathological information. Applying artificial intelligence to the analysis of medical microscopic images can therefore improve diagnostic efficiency, which is of considerable practical significance.
Deep learning techniques have recently been widely applied in computer vision and pattern recognition [12]. In 2012, Gomez et al. investigated 22 co-occurrence texture statistics as a function of gray-level quantization for breast tumor classification [13]. In 2015, Ronneberger et al. proposed U-Net for medical image segmentation [14]; at the time, U-Net outperformed conventional encoder-decoder networks. In 2019, Shin et al. proposed a semi-supervised learning method for breast ultrasound image classification [15]. In [16], Local Orientation Ternary Pattern (LOTP), Improved Local Ternary Pattern (ILTP), Binary Gradient Contours 1 (BGC1) and Gray Level Co-occurrence Matrix (GLCM) methods were used to extract texture features for regression models, but these descriptors were limited in describing the features of scar images. Typical neural network models such as VGG and ResNet require large amounts of training data to capture the diversity of an imaging modality, and labeling each sample is lengthy and laborious in practice. Generative Adversarial Networks (GANs) are among the most effective deep learning models for generative problems such as synthetic data generation, cell image generation and image-to-image translation [17][18][19][20][21]. GANs are attractive here because their generator can synthesize training data. In addition, many tasks require learning the correspondence between images from two different domains. Dong et al. applied a GAN to denoise Optical Coherence Tomography (OCT) images for ophthalmic applications [22], and Rubin et al. used a GAN to classify label-free quantitative phase maps of live cells [23]. With different GAN models, more general datasets and better training strategies, GAN methods can be adapted to different optical image processing tasks.
SHG is a powerful tool for biomedical diagnosis, and SHG images are an important type of optical image [22]. In this study, a GAN named ScarGAN is developed to predict the evolution of scar texture, which allows latent features of SHG images to be extracted at different scales. The collagen textures in normal human scars were analyzed, and the time-dependent characteristics of scar collagen were studied. Based on image texture features related to collagen morphology, the scar images were categorized by scar age, thereby representing the collagen morphology of normal human scars. By distinguishing normal scar tissue and characterizing how it changes with scar age, doctors can diagnose scars more accurately, and the mechanism of scar development can be better understood [24][25][26]. Figure 1 shows the research design. Normal scar samples collected from patients were used in this study, and scar SHG images were captured with an SHG microscope. The images were then processed using ScarGAN, which transforms SHG images into target images. Tamura features were extracted as quantitative indices describing the textural information of the scar SHG images. Finally, regression analysis of the extracted features was performed using GBRT to validate the proposed method.

Scar sample preparation
The human scar samples used in this study were collected from hospitalized patients with their prior consent. This study was approved by the Institutional Review Board of Fujian Medical University. The six scar samples, with scar ages of 2, 4, 8, 15, 21 and 40 years, were all taken from the abdomens of female patients who had undergone caesarean section [16]. Patient ages ranged from 25 to 58 years. For each scar sample, 150 SHG images were acquired. Scar collagen texture changes with both scar age and patient age, but scar age has the more significant impact [16]. This study therefore focuses on scar age, and the effect of patient age on scar texture is treated as negligible.

SHG imaging
SHG images of scar collagen were captured with a Zeiss LSM 510 META microscope system (Germany) equipped with a mode-locked femtosecond titanium:sapphire laser (Coherent Mira 900-F, USA) operating at 810 nm. The system consists of three components: an imaging detection system, an excitation light source and a high-throughput scanning inverted microscope. For high-resolution imaging, a high-numerical-aperture oil immersion objective (Plan-Apochromat 63×, N.A. 1.4, Zeiss) was used. The SHG signal is highly directional because it arises from coherent interference, so SHG images can be sharper than fluorescence images [16]. Images of 256 × 256 pixels were captured from cross-sections of the samples, which contain different tissue structures from the epidermis to the dermis. Because the epidermis and the underlying dermis at the sample edges may contain normal skin tissue, which could compromise the accuracy of the experiment, imaging fields in the deep dermis were randomly selected from the central part of each sample.
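The frequency-doubling relation described in the Introduction fixes where the SHG signal appears for the 810 nm excitation used here. The short Python sketch below (the function names are our own, for illustration only) halves the excitation wavelength and computes photon energies from E = hc/λ; it is a worked check of the physics, not part of the imaging pipeline.

```python
"""Frequency doubling in SHG: emission wavelength and photon energies."""

# CODATA exact constants
PLANCK_H = 6.62607015e-34        # Planck constant, J*s
SPEED_OF_LIGHT = 299_792_458.0   # speed of light, m/s
ELECTRON_VOLT = 1.602176634e-19  # 1 eV in joules


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = h*c / lambda, converted to electronvolts."""
    return PLANCK_H * SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / ELECTRON_VOLT


def shg_wavelength_nm(excitation_nm: float) -> float:
    """Doubling the optical frequency halves the wavelength."""
    return excitation_nm / 2.0


excitation = 810.0                        # nm, laser setting reported above
emission = shg_wavelength_nm(excitation)  # expected SHG wavelength, nm
print(f"SHG emission wavelength: {emission:.1f} nm")
print(f"photon energy: {photon_energy_ev(excitation):.3f} eV (pump) -> "
      f"{photon_energy_ev(emission):.3f} eV (SHG)")
```

For 810 nm excitation this gives an SHG wavelength of 405 nm, and the photon energy doubles from about 1.53 eV to about 3.06 eV.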

Gradient boosted regression trees (GBRT)
GBRT builds an ensemble of decision trees in which each new tree is fitted to the residual errors of the current model, and it can be used for both classification and regression. On many datasets, such iterative decision-tree ensembles have been reported to classify better than the random forest algorithm [16].
In this study, each scar sample is represented by an m × n matrix, where m is the number of images with the same properties and n is the number of features extracted from each image. Because the quality of the features directly affects the performance of the regression model, choosing appropriate features is the first priority. During training, the predicted values and the loss function are updated at every iteration, and training stops once a preset number of iterations is reached. The weight function, which indicates the pruning rate of the leaf nodes, must be initialized beforehand. The importance of each feature, that is, its contribution to the model, can then be derived from the splits of all sub-trees; it can be used to assess feature quality and as a basis for feature selection.
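To make the residual-fitting loop and the importance measure concrete, here is a minimal pure-Python sketch of gradient boosting with depth-one trees (stumps) under squared-error loss. It is an illustration under stated assumptions, not the WEKA implementation used in the study: the stump depth, learning rate, number of trees and the gain-based importance are our choices, all function names are hypothetical, and the data are toy values rather than scar features.

```python
"""Minimal gradient boosting with regression stumps and squared-error loss.

Illustrative sketch only; rows of X are images (m), columns are features (n).
"""
from typing import List, Tuple

Matrix = List[List[float]]
Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: Matrix, r: List[float]) -> Tuple[Stump, float]:
    """Best single split for residuals r, and its squared-error reduction (gain)."""
    m, n = len(X), len(X[0])
    total = sum(r)
    best: Stump = (0, float("inf"), total / m, total / m)  # no split: predict mean
    best_gain = 0.0
    for j in range(n):
        order = sorted(range(m), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(m - 1):
            left_sum += r[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:
                continue  # no threshold separates equal values
            n_left, n_right = k + 1, m - k - 1
            right_sum = total - left_sum
            # SSE reduction of a two-leaf fit relative to a single mean
            gain = left_sum ** 2 / n_left + right_sum ** 2 / n_right - total ** 2 / m
            if gain > best_gain:
                best_gain = gain
                best = (j, (lo + hi) / 2, left_sum / n_left, right_sum / n_right)
    return best, best_gain


def gbrt_fit(X: Matrix, y: List[float], n_trees: int = 200, learning_rate: float = 0.1):
    """Fit F(x) = f0 + lr * sum of stumps; also return per-feature gain totals."""
    f0 = sum(y) / len(y)                 # initial model: the mean target
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        # the negative gradient of 0.5 * (y - F)^2 is the ordinary residual y - F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump, gain = fit_stump(X, residuals)
        if gain <= 0.0:
            break                        # residuals cannot be reduced further
        j, thr, left, right = stump
        stumps.append(stump)
        importance[j] += gain            # credit the feature used by this split
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    return (f0, learning_rate, stumps), importance


def gbrt_predict(model, x: List[float]) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(left if x[j] <= thr else right for j, thr, left, right in stumps)


def normalize_by_max(values: List[float]) -> List[float]:
    top = max(values)
    return [v / top for v in values] if top > 0 else values


# Toy data: feature 0 drives the target, feature 1 is a periodic distractor.
X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
model, imp = gbrt_fit(X, y)
print(normalize_by_max(imp))
```

Each stump is fitted to the current residuals and only a fraction (the learning rate) of its prediction is added, which mirrors the iterative update described above. Summing each feature's split gains and dividing by the maximum gives a relative importance analogous to the max-normalized importance reported later, although WEKA may compute the underlying importance differently. Practical GBRT implementations also use deeper trees and subsampling, which are omitted here.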
GBRT offers strong predictive ability, robustness to outliers in the output space through robust loss functions, and good handling of heterogeneous data. Because it models nonlinear relationships directly, complex manual feature transformations are unnecessary, which broadens its applicability. Fitting each new tree to the residuals of the current model generalizes the original boosting algorithm and improves generalization. Limiting the number of iterations and the step size reduces the risk of over-fitting. In addition, the contribution of each feature variable to the model can be aggregated over all sub-trees, which facilitates the analysis of feature variables. Above all, GBRT predicts well and is effective for a wide range of regression problems.

Tamura texture features
Tamura et al. proposed six texture features corresponding to human visual perception: coarseness, contrast, directionality, line-likeness, regularity and roughness. Of these, coarseness, contrast and directionality are the basic elements of texture and discriminate well between different textures.
(1) Coarseness (Crs): Coarseness is the most basic texture feature and reflects the size of the texture elements. When two images with the same content but at different scales are compared, the one at the larger scale appears coarser.
(2) Contrast (Con): Contrast refers to the gray difference between the brightest part and the darkest part in the selected area of the image. It indicates the clarity of the image and the depth of the texture groove.
(3) Directionality (Dir): Like contrast, directionality is a global property of a given region. It describes how the texture elements are shaped and distributed: when the texture appears as shapes oriented in particular directions, the image exhibits directionality. Directionality is computed from the gradient vector at each pixel.
(4) Line-likeness (Lin): Line-likeness evaluates the shape of texture elements. A directional co-occurrence matrix is constructed to capture the shape-related attributes of the texture elements, and line-likeness is calculated from it as follows:
$$F_{lin} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} P_{Dd}(i,j)\,\cos\left[(i-j)\frac{2\pi}{n}\right]}{\sum_{i=1}^{n}\sum_{j=1}^{n} P_{Dd}(i,j)}$$
where P_Dd(i, j) is the element of the n × n direction co-occurrence matrix giving the frequency with which two points separated by distance d have quantized edge directions i and j.
(5) Regularity (Reg): Regularity describes how regularly the texture elements are laid out; an image that contains many different texture patterns is deemed irregular. The image is split into multiple sub-images, the coarseness, contrast, directionality and line-likeness of each sub-image are calculated, and regularity is derived from how much these four features vary across the sub-images:
$$F_{reg} = 1 - r\left(\sigma_{crs} + \sigma_{con} + \sigma_{dir} + \sigma_{lin}\right)$$
where σ_crs is the standard deviation of coarseness over the sub-images, σ_con, σ_dir and σ_lin are the standard deviations of contrast, directionality and line-likeness of the sub-images, respectively, and r is a normalizing factor.
(6) Roughness (Rgh): In texture studies, terms such as rough and smooth are routinely used to compare textures. Roughness is commonly computed as the sum of coarseness and contrast:
$$F_{rgh} = F_{crs} + F_{con}$$
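The closed-form parts of the Tamura definitions can be written down directly. The sketch below is a pure-Python illustration, not the MATLAB code used in the study; it assumes gray levels are supplied as a flat list, that the direction co-occurrence matrix P has already been built, and that the normalizing factor r is chosen by the user. Contrast uses Tamura's standard definition F_con = σ/α4^(1/4), where α4 = μ4/σ⁴ is the kurtosis, which refines the informal brightest-versus-darkest description in item (2). Coarseness and directionality require multi-scale window averaging and gradient-angle histograms and are omitted for brevity; all function names are hypothetical.

```python
"""Closed-form Tamura quantities (illustrative pure-Python sketch)."""
import math
from statistics import mean, pstdev
from typing import List, Sequence


def tamura_contrast(pixels: Sequence[float]) -> float:
    """F_con = sigma / alpha4**(1/4), where alpha4 = mu4 / sigma**4 is the kurtosis."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        return 0.0  # a uniform patch has no contrast
    mu4 = mean([(p - mu) ** 4 for p in pixels])
    alpha4 = mu4 / sigma ** 4
    return sigma / alpha4 ** 0.25


def line_likeness(P: List[List[float]]) -> float:
    """Co-occurrence-weighted mean of cos((i - j) * 2*pi / n) over an n x n matrix P."""
    n = len(P)
    num = sum(P[i][j] * math.cos((i - j) * 2 * math.pi / n)
              for i in range(n) for j in range(n))
    den = sum(sum(row) for row in P)
    return num / den


def regularity(sub_features: List[Sequence[float]], r: float) -> float:
    """F_reg = 1 - r * (sigma_crs + sigma_con + sigma_dir + sigma_lin).

    Each row of sub_features holds (crs, con, dir, lin) for one sub-image.
    """
    columns = list(zip(*sub_features))  # one column per feature
    return 1.0 - r * sum(pstdev(col) for col in columns)


def roughness(f_crs: float, f_con: float) -> float:
    """F_rgh = F_crs + F_con."""
    return f_crs + f_con


print(tamura_contrast([0, 0, 255, 255]))
print(line_likeness([[1 if i == j else 0 for j in range(4)] for i in range(4)]))
```

A two-level patch with equal halves has kurtosis 1, so its contrast equals its standard deviation (127.5 for the patch above). A direction co-occurrence matrix concentrated on its diagonal (neighbouring points share the same direction) gives the maximum line-likeness of 1, while one concentrated on opposite directions gives -1.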

Generative adversarial network (GAN)
Generative models, which play a crucial role in probability statistics and machine learning, are a family of models used to randomly generate observable data. They are widely applied to images, text and audio [27]. Mainstream deep generative models include autoregressive models, variational autoencoders and generative adversarial networks. Based on the zero-sum game of game theory, a generative adversarial network treats generation as a contest between a discriminator and a generator: the generator produces synthetic data from input noise, usually drawn from a uniform or normal distribution, and the discriminator distinguishes the generator's output from real data. The generator aims to produce data ever closer to reality, while the discriminator aims to better separate real data from generated data. The two networks are optimized adversarially and alternately until neither can improve further, at which point the generated data closely resemble the real data, so that the desired outputs, such as images and videos, can be produced. In short, a GAN consists of a generator, which captures the data distribution, and a discriminator, which estimates the probability that a sample came from the training data rather than from the generator. Its basic structure is shown in Fig. 2. Compared with traditional generative models, this adversarial training between the generator and the discriminator can significantly improve the quality of generated data. The GAN training framework is also highly flexible and imposes no particular restrictions on the specific form of the generator or the discriminator.
They only need to be differentiable, which has favored the wide application of GANs. Research on the core technologies of GANs is therefore essential for practical applications in various fields.
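The adversarial game described above is commonly formalized by the standard GAN minimax objective, in which D(x) is the probability that x is real and G(z) is a sample generated from noise z. It is given here for reference only; the text does not state that ScarGAN uses exactly this loss, and its attribute-conditioned discriminator would add further terms.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

% For a fixed generator G (with generated-data density p_g), maximizing V over D gives
D^{*}_{G}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}
```

When the generator matches the data distribution (p_g = p_data), the optimal discriminator outputs 1/2 everywhere; this is the equilibrium that the alternating optimization aims for.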

Our method (ScarGAN)
The model incorporates both low-level image information and high-level semantic information to regularize image translation and reduce the ambiguity of mappings between unpaired scar images of different ages. Rather than simply adding a loss term to control the attributes of the generated results, we embed the attribute vector into the generator so that semantic texture information is fully considered during generation and the model is encouraged to produce images with consistent attributes. The structure of our method is illustrated in Fig. 3.
Fig. 3. Architecture of our proposed model (ScarGAN). A generator G extracts the features of SHG images and generates images. A discriminator D, based on the wavelet packet transform, distinguishes SHG images of different ages. Both the generator and the discriminator are semantically conditioned by embedding the input scar attribute vector.
The generator is a fully convolutional network (FCN) consisting of an encoder network, a decoder network and four residual blocks in between. Because the input attribute vectors and the output of the last residual block both encode high-level semantic features, the attribute vectors are replicated spatially and concatenated with the output feature maps of the last residual block. The decoder network then transforms the concatenated feature maps back into image space. Incorporating both low-level image information (pixel values) and high-level semantic information (scar attributes) regularizes the patterns of image translation and reduces the ambiguity of mappings between unpaired input and output images.
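The replicate-and-concatenate step can be illustrated with plain nested lists. In this sketch the feature map is a channel × height × width list, and the scar attributes are encoded as a one-hot vector over the six scar-age groups; both the encoding and the function name concat_attributes are assumptions for illustration, since the text does not specify the attribute format.

```python
"""Spatially replicate an attribute vector and concatenate it to a feature map."""
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def concat_attributes(features: FeatureMap, attributes: List[float]) -> FeatureMap:
    """Return a (C + A) x H x W map: the C feature channels followed by one
    constant H x W channel per attribute value."""
    height, width = len(features[0]), len(features[0][0])
    copied = [[row[:] for row in channel] for channel in features]
    tiled = [[[float(a)] * width for _ in range(height)] for a in attributes]
    return copied + tiled


# Toy example: 2 feature channels of size 3 x 3 and a hypothetical one-hot
# scar-age attribute over the six age groups (2, 4, 8, 15, 21, 40 years).
features = [[[0.5] * 3 for _ in range(3)] for _ in range(2)]
age_one_hot = [0, 0, 1, 0, 0, 0]  # third group, i.e. an 8-year-old scar
merged = concat_attributes(features, age_one_hot)
print(len(merged), len(merged[0]), len(merged[0][0]))  # channels, height, width
```

The output has C + A channels at the original spatial size, so the decoder sees the same attribute value at every pixel. In a PyTorch model the same operation would typically be written by expanding the attribute tensor to the feature map's height and width and concatenating along the channel dimension with torch.cat.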
The discriminator D performs two crucial functions. First, it distinguishes generated scar texture images of a specific age from real scar texture images of different ages. Second, it determines whether the attributes of each synthesized image are consistent with those of the corresponding input image. To fully extract the texture features of SHG images, the wavelet packet transform (WPT) is applied, and multi-level WPT analyzes the texture of a given SHG image more thoroughly. The wavelet coefficients at each decomposition level are fed into the convolutional path of the discriminator. An advantage of WPT in this study is that computing the wavelet coefficients amounts to forward propagation through a single convolution layer, which greatly reduces the computational cost. Semantic conditional information is introduced by embedding scar attribute vectors in both the generator and the discriminator, which guides the model to output scar images whose attributes are faithful to each corresponding input. Because WPT extracts texture features at different scales in the frequency domain, the fine-grained details of scar aging effects can be captured, while fewer convolutions are needed in each forward pass. Although this part of the model is simplified, it still exploits multi-scale texture analysis, which helps improve the visual fidelity of the generated images.
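To illustrate what a multi-level WPT feeds to the discriminator, the sketch below implements a two-dimensional Haar wavelet packet decomposition in pure Python. The Haar wavelet, the orthonormal scaling and the function names are assumptions for illustration, since the text does not state which wavelet ScarGAN uses. The sketch also shows why the coefficients can be computed like a single convolution layer: each level applies four fixed 2 × 2 filters with stride 2. Unlike the ordinary discrete wavelet transform, the packet transform re-splits the detail sub-bands as well, so level L yields 4^L sub-bands.

```python
"""Two-level 2D Haar wavelet packet transform (illustrative pure-Python sketch)."""
from typing import List

Image = List[List[float]]


def haar_split(img: Image) -> List[Image]:
    """One orthonormal 2D Haar step on 2x2 blocks; returns four half-size sub-bands.

    The four weight patterns below act like fixed 2x2 convolution filters
    applied with stride 2.
    """
    bands: List[Image] = [[], [], [], []]
    for r in range(0, len(img), 2):
        rows: List[List[float]] = [[], [], [], []]
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + d + e) / 2)  # approximation (local average)
            rows[1].append((a - b + d - e) / 2)  # left-minus-right detail
            rows[2].append((a + b - d - e) / 2)  # top-minus-bottom detail
            rows[3].append((a - b - d + e) / 2)  # diagonal detail
        for k in range(4):
            bands[k].append(rows[k])
    return bands


def wavelet_packet(img: Image, levels: int) -> List[Image]:
    """Full packet tree: every sub-band, not only the approximation, is split again."""
    bands = [img]
    for _ in range(levels):
        bands = [sub for band in bands for sub in haar_split(band)]
    return bands


def energy(img: Image) -> float:
    return sum(v * v for row in img for v in row)


# Toy 8 x 8 image; side lengths must be divisible by 2**levels.
image = [[float((3 * r + 5 * c) % 7) for c in range(8)] for r in range(8)]
packets = wavelet_packet(image, levels=2)
print(len(packets), "sub-bands of size", len(packets[0]), "x", len(packets[0][0]))
print("energy preserved:", abs(energy(image) - sum(energy(p) for p in packets)) < 1e-9)
```

Applied to a 256 × 256 SHG image, two levels would give 16 sub-bands of 64 × 64 coefficients. Because this Haar transform is orthonormal, the total energy is preserved, and the packet structure redistributes it into frequency sub-bands at several scales.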
The generated images show the development of scar collagen at different ages. Tamura features are extracted from the generated images to compare the development of skin scar texture between groups. Although the discriminator can distinguish images based on the features it extracts, these features are difficult to describe explicitly; Tamura features, by contrast, have explicit definitions and are easy to extract using MATLAB. In our previous study [16], MATLAB was also used to extract feature variables, which were then used for regression analysis. Figure 4 shows the experimental images captured at different scar ages and the deep learning models.
GBRT was used for regression analysis. In the experiment, the Gaussian (squared-error) loss function was used, with the step size (shrinkage) set to 0.005. A small step size means that each new tree contributes only a small correction, which typically requires more iterations but lowers the risk of over-fitting.
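For the Gaussian (squared-error) loss, each boosting step fits a tree to the ordinary residuals of the current model. The derivation below uses our own notation (F_m is the model after m trees, h_m the m-th tree, ν the step size) and shows the textbook form of gradient boosting, which we assume the software follows.

```latex
L\bigl(y, F(x)\bigr) = \tfrac{1}{2}\bigl(y - F(x)\bigr)^2

F_0(x) = \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i

r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{m-1}}
       = y_i - F_{m-1}(x_i)

F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \qquad \nu = 0.005
```

With ν = 0.005, a tree that predicts a residual of 2 years for an image changes that image's predicted scar age by only 0.005 × 2 = 0.01 years.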
In our experiment, Adam was used as the optimizer with a learning rate of 0.0005. Experiments were run on a computer with an NVIDIA GeForce GTX 1080 Ti graphics card and an Intel i7-8700 CPU. The model was implemented in Python using the PyTorch framework. MATLAB (version 2016a) was used to extract feature variables, and WEKA (version 3.8.2) was used to analyze the data, with 60% of the dataset as the training set and the remaining 40% as the testing set; 10-fold cross-validation was also used. Figure 5 shows the normalized importance of the six input texture feature variables, that is, each feature's importance divided by the maximum importance, which reflects its contribution to the proposed model. Reg, Dir and Rgh contribute most to the model, indicating that these three characteristics change most markedly with scar age. Lin is the least important feature; this does not mean that line-likeness contributes nothing to the model, only that its contribution is relatively small. Compared with LOTP, regularity is the best input variable (relative importance 93.18%).
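The data partitioning and importance normalization can be sketched as follows. The 900 images come from six samples × 150 images; the shuffling, the fixed random seed, the round-robin fold assignment and the choice to run 10-fold cross-validation on the training portion are assumptions, because the text does not state how WEKA was configured. The function names are hypothetical, and the importance values passed to normalize_by_max are arbitrary toy numbers, not the values in Fig. 5.

```python
"""Hold-out split, k-fold indices and max-normalized importances (sketch)."""
import random
from typing import Dict, List, Tuple


def holdout_split(n: int, train_frac: float = 0.6, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 with a fixed seed and cut them into train/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def kfold_indices(indices: List[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the given indices and deal them round-robin into k folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def normalize_by_max(importance: Dict[str, float]) -> Dict[str, float]:
    """Divide every importance by the largest one, so the top feature scores 1."""
    top = max(importance.values())
    return {name: value / top for name, value in importance.items()}


n_images = 6 * 150                     # six scar ages x 150 SHG images each
train, test = holdout_split(n_images)  # 60 % training / 40 % testing
folds = kfold_indices(train, k=10)     # assumption: CV run on the training part
print(len(train), len(test), [len(f) for f in folds])
print(normalize_by_max({"f1": 2.0, "f2": 4.0, "f3": 1.0}))  # toy importances
```

Multiplying the normalized values by 100 gives percentages of the kind quoted for the relative importance of each feature.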

Experiments and results
To verify the advantages of the proposed algorithm, we compare it with LOTP, ILTP, BGC1, GLCM, DenseNet and SinGAN. Each method is applied to the images separately to extract the six texture features, and the corresponding model is then obtained through GBRT. Scar age is predicted to compare the performance of ScarGAN with that of the other methods, and all regression curves are shown in Fig. 6.
As shown in Fig. 6, our model yields the best correlation regression curve and GLCM the worst. Ideally, the slope of the regression curve should be 1; the slope of our model is close to 1, indicating that it describes the texture features of the images better. Although Con, Crs and Lin are less significant than the other three features in our method, our method is still superior to the other methods overall. The models built with the ILTP and BGC1 algorithms are less satisfactory. The main reason is that both algorithms focus on the gradient information of the central pixel, which limits how well they extract the gradient information of the neighborhood: ILTP focuses on the diversity of the gradient of the central pixel, whereas BGC1 focuses on the diversity of the neighborhood gradient, so some gradient information is lost in each case. Because the texture at the boundaries of scar collagen in SHG images is clearly irregular, considering only the gradient feature of the central pixel is very limiting. In addition, scar images contain many salient points, bright or dark spots that are easily overlooked. The experimental results therefore suggest that the texture features extracted by ScarGAN are more representative than those extracted by the other methods. BGC1 and LOTP encode the neighborhood in very similar ways, yet their encoding values differ significantly. The low importance of each individual feature variable obtained with BGC1 indicates that an excessive amount of neighborhood information hinders the extraction of Tamura features.
For comparison with the other methods, the coefficient of determination (R²) and the root-mean-square error (RMSE) are calculated. R² measures how well the regression curve fits the data, and RMSE, which is always non-negative, summarizes the magnitude of the residual errors: the smaller the RMSE, the more concentrated the predictions are around the true values. According to Table 1 and Fig. 7, our method achieves the best R² and RMSE values, indicating that its characteristic curve fits well. From the overall R² (Table 1) and RMSE (Fig. 7), only LOTP (R² = 0.94, RMSE = 1.41) comes close to our method (R² = 0.97, RMSE = 1.41). In terms of the importance of single features, however, our method is clearly advantageous. Figure 8 shows the loss function curves of the training and testing sets, and Figure 9 shows their accuracy. The training accuracy reaches about 98% around the 40th epoch and then converges. The testing accuracy fluctuates noticeably in the early stage because the model has not yet learned appropriate parameters in the first rounds of training; after the 10th epoch it improves slowly, and it gradually converges after the 60th epoch. The increasing Dir of the ScarGAN method indicates that the arrangement of collagen fibers becomes increasingly ordered over time, and the increasing Rgh indicates that the boundaries of collagen fibers become more pronounced over time. Reg, which contributes most, indicates that the texture of collagen fibers becomes increasingly regular. When the difference between two texture regions is limited to gray level, contrast measures the texture effectively, but the shape of an individual texture boundary is ignored, which also limits how well contrast describes collagen fiber texture.
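The two comparison metrics and the regression-line slope can be computed as below. This sketch assumes R² is the coefficient of determination 1 − SS_res/SS_tot (the text does not give its exact definition, and some tools report a squared correlation instead). The actual ages are the six scar ages of this study, while the predictions are toy values; the function names are our own.

```python
"""R^2, RMSE and least-squares regression line for predicted vs. actual scar age."""
import math
from typing import Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error; always >= 0, and 0 only for perfect predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_line(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of predicted on actual values."""
    n = len(actual)
    mx, my = sum(actual) / n, sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(actual, predicted))
    sxx = sum((x - mx) ** 2 for x in actual)
    slope = sxy / sxx
    return slope, my - slope * mx


ages = [2, 4, 8, 15, 21, 40]        # scar ages of the six samples
toy_pred = [a + 1.0 for a in ages]  # toy predictions: every age off by +1 year
print(rmse(ages, toy_pred), r_squared(ages, toy_pred), fit_line(ages, toy_pred))
```

A constant one-year offset gives RMSE = 1 and a fitted slope of 1 with intercept 1, so a slope near 1 alone does not guarantee small errors; R² and RMSE complement it.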
Table 2 shows the time taken by each method. ScarGAN outperforms the other methods in computation speed, which demonstrates its advantage.

Discussion
In our previous study, an LOTP operator was proposed and shown to be robust to illumination, substantially improving the description of Tamura texture features [16]. With the development of deep learning and artificial intelligence, deep neural networks have produced satisfactory results in image processing [28][29][30][31]. GAN, a deep learning model widely used for image generation, consists of a generator and a discriminator. In image-to-image translation, the generator can learn the low-level information of the input image from the training set, which keeps the underlying information of the generated image unchanged, and the discriminator judges whether a generated image is real, which ensures the authenticity of the generated images. In this study, the relatively small size of the images was taken into account. To make full use of the features in the SHG images and extract them from coarse to fine, the generator and discriminator were used to extract texture features from the SHG images and to generate SHG images of different ages, representing scar development at different ages. A regression model was then built on the extracted Tamura texture features to predict the development of scar texture. The scar texture images in this study were collected from six scar-age groups. However, training a general GAN requires a large amount of data and takes a long time. In addition, scar images contain many salient points, bright or dark spots that other algorithms can easily overlook during feature extraction. For this reason, image attribute vectors were embedded into the generator and discriminator to introduce semantic conditional information and guide the model to output a scar image corresponding to each input.
To enhance image detail, WPT was used to extract multi-scale features in the frequency domain. Single-image generation models aim to capture the internal feature distribution of one image, and unconditional GAN-based models have also been proposed for texture synthesis and image processing. For example, Shaham et al. proposed SinGAN in 2019, which uses an unconditional pyramid of generators to learn the feature distribution of an image [32]. However, such single-image generation models consider only one image and cannot capture the relationship between two images. In contrast, our model aims to capture the change in distribution between two unpaired images; ScarGAN can thus fully extract how scar texture features change over time, from which scar development can be predicted. The R² and RMSE values (R² = 0.97, RMSE = 1.41) suggest that our method builds an effective model that is not over-fitted. The increasing Crs suggests that collagen fibers become stronger and more pronounced over time, the increasing Dir implies that collagen fibers become increasingly ordered, and Reg, which has the largest contribution, indicates that collagen fibers become increasingly regular and complete.
It is worth noting that the dataset used to train ScarGAN may be too small, which can lead to memorization of the training data, a common problem in GANs. This is mainly attributable to a large number of parameters combined with an insufficient sample size. Several methods can improve the generalization of the model. The first is to expand the dataset by collecting more data or applying data augmentation. The second is early stopping: training is halted before the preset number of epochs or iterations, which limits the growth of the weight parameters and hence the effective complexity of the model. The third is to hold out a validation set to verify the training results. The fourth is to collect additional data for cross-validation. The fifth is regularization, which adds a penalty term to the objective (cost) function. The sixth is weight sharing, which reduces the number of parameters in the model and the computational workload. The seventh is batch normalization, in which the input of each layer is normalized and then rescaled and shifted so that the distribution of layer inputs remains stable. The last is dropout. All of these methods act on a single model and prevent over-fitting by adjusting its complexity; another option is to combine multiple models.
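Early stopping, the second method listed above, can be sketched as a rule applied to the validation loss after each epoch. The function name, the patience parameter and the min_delta threshold below are illustrative assumptions, and the loss values are toy numbers; the text does not state whether ScarGAN was trained with early stopping or with which settings.

```python
"""Patience-based early stopping on a sequence of validation losses (sketch)."""
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 5,
                         min_delta: float = 0.0) -> int:
    """Return the epoch whose weights should be kept.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs; the best epoch seen
    up to that point is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss, waited = epoch, loss, 0  # new best: reset counter
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch


toy_losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.50]
print(early_stopping_epoch(toy_losses, patience=3),
      early_stopping_epoch(toy_losses, patience=10))
```

With patience 3 the rule stops after epoch 6 and keeps the weights from epoch 3, missing the later improvement at epoch 9 that patience 10 would catch; a larger patience trades extra training time for a lower chance of stopping too early.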
The experimental results suggest that the proposed method can significantly improve model accuracy. To account for the different texture characteristics of different scar images, the proposed method analyzes six texture features of scar images as discriminant features; Tamura texture features are extracted from the SHG images, and this combination achieved the best results in our experiments. Compared with previously proposed methods and other deep learning methods, our method performs better [16]: the regression coefficient, slope, intercept and error of its regression curve are all superior to those of the other methods. It can therefore support computer-aided diagnosis of human scars. However, the algorithm has some limitations that need to be addressed. No single model suits all images, and the new algorithm is no exception, although it performs well on images of scar texture. The number of parameters in ScarGAN is also relatively large, which requires lengthy training. In the future, more medical images need to be collected to train the model and build a more accurate regression model. Our focus was placed solely on the SHG signal of collagen in the deep dermis. The epidermis mainly contains cells, which emit a strong two-photon excitation fluorescence (TPEF) signal but barely any SHG signal. In the dermis, the abundant collagen produces an extremely strong SHG signal, but some elastic fibers and cells still emit TPEF signals [33,34]. Further work will explore whether the model generalizes to situations involving normal tissue, taking the TPEF signal into account.

Conclusion
In this study, a GAN model combined with Tamura texture features was developed to extract scar texture, and it shows potential to facilitate non-invasive diagnostic assays in clinical settings. The developed method, ScarGAN, is used to extract and calculate six features: Con, Crs, Lin, Dir, Rgh and Reg. GBRT is then used to construct a regression model of how scar characteristics change over time, from which scar development is predicted. The experimental results show that ScarGAN achieves better R² and RMSE than the other methods and gives satisfactory results on our scar images. In future work, scar images covering a wider range of scar ages will be tested to improve the accuracy of the model. We believe this method has the potential to be widely used in the diagnosis and treatment of scars.