Isolated Pulmonary Nodules Characteristics Detection Based on CT Images

Pulmonary nodules are among the main pathological changes of the lung. Malignant pulmonary nodules can develop into lung cancer, a serious threat to human health and life, so the detection of pulmonary nodules is of great significance for saving lives. However, faced with a large number of lung CT image sequences, doctors must spend a great deal of time and energy, and false detections and missed detections are inevitable. Computer-aided detection of pulmonary nodules is therefore very necessary. Accurately segmenting pulmonary nodules and recognizing their characteristics in CT images is difficult. We establish a complete semi-automatic system for lung nodule extraction and feature identification that is in line with the doctor's diagnosis process. A segmentation algorithm based on regional statistical information is proposed to extract pulmonary nodules accurately. This is the first time that the dynamic time warping algorithm has been applied in the field of image processing, here with a focus on the lung nodule boundary. On this basis, a recurrence plot visualization model is established to visualize boundary similarity. Finally, in order to identify the characteristics of pulmonary nodules accurately, a video similarity distance is introduced to quantify the similarity between the nodule under examination and the pulmonary nodules in the database. The experimental results show that the algorithm can accurately identify the normal, spiculated, and lobulated shapes of pulmonary nodules. The average processing speed is 0.58 s/nodule. To some extent, the system can reduce misdiagnosis caused by inexperience and fatigue.


I. INTRODUCTION
Lung cancer is a common malignant tumor with high morbidity and rapid growth, and it poses a serious threat to human life and health [1], [2]. The isolated pulmonary nodule, which may be benign or malignant [3], is the most common manifestation of lung cancer in its early stages. Accordingly, early confirmation and treatment of pulmonary nodules are of great significance for saving lives.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongtao Hao.
A CT image is formed from the different absorption and transmission rates of X-rays in different body tissues. Measuring the human body with highly sensitive instruments in this way is an effective means of finding lesions. Detecting pulmonary nodules by observing CT images, and then discriminating their characteristics, is the main and most effective method for determining whether pulmonary nodules are benign or malignant [4]. Because human lung tissue has a complex structure and varied shapes, it is difficult to recognize the characteristics of isolated pulmonary nodules by visual inspection. Among the many characteristics of isolated pulmonary nodules, spiculation and lobulation are important indicators for judging benign and malignant nodules [5]; however, they have varied manifestations. Therefore, the detection of isolated pulmonary nodule characteristics is both the focus and the main difficulty of current research.
Raicu et al. [6] constructed a three-dimensional model and then judged the characteristics qualitatively. Horsthemke et al. [7] distinguished pulmonary nodule characteristics by gradient directionality. Dhara et al. [8] used Gaussian and mean curvature to discriminate characteristics based on differential theory. Wang et al. [9] enhanced the boundary of isolated pulmonary nodule images, thereby improving the physician's diagnostic accuracy. Ciompi et al. [10] analyzed pulmonary nodule images in the frequency domain and then divided the nodules into spiculated and non-spiculated. Dhara et al. [11] built on [10] and proposed a fast discrimination algorithm. Zhang et al. [12] analyzed the features of different characteristics from a medical perspective using data on diagnosed pulmonary nodule characteristics. Liu et al. [13] constructed classifiers based on 24 pulmonary nodule image features to distinguish pulmonary nodule characteristics. Dennie et al. [14] extracted boundary features by computer and then quantitatively analyzed the degree of benignity and malignancy of pulmonary nodules. Hussein et al. [15] constructed a CNN to classify pulmonary nodules. Dhara et al. [16] established a self-learning pulmonary nodule diagnosis mechanism from the perspective of image retrieval. Hussein et al. [17] used multi-view neural networks to describe pulmonary nodule characteristics. Ciompi et al. [18] established a deep learning system from data containing pulmonary nodules to extract pulmonary nodules automatically. Xie et al. [19] constructed a gray level co-occurrence matrix (GLCM) based on boundary information to classify pulmonary nodules. Snoeckx et al. [20] studied how nodules change over different time intervals, emphasizing the importance of the pulmonary nodule edge in identifying pulmonary nodule characteristics. Li et al. [21] improved the Random Forest (RF) algorithm with dimensionality reduction to decrease the amount of computation.
At present, there are three key problems in distinguishing isolated pulmonary nodules. 1) The boundary of an isolated pulmonary nodule is difficult to determine and extract. 2) The more image features are used, the weaker the role of boundary information in distinguishing pulmonary nodule characteristics becomes. 3) There are many parameter settings, and the discrimination of characteristics cannot be displayed intuitively. All of the above differ from the physician's detection process. Accordingly, we conducted a study based on the physician's process of distinguishing pulmonary nodule characteristics, focusing on the boundary information and using the principle of similarity to display and distinguish pulmonary nodule characteristics intuitively.

II. LUNG NODULES CHARACTERISTICS DETECTION
Through further study of how 10 professional physicians detect pulmonary nodule characteristics, we found that the physician's detection process can be divided into the following four steps: 1) Focusing on the lung area and locating the isolated lung nodule area; 2) Focusing on the pulmonary nodule edge area; 3) Using knowledge of medicine, anatomy, etc. to construct a pulmonary nodule model; 4) Making a diagnosis by using experience to judge the similarity of the pulmonary nodule characteristics to known cases. This paper proposes an algorithm in which the computer simulates the physician's process of distinguishing pulmonary nodule characteristics, as shown in Fig. 1. 1) Pulmonary parenchyma extraction: the physician manually determines the pulmonary nodule area. 2) The pulmonary nodule boundary information is completely extracted and then analyzed as a time series, exploiting the strong correlation along the boundary. 3) A recurrence plot model is established for the data with known characteristics and the data to be discriminated, so that the boundary information is displayed visually. 4) The degree of similarity between video frames is introduced to distinguish pulmonary nodule characteristics.

A. ISOLATED PULMONARY NODULE EXTRACTION
Histopathological analysis shows that isolated pulmonary nodules are distributed in isolated areas with high local luminance. In the image, a nodule appears as an area within the lung with high grayscale values and a limited aggregation of pixels [2]. Benign pulmonary nodules are circular with smooth, clear boundaries. Malignant pulmonary nodules are quasi-circular with vague boundaries; spiculation and lobulation are the key features that distinguish benign from malignant pulmonary nodules. The external characteristics of pulmonary nodules are shown in Fig. 2. Currently, computer-assisted techniques for extracting and detecting pulmonary nodules can be divided into the completely manual method, the semi-automatic interactive method, and the automatic detection method [22]. The completely manual method requires specialist physicians to spend considerable effort labeling pulmonary nodules and has low efficiency. The automatic detection method extracts pulmonary nodules directly, without the participation of doctors, and is highly efficient, but it cannot guarantee 100% accurate results. Balancing efficiency and accuracy, we use a semi-automatic interactive method for extracting pulmonary nodules.
According to the principle of CT imaging, the pixel values of the image have a high dynamic range (0-65535), so the window width and window location must be adjusted constantly in order to observe the entire CT image. The effect is shown in Fig. 3. To reduce the amount of data and the dynamic range of the images, and because lung nodules lie in the lung parenchymal region and lung CT data follow a double-Gaussian distribution, this paper uses the double-Gaussian distribution algorithm proposed in [23] to determine the threshold and completely extract the pulmonary region.
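The window adjustment described above can be illustrated with a minimal sketch. It assumes a simple linear window that maps raw values to 8-bit display levels; the function name `apply_window` and the sample values are hypothetical and are not taken from the paper.

```python
def apply_window(pixels, center, width):
    """Map raw CT values to 8-bit gray levels with a linear window.

    `center` (window location) and `width` (window width) are given in raw
    pixel units; `width` is assumed to be positive.
    """
    low = center - width / 2.0
    high = center + width / 2.0
    out = []
    for row in pixels:
        new_row = []
        for v in row:
            if v <= low:
                new_row.append(0)        # below the window: black
            elif v >= high:
                new_row.append(255)      # above the window: white
            else:
                # Linear stretch inside the window, truncated to an integer.
                new_row.append(int((v - low) / (high - low) * 255))
        out.append(new_row)
    return out


# Illustrative 16-bit values, not real CT data.
raw = [[0, 1000, 2000], [30000, 40000, 65535]]
narrow = apply_window(raw, center=1000, width=2000)
wide = apply_window(raw, center=32768, width=65536)
```

A narrow window spends all 256 display levels on a small intensity range, which is why different settings reveal different structures in the same slice.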
Then, the physician manually labels the approximate location of the pulmonary nodule, and the computer subsequently performs a complete extraction of the nodule.
The pulmonary nodule boundary information is the most important feature for detecting pulmonary nodule characteristics; hence, extracting it completely is particularly important. The Roberts operator [24] finds edges by computing local differences, which is accurate but sensitive to noise and not smooth enough. The Prewitt [25] and Sobel [26] operators incorporate pixel position information and can handle low-noise images with gradual grayscale changes, but they do not perform well on images with mixed, complex noise.
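As a small illustration of the local difference idea behind the Roberts operator, the following sketch computes the Roberts cross gradient magnitude on a grayscale image stored as a list of rows. The function name and the toy image are assumptions made for illustration only.

```python
import math


def roberts_edges(img):
    """Gradient magnitude from the two diagonal differences of each 2x2 block."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 1) for _ in range(h - 1)]
    for i in range(h - 1):
        for j in range(w - 1):
            gx = img[i][j] - img[i + 1][j + 1]      # main-diagonal difference
            gy = img[i + 1][j] - img[i][j + 1]      # anti-diagonal difference
            out[i][j] = math.hypot(gx, gy)
    return out


# Toy image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(3)]
edges = roberts_edges(step)
```

Because each response depends on only four pixels, a single noisy pixel changes the output directly, which is consistent with the noise sensitivity noted above.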
To meet the need for precise segmentation, scholars have proposed level-set-based segmentation algorithms such as the Snake [27], GAC [28], and C_V [29] models. These achieve good segmentation when the boundary is distinct and smooth. However, when the boundary has large bumps and depressions, the model tends to fall into a local minimum while minimizing the energy functional, resulting in unsatisfactory segmentation. Accordingly, building on the GAC and C_V models, this paper proposes an algorithm whose driving force is based on image region information.
The GAC model determines the dynamic contour by minimizing an energy functional based on the image gradient modulus: The gradient descent function is: where C(s) is the evolution curve, is the distance function, and k is the curvature. N is the unit normal vector of the closed curve, directed toward the interior of the curve. When the image is in a flat region, |∇I| ≈ 0, so g = 1 and ∇g = 0, and the curve evolves by kN.
When g = 0 near the target edge, the curve evolves according to (∇g · N)N. ∇g points toward the border.
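For reference, the geodesic active contour model is commonly written in the literature in the form below. This is the textbook formulation, given here as an assumption about notation rather than as this paper's exact equations; the edge function g shown is one common choice, and κ denotes the curvature written as k in the text.

```latex
E(C) = \int_{0}^{L(C)} g\bigl(|\nabla I(C(s))|\bigr)\, ds,
\qquad
g(|\nabla I|) = \frac{1}{1 + |\nabla I|^{2}}

\frac{\partial C}{\partial t}
  = g\,\kappa\,\mathbf{N} - (\nabla g \cdot \mathbf{N})\,\mathbf{N}
```

With this choice of g, a flat region gives g = 1 and ∇g = 0, so only the curvature term remains, while near a strong edge g approaches 0 and the term containing ∇g dominates.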
For targets with inconspicuous boundaries, the GAC model has difficulty segmenting effectively. Chan and Vese [30] proposed an energy functional E(c 1 , c 2 , C) in which the curve C divides the image into two regions, 1 and 2. When the curve reaches the vicinity of the contour line, the corresponding energies of regions 1 and 2 are 0, and E(c 1 , c 2 , C) reaches its minimum value.
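For reference, the Chan-Vese functional is usually stated in the literature as follows. The region symbols Ω_1 and Ω_2 and the weights μ, λ_1, λ_2 are the conventional notation and are assumptions here; the original formulation also allows an area term, which is commonly set to zero.

```latex
E(c_1, c_2, C) = \mu \,\mathrm{Length}(C)
  + \lambda_1 \int_{\Omega_1} |I(x, y) - c_1|^{2}\, dx\, dy
  + \lambda_2 \int_{\Omega_2} |I(x, y) - c_2|^{2}\, dx\, dy
```

Here c_1 and c_2 are the mean intensities of the two regions. Each fitting term vanishes when its region has constant intensity, which matches the statement that the energies of the two regions are 0 when the curve lies on the contour.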
When a depression is deep, the curve under the above algorithms expands outward inside the depression and has difficulty closing in on the edge, which leads to inaccurate segmentation. The energy functional is apt to fall into local minima, which severely affects the identification of spiculation.
This problem is hard to avoid using image boundary information alone. For this reason, we propose a driving function based on statistical information of the image region to determine the evolution of the curve.
The descending flow function corresponding to the GAC principle is: where |u| = 1, α is the force coefficient, then As shown in Fig. 4, there are four main relative positions of the initial contour and the target contour. Gray marks the initial contour, white marks the target contour, blue marks the area under discussion, and the arrows give the direction in which the contour unfolds. For convenience of explanation, let: The deduction of the algorithm is shown in Table 1, where "+" means that the initial contour expands and "-" means that it contracts. According to the above analysis, the algorithm takes the image region statistical information as the driving function, which overcomes the difficult case in which the boundary is vague and the depression is deep.

B. BOUNDARY EXPANSION
The boundary is the most important feature for judging the signs of pulmonary nodules [20]. To observe the pulmonary nodule boundary, the nodule is first unfolded, and a sequence that preserves the dependence along the boundary is then constructed in preparation for the subsequent characteristics detection.
After acquiring the edge image of the pulmonary nodule, we focus on the edge area and unfold it. The steps are as follows: The calculation formula for θ is:

STEP 3: The pulmonary nodule boundary tracking sequence BL is reordered by angle from 0° to 360° and normalized to obtain a sequence CL of the pulmonary nodule edge shape.
After unfolding the pulmonary nodule, the fluctuation of the boundary can be visually reflected, but it should be measured by selecting an appropriate analytical method.
Traditional statistical analysis methods assume that data sequences are independent. Time series analysis [31], [32] differs from these methods in that it focuses on the interdependence within data sequences. It has been applied successfully to speech signal processing with good results.
Similar to the speech signal, the pulmonary nodule boundary has features of continuity and strong dependence of adjacent pixels.For this reason, this paper proposes to convert the pulmonary nodule boundary sequence into a time series.
A time series is a sequence sampled uniformly in time, but the nodule angle values that play the role of "time" are sparse and discrete. Therefore, the unfolded sequence CL must be interpolated and uniformly sampled in order to convert the pulmonary nodule edge into a time series.
In this paper, the well-established cubic spline function is used for interpolation, and the time series is then formed by sampling. Thus, each pulmonary nodule is converted into a time series DL for subsequent characteristics detection.
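The unfolding and resampling steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: angles are measured from the centroid of the boundary points, the radius is normalized by its maximum, and periodic linear interpolation stands in for the cubic spline so that the example needs no external library. All function names are hypothetical.

```python
import math


def unfold_boundary(points):
    """Return (angle_deg, normalized_radius) pairs sorted by angle.

    Assumption: angles are measured from the centroid of the boundary points.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    seq = []
    for x, y in points:
        theta = math.degrees(math.atan2(y - cy, x - cx)) % 360.0
        seq.append((theta, math.hypot(x - cx, y - cy)))
    seq.sort()
    r_max = max(r for _, r in seq)
    return [(t, r / r_max) for t, r in seq]


def resample_uniform(seq, n_samples=360):
    """Sample the periodic (angle, radius) sequence on a uniform angle grid.

    Linear interpolation is used here in place of a cubic spline.
    """
    # Wrap one point around each end so every query angle is bracketed.
    ext = ([(seq[-1][0] - 360.0, seq[-1][1])] + list(seq)
           + [(seq[0][0] + 360.0, seq[0][1])])
    out, k = [], 0
    for i in range(n_samples):
        q = 360.0 * i / n_samples
        while ext[k + 1][0] < q:
            k += 1
        (t0, r0), (t1, r1) = ext[k], ext[k + 1]
        out.append(r0 if t1 == t0 else r0 + (r1 - r0) * (q - t0) / (t1 - t0))
    return out


# A circular boundary should unfold into a flat sequence.
circle = [(10 + 5 * math.cos(2 * math.pi * k / 8),
           10 + 5 * math.sin(2 * math.pi * k / 8)) for k in range(8)]
series = resample_uniform(unfold_boundary(circle), n_samples=36)
```

A smooth, circular nodule yields a nearly constant sequence, whereas spiculation or lobulation would appear as sharp or broad fluctuations in the sampled radius.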

C. ESTABLISH RECURRENCE PLOT MODEL
Physicians discriminate pulmonary nodules mainly by combining anatomical, pathological, and imaging knowledge with their own experience to form reference models of the different characteristics. The pulmonary nodule to be identified is then modeled, and similarity criteria are established to compare this model with the reference models. The computer simulation of the physician's detection process first focuses on DL. Then, a reference model based on the recurrence plot is constructed from data sets with known characteristics. Finally, the normalized compression distance is used to quantify the degree of similarity and identify the pulmonary nodule characteristics.
Research on time series similarity discrimination is mainly based on the dynamic time warping (DTW) algorithm [33]. DTW warps the time axis to obtain the minimum distance between two time series and determines the best correspondence for each point in a series, which effectively solves problems that measures such as the Euclidean distance find very difficult to deal with.
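The DTW computation can be sketched with the classic dynamic programming recurrence. This is a minimal version that assumes the absolute difference as the local cost and applies no warping window; it is meant only to show how the alignment absorbs local stretching.

```python
def dtw_distance(a, b):
    """Minimum cumulative alignment cost between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] is the best cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a[i-1] repeats
                                 d[i][j - 1],      # b[j-1] repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]


stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The first call returns zero because the repeated sample is absorbed by the warping path, which a point-by-point Euclidean comparison cannot do for sequences of unequal length.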
However, in pulmonary nodule shape recognition, unfolding the nodule edge from different starting angles produces different time series, while the DTW algorithm can only search for similar regions nearby. DTW therefore does not perform well on this problem and does not exhibit strong similarity within a group and weak similarity between groups.
To this end, we propose to use the recurrence plot (RP) [34] method to study the boundary sequence of pulmonary nodules, so as to make full use of the chaotic, non-stationary, and periodic properties of the boundary and better represent the internal structure of the sequence.
where N is the number of states experienced by the time series, m is the sequence dimension, x(i) and x(j) are the values observed in the sequence at i and j, ‖·‖ is the distance between the two observation points, and ε is the threshold for measuring the difference in distance.
The recurrence plot reflects the internal structure of a sequence; the similarity between sequences must subsequently be compared on the basis of their recurrence plots.
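A recurrence plot can be sketched in a few lines. The version below assumes the common thresholded definition, in which entry (i, j) is 1 when the distance between x(i) and x(j) does not exceed ε, and it treats the series as scalar (dimension m = 1); both choices are assumptions made for illustration.

```python
def recurrence_plot(series, eps):
    """Binary N x N matrix: 1 where |x(i) - x(j)| <= eps, else 0."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0
             for j in range(n)]
            for i in range(n)]


rp = recurrence_plot([0.0, 0.1, 1.0], eps=0.2)
```

The matrix is symmetric with ones on the diagonal. Starting the unfolding at a different angle cyclically shifts the rows and columns together, so the overall texture of the plot is preserved, which may be why it suits boundaries unfolded from arbitrary starting positions.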

D. DISCRIMINATION MODEL
At present, a common distance for evaluating the similarity of two images is the normalized compression distance (NCD), which must linearize the two images and therefore loses spatial position information. CK-1 [36] overcomes this limitation of NCD by using the inter-frame compression of video: the MPEG-1 video compression algorithm preserves spatial information and achieves compression by finding patterns that appear repeatedly across frames. The distance between images is measured by comparing the sizes of the compressed results. Accordingly, we use CK-1 to calculate the distance; the formula is as follows: where a is the recurrence plot whose pulmonary nodule characteristics are to be determined, b is a recurrence plot with known pulmonary nodule characteristics, and C(b|a) represents the size of the video obtained by compressing b first and then a with the MPEG-1 algorithm. The smaller d mpeg (a, b) is, the higher the similarity. A schematic diagram of the PRCD algorithm for detecting lung morphological characteristics is shown in Fig. 6. First, the different types of pulmonary nodules are unfolded into time series, and each time series is transformed into a recurrence plot to form a pulmonary nodule sign database. When the sign of a solitary pulmonary nodule is to be judged, that nodule is also transformed into a recurrence plot. The similarity between this recurrence plot and those in the database is evaluated by Eq. 12, and the pulmonary nodule sign is then determined.
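The structure of the CK-1 distance can be sketched as below, using the form in which it is usually stated, d(a, b) = (C(a|b) + C(b|a)) / (C(a|a) + C(b|b)) - 1. One substitution is made loudly: `zlib` on concatenated, flattened data stands in for MPEG-1 video compression, so this sketch linearizes the images and does not preserve spatial information the way the method described above does. It shows only how the compressed sizes are combined; the function names are hypothetical.

```python
import zlib


def compressed_size(first, second):
    """Compressed size of `first` followed by `second`.

    zlib is a stand-in for the MPEG-1 encoder; inputs are flattened
    recurrence plots given as sequences of integers in 0..255.
    """
    return len(zlib.compress(bytes(first) + bytes(second), 9))


def ck1_distance(a, b):
    """CK-1 style ratio of cross-compression to self-compression sizes."""
    num = compressed_size(a, b) + compressed_size(b, a)
    den = compressed_size(a, a) + compressed_size(b, b)
    return num / den - 1.0


# Toy flattened binary patterns, not real recurrence plots.
plot_a = [0, 1] * 50
plot_b = [1, 1, 0] * 40
d_self = ck1_distance(plot_a, plot_a)
d_ab = ck1_distance(plot_a, plot_b)
d_ba = ck1_distance(plot_b, plot_a)
```

A nearest-neighbor decision would then assign the query nodule the sign of the database entry with the smallest distance.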

III. EXPERIMENT
The experimental data come from the Lung Image Database Consortium (LIDC), published by the National Cancer Institute (NCI). The database contains each patient's complete lung CT image sequence, the medical characteristics of the pulmonary nodules as marked by four experts, and the benign or malignant diagnosis label of each nodule. We selected 35 definite cases with isolated nodules. Taking the experts' labels of pulmonary nodule characteristics as the diagnostic reference, we established a database for the subsequent experiments.

A. SEGMENTATION PERFORMANCE
The area overlap measure (AOM) was adopted as the evaluation index of the segmentation effect. It is defined as: where AOM is the area overlap measure, ϑ is the image marked by the physician, ξ is the segmentation result, and S(·) is the number of pixels in the corresponding area. The larger the AOM value, the better the segmentation.
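The AOM can be computed from two binary masks as sketched below. The sketch assumes the usual definition, the number of pixels in the intersection of ϑ and ξ divided by the number of pixels in their union; the function name and the toy masks are illustrative only.

```python
def area_overlap_measure(reference, segmentation):
    """Intersection-over-union of two binary masks given as lists of rows."""
    inter = union = 0
    for ref_row, seg_row in zip(reference, segmentation):
        for r, s in zip(ref_row, seg_row):
            inter += 1 if (r and s) else 0
            union += 1 if (r or s) else 0
    # Two empty masks are treated as a perfect match by convention.
    return inter / union if union else 1.0


# Toy masks: 2 overlapping pixels out of 6 covered pixels.
marked = [[1, 1, 0], [1, 1, 0]]
result = [[0, 1, 1], [0, 1, 1]]
aom = area_overlap_measure(marked, result)
```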
We compared typical and mainstream algorithms with the proposed algorithm; the average AOM results are shown in Table 2. The Roberts [24] and Prewitt [25] algorithms segment benign pulmonary nodules well, but they segment malignant nodules poorly because of the vague boundaries. The level-set-based algorithms [27], [28], [29] perform better than the classical algorithms. The Snake [27] algorithm uses gradient and curvature constraints and segments well when the boundary is regular. The GAC [28] algorithm constructs an internal and external force model to segment the image. The C_V [29] algorithm improves the segmentation of pulmonary nodules with vague boundaries owing to the constraints of the model. However, the above level set algorithms easily fall into a local minimum when the boundary is vague and the shape has concave and convex parts, so the segmentation is poor. The algorithm proposed in this paper uses local image region statistical information to segment the lung nodules completely.

B. COMPARISON OF SIGNS DISCRIMINANT ALGORITHMS
The discriminant results were measured by four indicators: accuracy, sensitivity, specificity, and the receiver operating characteristic (ROC) curve. We first introduce the following concepts: The ROC curve can be used to evaluate the performance of two or more image classification algorithms, and it is widely applied in clinical and scientific research.
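The first three indicators can be computed from confusion counts as sketched below, using their standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The counts in the example are invented for illustration and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Illustrative counts only.
acc, sens, spec = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

An ROC curve is then traced by sweeping the decision threshold and plotting sensitivity against 1 - specificity at each setting.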
From the analysis of Table 3, it can be seen that the boundary of the normal pulmonary nodule is smooth, and the detection results of all algorithms differ little. For spiculation and lobulation, the results differ considerably because of the diversity of their features. The 3D algorithm [6] analyzes pulmonary nodule characteristics from a three-dimensional perspective; it has high complexity and slow processing, and it does not establish a unified three-dimensional detection model. The GRAD (gradient) algorithm [7] uses gradient directionality to discriminate pulmonary nodule characteristics and is faster. The RETR (retrieval) algorithm [16] establishes a retrieval model from image information to identify pulmonary nodule characteristics and has some discriminating ability. The GLCM (gray level co-occurrence matrix) algorithm [19] fuses multiple features to distinguish pulmonary nodule characteristics.
The DTW algorithm [33] focuses on the boundary information and is fast. However, it does not account for the differences between boundary sequences formed from different starting positions, so its detection performance is insufficient. As the ROC curves in Fig. 7 show, although the proposed algorithm performs worse on normal nodules than the GLCM algorithm, it performs better on lobulation and spiculation, and its processing time meets the clinical application criteria.
Accurate segmentation of pulmonary nodules is the premise for recognizing pulmonary nodule signs, so the proposed algorithm spends more time establishing the pulmonary nodule extraction model. Overall, although 0.58 s/nodule is slightly slower than the 0.55 s/nodule of DTW, the detection accuracy is greatly improved, which meets practical needs.

C. ALGORITHM PROCESS
The proposed algorithm discriminates different types of pulmonary nodules; the effect of each step is shown in Fig. 8. Red marks the boundary extracted by the algorithm. The different pulmonary nodules are unfolded into time series, so normal and lobulated nodules can easily be distinguished from spiculated ones by comparing the smoothness of the time series. However, the time series of normal and lobulated nodules are both smooth and cannot be distinguished effectively in this way. The recurrence plot comprehensively considers the chaotic, non-stationary, and periodic properties of the pulmonary nodule boundary, so the difference between normal and lobulated nodules becomes obvious.

IV. CONCLUSION
To reduce the intensity of doctors' work and improve the efficiency of sign discrimination, and in view of the difficulty of segmenting and recognizing pulmonary nodules by computer, a complete process of pulmonary nodule extraction and sign recognition is proposed. The computer simulates the doctor's detection process, and pulmonary nodules are segmented accurately on the basis of the statistical information of image regions. We then focus on the pulmonary nodule boundary information and unfold the nodule into a time series. Finally, the boundary information is displayed intuitively by establishing a recurrence plot, and the signs are discriminated by a video similarity distance. Recognition of pulmonary nodule signs is thereby realized, with good results on the existing database. In the future, we will study the discrimination of benign and malignant pulmonary nodules on this basis.

FIGURE 1. The flow chart of characteristics detection.

FIGURE 3. The display results of CT image in different window width and location.

FIGURE 4. Initial contour and target contour position.

FIGURE 5. Schematic diagram of the unfolding of pulmonary nodules.

Fig. 5, the coordinates of point A are (x_a, y_a).

FIGURE 6. Schematic diagram of detecting lung morphological signs by PRCD algorithm.

FIGURE 8. The characteristics detection result image.

TABLE 1. The deduction of algorithm.

TABLE 2. Comparison of segmentation results.