Using wavelet transform to detect peaks in PCR signals

The feasibility of determining a nucleotide sequence from Sanger sequencing data is justified. Signals obtained with a DNA analyzer from a sample containing a nucleotide sequence are reviewed. A method for determining the sequence based on the continuous wavelet transform is proposed. The results of experimental investigations are presented.

Dye-terminator sequencing is currently the basis of automated sequencing because of its practicality and high speed. The disadvantage of this method is that dye-labeled chain terminators are incorporated into the DNA fragment unevenly, which changes the peak shapes in the electronic DNA sequence chromatogram obtained after capillary electrophoresis.
DNA sequencers perform capillary electrophoresis, detect and record the dye fluorescence, and output the data as a sequence of peaks. Several software packages can evaluate the reliability of each peak and remove low-quality ones. When DNA is examined for the presence of viruses, the number of peaks increases and their amplitude decreases. This requires the development of new information processing methods. This article considers one such option: a method for identifying peaks based on the wavelet transform, which can improve the accuracy of detecting reliable peaks.

Wavelet transform
The wavelet transform is an integral transform that convolves a signal with a mother wavelet. It maps the signal from a time representation into a time-frequency representation.
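The wavelet transform of a continuous signal $x(t)$ relative to the mother wavelet $\psi(t)$ is defined as follows:

$$Wx(u, s) = \int_{-\infty}^{+\infty} x(t)\, \frac{1}{\sqrt{s}}\, \psi^{*}\!\left(\frac{t - u}{s}\right) dt,$$

where the parameter $u \in \mathbb{R}$ corresponds to the time shift and is called the position parameter, and the parameter $s > 0$ sets the scaling and is called the stretching parameter.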
Like a windowed Fourier transform, a wavelet transform can measure the time evolution of frequency transients. This requires using a complex analytic wavelet, which can separate amplitude and phase components. In contrast, real wavelets are often used to detect sharp signal transitions [44][45][46]. Since the purpose of using the wavelet transform is to detect peaks in the signal, a real wavelet was used.
By the definition of the continuous wavelet transform (CWT), the wavelet coefficients are the inner products between the signal $x(t)$ and the wavelets $\psi_{s,u}(t)$, that is, the mother wavelet shifted to position $u$ and stretched to scale $s$. Each coefficient indicates how similar the signal near position $u$ is to the wavelet at scale $s$. To detect peaks, the mother wavelet $\psi(t)$ is scaled so that it matches the various shapes (heights and widths) of the peaks.
It can be shown that the local smoothness of the signal is characterized by the decay of the wavelet transform amplitude as the scale decreases. Singularities and sharp variations are highlighted by studying the local maxima of the wavelet transform at small scales [44][45][46].
It was shown in [41,42] that the peaks in PCR signals can be approximated by Gaussian functions. The mother wavelet is selected so that its main characteristics are similar to those of the desired peak in the signal [47]. The Mexican hat wavelet is the most suitable for this study because its shape is similar to that of a peak. The Mexican hat wavelet, known as the Ricker wavelet in mathematics and numerical analysis, is defined (for unit width) as:

$$\psi(t) = \frac{2}{\sqrt{3}\,\pi^{1/4}} \left(1 - t^{2}\right) e^{-t^{2}/2}.$$
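As a quick numerical check, the following minimal sketch evaluates the unit-width Mexican hat wavelet, $\psi(t) = \frac{2}{\sqrt{3}\,\pi^{1/4}}(1 - t^2)e^{-t^2/2}$, and confirms two properties expected of a mother wavelet: zero mean and unit energy. The helper `integrate`, its integration limits, and the step count are illustrative assumptions, and only the Python standard library is used.

```python
import math

def mexican_hat(t):
    """Unit-width Mexican hat (Ricker) wavelet, normalized to unit energy."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)

def integrate(f, a=-10.0, b=10.0, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

mean = integrate(mexican_hat)                      # close to 0: the wavelet has zero mean
energy = integrate(lambda t: mexican_hat(t) ** 2)  # close to 1: unit L2 norm

print(f"mean = {mean:.2e}, energy = {energy:.6f}")
```

The positive central lobe flanked by two negative side lobes is what makes the wavelet respond strongly to an isolated Gaussian-like peak and weakly to a slowly varying baseline.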

Algorithm implementation and signal processing results
It follows from the Hwang-Mallat theorem that the function $f$ can be singular (not Lipschitz 1) at a point $\nu$ only if there is a sequence of wavelet maxima points $(u_p, s_p)$, $p \in \mathbb{N}$, that converges toward $\nu$ at fine scales [44]. Thus, the essence of the method is to locate the features of the signal, in our case the peaks, from the lines of maxima on the signal scalogram. The algorithm for determining peaks using the continuous wavelet transform is described below.

Modeling input signal
The signal modeling was performed according to [41]. The sampling frequency was chosen so that about 10 samples fall on one peak, which corresponds to real data. Denote the model signal as $x[n]$, $n = 0, 1, 2, \ldots, N - 1$, where $N$ is the length of the signal.
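A minimal sketch of such a model signal is given below. It assumes Gaussian peak shapes, as suggested for PCR signals in [41,42]. The function name `model_signal`, the peak centers, the heights, and the width `sigma = 1.7` (so that a peak spans roughly 10 samples within three standard deviations on each side) are hypothetical values chosen for illustration, not the parameters of [41].

```python
import math

def model_signal(centers, heights, sigma, length):
    """Sum of Gaussian peaks sampled at integer positions n = 0 .. length - 1."""
    return [
        sum(h * math.exp(-((n - c) ** 2) / (2.0 * sigma ** 2))
            for c, h in zip(centers, heights))
        for n in range(length)
    ]

# Hypothetical parameters: four peaks of different heights, about 10 samples each.
centers = [20, 35, 50, 65]
heights = [1.0, 0.6, 0.9, 0.4]
x = model_signal(centers, heights, sigma=1.7, length=85)

print(len(x), round(x[20], 3), round(x[35], 3))
```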

Continuous wavelet transform
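We denote the result of the continuous wavelet transform of the signal as $X$, the matrix of wavelet transform coefficients, whose rows correspond to scales and whose columns correspond to positions in the signal. Accordingly, the wavelet coefficients for the scale $s_j$ are denoted as $X_{j,:}$, the $j$-th row of the matrix.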
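One way such a coefficient matrix could be computed is sketched below, assuming the transform integral is approximated by a sum over integer sample positions and the unit-energy Mexican hat wavelet is used. The function name `cwt`, the truncation of the wavelet support at five scale widths, the list of scales, and the toy input are illustrative assumptions rather than the authors' implementation.

```python
import math

def mexican_hat(t):
    """Unit-width Mexican hat (Ricker) wavelet."""
    return (2.0 / (math.sqrt(3.0) * math.pi ** 0.25)) * (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(x, scales):
    """Return X as a list of rows; X[j][u] approximates the CWT of x at scale scales[j], position u."""
    X = []
    for s in scales:
        half = int(math.ceil(5 * s))  # assumption: the wavelet is negligible beyond +/- 5 s
        row = []
        for u in range(len(x)):
            lo, hi = max(0, u - half), min(len(x), u + half + 1)
            acc = sum(x[n] * mexican_hat((n - u) / s) for n in range(lo, hi))
            row.append(acc / math.sqrt(s))  # 1/sqrt(s) normalization of the scaled wavelet
        X.append(row)
    return X

# Toy input: a single Gaussian peak centered at sample 30.
x = [math.exp(-((n - 30) ** 2) / (2.0 * 1.7 ** 2)) for n in range(61)]
scales = [1, 2, 3, 4]
X = cwt(x, scales)

# Position of the largest coefficient in each row.
argmax_per_scale = [max(range(len(row)), key=row.__getitem__) for row in X]
print(argmax_per_scale)
```

For a symmetric peak, the largest coefficient of every row sits at the peak center, which is the property the later ridge search relies on.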

Maxima detection
From the obtained matrix $X$, we determine all peak maxima for each scale, that is, the positions $p$ at which the coefficient $X_{j,p}$ is a local maximum of row $j$ and satisfies the condition

$$X_{j,p} > \tau,$$

where $\tau$ is the threshold for deciding that this maximum is a peak. Maxima below the threshold are not considered. Therefore, $P_j$ contains all the maxima for the scale $s_j$ that are larger than $\tau$. Sort the elements of $P_j$ in ascending order: $P_j = \{p_{j1}, p_{j2}, \ldots, p_{j m_j}\}$, where $p_{ji}$ is the column index of the $i$-th peak maximum at row $s_j$ in matrix $X$, and $m_j$ is the total number of elements in $P_j$.
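The maxima selection can be sketched as follows, assuming a maximum is a coefficient strictly larger than both of its neighbors in the same row. The function names `row_maxima` and `all_maxima`, the toy matrix, and the threshold value 0.3 are hypothetical and serve only as an illustration.

```python
def row_maxima(row, threshold):
    """Column indices of strict local maxima of `row` that exceed `threshold`, in ascending order."""
    return [
        p for p in range(1, len(row) - 1)
        if row[p - 1] < row[p] > row[p + 1] and row[p] > threshold
    ]

def all_maxima(X, threshold):
    """P[j] lists the thresholded maxima of row j of the coefficient matrix X."""
    return [row_maxima(row, threshold) for row in X]

# Toy coefficient matrix (hypothetical values): two scales, seven positions.
X = [
    [0.0, 0.9, 0.1, 0.05, 0.1, 0.6, 0.0],
    [0.0, 0.5, 0.7, 0.2, 0.15, 0.4, 0.1],
]
P = all_maxima(X, threshold=0.3)
print(P)  # [[1, 5], [2, 5]]
```

Because the columns are scanned from left to right, each list is already in ascending order and no separate sorting step is needed in this sketch.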

Identify ridges
All maxima from the matrix P are studied from the largest scale to the smallest scale. The algorithm for identifying ridges is as follows:
1. Start at the largest scale, j = J. Every peak maximum at the largest scale ...

Figure 1 (a, b). The peak detection process: (a) model signal; (b) CWT with identified ridges after removing short ridges.

On average, about 99% query cover was achieved for the first half of the real signal. The results for the entire signal are highly dependent on the quality of that signal.
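Since the ridge identification steps are only partly described, the following is a simplified sketch of one common way to link maxima across scales, not the authors' procedure. It assumes that every maximum at the largest scale starts a ridge, that a ridge is extended to the nearest unclaimed maximum at the next smaller scale within a fixed window, and that short ridges are discarded afterwards. The function name `link_ridges` and the parameters `window` and `min_length` are hypothetical.

```python
def link_ridges(P, window=2, min_length=2):
    """Link maxima across scales into ridges.

    P[j] holds the sorted column indices of the maxima at scale index j;
    the last entry of P is assumed to be the largest scale.
    Each ridge is a list of (scale_index, column) pairs, largest scale first.
    """
    ridges = [[(len(P) - 1, p)] for p in P[-1]]  # assumption: each maximum at the largest scale starts a ridge
    for j in range(len(P) - 2, -1, -1):
        used = set()
        for ridge in ridges:
            last_j, last_p = ridge[-1]
            if last_j != j + 1:
                continue  # this ridge already ended at a larger scale
            candidates = [p for p in P[j] if abs(p - last_p) <= window and p not in used]
            if candidates:
                best = min(candidates, key=lambda p: abs(p - last_p))
                ridge.append((j, best))
                used.add(best)
        # Maxima not attached to any existing ridge start new ridges.
        ridges.extend([[(j, p)] for p in P[j] if p not in used])
    return [r for r in ridges if len(r) >= min_length]  # remove short ridges

# Toy maxima lists (hypothetical): index 0 is the smallest scale, index 2 the largest.
P = [[10, 30, 51], [11, 30], [12, 29]]
ridges = link_ridges(P)
peaks = [ridge[-1][1] for ridge in ridges]  # column at the smallest scale each ridge reached
print(peaks)  # [10, 30]
```

In this toy input the isolated maximum at column 51 appears only at the smallest scale, so its ridge has length 1 and is removed, while the two ridges that persist across all three scales are reported as peaks.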

Conclusion
For real signals without large distortions, we were able to achieve a peak determination reliability of 0.99. The research shows that for the "good" part of the real signal, in which distortions are not yet very pronounced, this method detects on average more luminescent peaks than other methods. For better detection of peaks along the entire length of the signal, the initial parameters for constructing lines on the scalogram must be tuned. Further development of this work will focus on automating the parameter settings for determining peaks.