Feature Extraction and Similarity of Movement Detection during Sleep, Based on Higher Order Spectra and Entropy of the Actigraphy Signal: Results of the Hispanic Community Health Study/Study of Latinos

The aim of this work was to develop a new unsupervised exploratory method for extracting features and detecting similarity of movement during sleep from actigraphy signals. We propose algorithms, based on the signal bispectrum and bispectral entropy, to determine the unique features of independent actigraphy signals. Experiments were carried out on 20 randomly chosen actigraphy samples of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL) database, with no information other than their aperiodicity. The Pearson correlation coefficient matrix and the histogram correlation matrix were computed to study the similarity of movements during sleep. The results obtained allowed us to explore the connections between certain sleep actigraphy patterns and certain pathologies.


Introduction
Actigraphy is now being increasingly used to explore sleep patterns in sleep laboratories. Its main advantages include its easy setup, its low cost, and the fact that prolonged records can be obtained over time, permitting patient activity in ambulatory conditions without interfering with their daily routines. It is considered to be a valuable tool for controlling and monitoring circadian alterations and insomnia, as well as avoiding false positives in the assessment of daytime sleepiness tests, such as the multiple sleep latency test, and the wakefulness maintenance test [1][2][3][4][5].
Many recent studies have validated the practice of actigraphy; for example, several wrist-worn actigraphy devices for sleep assessment were compared in [6]. A relationship has been found between sleep disorders and certain conditions, such as hypertension and obesity [7], and it is now even possible to analyze sleep depth from actigraphy signals [8].
A review of the current state of higher-order statistics (HOS) and their use in biosignal analysis can be found in [9]. Most biomedical signals are non-linear, non-stationary, and non-Gaussian. For a stationary random process $\{x(n)\}_n$,
$$m_k^x(\tau_1, \tau_2, \ldots, \tau_{k-1}) = E\left[x(n)\,x(n+\tau_1)\cdots x(n+\tau_{k-1})\right] \tag{1}$$
represents the moment of order $k$. This moment depends only on the time lags $\tau_1, \ldots, \tau_{k-1}$, where $\tau_i = 0, \pm 1, \ldots$ for all $i$. The cumulants are similar to the moments, but the difference is that the moments of a random process are derived from the characteristic function of the random variable, while the cumulant generating function is defined as the logarithm of the characteristic function of that random variable. The $k$-th order cumulant of a stationary random process $\{x(n)\}_n$ can be written as [20]:
$$c_k^x(\tau_1, \tau_2, \ldots, \tau_{k-1}) = m_k^x(\tau_1, \tau_2, \ldots, \tau_{k-1}) - m_k^G(\tau_1, \tau_2, \ldots, \tau_{k-1}), \tag{2}$$
where $m_k^G(\tau_1, \tau_2, \ldots, \tau_{k-1})$ is the $k$-th order moment of a process with an equivalent Gaussian distribution that has the same mean value and autocorrelation function as $\{x(n)\}_n$.
It is evident from (2) that a process following a Gaussian distribution has null cumulants for orders greater than 2, since $m_k^x(\tau_1, \tau_2, \ldots, \tau_{k-1}) = m_k^G(\tau_1, \tau_2, \ldots, \tau_{k-1})$, so that $c_k^x(\tau_1, \tau_2, \ldots, \tau_{k-1}) = 0$ [20,21]. In practice, we estimate cumulants and polyspectra from a finite amount of data $\{x(n)\}_{n=0}^{N-1}$. These estimates are also random and are characterized by their bias and variance [22]. Let $\{x(n)\}_n$ denote a zero-mean stationary process; we assume that all relevant statistics exist and that they have finite values. The third-order cumulant sample estimate is given by [21]:
$$\hat{c}_3^x(\tau_1, \tau_2) = \frac{1}{N} \sum_{n=N_1}^{N_2} x(n)\,x(n+\tau_1)\,x(n+\tau_2), \tag{3}$$
where $N_1$ and $N_2$ are chosen such that the sums only involve $x(n)$ for $n = 0, \ldots, N-1$, $N$ being the number of samples in the cumulant region. Likewise, the bispectrum estimate is defined as the Fourier Transform of the third-order cumulant sequence [22]:
$$\hat{B}(f_1, f_2) = \sum_{\tau_1} \sum_{\tau_2} \hat{c}_3^x(\tau_1, \tau_2)\, e^{-j 2\pi (f_1 \tau_1 + f_2 \tau_2)} = \frac{1}{N}\, X(f_1)\, X(f_2)\, X^{*}(f_1 + f_2), \tag{4}$$
where $f_1$ and $f_2$ are the spectral frequency vectors of the sequence $\{x(n)\}_{n=0}^{N-1}$, and $X(f_i)$, $i = 1, 2$, is its Fourier Transform.
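The two estimators described above can be illustrated with a short NumPy sketch. It is not the toolbox routine used in this work: the function names, the segment length `nfft`, and the averaging over non-overlapping segments are assumptions made for illustration only.

```python
import numpy as np

def third_order_cumulant(x, tau1, tau2):
    """Sample estimate of the third-order cumulant at lags (tau1, tau2)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # enforce the zero-mean assumption
    N = len(x)
    # Keep only n for which n, n + tau1 and n + tau2 all lie in [0, N - 1].
    n1 = max(0, -tau1, -tau2)
    n2 = min(N - 1, N - 1 - tau1, N - 1 - tau2)
    if n2 < n1:
        return 0.0
    n = np.arange(n1, n2 + 1)
    return float(np.sum(x[n] * x[n + tau1] * x[n + tau2]) / N)

def bispectrum_direct(x, nfft=64):
    """Direct estimate X(f1) X(f2) conj(X(f1 + f2)), averaged over segments."""
    x = np.asarray(x, dtype=float)
    nseg = len(x) // nfft
    if nseg < 1:
        raise ValueError("signal is shorter than one segment")
    k = np.arange(nfft)
    idx_sum = (k[:, None] + k[None, :]) % nfft   # bin index of f1 + f2
    B = np.zeros((nfft, nfft), dtype=complex)
    for s in range(nseg):
        seg = x[s * nfft:(s + 1) * nfft]
        seg = seg - seg.mean()                    # remove the segment mean
        X = np.fft.fft(seg) / nfft
        B += X[:, None] * X[None, :] * np.conj(X[idx_sum])
    return B / nseg
```

The resulting matrix is symmetric in its two frequency indices, so in practice only part of the bifrequency plane carries independent information.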

Bispectral Entropy Analysis
Entropy provides a measure for quantifying the information content of a random variable in terms of the minimum number of bits per symbol that are required to encode the variable. It is an indicator of the amount of randomness or uncertainty of a discrete random process [23]. Consider a random variable $Z$ with $M$ states $z_1, z_2, \ldots, z_M$ and state probabilities $p_1, p_2, \ldots, p_M$, that is, $P(Z = z_i) = p_i$. The entropy of $Z$ is defined as:
$$H(Z) = -\sum_{i=1}^{M} p_i \log_2 p_i.$$
The entropy of a discrete-valued random variable attains its maximum value for a uniformly distributed variable. In order to extend this notion from the spatial to the frequency domain, we introduce bispectral entropy as a way of measuring the uniformity of the spectrum [21]. The bispectral entropy is defined analogously, with the state probabilities replaced by an energy probability computed from the bispectrum estimate.
Sensors 2018, 18, 4310
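As an illustration of the entropy measures described above, the following sketch computes the Shannon entropy of a probability vector and one possible bispectral entropy. The exact energy probability used in this work is not reproduced here: taking the bispectrum magnitude divided by its sum, and normalizing by the logarithm of the number of bins so that the result lies in [0, 1], are assumptions, as are the function names.

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """Entropy of a discrete distribution; zero-probability states contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

def bispectral_entropy(B, normalize=True):
    """Entropy of the normalized bispectrum magnitude (assumed definition)."""
    mag = np.abs(np.asarray(B)).ravel()
    total = mag.sum()
    if total == 0:
        raise ValueError("bispectrum is identically zero")
    p = mag / total                       # assumed energy probability
    H = shannon_entropy(p)
    if normalize:
        if p.size < 2:
            raise ValueError("need at least two bins to normalize")
        H /= np.log2(p.size)              # 1.0 for a perfectly flat bispectrum
    return H
```

With this normalization, a flat bispectrum gives a value of 1 and a bispectrum concentrated in a single bin gives 0, which matches the interpretation of entropy as a uniformity measure.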

Results
The actigraphy signals that measured the movements of individuals while sleeping were analyzed. These movements have an intrinsically random nature, since they can occur with non-specific probabilities and durations. This can be checked by analyzing the frequency spectrum of the activity signal and comparing it with a noise pattern. The probability distribution of the spectral pattern depends on the nature and uniformity of the movements; it may be normal or follow another distribution, such as the uniform, depending on the random nature of the process.

Application of the Bispectrum to the Actigraphy Signal
A spectral analysis based on the one-dimensional Fourier transform is not recommended for the detection of traits in a random signal, such as the actigraphy signal. For such signals, this analysis only provides information on the magnitude-frequency or phase-frequency distribution. In other words, what is visualized in the spectrum is noise, which in our case is in fact the useful information from which certain characteristics and features have to be extracted. The frequency spectrum of two actigraphy signals is shown in Figure 1, where it can be seen that the one-dimensional Fourier Transform is not able to identify the discriminant features in this type of signal.
Unlike the one-dimensional frequency spectrum, the bispectrum of an activity signal can provide information on the spatial distribution of the amplitude and on the frequency components (see Equation (4)). This information can be represented in a matrix that can be used to obtain the particular identification features of each signal. The bispectrum of the actigraphy signal was computed in MATLAB, using the Higher Order Spectra Analysis toolbox. Figures 2 and 3 show the contours of the bispectrum surface of the actigraphy signal, where f1 and f2 are the normalized spectral frequency vectors generated from the calculation of the bidimensional Fourier Transform.
We found that the bispectrum can indicate variables that measure specific characteristics of the movement during sleep, based on the uniformity of the activity data and the disorder of the sample. Here, a greater frequency disorder at the bispectral level may imply an excess of movement during the analyzed period, which may even be an identifying feature of sleep that can be linked to individual patients. For the sake of completeness, the bispectrum plots show that the bispectrum is a unique variable for each actigraphy signal. It can also be seen that the daily bispectrum records are all different from each other, showing that together they form an identification pattern, which we have named the bispectral pattern of the activity signal.
A bispectrum analysis was performed on 20 different activity signal records. We tried to identify each one with a specific spectral sleep pattern per day, and to find a possible relationship between an individual's movement patterns during sleep. The results obtained are shown in Figures 6-8, which give the bispectrum of the actigraphy signal for the first 10 of the 20 analyzed actigraphy signals from the HCHS/SOL database.
It can be seen that there are unique identifiable characteristic features that can be used to obtain patterns of movement during sleep. For instance, Figures 5a, 6b, 7a, and 8d have similar contours. This means individuals can be divided into groups according to the similarity of their sleep patterns.
To further illustrate these results, we correlated the bispectrum of the seven days of signals by computing the Pearson correlation coefficients for every pair of samples, to find similarities between each pair of signals. The results are given in the correlation matrix R in Table 1. For example, $R_{1-2}$ is the Pearson correlation coefficient between the bispectrum of samples 1 and 2 from hchs-sol-sueno-00163225 and hchs-sol-sueno-00238589.
In order to determine subgroups in the set of samples, and to identify the pairs of signals that give correlation values closest to 1, we selected the pairs with correlation values greater than 0.97. This was done to satisfy the hypothesis of the similarity of the sleep movement patterns of two signals, since there must be as few differences as possible between them, and therefore also minimal differences in their bispectral patterns. The similar pairs are shown in black in Figure 9, in which the pairs with the lowest correlation are indicated with red dashed lines to show different activity patterns. For this latter case, we considered values below 0.8. Although this threshold is relatively high in comparison with other applications, we adopted it to search for dissimilar sleep patterns.
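The pairwise comparison and thresholding described above can be sketched as follows. Flattening the bispectrum magnitude of each sample into a vector before computing the Pearson coefficient is an assumption, and the function names are hypothetical; only the 0.97 and 0.8 thresholds come from the text.

```python
import numpy as np

def bispectrum_correlation_matrix(bispectra):
    """Pearson correlation between the flattened magnitudes of each bispectrum."""
    feats = np.array([np.abs(np.asarray(B)).ravel() for B in bispectra])
    return np.corrcoef(feats)             # rows are treated as variables

def select_pairs(R, hi=0.97, lo=0.8):
    """Return index pairs above the upper threshold and below the lower one."""
    n = R.shape[0]
    similar, dissimilar = [], []
    for i in range(n):
        for j in range(i + 1, n):         # upper triangle only, no self-pairs
            if R[i, j] > hi:
                similar.append((i, j))
            elif R[i, j] < lo:
                dissimilar.append((i, j))
    return similar, dissimilar
```

Pairs whose coefficient lies between the two thresholds are deliberately left unclassified, mirroring the way only the strongest and weakest relationships are drawn in Figure 9.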
The correlation values given in Table 1 and Figure 9 show that there may be a similarity in sleep movement patterns. In Table 1, the maximum distance value is 0.3122 and the minimum is $10^{-6}$, the mean is 0.0538, and the statistical mode (the most frequent value in an array) is 0.001.
Figure 10 gives a comparative measurement of the values in Table 1 by rearranging the columns of the matrix into a vector and considering it as a time series, in which the x-coordinate is the position in the vector and the y-coordinate is the corresponding value of the coefficient. In this arrangement, the groups indicate almost repetitive terms that represent signals with similar characteristics.
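A minimal sketch of the rearrangement and of the summary statistics quoted above is given below. The text does not specify how the distance values are derived from Table 1, so the sketch simply summarizes whatever values it receives; the rounding used to compute the mode and the function names are assumptions.

```python
import numpy as np

def matrix_to_series(M):
    """Stack the columns of a matrix into one vector (column-major order)."""
    return np.asarray(M, dtype=float).ravel(order="F")

def summarize(values, decimals=3):
    """Maximum, minimum, mean and mode (after rounding) of a set of values."""
    v = np.asarray(values, dtype=float)
    uniq, counts = np.unique(np.round(v, decimals), return_counts=True)
    return {
        "max": float(v.max()),
        "min": float(v.min()),
        "mean": float(v.mean()),
        "mode": float(uniq[np.argmax(counts)]),  # most frequent rounded value
    }
```

Plotting the output of `matrix_to_series` against its index gives the kind of series described for Figure 10, in which runs of nearly equal values correspond to groups of similar signals.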
In order to better distinguish the differences and similarities between the sleep signals, we performed another analysis using the bispectral entropy as the method of characterizing the disorder/uniformity of the processed signals.

Application of Bispectral Entropy as a Measure of Actigraphy Disorder
The experiment was based on a similarity analysis, analogous to that of the bispectrum. We calculated the bispectral entropy of each activity sample for the whole period of seven days, to obtain a measure of the degree of uniformity of the sleep movement pattern, taking the degree of randomness of the activity signal into account. We considered the maximum value of the bispectral entropy as a way of describing the degree of uniformity of a random process.
The bispectral entropy of the signals was computed in a minimum window of eight samples, to represent the temporal displacement index of the signals. The results obtained are shown in Figure 11, together with the mean value of the bispectral entropy of each actigraphy signal.
It can be seen that signals 8 and 16 have the lowest bispectral entropy values, due to the non-uniformity of the bispectrum frequency distribution. This can also be identified in some of the previous graphs; for instance, in Figure 8b, the high-frequency components are characterized by the outer points (in blue), and the disconnected regions are the lowest frequency values.
Figure 11 also contains samples with similar bispectral entropy values, between 0.98 and 0.99, which is consistent with the hypothesis that activity samples with a similar correlation at the bispectral level may have the same level of uniformity in their value distributions. The converse also holds for the samples with the lowest bispectral entropy in Figure 11, such as samples 7, 8, 10, and 16, and for other visible relationships: their correlation values are below 0.8 in Table 2, and in Figure 11 they correspond to different uniformity patterns.
Given the analogy of the activity signal with the random process, the maximum entropy value would mean a greater uniformity of movement in the subject in the time interval studied, i.e., a high uniformity in the randomness of the movements. Conversely, occasional movements would be associated with impulsive noise, which has a non-uniform randomness, and thus, it would be associated with minimum entropy.
To also visualize the frequency of the maximum uniformity of sleep movements, histograms were made of the 7-day bispectral entropy of each activity signal. The frequencies of the entropy values for each processed sample are shown in Figures 12 and 13. These histograms provide information on the number of repetitions of the entropy values in each sample, i.e., the number of times each value in the data vector is repeated.
Although none of the histograms is repeated in Figures 12 and 13, some of them show certain similarities that could indicate similar sleep patterns. To verify this, the histograms were correlated with each other, using both the entropy values and the data repetition frequency as criteria. The results are shown in Table 2.
Table 2. Correlation matrix obtained from the analysis of the bispectral entropy histograms of the 20 analyzed samples from the HCHS/SOL database.
Table 2 contains the results based on the histogram of the bispectral entropy of the activity signals, to provide a criterion for the similarity of the data based on the uniformity of the bispectrum. This table can be interpreted similarly to Table 1, which was based on the algorithm that describes the matrix correlation in Figure 9.
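The histogram comparison described above can be sketched as follows. Using the same bin edges for every sample, so that both the entropy values and their repetition counts enter the comparison, is an assumption, as are the number of bins, the value range, and the function names.

```python
import numpy as np

def entropy_histogram(values, bins=20, value_range=(0.0, 1.0)):
    """Counts of entropy values falling in each of a fixed set of bins."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    return counts

def histogram_correlation_matrix(entropy_series, bins=20, value_range=(0.0, 1.0)):
    """Pearson correlation between the entropy histograms of all samples."""
    H = np.array(
        [entropy_histogram(s, bins, value_range) for s in entropy_series],
        dtype=float,
    )
    return np.corrcoef(H)                 # one row per sample
```

Two samples whose entropy values concentrate in the same bins give a coefficient close to 1, while samples whose values fall in disjoint bins give a low or negative coefficient.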
According to the previous analysis, the upper threshold was 0.97, and the lower threshold was a little lower than previously found. We considered 0.7 to distinguish between the similarities and clear differences among the signals (see Figure 14).
It can thus be seen that several histograms are highly correlated, which indicates that these activity signals present a high level of data uniformity, i.e., bispectral entropies with similar values, and also a high correlation value in terms of the bispectrum comparison. The dispersion graph of the correlation values obtained from Table 2 is shown in Figure 15. The data with similar values are seen to be grouped. The maximum value of the distance matrix is 0.6715, and the minimum is $10^{-5}$. The mean value of the distance matrix was 0.1407, and the statistical mode was $10^{-5}$, which indicates data groups with similar characteristics associated with the same type of movement, as can be seen in Figure 15.
Figure 15. Scatter plot for the correlation matrix shown in Table 2.

Discussion
In order to associate the results with clinical diagnoses, several variables were taken from the HCHS/SOL database as the clinical characteristics of the 20 actigraphy samples. First, we considered the following variables:
CDCR_SUENO: self-report of cerebrovascular disease & carotid revascularization.
CHD_SELF_SUENO: combination of self-reports of coronary revascularization or heart attack.
DIABETES_SELF_SUENO: indicates a self-report of diabetes.
DIABETES_SUENO: indicates diabetes.
DM_AWARE_SUENO: describes the awareness of diabetes.
HYPERTENSION_SUENO: indicates hypertension status.
STROKE_SUENO: checks for a self-report of stroke history.
STROKE_TIA_SUENO: checks for medical history of stroke, mini-stroke or TIA (transient ischemic attack).
These variables are of the 0/1 type, i.e., '0' for a negative response and '1' for a positive one. Their values for the 20 individuals whose actigraphy signals were processed can be found in Table 3. To relate the clinical characteristics of the patients to the results obtained, we first used the correlation, which is a measure of the similarity of data. We report these results even though the correlations obtained are weak, owing in part to the limited number of signals used and to the limited information content of the signal database.
We opted to consider the HYPERTENSION_SUENO variable to study relationships within the actigraphy signals, since its value varies across several samples. First, we saw that 47.62% of the pairs whose bispectra correlate with a value greater than 0.97 share the same clinical diagnosis. However, in Figure 9, it can be seen that the pairs with the same positive or negative diagnosis tend to cluster, which indicates a stronger hidden relationship that cannot be obtained by simply correlating the bispectrum of the signals (see Figure 16).
Figure 16. Pairs of bispectrum signals correlated with a coefficient that is greater than 0.97 (black lines) or lower than 0.7 (red dashed line). The thick black line indicates pairs that share a hypertension diagnosis, while the dashed black line indicates pairs in which neither has hypertension.
A similar effect was found in the comparison of the bispectral entropy histograms. Only 41.17% of the pairs correlated with a coefficient of 0.97 or higher present the same hypertension diagnosis. However, among the pairs in Figure 14, those sharing the hypertension diagnosis are seen to be connected (see Figure 17).
Although the results shown in Figures 16 and 17 are not conclusive, they do suggest that a further in-depth study is warranted of the characteristics of bispectrum signals that contribute most to these similarities. It is also worth mentioning that the limited number of cases considered in this study calls for a more systematic study of larger database samples.
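The percentages reported above can be computed as in the sketch below, which counts how many of the highly correlated pairs carry the same value of a binary clinical variable. The function name and the example data in the usage note are hypothetical and are not taken from the HCHS/SOL records.

```python
def shared_diagnosis_rate(pairs, diagnosis):
    """Percentage of pairs whose two members have the same 0/1 diagnosis."""
    if not pairs:
        raise ValueError("no pairs to evaluate")
    same = sum(1 for i, j in pairs if diagnosis[i] == diagnosis[j])
    return 100.0 * same / len(pairs)
```

For example, with the hypothetical labels `[1, 1, 0, 0]` and the pairs `[(0, 1), (2, 3), (0, 2), (1, 3)]`, two of the four pairs share a diagnosis and the function returns 50.0. Both members being negative counts as a shared diagnosis, in line with the "same positive or negative diagnosis" wording above.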

Conclusions
This paper has shown that the application of higher-order statistical analysis to actigraphy signals can contribute to determining the traits and patterns of movement during sleep. These criteria can be based on part of the spatial information provided by the bispectrum and the bispectral entropy, both of which can help us to determine effective criteria for measuring the uniformity of data randomness.
The actigraphy signal experiments suggest the possible application of these criteria for the extraction and comparison of patterns of sleep movements. This would have a potential use in medicine, since similar pathologies may have similar associated movement patterns.
In future work we propose to use high-order statistical techniques, as for instance in [23]. We also want to experiment with data from chest actigraphy or other actigraphy signal measures, to corroborate the potential use of sleep actigraphy signals for purposes of diagnosis.
Our next step will be to increase the number of cases analyzed to cover the entire HCHS/SOL database, and also to experiment with other clinical characteristics in patients and pathologies associated with specific sleep disorders or brain-associated diseases.