A computationally efficient algorithm to obtain an accurate and interpretable model of the effect of circadian rhythm on resting heart rate

Objective: Wrist-worn wearable devices equipped with heart rate (HR) sensors have become increasingly popular. The ability to correctly interpret the collected data is fundamental to analyse user’s well-being and perform early detection of abnormal physiological data. Circadian rhythm is a strong factor of variability in HR, yet few models attempt to accurately model its effect on HR. Approach: In this paper we present a mathematical derivation of the single-component cosinor model with multiple components that fits user data to a predetermined arbitrary function (the expected shape of the circadian effect on resting HR (RHR)), thus permitting us to predict the user’s circadian rhythm component (i.e. MESOR, Acrophase and Amplitude) with a high accuracy. Main results: We show that our model improves the accuracy of HR prediction compared to the single component cosinor model (10% lower RMSE), while retaining the readability of the fitted model of the single component cosinor. We also show that the model parameters can be used to detect sleep disruption in a qualitative experiment. The model is computationally cheap, depending linearly on the size of the data. The computation of the model does not need the full dataset, but only two surrogates, where the data is accumulated. This implies that the model can be implemented in a streaming approach, with important consequences for security and privacy of the data, that never leaves the user devices. Significance: The multiple component model provided in this paper can be used to approximate a user’s RHR with higher accuracy than single component model, providing traditional parameters easy to interpret (i.e. the same produced by the single component cosinor model). The model we developed goes beyond fitting circadian activity on RHR, and it can be used to fit arbitrary periodic real valued time series, vectorial data, or complex data.

time of the day. Once the expected RHR is calculated, the difference between the measures HR and the RHR can provide valuable insight about the user physiological health and well-being.
RHR is widely investigated in the literature because it is an independent predictor of cardiovascular and allcause mortality in men and women (Kristal-Boneh et al 2000, Fox et al 2007, Larsson et al 2019, Li et al 2019, Sartipy et al 2019. However, these studies are focused on a single point measurements without evaluating the variation of RHR through the day that could provide more information about health status of the population. In Faria and Drummond (1982) detected a physiological response fluctuation during the day of RHR and resting temperature, demonstrating a relationship with physical activity tasks. All the studies previously cited evaluated the cardiovascular disease risk and all-cause mortality from RHR and not from HR because of RHR reflects the basal activation of the autonomic nervous system that is found to be a predictor of several pathologies. The variation of HR among days could be only due to the different physical activity performed in a specific day (Fossion et al 2018) and not from a real change in health status, that can be detected analyzing the daily fluctuation of RHR.
Physiological measures such as HR and RHR are continuous variables that are subject to the influence of external and internal stimuli, changing over a daily cycle of about 24 h with patterns that have been defined as circadian rhythms (Faria andDrummond 1982, Koukkari andSothern 2006). The time series obtained from wrist IoT wearable device are often noisy (i.e. variations in the biological system that are not part of the deterministic portion of the signal and that are derived from external errors such as instrument inaccuracy), short (to save battery) and sparse (i.e. unequal time intervals between observations, missing datapoints) (Fernández et al 2003). Despite these time series problems and the fact that the real signal is not a simple sinusoidal, HR circadian rhythms analysis is usually performed using the cosinor method (Cornelissen 2014) with a single component because the resulting parameters can be easily interpreted. In particular, the cosinor curve with single component provides three parameters that describe the circadian rhythm: (i) mid-line estimating statistic of rhythm (MESOR) is an estimation of central tendency of the distribution of values across the cycles of the circadian rhythm computed using a cosine function; (ii) amplitude is the difference between the peak and the mean value of a wave; (iii) acrophase is the time of the day at which the peak of the circadian rhythm occurs.
It was found that the shape of the HR as a function of the hour of the day significantly differs from a simple cosine: during the night the HR becomes slower for approximately 6 h, while during the day it rises for approximately 18 h, with three peaks corresponding to meals (Malik et al 1990, Nakagawa et al 1998. However, only a few studies in the literature propose a cosinor method with multiple-component to analyze periodic components in non-sinusoidal longitudinal time series (Fernández et al 2003, Cornelissen 2014. The result of the multiplecomponent cosinor analysis includes several parameters which are not easy to interpreted in comparison to the three parameters provided by single-component cosinor. Despite the multiple-component model being found to be more suitable to approximate the signals waveform when it deviates from sinusoidal, the single-component cosinor model is still widely used to investigate physiological circadian rhythm because of its simplicity and readability (Cornelissen 2014).
Some models have been proposed to fit the data more accurately. Leise proposed the use of wavelets (Leise andHarrington 2011, Leise 2017). Fossion proposed the use of multiscale adaptive analysis (Fossion et al 2017). Those approaches improve the goodness of fit, but fitting parameters are difficult to interpret, compared to the single component cosinor. An additional limitation of those methods is that they are only concerned with the frequencies present in the data, without requiring the signal to follow a specified shape.
With the wide adoption of fitness devices, concerns about the security and privacy of the data collected by these devices have been raised (Aktypi et al 2017, Fereidooni et al 2017. Among the principles of Privacy by Design are data minimisation, which states that only the data strictly necessary to perform the intended activities should be collected, and full functionality, which states that privacy should not impair the functionalities of the product (Cavoukian et al 2009). However, the standard approach of fitness companies is to upload all the user data to their servers for processing, and performing all the data analysis on the cloud, requiring access to users raw HR data. In practice, data minimisation is usually sacrificed in favour of full functionality.

Contributions of this paper
In this paper we present a mathematical derivation of the single-component cosinor model with multiple components that fits user data onto an arbitrary function (i.e. the model which describes the circadian rhythm of the population), thus enabling the prediction of the user's circadian rhythm effect on RHR.
Unlike the cosinor model with multiple components provided by Cornelissen (2014), the model proposed in our study provides the same three easy-to-interpret parameters which can be derived from the cosinor model with single component (i.e. MESOR, amplitude, acrophase).
Our model allows to assign weights to every data point used for fitting the expected shape to the user data. This enables advanced analyses with arbitrary error models, like assigning less importance to data points collected while the user was moving (expecting motion artifacts, as explained in Morelli et al (2017)). Using weights also it makes possible to implement real-time streaming analysis of the data, performing a new analysis as new data arrives, assigning less importance to older data. In this study we show how to implement such a real-time strategy assigning exponentially decaying weights to old data, obtaining a circadian model that continuously changes with new data. We show how the three fitted parameters (i.e. MESOR, amplitude, acrophase) can be used to interpret behavioural and physiological changes.
The model presented in this paper is computationally light and, for this reason, it is suitable to be executed on a wearable device. The complexity of the accumulation phase is linear on the number of data points, while the complexity resolution phase depends on the number of frequencies used and not on the number of data points.
To the best of our knowledge, this the first study that provides an algorithm that fits the circadian rhythm by using a multiple component cosinor, forcing the data to fit a particular shape, i.e. the population average effect of RHR circadian rhythm, only leaving three free parameters (i.e. the same used by single one). Moreover, our algorithm allows to perform all the data sensitive calculations on the wearable device. As a consequence users raw HR data could be kept on the wearable device, while allowing companies to obtain the output of the analysis, implementing both the full functionality and the data minimisation Privacy by Design principles.
Predicting a user's RHR is an important step to interpret the measured HR data: the residual between measured HR and estimated RHR can then be correlated with physical activity, and inference of valuable insight on this user's well-being and health becomes possible.

Background
Short and sparse time series were historically analyzed by the single-component cosinor model. However, this model is not suitable for unevenly sampled, noisy time series (i.e. the one showing several outlier values), with missing data such as the data recorded by wearable HR trackers.
Cornelissen (2014) extend the single-component cosinor to a multiple-component model in order to analyze the time series in chronobiology. Instead of solving a system of three equations in three unknowns to define the parameters of the circadian rhythm curve (i.e. MESOR, amplitude, acrophase) there are 2c + 1 parameters to estimate (i.e. c refers to the number of harmonics selected to describe the circadian). This approach better approximates the signals waveform of the circadian rhythm compared to single-component cosinor model thanks to the higher number of cosinor components used to fit the model on the circadian rhythm curve. For example, a 2-component cosinor model that describes periods of 24 and 12 h has been found to be well-suited to approximate the nightly drop of blood pressure (Halberg et al 1981). However, the parameters provided by this model result hard to interpret and to compare with the results provided from single-or other multiple-comp onent cosinor methods (Fernández et al 2003), e.g. a small change in the phase of an harmonic will result in drastic changes in the resulting shape of the function, and parameters like amplitude will be distributed among several components.
Fitting unevenly sampled data to sinusoidal functions can also be performed by least-squares spectral analysis techniques, such as the Lomb-Scargle periodogram (Vander Plas 2018), or the Vaníček method (Taylor and Hamilton 1972). The setting of our problem is different, as in we are interested in fitting functions with a particular shape, that is not a simple sinusoidal, but an arbitrary combination of cosines, with fixed amplitudes and locked phases.

The mathematical model 2.1.1. Prior from general population
We model the prior of RHR over time A(h) with the sum of c cosines, as defined in the multiple-component cosinor in Cornelissen (2014). This equation can be rewritten using Euler's complex exponential formula as equation (1). The number c of components to use is a free parameter: The parameters β, α, and φ correspond to the MESOR, amplitude, and acrophase, and can be found as explained in Cornelissen (2014). (2) Note that we have defined ψ so that ψ −k is equal to the complex conjugate of ψ k . Therefore, the number of real parameters has not changed.
Equations (1) and (2) are equivalent to cosinor analysis (Cornelissen 2014) and the Vaníček method (Taylor and Hamilton 1972). From the next section we introduce novel contributions.

Fitting user data with locked parameters
Single component cosinor can only accurately fit data modulated by a simple sinusoidal factor. Circadian effect on RHR is unlikely to be fully captured by such a simple function. Multiple component cosinor can fit data more accurately (Cornelissen 2014), but the resulting set of fitting parameters, (a MESOR; an acrophase and an amplitude for each component) can not be easily interpreted, making it difficult to interpret the parameters of each component. Moreover, fitting a model for each user with all the free parameters will fit shapes that might not be what circadian is supposed to be, and it will be difficult to get an idea of what is the phase and amplitude.
We use the A(h) model provided in the previous section as the prototype (and prior) for users RHR throughout the day (circadian rhythm). However, every user has a different RHR at night, during the day, and might go to sleep and wake up at different times than most of the users (that generated the model). For this reason we define a refined version of the model where some parameters are let free to adapt to users data. In particular, equation (3) shows the formula that locks parameters of the c components only allowing three degrees of freedom (the same used in single component cosinor analysis: MESOR, amplitude, and acrophase). This is equivalent to single comp onent cosinor analysis, using a different function than a cosine: The free parameters in equation (3) are θ 0 , the average RHR of the user; θ 1 , the amplitude of the oscillation of the user RHR with respect to the prior. Values of θ 1 close to 1 indicate that the range of the RHR is similar to the prior, less than 1 indicate that the user RHR has a smaller range compared to the prior, etc; and θ 2 , the phase of the user circadian rhythm with respect to the phase of a cosine with period equal to one day. θ 2 ranges between −0.5 and 0.5. Values of θ 2 close to 0 indicate that the user is sleeping at midday.
The θ parameters allow to express the user RHR, U(h), as a vertical shift (θ 0 ), horizontal shift (θ 2 ), and a change in amplitude (θ 1 ) of the population prior A(h).

Loss function
To find θ that best approximates U(h) for a particular user, we fit U(h) to the RHR data collected for that user, minimising the distance L between the data and U(h) as shown in equation (4), where y j is the j th measure of RHR of the user, x j is the time of the measure, and w j is the weight assigned to the j th measure: Finding the θ that minimises L is not trivial, because we use periodic functions that are not simple cosines, but a particular combination of cosines, defined by the α and φ parameters, that are not allowed to change. The literature of cosinor based rhythmometry does not offer any solutions, neither analytic nor numeric. Therefore, we developed an analytic solution to finding the θ that minimises L.
First, we introduce the following notation to further simplify formulae:

D Morelli et al
We can now develop equation (4) using the definition of U in equation (3) into the following: where the indices k and m range between −c and c.
To minimize the loss we put to zero the partial derivatives with respect to θ 0 , θ 1 , θ 2 . We obtain the system in equation (9): Substituting θ 0 and then θ 1 in the equations of the system (9) we find a polynomial in ζ: where Since the indices m, l, and k range between −c and c, and ∀m.Ψ m,m,m = 0, the polynomial in equation (10) has degree 6c − 2.
The roots of the polynomial in ζ can be found with spectral methods, i.e. using the eigenvalues of the companion matrix (Edelman and Murakami 1995) to find an approximation, and then the solutions can be refined using the Newton-Raphson method.
From the roots of the polynomial we can determinate the values of θ 2 , and subsequently of θ 1 and θ 0 , where the total derivative of L(θ) is zero. We compute L(θ) using equation (8) on these points to determine the absolute minimum.
Our model is based on the weighted least squares approach. Data do not need to respect the assumption of homoscedasticity, and weights can be designed to focus the analysis on certain areas of the data, e.g. give less weight to data collected in movement.

Engineering considerations
The system in equation (9) depends on data indirectly through the values of F and S. Moreover, also the square loss function can be expressed in term of these values (and the total sum of y 2 i ). This means the whole algorithm does not require access to the full dataset, but only a limited set of values (namely the values of S, F and the sum of squares of y values if we want to output an absolute value for the loss). All these values are moreover an additive function of data sets, that is for two disjoint set of values A = {x i , y i } and B = {x j , y j } S k,A∪B = S k,A + S k,B and F k,A∪B = F k,A + F k,B . This implies that it can be computed separately for each subset of samples, and then added together to obtain the total value: for example it can be accumulated directly in the user's device, with big advantages in computational complexity, data transfer and privacy. In particular the asymptotic analysis of the algorithm complexity is the following (where N is the number of data points, c the number of components):

• The accumulation of data into the surrogate values S and F is O(N · c).
• To solve the polynomial we need to find eigenvalues of a 6c − 2 sided matrix which requires O(c 3 ) operations. • To compute the loss for a given set of fit parameters we required O(c 2 ) operations, which has to be repeated for each candidate solution, for a total of O(c 3 ) operations.
The total complexity is thus O(c 3 + Nc), in particular it only depends linearly on the size of data.

Experimental setting
To test the accuracy of our model of circadian effect on RHR against the traditionally used single component consior, we used the data collected by Biobeats, a corporate well-being company 4 , through their product Biobase, a stress management app that uses a wrist worn wearable device (Biobeam) to passively collect activity data (number of steps performed every 20 s), sleep phases by a triaxial accelerometer with a sample rate of 100 Hz, and heart beats by photoplethysmography sensors (i.e. Interbeat Interval durations are measured for 2 min every 30 min, with a sample rate of 50 Hz; and the average HR of 1 min of heart activity is measured every 10 min, with a sample rate of 25 Hz). To test our model we used a dataset collected by Biobeats from two pilots with financial sector organizations. Forty-seven individuals (F = 52%, M = 47%), with an age range of 20-63 years (M = 29.29, SD =11.38), agreed to participate to the experiment and wore a wrist worn wearable device that collected HR every 10 min, activity data (steps) every 20 s, and sleep phases every night, for four weeks. The number of HR measurements per user was in average 3471 (SD 2964). All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Building a prior by fitting β and ψ on the general population
To build a prior for the RHR we proceeded as follows: • for each user we kept only time series with at least 100 HR samples. After this preprocessing step dataset had a total of 163 179 HR measures; • we split the dataset using a leave-one-out strategy, training the prior on a dataset that did not include the user for which we wanted to calculate the θ parameters; • for each HR measure, we calculated the amount of steps taken by the user in the previous 5 min; • we removed from the dataset the HR points where the number of steps in the previous 5 min was larger than 10. This step was carried out to ensure we estimate RHR, instead of HR that is influenced by the physiological request of physical activity. After this preprocessing step the number of HR datapoints decreased from 163 179 to 132 498; • on each user we first performed cosinor analysis on a single component to find the Acrophase (every user will generally have different sleep habits); • we then aligned the data of all users, forcing all users to have the same phase as a cosine with period equal to 1 d; and • finally, we used multiple-components cosinor on the phase aligned data from all users, using c = 4 cosines, with period of 1 d, half a day, a third of a day, and a quarter of a day. Figure 1 shows one of the priors fitted using the leave-out-out approach. In that case the values for β and ψ, from equation (2), are the following: β represents the population average. ψ 1 describes the first of the four components cosinor fitting. The modulus expresses how important is this component in the fitting, and the argument expresses the phase of this component. ψ 1 has argument equal to zero, because we explicitly aligned all users to a cosine. We can see that the first component captures 62% of the overall information, as the modulus of ψ 1 is equal to 0.62 multiplied by the sum of the moduli of all ψ 1 , ψ 2 , ψ 3 , and ψ 4 .

Building the models of a user resting heart rate by fitting θ on the data of that user
To obtain the analysis of the user of interest, we proceeded as follows: • for every HR measure, we calculated the number of steps in the 5 min preceding the timestamp of the HR measure; • we removed all the HR measures with a corresponding number of steps larger than 10. Therefore we removed all the HR measures that happened during, or immediately after, some physical activity; • with a rolling window approach, for every HR measure we applied exponential decay to the previous HR measures. The decay function is applied to the weights w i used in the loss function L. w i = e −∆t , where ∆t is the time, in days, between the datapoint i and the latest datapoint, that triggered the update of the model. Therefore, the weight approximately halves every day; we then calculated the surrogates S and F, solved the polynomial in ζ, finding the θ parameters of equation (3) that minimise the loss function. Figure 2 shows θ 0 , θ 1 , and θ 2 , calculated at each HR measure. The choice of using an exponential decay function over time ensures that recent data is weighted more than old data. This approach allows the model to evolve over time, reacting to changes in sleep habits and physiological status. An alternative approach could have been to consider measures in finite time moving windows. However, this latter approach could yield unpredictable results in presence of wide sections of missing data, a frequent situation in data from field experiments, caused by users forgetting to put the band back on after recharging it. An exponential decay approach ensures that even in presence of wide sections with missing data, a usable model is always available. Figure 2 shows seven days of HR and sleep data of one of the users, and the estimated circadian parameters: • the first graph shows the HR measures of the user as black dots, and the RHR as estimated by the circadian model as red dots. The x-axis is time expressed as UNIX-timestamps, the y -axes is HR in beats per minute; • the second graph shows the estimated MESOR, θ 0 , evolving over time, changing for every HR measure. The x-axis is time expressed as UNIX-timestamps, the y -axes is HR in beats per minute; • the third graph shows the estimated amplitude, θ 1 , evolving over time, changing for every HR measure. The x-axis is time expressed as UNIX-timestamps, the y -axes is the ratio between the amplitude of the user model and the amplitude of the population model. A value around 1 (the dotted horizontal line) indicates that the circadian amplitude is in line with the population average, values above 1 indicate that the amplitude is larger than the population average, values below 1 that the amplitude is below the population average; • the fourth graph shows the estimated acrophase, θ 2 , evolving over time, changing for every HR measure. The x-axis is time expressed as UNIX-timestamps, the y -axes is phase in days with respect to a cosine with period 1 d, with range between −0.5 and 0.5 d. The horizontal dotted line shows a phase of zero, indicating that the user sleeps at midday; and • the fifth graph shows the sleep data measures by the wearable, in the graph we show at what time the user went to bed and woke up, as a line. The horizontal dotted line indicates midnight. We did not use this data in our model, and is shown here as a reference for the sleep behaviour of the user.
Looking at the estimated circadian parameters, it can be seen that on the fourth night, slightly after timestamp = 1546 300 000, the MESOR increases and the amplitude decreases, indicating non optimal physical conditions of the user, and that the acrophase suddenly changes, increasing from the stable average. Observing the sleep data from the band it can be seen that on that same night the user suddenly changed sleep habits (from a healthy stable time to bed of about 11 pm, to 3 am). The lack of proper sleep can be seen in the HR data and is reflected in the changes in θ 0 and θ 1 . The sudden change in the user sleep habit is reflected in the sudden change in θ 2 . Figure 3 shows the data of one day of a user, with the fitted single component cosinor model and our circadian model. We can see that our model better captures the sudden change in HR when the user awakens, and the fact that the time awake is longer than the time asleep. More examples with the fitted circadian model of multiple users can be found in the supplementary materials (stacks.iop.org/PM/40/095001/mmedia). The paired t-test between cosinor RMSE and null hypothesis RMSE returns a non significant difference of means (p-value >0.5).
The loss function L(θ) can be used to calculate the standard error, useful to have an estimation of the distribution of the predicted data. This information could be used to automatically analyse HR activity, finding anomalous data points that could be caused by factors not modelled by circadian effect on RHR, such as physiological conditions.

Conclusion
In this paper we presented a novel mathematical model that can be used to approximate a user's RHR. The model achieves the expressiveness of the multiple component cosinor, i.e. the model fits the data. At the same time the parameters of the model are easy to interpret parameters: MESOR, amplitude and acrophase, the same produced by the single component cosinor model. The model does not require uniformly sampled data, and allows to weight each datapoint independently. Hence, the results showed in this paper suggest that it is possible to accurately predict circadian RHR rhythm by using wrist worn PPG device permitting to make inference about health status of the people at home without the need of expansive devices. Moreover, our model allows the fitting to require a signal with a predetermined shape (a combination of sinusoud functions with fixed amplitudes and phases). We use this characteristic to force the shape of the expected effect of circadian rhythm on RHR, an operation not easily doable with the method used in the literature.
Continuously assessing the evolution of all the parameters of RHR circadian rhythm (i.e. MESOR, amplitude and acrophase) it is possible to make inference about all-cause mortality. In particular, it is possible to evaluate the quality of sleep-wake circadian rhythm from the wrist RHR. For example, low arousability, e.g. circadian rhythm with long sleep time duration, have been found to be a possible risk factor for sudden-infants-death-syndrome (SIDS) (Dvir et al 2018). A home used wrist-worn HR sensors device with an accurate circadian rhythm analysis showing increase in the sleep time duration (low arousability) might be used to prevent SIDS. Moreover, heart diseases could be induced by sleep disorder also in adulthood, i.e. too long (more than 8 h per night) or too short (less than 7 h per night) sleep duration). Hence, an accurate sleep time duration analysis may be used to screen heart diseases using simple home device (Shahar et al 2001).
The model provided in this study is computationally cheap, depending linearly on the size of the data. The computation of the model does not need the full dataset, but only two surrogates (F and S). This implies that the model can be implemented in a streaming approach, where the data is accumulated on the device, with important consequences for security and privacy of the data, that never leaves the user devices. Moreover, this model is, to the best of our knowledge, the only computational model that can be used to predict users RHR and to analyze the data using the parameters traditionally used to describe the circadian modulation of physiological activity. The model we developed goes beyond fitting circadian activity on RHR, and it can be used to fit arbitrary periodic real valued time series, but also vectorial data (e.g. gesture recognition from the accelerometer), or complex data. With an extension to the mathematical model it could be used to fit non periodic data using a linear combination of an arbitrary set of functions.