Deep longitudinal phenotyping of wearable sensor data reveals independent markers of longevity, stress, and resilience

Biological age acceleration (BAA) models based on blood tests or DNA methylation are emerging as a de facto standard for quantitative characterization of the aging process. We demonstrate that deep neural networks trained to predict morbidity risk from wearable sensor data can provide a high-quality and cheap alternative for BAA determination. The GeroSense BAA model was trained and validated using steps per minute recordings from 103,830 one-week-long samples and 2,599 longitudinal samples of up to 2 years in length, and exhibited a superior association with life expectancy over the average number of steps per day in, e.g., groups stratified by professional occupations. The association of the BAA with lifestyle effects and with the prevalence and future incidence of diseases was comparable to that of BAA from models based on blood test results. Wearable sensors allow sampling of BAA fluctuations at time scales corresponding to days and weeks and revealed the divergence of organism state recovery time (resilience) as a function of chronological age. The number of individuals suffering from the lack of resilience increased exponentially with age at a rate compatible with the Gompertz mortality law. We speculate that due to the stochastic character of BAA fluctuations, its mean and auto-correlation properties together comprise the minimum set of biomarkers of aging in humans.


INTRODUCTION
Any advances in personalized and informed lifestyle interventions to promote longevity and health will require reliable and immediate feedback on health status changes in response to treatments. Such capabilities have just recently become available in the form of biological clocks and are increasingly used in the field of quantitative aging research. State-of-art implementations involve machine learning of the associations of the DNA methylation patterns [1] or blood variables [2][3][4] with either the chronological age or risks of death and diseases. The aging clocks have been used in clinical trials of anti-aging interventions [5].
Large-scale biochemical or genomic profiling of biological age acceleration (BAA) is, however, still logistically difficult and expensive. Mobile technology holds great promise for the democratization of population health studies. It already provides engagement tools to help customers maintain physical activity levels and body weight, and adhere to lifestyles known to promote a healthy lifespan. In 2019, one-in-five U.S. adults (21%) reported they regularly used a wearable fitness tracker or smartwatch [6]. Health and home fitness app downloads grew by 46% during the COVID-19 lockdown [7].
In fact, only mobile technology can support large-scale studies involving monitoring of early signs of a disease or measuring recovery rates, all requiring sampling more often than once per week. Recent examples include the analysis of the worldwide distribution of physical activity [8], changes in physical activity levels in response to COVID-19 lockdown [9], and the associations of physical activity and the risks of COVID-19 mortality [10,11]. There are, however, multiple unresolved issues, such as inaccuracies of sensor data, missing data, outliers, varying measurements between devices of different manufacturers, and seasonal variation of physical activity [12,13], all precluding wider acceptance of the wearables signal in population studies.
We applied deep learning technology to systematically address these challenges. We trained and characterized a simple model that learns physical activity patterns from wearable devices, which are directly associated with morbidity risks on the population level. Accordingly, the organism state representation output by this model is a single dynamic variable closely related to BAA. The neural network architecture included components specifically designed to resolve the missing data and solve transferability across platforms. We found that both blood-based and wristband step-counter-based models demonstrated surprisingly similar levels of sensitivity in applications involving BAA associations with diseases and lifestyles. Moreover, the activity-based models' signal-to-noise ratio could be improved by averaging over longer motion tracks. After just a few months of averaging, the activity-based model applied to a wristband signal may detect the effects of chronic diseases and smoking at the same level of significance as blood-based PhenoAge from [2] and Dynamic Organism State Indicator (DOSI) from [4]. The same finding held for the association of BAA with the incidence and severity of seasonal infectious diseases (including COVID-19).
Finally, we investigated the auto-correlation properties of the BAA fluctuations. The diverging autocorrelation times are typical for systems approaching tipping or disintegration point [14] and a hallmark of aging [15,4]. Accordingly, we observed vanishing recovery rate and the exponentially increasing fraction of individuals with long recovery times in subsequent age cohorts. The number of non-resilient individuals doubled every 8 years, which is compatible with the mortality rate doubling time characteristic to the Gompertz mortality law [16]. We conclude that due to the inherent stochastic character of BAA fluctuations, the BAA mean and the BAA autocorrelation time (the resilience) are the two most basic and independent health indicators, closely related to aging and human mortality.

Biological age predicts morbidity and mortality
We trained the GeroSense system, a deep artificial neural network (Figure 1), to extract health-associated features from the physical activity recordings. The system included the encoder part, which took the input in the form of a series of step count per minute measurements at least one week long and compressed the signal into 4-dimensional representations (embeddings). During the training and test procedure, we used one-week-long samples of steps per minute recordings for 97,320 UK Biobank and 6,510 NHANES participants along with samples of recordings from longitudinal data obtained for 1,876 smartphone and 723 smartwatch users. The embedding vectors were further fed into the domain-adaptation network, trained to reduce the difference between the feature distributions in samples originating from different devices. In such a way, we were able to produce the most common features present in the motion data.
At the top layer, log-linear proportional hazards models of all-cause mortality are natural tools to build the biological age acceleration models, see, e.g., the PhenoAge model [2,17]. If, however, the number of observed events is small, a simple logistic regression model provides an excellent approximation to the solution of the corresponding proportional hazards [18,19]. Therefore, in the present study, we trained the neural network using cross-entropy loss to predict binary labels: the prevalence of at least one chronic disease. Overall, we labeled events for 23% and 29% samples in NHANES and UKB, respectively (see Materials and Methods section "Morbidity status" for the precise definition).
The model's output was the Biological Age Acceleration (BAA), estimated once per each seven days and calculated as the linear combination of the physical activity signal embeddings and biological sex label. During the training procedure, BAA was added to the chronological age of each participant to produce biological age followed by sigmoid activation layer and cross-entropy loss on the prediction of morbidity status.
To control for over-fitting, we split all data into training and test subsets. The quality of GeroSense BAA for predicting the morbidity status was similar in training and test subsets in both NHANES ( Figure 2A) and UK Biobank ( Figure 2C) with ROC AUC 0.60−0.61 in test subsets.
We also expected the high concordance between the mortality and morbidity predictors [20]. Accordingly, we tested the ability of the model to predict future mortality events (see Figure 2B, 2D for the summary of the GeroSense BAA model performance in NHANES and UKB datasets, respectively). The scoring performance was similar to that of morbidity status and yielded ROC AUC 0.60−0.62 in test subsets.

AGING
BAA and the life expectancy in professional occupation groups
BAA from the network was superior to average daily physical activity-based BAA in scoring life expectancy in various professional occupations. The number of steps per day averaged over a sufficiently long period is an easy-to-understand and adjustable parameter that predicts mortality and morbidity [20]. This can be readily seen in Figure 2, where the negative logarithm of the number of daily steps (nloga) has all the properties required of BAA. However, the average physical activity obviously cannot be a good biological age measure. It is strongly affected by social factors and working schedule and therefore has a poor correlation with life expectancy across countries [8] and between groups of different professional occupations (Figure 3A).
[Figure 1 caption, opening words lost: ... per day based on step counts recorded by wearable or mobile device sensors using each individual's week-long physical activity tracks. The network components responsible for the feature extraction and BAA output are shown in green. BAA can be predicted for any sample of arbitrary length exceeding one week. For example, BAA on day 10 is predicted using the step counts data coming from day 4 through day 10, and so forth. Shown in red are the network components used only during the training procedure. One is the discriminator responsible for domain adaptation between e.g. smartphones and smartwatches. The other is the class predictor based on the log-odds ratio trained to predict morbidity binary status for UK Biobank and NHANES.]
Notably, the GeroSense system produced BAA from wearable sensors data, which properly ranked professional occupation groups in NHANES according to both genders' empirical life expectancy ( Figure 3B). We did not have access to and hence could not test the association of physical activity and lifespan data across countries. Therefore, GeroSense BAA's ability to score the life-expectancy of populations of different countries remains an open issue.

Cross-platform transferability of BAA and seasonal variations
The embeddings of physical activity tracks depend on the signal source, whether it is a smartphone or a smartwatch. Deep neural networks are powerful feature-extraction tools and a proper choice to address this issue. We employed the domain adaptation network minimizing the feature-wise Kullback-Leibler divergence loss between samples originating from different devices during the training procedure. The problem is akin to batch effect removal. The proposed procedure helped the GeroSense network to learn the most common features between UKB, NHANES, and samples obtained from iPhone and Apple Watch.
Seasonal changes affect blood parameters [21], and physical activity patterns recorded by wearables [12]. The seasonal variations of the activity patterns may be an additional source of unwarranted fluctuations of the biological age estimates. We applied another Kullback-Leibler divergence minimization to penalize pair-wise differences in distributions of features for UK Biobank samples collected in the summer and winter.
The domain adaptation worked well: BAA level distributions were almost indistinguishable between the samples originating from smartphones and smartwatches (p=2E−5). In contrast, the levels of negative logarithm of average physical activity were much more different (p=2.7E−80). The difference was expected but is still striking since we analyzed the smartphone and smartwatch data from the same users.
The results of statistical testing (p-values) strongly depend on the sample size. That is why, here and in all the following examples, we report p-values obtained for the same maximum size of 500 in each group. The p-values themselves are calculated using Fisher's combined probability test (see details in Materials and Methods section).
Notably, there was a very significant drop in the physical activity levels during the COVID-19 pandemic lockdown in March through May 2020 as compared to the same period in 2019 (p<1E−30 for nloga). This was consistent with what was reported earlier [9]. In contrast, the increase in BAA was much less significant (p>1E−10). This may indicate that BAA responded to the lockdown more weakly than the decrease in physical activity would suggest, see Figure 4F. Moreover, this was in contrast to the improved ability of BAA to predict future risks of COVID-19 incidence and mortality rates in UKB as compared to nloga.
The decreased average level of physical activity (nloga) was associated with the increased COVID-19 risk in UKB [11], although it was not clear whether this was an effect of chronic disease burden (also known for its association with increased BAA). In Figure 4 we report that the excess BAA predicted the increased risk of COVID-19 incidence (for example, HR=2.4, p=4E−2 for the 16 UKB subjects who died from the disease) in the subset of randomly sampled 500 UK Biobank participants free of chronic diseases at the time of measurements (2013−2015).

Side-by-side comparison of motion data- and blood-based aging clocks
We compared the performance of different BAA models for stratification of cohorts of NHANES participants of various lifestyles and health status. We have already seen in physical activity data [22] that the disease and smoking labels are associated with elevated BAA among individuals without chronic diseases. In our tests, the sensitivity of the BAA derived from blood markers was comparable to that of the self-reported questionnaire. GeroSense BAA performed consistently well in the same set of tests and conditions, see Figures 5 and 6. Estimation of the BAA from wearable sensors has an advantage over blood-based models. It arises from its ability to further improve the signal-to-noise ratio by averaging over sufficiently long motion data streams. We demonstrated this with self-reported morbidity and smoking status. Averaging of GeroSense BAA predictions over tracks a few weeks long led to a dramatic improvement of association between the BAA and morbidity/smoking status (Figure 7A). As expected, the sensitivity of the model was comparatively lower once we used smartphones instead of wristbands as the source of the data (Figure 7B) but also improved upon averaging over several weeks.

Longitudinal analysis of BAA fluctuations reveals age-dependent loss of resilience
BAA reversibly depends on lifestyles. Hence, BAA is a dynamic variable more characteristic of stress rather than aging and responding to random organism state perturbations in a stochastic manner. We used longitudinal tracks of step counts from Fitbit devices and calculated the autocorrelation function for every user.
The autocorrelation function decayed exponentially. Accordingly, we carried out the exponential fit to infer the autocorrelation time as a measure of recovery rate or resilience. This quantity is a natural quantitative measure of an organism's ability to recover its equilibrium state after stress.
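The estimate described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names, the `dt` spacing parameter, and the choice to fit a straight line through the origin to the logarithm of the autocorrelation function are all assumptions made here.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    norm = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / norm
                     for k in range(max_lag + 1)])

def recovery_time(acf, dt=1.0):
    """Fit acf(lag) ~ exp(-lag / tau) and return tau (in units of dt).

    Assumption: a least-squares line through the origin is fitted to
    log(acf), using only the leading run of positive values.
    """
    acf = np.asarray(acf, dtype=float)
    nonpositive = np.flatnonzero(acf <= 0)
    n = nonpositive[0] if nonpositive.size else acf.size
    if n < 2:
        raise ValueError("need at least two positive autocorrelation values")
    lags = np.arange(n) * dt
    slope = np.dot(lags, np.log(acf[:n])) / np.dot(lags, lags)
    if slope >= 0:
        raise ValueError("autocorrelation does not decay")
    return -1.0 / slope

# Usage on a hypothetical BAA track sampled once per day:
# tau_days = recovery_time(autocorrelation(baa_track, max_lag=60), dt=1.0)
# recovery_rate = 1.0 / tau_days
```

The recovery rate plotted against age is then the inverse of the fitted decay time.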
The characteristic decay time was typically in the range of a few weeks and increased with age. Figure 8A shows the dependence of the recovery rate (the inverse auto-correlation time) on chronological age. The graph was produced by averaging over age-stratified cohorts and closely resembles what we have previously reported for the blood-based marker DOSI [4]. The recovery rate decreased approximately linearly with age, indicating the effective loss of resilience at some age exceeding 100 y.o. The same extrapolation would suggest that the recovery time increases approximately hyperbolically and would diverge at the same age, indicating the complete loss of resilience and of the dynamic stability of the organism state.
To further investigate the relationship between resilience and aging, we identified individuals who failed to recover quickly under stress. We established a somewhat arbitrary resilience cutoff corresponding to the recovery time exceeding 3 weeks. The fraction of such "non-resilient" individuals increased exponentially as a function of age (see Figure 8B). Moreover, this growth demonstrated a characteristic exponential rate of 0.087 per year (a doubling time of about 8 years), which was close to the mortality rate doubling time according to the Gompertz mortality law.
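For reference, the rate of 0.087 per year quoted above can be converted into a doubling time, assuming it is the exponent of the exponential growth. The symbols f, alpha, and t_2 are introduced here for illustration only.

```latex
f(t) \propto e^{\alpha t}, \qquad \alpha = 0.087\ \mathrm{yr}^{-1}
\quad\Longrightarrow\quad
t_{2} = \frac{\ln 2}{\alpha} \approx \frac{0.693}{0.087\ \mathrm{yr}^{-1}} \approx 8.0\ \mathrm{yr}
```

This agrees with the statement that the number of non-resilient individuals doubled every 8 years.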

DISCUSSION
We report the development and characterization of a deep neural network model trained to quantify the state of human health from the analysis of intraday physical activity tracks collected by consumer wearable devices (including mobile phones). The quantity has properties of biological age acceleration (BAA): it is associated with chronic diseases and life-shortening lifestyles, predicts the risks of death and future incidence of chronic diseases in cohorts of individuals free of chronic diseases [4].
Deep neural networks are natural tools for learning nontrivial and highly non-linear representations of the input data. Convolutional and recurrent networks have been used for the analysis of intraday physical activity data streams from wearable devices and predictive modeling of health outcomes [23] including biological age [17,24]. Often such models demonstrate a moderate improvement in accuracy at the price of decreased transferability across datasets with different baseline feature levels. This is, of course, the well-known batch effect problem in large-scale studies in biology [25], which is often aggravated by feature-rich deep learning architectures [26,13].
GeroSense BAA model employs additional neural network components to address this domain shift problem to ensure learning device-independent representations of the input signal. To achieve this goal, we imposed an additional loss in the course of training to penalize model parameters if distributions of learned representations were too far apart for data from different domains (devices). Without such a domain adaptation, the properties of the signal may indeed be very different even in the same biological context. For example, the (log-scaled) average number of daily steps recorded by phone was significantly lower (p=2.7E−80) than that by the smartwatch in the data from the same users. GeroSense BAA network successfully resolved this batch effect and yielded essentially indistinguishable BAA distributions for the same population (p=2E−5).
The average activity level recorded by the same device in a group of people of the same gender, professional occupation, and country of residence is already an excellent and popular proxy to biological age. The association between the mean activity and health is robust and hence is the basis for the popular recommendation to take a minimum of 10,000 steps a day [27]. However, the average activity level is highly context-dependent, which is why it is poorly associated with life expectancy across countries [8]. In our study, we demonstrate that the average activity is incorrectly (negatively) associated with the life expectancy across professional occupation groups ( Figure 3A).
The device-independent features from intraday physical activity patterns from the GeroSense network are still associated with health but decoupled from the mean activity. The procedure did not undermine the predictive power of GeroSense model, as we could see from the BAA association with mortality events (Figure 2). GeroSense BAA was superior in scoring life expectancy in professional occupation subgroups ( Figure 3B). This feature of the model should be useful in applications involving health risk assessment and life insurance applications.
Biological clocks based on mortality risk, including GeroSense BAA, are associated with the prevalence of chronic diseases ( Figure 5) and life-shortening lifestyles, such as smoking, in a reversible way ( Figure  6). This is totally consistent with earlier observations of the effect of smoking on physical activity [20], blood markers [4,22], and DNAm PhenoAge [2].
In NHANES cohorts, the GeroSense model produced the association between the BAA and the morbidity and smoking labels at the significance level matching that of the BAA calculated based on self-reported health questionnaire [22], blood test-based bioage including CBC only [4], and blood biochemistry [22], and Phenotypic Age [2].
The longitudinal character of motion data provides a natural way to improve the signal-to-noise ratio by averaging over sufficiently long tracks (see Figure 7A, 7B). This may be critical for mobile phone applications since the step counts recorded by phones suffer from missing data whenever a device is idle and is not recording the user's walks. Our analysis suggests that GeroSense BAA from smartphones can be averaged to a useful level once at least a few months of data are available for an individual. The inferior performance of the biological age model in smartphone data can be compensated by the wider population coverage of smartphones compared to that of wristband wearable devices. The smartphone motion data can be used for truly large-scale epidemiological studies involving cohort comparisons. The latter factor might prove important for mitigating the issues of non-representative datasets due to possible income/health status [13,28] and already observed enrollment biases [29].

We observed that GeroSense BAA is also associated with the incidence of non-chronic diseases. This is consistent with earlier observations of the association of lower physical activity levels and risks of COVID-19 infection [10,11], although it was not clear whether this is an effect of chronic diseases, also negatively affecting mobility. GeroSense BAA was better associated with the incidence of COVID-19 than the average physical activity level in UKB among a sub-population of individuals free of chronic health conditions (Figure 4).
The average physical activity dropped worldwide in 2020 in the course of COVID-19 lockdown [9]. We also observed a significant change in (log-scaled) number of daily step counts in our data, but not in GeroSense BAA during March-May 2020 as compared to the same period in 2019. We provided evidence suggesting that GeroSense BAA more efficiently scores those at risk of getting an infection than the physical activity level. The effects of lockdown on morbidity risk may be smaller than one could expect simply by monitoring the drop of the activity. Further studies including direct association with epidemiological data are required to test this hypothesis.
The idea of reducing complex biological signals to as little as one variable, the BAA, in relation to the current or future health arises from the effectively low dimensionality of physiological systems. Typically, physiological and behavioral responses manifest themselves as highly coordinated changes in physiological variables, such as blood tests [4] or daily physical activity patterns [20]. The concordance between the physiological indices is expected to increase late in life, as the range of the fluctuations and the organism state recovery time effectively diverge at advanced ages, indicating a maximum attainable lifespan [4]. On the contrary, the number of the relevant variables is expected to increase if we turn to the characterization of the organism state variation at a higher sampling rate. This might be the case for a situation involving response to an acute illness on time scales of days or a few weeks [30], such as increased RHR during fever [31,32] or change in sleep patterns as potentially a COVID-19 specific signal [33,34].
The quantitative characterization of the dynamic properties of BAA fluctuations or recovery processes requires a reliable determination of baseline BAA. This task may be hampered by seasonal variation of the physiological state variables, such as blood tests [21,35], blood pressure [36], resting heart rate [37], and of course physical activity [12]. High-quality research studies acknowledge this problem and adjust for baseline oscillations [12,28]. Such corrections are straightforward for relatively short time scales involved in acute respiratory illnesses [30] or post-operative recovery [38].
Unfortunately, proper adjustments are not always possible in practice. Health outcomes associated with BAA may be years apart from the time (and hence the season) of observations [19]. Otherwise, the time of measurements may be available at poor granularity. For example, NHANES provides publicly only the binary labels corresponding to the winter-spring (November-April) or summer-fall (May-October) seasons.
We trained the GeroSense BAA model with an additional loss penalizing the winter-summer distribution difference. In such a way, the model output is decoupled from seasonal variations and yet demonstrated pretty good performance in ranking health outcomes. We expect that this feature of GeroSense BAA will be handy for practical applications.
The longitudinal character of motion data allows the investigation of organism state fluctuations in response to natural stresses and diseases. We computed autocorrelation functions of GeroSense BAA along the individual BAA trajectories. The recovery rate measured as the inverse decay time of the autocorrelation function demonstrated an age-dependent decrease (Figure 8A). Extrapolation to advanced ages shows that the recovery rate vanishes (and hence the resilience formally diverges) at some age exceeding 100 years, which may be an indication of limiting lifespan [4].
The recovery time among the healthiest individuals was in the range of a few weeks. We used a somewhat arbitrary cutoff corresponding to a recovery rate below 1/3 per week (a recovery time exceeding 3 weeks) and marked individuals with longer recovery times as those who lack resilience. We observed a progressive exponential increase of the fraction of non-resilient persons in the population with age (Figure 8B). This number doubled every 8 years, which is close to the mortality rate doubling time in the Gompertz mortality law for the human population [16].
Long auto-correlation times of state fluctuations are typical for complex systems approaching a tipping point or in the process of disintegration [14] and represent a hallmark of aging [15,4]. Case fatality rates (CFR) accelerate with age in the case of COVID-19, stroke, and probably other diseases. The characteristic doubling time in the case of COVID-19 is reported in [39] as 6−9 years. Our estimation from the figure in [40] yielded ≈10 years for the doubling time for one-year survival of stroke patients. The physiology of stroke and infectious diseases is apparently very different. The similarity of patterns of age-dependence of CFR is intriguing and may suggest that the loss of resilience may be a good marker of the approaching loss of dynamic stability of an organism and hence a major and universal contributing factor to the fatality.
The reversible character of the association between mortality risk-based BAA and unhealthy lifestyles (such as smoking) suggests that BAA is not a biomarker of aging but instead is a measure of the overall stress level. BAA's dependence on age in large cross-sectional datasets is a marker of stress imposed by the increasing burden of chronic diseases. The high sampling rate achievable by the motion data lends us a richer set of biomarkers associated with age. Aside from the average BAA level, the continuous data collected by wearable sensors provides a practical opportunity to investigate the autocorrelation and variance properties of BAA fluctuations, which are independent organism state variables, each uniquely informing about the user's health state. We can hardly imagine a large-scale blood test study involving sampling more often than once a month or so for healthy people. Therefore, the motion data analysis exemplified here is the only technology currently up to the task.
Wearable device motion data have already been used for monitoring acute illnesses including detection of early signs of the outbreak of influenza-like illnesses [28] and COVID-19 [30,34]. Application of motion data, including the wider deployment of the GeroSense system described here, should provide means to monitor levels of stress and resilience in response to environmental conditions or interventions on a population level in different countries and socioeconomic groups in future studies. We hope that future developments will lead to further applications of AI in geroscience research, public health, and policy decision-making.

UK Biobank
Physical activity for UK Biobank participants aged 40−80 y.o. (54,777 female and 42,543 male) was measured by Axivity AX3 tri-axial accelerometers worn on the wrist for one week. We converted 100Hz raw acceleration measurements to step counts per minute to fit the format of data in other datasets used in this study. The number of steps during each consecutive minute was counted as the number of peaks of the absolute value of acceleration exceeding 1.3g. To ensure the local noise does not affect the result, only one peak (the highest) was counted in each 480ms sliding window with a step of 160ms resulting in at most 3 step counts during each 960ms.
Steps closer to each other than 90s were combined into walking bouts and bouts with less than 5 steps in total were discarded.
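The conversion from raw acceleration to steps per minute can be sketched as follows. This is an approximation under stated assumptions, not the authors' code: SciPy's `find_peaks` with a minimum peak separation stands in for the 480 ms sliding-window rule, the "absolute value of acceleration" is taken to be the vector magnitude in units of g, and all function and constant names are hypothetical. A separation of 32 samples (320 ms at 100 Hz) is chosen so that at most 3 peaks fit in any 960 ms span.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 100              # Hz, sampling rate stated in the text
THRESHOLD_G = 1.3     # peak height threshold stated in the text
MIN_SEP = 32          # samples (320 ms); assumption standing in for the window rule
MAX_GAP_S = 90.0      # steps closer than this belong to the same bout
MIN_BOUT_STEPS = 5    # bouts with fewer steps are discarded

def detect_steps(acc_xyz):
    """Return step times (s) from an (n, 3) array of acceleration in g."""
    magnitude = np.linalg.norm(acc_xyz, axis=1)
    # find_peaks keeps the higher peak when two are closer than MIN_SEP
    peaks, _ = find_peaks(magnitude, height=THRESHOLD_G, distance=MIN_SEP)
    return peaks / FS

def filter_bouts(step_times):
    """Group steps into walking bouts and drop bouts that are too short."""
    step_times = np.sort(np.asarray(step_times, dtype=float))
    if step_times.size == 0:
        return step_times
    breaks = np.flatnonzero(np.diff(step_times) >= MAX_GAP_S) + 1
    bouts = np.split(step_times, breaks)
    kept = [b for b in bouts if b.size >= MIN_BOUT_STEPS]
    return np.concatenate(kept) if kept else np.array([])

def steps_per_minute(step_times, n_minutes):
    """Count steps in each consecutive minute of the recording."""
    minutes = (np.asarray(step_times) // 60).astype(int)
    return np.bincount(minutes, minlength=n_minutes)[:n_minutes]
```

A typical call chain would be `steps_per_minute(filter_bouts(detect_steps(acc)), n_minutes)`.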

NHANES
Physical activity for NHANES 2005−2006 participants was used in the form of step counts per minute collected by ActiGraph AM-7164 single-axis accelerometer worn on hip. Data were retrieved from the file "Physical Activity Monitor" of the "Examination data" category. Samples for 3,362 female and 3,148 male participants aged 6−85 y.o. were used.

Healthkit
Physical activity for users of Gero app aged 45−75 y.o. (464 female and 1,412 male users of smartphone, 125 female and 598 male users of smartwatch) was obtained from Healthkit. Raw activity data comprised the number of steps recorded by either smartphone or smartwatch during a time period with start and end timestamps and was resampled to equispaced time series of steps per minute.
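The resampling step can be illustrated with a short sketch. The text does not say how steps within a record were distributed over time, so the code below assumes they are spread uniformly between the start and end timestamps. The function name and the `(start, end, steps)` record layout are hypothetical.

```python
import numpy as np

def resample_to_minutes(records, t0, n_minutes):
    """Spread interval step counts over an equispaced per-minute grid.

    records: iterable of (start_s, end_s, steps), times in seconds.
    t0: time in seconds of the first minute bin.
    Assumption: steps are distributed uniformly within each interval.
    """
    out = np.zeros(n_minutes)
    for start, end, steps in records:
        if end <= start:
            continue  # zero-length records carry no usable timing
        first = int((start - t0) // 60)
        last = int(np.ceil((end - t0) / 60))
        for m in range(max(first, 0), min(last, n_minutes)):
            lo = max(start, t0 + 60 * m)
            hi = min(end, t0 + 60 * (m + 1))
            out[m] += steps * (hi - lo) / (end - start)
    return out
```

Minutes that no record covers stay at zero, which is one possible convention for the idle-device gaps discussed elsewhere in the text.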

Morbidity status
Binary morbidity status for the Healthkit dataset was assigned according to response to the survey question "Have you ever been told you have one of the following: diabetes, hypertension, cancer, coronary heart disease, heart failure, heart attack, or stroke?" Binary morbidity status of NHANES and UK Biobank participants was assigned according to the presence of at least one of those diagnoses. We used NHANES data on health condition and age at diagnosis available in the questionnaire category "Medical Conditions" (MCQ). Data on diabetes and hypertension was retrieved additionally from questionnaire categories "Diabetes" (DIQ) and "Blood Pressure and Cholesterol" (BPQ), respectively. For UK Biobank we aggregated ICD10 (block level) data to match that of NHANES and used the following ICD10 codes to cover the health conditions in UK Biobank: diabetes (E10-E14), hypertension (I10-I15), cancer (C00-C99), coronary heart disease (I20-I25), congestive heart failure (I50), myocardial infarction (I21, I22), and stroke (I60-I64).
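The ICD10 ranges listed above translate directly into a lookup. The following sketch shows one way to assign the binary label from a list of diagnosis codes. The code ranges come from the text, while the function names and the handling of code formatting (dots, case) are assumptions.

```python
# ICD10 three-character ranges taken from the text
ICD10_RANGES = {
    "diabetes": ("E10", "E14"),
    "hypertension": ("I10", "I15"),
    "cancer": ("C00", "C99"),
    "coronary heart disease": ("I20", "I25"),
    "congestive heart failure": ("I50", "I50"),
    "myocardial infarction": ("I21", "I22"),
    "stroke": ("I60", "I64"),
}

def icd10_category(code):
    """Reduce a code such as 'I21.4' to its three-character category 'I21'."""
    return code.strip().upper().replace(".", "")[:3]

def conditions_for(code):
    """Names of the listed conditions whose range contains this code."""
    category = icd10_category(code)
    return [name for name, (lo, hi) in ICD10_RANGES.items()
            if lo <= category <= hi]

def morbidity_status(codes):
    """1 if at least one listed chronic condition is present, else 0."""
    return int(any(conditions_for(code) for code in codes))
```

Note that the myocardial infarction codes fall inside the coronary heart disease range, so such a code matches both conditions. This does not affect the binary label.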

Life expectancy
Empirical life expectancy from birth was determined for professional occupation groups using linked death register follow-up data for NHANES 2005−2015 surveys. To do that we fitted the parameters of the Gompertz likelihood adopted from [41].

Statistical analysis
Statistical analysis of the association of various Biological Age measures with morbidity/smoking status was performed using two-sided Mann-Whitney test. To ensure the reported p-values are comparable between tests we used the same cutoff of maximum of 500 samples in each test with 100 random samplings followed by combining p-value according to Fisher's combined probability method [42]. All statistical tests were carried out using the python package SciPy (version 1.5.2).
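A minimal sketch of this procedure using SciPy is given below. The function name and the seed are hypothetical, and the text does not state whether or how the combined statistic was normalized. Fisher's method assumes independent tests, which repeated subsamples of the same groups are not, so the combined value is best read as a comparable score rather than an exact p-value.

```python
import numpy as np
from scipy.stats import mannwhitneyu, combine_pvalues

def capped_pvalue(group_a, group_b, max_n=500, n_draws=100, seed=0):
    """Mann-Whitney p-value with group sizes capped at max_n.

    Each draw subsamples both groups without replacement, runs a
    two-sided Mann-Whitney test, and the per-draw p-values are merged
    with Fisher's combined probability method.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pvalues = []
    for _ in range(n_draws):
        sub_a = rng.choice(a, size=min(max_n, a.size), replace=False)
        sub_b = rng.choice(b, size=min(max_n, b.size), replace=False)
        pvalues.append(mannwhitneyu(sub_a, sub_b, alternative="two-sided").pvalue)
    _, combined = combine_pvalues(pvalues, method="fisher")
    return combined
```

Capping both groups at the same size keeps the reported significance levels comparable between tests run on cohorts of very different sizes.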

Blood tests-based biological age models
In this work, we used blood tests-based biological age models trained using Cox proportional hazards approach in NHANES mortality follow-up data and reported elsewhere earlier. The Blood CBC (DOSI) model was trained using log-scaled values of hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin concentration, red blood cell distribution width, red blood cell, platelet, neutrophil, lymphocyte, monocyte, and eosinophil counts as well as biological sex label [4]. The Blood Biochemistry model additionally included age, and log-scaled values of C-reactive protein, albumin, alkaline phosphatase, gamma-glutamyl transferase, globulin, and serum glucose [22]. The Blood PhenoAge model was based on age, albumin, creatinine, serum glucose, log-scaled C-reactive protein, lymphocyte percent, mean cell volume, red cell distribution width, alkaline phosphatase, and white blood cell count [2].

Neural network architecture
Deep neural network architecture is schematically shown in Figure 1. Wearable data is input in the form of a continuous array of steps per minute. The input is immediately converted to a one-hot embedding representation, where each bin corresponds to an increment of 4 steps per minute. Next, the encoded data is processed by a block of 16 1D-convolutional layers, each having 16 filters with a kernel size of 3 and "elu" activation. One in two convolutional layers is followed by a local max-pooling with stride 2, 3 or 5, and each layer is followed by batch-normalization. The output of the convolutional block was 4 features per every 1440 points in the input array, which corresponds to the number of minutes in one day. Finally, the features were subject to a 7 day-long average pooling and linearly combined with binary biological sex label so that the deep neural network was capable of outputting a prediction once per day based on 7 previous days.
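The input encoding and the final pooling stage can be sketched without the convolutional block itself. Several details below are assumptions: the number of one-hot bins is not given in the text (64 is used here), the particular assignment of the eight pooling strides is just one combination whose product is 1440, and the weights in `weekly_baa` are hypothetical placeholders for trained parameters.

```python
import numpy as np

BIN_WIDTH = 4   # steps per minute per bin, from the text
N_BINS = 64     # assumption: the number of bins is not stated

def one_hot_steps(steps_per_minute):
    """One-hot encode a steps-per-minute series, shape (T, N_BINS)."""
    bins = np.asarray(steps_per_minute) // BIN_WIDTH
    bins = np.clip(bins, 0, N_BINS - 1).astype(int)
    return np.eye(N_BINS)[bins]

# Assumption: 16 layers with pooling after every second one gives 8
# pooling stages; these strides reduce 1440 minutes to a single point.
POOL_STRIDES = [2, 2, 2, 2, 2, 3, 3, 5]

def weekly_baa(daily_features, sex, weights, sex_weight, bias=0.0):
    """Average 4 daily features over 7 trailing days, then combine linearly.

    daily_features: (n_days, 4) array; returns n_days - 6 daily outputs.
    """
    f = np.asarray(daily_features, dtype=float)
    csum = np.vstack([np.zeros((1, f.shape[1])), np.cumsum(f, axis=0)])
    pooled = (csum[7:] - csum[:-7]) / 7.0
    return pooled @ np.asarray(weights, dtype=float) + sex_weight * sex + bias
```

With this layout the first output appears on day 7 and each later output shifts the 7-day window by one day.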
The output of the deep neural network was interpreted as the Biological Age Acceleration (BAA) expressed in years of healthy life expectancy gained or lost. To guarantee this, during the supervised training of class label predictor we obtained the value of the Biological age of each NHANES and UK Biobank user by adding the network output (BAA) to the chronological age. The Biological age was then subject to sigmoid activation and fitted to binary morbidity status label, assuming that such procedure is an approximation to fitting proportional hazards model [18,19].
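The supervised objective described above amounts to a logistic loss on the biological age. A minimal sketch is given below. The `scale` and `offset` parameters are hypothetical, since the text does not specify how years of biological age are mapped to log-odds.

```python
import numpy as np

def morbidity_loss(baa, age, label, scale=1.0, offset=0.0):
    """Mean binary cross-entropy of sigmoid(scale * (age + BAA) + offset).

    baa, age: arrays in years; label: 1 for prevalent chronic disease.
    """
    label = np.asarray(label, dtype=float)
    logit = scale * (np.asarray(age, dtype=float)
                     + np.asarray(baa, dtype=float)) + offset
    # cross-entropy written as softplus(z) - y * z, evaluated stably
    return float(np.mean(np.logaddexp(0.0, logit) - label * logit))
```

Because the network output enters only through the sum of age and BAA, a positive output has the same effect on the predicted risk as an equal number of additional years of chronological age.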
The Domain adaptation networks were employed in the form of pairwise Kullback-Leibler divergence loss functions applied to enforce similar feature distributions for samples from UK Biobank on one side and NHANES, HealthKit smartphones, and smartwatches on the other side. Additionally, a domain adaptation was applied to UK Biobank samples collected during summer and winter as well as to samples with up to 3 zero-imputed (missing) days.
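One way to realize a feature-wise Kullback-Leibler penalty between two batches is sketched below. The text does not specify how the distributions were estimated. Here each feature is approximated by a Gaussian and the divergence is symmetrized. Both choices are assumptions, as are the function names.

```python
import numpy as np

def gaussian_kl_per_feature(x, y, eps=1e-8):
    """KL(N(mean_x, var_x) || N(mean_y, var_y)) for each feature column.

    x, y: (n_samples, n_features) batches from two domains.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)
    var_x, var_y = x.var(axis=0) + eps, y.var(axis=0) + eps
    return 0.5 * (np.log(var_y / var_x)
                  + (var_x + (mean_x - mean_y) ** 2) / var_y - 1.0)

def domain_loss(x, y):
    """Symmetrized feature-wise divergence summed over features."""
    return float(np.sum(gaussian_kl_per_feature(x, y)
                        + gaussian_kl_per_feature(y, x)))
```

The same penalty could be applied to any pair of batches, for example two devices or summer and winter samples. It vanishes only when the per-feature means and variances coincide.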
The training procedure was run for 2000 iterations, each batch comprising 256 samples. The class predictor was trained on each iteration for UK Biobank samples and only on one in five iterations for NHANES to avoid potential overfitting since the number of NHANES samples was small. All domain adaptation networks were trained on each iteration. Each network was trained using Adam optimizer as implemented in python package tensorflow-gpu (version 2.3.1) with a learning rate of 1E−3.