LongITools: Dynamic longitudinal exposome trajectories in cardiovascular and metabolic noncommunicable diseases

The current epidemics of cardiovascular and metabolic noncommunicable diseases have emerged alongside dramatic modifications in lifestyle and living environments. These correspond to changes in our “modern” postwar societies globally characterized by rural-to-urban migration, modernization of agricultural practices and transportation, climate change, and aging. Evidence suggests that these changes are related to each other, although the social and biological mechanisms as well as their interactions have yet to be uncovered. LongITools, as one of the 9 projects included in the European Human Exposome Network, will tackle this environmental health equation linking multidimensional environmental exposures to the occurrence of cardiovascular and metabolic noncommunicable diseases.


Introduction
From one generation to the next, there are vicious circles operating among the rising prevalence of cardiovascular and metabolic noncommunicable diseases (CM-NCDs), social inequality, spiraling health care costs, and varying quality of living environments. If and through which mechanisms these processes relate to each other is probably one of the greatest epidemiological questions of the 21st century. Undeniably, we are facing complex sociodemographic and medical challenges that can be conceptualized as a network of highly correlated determinants and risk factors. These factors in turn influence longitudinal health trajectories, ultimately contributing to the risk of CM-NCDs and the consequent economic burden. LongITools is 1 of the 9 projects included in the European Human Exposome Network (EHEN; www.humanexposome.eu). EHEN is funded by Horizon 2020, the European Union (EU) Framework Programme for Research and Innovation, and it represents the world's largest network of projects created to study the impact of environmental exposure on human health. Within the EHEN, LongITools' task is to study the dynamics of environment and cardiometabolic health and develop tools for exposome research. LongITools brings together European longitudinal data, from prospective cohort studies, randomized controlled trials (RCTs), biobanks, and registries, to construct the basis for longitudinal exposome studies.
Keywords: Exposome; Cardio-metabolic and vascular health; Life-course pathways; European research consortium

What this study adds
This consortium profile paper introduces (1) LongITools' scientific concepts that are primarily based on longitudinal modeling; (2) the metadata for the project; (3) the expected impact of the project; and finally (4) the strengths and challenges of this endeavor.
The overarching aim of LongITools is to understand the environmental, biological, and psychosocial dimensions of CM-NCDs, taking life course factors and a longitudinal approach into consideration. LongITools will improve our understanding of how the exposome, i.e., the combined exposures throughout the life course of an individual, contributes to the risk of CM-NCDs.

LongITools concepts
The relationships between environmental risk factors such as air pollution, environmental noise, and urbanization on one hand and the development of CM-NCDs on the other can be conceptualized in several frameworks as exemplified in Figure 1. LongITools is based on a well-established observational relationship between adiposity and increased risk of adverse glycemic, lipid-related, and cardiac functions leading to the development of insulin resistance, hypercholesterolemia, hypertriglyceridemia, and high blood pressure, which in turn together with environmental risk factors are associated with increased risk of CM-NCDs. Various stages of development of CM-NCDs share genetic, biological, lifestyle (unhealthy diet and physical inactivity), environmental, and sociodemographic causes. However, at all stages of the life course, which in LongITools are divided into "early-life," covering the fetal period to childhood, "adolescence" and "adulthood and old age," considerable knowledge gaps remain.
Addressing the relationships in Figure 1, LongITools will look to challenge the following assumptions, which are not mutually exclusive, regarding the role of the exposome in the life course development of CM-NCDs:
• Direct chain of causality: variations in environmental risk factors are causally related to changes in lipid and glycemic trajectories with different relationships to early disease stages and subsequent development of CM-NCDs. Work within LongITools will attempt to combine data from cohorts as proposed by Hughes et al 1 and deploy multiple orthogonal analysis designs to challenge the causal chain, such as by implementing cross-cohort comparisons, Mendelian randomization, and Bayesian path models. 2,3
• Joint effects hypothesis: associations and potentially causal relationships with a single or a set of environmental exposures, reflecting underlying commonality, influence the disease trajectories. This hypothesis can be tested by analyzing how the environmental factors, in isolation or as latent environmental scores, may modify the core relationship between anthropometry, early disease stages, and the onset of CM-NCDs.
• Bidirectional causality hypothesis: the relationships between cardiometabolic health trajectories and onset of CM-NCD promote the deterioration of public health and the environment. The assumption of bidirectionality is supported if mutually promoting risk profiles (i.e., of disease status and environmental exposure) are assessed and demonstrated. In this instance, time series/longitudinal data (with possible crossover events) and bidirectional Mendelian randomization will be used to explore the possible existence of reinforcing feed-forward relationships between disease and environment, e.g., morbidity, sociodemographic patterns, and access and exposure profiles to protective or risky environments.
by environmental factors. This relationship may theoretically occur in reverse, and effort will be put into the examination of apparent interactions, considering challenges in both statistical power and the true origin of apparent interactions.
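As an illustration of the bidirectional Mendelian randomization design named above, the following minimal sketch computes a fixed-effect inverse-variance weighted (IVW) estimate from summary statistics in each direction. The function names and all numbers are hypothetical and chosen only for illustration; this is not the consortium's analysis pipeline, and a real analysis would also address instrument strength, pleiotropy, and sample overlap.

```python
import math


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect over exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error across several variants."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics (illustration only).
# Forward direction: variants instrumenting trait A, effects measured on trait B.
fwd_bx = [0.10, 0.20, 0.15]
fwd_by = [0.05, 0.10, 0.075]
fwd_se = [0.01, 0.02, 0.015]

# Reverse direction: a separate set of variants instrumenting trait B,
# effects measured on trait A.
rev_bx = [0.20, 0.10]
rev_by = [0.002, 0.001]
rev_se = [0.02, 0.02]

forward, forward_se = ivw_estimate(fwd_bx, fwd_by, fwd_se)
reverse, reverse_se = ivw_estimate(rev_bx, rev_by, rev_se)

print(f"A -> B: {forward:.3f} (SE {forward_se:.3f})")
print(f"B -> A: {reverse:.3f} (SE {reverse_se:.3f})")
```

Comparing the two directions, each with its own instruments, is what distinguishes a one-way effect from a mutually reinforcing relationship.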
The source of life course data and the opportunity to make these data findable, accessible, interoperable, and reusable (FAIR) are core components of this consortium. 4 This consortium profile describes the studies involved in LongITools and the FAIR metadata that the project will build and promote. LongITools is coordinated by the University of Oulu in Finland and includes 15 academic and 3 small- and medium-sized enterprise (SME) partners across Europe (eTable 1; http://links.lww.com/EE/A168). The participants complement one another, bringing together the full range of technical and specialist expertise in epidemiology, (epi)genetics, metabolomics, lifestyle, mathematics, economics, policy making, and sensor technology that is required to create a critical mass of expertise for the project.

Who is in the study?
LongITools builds upon and leverages prospective birth cohorts, longitudinal studies in adults, register-based follow-ups, randomized controlled trials (RCTs), patient databases, as well as maternity and hospital biobanks. Currently, these add up to 25 different studies including 11 million individuals across Europe (Table 1). Birth cohorts within the project will provide substantial longitudinal data from pregnancy to adolescence and early adulthood, complemented by prospective adult cohorts, with multiple follow-ups during adulthood and in older age. The RCTs involved are focused on the role of nutrition and physical activity in general health and metabolism. These will not only provide comprehensive biological and exposome profiles of study participants but will also allow an in-depth analysis in more controlled settings. Finally, the involved biobanks will be essential for the generalization of the analyses in large populations. Altogether, the studies involved in LongITools cover the whole life course, represented by blood samples, metadata, and questionnaires of thousands of cohort participants (Figures 2A, B, and 3). The data collected in the studies at different time points are summarized in Table 2 and are available in more detail on the LongITools website (www.longitools.org/about).

How do we study?
To optimize findability, all relevant study metadata, i.e., the available variables in the studies, as well as how they are harmonized to be made interoperable for pooled and meta-analysis, will be made findable and accessible in a MOLGENIS catalogue, 5 linked to the Biobanking and BioMolecular resources Research Infrastructure-European Research Infrastructure Consortium (BBMRI-ERIC) Directory of cohorts and biobanks, 6 and integrated with the existing EU Child Cohort Network Variable Catalogue (https://catalogue.lifecycle-project.eu/) created by the Horizon 2020-funded LifeCycle project. 7 LongITools will use recommendations from the LifeCycle project when possible and will establish new harmonization instructions when needed. By using centrally administered instructions for harmonization, LongITools aims to ease the collaboration between studies. As all studies historically have their own design and data collection protocols, harmonization may not always make optimal use of all data available in each study. However, the increased statistical power in the pooled and meta-analyses will offset the possible loss of detail caused by harmonization. LongITools will use a federated data analysis platform, DataSHIELD, which enables analysis without the need to physically transport the data.
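The idea of centrally administered harmonization instructions can be sketched as a shared rule table that maps each study's own variable name and unit onto one harmonized variable. The study labels, variable names, and conversion divisors below are hypothetical and do not come from the LongITools or LifeCycle catalogues; the sketch only shows the mechanism.

```python
# Hypothetical, centrally maintained harmonization rules: one entry per study.
HARMONIZATION_RULES = {
    "study_a": {"source": "bw_g", "target": "birth_weight_kg", "divisor": 1000.0},
    "study_b": {"source": "birthweight_kg", "target": "birth_weight_kg", "divisor": 1.0},
}


def harmonize(study, record):
    """Map one study-specific record onto the harmonized variable name and unit."""
    rule = HARMONIZATION_RULES[study]
    value = record.get(rule["source"])
    # Missing values stay missing rather than being imputed here.
    harmonized = None if value is None else value / rule["divisor"]
    return {rule["target"]: harmonized}


print(harmonize("study_a", {"bw_g": 3500}))
print(harmonize("study_b", {"birthweight_kg": 3.2}))
print(harmonize("study_b", {}))
```

Because every study applies the same rule table, the harmonized variables can be pooled directly, at the cost of any detail that the shared definition does not capture.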
Federated data analysis approach.
When technically, scientifically, and ethically relevant, LongITools will use DataSHIELD, which was developed as part of the EU-FP7 Biobank Standardisation and Harmonisation for Research Excellence in the European Union Project. 8,9 DataSHIELD enables researchers to analyze data from partner institutions swiftly and securely, respecting the current national and European data protection regulations. To briefly summarize its use, data holders store individual-level data on their own local data warehouse servers and link to the DataSHIELD client portal using the MOLGENIS Armadillo server (https://github.com/molgenis/molgenis-service-armadillo). The connection between the data warehouse and the client portal is restricted so that only analysis commands can pass through from the client portal to the data server and only nondisclosive summary statistics are sent from the data server to the client portal. In this way, analyses using data from multiple studies can be run from a central analyst's computer, thus strongly increasing analysis speed and decreasing administrative load and local analyst time. Each study controls permissions to identified researchers within LongITools to use their data in any analysis.
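The principle described above, where individual-level data stay with the data holder and only nondisclosive summaries travel to the analyst, can be illustrated with a minimal sketch. It does not use the DataSHIELD or Armadillo APIs; the function names, the disclosure threshold, and the data values are hypothetical.

```python
MIN_CELL_COUNT = 5  # hypothetical disclosure threshold, for illustration only


def local_summary(values):
    """Runs at the data holder: returns aggregates only, never individual rows."""
    n = len(values)
    if n < MIN_CELL_COUNT:
        raise ValueError("too few observations to release a summary")
    return {"n": n, "sum": sum(values), "sum_sq": sum(v * v for v in values)}


def pooled_mean_variance(summaries):
    """Runs at the central analyst: combines site-level aggregates."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)  # pooled sample variance
    return mean, variance


# Hypothetical body mass index values held separately at two sites.
site_a = [20.0, 22.0, 24.0, 26.0, 28.0]
site_b = [30.0, 32.0, 34.0, 36.0, 38.0]

mean, variance = pooled_mean_variance([local_summary(site_a), local_summary(site_b)])
print(f"pooled mean = {mean:.2f}, pooled variance = {variance:.2f}")
```

The pooled result equals what an analysis of the combined individual-level data would give, although no site releases more than three numbers.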
What has been and will be measured?
LongITools will use existing pan-European models for air pollution, noise, and green space as established within European projects, such as the European Study of Cohorts for Air Pollution Effects 10,11 (www.escapeproject.eu) and the Effects of Low-Level Air Pollution: A Study in Europe 12,13 (www.elapseproject.eu). The following environmental maps will be linked to the individual residential addresses using a geographical information system:
• Air pollution will be assessed using EU-wide air pollution maps at a fine (100 × 100 m) resolution, which have been developed within the European Study of Cohorts for Air Pollution Effects and Effects of Low-Level Air Pollution: A Study in Europe. These use hybrid land use regression modeling, incorporating surface air quality monitoring, satellite monitoring, chemical transport modeling, and fine scale traffic and land use data;
• Noise estimates will be obtained using harmonized pan-European noise exposure models for traffic noise estimates, extending the existing (metropolitan area) maps to the full European population;
• Green space will be assessed using satellite-based indices of greenness such as the normalized difference vegetation index;
• Built environment will be modeled from the geographical information system and translated into indices of walkability, distances, food and sport outlet density, and accessibility of health care services.
These estimates allow LongITools to compose and study the exposomes throughout the life course. In addition, locally collected exposure data will be applied within RCTs to study the impact of environmental effects and their interaction with intervention target factors in the risk markers of CM-NCD within rather short intervention periods. Although largely available and often collected in a standardized way, the data on environmental exposures have possible intrinsic limitations in terms of (1) availability in historical cohorts such as the Dutch Famine Birth Cohort or the Northern Finland Birth Cohort 1966 and (2) heterogeneity of the source (and/or the effects) between countries. This latter limitation will be addressed by studying in detail the structure of the data representing the environmental exposures.
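For the green space measure, the normalized difference vegetation index is computed per pixel from near-infrared (NIR) and red reflectance as (NIR − Red) / (NIR + Red). The sketch below applies that formula and averages it over the pixels around a residence. The reflectance values and the use of a simple mean over a buffer are assumptions for illustration, not the project's exposure assignment procedure.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel; None if undefined."""
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


def mean_ndvi(pixels):
    """Average NDVI over (nir, red) pixels, skipping undefined ones."""
    values = [v for v in (ndvi(nir, red) for nir, red in pixels) if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical reflectance pairs for pixels within a buffer around one address.
buffer_pixels = [(0.50, 0.10), (0.30, 0.30), (0.45, 0.15), (0.0, 0.0)]
print(mean_ndvi(buffer_pixels))
```

Values close to 1 indicate dense green vegetation, values near 0 indicate bare or built surfaces, and negative values typically correspond to water.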

Internal exposures.
LongITools will analyze the molecular pathways underlying the associations of environmental exposures and cardiometabolic health trajectories by using repeated measures of the internal exposome.
• Epigenomics will be studied by using DNA methylation, which has been measured by the Illumina Infinium Human Methylation 450K BeadChip and MethylationEPIC BeadChip platforms;
• Transcriptomics measures are based on Illumina or Affymetrix arrays and RNA sequencing. For example, we will use transcriptome data from the RCTs Foods for Weight Maintenance (ELIPA), Fat Quality on Blood Lipids and Immune Response (NOMA), and Health Grain Intervention (SYSDIMET) to analyze how air pollution, noise, and the built environment may exert their effects on health via changes in specific gene expression;
• Metabolomics will be studied using nuclear magnetic resonance or liquid chromatography mass spectrometry-based platforms with methods enabling coverage of a wide repertoire of both endo- and exogenous metabolite classes including amino acids, bile acids, steroids, various lipid classes, microbiota-produced metabolites, diet-derived compounds, and xenobiotics. 14,15 Nontargeted metabolic profiling will be used to explore the connections between circulating metabolites and the exposure variables, providing a metabolic snapshot of the exposome with unique opportunities for molecular epidemiology. 16,17 These analyses will result in semiquantitative detection of thousands of metabolite features, of which approximately 1000 will be identified a priori. Unidentified metabolites of interest detected from data analysis 18,19 will be identified using state-of-the-art tools and pipelines.
Information about the availability of the omics data in each LongITools study can be found in Table 2.

Health trajectories.
LongITools will use longitudinal, life course modeling throughout its analyses. LongITools will study how the exposome, linked to geocodes from an individual's birthplace or residential location, is associated with the following cardiovascular and metabolic health trajectories, i.e., the 4 main outcome phenotypes of LongITools:
• Anthropometric trajectories, identified using height and weight measures in infancy, childhood, and adolescence, by using longitudinal growth data or latent trajectory modeling supported by adiposity milestones, such as adiposity peak, adiposity rebound, body mass index at puberty, and life course body mass index trajectories;
• Glycemic health trajectories, identified using repeated measures of glycemic health, such as fasting glucose, fasting insulin, glycosylated hemoglobin, diabetes diagnosis, and diabetes medications;
• Cardiovascular health trajectories, identified using repeated measures of blood pressure, heart rate, indices of cardiac structure and function, cardiac diagnoses, and cardiovascular medications;
• Lipid-related health trajectories, identified using repeated measures of blood lipids, lipoproteins, and related medications.
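The adiposity milestones named in the anthropometric trajectories can be illustrated with a deliberately simple sketch that reads the adiposity peak (the infancy maximum of body mass index) and the adiposity rebound (the later minimum before body mass index rises again) directly from repeated measurements. The 2-year infancy cutoff, the function name, and the data are assumptions for illustration; in practice such milestones are usually derived from fitted growth curves rather than raw observations.

```python
def adiposity_milestones(ages, bmis, infancy_end=2.0):
    """Return the age and BMI at adiposity peak and rebound from raw measurements.

    ages are in years; infancy_end is an assumed cutoff for the peak search.
    The rebound is reported only if BMI is observed to rise again afterwards.
    """
    pairs = sorted(zip(ages, bmis))
    infancy = [p for p in pairs if p[0] <= infancy_end]
    if not infancy:
        return {"peak": None, "rebound": None}
    peak = max(infancy, key=lambda p: p[1])
    later = [p for p in pairs if p[0] > peak[0]]
    rebound = None
    if later:
        lowest = min(later, key=lambda p: p[1])
        # A minimum at the last visit means no rebound has been observed yet.
        if lowest != later[-1]:
            rebound = lowest
    return {"peak": peak, "rebound": rebound}


# Hypothetical measurements for one child (age in years, BMI in kg/m^2).
ages = [0.25, 0.75, 1.5, 3.0, 5.0, 6.0, 8.0, 10.0]
bmis = [15.0, 17.5, 16.8, 15.9, 15.2, 15.4, 16.3, 17.4]

milestones = adiposity_milestones(ages, bmis)
print(milestones)
```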
Economic and policy impact.
LongITools will build a comprehensive data set of policy interventions targeting the exposome and health care, which were implemented in the time span and locations covered by the birth cohorts (Figure 2). The aim is to investigate if and how such policy interventions have affected the emergence of CM-NCDs, in terms of both health status and economic implications. Furthermore, LongITools will estimate, within an economic life course model of health production, the extent to which the economic burden is due to the external exposome and evaluate policy-relevant "what-if" scenarios using a dynamic microsimulation model, i.e., the Future Elderly Model. 20-22
Knowledge exploitation.
The theoretical framework will be applied to existing data from the LongITools consortium to train artificial intelligence (AI) algorithms, such as random forests, support vector machines, and deep neural networks, which will enable translation of data and knowledge into simple and available predictive tools for scientists, citizens, policy makers, or other end users. For this latter part, codesigning activities are currently ongoing with multiple stakeholders, including clinicians, AI technologists, social scientists, and exposome experts, to define the functional and user requirements for these AI-powered digital tools. The steps being developed to achieve this, where interdisciplinary competences converge, are visualized in eFigure 1; http://links.lww.com/EE/A171 (design and principles of the LongITools health application). Many variables from environmental and personal domains concur to delineate longitudinal trajectories. Some of them are already available, thanks to digital personal health care devices, while others will be more specific and will need the inclusion of targeted sensors as part of an embedded system (LongIToolsHub). These tools will be validated in a pilot study.

Strengths and Challenges
LongITools comprises a vast amount of prospective data collected in Europe, harnessed to enhance exposome research, as well as longitudinal and econometric modeling. When combined, these data offer immense potential to inform future European health policy. Furthermore, the data are organized to enable direct replication under the FAIR principles. While sample size allowing statistical power is deemed essential for robust evidence-based strategies, it is also important to combine study designs to validate findings under different statistical assumptions. Another strength of LongITools is the inclusion of data from RCTs for in-depth sensitivity analyses and to identify novel pathways that could be generalized in the cohort setting. Finally, LongITools includes longitudinal birth cohorts and aging cohorts from the same geographical location, which enables us to study the changing environment and its association with cardiovascular and metabolic health.
The key challenge faced by LongITools, and more broadly by all epidemiological studies, is to translate the findings into meaningful change for global health. To tackle this, LongITools operates in close collaboration with policy makers throughout the project to convert the results into evidence-based policy options. A critical mass of data and expertise brought together in LongITools offers a substantial resource, which also leads to another challenge faced by the consortium: how to best combine the characteristics of the cohorts involved. The cohorts were established for their own individual purposes before being brought together under this project, and the methods of data collection have thus not been standardized a priori across the consortium. Therefore, consideration is required for the transferability of the statistical models and harmonization of the data. However, this also gives us the opportunity to examine if similar processes operate in different environments and thus to draw conclusions on generalizability. In addition, owing to the internationality of the project, differences exist among the studies in technology, questionnaire data and biospecimen collection methods, terminology and diagnosis definitions, country-specific measurement techniques, and ethical requirements. This heterogeneity can introduce differences in the results between the studies, which can be analyzed when necessary; we can generalize where possible and be specific when needed. In addition, the environmental exposures are harmonized by using the same model, which can also mitigate possible inconsistencies between the studies. The consortium has made significant progress in overcoming these challenges by developing and updating a harmonization manual for the key variables. The overarching advantage of LongITools is that all studies provide rich data on similar key exposures and the outcome measures of interest.

Conclusion
LongITools provides a collection of studies across different time periods and encompassing different life stages, which will enable us to use a life course approach to study the exposome and its role in the trajectories of cardiometabolic health. Valuing the idea of open science, through its innovative data infrastructure, LongITools will spread new knowledge rapidly and efficiently to the other European Human Exposome Network projects and beyond. The generated and combined knowledge can then be used to develop innovative products and services with the potential to create new markets. In this way, LongITools aims to improve EU citizens' cardiovascular and metabolic health and thereby reduce individual and societal burdens and health care costs of CM-NCDs. Through the cooperation between research teams and SMEs and by using our extensive data, we expect to make several breakthrough discoveries. The evidence-based innovation platform developed in collaboration among academic and industrial partners during the project will support the cross-fertilization of new technologies and stimulate collaborations in developing new products and services within and beyond the European Human Exposome Network. As a proof of concept, LongITools will develop a mobile application for cardiometabolic risk monitoring, combining computational methods with wearable sensor data and realizing effective cooperation between academic and SME partners.

Conflicts of interest
The authors declare no conflict of interest.