The ‘ALSPAC in London’ dataset: adiposity, cardiometabolic risk profiles, and the emerging arterial phenotype in young adulthood [version 1; peer review: 2 approved with reservations]

Rising rates of adiposity in the young pose one of the greatest threats to the future population burden of cardiovascular disease. Understanding the contribution of genetic and early-life influences to adiposity profiles in young adulthood, when the first signs of subclinical cardiovascular disease commonly appear, is vital if effective lifetime prevention strategies are to be developed. This data note documents the extensive range of genotypic and phenotypic data available from a London-based sub-study of the long-running Avon Longitudinal Study of Parents and Children (ALSPAC), the ‘ALSPAC in London’ Study, in which extensive adipose and cardiovascular phenotyping was carried out in participants recruited based on a genetic predisposition to obesity.


Introduction
The recent worldwide increase in population adiposity threatens to reverse achieved reductions in cardiovascular morbidity and mortality. The fastest rises in adiposity rates have been in children, teenagers and young adults and therefore represent one of the major growing threats to worldwide public health. The Avon Longitudinal Study of Parents and Children (ALSPAC) is a longitudinal birth cohort that recruited pregnant women living near Bristol, UK with an estimated delivery date between 1991 and 1992. The study includes extensive phenotypic, genetic, epigenetic and metabolomic data on the mothers, fathers and children from questionnaires, clinics and samples, and follow-up is ongoing. Using a novel 'recall-by-genotype' study design to identify individuals with genetically predicted variation in body mass index (BMI) 1, the 'ALSPAC in London' study aimed to examine the impact of adiposity on metabolic, inflammatory and autonomic disturbances, as well as their relationship to cardiovascular (CV) structure and function, with genetically directed deep phenotyping.

Methods
ALSPAC recruited 14,541 pregnant women resident in Avon, UK (former county covering Bristol and the surrounding areas in the South West UK) with expected dates of delivery 1st April 1991 to 31st December 1992 2,3 . The initial number of pregnancies for which the mother enrolled in the ALSPAC study and had either returned at least one questionnaire or attended a "Children in Focus" clinic by 19/07/99 is 14,541. Of these initial pregnancies, there were a total of 14,676 fetuses, resulting in 14,062 live births and 13,988 children who were alive at 1 year of age. When the oldest children were approximately 7 years of age, an attempt was made to bolster the initial sample with eligible cases who had failed to join the study originally. As a result, when considering variables collected from the age of seven onwards (and potentially abstracted from obstetric notes) there are data available for more than the 14,541 pregnancies mentioned above.
The number of new pregnancies not in the initial sample (known as Phase I enrolment) that are currently represented in the study and reflecting enrolment status at the age of 18 is 706 (452 and 254 recruited during Phases II and III respectively), resulting in an additional 713 children being enrolled. The phases of enrolment are described in more detail in the cohort profile paper 2,3 .
The total sample size for analyses using any data collected after the age of seven is therefore 15,247 pregnancies, resulting in 15,458 foetuses. Of this total sample of 15,458 foetuses, 14,775 were live births and 14,701 were alive at 1 year of age 2,3 .
The data included in this resource were collected from 436 young adults (mean age 21 ± 1 years) recalled for extensive adipose and cardiometabolic phenotyping between 2011 and 2015. Of these, 419 were recruited based on a genetic propensity for high or low BMI, with the other 17 comprising participants recruited prior to the decision to employ a recall-by-genotype study design.

Dataset 1 - genetic risk groups
Original beta-coefficients for the genome-wide genetic risk score (GRS) assembly were obtained directly from the genome-wide association study (GWAS) of BMI, conducted in 2010 by Speliotes et al. 4, with the ALSPAC data removed from the release. All alleles were aligned such that the reference allele matched the BMI-increasing direction reported in the GWAS and plink (--score) was used to derive a score for each participant (N=8,350). After removing heterozygous haploid genotypes (N=46,067), direct genotypes were available for 500,527 SNPs, of which 472,208 mapped to SNPs that overlapped with the Speliotes et al. 4 GWAS results and 470,667 mapped to alleles. The ALSPAC genetic dataset used was the best available data at the time of the recall-by-genotype (RbG) study design (2012) and was subsequently used in several papers 5,6. The number of participants invited and recruited to the RbG study was determined by an a priori power calculation taking into account the predicted explained variation in BMI by the genome-wide GRS and its effect size on an example cardiovascular phenotype, systolic blood pressure (SBP). Specifically, using the observed association between SBP and BMI at age 17, we calculated that we would require 450 participants in our RbG study to be able to detect a difference of 3 mmHg (a difference that in young adulthood has a clinically meaningful association with future cardiovascular disease (CVD) 7) with 80% power and a two-sided p-value threshold of 0.05, assuming an approximate 3.3 kg/m² difference in BMI between recalled groups. Participants were then recruited according to their appearance in nine sampling groups, from the most extreme to the least, to maximise power and difference in BMI.
These samples were as follows: 3%, 6%, 9%, 12%, 15%, 18%, 21%, 24%, 27% from the lower tail of the genome-wide GRS distribution and 97%, 94%, 91%, 88%, 85%, 82%, 79%, 76%, 73% from the upper tail of the genome-wide GRS distribution. The GRS was computed for 4,602 individuals in total within these sampling groups. Of those who were invited to the study (N=2,071), 419 individuals attended from across the entire distribution of sampling groups and had both full genetic and BMI data.
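As an illustration of the scoring and sampling logic described above, the following is a minimal Python sketch, not the study's pipeline. It assumes the score is a plain weighted sum of BMI-increasing allele counts (plink's --score applies its own averaging and missing-data conventions, which are not reproduced here) and that the nine sampling groups are read as consecutive 3-percentile-wide bands in each tail. The function names and the boundary handling are hypothetical.

```python
import math


def genetic_risk_score(dosages, betas):
    """Weighted sum of BMI-increasing allele counts (0, 1 or 2) across SNPs.

    Assumption: a simple sum of dosage * beta; not plink's exact output.
    """
    if len(dosages) != len(betas):
        raise ValueError("one beta-coefficient is needed per SNP")
    return sum(d * b for d, b in zip(dosages, betas))


def sampling_group(percentile):
    """Map a GRS percentile (0-100) to (tail, group); group 1 is the most extreme.

    Returns None for percentiles between the two tails (27 < p < 73).
    Treating the cut-points as inclusive upper bounds is an assumption.
    """
    if 0 <= percentile <= 27:
        return ("lower", max(1, math.ceil(percentile / 3)))
    if 73 <= percentile <= 100:
        return ("upper", max(1, math.ceil((100 - percentile) / 3)))
    return None
```

For example, under these assumptions a participant at the 2nd percentile falls in the most extreme lower group, one at the 75th percentile in the least extreme upper group, and one at the 50th percentile is not sampled.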
Dataset 2 - participant characteristics
Height was measured in metres using a stadiometer (SECA 213, Birmingham, UK) and weight in kilograms using electronic weighing scales (Marsden M-110, Rotherham, UK). BMI was calculated as weight (kg) / height (m)². Waist circumference was measured in centimetres at the narrowest circumference of the abdomen (roughly level to the umbilicus) using a standard flexible tape measure. Hip circumference was measured in centimetres at the widest circumference of the hips/buttocks using a standard flexible tape measure. Sagittal abdominal diameter was measured in centimetres as the distance between the small of the back and the top of the abdomen, with all measurements taken in the supine position with specialist metal callipers. Numerous other measures such as medical history, recent illnesses, smoking, alcohol consumption, physical activity and menstrual cycle were recorded via questionnaire, a copy of which is available as extended data 8.
Dataset 3 - adipose phenotypes
Body fat volume and distribution were measured using a 1.5T magnetic resonance imaging (MRI) system (Avanto; Siemens Medical Solutions). A graphics processing unit (GPU) implementation of the T2-IDEAL algorithm was used to measure body fat content. This technique iteratively separated MR images into fat and water components, which could then be used to measure the proportion of fat in each 36x36x10 mm voxel. Data were acquired in a continuous stack of 10 mm thick slices from the neck to the knees. To prevent motion artefact, breath holding was used for the thorax and abdomen and cardiac gating for slices containing the heart. Fat quantification in the head, arms and below the knees was impractical due to the need for participant re-positioning or specialised coils. Due to their low fat content these body parts were excluded using anatomical landmarks to ensure consistency between participants. The visceral compartment was manually separated from the fat image and the liver was excluded due to frequent artefacts at diaphragm level. Absolute volumes of fat were calculated for subcutaneous, visceral, liver, and pericardial deposits.

Dataset 4 - cardiovascular phenotypes
Carotid intima-media thickness. Common carotid artery intima-media thickness (cIMT) was measured using B-mode ultrasound images acquired in the ear-to-ear plane with the head rotated to 45° from the midpoint using a Zonare Z.OneUltra system equipped with an L10-5 linear transducer (Zonare Medical Systems, CA, US). Images were recorded in Digital Imaging and Communications in Medicine (DICOM) format as 10 second cine-loop files for offline analysis using Carotid Analyzer software (Medical Imaging Applications, Coralville, IA). Left and right cIMT were taken to be the average of three end-diastolic measurements located on the far wall of a single segment of arterial wall 5-10 mm in length and 10 mm proximal to the bifurcation.
Pulse wave velocity. Pulse-wave velocity (PWV) was used to estimate arterial stiffness via applanation tonometry (SphygmoCor Vx, AtCor Medical, NSW, Australia). Two distinct indices of PWV were measured, using electrocardiogram (ECG)-gated pulse waves travelling between both carotid-radial and carotid-femoral sites. The participant was rested in a supine position and a handheld tonometer was placed over the left carotid artery in order to allow the recording of 10-12 clear and reproducible pressure waveforms. The same tonometer was then used to measure a similar number of femoral arterial pulse waveforms in the inguinal crease at the top of the right leg, or radial arterial pulse waveforms in the right wrist. Carotid-radial transit distance was measured between the upper edge of the suprasternal notch and the radial pulse measurement site using a tape measure, whereas carotid-femoral transit distance was measured between carotid and femoral sites both directly and via the umbilicus. The device software calculated the mean transit time (in milliseconds) from the recorded pulse waveforms and carotid-femoral and carotid-radial PWV were calculated as the transit distance / transit time.
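The final calculation above can be sketched in a few lines of Python. This is an illustration only: the text does not state the unit in which the tape-measure distance was recorded, so millimetres are assumed here (millimetres per millisecond equals metres per second), and the function name is hypothetical.

```python
def pulse_wave_velocity(distance_mm, transit_time_ms):
    """Return PWV in m/s as transit distance / transit time.

    Assumption: distance in millimetres and time in milliseconds,
    so the ratio is numerically equal to metres per second.
    """
    if transit_time_ms <= 0:
        raise ValueError("transit time must be positive")
    return distance_mm / transit_time_ms
```

For instance, a hypothetical carotid-femoral path of 600 mm with a mean transit time of 100 ms would give a PWV of 6 m/s.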
Pulse wave analysis. Estimates of central blood pressure (BP) and indices of systemic arterial stiffness such as augmentation pressure (AP) and augmentation index (AIx) were assessed using pulse wave analysis (SphygmoCor Vx, AtCor Medical, NSW, Australia). This technique measures the arterial pressure waveform at the radial artery and applies a validated generalised transfer function to provide the central pressure waveform. Participants lay supine on a couch in ambient conditions for at least 10 mins prior to the start of the assessment. Brachial BP was measured using a digital automated sphygmomanometer (Omron M6 Comfort). With the participant's wrist held steady, the tonometer probe was placed over the right radial artery at the level of the wrist and a recording of 10-12 pressure waveforms was saved to specialist software. Quality indices (average pulse height, pulse height variation, diastolic variation, and shape deviation) were assessed for each measurement to confirm they fell within acceptable limits (automatically calculated), otherwise the scan was repeated.
Autonomic function. Autonomic function was assessed using measures of baroreflex sensitivity (BRS) and heart rate variability (HRV). Participants lay supine on a couch in a darkened room in ambient conditions for at least 10 mins prior to the start of the assessment. Following this, a 3-lead ECG and 4-lead respiratory monitoring system were attached to the participant's chest to measure heart rate and breathing cycle, while an inflatable infrared photoplethysmographic cuff (Portapres, FMS, Netherlands) was attached to the middle finger of the right hand and (following calibration) inflated in order to continuously and non-invasively measure BP. Participants remained at rest for 20 mins while measurements were recorded by the software, with all signals fed to specialist software for the calculation of BRS and HRV. A stationary period of 5 mins with less than 5% atrial/ventricular ectopic beats was chosen for the temporal QT interval variability analysis using a computer algorithm. The examiner defined a template QT interval for one beat from the beginning of the QRS complex to the end of the T wave, including all deflections that might relate to repolarization, including possible U waves. The algorithm found the QT intervals of all other beats by determining the stretch or compression in time of each beat that best matched the ST segment and the T wave of the template beat, whereas the QRS complex was ignored. RR interval mean and variance and QT interval mean and variance were derived from the respective time series. The QT variability index (QTVI), which represents the log ratio between normalized QT and RR interval variability, was calculated according to the equation $\mathrm{QTVI} = \log_{10}\left[\left(QT_v / QT_m^2\right) / \left(RR_v / RR_m^2\right)\right]$, where the subscripts $v$ and $m$ denote the variance and mean of the respective interval series. A difference of 1 QTVI between two individuals therefore implies a tenfold difference in temporal QT variability normalized to the QT interval, RR variance (HRV calculated in the time domain) and the RR interval.
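The QTVI equation above translates directly into code. The sketch below is illustrative rather than the algorithm used in the study: it assumes the QT and RR series have already been extracted beat by beat, and it uses the population variance, whereas the source does not say which variance estimator was applied. The function name is hypothetical.

```python
import math
from statistics import mean, pvariance


def qtvi(qt_intervals, rr_intervals):
    """QT variability index: log10 of normalised QT variance over normalised RR variance.

    Both series must be in the same unit (e.g. ms). Population variance
    is an assumption of this sketch.
    """
    qt_m, qt_v = mean(qt_intervals), pvariance(qt_intervals)
    rr_m, rr_v = mean(rr_intervals), pvariance(rr_intervals)
    return math.log10((qt_v / qt_m ** 2) / (rr_v / rr_m ** 2))
```

Because each variance is divided by its squared mean, rescaling either series (for example, changing units) leaves the index unchanged, and a hundredfold drop in normalised QT variance relative to RR variance lowers QTVI by 2.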
The squared coherence function was calculated from power spectra of the RR and QT interval time series and the cross-spectrum between these two series derived by fast Fourier transform (Welch algorithm, five blocks, 50% overlap, Hanning window). The mean squared coherence was obtained by averaging the coherence function over the frequency band from 0 to 0.45 Hz. The coherence provides a measurement of the degree of linear interaction between the RR and QT interval fluctuations as a function of the frequency of those fluctuations. For BRS calculations, registrations were recorded at a sampling frequency of 1000 Hz and stored on a computer. The recordings were inspected offline for the removal of artefactual segments and sequences containing non-sinus beats. Ectopic beats were corrected by interpolation. The time series of systolic blood pressure (SBP) and RR interval from the entire period of recording (20 min) were scanned to identify baroreflex sequences, which were defined as three or more consecutive beats in which successive SBP and RR intervals concordantly increased or decreased, with the threshold set at 1.0 mmHg and 5.0 ms, respectively, and a shift of +1 between the BP pulse and the RR interval, resembling the classical criteria suggested by Bertinieri et al. 9 (threshold values of 1.0 mmHg and 4.0 ms, respectively, and shift 1). Linear regression was applied to each sequence and only those for which the square of the correlation coefficient ($r^2$) was greater than 0.85 were accepted for further analysis. The spontaneous BRS was calculated as the average regression slope across all accepted sequences. RR data were used to derive the standard deviation of normal-to-normal RR intervals (SDNN) for time-domain HRV.
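The sequence method described above can be sketched as follows. This is a simplified illustration, not the specialist software used in the study: it assumes artefact-free beat-to-beat SBP (mmHg) and RR (ms) series, pairs each SBP value with the RR interval one beat later, and regresses RR on SBP. Function names and the exact treatment of turning-point beats are assumptions.

```python
def _slope_and_r2(xs, ys):
    """Least-squares slope of ys on xs and the squared correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def spontaneous_brs(sbp, rr, shift=1, sbp_thr=1.0, rr_thr=5.0,
                    min_beats=3, r2_min=0.85):
    """Mean regression slope (ms/mmHg) over accepted baroreflex sequences.

    Returns None when no sequence meets the criteria.
    """
    n = min(len(sbp), len(rr) - shift)
    x, y = sbp[:n], rr[shift:shift + n]  # SBP of beat i paired with RR of beat i+shift

    def direction(i):
        # +1 / -1 when SBP and RR both rise / fall by at least the thresholds
        dx, dy = x[i + 1] - x[i], y[i + 1] - y[i]
        if dx >= sbp_thr and dy >= rr_thr:
            return 1
        if dx <= -sbp_thr and dy <= -rr_thr:
            return -1
        return 0

    slopes, start = [], 0
    while start < n - 1:
        d = direction(start)
        if d == 0:
            start += 1
            continue
        end = start + 1
        while end < n - 1 and direction(end) == d:
            end += 1
        if end - start + 1 >= min_beats:  # beats start..end form one sequence
            slope, r2 = _slope_and_r2(x[start:end + 1], y[start:end + 1])
            if r2 > r2_min:
                slopes.append(slope)
        start = end
    return sum(slopes) / len(slopes) if slopes else None
```

With a hypothetical run in which SBP rises by 2 mmHg per beat while the lagged RR interval lengthens by 10 ms per beat, the sketch returns a BRS of 5 ms/mmHg.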

Ultra-high frequency ultrasound (UHFUS). In a randomly selected sub-sample of participants, images of individual intima and media thicknesses (IT and MT, respectively) were obtained in the carotid, radial and dorsal pedal arteries using an ultra-high resolution ultrasound system (UHFUS; Vevo 2100, Visualsonics) with attached 40 MHz and 50 MHz transducers. Right common carotid artery scans were obtained via UHFUS and imaged longitudinally 1-2 cm proximal to the carotid bifurcation. Images were focused on the posterior (far) wall of the artery and the zoom function was used to magnify the area. Right radial artery scans were collected in the same way at a location 1-2 cm proximal to the skin fold separating the palma manus from the region antebrachii anterior. Right dorsal pedal scans were measured above the proximal first metatarsal bone in the foot. Images were recorded in DICOM format as cine loops for offline analysis. IT echo was assessed offline and its total thickness measured with callipers within a standardised (8 mm × 8 mm) higher resolution zoom. The image was acquired, temporarily stored in the cine loop and then zoomed when the measurements were made. The measurements of the IT were performed in systole of the vessel (determined by scrolling through the cine loop to reach the artery's largest diameter). The MT was then calculated as the difference between IMT and IT (MT = IMT − IT). IMT was defined as the distance from the leading edge of the lumen-intima interface to the leading edge of the media-adventitia interface. Lumen diameter was defined by the distance between the leading edges of the intima-lumen interface of the near wall and the lumen-intima interface of the far wall.
Cardiovascular magnetic resonance imaging. Cardiovascular MRI was used to assess numerous measures of cardiovascular structure and function. All measures were made using a 1.5T MR scanner (Avanto, Siemens Medical Solutions, Erlangen, Germany). Endocardial borders of the left ventricle (LV) were traced manually on short axis stacks at end-diastole and end-systole to evaluate end-diastolic volume (EDV) and end-systolic volume (ESV). Stroke volume (SV) was obtained by subtracting ESV from EDV. Epicardial borders were traced in end-diastole to calculate an epicardial volume. The EDV was subtracted from this volume and the result multiplied by an assumed myocardial density to obtain left ventricular mass (LVM). Flow quantification was performed through-plane in a cross-section of the ascending aorta as it passes the bifurcation of the pulmonary arteries using an ECG-gated spiral phase-contrast MRI sequence. This technique allows images to be acquired within a short breath-hold (0.5 seconds) with a spatial resolution of 1.6 x 1.6 mm and a temporal resolution of 30 ms. All images were processed using in-house plug-ins for the open-source software OsiriX (OsiriX Foundation, Geneva, Switzerland). Flow images were manually segmented (using the modulus images) and SV (ml) was measured and cardiac output (CO, L/min) calculated as SV x heart rate. At the time of flow imaging, BP was simultaneously measured using an MRI-compatible oscillometric sphygmomanometer (Datex Ohmeda). Systemic vascular resistance (SVR; measured in mmHg/L/min) was calculated by dividing the measured mean BP by CO. Total arterial compliance (TAC) was calculated by optimisation of the two-element Windkessel model. Briefly, the flow curves and SVR were used as inputs to the model. Pulse pressure (PP) was calculated for a series of modelled pressure curves generated using a range of TAC values from 0.1 to 5.0 mL/mmHg in increments of 0.01. The compliance value that gave the smallest error between the modelled PP and the true PP was taken to be the true compliance.
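The derived haemodynamic quantities and the compliance search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the in-house OsiriX plug-in: the two-element Windkessel equation C·dP/dt + P/R = Q(t) is solved with an exact step-wise update for a flow curve held constant over each sample, the flow curve is taken to cover exactly one cardiac cycle in mL/s, and the periodic steady state is obtained analytically. All function names are hypothetical.

```python
import math


def cardiac_output_l_min(stroke_volume_ml, heart_rate_bpm):
    """CO (L/min) = SV (mL) x heart rate (beats/min) / 1000."""
    return stroke_volume_ml * heart_rate_bpm / 1000.0


def systemic_vascular_resistance(mean_bp_mmhg, co_l_min):
    """SVR in mmHg/(L/min) = mean BP / CO."""
    return mean_bp_mmhg / co_l_min


def modelled_pulse_pressure(flow_ml_s, dt_s, svr, tac):
    """Pulse pressure (mmHg) of the periodic two-element Windkessel solution."""
    r = svr * 0.06  # convert mmHg/(L/min) to mmHg.s/mL
    alpha = math.exp(-dt_s / (r * tac))

    def step(p, q):
        return alpha * p + r * q * (1.0 - alpha)

    # One pass from zero pressure gives the forced response over a cycle;
    # linearity then yields the starting pressure that repeats every cycle.
    forced = 0.0
    for q in flow_ml_s:
        forced = step(forced, q)
    p = forced / (1.0 - alpha ** len(flow_ml_s))

    pressures = []
    for q in flow_ml_s:
        p = step(p, q)
        pressures.append(p)
    return max(pressures) - min(pressures)


def fit_total_arterial_compliance(flow_ml_s, dt_s, svr, measured_pp):
    """Grid search over TAC from 0.10 to 5.00 mL/mmHg in steps of 0.01."""
    best_tac, best_err = None, float("inf")
    for i in range(10, 501):
        tac = i * 0.01
        err = abs(modelled_pulse_pressure(flow_ml_s, dt_s, svr, tac) - measured_pp)
        if err < best_err:
            best_tac, best_err = tac, err
    return best_tac
```

The grid mirrors the range and increment given in the text; using an exact exponential update rather than a simple Euler step is a design choice of this sketch that keeps the simulation stable at the smallest compliance values.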
Dataset 5 - metabolic/metabolomic phenotypes
In order to assess cardiometabolic risk profiles, 15 ml of blood was collected at least 4-6 h post-meal using standard venepuncture techniques and divided between both serum and plasma (EDTA) vacutainers. Within 30 minutes of collection, samples were centrifuged for 12 mins at 4000 rpm, pipetted into 1 ml Eppendorf tubes and immediately frozen at -80°C. Plasma lipids (total cholesterol, triglycerides, HDL cholesterol, and direct LDL cholesterol), liver function tests (ALT, AST, and GGT), Apolipoprotein A1 and B, C-reactive protein, and glucose were measured using an automated analyser (c311, Roche Diagnostics, Burgess Hill, UK). Insulin was also measured using an automated immunoassay analyser (e411, Roche Diagnostics, Burgess Hill, UK), while adiponectin, leptin and IL-6 (high sensitivity) were measured by ELISA (R&D Systems, Abingdon, UK). All measures were made using the manufacturer's calibrators and quality control materials. Serum paraoxonase (PON-1) activities were measured by UV spectrophotometry in a 96-well plate format using paraoxon (Sigma-Aldrich, St Louis, Missouri). Metabolomic profiling was also performed on EDTA plasma samples by Nightingale Health (Finland) using a quantitative high-throughput NMR metabolomics platform, full details of which have been published elsewhere 10.

Dataset 6 - psychological questionnaires
Three questionnaire-based psychological assessments were provided to participants on the day of testing in order to gauge indices of interpersonal support, happiness, and depression. The Interpersonal Support Evaluation List is a 12-item scale made up of three subscales -namely Tangible Support, Belonging Support, and Appraisal Support. All answers are given on a 4-point scale ranging from 'Definitely True' to 'Definitely False' and participants are required to rate each statement depending on its relevance to them. For analysis, the scores for negatively phrased statements are reversed and the scale totals for each subscale and the overall total are calculated 10 . The Oxford Happiness Questionnaire (OHQ) is a 29-item questionnaire; with each statement answered using a uniform 6-point Likert Scale. For analysis, negatively phrased questions are reversed and the sum total is divided by the number of questions to give an average score 11 . The Beck Depression Inventory II is a 21-question multiple choice self-report inventory used to assess the severity of depression. Questions are scaled from 0-3, with overall scores of 0-13 suggesting minimal depression, 14-19 mild depression, 20-28 moderate depression, and 29-63 severe depression 12 .
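The scoring rules described above can be illustrated with a short Python sketch. The Beck Depression Inventory II bands are taken directly from the text; for the Oxford Happiness Questionnaire, the sketch assumes that responses are coded 1 to 6 and that the caller supplies the positions of the negatively phrased items, since neither detail is given here. Function names are hypothetical.

```python
def bdi_ii_severity(total_score):
    """Classify a BDI-II total (0-63) using the bands quoted in the text."""
    if not 0 <= total_score <= 63:
        raise ValueError("BDI-II totals range from 0 to 63")
    if total_score <= 13:
        return "minimal"
    if total_score <= 19:
        return "mild"
    if total_score <= 28:
        return "moderate"
    return "severe"


def ohq_average(responses, reversed_items):
    """Average OHQ score after reversing negatively phrased items.

    Assumptions: responses are coded 1-6, so a reversed score is 7 - response;
    reversed_items holds zero-based positions of negatively phrased items.
    """
    scored = [7 - r if i in reversed_items else r
              for i, r in enumerate(responses)]
    return sum(scored) / len(scored)
```

For example, a BDI-II total of 14 is classed as mild, and three hypothetical OHQ responses of 1, 6 and 3 with the first item reversed average to 5.0.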

Ethical approval and consent
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Full details of the approvals obtained are available from the study website (http://www.bristol.ac.uk/alspac/researchers/research-ethics/). Written informed consent was obtained from participants prior to their clinic visit. Study members have the right to withdraw their consent for elements of the study or from the study entirely at any time.

Data
For ease of navigation, results include only variables that are deemed to be of particular relevance to this study and have been separated into six distinct datasets contained within tables, namely Table 1, genetic risk groups; Table 2, participant characteristics; Table 3, adipose phenotypes; Table 4, cardiovascular phenotypes; Table 5, metabolic/metabolomic phenotypes; and Table 6, psychological questionnaires. A selection of example graphs showing relationships between BMI and a number of these adipose, vascular, cardiac, and metabolic phenotypes can be found in the extended data 8.

Flow-Mediated Dilation
Baseline brachial artery diameter in mm prior to cuff inflation | Bldfmd

Peripheral Blood Pressure
Systolic blood pressure measured immediately prior to PWA using brachial cuff occlusion in mmHg | Sbppwa | 431 | 117 (10)
Diastolic blood pressure measured immediately prior to PWA using brachial cuff occlusion in mmHg | Dbppwa | 431 | 67 (7)
Pulse pressure calculated immediately prior to PWA as systolic blood pressure - diastolic blood pressure | Pppwa | 431 | 50 (10)
Mean arterial pressure calculated immediately prior to PWA as diastolic blood pressure + 1/3 (systolic blood pressure - diastolic blood pressure) | Mappwa | 431 | 84 (7)

Pulse Wave Analysis

Dataset validation
In order to minimise variability inherent in the assessment of certain vascular phenotypes (e.g. carotid intima-media thickness, pulse-wave velocity, flow-mediated dilation), all technicians working on the study were required to undergo the same training and accreditation procedures as those conducted in previous ALSPAC vascular clinics prior to scanning participants, details of which have been published elsewhere 13,14. All blood samples were analysed in a single batch upon completion of the study in order to minimise any potential inter-assay variability. As with all large-scale studies, a number of variables have different degrees of missing data. Information on the extent of missing data for each variable can be found within each of the descriptive tables provided within this data note. Of note, around 10% of participants refused to provide a blood sample due to fear of needles. In addition, ultra-high frequency ultrasound and psychological questionnaires were only added to the main study protocol in 2013, and data for these variables are therefore available in only a sub-sample of participants.

Data availability
Underlying data
ALSPAC data access is through a system of managed open access. Full details of all available data can be accessed through a fully searchable data dictionary provided on the ALSPAC study website (http://www.bris.ac.uk/alspac/researchers/dataaccess/data-dictionary) and the steps below highlight how to apply for access to both the data included in this data note and all other ALSPAC data. The datasets presented in this data note are linked to ALSPAC project number B645; please quote this project number during your application. The ALSPAC variable codes highlighted in the dataset descriptions can be used to specify required variables.
1. Please read the ALSPAC access policy (PDF, 627kB) which describes the process of accessing the data and samples in detail and outlines the costs associated with doing so.
2. You may also find it useful to browse our fully searchable research proposals database, which lists all research projects that have been approved since April 2011.
3. Please submit your research proposal for consideration by the ALSPAC Executive Committee using the online process. You will receive a response within 10 working days to advise you whether your proposal has been approved.
If you have any questions about accessing data, please email alspac-data@bristol.ac.uk.
The ALSPAC data management plan describes in detail the policy regarding data sharing, which is through a system of managed open access. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Extended data
2 should provide details on the distribution of measured BMI in the participants in each of the defined percentile groups, as well as in the 17 participants who were not recruited by the recall-by-genotype procedure, and the text should at least provide the average and standard deviation of BMI in the lower and upper tail groups.
I regret that I lack the expertise to review the other aspects of phenotype collection or study design.
As to the question, "Are the datasets clearly presented in a useable and accessible format?", I have answered no, because the datasets presented are simply cohort-level summary statistics for the dataset, rather than accessible and useable links to the data itself. The cohort data itself is only accessible after writing a project proposal which may be considered by the ALSPAC Executive committee.