Open science datasets from PREVENT-AD, a longitudinal cohort of pre-symptomatic Alzheimer’s disease

Highlights • PREVENT-AD is openly releasing datasets to the neuroscience community.• PREVENT-AD is a longitudinal study of older adults at-risk of Alzheimer disease.• Data include imaging, genetics, fluid biochemistry, neurosensory, cognition and more.• Data collection methods and data sharing plans for open science are described.


Background and Summary
Dementia is the final stage of Alzheimer's disease (AD), representing the culmination of a process that begins decades before onset of symptoms (Dubois, 2016;Iturria-Medina, 2016;Villemagne, 2013). Characterizing and tracking the pre-symptomatic stage of AD requires methods sensitive to the disease's early manifestations. These may include not only subtle cognitive decline, but also biochemical changes and structural or functional brain alterations. Studying these pre-symptomatic changes is crucial to a full understanding of AD, and their precise measurement is critical for trials of interventions that seek to prevent symptom onset.
To meet this challenge, in 2010 investigators at McGill University and the Douglas Mental Health University Institute Research Centre created a Centre for Studies on Prevention of Alzheimer's Disease (StoP-AD Centre). The Centre's prime objective was to pursue innovative studies of pre-symptomatic AD, with efforts to provide relatively enriched samples for prevention trials requiring individuals at-risk of developing the disease (Breitner, 2016). To this end, the StoP-AD Centre developed an observational cohort for PRe-symptomatic EValuation of Experimental or Novel Treatments for AD (PREVENT-AD). To increase the probability that participants would harbor the earliest changes associated with pre-symptomatic AD, entry criteria required intact cognition and a parental or multiple-sibling family history of AD. It is well-documented that populations with such a family history of AD-like dementia have a 2-3 fold relative increase in risk of AD dementia (Breitner, 1999;Huang, 2004).
The cohort was genotyped and followed by naturalistic studies of cognition, neurosensory capacities, cerebrospinal fluid (CSF) biochemistry, magnetic resonance neuroimaging (MRI) and by medical and clinical evaluations. The goal was to test a vast array of well-known biomarkers of AD pathology (amyloid and tau level in CSF) and neurodegeneration (cortical thickness and volume), but also to include more experimental measurements (e.g. episodic memory task fMRI) and other promising biomarkers (e.g. neurosensory measures). Precise measurement of the biomarkers mentioned above was required not only to monitor the progression of asymptomatic AD, but also to assess the effects of preventive interventions years before symptom onset. The main StoP-AD Centre clinical trial investigated the impact of naproxen sodium, a non-steroid anti-inflammatory drug, on the trajectory of a composite score of AD biomarkers (INTREPAD trial -NCT0270281). Data collection protocols remained the same between 2011 and 2017 with only a few additions. The data repositories and related methods described here refer to the data collected at the StoP-AD Centre, in the observational cohort and the INTREPAD trial from November 2011 to November 2017, the so-called PREVENT-AD 'Stage 1 ′ . During this interval, a total of 425 participants completed their baseline visit (BL), with 195 participants initially enrolled in the INTREPAD trial. A total of 386 met criteria for sustained investigation and 349 retrospectively agreed to share their data openly (Fig. 1).
PREVENT-AD data are now broadly available as an open science resource (Das, 2017;McKiernan, 2016;Wilkinson, 2016), providing opportunities for a larger number of researchers to analyze this rich dataset. Because the original PREVENT-AD ethics approval and consent process did not fully address open science data sharing plans, several steps related to ethics were required and are described below. Storage, Fig. 1. PREVENT-AD candidates enrolled between November 2011 and November 2017. 425 participants performed a baseline visit (BL) among which the datasets of 386 participants were shared with internal collaborators for analysis. 349 of these participants agreed to have their data openly shared, while one of them specifically refused to share data in the registered repository. MCI = Mild Cognitive Impairment, MRI = Magnetic Resonance Imaging. management, quality control (QC), validation and distribution of PREVENT-AD data were performed in LORIS, a system designed for linking heterogeneous data (e.g. behavioral, clinical, imaging, genomics) within a longitudinal context (Das, 2016(Das, , 2012. Up to 5 years of longitudinal follow-up are shared as part of this Stage 1, which includes a total of 1300 cognitive evaluations, 476 CSF sample analyses, 1559 MRI sessions and 1283 neurosensory tests, from 349 consenting participants.
As this research program is evolving, new biomarkers are being added. In February 2018, the data collection regimen was notably modified after a short period of inactivity and is now focused mainly on longitudinal cognitive, behavioral and lifestyle assessments as well as new neuroimaging modalities. These additional variables will be described in a PREVENT-AD 'Stage 2 ′ companion paper.
The multi-modal approach, the depth of the neurocognitive phenotyping of this population at increased risk of AD dementia along with the longitudinal nature of this single-site study render this dataset exceptional. The methods section will describe the PREVENT-AD cohort characteristics, data acquisition methods, and the approach used to create the data repositories for dissemination to the wider research community. Additional information about the PREVENT-AD program can be found at http://prevent-alzheimer.net/.

Development of open science data sharing
Making the PREVENT-AD dataset available for open science was a multi-step process achieved over approximately 2 years. The steps required to prepare the dataset before its dissemination on an open science platform included: ethical considerations (discussed below), data preparation in a structured and standardized way, dataset documentation (data dictionary, label convention, etc) and development of two LORIS databases for 1) Open and 2) Registered Access data dissemination. Significant efforts were needed to obtain additional ethics approval for the open science plans and proceed with a re-consent process for all participants. Most of the participants had remained actively involved in StoP-AD Centre activities, and this greatly facilitated the re-consent process. We also attempted to contact participants who were no longer associated with the Centre. We failed to reach only 13 participants out of 386 with data potentially to share. Even though partially de-identified data had been prepared for sharing with collaborating research teams, additional dataset de-identification steps were required to share data with a much larger community of researchers. For example, all brain images were "scrubbed" to remove any potentially identifying fields from their headers, and structural imaging modalities were defaced to prevent facial re-identification using 3D rendering (Fonov et al., 2018). Details about the procedure are described in the Data Record section. Brain images presenting incidental findings have the potential of presenting unique features that could increase the risk of potential re-identification. We decided to share these neuroimaging scans with an additional level of protection. The initial PREVENT-AD study codes were assigned a new "public" alphanumeric code (e.g.: CONP0000000), to which a participant's identity cannot directly be linked; the ability to do so remaining exclusively with the StoP-AD team.
Decisions were made by PREVENT-AD investigators regarding the choice of variables to be shared (based on quality, reliability, and level of standardization) and their level of access (Open or Registered Access, based on data sensitivity and the risk of potential re-identification or misuse). Datasets were prepared for two different LORIS platforms depending on the level of access. Open data are available to anyone who requests an account, whereas a broader dataset is available through Registered Access, available only to bona fide researchers (Dyke, 2018). Registration is approved by the StoP-AD team upon verification of the applicant's account information. Both repositories are discoverable through the unified interface of the Canadian Open Neuroscience Platform (CONP). Although costly, we expect that these efforts to prepare shared resources will increase the rate of scientific discovery in dementia research; this being our ultimate hope for people living with AD and their families.

Eligibility and enrollment assessments
A study nurse conducted preliminary eligibility screening over the phone or via an online questionnaire. Participants had to be 60 years of age or older, with an exception that individuals between 55 and 59 years old were eligible if their own age was within 15 years of symptom onset of their youngest-affected first-degree relative. Participants' family history of "AD-like dementia" was ascertained either by a compelling AD diagnosis from an experienced clinician or, if such a report was not available, by use of a structured questionnaire developed for the Cache County Study (Breitner, 1999). The questionnaire was intended to establish memory or concentration issues sufficiently severe to cause disability or loss of function, having an insidious onset and gradual progression (as opposed to typical consequences of a stroke or other sudden insult).
An on-site eligibility visit (visit label EL00) then included more specific questions on family history of AD dementia, medical and surgical history, pharmacological profile, lifestyle habits, as well as physical and neurological examinations, blood and urine sampling. The blood sample was used for genotyping (see section 3.1) only after an individual was declared eligible to the program. The CAIDE score (Cardiovascular Risk Factors, Aging, and Incidence of Dementia risk score) was derived using data collected at entry into the program (age, sex, education, systolic blood pressure, body mass index (BMI), cholesterol, physical activity and APOE ε4 status) (Kivipelto, 2006). Two cognitive screening instruments assessed integrity of cognition: the Table 1 Inclusion and Exclusion Criteria.

Inclusion criteria
→ Self-reported parental or multiple-sibling (2 or more*) history of Alzheimer-like dementia → Age 60 years or older (persons aged 55-59 years and < 15 years younger than their affected index relative were also eligible) → Minimum of 6 years of formal education → Study partner available to provide information on cognitive status → Sufficient fluency in spoken and written French and/or English → Ability and intention to participate in regular visits → Agreement for periodic donation of blood and urine samples → Agreement to participate in periodic multimodal assessments via MRI and LP for CSF collection (LP optional at first, then mandatory (in 2017) for participation) → Agreement to limit use of medicines as required by clinical trial protocols, if applicable → Provision of informed consent of the different protocols Exclusion criteria → Cognitive disorders -Known or identified during eligibility assessments (MoCA and CDR or exhaustive neuropsychological evaluation when needed) → Use of acetyl-cholinesterase inhibitors including tacrine, donepezil, rivastigmine, galantamine → Use of memantine or other approved prescription cognitive enhancer → Use of vitamin E at>600 i.u. / day or aspirin at > 325 mg / day → Use of opiates (oxycodone, hydrocodone, tramadol, meperidine, hydromorphone) → Use of NSAIDs or regular use of systemic or inhalation corticosteroids → Clinically significant hypertension (accepted if controlled medically), anemia, significant liver or kidney disease → Concurrent use of warfarin, ticlopidine, clopidrogel, or similar anti-coagulant → Current plasma Creatinine > 1.5 mg/dl (132 mmol/l) → Current alcohol, barbiturate or benzodiazepine abuse/dependence Inclusion and exclusion criteria for the PREVENT-AD observational cohort.  (Morris, 1993;Nasreddine, 2005) including its brief cognitive test battery. When cognitive status was in doubt (MoCA typically ≤ 26/30 or CDR > 0), a complete evaluation (2.5 h of testing) was performed by a certified neuropsychologist. The aim of this assessment was to determine if the cognitive deficits detected by the screening tests fell within the range of mild cognitive impairment (MCI), did not meet MCI criteria or were simply circumstantial, see section 'Management of cognitive decline' for more details.
Subsequently, during the enrollment visit (visit label EN00), a ~ 30minute Magnetic Resonance Imaging (MRI) session was acquired to rule out structural brain disease, while simultaneously ensuring participants' familiarity with the MRI environment. Handedness was determined using the Edinburgh Handedness Inventory (Oldfield, 1971), and an electrocardiogram was performed. Enrollment also required further documentation of stable general health, availability of a study partner to provide information on daily functioning, and willingness to comply with study protocols (Table 1 for detailed inclusion/exclusion criteria). Specific INTREPAD clinical trial inclusion/exclusion criteria are in the publication describing results of the trial (Meyer, 2019). In brief, they were similar except for additional criteria related to gastrointestinal tract problems and specific contraindicated concomitant medication. Final determination of eligibility for PREVENT-AD program was made by clinical consensus between one or more study physicians, a research nurse, and a neuropsychologist. All consent procedures fulfilled modern requirements for human subjects' protection, while avoiding excess participant burden. Consent forms were carefully crafted to use simple but comprehensive language (typically at an 8th grade reading level). Protocols, consent forms and study procedures were approved by McGill Institutional Review Board and/or Douglas Mental Health University Institute Research Ethics Board. Specific consent forms were presented prior to each experimental procedure.

Data collection overview
Eligible participants were enrolled either in the observational cohort or in the INTREPAD trial as both started enrollment around the same time. For both groups, the first annual visit is called baseline, labeled BL00, and follow-up (FU) visits are labeled FU12, FU24, FU36 and FU48, corresponding to the number of months after the baseline visit. Telephone follow-ups were conducted between on-site annual visits to keep contact with the participant and to update clinical information. Participants in the INTREPAD trial had more frequent on-site visits and telephone follow-ups for safety purposes. See Fig. 2 for the data collection timelines of the observational and trial cohorts.

Annual evaluations
During each longitudinal visit (BL00, FU12, FU24, FU36 and FU48) for both observational and trial cohorts, a standardized cognitive evaluation, neurosensory tests, and an MRI scanning session (1 to 1.5 h) were performed. On a separate day, participants who consented to the procedure donated CSF samples via lumbar puncture (LP). Medical conditions, pharmacological profile and various in-house health questionnaires were updated annually while blood and urine sampling, neurological and physical examinations were also performed. Routine laboratory results were obtained from a central medical laboratory in the Montreal area while ten milliliters of blood were centrifuged, aliquoted (plasma and red blood cells) and stored for further analysis in Dr. J. Poirier's laboratory.
Details about each experimental procedure are described in section 4.

INTREPAD trial
INTREPAD (Investigation of Naproxen TReatment Effects in Presymptomatic Alzheimer's Disease; clinicaltrials.gov -NCT02702817) was a randomized, double-blinded, placebo-controlled two-year trial of low dose naproxen sodium (220 mg b.i.d.) conducted in 195 PREVENT-AD participants. Trial recruitment began in March 2012 and ended in March 2015. Treatment (active or placebo) duration was 24 months. Standard annual PREVENT-AD evaluations were supplemented with an additional identical session three months after randomization (FU03). The 3-month assessment was intended to determine whether treatmentrelated changes, if any, occurred gradually or had rapid onset. It also served as a run-in period, with a modified Intent-to-Treat analysis design that considered only those participants who remained on treatment through this initial interval. The primary outcome of the trial was a composite Alzheimer Progression Score (APS) derived using item response theory from various cognitive and biomarker measures (Leoutsakos, 2016). For other trial data (such as study drug compliance, adverse events, etc), please refer to our results paper (Meyer et. al., 2019) (Meyer, 2019).

Summary of the open science sub-sample
Among the 425 participants who underwent baseline visits, 386 were confirmed as appropriate for final data analysis (see Fig. 1 for reasons for exclusion of the 39 others). From these, 349 participants (90.4%) consented retrospectively to have their data included in the shared data repositories under the principles of open science (one participant specifically asked to only share data in the open repository leaving a number of 348 in the registered repository). Ethnicity and genetic background of the cohort are relatively homogeneous. The majority of participants in the shared dataset come from the greater Montreal area in Québec, Canada; 98.9% are Caucasian and 86% have French as mother tongue. Women are somewhat over-represented (102 men, 247 women) while the proportion of APOEε4 carriers (4/4 = 2.0%; 4/3 = 32.2%; 4/2 = 4.3%) is slightly higher than the general Caucasian population (Farrer, 1997), in keeping with participants' family history of AD. PREVENT-AD participants are on average, younger than those in most aging or AD studies, (mean age at baseline = 63.6 ± 5.1 years old), are highly educated (15.4 ± 3.3 years of education) and cognitively unimpaired (MoCA score of 28.0 ± 1.6 out of 30). Up to four years of follow-ups are available (median 36-months of follow-up, IQR 36). Twelve-month data (FU12) are available for 278 participants, 24-month data (FU24) are available for 236 participants, 36-month (FU36) data are available for 177 participants and 48-month (FU48) data are available for 116 participants. Three-month (FU03) are available for 150 INTREPAD participants.

Cognition
Cognitive performance over time was assessed using the ~ 30 min Repeatable Battery for Assessment of Neuropsychological Status (RBANS) (Randolph, 1998) at baseline and each subsequent follow-up visit. This battery consists of 12 subtests (list learning, story learning, figure copy, line orientation, picture naming, semantic fluency, digit span, coding, list recall, list recognition, story recall, figure recall) that yield 5 Index scores (immediate memory, delayed memory, language, attention and visuospatial capacities) and a total score. The battery is available in both French and English in 4 equivalent versions to reduce practice effects in longitudinal assessment. In the observational cohort, the versions were administered in chronologic order A, B, C, D. For participants enrolled in the INTREPAD trial, version A was administered at baseline, and alternate forms were used in random order at follow-up visits. The data were scored following the RBANS manual, which results in age-adjusted Index scores with a mean of 100 and standard deviation of 15. Subtest scores have a mean of 10 and standard deviation of 3. Additionally, we scored all participants using norms specified for individuals aged 60-69 years, thereby allowing detection of potential decline with advancing age. Both scores (age-adjusted and graded exclusively using age 60-69 norms) are available in the registered data repository.
At these annual visits, we also administered the AD8 questionnaire to the study partner (a family member or friend in regular contact with the study participant). The AD8 comprises 8 questions evaluating changes in the participant's memory and functional abilities, and is intended to discriminate normal aging from very mild dementia (Galvin, 2005). AD8 total score and answers to each question are shared in the registered repository.

Management of cognitive decline.
Once the MoCA and CDR performed at eligibility confirmed that the research participants were cognitively intact at entry, we performed the baseline RBANS and followed their cognition annually (+3-month FU in INTREPAD trial). However, at each visit, if the cognitive test results were lower than expected and the cognitive status was in doubt (MoCA less than 26 or CRD > 0 (for screening tests at eligibility) or RBANS index score > 1 SD below the mean in two different cognitive domains (for the follow-ups)), a complete cognitive evaluation was requested by the study physician and was performed by a certified neuropsychologist. Suspicion of probable mild cognitive impairment (MCI) after this neuropsychological evaluation led to the exclusion (or ineligibility) of the research participant from the research program and referral to an affiliated memory clinic, as needed. This procedure was implemented to ensure our cohort was purely asymptomatic. An exception was made for participants from the INTREPAD trial since we needed to monitor potential adverse events, so INTREPAD participants showing cognitive decline were invited to pursue their annual visit at our Center. Notably, a significant portion of the comprehensive neuropsychological evaluations, triggered by low test results, turned out to be reassuring and did not reveal any cognitive deficits. In these cases, the participants were invited to continue their annual follow-up in our cohort as their low scores were considered 'circumstantial'. From 2016, an extension to the PREVENT-AD protocol was approved to allow the follow-up of PREVENT-AD participants who developed MCI or dementia. Thus, the time point of conversion to probable MCI is documented in the data sharing repository and data related to this conversion point are also provided. This is described in more detail in the Stage 2 data sharing companion paper, in preparation.

Smell identification.
Odor identification (OI) abilities were tested in a 30-minute session in a well-ventilated room, using the standardized University of Pennsylvania Smell Identification Test (UPSIT) (Doty et al., 1984). This test uses "scratch-and-sniff" stimuli of 40 items (4 randomized booklets of 10 odorants each). Although the test can be self-administered, a trained examiner administered the test to improve reliability. The UPSIT was administered at baseline and each follow-up visit, and total score and selected data are shared in the registered repository. Additional information on the use of the UPSIT in PREVENT-AD and related results are detailed in two publications (Lafaille-

Auditory processing.
Central auditory processing (CAP) evaluations were added to the study in 2014. This instrument is therefore not available at all time points for every participant and was available only in French. We used both the Synthetic Sentence Identification with Ipsilateral Competing Message (SSI-ICM) test and the Dichotic Stimulus Identification (DSI) test (Fifer, 1983;Speaks et al., 1967). After having first been assessed for simple auditory acuity (with monosyllabic words), participants were asked to identify spoken "pseudo-sentences," either with various sound levels of a distracting background narrative (SSI-ICM) or with dichotic binaural presentation (DSI). A session including these two auditory tests could typically be completed in less than 45 min. Selected auditory processing data are available in the registered repository. Additional information on the use of the auditory processing test in PREVENT-AD and related results are detailed in two publications (Tuwaig, 2016(Tuwaig, , 2017.

Neuroimaging
All participants were scanned on a Siemens TIM Trio 3 Tesla Magnetic Resonance Imaging (MRI) scanner at the Brain Imaging Centre of the Douglas Mental Health University Institute using a Siemens standard 12 or 32-channel coil (Siemens Medical Solutions, Erlangen, Germany). The duration of MRI sessions varied between the different visits from 0.5 to 1.5 h and included structural and functional modalities (Fig. 3A). Modalities acquired included T1-weighted, T2-weighted and Fluidattenuated inversion recovery (FLAIR) images, diffusion MRI, arterial spin labeling (ASL), resting-state functional MRI and task functional MRI to assess episodic memory (see Table 2 for parameters of each sequence). After June 2016, new enrollees (n = 48) underwent a slightly different protocol where the task fMRI acquisitions were removed and the following acquisitions added: a MP2RAGE for T1 maps, a multi-echo gradient echo for T2* maps, and a high in-plane resolution T2weighted image to assess hippocampal subfields and brain microstructure. The 12-channel coil was replaced by a 32-channel coil for all acquisitions with this new session protocol (Fig. 3B). The same images are shared in the open and the registered repositories, but images of participants presenting potentially identifying incidental findings are provided through the registered LORIS instance only.

Episodic memory task fMRI.
Episodic memory tasks for objectlocation associations were performed by participants longitudinally, but not part of the protocol at the 36 M visit. As previously mentioned, those enrolled after June 2016 did not perform this task while the existing cohort continued to complete it annually. The study design is presented in a recent publication (Rabipour, 2020). In brief, participants were scanned as they encoded an object and its left/right spatial location on the screen. Forty-eight encoding stimuli were presented one at a time for 2000 msec with a variable inter-trial interval (ITI). A twenty-minute break followed encoding, during which time structural MRIs were acquired. After this break, participants were presented with the associative  Legend: TR = repetition time; TE = echo time; TI = inversion time; FOV = field of view; MPRAGE = magnetization prepared gradient echo; FLAIR = fluid attenuated inversion recovery; PCASL = pseudo-continuous arterial spin labeling retrieval task in which they were presented with 96 objects (48 "old"previously encoded objects; 48 "new" objects) and were asked to make a forced-choice between four-alternative answers: i) "The object is FAMILIAR but you don't remember the location"; ii) "You remember the object and it was previously on the LEFT"; iii) "You remember the object and it was previously on the RIGHT"; and iv) "The object is NEW". Different stimuli were employed at each visit allowing longitudinal data collection. Images used for the task were taken from a bank of standardized stimuli (Brodeur, 2010;Brodeur et al., 2014). Six different versions of the stimulus sets were presented in the following order for both the observational cohort and the INTREPAD trial: enrolment scan: v1; baseline scan: v2; follow-up 3 months: v3 (only in INTREAD trial), follow-up 12 months: v4; follow-up 24 months: v5; follow-up 36 months: no episodic memory task; follow-up 48 months: v6. The E-Prime program (version 2) was used to run the experimental protocol and collect behavioral data (Psychology Software Tools Inc., Pittsburgh, PA, USA).

Cerebrospinal fluid collection
Participants who consented to this procedure donated CSF samples via lumbar puncture (LP) on a separate day from the main annual visit. We link the CSF data to a specific time point as the LP procedure was performed within 6 months of this annual visit (on average 28 days after the annual visit). Given that serial LPs were initially only performed on participants enrolled in INTREPAD, the majority of the CSF samples come from INTREPAD participants (n = 99 INTREPAD participants out of 160 CSF donors who consented to share data with the research community). In 2016, considering the overall success of the LP program clinical trial. * Family history of Alzheimer-like dementia: Self-reported at entry in the program but also updated at regular intervals. ** Blood test (non-fasted): Vitamin B12, glycosylated hemoglobin, thyroid stimulating hormone, total cholesterol, high density lipoprotein, low density lipoprotein. ***Breakdown of the MRI sessions is shown in Table 4; more MRI are available in the registered LORIS instance as it contains the MRI with incidental findings.
(acceptance, tolerability, and retention through serial repetitions) serial LPs were also performed in the broader observational cohort. Therefore, some participants were enrolled in the LP protocol after their baseline visit and may have CSF data only at later time point(s). In 2017, consent for such LPs became an inclusion criterion for new participants.
LPs were performed by a neurologist (Dr. P. Rosa-Neto) with an internationally accepted procedure that typically lasted less than 15 min. A large-bore introducer was inserted at the L3-L4 or L4-L5 intervertebral space, after which the atraumatic Sprotte 24 ga. spinal needle was used to puncture the dura. Up to 30 ml of CSF were withdrawn in 5.0 ml polypropylene syringes. These samples were centrifuged at room temperature for 10 min at ~ 2000g, and then aliquoted in 0.5 ml polypropylene cryotubes, and quick-frozen at − 80 • C for long-term storage. A video describing the procedure at the StoP-AD Centre is available at https://www.youtube.com/watch?v = 9kckrlBIR2E.

Data management
The LORIS system was customized for the PREVENT-AD program to facilitate data entry, storage, and data dissemination. Forms included customized algorithms developed for aggregating various pieces of data in a user-friendly manner. Numerous LORIS modules were also used to facilitate the curation process, including a module to track the status of the participants, a specific module on family history of AD and another one on drug compliance, for example. The document repository and data release modules facilitated the management of data distribution, documentation and access.

Data Record
Basic demographics and longitudinal neuroimaging raw data can be found in the open LORIS repository (https://openpreventad.loris.ca), while datasets with more sensitive material such as cognitive, medical and neurosensory information, genotypes, CSF measurements, etc., are accessible to qualified researchers only at https://registeredpreventad. loris.ca. PREVENT-AD repositories are discoverable via the Canadian Open Neuroscience Platform (CONP) at https://portal.conp.ca. Table 3 presents the list of Stage 1 PREVENT-AD shared data, their level of access, the number of participants who provided data and at which time points. MRI acquisitions available at each time point are presented in Table 4. In the registered LORIS repository, (https://registe redpreventad.loris.ca), all PREVENT-AD Stage 1 data are regrouped in 13 different CSV files accompanied by 3 text files and a detailed data dictionary. The content of each file is described in Table 5.

Versions
Three releases were part of this Stage 1 data sharing. In April 2019, we first released data from 232 participants in the open repository (OPEN version 1.0). In August 2020, we added 76 subjects into the same open repository (OPEN version 2.0) for a total of 308 participants. We are now releasing the registered data related to 348 participants (Registered version 1.0).

Neuroimaging specificities
All MRI acquisitions are available in MINC and NIfTI file formats, the latter being organized according to the Brain Imaging Data Structure (BIDS) (Gorgolewski, 2016). Brain MRIs can directly be downloaded following instructions provided in both LORIS instances. MRIs are available for 308 participants in the open instance, while an additional 37 candidates with MRI presenting potentially identifying incidental findings are provided in the registered instance, for a total of 344 participants (note: 4 participants did not undergo the MRI protocol and one Table 4 MRI modalities available at each study visit -Stage 1 .

Code availability
In the shared repositories, identifying fields (such as PREVENT-AD participant's ID, date of birth, date of MRI, etc) were scrubbed from the MRI image headers using the DICOM Anonymization Tool (DICAT; https://github.com/aces/DICAT), while anatomical images were 'defaced' using the defacing algorithm developed by Fonov et al. (2018), which has been shown to not significantly affect data processing outcomes (Fonov et al., 2018). While the code for this algorithm was slightly modified for integration into the LORIS platform, the algorithm remained unchanged. The version of the script used to deface the PREVENT-AD datasets is available in Github (https://github.com/cma djar/Loris-MRI/blob/open_preventad_v20.1.0/uploadNeuroDB/bin/de face_minipipe.pl).
For the episodic memory task fMRI, data were saved in .edat2 format (readable by the program only), and as text files to facilitate future data sharing. De-identification of the text files (scrubbing for dates and PREVENT-AD study IDs) was performed using a script available on Github (https://github.com/cmadjar/Loris-MRI/blob/open_preventad_ v20.1.0/tools/scrub_and_relabel_task_events.pl). De-identified data are available in both repositories.

Technical validation
Data were entered in LORIS in duplicate to allow detection of discrepancies between two entries of the same information and systematic corrections of mistakes by the data entry personnel. In case of significant discrepancy, source documentation was reviewed and discussed among the data entry crew and the clinical team. If needed, information was reviewed with the participants by phone, or at next follow-up. LORIS has several internal checks in place to detect any abnormalities and avoid missing data in required fields, out of range values, etc. Additional QC checks were implemented during data preparation for sharing.

Neuroimaging
Visual quality control of the raw anatomical images was performed by a single rater via the PREVENT-AD LORIS study interface. Quality control status, predefined comments and text comments were saved directly in LORIS. After the de-identification process, every image was visually reviewed to ensure proper defacing and the absence of any potentially identifiable information.

Terms of use
When accessing the shared data repositories, researchers agree to a standard set of good data use practices, such as meeting ethics  requirements and keeping the data secure. PREVENT-AD data must be used for neuroscience research as stipulated in the consent forms and in the Terms of Use. Authors publishing manuscripts using the PREVENT-AD Stage 1 data must name PREVENT-AD as the source of data in the abstract and or method section and cite this manuscript. The terms also include agreements on commercialization and privacy (https://openpr eventad.loris.ca/login/request-account/). For reuse of the PREVENT-AD data, researchers need to carefully read and understand the context of the data collection described in this paper and in the documentation available in the data repositories.

Labeling convention
The label convention used in the PREVENT-AD dataset is available when accessing the data repositories via the data dictionary.

Additional convention for the INTREPAD trial
Data collected for individuals enrolled in the trial are identified as such in the repositories by the prefix 'NAP' (e.g.: NAPFU12) in opposition to 'PRE' identifying the PREVENT-AD observational cohort (e.g.: PREBL00). From the shared sub-group of participants (n = 349), 11 initially enrolled in INTREPAD failed to complete the first 3-months on study drug (re.: adverse events, low compliance, etc). These cases had their firsts visit labelled as 'NAP' and the rest as 'PRE' as they were switched back to the observational cohort. Any participants who stayed on the study drug for the minimum 'run-in period' of 3-months, kept their prefix 'NAP' even if the study drug was discontinued at any time between FU03 and FU24. Participants' visits also continued to be identified as 'NAP' even after the end of the trial (Stage1: NAPFU36, NAPFU48 and Stage 2: up to NAPFU84).
The treatment allocation regimen (naproxen vs placebo) information is shared in the registered repository, but we suggest that data collected in the observational cohort and the trial (treated group and placebo group) can be merged for longitudinal analysis as no treatment effect was demonstrated in the trial and visit protocols were identical for all (Meyer, 2019).

Ongoing and future efforts
The StoP-AD Centre continues to collect data. The Stage 2 data collection regimen includes additional neuroimaging techniques, such as positron emission tomography (PET), magnetoencephalography (MEG) and a modified MRI protocol, additional lifestyle, personality traits and behavioral information as well as information on individuals who developed MCI. These new acquisitions enhance the information value of the PREVENT-AD data resource and expand the number of longitudinal observations up to 96-months of follow-up. PREVENT-AD Stage 2 datasets are also being prepared to be shared with the research community. To facilitate data usage, Stage 1 and Stage 2 PREVENT-AD datasets will be shared in the same data repositories.
At the StoP-AD Centre, our goal is to continue to keep our cohort of participants engaged in our research program, carefully monitor their cognition, gather new AD biomarkers using state-of-the-art technologies and continue our involvements in the McGill University Open Science Initiatives to make data available to the greater neuroscience research community.

Data and code availability statements
All the information presented below is also provided in the manuscript.
Data used in preparation of this article were obtained from the Presymptomatic Evaluation of Novel or Experimental Treatments for Alzheimer's Disease (PREVENT-AD) program (https://prevent-alzheimer. net/?page_id=42&lang=en).

Data availability
Basic demographics and longitudinal neuroimaging raw data can be found in the open LORIS repository (https://openpreventad.loris.ca), while datasets with more sensitive material such as cognitive, medical and neurosensory information, genotypes, CSF measurements, etc., are accessible to qualified researchers only at https://registeredpreventad. loris.ca. PREVENT-AD repositories are discoverable via the Canadian Open Neuroscience Platform (CONP) at https://portal.conp.ca/.

Code availability
In the shared repositories, identifying fields (such as PREVENT-AD participant's ID, date of birth, date of MRI, etc) were scrubbed from the MRI image headers using the DICOM Anonymization Tool (DICAT; https://github.com/aces/DICAT), while anatomical images were 'defaced' using the defacing algorithm developed by Fonov and Collins (2018). While the code for this algorithm was slightly modified for integration into the LORIS platform, the algorithm remained unchanged. The version of the script used to deface the PREVENT-AD datasets is available in Github (https://github.com/cmadjar/Loris -MRI/blob/open_preventad_v20.1.0/uploadNeuroDB/bin/deface_mini pipe.pl).
For the episodic memory task functional MRI, data were saved in . edat2 format (readable by the program only), and as text files to facilitate future data sharing. De-identification of the text files (scrubbing for dates and PREVENT-AD study IDs) was performed using a script available on Github (https://github.com/cmadjar/Loris-MRI/blob/open_ preventad_v20.1.0/tools/scrub_and_relabel_task_events.pl).