Improving data availability for brain image biobanking in healthy subjects: Practice-based suggestions from an international multidisciplinary working group
Introduction
Neuroimaging has become embedded in substantial research endeavours to understand normal brain function and effects of disease (e.g. Thompson et al., 2003; Fox and Schott, 2004; Lemaitre et al., 2005; Marcus et al., 2009; Wardlaw et al., 2011a, 2011b; Weiner et al., 2015). Until recently, many neuroimaging studies were conducted in single centres and were, inevitably, of modest size (Dickie et al., 2012). Many much larger population scanning initiatives are now ongoing (Jack Jr et al., 2008), and many multicentre clinical trials routinely include imaging as part of inclusion criteria and as outcome measures (Cash et al., 2014), providing the potential for large multicentre collections capturing the range of brain structure in the population. The importance of maximising the value captured in this large amount of imaging data – to detect how differences in brain structure and function relate to behavioural or clinical outcomes – is now widely recognised (Toga, 2002; Barkhof, 2012; Poline et al., 2012). The value of data for answering new questions can grow with sample size, e.g. for replication, increasing population representativeness, and increasing study power. To address this issue, a growing number of electronic databanks including brain imaging are available, either from dedicated cohorts (e.g. Alzheimer's Disease Neuroimaging Initiative, UK Biobank, IMAGEN), or collections of studies (e.g. Brain Imaging in Normal Subjects, Dementia Platform UK, Open Access Series of Imaging Studies): see Table 1.
The wide variation in brain structure and function both within and between individuals at different ages has long been recognised (Wardlaw et al., 2011a, 2011b; Dickie et al., 2013). Methodologies that use appropriately representative populations are needed to provide normative populations, particularly for healthy subjects (i.e. those without neurological diseases such as stroke or dementia). They can provide informative reports for users (e.g. ‘brain on 5th percentile for volume at age 70’ for a specified population) and simultaneously embrace the spectrum of individual variation (Dickie et al., 2015a, 2015b). Brain imaging is increasingly used in the diagnosis of neurological diseases and mental health disorders (Fox and Schott, 2004). Data from existing cohort or population studies (e.g. Marcus et al., 2009) can help define boundaries between health and disease, aid diagnosis and trial inclusion, provide effect size estimates for planning trials, and, where relevant, provide controls for case-control studies (e.g. Dickie et al., 2015a; ADNI: Potvin et al., 2016).
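As a rough illustration of the percentile-style report described above, the following Python sketch ranks one subject's brain volume against an age-matched reference sample. The function name, the age window, and all numerical values are hypothetical and are not taken from any of the cited studies; a real normative model would also need to account for factors such as sex, head size, and scanner differences.

```python
from bisect import bisect_right


def volume_percentile(volume_ml, age, normative, age_window=2.5):
    """Empirical percentile of `volume_ml` among reference subjects
    whose age lies within `age_window` years of `age`.

    `normative` is a list of (age, volume_ml) pairs. Returns None when
    no reference subject falls inside the age window.
    """
    ref = sorted(v for a, v in normative if abs(a - age) <= age_window)
    if not ref:
        return None
    # Share of reference volumes less than or equal to the subject's volume.
    return 100.0 * bisect_right(ref, volume_ml) / len(ref)


# Hypothetical reference sample: (age in years, brain volume in ml).
reference = [(68, 1010), (69, 1050), (70, 1080), (70, 1120), (71, 1150),
             (71, 1190), (72, 1210), (72, 1240), (55, 1300), (85, 950)]

print(volume_percentile(1015, 70, reference))
```

Restricting the comparison to an age window is one simple design choice among several; it keeps the report tied to "a specified population", but it makes the estimate unstable when few reference subjects of similar age are available.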
Large repositories of brain imaging data from well-characterised subjects in accessible databanks are required to achieve this, while ensuring that data protection concerns are also addressed. These comprise data initiatives that are planned around harmonised protocols, such as ADNI (Alzheimer's Disease Neuroimaging Initiative) (Weiner et al., 2015), UK Biobank (Matthews & Sudlow, 2016), Human Connectome Project (van Essen et al., 2013), OASIS (Open Access Series of Imaging Studies) (Marcus et al., 2007a, 2007b, 2009), and those that represent data aggregation without initial harmonisation, e.g. ENIGMA (Enhancing Neuro Imaging Genetics through Meta-Analysis - Thompson et al., 2014, 2015). The value of brain images is hugely enhanced by the information on the characteristics of individual subjects and the study in which they participated, but at present studies vary widely in what data they present on the study, subject or image data, and how these data are presented (Dickie et al., 2012).
Only a small proportion of the images acquired for research are included in biobanks. In existing structural brain image biobanks, normal subjects over 60 years of age are relatively under-represented, have limited cognitive and medical metadata to support their classification as “normal” (Dickie et al., 2012), and are available with only a limited range of neuroimaging sequences. For example, fluid attenuated inversion recovery (FLAIR) and T2* volumes are often not available, although they are essential for sensitively identifying and quantifying white matter hyperintensities (WMH) and microbleeds respectively, neuropathologies present in normal ageing but associated with vascular cognitive impairment (Wardlaw et al., 2013; Ritchie et al., 2016). Newer initiatives like BRAINS (Job et al., 2016) provide a range of sequences (e.g., T1, T2, T2*, and FLAIR) for most subjects plus cognitive and medical information. Future data sharing will be facilitated by influencing how new data are collected in terms of core imaging sequences and metadata variables.
The INCF (International Neuroinformatics Coordinating Facility) Standards for Data Sharing Neuroimaging Task Force developed the Brain Imaging Data Structure (http://bids.neuroimaging.io/) to advance standard organisation and descriptions of data files, and the Neuroimaging Data Model (http://nidm.nidash.org/) for data provenance tracking, but ongoing work is needed around developing community consensus and adoption of standards (Bjaalie and Grillner, 2007). Issues such as privacy, de-identification, quality control, provenance, avoiding including the same subjects in multiple databases, ethics (historical and future), consent, essential components of ‘good guardianship’, costs, sustainability, software version control, definitions of ‘normality’, and international variations in ethical and legal frameworks, also need further consideration (Rodríguez González et al., 2010). The European Society of Radiology (ESR) published a position paper on Imaging Biobanks (European Society of Radiology, 2015) defining imaging biobanks, outlining their purpose, and advocating the creation of a network/federation of such repositories with existing biobanks.
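To make the idea of a standard file organisation more concrete, the sketch below builds a relative file path for an anatomical image following the general subject/session/datatype naming pattern used by the Brain Imaging Data Structure. The helper name `bids_anat_path` and the example labels are hypothetical and are not part of any BIDS tooling; the snippet covers only a small part of the naming scheme and is not a validator.

```python
import re
from pathlib import PurePosixPath

# Subject and session labels are restricted here to letters and digits.
_LABEL = re.compile(r"[A-Za-z0-9]+")


def bids_anat_path(subject, suffix, session=None, extension=".nii.gz"):
    """Build a relative BIDS-style path for one anatomical image.

    `suffix` names the image type (e.g. "T1w", "T2w", "FLAIR").
    """
    for label in (subject, session):
        if label is not None and not _LABEL.fullmatch(label):
            raise ValueError(f"labels must be alphanumeric: {label!r}")
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        # Sessions add one directory level and one filename entity.
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    filename = "_".join(entities + [suffix]) + extension
    return str(folder / "anat" / filename)


print(bids_anat_path("01", "T1w"))
print(bids_anat_path("01", "FLAIR", session="wave2"))
```

Encoding the subject, session and image type in both the directory and the file name means a file remains interpretable even when it is copied out of its original folder, which is one reason such conventions ease pooling across studies.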
Many funders advocate or mandate that data generated by studies they fund are made public, and the International Committee of Medical Journal Editors (ICMJE) has proposed that deidentified patient information is shared before research manuscripts of randomised controlled trials will be considered for publication (Taichman et al., 2016). While this data sharing may be relatively straightforward for tabular demographic data (i.e. the types of alphanumeric data that can be held in traditional databases), the situation is much more complex for brain image data (Toga, 2002; Marcus et al., 2007a). Factors like the size of imaging files and the possibility of identifying subjects from images impose non-trivial technological challenges. While initiatives such as NeuroVault (www.neurovault.org - Gorgolewski et al., 2015) avoid the problem by publicly sharing statistical maps for data aggregation, they do not include whole datasets. By contrast, a repository like OpenfMRI (www.openfmri.org) includes raw data, with some subject-level variables, which allows newer analyses to be performed. Even when there is a desire to share imaging data, there are a number of technical, legal and practical problems to be overcome (Poline et al., 2012; Poldrack and Gorgolewski, 2014; Pernet and Poline, 2015).
Learning from existing databanks and population studies
Against this background, a group of experts, including specialists in image acquisition and analysis, clinical disciplines, epidemiology, legal, ethics, and data science, met to discuss and debate conceptual, legal, ethical and technical issues around creating brain image banks. We aimed to highlight the issues that need to be addressed, from the ethical to the practical, achieve some consensus, promote best practice and provide useful advice for ongoing and planned studies. The primary aim of
Data collection
There is a great willingness from many people across the life course to volunteer for brain imaging studies, even when the participants are in their nineties and the study includes prolonged imaging (Deary et al., 2012). However, such willing individuals – irrespective of age – tend to be fitter, better educated and less socially deprived than the general population (e.g. Deary et al., 2012; Stafford et al., 2013). Extra effort is therefore needed to encourage more representative population
Addressing data heterogeneity
Where more than one study is included in a brain image bank, like 3-CITIES (Alperovitch et al., 2002) or BRAINS (Job et al., 2016), there is usually substantial heterogeneity of the acquired demographic/clinical and imaging data. This can be addressed either by describing each variable (3-CITIES), or by harmonising metadata (BRAINS). Having many variables makes the database large and difficult to search, while transforming variables to agreed standards, which is simpler for the end user, is
Database infrastructure
Many of the studies that led to the creation of imaging databanks started over a decade ago, and reported issues relating to changing technology (Mazziotta et al., 2001). For example, technical staff need to consider the impact of hardware changes (e.g. upgrading or changing scanner software or hardware; changes in data storage solutions and formats) and software evolution, which can make keeping track of multiple analyses of the database challenging (Poldrack, 2014). Such changes in technology
Database management
The legal and ethical framework of individual countries, and agreements reached between them, may affect how and where data are or can be stored. Systems are required to ensure data security, but allow appropriate access. Relevant approvals should be transparent, e.g. in publications and on websites.
During the meeting it was recognized that brain image databanks should have a Steering Committee, including independent and lay representatives, to monitor and review progress. This has the
Conclusions
Brain image biobanking is a rapidly evolving field. Several related and relevant projects will complement our recommendations, such as the International Neuroinformatics Coordinating Facility (INCF) Neuroimaging Data Sharing Task Force (wiki.incf.org/mediawiki/index.php/Neuroimaging_Task_Force) meeting held at Stanford University on January 27–30th 2015, which led to the development of the Brain Imaging Data Structure (BIDS - http://bids.neuroimaging.io/, Gorgolewski et al., 2016).
A federated
Funding sources
The writing of this paper did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors. Guarantors of Brain, British Geriatrics Society (Scottish Branch), Royal Society of Edinburgh, SINAPSE (Scottish Imaging Network: a Platform for Scientific Excellence) SPIRIT, International Neuroinformatics Coordinating Facility and Nuffield Foundation made contributions towards funding the meeting which formed the basis of this paper.
Acknowledgements
Thanks to Guarantors of Brain, British Geriatrics Society (Scottish Branch), Royal Society of Edinburgh, SINAPSE (Scottish Imaging Network: a Platform for Scientific Excellence) SPIRIT, International Neuroinformatics Coordinating Facility and Nuffield Foundation for contributions towards funding the meeting which formed the basis of this paper. TEN is supported by the Wellcome Trust (100309/Z/12/Z).
References (78)
et al. Epidemiological studies on aging in France: from the PAQUID study to the Three-City study. C R Biol. (2002)
et al. An optimised tract-based spatial statistics protocol for neonates: applications to prematurity and chronic lung disease. Neuroimage (2010)
et al. Abnormal deep grey matter development following preterm birth detected using deformation-based morphometry. Neuroimage (2006)
et al. Permutation and parametric tests for effect sizes in voxel-based morphometry of gray matter volume in brain structural MRI. Magn. Reson. Imaging (2015)
et al. Brain development cooperative group. Neuroimage (2016)
et al. Brain templates and atlases. Neuroimage (2012)
et al. Imaging cerebral atrophy: normal ageing to Alzheimer's disease. Lancet (2004)
et al. Cerebral asymmetry and the effects of sex and handedness on brain structure: a voxel-based morphometric analysis of 465 normal adult human brains. Neuroimage (2001)
et al. Magnetic resonance imaging of the newborn brain: manual segmentation of labelled atlases in term-born and preterm infants. Neuroimage (2012)
et al. Anatomical correlations of the international 10–20 sensor placement system in infants. Neuroimage (2014)
Towards structured sharing of raw and derived neuroimaging data across existing resources. Neuroimage
COINS Data Exchange: an open platform for compiling, curating, and disseminating neuroimaging data. Neuroimage
Effects of changing from non-accelerated to accelerated MRI for follow-up in brain atrophy measurement. Neuroimage
Age- and sex-related effects on the neuroanatomy of healthy elderly. Neuroimage
Regional growth and atlasing of the developing human brain. Neuroimage
BIL&GIN: a neuroimaging, cognitive, behavioral, and genetic database for the study of human brain lateralization. Neuroimage
A patient care system for early 3.0 T magnetic resonance imaging of very low birth weight infants. Early Hum. Dev.
Methods and considerations for longitudinal structural brain imaging analysis across development. Dev. Cogn. Neurosci.
Multi-contrast human neonatal brain atlas: application to normal neonate development analysis. Neuroimage
Normative data for subcortical regional volumes over the lifetime of the adult human brain. Neuroimage
The WU-Minn human connectome project: an overview. Neuroimage
Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration. Lancet Neurol.
Impact of the Alzheimer's Disease Neuroimaging Initiative, 2004–2014. Alzheimers Dement.
Making better use of our brain MRI research data. Eur. Radiol.
Parcellation of the healthy neonatal brain into 107 regions using atlas propagation through intermediate time points in childhood. Front. Neurosci.
Global neuroinformatics: the international neuroinformatics coordinating facility. J. Neurosci.
Common genetic variants and risk of brain injury after preterm birth. Pediatrics
Imaging endpoints for clinical trials in Alzheimer's disease. Alzheimers Res. Ther.
LORIS: a web-based data management system for multi-center studies. Front. Neuroinf.
Associations between education and brain structure at age 73 years, adjusted for age 11 IQ. Neurology
Cohort profile: the Lothian Birth Cohorts of 1921 and 1936. Int. J. Epidemiol.
Do brain image databanks support understanding of normal ageing brain structure? A systematic review. Eur. Radiol.
Variance in brain volume with advancing age: implications for defining the limits of normality. PLoS One
Use of brain MRI atlases to determine boundaries of age-related pathology: the importance of statistical method. PLoS One
Genetic influences on schizophrenia and subcortical brain volumes: large-scale proof-of-concept and roadmap for future studies. Nat. Neurosci.
Privacy-preserving data publishing: a survey of recent developments. ACM Comput. Surv. (CSU)
FBIRN, MBIRN, BIRN-CC XCEDE: an Extensible Schema For Biomedical Data. Neuroinformatics
Who wants a free brain scan? Assessing and correcting for recruitment biases in a population-based sMRI pilot study. Brain Imaging Behav.