Parent attitudes towards data sharing in developmental science

Background: Data sharing in developmental science is increasingly encouraged, supported by funder and publisher mandates for open data access. Data sharing can accelerate discovery, link researchers with high quality analytic expertise to researchers with large datasets and democratise the research landscape to enable researchers with limited funding to access large sample sizes. However, there are also significant privacy and security concerns, in addition to conceptual and ethical considerations. These are particularly acute for developmental science, where child participants cannot consent themselves. As we move forward into a new era of data openness, it is essential that we adequately represent the views of stakeholder communities in designing data sharing efforts.
Methods: We conducted a comprehensive survey of the opinions of 195 parents on data sharing in developmental science. Survey themes included how widely parents are willing to share their child’s data, which type of organisations they would share the data with and the type of consent they would be comfortable providing.
Results: Results showed that parents were generally supportive of curated, but not open, data sharing. In addition to individual privacy and security concerns, more altruistic considerations around the purpose of research were important. Parents overwhelmingly supported nuanced consenting models in which preferences for particular types of data sharing could be changed over time. This model is different to that implemented in the vast majority of developmental science research and is contrary to many funder or publisher mandates.
Conclusions: The field should look to create shared repositories that implement features such as dynamic consent and mechanisms for curated sharing that allow consideration of the scientific questions addressed. Better communication and outreach are required to build trust in data sharing, and advanced analytic methods will be required to understand the impact of selective sharing on reproducibility and representativeness of research datasets.


Introduction
Developmental cognitive neuroscience is a burgeoning field (Nketia et al., 2021). Understanding the brain and cognitive changes that underpin the dramatic changes in behaviour over the first years of life is critical to a fundamental etiological understanding of human functioning. Studying brain development is also central to robustly testing the assumptions of theoretical models of brain function constructed to explain adult data. Finally, many neurodevelopmental conditions are highly heritable and connected to genes with peak periods of expression prenatally, thus likely impacting early brain development (Ismail et al., 2017). Therefore, developing a robust science of early brain and cognitive development that can support inferences about both general development and individual differences is critical. There is mounting evidence that robust science requires data sharing, because this allows assessment of reproducibility (Gilmore & Qian, 2022), generalisability to different analytic approaches (e.g., Poldrack et al., 2013), diversification of samples (see Hindorff et al., 2018 for a review) and increased power (Jones et al., 2019). However, efforts to share paediatric data are complicated by the ethical considerations around sharing data from participants who cannot themselves consent. As we move forward with data sharing efforts, it is important to consider the views of parents and families on the design and governance of data sharing endeavours to ensure that data is shared responsibly.
These laudable goals have led to many funders adopting mandates for data sharing. Journals may mandate statements about data accessibility and encourage placement of data in public repositories, with 'badges' awarded for compliance (Kidwell et al., 2016). These efforts are important, but there are also real concerns around the culture of 'bropen science' (Whitaker & Guest, 2020), whereby only a narrow demographic of researchers is able to benefit from open science practices. In particular, some consider that aspects of the open science movement have become unnecessarily dogmatic and hegemonic (e.g., McDermott, 2022). Further, there has been little focus on the gender distribution of researchers who exploit open datasets (sometimes called 'data parasites'; Longo & Drazen, 2016) relative to those who are often involved in preparing them. This can have real implications for the career development of researchers who collect and share data (particularly if they are at an earlier career stage) because this process requires a considerable investment of time relative to running analyses on existing data. Specifically, it has been found that female academics are assigned and complete more 'academic housework', such as mentoring, student and faculty service (including emotional labour) and being involved in lower status committees that do not necessarily get reflected in their CVs (Hanasono et al., 2019; Järvinen & Mik-Meyer, 2024; O'Meara et al., 2017). Indeed, this 'invisible labour' could also be extended to the time-intensive tasks of data collection and curation that are specific to open datasets, with women typically overrepresented in more junior (e.g., data collection) roles, but underrepresented in more senior academic roles (Herschberg & Berger, 2015). For open datasets to avoid becoming a route by which gender disparities are amplified, it will be important to examine the mechanisms of this inequitable gender distribution and remedy them.
Further to the current ethos of open data science, the needs and wishes of participants are often not the primary focus of many efforts to improve transparency and visibility, in part because such efforts are often driven by researchers who are not working directly with the communities from whom data is collected.
Concurrent with these efforts, there has been increasing concern about privacy, security and the values of those accessing research data. The introduction of GDPR legislation in Europe has highlighted the need to provide a purpose for which personal information is shared or collected; in the context of research, any data linked to an individual through an ID is considered to be personal information. There is meaningful concern that GDPR is hampering sharing of data for research or medical purposes far more than the use of data by companies that it was primarily designed to restrict (Vukovic et al., 2022). Widely publicised scandals such as the misuse of Facebook data by Cambridge Analytica (Berghel, 2018) and identifiable health care data used by Google (the Nightingale Project; Ledford, 2019) have raised further questions of trust in industry and have highlighted potential risks of data sharing. Finally, the neurodiversity movement has increased the profile of concerns around the misuse of data on individuals with neurodevelopmental conditions, with a particular focus on genetics and the long shadow of eugenics (Sanderson, 2021).
Developmental cognitive neuroscience has been largely insulated from these debates to date, with most focus on large-scale adult datasets, biobanks or specific clinical populations. However, there are increasing efforts to generate large-scale developmental brain and cognitive data that include consortia focused on basic science (Frank et al., 2017); the early development of cohorts of infants enriched for neurodevelopmental conditions (Jones et al., 2019; Volkow et al., 2021); and the inclusion of biological measures into ongoing population studies (Magnus et al., 2006; Magnus et al., 2016). As data sharing possibilities evolve with the advent of new technologies, there is a pressing need to consider the views of parents on the collection and sharing of their child's data. This is particularly critical as sharing mandates become more common at developmental journals (e.g., Gennetian et al., 2022). The previous literature in this area has typically focused on genetic sharing (e.g., Yamamoto et al., 2021) and general biobanking (Antommaria et al., 2018; Halverson & Ross, 2012). For example, previous interview or questionnaire studies have shown that parents are generally supportive of data sharing, but with concerns around privacy, security and shared values between themselves and experimenters (Manhas et al., 2015). For example, an interview study with 19 interviewees and 18 focus-group participants selected from participants in an existing birth cohort study found generally strong support for data sharing (Manhas et al., 2016). However, the study found that altruism has limits. Participants had remaining concerns about privacy and security and some areas of divergence in opinion, including on sharing data with industry and the nature and composition of data access panels. Although one study indicated support for broad consent for data sharing (Manhas et al., 2016), another indicated that families were much more likely to refuse sharing for their child's biological data than their own (Burstein et al., 2014) and would select restricted sharing options if given the choice (Burstein et al., 2014). In a survey of families asked to think about biobanking data for children who were sick, parents reported that the nature of a disease affecting their child would influence their views on biobanking (Salvaterra et al., 2014). Thus, the existing literature suggests that families do have concerns about data sharing but recognise its power and potential.

Amendments from Version 1
We have updated some of the Figures as well as expanded upon some of the points raised in the Introduction in regards to the gender distribution of researchers who prepare open datasets and those that exploit them. We have also included more clarifying detail about the questionnaire used.
Any further responses from the reviewers can be found at the end of the article.

Taken together, the majority of focus in the literature has been on biobanks (in which samples are taken specifically for long-term storage) and genetic studies. Less is known about parental views on sharing other modalities that are commonly used in developmental science, collected for a specific research purpose and later shared (for example, measures such as imaging, eyetracking, electroencephalography and cortisol levels). Further, although some work shows that the presence of a medical condition might impact sharing (Salvaterra et al., 2014), no studies have considered whether decision-making is influenced by having a family history of neurodevelopmental conditions. This is timely to explore because many recent concerns have emerged as part of the neurodiversity movement (Hobson et al., 2022). Finally, few studies have explored the factors that influence sharing such as geographical location and nature of the receiving partner (though a range of studies have shown that trust in industry can be low; e.g., Manhas et al., 2015).
To fill these gaps, we conducted a comprehensive survey of 195 parents with and without a family member with a neurodevelopmental condition and with a child under 18 years. Most families had some level of previous research participation (c. 94.4%), and thus were generally familiar with research practices. We focused on three broad questions: Who can share the data? We examined participants' willingness to share data with different types of organisations; across different geographical locations; and the factors that most influenced their decisions. How is the data shared? We asked participants about their views of common consenting models, including Restrictive (participants are contacted and asked each time a study wants to use their previously collected data), Dynamic (participants use an online portal to choose which studies their data is shared with), Tiered (participants choose certain categories/tiers of research they would be happy to share their data with at the point of informed consent for the original data collection) and Broad consent (the data can be shared in an anonymised manner for other studies and participants are not contacted for permission). What data is shared? We asked whether parents think differently about sharing data such as brain scans compared to sharing video recordings of behavioural tasks. We examined whether responses differed between families with and without a family history of neurodevelopmental conditions.
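As an illustration only, the four consenting models described above could be represented in a repository's access logic roughly as follows. This is a minimal sketch under stated assumptions, not a description of the survey or of any existing platform: the names `ConsentModel`, `ConsentRecord` and `may_share` are hypothetical, and the sketch treats Restrictive and Dynamic consent as the same per-study check that differs only in how approval is collected.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConsentModel(Enum):
    RESTRICTIVE = "restrictive"  # family is re-contacted for each new study
    DYNAMIC = "dynamic"          # family approves studies through an online portal
    TIERED = "tiered"            # family pre-approves categories of research
    BROAD = "broad"              # anonymised data shared without re-contact


@dataclass
class ConsentRecord:
    """Hypothetical per-family consent record (illustrative only)."""
    model: ConsentModel
    approved_studies: set = field(default_factory=set)     # RESTRICTIVE / DYNAMIC
    approved_categories: set = field(default_factory=set)  # TIERED


def may_share(record, study_id, category):
    """Return True if the record permits sharing data with the given study."""
    if record.model is ConsentModel.BROAD:
        return True
    if record.model is ConsentModel.TIERED:
        return category in record.approved_categories
    # RESTRICTIVE and DYNAMIC both require explicit per-study approval.
    return study_id in record.approved_studies


# Usage: a family on dynamic consent approves one study via the portal.
family = ConsentRecord(ConsentModel.DYNAMIC)
family.approved_studies.add("study-A")
print(may_share(family, "study-A", "language development"))
print(may_share(family, "study-B", "language development"))
```

The sketch makes one trade-off visible: every model except Broad consent requires the repository to keep a link between a dataset and the family's current preferences.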

Study design
Participants completed an online questionnaire examining their attitudes towards sharing their child's data that had been collected within a research setting. The questionnaire was live between 2020 and 2021.

Recruitment procedure
Participants were recruited via the Birkbeck Babylab. Parents registered on the database were emailed an online questionnaire link (see Extended data (Begum-Ali, 2023)). As part of signing up to the database, participants had previously consented to be contacted for future research. In addition to this, we also advertised the questionnaire via social media platforms (e.g., Twitter, Facebook and Instagram). Both recruitment methods required participants to click on the study weblink, which redirected to the participant information sheet (see Extended data (Begum-Ali, 2023)); as such, participants were recruited on a voluntary basis. Recruitment continued until the questionnaire link had been live for a year. Consent was gathered via tick box; participants had to complete the online consent before being able to move onto the questionnaire. Inclusion criteria for the study were based on two factors: being a parent or legal guardian of a child under the age of 16 and currently being based in the UK. Participants were not rewarded with any compensation, monetary or otherwise, for taking part in the study. Ethical approval was provided by Birkbeck University of London Research Ethics Committee (ID: 192092).

Questionnaire
The survey was designed in Gorilla by researchers at the Centre for Brain and Cognitive Development, Birkbeck University of London for the purpose of this study, and consisted of 57 questions split into four sections (see Extended data (Begum-Ali, 2023)). The questionnaire took an average of 25 minutes for parents to complete. Parents were asked throughout the study to answer each question with their youngest child (mean age=59.44 months, SD=45.11 months) in mind unless the question specifically stated otherwise. Section 1 was designed to collect basic demographic information about their youngest child (sex, age, ethnicity) and family (highest level of education of the parent filling in the questionnaire, presence of neurodevelopmental or genetic conditions). Sections 2 to 4 asked participants about the types of organisations that parents would be willing to share data with (either their own or their child's), the level of control they would want over the data sharing process, what might influence their decision to share data and what type/form of data they would be willing to share. The survey consisted of open-ended and closed questions. A small pilot group provided feedback on an earlier version of the survey (n=5).

Participants
466 individuals clicked on the study weblink; 166 of these did not complete the consent form and therefore did not get access to the study. A further 105 individuals completed the consent form but did not answer any questions. Neither of these groups was included in the analysis. 195 participants are included in the study (see Figure 1); 183 mothers and 12 fathers. Of the 195 respondents, 122 had typically developing (TD) children and reported no neurodevelopmental conditions in the target child, siblings or parents, while 73 families reported at least one neurodevelopmental condition (NDC) in the immediate family (38 ASD, 11 ADHD, 16 ASD+ADHD, 3 NF1, 5 Dyslexia/dyspraxia/Epilepsy); see Table 1 for sample demographics.

Statistical analysis plan
Statistical analysis was conducted using SPSS (IBM version 26.0.0.0). Non-parametric tests were used due to the ordinal and categorical nature of the data. Answers to open questions were explored using Leximancer Desktop 5.0; this was not intended to be an exhaustive analysis but instead to generate concepts and themes which may be important to consider or highlight other factors not yet explored by the survey. Leximancer's concept tool was used to display the prevalence and co-occurrence of generated themes/concepts within the text.

Who can share my child's data?
Participants were asked to select the types of organisations with whom they would be happy to share their child's data. This showed strong differences in views by sector: of the total sample 96.4% would be happy to share their child's data with universities and research centres, while only 16.9% of participants would be happy to share their child's data with private companies and industry. Families with a history of neurodevelopmental conditions (NDC) were more willing than families without a history of NDC to share their child's data with GPs and hospitals (NDC=95.9% vs TD=85.2%; χ²(1)=5.38, p=.02, V=.17), private companies and industry (NDC=24% vs TD=12%; χ²(1)=4.97, p=.026, V=.16) and charities (NDC=58% vs TD=31%; χ²(1)=13.14, p<.001, V=.26). The two groups did not differ significantly in their willingness to share their data with universities and research centres (NDC=99% vs TD=95%; χ²(1)=1.66, p=.197, V=.1).
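For readers who wish to check the effect sizes above, Cramér's V can be recovered from each chi-square statistic as V = sqrt(χ² / (N(k−1))), where k is the smaller dimension of the contingency table (k=2 for these 2×2 comparisons). The sketch below is illustrative and assumes that all N=195 respondents answered each item, which the text does not state explicitly; the function name `cramers_v` is our own.

```python
import math


def cramers_v(chi2, n, k=2):
    """Cramér's V from a chi-square statistic.

    chi2: chi-square test statistic
    n:    number of observations (assumed here to be the full sample)
    k:    smaller of the number of rows and columns in the table
    """
    return math.sqrt(chi2 / (n * (k - 1)))


N = 195  # assumption: every respondent answered every organisation item

reported = {
    "GPs and hospitals": 5.38,
    "Private companies and industry": 4.97,
    "Charities": 13.14,
    "Universities and research centres": 1.66,
}

for organisation, chi2 in reported.items():
    print(f"{organisation}: V = {cramers_v(chi2, N):.2f}")
```

Under this assumption the first three values round to the reported .17, .16 and .26; the fourth comes out at approximately .09, close to the reported .1.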

Influences on parents' decisions to share child data.
To distinguish whether the type of organisation was the main factor influencing parental decision making, participants were asked to rate three possible factors which may influence their decision to share their child's data: who is running the study, the purpose of the study and the data security procedures of the study, using a Likert scale from 1 (Not at all) to 5 (Very much so). Friedman's analysis of variance found a significant difference in how important participants rated the three factors, but pairwise comparisons found that the purpose of the study was not rated significantly lower than data security (z=-.081, p=1) or who the study was run by (z=.2, p=.34). Data security was not rated significantly differently to who was running the study (z=.12, p=1); see Figure 3. Thus, the purpose for which data is shared is equally important to families as the security with which it is shared, and who data is shared with. Mann-Whitney U tests were used to investigate if there were differences between the NDC and TD groups on the three factors that may influence a parent's decision to share their child's data. NDC families reported that the purpose of the study is more important to them (M=4.6, SD=.76) than TD families (M=3.89, SD=1.21); U=2551, z=3.65, p<.001, r=0.32. There were no significant differences between groups on the influence of who was running the study (U=1869, z=-0.37, p=.71, r=0.03) (NDC M=4.4, SD=0.92, TD M=4.4, SD=0.99) or the study data security procedures (U=2011, z=0.53, p=.6, r=0.05) (NDC M=4.41, SD=0.79, TD M=4.24, SD=1.02).
The influence of who was running the study on data sharing preferences was significantly affected by parental education [χ²(2)=7.348, p=.025, η²=.047]; those parents with an undergraduate or postgraduate education were more influenced by the organisation conducting the study in their decision to share their child's data compared to those parents with a primary/secondary education (z=2.576, p=.030 and z=2.601, p=.028 respectively). There was no significant difference between parents with an undergraduate or postgraduate education on the influence of who is running the study on their willingness to share their child's data (z=.109, p=.913). See Table S6 for mean ranks and standard deviations. Parental education did not affect the influence of the purpose of the study [χ²(2)=.040, p=.980, η²=.0001] or the influence of data security procedures [χ²(2)=2.099, p=.350, η²=.015] on data sharing preferences.
We also examined whether parental ethnicity affected the factors that parents rated as most influential in making the decision to share their child's data. We found that the influence of who was running the study was affected by parental ethnicity  4). There are no significant group differences when comparing TD and NDC ratings of level of comfort with the four consent types (Table S1).
We investigated whether parental education influenced participants' ratings of the different consent models; no significant

Worry about child's wishes.
Participants were asked to imagine they had taken part in a study with their child when they were an infant and to rate how much they would worry about their child's wishes towards their data as they grew older, using a Likert scale from 1 (Not at all) to 5 (Very much so).
We considered the impact of prior research experience (e.g., if families who had taken part in any form of research differed in their views to those families that had no previous research experience).We found significant differences [χ 2 (4)=10.

What am I sharing?
Participants were asked to consider different types of data (e.g., questionnaires or experimental measures) and how willing they were to complete and share each type of data. Participants were asked to rate this on a scale from 1 (would not complete) to 5 (happy for this data to be shared freely). Identifying which types of data participants would be happy to complete and share is particularly important for informing study design with respect to future data sharing.
Biological data. A significant difference was found between the type of biological data and parents' willingness to complete and share this measure [Friedman's χ²(6)=265.S2.
Personal/demographic data. We found significant differences between the types of personal and demographic data parents were willing to share [χ²(4)=32, p<.001, W=.04]. Parents were more willing to share parent demographics such as ethnicity and education level (mean rank=3.17) when compared to Household information, such as the first part of postcode or the number of bedrooms in the home (z=.46, p=.042; mean rank=2.71). This was also the case for parent substance use (mean rank=3.18), which was found to be more willingly shared when compared to Household information (z=-.467, p=.036).
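The effect size W reported with the Friedman test above is Kendall's coefficient of concordance, which can be recovered from the test statistic as W = χ² / (N(k−1)), where k is the number of related conditions (here df+1=5 types of data). The following sketch is illustrative and assumes that all N=195 respondents rated every item, which is not stated in the text; the function name `kendalls_w` is our own.

```python
def kendalls_w(friedman_chi2, n, k):
    """Kendall's W from a Friedman chi-square statistic.

    friedman_chi2: Friedman test statistic
    n:             number of respondents (assumed complete cases)
    k:             number of related conditions (degrees of freedom + 1)
    """
    return friedman_chi2 / (n * (k - 1))


N = 195  # assumption: every respondent rated all five data types
w = kendalls_w(friedman_chi2=32, n=N, k=5)
print(f"W = {w:.2f}")
```

Under this assumption the result rounds to the reported W=.04, a small effect despite the significant omnibus test.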
Questionnaire data. We found significant differences in parents' willingness to share questionnaire data [χ²(5)=88.
Qualitative results. The questionnaire also consisted of four free text questions, with no character limit. Parents were given the opportunity to report on what they viewed as the main benefits and risks of sharing their own/their child's data. We used Leximancer Desktop 5.0 (www.leximancer.com) to generate concepts and themes based on responses to these questions. This analysis was not intended to be exhaustive, but was included to identify any concepts of importance to parents which might have been overlooked when designing the survey. We used Leximancer's concept map tool to display topographically the prevalence and co-occurrence of concepts and themes from the qualitative data (see Figure 5).
Parents favoured themes like "better", "research" and "understanding" when describing the benefits of data sharing, indicating that supporting the improvement of research quality was the primary benefit of data sharing. The proximity of the themes "results", "available" and "faster" indicates that parents felt that expediting the research process was another important benefit of data sharing.
Parents describing the risks of data sharing used terms like "information" and "misuse", both proximate to each other topographically and with high respective prevalence, indicating that the misuse of the information, or data, shared was a key concern. Concerns about data "breaches" and data falling into the "wrong hands" were topographically proximate, though had fewer prevalence hits than other concepts. Parents of children with one or more neurodevelopmental conditions were particularly concerned with the "sensitivity" of data being shared; this theme had high prevalence for this group only, and was proximal to other key concepts, particularly "misuse."

Discussion
Data sharing is an increasingly important focus for developmental science (Friedman, 2007), but we need to move forward in partnership with parents and families. To establish family views on key parameters of data sharing, we conducted a comprehensive survey of the views of parents of young children. We assessed a heterogeneous sample enriched for families with a history of neurodevelopmental conditions to establish whether motivations may differ based on experience with clinical services. We focused on views of sharing of pseudoanonymised data (with contact details removed but identified with an ID code, as is common in most developmental research), because personal data is already governed by strict privacy and security frameworks. Results showed that families preferred consent models that gave them maximal control over sharing their child's data (dynamic or tiered consent). Further, only a third of families indicated that they were comfortable with completely open sharing, meaning that funders and journals that mandate open sharing will risk substantially skewed samples of enrolled participants; only 55% of families were happy with global data sharing, posing challenges for the globalisation of research collaborations. Trust in industry was particularly low, a consistent theme in other work that indicates the need for continued work in communicating the goals and values of industrial partners (Manhas et al., 2015). Importantly, the purpose of a research study was as important in decision-making as security and the nature of the people conducting it; this indicates that families are thinking beyond their own child's right to privacy and considering more altruistic aspects of societal research goals, but also that efforts to improve data security are necessary but not sufficient for an ethical data sharing policy. In general, families with experience of neurodevelopmental conditions were happier to support data sharing than other families; this raises important considerations for data sharing dialogues that often focus on these communities but may assume that parents with typically developing children have fewer concerns. Taken together, developmental science requires databasing approaches that allow shared decision making and embed transparency about the purposes for which data is shared.

Variation by location and sector
Trust and willingness to share data varied by key factors such as geographical location and sector. Overall, whilst 96% of parents said they would be happy to share their child's data with universities and research centres, only 17% of participants would be happy to share their child's data with private companies and industry. This lack of trust in the private sector has been previously noted (Manhas et al., 2015) and indicates substantial barriers to collaborative projects that include industrial partnerships. Further work is required to determine the reasons for mistrust, but high-profile concerns around the approach of companies such as Cambridge Analytica (Berghel, 2018) and mistrust of the profit motives of private companies and their contribution to (for example) the opioid crisis (Marks, 2020) are likely contributory factors. Outreach events in which companies can discuss their approach to research with parents are important, as well as maintaining the highest standards of probity and transparency in industrial collaborations. The common failures to declare conflicts of interest in published studies or press releases undermine these efforts (e.g., Feldman & Mann, 2019).
One point of note on the question of trust in organisations is that parental ratings may well have been influenced by the organisation conducting the research. For example, it is possible that if parents are willing to take part in research for certain organisations (in this case, completing questionnaires for a university), they may also be more willing to share their/their child's data with the same type of organisations. As such, we may be observing an overestimation of trust in universities and research centres by virtue of the sample that completed the questionnaire. Importantly, however, parents reported similar levels of trust in GPs and hospitals as in universities (25% vs 32% of parents trusted the respective organisations completely with their child's data).
Geographical location of a sharing partner was also relevant. Although 55% of this UK sample said they would be happy to share their data globally, 24% preferred their data to stay 'Within the EU' and 15% selected 'UK only' data sharing (with 'None of the above' at 6%). The democratisation of research data is a critical endeavour (Buckingham Shum et al., 2012), with emphasis needed both on collecting data from contexts outside industrialised settings (Nielsen et al., 2017) and on sharing already-collected data with scientists from around the world. Data sharing is important for pooling insights across larger populations; several recent publications on linking individual differences in brain function to behaviour suggest sample sizes need to reach the hundreds of thousands to be meaningful (Grady et al., 2021), and in genetics samples in the millions will be required (Hivert et al., 2021). Such samples are rarely easy to reach without data pooling across settings, and this process is in its infancy for most developmental research. Thus, parental reticence about sharing beyond local geographical borders is a challenge to the globalisation of research. Some groups have managed this successfully. Consortia like the 'Many Babies' group (e.g., Visser et al., 2022) play a critical role in deploying the same experiment across continents to assess reproducibility and generalisability; methods employed are typically visual attention or eyetracking (the methodology that raised the fewest concerns in our respondents). Databrary provides an innovative platform for video sharing (http://databrary.org). Many behavioural cohorts have infant data with permission for broad sharing. However, databases that include multiple linked datapoints per participant with brain and biological data from infancy that can be openly shared remain rare. Further, there is very little empirical literature on the experience of parents and children in these studies, their decision-making around data sharing and their understanding of the open nature of the data.

Family experience of neurodevelopmental conditions
Families with a child or family member with a neurodevelopmental condition (primarily autism or ADHD) were in general more positive about data sharing than families who only had experience of typically developing children. Families with experience of neurodevelopmental conditions may have a stronger motivation to share data in order to help their child or other children experiencing similar challenges (e.g., Haas et al., 2016; Lajonchere, 2010; Salvaterra et al., 2014). Care must be taken not to exploit this altruistic motivation and to ensure that data sharing protections are just as strong for this group. In particular, attitudes of autistic people and family members of autistic people can sometimes diverge; ensuring that infants or children have control over their own data as they grow old enough to make decisions for themselves is critically important. Since parents are making decisions about sharing on behalf of their child, consideration of the child's future self is important. For example, discoveries about a child's genetics can have far-reaching implications for them having children in the future. For this reason, general NHS practice is not to test for genetic syndromes in children who are not "Gillick competent" (able to consent) unless they are immediately consequential for treatment (British Society for Human Genetics, 2010; Joint Committee on Genomics in Medicine, 2019); these norms are beginning to shift in other countries and settings (Papaz et al., 2019), which raises important questions about the ethics of returning genetic results from a study where data is collected at multiple global sites. In the case of neurodevelopmental conditions such as autism there are substantial concerns around the development of prenatal tests that could lead to selective abortion (Walsh et al., 2011); this is more likely in the case of rare variant detection given their larger effect sizes on developmental outcomes. Once a child's data, including their genetic information, has been openly shared, it cannot be retrieved; many autistic people have substantial concerns about genetic data sharing (Sanderson, 2021) and it is unclear how knowing their DNA was shared as a child could affect them. Studies exploring the feelings of adults whose data was shared as a child and of teenagers (Pavarini et al., 2022) are important in this regard.

Research goals
Families are concerned not only about the security and privacy of their child's data, but also about the purpose for which it is shared. These concerns are greatest in families with personal experience of neurodevelopmental conditions. Conversations about the goals of research have become increasingly prevalent on social media and in research collaborations, with the neurodiversity movement at the cutting edge of highlighting the common mismatches between a study's stated scientific goals and its public communication (e.g., Pellicano et al., 2014). Increasing awareness of the bias that can be introduced by ableist language (Bottema-Beutel et al., 2021) and deficit-based models (Kapp, 2019) has led to a shift in thinking and approach in many developmental fields (Pellicano & den Houting, 2022). Together, there is a growing understanding that families are not simply concerned with the implications for their own child of a data sharing breach (e.g., the loss of their right to an open future through inflicted insight, or the loss of privacy if information is reidentified). Rather, families are concerned that their child's data might be used for a study with a purpose with which they disagree. One area of common concern is studies with the goal of identifying prenatal tests for neurodevelopmental conditions (known to enable abortion in the case of Down's syndrome; Natoli et al., 2012). High profile instances of similar issues have been widely discussed in the case of cultural groups and DNA data (e.g., Lee et al., 2019) and the use of cell lines without permission (Wolinetz & Collins, 2020). We need to carefully consider mechanisms that could allow families to decide against data sharing with studies with a particular goal. One approach (the most burdensome) is to ask families to reconsent to each new episode of data sharing, but this requires data to remain identifiable so that it can be shared (or not shared) on each new occasion. This trade-off between identifiability and control must be carefully considered at study outset, but notably dynamic consent was the clear preference expressed in our survey.
A less onerous method may be to develop broad categories of purpose that families could agree or disagree with during the original consenting process; online portals can then allow these preferences to be changed at will by the parent or by the child as they grow (then frozen when delinkage occurs).
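To make the tiered model concrete, the following is a minimal sketch of how a repository might store purpose-level preferences that can be changed until delinkage and are frozen afterwards. The class name, field names and purpose categories are hypothetical and are not taken from the survey or from any existing repository; defaulting unlisted purposes to "not shared" is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConsentRecord:
    """Purpose-level sharing preferences for one participant (hypothetical schema)."""
    participant_id: str
    # Maps a purpose category to whether sharing is permitted (categories are illustrative).
    permissions: Dict[str, bool] = field(default_factory=dict)
    frozen: bool = False  # becomes True once the data are delinked from identity

    def update(self, purpose: str, allowed: bool) -> None:
        """Change a preference; only possible while the record is still linked."""
        if self.frozen:
            raise PermissionError("preferences are frozen after delinkage")
        self.permissions[purpose] = allowed

    def delink(self) -> None:
        """Remove the identifier and freeze the preferences as they stand."""
        self.participant_id = ""
        self.frozen = True

    def permits(self, purpose: str) -> bool:
        """Purposes never presented to the family default to 'not shared' (an assumption)."""
        return self.permissions.get(purpose, False)


record = ConsentRecord("P001", {"basic_development": True, "prenatal_testing": False})
record.update("prenatal_testing", True)   # a parent, or later the child, changes their mind
record.delink()                           # from here on, preferences can no longer change
```

A design like this makes the trade-off described above explicit: control over preferences ends at the point where the identifier is removed.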
Alternatively, participants could apply to join a panel that would be responsible for collective decision-making about acceptable purposes on behalf of the cohort (allowing deidentification to occur). Applicants to access study data can then describe the purpose to which the data will be put. By essentially requiring a preregistration, this approach can also slow the process of dataset decay (Thompson et al., 2020) and constrain statistical flexibility, allowing Type 2 error rates to be tracked. This curated sharing approach thus provides an attractive balance between open science and privacy and protection, but requires considerable investment by the scientific community in scientific review.

Biased samples
More personalised approaches to curated data sharing raise substantial analytical issues. Given only 33% of our parents said they would be happy with open data sharing, investigators who mandate sharing risk attracting a biased sample of families to the research project. This may exacerbate inequalities in the representation of minoritized groups in research studies. The overwhelmingly White/educated/industrialised nature of research samples in developmental science has been well documented (Nielsen et al., 2017); efforts to improve this need to be considered alongside initiatives to increase possibilities for open data sharing. Rigorous evaluation of the demographic characteristics of participants who have different levels of willingness to share data is important in this endeavour, along with increasing trust in the research process in historically marginalised communities. Comfort with sharing did vary slightly between modalities, with the most concerns for sharing DNA data and the fewest for eyetracking data. However, the differences were small and in general the data indicated that researchers need to consider family concerns for all collected data types. Additionally, we found no differences in parental level of education and ethnicity between those parents who are willing to share their child's data and those parents who would prefer to restrict their child's data. As such, the sample of parents who are willing to share their child's data seems to be representative on these core characteristics. This may lend support to the suggestion that "the claims of the amount of consent bias are likely overstated; and any residual effects of consent bias fall below acceptable levels of imprecision" (Rothstein & Shoben, 2013).
Including a broad range of participants in a research study with tiered or dynamic permissions for sharing also poses challenges for reproducibility. If data shared with a group on a different continent contains a different number of participants than data shared with a local group, assessing the reproducibility of a research finding is challenging. One approach is to simulate missing data so that its statistical characteristics are preserved (e.g., Woods et al., 2021), though it is important to have a full ethical discussion about whether this is in the spirit of the participant's decision to restrict the sharing of their data. Imputation approaches should become standard in data repositories such that this can be reproducibly achieved in a standardised way that can be described in publications resulting from shared data. However, many missing data approaches may be unsuitable if data is not missing at random (Little et al., 2014); again, this highlights the need to first understand the characteristics of the individuals who do and do not choose to share data openly. Further qualitative study of the reasoning behind an individual's choice to restrict sharing may also open avenues: if concerns are primarily around privacy, federated data sharing approaches or full information about security and privacy considerations may enable them to feel confident about broader sharing.
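As a toy illustration of why the missingness mechanism matters, the sketch below compares the mean outcome in a shared subset with the mean in the full cohort under two scenarios. All values are invented for illustration and are not drawn from our survey; the "unrelated" sharing pattern was chosen by hand so that the shared subset happens to mirror the full cohort.

```python
from statistics import mean

# Hypothetical outcome scores for six children (invented values).
scores = [90, 95, 100, 105, 110, 115]


def shared_mean(scores, shares):
    """Mean outcome among families who agreed to share."""
    return mean(s for s, ok in zip(scores, shares) if ok)


# Scenario 1: willingness to share is unrelated to the outcome.
unrelated = [True, False, True, True, False, True]
# Scenario 2: only families of higher-scoring children share (not missing at random).
selective = [s >= 105 for s in scores]

full_mean = mean(scores)
bias_unrelated = shared_mean(scores, unrelated) - full_mean
bias_selective = shared_mean(scores, selective) - full_mean
print(full_mean, bias_unrelated, bias_selective)
```

In the second scenario the shared subset overestimates the cohort mean, and no reweighting based only on the shared scores could reveal this; information about who declined to share is needed.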

Limitations
Though we attempted to include a broad sample in the current investigation, we are aware that much of the sample is from a White British background (74%) with higher familial education levels (~88% with an undergraduate degree). Also, the study was conducted within a UK context only. As such, opinions from different contexts and cultures are important to consider.
In addition to this, a large proportion of our sample had previous experience of participating in research (either themselves or their children).Whilst this is a strength in that participants are drawing on lived experiences to respond to the questions and not answering hypothetically, it is important to note that we may be overestimating the willingness to share data by virtue of not capturing a higher proportion of those respondents that may not participate in research at all.

Conclusion
Families want us to share their child's data, but to do it very carefully. We need a field-wide investment in data sharing architectures that enable dynamic or tiered consent, implement state-of-the-art approaches to dealing with missing data, and embed full transparency to build community trust in the security, privacy and goals of different groups of researchers.
We need a stronger emphasis on responsible open science, twinning the important drive for greater data democracy with an understanding of participant concerns and a commitment to ensuring that datasets do not become even less representative because of data sharing efforts. Taken together, we need to work in partnership with families to produce data sharing architectures that enable developmental science to become truly global.

Jannath Begum Ali
Reviewer 2 comments: This article investigates parental opinions on data sharing in developmental science. This is an important and rarely addressed topic with clear implications for funders' and publishers' data sharing policy. Data is presented from 195 parents who responded to a survey designed by the authors. The data suggests that parents are generally supportive of curated data sharing, with increased willingness to share their child's data with universities, research centres, GPs and hospitals than private companies and industry. Parents supported more nuanced consenting models with opportunities to modify their permission over time. Parents with family experience of neurodevelopmental conditions were generally more open to data sharing than families without experience of neurodevelopmental conditions. The article is well-documented and written and presents a study which is robustly designed, conducted, and analysed. I have a few suggestions for clarifications and improvements.
We thank the Reviewer for their positive comments on our manuscript.
R2.1: The different types of consent could be more clearly explained in the introduction and/or method section. It is difficult to fully understand this section of the results as it currently is. In the discussion, different consent scenarios are described, but it is not always straightforward to match these to the labels that are used in the questionnaire and the results section.
We have provided further details about the consent models (pg. 7): "How is the data shared? We asked participants about their views of common consenting models, including Restrictive (participants are contacted and asked each time a study wants to use their previously collected data), Dynamic (participants use an online portal to choose which studies their data is shared with), Tiered (participants choose certain categories/tiers of research they would be happy to share their data with at the point of informed consent for the original data collection) and Broad consent (the data can be shared in an anonymised manner for other studies and participants are not contacted for permission)."
R2.2: p.3: The authors state that 'there has been little focus on the gender distribution of researchers who exploit open datasets'. Could the authors clarify why this gender distribution is of interest?
As per R1.2, we have further expanded upon this point (pg. 4): "Further, there has been little focus on the gender distribution of researchers who exploit open datasets (sometimes called 'data parasites'; Longo & Drazen, 2016) relative to those who often are involved in preparing them. This can have real implications for the career development of researchers who collect and share data (particularly if they are at an earlier career stage) because this process requires a considerable investment of time relative to running analyses on existing data. Specifically, it has been found that female academics are assigned and complete more 'academic housework', such as mentoring, student and faculty service (including emotional labour) and being involved in lower status committees that do not necessarily get reflected in their CVs (O'Meara et al., 2017; Hanosono et al., 2019; Jarvinen & Mik-Meyer, 2024). Indeed, this 'invisible labour' could also be extended to the time intensive tasks of data collection and curation that are specific to open datasets, with women typically overrepresented in more junior (e.g., data collection) roles, but underrepresented in more senior academic roles (Herschberg and Berger, 2015). In order for open datasets to avoid becoming a route where gender disparities are amplified, it would be important to examine the mechanisms of this unequitable gender distribution and remedy this."
R2.3: Figure 2: The colour scale seems to be missing from the bottom panel.
We have now added the legend for Figure 2 (pg. 40).
R2.4: Figure 3: The question presented on the bottom panel does not seem to match the colour scale.
We have now corrected this to "How much would you worry about your child's wishes towards their data as they grew older?" (pg.41).
R2.5: In personal/demographic data, it is not always clear what information would be classified as personal and demographic versus household information. I also wondered how this was presented to parents and if all parents would have understood these terms in a similar way. The same goes for questionnaire data. What is this exactly and how was it described for parents?
Given the close nature of these types of information, we agree that it may have been confusing for the parents. As such, in the questionnaire (detailed in the SM), we defined the exact types of information for each category. For example, we asked specifically "How happy would you be to share demographic information about your child, including ethnicity and date of birth?", "How happy would you be to share demographic information about yourself, including ethnicity, date of birth and education level?" and "How happy would you be to share information about your household, including number of bedrooms, first part of postcode and area?" We also provided several separate options regarding different types of questionnaire data. Specifically, we asked: "Please select all the types of questionnaires you would be happy to be shared: Child's sleep patterns, Child's temperament (personality), Child's milestones (e.g., when did they start crawling?), Parent(s) mood (e.g., questions about symptoms of depression), Parent(s) stress levels, Family medical and psychiatric history, Pregnancy data (e.g., gestation, any medications taken during pregnancy), Parent(s) substance use." We have also clarified this in the main text (pp. 8-9): "Section 1 was designed to collect basic demographic information about their youngest child (sex, age, ethnicity) and family (highest level of education of the parent filling in the questionnaire, presence of neurodevelopmental or genetic conditions)." "Personal/Demographic data: We found significant differences between the types of personal and demographic data parents were willing to share [χ²(4) = 32, p < .001, W = .04]. Parents were more willing to share parent demographics such as ethnicity and education level (mean rank = 3.17), when compared to Household information, such as the first part of postcode or the number of bedrooms in the home (z = .46, p = .042; mean rank = 2.71). This was also the case for parent substance use (mean rank = 3.18), which was found to be more willingly shared when compared to Household information (z = -.467, p = .036)." (pp. 18-19). As such, given the detailed information for each type of data, we feel that this would have been fairly clear to participants.
R2.6: To interpret Figure 5, it would be helpful to know how many parents provided written responses. For example, when stating that a term had 114 hits, does this mean that it was presented in 114 out of 195 written reports?
Our final sample completed the questionnaire in full (N = 195)
In this work, the authors investigated parents' views on data sharing in developmental science. Given the current situation of funders, publishers and institutions increasingly encouraging open data access, this is really timely. I read this manuscript with great interest and I agree with the authors that the question of data sharing is an important one, for both researchers and parents. The authors conducted a survey online and report data from 195 parents of children with and without neurodevelopment disorders. I believe the methods and the analyses employed are valid and the results will be of interest not only to the scientific community and those taking part in research but also to a range of stakeholders. The discussion provides informative conclusions and offers thoughtful reflections. The authors also advance possible directions in the area of data sharing.
I outline below some minor comments and suggestions:
1. The first part of the second paragraph reads a bit redundant with the introductory paragraph.
2. I find it interesting the reflection on gender distribution of researchers who prepare and exploit open datasets but I believe it needs a slightly bigger context or explanation to be fully appreciated by the reader.
3. Please specify whether the information of parent education refers only to the parent filling in the survey.
4. Figure 1: I think the arrow to the box 'No questions completed (n=105)' should come from the box 'Completed consent form (n=300)'.
5. Figure 2: missing legend for graphs (E) (F) (G).
6. Do the authors think the answers to the questions on the types of organisations (who can share the data and trust in these organisations) would have been different if the survey was coming from a different organisation, i.e. private company, charity, GP? Consider to comment on this point in the discussion.

Are sufficient details of methods and analysis provided to allow replication by others? Yes
Are all the source data and materials underlying the results available? Yes
If applicable, is the statistical analysis and its interpretation appropriate?

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Developmental science
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 23 Apr 2024

Jannath Begum Ali
Reviewer 1 comments: In this work, the authors investigated parents' views on data sharing in developmental science. Given the current situation of funders, publishers and institutions increasingly encouraging open data access, this is really timely. I read this manuscript with great interest and I agree with the authors that the question of data sharing is an important one, for both researchers and parents. The authors conducted a survey online and report data from 195 parents of children with and without neurodevelopment disorders. I believe the methods and the analyses employed are valid and the results will be of interest not only to the scientific community and those taking part in research but also to a range of stakeholders. The discussion provides informative conclusions and offers thoughtful reflections. The authors also advance possible directions in the area of data sharing.
We thank the Reviewer for their positive assessment of the study.
R1.1: I outline below some minor comments and suggestions: The first part of the second paragraph reads a bit redundant with the introductory paragraph.
We have removed this portion of the second paragraph (pg. 3).
R1.2: I find it interesting the reflection on gender distribution of researchers who prepare and exploit open datasets but I believe it needs a slightly bigger context or explanation to be fully appreciated by the reader.
We thank the Reviewer for this comment and have expanded our discussion of the gender disparity in open datasets (pg. 4): "Further, there has been little focus on the gender distribution of researchers who exploit open datasets (sometimes called 'data parasites'; Longo & Drazen, 2016) relative to those who often are involved in preparing them. This can have real implications for the career development of researchers who collect and share data (particularly if they are at an earlier career stage) because this process requires a considerable investment of time relative to running analyses on existing data. Specifically, it has been found that female academics are assigned and complete more 'academic housework', such as mentoring, student and faculty service (including emotional labour) and being involved in lower status committees that do not necessarily get reflected in their CVs (O'Meara et al., 2017; Hanosono et al., 2019; Jarvinen & Mik-Meyer, 2024). Indeed, this 'invisible labour' could also be extended to the time intensive tasks of data collection and curation that are specific to open datasets, with women typically overrepresented in more junior (e.g., data collection) roles, but underrepresented in more senior academic roles (Herschberg and Berger, 2015). In order for open datasets to avoid becoming a route where gender disparities are amplified, it would be important to examine the mechanisms of this unequitable gender distribution and remedy this."
R1.3: Please specify whether the information of parent education refers only to the parent filling in the survey.
We have now clarified this with the following text (pp. 8-9): "Section 1 was designed to collect basic demographic information about their youngest child (sex, age, ethnicity) and family (highest level of education of the parent filling in the questionnaire, presence of neurodevelopmental or genetic conditions)."
R1.4: Figure 1: I think the arrow to the box 'No questions completed (n=105)' should come from the box 'Completed consent form (n=300)'.
We have amended this in Figure 1 (pg. 39).
R1.5: Figure 2: missing legend for graphs (E) (F) (G).
We have now added the legend for Figure 2 (pg. 40).
R1.6: Do the authors think the answers to the questions on the types of organisations (who can share the data and trust in these organisations) would have been different if the survey was coming from a different organisation, i.e. private company, charity, GP? Consider to comment on this point in the discussion.
This is an interesting point and we have added the following text to the Discussion (pg. 23): "One point of note to the question of trust in organisations is that parental ratings may well have been influenced by the organisation conducting the research. For example, it is possible that if parents are willing to take part in research for certain organisations (in this case, completing questionnaires for a university), they may also be more willing to share their/their child's data with the same type of organisations. As such, it may be that we are observing a possible overestimation of trust in universities and research centres by virtue of the sample that completed the questionnaire. Though, importantly, parents reported similar levels of trust in GPs and hospitals as that of universities (25% vs 32% of parents that trusted the respective organisations completely with their child's data)."
Competing Interests: No competing interests were disclosed.

Figure 1. CONSORT diagram showing the total number of participants in our sample.

Figure 2. Graphs showing participants' trust levels with different organisations that may share their data (Panel A: Universities and Research Centres, Panel B: Private Companies and Industry, Panel C: Charities, Panel D: GPs and Hospitals). Graphs showing which factors influence parents' decisions to share data (Panel E: Who is running the study, Panel F: Data security procedures, Panel G: Purpose of the study).

Figure 3. Graphs showing participants' preferences in distance of sharing data, split by group (Panel A: Families without a child with a neurodevelopmental condition, Panel B: Families with a child with a neurodevelopmental condition). Graphs showing participants' level of worry about sharing data, split by whether they had any previous experience of taking part in research (Panel C: No previous research experience, Panel D: Some research experience, either parent or child).

Figure 4. Graphs showing participants' ranked preferences for different consent models, from most to least comfortable (Panel A: Restrictive, Panel B: Tiered, Panel C: Broad, Panel D: Dynamic).

Figure 5. Word map of all parents' (n = 195) opinions on the Benefits (Panel A) and Risks (Panel B) of data sharing. Panel B is restricted to those parents with a child with one or more neurodevelopmental condition (n = 73).
, which included the free text responses regarding the benefits and risks of data sharing. We have made this clear in the Figure captions (pg. 38): "Figure 5: Word map of all parents' (n = 195) opinions on the Benefits (Panel A) and Risks (Panel B) of data sharing. Panel B is restricted to those parents with a child with one or more neurodevelopmental condition (n = 73)."
Department of Psychology, University of Essex, Colchester, England, UK


Is the work clearly and accurately presented and does it engage with the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
Are all the source data and materials underlying the results available? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.