Medication use for depression and anxiety: data collected from 28/29 year-old offspring in the Avon Longitudinal Study of Parents and Children [version 1; peer review: awaiting peer review]

Patterns of mental health have been well characterised in the Avon Longitudinal Study of Parents and Children (ALSPAC), but there is a paucity of longitudinal medication data for depression and anxiety within the ALSPAC study. Understanding types and usage of pharmacological treatment allows for a deeper understanding of mental health in the ALSPAC study and key factors influencing illness outcomes, such as access to service provision. Enhanced understanding of the types of medication people have used to manage depression and anxiety could also give insight into which treatments work for individuals over the life course. This data note describes data collection on medication for depression and anxiety in the offspring (ALSPAC-G1) at ages 28-29 (born 1991/1992). Data were collected through a questionnaire deployed between December 2020 and April 2021. First, we highlight the variables collected as part of the questionnaire, specifically on medication use for depression and anxiety. We then outline how we have derived antidepressant variables including type of antidepressants, length of time used, and treatment phenotypes such as ‘remission’ and ‘non-remission’. Finally, we report associations between longitudinal mental health variables and reported use of medication and antidepressant variables to validate these new measures. Considerations for how researchers can use the data collected are also summarised.


Introduction
Antidepressants (ADs) are a commonly used psychopharmacological intervention for the treatment of depression and anxiety. Their use in the United Kingdom (UK) has been rising, with 83.4 million ADs prescribed in 2021/22 1 . Evidence also suggests that AD usage rates are rising worldwide in association with the COVID-19 pandemic 2 , alongside evidence of deteriorating mental health in UK adults as the pandemic progressed 3 .
Recent NICE guidelines for the treatment of depression state that ADs are no longer routinely offered as a first-line treatment for less severe depression 4 , but are still recommended as the first-line treatment option for adults with more severe depression in combination with Cognitive Behavioural Therapy (CBT). AD monotherapy is only recommended as fourth-line treatment 4 . Despite this, patients with mild to moderate depressive episodes can request ADs as first-line treatment if that is their preference. Treatment preference in more severe episodes of depression is at the discretion of the healthcare provider. Within these guidelines, recent recommendations around safe prescribing of ADs acknowledge that people would like to have more specific information around the potential benefits and harms of AD use, as well as information on the length of treatment and process of withdrawal 5 . The guidelines also state that, amongst other things, information on possible side effects of AD use should be available to the patient before commencing treatment. Data on AD use will inform an important part of the ongoing effort to manage mental health conditions and challenges, with the Wellcome Trust recently selecting Selective Serotonin Reuptake Inhibitors (SSRIs) as an important "active ingredient" in the effort to combat youth anxiety and depression 6,7 .
Birth cohort studies offer the opportunity to explore mental health and illness dynamics intergenerationally and over time, facilitating the exploration of potential causes and features of episodic mental health challenges in their participants. The Avon Longitudinal Study of Parents and Children (ALSPAC) has rich information on longitudinal patterns of mental health 8 , but does not currently contain detailed information on AD use and health behaviours relating to medication use. Collecting richer data that document reasons for stopping treatment will be of great benefit to the existing mental health data corpus available in the ALSPAC resource, enhancing understanding of intervention efficacy in a real-world setting that goes beyond the objectivity of electronic health record (EHR) linkage.
This data note describes the responses to the 'Life @ 28' Mental Health Treatments questionnaire section, deployed between December 2020 and April 2021, as part of the ALSPAC birth cohort study. The data described below relate to the prevalence and use of antidepressant medication for depression and anxiety, including names and types of ADs used, dates and timeframes of usage, adherence rates, and reasons for stopping treatment. These data may be used in research studies seeking to address the gaps in clinical knowledge around which the recent NICE guidelines for depression have been adapted, such as potential benefits and harms of treatment, and barriers to patient adherence.

Setting
ALSPAC is a population birth cohort study launched in the 1990s that followed the lives of approximately 15,000 families in Bristol and the surrounding areas 9,10 . Pregnant women resident in Avon, UK with expected dates of delivery between 1st April 1991 and 31st December 1992 were invited to take part in the study. 20,248 pregnancies have been identified as being eligible and the initial number of pregnancies enrolled was 14,541. Of the initial pregnancies, there was a total of 14,676 foetuses, resulting in 14,062 live births and 13,988 children who were alive at one year of age. When the oldest children were approximately seven years of age, an attempt was made to bolster the initial sample with eligible cases who had failed to join the study originally. As a result, when considering variables collected from the age of seven onwards (and potentially abstracted from obstetric notes) there are data available for more than the 14,541 pregnancies mentioned above: the number of new pregnancies not in the initial sample (known as Phase I enrolment) that are currently represented in the released data and reflecting enrolment status at the age of 24 is 906, resulting in an additional 913 children being enrolled (456, 262 and 195 recruited during Phases II, III and IV respectively). The total sample size for analyses using any data collected after the age of seven is therefore 15,447 pregnancies, resulting in 15,658 foetuses. Of these, 14,901 children were alive at one year of age 11 .
The range of data available in ALSPAC is vast and spans many different domains of health, lifestyle, and wellbeing. Data have been collected across the ALSPAC resource using a variety of self-report, in-clinic, and biological assays, with online data being captured through REDCap software 12 . Details about ALSPAC, an overview of available data, and a variable search tool can be found at http://www.bristol.ac.uk/alspac/researchers/our-data/. Study data were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol. REDCap (Research Electronic Data Capture) is a secure, web-based software platform designed to support data capture for research studies 13 . Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.

Questionnaire content design
The 'Life@28' questionnaire (also indexed with the name YPH) was deployed in December 2020 and had one section dedicated to mental health treatments, which asked about medication prescribed for depression or anxiety in the past five years. The medication question was repeated ten times, so respondents could enter a maximum of ten medications, with additional space at the end of the questionnaire to enter any more medications if needed.
The questionnaire then continues with questions regarding current and previous use of Cognitive Behavioural Therapy (CBT), Psychological Therapies, and Self-Help Guidance, which for brevity are not described here.

Data cleaning and curation
The data set was created iteratively through a mixture of manual and automated processing, due to the complex nature of the raw data and the unstructured and varied free-text responses. Automated processing was undertaken using Python version 3.1 and STATA version 16 and included use of a string pattern matching library ('FuzzyWuzzy') to classify the medication names. Medication names that were spelt relatively accurately could be matched automatically, but more ambiguous entries had to be classified manually (e.g. 'Certraline' classified manually as 'Sertraline', 'Streamline' classified as 'Missing / unable to characterise'). Approximately half of the free-text data could be characterised using an automated function, with the rest processed manually because of their complexity.
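The automated matching step described above can be illustrated with a minimal sketch. This is not the pipeline used for the data set: it swaps in Python's standard-library `difflib` as a stand-in for FuzzyWuzzy, and the reference list, the 0.85 cutoff, and the function name `classify_medication` are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical reference list of medication names (illustrative only).
KNOWN_MEDICATIONS = ["sertraline", "citalopram", "fluoxetine", "mirtazapine"]


def classify_medication(raw_entry, cutoff=0.85):
    """Return the closest known name, or None to flag the entry for manual review.

    The cutoff is an assumed similarity threshold, not a value from the data note.
    """
    cleaned = raw_entry.strip().lower()
    matches = get_close_matches(cleaned, KNOWN_MEDICATIONS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


entries = ["Sertraline", "sertaline ", "Fluoxetin", "anxiety medication"]
classified = {entry: classify_medication(entry) for entry in entries}

# Entries with no confident match are left for a human to classify.
needs_manual_review = [entry for entry, name in classified.items() if name is None]
```

A stricter cutoff sends more entries to manual review; a looser one risks matching an ambiguous entry to the wrong medication, which is the trade-off that motivates combining automated and manual classification.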
The free text medication name data were processed first to facilitate sorting of the data set into separate medication sections and derivation of AD class use variables. The adherence, start, and cessation dates data, along with the cessation reasons data, were used to derive medication-use phenotypes detailed below. The data were subject to multiple iterations of quality checking to debug the data, accounting for both computer programming and human error.

Descriptive statistics - frequencies of questionnaire response types
4446 people completed the section on mental health treatments, with 918 answering 'yes' to having been prescribed medication for depression or anxiety in the past five years (Table 1).

Participant demographics
As indicated in Table 2, a higher frequency of female respondents had been prescribed medication for anxiety or depression in the last five years compared to males (Pearson's χ 2 = 75.6, p<0.001). There was evidence for an association between maternal SES in early life and likelihood of having been prescribed medication (Pearson's χ 2 = 4.3, p=0.039), and there was evidence that respondents who had at least one parent with mental health problems during childhood and early adolescence (measured as an Adverse Childhood Event (ACE); see 14 for details) were more likely to have been prescribed medication for depression or anxiety (Pearson's χ 2 = 56.7, p<0.001).

Structure of final data set
The final data set has five main sections: AD usage, antipsychotic section, anxiolytic section, phenotypes section, and miscellaneous variables. Each section is described below.

AD usage section
The AD usage section has seven repeated subsections, corresponding to the number of medications reported by each individual. Participants were able to report up to seven separate occasions of AD use. Each occasion of AD use has a separate variable name, detailed in Table 3. A respondent who has only taken one AD will have the first AD usage variable populated, with subsequent AD variables across that row containing the descriptor 'No further ADs'. Each instance of AD use was recorded as a separate occasion, rather than each occasion of use describing a different type of AD used. For example, a respondent who has been prescribed Sertraline twice would have two instances of AD use recorded as Sertraline, with 'No further ADs' recorded from occasion three onwards. Further derived AD variables include information on AD class, number of ADs prescribed, and indicators of whether the respondent stopped taking a particular class of ADs due to side effects. See Table 4 for details on how different ADs were classified into the class variables.
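The row layout described above, with one slot per occasion of use and a filler descriptor for unused slots, could be sketched as follows. The function name `to_occasion_slots` and the decision to raise an error on more than seven occasions are assumptions for illustration; only the seven-slot limit and the 'No further ADs' descriptor come from the data set description.

```python
MAX_AD_OCCASIONS = 7           # number of repeated AD subsections in the data set
NO_FURTHER = "No further ADs"  # descriptor used for unused occasion slots


def to_occasion_slots(reported_ads):
    """Pad one respondent's ordered AD occasions out to seven slots.

    Repeat prescriptions of the same drug stay as separate occasions.
    """
    if len(reported_ads) > MAX_AD_OCCASIONS:
        # Assumed behaviour: the sketch refuses rather than silently truncating.
        raise ValueError("more occasions than available slots")
    padding = [NO_FURTHER] * (MAX_AD_OCCASIONS - len(reported_ads))
    return list(reported_ads) + padding


# A respondent prescribed Sertraline on two separate occasions.
row = to_occasion_slots(["Sertraline", "Sertraline"])
```

Here `row` holds 'Sertraline' in the first two slots and 'No further ADs' in the remaining five, mirroring the worked example in the text.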

Antipsychotic section
The antipsychotic usage section has three repeated subsections, corresponding to the number of antipsychotics reported within the sample. Duration information is only available for the first antipsychotic recorded, due to low cell counts in subsequent sections risking disclosure. The variable names for each variable 1-3 are described below:
d) Binary indicator variable ('Participant has been prescribed an antipsychotic') (YPH7290)

Anxiolytic section
The anxiolytic usage section has three repeated subsections, corresponding to the number of anxiolytics reported within the sample. Duration information is only available for the first and second anxiolytic recorded, due to low cell counts in subsequent sections risking disclosure. The variable names for each variable 1-3 are described below:
1. Anxiolytic name (YPH7250, YPH7260, YPH7270)
Within the antipsychotic and anxiolytic usage sections, there are three individual sections, each pertaining to a single instance of antipsychotic or anxiolytic use. The cessation reasons were excluded for these medications to limit the size and complexity of the final data set. Due to the disclosive nature of medication names in data points that have small cell counts, antipsychotic use entries that were >1 and anxiolytic use entries >2 were set to missing. Researchers wishing to use these data specifically are able to request access via ALSPAC's split stage protocol detailed in appendix three of the ALSPAC Access Policy http://www.bristol.ac.uk/media-library/sites/alspac/documents/researchers/data-access/ALSPAC_Access_Policy.pdf.
▪ Participant prescribed >=3 antidepressants and is not adhering to any of them. Note: Researchers
Table 5. Non-AD drug class classifications.

Miscellaneous variables
Missing month (YPH7315)
▪ Derived to indicate entries which had missing month data in any section. The duration variables for these respondents were derived using logical approximation, with the missing month set to either the beginning or the end of the year given. For sensitivity analyses, these respondents can be removed using this variable when looking specifically at timeframes and duration of medication use <1 year.
Participant is taking any other medication (YPH7282)
▪ Binary indicator variable not containing specific information, e.g. medication name. Examples of medications classified into this variable include medication that is not a prescribed antidepressant, antipsychotic, or anxiolytic, such as homeopathic remedies, non-prescription medication, drugs considered 'recreational', or medication believed to be prescribed for other illnesses, e.g. ADHD medication.
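The logical approximation used for entries with missing month data (flagged by YPH7315) might look something like the sketch below. The exact defaulting rule is an assumption: here a missing start month defaults to December and a missing end month to January, which can only shorten the estimated duration. The function name, the use of the first day of each month, and the clamping of negative values to zero are likewise illustrative choices, not the documented derivation.

```python
from datetime import date


def estimate_duration_days(start_year, start_month, end_year, end_month):
    """Return (estimated duration in days, month_was_missing).

    Assumed rule: missing start month -> December, missing end month -> January,
    giving an underestimate. A missing year leaves the duration missing (None).
    """
    month_missing = start_month is None or end_month is None
    if start_year is None or end_year is None:
        return None, month_missing
    start = date(start_year, start_month or 12, 1)
    end = date(end_year, end_month or 1, 1)
    # Assumed choice: clamp at zero when defaulting makes the end precede the start.
    return max((end - start).days, 0), month_missing


complete = estimate_duration_days(2018, 3, 2019, 3)
approximated = estimate_duration_days(2018, None, 2020, None)
no_year = estimate_duration_days(None, 5, 2020, 1)
```

Returning the flag alongside the estimate makes it straightforward to drop approximated entries in a sensitivity analysis.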

Frequency of AD classes and types
Bar charts of the number of ADs prescribed to each respondent can be found in Figure 1. Bar charts of specific AD classes (selective serotonin reuptake inhibitors (SSRIs), serotonin norepinephrine reuptake inhibitors (SNRIs), tricyclic antidepressants (TCAs), atypical antidepressants, and 'other' antidepressants) can be found in Figure 2. Figure 3 shows the percentage of respondents who stopped taking AD medications due to feeling better ('remission') or feeling worse ('non-remission'). In addition to medication stop reasons 1, 5 and 6 being used to derive these variables, respondents also had to have taken the medication for at least 90 days to meet the criteria. Because of this duration caveat, readers should bear in mind that not meeting the criteria for 'remission' does not mean the respondent did not stop the AD due to feeling better, and vice versa for 'non-remission'. Table 6 shows the duration of use of the first antidepressant prescribed, for both adherers and respondents who ceased medication. Start dates of the medication were given with year and month data separately. Due to incomplete time data for some entries (e.g., a respondent only logging the start / end years of medication use without month data), the duration data are estimates, as missing month data were defaulted to either the beginning or end of the year given, as described previously. A variable 'missing_month' has been derived to highlight entries where a respondent had missing month data for any medication entry and the month data had been defaulted during the cleaning process. This can be used to identify those entries where the time data are estimates, to reduce noise in analyses. Entries where the year was missing were kept as missing in the final data set.
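The two criteria for the treatment phenotypes (a qualifying stop reason and at least 90 days of use) can be expressed as a short sketch. The text names stop reasons 1, 5 and 6 but does not state here which code feeds which phenotype, so the mapping below (1 for 'remission', 5 and 6 for 'non-remission') is purely an assumption for illustration and should be checked against the data dictionary before any use.

```python
MIN_DAYS = 90  # minimum duration of use required by the phenotype definition

# ASSUMED mapping of stop-reason codes to phenotypes (not confirmed by the text).
REMISSION_CODES = frozenset({1})
NON_REMISSION_CODES = frozenset({5, 6})


def treatment_phenotype(stop_reason, duration_days):
    """Return 'remission', 'non-remission', or None if the criteria are not met."""
    if stop_reason is None or duration_days is None:
        return None
    if duration_days < MIN_DAYS:
        # Stopping after a short course never qualifies, whatever the reason.
        return None
    if stop_reason in REMISSION_CODES:
        return "remission"
    if stop_reason in NON_REMISSION_CODES:
        return "non-remission"
    return None
```

Note that `None` here means "criteria not met", not "did not feel better": a respondent who stopped after 30 days because they felt better would return `None`, which is the caveat raised in the text.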

Summary
This data note has described the 2020-21 collection of data on medications for mental health in the ALSPAC birth cohort study. These data can be linked to existing mental health, lifestyle, behavioural and physiological health data to explore associations between AD use and various health outcomes, as well as to understand patterns of response to AD treatment in a real-world setting.

Considerations for the data
The data described here were derived from a combination of manual and automated data wrangling, due to the complexity of free-text self-report data and the variability with which the data were entered by respondents. The data have not been validated with any external prescription records. Therefore, accuracy of both the names of medications and the timeframes of use is dependent on respondent recall. Typos, inaccuracies, and ambiguous entries were dealt with to the best of our knowledge, with any entries too ambiguous to categorise set to missing. Responses such as 'antidepressant' or 'anxiety medication' were also set to missing, as they could not be characterised. Entries where two medication names were given in the same column were processed as two separate entries, using the same date, adherence, and cessation reasons.
A total of seventeen different ADs were recorded from the questionnaire responses. When the same medication was referred to by multiple names (e.g. 'prozac' and 'fluoxetine'), they were recorded as the same name (the official drug name used in the United Kingdom (UK)). Some participants included dosage information when reporting their medication usage. Whilst this information would be valuable, we did not ask for dosages explicitly in the questionnaire items, so did not retain any dosage information in the final data set.
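Harmonising brand and generic names, and discarding dosage text, could be done along the following lines. This is a sketch only: the lookup table is a small illustrative subset (the 'prozac' to 'fluoxetine' pairing is taken from the text; the other pairings are common UK brand names added for illustration), and the dosage pattern and function name are assumptions rather than the rules actually applied.

```python
import re

# Illustrative subset of a brand-to-generic lookup (hypothetical table).
BRAND_TO_GENERIC = {
    "prozac": "fluoxetine",
    "lustral": "sertraline",
    "cipramil": "citalopram",
}

# Assumed pattern for dosage text such as '20mg' or '50 mg'.
DOSAGE_PATTERN = re.compile(r"\b\d+(\.\d+)?\s*(mg|mcg|g)\b", flags=re.IGNORECASE)


def standardise_name(raw_entry):
    """Drop dosage text and map brand names onto a single generic name."""
    without_dose = DOSAGE_PATTERN.sub("", raw_entry)
    cleaned = without_dose.strip().lower()
    return BRAND_TO_GENERIC.get(cleaned, cleaned)


examples = [standardise_name(e) for e in ["Prozac 20mg", "fluoxetine", "Sertraline 50 mg"]]
```

Names not found in the lookup are passed through unchanged, so generic names entered directly are unaffected.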
Some participants also responded with a comprehensive history of their medication use, despite the root question asking only for the past five years. This means that some of the medication use data collected refers to when the participant was younger than 24. We have chosen to keep this information within the dataset in the hope that it is useful for other researchers.
The original raw data include dates of usage, which were used to derive duration variables. As mentioned in previous sections, these data suffered from missingness and incomplete responses, likely due to the retrospective nature of the questions and understandable difficulties in accurate reporting of medication history. For this reason, conservative (i.e., underestimating) estimates of duration were used in some cases. Users of the data can remove participants who had incomplete time data to create complete-case data sets for sensitivity analyses, as detailed in previous sections.
Some participants listed free-text stop reasons for discontinuing antidepressants. These data have not been included in the final release of the data, but may be available to users upon request.
We have derived medication use 'phenotypes' and given information on how we have defined them for the purposes of this data set. These were defined using researcher expertise but may be defined differently elsewhere in the literature. Defining treatment outcomes in mental health research is not straightforward, with there being variability in how concepts such as 'recovery' have been defined historically 15 , and there being differences in how 'response' to treatment would be characterised using observational, real-world data. We have stated clearly how we have defined said phenotypes; future research could provide alternative ways to derive these data, and users are encouraged to pay close attention to how said phenotypes have been defined before use.
Figure 3. Percentage of respondents who stopped taking an AD due to feeling better (F) and percentage of respondents who stopped taking an AD due to feeling worse / not feeling better (G).
Researchers wishing to use these data are advised to read the corresponding documentation to ensure they are aware of the data's limitations and considerations.

Data set validation
Measure of association between taking medications and poor mental health
We validated the dataset using various longitudinal phenotypes from the pre-existing ALSPAC data, including:
- Number of episodes of depression symptoms in adolescence and adulthood
- Previous medication history
- Depression symptoms
- Self and doctor diagnosed depression 16 .
Comorbidity of depression and anxiety was derived using the GAD variable and the presence of depression symptoms at 24. Difficulty in finding out how to manage mental health problems was assessed at age 25. Documentation for the above variables and corresponding measurement scales can be found in the variable catalogue, accessible as a zip file here: http://www.bristol.ac.uk/alspac/researchers/our-data/.
Logistic regression analyses (detailed in Table 7) were run to validate the data by quantifying the association between taking medication for depression and anxiety (variable YPH7000) and relevant mental health outcomes such as diagnosis of depression, presence of depression and anxiety symptoms, previous history of taking antidepressants, and other risk factors such as difficulty finding out how to manage mental health problems. We selected variables that allowed for an insight into the associations between medication use and severity, chronicity, and comorbidity of depression and anxiety, as well as access to treatment.
A greater number of prospective depression episodes across adolescence was associated with being prescribed medication for depression and anxiety between the ages of 23 and 28 (OR: 1.52, 95%CIs: 1.41, 1.63). Similar results were also found for the number of depression episodes in early adulthood (see Table 7). There was a dose-response relationship between depression severity (mild, moderate, and severe episodes) and taking medication. Medication use reported in clinic at age 24 was strongly associated with reported medication use between the ages of 23 and 28 (odds ratio of 38.35). Respondents also had higher odds of having experienced difficulty in finding out how to manage mental health problems if they had been prescribed medication by age 28. Together, these results indicate an association between taking medication and relevant covariates relating to symptom presence and help-seeking behaviours.
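For readers less familiar with the reported statistics, the sketch below shows how an odds ratio and a Wald 95% confidence interval arise from a 2x2 table. For a single binary predictor this odds ratio equals the exponentiated coefficient from a logistic regression. The counts are invented toy values chosen only to make the arithmetic easy to follow; they are not ALSPAC data, and the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using a Wald interval."""
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases
    )
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases
        + 1 / unexposed_cases + 1 / unexposed_noncases
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Toy counts (NOT study data): 20/10 among the exposed, 10/20 among the unexposed.
estimate, lower, upper = odds_ratio_with_ci(20, 10, 10, 20)
```

An interval that excludes 1, as the toy example does, is what underlies statements of evidence for an association; small cell counts widen the interval considerably.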

Ethics policy
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Full details of the approvals obtained are available from the study website (http://www.bristol.ac.uk/alspac/researchers/research-ethics/).

Consent
Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the

Data availability
Underlying data
ALSPAC data access is through a system of managed open access. The steps below highlight how to apply for access to ALSPAC data:
1. Please read the ALSPAC access policy which describes the process of accessing the data in detail, and outlines the costs associated with doing so.
2. You may also find it useful to browse the fully searchable research proposals database, which lists all research projects that have been approved since April 2011.
3. Please submit your research proposal for consideration by the ALSPAC Executive Committee. You will receive a response within 10 working days to advise you whether your proposal has been approved.
If you have any questions about accessing data, please email alspac-data@bristol.ac.uk.

Extended data
This project contains the following extended data:
- PDF of 'Life@28+' questionnaire distributed to ALSPAC participants
- Excel file of complete data dictionary

Author contributions
HF led the data set curation and manuscript creation. GW and SM quality checked the data set curation and produced final data set. RM, RP, BD oversaw the lead author's work with AK supervising.