Intellectual disability in the children of the Avon Longitudinal Study of Parents and Children (ALSPAC)

Background: Intellectual disability (ID) describes a neurodevelopmental condition involving impaired cognitive and functional ability. Here, we describe a multisource variable of ID using data from the Avon Longitudinal Study of Parents and Children (ALSPAC). Methods: The multisource indicator variable for ID was derived from i) IQ scores less than 70 measured at age 8 and at age 15, ii) free text fields from parent reported questionnaires, iii) school reported provision of educational services for individuals with a statement of special educational needs for cognitive impairments, iv) relevant Read codes contained in GP records, v) International Classification of Diseases diagnoses contained in electronic hospital records and hospital episode statistics and vi) recorded interactions with mental health services for ID contained within the mental health services data set. A case of ID was identified if two or more sources indicated ID. A second indicator, labelled as “probable ID”, was created by relaxing the cut off in IQ scores to be less than 85. An indicator variable for known causes of ID was also created to aid in aetiological studies where ID with a known cause may need to be excluded. Results: 158 of 14,370 participants (1.10%) were indicated as having ID by two or more sources and 449 (3.12%) were indicated as having probable ID when the criterion for IQ scores was relaxed to less than 85. There were 476 participants (3.31%) with 1 or fewer sources of available information on ID; these participants had their multisource variable set to missing. The number of cases of ID with known cause was 31 (0.22% of the cohort, 19.6% of those with ID). Conclusions: The multisource variable of ID can be used in future analyses on ID in ALSPAC children.


Introduction
Intellectual disability (ID) is a developmental condition defined as having an arrested or incomplete development of the mind alongside functional impairment in facets that contribute to overall intelligence such as cognition, language and social ability 1 . ID manifests during the developmental period and is not the result of later changes to the brain as a result of injury or disease.
There are several challenges in defining ID in practice, particularly in relation to the language used. Several terms are used in the UK including learning disability, learning difficulties, developmental disorder (or delay) and special educational needs 2 . Confusion can arise as these phrases are components of other, separate concepts. For example, specific learning disability refers to dyslexia or dyscalculia, while learning difficulty can refer to intellectual disability or a specific learning disability. It is important to note that those with ID may also have a specific learning disability. Further challenges arise in the definitions used between studies based in different global regions. In the USA the phrase "intellectual disability" carries the same meaning as "learning disability" in the UK, while use of the phrase "learning disability" in the USA refers to what would be described as a "specific learning disability" in the UK.
In a healthcare setting, several diagnostic criteria including the International Classification of Diseases, Version 10 (ICD-10) 1 and Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) 3 define ID using an intelligence quotient (IQ) score of less than 70, equivalent to 2 standard deviations (SD) less than the assumed population average of 100, alongside functional impairments. The Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) 4 states that IQ tests will generally be measured with an error of around 5 points and therefore scores between 65 and 75 may indicate ID. The definition used will greatly affect the prevalence of ID in studies. For example, Cooper et al. 5 note that the proportion of the population expected to lie in the range of IQ scores between 70 and 75 (2.5%) is greater than the proportion of the population expected to have ID using scores less than 70 as a cut off (2.28%). The educational system in the UK uses an even less stringent cut off, IQ less than 85 (equivalent to 1 standard deviation lower than the population average), to indicate "mild learning difficulty" 6, 7 .
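The proportions quoted above can be checked with a short calculation. The sketch below assumes IQ scores follow a normal distribution with mean 100 and SD 15 (the SD is implied by 70 lying 2 SD below 100); the function name is illustrative only.

```python
from math import erf, sqrt

MEAN_IQ = 100.0  # assumed population average, as stated in the text
SD_IQ = 15.0     # implied by 70 being 2 SD below 100


def proportion_below(score: float) -> float:
    """Proportion of a normal(MEAN_IQ, SD_IQ) population scoring below `score`."""
    z = (score - MEAN_IQ) / SD_IQ
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


below_70 = proportion_below(70)
between_70_and_75 = proportion_below(75) - proportion_below(70)
below_85 = proportion_below(85)

print(f"IQ < 70:       {100 * below_70:.2f}%")
print(f"70 <= IQ < 75: {100 * between_70_and_75:.2f}%")
print(f"IQ < 85:       {100 * below_85:.2f}%")
```

Under these assumptions the band between 70 and 75 holds slightly more of the population than the region below 70, which is why a small shift in the cut off can roughly double the expected prevalence.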
It has been argued that ID should not be defined on the basis of IQ test scores alone 7,8 due to the instability of the measure on the basis of mood and fatigue, its potential to be influenced by learning or rehearsal, and the fact that tests are largely centred around Western cultural understanding, which may have important implications, particularly for migrants. The ICD-10 and DSM-5 also use social functioning and age of onset for diagnosis. Those who have an IQ less than 70 but are able to function without assistance by this definition are not considered to have ID in relation to clinical services. Cooper et al. 5 provide examples such as living independently and holding a job as meeting this criterion of functioning without assistance. Such a definition means that ID is not necessarily stable throughout life. Those with ID do learn throughout their lifetime, and some of those who require significant support during school age years may go on to learn to live independently.
Intellectual disability has been under-researched in large epidemiological investigations leading to a relative lack of understanding of both its aetiology and consequences. The Avon Longitudinal Study of Parents and Children (ALSPAC) has recorded data in the form of questionnaires, biological samples, and genetic information for several thousand participants from gestation in the early 1990s to the present day. The cohort therefore provides an opportunity to explore both early life causes of ID and its long-term outcomes.
Our goal was to derive a multi-sourced measure of ID for participants of ALSPAC. Data is available from IQ tests measured by trained study fieldworkers at different ages during participant visits to the 'study assessment clinic'. However, participation in ALSPAC, and attendance at these clinics, may be influenced by having ID. This pattern of missing data is likely to lead to biases in complete case analyses 9,10 and in analyses that attempt to address missing data such as multiple imputation 11,12 . Data linkage to school reported statements of special educational needs and health service reported data on diagnoses and interactions with mental health services can be used to supplement the missing information. In this Data Note, we describe the processes used to derive indicator variables of ID which can be used by researchers in their own studies.

ALSPAC sample
The ALSPAC cohort 13,14 recruited 14,541 pregnant women resident in Avon, UK with expected dates of delivery 1st April 1991 to 31st December 1992. Each enrolled mother either returned at least one questionnaire or attended a "Children in Focus" clinic by 19/07/99. The core sample of pregnancies (also referred to as Phase I) contained a total of 14,676 fetuses that resulted in 14,062 live births; 13,988 of these index children were alive at 1 year of age.
Attempts were made to bolster the initial core sample with eligible cases who had failed to join the study originally. These attempts were made in 1999 when the oldest children were approximately 7 years of age (Phase II recruitment; 456 children recruited), opportunistically from 1999-2012 (Phase III; 262 children recruited) and then from 2012 onwards with specific focus on recruiting second generation pregnancies (Phase IV; 195 index children recruited) 15 . The phases of enrolment are described in more detail in the cohort profile paper and its update 13,14 .

Amendments from Version 1
We have added to the abstract the percentage of participants with a known cause of ID out of those identified as having ID. This was included following reviewer comments that indicated the importance of this value to researchers in the area.
Any further responses from the reviewers can be found at the end of the article.

Data has been collected on the cohort since its inception and is still ongoing. The mothers, their partners and the index child have been followed up using clinics, questionnaires, and links to routine data. The study website contains details of all the data that is available through a fully searchable data dictionary. From age 18, study children were sent 'fair processing' materials describing ALSPAC's intended use of their health and administrative records and were given clear means to object via a written form. This was an 'opt out' approach, meaning linkage was attempted for all participants, except those who objected and those who were not sent fair processing materials. Where 'opt in' consent became practicable (e.g., when a participant attended a study assessment visit) then this was collected by a trained fieldworker.
There were 15,659 total ALSPAC mother-child pairs across Phase I-IV recruitment. Of these, 795 had no NHS number and so could not be linked to the UK Secure eResearch Platform (UKSeRP) where the data were held, and 1 participant withdrew consent at this stage. Of the remaining 14,863 participants, 92 were not alive at 1 year of age and 435 were not singleton births (not mutually exclusive groups). On removal of these, a sample of 14,370 mother-child pairs remained. A cohort flow diagram describing the exclusion process for each stage of the study is presented in Figure 1.
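The exclusion arithmetic can be traced step by step. The sketch below uses only the counts reported above; the size of the overlap between the two exclusion groups is not reported by the authors and is derived here on the assumption that no other exclusions were applied at this stage.

```python
recruited_pairs = 15_659
no_nhs_number = 795
withdrew_consent = 1

# Participants who could be linked within UKSeRP
linkable = recruited_pairs - no_nhs_number - withdrew_consent

not_alive_at_1_year = 92
not_singleton = 435
final_sample = 14_370

# The two exclusion groups are not mutually exclusive, so the number removed
# is smaller than the sum of the two groups.
removed = linkable - final_sample
implied_overlap = not_alive_at_1_year + not_singleton - removed

print(f"Linkable participants: {linkable:,}")
print(f"Removed at the final step: {removed}")
print(f"Implied overlap between exclusion groups: {implied_overlap}")
```

The implied overlap is a consistency check on the reported totals rather than a figure taken from the study.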

Data sources for ID indicator
Data from ALSPAC sources included measures of IQ taken at age 8 and 15 and free text fields in child-based questionnaires where the responder could record additional information. The linked sources included the Pupil Level Annual School Census (PLASC) which recorded the provision of educational services for individuals with statements of special educational needs (SEN), General Practitioner (GP) records which recorded Read codes related to ID, Hospital Episode Statistics (HES) data which recorded International Classification of Disease (ICD) 1 diagnosis codes for ID and the Mental Health Services Data Set (MHSDS) which contains information on interactions with mental health services for reasons related to ID. Data linkage has previously been undertaken in the Identification of Developmental Impairments (IDI) project led by Emond 16 which identified neurodevelopmental disorders up to a maximum age of 11 years using ICD-10 diagnoses and statements of SEN. Further details of each source of information are provided in the subsections below.
Availability of linked health records (GP records, HES data and MHSDS data) was divided into four groups: (i) those who had explicitly consented to data linkage (5,063 individuals; 35.23%), (ii) those who had not explicitly consented to data linkage (7,358 individuals; 51.20%), (iii) those who had explicitly refused consent for data linkage (359 individuals; 2.50%), and (iv) those who had no data linkage available (1,590 individuals; 11.06%). A Confidentiality Advisory Group (CAG) application 17 was made to obtain access to the information of those who had not explicitly consented to data linkage (group ii) via use of Section 251 of the National Health Service Act 2006 18 . The CAG application, submitted by the ALSPAC data linkage team 19 , via the Integrated Research Application System 20 (CAG reference: 20.CAG/0056; IRAS project ID: 268410) and aligned NHS Data Sharing Agreements, support the use of GP records for this study but not HES or MHSDS data. As a result, data are available on all linked health records for explicit consenters (group i), and on GP records only for non-explicit consenters (group ii).
IQ scores. IQ at age 8 years was measured using a short form of the Wechsler Intelligence Scale for Children-III 21 which consisted of alternate items for all subtests except the coding subtest (which was administered in full) as part of a half day battery of mainly psychological and psychometric testing. IQ at age 15 years was measured using the Wechsler Abbreviated Scale of Intelligence 22 as part of a 4 hour battery of testing. Data was available for 7,113 (49.50% of total ALSPAC sample after exclusions) individuals at age 8 and for 5,116 (35.60%) individuals at age 15. From the IQ scores binary variables were created indicating if IQ was below 70 at each age. A second variable was created indicating a less stringent cut off of IQ below 85, equivalent to one population standard deviation below the assumed population average of 100.
Free text fields. ALSPAC contains free text responses to many questions answered by participants and their guardians across the lifetime of the study. For example, at age 9 guardians (typically mothers) of participant children were asked whether the children had been identified as having any particular problems at school and to describe in text each type of school problem. A search was performed across all free text fields contained in ALSPAC for terms related to ID (see Table 1 for the search terms used and number of hits). A review of all free text responses for each individual identified with relevant free text fields (n=203) was performed to check if the text indicated whether the child was likely to have ID or not. Any queries were checked by a clinician who specialises in neurodevelopmental disorders (author DR). We did not classify individuals as having ID if the search terms identified specific learning difficulties (e.g., dyslexia or difficulties specific to maths and literacy ability) or where the terms identified individuals as explicitly not having a learning disability. Following the review of all free text fields for each identified individual, 94 individuals were classed as having ID and 109 individuals were classed as not having ID. Free text data was available for 12,722 individuals in the sample.
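The screening step that precedes manual review could look something like the sketch below. The search stems and example responses are hypothetical and chosen for illustration only (the study's actual search terms are those in its Table 1), and the final classification in the study was made by human review, not by pattern matching.

```python
import re

# Hypothetical search stems for illustration; not the study's actual terms.
CANDIDATE_STEMS = ["learning disabilit", "learning difficult", "developmental delay"]
# Stems suggesting a specific learning difficulty, which was not counted as ID.
SPECIFIC_STEMS = ["dyslexi", "dyscalculi"]


def build_pattern(stems):
    """Compile a case-insensitive pattern matching any of the given stems."""
    return re.compile("|".join(re.escape(stem) for stem in stems), re.IGNORECASE)


CANDIDATE_PATTERN = build_pattern(CANDIDATE_STEMS)
SPECIFIC_PATTERN = build_pattern(SPECIFIC_STEMS)


def triage(response: str) -> str:
    """Sort a free text response into a queue for manual review."""
    if not CANDIDATE_PATTERN.search(response):
        return "no hit"
    if SPECIFIC_PATTERN.search(response):
        return "review (mentions specific learning difficulty)"
    return "review"


# Invented example responses, not study data.
examples = [
    "Has severe learning difficulties and attends a special school",
    "Learning difficulties with reading, dyslexia diagnosed",
    "No problems at school",
]
for text in examples:
    print(f"{triage(text):48} | {text}")
```

Flagging mentions of specific learning difficulties separately, rather than discarding them, keeps those responses available to the reviewer, since a child may have both ID and a specific learning difficulty.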

Pupil level annual school census (PLASC) records of provision for special educational needs.
Educational provision for children with SEN statements falling under the category "cognition and learning needs" 23 was used to indicate ID. Records of these provisions were made in 2003/4 when the vast majority of the sample children were in school years 6-8 (ages 11-13). We identified all individuals within the category who had a statement for moderate to profound learning difficulties as being a case of ID. The cognition and learning needs category also includes individuals with specific learning difficulties related to problems learning to read, write, spell or manipulate numbers. This latter group were not included as having ID unless they also had a statement for moderate to profound learning difficulties. PLASC data were available for 10,349 (72.02%) of the sample. Those who did not have a PLASC record either did not attend state school in England (includes those attending independent schools, schools outside of England or those educated at home) or could not be matched (for example if their name was changed without ALSPAC being informed) or were not included in the linkage sample as no legal basis could be established. Absence of PLASC information may therefore be associated with ID status and/or enrolment in state provided education.
GP records. GP records contain coded information in the form of Read codes 24,25 . These are a hierarchically coded thesaurus of clinical terms that have been in use by the NHS since 1985. The codes are entered into a computerised system by clinicians or practice staff from general practice or secondary care consultations. A list of version 2 Read codes was created by checking for terms related to intellectual disability or its synonyms using the UK Read Browser, previously accessible from NHS Digital's Technology Reference data Update Distribution. The list of Read codes identified was cross checked against a list of codes selected in a previous study looking at incidence of mental illness and challenging behaviour in individuals with ID 26 . Terms that appeared in either list were used (see Table 2 for the Read codes used). Data was available for 12,421 individuals (86.44% of the sample). Those who did not have a GP record either received primary care outside of England or Wales or via a private (non-NHS) provider; had a GP who did not approve the study's extraction of their record; could not be matched (due to linkage failure); or were not included in the linkage sample as no legal basis could be established (those who objected or where no fair processing could occur). Absence of GP records may therefore be associated with ID status.
Hospital episode statistics. Details of all admissions, attendances at accident and emergency and outpatient appointments at NHS hospitals in England are collected in the centralised, national, HES database 27 . Data for admitted patients are available from April 1997, for outpatient appointments from April 2003 and for accident and emergency attendances from April 2007. This means that data from these sources are available from when the participants were 5-6, 11-12 and 15-16 years of age respectively. The HES dataset recorded all diagnoses up until 1995 using ICD-9 and all diagnoses in subsequent years using ICD-10 codes 28 . Diagnoses of 317-319 (ICD-9) and F70-F79 (ICD-10) made during hospital interactions were used to indicate ID. Data was available for the 5,063 individuals (35.23% of the sample) who had explicitly consented to data linkage of health records and who had presented for hospital care in England. All obtained diagnoses of ID were found in admitted patient records and none were found in either the outpatient or the accident and emergency records.
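Matching diagnosis codes against the ranges given above can be sketched as follows. The code ranges (317-319 for ICD-9, F70-F79 for ICD-10) come from the text; the assumption that codes arrive as strings, possibly with a decimal point and sub-code, and the function name are illustrative and may not reflect how the HES extract is actually stored.

```python
import re

ICD10_ID = re.compile(r"^F7\d")    # F70-F79, with or without a sub-code
ICD9_ID = re.compile(r"^31[7-9]")  # 317-319, with or without a sub-code


def indicates_id(code: str, icd_version: int) -> bool:
    """Return True if a diagnosis code falls in the ID range for its ICD version."""
    normalised = code.strip().upper().replace(".", "")
    if icd_version == 10:
        return bool(ICD10_ID.match(normalised))
    if icd_version == 9:
        return bool(ICD9_ID.match(normalised))
    raise ValueError(f"Unsupported ICD version: {icd_version}")


# Hypothetical code strings for illustration.
for code, version in [("F70.1", 10), ("F819", 10), ("318.0", 9), ("316", 9)]:
    print(code, version, indicates_id(code, version))
```

Anchoring the pattern at the start of the normalised string means sub-codes are accepted while neighbouring categories are not.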
Mental health services data set. The MHSDS collects data on all interactions between patients and specialist secondary mental health care services 29 . Patients are assigned to mental health clusters using the Health of the Nation Outcome Scales 30 which can be used to indicate the nature of the mental health care. Information regarding intellectual disability can be found within care clusters 18-21 which relate to cognitive impairment. All individuals that had more than one recorded final clinician-allocated cluster related to cognitive impairment were indicated as having ID. Fewer than 5 cases were indicated using this method. All were contained within cluster 18.
MHSDS data was available only for 188 individuals (1.31% of the total sample) who had a relevant Read code found in GP records or ICD code found in HES data. The sample for whom MHSDS data was available was therefore a subsample of the explicitly consenting sample of 5,063 individuals who received community mental health care in England.

IDI project.
The IDI project has been described in detail elsewhere 16 . Briefly, the project identified individuals in the ALSPAC cohort with any form of developmental delay as defined by ICD-10 classification. Information on diagnoses was obtained from computerised medical records of NHS trusts in the local Bristol area between 1991 and 2003 (North Bristol Trust, United Bristol Healthcare Trust, Weston Area Health [...]
It was not possible to determine the exact overlap between the IDI project sample and the analysis sample of the current project. This was due to the data retained from the IDI project only containing information on those who had an identified diagnosis and not all those for whom medical records were available at the time of the project. The documentation for the IDI project (which can be obtained from the ALSPAC 'useful data' repository) states that 13,898 of the 14,062 live born individuals who make up the ALSPAC Phase I sample were eligible for the IDI project: this larger sample size reflects less stringent governance requirements of the time. It was therefore assumed that data was available on IDI diagnoses for all Phase I ALSPAC participants.

Multi-sourced indicator of ID
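The information available to create a multi-sourced indicator of ID therefore comprised the following eight items:
1. An IQ score less than 70 at age 8
2. An IQ score less than 70 at age 15
3. Free text information indicating ID
4. A statement of SEN for moderate to profound learning difficulties recorded in PLASC
5. A relevant Read code found in GP records
6. An ICD diagnosis of ID found in HES data
7. A care cluster related to cognitive impairment recorded in the MHSDS
8. An ICD-10 diagnosis found in the IDI project
A case of ID was identified if two or more of the eight criteria were met. We defined a second variable, labelled as "probable ID", using the same criteria as above except that the threshold for ID from the IQ scores was relaxed to 85 (1 SD lower than the population average). This was done to align more closely with the definition of borderline ID used by the UK educational system. Where a participant had observed data in only one or fewer sources, they were considered to have missing data for the multi-sourced indicator of ID.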
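The derivation rule described above can be expressed compactly. This is a minimal sketch, not the authors' implementation (their scripts are in R and are not reproduced here); it assumes each source is coded as True, False, or None when unavailable, and all function and variable names are hypothetical.

```python
from typing import Optional, Sequence


def iq_flag(score: Optional[float], cutoff: float) -> Optional[bool]:
    """True if the IQ score is below the cut off; None if no score was recorded."""
    return None if score is None else score < cutoff


def multi_source_indicator(flags: Sequence[Optional[bool]]) -> Optional[bool]:
    """Combine per-source flags (None marks a source with no available data)."""
    observed = [flag for flag in flags if flag is not None]
    if len(observed) <= 1:
        return None  # one or fewer sources available: treated as missing
    return sum(observed) >= 2  # a case if two or more sources indicate ID


def derive(iq_age8, iq_age15, other_sources, iq_cutoff=70.0):
    """Derive the indicator; use iq_cutoff=85 for the 'probable ID' variant."""
    flags = [iq_flag(iq_age8, iq_cutoff), iq_flag(iq_age15, iq_cutoff), *other_sources]
    return multi_source_indicator(flags)


# Hypothetical participant: IQ 78 at age 8, no IQ at age 15, an SEN statement,
# no relevant GP Read code, no IDI diagnosis, other sources unavailable.
# Assumed order of other_sources: free text, SEN, GP, HES, MHSDS, IDI.
others = [None, True, False, None, None, False]
id_case = derive(78, None, others)
probable_id_case = derive(78, None, others, iq_cutoff=85.0)
print(id_case, probable_id_case)
```

Keeping missing sources as None rather than False is what allows the rule to distinguish "no evidence of ID" from "no evidence available", which matters for the missing-data definition.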

Known causes of ID: a tool for exclusion criteria
Individuals who have a genetic, metabolic, or chromosomal abnormality that is associated with ID constitute a group in which ID is likely regardless of environmental exposure. Such a group may need to be excluded in analyses investigating the aetiology of ID. Genetic, metabolic, or chromosomal abnormalities associated with ID were identified using free text information, GP records and HES data. The free text records of individuals with text relevant to ID were screened for mentions of known genetic causes of ID. Read codes and ICD codes for genetic disorders related to ID were obtained from GP records and HES data. A list of the codes used is presented in Table 3. If a participant had any of these codes, they were provided with a "known cause of ID" flag. In total 31 participants had a known cause of ID resulting from genetic, metabolic, or chromosomal abnormalities (8 using free text data, 17 using GP records and 10 using HES data).

Assessment of validity
Individual sources of ID
Figure 2 shows the intersection of sources indicating ID for groups with counts greater than 5 (showing unique intersections where participants are not indicated as having ID by any other source). This information is further explored in Table 4, which shows i) the number of individuals with ID indicated by each source, ii) the number of individuals with each combination of sources of ID (regardless of wider set intersections) and iii) the 95% confidence interval (CI) of the odds ratio (OR) for having ID for each source according to whether participants had another source indicating ID (given that data is available from both sources).
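An odds ratio and its 95% CI for a pair of sources can be computed from the corresponding 2×2 table. The sketch below uses the standard log (Woolf) method, which is an assumption: the text does not state which interval method was used. The counts are invented for illustration and are not study data.

```python
from math import exp, log, sqrt
from typing import Optional, Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Optional[Tuple[float, float, float]]:
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a: both sources indicate ID      b: only the first source indicates ID
    c: only the second indicates ID  d: neither source indicates ID
    Returns None when any cell is zero, as the interval is then undefined.
    """
    if min(a, b, c, d) == 0:
        return None
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, restricted to participants with data from both sources.
result = odds_ratio_ci(20, 80, 10, 890)
if result is not None:
    odds_ratio, lower, upper = result
    print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Returning None for a table with an empty cell mirrors the situation where one source perfectly predicts another and no interval can be calculated.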
The most common combinations of sources for ID were, in order, i) SEN statement and GP Read codes, ii) SEN statement, GP Read codes and an IDI project diagnosis, iii) SEN statement and IDI project diagnosis, iv) an IDI project diagnosis and free text information and v) IQ < 70 at age 8 and 15. Despite being one of the most common combinations, an IQ less than 70 was not commonly indicated by both IQ tests at age 8 and 15. Instead, it was more common to have an IQ less than 70 on one test and an IQ less than 85 on the other. An IQ less than 85 was common on both tests. Diagnosis from the IDI project seemed to be the strongest predictor of having ID indicated by other sources according to ORs. Both the HES and MHSDS sources indicated fewer than 5 cases of ID each, and therefore do not contribute much information to the multi-sourced variables.
The distribution of available IQ scores for those with ID indicated by each source of information is presented in Table 5.
The mean IQ at age 8 was less than 70 among those who were indicated as having ID from the IDI project and from HES data but was greater than 70 for those with ID indicated by free text data, SEN statements and GP records. This may suggest that different severities of ID are being identified by the different sources of information. It is, however, also possible that those with lower IQs were selectively underrepresented at the collection of the IQ data, in questionnaire data and/or in the linked education records. If this is the case then the average IQ at age 8 for those with ID indicated by these sources may be lower than 70, had the missing IQ information been available.
Table 4 notes: The upper triangle of the table provides the lower and upper bounds of the 95% confidence interval for the OR obtained from the 2×2 table of the sources for ID identification. a: indicates that the confidence interval could not be calculated due to perfect prediction.

Multi-sourced variables of ID
Of the sample of 14,370 individuals, 158 (1.1%) were indicated as having ID by two or more sources and 449 (3.1%) were indicated as having probable ID when the criterion for IQ scores was relaxed to less than 85. Counts of participants with each number of sources of available data and number of sources indicating ID and probable ID are displayed in Table 6. If the participant had one or fewer sources of information available (irrespective of whether the single source indicated ID) they were considered to have missing data for ID; 476 participants (3.3%) were considered to have missing data using this definition. Ten of these 476 individuals (2.1%) had one source indicating ID but no other sources of information available.
Individuals with ID and probable ID indicated by the multi-sourced variables were compared to those not indicated as having ID on IQ scores measured at age 8 and 15 (presented in Table 7). Those with ID had IQ scores on average 40 points lower than those without ID at age 8 and 29 points on average lower at age 15. For those with probable ID the IQ scores were on average 30 points lower at age 8 and on average 21 points lower at age 15. It should be noted that, as IQ is included in the [...]
Table note: Where counts are ≤5, the count may be equal to 0.

Known cause of ID flag
A comparison of those with a known cause of ID to those without a known cause of ID in terms of sources indicating ID is presented in Table 8. The table shows that 13 individuals identified as having ID had a known cause of ID. Those with a known cause of ID were more likely to be identified as having ID using free text information, SEN statements and GP records than those without a known cause of ID.
Eighteen individuals who were not identified as having ID had a known cause of ID. This is possible as those with a genetic, metabolic, or chromosomal abnormality, used to identify known causes of ID, may not in fact develop an ID, or alternatively may not be investigated for ID as a result of their known abnormality.

Patterns of data availability and consenter status for linked health records
Consenter status for linked health records (GP records, HES data and MHSDS data) may also influence the ability to identify cases of ID. Table 9 presents the number of variables available to identify ID, the number with available IQ data and the average IQ scores at age 8 and 15, across categories of consent status. The non-explicit consenter group (those with section 251 approval) had on average one fewer available source of information (excluding linked health data sources) than the explicit consenters and were less likely to have available IQ measures at age 8 or 15 than those in the explicit consent or explicit non-consent groups. The non-explicit consenter group also had lower average IQ scores at age 8 and 15 than the explicit consenters. This may suggest that the non-explicit consenter group contains more severe cases of ID than the explicit consenters. It is important to note that the non-explicit consenter group is likely to include people who, because of an ID, are unable to participate in ALSPAC, lack the capacity to provide explicit consent, or are unable to attend clinics to measure IQ.
The impact of missing study data can be partially mitigated by the use of linked routine health and education records where available: however, each linkage source is also impacted by incomplete coverage. This means there are some ALSPAC participants for whom there exists insufficient evidence to assess ID status and that it is reasonable to suggest that disproportionate numbers of individuals with ID may fall into this group. Whilst this has not impacted the ascertainment of case status in those with information, it does mean that this case status should not be used to determine prevalence estimates and users of the data should note that some cases, possibly those with most pronounced ID, are missing from the data even where linked records are available.

Ethics policies
Ethical approval for the study was obtained from the ALSPAC Law and Ethics committee and, for the ALSPAC record linkage programme, from a local research ethics committee (NHS Haydock REC: 10/H1010/70). A comprehensive list of research ethics committee approval references is available to download.

Consent
Written informed consent was obtained from the main caregiver of participating children after receiving a full explanation of the study. Children were invited to give assent where appropriate. Study members have the right to withdraw their consent for elements of the study or from the study entirely at any time. Full details of the ALSPAC consent procedures are available on the study website. Access to the linked health records of those who had not explicitly consented to data linkage was authorised via use of Section 251 of the National Health Service Act 2006 18 for GP records but not HES or MHSDS data (CAG reference: 20.CAG/0056; IRAS project ID: 268410).

Data availability
to ALSPAC data, including access to the data and R scripts described in this data note.
1. Please read the ALSPAC access policy (PDF, 627kB) which describes the process of accessing the data and samples in detail, and outlines the costs associated with doing so.
2. You may also find it useful to browse our fully searchable research proposals database, which lists all research projects that have been approved since April 2011.
3. Please submit your research proposal for consideration by the ALSPAC Executive Committee. You will receive a response within 10 working days to advise you whether your proposal has been approved.
If you have any questions about accessing data, please email alspac-data@bristol.ac.uk.
The ALSPAC data management plan describes in detail the policy regarding data sharing, which is through a system of managed open access.

Freya Tyrer
Department of Health Sciences (Biostatistics Research Group), University of Leicester, Leicester, England, UK
Thank you for the opportunity to review this paper which I found very interesting and well-written. I agree that the use of multiple sources to identify ID is useful owing to the ambiguities in the definitions of ID.
I have minor comments, as below:
It is not surprising to me that ID is poorly identified in English HES data using F7x codes owing to coders' and clinicians' preferences for the F81.9 code (developmental disorders of scholastic skills) for ID; a recent publication has also highlighted this (see https://pubmed.ncbi.nlm.nih.gov/36940198/ 1 ). I can see why the authors haven't used this as it could potentially refer to people without ID, but I think there does need to be some recognition of this preference in the article, perhaps in the limitations section. There seems to be some reluctance to change these codes because hospitals receive an uplift for any interventions in patients with ID. I think the inability to identify ID based on HES data alone would be a useful recommendation arising from this work.

The only other comment I had relates to Figure 1. The figure shows 93 people being identified from free text fields, but the narrative refers to 94 people. Similarly, 1289 people were excluded but the numbers of people excluded add up to 1323. This might be because people fit into more than 1 category but it stands out (to me anyway). I'd recommend looking at the entire figure to make sure that everything is consistent and makes sense and that overlaps are specified, where they occur.