iCoverT: A rich data source on the incidence of child maltreatment over time in England and Wales

Child maltreatment is a major public health problem, which is plagued with research challenges. Good epidemiological data can help to establish the nature and scope of past and present child maltreatment, and monitor its progress going forward. However, high quality data sources are currently lacking for England and Wales. We employed systematic methodology to harness pre-existing datasets (including non-digitalised datasets) and develop a rich data source on the incidence of Child maltreatment over Time (iCoverT) in England and Wales. The iCoverT consists of six databases and accompanying data documentation: Child Protection Statistics, Children In Care Statistics, Criminal Statistics, Homicide Index, Mortality Statistics and NSPCC Statistics. Each database is a unique indicator of child maltreatment incidence, with 272 data variables in total. The databases span from 1858 to 2016 and therefore extend current data sources by over 80 years. We present a proof-of-principle analysis of a subset of the data to show how time series methods may be used to address key research challenges. This example demonstrates the utility of the iCoverT and indicates that it will prove to be a valuable data source for researchers, clinicians and policy-makers concerned with child maltreatment. The iCoverT is freely available at the Open Science Framework (osf.io/cf7mv).


Introduction
Child maltreatment is a major public health and social welfare problem world-wide [1,2]. However, child maltreatment research is plagued with challenges; from difficulties in measuring and establishing the scope of the problem, to ethical and practical obstacles hindering the use of randomised controlled trials to determine the effectiveness of interventions [3,4].
Good epidemiological data are needed to overcome such research challenges, in particular regularly collected national incidence data [1]. These data can help characterise the nature and extent of child maltreatment and, coupled with advanced statistical methods, can also be used to evaluate the effect of planned or unplanned interventions [5,6]. Exploring long-term trends and changes in the incidence of child maltreatment can offer new insights into this emergent and complex social health issue, providing directly translatable evidence for practitioners and policy-makers.
PLOS ONE | https://doi.org/10.1371/journal.pone.0201223 | August 27, 2018
Countries are beginning to respond to this need by routinely collecting national estimates of the incidence of child maltreatment. The United States, Canada and the Netherlands have developed professional surveys to prospectively collect incidence estimates [7][8][9], whereas England and Wales extract administrative data. Following the introduction of the Children Act 1989, England and Wales have routinely collected administrative data on the number of children referred to, and assessed by, social services [10]. These data have been compiled and published on a national database since 2008 under the revised children in need census [11]. Although these data go some way in providing national incidence data on child maltreatment, considerable progress is still needed before there are sufficient data to investigate long-term trends and to implement robust statistical techniques, such as time series analyses. More comprehensive usable data sources are needed to advance current research and understanding.
To address this need, we developed a rich epidemiological data source on the incidence of child maltreatment over Time (iCoverT) in England and Wales by harnessing pre-existing datasets. The iCoverT offers new opportunities to quantify temporal trends and changes over time, whilst also providing a valuable public health surveillance tool for monitoring child maltreatment.

Overview
We adapted systematic review and routinely-collected data recommendations from the PRISMA and RECORD statements to devise a systematic strategy for identifying, investigating and assessing pre-existing datasets of routinely collected data [12,13]. Ethical approval was not required because all data were fully anonymised and publicly available before we accessed them. Out of 13 identified datasets, data from six datasets were extracted and prepared as six temporally consistent databases. These six databases on the incidence of child maltreatment over time and their accompanying documentation form the iCoverT (Fig 1). The iCoverT is a freely accessible data source and new data may be deposited to maintain and update it [14].

Dataset identification
Three main search strategies were used to identify relevant datasets on the incidence of child maltreatment in England and Wales, namely literature and internet searches using historically sensitive search terms, and contact with experts within academia and health provision. We searched for datasets that met the following inclusion criteria: (1) data could be used to estimate the incidence of child maltreatment, defined as one, or a combination of, physical abuse, neglect, sexual abuse and emotional maltreatment of a child aged 18 years or under (bullying by peers and witnessing intimate partner violence were not included in this definition); (2) data were nationally representative of England and Wales, including or excluding Northern Ireland; (3) data were collected prospectively; (4) data were collected annually; and (5) data were collected and available for a period of at least 25 years. Where datasets overlapped, only the richest dataset was included. We identified 13 relevant datasets: six from literature searches, three from internet searches, and four from field experts (Fig 1).
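The five inclusion criteria can be read as a simple filter over candidate datasets. The following Python sketch is illustrative only: the `CandidateDataset` record, its field names and the two example entries are hypothetical and are not taken from the iCoverT documentation.

```python
from dataclasses import dataclass


@dataclass
class CandidateDataset:
    """Hypothetical record describing one candidate dataset."""
    name: str
    estimates_maltreatment_incidence: bool  # criterion 1
    nationally_representative: bool         # criterion 2 (England and Wales)
    collected_prospectively: bool           # criterion 3
    collected_annually: bool                # criterion 4
    years_available: int                    # criterion 5


MIN_YEARS = 25  # criterion 5: at least 25 years of data


def meets_inclusion_criteria(d: CandidateDataset) -> bool:
    """Return True only if all five inclusion criteria are satisfied."""
    return (
        d.estimates_maltreatment_incidence
        and d.nationally_representative
        and d.collected_prospectively
        and d.collected_annually
        and d.years_available >= MIN_YEARS
    )


# Made-up examples, for illustration only.
candidates = [
    CandidateDataset("Example A", True, True, True, True, 40),
    CandidateDataset("Example B", True, True, True, True, 10),
]
included = [d.name for d in candidates if meets_inclusion_criteria(d)]
print(included)
```

In this sketch a dataset failing any single criterion is dropped; the rule that only the richest of several overlapping datasets is retained would need a separate, manual judgement.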

Dataset investigation and quality assessment
We investigated the 13 identified datasets by asking the following key questions: who collects the data, why are the data collected, when were the data collected, what data are collected, and where are the data located. Answers for each dataset were obtained by reading accompanying dataset information and contacting the authorities responsible for collecting, publishing and/or holding the data. We made 23 enquiries to different government departments, organisations, libraries and online data repositories, submitted nine Freedom of Information (FOI) requests, and visited seven libraries and archives. Following these investigations, seven datasets were excluded because they did not satisfy the inclusion criteria (S1 Table), leaving the six included datasets: Child Protection Statistics, Children In Care Statistics, Criminal Statistics, Homicide Index, Mortality Statistics and National Society for the Prevention of Cruelty to Children (NSPCC) Statistics.
We assessed the quality of the six remaining datasets using an adapted quality assessment tool [15][16][17]. In line with our pre-specified rating guidance, we judged data quality to be good, satisfactory or problematic for eight quality criteria (representativeness, missingness, accuracy, temporal consistency, validity, definitions, timeliness, interpretability) (S2 Table). If a dataset was rated problematic on a criterion, we explored feasible strategies to address the problem. Datasets were rated satisfactory to good on the majority of criteria, although all datasets were rated as having problematic temporal consistency. In addition, Children In Care Statistics showed problematic missingness and accuracy, Child Protection Statistics had problematic accuracy, and Criminal Statistics demonstrated problematic interpretability (S2B Table). Although no dataset was fully excluded due to unresolvable quality issues, four data variables were excluded due to unresolvable temporal consistency problems, and two were truncated due to unresolvable temporal consistency and accuracy problems (S3 Table).

Data extraction
We located, saved and indexed all available data for the six included datasets using the online shareable reference manager Zotero. Data were found in various formats both within and across datasets. Data available electronically or via websites were saved directly to Zotero, whilst data only available as hardcopies at libraries and archives were photographed, uploaded, and indexed accordingly. Data extraction depended on the format of the original data and their sources. Data available as hardcopies or embedded within PDFs required manual extraction, whilst some data in Excel or comma-separated values (CSV) formats were electronically copied across. Most data required year-by-year extraction from separate sources, whilst some data were part-collated across years (e.g. 5-year period). M.D.E. systematically extracted all data, entering the data for each dataset into separate standardised CSV templates to generate six databases.
The reliability of the data extraction process and accuracy of extracted data were checked by embedding internal calculations within the databases to validate the relationships between data variables. For example, for the Children In Care Statistics database "Males in care in England and Wales" was checked to equal the sum of "Males in care in England" and "Males in care in Wales". In addition, a second data extractor (J.T.) carried out independent data checks of 100% of the data. The average interrater agreement across databases was 99.1% and any discrepancies were discussed and resolved by consensus. Data were further visualised to identify any potential anomalies.
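The two checks described above, an internal consistency rule and agreement between two extractors, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually used: the column names and values are made up, and simple cell-by-cell percent agreement is assumed as the agreement measure.

```python
def check_totals(rows, total_key, part_keys):
    """Return the years in which the total does not equal the sum of its parts."""
    mismatches = []
    for row in rows:
        if row[total_key] != sum(row[k] for k in part_keys):
            mismatches.append(row["year"])
    return mismatches


def percent_agreement(first, second):
    """Percentage of cells on which two extractors entered the same value."""
    if not first or len(first) != len(second):
        raise ValueError("both extractions must cover the same non-empty cells")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Made-up values, for illustration only.
rows = [
    {"year": 1990, "males_england": 120, "males_wales": 30, "males_total": 150},
    {"year": 1991, "males_england": 110, "males_wales": 25, "males_total": 140},
]
# 1991 is flagged because 110 + 25 does not equal 140.
bad_years = check_totals(rows, "males_total", ["males_england", "males_wales"])
print(bad_years)

# Three of the four cells match between the two hypothetical extractions.
agreement = percent_agreement([150, 140, 30, 25], [150, 140, 30, 52])
print(agreement)
```

A flagged year or a disagreeing cell only signals that the source should be re-checked; resolving it still requires returning to the original publication.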

Data preparation
Following data extraction, we prepared the data in two main stages to generate six databases of complete, temporally consistent time series data.
Stage 1: Generating temporally consistent data. Our quality assessment identified that all six databases suffered from temporal inconsistency problems due to various changes over time. These changes ranged from structural and organisational changes to changes in data and terminology. We investigated all changes over time and implemented appropriate data linkage strategies to address them. We employed five main strategies: (a) derived calculations to address temporal changes and generate equivalent data across years; (b) adopted broader categories to obtain consistent data variables across time; (c) identified and matched equivalent items over time; (d) consistently prioritised specific data sources; and (e) determined the change to have no substantial effect on the data. Full details are specified in S4 Table. Although we addressed all identified temporal inconsistencies, the strategies may not fully account for every change over time. We therefore recommend the inclusion of dummy variables in data analyses for specific changes, where dummy variables may be coded to represent these changes over time. Our recommendations are indicated in S4 Table.
Stage 2: Adjusting the data. We carried out data adjustments if more than 20% of the data were missing, there were data idiosyncrasies, data did not cover a 12-month period, or data were inconsistently rounded. The Criminal Statistics and the Homicide Index databases did not require any data adjustments. Only one data variable from the Children In Care Statistics database was found to have more than 20% missing data, which was addressed using ad hoc missing data imputations. Child Protection Statistics, Children In Care Statistics, Mortality Statistics and NSPCC Statistics showed specific idiosyncrasies within their data. Each idiosyncrasy was minor and easily addressed. One year within the Children In Care Statistics database and two years within the NSPCC Statistics database did not cover the full 12-month period. For these three years, data were adjusted to account for missing months. Child Protection Statistics and Children In Care Statistics data were inconsistently rounded over time. As a result, data variables were rounded to the same level of precision. Missing data imputations and data adjustments are described in S5 Table.
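The following Python sketch illustrates three of the operations mentioned in this section: coding a dummy variable for a known change, adjusting a count that covers fewer than 12 months, and rounding to a common level of precision. It is not the procedure documented in S4 and S5 Tables. Proportional scaling, the rounding base of 100 and all example values are assumptions made for illustration.

```python
def step_dummy(years, change_year):
    """Code 0 before a known change and 1 from the change year onwards."""
    return [1 if y >= change_year else 0 for y in years]


def annualise(count, months_covered):
    """Scale a count covering fewer than 12 months to a 12-month equivalent.

    Assumes incidents are spread evenly across the year (an assumption).
    """
    if not 0 < months_covered <= 12:
        raise ValueError("months_covered must be between 1 and 12")
    return count * 12 / months_covered


def round_to_nearest(value, base=100):
    """Round a non-negative count to the nearest multiple of `base` (halves round up)."""
    return int((value + base / 2) // base * base)


# Made-up values, for illustration only.
dummy = step_dummy([1988, 1989, 1990, 1991], change_year=1990)
full_year = annualise(900, months_covered=9)
rounded = round_to_nearest(1234)
print(dummy, full_year, rounded)
```

A step dummy of this kind can be entered as a covariate in a regression or time series model so that a level shift caused by a reporting change is not mistaken for a change in incidence.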

Data documentation
We systematically and comprehensively documented all stages of dataset investigations and assessment, and data extraction and preparation. For each database, these details were synthesised and prepared as three types of data documentation: Data guides, Data dictionaries and Attribution statements. Data guides include background information obtained from dataset investigations, and where appropriate, more detailed information about data variables (context and definitions). Data dictionaries describe the contents, format, and structure of each database. Attribution statements document in full all data sources and corresponding permissions.

Results and discussion
Description of the iCoverT
iCoverT structure. The iCoverT is a comprehensive data source on the incidence of child maltreatment over time [14]. It consists of six separate databases, which are each accompanied by a set of three data documents (Data guide, Data dictionary, Attribution statement). An overview of the iCoverT is detailed in machine-actionable and readable Data Documentation Initiative (DDI) compliant metadata (XML and PDF format).
Database characteristics. Characteristics of the iCoverT databases are summarised in Table 1. Child Protection Statistics, Children In Care Statistics, Criminal Statistics, Homicide Index, Mortality Statistics and NSPCC Statistics are all forms of administrative data, which have been annually collected over time. For all databases, data were first collected by local authorities or regional branches and then aggregated, checked and validated by centralised departments or headquarters to form national estimates. Centralised departments and headquarters include various government departments, from the Office for National Statistics to the Department of Health, and the NSPCC's London Headquarters.
Five databases represent the number of victims of child maltreatment, be it children on the child protection register or child homicide victims, whereas one database represents the number of perpetrators of child maltreatment (Criminal Statistics). All databases cover England and Wales, with data from NSPCC Statistics also covering Northern Ireland. The average time range across databases is just under 90 years. Mortality Statistics is the oldest database, collected since 1858, while Child Protection Statistics is the newest, first collected in 1988. Although we excluded more recent data from NSPCC Statistics due to temporal inconsistencies, all databases continue to be collected today.
Descriptive statistics. In total, there are 272 data variables. These data variables include detailed information on the gender and age of the victim, as well as information about the type of maltreatment, who reported the incident, and the country (e.g. England, Wales). All data are freely available online, and Table 2 shows descriptive statistics for the main data variables on the overall number of child maltreatment incidents and the victims' gender. All data are counts and range from 0 to 159,407 depending on the database, data variable and year. Across databases, there is a high degree of variance within data variables due to considerable year-on-year fluctuations.
Notes to Table 2:
a The numbers of males and females on the child protection register do not add up to the total number of children as registrations were for unborn children and/or details were unknown (from approximately 0% to 3% depending on the year). In addition, some figures were reported to the nearest 100 (see Table 1) and this resulted in rounding error.
b Aged under 18 years old.
c The numbers of males and females do not add up to the total number of children in care as some figures were rounded to the nearest 100 (see Table 1) and this resulted in rounding error.

Utility of the iCoverT
The iCoverT is a rich epidemiological data source, which extends current maltreatment incidence data in England and Wales by over 80 years [11,18-21]. As a result, the iCoverT offers the statistical power to use advanced statistical methods, such as fitting regression models, interrupted time series designs, and ARIMA modelling. The databases are also well-suited to help address key issues within child maltreatment research. The historical data can shed light on past maltreatment and help to contextualise current maltreatment, whilst more recent data can be used to monitor current and future rates of child maltreatment. The iCoverT may also be used to evaluate the impact of planned or unplanned interventions on the occurrence of child maltreatment (e.g. specific child protection efforts).

Proof-of-principle analysis
To demonstrate the potential utility of the iCoverT, we carried out a proof-of-principle analysis on a subset of its data taken from the Criminal Statistics database. To date, there are contradicting findings on whether trends in child maltreatment are decreasing. Some scientific evidence indicates decreasing trends since the 1970s and other evidence suggests no significant change over time [19,20]. However, to our knowledge, no empirical evidence quantitatively investigates maltreatment trends in England and Wales before the 1970s. We therefore addressed the question: Has the incidence of guilty convictions for the criminal offence Cruelty to children decreased from 1893 to 1970? All statistical analyses were carried out in R (version 3.4.1). Age-standardised incidence rates (guilty persons per 100,000 population) were calculated from the number of persons found guilty of Cruelty to children and the total population of persons aged over 15 [22]. There were missing data due to disruptions caused by the First and Second World Wars (1915, 1916, 1920, 1921 and 1939-1945). These missing values were imputed using linear interpolation for non-seasonal univariate time series. We derived and plotted an eight-year moving average as this best visualised the data, smoothing short-term fluctuations and isolating the long-term trend (Fig 2). The smoothed time series shown in Fig 2 illustrates an overall downward trend in incidence rates of persons found guilty of Cruelty to children from 1900 to 1970. This decline is particularly steep between 1900 and 1930, plateauing out to a gentler decline from 1940 to 1970. 
Although this may suggest that the incidence of the criminal offence Cruelty to children fell between 1893 and 1970, a more careful historical analysis is needed to establish whether this trend reflects real decreases in the number of maltreatment-related crimes or whether it reflects wider changes to the judicial system, such as stricter guidelines on determining guilt. Nonetheless, this finding provides historical perspective and sheds light on longer-term, as well as more recent, maltreatment trends. This example also shows that the iCoverT can be analysed using time series methods to help answer key questions within child maltreatment research.
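The three computational steps of the proof-of-principle analysis (rates per 100,000 population, linear interpolation of missing years, and a moving average) can be sketched as follows. The published analysis was carried out in R; this Python version is an independent illustration and not the authors' code. The trailing (rather than centred) window and all numbers are assumptions.

```python
def incidence_per_100k(cases, population):
    """Incidence rate expressed per 100,000 population."""
    return 100000 * cases / population


def interpolate_linear(values):
    """Fill interior gaps (None) on a straight line between the nearest known values.

    Leading or trailing gaps are left as None, since they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = filled[right] - filled[left]
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left) / (right - left)
    return filled


def moving_average(values, window=8):
    """Trailing moving average; returns one value per complete window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and the series length")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up series with two missing years, for illustration only.
rate = incidence_per_100k(cases=50, population=2_000_000)
series = interpolate_linear([10.0, None, None, 4.0, 2.0])
smoothed = moving_average(series, window=2)  # window of 2 keeps the example short
print(rate, series, smoothed)
```

With a window of eight, the smoothed series is seven points shorter than the input, which is consistent with a question posed from 1893 being reported as a trend from 1900; whether the original analysis used a trailing window is not stated in the text.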

Strengths of the iCoverT
The iCoverT has a number of strengths. The first strength lies in the systematic and exhaustive methods used to develop each database and their accompanying data documentation. All identified datasets underwent thorough investigations and were assessed against pre-specified inclusion and quality criteria. Data were manually extracted and checked in full by a second extractor. Despite complexities to the data extraction process, there was good agreement between extractors (99.1%) and further checks were carried out to ensure the accuracy of the data. In addition, we included unique steps to generate complete, temporally consistent data, applying data linkage strategies and data adjustments where appropriate. All stages of dataset identification, investigation, assessment, data extraction and preparation were meticulously documented.
The second strength lies in the richness of the data. The iCoverT consists of six databases with 272 data variables and spans from 1858 to 2016. The databases extend current data by over 80 years, and each database may shed a different light on the complex construct of child maltreatment. For example, NSPCC Statistics may capture more types of child maltreatment than Mortality Statistics, which only captures severe cases of physical abuse and/or neglect. Data variables may also offer important insights as they provide additional breakdowns, including information on country, gender and age of victims, and type of child maltreatment. Third, the databases are nationally representative of England and Wales. Research on this type of epidemiological data results in robust and translatable national-level evidence, which is most pertinent for informing public health policies and interventions. Lastly, because new relevant data will become available each year, the iCoverT has potential for maintenance and growth. We intend to continue to collect data going forward, regularly updating and maintaining the iCoverT as a freely accessible data source for research.
[Fig 2 axis labels: Year; Incidence rate x 100000]

Limitations of the iCoverT
The iCoverT also has several limitations. First, most data were not originally collected for research purposes. Although we exhaustively investigated and addressed all identified confounders, the data may be influenced by obscure operational processes and changing organisational priorities that we are unaware of [23]. Second, while all data were reported annually, this temporal information may not necessarily reflect the occurrence of maltreatment in that specific year. This is particularly relevant for Mortality Statistics, as previous research highlights a reporting delay in registering deaths [19]. However, the most recent release relating to Mortality Statistics states that the median registration delay is only 5 days, which would minimally affect the temporal accuracy of these data [24]. Third, we identified the relevant dataset of Hospital Episode Statistics for Admitted Patient Care in England [25]. Despite extracting relevant data from 1999 to 2016 and identifying earlier data from 1980 to 1998, we were unable to include this potentially valuable dataset due to financial costs (NHS Digital quoted a minimum cost of £1800, which exceeded available funds).

Conclusions
The iCoverT is a rich, freely accessible data source on the incidence of child maltreatment over time in England and Wales. The development of the iCoverT, as described in this article, demonstrates how systematic methods can be used to overcome practical obstacles and harness pre-existing datasets from across disciplines. We believe that the iCoverT will be an invaluable data source and public health surveillance tool for researchers, clinicians and policy-makers concerned with child maltreatment.
Supporting information
S1 [...] Table) and quality assessment of included datasets (S2B Table).

Acknowledgments
[...] who responded to our enquiries and FOI requests. In particular, we owe thanks to the NSPCC's Knowledge and Information Service for its continued support in responding to research queries. We also thank Benjamin Jenkins for his feedback on the manuscript.