A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service

Summary
Background To effectively prevent, detect, and treat health conditions that affect people during their lifecourse, health-care professionals and researchers need to know which sections of the population are susceptible to which health conditions and at which ages. Hence, we aimed to map the course of human health by identifying the 50 most common health conditions in each decade of life and estimating the median age at first diagnosis.
Methods We developed phenotyping algorithms and codelists for physical and mental health conditions that involve intensive use of health-care resources. Individuals older than 1 year were included in the study if their primary-care and hospital-admission records met research standards set by the Clinical Practice Research Datalink and they had been registered in a general practice in England contributing up-to-standard data for at least 1 year during the study period. We used linked records of individuals from the CALIBER platform to calculate the sex-standardised cumulative incidence for these conditions by 10-year age groups between April 1, 2010, and March 31, 2015. We also derived the median age at diagnosis and prevalence estimates stratified by age, sex, and ethnicity (black, white, south Asian) over the study period from the primary-care and secondary-care records of patients.
Findings We developed case definitions for 308 disease phenotypes. We used records of 2 784 138 patients for the calculation of cumulative incidence and of 3 872 451 patients for the calculation of period prevalence and median age at diagnosis of these conditions.
Conditions that first gained prominence at key stages of life were: atopic conditions and infections that led to hospital admission in children (<10 years); acne and menstrual disorders in the teenage years (10–19 years); mental health conditions, obesity, and migraine in individuals aged 20–29 years; soft-tissue disorders and gastro-oesophageal reflux disease in individuals aged 30–39 years; dyslipidaemia, hypertension, and erectile dysfunction in individuals aged 40–59 years; cancer, osteoarthritis, benign prostatic hyperplasia, cataract, diverticular disease, type 2 diabetes, and deafness in individuals aged 60–79 years; and atrial fibrillation, dementia, acute and chronic kidney disease, heart failure, ischaemic heart disease, anaemia, and osteoporosis in individuals aged 80 years or older. Black or south-Asian individuals were diagnosed earlier than white individuals for 258 (84%) of the 308 conditions. Bone fractures and atopic conditions were recorded earlier in male individuals, whereas female individuals were diagnosed at younger ages with nutritional anaemias, tubulointerstitial nephritis, and urinary disorders.
Interpretation We have produced the first chronological map of human health with cumulative-incidence and period-prevalence estimates for multiple morbidities in parallel from birth to advanced age. This can guide clinicians, policy makers, and researchers on how to formulate differential diagnoses, allocate resources, and target research priorities on the basis of the knowledge of who gets which diseases when. We have published our phenotyping algorithms on the CALIBER open-access Portal which will facilitate future research by providing a curated list of reusable case definitions.
Funding Wellcome Trust, National Institute for Health Research, Medical Research Council, Arthritis Research UK, British Heart Foundation, Cancer Research UK, Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Department of Health and Social Care (England), Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), Economic and Social Research Council, Engineering and Physical Sciences Research Council, National Institute for Social Care and Health Research, and The Alan Turing Institute.


Introduction
A chronological map of human health from birth to death depicting the most common conditions by age and marking the median age at diagnosis is fundamental to understanding who gets which conditions when, on a population level. This understanding can inform clinicians
Consented cohort studies investigating multiple health conditions were limited in age range and in ascertainment of conditions diagnosed in primary care. Many had too few participants to reliably estimate disease distribution by age, sex, and ethnicity. Studies based on electronic health records (EHRs) surmounted these limitations, but the manual curation required for developing case definitions and phenotype algorithms from EHR data restricted the number of conditions analysed within a single study to fewer than 100. Many studies reported prevalence estimates for comorbid conditions relative to an index disease, such as heart failure. Some were confined to either primary or secondary care. The Global Burden of Disease initiative inferred disease prevalence estimates from mathematical models based on empirical frequency data. The US National Cancer Institute's Surveillance, Epidemiology, and End Results cancer statistics review reported age at diagnosis by sex and ethnicity for primary cancer sites. Most other studies reported age at diagnosis for a single disease from small, sometimes unrepresentative sample sets. We did not find any studies that described the age distribution and age at diagnosis stratified by sex and ethnicity from birth to death for several hundred diseases contemporaneously with a single linked clinical dataset obtained from primary-care and secondary-care settings within a universal health-care system.

Added value of this study
We present the first lifecourse map of human health, charting the 50 most common conditions in each decade of life, and the median age at diagnosis for 308 conditions. We compiled case definitions, cumulative incidence and age-specific, sex-specific and ethnicity-specific period prevalences for 308 conditions, by harmonising Read, International Classification of Diseases (tenth revision), and Office of the Population Censuses and Surveys Classification of Interventions and Procedures version 4 codes across primary-care and secondary-care records in England. This has involved updating and extensively expanding the phenotyping algorithms in the CALIBER Portal. Conditions were selected to reflect the disease burden and health-care utilisation of the English population, which are likely to be similar to those in countries with similar economies and population structures. Conditions with more than 10 000 Hospital Episode Statistics finished consultant episodes (the time spent under the care of one consultant while admitted to hospital) in England from April 1, 2014, to March 31, 2015, or those with estimated prevalences greater than 0·01% and considered clinically important by our panel of clinicians were included in this report.
Our results illustrate the varying dominance of different conditions through the passage of life. Common childhood conditions were atopic disorders and acute infections. Acne and menstrual disorders gained prominence in teenagers. Mental health disorders emerged in young adults, together with obesity and migraine. Disorders associated with the metabolic syndrome, soft tissue disorders, erectile dysfunction, and gastro-oesophageal reflux disease rose substantially in middle-age. Cancer, osteoarthritis, benign prostatic hyperplasia, cataract, diverticular disease, and deafness became more common in individuals aged 60-79 years, whereas atrial fibrillation, dementia, acute and chronic kidney disease, heart failure, ischaemic heart disease, anaemia, and osteoporosis escalated in advanced age (≥80 years).
Ethnic and sex differences were also discernible. White patients had later median age at diagnosis for 258 of the 308 conditions. Although this could be attributed to the older age structure of the white population, another potential reason is that distinct biological pathways can lead to the same diagnosis in different demographic groups. Sleep apnoea, for example, was common in black boys and older white men, with potentially different mechanisms underlying the two groups. Female individuals were younger at diagnosis of tubulointerstitial nephritis, urinary incontinence, chronic cystitis, and nutritional anaemias, whereas male individuals were diagnosed at younger ages with bone fractures and atopic conditions.

Implications of all the available evidence
By mapping the distribution of health conditions across the lifecourse, we have empowered researchers, clinicians, health-care providers, and policy makers to better identify individuals at risk, and to instigate strategies to detect, prevent, and manage specific conditions. The patterns of disease distribution that we have revealed could lead to further research into the heterogeneous causes of diseases. The platform that we have created can promote further research into ageing-related health conditions and multimorbidity to meet the challenges facing ageing populations. By providing the phenotyping algorithms for hundreds of conditions through an existing open access Portal (CALIBER), we are also facilitating the use of EHR data in large cohort studies such as UK Biobank in this era of high-throughput biomedical data.
www.thelancet.com/digital-health Vol 1 June 2019 e65
Addressing this question requires large-scale, population-based studies with broad coverage of health conditions and appropriate age-related frequency measures. The age-specific cumulative incidence establishes when specific conditions are more likely to occur during the lifecourse, while the age-specific period prevalence unveils the collective past medical history of a population at each stage of life over a specified calendar time period. Age at first recorded diagnosis by sex and ethnicity characterises patterns of health-condition onset by age and differences between demographic groups, with potentially distinctive underlying pathological processes. Although the Global Burden of Disease (GBD) reports 1 and multimorbidity studies 2,3,4 have drawn from various data sources to estimate disease frequency statistics for the overall population, there have been no previous studies linking prevalence estimates with age of diagnosis for multiple conditions in parallel within a single health system, to draw the chronological map of human health conditions across the lifecourse.
The UK National Health Service (NHS) is well placed to support these analyses, as the provider of universal cradle-to-grave health care in the UK since 1948, with more than 98% of the UK population registered with an NHS general practice. 5 NHS clinical data can be aggregated on a population scale, with electronic health records (EHRs) in primary care 6 linked to digitised disease-episode coding in secondary care 7 using unique NHS identification numbers assigned permanently to individuals. 8 EHRs comprise data from multiple sources with a variety of coding schemes, wherein a single condition such as type 2 diabetes might be represented by hundreds of codes. Therefore, the construction of case definitions and codelists across the various clinical settings requires meticulous curation, which has previously been a limiting factor in the contemporaneous study of hundreds of conditions.
We have created a chronological map of human health by charting the most common mental and physical health conditions by decade of age, and by estimating the median age at first recorded diagnosis for 308 health conditions using linked EHRs in England. We have also compiled a compendium of phenotyping algorithms and codelists; age-specific, sex-specific, and ethnicity-specific prevalences; and median age at first record by sex and ethnicity of the spectrum of disorders affecting recipients of NHS care for the use of clinicians, policy makers, health-care providers, and researchers.

Study design and participants
We studied population-based EHRs of primary-care patient-level data from the Clinical Practice Research Datalink (CPRD) linked to the dataset of the Hospital Episode Statistics (HES) for admitted-patient care. CPRD is one of the largest EHR databases in the world, is representative of the English population by age, sex, and ethnicity, 5,9 provides anonymised data, and has been previously validated for epidemiological research. 10 Individuals older than 1 year were included in the study if their records met research standards set by the CPRD 5 and they had been registered in a general practice in England contributing up-to-standard data for at least 1 year from April 1, 2010, to March 31, 2015. The study was approved by the Independent Scientific Advisory Committee for the Medicines and Healthcare products Regulatory Agency (protocol 16_022).

Procedures
We identified physical and mental health conditions that involve intensive use of health-care resources. These conditions included those from the quality and outcomes framework, 11 a UK general-practice payment-for-performance scheme, with modifications for more granular   12 Diagnoses were coded using three-character or four-character codes from the International Classification of Diseases, tenth revision (ICD-10). We examined the finished consultant episodes for codes in chapters I-XIV and XVI-XVII of the ICD-10. We excluded pregnancy-related conditions, symptoms, signs, abnormal clinical and laboratory findings, and external causes of morbidity and mortality. Three-character or four-character ICD-10 codes were assigned to specific conditions as agreed between clinicians in the team (VK, OB, SS, SH, MH, DN, CAP, RTL, RS, and ADH). Conditions with codes that had more than 10 000 finished consultant episodes were included. If a condition had fewer than 10 000 finished consultant episodes but the prevalence was greater than 0·01% and it was considered to be clinically important by our panel, it was included in the study (appendix p 56).
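The inclusion rule described above can be written as a small predicate. The following Python sketch is illustrative only and is not the authors' code (the paper reports that analyses were done in R). The class name, field names, and example values are hypothetical. The text does not say how a condition with exactly 10 000 episodes was handled, so the sketch simply routes it to the second rule.

```python
from dataclasses import dataclass

FCE_THRESHOLD = 10_000          # finished consultant episodes, April 1, 2014, to March 31, 2015
PREVALENCE_THRESHOLD = 0.0001   # 0.01% expressed as a proportion


@dataclass
class CandidateCondition:
    name: str                    # hypothetical label for illustration
    fce_count: int               # finished consultant episodes for the assigned ICD-10 codes
    prevalence: float            # estimated prevalence as a proportion
    clinically_important: bool   # judgement of the clinician panel


def is_included(condition: CandidateCondition) -> bool:
    """Apply the two inclusion routes described in the Procedures section."""
    # Route 1: more than 10 000 finished consultant episodes.
    if condition.fce_count > FCE_THRESHOLD:
        return True
    # Route 2: rarer in hospital data, but prevalence above 0.01% and
    # judged clinically important by the panel.
    return (condition.prevalence > PREVALENCE_THRESHOLD
            and condition.clinically_important)
```

In this form the panel judgement is an explicit input, which makes clear that the second route is not purely data driven.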
Infections were categorised by organ system and causal organism. Chronic infections with long-term sequelae included were HIV, chronic viral hepatitis, tuberculosis, and rheumatic fever. Acute infections were limited to hospital admissions. Obesity was only considered for individuals older than 18 years.
Health conditions were harmonised across primary-care and secondary-care coding systems and organised into 16 disease categories corresponding closely to ICD-10 chapters (appendix pp 2-6).
Phenotyping algorithms defining these conditions were based on diagnosis or procedural codes, with the additional inclusion of some blood test values or other measures, ie, estimated glomerular filtration rate, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglyceride, or body-mass index (BMI). Diagnoses and procedures are recorded in CPRD with Read codes. ICD-10 diagnosis codes and Office of the Population Censuses and Surveys Classification of Interventions and Procedures version 4 (OPCS-4) procedural codes are used in the HES for admitted-patient care. Keywords were searched in the Read and OPCS-4 dictionaries for each of the selected conditions to construct the Read and OPCS-4 codelists. Patients were considered to have or have had a specific condition if they met the criteria in the algorithm for that condition before or during the study period. Algorithms and codelists for all identified conditions are available on the CALIBER Portal. The algorithms can be downloaded in a machine-readable CSV format from the algorithm data repository.
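The code-based part of such a phenotyping algorithm can be sketched as a lookup of recorded codes against per-system codelists. The Python sketch below is a simplified illustration, not the CALIBER implementation: the record structure, function names, and codelist are hypothetical, the Read entry is a placeholder rather than a real Read code, and matching ICD-10 sub-codes by prefix is an assumption. Criteria based on blood test values or BMI are not covered.

```python
from dataclasses import dataclass
from datetime import date

STUDY_END = date(2015, 3, 31)  # end of the study period stated in the paper

# Hypothetical codelist for one condition. E11 is the ICD-10 category for
# type 2 diabetes; the Read entry is a placeholder, not a real Read code.
EXAMPLE_CODELIST = {
    "ICD-10": {"E11"},
    "Read": {"READ_PLACEHOLDER"},
    "OPCS-4": set(),
}


@dataclass(frozen=True)
class Event:
    patient_id: int
    coding: str        # "Read" (CPRD) or "ICD-10" / "OPCS-4" (HES)
    code: str
    event_date: date


def matches(event: Event, codelist: dict) -> bool:
    """Return True if the recorded code belongs to the condition's codelist."""
    codes = codelist.get(event.coding, set())
    if event.coding == "ICD-10":
        # Assumption: a listed code also covers its sub-codes (E11 covers E11.9).
        recorded = event.code.replace(".", "")
        return any(recorded.startswith(c.replace(".", "")) for c in codes)
    return event.code in codes


def patients_with_condition(events, codelist, study_end=STUDY_END) -> set:
    """Patients meeting the code criteria before or during the study period."""
    return {e.patient_id for e in events
            if e.event_date <= study_end and matches(e, codelist)}
```

A record dated before April 1, 2010, still counts here, which mirrors the statement that patients qualify if they met the criteria before or during the study period.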
Selection of health conditions, algorithm development, and codelist construction were done by a panel of clinicians
The main outcomes of our study were cumulative incidence and period prevalence, stratified by age, ethnicity, and sex (male and female), and age at first diagnosis.
Ethnicity was grouped into the five categories of the 2011 UK census, ie, white, mixed, south Asian, black, and other (appendix p 7). 9 Patients with missing ethnicity or codes belonging to more than one category were classified as unknown. Ethnic stratification was reported for white, south-Asian, and black populations only, as interpretation of mixed and other populations is less meaningful when considering disease susceptibility.
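The grouping rule for ethnicity can be expressed compactly. The Python sketch below is an illustration under stated assumptions, not the authors' code: it assumes each patient's ethnicity codes have already been mapped to one of the five census categories, and the function names are hypothetical.

```python
# Categories that were reported in the ethnic stratification.
REPORTED_CATEGORIES = {"white", "south Asian", "black"}


def classify_ethnicity(recorded_categories) -> str:
    """Collapse a patient's recorded census categories into one label.

    Missing ethnicity (no usable record) and records spanning more than
    one category are both classified as "unknown".
    """
    distinct = {c for c in recorded_categories if c is not None}
    if len(distinct) != 1:
        return "unknown"
    return next(iter(distinct))


def in_ethnic_stratification(category: str) -> bool:
    """Only white, south-Asian, and black groups were reported separately."""
    return category in REPORTED_CATEGORIES
```

Repeated records of the same category are treated as consistent, so a patient recorded as white on several occasions is still classified as white.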

Statistical analysis
The age at first recorded diagnosis was the earliest age at which the criteria in a phenotyping algorithm for a specific condition were met from any source.
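As a minimal sketch of this definition, the Python code below takes the earliest qualifying date across all sources for one patient and converts it to an age, then summarises ages across patients as a median with IQR. It is illustrative only: the function names are hypothetical, age is computed in completed years from a full date of birth (an assumption, since the paper does not state the precision used), and the quartile method is the default of Python's statistics.quantiles rather than a method named in the paper.

```python
from datetime import date
from statistics import quantiles


def completed_years(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def age_at_first_record(birth: date, qualifying_dates):
    """Earliest age at which the algorithm criteria were met, from any source.

    qualifying_dates pools dates from primary-care and hospital records.
    Returns None if the patient never met the criteria.
    """
    dates = list(qualifying_dates)
    if not dates:
        return None
    return completed_years(birth, min(dates))


def median_and_iqr(ages):
    """Return (median, lower quartile, upper quartile).

    Uses the default 'exclusive' method of statistics.quantiles, which is
    an assumption; it needs at least two ages.
    """
    q1, q2, q3 = quantiles(sorted(ages), n=4)
    return q2, q1, q3
```

Pooling dates before taking the minimum is what makes the result independent of whether the first record came from primary or secondary care.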
The cumulative incidence between April 1, 2010, and March 31, 2015, was calculated by dividing the number of incident cases (people with first recorded diagnoses) during this time period by the number of people in the study population at risk on April 1, 2010. We computed the sex-standardised cumulative incidence for 10-year age bands (0-9 years, 10-19 years, 20-29 years, 30-39 years, 40-49 years, 50-59 years, 60-69 years, 70-79 years, ≥80 years). As we had not estimated the prevalence of childhood obesity in this study, we did not calculate the cumulative incidence for obesity for those between 18 years and 20 years of age because we were unable to determine the denominator (individuals aged 18 years on April 1, 2010, who had not previously been defined as obese). Age-specific, sex-specific, and ethnicity-specific period prevalences from April 1, 2010, to March 31, 2015, were calculated by dividing the number of new and preexisting cases by the number of people in the study population during this time period. Standardisation was applied using the 2013 European Standard Population. 13 The median age (IQR) at which conditions were first recorded was determined for patients in the study population.
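The frequency measures defined above reduce to simple ratios, sketched below in Python for illustration. This is not the authors' R code. The function names are hypothetical, and direct standardisation (a weighted average of stratum-specific estimates) is an assumption about how the standard population was applied; the weights passed in are placeholders supplied by the caller, not values from the 2013 European Standard Population.

```python
def age_band(age: int) -> str:
    """Map an age in completed years to the 10-year bands used in the study."""
    if age >= 80:
        return "≥80"
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def cumulative_incidence(incident_cases: int, at_risk_at_start: int) -> float:
    """First recorded diagnoses during the period divided by the number
    of people at risk on April 1, 2010."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    return incident_cases / at_risk_at_start


def period_prevalence(new_and_preexisting_cases: int, study_population: int) -> float:
    """New plus pre-existing cases divided by the study population
    during the period."""
    if study_population <= 0:
        raise ValueError("study_population must be positive")
    return new_and_preexisting_cases / study_population


def directly_standardised(estimates: dict, weights: dict) -> float:
    """Weighted average of stratum-specific estimates.

    estimates and weights are keyed by stratum (for example sex);
    weights are placeholder standard-population weights.
    """
    total_weight = sum(weights[s] for s in estimates)
    return sum(estimates[s] * weights[s] for s in estimates) / total_weight
```

For example, stratum estimates of 0.02 for male and 0.04 for female individuals with equal weights give a standardised value of 0.03, whereas the crude value would depend on the sex mix of the study population.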
We compared our prevalence estimates with those from the GBD study 1 and from Barnett and colleagues' study. 2 Prevalence estimates were obtained directly from the published article in the case of Barnett and colleagues' study, 2 or downloaded from the GBD online results tool 14 in the case of the GBD 2017 study. 1 Analyses were done using R (version 3.4.3).

Role of the funding source
The funders had no role in study design, data collection, data analysis, data interpretation, report writing, or the decision to submit the paper for publication. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Age-standardised period-prevalence estimates for the 308 conditions stratified by sex and ethnicity are provided in the appendix (pp 17-26). We examined the differences in median age of diagnosis of the health conditions in the study by ethnicity and sex (figures 2-4 and appendix pp 27-37). Our ethnicity subanalysis found that white individuals had later median age at first record than black or south-Asian individuals for 258 (84%) of 308 conditions. Conditions with large differences in median age of diagnosis between ethnicities included sepsis (
We compared the study design and characteristics of our study with those of the GBD 2017 study 1 and of Barnett and colleagues' study 2 (appendix p 38). Prevalence values for 112 out of the 308 conditions in our study were previously reported by either one or the other study. 1,2 We compared the prevalence estimates between the three studies for the 112 overlapping conditions and reported the prevalence estimates for non-overlapping conditions by disease category (appendix pp 39-41 and 42-55).

Discussion
We introduced the first chronological map of human health, with cumulative-incidence estimates, period-prevalence estimates, and median age at first record for 308 conditions from a single, large, clinically representative study population, stratified by age, sex, and ethnicity.
Although this chronological map reflects the burden of health conditions in England, it is likely to be relevant to other high-income countries with similar age and sex profiles. The findings complement the GBD reports, which have a wide geographical remit and hence encompass low-income, middle-income, and high-income countries. Prevalence estimates for some long-term conditions common in the NHS (such as hypertension, dyslipidaemia, irritable bowel syndrome, and thyroid disorders) were not included in GBD 2017. 1 Our coverage of conditions was both wider (spanning both primary and secondary care) and more granular than in the seminal Scottish primary-care study by Barnett and colleagues. 2 We classified cancers by major organ system, and subcategorised coronary heart disease into stable angina, myocardial infarction, unstable angina, and coronary heart disease not otherwise specified. Frequency estimates for health conditions in this study convey the actual clinical experience of individuals in the NHS as documented in their EHRs. Prevalence estimates in the established medical literature vary widely and depend on multiple factors, including case definition, methodology, study sample, and time of measurement. Many disease-prevalence estimates are based on sparse sample sets from dated clinical studies, or surveys not representative of the general population. GBD 2017 1 collated disease-distribution estimates from various sources including published literature, with renewed analysis of publicly available data and estimated prevalence with statistical models, whereas Barnett and colleagues 2 reported point prevalence values based on EHRs from Scottish general-practice data in 2007. The period-prevalence estimates in this study are expected to be higher than the point-prevalence measures reported in the GBD or Barnett and colleagues' studies. 1,2 This is because we included all individuals who had ever been recorded with a condition, not just individuals presumed to have the condition at the time of analysis. Our period-prevalence calculation included people who had died during the study period; hence, conditions more common in those who died could have higher estimates in this report than in previous reports. 1,2 Another potential reason for higher prevalence estimates in this report than in Barnett and colleagues' study 2 is the inclusion of secondary-care data in our case definitions. Comparison between these three large studies illustrates the similarities and differences in estimates using different methodologies, and allows the reader to select the method most relevant for their objectives. We reviewed disease frequency estimates from other published studies for conditions with the largest disparities between this study and the GBD or Barnett studies in the appendix (p 1).
Our age-stratified analysis enabled us to chronicle the passage of health conditions across the lifecourse. Hospital-admitted infections affected individuals at the extremes of life (<10 years and ≥80 years). Atopic conditions were common in childhood (<10 years). Mental health disorders were most prevalent from early adulthood (≥20 years). Menstrual disorders and migraine afflicted women of childbearing age (20-49 years). Metabolic conditions, particularly dyslipidaemias and obesity, together with hypertension, increased in prevalence from middle age (≥40 years). Cardiovascular diseases emerged later in life (≥60 years), following the surge in metabolic conditions in middle age. Degenerative conditions involving the sense organs, musculoskeletal, genitourinary, and neurological systems were prominent in older individuals (≥80 years). This is the first study to systematically document the age of diagnosis for hundreds of health conditions contemporaneously. Age of first recorded diagnosis encapsulates information regarding the age of onset and diagnosis, consultation patterns, and diagnostic and recording practices. It might also reveal different subtypes of disease according to sex and ethnicity.
The median age at first record was later in white individuals than in other ethnicities for 258 (84%) of the 308 conditions. Similar results have previously been reported for cancers. 15 We found no significant differences between ethnicities in age-specific prevalence for acute infections, for which discrepancies in the age at first record between ethnicities were wide. Therefore, the later age at diagnosis for acute infections, and potentially other conditions, could be attributed to the higher proportion of older people in the white population than in other ethnicities. Other reasons for the younger ages at diagnosis of ethnic minorities could be a combination of genetic predisposition, socioeconomic status, or culturally determined health beliefs and practices.
Different pathological pathways might also be responsible for disorders affecting different ethnicities at different ages. Black individuals were diagnosed earlier with sleep apnoea than white individuals. This finding was consistent with our prevalence estimates, which identified a higher prevalence among black boys (aged <20 years) and white men older than 30 years. Sleep apnoea is usually associated with adenotonsillar hypertrophy in children, 16 whereas in adults the main contributing factors are obesity, male sex, and age. 17 Distinct disease mechanisms could also give rise to disparities in age of diagnosis between the sexes. Up to the age of 40 years, wrist fractures were more common in men and boys, but after this age, the incidence was higher in women. This difference could be attributed to higher-risk physical activities in young men or boys than in women or girls, and to decreased bone mineral density in older women (≥40 years). Male individuals were diagnosed with asthma at a younger age than female individuals, with earlier age at first record and higher cumulative incidence in boys younger than 10 years, whereas more women and girls were diagnosed with asthma after this age. This pattern of early-onset asthma in men or boys but late-onset in women or girls has been reported elsewhere. 18,19 Asthma is a heterogeneous disease, and early-onset asthma has been related to atopy and has substantial genetic susceptibility, whereas late-onset asthma tends to be non-allergic and induced by environmental and hormonal triggers. 18,19 Tubulointerstitial nephritis was diagnosed earlier in women than in men. An Australian study found an increase in acute interstitial nephritis in young women, which they attributed to immune-mediated conditions or analgesic nephropathy. 20
Potential beneficiaries from this study include individual patients, patient groups, medical charities, practising clinicians (in primary and secondary care), health-care providers, public health organisations, policy makers, and medical researchers both in academia and industry, including those involved in drug development and evaluation.
Knowing the age-specific and sex-specific incidence and prevalence could help patients gain perspective into their conditions. Patient organisations can use these data for awareness campaigns and to support fundraising.
Our chronological map can guide clinicians assessing individual patients on the likelihood of possible diagnoses on the basis of their frequency distribution in the general population at different ages. It could also be the first step towards the creation of decision-support tools from EHRs using artificial intelligence. 21 Age-specific incidence data on a wide range of preventable health conditions such as those presented in this Article are essential to realise the ambitions of the NHS Five Year Forward View 22 and the Life Sciences Industrial Strategy, 23 which have prioritised disease prevention and the development of new technologies to achieve this goal.
Commissioners of clinical services can use the findings from this study to inform budget allocation. The high prevalence of mental health, metabolic syndrome, musculoskeletal, and gynaecological conditions identified in this study highlights health-care delivery needs for these conditions. The incidence and prevalence of dementia will rise as the population ages. This will require not only effective drugs to prevent the onset of this condition, but also adequate social services to maintain the quality of life for affected individuals for as long as possible.
Our analysis lends support to calls for workforce expansion in key specialties. 24 Adequate staffing is urgently needed to treat highly prevalent conditions at different stages of the lifecourse, such as mental health and gynaecological disorders from young adulthood to middle age, and musculoskeletal, neurodegenerative, and eye conditions in later life.
High degrees of disparity between research funding and disease burden have been shown in mental health, musculoskeletal, and cardiovascular conditions. 25 Our findings reinforce the need for increased research investment into these conditions.
Delineating unmet health-care needs is crucial when planning and prioritising the initiation of new drug-development programmes. Understanding when specific disease endpoints are most likely to occur, and in which individuals, is essential in designing and planning clinical trials.
By providing the case definitions for hundreds of conditions and their median age at diagnosis, we are laying the foundation for future studies into multimorbidity and ageing-related diseases using EHRs. The need for this research has been highlighted in a 2018 report published by the Academy of Medical Sciences. 26 The phenotyping algorithms in our platform can also be applied to EHRs linked to research-based cohort studies to provide disease-phenotype enrichment to support large-scale genetic-association studies. 27-29 This integration of EHRs with genetic and other biomedical data enables a systems approach to the pathophysiology of disease. For example, phenome-wide association studies based on hospital EHRs are helping to identify diseases with common biological mechanisms. 30 Collectively, these methods could unlock new opportunities for drug target discovery and repositioning. 31
The main limitation in this study is its dependence on the accuracy of data recording. Although general practitioners directly enter codes into patients' EHRs during primary-care consultations, in secondary care, records are primarily paper-based and trained coders extract information from handwritten notes to allocate diagnoses and procedural codes for a hospital episode, during which process vital information could be misinterpreted and incorrectly reported. We expect the accuracy of secondary-care EHRs to improve with widespread adoption by clinicians of computerised hospital medical records.
Conditions might be under-represented in EHRs compared with surveys, as patients with mild to moderate symptoms might not present to health-care services. However, surveys are susceptible to non-response, response, selection, and volunteer biases, so the results might not be generalisable to the wider population. 32 Asymptomatic cases can also lead to underestimates in conditions in which diagnosis requires clinical examination or investigations. Although clinical studies might detect asymptomatic cases, they are seldom representative of the general population.
A time-lag might occur between disease onset and the age of first record because of delays in clinical manifestation, presentation to the doctor, and documentation of the condition in the patients' records. Age at first diagnosis, therefore, might not reflect the actual age of onset, especially for diseases with a long subclinical phase.
The NHS Health Checks programme 33 began in 2009 with the aim of reducing cardiovascular-disease risks and events. This has led to increased lipid profiling, blood pressure and BMI measurements in patients aged 40-74 years. Although this might have biased our estimation of incidence, prevalence, and age of first recorded diagnosis of dyslipidaemia towards middle-aged patients, it nevertheless allowed us to capture all relevant clinical measurements in a large population-based study, as opposed to relying on surveys or statistical estimations. NHS England offers a range of other screening tests to different sections of the population, depending on their risk of developing specific conditions. These programmes aim to detect early signs of disease in asymptomatic individuals. Neonates are screened for rare metabolic conditions, including cystic fibrosis and sickle-cell disease. Pregnant women are screened for fetal anomalies, HIV, syphilis, hepatitis B, sickle-cell disease, and thalassaemia. Patients with diabetes are screened for eye complications. Cervical screening is offered to women aged 25-64 years and breast screening is offered to women aged 50-70 years. Bowel-cancer screening is offered to individuals aged 55 years in some parts of England and 60-74 years throughout England, and screening for abdominal aortic aneurysm is offered to men aged 65 years. The eligibility criteria for screening, together with differing response rates within the invited population might bias the generalisability of prevalence estimates based on EHRs. Nevertheless, these screening programmes allow more cases to be identified from EHRs than other study samples, which would not be devoid of biases in any case.
We have identified anomalies in the records due to inaccurate coding for rare conditions and disorders with asymptomatic or carrier states. Autosomal recessive disorders such as thalassaemia and cystic fibrosis had median ages at first record of 29 years and 31 years, later than would have been anticipated. These conditions had a bimodal distribution of age at first record, with a first peak in early childhood and the second peak at childbearing age (appendix p 61). One explanation for these results could be that patients considering parenthood were erroneously coded as having these conditions after genetic screening tests revealed that they were heterozygous carriers. Another explanation is that mothers of neonates with these conditions were coded in lieu of their affected children who had not yet been registered with a general practice. Researchers using EHR data for these conditions should employ quality-control measures before analysis.
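One possible form of such a quality-control check is sketched below in Python. It is an illustration only and not the procedure used in this study: the function names, the age bands (0-5 years for early childhood, 18-45 years for childbearing age), and the 25% review threshold are all hypothetical assumptions that would need to be set for each condition. The sketch summarises the ages at first record for a congenital condition and flags the condition for manual review when a large share of first records falls in adulthood.

```python
from statistics import median


def summarise_first_record_ages(ages, childhood_max=5, adult_min=18, adult_max=45):
    """Summarise ages (in years) at first record for one condition.

    The age bands are hypothetical defaults chosen for illustration,
    not values taken from the study.
    """
    if not ages:
        raise ValueError("ages must not be empty")
    n = len(ages)
    # First records in early childhood, as expected for a congenital condition.
    childhood = sum(1 for a in ages if a <= childhood_max)
    # First records at childbearing age, which might reflect carrier coding
    # or coding of a parent in place of an affected child.
    adult = sum(1 for a in ages if adult_min <= a <= adult_max)
    return {
        "n": n,
        "median_age": median(ages),
        "share_childhood": childhood / n,
        "share_adult": adult / n,
    }


def needs_review(summary, max_adult_share=0.25):
    """Flag a condition when the adult share exceeds an assumed threshold."""
    return summary["share_adult"] > max_adult_share


if __name__ == "__main__":
    # Made-up ages for demonstration; not study data.
    example_ages = [0, 1, 1, 2, 28, 30, 31, 33]
    summary = summarise_first_record_ages(example_ages)
    print(summary, needs_review(summary))
```

A flag from a check of this kind would only prompt inspection of the underlying records; it cannot by itself distinguish affected patients from carriers or miscoded relatives.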
Caution needs to be exercised when interpreting the data for HIV, chronic hepatitis, and other sexually transmitted infections. In the UK, most sexually transmitted infections are diagnosed and treated at sexual health service centres. 34 The records from these services are not linked to primary or secondary care for reasons of confidentiality. Therefore, these conditions are under-reported in the CPRD linked dataset.
As the population ages and multimorbidity becomes more prevalent, clinicians, health-care planners, policy makers, and researchers need to know which sections of the population are vulnerable to which health conditions at which ages to prevent, detect, and treat these conditions effectively. To address this need, we have generated a compendium of health conditions consisting of a comprehensive reference of case-definition algorithms and frequency-distribution patterns, together with a chronological map of human health conditions over the lifecourse.

Contributors
VK conceived and designed the study. ADH and HH developed it. ADH and HH supervised the work. VK, OB, SH, SS, MH, DN, CAP, RTL, RS, and ADH selected the health conditions, developed the algorithms, and constructed the codelists. SD, AGI, and KD extracted the data and maintain the CALIBER Portal. VK analysed and interpreted the data, and wrote the report, to which HH and ADH made substantial revisions. All authors reviewed and interpreted the results, commented on the report, contributed to revisions, and read and approved the final version.

Declaration of interests
DN is on the steering group for grants funded by GlaxoSmithKline and her team was subcontracted by Informatica to do the analyses of the National CKD Audit. RTL reports grants from Pfizer. ICKW has received research grants or speaker fees from Pfizer, GSK, Bayer, Amgen, Janssen, Medice, and Novartis in the last three years outside this study, and is a member of the Independent Scientific Advisory Committee of Clinical Practice Research Datalink. All other authors declare no competing interests.

Data sharing
Algorithms and codelists for all 308 conditions included in our study are available on the CALIBER Portal. They are publicly available for readers to adopt and adapt for their own research, and can be downloaded in a machine-readable CSV format from a GitHub data repository.