Abstract

An empiric perspective on what epidemiology has studied over time might inform discussions about future directions for the discipline. We aimed to identify the main areas of epidemiologic inquiry and determine how they evolved over time in 5 high-impact epidemiologic journals. We analyzed the titles and abstracts of 20,895 articles that were published between 1974 and 2013. In 5 time periods that reflected approximately equal numbers of articles, we identified the main topics by clustering terms based on co-occurrence. Infectious disease and cardiovascular disease epidemiology were the prevailing topics over the 5 periods. Cancer epidemiology was a major topic from 1974 to 2001 but disappeared thereafter. Nutritional epidemiology gained relative importance from 1974 to 2013. Environmental epidemiology appeared during 1996–2001 and continued to be important, whereas 2 clusters related to methodology and meta-analysis in genetics appeared during 2008–2013. Several areas of epidemiology, including injury or psychiatric epidemiology, did not make an appearance as major topics at any time. In an ancillary analysis of 6 high-impact general medicine journals, we found patterns of epidemiologic articles that were overall consistent with the findings in epidemiologic journals. This metaknowledge investigation allowed identification of the dominant topics in and conversely those that were absent from 5 major epidemiologic journals. We discuss implications for the field.

The definition of epidemiology has not changed significantly since the discipline originated. A review of 70 epidemiology textbooks published between 1931 and 2014 shows that epidemiology has consistently been defined as the science of understanding the distribution and determinants of population health in order to intervene to control or prevent disease (Supplementary Data, available at http://aje.oxfordjournals.org/). However, the scope of epidemiologic study and practice has expanded substantially over the past few decades. Between 1974 and 2013, there was a nearly 6-fold increase in the use of the term epidemiology in papers indexed by MEDLINE.

Motivated in part by changes in funding opportunities and in the scale of population-based studies, several recent commentaries have addressed potential future directions for epidemiology as a discipline (1–7). However, much of this soul-searching has been informed principally by expert opinion, with little evidence to guide our thinking. An empiric perspective on the field's evolution may help guide our collective thinking about future research directions (8, 9). A large-scale content analysis can track how areas of epidemiologic research evolve, with various areas gaining or losing importance. Underrepresented research paths may reflect a lack of attention to important areas that warrant investigation in the future.

Metaknowledge investigations can complement the ongoing self-reflection in the field (10). Because they analyze large quantities of text, metaknowledge investigations can reveal the distribution and relative influence of topics over time (11). Considering that what we, as epidemiologists, write should reflect our vision of the discipline, such analysis may help us shape the discipline. Therefore, we aimed to provide an empiric perspective on the field of epidemiology by identifying the main topics in 5 major epidemiology journals and assessing how they evolved over the past 40 years.

METHODS

Selection of articles

We considered 5 high-impact epidemiology journals: the American Journal of Epidemiology, the International Journal of Epidemiology, the Annals of Epidemiology, Epidemiology, and the European Journal of Epidemiology. Their impact factors are among the highest in the Journal Citation Reports category of public, environmental, and occupational health, and these journals are widely considered the journals of record (12).

We retrieved from MEDLINE via PubMed the records of all indexed articles published in these 5 journals through 2013, without any restriction on article type but including only articles for which an abstract was available. The selected articles were categorized into 5 time periods chosen to contain approximately equal numbers of articles. Data were analyzed separately for each time period.

Linguistic processing

For each article, we extracted the title and abstract and combined them into a single string. We discarded the words used to denote the structure of the abstract. Grammatical tagging allowed us to identify the part of speech (e.g., noun, pronoun, adjective, or verb) and assign the lemma (canonical form) to each word in a string. For instance, “genes” and “gene” would be assigned the lemma “gene.” Moreover, we developed a thesaurus that allowed for merging of different spellings of the same word (“ischaemic” and “ischemic”) and for merging an abbreviation with the word or phrase itself (PTSD and “posttraumatic stress disorder”). Each string was then reduced to a set of noun phrases, that is, single nouns or sequences of adjectives plus nouns or nouns that belong together (e.g., “cardiovascular disease” or “relative risk”). In the remainder of this article, noun phrases are referred to as terms.
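As an illustration of the thesaurus step only, the following minimal Python sketch merges variant spellings and abbreviations before terms are collected per article. The `THESAURUS` dictionary and the function names are hypothetical, and the sketch assumes that noun phrases have already been extracted by a separate tagging step; it is not the pipeline used in the study.

```python
import re

# Hypothetical thesaurus: variant spelling or abbreviation -> canonical form.
THESAURUS = {
    "ischaemic": "ischemic",
    "ptsd": "posttraumatic stress disorder",
}


def normalize_term(term: str) -> str:
    """Lowercase a noun phrase and replace each word listed in the thesaurus."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(THESAURUS.get(word, word) for word in words)


def article_terms(noun_phrases):
    """Return the set of normalized terms for one article (each term once)."""
    return {normalize_term(phrase) for phrase in noun_phrases}


phrases = ["Ischaemic heart disease", "ischemic heart disease", "PTSD"]
terms = article_terms(phrases)
```

Returning a set means that a term repeated within one title and abstract is represented only once for that article.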

Terms that occurred multiple times within a string were counted only once, and we discarded the terms that occurred in fewer than 10 articles. The relevance of each term was estimated as the degree to which the occurrences of the term were oriented towards 1 or more topics underlying the articles. For a given term, it was measured as the Kullback-Leibler distance between the distribution of (second-order) co-occurrences between that term and all other terms and the overall distribution of co-occurrences over all terms. We retained the 60% of terms with the highest relevance (13).
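The relevance score can be sketched as follows. This is a simplified illustration, not the implementation used in the study: it uses first-order co-occurrences rather than the second-order co-occurrences described above, and the function names, the toy corpus, and the `min_articles=2` threshold used in the example are assumptions made for the sketch.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_terms(articles, min_articles=10):
    """Keep terms found in at least `min_articles` articles (one count per article)."""
    doc_freq = Counter(term for terms in articles for term in set(terms))
    return {term for term, n in doc_freq.items() if n >= min_articles}


def relevance_scores(articles):
    """Kullback-Leibler divergence of each term's co-occurrence profile from
    the overall co-occurrence distribution (first-order simplification)."""
    profile = {}
    for terms in articles:
        for a, b in combinations(sorted(set(terms)), 2):
            profile.setdefault(a, Counter())[b] += 1
            profile.setdefault(b, Counter())[a] += 1
    total = sum(sum(counts.values()) for counts in profile.values())
    if total == 0:
        return {}
    # Share of all co-occurrences that involve each term.
    overall = {t: sum(c.values()) / total for t, c in profile.items()}
    scores = {}
    for term, counts in profile.items():
        n_term = sum(counts.values())
        scores[term] = sum(
            (n / n_term) * math.log((n / n_term) / overall[other])
            for other, n in counts.items()
        )
    return scores


def select_relevant(scores, fraction=0.6):
    """Return the given fraction of terms with the highest scores."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


# Toy corpus: "study" co-occurs with everything, the other terms are topical.
articles = [
    {"study", "infection", "virus"},
    {"study", "infection", "virus"},
    {"study", "diet", "consumption"},
    {"study", "diet", "consumption"},
]
keep = frequent_terms(articles, min_articles=2)
filtered = [terms & keep for terms in articles]
scores = relevance_scores(filtered)
selected = select_relevant(scores)
```

In this toy corpus the generic term "study" has a co-occurrence profile close to the overall distribution, so it receives the lowest score and falls outside the retained 60%.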

Mapping and clustering of terms

The selected terms were positioned on a 2-dimensional co-occurrence plot and were grouped into clusters based on the co-appearance of terms. A normalized co-occurrence frequency was derived for each pair of terms. The locations of terms on the plot were determined by minimizing a weighted sum of the squared distances between all pairs of terms. Minimization was achieved through stress majorization (14). Consequently, terms with high co-occurrence tend to be close to each other, whereas terms that are far away from each other do not or rarely occur together in the same article (15). The terms were also assigned to clusters using a weighted variant of modularity-based clustering (16, 17). We characterized each cluster by providing a heading based on the terms in the cluster. We assessed the relative importance of clusters according to their share of terms relative to the total number of terms. Clusters of terms are interpreted as major epidemiology topics, and clusters located close to each other in the map indicate related topics.
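The normalization and the cluster shares can be illustrated with a short Python sketch. The text states only that a normalized co-occurrence frequency was used; the sketch assumes the association strength, in which the co-occurrence count of a pair is divided by the product of the two terms' total co-occurrence counts. The function names and the toy data are hypothetical, and the layout and modularity-based clustering steps are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def association_strength(articles):
    """Normalize pairwise co-occurrence counts (assumed normalization):
    s_ij = c_ij / (w_i * w_j), where w_i is the total number of
    co-occurrences involving term i."""
    pair_counts = Counter()
    for terms in articles:
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1
    totals = Counter()
    for (a, b), n in pair_counts.items():
        totals[a] += n
        totals[b] += n
    return {
        (a, b): n / (totals[a] * totals[b]) for (a, b), n in pair_counts.items()
    }


def cluster_shares(assignment):
    """Percentage of all terms that fall in each cluster.

    `assignment` maps each term to a cluster label."""
    sizes = Counter(assignment.values())
    n_terms = len(assignment)
    return {label: 100 * size / n_terms for label, size in sizes.items()}


articles = [
    {"study", "infection", "virus"},
    {"study", "infection", "virus"},
    {"study", "diet", "consumption"},
    {"study", "diet", "consumption"},
]
strength = association_strength(articles)
shares = cluster_shares(
    {"infection": "A", "virus": "A", "antibody": "A", "diet": "B"}
)
```

The normalization down-weights pairs that involve very frequent terms, so "infection" is tied more strongly to "virus" than to the ubiquitous "study" even though both pairs co-occur in the same number of articles.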

For each time period, the resulting maps show terms as labeled nodes in the co-occurrence network. Node size is proportional to the term frequency of occurrence, so that the larger the node, the more articles include the term. The clustering of the terms is displayed on top of the map by coloring nodes based on the cluster to which they belong. Analysis involved the use of the VOSviewer software, version 1.5.7 (Centre for Science and Technology Studies, Leiden University, The Netherlands) (18).

Identification of bursts

To identify topics that attracted attention in epidemiology research but eventually faded away, we used Kleinberg's burst detection algorithm to identify words that experienced sudden increases in use (19, 20). The algorithm assesses states of the document stream, with different frequencies of individual words, and identifies state transitions, that is, years around which the frequency of a word's usage changes significantly. The analysis generates a list of burst words, together with the intervals of time during which each burst occurred and the intensity of the burst. We visualized the top 100 burst words graphically on a horizontal bar chart, with publication year on the x-axis, burst words on the y-axis, and a bar from the start to the end of the burst. The bar width is proportional to the intensity of the burst. Bars were color-coded according to the major epidemiology topics, as previously. Some words did not belong to any particular cluster, and the corresponding bars were left uncolored. Analysis involved the use of the Science of Science Tool, version 1.1 β (Cyberinfrastructure for Network Science Center, Indiana University, Bloomington, Indiana, http://sci2.cns.iu.edu).
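The idea behind burst detection can be sketched with a simplified 2-state model in the spirit of Kleinberg's algorithm. This is an illustration under stated assumptions, not the Science of Science Tool implementation: yearly counts are treated as batches, the burst state multiplies the baseline rate by an assumed factor `s`, entering the burst state costs an assumed penalty `gamma * ln(n)`, and burst intensity is not computed.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return inclusive (start, end) index pairs of batches in the burst state.

    relevant[t]: articles in batch t that contain the word.
    totals[t]:   all articles in batch t.
    """
    n = len(relevant)
    if n == 0 or sum(totals) == 0:
        return []
    p0 = sum(relevant) / sum(totals)   # baseline rate
    p1 = min(s * p0, 0.9999)           # elevated (burst) rate
    if p0 <= 0 or p0 >= p1:
        return []
    probs = (p0, p1)

    def fit_cost(state, r, d):
        # Negative binomial log-likelihood without the constant term.
        p = probs[state]
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    up_cost = gamma * math.log(n)      # penalty for entering the burst state
    cost = [0.0, math.inf]             # the sequence starts in the base state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for state in (0, 1):
            from_base = cost[0] + (up_cost if state == 1 else 0.0)
            from_burst = cost[1]
            prev = 0 if from_base <= from_burst else 1
            new_cost.append(min(from_base, from_burst) + fit_cost(state, r, d))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    # Backtrack the cheapest state sequence.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        states.append(state)
    states.reverse()

    bursts, start = [], None
    for i, st in enumerate(states):
        if st == 1 and start is None:
            start = i
        elif st == 0 and start is not None:
            bursts.append((start, i - 1))
            start = None
    if start is not None:
        bursts.append((start, n - 1))
    return bursts


totals = [100] * 10
relevant = [2, 2, 2, 2, 20, 22, 21, 2, 2, 2]
bursts = detect_bursts(relevant, totals)
```

With these toy counts, the word is assigned to the burst state for batches 4 through 6, where its share of articles rises well above the overall rate. Larger values of `gamma` make the model more reluctant to declare a burst.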

Ancillary analysis of high-impact general medicine journals

Because many epidemiologic articles are published outside of epidemiology journals, we performed the following ancillary analysis. We considered 6 high-impact general medicine journals: The New England Journal of Medicine, The Lancet, The Journal of the American Medical Association, The BMJ, Annals of Internal Medicine, and PLoS Medicine. To identify the articles most likely to be relevant to the field of epidemiology, we analyzed how the articles published in the 5 epidemiology journals were indexed with MeSH terms in MEDLINE, and we derived the following sensitivity-maximizing search filter: “epidemiology” (Subheading) OR Epidemiologic Factors (MeSH) OR Epidemiologic Methods (MeSH) OR epidemiologic studies (MeSH). Using this filter, we retrieved from MEDLINE the records of articles with abstracts that were published in these 6 general medicine journals during the same time period as the articles in the 5 epidemiology journals. The selected articles were categorized into the same 5 time periods. We applied the same linguistic processing to the titles and abstracts, and we mapped and clustered terms into major topics, as previously.

RESULTS

Characteristics of selected articles

We selected 20,895 articles. Overall, 42.7% were published in the American Journal of Epidemiology, 21.7% in the International Journal of Epidemiology, 10.2% in the Annals of Epidemiology, 10.9% in Epidemiology, and 14.5% in the European Journal of Epidemiology. Figure 1 shows the evolution over time of the yearly number of articles across the 5 journals. In all, 3,725 (17.8%) articles were published between 1974 and 1989; 3,948 (18.9%) were published between 1990 and 1995; 4,492 (21.5%) were published between 1996 and 2001; 4,180 (20.0%) were published between 2002 and 2007; and 4,550 (21.8%) were published between 2008 and 2013.

Figure 1.

Number of articles published per year from 1974 to 2013 by journal.

Mapping and clustering of terms

Figure 2 shows the mapping and clustering of terms over time. Table 1 shows a summary of the clusters of terms and the evolution of major epidemiology topics. The map for 1974–1989 contained 5 main clusters of co-occurring terms, which corresponded to infectious diseases epidemiology, cardiovascular diseases epidemiology, cancer epidemiology, reproductive and perinatal epidemiology, and nutrition epidemiology.

Table 1.

Evolution of Major Topics in 5 High-Impact Epidemiologic Journals and in a Subset of Articles Published in 6 High-Impact General Medicine Journals, 1974–2013

Topic^a | Epidemiology Journals, % | General Medicine Journals^b, %

1974–1989 | 3,725 articles, 951 terms | 8,602 articles, 823 terms
Infectious disease epidemiology | 36 | 24
  Infection | 14 | 13
  Antibody | 8 | 6
  Virus | 7 |
  Outbreak | 6 |
  Illness | 6 | 6
  Prevalence |  | 5
  United States |  | 4
Cardiovascular disease epidemiology | 24 | 16
  Man | 15 |
  Smoking | 7 | 3
  Blood pressure | 6 |
  Cigarette smoking | 5 |
  Adjust | 5 |
  Risk factor |  | 6
  Relative risk |  | 3
  Diabetes |  | 3
  Cohort |  | 2
Cancer epidemiology | 15 | 19
  Mortality | 12 |
  Cancer | 11 |
  Mortality rate | 4 |
  Death rate | 3 |
  Lung cancer | 2 |
  Therapy |  | 14
  Survival |  | 5
  Cell |  | 4
  Chemotherapy |  | 3
  Recipient |  | 3
Reproductive and perinatal epidemiology | 12 | 5
  Case control study | 12 |
  Relative risk | 7 |
  Confidence interval | 6 |
  Pregnancy | 4 | 4
  Infant | 4 | 5
  Mother |  | 3
  Birth |  | 2
  Delivery |  | 2
Clinical trials |  | 20
  Trial |  | 10
  Placebo |  | 8
  Dose |  | 7
  Efficacy |  | 6
  Improvement |  | 4
Health care quality |  | 16
  Physician |  | 6
  Care |  | 4
  Survey |  | 4
  Cost |  | 3
  Service |  | 3

1990–1995 | 3,948 articles, 1,116 terms | 6,434 articles, 903 terms
Infectious disease epidemiology | 32 | 15
  Infection | 12 | 14
  Antibody | 5 | 5
  HIV | 5 | 7
  Italy | 4 |
  Sensitivity | 4 |
  Detection |  | 4
  HIV infection |  | 3
Cardiovascular disease epidemiology | 21 | 18
  Body mass index | 8 |
  Cigarette smoking | 6 |
  Blood pressure | 5 |
  Coronary heart disease | 5 |
  Diabetes | 5 |
  Case control study |  | 5
  Baseline |  | 5
  Adjust |  | 5
  Smoking |  | 4
  Hypertension |  | 4
Cancer epidemiology | 18 |
  Cancer | 10 |
  Approach | 5 |
  Mortality rate | 4 |
  Validity | 3 |
  Example | 3 |
Reproductive and perinatal epidemiology | 14 | 8
  Pregnancy | 6 | 4
  Mother | 5 | 3
  Birth | 5 | 4
  Infant | 5 | 5
  Smoker | 4 |
  Screening |  | 5
Nutritional epidemiology | 8 |
  Consumption | 8 |
  Diet | 5 |
  Alcohol | 4 |
  Correlation | 4 |
  Food | 3 |
Female cancer epidemiology | 8 |
  Breast cancer | 4 |
  Parity | 2 |
  Family history | 2 |
  Oral contraceptive | 2 |
  Menopause | 1 |
Health care quality |  | 27
  Care |  | 9
  Survey |  | 8
  Practice |  | 6
  Questionnaire |  | 6
  Physician |  | 6
Clinical trials |  | 21
  Therapy |  | 14
  Trial |  | 14
  Placebo |  | 9
  Efficacy |  | 9
  Dose |  | 8
Cardiovascular disease epidemiology |  | 12
  Sensitivity |  | 5
  Myocardial infarction |  | 4
  Stroke |  | 3
  Specificity |  | 3
  Acute myocardial infarction |  | 2

1996–2001 | 4,492 articles, 1,346 terms | 6,891 articles, 1,009 terms
Infectious disease epidemiology | 36 | 25
  Infection | 11 | 13
  Approach | 5 |
  HIV | 4 | 6
  Antibody | 4 |
  Strategy | 4 |
  Survival |  | 8
  Cell |  | 4
  Sensitivity |  | 4
Cardiovascular disease epidemiology | 21 | 29
  Body mass index | 10 |
  Baseline | 5 |
  Physical activity | 5 |
  Hypertension | 4 | 4
  Alcohol consumption | 4 |
  Diabetes |  | 5
  Case control study |  | 5
  Infant |  | 4
  Birth |  | 4
Cancer epidemiology | 15 |
  Lung cancer | 3 |
  Cigarette | 3 |
  Nonsmoker | 3 |
  Cancer registry | 2 |
  Cancer risk | 2 |
Reproductive and perinatal epidemiology | 13 |
  Pregnancy | 7 |
  Birth | 7 |
  Mother | 5 |
  Infant | 5 |
  Breast cancer | 4 |
Nutritional epidemiology | 10 |
  Consumption | 7 |
  Diet | 4 |
  Validity | 3 |
  Error | 3 |
  Agreement | 2 |
Environmental epidemiology | 6 |
  Asthma | 2 |
  Air pollution | 2 |
  Season | 2 |
  Respiratory symptom | 1 |
  Respiratory disease | 1 |
Health care quality |  | 25
  Care |  | 9
  Quality |  | 7
  Survey |  | 7
  Practice |  | 7
  Physician |  | 7
Clinical trials |  | 22
  Trial |  | 21
  Placebo |  | 11
  Efficacy |  | 10
  Dose |  | 7
  Double blind |  | 6

2002–2007 | 4,180 articles, 1,236 terms | 6,283 articles, 978 terms
Nutritional epidemiology | 24 |
  Consumption | 9 |
  Diet | 3 |
  Inverse association | 3 |
  Incident case | 2 |
  Lower risk | 2 |
Cardiovascular disease epidemiology | 22 | 20
  Body mass index | 10 | 4
  Cardiovascular disease | 6 | 5
  Weight | 5 |
  Coronary heart disease | 5 |
  Height | 5 |
  Stroke |  | 5
  Hypertension |  | 5
  Case control study |  | 4
Infectious disease epidemiology | 20 |
  Infection | 8 |
  Mortality rate | 3 |
  Transmission | 3 |
  HIV | 3 | 4
  Setting | 2 |
  Gene |  | 4
  Cell |  | 4
  Progression |  | 3
  Vaccine |  | 3
Methodology | 16 |
  Approach | 7 |
  Epidemiology | 5 |
  Bias | 5 |
  Paper | 5 |
  Problem | 4 |
Reproductive and perinatal epidemiology | 10 |
  Pregnancy | 7 |
  Mother | 6 |
  Infant | 4 |
  Birth weight | 4 |
  Offspring | 3 |
Environmental epidemiology | 7 |
  Asthma | 2 |
  Air pollution | 2 |
  Nonsmoker | 2 |
  Susceptibility | 2 |
  Season | 2 |
Health care quality |  | 28
  Practice |  | 7
  Health |  | 7
  Survey |  | 6
  Research |  | 6
  Problem |  | 5
Clinical trials |  | 23
  Placebo |  | 12
  Controlled trial |  | 12
  Hazard ratio |  | 10
  Efficacy |  | 10
  Dose |  | 8
Meta-analysis |  | 13
  Quality |  | 10
  Review |  | 6
  Medline |  | 6
  Systematic review |  | 6
  Meta analysis |  | 6
Cost-benefit analysis |  | 2
  Cost |  | 5
  Dollar |  | 2
  Cost effectiveness |  | 2
  Life year |  | 2
  Life expectancy |  | 1

2008–2013 | 4,550 articles, 1,354 terms | 6,159 articles, 1,040 terms
Cardiovascular disease epidemiology | 23 | 18
  Body mass index | 12 | 5
  Height | 5 |
  Childhood | 3 |
  Cause mortality | 3 |
  Blood pressure | 3 |
  Cardiovascular disease |  | 5
  Prospective cohort study |  | 4
  Smoking |  | 4
  Gene |  | 3
Nutritional epidemiology | 20 |
  Hazard ratio | 10 |
  Consumption | 5 |
  Diet | 3 |
  Health study | 3 |
  Smoker | 3 |
Infectious disease epidemiology | 19 |
  Research | 8 |
  Epidemiology | 6 |
  Infection | 6 |
  Issue | 4 |
  Article | 4 |
Methodology | 15 |
  Method | 13 |
  Approach | 10 |
  Bias | 7 |
  Problem | 5 |
  Design | 5 |
Reproductive and perinatal epidemiology | 11 |
  Pregnancy | 8 |
  Mother | 5 |
  Childhood | 4 |
  Birth weight | 3 |
  Infant | 3 |
Environmental epidemiology | 7 |
  Air pollution | 3 |
  Particulate matter | 2 |
  Season | 2 |
  Temperature | 1 |
  Hospital admission | 1 |
Meta-analysis | 3 | 13
  Meta analysis | 5 | 9
  Gene | 4 |
  Systematic review | 3 | 8
  Genotype | 3 |
  Polymorphism | 2 |
  Randomized controlled trial |  | 8
  Medline |  | 7
  Database |  | 7
Global health |  | 38
  Country |  | 11
  Prevalence |  | 9
  Trend |  | 6
  Survey |  | 6
  Cost |  | 6
Clinical trials |  | 20
  Hazard ratio |  | 15
  Therapy |  | 15
  Week |  | 13
  Placebo |  | 12
  Clinical trial |  | 11
Cardiovascular disease epidemiology |  | 13
  Hazard ratio |  | 15
  Stroke |  | 7
  Significant difference |  | 6
  Myocardial infarction |  | 6
  Cause mortality |  | 5

Abbreviation: HIV, human immunodeficiency virus.

a Data are clusters of terms interpreted as major topics (with the percentage of terms within each cluster) and the top 5 terms within each cluster (with the percentage of articles including each term). The major topics are ordered by their importance in the epidemiologic journals and then in the general medicine journals.

b For the 6 high-impact medical journals, we retrieved articles most likely of relevance to the field of epidemiology by using a custom search filter.

Figure 2.

Mapping and clustering of terms in 5 high-impact epidemiology journals for A) 1974–1989, B) 1990–1995, C) 1996–2001, D) 2002–2007, and E) 2008–2013. The maps show terms as labeled nodes. Some terms appear to be misspelled or truncated because of the linguistic processing that was performed before the mapping and clustering of terms, as described in the Methods section. Node size is proportional to the frequency of occurrence of the term (i.e., the larger the node, the more articles include the term). Terms that are far apart rarely or never occur together in the same article, whereas terms with high co-occurrence are close to each other. The clustering of the terms is displayed on top of the map by coloring nodes according to the cluster to which they belong. Clusters of terms are interpreted as major epidemiology topics, and clusters located close to each other in the map indicate related topics.
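The co-occurrence logic behind such maps can be illustrated with a minimal sketch. It counts how many articles contain each term and each pair of terms, then normalizes each pair count by the product of the two term frequencies (an association-strength style normalization commonly used in bibliometric mapping). The toy corpus, the function names, and the choice of normalization are assumptions made for illustration; they do not reproduce the software or settings used for Figure 2.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles):
    """Count articles per term and per unordered pair of terms."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in articles:
        unique = sorted(set(terms))  # each term counts once per article
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def association_strength(term_counts, pair_counts):
    """Divide each pair count by the product of the two term frequencies."""
    return {
        (a, b): count / (term_counts[a] * term_counts[b])
        for (a, b), count in pair_counts.items()
    }


# Hypothetical toy corpus: each article is reduced to its list of terms.
articles = [
    ["air pollution", "particulate matter", "season"],
    ["air pollution", "particulate matter"],
    ["pregnancy", "birth weight"],
    ["pregnancy", "birth weight", "season"],
]

term_counts, pair_counts = cooccurrence(articles)
strength = association_strength(term_counts, pair_counts)
```

In this toy corpus, "air pollution" and "particulate matter" always appear together and would sit close on a map, whereas "air pollution" and "pregnancy" never co-occur and would sit far apart.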

These 5 major topics were identified consistently over the 1990–1995 and 1996–2001 periods. In addition, a cluster corresponding to female cancer epidemiology was identified for 1990–1995 but disappeared thereafter, whereas a cluster related to environmental epidemiology appeared during 1996–2001 and persisted in the subsequent periods.

For the period 2002–2007, infectious disease and cardiovascular disease epidemiology remained among the top major topics. However, the cluster related to cancer epidemiology disappeared, and the nutritional epidemiology cluster gained a larger share of the term map. Moreover, a cluster related to methodology appeared during 2002–2007, ahead of reproductive and perinatal epidemiology and environmental epidemiology, and persisted in the subsequent period.

Finally, 2008–2013 saw the appearance of another cluster, related to meta-analysis in genetics, bringing the total for the period to 7 clusters. Cardiovascular disease, nutritional, and infectious disease epidemiology remained the top major topics. Reproductive and perinatal epidemiology and environmental epidemiology completed the picture.

Identification of bursts

The analysis of the top 100 burst words showed similar patterns (Supplementary Data). From 1974 to 1999, all of the bursts were related to infectious diseases except the words “systolic” and “diastolic,” which were related to the epidemiology of cardiovascular diseases. The word “seropositive” showed the most intense burst. After 2000, there was no clear temporal pattern in the bursts. Some terms belonged to a single topic. For instance, the words “ambient” and “particulate” were exclusively related to environmental epidemiology. Eight terms were related to genetics and meta-analysis. Two of the terms with the most intense bursts (“gestational” and “preterm”) were related to reproductive and perinatal epidemiology. A majority of burst words were related to methodology (e.g., “pathway,” “Bayesian,” “causal,” “mediation,” and “simulation”). Finally, approximately one-third of burst words did not belong to any particular cluster, but most of them were related to methodology (e.g., “multilevel,” “P for trend,” “Cox,” “heterogeneity,” “modeling,” and “confounder”).
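To make the notion of a burst concrete, the following is a simplified sketch of a two-state burst model for counts that arrive in batches (for example, articles per year), in the spirit of the burst detection approach cited in reference 19. A term is either in a baseline state, where it appears at its overall rate, or in a burst state, where it appears at an elevated rate; entering the burst state carries a cost, and the cheapest state sequence is found by dynamic programming. The function name, the parameters `s` and `gamma`, their default values, and the toy counts are all assumptions for illustration and are not the settings used in this analysis.

```python
import math


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Return 0 (baseline) or 1 (burst) for each batch.

    relevant[t]: number of articles in batch t that contain the term.
    totals[t]:   number of articles in batch t.
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)      # baseline rate of the term
    if not 0 < p0 < 1:
        raise ValueError("term must occur in some but not all articles")
    p1 = min(s * p0, 0.9999)              # elevated rate in the burst state
    rates = (p0, p1)
    up_cost = gamma * math.log(n)         # cost of entering the burst state

    def fit_cost(p, r, d):
        # Negative binomial log-likelihood; the binomial coefficient is
        # dropped because it is identical for both states.
        return -(r * math.log(p) + (d - r) * math.log(1 - p))

    cost = [0.0, math.inf]                # the sequence starts in baseline
    back = []
    for r, d in zip(relevant, totals):
        new_cost, pointers = [], []
        for state in (0, 1):
            options = [
                cost[prev] + (up_cost if state > prev else 0.0)
                for prev in (0, 1)
            ]
            prev = 0 if options[0] <= options[1] else 1
            new_cost.append(options[prev] + fit_cost(rates[state], r, d))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backward from the final batch.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical yearly counts for one term, 100 articles per year.
totals = [100] * 8
relevant = [2, 3, 2, 20, 25, 22, 3, 2]
path = burst_states(relevant, totals)
```

Batches labeled 1 form the burst interval. A burst intensity could then be defined, for instance, as the reduction in fit cost obtained by using the burst rate over that interval, although that scoring step is not shown here.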

Ancillary analysis of high-impact general medicine journals

Our search retrieved 32,760 articles most likely of relevance to the field of epidemiology that were published in the 6 general medicine journals. Supplementary Data shows the mapping and clustering of terms over time. Table 1 shows a summary of the clusters of terms and allows comparison with articles from epidemiologic journals.

Some topics were similar to those identified in epidemiology journals and evolved similarly. Cardiovascular diseases and infectious diseases were among the main topics over the 5 time periods, except for infectious diseases in the last time period. We identified a cancer cluster in the 1974–1989 period and a reproductive and perinatal health cluster over the 1974–1989 and 1990–1995 periods. A cluster related to meta-analysis appeared during 2002–2007 and persisted in the subsequent period.

Other topics differed from those identified previously. One cluster corresponding to clinical trial terms (in multiple clinical specialties) was identified over the 5 periods. Another cluster related to health care quality was identified over the periods from 1974 to 2007. Lastly, a cost-benefit analysis cluster was identified in 2002–2007, and a global health cluster was identified in 2008–2013.

DISCUSSION

We analyzed 20,895 articles published in 5 epidemiology journals over 4 decades using a production-oriented approach to investigate the epistemic core of epidemiology. We found a clear pattern of leading areas of epidemiologic inquiry during this period and patterns in the evolution of these areas. First, infectious disease and cardiovascular disease epidemiology have consistently been the main topics of interest in these 5 journals. Second, cancer epidemiology was a major topic, with a peak in knowledge production in 1990–1995, when 2 clusters related to cancer and female cancer were identified, but it stopped being a leading focus of papers in the 5 epidemiologic journals after 2001. Third, nutritional epidemiology gained importance over time. Fourth, 3 topics were among the leading areas of inquiry only for time-delimited periods: environmental epidemiology from 1996 onward, and methodology and meta-analysis in genetics, which appeared in 2008–2013.

Because we focused our inquiry on these 5 leading epidemiology journals, we can interpret our findings as representing knowledge produced and regulated through peer review principally by epidemiologists and shaped by editorial processes in line with the leading epidemiologic organizations. The American Journal of Epidemiology is published in association with the Society for Epidemiologic Research, the International Journal of Epidemiology is published on behalf of the International Epidemiological Association, Epidemiology is the official journal of The International Society for Environmental Epidemiology, and the Annals of Epidemiology is the official publication of the American College of Epidemiology. By design, this analysis excludes papers published in nonepidemiologic journals, whether written by epidemiologists or by nonepidemiologists whose work would nonetheless be considered within the field's remit. There is little question that such papers thrive, particularly in clinical journals. So, for example, the decrease in focus on cancer in these 5 journals over the past decade most likely represents a shift in where these papers are being published, away from epidemiology journals and toward cancer journals. We would argue, however, that it matters whether the relevant papers are published in the leading journals of the discipline. Epidemiologists are, in many respects, the keepers of the methodological flame in population health sciences. If cancer epidemiology is evolving in nonepidemiology journals, it represents a tremendous lost opportunity for the field to make the contribution it can and should make to one of the leading global causes of death.

Therefore, although our observations are in some ways heartening, reinforcing that we are focusing on cardiovascular disease commensurately with its contribution to the burden of mortality, they also suggest that the discipline is playing a far smaller role in other areas that are also important. For example, although several new areas have gained prominence in the field over the past decades, including social epidemiology, these are clearly not represented among the key areas in these 5 leading epidemiology journals over the time period of interest (21–23). Moreover, areas such as injury, psychiatric, or neurological epidemiology are clearly not among the main topics identified; this is dissonant with the importance that these areas have for the global burden of disease (24–26). Within a consequentialist epidemiology framework, it would certainly stand the field in good stead if we engaged actively in inquiry concerning the major causes of morbidity and mortality worldwide, with an eye to how we may prevent disease and improve health (27–29).

The increasing role that methodological papers have played in epidemiology journals over the past decade presents both opportunities and challenges. In some respects, this trend represents an evolution in the field, wherein methods for epidemiology are being developed principally by epidemiologists. This reflects a maturity in the field, moving well beyond its origins, when methods in the discipline emerged from other areas (8). However, it also suggests that the field takes upon itself greater responsibility, both to keep developing methods that are adequate to the evolving population health challenges we face and to remain open to incorporating methods that arise in other areas and may be fruitful for epidemiology to adopt.

These observations also have implications for our educational programs and how we train the next generation of epidemiologists. If the leading epidemiology journals focus insufficiently on significant areas of population health, we as a discipline may fall short of our self-definition and our promise as a field. This may both change the composition of those who are attracted to the discipline and influence the structural factors (such as promotion expectations and criteria) that reinforce our areas of focus and growth in the field going forward.

Our analysis has limitations. First, our results and interpretation depended on our selection of epidemiology journals. A different list of journals could be considered. For instance, the Public Health/Health Administration Section of the Medical Library Association considered 10 journals as “essential for a collection that supports a program with subject specialization in this area” (30, p. 572). Moreover, many epidemiologic articles are published in nonepidemiologic journals. However, in our ancillary analysis of 6 high-impact general medicine journals, we found patterns of epidemiologic papers that were consistent overall with the findings in epidemiologic journals, which suggests that the trends observed here hold across the discipline. Second, our results correspond to a macroscopic rather than microscopic mapping of the discipline, in the sense that we may have missed subtle regularities in the objects of research. However, we were interested in identifying the main topics, so our approach suited our purpose. In addition, we may have missed the exact dynamics of appearance or disappearance of these main topics because our categorization of articles aimed for an approximately equal number of articles in each time period.
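The last limitation follows from how time periods with similar numbers of articles are constructed. As a minimal sketch, assuming only that yearly article counts are available, one greedy way to form such periods is to close a period as soon as the cumulative count reaches the next equal share of the total. The function name and the yearly counts below are hypothetical, and this is not necessarily the procedure used to define the 5 periods in this study.

```python
def split_into_periods(articles_per_year, n_periods):
    """Greedily group years into periods with roughly equal article counts.

    Returns a list of (first_year, last_year) tuples. Fewer than n_periods
    may be returned if the counts are very unevenly distributed.
    """
    years = sorted(articles_per_year)
    total = sum(articles_per_year.values())
    periods = []
    start = 0      # index of the first year in the period being built
    running = 0    # cumulative number of articles seen so far
    for i, year in enumerate(years):
        running += articles_per_year[year]
        target = total * (len(periods) + 1) / n_periods
        is_last_year = i == len(years) - 1
        if running >= target and len(periods) < n_periods - 1 and not is_last_year:
            periods.append((years[start], year))
            start = i + 1
    periods.append((years[start], years[-1]))  # remaining years form the last period
    return periods


# Hypothetical yearly article counts.
counts = {2000: 10, 2001: 10, 2002: 20, 2003: 20, 2004: 20, 2005: 40}
periods = split_into_periods(counts, 3)
```

Because publication volume grows over time, periods built this way become shorter in calendar terms, so a topic that emerges or fades within a long early period cannot be dated more precisely than that period.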

In sum, we identified the major topics in 5 high-impact journals of epidemiology, and we analyzed the trends of these main topics. This allowed for an empiric perspective on the discipline's past, with an eye to its future. Our metaknowledge investigation, which relied on freely accessible data sources and free software, is replicable. Monitoring the evolution of the science of epidemiology may help inform our efforts to consider appropriate recalibration of the field's scope.

ACKNOWLEDGMENTS

Author affiliations: Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York (Ludovic Trinquart); and School of Public Health, Boston University, Boston, Massachusetts (Sandro Galea).

Conflict of interest: none declared.

REFERENCES

1. Lilienfeld DE. The general epidemiologist: Is there a place in today's epidemiology? Am J Epidemiol. 2007;166(1):1–4.

2. Armenian HK. Epidemiology: a problem-solving journey. Am J Epidemiol. 2009;169(2):127–131.

3. Ness RB, Andrews EB, Gaudino JA Jr, et al. The future of epidemiology. Acad Med. 2009;84(11):1631–1637.

4. Bhopal R, Macfarlane GJ, Smith WC, et al. What is the future of epidemiology? Lancet. 2011;378(9790):464–465.

5. Pearce N. Epidemiology in a changing world: variation, causation and ubiquitous risk factors. Int J Epidemiol. 2011;40(2):503–512.

6. McKeown RE. Is epidemiology correcting its vision problem? A perspective on our perspective: 2012 presidential address for American College of Epidemiology. Ann Epidemiol. 2013;23(10):603–607.

7. Khoury MJ, Lam TK, Ioannidis JP, et al. Transforming epidemiology for 21st century medicine and public health. Cancer Epidemiol Biomarkers Prev. 2013;22(4):508–516.

8. Morabia A. A History of Epidemiologic Methods and Concepts. Basel, Switzerland: Birkhäuser Verlag; 2004.

9. Buck C, Llopis A, Najera E, et al. The Challenge of Epidemiology. Issues and Selected Readings. Washington, DC: Pan American Health Organization; 1988.

10. Evans JA, Foster JG. Metaknowledge. Science. 2011;331(6018):721–725.

11. Griffiths TL, Steyvers M. Finding scientific topics. Proc Natl Acad Sci U S A. 2004;101(suppl 1):5228–5235.

12. ISI Web of Knowledge. 2012 Journal Citation Reports Science Edition. Thomson Reuters; 2014.

13. van Eck N, Waltman L. Text Mining and Visualization Using VOSviewer. Leiden, The Netherlands: Centre for Science and Technology Studies, Leiden University; 2012.

14. Borg I, Groenen P. Modern Multidimensional Scaling. 2nd ed. New York, NY: Springer; 2005.

15. Van Eck NJ, Waltman L. Bibliometric mapping of the computational intelligence field. Int J Unc Fuzz Knowl Based Syst. 2007;15(5):625–645.

16. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69(2):026113.

17. Waltman L, van Eck NJ, Noyons ECM. A unified approach to mapping and clustering of bibliometric networks. J Informetrics. 2010;4(4):629–635.

18. van Eck NJ, Waltman L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics. 2010;84(2):523–538.

19. Kleinberg J. Bursty and hierarchical structure in streams. Data Min Knowl Discov. 2003;7(4):373–397.

20. Mane KK, Börner K. Mapping topics and topic bursts in PNAS. Proc Natl Acad Sci U S A. 2004;101(suppl 1):5287–5290.

21. Berkman L, Kawachi I. Social Epidemiology. New York, NY: Oxford University Press; 2000.

22. Cwikel J. Social Epidemiology: Strategies for Public Health Activism. New York, NY: Columbia University Press; 2006.

23. O'Campo P, Dunn J. Rethinking Social Epidemiology: Towards a Science of Change. New York, NY: Springer; 2011.

24. Vos T, Flaxman AD, Naghavi M, et al. Years lived with disability (YLDs) for 1160 sequelae of 289 diseases and injuries 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2163–2196.

25. Murray CJ, Vos T, Lozano R, et al. Disability-adjusted life years (DALYs) for 291 diseases and injuries in 21 regions, 1990–2010: a systematic analysis for the Global Burden of Disease Study 2010. Lancet. 2012;380(9859):2197–2223.

26. Whiteford HA, Degenhardt L, Rehm J, et al. Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010. Lancet. 2013;382(9904):1575–1586.

27. Galea S. An argument for a consequentialist epidemiology. Am J Epidemiol. 2013;178(8):1185–1191.

28. Cates W Jr. Invited commentary: consequential(ist) epidemiology: let's seize the day. Am J Epidemiol. 2013;178(8):1192–1194.

29. Galea S. Galea Responds to “consequential(ist) epidemiology: finally”. Am J Epidemiol. 2013;178(8):1195–1196.

30. Ascher M. Journals, epidemiological. In: Boslaugh S, ed. Encyclopedia of Epidemiology. Thousand Oaks, CA: SAGE Publisher, Inc.; 2008:572–573.

Supplementary data