Psychosocial factors and cancer incidence (PSY‐CA): Protocol for individual participant data meta‐analyses

Abstract Objectives Psychosocial factors have been hypothesized to increase the risk of cancer. This study aims (1) to test whether psychosocial factors (depression, anxiety, recent loss events, subjective social support, relationship status, general distress, and neuroticism) are associated with the incidence of cancer (any, breast, lung, prostate, colorectal, smoking‐related, and alcohol‐related); (2) to test the interaction between psychosocial factors and factors related to cancer risk (smoking, alcohol use, weight, physical activity, sedentary behavior, sleep, age, sex, education, hormone replacement therapy, and menopausal status) with regard to the incidence of cancer; and (3) to test the mediating role of health behaviors (smoking, alcohol use, weight, physical activity, sedentary behavior, and sleep) in the relationship between psychosocial factors and the incidence of cancer. Methods The psychosocial factors and cancer incidence (PSY‐CA) consortium was established involving experts in the field of (psycho‐)oncology, methodology, and epidemiology. Using data collected in 18 cohorts (N = 617,355), a preplanned two‐stage individual participant data (IPD) meta‐analysis is proposed. Standardized analyses will be conducted on harmonized datasets for each cohort (stage 1), and meta‐analyses will be performed on the risk estimates (stage 2). Conclusion PSY‐CA aims to elucidate the relationship between psychosocial factors and cancer risk by addressing several shortcomings of prior meta‐analyses.


INTRODUCTION
Psychosocial factors such as depression, general distress, and low social support have long been theorized to increase cancer risk (Dalton et al., 2002). Findings from prior research studying the association between psychosocial factors and cancer are mixed. Two meta-analyses focusing on depression concluded that there was a small, potentially trivial, effect on cancer risk (McGee et al., 1994; Oerlemans et al., 2007). Another meta-analysis of the published literature indicated that depression (combined hazard ratio [HR] = 1.29), psychosocial factors relating to stress-prone personality or poor coping style (combined HR = 1.08), and psychosocial factors relating to emotional distress or poor quality of life (combined HR = 1.13) increased the risk for all cancer outcomes and, when collapsing psychosocial factors across subtypes, especially for lung cancer (combined HR = 1.23) (Chida et al., 2008). However, the included studies varied greatly in the psychosocial factors investigated and the cancer endpoint of interest. It is crucial to use clearly and specifically defined psychosocial factors as they can lead to distinct physiological and behavioral effects (O'Donovan et al., 2010). These effects may increase risk for specific cancers given their unique etiologies. Furthermore, published studies vary greatly in the confounders adjusted for (if any), making the reliability and interpretation of their outcomes debatable. Rather than relying on analyses determined by the original authors, a two-stage individual participant data (IPD) meta-analysis (re-)analyzes the original data of each cohort using a standardized approach (stage 1) before combining the results in a meta-analysis (stage 2) (Tierney et al., 2020).
IPD meta-analyses of cohorts have the potential to produce more reliable results than meta-analyses of published findings (Stewart & Parmar, 1993) as one can ensure a consistent definition of the psychosocial factors, specific cancer endpoints, and key confounders adjusted for across all included cohorts.
Evidence remains limited regarding how psychosocial factors increase cancer risk. Theory postulates several possible, potentially interrelated, pathways that link psychosocial factors and cancer, including angiogenesis, endocrine mechanisms, immunosuppression, impairments in DNA repair, and inflammation (Lutgendorf et al., 2007).
While health behaviors as a potential pathway between psychosocial factors and cancer have received little attention, they deserve consideration given the established relationship between health behaviors and psychosocial factors (Strine et al., 2008; Verger et al., 2009), and between health behaviors and cancer (Biswas et al., 2015; Chen et al., 2018; Kerr et al., 2017). To date, studies have most often considered health behaviors as confounders, rather than as playing a direct role in the association between psychosocial factors and cancer risk. If health behaviors explain the relationship between psychosocial factors and cancer risk, this may justify offering health behavior interventions in at-risk groups such as individuals who are depressed and smoke.
Psychosocial factors may also interact with factors related to cancer risk, such as smoking (Knekt et al., 1998), weight (Kerr et al., 2017), alcohol use (Pelucchi et al., 2011), physical activity (Kerr et al., 2017), sedentary behavior (Kerr et al., 2017), sleep duration and sleep quality (Hurley et al., 2015), menopausal status (Trichopoulos et al., 1972), hormone replacement therapy (Vecchia et al., 2001), age (Thakkar et al., 2014), sex (White et al., 2018), and education level (Mouw et al., 2008). For example, in one study, depressive symptoms increased the risk of colorectal cancer particularly in overweight women (Kroenke et al., 2005), and another study found that the effect of depressive symptoms on cancer risk was increased at higher levels of cigarette smoking (Linkins & Comstock, 1990). Studying interactions provides insight into the mechanisms leading to cancer development and also shows for which subgroups the association between psychosocial factors and cancer incidence is most prominent, and which subgroups could thus benefit most from preventive interventions.
Health behaviors may not only interact with psychosocial factors, but may also function as mediators situated in the pathway from psychosocial factors to the development of cancer. Symptoms of depression, for example, have been linked to smoking initiation and the amount of smoking (Steuber & Danner, 2006), increased alcohol use (Bulloch et al., 2012), weight gain, weight loss, and obesity (Blaine, 2008), and sleep problems (Benca et al., 1997), all of which have subsequently been associated with an increased cancer risk (Biswas et al., 2015; Chen et al., 2018; Kerr et al., 2017). While weight is not a health behavior in the strict sense, we refer to it as one given its association with several other health behaviors, specifically diet and physical activity. Despite numerous allusions to the potential mediating role of health behaviors in the relationship between psychosocial factors and cancer (Chida et al., 2008; Dalton et al., 2002), there are remarkably few studies in which this has been tested.
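The psychosocial factors and cancer incidence (PSY-CA) consortium was established to investigate whether a variety of psychosocial factors increase the risk of cancer. The investigated psychosocial factors include diagnosed depressive disorder and depressive symptoms (Jia et al., 2017) (henceforth referred to as depression), diagnosed anxiety disorder and anxiety symptoms (Chen et al., 2018) (henceforth referred to as anxiety), (recent) loss events (Dalton et al., 2002), perceived low social support (Idahl et al., 2018), relationship status (Randi et al., 2004), general distress (Peled et al., 2008), and neuroticism (Schapiro et al., 2001). PSY-CA aims (1) to test whether these psychosocial factors are associated with the incidence of cancer (any, breast, lung, prostate, colorectal, smoking-related, and alcohol-related); (2) to test the interaction between psychosocial factors and factors related to cancer risk (smoking, alcohol use, weight, physical activity, sedentary behavior, sleep, age, sex, education, hormone replacement therapy, and menopausal status) with regard to the incidence of cancer; and (3) to test the mediating role of health behaviors (smoking, alcohol use, weight, physical activity, sedentary behavior, and sleep) in the relationship between psychosocial factors and the incidence of cancer (see Figure 1). Specific hypotheses have been formulated (Appendix 1).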

Design overview
Preplanned two-stage IPD meta-analyses are performed. We apply the Maelstrom guidelines (Fortier et al., 2016) to create harmonized variables across the 18 cohorts. Data are analyzed in each cohort (stage 1) and the outputs are used in a meta-analysis (stage 2).

PSY-CA consortium
The consortium consists of the steering group (LvT, JD, AV, AdG, and AVR), three main researchers (LvT, MB, and K-YP), representatives from each participating cohort, and selected experts in the field of psycho-oncology, epidemiology, methodology, and statistics. Meetings are held at least twice a year, with the first formal consortium meeting having taken place in March 2019. During the meetings, consensus is reached on the objectives, approach, and interpretation of findings.
The project leader (JD), the steering group, and the representatives of the Dutch cohorts are responsible for the formal management of the study.

Preregistration
The PSY-CA study has been preregistered in PROSPERO:

Ethics
The ethics approval for PSY-CA was waived by the Medical Ethics

Cohorts
Cohorts were eligible to take part in PSY-CA if the following criteria were met:
1. a valid and reliable measure of depression, anxiety, recent loss events, social support, general distress, and/or neuroticism;
2. availability of an objective measure of cancer diagnosis during follow-up or the potential to get this information through, for example, linkage with a cancer registry;
3. availability of data regarding smoking, alcohol, sex, and age; and
4. a prospective study design (i.e., psychosocial factors were measured before cancer incidence).
Cohorts were not eligible if there was no information about a history of cancer at baseline. Initially, relatively objective social support (i.e., social network size) and hopelessness were also included in the first criterion; however, as most cohorts did not have a measure of these concepts, they were subsequently dropped. Objective social support was replaced with relationship status. One cohort, Prospect-EPIC (see Table 1), initially appeared to have information about depression and anxiety diagnoses through a psychiatric registry. However, on closer inspection, these data appeared to be incomplete. As relationship status was measured in Prospect-EPIC, the cohort remained included in the study.

Participants
Across all analyses, participants were excluded if they had a cancer diagnosis (based on [cancer] registry data or self-report) at baseline or in the past (including in situ carcinomas and neoplasms of undetermined behavior [i.e., benign/malignant status undetermined]), with the exception of nonmelanoma skin cancer. Participants who had refused linkage with an external registry were also excluded from any analysis. People with a cancer diagnosis within one year from baseline were excluded from the analysis.

Search strategy and eligible studies
In preparation for PSY-CA, a feasibility study was conducted to identify potential cohorts. During the feasibility study, coordinators of the candidate cohorts were contacted to check whether the inclusion and exclusion criteria were met, and to outline any other potential issues, for example related to costs or ethics. The included cohorts (11 cohorts in the Netherlands and seven cohorts in the United Kingdom, Norway, and Canada) are outlined in Table 1.
PSY-CA is set up in such a way that, after the project has finished, additional cohorts can be incorporated by applying the harmonization manual (outlined in data handling below) to the data and running the standardized analysis scripts.

Psychosocial factors
The relationship between the following seven psychosocial factors and cancer incidence is analyzed: depressive symptoms or clinical depression (i.e., major depressive disorder, dysthymia), anxiety symptoms or anxiety disorders (excluding specific phobias), recent loss events (defined specifically as the loss of an immediate family member or partner in the last 12 months), perceived social support, relationship status, general distress, and neuroticism.
Note (Table 1): In some cohorts a measurement wave other than baseline is used in PSY-CA due to the absence of a measure relating to one of the psychosocial factors outlined in the hypotheses. (a) This is before applying any exclusion criteria (e.g., a history of cancer) and based on baseline adult sample sizes. (b) Subcohorts are limited to those that are treated as subcohorts in the meta-analyses. For certain cohorts, subcohorts were combined where subsample sizes were too small otherwise (i.e., <1000) and combining resulted in minimal or no loss of data.

Harmonization and data cleaning
Data harmonization supports the quality and interpretability of the results. Harmonization of data across cohorts also enables the use of standardized scripts for stage 1 analyses (see statistical analysis below) requiring minimal user input. We apply the Maelstrom guidelines (Fortier et al., 2016) to create individual harmonization manuals for each cohort providing guidance on how to recode data to create the variables required for PSY-CA. Definitions of the variables to be derived are agreed upon within the consortium. Local researchers at each cohort harmonize the data and receive a script to run a number of basic checks (e.g., checking for proportion of missing data). The basic checks are then reviewed by two researchers (LvT and MB) as an additional check of adherence to the manuals.

Missing data
Previously defined, cohort-specific approaches to dealing with missing data are applied within a given measure (e.g., questionnaire). Where no such approach has previously been defined, the general rule applied is to substitute the person-mean for up to 20% missing responses for a measure. This rule is based on previous studies comparing ways to deal with item-level missing data, specifically in measures of depression (Bono et al., 2007; Shrive et al., 2006). Missing responses or responses equivalent to "I don't know" are coded as missing data. The only exception is the family history of cancer variables, where "I don't know" is coded as no family history of (the specific) cancer.
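The person-mean rule can be illustrated with a minimal Python sketch. This is not the consortium's harmonization code (the standardized scripts are written in R); the function name `person_mean_score`, the use of `None` for a missing item, and the choice to return a sum score are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def person_mean_score(
    responses: Sequence[Optional[float]], max_missing: float = 0.20
) -> Optional[float]:
    """Sum score with person-mean substitution (hypothetical helper).

    Missing items (None) are replaced by the mean of the participant's own
    answered items, but only if at most `max_missing` of the items are missing.
    Otherwise the whole score is treated as missing (None).
    """
    n_items = len(responses)
    answered = [r for r in responses if r is not None]
    n_missing = n_items - len(answered)
    if n_items == 0 or n_missing / n_items > max_missing:
        return None
    person_mean = sum(answered) / len(answered)
    # Each missing item contributes the person-mean to the sum score.
    return sum(answered) + n_missing * person_mean
```

For a 10-item questionnaire with one unanswered item, the missing item is filled with the mean of the nine answered items; with three unanswered items (30%), the score is set to missing.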

Unlikely values and extreme outliers
Local researchers harmonizing the cohort data are instructed to investigate extreme, unlikely values and recode these to missing if there is sufficient support that these are errors (e.g., a very high BMI that is markedly higher than the BMI reported at a follow-up wave for a given participant).
Across all cohorts, extreme outliers are defined as values that are more than three times the interquartile range above the third quartile or below the first quartile; these values are truncated to the respective cut-off. The exception to this rule is variables that contain true zeroes as the lowest possible score, as these are likely to be skewed (e.g., pack years, where all never smokers score zero). For these variables, only the upper extreme values are truncated. The number of cases that are capped and the replacement values are recorded for all cohorts and double-checked.
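A minimal Python sketch of this truncation rule follows. It is an illustration under stated assumptions, not the consortium's R script: the function name `truncate_extremes` and the `true_zero` flag are hypothetical, and the quartile definition (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the protocol does not specify one.

```python
import statistics
from typing import List, Sequence, Tuple


def truncate_extremes(
    values: Sequence[float], true_zero: bool = False, k: float = 3.0
) -> Tuple[List[float], int, Tuple[float, float]]:
    """Cap values lying more than k * IQR beyond the quartiles (hypothetical helper).

    Returns the capped values, the number of capped cases, and the
    (lower, upper) cut-offs so that both can be recorded per cohort.
    With true_zero=True only the upper tail is truncated.
    """
    # Quartile method is an assumption; other definitions shift the cut-offs slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    capped: List[float] = []
    n_capped = 0
    for v in values:
        if v > upper:
            capped.append(upper)
            n_capped += 1
        elif not true_zero and v < lower:
            capped.append(lower)
            n_capped += 1
        else:
            capped.append(v)
    return capped, n_capped, (lower, upper)
```

Returning the count and the cut-offs alongside the capped data mirrors the requirement that the number of capped cases and the replacement values are recorded.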

Statistical analysis
PSY-CA employs a two-stage design. In stage 1, local researchers at the cohort level run a standardized R script over the harmonized dataset, and subsequently provide all output generated to the main researchers (LvT, MB, and K-YP). In stage 2, the output from stage 1 across all cohorts is pooled in the meta-analyses. As such, the main researchers do not have direct access to cohort raw data. However, in the event that further clarification is required from specific cohort data, subsequent scripts are sent to the local researchers to gain additional information.
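To make the stage 2 pooling step concrete, the sketch below combines cohort-level log hazard ratios with inverse-variance weights. It assumes a fixed-effect model, which is what the power analysis section mentions; the model actually used in stage 2 may differ. The function name `pool_fixed_effect` and the example inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pool_fixed_effect(
    log_hrs: Sequence[float], ses: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling of cohort log hazard ratios.

    `log_hrs` and `ses` are the stage 1 estimates and their standard errors,
    one per cohort. Returns the pooled HR with its 95% confidence limits.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled),
        math.exp(pooled - z * pooled_se),
        math.exp(pooled + z * pooled_se),
    )


# Hypothetical stage 1 output from three cohorts (illustrative numbers only).
hr, lower, upper = pool_fixed_effect(
    [math.log(1.10), math.log(0.98), math.log(1.05)], [0.08, 0.05, 0.10]
)
```

Only the estimates and standard errors leave each cohort, which is consistent with the main researchers not having direct access to the raw data.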
Stage 1
For the analyses related to question one (relationship between psychosocial factors and cancer) and question two (interaction), Cox regression models are used. For question three (mediation), different regression models (logistic and multiple) are used to test the path between the psychosocial factor and the mediator, dependent on whether the mediator is categorical or continuous. Cox regression models are used for the path between the psychosocial factor and cancer, and between the mediator and cancer.
Across all research questions, entry age is the age at baseline, while exit age is the age at cancer incidence, death, or end of the cancer follow-up period of the respective cohort (whichever comes first). Note that several cohorts are ongoing but, for the purposes of PSY-CA, are capped to the moment of linkage with the cancer or vitality registry (whichever comes first). Where another type of cancer occurs (e.g., lung cancer) after the cancer endpoint being modeled (e.g., breast cancer), participants are censored at the age of first diagnosis (Ji et al., 2020).
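The derivation of exit age and event status can be sketched as follows. This is a simplified illustration rather than the standardized R script: the function and argument names are hypothetical, all ages are assumed to be in years, and the handling of other cancer types is left out.

```python
from typing import Optional, Tuple


def follow_up(
    entry_age: float,
    end_of_follow_up_age: float,
    cancer_age: Optional[float] = None,
    death_age: Optional[float] = None,
) -> Tuple[float, bool]:
    """Return (exit_age, event) for one participant (hypothetical helper).

    Exit age is the earliest of cancer diagnosis, death, and end of follow-up.
    The event flag is True only if the cancer diagnosis is what ends follow-up.
    """
    candidates = [(end_of_follow_up_age, False)]
    if death_age is not None:
        candidates.append((death_age, False))
    if cancer_age is not None:
        candidates.append((cancer_age, True))
    # On ties, a diagnosis takes precedence over censoring (an assumption).
    exit_age, event = min(candidates, key=lambda c: (c[0], not c[1]))
    if exit_age < entry_age:
        raise ValueError("exit age precedes entry age")
    return exit_age, event
```

The resulting (entry age, exit age, event) triple is the kind of input a Cox model with age as the time scale would take.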
For the first two research questions, the following models are run: Model 1 (univariable), which includes the year of birth and the psychosocial factor.
Finally, moderators of effect size are explored, including when the cohort started and whether the cohort took place in the Netherlands or not.

Power analysis
To test the power of our IPD meta-analysis, we ran a power simulation study similar to that of Ensor et al. (2018), with a focus on depression.
The study information that we used for the simulation study involved the total number of participants, the prevalence of depression at the baseline measurement of the study (estimated where this was not yet known), and the expected number of cancer cases of two types of cancer: lung cancer and any cancer (anticipated smallest and largest categories, respectively). Requiring 80% power, an alpha of 0.05 (two-sided testing), and using fixed-effects meta-analysis of HRs from Cox regression models, our simulation study showed that, regarding main effects, the minimal detectable effect size is HR = 1.04 for any cancer and HR = 1.12 for lung cancer.
Regarding the interaction analyses, our calculations (based on the general convention that the sample size in a single trial should be increased approximately four times to detect the interaction effect [Brookes et al., 2004; McClelland & Judd, 1993]) showed that the minimal detectable effect size is HR = 1.08 for any cancer and HR = 1.25 for lung cancer. Regarding the mediation analyses, our calculations (based on an inflation factor of two, compared to testing the main effects) showed that the corresponding minimal detectable effect size is HR = 1.06 for any cancer and HR = 1.18 for lung cancer.
The inflation factor of two for mediation analysis was found as an upper bound by comparing sample sizes needed for main effects with those for mediation effects using Baron and Kenny's test assuming different effect sizes (Fritz & MacKinnon, 2007).
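The relation between these inflation factors and the reported effect sizes can be approximated with a short calculation. The sketch assumes that the standard error of a log HR scales with one over the square root of the sample size, so that a sample-size inflation factor f multiplies the minimal detectable log HR by the square root of f. This scaling is our reading of the calculation, not a formula stated in the protocol, and `scaled_mde` is a hypothetical name.

```python
import math


def scaled_mde(main_hr: float, inflation: float) -> float:
    """Minimal detectable HR after applying a sample-size inflation factor.

    Assumes the detectable log HR grows with sqrt(inflation) at a fixed
    sample size (an assumption, see the lead-in).
    """
    return math.exp(math.log(main_hr) * math.sqrt(inflation))


interaction_any = scaled_mde(1.04, 4)   # about 1.08
interaction_lung = scaled_mde(1.12, 4)  # about 1.25
mediation_any = scaled_mde(1.04, 2)     # about 1.06
mediation_lung = scaled_mde(1.12, 2)    # about 1.17
```

Three of the four values match the reported ones after rounding; the mediation value for lung cancer comes out slightly below the reported 1.18, possibly because the main-effect HRs used here are themselves rounded.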

Interpretation
The hypotheses tested include four psychosocial factors (depression [symptoms], anxiety [symptoms], recent loss events, and perceived social support), four health behaviors (smoking, alcohol, physical activity, and weight), and seven cancer outcomes (see Appendix 1). It is important to specify that the interpretation of the results is done holistically, and not based on a single association (i.e., "cherry picking").
Through triangulation of the evidence from the different analyses, we conclude whether there is statistical support for an association between psychosocial factors and cancer, and by extension whether there is evidence for interaction with or mediation by health behaviors. Interpretation is done by looking at the obtained HRs (or beta-coefficients) and 95% confidence intervals (CIs) and by exploring consistency and robustness of the findings. Additionally, the associations between a number of further psychosocial factors, other health and demographic factors, and cancer are studied. The results of these additional analyses are considered to be exploratory. Subgroup and sensitivity analyses are considered to be exploratory as well.

DISCUSSION
Previous meta-analyses investigating the role of psychosocial factors in cancer incidence have shown mixed findings (Chida et al., 2008; McGee et al., 1994; Oerlemans et al., 2007). While this is partly explained by differences in the types of psychosocial factors and cancer endpoints, many studies in these meta-analyses pose further limitations.
Given the link of health behaviors such as smoking with both psychosocial factors (Strine et al., 2008; Verger et al., 2009) and cancer (Biswas et al., 2015; Chen et al., 2018; Kerr et al., 2017), there is a need to clarify whether the role of health behaviors is more than a confounding effect. Health behaviors and demographic and somatic factors that are well-established cancer risk factors may interact with psychosocial factors to pose further risk. Furthermore, health behaviors could explain the link between psychosocial factors and cancer (i.e., health behaviors as mediators). Research into the role of health behaviors in the association between psychosocial factors and cancer is surprisingly lacking, and PSY-CA aims to provide insight into this area. As such, the results from the proposed study outlined in this article may reveal psychosocial factors that put individuals at risk for cancer, identify certain subgroups to target with preventive interventions, and support the use of health-behavior interventions to reduce the risk of cancer associated with psychosocial factors.

ACKNOWLEDGMENT
The PSY-CA consortium is supported by funding from the Dutch Cancer Society (VU2017-8288).

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

PEER REVIEW
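The peer review history for this article is available at https://publons.

APPENDIX 1
Research question 1: Are psychosocial factors (depression, anxiety, recent loss events, perceived social support, relationship status, general distress, and neuroticism) associated with the incidence of cancer? Specifically, we hypothesize that depression, anxiety, recent loss events, and perceived lack of social support all individually increase the incidence of any cancer, breast cancer, lung cancer, prostate cancer, colorectal cancer, smoking-related cancers, and alcohol-related cancers.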
We limit our hypotheses to depression, anxiety, recent loss events, and perceived low social support given the rather clear distinction between these concepts (whereas neuroticism and general distress are relatively broad constructs incorporating symptoms of both depression and anxiety), and the focus on these factors in prior research (e.g., relatively little research has looked at relationship status and cancer incidence). Therefore, analyses relating to neuroticism, general distress, and relationship status are considered exploratory.
Research question 2: Do these psychosocial factors interact with health behaviors (smoking, alcohol use, weight, physical activity, sedentary behavior, sleep duration, and sleep quality) or demographic and clinical factors (age, sex, education, hormone replacement therapy, and menopausal status) on the risk of cancer incidence? Specifically, we hypothesize that the risk of cancer in people with psychosocial stress (i.e., elevated depression symptom level or diagnosis, elevated anxiety symptom level or diagnosis, a recent loss event, or perceived lack of social support) and unhealthy behavior (smoking, alcohol use, overweight, and low physical activity) is greater than the sum of the individual effects of the psychosocial factor and unhealthy behavior on cancer incidence. We limit our hypotheses to these health-related behaviors given the consistent evidence of their association with cancer.
Research question 3: Are the relationships between these psychosocial factors and incidence of cancer mediated by health-related factors (smoking, alcohol use, weight, physical activity, sedentary behavior, sleep duration, and sleep quality)? Again limiting the hypotheses to depression, anxiety, recent loss events, and perceived low social support, we hypothesize that (a) smoking, alcohol use, physical inactivity, and high body mass index (BMI) partially mediate the association between the psychosocial factors and cancer of any kind, breast cancer, and colorectal cancer; (b) smoking partially mediates the association between the psychosocial factors and smoking-related cancers; (c) alcohol use partially mediates the association between the psychosocial factors and alcohol-related cancers; (d) smoking and physical inactivity partially mediate the association between the psychosocial factors and lung cancer; and (e) physical inactivity partially mediates the association between the psychosocial factors and prostate cancer.
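The hypothesis under research question 2, that the joint risk exceeds the sum of the individual effects, describes interaction on the additive scale. One common way to quantify this is the relative excess risk due to interaction (RERI). The sketch below is an illustration only: the protocol does not state that RERI is the measure used, and the HR values are hypothetical.

```python
def reri(hr_both: float, hr_psych_only: float, hr_behavior_only: float) -> float:
    """Relative excess risk due to interaction on the additive scale.

    All HRs are relative to the group with neither the psychosocial factor
    nor the unhealthy behavior. A value above 0 means the joint effect is
    greater than the sum of the two individual effects.
    """
    return hr_both - hr_psych_only - hr_behavior_only + 1.0


# Hypothetical example: depression only HR = 1.2, smoking only HR = 1.5,
# both HR = 2.0. The excess over additivity is about 0.3.
excess = reri(2.0, 1.2, 1.5)
```

With a joint HR of 1.7 in the same example, the excess would be about zero, that is, exactly additive effects.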