Quantitative methods for descriptive intersectional analysis with binary health outcomes

Intersectionality recognizes that in the context of sociohistorically shaped structural power relations, an individual's multiple social positions or identities (e.g., gender, ethnicity) can interact to affect health-related outcomes. Despite limited methodological guidance, intersectionality frameworks have increasingly been incorporated into epidemiological studies, both to describe health disparities and to examine their causes. This study aimed to advance methods in intersectional estimation of binary outcomes in descriptive health disparities research through evaluation of 7 potentially intersectional data analysis methods: cross-classification, regression with interactions, multilevel analysis of individual heterogeneity (MAIHDA), and decision trees (CART, CTree, CHAID, random forest). Accuracy of estimated intersection-specific outcome prevalence was evaluated across 192 intersections using simulated data scenarios. For comparison we included a non-intersectional main effects regression. We additionally assessed variable selection performance amongst decision trees. Example analyses using National Health and Nutrition Examination Study data illustrated differences in results between methods. At larger sample sizes, all methods except for CART performed better than non-intersectional main effects regression. In smaller samples, MAIHDA was the most accurate method but showed no advantage over main effects regression, while random forest, cross-classification, and saturated regression were the least accurate, and CTree and CHAID performed moderately well. CART performed poorly for estimation and variable selection. Sensitivity analyses examining the bias-variance tradeoff suggest MAIHDA as the preferred unbiased method for accurate estimation of high-dimensional intersections at smaller sample sizes. Larger sample sizes are more imperative for other methods. Results support the adoption of an intersectional approach to descriptive epidemiology.


Introduction
Intersectionality acknowledges that in the context of sociohistorically shaped structural power relations, an individual's multiple social positions or identities (e.g., gender, ethnicity) can interact to affect health-related outcomes (Collins, 2002;Crenshaw, 1989). Since the term intersectionality was first used academically to describe unjust legal processes for Black women (Crenshaw, 1989), this theoretical framework has traversed disciplines and been extended beyond gender and race to other identities or positions that reflect social power structures (e.g., income, age, sexuality, disability) (Bauer et al., 2021;Cho, Crenshaw, & McCall, 2013). Intersectionality can serve as a framework in incorporating social context into epidemiological research, informing conceptualization of research questions, sampling, study design, analysis, and interpretation of results (Agènor, 2020;Bauer, 2014;Bowleg, 2012).
Intercategorical intersectional analyses describe intersectional groups, and differences between them (McCall, 2005). Calculating health outcomes for intersections (defined by a combination of social positions), rather than by combining effects estimated for each position separately, can create more accurate estimates ([Authors' Names Redacted], n.d.). Within descriptive epidemiology, analyses using large ABBREVIATIONS: CART, classification and regression tree; CTree, conditional inference trees; CHAID, chi-square automatic interaction detector; VIM, variable importance measure; MAIHDA, multilevel analysis of individual heterogeneity and discriminatory accuracy; NHANES, National Health and Nutrition Examination Study; SD, standard deviation; MAD, mean absolute deviation; U.S., United States. population datasets can explore outcomes for high-dimensional intersections (i.e., crossing four or five social positions), including under-studied intersections (Bauer, 2014). We classify as descriptive intersectional analysis methods those able to independently estimate outcomes for co-formed social intersections, by not assuming (as in main-effects regression) that effects of individual social positions are constant across intersections ([Authors' Names Redacted], n.d.). We use the term "intersectional methods" with the understanding that methods themselves do not make research intersectional. Regression with interaction terms and cross-classification are the most common conventional analysis methods for descriptive intersectional research (Bauer et al., 2021).
Many health outcomes are binary, with a wide range of prevalences, creating challenges when studying high-dimensional intersections. Lowprevalence outcomes are prone to influence by outliers and require large samples to produce suitable numbers of events, problems exacerbated by increasing numbers of intersectional subgroups. For conventional regression analyses with binary outcomes, inclusion of higher-level interaction terms reduces the probability of model convergence and increases the potential for both high variance in estimates leading to wide confidence intervals and for bias away from the null (Greenland, Mansournia, & Altman, 2016;Peduzzi, Concato, Kemper, Holford, & Feinstein, 1996). These issues result in researchers resorting to non-intersectional main effects approaches or limiting the number of intersections under study.
This motivates the evaluation of currently used alternative methods for intersectional analyses of binary outcomes (Bauer et al., 2021). Classification and regression trees (CART), conditional inference trees (CTree), and chi-square automatic interaction detector (CHAID), are data-driven non-parametric methods that apply decision rules to partition data into a single final decision tree, can incorporate any level of interaction, and can identify subgroups for further study or intervention, but do not produce estimates of effect size or variance (Breiman, Friedman, Stone, & Olshen, 1984;Hothorn, Hornik, & Zeileis, 2006;Kass, 1980). Decision trees can also be useful for variable selection, to reduce a list of variables to those most likely to be split on ([Authors' Names Redacted], n.d.). Random forest aggregates multiple decision trees formed from bootstrapped samples to reduce overfitting (Banerjee, Reynolds, Andersson, & Nallamothu, 2019); it produces no single decision tree or subgroup visualization, but rather a variable importance measure (VIM). Multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA) is a multilevel method which uses individual-level data (Evans, Williams, Onnela, & Subramanian, 2018), with fixed effects for each social position variable and a random intercept for each intersection. MAIHDA uses weighted random intercepts that can reduce overfitting, but appropriateness of fixed and random effects for intersectional interpretation is contested (Evans, Leckie, & Merlo, 2020;Lizotte, Mahendran, Churchill, & Bauer, 2020).
We previously evaluated intersection-specific estimation accuracy of these conventional and alternative methods (except CHAID) for continuous outcomes, using simulated data ([Authors' Names Redacted], n.d.). We found random forest,MAIHDA, and CTree to be more accurate than other methods at smaller sample sizes, while at large sample sizes all methods performed similarly for estimation except for CART, which produced less accurate estimates at both large and small sample sizes ([Authors' Names Redacted], n.d.). However, method performance and implementation may differ depending on outcome variable type. For example, CHAID analyses require categorical outcomes (Kass, 1980). The objective of this study was thus to evaluate seven methods (regression with interaction terms, cross-classification, MAIHDA, CART, CTree, random forest, and CHAID) for accuracy in intersection-specific prevalence estimation, alongside a non-intersectional main effects regression and a perfectly specified (but impracticable) regression model, using a variety of simulated but realistic data scenarios. We then sought to demonstrate analyses using National Health and Nutrition Examination Survey (NHANES) 2015 to 2018 data on high blood pressure. Finally, the decision tree methods were assessed for variable selection performance. While epidemiological studies often focus on significance testing and estimation of effect sizes or interactions, accurate outcome estimation for subpopulation groups is an important objective for population health. Therefore, we focus on improving estimation. This study is an opportunity to reexamine how we approach descriptive epidemiology for binary outcomes from both a theoretical and statistical standpoint. While it originated in concerns regarding intersectionality methods, results are relevant to all epidemiological applications that face similar statistical challenges in exploring heterogeneity in binary outcomes.

Simulation process
A rare outcome was simulated with an average prevalence of 3% (range: 1.35%-5.60%). A common outcome was simulated with an average prevalence of 15% (range: 7.15%-28.58%). Both outcome types were created with a set of categorical inputs and a set of mixed inputs, resulting in four possible models. Each model was iterated 1000 times for four different sample sizes (N=2000, 5000, 50,000, 200,000). This Each simulated model resulted in 192 intersections, (4*2*2*2*2*3=192).

Mixed
Coefficients were sampled so that outcome probability was less than or equal to 1.
The intercept had a value of − 3 for the rare outcome, and − 1.5 for the common outcome. X1 and X6 were continuous in the mixed inputs model, and categorical (four and three categories respectively) in the categorical inputs model; other variables were binary. Input variable structure is described in Table 1. The input variables resulted in 192 possible intersections of varying sizes. The set of effect sizes for variables X1 to X5 and the interaction terms differed with each iteration. For the rare outcome, effects sizes were selected from a truncated normal distribution (SD=0.30) between 1.24 and 1.80, or 0.20 to 0.76, and for the common outcome between 1.11 and 1.80, or 0.20 to 0.89, on the relative risk scale. X6 was simulated to have no effect on the outcome. Simulation code is provided online (https://github.com/m-mahendr an/methods_for_intersectionality_simulation_binary_outcomes).
Analyses were conducted in R version 3.6.1 ("R Foundation for Statistical Computing. R: A language and environment for statistical computing," 2021).

Statistical estimation approaches
For cross-classification, the prevalence of the outcome within each intersection was calculated with no further statistical adjustment. Correctly-specified regression included only the lower-level coefficients (the intercept and variables X1 to X6) and the interaction terms modeled into the simulated data (X1*X2, X3*X4, X4*X5, X3*X5, and X3*X4*X5).
Saturated regression included all possible lower-level and interaction terms, and represents a more real-world application wherein the underlying data structure is unknown. Note that interaction terms were for improving overall estimation of intersection-specific outcome prevalence, rather than inference regarding the size or statistical significance of interaction effects. Finally, main effects regression was included as a non-intersectional method for comparison. Poisson regression, modified to use robust variance estimation, was used for all single-level regressions, because it produces risk ratio estimates for both rare and common binary outcomes (Zou, 2004). Analyses were run using the R-core function "glm", and packages "lmtest" and "sandwich" to produce robust variance estimates using the sandwich estimator (Zeileis & Hothorn, 2002;Zeileis, 2006). MAIHDA models were run using multilevel logistic regression, using the R-package "lme4" (Bates, Mächler,  Bolker, & Walker, 2015). Following MAIHDA modeling practices, fixed effects were assigned to each of the simulated coefficients (X1 to X6), and a random intercept was assigned to each intersection (Evans et al., 2018). While MAIHDA models are typically run using a Bayesian analysis with uninformative priors, we ran a frequentist analysis due to simulation time constraints. Previous simulation has found the main effects estimates to be comparable between the two approaches ([Authors' Names Redacted], n.d.); other parameters are expected to have similar estimates but have not been exhaustively compared.

Machine learning estimation approaches
CART, CTree, CHAID, and random forest models were created with the following R-packages, respectively: "rpart" (Thernau, Atkinson, & Ripley, n.d.), "partykit" (Hothorn et al., 2006), "CHAID" (The FoRt Student Project Team & Hothorn, n.d.), and 'tuneRanger" (Probst, Wright, & Boulesteix, 2019). CART splitting criterion was based on the Gini rule, with ten-fold cross-validation to select the complexity parameter with minimal cross-validation error. CTree and CHAID models were created using an alpha of 0.05. As CHAID is based on chi-squared analysis, it could only be applied with categorical inputs. Default minimum node size to split for CART, CTree, and CHAID was 20. Random forest models were built with 500 trees, tuned using the parameter mtry by a step factor of 1, and the default minimum final node size was 1. Splitting criterion was based on decreases in node impurity (defined by the Gini index) (Wright & Ziegler, 2017). The VIM was assessed using two measures: impurity-based, which only produces estimates, and permutation-based, which produces estimates and p-values (Altmann, Toloşi, Sander, & Lengauer, 2010).

Primary objective: estimation accuracy
Estimation accuracy was assessed using the mean absolute deviation/mean ratio (MAD). The MAD for each method was calculated as, such that n was 192 representing 192 possible intersections, P i was the estimated prevalence of the outcome Y=1 for intersection i, P i was the true prevalence of the outcome Y=1 for intersection i, and p was the prevalence of the outcome in the entire sample. The true prevalence of the outcome was known by using the outcome generation formula, but not running it through the binary sampling function. Because estimations were for each intersection rather than each individual, accuracy is measured at the subpopulation (intersection) level, rather than the individual level. This emphasizes equal performance across intersections, rather than favouring better performance for more-populated intersections. A MAD of 0 is only achieved by perfect estimation of the prevalence within each intersectional group.

Secondary objective: assessment of variable selection
In population health analyses, even those rooted in theory, there may be more social identity/position variables than can be incorporated into an intersectional analysis. While decision trees do not produce traditional outputs such as variance estimates, they can assist in variable selection to identify relevant variables to include in subsequent analyses that do ([Authors' Names Redacted], n.d.). Variable selection in this setting is for data analysis planning, not to determine significance or strength of a variable on an outcome. For CART, CTree, and CHAID, variable selection was assessed by the percentage of simulation replicates where each variable (X1 to X6) was used as a splitting variable. Random forest variable selection was assessed by the average impurity-based VIM for each variable, and the percent of iterations that the variable had p<0.05 for the permutation-based VIM. Permutation-based VIM was only assessed for 200 iterations, due to computational time constraints. For our simulated data, decision trees that perform well for variable selection would detect X6 as least relevant.

Sensitivity analysis
We conducted two post-hoc sensitivity analyses assessing the performance of MAIHDA, main effects regression, correctly-specified regression, and cross-classification, at sample sizes 2000 and 5000. First, we simulated four scenarios with a higher outcome prevalence of approximately 50%: categorical inputs, mixed inputs, categorical inputs with larger effect sizes only for interaction effects, mixed inputs with larger effect sizes only for the interaction effects. Second, we assessed the bias and variance of intersection-specific prevalence estimates, using one simulation scenario for the rare, common, and 50% prevalence outcomes with categorical inputs, with one set of effect sizes for each scenario, iterated 1000 times. Bias (the expected difference between modeled estimates and the true parameter value) and variance were estimated for each of the 192 intersections, and results present the Table 3 Splitting percentage (% of 1000 iterations) for each variable.

Example NHANES analysis
We used NHANES data to demonstrate and compare differences in results between the methods. Using a multistage probability sample, NHANES is designed to represent the U.S. non-institutionalized population (Chen, Clark, Riddles, Mohadjer, & Fakhouri, 2020). Sixty intersections were formed from sex/gender, race/ethnicity, age, and poverty (income below the US federal poverty line, income above the poverty line). The high blood pressure outcome was defined as systolic blood pressure ≥130 mmHg and/or a diastolic blood pressure ≥80 mmHg (Whelton et al., 2018), each measured by averaging a maximum of three readings. After removal of missing data, final sample size was N=9576. The prevalence of high blood pressure was 41.2%. Analyses included comparing intersection-specific estimations across methods, and comparing final decision tree outputs for variable selection. Other methods-specific outputs are presented in Web Appendix 2.

Convergence
We considered the feasibility of the different analysis methods for binary outcomes. While other analyses ran smoothly, the saturated regression did not always converge at smaller sample sizes (Table 2). Results for saturated regression models are thus only from converged models. Fig. 1 presents the distribution of intersection-specific estimation MAD over 1000 iterations for each of the four scenarios and four sample sizes in the simulated data. In large samples, intersectional methods generally performed well and had higher accuracy than the misspecified main effect regression. Exceptions were CART, which performed poorly across sample sizes and scenarios, and random forest under just one data scenario. In smaller samples the mis-specified main effects analysis performed better than all other methods except MAIHDA, which performed similarly, and correctly-specified regression for models with common prevalence outcomes. MAIHDA and the implausible correctly-specified regression were the best intersectional methods for all four scenarios at small sample sizes, followed by CTree and CHAID (when applicable). Fig. 2 presents estimated high blood pressure prevalences for each intersection, by each method, using real-world NHANES data. Choice in methods impacted final estimated prevalences. For example, for white female respondents aged 18 to 39 with poverty-level income, the estimated prevalence varied from 7% to 20%. Comparing the two bestperforming methods at smaller samples from the simulation, MAIHDA and main effects, the estimated prevalences were also different. For example, among Black male respondents age 60+ with non-poverty income, main effects estimated 76.5% while MAIHDA estimated 65.3%. For female Hispanic respondents age 60+with poverty-level income, main effects estimated 50.5% while MAIHDA estimated 60.3%. Table 3 presents the splitting percentages for the CART, CTree, and CHAID models for the four simulation scenarios. Across all scenarios and sample sizes, CART analysis produced almost no splitting. Therefore, for many iterations estimations were based on only the sample population prevalence, and were equal across all intersections. For CHAID and CTree, splitting on X1 to X5 increased with increasing sample size, and reached 100% in larger samples. X6 splitting frequency was lower than for X1 to X5, but also increased with sample size. The only difference between CTree and CHAID was that the splitting percentages for all variables were slightly higher for CHAID, starting at N=2000. Table 4 presents the average random forest impurity-based VIM's. For models with categorical inputs, X6 was only the least important at larger sample sizes. For the mixed input models where X6 was continuous, X6 was the second most important variable, after X1, even at the largest sample size. Table 5 presents variable selection performance of the permutationbased VIM. A cut off of P < 0.05 is effective at maintaining a low detection of false positives in the mixed inputs scenario, where X6 is continuous, but not in the categorical inputs scenario, where X6 is a three-category variable. Table 6 presents NHANES variable selection results. CART split on fewer variables (i.e., identified fewer as relevant) and resulted in fewer final subgroups than CTree or CHAID. Decision trees visualizations are presented in Web Appendix 2. For random forest models, using the impurity-based measure, age was the most important estimator by a wide margin, followed by gender and race/ethnicity, with income as the least important. Using the permutation-based measure of importance all variables except income were statistically significant at P < 0.05.

Sensitivity analysis
Sensitivity analyses present more detailed information on small samples and the performance of main effects analysis. Results from the first set of simulation analyses evaluating method estimation accuracy at 50% outcome prevalence show correctly-specified regression as equivalent or slightly more accurate than main effects regression, and MAIHDA in some scenarios performing better than main effects, correctly-specified regression, and cross-classification (Fig. 3A-D). The second set of analyses evaluating bias and variance of estimates for each intersection (Table 7) shows main effects regression and MAIHDA to generally have smaller variance than the correctly-specified regression or cross-classification, but larger bias. Between main effects regression and MAIHDA, bias and variance of estimates were quite similar, but at the highest outcome prevalence MAIHDA estimates appear more likely to reduce in bias with the increase in sample size between N=2000 and N=5000, while main effects estimates had greater reductions in variance.

Discussion
Challenges in assessing binary outcomes are amplified when We modeled 192 simulated intersections, a number typical of published studies using decision tree or MAIHDA methods; researchers studying fewer intersections may want to adjust our "larger" and "smaller" samples accordingly. For example, N=50,000 corresponded to a mean intersection size of N=260, while N=5000 corresponded to 26. Our small-sample analyses intentionally pushed limits for demonstration purposes, and a saturated regression model assessing 192 intersections at N=2000 is not a reasonable expectation.
In larger samples, all intersectional methods but CART were accurate estimators and outperformed non-intersectional main effects. However, at small sample sizes the main effects analysis performed better as an estimator than most methods, except for MAIHDA. Additionally, in smaller samples the traditional methods of cross-classification and saturated regression were poor estimators, and saturated regression was often not feasible given convergence issues.
Our sensitivity analysis assessing the bias and variance of the intersection estimates aimed to explain why at small sample sizes almost all the intersectional methods performed worse than the main effects approach, which given our data scenarios is mis-specified and incompatible with intersectionality. The overall estimation accuracy of a method is attributable to both the bias and variance of the estimates (Geurts, 2009). A main effects analysis's inability to account for variation between intersections was reflected in the bias of the estimates, which was greater for main effects than for a correctly-specified regression or cross-classification. However, at smaller sample sizes the variance of the main effects estimates was much smaller compared to cross-classification or regression with interaction terms, as it estimates fewer effects and is thus less prone to extreme values. The bias-to-variance tradeoff in this situation results in the biased main effects regression often performing equally or better than other methods able to account for intersectionality.
MAIHDA was the only applicable intersectional method (excluding correctly-specified regression as it is not a realistic option) that performed equivalent to or better than a main effects analysis, at smaller and larger sample sizes. Additionally, the two methods were similar in bias-to-variance tradeoff. We suggest that this is because MAIHDA models are comprised of a main effects regression (although main effects are determined differently than for a single-level regression), with additional residual estimates to account for effects within each intersection (Evans et al., 2018). The residuals are weighted so their magnitude is smaller if the intersection sample size is smaller, reducing the impact of the residual on the overall estimate and reducing variance caused by small intersections. For our smallest sample sizes, the residuals are heavily down-weighted, resulting in MAIHDA and main effects regression producing similar estimates. This down-weighting, also called shrinkage, was characterized by Bell, Holman, and Jones (2019) and protects MAIHDA from identifying intersectional effects when evidence (data or signal) is limited.
MAIHDA accuracy surpasses main effects at larger and more plausible sample sizes, with decreased number of intersections, or with a higher outcome prevalence, due to reductions in estimation bias compared to main effects regression. Our NHANES analysis was a realworld example with a high prevalence outcome, assessing a realistic number of intersections given the sample size, and demonstrated the non-equivalence of MAIHDA and main effects regression results. Intersection-specific prevalence estimates differed by up to 10% between MAIHDA and main effects. We recommend MAIHDA for estimating high-dimensional intersectional outcomes, especially when the sample size is small relative to the number of intersections such as in the NHANES example. It theoretically accounts for intersectionality (resulting in reduced bias with increasing sample size) and estimates fewer parameters (resulting in a low variance of estimates even at smaller sample sizes).
Previous studies assessing classification by random forest have concluded that it performs similarly or more accurately than logistic regression, CART, and CHAID (Caruana & Niculescu-Mizil, 2006;Kanerva, Kontto, Erkkola, Nevalainen, & Männistö, 2018;Maroco et al., 2011). However our random forest models produced less accurate intersection-specific estimates, especially in smaller samples. The typical application of random forest, or any decision tree method, may involve hundreds of input variables, and this is when these methods are most advantageous over conventional methods. Even with a large number of intersections, our inclusion of a relatively small number of  variables potentially reduced the benefits of a random forest. Additionally, in our data scenarios CART performed poorly. The random forests algorithm used CART to analyze each bootstrapped subsample. Random forests formed using CTree instead may have improved performance (Hothorn et al., 2006). In both the simulation and NHANES example, CART under-identified relevant variables representing heterogeneities in the data. These results are important given CART's use in intersectionality research as the decision tree method of choice (Cairney et al., 2014;Greene et al., 2019;Villanti et al., 2018). For CTree and CHAID, detection of relevant variables was low at small sample sizes and improved (but also had increasing false positive detection) with increasing sample size. Pruning of CTree or CHAID with the alpha criterion may mitigate Type 1 error issues. For random forest, permutation-based VIM performs well for mixed inputs, as it is less impacted by the bias in the Gini index, which favors splitting on continuous or multicategorical variables (Strobl, Boulesteix, Zeileis, & Hothorn, 2007). The impurity-based VIM better identifies relevant variables for categorical inputs, under conditions of a larger sample sizes and a common outcome. A permutation-based VIM can be used for a set of categorical inputs across outcome frequencies and sample sizes, and relevant variables will be identified as significant more often than the non-relevant variables. However, a permutation-based VIM in these scenarios does run the risk of a high false-positive detection.
To note, variable relevance as referenced in this study is purely quantitative. Our prior study detailed how sequential variable selection by decision tree methods may be a useful tool for variable selection, to narrow down a list of potential variables to be used in other analyses such as MAIHDA or regression with interactions ([Authors' Names Redacted], n.d.). However, a social position may have real-life impacts and not be detected by data-driven methods. For example, across CTree, CHAID, and random forest, variable relevance was less detectable for rare outcomes, especially at smaller sample sizes. In practice, variable selection for an intersectional approach should consider 1) capabilities of the dataset (for which variables there is enough information) and 2) existing research or community knowledge regarding the possible social structures and powers that would impact the outcome. Tools such as decision trees can provide additional decision support if needed. Table 8 presents recommendations, as well as quantitative outputs unaddressed by this study. We also present certain methods-specific outputs from the NHANES analysis in Web Appendix 2. We caution careful consideration of the theoretical match to intersectionality of these other outputs. For example, researchers using any method that produces effect estimates must be careful not to focus on significance testing to "prove" intersectionality as a statistical hypothesis, rather than using it as an informative framework (Bowleg, 2012). For decision trees, subgroups formed may be irrelevant to policy or practice if they do not represent reachable real-world groups. Cut-off values and final subgroups are subject to the instability of single decision trees (Li & Belford, 2002), and thus should not be seen as definitive, but rather corroborated with analyses using other datasets (Kreatsoulas & Subramanian, 2018), in addition to existing literature and community knowledge. We strongly recommend presenting measures of variance when possible, both to understand the precision of outcome estimates and to avoid over-emphasis of differences without consideration of within-intersection variation (e.g., via measures of discriminatory accuracy) (Merlo, 2018). The decision tree methods assessed in this study do not inherently produce variance estimates, which indicates an important limitation of these methods.
This study compared multiple methods for describing highdimensional intersections with binary outcomes, and found MAIHDA most accurate for intersection estimation, especially at smaller sample sizes. We acknowledge that our results may be limited by the simulation data generation process. Additionally, alternative applications of these methods, such as penalized regression with interactions, may improve their performance for estimation or variable selection (Rahman & Sultana, 2017). Future studies may also assess how incorporating survey weights affects the validity of estimation and variable selection. While this study's focus was quantitative performance, we remind researchers that application of intersectionality is more than a methodological choice, but an approach to research process, design, and interpretation. Ultimately, less-conventional methods can allow for better study of high-dimensional intersections, and broaden the possibilities to incorporate intersectional frameworks in epidemiological research, and ultimately to improve health.

Funding sources
This work was supported by an Ontario Graduate Student Scholarship to MM and by a Canadian Institutes of Health Research Sex and Gender Science Chair to GB [GSB-171372].

Ethics statement
Research ethics approval was not required for this study, which uses only simulated and publicly downloadable data.

Declaration of competing interest
The authors declare no conflicts of interest.