Using Directed Acyclic Graphs (DAGs) to Determine if the Total Causal Effect of an Individual Randomized Physical Activity-Promoting Intervention is Identifiable

ABSTRACT Physical activity promotion is a best buy for public health because it has the potential to help individuals feel better, sleep better, and perform daily tasks more easily, in addition to providing disease prevention benefits. There is strong evidence that individual-level theory-based behavioral interventions are effective for increasing physical activity levels in adult populations but causal inference from these interventions often is unclearly articulated. A directed acyclic graph (DAG) can be, but rarely is, used to determine if the causal effect of an individual-level theory-based physical activity-promoting intervention is identifiable (e.g. stripped of any spurious association). The primary objective of the current study was to demonstrate how a DAG can be used to determine if the total causal effect of an individual randomized physical activity-promoting intervention is identifiable. The demonstration was based on the Well-Being and Physical Activity study (ClinicalTrials.gov, identifier: NCT03194854). Annotated files from DAGitty and Mplus are provided.

The tutorial and teacher's toolbox section in Measurement in Physical Education and Exercise Science (MPEES) provides an outlet for review papers of special interest to the journal's readership (e.g., Myers, Lee, et al., 2018). The substantive issue addressed in this paper, determining if the total causal effect of an individual randomized physical activity-promoting intervention is identifiable, aligns with a longtime focus on challenges related to measuring change (with a desired causal inference for that change) in a variety of motor intervention contexts in MPEES (e.g., Christina, 1997; Kim & Kang, 2021; Palmer et al., 2021; Rikli, 1997). The methodological issue addressed in this paper, conceptualizing how to use directed acyclic graphs (DAGs) to determine if the total causal effect of an individual randomized physical activity-promoting intervention is identifiable, aligns with a longtime focus on conceptualizing how to apply methodological advances (developed outside of kinesiology) to human movement research in MPEES (e.g., Gill, 1997; James & Bates, 1997; Pacewicz & Myers, 2021; Pfeiffer et al., 2023). The synergy provided within this paper, a didactic treatment of how to specify DAGs, via specialty software, to determine if the total causal effect of an individual randomized physical activity-promoting intervention is identifiable, aligns with the original conception of the need for a tutorial and teacher's toolbox section in MPEES: improving measurement practices in physical education and exercise science (e.g., Baumgartner, 1997; Baumgartner & Safrit, 2003; Jensen et al., 2000; Looney, 1997).
Physical activity promotion is a best buy for public health because it has the potential to help individuals feel better, sleep better, and perform daily tasks more easily, in addition to providing disease prevention benefits (e.g., United States Department of Health and Human Services [USDHHS], 2018; World Health Organization [WHO], 2020). Encouragingly, there is meta-analytic evidence from randomized controlled trials (RCTs) that individual-level theory-based physical activity-promoting interventions may be effective for increasing physical activity levels in adults (M. Gourlan et al., 2016; USDHHS, 2018). Unfortunately, causal inference from these experimental studies (e.g., can the observed "effect" of a physical activity-promoting intervention be interpreted as causal or is it merely an "association" with an unknown causal inference?) often is unclearly articulated (e.g., M. Gourlan et al., 2016; USDHHS, 2018). Therefore, a major need in physical activity research is application of causal inference methods to determine when an observed "effect" of a physical activity-promoting intervention from an experimental design can confidently be interpreted as likely causal (i.e., as an inference) (e.g., USDHHS, 2018).
The primary objective of the current study is to demonstrate how a DAG can be used to determine if the total causal effect of an individual randomized physical activity-promoting intervention is identifiable (e.g., stripped of any spurious association). Closely related secondary objectives in the current study are to demonstrate how a DAG can be used to: (1) make causal qualitative assumptions explicit, (2) derive testable implications of a causal model, and (3) inform empirical estimation of the total causal effect of an intervention on an outcome. Objectives are addressed in a substantive-methodological synergy format consistent with previous publications in MPEES (e.g., Myers, Pacewicz, et al., 2023). Specifically, objectives will be addressed in the order that a researcher is likely to encounter them in practice, in this case secondary objectives 1 and 2, primary objective, and secondary objective 3, followed by a brief concluding remarks section. Prior to directly addressing objectives, however, we initially provide some necessary broader contexts: (a) a case for common interests between DAGs and MPEES and (b) a more general introduction to DAGs.

DAGs and MPEES: common interests
Understanding methods for determining whether the total causal effect of an individual-level theory-based physical activity-promoting intervention is identifiable with a DAG is a topic that we believe may be of interest to many MPEES readers for at least three reasons. Each of these three reasons relies on what we believe to be a common interest between the aims and scope of MPEES and the potential use of DAGs to determine if the total causal effect of an individual-level theory-based physical activity-promoting intervention is identifiable. More broadly, however, we believe that: (a) MPEES readers may be interested in knowing when an association between an individual-level theory-based physical activity-promoting intervention and physical activity may be interpreted as causal and that (b) DAGs can help determine when this inference may be defensible and when it may not.

Common interest 1: conceptual models
Explicit use of detailed conceptual models, which typically depict proposed causal relationships (e.g., with unidirectional arrows), to guide the design of individual-level theory-based physical activity-promoting interventions has been a core of the MPEES mission since nearly its inception (e.g., Kim & Kang, 2021; Rikli, 1997). A DAG is a graphical causal model (e.g., Elwert, 2013; Greenland et al., 1999; Pearl, 1988, 1995) that depicts a proposed data-generating process by encoding conceptual knowledge about how a phenomenon (e.g., individual-level physical activity) may work (e.g., what are the causes of individual-level physical activity?) both in the real world (e.g., absent intervention) and in a particular research design (e.g., with intervention). An important, and ideally pre-data collection, outcome of a carefully specified DAG is that causal assumptions are made explicit. Explicitly stating causal assumptions is crucial because causal claims in a study are conditional on the DAG put forth in the study. Thus, detailed conceptual causal models are a common interest to both MPEES (e.g., providing justification for the design of an individual-level theory-based physical activity-promoting intervention) and DAGs (e.g., encoding qualitative beliefs about the causal structure of the phenomenon of interest and how the data will be collected in a particular research design).

Common interest 2: rigorous research designs
Construction of rigorous research designs to test the effectiveness of individual-level theory-based physical activity-promoting interventions, particularly with RCTs, has also been a core of the MPEES mission since nearly its inception (e.g., James & Bates, 1997; Palmer et al., 2021). Two important, and ideally pre-data collection, outcomes of a carefully specified DAG (e.g., Elwert, 2013; Greenland et al., 1999; Pearl, 1988, 1995) are: (1) derivation of testable implications of a causal model (e.g., pairs of variables that are marginally and/or conditionally independent in the DAG) and (2) identification of causal effects (e.g., determining if it is possible to identify the total causal effect of an intervention on an outcome given the DAG). Thus, rigorous research designs are a common interest to both MPEES (e.g., designing an RCT to test the effectiveness of an individual-level theory-based physical activity-promoting intervention) and DAGs (e.g., determining if an association between an intervention and a subsequent outcome evaluated in an RCT can be interpreted as causal).

Common interest 3: estimating the total causal effect
A DAG is not a statistical model, but it can be used to inform the application of a statistical model for the purpose of estimating a total causal effect (e.g., Elwert, 2013; Greenland et al., 1999; Pearl, 1988, 1995). Application of advanced statistical models to empirically estimate the effectiveness of individual-level theory-based physical activity-promoting interventions, particularly with structural equation modeling (SEM), has also been a core of the MPEES mission since nearly its inception (e.g., Gill, 1997; Pacewicz & Myers, 2021). An SEM can be used to estimate the total causal effect of an intervention (e.g., an individual-level theory-based physical activity-promoting intervention) on an outcome (e.g., individual-level physical activity), if this causal relationship was determined to be identifiable in the proposed DAG (e.g., Pearl, 1988, 2000, 2009). In fact, SEM and causal inference have been linked for over a century (e.g., Bollen & Pearl, 2013; Duncan, 1975; Haavelmo, 1943; Koopmans, 1953; Pearl, 1988, 1998, 2012, 2023; Wright, 1921). An important post-data collection outcome of a carefully specified DAG (e.g., Elwert, 2013; Greenland et al., 1999; Pearl, 1988, 1995) is informing empirical estimation (e.g., covariate selection within an SEM) of the total causal effect of an intervention on an outcome. Thus, empirically estimating the total causal effect of an intervention with an SEM is a common interest to both MPEES (e.g., empirically estimating the total effect of an individual-level theory-based physical activity-promoting intervention from an RCT) and DAGs (e.g., determining if the empirically estimated total effect of an intervention on a subsequent outcome from an RCT can be interpreted as causal).

Directed Acyclic Graphs (DAGs)
Graphical causal models have become closely linked with DAGs over the past few decades (e.g., Elwert, 2013; Pearl, 1988, 1993, 1995; Spirtes et al., 1993; T. VanderWeele & Robins, 2007) with historical roots in path diagrams depicting linear SEMs (e.g., Blalock, 1964; Duncan, 1975; Haavelmo, 1943; Wright, 1921). More recently, viewing DAGs as non-parametric (e.g., absent distributional and functional form assumptions) SEMs for causal inference dominates the literature on graphical causal models (e.g., Pearl, 1995, 2009, 2023). Viewing DAGs as non-parametric SEMs is the approach taken in this manuscript prior to estimating model parameters of interest (e.g., marginal associations; a total causal effect; etc.) in our demonstration. Model parameters of interest will be estimated under the assumption of linearity because this is a common assumption for intervention research in general (e.g., Pearl, 2013, 2017) and in physical activity research in particular (e.g., Ntoumanis & Myers, 2016). Readers are referred to Figure 3.1 in Pearl (2023) for a visual of how to use SEM methodology as a causal inference framework.
The DAG acronym describes three key features of these causal models. DAGs are "directed" (i.e., the D in DAG) because they contain single-headed arrows. DAGs are "acyclic" (i.e., the A in DAG) because they do not contain directed cycles of arrows (i.e., feedback loops). DAGs are "graphical" (i.e., the G in DAG) because they visually depict researchers' causal qualitative assumptions. Qualitative causal assumptions are a necessary, though sometimes uncomfortable, component in any causal inference framework (e.g., DAGs as non-parametric SEMs; potential-outcome framework, e.g., Holland, 1986; Neyman, 1923; Rubin, 1974). The ability to clearly articulate qualitative causal assumptions in a DAG is aided by some fluency in key notation and terminology commonly used to visually depict and/or verbally describe DAGs.

Key notation and terminology
Some key DAG notation and terminology are introduced with a relatively simple four-variable demonstration (i.e., Causal Model 1) based on the Well-Being and Physical Activity (WBPA) study (ClinicalTrials.gov, identifier: NCT03194854; Myers, Lee, et al., 2019). Figure 1 contains this DAG, depicted in SEM-like style, created in the freely available (http://www.dagitty.net/) DAGitty 3.1 software (Textor et al., 2011, 2016). Appendix A in the supplemental materials contains the model code used to generate Figure 1. Readers can run this code in DAGitty to create this DAG. Readers are referred to the DAGitty manual (http://www.dagitty.net/manual-2.x.pdf) for fuller information on how to use DAGitty. The DAG depicted in Figure 1 is a simplified version of a more cumbersome 12-variable DAG based on the WBPA study (see Figure 1 in the supplemental materials). Appendix B in the supplemental materials provides the model code used to generate Figure 1 in the supplemental materials. We primarily focus on the simplified four-variable DAG for didactic purposes (e.g., textual parsimony) but occasionally refer to the fuller 12-variable DAG to also provide a more realistic example when useful for accomplishing the objectives of this manuscript.
The objective of the 2018 WBPA study was to provide the first investigation of the effectiveness of the Fun For Wellness (FFW; Myers et al., 2017) online intervention to increase well-being and physical activity in adults with obesity in the United States of America. Fun For Wellness is a self-efficacy theory-based behavioral intervention developed to promote growth in well-being and physical activity by providing capability-enhancing opportunities to participants (Myers, Prilleltensky, et al., 2019). More information about the WBPA study and FFW is provided in subsequent sections of this manuscript when helpful for accomplishing the objectives of this manuscript. Readers are referred to: Myers, Prilleltensky, et al. (2019) for the WBPA study protocol; Scarpa et al. (2021) for engagement with the FFW intervention results; Myers et al. (2021) and Myers, Prilleltensky, et al. (2023) for subjective well-being outcome results; Lee et al. (2021) for well-being actions outcome results; and Myers et al. (2020) for physical activity outcome results.
The DAG in Figure 1 depicts nodes, arrows, missing arrows, and paths. A definition for each of these key terms, along with a definition for key kinship terminology (ancestor, parent, child, and descendant), is briefly reviewed in this section. These key definitions will provide a foundation from which the objectives of the current manuscript will be addressed. Readers are referred to Pearl (2009) for a fuller treatment of DAG notation and terminology.

Nodes
A node (or a vertex) often represents a variable in a DAG and does so in the current manuscript. A node can be observed (i.e., depicted with a rectangle in SEM-like style) or unobserved (i.e., depicted with an oval in SEM-like style). There are four observed nodes in the DAG depicted in Figure 1: C, E, M, and O. Let C represent a covariate (e.g., age); E represent an exposure (e.g., individual randomized exposure to a physical activity-promoting intervention); M represent a psychological construct (e.g., self-efficacy) that E is designed to promote; and O represent a behavioral outcome (e.g., individual-level physical activity) that E is designed to promote. Distributional assumptions about nodes (e.g., normal, etc.) are not communicated in DAGs. For a DAG to be a causal DAG it must include all common causes of each pair of nodes in the DAG. This is a key assumption that will be elaborated upon in later sections.
Prior to introducing a demonstration dataset from the WBPA study, we typically refer to a node within Figure 1 by its letter or its role in the DAG (e.g., C or covariate) instead of a possible conceptual interpretation (e.g., age). We do this to first keep the focus on DAGs in general so that readers can more easily imagine their own individual randomized physical activity-promoting intervention trials. Once we introduce a specific demonstration dataset, we will typically refer to a node within Figure 1 by its conceptual interpretation (e.g., age) instead of its letter or its role in the DAG (e.g., C or covariate). We do this to then keep the focus on conceptual implications of a particular DAG so that readers can more easily imagine conceptual implications for their own individual randomized physical activity-promoting intervention trials.

(Single-headed) arrows
An arrow (or a directed edge) represents a direct causal effect between a pair of nodes in a DAG that may have a non-zero (i.e., non-null) value. There are five direct causal effects assumed in the DAG depicted in Figure 1: C → M, C → O, E → M, E → O, and M → O. The functional form of causal effects (e.g., linear, etc.) is not communicated in DAGs. Similarly, effect heterogeneity (e.g., inclusion of interaction terms) is also not communicated in DAGs.

Missing arrows
A missing arrow (or an exclusion restriction) represents an absent (i.e., null) direct causal effect between a node pair in a DAG. There is one node pair, C and E, that lacks a direct causal effect between its nodes (i.e., C → E or C ← E) in the DAG depicted in Figure 1. Let a feature of the research design, random assignment of individuals to E (e.g., 0 = control, 1 = intervention), explain why no arrow directly connects C and E. Thus, node pair C and E is not adjacent in the DAG depicted in Figure 1. Missing arrows aid in the identification of causal effects.

Paths
A path is an unbroken, nonintersecting route traced along arrows, either with (i.e., causal) or against (i.e., non-causal) the direction of arrowheads in a DAG, that connects nodes. There are causal (e.g., C → M) and non-causal (e.g., C → O ← M) paths, with respect to the beginning and ending nodes in a path, connecting node pairs (e.g., C and M) in the DAG depicted in Figure 1. The first two columns in Table 1 provide connecting paths between each node pair in Figure 1. Pairs of nodes with at least one connecting path are connected. Pairs of nodes without a connecting path are disconnected. Each unique pair of nodes is connected in the DAG depicted in Figure 1. For this reason, the DAG depicted in Figure 1 is a connected DAG. This connected DAG, however, is not a complete DAG because node pair C and E is not adjacent.
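For readers who prefer to check path listings programmatically, the following is a minimal Python sketch of path enumeration for the four-variable DAG. It is not part of the DAGitty or Mplus files in the supplemental materials; the function names (neighbors, paths, is_causal, adjacent) are hypothetical, and the arrow list assumes the five arrows of Causal Model 1 (C → M, C → O, E → M, E → O, M → O).

```python
from itertools import combinations

# Arrows assumed in Causal Model 1 (Figure 1), stored as (parent, child).
ARROWS = [("C", "M"), ("C", "O"), ("E", "M"), ("E", "O"), ("M", "O")]
NODES = ["C", "E", "M", "O"]  # listed in an order consistent with the arrows


def neighbors(node):
    """Yield (neighbor, symbol): '→' when moving with an arrow, '←' against it."""
    for parent, child in ARROWS:
        if parent == node:
            yield child, "→"
        elif child == node:
            yield parent, "←"


def paths(start, end, visited=None):
    """Return every nonintersecting path from start to end as a string."""
    visited = (visited or []) + [start]
    if start == end:
        return [start]
    found = []
    for nxt, symbol in neighbors(start):
        if nxt not in visited:  # nonintersecting: never revisit a node
            for rest in paths(nxt, end, visited):
                found.append(f"{start} {symbol} {rest}")
    return found


def is_causal(path):
    """A path is causal from its first node if no arrow is traversed backwards."""
    return "←" not in path


def adjacent(a, b):
    """Two nodes are adjacent if an arrow directly connects them."""
    return (a, b) in ARROWS or (b, a) in ARROWS


if __name__ == "__main__":
    for a, b in combinations(NODES, 2):
        for p in paths(a, b):
            label = "causal" if is_causal(p) else "non-causal"
            print(f"{a}, {b}: {p} ({label})")
    # A complete DAG would have every node pair adjacent.
    print("complete:", all(adjacent(a, b) for a, b in combinations(NODES, 2)))
```

Under these assumptions every node pair has at least one connecting path (a connected DAG), every path between C and E is non-causal, and the completeness check fails only because C and E are not adjacent.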

Ancestor
Relationships among nodes in a DAG are commonly described with kinship descriptors (e.g., ancestor) and are closely connected to causal paths. Causal paths often are of primary interest (e.g., queries of interest) to researchers. An ancestor of a node is a node that causes, directly and/or indirectly, the node of interest. In Figure 1, the ancestors of O are C, E, and M; the ancestors of M are C and E; and both C and E are without ancestors. In DAGitty, ancestors of a node with outcome status (e.g., O in Figure 1) are colored blue by default (e.g., C and M in Figure 1) unless an ancestor's status has been set to exposure. An ancestor whose status has been set to exposure (e.g., E in Figure 1) is colored green (and with a ► symbol) in DAGitty by default. A node whose status has been set to outcome (e.g., O in Figure 1) is colored blue (and with a │ symbol) in DAGitty by default.

Parent
A parent of a node is a node that directly causes the node of interest (i.e., a specific type of ancestor). In the DAG depicted in Figure 1, the parent structure matches the ancestor structure for each node. A node without a parent is called a root. Every DAG has at least one root. In the DAG depicted in Figure 1, both C and E are roots. A connected DAG where every node has no more than one parent is called a tree. The DAG depicted in Figure 1 is not a tree (i.e., both M and O have more than one parent).

Child
A child of a node is a node that is directly caused by the node of interest. In Figure 1, the children of C and E are M and O; the lone child of M is O; and O is childless. A node without a child is called a sink. Every DAG has at least one sink. In the DAG depicted in Figure 1, O is a sink.

Descendant
A descendant of a node is a node that is caused, directly (i.e., a child) and/or indirectly, by the node of interest. In the DAG depicted in Figure 1, the descendant structure matches the child structure for each node. A DAG where each node has a maximum of one child is called a chain. The DAG depicted in Figure 1 is not a chain (i.e., both C and E have more than one child).
Readers are referred to Elwert (2013) for examples of node pairs being conditionally independent.
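The kinship terms defined in the preceding subsections (parent, child, ancestor, descendant, root, and sink) can be computed directly from a parent list. The following Python sketch is illustrative only; the dictionary layout and function names are hypothetical, and the parent sets assume the five arrows of Causal Model 1.

```python
# Parent sets assumed for Causal Model 1 (Figure 1).
PARENTS = {
    "C": set(),
    "E": set(),
    "M": {"C", "E"},
    "O": {"C", "E", "M"},
}


def children(node):
    """Nodes directly caused by `node`."""
    return {n for n, ps in PARENTS.items() if node in ps}


def ancestors(node):
    """Nodes that cause `node` directly and/or indirectly."""
    found, stack = set(), list(PARENTS[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS[current])
    return found


def descendants(node):
    """Nodes caused by `node` directly and/or indirectly."""
    found, stack = set(), list(children(node))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children(current))
    return found


def roots():
    """Nodes without a parent."""
    return {n for n, ps in PARENTS.items() if not ps}


def sinks():
    """Nodes without a child."""
    return {n for n in PARENTS if not children(n)}


if __name__ == "__main__":
    for node in PARENTS:
        print(node, "parents:", sorted(PARENTS[node]),
              "ancestors:", sorted(ancestors(node)),
              "children:", sorted(children(node)),
              "descendants:", sorted(descendants(node)))
    print("roots:", sorted(roots()), "sinks:", sorted(sinks()))
```

In this small DAG the ancestor sets equal the parent sets and the descendant sets equal the child sets, which mirrors the statements in the text; in the 12-variable DAG the two would generally differ.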

Secondary objective 1: making causal qualitative assumptions explicit
Now that some key DAG notation and terminology have been introduced, recall that DAGs make causal assumptions explicit and that DAGs can be viewed as non-parametric SEMs. Thus, a key input for using SEM as a causal inference method (e.g., in individual-level theory-based physical activity-promoting interventions) is that researchers explicitly provide a set of causal qualitative assumptions (A) that they can defend based on theory, previous research, proposed research design, etc. (e.g., Pearl, 2012, 2023). Assumption set A is described as "qualitative" because no new quantitative data is assumed to have been collected at this point. Assumption set A can be clearly communicated in words and/or a causal model and/or non-parametric structural equations.
In words, set A for Causal Model 1 (A_1) assumes that: C and E may cause both M and O; and that M may cause O. Each of these five assumptions is regarded as a relatively weak assumption in that it merely allows for the possibility of a causal relationship with an unknown value. A stronger assumption within set A_1 is the absence of a causal relationship between C and E. This assumption, depicted by a missing arrow between C and E in Figure 1, imposes a null (i.e., the value equals zero) direct causal relationship between C and E in Causal Model 1.
A causal model (M_A) provides a visual outlet for researchers to explicitly communicate their assumption set A. The DAG depicted in Figure 1 can be viewed as M_A for Causal Model 1 (M_A1). Note that this illustration of M_A1 differs from a "path diagram" depiction commonly used in SEM regarding structural error term representation. In DAGs where structural error terms are assumed to be independent it is common to omit their representation entirely, as in Figure 1. In path diagrams commonly used in SEM, by contrast, structural error terms often are depicted, even if they are assumed to be jointly independent, as in Figure 2. Specifically, the U variables in Figure 2 depict structural error terms within M_A1. Differences in structural error term depiction aside, Figure 1 (i.e., a DAG in SEM-like style) and Figure 2 (i.e., a path diagram in SEM) can be viewed as equivalent representations of assumption set A_1.
Non-parametric structural equations provide a mathematical outlet for researchers to explicitly communicate their assumption set A. For example, assumption set A_1 can be communicated with four unknown functions (f) using the notation established in Figure 2:
C = f_C(U_C) (1)
E = f_E(U_E) (2)
M = f_M(C, E, U_M) (3)
O = f_O(C, E, M, U_O) (4)
Equation (1) communicates the assumed causal process for C. Equation (2) communicates the assumed causal process for E. Note that only a structural error term (U_) is specified as a cause in both Equation (1), that is U_C, and Equation (2), that is U_E, because both C and E are roots within assumption set A_1. Equation (3) communicates the assumed causal process for M. Equation (4) communicates the assumed causal process for O. Note that each parent, in addition to a U_ term, is specified as a cause in Equation (3), that is C, E, and U_M, and in Equation (4), that is C, E, M, and U_O, because both M and O are children within assumption set A_1.
Set A typically contains many qualitative causal assumptions that provide necessary fuller context for a narrower set of queries of interest (Q) that often is the primary purpose for conducting a research study (e.g., Pearl, 2012, 2023). The degree to which set Q (which may contain one to many queries) can be accurately assessed depends on the validity of set A.
For example, assumption set A_1 contains many qualitative causal assumptions that provide necessary fuller context for the particular query of interest (Q_i) in our simple four-variable demonstration: What is the total causal effect of E (i.e., FFW; an individual-level theory-based physical activity-promoting intervention) on O (i.e., individual-level physical activity)? The degree to which Q_i can be accurately assessed depends on the validity of A_1. For this reason, assumption set A_1 was explicitly communicated in three different outlets in this section: words (2nd paragraph); a causal model (3rd paragraph); and non-parametric structural equations (4th paragraph). Each of these outlets can be viewed as equivalent representations of assumption set A_1. Making causal qualitative assumptions explicit, in whichever outlet is preferred, is a necessary but insufficient step for moving forward to accomplishing the next secondary objective of this manuscript.
Recall that explicit communication of set A should also be accompanied by a defense of set A based on theory, previous research, proposed research design, etc. (e.g., Pearl, 2012, 2023). For example, assumption set A_1 can be defended based on self-efficacy theory (e.g., Bandura, 1977, 1997), previous research on individual-level physical activity-promoting interventions for adults with obesity (e.g., M. J. Gourlan et al., 2011; USDHHS, 2013, 2018), the research design of the WBPA study (Myers, Prilleltensky, et al., 2019), and physical activity guidelines (e.g., USDHHS, 2018; WHO, 2018). Specific arrows from C (e.g., age) to M (e.g., self-efficacy) and O (e.g., physical activity) can be defended based on previous research (e.g., Bauman et al., 2012; Rubenstein et al., 2016) and theory (e.g., Beauchamp et al., 2019; Feltz et al., 2008; Jackson et al., 2020). Specific arrows from E (e.g., FFW) to M (e.g., self-efficacy), from E (e.g., FFW) to O (e.g., physical activity), and from M (e.g., self-efficacy) to O (e.g., physical activity) also can be defended based on previous research (e.g., Bauman et al., 2012; Lee et al., 2023; Myers et al., 2020; Myers, Lee, et al., 2019) and theory (e.g., Beauchamp et al., 2019; Feltz et al., 2008; Jackson et al., 2020). The omission of an arrow between C (e.g., age) and E (e.g., FFW) can be defended based on the research design of the WBPA study: a prospective, double-blind, parallel group individual randomized RCT. Similarly, the tacitly assumed temporal ordering of nodes in Figure 1 can be defended based on the proposed research design of the WBPA study: C (e.g., age) and E (e.g., FFW assignment) were to be measured at baseline and the FFW intervention was to be delivered from baseline to 30 days post-baseline; M (e.g., self-efficacy) was to be measured 30 days post-baseline; and O (e.g., physical activity) was to be measured 60 days post-baseline. Readers are referred to Myers, Prilleltensky, et al. (2019) for a fuller defense of assumption set A_1, which is a subset of the assumption set depicted in Figure 1 in the supplemental materials (i.e., the more cumbersome 12-variable DAG).

Secondary objective 2: deriving testable implications of a causal model, M_A
Given that assumption set A has been explicitly communicated and defended, associational implications between node pairs in a DAG can be derived (e.g., Wright, 1921). Pairs of nodes can be determined to be marginally (e.g., bivariate correlation, unadjusted path coefficient, etc.) and/or conditionally (e.g., partial correlation, adjusted path coefficient, etc.) dependent (i.e., associated) and/or independent (i.e., unassociated) given the DAG. Deriving testable implications of an M_A is important because they can help assess the degree to which the data to be collected are found to be consistent with assumption set A. If the testable implications of an M_A are found to be inconsistent with the observed data (e.g., an association is observed between a node pair determined to be independent in the DAG), then causal claims (e.g., for queries of interest Q) based on the DAG may also be in doubt.
Paths allow transmission of association between node pairs. Association between a pair of nodes occurs via three possible sources. The first possible source of association between a pair of nodes is causal, which can be transmitted directly and/or indirectly. For example, in Figure 1 association between E and O can be transmitted via both a direct causal path, E → O, and an indirect causal path, E → M → O. The second possible source of association between a pair of nodes is common cause confounding. For example, in Figure 1 association between M and O can be transmitted via common cause C: O ← C → M. The third possible source of association between a pair of nodes is conditioning (e.g., statistically controlling for) on (or "adjustment" in DAGitty language) a common effect (i.e., a collider on a path) or a descendant of a common effect (e.g., Berkson, 1946; Elwert & Winship, 2014). A collider on a path is a node that has at least two arrows pointing to it on the path of interest. For example, in Figure 1 association between C and E can be transmitted if M is conditioned on (i.e., M): C → M ← E. Note that if M is not conditioned on then the path C → M ← E does not allow transmission of association between C and E because this path is blocked by M (i.e., M is a collider on this path).
Each of the three possible sources of association between pairs of nodes described in the previous paragraph can be described as an "open" path in a DAG. A node pair with at least one open path connecting them is dependent in the DAG. A dependent pair of nodes in a DAG has a relatively weak testable implication (e.g., the association value may be non-zero) and therefore typically is not a testable implication of primary methodological interest. A node pair with no open paths connecting them is independent in a DAG. An independent pair of nodes in a DAG has a strong testable implication (e.g., the association value is zero) and therefore typically is a testable implication of primary methodological interest. The d-separation criterion (Pearl, 1986, 1988) enumerates testable implications of an M_A and is described by Pearl (2023) as follows:
A set of S nodes is said to block a path p if either: (1) p contains at least one arrow-emitting node that is in S, or (2) p contains at least one collision node that is outside S and has no descendant in S. If S blocks all paths from set X to set Y, it is said to "d-separate X and Y," and then, it can be shown that variables X and Y are independent given S, written X ╨ Y | S. (p. 60)
Note that set S can include a null set (i.e., no other nodes conditioned on), which implies marginal independence, such as X ╨ Y. Note, too, that the double up-tack symbol (╨) communicates independence in this notation system.
Now that deriving testable implications of a causal model, M_A, has been described in a general way, we return to our simple four-variable demonstration. Given that assumption set A_1 has been explicitly communicated and defended, associational implications between each pair of nodes in the causal model depicted in Figure 1, M_A1, can be derived. First, recall that node pairs and paths connecting each node pair in M_A1 were already provided in the first two columns of Table 1. Columns three through five now describe whether a connecting path is causal, whether it is open, and whether it contains a collision node. Columns six and seven now describe the testable implications for each node pair. As can be seen in Table 1, C and E is the only node pair that is determined to be independent (without conditioning) given the DAG. This result is depicted within the "Testable implications" window of DAGitty using the code provided in Appendix A in the supplemental materials. Specifically, the following text (italicized for clarity) is provided within the "Testable implications" window of DAGitty: C (e.g., Age) ⊥ E (e.g., FFW). Note that the up-tack symbol (⊥) communicates independence in DAGitty. Figure 2 in the supplemental materials provides a relevant screenshot. Note, too, that the more cumbersome 12-variable DAG depicted in Figure 1 in the supplemental materials follows a similar pattern regarding testable implications for each of the other 8 covariates (i.e., in addition to age) with "FFW" (i.e., marginal independence).
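The blocking rule quoted above can also be checked outside of DAGitty. The following Python sketch applies that rule path by path to the four-variable DAG; it is an illustrative, brute-force approach suited only to small graphs, it is not the algorithm DAGitty uses, and the function names (all_paths, is_blocked, d_separated) are hypothetical. The arrow set assumes the five arrows of Causal Model 1.

```python
from itertools import combinations

# Arrows assumed in Causal Model 1 (Figure 1), stored as (parent, child).
ARROWS = {("C", "M"), ("C", "O"), ("E", "M"), ("E", "O"), ("M", "O")}
NODES = ["C", "E", "M", "O"]


def descendants(node):
    """Nodes caused by `node` directly and/or indirectly."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in ARROWS:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def all_paths(start, end, visited=()):
    """Every nonintersecting path from start to end, ignoring arrow direction."""
    visited = visited + (start,)
    if start == end:
        return [visited]
    result = []
    for parent, child in ARROWS:
        if parent == start and child not in visited:
            result.extend(all_paths(child, end, visited))
        elif child == start and parent not in visited:
            result.extend(all_paths(parent, end, visited))
    return result


def is_blocked(path, s):
    """Apply the quoted rule to each interior node of one path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in ARROWS and (nxt, node) in ARROWS
        if collider:
            # (2) a collision node outside S with no descendant in S blocks.
            if node not in s and not (descendants(node) & s):
                return True
        elif node in s:
            # (1) an arrow-emitting (non-collider) node in S blocks.
            return True
    return False


def d_separated(x, y, s=frozenset()):
    """True if conditioning set s blocks every path between x and y."""
    s = set(s)
    return all(is_blocked(p, s) for p in all_paths(x, y))


if __name__ == "__main__":
    for a, b in combinations(NODES, 2):
        status = "independent" if d_separated(a, b) else "dependent"
        print(f"{a}, {b}: marginally {status}")
    print("C, E given M:", "independent" if d_separated("C", "E", {"M"}) else "dependent")
```

Under these assumptions C and E is the only marginally d-separated pair, and conditioning on the collider M (or on O) opens a path between C and E, consistent with the discussion of collider conditioning above.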

Demonstration dataset
Deriving testable implications of M A1 is important because they can help assess the degree to which the data to be collected are found to be consistent with qualitative assumption set A 1. To demonstrate, we simulated a sample dataset (N = 500) using Monte Carlo methods (e.g., Gentle, 2003) in Mplus 8.10 (L. K. Muthén & Muthén, 1998-2017; see chapter 12). This sample was generated based on relevant suggestions for how to use Monte Carlo methods to improve data analysis in practice (e.g., L. K. Muthén & Muthén, 2002). Parameter values for this fictitious population (e.g., an ideal condition) were similar to parameter estimates (e.g., a less-than-ideal condition) provided in Myers et al. (2020; e.g., N = 461; physical activity scores scaled by responses to the international physical activity questionnaire put forth by Ainsworth et al., 2000; etc.) and to a meta-analysis of effect size estimates for physical activity-promoting interventions in adults with obesity (M. J. Gourlan et al., 2011). Appendix C in the supplemental materials provides our annotated Mplus input file used to create the simulated sample. Readers with access to Mplus can run this file to create this dataset (i.e., MA1_TEST_IMPL1.dat). Readers without access to Mplus can find this dataset in Appendix D in the supplemental materials. For textual parsimony, we often refer to the simulated sample dataset as simply the observed data.
Descriptive statistics were estimated in Mplus 8.10 (see Appendix E in the supplemental materials for the input file) with weighted least squares mean- and variance-adjusted (WLSMV; B. Muthén et al., 1997) estimation. The WLSMV estimator was selected to estimate descriptive statistics because it is advocated for in exercise science (e.g., Myers, Pacewicz, et al., 2023) when at least some of the data are categorical (e.g., FFW) and distributional assumptions are made regarding these variables when estimating model parameters (e.g., bivariate correlations). Table 2 provides descriptive statistics for the four variables (i.e., nodes) in M A1 from the fictitious population and the simulated sample. If a researcher were to look at the four-by-four table of estimated marginal associations (e.g., bivariate correlations) from the observed data, they would expect to find evidence for independence between age and FFW and possible dependence between each of the other variable pairs (e.g., FFW and physical activity) given the previously derived testable implications of the DAG depicted in Figure 1. The estimated correlation between FFW and age is statistically non-significant and negligible in size, r FFW,age = .03, consistent with the null expectation and providing support for the strong testable implication. The estimated bivariate correlation between each of the other variable pairs is statistically significant and ranges from −.33 (i.e., r self-efficacy,age) to .24 (i.e., r self-efficacy,physical activity). Whether any observed association (e.g., r FFW,physical activity = .20) indicates possible causality (e.g., total causal effect of FFW on physical activity) is not yet clear but is the focus of the next section.
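The logic of checking a testable implication against data can be sketched in a few lines of Python. This sketch is not the Mplus simulation in Appendix C and does not use WLSMV estimation; it generates data from a linear version of the four-node DAG using made-up coefficients (all numeric values below are hypothetical) and then computes ordinary Pearson correlations. The only point illustrated is that a randomized exposure should be approximately uncorrelated with a pre-treatment covariate, whereas other node pairs may be correlated.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Ordinary Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n, seed=2024):
    """Draw n cases from a linear version of the DAG (hypothetical coefficients)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.gauss(0, 1)                  # C: standardized age
        e = 1 if rng.random() < 0.5 else 0   # E: randomized, so independent of C
        m = 0.5 * e - 0.3 * c + rng.gauss(0, 1)            # M: self-efficacy
        o = 0.4 * e + 0.3 * m - 0.2 * c + rng.gauss(0, 1)  # O: physical activity
        rows.append((c, e, m, o))
    return rows


rows = simulate(5000)
C, E, M, O = (list(column) for column in zip(*rows))
print("r(E, C) =", round(pearson_r(E, C), 3))  # expected to be near zero
print("r(E, O) =", round(pearson_r(E, O), 3))  # expected to be non-zero
```

A correlation near zero for the exposure and the covariate is consistent with the strong testable implication; it does not by itself establish that the remaining assumptions encoded in the DAG are correct.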

Primary objective: determining if the total causal effect of an individual randomized physical activity-promoting intervention is identifiable
Setting the status of a node pair to exposure and outcome, respectively, clarifies the causal effect identification of interest for DAGitty. By default, the specific causal effect for which identification is evaluated by DAGitty is the total causal effect of the exposure on the outcome. A definition of the total causal effect at the population level is: the average difference in potential outcomes (e.g., individual-level physical activity) at different levels of the exposure (e.g., treatment group versus control group) over the entire population (e.g., Rubin, 1974). The total causal effect includes all causal (and no non-causal) pathways from exposure to the outcome. A total causal effect is identified if it is possible to recover this effect under ideal conditions (e.g., true M A; population data; absence of measurement error, etc.). Thus, identification (e.g., is it possible to recover a total causal effect under ideal conditions) occurs prior to estimation (e.g., how to estimate a total causal effect under less-than-ideal conditions).
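The verbal definition above can be written compactly in potential-outcome notation. The symbols O(1) and O(0), denoting an individual's outcome under assignment to the intervention and to the control condition, respectively, are notation assumed here for illustration and may differ from the notation used in Appendix F.

```latex
\tau \;=\; \mathbb{E}\bigl[\,O(1) - O(0)\,\bigr]
     \;=\; \mathbb{E}\bigl[\,O(1)\,\bigr] - \mathbb{E}\bigl[\,O(0)\,\bigr]
```

If, in addition, exposure E is randomized so that the potential outcomes are independent of E, and each observed outcome equals the potential outcome for the condition actually received, then this quantity equals the difference in observable group means:

```latex
\tau \;=\; \mathbb{E}\bigl[\,O \mid E = 1\,\bigr] - \mathbb{E}\bigl[\,O \mid E = 0\,\bigr]
```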
Evaluating if the total causal effect of the exposure on the outcome is identifiable in DAGitty is intuitive because it visually communicates if it is possible under ideal conditions to: (a) retain all associations transmitted along causal paths (i.e., "causal" associations) while (b) blocking all associations transmitted along non-causal paths (i.e., "spurious" associations) between the exposure and the outcome given the M A. If the answer to this question is "no" then proceeding to the final secondary objective of this manuscript (i.e., informing empirical estimation of the total causal effect of an intervention on an outcome) is not possible unless M A is altered. We note that identification of other causal effects (e.g., direct effect of the exposure on the outcome) is possible in DAGitty but requires additional common cause confounding assumptions. For this reason, we focus on total causal effect identification and later briefly discuss identification of other causal effects (e.g., indirect effect of the exposure on the outcome through the mediator) in the Brief Concluding Remarks section.
Return to our simple four-variable demonstration depicted in Figure 1. Previously setting the status of node E (e.g., FFW) to exposure and node O (e.g., physical activity) to outcome clarified the total (by default) causal effect identification of interest for DAGitty. Thus, the causal effect identification evaluated by DAGitty in our demonstration was: is the total causal effect of FFW on physical activity identified given our M A1? The answer to this question is "yes" and is depicted within the "Causal effect identification" window of DAGitty using the code provided in Appendix A in the supplemental materials. Specifically, the following key text (italicized for clarity) is provided within the "Causal effect identification" window of DAGitty: No open biasing paths. No adjustment is necessary to estimate the total effect of E (e.g., FFW) on O (e.g., Physical Activity). Figure 3 in the supplemental materials provides a relevant screenshot with the fuller text provided by DAGitty for the DAG depicted in Figure 1. Note that the total causal effect of FFW on physical activity also is identified (without conditioning) in the more cumbersome 12-variable DAG depicted in Figure 1. Figure 4 in the supplemental materials provides a relevant screenshot of the "Causal effect identification" window of DAGitty for the DAG depicted in Figure 1.
Table 1 provides another, and perhaps more intuitive, way to see that the total causal effect of E (e.g., FFW) on O (e.g., physical activity) is identified (without conditioning) given our M A1. Focus on the three connecting paths between node pair E and O. Note that the two causal paths (i.e., E → O and E → M → O) are open (i.e., allowing transmission of "causal" association), while the lone non-causal (or "biasing" in DAGitty language) path (i.e., E → M ← C → O) is closed (i.e., disallowing transmission of "spurious" association) by collider M. Thus, the total causal effect of E (e.g., FFW) on O (e.g., physical activity) is identified (without conditioning) in our M A1 because the causal paths between E and O (i.e., E → O and E → M → O) are open, while the lone non-causal path between E and O (i.e., E → M ← C → O) is closed. Appendix F in the supplemental materials demonstrates the identification of the total causal effect of E (e.g., FFW) on O (e.g., physical activity) from the potential outcome framework using counterfactuals (i.e., an equivalent way to express this identification result). Now that we know that it is possible to identify the total causal effect of E (e.g., FFW) on O (e.g., physical activity), we proceed to the final secondary objective of this manuscript: using a DAG (e.g., M A1) to inform empirical estimation (e.g., covariate selection within a SEM) of the total causal effect of an intervention (e.g., FFW) on an outcome (e.g., physical activity).
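The path-by-path reasoning in the preceding paragraph can be reproduced with a short Python sketch. It enumerates the connecting paths between E and O, labels each as causal or non-causal, and checks whether it is open when nothing is adjusted for. The edge set is transcribed from the text, and the function names (`path_table`, `identified_without_adjustment`) are hypothetical helpers introduced for illustration; this is not how DAGitty is implemented.

```python
# Edges of the four-node DAG as described in the text (assumed transcription).
EDGES = {("C", "M"), ("C", "O"), ("E", "M"), ("E", "O"), ("M", "O")}


def simple_paths(start, end, seen=()):
    """Every simple path from start to end, ignoring arrow direction."""
    seen = seen + (start,)
    if start == end:
        yield seen
        return
    for a, b in sorted(EDGES):
        nxt = b if a == start else a if b == start else None
        if nxt is not None and nxt not in seen:
            yield from simple_paths(nxt, end, seen)


def render(path):
    """Draw a path with arrows, e.g. E → M ← C → O."""
    out = path[0]
    for a, b in zip(path, path[1:]):
        out += (" → " if (a, b) in EDGES else " ← ") + b
    return out


def classify(path):
    """Label one path as causal or not and as open or not (no adjustment)."""
    causal = all((a, b) in EDGES for a, b in zip(path, path[1:]))
    colliders = [node for prev, node, nxt in zip(path, path[1:], path[2:])
                 if (prev, node) in EDGES and (nxt, node) in EDGES]
    # With nothing adjusted for, a path is open exactly when it has no collider.
    return {"path": render(path), "causal": causal,
            "open": not colliders, "colliders": colliders}


def path_table(exposure="E", outcome="O"):
    return [classify(path) for path in simple_paths(exposure, outcome)]


def identified_without_adjustment(table):
    """Causal paths must be open and non-causal paths must be closed."""
    return all(row["open"] == row["causal"] for row in table)


table = path_table()
for row in table:
    print(row)
print("Identified without adjustment:", identified_without_adjustment(table))
```

Under these assumptions, the sketch lists the same three paths discussed in the text, with M flagged as the collider that closes the lone non-causal path.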

Secondary objective 3: informing empirical estimation of the total causal effect of an intervention on an outcome
An important Q i (i.e., particular query of interest) to empirically estimate in physical activity intervention research is: What is the total causal effect of an individual-level theory-based physical activity-promoting intervention on individual-level physical activity post-intervention? In this case, under an intent-to-treat approach (e.g., Hollis & Campbell, 1999) and assuming a traditional two-arm RCT, the population parameter (i.e., the estimand) that typically is to be estimated is the true mean difference (τ) on physical activity post-intervention for the intervention group compared to the control group (e.g., Little & Lewis, 2021). A well-known rule (i.e., the estimator) for estimating τ in linear SEM is to estimate the path coefficient from the randomization variable (e.g., 0 = control group, 1 = intervention group) to physical activity under maximum-likelihood (ML) estimation, while controlling for all confounders (i.e., common causes) of the randomization variable and physical activity given the M A (e.g., Bollen, 1989; Pearl, 2017; Wright, 1921). The result from the previous section on our primary objective provided the set of variables (i.e., null set) that would typically need to be controlled for, given M A, to get an estimate of τ. This is how a DAG can inform empirical estimation (e.g., covariate selection within a SEM) of the total causal effect of an intervention on an outcome. Under this modeling approach no distributional assumptions are made for exogenous (i.e., "roots" in a DAG) variables (e.g., randomization).
Return to our simple four-variable demonstration depicted in Figure 1. Recall the Q i for which we want an empirical estimate: What is the total causal effect of FFW (i.e., node E) on physical activity (i.e., node O)? In this case (i.e., the WBPA study research design), under an intent-to-treat approach, τ is the true mean difference on physical activity 60-days post-baseline for the FFW group compared to the control group. One estimator for τ in linear SEM is to estimate the path coefficient from FFW (i.e., 0 = control group, 1 = FFW group) to physical activity under ML estimation, while controlling for all confounders of FFW and physical activity given M A1. Recall that the result (provided by DAGitty) from the previous section on our primary objective suggests that no adjustment (e.g., statistically controlling for) is necessary because there are no common cause confounders of FFW and physical activity in M A1. Thus, this estimate of τ may be referred to as the unconditional total causal effect of FFW on physical activity. This simple model was estimated in Mplus 8.10 using the observed data (see Appendix G in the supplemental materials for the input file). The estimate of the unconditional total causal effect of FFW on physical activity was: τ = 5.68, SE = 1.60, p < .001. Thus, the estimate of the true unadjusted mean difference on physical activity 60-days post-baseline for the FFW group (M = 15.91) as compared to the control group (M = 10.23) was 5.68. Given the previous identification result for our M A1, this estimate has a causal interpretation: on average, the total causal effect of assignment to the FFW intervention was an increase in physical activity of 5.68 hours per week.
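To make the connection between the path coefficient and the unadjusted mean difference concrete, the following Python sketch shows that the least-squares slope from regressing an outcome on a 0/1 group indicator equals the difference in group means. For a single linear equation with normally distributed errors, the least-squares and ML estimates of this coefficient coincide. The sketch is not the Mplus analysis in Appendix G, and the data are made up for illustration (the intercept, effect size, and error standard deviation are hypothetical values, not estimates from the paper).

```python
import random


def mean_difference(group, outcome):
    """Unadjusted difference: mean(outcome | group = 1) - mean(outcome | group = 0)."""
    treated = [y for g, y in zip(group, outcome) if g == 1]
    control = [y for g, y in zip(group, outcome) if g == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def slope(group, outcome):
    """Least-squares slope from regressing the outcome on the 0/1 group indicator."""
    n = len(group)
    g_bar = sum(group) / n
    y_bar = sum(outcome) / n
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(group, outcome))
    sxx = sum((g - g_bar) ** 2 for g in group)
    return sxy / sxx


# Hypothetical data: these values are made up and are NOT the paper's dataset.
rng = random.Random(7)
group = [rng.randint(0, 1) for _ in range(500)]
outcome = [10.0 + 5.0 * g + rng.gauss(0, 15) for g in group]

print("Mean difference:", mean_difference(group, outcome))
print("Regression slope:", slope(group, outcome))
```

The two printed values agree up to floating-point error, which is why the path coefficient from the randomization variable can be read as an estimate of the mean difference τ when no covariates are included.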

Conditional total causal effect
In physical activity intervention research, it is often of interest to estimate the total causal effect of an individual-level theory-based physical activity-promoting intervention on individual-level physical activity post-intervention, after controlling for covariates such as demographic variables, baseline values of physical activity, etc. (e.g., M. Gourlan et al., 2016; USDHHS, 2018; WHO, 2020). In this case, under an intent-to-treat approach and assuming a traditional two-arm RCT, the population parameter (i.e., the estimand) that typically is to be estimated is the true adjusted (for covariates) mean difference (τ a) on physical activity post-intervention for the intervention group compared to the control group (e.g., Little & Lewis, 2021). One well-known rule (i.e., the estimator) for estimating τ a in linear SEM is to estimate the path coefficient from the randomization variable (e.g., 0 = control group, 1 = intervention group) to physical activity under ML estimation, while controlling for all relevant covariates of physical activity, given the M A, that would leave the conditional total causal effect identified (e.g., Bollen, 1989; Pearl, 2017; Wright, 1921). Evaluating if the conditional total causal effect of the exposure on the outcome remains identifiable is important because it communicates if it is possible to: (a) retain all associations transmitted along causal paths (i.e., "causal" associations) while (b) blocking all associations transmitted along non-causal paths (i.e., "spurious" associations) between the exposure and the outcome given the M A, after controlling for the proposed set of covariates. This is how a DAG can inform empirical estimation (e.g., covariate selection within a SEM) of the conditional total causal effect of an intervention on an outcome.
Return to our simple four-variable demonstration depicted in Figure 1. Recall the slightly modified Q i for which we now want an empirical estimate: What is the conditional total causal effect of FFW (i.e., node E) on physical activity (i.e., node O)? Note that there appear to be two possible covariates depicted in Figure 1: node C (e.g., age) and node M (e.g., self-efficacy). As a pedagogical exercise, we begin by evaluating if the total causal effect of FFW on physical activity remains identified after controlling for both age and self-efficacy. Return to Appendix A in the supplemental materials and use the model code to re-generate Figure 1 in DAGitty. Then click on node C (e.g., age) and check "adjusted" in the Variable window in the upper left corner. Repeat this process for node M (e.g., self-efficacy). Figure 3 depicts the resulting DAG, in which the causal path E → M → O is closed. Adjusting for M (e.g., self-efficacy) alone would not only result in the closing of causal path E → M → O (i.e., blocking a "causal" association), it would also result in the opening of non-causal path E → M ← C → O (i.e., allowing transmission of a "spurious" association). Figure 4 contains this modified DAG, depicted in SEM-like style, created in DAGitty 3.1. Given that the total causal effect of FFW on physical activity is not identified after controlling for both age and self-efficacy, nor after controlling for self-efficacy alone, we do not proceed to estimation of these two τ a.
Adjusting for node C (e.g., age) only (see Figure 5) leaves the conditional total causal effect of FFW (i.e., node E) on physical activity (i.e., node O) identified because the E → M → O causal path is no longer blocked; the E → O causal path remains open; and the E → M ← C → O non-causal path remains closed. This model was estimated in Mplus 8.10 using the observed data (see Appendix H in the supplemental materials for the input file) after centering age on its mean for ease of interpretation. The estimate of this conditional total causal effect of FFW on physical activity was: τ a = 5.60, SE = 1.60, p < .001. Thus, the estimate of the true adjusted (for age) mean difference on physical activity 60-days post-baseline for the FFW group (M = 15.87) compared to the control group (M = 10.27) was 5.60. Given the identification result for this slightly modified M A1, this estimate has a causal interpretation: on average, the total causal effect of assignment to the FFW intervention was an increase in physical activity of 5.60 hours per week, after controlling for age. Note that the more cumbersome 12-variable DAG depicted in Figure 1 follows a similar pattern regarding identification of a conditional total causal effect of FFW on physical activity: the effect is identified for any combination of potential covariate adjustment unless self-efficacy is included in the adjustment set. This is how a DAG can inform empirical estimation (e.g., covariate selection within a SEM) of the conditional total causal effect of an intervention on an outcome. Simply, post-intervention variables in causal paths between E (e.g., FFW) and O (e.g., physical activity), such as M (e.g., self-efficacy), should not be adjusted for if identification of a conditional total causal effect is of interest (e.g., Rosenbaum, 1984).
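The comparison of candidate adjustment sets discussed above can be sketched in Python by checking, for each candidate set, whether every causal path between E and O stays open and every non-causal path stays closed. This path-based check mirrors the reasoning in the text for the four-node DAG only; it is a simplification and is not a general implementation of the adjustment criterion used by DAGitty. The edge set is transcribed from the text, and the function names (`blocked`, `evaluate`) are hypothetical.

```python
# Edges of the four-node DAG as described in the text (assumed transcription).
EDGES = {("C", "M"), ("C", "O"), ("E", "M"), ("E", "O"), ("M", "O")}


def descendants(node):
    """All nodes reachable from `node` by following arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in EDGES:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def simple_paths(start, end, seen=()):
    """Every simple path from start to end, ignoring arrow direction."""
    seen = seen + (start,)
    if start == end:
        yield seen
        return
    for a, b in sorted(EDGES):
        nxt = b if a == start else a if b == start else None
        if nxt is not None and nxt not in seen:
            yield from simple_paths(nxt, end, seen)


def blocked(path, adjusted):
    """d-separation blocking rule applied to one path and one adjustment set."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if (prev, node) in EDGES and (nxt, node) in EDGES:  # collider
            if node not in adjusted and not (descendants(node) & adjusted):
                return True
        elif node in adjusted:                              # chain or fork
            return True
    return False


def evaluate(adjusted, exposure="E", outcome="O"):
    """Report closed causal paths and open non-causal paths for a candidate set."""
    adjusted = set(adjusted)
    closed_causal, open_biasing = [], []
    for path in simple_paths(exposure, outcome):
        causal = all((a, b) in EDGES for a, b in zip(path, path[1:]))
        is_blocked = blocked(path, adjusted)
        if causal and is_blocked:
            closed_causal.append(path)
        if not causal and not is_blocked:
            open_biasing.append(path)
    return {"identified": not closed_causal and not open_biasing,
            "closed_causal": closed_causal, "open_biasing": open_biasing}


for candidate in [(), ("C",), ("M",), ("C", "M")]:
    print(candidate, evaluate(candidate))
```

Under these assumptions, the null set and the set containing only C leave the total causal effect identified, whereas any set containing M closes the causal path E → M → O, and the set containing M alone additionally opens the non-causal path E → M ← C → O.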

Figure 1. Causal model 1 depicted in a Directed Acyclic Graph (DAG). Note. FFW = Fun For Wellness. The E node is green and with a ► symbol to communicate its status as the exposure node. The O node is blue and with a │ symbol to communicate its status as the outcome node. The C node and the M node are blue to communicate their role as ancestors of the O node. The green arrow from E to O represents an open causal path between the exposure node (i.e., E) and the outcome node (i.e., O). The sequence of green arrows from E to M to O represents an open causal path between the exposure node (i.e., E) and the outcome node (i.e., O).
and M → O. A node pair connected by an arrow is adjacent. Five of the six node pairs (i.e., C and M, C and O, E and M, E and O, and M and O) are adjacent in the DAG depicted in Figure 1.

Figure 3. Causal Model 1 depicted in a Directed Acyclic Graph (DAG) after adjusting for C (e.g., age) and M (e.g., self-efficacy). Note. FFW = Fun For Wellness. The E node is green and with a ► symbol to communicate its status as the exposure node. The O node is blue and with a │ symbol to communicate its status as the outcome node. The C node and the M node are gray to communicate their role as irrelevant for identifying sufficient adjustment sets now that they have been assigned adjusted status. The green arrow from E to O represents an open causal path between the exposure node (i.e., E) and the outcome node (i.e., O). The sequence of black arrows from E to M to O represents a closed causal path between the exposure node (i.e., E) and the outcome node (i.e., O).

Figure 4. Causal Model 1 depicted in a Directed Acyclic Graph (DAG) after adjusting for M (e.g., self-efficacy) only. Note. FFW = Fun For Wellness. The E node is green and with a ► symbol to communicate its status as the exposure node. The O node is blue and with a │ symbol to communicate its status as the outcome node. The M node is gray to communicate its role as irrelevant for identifying sufficient adjustment sets now that it has been assigned adjusted status. The C node is blue to communicate its role as ancestor of the O node. The green arrow from E to O represents an open causal path between the exposure node (i.e., E) and the outcome node (i.e., O). The sequence of red arrows from E to M to C to O represents an open biasing path between the exposure node (i.e., E) and the outcome node (i.e., O).
Figure 3 contains this modified DAG (i.e., after adjusting for both C and M), depicted in SEM-like style, created in DAGitty 3.1. Note the new text within the "Causal effect identification" window of DAGitty. Specifically, the following key text (italicized for clarity) is provided within the "Causal effect identification" window of DAGitty: Incorrectly adjusted. No adjustment sets found. Figure 5 in the supplemental materials provides a relevant screenshot with the fuller text provided by DAGitty. Thus, the causal effect identification now evaluated by DAGitty in our demonstration was: is the conditional total causal effect of FFW on physical activity identified given our modified M A1? The answer to this question is "no" because a previously open causal path (i.e., E → M → O) is now closed due to adjusting for M. This change from open to closed can be seen in DAGitty by comparing Figure 1, where the arrows in the causal path E → M → O are green (i.e., open), to Figure 3, where the arrows in the causal path E → M → O are black (i.e., closed).

Figure 5. Causal Model 1 depicted in a directed acyclic graph (DAG) after adjusting for C (e.g., age) only. Note. FFW = Fun For Wellness. The E node is green and with a ► symbol to communicate its status as the exposure node. The O node is blue and with a │ symbol to communicate its status as the outcome node. The C node is gray to communicate its role as irrelevant for identifying sufficient adjustment sets now that it has been assigned adjusted status. The M node is blue to communicate its role as ancestor of the O node. The green arrow from E to O represents an open causal path between the exposure node (i.e., E) and the outcome node (i.e., O). The sequence of green arrows from E to M to O represents an open causal path between the exposure node (i.e., E) and the outcome node (i.e., O).

Table 1. Pairs of nodes in causal Model 1 (M A1): connecting paths and testable implications.

Table 2. Descriptive statistics for the four variables (i.e., nodes) in causal Model 1 (M A1) from a fictitious population and a simulated sample dataset (i.e., the observed data). Note. a Threshold value corresponding to a .50 population proportion of individuals assigned to the FFW group. b Not an independent model parameter due to the dichotomous scaling of the FFW variable. c Threshold value estimate corresponding to a .48 sample proportion of individuals assigned to the FFW group. *p < .05. ***p < .001.