Identifying and estimating net eﬀects of treatments in sequential causal inference

: Suppose that a sequence of treatments are assigned to inﬂuence an outcome of interest that occurs after the last treatment. Between treatments, there are time-dependent covariates that may be post-treatment variables of the earlier treatments and confounders of the subsequent treatments. In this article, we study identiﬁcation and estimation of the net eﬀect of each treatment in the treatment sequence. We construct a point parametrization for the joint distribution of treatments, time-dependent covariates and the outcome, in which the point parameters of interest are the point eﬀects of treatments considered as single-point treatments. We identify net eﬀects of treatments by their expressions in terms of point eﬀects of treatments and express patterns of net eﬀects of treatments by constraints on point eﬀects of treatments. We estimate net eﬀects of treatments through their point eﬀects under the constraint by maximum likeli- hood and reduce the number of point parameters in the estimation by the treatment assignment condition. As a result, we obtain an unbiased con- sistent maximum-likelihood estimate for the net eﬀect of treatment even in a long treatment sequence. We also show by simulation that the inter- val estimation of the net eﬀect of treatment achieves the nominal coverage probability.


Introduction
In many economic and medical practices, a sequence of treatments are assigned to influence an outcome of interest that occurs after the last treatment of the sequence. Between consecutive treatments, time-dependent covariates are present that may be post-treatment variables of the earlier treatments (Rosenbaum [16], Robins [8], Frangakis & Rubin [1]) and confounders of the subsequent treatments. In this article, we aim at the net effect of each treatment in the sequence, i.e. the causal effect of the treatment on the subpopulation defined by the previous treatments and time-dependent covariates while setting the subsequent treatments at controls. The net effect of treatment is also called the blip effect in the context of semi-parametric sequential causal inference (Robins [9,10,14,15]).
By using the G-computation algorithm formula (also called the G-formula), Robins [7,10,12,14,15] identified the net effect of treatment by standard parameters, which are usually the means of the outcome in subpopulations defined by all the treatments and time-dependent covariates. Robins illustrated that any constraint imposing equalities among standard parameters may lead to automatic rejection of the null hypothesis of net effects if the time-dependent covariate is a post-treatment variable of the earlier treatments and a confounder of the subsequent treatments. As the treatment sequence gets long, the number of standard parameters becomes huge, and with no constraint on these parameters, the maximum-likelihood (ML) estimates of net effects may not be consistent (Robins & Ritov [11]). To obtain non-genuine likelihood-based estimates of net effects, two semi-parametric approaches have been developed: the structural nested model (Robins [9,10,14,15], Robins et al. [13], Murphy [6], Henderson et al. [4]) and the marginal structural model (Robins [12,15], Murphy et al. [5]).
On the other hand, every treatment in the sequence can be considered as a single-point treatment. In the framework of single-point causal inference (Rosenbaum & Rubin [17], Rosenbaum [18], Rubin [19]), a treatment has a point causal effect on a subpopulation defined by the previous treatments and timedependent covariates. The point causal effect of the treatment can be identified by the point effect of the treatment, which corresponds to differences between the means of the outcome in the sub-subpopulations assigned with different treatments. In this case, the time-dependent covariates act only as confounders. As a result, a constraint on point effects does not necessarily lead to automatic rejection of the null hypothesis of point causal effects.
Intuitively, the point causal effect of each treatment in the sequence results from the net effects of this and all the subsequent treatments. If one can identify the point causal effects by the point effects, then one should be able to identify the net effects by the point effects. Because a constraint on point effects does not necessarily lead to automatic rejection of the null hypothesis of point causal effects, it should not lead to automatic rejection of the null hypothesis of net effects. The purpose of this article is to identify and estimate the net effects through their point effects in line of the above observation.
In Section 2, we introduce the background and notation of the identification and estimation of the net effects of treatments. In Section 3, we construct a point parametrization for the joint distribution of treatments, time-dependent covariates and an outcome after the last treatment, in which the point parameters of interest are the point effects of treatments. In Section 4, we identify net effects by their point effects, express patterns of net effects by constraints on point effects and analyze the role of treatment assignment conditions in the identification. In Section 5, we estimate net effects through their point effects under the constraint and reduce the number of point parameters in the estimation by the treatment assignment condition. In Section 6, we illustrate our method via an analytical example, a simulation study and a real example. In Section 7, we conclude the article with remarks and discussion.

Treatment sequence, potential covariates and potential outcome
Let z t indicate the treatments at time t (t = 1, . . . , T ). Assume that all z t are discrete variables and have the values 0, 1, . . .. We take z t = 0 as the control treatment and z t = 1, 2, . . . as active treatments. Let the set z t 1 = (z 1 , . . . , z t ) indicate the treatment sequences from times 1 to t. Suppose that every treatment sequence z T 1 could be applied to each unit of a population. Assume that there is no interference between units and no represented treatment sequence for any unit. For notational simplicity, we use one subpopulation defined by stationary covariates of the population as our population, and henceforth, do not consider stationary covariates in the following development.
In the framework of sequential causal inference, each unit could have a potential (time-dependent) covariate vector x t (z t 1 ) between treatments z t and z t+1 and a potential outcome y(z T 1 ) of our interest after the last treatment z T under treatment sequence z T 1 . Assume that x t (z t 1 ) is a discrete vector with nonnegative components. We take x t (z t 1 ) = 0 as the reference level. Let x t 1 (z t 1 ) = {x 1 (z 1 ), x 2 (z 2 1 ), . . . , x t (z t 1 )} be the potential covariate array between treatments z 1 and z t+1 .
In the above definition of the potential covariates and outcome, every z t in z T 1 is a deterministic function of the earlier treatments and potential covariates, does not depend on the earlier treatments and potential covariates, then z T 1 is a static treatment sequence, and otherwise, it is a dynamic treatment sequence.

Net effects of treatments
In most practical cases, treatments z t (t = 1, . . . , T ) are consecutively and randomly assigned, so the potential covariate vectors x t (z t 1 ) (t = 1, . . . , T − 1) and the potential outcome y(z T 1 ) become consecutively and randomly observable. Denote the observable covariate vector simply by x t (t = 1, . . . , T − 1) and the observable outcome by y. Let x t 1 = (x 1 , . . . , x t ). The assigned treatments z t−1 1 are assumed to be a treatment sequence, which can be static or dynamic and leads to the observable covariate array x t−1 1 . This assumption is known as the consistency assumption (Robins [7,8,9,10,12,14,15]). Let (z t− 1 1 , x t−1 1 ) be a set of observable variables in which the treatment sequence z t−1 1 leads to the observable covariate array x t−1 1 . Let z T t = (z t , . . . , z T ) be the treatment sequence given the observable vari- ) and a potential outcome y(z T 1 ). [9,10,14,15] defined the net effect of z t > 0 given the observable variable for t = 1, . . . , T . Here the expectation E(b | a) is with respect to the conditional distribution of b given a in the population. Throughout the article, we adopt the following notational conventions. First, should be omitted from relevant expressions. For instance, the notations z 0 1 and x 0 1 in (1) for t = 1 should be omitted, and so (1) becomes Similarly, the notation z T T +1 = 0 in (1) for t = T should be omitted, and so (1) becomes ; we may use one or another notation in different contexts.

G-formula for net effects of treatments
In the following assumption, let z * t indicate the treatments randomly assigned at time t. We assume that the assignment of z * t (t = 1, . . . , T ) satisfies for any treatment sequence z T t given the observable variable (z t−1 1 , x t−1 1 ). Here A⊥B | C means that A is conditionally independent of B given C. The z t is a specified treatment at time t in the specified treatment sequence z T t . The first part of (2) is known as the assumption of no unmeasured confounders (Robins [7,8,9,10,12,14,15]). The second part is known as the positivity assumption. There may exist other observable covariates than x T −1 1 but the assignment of z * t does not depend on them and no further information is available about them.
Standard parameters for the conditional distribution of y given (z T 1 , ). Standard parameters for the conditional distribution of the discrete variable x t given (z t 1 , Using assumption (2), Robins (1986Robins ( , 1997) derived the G-formula ). Combining (1) and (3), we obtain the G-formula for the net effect for t = 1, . . . , T − 1 and ) (s = t, . . . , T − 1) under assumption (2). If x t (t = 1, . . . , T − 1) are post-treatment variables of z s (s ≤ t), then the standard parameters essentially do not have any pattern (Rosenbaum [16], Robins [8], Frangakis & Rubin [1]). If x t are simultaneously confounders of z s (s > t), then one needs to use all these standard parameters to identify With a long treatment sequence, the number of these parameters is huge. Without a constraint on standard parameters, the ML estimates of φ(z t− 1 1 , x t−1 1 , z t ) may not be consistent (Robins [7,10,12,14,15], Robins & Ritov [11]).
In this article, instead of standard parameters, we are going to use point effects of treatments to identify the net effects of treatments under assumption (2).

Joint distribution of observable variables
, y) of the observable variables, we consider N independent and identically distributed sets, Here f (u | v) is a conditional probability distribution of u given v if u is discrete, and a conditional density distribution of u given v if u is continuous. We are going to construct point parameterizations for (6) in Section 3.2 and for (7), (8) and then (5) in Section 3.3.

Point parametrization for the conditional outcome distribution (6)
, a stratum is a set of those sets satisfying certain conditions. For instance, a stratum (z t 1 , x t−1 1 ) is a set of those sets satisfying (z t i1 , x t−1 i1 ) = (z t 1 , x t−1 1 ). Let prop(A) denote the proportion of stratum A in the N sets and prop(A | B) denote the conditional proportion of stratum A in stratum B.
Consider the mean of y in stratum (z t 1 , Consider the mean of y in stratum (z t Let which is the grand mean of y in all the N sets , the proportions of treatments and covariates can be treated as constants. Therefore the point effects of treatments, the point effects of covariates and the grand mean are linear functions of the standard parameters ) and thus are parameters of (6) which are called point parameters. According to (9), (10) and (11), each point parameter can be expressed in terms of the standard parameters μ(z T 1 , ). Conversely, we show in Appendix A.1 that each standard parameter can be expressed in terms of the point parameters by where we take ϑ(z t−1 1 , x t−1 1 , z t = 0) = 0 and ζ(z t 1 , x t−1 1 , x t = 0) = 0. Let Ψ y be the set of all the point parameters, i.e. Ψ y = {ϑ(z t− 1 1 , Then Ψ y forms a parametrization for (6). Definition 1. The set Ψ y is called a point parametrization for the conditional outcome distribution (6)

Point parametrization for the joint distribution (5)
Treating x t as an outcome conditional on the observable variable (z t 1 , x t−1 1 ) and repeating the procedure of the previous subsection, we construct point parameters and the point parametrization for the conditional distribution (7). Denote the point parametrization by Ψ xt (t = 1, . . . , T − 1). In observational studies, treatments z t (t = 1, . . . , T ) can be exposures, whose assignments are unknown and need to be estimated. Treating z t as an outcome conditional on the observable variable (z t− 1 1 , x t−1 1 ) and repeating the procedure of the previous subsection, we construct point parameters and the point parametrization for the conditional distribution Let Ψ = {Ψ y ; Ψ zt , t = 1, . . . , T ; Ψ xt , t = 1, . . . , T − 1}. Then Ψ forms a parametrization for the joint distribution (5).

Limits of point parameters of Ψ y
As described in Section 3, the point parameters of Ψ y are conditional on N , the proportions of treatments and covariates are different and therefore these parameters may be different. On the other hand, the limits of these parameters as N approaches infinity exist and are properties of the population.
As N approaches infinity, the proportion prop which is the point effect of treatment z t > 0 on subpopulation which is the point effect of covariate which is the grand mean in the population.
Letting N approach infinity in (12), we obtain We are going to show that we only need the point effects θ

Net effects of treatments versus point effects of treatments
Inserting (16) into (3), we derive the following formula in Appendix A.2 for t = 1, . . . , T − 1 and Inserting (17) into (1), using θ(z s−1 , and then dropping the unnecessary superscript *, we express each net effect φ(z t− 1 1 , x t−1 1 , z t ) in terms of the point effects of treatments and the probabilities of covariates and treatments by .
Formula (18) shows that each net effect φ(z t−1 1 , x t−1 1 , z t ) can be expressed in terms of the point effects of treatments and the probabilities of covariates and treatments.
In Appendix A.3, we derive the following formula, under assumption (2), for t = 1, . . . , T − 1 and Formula (19) implies that the mean μ(z t 1 , x t−1 1 ) arises from the net effects of active treatments z s > 0 at times s ≥ t on substrata (z s 1 , (19) can also be derived from formula (8.3) of Robins [10]. Inserting (19) into (13), we obtain for t = 1, . . . , T − 1 and Formula (20) shows that each point effect θ(z t−1 1 , x t−1 1 , z t ) can be expressed in terms of the net effects of treatments and the probabilities of covariates and treatments. The formula decomposes θ(z t−1 . In summary, we have the following identification theorem: Under assumption (2), for the given probabilities of covariates and treatments, the point effects (18) and (20).

Patterns of net effects of treatments versus constraints on point effects of treatments
Suppose that the data-generating mechanism is such that the net effects φ(z t−1 1 , where the parameter vector ϕ = (ϕ 1 , ϕ 2 , ϕ 3 ) indexes all the net effects. Generally, we consider a pattern of the net effects described by a function where the k-dimensional parameter vector ϕ = (ϕ 1 , . . . , ϕ k ) indexes all the net effects. We call ϕ the net effect vector. Inserting (21) into (20), we obtain a constraint on the point effects θ(z t−1 1 ,

X. Wang and L. Yin
Despite pattern (21), the covariate x t can still be a post-treatment variable of z t 1 (Robins [7,10,12,14,15]). In this case, the standard parameter μ(z T 1 , x T −1 1 ) does not have any pattern for treatments z t (t < T ) (Rosenbaum [16], Robins [8], Frangakis & Rubin [1]). If x t is additionally a confounder of z T t+1 , then one needs all the standard parameters to identify the net effect.

Role of treatment assignment conditions in the identification of net effects of treatments
In the framework of single-point causal inference, it is well known that treatment assignment conditions may reduce the number of parameters in the identification of the causal effect of a single-point treatment (Rosenbaum & Rubin [17], Rosenbaum [18], Rubin [19]). In randomized trials, one does not need to use covariates to identify the single-point causal effect. In observational studies even with a large number of covariates, one can approximate the treatment assignment by several sub-randomized trials called subclasses, and one only needs to use the subclasses to identify the single-point causal effect.
In sequential randomized trials, the covariate vector x t typically has a small dimension, but the array x t−1 1 has a huge dimension at large t. In observational sequential studies, the vector x t can easily have a dimension of several dozens. On the other hand, every treatment in the treatment sequence can be treated as a single-point treatment. We may use assignment conditions of individual treatments z t to reduce the number of point parameters in constraint (22). For illustration, we consider the Markov process, a common assignment mechanism of a treatment sequence, in which the assignment of z t only depends on a limited history of previous treatments and covariates, for instance, x t−1 . In this case, the following proportion equality is satisfied, at least approximately: Assuming equality (23), given Taking the average on both sides of (9) with respect to prop(z t−1 1 , x t−2 1 | x t−1 ), we then obtain the following point effect of treatment z t > 0 on stratum As N approaches infinity, prop(z t−1 . Then (23) converges to the following probability equality: which is the point effect of z t > 0 on subpopulation x t−1 . Now we consider a constraint on θ(x t−1 , z t ). For illustration, suppose the following pattern of the net effects for t = 1, . . . , T . Firstly inserting (25) into (20), secondly taking the average on both sides of (20) with respect to pr(z t−1 1 , x t−2 1 | x t−1 ) and noticing the probability equality above, and finally using the equality we obtain a constraint on the point effects θ(x t−1 , z t ).

Corollary 2.
For the given probabilities of covariates and treatments, under the Markov process and pattern (25), all θ(x t−1 , z t ) are indexed by the net effect vector ϕ and given by Despite the Markov process and pattern (25), the covariate x t can still be a post-treatment variable of z t (Robins [7,10,12,14,15]). In this case, the standard parameter μ(z T 1 , x T −1 1 ) does not have any pattern for treatments z t (t < T ) (Rosenbaum [16], Robins [8], Frangakis & Rubin [1]). If x t is additionally a confounder of z t+1 , then one needs all the standard parameters to identify the net effect.

Remark 3.
Generally, the Markov process and pattern (25) do not simplify the G-formula (4).

Likelihoods of point parameters
The data set comprises independent observations {z T i1 , x T −1 i1 , y i } on units i = 1, . . . , N. Using distribution (5) in Section 3.1, we obtain the following likelihood of the point parameters Ignoring the sampling variability of the treatments and covariates , we treat the proportions as the probabilities and ϑ(z t−1 . . , T ). Hence we can use (27) alone to estimate the net effect φ(z t−1 1 , x t−1 1 , z t ). In some circumstances, it is also of interest to incorporate this sampling variability into the estimation of the net effect. In this case, we can use (28) and (29) to estimate this variability. With the estimate of this variability, we can obtain the estimate of the net effect incorporating this sampling variability of treatments and covariates.
In the rest of the article, we are going to focus on the estimation of the net effect based on (27), thus ignoring the sampling variability of treatments and covariates.

ML estimation of net effects of treatments under a constraint
The outcome model is where ) which is expressed by (12) in terms of the point parameters in Ψ y defined in Section 3.2.
Ignoring the sampling variability of for t = 1, . . . , T − 1 and ϑ(z T −1 1 , x t−1 1 , z t ) into the net effects of treatments z s > 0 at times s ≥ t in stratum (z t−1 1 , x t−1 1 , z t ) versus stratum (z t−1 1 , x t−1 1 , z t = 0). For common distributions, the net effects can be estimated according to the following procedure. First, we estimate the mean ν(z t− 1 1 , x t−1 1 , z t ) (t = 1, . . . , T ) by using likelihood (27) and model (30). The estimateν(z t−1 to calculate the estimate of the point effect ϑ(z t− 1 1 , x t−1 1 , z t ) according to formula (9). Third, we perform a regression ofθ(z t−1 according to (31) to obtain the estimate of the net effect vector ϕ. Finally, we replace ϕ byφ in pattern (21) to obtain the estimate of φ(z t−1 . This procedure will be further illustrated in Section 6. Here we discuss the unbiasedness and consistency of the estimate.
The estimateν(z t− 1 1 , x t−1 1 , z t ) is unbiased for finite sample and so isθ(z t−1 1 , is a linear function of ϕ, then the estimateφ is unbiased according to (31) treated as a regression model, and so isφ(z t− 1 1 , satisfies simple regularity conditions, for instance, that it is a smooth and monotone function of ϕ.
Oftentimes, the dimension k of ϕ is finite, that is, the net effects have a pattern of finite dimension. Treating (31) as a regression model, we see thatφ is consistent and so isφ(z t−1 which contain the k-dimensional vector ϕ and whose estimates have zero covariance matrices as the sample size N approaches infinity. This condition can be satisfied even with a long treatment sequence, where the variables z t and x t−1 take finite numbers of values.

ML estimation of net effects of treatments using treatment assignment conditions
At large t, the number of possible strata (z t−1 1 , x t−1 1 ) is huge, and with a finite sample, most of these strata do not have both active and control values of the treatment variable z t , so most of the point effects ϑ(z t− 1 1 , x t−1 1 , z t ) are not estimable. In this case, it is highly difficult to estimate the net effects . However, as described in Section 4.4, we may estimate the net effects φ(z t− 1 1 , x t−1 1 , z t ) through the point effects on large strata under treatment assignment conditions, for instance, ϑ(x t−1 , z t ) under the Markov process. Even with a small sample and large t, stratum x t−1 is large enough for z t to take both active and control values so that ϑ(x t−1 , z t ) is estimable. Furthermore, the proportion equality (23), or equivalently, can be approximately satisfied. For instance, a small sample is sufficient to allow approximately the same marginal distribution of each variable of (z t−1 1 , x t−2 1 ) in stratum (x t−1 , z t > 0) versus stratum (x t−1 , z t = 0). In some applications, correlations between the variables, often the nearest ones in the sequence, have an influence on the net effect. With a slightly larger sample, we may obtain approximately the same correlations between the nearest variables in stratum The outcome model is still (30), i.e.
Ignoring the sampling variability of , we replace the probabilities by the proportions and θ(x t−1 , z t ) by ϑ(x t−1 , z t ) in constraint (26), leading to the following constraint on ϑ( All arguments and statements about the estimation procedure, unbiasedness and consistency of the ML estimateφ(z t− 1 1 , x t−1 1 , z t ) are carried over from those in the previous subsection if we replace ν(z t−1 (9) by (24), and (31) by (32).
Suppose that all treatments of type one have the same net effect ϕ 1 and all treatments of type two have the same net effect ϕ 2 . Then the pattern of the net effects is Let T (1) (t) denote the set of times since t at which treatments of type one are assigned and T (2) (t) the set of times since t at which treatments of type two are assigned. Decomposing the point effect ϑ(z t−1 1 , x t−1 1 , z t = 0) and using the pattern above, we obtain the following constraint on ϑ(z t−1 where q = 1, 2 indicates treatments of types one and two respectively.

ML estimates of net effects of treatments under a constraint
Now suppose that y is normally distributed. For simplicity, additionally suppose that the variance is equal to one for any given (z T 1 , is expressed in terms of point parameters by (12). The outcome model we use is (30), i.e.
First we estimate the point effect ϑ(z t−1 1 , x t−1 1 ). Let s(A) be the set of units in stratum A and n(A) be the number of units in stratum A. Using the outcome model and the likelihood above, we obtain the estimate for the mean of y in stratum (z t− 1 1 , . Using (9), we then obtain the estimate for the point effect of z t = 1 on stratum and its variance In Appendix A.4, we prove Proposition 1. Suppose that the outcome y is normal and has the same known variance for all given (z T 1 ,

Remark 4.
For the outcome of other distributions, the estimateν(z t− 1 1 , x t−1 1 , z t ) and thusθ(z t− 1 1 , x t−1 1 ) are highly robust to point parameters at time s > t. Thereforeθ(z t− 1 1 , x t−1 1 ) at time t is weakly correlated with estimates of the point parameters at the other times and the correlation may be omitted.
As a result of the proposition, we treat constraint (34) as a linear regression with unequal variances var{θ(z t−1 1 , x t−1 1 )}. Using techniques of the linear regression, we regressθ(z t−1 to obtain the estimates of ϕ 1 and ϕ 2 . The closed form for these estimates can easily be derived and are not presented here due to space restrictions. The estimatesφ 1 andφ 2 are both unbiased and consistent.

ML estimates of net effects of treatments under the Markov process
Suppose that the net effects have pattern (33) while the treatment assignment mechanism is a Markov process, described in Sections 4.4 and 5.3, in which the assignment of z t depends only on x t−1 . Given x t−1 , there is only one net effect φ(x t−1 , z t = 1) denoted by φ(x t−1 ) and one point effect ϑ(x t−1 , z t = 1) denoted by ϑ(x t−1 ), and noticeably, φ(z 1 = 1) = φ and ϑ(z 1 = 1) = ϑ at t = 1. Notice that stratum x t−1 is much larger than stratum (z t−1 1 , x t−1 1 ) and so ϑ(x t−1 ) is almost always estimable even with a long treatment sequence and a finite sample.
Decomposing ϑ(x t−1 ) into the net effects of z s = 1 (s ≥ t) in stratum (x t−1 , z t = 1) versus stratum (x t−1 , z t = 0) and using pattern (33), we obtain the following constraint on ϑ(x t−1 ) (t = 1, . . . , T ) with where q = 1, 2 indicates treatments of types one and two respectively. Using the likelihood and the outcome model given in the beginning of the previous subsection, we obtain the estimate for the mean of y in stratum ( .

X. Wang and L. Yin
Under the Markov process, we use (24) to obtain the estimate for point effect From Proposition 1, we see thatθ(x t−1 ) at time t is independent of the estimates of point parameters at the other times, i.e. ϑ(z s−1 , x s ) (s = 1, . . . , T −1) and the grand mean ν. Therefore we can treat constraint (37) as a linear regression with unequal variances var{θ(x t−1 )}. Using techniques of the linear regression, we regresŝ ϑ(x t−1 ) on c (1) (x t−1 ) and c (2) (x t−1 ) to obtain a closed form for the estimates of ϕ 1 and ϕ 2 , which are not presented here given space considerations. The estimatesφ 1 andφ 2 are both unbiased and consistent.
In the estimation above, we have only used the likelihood, the outcome model and constraint (37), albeit in the point parametrization. It is theoretically possible to do the same estimation in the standard parametrization by using the G-computation algorithm, but this is practically difficult due to the high dimension of standard parameters and the complex expression of constraint (37) in terms of standard parameters. Furthermore, if equalities are imposed on standard parameters, then the estimation is biased (Robins [7,10,12,14,15]).

A simulation study
Here we show by simulation that interval estimation of the net effect of treatment achieves the nominal coverage probability in the example of the previous subsection. We consider the case of T = 3, so pattern (33) of the net effects is We are going to study three situations with the net effect (ϕ 1 , ϕ 2 ) = (20, 10), (20, 0), (0, 0) respectively. In particular, the vector (20, 0) represents the null net effect of treatment z 2 in the presence of non-zero net effects of treatments z 1 and z 3 ; the vector (0, 0) represents the null net effects of all treatments.
To generate the data with the above pattern of net effects, we construct the standard parameters μ (z 1 , x 1 , z 2 , x 2 , z 3 ), as explained here. First, we construct the proportions of z 1 , x 1 , z 2 , x 2 , and z 3 which give a sample size of 648 units yielding integer frequencies for all (z 1 , x 1 , z 2 , x 2 , z 3 ). Second, with the obtained proportions and a given value of (ϕ 1 , ϕ 2 ), we calculate the point effect ϑ(z t−1 1 , x t−1 1 ) (t = 1, 2, 3) according to formula (34). The second part of (34) is used to calculate the constants c (1) (z t−1 1 , x t−1 1 ) and c (2) (z t−1 1 , x t−1 1 ) while the first part is used to calculate ϑ(z t− 1 1 , x t−1 1 ). Third, we arbitrarily choose the point effect ζ(z t 1 , x t 1 ) (t = 1, 2) and the grand mean ν, which according to Theorem 1 do not affect the net effect. Finally, we insert the obtained point parameters ϑ(z t− 1 1 , x t−1 1 ), ζ(z t 1 , x t 1 ) and ν into (12) to obtain the standard pa- ). In this setting, we have also made the time-dependent covariates x 1 and x 2 post-treatment variables of the earlier treatments and confounders of the subsequent treatments. Furthermore, the treatment assignment follows a Markov process such that the assignment of z 2 only depends on x 1 and that of z 3 on x 2 . A detailed description of the procedure is presented in Table 1. The standard parameters and the relevant SAS code are given in the supplementary material [20].
Because we have ignored the sampling variability of treatments and covariates since Section 5.1, we only generate the outcome y given (z 1 , x 1 , z 2 , x 2 , z 3 ). With the obtained standard parameter μ(z 1 , x 1 , z 2 , x 2 , z 3 ), assuming that the variance of the outcome y given (z 1 , x 1 , z 2 , x 2 , z 3 ) is one, we generate y to form a data set of 648 observations on (z 1 , x 1 , z 2 , x 2 , z 3 , y). A total of 2000 data sets are generated. For each data set, we calculate the confidence interval of the net effect as follows. Using the method described in Section 6.1.3, we calculate the estimateθ(x t−1 ) and the constants c (1) (x t−1 ) and c (2) (x t−1 ) and then regresŝ ϑ(x t−1 ) on c (1) (x t−1 ) and c (2) (x t−1 ) according to (37) to obtain the estimateŝ ϕ 1 andφ 2 . Withφ 1 and its variance, we calculate the confidence interval of ϕ 1 . With 2000 data sets, we obtain 2000 confidence intervals. By counting how many confidence intervals contain the given value of ϕ 1 , we obtain the actual coverage probability for the confidence interval of ϕ 1 . The same procedure is followed for ϕ 2 . The SAS code generating the data set and calculating the actual coverage probability is presented in the supplementary material. The mean and standard deviation ofφ i (i = 1, 2) and the actual coverage probability of the 95 % confidence interval of ϕ i are presented in Table 2, together with the given value of ϕ i . Table 2 shows that for the three simulations with (ϕ 1 , ϕ 2 ) = (20, 10), (20, 0), (0, 0) respectively, the actual coverage probability for the 95% confidence interval is the same: 95.20% for ϕ 1 and 94.85% for ϕ 2 .

A real example
Here we use a medical example to illustrate the practical procedure of estimating the net effect. Many HIV-infected patients use recreational drugs such as cocaine. A relevant medical question is how the recreational drug influences the count of CD4 cells which reflects progression of the disease. We attempt to estimate the net effect of the recreational drug on the CD4 count when the drug is used repeatedly.
The Multicenter AIDS Cohort Study enrolled nearly 5000 gay or bisexual men from Baltimore, Pittsburgh, Chicago and Los Angeles between 1984 and 1991, and required these men to return every 6 months to complete a questionnaire Table 1 The procedure of constructing standard parameters given net effects of treatments as described in Section 6.2. The SAS code and the obtained standard parameters are given in the supplementary material Using the above proportions, we obtain other proportions of z 1 , The sample size is set at 648, which yields integer frequencies for all (z 1 , x 1 , z 2 , x 2 , z 3 ).
(2) Calculate the point effects of treatments according to (34) Using the above proportions, we calculate the following constants: With a given value of (ϕ 1 , ϕ 2 ), we then calculate the point effects of treatments ) for t = 1, 2, 3. For z 1 = 0, 1, z 2 = 0, 1, and f (x 1 ) = 0, 3, 6, 9 when x 1 = (0, 0), (0, 1), (1, 0), (1, 1) respectively, we have For the grand mean, we have ν = 15 (4) Calculate the standard parameters according to (12) where I(zt) equals one when zt = 1 and zero otherwise. and undergo various examinations (Kaslow et al. [3]). Our data was a subset of the data from the study which involved 375 participants who were seronegative at entry and seroconverted during the follow-up (Zeger and Diggle [21]). In the initial period of seroconversion, these participants were not exposed to anti-HIV drugs, which might complicate the net effect of the recreational drug. Hence we restricted our study to the visit before seroconversion (t = 0) and the first and second visits after seroconversion (t = 1, 2). Furthermore, some participants lost their follow-ups due to unknown non-ignorable missing data mechanism, so we excluded these patients and finally obtained a data set of 256 participants. The data set is presented in the supplementary material. At each visit t = 0, 1, 2, a participant recalled or was examined for the drug use (z t ), CD4 count (x t1 ), the number of packs of cigarettes smoked daily (x t2 ), the number of sexual partners (x t3 ) and a mental illness score (x t4 ). We assumed that z t occurred prior to x t1 , . . . , x t4 . Age (x 05 ) was also included as a covariate at visit t = 0. Consequently, the temporal order of these variables is {z 0 , (x 01 , . . . , x 05 ), z 1 , (x 11 , . . . , x 14 ), z 2 , (x 21 , . . . , x 24 )}.
The treatment variables are drug uses z 1 and z 2 . Due to incomplete information about covariates prior to drug use z 0 , it is not possible to obtain the net effect of z 0 . Instead, we use z 0 and x 01 , . . . , x 05 as stationary covariates to adjust for participants' differences. The time-dependent covariates between z 1 and z 2 are x 11 , . . . , x 14 . The outcome is the logarithm of CD4 count at t = 2, i.e. y = log(x 21 ). All variables prior to y are dichotomized, with ones meaning 'yes' or 'high' and zeros meaning 'no' or 'low'. We assume that y is normally distributed.
As described in Section 5.2, we use the model, for i = 1, . . . , 256, to estimate the point effect ϑ(z 0 , x 0 ) of drug use z 1 = 1 and the point effect ϑ(z 0 , x 0 , z 1 , x 1 ) of drug use z 2 = 1. When implementing this estimation by common statistical softwares like SAS, we (1) use the model to estimate the variance of y given (z 0 , x 0 , z 1 , x 1 , z 2 ), i.e. var(y | z 0 , x 0 , z 1 , x 1 , z 2 ), (2) use the same model to estimate ϑ(z 0 , x 0 , z 1 , x 1 ), and (3) use the model to estimate ϑ(z 0 , x 0 ) but we use the variance estimated from step (1). We improve the estimation in the usual framework of statistical modeling. By the likelihood ratio-based significance test of the model parameters at the significance level of 10 %, we find that only the CD4 count x 01 is significant of all covariates and so ϑ(z 0 , x 0 ) is considered equal to the point effect ϑ(x 01 ) of drug use z 1 = 1. Furthermore, x 01 does not have a significant interaction with z 1 . As a result, we use the model to estimate ϑ(x 01 ) = β 1 . Similarly, we find that only the CD4 counts x 01 and x 11 are significant of all covariates and so ϑ(z 0 , x 0 , z 1 , x 1 ) is considered equal to the point effect ϑ(x 01 , x 11 ) of drug use z 2 = 1. Furthermore, x 01 and x 11 do not have a significant interaction with z 2 . As a result, we use the model to estimate ϑ(x 01 , x 11 ) = β 2 . In the estimation above, the variance of y given (x 01 , z 1 , x 11 , z 2 ), i.e. var(y | x 01 , z 1 , x 11 , z 2 ), is estimated by using the model We decompose the point effects β 1 and β 2 into the net effects ϕ 1 and ϕ 2 according to their proportions and obtain β 1 = ϕ 1 + ϕ 2 {prop(z 2 = 1 | z 1 = 1) − prop(z 2 = 1 | z 1 = 0)}, According to this decomposition, we regressβ 1 andβ 2 on the proportions to estimate ϕ 1 and ϕ 2 . The estimate (variance) of ϕ 1 is 0.04(0.08) whereas that of ϕ 2 is 0.08(0.06). This result shows that the recreational drug has an immediate positive effect on CD4 count (i.e. ϕ 2 ), which is well-known in the medical literature, and a decreasing distant effect on CD4 count (i.e. ϕ 1 ), which is observed in this study. The estimation procedure and the results are presented in Table 3. In the procedure above, we estimated the variances var(β 1 ) and var(β 2 ) through var(y | x 01 , z 1 , , x 11 , z 2 ) estimated from model (40). On the other hand, we can estimate var(β 1 ) through var(y | x 01 , z 1 ) estimated from model (38), and var(β 2 ) through var(y | x 01 , x 11 , z 2 ) estimated from model (39). Because (40) avoids dispersion of the covariates, var(y | x 01 , z 1 , x 11 , z 2 ) is smaller than var(y | x 01 , x 11 , z 2 ) and var(y | x 01 , z 1 ), yielding smaller estimates for var(β 1 ) and var(β 2 ).
However, the latter method can be useful in the case of a long treatment sequence. As an example, recall the example described in Section 6.1.3, where Table 3 The procedure and results of estimating the net effects of the recreational drugs z 1 and z 2 on the logarithm y of the CD4 count after z 2 . The medical background is described in Section 6.3. The SAS code and the data set are given in the supplementary material (1) Estimating the point effects of treatments according to (38) and (39) Point effect of treatment Estimate (standard deviation) ϑ(x 01 ) 0.10 (0.07) ϑ(x 01 , x 11 ) 0.08 (0.06) (2) Decomposing the point effects into the net effects according to (41) ϑ(x 01 ) = ϕ 1 + 0.73ϕ 2 ϑ(x 01 , x 11 ) = ϕ 2 (3) Estimate the net effects according to the decomposition above Net effect of treatment Estimate (standard deviation) ϕ 1 0.04 (0.08) ϕ 2 0.08 (0.06) we calculated the variance var{θ(x t−1 )} through the variance var(y | z T 1 , x T −1 1 ). However, with a long treatment sequence and a finite sample, it is highly difficult to estimate var(y | z T 1 , x T −1 1 ). In this case, we can estimate var{θ(x t−1 )} through var(y | x t−1 , z t ) estimated from the model

Conclusion
In this article we have identified net effects of treatments by point effects of treatments and estimated net effects of treatments via point effects of treatments by maximum likelihood. Using methods in the single-point causal inference (Rosenbaum & Rubin [17], Rosenbaum [18], Rubin [19]), we improve estimation of the net effects by using constraints on their point effects and treatment assignment conditions.
Given a data, model and the likelihood, our estimation of the net effects is most efficient due to the nature of maximum likelihood. The point estimation is unbiased while the interval estimation achieves the nominal coverage probability. Furthermore, the ML estimate of the net effect is consistent in many practical applications where the net effect has a pattern of finite dimension while treatments and covariates take finite numbers of values. The consistency is true even when the treatment sequence gets long and the number of point parameters increases exponentially. It is interesting to compare this consistency with the inconsistency of the ML estimate of the causal effect of a single-point treatment when adjusting for a confounder of infinite dimension (Robins & Ritov [11]). In the latter case, the ML estimate of the single-point causal effect is highly correlated with that of the confounder of infinite dimension.
There are rich literatures on semi-parametric approaches to estimation of the net effect, for instance, the marginal structural model and the structural nested model. These approaches are also used to estimate the causal effects of treatment sequences including dynamic ones under special conditions where, for instance, study units remain on active treatments after treatment initiation and the net effects are the same for all treatments (Lok & DeGruttola [2]). In comparison, our approach estimates the net effects of all treatments in the sequence, though in a relatively simple setting. Together with the proportions of treatments and covariates, the ML estimates of these net effects can be used to obtain the ML estimate of the causal effect of any treatment sequence (for instance, Robins [10]). Due to the scope of this article, we have only focused on the net effect.
Due to the scope of this article, we have only considered the situation where treatments are assigned at fixed times, treatments and covariates are discrete, there is no missing data, the outcome model is linear and the point and net effects are measured by differences. However, there are methods available for estimating the causal effect of one single-point treatment in more complex settings. We believe that analogous methods can be developed to estimate the net effects in more complex settings in the context of a treatment sequence.