Bounding Causes of Effects With Mediators

Philip Dawid, Macartan Humphreys, Monica Musio

DSEID: DSEID-001-3003049
DOI: 10.1177/00491241211036161
Journal: Sociological Methods & Research
Publisher: SAGE Publications
Published: 2024-2
Status: available

Abstract

Suppose X and Y are binary exposure and outcome variables, and we have full knowledge of the distribution of Y, given application of X. We are interested in assessing whether an outcome in some case is due to the exposure. This “probability of causation” is of interest in comparative historical analysis where scholars use process tracing approaches to learn about causes of outcomes for single units by observing events along a causal path. The probability of causation is typically not identified, but bounds can be placed on it. Here, we provide a full characterization of the bounds that can be achieved in the ideal case that X and Y are connected by a causal chain of complete mediators, and we know the probabilistic structure of the full chain. Our results are largely negative. We show that, even in these very favorable conditions, the gains from positive evidence on mediators are modest.


Extracted abstract

Suppose X and Y are binary exposure and outcome variables, and we have full knowledge of the distribution of Y, given application of X. From this we know the average causal effect of X on Y. We are now interested in assessing, for a case that was exposed and exhibited a positive outcome, whether it was the exposure that caused the outcome. The relevant "probability of causation", PC, typically is not identified by the distribution of Y given X, but bounds can be placed on it, and these bounds can be improved if we have further information about the causal process. Here we consider cases where we know the probabilistic structure for a sequence of complete mediators between X and Y. We derive a general formula for calculating bounds on PC for any pattern of data on the mediators (including the case with no data). We show that the largest and smallest upper and lower bounds that can result from any complete mediation process can be obtained in processes with at most two steps. We also consider homogeneous processes with many mediators. PC can sometimes be identified as 0 with negative data, but it cannot be identified at 1 even with positive data on an infinite set of mediators. The results have implications for learning about causation from knowledge of general processes and of data on cases.

Introduction

Even the best possible evidence regarding the effects of a treatment on an outcome is generally not enough to identify the probability that the outcome was caused by the treatment.

For instance, researchers conducting randomised controlled trials may determine that providing a medicine to school children increases the overall probability of good health from one third to two thirds. This information, no matter how precise, is not enough to answer the following question: Is Ann healthy because she took the medicine? It is not even enough to answer the question probabilistically. The reason is that, consistent with these results, it may be that the medicine makes a positive change for 2 out of 3 students, but an adverse change for the remainder: in that case the medicine certainly helped Ann. But it might alternatively be that the medicine makes a positive change for 1 in 3 children but no change for the others. In that case the chances it helped Ann are just 1 in 2. Of the children taking the medicine, two thirds are healthy. Half of these are healthy because of the medicine, whereas the other half would have been healthy anyway.
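The arithmetic behind these two scenarios can be checked directly. The sketch below is illustrative only: the four "response types" and the names `prob_causation` and `healthy_rates` are labels introduced here for the example, not notation from the paper.

```python
from fractions import Fraction

def healthy_rates(helped, hurt, always, never):
    """Return (Pr(healthy | untreated), Pr(healthy | treated)).

    Arguments are population shares of four hypothetical response types:
    helped (healthy only if treated), hurt (healthy only if untreated),
    always (healthy either way), never (unhealthy either way).
    """
    assert helped + hurt + always + never == 1
    return hurt + always, helped + always

def prob_causation(helped, hurt, always, never):
    """Share of treated-and-healthy children who are healthy because of the medicine."""
    healthy_if_treated = helped + always
    return helped / healthy_if_treated

third = Fraction(1, 3)
zero = Fraction(0)
# Scenario A: the medicine helps two thirds and harms the remaining third.
scenario_a = dict(helped=2 * third, hurt=third, always=zero, never=zero)
# Scenario B: the medicine helps one third and changes nothing for the others.
scenario_b = dict(helped=third, hurt=zero, always=third, never=third)

for name, shares in (("A", scenario_a), ("B", scenario_b)):
    print(name, healthy_rates(**shares), prob_causation(**shares))
```

Both scenarios reproduce the same experimental rates (one third healthy without the medicine, two thirds with it), yet they give different answers for Ann, which is the sense in which the experimental data alone leave the question open.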

Put differently, the experimental data identifies the "effects of causes" (EoC), but we are interested in the reverse problem of quantifying "causes of effects" (CoE). The CoE task of defining and assessing the probability of causation (Robins and Greenland, 1989) in an individual case has been considered by Tian and Pearl (2000); Dawid (2011); Yamamoto (2012); Pearl (2015); Dawid, Musio and Fienberg (2016); Dawid, Murtas and Musio (2016); Dawid, Musio and Murtas (2017); Murtas, Dawid and Musio (2017). Note that this is distinct from the "reverse causal question" of Gelman and Imbens (2013), which is an EoC task aimed at ascertaining which causes have an effect on an outcome.

To understand causes of effects better, we might seek additional evidence along causal pathways. For example, researchers evaluating development programs specify "theories of change" and seek evidence for intermediate outcomes along a pathway linking treatment to outcomes. Most simply: Was the treatment received? Was the medicine ingested? Van Evera (1997) describes various tests that might be implemented using such ancillary evidence. A "smoking gun test" searches for evidence that, though unlikely to be found, would give great confidence in a claim if it were to be found; a "hoop test" is a search for evidence that we expect to find, but which, if found to be absent, would provide compelling evidence against a proposition (as if the proposition were asked to jump through a hoop).

Sometimes many points along a causal pathway are investigated. An intervention might be to provide citizens with information on political corruption, in the hope that this will lead to ultimate changes in politicians' behavior. Researchers might then check many points along a chain of intermediate outcomes. Was the political message delivered? Was it understood? Was it believed? Did it induce a change in behavior by citizens? Did this in turn produce a change in behavior by politicians?

Seeing positive evidence at many points along such a causal chain would appear to give confidence that the final outcome is indeed due to the conjectured cause. This is the core premise of "process tracing," as deployed by qualitative political scientists (Collier, 2011), as well as of mixed methods research as used in development evaluation (White, 2009). In the most optimistic accounts it is assumed that, as one gets close enough to a process, by observing more and more links in a chain, the link between any two steps becomes less questionable and eventually the causal process reveals itself (Mahoney, 2012, 581).

We here provide a comprehensive treatment of the scope for inferences of this form from knowledge of causal chains. We obtain a general formula for calculating bounds on the probability of causation, for an arbitrary pattern of data along chains of binary variables. We derive implications of this formula, and calculate the largest and smallest upper and lower bounds achievable from any causal chain consistent with the known relation between X and Y. We give special attention to what might appear to be the best possible conditions: those in which causal processes really do follow a simple causal chain, in which researchers have complete experimental evidence about the probabilistic relationship between any two consecutive nodes in the chain, in which the chain is arbitrarily long, in which the causal effect of each intermediate variable on its successor climbs to 1, and in which researchers observe outcomes consistent with positive effects at every point on the chain. We show that such information does indeed increase confidence that an outcome can be attributed to a cause and, for homogeneous chains at least, that the longer the chain the better. However, we find that even under these ideal conditions our ability to narrow the bounds for the probability of causation can be modest. In the example of attributing Ann's health to good medicine, a homogeneous process with arbitrarily many positive intermediate steps observed might only tighten the bounds from [.5, 1] to [.58, 1].

In contrast, we show that non-homogeneous processes can tighten the bounds considerably. For example, suppose Ann was prescribed the medicine and recovered. If we know that being prescribed the medicine is the only way in which Ann could have obtained and taken the medicine, and that taking the medicine helps anyone who would otherwise be sick, then with positive evidence on a single intermediate point on the causal chain (that Ann did indeed take the medicine) we can identify the probability that prescribing the medicine caused Ann's recovery at 2/3. (We are still short of 1, because it is possible that Ann would have recovered even without the medicine.) A process like this, in which we observe a "necessary condition for a sufficient condition", provides the largest possible lower bound on the probability of causation available from any observations on any chain. At this point we have done the best possible and more data along the chain will not help.

Although achieving identification of the probability of causation at 1 is generally elusive, negative data can yield identification at 0, either in two steps from a heterogeneous process, or from alternating data along an infinite homogeneous chain. In this sense, information on mediators can support "hoop" tests but not "smoking gun" tests.

Plan of paper

Existing results (Dawid, Murtas and Musio, 2016) have considered the case of a single unobserved mediator. We generalize this in two ways. First, we consider situations with chains of arbitrary length. Secondly, we calculate bounds for general data, that is, for situations in which the values of none, some or all the mediators are observed.

We proceed as follows. Section 2 introduces the set-up, and provides general formulae for bounding the probability of causation for a simple one-step process. In § 3 we extend these results to cases in which we know the structure of a complete mediation process. We consider various degrees of knowledge of the values of the mediators for the individual case at hand: all unobserved, all observed, or just some observed. Our main result is Theorem 4, which provides a general formula applicable to all cases.

Section 4 draws out the detailed implications of this result in a variety of contexts. In § 4.1 we investigate the largest achievable lower and upper bounds from any sequence, and find that these can be achieved by heterogeneous two-step processes. Section 4.2 examines the case of homogeneous processes of arbitrary length. We show that an alternating pattern for the values at all intermediate points can lead to a limiting value of 0 for the probability of causation. However, it is not generally possible for even the most positive evidence to identify the probability of causation (and a fortiori not possible to identify it at 1), even in the limit of infinitely many steps. Section 4.3 considers implications of our results for gathering data on mediators. In § 5 we compare the bounds based on knowledge of mediator processes with those achievable from knowledge of covariates, which can be much tighter. We summarise our findings in § 6. Various technical details for the proofs in the paper are elaborated in three appendices.

Preliminaries

We consider a binary treatment variable X and binary outcome variable Y. We suppose we have access to experimental (or unconfounded observational) data supplying values for Pr(Y = y | X ← x), where we use the notation X ← x to denote a regime in which X is set to value x by external intervention.

Define

$$\tau := \Pr(Y = 1 \mid X \leftarrow 1) - \Pr(Y = 1 \mid X \leftarrow 0)$$

$$\rho := \Pr(Y = 1 \mid X \leftarrow 1) - \Pr(Y = 0 \mid X \leftarrow 0).$$

Then τ is the average causal effect of X on Y, while ρ is a measure of how common Y = 1 is. The transition matrix from X to Y (where the row and column labels of any such matrix are implicitly 0 and 1 in that order) can be written:

$$P = P(\tau, \rho) := \begin{pmatrix} \tfrac12(1+\tau-\rho) & \tfrac12(1-\tau+\rho) \\ \tfrac12(1-\tau-\rho) & \tfrac12(1+\tau+\rho) \end{pmatrix} \tag{1}$$

All entries of P must be non-negative: this holds if and only if

$$|\rho| + |\tau| \le 1. \tag{2}$$

We have equality in (2) if and only if one of the entries of (1) is 1, in which case we term P degenerate. For τ ≥ 0, this will happen if either ρ = 1 − τ, in which case Pr(Y = 1 | X = 1) = 1 and X = 1 can be thought of as a sufficient condition for Y = 1; or ρ = τ − 1, in which case Pr(Y = 1 | X = 0) = 0, and X = 1 can be thought of as a necessary condition for Y = 1. Defining, for τ ≥ 0,

$$\sigma := \frac{\rho}{1-\tau}, \tag{3}$$

we might thus regard σ ∈ [−1, 1] as measuring the relative sufficiency of X = 1 for Y = 1.¹

¹ Although we do not focus on it, for τ < 0 the analogous quantity $-\rho/(1+\tau)$ can be interpreted as the relative sufficiency of X = 1 for Y = 0.
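As a small sanity check on this parametrization, the following sketch builds P(τ, ρ) from (1), tests condition (2), and computes σ from (3). The function names are assumptions made for this illustration; exact fractions are used so that the degenerate case can be tested with equality.

```python
from fractions import Fraction

def transition_matrix(tau, rho):
    """P(tau, rho) as in (1); rows index X = 0, 1 and columns index Y = 0, 1."""
    return [[(1 + tau - rho) / 2, (1 - tau + rho) / 2],
            [(1 - tau - rho) / 2, (1 + tau + rho) / 2]]

def is_valid(tau, rho):
    """Condition (2): all entries of P(tau, rho) are non-negative."""
    return abs(rho) + abs(tau) <= 1

def is_degenerate(tau, rho):
    """Equality in (2): some entry of P(tau, rho) equals 1."""
    return abs(rho) + abs(tau) == 1

def relative_sufficiency(tau, rho):
    """sigma of (3), defined here only for 0 <= tau < 1."""
    if not 0 <= tau < 1:
        raise ValueError("sigma is defined here for 0 <= tau < 1")
    return rho / (1 - tau)

tau, rho = Fraction(1, 3), Fraction(2, 3)   # rho = 1 - tau: X = 1 is sufficient for Y = 1
print(transition_matrix(tau, rho), is_degenerate(tau, rho), relative_sufficiency(tau, rho))
```

In the printed example the second row of the matrix is (0, 1), so Y = 1 is certain under X ← 1 and σ takes its extreme value 1.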

Potential outcomes and causes of effects

While knowledge of the transition matrix P, and in particular the "average causal effect" τ, is directly relevant for EoC ("effects of causes") analysis, it is not enough to support CoE ("causes of effects") analysis. For this we need to introduce the pair of potential outcomes, $\mathbf{Y} = (Y_0, Y_1)$, where we conceive of $Y_x$ as the value Y would take if X ← x. We regard both $Y_0$ and $Y_1$ as existing simultaneously, even prior to setting the value of X, and as having a bivariate probability distribution.

We can now define the following events in terms of $\mathbf{Y}$ (where $\bar{x}$ denotes 1 − x, the value distinct from x, etc.):

General causation. $C^{(X,Y)}$ := "$Y_1 \neq Y_0$".

That is, changing the value of X will result in a change to the value of Y. We can also describe this as "X affects Y."

When the relevant variables X and Y are clear from the context we will simplify the notation to C.

Specific causation. $C^{(X,Y)}_{xy}$ := "$Y_x = y$, $Y_{\bar{x}} = \bar{y}$" (for x, y = 0 or 1).

That is, changing the value of X from x to $\bar{x}$ would change the value of Y from y to $\bar{y}$. We can also describe this as "X = x causes Y = y." When the relevant variables X and Y are clear from the context we will simplify the notation to $C_{xy}$.

We note that $C_{xy} = C_{\bar{x}\bar{y}}$.

Probability of Causation.

In cases of interest we will have observed X = x, Y = y, and want to know the probability that X caused Y, given this information. We denote this quantity by $\mathrm{PC}^{(X,Y)}_{xy}$, or $\mathrm{PC}_{xy}$ when the relevant variables X and Y are clear from the context. Thus

$$\mathrm{PC}_{xy} = \Pr(C \mid X = x, Y = y) = \Pr(C_{xy} \mid X = x, Y_x = y). \tag{4}$$

The joint distribution for $\mathbf{Y}$, while constrained by knowledge of the transition matrix P, is in general not fully determined by it. Rather, we can only deduce that it has the form of Table 1, where the marginal probabilities agree with (1) according to $\Pr(Y_x = y) = \Pr(Y = y \mid X \leftarrow x)$. However, the internal entries of Table 1 are not determined by P, but have one degree of freedom, expressed by the "slack" quantity ξ = ξ(P).

$$\begin{array}{c|cc|c}
 & Y_1 = 0 & Y_1 = 1 & \\ \hline
Y_0 = 0 & \tfrac12(1-\rho-\xi) & \tfrac12(\xi+\tau) & \tfrac12(1+\tau-\rho) \\
Y_0 = 1 & \tfrac12(\xi-\tau) & \tfrac12(1+\rho-\xi) & \tfrac12(1-\tau+\rho) \\ \hline
 & \tfrac12(1-\tau-\rho) & \tfrac12(1+\tau+\rho) & 1
\end{array}$$

Table 1: $\Pr(Y_0 = y_0, Y_1 = y_1)$

We see that

$$\xi = \Pr(Y_0 = 0, Y_1 = 1) + \Pr(Y_0 = 1, Y_1 = 0) = \Pr(C), \tag{5}$$

the probability of general causation.

The only constraints on ξ are that all internal entries of Table 1 must be non-negative, which holds if and only if

$$|\tau| \le \xi \le 1 - |\rho|. \tag{6}$$

In particular ξ, and thus the bivariate distribution of $(Y_0, Y_1)$ in Table 1, is uniquely determined by P if and only if P is degenerate. We further note

$$\Pr(C_{00}) = \Pr(C_{11}) = \tfrac12(\xi + \tau) \tag{7}$$

$$\Pr(C_{01}) = \Pr(C_{10}) = \tfrac12(\xi - \tau) \tag{8}$$

whence, by (6),

$$\max\{0, \tau\} \le \Pr(C_{00}) = \Pr(C_{11}) \le \tfrac12(1 + \tau - |\rho|) \tag{9}$$

$$\max\{0, -\tau\} \le \Pr(C_{01}) = \Pr(C_{10}) \le \tfrac12(1 - \tau - |\rho|). \tag{10}$$
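To make the role of the slack quantity concrete, the sketch below fills in Table 1 for a chosen ξ and checks it against the range (6). The function names are introduced here for illustration, and the numbers are those of the medicine example (τ = 1/3, ρ = 0).

```python
from fractions import Fraction

def xi_bounds(tau, rho):
    """Admissible range (6) for the slack quantity xi."""
    return abs(tau), 1 - abs(rho)

def potential_outcome_table(tau, rho, xi):
    """Joint distribution of (Y_0, Y_1) as in Table 1; rows are Y_0, columns are Y_1."""
    lo, hi = xi_bounds(tau, rho)
    if not lo <= xi <= hi:
        raise ValueError("xi violates (6)")
    return [[(1 - rho - xi) / 2, (xi + tau) / 2],
            [(xi - tau) / 2, (1 + rho - xi) / 2]]

tau, rho = Fraction(1, 3), Fraction(0)
for xi in xi_bounds(tau, rho):
    table = potential_outcome_table(tau, rho, xi)
    # table[0][1] is Pr(Y_0 = 0, Y_1 = 1) = Pr(C_11), see (7)
    print(xi, table, table[0][1])
```

At the two ends of the range, Pr(C_11) equals 1/3 and 2/3 respectively, which are the limits given by (9) for these values of τ and ρ.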

Throughout this article we shall assume no confounding, expressed mathematically as $X \perp\!\!\!\perp \mathbf{Y}$. Then

$$\mathrm{PC}_{xy} = \frac{\Pr(C_{xy})}{\Pr(Y_x = y)} = \frac{\Pr(C_{xy})}{\Pr(Y = y \mid X \leftarrow x)},$$

which is thus subject to the interval bounds given by (9) or (10), as appropriate, divided by the known entry Pr(Y = y | X ← x) of the transition matrix P.

This analysis delivers the following lower and upper bounds (prefix "s" for "simple"):

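$$\mathrm{sLB}_{00} := \frac{\max\{0, \tau\}}{\Pr(Y = 0 \mid X \leftarrow 0)} \le \mathrm{PC}_{00} \le \frac{\tfrac12(\tau + 1 - |\rho|)}{\Pr(Y = 0 \mid X \leftarrow 0)} =: \mathrm{sUB}_{00} \tag{11}$$

$$\mathrm{sLB}_{10} := \frac{\max\{0, -\tau\}}{\Pr(Y = 0 \mid X \leftarrow 1)} \le \mathrm{PC}_{10} \le \frac{\tfrac12(1 - |\rho| - \tau)}{\Pr(Y = 0 \mid X \leftarrow 1)} =: \mathrm{sUB}_{10} \tag{12}$$

$$\mathrm{sLB}_{01} := \frac{\max\{0, -\tau\}}{\Pr(Y = 1 \mid X \leftarrow 0)} \le \mathrm{PC}_{01} \le \frac{\tfrac12(1 - |\rho| - \tau)}{\Pr(Y = 1 \mid X \leftarrow 0)} =: \mathrm{sUB}_{01} \tag{13}$$

$$\mathrm{sLB}_{11} := \frac{\max\{0, \tau\}}{\Pr(Y = 1 \mid X \leftarrow 1)} \le \mathrm{PC}_{11} \le \frac{\tfrac12(\tau + 1 - |\rho|)}{\Pr(Y = 1 \mid X \leftarrow 1)} =: \mathrm{sUB}_{11}. \tag{14}$$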

In the absence of additional information, the above bounds constitute the best available inference regarding the probability of causation. Specifically, when τ ≥ 0, on defining

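$$\gamma := \frac{1 - \tau - |\rho|}{1 - \tau + |\rho|} = \frac{1 - |\sigma|}{1 + |\sigma|} \tag{15}$$

$$\delta := \frac{1 + \tau - |\rho|}{1 + \tau + |\rho|} \tag{16}$$

we have the following upper bounds.

For ρ ≥ 0:

$$\mathrm{sUB}_{00} = 1 \tag{17}$$

$$\mathrm{sUB}_{01} = \gamma \tag{18}$$

$$\mathrm{sUB}_{10} = 1 \tag{19}$$

$$\mathrm{sUB}_{11} = \delta \tag{20}$$

For ρ < 0:

$$\mathrm{sUB}_{00} = \delta \tag{21}$$

$$\mathrm{sUB}_{01} = 1 \tag{22}$$

$$\mathrm{sUB}_{10} = \gamma \tag{23}$$

$$\mathrm{sUB}_{11} = 1 \tag{24}$$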

Special case

A particular interest is in cases where τ > 0 (so the overall effect of X on Y is positive) and we observe positive outcomes, X = 1, Y = 1. In this case we omit the subscript 11. We have

$$\mathrm{PC} = \frac{\xi + \tau}{2 \Pr(Y = 1 \mid X \leftarrow 1)}, \tag{25}$$

and interval bounds given by

$$\mathrm{sLB} = \frac{2\tau}{1 + \tau + \rho} \le \mathrm{PC} \le \mathrm{sUB} = \begin{cases} \delta & (\rho \ge 0) \\ 1 & (\rho < 0). \end{cases} \tag{26}$$

This result agrees with Tian and Pearl (2000), Dawid (2011) and Dawid, Musio and Murtas (2017). PC is identified (i.e., the interval in (26) reduces to a single point) if and only if |ρ| = 1 − τ, which holds when P is degenerate with either the lower left or upper right element of P being 0. In the former case PC = τ, while in the latter case PC = 1.

More generally, we have sLB = τ / Pr(Y = 1 | X ← 1) ≥ τ , so PC ≥ τ .
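The simple bounds (11)-(14) are easy to evaluate numerically. The following is a minimal sketch under the assumptions of this section (no confounding, known P); the function names are hypothetical and the example values come from the medicine illustration in the Introduction.

```python
from fractions import Fraction

def transition_matrix(tau, rho):
    """P(tau, rho) as in (1); rows index X, columns index Y."""
    return [[(1 + tau - rho) / 2, (1 - tau + rho) / 2],
            [(1 - tau - rho) / 2, (1 + tau + rho) / 2]]

def simple_bounds(tau, rho, x, y):
    """Return (sLB_xy, sUB_xy) from (11)-(14) for an observed pair X = x, Y = y."""
    p_xy = transition_matrix(tau, rho)[x][y]
    if p_xy == 0:
        raise ValueError("the observed pair (x, y) has probability zero")
    # Pr(C_xy) involves +tau when x == y and -tau otherwise, see (7)-(10).
    sign = 1 if x == y else -1
    lower = max(0, sign * tau) / p_xy
    upper = (1 - abs(rho) + sign * tau) / (2 * p_xy)
    return lower, upper

# Medicine example: Pr(healthy) rises from 1/3 to 2/3, so tau = 1/3 and rho = 0.
print(simple_bounds(Fraction(1, 3), Fraction(0), 1, 1))
# Degenerate cases in which PC is identified:
print(simple_bounds(Fraction(1, 3), Fraction(2, 3), 1, 1))
print(simple_bounds(Fraction(1, 3), Fraction(-2, 3), 1, 1))
```

For the medicine example the interval is [1/2, 1]; in the two degenerate cases it collapses to the single points τ = 1/3 and 1, as stated after (26).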

Bounds from mediation

We now suppose that, in addition to X and Y, we can gather data on one or more binary mediator variables $M_1, \ldots, M_{n-1}$. We also define $M_0 \equiv X$ and $M_n \equiv Y$. We are interested in assessing the probability that X = x caused Y = y for a new case where we have information on the values of some or all of the mediators $M_1, \ldots, M_{n-1}$. We assume that the data are based on experiments, or in any case are such as to allow us to determine the one-step interventional probabilities $\Pr(M_{i+1} = m_{i+1} \mid M_i \leftarrow m_i)$, i = 0, …, n − 1. We shall here confine attention to the case of a complete mediation sequence, where

$$\Pr(M_{i+1} = m_{i+1} \mid M_j \leftarrow m_j,\ j = 0, \ldots, i) = \Pr(M_{i+1} = m_{i+1} \mid M_i \leftarrow m_i), \quad (i = 0, \ldots, n-1).$$

We shall further suppose that, for any new case considered, there is no confounding at every step, so that

$$\Pr(M_{i+1} = m_{i+1} \mid M_j = m_j,\ j = 0, \ldots, i) = \Pr(M_{i+1} = m_{i+1} \mid M_i \leftarrow m_i), \quad (i = 0, \ldots, n-1).$$

In this case the sequence of observations $(X \equiv M_0, \ldots, M_n \equiv Y)$ on a new case will form a (generally non-stationary) Markov chain. This is an empirically testable consequence of our assumptions, assumptions which would therefore be falsified if the Markov property is found to fail (although those assumptions are not guaranteed to be valid when it is found to hold).

Let the transition matrix from $M_{i-1}$ to $M_i$ be $P_i = P(\tau_i, \rho_i)$, and the overall transition matrix from X to Y be P = P(τ, ρ). We shall write

$$P = P_1 \mid P_2 \mid \ldots \mid P_n \tag{27}$$

to indicate that we are assuming the above mediation sequence, and refer to (27) as a decomposition of the matrix P. In particular we then have $P = P^{(n)} := \prod_{i=1}^{n} P_i$. We can readily show by induction that

$$\tau = \tau^{(n)} := \prod_{i=1}^{n} \tau_i \tag{28}$$

$$\rho = \rho^{(n)} := \sum_{i=1}^{n} \rho_i \prod_{j=i+1}^{n} \tau_j. \tag{29}$$

In particular, for the case n = 2, (29) becomes

$$\rho = \rho_1 \tau_2 + \rho_2. \tag{30}$$

On account of (28) we have the following result:

Theorem 1 The average causal effect of X on Y is the product of the successive average causal effects of each variable in the sequence on the following one.
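Formulas (28) and (29) can be checked against an explicit matrix product. The sketch below does so for an arbitrary three-step chain; the step values and the helper names `compose` and `matmul2` are assumptions chosen only for this illustration.

```python
from fractions import Fraction

def transition_matrix(tau, rho):
    """P(tau, rho) as in (1)."""
    return [[(1 + tau - rho) / 2, (1 - tau + rho) / 2],
            [(1 - tau - rho) / 2, (1 + tau + rho) / 2]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def compose(steps):
    """Overall (tau, rho) for steps [(tau_1, rho_1), ..., (tau_n, rho_n)] via (28)-(29).

    Appending one step (t, r) maps (tau, rho) to (tau * t, rho * t + r),
    which is the n = 2 rule (30) applied repeatedly.
    """
    tau, rho = 1, 0   # P(1, 0) is the identity matrix
    for t, r in steps:
        tau, rho = tau * t, rho * t + r
    return tau, rho

steps = [(Fraction(3, 5), Fraction(1, 5)),
         (Fraction(1, 2), Fraction(1, 10)),
         (Fraction(4, 5), Fraction(-1, 10))]
product = [[1, 0], [0, 1]]
for t, r in steps:
    product = matmul2(product, transition_matrix(t, r))

tau, rho = compose(steps)
print(tau, rho, product == transition_matrix(tau, rho))
```

The overall τ printed here is the product of the three one-step effects, in line with Theorem 1.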

Again, to conduct CoE rather than EoC analysis, we introduce, for i ≥ 1, bivariate variables

$$\mathbf{M}_i := (M_{i0}, M_{i1})$$

where $M_{im}$ denotes the potential value of $M_i$ under $M_{i-1} \leftarrow m$, supposed unaffected by values of previous M's. We further assume that the variable $\mathbf{M}_i$ is common to all the various worlds, whether actual or counterfactual, under consideration. The actually realised values $(M_i)$ satisfy

$$M_i = M_{i, M_{i-1}}.$$

As the expression of our "no confounding" assumptions, we impose mutual independence between X, $\mathbf{M}_1, \ldots, \mathbf{M}_n$.

Theorem 2 $C^{(X,Y)} = \bigcap_{i=0}^{n-1} C^{(M_i, M_{i+1})}$. That is to say, $M_0 \equiv X$ affects $M_n \equiv Y$ if and only if each $M_i$ affects the next.

Proof. Suppose first that each variable affects the next. Then changing the value of X will change that of $M_1$, which in turn will change that of $M_2$, and so on until the value of Y is changed, so showing that X affects Y. Conversely, if, for some j < n, $M_j$ does not affect $M_{j+1}$, then, whether or not $M_j$ has been changed, the value of $M_{j+1}$ will be unchanged, whence so too will that of $M_{j+2}$, and so on until the value of Y is unchanged, whence X does not affect Y.

Corollary 1

(i). $\Pr(C^{(X,Y)}) = \prod_{i=1}^{n} \Pr(C^{(M_{i-1}, M_i)})$

(ii). $\xi(P) = \prod_{i=1}^{n} \xi(P_i)$

(iii). Given the detailed information on the decomposition (27), the constraints on ξ = ξ(P) are now:

$$|\tau| \le \xi \le \prod_{i=1}^{n} (1 - |\rho_i|). \tag{31}$$

Proof.

(i) By the assumed mutual independence of the $(\mathbf{M}_i)$.

(ii) By (5).

(iii) By (ii), (6) for each $P_i$, and (28).

On account of (i) we have:

Corollary 2 For any decomposition, the probability that X affects Y is the product of the probabilities that each variable in the sequence from X to Y affects the next in the sequence.

On comparing (31) with (6), we see that detailed knowledge of the mediation process has not changed the lower bound for ξ. However, the upper bound is typically reduced:

Theorem 3 The upper bound of (31), which takes into account the decomposition (27), does not exceed the upper bound of (6), which ignores the decomposition. It will be strictly less if all the $P_i$ are non-degenerate with $\rho_i \neq 0$.

Proof. Consider first the case n = 2. Then

$$|\rho| = |\rho_1 \tau_2 + \rho_2| \quad \text{by (30)}$$

$$\le |\rho_1||\tau_2| + |\rho_2| \tag{32}$$

$$\le |\rho_1|(1 - |\rho_2|) + |\rho_2| \quad \text{by (2)}. \tag{33}$$

It follows that

$$(1 - |\rho_1|)(1 - |\rho_2|) \le 1 - |\rho|. \tag{34}$$

Moreover, we shall have strict inequality in (33), and hence also in (34), if $P_2$ is non-degenerate and $\rho_1 \neq 0$.

The result for general n follows easily by induction.

We note that the above condition for strict inequality in (34), while sufficient, is not necessary. For example, in the case n = 2 it will also hold if $\rho_1 \tau_2$ and $\rho_2$ have different signs, since then we would have strict inequality in (32).

It follows from (31) and (34) that collapsing two mediators into a single one can only increase the upper bound for ξ:

Corollary 3 Consider two decompositions $P = P_1 \mid P_2 \mid \ldots \mid P_n$ and $P = P_1 \mid \ldots \mid P_{i-1} \mid Q \mid P_{i+2} \mid \ldots \mid P_n$, where $Q = P_i P_{i+1}$.

Then the upper bound for ξ for the former does not exceed that for the latter.

Bounds when mediators are unobserved

Suppose first that, for the new case, we have observed X = x, Y = y, but the values of the mediators are not observed. Even in this case, as shown for the two-term decomposition in Dawid, Murtas and Musio (2016), knowledge of the decomposition (27) of P can alter the bounds for PC.

Indeed, in this case (4) still applies, where $\Pr(C_{xy})$ is given by (7) or (8) as appropriate, but now with ξ subject to the revised bounds of (31). In each case the lower bound is unaffected, but, by Theorem 3, the upper bound is reduced.

This analysis delivers the following revised bounds (prefix "u" for "unobserved mediators"):

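$$\mathrm{uLB}_{00} := \mathrm{sLB}_{00} = \frac{\max\{0, \tau\}}{\Pr(Y = 0 \mid X \leftarrow 0)} \le \mathrm{PC}_{00} \le \frac{\tau + \prod_{i=1}^{n}(1 - |\rho_i|)}{2 \Pr(Y = 0 \mid X \leftarrow 0)} =: \mathrm{uUB}_{00} \tag{35}$$

$$\mathrm{uLB}_{10} := \mathrm{sLB}_{10} = \frac{\max\{0, -\tau\}}{\Pr(Y = 0 \mid X \leftarrow 1)} \le \mathrm{PC}_{10} \le \frac{\prod_{i=1}^{n}(1 - |\rho_i|) - \tau}{2 \Pr(Y = 0 \mid X \leftarrow 1)} =: \mathrm{uUB}_{10} \tag{36}$$

$$\mathrm{uLB}_{01} := \mathrm{sLB}_{01} = \frac{\max\{0, -\tau\}}{\Pr(Y = 1 \mid X \leftarrow 0)} \le \mathrm{PC}_{01} \le \frac{\prod_{i=1}^{n}(1 - |\rho_i|) - \tau}{2 \Pr(Y = 1 \mid X \leftarrow 0)} =: \mathrm{uUB}_{01} \tag{37}$$

$$\mathrm{uLB}_{11} := \mathrm{sLB}_{11} = \frac{\max\{0, \tau\}}{\Pr(Y = 1 \mid X \leftarrow 1)} \le \mathrm{PC}_{11} \le \frac{\tau + \prod_{i=1}^{n}(1 - |\rho_i|)}{2 \Pr(Y = 1 \mid X \leftarrow 1)} =: \mathrm{uUB}_{11} \tag{38}$$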

Special case

In particular, for the case τ > 0, where we observe X = 1, Y = 1 (but the values of mediators are not observed), we have revised bounds

$$\mathrm{uLB} := \frac{2\tau}{1 + \tau + \rho} \le \mathrm{PC} \le \frac{\tau + \prod_{i=1}^{n}(1 - |\rho_i|)}{1 + \tau + \rho} =: \mathrm{uUB}. \tag{39}$$

For n = 2 this agrees with the analysis of Dawid, Murtas and Musio (2016).
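A short numerical sketch of (39) shows how knowing the decomposition, without observing any mediator, can lower the upper bound. The two-step chain used here is an arbitrary assumption for illustration, as are the function names.

```python
from fractions import Fraction
from math import prod

def compose(steps):
    """Overall (tau, rho) for steps [(tau_i, rho_i), ...] via (28)-(29)."""
    tau, rho = 1, 0
    for t, r in steps:
        tau, rho = tau * t, rho * t + r
    return tau, rho

def unobserved_bounds(steps):
    """(uLB, uUB) of (39): X = Y = 1 observed, tau > 0, mediators unobserved."""
    tau, rho = compose(steps)
    xi_max = prod(1 - abs(r) for _, r in steps)   # upper limit on xi from (31)
    denom = 1 + tau + rho                         # equals 2 Pr(Y = 1 | X <- 1)
    return 2 * tau / denom, (tau + xi_max) / denom

two_step = [(Fraction(3, 5), Fraction(1, 5)), (Fraction(1, 2), Fraction(1, 10))]
collapsed = [compose(two_step)]   # same overall P, decomposition ignored

print(unobserved_bounds(collapsed))  # simple bounds (26)
print(unobserved_bounds(two_step))   # bounds using the decomposition
```

With a single step the function reproduces the simple bounds (26); with the two-step decomposition the lower bound is unchanged and the upper bound falls, as Theorem 3 implies when the ρ's are non-zero and the steps non-degenerate.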

Bounds when some or all mediators are observed

Now suppose that, in addition to X = x, Y = y, we also observe data on k mediators (0 ≤ k ≤ n − 1) for the new case. In particular we observe $M_{i_r} = m_{i_r}$, for $0 < i_1 < \ldots < i_r < \ldots < i_k < n$. For notational simplicity we write $\widetilde{M}_r$ for $M_{i_r}$, $\widetilde{m}_r$ for $m_{i_r}$. We also identify $\widetilde{M}_0 \equiv X$ and $\widetilde{M}_{k+1} \equiv Y$ (so $\widetilde{m}_0 = x$, $\widetilde{m}_{k+1} = y$).

The relevant probability of causation is now

$$\widetilde{\mathrm{PC}}_{xy} := \Pr\left(C \mid \widetilde{M}_r = \widetilde{m}_r,\ r = 0, \ldots, k+1\right).$$

Note that in contrast to the difference between (35)-(38) on the one hand and (11)-(14) on the other hand, which relate to the same quantity $\mathrm{PC}_{xy}$ but express different conclusions about it, $\widetilde{\mathrm{PC}}_{xy}$ is a genuinely different quantity from $\mathrm{PC}_{xy}$, since it conditions on different information about the new case.

Theorem 4 Given observations on $X, \widetilde{M}_1, \ldots, \widetilde{M}_k, Y$, the probability that X caused Y is given by the product of the probabilities that each observed term in the sequence caused the next observed term:

$$\widetilde{\mathrm{PC}}_{xy} = \prod_{r=0}^{k} \mathrm{PC}^{(\widetilde{M}_r, \widetilde{M}_{r+1})}_{\widetilde{m}_r \widetilde{m}_{r+1}}. \tag{40}$$

Now since we have the decomposition information about the mediators (if any) occurring between $\widetilde{M}_r \equiv M_{i_r}$ and $\widetilde{M}_{r+1} \equiv M_{i_{r+1}}$, but not their values for the new case, the bounds on any factor in (40) will, mutatis mutandis, have the form of the relevant expressions for $\mathrm{uLB}_{xy}$ and $\mathrm{uUB}_{xy}$, as displayed in (35)-(38). Then the overall lower [resp., upper] bound on $\widetilde{\mathrm{PC}}_{xy}$ will be the product of these lower [resp., upper] bounds, across all terms. This procedure supplies a complete recipe for determining the appropriate bounds on $\widetilde{\mathrm{PC}}_{xy}$ in the knowledge of the full decomposition of P and the values of the observed mediators for the new case.
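The recipe just described can be sketched in a few lines: split the chain at the observed nodes, bound each segment with (35)-(38), and multiply. This is an illustrative implementation under the paper's assumptions (complete mediation, no confounding); the function names and the dictionary encoding of the observations are choices made here, not the authors'. The example chain is a two-step decomposition of the medicine example (τ = 1/3, ρ = 0) in which X = 1 is necessary for M = 1 and M = 1 is sufficient for Y = 1.

```python
from fractions import Fraction
from math import prod

def transition_matrix(tau, rho):
    """P(tau, rho) as in (1): rows are the earlier variable, columns the later one."""
    return [[(1 + tau - rho) / 2, (1 - tau + rho) / 2],
            [(1 - tau - rho) / 2, (1 + tau + rho) / 2]]

def compose(steps):
    """Collapse consecutive steps [(tau_i, rho_i), ...] via (28)-(29)."""
    tau, rho = 1, 0
    for t, r in steps:
        tau, rho = tau * t, rho * t + r
    return tau, rho

def segment_bounds(steps, a, b):
    """Bounds (35)-(38) that observed value a caused the next observed value b."""
    tau, rho = compose(steps)
    xi_max = prod(1 - abs(r) for _, r in steps)   # upper limit on xi from (31)
    p_ab = transition_matrix(tau, rho)[a][b]
    if p_ab == 0:
        raise ValueError("observed pair has probability zero")
    sign = 1 if a == b else -1
    return max(0, sign * tau) / p_ab, (xi_max + sign * tau) / (2 * p_ab)

def pc_bounds(steps, observed):
    """Bounds from (40). observed maps node index (0 = X, len(steps) = Y) to 0 or 1."""
    idx = sorted(observed)
    if idx[0] != 0 or idx[-1] != len(steps):
        raise ValueError("X and Y must both be observed")
    lower = upper = 1
    for i, j in zip(idx, idx[1:]):
        lo, hi = segment_bounds(steps[i:j], observed[i], observed[j])
        lower, upper = lower * lo, upper * hi
    return lower, upper

half, third = Fraction(1, 2), Fraction(1, 3)
steps = [(half, -half), (2 * third, third)]      # overall tau = 1/3, rho = 0
print(pc_bounds(steps, {0: 1, 2: 1}))            # mediator unobserved
print(pc_bounds(steps, {0: 1, 1: 1, 2: 1}))      # mediator observed at 1
print(pc_bounds(steps, {0: 1, 1: 0, 2: 1}))      # mediator observed at 0
```

Because both steps of this particular chain are degenerate, each interval collapses to a point: 1/2 with the mediator unobserved, 2/3 with positive evidence, and 0 with negative evidence.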

Special cases

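Again consider the case τ > 0, X = Y = 1. On account of (28) we can, after possibly switching the labels 0 and 1 for some of the $M_i$'s, take $\tau_i > 0$, all i. We assume henceforth that this is the case. Writing $\widetilde{M}_r$ for the r-th observed variable and $\widetilde{m}_r$ for its observed value, the above procedure then delivers lower bound 0 unless $\widetilde{m}_r = \widetilde{m}_{r-1}$, all r, so that $\widetilde{m}_r = 1$, all r. In that case we obtain lower bound (with prefix "o" for "observed mediators"):

$$\mathrm{oLB} := \frac{\tau}{\prod_{r=0}^{k} \Pr\left(\widetilde{M}_{r+1} = 1 \mid \widetilde{M}_r = 1\right)} = \frac{\tau}{\Pr\left(Y = 1,\ \widetilde{M}_r = \widetilde{m}_r,\ r = 1, \ldots, k \mid X = 1\right)}. \tag{41}$$

$$\begin{array}{ll|l|l|l}
 & & \text{No evidence} & \text{Positive evidence} & \text{Mixed evidence} \\ \hline
\text{Largest} & \text{Upper} & \mathrm{uUB} = \frac{1+\tau-|\rho|}{1+\tau+\rho} & \mathrm{oUB} = \min\{1, 1-\rho\} & \mathrm{mUB} = 1 \\
 & \text{Lower} & \mathrm{uLB} = \frac{2\tau}{1+\tau+\rho} & \mathrm{oLB} = \frac{1+\tau-\rho}{2} & \mathrm{mLB} = 0 \\ \hline
\text{Smallest} & \text{Upper} & \mathrm{uUB} = \frac{2\tau}{1+\tau+\rho}\ (*) & \mathrm{oUB} = \frac{2\tau}{1+\tau+\rho}\ (*) & \mathrm{mUB} = 0\ (*) \\
 & \text{Lower} & \mathrm{uLB} = \frac{2\tau}{1+\tau+\rho} & \mathrm{oLB} = \frac{2\tau}{1+\tau+\rho} & \mathrm{mLB} = 0
\end{array}$$

Table 2: Largest and smallest achievable upper and lower bounds from decompositions of any length, given no mediators observed, positive evidence observed for all mediators, or mixed evidence observed. (*) indicates that PC can be identified.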

It is easy to see that this lower bound can only increase if we introduce further observed mediators. It follows that the smallest lower bound occurs when there are no observed mediators, when it reduces to uLB = sLB as in (39) and (26); while the largest lower bound occurs when all mediators are observed (all taking value 1), that is to say, when there is positive evidence for every link in the mediation chain.

In the remainder of this paper we shall give special attention to this case, and write simply $\widetilde{\mathrm{PC}}$ for $\widetilde{\mathrm{PC}}_{11}$ (the probability of causation given the observed mediator values), etc. The bounds for $\widetilde{\mathrm{PC}}$ are then:

$$\mathrm{oLB} := \prod_{i=1}^{n} \frac{2\tau_i}{1 + \tau_i + \rho_i} \le \widetilde{\mathrm{PC}} \le \prod_{i=1}^{n} \frac{1 + \tau_i - |\rho_i|}{1 + \tau_i + \rho_i} =: \mathrm{oUB}. \tag{42}$$

The following result follows directly from the above considerations:

Lemma 1 The lower bound oLB of (42) is at least as large as the lower bound sLB of (26).

It is not, however, always the case that oUB ≤ sUB: see (45) below.

Implications

Equation (40) provides a general formula for calculating bounds on the probability of causation for any pattern of data observed on mediating variables (including no data). We now derive implications from this analysis.

Largest and smallest upper and lower bounds

Consider an arbitrary decomposition of P:

$$P = P_1 \mid P_2 \mid \ldots \mid P_n, \tag{43}$$

with P = P(τ, ρ), $P_i = P(\tau_i, \rho_i)$. We restrict attention to the case τ > 0 and assume that variables are labeled so that each $\tau_i > 0$.

We investigate the smallest and largest achievable values for uLB, uUB, oLB, oUB, mLB, mUB (prefix m for mixed evidence) and show that in each case these are achievable by decompositions involving at most one mediator.

Theorem 5 Let the (known, fixed) transition matrix from X to Y be P = P(τ, ρ), with τ > 0 and |ρ| < 1 − τ. The largest and smallest upper and lower bounds from any complete mediation process for the case with mediators unobserved, for the case with positive outcomes on all mediators observed, and for mixed cases that include some negative evidence on the mediators, are as given in Table 2. These can all be achieved by decompositions of length 1 or 2.

Proof. See Appendix A.

The largest upper bound with mediators unobserved, uUB, can be achieved without any mediators. Since unobserved mediators do not alter the lower bound, the largest and smallest values of uLB both equal sLB. In addition, the smallest value of uUB equals sLB, which is achievable, for example, from the following decomposition:

$$P = \begin{pmatrix} \frac{2\tau}{1+\tau+\rho} & \frac{1-\tau+\rho}{1+\tau+\rho} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \frac{1-\tau-\rho}{2} & \frac{1+\tau+\rho}{2} \end{pmatrix}. \tag{44}$$

Note that with this decomposition PC is identified via two degenerate transition matrices: X = 1 is a sufficient condition for M = 1, while M = 1 is a necessary condition for Y = 1.

The smallest upper and lower bounds available when mediators are observed agree with the simple lower bound. Positive evidence cannot reduce the lower bound, but it can reduce the upper bound to the lower bound, at which point PC is identified. This can be achieved by the same decomposition given in (44).

The largest upper bound with positive evidence on mediators, oUB, can exceed the simple upper bound when ρ > 0. It is achieved by the following two-term decomposition, involving a single mediator:

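$$P = \begin{pmatrix} \frac{1-\rho+\tau}{2(1-\rho)} & \frac{1-\rho-\tau}{2(1-\rho)} \\ \frac{1-\rho-\tau}{2(1-\rho)} & \frac{1-\rho+\tau}{2(1-\rho)} \end{pmatrix} \begin{pmatrix} 1-\rho & \rho \\ 0 & 1 \end{pmatrix}. \tag{45}$$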

The lower bound can be raised with positive information on mediators, and takes its largest value with the following degenerate two-term decomposition P = P 1 | P 2 , involving a single mediator:

$$P = \begin{pmatrix} 1 & 0 \\ \frac{1-\tau-\rho}{1+\tau-\rho} & \frac{2\tau}{1+\tau-\rho} \end{pmatrix} \begin{pmatrix} \frac{1+\tau-\rho}{2} & \frac{1-\tau+\rho}{2} \\ 0 & 1 \end{pmatrix}. \tag{46}$$

With this decomposition PC is identified via two degenerate transition matrices: in this case X = 1 is a necessary condition for M = 1, while M = 1 is a sufficient condition for Y = 1. The largest lower bound with positive evidence from this decomposition is $\frac{1+\tau-\rho}{2}$, which can fall far short of 1, implying that in general mediators cannot provide "smoking gun" evidence that X = 1 caused Y = 1.
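The decomposition (46) can be verified numerically. The sketch below builds the two factors for given (τ, ρ), confirms that their product recovers P(τ, ρ), and evaluates the positive-evidence lower bound as a product of one-step lower bounds. Function names are introduced here for illustration only.

```python
from fractions import Fraction

def transition_matrix(tau, rho):
    """P(tau, rho) as in (1)."""
    return [[(1 + tau - rho) / 2, (1 - tau + rho) / 2],
            [(1 - tau - rho) / 2, (1 + tau + rho) / 2]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def necessary_then_sufficient(tau, rho):
    """The two factors of (46): X = 1 necessary for M = 1, M = 1 sufficient for Y = 1."""
    d = 1 + tau - rho
    p1 = [[1, 0], [(1 - tau - rho) / d, 2 * tau / d]]
    p2 = [[d / 2, (1 - tau + rho) / 2], [0, 1]]
    return p1, p2

def step_lower_bound(p):
    """One-step lower bound tau_i / Pr(next = 1 | previous = 1), cf. (14)."""
    tau_step = p[1][1] - p[0][1]
    return tau_step / p[1][1]

tau, rho = Fraction(1, 3), Fraction(0)      # the medicine example
p1, p2 = necessary_then_sufficient(tau, rho)
print(matmul2(p1, p2) == transition_matrix(tau, rho))
print(step_lower_bound(p1) * step_lower_bound(p2), (1 + tau - rho) / 2)
```

For the medicine example the bound is 2/3, matching the value quoted in the Introduction for a "necessary condition for a sufficient condition".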

For the case with mixed evidence on the mediators the lower bound is always 0. The smallest upper bound is also 0, which can be achieved by the decomposition (46) above, with the single mediator observed at 0 (the key feature of this decomposition is that Y = 1 cannot be caused by M = 0). In this case PC is identified at 0, showing that it is possible for negative data on mediators to provide "hoop" evidence that X = 1 did not cause Y = 1. The highest upper bound, mUB = 1, can be achieved by a two-step decomposition $P(\tau, \rho) = P(\tau_1, \rho_1) \mid P(\tau_2, \rho_2)$, with the mediator taking value 0. For ρ ≤ 0 this occurs with the decomposition with parameters

$$\tau_1 = \frac{2\tau}{1 + \tau + \rho}, \quad \rho_1 = 0, \quad \tau_2 = \frac{1 + \tau + \rho}{2}, \quad \rho_2 = \rho. \tag{47}$$

For ρ ≥ 0 it occurs with the decomposition parameterized by

$$\tau_1 = \frac{\tau(1 + \rho + \tau)}{2(\tau + \rho)}, \quad \rho_1 = \frac{\rho(1 + \rho + \tau)}{2(\tau + \rho)}, \quad \tau_2 = \frac{2(\tau + \rho)}{1 + \tau + \rho}, \quad \rho_2 = 0. \tag{48}$$

Homogeneous transitions

Throughout this section we confine attention to the special case τ > 0, X = Y = 1. We specialize further to the case of a constant one-step transition matrix, $P_i = P' = P(\tau', \rho')$ for all i. We define $\sigma'$, $\gamma'$, $\delta'$ in terms of $\tau'$ and $\rho'$ in parallel to (3), (15) and (16). In this case, by (28) and (29), we have

$$
\tau = (\tau')^n \tag{49}
$$

$$
\rho = \rho' \times \frac{1-(\tau')^n}{1-\tau'} = \rho' \times \frac{1-\tau}{1-\tau'}. \tag{50}
$$

In particular, we note that the relative sufficiency of X for Y is preserved at each intermediate step:

$$
\sigma' = \rho'/(1-\tau') = \rho/(1-\tau) = \sigma.
$$

It follows that $\gamma' = \gamma$.

We have

$$
\tau' = \tau^{1/n} \tag{51}
$$

$$
\rho' = \rho \times \frac{1-\tau^{1/n}}{1-\tau}. \tag{52}
$$

Note that, for large n, $\tau'$ must be close to 1 and $\rho'$ close to 0, with the same sign as ρ.
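To make the relationship between the overall parameters and the one-step parameters concrete, here is a minimal sketch of (49) to (52). It assumes 0 < τ < 1 and uses floating-point arithmetic; the names `one_step` and `recompose` and the sample values τ = 0.2, ρ = 0.3 are illustrative. The loop shows the one-step τ′ approaching 1 and ρ′ approaching 0 as n grows, while the ratio ρ′/(1 − τ′) stays equal to σ.

```python
def one_step(tau, rho, n):
    """One-step parameters (tau', rho') of an n-step homogeneous chain, per (51)-(52)."""
    tau1 = tau ** (1.0 / n)
    rho1 = rho * (1 - tau1) / (1 - tau)
    return tau1, rho1

def recompose(tau1, rho1, n):
    """Overall (tau, rho) after n identical steps, per (49)-(50)."""
    tau = tau1 ** n
    rho = rho1 * (1 - tau) / (1 - tau1)
    return tau, rho

tau, rho = 0.2, 0.3                     # illustrative values with |rho| < 1 - tau
sigma = rho / (1 - tau)
for n in (1, 2, 10, 100):
    tau1, rho1 = one_step(tau, rho, n)
    tau_back, rho_back = recompose(tau1, rho1, n)
    sigma1 = rho1 / (1 - tau1)
    print(n, round(tau1, 4), round(rho1, 4),
          abs(tau_back - tau) < 1e-9,
          abs(rho_back - rho) < 1e-9,
          abs(sigma1 - sigma) < 1e-9)
```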

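Using (51) and (52) in (39) and (42) yields the following bounds for a homogeneous process:

$$
\mathrm{uLB}_n = \frac{2\tau}{1+\tau+\rho} \tag{53}
$$

$$
\mathrm{uUB}_n = \frac{\tau + (1-|\rho'|)^n}{1+\tau+\rho} \tag{54}
$$

$$
\mathrm{oLB}_n = \tau \left( \frac{2}{1+\tau'+\rho'} \right)^{n} \tag{55}
$$

$$
\mathrm{oUB}_n = \begin{cases} (\delta')^n & (\rho \ge 0) \\ 1 & (\rho < 0). \end{cases} \tag{56}
$$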

In particular, for the degenerate cases $|\rho| = 1-\tau$, so that $|\rho'| = 1-\tau'$, we see that, for all n, the probability of causation is identified both when the mediators are unobserved and when they are all observed at 1: at τ when ρ = 1 − τ, and at 1 when ρ = τ − 1. The existence of the mediators is irrelevant in these cases.
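The closed forms (53) to (56) are easy to evaluate numerically. The sketch below does so under two assumptions: 0 < τ < 1, and δ′ = (1 + τ′ − ρ′)/(1 + τ′ + ρ′), which is the form that appears for the two-step case in the proof of Lemma 3 (definition (16) itself is not restated in this section). The function name and dictionary keys are hypothetical. It then checks that n = 1 reproduces the simple bounds and that both degenerate cases collapse to a single value.

```python
def homogeneous_bounds(tau, rho, n):
    """Bounds (53)-(56) for an n-step homogeneous chain with overall parameters (tau, rho)."""
    tau1 = tau ** (1.0 / n)                      # (51)
    rho1 = rho * (1 - tau1) / (1 - tau)          # (52)
    delta1 = (1 + tau1 - rho1) / (1 + tau1 + rho1)   # assumed form of delta'
    return {
        "uLB": 2 * tau / (1 + tau + rho),                          # (53)
        "uUB": (tau + (1 - abs(rho1)) ** n) / (1 + tau + rho),     # (54)
        "oLB": tau * (2 / (1 + tau1 + rho1)) ** n,                 # (55)
        "oUB": delta1 ** n if rho >= 0 else 1.0,                   # (56)
    }

# n = 1: no mediators, so observed and unobserved bounds coincide with the simple bounds.
print(homogeneous_bounds(0.2, 0.1, 1))

# Degenerate cases: rho = 1 - tau (all bounds equal tau) and rho = tau - 1 (all bounds equal 1).
print(homogeneous_bounds(0.2, 0.8, 7))
print(homogeneous_bounds(0.2, -0.8, 7))

# A non-degenerate case for several chain lengths.
for n in (1, 2, 5, 50):
    print(n, {k: round(v, 4) for k, v in homogeneous_bounds(0.2, 0.1, n).items()})
```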

Mixed evidence

Here we assume the process is non-degenerate.

For the case with some negative evidence the lower bound, $\mathrm{mLB}_n$ say, is always 0, as noted in Section 3.4. The upper bound, however, depends on the particular pattern of positive and negative evidence. For any sequence s of observations on consecutive mediators (allowing $M_0 \equiv X$ and $M_n \equiv Y$, both required to take value 1), denote the associated upper bound by UB(s). Let s denote a full sequence of observations (i.e., on all n + 1 mediators). We search for a full sequence $s_0$ yielding the maximum value, $\mathrm{mUB}_n$ say, of UB(s).

Theorem 6 For large enough n, we have

$$
\mathrm{mUB}_n = \begin{cases} (\gamma')^{n/2} & (n \text{ even}) \\ (\gamma')^{(n-1)/2}\,\delta' & (n \text{ odd}). \end{cases}
$$

The optimal sequence $s_0$ alternates 10101..., except, if n is odd, for the final 2 symbols.

Proof. See Appendix B.

For ρ = 0 the smallest possible upper bound is 1 for all n. Otherwise, $\mathrm{mUB}_n \to 0$ as n → ∞. Then with alternating evidence on many mediators the associated probability of causation, PC say, is effectively identified as 0.

Figure 1 plots the intervals $[\mathrm{uLB}_n, \mathrm{uUB}_n]$, $[\mathrm{oLB}_n, \mathrm{oUB}_n]$ and $[\mathrm{mLB}_n, \mathrm{mUB}_n]$ for a range of cases. It highlights how modest the gains from repeated observation of homogeneous mediators are, and how alternating evidence can tighten bounds as long as ρ ≠ 0.

Unboundedly many mediators

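We now consider the behaviour of the bounds when we have a potentially unlimited sequence of variables directly mediating between X and Y, still assuming identical one-step transition matrices. Our results are given in Theorem 7.

Theorem 7

$$
\mathrm{uLB}_\infty := \lim_{n\to\infty} \mathrm{uLB}_n = \frac{2\tau}{1+\tau+\rho} \tag{57}
$$

$$
\mathrm{uUB}_\infty := \lim_{n\to\infty} \mathrm{uUB}_n = \frac{\tau + \tau^{|\sigma|}}{1+\tau+\rho} \tag{58}
$$

$$
\mathrm{oLB}_\infty := \lim_{n\to\infty} \mathrm{oLB}_n = \tau^{\frac{1}{2}(1+\sigma)} \tag{59}
$$

$$
\mathrm{oUB}_\infty := \lim_{n\to\infty} \mathrm{oUB}_n = \min\{1, \tau^{\sigma}\} \tag{60}
$$

$$
\mathrm{mLB}_\infty := \lim_{n\to\infty} \mathrm{mLB}_n = 0 \tag{61}
$$

$$
\mathrm{mUB}_\infty := \lim_{n\to\infty} \mathrm{mUB}_n = \begin{cases} 0 & \text{if } \rho \ne 0 \\ 1 & \text{if } \rho = 0. \end{cases} \tag{62}
$$

Proof. See Appendix C.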
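The limits (58) to (60) can be made plausible with a short calculation. What follows is a heuristic sketch rather than the Appendix C proof. It assumes δ′ = (1 + τ′ − ρ′)/(1 + τ′ + ρ′), the form used for the two-step case in the proof of Lemma 3, and the shorthand ε_n is introduced here only for convenience.

```latex
% Heuristic limits for Theorem 7 (a sketch, not the Appendix C proof).
% Write \varepsilon_n = 1 - \tau' = 1 - \tau^{1/n}, so that n\varepsilon_n \to -\log\tau,
% and \rho' = \sigma\varepsilon_n by (52). For any constant c,
\[
  (1 - c\,\varepsilon_n)^n \;\longrightarrow\; e^{c\log\tau} = \tau^{c}.
\]
% Applying this with c = |\sigma|, c = (1-\sigma)/2 and c = (1+\sigma)/2:
\begin{align*}
  (1-|\rho'|)^n &= (1-|\sigma|\varepsilon_n)^n \to \tau^{|\sigma|}, \\
  \mathrm{oLB}_n &= \tau\,\bigl(1-\tfrac12(1-\sigma)\varepsilon_n\bigr)^{-n}
      \to \tau\cdot\tau^{-(1-\sigma)/2} = \tau^{(1+\sigma)/2}, \\
  (\delta')^n &= \left(\frac{1-\tfrac12(1+\sigma)\varepsilon_n}{1-\tfrac12(1-\sigma)\varepsilon_n}\right)^{n}
      \to \frac{\tau^{(1+\sigma)/2}}{\tau^{(1-\sigma)/2}} = \tau^{\sigma}.
\end{align*}
```

Substituting the first limit into (54) gives (58), the second is (59), and the third gives (60) for ρ ≥ 0; for ρ < 0 we have σ < 0, so that τ^σ > 1 and the minimum in (60) is 1.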

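In particular, for ρ = 0 we have

$$
0 = \mathrm{mLB}_\infty < \mathrm{uLB}_\infty = 2\tau/(1+\tau) \le \mathrm{oLB}_\infty = \tau^{1/2}, \quad \text{and} \quad \mathrm{uUB}_\infty = \mathrm{oUB}_\infty = \mathrm{mUB}_\infty = 1.
$$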

Proposition 1 For $|\rho| < 1-\tau$, $\mathrm{oLB}_n$ is a concave strictly increasing function of n, and $\mathrm{uUB}_n$ and (for ρ > 0) $\mathrm{oUB}_n$ are both convex strictly decreasing functions of n.

We do not have a full proof of Proposition 1. Supporting evidence is given by numerous plots of $\mathrm{oLB}_n$ and $\mathrm{oUB}_n$ against n for various (τ, ρ) pairs, and the following two results, which are proved in Appendix C.

Lemma 2 If $|\rho| < 1-\tau$, then $\mathrm{oLB}_n$ is a concave increasing function of n, and $\mathrm{uUB}_n$ and (for ρ > 0) $\mathrm{oUB}_n$ are convex strictly decreasing functions of n, for n sufficiently large.

Lemma 3 For the non-degenerate case $|\rho| < 1-\tau$, $\mathrm{uUB}_{2n} < \mathrm{uUB}_n$, $\mathrm{oLB}_{2n} > \mathrm{oLB}_n$, and (for ρ > 0) $\mathrm{oUB}_{2n} < \mathrm{oUB}_n$.
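Since Proposition 1 is supported by plots rather than a full proof, a numerical spot check in the same spirit may be useful. The sketch below evaluates (54) to (56) for n = 1, ..., n_max and inspects the signs of first and second differences. It is not a proof, it covers only the parameter values tried, and it again assumes δ′ = (1 + τ′ − ρ′)/(1 + τ′ + ρ′) and 0 < τ < 1. All names and test values are hypothetical.

```python
def bounds(tau, rho, n):
    """(uUB_n, oLB_n, oUB_n) for an n-step homogeneous chain; assumes 0 < tau < 1."""
    t1 = tau ** (1.0 / n)
    r1 = rho * (1 - t1) / (1 - tau)
    u_ub = (tau + (1 - abs(r1)) ** n) / (1 + tau + rho)
    o_lb = tau * (2 / (1 + t1 + r1)) ** n
    o_ub = ((1 + t1 - r1) / (1 + t1 + r1)) ** n if rho >= 0 else 1.0
    return u_ub, o_lb, o_ub

def shape(values):
    """Signs of the first and second differences of a finite sequence."""
    d1 = [b - a for a, b in zip(values, values[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    return {
        "increasing": all(d > 0 for d in d1),
        "decreasing": all(d < 0 for d in d1),
        "concave": all(d < 0 for d in d2),
        "convex": all(d > 0 for d in d2),
    }

def spot_check(tau, rho, n_max):
    """True if the shapes claimed in Proposition 1 hold for n = 1..n_max."""
    columns = list(zip(*(bounds(tau, rho, n) for n in range(1, n_max + 1))))
    u_ub, o_lb, o_ub = (shape(col) for col in columns)
    ok = (o_lb["increasing"] and o_lb["concave"]
          and u_ub["decreasing"] and u_ub["convex"])
    if rho > 0:
        ok = ok and o_ub["decreasing"] and o_ub["convex"]
    return ok

for tau, rho in [(0.2, 0.1), (0.2, -0.3), (0.5, 0.25)]:
    print(tau, rho, spot_check(tau, rho, n_max=40))
```

Differences are taken over consecutive integers n, which is the discrete analogue of the monotonicity and convexity statements.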

Implications for data gathering

Our results have focused on improving the bounds on PC by learning about general mediating processes together with values for prespecified mediators for the case at hand. Our results can also be used to suggest which mediators researchers might most fruitfully seek to observe for the case at hand. Thus consider a homogeneous process with n steps (n even) and suppose that researchers can observe the value of just one mediator $M_i$. In this case we can show that the lower bound LB on PC, if we were to observe $M_i = 1$, is maximized if the central mediator in the sequence is observed. To see this, note that from (28), (29) and Theorem 4, the lower bound LB from observation of mediator $M_k = 1$ is given by the product of the lower bound for the probability that X = 1 caused $M_k = 1$ and the lower bound for the probability that $M_k = 1$ caused Y = 1:

$$
\frac{2(\tau')^k}{1 + (\tau')^k + \rho'\{1 + \tau' + \cdots + (\tau')^{k-1}\}} \times \frac{2(\tau')^{n-k}}{1 + (\tau')^{n-k} + \rho'\{1 + \tau' + \cdots + (\tau')^{n-k-1}\}}
$$

where $\tau'$ and $\rho'$ are given by (51) and (52). This expression has the form $c/\{f(k)f(n-k)\}$, where $f(k) = 1 + (\tau')^k + \rho'\{1 + \tau' + \cdots + (\tau')^{k-1}\}$ is decreasing and convex in k: this holds since $\Delta_{k+1} := f(k+1) - f(k) = (\tau')^{k+1} - (\tau')^k + \rho'(\tau')^k = (\tau')^k(\tau'+\rho'-1) < 0$, and $\Delta_{k+1} - \Delta_k = \{(\tau')^k - (\tau')^{k-1}\}(\tau'+\rho'-1) > 0$. Hence the denominator is minimised, and so LB is maximised, when k = n − k, that is, when k = n/2.
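The midpoint result can also be seen from a closed form. The following worked calculation is a sketch that writes f using the k-step version of (50), so that the geometric sum is replaced by (1 − (τ′)^k)/(1 − τ′), and uses σ = ρ′/(1 − τ′); the label LB_max is introduced here for convenience.

```latex
% Closed form behind the midpoint result (a sketch; f is written using (50) for a k-step chain).
\begin{align*}
  f(k) &= 1 + (\tau')^k + \rho'\,\frac{1-(\tau')^k}{1-\tau'} = (1+\sigma) + (1-\sigma)(\tau')^k, \\
  f(k)\,f(n-k) &= (1+\sigma)^2 + (1-\sigma)^2\,\tau + (1-\sigma^2)\bigl\{(\tau')^k + (\tau')^{n-k}\bigr\}, \\
  (\tau')^k + (\tau')^{n-k} &\ge 2\,(\tau')^{n/2} = 2\sqrt{\tau}, \quad\text{with equality iff } k = n/2 .
\end{align*}
% Hence, since 1-\sigma^2 \ge 0, the bound 4\tau/\{f(k)f(n-k)\} is largest at k = n/2, where it equals
\[
  \mathrm{LB}_{\max} = \frac{4\tau}{\bigl\{(1+\sigma) + (1-\sigma)\sqrt{\tau}\bigr\}^{2}} .
\]
```

Only the last term of the product depends on k, which is why the choice of mediator matters only through (τ′)^k + (τ′)^{n−k}.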

As an illustration, suppose 121 dominoes stand in a row. The fall of any domino increases the chance that its neighbor will fall from 0.005 to 0.995. You know that the first domino was knocked and fell, that the last is also down, and want the probability that the fall of the first one caused the fall of the last one. A lower bound above 50% would secure a conviction of domino 1.

With no further information, the lower bound is 0.461, not enough to convict. But now suppose you can seek information on the status of just one other domino in the sequence: which should you choose? It is better to choose in the middle than at the edges.

If for example you were to seek information on the status of domino 2 and found that it had fallen, you would find LB = 0.463, a modest gain, reflecting the fact that you fully expected domino 2 to have fallen, given that domino 1 was knocked. However, you are less sure you will find domino 61 down. If you do, you find LB = 0.501, enough to convict domino 1.

Note that in all cases the lower bound would be 0 if the intermediate domino were found to be standing. Taking both possible outcomes into account, the expected lower bound is always 0.461. But the second strategy does better than the first, in allowing the possibility to obtain a larger lower bound (albeit with a smaller probability), and so secure a conviction.
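The domino calculation can be reproduced approximately with a few lines of code. The sketch below assumes that 121 dominoes correspond to n = 120 identical steps with one-step parameters τ′ = 0.995 − 0.005 = 0.99 and ρ′ = 0.995 − (1 − 0.005) = 0, and it indexes mediators by the number of steps k from the first domino. That indexing is an assumption, so individual values may differ from those quoted in the text by about 0.001. It also checks that the expected lower bound, weighting by the probability that the observed domino is down, equals the no-information bound. All function names are hypothetical.

```python
def chain(m, tau1, rho1):
    """(tau, rho) after m identical steps with one-step parameters (tau1, rho1)."""
    tau_m = tau1 ** m
    return tau_m, rho1 * (1 - tau_m) / (1 - tau1)

def simple_lb(tau, rho):
    """Simple lower bound on PC for a chain with overall parameters (tau, rho)."""
    return 2 * tau / (1 + tau + rho)

def p_one_given_one(tau, rho):
    """Pr(end = 1 | start = 1) for a chain with parameters (tau, rho)."""
    return (1 + tau + rho) / 2

def lb_given_mediator_down(k, n, tau1, rho1):
    """Lower bound on PC when the mediator k steps from the start is observed at 1."""
    return simple_lb(*chain(k, tau1, rho1)) * simple_lb(*chain(n - k, tau1, rho1))

def prob_mediator_down(k, n, tau1, rho1):
    """Pr(M_k = 1 | X = 1, Y = 1), using the Markov property of the chain."""
    num = p_one_given_one(*chain(k, tau1, rho1)) * p_one_given_one(*chain(n - k, tau1, rho1))
    return num / p_one_given_one(*chain(n, tau1, rho1))

n = 120                       # 121 dominoes give 120 steps (assumed indexing)
tau1, rho1 = 0.99, 0.0        # one-step parameters implied by 0.005 and 0.995

baseline = simple_lb(*chain(n, tau1, rho1))
lbs = {k: lb_given_mediator_down(k, n, tau1, rho1) for k in range(1, n)}
best_k = max(lbs, key=lbs.get)
print(round(baseline, 3), best_k, lbs[best_k] > 0.5)

for k in (1, 60):
    expected = prob_mediator_down(k, n, tau1, rho1) * lbs[k]   # bound is 0 if M_k = 0
    print(k, round(lbs[k], 4), abs(expected - baseline) < 1e-12)
```

The expected-value check reflects the point made in the text: observing a mediator cannot raise the lower bound on average, but a central mediator offers the largest bound in the favourable case.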

Comparisons with other bounds

Although knowledge of mediators can narrow bounds, we have seen that this narrowing can be modest, even with access to an infinite sequence of positive evidence along a causal path. To put our results in context, we compare them with bounds that can be achieved from monotonicity, and from covariate information. Knowledge of the bounds achievable by different strategies provides some guidance as to whether a strategy would be worth pursuing.

Monotonicity

Suppose that we somehow knew that there are no cases for which the exposure would prevent the outcome, i.e., such that $Y_0 = 1$, $Y_1 = 0$. From Table 1 this is equivalent to ξ = τ, its lower limit, which in turn implies that PC, given by (25), is identified at its lower limit, sLB = 2τ/(1 + τ + ρ).

However, since monotonicity is an attribute of the typically unidentifiable joint distribution of (Y 0 , Y 1 ), it is not easy to justify without additional knowledge. One case where this works is when we know the existence of a mediation process with decomposition (44).

Observed covariate

Suppose that, in addition to X and Y, we can observe a binary covariate C, which can affect the dependence of Y on X. Let π = Pr(C = 1), and let $P_i$ be the transition matrix from X to Y, conditional on C = i; for consistency with the known P = P(τ, ρ) we must have $P = \pi P_1 + (1-\pi)P_0$.

In particular, it could be the case that π = (1 + τ − ρ)/2, and

$$
P_1 = \begin{pmatrix} 1 & 0 \\[2pt] \frac{1-\tau-\rho}{2} & \frac{1+\tau+\rho}{2} \end{pmatrix}, \qquad P_0 = \begin{pmatrix} 0 & 1 \\[2pt] \frac{1-\tau-\rho}{2} & \frac{1+\tau+\rho}{2} \end{pmatrix}.
$$

In this case knowledge that an individual with X = Y = 1 also has C = 1 is enough to identify PC at 1.

Unobserved covariate

As shown in Dawid (2011), knowledge of covariates can improve bounds, even if their values are not observed for the case at hand. In particular, this can let us identify PC at the upper bound, $\mathrm{sUB} = \min\{1, \frac{1+\tau-\rho}{1+\tau+\rho}\}$. For this to be possible, however, the average treatment effect must be negative for some value of C.

Thus suppose $\pi = (1+\tau+\rho)/2$, and the conditional transition matrices are as follows. For ρ < 0,

$$
P_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad P_0 = \begin{pmatrix} \frac{-2\rho}{1-\tau-\rho} & \frac{1-\tau+\rho}{1-\tau-\rho} \\[2pt] 1 & 0 \end{pmatrix}.
$$

For ρ ≥ 0,

$$
P_1 = \begin{pmatrix} \frac{1+\tau-\rho}{1+\tau+\rho} & \frac{2\rho}{1+\tau+\rho} \\[2pt] 0 & 1 \end{pmatrix}, \qquad P_0 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
$$

In either case, knowledge that X = Y = 1 is sufficient to infer that C = 1. This identifies the probability of causation: PC = 1 for ρ < 0, $\mathrm{PC} = \frac{1+\tau-\rho}{1+\tau+\rho}$ for ρ ≥ 0. In both cases we hit the upper bound.
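These constructions can be checked against the mixture constraint P = πP_1 + (1 − π)P_0. The sketch below does this in exact arithmetic, with rows indexed by x = 0, 1 and columns by y = 0, 1, and with the upper-right entry of P_0 for ρ < 0 taken as (1 − τ + ρ)/(1 − τ − ρ), the value forced by the constraint. It also reads off PC within C = 1 as Pr(Y_0 = 0 | C = 1), which is valid here because Y_1 = 1 with certainty when C = 1. Function names and test values are illustrative.

```python
from fractions import Fraction as F

def P(tau, rho):
    """Transition matrix P(tau, rho); rows x = 0, 1 and columns y = 0, 1."""
    return [[(1 + tau - rho) / 2, (1 - tau + rho) / 2],
            [(1 - tau - rho) / 2, (1 + tau + rho) / 2]]

def mix(pi, P1, P0):
    """Marginal transition matrix pi*P1 + (1 - pi)*P0."""
    return [[pi * P1[i][j] + (1 - pi) * P0[i][j] for j in range(2)] for i in range(2)]

def covariate_model(tau, rho):
    """(pi, P1, P0) for the unobserved-covariate construction."""
    pi = (1 + tau + rho) / 2
    if rho < 0:
        P1 = [[F(1), F(0)], [F(0), F(1)]]
        P0 = [[-2 * rho / (1 - tau - rho), (1 - tau + rho) / (1 - tau - rho)],
              [F(1), F(0)]]
    else:
        P1 = [[(1 + tau - rho) / (1 + tau + rho), 2 * rho / (1 + tau + rho)],
              [F(0), F(1)]]
        P0 = [[F(0), F(1)], [F(1), F(0)]]
    return pi, P1, P0

def pc_given_c1(P1):
    """PC within C = 1 when Pr(Y = 1 | X = 1, C = 1) = 1: equals Pr(Y = 0 | X = 0, C = 1)."""
    assert P1[1][1] == 1
    return P1[0][0]

def simple_ub(tau, rho):
    """Simple upper bound min{1, (1 + tau - rho) / (1 + tau + rho)}."""
    return min(F(1), (1 + tau - rho) / (1 + tau + rho))

checks = []
for tau, rho in [(F(1, 5), F(-3, 10)), (F(1, 5), F(1, 10))]:
    pi, P1, P0 = covariate_model(tau, rho)
    consistent = mix(pi, P1, P0) == P(tau, rho)
    c1_implied = P0[1][1] == 0          # X = Y = 1 is impossible when C = 0
    hits_ub = pc_given_c1(P1) == simple_ub(tau, rho)
    checks.append((consistent, c1_implied, hits_ub))
    print(tau, rho, consistent, c1_implied, pc_given_c1(P1))
```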

Comparisons

Figure 2 compares the bounds obtained, for a range of values of τ and ρ. It illustrates how, in general, lower bounds rise with τ and fall with ρ. For homogeneous processes the lower bounds improve on the simple bounds, although the gain from unlimited steps is not a striking improvement on that for just two steps. The best gains from non-homogeneous decompositions are substantial, as are the gains from knowledge of covariates, especially when ρ is small.

Conclusion

We close with some comments, which may help to guide the collection of ancillary evidence to improve the bounds on the probability of causation. These are based on our general results, as exemplified in Figure 2.

1. Knowledge of mediation processes, and of positive values for some mediators in a particular case, can raise the lower bound on the probability of causation, thus providing some evidence against a sceptic who doubts that the outcome in the case can be attributed to the putative cause. However, it may well not raise the bound enough to convince her. In contrast, for some processes, observing negative evidence on mediators can effectively convince the sceptic that the outcome is not the result of the exposure.

2. Observing positive data on homogeneous mediation processes can improve the bounds, but there are diminishing returns, and full identification is not achieved, even with infinite data.

3. For a homogeneous process, observation in the middle of the process is more informative than nearer the edges.

4. Heterogeneous mediation processes can sometimes yield identification with minimal auxiliary data gathering:

• A process where X is a necessary condition for a sufficient condition for Y yields the largest possible lower bound, and identifies the probability of causation. For example, if it is known that the effect of delivering a deworming medicine passes uniquely through ingestion, and ingestion is sufficient for effective deworming, then evidence of ingestion raises the lower bound and identifies the probability of causation.

• A process in which X is a sufficient condition for a necessary condition for Y yields identification, and there is no gain from gathering data on the mediator. For instance if ingesting medicine is a sufficient condition for good health, and good health is a necessary condition for good school performance, then observing ingestion and good school performance is sufficient to achieve identification. There are no additional gains from measuring health, since good health is already implied by good performance.

5. Potential gains from knowledge of mediation processes are typically weaker than potential gains from knowledge of conditions under which interventions are more or less effective. Even when covariates are unobserved for the case at hand, knowledge of the general effect of covariates can tighten the bounds when some subgroups exhibit adverse effects. On this basis researchers might be able to assess whether a search for a suitable covariate could lead to improved bounds, and perhaps even identification of the probability of causation.

where the leading term is positive, so $\mathrm{oLB}_n$ is eventually increasing. Similarly

$$
d_{n+1} - d_n = 2\,\mathrm{oLB}_\infty\, k/n^3 + O(1/n^4)
$$

with negative leading term, so $\mathrm{oLB}_n$ is eventually concave in n. A similar argument shows that, for ρ > 0, $\mathrm{oUB}_n$ is eventually decreasing and convex in n. We note that the convergence of $\mathrm{oUB}_n$ to its limit is at a faster rate than for $\mathrm{oLB}_n$. The behaviour of $\mathrm{uUB}_n$ is obtained similarly (the limit being approached at rate 1/n).

Proof of Lemma 3. Consider the n-part homogeneous decomposition $P = P_1 \mid \ldots \mid P_n$. Now replace each $P_i$ by its homogeneous 2-part decomposition $P_i = Q_{i1} \mid Q_{i2}$, so creating the 2n-part homogeneous decomposition

$$
P = Q_{11} \mid Q_{12} \mid \ldots \mid Q_{n1} \mid Q_{n2}.
$$

By Corollary 3 and (25) we see that uUB decreases on making these replacements. The argument of § 3.4 shows that oLB is increased by these replacements.

To show the result for oUB it is enough to show that oUB < sUB for a two-term homogeneous decomposition with ρ > 0. That is to say,

$$
\frac{(1+\tau'-\rho')^2}{(1+\tau'+\rho')^2} < \frac{1+\tau-\rho}{1+\tau+\rho}
$$

or equivalently

$$
2\rho\,\{(1+\tau')^2 + \rho'^2\} < 4\rho'(1+\tau')(1+\tau).
$$

Noting that $\rho = \rho'(1+\tau')$ and $\tau = \tau'^2$, this becomes

$$
(1+\tau')^2 + \rho'^2 < 2\,(1+\tau'^2),
$$

equivalent to $\rho'^2 < (1-\tau')^2$, which holds since, by (51) and (52), $\rho'/(1-\tau') = \rho/(1-\tau) \in (0, 1)$ by assumption.


Figure 1: Bounds from homogeneous decompositions of length n of P = P(τ, ρ), for τ = 0.2 and six different values of ρ. PC bounds when mediators not observed (marked blue), PC bounds for mediators observed at 1 for homogeneous decompositions of length n (red), and PC bounds for alternating observations (green). For positive data the bounds tighten only modestly as the number of links in the chain increases. For alternating data the bounds converge to 0 unless ρ = 0.

References

Collier, David. 2011. Understanding Process Tracing. PS: Political Science & Politics 44(4).
Dawid, A. Philip, Monica Musio, and Stephen E. Fienberg. 2016. From Statistical Evidence to Evidence of Causality. Bayesian Analysis 11.
Dawid, A. Philip. 2011. The Rôle of Scientific and Statistical Evidence in Assessing Causality. In Perspectives on Causation, edited by Richard Goldberg. Oxford: Hart Publishing.
Dawid, A. Philip, Monica Musio, and Rossella Murtas. 2017. The Probability of Causation. Law, Probability and Risk 16(4). Oxford University Press. doi:10.1093/lpr/mgx012.
Dawid, A. Philip, Rossella Murtas, and Monica Musio. 2016. Bounding the Probability of Causation in Mediation Analysis. In Topics on Methodological and Applied Statistical Inference. Springer International Publishing. doi:10.1007/978-3-319-44093-4_8.
Gelman, Andrew, and Guido Imbens. 2013. Why Ask Why? Forward Causal Inference and Reverse Causal Questions. Working Paper 19614, National Bureau of Economic Research.
Mahoney, James. 2012. The Logic of Process Tracing Tests in the Social Sciences. Sociological Methods & Research 41(4).
Murtas, Rossella, A. Philip Dawid, and Monica Musio. 2017. New Bounds for the Probability of Causation in Mediation Analysis. arXiv:1706.04857.
Pearl, Judea. 2015. Causes of Effects and Effects of Causes. Sociological Methods & Research 44(1).
Robins, James, and Sander Greenland. 1989. The Probability of Causation Under a Stochastic Model for Individual Risk. Biometrics 45.
Tian, Jin, and Judea Pearl. 2000. Probabilities of Causation: Bounds and Identification. Annals of Mathematics and Artificial Intelligence 28.
Van Evera, Stephen. 1997. Guide to Methods for Students of Political Science. Ithaca, NY: Cornell University Press.
White, Howard. 2009. Theory-Based Impact Evaluation: Principles and Practice. Journal of Development Effectiveness 1(3).
Wolfram Research, Inc. 2018. Mathematica, Version 11.3. Champaign, IL.
Yamamoto, Teppei. 2012. Understanding the Past: Statistical Analysis of Causal Attribution. American Journal of Political Science 56(1).

Metadata

Title: Bounding Causes of Effects With Mediators
Delta ID: DSEID-001-3003049
Authors: Philip Dawid, Macartan Humphreys, Monica Musio
Abstract source: crossref
Source URL: https://arxiv.org/pdf/1907.00399
Access: open_repository
Licence: https://creativecommons.org/licenses/by/4.0/
PDF SHA-256: 6701158d536af61f7eaa82f9ff573b1398ee11578ae2aa63eb9867da63d397d6
TEI SHA-256: d00106c2ae0029c643f978dc2d37a911ddf1b09b4328f880bb0de4fa43bf5923
GROBID: {"version":"0.8.2","revision":"a91ee48"}

Record history

When	Event	Field	Old	New
2026-06-18 19:37:53.011249+00:00	identifier_assigned	DSEID		DSEID-001-3003049
2026-06-18 18:34:27.057162+00:00	pdf_processed	pdf_sha256		6701158d536af61f7eaa82f9ff573b1398ee11578ae2aa63eb9867da63d397d6