
# Inferring causalities in the presence of latent variables

Sometimes it is possible to infer the existence or non-existence of causal relationships between variables even if we relax the naive assumption that no latent variables are involved in the dependencies.

## Not so naive model

In general, all the dependencies between variables might be caused by a single latent variable. Postulating such a variable is, however, somewhat against the spirit of scientific inquiry, where the goal is to make as few assumptions as possible (Occam's razor). If we restrict ourselves to latent variable models in which every latent variable is a parent of exactly two observed variables and no latent variable has parents, we can infer something about the causal relationships between the observed variables. We call this restricted set of latent variable models "not so naive causal models". This restriction is not as bad as it sounds, since it can be shown that, under very reasonable assumptions, all causal models with latent variables can be represented as models in this class.
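The two structural conditions of this class can be checked mechanically. The following is a minimal sketch, assuming a model is given as a list of (parent, child) edges plus a set of latent variable names; the function name `is_not_so_naive` and this representation are illustrative assumptions, not part of B-Course.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (parent, child); hypothetical representation


def is_not_so_naive(edges: Iterable[Edge], latent: Set[str]) -> bool:
    """Check the two restrictions described above:
    no latent variable has a parent, and every latent variable
    is a parent of exactly two observed variables."""
    children: Dict[str, Set[str]] = {name: set() for name in latent}
    for parent, child in edges:
        if child in latent:
            # A latent variable with a parent violates the restriction.
            return False
        if parent in latent:
            # The child is observed here, because latent children
            # were rejected just above.
            children[parent].add(child)
    return all(len(kids) == 2 for kids in children.values())


# Illustrative models (variable names are made up for the example).
ok_model = [("L", "A"), ("L", "B"), ("C", "B")]
too_wide = [("L", "A"), ("L", "B"), ("L", "C")]
print(is_not_so_naive(ok_model, {"L"}))
print(is_not_so_naive(too_wide, {"L"}))
```

The first model satisfies both conditions; the second is rejected because its latent variable has three observed children.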

## Excluding causality

Sometimes a subset of the dependencies in our model can help us exclude the possibility of A causing B even when A and B seem to be always dependent. This is the case if there is a third variable C that, given some set S (which does not include A, B or C), is dependent on A but independent of B. If A were a direct cause of B, the dependence between C and A would always make C and B dependent too. This contradicts our assumption that the model contains a statement saying C and B are independent given S. The only possibilities left are that either B causes A or that there is a latent common cause for A and B that makes them appear dependent.
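The exclusion rule can be written as a search over a list of dependency statements. This is only a sketch: the `Statement` record and the function `find_exclusion_witness` are hypothetical names introduced for illustration, and the statements are assumed to be given explicitly rather than derived from data.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional, Tuple


class Statement(NamedTuple):
    """One (in)dependence statement: x and y are dependent
    (or independent) given the set `given`."""
    x: str
    y: str
    given: FrozenSet[str]
    dependent: bool


def find_exclusion_witness(
    statements: Iterable[Statement], a: str, b: str
) -> Optional[Tuple[str, FrozenSet[str]]]:
    """Return (C, S) such that C is dependent on `a` given S but
    independent of `b` given S, with S containing none of a, b, C.
    Such a witness rules out `a` being a direct cause of `b`."""
    stmts = list(statements)

    def independent(u: str, v: str, s: FrozenSet[str]) -> bool:
        return any(
            {st.x, st.y} == {u, v} and st.given == s and not st.dependent
            for st in stmts
        )

    for st in stmts:
        if not st.dependent or a not in (st.x, st.y):
            continue
        c = st.y if st.x == a else st.x
        if c in (a, b) or {a, b, c} & st.given:
            continue
        if independent(b, c, st.given):
            return c, st.given
    return None


empty: FrozenSet[str] = frozenset()
model = [
    Statement("A", "B", empty, True),
    Statement("A", "C", empty, True),
    Statement("B", "C", empty, False),
]
print(find_exclusion_witness(model, "A", "B"))  # witness C with empty S
print(find_exclusion_witness(model, "B", "A"))  # no witness in this model
```

Here the witness C with empty S excludes "A causes B", while nothing in this small model excludes "B causes A".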

## Inferring causality

In the example above we were left with two possibilities: either B causes A or there is a common latent cause for both A and B. Sometimes our model may let us rule out the common latent cause, leaving B being a direct cause of A as the only choice.

So let us assume that our model contains, among other things, the following dependency statements (and let us assume that the set S does not contain B):

1. A and B are dependent on each other no matter what
2. B and C are dependent on each other no matter what
3. A and C are dependent on each other given S
4. A and C are independent of each other given S and B
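To make the four statements concrete, here is a small simulation of one model in which they hold: the chain C → B → A. It is a sketch under loud assumptions that are not in the original text: the variables are linear-Gaussian, the coefficient 0.8 is arbitrary, S is taken to be empty, statements 1 and 2 are checked only for the empty conditioning set, and dependence is measured by (partial) correlation.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_chain(n=20000, seed=0):
    """Sample from the assumed chain C -> B -> A with Gaussian noise."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    b = [0.8 * ci + rng.gauss(0, 1) for ci in c]
    a = [0.8 * bi + rng.gauss(0, 1) for bi in b]
    return a, b, c


a, b, c = simulate_chain()
r_ab = pearson(a, b)  # statement 1: A and B dependent
r_bc = pearson(b, c)  # statement 2: B and C dependent
r_ac = pearson(a, c)  # statement 3: A and C dependent (S empty)
r_ac_given_b = partial_corr(r_ac, r_ab, r_bc)  # statement 4: close to zero
print(round(r_ab, 2), round(r_bc, 2), round(r_ac, 2), round(r_ac_given_b, 2))
```

The first three correlations come out clearly non-zero, while the partial correlation of A and C given B is close to zero up to sampling noise, which matches the pattern of statements 1 to 4.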

Let us further assume that, by exploiting (possibly some other) dependency statements in our model, we have been able to exclude the possibility that B is a direct cause of C. Now it is impossible that the dependency between A and B is caused by a latent common cause, since in that case knowing B would not block the dependency between A and C as our dependency statement 4 says. This can be seen by looking at the two possible situations in which A and B have a common latent cause L.
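The blocking argument can also be checked mechanically with a d-separation test. The sketch below uses the standard ancestral moral graph construction; the function names and the edge-list representation are assumptions made for illustration, and S is taken to be empty. It compares the two latent-cause graphs (C causes B with latent L over A and B; latent K over C and B with latent L over A and B) against the chain C → B → A.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by `given` in the DAG `edges`,
    where each edge is a (parent, child) pair."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    # 1. Keep only x, y, the conditioning set and their ancestors.
    keep = ancestors(parents, {x, y} | set(given))
    # 2. Moralize: link each node to its parents and marry co-parents.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = sorted(parents.get(child, ()))
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Remove the conditioning set and test whether x still reaches y.
    blocked = set(given)
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(adj[node] - blocked)
    return True


possibility_1 = [("C", "B"), ("L", "A"), ("L", "B")]
possibility_2 = [("K", "C"), ("K", "B"), ("L", "A"), ("L", "B")]
chain = [("C", "B"), ("B", "A")]

for name, graph in [("possibility 1", possibility_1),
                    ("possibility 2", possibility_2),
                    ("chain C -> B -> A", chain)]:
    print(name, d_separated(graph, "A", "C", {"B"}))
```

In both latent-cause graphs A and C are not d-separated by B, so statement 4 cannot hold there; in the chain they are, which is consistent with B being a direct cause of A.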

### Possibility 1: C causes B and latent L causes A and B

In this case A and C would always appear dependent if we know B. Knowing the collider node B opens the path from C through B via L to A. This contradicts dependency statement 4 of our model.

### Possibility 2: Latent K causes both C and B and latent L causes both A and B

In this case A and C would always appear dependent if we know B. Knowing the collider node B opens the path from C via K through B via L to A. This contradicts dependency statement 4 of our model.

B-Course, version 2.0.0 CoSCo 2002