home | library | feedback

What is a statistical dependency

Statistical dependency is a technical issue that is based on the concept of probability. In Bayesian theory probability is related to beliefs, which gives us a somewhat less technical interpretation of dependencies.

ť Read more about Bayesian probability

Pairwise conditional dependencies

In B-Course we will only need something that is called pairwise conditional dependencies, since that is the only type of dependency that appears in our models. (In the following "dependency" always means "pairwise conditional dependency".) Dependency is a statement about the variables (i.e. those things named in the first row of your data file). Saying that variables A and B are dependent on each other means that if I know what the value of variable A is it helps me to guess what the value of variable B is.

Example:
Let us have three variables Animal, AnimalType and LivesAt. Let the possible values of the Animal be {whale, deer, cat, dog, perch, pike, shark} and let the the possible values for AnimalType be {mammal, fish} and possible values for LivesAt be {water, ground}. Now let us imagine the game where someone thinks about an animal and you have to guess where this animal lives. As long as you do not know what animal your co-player is thinking of, you are somewhat uncertain about the habitat of the animal. As soon as you are told that the AnimalType is mammal you start to think that the animal most probably lives on ground. The information about AnimalType clearly helps you guess the value of LivesAt. In other words AnimalType and LivesAt are dependent on each other.

Let us imagine another course the game could take. Let us think that before you are told that AnimalType is mammal, you are told that Animal is a whale. This leads you to conclude that LivesAt is water. When you are later on told that AnimalType is mammal, that information does not change your "guess" about LivesAt. So when you know the value of Animal the information about the value of AnimalType does not help you to know the value of LivesAt. In other words AnimalType and LivesAt are not dependent of each other if you know the value of Animal.

The example above illustrates the important characteristic of dependencies: Dependency can be conditional, i.e., whether there is a dependency between variables A and B depends on whether you know or do not know values of some other variables.

An attempt to be more rigorous

Since s dependency between A and B is sensitive to the knowledge about other variables, each dependency statement actually says something about all the variables. The dependency statement has five parts. It has the form "Variables A and B are dependent on each other if we know something about all the variables in set K and we know nothing about any variables in set N and it does not matter if we know or don't know something about rest of the variables I." So the five parts of the dependency statement are

Variable A, that is claimed to be dependent on B.
Variable B, that is claimed to be dependent on A.
Set K (K - Know) of variables, such that for dependency statement to be true we must have some information about all the variables that belong to K.
Set N (N - Not know) of variables, such that for dependency statement to be true we must not have any information about any variables that belong to N.
Set I (I - Indifferent) of variables, such that for dependency statement to be true it does not matter if we know or not know something about variables that belong to I.

It is intuitive to further require that A and B are different variables and neither of them belongs to sets K, N or I. Furthermore, all the other variables belong to exactly one of the three sets K,N and I.

To be able to compactly represent the dependency statements we invent a notation for "Variables A and B are dependent of each other if we know something about all the variables in set K and we know nothing about any variables in set N and it does not matter if we know or don't know something about rest of the variables I." We simply write DEP(A, B | K, N, I). In the example above we can write DEP(LivesAt, AnimalType | {}, {Animal}, {}) and not DEP(LivesAt, AnimalType | {Animal}, {}, {}).



B-Course, version 2.0.0	CoSCo 2002