home | library | feedback |
Statistical dependency is a technical issue that is based on the concept of probability. In Bayesian theory probability is related to beliefs, which gives us a somewhat less technical interpretation of dependencies.
» Read more about Bayesian probability
In B-Course we will only need something that is called pairwise conditional dependencies, since that is the only type of dependency that appears in our models. (In the following "dependency" always means "pairwise conditional dependency".) Dependency is a statement about the variables (i.e. those things named in the first row of your data file). Saying that variables A and B are dependent on each other means that if I know what the value of variable A is it helps me to guess what the value of variable B is.
Example:
Let us have three variables Animal, AnimalType and LivesAt. Let the
possible values of the Animal be {whale, deer, cat, dog, perch, pike,
shark} and let the the possible values for AnimalType be {mammal,
fish} and possible values for LivesAt be {water, ground}. Now let us
imagine the game where someone thinks about an animal and you have to
guess where this animal lives. As long as you do not know what animal
your co-player is thinking of, you are somewhat uncertain about the habitat
of the animal. As soon as you are told that the AnimalType is mammal you
start to think that the animal most probably lives on ground. The information
about AnimalType clearly helps you guess the value of LivesAt. In other words
AnimalType and LivesAt are dependent on each other.
Let us imagine another course the game could take. Let us think that before you are told that AnimalType is mammal, you are told that Animal is a whale. This leads you to conclude that LivesAt is water. When you are later on told that AnimalType is mammal, that information does not change your "guess" about LivesAt. So when you know the value of Animal the information about the value of AnimalType does not help you to know the value of LivesAt. In other words AnimalType and LivesAt are not dependent of each other if you know the value of Animal.
The example above illustrates the important characteristic of dependencies: Dependency can be conditional, i.e., whether there is a dependency between variables A and B depends on whether you know or do not know values of some other variables.
Since s dependency between A and B is sensitive to the knowledge about other variables, each dependency statement actually says something about all the variables. The dependency statement has five parts. It has the form "Variables A and B are dependent on each other if we know something about all the variables in set K and we know nothing about any variables in set N and it does not matter if we know or don't know something about rest of the variables I." So the five parts of the dependency statement are
It is intuitive to further require that A and B are different variables and neither of them belongs to sets K, N or I. Furthermore, all the other variables belong to exactly one of the three sets K,N and I.
To be able to compactly represent the dependency statements we invent a notation for "Variables A and B are dependent of each other if we know something about all the variables in set K and we know nothing about any variables in set N and it does not matter if we know or don't know something about rest of the variables I." We simply write DEP(A, B | K, N, I). In the example above we can write DEP(LivesAt, AnimalType | {}, {Animal}, {}) and not DEP(LivesAt, AnimalType | {Animal}, {}, {}).
B-Course, version 2.0.0 |