
Predicting with dependence models

Even though dependency models are "qualitative" in nature (they only state that something depends on something), they can also be used to calculate (conditional) probabilities of unknown future observations. B-Course therefore also provides a means to play with the models predictively.

Predictive playground

On the predictive playground the name of the game is this: you tell B-Course that you have observed something about a single case, and B-Course responds with one probability distribution for each variable that was not included in your set of observed variables. The probability distribution of a variable tells you how probable each of its values is. Your observations can be about many variables simultaneously.

For example, suppose we have two variables describing a student, Gender {boy, girl} and Preference {looks, sports}, and our model is such that these two variables are dependent on each other. You now tell B-Course that the student thinks that good looks are more important than being good at sports. B-Course then responds with the distribution (0.3, 0.7) for the variable Gender, meaning that the probability that the student is a boy is 0.3 and the probability that the student is a girl is 0.7.
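As a rough illustration of what such a prediction amounts to, the following Python sketch conditions a small joint table on the observation Preference = looks. The counts and the function name predict_gender are invented for this example (chosen so that the result matches the numbers above); this is not how B-Course itself stores or evaluates its models.

```python
# Hypothetical counts of students per (Gender, Preference) combination.
# These numbers are an assumption made up for illustration only.
counts = {
    ("boy", "looks"): 15,
    ("boy", "sports"): 35,
    ("girl", "looks"): 35,
    ("girl", "sports"): 15,
}


def predict_gender(preference):
    """Return P(Gender | Preference = preference) from the joint counts."""
    # Keep only the cases that agree with the observation ...
    matching = {g: n for (g, p), n in counts.items() if p == preference}
    # ... and renormalise them so that they sum to one.
    total = sum(matching.values())
    return {g: n / total for g, n in matching.items()}


print(predict_gender("looks"))  # {'boy': 0.3, 'girl': 0.7}
```

Observing a different value changes the answer: with these made-up counts, predict_gender("sports") gives the mirrored distribution.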

References to the prediction with Bayesian networks

Variables are predicted separately

It is tempting to think that by collecting the most probable predicted value of each variable, one obtains the description of the most probable case. For example, if the prediction for Gender(male, female) is (0.6, 0.4) and the prediction for Age(young, old) is also (0.6, 0.4), one would like to conclude that the most probable case is a young male. However, this reasoning is not necessarily valid. Picking the most probable values of single variables does not necessarily yield the most probable case, as the following simple example demonstrates. If we have two young males, four old males and four young females, the most common Age is young (six out of ten) and the most common Gender is male (six out of ten as well). However, being young and male is not the most common combination: there are only two young males, while there are twice as many old males and twice as many young females.
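The counting argument can be checked directly. The sketch below simply tallies the ten people from the example; the variable names are chosen for this illustration.

```python
from collections import Counter

# The ten cases from the example: (Gender, Age) pairs.
cases = (
    [("male", "young")] * 2
    + [("male", "old")] * 4
    + [("female", "young")] * 4
)

gender_counts = Counter(gender for gender, _ in cases)
age_counts = Counter(age for _, age in cases)
joint_counts = Counter(cases)

# Most common value of each variable taken separately.
naive_case = (
    gender_counts.most_common(1)[0][0],  # 'male' (6 of 10)
    age_counts.most_common(1)[0][0],     # 'young' (6 of 10)
)

print(naive_case, joint_counts[naive_case])  # ('male', 'young') 2
print(max(joint_counts.values()))            # 4
```

The combination built from the separate winners occurs twice, while the most frequent combinations occur four times each.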

Without going into details, the fact that the most probable combination cannot be found by gathering the most probable values of single variables lies at the very heart of dependency modeling. The classical chi-square test for independence (we are not using tests, but still) is based on this phenomenon. If the variables are independent of each other, the most probable combination of values can actually be built by taking the most probable value of each single variable. When the variables are dependent, the most probable combination of values can still be calculated efficiently by using the dependency model. However, at the moment B-Course does not provide this possibility.
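For contrast, here is a small sketch of the independent case. It assumes that Gender and Age are independent with the marginal distributions (0.6, 0.4) used earlier, so that the joint probability is just the product of the two marginals; under that assumption the separate winners do form the most probable combination.

```python
from itertools import product

# Marginal distributions from the earlier example.
p_gender = {"male": 0.6, "female": 0.4}
p_age = {"young": 0.6, "old": 0.4}

# Assumption: the variables are independent, so the joint probability
# of a combination is the product of the marginal probabilities.
joint = {
    (g, a): p_gender[g] * p_age[a]
    for g, a in product(p_gender, p_age)
}

best_combination = max(joint, key=joint.get)
best_singles = (
    max(p_gender, key=p_gender.get),
    max(p_age, key=p_age.get),
)

print(best_combination, best_singles)
```

Both results are the pair (male, young) here. In the ten-person example the variables are dependent, which is exactly why the shortcut fails there.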

Observation is not intervention

One would like to use playgrounds for testing what would happen if a variable were actively set to some value. Unfortunately, you cannot insert actions onto the playground: setting a variable to a value means making a passive observation, not performing an active intervention. Everybody understands the difference between observing someone die and killing him. Still, it is sometimes tempting to confuse the two, and that is a mistake, as the following example shows.

Consider an observational study of mice in a terrarium. Let us imagine that some food items (but not all) in the terrarium have been treated with a certain chemical, and we observe how mice eating the treated food items react. For each mouse, we record the variable EatChemical(yes, no), which tells whether the mouse has tasted a food item treated with the chemical. We also record the variable Effect(more active, normal, less active, dead), which tells the activity level of the mouse after the experiment. Let us now imagine that the result of our study is that most of the mice that tasted the chemical died, while most of the mice that did not taste the chemical went on leading a normal mouse life. Now if you put a fresh mouse into the terrarium and you are later told that it died, you are pretty sure that it tasted the chemical. However, if you go and shoot that mouse, its status is dead too, but you are not tempted to think that it tasted the chemical. The moral of the story is that actions that actively change the value of some variable lead to different predictions than passive observations of the value of that variable.
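The difference can be made concrete with a few made-up numbers. The sketch below assumes that half of the mice taste the chemical, that a mouse that tasted it dies with probability 0.8 and that a mouse that did not dies with probability 0.1. It also collapses Effect into dead / not dead. None of these figures come from a real study, and the function names are hypothetical.

```python
# Assumed (invented) probabilities for the illustration.
P_EAT = 0.5                               # P(EatChemical = yes)
P_DEAD_GIVEN_EAT = {"yes": 0.8, "no": 0.1}  # P(Effect = dead | EatChemical)


def p_eat_given_observed_dead():
    """Passive observation: apply Bayes' rule to P(EatChemical | dead)."""
    dead_and_ate = P_EAT * P_DEAD_GIVEN_EAT["yes"]
    dead_and_did_not = (1 - P_EAT) * P_DEAD_GIVEN_EAT["no"]
    return dead_and_ate / (dead_and_ate + dead_and_did_not)


def p_eat_given_forced_dead():
    """Active intervention (shooting the mouse).

    The death no longer depends on what the mouse ate, so it carries
    no information about EatChemical and the prior stays unchanged.
    """
    return P_EAT


print(round(p_eat_given_observed_dead(), 3))  # 0.889
print(p_eat_given_forced_dead())              # 0.5
```

Under these assumptions, seeing a dead mouse raises the probability that it tasted the chemical from 0.5 to about 0.89, whereas killing the mouse yourself leaves it at 0.5. The playground only ever performs the first kind of calculation.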


  B-Course, version 2.0.0
CoSCo 2002