home | library | feedback

Searching for the best model

For us, best model is the most probable model. Bayesian theory gives us tools to compare the probabilities of different models. However, our model set often contains so many models that it is impossible to go through all the models and pick the most probable one. That is why we have to search.

Computer science and fast computers

The bad news is that in computer science it is practically proven that no matter how smart computational methods are and no matter how fast the computers will be in a future, there will never be a fast way to find out for sure the most probable model unless the number of variables is very small. On the other hand, this is not anything extraordinary, it is very common in computer science. That is why in computer science researchers have developed many computer programs that deal with these "impossible" problems. In our case it means that it takes time to find a good model, and still we cannot be sure that we find the best one. But sometimes we may find the best one (although there is no way of knowing that) and usually we find a very good one (maybe second best or third, again cannot know for sure).

Simple and greedy random search

To get an idea of the search we describe a very simple random search. It goes as follows. We pick one model at random and declare it the best model so far (which it is since it is the only one). We then pick another model at random and compare the probability of this second model to the probability of the best model so far. If the probability of the second model is bigger than the probability of the best model so far (the first model) we declare the second model to be best model and we throw away the old champion. If, on the other hand, the probability of the second model is less than the probability of the best model so far, the old champion can keep its title and we throw away the second model.

We then continue picking up models at random and we always compare the probability of the freshly picked model to the probability of the best model so far. The one that wins this comparison is declared the current best model and the other is dumped. This game can be continued forever (since we put the loosing models back to the pool where we pick them.) When we get tired to play this game, we stop and declare the current best model to be result of our search.

B-Course can do better

However, it is possible to do something smarter. Namely, the models which resemble each other a lot (i.e., they have almost same set of dependency statements) are likely to be almost equally probable. This leads us to a search strategy, where we do not pick models from a set of models randomly, but we pick models that resemble the current best model. These models are more likely to be more probable than the current best model than if we picked the models randomly.

We can also abandon our habit of always dumping the looser and collect a set of relatively good models. We can then try to combine the best parts of these models so that the resulting combined model is better than any of the original models. This is roughly the idea of what computer scientists call genetic algorithms.

There are many more tricks like sometimes dumping the more probable model (algorithms like hill climbing, simulated annealing) or trying to avoid picking similar model twice (tabu search). There is also systems that learn to search better by experience.

B-Course is not committed to any particular search strategy. The underlying search engine in B-Course is frequently and without notification improved.

B-Course informs about its search

When asked about an intermediate or a final search status, B-Course sometimes reports having evaluated a very large amount of models. As the search routine has limited memory, these numbers are not exact. When it encounters a model it cannot remember seeing before, it counts the model as a new one. So sometimes the search routine evaluates and counts the same model more than once. It is true when B-Course says that it has evaluated 100000000000 models, however they are not necessarily different models.

 

  B-Course, version 2.0.0
CoSCo 2002