
How much data should I have?

The number of data vectors does not play as dramatic a role in Bayesian statistics as it appears to play in "ordinary" statistics. This does not mean that it does not matter at all. Large data sets make results more certain.

In theory you can do without any data at all

Technically, the lower limit for the number of data vectors is zero. With any non-negative number of data vectors you will still get the most probable model(s). The catch is that if you have just a few data vectors (small N), almost all the models appear equally probable. If you have a great deal of data, one of the models is likely to appear much more probable than the rest. Since for Bayesians differences in probabilities are differences in certainties, more data means more certainty.
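
To make this concrete, here is a minimal sketch in Python. It is a toy illustration, not what B-Course itself computes: the three candidate "models" are hypothetical coins with fixed biases, each given the same prior probability, and the data are assumed to contain 70% heads. With N = 0 every model gets probability 1/3; as N grows, one model pulls ahead.

```python
import math

def model_posteriors(heads, tails, biases=(0.3, 0.5, 0.7)):
    """Posterior probability of each candidate coin bias (hypothetical
    toy models), assuming an equal prior probability for every model."""
    # Log-likelihood of the observed counts under each candidate model.
    log_liks = [heads * math.log(p) + tails * math.log(1 - p) for p in biases]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]

for n in (0, 10, 100, 1000):
    heads = n * 7 // 10  # assumed data: 70% heads
    post = model_posteriors(heads, n - heads)
    print(n, [round(p, 3) for p in post])
```

With no data the three probabilities are identical, so a "most probable model" exists only in a trivial sense; with 1000 observations nearly all of the probability sits on a single model.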

A large data set justifies complex models

If you have very little data, the most probable model is likely to be a simple one, while a large data set has the "power" to justify more complex models. This is just a symptom of the fact that we calculate a conditional probability: the probability of the model in the light of the evidence carried by the data. Changing the data, e.g., by getting more of it, will also change the probabilities of the models.
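
The effect can be sketched with another toy comparison in Python. Again, this is an assumed example rather than B-Course's own model family: the "simple" model is a fair coin with no free parameters, the "complex" model is a coin with an unknown bias under a uniform prior, both models get equal prior probability, and the data are assumed to contain 60% heads.

```python
import math

def log_evidence_fair(heads, tails):
    """Log marginal likelihood of a fixed fair coin (no free parameters)."""
    return (heads + tails) * math.log(0.5)

def log_evidence_biased(heads, tails):
    """Log marginal likelihood of a coin with unknown bias, uniform prior.

    Integrating p^heads * (1 - p)^tails over p in [0, 1] gives the Beta
    function B(heads + 1, tails + 1).
    """
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))

def prob_complex(heads, tails):
    """Posterior probability of the biased-coin model, equal model priors."""
    diff = log_evidence_fair(heads, tails) - log_evidence_biased(heads, tails)
    return 1.0 / (1.0 + math.exp(diff))

for n in (10, 1000):
    heads = n * 6 // 10  # assumed data: 60% heads
    print(n, round(prob_complex(heads, n - heads), 3))
```

With 10 observations the simple model is the more probable of the two, even though the data are not perfectly balanced. With 1000 observations in the same proportion, the complex model is overwhelmingly more probable. The data's proportions did not change, only their amount.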


  B-Course, version 2.0.0
CoSCo 2002