
How much data should I have?

The number of data vectors does not play as dramatic a role in Bayesian statistics as it appears to play in "ordinary" statistics. This does not mean that it does not matter at all: larger data sets make the results more certain.

In theory you can do without any data at all

Technically, the lower limit for the number of data vectors is zero. With any non-negative number of data vectors you will get a classification model. The catch is that if you have just a few data vectors (small N), the quality of the model is not necessarily very good: there are many different models that are almost equally good, and the data are not sufficient to pick the best one with reasonable certainty. Since the classification model is also a Bayesian dependency model, the general comments about the effects of data size on dependency models apply.
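The effect of small N can be illustrated with a toy calculation. The sketch below is not B-Course's own algorithm. It assumes one binary predictor x, a binary class, uniform Beta(1, 1) priors and equal prior probability for two candidate models ("class is independent of x" versus "class depends on x"). All function names are hypothetical.

```python
import math

def log_beta(a, b):
    # Logarithm of the Beta function B(a, b).
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(k, n, a=1.0, b=1.0):
    # Log marginal likelihood of k ones among n binary outcomes
    # under a Beta(a, b) prior (assumed uniform by default).
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def posterior_dependent(data):
    # data: list of (x, cls) pairs, both 0 or 1.
    # Returns the posterior probability of the model "class depends on x",
    # assuming the two models are equally probable a priori.
    classes = [c for _, c in data]
    log_m0 = log_marginal(sum(classes), len(classes))  # independence model
    log_m1 = 0.0                                       # dependence model
    for value in (0, 1):
        subset = [c for x, c in data if x == value]
        log_m1 += log_marginal(sum(subset), len(subset))
    return 1.0 / (1.0 + math.exp(log_m0 - log_m1))

none = []                        # N = 0
few = [(0, 0), (1, 1)] * 2       # N = 4
many = [(0, 0), (1, 1)] * 20     # N = 40

for name, data in (("N=0", none), ("N=4", few), ("N=40", many)):
    print(name, round(posterior_dependent(data), 4))
```

With no data the two models stay exactly as probable as the prior made them (0.5). With four perfectly consistent vectors the dependence model reaches only about 0.77, so the alternative is still plausible. With forty vectors of the same pattern the choice is practically certain.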

For practical reasons we need some data

It is not meaningful to do classification analysis if you do not have at least two data vectors that belong to different classes, so the minimum number of data vectors required in B-Course classification analysis is two. That is not to say that two data vectors are enough to make the results terribly meaningful.
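The practical minimum can be checked before starting an analysis. The following is a minimal sketch of such a check, assuming the class labels are available as a plain list. The function name is hypothetical and is not part of B-Course.

```python
def has_minimum_data(class_labels):
    # True if there are at least two data vectors belonging to
    # different classes, which also implies at least two vectors.
    return len(set(class_labels)) >= 2

print(has_minimum_data([]))                    # no data at all
print(has_minimum_data(["spam", "spam"]))      # two vectors, one class
print(has_minimum_data(["spam", "ham"]))       # two vectors, two classes
```

Passing this check only means the analysis can be run, not that its results will be reliable.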



B-Course, version 2.0.0	CoSCo 2002