Is the Jeffreys' scale a reliable tool for Bayesian model comparison in cosmology?
We are entering an era where progress in cosmology is driven by data, and alternative models will have to be compared and ruled out according to some consistent criterion. The most conservative and widely used approach is Bayesian model comparison. In this paper we explicitly calculate the Bayes factors for all models that are linear with respect to their parameters. We do this in order to test the so-called Jeffreys' scale and to determine how accurate its predictions are in a simple case that we fully understand and where everything can be calculated analytically. We also discuss the case of nested models, e.g. one with $M_1$ and another with $M_2 \supset M_1$ parameters, and we derive analytic expressions for both the Bayes factor and the Figure of Merit, defined as the inverse area of the model parameters' confidence contours. Using this machinery and an explicit example, we demonstrate that the threshold nature of Jeffreys' scale makes it unreliable as a "one size fits all" tool for model comparison and that it may lead to biased conclusions. Furthermore, we discuss the importance of choosing the right basis for models that are linear with respect to their parameters, and how that basis affects the parameter estimation and the derived constraints.