Potential fitting biases resulting from grouping data into variable width bins
When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses yield unbiased results. However, particle physics experiments are expensive and time consuming to carry out, thus if an analysis has inherent bias (even if unintentional), much money and effort can be wasted trying to replicate or understand the results, particularly if the analysis is fundamental to our understanding of the universe. In this note we discuss the significant biases that can result from data binning schemes. As we will show, if data are binned such that they provide the best comparison to a particular (but incorrect) model, the resulting model parameter estimates when fitting to the binned data can be significantly biased, leading us to too often accept the model hypothesis when it is not in fact true. When using binned likelihood or least squares methods there is of course no a priori requirement that data bin sizes need to be constant, but we show that fitting to data grouped into variable width bins is particularly prone to produce biased results if the bin boundaries are chosen to optimize the comparison of the binned data to a wrong model. The degree of bias that can be achieved simply with variable binning can be surprisingly large. Fitting the data with an unbinned likelihood method, when possible to do so, is the best way for researchers to show their analyses are not biased by binning effects. Failing that, equal bin widths should be employed as a cross-check of the fitting analysis whenever possible.