Ranking with non-random missing ratings: influence of popularity and positivity on evaluation metrics
The evaluation of recommender systems in terms of ranking has recently gained attention, as it fits the top-k recommendation task better than the usual rating prediction task. In this context, several authors have proposed treating missing ratings as a form of negative feedback, to compensate for the skewed distribution of observed ratings that arises when users choose which items they rate. In this work, we study two major biases in the selection of items: some items receive more ratings than others (the popularity effect), and positive ratings are observed more frequently than negative ones (the positivity effect). We present a theoretical analysis and experiments on the Yahoo! dataset, which contains ratings for randomly selected items, showing that treating missing data as negative feedback during training may improve performance, but that doing so at test time can be misleading, favoring models of popularity over models of user preferences.
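To make the test-time bias concrete, the following is a minimal toy sketch (not the paper's experimental protocol; all item counts, probabilities, and the extreme assumption that only positive ratings are ever observed are illustrative). It compares a pure popularity ranker against an oracle that knows each user's true preferences, under two versions of precision@k: one where unobserved ratings count as negatives, and one where relevance is judged on true preferences, as a test set of randomly selected items would allow:

```python
import random

random.seed(0)
n_users, n_items, k = 200, 50, 5
popular = set(range(10))  # hypothetical "head" items, rated far more often

# True preferences: each user likes 10 items chosen uniformly at random.
likes = [set(random.sample(range(n_items), 10)) for _ in range(n_users)]

# Observation process (positivity effect pushed to the extreme: only
# positive ratings are recorded, and far more often for head items).
observed = []
for u in range(n_users):
    probs = {i: (0.9 if i in popular else 0.05) for i in likes[u]}
    observed.append({i for i, p in probs.items() if random.random() < p})

item_count = [sum(i in obs for obs in observed) for i in range(n_items)]

def precision_at_k(score_fn, relevant):
    """Mean precision@k; relevant[u] is the set counted as positives."""
    total = 0.0
    for u in range(n_users):
        top = sorted(range(n_items),
                     key=lambda i: (-score_fn(u, i), random.random()))[:k]
        total += len(set(top) & relevant[u]) / k
    return total / n_users

pop = lambda u, i: item_count[i]            # pure popularity model
oracle = lambda u, i: float(i in likes[u])  # knows true preferences

# Missing-as-negative test: unobserved ratings count as negatives.
a_pop, a_oracle = precision_at_k(pop, observed), precision_at_k(oracle, observed)
# Unbiased test: relevance judged on true preferences.
b_pop, b_oracle = precision_at_k(pop, likes), precision_at_k(oracle, likes)

print(f"missing-as-negative:  popularity={a_pop:.2f}  oracle={a_oracle:.2f}")
print(f"true preferences:     popularity={b_pop:.2f}  oracle={b_oracle:.2f}")
```

Under the missing-as-negative metric the two models score close together, because even the oracle's truly liked picks are mostly unobserved for tail items; under the true-preference metric the oracle is far ahead. This is the sense in which missing-as-negative evaluation can flatter popularity models.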