(2002), pp. 133-142.
This paper proposes a new method for evaluating the quality of retrieval functions. Unlike traditional methods that require relevance judgements by experts or explicit user feedback, it is based entirely on clickthrough data. This is a key advantage, since clickthrough data can be collected at very low cost and without overhead for the user. Taking an approach from experiment design, the paper proposes an experiment setup that generates unbiased feedback about the relative quality of two search results without explicit user feedback. A theoretical analysis shows that the method gives the same results as evaluation with traditional relevance judgements under mild statistical assumptions. An empirical analysis verifies that the assumptions are indeed justified and that the new method leads to conclusive results in a WWW retrieval study.