Score-based likelihood ratios for handwriting evidence
Score-based approaches for computing forensic likelihood ratios are becoming more prevalent in the forensic literature. When two items of evidential value are entangled via a scorefunction, several nuances arise when attempting to model the score behavior under the competing source-level propositions. Specific assumptions must be made in order to appropriately model the numerator and denominator probability distributions. This process is fairly straightforward for the numerator of the score-based likelihood ratio, entailing the generation of a database of scores obtained by pairing items of evidence from the same source. However, this process presents ambiguities for the denominator database generation – in particular, how best to generate a database of scores between two items of different sources. Many alternatives have appeared in the literature, three of which we will consider in detail. They differ in their approach to generating denominator databases, by pairing (1) the item of known source with randomly selected items from a relevant database; (2) the item of unknown source with randomly generated items from a relevant database; or (3) two randomly generated items. When the two items differ in type, perhaps one having higher information content, these three alternatives can produce very different denominator databases. While each of these alternatives has appeared in the literature, the decision of how to generate the denominator database is often made without calling attention to the subjective nature of this process. In this paper, we compare each of the three methods (and the resulting score-based likelihood ratios), which can be thought of as three distinct interpretations of the denominator proposition. Our goal in performing these comparisons is to illustrate the effect that subtle modifications of these propositions can have on inferences drawn from the evidence evaluation procedure. The study was performed using a data set composed of cursive writing samples from over 400 writers. We found that, when provided with the same two items of evidence, the three methods often would lead to differing conclusions (with rates of disagreement ranging from 0.005 to 0.48). Rates of misleading evidence and Tippet plots are both used to characterize the range of behavior for the methods over varying sized questioned documents. The appendix shows that the three score-based likelihood ratios are theoretically very different not only from each other, but also from the likelihood ratio, and as a consequence each display drastically different behavior.