Harpoon or Bait? A Comparison of Various Metrics in Fishing for Sequence Patterns
The use of sequence analysis in the social sciences has increased significantly over the last decade or two. Sequence analysis explores and describes trajectories and “fishes for patterns” (Abbott, 2000). Many dissimilarity metrics are available, drawn from various domains (bioinformatics, data mining, etc.), and the choice among them may affect the results; robustness is therefore a crucial and pervasive issue in papers using sequence analysis. To what extent do the various techniques lead to consistent and converging results? What kinds of patterns does each metric fish out most easily? Here we propose a systematic comparison of about ten metrics that have been used in the social science literature. The comparison is based on the examination of dissimilarity matrices computed from a simulated sequence data set that includes various patterns sociologists may try to identify. This should help scholars choose the method best suited to their data design and research objectives.