The authors exploit confidence-related features derived from word lattices. The features include:
1) Link probability. This is the acoustic and language model score of a link, computed from the lattice.
2) Hypothesis density. This is the number of links that span the time segment of a word. I would like to know how the time segment of a word is determined. Is the word chosen from all the nodes, or only from the best hypothesis (see 3)?
3) Acoustic stability. Dump a set of hypotheses and first choose a reference hypothesis by weighting the acoustic and language model scores. Then align all the other hypotheses with the reference hypothesis (on text only?). For each word, count how often it occurs in the other aligned hypotheses, and divide by the number of occurrences of other words.
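To make my reading of feature 3 concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the hypotheses are plain word lists, that alignment is done on text only (via the standard library's `difflib.SequenceMatcher`), and that the count is normalized by the number of alternative hypotheses, which is only one possible reading of the normalization. The function name `acoustic_stability` and the toy sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def acoustic_stability(reference, alternatives):
    """For each reference word, return the fraction of alternative
    hypotheses that contain the same word at the aligned position.

    Assumption: alignment is text-only; no time information is used.
    """
    counts = [0] * len(reference)
    for hyp in alternatives:
        matcher = SequenceMatcher(a=reference, b=hyp, autojunk=False)
        # Each matching block marks a run of words shared by both sequences.
        for block in matcher.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                counts[i] += 1
    n = len(alternatives)
    return [c / n if n else 0.0 for c in counts]


# Hypothetical hypotheses, e.g. obtained with different score weightings.
reference = ["the", "cat", "sat", "down"]
alternatives = [
    ["the", "cat", "sat", "down"],
    ["the", "cat", "sad", "down"],
    ["a", "cat", "sat", "down"],
    ["the", "cat", "sat", "town"],
]
scores = acoustic_stability(reference, alternatives)
print(list(zip(reference, scores)))
```

In this toy case "cat" appears in every alternative and gets the highest score, while the other words are each replaced once and score lower.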
I still cannot tell the difference between lattices and confusion networks on this issue, especially for alignment: confusion networks are fully aligned in time, while lattices are not. Would selecting time segments in a lattice be hard? I am looking for a detailed description of the algorithm.
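One way the time-segment question could be handled, sketched in Python under my own assumptions rather than anything stated in the paper: take the word boundaries from the links on the best path, then count every lattice link whose time span overlaps that segment. The `Link` type, the overlap rule, and the toy lattice are hypothetical.

```python
from typing import NamedTuple


class Link(NamedTuple):
    word: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def hypothesis_density(links, seg_start, seg_end):
    """Count the links that overlap the half-open segment [seg_start, seg_end).

    Assumption: a link counts if it overlaps the segment at all; the paper
    may instead count per time frame or require full coverage.
    """
    return sum(1 for l in links if l.start < seg_end and l.end > seg_start)


# Hypothetical lattice links; no graph structure is needed for this count.
lattice = [
    Link("the", 0.00, 0.20),
    Link("a", 0.00, 0.15),
    Link("cat", 0.20, 0.60),
    Link("hat", 0.15, 0.60),
    Link("cap", 0.20, 0.55),
    Link("sat", 0.60, 1.00),
]

# Assumption: word segments are taken from the best path through the lattice.
best_path = [lattice[0], lattice[2], lattice[5]]
densities = {l.word: hypothesis_density(lattice, l.start, l.end) for l in best_path}
print(densities)
```

With this definition no explicit alignment is needed, because links are compared by time overlap only. Whether the paper does it this way is exactly what I would like to see described.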
Reviewed by zzb3886 - 2008-06-24 22:01:46
For many practical applications of speech recognition systems, it is desirable to have an estimate of confidence for each hypothesized word, i.e. to have an estimate of which words of the speech recognizer's output are likely to be correct and which are not reliable. Many of today's speech recognition systems use word lattices as a compact representation of a set of alternative hypotheses. We exploit the use of such word lattices as information sources for the measure-of-confidence tagger JANKA...