Adaptation can be supervised or unsupervised, and can use MAP or MLLR. Supervised adaptation requires reference text, which is not available in real decoding, but it sets the upper limit on how good the adaptation can possibly be. Unsupervised adaptation doesn't require reference text. However, we still need to know which phones to adapt, and thus what is being said, to perform adaptation. So we can run a first decoding pass and use some of its hypotheses as the reference, or simply guess what is being said. Ideally, the reference is a combination of possible utterances, weighted by their probability. Choosing only the top hypothesis is the transcription mode. An N-best list can also be used. Allowing all kinds of phone combinations instead is called phone-loop adaptation, which is computationally lighter.
Reviewed by
zzb3886
- 2009-02-24 23:55:54
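The note's idea of a reference built from several hypotheses "weighted by their probability" can be sketched roughly as follows. This is only an illustration, not the paper's method: the hypothesis strings, the log scores, and the function names `nbest_weights` and `one_best_weights` are all made up here, and a real system would take (usually scaled) acoustic and language-model scores from the first-pass decoder.

```python
import math

def nbest_weights(log_scores):
    """Normalize N-best log scores into weights that sum to 1."""
    if not log_scores:
        raise ValueError("need at least one hypothesis")
    top = max(log_scores)  # subtract the max for numerical stability
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def one_best_weights(log_scores):
    """Put all weight on the top-scoring hypothesis (the 1-best case)."""
    best = max(range(len(log_scores)), key=lambda i: log_scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(log_scores))]

# Hypothetical first-pass output: (hypothesis text, log score).
nbest = [
    ("recognize speech", -10.0),
    ("wreck a nice beach", -11.0),
    ("recognise peach", -13.0),
]
scores = [score for _, score in nbest]
weights = nbest_weights(scores)

for (text, _), w in zip(nbest, weights):
    print(f"{w:.3f}  {text}")
```

Under this sketch, each hypothesis would contribute to the adaptation statistics in proportion to its weight, while `one_best_weights` corresponds to what the note calls the transcription mode.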
Recently there has been much interest in the area of adaptation for improved speech recognition in the presence of mismatches between the training and testing conditions. In this paper we focus on transformation-based maximum-likelihood (ML) adaptation. Some of the important adaptation parameters include whether the adaptation is performed in the feature-space or model-space, and whether the adaptation is supervised or unsupervised. An additional parameter is the adaptation data. For example adaptation may be performed using an independent dataset or the test data itself. The latter is referred to as transcription-mode adaptation. In this paper, we experimentally study the effect of these various parameters, and report on our findings.