Stochastic Pronunciation Modeling by Ergodic-HMM of Acoustic Sub-word Units
Notes for this article
The aim is more robust recognition of non-native speech (tolerance of mispronunciations).
This is achieved with better sub-word unit models trained using the EHMM.
Abstract
We propose a stochastic pronunciation model using an ergodic hidden Markov model (EHMM) of automatically derived acoustic sub-word units (SWUs). The proposed EHMM discovers the pronunciation structure inherent in the acoustic training data of a word without any a priori phonetic transcriptions. The EHMM is an HMM of HMMs: its states are SWU HMMs, and its state transitions compose the various possible lexicons (pronunciations) of the word. The EHMM parameters are estimated by an iterative segmental K-means procedure, which jointly optimizes the sub-word units (states) and the pronunciation structure parameters (state transitions). The EHMM-based pronunciation model is evaluated on an English isolated-word recognition task with 70 speakers drawn from 8 highly different first languages. Results show that the EHMM learns the lexicon distribution over the population of speakers for each word, thereby effectively modeling inter-speaker pronunciation variability. The EHMM improves word recognition accuracy by 8% (absolute) over using only the single most likely lexicon.
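The abstract describes training as a segmental K-means loop: segment each training utterance with the current model (Viterbi alignment), then re-estimate the units from the frames aligned to them and the transitions from the aligned state sequences. The sketch below illustrates that loop on a toy problem under heavy simplifying assumptions that are not from the paper: each sub-word unit is a single 1-D Gaussian state rather than a multi-state SWU HMM, features are scalars, the units are initialised at data quantiles, and transitions use add-one smoothing so the model stays ergodic. All function names (`segmental_kmeans`, `viterbi`, `collapse_runs`) are hypothetical; this is not the authors' implementation.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(seq, means, variances, log_init, log_trans):
    """Most likely state path through a fully connected (ergodic) HMM."""
    n = len(means)
    delta = [log_init[j] + log_gauss(seq[0], means[j], variances[j]) for j in range(n)]
    backptrs = []
    for x in seq[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # Every state may follow every other state: that is what "ergodic" means.
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_gauss(x, means[j], variances[j]))
        backptrs.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def collapse_runs(path):
    """Frame-level state path -> unit sequence, e.g. [0, 0, 1, 1, 0] -> [0, 1, 0]."""
    return [s for k, s in enumerate(path) if k == 0 or s != path[k - 1]]


def segmental_kmeans(utterances, n_states, n_iter=10, var_floor=1e-3):
    """Alternate Viterbi segmentation with re-estimation of units and transitions."""
    frames = sorted(x for u in utterances for x in u)
    # Assumed deterministic initialisation: spread the unit means over data quantiles.
    means = [frames[int((k + 0.5) * len(frames) / n_states)] for k in range(n_states)]
    variances = [1.0] * n_states
    log_init = [-math.log(n_states)] * n_states
    log_trans = [[-math.log(n_states)] * n_states for _ in range(n_states)]
    prev_paths = None
    for _ in range(n_iter):
        # Step 1: segment every utterance with the current model.
        paths = [viterbi(u, means, variances, log_init, log_trans) for u in utterances]
        if paths == prev_paths:
            break  # alignment unchanged, so re-estimation would change nothing
        prev_paths = paths
        # Step 2a: re-estimate each unit (state) from the frames aligned to it.
        for s in range(n_states):
            xs = [x for u, p in zip(utterances, paths) for x, q in zip(u, p) if q == s]
            if xs:  # keep the old parameters if a unit received no frames
                means[s] = sum(xs) / len(xs)
                variances[s] = max(var_floor, sum((x - means[s]) ** 2 for x in xs) / len(xs))
        # Step 2b: re-estimate initial and transition probabilities from the paths,
        # with add-one smoothing so every transition stays possible (ergodic).
        init_counts = [1.0] * n_states
        trans_counts = [[1.0] * n_states for _ in range(n_states)]
        for p in paths:
            init_counts[p[0]] += 1
            for a, b in zip(p, p[1:]):
                trans_counts[a][b] += 1
        log_init = [math.log(c / sum(init_counts)) for c in init_counts]
        log_trans = [[math.log(c / sum(row)) for c in row] for row in trans_counts]
    return means, variances, log_init, log_trans


if __name__ == "__main__":
    # Toy 1-D "utterances" of one word: a low-valued unit, then a high-valued one;
    # the second speaker appends a third segment (a pronunciation variant).
    data = [[0.1, -0.2, 0.0, 5.1, 4.9, 5.0], [0.2, 0.1, 4.8, 5.2, 5.1, -0.1, 0.0]]
    means, variances, log_init, log_trans = segmental_kmeans(data, n_states=2)
    for u in data:
        print(collapse_runs(viterbi(u, means, variances, log_init, log_trans)))
```

On this toy data the two utterances decode to the unit sequences `[0, 1]` and `[0, 1, 0]`, i.e. two different "lexicons" for the same word, and the learned transition matrix weights them according to how often each path occurred. In the paper's setting each state would itself be an SWU HMM, so the alignment step becomes a search over a network of HMMs rather than single Gaussian states.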