CiteULike is a free online bibliography manager. Register and you can start organising your references online.
Tags

Robust automatic speech recognition with missing and unreliable acoustic data

by: Martin Cooke, Phil Green, Ljubomir Josifovski, Ascension Vizinho
Speech Communication, Vol. 34, No. 3. (June 2001), pp. 267-285, doi:10.1016/s0167-6393(00)00034-0  Key: citeulike:12105774

Formatted Citation


Show HTML

Likes (beta)

This copy of the article hasn't been liked by anyone yet.

View FullText article


Abstract

Human speech perception is robust in the face of a wide variety of distortions, both experimentally applied and naturally occurring. In these conditions, state-of-the-art automatic speech recognition (ASR) technology fails. This paper describes an approach to robust ASR which acknowledges the fact that some spectro-temporal regions will be dominated by noise. For the purposes of recognition, these regions are treated as missing or unreliable. The primary advantage of this viewpoint is that it makes minimal assumptions about any noise background. Instead, reliable regions are identified, and subsequent decoding is based on this evidence. We introduce two approaches for dealing with unreliable evidence. The first – marginalisation – computes output probabilities on the basis of the reliable evidence only. The second – state-based data imputation – estimates values for the unreliable regions by conditioning on the reliable parts and the recognition hypothesis. A further source of information is the bounds on the energy of any constituent acoustic source in an additive mixture. This additional knowledge can be incorporated into the missing data framework. These approaches are applied to continuous-density hidden Markov model (HMM)-based speech recognisers and evaluated on the TIDigits corpus for several noise conditions. Two criteria which use simple noise estimates are employed as a means of identifying reliable regions. The first treats regions which are negative after spectral subtraction as unreliable. The second uses the estimated noise spectrum to derive local signal-to-noise ratios, which are then thresholded to identify reliable data points. Both marginalisation and state-based data imputation produce a substantial performance advantage over spectral subtraction alone. The use of energy bounds leads to a further increase in performance for both approaches. While marginalisation outperforms data imputation, the latter technique allows the technique to act as a preprocessor for conventional recognisers, or in speech-enhancement applications.


asterix77's tags for this article

Citations (CiTO)

No CiTO relationships defined

X There are no reviews yet

X Find related articles with these CiteULike tags

X Posting History


X Export records

Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.