
asterix77's library 835 articles

 
 

Nonparametric belief propagation

Commun. ACM, Vol. 53, No. 10. (October 2010), pp. 95-103, doi:10.1145/1831407.1831431

Abstract

Continuous quantities are ubiquitous in models of real-world phenomena, but are surprisingly difficult to reason about automatically. Probabilistic graphical models such as Bayesian networks and Markov random fields, and algorithms for approximate inference such as belief propagation (BP), have proven to be powerful tools in a wide range of applications in statistics and artificial intelligence. However, applying these methods to models with continuous variables remains a challenging task. In this work we describe an extension of BP to continuous variable models, ...

 

CLOSE-A Data-Driven Approach to Speech Separation

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 21, No. 7. (July 2013), pp. 1355-1368, doi:10.1109/tasl.2013.2250959
posted to datadriven monaural nonparametric separation by asterix77 on 2013-04-16 14:35:54 read

Abstract

This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error ...

 

K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation

Signal Processing, IEEE Transactions on, Vol. 54, No. 11. (November 2006), pp. 4311-4322, doi:10.1109/tsp.2006.881199
posted to dictionarylearning sparse by asterix77 on 2013-04-12 17:12:41 read

Abstract

In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit ...

 

The PASCAL CHiME speech separation and recognition challenge

Computer Speech & Language, Vol. 27, No. 3. (May 2013), pp. 621-633, doi:10.1016/j.csl.2012.10.004
posted to noise noisy speechrecognition survey by asterix77 on 2013-04-09 19:39:40 read

Abstract

Distant microphone speech recognition systems that operate with human-like robustness remain a distant goal. The key difficulty is that operating in everyday listening conditions entails processing a speech signal that is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation that was designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords from sentences reverberantly mixed into ...

 

The Infinite Factorial Hidden Markov Model

In Advances in Neural Information Processing Systems 21 (2009), pp. 1697-1704
posted to bayesiannonparametric factorialhmm hmm infinite by asterix77 on 2013-03-21 16:52:13 read
 

Regularized estimation of cepstrum envelope from discrete frequency points

In Applications of Signal Processing to Audio and Acoustics, 1995., IEEE ASSP Workshop on (October 1995), pp. 213-216, doi:10.1109/aspaa.1995.482993
posted to cepstrum frequencydomainlpc by asterix77 on 2013-03-15 17:10:12 read

Abstract

This paper presents an improved method for the estimation of a continuous frequency-envelope when the value of this envelope is specified only at discrete frequencies. It is based on the Galas/Rodet (1990) approach which consists of fitting a cepstral amplitude envelope to the specified frequency points by minimizing a frequency-domain least-squares criterion. This paper introduces a regularization technique which increases the robustness of the estimation procedure. Used in combination with a warped frequency-scale, the proposed method is shown to provide an ...

 

Robust automatic speech recognition with missing and unreliable acoustic data

Speech Communication, Vol. 34, No. 3. (June 2001), pp. 267-285, doi:10.1016/s0167-6393(00)00034-0
posted to missingdata speechrecognition by asterix77 on 2013-03-04 21:35:18 read

Abstract

Human speech perception is robust in the face of a wide variety of distortions, both experimentally applied and naturally occurring. In these conditions, state-of-the-art automatic speech recognition (ASR) technology fails. This paper describes an approach to robust ASR which acknowledges the fact that some spectro-temporal regions will be dominated by noise. For the purposes of recognition, these regions are treated as missing or unreliable. The primary advantage of this viewpoint is that it makes minimal assumptions about any noise background. Instead, ...

 

Linear prediction on a warped frequency scale

The Journal of the Acoustical Society of America, Vol. 68, No. 4. (01 October 1980), pp. 1071-1076, doi:10.1121/1.384992
posted to lpc plp warping by asterix77 on 2013-02-21 22:39:10 read

Abstract

Linear prediction is considered with respect to a nonlinear frequency scale obtained by a first‐order all‐pass transformation. The predictor can be computed from a frequency‐warped autocorrelation function obtained from the power spectrum or by a direct linear transformation of the original acf. Three numerical procedures are compared. Alternatively, the predictor can be determined from a covariance matrix or (adaptively) from continuously formed correlations, suitably defined according to the all‐pass transformation. Prediction‐error minimization and spectral flattening are no longer equivalent criteria. In ...

 

Perceptually based linear predictive analysis of speech

In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85., Vol. 10 (April 1985), pp. 509-512, doi:10.1109/icassp.1985.1168384
posted to lpc plp by asterix77 on 2013-02-21 21:46:48 read

Abstract

A novel speech analysis method which uses several established psychoacoustic concepts, the perceptually based linear predictive analysis (PLP), models the auditory spectrum by the spectrum of the low-order all-pole model. The auditory spectrum is derived from the speech waveform by critical-band filtering, equal-loudness curve pre-emphasis, and intensity-loudness root compression. We demonstrate through analysis of both synthetic and natural speech that psychoacoustic concepts of spectral auditory integration in vowel perception, namely the F1, F2' concept of Carlson and Fant and the 3.5 ...

 

Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective

Signal Processing Letters, IEEE, Vol. 19, No. 8. (August 2012), pp. 495-498, doi:10.1109/lsp.2012.2204048
posted to noise noiseestimator tracking by asterix77 on 2013-02-21 14:26:34 read/This user's rating 2.0/Average rating 2.0

Abstract

We propose a new approach for online noise power spectral density (psd) tracking. In this approach, the prior and posterior probabilities of speech absence and also noise statistics are analytically retrieved from a maximum-likelihood-based criterion at every time-frequency slot. The recursive update rules of these three terms are performed in a unified manner and without relying on the conventional tracking of speech psd minima. A single parameter (a forgetting factor) is needed in this process. Comparisons with state of the art ...

 

Beta-Divergence as a Subclass of Bregman Divergence

Signal Processing Letters, IEEE, Vol. 18, No. 2. (February 2011), pp. 83-86, doi:10.1109/lsp.2010.2096211
posted to bregman divergence nmf by asterix77 on 2013-02-19 14:25:35 read

Abstract

In this paper, we present a complete proof that the β-divergence is a particular case of Bregman divergence. This little-known result makes it possible to straightforwardly apply theorems about Bregman divergences to β-divergences. This is of interest for numerous applications since these divergences are widely used, for instance in non-negative matrix factorization (NMF). ...

 

Short term spectral analysis, synthesis, and modification by discrete Fourier transform

Acoustics, Speech and Signal Processing, IEEE Transactions on, Vol. 25, No. 3. (June 1977), pp. 235-238, doi:10.1109/tassp.1977.1162950
posted to fourier stft by asterix77 on 2013-02-12 15:00:43 read/This user's rating 4.0/Average rating 4.0

Abstract

A theory of short term spectral analysis, synthesis, and modification is presented with an attempt at pointing out certain practical and theoretical questions. The methods discussed here are useful in designing filter banks when the filter bank outputs are to be used for synthesis after multiplicative modifications are made to the spectrum. ...

 

The Complex Gradient Operator and the CR-Calculus

(26 Jun 2009)
posted to filtering gradient gradient-descent by asterix77 on 2013-01-31 18:49:24 read/This user's rating 5.0/Average rating 5.0

Abstract

A thorough discussion and development of the calculus of real-valued functions of complex-valued vectors is given using the framework of the Wirtinger Calculus. The presented material is suitable for exposition in an introductory Electrical Engineering graduate level course on the use of complex gradients and complex Hessian matrices, and has been successfully used in teaching at UC San Diego. Going beyond the commonly encountered treatments of the first-order complex vector calculus, second-order considerations are examined in some detail filling a gap in the pedagogic literature. ...

 

All-pole modeling technique based on weighted sum of LSP polynomials

Signal Processing Letters, IEEE, Vol. 10, No. 6. (June 2003), pp. 180-183, doi:10.1109/lsp.2003.811635
posted to lpc lsf robustlpc by asterix77 on 2013-01-28 16:02:39 read

Abstract

This study presents a new technique called weighted-sum line spectrum pair (WLSP) where an all-pole filter is defined by using a sum of weighted line spectrum pair polynomials. The WLSP yields a stable all-pole filter of order m, whose autocorrelation function coincides with that of the input signal between indices 0 and m-1. By sacrificing the exact matching at index m, the WLSP models the autocorrelation of the input signal at the indices above m more accurately than conventional linear prediction ...

 

Discrete weighted mean square all-pole modeling

In Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on, Vol. 1 (April 2003), pp. I-828-I-831 vol.1, doi:10.1109/icassp.2003.1198909
posted to frequencydomainlpc plp by asterix77 on 2013-01-25 21:34:30 read/This user's rating 4.0/Average rating 4.0

Abstract

The paper presents a new method for all-pole model estimation based on minimization of the weighted mean square error in the sampled spectral domain. Due to the discrete nature of the proposed distance measure, emphasis can be put on an arbitrary set of spectral samples, which can greatly improve the model accuracy for periodic signals. Weighting can also be applied to improve the fitting in certain spectral regions according to any desired fidelity criterion. Iterative algorithm for determination of the optimal model ...

 

Linear predictive modelling of speech: constraints and line spectrum pair decomposition

(2004)
posted to lpc lsp prior by asterix77 on 2013-01-25 20:20:34 read

Abstract

In an exploration of the spectral modelling of speech, this thesis presents theory and applications of constrained linear predictive (LP) models. Spectral models are essential in many applications of speech technology, such as speech coding, synthesis and recognition. At present, the prevailing approach in speech spectral modelling is linear prediction. In speech coding, spectral models obtained by LP are typically quantised using a polynomial transform called the Line Spectrum Pair (LSP) decomposition. An inherent drawback of conventional LP is its inability ...

 

Linear prediction, extremal entropy and prior information in speech signal analysis and synthesis

Speech Communication, Vol. 1, No. 1. (May 1982), pp. 9-20, doi:10.1016/0167-6393(82)90004-8
posted to lpc maxent missingdata by asterix77 on 2013-01-25 20:05:54 read/This user's rating 3.0/Average rating 3.0

Abstract

This paper reviews the fundamental concepts of Linear Prediction (LP) and Maximum Entropy (ME) spectral analysis, and elucidates the reasons for their practical importance in the world of real signals. Subsequently, the powerful principle of Minimum Cross-Entropy (MCE) spectral analysis is introduced. MCE permits the incorporation of prior information into signal analysis. In a new approach to speech signal analysis, application of the MCE principle reduces the average number of predictor coefficients (poles) that have to be specified per time frame ...

 

Single-channel speech separation and recognition using loopy belief propagation

In Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on (April 2009), pp. 3845-3848, doi:10.1109/icassp.2009.4960466

Abstract

We address the problem of single-channel speech separation and recognition using loopy belief propagation in a way that enables efficient inference for an arbitrary number of speech sources. The graphical model consists of a set of N Markov chains, each of which represents a language model or grammar for a given speaker. A Gaussian mixture model with shared states is used to model the hidden acoustic signal for each grammar state of each source. The combination of sources is modeled in ...

 

A Bounded Divergence Measure Based on The Bhattacharyya Coefficient

(29 Aug 2012)
posted to distance probability by asterix77 on 2013-01-23 19:55:08 read

Abstract

We introduce a new divergence measure, the bounded Bhattacharyya distance (BBD), for quantifying the dissimilarity between probability distributions. BBD is based on the Bhattacharyya coefficient (fidelity), and is symmetric, positive semi-definite, and bounded. Unlike the Kullback-Leibler divergence, BBD does not require probability density functions to be absolutely continuous with respect to each other. We show that BBD belongs to the class of Csiszar f-divergence and derive certain relationships between BBD and well known measures such as Bhattacharyya, Hellinger and Jensen-Shannon divergence. Bounds on the Bayesian error probability are established ...

 

Information theory: A signal take on speech

Nature, Vol. 466, No. 7308. (11 August 2010), pp. 821-822, doi:10.1038/466821a
posted to review speechrecognition by asterix77 on 2013-01-18 15:43:35 read

Abstract

Approaches that abandon traditional speech categories offer promise for developing statistical descriptions that encapsulate how speech conveys information. Grandparents would be among the beneficiaries. Our own ease of understanding speech belies its underlying complexity. At an abstract level, it is easy enough to describe speech as a sequence of words or of phonemes, but it's notoriously difficult to analyse at the level of the acoustic signal. ...

 

Learning a better representation of speech soundwaves using restricted Boltzmann machines

In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on (May 2011), pp. 5884-5887, doi:10.1109/icassp.2011.5947700
posted to dbn neuralnet representation speech by asterix77 on 2013-01-15 15:38:51 read/This user's rating 2.0/Average rating 2.0

Abstract

State of the art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high dimensional speech sound waves into low dimensional encodings. While these have been successfully applied in speech recognition systems, such low dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher dimensional encodings could both improve performance in recognition tasks, and also be applied to ...

 

Digital signal processing in continuous time: a possibility for avoiding aliasing and reducing quantization error

In Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on, Vol. 2 (May 2004), pp. ii-589-92 vol.2, doi:10.1109/icassp.2004.1326326
posted to asynchronous continuoustime by asterix77 on 2013-01-15 15:32:53 read

Abstract

The operation of digital signal processors in continuous time is discussed. It is shown that the main advantages of digital arithmetic can be maintained in such operations, while aliasing of the signal and the quantization error is avoided altogether. Continuous-time operation makes possible a smaller number of bits for a given signal-to-quantization error ratio. Simulation results are presented. ...

 

Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 19, No. 5. (July 2011), pp. 1360-1367, doi:10.1109/tasl.2010.2090518
posted to asr discriminative finitestatetransducer language_model by asterix77 on 2013-01-10 15:32:32 read/This user's rating 4.0/Average rating 4.0

Abstract

Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically assembled or composed from several components (the language model, the pronunciation mapping, and the acoustic model), which are estimated separately without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as ...

 

An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 19, No. 7. (2011), pp. 2125-2136, doi:10.1109/tasl.2011.2114881
posted to intelligibility model objectivemeasure by asterix77 on 2012-12-16 16:25:38 read

Abstract

In the development process of noise-reduction algorithms, an objective machine-driven intelligibility measure which shows high correlation with speech intelligibility is of great interest. Besides reducing time and costs compared to real listening experiments, an objective intelligibility measure could also help provide answers on how to improve the intelligibility of noisy unprocessed speech. In this paper, a short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise ...

 

Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches

Signal Processing Magazine, IEEE, Vol. 29, No. 6. (November 2012), pp. 44-57, doi:10.1109/msp.2012.2210952
posted to articulatory speechrecognition survey by asterix77 on 2012-12-15 03:20:22 read/This user's rating 3.0/Average rating 3.0

Abstract

Modern automatic speech recognition systems handle large vocabularies of words, making it infeasible to collect enough repetitions of each word to train individual word models. Instead, large-vocabulary recognizers represent each word in terms of subword units. Typically the subword unit is the phone, a basic speech sound such as a single consonant or vowel. Each word is then represented as a sequence, or several alternative sequences, of phones specified in a pronunciation dictionary. Other choices of subword units have been studied ...

 

The Intelligibility of Interrupted Speech

The Journal of the Acoustical Society of America, Vol. 22, No. 2. (01 March 1950), pp. 167-173, doi:10.1121/1.1906584

Abstract

This paper concerns the effects of interrupting speech waves—turning them on and off intermittently or masking them with intermittent noise—upon their intelligibility. The effects were studied with various rates of interruption and with the speech left undisturbed various percentages of the time. Tests were conducted (1) with speech turned on and off in quiet, (2) with continuous speech masked by interrupted white noise, and (3) with speech and noise interrupted alternately, the speech wave being turned on as the noise wave ...

 

An Analysis of Perceptual Confusions Among Some English Consonants

The Journal of the Acoustical Society of America, Vol. 27, No. 2. (01 March 1955), pp. 338-352, doi:10.1121/1.1907526
posted to confusion intelligibility phonemes by asterix77 on 2012-12-10 23:01:04 read

Abstract

Sixteen English consonants were spoken over voice communication systems with frequency distortion and with random masking noise. The listeners were forced to guess at every sound and a count was made of all the different errors that resulted when one sound was confused with another. With noise or low‐pass filtering the confusions fall into consistent patterns, but with high‐pass filtering the errors are scattered quite randomly. An articulatory analysis of these 16 consonants provides a system of five articulatory features or ...

 

Uncomodulated glimpsing in “checkerboard” noise

The Journal of the Acoustical Society of America, Vol. 93, No. 5. (01 May 1993), pp. 2915-2922, doi:10.1121/1.405811
posted to glimpsing intelligibility psychoacoustics by asterix77 on 2012-12-10 22:58:41 read

Abstract

The ability of listeners to “glimpse” acoustic cues during the quieter sections of an interrupted noise has primarily been studied using maskers with interruptions occurring simultaneously across the entire frequency range of the masker (broadband comodulated interruptions). Here, the possibility of uncomodulated glimpsing (the glimpsing of acoustic cues separated both in time and frequency) was investigated. To achieve this, speech reception thresholds for a set of intervocalic consonants were adaptively measured in 100‐Hz to 10‐kHz pink noise divided into a varying number ...

 

On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis

In Speech Separation by Humans and Machines (2005), pp. 181-197, doi:10.1007/0-387-22794-6_12
edited by Pierre Divenyi
posted to idealbinarymask separation by asterix77 on 2012-12-10 21:55:51 read

Abstract

In his famous treatise of computational vision, Marr (1982) makes a compelling argument for separating different levels of analysis in order to understand complex information processing. In particular, the computational theory level, concerned with the goal of computation and general processing strategy, must be separated from the algorithm level, or the separation of what from how. This chapter is an attempt at a computational-theory analysis of auditory scene analysis, where the main task is to understand the character of the CASA ...

 

Role of mask pattern in intelligibility of ideal binary-masked noisy speech

The Journal of the Acoustical Society of America, Vol. 126, No. 3. (01 September 2009), pp. 1415-1426, doi:10.1121/1.3179673
posted to idealbinarymask intelligibility masking psychoacoustics by asterix77 on 2012-12-10 17:12:06 read/This user's rating 4.0/Average rating 4.0

Abstract

Intelligibility of ideal binary masked noisy speech was measured on a group of normal hearing individuals across mixture signal to noise ratio (SNR) levels, masker types, and local criteria for forming the binary mask. The binary mask is computed from time-frequency decompositions of target and masker signals using two different schemes: an ideal binary mask computed by thresholding the local SNR within time-frequency units and a target binary mask computed by comparing the local target energy against the long-term average speech ...

 

Consonant and vowel confusions in speech-weighted noise

The Journal of the Acoustical Society of America, Vol. 121, No. 4. (01 April 2007), pp. 2312-2326, doi:10.1121/1.2642397
posted to intelligibility phonemes psychoacoustics by asterix77 on 2012-12-10 16:49:49 read

Abstract

This paper presents the results of a closed-set recognition task for 64 consonant-vowel sounds (16 C×4 V, spoken by 18 talkers) in speech-weighted noise (−22,−20,−16,−10,−2 [dB]) and in quiet. The confusion matrices were generated using responses of a homogeneous set of ten listeners and the confusions were analyzed using a graphical method. In speech-weighted noise the consonants separate into three sets: a low-scoring set C1 (/f/, /θ/, /v/, /ð/, /b/, /m/), a high-scoring set C2 (/t/, /s/, /z/, /ʃ/, /ʒ/) and set C3 (/n/, ...

 

A psychoacoustic method to find the perceptual cues of stop consonants in natural speech

The Journal of the Acoustical Society of America, Vol. 127, No. 4. (01 April 2010), pp. 2599-2610, doi:10.1121/1.3295689
posted to intelligibility psychoacoustics by asterix77 on 2012-12-10 15:10:46 read

Abstract

Synthetic speech has been widely used in the study of speech cues. A serious disadvantage of this method is that it requires prior knowledge about the cues to be identified in order to synthesize the speech. Incomplete or inaccurate hypotheses about the cues often lead to speech sounds of low quality. In this research a psychoacoustic method, named three-dimensional deep search (3DDS), is developed to explore the perceptual cues of stop consonants from naturally produced speech. For a given sound, it ...

 

An Uncertainty Propagation Approach to Robust ASR Using the ETSI Advanced Front-End

Selected Topics in Signal Processing, IEEE Journal of, Vol. 4, No. 5. (October 2010), pp. 824-833, doi:10.1109/jstsp.2010.2057194
posted to speechrecognition uncertaintypropagation by asterix77 on 2012-12-10 01:36:25 read

Abstract

In this paper, we show how uncertainty propagation, combined with observation uncertainty techniques, can be applied to a realistic implementation of robust distributed speech recognition (DSR) to further improve recognition robustness, with little increase in computational complexity. Uncertainty propagation, or error propagation, techniques employ a probabilistic description of speech to reflect the information lost during speech enhancement or source separation in the time or frequency domain. This uncertain description is then propagated through the feature extraction process to the domain of ...

 

The influence of spectral characteristics of early reflections on speech intelligibility

The Journal of the Acoustical Society of America, Vol. 130, No. 2. (01 August 2011), pp. 996-1005, doi:10.1121/1.3609258
posted to earlyechos intelligibility by asterix77 on 2012-12-06 18:11:24 read

Abstract

The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated for normal-hearing and hearing-impaired listeners. Monaural and binaural speech intelligibility tests were performed in a virtual auditory environment where the spectral characteristics of ERs from a simulated room could be preserved. The useful ER energy was ...

 

Semiparametric Bayesian Inference for Time Series with Mixed Spectra

Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 59, No. 1. (1997), pp. 255-268, doi:10.1111/1467-9868.00067
posted to lpc robustlpc by asterix77 on 2012-11-28 17:21:47 read

Abstract

A Bayesian analysis is presented of a time series which is the sum of a stationary component with a smooth spectral density and a deterministic component consisting of a linear combination of a trend and periodic terms. The periodic terms may have known or unknown frequencies. The advantage of our approach is that different features of the data—such as the regression parameters, the spectral density, unknown frequencies and missing observations—are combined in a hierarchical Bayesian framework and estimated simultaneously. A Bayesian ...

 

Spectral envelope estimation using a penalized likelihood criterion

In Applications of Signal Processing to Audio and Acoustics, 1997. 1997 IEEE ASSP Workshop on (October 1997), 4 pp., doi:10.1109/aspaa.1997.625612
posted to frequencydomainlpc lpc robustlpc by asterix77 on 2012-11-28 16:56:32 read

Abstract

Finding a smooth spectral envelope that connects estimated sinusoids is a topic of major importance in audio signal processing. A penalized likelihood criterion is introduced for the estimation of the spectral envelope in the presence of measurement noise. Various simulation results are presented that highlight the efficiency of the proposed performance criterion ...

 

Discrete all-pole modeling

Signal Processing, IEEE Transactions on, Vol. 39, No. 2. (February 1991), pp. 411-423, doi:10.1109/78.80824
posted to frequencydomainlpc lpc robustlpc by asterix77 on 2012-11-28 16:07:01 read/This user's rating 3.0/Average rating 3.0

Abstract

A method for parametric modeling of spectral envelopes when only a discrete set of spectral points is given is introduced. This method, called discrete all-pole (DAP) modeling, uses a discrete version of the Itakura-Saito distortion measure as its error criterion. One result is an autocorrelation matching condition that overcomes the limitations of linear prediction and produces better fitting spectral envelopes for spectra that are representable by a relatively small discrete set of values, such as in voiced speech. An iterative algorithm ...

 

Maximum-Likelihood Autoregressive Estimation on Incomplete Spectra

In Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, Vol. 3 (April 2007), pp. III-1001-III-1004, doi:10.1109/icassp.2007.366851
posted to frequencydomainlpc lpc robustlpc by asterix77 on 2012-11-28 16:01:13 read/This user's rating 3.0/Average rating 3.0

Abstract

Frequency-selective autoregressive (AR) estimation is arousing increasing interest. We propose herein a new method to estimate the AR model from a reduced set of spectral samples. The proposed method is founded on the maximum likelihood criterion over the logarithmic spectral residue, and it is implemented efficiently with a multivariate Newton-Raphson algorithm. Results over deterministic and stochastic scenarios show its excellent performance ...

 

All-Pole Estimation in Spectral Domain

IEEE Transactions on Signal Processing, Vol. 55, No. 10. (October 2007), pp. 4821-4830, doi:10.1109/tsp.2007.897880
posted to frequencydomainlpc lpc robustlpc by asterix77 on 2012-11-27 17:08:53 read/This user's rating 3.0/Average rating 3.0
 

Probabilistic Inference of Speech Signals from Phaseless Spectrograms

In Neural Information Processing Systems (2003)
posted to graphicalmodels phase by asterix77 on 2012-11-02 18:28:57 read/This user's rating 4.0/Average rating 4.0

Abstract

Many techniques for complex speech processing such as denoising and deconvolution, time/frequency warping, multiple speaker separation, and multiple microphone analysis operate on sequences of short-time power spectra (spectrograms), a representation which is often well-suited to these tasks. However, a significant problem with algorithms that manipulate spectrograms is that the output spectrogram does not include a phase component, which is needed to create a time-domain signal that has good perceptual quality. Here we describe a generative model of time-domain speech signals and their spectrograms, and show how an efficient ...

 

A Comparison of Acoustic Features for Articulatory Inversion

In Interspeech (2007)
posted to articulatory by asterix77 on 2012-11-02 16:13:19 read
 

Robust LPC analysis of speech by extended correlation matching

In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '85., Vol. 10 (April 1985), pp. 473-476, doi:10.1109/icassp.1985.1168377
posted to lpc robustlpc by asterix77 on 2012-11-02 16:07:52 read

Abstract

Contamination of speech, for example by environmental noise, is sometimes unavoidable. Under such circumstances the familiar LPC analysis technique, either for low bit-rate coding or for automated recognition at the receiver, becomes fragile thus jeopardizing the system objective. In this paper we present an extended correlation matching approach for LPC analysis which results in good spectral matching between the true speech spectrum and the all-pole model spectrum, especially for the first three formants. The method has been tested on both synthetic ...

 

The unimportance of phase in speech enhancement

Acoustics, Speech and Signal Processing, IEEE Transactions on, Vol. 30, No. 4. (August 1982), pp. 679-681, doi:10.1109/tassp.1982.1163920
posted to phase subjective by asterix77 on 2012-11-02 16:06:33 read

Abstract

The importance of Fourier transform phase in speech enhancement is considered. Results indicate that a more accurate estimation of phase is unwarranted in speech enhancement at the S/N ratios where the intelligibility scores of unprocessed speech range from 5 to 95 percent, if the phase estimate is used to reconstruct speech by combining it with an independently estimated magnitude or to reconstruct speech using the phase-only signal reconstruction algorithm. ...

 

Online EM for unsupervised models

In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (2009), pp. 611-619
posted to em gmm online by asterix77 on 2012-09-24 21:31:05 read

Abstract

The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and word alignment. ...

 

An Introduction to Conditional Random Fields for Relational Learning

(2007)
posted to crf graphicalmodel hmm statistics by asterix77 on 2010-09-15 23:21:41 read
 

No Unbiased Estimator of the Variance of K-Fold Cross-Validation

J. Mach. Learn. Res., Vol. 5 (2004), pp. 1089-1105
posted to correlation machinelearning statistics by asterix77 on 2010-08-16 21:13:50 read along with 2 people and 1 group caporaso jmartinezot biomedical-nlp

Abstract

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of K-fold cross-validation. The analysis that accompanies ...

 

Generating Spike Trains with Specified Correlation Coefficients

Neural Computation, Vol. 21, No. 2. (4 August 2008), pp. 397-423, doi:10.1162/neco.2008.02-08-713

Abstract

Spike trains recorded from populations of neurons can exhibit substantial pairwise correlations between neurons and rich temporal structure. Thus, for the realistic simulation and analysis of neural systems, it is essential to have efficient methods for generating artificial spike trains with specified correlation structure. Here we show how correlated binary spike trains can be simulated by means of a latent multivariate gaussian model. Sampling from the model is computationally very efficient and, in particular, feasible even for large populations of neurons. ...

 

Tracking an unknown time-varying number of speakers using TDOA measurements: a random finite set approach

Signal Processing, IEEE Transactions on, Vol. 54, No. 9. (September 2006), pp. 3291-3304, doi:10.1109/tsp.2006.877658
posted to localization motion tracking by asterix77 on 2010-06-11 23:21:51 read/This user's rating 3.0/Average rating 3.0

Abstract

Speaker location estimation techniques based on time-difference-of-arrival measurements have attracted much attention recently. Many existing localization ideas assume that only one speaker is active at a time. In this paper, we focus on a more realistic assumption that the number of active speakers is unknown and time-varying. Such an assumption results in a more complex localization problem, and we employ the random finite set (RFS) theory to deal with that problem. The RFS concepts provide us with an effective, solid foundation ...

 

Tandem connectionist feature extraction for conventional HMM systems

Acoustics, Speech, and Signal Processing, IEEE International Conference on, Vol. 3 (2000), pp. 1635-1638, doi:10.1109/icassp.2000.862024
posted to neuralnet speechrecognition by asterix77 on 2010-04-22 18:12:51 read along with 1 person ryanbuaa

Abstract

Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling. By training the network to generate the subword probability posteriors, then using ...

 

The Aurora Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions

In ISCA ITRW ASR2000 (2000), pp. 29-32
posted to data noise speechrecognition by asterix77 on 2010-04-22 18:08:41 read along with 1 person bklynbam

Abstract

This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may either be used to measure frontend feature extraction algorithms, using a defined HMM recognition back-end, or complete recognition systems. The source speech for this database is the TIdigits, consisting of connected digits task spoken by American English talkers (downsampled to 8kHz). A selection of 8 different real-world noises have been added to the speech over a range of signal to noise ...

Note: You may cite this page as: http://www.citeulike.org/user/asterix77/order/to_read,asc,last
