
asterix77's library 835 articles

 
 

Inference of Room Geometry From Acoustic Impulse Responses

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 20, No. 10. (December 2012), pp. 2683-2695, doi:10.1109/tasl.2012.2210877
posted to acoustics geometry reverb rooms by asterix77 on 2013-04-18 15:09:57 along with 1 person aharma

Abstract

Acoustic scene reconstruction is a process that aims to infer characteristics of the environment from acoustic measurements. We investigate the problem of locating planar reflectors in rooms, such as walls and furniture, from signals obtained using distributed microphones. Specifically, localization of multiple two-dimensional (2-D) reflectors is achieved by estimation of the time of arrival (TOA) of reflected signals by analysis of acoustic impulse responses (AIRs). The estimated TOAs are converted into elliptical constraints about the location of the line reflector, ...
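The elliptical constraint can be illustrated with a short sketch; it is not the authors' implementation. It assumes a 2-D room, a speed of sound of 343 m/s, and a first-order reflection whose TOA is measured from emission at the source. A reflected path of length c·τ from source to microphone places the reflection point on an ellipse whose foci are the source and microphone and whose major axis is c·τ, so a line reflector must be tangent to that ellipse. The names `toa_to_ellipse` and `Ellipse` are hypothetical.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


@dataclass
class Ellipse:
    center: tuple       # (x, y) midpoint between the two foci
    semi_major: float   # a, half the reflected path length
    semi_minor: float   # b, from a^2 = b^2 + f^2 with f the focal half-distance
    angle: float        # orientation of the major axis in radians


def toa_to_ellipse(source, mic, toa, c=SPEED_OF_SOUND):
    """Locus of reflection points consistent with one first-order echo TOA."""
    path = c * toa  # total length source -> reflector -> mic
    dx, dy = mic[0] - source[0], mic[1] - source[1]
    focal_half = 0.5 * math.hypot(dx, dy)
    a = 0.5 * path
    if a <= focal_half:
        raise ValueError("TOA is not longer than the direct path")
    b = math.sqrt(a * a - focal_half * focal_half)
    center = ((source[0] + mic[0]) / 2.0, (source[1] + mic[1]) / 2.0)
    return Ellipse(center, a, b, math.atan2(dy, dx))
```

For example, with the source at (1, 1), the microphone at (3, 1) and a wall along y = 0, the image-source path length is √8 m, giving a = √2 m and b = 1 m, so the lowest point of the ellipse touches the wall. With several source/microphone pairs, each TOA yields one ellipse and the reflector is a line tangent to all of them.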

 

Exemplar-Based Processing for Speech Recognition: An Overview

Signal Processing Magazine, IEEE, Vol. 29, No. 6. (November 2012), pp. 98-113, doi:10.1109/msp.2012.2208663
posted to exemplar speechrecognition by asterix77 on 2013-04-09 20:43:44

Abstract

Solving real-world classification and recognition problems requires a principled way of modeling the physical phenomena generating the observed data and the uncertainty in it. The uncertainty originates from the fact that many data generation aspects are influenced by nondirectly measurable variables or are too complex to model and hence are treated as random fluctuations. For example, in speech production, uncertainty could arise from vocal tract variations among different people or corruption by noise. The goal of modeling is to establish a ...

 

Sparse Representations in Audio and Music: From Coding to Source Separation

Proceedings of the IEEE, Vol. 98, No. 6. (June 2010), pp. 995-1005, doi:10.1109/jproc.2009.2030345
posted to review sparse by asterix77 on 2013-03-01 16:56:46

Abstract

Sparse representations have proved a powerful tool in the analysis and processing of audio signals and already lie at the heart of popular coding standards such as MP3 and Dolby AAC. In this paper we give an overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the “cocktail party problem.” In each case we will show how the prior assumption ...

 

Recursive expectation-maximization (EM) algorithms for time-varying parameters with applications to multiple target tracking

Signal Processing, IEEE Transactions on, Vol. 47, No. 2. (February 1999), pp. 306-320, doi:10.1109/78.740104
posted to em kalmanfilter separation tracking by asterix77 on 2013-02-20 21:47:53

Abstract

We investigate the application of expectation maximization (EM) algorithms to the classical problem of multiple target tracking (MTT) for a known number of targets. Conventional algorithms, which deal with this problem, have a computational complexity that depends exponentially on the number of targets, and usually divide the problem into a localization stage and a tracking stage. The new algorithms achieve a linear dependency and integrate these two stages. Three optimization criteria are proposed, using deterministic and stochastic dynamic models for the ...

 

Multiple hypothesis tracking using clustered measurements

In Robotics and Automation, 2009. ICRA '09. IEEE International Conference on (May 2009), pp. 3955-3961, doi:10.1109/robot.2009.5152841
posted to clustering temporal tracking by asterix77 on 2013-02-11 22:15:01

Abstract

This paper introduces an algorithm for tracking targets whose locations are inferred from clusters of observations. This method, which we call MHTC, expands the traditional multiple hypothesis tracking (MHT) hypothesis tree to include model hypotheses - possible ways the data can be clustered in each time step - as well as ways the measurements can be associated with existing targets across time steps. We present this new hypothesis framework and its probability expressions and demonstrate MHTC's operation in a robotic solution ...

 

On-line expectation–maximization algorithm for latent data models

Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 71, No. 3. (1 June 2009), pp. 593-613, doi:10.1111/j.1467-9868.2009.00698.x

Abstract

We propose a generic on-line (also sometimes called adaptive or recursive) version of the expectation–maximization (EM) algorithm applicable to latent variable models of independent observations. Compared with the algorithm of Titterington, this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete-data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback–Leibler divergence between the marginal distribution of the ...

 

Adapting Scrum to Managing a Research Group

No. CS-TR-4966. (September 2010)
posted to researchmethodology by asterix77 on 2013-01-23 19:58:06
 

Band importance for sentences and words reexamined

Vol. 133, No. 1. (01 January 2013), pp. 463-473, doi:10.1121/1.4770246
posted to bandimportancefunction psychoacoustics by asterix77 on 2013-01-20 20:15:22

Abstract

Band-importance functions were created using the “compound” technique [Apoux and Healy, J. Acoust. Soc. Am. 132, 1078–1087 (2012)] that accounts for the multitude of synergistic and redundant interactions that take place among speech bands. Functions were created for standard recordings of the speech perception in noise (SPIN) sentences and the Central Institute for the Deaf (CID) W-22 words using 21 critical-band divisions and steep filtering to eliminate the influence of filter slopes. On a given trial, a band of interest was ...

 

Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility

Proceedings of the National Academy of Sciences, Vol. 107, No. 27. (06 July 2010), pp. 12387-12392, doi:10.1073/pnas.0913625107
posted to importance intelligibility noise by asterix77 on 2013-01-18 16:26:21

Abstract

Speech sounds are traditionally divided into consonants and vowels. When only vowels or only consonants are replaced by noise, listeners are more accurate understanding sentences in which consonants are replaced but vowels remain. From such data, vowels have been suggested to be more important for understanding sentences; however, such conclusions are mitigated by the fact that replaced consonant segments were roughly one-third shorter than vowels. We report two experiments that demonstrate listener performance to be better predicted by simple psychoacoustic measures ...

 

Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 18, No. 6. (August 2010), pp. 1643-1654, doi:10.1109/tasl.2009.2038819
posted to frequencydomainlpc lpc monaural robustlpc separation by asterix77 on 2012-11-28 18:19:29

Abstract

A new method for the estimation of multiple concurrent pitches in piano recordings is presented. It addresses the issue of overlapping overtones by modeling the spectral envelope of the overtones of each note with a smooth autoregressive model. For the background noise, a moving-average model is used and the combination of both tends to eliminate harmonic and sub-harmonic erroneous pitch estimations. This leads to a complete generative spectral model for simultaneous piano notes, which also explicitly includes the typical deviation from ...

 

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

(27 Jun 2012)
posted to music transcription by asterix77 on 2012-09-26 21:25:05

Abstract

We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription. ...

 

Web-Scale Multimedia Analysis: Does Content Matter?

Multimedia, IEEE, Vol. 18, No. 2. (February 2011), pp. 12-15, doi:10.1109/mmul.2011.34

Abstract

The initial success of Web-image search was based exclusively on the text around an image. Certainly we have progressed since then. But recent research results dramatically beg to differ. For example, if you want to judge the similarity of two different pieces of music, should you look at the musical notes, or should you look at what people say about the music? Similarly, how should you find the best movie to recommend to a friend? Shouldn't the genre of the movie ...

 

Estimators of The Magnitude-Squared Spectrum and Methods for Incorporating SNR Uncertainty

IEEE Transactions on Audio, Speech, and Language Processing (2010), doi:10.1109/tasl.2010.2082531
posted to estimation masking oracle separation by asterix77 on 2011-05-06 06:04:46

Abstract

Statistical estimators of the magnitude-squared spectrum are derived based on the assumption that the magnitude-squared spectrum of the noisy speech signal can be computed as the sum of the (clean) signal and noise magnitude-squared spectra. Maximum a posteriori (MAP) and minimum mean square error (MMSE) estimators are derived based on a Gaussian statistical model. The gain function of the MAP estimator was found to be identical to the gain function used in the ideal binary mask (IdBM) that is widely used ...
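The ideal binary mask is an oracle: it needs the clean and noise spectra separately. The sketch below assumes the commonly used 0 dB local-SNR criterion (the abstract does not state the paper's threshold) together with the additive magnitude-squared model described above. It shows the IdBM gain only, not the paper's MAP or MMSE derivations; `ideal_binary_mask` and `masked_noisy_power` are hypothetical names.

```python
import numpy as np


def ideal_binary_mask(clean_power, noise_power, threshold_db=0.0):
    """Keep a time-frequency bin (gain 1) when its local SNR exceeds the threshold."""
    clean_power = np.asarray(clean_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # Compare powers directly instead of in dB to avoid log(0) for silent bins.
    return (clean_power > noise_power * 10.0 ** (threshold_db / 10.0)).astype(float)


def masked_noisy_power(clean_power, noise_power, threshold_db=0.0):
    # Additive model: noisy magnitude-squared = clean + noise magnitude-squared.
    noisy_power = np.asarray(clean_power, float) + np.asarray(noise_power, float)
    return ideal_binary_mask(clean_power, noise_power, threshold_db) * noisy_power
```

For bins with clean power [4, 1, 0.5] and unit noise power, only the first bin exceeds 0 dB, so the masked output keeps its noisy power of 5 and zeroes the rest.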

 

Spatial Hearing in Echoic Environments: The Role of the Envelope in Owls

Vol. 67, No. 4. (26 August 2010), pp. 643-655
posted to itd localization neural precedenceeffect by asterix77 on 2010-08-26 15:11:07

Abstract

In the precedence effect, sounds emanating directly from the source are localized preferentially over their reflections. Although most studies have focused on the delay between the onset of a sound and its echo, humans still experience the precedence effect when this onset delay is removed. We tested in barn owls the hypothesis that an ongoing delay, equivalent to the onset delay, is discernible from the envelope features of amplitude-modulated stimuli and may be sufficient to evoke this effect. With sound pairs ...

 

Learning Similarity from Collaborative Filters

In International Society of Music Information Retrieval Conference (2010), pp. 345-350
posted to collaborativefiltering machinelearning similarity tags by asterix77 on 2010-08-23 21:16:41
 

Improving Auto-tagging by Modeling Semantic Co-occurrences

In International Society of Music Information Retrieval Conference (2010), pp. 297-302
posted to classification dirichlet tags by asterix77 on 2010-08-23 21:16:41
 

An Efficient Learning Procedure for Deep Boltzmann Machines

No. MIT-CSAIL-TR-2010-037. (August 2010)
posted to machinelearning rbm review by asterix77 on 2010-08-16 14:06:20 along with 1 person wangxinxi

Abstract

We present a new learning algorithm for Boltzmann Machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann Machines with multiple hidden layers and millions of parameters. The learning can be made more ...

 

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain

(1994)
posted to gradient gradient-descent optimization by asterix77 on 2010-07-22 16:47:05 along with 23 people and 2 groups abergeron ansobol aufrank bwilfley bwoodacre daniel51 dbkruger ddahlem dsquared fghorow iff kohei-h kutabar nafets nsephus3 pmendes rally rblake RhysU solo1nombre thomast2 thomast3 Zaphod uiuc-cs ur-cls

Abstract

The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written so that even their own authors would be mystified, if they bothered to read their own writing. For this reason, an understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Nevertheless, the Conjugate Gradient Method is a composite of simple, elegant ideas that ...
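For readers who want something concrete next to the tutorial, here is a minimal textbook conjugate gradient loop for a symmetric positive-definite matrix. It is a sketch, not code from the tutorial; the function name, tolerance and the 2×2 example are our own choices.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A @ x          # residual, equal to minus the gradient of the quadratic
    d = r.copy()           # first search direction is steepest descent
    rs_old = r @ r
    for _ in range(max_iter or n):  # exact arithmetic needs at most n steps
        if np.sqrt(rs_old) < tol:
            break
        Ad = A @ d
        alpha = rs_old / (d @ Ad)   # exact line search along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        # New direction is A-conjugate to the previous ones (Fletcher-Reeves factor).
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x
```

For A = [[3, 2], [2, 6]] and b = [2, -8], two iterations reach the solution x = [2, -2], as expected for a 2×2 system.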

 

Measuring classifier performance: a coherent alternative to the area under the ROC curve

Machine Learning, Vol. 77, No. 1. (1 October 2009), pp. 103-123, doi:10.1007/s10994-009-5119-5
posted to evaluation machinelearning ranking by asterix77 on 2010-06-11 22:37:42 along with 13 people abrentnall arider1 arsyed Borelli dakelley gotgenes humburg jjrodriguez lfriedl mtkachenko neils nliu82 PaperCollector

Abstract

The area under the ROC curve (AUC) is a very widely used measure of performance for classification and diagnostic rules. It has the appealing property of being objective, requiring no subjective input from the user. On the other hand, the AUC has disadvantages, some of which are well known. For example, the AUC can give potentially misleading results if ROC curves cross. However, the AUC also has a much more serious deficiency, and one which appears not to have been previously ...
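As a reference point, the sketch below computes the conventional AUC that the paper critiques, not the alternative measure it proposes (which the truncated abstract does not describe). It uses the standard identity that the AUC equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function name `auc` is our own.

```python
def auc(scores_pos, scores_neg):
    """Empirical AUC via pairwise comparison (Mann-Whitney form); ties count 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

With positives scored [0.9, 0.8, 0.4] and negatives [0.7, 0.3], five of the six positive/negative pairs are correctly ordered, so the AUC is 5/6. The double loop costs O(P·N); a rank-based version after sorting is O((P + N) log(P + N)).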

 

Using Regression to Combine Data Sources for Semantic Music Discovery

In Proc. International Symposium on Music Information Retrieval (2009)
 

The Cocktail Party Problem

Neural Comput., Vol. 17, No. 9. (2005), pp. 1875-1902, doi:10.1162/0899766054322964

Abstract

This review presents an overview of a challenging problem in auditory perception, the cocktail party phenomenon, the delineation of which goes back to a classic paper by Cherry in 1953. In this review, we address the following issues: (1) human auditory scene analysis, which is a general process carried out by the auditory system of a human listener; (2) insight into auditory perception, which is derived from Marr's vision theory; (3) computational auditory scene analysis, which focuses on specific approaches aimed ...

 

Accurate Sound Localization in Reverberant Environments Is Mediated by Robust Encoding of Spatial Cues in the Auditory Midbrain

Neuron, Vol. 62, No. 1. (16 April 2009), pp. 123-134, doi:10.1016/j.neuron.2009.02.018
posted to localization neural psych reverb by asterix77 on 2009-04-16 17:44:23

Abstract

In reverberant environments, acoustic reflections interfere with the direct sound arriving at a listener's ears, distorting the spatial cues for sound localization. Yet, human listeners have little difficulty localizing sounds in most settings. Because reverberant energy builds up over time, the source location is represented relatively faithfully during the early portion of a sound, but this representation becomes increasingly degraded later in the stimulus. We show that the directional sensitivity of single neurons in the auditory midbrain of anesthetized cats follows ...

 

A New Method Based on Spectral Subtraction for Speech Dereverberation

Acta Acustica united with Acustica, pp. 359-366
posted to dereverb model monaural reverb by asterix77 on 2009-03-11 16:24:37 along with 1 person tyoshioka

Abstract

A new monaural method for the suppression of late room reverberation from speech signals, based on spectral subtraction, is presented. The problem of reverberation suppression differs from classical speech de-noising in that the "reverberation noise" is non stationary. In this paper, the use of a novel estimator of the non-stationary reverberation-noise power spectrum, based on a statistical model of late reverberation, is presented. The algorithm is tested on real reverberated signals. The performances for different RIRs with Tr ranging from 0.34 ...
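A minimal sketch of late-reverberation spectral subtraction follows. It assumes, as our own illustration rather than a transcription of the paper, an exponential-decay model in which room-impulse-response energy falls as exp(-2Δt) with Δ = 3 ln(10)/T60. Under that assumption the late-reverberation power in a bin is a delayed, attenuated copy of the observed power. The function names, the delay and the gain floor are hypothetical.

```python
import numpy as np


def late_reverb_psd(noisy_power, t60, frame_shift, delay_frames):
    """Estimate late-reverberation power for a (frames x bins) power spectrogram.

    Assumes exponential energy decay exp(-2*delta*t), delta = 3 ln(10) / T60.
    """
    if delay_frames < 1:
        raise ValueError("delay_frames must be at least 1")
    noisy_power = np.asarray(noisy_power, dtype=float)
    delta = 3.0 * np.log(10.0) / t60
    atten = np.exp(-2.0 * delta * delay_frames * frame_shift)
    late = np.zeros_like(noisy_power)
    late[delay_frames:] = atten * noisy_power[:-delay_frames]
    return late


def spectral_subtraction_gain(noisy_power, late_power, floor=0.1):
    # Power-domain subtraction gain, floored to limit musical noise.
    gain = 1.0 - late_power / np.maximum(noisy_power, 1e-12)
    return np.maximum(gain, floor)
```

With T60 = 0.5 s and a 50 ms delay, the attenuation factor is 10^(-0.6) ≈ 0.25, so a constant-power input receives a gain of about 0.75 once the delay has elapsed.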

 

Binaural Distance Perception Based on Direct-to-Reverberant Energy Ratio

In Proc. International Workshop on Acoustic Echo and Noise Control (September 2008)
posted to bayesian binaural d2r localization reverb by asterix77 on 2009-02-20 00:30:11

Abstract

The direct-to-reverberant energy ratio has long been recognized as an absolute auditory cue for sound source distance perception in listeners. Traditional methods to extract this energy ratio are based on post-processing of the estimated room impulse response, which is computationally expensive and inaccurate in practice. An alternative is based on estimating the energy arriving from the azimuth of the direct source, under the assumption that reverberant components result in a spatially-diffuse sound field. We propose a binaural equalization-cancellation technique to calculate ...
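The abstract contrasts its approach with the traditional route of post-processing a room impulse response. That traditional computation can be sketched as follows. It assumes the direct sound is the largest-magnitude sample and occupies a ±2.5 ms window around it; both choices are ours. It does not implement the paper's binaural equalization-cancellation technique, and `direct_to_reverberant_ratio` is a hypothetical name.

```python
import numpy as np


def direct_to_reverberant_ratio(rir, fs, half_window_ms=2.5):
    """DRR in dB from a measured room impulse response sampled at fs Hz."""
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))             # assumed direct-path arrival
    half = int(round(half_window_ms * 1e-3 * fs))  # window half-width in samples
    lo, hi = max(peak - half, 0), peak + half + 1
    direct = np.sum(rir[lo:hi] ** 2)
    # Everything outside the direct-path window counts as reverberant energy.
    reverberant = np.sum(rir[:lo] ** 2) + np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / reverberant)
```

For a toy response with a unit direct spike and two later reflections of amplitude 0.5, the direct energy is 1 and the reverberant energy is 0.5, giving a DRR of 10 log10(2) ≈ 3 dB.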

 

Point-to-Point Correlation of Sound Pressures in Reverberation Chambers

Journal of the Acoustical Society of America, Vol. 45, No. 1. (1969), pp. 337-337
posted to acoustics coherence reverb by asterix77 on 2009-02-19 22:36:38

Abstract

Point-to-point correlations of reverberant sound fields are important both for high-intensity noise tests of spacecraft and for exploring the state of diffusion of the sound field. The classic paper on reverberant field correlation by Cook and others [J. Acoust. Soc. Amer. 27, 1072 (1955)] derived a narrow-band correlation coefficient of (sin kr)/kr, where r is the separation of any two points considered, on the assumption that the field is completely diffuse. The primary intent of the present paper is to consider ...

 

Spatial-Correlation Functions for Various Noise Models

Journal of the Acoustical Society of America, Vol. 34, No. 11. (1962), pp. 1732-1736

Abstract

Observations indicate that noise in the ocean is a superposition of an isotropic noise field and an anisotropic noise field originating at the surface. Models which produce such noise fields are described, and the spatial-correlation functions are obtained. The volume-noise model, which produces an isotropic noise field, consists of noise sources uniformly distributed within a sphere. A single-frequency component of each noise source is considered; the mean-square output of each is the same, the relative phases are random, and inverse spreading ...

 

Speech Source Separation in Convolutive Environments Using Space-Time-Frequency Analysis

EURASIP Journal on Advances in Signal Processing, Vol. 2006, No. 1. (January 2006), pp. 1-12, doi:10.1155/asp/2006/38412

Abstract

We propose a new method for speech source separation that is based on directionally-disjoint estimation of the transfer functions between microphones and sources at different frequencies and at multiple times. The spatial transfer functions are estimated from eigenvectors of the microphones' correlation matrix. Smoothing and association of transfer function parameters across different frequencies are performed by simultaneous extended Kalman filtering of the amplitude and phase estimates. This approach allows transfer function estimation even if the number of sources is greater than ...

 

On the use of sparse time-relative auditory codes for music

In International Symposium on Music Information Retrieval (September 2008), pp. 603-608
posted to ismir representation sparse by asterix77 on 2008-11-01 19:42:23
 

A two-microphone dual delay-line approach for extraction of a speech sound in the presence of multiple interferers

Journal of the Acoustical Society of America, Vol. 110, No. 6. (2001), pp. 3218-3231, doi:10.1121/1.1419090
posted to model separation by asterix77 on 2008-07-24 21:27:40

Abstract

This paper describes algorithms for signal extraction for use as a front-end of telecommunication devices, speech recognition systems, as well as hearing aids that operate in noisy environments. The development was based on some independent, hypothesized theories of the computational mechanics of biological systems in which directional hearing is enabled mainly by binaural processing of interaural directional cues. Our system uses two microphones as input devices and a signal processing method based on the two input channels. The signal processing procedure ...

 

Multiresolution spectrotemporal analysis of complex sounds

Journal of the Acoustical Society of America, Vol. 118, No. 2. (2005), pp. 887-906, doi:10.1121/1.1945807
posted to model timefrequency by asterix77 on 2008-03-25 16:30:58

Abstract

A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound. Simplified, more specifically tailored versions of this model have already been validated by successful application in the assessment of speech intelligibility [Elhilali et al., Speech Commun. 41(2-3), 331–348 (2003); Chi et al., J. Acoust. Soc. Am. ...

 

Survey of sparse and non-sparse methods in source separation

International Journal of Imaging Systems and Technology, Vol. 15, No. 1. (2005), pp. 18-33, doi:10.1002/ima.20035
posted to ica model separation timefrequency by asterix77 on 2008-02-13 18:27:37

Abstract

Source separation arises in a variety of signal processing applications, ranging from speech processing to medical image analysis. The separation of a superposition of multiple signals is accomplished by taking into account the structure of the mixing process and by making assumptions about the sources. When the information about the mixing process and sources is limited, the problem is called “blind”. By assuming that the sources can be represented sparsely in a given basis, recent research has demonstrated that solutions to ...

 

Frequency domain binaural model based on interaural phase and level differences

Acoustical Science and Technology, Vol. 24, No. 4. (2003), pp. 172-178
posted to binaural localization model separation speechrecognition by asterix77 on 2008-01-10 20:45:53

Abstract

We can communicate with others in a noisy environment. This phenomenon is known as a “Cocktail Party Effect” and is one of the most important binaural functions. This paper addresses a frequency domain binaural model that plays the role of a binaural function based on an interaural phase and level difference. The proposed model is evaluated not only as a front-end of the speech recognition system, but also as a speech enhancer. According to the evaluation, when the direction of arrival ...

 

Measurement of Correlation Coefficients in Reverberant Sound Fields

Journal of the Acoustical Society of America, Vol. 27, No. 6. (1955), pp. 1072-1077, doi:10.1121/1.1908122
posted to acoustics bib-waspaa0 coherence crosscorrelation by asterix77 on 2008-01-10 20:35:24

Abstract

Reverberation chambers used for acoustical measurements should have completely random sound fields. We denote by R the cross-correlation coefficient for the sound pressures at two points a distance r apart: $R = \langle p_1 p_2 \rangle_{\mathrm{Av}} / \left(\langle p_1^2 \rangle_{\mathrm{Av}} \langle p_2^2 \rangle_{\mathrm{Av}}\right)^{1/2}$, where $p_1$ is the sound pressure at one point, $p_2$ that at the other, and the angular brackets denote long time averages. In a random sound field, $R = \sin(kr)/(kr)$, where $k = 2\pi/\lambda$ and $\lambda$ is the wavelength of the sound. An instrument for measuring and recording R as a function ...
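The diffuse-field result R = sin(kr)/(kr) is easy to evaluate for a given microphone spacing. The sketch below writes k = 2πf/c; the speed of sound c = 343 m/s and the function name are our assumptions.

```python
import math


def diffuse_field_correlation(freq_hz, spacing_m, c=343.0):
    """Narrow-band correlation sin(kr)/(kr) between two points in a fully diffuse field."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber, equal to 2*pi / wavelength
    kr = k * spacing_m
    return 1.0 if kr == 0.0 else math.sin(kr) / kr  # limit of sin(x)/x at 0 is 1
```

At 1 kHz the wavelength is 0.343 m. Points a quarter wavelength apart (kr = π/2) give R = 2/π ≈ 0.64, and the first zero falls at half a wavelength (kr = π).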

 

Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask

In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), Vol. 3 (2005), pp. 81-84, doi:10.1109/icassp.2005.1415651
posted to masking model separation by asterix77 on 2008-01-10 20:33:10

Abstract

Musical noise is a typical problem with blind source separation using a time-frequency mask. We report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by results of a listening test undertaken in a room with a reverberation time of RT60 = 130 ms. ...

 

A Probabilistic Model for Binaural Sound Localization

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, No. 5. (October 2006), pp. 982-994, doi:10.1109/tsmcb.2006.872263

Abstract

This paper proposes a biologically inspired and technically implemented sound localization system to robustly estimate the position of a sound source in the frontal azimuthal half-plane. For localization, binaural cues are extracted using cochleagrams generated by a cochlear model that serve as input to the system. The basic idea of the model is to separately measure interaural time differences and interaural level differences for a number of frequencies and process these measurements as a whole. This leads to two-dimensional frequency versus ...

 

The effect of overlap-masking on binaural reverberant word intelligibility

Journal of the Acoustical Society of America, Vol. 116, No. 5. (2004), pp. 3141-3151, doi:10.1121/1.1781621
posted to binaural intelligibility psych reverb by asterix77 on 2008-01-10 00:02:19

Abstract

Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation exists, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release with the reverberation acting as masking noise. The tests utilize phonetically ...

 

The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List

J. Mach. Learn. Res., Vol. 10 (December 2009), pp. 2233-2271
posted to boosting ranking by asterix77 on 2013-04-17 19:59:08

Abstract

We are interested in supervised ranking algorithms that perform especially well near the top of the ranked list, and are only required to perform sufficiently well on the rest of the list. In this work, we provide a general form of convex objective that gives high-scoring examples more importance. This "push" near the top of the list can be chosen arbitrarily large or small, based on the preference of the user. We choose lp-norms to provide a specific type of push; ...
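A sketch of the kind of objective the abstract describes follows. It assumes an exponential loss as the convex surrogate for a positive ranked below a negative, and a power p applied to each negative's summed loss, its "height". Both choices are our reading of the abstract's reference to ℓp-norms, not a transcription of the paper, and the function name is hypothetical.

```python
import math


def p_norm_push_objective(pos_scores, neg_scores, p=4.0):
    """Sum over negatives of (soft count of positives ranked below it) ** p."""
    total = 0.0
    for s_neg in neg_scores:
        # exp(-(s_pos - s_neg)) is a convex surrogate for [s_pos <= s_neg].
        height = sum(math.exp(-(s_pos - s_neg)) for s_pos in pos_scores)
        total += height ** p  # larger p concentrates the penalty on the worst negatives
    return total
```

Raising the height to the power p means a single negative that outranks many positives dominates the objective as p grows, which is how the penalty concentrates at the top of the list.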

 

Sparse Adaptive Representations for Musical Signals

In Signal Processing Methods for Music Transcription (2006), pp. 65-98, doi:10.1007/0-387-32845-9_3
posted to review sparse by asterix77 on 2013-03-04 17:18:08

Abstract

Musical signals are, strictly speaking, acoustic signals where some aesthetically relevant information is conveyed through propagating pressure waves. Although the human auditory system exhibits a remarkable ability to interpret and understand these sound waves, these types of signals cannot be processed as such by computers. Obviously, the signals have to be converted into digital form, and this first implies sampling and quantization. In time-domain digital formats, such as the Pulse Code Modulation (PCM)—or newer formats such as one-bit oversampled bitstreams used ...

 

Maximum likelihood spectral estimation and its application to narrow-band speech coding

Acoustics, Speech and Signal Processing, IEEE Transactions on, Vol. 32, No. 2. (April 1984), pp. 243-251, doi:10.1109/tassp.1984.1164318
posted to frequencydomainlpc itakurasaito lpc by asterix77 on 2013-02-21 18:20:52

Abstract

Itakura and Saito [1] used the maximum likelihood (ML) method to derive a spectral matching criterion for autoregressive (i.e., all-pole) random processes. In this paper, their results are generalized to periodic processes having arbitrary model spectra. For the all-pole model, Kay's [2] covariance domain solution to the recursive ML (RML) problem is cast into the spectral domain and used to obtain the RML solution for periodic processes. When applied to speech, this leads to a method for solving the joint pitch ...
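The Itakura-Saito spectral matching criterion is commonly written in discrete form as the mean over frequency bins of P/P̂ − log(P/P̂) − 1. The sketch below evaluates that form. It does not implement the paper's recursive ML solution for periodic processes, and the function name is ours.

```python
import numpy as np


def itakura_saito(p, p_hat):
    """Discrete Itakura-Saito divergence between a spectrum p and a model spectrum p_hat."""
    ratio = np.asarray(p, dtype=float) / np.asarray(p_hat, dtype=float)
    # Each term is >= 0, with equality only where the spectra match.
    return float(np.mean(ratio - np.log(ratio) - 1.0))
```

The measure is asymmetric, penalizing model spectra that fall below the observed spectrum more than those above: D(2‖1) = 1 − ln 2 ≈ 0.307, while D(1‖2) = ln 2 − 0.5 ≈ 0.193. It is also scale-invariant, because it depends only on the ratio P/P̂.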

 

Model-Based Dereverberation Preserving Binaural Cues

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 18, No. 7. (2010), pp. 1732-1745, doi:10.1109/tasl.2010.2052156
posted to binaural dereverb by asterix77 on 2013-02-12 19:01:04

Abstract

The ability of the human auditory system for sound localization mainly depends on the binaural cues, especially interaural time and level differences (ITD and ILD). In the context of digital hearing aids and binaural audio transmission systems, these cues can be severely degraded by independent bilateral signal processing such as dereverberation or noise reduction. This contribution presents a novel two-stage binaural dereverberation algorithm which explicitly preserves the binaural cues. The first stage is based on a statistical model of the room ...

 

Sequential Monte Carlo methods for multitarget filtering with random finite sets

Aerospace and Electronic Systems, IEEE Transactions on, Vol. 41, No. 4. (October 2005), pp. 1224-1245, doi:10.1109/taes.2005.1561884
posted to particlefilter randomfiniteset temporal tracking by asterix77 on 2013-02-12 18:38:17

Abstract

Random finite sets (RFSs) are natural representations of multitarget states and observations that allow multisensor multitarget filtering to fit in the unifying random set framework for data fusion. Although the foundation has been established in the form of finite set statistics (FISST), its relationship to conventional probability is not clear. Furthermore, optimal Bayesian multitarget filtering is not yet practical due to the inherent computational hurdle. Even the probability hypothesis density (PHD) filter, which propagates only the first moment (or PHD) instead ...

 

Binaural Localization of Multiple Sources in Reverberant and Noisy Environments

Audio, Speech, and Language Processing, IEEE Transactions on, Vol. 20, No. 5. (July 2012), pp. 1503-1512, doi:10.1109/tasl.2012.2183869
posted to casa localization reverb by asterix77 on 2013-02-11 22:16:12

Abstract

Sound source localization from a binaural input is a challenging problem, particularly when multiple sources are active simultaneously and reverberation or background noise are present. In this work, we investigate a multi-source localization framework in which monaural source segregation is used as a mechanism to increase the robustness of azimuth estimates from a binaural input. We demonstrate performance improvement relative to binaural only methods assuming a known number of spatially stationary sources. We also propose a flexible azimuth-dependent model of binaural ...

 

Recursive Parameter Estimation Using Incomplete Data

Journal of the Royal Statistical Society. Series B (Methodological), Vol. 46, No. 2. (1984)
posted to em online temporal by asterix77 on 2013-02-11 16:21:09

Abstract

Stochastic approximation procedures are considered for the estimation of parameters using incomplete data. One procedure is stated and illustrated which often leads to asymptotically efficient estimators. Others are developed which, although possibly not optimal in the above sense, will be very much easier to apply. This will be particularly advantageous when quick recursive estimates are required. Examples are given and a link is made between one of the sub-optimal methods and the EM algorithm. ...
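A small stochastic-approximation example in the spirit of the abstract: the mixing weight of a two-component Gaussian mixture is updated one observation at a time, with the unobserved component label playing the role of the missing data. This is an illustrative sketch, not one of the paper's stated procedures; the known component parameters, the gain 1/(n + 1) and the function names are assumptions.

```python
import math
import random


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def recursive_mixing_weight(stream, pi0=0.5, comp1=(0.0, 1.0), comp2=(4.0, 1.0)):
    """Recursively estimate the weight of component 1 in a two-component Gaussian mixture.

    Component means and standard deviations are assumed known; the unobserved
    component label is the 'missing' part of each observation.
    """
    pi = pi0
    for n, x in enumerate(stream, start=1):
        f1, f2 = normal_pdf(x, *comp1), normal_pdf(x, *comp2)
        # Single-observation E-step: posterior probability that x came from component 1.
        resp = pi * f1 / (pi * f1 + (1.0 - pi) * f2)
        # Stochastic-approximation step with decreasing gain 1 / (n + 1).
        pi += (resp - pi) / (n + 1)
    return pi


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: 30% of points from N(0, 1), 70% from N(4, 1).
    data = [rng.gauss(0.0, 1.0) if rng.random() < 0.3 else rng.gauss(4.0, 1.0)
            for _ in range(5000)]
    print(recursive_mixing_weight(data))
```

For this model the increment `resp - pi` equals pi(1 - pi) times the observed-data score d/dpi log p(x | pi), and pi(1 - pi) is the inverse of the complete-data Fisher information for pi. The same update can therefore be read either as a one-observation EM step or as a score step scaled by the inverse complete-data information.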

 

Line spectrum representation of linear predictor coefficients of speech signals

The Journal of the Acoustical Society of America, Vol. 57, No. S1. (01 April 1975), pp. S35-S35, doi:10.1121/1.1995189
posted to lsf lsp by asterix77 on 2013-02-04 16:49:30 ***

Abstract

It has been known that the linear predictor coefficients (LPC) of speech signals can be transformed into a “pseudo” vocal‐tract area function whose boundary conditions are (a) a complete opening at the lips and (b) a matching resistance termination at the glottis. If the boundary condition at the glottis is replaced by a complete opening or a complete closure, all the poles of the resulting system function will move onto the unit circle in z plane. Using this fact it is ...
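The construction the abstract leads up to can be sketched numerically. The sketch uses the usual sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), whose zeros lie on the unit circle when A(z) is minimum phase, and locates them by a grid search followed by bisection. The coefficient convention a = [1, a1, ..., ap], the grid size, the tolerance and the name `lpc_to_lsf` are assumptions; a speech coder would more likely use a faster root search.

```python
import cmath
import math


def lpc_to_lsf(a, grid=2048, tol=1e-12):
    """Line spectral frequencies (radians, sorted, in (0, pi)) of
    A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p, with a[0] == 1.

    On the unit circle, G(w) = e^{j(p+1)w/2} A(e^{jw}) = |A| e^{j psi(w)}.
    P(e^{jw}) vanishes where cos(psi) = 0, i.e. Re G(w) = 0, and
    Q(e^{jw}) vanishes where sin(psi) = 0, i.e. Im G(w) = 0.
    """
    p = len(a) - 1

    def g(w):
        a_val = sum(c * cmath.exp(-1j * k * w) for k, c in enumerate(a))
        return cmath.exp(0.5j * (p + 1) * w) * a_val

    def bisect(f, lo, hi):
        # Invariant: f(lo) and f(hi) have opposite signs.
        f_lo = f(lo)
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            f_mid = f(mid)
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Grid strictly inside (0, pi), so the trivial zeros at w = 0 and w = pi are skipped.
    ws = [math.pi * (k + 0.5) / grid for k in range(grid)]
    lsf = []
    for part in (lambda w: g(w).real, lambda w: g(w).imag):
        values = [part(w) for w in ws]
        for k in range(grid - 1):
            if (values[k] > 0) != (values[k + 1] > 0):
                lsf.append(bisect(part, ws[k], ws[k + 1]))
    return sorted(lsf)


if __name__ == "__main__":
    # Minimum-phase example: A(z) = 1 - 1.2 z^-1 + 0.5 z^-2.
    print(lpc_to_lsf([1.0, -1.2, 0.5]))
```

For this A(z), P(z) = (1 + z^-1)(1 - 1.7 z^-1 + z^-2) and Q(z) = (1 - z^-1)(1 - 0.7 z^-1 + z^-2), so the two frequencies returned are arccos(0.85) and arccos(0.35), roughly 0.555 and 1.213 rad.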

 

A scalar homotopy method for parallel and robust tracking of line spectral pairs

In Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings, 1996 IEEE International Conference on, Vol. 2 (May 1996), pp. 805-808, doi:10.1109/icassp.1996.543243
posted to lsp temporal tracking by asterix77 on 2013-01-25 21:55:32 ***

Abstract

We present an adaptive path-following method based on the technique of homotopy, which efficiently computes the line spectral pairs by exploiting their natural ordering and low frame-to-frame variation. We first define continuous paths from known roots of the LSP polynomials of a prior speech frame to the unknown roots of the next frame in the sequence. A gradient-search based numerical predictor-corrector procedure is then used for tracing these paths in order to compute the unknown roots. This method uses only scalar ...
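The path-following idea can be illustrated with a generic scalar homotopy between two polynomials: each root of the previous frame's polynomial is carried along H(z, t) = (1 - t) P_prev(z) + t P_next(z) with an Euler predictor and a few Newton corrector steps. This is a simplified stand-in, not the paper's procedure, which the abstract describes as exploiting the ordering and small frame-to-frame change of the LSPs. The step counts and function names are hypothetical, and no safeguard is included for paths that collide or pass near a singular point.

```python
def poly_eval(coeffs, z):
    """Value and derivative of sum(coeffs[k] * z**k) via Horner's rule."""
    value, deriv = 0j, 0j
    for c in reversed(coeffs):
        deriv = deriv * z + value
        value = value * z + c
    return value, deriv


def track_roots(p_prev, p_next, roots_prev, steps=20, newton_iters=3):
    """Carry each root of p_prev to a root of p_next along
    H(z, t) = (1 - t) * p_prev(z) + t * p_next(z), for t from 0 to 1."""
    tracked = []
    for z in roots_prev:  # each path is independent of the others
        z = complex(z)
        for s in range(1, steps + 1):
            t0, t1 = (s - 1) / steps, s / steps
            # Predictor: Euler step on dz/dt = -H_t / H_z, evaluated at t0.
            v0, d0 = poly_eval(p_prev, z)
            v1, d1 = poly_eval(p_next, z)
            z -= (v1 - v0) / ((1 - t0) * d0 + t0 * d1) * (t1 - t0)
            # Corrector: Newton iterations on H(., t1).
            for _ in range(newton_iters):
                v0, d0 = poly_eval(p_prev, z)
                v1, d1 = poly_eval(p_next, z)
                z -= ((1 - t1) * v0 + t1 * v1) / ((1 - t1) * d0 + t1 * d1)
        tracked.append(z)
    return tracked


if __name__ == "__main__":
    # (z - 1)(z - 2) deforms into (z - 1.1)(z - 2.2); coefficients in ascending powers.
    print(track_roots([2.0, -3.0, 1.0], [2.42, -3.3, 1.0], [1.0, 2.0]))
```

Because each root is followed independently, the outer loop over `roots_prev` could run in parallel, one path per worker.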

 

Efficient model-based speech separation and denoising using non-negative subspace analysis

In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on (March 2008), pp. 1833-1836, doi:10.1109/icassp.2008.4517989
posted to mixturemodel nmf separation by asterix77 on 2013-01-23 21:08:20 ***

Abstract

We present a new probabilistic architecture for analyzing composite non-negative data, called Non-negative Subspace Analysis (NSA). The NSA model provides a framework for understanding the relationships between sparse subspace and mixture model based approaches, and encompasses a range of models, including Sparse Non-negative Matrix Factorization (SNMF) [1] and mixture-model based analysis as special cases. We present a convenient instantiation of the NSA model, and an efficient variational approximate learning and inference algorithm that combines the advantages of SNMF and mixture model-based ...
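For readers unfamiliar with the SNMF special case, one common form of sparse NMF is sketched below: a Euclidean reconstruction cost with an L1 penalty on the activations, minimised by multiplicative updates. It does not implement NSA or its variational learning algorithm, and the SNMF cited as [1] may use a different cost or normalisation; the pure-Python matrix helpers, function names and parameter values are illustrative.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sparse_nmf(v, rank, sparsity=0.0, iters=200, seed=0, eps=1e-12):
    """Factor non-negative V (m x n) as W (m x rank) times H (rank x n).

    Multiplicative updates for 0.5 * ||V - W H||_F^2 + sparsity * sum(H).
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # Activation update: the L1 penalty enters the denominator.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + sparsity + eps) for j in range(n)]
             for k in range(rank)]
        # Basis update for the Euclidean reconstruction term.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(m)]
    return w, h


def objective(v, w, h, sparsity=0.0):
    """Penalised cost that the updates above are designed not to increase."""
    wh = matmul(w, h)
    err = sum((a - b) ** 2 for row_v, row_wh in zip(v, wh) for a, b in zip(row_v, row_wh))
    return 0.5 * err + sparsity * sum(map(sum, h))


if __name__ == "__main__":
    v = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 3.0]]
    w, h = sparse_nmf(v, rank=2, sparsity=0.05)
    print(objective(v, w, h, 0.05))
```

Without normalising the columns of W, the penalty can be sidestepped by scaling W up and H down, so practical sparse NMF variants usually rescale W after each update; that step is omitted here for brevity.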

 

Event-Driven Data Acquisition and Digital Signal Processing—A Tutorial

Circuits and Systems II: Express Briefs, IEEE Transactions on, Vol. 57, No. 8. (August 2010), pp. 577-581, doi:10.1109/tcsii.2010.2056012
posted to asynchronous review sequence by asterix77 on 2013-01-15 15:31:50 ***

Abstract

Event-driven analog-to-digital conversion and associated digital signal processing techniques are reviewed. Such techniques, still in the research stage, have the potential to significantly reduce the consumption of energy and bandwidth resources in several important applications. ...
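A minimal sketch of level-crossing (send-on-delta) sampling, a standard form of event-driven acquisition: an event is emitted only when the input reaches a new quantisation level, so the event rate follows signal activity rather than a fixed clock. The continuous input is approximated here by a densely sampled list, and the step size `delta` and the function name are assumptions.

```python
import math


def level_crossing_sample(samples, times, delta):
    """Level-crossing sampling of a densely sampled reference signal.

    Returns (time, level) events; a new event is produced only when the signal
    reaches the next quantisation level above or below the last reported one.
    """
    level = round(samples[0] / delta)  # integer level index avoids drift
    events = [(times[0], level * delta)]
    for t, x in zip(times[1:], samples[1:]):
        while x >= (level + 1) * delta:
            level += 1
            events.append((t, level * delta))
        while x <= (level - 1) * delta:
            level -= 1
            events.append((t, level * delta))
    return events


if __name__ == "__main__":
    fs = 8000
    times = [n / fs for n in range(fs)]                      # one second of "analog" input
    signal = [math.sin(2 * math.pi * 5 * t) for t in times]  # slow 5 Hz sine
    events = level_crossing_sample(signal, times, delta=0.1)
    print(len(times), "clocked samples vs", len(events), "events")
```

A constant input produces a single event however long it lasts, which is where the energy and bandwidth savings mentioned in the abstract come from.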

 

Unsupervised learning of time-frequency patches as a noise-robust representation of speech

Speech Commun., Vol. 51, No. 11. (18 November 2009), pp. 1124-1138, doi:10.1016/j.specom.2009.05.003
posted to patches representation unsupervised by asterix77 on 2013-01-15 15:18:38 ***

Abstract

We present a self-learning algorithm using a bottom-up based approach to automatically discover, acquire and recognize the words of a language. First, an unsupervised technique using non-negative matrix factorization (NMF) discovers phone-sized time-frequency patches into which speech can be decomposed. The input matrix for the NMF is constructed for static and dynamic speech features using a spectral representation of both short and long acoustic events. By describing speech in terms of the discovered time-frequency patches, patch activations are obtained which express ...
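Once a dictionary of time-frequency patches has been learned, patch activations for new frames can be obtained by running NMF with the dictionary held fixed. The sketch below uses the multiplicative update for the generalised Kullback-Leibler divergence; the choice of divergence, the uniform initialisation and the function name are assumptions rather than details taken from the paper.

```python
def patch_activations(v, w, iters=200, eps=1e-12):
    """Infer non-negative activations H with V ~ W H, keeping the patch dictionary W fixed.

    v: m x n non-negative features (one column per frame), w: m x r dictionary.
    Uses the multiplicative update for the generalised KL divergence D(V || W H).
    """
    m, r, n = len(w), len(w[0]), len(v[0])
    col_sums = [sum(w[i][k] for i in range(m)) for k in range(r)]
    h = [[1.0] * n for _ in range(r)]
    for _ in range(iters):
        wh = [[sum(w[i][k] * h[k][j] for k in range(r)) for j in range(n)] for i in range(m)]
        ratio = [[v[i][j] / (wh[i][j] + eps) for j in range(n)] for i in range(m)]
        h = [[h[k][j] * sum(w[i][k] * ratio[i][j] for i in range(m)) / (col_sums[k] + eps)
              for j in range(n)] for k in range(r)]
    return h


if __name__ == "__main__":
    dictionary = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # two toy "patches" over 3 bins
    frames = [[2.0, 0.5], [3.0, 1.0], [5.0, 1.5]]      # two frames built from them
    print(patch_activations(frames, dictionary))
```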

 

Supervised topic model for automatic image annotation

In Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on (March 2010), pp. 1894-1897, doi:10.1109/icassp.2010.5495341
posted to autotag image topicmodel by asterix77 on 2013-01-15 15:17:09 ***

Abstract

This paper presents a new probabilistic model for the task of image annotation. Our model, which we call sLDA-bin, extends the supervised Latent Dirichlet Allocation (sLDA) model to handle a multivariate binary response variable of the annotation data. Unlike correspondence LDA (cLDA), the association model in sLDA allows each caption word to be associated with more than one image region and is thus more appropriate for annotation words that globally describe the scene. By modeling the response variable as a multivariate Bernoulli, ...
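One way to write a multivariate Bernoulli response of this kind is a separate logistic model per annotation word on the mean topic assignment z-bar of the image regions, in the style of supervised LDA. The sketch only evaluates that response likelihood; it is not the paper's inference or learning procedure, and the logistic link and all names are assumptions.

```python
import math


def mean_topic_assignment(topic_assignments, num_topics):
    """Empirical topic proportions z-bar of one image's region-level topic labels."""
    counts = [0] * num_topics
    for k in topic_assignments:
        counts[k] += 1
    return [c / len(topic_assignments) for c in counts]


def annotation_log_likelihood(y, zbar, eta):
    """log p(y | zbar, eta) for a binary annotation vector y, assuming independent
    per-word Bernoulli responses with p(y_w = 1) = sigmoid(eta_w . zbar)."""
    total = 0.0
    for y_w, eta_w in zip(y, eta):
        a = sum(e * z for e, z in zip(eta_w, zbar))
        # Numerically stable log sigmoid(a); log(1 - sigmoid(a)) = log sigmoid(a) - a.
        log_p1 = -math.log1p(math.exp(-a)) if a >= 0 else a - math.log1p(math.exp(a))
        log_p0 = log_p1 - a
        total += log_p1 if y_w else log_p0
    return total


if __name__ == "__main__":
    zbar = mean_topic_assignment([0, 0, 1, 2], num_topics=3)  # four image regions
    eta = [[3.0, -1.0, 0.0], [-2.0, 2.0, 1.0]]               # two vocabulary words
    print(annotation_log_likelihood([1, 0], zbar, eta))
```

Because every word gets its own logistic term on the same z-bar, a single image can switch on several words at once, which is what distinguishes a multivariate binary response from a single categorical label.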

 

Large Margin Methods for Structured and Interdependent Output Variables

J. Mach. Learn. Res., Vol. 6 (December 2005), pp. 1453-1484

Abstract

Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish ...
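A minimal structured-SVM sketch with margin rescaling: training minimises a regularised structured hinge loss whose inner maximisation is loss-augmented inference. To stay self-contained, the output space is a small label set searched by enumeration, and the weights are fitted by plain stochastic subgradient descent, which is a substitute for, not a reproduction of, the paper's training algorithm. The joint feature map, loss and function names are illustrative.

```python
def multiclass_feature(x, y, num_classes=3):
    """Joint feature map Psi(x, y): x copied into the block that belongs to label y."""
    psi = [0.0] * (len(x) * num_classes)
    psi[y * len(x):(y + 1) * len(x)] = x
    return psi


def zero_one_loss(y_true, y):
    return 0.0 if y == y_true else 1.0


def score(w, psi):
    return sum(a * b for a, b in zip(w, psi))


def predict(w, x, outputs, joint_feature):
    return max(outputs, key=lambda y: score(w, joint_feature(x, y)))


def ssvm_step(w, x, y_true, outputs, joint_feature, loss, lr=0.1, reg=0.01):
    """One subgradient step on
    reg/2 * ||w||^2 + max_y [loss(y_true, y) + <w, Psi(x, y)>] - <w, Psi(x, y_true)>.
    Returns the new weights and the hinge value before the step."""
    # Loss-augmented inference by enumeration (only viable for small output sets).
    y_hat = max(outputs, key=lambda y: loss(y_true, y) + score(w, joint_feature(x, y)))
    psi_hat, psi_true = joint_feature(x, y_hat), joint_feature(x, y_true)
    hinge = loss(y_true, y_hat) + score(w, psi_hat) - score(w, psi_true)
    grad = [reg * wi for wi in w]
    if hinge > 0:
        grad = [g + a - b for g, a, b in zip(grad, psi_hat, psi_true)]
    return [wi - lr * g for wi, g in zip(w, grad)], max(hinge, 0.0)


if __name__ == "__main__":
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([-1.0, -1.0], 2)]
    w = [0.0] * 6
    for _ in range(200):
        for x, y in data:
            w, _ = ssvm_step(w, x, y, range(3), multiclass_feature, zero_one_loss)
    print([predict(w, x, range(3), multiclass_feature) for x, _ in data])
```

For trees or sequences, the `max` over `outputs` would be replaced by a problem-specific search such as dynamic programming; the rest of the step is unchanged.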

Note: You may cite this page as: http://www.citeulike.org/user/asterix77/order/to_read,desc,last

