Group: searchingspeech2010 - library 484 articles

 
 

Towards increasing speech recognition error rates

Speech Commun., Vol. 18 (May 1996), pp. 205-231
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-04 00:39:10 **

Abstract

An abstract is not available. ...

 

Intelligent Multimedia Information Retrieval

(02 May 1997)
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-03 19:06:15 ** along with 1 person thsant

Abstract

Foreword by Karen Spärck Jones

Intelligent multimedia information retrieval lies at the intersection of artificial intelligence, information retrieval, human-computer interaction, and multimedia computing. Its systems enable users to create, process, summarize, present, interact with, and organize information within and across different media such as text, speech, graphics, imagery, and video. These systems go beyond traditional hypermedia and hypertext environments to analyze and generate media, and support intelligent interaction with or via multiple media. ...

 

Improving meeting summarization by focusing on user needs: a task-oriented evaluation

In Proceedings of the 14th international conference on Intelligent user interfaces (2009), pp. 17-26, doi:10.1145/1502650.1502657
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-02 22:43:42 ** along with 1 person diogomartins

Abstract

Advances in multimedia technologies have enabled the creation of huge archives of audio-video recordings of meetings, and there is burgeoning interest in developing meeting browsers to help users better leverage these archives. A recent study has shown that extractive summaries provide a more efficient way of navigating meeting content than simply reading through the transcript and using the audio-video record, or navigating via keyword search (Murray, 2007). The extractive summary technique identifies informative dialogue acts to generate general purpose summaries. These ...

 

Multimodal Genre Analysis Applied to Digital Television Archives

Database and Expert Systems Applications, International Workshop on, Vol. 0 (2008), pp. 130-134, doi:10.1109/dexa.2008.22
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-02 19:49:01 **

Abstract

Automatic genre classification is a simple and effective solution to describe semantic properties of multimedia data. In this paper, a method to classify the genre of TV programmes is presented. In our approach, four multimodal vectors, including both low-level perceptual descriptors and higher-level, human-centred features are employed. These vectors serve as the input for a parallel neural network system that performs classification of seven video genres. The experiment results confirm the effectiveness of our method, reaching a classification accuracy rate of ...

 

Automated story capture from conversational speech

In Proceedings of the 3rd international conference on Knowledge capture (2005), pp. 145-152, doi:10.1145/1088622.1088649
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-02 19:47:17 ** along with 1 person jliegl

Abstract

While storytelling has long been recognized as an important part of effective knowledge management in organizations, knowledge management technologies have generally not distinguished between stories and other types of discourse. In this paper we describe a new type of technological support for storytelling that involves automatically capturing the stories that people tell to each other in conversations. We describe our first attempt at constructing an automated story extraction system using statistical text classification and a simple voting scheme. We evaluate the ...

 

Towards robust features for classifying audio in the CueVideo system

In Proceedings of the seventh ACM international conference on Multimedia (Part 1) (1999), pp. 393-400, doi:10.1145/319463.319658
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-02 19:45:39 ** along with 1 person abergeron

Abstract

The role of audio in the context of multimedia applications involving video is becoming increasingly important. Many efforts in this area focus on audio data that contains some built-in semantic information structure such as in broadcast news, or focus on classification of audio that contains a single type of sound such as clear speech or clear music only. In the CueVideo system, we detect and classify audio that consists of mixed audio, i.e. combinations of speech and music together with other ...

 

Joke-o-Mat HD: browsing sitcoms with human derived transcripts

In Proceedings of the international conference on Multimedia (2010), pp. 1591-1594, doi:10.1145/1873951.1874295
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-02 16:01:33 **

Abstract

Joke-o-mat HD is a system that allows a user to navigate sitcoms (such as Seinfeld) by "narrative themes", including scenes, punchlines, and dialog segments. The themes can be filtered by the main actors and by keyword. For example, the user can select to see only punchlines by Kramer that contain the word "armoire". The system infers the narrative themes using segmentation of the audio track into laughter, actors, words, and music. The segmentation can be generated either by an expert annotator, ...

 

An introduction to voice search

Signal Processing Magazine, IEEE, Vol. 25, No. 3. (May 2008), pp. 28-38, doi:10.1109/msp.2008.918411
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-02 12:41:33 ** along with 1 person lantash_luci

Abstract

Voice search is the technology underlying many spoken dialog systems (SDSs) that provide users with the information they request with a spoken query. The information normally exists in a large database, and the query has to be compared with a field in the database to obtain the relevant information. The contents of the field, such as business or product names, are often unstructured text. This article categorized spoken dialog technology into form filling, call routing, and voice search, and reviewed the ...

 

Speech recognition in university classrooms: liberated learning project

In Proceedings of the fifth international ACM conference on Assistive technologies (2002), pp. 192-196, doi:10.1145/638249.638284
posted to searchingspeech by marlar to the group searchingspeech2010 on 2011-10-02 10:41:20 **

Abstract

The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition technology, university educational environments, and disability issues. ...

 

Survey on speech emotion recognition: Features, classification schemes, and databases

Pattern Recognition, Vol. 44, No. 3. (13 March 2011), pp. 572-587, doi:10.1016/j.patcog.2010.09.020
posted to no-tag by marlar to the group searchingspeech2010 on 2011-10-01 22:36:56 ** along with 1 person msoley

Abstract

Recently, increasing attention has been directed to the study of the emotional content of speech signals, and hence, many systems have been proposed to identify the emotional content of a spoken utterance. This paper is a survey of speech emotion classification addressing three important aspects of the design of a speech emotion recognition system. The first one is the choice of suitable features for speech representation. The second issue is the design of an appropriate classification scheme and the third issue ...

 

Searching multimedia content with a spontaneous conversational speech track

In Proceedings of the 17th ACM international conference on Multimedia (2009), pp. 1159-1160, doi:10.1145/1631272.1631549
posted to asr searchingspeech tno tud twente by marlar to the group searchingspeech2010 on 2011-03-20 15:51:21 ** along with 1 group petamedia

Abstract

An abstract is not available. ...

 

Multimedia with a speech track: searching spontaneous conversational speech

SIGIR Forum, Vol. 44 (August 2010), pp. 76-81, doi:10.1145/1842890.1842901
posted to asr fhg speech tno tud twente by marlar to the group searchingspeech2010 on 2011-03-20 15:42:38 ** along with 1 group petamedia

Abstract

After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held in conjunction with ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken content retrieval, including information retrieval, speech recognition and multimedia analysis. Multimedia collections often contain a speech track, but in many cases it is ignored or not fully exploited for information retrieval. Currently, ...

 

The N-Best algorithm: an efficient procedure for finding top N sentence hypotheses

In Proceedings of the workshop on Speech and Natural Language (1989), pp. 199-202, doi:10.3115/1075434.1075467
posted to no-tag by marlar to the group searchingspeech2010 on 2011-02-16 14:22:49 **

Abstract

In this paper we introduce a new search algorithm that provides a simple, clean, and efficient interface between the speech and natural language components of a spoken language system. The N-Best algorithm is a time-synchronous Viterbi-style beam search algorithm that can be made to find the most likely N whole sentence alternatives that are within a given "beam" of the most likely sentence. The algorithm can be shown to be exact under some reasonable constraints. That is, it guarantees that ...

 

Word graphs: an efficient interface between continuous-speech recognition and language understanding

posted to no-tag by marlar to the group searchingspeech2010 on 2011-02-16 14:21:27 **

Abstract

Word graphs are directed acyclic graphs where each edge is labeled with a word and a score, and each node is labeled with a point in time. Word graphs form an efficient feedforward interface between continuous-speech recognition and linguistic processors. Word graphs with high coverage and modest graph densities can be generated with a computational load comparable with bigram best-sentence recognition. Results on word graph error rates and word graph densities are presented for the ASL (Architecture Speech/Language) benchmark test ...
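
The data structure described in this abstract can be illustrated with a minimal sketch. It is not the paper's graph-generation procedure. The names `Edge` and `best_sentence` are hypothetical, scores are assumed to be log-scores where higher is better, and every edge is assumed to point strictly forward in time, so that visiting nodes in time order is a valid topological order.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    start: int    # id of the node where the word begins
    end: int      # id of the node where the word ends
    word: str     # word label on the edge
    score: float  # assumed log-score, higher is better


def best_sentence(node_times, edges, source, sink):
    """Return (total score, word list) of the best source-to-sink path.

    node_times maps each node id to its point in time.
    """
    outgoing = defaultdict(list)
    for edge in edges:
        if node_times[edge.start] >= node_times[edge.end]:
            raise ValueError("edges must point forward in time")
        outgoing[edge.start].append(edge)

    # best[node] = (score, words) of the best path found from source to node
    best = {source: (0.0, [])}
    # Time order is a topological order because edges only go forward.
    for node in sorted(node_times, key=node_times.get):
        if node not in best:
            continue  # node is not reachable from the source
        score, words = best[node]
        for edge in outgoing[node]:
            candidate = score + edge.score
            if edge.end not in best or candidate > best[edge.end][0]:
                best[edge.end] = (candidate, words + [edge.word])

    if sink not in best:
        raise ValueError("sink is not reachable from source")
    return best[sink]


if __name__ == "__main__":
    times = {0: 0.0, 1: 0.4, 2: 0.5, 3: 1.0}
    graph = [
        Edge(0, 1, "recognize", -1.0),
        Edge(1, 3, "speech", -0.5),
        Edge(0, 2, "wreck a nice", -0.8),
        Edge(2, 3, "beach", -1.5),
    ]
    print(best_sentence(times, graph, source=0, sink=3))
```

A downstream linguistic processor could rescore the edges before calling `best_sentence`, which is one way to read the "feedforward interface" mentioned above.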

 

Phonetic Searching vs. LVCSR: How to Find What You Really Want in Audio Archives

pp. 9-22
posted to sub-word by marlar to the group searchingspeech2010 on 2011-01-15 20:37:28 ** along with 1 group ilps

Abstract

A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR, with inherent problems of closed vocabulary and high word error rate), phonetic searching combines high speed and accuracy, supports open vocabulary, imposes low penalty for new words, permits phonetic and inexact spelling, enables user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one ...

 

Discriminative keyword spotting

Speech Communication, Vol. 51, No. 4. (April 2009), pp. 317-329, doi:10.1016/j.specom.2008.10.002
posted to no-tag by marlar to the group searchingspeech2010 on 2011-01-15 20:05:53 ** along with 2 people kabus Rui_Feng

Abstract

This paper proposes a new approach for keyword spotting, which is based on large margin and kernel methods rather than on HMMs. Unlike previous approaches, the proposed method employs a discriminative learning procedure, in which the learning phase aims at achieving a high area under the ROC curve, as this quantity is the most common measure to evaluate keyword spotters. The keyword spotter we devise is based on mapping the input acoustic representation of the speech utterance along with the target ...
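
For readers unfamiliar with the evaluation quantity named in this abstract, the sketch below computes the area under the ROC curve from detector scores, using the standard identity that it equals the fraction of (keyword present, keyword absent) pairs that the detector ranks correctly. It illustrates the measure only, not the paper's large-margin learning procedure; the function name `area_under_roc` and the example scores are hypothetical.

```python
def area_under_roc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise ranking.

    positive_scores: detector scores for utterances that contain the keyword.
    negative_scores: detector scores for utterances that do not.
    Ties between a positive and a negative score count as half a correct pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    print(area_under_roc([0.9, 0.3], [0.5, 0.1]))
```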

 

Impact of Spontaneous Speech Features on Business Concept Detection: a Study of Call-Centre Data

In Proceedings of the ACM Multimedia Workshop on Searching Spontaneous Conversational Speech (2010), pp. 11-14
posted to no-tag by marlar to the group searchingspeech2010 on 2010-12-12 12:03:04 **
 

Today's and Tomorrow's Retrieval Practice in the Audiovisual Archive

In ACM International Conference on Image and Video Retrieval 2010 (CIVR 2010) (July 2010)
posted to av by marlar to the group searchingspeech2010 on 2010-11-28 15:34:04 ** along with 1 group mirlit

Abstract

Content-based video retrieval is maturing to the point where it can be used in real-world retrieval practices. One such practice is the audiovisual archive, whose users increasingly require fine-grained access to broadcast television content. We investigate to what extent content-based video retrieval methods can improve search in the audiovisual archive. In particular, we propose an evaluation methodology tailored to the specific needs and circumstances of the audiovisual archive, which are typically missed by existing evaluation initiatives. We utilize logged searches and ...

 

Investigating the Global Semantic Impact of Speech Recognition Error on Spoken Content Collections

In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval (2009), pp. 755-760, doi:10.1007/978-3-642-00958-7_80
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-28 12:46:11 **

Abstract

Errors in speech recognition transcripts have a negative impact on effectiveness of content-based speech retrieval and present a particular challenge for collections containing conversational spoken content. We propose a Global Semantic Distortion (GSD) metric that measures the collection-wide impact of speech recognition error on spoken content retrieval in a query-independent manner. We deploy our metric to examine the effects of speech recognition substitution errors. First, we investigate frequent substitutions, cases in which the recognizer habitually mis-transcribes one word as another. Although ...

 

SemanticVox: a multilingual video search engine

In Proceedings of the 6th ACM international conference on Image and video retrieval (2007), pp. 81-84, doi:10.1145/1282280.1282291
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-28 12:42:27 **

Abstract

In this paper, we describe the SemanticVox project. SemanticVox aims at providing a real link between speech transcription technologies from Vecsys [8] based on LIMSI research [9] and multimedia documents analysis and retrieval technologies from the Multilingual Multimedia Knowledge Engineering Laboratory (LIC2M) of the CEA-LIST [1]. The first application of the project is a cross-lingual automatic video indexing and retrieval system based on speech transcription and video analysis. The two main novelties of the system are: (i) its ability to manage ...

 

Is word error rate a good indicator for spoken language understanding accuracy

posted to wer by marlar to the group searchingspeech2010 on 2010-11-28 10:55:13 ** along with 1 person lantash_luci

Abstract

It is a conventional wisdom in the speech community that better speech recognition accuracy is a good indicator for better spoken language understanding accuracy, given a fixed understanding component. The findings in this work reveal that this is not always the case. More important than word error rate reduction, the language model for recognition should be trained to match the optimization objective for understanding. In this work, we applied a spoken language understanding model as the language model in speech recognition. ...
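
As background for the metric this abstract questions, here is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; they do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.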

 

One-sided measures for evaluating ranked retrieval effectiveness with spontaneous conversational speech

In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (2006), pp. 673-674, doi:10.1145/1148170.1148311
posted to eval by marlar to the group searchingspeech2010 on 2010-11-28 09:05:39 ** along with 1 person lantash_luci

Abstract

Early speech retrieval experiments focused on news broadcasts, for which adequate Automatic Speech Recognition (ASR) accuracy could be obtained. Like newspapers, news broadcasts are a manually selected and arranged set of stories. Evaluation designs reflected that, using known story boundaries as a basis for evaluation. Substantial advances in ASR accuracy now make it possible to build search systems for some types of spontaneous conversational speech, but present evaluation designs continue to rely on known topic boundaries that are no longer well ...

 

An Investigation of Mixed-Media Information Retrieval

In Research and Advanced Technology for Digital Libraries, Vol. 2458 (13 September 2002), pp. 463-478, doi:10.1007/3-540-45747-x_34
posted to error by marlar to the group searchingspeech2010 on 2010-11-27 22:20:38 **

Abstract

Digital document archives are increasingly derived from various different media sources. At present such archives are stored and searched independently. The Information Retrieval from Mixed-Media Collections (IRMMC) project is investigating retrieval from combined document collections composed of items originating from differing media forms. Experimental investigation of a mixed-media retrieval task based on the existing TREC Spoken Document Retrieval task combining Text, Spoken and Scanned Image is described. Results show that nontext media perform well within the mixed-media collection. Also ...

 

Methods and Tools for Speech Data Acquisition exploiting a Database of German Parliamentary Speeches and Transcripts from the Internet

In International Conference on Language Resources and Evaluation (LREC) (2002)
posted to alignment by marlar to the group searchingspeech2010 on 2010-11-27 20:04:32 **
 

IFINDER: an MPEG-7-based retrieval system for distributed multimedia content

In Proceedings of the tenth ACM international conference on Multimedia (2002), pp. 431-435, doi:10.1145/641007.641102
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-27 19:59:28 **

Abstract

This paper describes the MPEG-7 compliant indexing and retrieval system iFinder based on XML and open source database technology. The iFinder system automatically extracts metadata from A/V-content and allows access to the enriched content by means of a client/server-based retrieval engine. This multimedia retrieval system allows for search and retrieval of short video segments in huge multimedia archives. As a reference application, the iFinder system is used to index speeches from the German Parliament. The user can search for fragments of ...

 

A Study of Users' Perception of Relevance of Spoken Documents

No. TR-99-013. (July 1999)
posted to ui by marlar to the group searchingspeech2010 on 2010-11-27 18:30:58 **

Abstract

We present the results of a study of users' perception of relevance of documents. Documents retrieved in response to a query are presented to users in a variety of ways, from full text to a machine spoken query-biased automatically-generated summary, and the difference in users' perception of relevance is studied. The aim is to study experimentally how users' perception of relevance varies depending on the form that retrieved documents are presented. The experimental results suggest that the effectiveness of advanced multimedia ...

 

On the Use of Automatic Speech Recognition for Spoken Information Retrieval from Video Databases

In Progress in Pattern Recognition, Image Analysis and Applications, Vol. 3287 (2004), pp. 381-385, doi:10.1007/978-3-540-30463-0_47
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-25 19:02:31 **

Abstract

This document describes the realization of a spoken information retrieval system and its application to word search in an indexed video database. The system uses automatic speech recognition (ASR) software to convert the audio signal of a video file into a transcript file and then a document indexing tool to index this transcribed file. Then, a spoken query, uttered by any user, is presented to the ASR to decode the audio signal and propose a hypothesis that is later used ...

 

Television information filtering through speech recognition

In Interactive Distributed Multimedia Systems and Services, Vol. 1045 (1996), pp. 59-69, doi:10.1007/3-540-60938-5_5
posted to filtering by marlar to the group searchingspeech2010 on 2010-11-25 19:00:39 **
 

Spoken term detection using fast phonetic decoding

Acoustics, Speech, and Signal Processing, IEEE International Conference on, Vol. 0 (2009), pp. 4881-4884, doi:10.1109/icassp.2009.4960725
posted to std by marlar to the group searchingspeech2010 on 2010-11-22 09:20:40 **

Abstract

While spoken term detection (STD) systems based on word indices provide good accuracy, there are several practical applications where it is infeasible or too costly to employ an LVCSR engine. An STD system is presented, which is designed to incorporate a fast phonetic decoding front-end and be robust to decoding errors whilst still allowing for rapid search speeds. This goal is achieved through monophone open-loop decoding coupled with fast hierarchical phone lattice search. Results demonstrate that an STD system that is ...

 

Towards spoken-document retrieval for the enterprise: Approximate word-lattice indexing with text indexers

(December 2007), pp. 629-634, doi:10.1109/asru.2007.4430185
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-22 09:19:28 **

Abstract

Enterprise-scale search engines are generally designed for linear text. Linear text is suboptimal for audio search, where accuracy can be significantly improved if the search includes alternate recognition candidates, commonly represented as word lattices. We propose two methods to enable text indexers to approximately index lattices with little or no code change: "TMI" (Time-based Merging for Indexing) aims at lattice-index size reduction, and the "sausage"-like "TALE" (Time-Anchored Lattice Expansion) approximation requires no indexer-code or data-format changes at all. On four enterprise-type ...

 

Spoken term detection system based on combination of LVCSR and phonetic search

In Proceedings of the 4th international conference on Machine learning for multimodal interaction (2008), pp. 237-247
posted to std by marlar to the group searchingspeech2010 on 2010-11-22 09:18:33 **

Abstract

The paper presents the Brno University of Technology (BUT) system for indexing and search of speech, combining LVCSR and phonetic approach. It brings a complete description of individual building blocks of the system from signal processing, through the recognizers, indexing and search until the normalization of detection scores. It also describes the data used in the first edition of NIST Spoken term detection (STD) evaluation. The results are presented on three US-English conditions - meetings, broadcast news and conversational telephone speech, ...

 

Balancing false alarms and hits in Spoken Term Detection

(March 2010), pp. 5286-5289, doi:10.1109/icassp.2010.5494966
posted to std by marlar to the group searchingspeech2010 on 2010-11-22 09:16:27 **

Abstract

This paper presents methods to improve retrieval of Out-Of-Vocabulary (OOV) terms in a Spoken Term Detection (STD) system. We demonstrate that automated tagging of OOV regions helps to reduce false alarms while incorporating phonetic confusability increases the hits. Additional features that boost the probability of a hit in accordance with the number of neighboring hits for the same query and query-length normalization also improve the overall performance of the spoken-term detection system. We show that these methods can be combined effectively ...

 

Sub-word modeling of out of vocabulary words in spoken term detection

(December 2008), pp. 273-276, doi:10.1109/slt.2008.4777893
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-22 09:15:20 **

Abstract

This paper deals with comparison of sub-word based methods for spoken term detection (STD) task and phone recognition. The sub-word units are needed for search for out-of-vocabulary words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were proposed. We evaluated on the NIST STD06 dev-set CTS data. The conclusion is that the proposed method improves the phone accuracy more than 9% relative and STD accuracy more ...

 

Searching Conversational Telephone Speech in Any of the World's Languages

In International Conference on Intelligence Analysis (2005)
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-21 20:55:27 **
 

"I just played that a minute ago!:" Designing User Interfaces for Audio Navigation

In Workshop On Content Visualization And Intermediate Representations (1998)
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-21 12:52:14 **
 

Studying search and archiving in a real audio database

In AAAI Technical Report SS-97-03 (1997)
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-21 12:48:09 **
 

An initial attempt to improve spoken term detection by learning optimal weights for different indexing features

(March 2010), pp. 5278-5281, doi:10.1109/icassp.2010.5494981
posted to weights by marlar to the group searchingspeech2010 on 2010-11-20 21:25:44 **

Abstract

Because different indexing features actually have different discriminative capabilities for spoken term detection and different levels of reliability in recognition, it is reasonable to weight the indexing features in the transcribed lattices differently during spoken term detection. In this paper, we present an initial attempt of using two weighting schemes, one context independent (fixed weight for each feature) and one context dependent (different weights for the same feature in different context). These weights can be learned by optimizing a desired spoken term ...

 

Advances in phonetic word spotting

In Proceedings of the tenth international conference on Information and knowledge management (2001), pp. 580-582, doi:10.1145/502585.502697
posted to subwords by marlar to the group searchingspeech2010 on 2010-11-20 20:52:32 **

Abstract

Phonetic speech retrieval is used to augment word based retrieval in spoken document retrieval systems, for in and out of vocabulary words. In this paper, we present a new indexing and ranking scheme using metaphones and a Bayesian phonetic edit distance. We conduct an extensive set of experiments using a hundred hours of HUB4 data with ground truth transcript and twenty-four thousand query words. We show improvement of up to 15% in precision compared to results obtained with speech recognition alone, at ...

 

Confidence measures for the SWITCHBOARD database

posted to confidence by marlar to the group searchingspeech2010 on 2010-11-20 19:20:52 **

Abstract

There is increasing interest in systems which attempt to automate a task or a transaction using speech input and output. To function effectively with imperfect speech recognition, such systems require an estimate of which words in the output from the recogniser are likely to be correct and which can probably be disregarded as incorrect, i.e. a confidence-measure for each decoded word. We define a measure for evaluating the effectiveness of a post-classifier which estimates confidence-measures, and describe the development of a post-classifier for words decoded from the SWITCHBOARD database, ...

 

Detecting misrecognitions and out-of-vocabulary words

Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on, Vol. ii (1994), pp. II/21-II/24, doi:10.1109/icassp.1994.389728
posted to asr confidence by marlar to the group searchingspeech2010 on 2010-11-20 17:18:14 **

Abstract

This paper describes and evaluates a new technique for evaluating confidence in word strings produced by a speech recognition system. It detects misrecognized and out-of-vocabulary words in spontaneous spoken dialogs. The system uses multiple, diverse knowledge sources including acoustics, semantics, pragmatics and discourse to determine if a word string is misrecognized. When likely misrecognitions are detected, a series of tests distinguishes out-of-vocabulary words from other error sources. The work is part of a larger effort to automatically recognize and understand new words when spoken in a spontaneous spoken dialog. ...

 

Continuous hidden Markov modeling for speaker-independent word spotting

Acoustics, Speech, and Signal Processing, 1989. ICASSP-89., 1989 International Conference on (1989), pp. 627-630, doi:10.1109/icassp.1989.266505
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-20 14:51:13 **

Abstract

A word-spotting system using Gaussian hidden Markov models is presented. Several aspects of this problem are investigated. Specifically, results are reported on the use of various signal processing and feature transformation techniques. The authors have observed that performance can be greatly affected by the choice of features used, the covariance structure of the Gaussian models, and transformations based on energy and feature distributions. Due to the open-set nature of the problem, the specific techniques for modeling out-of-vocabulary speech and the choice of scoring metric can have a significant effect on ...

 

Automatic modeling for adding new words to a large-vocabulary continuous speech recognition system

Acoustics, Speech, and Signal Processing, IEEE International Conference on, Vol. 0 (1991), pp. 305-308, doi:10.1109/icassp.1991.150337
posted to confidence by marlar to the group searchingspeech2010 on 2010-11-20 11:46:11 **

Abstract

The authors report on the detection of new words for the speaker-dependent and speaker-independent paradigms. A useful operating point in a speaker-dependent paradigm is defined at 71% detection rate and 1% false alarm rate. The authors present a novel technique for obtaining a phonetic transcription for a new word, which is needed to add the new word to the system. The technique utilizes DECtalk's text-to-sound rules to obtain an initial phonetic transcription for the new word. Since these text-to-sound rules are ...

 

A phone-dependent confidence measure for utterance rejection

In Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01 (1996), pp. 515-517, doi:10.1109/icassp.1996.541146
posted to confidence by marlar to the group searchingspeech2010 on 2010-11-20 10:58:03 **

Abstract

An acoustic confidence measure for acceptance/rejection of recognition hypotheses for continuous speech utterances is proposed. This measure is useful for rejecting utterances that are out of domain, or contain out-of-vocabulary words or speech disfluencies. A phone-based approach is implemented so that a single global threshold can be applied to hypothesis rejection for any word sequence. Phone confidence is computed for each frame of speech as the posterior phone probability given the acoustic observation. Word sequence confidence is evaluated as the average ...

 

A probabilistic approach to confidence estimation and evaluation

posted to confidence by marlar to the group searchingspeech2010 on 2010-11-20 10:53:09 **

Abstract

In this paper we propose a novel way of estimating confidences for words that are recognized by a speech recognition system, together with a natural methodology for evaluating the overall quality of those confidence estimates. Our approach is based on an interpretation of a confidence as the probability that the corresponding recognized word is correct, and makes use of generalized linear models as a means for combining various predictor scores so as to arrive at confidence estimates. Experimental results using these models are presented based on four different sources ...

 

Predicting Word Spotting Performance

In Third International Conference on Spoken Language Processing (ICSLP 94) (1994), pp. 2195-2198
posted to confidence by marlar to the group searchingspeech2010 on 2010-11-20 09:50:16 **

Abstract

To use a word spotting system efficiently, it is helpful to be able to predict the performance of the system accurately. In this paper, we investigate performance prediction under different conditions. First, we discuss how to use statistical techniques to predict performance, and its variability on new unseen testing data. Second, we show that classification trees can be used to estimate the posterior probability of putative hits and that posterior probability can predict performance of unlabeled test data. Thirdly, we show ...

 

Understanding and improving speech recognition performance through the use of diagnostic tools

Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on, Vol. 1 (1995), pp. 221-224, doi:10.1109/icassp.1995.479404
posted to confidence by marlar to the group searchingspeech2010 on 2010-11-20 09:01:27 ** along with 1 person Milos

Abstract

The goal of this work is to highlight aspects of an experiment other than the word error rate. When a speech recognition experiment is performed, the word error rate provides no insight into the factors responsible for the recognition errors. We begin this paper by describing an experiment which contrasts the language of conversational speech with that of text from the Wall Street Journal. The remainder of the paper is devoted to the description of a more general approach to performance diagnosis which identifies significant sources of error ...

 

Improving the suitability of imperfect transcriptions for information retrieval from spoken documents

(1999), pp. 505-508 vol.1, doi:10.1109/icassp.1999.758173
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-20 08:57:56 **

Abstract

There has been a considerable focus on information retrieval for multimedia databases. When speech is used as the source material for multimedia indexing, the effect of transcriber error on retrieval effectiveness must be considered. This paper describes a method for measuring the relevance of documents to queries when information about the probability of word transcription error is available. To support the use of this technique, a method is presented for estimating word error probability in speech recognition engines that use word graphs (lattices). An information retrieval experiment using this ...

 

Learning new words from spontaneous speech

Acoustics, Speech, and Signal Processing, IEEE International Conference on, Vol. 2 (1993), pp. 590-591, doi:10.1109/icassp.1993.319377
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-14 20:12:34 **

Abstract

The authors describe the design of a system to learn new words from spontaneous speech input, and present an initial experiment on detecting the new words to be learned. Learning a new word involves detecting an out-of-vocabulary word in the input, determining its meaning, and adding the word to the system lexicon and grammars. Such learning would enable later recognition, parsing, and interpretation of the new words. ...

 

Context-based speech recognition error detection and correction

In Proceedings of HLT-NAACL 2004: Short Papers (2004), pp. 85-88
posted to conmod by marlar to the group searchingspeech2010 on 2010-11-14 20:01:09 ** along with 1 group mirlit

Abstract

In this paper we present preliminary results of a novel unsupervised approach for high-precision detection and correction of errors in the output of automatic speech recognition systems. We model the likely contexts of all words in an ASR system vocabulary by performing a lexical co-occurrence analysis using a large corpus of output from the speech system. We then identify regions in the data that contain likely contexts for a given query word. Finally, we detect words or sequences of words in ...

 

Improving automatic speech transcription for multimedia content

In Proceedings of WWW/Internet (2007), pp. 145-152
posted to no-tag by marlar to the group searchingspeech2010 on 2010-11-14 19:57:29 **
Note: You may cite this page as: http://www.citeulike.org/group/10577/library
