ISCA Archive

Keynotes 1-4

On organic interfaces
Victor Zue

The neural basis of speech perception - a view from functional imaging
Sophie K. Scott

Computer-supported human-human multilingual communication
Alex Waibel, Keni Bernardin, Matthias Wölfel

Self-organization in the evolution of shared systems of speech sounds: a computational study
Pierre-Yves Oudeyer

Discriminative and Large Margin Techniques in Acoustic Modeling

Soft margin feature extraction for automatic speech recognition
Jinyu Li, Chin-Hui Lee

A fast optimization method for large margin estimation of HMMs based on second order cone programming
Yan Yin, Hui Jiang

Frame margin probability discriminative training algorithm for noisy speech recognition
Hao-Zheng Li, Douglas O'Shaughnessy

Hierarchical neural networks feature extraction for LVCSR system
Fabio Valente, Jithendra Vepa, Christian Plahl, Christian Gollan, Hynek Hermansky, Ralf Schlüter

Bhattacharyya error and divergence using variational importance sampling
Peder A. Olsen, John R. Hershey

Phoneme dependent frame selection preference
Tingyao Wu, Jacques Duchateau, Dirk Van Compernolle

Speech Production I, II

An articulatory and acoustic study of "retroflex" and "bunched" American English rhotic sound based on MRI
Xinhui Zhou, Carol Y. Espy-Wilson, Mark Tiede, Suzanne Boyce

An MRI study of European Portuguese nasals
Paula Martins, Inês Carbone, Augusto Silva, António J. S. Teixeira

A four-cube FEM model of the extrinsic and intrinsic tongue muscles to simulate the production of vowel /i/
Sayoko Takano, Hiroki Matsuzaki, Kunitoshi Motoki

Performance evaluation of glottal quality measures from the perspective of vocal tract filter consistency
Juan Torres, Elliot Moore

Statistical identification of critical, dependent and redundant articulators
Veena D. Singampalli, Philip J. B. Jackson

An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping
Chao Qin, Miguel Á. Carreira-Perpiñán

Vocal tract length during speech production
Sorin Dusan

Approximation method of subglottal system using ARMA filter
Nobuhiro Miki, Kyohei Hayashi

Enhancing acoustic-to-EPG mapping with lip position information
Asterios Toutios, Konstantinos Margaritis

A model of glottal flow incorporating viscous-inviscid interaction
Tokihiko Kaburagi, Yosuke Tanabe

Thinking outside the cube: modeling language processing tasks in a multiple resource paradigm
Kilian G. Seeber

Experimental validation of direct and inverse glottal flow models for unsteady flow conditions
Julien Cisonni, Annemie Van Hirtum, Jan Willems, Xavier Pelorson

Effect of unsteady glottal flow on the speech production process
Hideyuki Nomura, Tetsuo Funada

Word stress correlates in spontaneous child-directed speech in German
Katrin Schneider, Bernd Möbius

Acquisition and synchronization of multimodal articulatory data
Michael Aron, Nicolas Ferveur, Erwan Kerrien, Marie-Odile Berger, Yves Laprie

A phonetic concatenative approach of labial coarticulation
Vincent Robert, Yves Laprie, Anne Bonneau

Visual analysis of lip coarticulation in VCV utterances
Aseel Turkmani, Adrian Hilton, Philip J. B. Jackson, James Edge

Comparison of multiple voice source parameters in different phonation types
Matti Airas, Paavo Alku

Acoustic and affective comparisons of natural and imaginary infant-, foreigner- and adult-directed speech
Monja Knoll, Lisa Scharrer

Vowel production in two occlusal classes
André Araújo, Luis M. T. Jesus, Isabel M. Costa

Nepalese retroflex stops: a static palatography study of inter- and intra-speaker variability
Rajesh Khatiwada

Effects of testosterone levels on temporal and intonational aspects of speech: more exploratory data
Charles A. Lamoureux, Victor J. Boucher

Phonetic Segmentation and Classification I, II

Fixed-size kernel logistic regression for phoneme classification
Peter Karsmakers, Kristiaan Pelckmans, Johan Suykens, Hugo Van hamme

A multiple-model based framework for automatic speech segmentation
Seung Seop Park, Jong Won Shin, Jong Kyu Kim, Nam Soo Kim

Semi-supervised learning of speech sounds
Aren Jansen, Partha Niyogi

Evaluation of syllable stress using single class classifier
Abhinav Parate, Ashish Verma, Jayanta Basak

Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks
Mohammad Nurul Huda, Ghulam Muhammad, Junsei Horikawa, Tsuneo Nitta

A methodology for the automatic detection of perceived prominent syllables in spoken French
J.-Ph. Goldman, M. Avanzi, A.-C. Simon, Anne Lacheret, A. Auchlin

Dual-channel acoustic detection of nasalization states
Xiaochuan Niu, Jan P. H. van Santen

Acoustic parameters for the automatic detection of vowel nasalization
Tarun Pruthi, Carol Y. Espy-Wilson

On the use of time-delay neural networks for highly accurate classification of stop consonants
Jun Hou, Lawrence R. Rabiner, Sorin Dusan

A new approach for phoneme segmentation of speech signals
Ladan Golipour, Douglas O'Shaughnessy

Automatically learning the units of speech by non-negative matrix factorisation
Veronique Stouten, Kris Demuynck, Hugo Van hamme

A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech
Ozlem Kalinli, Shrikanth S. Narayanan

Zero-crossing-based ratio masking for sound segregation
Sung Jun An, Young-Ik Kim, Rhee Man Kil

Event detection of speech signals based on auditory processing with a dynamic compressive gammachirp filterbank
Satomi Tanaka, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka

Segmentation of speech: child's play?
Odette Scharenborg, Mirjam Ernestus, Vincent Wan

Dimensionality reduction methods applied to both magnitude and phase derived features
Andrew Errity, John McKenna, Barry Kirkpatrick

Discourse, Dialog and Conversation

Voice source and vocal tract variations as cues to emotional states perceived from expressive conversational speech
Hiroki Mori, Hideki Kasuya

Exploring initiative strategies using computer simulation
Fan Yang, Peter A. Heeman

From one base form to multiple output styles - predicting stylistic dynamics of discourse prosody
Chiu-yu Tseng, Zhao-yu Su

Topic in dialogue: prosodic and syntactic features
Claudia Crocco, Renata Savy

Features of pauses and conjunctions at syntactic and discourse boundaries in Japanese monologues
Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Shusaku Miwa, Nobuaki Minematsu

Spoken Dialog Systems I, II

Utilizing online content as domain knowledge in a multi-domain dynamic dialogue system
Craig Wootton, Michael McTear, Terry Anderson

Handling speech input in the RITEL QA dialogue system
Boris van Schooten, Sophie Rosset, Olivier Galibert, Aurélien Max, Rieks op den Akker, Gabriel Illouz

Online call quality monitoring for automating agent-based call centers
Woosung Kim

Analysis of communication failures for spoken dialogue systems
Sebastian Möller, Klaus-Peter Engelbrecht, Antti Oulasvirta

How to access audio files of large data bases using in-car speech dialogue systems
Sandra Mann, André Berton, Ute Ehrlich

Analyzing temporal transition of real user's behaviors in a spoken dialogue system
Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno

Voicepedia: towards speech-based access to unstructured information
J. Sherwani, Dong Yu, Tim Paek, Mary Czerwinski, Yun-Cheng Ju, Alex Acero

Exploiting prosodic features for dialog act tagging in a discriminative modeling framework
Vivek Rangarajan, Srinivas Bangalore, Shrikanth S. Narayanan

Using information state to improve dialogue move identification in a spoken dialogue system
Hua Ai, Antonio Roque, Anton Leuski, David Traum

Using multiple strategies to manage spoken dialogue
Shiu-Wah Chu, Ian O'Neill, Philip Hanna

An information state based dialogue manager for a mobile robot
Marcelo Quinderé, Luís Seabra Lopes, António J. S. Teixeira

Automated directory assistance system - from theory to practice
Dong Yu, Yun-Cheng Ju, Ye-Yi Wang, Geoffrey Zweig, Alex Acero

The voice-rate dialog system for consumer ratings
Geoffrey Zweig, Patrick Nguyen, Yun-Cheng Ju, Ye-Yi Wang, Dong Yu, Alex Acero

The influence of user tailoring and cognitive load on user performance in spoken dialogue systems
Andi Winterboer, Jiang Hu, Johanna D. Moore, Clifford Nass

Confidence measures for voice search applications
Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Geoffrey Zweig, Alex Acero

Effects of quiz-style information presentation on user understanding
Ryuichiro Higashinaka, Kohji Dohsaka, Shigeaki Amano, Hideki Isozaki

A data visualization and analysis method for natural language call routing system design
Hong-Kwang Jeff Kuo, Vaibhava Goel

Accent and Language Identification I, II

Discriminative optimization of language adapted HMMs for a language identification system based on parallel phoneme recognizers
Josef G. Bauer, Bernt Andrassy, Ekaterina Timoshenko

Fusion of contrastive acoustic models for parallel phonotactic spoken language identification
Khe Chai Sim, Haizhou Li

Multi-layer Kohonen self-organizing feature map for language identification
Liang Wang, Eliathamby Ambikairajah, Eric H. C. Choi

Hierarchical language identification based on automatic language clustering
Bo Yin, Eliathamby Ambikairajah, Fang Chen

Using speech rhythm for acoustic language identification
Ekaterina Timoshenko, Harald Höge

A model-based estimation of phonotactic language verification performance
Ka-keung Wong, Man-hung Siu, Brian Mak

A tagging algorithm for mixed language identification in a noisy domain
Mike Rosner, Paulseph-John Farrugia

Improved language recognition using better phonetic decoders and fusion with MFCC and SDC features
Doroteo T. Toledano, Javier Gonzalez-Dominguez, Alejandro Abejon-Gonzalez, Danilo Spada, Ismael Mateos-Garcia, Joaquin Gonzalez-Rodriguez

An open-set detection evaluation methodology applied to language and emotion recognition
David A. van Leeuwen, Khiet P. Truong

Boosting with anti-models for automatic language identification
Xi Yang, Man-hung Siu, Herbert Gish, Brian Mak

Acoustic language identification using fast discriminative training
Fabio Castaldo, Daniele Colibro, Emanuele Dalmasso, Pietro Laface, Claudio Vair

Spoken language identification using score vector modeling and support vector machine
Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, Yonghong Yan

Language identification based on n-gram frequency ranking
R. Cordoba, L. F. D'Haro, F. Fernandez-Martinez, J. Macias-Guarasa, J. Ferreiros

Improving phonotactic language recognition with acoustic adaptation
Wade Shen, Douglas Reynolds

Education and Training

Syllable lattices as a basis for a children's speech reading tracker
Daniel Bolanos, Wayne Ward, Sarel Van Vuuren, Javier Garrido

Mandarin vowel pronunciation quality evaluation by using formant pattern recognition
Fuping Pan, Qingwei Zhao, Yonghong Yan

Automatic detection and classification of disfluent reading miscues in young children's speech for the purpose of assessment
Matthew Black, Joseph Tepperman, Sungbok Lee, Patti Price, Shrikanth S. Narayanan

Structural assessment of language learners' pronunciation
Nobuaki Minematsu, K. Kamata, Satoshi Asakawa, T. Makino, T. Nishimura, Keikichi Hirose

Enhancing usability of CAPL system for Qur'an recitation learning
Abdurrahman Samir, Sherif Mahdy Abdou, Ahmed Husien Khalil, Mohsen Rashwan

Automatic large-scale oral language proficiency assessment
Febe de Wet, Christa van der Walt, Thomas Niesler

Robust ASR I, II

Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation
Yuki Denda, Takamasa Tanaka, Masato Nakayama, Takanobu Nishiura, Yoichi Yamashita

A robust mel-scale subband voice activity detector for a car platform
A. Álvarez, R. Martínez, P. Gómez, V. Nieto, V. Rodellar

Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio
Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki

Feature and distribution normalization schemes for statistical mismatch reduction in reverberant speech recognition
A. M. Toh, Roberto Togneri, Sven Nordholm

Temporal masking for unsupervised minimum Bayes risk speaker adaptation
Matthew Gibson, Thomas Hain

Speech feature compensation based on pseudo stereo codebooks for robust speech recognition in additive noise environments
Tsung-hsueh Hsieh, Jeih-weih Hung

Multiband, multisensor robust features for noisy speech recognition
Dimitrios Dimitriadis, Petros Maragos, Stamatios Lefkimmiatis

Noise robust speech recognition for voice driven wheelchair
Akira Sasou, Hiroaki Kojima

Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions
Yu Hu, Qiang Huo

On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition
Luis Buera, Antonio Miguel, Eduardo Lleida, Óscar Saz, Alfonso Ortega

An ensemble modeling approach to joint characterization of speaker and speaking environments
Yu Tsao, Chin-Hui Lee

Cluster-based polynomial-fit histogram equalization (CPHEQ) for robust speech recognition
Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen

Robust distributed speech recognition using histogram equalization and correlation information
Pedro M. Martinez, Jose C. Segura, Luz Garcia

Predictive minimum Bayes risk classification for robust speech recognition
Jen-Tzung Chien, Koichi Shinoda, Sadaoki Furui

Applying word duration constraints by using unrolled HMMs
Ning Ma, Jon Barker, Phil Green

Evaluating the temporal structure normalisation technique on the Aurora-4 task
Xiong Xiao, Eng Siong Chng, Haizhou Li

Two-stage system for robust neutral/Lombard speech recognition
Hynek Bořil, Petr Fousek, Harald Höge

Noise suppression using search strategy with multi-model compositions
Takatoshi Jitsuhiro, Tomoji Toriyama, Kiyoshi Kogure

Investigations into early and late reflections on distant-talking speech recognition toward suitable reverberation criteria
Takanobu Nishiura, Yoshiki Hirano, Yuki Denda, Masato Nakayama

An approach to iterative speech feature enhancement and recognition
Stefan Windmann, Reinhold Haeb-Umbach

Optimization of temporal filters in the modulation frequency domain for constructing robust features in speech recognition
Jeih-weih Hung

The harming part of room acoustics in automatic speech recognition
Rico Petrick, Kevin Lohde, Matthias Wolff, Rüdiger Hoffmann

A reference model weighting-based method for robust speech recognition
Yuan Fu Liao, Yh-Her Yang, Chi-Hui Hsu, Cheng-Chang Lee, Jing-Teng Zeng

Mel sub-band filtering and compression for robust speech recognition
Babak Nasersharif, Ahmad Akbari, Mohammad Mehdi Homayounpour

Adaptation in ASR I, II

Clustered maximum likelihood linear basis for rapid speaker adaptation
Yun Tang, Richard Rose

Rapid speaker adaptation by reference model interpolation
Wenxuan Teng, Guillaume Gravier, Frédéric Bimbot, Frédéric Soufflet

Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection
Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Robustness of several kernel-based fast adaptation methods on noisy LVCSR
Brian Mak, Roger Hsiao

Estimating VTLN warping factors by distribution matching
Janne Pylkkönen

Frequency domain correspondence for speaker normalization
Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, Zhengyou Zhang

Unsupervised training of adaptation rate using Q-learning in large vocabulary continuous speech recognition
Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa

Application of CMLLR in narrow band wide band adapted systems
Martin Karafiát, Lukáš Burget, Jan Černocký, Thomas Hain

Fast adaptation of GMM-based compact models
Christophe Lévy, Georges Linarès, Jean-François Bonastre

Efficient estimation of speaker-specific projecting feature transforms
Jonas Lööf, Ralf Schlüter, Hermann Ney

Regularized feature-based maximum likelihood linear regression for speech recognition
Mohamed Kamal Omar

Modelling confusion matrices to improve speech recognition accuracy, with an application to dysarthric speech
Omar Caballero Morales, Stephen Cox

An active approach to speaker and task adaptation based on automatic analysis of vocabulary confusability
Qiang Huo, Wei Li

fMPE-MAP: improved discriminative adaptation for modeling new domains
Jing Zheng, Andreas Stolcke

Discriminative MCE-based speaker adaptation of acoustic models for a spoken lecture processing task
Timothy J. Hazen, Erik McDermott

Speaker Verification & Identification I-IV

A new kernel for SVM MLLR based speaker recognition
Zahi N. Karam, William M. Campbell

A GMM-based probabilistic sequence kernel for speaker verification
Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen

Speaker recognition using kernel-PCA and intersession variability modeling
Hagai Aronowitz

Linear and non linear kernel GMM supervector machines for speaker verification
Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel

Support vector regression for speaker verification
Ignacio Lopez-Moreno, Ismael Mateos-Garcia, Daniel Ramos, Joaquin Gonzalez-Rodriguez

Derivative and parametric kernels for speaker verification
C. Longworth, M. J. F. Gales

Application of shifted delta cepstral features in speaker verification
Jose R. Calvo, Rafael Fernández, Gabriel Hernández

A smoothing kernel for spatially related features and its application to speaker verification
Luciana Ferrer, Kemal Sönmez, Elizabeth Shriberg

VZ-norm: an extension of z-norm to the multivariate case for anchor model based speaker verification
D. Charlet, M. Collet, Frédéric Bimbot

Word-conditioned HMM supervectors for speaker recognition
Howard Lei, Nikki Mirghafori

Speaker clustering using direct maximization of a BIC-based score
Wei-Ho Tsai

Confidence measure based unsupervised target model adaptation for speaker verification
A. Preti, Jean-François Bonastre, Driss Matrouf, F. Capman, B. Ravera

Emotion attribute projection for speaker recognition on emotional speech
Huanjun Bao, Ming-Xing Xu, Thomas Fang Zheng

High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling
Shi-Xiong Zhang, Man-Wai Mak, Helen Meng

Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech
T. Yingthawornsuk, H. Kaymaz Keskinpala, D. M. Wilkes, R. G. Shiavi, R. M. Salomon

On comparing and combining intra-speaker variability compensation and unsupervised model adaptation in speaker verification
Claudio Garreton, Nestor Becerra Yoma, Fernando Huenupán, Carlos Molina

Comparison of two kinds of speaker location representation for SVM-based speaker verification
Xianyu Zhao, Yuan Dong, Hao Yang, Jian Zhao, Liang Lu, Haila Wang

Jitter and shimmer measurements for speaker recognition
Mireia Farrús, Javier Hernando, Pascual Ejarque

Natural-emotion GMM transformation algorithm for emotional speaker recognition
Zhenyu Shan, Yingchun Yang, Ruizhi Ye

Optimized one-bit quantization for adapted GMM-based speaker verification
Ivy H. Tseng, Olivier Verscheure, Deepak S. Turaga, Upendra V. Chaudhari

A comparison of session variability compensation techniques for SVM-based speaker recognition
Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan

Influence of task duration in text-independent speaker verification
Benoît Fauve, Nicholas Evans, Neil Pearson, Jean-François Bonastre, John Mason

A text-constrained prosodic system for speaker verification
Elizabeth Shriberg, Luciana Ferrer

Fusing acoustic, phonetic and data-driven systems for text-independent speaker verification
Asmaa El Hannani, Dijana Petrovska-Delacrétaz

Continuous prosodic features and formant modeling with joint factor analysis for speaker verification
Najim Dehak, Patrick Kenny, Pierre Dumouchel

Loquendo - Politecnico di Torino's 2006 NIST speaker recognition evaluation system
Claudio Vair, Daniele Colibro, Fabio Castaldo, Emanuele Dalmasso, Pietro Laface

A straightforward and efficient implementation of the factor analysis model for speaker verification
Driss Matrouf, Nicolas Scheffer, Benoît Fauve, Jean-François Bonastre

Multi-modal user authentication from video for mobile or variable-environment applications
Timothy J. Hazen, Daniel Schultz

Quasi text-independent speaker-verification based on pattern matching
Michael Gerber, René Beutler, Beat Pfister

Virtual fusion for speaker recognition
Yosef A. Solewicz, Moshe Koppel

Evolutionary minimum verification error learning of the alternative hypothesis model for LLR-based speaker verification
Yi-Hsiang Chao, Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang, Ruei-Chuan Chang

Speaker recognition by combining MFCC and phase information
Seiichi Nakagawa, Kouhei Asakawa, Longbiao Wang

A semi-automatic approach for speaker mining of tapped telephone conversations
Sandeep Manocha, Carol Y. Espy-Wilson

Cluster adaptive training weights as features in SVM-based speaker verification
Hao Yang, Yuan Dong, Xianyu Zhao, Jian Zhao, Liang Lu, Haila Wang

Study on speaker verification with non-audible murmur segments
Hideki Okamoto, Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Dimension reduction for speaker identification based on mutual information
Xugang Lu, Jianwu Dang

Robustness of long time measures of fundamental frequency
Jonas Lindh, Anders Eriksson

Score distribution scaling for speaker recognition
Vinod Prakash, John H. L. Hansen

Global features for rapid identity verification with dynamic biometric data
A. C. Morris, J. Koreman, B. Ly-Van, H. Sellahewa, S. Jassim, R. Llarena Gómez

Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments
Tuan Van Pham, Michael Neffe, Gernot Kubin

Speaker verification with multiple classifier fusion using Bayes based confidence measure
Fernando Huenupán, Nestor Becerra Yoma, Carlos Molina, Claudio Garreton

Audiovisual speaker identity verification based on lip motion features
Girija Chetty, Michael Wagner

Duration and pronunciation conditioned lexical modeling for speaker verification
Gokhan Tur, Elizabeth Shriberg, Andreas Stolcke, Sachin Kajarekar

Artificial impostor voice transformation effects on false acceptance rates
Jean-François Bonastre, Driss Matrouf, Corinne Fredouille

Spoken Data Retrieval I, II

Rapid and accurate spoken term detection
David R. H. Miller, Michael Kleber, Chia-Lin Kao, Owen Kimball, Thomas Colthurst, Stephen A. Lowe, Richard M. Schwartz, Herbert Gish

Subword-based position specific posterior lattices (s-PSPL) for indexing speech information
Yi-cheng Pan, Hung-lin Chang, Berlin Chen, Lin-shan Lee

Improved methods for language model based question classification
Andreas Merkel, Dietrich Klakow

Error-tolerant question answering for spoken documents
Tomoyosi Akiba, Hirofumi Tsujimura

Exploiting information extraction annotations for document retrieval in distillation tasks
Dilek Hakkani-Tür, Gokhan Tur, Michael Levit

Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis
K. Thambiratnam, F. Seide

A phonetic search approach to the 2006 NIST spoken term detection evaluation
Roy Wallace, Robbie Vogt, Sridha Sridharan

An integration method of retrieval results using plural subword models for vocabulary-free spoken document retrieval
Yoshiaki Itoh, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee

The SRI/OGI 2006 spoken term detection system
Dimitra Vergyri, Izhak Shafran, Andreas Stolcke, Ramana R. Gadde, Murat Akbacak, Brian Roark, Wen Wang

PodCastle: a Web 2.0 approach to speech recognition research
Masataka Goto, Jun Ogata, Kouichirou Eto

Speech mining in noisy audio message corpus
Nathalie Camelin, Frédéric Béchet, Géraldine Damnati, Renato De Mori

A fast fuzzy keyword spotting algorithm based on syllable confusion network
Jian Shao, Qingwei Zhao, Pengyuan Zhang, Zhaojie Liu, Yonghong Yan

Advances in SpeechFind: transcript reliability estimation employing confidence measure based on discriminative sub-word model for SDR
Wooil Kim, John H. L. Hansen

An interactive timeline for speech database browsing
Benoit Favre, Jean-François Bonastre, Patrice Bellot

Speech Perception I, II

Spoken word recognition of Chinese homophones: a further investigation
Michael C. W. Yip

The role of outer hair cell function in the perception of synthetic versus natural speech
Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, David Owens

Hybridizing conversational and clear speech
Akiko Kusumoto, Alexander B. Kain, John-Paul Hosom, Jan P. H. van Santen

Neighborhood density and neighborhood frequency effects in French spoken word recognition
Sophie Dufour, Ulrich Hans Frauenfelder

Discrimination and recognition of scaled word sounds
Toshio Irino, Yoshie Aoki, Yoshie Hayashi, Hideki Kawahara, Roy D. Patterson

Benchmarking human performance on the acoustic and linguistic subtasks of ASR systems
László Tóth

Contributions of temporal fine structure cues to Chinese speech recognition in cochlear implant simulation
Lin Yang, Jianping Zhang, Yonghong Yan

Effect of number of masking talkers on speech-on-speech masking in Chinese
Xihong Wu, Jing Chen, Zhigang Yang, Qiang Huang, Mengyuan Wang, Liang Li

Do different boundary types induce subtle acoustic cues to which French listeners are sensitive?
Odile Bagou, Sophie Dufour, Cécile Fougeron, Alain Content, Ulrich Hans Frauenfelder

An information theoretic approach to predict speech intelligibility for listeners with normal and impaired hearing
Svante Stadler, Arne Leijon, Björn Hagerman

Speaking rate effects in a landmark-based phonetic exemplar model
Travis Wade, Bernd Möbius

Acoustic correlates of intelligibility enhancements in clearly produced fricatives
Kazumi Maniwa, Allard Jongman, Travis Wade

Modelling the human-machine gap in speech reception: microscopic speech intelligibility prediction for normal-hearing subjects with an auditory model
Tim Jürgens, Thomas Brand, Birger Kollmeier

Lombard speech impact on perceptual speaker recognition
Ayako Ikeno, John H. L. Hansen

Effect of within- and between-talker variability on word identification in noise by younger and older adults
Huiwen Goy, Kathleen Pichora-Fuller, Pascal van Lieshout, Gurjit Singh, Bruce Schneider

Speech perception in children with speech sound disorder
H. Timothy Bunnell, N. Carolyn Schanen, Linda D. Vallino, Thierry G. Morlet, James B. Polikoff, Jennette D. Driscoll, James T. Mantell

Speech coding and information processing by auditory neurons
Huan Wang, Werner Hemmert

What do listeners attend to in hearing prosodic structures? Investigating the human speech-parser using short-term recall
Annie C. Gilbert, Victor J. Boucher

Time-compressed speech perception with speech and noise maskers
Douglas S. Brungart, Nandini Iyer

L2 consonant identification in noise: cross-language comparisons
Anne Cutler, Martin Cooke, Maria Luisa Garcia Lecumberri, Dennis Pasveer

Effects of non-native dialects on spoken word recognition
Jennifer T. Le, Catherine T. Best, Michael D. Tyler, Christian Kroos

Identification of natural whistled vowels by non-whistlers
Julien Meyer, Fanny Meunier, Laure Dentel

Prelexical adjustments to speaker idiosyncrasies: are they position-specific?
Alexandra Jesse, James M. McQueen

Top-down effects on compensation for coarticulation are not replicable
Holger Mitterer

Prosody: Prosodic Structure

Pitch pattern alternation in Goshogawara Japanese: evidence for a prosodic phrase above the domain for downstep
Yosuke Igarashi

Some evidence on the phonetics and phonology of prosodic phrasing in Russian
Irina Nesterenko, Pavel Skrelin

Temporal downtrends in Czech read speech
Jan Volín, Radek Skarnitzl

Empirical evidence for prosodic phrasing: pauses as linguistic annotation in Korean read speech
Hyongsil Cho, Daniel Hirst

Exploiting prosody for PCFGs with latent annotations
Markus Dreyer, Izhak Shafran

Combining length distribution model with decision tree in prosodic phrase prediction
Qin Shi, DanNing Jiang, FanPing Meng, Yong Qin

Duration and pauses as boundary-markers in speech: a cross-linguistic study
Li-chiung Yang

Prosodic Modeling I, II

Modeling incompletion phenomenon in Mandarin dialog prosody
Jian Yu, Lixing Huang, Jianhua Tao, Xia Wang

Accent assignment algorithm in Hungarian, based on syntactic analysis
Anne Tamm, Kálmán Abari, Gábor Olaszy

An effective initial/final duration prediction method for corpus-based singing voice synthesis of Mandarin Chinese
Cheng-Yuan Lin, Pei-Chi Jao, J.-S. Roger Jang

Increasing prosodic variability of text-to-speech synthesizers
Géza Németh, Márk Fék, Tamás Gábor Csapó

Unsupervised HMM classification of F0 curves
Damien Lolive, Nelly Barbot, Olivier Boeffard

Automatic pitch accent prediction for text-to-speech synthesis
Ian Read, Stephen Cox

An unsupervised approach to automatic prosodic annotation
Xinqiang Ni, Yining Chen, Frank K. Soong, Min Chu, Ping Zhang

A system for transforming the emotion in speech: combining data-driven conversion techniques for prosody and voice quality
Zeynep Inanoglu, Steve Young

An automatic prosody labeling method for Mandarin speech
Chen-Yu Chiang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen

Corpus-based generation of prosodic features from text based on generation process model
Keikichi Hirose, Keiko Ochi, Nobuaki Minematsu

Novel eigenpitch-based prosody model for text-to-speech synthesis
Jilei Tian, Jani Nurminen, Imre Kiss

Modelling prominence and emphasis improves unit-selection synthesis
Volker Strom, Ani Nenkova, Robert Clark, Yolanda Vazquez-Alvarez, Jason Brenier, Simon King, Dan Jurafsky

A framework of reply speech generation for concept-to-speech conversion in spoken dialogue systems
Seiya Takada, Yuji Yagi, Keikichi Hirose, Nobuaki Minematsu

Synthesis of prosodic attitudinal variants in German backchannel ja
Thorsten Stocksmeier, Stefan Kopp, Dafydd Gibbon

Inter-language prosodic style modification experiment using word impression vector for communicative speech generation
Ke Li, Yoko Greenberg, Yoshinori Sagisaka

Speech Analysis

A conservative aggressive subspace tracker
Koby Crammer

Mutual information and the speech signal
Mattias Nilsson, W. Bastiaan Kleijn

Spectro-temporal analysis of speech using 2-D Gabor filters
Tony Ezzat, Jake Bouvrie, Tomaso Poggio

A comparative study of speech rate estimation techniques
Tomas Dekens, Mike Demol, Werner Verhelst, Piet Verhoeve

Spectro-temporal processing for blind estimation of reverberation time and single-ended quality measurement of reverberant speech
Tiago H. Falk, Hua Yuan, Wai-Yip Chan

Spectral Analysis, Formants and Vocal Tract Models

Linear prediction of audio signals
Toon van Waterschoot, Marc Moonen

Stabilised weighted linear prediction - a robust all-pole method for speech processing
Carlo Magi, Tom Bäckström, Paavo Alku

Conditionally linear Gaussian models for estimating vocal tract resonances
Daniel Rudoy, Daniel N. Spendley, Patrick J. Wolfe

Time-varying pre-emphasis and inverse filtering of speech
Karl Schnell, Arild Lacroix

Reconstructing audio signals from modified non-coherent Hilbert envelopes
Joachim Thiemann, Peter Kabal

A flexible spectral modification method based on temporal decomposition and Gaussian mixture model
Binh Phu Nguyen, Masato Akagi

A comparison of estimated and MAP-predicted formants and fundamental frequencies with a speech reconstruction application
Jonathan Darch, Ben Milner

Effect of incomplete glottal closures on estimates of glottal waves via inverse filtering of vowel sounds
Huiqun Deng, Douglas O'Shaughnessy

Vocal tract and area function estimation with both lip and glottal losses
Kaustubh Kalgaonkar, Mark A. Clements

Detection of instants of glottal closure using characteristics of excitation source
S. Guruprasad, B. Yegnanarayana, K. Sri Rama Murty

A comparative evaluation of the zeros of z transform representation for voice source estimation
Nicolas Sturmel, Christophe D'Alessandro, Boris Doval

Speech and Audio Processing for Intelligent Environments

Ambient telephony: scenarios and research challenges
Aki Härmä

Always listening to you: creating exhaustive audio database in home environments
Yasunari Obuchi, Akio Amano

Joint speaker segmentation, localization and identification for streaming audio
Joerg Schmalenstroeer, Reinhold Haeb-Umbach

Active binaural distance estimation for dynamic sources
Yan-Chen Lu, Martin Cooke, Heidi Christensen

A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition
Bengt J. Borgström, Abeer Alwan

Channel selection by class separability measures for automatic transcriptions on distant microphones
Matthias Wölfel

Conversation detection and speaker segmentation in privacy-sensitive situated speech data
Danny Wyatt, Tanzeem Choudhury, Jeff Bilmes

Audio-based approaches to head orientation estimation in a smart-room
Alberto Abad, Carlos Segura, Climent Nadeu, Javier Hernando

Multi-resolution soft features for channel-robust distributed speech recognition
Valentin Ion, Reinhold Haeb-Umbach

Language Modeling I, II

Large-scale random forest language models for speech recognition
Yi Su, Frederick Jelinek, Sanjeev Khudanpur

PLSA-based topic detection in meetings for adaptation of lexicon and language model
Yuya Akita, Yusuke Nemoto, Tatsuya Kawahara

Language modeling using PLSA-based topic HMM
Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki

Lexicon adaptation with reduced character error (LARCE) - a new direction in Chinese language modeling
Yi-cheng Pan, Lin-shan Lee

Minimum rank error training for language modeling
Meng-Sung Wu, Jen-Tzung Chien

Integrating MAP, marginals, and unsupervised language model adaptation
Wen Wang, Andreas Stolcke

Dynamic language model adaptation using presentation slides for lecture speech recognition
Hiroki Yamazaki, Koji Iwano, Koichi Shinoda, Sadaoki Furui, Haruo Yokota

Web-based language modelling for automatic lecture transcription
Cosmin Munteanu, Gerald Penn, Ron Baecker

LSA-based language model adaptation for highly inflected languages
Tanel Alumäe, Toomas Kirt

Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm
Aaron Heidel, Hung-an Chang, Lin-shan Lee

Structural Bayesian language modeling and adaptation
Sibel Yaman, Jen-Tzung Chien, Chin-Hui Lee

Vocabulary selection for a broadcast news transcription system using a morpho-syntactic approach
Ciro Martins, António J. S. Teixeira, João Neto

Handling OOV words in Arabic ASR via flexible morphological constraints
Nguyen Bach, Mohamed Noamany, Ian Lane, Tanja Schultz

Phrases in category-based language models for Spanish and Basque ASR
Raquel Justo, M. Inés Torres

Language modeling for automatic Turkish broadcast news transcription
Ebru Arısoy, Haşim Sak, Murat Saraçlar

Prosody Production and Perception

Predicting focus through prominence structure
Sasha Calhoun

Analysis of emotional speech prosody in terms of part of speech tags
Murtaza Bulut, Sungbok Lee, Shrikanth S. Narayanan

The neutral tone in question intonation in Mandarin
Fang Liu, Yi Xu

Pointing to a target while naming it with /pata/ or /tapa/: the effect of consonants and stress position on jaw-finger coordination
Amélie Rochet-Capellan, Jean-Luc Schwartz, Rafael Laboissière, Arturo Galvàn

Suprasegmental aspects of pre-lexical speech in cochlear implanted children
Øydis Hide, Steven Gillis, Paul Govaerts

Categorical perception in intonation: a matter of signal dynamics?
Oliver Niebuhr

Multimodal Speech Recognition

A HMM recognition of consonant-vowel syllables from lip contours: the cued speech case
Noureddine Aboutabit, Denis Beautemps, Jeanne Clarke, Laurent Besacier

A unified approach to multi-pose audio-visual ASR
Patrick Lucey, Gerasimos Potamianos, Sridha Sridharan

Audio-visual integration for robust speech recognition using maximum weighted stream posteriors
Rowan Seymour, Darryl Stewart, Ji Ming

Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips
Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone

Multimodal speech recognition with ultrasonic sensors
Bo Zhu, Timothy J. Hazen, James Glass

Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition
David Dean, Patrick Lucey, Sridha Sridharan, Tim Wark

Speech and Other Modalities

Analysis of head motions and speech in spoken dialogue
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita

A paradigm for mobile speech-centric services
Lars Bo Larsen, Kasper L. Jensen, Søren Larsen, Morten Rasmussen

Design and recording of Czech sign language corpus for automatic sign language recognition
Pavel Campr, Marek Hrúz, Miloš Železný

Pushy versus meek - using avatars to influence turn-taking behaviour
Jens Edlund, Jonas Beskow

Wavelet-based front-end for electromyographic speech recognition
Michael Wand, Szu-Chen Stan Jou, Tanja Schultz

Intensive gestures in French and their multimodal correlates
Gaëlle Ferré, Roxane Bertrand, Philippe Blache, Robert Espesser, Stéphane Rauzy

Aspects of visual speech in Arabic
Slim Ouni, Kais Ouni

Rigid vs non-rigid face and head motion in phone and tone perception
Denis Burnham, Jessica Reynolds, Guillaume Vignali, Sandra Bollwerk, Caroline Jones

Multimodal/Multimedia Signal Processing

Audio-visual phoneme classification for pronunciation training applications
Hedvig Kjellström, Olov Engwall, Sherif Mahdy Abdou, Olle Bälter

Visual information and redundancy conveyed by internal articulator dynamics in synthetic audiovisual speech
Katja Grauwinkel, Britta Dewitt, Sascha Fagel

A speech rate related lip movement model for speech animation
Wei Zhou, Zengfu Wang

An extension 2DPCA based visual feature extraction method for audio-visual speech recognition
Guanyong Wu, Jie Zhu

Preventing an external acoustic noise from being misrecognized as a speech recognition object by confirming the lip movement image signal
Soo-jong Lee, Jun Park, Eung-kyeu Kim

Automatic head motion prediction from speech data
Gregor Hofer, Hiroshi Shimodaira

Omnidirectional audio-visual talker localizer with dynamic feature fusion based on validity and reliability criteria
Yuki Denda, Takanobu Nishiura, Yoichi Yamashita

Processing image and audio information for recognising discourse participation status through features of face and voice
Nick Campbell, Damien Douxchamps

Speech Enhancement

The effect of the additivity assumption on time and frequency domain Wiener filtering for speech enhancement
Kamil K. Wójcicki, Stephen So, Kuldip K. Paliwal

Noise reduction based on adaptive β-order generalized spectral subtraction for speech enhancement
Junfeng Li, Shuichi Sakamoto, Satoshi Hongo, Masato Akagi, Yôiti Suzuki

Class constrained ROVER based speech enhancement
Amit Das, John H. L. Hansen

EMD based soft-thresholding for speech enhancement
Erhan Deger, Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan

An approximate solution for perceptually constrained signal subspace speech enhancement method
Adam Borowicz, Alexander Petrovsky

Quality assessment of speech enhancement systems by separation of enhanced speech, noise, and echo
Tim Fingscheidt, Suhadi Suhadi

Perceptual musical noise reduction using critical bands tonality coefficients and masking thresholds
Anis Ben Aicha, Sofia Ben Jebara

On optimal estimation of compressed speech for hearing aids
Dirk Mauler, Anil M. Nagathil, Rainer Martin

DFT domain subspace based noise tracking for speech enhancement
Richard C. Hendriks, Jesper Jensen, Richard Heusdens

Noise tracking for speech systems in adverse environments
Nitish Krishnamurthy, John H. L. Hansen

Speech enhancement using multi-reference noise reduction in a vehicle environment
Abderrahman Essebbar, Tristan Poinsard

Blind adaptive principal eigenvector beamforming for acoustical source separation
Ernst Warsitz, Reinhold Haeb-Umbach, Dang Hai Tran Vu

Time-domain blind audio source separation using advanced ICA methods
Zbyněk Koldovský, Petr Tichavský

Model-based speech separation with single-microphone input
S. W. Lee, Frank K. Soong, P. C. Ching

Multi-step linear prediction based speech dereverberation in noisy reverberant environment
Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Masato Miyoshi

A statistical model based post-filtering algorithm for residual echo suppression
Seung Yeol Lee, Jong Won Shin, Hwan Sik Yun, Nam Soo Kim

An optimal speech enhancement under speech uncertainty probability and masking property of auditory system
Xiaoshan Huang, Xiaoqun Zhao

Structure-based and Template-based Automatic Speech Recognition

Temporal episodic memory model: an evolution of Minerva2
Viktoria Maier, Roger K. Moore

Speech recognition with factorial-HMM syllabic acoustic models
Gianpaolo Coro, Francesco Cutugno, Fulvio Caropreso

Evaluating acoustic distance measures for template based recognition
Mathias De Wachter, Kris Demuynck, Patrick Wambacq, Dirk Van Compernolle

Hierarchical acoustic modeling based on random-effects regression for automatic speech recognition
Yan Han, Lou Boves

Construction and analysis of multiple paths in syllable models
Annika Hämäläinen, Louis ten Bosch, Lou Boves

Landmark-based approach to speech recognition: an alternative to HMMs
Carol Y. Espy-Wilson, Tarun Pruthi, Amit Juneja, Om Deshmukh

Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics
Satoshi Asakawa, Nobuaki Minematsu, Keikichi Hirose

A structured speech model parameterized by recursive dynamics and neural networks
Roberto Togneri, Li Deng

Structure-based and template-based automatic speech recognition - comparing parametric and non-parametric approaches
Li Deng, Helmer Strik

Learning the inter-frame distance for discriminative template-based keyword detection
David Grangier, Samy Bengio

Handling phonetic context and speaker variation in a structure-based speech recognizer
Dong Yu, Li Deng, Alex Acero

Robust ASR Against Noise and Reverberation

Vector-quantization based mask estimation for missing data automatic speech recognition
Maarten Van Segbroeck, Hugo Van hamme

Accurate marginalization range for missing data recognition
Sébastien Demange, Christophe Cerisara, Jean-Paul Haton

Smooth soft mel-spectrographic masks based on blind sparse source separation
Marco Kühne, Roberto Togneri, Sven Nordholm

Model-driven detection of clean speech patches in noise
Jonathan Laidler, Martin Cooke, Neil D. Lawrence

Polyaural array processing for automatic speech recognition in degraded environments
Richard M. Stern, Evandro B. Gouvêa, Govindarajan Thattai

Adding noise to improve noise robustness in speech recognition
Nicolás Morales, Liang Gu, Yuqing Gao

Language Resources and Tools

The Buckeye corpus of speech: updates and enhancements
Eric Fosler-Lussier, Laura Dilley, Na'im Tyson, Mark Pitt

Development of multimodal resources for multilingual information retrieval in the Basque context
N. Barroso, A. Ezeiza, N. Gilisagasti, K. López de Ipiña, A. López, J. M. López

Construction of a phonotactic dialect corpus using semiautomatic annotation
Reva Schwartz, Wade Shen, Joseph Campbell, Shelley Paget, Julie Vonwiller, Dominique Estival, Christopher Cieri

BECAM tool - a semi-automatic tool for bootstrapping emotion corpus annotation and management
Slim Abdennadher, Mohamed Aly, Dirk Bühler, Wolfgang Minker, Johannes Pittermann

Resources for new research directions in speaker recognition: the Mixer 3, 4 and 5 corpora
Christopher Cieri, Linda Corson, David Graff, Kevin Walker

Intercoder reliability in annotating complex disfluencies
Peter A. Heeman, Andy McMillin, J. Scott Yaruss

Single-channel Speech Enhancement

Single channel speech separation using maximum a posteriori estimation
M. H. Radfar, R. M. Dansereau

Speech enhancement with improved a posteriori SNR computation
Suhadi Suhadi, Tim Fingscheidt

Method of LP-based blind restoration for improving intelligibility of bone-conducted speech
Thang Vu Tat, Germine Seide, Masashi Unoki, Masato Akagi

Noise suppression based on extending a speech-dominated modulation band
Tiago H. Falk, Svante Stadler, W. Bastiaan Kleijn, Wai-Yip Chan

Speech enhancement using PCA and variance of the reconstruction error model identification
Amin Haji Abolhassani, Sid-Ahmed Selouani, Douglas O'Shaughnessy, Mohamed-Faouzi Harkat

Speech reinforcement based on partial specific loudness
Jong Won Shin, Woohyung Lim, Junesig Sung, Nam Soo Kim

Phonetics and Phonology

The phonetics and phonology of high and low tones in two falling f0-contours in standard German
Tamara Rathcke, Jonathan Harrington

Temporal alignment of creaky voice in neutralised realisations of an underlying, post-nasal voicing contrast in German
Tina John, Jonathan Harrington

The duration of speech pauses in a multilingual environment
Mike Demol, Werner Verhelst, Piet Verhoeve

Syllable timing patterns in Polish: results from annotation mining
Dafydd Gibbon, Jolanta Bachan, Grażyna Demenko

Minimal pairs and functional loads of sound contrasts obtained from a list of Modern Greek words
Constandinos Kalimeris, Stelios Bakamidis

More on acoustic correlates of stress
Daan Wissing

Comparing Praat and Snack formant measurements on two large corpora of northern and southern French
Cécile Woehrling, Philippe Boula de Mareüil

The phonetic exponency of phrasal accentuation in French and German
William Barry, Bistra Andreeva, Ingmar Steiner

Phonetic geminates in Cypriot Greek: the case of voiceless plosives
Christiana Christodoulou

Predicting vowel duration in spontaneous Canadian French speech
Darcie Williams, François Poiré

Rhotic variation and schwa epenthesis in Windsor French
Ivan Chow, François Poiré

On the categorical nature of the process involved in schwa elision in French
Audrey Bürki, Cécile Fougeron, Cédric Gendrot

Exploring tonal variations via context-dependent tone models
Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang

Acoustic analysis of the neutral tone in Mandarin
Philippe Martin, Jun Li

F₀ analysis of perceptual distance among Cantonese level tones
Rerrario Shui-Ching Ho, Yoshinori Sagisaka

Features for ASR

Extended powered cepstral normalization (p-CN) with range equalization for robust features in speech recognition
Chang-wen Hsu, Lin-shan Lee

Selection of optimal dimensionality reduction method using Chernoff bound for segmental unit input HMM
Makoto Sakai, Norihide Kitaoka, Seiichi Nakagawa

Fepstrum: an improved modulation spectrum for ASR
Vivek Tyagi

Narrowband to wideband feature expansion for robust multilingual ASR
Dušan Macho

Non-linear spectral contrast stretching for in-car speech recognition
Weifeng Li, Hervé Bourlard

Clustering-based two-dimensional linear discriminant analysis for speech recognition
Xiao-Bing Li, Douglas O'Shaughnessy

A study on temporal features derived by analytic signal
Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai

Dimensionality reduction of speech features using nonlinear principal components analysis
Stephen A. Zahorian, Tara Singh, Hongbing Hu

Linear transformation approach to VTLN using dynamic frequency warping
D. R. Sanand, D. Dinesh Kumar, S. Umesh

Features interpolation domain for distributed speech recognition and performance for ITU-T G.723.1 CODEC
Vladimir Fabregas Surigué de Alencar, Abraham Alcaim

Dynamic integration of multiple feature streams for robust real-time LVCSR
Shoei Sato, Kazuo Onoe, Akio Kobayashi, Shinich Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

PCA-based feature extraction for fluctuation in speaking style of articulation disorders
Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi

Multi-stream features combination based on Dempster-Shafer rule for LVCSR system
Fabio Valente, Jithendra Vepa, Hynek Hermansky

Dimensionality reduction for speech recognition using neighborhood components analysis
Natasha Singh-Miller, Michael Collins, Timothy J. Hazen

Probabilistic latent speaker analysis for large vocabulary speech recognition
Dan Su, Xihong Wu, Huisheng Chi

MRASTA and PLP in automatic speech recognition
S. R. Mahadeva Prasanna, Hynek Hermansky

Objective Assessment of Voice and Speech Quality

Women's vocal aging: a longitudinal approach
Markus Brückl

Effect of intensive voice therapy on vocal tremor for Parkinson speakers
Laurence Cnockaert, Jean Schoentgen, Canan Ozsancak, Pascal Auzou, Francis Grenez

Assessment of vocal dysperiodicities in connected disordered speech
A. Alpan, A. Kacha, Francis Grenez, Jean Schoentgen

Effects of FE modelled consequences of tonsillectomy on perceptual evaluation of voice
Anne-Maria Laukkanen, Jaromír Horáček, Pavel Švancara, Elina Lehtinen

Speech quality after major surgery of the oral cavity and oropharynx with microvascular soft tissue reconstruction
Irma M. Verdonck-de Leeuw, Louis ten Bosch, Li Ying Chao, Rico N. P. M. Rinkel, Pepijn A. Borggreven, Lou Boves, C. René Leemans

Voice fatigue and use of speech recognition: a study of voice quality ratings
Christel de Bruijn, Sandra Whiteside

Complementary approaches for voice disorder assessment
Jean-François Bonastre, Corinne Fredouille, A. Ghio, A. Giovanni, G. Pouchoulin, J. Révis, B. Teston, P. Yu

Frequency study for the characterization of the dysphonic voices
G. Pouchoulin, Corinne Fredouille, Jean-François Bonastre, A. Ghio, A. Giovanni

Acoustic correlates of laryngeal-muscle fatigue: findings for a phonometric prevention of acquired voice pathologies
Victor J. Boucher

Automatic scoring of the intelligibility in patients with cancer of the oral cavity
Andreas Maier, Maria Schuster, Anton Batliner, Elmar Nöth, Emeka Nkenke

Automatic assessment of children's reading level
Jacques Duchateau, Leen Cleuren, Hugo Van hamme, Pol Ghesquière

Using waveform matching techniques in the measurement of shimmer in voiced signals
Carlos Ferrer, María E. Hernández-Díaz, Eduardo González

Analysis of the impact of analogue telephone channel on MFCC parameters for voice pathology detection
R. Fraile, J. I. Godino-Llorente, N. Sáenz-Lechón, V. Osma-Ruiz, P. Gómez-Vilda

Objective parameters from videokymographic images: a user-friendly interface
C. Manfredi, L. Bocchi, G. Cantarella, G. Peretti, G. Guidi, V. Mezzatesta

Discourse, Dialog and Emotion Expression

Integrating audio and visual cues for speaker friendliness in multimodal speech synthesis
David House

The influence of masking words on the prediction of TRPs in a shadowed dialog
Wieneke Wesseling, R. J. J. H. van Son, Louis C. W. Pols

Analysis of the occurrence of laughter in meetings
Kornel Laskowski, Susanne Burger

Incremental perception of acted and real emotional speech
Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts

Speaking through a noisy channel - experiments on inducing clarification behaviour in human-human dialogue
David Schlangen, Raquel Fernández

Computerized chironomy: evaluation of hand-controlled intonation reiteration
Christophe D'Alessandro, Albert Rilliard, Sylvain Le Beux

Resource Acquisition and Preparation; Resource and System Evaluation

JAAE: the Java abstract annotation editor
Ivan Habernal, Miloslav Konopík

How to judge reusability of existing speech corpora for target task by utilizing statistical multidimensional scaling
Goshu Nagino, Makoto Shozakai, Kiyohiro Shikano

Feasibility of constructing an expressive speech corpus from television soap opera dialogue
Peter Rutten

Collection of empirical data for standardization of generic vocabularies in speech driven ICT devices and services
Rosemary Orr, Bernat González i Llinares, Françoise Petersen, Helge Hüttenrauch, Martin Böcker, Michael Tate

Acoustic-phonetic features for refining the explicit speech segmentation
Antonio Marcos Selmini, Fábio Violaro

Text island spotting in large speech databases
B. Lecouteux, Georges Linarès, Frédéric Beaugendre, Pascal Nocera

People Watcher: a game for eliciting human-transcribed data for automated directory assistance
Tim Paek, Yun-Cheng Ju, Christopher Meek

The effect of speech interface accuracy on driving performance
Andrew Kun, Tim Paek, Zeljko Medenica

Context constrained-generalized posterior probability for verifying phone transcriptions
Hua Zhang, Lijuan Wang, Frank K. Soong, Wenju Liu

Getting start with UTDrive: driver-behavior modeling and assessment of distraction for in-vehicle speech systems
Pongtep Angkititrakul, DongGu Kwak, SangJo Choi, JeongHee Kim, Anh PhucPhan, Amardeep Sathyanarayana, John H. L. Hansen

Relative evaluation of informativeness in machine generated summaries
BalaKrishna Kolluru, Yoshihiko Gotoh

A method for evaluating task-oriented spoken dialog translation systems based on communication efficiency
Toshiyuki Takezawa, Masahide Mizushima, Tohru Shimizu, Genichiro Kikui

Using eye movements for online evaluation of speech synthesis
Charlotte van Hooijdonk, Edwin Commandeur, Reinier Cozijn, Emiel Krahmer, Erwin Marsi

Sentence level intelligibility evaluation for Mandarin text-to-speech systems using semantically unpredictable sentences
Jian Li, Dmitry Sityaev, Jie Hao

N-Best: the Northern- and Southern-Dutch benchmark evaluation of speech recognition technology
Judith Kessens, David A. van Leeuwen

A MAP based approach to adaptive speech intelligibility measurements
Trym Holter, Svein Sørsdal

Phone boundary detection using selective refinements and context-dependent acoustic features
Sirinoot Boonsuk, Proadpran Punyabukkana, Atiwong Suchato

ASR: New Paradigms

Modeling context and language variation for non-native speech recognition
Tien-Ping Tan, Laurent Besacier

An evaluation of cross-language adaptation and native speech training for rapid HMM construction based on very limited training data
Xufang Zhao, Douglas O'Shaughnessy

Never-ending learning with dynamic hidden Markov network
Konstantin Markov, Satoshi Nakamura

Building multiple complementary systems using directed decision trees
C. Breslin, M. J. F. Gales

Automatic speech recognition framework for multilingual audio contents
Hiroaki Nanjo, Yuichi Oku, Takehiko Yoshimi

Combined acoustic and pronunciation modelling for non-native speech recognition
G. Bouselmi, Dominique Fohr, I. Illina

Automatic estimation of scaling factors among probabilistic models in speech recognition
Tadashi Emori, Yoshifumi Onishi, Koichi Shinoda

Memory efficient modeling of polyphone context with weighted finite-state transducers
Emilian Stoimenov, John McDonough

Extra large vocabulary continuous speech recognition algorithm based on information retrieval
Valeriy Pylypenko

PocketSUMMIT: small-footprint continuous speech recognition
I. Lee Hetherington

Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task
Tobias Cincarek, Izumi Shindo, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

A study on word detector design and knowledge-based pruning and rescoring
Chengyuan Ma, Chin-Hui Lee

Parameter tuning for fast speech recognition
Thomas Colthurst, Tresi Arvizo, Chia-Lin Kao, Owen Kimball, Stephen A. Lowe, David R. H. Miller, Jim Van Sciver

A computational model for unsupervised word discovery
Louis ten Bosch, Bert Cranen

Phoneme confusions in human and automatic speech recognition
Bernd T. Meyer, Matthias Wächter, Thomas Brand, Birger Kollmeier

Construction of spoken language model including fillers using filler prediction model
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa

Attention shift decoding for conversational speech recognition
Raghunandan Kumaran, Jeff Bilmes, Katrin Kirchhoff

Speech and Language Technology for Less-resourced Languages

A morpho-graphemic approach for the recognition of spontaneous speech in agglutinative languages - like Hungarian
Péter Mihajlik, Tibor Fegyó, Zoltán Tüske, Pavel Ircing

A semi-supervised learning approach for morpheme segmentation for an Arabic dialect
Mei Yang, Jing Zheng, Andreas Kathol

Accelerating the annotation of lexical data for less-resourced languages
Gerhard B. van Huyssteen, Martin J. Puttkammer

On web-based creation of speech resources for less-resourced languages
Christoph Draxler

Building an information retrieval system for Serbian - challenges and solutions
Miroslav Martinović, Srdjan Vesić, Goran Rakić

Bootstrapping morphological analysis of Gĩkũyũ using unsupervised maximum entropy learning
Guy De Pauw, Peter Waiganjo Wagacha

The VoiceTRAN machine translation system
Jerneja Žganec Gros, Stanislav Gruden

MuLAS: a framework for automatically building multi-tier corpora
Sérgio Paulo, Luís C. Oliveira

Creating multimedia dictionaries of endangered languages using LEXUS
Jacquelijn Ringersma, Marc Kemps-Snijders

IceNLP: a natural language processing toolkit for Icelandic
Hrafn Loftsson, Eiríkur Rögnvaldsson

Phonotactic spoken language identification with limited training data
Marius Peche, Marelie Davel, Etienne Barnard

Automatic speech recognition for an under-resourced language - Amharic
Solomon Teferra Abate, Wolfgang Menzel

Information retrieval strategies for accessing African audio corpora
Abdillahi Nimaan, Pascal Nocera, Frédéric Béchet, Jean-François Bonastre

Morfessor and VariKN machine learning tools for speech and language technology
Vesa Siivola, Mathias Creutz, Mikko Kurimo

Towards better language modeling for Thai LVCSR
Markpong Jongtaveesataporn, Issara Thienlikit, Chai Wutiwiwatchai, Sadaoki Furui

Spoken Language Understanding

Generative and discriminative algorithms for spoken language understanding
Christian Raymond, Giuseppe Riccardi

A soft-clustering algorithm for automatic induction of semantic classes
Elias Iosif, Alexandros Potamianos

Classification of discourse functions of affirmative words in spoken dialogue
Agustín Gravano, Stefan Benus, Julia Hirschberg, Shira Mitchell, Ilia Vovsha

Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy
Bogdan Minescu, Géraldine Damnati, Frédéric Béchet, Renato De Mori

Speaker adaptation of language models for automatic dialog act segmentation of meetings
Jáchym Kolář, Yang Liu, Elizabeth Shriberg

Unsupervised categorisation approaches for technical support automated agents
Amparo Albalate, Dimitar Dimitrov, Roberto Pieraccini

Pitch Extraction I, II

Joint position-pitch extraction from multichannel audio
Michael Wohlmayr, Marián Képesi

Morphological pre-processing technique and its applications on speech signal
Hyun Soo Kim

A pitch extraction system based on phase locked loops and consensus decision
Patricia A. Pelle, Claudio F. Estienne

A robust multi-phase pitch-mark detection algorithm
Milan Legát, Jindřich Matoušek, Daniel Tihelka

Pitch estimation of noisy speech signals using empirical mode decomposition
Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan

Evaluating two versions of the Momel pitch modelling algorithm on a corpus of read speech in Korean
Daniel Hirst, Hyongsil Cho, Sunhee Kim, Hyunji Yu

Hybrid electroglottograph and speech signal based algorithm for pitch marking
Hussein Hussein, Oliver Jokisch

A fine pitch model for speech
Jasha Droppo, Alex Acero

Pitch period estimation using multipulse model and wavelet transform
Prasanta Kumar Ghosh, Antonio Ortega, Shrikanth S. Narayanan

Combining rate and place information for robust pitch extraction
Martin Heckmann, Frank Joublin, Christian Goerick

Integrating pitch and localisation cues at a speech fragment level
Heidi Christensen, Ning Ma, Stuart N. Wrigley, Jon Barker

Speech fundamental frequency estimation using the alternate comb
Jean-Sylvain Liénard, François Signol, Claude Barras

Detecting pitch accent using pitch-corrected energy-based predictors
Andrew Rosenberg, Julia Hirschberg

Speech Coding and Transmission

Normalized two stage SVQ for minimum complexity wide-band LSF quantization
Saikat Chatterjee, T. V. Sreenivas

A novel 2kb/s waveform interpolation speech coder based on non-negative matrix factorization
Peng Zhang, Chang-chun Bao

A novel energy distribution comparison approach for robust speech spectrum vector quantization
Ahmed Ismail, Yasser Dakroury, Hazem Abbas

Novel low-band phase representation for low bit-rate speech coding
Ahmed Ismail, Yasser Dakroury, Hazem Abbas

Perceptual-based playout mechanisms for multi-stream voice over IP networks
Chun-Feng Wu, Cheng-Lung Lee, Wen-Whei Chang

Time-warping and re-phasing in packet loss concealment
Robert Zopf, Jes Thyssen, Juin-Hwey Chen

The harmonic model codec (HMC) framework for VoIP
Yannis Agiomyrgiannakis, Yannis Stylianou

Bit-erasure channel decoding for GMM-based multiple description coding
Yannis Agiomyrgiannakis, Yannis Stylianou

Degradation-classification assisted single-ended quality measurement of speech
Hua Yuan, Tiago H. Falk, Wai-Yip Chan

Concept and evaluation of a downward-compatible system for spatial teleconferencing using automatic speaker clustering
Alexander Raake, Sascha Spors, Jens Ahrens, Jitendra Ajmera

Speech quality estimation using packet loss effects in CELP-type speech coders
Min-Ki Lee, Kyung-Tae Kim, Hong-Goo Kang, Dae Hee Youn

An 8-32 kbit/s scalable wideband coder extended with MDCT-based bandwidth extension on top of a 6.8 kbit/s narrowband CELP coder
Masahiro Oshikiri, Hiroyuki Ehara, Toshiyuki Morii, Tomofumi Yamanashi, Kaoru Satoh, Koji Yoshida

Topics in Acoustic Modeling

Comparison of HMM and DTW methods in automatic recognition of pathological phoneme pronunciation
Robert Wielgat, Tomasz P. Zieliński, Paweł Świętojański, Piotr Żołądź, Daniel Król, Tomasz Woźniak, Stanisław Grabias

Unsupervised training with directed manual transcription for recognising Mandarin broadcast audio
K. Yu, M. J. F. Gales, P. C. Woodland

Context dependent syllable acoustic model for continuous Chinese speech recognition
Hao Wu, Xihong Wu

A sub-optimal Viterbi-like search for linear dynamic models classification
Dimitris Oikonomidis, Vassilis Diakoloukas, Vassilis Digalakis

On the equivalence of Gaussian HMM and Gaussian HMM-like hidden conditional random fields
Georg Heigold, Ralf Schlüter, Hermann Ney

Speeding-up neural network training using sentence and frame selection
Stefano Scanzio, Pietro Laface, Roberto Gemello, Franco Mana

Using a small development set to build a robust dialectal Chinese speech recognizer
Linquan Liu, Thomas Fang Zheng, Makoto Akabane, Ruxin Chen, Wenhu Wu

Confidence Measures (and Related Topics)

Unsupervised re-scoring of observation probability in Viterbi based on reinforcement learning by using confidence measure and HMM neighborhood
Carlos Molina, Nestor Becerra Yoma, Fernando Huenupán, Claudio Garreton

Optimization on decoding graphs by discriminative training
Shiuan-Sung Lin, François Yvon

Morphosyntactic processing of n-best lists for improved recognition and confidence measure computation
Stéphane Huet, Guillaume Gravier, Pascale Sébillot

How predictable is ASR confidence in dialog applications?
Xiang Li, Juan M. Huerta

Error detection in confusion network
Alexandre Allauzen

An approach to efficient generation of high-accuracy and compact error-corrective models for speech recognition
Takanobu Oba, Takaaki Hori, Atsushi Nakamura

Detection of out-of-vocabulary words in posterior based ASR
Hamed Ketabdar, Mirko Hannemann, Hynek Hermansky

Grapheme-to-Phoneme Conversion

Homograph ambiguity resolution in front-end design for Portuguese TTS systems
Daniela Braga, Luís Coelho, Fernando Gil V. Resende

New word acquisition using subword modeling
Ghinwa F. Choueiter, Stephanie Seneff, James Glass

Language identification of person names using CF-IOF based weighing function
Samuel Thomas, Ashish Verma

G2P conversion of names: what can we do (better)?
Henk van den Heuvel, Jean-Pierre Martens, Nanneke Konings

A learning method for Thai phonetization of English words
Ausdang Thangthai, Chai Wutiwiwatchai, Anocha Ragchatjaroen, Sittipong Saychum

Spontaneous speech synthesis by pronunciation variant selection - a comparison to natural speech
Steffen Werner, Rüdiger Hoffmann

A generic methodology of converting transliterated text to phonetic strings - case study: Greeklish
Nikos Tsourakis, Vassilis Digalakis

Probabilistic deduction of symbol mappings for extension of lexicons
Rita Singh, Evandro B. Gouvêa, Bhiksha Raj

Lexical and Prosodic Modeling

Use of syllable center detection for improved duration modeling in Chinese Mandarin connected digits recognition
Sergey Astrov, Joachim Hofer, Harald Höge

Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language
Thomas Pellegrini, Lori Lamel

Robust F0 modeling for Mandarin speech recognition in noise
Sheng Qiang, Yao Qian, Frank K. Soong, Congfu Xu

Word duration modeling for word graph rescoring in LVCSR
Dino Seppi, Daniele Falavigna, Georg Stemmer, Roberto Gretter

On automatic prominence detection for German
Fabio Tamburini, Petra Wagner

Prosody-enriched lattices for improved syllable recognition
Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan

Exploiting phoneme similarities in hybrid HMM-ANN keyword spotting
Joel Pinto, Andrew Lovitt, Hynek Hermansky

Online vocabulary adaptation using limited adaptation data
C. E. Liu, K. Thambiratnam, F. Seide

Speech Recognition by Automatic Attribute Transcription

An overview on automatic speech attribute transcription (ASAT)
Chin-Hui Lee, Mark A. Clements, Sorin Dusan, Eric Fosler-Lussier, Keith Johnson, Biing-Hwang Juang, Lawrence R. Rabiner

Detection-based ASR in the automatic speech attribute transcription project
Ilana Bromberg, Qian Qian, Jun Hou, Jinyu Li, Chengyuan Ma, Brett Matthews, Antonio Moreno-Daniel, Jeremy Morris, Sabato Marco Siniscalchi, Yu Tsao, Yu Wang

Attribute-based Mandarin speech recognition using conditional random fields
Chi-Yueh Lin, Hsiao-Chuan Wang

Comparing classifiers for pronunciation error detection
Helmer Strik, Khiet P. Truong, Febe de Wet, Catia Cucchiarini

Using prosodic and spectral characteristics for sleepiness detection
Jarek Krajewski, Bernd Kröger

Score fusion for articulatory feature detection
Brian M. Ore, Raymond E. Slyh

Speaker Diarization

Improved location features for meeting speaker diarization
Scott Otterson

A robust stopping criterion for agglomerative hierarchical clustering in a speaker diarization system
Kyu J. Han, Shrikanth S. Narayanan

The blame game: performance analysis of speaker diarization system components
Marijn Huijbregts, Chuck Wooters

Trainable speaker diarization
Hagai Aronowitz

Improving speaker diarization for CHIL lecture meetings
Jing Huang, Etienne Marcheret, Karthik Visweswariah

Speaker diarization using normalized cross likelihood ratio
Viet-Bac Le, Odile Mella, Dominique Fohr

First and Second Language Learning

Tone production by the speakers of different age-and-gender groups
Wai-Sum Lee

Vowels and tones in infant directed speech: hyperarticulation for both, but different developmental patterns
Nan Xu, Denis Burnham, Christine Kitamura

Acquisition of vowel duration in children speaking American English
Eon-Suk Ko

F₀ models show Chinese speakers of Japanese insert intonational boundaries and drop pitch
Hiroko Hirano, Keikichi Hirose, Goh Kawai, Wentao Gu, Nobuaki Minematsu

Formal modelling of L1 and L2 perceptual learning: computational linguistics versus machine learning
Paola Escudero, Jelle Kastelein, Klara Weiand, R. J. J. H. van Son

Kettle hinders cat, shadow does not hinder shed: activation of ‘almost embedded’ words in nonnative listening
Mirjam Broersma

Speech Synthesis I, II

An HMM-based speech synthesis system applied to German and its adaptation to a limited set of expressive football announcements
Sacha Krstulović, Anna Hunecke, Marc Schröder

Statistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems
Liang Gu, Wei Zhang, Lazkin Tahir, Yuqing Gao

A pair-based language model for the robust lexical analysis in Chinese text-to-speech synthesis
Wu Liu, Dezhi Huang, Yuan Dong, Xinnian Mao, Haila Wang

A trainable excitation model for HMM-based speech synthesis
R. Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda

Cross-language phonemisation in German text-to-speech synthesis
Jochen Steigner, Marc Schröder

Preliminary experiments toward automatic generation of new TTS voices from recorded speech alone
Ryuki Tachibana, Tohru Nagano, Gakuto Kurata, Masafumi Nishimura, Noboru Babaguchi

Implementation and evaluation of an HMM-based Thai speech synthesis system
Suphattharachai Chomphan, Takao Kobayashi

Speech synthesis enhancement in noisy environments
Davide Bonardo, Enrico Zovato

Tagging syllable boundaries with joint n-gram models
Helmut Schmid, Bernd Möbius, Julia Weidenkaff

Hierarchical non-uniform unit selection based on prosodic structure
Jun Xu, Dezhi Huang, Yongxin Wang, Yuan Dong, Lianhong Cai, Haila Wang

Control of an articulatory speech synthesizer based on dynamic approximation of spatial articulatory targets
Peter Birkholz

A preselection method based on cost degradation from the optimal sequence for concatenative speech synthesis
Nobuyuki Nishizawa, Hisashi Kawai

Line cepstral quefrencies and their use for acoustic inventory coding
Guntram Strecha, Matthias Eichner, Rüdiger Hoffmann

Articulatory acoustic feature applications in speech synthesis
Peter Cahill, Daniel Aioanei, Julie Carson-Berndsen

Approaches for adaptive database reduction for text-to-speech synthesis
Aleksandra Krul, Géraldine Damnati, François Yvon, Cédric Boidin, Thierry Moudenc

Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts
Richard Tzong-Han Tsai, Hsi-Chuan Hung, Hong-Jie Dai, Wen-Lian Hsu

On the role of spectral dynamics in unit selection speech synthesis
Barry Kirkpatrick, Darragh O'Brien, Ronán Scaife, Andrew Errity

uGloss: a framework for improving spoken language generation understandability
Brian Langner, Alan W. Black

Combination of LSF and pole based parameter interpolation for model-based diphone concatenation
Karl Schnell, Arild Lacroix

Automatic building of synthetic voices from large multi-paragraph speech databases
Kishore Prahallad, Arthur R. Toth, Alan W. Black

Automatic phonetic segmentation of Spanish emotional speech
A. Gallardo-Antolín, R. Barra, Marc Schröder, Sacha Krstulović, J. M. Montero

Iterative unit selection with unnatural prosody detection
Dacheng Lin, Yong Zhao, Frank K. Soong, Min Chu, Jieyu Zhao

Voice Conversion and Modification

F0 transformation within the voice conversion framework
Zdeněk Hanzlíček, Jindřich Matoušek

Weighted frequency warping for voice conversion
Daniel Erro, Asunción Moreno

Frame alignment method for cross-lingual voice conversion
Daniel Erro, Asunción Moreno

Voicing level control with application in voice conversion
Jani Nurminen, Jilei Tian, Victor Popa

New algorithm for LPC residual estimation from LSF vectors for a voice conversion system
Winston S. Percybrooks, Elliot Moore

Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Improving the phase vocoder approach to pitch-shifting
Petko N. Petkov, W. Bastiaan Kleijn

Comparing GMM-based speech transformation systems
Larbi Mesbahi, Vincent Barreaud, Olivier Boeffard

Improved Acoustic Modeling for ASR

Improved HMM/SVM methods for automatic phoneme segmentation
Jen-Wei Kuo, Hung-Yi Lo, Hsin-Min Wang

Gaussian mixture optimization for HMM based on efficient cross-validation
Takahiro Shinozaki, Tatsuya Kawahara

Model-space MLLR for trajectory HMMs
Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda

In-context phone posteriors as complementary features for tandem ASR
Hamed Ketabdar, Hervé Bourlard

Phone-discriminating minimum classification error (p-MCE) training for phonetic recognition
Qian Qian, Xiaodong He, Li Deng

Improved acoustic modeling for transcribing Arabic broadcast data
Lori Lamel, Abdel. Messaoudi, Jean-Luc Gauvain

String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task
Erik McDermott, Atsushi Nakamura

Discriminative noise adaptive training approach for an environment migration
Byung-Ok Kang, Ho-Young Jung, Yun-Keun Lee

Word confusability - measuring hidden Markov model similarity
Jia-Yu Chen, Peder A. Olsen, John R. Hershey

Speech recognition with state-based nearest neighbour classifiers
Thomas Deselaers, Georg Heigold, Hermann Ney

HMM-based speech recognition using decision trees instead of GMMs
Remco Teunen, Masami Akamine

An improved method for unsupervised training of LVCSR systems
Christian Gollan, Stefan Hahn, Ralf Schlüter, Hermann Ney

A variational approach to robust maximum likelihood estimation for speech recognition
Mohamed Kamal Omar

Generating small, accurate acoustic models with a modified Bayesian information criterion
Kai Yu, Rob A. Rutenbar

Sparse Gaussian graphical models for speech recognition
Peter Bell, Simon King

An HMM acoustic model incorporating various additional knowledge sources
Sakriani Sakti, Konstantin Markov, Satoshi Nakamura

Comparison of subspace methods for Gaussian mixture models in speech recognition
Matti Varjokallio, Mikko Kurimo

Multilingualism in Speech and Language Processing

SPICE: web-based tools for rapid language adaptation in speech processing systems
Tanja Schultz, Alan W. Black, Sameer Badaskar, Matthew Hornyak, John Kominek

Introduction to multilingual corpus-based concatenative speech synthesis
Filip Deprez, Jan Odijk, Jan De Moortel

Recognition of foreign names spoken by native speakers
Frederik Stouten, Jean-Pierre Martens

Language identification using several sources of information with a multiple-Gaussian classifier
R. Cordoba, L. F. D'Haro, F. Fernandez-Martinez, J. M. Montero, R. Barra

Dynamic language change in MIMUS
Carmen Del Solar, Guillermo Pérez, Eva Florencio, David Moral, Gabriel Amores, Pilar Manchón

Systems for LVCSR and Rich Transcription I, II

The RWTH 2007 TC-STAR evaluation system for European English and Spanish
Jonas Lööf, Christian Gollan, Stefan Hahn, Georg Heigold, B. Hoffmeister, Christian Plahl, David Rybach, Ralf Schlüter, Hermann Ney

Using direction of arrival estimate and acoustic feature information in speaker diarization
Eugene Chin Wei Koh, Hanwu Sun, Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma, Eng Siong Chng, Haizhou Li, Susanto Rahardja

Recovering punctuation marks for automatic speech recognition
Fernando Batista, Diamantino Caseiro, Nuno Mamede, Isabel Trancoso

Disfluency correction of spontaneous speech using conditional random fields with variable-length features
Jui-Feng Yeh, Chung-Hsien Wu, Wei-Yen Wu

Detection, diarization, and transcription of far-field lecture speech
Jing Huang, Etienne Marcheret, Karthik Visweswariah, Vit Libal, Gerasimos Potamianos

Speech-based annotation and retrieval of digital photographs
Timothy J. Hazen, Brennan Sherry, Mark Adler

Co-training using prosodic and lexical information for sentence segmentation
Umit Guz, Sébastien Cuendet, Dilek Hakkani-Tür, Gokhan Tur

Extracting true speaker identities from transcriptions
Yannick Estève, Sylvain Meignier, Paul Deléglise, Julie Mauclair

An improved speaker diarization system
Rong Fu, Ian D. Benest

The ISL 2007 English speech transcription system for European Parliament speeches
Sebastian Stüker, Christian Fügen, Florian Kraft, Matthias Wölfel

Advances in Mandarin broadcast speech recognition
Mei-Yuh Hwang, Wen Wang, Xin Lei, Jing Zheng, Ozgur Cetin, Gang Peng

Automatic transcription for a Web 2.0 service to search podcasts
Jun Ogata, Masataka Goto, Kouichirou Eto

Language Learning and Assessment

A text-free approach to assessing nonnative intonation
Joseph Tepperman, Abe Kazemzadeh, Shrikanth S. Narayanan

Automatic generation of cloze items for prepositions
John Lee, Stephanie Seneff

Evaluating and optimizing Japanese tutor system featuring dynamic question generation and interactive guidance
Christopher Waple, Hongcui Wang, Tatsuya Kawahara, Yasushi Tsubota, Masatake Dantsuji

ASR-based pronunciation training: scoring accuracy and pedagogical effectiveness of a system for Dutch L2 learners
Catia Cucchiarini, Ambra Neri, Febe de Wet, Helmer Strik

A Bayesian network classifier for word-level reading assessment
Joseph Tepperman, Matthew Black, Patti Price, Sungbok Lee, Abe Kazemzadeh, Matteo Gerosa, Margaret Heritage, Abeer Alwan, Shrikanth S. Narayanan

Multimodal Interaction: Analysis and Technology

Behavior models for learning and receptionist dialogs
Hartwig Holzapfel, Alex Waibel

Design of a rich multimodal interface for mobile spoken route guidance
Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen, Aleksi Melto, Topi Hurtig

The virtual guide: a direction giving embodied conversational agent
Mariët Theune, Dennis Hofs, Marco van Kessel

Creating spoken dialogue characters from corpora without annotations
Sudeep Gandhe, David Traum

Complementarity and redundancy in multimodal user inputs with speech and pen gestures
Pui-Yu Hui, Zhengyu Zhou, Helen Meng

Children's convergence in referring expressions to graphical objects in a speech-enabled computer game
Linda Bell, Joakim Gustafson

Emotion

An analysis of individual differences in the f₀ contour and the duration of anger utterances at several degrees
Hiromi Kawatsu, Sumio Ohno

Acoustic features of anger utterances during natural dialog
Yoshiko Arimoto, Sumio Ohno, Hitoshi Iida

Comparing American and Palestinian perceptions of charisma using acoustic-prosodic and lexical analysis
Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg, Wisam Dakka

Using neutral speech models for emotional speech analysis
Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan

Emotion clustering using the results of subjective opinion tests for emotion recognition in infants' cries
N. Satoh, K. Yamauchi, S. Matsunaga, M. Yamashita, R. Nakagawa, K. Shinohara

On the limitations of voice conversion techniques in emotion identification tasks
R. Barra, J. M. Montero, J. Macias-Guarasa, J. Gutiérrez-Arriola, J. Ferreiros, J. M. Pardo

Use of lexical and affective prosodic cues to emotion by younger and older adults
Kate Dupuis, Kathleen Pichora-Fuller

Two-stream emotion recognition for call center monitoring
Purnima Gupta, Nitendra Rajput

The role of intonation and voice quality in the affective speech perception
Ioulia Grichkovtsova, Anne Lacheret, Michel Morel

Combining frame and turn-level information for robust recognition of emotions within speech
Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll

Speakers: Expression, Emotion and Personality Recognition

The relevance of feature type for the automatic classification of emotional user states: low level descriptors and functionals
Björn Schuller, Anton Batliner, Dino Seppi, Stefan Steidl, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loic Kessous, Vered Aharonson

Automatic question detection: prosodic-lexical features and crosslingual experiments
Vũ Minh Quang, Laurent Besacier, Eric Castelli

Performance evaluation of HMM-based style classification with a small amount of training data
Makoto Tachibana, Keigo Kawashima, Junichi Yamagishi, Takao Kobayashi

Visualizing acoustic similarities between emotions in speech: an acoustic map of emotions
Khiet P. Truong, David A. van Leeuwen

Fusion of global statistical and segmental spectral features for speech emotion recognition
Hao Hu, Ming-Xing Xu, Wei Wu

Group delay features for emotion detection
Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps

Combining short-term cepstral and long-term pitch features for automatic recognition of speaker age
Christian Müller, Felix Burkhardt

Detecting deception using critical segments
Frank Enos, Elizabeth Shriberg, Martin Graciarena, Julia Hirschberg, Andreas Stolcke

Style estimation of speech based on multiple regression hidden semi-Markov model
Takashi Nose, Yoichi Kato, Takao Kobayashi

Analysis and classification of speech mode: whispered through shouted
Chi Zhang, John H. L. Hansen

First Language, Second Language, Cross-language

Perception and production of word-final alveolar stops by Brazilian Portuguese learners of English
Melissa Bettoni-Techio, Andréia S. Rauber, Rosana Denise Koerich

The relationship between the perception and production of English nasal codas by Brazilian learners of English
Denise Cristina Kluge, Andréia S. Rauber, Mara Silvia Reis, Ricardo A. Hoffmann Bion

CALL courseware for learning reactive tokens in face-to-face dialogs
Takafumi Utashiro, Goh Kawai

The developmental analysis of demonstrative expression skills utilizing a multimodal infant behavior corpus
Shinya Kiriyama, Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Naofumi Otani, Hiroaki Horiuchi, Yoichi Takebayashi, Shigeyoshi Kitazawa

Russian vowels system acoustic features development in ontogenesis
Elena E. Lyakso, Olga V. Frolova

The role of metrical stress in comprehension and production in Dutch children at-risk of dyslexia
Petra van Alphen, Elise de Bree, Paula Fikkert, Frank Wijnen

A statistical method of evaluating pronunciation proficiency for presentation in English
Seiichi Nakagawa, Kei Ohta

The intelligibility and its relations to acoustic characteristics of English /s/ and /ʃ/ produced by native speakers of Japanese
Akiyo Joto, Yoshiki Nagase, Seiya Funatsu

The limits of multidimensional category learning
Martijn Goudbeek, Daniel Swingley, Keith R. Kluender

Mobile adaptive CALL (MAC): a lightweight speech-based intervention for mobile language learners
Maria Uther, James Uther, Panos Athanasopoulos, Pushpendra Singh, Reiko Akahane-Yamada

English and French speakers' perception of voicing distinctions in non-native lateral consonant syllable onsets
Catherine T. Best, Pierre A. Hallé, Jennifer S. Pardo

Predicting the consequences of vocalizations in early infancy
Francisco Lacerda, Lisa Gustavsson

Learning tone distinctions for Mandarin Chinese
David Weenink, Guangqin Chen, Zongyan Chen, Stefan de Konink, Dennis Vierkant, Eveline van Hagen, R. J. J. H. van Son

Perception of disfluency: language differences and listener bias
Catherine Lai, Kyle Gorman, Jiahong Yuan, Mark Liberman

Novel Techniques for the NATO Non-native Air-traffic Control and HIWIRE Cockpit Databases

Design and characterization of the non-native military air traffic communications database (nnMATC)
Stephane Pigeon, Wade Shen, Aaron Lawson, David A. van Leeuwen

A comparison of speaker clustering and speech recognition techniques for air situational awareness
Wade Shen, Douglas Reynolds

Advanced front-end for robust speech recognition in extremely adverse environments
Dimitrios Dimitriadis, Jose C. Segura, Luz Garcia, Alexandros Potamianos, Petros Maragos, Vassilis Pitsikalis

Experiments on HIWIRE database using denoising and adaptation with a hybrid HMM-ANN model
Roberto Gemello, Franco Mana, Stefano Scanzio

Detection and removal of switching noise in push-to-talk and voice operated exchange communications systems
Brett Y. Smolenski

Evaluation of the combined use of MEMLIN and MLLR on the non-native adaptation task of HIWIRE project database
Luis Buera, Antonio Miguel, Óscar Saz, Eduardo Lleida, Alfonso Ortega

Systems for Spoken Language Translation I, II

Improved machine translation of speech-to-text outputs
Daniel Déchelotte, Holger Schwenk, Gilles Adda, Jean-Luc Gauvain

Improvements in machine translation for English/Iraqi speech translation
S. Saleem, K. Subramanian, R. Prasad, David Stallard, Chia-Lin Kao, P. Natarajan, R. Suleiman

Improving speech translation with automatic boundary prediction
Evgeny Matusov, Dustin Hillard, Mathew Magimai-Doss, Dilek Hakkani-Tür, Mari Ostendorf, Hermann Ney

Punctuating confusion networks for speech translation
Roldano Cattoni, Nicola Bertoldi, Marcello Federico

Integration of ASR and machine translation models in a document translation task
Aarthi Reddy, Richard Rose, Alain Désilets

Bilingual LSA-based translation lexicon adaptation for spoken language translation
Yik-Cheung Tam, Tanja Schultz

The BBN 2007 displayless English/Iraqi speech-to-speech translation system
David Stallard, Fred Choi, Chia-Lin Kao, Kriste Krstovski, P. Natarajan, R. Prasad, S. Saleem, K. Subramanian

Context dependent word modeling for statistical machine translation using part-of-speech tags
Ruhi Sarikaya, Yonggang Deng, Yuqing Gao

Translating conversational speech to standard linguistic form
Darren Scott Appling, Nick Campbell

Using inter-lingual triggers for machine translation
Caroline Lavecchia, Kamel Smaïli, David Langlois, Jean-Paul Haton

The IRST English-Spanish translation system for European Parliament speeches
Daniele Falavigna, Nicola Bertoldi, Fabio Brugnara, Roldano Cattoni, Mauro Cettolo, Boxing Chen, Marcello Federico, Diego Giuliani, Roberto Gretter, Deepa Gupta, Dino Seppi

The influence of utterance chunking on machine translation performance
Christian Fügen, Muntsin Kolss

IraqComm: a next generation translation system
Kristin Precoda, Jing Zheng, Dimitra Vergyri, Horacio Franco, Colleen Richey, Andreas Kathol, Sachin Kajarekar

Optimizing sentence segmentation for spoken language translation
Sharath Rao, Ian Lane, Tanja Schultz

Articulatory Features

A multitask learning perspective on acoustic-articulatory inversion
Korin Richmond

A comparison of acoustic features for articulatory inversion
Chao Qin, Miguel Á. Carreira-Perpiñán

Can unquantised articulatory feature continuums be modelled?
Odette Scharenborg, Vincent Wan

Estimation of place of articulation in stop consonants for visual feedback
Milind S. Shah, Prem C. Pandey

Compact representations of the articulatory-to-acoustic mapping
Blaise Potard, Yves Laprie

Articulatory feature classifiers trained on 2000 hours of telephone speech
Joe Frankel, Mathew Magimai-Doss, Simon King, Karen Livescu, Özgür Çetin

Wideband Speech Processing

Objective analysis of the effect of memory inclusion on bandwidth extension of narrowband speech
Amr H. Nour-Eldin, Peter Kabal

Artificial bandwidth extension without side information for ITU-T G.729.1
Bernd Geiser, Hervé Taddei, Peter Vary

The effect of highband harmonic structure in the artificial bandwidth expansion of telephone speech
Hannu Pulakka, Paavo Alku, Laura Laaksonen, Päivi Valve

Artificial bandwidth extension for speech signals using speech recognition
Shingo Kuroiwa, Masashi Takashina, Satoru Tsuge, Ren Fuji

Voicing-based codebook in low-rate wideband CELP coding
Driss Guerchi, Tamer Rabie, Abdelrhani Louzi

Performance of speaker-dependent wideband speech coding
Ethan R. Duni, Bhaskar D. Rao

Accessibility Issues

Speech recognition techniques for a sign language recognition system
Philippe Dreuw, David Rybach, Thomas Deselaers, Morteza Zahedi, Hermann Ney

Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees
Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Design and development of voice controlled aids for motor-handicapped persons
Petr Cerva, Jan Nouza

Management of static/dynamic properties in a multimodal interaction system
Kouichi Katsurada, Yuji Okuma, Makoto Yano, Yurie Iribe, Tsuneo Nitta

Evaluation of alternatives on speech to sign language translation
R. San-Segundo, A. Pérez, D. Ortiz, L. F. D'Haro, M. Inés Torres, F. Casacuberta

Speech based drug information system for aged and visually impaired persons
Géza Németh, Gábor Olaszy, Mátyás Bartalis, Géza Kiss, Csaba Zainkó, Péter Mihajlik

Automatic speech recognition with a cochlear implant front-end
Waldo Nogueira, Tamás Harczos, Bernd Edler, Jörn Ostermann, Andreas Büchner

Voice activated powered wheelchair with non-voice rejection algorithm
Soo-Young Suk, Hiroaki Kojima

Phonetic based sentence level rewriting of questions typed by dyslexic spellers in an information retrieval context
Laurianne Sitbon, Patrice Bellot, Philippe Blache

New Application Areas

How to integrate speech-operated internet information dialogs into a car
André Berton, Peter Regel-Brietzmann, Hans-Ulrich Block, Stefanie Schachtl, Manfred Gehrke

Recent progress in the MIT spoken lecture processing project
James Glass, Timothy J. Hazen, Scott Cyphers, Igor Malioutov, David Huynh, Regina Barzilay

How to personalize speech applications for web-based information in a car
Philipp Fischer, Andreas Österle, André Berton, Peter Regel-Brietzmann

Topic estimation with domain extensibility for guiding user's out-of-grammar utterances in multi-domain spoken dialogue systems
Satoshi Ikeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Prosody change and response timing analysis in spontaneously spoken dialogs and their modeling in a spoken dialog system
Ryota Nishimura, Norihide Kitaoka, Seiichi Nakagawa

GEMSIS - a novel application of speech recognition to emergency and disaster medicine
Satoshi Tamura, Kunihiko Takamatsu, Shinji Ogura, Satoru Hayamizu

Application of speech technology in a home based assessment kiosk for early detection of Alzheimer's disease
Rachel Coulston, Esther Klabbers, Jacques de Villiers, John-Paul Hosom

Ontology-based multimodal high level fusion involving natural language analysis for aged people home care application
Olga Vybornova, Monica Gemo, Ronald Moncarey, Benoit Macq

Story Segmentation

Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation
Shing-kai Chan, Lei Xie, Helen Meng

Cross-linguistic analysis of prosodic features for sentence segmentation
James G. Fung, Dilek Hakkani-Tür, Mathew Magimai-Doss, Elizabeth Shriberg, Sébastien Cuendet, Nikki Mirghafori

Varying input segmentation for story boundary detection in English, Arabic and Mandarin broadcast news
Andrew Rosenberg, Mehrbod Sharifi, Julia Hirschberg

Speaker role based structural classification of broadcast news stories
BalaKrishna Kolluru, Yoshihiko Gotoh

Prosody: Production

The influence of vowel quality features on peak alignment
Matthias Jilka, Bernd Möbius

Pitch accent versus lexical stress: quantifying acoustic measures related to the voice source
Yen-Liang Shue, Markus Iseli, Nanette Veilleux, Abeer Alwan

Prosody, emotions, and… ‘whatever’
Stefan Benus, Agustín Gravano, Julia Hirschberg

Modeling tones in Hakka on the basis of the command-response model
Wentao Gu, Rerrario Shui-Ching Ho, Tan Lee

Length, ordering preference and intonational phrasing: evidence from pauses
Gerrit Kentner

Alignment of the second low target in Dutch falling-rising pitch contours
Jörg Peters, Judith Hanssen, Carlos Gussenhoven

On filled-pauses and prolongations in European Portuguese
Helena Moniz, Ana Isabel Mata, M. Céu Viana

Prosody: Perception

Dependence of tone perception on syllable perception
Michael Olsberg, Yi Xu, Jeremy Green

Testing the relevance of speech rate, pitch and a glottal chink for the perception of age in synthesized speech using formant synthesis
Ralf Winkler

Utterance-final glottalization as a cue for familiar speaker recognition
Tamás Böhm, Stefanie Shattuck-Hufnagel

A rule-based speech morphing for verifying an expressive speech perception model
Chun-Fang Huang, Masato Akagi

On the importance of pure prosody in the perception of speaker identity
Elina E. Helander, Jani Nurminen

Perceptual relevance of pitch contours of Mandarin tones and its efficacy in prosody generation of speech synthesis
Shi-Han Chen, Chih-Chung Kuo

The effect of filled pauses in a lecture speech on impressive evaluation of listeners
Hiromitsu Nishizaki, Mitsuhiro Sohmiya, Kenji Kobayashi, Yoshihiro Sekiguchi

Perceptual equivalence of approximated Cantonese tone contours
Yujia Li, Tan Lee

Audiovisual emotional speech of game playing children: effects of age and culture
Suleman Shahid, Emiel Krahmer, Marc Swerts

Machine Learning for Spoken Dialog Systems

Machine learning for spoken dialogue systems
Oliver Lemon, Olivier Pietquin

Learning dialogue strategies for interactive database search
Verena Rieser, Oliver Lemon

Hierarchical dialogue optimization using semi-Markov decision processes
Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, Hiroshi Shimodaira

Knowledge consistent user simulations for dialog systems
Hua Ai, Diane J. Litman

Reducing recognition error rate based on context relationships among dialogue turns
Hsu-Chih Wu, Stephanie Seneff

Bayes risk-based optimization of dialogue management for document retrieval system with speech interface
Teruhisa Misu, Tatsuya Kawahara

Phonetics

Realisations and alternations in German /r/-realisation
Christiane Ulbrich, Horst Ulbrich

Singleton and geminate stops in Finnish - acoustic correlates
Christopher S. Doty, Kaori Idemaru, Susan G. Guion

Segment deletion in spontaneous speech: a corpus study using mixed effects models with crossed random effects
Christophe Van Bael, Harald Baayen, Helmer Strik

Categorical perception of Cantonese tones in context: a cross-linguistic study
Hongying Zheng, Peter W. M. Tsang, William S. -Y. Wang

A corpus study of the 3rd tone sandhi in standard Chinese
Yiya Chen, Jiahong Yuan

Age-related changes in fundamental frequency and formants: a longitudinal study of four speakers
Jonathan Harrington, Sallyanne Palethorpe, Catherine I. Watson

Spoken Language Understanding and Summarization

A comparative study on speech summarization of broadcast news and lecture speech
Jian Zhang, Ho Yin Chan, Pascale Fung, Lu Cao

Towards online speech summarization
Gabriel Murray, Steve Renals

System request detection in conversation based on acoustic and speaker alternation features
Tomoyuki Yamagata, Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki

Selecting on-topic sentences from natural language corpora
Michael Levit, Elizabeth Boschee, Marjorie Freedman

A semi-supervised method for efficient construction of statistical spoken language understanding resources
Seokhwan Kim, Minwoo Jeong, Gary Geunbae Lee

Automatic extraction of cue phrases for important sentences in lecture speech and automatic lecture speech summarization
Yasuhisa Fujii, Norihide Kitaoka, Seiichi Nakagawa

A unified probabilistic generative framework for extractive spoken document summarization
Yi-Ting Chen, Hsuan-Sheng Chiu, Hsin-Min Wang, Berlin Chen

Generic class-based statistical language models for robust speech understanding in directed dialog applications
Matthieu Hébert

Robust location understanding in spoken dialog systems using intersections
Michael L. Seltzer, Yun-Cheng Ju, Ivan Tashev, Alex Acero

Voice Activity Detection and Sound Classification

Speech-nonspeech discrimination using the information bottleneck method and spectro-temporal modulation index
Maria Markaki, Michael Wohlmayr, Yannis Stylianou

A uniformly most powerful test for statistical model-based voice activity detection
Keun Won Jang, Dong Kook Kim, Joon-Hyuk Chang

Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics
John Dines, Jithendra Vepa

Filtering the unknown: speech activity detection in heterogeneous video collections
Marijn Huijbregts, Chuck Wooters, Roeland Ordelman

Environmentally aware voice activity detector
Abhijeet Sangwan, Nitish Krishnamurthy, John H. L. Hansen

Noise robust voice activity detection based on switching Kalman filter
Masakiyo Fujimoto, Kentaro Ishizuka

Voice activity detection based on support vector machine using effective feature vectors
Q-Haing Jo, Yun-Sik Park, Kye-Hwan Lee, Ji-Hyun Song, Joon-Hyuk Chang

Voice activity detection in degraded speech using excitation source information
K Sri Rama Murty, B Yegnanarayana, S Guruprasad

Evaluation of real-time voice activity detection based on high order statistics
David Cournapeau, Tatsuya Kawahara

Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection
Yanmeng Guo, Qian Qian, Yonghong Yan

The influence of speech activity detection and overlap on speaker diarization for meeting room recordings
Corinne Fredouille, Nicholas Evans

Voice activity detection using the phase vector in microphone array
Gibak Kim, Nam Ik Cho

Adaptive weighting of microphone arrays for distant-talking F0 and voiced/unvoiced estimation
Federico Flego, Christian Zieger, Maurizio Omologo

Robust and high-resolution voiced/unvoiced classification in noisy speech using a signal smoothness criterion
A. Sreenivasa Murthy, S. Chandra Sekhar, T. V. Sreenivas

Audio classification using extended Baum-Welch transformations
Tara N. Sainath, Victor Zue, Dimitri Kanevsky

Automatic laughter detection using neural networks
Mary Tai Knox, Nikki Mirghafori

Automatic acoustic segmentation for speech recognition on broadcast recordings
Gang Peng, Mei-Yuh Hwang, Mari Ostendorf

Unreviewed Papers for Special Sessions

Articulatory synthesis of singing
Peter Birkholz

Vocal conversion from speaking voice to singing voice using STRAIGHT
Takeshi Saitou, Masataka Goto, Masashi Unoki, Masato Akagi

Speech to chant transformation with the phase vocoder
Axel Roebel, Joshua Fineberg

VOCALOID - commercial singing synthesizer based on sample concatenation
Hideki Kenmochi, Hayato Ohshita

RAMCESS/handsketch: a multi-representation framework for realtime and expressive singing synthesis
Nicolas D’Alessandro, Thierry Dutoit

Formant-based synthesis of singing
Sten Ternström, Johan Sundberg

ELAN: a free and open-source multimedia annotation tool
Han Sloetjes, Albert Russel, Alexander Klassmann

Speechindexer in action: managing endangered Formosan languages
Jozsef Szakos, Ulrike Glavitsch

A portable record player for wax cylinders using a laser-beam reflection method
Tohru Ifukube, Yasuyuki Shimizu

Interspeech 2007

Antwerp, Belgium, 27-31 August 2007

General Chairs: Dirk Van Compernolle, Lou Boves