ISCA Archive

Keynotes

Speaking more like you: entrainment in conversational speech
Julia Hirschberg

Neural representations of word meanings
Tom M. Mitchell

Signals and speech
Alex Pentland

Speaker Recognition - Modeling

Skew Gaussian mixture models for speaker recognition
Avi Matza, Yuval Bistritz

Towards goat detection in text-dependent speaker verification
Orith Toledo-Ronen, Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo

Speaker modeling using local binary decisions
Jean-François Bonastre, Xavier Anguera, Gabriel H. Sierra, Pierre-Michel Bousquet

New developments in voice biometrics for user authentication
Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo

Evaluation of i-vector speaker recognition systems for forensic application
Miranti Indar Mandasari, Mitchell McLaren, David A. van Leeuwen

Mixture of PLDA models in i-vector space for gender-independent speaker recognition
Mohammed Senoussaoui, Patrick Kenny, Niko Brümmer, Edward de Villiers, Pierre Dumouchel

Speech Perception - Speech Intelligibility

Segregation of whispered speech interleaved with noise or speech maskers
Nandini Iyer, Douglas S. Brungart, Brian D. Simpson

Monaural azimuth localization using spectral dynamics of speech
Roi Kliper, Hendrik Kayser, Daphna Weinshall, Israel Nelken, Jörn Anemüller

Prediction of binaural intelligibility level differences in reverberation
Jan Rennies, Thomas Brand, Birger Kollmeier

Let's all speak together! Exploring the impact of various languages on the comprehension of speech in multi-linguistic babble
Aurore Gautreau, Michel Hoen, Fanny Meunier

Cross-rate variation in the intelligibility of dual-rate gated speech in older listeners
Valeriy Shafiro, Stanley Sheft, Robert Risley

An efferent-inspired auditory model front-end for speech recognition
Chia-ying Lee, James Glass, Oded Ghitza

Speech Representation and Modelling

A long-term harmonic plus noise model for speech signals
Faten Ben Ali, Laurent Girin, Sonia Djaziri Larbi

A frequency domain approach to ARX-LF voiced speech parameterization and synthesis
Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle

Automatic data-driven learning of articulatory primitives from real-time MRI data using convolutive NMF with sparseness constraints
Vikram Ramanarayanan, Athanasios Katsamanis, Shrikanth Narayanan

Online pattern learning for non-negative convolutive sparse coding
Dong Wang, Ravichander Vipperla, Nicholas Evans

Sinewave representations of nonmodality
Nicolas Malyska, Thomas F. Quatieri, Robert Dunn

Time-varying signal adaptive transform and IHT recovery of compressive sensed speech
Ch. Srikanth Raj, T. V. Sreenivas

Emotion, Speaking Style, and Social Behavior

Acoustic-linguistic recognition of interest in speech with bottleneck-BLSTM nets
Martin Wöllmer, Felix Weninger, Florian Eyben, Björn Schuller

Automatic detection of anger in human-human call center dialogs
Mustafa Erden, Levent M. Arslan

Improved classification of speaking styles for mental health monitoring using phoneme dynamics
Keng-hao Chang, Howard Lei, John Canny

“You made me do it”: classification of blame in married couples' interactions by fusing automatically derived speech and language information
Matthew P. Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, Shrikanth Narayanan

Context and priming effects in the recognition of emotion of old and young listeners
Martijn Goudbeek, Marie Nilsenová

Acoustic and prosodic correlates of social behavior
Agustín Gravano, Rivka Levitan, Laura Willson, Štefan Beňuš, Julia Hirschberg, Ani Nenkova

HMM-Based Speech Synthesis I, II

Decision tree-based clustering with outlier detection for HMM-based speech synthesis
Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim

Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis
Hanna Silén, Elina Helander, Moncef Gabbouj

A perceptual expressivity modeling technique for speech synthesis based on multiple-regression HSMM
Takashi Nose, Takao Kobayashi

Multi-speaker modeling with shared prior distributions and model structures for Bayesian speech synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Feature-space transform tying in unified acoustic-articulatory modelling for articulatory control of HMM-based speech synthesis
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi

The effect of using normalized models in statistical speech synthesis
Matt Shannon, Heiga Zen, William Byrne

Continuous control of the degree of articulation in HMM-based speech synthesis
Benjamin Picart, Thomas Drugman, Thierry Dutoit

Estimation of window coefficients for dynamic feature extraction for HMM-based speech synthesis
Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai

Inverse filtering based harmonic plus noise excitation model for HMM-based speech synthesis
Zhengqi Wen, Jianhua Tao

Improved HNM-based vocoder for statistical synthesizers
Daniel Erro, Iñaki Sainz, Eva Navas, Inma Hernáez

A statistical phrase/accent model for intonation modeling
Gopala Krishna Anumanchipalli, Luís C. Oliveira, Alan W. Black

Intermediate-state HMMs to capture continuously-changing signal features
Gustav Eje Henter, W. Bastiaan Kleijn

Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality
Norbert Braunschweiler, Sabine Buchholz

Phonological knowledge guided HMM state mapping for cross-lingual speaker adaptation
Hui Liang, John Dines

Reformulating prosodic break model into segmental HMMs and information fusion
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet

Multipulse sequences for residual signal modeling
Ranniery Maia, Heiga Zen, Kate Knill, M. J. F. Gales, Sabine Buchholz

Can objective measures predict the intelligibility of modified HMM-based synthetic speech in noise?
Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King

Speech synthesis based on articulatory-movement HMMs with voice-source codebooks
Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada

Large-scale subjective evaluations of speech rate control methods for HMM-based speech synthesizers
Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda

HMM-based emphatic speech synthesis using unsupervised context labeling
Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

Speaker Recognition - Modeling, Automatic Procedures, Analysis I-III

Restoring the residual speaker information in total variability modeling for speaker verification
Ce Zhang, Rong Zheng, Bo Xu

New developments in joint factor analysis for speaker verification
Hagai Aronowitz, Oren Barkan

Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories
Joaquin Gonzalez-Rodriguez

Discriminatively trained i-vector extractor for speaker verification
Ondřej Glembek, Lukáš Burget, Niko Brümmer, Oldřich Plchot, Pavel Matějka

Constrained cepstral speaker recognition using matched UBM and JFA training
Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke

A new perspective on GMM subspace compensation based on PPCA and Wiener filtering
Alan McCree, Douglas Sturim, Douglas Reynolds

Data-driven Gaussian component selection for fast GMM-based speaker verification
Ce Zhang, Rong Zheng, Bo Xu

Analysis of i-vector length normalization in speaker recognition systems
Daniel Garcia-Romero, Carol Y. Espy-Wilson

An analysis framework based on random subspace sampling for speaker verification
Weiwu Jiang, Zhifeng Li, Helen Meng

Factor analysis back ends for MLLR transforms in speaker recognition
Nicolas Scheffer, Yun Lei, Luciana Ferrer

Report on performance results in the NIST 2010 speaker recognition evaluation
Craig S. Greenberg, Alvin F. Martin, Bradford N. Barr, George R. Doddington

iVector fusion of prosodic and cepstral features for speaker verification
Marcel Kockmann, Luciana Ferrer, Lukáš Burget, Jan Černocký

i-vector based speaker recognition on short utterances
Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan, Michael Mason

Study of overlapped speech detection for NIST SRE summed channel speaker recognition
Hanwu Sun, Bin Ma

Super-Dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification
Zhanyu Ma, Arne Leijon

Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation
Hon-Bill Yu, Man-Wai Mak

Eigen-voice based anchor modeling system for speaker identification using MLLR super-vector
A. K. Sarkar, S. Umesh

Automatic detection of speaker attributes based on utterance text
Wen Wang, Andreas Kathol, Harry Bratt

Comparison of speaker recognition approaches for real applications
Sandro Cumani, Pier Domenico Batzu, Daniele Colibro, Claudio Vair, Pietro Laface, Vasileios Vasilakakis

Modeling speaker personality using voice
Tim Polzehl, Sebastian Möller, Florian Metze

Structural joint factor analysis for speaker recognition
Marc Ferràs, Koichi Shinoda, Sadaoki Furui

Acoustic forest for SMAP-based speaker verification
Sangeeta Biswas, Marc Ferràs, Koichi Shinoda, Sadaoki Furui

Mixture of auto-associative neural networks for speaker verification
G. S. V. S. Sivaram, Samuel Thomas, Hynek Hermansky

Speech Perception - Perceptual Learning and Cross-Language Perception

Perceptual learning of liquids
Odette Scharenborg, Holger Mitterer, James M. McQueen

The efficiency of cross-dialectal word recognition
Annelie Tuinman, Holger Mitterer, Anne Cutler

Estimation of perceptual spaces for speaker identities based on the cross-lingual discrimination task
Minoru Tsuzaki, Keiichi Tokuda, Hisashi Kawai, Jinfu Ni

The relation between perception and production in L2 phonological processing
Sharon Peperkamp, Camillia Bouchon

The role of word-initial glottal stops in recognizing English words
Maria Paola Bissiri, Maria Luisa Garcia Lecumberri, Martin Cooke, Jan Volín

Effect of language experience on the categorical perception of Cantonese vowel duration
Caicai Zhang, Gang Peng, William S.-Y. Wang

Speech Analysis

Adaptive estimation of zeros of time-varying z-transforms
C. F. Pedersen, Ove Andersen, Paul Dalsgaard

Identifying regions of non-modal phonation using features of the wavelet transform
John Kane, Christer Gobl

Acoustic analysis of whispered speech for phoneme and speaker dependency
Xing Fan, Keith W. Godin, John H. L. Hansen

Multi-party speech recovery exploiting structured sparsity models
Afsaneh Asaei, Mohammad J. Taghizadeh, Hervé Bourlard, Volkan Cevher

Modulation spectrum analysis for recognition of reverberant speech
Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky

Discrete choice models for non-intrusive quality assessment
Petko N. Petkov, W. Bastiaan Kleijn, Bert de Vries

Speech Enhancement and Dereverberation

Single channel dereverberation using example-based speech enhancement with uncertainty decoding technique
Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani

A statistical room impulse response model with frequency dependent reverberation time for single-microphone late reverberation suppression
Jan S. Erkelens, Richard Heusdens

An assessment of the improvement potential of time-frequency masking for speech dereverberation
Chenxi Zheng, Tiago H. Falk, Wai-Yip Chan

Perceptual improvement of a two-stage algorithm for speech dereverberation
Thiago de M. Prego, Amaro A. de Lima, Sergio L. Netto

A model-based spectral envelope Wiener filter for perceptually motivated speech enhancement
Najib Hadir, Friedrich Faubel, Dietrich Klakow

Binaural noise-reduction method based on blind source separation and perceptual post processing
Jorge I. Marin-Hurtado, Devangi N. Parikh, David V. Anderson

ASR - Feature Extraction I, II

Region dependent transform on MLP features for speech recognition
Tim Ng, Bing Zhang, Spyros Matsoukas, Long Nguyen

Discriminant sub-space projection of spectro-temporal speech features based on maximizing mutual information
Martin Heckmann, Claudius Gläser

Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Combining frame and segment level processing via temporal pooling for phonetic classification
Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis

Improved bottleneck features using pretrained deep neural networks
Dong Yu, Michael L. Seltzer

Minimum classification error based spectro-temporal feature extraction for robust audio classification
Yuan-Fu Liao, Chia-Hsing Lin, We-Der Fang

Integrating recent MLP feature extraction techniques into TRAP architecture
František Grézl, Martin Karafiát

Feature frame stacking in RNN-based tandem ASR systems - learned vs. predefined context
Martin Wöllmer, Björn Schuller, Gerhard Rigoll

Improved acoustic feature combination for LVCSR by neural networks
Christian Plahl, Ralf Schlüter, Hermann Ney

Hierarchical tandem features for ASR in Mandarin
Joel Pinto, Mathew Magimai-Doss, Hervé Bourlard

Analysis and comparison of recent MLP features for LVCSR systems
Fabio Valente, Mathew Magimai-Doss, Wen Wang

Deep learning of speech features for improved phonetic recognition
Jaehyung Lee, Soo-Young Lee

Globality-locality consistent discriminant analysis for phone classification
Heyun Huang, Yang Liu, Jort F. Gemmeke, Louis ten Bosch, Bert Cranen, Lou Boves

Front-end compensation methods for LVCSR under Lombard effect
Hynek Bořil, František Grézl, John H. L. Hansen

Classification of fricatives using feature extrapolation of acoustic-phonetic features in telephone speech
Jung-Won Lee, Jeung-Yoon Choi, Hong-Goo Kang

Noise robust feature extraction based on extended weighted linear prediction in LVCSR
Sami Keronen, Jouni Pohjalainen, Paavo Alku, Mikko Kurimo

Comparing different flavors of spectro-temporal features for ASR
Bernd T. Meyer, Suman V. Ravuri, Marc René Schädler, Nelson Morgan

VTLN in the MFCC domain: band-limited versus local interpolation
Ehsan Variani, Thomas Schaaf

Multistream bandpass modulation features for robust speech recognition
Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali

An analysis of automatic speech recognition with multiple microphones
Davide Marino, Thomas Hain

Speech Production - Articulatory Measurements

Visualization of vocal tract shape using interleaved real-time MRI of multiple scan planes
Yoon-Chul Kim, Michael Proctor, Shrikanth Narayanan, Krishna S. Nayak

Biomechanical tongue models: an approach to studying inter-speaker variability
Ralf Winkler, Susanne Fuchs, Pascal Perrier, Mark Tiede

Quantifying articulatory distinctiveness of vowels
Jun Wang, Jordan R. Green, Ashok Samal, David B. Marx

Direct estimation of articulatory kinematics from real-time magnetic resonance image sequences
Michael Proctor, Adam Lammert, Athanasios Katsamanis, Louis Goldstein, Christina Hagedorn, Shrikanth Narayanan

Combined optical distance sensing and electropalatography to measure articulation
Peter Birkholz, Christiane Neuschaefer-Rube

Simulating post-L F0 bouncing by modeling articulatory dynamics
Santitham Prom-on, Yi Xu, Fang Liu

Acoustic Event Detection

Learning new acoustic events in an HMM-based system using MAP adaptation
Jürgen T. Geiger, Mohamed Anouar Lakhal, Björn Schuller, Gerhard Rigoll

Alternative frequency scale cepstral coefficient for robust sound event recognition
Yi Ren Leng, Huy Dat Tran, Norihide Kitaoka, Haizhou Li

Evaluation of abnormal sound detection using multi-stage GMM in various environments
Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino

Unsupervised learning of acoustic events using dynamic time warping and hierarchical k-means++ clustering
Joerg Schmalenstroeer, Markus Bartek, Reinhold Haeb-Umbach

Feature extraction assessment for an acoustic-event classification task using the entropy triangle
David Mejía-Navarrete, Ascensión Gallardo-Antolín, Carmen Peláez-Moreno, Francisco J. Valverde-Albacete

Unsupervised audio analysis for categorizing heterogeneous consumer domain videos
Pradeep Natarajan, Stavros Tsakalidis, Vasant Manohar, Rohit Prasad, Premkumar Natarajan

Speech Synthesis - Unit Selection and Hybrid approaches

Enriching text-to-speech synthesis using automatic dialog act tags
Vivek Kumar Rangarajan Sridhar, Ann Syrdal, Alistair D. Conkie, Srinivas Bangalore

Joint target and join cost weight training for unit selection synthesis
Lukas Latacz, Wesley Mattheyses, Werner Verhelst

Prominence-based prosody prediction for unit selection speech synthesis
Andreas Windmann, Igor Jauk, Fabio Tamburini, Petra Wagner

Evaluating the meaning of synthesized listener vocalizations
Sathish Pammi, Marc Schröder

A hybrid TTS approach for prosody and acoustic modules
Iñaki Sainz, Daniel Erro, Eva Navas, Inma Hernáez

Uniform speech parameterization for multi-form segment synthesis
Alexander Sorin, Slava Shechtman, Vincent Pollet

Speech Enhancement Analysis and Evaluation

Theoretical analysis of musical noise and speech distortion in structure-generalized parametric blind spatial subtraction array
Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano

Subjective and objective evaluation of speech intelligibility enhancement under constant energy and duration constraints
Yan Tang, Martin Cooke

A risk-estimation-based comparison of mean square error and Itakura-Saito distortion measures for speech enhancement
Nagarjuna Reddy Muraka, Chandra Sekhar Seelamantula

On noise tracking for noise floor estimation
Mahdi Triki

Maximum a posteriori estimation of noise from non-acoustic reference signals in very low signal-to-noise ratio environments
Ben Milner

Blind speech prior estimation for generalized minimum mean-square error short-time spectral amplitude estimator
Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani

Speaker Recognition - Analysis and Statistics I-III

Harmonic structure transform for speaker recognition
Kornel Laskowski, Qin Jin

Combining evidence from spectral and source-like features for person recognition from humming
Hemant A. Patil, Maulik C. Madhavi, Keshab K. Parhi

Improvements in speaker characterization using spectral subband energy based on harmonic plus noise model
Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Li-Rong Dai, Wu Guo

Implicit segmentation in two-wire speaker recognition
Yosef A. Solewicz, Hagai Aronowitz

Boosting speaker recognition performance with compact representations
Sibel Yaman, Jason Pelecanos, Mohamed Kamal Omar

Partitioning of two-speaker conversation datasets
Carlos Vaquero, Alfonso Ortega, Eduardo Lleida

Intersession compensation and scoring methods in the i-vectors space for speaker recognition
Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre

Kernel alignment maximization for speaker recognition based on high-level features
Szymon Drgas, Adam Dabrowski

Kernel partial least squares for speaker recognition
Balaji Vasan Srinivasan, Daniel Garcia-Romero, Dmitry N. Zotkin, Ramani Duraiswami

Conversational-side-specific inter-session variability compensation
Mohamed Kamal Omar, Jason Pelecanos

A speaker line-up for the likelihood ratio
David A. van Leeuwen, Niko Brümmer

Towards fully Bayesian speaker recognition: integrating out the between-speaker covariance
Jesús Villalba, Niko Brümmer

Variational Bayesian model selection for GMM-speaker verification using universal background model
Timur Pekhovsky, Alexandra Lokhanova

To weight or not to weight: source-normalised LDA for speaker recognition using i-vectors
Mitchell McLaren, David A. van Leeuwen

Maximum entropy based data selection for speaker recognition
Chien-Lin Huang, Bin Ma

Addressing the data-imbalance problem in kernel-based speaker verification via utterance partitioning and speaker comparison
Wei Rao, Man-Wai Mak

Single-channel head orientation estimation based on discrimination of acoustic transfer function
Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki

Maximum likelihood i-vector space using PCA for speaker verification
Zhenchun Lei, Yingchun Yang

Speaker verification using sparse representations on total variability i-vectors
Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth Narayanan

Robust speaker recognition in non-stationary room environments based on empirical mode decomposition
Taufiq Hasan, John H. L. Hansen

Range based multi microphone array fusion for speaker activity detection in small meetings
Jani Even, Panikos Heracleous, Carlos T. Ishi, Norihiro Hagita

Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization
Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi

Regularized logistic regression fusion for speaker verification
Ville Hautamäki, Kong Aik Lee, Tomi Kinnunen, Bin Ma, Haizhou Li

A longest matching segment approach with Bayesian adaptation - application to noise-robust speaker recognition
Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming

Data selection with kurtosis and nasality features for speaker recognition
Howard Lei, Nikki Mirghafori

Use of the harmonic phase in speaker recognition
Inma Hernáez, Ibon Saratxaga, Jon Sanchez, Eva Navas, Iker Luengo

Speech Production - Coarticulation and Speech Timing

Jaw movement in vowels and liquids forming the syllable nucleus
Štefan Beňuš, Marianne Pouplier

Coarticulation across prosodic domains in Italian: an ultrasound investigation
Barbara Gili Fivela, Antonio Stella, Sonia D'Apolito, Francesco Sigona

Investigating the stability of intergestural timing relations
Juraj Šimko, Fred Cummins, Štefan Beňuš

Speech timing organization for the phonological length contrast in Italian consonants
Claudio Zmarich, Barbara Gili Fivela, Pascal Perrier, Christophe Savariaux, Graziano Tisato

Timing in Italian VNC sequences at different speech rates
Chiara Celata, Silvia Calamai

Automatic analysis of singleton and geminate consonant articulation using real-time magnetic resonance imaging
Christina Hagedorn, Michael Proctor, Louis Goldstein

Speech Segmentation

A two-stage sample-based phone boundary detector using segmental similarity features
Yih-Ru Wang

Iterative improvement of speaker segmentation in a noisy environment using high-level knowledge
Qiang Huang, Stephen J. Cox

Hierarchical audio segmentation with HMM and factor analysis in broadcast news domain
Diego Castán, Carlos Vaquero, Alfonso Ortega, David Martínez, Jesús Villalba, Eduardo Lleida

Syllable segmentation of continuous speech using auditory attention cues
Ozlem Kalinli

Exploiting phone-class specific landmarks for refinement of segment boundaries in TTS databases
Vijayaditya Peddinti, Kishore Prahallad

Phoneme-level text to audio synchronization on speech signals with background music
Agnès Pedone, Juan José Burred, Simon Maller, Pierre Leveau

ASR - Acoustic Models I-III

Conversational speech transcription using context-dependent deep neural networks
Frank Seide, Gang Li, Dong Yu

Sequential classification criteria for NNs in automatic speech recognition
Guangsen Wang, Khe Chai Sim

Grapheme-based automatic speech recognition using KL-HMM
Mathew Magimai-Doss, Ramya Rasipuram, Guillermo Aradilla, Hervé Bourlard

Direct error rate minimization of hidden Markov models
Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David McAllester

On the effectiveness of statistical modeling based template matching approach for continuous speech recognition
Xie Sun, Xin Chen, Yunxin Zhao

Comparison of smoothing techniques for robust context dependent acoustic modelling in hybrid NN/HMM systems
Guangsen Wang, Khe Chai Sim

Generalized Baum-Welch algorithm and its implication to a new extended Baum-Welch algorithm
Roger Hsiao, Tanja Schultz

Word boundary modelling and full covariance Gaussians for Arabic speech-to-text systems
F. Diehl, M. J. F. Gales, X. Liu, M. Tomalin, P. C. Woodland

A fully automated derivation of state-based eigentriphones for triphone modeling with no tied states using regularization
Tom Ko, Brian Mak

Reducing computational complexities of exemplar-based sparse representations with applications to large vocabulary speech recognition
Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky

An i-vector based approach to training data clustering for improved speech recognition
Yu Zhang, Jian Xu, Zhi-Jie Yan, Qiang Huo

Rapid training of acoustic models using graphics processing unit
Senaka Buthpitiya, Ian Lane, Jike Chong

Semi-automatic acoustic model generation from large unsynchronized audio and text chunks
Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti

Unsupervised testing strategies for ASR
Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei

Acoustic model training with detecting transcription errors in the training data
Gakuto Kurata, Nobuyasu Itoh, Masafumi Nishimura

Towards unsupervised training of speaker independent acoustic models
Aren Jansen, Kenneth Church

Acoustic modeling with bootstrap and restructuring based on full covariance
Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou

An i-vector based approach to acoustic sniffing for irrelevant variability normalization based acoustic model training and speech recognition
Jian Xu, Yu Zhang, Zhi-Jie Yan, Qiang Huo

Log-linear optimization of second-order polynomial features with subsequent dimension reduction for speech recognition
Muhammad Ali Tahir, Ralf Schlüter, Hermann Ney

Genre categorization and modeling for broadcast speech transcription
Qingqing Zhang, Lori Lamel, Jean-Luc Gauvain

Individual error minimization learning framework and its applications to speech recognition and utterance verification
Sunghwan Shin, Ho-Young Jung, Biing-Hwang Juang

Effective triphone mapping for acoustic modeling in speech recognition
Sakhia Darjaa, Miloš Cerňak, Marián Trnka, Milan Rusko, Róbert Sabo

Analysis of dialectal influence in pan-Arabic ASR
Udhyakumar Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaaf, Tanja Schultz

Connected digit recognition by means of reservoir computing
Azarakhsh Jalalvand, Fabian Triefenbach, David Verstraeten, Jean-Pierre Martens

Large margin - minimum classification error using sum of shifted sigmoids as the loss function
Madhavi V. Ratnagiri, Biing-Hwang Juang, Lawrence Rabiner

Representing phonological features through a two-level finite state model
Javier M. Olaso, M. Inés Torres, Raquel Justo

Optimization of the Gaussian mixture model evaluation on GPU
Jan Vaněk, Jan Trmal, Josef V. Psutka, Josef Psutka

Robust Speech Recognition I-III

Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition
Ramón Fernandez Astudillo, João Paulo da Silva Neto

Mapping sparse representation to state likelihoods in noise-robust automatic speech recognition
Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort F. Gemmeke

Uncertainty measures for improving exemplar-based source separation
Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki

Maximum confidence measure based interaural phase difference estimation for noise masking in dual-microphone robust speech recognition
Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee

A performance monitoring approach to fusing enhanced spectrogram channels in robust speech recognition
Shirin Badiezadegan, Richard Rose

Generalized variable parameter HMMs for noise robust speech recognition
Ning Cheng, X. Liu, Lan Wang

Sinusoidal approach for the single-channel speech separation and recognition challenge
P. Mowlaee, R. Saeidi, Zheng-Hua Tan, M. G. Christensen, Tomi Kinnunen, P. Fränti, S. H. Jensen

Semi-supervised single-channel speech-music separation for automatic speech recognition
Cemil Demir, A. Taylan Cemgil, Murat Saraçlar

A level-dependent auditory filter-bank for speech recognition in reverberant environments
HariKrishna Maganti, Marco Matassoni

A multichannel feature-based processing for robust speech recognition
Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani

Feature normalization using structured full transforms for robust speech recognition
Xiong Xiao, Jinyu Li, Eng Siong Chng, Haizhou Li

A robust estimation method of noise mixture model for noise suppression
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

A versatile Gaussian splitting approach to non-linear state estimation and its application to noise-robust ASR
Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach

Generalized-log spectral mean normalization for speech recognition
Hilman F. Pardede, Koichi Shinoda

Zero-crossing-based channel attentive weighting of cepstral features for robust speech recognition: the ETRI 2011 CHiME challenge system
Young-Ik Kim, Hoon-Young Cho, Sang-Hun Kim

Feature compensation for speech recognition in severely adverse environments due to background noise and channel distortion
Wooil Kim, John H. L. Hansen

Binaural cues for fragment-based speech recognition in reverberant multisource environments
Ning Ma, Jon Barker, Heidi Christensen, Phil D. Green

Sub-band level histogram equalization for robust speech recognition
Vikas Joshi, Raghavendra Bilgi, S. Umesh, L. Garcia, C. Benitez

GMM-based missing-feature reconstruction on multi-frame windows
Ulpu Remes, Yoshihiko Nankaku, Keiichi Tokuda

Improvements of a dual-input DBN for noise robust ASR
Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves

Denoising using optimized wavelet filtering for automatic speech recognition
Randy Gomez, Tatsuya Kawahara

Noise robust speaker-independent speech recognition with invariant-integration features using power-bias subtraction
Florian Müller, Alfred Mertins

Physiology and Pathology of Spoken Language

Novel VTEO based mel cepstral features for classification of normal and pathological voices
Hemant A. Patil, Pallavi N. Baljekar

Temporal performance of dysarthric patients in speech and tapping tasks
Eiji Shimura, Kazuhiko Kakehi

A comparative acoustic study on speech of glossectomy patients and normal subjects
Xinhui Zhou, Maureen Stone, Carol Y. Espy-Wilson

Dysperiodicity analysis of perceptually assessed synthetic speech stimuli
Ali Alpan, Francis Grenez, Jean Schoentgen

Is the perception of voice quality language-dependent? A comparison of French and Italian listeners and dysphonic speakers
Alain Ghio, Frédérique Weisz, Giovanna Baracca, Giovanna Cantarella, Danièle Robert, Virginie Woisard, Franco Fussi, Antoine Giovanni

Automatic selection of acoustic and non-linear dynamic features in voice signals for hypernasality detection
J. R. Orozco-Arroyave, S. Murillo-Rendón, A. M. Álvarez-Meza, J. D. Arias-Londoño, E. Delgado-Trejos, J. F. Vargas-Bonilla, C. G. Castellanos-Domínguez

ASR - Lexical, Prosodic and Multi-Lingual Models

Learning from mistakes: expanding pronunciation lexicons using word recognition errors
Sravana Reddy, Evandro Gouvêa

Improving non-native ASR through stochastic multilingual phoneme space transformations
David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, Mathew Magimai-Doss

Unsupervised Arabic dialect adaptation with self-training
Scott Novotney, Rich Schwartz, Sanjeev Khudanpur

Template-based automatic speech recognition meets prosody
Dino Seppi, Kris Demuynck, Dirk Van Compernolle

Pronunciation learning from continuous speech
Ibrahim Badr, Ian McGraw, James Glass

State-level data borrowing for low-resource speech recognition based on subspace GMMs
Yanmin Qian, Daniel Povey, Jia Liu

Source Separation

Blind speech separation in multiple environments using a frequency oriented PCA method for convolutive mixtures
Y. Benabderrahmane, Sid-Ahmed Selouani, Douglas O'Shaughnessy

Blind speech separation in time-domain using block-toeplitz structure of reconstructed signal matrices
Zbyněk Koldovský, Jiří Málek, Petr Tichavský

Generalized method for solving the permutation problem in frequency-domain blind source separation of convolved speech signals
Auxiliadora Sarmiento, Iván Durán, Sergio Cruces, Pablo Aguilera

Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation
Emad M. Grais, Hakan Erdogan

An informed source separation system for speech signals
Shuhua Zhang, Laurent Girin

Adaptive blocking beamformer for speech separation
Ngoc Thuy Tran, William Cowley, André Pollok

Multimodal Signal Processing

Asynchronous multimodal text entry using speech and gesture keyboards
Per Ola Kristensson, Keith Vertanen

Robust bimodal person identification using face and speech with limited training data and corruption of both modalities
Niall McLaughlin, Ji Ming, Danny Crookes

Toward a multi-speaker visual articulatory feedback system
Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly

Statistical mapping between articulatory and acoustic data for an ultrasound-based silent speech interface
Thomas Hueber, Elie-Laurent Benaroya, Bruce Denby, Gérard Chollet

Unsupervised geometry calibration of acoustic sensor networks using source correspondences
Joerg Schmalenstroeer, Florian Jacob, Reinhold Haeb-Umbach, Marius H. Hennecke, Gernot A. Fink

Investigations on speaking mode discrepancies in EMG-based speech recognition
Michael Wand, Matthias Janke, Tanja Schultz

ASR - Language Models I, II

Empirical evaluation and combination of advanced language modeling techniques
Tomáš Mikolov, Anoop Deoras, Stefan Kombrink, Lukáš Burget, Jan Černocký

Personalizing Model M for voice-search
Geoffrey Zweig, Shuangyu Chang

Sentence selection by direct likelihood maximization for language model adaptation
Takahiro Shinozaki, Yu Kubota, Sadaoki Furui, Eiji Utsunomiya, Yasutaka Shindoh

Feature combination approaches for discriminative language models
Ebru Arısoy, Bhuvana Ramabhadran, Hong-Kwang Jeff Kuo

On-line language model biasing for multi-pass automatic speech recognition
Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Premkumar Natarajan

Mandarin word-character hybrid-input neural network language model
Moonyoung Kang, Tim Ng, Long Nguyen

Unary data structures for language models
Jeffrey Sorensen, Cyril Allauzen

Bayesian language model interpolation for mobile speech input
Cyril Allauzen, Michael Riley

On the estimation of discount parameters for language model smoothing
Martin Sundermeyer, Ralf Schlüter, Hermann Ney

N-grams for conditional random fields or a failure-transition(ϕ) posterior for acyclic FSTs
Patrick Lehnen, Stefan Hahn, Hermann Ney

Hybrid language models using mixed types of sub-lexical units for open vocabulary German LVCSR
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney

Morpheme based factored language models for German LVCSR
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney

Compound word recombination for German LVCSR
Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney

Lattice-based risk minimization training for unsupervised language model adaptation
Akio Kobayashi, Takahiro Oku, Shinichi Homma, Toru Imai, Seiichi Nakagawa

Similarity language model
Christian Gillot, Christophe Cerisara

Data sampling and dimensionality reduction approaches for reranking ASR outputs using discriminative language models
Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın

Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition
Ryo Masumura, Seongjun Hahm, Akinori Ito

Large vocabulary SOUL neural network language models
Hai-Son Le, Ilya Oparin, Abdel Messaoudi, Alexandre Allauzen, Jean-Luc Gauvain, François Yvon

Improved spoken query transcription using co-occurrence information
Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Vozila

Unsupervised latent speaker language modeling
Yik-Cheung Tam, Paul Vozila

Phonology and Phonetics

Laryngealization and breathiness in Persian
Vahid Sadeghi

Age-dependent differences in the neutralization of the intervocalic voicing contrast: evidence from an apparent-time study on East Franconian
Viola Müller, Jonathan Harrington, Felicitas Kleber, Ulrich Reubold

Comparing syllable frequencies in corpora of written and spoken language
Barbara Samlowski, Bernd Möbius, Petra Wagner

Sylli: automatic phonological syllabification for Italian
Luca Iacoponi, Renata Savy

A preliminary study on the production of signs in Brazilian Sign Language when one of the manual articulators is unavailable
André N. Xavier, Plínio A. Barbosa

Electroglottograph and acoustic cues for phonation contrasts in Taiwan Min falling tones
Ho-hsien Pan, Mao-Hsu Chen, Shao-Ren Lyu

Voice Conversion

One-to-many voice conversion based on tensor representation of speaker space
Daisuke Saito, Keisuke Yamamoto, Nobuaki Minematsu, Keikichi Hirose

A study on bag of Gaussian model with application to voice conversion
Yu Qiao, Tong Tong, Nobuaki Minematsu

A Bayesian approach to voice conversion based on GMMs using multiple model structures
Lei Li, Yoshihiko Nankaku, Keiichi Tokuda

Quality improvement of voice conversion systems based on trellis structured vector quantization
Mahdi Eslami, Hamid Sheikhzadeh, Abolghasem Sayadiyan

Voice conversion using GMM with enhanced global variance
Hadas Benisty, David Malah

Spectral envelope transformation using DFW and amplitude scaling for voice conversion with parallel or nonparallel corpora
Elizabeth Godoy, Olivier Rosec, Thierry Chonavel

Spoken Language Understanding

Multi-task learning for spoken language understanding with shared slots
Xiao Li, Ye-Yi Wang, Gokhan Tur

Learning weighted entity lists from web click logs for spoken language understanding
Dustin Hillard, Asli Celikyilmaz, Dilek Hakkani-Tür, Gokhan Tur

Bootstrapping domain detection using query click logs for new domains
Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Elizabeth Shriberg

Approximate inference for domain detection in spoken language understanding
Asli Celikyilmaz, Dilek Hakkani-Tür, Gokhan Tur

Speech indexing using semantic context inference
Chien-Lin Huang, Bin Ma, Haizhou Li, Chung-Hsien Wu

Automatically optimizing utterance classification performance without human in the loop
Yun-Cheng Ju, Jasha Droppo

Dialect and Accent Identification

In search of cues discriminating West-African accents in French
Philippe Boula de Mareüil, Jean-Luc Rouas, Manuela Yapomo

Computer and human recognition of regional accents of British English
Abualsoud Hanani, Martin Russell, Michael J. Carey

Target-aware lattice rescoring for dialect recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng

Effective Arabic dialect classification using diverse phonotactic models
Murat Akbacak, Dimitra Vergyri, Andreas Stolcke, Nicolas Scheffer, Arindam Mandal

Characterizing deletion transformations across dialects using a sophisticated tying mechanism
Nancy F. Chen, Wade Shen, Joseph P. Campbell

Dialect and accent recognition using phonetic-segmentation supervectors
Fadi Biadsy, Julia Hirschberg, Daniel P. W. Ellis

First Language Acquisition

The multi timescale phoneme acquisition model of the self-organizing based on the dynamic features
Kouki Miyazawa, Hideaki Miura, Hideaki Kikuchi, Reiko Mazuka

The time-course of talker-specificity effects for newly-learned pseudowords: evidence for a hybrid model of lexical representation
Helen Brown, M. Gareth Gaskell

A parametric approach to intonation acquisition research: validation on child-directed speech data
Britta Lintfert, Antje Schweitzer, Bernd Möbius

Modelling novelty preference in word learning
Maarten Versteegh, Louis ten Bosch, Lou Boves

Using imitation to learn infant-adult acoustic mappings
G. Ananthakrishnan, Giampiero Salvi

Thresholding word activations for response scoring - modelling psycholinguistic data
Christina Bergmann, Louis ten Bosch, Lou Boves

Spoken Dialogue Systems I, II

User study of spoken decision support system
Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hisashi Kawai, Satoshi Nakamura

Efficient probabilistic tracking of user goal and dialog history for spoken dialog systems
Antoine Raux, Yi Ma

Tackling a shilly-shally classifier for predicting task success in spoken dialogue interaction
Alexander Schmitt, Alexander Zgorzelski, Wolfgang Minker

Evaluation of listening-oriented dialogue control rules based on the analysis of HMMs
Toyomi Meguro, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka

Large-scale experiments on data-driven design of commercial spoken dialog systems
D. Suendermann, J. Liscombe, J. Bloom, G. Li, Roberto Pieraccini

Comparing system-driven and free dialogue in in-vehicle interaction
Fredrik Kronlid, Jessica Villing, Alexander Berman, Staffan Larsson

Optimizing situated dialogue management in unknown environments
Heriberto Cuayáhuitl, Nina Dethlefs

Acoustic-similarity based technique to improve concept recognition
Om D. Deshmukh, Shajith Ikbal, Ashish Verma, Etienne Marcheret

Dialog methods for improved alphanumeric string capture
Doug Peters, Peter Stubley

Detecting the status of a predictive incremental speech understanding model for real-time decision-making in a spoken dialogue system
David DeVault, Kenji Sagae, David Traum

User simulation in dialogue systems using inverse reinforcement learning
Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefèvre, Olivier Pietquin

Lossless value directed compression of complex user goal states for statistical spoken dialogue systems
Paul A. Crook, Oliver Lemon

Spoken Language Resources, Evaluation and Standardization I, II

Rapid evaluation of speech representations for spoken term discovery
Michael A. Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky

Phonemic similarity metrics to compare pronunciation methods
Ben Hixon, Eric Schneider, Susan L. Epstein

Investigating the effect of number of interlocutors on the quality of experience for multi-party audio conferencing
Janto Skowronek, Alexander Raake

On development of consistently punctuated speech corpora
Jáchym Kolář, Lori Lamel

A multimodal real-time MRI articulatory corpus for speech research
Shrikanth Narayanan, Erik Bresch, Prasanta Kumar Ghosh, Louis Goldstein, Athanasios Katsamanis, Yoon Kim, Adam Lammert, Michael Proctor, Vikram Ramanarayanan, Yinghua Zhu

Building an audio-visual corpus of Australian English: large corpus collection with an economical portable and replicable black box
Denis Burnham, Dominique Estival, Steven Fazio, Jette Viethen, Felicity Cox, Robert Dale, Steve Cassidy, Julien Epps, Roberto Togneri, Michael Wagner, Yuko Kinoshita, Roland Göcke, Joanne Arciuli, Marc Onslow, Trent Lewis, Andrew Butcher, John Hajek

Spoken Language Resources, Evaluation and Standardization I

Measurement of objective intelligibility of Japanese accented English using ERJ (English read by Japanese) database
Nobuaki Minematsu, Koji Okabe, Keisuke Ogaki, Keikichi Hirose

From single-call to multi-call quality: a study on long-term quality integration in audio-visual speech communication
Sebastian Möller, Chihuy Bang, Teele Tamme, Markus Vaalgamaa, Benjamin Weiss

Optimal selection of limited vocabulary speech corpora
Hui Lin, Jeff Bilmes

Open source multi-language audio database for spoken language processing applications
Stephen A. Zahorian, Jiang Wu, Montri Karnjanadecha, Chandra Sekhar Vootkuri, Brian Wong, Andrew Hwang, Eldar Tokhtamyshev

The USC CARE corpus: child-psychologist interactions of children with autism spectrum disorders
Matthew P. Black, Daniel Bone, Marian E. Williams, Phillip Gorrindo, Pat Levitt, Shrikanth Narayanan

Towards a versatile multi-layered description of speech corpora using algebraic relations
Nelly Barbot, Vincent Barreaud, Olivier Boëffard, Laure Charonnat, Arnaud Delhay, Sébastien Le Maguer, Damien Lolive

Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus
Korin Richmond, Phil Hoole, Simon King

A pitch tracking corpus with evaluation on multipitch tracking scenario
Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf

On building and evaluating a broadcast-news audio segmentation system
Taras Butko, Climent Nadeu

Time- and acoustic-mediated alignment algorithms for speech recognition evaluation
Simon Dobrišek, France Mihelič

Effects of shortening speech prompts of in-car voice user interfaces on users' mental models
Julia Niemann, Kati Schulz, Ina Wechsung

Speech transcript evaluation for information retrieval
Laurens van der Werff, Wessel Kraaij, Franciska de Jong

The Albayzin 2010 language recognition evaluation
Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, Germán Bordel

Progress and prospects for speech technology: results from three sexennial surveys
Roger K. Moore

Painless WFST cascade construction for LVCSR - transducersaurus
Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose

Language Identification

Data-driven UBM generation via tied Gaussians for GMM-supervector based accent identification
Rong Zheng, Ce Zhang, Bo Xu

I3A language recognition system for Albayzin 2010 LRE
David Martínez, Jesús Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

Dimensionality reduction for using high-order n-grams in SVM-based phonotactic language recognition
Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, Germán Bordel

Language recognition via i-vectors and dimensionality reduction
Najim Dehak, Pedro A. Torres-Carrasquillo, Douglas Reynolds, Reda Dehak

Language recognition in ivectors space
David Martínez, Oldřich Plchot, Lukáš Burget, Ondřej Glembek, Pavel Matějka

Second Language Acquisition, Development and Learning I, II

On mispronunciation lexicon generation using joint-sequence multigrams in computer-aided pronunciation training (CAPT)
Xiaojun Qian, Helen Meng, Frank K. Soong

Validating a second language perception model for classroom context - a longitudinal study within the perceptual assimilation model
Bianca Sisinni, Mirko Grimaldi

The role of variability in non-native perceptual learning of a Japanese geminate-singleton fricative contrast
Makiko Sadakata, James M. McQueen

Fluency changes with general progress in L2 proficiency
Jared Bernstein, Jian Cheng, Masanori Suzuki

Tongue gestures awareness and pronunciation training
Slim Ouni

Impact of speaker variability on speech perception in non-native listeners
Wim A. van Dommelen, Valerie Hazan

Acquisition of timing patterns in second language
Mikhail Ordin, Leona Polyanskaya, Christiane Ulbrich

Context-dependent duration modeling with backoff strategy and look-up tables for pronunciation assessment and mispronunciation detection
Hongyan Li, Shen Huang, Shijin Wang, Bo Xu

Perceptual training of vowel length contrast of Japanese by L2 listeners: effects of an isolated word versus a word embedded in sentences
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka

Similar vowels in L1/L2 production: confused or discerned in early L2 English learners with different amount of exposure
E-Chin Wu

Production and perception of Estonian vowels by native and non-native speakers
Lya Meister, Einar Meister

New feature parameters for pronunciation evaluation in English presentations at international conferences
Hiroshi Kibishi, Seiichi Nakagawa

Synchronous reading: learning French orthography by audiovisual training
Gérard Bailly, Will Barbour

Phoneme level non-native pronunciation analysis by an auditory model-based native assessment scheme
Christos Koniaris, Olov Engwall

The open front vowel /æ/ in the production and perception of Czech students of English
Pavel Šturm, Radek Skarnitzl

Error selection for ASR-based English pronunciation training in 'My Pronunciation Coach'
Catia Cucchiarini, Henk van den Heuvel, Eric Sanders, Helmer Strik

An experimental analysis of pitch patterns in Japanese speakers of English with verification by speech re-synthesis
Tomoko Nariai, Kazuyo Tanaka

An analysis of word duration in native speakers and Japanese speakers of English
Tomoko Nariai, Kazuyo Tanaka, Yoshiaki Ito

ASR - Search, Keyword Spotting and Confidence Measures I, II

A template based voice trigger system using Bhattacharyya edit distance
Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George

Acoustic look-ahead for more efficient decoding in LVCSR
D. Nolden, Ralf Schlüter, Hermann Ney

A new epsilon filter for efficient composition of weighted finite-state transducers
Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann

A bottom-up stepwise knowledge-integration approach to large vocabulary continuous speech recognition using weighted finite state machines
Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee

Combining information sources for confidence estimation with CRF models
M. S. Seigel, P. C. Woodland

Evaluation of fast spoken term detection using a suffix array
Kouichi Katsurada, Shinta Sawada, Shigeki Teshima, Yurie Iribe, Tsuneo Nitta

Event selection from phone posteriorgrams using matched filters
Keith Kintzley, Aren Jansen, Hynek Hermansky

A piecewise aggregate approximation lower-bound estimate for posteriorgram-based dynamic time warping
Yaodong Zhang, James Glass

OOV detection and recovery using hybrid models with different fragments
Long Qin, Ming Sun, Alexander Rudnicky

AUC optimization based confidence measure for keyword spotting
Haiyang Li, Jiqing Han, Tieran Zheng

An empirical study of multilingual spoken term detection
Zejun Ma, Xiaorui Wang, Bo Xu

Fusing multiple confidence measures for Chinese spoken term detection
Zejun Ma, Xiaorui Wang, Bo Xu

Response probability based decoding algorithm for large vocabulary continuous speech recognition
Zhanlei Yang, Hao Chao, Wenju Liu

Combining lattice-based language dependent and independent approaches for out-of-language detection in LVCSR
Yuxiang Shan, Yan Deng, Jia Liu

Evaluation of tree-trellis based decoding in over-million LVCSR
Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Lattice based discriminative model combination using automatically induced phonetic contexts
Hao Huang, Bing Hu Li

Predicting human perceived accuracy of ASR systems
Taniya Mishra, Andrej Ljolje, Mazin Gilbert

Cross-lingual study of ASR errors: on the role of the context in human perception of near-homophones
I. Vasilescu, D. Yahia, N. Snoeren, Martine Adda-Decker, Lori Lamel

Performance prediction of speech recognition using average-voice-based speech synthesis
Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato, Akio Horii

Confidence measures for Turkish call center conversations
Ali Haznedaroglu, Levent M. Arslan

Spoken document confidence estimation using contextual coherence
Taichi Asami, Narichika Nomoto, Satoshi Kobashikawa, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi

SLP for Information Extraction and Retrieval I, II

Latent topic modeling for audio corpus summarization
Timothy J. Hazen

Investigation of spontaneous speech characterization applied to speaker role recognition
Richard Dufour, Yannick Estève, Paul Deléglise

Zero-resource audio-only spoken term detection based on a combination of template matching techniques
Armando Muscariello, Guillaume Gravier, Frédéric Bimbot

Automatic learning in content indexing service using phonetic alignment
Yeon-Jun Kim, David C. Gibbon

Leveraging relevance cues for improved spoken document retrieval
Pei-Ning Chen, Kuan-Yu Chen, Berlin Chen

Spoken lecture summarization by random walk over a graph constructed with automatically extracted key terms
Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, Lin-shan Lee

Topic segmentation of TV-streams by mathematical morphology and vectorization
Vincent Claveau, Sébastien Lefèvre

Probabilistic latent semantic analysis for broadcast news story segmentation
Mimi Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li

Hybrid speech recognition for voice search: a comparative study
Evandro Gouvêa

A new phonetic candidate generator for improving search query efficiency
Bo Peng, Yao Qian, Frank K. Soong, Bo Zhang

Towards voice-input symbolic pattern retrieval using parameter-based search
Yukiko Suzuki, Kiyoaki Aikawa

A language independent approach to audio search
Vikram Gupta, Jitendra Ajmera, Arun Kumar, Ashish Verma

Speaker Diarization I, II

Speaker diarization using a priori acoustic information
Hagai Aronowitz

Improved overlapped speech handling for speaker diarization
Kofi Boakye, Oriol Vinyals, Gerald Friedland

Exploiting intra-conversation variability for speaker diarization
Stephen Shum, Najim Dehak, Ekapol Chuangsuwanich, Douglas Reynolds, James Glass

Speaker clustering based on non-negative matrix factorization
Masafumi Nishida, Seiichi Yamamoto

Information bottleneck features for HMM/GMM speaker diarization of meetings recordings
Sree Harsha Yella, Fabio Valente

Cross likelihood ratio based speaker clustering using eigenvoice models
D. Wang, Robbie Vogt, Sridha Sridharan, David Dean

Prosodic and phonetic features for speaker clustering in speaker diarization systems
Janez Žibert, France Mihelič

Diarization-based speaker retrieval for broadcast television archives
Marijn Huijbregts, David A. van Leeuwen

The detection of overlapping speech with prosodic features for speaker diarization
Martin Zelenák, Javier Hernando

LP residual features for robust, privacy-sensitive speaker diarization
Sree Hari Krishnan Parthasarathi, Hervé Bourlard, Daniel Gatica-Perez

Extending the task of diarization to speaker attribution
Houman Ghaemmaghami, David Dean, Robbie Vogt, Sridha Sridharan

Comparing multi-stage approaches for cross-show speaker diarization
Viet-Anh Tran, Viet Bac Le, Claude Barras, Lori Lamel

Prosody I, II

A quantitative investigation of the prosody of verum focus in Italian
Giuseppina Turco, Michele Gubian, Jessamyn Schertz

Effects of focus on f0 and duration in Irish (Gaelic) declaratives
Amelie Dorn, Ailbhe Ní Chasaide

The phonology and phonetics of perceived prosody: what do listeners imitate?
Jennifer Cole, Stefanie Shattuck-Hufnagel

Uncovering the effect of imitation on tonal patterns of French accentual phrases
Amandine Michelas, Noël Nguyen

Crossmodal prosodic and gestural contribution to the perception of contrastive focus
Pilar Prieto, Cecilia Pugliesi, Joan Borràs-Comes, Ernesto Arroyo, Josep Blat

Temporal relationship between auditory and visual prosodic cues
Erin Cvejic, Jeesun Kim, Chris Davis

Analysing the correspondence between automatic prosodic segmentation and syntactic structure
György Szaszák, Katalin Nagy, András Beke

Long-distance rhythmic dependencies and their application to automatic language identification
Joseph Tepperman, Emily Nava

Symbolic and direct sequential modeling of prosody for classification of speaking-style and nativeness
Andrew Rosenberg

Prosodic analysis and perception of Mandarin utterances conveying attitudes
Wentao Gu, Ting Zhang, Hiroya Fujisaki

Predicting Taiwan Mandarin tone shapes from their duration
Chierh Cheng, Michele Gubian

Variation of accent type and of context - influences on pragmatic focus interpretation
Charlotte Wollermann, Ulrich Schade, Bernhard Schröder

ASR - New Paradigms

New methods for template selection and compression in continuous speech recognition
Xie Sun, Yunxin Zhao

Structured support vector machines for noise robust continuous speech recognition
Shi-Xiong Zhang, M. J. F. Gales

Continuous digits recognition leveraging invariant structure
Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu

Convergence of line search A-function methods
Dimitri Kanevsky, David Nahamoo, Tara N. Sainath, Bhuvana Ramabhadran

Hidden boosted MMI and hierarchical state posterior feature for automatic speech recognition based on hidden conditional neural fields
Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa

Recognition and real time performances of a lightweight ultrasound based silent speech interface employing a language model
Jun Cai, Bruce Denby, Pierre Roussel, Gérard Dreyfus, Lise Crevier-Buchman

Adaptation for ASR

Model adaptation for automatic speech recognition based on multiple time scale evolution
Shinji Watanabe, Atsushi Nakamura, Biing-Hwang Juang

Integrated online speaker clustering and adaptation
C. Breslin, K. K. Chin, M. J. F. Gales, Kate Knill

A study on speaker normalized MLP features in LVCSR
Zoltán Tüske, Christian Plahl, Ralf Schlüter

Matrix-variate distribution of training models for robust speaker adaptation
Yongwon Jeong, Young Kuk Kim

Separating speaker and environmental variability using factored transforms
Michael L. Seltzer, Alex Acero

Your mobile virtual assistant just got smarter!
Mazin Gilbert, Iker Arizmendi, Enrico Bocchieri, Diamantino Caseiro, Vincent Goffin, Andrej Ljolje, Mike Phillips, Chao Wang, Jay Wilpon

Speech Enhancement

Evaluating artificial bandwidth extension by conversational tests in car using mobile devices with integrated hands-free functionality
Laura Laaksonen, Ville Myllylä, Riitta Niemistö

Low-frequency bandwidth extension of telephone speech using sinusoidal synthesis and Gaussian mixture model
Hannu Pulakka, Ulpu Remes, Santeri Yrttiaho, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku

Memory-based approximation of the Gaussian mixture model framework for bandwidth extension of narrowband speech
Amr H. Nour-Eldin, Peter Kabal

Speech enhancement by reconstruction from cleaned acoustic features
Philip Harding, Ben Milner

A soft decision-based speech enhancement using acoustic noise classification
Jae-Hun Choi, Sang-Kyun Kim, Joon-Hyuk Chang

A noise estimation method based on speech presence probability and spectral sparseness
Chao Li, Wenju Liu

Improved a posteriori speech presence probability estimation based on cepstro-temporal smoothing and time-frequency correlation
Chao Li, Wenju Liu

A rapid adaptation algorithm for tracking highly non-stationary noises based on Bayesian inference for on-line spectral change point detection
Md Foezur Rahman Chowdhury, Sid-Ahmed Selouani, Douglas O'Shaughnessy

Single channel speech enhancement using MMSE estimation of short-time modulation magnitude spectrum
Kuldip Paliwal, Belinda Schwerin, Kamil Wójcicki

Speech enhancement using masking properties in adverse environments
Atanu Saha, Tetsuya Shimamura

Phoneme-dependent NMF for speech enhancement in monaural mixtures
Bhiksha Raj, Rita Singh, Tuomas Virtanen

Kernel PCA for speech enhancement
Christina Leitner, Franz Pernkopf, Gernot Kubin

Objective intelligibility prediction of speech by combining correlation and distortion based techniques
Angel M. Gomez, Belinda Schwerin, Kuldip Paliwal

Spoken Dialogue & Spoken Language Understanding Systems

Multi-view approach for speaker turn role labeling in TV broadcast news shows
Géraldine Damnati, Delphine Charlet

Evaluation of an integrated authoring tool for building advanced question-answering characters
Sudeep Gandhe, Michael Rushforth, Priti Aggarwal, David Traum

Towards unsupervised spoken language understanding: exploiting query click logs for slot filling
Gokhan Tur, Dilek Hakkani-Tür, Dustin Hillard, Asli Celikyilmaz

Web-enhanced content retrieval for information access dialogue system
Donghyeon Lee, Cheongjae Lee, Minwoo Jeong, Kyungduk Kim, Seokhwan Kim, Junhwi Choi, Gary Geunbae Lee

Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system
Lucie Daubigney, Milica Gašić, Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin, Steve Young

Detection of task-incomplete dialogs based on utterance-and-behavior tag n-gram for spoken dialog systems
Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Shrinkage-based features for natural language call routing
Ruhi Sarikaya, Stanley F. Chen, Bhuvana Ramabhadran

Clustering with modified cosine distance learned from constraints
Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran

Using speaker ID to discover repeat callers of a spoken dialog system
Andrew Fandrianto, Brian Langner, Alan W. Black

Semantic graph clustering for POMDP-based spoken dialog systems
Florian Pinault, Fabrice Lefèvre

Learning place-names from spoken utterances and localization results by mobile robot
Ryo Taguchi, Yuji Yamada, Koosuke Hattori, Taizo Umezaki, Masahiro Hoguro, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano

Active learning for dialogue act classification
Björn Gambäck, Fredrik Olsson, Oscar Täckström

Speaker role recognition using question detection and characterization
Thierry Bazillon, Benjamin Maza, Michael Rouvier, Frederic Bechet, Alexis Nasr

Learning score structure from spoken language for a tennis game
Qiang Huang, Stephen J. Cox

Semi-automated classifier adaptation for natural language call routing
Silke M. Witt

Interactional style detection for versatile dialogue response using prosodic and semantic features
Wei-Bin Liang, Chung-Hsien Wu, Chih-Hung Wang, Jhing-Fa Wang

Quality aspects of multimodal dialog systems: identity, stimulation and success
Christine Kühnel, Benjamin Weiss, Matthias Schulz, Sebastian Möller

Prosodic Structure

Where should pitch accents and phrase breaks go? A syntax tree transducer solution
Joseph Tepperman, Emily Nava

Phrasal prominences do not need pitch movements: postfocal phrasal heads in Italian
Giuliano Bocci, Cinzia Avesani

Intonation of left dislocated topics in Modern Greek
David Le Gac, Hiyon Yoo

Phrases, pitch and perceived prominence in Māori
Laura Thompson, Catherine I. Watson, Ray Harlow, Jeanette King, Margaret Maclagan, Helen Charters, Peter Keegan

Perceptual sensitivity to prenuclear and nuclear intonational patterns
Tomáš Duběda

Tonal alignment defined: the case of Southern Irish English
Raya Kalaldeh

Using mutual information to identify regions of analysis for prosodic analysis
Andrew Rosenberg

Prosodic highlights in Mandarin continuous speech - cross-genre attributes and implications
Chiu-yu Tseng, Chao-yu Su, Chi-Feng Huang

When two newly-acquired words are one: new words differing in stress alone are not automatically represented differently
Simone Sulpizio, James M. McQueen

Automatic determination of the standard Chinese prosodic phrase boundaries by f0 generation model
Shehui Bu, Zhenjie Zhuo, Lingling Yang, Shuichi Itahashi

Measuring speakers' similarity in speech by means of prosodic cues: methods and potential
Céline De Looze, Stéphane Rauzy

Tonal variations in Mandarin: new evidence from spontaneous and read speech
Li-chiung Yang

Language Processing

Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news
Camille Guinaudeau, Julia Hirschberg

Morpheme conversion for connecting speech recognizer and language analyzers in unsegmented languages
Kenji Imamura, Tomoko Izumi, Kugatsu Sadamitsu, Kuniko Saito, Satoshi Kobashikawa, Hirokazu Masataki

Emotion detection based on concept inference and spoken sentence analysis for customer service
Ren-Ying Fang, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu

Commas recovery with syntactic features in French and in Czech
Christophe Cerisara, Pavel Král, Claire Gardent

Redundancy reduction in ASR of spontaneous speech through statistical machine translation
Daniele Falavigna

From interview to news text: a study of Taiwan TV political interviews in newspaper reports
Chin-Chih Chiang

Paralinguistic Information - Classification and Detection

On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation
Catharine Oertel, Stefan Scherer, Nick Campbell

Anger recognition in spoken dialog using linguistic and para-linguistic information
Narichika Nomoto, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

Recognition of personality traits from human spoken conversations
A. V. Ivanov, G. Riccardi, A. J. Sporka, J. Franc

Using multiple databases for training in emotion recognition: to unite or to vote?
Björn Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll

“Would you buy a car from me?” - on the likability of telephone voices
Felix Burkhardt, Björn Schuller, Benjamin Weiss, Felix Weninger

Automatic identification of salient acoustic instances in couples' behavioral interactions using diverse density support vector machines
James Gibson, Athanasios Katsamanis, Matthew P. Black, Shrikanth Narayanan

Predicting speaker changes and listener responses with and without eye-contact
Daniel Neiberg, Joakim Gustafson

Emotion classification using inter- and intra-subband energy variation
Senaka Amarakeerthi, Tin Lay Nwe, Liyanage C. De Silva, Michael Cohen

Emotion classification of infants' cries using duration ratios of acoustic segments
K. Kitahara, S. Michiwiki, M. Sato, S. Matsunaga, M. Yamashita, K. Shinohara

Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions
Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth

Intra-, inter-, and cross-cultural classification of vocal affect
Daniel Neiberg, Petri Laukka, Hillary Anger Elfenbein

Applications for Learning, Education, Aged and Handicapped Persons

Verifying human users in speech-based interactions
Sajad Shirali-Shahreza, Yashar Ganjali, Ravin Balakrishnan

Automatic assessment of prosody in high-stakes English tests
Jian Cheng

Improvement of segmental mispronunciation detection with prior knowledge extracted from large L2 speech corpus
Dean Luo, Xuesong Yang, Lan Wang

Off-topic detection in automated speech assessment applications
Jian Cheng, Jianqiang Shen

Towards context-dependent phonetic spelling error correction in children's freely composed text for diagnostic and pedagogical purposes
Sebastian Stüker, Johanna Fay, Kay Berkling

Factored translation models for improving a speech into sign language translation system
V. López-Ludeña, R. San-Segundo, R. Córdoba, J. Ferreiros, J. M. Montero, J. M. Pardo

Formant maps in Hungarian vowels - online data inventory for research, and education
Kálmán Abari, Zsuzsanna Zsófia Rácz, Gábor Olaszy

Automatic subtitling of the Basque Parliament plenary sessions videos
Germán Bordel, Silvia Nieto, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona

Generating animated pronunciation from speech through articulatory feature extraction
Yurie Iribe, Silasak Manosavanh, Kouichi Katsurada, Ryoko Hayashi, Chunyue Zhu, Tsuneo Nitta

A tale of two tasks: detecting children's off-task speech in a reading tutor
Wei Chen, Jack Mostow

Problems encountered by Japanese EL2 with English short vowels as illustrated on a 3D vowel chart
Toshiko Isei-Jaakkola, Takatoshi Naka, Keikichi Hirose

Automatic generation of listening comprehension learning material in European Portuguese
Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede

Candidate generation for ASR output error correction using a context-dependent syllable cluster-based confusion matrix
Chao-Hong Liu, Chung-Hsien Wu, David Sarwono, Jhing-Fa Wang

Semi-supervised tree support vector machine for online cough recognition
Thai Hoa Huynh, Vu An Tran, Huy Dat Tran

Source Separation and Speech Enhancement

Monaural voiced speech segregation based on pitch and comb filter
Xueliang Zhang, Wenju Liu

Fast and simple iterative algorithm of lp-norm minimization for under-determined speech separation
Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Monaural speech separation based on a 2D processing and harmonic analysis
Azam Rabiee, Saeed Setayeshi, Soo-Young Lee

Underdetermined blind source separation with fuzzy clustering for arbitrarily arranged sensors
Ingrid Jafari, Serajul Haque, Roberto Togneri, Sven Nordholm

On initial seed selection for frequency domain blind speech separation
Dang Hai Tran Vu, Reinhold Haeb-Umbach

Spatial filter calibration based on minimization of modified LSD
Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi

Probabilistic spectrum envelope: categorized audio-features representation for NMF-based sound decomposition
Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki

A high resolution multiple source localization based on generalized cumulant structure (GCS) matrix
Jinho Choi, Chang D. Yoo

Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks
Emad M. Grais, Hakan Erdogan

Perceptually-inspired processing for multichannel Wiener filter
Jorge I. Marin-Hurtado, David V. Anderson

Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization
Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa

Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR
Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto

Voice processing by dynamic glottal models with applications to speech enhancement
Carlo Drioli, Andrea Calanca

Supervised sparse coding strategy in cochlear implants
Jinqiu Sang, Guoping Li, Hongmei Hu, Mark E. Lutman, Stefan Bleeck

Phonetics and Phonology, Stress, Accent, Rhythm

Chinese and Italian speech rhythm: normalization and the CCI algorithm
Chiara Bertini, Pier Marco Bertinetto, Na Zhi

Rhythm metrics on syllables and feet do not work as expected
Paolo Mairano, Antonio Romano

Applying rhythm features to automatically assess non-native speech
Lei Chen, Klaus Zechner

Prosodic synchrony in co-operative task-based dialogues: a measure of agreement and disagreement
Brian Vaughan

Low and high, short and long by crook or by hook?
Oliver Niebuhr, Astrid Wolf

Estimating speaking rate by means of rhythmicity parameters
Christian Heinrich, Florian Schiel

Comparing word and syllable prominence rated by naïve listeners
Denis Arnold, Bernd Möbius, Petra Wagner

L1/L2 perception of lexical stress with F0 peak-delay: effect of an extra syllable added
Shinichi Tokuma, Yi Xu

Letter-to-phoneme conversion based on two-stage neural network focusing on letter and phoneme contexts
Kheang Seng, Yurie Iribe, Tsuneo Nitta

An international English speech corpus for longitudinal study of accent development
Rosemary Orr, Hugo Quené, Roeland van Beek, Thari Diefenbach, David A. van Leeuwen, Marijn Huijbregts

A corpus-based study of English pronunciation variations
Sunhee Kim, Kyuwhan Lee, Minhwa Chung

Long term average speech spectra in Yolngu Matha and Pitjantjatjara speaking females and males
Hywel Stoakes, Andrew Butcher, Janet Fletcher, Marija Tabain

Context and speaker dependency in the relation of vowel formants and subglottal resonances - evidence from Hungarian
Tekla Etelka Gráczi, Steven M. Lulich, Tamás Gábor Csapó, András Beke

Pitch Processing - Singing Voice Analysis

Fundamental frequency estimation using modified higher order moments and multiple windows
Alipah Pawi, Saeed Vaseghi, Ben Milner, Seyed Ghorshi

EM-based gain adaptation for probabilistic multipitch tracking
Michael Wohlmayr, Franz Pernkopf

Joint robust voicing detection and pitch estimation based on residual harmonics
Thomas Drugman, Abeer Alwan

Epoch extraction in high pass filtered speech using Hilbert envelope
D. Govind, S. R. M. Prasanna, Debadatta Pati

Robust HNR-based closed-loop pitch and harmonic parameters estimation
Alexander Pavlovets, Alexander Petrovsky

Exploring Bessel features for detection of glottal closure instants
Chetana Prakash, Dhananjaya N., Suryakanth V. Gangashetty

Evaluation of glottal epoch detection algorithms on different voice types
João P. Cabral, John Kane, Christer Gobl, Julie Carson-Berndsen

A divide et impera algorithm for optimal pitch stylization
Antonio Origlia, Giovanni Abete, Francesco Cutugno, Iolanda Alfano, Renata Savy, Bogdan Ludusan

Singing voice analysis using relative harmonic delays
Ricardo Sousa, Aníbal Ferreira

Singing voice synthesis: singer-dependent vibrato modeling and coherent processing of spectral envelope
S. W. Lee, Minghui Dong

Chorus digitalis: experiments in chironomic choir singing
Sylvain Le Beux, Lionel Feugère, Christophe d'Alessandro

Prosodic Modeling

Prominence model for prosodic features in automatic lexical stress and pitch accent detection
Kun Li, Shuang Zhang, Mingxing Li, Wai-Kit Lo, Helen Meng

Hierarchical stress modeling in Mandarin text-to-speech
Ya Li, Jianhua Tao, Xiaoying Xu

Automatic prosodic events detection by using syllable-based acoustic, lexical and syntactic features
Chong-Jia Ni, Wenju Liu, Bo Xu

Using dynamic time warping to compute prosodic similarity measures
Albert Rilliard, Alexandre Allauzen, Philippe Boula de Mareüil

Applying the quantitative target approximation model (qTA) to German and Brazilian Portuguese
Plínio A. Barbosa, Hansjörg Mixdorff, Sandra Madureira

Stylization and trajectory modelling of short and long term speech prosody variations
Nicolas Obin, Anne Lacheret, Xavier Rodet

Toward a continuous modeling of French prosodic structure: using acoustic features to predict prominence location and prominence degree
Mathieu Avanzi, Nicolas Obin, Anne Lacheret-Dujour, Bernard Victorri

Optimal models of prosodic prominence using the Bayesian information criterion
Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Margaret Fleck, Mark Hasegawa-Johnson, Jennifer Cole

Quantitative analysis of tone coarticulation in Mandarin
Hussein Hussein, Hansjörg Mixdorff, Hue San Do, Rüdiger Hoffmann

Tracking pitch contours using minimum jerk trajectories
Daniel Neiberg, G. Ananthakrishnan, Joakim Gustafson

Discourse and Dialogue

On the use of linguistic features in an automatic system for speech analytics of telephone conversations
Benjamin Maza, Marc El-Beze, Georges Linares, Renato De Mori

Determining what questions to ask, with the help of spectral graph theory
Abe Kazemzadeh, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth Narayanan

“Are you sure you're paying attention?” - “uh-huh” communicating understanding as a marker of attentiveness
Hendrik Buschmeier, Zofia Malisz, Marcin Włodarczak, Stefan Kopp, Petra Wagner

Projectability of transition-relevance places using prosodic features in Japanese spontaneous conversation
Yuichi Ishimoto, Mika Enomoto, Hitoshi Iida

Measuring final lengthening for speaker-change prediction
Anna Hjalmarsson, Kornel Laskowski

Incremental learning and forgetting in stochastic turn-taking models
Kornel Laskowski, Jens Edlund, Mattias Heldner

Reinforcement learning of argumentation dialogue policies in negotiation
Kallirroi Georgila, David Traum

Topic switching strategies for spoken dialogue systems
Tobias Heinroth, Savina Koleva, Wolfgang Minker

Unsupervised clustering of utterances using non-parametric Bayesian methods
Ryuichiro Higashinaka, Noriaki Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, Hirohito Inagaki

SLP for Speech Translation, Information Extraction and Retrieval

OOV sensitive named-entity recognition in speech
Carolina Parada, Mark Dredze, Frederick Jelinek

Speech translation with grammar driven probabilistic phrasal bilexica extraction
Markus Saers, Dekai Wu, Chi-kiu Lo, Karteek Addanki

An efficient unified extraction algorithm for bilingual data
Christoph Tillmann, Sanjika Hewavitharana

Using features from topic models to alleviate over-generation in hierarchical phrase-based translation
Songfang Huang, Bowen Zhou

An empirical study on improving hierarchical phrase-based translation using alignment features
Songfang Huang, Bowen Zhou

Robust speech translation by domain adaptation
Xiaodong He, Li Deng

Enhancements to the training process of classifier-based speech translator via topic modeling
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth Narayanan

A scalable approach to building a parallel corpus from the web
Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore

Spoken term detection results using plural subword models by estimating detection performance for each query
Yoshiaki Itoh, Kohei Iwata, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee

Speechforms: from web to speech and back
Luciano Barbosa, Diamantino Caseiro, Giuseppe Di Fabbrizio, Amanda Stent

Image processing filters for line detection-based spoken term detection
Kazuyuki Noritake, Hiroaki Nanjo, Takehiko Yoshimi

Using latent topic features for named entity extraction in search queries
Joe Polifroni, François Mairesse

Language model expansion using webdata for spoken document retrieval
Ryo Masumura, Seongjun Hahm, Akinori Ito

Effects of query expansion for spoken document passage retrieval
Tomoyosi Akiba, Koichiro Honda

Unsupervised hidden Markov modeling of spoken queries for spoken term detection without speech recognition
Chun-an Chan, Lin-shan Lee

Topic identification from audio recordings using rich recognition results and neural network based classifiers
Roberto Gemello, Franco Mana, Pier Domenico Batzu

Speech Synthesis - Selected Topics

A grammar based approach to style specific phrase prediction
Alok Parlikar, Alan W. Black

Unsupervised features from text for speech synthesis in a speech-to-speech translation system
Oliver Watts, Bowen Zhou

Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger
Oliver Watts, Junichi Yamagishi, Simon King

Albayzín 2010: a Spanish text to speech evaluation
Francisco Campillo, Francisco Méndez, Montserrat Arza, Laura Docío, Antonio Bonafonte, Eva Navas, Iñaki Sainz

Combining active and semi-supervised learning for homograph disambiguation in Mandarin text-to-speech synthesis
Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai

Automatically creating a diphone set from a speech database
Thomas Ewender, Beat Pfister

Automatic viseme clustering for audiovisual speech synthesis
Wesley Mattheyses, Lukas Latacz, Werner Verhelst

Perceptual quality dimensions of text-to-speech systems
Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute

A pointwise approach to pronunciation estimation for a TTS front-end
Shinsuke Mori, Graham Neubig

Correlating text with prosody
Mohamed Abou-Zleikha, Julie Carson-Berndsen

“What is… dengue fever?” - modeling and predicting pronunciation errors in a text-to-speech system
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran

Aperiodicity analysis for quality estimation of text-to-speech signals
Christoph Norrenbrock, Ulrich Heute, Florian Hinterleitner, Sebastian Möller

Human Speech and Sound Perception I, II

Parallels in infants' attention to speech articulation and to physical changes in speech-unrelated objects
Eeva Klintfors, Ellen Marklund, Francisco Lacerda

Speech events are recoverable from unlabeled articulatory data: using an unsupervised clustering approach on data obtained from electromagnetic midsagittal articulography (EMA)
Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze

Children's recognition of their own voice: influence of phonological impairment
Sofia Strömbergsson

Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of speaker discrimination information
Takayuki Kagomiya, Seiji Nakagawa

Impact of different feedback mechanisms in EMG-based speech recognition
Christian Herff, Matthias Janke, Michael Wand, Tanja Schultz

Phonotactic constraints and the segmentation of Cantonese speech
Michael C. W. Yip

Reaction time and decision difficulty in the perception of intonation
Katrin Schneider, Grzegorz Dogil, Bernd Möbius

Processing of stress related acoustic cues as indexed by ERPs
Ferenc Honbolygó, Valéria Csépe

On the relationship between perceived accentedness, acoustic similarity, and processing difficulty in foreign-accented speech
Marijt J. Witteman, Andrea Weber, James M. McQueen

The perception boundary between single and geminate stops in 3- and 4-mora Japanese words
Shigeaki Amano, Yukari Hirata

Correlation analysis of acoustic features with perceptual voice quality similarity for similar speaker selection
Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno

Pointing gestures do not influence the perception of lexical stress
Alexandra Jesse, Holger Mitterer

Relationships between phonetic features and speech perception - a statistical investigation from a large anechoic British English corpus
Ian R. Cushing, Francis F. Li, Ken Worrall, Tim Jackson

The representation of speech in a nonlinear auditory model: time-domain analysis of simulated auditory-nerve firing patterns
Guy J. Brown, Tim Jürgens, Ray Meddis, Matthew Robertson, Nicholas R. Clark

An automatic voice pleasantness classification system based on prosodic and acoustic patterns of voice preference
Luis Coelho, Daniela Braga, Miguel Sales-Dias, Carmen Garcia-Mateo

Contributions of F1 and F2 (F2') to the perception of plosive consonants
René Carré, Pierre Divenyi, Willy Serniclaes, Emmanuel Ferragne, Egidio Marsico, Viet-Son Nguyen

Auditory speech processing is affected by visual speech in the periphery
Jeesun Kim, Chris Davis

Visual speech speeds up auditory identification responses
Tim Paris, Jeesun Kim, Chris Davis

Agglomerative hierarchical clustering of emotions in speech based on subjective relative similarity
Ryoichi Takashima, Tohru Nagano, Ryuki Tachibana, Masafumi Nishimura

Optimal syllabic rates and processing units in perceiving Mandarin spoken sentences
Guangting Mai, Gang Peng

Cross-lingual speaker discrimination using natural and synthetic speech
Mirjam Wester, Hui Liang

Multilingual and Multimodal Approaches to Spoken Language

Can audio-visual speech recognition outperform acoustically enhanced speech recognition in automotive environment?
Rajitha Navarathna, Tristan Kleinschmidt, David Dean, Sridha Sridharan, Patrick Lucey

A multimodal approach to dictation of handwritten historical documents
Vicent Alabau, Verónica Romero, Antonio-L. Lagarda, Carlos-D. Martínez-Hinarejos

Weight optimization for bimodal unit-selection talking head synthesis
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte

Modality selection and perceived mental effort in a mobile application
Stefan Schaffer, Benjamin Jöckel, Ina Wechsung, Robert Schleicher, Sebastian Möller

A cross-lingual spoken content search system
Jitendra Ajmera, Ashish Verma

Nemo: a platform for multilingual news monitoring
C. Girardi, Roberto Gretter, Daniele Falavigna, Fabio Brugnara, Diego Giuliani, M. Federico

Unsupervised learning of acoustic unit descriptors for audio content representation and classification
Sourish Chaudhuri, Mark Harvilla, Bhiksha Raj

Conditioned hidden Markov model fusion for multimodal classification
Michael Glodek, Stefan Scherer, Friedhelm Schwenker

Distant speech recognition in a smart home: comparison of several multisource ASRs in realistic conditions
Benjamin Lecouteux, Michel Vacher, François Portet

A robust approach to mining repeated sequence in audio stream
Jiansong Chen, Lei Zhu, Bailan Feng, Peng Ding, Bo Xu

ASR - New Paradigms and Other Topics

Accelerated parallelizable neural network learning algorithm for speech recognition
Dong Yu, Li Deng

Deep convex net: a scalable architecture for speech pattern classification
Li Deng, Dong Yu

Modeling broad context for tone recognition with conditional random fields
Siwei Wang, Gina-Anne Levow

Improved tonal language speech recognition by integrating spectro-temporal evidence and pitch information with properly chosen tonal acoustic units
Shang-wen Li, Yow-bang Wang, Liang-che Sun, Lin-shan Lee

Kullback-Leibler divergence-based ASR training data selection
Evandro Gouvêa, Marelie H. Davel

Articulatory feature classification using nearest neighbors
Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar

Continuous episodic memory based speech recognition using articulatory dynamics
Sébastien Demange, Slim Ouni

Graphone model interpolation and Arabic pronunciation generation
T. Li, P. C. Woodland, F. Diehl, M. J. F. Gales

Grapheme-to-phoneme conversion using conditional random fields
Irina Illina, Dominique Fohr, Denis Jouvet

Bilingual acoustic model adaptation by unit merging on different levels and cross-level integration
Ching-Feng Yeh, Chao-Yu Huang, Lin-shan Lee

A qualitative evaluation of phoneme-to-phoneme technology
Marijn Schraagen, Gerrit Bloothooft

Cheap bootstrap of multi-lingual hidden Markov models
Daniele Falavigna, Roberto Gretter

Adaptive stream fusion in multistream recognition of speech
Nima Mesgarani, Samuel Thomas, Hynek Hermansky

Unsupervised audio patterns discovery using HMM-based self-organized units
Man-hung Siu, Herbert Gish, Steve Lowe, Arthur Chan

Nearest neighbors with learned distances for phonetic frame classification
John Labiak, Karen Livescu

Speech Audio Analysis and Classification

Stop consonant recognition by temporal fine structure of burst
Seppo Fagerlund, Unto K. Laine

Phonetic classification using controlled random walks
Katrin Kirchhoff, Andrei Alexandrescu

Keyphrase cloud generation of broadcast news
Luís Marujo, Márcio Viveiros, João Paulo da Silva Neto

Optimized feature extraction and HMMs in subword detectors
Alfonso M. Canterla, Magne H. Johnsen

Real-world speech/non-speech audio classification based on sparse representation features and GPCs
Ziqiang Shi, Jiqing Han, Tieran Zheng

Privacy preserving speaker verification using adapted GMMs
Manas A. Pathak, Bhiksha Raj

Clustering expressive speech styles in audiobooks using glottal source parameters
Éva Székely, João P. Cabral, Peter Cahill, Julie Carson-Berndsen

On the use of the rhythmogram for automatic syllabic prominence detection
Bogdan Ludusan, Antonio Origlia, Francesco Cutugno

Speech modulation features for robust nonnative speech accent detection
Sethserey Sam, Xiong Xiao, Laurent Besacier, Eric Castelli, Haizhou Li, Eng Siong Chng

Frame-level vocal effort likelihood space modeling for improved whisper-island detection
Chi Zhang, John H. L. Hansen

Speaker identification for whispered speech using a training feature transformation from neutral to whisper
Xing Fan, John H. L. Hansen

An accurate and robust gender identification algorithm
Andrea DeMarco, Stephen J. Cox

Deep belief networks for automatic music genre classification
Xiaohong Yang, Qingcai Chen, Shusen Zhou, Xiaolong Wang

Image representation of the subband power distribution for robust sound classification
Jonathan Dennis, Huy Dat Tran, Haizhou Li

Acoustic and visual cues of turn-taking dynamics in dyadic interactions
Bo Xiao, Viktor Rozgić, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth Narayanan

Speech Audio Analysis

Robust audio fingerprinting based on local spectral luminance maxima scheme
Yong-zhe Shi, Wei-Qiang Zhang, Jia Liu

Entropy-rate driven inference of stochastic grammars
Unto K. Laine

An efficient pre-processing scheme to improve the sound source localization system in noisy environment
Sheng-Chieh Lee, K. Bharanitharan, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu, Min-Jian Liao

A study on auditory feature spaces for speech-driven lip animation
Guylaine Le-Jan, Yannick Benezeth, Guillaume Gravier, Frédéric Bimbot

Phase-only speech reconstruction using very short frames
Erfan Loweimi, Seyed Mohammad Ahadi, Hamid Sheikhzadeh

Frequency-warped and stabilized time-varying cepstral coefficients
Trond Skogstad, Torbjørn Svendsen

Using human perception for automatic accent assessment
Freddy William, Abhijeet Sangwan, John H. L. Hansen

A study of the effectiveness of articulatory strokes for phonemic recognition
Carlos Molina, Sungbok Lee, Shrikanth Narayanan, Néstor Becerra Yoma

Auditory filterbank improves voice morphing
Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara

Monaural sound localization
Anna Katharina Fuchs, Christian Feldbauer, Michael Stark

Speech Coding

Dual-mode AVQ coding based on spectral masking and sparseness detection for ITU-T G.711.1/G.722 super-wideband extensions
Masahiro Fukui, Shigeaki Sasaki, Yusuke Hiwasaki, Kurihara Sachiko, Yoichi Haneda

Phone impact based speech transmission technique for reliable speech recognition in poor wireless network conditions
Azar Taufique, Kumaran Vijayasankar, Wooil Kim, John H. L. Hansen, Marco Tacca, Andrea Fumagalli

Automatic speech codec identification with applications to tampering detection of speech recordings
Jingting Zhou, Daniel Garcia-Romero, Carol Y. Espy-Wilson

A hybrid quasi-harmonic/CELP wideband speech coding scheme for unit selection TTS synthesis
Chang-Heon Lee, Olivier Rosec, Yannis Stylianou

Voice quality characterization of IETF Opus codec
Anssi Rämö, Henri Toukomaa

Leja ordering LSFs for accurate estimation of predictor coefficients
C. F. Pedersen

Improved quality for conversational VoIP using path diversity
Qipeng Gong, Peter Kabal

Tree encoding for the ITU-T G.711.1 speech coder
Abdul Hannan Khan, Peter Kabal

Parallel and hierarchical decision making for sparse coding in speech recognition
Dong Wang, Ravichander Vipperla, Nicholas Evans

A new model-based Mandarin-speech coding system
Chen-Yu Chiang, Jyh-Her Yang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horn Chen

Robustness and Adaptation for ASR

Using unsupervised feature-based speaker adaptation for improved transcription of spoken archives
Petr Cerva, Karel Palecek, Jan Silovsky, Jan Nouza

Online speaker adaptation with pre-computed FMLLR transformations
Volker Fischer, Siegfried Kunzmann

Instantaneous speaker adaptation through selection and combination of fMLLR transformation matrices
Diego Giuliani, Fabio Brugnara

Joint bilinear transformation space based maximum a posteriori linear regression adaptation using prior with variance function
Hwa Jeon Song, Yunkeun Lee, Hyung Soon Kim

A study on combining VTLN and SAT to improve the performance of automatic speech recognition
D. R. Sanand, Mikko Kurimo

Incorporating regional information to enhance MAP-based stochastic feature compensation for robust speech recognition
Yu Tsao, Paul R. Dixon, Chiori Hori, Hisashi Kawai

A study on the effect of pitch on LPCC and PLPC features for children's ASR in comparison to MFCC
Shweta Ghai, Rohit Sinha

About handling boundary uncertainty in a speaking rate dependent modeling approach
Denis Jouvet, Dominique Fohr, Irina Illina

An active learning approach to task adaptation
Ji Wu, Zhiyang He, Ping Lv

Efficient speaker and noise normalization for robust speech recognition
Vikas Joshi, Raghavendra Bilgi, S. Umesh, C. Benitez, L. Garcia

How realistic is artificially added noise?
Thomas Winkler

Voice Activity Detection

Voice activity detection in MTF-based power envelope restoration
Masashi Unoki, Xugang Lu, Rico Petrick, Shota Morita, Masato Akagi, Rüdiger Hoffmann

Using spectral fluctuation of speech in multi-feature HMM-based voice activity detection
Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama

Linear dynamic models for voice activity detection
Kannu Mehta, Chau Khoa Pham, Eng Siong Chng

Detection of shouted speech in the presence of ambient noise
Jouni Pohjalainen, Tuomo Raitio, Paavo Alku

Breath-detection-based telephony speech phrasing
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Multi-channel voice activity detection based on conic constraints
Gibak Kim

Multi-sensor voice activity detection based on multiple observation hypothesis testing
Theodoros Petsatodis, Fotios Talantzis, Christos Boukis, Zheng-Hua Tan, Ramjee Prasad

Online speech activity detection in broadcast news
Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan

A real-time speech command detector for a smart control room
Daniel Reich, Felix Putze, Dominic Heger, Joris Ijsselmuiden, Rainer Stiefelhagen, Tanja Schultz

Robust voice activity detector for real world applications using harmonicity and modulation frequency
Ekapol Chuangsuwanich, James Glass

On noise robust voice activity detection
Tomas Dekens, Werner Verhelst

Adaptive regularization framework for robust voice activity detection
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Human Speech Production I

On the use of extended context for HMM-based spontaneous conversational speech synthesis
Tomoki Koriyama, Takashi Nose, Takao Kobayashi

Predicting tongue positions from acoustics and facial features
Asterios Toutios, Slim Ouni

Assessing acoustic reduction: exploiting local structure in speech
Louis ten Bosch, Annika Hämäläinen, Mirjam Ernestus

The “fortis-lenis” distinction in Bulgarian and German
Bistra Andreeva, Magdalena Wolska

Acoustic correlates of glottal gaps
Gang Chen, Jody Kreiman, Yen-Liang Shue, Abeer Alwan

Using a genetic algorithm to estimate parameters of a coarticulation model
Brian O. Bush, John-Paul Hosom, Alexander Kain, Akiko Amano-Kusumoto

Synthesis of breathy, normal, and pressed phonation using a two-mass model with a triangular glottis
Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube

Analysis of inter-articulator correlation in acoustic-to-articulatory inversion using generalized smoothness criterion
Prasanta Kumar Ghosh, Shrikanth Narayanan

Frequency-domain representation of source-filter coupling and its effect in the production of voice
Tokihiko Kaburagi

Method for speech inversion with large scale statistical evaluation
Heikki Rasilo, Unto K. Laine, Okko Räsänen, Toomas Altosaar

Italian in the no-man's land between stress-timing and syllable-timing? Speakers are more stress-timed than listeners
Bettina Braun, Sabine Geiselmann

The Lombard effect in spontaneous dialog speech
Laura Folk, Florian Schiel

Voice Conversion and Speech Synthesis

Gaussian process experts for voice conversion
Nicholas C. V. Pilkington, Heiga Zen, M. J. F. Gales

Intonation conversion from neutral to expressive speech
Christophe Veaux, Xavier Rodet

Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation
Nobuhiko Hattori, Tomoki Toda, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano

Adding glottal source information to intra-lingual voice conversion
Javier Pérez, Antonio Bonafonte

Formant-controlled HMM-based speech synthesis
Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai

Analysis of HMM-based Lombard speech synthesis
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku

Discrete/continuous modelling of speaking style in HMM-based speech synthesis: design and evaluation
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet

Factored MLLR adaptation for singing voice generation
June Sig Sung, Doo Hwa Hong, Shin Jae Kang, Nam Soo Kim

Adaptation of prosody in speech synthesis by changing command values of the generation process model of fundamental frequency
Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu

Prosody conversion for emotional Mandarin speech synthesis using the tone nucleus model
Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu

Rapid adaptation of foreign-accented HMM-based speech synthesis
Reima Karhila, Mirjam Wester

The effects of phoneme errors in speaker adaptation for HMM speech synthesis
Bálint Tóth, Tibor Fegyó, Géza Németh

Human Speech Production II

Articulatory reduction in Mandarin Chinese words
Jeffrey Berry, Sunjing Ji, Ian Fasel, Diana Archangeli

Morphological variation in the adult vocal tract: a modeling study of its potential acoustic impact
Adam Lammert, Michael Proctor, Athanasios Katsamanis, Shrikanth Narayanan

Analysis and automatic estimation of children's subglottal resonances
Steven M. Lulich, Harish Arsikere, John R. Morton, Gary K. F. Leung, Abeer Alwan, Mitchell S. Sommers

Acceleration sensor based estimates of subglottal resonances: short vs. long vowels
Wolfgang Wokurek, Andreas Madsack

Comparison of nasalance measurements from accelerometers and microphones and preliminary development of novel features
Nicolas Audibert, Angélique Amelot

The effect of seeing the interlocutor on speech production in different noise types
Michael Fitzpatrick, Jeesun Kim, Chris Davis

Conversing in the presence of a competing conversation: effects on speech production
Vincent Aubanel, Martin Cooke, Julián Villegas, Maria Luisa Garcia Lecumberri

Very short utterances and timing in turn-taking
Mattias Heldner, Jens Edlund, Anna Hjalmarsson, Kornel Laskowski

Validating rt-MRI based articulatory representations via articulatory recognition
Athanasios Katsamanis, Erik Bresch, Vikram Ramanarayanan, Shrikanth Narayanan

An electropalatographic and acoustic study on anticipatory coarticulation in V1#C2V2 sequences in Standard Chinese
Yinghao Li, Jiangping Kong

Final /t/ reduction in Dutch past-participles: the role of word predictability and morphological decomposability
Iris Hanique, Mirjam Ernestus

Parametrising degree of articulator movement from dynamic MRI data
Zeynab Raeesy, Ladan Baghai-Ravary, John Coleman

Systems for LVCSR and Rich Transcription

Improving LVCSR system combination using neural network language model cross adaptation
X. Liu, M. J. F. Gales, P. C. Woodland

Towards high performance LVCSR in speech-to-speech translation system on smart phones
Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou

Deploying Google Search by Voice in Cantonese
Yun-Hsuan Sung, Martin Jansche, Pedro J. Moreno

An investigation in speech recognition for colloquial Arabic
Sarah Al-Shareef, Thomas Hain

A multithreaded implementation of Viterbi decoding on recursive transition networks
Fabio Brugnara

Recurrent neural network based language modeling in meeting recognition
Stefan Kombrink, Tomáš Mikolov, Martin Karafiát, Lukáš Burget

Ad-hoc meeting transcription on clusters of mobile devices
Michele Cossalter, Priya Sundararajan, Ian Lane

ROVER enhancement with automatic error detection
Kacem Abida, Fakhri Karray

Automatic comma insertion of lecture transcripts based on multiple annotations
Yuya Akita, Tatsuya Kawahara

Language, Dialect Identification and Speaker Diarization

Study on the relevance factor of maximum a posteriori with GMM for language recognition
Chang Huai You, Haizhou Li, Kong Aik Lee

Improving multiband position-pitch algorithm for localization and tracking of multiple concurrent speakers by using a frequency selective criterion
Tania Habib, Harald Romsdorfer

On the use of lattices of time-synchronous cross-decoder phone co-occurrences in a SVM-phonotactic language recognition system
Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Germán Bordel

Speaker clustering based on utterance-oriented Dirichlet process mixture model
Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi

PLDA-based clustering for speaker diarization of broadcast streams
Jan Silovsky, Jan Prazak, Petr Cerva, Jindrich Zdansky, Jan Nouza

iVector approach to phonotactic language recognition
Mehdi Soufifar, Marcel Kockmann, Lukáš Burget, Oldřich Plchot, Ondřej Glembek, Torbjørn Svendsen

Discriminative features for language identification
Chris Alberti, Michiel Bacchiani

Perceptual sensitivity to dialectal and generational variations in vowels
Robert Allen Fox, Ewa Jacewicz

Investigation of cross-show speaker diarization
Qian Yang, Qin Jin, Tanja Schultz

Language identification for text chats
Vesa Siivola, Bryan Pellom, Meagan Sills

Spoken language recognition in the latent topic simplex
Kong Aik Lee, Chang Huai You, Ville Hautamäki, Anthony Larcher, Haizhou Li

Paralinguistic Information - Analysis and Tools

Investigating robustness of spectral moments on normal- and high-effort speech
Frederike Gottsmann, Corinna Harwardt

Comparing the impact of raised vocal effort on various spectral parameters
Corinna Harwardt

Vowel context and speaker interactions influencing glottal open quotient and formant frequency shifts in physical task stress
Keith W. Godin, John H. L. Hansen

Prosodic correlates of individual physiological response to stress
Serguei Pakhomov, Michael Kotlyar

The vocal effort of dominance in scenario meetings
Marcela Charfuelan, Marc Schröder

A preliminary model of emotional prosody using multidimensional scaling
Sona Patel, Rahul Shrivastav

An exploratory study of the relations between perceived emotion strength and articulatory kinematics
Jangwon Kim, Sungbok Lee, Shrikanth Narayanan

Improved acoustic characterization of breathy and whispery voices
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita

Neutral to target emotion conversion using source and suprasegmental information
D. Govind, S. R. M. Prasanna, B. Yegnanarayana

A multimodal analysis of vocal and visual backchannels in spontaneous dialogs
Khiet P. Truong, Ronald Poppe, Iwan de Kok, Dirk Heylen

Kernel models for affective lexicon creation
Nikos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth Narayanan

Speech and Language Processing-Based Assistive Technologies and Health Applications (Special Session)

Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis
Douglas Sturim, Pedro A. Torres-Carrasquillo, Thomas F. Quatieri, Nicolas Malyska, Alan McCree

Utterance verification for automating the hearing in noise test (HINT)
H. Timothy Bunnell, Jason Lilley, Sigfrid D. Soli, Ivan Pal

Analyzing the nature of ECA interactions in children with autism
Emily Mower, Chi-Chun Lee, James Gibson, Theodora Chaspari, Marian E. Williams, Shrikanth Narayanan

Incorporating speech recognition engine into an intelligent assistive reading system for dyslexic students
Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Evmorfia N. Argyriou, Antonis Symvonis

An investigation of depressed speech detection: features and normalization
Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke

Using prosodic and spectral features in detecting depression in elderly males
Michelle Hewlett Sanchez, Dimitra Vergyri, Luciana Ferrer, Colleen Richey, Pablo Garcia, Bruce Knoth, William Jarrold

Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Catherine Middag, Tobias Bocklet, Jean-Pierre Martens, Elmar Nöth

Speech synthesis parameter generation for the assistive silent speech interface MVOCA
Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko

Computer-assisted disfluency counts for stuttered speech
Peter A. Heeman, Andy McMillin, J. Scott Yaruss

Spectral features for automatic blind intelligibility estimation of spastic dysarthric speech
Richard Hummel, Wai-Yip Chan, Tiago H. Falk

Extraction of narrative recall patterns for neuropsychological assessment
Emily T. Prud'hommeaux, Brian Roark

Gesture design of hand-to-speech converter derived from speech-to-hand converter based on probabilistic integration model
Aki Kunikoshi, Yu Qiao, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose

Powered wheelchair control using acoustic-based recognition of head gesture accompanying speech
Akira Sasou

Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech
José Luis Blanco, Rubén Fernández, Doroteo Torre, F. Javier Caminero, Eduardo López

Crowdsourcing for Speech Processing (Special Session)

Speaking to the crowd: looking at past achievements in using crowdsourcing for speech and predicting future challenges
Gabriel Parent, Maxine Eskenazi

A transcription task for crowdsourcing with automatic quality control
Chia-ying Lee, James Glass

Reliability-weighted acoustic model adaptation using crowd sourced transcriptions
Kartik Audhkhasi, Panayiotis G. Georgiou, Shrikanth Narayanan

Crowdsourcing for word recognition in noise
Martin Cooke, Jon Barker, Maria Luisa Garcia Lecumberri, Krzysztof Wasilewski

Crowdsourcing preference tests, and how to detect cheating
Sabine Buchholz, Javier Latorre

Growing a spoken language interface on Amazon Mechanical Turk
Ian McGraw, James Glass, Stephanie Seneff

Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk
F. Jurčíček, S. Keizer, Milica Gašić, François Mairesse, B. Thomson, K. Yu, Steve Young

Quality assessment of crowdsourcing transcriptions for African languages
Hadrien Gelas, Solomon Teferra Abate, Laurent Besacier, François Pellegrino

Using crowdsourcing to provide prosodic annotations for non-native speech
Keelan Evanini, Klaus Zechner

PodCastle: recent advances of a spoken document retrieval service improved by anonymous user contributions
Masataka Goto, Jun Ogata

Spoken Language Processing of Human-Human Conversations (Special Session)

Language-independent socio-emotional role recognition in the AMI meetings corpus
Fabio Valente, Alessandro Vinciarelli

Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions
Rivka Levitan, Julia Hirschberg

Automatic call quality monitoring using cost-sensitive classification
Youngja Park

Learning influences from word use in polylogue
Tomoharu Iwata, Shinji Watanabe

Identifying agreement/disagreement in conversational speech: a cross-lingual study
Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond

A dual channel coupled decoder for fillers and feedback
Daniel Neiberg, Joakim Gustafson

An analysis of PCA-based vocal entrainment measures in married couples' affective spoken interactions
Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth Narayanan

Speech and Audio Processing for Human-Robot Interaction (Special Session)

Using prominence detection to generate acoustic feedback in tutoring scenarios
Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina Rohlfing

Bayesian extension of MUSIC for sound source localization and tracking
Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno

Speech-based non-prototypical affect recognition for child-robot interaction in reverberated environments
Martin Wöllmer, Felix Weninger, Stefan Steidl, Anton Batliner, Björn Schuller

Blind source separation for robot audition using fixed beamforming with HRTFs
Mounira Maazaoui, Yves Grenier, Karim Abed-Meraim

Real-life emotion detection from speech in human-robot interaction: experiments across diverse corpora with child and adult voices
Marie Tahon, Agnes Delaborde, Laurence Devillers

Weighted ordered classes - nearest neighbors: a new framework for automatic emotion recognition from speech
Yazid Attabi, Pierre Dumouchel

Prosodic analysis of a corpus of tales
David Doukhan, Albert Rilliard, Sophie Rosset, Martine Adda-Decker, Christophe d'Alessandro

Analysis of acoustic-prosodic features related to paralinguistic information carried by interjections in dialogue speech
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita

Robust intonation pattern classification in human robot interaction
Martin Heckmann, Kazuhiro Nakadai, Hirofumi Nakajima

ASR for human-symbiotic robot “EMIEW2” with mechanical noise and floor-level noise reduction
Takashi Sumiyoshi, Masahito Togami, Yasunari Obuchi

Speech Technology for Under-Resourced Languages (Special Session)

Rapid building of an ASR system for under-resourced languages based on multilingual unsupervised training
Ngoc Thang Vu, Franziska Kraus, Tanja Schultz

Places and manner of articulation of Bangla consonants: an EPG based study
Shyamal Kr. Das Mandal, Somnath Chandra, Swaran Lata, A. K. Datta

Efficient harvesting of internet audio for resource-scarce ASR
Marelie H. Davel, Charl van Heerden, Neil Kleynhans, Etienne Barnard

Automatic prosody generation for Serbo-Croatian speech synthesis based on regression trees
Milan Sečujski, Darko Pekar, Nikša Jakovljević

Very large vocabulary ASR for spoken Russian with syntactic and morphemic analysis
Alexey Karpov, Irina Kipyatkova, Andrey Ronzhin

Cross-language phone recognition when the target language phoneme inventory is not known
Timothy Kempton, Roger K. Moore, Thomas Hain

A paradigm for limited vocabulary speech recognition based on redundant spectro-temporal feature sets
Sourish Chaudhuri, Bhiksha Raj, Tony Ezzat

Gorup: an ontology-driven audio information retrieval system that suits the requirements of under-resourced languages
N. Barroso, K. López de Ipiña, A. Ezeiza, C. Hernández, N. Ezeiza, O. Barroso, U. Susperregi, S. Barroso

Woefzela - an open-source platform for ASR data collection in the developing world
Nic J. de Vries, Jaco Badenhorst, Marelie H. Davel, Etienne Barnard, Alta de Waal

A study on the perception of tone and intonation in Sesotho
Hansjörg Mixdorff, Lehlohonolo Mohasi, 'Malillo Machobane, Thomas Niesler

Developing a broadband automatic speech recognition system for Afrikaans
Febe de Wet, Alta de Waal, Gerhard B. van Huyssteen

Multi-accent speech recognition of Afrikaans, black and white varieties of South African English
Herman Kamper, Thomas Niesler

Perceptual representation of consonant sounds in Thai
C. Tantibundhit, C. Onsuwan, T. Saimai, N. Saimai, S. Thatphithakkul, P. Chootrakool, K. Kosawat, N. Thatphithakkul

A cross-lingual approach to the development of an HMM-based speech synthesis system for Malay
Mumtaz B. Mustafa, Raja N. Ainon, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles

Speaker State Challenge - Intoxication and Sleepiness I, II (Special Session)

The INTERSPEECH 2011 speaker state challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Florian Schiel, Jarek Krajewski

Combining multiple phoneme-based classifiers with audio feature-based classifier for the detection of alcohol intoxication
Claude Montacié, Marie-José Caraty

Intoxication detection using phonetic, phonotactic and prosodic cues
Fadi Biadsy, William Yang Wang, Andrew Rosenberg, Julia Hirschberg

Drink and speak: on the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features
Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth

Intoxicated speech detection by fusion of speaker normalized hierarchical features and GMM supervectors
Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan

Attention, sobriety checkpoint! can humans determine by means of voice, if someone is drunk… and can automatic classifiers compete?
Stefan Ultes, Alexander Schmitt, Wolfgang Minker

Does it groove or does it stumble - automatic classification of alcoholic intoxication using prosodic features
Florian Hönig, Anton Batliner, Elmar Nöth

Perception of alcoholic intoxication in speech
Florian Schiel

Detecting sleepiness by fusing classifiers trained with novel acoustic features
Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H. L. Hansen, Carlos Busso

An HMM-based approach to the INTERSPEECH 2011 speaker state challenge
Albino Nogueiras Rodríguez

RANSAC-based training data selection for speaker state recognition
Elif Bozkurt, Engin Erzin, Çiğdem Eroğlu Erdem, A. Tanju Erdem

University of Ljubljana system for INTERSPEECH 2011 speaker state challenge
Rok Gajšek, Simon Dobrišek, France Mihelič

Speaker state classification based on fusion of asymmetric SIMPLS and support vector machines
Dong-Yan Huang, Shuzhi Sam Ge, Zhengchen Zhang

Speech Processing Tools (Special Session)

Speech processing tools - an introduction to interoperability
Christoph Draxler, Toomas Altosaar, Sadaoki Furui, Mark Liberman, Peter Wittenburg

EasyAlign: an automatic phonetic alignment tool under Praat
Jean-Philippe Goldman

Mtrans: a multi-channel, multi-tier speech annotation tool
Julián Villegas, Martin Cooke, Vincent Aubanel, Marco A. Piccolino-Boniforti

The JSafran platform for semi-automatic speech processing
Christophe Cerisara, Claire Gardent

The social signal interpretation framework (SSI) for real time signal processing and recognition
Johannes Wagner, Florian Lingenfelser, Elisabeth André

ELAN - aspects of interoperability and functionality
Han Sloetjes, Peter Wittenburg, Aarthy Somasundaram

Open source voice creation toolkit for the MARY TTS platform
Marc Schröder, Marcela Charfuelan, Sathish Pammi, Ingmar Steiner

Java visual speech components for rapid application development of GUI based speech processing applications
Stefan Steidl, Korbinian Riedhammer, Tobias Bocklet, Florian Hönig, Elmar Nöth

mtalk - a multimodal browser for mobile services
Michael Johnston, Giuseppe Di Fabbrizio, Simon Urbanek

Web-based automatic speech recognition service - webASR
Stuart N. Wrigley, Thomas Hain

A web based speech transcription workplace
Markus Klehr, Andreas Ratzka, Thomas Roß

WinPitch: a multimodal tool for speech analysis of endangered languages
Philippe Martin

Recording caregiver interactions for machine acquisition of spoken language using the KLAIR virtual infant
Mark Huckvale

Show & Tell Demonstration - Speech Systems and Applications (Special Session)

An affective spoken storyteller
Felix Burkhardt

Text driven 3D photo-realistic talking head
Lijuan Wang, Wei Han, Frank K. Soong, Qiang Huo

Physical models producing vowels with pitch variation
Takayuki Arai

An engine-independent text-to-speech workplace
Margot Mieskes

An application to test the emotion conveyed by vocal and musical signals
Simone Carcone, Carlo Giovannella

Automatic speech recognition system dedicated for Polish
Mariusz Ziółko, Jakub Gałka, Bartosz Ziółko, Tomasz Jadczyk, Dawid Skurzok, Mariusz Masior

Joint application of speech and speaker recognition for automation and security in smart home
Kong Aik Lee, Anthony Larcher, Helen Thai, Bin Ma, Haizhou Li

Adding a speech cursor to a multimodal dialogue system
Staffan Larsson, Alexander Berman, Jessica Villing

Prosody toolkit: integrating HTK, Praat and WEKA
S. Thomas Christie, Serguei Pakhomov

Collecting life logs for experience-based corpora
F. Francesconi, A. Ghosh, G. Riccardi, M. Ronchetti, A. Vagin

Show & Tell Demonstration - Mobility and Web-Services (Special Session)

Making an automatic speech recognition service freely available on the web
Stuart N. Wrigley, Thomas Hain

AT&T VoiceBuilder: a cloud-based text-to-speech voice builder tool
Yeon-Jun Kim, Thomas Okken, Alistair D. Conkie, Giuseppe Di Fabbrizio

Extending audio notetaker to browse webASR transcriptions
Roger Tucker, Dan Fry, Vincent Wan, Stuart N. Wrigley, Thomas Hain

A web-based tool for developing multilingual pronunciation lexicons
Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa

Speak4it and the multimodal semantic interpretation system
Michael Johnston, Patrick Ehlen

TSAB - web interface for transcribed speech collections
Tanel Alumäe, Ahti Kitsik

Visual voice mail to text on the iPhone/iPad
Andrej Ljolje, Vincent Goffin, Diamantino Caseiro, Taniya Mishra, Mazin Gilbert

Percy - an HTML5 framework for media rich web experiments on mobile devices
Christoph Draxler

The KLAIR toolkit for recording interactive dialogues with a virtual infant
Mark Huckvale

Real-time prototype for integration of blind source extraction and robust automatic speech recognition
Francesco Nesta, Marco Matassoni, HariKrishna Maganti

Interspeech 2011

Florence, Italy 27-31 August 2011

General Chairs: Piero Cosi, Renato De Mori