Binary Speech Features for Keyword Spotting Tasks

Riviello, Alexandre; David, Jean-Pierre

doi:10.21437/Interspeech.2019-1877

Keyword spotting is a classification task that aims to detect a specific set of spoken words. It typically runs on a power-constrained device such as a smartphone. One way to reduce the power consumption of a keyword spotting algorithm (typically a neural network) is to reduce the precision of the network's weights and activations. In this paper, we propose a new representation of speech features that is better suited to low-precision networks and is compatible with binary/ternary neural networks. The new representation is based on the log-Mel spectrogram and models the variation of power over time. Tested on a ResNet, this representation produces results nearly as accurate as full-precision MFCCs, which are traditionally used in speech recognition applications.
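The abstract does not spell out the exact encoding, so the following is only a minimal sketch of one way a feature that "models the variation of power over time" could be made ternary. It assumes the input is an already computed log-Mel spectrogram (a list of frames, each a list of band energies) and encodes the frame-to-frame change in each band as -1, 0, or +1. The function name `ternary_delta_features`, the `threshold` dead zone, and the toy values are all hypothetical and are not taken from the paper.

```python
def ternary_delta_features(log_mel, threshold=0.0):
    """Encode the frame-to-frame change in each Mel band as -1, 0, or +1.

    log_mel: list of frames, each a list of log-Mel energies (one per band).
    threshold: hypothetical dead zone; changes whose magnitude does not
               exceed it are encoded as 0.
    Returns one row fewer than the input, since each row compares two frames.
    """
    features = []
    for prev, curr in zip(log_mel, log_mel[1:]):
        row = []
        for p, c in zip(prev, curr):
            delta = c - p
            if delta > threshold:
                row.append(1)    # power in this band rose
            elif delta < -threshold:
                row.append(-1)   # power in this band fell
            else:
                row.append(0)    # change too small to count
        features.append(row)
    return features


# Toy input: 4 frames x 3 Mel bands of made-up log energies.
toy_log_mel = [
    [1.0, 2.0, 3.0],
    [1.5, 2.0, 2.0],
    [1.5, 2.6, 2.1],
    [0.5, 2.5, 2.1],
]
toy_features = ternary_delta_features(toy_log_mel, threshold=0.2)
```

Under this assumed encoding, every feature value fits in two bits, which is the property that would make such a representation a natural input for a binary or ternary network. Whether the paper thresholds, sign-encodes, or uses a different scheme is not stated in the abstract.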

Cite as: Riviello, A., David, J.-P. (2019) Binary Speech Features for Keyword Spotting Tasks. Proc. Interspeech 2019, 3460-3464, doi: 10.21437/Interspeech.2019-1877

@inproceedings{riviello19_interspeech,
  author={Alexandre Riviello and Jean-Pierre David},
  title={{Binary Speech Features for Keyword Spotting Tasks}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3460--3464},
  doi={10.21437/Interspeech.2019-1877}
}