Automatic Speech Recognition (ASR)


Dramatically better automatic speech recognition in moderately noisy and reverberant environments would greatly improve existing applications and enable new ones, such as a real-time meeting transcriber with rewind and search. In recent NIST evaluations on recognizing speech from meetings, the best system (ours) still generated word errors 40% of the time, so there is much room for improvement [Morgan, Zhu et al. 2005]. Progress depends on exploiting greatly expanded signal and feature transformations.
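To make the error figure concrete, the following minimal sketch computes a word error rate, assuming the 40% figure refers to the usual definition: substitutions, deletions, and insertions divided by the number of reference words. The function name and the toy transcripts are illustrative only; this is not the NIST scoring tool, which also normalizes text and handles overlapping speakers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy transcripts (hypothetical): one substitution and one deletion
    # against five reference words.
    print(word_error_rate("a b c d e", "a x c e"))
```

Under this definition, a system that gets two of every five reference words wrong scores 0.4, the same order as the meeting result cited above.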

Conventional ASR systems have found limited acceptance because they work reliably only under good acoustic conditions. We will use the computational capabilities of manycore systems to implement novel ASR systems that perform reliably in noisy environments, such as meetings and teleconferences. We will use our own feature generation subsystems, including Quicknet and Feacalc [Hermansky and Morgan 1994; Morgan, Zhu et al. 2005]. These subsystems are frequently downloaded [ICSI 2007], and we have combined them with SRI's DECIPHER ASR components [Stolcke, Chen et al. 2006] to produce a collaborative system that has been extremely successful at every NIST evaluation on meeting speech. For this project, we will use elements from a publicly available system such as Cambridge's HTK (used at universities worldwide and recently updated to include large-vocabulary enhancements) or IDIAP's Juicer [IDIAP 2006].