A multi-stream approach to utilizing the inherently large number of spectro-temporal features for speech recognition is investigated in this study. Instead of reducing the dimension of the feature space, the method divides the features into streams so that each stream represents a patch of information in the spectro-temporal response field. When combined with MFCCs for speech recognition under both clean and noisy conditions, the multi-stream spectro-temporal features reduce the word error rate by roughly 30% relative to using MFCCs alone. This result suggests that the multi-stream approach may be an effective way to handle and exploit spectro-temporal features in speech applications.
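The division of features into patch-based streams can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation. It assumes, hypothetically, that each feature dimension carries a 2-D coordinate locating it in the spectro-temporal response field and that the patches form a regular rectangular grid. The function name `split_into_streams`, the grid size, and the toy coordinates are all illustrative. How the resulting streams are later combined with MFCCs is not specified here, so the sketch stops at the partitioning step.

```python
import numpy as np


def split_into_streams(features, coords, n_rows, n_cols, extent):
    """Group feature dimensions into streams, one per patch of the field.

    Hypothetical interface (an assumption, not from the study):
      features: (T, D) array, one row per frame.
      coords:   (D, 2) array; coords[d] = (spectral, temporal) position of
                feature d in the response field.
      extent:   ((s_min, s_max), (t_min, t_max)) bounds of the field.
    Returns (streams, groups): streams[k] is a (T, D_k) array and groups[k]
    holds the feature indices assigned to that non-empty patch.
    """
    (s_min, s_max), (t_min, t_max) = extent
    # Map each coordinate to a grid cell; clip so points on the upper edge
    # (or outside the extent) fall into the nearest valid cell.
    rows = ((coords[:, 0] - s_min) / (s_max - s_min) * n_rows).astype(int)
    cols = ((coords[:, 1] - t_min) / (t_max - t_min) * n_cols).astype(int)
    rows = np.clip(rows, 0, n_rows - 1)
    cols = np.clip(cols, 0, n_cols - 1)
    patch_id = rows * n_cols + cols

    streams, groups = [], []
    for p in range(n_rows * n_cols):
        idx = np.flatnonzero(patch_id == p)
        if idx.size:  # skip patches that contain no features
            streams.append(features[:, idx])
            groups.append(idx)
    return streams, groups


# Toy example: 8 features on a 2 x 2 grid over the unit square (made-up data).
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 8))
coords = np.array([[0.1, 0.1], [0.2, 0.9], [0.9, 0.1], [0.8, 0.8],
                   [0.4, 0.3], [0.6, 0.6], [1.0, 1.0], [0.0, 0.0]])
streams, groups = split_into_streams(feats, coords, 2, 2, ((0.0, 1.0), (0.0, 1.0)))
for g, s in zip(groups, streams):
    print(g.tolist(), s.shape)
```

Every feature is kept and lands in exactly one stream, which reflects the contrast drawn above with dimensionality reduction: the feature space is partitioned rather than shrunk.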
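A relative word-error-rate improvement is measured as a fraction of the baseline error, not in absolute percentage points. The short sketch below makes that arithmetic concrete. The WER values are hypothetical numbers chosen only for illustration and are not results from this study, and the helper name `relative_wer_reduction` is likewise illustrative.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer lowers baseline_wer (0.3 means 30% relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (%), for illustration only:
baseline = 20.0   # MFCCs alone
combined = 14.0   # MFCCs plus multi-stream spectro-temporal features
reduction = relative_wer_reduction(baseline, combined)
absolute = baseline - combined
print(f"{reduction:.0%} relative, {absolute:.1f} points absolute")
```

In this made-up case, a 30% relative reduction corresponds to only 6 absolute points, so the same relative figure implies a smaller absolute gain when the baseline WER is lower.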

information at a certain time. Thus, there is a need to explore the saliency of spectro-temporal features in different environments, as well as methods that allow the dynamic selection of these features.