Department of Electrical Engineering
IIT Kanpur

EE627A - Jan. 2018

Speech Signal Processing

Course Content and References (in pdf)
I. Overview of the speech production mechanism
a) The physiological model of speech production
b) The mathematical (source-system) model of speech production
c) Relation between (a) and (b)
d) Physiological and mathematical basis of categorization of speech sounds
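The source-system model of item I(b) can be made concrete in a few lines. The sketch below is a minimal illustration, not the course's own code. It assumes an all-pole vocal-tract filter H(z) = G / (1 - sum_{k=1..p} a_k z^-k) (real vocal tracts also have zeros, for example during nasals), a periodic impulse train as the voiced excitation, and an 8 kHz sampling rate in the usage lines. The function names and filter coefficients are hypothetical.

```python
from typing import List, Sequence


def impulse_train(num_samples: int, pitch_period: int) -> List[float]:
    """Voiced excitation: a unit impulse every pitch_period samples."""
    if pitch_period <= 0:
        raise ValueError("pitch_period must be a positive number of samples")
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def all_pole_filter(excitation: Sequence[float], a: Sequence[float],
                    gain: float = 1.0) -> List[float]:
    """Vocal-tract filter s[n] = G*e[n] + sum_{k=1..p} a_k * s[n-k].

    This is the difference equation of H(z) = G / (1 - sum_k a_k z^-k);
    samples before n = 0 are taken as zero.
    """
    s: List[float] = []
    for n, e in enumerate(excitation):
        y = gain * e                         # source contribution
        for k, ak in enumerate(a, start=1):  # feedback from past outputs
            if n - k >= 0:
                y += ak * s[n - k]
        s.append(y)
    return s


if __name__ == "__main__":
    # Assumed 8 kHz sampling: an 80-sample pitch period corresponds to 100 Hz.
    # The hypothetical coefficients [1.3, -0.9] place one stable pole pair
    # (radius sqrt(0.9)) near 1 kHz, i.e. a single formant-like resonance.
    excitation = impulse_train(num_samples=800, pitch_period=80)
    voiced = all_pole_filter(excitation, a=[1.3, -0.9], gain=1.0)
    print([round(x, 3) for x in voiced[:6]])
```

For an unvoiced sound, the impulse train would be replaced by white noise (for example, samples from random.gauss) driving the same filter.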
II. Basic signal processing techniques for speech recognition
a) Discrete-time speech signals; relevant properties of the fast Fourier transform and Z-transform for speech processing; convolution, filter banks, and analytical pole-zero modeling of the speech signal
b) Spectral estimation of speech using the discrete Fourier transform (DFT)
c) Pole-zero modeling, linear prediction (LP) analysis, and perceptual linear prediction (PLP) analysis of speech
d) Homomorphic deconvolution of speech signals, real and complex cepstra, and application of cepstral analysis to speech signals
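To illustrate the linear prediction analysis in item II(c), here is a minimal sketch of the autocorrelation method solved with the Levinson-Durbin recursion. It assumes a single frame that has already been windowed and is not silent, and the predictor convention s_hat[n] = sum_{k=1..p} a_k s[n-k]; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Unnormalized autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the LP normal equations for a_1..a_p by the Levinson-Durbin recursion.

    Predictor convention assumed by this sketch:
        s_hat[n] = sum_{k=1..p} a_k * s[n - k]
    Returns ([a_1, ..., a_p], E_p), where E_p is the final prediction-error energy.
    """
    if order >= len(r):
        raise ValueError("need autocorrelation lags 0..order")
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive (silent frame?)")
    a = [0.0] * (order + 1)   # a[0] is unused; a[k] holds a_k
    err = r[0]                # E_0 = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient k_i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Order update: a_i^(i) = k_i and a_j^(i) = a_j^(i-1) - k_i * a_{i-j}^(i-1)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k    # E_i = (1 - k_i^2) * E_{i-1}
    return a[1:], err


if __name__ == "__main__":
    # Toy autocorrelation values chosen so the result is easy to check by hand:
    # expected a = [2/3, -1/3] and E_2 = 4/3 (up to floating-point rounding).
    coeffs, residual = levinson_durbin([2.0, 1.0, 0.0], order=2)
    print(coeffs, residual)
```

The recursion needs only O(p^2) operations because it exploits the Toeplitz structure of the autocorrelation matrix, and the intermediate k_i values are the reflection (PARCOR) coefficients.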

III. The speech recognition front end and pattern comparison techniques
a) Mel-frequency cepstral coefficients (MFCC), MVDR-MFCC, and RASTA-PLP cepstral coefficients
b) Issues in feature vector extraction for speech recognition: static and dynamic feature vectors, robustness issues, discrimination in the feature space, and feature selection
c) Log spectral distance, cepstral distances, weighted cepstral distances, and distances on linear and warped frequency scales
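For item III(c), the sketch below computes a Euclidean cepstral distance between two frames, optionally weighting each coefficient. It assumes the cepstral vectors hold c_1..c_L (with c_0, the log-energy term, already dropped) and uses the raised-sine lifter w_n = 1 + (L/2) sin(pi n / L) as one common example of a weighting; other weightings are possible, and the function names and sample vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def raised_sine_lifter(num_coeffs: int) -> List[float]:
    """Raised-sine (band-pass) lifter w_n = 1 + (L/2) * sin(pi * n / L), n = 1..L."""
    L = num_coeffs
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L) for n in range(1, L + 1)]


def cepstral_distance(c1: Sequence[float], c2: Sequence[float],
                      weights: Optional[Sequence[float]] = None) -> float:
    """Euclidean distance between cepstral vectors [c_1, ..., c_L] of two frames.

    c_0 is assumed to be excluded by the caller. With weights=None this is the
    plain cepstral distance; otherwise each difference is multiplied by w_n
    before squaring (a weighted, i.e. liftered, cepstral distance).
    """
    if len(c1) != len(c2):
        raise ValueError("cepstral vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(c1)
    if len(weights) != len(c1):
        raise ValueError("one weight per cepstral coefficient is required")
    return math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(weights, c1, c2)))


if __name__ == "__main__":
    ref = [1.2, -0.4, 0.3, 0.1]    # hypothetical cepstra of a reference frame
    test = [1.0, -0.1, 0.2, 0.3]   # hypothetical cepstra of a test frame
    print(cepstral_distance(ref, test))
    print(cepstral_distance(ref, test, raised_sine_lifter(len(ref))))
```

Because the cepstrum is the Fourier series of the log magnitude spectrum, Parseval's relation ties the full (untruncated) cepstral distance to the rms log spectral distance, so the truncated sum over c_1..c_L serves as an inexpensive approximation that ignores overall gain.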
IV. Statistical models for speech recognition
a) Vector quantization models for speech and speaker recognition
b) Gaussian mixture modeling for speaker, language and speech recognition
c) Hidden Markov modeling for isolated word and continuous speech recognition
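Item IV(c) centers on scoring an observation sequence against one HMM per word. The sketch below is a minimal illustration, not HTK's implementation: it computes log P(O | lambda) with the scaled forward algorithm for discrete-observation HMMs (observations are assumed to be VQ codebook indices, in the spirit of item IV(a)) and picks the best-scoring word. The model layout, the function names, and the toy parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A discrete-observation HMM lambda = (pi, A, B):
#   pi[i]   initial-state probabilities
#   A[i][j] P(state j at time t+1 | state i at time t)
#   B[j][k] P(symbol k | state j), e.g. k is a VQ codebook index
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_log_likelihood(model: HMM, obs: Sequence[int]) -> float:
    """Return log P(O | lambda) via the forward algorithm with per-frame scaling."""
    pi, A, B = model
    n = len(pi)
    log_prob = 0.0
    alpha: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][o] for i in range(n)]            # initialization
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]                            # induction
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")     # O is impossible under this model
        alpha = [x / scale for x in alpha]  # rescale so alpha sums to 1
        log_prob += math.log(scale)         # log P(O) = sum_t log(scale_t)
    return log_prob


def recognize(models: Dict[str, HMM], obs: Sequence[int]) -> str:
    """Isolated-word decision: the word whose HMM gives the highest log P(O | lambda)."""
    return max(models, key=lambda word: forward_log_likelihood(models[word], obs))


if __name__ == "__main__":
    # Hypothetical two-state, two-symbol left-to-right word models (toy numbers)
    models = {
        "yes": ([0.9, 0.1], [[0.7, 0.3], [0.0, 1.0]], [[0.8, 0.2], [0.6, 0.4]]),
        "no":  ([0.9, 0.1], [[0.7, 0.3], [0.0, 1.0]], [[0.2, 0.8], [0.3, 0.7]]),
    }
    print(recognize(models, [0, 0, 1, 0]))
```

Normalizing each alpha_t to sum to one keeps the recursion within floating-point range for long utterances, and the logs of the discarded scale factors add up to log P(O | lambda). A continuous-density (for example, Gaussian-mixture) HMM changes only how B[j][o] is evaluated.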
V. Speech recognition in practice
a) Using HTK for speech recognition



Detailed Reference List:
*  Note that only portions of these references will be used for teaching. You are
    encouraged to read all of the book chapters listed below, but this is not mandatory:
    you need to read only the material covered in the lectures, which will be listed
    in the Class Notes section of this course website.

1. Discrete-Time Speech Signal Processing: Principles and Practice, Thomas F. Quatieri,
    816 pp., published Oct. 29, 2001, ISBN 013242942X

    Chapters 2, 3, 5, 6, 7, 13 (CMS & SS only), 14 (14.2, 14.3 only)
2. Fundamentals of Speech Recognition, L. Rabiner and B. Juang,
    Prentice-Hall Signal Processing Series, 1993, 507 pp., ISBN 0-13-015157-2

    Chapters 1, 2, 3, 6, 8

3. Speech and Audio Signal Processing: Processing and Perception of Speech and Music,
    B. Gold and N. Morgan, Wiley, 2000, ISBN 0-471-35154-7

    Chapters 5, 6, 7, 8, 9, 19, 20, 21, 22, 23, 24, 25, 26, 28 (overview only)

4. Corpus-Based Methods in Language and Speech Processing, Steve Young et al. (eds.),
    Kluwer, 234 pp., ISBN 0-7923-4463-4

    Chapters 2, 3

5. Discrete-Time Processing of Speech Signals, J. R. Deller, J. G. Proakis, and J. H. Hansen,
    1993, ISBN 0023283017

    Chapters 1, 2, 4, 6, 10, 11, 12

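6. Papers from IEEE journals and magazines, available on IEEE Xplore (accessible from the campus
    network, and over VPN from outside). IEEE Transactions on Audio, Speech, and Language Processing
    was formerly IEEE Transactions on Speech and Audio Processing and, before that, IEEE Transactions
    on Acoustics, Speech, and Signal Processing (ASSP).

    6.a J. Makhoul, "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, 1975
    6.b J. W. Picone, "Signal Modeling Techniques in Speech Recognition," Proceedings of the IEEE, 1993
    6.c S. B. Davis and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic
        Word Recognition in Continuously Spoken Sentences," IEEE Transactions on ASSP, 1980
    6.d H. Hermansky and N. Morgan, "RASTA Processing of Speech," IEEE Transactions on Speech and
        Audio Processing, 1994
    6.e D. A. Reynolds and R. C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian
        Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, 1995
    6.f L. R. Rabiner and B. H. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, 1986
    6.g L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech
        Recognition," Proceedings of the IEEE, 1989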

7. The HTK toolkit for speech recognition
    http://htk.eng.cam.ac.uk/

8. The Sphinx toolkit for speech recognition
    http://cmusphinx.sourceforge.net/html/cmusphinx.php

9. Hidden Markov Models for Speech Recognition, X. D. Huang, Y. Ariki, and M. A. Jack,
    Edinburgh University Press

    Chapters 2, 3, 4, 5, 6, 8

10. Digital Processing of Speech Signals, L. R. Rabiner and R. W. Schafer, Pearson Education

    Chapters 3, 4, 6, 7, 8
   








Rajesh Hegde <rhegde@iitk.ac.in>
Dept. of Electrical Engg., IIT Kanpur