CVSP, http://cvsp.cs.ntua.gr
McGurk effect (McGurk & MacDonald)
Related work: King et al.; Deng; Articulatory Gestures (Browman & Goldstein); e.g. Bell, 1867
G. Papandreou, A. Katsamanis, V. Pitsikalis, and P. Maragos, Adaptive Multimodal Fusion by Uncertainty Compensation with Application to Audio-Visual Speech Recognition, IEEE Trans. ASLP, 2009
System Overview
- Image Acquisition: Firewire color camera, 640x480 @ 25 fps
- Face detector: Adaboost-based, @ 5 fps, used for (re)initialization
- Face tracking & feature extraction: real-time AAM fitting algorithms, GPU-accelerated processing, OpenGL implementation
- Transcription: HMM-based backend
(Knill & Richards) (e.g. Ernst et al.) // Maragos et al., Cross-Modal Integration, Springer 2008
Wiener, Kalman
[Figure: SNR = 20 dB vs. SNR = 5 dB]
Gaussian Mixture Model (GMM)
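Clean features (graphical model C -> X), Gaussian class models:
$$p(c \mid x_{1:S}) \propto p(c) \prod_{s=1}^{S} \mathcal{N}(x_s; \mu_{s,c}, \Sigma_{s,c})$$

Noisy features (graphical model C -> X -> Y), measurement model:
$$p(y_s \mid x_s) = \mathcal{N}(y_s; x_s + \mu_{e,s}, \Sigma_{e,s})$$

Noisy features, GMM class models:
$$p(c \mid y_{1:S}) \propto p(c) \prod_{s=1}^{S} \sum_{m=1}^{M_{s,c}} \rho_{s,c,m}\, \mathcal{N}(y_s; \mu_{s,c,m} + \mu_{e,s}, \Sigma_{s,c,m} + \Sigma_{e,s})$$

Two streams (y_1, y_2) with stream weights w_s:
$$b(c \mid y_{1:S}) = p(c) \prod_{s=1}^{S} p(y_s \mid c)^{w_s}$$

Single Gaussian per class, zero-mean error:
$$p(c \mid y_{1:S}) \propto p(c) \prod_{s=1}^{S} \mathcal{N}(y_s; \mu_{s,c}, \Sigma_{s,c} + \Sigma_{e,s})$$

PoG: $[\mathcal{N}(x; \mu, \Sigma)]^{w} \propto \mathcal{N}(x; \mu, w^{-1}\Sigma)$

$$b(c \mid y_{1:S}) = p(c) \prod_{s=1}^{S} \mathcal{N}(y_s; \mu_{s,c}, w_{s,c}^{-1}\Sigma_{s,c})$$

Effective stream weights: $w_{s,c}^{-1}\Sigma_{s,c} = \Sigma_{s,c} + \Sigma_{e,s}$; for scalar variances, $w_{s,c} = (1 + \sigma_{e,s}^{2} / \sigma_{s,c}^{2})^{-1}$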
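The uncertainty-compensated posterior can be illustrated with a minimal sketch. It assumes scalar features per stream, one Gaussian per class and stream, and a zero-mean measurement error; the function names (`posterior_uc`, `effective_weight`) and the toy numbers are hypothetical and not taken from the slides.

```python
import math


def log_gauss(y, mean, var):
    """Log density of a scalar Gaussian N(y; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def posterior_uc(y, err_var, priors, means, variances):
    """Class posteriors p(c | y_1:S) with uncertainty compensation.

    y[s]             observed (noisy) scalar feature of stream s
    err_var[s]       measurement-error variance of stream s (zero-mean error)
    means[c][s]      clean-feature mean of class c in stream s
    variances[c][s]  clean-feature variance of class c in stream s
    """
    log_scores = []
    for c, prior in enumerate(priors):
        total = math.log(prior)
        for s, y_s in enumerate(y):
            # The model variance is inflated by the error variance.
            total += log_gauss(y_s, means[c][s], variances[c][s] + err_var[s])
        log_scores.append(total)
    top = max(log_scores)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def effective_weight(var, err_var):
    """Scalar effective stream weight w = (1 + err_var / var) ** -1."""
    return var / (var + err_var)


# Toy example: stream 0 ("audio") favors class 1, stream 1 ("visual") class 0.
priors = [0.5, 0.5]
means = [[0.0, 0.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
y = [2.0, 0.0]

print(posterior_uc(y, [0.0, 0.0], priors, means, variances))  # both streams trusted
print(posterior_uc(y, [9.0, 0.0], priors, means, variances))  # noisy audio stream
print(effective_weight(1.0, 9.0))
```

With equal trust in both streams the two classes tie; once the audio stream is declared uncertain, the decision follows the visual stream, which is the behavior the effective stream weights express.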
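EM training
Clean features (C -> X): $Q(\theta, \theta') = E[\log p(X, \{C\} \mid \theta) \mid X, \theta']$
Noisy features (C -> X -> Y): $Q(\theta, \theta') = E[\log p(Y, \{X, C\} \mid \theta) \mid Y, \theta']$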
Hidden Markov models & Viterbi decoding (per frame)
[Figure: graphical models. Chain C1-C4 emitting X1-X4; chain C1-C4 emitting X1-X4, observed through Y1-Y4]
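Viterbi decoding over the hidden chain C1, C2, ... can be sketched as follows. This is a generic log-domain implementation, not the decoder used in the system; the function name and the toy probabilities are hypothetical. Under uncertainty compensation, the per-frame scores in `log_obs` would come from Gaussians whose variances are inflated by the measurement-error variance.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Most likely hidden state sequence of an HMM, in the log domain.

    log_init[i]      log p(C_1 = i)
    log_trans[i][j]  log p(C_{t+1} = j | C_t = i)
    log_obs[t][j]    log-likelihood of the frame-t observation under state j
    Returns (best_path, best_log_score).
    """
    n_states = len(log_init)
    delta = [log_init[j] + log_obs[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_obs[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    best_score = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Toy example: sticky transitions override an ambiguous middle frame.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_obs = [[math.log(0.9), math.log(0.1)],
           [math.log(0.4), math.log(0.6)],
           [math.log(0.9), math.log(0.1)]]
path, score = viterbi(log_init, log_trans, log_obs)
print(path, score)
```

A frame-by-frame argmax would switch state at the middle frame; the chain prior keeps the decoded path in one state.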
Mel Frequency Cepstral Coefficients (MFCCs): Pre-emphasis -> STFT -> |.| -> Mel-scale -> log(.) -> DCT
Noise compensation (e.g. SPLICE, ALGONQUIN)
MFCC (VTS): $X_{noisy} = f(X_{clean}, N)$
[Diagram: additive relation among X_clean, X and E]
Deng, Droppo, Acero, IEEE Tr. SAP, 2005
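The MFCC chain (pre-emphasis, short-time spectrum, magnitude, mel filterbank, log, DCT) can be sketched for a single frame in plain Python. The parameter values (20 filters, 13 coefficients, pre-emphasis 0.97, Hamming window) are common defaults assumed here for illustration, not values from the slides, and the naive DFT is only suitable for short frames.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=13, preemph=0.97):
    """MFCCs of one frame: pre-emphasis, |DFT|, mel filterbank, log, DCT-II."""
    n = len(frame)
    # Pre-emphasis: y[t] = x[t] - a * x[t-1]
    emphasized = [frame[0]] + [frame[t] - preemph * frame[t - 1] for t in range(1, n)]
    # Hamming window, then magnitude spectrum for bins 0..n/2 (naive O(n^2) DFT)
    windowed = [v * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)))
                for t, v in enumerate(emphasized)]
    n_bins = n // 2 + 1
    magnitude = [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                         for t in range(n)))
                 for k in range(n_bins)]
    # Triangular filters with edges equally spaced on the mel scale
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n for k in range(n_bins)]
    log_energies = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[i - 1], edges_hz[i], edges_hz[i + 1]
        energy = 0.0
        for f, mag in zip(bin_hz, magnitude):
            if lo < f <= mid:
                energy += mag * (f - lo) / (mid - lo)
            elif mid < f < hi:
                energy += mag * (hi - f) / (hi - mid)
        log_energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    # DCT-II decorrelates the log filterbank energies
    return [sum(e * math.cos(math.pi * q * (i + 0.5) / n_filters)
                for i, e in enumerate(log_energies))
            for q in range(n_ceps)]


# Toy frame: two sinusoids sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) + 0.5 * math.sin(2 * math.pi * 1234 * t / 8000)
         for t in range(256)]
print(mfcc_frame(frame, 8000))
```

Because of the log stage, a gain change in the input shifts only the zeroth coefficient, while additive noise affects the cepstra nonlinearly, which is why the noisy-to-clean relation is written as a general function f.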
[Figure: graphical model with states C1-C3 and features X1-X3]
Multistream-HMM
Product-HMM
Other models: Asynchronous-HMM, Coupled-HMM, Dynamic Bayesian Networks
CUAVE
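CUAVE: 36 speakers (30 training, 6 testing), 5 x 10 per speaker
- Training: 1500 (30x5x10)
- Testing: 300 (6x5x10)
- Babble noise from NOISEX
- HMMs (8 states, 1 Gaussian per state), built with HTK
[Figure: results, AV vs. A]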
- AV-W-UC vs. A-UC: 28.7 %
- AV-UC vs. AV, AV-W-UC vs. AV-W: 20 %
- Product-HMM vs. Multistream-HMM: 1.2 %
MUSCLE (NoE) & HIWIRE (STREP)
A. Katsamanis, G. Papandreou, and P. Maragos, Face Active Appearance Modeling and Speech Acoustic Information to Recover Articulation, IEEE Trans. ASLP, 2009
MOCHA (CSTR, Univ. Edinburgh): 460 TIMIT sentences; phoneme-based
y, x; prior (Yehia, Rubin & Vatikiotis-Bateson, Speech Comm., 1998)
Canonical Correlation Analysis (CCA)
Viterbi, Markov -> Hiroya & Honda, IEEE TSAP 2004
HMM / MS-HMM
Visemes; MOCHA
Katsamanis et al., EUSIPCO 2008 / CVSP
X-rays
[Diagram blocks: Audiovisual Speech Inversion; Articulatory Parameter Extraction; Articulatory Speech Synthesis; Articulatory Model; Training]
ASPI (FET)
http://cvsp.cs.ntua.gr