National Taiwan University, Taiwan

Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features

Speaker:

Outline
- Introduction
- Proposed Approach
- Branching Entropy
- Feature Extraction
- Learning Method
- Experiments & Evaluation
- Conclusion

Key Term Extraction, NTU

Introduction

Definition
- Key term
  - Has a higher term frequency
  - Carries the core content
- Two types
  - Keyword
  - Key phrase
- Advantages
  - Indexing and retrieval
  - Capturing the relations between key terms and segments of documents

[Figure: key terms from a course lecture and their relations, e.g. "language model", "n-gram", "bigram", "HMM", "hidden Markov model", "acoustic model", "phone"]

Target: extract key terms from course lectures.

Proposed Approach

Automatic Key Term Extraction
[Figure: system overview. The speech signal of the original spoken documents is recognized by ASR, and the ASR transcriptions form an archive of spoken documents. Phrase identification applies branching entropy to the transcriptions; key term extraction then combines feature extraction with one of three learning methods (K-means Exemplar, AdaBoost, or a neural network) and outputs key terms such as "entropy" and "acoustic model".]
- First, branching entropy is used to identify phrases.
- Then, key terms are extracted by learning from a set of features.

Branching Entropy
How to decide the boundary of a phrase?
[Figure: the phrase "hidden Markov model", preceded by words such as "represent", "is", "of", "in" and followed by words such as "is", "can"]

- "hidden" is almost always followed by the same word.
- "hidden Markov" is almost always followed by the same word.
- "hidden Markov model" is followed by many different words, so a phrase boundary lies right after it.
- Branching entropy is defined to decide such possible boundaries.

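Definition of right branching entropy
Let X be a word string (e.g. "hidden Markov") and let x_i be the children of X, i.e. X followed by one more word (e.g. "hidden Markov model"). With f denoting frequency in the corpus:
- Probability of child x_i for X:
  $$P(x_i) = \frac{f(x_i)}{f(X)}$$
- Right branching entropy for X:
  $$H_r(X) = -\sum_{i} P(x_i) \log P(x_i)$$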
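A minimal Python sketch of the definition above, assuming the transcriptions are already tokenized into lists of words and that f counts every occurrence of a string. The function name and the toy corpus are illustrative only. The natural log is used; changing the base only rescales H_r.

```python
import math
from collections import Counter


def right_branching_entropy(sentences, x):
    """Right branching entropy H_r(X) of the word string x (a tuple of words).

    Children x_i are X followed by one more word, and P(x_i) = f(x_i) / f(X),
    with f counted over every position in the tokenized transcriptions.
    """
    n = len(x)
    f_x = 0               # f(X): occurrences of X
    children = Counter()  # f(x_i): occurrences of X followed by each next word
    for words in sentences:
        for start in range(len(words) - n + 1):
            if tuple(words[start:start + n]) == x:
                f_x += 1
                if start + n < len(words):  # X at a sentence end has no child
                    children[words[start + n]] += 1
    if f_x == 0:
        return 0.0
    # max() turns -0.0 into 0.0 when X always has the same child
    return max(0.0, -sum(c / f_x * math.log(c / f_x) for c in children.values()))


sentences = [s.split() for s in [
    "the hidden Markov model is",
    "a hidden Markov model can",
    "one hidden Markov model of",
    "this hidden Markov model in",
]]
for x in [("hidden",), ("hidden", "Markov"), ("hidden", "Markov", "model")]:
    print(" ".join(x), round(right_branching_entropy(sentences, x), 3))
# prints 0.0, 0.0, then 1.386 (= ln 4) for "hidden Markov model"
```

On this toy corpus the entropy stays at 0 for "hidden" and "hidden Markov" and jumps once the next word becomes unpredictable, mirroring the intuition on the earlier slides.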

Decision of right boundary
- Find the right boundary located between X and its children x_i where the right branching entropy rises, i.e. where the next word stops being predictable. H_r("hidden Markov model") is higher than H_r("hidden Markov"), so the boundary falls right after "model".
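The boundary decision can be sketched as follows. The exact inequality on the slide is not recoverable from this transcript, so the rule coded here is an assumption read from the "hidden Markov model" example: extend X one word at a time and place the boundary right after the first X whose right branching entropy exceeds that of the string one word shorter. `right_boundary` and its `max_len` cap are hypothetical names.

```python
import math
from collections import Counter


def right_branching_entropy(sentences, x):
    """H_r(X) = -sum_i P(x_i) log P(x_i), with P(x_i) = f(x_i) / f(X)."""
    n, f_x, children = len(x), 0, Counter()
    for words in sentences:
        for start in range(len(words) - n + 1):
            if tuple(words[start:start + n]) == x:
                f_x += 1
                if start + n < len(words):
                    children[words[start + n]] += 1
    if f_x == 0:
        return 0.0
    return max(0.0, -sum(c / f_x * math.log(c / f_x) for c in children.values()))


def right_boundary(sentences, words, start, max_len=5):
    """End index (exclusive) of the phrase that begins at words[start].

    ASSUMED rule: stop right after the first X whose right branching entropy
    is higher than that of X without its last word.
    """
    previous = None
    for end in range(start + 1, min(start + max_len, len(words)) + 1):
        h = right_branching_entropy(sentences, tuple(words[start:end]))
        if previous is not None and h > previous:
            return end
        previous = h
    return start + 1  # no rise found: keep the single word as the unit


sentences = [s.split() for s in [
    "the hidden Markov model is",
    "a hidden Markov model can",
    "one hidden Markov model of",
    "this hidden Markov model in",
]]
words = sentences[0]
end = right_boundary(sentences, words, 1)
print(words[1:end])  # ['hidden', 'Markov', 'model']
```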

Decision of left boundary
- Reverse the word order, e.g. X: "model Markov hidden".
- Find the left boundary located between X and x_i in the same way, using the branching entropy of the reversed strings.

Implementation with a PAT tree
- The probability of each child x_i of X and the right branching entropy of X are computed from a PAT tree.
- Example: X = "hidden Markov", with children x1 = "hidden Markov model" and x2 = "hidden Markov chain".
[Figure: PAT tree in which the path "hidden" -> "Markov" (node X) branches into "model" (x1) and "chain" (x2), with further branches labeled "variable", "state", and "distribution"]
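The slides store these counts in a PAT tree. The sketch below swaps in a plain, uncompressed word trie, which holds the same frequencies f(X) and f(x_i); a PAT tree additionally merges single-child paths to save memory. Following the left-boundary slide, left branching entropy is computed as right branching entropy on a second trie built over reversed word order. `WordTrie`, `build_tries`, and `max_len` are hypothetical names.

```python
import math


class Node:
    def __init__(self):
        self.count = 0      # f(X): occurrences of the word string ending at this node
        self.children = {}  # next word -> Node


class WordTrie:
    """Word trie over all n-grams with n <= max_len (simplified PAT-tree stand-in)."""

    def __init__(self, max_len=4):
        self.root = Node()
        self.max_len = max_len

    def add(self, words):
        # Insert the n-grams starting at every position, up to max_len words.
        for start in range(len(words)):
            node = self.root
            for w in words[start:start + self.max_len]:
                child = node.children.get(w)
                if child is None:
                    child = node.children[w] = Node()
                child.count += 1
                node = child

    def branching_entropy(self, x):
        """-sum_i P(x_i) log P(x_i), P(x_i) = f(x_i) / f(X); needs len(x) < max_len."""
        node = self.root
        for w in x:
            node = node.children.get(w)
            if node is None:
                return 0.0
        probs = [child.count / node.count for child in node.children.values()]
        return max(0.0, -sum(p * math.log(p) for p in probs))


def build_tries(sentences, max_len=4):
    """Right trie on the spoken word order, left trie on the reversed order."""
    right, left = WordTrie(max_len), WordTrie(max_len)
    for words in sentences:
        right.add(words)
        left.add(words[::-1])
    return right, left


sentences = [s.split() for s in [
    "we train a hidden Markov model",
    "the hidden Markov model is",
    "this hidden Markov chain can",
    "use hidden Markov model in",
]]
right, left = build_tries(sentences)
h_right = right.branching_entropy(("hidden", "Markov"))         # children: model x3, chain x1
h_left = left.branching_entropy(("model", "Markov", "hidden"))  # preceded by a, the, use
```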

Features are then extracted for each candidate term.

Feature Extraction
- Prosodic features, computed for each candidate term at its first appearance
  - The speaker tends to use a longer duration to emphasize key terms.
  - Duration (I-IV): normalized duration (max, min, mean, range). The duration of each phone a in the term is normalized by the average duration of phone a, and these four values summarize the normalized durations over the term.
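A sketch of the Duration (I-IV) computation, assuming phone-level durations for the term's first occurrence are available as (phone, seconds) pairs, for example from a forced alignment. That input format and the function names are hypothetical. The same `summarize` helper would give Pitch (I-IV) and Energy (I-IV) when applied to the F0 and energy values of the term's frames.

```python
from statistics import mean


def summarize(values):
    """max, min, mean and range: the four values behind each prosodic feature group."""
    return {"max": max(values), "min": min(values),
            "mean": mean(values), "range": max(values) - min(values)}


def duration_features(term_phones, avg_phone_duration):
    """Duration (I-IV) for one candidate term at its first appearance.

    term_phones: (phone, duration) pairs for that occurrence (hypothetical format).
    avg_phone_duration: average duration of each phone over the corpus.
    """
    # Normalize each phone's duration by that phone's corpus-wide average.
    normalized = [d / avg_phone_duration[p] for p, d in term_phones]
    return summarize(normalized)


avg = {"a": 0.125, "b": 0.25}
features = duration_features([("a", 0.25), ("b", 0.25), ("a", 0.125)], avg)
# normalized durations are [2.0, 1.0, 1.0]
```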

  - Higher pitch may represent significant information.
  - Pitch (I-IV): F0 (max, min, mean, range).
  - Higher energy emphasizes important information.

  - Energy (I-IV): energy (max, min, mean, range).
- Lexical features: well-known lexical features for each candidate term
  - TF: term frequency
  - IDF: inverse document frequency
  - TFIDF: tf * idf
  - PoS: the part-of-speech (PoS) tag
- Semantic features: key terms tend to focus on a limited number of topics.
  - Probabilistic Latent Semantic Analysis (PLSA) relates documents D_1, ..., D_N to latent topics T_1, ..., T_K through P(T_k|D_i), and latent topics to terms t_1, ..., t_n through P(t_j|T_k).
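For concreteness, here is a textbook EM fit of the PLSA model P(t|D_i) = sum_k P(T_k|D_i) P(t|T_k) in plain Python. The slides do not describe how their PLSA model was trained; the random initialization, the fixed iteration count, and the dict-based input format are assumptions of this sketch.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA, P(t | D_i) = sum_k P(T_k | D_i) P(t | T_k), by EM.

    counts: one dict per document D_i mapping term t -> n(D_i, t); documents
    are assumed non-empty. Returns (p_topic_given_doc, p_term_given_topic) with
    p_topic_given_doc[i][k] = P(T_k | D_i) and p_term_given_topic[k][t] = P(t | T_k).
    """
    rng = random.Random(seed)
    vocab = sorted({t for doc in counts for t in doc})

    def normalized(weights):
        total = sum(weights.values())
        return {key: w / total for key, w in weights.items()}

    # Random positive initialization (an assumption; any positive start works).
    p_zd = [normalized({k: rng.random() + 0.1 for k in range(n_topics)}) for _ in counts]
    p_wz = [normalized({t: rng.random() + 0.1 for t in vocab}) for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_zd = [dict.fromkeys(range(n_topics), 0.0) for _ in counts]
        acc_wz = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        for i, doc in enumerate(counts):
            for t, n in doc.items():
                # E-step: P(T_k | D_i, t) is proportional to P(T_k | D_i) P(t | T_k).
                joint = [p_zd[i][k] * p_wz[k][t] for k in range(n_topics)]
                total = sum(joint)
                for k in range(n_topics):
                    weight = n * joint[k] / total  # expected count of t in D_i from T_k
                    acc_zd[i][k] += weight
                    acc_wz[k][t] += weight
        # M-step: renormalize the expected counts.
        p_zd = [normalized(acc) for acc in acc_zd]
        p_wz = [normalized(acc) for acc in acc_wz]
    return p_zd, p_wz


counts = [{"hmm": 3, "model": 2}, {"hmm": 1, "model": 3}, {"entropy": 4, "model": 1}]
p_topic_given_doc, p_term_given_topic = plsa(counts, n_topics=2)
```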

  - Latent Topic Probability (LTP): the figure contrasts the topic distribution of a key term with that of a non-key term. LTP (I-III) describes this probability distribution by its mean, variance, and standard deviation.
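A sketch of LTP (I-III) as three summary statistics of a term's per-topic probabilities. Which probability the slides summarize is not recoverable here; the toy inputs below are assumed values of P(T_k|t) for one key term and one non-key term. When the input is a normalized distribution over K topics its mean is always 1/K, so in that case the variance and standard deviation are what separate the two terms.

```python
import math
from statistics import fmean, pvariance


def ltp_features(topic_probs):
    """LTP (I-III): mean, variance and standard deviation of per-topic probabilities."""
    var = pvariance(topic_probs)
    return {"mean": fmean(topic_probs), "variance": var, "std": math.sqrt(var)}


key_term = [0.85, 0.05, 0.05, 0.05]      # concentrated on one topic (toy values)
non_key_term = [0.25, 0.25, 0.25, 0.25]  # spread evenly over topics (toy values)
print(ltp_features(key_term))
print(ltp_features(non_key_term))
```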

  - Latent Topic Significance (LTS): the within-topic to out-of-topic ratio of a term, i.e. its within-topic frequency divided by its out-of-topic frequency. Again, a key term is contrasted with a non-key term.
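One way to compute LTS, sketched under an assumption the slides only hint at: each document's count of the term is split between topic T_k and the rest in proportion to P(T_k|D_i), giving "within-topic" and "out-of-topic" frequencies. `lts`, `lts_features`, and `eps` are hypothetical names; the input layout matches a PLSA topic-by-document table.

```python
import math
from statistics import fmean, pvariance


def lts(term, counts, p_topic_given_doc, k, eps=1e-12):
    """Latent topic significance of `term` for topic T_k (ASSUMED weighting).

    within  = sum_i n(t, D_i) * P(T_k | D_i)
    outside = sum_i n(t, D_i) * (1 - P(T_k | D_i))
    eps avoids division by zero when the term never occurs outside the topic.
    """
    within = sum(doc.get(term, 0) * p[k] for doc, p in zip(counts, p_topic_given_doc))
    outside = sum(doc.get(term, 0) * (1.0 - p[k]) for doc, p in zip(counts, p_topic_given_doc))
    return within / (outside + eps)


def lts_features(term, counts, p_topic_given_doc):
    """LTS (I-III): mean, variance and standard deviation of LTS over all topics."""
    n_topics = len(p_topic_given_doc[0])
    values = [lts(term, counts, p_topic_given_doc, k) for k in range(n_topics)]
    var = pvariance(values)
    return {"mean": fmean(values), "variance": var, "std": math.sqrt(var)}


counts = [{"hmm": 4, "the": 5}, {"the": 5}, {"the": 5}]
p_topic_given_doc = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(lts_features("hmm", counts, p_topic_given_doc))  # concentrated in topic 0
print(lts_features("the", counts, p_topic_given_doc))  # spread evenly
```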

  - Semantic features (Feature Name: Feature Description):
    - LTP (I-III): Latent Topic Probability (mean, variance, standard deviation)
    - LTS (I-III): Latent Topic Significance (mean, variance, standard deviation)
  - Latent Topic Entropy (LTE): a non-key term has a higher LTE, while a key term has a lower LTE.
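A sketch of LTE as the entropy of a term's topic distribution. Obtaining P(T_k|t) from the PLSA term probabilities P(t|T_k) by Bayes' rule with a uniform topic prior is an assumption made for illustration, as are the toy PLSA values and the helper names.

```python
import math


def topic_distribution(term, p_term_given_topic):
    """P(T_k | t) from P(t | T_k) by Bayes' rule, ASSUMING a uniform prior P(T_k).

    The term must have nonzero probability under at least one topic.
    """
    likelihoods = [topic.get(term, 0.0) for topic in p_term_given_topic]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


def latent_topic_entropy(topic_probs):
    """LTE = -sum_k P(T_k | t) log P(T_k | t); zero-probability topics contribute 0."""
    return max(0.0, -sum(p * math.log(p) for p in topic_probs if p > 0))


p_term_given_topic = [  # toy PLSA output for two topics
    {"viterbi": 0.30, "the": 0.20},
    {"viterbi": 0.01, "the": 0.20},
]
lte_key = latent_topic_entropy(topic_distribution("viterbi", p_term_given_topic))
lte_non_key = latent_topic_entropy(topic_distribution("the", p_term_given_topic))
# the key-term-like "viterbi" gets the lower LTE
```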


Learning Methods
Learning approaches are then used to extract key terms.
- Unsupervised learning: K-means Exemplar
  - Transform each term into a vector in the LTS (Latent Topic Significance) space.
  - Run K-means.
  - The terms in the same cluster focus on a single topic, so they are related to the key term of that cluster.
  - Find the centroid of each cluster and take the term at (closest to) the centroid as the key term; this key term can represent the topic.
- Supervised learning
  - Adaptive Boosting (AdaBoost)
  - Neural network
  - Both automatically adjust the weights of the features to produce a classifier.
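The K-means Exemplar steps can be sketched as below, assuming each term already has a vector of LTS values (one per latent topic). Seeding the centroids from k randomly chosen terms and choosing the member nearest each final centroid as the exemplar are assumptions of this sketch; the toy 2-D vectors stand in for real LTS vectors.

```python
import math
import random


def kmeans_exemplars(vectors, k, n_iter=20, seed=0):
    """Cluster term vectors with K-means and return one exemplar term per cluster.

    vectors: term -> vector, e.g. the term's LTS value for every latent topic.
    The exemplar is the member term closest to its cluster centroid.
    """
    rng = random.Random(seed)
    terms = sorted(vectors)
    centroids = [list(vectors[t]) for t in rng.sample(terms, k)]  # init from k random terms
    for _ in range(n_iter):
        # Assignment step: each term joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for t in terms:
            nearest = min(range(k), key=lambda c: math.dist(vectors[t], centroids[c]))
            clusters[nearest].append(t)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dims = range(len(centroids[c]))
                centroids[c] = [sum(vectors[t][d] for t in members) / len(members) for d in dims]
    return [min(members, key=lambda t: math.dist(vectors[t], centroids[c]))
            for c, members in enumerate(clusters) if members]


vectors = {
    "hidden Markov model": (0.0, 0.0), "HMM": (0.1, 0.0), "state": (0.0, 0.1),
    "entropy": (10.0, 10.0), "information": (10.2, 10.0), "bit": (10.0, 10.1),
}
print(kmeans_exemplars(vectors, 2))
```

Picking an actual member rather than the centroid itself guarantees that each extracted key term is a real candidate term.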

Experiments
- Corpus
  - NTU lecture corpus
  - Mandarin Chinese embedded with English words, e.g. a Mandarin sentence containing the English words "solution" and "viterbi algorithm" ("Our solution is viterbi algorithm")
  - Single speaker
  - 45.2 hours
- ASR accuracy
  - Acoustic model (AM): a bilingual (Chinese/English) speaker-independent (SI) model, adapted to the target speaker with some of that speaker's data
  - Language model (LM): trigram interpolation of a background LM trained on out-of-domain corpora and an LM trained on an in-domain corpus

    Language   Char Acc (%)
    Mandarin   78.15
    English    53.44
    Overall    76.26

- Reference key terms
  - Annotations from 61 students who have taken the course
  - If the k-th annotator labeled N_k key terms, he gave each of them a score of 1/N_k and gave 0 to all other terms.
  - Rank the terms by the sum of the scores given by all annotators.
  - Choose the top N terms from the list, where N is the average of N_k.
  - N = 154 key terms (59 key phrases and 95 keywords)
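The reference-list procedure can be sketched as below. Rounding the average N_k to an integer and breaking score ties alphabetically are assumptions; the slides do not say how either was handled. The annotator sets are toy data.

```python
from collections import defaultdict


def reference_key_terms(annotations):
    """Build the reference key term list from annotator labels.

    annotations: list of sets; annotations[k] holds the N_k terms chosen by annotator k.
    Each of annotator k's terms receives 1/N_k; other terms receive 0 from that annotator.
    Returns the top-N terms, with N = average N_k rounded to an integer (assumption).
    """
    scores = defaultdict(float)
    for labeled in annotations:
        for term in labeled:
            scores[term] += 1.0 / len(labeled)
    n = round(sum(len(a) for a in annotations) / len(annotations))
    ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties broken alphabetically
    return ranked[:n]


annotations = [{"HMM", "entropy"}, {"HMM"}, {"HMM", "entropy", "phone"}]
print(reference_key_terms(annotations))  # ['HMM', 'entropy']
```

The 1/N_k weighting gives every annotator the same total vote, so students who label many terms do not dominate the ranking.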

- Evaluation
  - Unsupervised learning: set the number of extracted key terms to N.
  - Supervised learning: 3-fold cross validation.

Feature effectiveness (neural network, keywords, ASR transcriptions)
- Feature sets: Pr: prosodic, Lx: lexical, Sm: semantic
- F-measure (%):
  - Single feature sets (Pr, Lx, Sm): the three bars read 20.78, 35.63, and 42.86
  - Pr + Lx: 48.15
  - Pr + Lx + Sm: 56.55
- Each feature set alone gives an F1 of about 20% to 42%.
- Prosodic and lexical features are additive.
- All three feature sets are useful.
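A sketch of the evaluation quantities, assuming exact string matching between extracted and reference terms and round-robin fold assignment for the 3-fold cross validation; neither detail is given on the slides, and the function names are hypothetical. When the number of extracted terms equals the number of reference terms, as in the unsupervised setting with N terms, precision, recall, and F-measure coincide.

```python
def f_measure(extracted, reference):
    """Precision, recall and F-measure of an extracted key term set."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(extracted)
    recall = hits / len(reference)
    return precision, recall, 2 * precision * recall / (precision + recall)


def three_fold_splits(items):
    """Yield (train, test) pairs for 3-fold cross validation (round-robin folds)."""
    folds = [items[i::3] for i in range(3)]
    for i in range(3):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


p, r, f = f_measure(["a", "b", "c", "d"], ["a", "b", "e"])  # P = 0.5, R = 2/3, F = 4/7
splits = list(three_fold_splits(list(range(7))))
```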

Overall performance (U: unsupervised, S: supervised; AB: AdaBoost, NN: Neural Network)
- Baseline: conventional TFIDF scores without branching entropy, with stop word removal and PoS filtering

    F-measure (%)   Manual   ASR
    Baseline        23.38    20.78
    U: TFIDF        51.95    43.51
    U: K-means      55.84    52.60
    S: AB           62.39    57.68
    S: NN           67.31    62.70

- Branching entropy performs well.
- K-means Exemplar outperforms TFIDF.
- Supervised approaches are better than unsupervised approaches.
- Supervised learning using the neural network gives the best results.
- The performance on ASR transcriptions is slightly worse than on manual transcriptions, but reasonable.

Conclusion
- We propose a new approach to extract key terms.
- The performance can be improved by
  - identifying phrases by branching entropy

  - using prosodic, lexical, and semantic features together
- The results are encouraging.

Thanks for your attention! Q&A

NTU Virtual Instructor: http://speech.ee.ntu.edu.tw/~RA/lecture