Lexical Analysis Uses formalism of Regular Languages Regular Expressions Deterministic Finite Automata (DFA) Non-deterministic Finite Automata (NDFA) RE NDFA DFA minimal DFA (F)Lex uses RE as input, builds lexor 1
DFAs: Formal Definition DFA M = (Q, , , q0, F) Q = states finite set = alphabet finite set = transition function function in Q Q q0 = initial/starting state q0 Q F = final states FQ 2 DFAs: Example strings over {a,b} with next-to-last symbol = a
a b a a ab a a b b aa b
b a b a b ba a bb b 3 Nondeterministic Finite Automata
Nondeterminism implies having a choice. Multiple possible transitions from a state on a symbol. (q,a) is a set of states : Q Pow(Q) Can be empty, so no need for error/nonsense state. Acceptance: exist path to a final state? I.e., try all choices. Also allow transitions on no input: : Q ( {}) Pow(Q) 4 NFAs: Formal Definition NFA M = (Q, , , q0, F) Q = states a finite set = alphabet
a finite set = transition function a total function in Q ( {}) Pow(Q) q0 = initial state q0 Q F = final states FQ 5 NFAs: Example strings over {a,b} with next-to-last symbol = a a
a a Loop until we guess which is the next-to-last a. 6 NFAs: Example strings over {0,1,2} having (either 0-or-more 0s or 0-or-more 1s) followed by 0-or-more 2s 0 2 0s
1 1s 2s 7 Regular Expressions Regular expression (over )
a where a r+r r r r* where r,r regular (over ) Notational shorthand: r0 = , ri = rri-1 r+ = rr* 8
RE NFA Defined inductively on structure of RE. This construction produces NFA with single final state. 6 cases: , , a, r+r, rr, r* 9 RE NFA: q0 qf
Accepts nothing since no edge to final state. 10 RE NFA: q0 11 RE NFA: a q0 a qf 12
RE NFA: r+r q0 qf q0 qf q0 qf
edges guess whether to use r or r. 13 RE NFA: rr q0 q0 qf q0
qf qf Could conflate q0 with q0, qf with qf. 14 RE NFA: r* q0 q0
qf qf Can loop r as many times as desired or skip it. 15 RE NFA: Example (0+01)* 0
0 1 16
RE NFA: Notes Most constructions produce very large NFAs. Not optimal for size. But easy to construct. 17 NFA -> DFA Subset Construction Complicated but well described in the text
Section 3.7.1 (pp 152-155), Algorithm 3.20 (2nd edition) In section 3.6 (pp 116-121) in 1st edition 18 Minimizing DFA Partition states of DFA, D, into two sets, final states, and non-final states. Continue until no more partitions are
needed For each partition, P, split the DFA states of P so that, for each subpartition, all DFA states in that partition have the same transition for each input symbol, x. 19