Build an HMM-based (Hidden Markov Model) POS (part-of-speech) tagger from scratch and implement the Viterbi algorithm, using the Penn Treebank as the training corpus. Modify the Viterbi algorithm to handle unknown words using at least two techniques.
The techniques used are:
- Build the vanilla Viterbi-based POS tagger
- Modify the Viterbi heuristic algorithm to consider only the transition probability for UNKNOWN words
- Modify the Viterbi heuristic algorithm to tag UNKNOWN words using a rule-based approach
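
The three variants above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function names (`train_hmm`, `rule_tag`, `viterbi`), the `unknown` parameter, the regex rules, and the universal-style tag names (`NOUN`, `VERB`, ...) are all assumptions made for the example. It uses plain count-based probabilities with no smoothing, and a standard dynamic-programming Viterbi, which may differ from the heuristic version the project uses.

```python
import re
from collections import Counter, defaultdict

def train_hmm(tagged_sents):
    """Collect transition and emission counts from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in tagged_sents:
        prev = "<s>"              # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    vocab = {w for counts in emit.values() for w in counts}
    return trans, emit, vocab

def prob(counter, key):
    """Relative frequency of key in counter (0.0 if the counter is empty)."""
    total = sum(counter.values())
    return counter[key] / total if total else 0.0

# Hypothetical morphological rules; first match wins, NOUN is the fallback.
RULES = [
    (re.compile(r"^-?\d+([.,]\d+)*$"), "NUM"),
    (re.compile(r"(ing|ed)$"), "VERB"),
    (re.compile(r"ly$"), "ADV"),
    (re.compile(r"(ous|ful|able)$"), "ADJ"),
]

def rule_tag(word):
    for pattern, tag in RULES:
        if pattern.search(word):
            return tag
    return "NOUN"

def viterbi(words, trans, emit, vocab, unknown="vanilla"):
    """Tag words; `unknown` is 'vanilla', 'transition', or 'rules'."""
    if not words:
        return []
    tags = list(emit)
    best = []  # best[i][tag] = (score of best path ending in tag, previous tag)
    for i, raw in enumerate(words):
        word = raw.lower()
        known = word in vocab
        guess = rule_tag(word)
        column = {}
        for tag in tags:
            if known or unknown == "vanilla":
                # Vanilla: an unseen word has emission 0 for every tag,
                # so all paths collapse to 0 and the result is arbitrary.
                e = prob(emit[tag], word)
            elif unknown == "transition":
                e = 1.0  # ignore emission, let transitions decide
            else:
                # Rules: force the rule's tag; if that tag was never seen
                # in training, fall back to transition-only behaviour.
                e = 1.0 if guess not in emit or tag == guess else 0.0
            if i == 0:
                column[tag] = (prob(trans["<s>"], tag) * e, None)
            else:
                column[tag] = max(
                    (best[i - 1][p][0] * prob(trans[p], tag) * e, p)
                    for p in tags
                )
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

To try it, pass a list of tagged sentences (each a list of `(word, tag)` pairs) to `train_hmm`, then call `viterbi(words, trans, emit, vocab, unknown="transition")` or `unknown="rules"` on a tokenized sentence. Setting the emission term to 1 for unknown words keeps the path probability from dropping to zero, which is the failure mode of the vanilla tagger on unseen words.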