Name 2 approaches for text classification
Rule-based
Supervised ML
Pros and cons of rule-based and supervised ML approaches
rule-based
+ Precision can be high
- Very expensive to build and maintain
supervised ML
+ Easier to maintain, usually more accurate
- Needs training data
Formula: Bayes' rule
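The standard statement of Bayes' rule, written for text classification. The symbols are assumptions chosen for illustration, not taken from the card: c is a class and d is a document.

```latex
\[
P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}
\]
```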
Formula: Naive Bayes
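The usual form of the Naive Bayes decision rule, as a sketch. The notation is an assumption: C is the set of classes and x_1, ..., x_n are the features (for example, the words) of the document. The denominator P(d) of Bayes' rule is dropped because it is the same for every class, and the "naive" step assumes the features are conditionally independent given the class.

```latex
\[
\hat{c} = \operatorname*{arg\,max}_{c \in C} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\]
```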
What is a limitation of standard classification, and how can it be overcome?
Limitation: Assumption that individual cases are disconnected and independent
-> Hidden Markov Models
Explain Sequence labeling
Each token in a sequence is assigned a label
Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors
Example: part-of-speech (POS) tagging
2 problems of POS tagging with token-by-token classification
Not easy to integrate information from the categories of tokens on both sides
Difficult to propagate uncertainty between decisions and “collectively” determine the most likely joint assignment of categories to all of the tokens in a sequence
4 Key features of the HMM
Fixed set of states
State transition probabilities
Fixed set of possible outputs
For each state: a probability distribution over the possible outputs -> emission probabilities
Formulas for transition and emission probabilities in NLP
Transition Probabilities
Emission Probabilities
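A common way to estimate both quantities for POS tagging is by relative frequency in a tagged corpus. This is a sketch under assumed notation, not taken from the card: t_i is the tag at position i, w_i is the word at position i, and C(.) is a count over the training corpus.

```latex
\[
P(t_i \mid t_{i-1}) = \frac{C(t_{i-1},\, t_i)}{C(t_{i-1})}
\qquad\text{(transition)}
\]
\[
P(w_i \mid t_i) = \frac{C(t_i,\, w_i)}{C(t_i)}
\qquad\text{(emission)}
\]
```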
Name of the dynamic programming solution for HMM
Viterbi algorithm
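A minimal sketch of Viterbi decoding for an HMM. The function name `viterbi`, the dictionary-based tables, and the toy tag set and probabilities are all made up for illustration; for long sequences, log probabilities would normally be used to avoid underflow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the most likely state sequence."""
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    # back[t][s]: predecessor of state s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximises (path so far) * (transition)
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model (hypothetical numbers, chosen only to make the example readable)
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.6, "dog": 0.1},
}

prob, path = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(path)  # ['DET', 'NOUN', 'VERB']
```

Each step keeps only the best path into every state, so the cost grows with (sequence length) x (number of states)^2 instead of enumerating every possible tag sequence.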