Name 3 approaches for text classification
Rule-Based – Uses handcrafted linguistic rules.
Supervised Machine Learning – Trains a model using labeled data.
Unsupervised Learning – Uses clustering or topic modeling without labels.
Pros and cons of rule-based text classification
+ Precision can be high; rules are human-readable
- Very expensive to build and maintain (see the sketch below)
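A minimal sketch of the rule-based approach; the keyword rules and category names below are made-up examples, not from any real system:

```python
# Rule-based text classification: handcrafted, human-readable keyword rules.
# RULES and its categories are hypothetical illustrations.
RULES = {
    "sports": ["match", "goal", "tournament"],
    "finance": ["stock", "market", "dividend"],
}

def classify(text: str) -> str:
    tokens = text.lower().split()
    for category, keywords in RULES.items():
        if any(kw in tokens for kw in keywords):
            return category
    return "unknown"  # no rule fired

print(classify("The stock market rallied today"))  # -> finance
```

Every new rule has to be written and debugged by hand, which is where the maintenance cost comes from.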
What are the pros and cons of supervised learning for text classification?
Pros: Typically more accurate and easier to maintain than rule-based systems.
Cons: Requires labeled training data (see the sketch below).
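A hedged sketch of the supervised route using scikit-learn; the toy texts and labels are invented for illustration:

```python
# Supervised text classification: learn a model from labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great goal in the match", "stock market fell sharply",
         "tournament final tonight", "dividends rose this quarter"]
labels = ["sports", "finance", "sports", "finance"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)                       # this step needs labeled data
print(model.predict(["stock market report"]))  # expected: ['finance']
```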
Formula: Bayes' rule
P(c | d) = P(d | c) · P(c) / P(d)
Formula: Naive Bayes classifier
ĉ = argmax_c P(c) · ∏_i P(w_i | c)   (assumes the words w_i are conditionally independent given the class c)
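A worked toy application of the Naive Bayes decision rule in log space; all probabilities below are made up:

```python
import math

# Naive Bayes by hand: pick argmax_c [ log P(c) + sum_i log P(w_i | c) ].
# The priors and likelihoods are hypothetical numbers.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.05, "meeting": 0.001},
    "ham":  {"free": 0.002, "meeting": 0.03},
}

def log_score(cls, words):
    return math.log(priors[cls]) + sum(math.log(likelihoods[cls][w]) for w in words)

words = ["free", "meeting"]
print(max(priors, key=lambda c: log_score(c, words)))  # -> ham
```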
What is a limitation of standard classification and how can we overcome these limitations?
Limitation: The assumption that individual cases are disconnected and independent
-> Hidden Markov Models
Explain Sequence labeling
Each token in a sequence is assigned a label
Labels of tokens are dependent on the labels of other tokens in the sequence, particularly their neighbors
Example: POS tagging (illustrated below)
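A tiny illustration of what a labeled sequence looks like; the sentence and tags are chosen for illustration:

```python
# Sequence labeling: exactly one label per token; neighboring labels matter
# (e.g. a determiner makes a following noun more likely).
tokens = ["Time", "flies", "like", "an", "arrow"]
tags   = ["NOUN", "VERB", "ADP", "DET", "NOUN"]
for token, tag in zip(tokens, tags):
    print(f"{token}/{tag}")
```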
2 problems of POS tagging as token-by-token classification
It is not easy to integrate information about the categories of tokens on both sides.
It is difficult to propagate uncertainty between decisions and “collectively” determine the most likely joint assignment of categories to all tokens in the sequence.
4 Key features of the HMM
Fixed set of states
State transition probabilities
Fixed set of possible outputs
For each state: a probability distribution over the possible outputs -> emission probabilities (see the toy model below)
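The four components spelled out as data for a toy two-state POS model; all numbers are hypothetical:

```python
# A toy HMM: the four key features as plain data structures.
states = ["NOUN", "VERB"]                       # fixed set of states
outputs = ["time", "flies"]                     # fixed set of possible outputs
transition = {                                  # P(next state | current state)
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emission = {                                    # per-state output distribution
    "NOUN": {"time": 0.8, "flies": 0.2},
    "VERB": {"time": 0.1, "flies": 0.9},
}
```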
Formulas for the HMM transition and emission probabilities (POS-tagging notation)
Transition probabilities: P(t_i | t_{i-1}) — the probability of a state (tag) given the previous state
Emission probabilities: P(w_i | t_i) — the probability of an output (word) given the current state
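Both distributions can be estimated by maximum-likelihood counts from a tagged corpus; a minimal sketch with a made-up two-sentence corpus:

```python
from collections import Counter

# MLE estimates: P(t_i | t_{i-1}) = C(t_{i-1}, t_i) / C(t_{i-1})
#                P(w_i | t_i)     = C(t_i, w_i)     / C(t_i)
corpus = [[("time", "NOUN"), ("flies", "VERB")],
          [("fruit", "NOUN"), ("flies", "NOUN")]]

tag_count, bigram, emit = Counter(), Counter(), Counter()
for sentence in corpus:
    prev = "<s>"                       # sentence-start pseudo-tag
    for word, tag in sentence:
        bigram[(prev, tag)] += 1
        emit[(tag, word)] += 1
        tag_count[tag] += 1
        prev = tag

print(bigram[("NOUN", "VERB")] / tag_count["NOUN"])  # P(VERB|NOUN) = 1/3
print(emit[("VERB", "flies")] / tag_count["VERB"])   # P(flies|VERB) = 1.0
```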
Name of the dynamic programming solution for HMM
Viterbi algorithm
What is the Viterbi algorithm, and how does it improve HMM decoding?
The Viterbi algorithm efficiently finds the most likely sequence of hidden states by:
Combining transition and emission probabilities with dynamic programming, keeping only the best path into each state at each step.
Backtracking through stored back-pointers to reconstruct the best sequence.
Reducing complexity to O(m·s^2) (for m tokens and s states) compared to brute-force O(s^m) over all tag sequences (see the sketch below).
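A compact Viterbi sketch over a toy two-state tagger; the states and probabilities are hypothetical:

```python
# Viterbi: dynamic programming over (time step, state) cells, then backtrack.
states = ["NOUN", "VERB"]
start = {"NOUN": 0.8, "VERB": 0.2}
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit = {"NOUN": {"time": 0.8, "flies": 0.2}, "VERB": {"time": 0.1, "flies": 0.9}}

def viterbi(obs):
    best = [{s: start[s] * emit[s][obs[0]] for s in states}]  # time step 0
    back = [{}]
    for t in range(1, len(obs)):
        best.append({}); back.append({})
        for s in states:
            # only the best predecessor survives: O(m * s^2) work overall
            p = max(states, key=lambda q: best[t - 1][q] * trans[q][s])
            best[t][s] = best[t - 1][p] * trans[p][s] * emit[s][obs[t]]
            back[t][s] = p
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):      # backtrack to recover the path
        path.insert(0, back[t][path[0]])
    return path

print(viterbi(["time", "flies"]))  # -> ['NOUN', 'VERB']
```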
What are common ways to represent words in text classification?
One-Hot Encoding – Sparse vector representation.
TF-IDF – Weighs words based on importance.
Word Embeddings (Word2Vec, GloVe) – Dense, semantic representations.
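A short comparison of the first two representations with scikit-learn (the two documents are toy examples); dense embeddings need a trained model, so they are only described in a comment:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat", "the dog barked"]

# Binary bag-of-words: sparse 0/1 presence vectors over the vocabulary.
print(CountVectorizer(binary=True).fit_transform(docs).toarray())

# TF-IDF: the same sparse space, but terms shared across documents
# (like "the") are down-weighted relative to rarer, more informative ones.
print(TfidfVectorizer().fit_transform(docs).toarray().round(2))

# Word embeddings (Word2Vec, GloVe) instead map each word to a dense learned
# vector so that semantically similar words end up close together.
```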