Numeric Attributes (previously: only nominal attributes)
standard method: binary splits
NEW: numeric attributes have many possible split points
basic approach:
place split points halfway between values
extended approach:
evaluate every possible split point of the attribute
choose best split point
the best split point creates two subsets with the maximal value of the splitting measure (e.g., information gain)
=> computationally demanding
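A minimal sketch of the extended approach (toy Python, not the actual C4.5 implementation; function and variable names are illustrative): sort the values, place candidate thresholds halfway between consecutive distinct values, and keep the one with maximal information gain.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Evaluate every candidate split point of a numeric attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # halfway between values
        left = [l for v, l in pairs if v <= threshold]
        right = [l for v, l in pairs if v > threshold]
        gain = (base
                - len(left) / len(pairs) * entropy(left)
                - len(right) / len(pairs) * entropy(right))
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain

# e.g. best_numeric_split([64, 65, 68, 69, 70], ['n', 'y', 'y', 'y', 'n'])
```

Evaluating all candidate thresholds for every numeric attribute at every node is what makes this computationally demanding.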
Binary vs. multi-way Splits
Splitting on a nominal attribute exhausts all information in that attribute
a nominal attribute is tested at most once on any path in the tree
C4.5 uses binary splits for numeric attributes - useful, but not in every case
in some cases a single split will not increase the information
What kind of split does C4.5 use?
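A small sketch (illustrative only) of the two split types discussed above: a multi-way split on a nominal attribute creates one branch per value and thereby exhausts the attribute, while a binary threshold split on a numeric attribute can be applied again further down the same path.

```python
from collections import defaultdict

def multiway_split(rows, attr):
    """One branch per distinct nominal value -> the attribute is used up."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr]].append(row)
    return dict(branches)

def binary_split(rows, attr, threshold):
    """Binary threshold split -> the same numeric attribute may be tested
    again with a different threshold deeper in the tree."""
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return left, right
```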
Missing Values
ID3 cannot handle cases with missing values (NA)
C4.5 allows missing values, denoted by “?”
Information gain works as before - unknown values are not included in the calculations
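A minimal sketch of this rule (assumed row format: list of dicts with '?' marking a missing value; not C4.5's actual code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_with_missing(rows, attr, label):
    """Information gain for a nominal attribute, skipping missing values."""
    known = [r for r in rows if r[attr] != '?']  # unknown values are excluded
    if not known:
        return 0.0
    base = entropy([r[label] for r in known])
    cond = 0.0
    for v in set(r[attr] for r in known):
        subset = [r[label] for r in known if r[attr] == v]
        cond += len(subset) / len(known) * entropy(subset)
    gain = base - cond
    # C4.5 additionally scales the gain by the fraction of cases with a known
    # value, so attributes with many missing values are penalized.
    return len(known) / len(rows) * gain
```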
Pruning
When the decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers
Goal: prevent overfitting
2 Strategies
Prepruning - stop growing a branch when the information becomes unreliable
stop creating a subtree when:
the number of samples is below a threshold
the information gain is below a threshold
the depth of the tree is beyond a threshold
or based on a statistical significance test
Postpruning - prune the fully grown tree and discard its unreliable parts (see sketch after the operations below)
2 operations
subtree replacement
subtree raising
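A hedged sketch of both strategies on a toy tree structure (the Node class, the thresholds, and the error estimates are assumptions, not C4.5's code; only subtree replacement is shown, not subtree raising):

```python
class Node:
    def __init__(self, children=None, leaf_error=0.0):
        self.children = children or []  # empty list -> node is a leaf
        self.leaf_error = leaf_error    # estimated error if this node were a leaf

def should_stop(n_samples, info_gain, depth,
                min_samples=20, min_gain=0.01, max_depth=10):
    """Prepruning: stop creating a subtree when a threshold is violated."""
    return (n_samples < min_samples
            or info_gain < min_gain
            or depth > max_depth)

def subtree_error(node):
    """Estimated error of a (sub)tree: sum over its leaves."""
    if not node.children:
        return node.leaf_error
    return sum(subtree_error(c) for c in node.children)

def subtree_replacement(node):
    """Postpruning: replace a subtree by a leaf if the leaf is not worse."""
    node.children = [subtree_replacement(c) for c in node.children]
    if node.children and node.leaf_error <= subtree_error(node):
        node.children = []  # collapse the unreliable subtree into a leaf
    return node
```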
What is the preferred Pruning method?
C5.0
C5.0: The Successor
Variable misclassification costs
Case weight attribute that quantifies the importance of each observation (case)
Winnowing (feature selection) function integrated for high-dimensional data
Allows for several interesting data types such as dates, times, timestamps
Compared with C4.5, C5.0's trees are noticeably smaller and C5.0 runs faster (reported speed-up factors of 3, 5, and 15, depending on the comparison).
Boosting
technique for generating and combining multiple classifiers to improve predictive accuracy
boosting can reduce classification accuracy for noisy cases
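A minimal sketch of the boosting idea, here using scikit-learn's AdaBoostClassifier as an illustrative stand-in (not C5.0's own boosting; the dataset is just an example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)       # example dataset
booster = AdaBoostClassifier(n_estimators=100)   # 100 sequentially boosted weak trees
scores = cross_val_score(booster, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```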
Ensemble Learning
Problem: classifiers with low bias tend to have high variance
approach: use several classifiers
selection: each classifier is an expert in some local neighborhood of the feature space
fusion: all classifiers are trained over the entire feature space, and then combined to obtain a composite classifier with lower variance and lower error
i.e., combine several smaller models into one larger composite model
Steps of Ensemble Methods
Data sampling and selection
completely random
following a strategy
training of the component classifiers
mechanism to combine the classifiers
discrete predictions: simple or weighted majority voting
continuous predictions: mean rule, weighted average, min, max, median
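A minimal sketch of the combination step (assumed simple implementations of the rules listed above):

```python
from collections import Counter
from statistics import mean, median

def majority_vote(predictions, weights=None):
    """Simple or weighted majority voting over discrete class labels."""
    weights = weights or [1.0] * len(predictions)
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]

def combine_continuous(predictions, rule="mean"):
    """Mean / min / max / median rule for continuous predictions."""
    rules = {"mean": mean, "min": min, "max": max, "median": median}
    return rules[rule](predictions)

# majority_vote(['A', 'B', 'A'])                 -> 'A'
# combine_continuous([0.2, 0.4, 0.9], "median")  -> 0.4
```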
Random Forests
each tree depends on a collection of random variables
each tree is fit to a bootstrap sample from the original data (bagging)
the best split is found over a randomly selected subset of predictor variables at each node
combination
discrete: unweighted voting
continuous: unweighted average
parameters to choose for a random forest
number of predictors randomly selected at each node:
discrete (classification): square root of the total number of predictors
continuous (regression): total number of predictors / 3
number of trees in the forest
choose a rather large number; the error rate usually converges with an increasing number of trees
tree size -> smallest node size for splitting or maximum number of terminal nodes
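A sketch with scikit-learn (one possible implementation, used here as an assumption; the parameter values mirror the rules of thumb above):

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: mtry ~ sqrt(number of predictors), many trees, small leaves.
clf = RandomForestClassifier(
    n_estimators=500,      # rather large; error rate converges with more trees
    max_features="sqrt",   # random subset of predictors considered at each node
    min_samples_leaf=1,    # smallest node size controls tree size
)

# Regression: mtry ~ number of predictors / 3, slightly larger leaves.
reg = RandomForestRegressor(
    n_estimators=500,
    max_features=1 / 3,    # fraction of predictors considered per split
    min_samples_leaf=5,
)
```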
Decision Trees: Evaluation
Pros
Decision trees provide understandable decision rules
Fast classification
Continuous & discrete variables can be processed
Attributes providing the most classification power can be identified
Easy extension: random forests provide better results (ensemble learning with several trees)
Cons
Not well suited for estimation tasks
Error-prone when there are many classes and a small dataset
Considerable calculation effort for model building