Decision Trees
non-parametric
supervised
used for classification and regression
CART
variant of decision tree algorithm
predictive algorithm in ML
classification and regression
learns from labelled data to predict unseen data
Goal: create model that predicts the value of a target variable by learning simple decision rules inferred from the data variables
Nodes are split into sub-nodes by searching for the best homogeneity
the root node contains the whole training set and is split into two by choosing the best attribute and threshold value
splitting criterion: evaluate all possible splits and select the one that most reduces the impurity of the resulting subsets
splitting stops when the impurity falls below a certain threshold
prevent overfitting by pruning -> removes nodes that contribute little to the model's accuracy
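Minimal sketch in Python (assuming scikit-learn is available; its tree implementation is an optimised CART variant; the toy data and parameter values are made up for illustration):

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[2.0, 1.0], [3.5, 0.0], [1.0, 4.0], [4.0, 3.0]]   # toy feature vectors
y_class = [0, 0, 1, 1]                                  # class labels
y_reg = [1.2, 0.8, 3.4, 2.9]                            # continuous targets

# classification tree: splits chosen to reduce Gini impurity;
# max_depth / min_samples_leaf act as pre-pruning against overfitting
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=1)
clf.fit(X, y_class)
print(clf.predict([[3.0, 2.0]]))

# regression tree: same splitting idea, impurity is the error within a node
reg = DecisionTreeRegressor(max_depth=3)
reg.fit(X, y_reg)
print(reg.predict([[3.0, 2.0]]))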
Which model can handle classification and regression?
How does the CART algorithm work?
best split point of each input is obtained
based on best split points of each input in Step 1, the new “best” split point is identified
Split the chosen input according to the best split point
continue splitting until a stopping rule is satisfied or no further splitting available
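Rough sketch of the split search for one numeric input (Gini impurity as criterion; the functions and data are illustrative, not from a library):

# find the best threshold for one numeric feature by minimising
# the weighted Gini impurity of the two resulting subsets
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    best_thr, best_impurity = None, float("inf")
    for thr in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= thr]
        right = [l for v, l in zip(values, labels) if v > thr]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_thr, best_impurity = thr, weighted
    return best_thr, best_impurity

print(best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))  # -> (2.0, 0.0)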
What are the pros and cons of CART
Pros
simple, interpretable results
non-parametric and non-linear
implicitly performs feature selection
outliers have no significant effect
requires minimal supervision and produces an easy-to-understand model
cons
Overfitting
Greedy algorithm -> results not guaranteed to be optimal
Trees might be biased if a specific class dominates the dataset
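Hedged sketch of countering the overfitting point with cost-complexity pruning (assuming scikit-learn; the ccp_alpha value is only illustrative):

# cost-complexity (post-)pruning: larger ccp_alpha prunes more nodes,
# trading training accuracy for a simpler, less overfit tree
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print(unpruned.get_n_leaves(), pruned.get_n_leaves())  # pruned tree typically has fewer leaves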
ID3 Algorithm
builds a decision tree from a fixed set of training examples
top down, greedy
classify future samples
explain the 5 steps
Calculate the Information Gain of each feature
Split dataset S into subsets using the feature for which the Information Gain is maximal
Make a decision tree node using the feature with maximum Information Gain
If all rows belong to the same class, make the current node a leaf node labelled with that class
Repeat with the remaining features until no features are left or the tree has only leaf nodes
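Compact recursive sketch of these 5 steps (plain Python; the tiny dataset and all names are illustrative):

# ID3 sketch: rows are dicts of nominal attributes plus a "label" key
import math
from collections import Counter

def entropy(rows):
    counts = Counter(r["label"] for r in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr):
    n = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attrs):
    labels = set(r["label"] for r in rows)
    if len(labels) == 1:            # all rows in one class -> leaf
        return labels.pop()
    if not attrs:                   # no features left -> majority-class leaf
        return Counter(r["label"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))   # feature with max gain
    tree = {best: {}}
    for value in set(r[best] for r in rows):               # split on that feature
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best])
    return tree

rows = [{"outlook": "sunny", "windy": "no", "label": "play"},
        {"outlook": "rainy", "windy": "yes", "label": "stay"},
        {"outlook": "sunny", "windy": "yes", "label": "play"}]
print(id3(rows, ["outlook", "windy"]))  # -> {'outlook': {'sunny': 'play', 'rainy': 'stay'}}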
What are the 2 Criteria of ID3
Entropy (Overall level of uncertainty)
Information Gain (information most useful for classification)
ID3
Entropy and Information Gain
Calculations
Hints:
Entropy: H(S) = - Σ p_i · log2(p_i)   (p_i = proportion of class i in S)
Information Gain: Gain(S, A) = H(S) - Σ_v (|S_v| / |S|) · H(S_v)   (sum over the values v of attribute A)
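Small worked numeric example for these two formulas (made-up counts: 10 samples, 6 positive / 4 negative, split into two subsets of 5):

import math

def H(*counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

parent = H(6, 4)                        # entropy of the full set: ~0.971 bits
left, right = H(4, 1), H(2, 3)          # entropies of the two subsets (5 samples each)
gain = parent - (5/10) * left - (5/10) * right
print(round(parent, 3), round(gain, 3)) # 0.971 0.125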
advantages
simple and inexpensive
fast at classifying unknown records
easy to interpret for small-sized trees
robust to noise
handles irrelevant attributes easily
disadvantages
the space of possible trees is exponentially large -> greedy search is often unable to find the best tree
produces small trees, but not necessarily the smallest possible tree
can only handle nominal (categorical) attributes
each decision boundary involves only a single attribute
C4.5
based on ID3
gain ratio for attribute selection
permits numeric attributes
new: sensible handling of missing values
pruning to deal with noise
one of the best known and most widely used learning algorithms
What is the problem with the information gain criterion?
Attributes with more possible values will often have the highest information gain
=> attributes get selected because they have many values, not because of a real relation to the class
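Illustration of the bias (made-up data): an id-like attribute with one unique value per row gets the maximum possible gain although it is useless for prediction:

# the id column yields pure single-row subsets, so its gain equals the
# full entropy of the dataset and beats the genuinely informative attribute
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    n = len(labels)
    rem = 0.0
    for v in set(column):
        sub = [l for x, l in zip(column, labels) if x == v]
        rem += len(sub) / n * entropy(sub)
    return entropy(labels) - rem

labels = ["yes", "yes", "no", "no"]
print(info_gain([1, 2, 3, 4], labels))          # 1.0  (unique id per row, useless)
print(info_gain(["a", "a", "a", "b"], labels))  # ~0.31 (actually informative, but rated lower)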
Gain Ratio
2 Criteria
Information Gain (Information being the most useful for classification)
Split Info (measures the potential information generated by splitting data based on an attribute)
C4.5
Steps
2 Steps
Calculate Split Information
Compute Gain Ratio
Information Gain
determines the best attribute for splitting the data by how well it separates the classes
Split Info
helps penalize splits that divide the data too finely by considering the information of the split itself
Gain Ratio
combines both to select an attribute that splits the data well and avoids overfitting
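Small sketch of the two-step computation, split info then gain ratio (plain Python; data and names are illustrative):

# gain ratio = information gain / split info;
# split info is the entropy of the partition sizes, so attributes that split
# the data into many tiny subsets get penalised
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(column, labels):
    n = len(labels)
    remainder, split_info = 0.0, 0.0
    for v in set(column):
        sub = [l for x, l in zip(column, labels) if x == v]
        p = len(sub) / n
        remainder += p * entropy(sub)
        split_info -= p * math.log2(p)                  # step 1: split information
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0  # step 2: gain ratio

labels = ["yes", "yes", "no", "no"]
print(gain_ratio([1, 2, 3, 4], labels))          # 0.5: id-like attribute is penalised
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0: informative attribute now wins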
What problem of ID3 is fixed by C4.5
Classification Problem
ID3 uses Information Gain -> problem: attributes with more values are rated higher
=> C4.5 normalizes the information gain of each candidate attribute, taking into account the number and sizes of branches
C4.5 uses the gain ratio -> a better measure of how useful an attribute is for the classification problem
Which model can handle missing values within the calculation?