Machine Learning Pipeline
“Clean” ML Design
Explore the Data
Prepare the Data
Data Division
Training set
Validation/developmet set: Which you use to tune parameters, select features, and make other decisions regarding the learning algorithm. Sometimes also called the hold-out cross validation set .
test set: which you use to evaluate the performance of the algorithm, but not to make any decisions regarding what learning algorithm or parameters to use.
Precision
of a cat classifier is the fraction of images in the dev (or test) set it labeled as cats that really are cats
Recall
is the percentage of all cat images in the dev (or test) set that it correctly labeled as a cat.
F1 score
2/((1/Precision)+(1/Recall))
Intersection over Union (IoU)
Also known as Jaccard Index, is a
measure to describe the extent of over- lap of predicted segmentation
mask with ground truth. It is defined as the intersection area between
the predicted segmentation and the ground truth, divided by the area of
union between the predicted segmentation mask and the ground truth:
IoU = J(A, B) = |A ∩ B|/(A ∪ B) = TP/(TP + FP + FN)
where A and B denote the ground truth and the predicted segmentation,
respectively, it goes between 0 and 1.
Hints for Model Selection
Bias-Variance Tradeoff
The Unreasonable Effectiveness of Data
Effort per ML Project Task
Last changed7 months ago