What are two supervised machine learning methods for tabular data?
Regression
Classification
What type of machine learning can be applied to time-series data?
Forecasting
What are three machine learning methods that can be applied to images? Are they supervised or unsupervised?
Image classification
Image segmentation
Object detection
All three are supervised methods.
Name three supervised machine learning methods for video media.
Video classification
Video object tracking
Video action recognition
What are the three supervised methods for analyzing text with machine learning?
Sentiment analysis
Entity extraction
Translation
What are Google’s Responsible AI Practices?
General Best Practices: Solicit feedback early in the design process. Engage with users and test to ensure the solution meets the need.
Fairness: …
Interpretability: …
Privacy: …
Security: …
Name two unsupervised machine learning methods for tabular data.
K-means clustering
Principal components analysis (PCA)
Are there any unsupervised methods for text data?
Yes; topic modeling!
Are machine learning methods, such as collaborative filtering and recommendations, unsupervised or supervised?
They are both for mixed data types!
Why use an ML metric?
An ML metric (or a suite of metrics) is used to determine if the trained model is accurate enough.
What is a Confusion matrix?
A table of counts comparing actual labels against predicted labels for the positive and negative classes in a binary classification problem.
For example:

                          Predicted
Actual              Positive Prediction    Negative Prediction
Positive Class      5                      2
Negative Class      3                      990
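A minimal sketch of how such a matrix could be computed in Python, assuming scikit-learn is available; the label lists here are illustrative, not taken from the cards:

```python
# Sketch: build a 2x2 confusion matrix for a binary classifier.
# Labels: 1 = positive class, 0 = negative class.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0]   # actual labels (illustrative only)
y_pred = [1, 1, 0, 1, 0, 0, 0]   # model predictions (illustrative only)

# With labels=[1, 0], rows are actual [positive, negative] and
# columns are predicted [positive, negative], matching the table above.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[2 1]   -> actual positive: 2 predicted positive, 1 predicted negative
#  [1 3]]  -> actual negative: 1 predicted positive, 3 predicted negative
```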
How do you calculate recall? What do you use it for?
Recall measures the percentage of actual positive examples that the model correctly detects. When it's important to have a high positive-identification rate, recall is the metric to use: false negatives appear in its denominator, so missed positives lower the score.
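In symbols, using the confusion-matrix counts (TP = true positives, FN = false negatives):

\[
\text{Recall} = \frac{TP}{TP + FN}
\]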
How do you calculate precision? What do you use it for?
When you're concerned about keeping false positives low, use precision. It measures the percentage of positive predictions that are actually correct.
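In symbols (TP = true positives, FP = false positives):

\[
\text{Precision} = \frac{TP}{TP + FP}
\]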
If you’re concerned about false positives and false negatives, what’s the metric you will use? Is there anything special about it? How would you calculate it?
You'd use the F1 score to account for false positives and false negatives at the same time. It's the harmonic mean of precision and recall.
To calculate it, you’d need the following formula:
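\[
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]

For example, with the confusion matrix above (TP = 5, FP = 3, FN = 2): precision = 5/8 = 0.625, recall = 5/7 ≈ 0.714, so F1 ≈ 0.667.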
Where does the ROC come from in Area Under the Curve Receiver Operating Characteristic (AUC ROC)? What does the graphical plot show?
It's from signal processing. The graphical plot summarizes the performance of a binary classification model by showing the true positive rate against the false positive rate across classification thresholds.
How would you compare two binary classification models?
When you have two models, you get two ROC curves, and the way to compare them is to calculate the area under the curve (AUC). Once you have chosen the model based on AUC, you can find the threshold point that maximizes your F1 (as indicated in Figure 1.2).
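A minimal sketch of comparing two models by AUC with scikit-learn (assumed available); y_scores_a and y_scores_b stand in for the two models' predicted probabilities and are illustrative names, not from the source:

```python
# Sketch: compare two binary classifiers by area under the ROC curve.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                        # actual labels (illustrative)
y_scores_a = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.9])   # model A probabilities
y_scores_b = np.array([0.3, 0.2, 0.6, 0.5, 0.4, 0.9, 0.1, 0.7])    # model B probabilities

auc_a = roc_auc_score(y_true, y_scores_a)
auc_b = roc_auc_score(y_true, y_scores_b)
print(f"AUC model A: {auc_a:.3f}, AUC model B: {auc_b:.3f}")

# roc_curve returns the false/true positive rates at each threshold,
# which is what you would plot (and scan to find the threshold that maximizes F1).
fpr_a, tpr_a, thresholds_a = roc_curve(y_true, y_scores_a)
```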
What’s the benefit of using the AUC ROC curve to measure model success?
This method has the following advantages:
Scale‐invariant: It measures how well the predictions are ranked and not their absolute values.
Classification threshold‐invariant: It helps you measure the model irrespective of what threshold is chosen.
What are regression metrics to measure model fit?
MAE The mean absolute error (MAE) is the average absolute difference between the actual values and the predicted values.
RMSE The root‐mean‐squared error (RMSE) is the square root of the average squared difference between the target and predicted values. If you are worried that your model might incorrectly predict a very large value and want to penalize the model, you can use this. Ranges from 0 to infinity.
RMSLE The root-mean-squared logarithmic error (RMSLE) metric is similar to RMSE, except that it uses the natural logarithm of (predicted value + 1) and (actual value + 1). It is an asymmetric metric that penalizes under-prediction (the predicted value is lower than the actual value) more heavily than over-prediction.
MAPE Mean absolute percentage error (MAPE) is the average absolute percentage difference between the labels and the predicted values. Choose MAPE when you care about the proportional difference between actual and predicted values.
R2 R-squared (R2) is the square of the Pearson correlation coefficient (r) between the labels and predicted values. This metric ranges from zero to one; a higher value generally indicates a better fit for the model.
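A minimal NumPy sketch of these formulas (assuming NumPy only; y_true and y_pred are illustrative arrays, and R2 here follows the card's definition as the squared Pearson correlation):

```python
# Sketch: common regression metrics computed directly with NumPy.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (illustrative)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values (illustrative)

mae   = np.mean(np.abs(y_true - y_pred))                                # MAE
rmse  = np.sqrt(np.mean((y_true - y_pred) ** 2))                        # RMSE
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))    # RMSLE: log(x + 1)
mape  = np.mean(np.abs((y_true - y_pred) / y_true)) * 100               # MAPE, as a percentage
r2    = np.corrcoef(y_true, y_pred)[0, 1] ** 2                          # squared Pearson r

print(mae, rmse, rmsle, mape, r2)
```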