What types of pattern recognition can be employed?
regression
classification
clustering
What and how does Regression predict?
predict continuous values
supervised
e.g.
house pricing
sales
a person's weight
What and how does classification predict?
predict discrete values
object detection
spam detection
cancer detection
What and how does clustering predict?
unsupervised
genome patterns
google news
lidar processing
What is a motivation to use regression in Automotive Technology?
vehicle parameters often only roughly known
-> estimation by regression techniques
financial relations -> e.g. car pricing
What is the general motivation to use regression? How is the training data structured?
-> given data and model structure
=> possible to predict outcome of a process or system
training dataset usually represents the system only at sparse points
contains lots of noise
allows usage of information in simulation, optimization…
Process of regression?
learning:
model structure -> predictive model <- training data
prediction:
previously unseen sets of input variables
->
predictive model
predictions about output variables
What is the essential question in both machine learning and statistics?
how can we extract information from data
and use it to reason and predict
in previously unseen cases?
(learning)
What is the relation between statistics and machine learning?
nearly all classical ML methods can be reinterpreted in terms of statistics
focus in machine learning is mainly on prediction
statistics focusses on relation analysis
lots of advanced regression techniques build upon a statistical interpretation of regression
What is the linear basis function model?
y = b + Σ w_i·Φ_i(x)
y := output variable
b := bias term
w_i := weight parameters
Φ_i := basis functions
y = [1 Φ_1(x) … Φ_k(x)] · [b w_1 … w_k]^T
How can one represent the dataset as matrix?
stack the basis-function row of every data point into the design matrix Φ
row n: [1 Φ_1(x_n) … Φ_k(x_n)]
stack the targets into a vector y
=> the model becomes y = Φ·w (see the sketch below)
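A minimal numpy sketch of such a design matrix, here for a polynomial basis; the data values and the degree k are illustrative assumptions:

```python
import numpy as np

# Assumed toy dataset: N samples of the independent variable with noisy targets.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.9, 3.2, 4.1, 4.8])

# Design matrix for the polynomial basis Φ_i(x) = x^i up to degree k:
# one row per data point, one column per basis function (first column = bias).
k = 3
Phi = np.column_stack([x**i for i in range(k + 1)])   # shape (N, k+1)
print(Phi)
```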
What are some extensions that allow non-linear regression?
nonlinear optimization (fit curve through datapoints…)
gaussian process
(kernels, non-parametric, Bayesian interpretation)
SVM
(kernels, non-parametric, mostly used for classification)
What is the workflow of obtaining model parameters?
Choose model
Find Parameters
Validate Model
What has to be done when choosing a model?
type of basis functions
number of basis functions
What has to be done when finding parameters?
specify loss function (-> measure how good model fits the data)
specify constraints on the parameters
use explicit solution or iterative algorithms to obtain parameters
What has to be done to validate the model?
use second dataset which was not used for training
to evaluate the performance of your model
this ensures ability to generalize to unseen data
What types of models (functions) are there?
linear function
sinusoidal function
polynomial function
gaussian basis function
What are attributes of polynomials for regression?
globally defined on independent variable domain
design matrix (model) becomes ill-conditioned for large input domains with standard polynomials
hyperparameter:
polynomial degree
Φ0(x) = 1
Φ1(x) = x
Φ2(x) = x^2
Φ3(x) = x^3
…
Φi(x) = x^i
What are attributes of gaussians for regression?
locally defined on independent variable domain
sparse design matrix
infinitely differentiable (e^x…)
hyperparameters
number of gaussian functions
width s of each basis function
mean μ of each basis function
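A minimal numpy sketch of a Gaussian-basis design matrix; the number of basis functions, their means, and the shared width s are illustrative hyperparameter choices, not values from the lecture:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """Locally defined Gaussian bump with mean mu and width s."""
    return np.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

# Illustrative hyperparameter choices (assumptions):
x = np.linspace(0.0, 10.0, 50)
mus = np.linspace(0.0, 10.0, 5)   # 5 Gaussians, means spread over the input domain
s = 1.5                           # shared width

# Design matrix: bias column plus one column per Gaussian basis function.
Phi = np.column_stack([np.ones_like(x)] + [gaussian_basis(x, mu, s) for mu in mus])
print(Phi.shape)   # (50, 6)
```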
How do outliers affect polynomial vs gaussian?
polynomial -> gets worse globally (a single outlier affects the whole fit…)
gaussian -> gets worse only locally…
What are some other basis functions?
sigmoid
bounded
globally defined
bin functions (box around [-1,1])
locally defined
no continuity
piecewise linear
not bounded
max(0,x)
What is the purpose of the loss function? Thus what is the best model we can obtain?
measures accuracy of a model based on training dataset
best model we can obtain is minimum loss model
choice of loss function fundamental in regression problem
What are pros and cons of MSE/L2 loss?
pros:
very important in practical applications
solution can be easily obtained analytically
cons:
not robust to outliers
examples:
basic regression
energy optimization
control applications
Formula and graph of L2
L = Σ_n (y_n − ŷ_n)^2
graph: parabola (quadratic in the residual)
Pros and Cons MAE/L1 loss?
robust to outliers
no analytical solution
non-differentiable at the origin
financial applications
Formula and graph of L1
L = Σ_n |y_n − ŷ_n|
graph: V-shape with a kink at the origin
What are pros and cons of the huber loss?
combines strengths and weaknesses of L1 and L2
robust + differentiable
more hyperparameters
no analytical solution
Formula and graph of Huber loss
for residual e and threshold δ:
L(e) = e^2/2 for |e| <= δ
L(e) = δ·(|e| − δ/2) otherwise
graph: parabola near the origin, straight lines in the tails
Comparison Loss functions
L2 is differentiable
L1 more intuitive
Huber combines theoretical strengths of both
=> start with L2
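A minimal numpy sketch comparing the three losses on a set of residuals; the Huber threshold delta = 1.0 is an assumed hyperparameter:

```python
import numpy as np

def l2_loss(e):
    return e ** 2

def l1_loss(e):
    return np.abs(e)

def huber_loss(e, delta=1.0):
    # quadratic near zero (differentiable), linear in the tails (robust)
    return np.where(np.abs(e) <= delta,
                    0.5 * e ** 2,
                    delta * (np.abs(e) - 0.5 * delta))

e = np.array([-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0])   # residuals
print(l2_loss(e))
print(l1_loss(e))
print(huber_loss(e))
```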
How to analytically solve regression?
optimization problem
with model y = w1*x + b
insert datapoints
=> optimal solutions are obtained where the gradient is zero…
=> calculate gradient w.r.t w and b
=> set to 0
=> find optimal parameter for given y,x pairs
What is the general form of analytical MSE optimization?
with model y = Φ·w and loss L = ||y − Φ·w||^2
setting the gradient to zero yields the normal equations
w* = (Φ^T·Φ)^(−1)·Φ^T·y
Steps for optimization calculation?
insert the datapoints into the model
write down the MSE loss
calculate the gradient w.r.t. the parameters
set it to zero
solve the resulting linear system for the optimal parameters
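A minimal numpy sketch of these steps in closed form, fitting y = w1*x + b to assumed toy data; lstsq solves the normal equations without forming the inverse explicitly:

```python
import numpy as np

# Assumed toy data for y ≈ w1*x + b.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 5.1, 7.2, 8.8])

Phi = np.column_stack([np.ones_like(x), x])   # columns: bias, x

# Normal equations (Phi^T Phi) w = Phi^T y; lstsq solves them in a
# numerically stable way instead of inverting Phi^T Phi explicitly.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
b, w1 = w
print(f"b = {b:.3f}, w1 = {w1:.3f}")
```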
When to use sequential analysis?
apply regression during operation
not enough memory to store all data points / not all data points available at the same time…
What is a possible solution to sequential analysis?
recursive least squares (RLS)
How to do sequential analysis?
we have a memory matrix P and the weights w
both are updated iteratively with every new sample
How to update the weights?
standard RLS form (notation may differ from the lecture):
gain: k_n = P_n·φ_n / (1 + φ_n^T·P_n·φ_n)
weights: w_(n+1) = w_n + k_n·(y_n − φ_n^T·w_n)
How to update memory matrix?
P_(n+1) = P_n − k_n·φ_n^T·P_n
(with the gain k_n from the weight update)
What is crucial in sequential analysis?
initialization
possibilities:
use first datapoints available
use values which you assume to be right (e.g. based on physical knowledge / laws)
What is a forgetting factor required for?
some applications show slowly varying conditions in the long term
but can be considered stationary on short to medium time periods
-> e.g. aging of products leads to slight parameter changes…
other example: vehicle mass is usually constant over a significant time period…
=> the RLS algorithm can deal with this by introducing a forgetting factor
-> reduces the weight of old samples…
How is forgetting factor introduced?
sample i is weighted by λ^(n−i), λ ∈ (0,1]
in RLS (standard form): gain becomes k_n = P_n·φ_n / (λ + φ_n^T·P_n·φ_n)
memory update becomes P_(n+1) = (P_n − k_n·φ_n^T·P_n) / λ
λ = 1 recovers ordinary RLS; typical values lie slightly below 1
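A minimal numpy sketch of RLS with a forgetting factor, following the standard textbook form above; the data, the initialization values, and λ = 0.99 are illustrative assumptions:

```python
import numpy as np

def rls_step(w, P, phi, y, lam=0.99):
    """One recursive least-squares update for a new sample (phi, y)."""
    phi = phi.reshape(-1, 1)
    gain = P @ phi / (lam + phi.T @ P @ phi)   # Kalman-like gain
    error = y - float(phi.T @ w)               # prediction error on the new sample
    w = w + gain * error                       # weight update
    P = (P - gain @ phi.T @ P) / lam           # memory matrix update with forgetting
    return w, P

# Initialization: w from prior knowledge (here zeros), P large (low confidence).
w = np.zeros((2, 1))
P = 1e3 * np.eye(2)
for x_n, y_n in [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]:
    w, P = rls_step(w, P, np.array([1.0, x_n]), y_n)
print(w.ravel())   # approaches [1, 2] for y ≈ 1 + 2x
```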
What are numerical iterative solutions?
solution to regression
important for large-scale problems
and non-quadratic loss functions
What are some popular numerical iterative methods?
gradient descent
gauss-newton
levenberg-marquardt
Pros and cons of numerical iterative solutions?
very generic
knowledge about numeric optimization necessary
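A minimal sketch of one such iterative method: plain (sub)gradient descent on the non-quadratic L1 loss, where no analytic solution exists. The data, step-size schedule, and iteration count are illustrative assumptions:

```python
import numpy as np

# Assumed toy data with one gross outlier at x = 4.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.2, 7.1, 30.0])
Phi = np.column_stack([np.ones_like(x), x])

w = np.zeros(2)
for t in range(5000):
    residual = Phi @ w - y
    grad = Phi.T @ np.sign(residual) / len(y)   # subgradient of the L1 loss
    w -= 0.5 / (1 + t) * grad                   # diminishing step size
print(w)   # roughly [1, 2]: the outlier barely moves the robust fit
```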
Why is there a need for constraints on weights?
weights can be interpreted as physical quantities
temperature
spring constants
mass
valid range known for the weights
introduce c1 <= w <= c2…
=> improves robustness
more difficult to solve
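A minimal sketch of bound-constrained least squares (c1 <= w <= c2) using scipy's lsq_linear; the data and the bound values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Assumed toy data and assumed physically motivated bounds.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 2.6, 4.4, 6.5])
Phi = np.column_stack([np.ones_like(x), x])

# Suppose the valid ranges are: bias in [0, 1], slope in [1.5, 2.5].
result = lsq_linear(Phi, y, bounds=([0.0, 1.5], [1.0, 2.5]))
print(result.x)
```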
What is the decision tree on how to solve a regression problem?
quadratic cost function?
no -> numeric iterative
yes -> 2
are there parameter constraints?
yes -> numeric iterative
no -> 3
is the dataset very large?
no -> 4
is all data available instantaneously?
yes -> analytic
no -> sequential analytic
What is the spectrum of choosing the correct model?
underfitted <-> well done <-> overfitted
underfitted:
not enough features
wrong structure
overfitted:
too many features
irrelevant features
overly complex model
What are the effects of overfitting?
overfitting: failure to generalize properly between datapoints
cost function decreases with increased model complexity
noise and irrelevant effects become too important
What is the most common cause of overfitting? How can it be avoided?
large model complexity most common cause
advisable: look for least complex model with decent performance on training and testing dataset
-> more likely to generalize and less prone to numerical issues and outliers
What is difficult w.r.t. overfitting?
difficult to quantify whether a model is overfitting
depends on application and structure of dataset…
What is the curse of dimensionality w.r.t. overfitting?
overfitting occurs if:
data points sparse
model complexity high
sparsity of data points is difficult to grasp
sparsity increases fast with increased input dimensionality
=> helpful if data points “nicely” located in sample space…
Why do we use validation datasets?
difficult to judge overfitting in high-dimensional domains and autonomous systems
=> standard technique: separate data in training and validation data
What is the structure of using validation data?
split the dataset before fitting
-> training set: used to fit the model
-> validation set: used only to estimate the generalization error
What are common pitfalls with validation datasets?
validation dataset must reflect future properties of underlying physical relationship
do not reuse validation datasets
if used again and again for testing the model
-> it gets incorporated into the modeling process
-> does not give expected results anymore
split before fitting model is essential
-> 2/3 for training and 1/3 for validation is a good start (see the sketch below)
!! visualize data as much as possible
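A minimal numpy sketch of the split-before-fitting rule with the 2/3 / 1/3 ratio mentioned above; the dataset is an assumed noisy linear toy example:

```python
import numpy as np

# Minimal "split before fitting" sketch: 2/3 training, 1/3 validation.
rng = np.random.default_rng(seed=0)
N = 30
x = np.linspace(0.0, 5.0, N)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=N)   # assumed noisy linear data

idx = rng.permutation(N)                 # shuffle, then split once, up front
n_train = (2 * N) // 3
train, val = idx[:n_train], idx[n_train:]

Phi = np.column_stack([np.ones(N), x])
w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)   # fit on training data only
val_mse = np.mean((Phi[val] @ w - y[val]) ** 2)             # evaluate on held-out data
print(val_mse)
```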
What is k-fold cross validation?
e.g. have limited data set size
-> one may not want to remove a substantial part for validation
=> smaller validation sets to estimate the true prediction error by splitting data in multiple "folds"
=> variance of estimation error is indicator for model stability…
=> k=4 -> split the dataset into 4 parts and, in each iteration, use a different quarter for validation… (see the sketch below)
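A minimal numpy sketch of k-fold cross-validation with k = 4; the helper name kfold_mse and the toy dataset are illustrative assumptions:

```python
import numpy as np

# Minimal k-fold cross-validation sketch (k = 4, as in the example above).
def kfold_mse(Phi, y, k=4, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                        # one quarter for validation
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
        errors.append(np.mean((Phi[val] @ w - y[val]) ** 2))
    # the spread of the fold errors indicates model stability
    return np.mean(errors), np.std(errors)

x = np.linspace(0.0, 5.0, 40)
y = 1.0 + 2.0 * x + np.random.default_rng(1).normal(scale=0.3, size=40)
Phi = np.column_stack([np.ones_like(x), x])
print(kfold_mse(Phi, y))
```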
What is a basic goal for which regularization is used to achieve?
Basic goal: choose model structure based on underlying physical principles and not on characteristics of dataset
How does polynomial and gauss behave w.r.t. overfitting?
polynomial
tend to have larger coefficients for sparse datasets
gaussian:
tend to overfit locally, leading to single, large coefficients
=> circumvent this using regularization…
How can regularization mitigate the aforementioned problems (poly and gauss…)
penalize high coefficients in the optimization
weighting of the penalty term gives intuitive hyperparameter to control model complexity
What types of regularization did we discuss?
L2 / Tikhonov
L1
Features of L2 regularization?
prevents overfitting well
analytic solution is available as an extension to the MSE problem
difficult to apply and tune in high-dimensional feature spaces
L2 regularization term?
add λ·Σ w_i^2 = λ·||w||_2^2 to the loss
Features L1 regularization?
tends to produce sparse solutions
-> can therefore be applied for feature selection
sparse solutions mean several coefficients go to zero…
Term L1 regularization?
add λ·Σ |w_i| = λ·||w||_1 to the loss
Other names L1 / L2?
L1: Lasso Regression
L2: Ridge Regression
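A minimal numpy sketch of ridge (L2-regularized) regression, whose analytic solution extends the normal equations with λ·I; lasso (L1) has no such closed form. The toy dataset and λ values are illustrative assumptions:

```python
import numpy as np

# Minimal ridge sketch: the L2 penalty extends the normal equations with
# lam * I, keeping an analytic solution.
def ridge_fit(Phi, y, lam):
    k = Phi.shape[1]
    # w = (Phi^T Phi + lam*I)^(-1) Phi^T y, solved as a linear system.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ y)

# Assumed toy setup: high-degree polynomial prone to overfitting.
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + np.random.default_rng(0).normal(scale=0.1, size=10)
Phi = np.column_stack([x**i for i in range(9)])

print(np.abs(ridge_fit(Phi, y, lam=1e-8)).max())   # nearly unregularized: large coefficients
print(np.abs(ridge_fit(Phi, y, lam=0.1)).max())    # regularized: coefficients shrink
```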
How is regularization applied?
use a high-dimensional input space for a test model
apply ridge regression
regularized solutions perform far better at interpolation…
note: must evaluate between sample points…
How can one tune regularization?
plot train and test loss for different lambda (regularization weight)
-> the lower the λ, the higher the effective model complexity (risk of overfitting)
-> the higher the λ, the stronger the penalty and the more underfitting…
How can over- and underfitting be formalized?
bias and variance…
=> study the predictor's performance on previously unseen data
-> evaluate the bias and variance of the outputs
=> high bias -> model is underfitting, as it is wrong for many data points
=> high variance -> model is overfitting, as it is very wrong for a couple of data points…
Again intuition of bias and variance?
consider repeating fitting lots of times
-> high variance -> very noisy fits, although on average correct…
-> high bias -> less noisy, but systematically incorrect fits…
General problem bias variance?
in general, one cannot guarantee low bias and low variance at the same time
-> have to balance them according to our objective
=> low bias and variance requires large, high quality datasets
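A minimal simulation of this intuition, assuming a sine as the true underlying function: refit polynomials of different degrees on many noisy datasets and inspect bias and variance of the predictions at one test point (degrees, noise level, and test point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
f = lambda x: np.sin(2 * np.pi * x)      # assumed true underlying function
x_test = 0.25

for degree in (1, 3, 7):
    preds = []
    for _ in range(500):                 # repeat the fit many times
        y = f(x) + rng.normal(scale=0.2, size=x.size)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias = preds.mean() - f(x_test)      # systematic offset -> underfitting
    variance = preds.var()               # spread over refits -> overfitting
    print(f"degree {degree}: bias = {bias:+.3f}, variance = {variance:.3f}")
```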
What is a problem of general regression?
independent variable (x-axis) assumed to be noise free
=> often not true in engineering applications
other methods to consider this:
total least squares
Principal Component Analysis (PCA)