What types of pattern recognition can be employed?
regression
classification
clustering
What and how does Regression predict?
predict continuous values
supervised
e.g.
house pricing
sales
a person's weight
What and how does classification predict?
predict discrete values
object detection
spam detection
cancer detection
What and how does clustering predict?
unsupervised
genome patterns
google news
lidar processing
What is a motivation to use regression in Automotive Technology?
vehicle parameters often only roughly known
-> estimation by regression techniques
financial relations -> e.g. car pricing
What is the general motivation to use regression? How is the training data structured?
-> given data and model structure
=> possible to predict outcome of a process or system
training dataset usually represents the system only at sparse points
contains lots of noise
allows usage of information in simulation, optimization…
Process of regression?
learning:
model structure -> predictive model <- training data
prediction:
previously unseen sets of input variables
->
predictive model
predictions about output variables
What is the essential question in both machine learning and statistics?
how can we extract information from data
and use it to reason and predict
in previously unseen cases?
(learning)
What is the relation between statistics and machine learning?
nearly all classical ML methods can be reinterpreted in terms of statistics
focus in machine learning is mainly on prediction
statistics focusses on relation analysis
lots of advanced regression techniques build upon a statistical interpretation of regression
What is the linear basis function model?
y = b + Σ w_i·Φ_i(x)
y := output variable
b := bias term
w_i := weight parameters
Φ_i := basis functions
y = [1 Φ_1(x) … Φ_k(x)] · [b w_1 … w_k]^T
How can one represent the dataset as matrix?
stack the basis-function row of every data point into the design matrix Φ
row n: [1 Φ_1(x_n) … Φ_k(x_n)]
stack the targets into a vector y
=> the model becomes y = Φ·w (see the sketch below)
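A minimal numpy sketch of such a design matrix, here for a polynomial basis; the data values and the degree k are illustrative assumptions:

```python
import numpy as np

# Assumed toy dataset: N samples of the independent variable with noisy targets.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.9, 3.2, 4.1, 4.8])

# Design matrix for the polynomial basis Φ_i(x) = x^i up to degree k:
# one row per data point, one column per basis function (first column = bias).
k = 3
Phi = np.column_stack([x**i for i in range(k + 1)])   # shape (N, k+1)
print(Phi)
```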
What are some extensions that allow non-linear regression?
nonlinear optimization (fit curve through datapoints…)
gaussian process
(kernels, non-parametric, Bayesian interpretation)
SVM
(kernels, non-parametric, mostly used for classification)
What is the workflow of obtaining model parameters?
Choose model
Find Parameters
Validate Model
What has to be done when choosing a model?
type of basis functions
number of basis functions
What has to be done when finding parameters?
specify loss function (-> measure how good model fits the data)
specify constraints on the parameters
use explicit solution or iterative algorithms to obtain parameters
What has to be done to validate the model?
use second dataset which was not used for training
to evaluate the performance of your model
this ensures ability to generalize to unseen data
What types of models (functions) are there?
linear function
sinusoidal function
polynomial function
gaussian basis function
What are attributes of polynomials for regression?
globally defined on independent variable domain
design matrix (model) becomes ill-conditioned for large input domains with standard polynomials
hyperparameter:
polynomial degree
Φ0(x) = 1
Φ1(x) = x
Φ2(x) = x^2
Φ3(x) = x^3
…
Φi(x) = x^i
What are attributes of gaussians for regression?
locally defined on independent variable domain
sparse design matrix
infinitely differentiable (e^x…)
hyperparameters
number of gaussian functions
width s of each basis function
mean μ of each basis function
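A minimal numpy sketch of a Gaussian-basis design matrix; the number of basis functions, their means, and the shared width s are illustrative hyperparameter choices, not values from the lecture:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """Locally defined Gaussian bump with mean mu and width s."""
    return np.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

# Illustrative hyperparameter choices (assumptions):
x = np.linspace(0.0, 10.0, 50)
mus = np.linspace(0.0, 10.0, 5)   # 5 Gaussians, means spread over the input domain
s = 1.5                           # shared width

# Design matrix: bias column plus one column per Gaussian basis function.
Phi = np.column_stack([np.ones_like(x)] + [gaussian_basis(x, mu, s) for mu in mus])
print(Phi.shape)   # (50, 6)
```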
How do outliers affect polynomial vs gaussian?
polynomial -> gets worse globally (a single outlier affects the whole fit…)
gaussian -> gets worse only locally…
What are some other basis functions?
sigmoid
bounded
globally defined
bin functions (box around [-1,1])
locally defined
no continuity
piecewise linear
not bounded
max(0,x)
What is the purpose of the loss function? Thus what is the best model we can obtain?
measures accuracy of a model based on training dataset
best model we can obtain is minimum loss model
choice of loss function fundamental in regression problem
What are pros and cons of MSE/L2 loss?
pros:
very important in practical applications
solution can be easily obtained analytically
cons:
not robust to outliers
examples:
basic regression
energy optimization
control applications
Formula and graph of L2
L = Σ_n (y_n − ŷ_n)^2
graph: parabola (quadratic in the residual)
Pros and Cons MAE/L1 loss?
robust to outliers
no analytical solution
non-differentiable at the origin
financial applications
Formula and graph of L1
L = Σ_n |y_n − ŷ_n|
graph: V-shape with a kink at the origin
What are pros and cons of the huber loss?
combines strengths and weaknesses of L1 and L2
robust + differentiable
more hyperparameters
no analytical solution
Formula and graph of Huber loss
for residual e and threshold δ:
L(e) = e^2/2 for |e| <= δ
L(e) = δ·(|e| − δ/2) otherwise
graph: parabola near the origin, straight lines in the tails
Comparison Loss functions
L2 is differentiable
L1 more intuitive
Huber combines theoretical strengths of both
=> start with L2
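A minimal numpy sketch comparing the three losses on a set of residuals; the Huber threshold delta = 1.0 is an assumed hyperparameter:

```python
import numpy as np

def l2_loss(e):
    return e ** 2

def l1_loss(e):
    return np.abs(e)

def huber_loss(e, delta=1.0):
    # quadratic near zero (differentiable), linear in the tails (robust)
    return np.where(np.abs(e) <= delta,
                    0.5 * e ** 2,
                    delta * (np.abs(e) - 0.5 * delta))

e = np.array([-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0])   # residuals
print(l2_loss(e))
print(l1_loss(e))
print(huber_loss(e))
```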
How to analytically solve regression?
optimization problem
with model y = w1*x + b
insert datapoints
=> optimal solutions are obtained where the gradient is zero…
=> calculate gradient w.r.t w and b
=> set to 0
=> find optimal parameter for given y,x pairs
What is the general form of analytical MSE optimization?
with model y = Φ·w and loss L = ||y − Φ·w||^2
setting the gradient to zero yields the normal equations
w* = (Φ^T·Φ)^(−1)·Φ^T·y
Steps for optimization calculation?
insert the datapoints into the model
write down the MSE loss
calculate the gradient w.r.t. the parameters
set it to zero
solve the resulting linear system for the optimal parameters
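A minimal numpy sketch of these steps in closed form, fitting y = w1*x + b to assumed toy data; lstsq solves the normal equations without forming the inverse explicitly:

```python
import numpy as np

# Assumed toy data for y ≈ w1*x + b.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 5.1, 7.2, 8.8])

Phi = np.column_stack([np.ones_like(x), x])   # columns: bias, x

# Normal equations (Phi^T Phi) w = Phi^T y; lstsq solves them in a
# numerically stable way instead of inverting Phi^T Phi explicitly.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
b, w1 = w
print(f"b = {b:.3f}, w1 = {w1:.3f}")
```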
When to use sequential analysis?
apply regression during operation
not enough memory to store all data points / not all data points available at the same time…
What is a possible solution to sequential analysis?
recursive least squares (RLS)
How to do sequential analysis?
we have a memory matrix P and the weights w
both are updated iteratively with every new sample
How to update the weights?
standard RLS form (notation may differ from the lecture):
gain: k_n = P_n·φ_n / (1 + φ_n^T·P_n·φ_n)
weights: w_(n+1) = w_n + k_n·(y_n − φ_n^T·w_n)
How to update memory matrix?
P_(n+1) = P_n − k_n·φ_n^T·P_n
(with the gain k_n from the weight update)
What is crucial in sequential analysis?
initialization
possibilities:
use first datapoints available
use values which you assume to be right (e.g. based on physical knowledge / laws)
What is a forgetting factor required for?
some applications show slowly varying conditions in the long term
but can be considered stationary on short to medium time periods
-> e.g. aging of products leads to slight parameter changes…
other example: vehicle mass is usually constant over a significant time period…
=> the RLS algorithm can deal with this by introducing a forgetting factor
-> reduces the weight of old samples…
How is forgetting factor introduced?
sample i is weighted by λ^(n−i), λ ∈ (0,1]
in RLS (standard form): gain becomes k_n = P_n·φ_n / (λ + φ_n^T·P_n·φ_n)
memory update becomes P_(n+1) = (P_n − k_n·φ_n^T·P_n) / λ
λ = 1 recovers ordinary RLS; typical values lie slightly below 1
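A minimal numpy sketch of RLS with a forgetting factor, following the standard textbook form above; the data, the initialization values, and λ = 0.99 are illustrative assumptions:

```python
import numpy as np

def rls_step(w, P, phi, y, lam=0.99):
    """One recursive least-squares update for a new sample (phi, y)."""
    phi = phi.reshape(-1, 1)
    gain = P @ phi / (lam + phi.T @ P @ phi)   # Kalman-like gain
    error = y - float(phi.T @ w)               # prediction error on the new sample
    w = w + gain * error                       # weight update
    P = (P - gain @ phi.T @ P) / lam           # memory matrix update with forgetting
    return w, P

# Initialization: w from prior knowledge (here zeros), P large (low confidence).
w = np.zeros((2, 1))
P = 1e3 * np.eye(2)
for x_n, y_n in [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]:
    w, P = rls_step(w, P, np.array([1.0, x_n]), y_n)
print(w.ravel())   # approaches [1, 2] for y ≈ 1 + 2x
```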
What are numerical iterative solutions?
solution to regression
important for large-scale problems
and non-quadratic loss functions
What are some popular numerical iterative methods?
gradient descent
gauss-newton
levenberg-marquardt
Pros and cons of numerical iterative solutions?
very generic
knowledge about numeric optimization necessary
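A minimal sketch of one such iterative method: plain (sub)gradient descent on the non-quadratic L1 loss, where no analytic solution exists. The data, step-size schedule, and iteration count are illustrative assumptions:

```python
import numpy as np

# Assumed toy data with one gross outlier at x = 4.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.2, 7.1, 30.0])
Phi = np.column_stack([np.ones_like(x), x])

w = np.zeros(2)
for t in range(5000):
    residual = Phi @ w - y
    grad = Phi.T @ np.sign(residual) / len(y)   # subgradient of the L1 loss
    w -= 0.5 / (1 + t) * grad                   # diminishing step size
print(w)   # roughly [1, 2]: the outlier barely moves the robust fit
```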
Why is there a need for constraints on weights?
weights can be interpreted as physical quantities
temperature
spring constants
mass
valid range known for the weights
introduce c1 <= w <= c2…
=> improves robustness
more difficult to solve
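A minimal sketch of bound-constrained least squares (c1 <= w <= c2) using scipy's lsq_linear; the data and the bound values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Assumed toy data and assumed physically motivated bounds.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 2.6, 4.4, 6.5])
Phi = np.column_stack([np.ones_like(x), x])

# Suppose the valid ranges are: bias in [0, 1], slope in [1.5, 2.5].
result = lsq_linear(Phi, y, bounds=([0.0, 1.5], [1.0, 2.5]))
print(result.x)
```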
What is the decision tree on how to solve a regression problem?
quadratic cost function?
no -> numeric iterative
yes -> 2
are there parameter constraints?
yes -> numeric iterative
no -> 3
is the dataset very large?
no -> 4
is all data available instantaneously?
yes -> analytic
no -> sequential analytic
What is the spectrum of choosing the correct model?
underfitted <-> well done <-> overfitted
underfitted:
not enough features
wrong structure
overfitted:
too many features
irrelevant features
overly complex model
What are the effects of overfitting?
overfitting: failure to generalize properly between datapoints
cost function decreases with increased model complexity
noise and irrelevant effects become too important
What is the most common cause of overfitting? How can it be avoided?
large model complexity most common cause
advisable: look for least complex model with decent performance on training and testing dataset
-> more likely to generalize and less prone to numerical issues and outliers
What is difficult w.r.t. overfitting?
difficult to quantify whether a model is overfitting
depends on application and structure of dataset…
What is the curse of dimensionality w.r.t. overfitting?
overfitting occurs if:
data points sparse
model complexity high
sparsity of data points is difficult to grasp
sparsity increases fast with increased input dimensionality
=> helpful if data points “nicely” located in sample space…
Why do we use validation datasets?
difficult to judge overfitting in high-dimensional domains and autonomous systems
=> standard technique: separate data in training and validation data
What is the structure of using validation data?
split the dataset before fitting
-> training set: used to fit the model
-> validation set: used only to estimate the generalization error
What are common pitfalls with validation datasets?
validation dataset must reflect future properties of underlying physical relationship
do not reuse validation datasets
if used again and again for testing the model
-> it gets incorporated into the modeling process
-> does not give expected results anymore
split before fitting model is essential
-> 2/3 for training and 1/3 for validation is a good start (see the sketch below)
!! visualize data as much as possible
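A minimal numpy sketch of the split-before-fitting rule with the 2/3 / 1/3 ratio mentioned above; the dataset is an assumed noisy linear toy example:

```python
import numpy as np

# Minimal "split before fitting" sketch: 2/3 training, 1/3 validation.
rng = np.random.default_rng(seed=0)
N = 30
x = np.linspace(0.0, 5.0, N)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=N)   # assumed noisy linear data

idx = rng.permutation(N)                 # shuffle, then split once, up front
n_train = (2 * N) // 3
train, val = idx[:n_train], idx[n_train:]

Phi = np.column_stack([np.ones(N), x])
w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)   # fit on training data only
val_mse = np.mean((Phi[val] @ w - y[val]) ** 2)             # evaluate on held-out data
print(val_mse)
```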
What is k-fold cross validation?
e.g. have limited data set size
-> one may not want to remove a substantial part for validation
=> smaller validation sets to estimate the true prediction error by splitting data in multiple "folds"
=> variance of estimation error is indicator for model stability…
=> k=4 -> split the dataset into 4 parts and, in each iteration, use a different quarter for validation… (see the sketch below)
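A minimal numpy sketch of k-fold cross-validation with k = 4; the helper name kfold_mse and the toy dataset are illustrative assumptions:

```python
import numpy as np

# Minimal k-fold cross-validation sketch (k = 4, as in the example above).
def kfold_mse(Phi, y, k=4, seed=0):
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                        # one quarter for validation
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
        errors.append(np.mean((Phi[val] @ w - y[val]) ** 2))
    # the spread of the fold errors indicates model stability
    return np.mean(errors), np.std(errors)

x = np.linspace(0.0, 5.0, 40)
y = 1.0 + 2.0 * x + np.random.default_rng(1).normal(scale=0.3, size=40)
Phi = np.column_stack([np.ones_like(x), x])
print(kfold_mse(Phi, y))
```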
What is a basic goal for which regularization is used to achieve?
Basic goal: choose model structure based on underlying physical principles and not on characteristics of dataset
How does polynomial and gauss behave w.r.t. overfitting?
polynomial
tend to have larger coefficients for sparse datasets
gaussian:
tend to overfit locally, leading to single, large coefficients
=> circumvent this using regularization…
How can regularization mitigate the aforementioned problems (poly and gauss…)
penalize high coefficients in the optimization
weighting of the penalty term gives intuitive hyperparameter to control model complexity
What types of regularization did we discuss?
L2 / Tikhonov
L1
Features of L2 regularization?
prevents overfitting well
analytic solution is available as an extension to the MSE problem
difficult to apply and tune in high-dimensional feature spaces
L2 regularization term?
add λ·Σ w_i^2 = λ·||w||_2^2 to the loss
Features L1 regularization?
tends to produce sparse solutions
-> can therefore be applied for feature selection
sparse solutions mean several coefficients go to zero…
Term L1 regularization?
add λ·Σ |w_i| = λ·||w||_1 to the loss
Other names L1 / L2?
L1: Lasso Regression
L2: Ridge Regression
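A minimal numpy sketch of ridge (L2-regularized) regression, whose analytic solution extends the normal equations with λ·I; lasso (L1) has no such closed form. The toy dataset and λ values are illustrative assumptions:

```python
import numpy as np

# Minimal ridge sketch: the L2 penalty extends the normal equations with
# lam * I, keeping an analytic solution.
def ridge_fit(Phi, y, lam):
    k = Phi.shape[1]
    # w = (Phi^T Phi + lam*I)^(-1) Phi^T y, solved as a linear system.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ y)

# Assumed toy setup: high-degree polynomial prone to overfitting.
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + np.random.default_rng(0).normal(scale=0.1, size=10)
Phi = np.column_stack([x**i for i in range(9)])

print(np.abs(ridge_fit(Phi, y, lam=1e-8)).max())   # nearly unregularized: large coefficients
print(np.abs(ridge_fit(Phi, y, lam=0.1)).max())    # regularized: coefficients shrink
```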
How is regularization applied?
use a high-dimensional input space for a test model
apply ridge regression
regularized solutions perform far better at interpolation…
note: must evaluate between sample points…
How can one tune regularization?
plot train and test loss for different lambda (regularization weight)
-> the lower the λ, the higher the effective model complexity (risk of overfitting)
-> the higher the λ, the stronger the penalty and the more underfitting…
How can over- and underfitting be formalized?
bias and variance…
=> study the predictor's performance on previously unseen data
-> evaluate the bias and variance of the outputs
=> high bias -> model is underfitting, as it is wrong for many data points
=> high variance -> model is overfitting, as it is very wrong for a couple of data points…
Again intuition of bias and variance?
consider repeating fitting lots of times
-> high variance -> very noisy fits, although on average correct…
-> high bias -> less noisy, but systematically incorrect fits…
General problem bias variance?
in general, one cannot guarantee low bias and low variance at the same time
-> have to balance them according to our objective
=> low bias and variance requires large, high quality datasets
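A minimal simulation of this intuition, assuming a sine as the true underlying function: refit polynomials of different degrees on many noisy datasets and inspect bias and variance of the predictions at one test point (degrees, noise level, and test point are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
f = lambda x: np.sin(2 * np.pi * x)      # assumed true underlying function
x_test = 0.25

for degree in (1, 3, 7):
    preds = []
    for _ in range(500):                 # repeat the fit many times
        y = f(x) + rng.normal(scale=0.2, size=x.size)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    preds = np.array(preds)
    bias = preds.mean() - f(x_test)      # systematic offset -> underfitting
    variance = preds.var()               # spread over refits -> overfitting
    print(f"degree {degree}: bias = {bias:+.3f}, variance = {variance:.3f}")
```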
What is a problem of general regression?
independent variable (x-axis) assumed to be noise free
=> often not true in engineering applications
other methods to consider this:
total least squares
Principal Component Analysis (PCA)