What is the ML development workflow?
problem definition
data preparation
model
hardware
training
tuning
inference
What are the questions in problem definition?
What kind of problem?
What are inputs and desired outputs?
What are the questions in data preparation?
How much data is needed?
How to get that data?
What are the questions in Model?
Neural Network?
Which layers?
What are the questions in Hardware?
Which hardware for training?
Which hardware for inference?
What are the questions in Training?
Supervised?
Which loss function?
Which optimizer?
What are the questions in Tuning?
iterative change of model, hyperparameters, etc.
What are the questions in Inference?
Deploy the model for your desired problem
What approaches are there to get data? What are problems?
public datasets
how well does this fit your problem definition (inputs, outputs, …)
can the data be processed to fit your problem? (e.g. 3d bounding boxes -> 2d bounding boxes)
create your own dataset
prototype needed?
data infrastructure (how to save data? how to handle large files?)
time consuming and expensive
How is the relation of model training and implementation to data preparation?
data preparation makes up ~80% of your project -> crucial!!!
but only sees around 1% of AI research…
What is the survivorship bias?
e.g. use statistics of where returning planes were most often hit by bullets to reinforce these areas
-> bias: only planes that survived these hits came back to be evaluated (-> should instead reinforce areas where no bullets hit…)
What is the sample bias?
unbalanced sampling from a population
e.g. tank detection
-> images with tanks only at day
-> images without tanks only at night…
What are ethical issues with bias?
e.g. datasets contain fewer images of people of color
-> e.g. man holding a thermometer
white: classified as thermometer
black: classified as gun
…
What are class imbalances? How to consider?
e.g. detect different vehicles
-> 2000 motor cycles
-> 10 cars
=> will not yield good results…
-> repeat samples with under-represented data
-> consider imbalance in loss function
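The first remedy above can be sketched in a few lines. A minimal stdlib-only illustration, using the card's hypothetical 2000-vs-10 example (`dataset` and its entries are made up):

```python
import random

# Hypothetical imbalanced dataset from the card's example:
# many motorcycles, very few cars.
dataset = [("motorcycle_img", "motorcycle")] * 2000 + [("car_img", "car")] * 10

# Count samples per class
counts = {}
for _, label in dataset:
    counts[label] = counts.get(label, 0) + 1

# Oversample: repeat samples of under-represented classes
# until every class reaches the size of the largest one
target = max(counts.values())
balanced = list(dataset)
for label, n in counts.items():
    minority = [s for s in dataset if s[1] == label]
    balanced += [random.choice(minority) for _ in range(target - n)]

print({l: sum(1 for _, x in balanced if x == l) for l in counts})
# both classes now have 2000 samples
```

The second remedy (considering the imbalance in the loss function) would instead weight each sample's loss by the inverse of its class frequency, leaving the data untouched.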
What can be problems with data annotation (e.g. output of bounding boxes)?
what to do when objects are overlapping?
-> inconsistency can be problematic…
=> different ground truths…?
How can one measure the consistency of labeling?
IoU (intersection over union) = area of overlap / area of union
=> inconsistent labels can be considered as noise…
-> smaller is worse
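The IoU formula from the card, as a short sketch for axis-aligned bounding boxes (the box coordinates are made-up examples):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # area of overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                # area of union
    return inter / union

# Two annotators labeling the same object slightly differently:
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 -> perfectly consistent
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.33 -> less consistent
```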
What is noise in data? How to handle it?
for a given x value -> no consistent y values…
use more data to average out the noise…
or clean data to have consistent labels…
How can we increase data? (images)
Data augmentation!!!
flip images
rotate images
scale images outward or inward
crop images
translation of objects in x,y position (shifting); similar to crop…
add gaussian noise…
de facto changes all pixel values…
deep photo style transfer
fancy…
-> generate new image by new style (using generative NN…)
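A few of the simpler augmentations above, sketched on a toy nested-list "image" (no imaging library assumed; a real pipeline would operate on arrays/tensors):

```python
import random

# Toy 3x3 grayscale "image" as nested lists
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]

def hflip(image):
    """Flip horizontally (mirror each row)."""
    return [row[::-1] for row in image]

def crop(image, top, left, h, w):
    """Cut out an h x w window."""
    return [row[left:left + w] for row in image[top:top + h]]

def add_gaussian_noise(image, sigma=0.1):
    """Add gaussian noise -> de facto changes every pixel value."""
    return [[px + random.gauss(0, sigma) for px in row] for row in image]

print(hflip(img))             # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(crop(img, 0, 0, 2, 2))  # [[1, 2], [4, 5]]
```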
How effective is data augmentation?
depending on the method, around 10 percent improvement
-> overall feeling: augmentation is useful…
What different types of biases exist?
sample bias
dataset does not actually represent the world
exclusion bias
systematic exclusion of information
measurement bias
data measurement for training differs from inference (e.g. a different camera used for training…)
recall bias
labeling similar types of data inconsistently
observer bias
effect of seeing what you expect to see or want to see in data
What NN models did we discuss?
Fully connected
convolutional neural networks
graph neural networks
recurrent neural networks
Can we create networks from different types?
yes…
What activation functions did we discuss?
step function
sigmoid
relu
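The three activation functions from this card as one-liners:

```python
import math

def step(x):
    """Step function: 0 below the threshold, 1 at or above it."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """Sigmoid: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """ReLU: identity for positive inputs, zero otherwise."""
    return max(0.0, x)

print(step(-1.0), sigmoid(0.0), relu(-2.0), relu(3.0))  # 0.0 0.5 0.0 3.0
```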
List hardware that can be used for ML
CPU
solve general / wide range of tasks
GPU
designed to accelerate rendering of graphics
focus on parallelization
TPU (tensor processing unit)
Designed to accelerate deep learning tasks
TPUs from Google especially for Tensorflow
FPGA
ASIC
Difference CPU vs. GPU?
CPU: mainly serial processing of inputs
GPU: highly parallelized…
Common loss functions we discussed?
L1 (MAE)
L2 (MSE)
Binary Cross Entropy
-> BCE good for:
binary classification tasks…
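The three losses from this card, computed on small made-up prediction/target lists:

```python
import math

def l1(y_true, y_pred):
    """L1 loss = mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def l2(y_true, y_pred):
    """L2 loss = mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross entropy; eps avoids log(0)."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_pred)) / len(y_true)

print(l1([1.0, 2.0], [1.5, 2.5]))   # 0.5
print(l2([1.0, 2.0], [1.5, 2.5]))   # 0.25
print(bce([1.0, 0.0], [0.9, 0.1]))  # ~0.105
```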
What are aspects where different optimizers can differ?
computation time
local / global minima
convergence time
learning rate dependency
number of hyperparameters (e.g. momentum)
Why is weight initialization important?
necessary to have a starting point for optimization
two important dimensions
value range
distribution
best practice: start with default initialization from your ML library
In what regards is the value range important for weight initialization?
too small value lead to slow learning
too large values may lead to divergence
keep in mind vanishing and exploding gradients
In what regards is the distribution important for weight initialization?
constant initialization performs poorly -> need for randomness
xavier initialization with
mean of activations is 0
variance of activations stays the same across every layer
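A stdlib-only sketch of the uniform variant of Xavier (Glorot) initialization; the variance target `2/(fan_in + fan_out)` follows from the two properties above (zero mean, variance preserved across layers):

```python
import math
import random

def xavier_uniform(fan_in, fan_out):
    """Xavier/Glorot uniform init: draw from U(-limit, limit) so the
    weights have zero mean and variance ~ 2/(fan_in + fan_out)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

w = xavier_uniform(256, 128)
flat = [v for row in w for v in row]
mean = sum(flat) / len(flat)
var = sum((v - mean) ** 2 for v in flat) / len(flat)
print(mean, var)  # mean ~0, variance ~2/(256+128) ≈ 0.0052
```

In practice this is the kind of thing the "default initialization from your ML library" already does for you.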
What can be a variation in transfer learning?
keep weights in initial layers
=> in image classification: early layers have generic features (low level)
re-train later layers
train the actually important, complex features specific to our task….
What are methods to prevent overfitting?
increase data set / data set augmentation
reduce model size
early stopping
regularization
dropouts
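The last item, dropout, is simple enough to sketch directly (this is the common "inverted dropout" formulation; the rescaling by `1/keep` is an assumption on my part, matching how most libraries implement it):

```python
import random

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: during training, randomly zero each activation
    with probability `rate` and rescale survivors by 1/(1-rate) so the
    expected sum is unchanged; at inference, pass values through."""
    if not training:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)
out = dropout([1.0, 1.0, 1.0, 1.0], rate=0.5)
print(out)  # some entries zeroed, survivors scaled to 2.0
```

Zeroing random activations forces the network not to rely on any single unit, which reduces overfitting.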
In what regards is randomness important in Neural Networks?
sequence of data
-> otherwise, finding the global minimum is difficult for the optimizer
initialization
optimizer (e.g. SGD)
random regularization (e.g. dropout)
calculation architecture may be non-deterministic
How can we try to keep determinism? Effects of randomness?
keep random seed
-> but not all calculations might be deterministic
=> train NN several times with same settings to account for impact of randomness
=> comparability of NN results often difficult…
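Keeping the random seed can be illustrated with a toy "training run" (the function is a stand-in, not a real training loop):

```python
import random

def train_run(seed):
    """Stand-in for one training run: fixing the seed makes the random
    parts (weight init, data shuffling, dropout, ...) reproducible."""
    random.seed(seed)
    weights = [random.gauss(0, 1) for _ in range(3)]   # "initialization"
    data_order = random.sample(range(5), 5)            # "data shuffling"
    return weights, data_order

print(train_run(42) == train_run(42))  # True: same seed -> identical run
print(train_run(42) == train_run(43))  # False: different seed -> different run
```

This only covers Python-level randomness; GPU kernels and parallel reductions can still be non-deterministic, which is exactly the caveat above.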
What are training hyperparameters?
learning rate
decay rate
batch size
number of epochs
dropout rate
What are hyperparameters of the model?
number of layers
number of parameters in layer
What is a problem with hyperparameter optimization?
lots of them
-> high dimensional optimization problem
What is a problem with relations between hyperparameters?
hyperparameters not independent of each other (-> change one influences effect of other)
=> for some, the relation is roughly explainable (e.g. learning rate and number of epochs)
=> for some, the relation is not explainable (e.g. dropout rate and batch size)
What is a problem w.r.t. hyperparameter search?
training may take very long (e.g. days)
-> exhaustive search of hyperparameter combinations not feasible…
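To see why exhaustive search explodes, and what a cheap alternative looks like: a small sketch of random search over a made-up search space (random search is not from this card, just a common baseline; `mock_validation_score` stands in for an expensive training run):

```python
import random

# Hypothetical search space; the real one depends on your model
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64, 128],
    "dropout_rate": [0.1, 0.3, 0.5],
}

def mock_validation_score(cfg):
    """Stand-in for an expensive training run returning a validation score."""
    return -abs(cfg["learning_rate"] - 1e-3) - abs(cfg["dropout_rate"] - 0.3)

n_grid = 1
for values in space.values():
    n_grid *= len(values)
print("exhaustive grid needs", n_grid, "runs")  # 36 here; grows multiplicatively

# Random search: evaluate only a small budget of sampled combinations
random.seed(0)
best = max(
    ({k: random.choice(v) for k, v in space.items()} for _ in range(8)),
    key=mock_validation_score,
)
print("best sampled config:", best)
```

With days-long trainings, even 36 runs is painful; smarter methods like Bayesian optimization try to pick each next combination based on the results so far.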
What is a way to optimize hyperparameters?
bayesian optimization
gaussian-process based and gradient-free
not feasible for lots of hyperparameters (e.g. 4 okay, 30 not)
How can we try to overcome the black box?
feature visualization
-> plot / visualize the weights / kernels…
activation maps -> which inputs are important?
What are some post-training optimization approaches?
precision calibration
layer and tensor fusion
kernel auto-tuning
multi stream execution
dynamic tensor memory
What is an application to monitor our training and model?
TensorBoard