What is a problem when using regular fully connected layers with images?
lots and lots of parameters…
=> requires way too much memory for parameters…
=> only usable to some extent (limited image size) and subpar…
What is a solution to process images?
use convolutional neural networks…
How to avoid the loss of spatial information in CNN?
requires the ability to find and analyze patterns / features
what are images?
how do humans understand and classify objects?
How to reduce the number of parameters / weights?
new architecture (layer type)
What are the base properties of CNN?
CNN have (similar to FCN/MLP) neurons that have learnable weights and biases
each neuron still receives input, performs dot product and follows it with non-linearity (activation function)
network still expresses a single differentiable score function (raw image pixels -> class probabilities)
still has a loss function on the last (fully-connected) layer
all the tips/tricks seen so far for learning regular deep neural networks still apply
What is new in CNN?
CNNs make the explicit assumption that the inputs are images (2D, 3D inputs).
This assumption allows encoding certain properties into the architecture, which makes the forward function much more efficient to implement.
This vastly reduces the number of parameters in the network and improves the learning performance in general.
What is the concept of convolutions?
learn separate filters that are characteristic of different objects
-> the network only needs to store the corresponding filter weights
=> since filters are usually much smaller than the image -> significant reduction of the memory needed while achieving better performance…
Formula convolution?
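$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
(discrete case: $(f * g)[n] = \sum_{m} f[m]\, g[n - m]$)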
Formula convolution multi dimensional with two functions?
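2D discrete case: $(f * g)[m, n] = \sum_{i} \sum_{j} f[i, j]\, g[m - i, n - j]$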
What do convolutions provide?
a measure of similarity between two objects / signals / arrays
What is an activation map?
the kernel moves/convolves over the input from left to right, top to bottom and performs its calculation at each position
=> the result is a feature map (see the sketch below)
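A minimal NumPy sketch of this sliding computation (input and kernel values are hypothetical; like most deep-learning libraries, it computes cross-correlation, i.e. without flipping the kernel):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (valid mode, stride 1) and
    compute a dot product at each position -> one feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):          # top to bottom
        for x in range(ow):      # left to right
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])   # vertical-edge detector
print(conv2d(image, edge_kernel).shape)   # (3, 3): 5 - 3 + 1 = 3
```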
By what is the number of activation maps determined?
by the number of kernels
-> one “output / activation map” for each kernel…
What is the result of more complex filters with higher depth?
becomes more difficult for humans to interpret the filter tuning results
Dimensionality:
input depth same as kernel depth
output depth same as number of kernels
What is the idea of filters for deeper levels?
filters correspond to learned features
-> get “stacked” and combined to more complex features the deeper in the network…
=> first coarse features, then more high level ones…
Interpretability of different feature complexities?
low level -> easy to identify (e.g. circle, squares, diagonal lines,…)
high level -> usually impossible to interpret
What does the CNN calculate?
dot product with the kernel (convolution…)
What parameters affect the convolution process and resulting dimensionality?
padding
stride
kernel dimension
input dimension
Why do we employ padding?
to retain the dimensionality
-> because without padding, convolutions always reduce the dimensionality
-> not wanted in e.g. deep networks…
How to calculate new dimensions in CNN?
N: number of filters
F: Filter size
S: Stride
P: Padding
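W: input width/height
=> output width/height: $O = \frac{W - F + 2P}{S} + 1$
=> output depth: N (one activation map per filter)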
What are exemplary combinations to keep the dimensions?
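F = 3, S = 1, P = 1
F = 5, S = 1, P = 2
F = 7, S = 1, P = 3
-> in general: S = 1 with P = (F − 1) / 2 keeps width/height unchanged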
What is learned in convolutional layers?
weight values of the filters
with corresponding bias term(s)
How can we apply several filters in the same layer?
in parallel
-> compute feature map for each filter
-> stack the output (depth wise…)
!! the feature maps need to have the same width and height !! (see the sketch below)
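A minimal PyTorch sketch (the layer sizes are illustrative assumptions) of applying several filters in parallel and getting one depth-wise stacked output:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # batch of one RGB image, 32x32

# 16 filters of size 3x3; the kernel depth (3) must match the input depth
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

out = conv(x)
print(out.shape)  # torch.Size([1, 16, 32, 32]): one 32x32 map per filter
```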
Why do we want to apply filters in parallel?
make sure network can learn to detect different object features…
What is pooling?
reduces the spatial dimensions of the output
operates over each activation map independently
(reduces width and height…)
=> makes it possible to improve learning process…
How does max pooling work?
size 2x2
choose max value
=> a typical choice that produces good and stable results (see the sketch below)
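A minimal PyTorch sketch with a hypothetical 4x4 input:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [1., 0., 2., 1.],
                    [0., 1., 1., 3.]]]])   # shape (1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x))
# tensor([[[[4., 8.],
#           [1., 3.]]]])  -> max of each 2x2 window, width/height halved
```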
What is average pooling?
sum the values in the window and divide by (kernel width * kernel height)
Effects of pooling?
useful for detecting an object in an image regardless of its position
reduce overfitting
increase efficiency
faster training times (due to reduction)
How to calculate new dimensions in pooling?
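W: input width/height, F: pooling size, S: stride
=> output width/height: $O = \frac{W - F}{S} + 1$ (pooling typically uses no padding)
=> depth stays unchanged (each activation map is pooled independently)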
Why to use larger filters?
finding larger characteristics in the data
Why do we need/want pooling?
larger filters -> increases parameter amount
use pooling to reduce the size of the image
=> allows finding large characteristics with smaller filter sizes…
What does max unpooling allow? How is it done?
allows to restore original dimensions
-> when doing max pooling, keep a map that indicates which position each max value was taken from (the position of the initial max element…)
-> to map back to the higher resolution, fill everything with 0 and place each value from the smaller matrix we unpool at the position indicated by the max-pooling map… (see the sketch below)
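A minimal PyTorch sketch of pooling with stored indices followed by unpooling (same hypothetical 4x4 input as above):

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [1., 0., 2., 1.],
                    [0., 1., 1., 3.]]]])

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

pooled, indices = pool(x)           # indices remember where each max came from
restored = unpool(pooled, indices)  # everything 0 except the max positions
print(restored)
# the 4 returns to row 1 / col 1, the 8 to row 1 / col 3, etc.
```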
What is the ImageNet competition?
ImageNet -> database containing 14M images in more than 20,000 classes
-> due to its size: suitable for benchmarking DNNs
What are some popular image classification architectures?
LeNet
convolution -> pooling -> nonlinear activation
tanh
AlexNet
more layers and ReLU
ZFNet
optimized version of AlexNet
GoogLeNet
many more layers (inception modules)…
VGGNet
even larger network
ResNet
uses skip connections
What is transfer learning?
a model trained on one task is repurposed for a second task
use a generic model -> retrain it on the specific task…
-> one does not have to start from scratch… simply tune an existing model for one's own purpose… (see the sketch below)
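A minimal sketch with PyTorch/torchvision (the model choice and the 10 output classes are illustrative assumptions; the weights API assumes torchvision >= 0.13):

```python
import torch.nn as nn
from torchvision import models

# generic model pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# replace the last fully-connected layer for the new, specific task
model.fc = nn.Linear(model.fc.in_features, 10)

# now only model.fc is trained in the usual training loop
```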