Computer Vision Process
Image Preprocessing
Data augmentation
create modified versions of input images to provide more training examples for the ML model
e.g.:
de-texturise
de-colorise
edge enhancement
salient edge map
flip/rotate
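A minimal sketch of the flip/rotate augmentations, assuming a tiny grayscale image stored as a list of rows. The function names are illustrative only. The other transforms listed above (de-texturising, edge maps, ...) need real image-processing code and are not covered.

```python
# Hypothetical representation: a grayscale image as a list of pixel rows.
Image = list[list[int]]

def flip_horizontal(img: Image) -> Image:
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img: Image) -> Image:
    """Rotate by 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img: Image) -> list[Image]:
    """Return the original plus modified copies to use as extra training examples."""
    rotated = rotate_90_clockwise(img)
    return [img, flip_horizontal(img), rotated, flip_horizontal(rotated)]

image = [[1, 2, 3],
         [4, 5, 6]]
for version in augment(image):
    print(version)
```

Each copy keeps the label of the original, which is why the model sees "more examples" without any new annotation work.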
Feature Extraction
traditional machine learning requires handcrafted feature extraction
CNN
convolutional neural networks
no need to manually extract features from the image
the network extracts them automatically and learns their importance through the weights of its connections
CNN History and Applications
80s: roots in the neocognitron (no end-to-end supervised learning)
90s: time-delay neural networks for speech recognition
2000s: detection, segmentation and recognition of objects and regions in images (abundance of labeled data)
2012: ImageNet competition: dataset of about a million images with 1000 different classes
successes since then, driven by:
use of GPUs
ReLU
Dropout
data augmentation
CNN motivation
CNN Heuristics
exploit that high level features are composed of lower level ones
inspired by classic notions of simple cells and complex cells in visual neuroscience
key ideas that take advantage of the properties of natural signals:
local connections
shared weights
pooling
many layers
CNN convolution
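local groups of values are often highly correlated → distinctive local motifs
local statistics are invariant to location
if a motif can appear in one part of the image, it could appear anywhere
mathematically it is a discrete convolution
each output value is a weighted sum of a local patch of inputs; the whole operation can be written as a matrix multiplication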
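To illustrate the last point, here is a sketch (plain Python, stride 1, no padding) that computes the same feature map twice: once with a sliding window and once as a matrix product between unrolled input patches ("im2col") and the flattened filter. Like most CNN libraries, it does not flip the kernel (strictly a cross-correlation); since the weights are learned, this makes no practical difference. The function names are hypothetical.

```python
def conv2d_direct(x, w):
    """'Valid' 2D convolution as used in CNNs (no kernel flip), stride 1."""
    fh, fw = len(w), len(w[0])
    out_h, out_w = len(x) - fh + 1, len(x[0]) - fw + 1
    return [[sum(x[i + u][j + v] * w[u][v] for u in range(fh) for v in range(fw))
             for j in range(out_w)] for i in range(out_h)]

def im2col(x, fh, fw):
    """Unroll every fh x fw patch of x into one row of a matrix (row-major patch order)."""
    out_h, out_w = len(x) - fh + 1, len(x[0]) - fw + 1
    return [[x[i + u][j + v] for u in range(fh) for v in range(fw)]
            for i in range(out_h) for j in range(out_w)]

def conv2d_matmul(x, w):
    """Same result as conv2d_direct, computed as (patch matrix) x (flattened filter)."""
    fh, fw = len(w), len(w[0])
    out_w = len(x[0]) - fw + 1
    w_col = [w[u][v] for u in range(fh) for v in range(fw)]
    flat = [sum(p * q for p, q in zip(row, w_col)) for row in im2col(x, fh, fw)]
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]  # back to 2D

x = [[1, 2, 0, 1],
     [0, 1, 3, 1],
     [2, 1, 0, 0],
     [1, 0, 1, 2]]
w = [[1, 0],
     [0, -1]]
print(conv2d_direct(x, w))  # [[0, -1, -1], [-1, 1, 3], [2, 0, -2]]
print(conv2d_matmul(x, w))  # identical
```

The same filter `w` is reused at every position, which is exactly the "shared weights" idea: a motif is detected wherever it appears.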
CNN pooling
merge semantically similar features into one
the relative positions of the features forming a motif can vary → coarse-grain the position of each feature
max pooling computes the maximum of a local patch of units in one feature map
neighbouring pooling units take their input from patches shifted by more than one row or column → reduces the dimension of the representation
representations vary very little when elements in the previous layer vary in position and appearance
The goal of pooling layers is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters (thereby limiting the risk of overfitting). Reducing the input image size also makes the neural network tolerate a little bit of image shift (location invariance).
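A minimal sketch of 2 × 2 max pooling with stride 2 (plain Python, one feature map as a list of rows; names are illustrative). It halves each spatial dimension, and the second example shows the tolerance to small shifts: moving a strong activation by one row and one column gives the same output, as long as it stays inside the same pooling window.

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling: each output unit is the maximum of a size x size patch of one feature map."""
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [[max(x[i * stride + u][j * stride + v]
                 for u in range(size) for v in range(size))
             for j in range(out_w)] for i in range(out_h)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 2],
        [2, 2, 0, 3]]
print(max_pool2d(fmap))  # 4x4 -> 2x2: [[4, 2], [2, 5]]

a = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [0, 9, 0, 0],  # same activation, shifted by one row and one column
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
print(max_pool2d(a) == max_pool2d(b))  # True
```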
feature map
A neuron located in row i, column j of a given layer is connected to the outputs of the neurons in the previous layer located in rows i to i + fh – 1, columns j to j + fw – 1, where fh and fw are the height and width of the receptive field (the size of the filter applied by the neurons of a feature map)
In order for a layer to have the same height and width as the previous layer, it is common to add zeros around the inputs, as shown in the diagram. This is called zero padding
strides
distance between two consecutive receptive fields
A neuron located in row i, column j in the upper layer is connected to the outputs of the neurons in the previous layer located in rows i × sh to i × sh + fh – 1, columns j × sw to j × sw + fw – 1, where sh and sw are the vertical and horizontal strides.
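The index formula above can be checked with a small helper. This is a sketch with hypothetical names: the ranges are inclusive and refer to the (possibly zero-padded) previous layer. It also uses the standard output-size relation (n + 2p – f) // s + 1 for input length n, filter size f, stride s and p zeros of padding per side; with stride 1, p = (f – 1) / 2 for an odd f keeps the output the same size as the input.

```python
def receptive_field(i, j, fh, fw, sh=1, sw=1):
    """Inclusive row and column ranges of the previous layer seen by the neuron at (i, j)."""
    rows = (i * sh, i * sh + fh - 1)
    cols = (j * sw, j * sw + fw - 1)
    return rows, cols

def output_size(n, f, s=1, p=0):
    """Output length along one axis: input n, filter f, stride s, zero padding p on each side."""
    return (n + 2 * p - f) // s + 1

print(receptive_field(1, 2, fh=3, fw=3))              # stride 1: ((1, 3), (2, 4))
print(receptive_field(1, 2, fh=3, fw=3, sh=2, sw=2))  # stride 2: ((2, 4), (4, 6))
print(output_size(5, 3, s=1, p=1))  # zero padding keeps the size: 5
print(output_size(7, 3, s=2))       # stride 2 shrinks the map: 3
```

With sh = sw = 1 the helper reduces to the stride-free formula for a single feature map.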
Stacking Multiple Feature Maps
Within one feature map, all neurons share the same parameters (weights and bias term), but different feature maps may have different parameters.
A neuron located in row i, column j of the feature map k in a given convolutional layer l is connected to the outputs of the neurons in the previous layer l – 1, located in rows i × sh to i × sh + fh – 1 and columns j × sw to j × sw + fw – 1, across all feature maps in layer l – 1.
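A sketch of a full convolutional layer with several input and output feature maps, assuming the (hypothetical) layouts `x[row][col][channel]` for the input and `w[u][v][in_channel][out_map]` for the filters, with one bias per output map and no padding. Every neuron of output map k uses the same filter `w[:][:][:][k]` and bias `b[k]`, and sums over all input maps.

```python
def conv_layer(x, w, b, sh=1, sw=1):
    """Return z[row][col][k] = b[k] + sum over u, v, c of x[i*sh+u][j*sw+v][c] * w[u][v][c][k]."""
    fh, fw = len(w), len(w[0])
    c_in, c_out = len(w[0][0]), len(w[0][0][0])
    out_h = (len(x) - fh) // sh + 1
    out_w = (len(x[0]) - fw) // sw + 1
    return [[[b[k] + sum(x[i * sh + u][j * sw + v][c] * w[u][v][c][k]
                         for u in range(fh) for v in range(fw) for c in range(c_in))
              for k in range(c_out)]
             for j in range(out_w)]
            for i in range(out_h)]

# 3x3 input with 2 channels: channel 0 is all ones, channel 1 holds 1..9.
x = [[[1, r * 3 + col + 1] for col in range(3)] for r in range(3)]
# Two 2x2 filters: output map k simply sums input channel k over the patch.
w = [[[[1 if c == k else 0 for k in range(2)] for c in range(2)]
      for v in range(2)] for u in range(2)]
b = [1, -1]  # one bias per output feature map
print(conv_layer(x, w, b))  # [[[5, 11], [5, 15]], [[5, 23], [5, 27]]]
```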
CNN Architecture
Some CNN Architectures
AlexNet
Won the 2012 ImageNet ILSVRC challenge by a large margin
Quite similar to LeNet-5, only much larger and deeper, and it was the first to stack convolutional layers directly on top of each other, instead of stacking a pooling layer on top of each convolutional layer
GoogLeNet
Won the ILSVRC 2014 challenge by pushing the top-5 error rate below 7%
This great performance came in large part from the fact that the network was much deeper than previous CNNs
Inception Module
Uses parameters much more efficiently than previous architectures
1 × 1 convolutional layers cannot capture spatial patterns, but can capture patterns along the depth dimension.
They output fewer feature maps than their inputs, so they serve as bottleneck layers, meaning they reduce dimensionality.
Each pair of convolutional layers ([1 × 1, 3 × 3] and [1 × 1, 5 × 5]) acts like a single, powerful convolutional layer, capable of capturing more complex patterns. Indeed, instead of sweeping a simple linear classifier across the image (as a single convolutional layer does), this pair of convolutional layers sweeps a two-layer neural network across the image.
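The bottleneck effect is easy to quantify by counting parameters. The channel numbers below (192 input maps, a 1 × 1 reduction to 16 maps, then 32 output maps with 5 × 5 kernels) are assumed for illustration only.

```python
def conv_params(f, c_in, c_out):
    """Weights plus biases of a convolutional layer with f x f kernels."""
    return f * f * c_in * c_out + c_out

# Hypothetical sizes: 192 input maps -> 32 output maps with 5x5 kernels.
direct = conv_params(5, 192, 32)
# Same output, but first reduce depth to 16 maps with a 1x1 bottleneck layer.
bottleneck = conv_params(1, 192, 16) + conv_params(5, 16, 32)

print(direct, bottleneck, round(direct / bottleneck, 1))  # 153632 15920 9.7
```

Roughly ten times fewer parameters for the same output depth; the number of multiply-adds per output position shrinks by a similar factor, which is a large part of why the inception module is so parameter-efficient.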
YOLO
You Only Look Once
Object Detection
Region proposal: an algorithm or a DL model is used to generate regions of interest (RoIs) to be further processed by the system.
Feature extraction and network predictions: visual features are extracted for each of the bounding boxes and evaluated to determine whether objects are present in the proposals and, if so, which ones.
Non-maximum suppression (NMS): at this step the model has likely found multiple bounding boxes for the same object, so NMS keeps the box with the highest confidence score and discards the boxes that strongly overlap it.
Evaluation metrics: mean average precision (mAP), precision-recall curve (PR curve), and intersection over union (IoU).
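A minimal sketch of IoU and greedy non-maximum suppression, assuming boxes given as corner coordinates (x1, y1, x2, y2) and one confidence score per box. The 0.5 threshold is an illustrative default, not a prescribed value.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it above the threshold, repeat."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))  # 81 / 119, about 0.68
print(nms(boxes, scores))       # [0, 2]: the near-duplicate box 1 is suppressed
```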
Semantic Segmentation
In semantic segmentation, each pixel is classified according to the class of the object it belongs to (e.g., road, car, pedestrian, building, etc.)
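Only the final step of semantic segmentation is sketched here: assuming the network has already produced one score per class for every pixel, each pixel gets the class with the highest score. The class list and score values are made up for illustration.

```python
CLASSES = ["road", "car", "pedestrian"]  # hypothetical label set

def segment(scores):
    """scores[r][c][k]: score of class k at pixel (r, c). Returns one class name per pixel."""
    return [[CLASSES[max(range(len(px)), key=px.__getitem__)] for px in row]
            for row in scores]

scores = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
          [[0.6, 0.3, 0.1], [0.2, 0.1, 0.7]]]
print(segment(scores))  # [['road', 'car'], ['road', 'pedestrian']]
```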
Image Captioning
Image Captioning is the process of generating textual description of an image.
It uses both Natural Language Processing and Computer Vision to generate the captions.
Other applications for CNN
1D CNN for sequence data
Convert audio signal to spectrogram and use CNN
Convert sensor signals to image and use CNN
1D CNN WaveNet for Sequence Data
Stacked 1D convolutional layers, doubling the dilation rate at every layer: the first convolutional layer gets a glimpse of just two time steps at a time, while the next one sees four time steps, the next one sees eight time steps, and so on.
This way, the lower layers learn short-term patterns, while the higher layers learn long-term patterns. Thanks to the doubling dilation rate, the network can process extremely large sequences very efficiently.
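A sketch of the WaveNet-style idea, assuming kernel size 2 and causal convolutions (each output only sees the current and past time steps). Stacking layers with dilations 1, 2, 4, 8 on an impulse shows that the top output at t = 15 still depends on the input at t = 0, i.e. the receptive field doubles with each layer. Function names are hypothetical.

```python
def dilated_causal_conv1d(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], treating inputs before t = 0 as zero."""
    return [sum(w[k] * x[t - k * dilation] for k in range(len(w)) if t - k * dilation >= 0)
            for t in range(len(x))]

def wavenet_receptive_field(kernel_size, num_layers):
    """Time steps seen by the top layer when the dilation doubles every layer (1, 2, 4, ...)."""
    return 1 + sum((kernel_size - 1) * 2 ** layer for layer in range(num_layers))

x = [0] * 16
x[0] = 1  # impulse at t = 0
y = x
for d in (1, 2, 4, 8):
    y = dilated_causal_conv1d(y, [1, 1], d)
print(y)  # every output from t = 0 to t = 15 is reached by the impulse

print([wavenet_receptive_field(2, n) for n in (1, 2, 3, 10)])  # [2, 4, 8, 1024]
```

Ten layers already cover 1024 time steps, while the number of weights grows only linearly with depth.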
Audio Signal Processing by CNN
Convert audio signal to spectrogram (time-frequency signal representation).
Use CNN on spectrograms.
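A minimal spectrogram sketch: split the signal into overlapping frames, apply a Hann window, and take the DFT magnitude of each frame. The result (frames × frequency bins) can then be fed to a CNN like a one-channel image. The frame length, hop size and test tone are assumptions for the example; the naive O(N²) DFT is for clarity only, and real code would use an FFT library (spectrograms are also often log- or mel-scaled before use).

```python
import cmath
import math

def dft_magnitude(frame):
    """Magnitudes of the DFT of one frame, bins 0 .. N/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes: one row per frame, one column per frequency bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]  # Hann
    return [dft_magnitude([signal[start + t] * window[t] for t in range(frame_len)])
            for start in range(0, len(signal) - frame_len + 1, hop)]

sample_rate = 800
tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(sample_rate)]  # 1 s of 100 Hz
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
print(len(spec), len(spec[0]), peak_bin * sample_rate / 64)  # 24 frames, 33 bins, peak at 100.0 Hz
```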
Transfer Learning
What are the two main operations in CNNs?
feature extraction with:
pooling: merge semantically similar features into one, e.g. take the max value of a local patch
convolution: local groups of values are highly correlated; a filter combines each local group of values into a single output value
Calculate the output of a convolutional filter for given input and parameter values.
Which kinds of CNNs do you know?
How can we generate new images using deep learning?
they are trained like classification NNs
they then generate new data with characteristics similar to the training data
What is transfer learning and when can it be used?
using an already trained NN and training it further on additional data to move from a general application to a more specific one
How would you design a CNN for a particular application e.g. autonomous driving or medical image segmentation?
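For the question on calculating the output of a convolutional filter, a worked sketch: at each position the output is the sum of the element-wise product of the filter and the input patch, plus the bias. The 5 × 5 input, the vertical-edge filter, bias = –1 and stride = 2 are made-up values; ReLU is applied afterwards to show a typical activation.

```python
def conv_output(x, w, bias=0, stride=1):
    """Output of one filter ('valid', no padding): sum(patch * w) + bias at each position."""
    fh, fw = len(w), len(w[0])
    out_h = (len(x) - fh) // stride + 1
    out_w = (len(x[0]) - fw) // stride + 1
    return [[bias + sum(x[i * stride + u][j * stride + v] * w[u][v]
                        for u in range(fh) for v in range(fw))
             for j in range(out_w)] for i in range(out_h)]

x = [[3, 1, 0, 2, 1],
     [2, 0, 1, 1, 0],
     [1, 2, 0, 0, 3],
     [0, 1, 2, 1, 1],
     [2, 0, 1, 3, 0]]
w = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]  # vertical-edge filter: left column minus right column of the patch
z = conv_output(x, w, bias=-1, stride=2)  # output size (5 - 3) // 2 + 1 = 2
print(z)          # [[4, -4], [-1, -2]]
activated = [[max(0, v) for v in row] for row in z]  # ReLU
print(activated)  # [[4, 0], [0, 0]]
```

For example, the top-left value is (3 + 2 + 1) – (0 + 1 + 0) – 1 = 4.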
Pipeline Monitoring by Fiber Sensing
Detection and tracking of an excavator along a pipeline using an optic fiber as a sensor
Convert Signal to Image
Deep NN
Detect events
selective laser sintering (SLS) process monitoring and quality control
Goal: Improvement of process monitoring and quality control for selective laser sintering (SLS) by using modern machine/deep learning algorithms
Defects at the powder bed have a negative impact on the process and material properties of the component.
Types of defect:
curling, warpage, surface defects, layer delamination, cracks
Curling: clearly visible at IR and non coated
data augmentation:
rotation up to 20%
shifting by at most 2% of the height
rescaling
cutting away 0.15%
filling missing pixels with the nearest filled values
zooming by at most 15%
horizontal flipping
data divided into 80% training, 10% validation, 10% test
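A sketch of the 80/10/10 split, assuming the images can be referred to by an index or file name. The fixed seed is an arbitrary choice for reproducibility. Splitting before augmenting (and augmenting only the training part) is commonly recommended so that augmented copies of one image do not end up in both training and test data.

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle, then split into training / validation / test; the remainder goes to test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 800 100 100
```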
VGG-16
138 × 10⁶ parameters
Xception
23 × 10⁶ parameters
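The VGG-16 figure can be reproduced by counting weights and biases layer by layer. This sketch assumes the standard configuration: 224 × 224 RGB input, 13 convolutional layers with 3 × 3 kernels in five blocks (each followed by 2 × 2 max pooling), then fully connected layers of 4096, 4096 and 1000 units. The Xception count is not reproduced here.

```python
# VGG-16 convolutional blocks: (number of 3x3 conv layers, output channels)
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_params(num_classes=1000, in_channels=3, image_size=224):
    total, c_in = 0, in_channels
    for n_layers, c_out in BLOCKS:
        for _ in range(n_layers):
            total += 3 * 3 * c_in * c_out + c_out  # weights + biases of one 3x3 conv layer
            c_in = c_out
    spatial = image_size // 2 ** len(BLOCKS)       # five 2x2 max-pools: 224 -> 7
    fc_sizes = [spatial * spatial * c_in, 4096, 4096, num_classes]
    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
        total += n_in * n_out + n_out              # weights + biases of one dense layer
    return total

print(vgg16_params())  # 138357544
```

Most of these parameters (about 124 × 10⁶) sit in the fully connected layers, the first of which alone connects 7 × 7 × 512 = 25088 inputs to 4096 units.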
Grad-CAM
Gradient-weighted class activation mapping uses the gradients of the class score with respect to the feature maps of a convolutional layer, obtained by backpropagation, as weights for those maps and highlights the image regions that are important for the network's prediction
Helpful for interpreting and explaining the results of deep neural networks
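A sketch of the Grad-CAM combination step, assuming the feature maps A^k of the chosen convolutional layer and the gradients of the class score with respect to them have already been obtained (in practice via a framework's backpropagation). Each map's weight is the spatial average of its gradients, and the heatmap is the ReLU of the weighted sum of the maps; it is usually then upsampled to the image size. The input values below are made up.

```python
def grad_cam(feature_maps, gradients):
    """feature_maps[k][r][c] = A^k at (r, c); gradients[k][r][c] = d(class score) / dA^k at (r, c).
    Returns the unnormalised Grad-CAM heatmap with the same spatial size as the maps."""
    n_maps = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # alpha_k: global average of the gradients over all positions of map k
    alphas = [sum(sum(row) for row in gradients[k]) / (h * w) for k in range(n_maps)]
    # weighted sum of the feature maps, then ReLU to keep only positive evidence for the class
    return [[max(0.0, sum(alphas[k] * feature_maps[k][r][c] for k in range(n_maps)))
             for c in range(w)] for r in range(h)]

feature_maps = [[[1, 0], [0, 2]],
                [[0, 3], [1, 0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],   # map 0 supports the class (alpha = 0.5)
             [[-1, -1], [-1, -1]]]       # map 1 speaks against it (alpha = -1)
print(grad_cam(feature_maps, gradients))  # [[0.5, 0.0], [0.0, 1.0]]
```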