Computer Vision

Buffl

Advanced AI and Data Science

by Carmen F.

Computer Vision Process

Image Preprocessing

Data augmentation

create modified versions of the input image to provide more training examples for the ML model

e.g.:

de-texturize
de-colorize
edge enhancement
salient edge map
flip/rotate
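
A minimal sketch of the flip/rotate case, assuming a grayscale image stored as a nested list. The function names are illustrative and do not come from a specific library:

```python
def flip_horizontal(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    # Reverse the row order, then transpose rows and columns.
    return [list(col) for col in zip(*image[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
]

# The original plus two modified versions: three examples from one image.
augmented = [image, flip_horizontal(image), rotate_90_clockwise(image)]
```

The label of the image stays the same for every augmented copy, which is what makes the extra examples usable for training.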

Feature Extraction

traditional machine learning requires handcrafted feature extraction

CNN

convolutional neural networks

no need to manually extract features from the image

network extracts them automatically and learns importance by applying weights to connections

CNN History and Applications

80s: roots in the neocognitron (no end-to-end supervised learning)

90s: time-delay neural networks for speech recognition

2000s: detection, segmentation and recognition of objects and regions in images (abundance of labeled data)

2012: ImageNet competition: dataset of about a million images with 1000 different classes

successes since then came from:

use of GPUs
ReLU
Dropout
data augmentation

CNN motivation

CNN Heuristics

exploit that high level features are composed of lower level ones

inspired by the classic notions of simple cells and complex cells in visual neuroscience

key ideas that take advantage of the properties of natural signals:

local connections
shared weights
pooling
many layers

CNN convolution

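local groups of values are often highly correlated -> distinctive local motifs

local statistics are invariant to location

if a motif appears in one part of the image, it could appear anywhere

mathematically it is a discrete convolution

the operation multiplies a filter element-wise with each local patch and sums the result (this can be implemented as a matrix multiplication)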
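
A minimal sketch of this operation in plain Python, assuming no padding and a stride of 1. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The name `conv2d` and the example values are illustrative:

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    fh, fw = len(kernel), len(kernel[0])
    out_h = len(image) - fh + 1
    out_w = len(image[0]) - fw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same weights are used at every location:
            # element-wise multiply the patch with the kernel, then sum.
            total = sum(
                image[i + u][j + v] * kernel[u][v]
                for u in range(fh)
                for v in range(fw)
            )
            row.append(total)
        output.append(row)
    return output


# A vertical edge sits at the boundary between the 0s and the 1s.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)  # [[0, 2, 0], [0, 2, 0]]
```

The feature map responds only where the edge motif occurs, regardless of the row in which it appears.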

CNN pooling

merge semantically similar features into one

relative position of features in a motif can vary -> coarse-graining the position

computes the maximum of a local patch of units in one feature map

neighbouring pooling units take input from patches that are shifted by more than one row or column -> reduces the dimension of the representation

allows representations to vary very little when elements in the previous layer vary in position and appearance

The goal of pooling layers is to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters (thereby limiting the risk of overfitting). Reducing the input image size also makes the neural network tolerate a little bit of image shift (location invariance).
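
A minimal sketch of max pooling, assuming a 2 × 2 window with stride 2 as the default. The function name and the example values are illustrative:

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size patch."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(
                max(
                    feature_map[i + u][j + v]
                    for u in range(size)
                    for v in range(size)
                )
            )
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)  # [[4, 2], [2, 8]]
```

The 4 × 4 map shrinks to 2 × 2, and moving a maximum to another cell inside its patch leaves the output unchanged.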

feature map

A neuron located in row i, column j of a given layer is connected to the outputs of the neurons in the previous layer located in rows i to i + fh – 1, columns j to j + fw – 1, where fh and fw are the height and width of the receptive field (the “filter” of the neurons in the feature map)

In order for a layer to have the same height and width as the previous layer, it is common to add zeros around the inputs. This is called zero padding
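
A small sketch of zero padding and of the resulting layer size. The helper names `zero_pad` and `output_size` are illustrative, and the size formula assumes the same padding on every side:

```python
def zero_pad(image, pad):
    """Add `pad` rows and columns of zeros around the image."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def output_size(n, f, pad, stride=1):
    """Output length along one axis for input length n and filter length f."""
    return (n + 2 * pad - f) // stride + 1


padded = zero_pad([[1, 2], [3, 4]], pad=1)
same_size = output_size(n=5, f=3, pad=1)    # 5: same size as the input
shrunk_size = output_size(n=5, f=3, pad=0)  # 3: the layer shrinks without padding
```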

strides

distance between two consecutive receptive fields

A neuron located in row i, column j in the upper layer is connected to the outputs of the neurons in the previous layer located in rows i × sh to i × sh + fh – 1, columns j × sw to j × sw + fw – 1, where sh and sw are the vertical and horizontal strides.
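
The index rule above can be checked with a few lines of Python. The helper name is illustrative, and the symbols fh, fw, sh and sw follow the card:

```python
def receptive_field(i, j, fh, fw, sh=1, sw=1):
    """Return the (first, last) rows and columns seen by neuron (i, j)."""
    rows = (i * sh, i * sh + fh - 1)
    cols = (j * sw, j * sw + fw - 1)
    return rows, cols


# 3 x 3 receptive field, stride 2 in both directions.
rows, cols = receptive_field(i=1, j=2, fh=3, fw=3, sh=2, sw=2)
# rows == (2, 4), cols == (4, 6)
```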

Stacking Multiple Feature Maps

Within one feature map, all neurons share the same parameters (weights and bias term), but different feature maps may have different parameters.

A neuron located in row i, column j of the feature map k in a given convolutional layer l is connected to the outputs of the neurons in the previous layer l – 1, located in rows i × sh to i × sh + fh – 1 and columns j × sw to j × sw + fw – 1, across all feature maps in layer l – 1.
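
A sketch of the output of one such neuron, before the activation function is applied. The data layout (`inputs[row][col][k_prev]`, `weights[u][v][k_prev][k]`) and all names are assumptions made for this example:

```python
def neuron_output(inputs, weights, biases, i, j, k, sh=1, sw=1):
    """Weighted sum for the neuron at row i, column j of feature map k."""
    fh, fw = len(weights), len(weights[0])
    num_prev_maps = len(inputs[0][0])
    z = biases[k]  # one bias term per feature map
    for u in range(fh):
        for v in range(fw):
            for k_prev in range(num_prev_maps):  # across all previous feature maps
                z += inputs[i * sh + u][j * sw + v][k_prev] * weights[u][v][k_prev][k]
    return z


# 2 x 2 input with two feature maps, one 2 x 2 filter with all weights set to 1.
inputs = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
weights = [
    [[[1], [1]], [[1], [1]]],
    [[[1], [1]], [[1], [1]]],
]
biases = [0.5]
z = neuron_output(inputs, weights, biases, i=0, j=0, k=0)  # 36.5
```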

CNN Architecture

Some CNN Architectures

AlexNet

Won the 2012 ImageNet ILSVRC challenge by a large margin

Quite similar to LeNet-5, only much larger and deeper, and it was the first to stack convolutional layers directly on top of each other, instead of stacking a pooling layer on top of each convolutional layer

GoogLeNet

Won the ILSVRC 2014 challenge by pushing the top-5 error rate below 7%

This great performance came in large part from the fact that the network was much deeper than previous CNNs

Inception Module

Inception modules allow GoogLeNet to use parameters much more efficiently than previous architectures

The 1 × 1 convolutional layers cannot capture spatial patterns (they look at one pixel at a time), but they can capture patterns along the depth dimension.

They output fewer feature maps than their inputs, so they serve as bottleneck layers, meaning they reduce dimensionality.

Each pair of convolutional layers ([1 × 1, 3 × 3] and [1 × 1, 5 × 5]) acts like a single, powerful convolutional layer, capable of capturing more complex patterns. Indeed, instead of sweeping a simple linear classifier across the image (as a single convolutional layer does), this pair of convolutional layers sweeps a two-layer neural network across the image.
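
A rough parameter count illustrates the bottleneck effect. The layer sizes (192 input maps, 16 bottleneck maps, 32 output maps) are chosen only for illustration:

```python
def conv_params(f, in_maps, out_maps):
    """Weights plus one bias per output feature map for an f x f convolution."""
    return f * f * in_maps * out_maps + out_maps


# A 5 x 5 convolution applied directly to 192 input feature maps.
direct = conv_params(5, 192, 32)

# A 1 x 1 bottleneck down to 16 maps, followed by the 5 x 5 convolution.
bottleneck = conv_params(1, 192, 16) + conv_params(5, 16, 32)
```

With these assumed sizes the direct layer needs 153,632 parameters and the bottleneck pair 15,920, roughly a tenth.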

YOLO

You Only Look Once

Object Detection

Region proposal: an algorithm or a DL model generates regions of interest (RoIs) to be further processed by the system.

Feature extraction and network predictions: visual features are extracted for each of the bounding boxes. They are evaluated, and it is determined whether and which objects are present in the proposals based on these visual features.

Non-maximum suppression (NMS): at this point the model has likely found multiple bounding boxes for the same object. NMS keeps the highest-scoring box and discards the boxes that overlap it strongly, leaving one box per object.

Evaluation metrics: mean average precision (mAP), precision-recall curve (PR curve), and intersection over union (IoU).
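
A minimal sketch of IoU and of a greedy NMS built on it. It assumes boxes given as (x1, y1, x2, y2) and an IoU threshold of 0.5. Both are common conventions, not something fixed by these notes:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive greedy NMS."""
    order = sorted(range(len(boxes)), key=lambda idx: scores[idx], reverse=True)
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(idx)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is treated as a duplicate detection and dropped.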

Semantic Segmentation

In semantic segmentation, each pixel is classified according to the class of the object it belongs to (e.g., road, car, pedestrian, building, etc.)
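
As a small illustration of the per-pixel classification step, the sketch below assumes the network has already produced one score per class for every pixel. The class list and the score values are made up for the example:

```python
CLASSES = ["road", "car", "pedestrian"]  # illustrative label set


def segment(scores):
    """Map per-pixel class scores to a per-pixel class label."""
    labels = []
    for row in scores:
        labels.append(
            [CLASSES[max(range(len(pixel)), key=pixel.__getitem__)] for pixel in row]
        )
    return labels


# scores[row][col] holds one score per class for that pixel.
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7]],
]
label_map = segment(scores)
```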

Image Captioning

Image captioning is the process of generating a textual description of an image.

It uses both Natural Language Processing and Computer Vision to generate the captions.

Other applications for CNN

1D CNN for sequence data

Convert audio signal to spectrogram and use CNN

Convert sensor signals to image and use CNN

1D CNN WaveNet for Sequence Data

Stacked 1D convolutional layers, doubling the dilation rate at every layer: the first convolutional layer gets a glimpse of just two time steps at a time, while the next one sees four time steps, the next one sees eight time steps, and so on.

This way, the lower layers learn short-term patterns, while the higher layers learn long-term patterns. Thanks to the doubling dilation rate, the network can process extremely large sequences very efficiently.
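
The growth of the receptive field can be verified with a short calculation. It assumes a kernel size of 2, which is what makes the first layer see two time steps, and the function name is illustrative:

```python
def receptive_fields(num_layers, kernel_size=2):
    """Time steps seen by each layer when the dilation rate doubles per layer."""
    fields = []
    field = 1
    for layer in range(num_layers):
        dilation = 2 ** layer  # 1, 2, 4, 8, ...
        field += (kernel_size - 1) * dilation
        fields.append(field)
    return fields


fields = receptive_fields(4)    # [2, 4, 8, 16]
deep = receptive_fields(10)[-1]  # 1024 time steps after only 10 layers
```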

Audio Signal Processing by CNN

Convert audio signal to spectrogram (time-frequency signal representation).

Use CNN on spectrograms.
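
A dependency-free sketch of the conversion step. It computes a magnitude spectrogram with a plain DFT per frame and no window function. Real pipelines typically use an FFT library and a window, so this is only meant to show the shape of the result:

```python
import cmath
import math


def spectrogram(signal, frame_size, hop):
    """Return a list of frames, each a list of DFT magnitudes (time x frequency)."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Test tone with 2 cycles per 8-sample frame, so the energy lands in bin 2.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = spectrogram(signal, frame_size=8, hop=4)
```

The result is a 2D array (7 frames × 5 frequency bins here) that can be fed to a CNN like a single-channel image.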

Transfer Learning

What are two main operations in CNNs?

feature extraction with:
- pooling: merges semantically similar features into one, e.g. by taking the maximum value of each local patch
- convolution: local groups of values are highly correlated, so a filter combines each local patch into a single value of a smaller feature map

Calculate the output of a convolutional filter for given input and parameters values.

Which kinds of CNNs do you know?

How can we generate new images using deep learning?

generative models are trained on existing data, similarly to classification neural networks

they then generate new data with characteristics similar to the training data

What is transfer learning and when it can be used?

taking an already trained neural network and training it further on additional data, moving from a general application to a more specific one

How would you design a CNN for a particular application e.g. autonomous driving or medical image segmentation?

Pipeline Monitoring by Fiber Sensing

Detection and tracking of an excavator along a pipeline using an optical fiber as a sensor

Convert signal to image
Deep NN
Detect events

Selective Laser Sintering (SLS) Process Monitoring and Quality Control

Goal: Improvement of process monitoring and quality control for selective laser sintering (SLS) by using modern machine/deep learning algorithms

Defects at the powder bed have a negative impact on the process and material properties of the component.

Types of defect:

curling, warpage, surface defects, layer delamination, cracks

Curling: clearly visible in IR and non-coated

data augmentation:

rotation 20%
shifting up to max. 2% of the height
rescaling
cutting away 0.15%
filling missing pixels with the nearest filled values
zooming max. 15%
horizontal flipping

data divided into 80% training, 10% validation, 10% test
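
A minimal sketch of such a split, assuming the samples are shuffled first with a fixed seed. The function name and the seed are illustrative:

```python
import random


def split_dataset(samples, train=0.8, val=0.1, seed=0):
    """Shuffle the samples and split them into train, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder is the test set
    )


train_set, val_set, test_set = split_dataset(range(100))
```

Augmentation is normally applied to the training set only, so that validation and test images stay unmodified.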

VGG-16

138 × 10⁶ parameters

Xception

23 × 10⁶ parameters

Grad-CAM

Gradient-weighted class activation mapping (Grad-CAM) uses the gradients obtained by backpropagation as weights and highlights the important areas related to the predictions made by the network

Helpful for interpreting and explaining the results of deep neural networks
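
A sketch of the weighting step, assuming the activations of the last convolutional layer and the gradients of the class score with respect to them are already available. The tiny arrays below are made up for illustration:

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a heatmap, weighted by their average gradient.

    activations[k][i][j]: value of feature map k at position (i, j)
    gradients[k][i][j]:   gradient of the class score w.r.t. that value
    """
    height, width = len(activations[0]), len(activations[0][0])
    # One weight per feature map: the spatial average of its gradients.
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]
    heatmap = []
    for i in range(height):
        row = []
        for j in range(width):
            value = sum(w * act[i][j] for w, act in zip(weights, activations))
            row.append(max(0.0, value))  # ReLU: keep only positive evidence
        heatmap.append(row)
    return heatmap


activations = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
heatmap = grad_cam(activations, gradients)  # [[0.5, 0.0], [0.0, 0.0]]
```

Only the top-left position is highlighted: the second feature map is active at the bottom right, but its negative gradient means it speaks against the class.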

