What is Machine Perception?
a computer acquiring the ability to interpret data about its environment
=> sensors as input…
How does computer vision relate to machine perception?
machine perception that focuses on cameras as input…
What are advantages of cameras for machine perception?
cheapest sensor on a car
-> captures most of the important information about the environment
images can be processed by classical CV algorithms and deep learning algorithms
What challenges do images pose for computer vision?
occlusion (not whole object is seen)
viewpoint variation
illumination
background clutter
deformation (e.g. a contortionist -> does not have a “natural / usual” shape)
inter-class similarities
intra-class variation
How to overcome challenges in computer vision (image wise)?
have lots and lots of labelled images…
What is the computer vision pipeline?
choose hardware (camera, lens…)
acquire image/video stream
integrate image in SW pipeline
preprocess image for further evaluation
detect/extract features in image
localize object
classify object
What to keep in mind when acquiring images?
lighting
machine vision is not as capable as human vision
-> carefully illuminate the scene one wants to capture
=> problem autonomous driving: light changes during day, weather,…
distance and area
camera allows choosing the Field of View (FOV)
-> to take good photos, one has to know the FOV and working distance…
resolution
sensor that converts light from lens into electrical signal
-> array of signals (pixels..)
frame grabber and software
-> frame grabber (normally) sends digitized image over bus system to computer
Pros and cons of MATLAB for computer vision?
pros:
easy to use
good documentation
GPU boost
cons:
closed environment
performance
Pros and cons of OpenCV for computer vision?
pros:
free for everyone (unlike MATLAB, only for students)
languages: C++, Python
everything you need
powerful
GPU boost
cons:
installation (especially GPU)
What is image processing?
method to perform operations on an image in order to
enhance image
extraction of useful information
Steps image processing pipeline
Camera takes picture
camera calibration
digital transformation
color spaces
filtering
contrast enhancement
affine transformation
resampling / compression
save to image
What is camera calibration, and why is it needed?
camera maps 3D world points onto a 2D image
-> most cameras produce distortion
=> camera calibration:
find parameters of camera and lens that affect image processing
What types of parameters can cause distortion?
extrinsic camera parameters
position of camera center, camera heading
intrinsic camera parameters
focal length
image sensor format
lens distortion parameters
Difference between radial and tangential distortion?
radial:
curvature of the lens
-> edges of images distorted
-> objects / lines appear more or less curved than they actually are
tangential:
lens not perfectly aligned (parallel) to the image plane
-> image looks tilted, so that some objects appear farther away or closer than they actually are
How to perform calibration for distortion (radial)
find and draw corners
-> find distortion coefficients and correction
correct
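A minimal sketch of what "find distortion coefficients and correct" can mean, assuming the common polynomial radial model x_d = x * (1 + k1*r^2 + k2*r^4) on normalized image coordinates. The model choice, the coefficient value and the function names are assumptions for illustration and are not taken from these notes.

```python
def distort_radial(x, y, k1, k2=0.0):
    """Apply polynomial radial distortion to normalized image coordinates."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort_radial(xd, yd, k1, k2=0.0, iterations=20):
    """Invert the model by fixed-point iteration (there is no simple closed form)."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


k1 = -0.2  # hypothetical coefficient: points are pulled toward the image center
xd, yd = distort_radial(0.5, 0.25, k1)
x, y = undistort_radial(xd, yd, k1)  # recovers approximately (0.5, 0.25)
```

In practice the coefficients are estimated from many detected pattern corners; here k1 is simply given, so only the correction step is shown.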
What is a picture in data representation?
matrix of values based on color (channels x dim_x in pixels x dim_y in pixels)
=> represents rectangular grid of evenly spaced pixels
How to detect color?
filter for red, green and blue
-> at each pixel, measure amount of light falling into sensor…
Why are there different color spaces? Name them
different problems call for different color spaces, as each has different attributes
RGB
HSV
CMYK
CIE
How many values does RGB have?
8 bit for each channel
-> 0-255 (256 values per channel)
-> 256^3 = 16,777,216 possible colors
How to understand RGB?
cube in 3D space with edge length 255 along each axis (R, G, B)
-> points in cube represent color
How are HSV colors indicated?
Hue (0° red, 120° green, 240° blue) -> in between the transition from these colors…
saturation (0% gray, 100% saturated color)
value (0% dark, 100% light)
How to understand hsv?
cylinder
go around it for different colors
go up for lighter, down for darker
go to center for gray (white when fully light) and to border for intense color
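To make the cylinder picture concrete, a small sketch using Python's standard `colorsys` module. The helper name `rgb8_to_hsv` and the scaling to degrees and percent are choices made here for illustration.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation in %, value in %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 360), round(s * 100), round(v * 100)


pure_red = rgb8_to_hsv(255, 0, 0)      # hue 0
pure_green = rgb8_to_hsv(0, 255, 0)    # hue 120
pure_blue = rgb8_to_hsv(0, 0, 255)     # hue 240
mid_gray = rgb8_to_hsv(128, 128, 128)  # saturation 0 -> on the cylinder axis
```

The gray pixel has saturation 0, i.e. it sits on the central axis of the cylinder, and only its value (height) distinguishes it from black or white.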
What can filtering be used for in image processing?
apply matrix convolution to emphasize information
by transforming the image
=> convolution with kernel (filter/mask/conv matrix)
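The convolution with a kernel can be sketched in plain Python as follows. This is an illustrative version without padding ("valid" output, so the result is smaller than the input); the function name and the tiny test image are assumptions, and real code would typically use an optimized library routine.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a grayscale image (list of rows) with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Convolution flips the kernel along both axes; for symmetric kernels
    # (mean, Gaussian) this makes no difference.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for y in range(ih - kh + 1):
        out_row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * flipped[j][i]
            out_row.append(acc)
        out.append(out_row)
    return out


mean_kernel = [[1 / 9] * 3 for _ in range(3)]

# Uniform image with one bright outlier pixel in the middle.
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 100, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
smoothed = convolve2d(image, mean_kernel)  # 3x3 result, outlier spread out
```

Every 3x3 window of this image contains the outlier once, so the mean filter flattens it to roughly 20 everywhere in the output.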
Name some filters
mean
gaussian blur
sharpen
What is contrast? And what is the idea to alter it?
range of difference in color and brightness
-> low : image values concentrated in narrow range
enhancement -> change image value distribution to cover wider range
What is used to determine contrast?
histogram
-> cumulative distribution function should optimally be linear …
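One way to push the cumulative distribution toward a straight line is histogram equalization. The sketch below is a simplified version for a flat list of integer gray values; the function name and the sample data are made up for illustration.

```python
def equalize(pixels, levels=256):
    """Histogram equalization for a flat list of integer gray values."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function (as counts).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    scale = levels - 1
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * scale) for p in pixels]


low_contrast = [100, 100, 101, 101, 102, 102, 103, 103]
stretched = equalize(low_contrast)
```

The input values occupy only 100 to 103; after equalization they are spread evenly over the full 0 to 255 range.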
What are affine transformations? What to keep in mind?
they preserve points, straight lines and planes…
=> parallel lines remain parallel after transformation
=> do not necessarily preserve angles or distances…
-> but preserve ratios of distances between points lying on a straight line
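A short numeric check of these properties, assuming the usual form x' = A x + t. The matrix `A` and offset `t` below are arbitrary example values (shear, non-uniform scaling and translation), not something prescribed by the notes.

```python
import math


def affine(point, matrix, offset):
    """Apply x' = A x + t to a 2D point."""
    x, y = point
    (a, b), (c, d) = matrix
    tx, ty = offset
    return (a * x + b * y + tx, c * x + d * y + ty)


def dist(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])


A = ((2.0, 1.0), (0.0, 3.0))  # hypothetical shear + non-uniform scale
t = (5.0, -1.0)               # hypothetical translation

# Three collinear points with distance ratio |p0p1| : |p1p2| = 1 : 3.
p0, p1, p2 = (0.0, 0.0), (1.0, 1.0), (4.0, 4.0)
q0, q1, q2 = (affine(p, A, t) for p in (p0, p1, p2))

ratio_before = dist(p0, p1) / dist(p1, p2)
ratio_after = dist(q0, q1) / dist(q1, q2)

# The x and y unit vectors are perpendicular before the transformation ...
ex = (A[0][0], A[1][0])  # image of (1, 0) without translation
ey = (A[0][1], A[1][1])  # image of (0, 1) without translation
dot_after = ex[0] * ey[0] + ex[1] * ey[1]  # ... but not afterwards
```

The ratio along the line stays the same while the distances themselves change, and the non-zero dot product shows that the right angle is lost.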
What is resampling?
change resolution (number of pixels)
-> downsampling -> decrease resolution (e.g. mean/max/min filtering)
-> upsampling -> increase resolution (e.g. interpolation)
What are interpolated points in upsampling based on?
each cell in new raster must be computed by
sampling or interpolation over some neighborhood cells in corresponding position in original raster object
What are some upsampling techniques?
nearest neighbor
bilinear
bicubic
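The first two techniques can be sketched in a few lines. This is an illustrative version on a nested list; the function names and the 2x2 sample image are assumptions, and bicubic is left out because it needs a larger neighborhood.

```python
def upsample_nearest(img, factor):
    """Nearest neighbor: repeat every pixel `factor` times along both axes."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


def bilinear_sample(img, y, x):
    """Bilinear: weight the 4 surrounding pixels by distance to (y, x)."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


small = [[0, 100],
         [100, 200]]
nearest = upsample_nearest(small, 2)       # blocky 4x4 image
center = bilinear_sample(small, 0.5, 0.5)  # value halfway between all 4 pixels
```

Nearest neighbor only copies existing values, so the result looks blocky; bilinear interpolation creates new in-between values and gives smoother transitions.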
What is a feature ?
piece of information relevant for solving computational task related to certain application
What is feature detection?
includes methods for computing abstractions of image information
-> and making local decisions at every image point
=> resulting features are subsets of the image domain like isolated points, continuous curves, connected regions
=> e.g. edges, corners, lines,…
What is feature extraction?
after detecting features
-> local image patch around feature can be extracted
=> e.g. isolation of shapes of digitized image or video stream
What is feature extraction used for?
object detection
robot navigation
motion tracking
…
What is the goal of edge detection?
Identify sudden changes
(discontinuities) in an image
intuitively:
most semantic and shape info can be encoded in edges
more compact than pixels
What is the ideal of edge detection?
artists line drawing of an object
-> but artist has object-level knowledge …
What factors cause edges?
surface normal discontinuity (surface ends)
depth discontinuity (e.g. backside of ball not visible although continuous surface…)
surface color discontinuity
illumination discontinuity (e.g. shadow)
How can edges be detected?
-> rapid change in intensity function of image… (grayscale…)
=> first derivative -> extrema are edges
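A one-dimensional sketch of this idea on a single image row. The central-difference approximation of the derivative and the sample values are choices made here for illustration.

```python
def first_derivative(signal):
    """Central differences; the two border samples are left at 0."""
    d = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        d[i] = (signal[i + 1] - signal[i - 1]) / 2.0
    return d


# Gray values along one row: dark region, steep ramp, bright region.
row = [10, 10, 10, 12, 60, 110, 112, 112, 112]
gradient = first_derivative(row)

# The edge is where the derivative has its extremum.
edge_index = max(range(len(row)), key=lambda i: abs(gradient[i]))
```

The derivative is close to zero in the flat regions and peaks in the middle of the ramp, which is where the edge is reported.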
What is probably the most widely used edge detector in computer vision?
canny edge detection
How does canny edge detection function?
gaussian filter to smooth image
remove noise
larger kernel size decreases sensitivity to noise but increases localization error
compute intensity gradients for each pixel
non-maximum suppression -> thin multi-pixel-wide ridges down to single-pixel width -> across a narrow intensity transition, choose only the maximum / minimum…
apply lower and upper threshold for edges (hysteresis)
pixels with gradient above the upper threshold are used to start an edge curve
if neighboring pixels are above the lower threshold, the edge is continued
if gradient is below the lower threshold, it is discarded as noise
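The hysteresis step alone can be sketched as a flood fill that starts at strong pixels and grows through weak ones. This is a simplified illustration that assumes the gradient magnitudes are already computed and thinned; the function name, the 8-neighborhood and the threshold values are assumptions.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Boolean edge map from gradient magnitudes using two thresholds."""
    h, w = len(magnitude), len(magnitude[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    # Strong pixels start edge curves.
    for y in range(h):
        for x in range(w):
            if magnitude[y][x] >= high:
                edges[y][x] = True
                queue.append((y, x))
    # Weak pixels are kept only if connected to an already accepted pixel.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and magnitude[ny][nx] >= low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges


magnitude = [
    [0, 0, 0, 0, 0, 0],
    [90, 60, 55, 20, 0, 60],
    [0, 0, 0, 0, 0, 0],
]
edge_map = hysteresis(magnitude, low=50, high=80)
```

The weak pixels next to the strong one (60 and 55) are kept, while the isolated weak pixel at the right end of the row is dropped as noise even though it has the same magnitude.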
Advantage and remaining disadvantage of feature extraction
edge detection makes it possible to considerably reduce the amount of data in an image
-> but image still described by pixels…
=> if lines, ellipses, … could be defined by characteristic equations => amount of data reduced even more
How to reduce the amount of required data after edge detection even more?
hough transformation
How is the Hough transformation mathematically structured?
all lines in 2d representable by y=ax+b
=> each line represented by (a,b)
=> each line (a,b) corresponds to single point pi in hough space
=> hough space has the two features a,b
What is a problem of representing single points in hough space?
each point is part of an infinite number of lines -> each point creates infinitely many points in Hough space
=> these points form a line in Hough space, determined by all (a,b) combinations of lines going through the point in image space
How to reconcile the principle that a line in image space corresponds to a point in Hough space with the fact that each point on the line L creates a separate line in Hough space?
the intersection of the Hough-space lines created by all these points on the line L
=> corresponds to the single line in real space…
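The intersection can be found by voting. The sketch below stays in the (a, b) parameterization used above and tries only a coarse grid of integer slopes, which is an assumption made to keep it short. Practical implementations usually switch to a (rho, theta) parameterization, because y = ax + b cannot represent vertical lines.

```python
from collections import Counter


def hough_lines_ab(points, slopes):
    """Each point (x, y) votes for every (a, b) with b = y - a*x."""
    votes = Counter()
    for x, y in points:
        for a in slopes:
            b = y - a * x
            votes[(a, b)] += 1
    return votes


# Edge points: four lie on y = 2x + 1, one is an outlier.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (2, 0)]
slopes = range(-3, 4)  # hypothetical coarse slope grid
votes = hough_lines_ab(points, slopes)

# The Hough-space cell where most lines intersect is the detected line.
(best_a, best_b), count = votes.most_common(1)[0]
```

Every point contributes one line in Hough space; the cell hit by the most of those lines gives the slope and intercept of the line through the collinear points.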
What intuition can be used for corner detection?
shifting a window in any direction should result in large changes…
=> flat region -> same intensity -> no changes when moving
=> edge -> no change when moving along the edge direction
=> corner -> significant changes in all directions
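The window intuition can be tried out directly, in the spirit of the Moravec detector: measure how much a small window changes for each unit shift and keep the smallest change. Function names, window size and the synthetic image are assumptions for illustration.

```python
def shift_ssd(img, y, x, dy, dx, half=1):
    """Sum of squared differences between the window at (y, x) and its shifted copy."""
    total = 0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            diff = img[y + j + dy][x + i + dx] - img[y + j][x + i]
            total += diff * diff
    return total


def corner_response(img, y, x):
    """Smallest change over all 8 unit shifts: large only if every direction changes."""
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dy, dx) != (0, 0)]
    return min(shift_ssd(img, y, x, dy, dx) for dy, dx in shifts)


# 9x9 image: dark background with a bright square in the lower-right part.
img = [[1 if (y >= 4 and x >= 4) else 0 for x in range(9)] for y in range(9)]

flat = corner_response(img, 2, 2)    # inside the dark region
edge = corner_response(img, 6, 4)    # on the vertical edge of the square
corner = corner_response(img, 4, 4)  # on the corner of the square
```

Only the corner position gives a non-zero response: in the flat region nothing changes, and on the edge the shift along the edge direction leaves the window unchanged.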
How does depth detection work?
have two images available (taken by two cameras)
have baseline (distance between cameras)
have focal length f
calculate disparity d: difference in x-axis position of the same object between the two images
Z = B*f/d
distance = Baseline * focal length / disparity (x2-x1)
=> map depth information to each pixel…
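A worked example of the formula. It assumes the focal length is given in pixels and the baseline in meters, so the depth comes out in meters; the numbers are invented for illustration.

```python
def depth_from_disparity(baseline, focal_length_px, x1, x2):
    """Z = B * f / d, with disparity d = |x2 - x1| in pixels."""
    disparity = abs(x2 - x1)
    if disparity == 0:
        return float("inf")  # no shift between the images: point infinitely far away
    return baseline * focal_length_px / disparity


# Hypothetical stereo rig: 0.5 m baseline, 700 px focal length.
# The same object appears at x = 300 px in one image and x = 335 px in the other.
z = depth_from_disparity(0.5, 700.0, 300.0, 335.0)  # depth in meters
```

A larger disparity means a closer object: doubling the disparity halves the computed depth.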
What methods exist for image classification?
rule based
->manual feature extraction and classification logic
machine learning
-> manual feature extraction
classifier learned automatically
deep learning
-> relevant features and classifier learned jointly
How to do video analysis?
video as sequence of frames over time
-> image data is function of space (x,y) and time t
=> optical flow is the pattern of apparent motion of objects, surfaces and edges
=> in a visual scene, caused by relative motion between observer and scene
What is SLAM?
simultaneous localization and mapping
=> visual odometry -> process of determining position and orientation of robot by analyzing associated camera images
Mean filter?
Gaussian Blur (3x3) filter?
Gaussian Blur (5x5)
bell curve on both axes (normal distribution…)
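For reference, a sketch of what such kernels commonly look like. It assumes the widely used binomial approximation of the Gaussian (1-2-1 weights for 3x3, 1-4-6-4-1 for 5x5); the exact kernels meant on these cards are not stated in the notes, and the function names are made up.

```python
from math import comb


def mean_kernel(size):
    """All weights equal, summing to 1."""
    return [[1 / (size * size)] * size for _ in range(size)]


def binomial_gaussian_kernel(size):
    """Approximate Gaussian: outer product of a row of binomial coefficients."""
    row = [comb(size - 1, k) for k in range(size)]  # e.g. [1, 2, 1]
    total = sum(row) ** 2                            # normalizes the sum to 1
    return [[a * b / total for b in row] for a in row]


m3 = mean_kernel(3)               # every entry 1/9
g3 = binomial_gaussian_kernel(3)  # [[1,2,1],[2,4,2],[1,2,1]] / 16
g5 = binomial_gaussian_kernel(5)  # outer product of [1,4,6,4,1], divided by 256
```

Unlike the mean kernel, the Gaussian kernels weight the center pixel most and fall off toward the border along both axes, which gives the bell shape.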
Image Intensity Function and derivative
In Canny, what is non-maximum suppression used for?
“Thin” the lines to create clear edges and not blurry ones…
In Canny, what is the blur used for?
suppress noise
What is the pipeline for machine learning based image classification?
have input image
change color and/or contrast (e.g. HSV is prominent, better edge detection…)
compute histogram of oriented gradients (HOGs)
Normalize contrast over overlapping spatial blocks
collect HOGs over detection window
apply machine learning
-> person / non-person classification
Are HOGs the only solution?
no, one of many possible methods…
e.g. SIFT
SURF
How does HOG roughly work?
manually choose kernel to compute gradients
calculate orientation for each pixel (one of 8 angles \|/…)
=> use these features in ML algo…
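A rough sketch of the orientation histogram for a single cell, following the 8-direction description above. The simple [-1, 0, 1] gradient kernel, the magnitude weighting and the function name are assumptions; block normalization is omitted, and published HOG variants often use 9 unsigned bins over 180 degrees instead.

```python
import math


def orientation_histogram(img, bins=8):
    """Histogram of gradient orientations for one cell, weighted by magnitude."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # kernel [-1, 0, 1] along x
            gy = img[y + 1][x] - img[y - 1][x]  # same kernel along y
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle // (360.0 / bins)) % bins] += magnitude
    return hist


# Vertical edge: intensity rises from left to right, so gradients point along +x.
cell = [[0, 0, 50, 100, 100]] * 5
hist = orientation_histogram(cell)
```

All gradient energy of this cell lands in the first bin (direction 0 degrees); the resulting histogram is the feature vector handed to the classifier.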
Pipeline to detect lanes?
extract image shape
transform image to grayscale
blur image
perform Canny edge detection
mask region of interest in image
detect Hough lines