What types of descriptors did we discuss?
patch descriptors
-> i.e. patch of intensity values
needs to be warped into canonical space
census descriptors
a vector with integer/float values
What is patch scale search?
trying to find the correct scale between two patches before comparing them
-> brute force: try SSD for all (patch1) x (patch2 x scales) combinations
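A minimal sketch of this brute-force search, assuming each right-image patch has already been resampled at several candidate scales to the left patch's size (all names are our own):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    d = a.astype(np.float32) - b.astype(np.float32)
    return float(np.sum(d * d))

def match_with_scale_search(patches1, patches2_scaled):
    """Brute force: O(N^2 * S) SSD evaluations.
    patches2_scaled[j] holds the j-th right patch resampled at S
    candidate scales, all brought to the left patches' size."""
    matches = []
    for i, p1 in enumerate(patches1):            # left patch: scale stays fixed
        best_j, best_cost = -1, np.inf
        for j, pyramid in enumerate(patches2_scaled):
            for p2 in pyramid:                   # try every candidate scale
                cost = ssd(p1, p2)
                if cost < best_cost:
                    best_j, best_cost = j, cost
        matches.append((i, best_j, best_cost))
    return matches
```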
What are drawbacks of patch scale search?
inefficient
-> O(N^2 * S)
where N is the number of features per image
and S is the number of scales to try (we only vary the scale in one image,
as we fix the scale of the left patch)
-> cannot guarantee that the found scale is optimal (distinctive enough)
How do we want to improve the high complexity of patch scale search?
general idea: assume that we know the scale beforehand (a priori)
-> then matching is only O(N^2)…
goal: automatically determine the scale (independently of matching) before we actually match
=> i.e. determine the scale based on a single image
What is the goal of automatic scale determination?
we want to automatically find a scale (size)
for both images individually
=> i.e. independent of tentative matching unlike before
What methods do we have to find rotation for our patch descriptor?
Harris detector
pixel-wise gradient vectors
How does the harris detector work for derotating?
use the Harris detector as it is rotation invariant
-> the eigenvectors of the second-moment matrix M point in the directions of quickest and slowest change of the SSD (the eigenvalues give the corresponding rates of change)
=> they define an ellipse that can rotate, but whose shape stays the same
=> easy to assign a rotation…
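A minimal sketch of this eigen-analysis (helper and variable names are our own; assumes a grayscale float patch):

```python
import numpy as np

def structure_tensor(patch):
    """Second-moment matrix M = sum over the patch of
    [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]]."""
    Iy, Ix = np.gradient(patch.astype(np.float32))
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

rng = np.random.default_rng(0)
patch = rng.standard_normal((16, 16)).astype(np.float32)  # stand-in patch
eigvals, eigvecs = np.linalg.eigh(structure_tensor(patch))
# eigenvalues are unchanged under rotation; the eigenvector of the largest
# eigenvalue points along the quickest SSD change, so a patch rotation can
# be read off from it:
angle = np.arctan2(eigvecs[1, 1], eigvecs[0, 1])
```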
How does the second method to derotate using pixel wise gradient vectors work?
compute a gradient vector at each pixel within the patch
build a histogram of gradient orientations (0 to 2*pi), weighted by gradient magnitude (norm of the vector)
from this histogram -> extract local maxima above a certain threshold
=> these constitute candidate dominant directions (typically 3)
use these dominant directions to align the rotation of both patches
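A minimal sketch of this orientation assignment (bin count and threshold are our own choices; full SIFT additionally smooths the histogram):

```python
import numpy as np

def dominant_directions(patch, n_bins=36, thresh=0.8):
    """Histogram of gradient orientations in [0, 2*pi), weighted by
    gradient magnitude; local maxima above thresh * global max are
    kept as candidate dominant directions."""
    Iy, Ix = np.gradient(patch.astype(np.float32))
    mag = np.hypot(Ix, Iy)
    ori = np.arctan2(Iy, Ix) % (2 * np.pi)
    hist, edges = np.histogram(ori, bins=n_bins,
                               range=(0, 2 * np.pi), weights=mag)
    peaks = []
    for b in range(n_bins):
        left, right = hist[(b - 1) % n_bins], hist[(b + 1) % n_bins]
        if hist[b] >= max(left, right) and hist[b] >= thresh * hist.max():
            peaks.append((edges[b] + edges[b + 1]) / 2)  # bin-center angle
    return peaks
```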
Which strategy to rotate patches is preferred in practice?
the one based on gradient orientations
-> more robust
What is the aim of blob detection?
given single image
-> detect blobs and automatically assign them “appropriate” scales
Why is the use of blobs more convenient than e.g. corners?
blobs inherently encode scale information
-> corners can hardly do this
=> blobs can directly be resized to evaluate two patches by SSD
How is the scale mathematically expressed in blobs?
by the radius of the circle that represents the blob
How do we get from blob matching to point correspondences?
for two matched blobs
-> their respective centers constitute the corresponding points
What is a drawback of using blobs?
blob centers may not be very precise
compared with corners
How do we determine the optimal scale of a blob?
make use of the LoG (Laplacian of Gaussian)
-> defined for a fixed scale sigma
try a set of candidate scales sigma, for each of which we compute the LoG (applying the kernel)
the LoG value corresponds to the so-called “response”
-> find the sigma where the (scale-normalized) response is maximal
=> this is our optimal scale
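A minimal sketch using SciPy's Laplacian-of-Gaussian filter (the candidate sigmas are illustrative; the sigma^2 factor is the scale normalization):

```python
import numpy as np
from scipy import ndimage

def best_scale(image, y, x, sigmas=(1.0, 1.6, 2.6, 4.1, 6.6)):
    """For each candidate scale sigma, compute sigma^2 * LoG(image) and
    return the sigma with the strongest absolute response at (y, x)."""
    responses = [s**2 * ndimage.gaussian_laplace(image.astype(np.float32), s)
                 for s in sigmas]
    r = np.array([abs(resp[y, x]) for resp in responses])
    return sigmas[int(np.argmax(r))]
```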
How can we improve our Blob finding?
disregard patches that are too close and have the same scale
-> as they are probably the same blob…
How do we disregard “too similar / overlapping” blobs?
overlapping: centers are close and sigma is the same
=> use non-maximum suppression
=> only keep the blob with the higher response (i.e. the LoG extremum)
What is the pipeline for blob detection with scale using a single image?
build a Laplacian scale space, starting with an initial scale and going for n iterations:
generate a (scale-normalized) LoG filter at the current scale (k^n * initial scale)
filter the image with the LoG kernel
save the square of the Laplacian filter response for the current level of the scale space
increase the scale by a factor of k
perform non-maximum suppression in scale space (-> find the best scale for each blob)
display the resulting circles at their characteristic scales (see the sketch below)
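A minimal sketch of this pipeline (parameter values are our own); the 3D maximum filter over x, y, and scale also implements the overlapping-blob suppression from the previous card:

```python
import numpy as np
from scipy import ndimage

def detect_blobs(image, sigma0=1.6, k=1.26, n_levels=10, abs_thresh=0.05):
    img = image.astype(np.float32)
    sigmas = [sigma0 * k**n for n in range(n_levels)]
    # steps 1-5: scale-normalized, squared Laplacian response per level
    space = np.stack([(s**2 * ndimage.gaussian_laplace(img, s))**2
                      for s in sigmas])
    # step 6: non-maximum suppression jointly over scale and position
    peaks = (space == ndimage.maximum_filter(space, size=(3, 3, 3))) \
            & (space > abs_thresh)
    # step 7: one circle per surviving peak, at its characteristic scale
    return [(y, x, sigmas[l]) for l, y, x in zip(*np.nonzero(peaks))]
```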
Which scale-handling strategies for patch descriptor matching did we differentiate? What are their complexities?
brute-force matching with straightforward scale search (one-step method)
S*N^2
brute-force matching with automatic scale search (two-step method)
N*S + N*S + N^2 (scale detection in each image, then matching at known scales)
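For intuition (numbers are our own): with N = 1000 features and S = 10 scales, the one-step method costs 10 * 1000^2 = 10^7 SSD evaluations, while the two-step method costs 10^4 + 10^4 + 10^6 ≈ 10^6, roughly 10x cheaper.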
What are disadvantages of patch descriptor-based methods?
if the warping is not estimated accurately
-> even very small errors in rotation, scale, or viewpoint will affect the SSD-based matching score
the LoG is relatively inefficient
What is an alternative to patch descriptor based methods?
census descriptor-based methods
What is a main difference between patch and census descriptor-based methods?
in census descriptor-based methods
-> we do not directly compare patches with SSD
-> but compare their associated vector descriptors
-> less sensitive to noise
=> i.e. use a vector to describe the patch instead of pixel-wise SSD
What is an alternative to LoG to overcome its inefficiency?
use the difference of Gaussians (DoG) instead of the LoG kernel
What is SIFT?
Scale-Invariant Feature Transform
What are the overarching steps in SIFT?
key point extraction based on extremum detection using DoG (instead of LoG)
census descriptor assignment via HoG
How do we replace LoG with DoG?
LoG -> uses an explicit LoG kernel (one convolution per scale)
DoG -> approximates the LoG without an explicit LoG convolution
-> uses differences of Gaussian blurs at adjacent scales (the blurred images are computed once and reused)
How do we perform DoG?
we have the source image
compute several Gaussian blurs of it (with different standard deviations)
sigma = 1; sigma = 2; …
calculate the difference between adjacent blurred images
=> results in the DoG images
=> do this for different scales of the image (so-called octaves)
detect local extrema in these DoG images (see the sketch below)
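A minimal sketch for one octave (constants are illustrative, not SIFT's exact values):

```python
import numpy as np
from scipy import ndimage

def dog_octave(image, sigma0=1.6, k=2**0.5, n_blurs=5):
    """Gaussian-blur the image at increasing sigmas and subtract
    adjacent blurs; each difference approximates a (normalized) LoG."""
    img = image.astype(np.float32)
    blurs = [ndimage.gaussian_filter(img, sigma0 * k**i)
             for i in range(n_blurs)]
    return [blurs[i + 1] - blurs[i] for i in range(n_blurs - 1)]

# for the next octave: downsample the image by a factor of 2 and repeat
```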
What are the local extrema in the DoG images?
the SIFT key points
How do we detect local extrema in the DoG images?
each pixel is compared to its 26 neighbors:
8 around it (neighbors in the current image)
9 above it (adjacent upper scale)
9 below it (adjacent lower scale)
-> i.e. a surrounding cube in the DoG pyramid
if the pixel is a local extremum -> select it as a SIFT feature
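A minimal sketch of the 26-neighbor test (function name is our own; `dogs` is the list of DoG images of one octave, as sketched above):

```python
import numpy as np

def is_extremum(dogs, level, y, x):
    """Compare dogs[level][y, x] against the 3x3x3 cube spanning the
    current, upper, and lower scale (26 neighbors plus itself).
    Assumes an interior level and an interior pixel."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2]
                     for d in dogs[level - 1:level + 2]])
    v = dogs[level][y, x]
    return v == cube.max() or v == cube.min()
```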
What is the output of the SIFT detector for each SIFT feature?
location (x, y) (the pixel that is the local extremum)
scale s (the scale in the pyramid at which the local extremum resides)
What are census descriptors also called?
histogram of oriented gradients (HOG) descriptor
How do we calculate the HoG/census descriptor?
input: a de-rotated patch
divide the patch into 4x4 cells
for each cell: generate an 8-bin histogram of gradient orientations (i.e. 8 directions)
=> concatenate all histograms into a single 1D vector
What is the dimension of our SIFT HOG/census descriptor?
4 (cells) x 4(cells) x 8 (bins) = 128
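A minimal sketch of this descriptor computation (assumes a de-rotated 16x16 patch; full SIFT additionally applies Gaussian weighting and trilinear interpolation, which we omit):

```python
import numpy as np

def hog_descriptor(patch, n_cells=4, n_bins=8):
    """4x4 cells x 8 orientation bins -> 128-d descriptor."""
    Iy, Ix = np.gradient(patch.astype(np.float32))
    mag = np.hypot(Ix, Iy)
    ori = np.arctan2(Iy, Ix) % (2 * np.pi)
    ch = patch.shape[0] // n_cells
    cw = patch.shape[1] // n_cells
    desc = []
    for cy in range(n_cells):
        for cx in range(n_cells):
            sl = (slice(cy * ch, (cy + 1) * ch),
                  slice(cx * cw, (cx + 1) * cw))
            hist, _ = np.histogram(ori[sl], bins=n_bins,
                                   range=(0, 2 * np.pi), weights=mag[sl])
            desc.append(hist)
    d = np.concatenate(desc)                 # 4 * 4 * 8 = 128 values
    return d / (np.linalg.norm(d) + 1e-8)    # normalize for robustness
```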
What is the output of the SIFT algo?
location (-> pixel coords of the patch center): 2D vector
scale (-> a high scale means strong blur in the Gaussian pyramid): 1 scalar value
orientation (-> dominant direction of the HoG): 1 scalar value (i.e. the angle of the patch)
descriptor (128 values …)
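A minimal sketch of that output bundled as a record (field names are our own):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SiftFeature:
    x: float                 # pixel coordinates of the patch center
    y: float
    scale: float             # level in the Gaussian pyramid
    orientation: float       # dominant HoG direction, in radians
    descriptor: np.ndarray   # 128-d census/HoG vector
```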
How does the application of HoG differ for dominant direction (rotation) determination and descriptor generation?
dominant direction:
operates at pixel level
use the HoG to find out how to de-rotate the patch
descriptor generation:
patch is already de-rotated
then use HoG at cell level to generate the descriptor