What is a projection plane?
the plane defined by the origin of a coordinate system
and a 3D line (or, equivalently, its projected 2D line)
What is the problem formulation of the matching/tracking problem?
estimate the transformation W (warping)
between a template image T and the current image I
=> all inlier 2D-2D point correspondences should satisfy the same warping model
How can one reformulate the warping estimation problem?
as the correspondence finding problem
-> when one knows the correspondences -> one can (easily) estimate the required warping parameters
How do the warping estimation problem and the correspondence finding problem relate?
chicken and egg problem
-> when one has the one, the other is easy to calculate
=> in practice, we know nothing…
What types of solutions do we have to find correspondences?
indirect methods
direct methods
How do indirect methods work?
detecting and matching features
i.e. points or lines
What are advantages and disadvantages of indirect methods?
advantages:
can cope with
large frame-to-frame motions
strong illumination changes
cons:
slow due to costly
feature extraction
matching
outlier removal
e.g. RANSAC
What is the general pipeline of indirect methods?
detect and match features that are invariant to
scale
rotation
viewpoint changes
e.g. SIFT
geometric verification (RANSAC)
refine estimate by minimizing the sum of squared reprojection errors between
observed feature in current image
and the warped corresponding feature from the template
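A minimal sketch of such an indirect pipeline using OpenCV (SIFT features, brute-force matching with a ratio test, RANSAC-verified homography); the parameter values and the choice of a homography as warping model are illustrative assumptions, not prescribed here:

```python
import cv2
import numpy as np

def estimate_warp_indirect(template, image):
    """Indirect pipeline sketch: detect/match SIFT features, then RANSAC + model fit."""
    sift = cv2.SIFT_create()
    kp_t, des_t = sift.detectAndCompute(template, None)
    kp_i, des_i = sift.detectAndCompute(image, None)

    # match descriptors; Lowe's ratio test discards ambiguous matches
    matches = cv2.BFMatcher().knnMatch(des_t, des_i, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # geometric verification: RANSAC rejects outlier correspondences,
    # the warp (here a homography) is estimated from the inliers
    pts_t = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    pts_i = np.float32([kp_i[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(pts_t, pts_i, cv2.RANSAC, 3.0)
    return H, inlier_mask
```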
What are pros and cons of direct methods?
pros:
all information in the image can be exploited
higher accuracy
higher robustness to motion blur and weak texture
i.e. weak gradients
increasing the framerate reduces computational cost per frame (no RANSAC needed)
cons:
very sensitive to initial value
limited frame-to-frame motion
How do direct methods roughly work?
use all pixels (no individual point correspondences)
-> directly process pixel intensities (i.e. we use greyscale)
=> they estimate the warp parameters that
minimize the sum of squared distances over all pixels
of the template image
and the warped corresponding pixel of the current image
minimize the warp parameters p
so that the SSD over all pixels x of the template is minimal (written out below), where
T(x) -> pixel intensity in the template at pixel x
I(W(x,p)) -> intensity in the current image at the position of pixel x warped with parameters p
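Written out as a formula (a reconstruction from the description above, using the same symbols T, I, W and warp parameters p):

```latex
\min_{p} \sum_{x} \big[\, I\!\left(W(x;\,p)\right) - T(x) \,\big]^2
```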
What assumptions do we make for the direct method?
brightness constancy
temporal consistency
spatial coherence
spatial coherency
What is meant with brightness constancy?
as we directly compare squared distances of pixel intensities
-> intensity of pixels to track must not change much over consecutive frames
=> direct methods do not cope well with strong illumination changes
=> assume brightness is constant…
What is meant by temporal consistency?
assume that the frame-to-frame motion of object to track is small
-> around 1-2 pixels
=> direct method does not cope well with large frame-to-frame motion
can be addressed using coarse-to-fine multi-scale implementations (later; see the sketch below)
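A rough sketch of the coarse-to-fine idea, assuming OpenCV image pyramids: solve first on strongly downsampled images (where a large true motion shrinks to 1-2 pixels), then propagate and refine the estimate on the finer levels. `single_level_klt` is a placeholder for one translation-only KLT step (e.g. the one derived further below):

```python
import cv2

def coarse_to_fine_translation(patch, image, x0, y0, single_level_klt, levels=3):
    """Coarse-to-fine sketch: estimate the translation on downsampled images first,
    then refine it on the finer levels. (x0, y0) is the patch position in the
    previous frame; single_level_klt(patch, image, x, y) returns (du, dv)."""
    # build image pyramids by repeated 2x downsampling
    pyr_patch, pyr_img = [patch], [image]
    for _ in range(levels - 1):
        pyr_patch.append(cv2.pyrDown(pyr_patch[-1]))
        pyr_img.append(cv2.pyrDown(pyr_img[-1]))

    u, v = 0.0, 0.0
    for lvl in reversed(range(levels)):        # start at the coarsest level
        u, v = 2 * u, 2 * v                    # propagate the estimate to the finer level
        du, dv = single_level_klt(pyr_patch[lvl], pyr_img[lvl],
                                  int(x0 / 2**lvl + u), int(y0 / 2**lvl + v))
        u, v = u + du, v + dv
    return u, v
```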
What is meant by spatial coherence?
all pixels in template undergo same transformation
-> i.e. they lie roughly on the same 3D surface
=> i.e. if the patch to track contains two individual objects that move (transform) differently
-> hard to use direct method…
What is meant by spatial coherency?
no errors in template image boundary
only the object to track appears in template image
-> i.e. no background… (as it has different motion)
no occlusion
entire template is visible in the input image
-> e.g. a postcard to track is occluded by a hand in front of it
What is an exemplary direct method we introduced?
KLT (Kanade-Lucas-Tomasi) tracker for small motion
consists of two sub-algorithms
Does direct methods in theory and practice differ?
yes
in theory: use all pixels
in practice (at least for KLT) -> do not use all pixels, as some are unreliable…
Of what sub-algorithms does the KLT tracker consist?
Tomasi-Kanade -> how should we select features (which pixels/image patch should we track)
-> method to choose best features
Lucas-Kanade -> how should we track features from frame to frame?
-> method to align an image patch
In the KLT tracker, are the sub-algorithms applied sequentially?
no, the goal is to solve both simultaneously
What is the objective function in KLT in case of pure translation?
the SSD as a function of the translation parameters u,v (-> pixel (x,y) maps to (x+u, y+v)),
i.e. the sum over all pixels (x,y) in our image patch of
(intensity of pixel (x,y) in the patch - intensity of pixel (x+u,y+v) in the current image)^2
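As a formula (a reconstruction of the description above; I_0 denotes the template/patch image and I_1 the current image):

```latex
E(u,v) = \sum_{(x,y)\in\text{patch}} \big[\, I_0(x,y) - I_1(x+u,\, y+v) \,\big]^2
```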
How do we rewrite our cost function to be able to use gradient descent?
have to get u and v out of the I1(…)
-> approximate the intensity in current image with first order taylor expansion
=> first order taylor expansion approximates the intensity near our x,y pixel
-> for this, we create the sum of
I1(x,y) -> intensity at pixel (x,y)
directional gradient (x-direction) of current image * u (-> yields the approximate difference at the distance u from x)
same for directional gradient y…
resulting in:
How do we actually differentiate the KLT SSD formula and minimize it?
minimize -> calculate the gradient and set it to 0
-> i.e. derivatives w.r.t. u and v
first differentiate with respect to u
use the chain rule (outer derivative times inner derivative)
same for v
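Spelled out, the two optimality conditions (a reconstruction using the linearized cost from above; I_x, I_y are the image gradients, and the common factor 2 can be dropped):

```latex
\frac{\partial E}{\partial u} = \sum_{(x,y)} 2\,\big[I_1(x,y) + I_x u + I_y v - I_0(x,y)\big]\, I_x = 0,
\qquad
\frac{\partial E}{\partial v} = \sum_{(x,y)} 2\,\big[I_1(x,y) + I_x u + I_y v - I_0(x,y)\big]\, I_y = 0
```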
How can we solve the minimization?
expand the products (the constant factor from differentiating can be dropped …)
collect the Ix, Iy terms in one matrix M = Σ [Ix², Ix·Iy; Ix·Iy, Iy²]
collect u,v in a vector
collect the Σ Ix·ΔI, Σ Iy·ΔI terms in a vector on the right-hand side (with ΔI = I1 - I0)
solve the resulting 2x2 linear system M · (u,v)^T = -(Σ Ix·ΔI, Σ Iy·ΔI)^T for the u,v vector
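A minimal numpy sketch of this closed-form translation step (single iteration, no coarse-to-fine; the windowing and gradient computation are simplified assumptions):

```python
import numpy as np

def lucas_kanade_translation(patch, current, x0, y0):
    """One KLT step for pure translation: returns the sub-pixel displacement (u, v)
    of the template patch relative to position (x0, y0) in the current image."""
    h, w = patch.shape
    I0 = patch.astype(np.float64)
    I1 = current[y0:y0 + h, x0:x0 + w].astype(np.float64)

    # image gradients of the current window and the intensity difference delta I
    Iy, Ix = np.gradient(I1)
    dI = I1 - I0

    # normal equations M * (u, v)^T = -(sum Ix*dI, sum Iy*dI)^T
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * dI), np.sum(Iy * dI)])

    if abs(np.linalg.det(M)) < 1e-6:      # flat region or edge: not trackable
        return None
    u, v = np.linalg.solve(M, b)
    return u, v
```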
What do we have to look for in our M matrix?
must be invertible to solve
-> det(M) should be non-zero
-> eigenvalues should be large (i.e. not flat region, not an edge)
=> in practice: the patch should be a corner or, more generally, contain texture (else det(M) is low…)
After our findings, how can we answer the question on “how to choose patches to track”?
patches whose associated M matrix has large eigenvalues
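A sketch of this selection criterion (essentially the Shi-Tomasi "good features to track" score): compute M for each candidate patch and keep the patches whose smaller eigenvalue is large. Window handling is left out for brevity:

```python
import numpy as np

def trackability_score(patch):
    """Smaller eigenvalue of the patch's M matrix; large => corner-like, well textured."""
    Iy, Ix = np.gradient(patch.astype(np.float64))
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(M).min()    # eigenvalues of the symmetric matrix M
```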
After our findings, how can we answer the question on “how to track patches from frame to frame?”
use SSD to find best fit for our patch in next frame (with displacement vector u,v)
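In practice this is, for example, what OpenCV's pyramidal KLT implementation offers; a usage sketch (file names and parameter values are placeholders):

```python
import cv2

prev_gray = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
next_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# select patches with a large minimum eigenvalue of M, then track them frame to frame
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=10)
new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                                winSize=(21, 21), maxLevel=3)
tracked = new_pts[status.ravel() == 1]    # keep only successfully tracked points
```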
How can we extend the discussed KLT case of simple translation to the general case (i.e. warping…)?
extend our SSD formula
where x are the individual pixels of our patch
T(x) is the intensity of the respective pixel in our template
W(x,p) is the warping of our pixel with the unknown warping parameters p (-> warping -> new position in the new image)
and I(W(…)) is the intensity of that warped pixel location in the current image
How do we solve our minimization problem in the general case?
similarity:
apply first order approximation of warping
difference:
pure translation: partial derivatives to obtain direct solution
general case:
Gauss-Newton method to minimize the SSD iteratively
(we can theoretically still use the first order optimality conditions to generate equations w.r.t. the warping parameters -> may be difficult to solve…)
How do we iteratively minimize for the general case?
assume p is known
-> incrementally update p (with a delta p) so that SSD is reduced
=> in each step, find delta p that minimizes the SSD
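A compact Gauss-Newton sketch for the general case with an affine warp (forward-additive Lucas-Kanade style); the parameterization of p, the gradient approximation on the warped image, and the stopping threshold are illustrative assumptions:

```python
import cv2
import numpy as np

def klt_affine(template, current, p, iters=20):
    """Iteratively refine the 6 affine warp parameters p so that the SSD between
    the template T and the warped current image I(W(x;p)) decreases."""
    p = np.asarray(p, dtype=np.float64).copy()
    h, w = template.shape
    T = template.astype(np.float32)
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))

    for _ in range(iters):
        A = np.array([[1 + p[0], p[2], p[4]],
                      [p[1], 1 + p[3], p[5]]])
        # sample I(W(x;p)) on the template grid
        Iw = cv2.warpAffine(current.astype(np.float32), A, (w, h),
                            flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
        error = (Iw - T).ravel()                     # residual per pixel

        # steepest-descent images: (approximate) image gradient times Jacobian dW/dp
        Iy, Ix = np.gradient(Iw.astype(np.float64))
        J = np.stack([Ix * xs, Iy * xs, Ix * ys, Iy * ys, Ix, Iy], axis=-1).reshape(-1, 6)

        H = J.T @ J                                  # Gauss-Newton approximation of the Hessian
        dp = np.linalg.solve(H, -J.T @ error)        # delta p that reduces the SSD
        p += dp
        if np.linalg.norm(dp) < 1e-4:                # converged
            break
    return p
```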
What is a drawback of our method to choose suitable patches? How can it be improved?
we would have to judge all pixels in every image
-> improvement: judge all pixels only for the first image
-> in subsequent images, only consider the already tracked points…
How do we generally try to solve a chicken-and-egg problem? And what is the chicken-and-egg problem in our direct method?
we do not know the correspondences we want to track
neither do we know the warping parameters
-> generally: the problem has two sets of unknown parameters, and the parameters are mutually determined
solution: find an additional constraint to solve for the parameters
for us: e.g. brightness constancy (i.e. the SSD must be small…)
How do we generally proceed in KLT to find the best warping parameters?
incrementally update them (parameters p) to continuously reduce the value of the cost function (SSD)
-> each iteration, we assume we know p and want to find a delta p that improves (reduces) our loss
=> i.e. gradient descent