How does sparse reconstruction usually work?
Again, what are the disadvantages of feature-based methods?
create only a sparse map of the world
does not sample across all available image data
-> discards info around edges & weak-intensity regions
What are some motivations to use direct methods to estimate relative pose?
compared to indirect methods (two-step -> first track features, then estimate motion), these are one-step methods
create a potentially much denser map
more accurate
less prone to error propagation (as there is only one step…)
What is the photometric error?
The foundation of direct methods
-> i.e. the brightness constancy assumption…
we have sparse 2D-2D correspondences (monocular…)
reduces the image to a set of sparse keypoints matched via feature descriptors
-> the reconstruction is also sparse
In direct methods, in what order are optimal pose and correspondence obtained?
simultaneously
How do we define the photometric error for a pair of pixels?
as a least-squares error on the intensities: e = I1(p1) - I2(p2), minimized as the squared error
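A minimal sketch of this pairwise error (hypothetical names; I1, I2 are grayscale images as NumPy arrays, p1 and p2 are (u, v) pixel coordinates):

def pairwise_photometric_error(I1, I2, p1, p2):
    # intensity residual between the two corresponding pixels
    e = float(I1[p1[1], p1[0]]) - float(I2[p2[1], p2[0]])
    return e ** 2  # squared (least-squares) error for this pair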
How do we obtain the reconstructed point, R and T in the direct method?
What is the role of depth for our photometric error?
based on depth -> we can back-project any pixel into 3D space and then project it into the next image
How can depth information be obtained?
RGB-D cameras
binocular (stereo) camera -> pixel depth based on disparity
monocular -> have to treat depth as unknown -> optimize it along with camera pose
How are two points connected w.r.t. 3D-2D geometry in our photometric error problem?
we have two image points p1 and p2
they are the perspective projections of the 3D point P
we do not know P, nor do we know the rotation and translation between camera 1 and 2
=> we want to express p2 as a function of p1
we can express the point P as a “stretched” version of the (normalized) direction of p1, i.e. P = Z * K^-1 * p1 (in homogeneous coordinates)
we can then express p2 as the projection of P into camera frame 2: p2 ~ K * (R * P + t)
in general, K is assumed to be known; R, t and the depth Z are assumed to be unknown
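A minimal warping sketch under these assumptions (hypothetical names; pinhole intrinsics K, depth Z1 of p1 treated as given):

import numpy as np

def warp_pixel(p1, Z1, K, R, t):
    # back-project pixel p1 = (u, v) with depth Z1: P = Z1 * K^-1 * [u, v, 1]^T
    P = Z1 * np.linalg.inv(K) @ np.array([p1[0], p1[1], 1.0])
    # transform into camera 2 and project: p2 ~ K * (R * P + t)
    p2_h = K @ (R @ P + t)
    return p2_h[:2] / p2_h[2]  # dehomogenize -> pixel coordinates in image 2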
What is the goal of using the photometric error w.r.t. the equation we used to relate p1 and p2?
brightness constancy
-> estimate the depth, R, and t
such that the photometric error between p1 and p2 is minimized…
=> which we obtain simultaneously
What is the practical setup for our matching and estimation problem?
no feature extraction, no matching, no RANSAC needed
-> directly minimize photometric error
What is the function we want to minimize?
we want to find the optimal rotation R, translation T, and points {P_i} (i.e. depth information)
so that we minimize
the error between the intensity at p1 in the left image (i.e. image k-1 -> as we assume SLAM -> a continuous stream of frames…)
and the intensity at the location where P (the back-projection of p1) is projected into image k with R, T
here, we also introduce a “robust kernel” instead of the squared distance (covered in a later lecture)
-> minimize this, summed over all points we have in the right image
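A minimal sketch of the objective (hypothetical names; Huber shown as one possible robust kernel; nearest-pixel lookup instead of interpolation; reuses the warp_pixel sketch from above):

def huber(e, delta=1.0):
    a = abs(e)
    return 0.5 * e * e if a <= delta else delta * (a - 0.5 * delta)

def photometric_cost(I_prev, I_curr, pixels, depths, K, R, T):
    # sum of robust photometric residuals over all selected points
    cost = 0.0
    for p1, Z in zip(pixels, depths):
        p2 = warp_pixel(p1, Z, K, R, T)
        u, v = int(round(p2[0])), int(round(p2[1]))
        if 0 <= u < I_curr.shape[1] and 0 <= v < I_curr.shape[0]:
            e = float(I_prev[p1[1], p1[0]]) - float(I_curr[v, u])
            cost += huber(e)
    return cost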
What are pros and cons in our photometric error minimization approach?
pros:
all image pixels can be used
higher accuracy
higher robustness to motion blur and weak textures (i.e. weak gradients)
cons:
very sensitive to initial value
limited frame to frame motion
What is a problem with using regular depth for our photometric error approach?
some features in environment (like clouds) are far off
-> leading the distance to approach infinity…
=> can cause problems with numerical stability
How do we avoid problems with numerical stability w.r.t. distance?
use inverse depth parametrization
-> replace depth with the inverse of it
-> if the norm is very large (i.e. the distance from point p to camera center c0) -> rho goes to 0
-> if the distance is very small -> rho becomes large
improves numerical stability
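A minimal sketch of the reparametrization (hypothetical names; same warp as above, but written in terms of rho = 1/Z):

import numpy as np

def warp_pixel_inv_depth(p1, rho, K, R, t):
    # projection is scale-invariant, so p2 ~ K * (R * K^-1 * p1_h + rho * t);
    # a very distant point gives rho -> 0 and the warp smoothly reduces to pure rotation
    p1_h = np.array([p1[0], p1[1], 1.0])
    p2_h = K @ (R @ (np.linalg.inv(K) @ p1_h) + rho * t)
    return p2_h[:2] / p2_h[2]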
Do we actually want to track all pixels in our approach?
no -> typically not necessary as some pixels might be redundant
-> choose from one of three strategies
What different strategies for choosing pixels do we have?
dense direct method
use all pixels
semi-dense direct method
track partial pixels with significant gradients
sparse direct method
track sparse key points
Why not track all pixels?
not really achievable in real time
not all pixels contribute to the solution (i.e. pixels with no obvious gradient…)
How does a representative image of tracking all pixels look?
What is the semi-dense approach?
-> if the gradient is 0 -> the Jacobian is 0 (i.e. no contribution to the problem)
-> only use pixels with high gradients (-> discard areas where the gradient is non-obvious)
use the tracked pixels to reconstruct a semi-dense structure
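A minimal pixel-selection sketch (hypothetical names; the threshold value is arbitrary):

import numpy as np

def select_high_gradient_pixels(I, thresh=20.0):
    # gradient magnitude via finite differences
    gy, gx = np.gradient(I.astype(float))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # keep only pixels whose gradient is strong enough to constrain the problem
    v, u = np.nonzero(mag > thresh)
    return list(zip(u, v))  # (u, v) pixel coordinates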
What is the sparse direct approach?
yields fewer but more reliable pixels
kind of combines the indirect method with the direct one
-> use e.g. Harris to extract key points
but instead of using e.g. SIFT -> we only want the position, not a descriptor
-> thus, a speed-up compared to the indirect method, as we have no feature matching to establish correspondences…
fastest method
but can only calculate sparse reconstruction
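A minimal detection sketch (assuming OpenCV is available; Shi-Tomasi corners shown as one possible detector, parameters are arbitrary):

import cv2

def detect_sparse_points(I_gray, max_corners=500):
    # corner positions only -- no descriptors, no matching step
    corners = cv2.goodFeaturesToTrack(I_gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=10)
    return [] if corners is None else corners.reshape(-1, 2)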
What is the general influence of the motion baseline (i.e. frame-to-frame motion) on the convergence rate of our direct method?
direct SLAM not suitable for large baselines for two reasons:
initial pose may be unreliable -> leads to local minimum
photometric consistency assumption not satisfied
-> in practice, small baselines are preferred
What are some direct (and indirect) SLAM methods?
indirect:
PTAM
ORB-SLAM
SVO
direct:
SVO (semi-direct -> combines direct and indirect)
LSD-SLAM
DSO
What is photometric calibration?
reduce various effects that affect our brightness constancy assumption
By what may our brightness constancy assumption be affected?
different exposure times
vignetting
response function
How does the response function affect the brightness consistency assumption?
the response function maps irradiance (energy per unit time falling onto the sensor) into brightness
-> it is not linear
=> thus, to correctly calculate the difference in “brightness” -> we should use irradiance (as brightness is a non-linear function of irradiance and thus somewhat biased…)
=> thus we should calibrate the response function
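A minimal sketch (hypothetical names; assuming the inverse response G^-1 has already been calibrated as a 256-entry lookup table for an 8-bit image):

def undo_response(B, inv_response_lut):
    # map recorded brightness B (0..255) back to values proportional to irradiance
    return inv_response_lut[B]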
How does exposure time affect brightness constancy?
longer exposure -> brighter image
in practice: a pair of images may have different exposure times (e.g. a cell phone adjusts it automatically…)
=> given that we consider consistency of irradiance rather than brightness
-> we have to calibrate/compensate for the exposure time so both images refer to the same quantity
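A minimal sketch (hypothetical names; assuming the exposure time of each frame is known):

def normalize_exposure(irradiance_like, exposure_time):
    # divide out the exposure time so frames with different exposures become comparable
    return irradiance_like / exposure_time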
How does vignetting affect brightness (irradiance) consistency?
vignetting -> reduction of the image's brightness towards the periphery compared to the image center
mainly caused by manufacturing flaws
=> should remove this effect before applying the photometric loss…
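A minimal sketch (hypothetical names; assuming a calibrated per-pixel vignette map V with values in (0, 1]):

def undo_vignetting(I, vignette_map):
    # divide by the per-pixel attenuation factor to undo the fall-off towards the periphery
    return I / vignette_map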