How do 3D-2D and 2D-2D geometry compare?
different types of correspondences:
2D-2D:
2D-2D correspondences for relative camera pose estimation
-> NOT suitable to compute the absolute pose of sequential images
time consuming
estimated translation only up to scale
3D-2D:
3D-2D correspondences for absolute camera pose estimation (i.e. calibration…)
What two practical configurations do we have for perspective projection model?
R, T known -> use them to obtain 2D projections
2D projections (associated with 3D points) known -> use them to compute R, T
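The first configuration (pose known, project 3D points to 2D) can be sketched in numpy; the intrinsics, pose, and points below are hypothetical illustrative values:

```python
import numpy as np

def project_points(K, R, t, X_world):
    """Project 3D world points into the image given a known pose (R, t).

    K: 3x3 intrinsics, R: 3x3 rotation, t: 3-vector, X_world: Nx3 points.
    Returns Nx2 pixel coordinates.
    """
    X_cam = X_world @ R.T + t          # world frame -> camera frame
    x = X_cam @ K.T                    # apply intrinsics
    return x[:, :2] / x[:, 2:3]        # perspective division

# toy example with hypothetical values
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R = np.eye(3)                          # camera axes aligned with world axes
t = np.array([0.0, 0.0, 5.0])          # world origin 5 units in front of camera
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(project_points(K, R, t, X))      # world origin maps to the principal point (320, 240)
```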
What is the goal of Perspective-n-Point (PnP)?
given a set of 2D-3D point correspondences
determine the 6 DOF absolute pose (w.r.t. world frame…) of a camera
-> under the assumption that the calibration is given (i.e. we know the intrinsics…)
What is the relationship of PnP with camera calibration?
camera calibration:
focus on simultaneous calibration of extrinsics and intrinsics
PnP:
only estimates the extrinsics (with known intrinsics)
-> camera localization problem
What is the minimal case/solution of PnP?
3 Points
2 Points: infinite number of solutions, but bounded
4 Points: more reliable
What do the infinitely many (but bounded) solutions look like in the PnP 2-point case?
we can fix the two 2D points and connect them with an arbitrary camera center lying on a curve…
Having seen the 2-point case, how does the 3-point case allow us to find a single camera center?
2 points -> create a 3D curve on which the camera center can lie
3rd point -> further constrains the center on this curve, identifying it (in general up to a few remaining candidates; a 4th point disambiguates)…
What two methods to establish 3D-2D point correspondences exist?
first method:
use 2D descriptors, assign them to 3D points, and search for similarity (similar to 2D-2D matching)
second method:
geometric method (introduced later)
What is the general idea of generating 2D-3D correspondences based on 2D descriptors?
map the descriptor of a 2D point to the reconstructed 3D point
match this 3D point (descriptor) to the extracted 2D points based on descriptor similarity (in the new image…)
=> i.e. we have a first frame and a corresponding 3D point
-> assign the 3D point the descriptor of the corresponding 2D point
-> in the next frame, find 2D points with descriptors similar to the 3D point's to establish 2D-3D correspondences
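The matching step above can be sketched as a brute-force nearest-neighbour search over descriptors; the toy 4-dimensional descriptors (instead of e.g. 128-dim SIFT) and the distance threshold are illustrative assumptions:

```python
import numpy as np

def match_3d_to_2d(desc_3d, desc_2d, max_dist=0.5):
    """For each 3D point's stored descriptor, find the most similar 2D
    descriptor in the new image (brute-force nearest neighbour).

    desc_3d: Mxd descriptors attached to 3D points,
    desc_2d: Nxd descriptors extracted from the new frame.
    Returns a list of (3d_index, 2d_index) matches.
    """
    matches = []
    for i, d in enumerate(desc_3d):
        dists = np.linalg.norm(desc_2d - d, axis=1)   # L2 distance to all 2D descriptors
        j = int(np.argmin(dists))
        if dists[j] < max_dist:                       # reject weak matches
            matches.append((i, j))
    return matches

# toy descriptors (hypothetical values)
desc_3d = np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])
desc_2d = np.array([[0, 0.9, 0.1, 0], [0.95, 0.1, 0, 0]])
print(match_3d_to_2d(desc_3d, desc_2d))   # [(0, 1), (1, 0)]
```

A real pipeline would additionally use a ratio test and RANSAC to reject outlier matches.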
What is the classic method to calculate the absolute camera pose from 3D-2D correspondences?
use DLT
we can bring the calibration to the left side (multiply by K^-1) -> this normalizes the points
we then define new vectors t_i corresponding to the rows of the [R|t] matrix
after rearranging we can derive a linear system in the entries of the t_i
leading to the measurement matrix
What number of matching points do we need to solve for our T matrix?
T has dimensionality 12
-> each point brings two constraints
-> we need at least six pairs
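The DLT construction can be sketched as a minimal numpy implementation, assuming noise-free, already-normalized points; the scale/sign fix-up is simplified, and a real pipeline would additionally re-orthogonalize the rotation part:

```python
import numpy as np

def dlt_pnp(X_world, x_norm):
    """Estimate T = [R|t] from >= 6 3D-2D correspondences via DLT.

    X_world: Nx3 world points, x_norm: Nx2 normalized image points
    (already multiplied by K^-1). Returns the 3x4 matrix [R|t].
    """
    A = []
    for (X, Y, Z), (u, v) in zip(X_world, x_norm):
        P = [X, Y, Z, 1.0]
        # each correspondence contributes two linear constraints on the rows t_i
        A.append(P + [0, 0, 0, 0] + [-u * p for p in P])
        A.append([0, 0, 0, 0] + P + [-v * p for p in P])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    T = Vt[-1].reshape(3, 4)                 # null vector = smallest singular vector
    T /= np.linalg.norm(T[:, :3], axis=1).mean()   # fix scale: rotation rows have unit norm
    if np.linalg.det(T[:, :3]) < 0:          # fix the sign ambiguity
        T = -T
    return T

# synthetic check with a hypothetical pose
th = np.deg2rad(30)
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
t = np.array([0.1, -0.2, 4.0])
X = np.array([[0,0,0],[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1]], float)
Xc = X @ R.T + t                   # points in camera frame
x = Xc[:, :2] / Xc[:, 2:3]         # normalized image points
T = dlt_pnp(X, x)
print(np.allclose(T, np.hstack([R, t[:, None]]), atol=1e-6))
```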
What is P3P?
perspective 3 points
-> minimal case of PnP
What does P3P “solve” compared to the DLT case?
DLT -> needs 6 points -> redundancy
P3P -> needs three points
What is the idea of P3P?
use the law of cosines on the known distances
here, we know the distance AB (from the provided 3D points in the world frame) and the angle between the viewing rays to a and b (from the normalized points in the camera frame)
same for AC, BC
left side (OA, OB, OC) unknown; right side (cos⟨a,b⟩, …; AB, …) known
-> use these to calculate OA, OB and OC…
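Written out, the three law-of-cosines constraints form the standard P3P system (O is the camera center, a, b, c the normalized image points):

```latex
\begin{aligned}
OA^2 + OB^2 - 2\,OA\,OB\cos\langle a,b\rangle &= AB^2 \\
OA^2 + OC^2 - 2\,OA\,OC\cos\langle a,c\rangle &= AC^2 \\
OB^2 + OC^2 - 2\,OB\,OC\cos\langle b,c\rangle &= BC^2
\end{aligned}
```

Three quadratic equations in the three unknown distances OA, OB, OC, which in general admit up to four physically valid solutions.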
What is the law of cosines?
c^2 = a^2 + b^2 - 2ab cos(γ), where γ is the angle opposite side c
What is EPnP?
efficient perspective n points
-> different version of PnP
2 step method
What are advantages and disadvantages of EPnP?
advantage: higher accuracy
disadvantage: needs more points (at least 4, vs. 3 for P3P)
What is the idea of EPnP?
express each 3D point as a linear combination of four control points (chosen beforehand)
determine the coordinates of these 4 control points in both camera and world frames (known in world, unknown in camera…)
use the control points to obtain the coordinates of each 3D point in both camera and world frames (3D-3D point correspondences)
use 3D-3D geometry to compute a closed-form solution for R and t
What is the constraint on the control points in EPnP?
the coefficients of the linear combination of the four control points must sum up to 1
=> such coefficients are called barycentric coordinates
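A minimal sketch of computing barycentric coordinates w.r.t. four non-coplanar control points; the control points and the test point are hypothetical values:

```python
import numpy as np

def barycentric_coords(C, p):
    """Coefficients alpha such that p = sum_i alpha_i * C[i] and sum(alpha) = 1.

    C: 4x3 non-coplanar control points, p: 3D point.
    """
    # augment the 3x4 system with a row of ones to enforce sum(alpha) = 1
    M = np.vstack([C.T, np.ones(4)])      # 4x4 system
    b = np.append(p, 1.0)
    return np.linalg.solve(M, b)

# hypothetical control points: origin plus the three unit axes
C = np.array([[0,0,0],[1,0,0],[0,1,0],[0,0,1]], float)
p = np.array([0.2, 0.3, 0.4])
alpha = barycentric_coords(C, p)
print(alpha, alpha.sum())                 # coefficients reconstruct p and sum to 1
```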
As we use barycentric coordinates for our points w.r.t. the control points, what advantage does this have for our points in the camera frame?
simple formula for a point in the camera frame: -> transformed point from the world frame…
replacing the point in the world frame and the translation part yields:
-> the upper part is relatively straightforward (linear combination of the control points in the world frame -> yields the point in the world frame…)
-> lower part -> simply replace 1 by the sum of our barycentric coefficients (which is equivalent to 1…)
now bringing in the formula for transforming the control points from the world to the camera frame
we can see that, by replacing the left-hand side with the weighted sum,
this equals our previously derived expression
=> thus, we can represent the point in the camera frame as a barycentric combination of the control points in the camera frame
=> with the same coefficients as in the world frame!
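This invariance of the coefficients can be checked numerically; the pose, control points, and coefficients below are arbitrary illustrative values:

```python
import numpy as np

# hypothetical rigid transform (world -> camera)
th = np.deg2rad(40)
R = np.array([[1, 0, 0],
              [0, np.cos(th), -np.sin(th)],
              [0, np.sin(th),  np.cos(th)]])
t = np.array([0.5, 1.0, 3.0])

C_w = np.array([[0,0,0],[1,0,0],[0,1,0],[0,0,1]], float)  # control points, world frame
alpha = np.array([0.1, 0.2, 0.3, 0.4])                    # barycentric coefficients (sum to 1)
p_w = alpha @ C_w                                         # point in the world frame

C_c = C_w @ R.T + t            # control points transformed to the camera frame
p_c = p_w @ R.T + t            # the point transformed directly
print(np.allclose(alpha @ C_c, p_c))   # the same coefficients work in the camera frame
```

The key step is that t gets weighted by sum(alpha) = 1, so the translation passes through the linear combination unchanged.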
How do we proceed with previous derivations to solve EPnP?
as we know the coefficients that combine the control points (in the world frame) into each 3D point
and we know the corresponding image coordinates
we can set up a linear system to solve for the control points in the camera frame
How does EPnP use the control points in the camera frame?
we now know the control points in camera frame
using barycentric expression, we can express arbitrary 3D points in camera frame
as we know these points in the world frame beforehand
=> we now have 3D-3D point correspondences between the world and camera frames
=> use regular 3D-3D geometry to determine R|T between camera and world…
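The closed-form 3D-3D alignment is typically done with the Kabsch/Horn method (SVD of the cross-covariance matrix); a minimal sketch with a synthetic pose:

```python
import numpy as np

def align_3d_3d(P_world, P_cam):
    """Closed-form R, t aligning world points to camera points (Kabsch method).

    Solves P_cam[i] ≈ R @ P_world[i] + t in the least-squares sense.
    """
    cw, cc = P_world.mean(0), P_cam.mean(0)      # centroids
    H = (P_world - cw).T @ (P_cam - cc)          # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid a reflection
    R = Vt.T @ D @ U.T
    t = cc - R @ cw
    return R, t

# synthetic check with a hypothetical pose
th = np.deg2rad(25)
R_true = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0, 0, 1]])
t_true = np.array([0.3, -0.1, 2.0])
Pw = np.array([[0,0,0],[1,0,0],[0,1,0],[0,0,1],[1,1,1]], float)
Pc = Pw @ R_true.T + t_true
R, t = align_3d_3d(Pw, Pc)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```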
What is PnL?
perspective n-lines
input: set of 3D-2D line correspondences
where 3D lines are in world frame
output:
3-DOF rotation and 3-DOF translation
aligning world frame to camera frame
=> i.e. PnP but with lines instead of points…