03. Image Formation

Buffl

Computer Vision 2

by Jensen J.

What different types of segmentation did we talk about?

semantic segmentation
- find what the objects are
instance segmentation
- differentiate the different objects from each other
panoptic segmentation
- instance segmentation that also considers the background

What are the general steps of the image fitting pipeline?

model selection based on type of data
- -> for data fitting
- -> e.g. if only rotation and translation needed
  - -> SE(3) is good (SO(3) covers rotation only)
- -> if scaling is also needed
  - -> requires a similarity transform, Sim(3)
parameter estimation by
- minimizing the distances
- between correspondences
-> done by computing the sum of distances
=> find optimal rotation, translation (parameters) to minimize the sum of correspondence distances (objective function)
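
The parameter estimation step can be sketched in Python for the rigid case (rotation + translation). This is a minimal sketch, assuming known 3D-3D correspondences and a sum of squared distances as the objective, solved in closed form with the SVD-based Kabsch method; the card itself only speaks of a "sum of distances", and the function names are made up for illustration.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares R, t such that R @ src_i + t is close to dst_i (Kabsch method)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

def objective(R, t, src, dst):
    """Sum of squared correspondence distances (the assumed objective function)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    residuals = (R @ src.T).T + t - dst
    return float(np.sum(residuals ** 2))
```

With noise-free correspondences the objective drops to (numerically) zero; with noisy data the same call returns the least-squares fit.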

What is the response function in cameras? What is a problem with it?

maps log-exposure value (scene radiance)
to intensity levels in the input images
=> basically transforms light signal to electric signal…
=> is not linear, thus needs calibration

How does a pinhole camera work?

has converging lens
all rays parallel to optical axis converge at focal point

flips the image

How do we map 3D coordinates to image coordinates in pinhole cameras?

image coordinates: x
3D coordinates: X

What are the perspective effects of pinhole cameras?

far away objects appear smaller
intersection of parallel lines in 2D
vanishing directions

What is meant by far away objects appear smaller?

size is inversely proportional to distance

What is meant by intersection of parallel lines in 2D?

parallel lines intersect at a “vanishing point” in the image
can fall either inside or outside of the image
connection between two horizontal vanishing points is the horizon
- => i.e. as in geometry at HfT -> points at infinity if the lines are actually parallel
- -> or parallel in reality, but no longer parallel in the image due to perspective…

-> blue line is horizon / vanishing line

What is meant by vanishing directions?

vanishing direction is defined by connecting vanishing point and camera center
-> is parallel to a 3D dominant direction (which yields the vanishing point…)

-> more on that in later lectures…

What is the FOV in our pinhole camera?

angular portion of 3D scene seen by camera
=> FOV gives the dimensions of the scene of the target object
-> the further we are away, the smaller the FOV…

What is the FOV proportional to?

inversely proportional to focal length

-> large FOV -> use short focal length

-> small FOV -> use long focal length

What is the mathematical relation of FOV θ, image width W and focal length f?
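
The formula for this card did not survive in the notes. A minimal Python sketch, assuming the standard pinhole relation tan(θ/2) = W / (2f), i.e. θ = 2·atan(W / (2f)), with W and f in the same unit (both in mm or both in pixels); the function names are made up for illustration.

```python
import math

def fov_from_focal(width, focal):
    """FOV angle theta in radians from image width W and focal length f."""
    return 2.0 * math.atan(width / (2.0 * focal))

def focal_from_fov(width, theta):
    """Focal length that yields the FOV angle theta (radians) for image width W."""
    return width / (2.0 * math.tan(theta / 2.0))

# A shorter focal length gives a larger FOV for the same image width.
wide = math.degrees(fov_from_focal(36.0, 18.0))
narrow = math.degrees(fov_from_focal(36.0, 50.0))
```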

How are points represented using homogeneous coordinates?

homogeneous:

(a,b,c)

cartesian:

(a/c, b/c)
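
The conversion between the two representations can be sketched in Python as follows. The function names are made up for illustration; the only extra assumption is that a last coordinate of 0 (a point at infinity) has no Cartesian counterpart.

```python
def to_homogeneous(x, y, w=1.0):
    """Cartesian (x, y) -> homogeneous (a, b, c); any non-zero w gives the same point."""
    return (x * w, y * w, w)

def to_cartesian(a, b, c):
    """Homogeneous (a, b, c) -> Cartesian (a/c, b/c)."""
    if c == 0:
        raise ValueError("point at infinity has no Cartesian representation")
    return (a / c, b / c)
```

For example, (4, 6, 2) and (2, 3, 1) describe the same Cartesian point (2, 3).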

What notation do we have in perspective projection?

C: optical center
- -> center of lens, i.e. center of projection
Xc, Yc, Zc: axes of camera frame
Zc: optical axis (principal axis)
O: principal point
- -> i.e. intersection of optical axis and image plane

How do perspective projection and parallel projection differ?

perspective:

size varies inversely with distance -> looks realistic
parallel lines do not (in general) remain parallel

parallel:

good for exact measurements
parallel lines remain parallel
less realistic looking

How are points from camera frame (stuff in front of the lens) projected onto the image plane?

based on similar triangles

!! we use front image plane!!!
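
The similar-triangles argument can be sketched in Python. This assumes the usual relations x = f·X/Z and y = f·Y/Z for a camera-frame point (X, Y, Z) and the front image plane at distance f, so the image is not flipped; the function name is made up for illustration.

```python
def project_to_image_plane(X, Y, Z, f):
    """Project a camera-frame point onto the front image plane via similar triangles."""
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)
```

Doubling Z halves both image coordinates, which is the "far away objects appear smaller" effect.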

In perspective projection, how do we convert image coordinates to pixel coordinates?

ku and kv are conversion factors (between mm and pixels)
u,v are pixel coordinates

here, we can express the focal length
in mm -> f (appears as ku*f; kv*f)
in pixels -> au = ku*f, av = kv*f (replacing ku*f…)

What is the final model to transform 3D camera coordinates to 2D image coordinates?

use formulas we derived so far (camera frame -> image plane -> pixel)

where focal lengths au, av (in pixels…)
and principal point u0, v0
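
The model formula itself is missing from this card. A minimal Python sketch, assuming the common intrinsic matrix form K = [[au, s, u0], [0, av, v0], [0, 0, 1]], so that u = au·X/Z + u0 and v = av·Y/Z + v0 when the skew s is 0; the function names are made up for illustration.

```python
import numpy as np

def intrinsic_matrix(au, av, u0, v0, skew=0.0):
    """Intrinsic matrix K with focal lengths in pixels and principal point (u0, v0)."""
    return np.array([[au, skew, u0],
                     [0.0, av, v0],
                     [0.0, 0.0, 1.0]])

def camera_to_pixel(K, Xc):
    """Map a camera-frame 3D point to pixel coordinates (u, v)."""
    uvw = K @ np.asarray(Xc, dtype=float)   # homogeneous pixel coordinates
    return uvw[:2] / uvw[2]                 # divide by depth
```

A point on the optical axis, e.g. (0, 0, 5), lands exactly on the principal point.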

What is the skew factor?

pixel axes (u and v) of the sensor are not exactly perpendicular to each other
=> due to manufacturing process…
=> nowadays very good manufacturing -> skew factor assumed to be 0…
=> au = av (i.e. square pixels…)

What are the general steps of projecting a point from world frame to pixel coordinates?

world frame -> camera frame (take extrinsic parameters into consideration)
camera frame -> pixel

What are the important coordinate systems we make use of in perspective projection?

camera frame
image coordinates
pixel coordinates
world frame

Where do intrinsic, where extrinsic parameters matter?

intrinsic:

camera frame -> image frame

extrinsic:

world frame -> camera frame
- -> corresponds to rotation and translation…

What are the extrinsic parameters for mapping world to camera frame?

[R|T] -> rotation and translation
=> called extrinsic parameters

How are extrinsic parameters applied?

How can we (using our knowledge so far) create a condensed formula for mapping world to pixel frame?

-> combine intrinsic and extrinsic parameters
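
The combined mapping can be sketched in Python. This assumes the common convention Xc = R·Xw + T for the extrinsics and a projection matrix P = K·[R|T]; the formulas on these cards were lost, so the convention and the function names are assumptions.

```python
import numpy as np

def projection_matrix(K, R, T):
    """3x4 projection matrix P = K [R | T]."""
    return K @ np.hstack([np.asarray(R, dtype=float),
                          np.reshape(np.asarray(T, dtype=float), (3, 1))])

def world_to_pixel(P, Xw):
    """Project a world-frame 3D point to pixel coordinates (u, v)."""
    Xh = np.append(np.asarray(Xw, dtype=float), 1.0)  # homogeneous world point
    uvw = P @ Xh
    return uvw[:2] / uvw[2]
```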

What options for line projection did we discuss?

two-step computation method
one-step computation method

How does two-step line projection work?

independently project the two endpoints of the line (homogeneous)
line is then cross product of these homogeneous endpoints
-> line also in homogeneous representation… (as it is 3D vector…)
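
The two-step method can be sketched in Python, assuming a 3x4 projection matrix P is already available; the function names are made up for illustration.

```python
import numpy as np

def project_point_h(P, Xw):
    """Project a 3D point and keep the homogeneous image point (3-vector)."""
    return P @ np.append(np.asarray(Xw, dtype=float), 1.0)

def project_line_two_step(P, A, B):
    """Step 1: project both endpoints. Step 2: image line = cross product of the two."""
    a = project_point_h(P, A)
    b = project_point_h(P, B)
    return np.cross(a, b)   # homogeneous line l; a point x lies on it if l @ x == 0
```

Every projected point of the 3D segment satisfies l · x = 0, which gives a quick sanity check.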

How is a plane defined in homogeneous coordinates?

4D vector (A,B,C,D)
where n=[A,B,C] is the unit normal vector
D is the (signed) distance from the origin (projection from origin onto normal)

take two arbitrary points on the plane to calculate the vector between them
- dot product with the normal must be 0 (as it is orthogonal…)
- use this to compute D

When does a point in homogeneous coordinates lie on a plane?

point P (X,Y,Z,1), plane π = (A,B,C,D)
-> P^T π = 0
=> lies on plane
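
The plane definition and the incidence test can be sketched together in Python. This assumes the sign convention n·X + D = 0, so D = -n·X0 for any known point X0 on the plane and |D| is the distance from the origin; the function names are made up for illustration.

```python
import numpy as np

def plane_from_normal_and_point(n, X0):
    """Homogeneous plane (A, B, C, D) from a normal n and one point X0 on the plane."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)                    # make the normal a unit vector
    D = -float(n @ np.asarray(X0, dtype=float))  # from n . X0 + D = 0
    return np.append(n, D)

def lies_on_plane(plane, X, tol=1e-9):
    """Incidence test: (X, Y, Z, 1) . (A, B, C, D) == 0 up to a tolerance."""
    P = np.append(np.asarray(X, dtype=float), 1.0)
    return abs(float(P @ plane)) < tol
```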

How to compute a projection plane by image line?

camera center and line on image plane span projection plane…

P: projection matrix (i.e. combined intrinsic and extrinsic matrix -> but here transposed!)
I: line on image plane (result of 1-step or 2-step method…)
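
The formula is missing on this card. A minimal Python sketch, assuming the standard back-projection of an image line, plane = P^T · line, which matches the "transposed" remark above; the function name is made up for illustration.

```python
import numpy as np

def projection_plane(P, line):
    """Back-project a homogeneous image line (3-vector) to a 3D plane (4-vector)."""
    return np.asarray(P, dtype=float).T @ np.asarray(line, dtype=float)
```

A quick check of the geometry: the camera center always lies on the resulting plane, because P maps the homogeneous camera center to the zero vector.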

How to compute the intersection between a 3D line and 3D plane?

D: intersection
L: Plücker matrix of line
- => derived from the Plücker coordinates

pi: plane
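
The formula is missing on this card. A minimal Python sketch, assuming the standard relation D = L · pi with the 4x4 Plücker matrix L = A·B^T - B·A^T built from two homogeneous points A, B on the line; the function names are made up for illustration.

```python
import numpy as np

def plucker_matrix(A, B):
    """4x4 antisymmetric Pluecker matrix of the line through the 3D points A and B."""
    A = np.append(np.asarray(A, dtype=float), 1.0)
    B = np.append(np.asarray(B, dtype=float), 1.0)
    return np.outer(A, B) - np.outer(B, A)

def intersect_line_plane(L, plane):
    """Intersection point D = L @ plane, returned as a Cartesian 3D point."""
    D = L @ np.asarray(plane, dtype=float)
    if abs(D[3]) < 1e-12:
        raise ValueError("line is parallel to the plane (or lies in it)")
    return D[:3] / D[3]
```

For the Z-axis and the plane Z = 3, written as (0, 0, 1, -3), this yields the point (0, 0, 3).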

Again, what is the principal point?

intersection of the optical axis with the image plane… (center of the image plane if there is no intrinsic deviation…)

What is a normalized image?

virtual image plane
focal length equal to 1
origin of pixel coordinates at principal point

Why to use normalized images?

in certain cases, we want to do things in 3D
-> treat/extend 2D coordinates of image plane in 3D (w.r.t. camera center etc.)
=> normalized images improve ease of computation

How does one compute normalized coordinates?

invert K and multiply with the original (homogeneous pixel) coordinates
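
A minimal Python sketch of this step, assuming K is the 3x3 intrinsic matrix and the pixel is written homogeneously as (u, v, 1); the function name is made up for illustration. Solving the linear system avoids forming the inverse explicitly but gives the same result as K^-1 · (u, v, 1).

```python
import numpy as np

def normalize_pixel(K, u, v):
    """Normalized image coordinates (x, y, 1) = K^-1 (u, v, 1)."""
    x = np.linalg.solve(np.asarray(K, dtype=float), np.array([u, v, 1.0]))
    return x / x[2]
```

The principal point maps to (0, 0, 1), i.e. the origin of the normalized image at focal length 1.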

What is meant by parallelism of ray direction?

it is a geometric constraint that
-> image frame points are parallel to camera frame points (as direction vectors from the camera center)
=> lie on the same line from the camera center

=> only differ in scale
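
The constraint can be checked numerically in Python. This sketch assumes both points are given as 3-vectors in the camera frame (the image point in normalized coordinates) and tests parallelism through a vanishing cross product; the function name is made up for illustration.

```python
import numpy as np

def is_parallel(a, b, tol=1e-9):
    """True if the 3-vectors a and b differ only by scale (cross product is zero)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cross_norm = np.linalg.norm(np.cross(a, b))
    return bool(cross_norm <= tol * np.linalg.norm(a) * np.linalg.norm(b))

Xc = np.array([2.0, 1.0, 4.0])   # 3D point in the camera frame
x_norm = Xc / Xc[2]              # its normalized image point (0.5, 0.25, 1)
```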

What is meant by parallelism of normals of projection plane?

describes a geometric constraint of lines
-> normals of the projection plane computed from the line in the image frame and from the line in the camera frame only differ by scale
=> normals are parallel

How can we alternatively express line constraints (textual…)

3D line projection (image to camera frame) is orthogonal to the normal of projection plane
direction defined by 3D point lying on 3D line and the origin is orthogonal to the normal of projection plane
- line from camera center to point on line …

Using these constraints, explain how we can derive rotation and translation?

given: (camera frame) 3D point o, 3D lines ex, ey
given: projection of them onto image frame of two cameras (p, dx, dy)
-> use constraints to calculate rotation and translation matrix between them

calculate normals nx, ny between p and dx, dy
ex, ey are parallel (up to scale) to the cross products of both normals with the translation

using this, we can calculate

What is one large difference between planar and spherical projection?

spherical has larger FOV than planar projection

-> spherical: image plane is a sphere…

How to create a spherical projection?

use omnicamera (360° camera…)

How to project 3D points to spherical coordinates?

normalize 3D point to map onto sphere

convert to spherical coordinates
- using angles…

“up / down” using polar angle (0-pi)
“left/right” using azimuth (0-2pi)
r needed if we do not have unit sphere (to express distance from center if not 1…)
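
These steps can be sketched in Python. The axis convention is an assumption (polar angle measured from the Z axis, azimuth measured in the X-Y plane from the X axis), since the notes do not fix one; the function names are made up for illustration.

```python
import math

def to_unit_sphere(X, Y, Z):
    """Normalize a 3D point so that it lies on the unit sphere."""
    r = math.sqrt(X * X + Y * Y + Z * Z)
    if r == 0:
        raise ValueError("the sphere center has no direction")
    return (X / r, Y / r, Z / r)

def to_spherical(X, Y, Z):
    """Return (r, polar, azimuth) with polar in [0, pi] and azimuth in [0, 2*pi)."""
    r = math.sqrt(X * X + Y * Y + Z * Z)
    if r == 0:
        raise ValueError("the sphere center has no direction")
    polar = math.acos(Z / r)                        # "up / down"
    azimuth = math.atan2(Y, X) % (2.0 * math.pi)    # "left / right"
    return (r, polar, azimuth)
```

For a point already on the unit sphere, r is 1 and only the two angles carry information.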

How to map spherical representation to planar?