“Regardless of the used method for representation learning (i.e. PCA, ICA, SC), the resulting set of basis vectors depends on the given dataset.”
Correct: Each method of representation learning tries to find a representation that is optimal for the given dataset. To this end, the algorithms exploit or eliminate correlations, compute the eigenvectors of the data distribution, etc.
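A minimal sketch of this dataset dependence, assuming NumPy is available (datasets, covariances, and seeds are made up for illustration): the same PCA procedure applied to two differently correlated datasets yields different basis vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_basis(X):
    """Eigenvalues/eigenvectors of the covariance, sorted by eigenvalue (descending)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Two datasets with different correlation structure
A = rng.multivariate_normal([0, 0], [[3.0, 2.0], [2.0, 3.0]], size=1000)
B = rng.multivariate_normal([0, 0], [[3.0, -2.0], [-2.0, 3.0]], size=1000)

print(pca_basis(A)[1][:, 0])  # first principal direction of A
print(pca_basis(B)[1][:, 0])  # differs from A's: the basis depends on the data
```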
“The first principal component has the smallest eigenvalue and covers the largest possible variance of the data.”
Incorrect:
The first principal component has the greatest eigenvalue and covers the largest possible variance of the data.
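This can be checked numerically; a small sketch (NumPy assumed, the toy covariance is an illustration): after sorting the eigenvalues of the covariance matrix in descending order, the variance of the data projected onto the first principal component matches the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0, 0], np.diag([5.0, 2.0, 0.5]), size=2000)
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

proj = Xc @ eigvecs[:, 0]                    # projection onto the first PC
print(eigvals)                               # roughly [5, 2, 0.5]
print(proj.var(ddof=1))                      # matches eigvals[0], the largest
```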
“The optimal number of principal components to retain is 42.”
Incorrect: The optimal number of components to retain depends on the given dataset (e.g. there may be only a few eigenvalues >> 0) and on the problem that should be solved (classification, compression and reconstruction, …).
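One common heuristic, shown here as a sketch (NumPy assumed; the 95% threshold is an arbitrary choice, not a rule): keep as many components as needed to explain a given fraction of the total variance.

```python
import numpy as np

def n_components_for(X, threshold=0.95):
    """Smallest number of components whose eigenvalues explain `threshold` of the variance."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc.T))[::-1]   # descending
    ratio = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratio, threshold) + 1)

rng = np.random.default_rng(2)
X = rng.multivariate_normal(np.zeros(5), np.diag([10, 5, 1, 0.1, 0.01]), 1000)
print(n_components_for(X))   # likely 3: the smallest eigenvalues contribute almost nothing
```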
“After whitening, the features of an observation are uncorrelated and have unit variance.”
Correct:
First, the correlations are eliminated with the PCA. Then, after subtracting the mean, each component is divided by its standard deviation (the square root of the corresponding eigenvalue), so the features have unit variance.
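A minimal PCA-whitening sketch (NumPy assumed; `eps` is a small constant added only for numerical stability): rotate into the eigenbasis to decorrelate, then scale each component to unit variance, so the covariance of the result is the identity.

```python
import numpy as np

def whiten(X, eps=1e-8):
    Xc = X - X.mean(axis=0)                   # subtract the mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc.T))
    Z = Xc @ eigvecs                          # rotate: components are uncorrelated
    return Z / np.sqrt(eigvals + eps)         # scale: unit variance per component

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1, 2], [[4.0, 1.5], [1.5, 1.0]], size=5000)
print(np.cov(whiten(X).T).round(2))           # close to the identity matrix
```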
“The resulting basis vectors of the ICA (i.e. the independent components) are mutually orthogonal.”
Incorrect: Unlike PCA, whose basis vectors are orthogonal by construction, ICA finds independent components that do not have to be orthogonal. ICA also accounts for higher-order (non-linear) dependencies, not just correlations.
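A sketch of this, assuming scikit-learn's FastICA is available (sources, mixing matrix, and seeds are made up): two independent non-Gaussian sources are mixed with a deliberately non-orthogonal matrix, and the mixing directions recovered by ICA are likewise non-orthogonal.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
S = rng.laplace(size=(5000, 2))            # independent non-Gaussian sources
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])                 # columns are NOT orthogonal
X = S @ A.T                                # observed mixtures

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
ica.fit(X)
a1, a2 = ica.mixing_.T                     # estimated mixing directions
cos = a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2))
print(cos)                                 # clearly non-zero: not orthogonal
```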
“The ICA is an enhancement of the PCA which additionally aims to make the already uncorrelated features statistically independent.”
Correct: First the data is whitened (which includes the PCA step); then ICA finds and removes the remaining statistical dependencies between the already uncorrelated features.
“The set of basis vectors of the ICA can be over-complete, i.e. more basis vectors than dimensions of the input space.”
Incorrect: There are no additional dimensions in the n-dimensional input space that could lead to more than n basis vectors; an over-complete basis is a property of sparse coding, not of ICA.
“Sparsity means that only a few basis vectors are used to reconstruct an image.”
Correct: There are only a few non-zero coefficients that contribute to the reconstruction of an image, so only a few of the basis vectors are used.
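A sketch with scikit-learn's SparseCoder (the dictionary and signal are synthetic; OMP and the budget of 3 coefficients are assumptions): the signal is reconstructed from at most 3 of the 50 available atoms.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(5)
D = rng.normal(size=(50, 16))                        # 50 atoms in a 16-dim space
D /= np.linalg.norm(D, axis=1, keepdims=True)        # unit-norm atoms (needed for OMP)

x = 0.8 * D[3] + 0.5 * D[27]                         # signal built from only 2 atoms

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)
code = coder.transform(x[None, :])[0]
print(np.count_nonzero(code))                        # at most 3 of 50 coefficients
print(np.linalg.norm(x - code @ D))                  # near-perfect reconstruction
```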
“The optimization problem for SC (c.f. equation 1.9 in the lecture notes) is convex.”
Incorrect: The optimization problem is a nested problem, alternating between the sparse coefficients and the basis vectors. It is highly non-convex, and there is no guarantee that the global minimum is found.
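A bare-bones alternating-minimization sketch in plain NumPy (the objective follows the usual ||X - C D||^2 + lam * ||C||_1 form; step sizes, lam, and dimensions are assumptions): each sub-problem is convex on its own, but the joint problem is not, so only a local minimum is reached.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 16))                 # data (rows are samples)
D = rng.normal(size=(32, 16))                  # over-complete dictionary
D /= np.linalg.norm(D, axis=1, keepdims=True)
C = np.zeros((200, 32))                        # sparse codes
lam, lr = 0.1, 0.01

def soft(v, t):                                # soft-thresholding (L1 proximal step)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for it in range(100):
    # code step (ISTA): gradient step on the quadratic term, then L1 prox
    L = np.linalg.norm(D @ D.T, 2)             # Lipschitz constant of the gradient
    C = soft(C - ((C @ D - X) @ D.T) / L, lam / L)
    # dictionary step: gradient descent, then re-normalize the atoms
    D -= lr * (C.T @ (C @ D - X))
    D /= np.linalg.norm(D, axis=1, keepdims=True)

print(np.linalg.norm(X - C @ D))               # decreases, but only to a local optimum
```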
“Unlike PCA and ICA, SC results in a linear transform of the data.”
Incorrect: PCA and ICA are linear transformations. The representation acquired by SC is (in general) non-linear, because the coefficients are obtained by solving an optimization problem for each input.
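A quick numerical check (again using scikit-learn's SparseCoder; dictionary and signals are synthetic): the mapping from a signal to its sparse code does not satisfy additivity, i.e. coding x1 + x2 does not give code(x1) + code(x2), unlike a PCA projection.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(7)
D = rng.normal(size=(40, 10))
D /= np.linalg.norm(D, axis=1, keepdims=True)
coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=2)

x1, x2 = rng.normal(size=10), rng.normal(size=10)
c = coder.transform(np.vstack([x1, x2, x1 + x2]))
print(np.linalg.norm(c[2] - (c[0] + c[1])))   # generally > 0: the map is non-linear
```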