Principal Component Analysis (PCA) – Basics
Statistical technique for simplifying datasets.
Introduced by Karl Pearson in 1901; one of the earliest unsupervised learning techniques.
Also known as the Karhunen-Loève transform or eigenvector projection; closely related to the singular value decomposition (SVD), which is commonly used to compute it.
Goal: reduce dimensionality while preserving maximum variance.
Creates principal components (PCs):
Linear functions (weighted combinations) of the original variables (see the formula sketch after this list).
Uncorrelated with each other.
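A minimal formula sketch of what "linear functions of the original variables" means (the notation is mine, not from the notes): for p original variables, the first PC is

```latex
z_1 = w_{11}x_1 + w_{12}x_2 + \dots + w_{1p}x_p, \qquad \lVert \mathbf{w}_1 \rVert = 1
```

where the unit-norm weights are chosen so that z_1 has maximum variance; each later PC again maximizes variance while staying uncorrelated with the earlier ones.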
Why PCA?
Input data often has many correlated/redundant variables → complexity.
PCA constructs new variables (PCs) → most info contained in first few PCs.
Helps simplify data for:
Machine learning.
Regression models.
Main Uses of PCA
Dimensionality reduction → fewer variables.
Feature extraction → uncorrelated features.
Data visualization → higher dimensions reduced to 2D/3D.
Applied in image & signal processing.
Often used in ML preprocessing → reduces noise, improves model performance (a usage sketch follows below).
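A minimal sketch of PCA as a preprocessing step, assuming scikit-learn is available; the dataset, the 95% variance threshold, and the classifier are illustrative choices, not part of these notes.

```python
# Illustrative pipeline: scale features, reduce dimensionality with PCA,
# then fit a classifier on the reduced features.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 64 pixel features per image
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keep enough components to explain ~95% of the variance (illustrative choice).
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X_train, y_train)

print("components kept:", pipe.named_steps["pca"].n_components_)
print("test accuracy:", pipe.score(X_test, y_test))
```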
When to Use PCA?
Do we want to reduce the number of variables without manually choosing which ones to drop?
Do we want the new variables to be uncorrelated with each other?
If yes → PCA is a good method.
PCA Algorithm – Steps
Standardize data → subtract the mean of each variable (and divide by its standard deviation if variables are on different scales).
Covariance matrix → captures the variance of each variable and the covariance between variable pairs.
Eigenvalues & eigenvectors →
Eigenvectors = directions of maximum variance.
Eigenvalues = variance explained.
Sort eigenvalues in descending order → the eigenvector with the largest eigenvalue becomes the 1st principal component (PC1).
Reduce dimension → ignore less significant PCs.
Project the data onto the kept PCs → the transformed (reduced) dataset; it can be mapped back to the original feature space for an approximate reconstruction (see the sketch below).
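A minimal NumPy sketch of the steps above (the function and variable names are my own, not from the notes): center the data, build the covariance matrix, eigendecompose it, sort by eigenvalue, and project onto the top-k eigenvectors.

```python
import numpy as np

def pca(X, k):
    """Reduce X (n_samples x n_features) to its first k principal components."""
    # Step 1: center the data (subtract the mean of each variable).
    X_centered = X - X.mean(axis=0)

    # Step 2: covariance matrix of the centered variables.
    cov = np.cov(X_centered, rowvar=False)

    # Step 3: eigenvalues (variance explained) and eigenvectors (directions).
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance matrix is symmetric

    # Step 4: sort in descending order of eigenvalue; largest -> PC1.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: keep only the k most significant components.
    W = eigvecs[:, :k]

    # Step 6: project onto the kept components (reduced dataset);
    # X_reduced @ W.T + mean gives an approximate reconstruction.
    X_reduced = X_centered @ W
    return X_reduced, eigvals, W
```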
PCA Example (x1 & x2 dataset)
Data scattered around diagonal x1 = x2.
PC1 = diagonal axis (captures most variance).
PC2 = perpendicular axis (captures second-highest variance).
If reducing variables → keep PC1, drop PC2 (a worked version follows below).
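A small worked version of this x1/x2 example; the data here is synthetic, generated along the x1 = x2 diagonal purely for illustration, and scikit-learn's PCA is used to show that PC1 lines up with the diagonal and carries most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Points scattered around the diagonal x1 = x2, with small perpendicular noise.
x1 = t + rng.normal(scale=0.1, size=500)
x2 = t + rng.normal(scale=0.1, size=500)
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(X)
print("PC1 direction:", pca.components_[0])            # ~ ±[0.707, 0.707], the diagonal
print("variance ratio:", pca.explained_variance_ratio_)  # PC1 explains far more than PC2

# Reducing to one variable: keep PC1, drop PC2.
X_1d = PCA(n_components=1).fit_transform(X)
```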