Principal Component Analysis (PCA) – Basics
Statistical technique for simplifying datasets.
Introduced by Karl Pearson in 1901; one of the earliest unsupervised learning techniques.
Also known as the Karhunen-Loève transform or eigenvector projection; closely related to the singular value decomposition (SVD), which is commonly used to compute it.
Goal: reduce dimensionality while preserving maximum variance.
Creates principal components (PCs):
Linear functions (weighted combinations) of the original variables (see the formula sketch after this list).
Uncorrelated with each other.
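A minimal formula sketch of what "linear functions of the original variables" means (the notation is mine, not from the notes): for p original variables, the first PC is

```latex
z_1 = w_{11}x_1 + w_{12}x_2 + \dots + w_{1p}x_p, \qquad \lVert \mathbf{w}_1 \rVert = 1
```

where the unit-norm weights are chosen so that z_1 has maximum variance; each later PC again maximizes variance while staying uncorrelated with the earlier ones.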
Why PCA?
Input data often has many correlated/redundant variables → complexity.
PCA constructs new variables (PCs) → most info contained in first few PCs.
Helps simplify data for:
Machine learning.
Regression models.
Main Uses of PCA
Dimensionality reduction → fewer variables.
Feature extraction → uncorrelated features.
Data visualization → higher dimensions reduced to 2D/3D.
Applied in image & signal processing.
Often used in ML preprocessing → reduces noise, improves model performance (a usage sketch follows below).
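A minimal sketch of PCA as a preprocessing step, assuming scikit-learn is available; the dataset, the 95% variance threshold, and the classifier are illustrative choices, not part of these notes.

```python
# Illustrative pipeline: scale features, reduce dimensionality with PCA,
# then fit a classifier on the reduced features.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 64 pixel features per image
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keep enough components to explain ~95% of the variance (illustrative choice).
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X_train, y_train)

print("components kept:", pipe.named_steps["pca"].n_components_)
print("test accuracy:", pipe.score(X_test, y_test))
```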
When to Use PCA?
Do we want to reduce the number of variables without manually choosing which ones to drop?
Do we want the new variables to be uncorrelated with each other?
If yes → PCA is a good method.
PCA Algorithm – Steps
Standardize data → subtract the mean of each variable (and divide by its standard deviation if variables are on different scales).
Covariance matrix → captures the variance of each variable and the covariance between variable pairs.
Eigenvalues & eigenvectors →
Eigenvectors = directions of maximum variance.
Eigenvalues = variance explained.
Sort eigenvalues in descending order → the eigenvector with the largest eigenvalue becomes the 1st principal component (PC1).
Reduce dimension → ignore less significant PCs.
Project the data onto the kept PCs → the transformed (reduced) dataset; it can be mapped back to the original feature space for an approximate reconstruction (see the sketch below).
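A minimal NumPy sketch of the steps above (the function and variable names are my own, not from the notes): center the data, build the covariance matrix, eigendecompose it, sort by eigenvalue, and project onto the top-k eigenvectors.

```python
import numpy as np

def pca(X, k):
    """Reduce X (n_samples x n_features) to its first k principal components."""
    # Step 1: center the data (subtract the mean of each variable).
    X_centered = X - X.mean(axis=0)

    # Step 2: covariance matrix of the centered variables.
    cov = np.cov(X_centered, rowvar=False)

    # Step 3: eigenvalues (variance explained) and eigenvectors (directions).
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance matrix is symmetric

    # Step 4: sort in descending order of eigenvalue; largest -> PC1.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: keep only the k most significant components.
    W = eigvecs[:, :k]

    # Step 6: project onto the kept components (reduced dataset);
    # X_reduced @ W.T + mean gives an approximate reconstruction.
    X_reduced = X_centered @ W
    return X_reduced, eigvals, W
```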
PCA Example (x1 & x2 dataset)
Data scattered around diagonal x1 = x2.
PC1 = diagonal axis (captures most variance).
PC2 = perpendicular axis (captures second-highest variance).
If reducing variables → keep PC1, drop PC2 (a worked version follows below).
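A small worked version of this x1/x2 example; the data here is synthetic, generated along the x1 = x2 diagonal purely for illustration, and scikit-learn's PCA is used to show that PC1 lines up with the diagonal and carries most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Points scattered around the diagonal x1 = x2, with small perpendicular noise.
x1 = t + rng.normal(scale=0.1, size=500)
x2 = t + rng.normal(scale=0.1, size=500)
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(X)
print("PC1 direction:", pca.components_[0])            # ~ ±[0.707, 0.707], the diagonal
print("variance ratio:", pca.explained_variance_ratio_)  # PC1 explains far more than PC2

# Reducing to one variable: keep PC1, drop PC2.
X_1d = PCA(n_components=1).fit_transform(X)
```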