What is unsupervised learning?
Unsupervised learning uses unlabeled data to learn. The goal is to find patterns in the input data, which makes it possible to cluster the data or to find new information based on similarities.
Because there is no labeled training data, there is no specific way to compare model performance in most unsupervised learning models.
What are 3 tasks of unsupervised learning?
clustering: (Find groups of similar items based on their features)
representation learning: (Find other possible representations of the data)
density estimation: (Find where data points are "concentrated" in a given space)
What are some common use-cases of unsupervised learning?
Exploratory Analysis
To explore and understand the dataset by revealing patterns, trends, or anomalies, without a specific target variable.
Dimensionality Reduction
To simplify data by reducing its dimensionality (number of features) without losing too much of its important structure.
What is meant by clustering in machine learning?
Using a clustering algorithm (e.g. K-Means) in unsupervised machine learning, it is possible to group data together based on similarities. The idea is that data points with similar features belong in the same group, while dissimilar data points belong in another group. Knowing how an algorithm groups a dataset can give valuable insights into the dataset.
What are some real life applications of unsupervised machine learning?
Market Basket Analysis
helps determine possible future purchases based on past purchases -> which products are often bought together
Semantic Clustering
used for search engines. Similar phrases are grouped together to find the search result that fits best.
Delivery Store Optimization
used to analyse which products are sold most, where they are mostly delivered, and how to optimize delivery -> used to optimize the supply chain
Identify Accident-Prone Areas
used to find areas with high risk of accidents
Explain K-Means and its purpose
K-Means is an unsupervised learning algorithm used to cluster data points together. It works in the following steps (see the sketch after the list):
1. Choose the number K of clusters to divide the dataset into. K can be guessed or determined using the Elbow method -> this uses the sum of the squared distances of each data point from its centroid to evaluate the best number for K.
2. Choose K random centroids as a starting point, one centroid per cluster.
3. Calculate the distance between the centroids and the other data points (e.g. with the Euclidean distance measure); each data point is then assigned to the cluster with the nearest centroid.
4. The centroid is recalculated as the mean of all data points within its cluster.
5. Steps 3 and 4 are repeated until the centroids don't change significantly anymore -> convergence is reached.
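A minimal sketch of these steps with numpy (illustrative only; the function and variable names are my own, and edge cases like empty clusters are ignored):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: choose K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to the cluster with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer change significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```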
What is dimensionality reduction and what is it used for?
Dimensionality reduction is a technique used in machine learning and data analysis to reduce the number of features (dimensions) in a dataset while retaining as much relevant information as possible.
This is achieved by transforming the data into a lower-dimensional space where the key patterns and relationships are preserved.
Avoid Overfitting:
In datasets with many features (high dimensionality), machine learning models can overfit because the model captures noise or irrelevant patterns instead of meaningful trends. By reducing dimensions, the risk of overfitting decreases.
Improve Generalization:
With fewer dimensions, the model can generalize better to unseen data, as it focuses on the most important features.
Enhance Interpretability:
High-dimensional datasets are hard to visualize and interpret. Reducing dimensions to 2D or 3D makes it easier to visualize and understand relationships in the data.
Reduce Computational Complexity:
High-dimensional data increases the computational cost of training models. Dimensionality reduction helps by simplifying the dataset.
What is Principal Component Analysis and what is it used for? How does it work?
PCA is a technique used for dimensionality reduction of data. It can convert a multidimensional dataset into a 3D or 2D dataset, helping to visualize and analyze the data in a better way.
"Which parts (features) of this data have the most variance (meaning the most information or interesting patterns)?"
It keeps those parts because they matter the most.
It gets rid of or ignores the parts that don’t vary much or don’t add much value (like noise or redundant information).
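A small numpy sketch of this idea on toy data: the eigenvectors of the covariance matrix are the directions of maximum variance (the principal components), and dropping the low-variance ones reduces the dimensionality:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 5))  # toy data, 5 features
X_centered = X - X.mean(axis=0)                     # PCA works on centered data

# Eigenvectors of the covariance matrix are the principal components;
# the eigenvalues measure the variance along each of them.
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = eigvals.argsort()[::-1]        # sort components by variance, descending

# Keep only the top 2 components and project the data onto them.
components = eigvecs[:, order[:2]]
X_2d = X_centered @ components         # shape (200, 2)
```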
t-Distributed Stochastic Neighbor Embedding (t-SNE)?
A technique for dimensionality reduction that is used to separate data that cannot be linearly separated.
Performs KMeans clustering on two columns of the initial dataset (degree_spondylolisthesis and pelvic_radius). The algorithm is set to predict two distinct clusters. The KMeans clustering algorithm is trained on data2, and predictions are made for data2. The result is visualized in a scatter plot, displaying the clusters in different colors.
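A hypothetical reconstruction of this cell, assuming sklearn's KMeans and that data2 is a pandas DataFrame already loaded with the two columns named above:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# data2 is assumed to be loaded earlier in the notebook
features = data2[["degree_spondylolisthesis", "pelvic_radius"]]
model = KMeans(n_clusters=2, random_state=0)   # predict two distinct clusters
labels = model.fit_predict(features)

# Scatter plot with one color per predicted cluster
plt.scatter(features.iloc[:, 0], features.iloc[:, 1], c=labels)
plt.xlabel("degree_spondylolisthesis")
plt.ylabel("pelvic_radius")
plt.show()
```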
Creates a plot to visualize the best number of clusters for K-Means clustering. Used when the number of clusters is unknown. Inertia (the sum of squared distances of samples to their closest cluster center) decreases as more clusters are used, so the goal is to achieve a low inertia while at the same time not making too many clusters.
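A sketch of such an elbow plot, reusing the features DataFrame from the snippet above:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
inertias = [KMeans(n_clusters=k, random_state=0).fit(features).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("inertia")
plt.show()
# The "elbow", where the curve flattens out, suggests a good value for K.
```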
The cross tab shows the number of instances predicted for each cluster and to which class they actually belong.
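A sketch of such a cross tabulation with pandas, assuming labels holds the K-Means predictions and that the actual classes live in a data2 column called "class" (the column name is an assumption):

```python
import pandas as pd

ct = pd.crosstab(labels, data2["class"],
                 rownames=["predicted cluster"], colnames=["actual class"])
print(ct)
```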
PCA reduces the number of features (dimensions) in the dataset by transforming the original features into a new set of uncorrelated features (principal components), while still capturing most of the variance of the original dataset.
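A minimal sklearn sketch of this step, assuming X is the dataset's feature matrix:

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)        # keep the 2 strongest principal components
X_pca = pca.fit_transform(X)     # X: feature matrix, assumed loaded earlier

# Fraction of the original variance captured by each kept component
print(pca.explained_variance_ratio_)
```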
t-SNE (t-Distributed Stochastic Neighbor Embedding) can be used for dimensionality reduction, primarily for the visualization of high-dimensional datasets in a 2D or 3D plane. It helps to find patterns and analyze the data by visualizing clusters and relationships. However, it is not recommended to use t-SNE for training a model in the same way as PCA, because t-SNE is more focused on preserving local structure and is not suitable for generalizing to new data points.
t-SNE uses gradient descent; the learning rate adjusts how fast the algorithm converges to a minimum.
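A minimal sklearn sketch for this, again assuming X is the high-dimensional feature matrix; learning_rate is the gradient descent step size mentioned above:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, learning_rate=200, random_state=0)
X_tsne = tsne.fit_transform(X)   # X: feature matrix, assumed loaded earlier

plt.scatter(X_tsne[:, 0], X_tsne[:, 1])
plt.show()
# Note: TSNE has no transform() for unseen data; it only embeds the data it
# was fitted on, which is why it is used for visualization rather than training.
```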