Recap of Hands-on AI I

Hands on AI II

by Noel K.

What are three important datatypes?

Tabular data
Images
Sequences

What is tabular data and what are key properties of tabular data?

Tabular data stores data in vertical columns and horizontal rows
Each row and column is uniquely numbered
Key properties
- A table can hold an arbitrary number of records
- Each record (i.e. row) shares the same set of properties
- Each column usually has a header
- Each record can be retrieved by a query through key values
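
The properties above can be illustrated with a minimal sketch in plain Python. The table contents and the names `header`, `rows` and `by_id` are made up for illustration; real projects would typically use a dataframe library or a database instead.

```python
# Hypothetical table: a header plus records that all share the same columns.
header = ["id", "name", "age"]
rows = [
    [1, "Ada", 36],
    [2, "Alan", 41],
    [3, "Grace", 45],
]

# Index the records by their key column so a record can be retrieved by key.
key_column = header.index("id")
by_id = {row[key_column]: dict(zip(header, row)) for row in rows}

# Query: fetch the record whose key value is 2.
record = by_id[2]
```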

What is the suitable terminology regarding tabular data?

sample
feature
feature vector
class
label

Sample: Each entry (row) of the data
Feature: A property that is recorded for every sample
Feature vector: All features of one sample collected into a single vector
Class: The category a sample belongs to
Label: States the class of each sample
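
A small sketch of how these terms map onto arrays, assuming NumPy is available. The feature names and values are invented for illustration only.

```python
import numpy as np

# Hypothetical toy data: 4 samples (rows), 3 features (columns).
feature_names = ["height_cm", "weight_kg", "age_years"]
X = np.array([
    [170.0, 65.0, 30.0],
    [160.0, 55.0, 25.0],
    [180.0, 80.0, 40.0],
    [175.0, 72.0, 35.0],
])

# One label per sample; each label names the sample's class.
y = np.array(["A", "B", "A", "B"])

n_samples, n_features = X.shape
first_feature_vector = X[0]             # feature vector of the first sample
classes = sorted(set(y.tolist()))       # distinct classes found in the labels
```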

Which problem does dimensionality reduction solve and what are popular algorithms for dimensionality reduction?

Dimensionality reduction makes it possible to visualize data with many features
It projects n-dimensional data down to a dimension that is easy to visualize (often 2D or 3D) while preserving as much information as possible
Popular algorithms
- t-SNE (t-distributed stochastic neighbor embedding)
- PCA (Principal component analysis)

What are clustering algorithms?

Clustering algorithms try to group unlabeled data so that similar samples end up in the same cluster and dissimilar samples in different clusters

How are images represented, what is color depth and how does the RGB model work?

Images are represented with three dimensions
- Height
- Width
- Channels (typically: red, green, blue)
Color depth is the number of possible values for each channel of a pixel
(commonly used: 8-bit and 16-bit)
The RGB model has three channels (red, green, blue). The final image results from additively mixing the three channels
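
As a rough illustration, an RGB image can be held in a NumPy array of shape height x width x channels. The image size and pixel values below are arbitrary assumptions.

```python
import numpy as np

height, width, channels = 4, 6, 3
bit_depth = 8
levels_per_channel = 2 ** bit_depth     # 256 possible values per channel

# An all-black RGB image stored as height x width x channels.
image = np.zeros((height, width, channels), dtype=np.uint8)

# Additive mixing: full red plus full green gives yellow.
image[0, 0] = [255, 255, 0]
# Full intensity in all three channels gives white.
image[0, 1] = [255, 255, 255]

red_channel = image[:, :, 0]            # a single channel is a 2D array
```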

What is data augmentation and what are advantages and downsides of data augmentation?

Data augmentation means creating new artificial samples by modifying existing ones
Advantages
- Can increase the number of data points (i.e. samples) with little effort
- Reduces overfitting
- Increases the robustness of a model
Downsides
- Can introduce new artifacts
- Can change the task entirely
- Heavily dependent on the task, data and model

What are popular data augmentation techniques?

Rotation
Flipping
Zooming/Cropping
Blurring
Noise
Input Dropout
Distortion Effects
Color Jittering
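
A few of these techniques can be sketched with plain NumPy operations. The 4x4 image, the noise level and the dropout rate are arbitrary choices for illustration; image libraries offer more complete implementations.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, chosen only for reproducibility

# Hypothetical 4x4 grayscale image with values in [0, 1].
image = np.arange(16, dtype=np.float64).reshape(4, 4) / 15.0

flipped = np.fliplr(image)              # horizontal flip
rotated = np.rot90(image)               # rotation by 90 degrees counter-clockwise
cropped = image[1:3, 1:3]               # central 2x2 crop

# Additive Gaussian noise, clipped back into the valid value range.
noisy = np.clip(image + rng.normal(0.0, 0.05, size=image.shape), 0.0, 1.0)

# Input dropout: zero out roughly 20% of the pixels at random.
keep_mask = rng.random(image.shape) >= 0.2
dropped = image * keep_mask
```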

What is a sequence, which data can be displayed with a sequence and what are possible examples of sequences?

A sequence is a datatype which lists values in a certain order
Essentially any kind of data can be represented as a sequence, but doing so is not always meaningful
Examples
- Time series (e.g. weather, stock price)
- Positional series (e.g. molecule representation, symbol and word order in a language)

What is supervised machine learning and for what is it typically used?

Supervised machine learning is a machine learning technique, where a model learns from input data with corresponding target values
It is typically used for predictive modeling: the trained model predicts target values for other (new) inputs whose targets are not known yet

What is the suitable terminology regarding supervised machine learning?

Model: parameterized function/method with specific parameter values (e.g., a trained neural network)
Model class: the class of models in which we search for the model (e.g., neural networks, SVMs, . . . )
Parameters: what is adjusted during training (e.g., network weights)
Hyperparameters: settings controlling model complexity or the training procedure (e.g., network learning rate)
Model selection/training: process of finding a model (optimal parameters) from the model class
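
To make the terms concrete, here is a minimal sketch in plain Python that fits a straight line with gradient descent. The data, the choice of a linear model class and the hyperparameter values are all assumptions made for illustration.

```python
# Model class: all linear functions f(x) = w * x + b.
def predict(x, w, b):
    return w * x + b

# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# Hyperparameters: set before training, not learned from the data.
learning_rate = 0.05
n_epochs = 2000

# Parameters: adjusted during training.
w, b = 0.0, 0.0

for _ in range(n_epochs):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (predict(x, w, b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# The trained model is the model class with these specific parameter values.
prediction = predict(5.0, w, b)
```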

What are the two most important supervised machine learning tasks and what are their differences?

Classification: target value is a class label (e.g. spam and not spam)
Regression: target value is a numerical value (e.g. house prices)

How does PCA roughly work?

Input: dimension (2D, 3D), unlabeled data
Algorithm: reduces data to desired dimension while trying to preserve as much information as possible
Output: The data projected to the desired dimension, usually shown as a scatter plot of the data points colored by their class labels, note: the plot is only possible with corresponding plotting functions
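
One common way to compute PCA is via a singular value decomposition of the centered data. The sketch below assumes NumPy; the function name `pca` and the random toy data are illustrative and not taken from the course material.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, explained_variance[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 100 samples with 5 features, where most of the
# variance lies along the first feature.
X = rng.normal(size=(100, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])

X_2d, variance = pca(X, n_components=2)
```

The rows of `X_2d` are the points one would pass to a scatter plot.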

How does t-SNE roughly work?

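Input: dimension (2D, 3D), unlabeled data, perplexity
Algorithm: reduces data to the desired dimension while trying to keep neighboring points close together; the perplexity roughly controls how many neighbors each point takes into account, and the result varies between runs because of random initialization
Output: The data projected to the desired dimension, usually shown as a scatter plot of the data points colored by their class labels, note: the plot is only possible with corresponding plotting functions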

How does Affinity propagation roughly work?

Input: unlabeled and reduced data
Algorithm: tries to cluster the data, the number of clusters is learned by the algorithm
Output: Scatter plot of data points and their assigned clusters, note: this is only possible with corresponding plotting functions, and clusters are not the same as class labels

How does k-means roughly work?

Input: Number of clusters, reduced and unlabeled data
Algorithm: Tries to cluster the data, the number of clusters is defined by the user
Output: Scatter plot of data points and their assigned clusters, note: this is only possible with corresponding plotting functions, and clusters are not the same as class labels
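
A minimal sketch of the standard k-means iteration (assign each sample to its nearest centroid, then move each centroid to the mean of its samples), assuming NumPy. The function name `k_means`, the random initialization and the toy blobs are illustrative assumptions.

```python
import numpy as np

def k_means(X, k, n_iterations=100, seed=0):
    """Cluster X (n_samples x n_features) into k clusters."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct samples picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iterations):
        # Assignment step: each sample joins its nearest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its samples.
        new_centroids = np.array([
            X[assignments == j].mean(axis=0) if np.any(assignments == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assignments, centroids

rng = np.random.default_rng(1)
# Hypothetical 2D data: two well-separated blobs of 50 samples each.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])

assignments, centroids = k_means(X, k=2)
```

The cluster indices in `assignments` are arbitrary numbers, which is one reason clusters cannot be read as class labels.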

What is the most obvious way to plot time series data?

Line plots, since the data depends on time and can therefore be appropriately visualized with time on the horizontal axis

What is often problematic with image data and which type of neural networks are helpful with processing image data?

Image data is usually high-dimensional
Convolutional neural networks (CNNs) are well suited to processing images

How can an image be portrayed with a feature vector?

The pixels of the image are simply flattened out, e.g. a grayscale 28x28 image can be translated into a feature vector with 28 * 28 = 784 elements
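
Flattening can be sketched with a NumPy reshape; a 28x28 image yields 28 * 28 = 784 elements. The all-zero images and the batch size of 16 are placeholders.

```python
import numpy as np

# Hypothetical grayscale image of 28x28 pixels.
image = np.zeros((28, 28), dtype=np.uint8)

# Flattening concatenates the rows into one long feature vector.
feature_vector = image.reshape(-1)

# The same idea for a batch of images: one feature vector per row.
batch = np.zeros((16, 28, 28), dtype=np.uint8)
X = batch.reshape(len(batch), -1)
```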

Consider the following code of a convolution:

torch.nn.Conv2d(1, 10, 5)

How many kernels are applied, and what size and dimension do they have?

Dimension (depth) of kernels: 1, i.e. one input channel
Number of kernels: 10, i.e. ten output channels
Size of kernels: 5x5
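
The shape arithmetic behind this answer can be worked out in plain Python without importing PyTorch. The 28x28 input size is an assumption for illustration; stride 1, padding 0 and one bias per kernel are the `Conv2d` defaults.

```python
# Arguments of torch.nn.Conv2d(1, 10, 5), by position.
in_channels, out_channels, kernel_size = 1, 10, 5

# Each of the 10 kernels spans all input channels: 1 x 5 x 5 weights.
weight_shape = (out_channels, in_channels, kernel_size, kernel_size)
n_weights = out_channels * in_channels * kernel_size * kernel_size
n_biases = out_channels                 # one bias per kernel (the default)
n_parameters = n_weights + n_biases

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# Hypothetical 28x28 input with the default stride 1 and padding 0.
out_height = conv_output_size(28, kernel_size)
out_width = conv_output_size(28, kernel_size)
output_shape = (out_channels, out_height, out_width)
```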

How does k-nearest neighbour algorithm roughly work?

Input: Labeled data (optionally reduced), number of nearest neighbours k
Algorithm: The algorithm stores a labeled training data set. For a new datapoint it looks up the k nearest training datapoints and assigns the class that occurs most often among them. It is finally evaluated on the test set
Output: Accuracy on the training and test set
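
A minimal sketch of k-nearest-neighbour classification with a majority vote, assuming NumPy and Euclidean distance. The helper names `knn_predict` and `accuracy`, the toy data and the 60/20 split are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Classify each query point by majority vote among its k nearest training points."""
    predictions = []
    for x in X_query:
        distances = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(distances)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[counts.argmax()])
    return np.array(predictions)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)
# Hypothetical labeled data: two well-separated classes in 2D.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(40, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(40, 2)),
])
y = np.array([0] * 40 + [1] * 40)

# Shuffle, then hold out the last 20 samples as a test set.
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_train, y_train = X[:60], y[:60]
X_test, y_test = X[60:], y[60:]

train_accuracy = accuracy(y_train, knn_predict(X_train, y_train, X_train, k=3))
test_accuracy = accuracy(y_test, knn_predict(X_train, y_train, X_test, k=3))
```

Because the training points are among their own nearest neighbours, training accuracy tends to be optimistic; the test accuracy is the more informative number.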

Information

Last changed
2 years ago
