What is machine learning?
Machine learning is basically applied statistics. It uses algorithms to parse data, learn from it, and then make predictions.
In contrast to a conventional algorithm, which explicitly defines how a task should be solved, a machine learning algorithm uses large amounts of data to learn how to solve the task.
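A toy illustration of the contrast. The Celsius-to-Fahrenheit conversion and the use of `numpy.polyfit` are illustrative assumptions, not taken from these notes. The first function hard-codes the rule. The second part estimates the same rule's slope and intercept from example data.

```python
import numpy as np

# Conventional programming: the rule is written down explicitly.
def fahrenheit_rule(celsius):
    return celsius * 1.8 + 32.0

# Machine learning: the rule is estimated from example data instead.
celsius = np.array([-10.0, 0.0, 10.0, 20.0, 30.0, 40.0])
fahrenheit = fahrenheit_rule(celsius)  # stands in for measured data

# Least-squares fit of a straight line; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(celsius, fahrenheit, deg=1)

def learned_rule(c):
    return slope * c + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=1.800, intercept=32.000
```

Real data would be noisy, so the learned rule would only approximate the true one.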
What is an Artificial Neural Network?
It is based on the human brain and uses large amounts of data. It is a type of machine learning process that uses interconnected nodes (neurons) in a layered structure. In a real human brain, neurons can connect to other neurons in their area and exchange signals with them. A neural network, by contrast, has a fixed number of layers, and information propagates in one direction.
What is deep learning? Why not just use machine learning?
Deep learning is a subset of machine learning. The difference is that classical machine learning algorithms rely on manually defined features (feature extraction happens before learning), whereas deep learning algorithms learn the features by themselves during training, directly from the data.
Deep learning is used because, especially for large datasets with many features, deep learning algorithms tend to achieve better performance and accuracy.
What is a neural network?
A neural network typically consists of at least three layers:
Input layer – Takes in the input features.
Hidden layers – Process the information by applying weights, biases, and activation functions.
Output layer – Produces the final prediction.
Each layer consists of neurons, which hold numerical values. Each neuron in one layer is connected to every neuron in the next layer (in a fully connected network).
A logistic regression-like operation is used to propagate information from one layer to the next:
Each neuron receives a weighted sum of the outputs from the previous layer.
A bias term is added.
The result is passed through an activation function (e.g., ReLU, Sigmoid, or Softmax).
Each connection between neurons has its own weight, so each neuron receives its own weighted sum. This process continues layer by layer until the output layer is reached.
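A minimal NumPy sketch of this layer-by-layer computation. It assumes a tiny fully connected network with 4 inputs, one hidden layer of 3 ReLU neurons, and 2 sigmoid outputs. All sizes and the helper name `dense_forward` are hypothetical. Each row of a weight matrix holds one neuron's incoming weights, so `W @ a_prev` computes every neuron's weighted sum at once.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(a_prev, W, b, activation):
    """One fully connected layer: weighted sum + bias, then activation."""
    z = W @ a_prev + b  # W: (n_out, n_in), a_prev: (n_in,), b: (n_out,)
    return activation(z)

rng = np.random.default_rng(0)
x = rng.random(4)                                   # input layer: 4 features
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)   # hidden layer: 3 neurons
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)   # output layer: 2 neurons

h = dense_forward(x, W1, b1, relu)       # hidden activations
out = dense_forward(h, W2, b2, sigmoid)  # output activations
print(h.shape, out.shape)  # (3,) (2,)
```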
The number of neurons in the output layer depends on the number of possible output categories. For example, if we want to classify an image of a handwritten digit (0 to 9), the output layer will have 10 neurons, each representing a digit.
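For the digit example, a common choice is to apply softmax to the 10 output values. This turns them into probabilities that sum to 1, and the predicted digit is the neuron with the highest probability. The raw output values below are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw outputs of the 10 output neurons, one per digit 0-9.
logits = np.array([0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.4, 1.1, 0.0, -0.5])
probs = softmax(logits)
predicted_digit = int(np.argmax(probs))
print(predicted_digit)  # 2
```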
How does a neural network learn?
The learning process consists of adjusting the weights and biases so that the accuracy of predictions improves over time.
Initializing Weights Randomly
The initial weights and biases are randomly assigned.
This leads to poor initial predictions, but the network improves through training.
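A minimal sketch of random initialization for one layer. The layer sizes (784 inputs, as for 28x28 pixel images, and 16 neurons) and the scale 0.01 are illustrative assumptions. Small random weights break symmetry: if all weights started equal, every neuron in a layer would compute the same value and receive the same update. This sketch sets the biases to zero, a common choice; the notes say biases are also random, which works too.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    # Weights: small random values drawn from a normal distribution (std 0.01).
    W = rng.normal(0.0, 0.01, size=(n_out, n_in))
    # Biases: zeros (random symmetry breaking already comes from W).
    b = np.zeros(n_out)
    return W, b

rng = np.random.default_rng(42)
W, b = init_layer(784, 16, rng)
print(W.shape, b.shape)  # (16, 784) (16,)
```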
Computing the Cost Function
The error (or "badness") of the predictions is measured using a cost function (also called a loss function).
The higher the value of the cost function, the worse the predictions.
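A small illustration using mean squared error, one common cost function. The logistic-regression notes further down use binary cross-entropy instead. The labels and predictions are made up. Predictions far from the targets produce a larger cost.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 1.0])
good = np.array([0.9, 0.1, 0.8, 0.95])  # close to the targets
bad = np.array([0.2, 0.7, 0.4, 0.3])    # far from the targets

print(mean_squared_error(y_true, good) < mean_squared_error(y_true, bad))  # True
```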
Backpropagation (Using the Chain Rule of Calculus)
To improve predictions, we need to adjust the weights so that the cost function decreases.
This requires calculating gradients (partial derivatives of the cost function with respect to each weight).
The chain rule of calculus is used to propagate these gradients backward through the network.
Gradients provide two key pieces of information:
Direction – Whether a weight should be increased or decreased to reduce the cost.
Magnitude – How strongly a change in the weight affects the cost.
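A worked chain-rule example for a single sigmoid neuron with a squared-error cost. The neuron, the cost, and the numbers are hypothetical. The analytic gradient is the product of three local derivatives, and a finite-difference check confirms it. The sign of the gradient gives the direction, and its size gives the magnitude.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    a = sigmoid(w * x + b)
    return (a - y) ** 2

def grad_w_chain_rule(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    dC_da = 2.0 * (a - y)   # derivative of (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)   # derivative of the sigmoid with respect to z
    dz_dw = x               # derivative of w*x + b with respect to w
    return dC_da * da_dz * dz_dw

w, b, x, y = 0.5, 0.1, 2.0, 1.0
analytic = grad_w_chain_rule(w, b, x, y)

# Finite-difference check: nudge w slightly and measure how the cost changes.
eps = 1e-6
numeric = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
print(analytic < 0)  # True: increasing w lowers the cost, so w should grow
```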
Gradient Descent Optimization
Once gradients are computed, they are used in the gradient descent algorithm to update the weights.
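The update formulas are:
$w \leftarrow w - \eta \dfrac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta \dfrac{\partial J}{\partial b}$
where η is the learning rate and J is the cost function.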
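To make the update rule concrete, here is plain gradient descent on a one-parameter toy cost J(w) = (w - 3)^2, whose minimum is at w = 3. The cost function, starting point, and learning rate are illustrative choices.

```python
def dJ_dw(w):
    # Gradient of the toy cost J(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0     # starting weight
eta = 0.1   # learning rate
for step in range(100):
    w = w - eta * dJ_dw(w)   # w <- w - eta * dJ/dw
print(round(w, 4))  # 3.0
```

Each step multiplies the distance to the minimum by (1 - 2 * eta) = 0.8, so w approaches 3 steadily.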
Learning Rate and Optimization
The learning rate determines how big each update step is:
A large learning rate speeds up learning but can cause overshooting.
A small learning rate is slower but more precise.
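A sketch comparing learning rates on the same toy cost J(w) = (w - 3)^2 (an illustrative choice). With 0.01 the weight is still far from 3 after 20 steps. With 0.1 it gets close. With 1.1 every step jumps past the minimum by more than the previous distance, so the weight diverges (overshooting).

```python
def run(eta, steps=20, w=0.0):
    for _ in range(steps):
        w = w - eta * 2.0 * (w - 3.0)   # gradient step on J(w) = (w - 3)^2
    return w

for eta in (0.01, 0.1, 1.1):
    print(eta, run(eta))
```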
Iterating the Process
The entire process (computing the cost, calculating gradients, and updating weights) is repeated for multiple iterations (epochs).
Over time, the neural network gradually finds better weights and biases, improving accuracy.
Forward Propagation:
The function first computes the forward propagation.
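It calculates the weighted sum of the pixels of every picture and adds the bias. The result z is a 1D vector that gives each picture a numerical value anywhere between −∞ and +∞:
$z = w^{T} X + b$, where X holds one picture per column and w holds one weight per pixel.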
The sigmoid function squashes each value into the range 0 to 1, producing y_head (ŷ), which represents the predicted probabilities:
$\hat{y} = \sigma(z) = \dfrac{1}{1 + e^{-z}}$ (applied element-wise)
The loss is calculated using binary cross-entropy; the smaller the loss, the closer the prediction is to the actual label.
The cost is the average loss over all pictures.
Backward Propagation:
The second part of the function calculates the gradients to adjust the weights and biases later on.
It does so by calculating the derivative of the cost with respect to the weights and biases.
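derivative_weight is the gradient of the cost J with respect to the weights:
$\dfrac{\partial J}{\partial w} = \dfrac{1}{m} X (\hat{y} - y)^{T}$
derivative_bias is the gradient of the cost with respect to the bias:
$\dfrac{\partial J}{\partial b} = \dfrac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)$
Here X holds one picture per column, m is the number of pictures, ŷ is the vector of predicted probabilities (y_head), and y is the vector of actual labels.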
Return:
Finally, it returns the cost and the gradients.
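The notes describe this function without showing its code. Below is a minimal NumPy reconstruction under stated assumptions. The function name `forward_backward_propagation`, the argument order, and the array shapes are guesses: pictures are stored one per column, and labels are 0 or 1. The variable names `z`, `y_head`, `derivative_weight` and `derivative_bias` follow the notes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward_propagation(w, b, x_train, y_train):
    """Assumed shapes: w (n_pixels, 1), b scalar, x_train (n_pixels, m), y_train (1, m)."""
    m = x_train.shape[1]                      # number of pictures
    # Forward propagation
    z = np.dot(w.T, x_train) + b              # shape (1, m), values in (-inf, inf)
    y_head = sigmoid(z)                       # predicted probabilities in (0, 1)
    loss = -y_train * np.log(y_head) - (1 - y_train) * np.log(1 - y_head)  # binary cross-entropy
    cost = np.sum(loss) / m                   # average loss over all pictures
    # Backward propagation
    derivative_weight = np.dot(x_train, (y_head - y_train).T) / m   # shape (n_pixels, 1)
    derivative_bias = np.sum(y_head - y_train) / m
    gradients = {"derivative_weight": derivative_weight,
                 "derivative_bias": derivative_bias}
    return cost, gradients
```

As a sanity check: with all-zero weights and bias, every `y_head` is 0.5, so the cost is ln 2 ≈ 0.693 whatever the labels are.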
This code performs gradient descent on the weights and bias of the neural network. The goal is to adjust the weights and bias using the gradients from backward propagation together with a preset learning rate, which defines how large each update step is. Computing the gradients and updating the weights and bias is repeated for a set number of iterations. The cost should decrease with each iteration, making the predictions better each time.
This function makes predictions on the testing data using the trained (already adjusted) weights and bias. For each image it computes a raw prediction value, which can lie anywhere between −∞ and +∞, and then applies the sigmoid function to squash it into the range 0 to 1. If the resulting value is above 0.5, the prediction is set to 1; otherwise, it is set to 0.
The code uses the LogisticRegression class from the sklearn (scikit-learn) library to train a model and predict the outcomes on the testing data. Finally, the training and testing accuracies are displayed.
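A sketch of the update loop. It assumes one picture per column, labels of 0 or 1, weights of shape (n_pixels, 1), and a scalar bias. The names `update`, `cost_and_gradients`, `learning_rate` and `number_of_iterations` are hypothetical. The tiny dataset is made up purely to show the cost falling.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradients(w, b, x, y):
    # Forward and backward propagation for logistic regression (compact form).
    m = x.shape[1]
    y_head = sigmoid(np.dot(w.T, x) + b)
    cost = -np.sum(y * np.log(y_head) + (1 - y) * np.log(1 - y_head)) / m
    return cost, np.dot(x, (y_head - y).T) / m, np.sum(y_head - y) / m

def update(w, b, x_train, y_train, learning_rate, number_of_iterations):
    cost_list = []
    for _ in range(number_of_iterations):
        cost, dw, db = cost_and_gradients(w, b, x_train, y_train)
        w = w - learning_rate * dw   # step against the gradient
        b = b - learning_rate * db
        cost_list.append(cost)
    return w, b, cost_list

# Tiny hypothetical dataset: 2 features, 4 samples (one per column).
x = np.array([[0.0, 0.2, 0.8, 1.0],
              [0.1, 0.0, 0.9, 1.0]])
y = np.array([[0, 0, 1, 1]])
w, b, costs = update(np.zeros((2, 1)), 0.0, x, y,
                     learning_rate=1.0, number_of_iterations=200)
print(costs[0] > costs[-1])  # True
```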
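A minimal sketch of this prediction step. It assumes weights `w` of shape (n_pixels, 1), a scalar bias `b`, and test pictures stored one per column. The name `predict` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x_test):
    # Returns an int array of 0/1 predictions with shape (1, m).
    z = np.dot(w.T, x_test) + b          # raw scores in (-inf, inf)
    y_head = sigmoid(z)                  # probabilities in (0, 1)
    return (y_head > 0.5).astype(int)    # threshold at 0.5
```

For example, with `w = [[1.0], [-1.0]]`, `b = 0` and two test pictures `[2, 0]` and `[0, 2]`, the raw scores are 2 and -2, so the predictions are `[[1, 0]]`.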
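A minimal scikit-learn sketch. The notes do not say which dataset is used, so this assumes scikit-learn's bundled 8x8 digits dataset, restricted to the digits 0 and 1 to get a binary task. scikit-learn expects one sample per row, shape (n_samples, n_features). Data stored one picture per column must be transposed first.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the bundled digits data and keep only the digits 0 and 1 (binary task).
X, y = load_digits(return_X_y=True)
mask = (y == 0) | (y == 1)
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)  # extra iterations help convergence
model.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```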