How can you automatically compute gradients of a model?
gradients = torch.autograd.grad(output, trainable_param, retain_graph=True)[0]
# you can use autograd.grad; pass retain_graph=True if you want to compute the gradients a second time
output.backward(retain_graph=True)
# or use backward, then the gradients are accumulated in the "grad" attribute;
# you therefore have to reset the gradients manually if you want to compute them a second time
trainable_param.grad.zero_()
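A minimal self-contained sketch of both variants; the parameter shape and the toy computation for output are illustrative assumptions:
import torch
trainable_param = torch.ones(3, requires_grad=True)  # assumed toy parameter
output = (trainable_param ** 2).sum()                 # scalar output of a toy computation
# variant 1: autograd.grad returns the gradients directly, without touching .grad
gradients = torch.autograd.grad(output, trainable_param, retain_graph=True)[0]
# variant 2: backward accumulates the gradients in trainable_param.grad
output.backward(retain_graph=True)
print(trainable_param.grad)   # same values as `gradients`
trainable_param.grad.zero_()  # reset before computing gradients again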
How can you minimize the loss?
optimizer = torch.optim.SGD([trainable_param], lr=0.01)
# initialize optimizer
output = trainable_param.sum() * 2
# Compute the output
loss = torch.abs(target - output)
# Compute the loss
loss.backward()
# Compute the gradients; backward can only be called on a scalar such as the loss
optimizer.step()
# Perform the update
optimizer.zero_grad()
# Reset the accumulated gradients
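Put together as a minimal self-contained sketch; trainable_param, target, the learning rate and the number of steps are illustrative assumptions:
import torch
trainable_param = torch.zeros(4, requires_grad=True)  # assumed toy parameter
target = torch.tensor(10.0)                            # assumed scalar target
optimizer = torch.optim.SGD([trainable_param], lr=0.01)
for _ in range(100):
    output = trainable_param.sum() * 2
    loss = torch.abs(target - output)
    loss.backward()        # compute the gradients
    optimizer.step()       # update trainable_param
    optimizer.zero_grad()  # reset the accumulated gradients for the next iteration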
What are common loss functions and how can we apply them?
loss_function = torch.nn.MSELoss()
# initialize mean squared error loss, regression
output = dsnn(input_tensor)
loss = loss_function(output, target_tensor)
# loss function takes output and target tensor as parameters
loss_function = torch.nn.BCEWithLogitsLoss()
# binary cross entropy, classification,
# expects values before applying the sigmoid activation function
loss_function = torch.nn.CrossEntropyLoss()
# cross entropy loss, used for multi-class classification,
# expects values before applying the softmax function
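A short sketch applying the three losses to dummy tensors; batch size, shapes and number of classes are illustrative assumptions:
import torch
# regression: output and target have the same shape
output = torch.randn(8, 1)
target_tensor = torch.randn(8, 1)
loss = torch.nn.MSELoss()(output, target_tensor)
# binary classification: raw logits and float targets in {0, 1}
logits = torch.randn(8, 1)
binary_targets = torch.randint(0, 2, (8, 1)).float()
loss = torch.nn.BCEWithLogitsLoss()(logits, binary_targets)
# multi-class classification: logits of shape (batch, classes), class indices as targets
class_logits = torch.randn(8, 5)
class_targets = torch.randint(0, 5, (8,))
loss = torch.nn.CrossEntropyLoss()(class_logits, class_targets)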
How can we inspect the performance of a model during training?
import os
from torch.utils.tensorboard import SummaryWriter
log_dir = os.path.join("results", "experiment_00")
writer = SummaryWriter(log_dir)
# perform training, track losses, weights
# and gradients during training
writer.close()
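Inside the training loop, losses, weights and gradients can then be logged, for example like this (model, train_loader, loss_function and optimizer are assumed from the snippets above; the tag names are illustrative):
for step, (inputs, targets) in enumerate(train_loader):
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    writer.add_scalar("training/loss", loss.item(), global_step=step)  # track the loss
    for name, param in model.named_parameters():
        writer.add_histogram(f"weights/{name}", param, global_step=step)         # track weights
        writer.add_histogram(f"gradients/{name}", param.grad, global_step=step)  # track gradients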
What are general hints and tips for training a neural net?
with unbalanced data, it may help to increase the loss weight of samples from the underrepresented class
the learning rate has to be tuned during training; momentum helps to overcome local minima
use (gradient) clipping to prevent destabilization due to outliers in the data
use regularization and the corresponding penalties (e.g. weight decay)
hyperparameters cannot be optimized independently, but can be found with grid search
when evaluating the model, computation of the gradients is not needed
save trained models (clipping, gradient-free evaluation and saving are sketched below)
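A minimal sketch of clipping, gradient-free evaluation and saving; the toy model, data, max_norm value and file name are illustrative assumptions:
import torch
model = torch.nn.Linear(3, 1)  # assumed toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(8, 3), torch.randn(8, 1)
# gradient clipping before the optimizer step
loss = torch.nn.MSELoss()(model(inputs), targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
# evaluation without gradient computation
model.eval()
with torch.no_grad():
    predictions = model(inputs)
# save the trained parameters to disk
torch.save(model.state_dict(), "model.pt")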
What does optim.step() do?
It carries out an update on the registered trainable parameters of my_model. It should be called after the loss of the output of my_model has been calculated and loss.backward() has been called.
What is the purpose of a loss function?
To compute the difference between the model output and the actual targets (ground truth).
What does it mean to train your model for one epoch?
Perform one training pass over all training samples, i.e. every sample is used once.
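A minimal sketch of one epoch using a DataLoader; the toy dataset, model and hyperparameters are illustrative assumptions:
import torch
from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(torch.randn(100, 3), torch.randn(100, 1))  # assumed toy data
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
model = torch.nn.Linear(3, 1)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# iterating once over the loader = one epoch, every sample is used once
for inputs, targets in train_loader:
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()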