How can you automatically compute gradients of a model?
gradients = torch.autograd.grad(output, trainable_param, retain_graph=True)[0]
# you can use autograd.grad; pass retain_graph=True if you want to compute the gradients a second time
output.backward(retain_graph=True)
# or use backward, then the gradients are accumulated in the "grad" attribute;
# you therefore have to reset the gradients manually if you want to compute them a second time
trainable_param.grad.zero_()
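A minimal self-contained sketch of both variants; the parameter shape and the toy computation for output are illustrative assumptions:
import torch
trainable_param = torch.ones(3, requires_grad=True)  # assumed toy parameter
output = (trainable_param ** 2).sum()                 # scalar output of a toy computation
# variant 1: autograd.grad returns the gradients directly, without touching .grad
gradients = torch.autograd.grad(output, trainable_param, retain_graph=True)[0]
# variant 2: backward accumulates the gradients in trainable_param.grad
output.backward(retain_graph=True)
print(trainable_param.grad)   # same values as `gradients`
trainable_param.grad.zero_()  # reset before computing gradients again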
How can you minimize the loss?
optimizer = torch.optim.SGD([trainable_param], lr=0.01)
# initialize optimizer
output = trainable_param.sum() * 2
# Compute the output
loss = torch.abs(target - output)
# Compute the loss
loss.backward()
# Compute the gradients; backward can only be called on a scalar such as the loss
optimizer.step()
# Perform the update
optimizer.zero_grad()
# Reset the accumulated gradients
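Put together as a minimal self-contained sketch; trainable_param, target, the learning rate and the number of steps are illustrative assumptions:
import torch
trainable_param = torch.zeros(4, requires_grad=True)  # assumed toy parameter
target = torch.tensor(10.0)                            # assumed scalar target
optimizer = torch.optim.SGD([trainable_param], lr=0.01)
for _ in range(100):
    output = trainable_param.sum() * 2
    loss = torch.abs(target - output)
    loss.backward()        # compute the gradients
    optimizer.step()       # update trainable_param
    optimizer.zero_grad()  # reset the accumulated gradients for the next iteration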
What are common loss functions and how can we apply them?
loss_function = torch.nn.MSELoss()
# initialize mean squared error loss, regression
output = dsnn(input_tensor)
loss = loss_function(output, target_tensor)
# loss function takes output and target tensor as parameters
loss_function = torch.nn.BCEWithLogitsLoss()
# binary cross entropy, classification,
# expects values before applying the sigmoid activation function
loss_function = torch.nn.CrossEntropyLoss()
# cross entropy loss, used for multi-class classification,
# expects values before applying the softmax function
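A short sketch applying the three losses to dummy tensors; batch size, shapes and number of classes are illustrative assumptions:
import torch
# regression: output and target have the same shape
output = torch.randn(8, 1)
target_tensor = torch.randn(8, 1)
loss = torch.nn.MSELoss()(output, target_tensor)
# binary classification: raw logits and float targets in {0, 1}
logits = torch.randn(8, 1)
binary_targets = torch.randint(0, 2, (8, 1)).float()
loss = torch.nn.BCEWithLogitsLoss()(logits, binary_targets)
# multi-class classification: logits of shape (batch, classes), class indices as targets
class_logits = torch.randn(8, 5)
class_targets = torch.randint(0, 5, (8,))
loss = torch.nn.CrossEntropyLoss()(class_logits, class_targets)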
How can we inspect the performance of a model during training?
import os
from torch.utils.tensorboard import SummaryWriter
log_dir = os.path.join("results", "experiment_00")
writer = SummaryWriter(log_dir)
# perform training, track losses, weights
# and gradients during training
writer.close()
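Inside the training loop, losses, weights and gradients can then be logged, for example like this (model, train_loader, loss_function and optimizer are assumed from the snippets above; the tag names are illustrative):
for step, (inputs, targets) in enumerate(train_loader):
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    writer.add_scalar("training/loss", loss.item(), global_step=step)  # track the loss
    for name, param in model.named_parameters():
        writer.add_histogram(f"weights/{name}", param, global_step=step)         # track weights
        writer.add_histogram(f"gradients/{name}", param.grad, global_step=step)  # track gradients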
What are general hints and tips for training a neural net?
with unbalanced data, it may help to increase the loss weight of samples from the underrepresented class
the learning rate has to be tuned during training; momentum helps to overcome local minima
use (gradient) clipping to prevent destabilization due to outliers in the data
use regularization and the corresponding penalties (e.g. weight decay)
hyperparameters cannot be optimized independently, but can be found with grid search
when evaluating the model, computation of the gradients is not needed
save trained models (clipping, gradient-free evaluation and saving are sketched below)
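A minimal sketch of clipping, gradient-free evaluation and saving; the toy model, data, max_norm value and file name are illustrative assumptions:
import torch
model = torch.nn.Linear(3, 1)  # assumed toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(8, 3), torch.randn(8, 1)
# gradient clipping before the optimizer step
loss = torch.nn.MSELoss()(model(inputs), targets)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
# evaluation without gradient computation
model.eval()
with torch.no_grad():
    predictions = model(inputs)
# save the trained parameters to disk
torch.save(model.state_dict(), "model.pt")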
What does optim.step() do?
It carries out an update on the registered trainable parameters of my_model. It should be called after the loss of the output of my_model has been calculated and loss.backward() has been called.
What is the purpose of a loss function?
To compute the difference between the model output and the actual targets (ground truth).
What does it mean to train your model for one epoch?
Perform one training pass over all training samples, i.e. every sample is used once.
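A minimal sketch of one epoch using a DataLoader; the toy dataset, model and hyperparameters are illustrative assumptions:
import torch
from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(torch.randn(100, 3), torch.randn(100, 1))  # assumed toy data
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
model = torch.nn.Linear(3, 1)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# iterating once over the loader = one epoch, every sample is used once
for inputs, targets in train_loader:
    outputs = model(inputs)
    loss = loss_function(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()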