How can you ensure reproducibility?
By seeding a random number generator, either in NumPy or in PyTorch:
import numpy as np
import torch

rng = np.random.default_rng(seed=0)
a = rng.uniform(size=(3,))

torch.random.manual_seed(0)
a = torch.rand(size=(3,))
How can you create a Dataset?
from torch.utils.data import Dataset
import numpy as np

class Simple1DRandomDataset(Dataset):
    def __init__(self, samples: np.ndarray):
        # store the samples for later retrieval
        self.samples = samples

    def __getitem__(self, index):
        # return one sample together with its index
        return self.samples[index], index

    def __len__(self):
        # return the total number of samples in the dataset
        return len(self.samples)
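A quick usage sketch (assumed toy data; rng is the NumPy generator from the reproducibility card):
our_dataset = Simple1DRandomDataset(rng.uniform(size=(20,)))
sample, index = our_dataset[3]  # one sample and its index
print(len(our_dataset))  # 20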
How can you load the data?
from torch.utils.data import DataLoader

our_dataloader = DataLoader(our_dataset, shuffle=True,
                            batch_size=4, num_workers=0)
The batch size is a hyperparameter we need to set
num_workers enables loading the data in multiple worker processes; with num_workers > 0, the DataLoader must be created and used inside an if __name__ == "__main__": guard (required on platforms that spawn subprocesses, e.g. Windows)
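A minimal sketch of that guard (num_workers=2 is an assumed example value):
from torch.utils.data import DataLoader

if __name__ == "__main__":
    # worker processes re-import this module, so the loader
    # must only be created and used inside this guard
    our_dataloader = DataLoader(our_dataset, shuffle=True,
                                batch_size=4, num_workers=2)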
How can you split the data into test, training and validation sets?
from torch.utils.data import Subset

n_samples = len(our_dataset)
shuffled_indices = rng.permutation(n_samples)
test_set_indices = shuffled_indices[:int(n_samples / 5)]
validation_set_indices = shuffled_indices[int(n_samples / 5):int(n_samples / 5) * 2]
training_set_indices = shuffled_indices[int(n_samples / 5) * 2:]

test_set = Subset(our_dataset, test_set_indices)
validation_set = Subset(our_dataset, validation_set_indices)
training_set = Subset(our_dataset, training_set_indices)
You need to create separate DataLoaders for these subsets afterwards
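A minimal sketch of those loaders, using the subsets created above:
from torch.utils.data import DataLoader

training_loader = DataLoader(training_set, shuffle=True, batch_size=4)
validation_loader = DataLoader(validation_set, shuffle=False, batch_size=4)
test_loader = DataLoader(test_set, shuffle=False, batch_size=4)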
How can you customize stacking, e.g. if you do not want the samples stacked into minibatch tensors?
With the argument collate_fn
training_loader = DataLoader(training_set, shuffle=False,
batch_size=4, collate_fn=no_stack_collate_fn)
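The no_stack_collate_fn used above is not part of PyTorch; a minimal sketch of such a function (assumed implementation):
def no_stack_collate_fn(batch_as_list):
    # keep the minibatch as a plain list of samples
    # instead of stacking them into tensors
    return batch_as_list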
What should the __getitem__() method typically return?
The corresponding sample, specified by the index
What is the argument collate_fn for?
It specifies how the samples are combined into minibatches
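As an illustration, the default behavior stacks the samples along a new first dimension; a sketch using torch.utils.data.default_collate (available in recent PyTorch versions):
import torch
from torch.utils.data import default_collate

# a list of two (sample, index) pairs, as __getitem__ would return them
batch_as_list = [(torch.tensor([1.0]), 0), (torch.tensor([2.0]), 1)]
samples, indices = default_collate(batch_as_list)
# samples has shape (2, 1); indices is tensor([0, 1])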
What is the purpose of an instance of torch.utils.data.DataLoader?
It creates minibatches from the samples returned by a torch.utils.data.Dataset instance
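A typical usage sketch, assuming the our_dataloader from the loading card and a dataset that returns (sample, index) pairs:
for samples, indices in our_dataloader:
    # each iteration yields one minibatch, combined by collate_fn
    ...  # training or evaluation code goes here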