Data analysis

by Noel K.

How can you convert a PIL image to a NumPy array?

from PIL import Image
import numpy as np
with Image.open(image_files[0]) as im:
    image = np.array(im)
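A minimal sketch of the conversion, assuming Pillow and NumPy are installed. The in-memory image is a hypothetical stand-in for a file from image_files, so the example runs without anything on disk. Note that PIL reports sizes as (width, height), while the resulting array is ordered (height, width, channels).

```python
import numpy as np
from PIL import Image

# Hypothetical stand-in for an image loaded from disk: 4 px wide, 3 px high, RGB.
im = Image.new("RGB", (4, 3), color=(255, 0, 0))

# np.array copies the pixel data into an array of shape (height, width, channels).
image = np.array(im)

print(image.shape)  # (3, 4, 3)
print(image.dtype)  # uint8
```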

How many features and channels do images usually have?

How can we compute the mean and standard deviation of an image and what may be useful to consider with large datasets?

mean = image.mean(axis=(0, 1))

std = image.std(axis=(0, 1))
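To illustrate what axis=(0, 1) does, here is a small sketch assuming NumPy and an array in (height, width, channels) layout. The tiny image is made up for illustration. Reducing over height and width leaves one value per channel.

```python
import numpy as np

# Hypothetical 2x2 RGB image: channel 0 is all 10, channel 1 is all 20,
# channel 2 alternates between 0 and 4.
image = np.zeros((2, 2, 3), dtype=np.float64)
image[..., 0] = 10
image[..., 1] = 20
image[..., 2] = [[0, 4], [0, 4]]

# Reduce over height (axis 0) and width (axis 1); the channel axis remains.
mean = image.mean(axis=(0, 1))
std = image.std(axis=(0, 1))

print(mean)  # [10. 20.  2.]
print(std)   # [0. 0. 2.]
```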

With large datasets, it can be useful to create save points (checkpoints) of intermediate results and to compress the saved data, for example with dill and gzip:

import dill as pkl

import gzip
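A save point could be written and restored roughly as follows. This sketch uses the standard-library pickle module in place of dill so that it runs without extra packages; dill exposes the same dump/load interface. The helper names save_checkpoint and load_checkpoint, the file name, and the statistics values are hypothetical placeholders.

```python
import gzip
import os
import pickle  # swapped in for dill here; dill offers the same dump/load interface
import tempfile


def save_checkpoint(obj, path):
    """Serialize obj and write it gzip-compressed to path."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f)


def load_checkpoint(path):
    """Read a gzip-compressed checkpoint back into memory."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Placeholder values standing in for expensive intermediate results.
stats = {"mean": [0.5, 0.4, 0.3], "n_images": 1000}

path = os.path.join(tempfile.mkdtemp(), "stats.pkl.gz")
save_checkpoint(stats, path)
restored = load_checkpoint(path)
```

Saving after each expensive step means a crash or restart does not force the whole dataset to be processed again.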

What can you use to downproject images?

You can use pretrained CNN models provided by PyTorch (via torchvision) to downproject the data into a better-suited feature space, e.g. the SqueezeNet 1.1 model:

import torchvision

weights = torchvision.models.SqueezeNet1_1_Weights.IMAGENET1K_V1

pretrained_model = torchvision.models.squeezenet1_1(weights=weights)

How can we apply t-SNE to downprojected data?

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, perplexity=30, random_state=1)

images_projected_tsne = tsne.fit_transform(images_projected_cnn)

What are common data normalization/scaling approaches?
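Two widely used approaches are min-max scaling (mapping values to the range [0, 1]) and standardization (subtracting the mean and dividing by the standard deviation). The sketch below assumes NumPy; the sample values are made up for illustration.

```python
import numpy as np

# Made-up sample values for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0])

# Min-max scaling: the smallest value maps to 0, the largest to 1.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization (z-scoring): result has mean 0 and standard deviation 1.
x_standard = (x - x.mean()) / x.std()

print(x_minmax)
print(x_standard.mean(), x_standard.std())
```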

If you want to determine a normalization constant, like the mean over many samples, how do you do it?
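One possible approach is to accumulate running sums over the samples, so that only one sample needs to be in memory at a time. This is a sketch under assumptions: the helper name running_channel_stats is hypothetical, the images are assumed to be arrays in (height, width, channels) layout, and the random images are placeholders. Such constants are typically computed on the training split only.

```python
import numpy as np


def running_channel_stats(images):
    """Per-channel mean and std over an iterable of (H, W, C) arrays."""
    n_pixels = 0
    channel_sum = 0.0
    channel_sq_sum = 0.0
    for image in images:
        arr = np.asarray(image, dtype=np.float64)
        pixels = arr.reshape(-1, arr.shape[-1])  # (H * W, C)
        n_pixels += pixels.shape[0]
        channel_sum = channel_sum + pixels.sum(axis=0)
        channel_sq_sum = channel_sq_sum + (pixels ** 2).sum(axis=0)
    mean = channel_sum / n_pixels
    # Var[x] = E[x^2] - E[x]^2; clip tiny negative values caused by rounding.
    var = np.maximum(channel_sq_sum / n_pixels - mean ** 2, 0.0)
    return mean, np.sqrt(var)


# Placeholder data: ten random 4x5 RGB images.
rng = np.random.default_rng(0)
images = [rng.random((4, 5, 3)) for _ in range(10)]

mean, std = running_channel_stats(images)

# Cross-check against loading everything at once (only feasible for small data).
stacked = np.stack(images)
reference_mean = stacked.mean(axis=(0, 1, 2))
reference_std = stacked.std(axis=(0, 1, 2))
```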

What can torch.utils.data.Subset be used for?

It generates a Dataset from a subset of the original Dataset, selected by a list of indices.
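A minimal sketch, assuming PyTorch is installed. The toy dataset and the chosen indices are made up for illustration.

```python
import torch
from torch.utils.data import Subset, TensorDataset

# Toy dataset with ten samples holding the values 0..9.
dataset = TensorDataset(torch.arange(10))

# Keep only the samples at positions 0, 2 and 4 of the original dataset.
subset = Subset(dataset, indices=[0, 2, 4])

print(len(subset))  # 3
print(subset[1])    # (tensor(2),)
```

A typical use is splitting one Dataset into training and validation parts by passing two disjoint index lists.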

Last changed
2 years ago