How many bytes per element for a 32-bit float?
4
How to calculate the memory of a layer's output in KB?
Number of elements in the output volume * bytes per element / 1024
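A minimal sketch of this calculation in Python (the output volume sizes are hypothetical example values, assuming 32-bit floats):
# memory of one output volume, 4 bytes per 32-bit float element
C, H, W = 64, 56, 56                      # hypothetical output volume: channels x height x width
bytes_per_element = 4
memory_kb = C * H * W * bytes_per_element / 1024
print(memory_kb)                          # 64*56*56*4/1024 = 784.0 KB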
How to calculate the FLOPs of a conv layer?
Number of output elements * operations per output element (= number of weights in the filter, i.e. K * K * C_in)
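A sketch with hypothetical layer dimensions, counting one multiply-add per filter weight per output element (some conventions count 2 FLOPs per multiply-add):
C_in, K = 32, 3                           # hypothetical: input channels, KxK filter
C_out, H_out, W_out = 64, 56, 56          # hypothetical output volume
ops_per_output = C_in * K * K             # one multiply-add per filter weight
flops = (C_out * H_out * W_out) * ops_per_output
print(flops)                              # 200704 * 288 = 57,802,752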
How to calculate the output size of pooling layers?
Same formula as for a conv layer without padding: W_out = (W - K) / S + 1 (analogously for the height); the number of channels stays the same.
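A quick sanity check in Python (input width, pool size, and stride are made-up example values):
W, K, S = 56, 2, 2                        # hypothetical: input width, pool size, stride
W_out = (W - K) // S + 1
print(W_out)                              # 28; channel count is unchanged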
How many learnable params do pooling layers have?
0
Where in AlexNet is the most memory used, where are most of the parameters, and where are the most FLOPs?
Most memory is used by the early conv layers (large spatial resolution), most parameters sit in the fully connected layers at the end, and most FLOPs happen in the conv layers.
What's VGG?
Deeper network after AlexNet, built from simple, repeated design rules: all conv layers are 3x3 with stride 1 and padding 1, all pooling layers are 2x2 max pool with stride 2, and the number of channels doubles after each pooling stage (see the sketch below).
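A minimal PyTorch sketch of one such VGG-style stage (channel counts are illustrative, not the exact VGG-16 configuration):
import torch
import torch.nn as nn
# one VGG-style stage: repeated 3x3 convs (stride 1, pad 1) followed by 2x2 max pool
stage = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),   # halves the spatial resolution
)
x = torch.randn(1, 64, 56, 56)
print(stage(x).shape)                        # torch.Size([1, 128, 28, 28])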
What was special about GoogLeNet?
Focus on efficiency: reduced parameter count, memory usage, and computation.
—> Aggressive downsampling of the input at the beginning (the stem)
Introduced the Inception module:
The Inception module in GoogLeNet allows the network to capture different types of features at various scales simultaneously. It does this by applying multiple filters (1x1, 3x3, 5x5) and max pooling in parallel to the same input, then concatenating the outputs.
This approach enables the network to process fine details and broader patterns efficiently, without having to choose a single filter size, making it more powerful and flexible for image recognition.
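A simplified PyTorch sketch of such a parallel-branch module (channel counts are illustrative, not GoogLeNet's actual configuration; the real module also uses 1x1 convs to reduce channels before the 3x3/5x5 branches):
import torch
import torch.nn as nn
class SimpleInception(nn.Module):
    # parallel 1x1, 3x3, 5x5 convs and max pooling on the same input,
    # outputs concatenated along the channel dimension
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.b2 = nn.Conv2d(in_ch, 16, kernel_size=3, padding=1)
        self.b3 = nn.Conv2d(in_ch, 16, kernel_size=5, padding=2)
        self.b4 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
x = torch.randn(1, 32, 28, 28)
print(SimpleInception(32)(x).shape)          # torch.Size([1, 80, 28, 28]), i.e. 16+16+16+32 channels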
What is 1x1 convolution about?
A 1x1 convolution is used in CNNs to reduce the number of channels (depth) of the feature maps without affecting their spatial dimensions. At each spatial position it computes a weighted sum over all input channels (one such sum per output channel), i.e. it mixes information across channels.
Dimensionality Reduction: Reduces computational complexity by lowering the number of channels.
Feature Mixing: Allows different feature maps to interact, improving learning by mixing information from different channels.
In essence, it simplifies the model and makes it more efficient while preserving important features. —> Dimensionality reduction!!!
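A minimal PyTorch sketch of channel reduction with a 1x1 conv (the 256 -> 64 reduction is just an example):
import torch
import torch.nn as nn
reduce = nn.Conv2d(256, 64, kernel_size=1)   # mixes channels, spatial size untouched
x = torch.randn(1, 256, 28, 28)
print(reduce(x).shape)                       # torch.Size([1, 64, 28, 28])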
What's global average pooling?
Global Average Pooling (GAP) replaces the fully connected layers at the end of a CNN. It works by averaging all values in each feature map, reducing the entire feature map to a single value.
Output: For each feature map, GAP produces one number (the average of all its elements).
Purpose: It reduces the spatial dimensions while keeping the depth, making the model more lightweight and reducing overfitting.
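A minimal sketch in PyTorch (the feature map size is a made-up example):
import torch
import torch.nn as nn
gap = nn.AdaptiveAvgPool2d(1)                # averages each feature map to a single value
x = torch.randn(1, 512, 7, 7)
print(gap(x).shape)                          # torch.Size([1, 512, 1, 1])
# equivalent: x.mean(dim=(2, 3))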
What are ResNets about?
The key innovation in ResNets (Residual Networks) is the introduction of skip connections, which allow the network to "skip" one or more layers by adding the input of a layer directly to its output.
Skip Connections: These bypass layers and pass the input forward, helping avoid the vanishing gradient problem by ensuring gradients can flow through deeper layers during backpropagation.
Deeper Networks: ResNets enable much deeper architectures (e.g., 50, 101, or more layers) without degradation in performance, which would typically happen in very deep networks.
This allows ResNets to learn more complex features without suffering from the typical training issues of deep networks.
Makes it easier to optimize deep models!
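A minimal sketch of a basic residual block in PyTorch (simplified: no batch norm, and input/output channel counts assumed equal so the identity skip works directly):
import torch
import torch.nn as nn
class BasicResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)            # skip connection: add the input back onto the output
x = torch.randn(1, 64, 28, 28)
print(BasicResBlock(64)(x).shape)            # torch.Size([1, 64, 28, 28])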
What are Bottleneck blocks, how do they help?
Bottleneck blocks are a type of building block used in deeper versions of ResNets to make the network more efficient. They consist of three layers:
1x1 convolution: Reduces the number of channels (compression).
3x3 convolution: Applies the main transformation.
1x1 convolution: Restores the original number of channels (expansion).
Efficiency: By reducing the number of channels with the first 1x1 convolution, the computational cost of the 3x3 convolution is greatly reduced.
Deeper Networks: Bottleneck blocks allow for deeper networks (e.g., ResNet-50, ResNet-101) with fewer parameters, improving performance without excessive computation.
This structure makes deeper ResNets faster and more memory-efficient.
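A sketch of the 1x1 -> 3x3 -> 1x1 structure (ResNet-style 256/64 channel counts as an example, batch norm omitted for brevity):
import torch
import torch.nn as nn
class Bottleneck(nn.Module):
    # 1x1 compress -> 3x3 transform -> 1x1 expand, plus the skip connection
    def __init__(self, ch, reduced):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, reduced, kernel_size=1), nn.ReLU(),            # compress channels
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(reduced, ch, kernel_size=1),                       # expand back
        )
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.relu(self.block(x) + x)
x = torch.randn(1, 256, 28, 28)
print(Bottleneck(256, 64)(x).shape)          # torch.Size([1, 256, 28, 28])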
What is grouped convolution about?
Grouped convolution is a technique used in convolutional neural networks where the input and output channels are divided into groups. Each group of input channels is convolved with its corresponding group of output channels separately.
Division into Groups: The input and output channels are split into smaller subsets (groups). For example, if you have 64 input channels and use 4 groups, each group will have 16 channels.
Independent Convolution: Each group performs its own convolution operation, reducing the number of parameters and computations.
Reduced Complexity: By decreasing the number of channels processed at once, grouped convolutions lower computational costs and memory usage.
Increased Efficiency: They allow for the design of deeper networks while keeping the computations manageable.
Grouped convolution is used in architectures like AlexNet and ResNeXt, and it's particularly beneficial for mobile and resource-constrained applications.
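A minimal sketch using PyTorch's groups argument (same 64-channel, 4-group example as above):
import torch
import torch.nn as nn
normal  = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4)  # 4 groups of 16 channels each
print(sum(p.numel() for p in normal.parameters()))    # 36,928 parameters (weights + biases)
print(sum(p.numel() for p in grouped.parameters()))   # 9,280 -> roughly 4x fewer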
What is neural architecture search about?
Automation of architecture design —> very expensive: 800 GPUs for 28 days in the original paper