What approaches exist to find the weights of a NN?
random search -> poor
evolutionary algorithms -> work
backprop -> performant
but requires a differentiable f!!!
When is a function differentiable?
if it is continuous
and, for all x,
the limit of the difference quotient from the right = the limit from the left
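Spelled out as a worked equation (standard definition, not on the card itself): f is differentiable at x iff both one-sided limits of the difference quotient exist and agree:

```latex
\lim_{h \to 0^{+}} \frac{f(x+h) - f(x)}{h}
\;=\;
\lim_{h \to 0^{-}} \frac{f(x+h) - f(x)}{h}
```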
What is the mathematical problem / goal of backprop?
find theta (the parameters of the NN)
so that
average loss (1/N) over all training dataset instances
is minimized
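In symbols (a reconstruction of the card's wording; L is the loss, f the network, (x_i, y_i) the N training instances):

```latex
\theta^{*} \;=\; \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i; \theta),\, y_i\big)
```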
How to apply backprop?
compute the gradient of the average loss over the N training samples
yields a dim(theta)-dimensional gradient vector [dL/dtheta_1, dL/dtheta_2, …, dL/dtheta_n]
it points in the direction of the largest increase of the (1/N) SUM loss
-> find the parameters via an iterative update
What is the formula for the iterative update in backprop?
new weights = old weights - learning rate * gradient
How are the weights initialized?
randomly from normal distribution
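A minimal numpy sketch of the last three cards (normal-distribution init plus the iterative update); the "network" is just a linear model, and the data, learning rate, and iteration count are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up toy data: N = 100 training instances, 3 features, scalar target
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# weights initialized randomly from a normal distribution
theta = rng.normal(size=3)

lr = 0.1  # learning rate
for _ in range(200):
    residual = X @ theta - y  # errors on all N training instances
    # dim(theta)-dimensional gradient of the average squared loss:
    # [dL/dtheta_1, dL/dtheta_2, dL/dtheta_3]
    grad = (2.0 / len(X)) * (X.T @ residual)
    # new weights = old weights - learning rate * gradient
    theta = theta - lr * grad
```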
What is the general goal of adversarial learning?
find some delta such that
f(x + delta) != y
-> find a deviation of the input so that it is classified incorrectly
What are the constraints on the deviation in adversarial learning?
delta is small:
||delta||_2 < epsilon
x + delta does not exceed the feature domain
e.g. x + delta element of [0,1] for images…
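A small sketch of enforcing both constraints (the helper name `project` and the concrete bounds are illustrative, not from the card):

```python
import numpy as np

def project(delta: np.ndarray, x: np.ndarray, epsilon: float) -> np.ndarray:
    # ||delta||_2 < epsilon: rescale delta if its L2 norm is too large
    norm = np.linalg.norm(delta)
    if norm > epsilon:
        delta = delta * (epsilon / norm)
    # x + delta must not exceed the feature domain, here [0, 1] as for images
    return np.clip(x + delta, 0.0, 1.0) - x
```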
What types of adversarial learning attacks are there?
targeted attack
have f classify x + delta not as y,
but as a specific other class y'
untargeted attack
have f classify x + delta as an arbitrary other class
What is the mathematical formula to find an optimal delta in targeted attacks?
find the delta that minimizes
the loss of the diverted input (x + delta) w.r.t. the target class y'
+
the size of the diversion ||delta||_2 times a trade-off factor lambda
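As a formula (reconstructed from the card's wording; L is the loss, y' the target class):

```latex
\delta^{*} \;=\; \arg\min_{\delta} \; L\big(f(x+\delta),\, y'\big) \;+\; \lambda \,\lVert \delta \rVert_2
```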
What is the mathematical formula to find an optimal delta in untargeted attacks?
maximize the loss of the diverted input (x + delta) w.r.t. the correct class y
-> here, we minimize the negative loss (negative log-likelihood)!!!
while also penalizing the size of delta (with trade-off factor lambda)
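As a formula (again reconstructed; maximizing the loss for the correct class y = minimizing its negative):

```latex
\delta^{*} \;=\; \arg\min_{\delta} \; -L\big(f(x+\delta),\, y\big) \;+\; \lambda \,\lVert \delta \rVert_2
```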
How can the formulas be rewritten for targeted and untargeted attacks?
targeted: argmin over delta of J(x + delta, y') + lambda * ||delta||_2
untargeted: argmin over delta of -J(x + delta, y) + lambda * ||delta||_2
(J replaces the loss term)
Must J incorporate the loss?
no, not necessarily -> some other suitable objective function can be used as well…
What else can be rewritten?
simply merge x + delta into x
use the same symbol for targeted and untargeted
when the context is clear…
y', y -> y
How can one find the optimal delta for attacks?
-> gradient descent, with delta initialized as the 0-vector
How to actually perform backprop in attacks?
use the gradient w.r.t. x
-> how must x (the input) change so that the attacker loss is minimized… meaning:
targeted -> minimize the loss w.r.t. the target class
untargeted -> minimize the negative loss w.r.t. the actual class -> i.e. maximize the error of classifying it as the correct class…
What does the gradient look like for backprop in adversarial ML?
[dJ/dx_1, dJ/dx_2, …, dJ/dx_n]
-> analogous to the theta-gradient, but taken w.r.t. the input x
How to interpret the sign of a gradient component?
if negative -> increase that input dimension to minimize the loss
if positive -> decrease it to minimize the loss
How is delta updated in gradient descent?
delta_new = delta_old - learning rate * gradient of the attacker loss w.r.t. the input
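A minimal numpy sketch of the whole attack loop (untargeted variant); the logistic-regression "network", its weights, the data, and all hyperparameters are invented, and lambda * ||delta||_2^2 is used instead of ||delta||_2 so the penalty has a simple gradient:

```python
import numpy as np

# stand-in for a trained network f: logistic regression with fixed weights
w = np.array([1.5, -2.0, 0.7])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_x_loss(x, y):
    # gradient of the cross-entropy loss of f(x) = sigmoid(w.x) w.r.t. the input x
    return (sigmoid(w @ x) - y) * w

x = np.array([0.2, 0.5, 0.8])  # clean input inside the feature domain [0, 1]
y = 1                          # its correct class
lam = 0.1                      # trade-off factor lambda
lr = 0.05                      # learning rate

delta = np.zeros_like(x)       # delta initialized as the 0-vector
for _ in range(100):
    # untargeted attacker objective: -loss(x + delta, y) + lam * ||delta||_2^2
    g = -grad_x_loss(x + delta, y) + 2.0 * lam * delta
    # delta_new = delta_old - learning rate * gradient
    delta = delta - lr * g
    # keep x + delta inside the feature domain
    delta = np.clip(x + delta, 0.0, 1.0) - x
```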
What has been shown w.r.t. the susceptibility of networks to targeted and untargeted attacks?
both deep and shallow networks are susceptible
the larger the weight decay lambda -> the larger the average required distortion
-> a possible defense
but: trade-off between accuracy and robustness…
adversarial examples transfer between models
-> a white-box attack on one model can be reused against another…