What approaches are there to find the weights of a NN?
random search -> poor
evolutionary algorithms -> work
backprop -> performant
needs a differentiable f!!!
When is a function differentiable?
if it is continuous
and for all x
the limit of the difference quotient from the right = the limit from the left…
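in symbols (standard textbook definition, added here for completeness, not verbatim from the card):

    \lim_{h \to 0^-} \frac{f(x+h) - f(x)}{h} = \lim_{h \to 0^+} \frac{f(x+h) - f(x)}{h} = f'(x)

i.e. both one-sided limits of the difference quotient exist and coincide; the common value is the derivative f'(x)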
What is the mathematical problem / goal of backprop?
find theta (the parameters of the NN)
so that
the average loss (1/N) over all N training instances
is minimized
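written out (a standard formalization of this goal, not verbatim from the card):

    \theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i; \theta),\, y_i\big)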
How to apply backprop?
compute the gradient of the average loss over the N training samples
yields a dim(theta)-dimensional gradient vector [∂L/∂θ_1, ∂L/∂θ_2, …, ∂L/∂θ_n]
it points in the direction of the largest increase of (1/N) SUM loss
-> find parameters via iterative updates
What is the formula for the iterative update in backprop?
new weights = old weights - learning rate * gradient
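a minimal NumPy sketch of this update rule (the toy model, data, and learning rate are illustrative assumptions, not from the card):

    import numpy as np

    rng = np.random.default_rng(0)

    # toy data: N = 100 samples, 3 features, scalar regression target
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

    theta = rng.normal(size=3)    # weights initialized randomly from a normal distribution
    lr = 0.1                      # learning rate (assumed value)

    for step in range(200):
        pred = X @ theta
        # gradient of the average squared loss (1/N) * SUM (pred - y)^2 w.r.t. theta
        grad = (2.0 / len(X)) * X.T @ (pred - y)
        theta = theta - lr * grad # new weights = old weights - learning rate * gradient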
How are the weights initialized?
randomly, from a normal distribution
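for example, in NumPy (the layer sizes and the 0.01 scale are hypothetical choices):

    import numpy as np
    n_in, n_out = 784, 128                      # hypothetical layer sizes
    W = 0.01 * np.random.randn(n_in, n_out)     # weights drawn from a scaled normal distribution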
What is the general goal of adversarial learning?
find some delta such that
f(x + delta) != y
-> find a deviation of the input so that it is classified incorrectly
What are the constraints on the deviation in adversarial learning?
delta is small
||delta||_2 < epsilon
x + delta does not exceed the feature domain
e.g. x + delta element of [0,1] for images…
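a sketch of how both constraints can be enforced in NumPy (epsilon, the image size, and the [0,1] clipping range are illustrative assumptions):

    import numpy as np

    def project(delta, x, epsilon):
        # keep ||delta||_2 < epsilon by rescaling if necessary
        norm = np.linalg.norm(delta)
        if norm > epsilon:
            delta = delta * (epsilon / norm)
        # keep x + delta inside the feature domain [0, 1]
        return np.clip(x + delta, 0.0, 1.0) - x

    x = np.random.rand(28 * 28)                  # hypothetical flattened image
    delta = project(np.random.normal(size=x.shape), x, epsilon=0.1)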
What types of adversarial learning attacks are there?
targeted attack
have f
classify x + delta not as y
but as some specific other class y'
untargeted attack
have f classify x + delta as any other class != y
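in symbols (a compact restatement, not verbatim from the cards): targeted: f(x + delta) = y' for a chosen y' != y; untargeted: f(x + delta) != y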
What is the mathematical formula to find an optimal delta in targeted attacks?
find the delta that minimizes
the loss of the diverted input (x + delta) w.r.t. the target class y'
+
the size of the diversion ||delta||_2 times a trade-off factor lambda
delta* = argmin_delta L(f(x + delta), y') + lambda * ||delta||_2
What is the mathematical formula to find an optimal delta in untargeted attacks?
maximize the loss of the diverted input (x + delta) w.r.t. the correct class y
-> here, we minimize the negative of the loss (the negative log likelihood)!!!
while also considering the size of delta (with trade-off factor lambda)
delta* = argmin_delta -L(f(x + delta), y) + lambda * ||delta||_2
How can the formulas be rewritten for targeted and untargeted attacks?
targeted: delta* = argmin_delta J(x + delta, y')
untargeted: delta* = argmin_delta J(x + delta, y)
where J incorporates the respective loss and the ||delta||_2 penalty
Must J incorporate the loss?
no, not necessarily -> some other (differentiable) function can be used as well…
What else can be rewritten?
merge x + delta simply into x
use the same formula for targeted and untargeted attacks
when the context is clear…
y', y -> y
How can one find optimal delta for attacks?
-> gradient descent, with delta initialized as the 0-vector (see the sketch after the next card)
How to actually perform backprop in attacks?
use the gradient w.r.t. x (not w.r.t. theta)
-> how must x (the input) change so that the attacker's loss J is minimized… meaning:
targeted -> minimize the loss w.r.t. the target class
untargeted -> minimize the negative loss w.r.t. the actual class -> maximize the error of classifying x as the correct class…
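a minimal PyTorch sketch of this attack loop (the toy model, step size, lambda, and number of steps are illustrative assumptions; targeted=False gives the untargeted variant):

    import torch

    def attack(model, x, y, targeted=False, lam=0.1, lr=0.01, steps=100):
        # gradient descent on delta, initialized as the 0-vector
        delta = torch.zeros_like(x, requires_grad=True)
        opt = torch.optim.SGD([delta], lr=lr)
        for _ in range(steps):
            # cross-entropy = negative log likelihood of the softmax output
            loss = torch.nn.functional.cross_entropy(model(x + delta), y)
            if not targeted:
                loss = -loss                      # untargeted: maximize loss of the true class
            J = loss + lam * delta.norm(p=2)      # attacker objective incl. size of delta
            opt.zero_grad()
            J.backward()                          # gradient w.r.t. the input perturbation
            opt.step()
        return delta.detach()

    model = torch.nn.Linear(4, 3)                 # hypothetical classifier
    x = torch.rand(1, 4)                          # hypothetical input in [0, 1]
    # targeted: pass the target class y' as y and set targeted=True
    delta = attack(model, x, torch.tensor([2]), targeted=True)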
What does the gradient look like for backprop in adversarial ML?
∇_x J = [∂J/∂x_1, ∂J/∂x_2, …, ∂J/∂x_n]
-> one partial derivative per input dimension, not per weight
How to interpret the slope (sign) of the gradient?
if negative -> increase x to minimize the loss
if positive -> decrease x to minimize the loss
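toy numerical illustration (made-up numbers): if ∂J/∂x_i = -2, a small step x_i -> x_i + 0.1 changes J by roughly -2 * 0.1 = -0.2, so increasing x_i reduces the attacker loss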
How is the delta updated in gradient descent?
delta_new = delta_old - learning rate * ∇_delta J(x + delta, y)
(the same update rule as for the weights, applied to delta)
What has been shown w.r.t. the susceptibility of networks to targeted and untargeted attacks?
both deep and shallow networks are susceptible
the larger the weight decay lambda -> the larger the average required distortion
-> a possible defense
but: trade-off between accuracy and robustness…
adversarial examples transfer between models
-> examples crafted via a white-box attack on one model often fool other models too…