By "machine learning solution" I mean deep-learning, neural-network-based solutions.
Yes, we can. It is very true that people have started working on hacking machine learning solutions, and the same techniques are also being used to make deep learning models more robust.
The papers I went through are:
“Explaining and Harnessing Adversarial Examples” by Goodfellow et al., and
“DeepFool: a simple and accurate method to fool deep neural networks” by Moosavi-Dezfooli et al.
How and why is this important?
There can be many use cases, but let me share one. Many solutions worldwide use machine learning as an identity-recognition system to grant access to a platform. Now think: what if anyone could make the system identify someone else as you, or your identity as someone else's? The entire system becomes very vulnerable.
A 10,000-ft view of how this is doable:
Take an image the system currently identifies correctly and modify it so that the system starts identifying it as something else. Say a neural network (NN) was trained to identify animals, and we have an image our NN identifies as “Elephant”. We modify that image in such a way that the NN now identifies it as “Lion”, while you still see an elephant. By “modify” I mean changing some pixel values in the image.
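To make "changing some pixel values" concrete, here is a minimal NumPy sketch. The image and perturbation here are random stand-ins: a random perturbation like this will not actually fool a network, and the rest of this post is about how to compute one that does.

import numpy as np

img = np.random.rand(224, 224, 3)                   # stand-in image, pixels in [0, 1]
perturbation = 0.01 * np.random.randn(*img.shape)   # a tiny change to every pixel

# adv_img looks identical to img to the human eye, but a carefully
# chosen (not random) perturbation can change what the network predicts.
adv_img = np.clip(img + perturbation, 0.0, 1.0)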
Taking things forward, let's discuss adversarial perturbations.
Adversarial means “involving or characterized by conflict or opposition”, and in this context a perturbation is a small change or deviation applied to a system. Put together, an adversarial perturbation is a small, deliberately crafted change designed to work against the system.
Let me show you an adversarial example for the human brain: an optical illusion.
The circles are not intertwined, but they look as if they are. Our brain is mistaking the image for something it is not.
The same thing can be done to deep learning solutions. We can confuse them and get them to behave as we like, i.e., detect what we want them to detect, not what they were meant to detect.
How do we do it? (a little theory)
To simplify things, let's take the example of a binary classifier. Think of it as a line through the data set: if the input lies on one side of the line it is classified as one class, and if it lies on the other side it is classified as the other.
Let f be an affine classifier: f(x) = wᵀx + b.
Now consider a perturbed input x + h, where h is very small. If we can choose h such that f(x + h) lands on the other side of the line, i.e., the sign of f flips, our goal is achieved.
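As a toy illustration (with numbers made up purely for this example):

import numpy as np

w = np.array([2.0, -1.0])
b = 0.5
f = lambda v: np.dot(w, v) + b      # the affine classifier f(x) = w.x + b

x = np.array([1.0, 3.0])
print(f(x))                         # -0.5 -> classified as the "negative" side

h = np.array([0.3, 0.0])            # a small, hand-picked perturbation
print(f(x + h))                     # 0.1 -> sign flipped: the other class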
The DeepFool formulation for a binary classifier is:

r*(x₀) = arg minᵣ ‖r‖₂  subject to  sign(f(x₀ + r)) ≠ sign(f(x₀))

For the affine classifier above this has a closed-form solution, r*(x₀) = −(f(x₀) / ‖w‖₂²) · w, which is the projection of x₀ onto the decision boundary. (For detailed notation, please refer to the papers above.)
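Since the closed form comes straight from the geometry, it is easy to verify in code. Here is a small sketch reusing the toy numbers from before:

import numpy as np

def deepfool_affine(x0, w, b):
    """Minimal L2 perturbation moving x0 onto the hyperplane w.x + b = 0."""
    f_x0 = np.dot(w, x0) + b
    return -(f_x0 / np.dot(w, w)) * w

w = np.array([2.0, -1.0])
b = 0.5
x0 = np.array([1.0, 3.0])

r = deepfool_affine(x0, w, b)
print(r)                            # [ 0.2 -0.1]
print(np.dot(w, x0 + r) + b)        # 0.0: x0 + r lies exactly on the boundary
# In practice one overshoots slightly (e.g. 1.02 * r, as in the DeepFool
# paper) so the perturbed point actually crosses the boundary.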
Similarly, formulas can be derived for other types of classifiers.
The two common attack methods are DeepFool and the Fast Gradient Sign Method (FGSM).
As per the Fast Gradient Sign Method:

h = ε · sign(∇ₓ J(θ, x, y))

where θ are the model parameters, x is the input, y is the target output, J is the loss function, and ∇ₓ denotes the gradient with respect to x.
Following is a TensorFlow code snippet with ε = 0.23 (a smaller value, such as 0.007, can also be chosen):
import tensorflow as tf  # TF 1.x style API

x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 MNIST images
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot target labels
W = tf.Variable(tf.zeros([784, 10]))          # a simple softmax-regression model;
b = tf.Variable(tf.zeros([10]))               # y below holds its logits
y = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
grads, = tf.gradients(cross_entropy, x)       # gradient of the loss w.r.t. the input
sign_grads = tf.sign(grads)
scaled_sign_grads = -0.23 * sign_grads        # minus: step *toward* the target class
adv_x = tf.stop_gradient(x + scaled_sign_grads)
Now we can simply run adv_x to get the adversarial input (x + h) directly. In this case, I used an image of a 7 and the target was 1. Following the above flow, the system started detecting this image as a 1 instead of the actual 7.
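For completeness, here is a minimal sketch of how that run might look, assuming a trained session sess and a flattened image of the digit 7 called img7 (both hypothetical names, not defined above):

import numpy as np

target = np.zeros((1, 10), dtype=np.float32)
target[0, 1] = 1.0                 # one-hot target: class "1"

# img7 has shape (1, 784); sess is an initialized, trained tf.Session
adv_img = sess.run(adv_x, feed_dict={x: img7, y_: target})
pred = sess.run(tf.argmax(y, 1), feed_dict={x: adv_img})
print(pred)                        # if the attack succeeds: [1] instead of [7]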
Thanks for reading.
Please let me know if you have any queries or suggestions.
