

      How To Trick a Neural Network in Python 3

      The author selected Dev Color to receive a donation as part of the Write for DOnations program.

      Could a neural network for animal classification be fooled? Fooling an animal classifier may have few consequences, but what if a face authenticator could be fooled? Or the software in a self-driving car prototype? Fortunately, legions of engineers and researchers stand between a prototype computer-vision model and production-quality models on our mobile devices or cars. Still, these risks have significant implications and are important to consider as a machine-learning practitioner.

      In this tutorial, you will try “fooling” or tricking an animal classifier. As you work through the tutorial, you’ll use OpenCV, a computer-vision library, and PyTorch, a deep learning library. You will cover the following topics in the associated field of adversarial machine learning:

      • Create a targeted adversarial example. Pick an image, say, of a dog. Pick a target class, say, a cat. Your goal is to trick the neural network into believing the pictured dog is a cat.
      • Create an adversarial defense. In short, protect your neural network against these tricky images, without knowing what the trick is.

      By the end of the tutorial, you will have a tool for tricking neural networks and an understanding of how to defend against tricks.


      To complete this tutorial, you will need the following:

      Step 1 — Creating Your Project and Installing Dependencies

      Let’s create a workspace for this project and install the dependencies you’ll need. You’ll call your workspace AdversarialML:

      • mkdir ~/AdversarialML

      Navigate to the AdversarialML directory:

      • cd ~/AdversarialML

      Make a directory to hold all your assets:

      • mkdir ~/AdversarialML/assets

      Then create a new virtual environment for the project:

      • python3 -m venv adversarialml

      Activate your environment:

      • source adversarialml/bin/activate

      Then install PyTorch, a deep-learning framework for Python that you’ll use in this tutorial.

      On macOS, install PyTorch with the following command:

      • python -m pip install torch==1.2.0 torchvision==0.4.0

      On Linux and Windows, use the following commands for a CPU-only build:

      • pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f
      • pip install torchvision

      Now install prepackaged binaries for OpenCV and numpy, which are libraries for computer vision and linear algebra, respectively. OpenCV offers utilities such as image rotations, and numpy offers linear algebra utilities such as a matrix inversion:

      • python -m pip install opencv-python== numpy==1.14.5

      On Linux distributions, you will also need to install the following system libraries:

      • sudo apt-get install libsm6 libxext6 libxrender-dev

      With the dependencies installed, let’s run an animal classifier called ResNet18, which we describe next.

      Step 2 — Running a Pretrained Animal Classifier

      The torchvision library, the official computer vision library for PyTorch, contains pretrained versions of commonly used computer vision neural networks. These neural networks are all trained on ImageNet 2012, a dataset of 1.2 million training images with 1000 classes. These classes include vehicles, places, and most importantly, animals. In this step, you will run one of these pretrained neural networks, called ResNet18. We will refer to ResNet18 trained on ImageNet as an “animal classifier”.

      What is ResNet18? ResNet18 is the smallest neural network in a family of neural networks called residual neural networks, developed by Microsoft Research (He et al.). In short, He et al. found that a neural network (denoted as a function f, with input x and output f(x)) performs better with a “residual connection” x + f(x). Residual connections are still widely used in state-of-the-art neural networks today, for example in FBNetV2 and FBNetV3.
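      A toy sketch of the residual idea, in plain Python: lists of floats stand in for tensors, and the hypothetical helpers make_residual, zero_layer, and double_layer stand in for learned layers. It only illustrates the x + f(x) wiring, not ResNet18's actual architecture.

```python
# Toy illustration of a residual connection (hypothetical helpers, not ResNet18 itself).
# Lists of floats stand in for tensors; plain functions stand in for layers.

def make_residual(f):
    """Wrap a function f so that the wrapped version computes x + f(x)."""
    def residual(x):
        return [xi + fi for xi, fi in zip(x, f(x))]
    return residual

def zero_layer(x):
    # A "layer" whose output is all zeros, e.g. one whose weights are zero.
    return [0.0 for _ in x]

def double_layer(x):
    # A "layer" that doubles its input.
    return [2.0 * xi for xi in x]

x = [1.0, -2.0, 3.0]
print(make_residual(zero_layer)(x))    # [1.0, -2.0, 3.0]: input passes through unchanged
print(make_residual(double_layer)(x))  # [3.0, -6.0, 9.0]: x + 2x
```

      One commonly cited motivation shows up in the first call: if f outputs zeros, the block reduces to the identity, so stacking more residual blocks does not by itself prevent the network from passing its input through.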

      Download this image of a dog with the following command:

      • wget -O assets/dog.jpg

      Image of corgi running near pond

      Then, download a JSON file to convert neural network output to a human-readable class name:

      • wget -O assets/imagenet_idx_to_label.json

      Next, create a script to run your pretrained model on the dog image. Create a new file called step_2_pretrained.py:

      • nano step_2_pretrained.py

      First, add the Python boilerplate by importing the necessary packages and declaring a main function:

      from PIL import Image
      import json
      import torchvision.models as models
      import torchvision.transforms as transforms
      import torch
      import sys
      def main():
          pass  # filled in later in this step
      if __name__ == '__main__':
          main()

      Next, load the mapping from neural network output to human-readable class names. Add this directly after your import statements and before your main function:

      . . .
      def get_idx_to_label():
          with open("assets/imagenet_idx_to_label.json") as f:
              return json.load(f)
      . . .

      Create an image transformation function that ensures your input image first has the correct dimensions and second is normalized correctly. Add the following function directly after the last one:

      . . .
      def get_image_transform():
          transform = transforms.Compose([
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
          ])
          return transform
      . . .

      In get_image_transform, you define a number of different transformations to apply to the images that are passed to your neural network:

      • transforms.Resize(224): Resizes the smaller side of the image to 224. For example, if your image is 448 x 672, this operation would downsample the image to 224 x 336.
      • transforms.CenterCrop(224): Takes a crop from the center of the image, of size 224 x 224.
      • transforms.ToTensor(): Converts the image into a PyTorch tensor. All PyTorch models require PyTorch tensors as input.
      • transforms.Normalize(mean=..., std=...): Standardizes your input by subtracting the mean, then dividing by the standard deviation. This is described more precisely in the torchvision documentation.
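      The size arithmetic behind these transforms can be checked by hand. The following plain-Python sketch mirrors what Resize(224), CenterCrop(224), and Normalize compute on image sizes and on a single pixel; the helper names are hypothetical, and it neither loads real images nor calls torchvision. ToTensor is left out: it additionally reorders pixels into (channels, height, width) and scales 0 to 255 values into [0, 1].

```python
# Plain-Python sketch of the arithmetic behind Resize(224), CenterCrop(224), and
# Normalize. Helper names are hypothetical; torchvision does this on real images.

def resize_shorter_side(width, height, size=224):
    """Return (width, height) after scaling the shorter side to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width, height, crop=224):
    """Return the (left, top, right, bottom) box of a centered square crop."""
    left = int(round((width - crop) / 2.0))
    top = int(round((height - crop) / 2.0))
    return left, top, left + crop, top + crop

MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Standardize one RGB pixel (values in [0, 1]) channel by channel."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

w, h = resize_shorter_side(448, 672)
print(w, h)                   # 224 336
print(center_crop_box(w, h))  # (0, 56, 224, 280)
print([round(v, 3) for v in normalize_pixel([0.485, 0.456, 0.406])])  # [0.0, 0.0, 0.0]
```

      A pixel equal to the channel means maps to zero, which is the point of standardizing: inputs end up centered around 0, matching the statistics the pretrained network saw during training.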

      Add a utility to predict the animal class, given the image. This method uses both the previous utilities to perform animal classification:

      . . .
      def predict(image):
          model = models.resnet18(pretrained=True)
          model.eval()
          out = model(image)
          _, pred = torch.max(out, 1)
          idx_to_label = get_idx_to_label()
          cls = idx_to_label[str(int(pred))]
          return cls
      . . .

      Here the predict function classifies the provided image using a pretrained neural network:

      • models.resnet18(pretrained=True): Loads a pretrained neural network called ResNet18.
      • model.eval(): Switches the model to ‘evaluation’ mode in place, so that layers such as batch normalization use their stored statistics rather than per-batch statistics. The only other mode is ‘training’ mode, which isn’t needed because you aren’t training the model (that is, updating the model’s parameters) in this tutorial.
      • out = model(image): Runs the neural network on the provided, transformed image.
      • _, pred = torch.max(out, 1): The neural network outputs one score (logit) for each possible class, and a higher score means a higher predicted probability. This step computes the index of the class with the highest score. For example, if out = [0.4, 0.1, 0.2], then pred = 0.
      • idx_to_label = get_idx_to_label(): Obtains a mapping from class index to human-readable class names. For example, the mapping could be {0: cat, 1: dog, 2: fish}.
      • cls = idx_to_label[str(int(pred))]: Converts the predicted class index to a class name. The examples provided in the last two bullet points would yield cls = idx_to_label[0] = 'cat'.
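      The post-processing in predict can be sketched in plain Python, with a list standing in for the model's output tensor. The three-class label map below is hypothetical; the real ImageNet map has 1000 entries whose JSON keys are strings, which is why predict calls str(int(pred)).

```python
import math

# Plain-Python sketch of predict()'s post-processing. The label map is a
# hypothetical three-class example; JSON object keys are always strings.

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value, like torch.max(out, 1)[1] for a single image."""
    return max(range(len(values)), key=lambda i: values[i])

idx_to_label = {"0": "cat", "1": "dog", "2": "fish"}
out = [0.4, 0.1, 0.2]

pred = argmax(out)
print(pred)                          # 0
print(idx_to_label[str(pred)])       # cat
print(argmax(softmax(out)) == pred)  # True: softmax never changes the top class
```

      Because softmax is monotonic, taking the argmax of the raw scores gives the same class as taking it over probabilities, so predict can skip the softmax entirely.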

      Next, following the last function, add a utility to load images:

      . . .
      def load_image():
          assert len(sys.argv) > 1, 'Need to pass path to image'
          image = Image.open(sys.argv[1])
          transform = get_image_transform()
          image = transform(image)[None]
          return image
      . . .

      This will load an image from the path provided as the first argument to the script. transform(image) applies the sequence of image transformations defined in get_image_transform, and the [None] indexing adds a leading batch dimension, so the result has shape (1, 3, 224, 224).

      Finally, populate your main function with the following, to load your image and classify the animal in the image:

      def main():
          x = load_image()
          print(f'Prediction: {predict(x)}')

      Double check that your file matches our final step 2 script, step_2_pretrained.py, on GitHub. Save and exit your script, and run the animal classifier:

      • python step_2_pretrained.py assets/dog.jpg

      This will produce the following output, showing your animal classifier works as expected:


      Prediction: Pembroke, Pembroke Welsh corgi

      That concludes running inference with your pretrained model. Next, you will see an adversarial example in action by tricking a neural network with imperceptible differences in the image.

      Step 3 — Trying an Adversarial Example

      Now, you will synthesize an adversarial example, and test the neural network on that example. For this tutorial, you will build adversarial examples of the form x + r, where x is the original image and r is some “perturbation”. You will eventually create the perturbation r yourself, but in this step, you will download one we created for you beforehand. Start by downloading the perturbation r:

      • wget -O assets/adversarial_r.npy

      Now composite the picture with the perturbation. Create a new file called step_3_adversarial.py:

      • nano step_3_adversarial.py

      In this file, you will perform the following three-step process to produce an adversarial example:

      1. Transform an image
      2. Apply the perturbation r
      3. Inverse transform the perturbed image

      At the end of this process, you will have an adversarial image. First, import the necessary packages and declare a main function:

      from PIL import Image
      import torchvision.transforms as transforms
      import torch
      import numpy as np
      import os
      import sys
      from step_2_pretrained import get_idx_to_label, get_image_transform, predict, load_image
      def main():
          pass  # filled in later in this step
      if __name__ == '__main__':
          main()

      Next, create an “image transformation” that inverts the earlier image transformation. Place this after your imports, before the main function:

      . . .
      def get_inverse_transform():
          return transforms.Normalize(
              mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],  # inverse of the Normalize in get_image_transform
              std=[1/0.229, 1/0.224, 1/0.225])
      . . .

      As before, the transforms.Normalize operation subtracts the mean and divides by the standard deviation: for an original image x, transforms.Normalize(mean=μ, std=σ) computes y = (x - μ) / σ. Some algebra yields a new operation that reverses this normalization: transforms.Normalize(mean=-μ/σ, std=1/σ) computes (y - (-μ/σ)) / (1/σ) = (y + μ/σ)σ = yσ + μ = x.
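      To confirm the algebra numerically, this sketch applies the forward normalization and then the inverse one to a single RGB pixel in plain Python. The normalize helper is hypothetical and only mirrors the per-channel formula (value - mean) / std; the pixel values are arbitrary.

```python
# Numeric check of the inverse-normalization algebra for one RGB pixel.
# normalize() is a hypothetical helper mirroring Normalize's per-channel formula.

MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize(pixel, mean, std):
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]

# Parameters of the inverse transform: mean' = -mean / std, std' = 1 / std
INV_MEAN = [-m / s for m, s in zip(MEAN, STD)]
INV_STD = [1 / s for s in STD]

pixel = [0.9, 0.5, 0.1]
y = normalize(pixel, MEAN, STD)           # forward transform
x_back = normalize(y, INV_MEAN, INV_STD)  # inverse transform
print([round(v, 6) for v in x_back])      # [0.9, 0.5, 0.1]
```

      The round trip only works when every inverse parameter uses exactly the same mean and std as the forward transform; a mismatched constant in a single channel shifts that channel's colors in the reconstructed image.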

      As part of the inverse transformation, add a method that transforms a PyTorch tensor back to a PIL image. Add this following the last function:

      . . .
      def tensor_to_image(tensor):
          x = tensor.data.numpy().transpose(1, 2, 0) * 255.
          x = np.clip(x, 0, 255)
          return Image.fromarray(x.astype(np.uint8))
      . . .
      • tensor.data.numpy() converts the PyTorch tensor into a NumPy array, and .transpose(1, 2, 0) rearranges (channels, height, width) into (height, width, channels). This NumPy array is approximately in the range (0, 1). Finally, multiplying by 255 puts the image in the range (0, 255).
      • np.clip ensures that all values in the image are between (0, 255).
      • x.astype(np.uint8) ensures all image values are integers. Finally, Image.fromarray(...) creates a PIL image object from the NumPy array.
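      The three steps of tensor_to_image can be traced on a tiny 3 x 2 x 2 "image" stored as nested lists. This plain-Python sketch mirrors transpose(1, 2, 0), the scaling by 255, np.clip, and astype(np.uint8) (which truncates toward zero) without using NumPy or PIL; the helper names and pixel values are hypothetical.

```python
# Plain-Python trace of tensor_to_image on nested lists in (channels, height, width)
# order. Helper names and values are hypothetical; the real code uses NumPy.

def chw_to_hwc(chw):
    """Reorder [channel][row][col] into [row][col][channel], like transpose(1, 2, 0)."""
    channels, height, width = len(chw), len(chw[0]), len(chw[0][0])
    return [[[chw[c][i][j] for c in range(channels)] for j in range(width)]
            for i in range(height)]

def to_uint8(value):
    """Scale [0, 1] to [0, 255], clip, then truncate like astype(np.uint8)."""
    return int(min(max(value * 255.0, 0.0), 255.0))

chw = [
    [[0.0, 1.0], [0.5, 1.2]],   # red channel (1.2 is out of range, e.g. after adding r)
    [[0.0, 0.0], [0.5, -0.1]],  # green channel (-0.1 is out of range)
    [[1.0, 0.0], [0.5, 0.0]],   # blue channel
]
hwc = [[[to_uint8(v) for v in px] for px in row] for row in chw_to_hwc(chw)]
print(hwc)  # [[[0, 0, 255], [255, 0, 0]], [[127, 127, 127], [255, 0, 0]]]
```

      The clip step matters for adversarial images: adding r can push values slightly outside (0, 1), and without clipping those values would wrap around when cast to 8-bit integers.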

      Then, use these utilities to create the adversarial example with the following:

      . . .
      def get_adversarial_example(x, r):
          y = x + r
          y = get_inverse_transform()(y[0])
          image = tensor_to_image(y)
          return image
      . . .

      This function generates the adversarial example as described at the start of the section:

      1. y = x + r. Take your perturbation r and add it to the original image x.
      2. get_inverse_transform: Obtain and apply the reverse image transformation you defined several lines earlier.
      3. tensor_to_image: Finally, convert the PyTorch tensor back to an image object.

      Finally, modify your main function to load the image, load the adversarial perturbation r, apply the perturbation, save the adversarial example to disk, and run prediction on the adversarial example:

      def main():
          x = load_image()
          r = torch.Tensor(np.load('assets/adversarial_r.npy'))
          # save perturbed image
          os.makedirs('outputs', exist_ok=True)
          adversarial = get_adversarial_example(x, r)
          adversarial.save('outputs/adversarial.png')
          # check prediction is new class
          print(f'Old prediction: {predict(x)}')
          print(f'New prediction: {predict(x + r)}')

      Your completed file should match step_3_adversarial.py on GitHub. Save the file, exit the editor, and launch your script with:

      • python step_3_adversarial.py assets/dog.jpg

      You’ll see this output:


      Old prediction: Pembroke, Pembroke Welsh corgi
      New prediction: goldfish, Carassius auratus

      You’ve now created an adversarial example: tricking the neural network into thinking a corgi is a goldfish. In the next step, you will actually create the perturbation r that you used here.

      Step 4 — Understanding an Adversarial Example

      For a primer on classification, see “How to Build an Emotion-Based Dog Filter”.

      Taking a step back, recall that your classification model outputs a probability for each class. During inference, the model predicts the class with the highest probability. During training, you update the model parameters t to maximize the probability of the correct class y, given your data x.

      argmax_t P(y|x,t)

      However, to generate adversarial examples, you now modify your goal. Instead of finding a class, your goal is now to find a new image, x. Take any class other than the correct one. Let us call this new class w. Your new objective is to maximize the probability of the wrong class.

      argmax_x P(w|x)

      Note that the neural network weights t are missing from the above expression. This is because you now assume the role of the adversary: Someone else has trained and deployed a model. You are only allowed to create adversarial inputs and are not allowed to modify the deployed model. To generate the adversarial example x, you can run “training”, except instead of updating the neural network weights, you update the input image with the new objective.

      As a reminder, for this tutorial, you assume that the adversarial example is an affine transformation of x. In other words, your adversarial example takes the form x + r for some r. In the next step, you will write a script to generate this r.
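      Written out with the fixed weights t made explicit, the attack is a constrained optimization. In this sketch in standard notation, w is the target class, ε is the per-value perturbation budget (EPSILON in the next step), and η is a step size; the projected-update form is one common way to write it, and the script in Step 5 approximates it with SGD (with momentum) plus clamping.

```latex
r^{\star} = \arg\min_{\|r\|_{\infty} \le \epsilon} \; -\log P(w \mid x + r, t)
\quad \Longleftrightarrow \quad
r^{\star} = \arg\max_{\|r\|_{\infty} \le \epsilon} \; P(w \mid x + r, t)

r \leftarrow \operatorname{clip}_{[-\epsilon,\, \epsilon]}\Bigl( r - \eta \, \nabla_{r} \bigl[ -\log P(w \mid x + r, t) \bigr] \Bigr)
```

      Minimizing the cross-entropy of the target class is the same as maximizing its probability, which is why the script can reuse a standard classification loss with the wrong label.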

      Step 5 — Creating an Adversarial Example

      In this step, you will learn a perturbation r, so that your corgi is misclassified as a goldfish. Create a new file called

      Import the necessary packages and declare a main function:

      from torch.autograd import Variable
      import torchvision.models as models
      import torch.nn as nn
      import torch.optim as optim
      import numpy as np
      import torch
      import os
      from step_2_pretrained import get_idx_to_label, get_image_transform, predict, load_image
      from step_3_adversarial import get_adversarial_example
      def main():
          pass  # filled in later in this step
      if __name__ == '__main__':
          main()

      Directly following your imports and before the main function, define two constants:

      . . .
      TARGET_LABEL = 1
      EPSILON = 10 / 255.
      . . .

      The first constant TARGET_LABEL is the class to misclassify the corgi as. In this case, index 1 corresponds to “goldfish”. The second constant EPSILON is the maximum amount of perturbation allowed for each image value. This limit is introduced so that the image is imperceptibly altered.
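      Because r is added to the normalized image, EPSILON is not directly a pixel budget. This rough plain-Python conversion assumes that undoing Normalize multiplies a change in the normalized value by the channel's std, and then converts from [0, 1] to 8-bit levels.

```python
# Rough conversion of EPSILON (applied in normalized space) into 8-bit pixel levels.
# Assumption: undoing Normalize scales a change of r by the channel's std.

EPSILON = 10 / 255.
STD = [0.229, 0.224, 0.225]  # per-channel std from get_image_transform

for channel, std in zip("RGB", STD):
    max_change = EPSILON * std * 255  # simplifies to 10 * std
    print(f"{channel}: at most {max_change:.2f} levels out of 255")
# R: at most 2.29 levels out of 255
# G: at most 2.24 levels out of 255
# B: at most 2.25 levels out of 255
```

      Under that assumption, each channel value moves by at most about 2 to 3 intensity levels out of 255, which is why the change is hard to see.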

      Following your two constants, add a helper function to define a neural network and the perturbation parameter r:

      . . .
      def get_model():
          net = models.resnet18(pretrained=True).eval()
          r = nn.Parameter(data=torch.zeros(1, 3, 224, 224), requires_grad=True)
          return net, r
      . . .
      • models.resnet18(pretrained=True) loads a pretrained neural network called ResNet18, like before. Also like before, you set the model to evaluation mode using .eval().
      • nn.Parameter(...) defines a new perturbation r with the same size as the input image, (1, 3, 224, 224). The requires_grad=True keyword argument ensures that PyTorch computes gradients for r, so you can update it later in this file.

      Next, begin modifying your main function. Start by loading the model net, loading the input x, and defining the target label tensor labels:

      . . .
      def main():
          print(f'Target class: {get_idx_to_label()[str(TARGET_LABEL)]}')
          net, r = get_model()
          x = load_image()
          labels = Variable(torch.Tensor([TARGET_LABEL])).long()
        . . .

      Next, define both the criterion and the optimizer in your main function. The former tells PyTorch what the objective is—that is, what loss to minimize. The latter tells PyTorch how to train your parameter r:

      . . .
          criterion = nn.CrossEntropyLoss()
          optimizer = optim.SGD([r], lr=0.1, momentum=0.1)
      . . .
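      For intuition about what these two objects compute, here is a plain-Python sketch of the cross-entropy loss for a single example and of one SGD-with-momentum update. The formulas follow the PyTorch documentation for nn.CrossEntropyLoss and optim.SGD with default dampening, Nesterov, and weight-decay settings; the helper names are hypothetical and work on scalars instead of tensors.

```python
import math

# Plain-Python sketch of cross-entropy for one example and one SGD-with-momentum
# update (v = momentum * v + grad; p = p - lr * v). Helper names are hypothetical.

def cross_entropy(scores, target):
    """-log(softmax(scores)[target]), computed stably via log-sum-exp."""
    top = max(scores)
    log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum_exp - scores[target]

def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.1):
    """Return (new_param, new_velocity) for one scalar parameter."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

print(round(cross_entropy([0.0, 0.0], target=1), 4))  # 0.6931, i.e. log(2)
print(round(cross_entropy([0.0, 5.0], target=1), 4))  # 0.0067: confident and correct
p, v = sgd_momentum_step(param=0.0, grad=1.0, velocity=0.0)
p, v = sgd_momentum_step(param=p, grad=1.0, velocity=v)
print(round(p, 4), round(v, 4))  # -0.21 1.1
```

      With momentum=0.1, each step carries over only a tenth of the previous update direction, so the optimizer here behaves close to plain gradient descent.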

      Directly following, add the main training loop for your parameter r:

      . . .
          for i in range(30):
              r.data.clamp_(-EPSILON, EPSILON)
              optimizer.zero_grad()
              outputs = net(x + r)
              loss = criterion(outputs, labels)
              loss.backward()
              optimizer.step()
              _, pred = torch.max(outputs, 1)
              if i % 5 == 0:
                  print(f'Loss: {loss.item():.2f} / Class: {get_idx_to_label()[str(int(pred))]}')
      . . .

      On each iteration of this training loop, you:

      • r.data.clamp_(-EPSILON, EPSILON): Ensure the parameter r is small, within EPSILON of 0.
      • optimizer.zero_grad(): Clear any gradients you computed in the previous iteration.
      • net(x + r): Run inference on the modified image x + r.
      • criterion(outputs, labels): Compute the loss.
      • loss.backward(): Compute the gradients of the loss, including with respect to r.
      • optimizer.step(): Take a gradient descent step on r.
      • torch.max(outputs, 1): Compute the prediction pred.
      • print(...): Finally, report the loss and predicted class.
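      The same loop can be traced end to end on a toy problem. In this plain-Python sketch, a fixed linear classifier with 3 classes and 2 input features stands in for ResNet18, and the gradient of the cross-entropy with respect to the input is written out by hand instead of calling loss.backward(). The weights, input, EPS, and LR are hypothetical values chosen so the attack can succeed, and plain gradient descent replaces SGD with momentum.

```python
import math

# Toy projected-gradient version of the Step 5 loop (hypothetical model and values).
# A fixed linear classifier stands in for ResNet18; only the perturbation r is updated.

W = [[2.0, 0.0],    # class 0 weights
     [0.0, 2.0],    # class 1 weights (the target class here)
     [-1.0, -1.0]]  # class 2 weights
x = [1.0, 0.8]      # input the model assigns to class 0
TARGET, EPS, LR = 1, 0.3, 0.5

def logits(v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def softmax(z):
    top = max(z)
    exps = [math.exp(s - top) for s in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(v):
    z = logits(v)
    return z.index(max(z))

r = [0.0, 0.0]
for i in range(30):
    r = [min(max(ri, -EPS), EPS) for ri in r]  # keep r within EPS of 0
    p = softmax(logits([xi + ri for xi, ri in zip(x, r)]))
    # d(cross-entropy)/d(input) for a linear model is W^T (p - one_hot(TARGET))
    err = [pk - (1.0 if k == TARGET else 0.0) for k, pk in enumerate(p)]
    grad = [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]
    r = [ri - LR * gi for ri, gi in zip(r, grad)]  # gradient descent step on r only
r = [min(max(ri, -EPS), EPS) for ri in r]  # final clamp so the result respects EPS

print(predict(x))                                  # 0
print(predict([xi + ri for xi, ri in zip(x, r)]))  # 1
```

      The final clamp after the loop is a small deviation from the tutorial's script, which clamps only at the start of each iteration, so the r it saves can exceed EPSILON by one optimizer step.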

      Next, save the final perturbation r:

      def main():
          . . .
          for i in range(30):
              . . .
          np.save('outputs/adversarial_r.npy', r.data.numpy())

      Directly following, still in the main function, save the perturbed image:

      . . .
          os.makedirs('outputs', exist_ok=True)
          adversarial = get_adversarial_example(x, r)
          adversarial.save('outputs/adversarial.png')

      Finally, run prediction on both the original image and the adversarial example:

          print(f'Old prediction: {predict(x)}')
          print(f'New prediction: {predict(x + r)}')

      Double check that your script matches the final version on GitHub. Save, exit, and run the script:

      • python assets/dog.jpg

      Your script will output the following.


      Target class: goldfish, Carassius auratus
      Loss: 17.03 / Class: Pembroke, Pembroke Welsh corgi
      Loss: 8.19 / Class: Pembroke, Pembroke Welsh corgi
      Loss: 5.56 / Class: Pembroke, Pembroke Welsh corgi
      Loss: 3.53 / Class: Pembroke, Pembroke Welsh corgi
      Loss: 1.99 / Class: Pembroke, Pembroke Welsh corgi
      Loss: 1.00 / Class: goldfish, Carassius auratus
      Old prediction: Pembroke, Pembroke Welsh corgi
      New prediction: goldfish, Carassius auratus

      The last two lines show that you have constructed an adversarial example from scratch: your neural network now classifies a perfectly reasonable corgi image as a goldfish.

      You’ve now shown that neural networks can be fooled easily. What’s more, this lack of robustness to adversarial examples has significant consequences. A natural next question is this: How can you combat adversarial examples? A substantial body of research on this question has been conducted by various organizations, including OpenAI. In the next section, you’ll run a defense to thwart this adversarial example.

      Step 6 — Defending Against Adversarial Examples

      In this step, you will implement a defense against adversarial examples. The idea is the following: You are now the owner of the animal classifier being deployed to production. You don’t know what adversarial examples may be generated, but you can modify the image or the model to protect against attacks.

      Before you defend, you should see for yourself how imperceptible the image manipulation is. Open both of the following images:

      1. assets/dog.jpg
      2. outputs/adversarial.png

      Here are both images side by side. Your original image will have a different aspect ratio, because the adversarial image is a 224 x 224 center crop. Can you tell which is the adversarial example?

      (left) Corgi as goldfish, adversarial; (right) Corgi as itself, not adversarial

      Notice that the new image looks identical to the original. As it turns out, the left image is your adversarial image. To be certain, download the image and run your evaluation script:

      • wget -O assets/adversarial.png
      • python step_2_pretrained.py assets/adversarial.png

      This will output the goldfish class, confirming that the image is adversarial:


      Prediction: goldfish, Carassius auratus

      You will run a fairly naive but effective defense: compress the image by writing it to the lossy JPEG format. Open the Python interactive prompt:

      • python

      Then, load the adversarial image as PNG, and save it back as a JPEG:

      • from PIL import Image
      • image = Image.open('assets/adversarial.png')
      • image.save('outputs/adversarial.jpg')

      Type CTRL + D to leave the Python interactive prompt. Next, run inference with your model on the compressed adversarial example:

      • python step_2_pretrained.py outputs/adversarial.jpg

      This will now output the corgi class, showing that your naive defense works on this example:


      Prediction: Pembroke, Pembroke Welsh corgi

      You’ve now completed your very first adversarial defense. Note that this defense does not require knowing how the adversarial example was generated, which is a desirable property: a deployed model has to withstand attacks its owner has never seen. There are also many other forms of defense, many of which involve retraining the neural network. However, these retraining procedures are a topic of their own and beyond the scope of this tutorial. With that, this concludes your guide to adversarial machine learning.


      To understand the implications of your work in this tutorial, revisit the two images side-by-side—the original and the adversarial example.

      (left) Corgi as goldfish, adversarial; (right) Corgi as itself, not adversarial

      Despite the fact that both images look identical to the human eye, the first has been manipulated to fool your model. Both images clearly feature a corgi, and yet the model classifies the first image as a goldfish. This should concern you. As you wrap up this tutorial, keep in mind the fragility of your model: just by adding a small, carefully chosen perturbation, you can fool it. These are real, plausible dangers that evade even cutting-edge research. Research beyond machine-learning security is just as susceptible to these flaws, and, as a practitioner, it is up to you to apply machine learning safely. For more readings, check out the following links:

      For more machine learning content and tutorials, you can visit our Machine Learning Topic page.
