Fast Feature Fool: A data independent approach to universal adversarial perturbations (2017)

Original paper:

https://arxiv.org/pdf/1707.05572.pdf

GB/T 7714 Mopuri K R, Garg U, Babu R V. Fast feature fool: A data independent approach to universal adversarial perturbations[J]. arXiv preprint arXiv:1707.05572, 2017.

MLA Mopuri, Konda Reddy, Utsav Garg, and R. Venkatesh Babu. "Fast feature fool: A data independent approach to universal adversarial perturbations." arXiv preprint arXiv:1707.05572 (2017).

APA Mopuri, K. R., Garg, U., & Babu, R. V. (2017). Fast feature fool: A data independent approach to universal adversarial perturbations. arXiv preprint arXiv:1707.05572.

Abstract

State-of-the-art object recognition Convolutional Neural Networks (CNNs) are shown to be fooled by image agnostic perturbations, called universal adversarial perturbations. It is also observed that these perturbations generalize across multiple networks trained on the same target data. However, these algorithms require training data on which the CNNs were trained and compute adversarial perturbations via complex optimization. The fooling performance of these approaches is directly proportional to the amount of available training data. This makes them unsuitable for practical attacks since it's unreasonable for an attacker to have access to the training data. In this paper, for the first time, we propose a novel data independent approach to generate image agnostic perturbations for a range of CNNs trained for object recognition. We further show that these perturbations are transferable across multiple network architectures trained either on same or different data. In the absence of data, our method generates universal perturbations efficiently via fooling the features learned at multiple layers thereby causing CNNs to misclassify. Experiments demonstrate impressive fooling rates and surprising transferability for the proposed universal perturbations generated without any training data.

1. Introduction

Machine learning systems are vulnerable [2, 3, 9] to adversarial samples - malicious inputs with structured perturbations that can fool the systems into inferring wrong predictions. Recently, Deep Convolutional Neural Network (CNN) based object classifiers have also been shown [8, 12, 14, 17, 22] to be fooled by adversarial perturbations that are quasi-imperceptible to humans. Multiple approaches have been formulated to compute adversarial samples, exploiting the linearity of the models [8], finite training data [1], etc. More importantly, adversarial samples can be transferred (generalized) from one model to another, even if the second model has a different architecture and is trained on a different subset of the training data [8, 22]. This property allows an attacker to launch an attack without knowledge of the target model's internals, which makes adversarial samples a dangerous threat when deploying models in practice. In particular, for critical applications that involve safety, models robust to adversarial attacks need to be learned. Therefore, the effect of adversarial perturbations warrants in-depth analysis of this subject.

Recent work by Moosavi-Dezfooli et al. [13] has shown that there exists a single perturbation image, called a universal adversarial perturbation (UAP), that can fool a model with high probability when added to any data sample. These perturbations are image agnostic and show transferability (being able to fool) across multiple networks trained on the same data.

However, this method requires the training data on which the target model is trained. They solve a complex data dependent optimization (equation (2)) to design a single perturbation that, when added, can flip the classifier's prediction for most of the images. Using a subset of this training data, they iteratively update the universal perturbation with the objective of changing the predicted label. It is observed that their optimization procedure requires a certain minimum amount of training data in order to converge. Moreover, the fooling performance of these approaches is directly proportional to the amount of available training data (Figure 3). This data dependence makes the approach unsuitable for practical attacks, as the training data of the target system is generally unavailable.

In order to address these shortcomings, we propose a novel data independent method to compute universal adversarial perturbations. The objective of our approach is to generate a single perturbation that can fool a target CNN on most of the images without any knowledge of the target data, such as the type of data distribution (e.g., faces, objects, scenes), the number of classes, or sample images. As our method has no access to data to learn a perturbation that can flip the classifier's label, we aim to fool the features learned by the CNN. In other words, we formulate this as an optimization problem to compute the perturbation that can fool the features learned at individual layers in a CNN, eventually making it misclassify a perturbed sample. Our method is computationally efficient and can compute a universal perturbation for any target CNN quickly (Section 4.4 and Table 6), hence the name Fast Feature Fool. The main contributions of our work are listed below:

  • We introduce, for the first time, a novel data independent approach to compute universal adversarial perturbations. To the best of our knowledge, there exists no previous work that can generate adversarial perturbations, universal or otherwise, without access to the target training data. In fact, the proposed method doesn't require any knowledge about the target data distribution; all it requires is the target network.

  • We show that misfiring the features learned at individual layers of a CNN to produce impartial (undiscriminating) activations can lead to eventual misclassification of the sample. We present an efficient and generic objective to construct image agnostic perturbations that can misfire the features and fool the CNN with high probability.

  • We show that, similar to data dependent universal perturbations, our data independent perturbations also exhibit remarkable transferability across multiple networks trained on the same data. In addition, we show transferability across the same architectures trained on different data. As our method is explicitly made data independent, its transfer performance is far better than that of data dependent methods (Section 4.2 and Table 2). This property of the data independent approach emphasizes the necessity for more research focus in this direction.

The paper is organized as follows: section 2 discusses existing approaches to compute adversarial perturbations; section 3 presents the proposed data independent approach in detail; section 4 demonstrates, via a set of comprehensive experiments, the effectiveness of the proposed method in fooling CNNs and its transferability; and section 5 provides discussion, useful inferences, and directions for further study of adversarial perturbations and the design of robust CNN classifiers.

2. Related Work

Szegedy et al. [22] observed that, despite their excellent recognition performance, neural networks get fooled by structured perturbations that are quasi-imperceptible to humans. Later, multiple other investigations [6, 7, 8, 12, 14, 15, 18] studied this interesting property, called adversarial perturbations. These crafted malicious perturbations can be estimated per data point by simple gradient ascent [8] or complex optimizations [14, 22]. Note that the underlying property of all these methods is that they are intrinsically data dependent: the perturbations are computed for each data point specifically and independently of each other.

Moosavi-Dezfooli et al. [13] consider a more generic problem: crafting a single perturbation, called a universal perturbation, that fools a given CNN on most natural images. They collect samples from the data distribution and iteratively craft a single perturbation that can flip the labels over these samples. Fooling any new image then involves just the addition of the universal perturbation to it (no more optimization). In their work, they investigate the existence of a single adversarial direction in the space that is sufficient to fool most of the images from the data distribution. However, it is observed that the convergence of their optimization requires sampling enough data points from the distribution. Also, the fooling rate increases proportionately with the sample size (Figure 3), making it an inefficient attack.

Another line of research [16], called oracle-based black-box attacks, trains a local CNN with crafted inputs and the output labels provided by the target CNN (called the victim network). The local network is then used to craft adversarial samples, which are shown to be effective on the victim (original) network as well.

On the other hand, the proposed approach aims at solving a more generic and difficult problem: we seek data independent universal perturbations, without sampling any images from the data distribution or training a local replica of the target model. We exploit the fundamental hierarchical nature of the features learned by the target CNN to fool it on most natural images. Also, we explore the existence of universal perturbations that can be transferred across architectures trained on different data distributions.

3. Fast Feature Fool

In this section we present the proposed Fast Feature Fool algorithm to compute universal adversarial perturbations in a data independent fashion.

The objective of this paper is to find a perturbation $\delta \in \mathbb{R}^d$ that fools the classifier f on a large fraction of data points from X without utilizing any samples. In other words, we seek a data independent universal (image agnostic) perturbation that can misclassify the majority of the target data samples. That is, we construct a δ such that

$$f(x+\delta) \neq f(x), \quad \text{for majority of } x \in X \tag{1}$$

For the perturbation to be called adversarial, it has to be quasi-imperceptible to humans. That is, the pixel intensities of the perturbation δ should be restricted. Existing works (e.g., [13, 14, 20]) impose an $\ell_\infty$ constraint (ξ) on the perturbation δ to realize imperceptibility. Therefore, the goal here is to find a δ such that

$$f(x+\delta) \neq f(x), \ \text{for most } x \in X \quad \text{and} \quad \lVert \delta \rVert_{\infty} < \xi \tag{2}$$

As the primary focus of the proposed approach is to craft universal perturbations (δ) without any knowledge about the target data X, we attempt to exploit the dependencies across the layers in a given CNN. The data independence prohibits us from imposing the first part of equation (2) while learning δ. Therefore, we propose to fool the CNN by over-saturating the features learned at multiple layers (replacing the "flipping the label" objective). That is, by adding the perturbation to the input, we make the features at each layer misfire, thereby misleading the features (filters) at the following layer. This cumulative disturbance of features along the network hierarchy makes the network impartial (undiscriminating) to the original input, leading to an erroneous prediction at the final layer.

The perturbation should essentially cause filters at a particular layer to spuriously fire and abstract out uninformative activations. Note that in the presence of data (during the attack), to prevent the activations from retaining the discriminative information, the perturbation has to be highly effective, given the added imperceptibility constraint (second part of equation (2)). Therefore, the difficulty of the attempted problem lies in crafting a perturbation δ whose dynamic range is typically restricted [13, 14, 20] (with ξ = 10) to less than 8% of the data range.
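
As a quick check on that figure: assuming 8-bit pixel intensities in $[0, 255]$, a perturbation bounded by $\lVert \delta \rVert_{\infty} \le 10$ spans a dynamic range of $2\xi = 20$, i.e., $20/255 \approx 7.8\%$ of the data range.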

Hence, without showing any data (x) to the target CNN, we seek a perturbation (δ) that can produce maximal spurious activations at each layer. In order to obtain such a perturbation, we start with a random δ and optimize for the following loss:

$$Loss = -\log\left( \prod\limits_{i=1}^{K} \bar{l}_{i}(\delta) \right) \quad \text{such that} \quad \lVert \delta \rVert_{\infty} < \xi \tag{3}$$

where $\bar{l}_{i}(\delta)$ is the mean activation at layer $i$ when only the perturbation δ is fed to the network, and $K$ is the number of layers at which the activations are maximized.

The proposed objective computes the product of mean activations at multiple layers in order to simultaneously maximize the perturbation at all those layers. We observed that the product results in a stronger δ than other forms of combining the individual layer activations (e.g., sum). This is understandable, as the product is a stronger constraint that forces activations at all layers to increase for the loss to reduce. To avoid working with extreme values (≈ 0), we apply log on the product. Note that the objective is open-ended, as there is no optimum value to reach. We would ideally want δ to cause as strong a perturbation at all the layers as possible within the imperceptibility constraint.
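
To make the objective concrete, below is a minimal sketch of equation (3) in TensorFlow (the framework used for the implementation, see Section 4.5). A Keras VGG16 stands in for the target CNN here, and the `fff_loss` name and the small epsilon guard inside the log are illustrative choices, not part of the original method.

```python
import tensorflow as tf

# Build a feature extractor that exposes the post-ReLU outputs of every
# convolution layer (VGG16's Conv2D layers apply ReLU internally).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
feats = tf.keras.Model(
    base.input,
    [l.output for l in base.layers if isinstance(l, tf.keras.layers.Conv2D)])

def fff_loss(delta):
    # Eq. (3): -log(prod_i mean_act_i) == -sum_i log(mean_act_i).
    # The log form keeps the product of many small means numerically stable;
    # the epsilon guards against log(0) for a freshly initialized delta.
    acts = feats(delta)
    return -tf.add_n(
        [tf.math.log(tf.reduce_mean(a) + 1e-10) for a in acts])
```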

We begin with a trained network and a random perturbation image δ. We then perform the above optimization to update δ to achieve higher activations at all the convolution layers in the given network. Note that the optimization updates the perturbation image δ, not the network parameters, and no image data is involved in the optimization. We update δ with the gradients computed for the loss in equation (3). After every update step, we clip the perturbation δ to satisfy the imperceptibility constraint. We treat the algorithm as converged when either the loss saturates or the fooling performance over a small held-out set is maximized.
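
Continuing the sketch above, the update loop might look as follows; the step count here is a placeholder (in practice the loop runs until the loss saturates), and Section 4.5 gives the optimizer settings actually used.

```python
xi = 10.0                                            # imperceptibility bound
delta = tf.Variable(tf.random.uniform([1, 224, 224, 3], -xi, xi))
opt = tf.keras.optimizers.Adam()

for step in range(2000):                             # until loss saturates
    with tf.GradientTape() as tape:
        loss = fff_loss(delta)                       # no image data involved
    opt.apply_gradients([(tape.gradient(loss, delta), delta)])
    delta.assign(tf.clip_by_value(delta, -xi, xi))   # enforce ||delta||_inf < xi
```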

4. Experiments

In this section, we evaluate the proposed Fast Feature Fool method on fooling multiple CNN architectures trained on the ILSVRC [19] dataset. In particular, we consider the CaffeNet (similar to AlexNet [11]), VGG-F [4], VGG-16 [21], VGG-19 [21] and GoogLeNet [23] architectures. As the primary experiment, we compute the image agnostic universal perturbations for each of these CNNs by optimizing the loss given in equation (3). For simple networks (e.g., CaffeNet) we optimize the activations at all the convolution layers after the non-linearity (ReLU). However, for networks with complex architectures (inception blocks) such as GoogLeNet, we optimize activations at selected layers. Specifically, for GoogLeNet, we compute the perturbations by maximizing the activations at all the concat layers of inception blocks and the conv layers that are not part of any inception block. This is because maximizing at the concat layer of an inception block inherently takes care of the optimization for its convolution layers, since they are part of the concat layers.
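
A hedged sketch of that layer selection, assuming a Keras-style GoogLeNet graph in which each inception block ends in a `Concatenate` layer and inception-internal layers carry "inception" in their names (naming conventions vary across implementations):

```python
def layers_to_maximize(model):
    """Pick concat outputs of inception blocks plus standalone conv layers."""
    picked = []
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.Concatenate):
            picked.append(layer.output)    # inception block output
        elif (isinstance(layer, tf.keras.layers.Conv2D)
              and "inception" not in layer.name):
            picked.append(layer.output)    # conv not inside any inception block
    return picked
```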

Similar to existing approaches [13, 14, 20], we restricted the pixel intensities of the perturbation to lie within the [−10, +10] range by choosing ξ in equation (3) to be 10. Figure 1 shows the universal perturbations δ obtained for the networks using the proposed method. Note that the perturbations are visually different for each network architecture. Figure 2 shows sample perturbed images (x + δ) for GoogLeNet from the ILSVRC validation set along with their corresponding original images. Note that the adversarial images are perceptually indistinguishable from the original ones and yet get misclassified by the CNN.

Table 1: Fooling rates for the proposed perturbations crafted for multiple networks trained on the ILSVRC dataset, computed over 50000 validation images. Each row shows fooling rates for the perturbation crafted for a particular network. Each column shows the transfer fooling rates obtained for a given network. Diagonal values in bold are the fooling rates obtained via dedicated optimization for each architecture.

4.1 Transferability across network architectures

An interesting property of the proposed data independent universal perturbations is that they transfer not only to other images but also to different network architectures trained on the same dataset. That is, we compute a universal perturbation on one architecture (e.g., VGG-F) and observe its ability to fool other networks (e.g., GoogLeNet). This property is observed in the case of existing data dependent perturbations [8, 13, 22] as well. However, transferability of the proposed perturbations is a serious issue and needs more attention, as the perturbation is crafted without any data. Table 1 presents the fooling rates (% of images for which the predicted label is flipped by adding δ) of the proposed approach, including the transfer rates across multiple networks. Note that all these architectures are trained on the ILSVRC dataset and the fooling rates are computed on the 50000 validation images from the dataset. Each row in Table 1 shows fooling rates for the perturbation crafted for a particular network. Each column shows the transfer fooling rates obtained for a given network. Diagonal values in bold are the fooling rates obtained via dedicated optimization for each of the architectures. Observe that the perturbations crafted for some of the architectures generalize very well across multiple networks. For example, the perturbation obtained for GoogLeNet (3rd row in Table 1) has a minimum fooling rate of 40.17% across all the tested networks. Note that the average transfer rate of the crafted perturbations for all five networks is 41.31%, which is very significant given the method has no knowledge about the target data distribution. These results show that our data independent universal perturbations generalize well across different architectures and are of practical importance.
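
For reference, the fooling rate used throughout can be computed as sketched below; `predict_labels` is a hypothetical helper that returns top-1 class labels for a batch, and the clip to [0, 255] assumes 8-bit images.

```python
import numpy as np

def fooling_rate(predict_labels, images, delta):
    """Fraction of images whose top-1 prediction flips when delta is added."""
    clean = predict_labels(images)
    adv = predict_labels(np.clip(images + delta, 0, 255))
    return float(np.mean(clean != adv))
```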

4.2 Transferability across data

Until now, existing methods have shown the transferability of adversarial perturbations to images belonging to a single distribution (e.g., the ILSVRC dataset) and across multiple networks trained on the same target distribution (sometimes on disjoint subsets [22]). We now extend the notion of transferability by considering multiple data distributions. That is, we compute a universal perturbation for a given CNN trained on one dataset (e.g., ILSVRC [19]) and observe the fooling rate for the same architecture trained on another dataset (e.g., Places [24]). For this evaluation, we have considered three network architectures trained on the ILSVRC and Places-205 datasets. Since the proposed objective is data independent and the crafted perturbations aim at fooling the learned features, we study the extent to which the perturbations fool similar filters learned on different data. Table 2 presents the change in fooling rates for the proposed method when evaluated across datasets for multiple networks.

Table 2: Comparing transferability across data. Change in fooling rates for the perturbations crafted for architectures trained on ILSVRC [19] and evaluated on the same architectures trained on Places-205 [24]. The results clearly show that the absolute change in fooling rate for UAP [13] is significantly higher than for our approach because of the strong data dependence. Note that the perturbation trained on CaffeNet was tested on an AlexNet trained on Places (the networks differ slightly), which explains the larger drop in the case of CaffeNet for the proposed approach.

In order to bring out the effectiveness of the proposed data independent perturbations, we compare with the performance of data dependent perturbations (UAP) [13]. We have evaluated the fooling rates of [13], crafted on ILSVRC and transferred to Places-205. The validation set of the Places-205 dataset contains 20500 images from 205 scene categories, over which the fooling rates are computed. Table 2 shows the absolute change in fooling rate ( | rate on ILSVRC − rate on Places-205 | ) of the perturbations when evaluated on Places-205 trained architectures. It is clearly observed that the proposed approach on average suffers less change in fooling rate compared to the data dependent approach. Note that the data dependent universal perturbations quickly lose their ability to fool the features trained on different data, even for the same architecture. This is understandable, as they are crafted in association with the target data distribution (X). Unlike the data dependent approaches, the proposed perturbations are not tied to any target data, and the objective makes them more generic in fooling data from different distributions.

4.3 Initialization with smaller network’s perturbation

In all the above experiments we begin our optimization with the perturbation initialized from a uniform random distribution over [−10, +10]. In this section, we investigate the effect of a pretrained initialization for δ on the proposed objective and optimization. We consider the perturbation computed for a shallower network (e.g., VGG-F) as the initialization and optimize for deeper nets (GoogLeNet, VGG-16, VGG-19). Note that all the networks are trained on the ILSVRC dataset. Table 3 shows the fooling rates obtained for the other networks when initialized with VGG-F's perturbation, along with the improvement over random initialization (shown in parentheses). This improvement is understandable, as the perturbation from VGG-F already has some structure and shows transferability to the deeper networks; therefore, optimization using this initialization offers a slight improvement compared to random initialization.

Table 3: Fooling rates for deeper networks initialized with the smaller network's perturbation. All the networks are trained on the ILSVRC dataset. The proposed universal perturbations are computed with VGG-F's perturbation as initialization. Improvement over random initialization is shown in parentheses.

4.4 Comparison with data dependent universal perturbations

In this section, we investigate how the proposed perturbations compare to their data dependent counterparts. We consider two cases, in which the data dependent approach (UAP [13]): (i) has access to the target dataset, and (ii) uses images from a different dataset instead of the target dataset.

4.4.1 With access to target dataset

Note that when the data dependent method [13] utilizes samples (X) from the target data distribution to craft the perturbations, it is expected to demonstrate higher fooling rates on the target datasets. However, we compare the fooling rates for different sample sizes (X) against our no-data fooling performance. Figure 3 presents the comparison for three networks trained on the ILSVRC dataset, evaluated on 50000 validation images. As evaluated in [13], we have computed the fooling performance for 500, 1000, 2000, 4000 and 10000 samples from the training data. Note that the fooling performance of [13] improves monotonically with the sample size (X) for all the networks. This shows the strong association of their perturbations with the target data distribution; they lose their ability to fool when evaluated on different data, even though the target architectures are similar. On the other hand, the proposed perturbations, as they do not utilize any data, fool networks trained on other data equally well (Table 2).

4.4.2 Without access to the target dataset

While it is not a reasonable assumption to have access to the training data, one can argue that the data dependent methods (UAP) could use images from an arbitrary dataset to train the perturbations. In this section, we investigate this by learning UAPs for a target architecture using an arbitrary dataset. We have considered the ILSVRC and Places-205 datasets.

Figure 3: Comparison of fooling rates for the proposed approach with UAP [13] for multiple networks trained on ILSVRC [19]. The legend shows the size of (X) sampled from the training data. Note that, due to their strong data dependence, the performance of UAP [13] increases monotonically with the size of X for all networks. This strong data dependence explains the larger drop in performance of UAP when tested on the same architecture trained on different datasets, as shown in Table 2.

Table 4 shows the fooling rates on ILSVRC-trained networks when we use Places-205 data to generate the perturbation for UAP [13], and Table 5 shows the reverse scenario. Note that in both cases our approach needs just the target network and no data. The numbers are computed on the validation sets of the corresponding datasets. These experiments clearly show that the data dependent perturbations [13] are strongly tied to the target dataset and experience a significant drop in performance if it is unavailable. It is also seen that this drop is more severe for larger networks (GoogLeNet). On the contrary, our approach, without using any data, results in significantly better performance for CNNs trained on both datasets.

Table 4: Fooling rates obtained when UAPs [13] are trained and tested for ILSVRC architectures using the data from Places-205. Note that our approach doesn’t require any data.

Table 5: Fooling rates obtained when UAPs [13] are trained and tested for Places-205 architectures using the data from ILSVRC. Note that our approach doesn’t require any data.

4.4.3 Convergence time

The convergence time of the proposed optimization is compared with that of the data dependent universal perturbations approach [13] in Table 6. We have utilized the implementation provided by the authors of [13], which samples 10000 training images. Convergence time is reported in seconds for both approaches on three different network architectures trained on the ILSVRC dataset. Note that the proposed approach takes only a small fraction of the time taken by [13]. We ran the timing experiments on an NVIDIA GeForce TITAN X GPU with no other jobs on the system.

Table 6: Comparison of the convergence time for UAP [13] and the proposed approach. It is observed that the proposed approach takes only a fraction of the time compared to UAP across different network architectures.

4.5 Implementation details

In this section, for ease of reproducibility, we explain the implementation details of the proposed approach. We conducted all experiments using the TensorFlow [5] framework. As the objective is to craft universal perturbations, for each network we extracted the activations at all convolution layers (or concat layers, for inception) and formulated the loss as the log of the product of the activations at the different layers (equation (3)); we minimize the negative of this, so the loss is unbounded in the negative direction. We used the Adam [10] optimizer with a learning rate of 0.1 and the other parameters at their default values. We monitor the loss to see when it saturates and check validation over a held-out set of 1000 images to decide when to save the perturbation. Since the optimization updates just the input (δ), which is restricted to the [−10, 10] range, it saturates very quickly. Therefore, to avoid later updates being ignored, we periodically rescale the perturbation to the [−5, 5] range and then continue the optimization. Empirically we found rescaling every 300 iterations to work better, and we use that for all our experiments. Project code is available at https://github.com/utsavgarg/fast-feature-fool.
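
A hedged sketch of that rescaling schedule, written as a drop-in extension of the update loop from Section 3 (the 0.5 factor maps [−10, 10] onto [−5, 5]; the 300-step period is the empirical choice reported above):

```python
    # inside the update loop from Section 3, after the clip step
    if (step + 1) % 300 == 0:
        delta.assign(delta * 0.5)   # rescale [-10, 10] -> [-5, 5] so that
                                    # later gradient updates are not erased
                                    # by a saturated, clipped perturbation
```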

5 Conclusion

We have presented a simple and effective procedure to compute data independent universal adversarial perturbations. The proposed perturbations are quasi-imperceptible to humans, yet they fool state-of-the-art CNNs with significantly high fooling rates. In fact, the proposed perturbations are triply universal: (i) the same perturbation can fool multiple images from the target dataset over a given CNN, (ii) they demonstrate transferability across multiple networks trained on the same dataset, and (iii) they surprisingly retain (compared to data dependent perturbations) the ability to fool CNNs trained on different target datasets.

Experiments (sections 4.1 and 4.2) demonstrate that data independent universal adversarial perturbations can pose a more serious threat than their data dependent counterparts. They enable attackers not to be concerned with either the dataset on which the target models are trained or the internals of the models themselves. At this point in time, a more rigorous study (in the case of extreme depth, the presence of advanced regularizers, etc.) of the data independent aspect of adversarial perturbations is of utmost importance. It should be complemented simultaneously with efforts to develop methods to learn more robust models. However, we believe our work opens new avenues into the intriguing aspects of adversarial machine learning from a data independent perspective.

References

[1] Yoshua Bengio. Learning deep architectures for AI. Found. Trends Mach. Learn., 2(1), January 2009.

[2] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.

[3] Battista Biggio, Giorgio Fumera, and Fabio Roli. Pattern recognition systems under attack: Design issues and research challenges. International Journal of Pattern Recognition and Artificial Intelligence, 28(07):1460002, 2014.

[4] Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In Proceedings of the BMVC, 2014.

[5] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.

[6] Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers’ robustness to adversarial perturbations. arXiv preprint arXiv:1502.02590, 2015.

[7] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In NIPS, 2016.

[8] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.

[9] Ling Huang, Anthony D. Joseph, Blaine Nelson, Benjamin I.P. Rubinstein, and J. D. Tygar. Adversarial machine learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, AISec ’11, 2011.

[10] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS. 2012.

[12] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. CoRR, abs/1611.01236, 2016.

[13] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. arXiv preprint arXiv:1610.08401, 2016.

[14] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: A simple and accurate method to fool deep neural networks. in CVPR, 2016.

[15] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. 2015.

[16] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.

[17] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016.

[18] Andras Rozsa, Ethan M Rudd, and Terrance E Boult. Adversarial diversity and hard positive generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 25–32, 2016.

[19] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

[20] Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J Fleet. Adversarial manipulation of deep representations. arXiv preprint arXiv:1511.05122, 2015.

[21] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for largescale image recognition. CoRR, abs/1409.1556, 2014.

[22] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013. URL http://arxiv.org/abs/1312.6199.

[23] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.

[24] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Antonio Torralba, and Aude Oliva. Places: An image database for deep scene understanding. CoRR, abs/1610.02055, 2016.
