Universal Adversarial Perturbations (2017)
https://arxiv.org/pdf/1610.08401.pdf
GB/T 7714 Moosavi-Dezfooli S M, Fawzi A, Fawzi O, et al. Universal adversarial perturbations[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1765-1773.
MLA Moosavi-Dezfooli, Seyed-Mohsen, et al. "Universal adversarial perturbations." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
APA Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1765-1773).
Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
Can we find a single small image perturbation that fools a state-of-the-art deep neural network classifier on all natural images? We show in this paper the existence of such quasi-imperceptible universal perturbation vectors that lead natural images to be misclassified with high probability. Specifically, by adding such a quasi-imperceptible perturbation to natural images, the label estimated by the deep neural network is changed with high probability (see Fig. 1). Such perturbations are dubbed universal, as they are image-agnostic. The existence of these perturbations is problematic when the classifier is deployed in real-world (and possibly hostile) environments, as they can be exploited by adversaries to break the classifier. Indeed, the perturbation process involves the mere addition of one very small perturbation to all natural images, and can be relatively straightforward to implement by adversaries in real-world environments, while being relatively difficult to detect as such perturbations are very small and thus do not significantly affect data distributions. The surprising existence of universal perturbations further reveals new insights on the topology of the decision boundaries of deep neural networks. We summarize the main contributions of this paper as follows:
We show the existence of universal image-agnostic perturbations for state-of-the-art deep neural networks.
We propose an algorithm for finding such perturbations. The algorithm seeks a universal perturbation for a set of training points, and proceeds by aggregating atomic perturbation vectors that send successive datapoints to the decision boundary of the classifier.
We show that universal perturbations have a remarkable generalization property, as perturbations computed for a rather small set of training points fool new images with high probability.
We show that such perturbations are not only universal across images, but also generalize well across deep neural networks. Such perturbations are therefore doubly universal, both with respect to the data and the network architectures.
We explain and analyze the high vulnerability of deep neural networks to universal perturbations by examining the geometric correlation between different parts of the decision boundary.
The robustness of image classifiers to structured and unstructured perturbations has recently attracted a lot of attention [19, 16, 20, 3, 4, 12, 13, 14]. Despite the impressive performance of deep neural network architectures on challenging visual classification benchmarks [6, 9, 21, 10], these classifiers were shown to be highly vulnerable to perturbations. In [19], such networks are shown to be unstable to very small and often imperceptible additive adversarial perturbations. Such carefully crafted perturbations are either estimated by solving an optimization problem [19, 11, 1] or through one step of gradient ascent [5], and result in a perturbation that fools a specific data point. A fundamental property of these adversarial perturbations is their intrinsic dependence on datapoints: the perturbations are specifically crafted for each data point independently. As a result, the computation of an adversarial perturbation for a new data point requires solving a data-dependent optimization problem from scratch, which uses the full knowledge of the classification model. This is different from the universal perturbation considered in this paper, as we seek a single perturbation vector that fools the network on most natural images. Perturbing a new datapoint then only involves the mere addition of the universal perturbation to the image (and does not require solving an optimization problem/gradient computation). Finally, we emphasize that our notion of universal perturbation differs from the generalization of adversarial perturbations studied in [19], where perturbations computed on the MNIST task were shown to generalize well across different models. Instead, we examine the existence of universal perturbations that are common to most data points belonging to the data distribution.
We formalize in this section the notion of universal perturbations, and propose a method for estimating such perturbations. Let $\mu$ denote a distribution of images in $\mathbb{R}^d$, and let $\hat{k}$ define a classification function that outputs for each image $x \in \mathbb{R}^d$ an estimated label $\hat{k}(x)$. The main focus of this paper is to seek perturbation vectors $v \in \mathbb{R}^d$ that fool the classifier $\hat{k}$ on almost all datapoints sampled from $\mu$. That is, we seek a vector $v$ such that
\hat{k}(x+v)\neq\hat{k}(x) \quad \text{for ``most'' } x\sim\mu
We coin such a perturbation universal, as it represents a fixed image-agnostic perturbation that causes label change for most images sampled from the data distribution $\mu$. We focus here on the case where the distribution $\mu$ represents the set of natural images, hence containing a huge amount of variability. In that context, we examine the existence of small universal perturbations (in terms of the $\ell_p$ norm with $p \in [1,\infty)$) that misclassify most images. The goal is therefore to find $v$ that satisfies the following two constraints:
1.\ \lVert v\rVert_{p}\leq\xi, \qquad 2.\ \mathbb{P}_{x\sim\mu}\big(\hat{k}(x+v)\neq\hat{k}(x)\big)\geq 1-\delta.
The parameter $\xi$ controls the magnitude of the perturbation vector $v$, and $\delta$ quantifies the desired fooling rate for all images sampled from the distribution $\mu$.
Algorithm. Let $X = \{x_1,\dots,x_m\}$ be a set of images sampled from the distribution $\mu$. Our proposed algorithm seeks a universal perturbation $v$ such that $\lVert v\rVert_p \leq \xi$, while fooling most data points in $X$. The algorithm proceeds iteratively over the data points in $X$ and gradually builds the universal perturbation, as illustrated in Fig. 2. At each iteration, the minimal perturbation $\Delta v_i$ that sends the current perturbed point, $x_i + v$, to the decision boundary of the classifier is computed and aggregated to the current instance of the universal perturbation. In more detail, provided the current universal perturbation $v$ does not fool data point $x_i$, we seek the extra perturbation $\Delta v_i$ with minimal norm that allows us to fool data point $x_i$ by solving the following optimization problem:
\Delta v_{i}\gets \arg\min_{r}\lVert r \rVert_{2}\ \ \text{s.t.}\ \ \hat{k}(x_{i}+v+r)\neq\hat{k}(x_{i}).\tag{1}
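The paper solves this inner problem with DeepFool [11]. As a rough, hedged illustration only (not the authors' implementation), the sketch below approximates $\Delta v_i$ with a few normalized gradient steps away from the current label; the `model`, its preprocessing, and the step size `overshoot` are assumptions.

```python
import torch
import torch.nn.functional as F

def approx_min_perturbation(model, x, v, max_iter=50, overshoot=0.02):
    """Rough stand-in for the inner problem of Eq. (1): find a small extra
    perturbation r such that the label of x + v + r differs from that of x.
    The paper uses DeepFool [11]; this is a simplified gradient-based sketch."""
    model.eval()
    with torch.no_grad():
        orig_label = model(x.unsqueeze(0)).argmax(dim=1).item()

    r = torch.zeros_like(x)
    for _ in range(max_iter):
        pert = (x + v + r).unsqueeze(0).clone().requires_grad_(True)
        logits = model(pert)
        if logits.argmax(dim=1).item() != orig_label:
            break  # label already flipped: r plays the role of Delta v_i
        # Ascend the loss of the original label to move toward the boundary.
        target = torch.tensor([orig_label], device=logits.device)
        loss = F.cross_entropy(logits, target)
        grad, = torch.autograd.grad(loss, pert)
        step = grad.squeeze(0)
        r = r + overshoot * step / (step.norm() + 1e-12)
    return r
```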
To ensure that the constraint $\lVert v\rVert_p \leq \xi$ is satisfied, the updated universal perturbation is further projected onto the $\ell_p$ ball of radius $\xi$ centered at 0. That is, let $\mathcal{P}_{p,\xi}$ be the projection operator defined as follows:
\mathcal{P}_{p,\xi}(v)=\arg\min_{v^{\prime}}\lVert v-v^{\prime} \rVert_{2}\ \ \text{subject to}\ \ \lVert v^{\prime} \rVert_{p}\leq\xi.\tag{2}
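For the two cases used in the paper ($p = 2$ and $p = \infty$), this projection has a simple closed form. A minimal NumPy sketch (illustrative, not the authors' code):

```python
import numpy as np

def project_lp_ball(v, xi, p=np.inf):
    """Project v onto the l_p ball of radius xi centered at 0 (Eq. (2)),
    for the two cases used in the paper: p = 2 and p = inf."""
    if p == 2:
        norm = np.linalg.norm(v.ravel())
        return v if norm <= xi else v * (xi / norm)
    if p == np.inf:
        return np.clip(v, -xi, xi)
    raise ValueError("only p = 2 and p = inf are handled in this sketch")
```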
Then, our update rule is given by $v \gets \mathcal{P}_{p,\xi}(v + \Delta v_i)$. Several passes on the data set $X$ are performed to improve the quality of the universal perturbation. The algorithm is terminated when the empirical "fooling rate" on the perturbed data set $X_v := \{x_1 + v, \dots, x_m + v\}$ exceeds the target threshold $1-\delta$. That is, we stop the algorithm whenever
Err(X_{v}):=\frac{1}{m}\sum\limits_{i=1}^{m}\mathbb{1}_{\hat{k}(x_{i}+v)\neq\hat{k}(x_{i})}\geq1-\delta.\tag{3}
The detailed algorithm is provided in Algorithm 1. Interestingly, in practice, the number of data points m in X need not be large to compute a universal perturbation that is valid for the whole distribution µ. In particular, we can set m to be much smaller than the number of training points (see Section 3).
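Since the pseudo-code of Algorithm 1 is not reproduced here, the following hedged Python sketch shows how the pieces fit together, reusing the two helpers sketched above; `model`, the list `images`, and the intensity scale of `xi` are assumptions, not the authors' implementation.

```python
import random
import numpy as np
import torch

def fooling_rate(model, images, v):
    """Empirical fooling rate Err(X_v) of Eq. (3) over a list of image tensors."""
    with torch.no_grad():
        flips = [(model((x + v).unsqueeze(0)).argmax(1) != model(x.unsqueeze(0)).argmax(1)).item()
                 for x in images]
    return float(np.mean(flips))

def universal_perturbation(model, images, xi=10.0, delta=0.2, p=np.inf, max_passes=10):
    """Sketch of Algorithm 1: aggregate per-point minimal perturbations and
    project onto the l_p ball until the fooling rate exceeds 1 - delta.
    Note: xi is expressed in the intensity scale of `images` (the paper uses
    xi = 10 for p = inf in [0, 255] pixel space)."""
    v = torch.zeros_like(images[0])
    for _ in range(max_passes):
        random.shuffle(images)  # different shufflings yield different valid v's
        for x in images:
            with torch.no_grad():
                still_fooled = (model((x + v).unsqueeze(0)).argmax(1)
                                != model(x.unsqueeze(0)).argmax(1)).item()
            if not still_fooled:
                dv = approx_min_perturbation(model, x, v)        # Eq. (1)
                v_np = project_lp_ball((v + dv).numpy(), xi, p)  # Eq. (2)
                v = torch.from_numpy(v_np)
        if fooling_rate(model, images, v) >= 1 - delta:          # Eq. (3)
            break
    return v
```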
The proposed algorithm involves solving at most m instances of the optimization problem in Eq. (1) for each pass. While this optimization problem is not convex when $\hat{k}$ is a standard classifier (e.g., a deep neural network), several efficient approximate methods have been devised for solving this problem [19, 11, 7]. We use in the following the approach in [11] for its efficiency. It should further be noticed that the objective of Algorithm 1 is not to find the smallest universal perturbation that fools most data points sampled from the distribution, but rather to find one such perturbation with sufficiently small norm. In particular, different random shufflings of the set X naturally lead to a diverse set of universal perturbations v satisfying the required constraints. The proposed algorithm can therefore be leveraged to generate multiple universal perturbations for a deep neural network (see next section for visual examples).
We now analyze the robustness of state-of-the-art deep neural network classifiers to universal perturbations using Algorithm 1.
In a first experiment, we assess the estimated universal perturbations for different recent deep neural networks on the ILSVRC 2012 [15] validation set (50,000 images), and report the fooling ratio, that is, the proportion of images that change labels when perturbed by our universal perturbation. Results are reported for $p = 2$ and $p = \infty$, where we respectively set ξ = 2000 and ξ = 10. These numerical values were chosen in order to obtain a perturbation whose norm is significantly smaller than the image norms, such that the perturbation is quasi-imperceptible when added to natural images. Results are listed in Table 1. Each result is reported on the set X, which is used to compute the perturbation, as well as on the validation set (that is not used in the process of the computation of the universal perturbation). Observe that for all networks, the universal perturbation achieves very high fooling rates on the validation set. Specifically, the universal perturbations computed for CaffeNet and VGG-F fool more than 90% of the validation set (for p = ∞). In other words, for any natural image in the validation set, the mere addition of our universal perturbation fools the classifier more than 9 times out of 10. This result is moreover not specific to such architectures, as we can also find universal perturbations that cause VGG, GoogLeNet and ResNet classifiers to be fooled on natural images with probability edging 80%. These results have an element of surprise, as they show the existence of single universal perturbation vectors that cause natural images to be misclassified with high probability, albeit being quasi-imperceptible to humans.

To verify this latter claim, we show visual examples of perturbed images in Fig. 3, where the GoogLeNet architecture is used. These images are either taken from the ILSVRC 2012 validation set, or captured using a mobile phone camera. Observe that in most cases, the universal perturbation is quasi-imperceptible, yet this powerful image-agnostic perturbation is able to misclassify any image with high probability for state-of-the-art classifiers. We refer to the supplementary material for the original (unperturbed) images, as well as their ground truth labels. We also refer to the video in the supplementary material for real-world examples on a smartphone.

We visualize the universal perturbations corresponding to different networks in Fig. 4. It should be noted that such universal perturbations are not unique, as many different universal perturbations (all satisfying the two required constraints) can be generated for the same network. In Fig. 5, we visualize five different universal perturbations obtained by using different random shufflings in X. Observe that such universal perturbations are different, although they exhibit a similar pattern. This is moreover confirmed by computing the normalized inner products between two pairs of perturbation images, as the normalized inner products do not exceed 0.1, which shows that one can find diverse universal perturbations.
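As a hedged illustration of how such an evaluation could be run with torchvision (the data path, the preprocessing, and the file `universal_perturbation.pt` are assumptions, not the paper's pipeline):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Hypothetical setup: a pretrained network and an ImageNet-style validation folder.
# (ImageNet normalization is omitted for brevity.)
net = models.googlenet(weights="IMAGENET1K_V1").eval()
transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
val_set = ImageFolder("/path/to/ILSVRC2012/val", transform=transform)
loader = DataLoader(val_set, batch_size=64, num_workers=4)

# Perturbation precomputed with Algorithm 1, in the same scale as the inputs.
v = torch.load("universal_perturbation.pt")

fooled, total = 0, 0
with torch.no_grad():
    for x, _ in loader:
        clean = net(x).argmax(1)
        pert = net(x + v).argmax(1)   # v broadcasts over the batch dimension
        fooled += (clean != pert).sum().item()
        total += x.size(0)
print(f"fooling ratio: {fooled / total:.3f}")
```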
While the above universal perturbations are computed for a set X of 10,000 images from the training set (i.e., in average 10 images per class), we now examine the influence of the size of X on the quality of the universal perturbation. We show in Fig. 6 the fooling rates obtained on the validation set for different sizes of X for GoogLeNet. Note for example that with a set X containing only 500 images, we can fool more than 30% of the images on the validation set. This result is significant when compared to the number of classes in ImageNet (1000), as it shows that we can fool a large set of unseen images, even when using a set X containing less than one image per class! The universal perturbations computed using Algorithm 1 have therefore a remarkable generalization power over unseen data points, and can be computed on a very small set of training images.
Cross-model universality. While the computed perturbations are universal across unseen data points, we now examine their cross-model universality. That is, we study to which extent universal perturbations computed for a specific architecture (e.g., VGG-19) are also valid for another architecture (e.g., GoogLeNet). Table 2 displays a matrix summarizing the universality of such perturbations across six different architectures. For each architecture, we compute a universal perturbation and report the fooling ratios on all other architectures; we report these in the rows of Table 2. Observe that, for some architectures, the universal perturbations generalize very well across other architectures. For example, universal perturbations computed for the VGG-19 network have a fooling ratio above 53% for all other tested architectures. This result shows that our universal perturbations are, to some extent, doubly-universal as they generalize well across data points and very different architectures. It should be noted that, in [19], adversarial perturbations were previously shown to generalize well, to some extent, across different neural networks on the MNIST problem. Our results are however different, as we show the generalizability of universal perturbations across different architectures on the ImageNet data set. This result shows that such perturbations are of practical relevance, as they generalize well across data points and architectures. In particular, in order to fool a new image on an unknown neural network, a simple addition of a universal perturbation computed on the VGG-19 architecture is likely to misclassify the data point.
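A row of Table 2 amounts to evaluating one perturbation against several networks; a minimal sketch, assuming a perturbation saved from a VGG-19 run, the `fooling_rate` helper above, and a list `val_images` of preprocessed tensors (all names are illustrative):

```python
import torch
import torchvision.models as models

def cross_model_fooling(v, val_images):
    """Evaluate one universal perturbation (e.g., computed on VGG-19) against
    several other architectures, i.e., one row of Table 2."""
    targets = {
        "VGG-16": models.vgg16(weights="IMAGENET1K_V1").eval(),
        "GoogLeNet": models.googlenet(weights="IMAGENET1K_V1").eval(),
        "ResNet-152": models.resnet152(weights="IMAGENET1K_V1").eval(),
    }
    return {name: fooling_rate(net, val_images, v) for name, net in targets.items()}

# v_vgg19 = torch.load("uap_vgg19.pt")
# print(cross_model_fooling(v_vgg19, val_images))
```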
Visualization of the effect of universal perturbations. To gain insights on the effect of universal perturbations on natural images, we now visualize the distribution of labels on the ImageNet validation set. Specifically, we build a directed graph G = (V, E), whose vertices denote the labels, and directed edges e = (i → j) indicate that the majority of images of class i are fooled into label j when applying the universal perturbation. The existence of edges i → j therefore suggests that the preferred fooling label for images of class i is j. We construct this graph for GoogLeNet, and visualize the full graph in the supplementary material due to space constraints. The visualization of this graph shows a very peculiar topology. In particular, the graph is a union of disjoint components, where all edges in one component mostly connect to one target label. See Fig. 7 for an illustration of two connected components. This visualization clearly shows the existence of several dominant labels, and that universal perturbations mostly make natural images classified with such labels. We hypothesize that these dominant labels occupy large regions in the image space, and therefore represent good candidate labels for fooling most natural images. Note that these dominant labels are automatically found by Algorithm 1, and are not imposed a priori in the computation of perturbations.
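A sketch of how such a graph could be assembled with networkx, assuming arrays `clean_labels` and `perturbed_labels` collected over the validation set (the names and the choice of networkx are illustrative):

```python
import numpy as np
import networkx as nx
from collections import Counter

def build_fooling_graph(clean_labels, perturbed_labels):
    """Directed graph G = (V, E): add an edge i -> j when j is the most frequent
    perturbed label among validation images whose clean label is i."""
    clean = np.asarray(clean_labels)
    pert = np.asarray(perturbed_labels)
    G = nx.DiGraph()
    for i in np.unique(clean):
        targets = pert[(clean == i) & (pert != i)]
        if targets.size == 0:
            continue
        j, _ = Counter(targets.tolist()).most_common(1)[0]
        G.add_edge(int(i), int(j))
    return G

# Dominant labels then show up as nodes with unusually high in-degree:
# sorted(G.in_degree, key=lambda t: t[1], reverse=True)[:10]
```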
Fine-tuning with universal perturbations. We now examine the effect of fine-tuning the networks with perturbed images. We use the VGG-F architecture, and fine-tune the network based on a modified training set where universal perturbations are added to a fraction of (clean) training samples: for each training point, a universal perturbation is added with probability 0.5, and the original sample is preserved with probability 0.5. To account for the diversity of universal perturbations, we pre-compute a pool of 10 different universal perturbations and add perturbations to the training samples randomly from this pool. The network is fine-tuned by performing 5 extra epochs of training on the modified training set. To assess the effect of fine-tuning on the robustness of the network, we compute a new universal perturbation for the fine-tuned network (using Algorithm 1, with p = ∞ and ξ = 10), and report the fooling rate of the network. After 5 extra epochs, the fooling rate on the validation set is 76.2%, which shows an improvement with respect to the original network (93.7%, see Table 1). Despite this improvement, the fine-tuned network remains largely vulnerable to small universal perturbations. We therefore repeated the above procedure (i.e., computation of a pool of 10 universal perturbations for the fine-tuned network, fine-tuning of the new network based on the modified training set for 5 extra epochs), and we obtained a new fooling ratio of 80.0%. In general, the repetition of this procedure for a fixed number of times did not yield any improvement over the 76.2% fooling ratio obtained after one step of fine-tuning. Hence, while fine-tuning the network leads to a mild improvement in the robustness, we observed that this simple solution does not fully immunize the network against small universal perturbations.
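A hedged sketch of the augmentation step used during this fine-tuning, assuming a pool of precomputed perturbation files and a standard training loop (all names are illustrative):

```python
import random
import torch

# Hypothetical pool of 10 precomputed universal perturbations for VGG-F.
perturbation_pool = [torch.load(f"uap_vggf_{k}.pt") for k in range(10)]

def perturb_batch(x, pool, prob=0.5):
    """With probability 0.5 per sample, add a universal perturbation drawn at
    random from the pool; otherwise keep the clean sample."""
    out = x.clone()
    for i in range(x.size(0)):
        if random.random() < prob:
            out[i] = out[i] + random.choice(pool)
    return out

# Inside the fine-tuning loop (5 extra epochs in the paper), something like:
#   x_aug = perturb_batch(x, perturbation_pool)
#   loss = criterion(model(x_aug), y); loss.backward(); optimizer.step()
```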
The goal of this section is to analyze and explain the high vulnerability of deep neural network classifiers to universal perturbations. To understand the unique characteristics of universal perturbations, we first compare such perturbations with other types of perturbations, namely i) random perturbation, ii) adversarial perturbation computed for a randomly picked sample (computed using the DF and FGS methods respectively in [11] and [5]), iii) sum of adversarial perturbations over X, and iv) mean of the images (or ImageNet bias). For each perturbation, we depict a phase transition graph in Fig. 8 showing the fooling rate on the validation set with respect to the $\ell_2$ norm of the perturbation. Different perturbation norms are achieved by scaling each perturbation with a multiplicative factor to have the target norm. Note that the universal perturbation is computed for ξ = 2000, and also scaled accordingly.
Observe that the proposed universal perturbation quickly reaches very high fooling rates, even when the perturbation is constrained to be of small norm. For example, the universal perturbation computed using Algorithm 1 achieves a fooling rate of 85% when the $\ell_2$ norm is constrained to ξ = 2000, while other perturbations achieve much smaller ratios for comparable norms. In particular, random vectors sampled uniformly from the sphere of radius 2000 only fool 10% of the validation set. The large difference between universal and random perturbations suggests that the universal perturbation exploits some geometric correlations between different parts of the decision boundary of the classifier. In fact, if the orientations of the decision boundary in the neighborhood of different data points were completely uncorrelated (and independent of the distance to the decision boundary), the norm of the best universal perturbation would be comparable to that of a random perturbation. Note that the latter quantity is well understood (see [4]), as the norm of the random perturbation required to fool a specific data point precisely behaves as $\Theta(\sqrt{d}\,\lVert r\rVert_2)$, where $d$ is the dimension of the input space, and $\lVert r\rVert_2$ is the distance between the data point and the decision boundary (or equivalently, the norm of the smallest adversarial perturbation). For the considered ImageNet classification task, this quantity is equal to $\sqrt{d}\,\lVert r\rVert_2 \approx 2\times 10^{4}$ for most data points, which is at least one order of magnitude larger than the universal perturbation (ξ = 2000). This substantial difference between random and universal perturbations thereby suggests redundancies in the geometry of the decision boundaries that we now explore.
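A small sketch of the random baseline, sampling uniformly on the $\ell_2$ sphere of radius ξ = 2000 for a 3×224×224 input (illustrative only):

```python
import numpy as np

def random_sphere_perturbation(shape=(3, 224, 224), xi=2000.0, rng=None):
    """Sample uniformly on the l_2 sphere of radius xi by normalizing a Gaussian vector."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(shape)
    return v * (xi / np.linalg.norm(v.ravel()))

# Rough sanity check of the Theta(sqrt(d) * ||r||_2) scaling quoted above:
d = 3 * 224 * 224
print(np.sqrt(d))  # ~388, so ||r||_2 around 50 gives sqrt(d) * ||r||_2 ~ 2e4
```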
For each image $x$ in the validation set, we compute the adversarial perturbation vector $r(x) = \arg\min_{r} \lVert r\rVert_2 \ \text{s.t.}\ \hat{k}(x+r) \neq \hat{k}(x)$. It is easy to see that $r(x)$ is normal to the decision boundary of the classifier (at $x + r(x)$). The vector $r(x)$ hence captures the local geometry of the decision boundary in the region surrounding the data point $x$. To quantify the correlation between different regions of the decision boundary of the classifier, we define the matrix
N=\begin{bmatrix} \dfrac{r(x_{1})}{\lVert r(x_{1}) \rVert_{2}} & \cdots & \dfrac{r(x_{n})}{\lVert r(x_{n}) \rVert_{2}} \end{bmatrix}\tag{4}
of normal vectors to the decision boundary in the vicinity of $n$ data points in the validation set. For binary linear classifiers, the decision boundary is a hyperplane, and $N$ is of rank 1, as all normal vectors are collinear. To capture more generally the correlations in the decision boundary of complex classifiers, we compute the singular values of the matrix $N$. The singular values of the matrix $N$, computed for the CaffeNet architecture, are shown in Fig. 9. We further show in the same figure the singular values obtained when the columns of $N$ are sampled uniformly at random from the unit sphere. Observe that, while the latter singular values have a slow decay, the singular values of $N$ decay quickly, which confirms the existence of large correlations and redundancies in the decision boundary of deep networks. More precisely, this suggests the existence of a subspace $S$ of low dimension $d'$ (with $d' \ll d$), that contains most normal vectors to the decision boundary in regions surrounding natural images. We hypothesize that the existence of universal perturbations fooling most natural images is partly due to the existence of such a low-dimensional subspace that captures the correlations among different regions of the decision boundary. In fact, this subspace "collects" normals to the decision boundary in different regions, and perturbations belonging to this subspace are therefore likely to fool datapoints. To verify this hypothesis, we choose a random vector of norm ξ = 2000 belonging to the subspace $S$ spanned by the first 100 singular vectors, and compute its fooling ratio on a different set of images (i.e., a set of images that have not been used to compute the SVD). Such a perturbation can fool nearly 38% of these images, thereby showing that a random direction in this well-sought subspace $S$ significantly outperforms random perturbations (we recall that such perturbations can only fool 10% of the data). Fig. 10 illustrates the subspace $S$ that captures the correlations in the decision boundary. It should further be noted that the existence of this low dimensional subspace explains the surprising generalization properties of universal perturbations obtained in Fig. 6, where one can build relatively generalizable universal perturbations with very few images.
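A sketch of this analysis, assuming the per-image minimal adversarial perturbations $r(x_i)$ have already been computed (e.g., with DeepFool) and stacked into an array; the function names are illustrative:

```python
import numpy as np

def boundary_normal_singular_values(perturbations):
    """perturbations: array of shape (n, d), row i being the minimal adversarial
    perturbation r(x_i) of one validation image. Returns the singular values of N (Eq. (4))."""
    N = (perturbations / np.linalg.norm(perturbations, axis=1, keepdims=True)).T  # d x n
    return np.linalg.svd(N, compute_uv=False)

def random_vector_in_top_subspace(perturbations, num_vectors=100, xi=2000.0, rng=None):
    """Random direction in the subspace S spanned by the first `num_vectors`
    singular vectors of N, rescaled to norm xi (assumes n >= num_vectors)."""
    rng = np.random.default_rng() if rng is None else rng
    N = (perturbations / np.linalg.norm(perturbations, axis=1, keepdims=True)).T
    U, _, _ = np.linalg.svd(N, full_matrices=False)
    v = U[:, :num_vectors] @ rng.standard_normal(num_vectors)
    return v * (xi / np.linalg.norm(v))
```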
Unlike the above experiment, the proposed algorithm does not choose a random vector in this subspace, but rather chooses a specific direction in order to maximize the overall fooling rate. This explains the gap between the fooling rates obtained with the random vector strategy in S and Algorithm 1.
We showed the existence of small universal perturbations that can fool state-of-the-art classifiers on natural images. We proposed an iterative algorithm to generate universal perturbations, and highlighted several properties of such perturbations. In particular, we showed that universal perturbations generalize well across different classification models, resulting in doubly-universal perturbations (image-agnostic, network-agnostic). We further explained the existence of such perturbations with the correlation between different regions of the decision boundary. This provides insights on the geometry of the decision boundaries of deep neural networks, and contributes to a better understanding of such systems. A theoretical analysis of the geometric correlations between different parts of the decision boundary will be the subject of future research.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.
[1] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In Neural Information Processing Systems (NIPS), 2016.
[2] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference (BMVC), 2014.
[3] A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classifiers' robustness to adversarial perturbations. CoRR, abs/1502.02590, 2015.
[4] A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard. Robustness of classifiers: from adversarial to random noise. In Neural Information Processing Systems (NIPS), 2016.
[5] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[7] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári. Learning with a strong adversary. CoRR, abs/1511.03034, 2015.
[8] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (MM), pages 675–678, 2014.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.
[10] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3361–3368, 2011.
[11] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[12] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015.
[13] E. Rodner, M. Simon, R. Fisher, and J. Denzler. Fine-grained recognition in the noisy wild: Sensitivity analysis of convolutional neural networks approaches. In British Machine Vision Conference (BMVC), 2016.
[14] A. Rozsa, E. M. Rudd, and T. E. Boult. Adversarial diversity and hard positive generation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2016.
[15] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[16] S. Sabour, Y. Cao, F. Faghri, and D. J. Fleet. Adversarial manipulation of deep representations. In International Conference on Learning Representations (ICLR), 2016.
[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2014.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[19] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[20] P. Tabacof and E. Valle. Exploring the space of adversarial images. In IEEE International Joint Conference on Neural Networks (IJCNN), 2016.
[21] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1708, 2014.
Fig. 11 shows the original images corresponding to the experiment in Fig. 3. Fig. 12 visualizes the graph showing relations between original and perturbed labels (see Section 3 for more details).
Figure 11: Original images. The first two rows show images randomly chosen from the validation set, and the last row shows personal images taken with a mobile phone camera.