DeepFool: a simple and accurate method to fool deep neural networks
Original paper:
GB/T 7714 Moosavi-Dezfooli S M, Fawzi A, Frossard P. Deepfool: a simple and accurate method to fool deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2574-2582.
MLA Moosavi-Dezfooli, Seyed-Mohsen, Alhussein Fawzi, and Pascal Frossard. "Deepfool: a simple and accurate method to fool deep neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
APA Moosavi-Dezfooli, S. M., Fawzi, A., & Frossard, P. (2016). Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2574-2582).
State-of-the-art deep neural networks have achieved impressive results on many image classification tasks. However, these same architectures have been shown to be unstable to small, well sought, perturbations of the images. Despite the importance of this phenomenon, no effective methods have been proposed to accurately compute the robustness of state-of-the-art deep classifiers to such perturbations on large-scale datasets. In this paper, we fill this gap and propose the DeepFool algorithm to efficiently compute perturbations that fool deep networks, and thus reliably quantify the robustness of these classifiers. Extensive experimental results show that our approach outperforms recent methods in the task of computing adversarial perturbations and making classifiers more robust.
Deep neural networks are powerful learning models that achieve state-of-the-art pattern recognition performance in many research areas such as bioinformatics [1, 16], speech [12, 6], and computer vision [10, 8]. Though deep networks have exhibited very good performance in classification tasks, they have recently been shown to be particularly unstable to adversarial perturbations of the data [18]. In fact, very small and often imperceptible perturbations of the data samples are sufficient to fool state-of-the-art classifiers and result in incorrect classification (see, e.g., Figure 1). Formally, for a given classifier, we define an adversarial perturbation as the minimal perturbation r that is sufficient to change the estimated label k̂(x):
\vartriangle(x;\hat{k}):=\min_{r}\lVert r \rVert_{2} ~subject~to~ \hat{k}(x+r)\neq\hat{k}(x),\tag{1}
where x is an image and k̂(x) is the estimated label. We call Δ(x; k̂) the robustness of k̂ at point x. The robustness of classifier k̂ is then defined as
\rho_{adv}(\hat{k})=\mathbb{E}_{x}\frac{\vartriangle(x;\hat{k})}{\lVert x \rVert_{2}},\tag{2}
where E_x is the expectation over the distribution of data. The study of adversarial perturbations helps us understand what features are used by a classifier. The existence of such examples is seemingly in contradiction with the generalization ability of the learning algorithms. While deep networks achieve state-of-the-art performance in image classification tasks, they are not robust at all to small adversarial perturbations and tend to misclassify minimally perturbed data that looks visually similar to clean samples. Though adversarial attacks are specific to the classifier, it seems that the adversarial perturbations are generalizable across different models [18]. This can actually become a real concern from a security point of view.
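As a quick illustration (an added example, not text from the original paper), these definitions can be made explicit for an affine binary classifier f(x) = w^T x + b with k̂(x) = sign(f(x)): the minimal perturbation in (1) is simply the distance of x to the decision hyperplane, so
\vartriangle(x;\hat{k})=\frac{\lvert f(x)\rvert}{\lVert \omega \rVert_{2}},\qquad \rho_{adv}(\hat{k})=\mathbb{E}_{x}\frac{\lvert f(x)\rvert}{\lVert \omega \rVert_{2}\lVert x \rVert_{2}},
which is exactly the closed-form case analyzed for binary classifiers in Section 2 (Eq. (3)).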
An accurate method for finding the adversarial perturbations is thus necessary to study and compare the robustness of different classifiers to adversarial perturbations. It might be the key to a better understanding of the limits of current architectures and to design methods to increase robustness. Despite the importance of the vulnerability of state-of-the-art classifiers to adversarial instability, no well-founded method has been proposed to compute adversarial perturbations, and we fill this gap in this paper.
Our main contributions are the following:
• We propose a simple yet accurate method for computing and comparing the robustness of different classifiers to adversarial perturbations.
• We perform an extensive experimental comparison, and show that 1) our method computes adversarial perturbations more reliably and efficiently than existing methods, and 2) augmenting training data with adversarial examples significantly increases the robustness to adversarial perturbations.
• We show that using imprecise approaches for the computation of adversarial perturbations could lead to different and sometimes misleading conclusions about the robustness. Hence, our method provides a better understanding of this intriguing phenomenon and of its influence factors.
We now review some of the relevant work. The phenomenon of adversarial instability was first introduced and studied in [18]. The authors estimated adversarial examples by solving penalized optimization problems and presented an analysis showing that the high complexity of neural networks might be a reason explaining the presence of adversarial examples. Unfortunately, the optimization method employed in [18] is time-consuming and therefore does not scale to large datasets. In [14], the authors showed that convolutional networks are not invariant to some sort of transformations based on the experiments done on Pascal3D+ annotations. Recently, Tsai et al. [19] provided a software to misclassify a given image in a specified class, without necessarily finding the smallest perturbation. Nguyen et al. [13] generated synthetic unrecognizable images, which are classified with high confidence. The authors of [3] also studied a related problem of finding the minimal geometric transformation that fools image classifiers, and provided quantitative measure of the robustness of classifiers to geometric transformations. Closer to our work, the authors of [4] introduced the “fast gradient sign” method, which computes the adversarial perturbations for a given classifier very efficiently. Despite its efficiency, this method provides only a coarse approximation of the optimal perturbation vectors. In fact, it performs a unique gradient step, which often leads to sub-optimal solutions. Then in an attempt to build more robust classifiers to adversarial perturbations, [5] introduced a smoothness penalty in the training procedure that allows to boost the robustness of the classifier. Notably, the method in [18] was applied in order to generate adversarial perturbations. We should finally mention that the phenomenon of adversarial instability also led to theoretical work in [2] that studied the problem of adversarial perturbations on some families of classifiers, and provided upper bounds on the robustness of these classifiers. A deeper understanding of the phenomenon of adversarial instability for more complex classifiers is however needed; the method proposed in this work can be seen as a baseline to efficiently and accurately generate adversarial perturbations in order to better understand this phenomenon.
The rest of the paper is organized as follows. In Section 2, we introduce an efficient algorithm to find adversarial perturbations in a binary classifier. The extension to the multiclass problem is provided in Section 3. In Section 4, we propose extensive experiments that confirm the accuracy of our method and outline its benefits in building more robust classifiers.
As a multiclass classifier can be viewed as an aggregation of binary classifiers, we first propose the algorithm for binary classifiers. That is, we assume here k̂(x) = sign(f(x)), where f : R^n → R is an arbitrary scalar-valued image classification function. We also denote by F = {x : f(x) = 0} the level set at zero of f. We begin by analyzing the case where f is an affine classifier f(x) = w^T x + b, and then derive the general algorithm, which can be applied to any differentiable binary classifier f.
In the case where f is affine, it can easily be seen that the robustness of f at point x0, Δ(x0; f), is equal to the distance from x0 to the separating affine hyperplane F = {x : w^T x + b = 0} (Figure 2). The minimal perturbation to change the classifier's decision corresponds to the orthogonal projection of x0 onto F. It is given by the closed-form formula:
r_{\star}(x_{0}):=\arg\min_{r}\lVert r \rVert_{2}~subject~to~sign(f(x_{0}+r))\neq sign(f(x_{0}))=-\frac{f(x_{0})}{\lVert \omega \rVert_{2}^{2}}\omega.\tag{3}
Assuming now that f is a general binary differentiable classifier, we adopt an iterative procedure to estimate the robustness Δ(x0; f). Specifically, at each iteration i, f is linearized around the current point x_i and the minimal perturbation of the linearized classifier is computed as
r_{i}:=\arg\min_{r_{i}}\lVert r_{i} \rVert_{2}~subject~to~f(x_{i})+\nabla f(x_{i})^{T}r_{i}=0.\tag{4}
The perturbation r_i at iteration i of the algorithm is computed using the closed-form solution in Eq. (3), and the next iterate x_{i+1} is updated. The algorithm stops when x_{i+1} changes the sign of the classifier. The DeepFool algorithm for binary classifiers is summarized in Algorithm 1 and a geometric illustration of the method is shown in Figure 3.
In practice, the above algorithm can often converge to a point on the zero level set F. In order to reach the other side of the classification boundary, the final perturbation vector r̂ is multiplied by a constant 1 + η, with η ≪ 1. In our experiments, we have used η = 0.02.
Figure 3: Illustration of Algorithm 1 for n = 2. Assume x0 ∈ R^n. The green plane is the graph of x ↦ f(x0) + ∇f(x0)^T(x − x0), which is tangent to the classifier function f (wire-framed graph). The orange line indicates where f(x0) + ∇f(x0)^T(x − x0) = 0. x1 is obtained from x0 by projecting x0 on the orange hyperplane of R^n.
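To make the binary iteration concrete, here is a minimal NumPy sketch of Algorithm 1 under the assumption that the classifier output f and its gradient are available as callables; the names (deepfool_binary, f, grad_f, eta) are illustrative and not from the paper:

```python
import numpy as np

def deepfool_binary(x0, f, grad_f, eta=0.02, max_iter=50):
    """Minimal sketch of the binary DeepFool iteration (Algorithm 1).

    f      : callable, scalar classifier output f(x)
    grad_f : callable, gradient of f at x
    eta    : overshoot constant; the final perturbation is scaled by (1 + eta)
    """
    x = x0.astype(float)
    sign0 = np.sign(f(x0))
    r_total = np.zeros_like(x)
    for _ in range(max_iter):
        fx = f(x)
        if np.sign(fx) != sign0:          # label flipped: stop
            break
        w = grad_f(x)
        # closed-form step of Eq. (4): project x_i onto the linearized boundary
        r_i = -fx / (np.dot(w, w) + 1e-12) * w
        x = x + r_i
        r_total = r_total + r_i
    return (1 + eta) * r_total            # overshoot to cross the boundary
```

For an affine f this reduces to a single step and recovers the projection of Eq. (3).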
We now extend the DeepFool method to the multiclass case. The most commonly used scheme for multiclass classifiers is one-vs-all, hence we also propose our method based on this classification scheme. In this scheme, the classifier has c outputs, where c is the number of classes. Therefore, a classifier can be defined as f : R^n → R^c and the classification is done by the following mapping:
\hat{k}(x)=\arg\max_{k} f_{k}(x),\tag{5}
where f_k(x) is the output of f(x) that corresponds to the k-th class. Similarly to the binary case, we first present the proposed approach for the linear case and then we generalize it to other classifiers.
Let f(x) = W^T x + b be an affine classifier, for a given W and b. Since the mapping k̂ is the outcome of a one-vs-all classification scheme, the minimal perturbation to fool the classifier can be rewritten as follows:
\arg\min_{r}\lVert r\rVert_{2}\\ s.t.~\exists k:\omega_{k}^{T}(x_{0}+r)+b_{k}\geq \omega_{\hat{k}(x_{0})}^{T}(x_{0}+r)+b_{\hat{k}(x_{0})},\tag{6}
where ω_k is the k-th column of W. Geometrically, the above problem corresponds to the computation of the distance between x0 and the complement of the convex polyhedron P,
P=\bigcap\limits_{k=1}^{c}\{ x:f_{\hat{k}(x_{0})}(x)\geq f_{k}(x) \},\tag{7}
where x0 is located inside P. The polyhedron P defines the region of the space where f outputs the label k̂(x0). This setting is depicted in Figure 4. The solution to the problem in Eq. (6) can be computed in closed form as follows. Define l̂(x0) to be the closest hyperplane of the boundary of P (see Figure 4). Formally, l̂(x0) can be computed as
\hat{l}(x_{0})=\arg\min_{k\neq\hat{k}(x_{0})}\frac{\lvert f_{k}(x_{0})-f_{\hat{k}(x_{0})}(x_{0}) \rvert}{\lVert \omega_{k}-\omega_{\hat{k}(x_{0})} \rVert_{2}}.\tag{8}
The minimum perturbation r_*(x0) is the vector that projects x0 on the hyperplane indexed by l̂(x0), i.e.,
r_{\star}(x_{0})=\frac{\lvert f_{\hat{l}(x_{0})}(x_{0})-f_{\hat{k}(x_{0})}(x_{0}) \rvert}{\lVert \omega_{\hat{l}(x_{0})}-\omega_{\hat{k}(x_{0})} \rVert_{2}^{2}}(\omega_{\hat{l}(x_{0})}-\omega_{\hat{k}(x_{0})}).\tag{9}
In other words, we find the closest projection of x0 on faces of P.
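For the affine one-vs-all case, Eqs. (8)-(9) can be implemented directly. The sketch below (the function name and the convention that W stores ω_k as its k-th column are assumptions for illustration) picks the closest face of the polyhedron P and projects x0 onto it:

```python
import numpy as np

def deepfool_affine_multiclass(x0, W, b):
    """Closed-form minimal perturbation for f(x) = W^T x + b (Eqs. (8)-(9))."""
    scores = W.T @ x0 + b
    k_hat = int(np.argmax(scores))                  # predicted label, Eq. (5)
    l_hat, best = None, np.inf
    for k in range(W.shape[1]):
        if k == k_hat:
            continue
        w_diff = W[:, k] - W[:, k_hat]
        dist = abs(scores[k] - scores[k_hat]) / (np.linalg.norm(w_diff) + 1e-12)
        if dist < best:                             # closest hyperplane, Eq. (8)
            l_hat, best = k, dist
    w_diff = W[:, l_hat] - W[:, k_hat]
    f_diff = abs(scores[l_hat] - scores[k_hat])
    # orthogonal projection of x0 onto that face, Eq. (9)
    return f_diff / (np.linalg.norm(w_diff) ** 2 + 1e-12) * w_diff
```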
We now extend the DeepFool algorithm to the general case of multiclass differentiable classifiers. For general non-linear classifiers, the set P in Eq. (7) that describes the region of the space where the classifier outputs the label k̂(x0) is no longer a polyhedron. Following the iterative linearization procedure explained in the binary case, we approximate the set P at iteration i by a polyhedron P̃_i:
\tilde{P}_{i}=\bigcap\limits_{k=1}^{c}\{ x: f_{k}(x_{i})-f_{\hat{k}(x_{0})}(x_{i})+\nabla f_{k}(x_{i})^{T}x-\nabla f_{\hat{k}(x_{0})}(x_{i})^{T}x\leq 0 \},\tag{10}
We then approximate, at iteration i, the distance between x_i and the complement of P, dist(x_i, P^c), by dist(x_i, P̃_i^c). Specifically, at each iteration of the algorithm, the perturbation vector that reaches the boundary of the polyhedron P̃_i is computed, and the current estimate is updated. The method is given in Algorithm 2. It should be noted that the proposed algorithm operates in a greedy way and is not guaranteed to converge to the optimal perturbation in (1). However, we have observed in practice that our algorithm yields very small perturbations which are believed to be good approximations of the minimal perturbation.
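A minimal PyTorch sketch of this general multiclass procedure (Algorithm 2), assuming model maps a single input tensor to a 1-D vector of class scores; the names and the plain Python loops over classes are illustrative, not an efficient reference implementation:

```python
import torch

def deepfool_multiclass(x0, model, eta=0.02, max_iter=50):
    """Iteratively project onto the closest face of the linearized polyhedron, Eq. (10)."""
    x0 = x0.detach()
    k_hat = model(x0).argmax().item()
    x, r_total = x0.clone(), torch.zeros_like(x0)
    for _ in range(max_iter):
        x = x.detach().requires_grad_(True)
        scores = model(x)
        if scores.argmax().item() != k_hat:
            break                                     # classifier fooled: stop
        grads = [torch.autograd.grad(scores[k], x, retain_graph=True)[0]
                 for k in range(scores.numel())]
        best = None
        for k in range(scores.numel()):
            if k == k_hat:
                continue
            w_k = grads[k] - grads[k_hat]             # w'_k
            f_k = (scores[k] - scores[k_hat]).item()  # f'_k
            dist = abs(f_k) / (w_k.norm().item() + 1e-12)
            if best is None or dist < best[0]:
                best = (dist, f_k, w_k)
        _, f_l, w_l = best
        r_i = (abs(f_l) / (w_l.norm().item() ** 2 + 1e-12)) * w_l
        x = x.detach() + r_i
        r_total = r_total + r_i
    return (1 + eta) * r_total
```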
It should be noted that the optimization strategy of DeepFool is strongly tied to existing optimization techniques. In the binary case, it can be seen as Newton’s iterative algorithm for finding roots of a nonlinear system of equations in the underdetermined case [15]. This algorithm is known as the normal flow method. The convergence analysis of this optimization technique can be found for example in [21]. Our algorithm in the binary case can alternatively be seen as a gradient descent algorithm with an adaptive step size that is automatically chosen at each iteration. The linearization in Algorithm 2 is also similar to a sequential convex programming where the constraints are linearized at each step.
In this paper, we have measured the perturbations using the ℓ2 norm. Our framework is however not limited to this choice, and the proposed algorithm can simply be adapted to find minimal adversarial perturbations for any ℓp norm (p ∈ [1, ∞)). To do so, the update steps in lines 10 and 11 of Algorithm 2 must respectively be substituted by the following updates
\hat{l} =\arg\min_{k\neq\hat{k}(x_{0})}\frac{\lvert f_{k}^{\prime} \rvert}{\lVert \omega_{k}^{\prime} \rVert_{q}},\tag{11}
r_{i} \gets \frac{\lvert f_{\hat{l}}^{\prime} \rvert}{\lVert \omega_{\hat{l}}^{\prime} \rVert_{q}^{q}}\lvert \omega_{\hat{l}}^{\prime} \rvert^{q-1}\odot sign(\omega_{\hat{l}}^{\prime}),\tag{12}
where ⊙ is the pointwise product and q = p/(p − 1). In particular, when p = ∞ (i.e., the supremum norm ℓ∞), these update steps become
\hat{l} =\arg\min_{k\neq\hat{k}(x_{0})}\frac{\lvert f_{k}^{\prime} \rvert}{\lVert \omega_{k}^{\prime} \rVert_{1}},\tag{13}
r_{i} \gets \frac{\lvert f_{\hat{l}}^{\prime} \rvert}{\lVert \omega_{\hat{l}}^{\prime} \rVert_{1}}sign(\omega_{\hat{l}}^{\prime}).\tag{14}
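As a sketch of how lines 10 and 11 change for p = ∞ (Eqs. (13)-(14)), assuming the per-class quantities f'_k and w'_k from Algorithm 2 are already stacked into arrays (all names are illustrative):

```python
import numpy as np

def linf_step(f_prime, w_prime):
    """f_prime: shape (c-1,) values f'_k;  w_prime: shape (c-1, n) vectors w'_k.
    Returns the l_inf DeepFool update of Eqs. (13)-(14)."""
    l1_norms = np.abs(w_prime).sum(axis=1) + 1e-12
    l_hat = int(np.argmin(np.abs(f_prime) / l1_norms))                   # Eq. (13)
    return np.abs(f_prime[l_hat]) / l1_norms[l_hat] * np.sign(w_prime[l_hat])  # Eq. (14)
```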
We now test our DeepFool algorithm on deep convolutional neural network architectures applied to the MNIST, CIFAR-10, and ImageNet image classification datasets. We consider the following deep neural network architectures:
MNIST: A two-layer fully connected network, and a two-layer LeNet convolutional neural network architecture [9]. Both networks are trained with SGD with momentum using the MatConvNet [20] package (a small sketch of such a two-layer network is given after this list).
CIFAR-10: We trained a three-layer LeNet architecture, as well as a Network In Network (NIN) architecture [11].
ILSVRC 2012: We used CaffeNet [7] and GoogLeNet [17] pre-trained models.
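For illustration, a hypothetical PyTorch version of a two-layer fully connected MNIST classifier trained with SGD with momentum; the hidden width, learning rate, and momentum value are assumptions (the paper trains its models with MatConvNet and does not fix these details here):

```python
import torch
import torch.nn as nn

class TwoLayerFC(nn.Module):
    """Hypothetical two-layer fully connected MNIST classifier (widths assumed)."""
    def __init__(self, hidden=500, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = TwoLayerFC()
# SGD with momentum, as in the paper's training setup (values assumed)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```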
In order to evaluate the robustness of a classifier f to adversarial perturbations, we compute the average robustness ρ̂_adv(f), defined by
\hat{\rho}_{adv}(f)=\frac{1}{\lvert \mathfrak{D} \rvert}\sum\limits_{x\in\mathfrak{D}}\frac{\lVert \hat{r}(x) \rVert_{2}}{\lVert x \rVert_{2}},\tag{15}
where r̂(x) is the estimated minimal perturbation obtained using DeepFool, and D denotes the test set.
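A small sketch of Eq. (15), assuming the test images and their DeepFool perturbations are already available as arrays (function and variable names are illustrative):

```python
import numpy as np

def average_robustness(test_images, perturbations):
    """Empirical rho_adv of Eq. (15): mean of ||r(x)||_2 / ||x||_2 over the test
    set, where perturbations[i] is the estimated perturbation for test_images[i]."""
    ratios = [np.linalg.norm(r) / (np.linalg.norm(x) + 1e-12)
              for x, r in zip(test_images, perturbations)]
    return float(np.mean(ratios))
```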
We compare the proposed DeepFool approach to state-of-the-art techniques for computing adversarial perturbations, namely [18] and [4]. The method in [18] solves a series of penalized optimization problems to find the minimal perturbation, whereas [4] estimates the minimal perturbation by taking the sign of the gradient
\hat{r}(x)=\epsilon\, sign(\nabla_{x}J(\theta,x,y)),\tag{16}
with J the cost used to train the neural network, θ the model parameters, and y the label of x. This method is called the fast gradient sign method. In practice, in the absence of general rules to choose the parameter ε, we chose the smallest ε such that 90% of the data are misclassified after perturbation.
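For reference, a minimal PyTorch sketch of the fast gradient sign baseline of Eq. (16); using the cross-entropy loss for J is an assumption, as are all names here:

```python
import torch
import torch.nn.functional as F

def fast_gradient_sign(model, x, y, eps):
    """Baseline of Eq. (16): r(x) = eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # J: training loss (assumed cross-entropy)
    loss.backward()
    return eps * x.grad.sign()
```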
We report in Table 1 the accuracy and average robustness ρˆadv of each classifier computed using different methods. We also show the running time required for each method to compute one adversarial sample. It can be seen that DeepFool estimates smaller perturbations (hence closer to the minimal perturbation defined in (1)) than the ones computed using the competitive approaches. For example, the average perturbation obtained using DeepFool is 5 times lower than the one estimated with [4]. On the ILSVRC2012 challenge dataset, the average perturbation is one order of magnitude smaller compared to the fast gradient method. It should be noted moreover that the proposed approach also yields slightly smaller perturbation vectors than the method in [18]. The proposed approach is hence more accurate in detecting directions that can potentially fool neural networks. As a result, DeepFool can be used as a valuable tool to accurately assess the robustness of classifiers. On the complexity aspect, the proposed approach is substantially faster than the standard method proposed in [18]. In fact, while the approach of [18] involves a costly minimization of a series of objective functions, we observed empirically that DeepFool converges in a few iterations (i.e., less than 3) to a perturbation vector that fools the classifier. Hence, the proposed approach reaches a more accurate perturbation vector compared to state-of-the-art methods, while being computationally efficient. This makes it readily suitable to be used as a baseline method to estimate the robustness of very deep neural networks on large-scale datasets. In that context, we provide the first quantitative evaluation of the robustness of state-of-the-art classifiers on the large-scale ImageNet dataset. It can be seen that despite their very good test accuracy, these methods are extremely unstable to adversarial perturbations: a perturbation that is 1000 times smaller in magnitude than the original image is sufficient to fool state-of-the-art deep neural networks.
We illustrate in Figure 1 perturbed images generated by the fast gradient sign and DeepFool. It can be observed that the proposed method generates adversarial perturbations which are hardly perceptible, while the fast gradient sign method outputs a perturbation image with higher norm.
It should be noted that, when perturbations are measured using the ℓ∞ norm, the above conclusions remain unchanged: DeepFool yields adversarial perturbations that are smaller (hence closer to the optimum) compared to other methods for computing adversarial examples. Table 2 reports the robustness to adversarial perturbations measured by the ℓ∞ analogue of Eq. (15) (i.e., with ‖r̂(x)‖∞/‖x‖∞ inside the sum), where r̂(x) is computed respectively using DeepFool (with p = ∞, see Section 3.3) and the fast gradient sign method, for the MNIST and CIFAR-10 tasks.
Fine-tuning using adversarial examples. In this section, we fine-tune the networks of Table 1 on adversarial examples to build more robust classifiers for the MNIST and CIFAR-10 tasks. Specifically, for each network, we performed two experiments: (i) fine-tuning the network on DeepFool's adversarial examples, and (ii) fine-tuning the network on the fast gradient sign adversarial examples. We fine-tune the networks by performing 5 additional epochs, with a 50% decreased learning rate, only on the perturbed training set. For each experiment, the same training data was used through all 5 extra epochs. For the sake of completeness, we also performed 5 extra epochs on the original data. The evolution of ρˆadv for the different fine-tuning strategies is shown in Figures 6a to 6d, where the robustness ρˆadv is estimated using DeepFool, since this is the most accurate method, as shown in Table 1. Observe that fine-tuning with DeepFool adversarial examples significantly increases the robustness of the networks to adversarial perturbations even after one extra epoch. For example, the robustness of the networks on MNIST is improved by 50% and NIN's robustness is increased by about 40%. On the other hand, quite surprisingly, the method in [4] can lead to a decreased robustness to adversarial perturbations of the network. We hypothesize that this behavior is due to the fact that perturbations estimated using the fast gradient sign method are much larger than minimal adversarial perturbations. Fine-tuning the network with overly perturbed images decreases the robustness of the networks to adversarial perturbations. To verify this hypothesis, we compare in Figure 7 the adversarial robustness of a network that is fine-tuned with the adversarial examples obtained using DeepFool, where norms of perturbations have been deliberately multiplied by α = 1, 2, 3. Interestingly, we see that by magnifying the norms of the adversarial perturbations, the robustness of the fine-tuned network is decreased. This might explain why overly perturbed images decrease the robustness of MNIST networks: these perturbations can really change the class of the digits, hence fine-tuning based on these examples can lead to a drop of the robustness (for an illustration, see Figure 8). This lends credence to our hypothesis, and further shows the importance of designing accurate methods to compute minimal perturbations.
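The fine-tuning protocol above can be sketched as a short extra training loop on the fixed perturbed training set; the data loader, loss, and reduced learning rate below are assumptions for illustration, not the paper's exact MatConvNet setup:

```python
import torch
import torch.nn.functional as F

def finetune_on_adversarial(model, perturbed_loader, epochs=5, lr=0.001):
    """Run a few extra epochs on a fixed set of perturbed samples.

    perturbed_loader yields (x + r(x), y) pairs, i.e. precomputed adversarial
    examples with their original labels; lr stands in for the reduced learning rate."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x_adv, y in perturbed_loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            optimizer.step()
    return model
```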
Table 3 lists the accuracies of the fine-tuned networks. It can be seen that fine-tuning with DeepFool can improve the accuracy of the networks. Conversely, fine-tuning with the approach in [4] has led to a decrease of the test accuracy in all our experiments. This confirms the explanation that the fast gradient sign method outputs overly perturbed images that lead to images that are unlikely to occur in the test data. Hence, it decreases the performance of the method as it acts as a regularizer that does not represent the distribution of the original data. This effect is analogous to geometric data augmentation schemes, where large transformations of the original samples have a counter-productive effect on generalization.
To emphasize the importance of a correct estimation of the minimal perturbation, we now show that using approximate methods can lead to wrong conclusions regarding the adversarial robustness of networks. We fine-tune the NIN classifier on the fast gradient sign adversarial examples. We follow the procedure described earlier but this time, we decreased the learning rate by 90%. We have evaluated the adversarial robustness of this network at different extra epochs using DeepFool and the fast gradient sign method. As one can see in Figure 9, the red plot exaggerates the effect of training on the adversarial examples. Moreover, it is not sensitive enough to demonstrate the loss of robustness at the first extra epoch. These observations confirm that using an accurate tool to measure the robustness of classifiers is crucial to derive conclusions about the robustness of networks.
In this work, we proposed an algorithm, DeepFool, to compute adversarial examples that fool state-of-the-art classifiers. It is based on an iterative linearization of the classifier to generate minimal perturbations that are sufficient to change classification labels. We provided extensive experimental evidence on three datasets and eight classifiers, showing the superiority of the proposed method over stateof-the-art methods to compute adversarial perturbations, as well as the efficiency of the proposed approach. Due to its accurate estimation of the adversarial perturbations, the proposed DeepFool algorithm provides an efficient and accurate way to evaluate the robustness of classifiers and to enhance their performance by proper fine-tuning. The proposed approach can therefore be used as a reliable tool to accurately estimate the minimal perturbation vectors, and build more robust classifiers.
[1] D. Chicco, P. Sadowski, and P. Baldi. Deep autoencoder neural networks for gene ontology annotation predictions. In ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pages 533–540, 2014.
[2] A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classifiers’ robustness to adversarial perturbations. CoRR, abs/1502.02590, 2015.
[3] A. Fawzi and P. Frossard. Manitest: Are classifiers really invariant? In British Machine Vision Conference (BMVC), pages 106.1–106.13, 2015.
[4] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
[5] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples. CoRR, abs/1412.5068, 2014.
[6] G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82–97, 2012.
[7] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (MM), pages 675–678. ACM, 2014.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS), pages 1097–1105, 2012.
[9] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio. Object recognition with gradient-based learning. In Shape, contour and grouping in computer vision, pages 319–345. 1999.
[10] Y. LeCun, K. Kavukcuoglu, C. Farabet, et al. Convolutional networks and applications in vision. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 253– 256, 2010.
[11] M. Lin, Q. Chen, and S. Yan. Network in network. 2014.
[12] T. Mikolov, A. Deoras, D. Povey, L. Burget, and J. Černocký. Strategies for training large scale neural network language models. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 196–201, 2011.
[13] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015.
[14] B. Pepik, R. Benenson, T. Ritschel, and B. Schiele. What is holding back convnets for detection? In Pattern Recognition, pages 517–528. Springer, 2015.
[15] A. P. Ruszczyński. Nonlinear optimization, volume 13. Princeton University Press, 2006.
[16] M. Spencer, J. Eickholt, and J. Cheng. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 12(1):103–112, 2015.
[17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1– 9, 2015.
[18] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[19] C.-Y. Tsai and D. Cox. Are deep learning algorithms easily hackable? http://coxlab.github.io/ ostrichinator.
[20] A. Vedaldi and K. Lenc. Matconvnet: Convolutional neural networks for matlab. In ACM International Conference on Multimedia (MM), pages 689–692, 2015.
[21] H. F. Walker and L. T. Watson. Least-change secant update methods for underdetermined systems. SIAM Journal on numerical analysis, 27(5):1227–1262, 1990.