利用输入多样性提高对抗样本的可转移性

Improving Transferability of Adversarial Examples with Input Diversity

原文链接:

https://openaccess.thecvf.com/content_CVPR_2019/papers/Xie_Improving_Transferability_of_Adversarial_Examples_With_Input_Diversity_CVPR_2019_paper.pdf

GB/T 7714 Xie C, Zhang Z, Zhou Y, et al. Improving transferability of adversarial examples with input diversity[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2730-2739.

MLA Xie, Cihang, et al. "Improving transferability of adversarial examples with input diversity." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

APA Xie, C., Zhang, Z., Zhou, Y., Bai, S., Wang, J., Ren, Z., & Yuille, A. L. (2019). Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2730-2739).

Abstract

摘要

Though CNNs have achieved the state-of-the-art performance on various vision tasks, they are vulnerable to adversarial examples — crafted by adding human-imperceptible perturbations to clean images. However, most of the existing adversarial attacks only achieve relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate adversarial examples, our method applies random transformations to the input images at each iteration. Extensive experiments on ImageNet show that the proposed attack method can generate adversarial examples that transfer much better to different networks than existing baselines. By evaluating our method against top defense solutions and official baselines from NIPS 2017 adversarial competition, the enhanced attack reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a strong benchmark baseline for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future. Code is available at https://github.com/cihangxie/DI-2-FGSM.

尽管 CNN 已经在各种视觉任务上取得了最先进的性能,但它们仍然容易受到对抗样本的攻击,即通过在干净图像上添加人眼无法察觉的扰动而构造的图像。然而,在攻击者不知道模型结构和参数的、具有挑战性的黑盒设置下,现有的大多数对抗攻击只能取得相对较低的成功率。为此,我们提出通过构造多样化的输入模式来提高对抗样本的可转移性。我们的方法不是只使用原始图像来生成对抗样本,而是在每次迭代时对输入图像施加随机变换。在 ImageNet 上的大量实验表明,与现有基线相比,所提出的攻击方法生成的对抗样本能够更好地迁移到不同的网络。在 NIPS 2017 对抗攻防竞赛的顶级防御方案和官方基线上评估我们的方法,增强后的攻击平均成功率达到 73.0%,比该竞赛中排名第一的攻击提交高出 6.6%。我们希望所提出的攻击策略能够作为一个强有力的基准,用于在未来评估网络对攻击者的鲁棒性以及不同防御方法的有效性。代码可在 https://github.com/cihangxie/DI-2-FGSM 获取。

1. Introduction

1. 介绍

Recent success of Convolutional Neural Networks (CNNs) leads to a dramatic performance improvement on various vision tasks, including image classification [15, 32, 13], object detection [10, 28, 40] and semantic segmentation [22, 5]. However, CNNs are extremely vulnerable to small perturbations to the input images, i.e., humanimperceptible additive perturbations can result in failure predictions of CNNs. These intentionally crafted images are known as adversarial examples [36]. Learning how to generate adversarial examples can help us investigate the robustness of different models [1] and understand the insufficiency of current training algorithms [11, 17, 37].

卷积神经网络(CNN)近年来的成功使其在图像分类[15,32,13]、目标检测[10,28,40]和语义分割[22,5]等各种视觉任务上的性能得到了显著提升。然而,CNN 对输入图像的微小扰动极为脆弱,即人眼无法察觉的加性扰动就能导致 CNN 做出错误预测。这些刻意构造的图像被称为对抗样本[36]。学习如何生成对抗样本,可以帮助我们研究不同模型的鲁棒性[1],并理解当前训练算法的不足[11,17,37]。

Several methods [11, 36, 16] have been proposed recently to find adversarial examples. In general, these attacks can be categorized into two types according to the number of steps of gradient computation, i.e., single-step attacks [11] and iterative attacks [36, 16]. Generally, iterative attacks can achieve higher success rates than single-step attacks in the white-box setting, where the attackers have a perfect knowledge of the network structure and weights. However, if these adversarial examples are tested on a different network (either in terms of network structure, weights or both), i.e., the black-box setting, single-step attacks perform better. This trade-off is due to the fact that iterative attacks tend to overfit the specific network parameters (i.e., have high white-box success rates) and thus making generated adversarial examples rarely transfer to other networks (i.e., have low black-box success rates), while single-step attacks usually underfit to the network parameters (i.e., have low white-box success rates) thus producing adversarial examples with slightly better transferability. Observing the phenomenon, one interesting question is whether we can generate adversarial examples with high success rates under both white-box and black-box settings.

最近提出了几种寻找对抗样本的方法[11,36,16]。一般来说,根据梯度计算的步数,这些攻击可以分为两类:单步攻击[11]和迭代攻击[36,16]。通常,在攻击者完全了解网络结构和权重的白盒设置下,迭代攻击的成功率高于单步攻击。然而,如果将这些对抗样本放到另一个不同的网络上测试(网络结构不同、权重不同,或两者都不同),即黑盒设置,单步攻击反而表现更好。这种权衡源于如下事实:迭代攻击容易过拟合特定的网络参数(即白盒成功率高),从而使生成的对抗样本很难迁移到其他网络(即黑盒成功率低);而单步攻击通常对网络参数欠拟合(即白盒成功率低),因此生成的对抗样本的可转移性略好。观察到这一现象,一个有趣的问题是:我们能否生成在白盒和黑盒设置下都具有高成功率的对抗样本。

In this work, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Our work is inspired by the data augmentation [15, 32, 13] strategy, which has been proven effective to prevent networks from overfitting by applying a set of label-preserving transformations (e.g., resizing, cropping and rotating) to training images. Meanwhile, [38, 12] showed that image transformations can defend against adversarial examples under certain situations, which indicates adversarial examples cannot generalize well under different transformations. These transformed adversarial examples are known as hard examples [30, 31] for attackers, which can then be served as good samples to produce more transferable adversarial examples.

在这项工作中,我们建议通过创建不同的输入模式来提高对抗性示例的可移植性。我们的工作受到了数据增强[15,32,13]策略的启发,该策略已被证明可以通过对训练图像应用一组标签保持变换(如缩放、裁剪和旋转)有效地防止网络过拟合。同时[38,12]表明,图像变换在某些情况下可以抵抗对抗性例子,这说明对抗性例子在不同的变换下不能很好地推广。对攻击者来说,这些转换后的对抗例子被称为硬例子[30,31],然后可以作为好的样本,产生更多可转移的对抗例子。

We incorporate the proposed input diversity strategy with iterative attacks, e.g., I-FGSM [17] and MI-FGSM [9]. At each iteration, unlike the traditional methods which maximize the loss function directly w.r.t. the original inputs, we apply random and differentiable transformations (e.g., random resizing, random padding) to the input images with probability p and maximize the loss function w.r.t. these transformed inputs. Note that these randomized operations were previously used to defend against adversarial examples [38], while here we incorporate them into the attack process to create hard and diverse input patterns. Fig. 1 shows an adversarial example generated by our method and compares the success rates to other attack methods under both white-box and black-box settings.

我们将所提出的输入多样性策略与迭代攻击相结合,例如 I-FGSM[17] 和 MI-FGSM[9]。在每次迭代中,与直接针对原始输入最大化损失函数的传统方法不同,我们以概率 p 对输入图像施加随机且可微的变换(例如随机缩放、随机填充),并针对这些变换后的输入最大化损失函数。需要注意的是,这些随机化操作此前被用于防御对抗样本[38],而在这里我们将它们融入攻击过程,以构造困难且多样化的输入模式。图1给出了由我们的方法生成的一个对抗样本,并比较了各攻击方法在白盒和黑盒设置下的成功率。

We test the proposed input diversity on several networks under both white-box and black-box settings, as well as under both single-model and multi-model settings. Compared with traditional iterative attacks, the results on ImageNet (see Sec. 4.2) show that our method gets significantly higher success rates for black-box models and maintains similar success rates for white-box models. By evaluating our attack method w.r.t. the top defense solutions and official baselines from the NIPS 2017 adversarial competition [18], this enhanced attack reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a benchmark for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in the future.

我们在白盒与黑盒、单模型与多模型等设置下,在多个网络上测试了所提出的输入多样性策略。与传统迭代攻击相比,ImageNet 上的结果(见第4.2节)表明,我们的方法在黑盒模型上取得了显著更高的成功率,同时在白盒模型上保持了相近的成功率。在 NIPS 2017 对抗攻防竞赛[18]的顶级防御方案和官方基线上评估我们的攻击方法,该增强攻击的平均成功率达到 73.0%,比竞赛中排名第一的攻击提交高出 6.6%。我们希望所提出的攻击策略能够在未来作为评估网络对攻击者的鲁棒性以及不同防御方法有效性的基准。

2. Related Work

2. 相关工作

2.1. Generating Adversarial Examples

2.1. 生成对抗样本

Traditional machine learning algorithms are known to be vulnerable to adversarial examples [7, 14, 3]. Recently, Szegedy et al. [36] pointed out that CNNs are also fragile to adversarial examples, and proposed a box-constrained L-BFGS method to find adversarial examples reliably. Due to the expensive computation in [36], Goodfellow et al. [11] proposed the fast gradient sign method to generate adversarial examples efficiently by performing a single gradient step. This method was extended by Kurakin et al. [16] to an iterative version, who showed that the generated adversarial examples can exist in the physical world. Dong et al. [9] proposed a broad class of momentum-based iterative algorithms to boost the transferability of adversarial examples. The transferability can also be improved by attacking an ensemble of networks simultaneously [21]. Besides image classification, adversarial examples also exist in object detection [39], semantic segmentation [39, 6], speech recognition [6], deep reinforcement learning [20], etc. Unlike adversarial examples which can be recognized by humans, Nguyen et al. [25] generated fooling images that are different from natural images and difficult for humans to recognize, but CNNs classify these images with high confidence.

众所周知,传统机器学习算法容易受到对抗样本的攻击[7,14,3]。最近,Szegedy 等人[36]指出 CNN 同样对对抗样本十分脆弱,并提出了带框约束的 L-BFGS 方法来可靠地寻找对抗样本。由于[36]的计算代价高昂,Goodfellow 等人[11]提出了快速梯度符号法,只需执行一步梯度更新即可高效地生成对抗样本。Kurakin 等人[16]将该方法扩展为迭代版本,并证明了生成的对抗样本可以存在于物理世界中。Dong 等人[9]提出了一大类基于动量的迭代算法来提升对抗样本的可转移性。同时攻击由多个网络组成的集成[21]也可以提高可转移性。除图像分类外,对抗样本也存在于目标检测[39]、语义分割[39,6]、语音识别[6]、深度强化学习[20]等任务中。与人类可以识别的对抗样本不同,Nguyen 等人[25]生成的欺骗图像与自然图像不同,人类难以识别,但 CNN 却以很高的置信度对这些图像进行分类。

Our proposed input diversity is also related to EOT [2]. These two works differ in several aspects: (1) we mainly focus on the challenging black-box setting while [2] focuses on the white-box setting; (2) our work aims at alleviating overfitting in adversarial attacks, while [2] aims at making adversarial examples robust to transformations, without any discussion of overfitting; and (3) we do not apply expectation step in each attack iteration, while “expectation” is the core idea in [2].

我们提出的输入多样性也与 EOT[2] 相关。这两项工作在以下几个方面有所不同:(1)我们主要关注具有挑战性的黑盒设置,而[2]关注白盒设置;(2)我们的工作旨在缓解对抗攻击中的过拟合,而[2]旨在使对抗样本对图像变换具有鲁棒性,并未讨论过拟合;(3)我们并不在每次攻击迭代中执行期望步骤,而"期望"正是[2]的核心思想。

2.2. Defending Against Adversarial Examples

2.2. 防御对抗样本

Conversely, many methods have been proposed recently to defend against adversarial examples. [11, 17] proposed to inject adversarial examples into the training data to increase the network robustness. Tramèr et al. [37] pointed out that such adversarially trained models still remain vulnerable to adversarial examples, and proposed ensemble adversarial training, which augments training data with perturbations transferred from other models, in order to improve the network robustness further. [38, 12] applied randomized image transformations to inputs at inference time to mitigate adversarial effects. Dhillon et al. [8] pruned a random subset of activations according to their magnitude to enhance network robustness. Prakash et al. [27] proposed a framework which combines pixel deflection with soft wavelet denoising to defend against adversarial examples. [24, 33, 29] leveraged generative models to purify adversarial images by moving them back towards the distribution of clean images.

相反地,最近提出了许多防御对抗样本的方法。[11,17]提出向训练数据中注入对抗样本以提高网络的鲁棒性。Tramèr 等人[37]指出,这种经过对抗训练的模型仍然容易受到对抗样本的攻击,并提出了集成对抗训练,即利用从其他模型迁移来的扰动来扩充训练数据,从而进一步提高网络的鲁棒性。[38,12]在推理阶段对输入施加随机图像变换来减轻对抗扰动的影响。Dhillon 等人[8]根据激活值的大小随机剪除一部分激活,以增强网络的鲁棒性。Prakash 等人[27]提出了一个将像素偏转与软小波去噪相结合的框架来防御对抗样本。[24,33,29]利用生成模型将对抗图像拉回干净图像的分布,从而净化对抗图像。

3. Methodology

3. 方法

Let $X$ denote an image, and $y^{true}$ denote the corresponding ground-truth label. We use $\theta$ to denote the network parameters, and $L(X, y^{true}; \theta)$ to denote the loss. To generate the adversarial example, the goal is to maximize the loss $L(X + r, y^{true}; \theta)$ for the image $X$, under the constraint that the generated adversarial example $X^{adv} = X + r$ should look visually similar to the original image $X$ and the corresponding predicted label satisfies $y^{adv} \neq y^{true}$. In this work, we use the $l_\infty$-norm to measure the perceptibility of adversarial perturbations, i.e., $\|r\|_\infty \leq \epsilon$. The loss function is defined as

设 $X$ 表示一幅图像,$y^{true}$ 表示其对应的真实标签。我们用 $\theta$ 表示网络参数,用 $L(X, y^{true}; \theta)$ 表示损失。生成对抗样本的目标是针对图像 $X$ 最大化损失 $L(X + r, y^{true}; \theta)$,其约束为:生成的对抗样本 $X^{adv} = X + r$ 在视觉上应与原始图像 $X$ 相似,且对应的预测标签满足 $y^{adv} \neq y^{true}$。在本文中,我们使用 $l_\infty$ 范数来衡量对抗扰动的可感知性,即 $\|r\|_\infty \leq \epsilon$。损失函数定义为
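
The displayed equation did not survive in this copy. Reconstructed from the definitions just below (the one-hot label $1_{y^{true}}$ and the logits $l(X;\theta)$), the cross-entropy loss reads:

$$L(X, y^{true}; \theta) = -\mathbb{1}_{y^{true}} \cdot \log\big(\mathrm{softmax}(l(X; \theta))\big) \qquad (1)$$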

where $1_{y^{true}}$ is the one-hot encoding of the ground-truth label $y^{true}$ and $l(X; \theta)$ is the logits output. Note that all the baseline attacks have been implemented in the cleverhans library [26], which can be used directly for our experiments.

其中 $1_{y^{true}}$ 是真实标签 $y^{true}$ 的 one-hot 编码,$l(X; \theta)$ 是网络输出的 logits。注意,所有基线攻击都已在 cleverhans 库[26]中实现,可以直接用于我们的实验。

3.1. Family of Fast Gradient Sign Methods

3.1. 快速梯度符号法族

In this section, we give an overview of the family of fast gradient sign methods.

在这一节中,我们给出了快速梯度符号方法族的概述。

Fast Gradient Sign Method (FGSM). FGSM [11] is the first member in this attack family, which finds the adversarial perturbations in the direction of the loss gradient $\nabla_X L(X, y^{true}; \theta)$. The update equation is

快速梯度符号法(FGSM)。FGSM[11]是这一攻击家族的第一个成员,它沿损失梯度 $\nabla_X L(X, y^{true}; \theta)$ 的方向寻找对抗扰动。其更新公式为
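
The display was lost in this copy; reconstructed from the description above (a single gradient-sign step of size $\epsilon$), the FGSM update is:

$$X^{adv} = X + \epsilon \cdot \mathrm{sign}\big(\nabla_X L(X, y^{true}; \theta)\big) \qquad (2)$$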

Iterative Fast Gradient Sign Method (I-FGSM). Kurakin et al. [17] extended FGSM to an iterative version, which can be expressed as

迭代快速梯度符号法(I-FGSM)。Kurakin等人[17]将FGSM扩展为迭代版本,可表示为
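
The displayed update, referred to later in the text as Eq. (3), was lost in this copy; reconstructed from the surrounding definitions, the iterative version is:

$$X^{adv}_0 = X, \qquad X^{adv}_{n+1} = \mathrm{Clip}_X^{\epsilon}\Big\{ X^{adv}_n + \alpha \cdot \mathrm{sign}\big(\nabla_X L(X^{adv}_n, y^{true}; \theta)\big) \Big\} \qquad (3)$$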

where $\mathrm{Clip}_X^{\epsilon}$ indicates that the resulting image is clipped within the $\epsilon$-ball of the original image $X$, $n$ is the iteration number and $\alpha$ is the step size.

其中 $\mathrm{Clip}_X^{\epsilon}$ 表示将生成的图像裁剪到原始图像 $X$ 的 $\epsilon$ 球内,$n$ 是迭代次数,$\alpha$ 是步长。

Momentum Iterative Fast Gradient Sign Method (MI-FGSM). MI-FGSM [9] proposed to integrate the momentum term into the attack process to stabilize update directions and escape from poor local maxima. The updating procedure is similar to I-FGSM, with the replacement of Eq. (3) by:

动量迭代快速梯度符号法(MI-FGSM)。MI-FGSM[9]提出将动量项引入攻击过程,以稳定更新方向并摆脱较差的局部极大值。其更新过程与 I-FGSM 类似,只是将式(3)替换为
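
The two displayed equations (the momentum accumulation, referred to later as Eq. (4), and the corresponding update) were lost here; reconstructed from the surrounding description, they read:

$$g_{n+1} = \mu \cdot g_n + \frac{\nabla_X L(X^{adv}_n, y^{true}; \theta)}{\big\| \nabla_X L(X^{adv}_n, y^{true}; \theta) \big\|_1} \qquad (4)$$

$$X^{adv}_{n+1} = \mathrm{Clip}_X^{\epsilon}\big\{ X^{adv}_n + \alpha \cdot \mathrm{sign}(g_{n+1}) \big\} \qquad (5)$$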

where $\mu$ is the decay factor of the momentum term and $g_n$ is the accumulated gradient at iteration $n$.

其中 $\mu$ 是动量项的衰减因子,$g_n$ 是第 $n$ 次迭代时的累积梯度。

3.2. Motivation

3.2. 动机

Let $\hat{\theta}$ denote the unknown network parameters. In general, a strong adversarial example should have high success rates on both white-box models, i.e., $L(X^{adv}, y^{true}; \theta) > L(X, y^{true}; \theta)$, and black-box models, i.e., $L(X^{adv}, y^{true}; \hat{\theta}) > L(X, y^{true}; \hat{\theta})$. On one hand, the traditional single-step attacks, e.g., FGSM, tend to underfit to the specific network parameters $\theta$ due to the inaccurate linear approximation of the loss $L(X, y^{true}; \theta)$, and thus cannot reach high success rates on white-box models. On the other hand, the traditional iterative attacks, e.g., I-FGSM, greedily perturb the images in the direction of the sign of the loss gradient $\nabla_X L(X, y^{true}; \theta)$ at each iteration, and thus easily fall into poor local maxima and overfit to the specific network parameters $\theta$. These overfitted adversarial examples rarely transfer to black-box models. In order to generate adversarial examples with strong transferability, we need to find a better way to optimize the loss $L(X, y^{true}; \theta)$ to alleviate this overfitting phenomenon.

设 $\hat{\theta}$ 表示未知的网络参数。一般来说,一个强的对抗样本应当在白盒模型上具有高成功率,即 $L(X^{adv}, y^{true}; \theta) > L(X, y^{true}; \theta)$,同时在黑盒模型上也具有高成功率,即 $L(X^{adv}, y^{true}; \hat{\theta}) > L(X, y^{true}; \hat{\theta})$。一方面,传统的单步攻击(如 FGSM)由于对损失 $L(X, y^{true}; \theta)$ 的线性近似不够准确,往往对特定的网络参数 $\theta$ 欠拟合,因而无法在白盒模型上达到很高的成功率。另一方面,传统的迭代攻击(如 I-FGSM)在每次迭代中沿损失梯度 $\nabla_X L(X, y^{true}; \theta)$ 的符号方向贪婪地扰动图像,容易陷入较差的局部极大值并过拟合于特定的网络参数 $\theta$。这些过拟合的对抗样本很少能迁移到黑盒模型。为了生成可转移性强的对抗样本,我们需要找到更好的方式来优化损失 $L(X, y^{true}; \theta)$,以缓解这种过拟合现象。

Data augmentation [15, 32, 13] is shown as an effective way to prevent networks from overfitting during the training process. Meanwhile, [38, 12] showed that adversarial examples are no longer malicious if simple image transformations are applied, which indicates these transformed adversarial images can serve as good samples for better optimization. Those facts inspire us to apply random and differentiable transformations to the inputs for the sake of the transferability of adversarial examples.

数据增强[15,32,13]被证明是在训练过程中防止网络过拟合的一种有效方法。同时,[38,12]表明,只要对图像施加简单变换,对抗样本往往就不再具有对抗性,这说明这些变换后的对抗图像可以作为更好优化的良好样本。这些事实启发我们对输入施加随机且可微的变换,以提升对抗样本的可转移性。

3.3. Diverse Input Patterns

3.3. 不同的输入模式

Based on the analysis above, we aim at generating more transferable adversarial examples via diverse input patterns.

基于上述分析,我们的目标是通过多样化的输入模式生成可转移性更强的对抗样本。

DI2-FGSM. First, we propose the Diverse Inputs Iterative Fast Gradient Sign Method (DI2-FGSM), which applies image transformations $T(\cdot)$ to the inputs with the probability $p$ at each iteration of I-FGSM [17] to alleviate the overfitting phenomenon.

DI2-FGSM。首先,我们提出多样化输入迭代快速梯度符号法(DI2-FGSM),该方法在 I-FGSM[17] 的每次迭代中,以概率 $p$ 对输入施加图像变换 $T(\cdot)$,以缓解过拟合现象。

In this paper, we consider random resizing, which resizes the input images to a random size, and random padding, which pads zeros around the input images in a random manner [38], as the instantiation of the image transformations $T(\cdot)$. The transformation probability $p$ controls the trade-off between success rates on white-box models and success rates on black-box models, which can be observed from Fig. 4. If $p = 0$, DI2-FGSM degrades to I-FGSM and leads to overfitting. If $p = 1$, i.e., only transformed inputs are used for the attack, the generated adversarial examples tend to have much higher success rates on black-box models but lower success rates on white-box models, since the original inputs are not seen by the attackers.

在本文中,我们将随机缩放(将输入图像缩放到随机尺寸)和随机填充(以随机方式在输入图像四周填充零[38])作为图像变换 $T(\cdot)$ 的具体实现。变换概率 $p$ 控制白盒模型成功率与黑盒模型成功率之间的权衡,这一点可以从图4中观察到。当 $p = 0$ 时,DI2-FGSM 退化为 I-FGSM,从而导致过拟合。当 $p = 1$ 时,即只使用变换后的输入进行攻击,生成的对抗样本往往在黑盒模型上成功率高得多,而在白盒模型上成功率更低,因为攻击者完全没有见过原始输入。
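
A minimal PyTorch sketch of this random resize-and-pad transformation, for illustration only (the released reference code is TensorFlow/cleverhans-based). The size range [299, 330) follows the implementation details in Sec. 4.1; the function name `input_diversity` and the final resize back to the original resolution (to keep tensor shapes fixed) are our own simplifications, not the paper's exact code:

```python
import torch
import torch.nn.functional as F

def input_diversity(x, low=299, high=330, p=0.5):
    """Random resize + random zero-padding, applied with probability p.

    x: image batch of shape (N, C, H, W). With probability 1 - p the input is
    returned unchanged; otherwise it is resized to a random rnd x rnd with
    rnd in [low, high), zero-padded to high x high at a random offset, and
    (as a simplification) resized back to the original H x W.
    """
    if torch.rand(1).item() >= p:
        return x
    h, w = x.shape[-2:]
    rnd = int(torch.randint(low, high, (1,)).item())
    resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad_total = high - rnd
    pad_left = int(torch.randint(0, pad_total + 1, (1,)).item())
    pad_top = int(torch.randint(0, pad_total + 1, (1,)).item())
    padded = F.pad(resized, (pad_left, pad_total - pad_left,
                             pad_top, pad_total - pad_top), value=0.0)
    return F.interpolate(padded, size=(h, w), mode="nearest")
```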

In general, the updating procedure of DI2-FGSM is similar to I-FGSM, with the replacement of Eq. (3) by

一般来说,DI2-FGSM 的更新过程与 I-FGSM 类似,只是将式(3)替换为
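
The replacement update was lost in this copy; reconstructed from the surrounding definitions, it takes the gradient of the loss on the stochastically transformed input $T(X^{adv}_n; p)$:

$$X^{adv}_{n+1} = \mathrm{Clip}_X^{\epsilon}\Big\{ X^{adv}_n + \alpha \cdot \mathrm{sign}\big(\nabla_X L\big(T(X^{adv}_n; p), y^{true}; \theta\big)\big) \Big\}$$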

图2。不同攻击之间的关系。通过设置变换概率 $p$、衰减因子 $\mu$ 和总迭代次数 $N$ 的取值,我们可以把这些攻击关联到同一个快速梯度符号方法家族中。

where the stochastic transformation function $T(X_n^{adv}; p)$ is

其中随机变换函数 $T(X_n^{adv}; p)$ 定义为
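
The piecewise definition was lost here; as described in the text, $T(X^{adv}_n; p)$ applies the transformation with probability $p$ and leaves the input unchanged otherwise:

$$T(X^{adv}_n; p) = \begin{cases} T(X^{adv}_n) & \text{with probability } p \\ X^{adv}_n & \text{with probability } 1-p \end{cases}$$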

M-DI2-FGSM. Intuitively, momentum and diverse inputs are two completely different ways to alleviate the overfitting phenomenon. We can combine them naturally to form a much stronger attack, i.e., the Momentum Diverse Inputs Iterative Fast Gradient Sign Method (M-DI2-FGSM). The overall updating procedure of M-DI2-FGSM is similar to MI-FGSM, with the only replacement of Eq. (4) by

M-DI2-FGSM。直观上,动量与多样化输入是缓解过拟合现象的两种完全不同的方式。我们可以自然地将二者结合,形成更强的攻击,即动量多样化输入迭代快速梯度符号法(M-DI2-FGSM)。M-DI2-FGSM 的整体更新过程与 MI-FGSM 类似,只是将式(4)替换为
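
The replacement for Eq. (4) was lost in this copy; reconstructed, the momentum is accumulated from the gradient on the transformed input, while the clipped update of Eq. (5) stays unchanged:

$$g_{n+1} = \mu \cdot g_n + \frac{\nabla_X L\big(T(X^{adv}_n; p), y^{true}; \theta\big)}{\big\| \nabla_X L\big(T(X^{adv}_n; p), y^{true}; \theta\big) \big\|_1}$$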

3.4. Relationships between Different Attacks

3.4. 不同攻击之间的关系

The attacks mentioned above all belong to the family of Fast Gradient Sign Methods, and they can be related via different parameter settings as shown in Fig. 2. To summarize,

上面提到的攻击都属于快速梯度符号方法家族,它们可以通过不同的参数设置相互关联,如图2所示。总结如下:

  • If the transformation probability p = 0, M-DI2-FGSM degrades to MI-FGSM, and DI2-FGSM degrades to I-FGSM.

  • 如果变换概率 p = 0,则 M-DI2-FGSM 退化为 MI-FGSM,DI2-FGSM 退化为 I-FGSM。

  • If the decay factor µ = 0, M-DI2-FGSM degrades to DI2-FGSM, and MI-FGSM degrades to I-FGSM.

  • 如果衰减因子 µ = 0,则 M-DI2-FGSM 退化为 DI2-FGSM,MI-FGSM 退化为 I-FGSM。

  • If the total iteration number N = 1, I-FGSM degrades to FGSM.

  • 如果总迭代次数 N = 1,则 I-FGSM 退化为 FGSM。

3.5. Attacking an Ensemble of Networks

3.5. 攻击网络集合

Liu et al. [21] suggested that attacking an ensemble of multiple networks simultaneously can generate much stronger adversarial examples. The motivation is that if an adversarial image remains adversarial for multiple networks, then it is more likely to transfer to other networks as well. Therefore, we can use this strategy to improve the transferability even further.

Liu 等人[21]指出,同时攻击由多个网络组成的集成可以生成更强的对抗样本。其动机是:如果一个对抗图像对多个网络都保持对抗性,那么它更有可能迁移到其他网络。因此,我们可以利用这一策略进一步提高可转移性。

We follow the ensemble strategy proposed in [9], which fuses the logit activations together to attack multiple networks simultaneously. Specifically, to attack an ensemble of K models, the logits are fused by:

我们采用[9]中提出的集成策略,将 logit 激活融合在一起以同时攻击多个网络。具体地,为了攻击由 $K$ 个模型组成的集成,各模型的 logits 按如下方式融合:
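
The fusion equation was lost in this copy; reconstructed from the "where" clause that follows, the fused logits are simply the weighted sum of the per-model logits:

$$l(X; \theta_1, \ldots, \theta_K) = \sum_{k=1}^{K} w_k \, l_k(X; \theta_k)$$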

where $l_k(X; \theta_k)$ is the logits output of the $k$-th model with parameters $\theta_k$, and $w_k$ is the ensemble weight with $w_k \geq 0$ and $\sum_{k=1}^{K} w_k = 1$.

其中 $l_k(X; \theta_k)$ 是第 $k$ 个模型(参数为 $\theta_k$)输出的 logits,$w_k$ 是集成权重,满足 $w_k \geq 0$ 且 $\sum_{k=1}^{K} w_k = 1$。
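
A short Python sketch of this logit fusion, assuming `models` is any list of K differentiable classifiers and `weights` is a matching list of non-negative weights summing to one (both names are hypothetical, not from the paper's code):

```python
import torch.nn.functional as F

def ensemble_loss(models, weights, x, y):
    """Cross-entropy computed on the weighted sum of per-model logits."""
    fused_logits = sum(w * m(x) for w, m in zip(weights, models))
    return F.cross_entropy(fused_logits, y)
```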

4. Experiment

4. 实验

4.1. Experiment Setup

4.1. 实验设置

Dataset. It is less meaningful to attack the images that are already classified wrongly. Therefore, we randomly choose 5000 images from the ImageNet validation set that are classified correctly by all the networks which we test on, to form our test dataset. All these images are resized to 299×299×3 beforehand.

数据集。攻击那些本来就被错误分类的图像意义不大。因此,我们从 ImageNet 验证集中随机选取 5000 张被我们测试的所有网络都正确分类的图像,构成测试数据集。所有图像事先都被缩放为 $299 \times 299 \times 3$。

Networks. We consider four normally trained networks, i.e., Inception-v3 (Inc-v3) [35], Inception-v4 (Inc-v4) [34], ResNet-v2-152 (Res-152) [13] and Inception-ResNet-v2 (IncRes-v2) [34], and three adversarially trained networks [37], i.e., ens3-adv-Inception-v3 (Inc-v3ens3), ens4-adv-Inception-v3 (Inc-v3ens4) and ens-adv-Inception-ResNet-v2 (IncRes-v2ens). All networks are publicly available.

网络。我们考虑四个正常训练的网络,即 Inception-v3 (Inc-v3)[35]、Inception-v4 (Inc-v4)[34]、ResNet-v2-152 (Res-152)[13] 和 Inception-ResNet-v2 (IncRes-v2)[34],以及三个经过对抗训练的网络[37],即 ens3-adv-Inception-v3 (Inc-v3ens3)、ens4-adv-Inception-v3 (Inc-v3ens4) 和 ens-adv-Inception-ResNet-v2 (IncRes-v2ens)。所有网络均可公开获取。

Implementation details. For the parameters of different attackers, we follow the default settings in [16] with the step size $\alpha = 1$ and the total iteration number $N = \min(\epsilon + 4, 1.25\epsilon)$. We set the maximum perturbation of each pixel to be $\epsilon = 15$, which is still imperceptible for human observers [23]. For the momentum term, the decay factor $\mu$ is set to be 1 as in [9]. For the stochastic transformation function $T(X; p)$, the probability $p$ is set to be 0.5, i.e., attackers put equal attention on the original inputs and the transformed inputs. For the transformation operations $T(\cdot)$, the input $X$ is first randomly resized to an $rnd \times rnd \times 3$ image, with $rnd \in [299, 330)$, and then padded to the size $330 \times 330 \times 3$ in a random manner.

实现细节。对于不同攻击者的参数,我们遵循[16]中的默认设置,步长 $\alpha = 1$,总迭代次数 $N = \min(\epsilon + 4, 1.25\epsilon)$。我们将每个像素的最大扰动设为 $\epsilon = 15$,这对人类观察者而言仍然难以察觉[23]。对于动量项,衰减因子 $\mu$ 按照[9]设为 1。对于随机变换函数 $T(X; p)$,概率 $p$ 设为 0.5,即攻击者对原始输入和变换后的输入给予同等关注。对于变换操作 $T(\cdot)$,首先将输入 $X$ 随机缩放为 $rnd \times rnd \times 3$ 的图像,其中 $rnd \in [299, 330)$,然后以随机方式填充到 $330 \times 330 \times 3$ 的尺寸。
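
To tie the pieces above together, here is a minimal PyTorch sketch of the M-DI2-FGSM loop under the settings of this section; setting `mu=0` recovers DI2-FGSM and `p=0` recovers MI-/I-FGSM. Pixel values are assumed to lie in [0, 1], so ε = 15 becomes 15/255 and α = 1 becomes 1/255; `input_diversity` is the hypothetical helper sketched in Sec. 3.3 and `model` is any differentiable classifier. These scalings and names are our own assumptions, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def m_di2_fgsm(model, x, y, eps=15/255, alpha=1/255, n_iter=19, mu=1.0, p=0.5):
    """M-DI2-FGSM sketch: momentum + diverse inputs (mu=0 gives DI2-FGSM)."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                      # accumulated gradient
    for _ in range(n_iter):                      # n_iter ~ min(eps+4, 1.25*eps) for eps=15
        x_adv.requires_grad_(True)
        logits = model(input_diversity(x_adv, p=p))   # transformed input with prob. p
        loss = F.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # momentum accumulation with per-sample L1-normalized gradient (Eq. (4)-style)
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # clip to the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```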

4.2. Attacking a Single Network

4.2. 攻击单一网络

We first perform adversarial attacks on a single network. We craft adversarial examples only on normally trained networks, and test them on all seven networks. The success rates are shown in Table 1, where the diagonal blocks indicate white-box attacks and off-diagonal blocks indicate black-box attacks. We list the networks that we attack on in rows, and networks that we test on in columns.

我们首先对单个网络执行对抗攻击。我们只在正常训练的网络上生成对抗样本,并在全部七个网络上测试。成功率如表1所示,其中对角线块表示白盒攻击,非对角线块表示黑盒攻击。行表示我们攻击的网络,列表示我们测试的网络。

From Table 1, a first glance shows that M-DI2-FGSM outperforms all other baseline attacks by a large margin on all black-box models, and maintains high success rates on all white-box models. For example, if adversarial examples are crafted on IncRes-v2, M-DI2-FGSM has success rates of 67.4% on Inc-v4 (normally trained black-box model) and 25.1% on Inc-v3ens3 (adversarially trained black-box model), while strong baselines like MI-FGSM only obtain the corresponding success rates of 45.9% and 15.3%, respectively. This convincingly demonstrates the effectiveness of the combination of input diversity and momentum for improving the transferability of adversarial examples.

从表1可以看出,M-DI2-FGSM 在所有黑盒模型上都大幅优于其他基线攻击,并且在所有白盒模型上保持很高的成功率。例如,如果在 IncRes-v2 上生成对抗样本,M-DI2-FGSM 在 Inc-v4(正常训练的黑盒模型)上的成功率为 67.4%,在 Inc-v3ens3(对抗训练的黑盒模型)上的成功率为 25.1%,而像 MI-FGSM 这样的强基线只能分别达到 45.9% 和 15.3%。这有力地证明了输入多样性与动量相结合对提高对抗样本可转移性的有效性。

Table 1. The success rates on seven networks where we attack a single network. The diagonal blocks indicate white-box attacks, while the off-diagonal blocks indicate black-box attacks, which are much more challenging. Experiment results demonstrate that our proposed input diversity strategy substantially improves the transferability of generated adversarial examples.

表1。攻击单个网络时在七个网络上的成功率。对角线块表示白盒攻击,非对角线块表示更具挑战性的黑盒攻击。实验结果表明,我们提出的输入多样性策略大幅提高了所生成对抗样本的可转移性。

Figure 3. Visualization of randomly selected clean images and their corresponding adversarial examples. All these adversarial examples are generated on Inception-v3 using our proposed DI2-FGSM with the maximum perturbation of each pixel $\epsilon = 15$.

图3。随机选取的干净图像及其对应对抗样本的可视化。所有这些对抗样本都是使用我们提出的 DI2-FGSM 在 Inception-v3 上生成的,每个像素的最大扰动为 $\epsilon = 15$。

We then compare the success rates of I-FGSM and DI2-FGSM to see the effectiveness of diverse input patterns alone. By generating adversarial examples with input diversity, DI2-FGSM significantly improves the success rates of I-FGSM on challenging black-box models, regardless of whether these models are adversarially trained, and maintains high success rates on white-box models. For example, if adversarial examples are crafted on Res-152, DI2-FGSM has success rates of 99.2% on Res-152 (white-box model), 53.8% on Inc-v3 (normally trained black-box model) and 11.1% on Inc-v3ens4 (adversarially trained black-box model), while I-FGSM only obtains the corresponding success rates of 99.1%, 20.8% and 4.6%, respectively. Compared with FGSM, DI2-FGSM also reaches much higher success rates on the normally trained black-box models, and comparable performance on the adversarially trained black-box models. Besides, we visualize 5 randomly selected pairs of such generated adversarial images and their clean counterparts in Figure 3. These visualization results show that the generated adversarial perturbations are human imperceptible.

然后我们比较 I-FGSM 与 DI2-FGSM 的成功率,以单独考察多样化输入模式的效果。通过生成具有输入多样性的对抗样本,DI2-FGSM 显著提高了 I-FGSM 在具有挑战性的黑盒模型上的成功率(无论该模型是否经过对抗训练),同时在白盒模型上保持很高的成功率。例如,如果在 Res-152 上生成对抗样本,DI2-FGSM 在 Res-152(白盒模型)上的成功率为 99.2%,在 Inc-v3(正常训练的黑盒模型)上为 53.8%,在 Inc-v3ens4(对抗训练的黑盒模型)上为 11.1%,而 I-FGSM 只能分别达到 99.1%、20.8% 和 4.6%。与 FGSM 相比,DI2-FGSM 在正常训练的黑盒模型上也达到了高得多的成功率,在对抗训练的黑盒模型上性能相当。此外,我们在图3中可视化了随机选取的 5 对这样生成的对抗图像及其对应的干净图像。这些可视化结果表明,所生成的对抗扰动是人眼无法察觉的。

It should be mentioned that the proposed input diversity is not merely applicable to fast gradient sign methods. To demonstrate the generalization, we also incorporate C&W attack [4] with input diversity. The experiment is conducted on 1000 correctly classified images. For the parameters of C&W, the maximal iteration is 250, the learning rate is 0.01 and the confidence is 10. As Table 2 suggests, our method D-C&W obtains a significant performance improvement over C&W on black-box models.

需要指出的是,所提出的输入多样性并不仅仅适用于快速梯度符号方法。为了证明其通用性,我们还将 C&W 攻击[4]与输入多样性相结合。实验在 1000 张被正确分类的图像上进行。对于 C&W 的参数,最大迭代次数为 250,学习率为 0.01,置信度为 10。如表2所示,我们的方法 D-C&W 在黑盒模型上相比 C&W 获得了显著的性能提升。

4.3. Attacking an Ensemble of Networks

4.3. 攻击网络集合

Though the results in Table 1 show that momentum and input diversity can significantly improve the transferability of adversarial examples, they are still relatively weak at attacking an adversarially trained network under the black-box setting, e.g., the highest black-box success rate on IncRes-v2ens is only 19.0%. Therefore, we follow the strategy in [21] to attack multiple networks simultaneously in order to further improve transferability. We consider all seven networks here. Adversarial examples are generated on an ensemble of six networks, and tested on the ensembled network and the hold-out network, using I-FGSM, DI2-FGSM, MI-FGSM and M-DI2-FGSM, respectively. FGSM is ignored here due to its low success rates on white-box models. All ensembled models are assigned equal weight, i.e., $w_k = 1/6$.

虽然表1的结果表明动量和输入多样性可以显著提高对抗样本的可转移性,但在黑盒设置下,它们对经过对抗训练的网络的攻击能力仍然较弱,例如在 IncRes-v2ens 上最高的黑盒成功率只有 19.0%。因此,我们采用[21]中的策略同时攻击多个网络,以进一步提高可转移性。这里我们考虑全部七个网络。对抗样本在由六个网络组成的集成上生成,然后分别使用 I-FGSM、DI2-FGSM、MI-FGSM 和 M-DI2-FGSM 在集成网络和留出(hold-out)网络上进行测试。由于 FGSM 在白盒模型上的成功率较低,这里不予考虑。所有集成模型被赋予相等的权重,即 $w_k = 1/6$。

Table 2. The success rates on seven networks where we attack a single network using C&W attack. Experiment results demonstrate that the proposed input diversity strategy can enhance C&W attack for generating more transferable adversarial examples.

表2。使用 C&W 攻击单个网络时在七个网络上的成功率。实验结果表明,所提出的输入多样性策略可以增强 C&W 攻击,生成可转移性更强的对抗样本。

Table 3. The success rates of ensemble attacks. Adversarial examples are generated on an ensemble of six networks, and tested on the ensembled network (white-box setting) and the hold-out network (black-box setting). The sign "-" indicates the hold-out network. We observe that the proposed M-DI2-FGSM significantly outperforms all other attacks on all black-box models.

表3。集成攻击的成功率。对抗样本在由六个网络组成的集成上生成,并在集成网络(白盒设置)和留出网络(黑盒设置)上进行测试。符号"-"表示留出网络。我们观察到,所提出的 M-DI2-FGSM 在所有黑盒模型上都显著优于其他所有攻击。

4.4. Ablation Studies

4.4. 消融研究

In this section, we conduct a series of ablation experiments to study the impact of different parameters. We only consider attacking an ensemble of networks here, since it is much stronger than attacking a single network and can provide a more accurate evaluation of the network robustness. The max perturbation of each pixel $\epsilon$ is set to 15 for all experiments.

在本节中,我们进行一系列消融实验来研究不同参数的影响。这里我们只考虑攻击网络集成,因为它比攻击单个网络强得多,能够对网络鲁棒性给出更准确的评估。所有实验中每个像素的最大扰动 $\epsilon$ 均设为 15。

Transformation probability p. We first study the influence of the transformation probability $p$ on the success rates under both white-box and black-box settings. We set the step size $\alpha = 1$ and the total iteration number $N = \min(\epsilon + 4, 1.25\epsilon)$. The transformation probability $p$ varies from 0 to 1. Recalling the relationships shown in Fig. 2, M-DI2-FGSM (or DI2-FGSM) degrades to MI-FGSM (or I-FGSM) if $p = 0$.

变换概率 p。我们首先研究变换概率 $p$ 在白盒和黑盒两种设置下对成功率的影响。我们设步长 $\alpha = 1$,总迭代次数 $N = \min(\epsilon + 4, 1.25\epsilon)$。变换概率 $p$ 在 0 到 1 之间变化。回顾图2所示的关系,当 $p = 0$ 时,M-DI2-FGSM(或 DI2-FGSM)退化为 MI-FGSM(或 I-FGSM)。

We show the success rates on various networks in Fig. 4. We observe that both DI2-FGSM and M-DI2-FGSM achieve higher black-box success rates but lower white-box success rates as $p$ increases. Moreover, for all attacks, if $p$ is small, i.e., only a small amount of transformed inputs is utilized, black-box success rates can increase significantly, while white-box success rates only drop a little. This phenomenon reveals the importance of adding transformed inputs into the attack process.

图4显示了在各种网络上的成功率。我们观察到,随着p的增加,DI2 -FGSM和M-DI2 -FGSM的黑箱成功率更高,而白箱成功率更低。而且,对于所有攻击,当p很小,即只利用了少量的转换输入时,黑盒成功率会显著提高,而白盒成功率只会略有下降。这一现象揭示了在攻击过程中添加转换过的输入的重要性。

Figure 4. The success rates of DI2-FGSM (a) and M-DI2-FGSM (b) when varying the transformation probability p. "Ensemble" (white-box setting) is with dashed lines and "Hold-out" (black-box setting) is with solid lines.

图4。改变变换概率 $p$ 时 DI2-FGSM (a) 和 M-DI2-FGSM (b) 的成功率。"Ensemble"(白盒设置)为虚线,"Hold-out"(黑盒设置)为实线。

Figure 5. The success rates of DI2-FGSM (a) and M-DI2-FGSM (b) when varying the total iteration number N. "Ensemble" (white-box setting) is with dashed lines and "Hold-out" (black-box setting) is with solid lines.

图5。改变总迭代次数 $N$ 时 DI2-FGSM (a) 和 M-DI2-FGSM (b) 的成功率。"Ensemble"(白盒设置)为虚线,"Hold-out"(黑盒设置)为实线。

The trends shown in Fig. 4 also provide useful suggestions for constructing strong adversarial attacks in practice. For example, if you know the black-box model is a new network that is totally different from any existing networks, you can set $p = 1$ to reach the maximum transferability. If the black-box model is a mixture of new networks and existing networks, you can choose a moderate value of $p$ to maximize the black-box success rates under a pre-defined white-box success rate requirement, e.g., the white-box success rate must be greater than or equal to 90%.

图4所示的趋势也为在实践中构建强对抗攻击提供了有用的建议。例如,如果你知道黑盒模型是一个与任何现有网络都完全不同的新网络,可以设置 $p = 1$ 以达到最大的可转移性。如果黑盒模型是新网络和现有网络的混合,可以选择一个适中的 $p$ 值,在预先设定的白盒成功率约束下(例如白盒成功率必须大于或等于 90%)最大化黑盒成功率。

Total iteration number N. We then study the influence of the total iteration number $N$ on the success rates under both white-box and black-box settings. We set the transformation probability $p = 0.5$ and the step size $\alpha = 1$. The total iteration number $N$ varies from 15 to 31, and the results are plotted in Fig. 5. For DI2-FGSM, we see that the black-box success rates and white-box success rates always increase as the total iteration number $N$ increases. Similar trends can also be observed for M-DI2-FGSM except for the black-box success rates on adversarially trained models, i.e., performing more iterations cannot bring extra transferability on adversarially trained models. Moreover, we observe that the success-rate gap between M-DI2-FGSM and DI2-FGSM diminishes as $N$ increases.

总迭代次数 N。接下来我们研究总迭代次数 $N$ 在白盒和黑盒设置下对成功率的影响。我们设变换概率 $p = 0.5$,步长 $\alpha = 1$。总迭代次数 $N$ 从 15 变化到 31,结果如图5所示。对于 DI2-FGSM,我们看到黑盒成功率和白盒成功率总是随着总迭代次数 $N$ 的增加而提高。M-DI2-FGSM 也呈现类似的趋势,只是在对抗训练模型上的黑盒成功率是例外,即执行更多迭代并不能在对抗训练模型上带来额外的可转移性。此外,我们观察到 M-DI2-FGSM 与 DI2-FGSM 之间的成功率差距随 $N$ 的增加而缩小。

Figure 6. The success rates of DI2-FGSM (a) and M-DI2-FGSM (b) when varying the step size α. "Ensemble" (white-box setting) is with dashed lines and "Hold-out" (black-box setting) is with solid lines.

图6。改变步长 $\alpha$ 时 DI2-FGSM (a) 和 M-DI2-FGSM (b) 的成功率。"Ensemble"(白盒设置)为虚线,"Hold-out"(黑盒设置)为实线。

Step size α. We finally study the influence of the step size $\alpha$ on the success rates under both white-box and black-box settings. We set the transformation probability $p = 0.5$. In order to reach the maximum perturbation $\epsilon$ even for a small step size $\alpha$, we set the total iteration number to be inversely proportional to the step size, i.e., $N = \epsilon/\alpha$. The results are plotted in Fig. 6. We observe that the white-box success rates of both DI2-FGSM and M-DI2-FGSM can be boosted if a smaller step size is provided. Under the black-box setting, the success rates of DI2-FGSM are insensitive to the step size, while the success rates of M-DI2-FGSM can still be improved with a smaller step size.

步长 α。最后,我们研究步长 $\alpha$ 在白盒和黑盒两种设置下对成功率的影响。我们设变换概率 $p = 0.5$。为了在步长 $\alpha$ 较小时也能达到最大扰动 $\epsilon$,我们将总迭代次数设为与步长成反比,即 $N = \epsilon / \alpha$。结果如图6所示。我们观察到,如果使用更小的步长,DI2-FGSM 和 M-DI2-FGSM 的白盒成功率都可以提升。在黑盒设置下,DI2-FGSM 的成功率对步长不敏感,而 M-DI2-FGSM 的成功率仍然可以通过更小的步长得到提高。

4.5. NIPS 2017 Adversarial Competition

4.5. NIPS 2017 对抗攻防竞赛

In order to verify the effectiveness of our proposed attack methods in practice, we here reproduce the top defense entries and official baselines from the NIPS 2017 adversarial competition [18] for testing transferability. Due to the resource limitation, we only consider the top-3 defense entries, i.e., TsAIL [19], iyswim [38] and Anil Thomas, as well as 3 official baselines, i.e., Inc-v3adv, IncRes-v2ens and Inc-v3. We note that the No.1 solution and the No.3 solution apply significantly different image transformations (compared to the random resizing & padding used in our attack method) for defending against adversarial examples. For example, the No.1 solution, TsAIL, applies an image denoising network for removing adversarial perturbations, and the No.3 solution, Anil Thomas, includes a series of image transformations, e.g., JPEG compression, rotation, shifting and zooming, in the defense pipeline. The test dataset contains 5000 images which are all of the size $299 \times 299 \times 3$, and their corresponding labels are the same as the ImageNet labels.

为了在实践中验证所提攻击方法的有效性,我们在此复现了 NIPS 2017 对抗攻防竞赛[18]的顶级防御方案和官方基线,用于测试可转移性。由于资源限制,我们只考虑前三名的防御方案,即 TsAIL[19]、iyswim[38] 和 Anil Thomas,以及三个官方基线,即 Inc-v3adv、IncRes-v2ens 和 Inc-v3。我们注意到,第一名和第三名的方案使用了与我们攻击方法中的随机缩放和随机填充明显不同的图像变换来防御对抗样本。例如,第一名方案 TsAIL 使用图像去噪网络来去除对抗扰动;第三名方案 Anil Thomas 在防御流程中包含了一系列图像变换,例如 JPEG 压缩、旋转、平移和缩放。测试数据集包含 5000 张图像,尺寸均为 $299 \times 299 \times 3$,其对应标签与 ImageNet 标签相同。

Table 4. The success rates on top defense solutions and official baselines from the NIPS 2017 adversarial competition [18]. * indicates the official results reported in the competition. Our proposed M-DI2-FGSM reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%.

表4。NIPS 2017 对抗攻防竞赛[18]中顶级防御方案和官方基线上的成功率。* 表示竞赛中公布的官方结果。我们提出的 M-DI2-FGSM 的平均成功率达到 73.0%,比 NIPS 竞赛中排名第一的攻击提交高出 6.6%。

Generating adversarial examples. When generating adversarial examples, we follow the procedure in [18]: (1) split the dataset equally into 50 batches; (2) for each batch, the maximum perturbation $\epsilon$ is randomly chosen from the set $\{4/255, 8/255, 12/255, 16/255\}$; and (3) generate adversarial examples for each batch under the corresponding $\epsilon$ constraint.

生成对抗样本。在生成对抗样本时,我们遵循[18]中的流程:(1)将数据集平均分成 50 个批次;(2)对于每个批次,从集合 $\{4/255, 8/255, 12/255, 16/255\}$ 中随机选择最大扰动 $\epsilon$;(3)在相应的 $\epsilon$ 约束下为每个批次生成对抗样本。

Attacker settings. For the settings of attackers, we follow [9] by attacking an ensemble of eight different models, i.e., Inc-v3, Inc-v4, IncRes-v2, Res-152, Inc-v3ens3, Inc-v3ens4, IncRes-v2ens and Inc-v3adv [17]. The ensemble weights are set as 1/7.25 equally for the first seven models and 0.25/7.25 for Inc-v3adv. The total iteration number N is 10 and the decay factor µ is 1. This configuration for MI-FGSM won the 1st place in the NIPS 2017 adversarial attack competition. For DI2-FGSM and M-DI2-FGSM, we choose p = 0.4 according to the trends shown in Fig. 4.

攻击者设置。对于攻击者的设置,我们遵循[9],攻击由八个不同模型组成的集成,即 Inc-v3、Inc-v4、IncRes-v2、Res-152、Inc-v3ens3、Inc-v3ens4、IncRes-v2ens 和 Inc-v3adv[17]。前七个模型的集成权重均设为 1/7.25,Inc-v3adv 的权重设为 0.25/7.25。总迭代次数 $N$ 为 10,衰减因子 $\mu$ 为 1。MI-FGSM 的这一配置曾获得 NIPS 2017 对抗攻击竞赛的第一名。对于 DI2-FGSM 和 M-DI2-FGSM,我们根据图4所示的趋势选择 $p = 0.4$。

Results. The results are summarized in Table 4. We also report the official results of MI-FGSM (named MI-FGSM*) as a reference to validate our implementation. The performance difference between MI-FGSM and MI-FGSM* is due to the randomness of the max perturbation magnitude introduced in the attack process. Compared with MI-FGSM, DI2-FGSM has higher success rates on top defense solutions but slightly lower success rates on baseline models, which results in these two attack methods having similar average success rates. By integrating both diverse inputs and the momentum term, this enhanced attack, M-DI2-FGSM, reaches an average success rate of 73.0%, which is far better than other methods. For example, the top-1 attack submission in the NIPS competition, MI-FGSM, only gets an average success rate of 66.4%. We believe this superior transferability can also be observed on other defense submissions which we do not evaluate on.

结果。结果汇总于表4。我们还报告了 MI-FGSM 的官方结果(记为 MI-FGSM*)作为参考,以验证我们的实现。MI-FGSM 与 MI-FGSM* 之间的性能差异来自攻击过程中最大扰动幅度的随机性。与 MI-FGSM 相比,DI2-FGSM 在顶级防御方案上的成功率更高,而在基线模型上的成功率略低,使得这两种攻击方法的平均成功率相近。通过同时结合多样化输入和动量项,增强后的攻击 M-DI2-FGSM 的平均成功率达到 73.0%,远优于其他方法。例如,NIPS 竞赛中排名第一的攻击提交 MI-FGSM 的平均成功率仅为 66.4%。我们相信,在我们未评估的其他防御提交上也可以观察到这种优越的可转移性。

4.6. Discussion

4.6. 讨论

We provide a brief discussion of why the proposed diverse input patterns can help to generate adversarial examples with better transferability. One hypothesis is that the decision boundaries of different networks share similar inherent structures due to the same training dataset, e.g., ImageNet. For example, as shown in Fig 1, different networks make similar mistakes in the presence of adversarial examples. By incorporating diverse patterns at each attack iteration, the optimization produces adversarial examples that are more robust to small transformations. These adversarial examples are malicious in a certain region at the network decision boundary, thus increasing the chance to fool other networks, i.e., they achieve better black-box success rate than existing methods. In the future, we plan to validate this hypothesis theoretically or empirically.

我们简要讨论为什么所提出的多样化输入模式有助于生成可转移性更好的对抗样本。一种假设是,由于使用相同的训练数据集(如 ImageNet),不同网络的决策边界具有相似的内在结构。例如,如图1所示,不同网络在面对对抗样本时会犯类似的错误。通过在每次攻击迭代中引入多样化的模式,优化过程生成的对抗样本对小的变换更加鲁棒。这些对抗样本在网络决策边界附近的某个区域内保持对抗性,因而更有可能欺骗其他网络,即比现有方法取得更好的黑盒成功率。未来,我们计划从理论或实验上验证这一假设。

5. Conclusions

5. 结论

In this paper, we propose to improve the transferability of adversarial examples with input diversity. Specifically, our method applies random transformations to the input images at each iteration in the attack process. Compared with traditional iterative attacks, the results on ImageNet show that our proposed attack method gets significantly higher success rates for black-box models, and maintains similar success rates for white-box models. We improve the transferability further by integrating momentum term and attacking multiple networks simultaneously. By evaluating this enhanced attack against the top defense submissions and official baselines from NIPS 2017 adversarial competition [18], we show that this enhanced attack reaches an average success rate of 73.0%, which outperforms the top-1 attack submission in the NIPS competition by a large margin of 6.6%. We hope that our proposed attack strategy can serve as a benchmark for evaluating the robustness of networks to adversaries and the effectiveness of different defense methods in future. Code is publicly available at https://github.com/cihangxie/DI-2-FGSM.

在本文中,我们提出利用输入多样性来提高对抗样本的可转移性。具体来说,我们的方法在攻击过程的每次迭代中对输入图像施加随机变换。在 ImageNet 上的结果表明,与传统迭代攻击相比,我们提出的攻击方法在黑盒模型上的成功率显著提高,同时在白盒模型上保持相近的成功率。通过引入动量项并同时攻击多个网络,我们进一步提升了可转移性。在 NIPS 2017 对抗攻防竞赛[18]的顶级防御提交和官方基线上评估该增强攻击,其平均成功率达到 73.0%,比竞赛中排名第一的攻击提交高出 6.6%。我们希望所提出的攻击策略能够在未来作为评估网络对攻击者的鲁棒性以及不同防御方法有效性的基准。代码可在 https://github.com/cihangxie/DI-2-FGSM 公开获取。

Acknowledgement

致谢

This work was supported by a gift grant from YiTu and ONR N00014-12-1-0883.

这项工作得到了YiTu和ONR N00014-12-1-0883的赠款支持。

References

参考文献

[1] A. Arnab, O. Miksik, and P. H. Torr. On the robustness of semantic segmentation models to adversarial attacks. arXiv preprint arXiv:1711.09856, 2017. 1

[2] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok. Synthesizing robust adversarial examples. In International Conference on Machine Learning, pages 284–293, 2018. 2

[3] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402, 2013. 2

[4] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017. 5

[5] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. 1

[6] M. Cisse, Y. Adi, N. Neverova, and J. Keshet. Houdini: Fooling deep structured prediction models. arXiv preprint arXiv:1707.05373, 2017. 2

[7] N. Dalvi, P. Domingos, S. Sanghai, D. Verma, et al. Adversarial classification. In ACM SIGKDD international conference on Knowledge discovery and data mining, 2004. 2

[8] G. S. Dhillon, K. Azizzadenesheli, J. D. Bernstein, J. Kossaifi, A. Khanna, Z. C. Lipton, and A. Anandkumar. Stochastic activation pruning for robust adversarial defense. In International Conference on Learning Representations, 2018. 2

[9] Y. Dong, F. Liao, T. Pang, H. Su, X. Hu, J. Li, and J. Zhu. Boosting adversarial attacks with momentum. arXiv preprint arXiv:1710.06081, 2017. 2, 3, 4, 8

[10] R. Girshick. Fast r-cnn. In International Conference on Computer Vision, 2015. 1

[11] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015. 1, 2, 3

[12] C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018. 2, 3

[13] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, 2016. 1, 2, 3, 4

[14] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. Tygar. Adversarial machine learning. In ACM workshop on Security and artificial intelligence, 2011. 2

[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012. 1, 2, 3

[16] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In International Conference on Learning Representations Workshop, 2017. 1, 2, 4

[17] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017. 1, 2, 3, 8

[18] A. Kurakin, I. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, T. Pang, J. Zhu, X. Hu, C. Xie, et al. Adversarial attacks and defences competition. arXiv preprint arXiv:1804.00097, 2018. 2, 7, 8

[19] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In Computer Vision and Pattern Recognition, 2018. 7

[20] Y.-C. Lin, Z.-W. Hong, Y.-H. Liao, M.-L. Shih, M.-Y. Liu, and M. Sun. Tactics of adversarial attack on deep reinforcement learning agents. In International Joint Conference on Artificial Intelligence, 2017. 2

[21] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations, 2017. 2, 4, 6

[22] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Computer Vision and Pattern Recognition, 2015. 1

[23] Y. Luo, X. Boix, G. Roig, T. Poggio, and Q. Zhao. Foveationbased mechanisms alleviate adversarial examples. arXiv preprint arXiv:1511.06292, 2015. 4

[24] D. Meng and H. Chen. Magnet: a two-pronged defense against adversarial examples. arXiv preprint arXiv:1705.09064, 2017. 2

[25] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition, 2015. 2

[26] N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, R. Feinman, A. Kurakin, C. Xie, Y. Sharma, T. Brown, A. Roy, A. Matyasko, V. Behzadan, K. Hambardzumyan, Z. Zhang, Y.-L. Juang, Z. Li, R. Sheatsley, A. Garg, J. Uesato, W. Gierke, Y. Dong, D. Berthelot, P. Hendricks, J. Rauber, and R. Long. cleverhans v2.1.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2018. 3

[27] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. arXiv preprint arXiv:1801.08926, 2018. 2

[28] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 2015. 1

[29] P. Samangouei, M. Kabkab, and R. Chellappa. DefenseGAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018. 2

[30] A. Shrivastava, A. Gupta, and R. Girshick. Training regionbased object detectors with online hard example mining. In Computer Vision and Pattern Recognition, 2016. 2

[31] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In International Conference on Computer Vision, 2015. 2

[32] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015. 1, 2, 3

[33] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. arXiv preprint arXiv:1710.10766, 2017. 2

[34] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI, 2017. 4

[35] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In Computer Vision and Pattern Recognition, 2016. 4

[36] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014. 1, 2

[37] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017. 1, 2, 4

[38] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018. 2, 3, 7

[39] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. Adversarial Examples for Semantic Segmentation and Object Detection. In International Conference on Computer Vision, 2017. 2

[40] Z. Zhang, S. Qiao, C. Xie, W. Shen, B. Wang, and A. L. Yuille. Single-shot object detection with enriched semantics. arXiv preprint arXiv:1712.00433, 2017. 1
