Boosting Adversarial Attacks with Momentum
GB/T 7714 Dong Y, Liao F, Pang T, et al. Boosting adversarial attacks with momentum[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 9185-9193.
MLA Dong, Yinpeng, et al. "Boosting adversarial attacks with momentum." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
APA Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., & Li, J. (2018). Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9185-9193).
Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most of the existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that the adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won the first places in the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.
Deep neural networks (DNNs) are challenged by their vulnerability to adversarial examples [23, 5], which are crafted by adding small, human-imperceptible noises to legitimate examples, but make a model output attacker-desired inaccurate predictions. Generating adversarial examples has garnered increasing attention, since it helps to identify the vulnerability of models before they are launched. Besides, adversarial samples also help various DNN algorithms to assess robustness by providing more varied training data [5, 10].
With the knowledge of the structure and parameters of a given model, many methods can successfully generate adversarial examples in the white-box manner, including optimization-based methods such as box-constrained L-BFGS [23], one-step gradient-based methods such as the fast gradient sign method [5], and iterative variants of gradient-based methods [9]. In general, a more severe issue of adversarial examples is their good transferability [23, 12, 14], i.e., the adversarial examples crafted for one model remain adversarial for others, thus making black-box attacks practical in real-world applications and posing real security issues. The phenomenon of transferability is due to the fact that different machine learning models learn similar decision boundaries around a data point, making the adversarial examples crafted for one model also effective for others.
However, existing attack methods exhibit low efficacy when attacking black-box models, especially those with a defense mechanism. For example, ensemble adversarial training [24] significantly improves the robustness of deep neural networks, and most existing methods cannot successfully attack such models in the black-box manner. This fact is largely attributed to the trade-off between attack ability and transferability. In particular, the adversarial examples generated by optimization-based and iterative methods have poor transferability [10], which makes black-box attacks less effective. On the other hand, one-step gradient-based methods generate more transferable adversarial examples, but they usually have a low success rate for the white-box model [10], making them ineffective for black-box attacks. Given the difficulties of practical black-box attacks, Papernot et al. [16] use adaptive queries to train a surrogate model that fully characterizes the behavior of the target model, and thereby turn black-box attacks into white-box attacks. However, this requires the full prediction confidences given by the target model and a tremendous number of queries, especially for large-scale datasets such as ImageNet [19]. Such requirements are impractical in real-world applications. Therefore, we consider how to effectively attack a black-box model without knowing its architecture and parameters, and further, without querying.
In this paper, we propose a broad class of momentum iterative gradient-based methods to boost the success rates of the generated adversarial examples. Beyond iterative gradient-based methods that iteratively perturb the input with the gradients to maximize the loss function [5], momentum-based methods accumulate a velocity vector in the gradient direction of the loss function across iterations, for the purpose of stabilizing update directions and escaping from poor local maxima. We show that the adversarial examples generated by momentum iterative methods have higher success rates in both white-box and black-box attacks. The proposed methods alleviate the trade-off between the white-box attacks and the transferability, and act as a stronger attack algorithm than one-step methods [5] and vanilla iterative methods [9].
To further improve the transferability of adversarial examples, we study several approaches for attacking an ensemble of models, because if an adversarial example fools multiple models, it is more likely to remain adversarial for other black-box models [12]. We show that the adversarial examples generated by the momentum iterative methods for multiple models, can successfully fool robust models obtained by ensemble adversarial training [24] in the black-box manner. The findings in this paper raise new security issues for developing more robust deep learning models, with a hope that our attacks will be used as a benchmark to evaluate the robustness of various deep learning models and defense methods. In summary, we make the following contributions:
We introduce a class of attack algorithms called momentum iterative gradient-based methods, in which we accumulate gradients of the loss function at each iteration to stabilize optimization and escape from poor local maxima.
We study several ensemble approaches to attack multiple models simultaneously, which demonstrate strong transferability while preserving a high success rate of attacks.
We are the first to show that the models obtained by ensemble adversarial training with a powerful defense ability are also vulnerable to the black-box attacks.
In this section, we provide the background knowledge and review related works on adversarial attack and defense methods. Given a classifier $f(x): \mathcal{X} \rightarrow \mathcal{Y}$ that outputs a label $y$ as the prediction for an input $x$, the goal of adversarial attacks is to seek an example $x^*$ in the vicinity of $x$ that is misclassified by the classifier. Specifically, there are two classes of adversarial examples: non-targeted and targeted ones. For a correctly classified input $x$ with ground-truth label $y$ such that $f(x) = y$, a non-targeted adversarial example $x^*$ is crafted by adding small noise to $x$ without changing the label, but misleads the classifier as $f(x^*) \neq y$; a targeted adversarial example aims to fool the classifier into outputting a specific label as $f(x^*) = y^*$, where $y^*$ is the target label specified by the adversary and $y^* \neq y$. In most cases, the $L_p$ norm of the adversarial noise is required to be less than an allowed value $\epsilon$, i.e., $\|x^* - x\|_p \leq \epsilon$, where $p$ could be $0, 1, 2, \infty$.
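As a concrete illustration of these definitions, the following PyTorch-style sketch (not from the paper; `f`, the tensor shapes, and a batch size of one are assumptions) checks whether a candidate $x^*$ stays within the $L_\infty$ ball of radius $\epsilon$ around $x$ and whether it is non-targeted or targeted adversarial.

```python
import torch

def is_adversarial(f, x, x_adv, y_true, eps, y_target=None):
    """Check the two definitions above for a single example (batch size 1).
    f: classifier returning logits; y_true / y_target: integer class labels."""
    within_ball = (x_adv - x).abs().max().item() <= eps   # ||x* - x||_inf <= eps
    pred = f(x_adv).argmax(dim=1).item()
    if y_target is None:
        return within_ball and pred != y_true             # non-targeted: f(x*) != y
    return within_ball and pred == y_target               # targeted: f(x*) = y*
```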
Existing approaches for generating adversarial examples can be categorized into three groups. We introduce their non-targeted version of attacks here, and the targeted version can be simply derived.
One-step gradient-based approaches, such as the fast gradient sign method (FGSM) [5], find an adversarial example $x^*$ by maximizing the loss function $J(x^*, y)$, where $J$ is often the cross-entropy loss. FGSM generates adversarial examples to meet the $L_\infty$ norm bound $\|x^* - x\|_\infty \leq \epsilon$ as
$$x^* = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y)), \tag{1}$$
where $\nabla_x J(x, y)$ is the gradient of the loss function w.r.t. $x$. The fast gradient method (FGM) is a generalization of FGSM to meet the $L_2$ norm bound $\|x^* - x\|_2 \leq \epsilon$ as
$$x^* = x + \epsilon \cdot \frac{\nabla_x J(x, y)}{\|\nabla_x J(x, y)\|_2}. \tag{2}$$
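The two one-step attacks above can be sketched in a few lines of PyTorch. This is a hedged illustration under the assumption that `model` returns logits for an NCHW image batch; it is not the authors' reference code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)           # J(x, y): cross-entropy loss
    grad = torch.autograd.grad(loss, x)[0]        # gradient of J w.r.t. x
    return (x + eps * grad.sign()).detach()       # Eq. (1): step along sign of gradient

def fgm(model, x, y, eps):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    grad_norm = grad.flatten(1).norm(p=2, dim=1).view(-1, 1, 1, 1) + 1e-12
    return (x + eps * grad / grad_norm).detach()  # Eq. (2): step along L2-normalized gradient
```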
Iterative methods [9] apply the fast gradient multiple times with a small step size $\alpha$. The iterative version of FGSM (I-FGSM) can be expressed as
$$x^*_0 = x, \qquad x^*_{t+1} = x^*_t + \alpha \cdot \mathrm{sign}(\nabla_x J(x^*_t, y)). \tag{3}$$
To make the generated adversarial examples satisfy the $L_\infty$ bound, one can clip $x^*_t$ into the $\epsilon$-vicinity of $x$, or simply set $\alpha = \epsilon / T$ with $T$ being the number of iterations. It has been shown that iterative methods are stronger white-box adversaries than one-step methods, at the cost of worse transferability [10, 24].
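A minimal I-FGSM sketch following Eq. (3), using the simple choice $\alpha = \epsilon / T$ and per-step clipping back into the $\epsilon$-ball; `model` and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps, num_iter=10):
    alpha = eps / num_iter                        # step size so that T * alpha = eps
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # keep ||x* - x||_inf <= eps
    return x_adv
```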
Optimization-based methods [23] directly optimize the distance between the real and adversarial examples subject to the misclassification of the adversarial examples. Box-constrained L-BFGS can be used to solve such a problem. A more sophisticated way [1] is to solve
$$\arg\min_{x^*}\ \lambda \cdot \|x^* - x\|_p - J(x^*, y).$$
Since it directly optimizes the distance between an adversarial example and the corresponding real example, there is no guarantee that the $L_\infty$ ($L_2$) distance is less than the required value. Optimization-based methods also lack efficacy in black-box attacks, just like iterative methods.
Among many attempts [13, 3, 15, 10, 24, 17, 11], adversarial training is the most extensively investigated way to increase the robustness of DNNs [5, 10, 24]. By injecting adversarial examples into the training procedure, the adversarially trained models learn to resist the perturbations in the gradient direction of the loss function. However, they do not confer robustness to black-box attacks due to the coupling of the generation of adversarial examples and the parameters being trained. Ensemble adversarial training [24] augments the training data with the adversarial samples produced not only from the model being trained, but also from other hold-out models. Therefore, the ensemble adversarially trained models are robust against one-step attacks and black-box attacks.
In this paper, we propose a broad class of momentum iterative gradient-based methods to generate adversarial examples, which can fool white-box models as well as black-box models. In this section, we elaborate the proposed algorithms. We first illustrate how to integrate momentum into iterative FGSM, which induces a momentum iterative fast gradient sign method (MI-FGSM) to generate adversarial examples satisfying the L∞ norm restriction in the non-targeted attack fashion. We then present several methods on how to efficiently attack an ensemble of models. Finally, we extend MI-FGSM to L2 norm bound and targeted attacks, yielding a broad class of attack methods.
The momentum method [18] is a technique for accelerating gradient descent algorithms by accumulating a velocity vector in the gradient direction of the loss function across iterations. The memorization of previous gradients helps to barrel through narrow valleys, small humps and poor local minima or maxima [4]. The momentum method also shows its effectiveness in stochastic gradient descent to stabilize the updates [20]. We apply the idea of momentum to generate adversarial examples and obtain tremendous benefits.
To generate a non-targeted adversarial example $x^*$ from a real example $x$ that satisfies the $L_\infty$ norm bound, gradient-based approaches seek the adversarial example by solving the constrained optimization problem
$$\arg\max_{x^*} J(x^*, y), \quad \text{s.t.}\ \|x^* - x\|_\infty \leq \epsilon,$$
where $\epsilon$ is the size of the adversarial perturbation. FGSM generates an adversarial example by applying the sign of the gradient to a real example only once (Eq. (1)), under the assumption that the decision boundary is linear around the data point. However, in practice the linear assumption may not hold when the distortion is large [12], which makes the adversarial example generated by FGSM "underfit" the model, limiting its attack ability. In contrast, iterative FGSM greedily moves the adversarial example in the direction of the sign of the gradient in each iteration (Eq. (3)). Therefore, the adversarial example can easily drop into poor local maxima and "overfit" the model, and is thus not likely to transfer across models.
In order to break such a dilemma, we integrate momentum into iterative FGSM for the purpose of stabilizing update directions and escaping from poor local maxima. Therefore, the momentum-based method retains the transferability of adversarial examples when increasing iterations, and at the same time acts as a strong adversary for white-box models, like iterative FGSM. It alleviates the trade-off between attack ability and transferability, demonstrating strong black-box attacks.
The momentum iterative fast gradient sign method (MI-FGSM) is summarized in Algorithm 1. Specifically, $g_t$ gathers the gradients of the first $t$ iterations with a decay factor $\mu$, as defined in Eq. (6):
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^*_t, y)}{\|\nabla_x J(x^*_t, y)\|_1}. \tag{6}$$
Then the adversarial example $x^*_t$ is perturbed in the direction of the sign of $g_{t+1}$ with a step size $\alpha$, as in Eq. (7):
$$x^*_{t+1} = x^*_t + \alpha \cdot \mathrm{sign}(g_{t+1}). \tag{7}$$
If $\mu$ equals 0, MI-FGSM degenerates to the iterative FGSM. In each iteration, the current gradient $\nabla_x J(x^*_t, y)$ is normalized by its own $L_1$ norm (any distance measure is feasible), because we notice that the scale of the gradients varies in magnitude across iterations.
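A hedged PyTorch sketch of Algorithm 1: the gradient is $L_1$-normalized, accumulated with decay factor $\mu$ as in Eq. (6), and the sign of the accumulated gradient gives the update of Eq. (7). Here `model`, the pixel range, and the choice $\alpha = \epsilon / T$ are assumptions rather than the authors' exact implementation; setting `mu=0` recovers I-FGSM.

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, eps, num_iter=10, mu=1.0):
    """eps is in the same units as x (e.g. 16 for inputs in [0, 255])."""
    alpha = eps / num_iter                         # step size alpha = eps / T
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                        # accumulated gradient g_0 = 0
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        grad = grad / (grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1) + 1e-12)  # L1 normalization
        g = mu * g + grad                          # Eq. (6); mu = 0 recovers I-FGSM
        x_adv = x_adv.detach() + alpha * g.sign()  # Eq. (7)
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # keep ||x* - x||_inf <= eps
    return x_adv
```

In the paper's single-model setting, attacking a classifier would then be a single call such as `x_adv = mi_fgsm(inc_v3, x, y, eps=16.0)`, with `eps` expressed on the same scale as the inputs.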
In this section, we study how to attack an ensemble of models efficiently. Ensemble methods have been broadly adopted in research and competitions for enhancing performance and improving robustness [6, 8, 2]. The idea of ensembling can also be applied to adversarial attacks: if an example remains adversarial for multiple models, it may capture an intrinsic direction that always fools these models, and is more likely to transfer to other models at the same time [12], thus enabling powerful black-box attacks.
We propose to attack multiple models whose logit activations are fused together, and we call this method ensemble in logits. Because the logits capture the logarithmic relationships between the probability predictions, an ensemble of models fused by logits aggregates the fine-grained outputs of all models, whose vulnerability can be more easily discovered. Specifically, to attack an ensemble of $K$ models, we fuse the logits as
$$l(x) = \sum_{k=1}^{K} w_k\, l_k(x),$$
where $l_k(x)$ are the logits of the $k$-th model and $w_k$ is the ensemble weight with $w_k \geq 0$ and $\sum_{k=1}^{K} w_k = 1$. The loss function $J(x, y)$ is then defined as the softmax cross-entropy loss given the ground-truth label $y$ and the fused logits $l(x)$:
$$J(x, y) = -\mathbf{1}_y \cdot \log(\mathrm{softmax}(l(x))),$$
where $\mathbf{1}_y$ is the one-hot encoding of $y$. We summarize the MI-FGSM algorithm for attacking multiple models whose logits are averaged in Algorithm 2.
For comparison, we also introduce two alternative ensemble schemes, one of which has already been studied [12]. Specifically, the $K$ models can be averaged in predictions [12] as $p(x) = \sum_{k=1}^{K} w_k\, p_k(x)$, where $p_k(x)$ is the predicted probability of the $k$-th model given input $x$. The $K$ models can also be averaged in loss as $J(x, y) = \sum_{k=1}^{K} w_k\, J_k(x, y)$. In these three methods, the only difference is where to combine the outputs of multiple models, but they result in different attack abilities. We empirically find that the ensemble in logits performs better than the ensemble in predictions and the ensemble in loss, across various attack methods and various models in the ensemble, as will be demonstrated in Sec. 4.3.1.
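The three fusion schemes can be contrasted in a small sketch, assuming `models` is a list of classifiers that return logits and `weights` is a list of non-negative floats summing to one; the chosen scheme's loss would simply replace $J(x, y)$ inside the momentum attack above.

```python
import torch
import torch.nn.functional as F

def ensemble_loss(models, weights, x, y, scheme="logits"):
    if scheme == "logits":        # fuse logits l(x) = sum_k w_k * l_k(x), then cross-entropy
        fused = sum(w * m(x) for w, m in zip(weights, models))
        return F.cross_entropy(fused, y)
    if scheme == "predictions":   # average softmax probabilities, then take -log p_y
        probs = sum(w * F.softmax(m(x), dim=1) for w, m in zip(weights, models))
        return F.nll_loss(torch.log(probs + 1e-12), y)
    if scheme == "loss":          # average the per-model cross-entropy losses
        return sum(w * F.cross_entropy(m(x), y) for w, m in zip(weights, models))
    raise ValueError(f"unknown scheme: {scheme}")
```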
The momentum iterative methods can be easily generalized to other attack settings. By replacing the current gradient with the accumulated gradient of all previous steps, any iterative method can be extended to its momentum variant. Here we introduce the methods for generating adversarial examples in terms of the L2 norm bound attacks and the targeted attacks.
To find an adversarial example within the $\epsilon$-vicinity of a real example measured by $L_2$ distance, i.e., $\|x^* - x\|_2 \leq \epsilon$, the momentum variant of the iterative fast gradient method (MI-FGM) can be written as
$$x^*_{t+1} = x^*_t + \alpha \cdot \frac{g_{t+1}}{\|g_{t+1}\|_2},$$
where $g_{t+1}$ is defined in Eq. (6) and $\alpha = \epsilon / T$, with $T$ standing for the total number of iterations.
For targeted attacks, the objective for finding an adversarial example misclassified as a target class $y^*$ is to minimize the loss function $J(x^*, y^*)$. The accumulated gradient is derived as
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^*_t, y^*)}{\|\nabla_x J(x^*_t, y^*)\|_1}.$$
The targeted MI-FGSM with an $L_\infty$ norm bound is
$$x^*_{t+1} = x^*_t - \alpha \cdot \mathrm{sign}(g_{t+1}),$$
and the targeted MI-FGM with an $L_2$ norm bound is
$$x^*_{t+1} = x^*_t - \alpha \cdot \frac{g_{t+1}}{\|g_{t+1}\|_2}.$$
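A combined sketch of the targeted variants described above, assuming `model`, NCHW inputs, and $\alpha = \epsilon / T$: the loss toward the target label $y^*$ is minimized, so the step moves against the accumulated gradient, and the $L_2$ variant normalizes the momentum instead of taking its sign.

```python
import torch
import torch.nn.functional as F

def targeted_momentum_attack(model, x, y_target, eps, num_iter=20, mu=1.0, norm="linf"):
    alpha = eps / num_iter
    x_adv, g = x.clone().detach(), torch.zeros_like(x)
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_target)    # J(x*, y*), to be minimized
        grad = torch.autograd.grad(loss, x_adv)[0]
        grad = grad / (grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1) + 1e-12)
        g = mu * g + grad                                  # accumulate as in Eq. (6)
        if norm == "linf":
            x_adv = x_adv.detach() - alpha * g.sign()      # targeted MI-FGSM step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        else:
            g_norm = g.flatten(1).norm(p=2, dim=1).view(-1, 1, 1, 1) + 1e-12
            x_adv = x_adv.detach() - alpha * g / g_norm    # targeted MI-FGM step; alpha = eps/T
    return x_adv
```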
Therefore, we introduce a broad class of momentum iterative methods for attacks in various settings, whose effectiveness is demonstrated in Sec. 4.
In this section, we conduct extensive experiments on the ImageNet dataset [19] to validate the effectiveness of the proposed methods. We first specify the experimental settings in Sec. 4.1. Then we report the results for attacking a single model in Sec. 4.2 and an ensemble of models in Sec. 4.3. Our methods won both the NIPS 2017 Non-targeted and Targeted Adversarial Attack competitions, with the configurations introduced in Sec. 4.4.
We study seven models. Four of them are normally trained: Inception v3 (Inc-v3) [22], Inception v4 (Inc-v4), Inception ResNet v2 (IncRes-v2) [21], and ResNet v2-152 (Res-152) [7]. The other three are trained by ensemble adversarial training: Inc-v3ens3, Inc-v3ens4, and IncRes-v2ens [24]. We will simply refer to the last three models as "adversarially trained models" without ambiguity.
It is less meaningful to study the success rates of attacks if the models cannot classify the original images correctly. Therefore, we randomly choose 1000 images belonging to the 1000 categories from the ILSVRC 2012 validation set, all of which are correctly classified by all of these models.
In our experiments, we compare our methods with one-step gradient-based methods and iterative methods. Since optimization-based methods cannot explicitly control the distance between the adversarial examples and the corresponding real examples, they are not directly comparable to ours, but they have similar properties to iterative methods, as discussed in Sec. 2.1. For clarity, we only report the results based on the $L_\infty$ norm bound for non-targeted attacks, and leave the results based on the $L_2$ norm bound and targeted attacks to the supplementary material. The findings in this paper are general across different attack settings.
We report in Table 1 the success rates of attacks against the models we consider. The adversarial examples are generated for Inc-v3, Inc-v4, IncRes-v2 and Res-152 respectively, using the FGSM, iterative FGSM (I-FGSM) and MI-FGSM attack methods. The success rates are the misclassification rates of the corresponding models with adversarial images as inputs. The maximum perturbation $\epsilon$ is set to 16 in all experiments, with pixel values in [0, 255]. The number of iterations is 10 for I-FGSM and MI-FGSM, and the decay factor $\mu$ is 1.0, which will be studied in Sec. 4.2.1.
From the table, we can observe that MI-FGSM remains a strong white-box adversary like I-FGSM, since it can attack a white-box model with a near 100% success rate. On the other hand, it can be seen that I-FGSM reduces the success rates of black-box attacks compared with one-step FGSM. But by integrating momentum, our MI-FGSM outperforms both FGSM and I-FGSM in black-box attacks significantly. It obtains more than twice the success rates of I-FGSM in most black-box attack cases, demonstrating the effectiveness of the proposed algorithm. We show two adversarial images generated for Inc-v3 in Fig. 1.
It should be noted that although our method greatly improves the success rates for black-box attacks, it is still ineffective for attacking adversarially trained models (e.g., less than 16% for IncRes-v2ens) in the black-box manner. Later we show that ensemble-based approaches greatly improve the results in Sec. 4.3. Next, we study several aspects of MI-FGSM that are different from vanilla iterative methods, to further explain why it performs well in practice.
The decay factor $\mu$ plays a key role in improving the success rates of attacks. If $\mu = 0$, momentum-based iterative methods trivially reduce to vanilla iterative methods. Therefore, we study the appropriate value of the decay factor. We attack the Inc-v3 model by MI-FGSM with the perturbation $\epsilon = 16$, the number of iterations 10, and the decay factor ranging from 0.0 to 2.0 with a granularity of 0.1. We show the success rates of the generated adversarial examples against Inc-v3, Inc-v4, IncRes-v2 and Res-152 in Fig. 2. The curve of the success rate against a black-box model is unimodal, with its maximum value obtained at around $\mu = 1.0$. When $\mu = 1.0$, another interpretation of $g_t$ as defined in Eq. (6) is that it simply adds up all previous gradients to perform the current update.
We then study the effect of the number of iterations on the success rates when using I-FGSM and MI-FGSM. We adopt the same hyper-parameters (i.e., $\epsilon = 16$, $\mu = 1.0$) for attacking the Inc-v3 model with the number of iterations ranging from 1 to 10, and then evaluate the success rates of the adversarial examples against the Inc-v3, Inc-v4, IncRes-v2 and Res-152 models, with the results shown in Fig. 3.
It can be observed that when increasing the number of iterations, the success rate of I-FGSM against a black-box model gradually decreases, while that of MI-FGSM maintains at a high value. The results prove our argument that the adversarial examples generated by iterative methods easily overfit a white-box model and are not likely to transfer across models. But momentum-based iterative methods help to alleviate the trade-off between the white-box attacks and the transferability, thus demonstrating a strong attack ability for white-box and black-box models simultaneously.
To interpret why MI-FGSM demonstrates better transferability, we further examine the update directions given by I-FGSM and MI-FGSM along the iterations. We calculate the cosine similarity of two successive perturbations, and show the results when attacking Inc-v3 in Fig. 4. The update direction of MI-FGSM is more stable than that of I-FGSM, as indicated by the larger cosine similarity values for MI-FGSM.
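The diagnostic used here is simply the cosine similarity between the perturbations taken at two successive iterations; a minimal sketch is below, where `delta_t` and `delta_t1` would be recorded inside the attack loop.

```python
import torch
import torch.nn.functional as F

def update_cosine_similarity(delta_t, delta_t1):
    # Cosine similarity of two successive per-step perturbations; higher values
    # indicate a more stable update direction across iterations.
    return F.cosine_similarity(delta_t.flatten(), delta_t1.flatten(), dim=0).item()
```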
Recall that transferability comes from the fact that models learn similar decision boundaries around a data point [12]. Although the decision boundaries are similar, they are unlikely to be identical due to the highly non-linear structure of DNNs. So there may exist some exceptional decision regions around a data point for a model (the holes shown in Figs. 4 and 5 of [12]), which are hard to transfer to other models. These regions correspond to poor local maxima in the optimization process, and iterative methods can easily be trapped in such regions, resulting in less transferable adversarial examples. On the other hand, the stabilized update directions obtained by the momentum methods, as observed in Fig. 4, can help to escape from these exceptional regions, resulting in better transferability of adversarial attacks. Another interpretation is that the stabilized update directions make the $L_2$ norm of the perturbations larger, which may be helpful for transferability.
Table 2. The success rates (%) of non-targeted adversarial attacks of three ensemble methods. We report the results for an ensemble of white-box models and a hold-out black-box target model. We study four models: Inc-v3, Inc-v4, IncRes-v2 and Res-152. In each row, "-" indicates the name of the hold-out model, and the adversarial examples are generated for the ensemble of the other three models by FGSM, I-FGSM and MI-FGSM respectively. Ensemble in logits consistently outperforms the other methods.
Figure 5. The success rates (%) of the adversarial examples generated for Inc-v3, against Inc-v3 (white-box) and Res-152 (black-box). We compare the results of FGSM, I-FGSM and MI-FGSM with different sizes of perturbation. The curves of Inc-v3 vs. MI-FGSM and Inc-v3 vs. I-FGSM overlap.
We finally study the influence of the size of the adversarial perturbation on the success rates. We attack the Inc-v3 model by FGSM, I-FGSM and MI-FGSM with $\epsilon$ ranging from 1 to 40, with image intensities in [0, 255], and evaluate the performance on the white-box model Inc-v3 and a black-box model Res-152. In our experiments, we set the step size $\alpha$ in I-FGSM and MI-FGSM to 1, so the number of iterations grows linearly with the size of the perturbation $\epsilon$. The results are shown in Fig. 5.
For the white-box attack, iterative methods quickly reach a 100% success rate, but the success rate of one-step FGSM decreases when the perturbation is large. This phenomenon largely results from the inappropriate assumption of the linearity of the decision boundary when the perturbation is large [12]. For the black-box attacks, although the success rates of these three methods grow linearly with the size of the perturbation, MI-FGSM's success rate grows faster. In other words, to attack a black-box model with a required success rate, MI-FGSM can use a smaller perturbation, which is less perceptible to humans.
In this section, we show the experimental results of attacking an ensemble of models. We first compare the three ensemble methods introduced in Sec. 3.2, and then demonstrate that the adversarially trained models are vulnerable to our black-box attacks.
We compare the ensemble methods for attacks in this section. We include four models in our study: Inc-v3, Inc-v4, IncRes-v2 and Res-152. In our experiments, we keep one model as the hold-out black-box model and attack an ensemble of the other three models by FGSM, I-FGSM and MI-FGSM respectively, to fully compare the results of the three ensemble methods, i.e., ensemble in logits, ensemble in predictions and ensemble in loss. We set $\epsilon$ to 16, the number of iterations in I-FGSM and MI-FGSM to 10, $\mu$ in MI-FGSM to 1.0, and the ensemble weights equal. The results are shown in Table 2.
It can be observed that the ensemble in logits consistently outperforms the ensemble in predictions and the ensemble in loss, among all the attack methods and the different models in the ensemble, for both white-box and black-box attacks. Therefore, the ensemble in logits scheme is more suitable for adversarial attacks.
Another observation from Table 2 is that the adversarial examples generated by MI-FGSM transfer at a high rate, enabling strong black-box attacks. For example, by attacking an ensemble of Inc-v4, IncRes-v2 and Res-152 fused in logits without Inc-v3, the generated adversarial examples can fool Inc-v3 with a 87.9% success rate. Normally trained models show their great vulnerability against such an attack.
Table 3. The success rates (%) of non-targeted adversarial attacks against an ensemble of white-box models and a hold-out black-box target model. We include seven models: Inc-v3, Inc-v4, IncRes-v2, Res-152, Inc-v3ens3, Inc-v3ens4 and IncRes-v2ens. In each row, "-" indicates the name of the hold-out model, and the adversarial examples are generated for the ensemble of the other six models.
To attack the adversarially trained models in the black-box manner, we include all seven models introduced in Sec. 4.1. Similarly, we keep one adversarially trained model as the hold-out target model to evaluate the performance in the black-box manner, and attack the remaining six models in an ensemble, whose logits are fused together with equal ensemble weights. The perturbation $\epsilon$ is 16 and the decay factor $\mu$ is 1.0. We compare the results of FGSM, I-FGSM and MI-FGSM with 20 iterations. The results are shown in Table 3.
It can be seen that the adversarially trained models also cannot defend against our attacks effectively: more than 40% of the adversarial examples fool Inc-v3ens4. Therefore, the models obtained by ensemble adversarial training, the most robust models trained on ImageNet as far as we are concerned, are vulnerable to our attacks in the black-box manner, thus causing new security issues for developing algorithms that learn robust deep learning models.
There are three sub-competitions in the NIPS 2017 Adversarial Attacks and Defenses Competition: the Non-targeted Adversarial Attack, the Targeted Adversarial Attack, and the Defense Against Adversarial Attack. The organizers provide 5000 ImageNet-compatible images for evaluating the attack and defense submissions. For each attack, one adversarial example is generated for each image with the size of perturbation ranging from 4 to 16 (specified by the organizers), and all adversarial examples are run through all defense submissions to get the final score. We won the first places in both the non-targeted attack and the targeted attack with the method introduced in this paper. We specify the configurations of our submissions below.
For the non-targeted attack, we implement MI-FGSM for attacking an ensemble of Inc-v3, Inc-v4, IncRes-v2, Res-152, Inc-v3ens3, Inc-v3ens4, IncRes-v2ens and Inc-v3adv [10]. We adopt the ensemble in logits scheme. The ensemble weights are set equally as 1/7.25 for the first seven models and 0.25/7.25 for Inc-v3adv. The number of iterations is 10 and the decay factor $\mu$ is 1.0.
For the targeted attack, we build two graphs for attacks. If the size of the perturbation is smaller than 8, we attack Inc-v3 and IncRes-v2ens with ensemble weights 1/3 and 2/3; otherwise we attack an ensemble of Inc-v3, Inc-v3ens3, Inc-v3ens4, IncRes-v2ens and Inc-v3adv with ensemble weights 4/11, 1/11, 1/11, 4/11 and 1/11. The number of iterations is 40 and 20 respectively, and the decay factor $\mu$ is also 1.0.
Taking a different perspective, we think that finding an adversarial example is analogous to training a model, and that the transferability of the adversarial example is analogous to the generalizability of the model. Taking a meta view, we actually "train" an adversarial example given a set of models as training data. In this way, the improved transferability obtained by the momentum and ensemble methods is reasonable, because the generalizability of a model is usually improved by adopting the momentum optimizer or by training on more data. And we think that other tricks (e.g., SGD) for enhancing the generalizability of a model could also be incorporated into adversarial attacks for better transferability.
In this paper, we propose a broad class of momentum-based iterative methods to boost adversarial attacks, which can effectively fool white-box models as well as black-box models. Our methods consistently outperform one-step gradient-based methods and vanilla iterative methods in the black-box manner. We conduct extensive experiments to validate the effectiveness of the proposed methods and explain why they work in practice. To further improve the transferability of the generated adversarial examples, we propose to attack an ensemble of models whose logits are fused together. We show that the models obtained by ensemble adversarial training are vulnerable to our black-box attacks, which raises new security issues for the development of more robust deep learning models.
The work is supported by the National NSF of China (Nos. 61620106010, 61621136008, 61332007, 61571261 and U1611461), Beijing Natural Science Foundation (No. L172037), Tsinghua Tiangong Institute for Intelligent Computing and the NVIDIA NVAIL Program, and partially funded by Microsoft Research Asia and Tsinghua-Intel Joint Research Institute.
[1] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.
[2] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes. Ensemble selection from libraries of models. In ICML, 2004.
[3] Y. Dong, H. Su, J. Zhu, and F. Bao. Towards interpretable deep neural networks by leveraging adversarial examples. arXiv preprint arXiv:1708.05493, 2017.
[4] W. Duch and J. Korczak. Optimization and global minimization methods suitable for neural networks. Neural Computing Surveys, 2:163–212, 1998.
[5] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[6] L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993–1001, 1990.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In ECCV, 2016.
[8] A. Krogh and J. Vedelsby. Neural network ensembles, cross validation and active learning. In NIPS, 1994.
[9] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
[10] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In ICLR, 2017.
[11] Y. Li and Y. Gal. Dropout inference in Bayesian neural networks with alpha-divergences. In ICML, 2017.
[12] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.
[13] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff. On detecting adversarial perturbations. In ICLR, 2017.
[14] S. M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. In CVPR, 2017.
[15] T. Pang, C. Du, and J. Zhu. Robust deep learning via reverse cross-entropy training and thresholding test. arXiv preprint arXiv:1706.00633, 2017.
[16] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 2017.
[17] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.
[18] B. T. Polyak. Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5):1–17, 1964.
[19] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[20] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
[21] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, 2017.
[22] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception architecture for computer vision. In CVPR, 2016.
[23] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[24] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.