HopSkipJumpAttack:一种查询效率高的基于决策的攻击
HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
原文链接:
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9152788
GB/T 7714 Chen J, Jordan M I, Wainwright M J. Hopskipjumpattack: A query-efficient decision-based attack[C]//2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020: 1277-1294.
MLA Chen, Jianbo, Michael I. Jordan, and Martin J. Wainwright. "Hopskipjumpattack: A query-efficient decision-based attack." 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020.
APA Chen, J., Jordan, M. I., & Wainwright, M. J. (2020, May). Hopskipjumpattack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy (SP) (pp. 1277-1294). IEEE.
Abstract
摘要
The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for ℓ2 and ℓ∞ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than several state-of-the-art decision-based adversarial attacks. It also achieves competitive performance in attacking several widely-used defense mechanisms.
对训练好的模型进行基于决策的对抗攻击,其目标是仅通过观察目标模型返回的输出标签来生成对抗样本。我们开发了HopSkipJumpAttack,这是一族基于新型梯度方向估计的算法,该估计利用决策边界处的二元信息。所提出的算法族包括分别针对ℓ2和ℓ∞相似度度量优化的无目标攻击和有目标攻击。我们对所提算法及梯度方向估计进行了理论分析。实验表明,与若干最先进的基于决策的对抗攻击相比,HopSkipJumpAttack所需的模型查询次数显著更少。在攻击几种广泛使用的防御机制时,它也取得了有竞争力的性能。
I. INTRODUCTION
I. 介绍
Although deep neural networks have achieved state-of-the-art performance on a variety of tasks, they have been shown to be vulnerable to adversarial examples—that is, maliciously perturbed examples that are almost identical to original samples in human perception, but cause models to make incorrect decisions [1]. The vulnerability of neural networks to adversarial examples implies a security risk in applications with real-world consequences, such as self-driving cars, robotics, financial services, and criminal justice; in addition, it highlights fundamental differences between human learning and existing machine-based systems. The study of adversarial examples is thus necessary to identify the limitation of current machine learning algorithms, provide a metric for robustness, investigate the potential risk, and suggest ways to improve the robustness of models.
虽然深度神经网络已经在各种任务上取得了最先进的性能,但它们已经被证明容易受到敌对的例子的攻击,也就是说,恶意干扰的例子与人类感知的原始样本几乎相同,但导致模型做出错误的决定[1]。神经网络在对抗案例中的脆弱性意味着在具有现实后果的应用中存在安全风险,比如自动驾驶汽车、机器人、金融服务和刑事司法;此外,它强调了人类学习和现有的基于机器的系统之间的根本区别。因此,对抗性例子的研究对于识别当前机器学习算法的局限性,提供鲁棒性度量,研究潜在的风险,并提出改进模型鲁棒性的方法是必要的。
Recent years have witnessed a flurry of research on the design of new algorithms for generating adversarial examples [1–16]. Adversarial examples can be categorized according to at least three different criteria: the similarity metric, the attack goal, and the threat model. Commonly used similarity metrics are ℓp-distances between adversarial and original examples, with p ∈ {0, 2, ∞}. The goal of attack is either untargeted or targeted. The goal of an untargeted attack is to perturb the input so as to cause any type of misclassification, whereas the goal of a targeted attack is to alter the decision of the model to a pre-specified target class. Changing the loss function allows for switching between the two types of attacks [3, 5, 6].
近年来,针对生成对抗样本的新算法设计涌现了大量研究[1–16]。对抗样本至少可以按照三个不同的标准进行分类:相似度度量、攻击目标和威胁模型。常用的相似度度量是对抗样本与原始样本之间的ℓp距离,其中p ∈ {0, 2, ∞}。攻击目标要么是无目标的,要么是有目标的。无目标攻击的目标是扰动输入,从而引发任意类型的错误分类;而有目标攻击的目标是将模型的决策更改为预先指定的目标类。改变损失函数即可在两种攻击类型之间切换[3, 5, 6]。
Figure 1: An illustration of accessible components of the target model for each of the three threat models. A white-box threat model assumes access to the whole model; a score-based threat model assumes access to the output layer; a decision-based threat model assumes access to the predicted label alone.
图1:三个威胁模型的目标模型的可访问组件的图解。白盒威胁模型假设可以访问整个模型;基于分数的威胁模型假设可以访问输出层;基于决策的威胁模型假设只能访问预测的标签。
Perhaps the most important criterion in practice is the threat model, of which there are two primary types: white-box and black-box. In the white-box setting, an attacker has complete access to the model, including its structure and weights. Under this setting, the generation of adversarial examples is often formulated as an optimization problem, which is solved either via treating misclassification loss as a regularization [1, 6] or via tackling the dual as a constrained optimization problem [2, 3, 7]. In the black-box setting, an attacker can only access outputs of the target model. Based on whether one has access to the full probability or the label of a given input, black-box attacks are further divided into score-based and decision-based. See Figure 1 for an illustration of accessible components of the target model for each of the three threat models. Chen et al. [8] and Ilyas et al. [9, 10] introduced score-based methods using zeroth-order gradient estimation to craft adversarial examples.
也许在实践中最重要的标准是威胁模型,其中有两种主要类型:白盒和黑盒。在白盒设置中,攻击者可以完全访问模型,包括模型的结构和权重。在此设置下,对抗性例子的生成通常被表述为一个优化问题,该问题可以通过将误分类损失处理为正则化[1,6]或将对偶处理为一个约束优化问题[2,3,7]来解决。在黑箱设置中,攻击者只能访问目标模型的输出。根据是否能够访问给定输入的全部概率或标签,黑盒攻击进一步分为基于分数的攻击和基于决策的攻击。图1展示了三个威胁模型的目标模型的可访问组件。Chen等人[8]和Ilyas等人[9,10]引入了基于分数的方法,利用零阶梯度估计来制作对抗性实例。
The most practical threat model is that in which an attacker has access to decisions alone. A widely studied type of decision-based attack is the transfer-based attack. Liu et al. [11] showed that adversarial examples generated on an ensemble of deep neural networks by a white-box attack can be transferred to an unseen neural network. Papernot et al. [12, 13] proposed to train a substitute model by querying the target model. However, transfer-based attacks often require a carefully-designed substitute model, or even access to part of the training data. Moreover, they can be defended against via training on a data set augmented by adversarial examples from multiple static pre-trained models [17]. In recent work, Brendel et al. [14] proposed Boundary Attack, which generates adversarial examples via rejection sampling. While relying neither on training data nor on the assumption of transferability, this attack method achieves comparable performance with state-of-the-art white-box attacks such as the C&W attack [6]. One limitation of Boundary Attack, however, is that it was formulated only for the ℓ2-distance. Moreover, it requires a relatively large number of model queries, rendering it impractical for real-world applications.
最实际的威胁模型是攻击者只能获取决策结果。一种被广泛研究的基于决策的攻击是基于迁移的攻击。Liu等人[11]表明,由白盒攻击在深度神经网络集成上生成的对抗样本可以迁移到未见过的神经网络。Papernot等人[12, 13]提出通过查询目标模型来训练替代模型。然而,基于迁移的攻击通常需要精心设计的替代模型,甚至需要访问部分训练数据。此外,还可以通过在由多个静态预训练模型生成的对抗样本扩充的数据集上训练来防御这类攻击[17]。在最近的工作中,Brendel等人[14]提出了边界攻击(Boundary Attack),它通过拒绝采样生成对抗样本。这种攻击方法既不依赖训练数据,也不依赖可迁移性假设,却能达到与C&W攻击[6]等最先进白盒攻击相当的性能。然而,边界攻击的一个局限是它只针对ℓ2距离设计。此外,它需要相对大量的模型查询,使其在实际应用中不够实用。
It is more realistic to evaluate the vulnerability of a machine learning system under the decision-based attack with a limited budget of model queries. Online image classification platforms often set a limit on the allowed number of queries within a certain time period. For example, the cloud vision API from Google currently allow 1,800 requests per minute. Query inefficiency thus leads to clock-time inefficiency and prevents an attacker from carrying out large-scale attacks. A system may also be set to recognize the behavior of feeding a large number of similar queries within a small amount of time as a fraud, which will automatically filter out query-inefficient decision-based attacks. Last but not least, a smaller query budget directly implies less cost in evaluation and research. Query-efficient algorithms help save the cost of evaluating the robustness of public platforms, which incur a cost for each query made by the attacker. It also helps facilitate research in adversarial vulnerability, as such a decision-based attack which does not require access to model details may be used as a simple and efficient first step in evaluating new defense mechanisms, as we will see in Section V-B and Appendix C.
在模型查询预算有限的情况下,评估机器学习系统在决策攻击下的脆弱性更为现实。在线图像分类平台通常会对一定时间内允许的查询数量进行限制。例如,谷歌的cloud vision API目前允许每分钟1800个请求。查询效率低会导致时钟时间效率低,从而阻止攻击者进行大规模攻击。系统还可以将在短时间内输入大量类似查询的行为识别为欺诈行为,从而自动过滤掉查询效率低下的基于决策的攻击。最后但同样重要的是,更小的查询预算直接意味着更少的评估和研究成本。高效查询算法有助于节省评估公共平台健壮性的成本,因为攻击者每次查询都会产生成本。它还有助于对抗性漏洞的研究,因为这种不需要访问模型细节的基于决策的攻击可以作为评估新防御机制的简单而有效的第一步,我们将在V-B节和附录C中看到。
We propose a novel unbiased estimate of gradient direction at the decision boundary based solely on access to model decisions, and propose ways to control the error from deviation from the boundary.
我们提出了一种仅基于模型决策访问、在决策边界处对梯度方向进行的新型无偏估计,并提出了控制因偏离边界而产生的误差的方法。
We design a family of algorithms, HopSkipJumpAttack, based on the proposed estimate and our analysis, which is hyperparameter-free, query-efficient and equipped with a convergence analysis.
基于所提出的估计和我们的分析,我们设计了一族算法HopSkipJumpAttack,它无需超参数、查询高效,并配有收敛性分析。
We demonstrate the superior efficiency of our algorithm over several state-of-the-art decision-based attacks through extensive experiments.
通过大量的实验,我们证明了我们的算法在几个最先进的基于决策的攻击上的优越效率。
Through the evaluation of several defense mechanisms such as defensive distillation, region-based classification, adversarial training and input binarization, we suggest our attack can be used as a simple and efficient first step for researchers to evaluate new defense mechanisms.
通过对防御精馏、基于区域的分类、对抗训练和输入二值化等防御机制的评估,我们提出我们的攻击可以作为研究人员评估新的防御机制的一个简单有效的第一步。
Roadmap.
路线图。
In Section II, we describe previous work on decision-based adversarial attacks and their relationship to our algorithm. We also discuss the connection of our algorithm to zeroth-order optimization. In Section III, we propose and analyze a novel iterative algorithm which requires access to the gradient information. Each step carries out a gradient update from the boundary, and then projects back to the boundary again. In Section IV, we introduce a novel asymptotically unbiased gradient-direction estimate at the boundary, and a binary-search procedure to approach the boundary. We also discuss how to control errors with deviation from the boundary. The analysis motivates a decision-based algorithm, HopSkipJumpAttack (Algorithm 2). Experimental results are provided in Section V. We conclude in Section VI with a discussion of future work.
在第二节中,我们描述了以前关于基于决策的对抗攻击的工作,以及它们与我们算法的关系。我们还讨论了我们的算法与零阶优化的联系。在第三节中,我们提出并分析了一种新的迭代算法,该算法需要获取梯度信息。每一步都从边界执行一个梯度更新,然后再投影回边界。在第四节中,我们介绍了一种新的渐近无偏梯度方向估计方法,以及一种接近边界的二值搜索方法。我们还讨论了如何控制偏离边界的误差。通过分析,我们提出了一种基于决策的算法,HopSkipJumpAttack(算法2)。实验结果将在第五节中给出。我们将在第六节中总结并讨论未来的工作。
II. RELATED WORK
II. 相关工作
A. Decision-based attacks
A. 基于决策的攻击
Most related to our work is the Boundary Attack method introduced by Brendel et al. [14]. Boundary Attack is an iterative algorithm based on rejection sampling, initialized at an image that lies in the target class. At each step, a perturbation is sampled from a proposal distribution, which reduces the distance of the perturbed image towards the original input. If the perturbed image still lies in the target class, the perturbation is kept. Otherwise, the perturbation is dropped. Boundary Attack achieves performance comparable to state-of-the-art white-box attacks on deep neural networks for image classification. The key obstacle to its practical application is, however, the demand for a large number of model queries. In practice, the required number of model queries for crafting an adversarial example directly determines the level of the threat imposed by a decision-based attack. One source of inefficiency in Boundary Attack is the rejection of perturbations which deviate from the target class. In our algorithm, such perturbations are used for the estimation of a gradient direction.
与我们工作最相关的是Brendel等人[14]提出的边界攻击方法。边界攻击是一种基于拒绝采样的迭代算法,从一张位于目标类中的图像初始化。在每一步,从一个提议分布中采样一个扰动,使扰动后的图像更接近原始输入。如果扰动后的图像仍位于目标类中,则保留该扰动;否则丢弃该扰动。在图像分类任务上,边界攻击在深度神经网络上的性能可与最先进的白盒攻击相媲美。然而,其实际应用的关键障碍是需要大量的模型查询。在实践中,构造一个对抗样本所需的模型查询数量直接决定了基于决策的攻击所造成的威胁程度。边界攻击效率低下的一个来源是丢弃了偏离目标类的扰动;在我们的算法中,这些扰动被用于估计梯度方向。
Several other decision-based attacks have been proposed to improve efficiency. Brunner et al. [15] introduced Biased Boundary Attack, which biases the sampling procedure by combining low-frequency random noise with the gradient from a substitute model. Biased Boundary Attack is able to significantly reduce the number of model queries. However, it relies on the transferability between the substitute model and the target model, as with other transfer-based attacks. Our algorithm does not rely on the additional assumption of transferability. Instead, it achieves a significant improvement over Boundary Attack through the exploitation of discarded information into the gradient-direction estimation. Ilyas et al. [9] proposed Limited attack in the label-only setting, which directly performs projected gradient descent by estimating gradients based on novel proxy scores. Cheng et al. [16] introduced Opt attack, which transforms the original problem to a continuous version, and solves the new problem via randomized zeroth-order gradient update. Our algorithm approaches the original problem directly via a novel gradientdirection estimate, leading to improved query efficiency over both Limited Attack and Opt Attack. The majority of model queries in HopSkipJumpAttack come in mini-batches, which also leads to improved clock-time efficiency over Boundary Attack.
为了提高效率,还有其他几种基于决策的攻击被提出。Brunner等人[15]引入了有偏边界攻击(Biased Boundary Attack),它通过将低频随机噪声与替代模型的梯度相结合来使采样过程产生偏置。有偏边界攻击能够显著减少模型查询的数量。然而,与其他基于迁移的攻击一样,它依赖于替代模型和目标模型之间的可迁移性。我们的算法不依赖这一额外的可迁移性假设,而是通过将被丢弃的信息用于梯度方向估计,相对边界攻击取得了显著改进。Ilyas等人[9]在仅有标签的设置下提出了Limited攻击,它基于新颖的代理分数估计梯度,直接执行投影梯度下降。Cheng等人[16]引入了Opt攻击,将原问题转化为一个连续版本,并通过随机化的零阶梯度更新求解新问题。我们的算法通过一种新型梯度方向估计直接处理原问题,从而在查询效率上优于Limited攻击和Opt攻击。HopSkipJumpAttack中的大多数模型查询以小批量方式进行,这也使其时钟时间效率优于边界攻击。
B. Zeroth-order optimization
B. 零阶优化
Zeroth-order optimization refers to the problem of optimizing a function f based only on access to function values f(x), as opposed to gradient values ∇f(x). Such problems have been extensively studied in the convex optimization and bandit literatures. Flaxman et al. [18] studied a one-point randomized estimate of the gradient for bandit convex optimization. Agarwal et al. [19] and Nesterov and Spokoiny [20] demonstrated that faster convergence can be achieved by using two function evaluations for estimating the gradient. Duchi et al. [21] established optimal rates of convex zeroth-order optimization via mirror descent with two-point gradient estimates. Zeroth-order algorithms have been applied to the generation of adversarial examples under the score-based threat model [8–10]. Subsequent work [22] proposed and analyzed an algorithm based on variance-reduced stochastic gradient estimates.
零阶优化是指仅基于函数值f(x)(而非梯度值∇f(x))来优化函数f的问题。这类问题在凸优化和bandit文献中得到了广泛研究。Flaxman等人[18]研究了用于bandit凸优化的梯度一点随机估计。Agarwal等人[19]以及Nesterov和Spokoiny [20]证明,使用两次函数求值来估计梯度可以实现更快的收敛。Duchi等人[21]通过带两点梯度估计的镜像下降建立了凸零阶优化的最优收敛率。零阶算法已被应用于基于分数的威胁模型下的对抗样本生成[8–10]。后续工作[22]提出并分析了一种基于方差缩减随机梯度估计的算法。
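As a point of comparison with the decision-based setting discussed next, the following is a minimal NumPy sketch of a generic two-point zeroth-order gradient estimator of the kind studied in [19–21]; it is an illustrative snippet, not part of the paper's algorithm, and the function `f` is assumed to return a real-valued score.

import numpy as np

def two_point_gradient_estimate(f, x, delta=1e-3, num_samples=100, rng=None):
    """Estimate grad f(x) from function values only, via two-point evaluations
    along random unit directions (a generic zeroth-order estimator, not HSJA)."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                      # random direction on the unit sphere
        grad += (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
    return d * grad / num_samples                   # scale by dimension

# Example usage on a smooth test function:
# f = lambda z: np.sum(z ** 2); print(two_point_gradient_estimate(f, np.ones(5)))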
We formulate decision-based attack as an optimization problem. A core component of our proposed algorithm is a gradient-direction estimate, the design of which is motivated by zeroth-order optimization. However, the problem of decision-based attack is more challenging than zerothorder optimization, essentially because we only have binary information from output labels of the target model, rather than function values.
我们将基于决策的攻击表述为一个优化问题。我们所提算法的核心组成部分是一个梯度方向估计,其设计受到零阶优化的启发。然而,基于决策的攻击问题比零阶优化更具挑战性,本质上是因为我们只能从目标模型的输出标签中获得二元信息,而非函数值。
III. AN OPTIMIZATION FRAMEWORK
III. 一个优化框架
In this section, we describe an optimization framework for finding adversarial instances for an m-ary classification model of the following type. The first component is a discriminant function F : ℝ^d → ℝ^m that accepts an input x ∈ ℝ^d and produces an output y ∈ ℝ^m. The output vector y = (F1(x), ..., Fm(x)) can be viewed as a probability distribution over the label set [m] = {1, ..., m}. Based on the function F, the classifier C : ℝ^d → [m] assigns input x to the class with maximum probability—that is, C(x) := arg max_{c ∈ [m]} Fc(x).
在本节中,我们描述一个优化框架,用于为如下类型的m类分类模型寻找对抗实例。第一个组成部分是判别函数F : ℝ^d → ℝ^m,它接受输入x ∈ ℝ^d并产生输出y ∈ ℝ^m。输出向量y = (F1(x), ..., Fm(x))可以看作标签集合[m] = {1, ..., m}上的一个概率分布。基于函数F,分类器C : ℝ^d → [m]将输入x分配给概率最大的类,即C(x) := arg max_{c ∈ [m]} Fc(x)。
We study adversaries of both the untargeted and targeted varieties. Given some input x⋆, the goal of an untargeted attack is to change the original classifier decision c⋆ := C(x⋆) to any class c ≠ c⋆, whereas the goal of a targeted attack is to change the decision to some pre-specified target class c† ≠ c⋆. Formally, if we define the function S_{x⋆} : ℝ^d → ℝ via

S_{x⋆}(x′) := max_{c ≠ c⋆} Fc(x′) − F_{c⋆}(x′) (untargeted), or S_{x⋆}(x′) := F_{c†}(x′) − max_{c ≠ c†} Fc(x′) (targeted),   (1)
我们同时研究无目标和有目标两类攻击。给定输入x⋆,无目标攻击的目标是将分类器原本的决策c⋆ := C(x⋆)更改为任意类c ≠ c⋆,而有目标攻击的目标是将决策更改为某个预先指定的目标类c† ≠ c⋆。形式化地,若我们定义函数S_{x⋆} : ℝ^d → ℝ为

S_{x⋆}(x′) := max_{c ≠ c⋆} Fc(x′) − F_{c⋆}(x′)(无目标),或 S_{x⋆}(x′) := F_{c†}(x′) − max_{c ≠ c†} Fc(x′)(有目标),  (1)
then a perturbed image x′ is a successful attack if and only if S_{x⋆}(x′) > 0. The boundary between successful and unsuccessful perturbed images is bd(S_{x⋆}) := {z ∈ [0, 1]^d : S_{x⋆}(z) = 0}.
那么,扰动后的图像x′是一次成功攻击当且仅当S_{x⋆}(x′) > 0。成功与不成功扰动图像之间的边界为bd(S_{x⋆}) := {z ∈ [0, 1]^d : S_{x⋆}(z) = 0}。
As an indicator of successful perturbation, we introduce the Boolean-valued function φ_{x⋆} : [0, 1]^d → {−1, 1} via φ_{x⋆}(x′) := sign(S_{x⋆}(x′)), which equals 1 if S_{x⋆}(x′) > 0 and −1 otherwise.
作为扰动是否成功的指示,我们引入布尔值函数φ_{x⋆} : [0, 1]^d → {−1, 1},定义为φ_{x⋆}(x′) := sign(S_{x⋆}(x′)),当S_{x⋆}(x′) > 0时取值为1,否则为−1。
This function is accessible in the decision-based setting, as it can be computed by querying the classifier C alone. The goal of an adversarial attack is to generate a perturbed sample x′ such that φ_{x⋆}(x′) = 1, while keeping x′ close to the original sample x⋆. This can be formulated as the optimization problem

min_{x′} d(x′, x⋆) subject to φ_{x⋆}(x′) = 1,   (2)
该函数在基于决策的设置中是可获取的,因为仅通过查询分类器C即可计算它。对抗攻击的目标是生成一个扰动样本x′,使得φ_{x⋆}(x′) = 1,同时使x′接近原始样本x⋆。这可以表述为如下优化问题:

min_{x′} d(x′, x⋆) s.t. φ_{x⋆}(x′) = 1,  (2)
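Since φ_{x⋆} only requires the top-1 label returned by the model, it can be realized as a simple query oracle. The following is a minimal sketch, assuming a `predict_label` function that returns the classifier decision C(x); the helper names are illustrative, not from the paper's released code.

import numpy as np

def make_phi(predict_label, x_star, target_label=None):
    """Build the decision oracle phi_{x*}: returns +1 for a successful
    perturbation and -1 otherwise, using only the predicted label C(x)."""
    original_label = predict_label(x_star)

    def phi(x_prime):
        label = predict_label(np.clip(x_prime, 0.0, 1.0))   # keep inside the input domain
        if target_label is None:                             # untargeted attack
            return 1 if label != original_label else -1
        return 1 if label == target_label else -1            # targeted attack

    return phi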
where d is a distance function that quantifies similarity. Standard choices of d studied in past work [2, 5, 6] include the usual ℓp-norms, for p ∈ [0, ∞].
其中d是量化相似性的距离函数。以往工作[2, 5, 6]中研究的d的标准选择包括常用的ℓp范数,其中p ∈ [0, ∞]。
A. An iterative algorithm for ℓ2 distance
A. 针对ℓ2距离的迭代算法
Consider the optimization problem (2) with the ℓ2-norm. We first specify an iterative algorithm that is given access to the gradient ∇S_{x⋆}. Given an initial vector x0 such that φ_{x⋆}(x0) = 1 and a stepsize sequence {ξt}, it performs the update

x̃t = xt + ξt ∇S_{x⋆}(xt) / ‖∇S_{x⋆}(xt)‖2,  xt+1 = αt x⋆ + (1 − αt) x̃t,   (3)
考虑采用ℓ2范数的优化问题(2)。我们首先给出一个可以访问梯度∇S_{x⋆}的迭代算法。给定满足φ_{x⋆}(x0) = 1的初始向量x0和步长序列{ξt},它执行如下更新:

x̃t = xt + ξt ∇S_{x⋆}(xt) / ‖∇S_{x⋆}(xt)‖2,  xt+1 = αt x⋆ + (1 − αt) x̃t,  (3)
where ξt is a positive step size. Here the line search parameter αt is chosen such that S_{x⋆}(xt+1) = 0—that is, so that the next iterate xt+1 lies on the boundary. The motivation for this choice is that our gradient-direction estimate in Section IV is only valid near the boundary.
其中ξt是一个正的步长。这里,线搜索参数αt的选取使得S_{x⋆}(xt+1) = 0,即使得下一个迭代点xt+1位于边界上。这样选择的动机在于,我们在第IV节中的梯度方向估计只在边界附近有效。
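A minimal sketch of the idealized update (3), assuming white-box access to S_{x⋆} and its gradient (the callables `S` and `grad_S` are placeholders for this section's idealized setting); the line search on α is done by bisection so that the next iterate returns to the boundary.

import numpy as np

def gradient_access_step(x_t, x_star, S, grad_S, xi_t, tol=1e-6):
    """One update of Eq. (3): step along the normalized gradient of S_{x*},
    then line-search alpha so that S_{x*}(x_{t+1}) is (approximately) zero."""
    g = grad_S(x_t)
    x_tilde = x_t + xi_t * g / np.linalg.norm(g)     # gradient step away from the boundary

    lo, hi = 0.0, 1.0                                # alpha in [0, 1]
    while hi - lo > tol:
        alpha = (lo + hi) / 2.0
        x_next = alpha * x_star + (1.0 - alpha) * x_tilde
        if S(x_next) > 0:                            # still adversarial: can move closer to x*
            lo = alpha
        else:
            hi = alpha
    return lo * x_star + (1.0 - lo) * x_tilde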
We now analyze this algorithm under the assumption that we have access to the gradient of S_{x⋆}, in the setting of binary classification. Assume that the function S_{x⋆} is twice differentiable with a locally Lipschitz gradient, meaning that there exists L > 0 such that for all x, y in a neighborhood of the boundary, we have ‖∇S_{x⋆}(x) − ∇S_{x⋆}(y)‖2 ≤ L‖x − y‖2.
我们现在在二分类设定下分析该算法,假设可以获得S_{x⋆}的梯度。假设函数S_{x⋆}二次可微且梯度局部Lipschitz连续,即存在L > 0,使得对边界附近的所有x, y,有‖∇S_{x⋆}(x) − ∇S_{x⋆}(y)‖2 ≤ L‖x − y‖2。
In addition, we assume the gradient is bounded away from zero on the boundary: there exists a constant C̃ > 0 such that ‖∇S_{x⋆}(z)‖2 ≥ C̃ for any z ∈ bd(S_{x⋆}).
此外,我们假设梯度在边界上有界远离零:存在常数C̃ > 0,使得对任意z ∈ bd(S_{x⋆}),有‖∇S_{x⋆}(z)‖2 ≥ C̃。
We analyze the behavior of the updates (3) in terms of the angular measure

r(xt, x⋆) := ⟨xt − x⋆, ∇S_{x⋆}(xt)⟩ / (‖xt − x⋆‖2 ‖∇S_{x⋆}(xt)‖2),
我们用角度量

r(xt, x⋆) := ⟨xt − x⋆, ∇S_{x⋆}(xt)⟩ / (‖xt − x⋆‖2 ‖∇S_{x⋆}(xt)‖2)

来分析更新(3)的行为,
corresponding to the cosine of the angle between xt − x⋆ and the gradient ∇S_{x⋆}(xt). Note that the condition r(x, x⋆) = 1 holds if and only if x is a stationary point of the optimization problem (2). The following theorem guarantees that, with a suitable step size, the updates converge to such a stationary point:
即xt − x⋆与梯度∇S_{x⋆}(xt)之间夹角的余弦。注意,条件r(x, x⋆) = 1成立当且仅当x是优化问题(2)的驻点。下述定理保证,在合适的步长下,更新会收敛到这样的驻点:
Theorem 1. Under the previously stated conditions on S_{x⋆}, suppose that we compute the updates (3) with step size ξt = ‖xt − x⋆‖2 t^{−q} for some q ∈ (1/2, 1). Then there is a universal constant c such that the gap 1 − r(xt, x⋆) is bounded by a term proportional to c that vanishes as t grows.
定理1。在前述关于S_{x⋆}的条件下,假设我们以步长ξt = ‖xt − x⋆‖2 t^{−q}(对某个q ∈ (1/2, 1))计算更新(3)。那么,存在一个普适常数c,使得差距1 − r(xt, x⋆)被一个与c成比例且随t增大而趋于零的量所控制。
In particular, the algorithm converges to a stationary point of problem (2).
特别地,该算法收敛于问题(2)的一个平稳点。
Theorem 1 suggests a scheme for choosing the step size in the algorithm that we present in the next section. An experimental evaluation of the proposed scheme is carried out in Appendix B. The proof of the theorem is constructed by relating the objective values ‖xt − x⋆‖2 at consecutive iterates, using a second-order Taylor approximation to the boundary. See Appendix A-A for details.
定理1为我们在下一节给出的算法提供了一种选择步长的方案。附录B对所提方案进行了实验评估。该定理的证明通过对边界做二阶泰勒近似、建立相邻迭代点处目标值‖xt − x⋆‖2之间的关系而构造。详见附录A-A。
B. Extension to ℓ∞-distance
B. 向ℓ∞距离的扩展
We now describe how to extend these updates so as to minimize the ℓ∞-distance. Consider the ℓ2-projection of a point x onto the sphere of radius α centered at x⋆:
我们现在描述如何扩展这些更新以最小化ℓ∞距离。考虑点x到以x⋆为中心、半径为α的球面上的ℓ2投影:
This perspective allows us to extend the algorithm to other ℓp-norms for p ∈ [1, ∞]. For instance, in the case p = ∞, we can define the ℓ∞-projection operator Π^∞_{x⋆,α}. It performs a per-pixel clip within a neighborhood of x⋆, such that the ith entry of Π^∞_{x⋆,α}(x) is

(Π^∞_{x⋆,α}(x))_i = max{min{x_i, (x⋆)_i + c}, (x⋆)_i − c},
这一视角使我们能够将算法扩展到其他ℓp范数(p ∈ [1, ∞])。例如,在p = ∞的情形下,我们可以定义ℓ∞投影算子Π^∞_{x⋆,α},它在x⋆的一个邻域内执行逐像素裁剪,使得Π^∞_{x⋆,α}(x)的第i个分量为

(Π^∞_{x⋆,α}(x))_i = max{min{x_i, (x⋆)_i + c}, (x⋆)_i − c},
where c := α‖x − x⋆‖∞. We propose the ℓ∞-version of our algorithm by carrying out the following update iteratively:

xt+1 = Π^∞_{x⋆,αt}( xt + ξt · sign(∇S_{x⋆}(xt)) ),
其中c := α‖x − x⋆‖∞。我们通过迭代执行如下更新,提出算法的ℓ∞版本:

xt+1 = Π^∞_{x⋆,αt}( xt + ξt · sign(∇S_{x⋆}(xt)) ),
where αt is chosen such that xt+1 lies on the boundary (S_{x⋆}(xt+1) = 0), and "sign" returns the element-wise sign of a vector. We use the sign of the gradient for faster convergence in practice, similar to previous work [2, 3, 7].
其中αt的选取使得xt+1位于边界上(S_{x⋆}(xt+1) = 0),"sign"返回向量的逐元素符号。在实践中,我们使用梯度的符号以加快收敛,这与以往工作[2, 3, 7]类似。
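A small NumPy sketch of the per-pixel clipping interpretation of the ℓ∞-projection described above, together with its ℓ2 counterpart from Eq. (3); the radius convention c = α·‖x − x⋆‖∞ is the assumption stated in the text.

import numpy as np

def linf_projection(x, x_star, alpha):
    """Per-pixel clip of x into an l-infinity neighborhood of x_star,
    whose half-width is c = alpha * ||x - x_star||_inf."""
    c = alpha * np.max(np.abs(x - x_star))
    return np.clip(x, x_star - c, x_star + c)

def l2_projection(x, x_star, alpha):
    """The l2 counterpart used in Eq. (3): blend x toward x_star by a factor alpha."""
    return alpha * x_star + (1.0 - alpha) * x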
IV. A DECISION-BASED ALGORITHM BASED ON A NOVEL GRADIENT ESTIMATE
IV. 基于新型梯度估计的决策算法
We now extend our procedures to the decision-based setting, in which we have access only to the Boolean-valued function φ_{x⋆}—that is, the method cannot observe the underlying discriminant function F or its gradient. In this section, we introduce a gradient-direction estimate based on φ_{x⋆} when xt ∈ bd(S_{x⋆}) (so that S_{x⋆}(xt) = 0 by definition). We proceed to discuss how to approach the boundary. Then we discuss how to control the error of our estimate when there is a deviation from the boundary. We summarize the analysis with a decision-based algorithm.
现在,我们将上述过程扩展到基于决策的设置:我们只能访问布尔值函数φ_{x⋆},即该方法无法观察底层的判别函数F或其梯度。在本节中,我们介绍一种在xt ∈ bd(S_{x⋆})(此时根据定义S_{x⋆}(xt) = 0)时基于φ_{x⋆}的梯度方向估计。接着我们讨论如何逼近边界,然后讨论当迭代点偏离边界时如何控制估计的误差。最后我们将分析总结为一个基于决策的算法。
A. At the boundary
A.在边界处
Given an iterate xt ∈ bd(S_{x⋆}), we propose to approximate the direction of the gradient ∇S_{x⋆}(xt) via the Monte Carlo estimate

∇̃S(xt, δ) := (1/B) Σ_{b=1}^{B} φ_{x⋆}(xt + δ ub) ub,   (9)
对于迭代点xt ∈ bd(S_{x⋆}),我们提出通过蒙特卡洛估计

∇̃S(xt, δ) := (1/B) Σ_{b=1}^{B} φ_{x⋆}(xt + δ ub) ub  (9)

来近似梯度∇S_{x⋆}(xt)的方向,
where {ub}_{b=1}^{B} are i.i.d. draws from the uniform distribution over the d-dimensional sphere, and δ is a small positive parameter. (The dependence of this estimator on the fixed centering point x⋆ is omitted for notational simplicity.)
其中{ub}_{b=1}^{B}是从d维球面上均匀分布中独立同分布抽取的样本,δ是一个小的正参数。(为记号简洁,省略了该估计量对固定中心点x⋆的依赖。)
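A NumPy sketch of the Monte Carlo gradient-direction estimate in equation (9), using the decision oracle `phi` from the earlier sketch; inputs are treated as flat vectors, and uniform sampling on the sphere is done by normalizing Gaussian draws.

import numpy as np

def estimate_gradient_direction(phi, x_t, delta, num_samples=100, rng=None):
    """Monte Carlo estimate of the gradient direction at (or near) the boundary,
    averaging phi(x_t + delta*u_b) * u_b over random unit vectors u_b (Eq. (9))."""
    rng = np.random.default_rng(rng)
    d = x_t.size
    u = rng.standard_normal((num_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)            # uniform directions on the unit sphere
    decisions = np.array([phi(x_t + delta * ub) for ub in u], dtype=float)  # +1 / -1 labels
    grad = (decisions[:, None] * u).mean(axis=0)
    return grad / np.linalg.norm(grad)                       # only the direction is used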
The perturbation parameter δ is necessary, but introduces a form of bias into the estimate. Our first result controls this bias, and shows that ∇̃S(xt, δ) is asymptotically unbiased as δ → 0+.
扰动参数δ是必要的,但会在估计中引入一种偏差。我们的第一个结果控制了这种偏差,并表明当δ → 0+时,∇̃S(xt, δ)是渐近无偏的。
Theorem 2. For a boundary point xt, suppose that S_{x⋆} has an L-Lipschitz gradient in a neighborhood of xt. Then the cosine of the angle between E[∇̃S(xt, δ)] and ∇S_{x⋆}(xt) is bounded as
定理2。对于边界点xt,假设S_{x⋆}在xt的邻域内具有L-Lipschitz梯度。那么,E[∇̃S(xt, δ)]与∇S_{x⋆}(xt)之间夹角的余弦有如下界:
In particular, we have cos∠(E[∇̃S(xt, δ)], ∇S_{x⋆}(xt)) → 1 as δ → 0+,
特别地,我们有cos∠(E[∇̃S(xt, δ)], ∇S_{x⋆}(xt)) → 1(当δ → 0+时),
showing that the estimate is asymptotically unbiased as an estimate of direction.
表明估计作为方向的估计是渐近无偏的。
We remark that Theorem 2 only establishes the asymptotic behavior of the proposed estimate at the boundary. This also motivates the boundary search step in our algorithm, to be discussed in Section IV-B. The proof of Theorem 2 starts from dividing the unit sphere into three components: the upper cap along the direction of the gradient, the lower cap opposite to the direction of the gradient, and the annulus in between. The error from the annulus can be bounded when δ is small. See Appendix A-B for the proof of this theorem. As will be seen in the sequel, the size of perturbation δ should be chosen proportionally to d^{−1}; see Section IV-C for details.
我们注意到,定理2只刻画了所提估计量在边界处的渐近行为。这也促成了我们算法中的边界搜索步骤,将在第IV-B节讨论。定理2的证明从把单位球面分成三个部分开始:沿梯度方向的上球冠、与梯度方向相反的下球冠,以及二者之间的环带。当δ较小时,来自环带的误差可以被控制。该定理的证明见附录A-B。正如后文所示,扰动大小δ应与d^{−1}成比例选择;详见第IV-C节。
B. Approaching the boundary
B.接近边界
The proposed estimate (9) is only valid at the boundary. We now describe how we approach the boundary via a binary search. Let x̃t denote the updated sample before the projection operator is applied:

x̃t = xt + ξt vt(xt, δt),
所提出的估计(9)仅在边界处有效。我们现在描述如何通过二分搜索逼近边界。令x̃t表示在应用投影算子之前更新得到的样本:

x̃t = xt + ξt vt(xt, δt),
where vt(xt, δt) will be introduced later in equation (16) as a variance-reduced version of ∇̃S(xt, δt), and δt is the size of the perturbation at the t-th step.
其中vt(xt, δt)将在式(16)中引入,作为∇̃S(xt, δt)的方差缩减版本,δt是第t步扰动的大小。
We hope x̃t is at the opposite side of the boundary from x⋆ so that the binary search can be carried out. Therefore, we initialize at x̃0 on the target side with φ_{x⋆}(x̃0) = 1, and set x0 = α0 x⋆ + (1 − α0) x̃0, where α0 is chosen via a binary search between 0 and 1 to approach the boundary, stopped at x0 lying on the target side with φ_{x⋆}(x0) = 1. At the t-th iteration, we start at xt lying on the target side, φ_{x⋆}(xt) = 1. The step size is initialized as

ξt = ‖xt − x⋆‖2 / √t,   (13)
我们希望x̃t位于边界上与x⋆相反的一侧,从而可以执行二分搜索。因此,我们在目标一侧以满足φ_{x⋆}(x̃0) = 1的x̃0初始化,并设x0 = α0 x⋆ + (1 − α0) x̃0,其中α0通过在0和1之间的二分搜索来逼近边界,搜索停止于位于目标一侧、满足φ_{x⋆}(x0) = 1的x0。在第t次迭代中,我们从位于目标一侧(φ_{x⋆}(xt) = 1)的xt出发。步长初始化为

ξt = ‖xt − x⋆‖2 / √t,  (13)
as suggested by Theorem 1 in the ℓ2 case, and is decreased by half until φ_{x⋆}(x̃t) = 1, which we call geometric progression of ξt. Having found an appropriate x̃t, we choose the projection radius αt via a binary search between 0 and 1 to approach the boundary, which stops at xt+1 with φ_{x⋆}(xt+1) = 1. See Algorithm 1 for the complete binary search, where the binary search threshold θ is set to be some small constant.
这是ℓ2情形下定理1所建议的取法;随后步长不断减半,直至φ_{x⋆}(x̃t) = 1,我们称之为ξt的几何递减。找到合适的x̃t后,我们通过在0和1之间的二分搜索选择投影半径αt以逼近边界,搜索停止于满足φ_{x⋆}(xt+1) = 1的xt+1。完整的二分搜索见算法1,其中二分搜索阈值θ被设为某个较小的常数。
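A sketch of the boundary-search logic just described: geometric progression of the step size followed by a binary search on the projection radius. It assumes the `phi` oracle from the earlier sketch, and it is a simplified ℓ2-only reading of Algorithm 1, not a faithful reproduction of it.

import numpy as np

def geometric_step(phi, x_t, grad_dir, xi_init):
    """Decrease the step size by half until the perturbed point is still adversarial."""
    xi = xi_init
    x_tilde = x_t + xi * grad_dir
    while phi(x_tilde) != 1 and xi > 1e-12:
        xi /= 2.0
        x_tilde = x_t + xi * grad_dir
    return x_tilde

def boundary_binary_search(phi, x_star, x_tilde, theta=1e-3):
    """Binary search on alpha in [0, 1] so the returned point lies just on the
    adversarial side of the boundary (phi = 1), within threshold theta."""
    lo, hi = 0.0, 1.0              # alpha = 0 keeps x_tilde (adversarial), alpha = 1 gives x_star
    while hi - lo > theta:
        mid = (lo + hi) / 2.0
        x_mid = mid * x_star + (1.0 - mid) * x_tilde
        if phi(x_mid) == 1:
            lo = mid               # can move closer to x_star and stay adversarial
        else:
            hi = mid
    return lo * x_star + (1.0 - lo) * x_tilde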
C. Controlling errors of deviations from the boundary
C. 控制偏离边界所带来的误差
Binary search never places xt+1 exactly onto the boundary. We analyze the error of the gradient-direction estimate, and propose two approaches for reducing the error.
二分法搜索永远不会把xt+1精确地放在边界上。 分析了梯度方向估计的误差,提出了减小误差的两种方法。
a) Appropriate choice of the size of random perturbation: First, the size δt of the random perturbation for estimating the gradient direction is chosen as a function of the image size d and the binary search threshold θ. This is different from numerical differentiation, where the optimal choice of δt is at the scale of round-off errors (e.g., [23]). Below we characterize the error incurred by a large δt as a function of the distance between x̃t and the boundary, and derive the appropriate choice of ξt and δt. In fact, with a Taylor approximation of S_{x⋆} at xt, we have

φ_{x⋆}(xt + δt u) = sign( S_{x⋆}(xt) + δt ⟨∇S_{x⋆}(xt), u⟩ + O(δt²) ).   (14)
a) 随机扰动大小的恰当选择:首先,用于估计梯度方向的随机扰动大小δt被选为图像尺寸d和二分搜索阈值θ的函数。这不同于数值微分,后者中δt的最优选择处于舍入误差的量级(例如[23])。下面我们把较大的δt所带来的误差刻画为x̃t与边界之间距离的函数,并据此推导ξt和δt的恰当选择。事实上,对S_{x⋆}在xt处做泰勒近似,我们有

φ_{x⋆}(xt + δt u) = sign( S_{x⋆}(xt) + δt ⟨∇S_{x⋆}(xt), u⟩ + O(δt²) )。 (14)
At the boundary, S_{x⋆}(xt) = 0 and the error of the gradient approximation scales as O(δt²), which is minimized by reducing δt to the scale of the square root of the round-off error. However, the outcome xt of a finite-step binary search lies close to, but not exactly on, the boundary.
在边界上S_{x⋆}(xt) = 0,梯度近似的误差以O(δt²)的量级变化,通过将δt减小到舍入误差平方根的量级即可使其最小。然而,有限步二分搜索得到的xt只是接近边界,并不精确位于边界上。
When δt is small enough that second-order terms can be omitted, the first-order Taylor approximation implies that φ_{x⋆}(xt + δt u) = −1 if and only if the direction u lies in the spherical cap C, with

C := { u : ⟨u, ∇S_{x⋆}(xt)⟩ < −S_{x⋆}(xt)/δt }.
当δt足够小以致可以忽略二阶项时,一阶泰勒近似表明,φ_{x⋆}(xt + δt u) = −1当且仅当方向u位于球冠C内,其中

C := { u : ⟨u, ∇S_{x⋆}(xt)⟩ < −S_{x⋆}(xt)/δt }。
On the other hand, the probability mass of u concentrates on the equator in a high-dimensional sphere, which is characterized by the following inequality [24]:
另一方面,在高维球面上,u的概率质量集中在赤道附近,这一性质由下述不等式[24]刻画:
A Taylor expansion of S_{x⋆}(xt) at x′t, the ℓ2-projection of xt onto the boundary, yields
将S_{x⋆}(xt)在xt到边界的ℓ2投影点x′t处做泰勒展开,可得
By the Cauchy-Schwarz inequality and the definition of the ℓ2-projection, we have
根据Cauchy-Schwarz不等式和ℓ2投影的定义,我们有
This yields
由此可得
where q = 1 − (1/p) is the dual exponent. In order to avoid a loss of accuracy from concentration of measure, we let δt = d^q θ ‖x̃_{t−1} − x⋆‖2. To make the approximation error independent of the dimension d, we set θ at the scale of d^{−q−1}, so that δt is proportional to d^{−1}, as suggested by Theorem 2. This leads to a logarithmic dependence on dimension for the number of model queries. In practice, we set θ = d^{−q−1}, so that δt = d^{−1} ‖x̃_{t−1} − x⋆‖2.
其中q = 1 − (1/p)是对偶指数。为避免测度集中导致的精度损失,我们令δt = d^q θ ‖x̃_{t−1} − x⋆‖2。为使近似误差与维数d无关,我们将θ设在d^{−q−1}的量级,从而δt与d^{−1}成比例,这与定理2的建议一致。这使得模型查询数量对维数只有对数依赖。在实践中,我们设θ = d^{−q−1},从而δt = d^{−1} ‖x̃_{t−1} − x⋆‖2。
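Read literally, the scaling rules above translate into the small helper below (ℓ2 case, p = 2 so q = 1/2); this is simply the text's parameter choices written out, with the proportionality constants taken to be 1 as an assumption.

import numpy as np

def suggested_parameters(dist_to_original, d, t, p=2):
    """Scaling rules from the text (constants assumed to be 1):
    theta ~ d^(-q-1), delta_t = d^q * theta * dist, xi_t = dist / sqrt(t)."""
    q = 1.0 - 1.0 / p                               # dual exponent
    theta = d ** (-q - 1.0)                         # binary-search threshold
    delta_t = (d ** q) * theta * dist_to_original   # proportional to dist / d
    xi_t = dist_to_original / np.sqrt(t)            # step size suggested by Theorem 1
    return xi_t, delta_t, theta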
b) A baseline for variance reduction in gradient-direction estimation: Another source of error comes from the variance of the estimate, where we characterize the variance of a random vector v ∈ ℝ^d by the trace of its covariance operator: Var(v) := Σ_{i=1}^{d} Var(v_i). When xt deviates from the boundary and δt is not exactly zero, there is an uneven distribution of perturbed samples on the two sides of the boundary, so that E[φ_{x⋆}(xt + δt u)] deviates from zero,
b) 用于梯度方向估计方差缩减的基线:另一个误差来源是估计的方差。我们用协方差算子的迹来刻画随机向量v ∈ ℝ^d的方差:Var(v) := Σ_{i=1}^{d} Var(v_i)。当xt偏离边界且δt不完全为零时,扰动样本在边界两侧的分布是不均衡的,使得E[φ_{x⋆}(xt + δt u)]偏离零,
as we can see from Equation (14). To attempt to control the variance, we introduce a baseline φ̄_{x⋆} := (1/B) Σ_{b=1}^{B} φ_{x⋆}(xt + δ ub) into the estimate,
这从式(14)中可以看出。为了控制方差,我们在估计中引入基线φ̄_{x⋆} := (1/B) Σ_{b=1}^{B} φ_{x⋆}(xt + δ ub),
which yields the following estimate:

vt(xt, δ) := (1/(B − 1)) Σ_{b=1}^{B} ( φ_{x⋆}(xt + δ ub) − φ̄_{x⋆} ) ub.   (16)
由此得到如下估计:

vt(xt, δ) := (1/(B − 1)) Σ_{b=1}^{B} ( φ_{x⋆}(xt + δ ub) − φ̄_{x⋆} ) ub。 (16)
It can easily be observed that this estimate is equal to the previous estimate in expectation, and is thus still asymptotically unbiased at the boundary: when xt ∈ bd(S_{x⋆}), we have E[vt(xt, δ)] = E[∇̃S(xt, δ)].
容易看出,该估计与先前的估计在期望意义下相等,因此在边界处仍是渐近无偏的:当xt ∈ bd(S_{x⋆})时,有E[vt(xt, δ)] = E[∇̃S(xt, δ)]。
Moreover, the introduction of the baseline reduces the variance when E[φ_{x⋆}(xt + δu)] deviates from zero. In particular, the following theorem shows that whenever |E[φ_{x⋆}(xt + δu)]| = Ω(B^{−1/2}), the introduction of the baseline reduces the variance.
此外,当E[φ_{x⋆}(xt + δu)]偏离零时,引入基线能够降低方差。特别地,下述定理表明,只要|E[φ_{x⋆}(xt + δu)]| = Ω(B^{−1/2}),引入基线就会降低方差。
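A sketch of the baseline-corrected estimate in equation (16), mirroring the earlier Monte Carlo sketch; again `phi` is the assumed decision oracle and inputs are treated as flat vectors.

import numpy as np

def estimate_gradient_direction_vr(phi, x_t, delta, num_samples=100, rng=None):
    """Variance-reduced gradient-direction estimate (Eq. (16)): subtract the mean
    decision value (the baseline) before averaging the weighted directions."""
    rng = np.random.default_rng(rng)
    d = x_t.size
    u = rng.standard_normal((num_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)            # uniform unit directions
    decisions = np.array([phi(x_t + delta * ub) for ub in u], dtype=float)
    baseline = decisions.mean()                              # phi-bar
    v = ((decisions - baseline)[:, None] * u).sum(axis=0) / (num_samples - 1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else u[0]                    # fall back to a random direction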
Theorem 3. Defining σ² := Var(φ_{x⋆}(xt + δu) u) as the variance of the one-point estimate, we have
定理3。将σ² := Var(φ_{x⋆}(xt + δu) u)定义为一点估计的方差,我们有
where
其中
See Appendix A-C for the proof. We also present an experimental evaluation of our gradient-direction estimate when the sample deviates from the boundary in Appendix B, where we show our proposed choice of δt and the introduction of baseline yield a performance gain in estimating gradient.
证明见附录A-C。我们还在附录B中对样本偏离边界时的梯度方向估计进行了实验评估,结果表明所提出的δt选取方式以及基线的引入在梯度估计上带来了性能提升。
D. HopSkipJumpAttack
Table I: Median distance at various model queries. The smaller median distance at a given model query is bold-faced. BA and HSJA stand for Boundary Attack and HopSkipJumpAttack respectively.
表I:不同模型查询次数下的中位距离。在给定查询次数下,较小的中位距离以粗体显示。BA和HSJA分别代表Boundary Attack和HopSkipJumpAttack。
We now combine the above analysis into an iterative algorithm, HopSkipJumpAttack. It is initialized with a misclassified sample blended from the original image and uniform noise for untargeted attack, and with a sample in the target class for targeted attack. Each iteration of the algorithm has three components. First, the iterate from the last iteration is pushed towards the boundary via a binary search (Algorithm 1). Second, the gradient direction is estimated via Equation (16). Third, the updating step size along the gradient direction is initialized as in Equation (13) based on Theorem 1, and is decreased via geometric progression until the perturbation becomes successful. The next iteration starts with projecting the perturbed sample back to the boundary again. The complete procedure is summarized in Algorithm 2, and Figure 2 provides an intuitive visualization of the three steps. For all experiments, we initialize the batch size at 100 and increase it linearly with √t, so that the variance of the estimate reduces with t. When the input domain is bounded in practice, a clip is performed at each step by default.
现在,我们将上述分析整合为一个迭代算法HopSkipJumpAttack。对于无目标攻击,算法用原始图像与均匀噪声混合得到的被误分类样本初始化;对于有目标攻击,则用目标类中的样本初始化。算法的每次迭代包含三个部分。首先,通过二分搜索(算法1)将上一次迭代的迭代点推向边界。其次,通过式(16)估计梯度方向。第三,沿梯度方向的更新步长按式(13)(基于定理1)初始化,并通过几何递减不断减半,直到扰动成功。下一次迭代再次从将扰动样本投影回边界开始。完整过程总结在算法2中,图2直观展示了这三个步骤。对于所有实验,我们将批大小初始化为100并随√t线性增加,使得估计的方差随t减小。在实践中,当输入域有界时,默认每一步都执行裁剪。
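Putting the pieces together, the following is a compact ℓ2-flavored sketch of the iteration just described, built from the helper functions defined in the earlier sketches (`phi`, `boundary_binary_search`, `geometric_step`, `estimate_gradient_direction_vr`, `suggested_parameters`); it is a simplified reading of Algorithm 2 under those assumptions, not the authors' reference implementation.

import numpy as np

def hop_skip_jump_l2(phi, x_star, x_init, num_iterations=40, init_batch=100):
    """Simplified l2 HopSkipJumpAttack loop: (1) binary search to the boundary,
    (2) estimate the gradient direction, (3) geometric-progression step, repeat."""
    d = x_star.size
    x_t = boundary_binary_search(phi, x_star, x_init)           # push the initial point to the boundary
    for t in range(1, num_iterations + 1):
        dist = np.linalg.norm(x_t - x_star)
        xi_t, delta_t, _ = suggested_parameters(dist, d, t)
        batch = int(init_batch * np.sqrt(t))                    # grow the batch so variance shrinks with t
        grad_dir = estimate_gradient_direction_vr(phi, x_t, delta_t, num_samples=batch)
        x_tilde = geometric_step(phi, x_t, grad_dir, xi_t)       # step while staying adversarial
        x_t = boundary_binary_search(phi, x_star, x_tilde)       # project back to the boundary
    return x_t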
V. EXPERIMENTS
V. 实验
In this section, we carry out experimental analysis of HopSkipJumpAttack. We compare the efficiency of HopSkipJumpAttack with several previously proposed decisionbased attacks on image classification tasks. In addition, we evaluate the robustness of three defense mechanisms under our attack method. All experiments were carried out on a Tesla K80 GPU, with code available online.2 Our algorithm is also available on CleverHans [25] and Foolbox [26], which are two popular Python packages to craft adversarial examples for machine learning models.
在本节中,我们对HopSkipJumpAttack进行实验分析。我们在图像分类任务上比较了HopSkipJumpAttack与之前提出的几种基于决策的攻击的效率。此外,我们还评估了三种防御机制在我们的攻击方法下的鲁棒性。所有实验都在一块Tesla K80 GPU上进行,代码可在线获取²。我们的算法也已收录于CleverHans [25]和Foolbox [26],这是两个为机器学习模型构造对抗样本的流行Python工具包。
A. Efficiency evaluation
A. 效率评估
a) Baselines: We compare HopSkipJumpAttack with three state-of-the-art decision-based attacks: Boundary Attack [14], Limited Attack [9] and Opt Attack [16]. We use the implementations of the three algorithms with the suggested hyperparameters from the publicly available source code online. Limited Attack is only included under the targeted ℓ∞ setting, as in Ilyas et al. [9].
a) 基线:我们将HopSkipJumpAttack与三种最先进的基于决策的攻击进行比较:边界攻击[14]、Limited攻击[9]和Opt攻击[16]。我们使用这三种算法公开源代码中的实现及其建议的超参数。与Ilyas等人[9]一致,Limited攻击仅在有目标ℓ∞设置下纳入比较。
b) Data and models: For a comprehensive evaluation of HopSkipJumpAttack, we use a wide range of data and models, with varied image dimensions, data set sizes, complexity levels of task and model structures.
b)数据和模型:为了对HopSkipJumpAttack进行综合评估,我们使用了广泛的数据和模型,图像尺寸、数据集大小、任务的复杂程度和模型结构各不相同。
The experiments are carried out over four image data sets: MNIST, CIFAR-10 [27], CIFAR-100 [27], and ImageNet [28] with the standard train/test split [29]. The four data sets have varied image dimensions and class numbers. MNIST contains 70K 28 × 28 gray-scale images of handwritten digits in the range 0-9. CIFAR-10 and CIFAR-100 are both composed of 32 × 32 × 3 images. CIFAR-10 has 10 classes, with 6K images per class, while CIFAR-100 has 100 classes, with 600 images per class. ImageNet has 1,000 classes. Images in ImageNet are rescaled to 224 × 224 × 3. For MNIST, CIFAR-10 and CIFAR-100, 1,000 correctly classified test images are used, which are randomly drawn from the test data set, and evenly distributed across classes. For ImageNet, we use 100 correctly classified test images, evenly distributed among 10 randomly selected classes. The selection scheme follows Metzen et al. [30] for reproducibility.
实验在四个图像数据集上进行:MNIST、CIFAR-10 [27]、CIFAR-100 [27]和ImageNet [28],采用标准的训练/测试划分[29]。这四个数据集的图像尺寸和类别数各不相同。MNIST包含7万张28×28的灰度手写数字图像,数字范围为0-9。CIFAR-10和CIFAR-100都由32×32×3的图像组成;CIFAR-10有10个类,每类6千张图像,CIFAR-100有100个类,每类600张图像。ImageNet有1,000个类,其图像被缩放为224×224×3。对于MNIST、CIFAR-10和CIFAR-100,我们使用1,000张被正确分类的测试图像,它们从测试集中随机抽取并在各类之间均匀分布。对于ImageNet,我们使用100张被正确分类的测试图像,均匀分布在10个随机选择的类中。为保证可复现性,选取方案遵循Metzen等人[30]。
Figure 3: Median distance versus number of model queries on MNIST with CNN, and CIFAR-10 with ResNet and DenseNet, from top to bottom rows. 1st column: untargeted ℓ2. 2nd column: targeted ℓ2. 3rd column: untargeted ℓ∞. 4th column: targeted ℓ∞.
图3:中位距离与模型查询次数的关系,自上而下各行分别为MNIST上的CNN,以及CIFAR-10上的ResNet和DenseNet。第一列:无目标ℓ2。第二列:有目标ℓ2。第三列:无目标ℓ∞。第四列:有目标ℓ∞。
We also use models of varied structure, from simple to complex. For MNIST, we use a simple convolutional network composed of two convolutional layers followed by a hidden dense layer with 1024 units. Two convolutional layers have 32, 64 filters respectively, each of which is followed by a max-pooling layer. For both CIFAR-10 and CIFAR-100, we train a 20-layer ResNet [31] and 121-layer DenseNet [32] respectively, with the canonical network structure [29]. For ImageNet, we use a pre-trained 50-layer ResNet [31]. All models achieve close to state-of-the-art accuracy on the respective data set. All pixels are scaled to be in the range [0, 1]. For all experiments, we clip the perturbed image into the input domain [0, 1] for all algorithms by default.
我们还使用从简单到复杂、结构各异的模型。对于MNIST,我们使用一个简单的卷积网络,由两个卷积层和一个含1024个单元的全连接隐藏层组成;两个卷积层分别有32和64个滤波器,每个卷积层后接一个最大池化层。对于CIFAR-10和CIFAR-100,我们分别训练一个20层的ResNet [31]和一个121层的DenseNet [32],采用标准网络结构[29]。对于ImageNet,我们使用预训练的50层ResNet [31]。所有模型在各自的数据集上都达到了接近最先进的精度。所有像素都被缩放到[0, 1]范围内。对于所有实验,默认情况下我们对所有算法都将扰动图像裁剪回输入域[0, 1]。
c) Initialization: For untargeted attack, we initialize all attacks by blending an original image with uniform random noise, and increasing the weight of uniform noise gradually until it is misclassified, a procedure which is available on Foolbox [26], as the default initialization of Boundary Attack. For targeted attack, the target class is sampled uniformly among the incorrect labels. An image belonging to the target class is randomly sampled from the test set as the initialization. The same target class and a common initialization image are used for all attacks.
c) 初始化:对于无目标攻击,我们将原始图像与均匀随机噪声混合,并逐渐增大均匀噪声的权重直至图像被误分类,以此初始化所有攻击;该过程在Foolbox [26]中可用,是边界攻击的默认初始化方式。对于有目标攻击,目标类在错误标签中均匀采样,并从测试集中随机抽取一张属于目标类的图像作为初始化。所有攻击使用相同的目标类和相同的初始化图像。
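A small sketch of the untargeted initialization described above (blend the original image with uniform noise and increase the noise weight until the label changes); `predict_label` is the same assumed query function as in the earlier sketches, and the number of blending steps is an illustrative choice.

import numpy as np

def blended_noise_initialization(predict_label, x_star, num_steps=20, rng=None):
    """Blend x_star with uniform noise, increasing the noise weight until the
    blended image is misclassified; returns an adversarial starting point."""
    rng = np.random.default_rng(rng)
    original_label = predict_label(x_star)
    noise = rng.uniform(0.0, 1.0, size=x_star.shape)
    for w in np.linspace(0.0, 1.0, num_steps + 1)[1:]:   # noise weight from small to 1
        x_init = (1.0 - w) * x_star + w * noise
        if predict_label(x_init) != original_label:
            return x_init
    return noise                                          # fall back to pure noise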
d) Metrics: The first metric is the median ℓp distance between perturbed and original samples over a subset of test images, which was commonly used in previous work, such as Carlini and Wagner [6]. A version normalized by image dimension was employed by Brendel et al. [14] for evaluating Boundary Attack. The ℓ2 distance can be interpreted in the following way: given a byte image of size h × w × 3, a perturbation of size d in ℓ2 distance on the rescaled input image amounts to a perturbation of 255·d/√(3hw) bits per pixel on average on the original image, in the range [0, 255]. A perturbation of size d in ℓ∞ distance amounts to a maximum perturbation of 255·d bits across all pixels on the raw image.
d) 度量:第一个度量是测试图像子集上扰动样本与原始样本之间的中位ℓp距离,这在以往工作(如Carlini和Wagner [6])中被普遍使用。Brendel等人[14]在评估边界攻击时采用了按图像维度归一化的版本。ℓ2距离可以这样理解:给定尺寸为h×w×3的字节图像,在重新缩放后的输入图像上大小为d的ℓ2扰动,平均相当于在取值范围[0, 255]的原始图像上每个像素约255·d/√(3hw)个灰度级的扰动;大小为d的ℓ∞扰动则相当于原始图像所有像素上至多255·d个灰度级的最大扰动。
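As a quick worked example of this conversion (using the CIFAR-10 number reported later in this section): for a 32×32×3 image, an ℓ2 perturbation of 0.559 on the [0, 1]-scaled input corresponds to roughly 2.6 intensity levels per pixel on the 0-255 scale.

import math

h, w = 32, 32                      # CIFAR-10 image size
d_l2 = 0.559                       # median l2 distance reported for HSJA at 1,000 queries
per_pixel_bits = 255 * d_l2 / math.sqrt(3 * h * w)
print(round(per_pixel_bits, 2))    # ~2.57, i.e. below 3/255 per pixel on average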
As an alternative metric, we also plot the success rate at various distance thresholds for both algorithms given a limited budget of model queries. An adversarial example is defined as a success if the size of the perturbation does not exceed a given distance threshold. The success rate can be directly related to the accuracy of a model on perturbed data under a given distance threshold.
作为另一种度量,在模型查询预算有限的情况下,我们还绘制了各算法在不同距离阈值下的成功率。如果扰动的大小不超过给定的距离阈值,则该对抗样本被定义为成功。在给定距离阈值下,成功率可以与模型在扰动数据上的准确率直接关联。
Throughout the experiments, we limit the maximum budget of queries per image to 25,000, the setting of practical interest, due to limited computational resources.
在整个实验中,由于计算资源有限,我们将每幅图像的最大查询预算限制在25,000,这是实际感兴趣的设置。
e) Results: Figures 3 and 4 show the median distance (on a log scale) against the number of queries, with the first and third quartiles used as lower and upper error bars. For Boundary, Opt and HopSkipJumpAttack, Table I summarizes the median distance when the number of queries is fixed at 1,000, 5,000, and 20,000 across all distance types, data, models and objectives. Figures 5 and 6 show the success rate against the distance threshold. Figures 3 and 5 contain results on MNIST with CNN, and CIFAR-10 with ResNet and DenseNet, from the top row to the bottom row. Figures 4 and 6 contain results on CIFAR-100 with ResNet and DenseNet, and ImageNet with ResNet, from the top row to the bottom row. The four columns are for untargeted ℓ2, targeted ℓ2, untargeted ℓ∞ and targeted ℓ∞ attacks respectively.
e) 结果:图3和图4展示了中位距离(对数刻度)随查询次数的变化,以第一和第三四分位数作为上下误差条。对于Boundary、Opt和HopSkipJumpAttack,表I总结了在所有距离类型、数据、模型和目标下,查询次数固定为1,000、5,000和20,000时的中位距离。图5和图6展示了成功率随距离阈值的变化。图3和图5包含MNIST上CNN以及CIFAR-10上ResNet、DenseNet的结果,自上而下排列;图4和图6包含CIFAR-100上ResNet、DenseNet以及ImageNet上ResNet的结果,自上而下排列。四列分别对应无目标ℓ2、有目标ℓ2、无目标ℓ∞和有目标ℓ∞攻击。
With a limited number of queries, HopSkipJumpAttack is able to craft adversarial examples at a significantly smaller distance from the corresponding original examples across all data sets, followed by Boundary Attack and Opt Attack. As a concrete example, Table I shows that untargeted ℓ2-optimized HopSkipJumpAttack achieves a median distance of 0.559 on CIFAR-10 with a ResNet model at 1,000 queries, which amounts to below 3/255 per pixel on average. At the same budget of queries, Boundary Attack and Opt Attack only achieve median ℓ2-distances of 2.78 and 2.07 respectively. The difference in efficiency becomes more significant for ℓ∞ attacks. As shown in Figure 5, under an untargeted ℓ∞-optimized HopSkipJumpAttack with 1,000 queries, all pixels are within an 8/255-neighborhood of the original image for around 70% of adversarial examples, a success rate achieved by Boundary Attack only after 20,000 queries.
在查询次数有限的情况下,HopSkipJumpAttack能够在所有数据集上构造与对应原始样本距离显著更小的对抗样本,其次是边界攻击和Opt攻击。举一个具体例子,表I显示,在CIFAR-10上使用ResNet模型、查询1,000次时,无目标ℓ2优化的HopSkipJumpAttack取得了0.559的中位距离,平均相当于每像素低于3/255。在相同查询预算下,边界攻击和Opt攻击的中位ℓ2距离分别只有2.78和2.07。对ℓ∞攻击而言,效率差异更加显著。如图5所示,在查询1,000次的无目标ℓ∞优化HopSkipJumpAttack下,约70%的对抗样本的所有像素都落在原始图像的8/255邻域内,而边界攻击需要20,000次查询才能达到这一成功率。
By comparing the odd and even columns of Figures 3-6, we can find that targeted HopSkipJumpAttack takes more queries than the untargeted one to achieve a comparable distance. This phenomenon becomes more explicit on CIFAR-100 and ImageNet, which have more classes. With the same number of queries, there is an order-of-magnitude difference in median distance between untargeted and targeted attacks (Figures 3 and 4). For ℓ2-optimized HopSkipJumpAttack, while the untargeted version is able to craft adversarial images by perturbing 4 bits per pixel on average within 1,000 queries for 70%-90% of images in CIFAR-10 and CIFAR-100, the targeted counterpart takes 2,000-5,000 queries. The other attacks fail to achieve a comparable performance even with 25,000 queries. On ImageNet, untargeted ℓ2-optimized HopSkipJumpAttack is able to fool the model with a perturbation of size 6 bits per pixel on average for close to 50% of images with 1,000 queries; untargeted ℓ∞-optimized HopSkipJumpAttack controls the maximum perturbation across all pixels within 16 bits for 50% of images within 1,000 queries. The targeted Boundary Attack is not able to control the perturbation size to such a small scale until after around 25,000 queries. On the one hand, the larger query budget requirement results from targeted attack being a strictly more demanding formulation than untargeted attack. On the other hand, this is also because we initialize targeted HopSkipJumpAttack from an arbitrary image in the target class; the algorithm may be trapped in a bad local minimum with such an initialization. Future work can address systematic approaches to better initialization.
通过比较图3-6的奇数列和偶数列可以发现,要达到相当的距离,有目标的HopSkipJumpAttack比无目标版本需要更多的查询。这一现象在类别更多的CIFAR-100和ImageNet上更加明显。在相同查询次数下,无目标与有目标攻击的中位距离相差一个数量级(图3和图4)。对于ℓ2优化的HopSkipJumpAttack,无目标版本在1,000次查询内即可对CIFAR-10和CIFAR-100中70%-90%的图像以平均每像素4个灰度级的扰动构造对抗图像,而有目标版本需要2,000-5,000次查询;其他攻击即使用25,000次查询也无法达到相当的性能。在ImageNet上,无目标ℓ2优化的HopSkipJumpAttack用1,000次查询即可对接近50%的图像以平均每像素6个灰度级的扰动欺骗模型;无目标ℓ∞优化的HopSkipJumpAttack用1,000次查询即可将50%图像所有像素上的最大扰动控制在16个灰度级以内。而有目标的边界攻击需要约25,000次查询之后才能将扰动控制到如此小的规模。一方面,更大的查询预算需求源于有目标攻击本身是比无目标攻击严格更强的问题形式;另一方面,这也因为我们用目标类中的任意一张图像初始化有目标HopSkipJumpAttack,这样的初始化可能使算法陷入较差的局部极小。未来工作可以研究更系统的初始化方法。
Figure 5: Success rate versus distance threshold for MNIST with CNN, and CIFAR-10 with ResNet and DenseNet, from top to bottom rows. 1st column: untargeted ℓ2. 2nd column: targeted ℓ2. 3rd column: untargeted ℓ∞. 4th column: targeted ℓ∞.
图5:成功率与距离阈值的关系,自上而下各行分别为MNIST上的CNN,以及CIFAR-10上的ResNet和DenseNet。第一列:无目标ℓ2。第二列:有目标ℓ2。第三列:无目标ℓ∞。第四列:有目标ℓ∞。
As a comparison between data sets and models, we see that adversarial images often have a larger distance to their corresponding original images on MNIST than on CIFAR-10 and CIFAR-100, which has also been observed in previous work (e.g., [6]). This might be because it is more difficult to fool a model on simpler tasks. On the other hand, HopSkipJumpAttack also converges within a smaller number of queries on MNIST, as shown in Figure 3. It does not converge even after 25,000 queries on ImageNet. We conjecture the query budget is related to the input dimension and the smoothness of the decision boundary. We also observe that the difference in model structure does not have a large influence on decision-based algorithms, if the training algorithm and the data set remain the same. For ResNet and DenseNet trained on a common data set, a decision-based algorithm achieves comparable performance in crafting adversarial examples, although DenseNet has a more complex structure than ResNet.
在数据集和模型之间进行比较时可以看到,MNIST上对抗图像与对应原始图像的距离往往大于CIFAR-10和CIFAR-100,这在以往工作中也有观察(如[6]),原因可能是在更简单的任务上欺骗模型更困难。另一方面,如图3所示,HopSkipJumpAttack在MNIST上收敛所需的查询次数也更少,而在ImageNet上即使经过25,000次查询也未收敛。我们推测查询预算与输入维数以及决策边界的光滑性有关。我们还观察到,若训练算法和数据集保持一致,模型结构的差异对基于决策的算法影响不大:对于在同一数据集上训练的ResNet和DenseNet,尽管DenseNet的结构比ResNet更复杂,基于决策的算法在构造对抗样本方面取得了相当的性能。
As a comparison with state-of-the-art white-box targeted attacks, the C&W attack [6] achieves an average ℓ2-distance of 0.33 on CIFAR-10, and BIM [3] achieves an average ℓ∞-distance of 0.014 on CIFAR-10. Targeted HopSkipJumpAttack achieves a comparable distance with 5K-10K model queries on CIFAR-10, without access to model details. On ImageNet, targeted C&W attack and BIM achieve an ℓ2-distance of 0.96 and an ℓ∞-distance of 0.01 respectively. Untargeted HopSkipJumpAttack achieves a comparable performance with 10,000-15,000 queries. The targeted version is not able to perform comparably to targeted white-box attacks when the budget of queries is limited to 25,000.
与最先进的白盒有目标攻击相比,C&W攻击[6]在CIFAR-10上的平均ℓ2距离为0.33,BIM [3]在CIFAR-10上的平均ℓ∞距离为0.014。有目标的HopSkipJumpAttack在无需访问模型细节的情况下,用5K-10K次模型查询即可在CIFAR-10上达到相当的距离。在ImageNet上,有目标的C&W攻击和BIM分别达到0.96的ℓ2距离和0.01的ℓ∞距离;无目标的HopSkipJumpAttack用10,000-15,000次查询可达到相当的性能。当查询预算限制在25,000以内时,有目标版本尚无法达到与有目标白盒攻击相当的表现。
Visualized trajectories of HopSkipJumpAttack optimized for ℓ2 distance along varied query budgets on CIFAR-10 and ImageNet can be found in Figure 7. On CIFAR-10, we observe untargeted adversarial examples can be crafted within around 500 queries; targeted HopSkipJumpAttack is capable of crafting human-indistinguishable targeted adversarial examples within around 1,000-2,000 queries. On ImageNet, untargeted HopSkipJumpAttack is able to craft good adversarial examples with 1,000 queries, while targeted HopSkipJumpAttack takes 10,000-20,000 queries.
图7给出了在CIFAR-10和ImageNet上针对ℓ2距离优化的HopSkipJumpAttack在不同查询次数下的可视化轨迹。在CIFAR-10上,我们观察到约500次查询内即可构造出无目标对抗样本;有目标的HopSkipJumpAttack约在1,000-2,000次查询内即可构造出人眼无法区分的有目标对抗样本。在ImageNet上,无目标的HopSkipJumpAttack用1,000次查询即可构造出良好的对抗样本,而有目标版本需要10,000-20,000次查询。
Figure 6: Success rate versus distance threshold for CIFAR-100 with ResNet and DenseNet, and ImageNet with ResNet, from top to bottom rows. 1st column: untargeted ℓ2. 2nd column: targeted ℓ2. 3rd column: untargeted ℓ∞. 4th column: targeted ℓ∞.
图6:成功率与距离阈值的关系,自上而下各行分别为CIFAR-100上的ResNet、DenseNet,以及ImageNet上的ResNet。第一列:无目标ℓ2。第二列:有目标ℓ2。第三列:无目标ℓ∞。第四列:有目标ℓ∞。
B. Defense mechanisms under decision-based attacks
B.决策攻击下的防御机制
We investigate the robustness of various defense mechanisms under decision-based attacks.
我们研究了各种防御机制在基于决策的攻击下的鲁棒性。
a) Defense mechanisms: Three defense mechanisms are evaluated: defensive distillation, region-based classification, and adversarial training. Defensive distillation [33], a form of gradient masking [13], trains a second model to predict the output probabilities of an existing model of the same structure. We use the implementation provided by Carlini and Wagner [6] for defensive distillation. The second defense, region-based classification, belongs to a wide family of mechanisms which add test-time randomness to the inputs or the model, causing the gradients to be randomized [34]. Multiple variants have been proposed to randomize the gradients [35–39]. We adopt the implementation in Cao and Gong [35] with suggested noise levels. Given a trained base model, region-based classification samples points from the hypercube centered at the input image, predicts the label for each sampled point with the base model, and then takes a majority vote to output the label. Adversarial training [2, 3, 7, 17] is known to be one of the most effective defense mechanisms against adversarial perturbation [34, 40]. We evaluate a publicly available model trained through a robust optimization method proposed by Madry et al. [7]. We further evaluate our attack method by constructing a non-differentiable model via input binarization followed by a random forest in Appendix C. The evaluation is carried out on MNIST, where defense mechanisms such as adversarial training work most effectively.
a) 防御机制:我们评估三种防御机制:防御性蒸馏、基于区域的分类和对抗训练。防御性蒸馏[33]是梯度掩蔽[13]的一种形式,它训练第二个模型来预测相同结构的现有模型的输出概率,我们使用Carlini和Wagner [6]提供的实现。第二种防御是基于区域的分类,属于在测试时给输入或模型加入随机性、从而使梯度被随机化的一大类机制[34];已有多种随机化梯度的变体被提出[35–39]。我们采用Cao和Gong [35]的实现及其建议的噪声水平。给定一个训练好的基础模型,基于区域的分类从以输入图像为中心的超立方体中采样点,用基础模型预测每个采样点的标签,然后通过多数投票输出标签。对抗训练[2, 3, 7, 17]被认为是对抗对抗性扰动最有效的防御机制之一[34, 40],我们评估一个通过Madry等人[7]提出的鲁棒优化方法训练的公开模型。在附录C中,我们进一步通过构造一个输入二值化后接随机森林的不可微模型来评估我们的攻击方法。评估在MNIST上进行,对抗训练等防御机制在该数据集上最为有效。
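A minimal sketch of the region-based classification defense as described above (majority vote over points sampled from a hypercube around the input); the `base_predict` function and the noise level are assumptions for illustration, not the authors' configuration.

import numpy as np
from collections import Counter

def region_based_predict(base_predict, x, radius=0.3, num_samples=100, rng=None):
    """Sample points uniformly from the hypercube of half-width `radius` around x,
    classify each with the base model, and return the majority-vote label."""
    rng = np.random.default_rng(rng)
    votes = []
    for _ in range(num_samples):
        noise = rng.uniform(-radius, radius, size=x.shape)
        votes.append(base_predict(np.clip(x + noise, 0.0, 1.0)))
    return Counter(votes).most_common(1)[0][0]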
b) Baselines: We compare our algorithm with state-of-the-art attack algorithms that require access to gradients, including the C&W Attack [6] and DeepFool [4] for minimizing ℓ2-distance, and FGSM [2] and BIM [7, 41] for minimizing ℓ∞-distance. For region-based classification, the gradient of the base classifier is taken with respect to the original input.
b) 基线:我们将我们的算法与需要访问梯度的最先进攻击算法进行比较,包括最小化ℓ2距离的C&W攻击[6]和DeepFool [4],以及最小化ℓ∞距离的FGSM [2]和BIM [7, 41]。对于基于区域的分类,基础分类器的梯度是相对原始输入计算的。
We further include methods designed specifically for the defense mechanisms under threat. For defensive distillation, we include the ℓ∞-optimized C&W Attack [6]. For region-based classification, we include backward pass differentiable approximation (BPDA) [34], which calculates the gradient of the model at a randomized input to replace the gradient at the original input in the C&W Attack and BIM. All of these methods assume access to model details or even defense mechanisms, which is a stronger threat model than the one required for decision-based attacks. We also include Boundary Attack as a decision-based baseline.
我们还纳入了专门针对所评估防御机制设计的方法。对于防御性蒸馏,我们纳入ℓ∞优化的C&W攻击[6]。对于基于区域的分类,我们纳入后向传递可微近似(BPDA)[34],它在随机化的输入处计算模型梯度,以替代C&W攻击和BIM中在原始输入处的梯度。所有这些方法都假定可以访问模型细节甚至防御机制,这是比基于决策的攻击所需更强的威胁模型。我们还将边界攻击作为基于决策的基线。
Figure 7: Visualized trajectories of HopSkipJumpAttack for optimizing ℓ2 distance on randomly selected images in CIFAR-10 and ImageNet. 1st column: initialization (after being blended with original images). 2nd-9th columns: images at 100, 200, 500, 1K, 2K, 5K, 10K, 25K model queries. 10th column: original images.
图7:HopSkipJumpAttack针对ℓ2距离在CIFAR-10和ImageNet随机选取图像上的优化轨迹可视化。第一列:初始化(与原始图像混合后)。第2-9列:100、200、500、1K、2K、5K、10K、25K次模型查询时的图像。第十列:原始图像。
For HopSkipJumpAttack and Boundary Attack, we include the success rate at three different scales of query budget: 2K, 10K and 50K, so as to evaluate our method both with limited queries and a sufficient number of queries. We find the convergence of HopSkipJumpAttack becomes unstable on region-based classification, resulting from the difficulty of locating the boundary in the binary search step when uncertainty is increased near the boundary. Thus, we increase the binary search threshold to 0.01 to resolve this issue.
对于HopSkipJumpAttack和Boundary Attack,我们包括了2K、10K和50K三种不同查询预算尺度下的成功率,以便在有限查询和足够查询的情况下评估我们的方法。我们发现HopSkipJumpAttack的收敛性在基于区域的分类上变得不稳定,这是由于边界不确定性增加,在二分查找步骤中很难定位边界造成的。因此,我们将二分查找阈值提高到0.01来解决这个问题。
c) Results: Figure 8 shows the success rate of various attacks at different distance thresholds for the three defense mechanisms. On all three defenses, HopSkipJumpAttack demonstrates similar or superior performance compared to state-of-the-art white-box attacks given sufficient model queries. Even with only 1K-2K model queries, it also achieves acceptable performance, although worse than the best white-box attacks. With sufficient queries, Boundary Attack achieves a comparable performance under the ℓ2-distance metric. But it is not able to generate any adversarial examples when the number of queries is limited to 1,000. We think this is because the strength of our batch gradient-direction estimate over the random-walk step in Boundary Attack becomes more explicit when there is uncertainty or non-smoothness near the decision boundary. We also observe that Boundary Attack does not work in optimizing the ℓ∞-distance metric for adversarial examples, making it difficult to evaluate defenses designed for ℓ∞ distance, such as adversarial training proposed by Madry et al. [7].
c) 结果:图8展示了三种防御机制下各种攻击在不同距离阈值下的成功率。在这三种防御上,只要模型查询足够多,HopSkipJumpAttack的表现与最先进的白盒攻击相当甚至更优。即使只有1K-2K次模型查询,它也能达到可接受的性能,尽管不及最好的白盒攻击。在查询充足时,边界攻击在ℓ2距离度量下取得了相当的性能,但当查询次数限制在1,000次以内时,它无法生成任何对抗样本。我们认为,这是因为当决策边界附近存在不确定性或非光滑性时,我们的批量梯度方向估计相对边界攻击中随机游走步骤的优势变得更加明显。我们还观察到,边界攻击无法针对ℓ∞距离度量优化对抗样本,这使得评估针对ℓ∞距离设计的防御(如Madry等人[7]提出的对抗训练)变得困难。
On a distilled model, when the ℓ∞-distance is thresholded at 0.3, a perturbation size proposed by Madry et al. [7] to measure adversarial robustness, HopSkipJumpAttack achieves success rates of 86% and 99% with 1K and 50K queries respectively. At an ℓ2-distance of 3.0, the success rate is 91% with 2K queries. HopSkipJumpAttack achieves a performance comparable to the C&W attack under both distance metrics with 10K-50K queries. Also, gradient masking [13] by defensive distillation does not have a large influence on the query efficiency of HopSkipJumpAttack, indicating that the gradient-direction estimate is robust in settings where the model does not have useful gradients for certain white-box attacks.
在蒸馏模型上,当ℓ∞距离阈值取为0.3(Madry等人[7]提出的用于衡量对抗鲁棒性的扰动大小)时,HopSkipJumpAttack在1K和50K次查询下的成功率分别为86%和99%;在ℓ2距离为3.0时,2K次查询的成功率为91%。在10K-50K次查询下,HopSkipJumpAttack在两种距离度量下都取得了与C&W攻击相当的性能。此外,防御性蒸馏造成的梯度掩蔽[13]对HopSkipJumpAttack的查询效率影响不大,表明在模型对某些白盒攻击不提供有用梯度的情形下,梯度方向估计依然稳健。
On region-based classification, with 2K queries, HopSkipJumpAttack achieves success rates of 82% and 93% at the same ℓ∞- and ℓ2-distance thresholds respectively. With 10K-50K queries, it is able to achieve a performance comparable to BPDA, a white-box attack tailored to such defense mechanisms. On the other hand, we observe that HopSkipJumpAttack converges slightly more slowly on region-based classification than on ordinary models, because stochasticity near the boundary may prevent the binary search in HopSkipJumpAttack from locating the boundary accurately.
在基于区域的分类上,2K次查询时,HopSkipJumpAttack在相同的ℓ∞和ℓ2距离阈值下分别取得82%和93%的成功率;在10K-50K次查询下,它能够达到与BPDA(一种针对此类防御机制定制的白盒攻击)相当的性能。另一方面,我们观察到HopSkipJumpAttack在基于区域的分类上比在普通模型上收敛略慢,这是因为边界附近的随机性可能使HopSkipJumpAttack中的二分搜索无法准确定位边界。
On an adversarially trained model, HopSkipJumpAttack achieves a success rate of 11.0% with 50K queries when the ℓ∞-distance is thresholded at 0.3. As a comparison, BIM has a success rate of 7.4% at the given distance threshold. The success rate of ℓ∞-HopSkipJumpAttack translates to an accuracy of 87.58% on adversarially perturbed data, close to the state-of-the-art performance achieved by white-box attacks.3 With 1K queries, HopSkipJumpAttack also achieves performance comparable to the BIM and C&W attacks.
在对抗训练的模型上,当ℓ∞距离阈值取为0.3时,HopSkipJumpAttack在50K次查询下的成功率为11.0%;相比之下,BIM在相同距离阈值下的成功率为7.4%。ℓ∞-HopSkipJumpAttack的成功率相当于模型在对抗扰动数据上87.58%的准确率,接近白盒攻击达到的最先进水平³。在1K次查询下,HopSkipJumpAttack也能达到与BIM和C&W攻击相当的性能。
Figure 8: Success rate versus distance threshold for a distilled model, a region-based classifier and an adversarially trained model on MNIST. Blue, magenta, cyan and orange lines are used for HopSkipJumpAttack and Boundary Attack at budgets of 1K, 2K, 10K and 50K queries respectively. Different attacks are plotted with different line styles. An amplified figure is included near the critical ℓ∞-distance of 0.3 for adversarial training.
图8:MNIST上蒸馏模型、基于区域的分类器和对抗训练模型的成功率与距离阈值的关系。蓝色、品红、青色和橙色线分别对应HopSkipJumpAttack和边界攻击在1K、2K、10K和50K次查询预算下的结果。不同的攻击用不同的线型绘制。针对对抗训练,在临界ℓ∞距离0.3附近附有放大图。
VI. DISCUSSION
VI. 讨论
We have proposed a family of query-efficient algorithms based on a novel gradient-direction estimate, HopSkipJumpAttack, for decision-based generation of adversarial examples, which is capable of optimizing ℓ2 and ℓ∞ distances for both targeted and untargeted attacks. Convergence analysis has been carried out given access to the gradient. We have also provided an analysis of the error of our Monte Carlo estimate of the gradient direction, which comes from three sources: bias at the boundary due to a nonzero perturbation size, bias due to deviation from the boundary, and variance. Theoretical analysis has provided insights for selecting the step size and the perturbation size, which leads to a hyperparameter-free algorithm. We have also carried out extensive experiments, showing that HopSkipJumpAttack compares favorably to Boundary Attack in query efficiency, and achieves competitive performance on several defense mechanisms.
我们提出了一族基于新型梯度方向估计的查询高效算法HopSkipJumpAttack,用于基于决策的对抗样本生成,能够针对有目标和无目标攻击优化ℓ2和ℓ∞距离。在可以访问梯度的情形下,我们进行了收敛性分析。我们还分析了梯度方向蒙特卡洛估计的误差,其来源有三个:非零扰动大小在边界处造成的偏差、偏离边界造成的偏差以及方差。理论分析为选择步长和扰动大小提供了依据,从而得到一个无需超参数的算法。我们还进行了大量实验,表明HopSkipJumpAttack在查询效率上优于边界攻击,并在若干防御机制上取得了有竞争力的性能。
Given the fact that HopSkipJumpAttack is able to craft a human-indistinguishable adversarial example within a realistic budget of queries, it becomes important for the community to consider the real-world impact of decision-based threat models. We have also demonstrated that HopSkipJumpAttack is able to achieve comparable or even superior performance to state-of-the-art white-box attacks on several defense mechanisms, under a much weaker threat model. In particular, masked gradients, stochastic gradients, and nondifferentiability are not barriers to our algorithm. Because of its effectiveness, efficiency, and applicability to nondifferentiable models, we suggest future research on adversarial defenses may evaluate the designed mechanism against HopSkipJumpAttack as a first step.
鉴于HopSkipJumpAttack能够在实际可行的查询预算内生成人眼难以分辨的对抗样本，社区有必要重视基于决策的威胁模型在现实世界中的影响。我们还证明了在弱得多的威胁模型下，HopSkipJumpAttack能够在多个防御机制上取得与最先进的白盒攻击相当甚至更好的性能。特别是，掩蔽梯度、随机梯度和不可微性都不会阻碍我们的算法。由于其有效性、高效性以及对不可微模型的适用性，我们建议未来关于对抗防御的研究可以将用HopSkipJumpAttack评估所设计的机制作为第一步。
One limitation of all existing decision-based algorithms, including HopSkipJumpAttack, is that they require evaluating the target model near the decision boundary. They may therefore be rendered ineffective by defenses that limit queries near the boundary, or that widen the decision boundary by inserting an additional “unknown” class for inputs with low confidence. We have also observed that it still takes tens of thousands of model queries for HopSkipJumpAttack to craft imperceptible adversarial examples with a target class on ImageNet, which has a relatively large image size. Future work may seek to combine HopSkipJumpAttack with transfer-based attacks to resolve these issues.
所有现有的基于决策的算法（包括HopSkipJumpAttack）都有一个局限，即它们需要在决策边界附近查询目标模型。因此，通过限制边界附近的查询，或者通过为低置信度的输入引入额外的“未知”类来加宽决策边界，就可能使它们失效。我们还观察到，在图像尺寸相对较大的ImageNet上，HopSkipJumpAttack针对指定目标类生成难以察觉的对抗样本仍需要数以万计的模型查询。未来的工作可以尝试将HopSkipJumpAttack与基于迁移的攻击相结合来解决这些问题。
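As a purely hypothetical illustration of the “unknown” class idea mentioned above, the wrapper below returns a reserved label for low-confidence inputs; the names `predict_proba` and `predict_label` and the 0.9 threshold are assumptions for this sketch, not a defense evaluated in the paper.

```python
import numpy as np

class RejectLowConfidence:
    """Illustrative sketch (not from the paper): wrap a soft-label classifier
    and return a reserved label whenever the top probability falls below a
    threshold, which widens the decision boundary seen by a hard-label
    attacker that must query near it."""
    UNKNOWN = -1

    def __init__(self, predict_proba, threshold=0.9):
        self.predict_proba = predict_proba   # assumed: x -> probability vector
        self.threshold = threshold

    def predict_label(self, x):
        probs = np.asarray(self.predict_proba(x))
        top = int(np.argmax(probs))
        return top if probs[top] >= self.threshold else self.UNKNOWN
```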
VII. ACKNOWLEDGEMENT
VII. 致谢
We would like to thank Nicolas Papernot and anonymous reviewers for providing their helpful feedback.
我们要感谢Nicolas Papernot和匿名审稿人提供的有帮助的反馈。
REFERENCES
参考文献
[1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
[2] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations, 2015.
[3] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations, 2017.
[4] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
[5] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy, pages 372–387. IEEE, 2016.
[6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy, pages 39–57. IEEE, 2017.
[7] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
[8] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.
[9] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning, pages 2142–2151, 2018.
[10] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. In International Conference on Learning Representations, 2019.
[11] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In Proceedings of the International Conference on Learning Representations, 2017.
[12] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
[13] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.
[14] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018.
[15] Thomas Brunner, Frederik Diehl, Michael Truong Le, and Alois Knoll. Guessing smart: Biased sampling for efficient black-box adversarial attacks. arXiv preprint arXiv:1812.09803, 2018.
[16] Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In International Conference on Learning Representations, 2019.
[17] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
[18] Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385–394. SIAM, 2005.
[19] Alekh Agarwal, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Alexander Rakhlin. Stochastic convex optimization with bandit feedback. In Advances in Neural Information Processing Systems, pages 1035–1043, 2011.
[20] Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017.
[21] John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, 2015.
[22] Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, and Lisa Amini. Zeroth-order stochastic variance reduction for nonconvex optimization. In Advances in Neural Information Processing Systems, pages 3731–3741, 2018.
[23] David Kincaid, David Ronald Kincaid, and Elliott Ward Cheney. Numerical Analysis: Mathematics of Scientific Computing, volume 2. American Mathematical Soc., 2009.
[24] Michel Ledoux. The Concentration of Measure Phenomenon. Number 89. American Mathematical Soc., 2001.
[25] Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, and Rujun Long. Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768, 2018.
[26] Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017.
[27] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[28] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[29] François Chollet et al. Keras. https://keras.io, 2015.
[30] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.
[31] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016.
[32] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4708, 2017.
[33] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy, pages 582–597. IEEE, 2016.
[34] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, pages 274–283, 2018.
[35] Xiaoyu Cao and Neil Zhenqiang Gong. Mitigating evasion attacks to deep neural networks via region-based classification. In Proceedings of the 33rd Annual Computer Security Applications Conference, pages 278–287. ACM, 2017.
[36] Xuanqing Liu, Minhao Cheng, Huan Zhang, and Cho-Jui Hsieh. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference on Computer Vision (ECCV), pages 369–385, 2018.
[37] Guneet S. Dhillon, Kamyar Azizzadenesheli, Jeremy D. Bernstein, Jean Kossaifi, Aran Khanna, Zachary C. Lipton, and Animashree Anandkumar. Stochastic activation pruning for robust adversarial defense. In International Conference on Learning Representations, 2018.
[38] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320, 2019.
[39] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
[40] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
[41] Alexey Kurakin, Ian J Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security, pages 99–112. Chapman and Hall/CRC, 2018.
[42] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.