使用ResNet产生通用对抗扰动网络

Generating universal adversarial perturbation with ResNet

原文连接：

https://reader.elsevier.com/reader/sd/pii/S0020025520304394?token=720578957CCC4F8DF2318A1C4E8212F0404D384FFD5CF186D7C9CA956AD014566D937BDF27EC2C8ABEDE559588A2962B

GB/T 7714 Xu J, Liu H, Wu D, et al. Generating Universal Adversarial Perturbation with ResNet[J]. Information Sciences, 2020.

MLA Xu, Jian, et al. "Generating Universal Adversarial Perturbation with ResNet." Information Sciences (2020).

APA Xu, J., Liu, H., Wu, D., Zhou, F., Gao, C. Z., & Jiang, L. (2020). Generating Universal Adversarial Perturbation with ResNet. Information Sciences.

abstract

Adversarial machine learning, as a research area, has received a great deal of attention inrecent years. Much of this attention has been devoted to a phenomenon called adversarialperturbation, which is human-imperceptible and can be used to craft adversarial examples.The deep neural networks are vulnerable to adversarial examples, which raises securityconcerns on learning algorithms due to the potentially severe consequences. It was shownthere exist universal perturbations that are image-agnostic can fool the network whenadded to the majority of images. Since different attack strategies proposed for generatinguniversal perturbation are still suffering from attack success rate, attack efficiency, andtransferability. In this paper, we design an attack framework that uses a residual network(ResNet) to create universal perturbation. We introduce a trainable residual network gen-erator that converts random noise into universal adversarial perturbation, which can beused to efficiently generate perturbations for any instance after being trained. Unlike tra-ditional methods, moreover, we use a loss network to guarantee the similarity of images incontent. The new generator structure and objective function make our method achieve bet-ter attack results than the existing methods. A variety of experiments conducted on theCIFAR-10 dataset reveal that our proposed attack framework constitutes an advance inthe creation of universal adversarial perturbation, as it can achieve a success rate of 89%,which outperforms the similar methods, along with low perturba.

摘要

对抗性机器学习作为一个研究领域，近年来受到了广泛的关注。这种注意力主要集中在一种被称为对抗性扰动的现象上，这种现象是人类无法察觉的，可以用来制造对抗性的例子。深度神经网络很容易受到敌对例子的攻击，这引起了对学习算法的安全担忧，因为潜在的严重后果。这表明，存在普遍的扰动，图像不可知，可以欺骗网络，当添加到大多数图像。由于产生普遍扰动的不同攻击策略仍然受到攻击成功率、攻击效率和可移植性的影响。在本文中，我们设计了一个利用残差网络(ResNet)来产生通用扰动的攻击框架。我们引入了一种可训练的残差网络生成器，它能将随机噪声转化为通用的对抗性扰动，在训练后可以有效地生成任何实例的扰动。与传统的方法不同，我们使用了一个损失网络来保证图像在内容上的相似度。新的生成器结构和目标函数使我们的方法比现有的方法获得更好的攻击效果。在ifar -10数据集上进行的各种实验表明，我们提出的攻击框架在创建通用对抗性扰动方面是一个进步，因为它可以达到89%的成功率，优于类似的方法，并且具有低干扰。

1.Introduction

In recent years, deep learning has achieved impressive performance in the fields of computer vision[1,2], medicine[3,4],natural language processing[5,6], and security[7,8], among others. However, Szegedy et al.[9]found that deep neural net-works are vulnerable to adversarial examples. Subsequently, a number of works[10–17]have studied the robustness of deepneural networks against adversarial attacks in depth, and proposed a variety of attack methods to craft adversarial examplesby adding adversarial perturbation to the original images.[18]reviews the latest research of adversarial example and gives asurvey on the state-of-the-art attack methods and defense methods. Considering that learning systems use signals fromcameras and other sensors as input in real application,[19]demonstrates that most of adversarial examples still have anattack effect even when perceived through the camera. The existence of adversarial examples can affect the application ofdeep neural networks in computer vision tasks, as well as cause other security issues in applications such as autonomousdriving: a hypothetical attacker could, for example, introduce small manipulations into the image mark ”STOP” that areinvisible to human eyes, causing the vehicle’s automatic identification system to misjudge the input as ”SPEED LIMIT 60”or similar, with results that are likely to be catastrophic. Research into adversarial examples can help us to understandthe robustness of different model structures, along with the advantages or disadvantages of different training algorithmsin ensuring the security of deep learning in practical applications.

近年来，深度学习在计算机视觉[1,2]、医学[3,4]、自然语言处理[5,6]、安全[7,8]等领域取得了令人瞩目的成绩。然而，Szegedy等[9]发现，深度神经网络容易受到敌对实例的攻击。随后，许多著作[10 17]深入研究了深度神经网络对敌对攻击的鲁棒性，并通过在原始图像中添加敌对摄动，提出了各种攻击方法来制造敌对例子。综述了对抗性实例的最新研究进展，并对目前最先进的攻击方法和防御方法进行了评述。考虑到学习系统在实际应用中使用来自摄像头和其他传感器的信号作为输入，[19]表明，大多数敌对的例子仍然具有攻击效果，即使通过摄像头感知。敌对的例子的存在会影响深度神经网络在计算机视觉中的应用任务,以及导致自动驾驶等其他安全问题的应用程序:一个假设攻击者可以,例如,小操作引入到图像标记停止areinvisible人眼,导致车辆自动识别系统误判输入限速60或类似,结果可能是灾难性的。通过对抗性例子的研究，可以帮助我们了解不同模型结构的鲁棒性，以及不同训练算法的优缺点，从而确保深度学习在实际应用中的安全性。

Adversarial perturbations are very small, imperceptible by humans, and can mislead even state-of-the-art deep neuralnetwork (DNN) into making incorrect predictions. From the perspective of attack methods, adversarial perturbations canbe divided into two categories[20]: methods in the first category independently produce adversarial perturbations for eachoriginal example, which is an image-dependent perturbation. While the second type of method produces a single universalperturbation for most natural data, which is a universal perturbation. Perturbations of the first method are characterized bytheir inherent dependence on specific original data. They are specifically crafted for each data point independently with solv-ing a data-dependent optimization problem. The second category of methods, first proposed by Moosavi-Dezfooli et al.[13],produces a single universal perturbation for a given target model and dataset, which can fool the network on most naturalimages.

对抗性的干扰非常小，人类无法察觉，甚至可以误导最先进的深层神经网络(DNN)做出错误的预测。从攻击方法的角度来看，敌对摄动可以分为两类[20]:第一类方法对每个原始实例独立产生敌对摄动，这是一个与像相关的摄动。而第二类方法对大多数自然数据产生单一的普遍扰动，即普遍扰动。第一种方法的扰动以其对特定原始数据的固有依赖性为特征。它们是专门为每个数据点设计的，独立求解一个依赖数据的优化问题。第二类方法是由moosavi - dez愚i等人[13]首先提出的，对给定的目标模型和数据集产生单一的普遍扰动，可以在大多数自然图像上欺骗网络。

However, existing attack methods require a lot of computation time. Such as optimization methods, they[21]take aboutthree hours to generate one thousand adversarial examples. Fast Gradient Sign Method (FGSM) can have higher attack effi-ciency, however, it is a one-step gradient-based approach and has a low success rate for the white-box mode. The iterativemethods iteratively apply fast gradient multiple times with a small step size, thereby, which needs more computation time.[21]demonstrates that generative model is faster than FGSM at inference time. Therefore, we consider using the residualgenerator as a generative model to accelerate the production of adversarial example. On the other hand, most attack meth-ods produce image-dependent perturbations for targeted and non-targeted attacks.[13]shows the existence of a universaladversarial perturbation (image-agnostic) which does not depend on specific data. However, most methods of generatinguniversal perturbations do not have a high success with a small norm constraint. To further improve the attack rate undera small norm. We do include two distance minimization terms within the objective function, one is norm distance term andanother is content loss term which is defined by loss network.

然而，现有的攻击方法需要大量的计算时间。例如优化方法，它们需要大约三个小时来生成一千个敌对的例子。快速梯度符号法(FGSM)具有较高的攻击效率，但它是一种基于梯度的一步攻击方法，对白盒模式的攻击成功率较低。迭代方法以较小的步长迭代多次使用快速梯度，因此需要较多的计算时间。[21]表明生成模型在推理时比FGSM更快。因此，我们考虑利用残差生成器作为生成模型来加速对抗性实例的生成。另一方面，大多数攻击方法对有目标和非目标攻击产生依赖于图像的扰动。[13]显示了不依赖于特定数据的普遍对抗性扰动(图像不可知)的存在。然而，对于小范数约束，大多数生成普遍摄动的方法都不太成功。进一步提高小常模下的发病率。我们在目标函数中包含两个距离最小化项，一个是范数距离项，另一个是内容损失项，由损失网定义。

In this paper, we propose a framework, Universal Adversarial Perturbation with ResNet (hereafter referred to as UAP-RN),to create the universal adversarial perturbations, which uses the image transformation function of a residual network gen-erator combined with the feature map output of the loss network. We further demonstrate the excellent attack effect of thisframework on a public standard dataset usingl1as the metric. More specifically, the proposed framework achieves animprovement relative to non-target attacks where crafting universal adversarial perturbations is concerned. As for targetedattacks, this method can also effectively attack each tag under a lower norm constraint. Moreover, the adversarial examplesproduced by our method exhibit excellent transferability.

在本文中，我们提出了一个基于ResNet的通用对抗性摄动(以下简称UAP-RN)框架来创建通用对抗性摄动，该框架利用残差网络生成器的图像变换函数与损失网络的特征图输出相结合。我们进一步证明了该框架对使用l1作为度量标准的公共标准数据集的良好攻击效果。更具体地说，所提出的框架实现了相对于非目标攻击的改进，其中制造普遍的对抗性扰动是关注的。对于targetedattacks，该方法也可以在较低范数约束下有效地攻击每个标签。此外，本方法生产的对抗性样品表现出优良的可转移性。

2.1. Image-dependent perturbations

The adversarial attack method was first proposed by Szegedy et al.[9]. These authors designed an optimization problemconsisting of perturbation norm and model loss for each original image, and then calculated the optimal approximate solu-tion using L-BFGS iterative algorithm to create adversarial perturbation. However, these adversarial perturbations were onlyvalid for the specific raw image in question, making them image-dependent, moreover, the method in[9]also required sub-stantial calculation time.

对抗性攻击方法首先由Szegedy等[9]提出。摘要针对每幅原始图像设计了一个包含摄动范数和模型损失的优化问题，然后利用L-BFGS迭代算法计算了产生对敌摄动的最优近似解。然而，这些对抗摄动只对特定的原始图像有效，使它们与图像有关，而且，在[9]中的方法也需要实际的计算时间。

Following on from this work,[10,19,14,22,11,15,16 ] (among others) studied the topic in more detail.[10] proposed the FGSM algorithm, which uses the approximate gradient of the loss function to construct an adversarial example, i.e., given a raw image $x$ , the attacker generates the adversarial example $x_{A}=x+\eta$ with infinite norm constraint. The generation process can be seen in Fig. 1. More specifically: $\eta=\in·sign(\bigtriangledown_{x}\ell_{f}(x,y))$ represents an imperceptible perturbation, where $\ell_{f}(·)$ is the crossentropy loss used to train the model $f$ , and $y$ represents the ground truth of $x$ . However, being a single-step attack, the FGSM method adds a significant amount of perturbation at any one time. Kurakin et al. [19] extended FGSM to an iterative version, and proposed Basic Iterative Attack, which runs multiple iterations of the FGSM and has a better attack effect.Moreover, in[14], an iterative attack algorithm based on FGSM was developed. Assuming that the loss function is linearizedaround the current data point, and a small amount of perturbation is added to the original image in each iteration to makethe attack possible. [22]showed that their algorithms can reliably produce adversarial examples while only modifying onfew percentages of the input features per sample. Firstly, they construct adversarial saliency maps to identify features ofthe input that most significantly impact output classification. Secondly, they craft adversarial examples by modifying thesefeatures deliberately. [11]further investigated how an optimal objective function might be constructed and proposed a $C\&W$ algorithm capable of producing excellent adversarial examples, although this method requires additional computationalcost.[15]introduced momentum technology to the adversarial attack concept, proposing a momentum-based iterativeattack algorithm called MI-FGSM. This method integrates the momentum term into the iterative process for attacks, allowing the directions to be stably updated and preventing any falling into the local solution. MI-FGSM can achieve superior attack performance at a fast attack speed, this method also won first place in both the NIPS 2017 Non-targeted Adversarial Attackand Targeted Adversarial Attack competitions.[16]presents a decision-based adversarial attack method that is iterative andcomprises three main steps: a binary search to approach the boundary, gradient estimation, and a step-size search.

在此基础上，10,19,14,22,11,15,16对这一课题进行了更详细的研究。[10]提出了FGSM算法，该算法利用损失函数的近似梯度构造了一个对抗性的例子，即时，攻击者生成具有无限范数约束的对抗性示例。生成过程如图1所示。更具体地说:表示一种难以察觉的扰动，其中是用来训练模型的交叉损耗，表示的ground truth。然而，作为单步攻击，FGSM方法在任何时候都增加了大量的干扰。Kurakin等[19]将FGSM扩展为迭代版本，提出了基本迭代攻击，该攻击方法对FGSM进行多次迭代，攻击效果更好。在[14]中，提出了一种基于FGSM的迭代攻击算法。假设损失函数在当前数据点附近线性化，并在每次迭代中对原始图像添加少量的扰动，使攻击成为可能。[22]表明，它们的算法在每个样本只修改少数百分比的输入特征的情况下，能够可靠地产生敌对的例子。首先，他们构造了对抗性显著性图来识别对输出分类影响最大的输入特征。其次，他们通过有意地修改这些特征来设计对抗的例子。[11]进一步研究了如何最优目标函数可能被构造，并提出了一个算法能够产生优秀的对抗性例子，尽管这种方法需要额外的计算成本。[15]将动量技术引入对敌攻击的概念，提出了基于动量的迭代攻击算法MI-FGSM。该方法将动量项集成到攻击的迭代过程中，使方向稳定更新，防止落入局部解。MI-FGSM攻击速度快，攻击性能优越，该方法在NIPS 2017非目标对抗性攻击和目标对抗性攻击比赛中均获得第一名。[16]提出了一种基于决策的对抗攻击方法，该方法是迭代的，包括三个主要步骤:逼近边界的二值搜索、梯度估计和步长搜索。

Different from the above methods,[21]proposed a framework for generating adversarial examples via generative adver-sarial networks (GAN)[23], and further verified that the speed of attack example creation using the pre-trained generator isfar higher than that achieved by the FGSM attack algorithm.[24]also used GAN[23]to design a method of defense againstattacks, this method arrives at a correction model by means of adversarial training, which can eliminate malicious pertur-bation in the input.[25]proposed a defense method called Defense-GAN, which is effective against both white-box andblack-box attacks. And they also demonstrated that one might come across practical difficulties while implementing thismethod. As we all know, training GAN is still a challenging task and an active area of research, if the GAN is not properlytrained and tuned, the performance of this method will be relatively poor.

与上述方法不同的是，[21]提出了一种通过生成式反向网络(GAN)[23]生成对抗算例的框架，并进一步验证了使用预训练的生成器生成攻击算例的速度远高于FGSM攻击算法。[24]还利用GAN[23]设计了一种防御攻击的方法，该方法通过对抗性训练得到一个修正模型，可以消除输入中的恶意扰动。[25]提出了一种防御方法叫做Defense-GAN，该方法对白盒攻击和黑盒攻击都有效。他们还证明了在实现这种方法时可能会遇到实际困难。我们都知道，训练GAN仍然是一个具有挑战性的任务和活跃的研究领域，如果GAN不适当的训练和调优，这种方法的性能会比较差。

2.2. Universal perturbations

Moosavi-Dezfooli et al.[13]proposed a systematic algorithm for computing universal (i.e. image-agnostic) perturbations,which are capable of fooling a model with high probability when added to any data sample. These authors iteratively com-puted the small perturbations corresponding to each data in a target dataset, then used these perturbations to establish asingle general perturbation. Furthermore, considering that the training dataset is usually not fully accessible in practice,[12]proposed a data-independent method to compute universal adversarial perturbations that could attack multiple layersof the network without any knowledge of the target data. Unlike traditional iterative algorithms,[20,26]produced universalperturbations through neural network learning. This kind of attack method learns in order to build a generator that convertsrandom noise into a perturbation capable of fooling a target convolutional neural networks (CNN) on most of the images.Poursaeed et al.[20]used only the model loss as the objective function to train the generator network, here, the generatoradopts the residual network structure. Hayes et al.[26]designed an objective function with model loss and a perturbationnorm where the generator structure comprises five deconvolution layers and three fully connected layers. Once trained, thefeed-forward generator can efficiently produce adversarial perturbations.

Moosavi-Dezfooli et al.[13]提出了一个计算通用(即图像不确定)扰动的系统算法，当添加到任何数据样本时，该算法能够以高概率欺骗一个模型。这些作者迭代地将目标数据集中每个数据对应的小扰动组合起来，然后利用这些扰动建立一个单一的一般扰动。此外，考虑到训练数据集在实践中通常是不完全可访问的，[12]提出了一种数据独立的方法来计算在不知道目标数据的情况下可以攻击网络多层的通用对抗扰动。与传统的迭代算法不同，[20,26]通过神经网络学习产生了普遍的摄动。这种攻击方法学习的目的是建立一个发生器，将随机噪声转换成扰动，能够在大多数图像上欺骗目标卷积神经网络(CNN)。Poursaeed等[20]仅以模型损耗为目标函数对发电机网络进行训练，此处发电机采用残差网络结构。Hayes等人[26]设计了一个具有模型损失和扰动范数的目标函数，其中发生器结构包括5个反褶积层和3个完全连接层。一旦训练，前馈发生器可以有效地产生对抗性扰动。

Research into the robustness of deep neural networks requires a large number of adversarial examples to be produced inonly a short period of time. However, most current iterative algorithms capable of producing excellent adversarial examplesrequire extensive calculation time. Therefore, it is considered advisable to study attack methods utilizing neural network for-ward propagation.[27]proposes a method of defending against universal adversarial perturbation: in brief, this methodtrains a correction model based on both the original image and universal adversarial perturbation, then uses the trainedmodel as the pre-input layer of the target model to detect and correct the perturbation in the image.

深度神经网络鲁棒性的研究需要在短时间内产生大量的反例。然而，大多数能够产生优秀的对抗性实例的迭代算法需要大量的计算时间。因此，研究利用神经网络向传播的攻击方法是可取的。[27]提出了一种抵御普遍对抗性扰动的方法:简单地说，该方法在原始图像和普遍对抗性扰动的基础上训练一个校正模型，然后将训练好的模型作为目标模型的预输入层来检测和校正图像中的扰动。

For the comprehensive information of adversarial examples, we recommend this paper[28]which proposed a taxonomyof attack methods, further elaborated countermeasures for adversarial examples, investigated the applications for adversar-ial examples.

为了全面了解敌对算例，我们推荐了[28]，提出了攻击方法的分类，进一步阐述了敌对算例的对策，研究了敌对算例的应用。

3. Preliminaries

3.1. Adversarial perturbations

An adversarial perturbation is a tiny noise added to the original data that is difficult for humans to perceive[29]. Definition1 below presents a formal description of adversarial perturbation.

一个不利的干扰是在原始数据上添加一个微小的噪声，这使得人类很难感知[29]。下面的定义1给出了对抗式扰动的正式描述。

Definition 1. Let $A: R^{d}\to R^{d}$ be attack strategy function of attacker, given a raw datax, an adversarial example would be $A(x).Let\delta_{x}=A(x)-x$ be the perturbation chosen by the adversary at $x$ . Moreover, $f:R^{d} \to R$ denotes a score based classifier while $g:R^{d}\to \{-1,1\}$ is a base classifier. The perturbation $\delta_{x}$ chosen by an adversary at $x$ is thus said to be adversarial for $f$ if $\parallel \delta_{x} \leq \epsilon \parallel$ and:

定义 1. 让 $A: R^{d}\to R^{d}$ 攻击者的攻击策略函数，给定一个原始数据ax，一个敌对的例子将是 $A(x).Let\delta_{x}=A(x)-x$ 被对手选择的干扰 $x$ .而且， $f:R^{d} \to R$ 表示基于评分的分类器 $g:R^{d}\to \{-1,1\}$ 是一个基分类器。干扰项 $\delta_{x}$ 被对手选择了x 这就是对抗 $\parallel \delta_{x} \leq \epsilon \parallel$ 和：

Note that, if the perturbation also causes the base classifier to misclassify, it cannot be considered adversarial. Furthermore, if $f$ disagrees with $g$ at $x$ , this should be judged as a standard prediction error rather than an adversarial error. The adversarialexample can be considered as the raw data with added adversarial perturbation, which satisfies sign $(f(x+\delta_{x}))\neq g(x)$ .

注意，如果扰动也导致基分类器分类错误，那么它就不能被认为是对抗性的。此外，我不同意 $g$ 和 $x$ ，这应该被判断为一个标准的预测错误，而不是一个对抗性的错误。对抗的例子可以被视为原始数据添加了敌对的扰动,满足sign $(f(x+\delta_{x}))\neq g(x)$ .

3.2. Adversarial transferability

Szegedy et al.[9]found that the same perturbation can fool a different network that was trained on a different subset ofthe dataset. An adversarial example that can cause a non-target model to misclassify is said to be transferable. Definition2formalizes this adversarial transferability.

Szegedy等人[9]发现，同样的干扰可以欺骗在数据集的不同子集上训练的不同网络。一个可能导致非目标模型分类错误的对抗性例子被称为可转移的。定义2将这种对抗性可转让性形式化。

Definition 2.Let $x^{A}$ be an adversarial example crafted for modelfand the original inputx, while model $f\prime$ is different from $f$ .The adversarial example $x^{A}$ can therefore be said to have adversarial transferability if

定义2。让我们来看一个与modelfand原始inputx相反的例子，而modelf0与f是不同的。因此，对抗性的例子被认为具有对抗性可转让性

In general, both iterative attacks and neural network-based attacks can achieve higher success rates in a white-box setting,i.e. one in which the attackers have perfect knowledge of the target network structure and training data. However, if adver-sarial examples of this kind are tested on a different network, the attack success rate will be significantly reduced. This isbecause these types of attack method tend to overfit specific network structures and parameters, resulting in generatedadversarial examples that rarely transfer to other networks. Transferability is an important indicator of the robustness ofadversarial examples and is the main property required for a black-box attack.

一般来说，迭代攻击和基于神经网络的攻击在白盒设置下都能获得更高的成功率，即攻击者完全了解目标网络结构和训练数据。但是，如果在不同的网络上测试这种不利的例子，攻击成功率会大大降低。这是因为这些类型的攻击方法往往会过度拟合特定的网络结构和参数，导致生成的对抗性示例很少传输到其他网络。可转让性是t的重要指标。

3.3 Loss network

A loss network is a pre-trained network that is used to define perceptual loss functions[30]. Different layers of a convo-lutional neural network can learn the feature information from different levels of the image. To formalize the feature information, for a loss networkln, let $ln_{j}(x)$ be the activations of the $j th$ layer of the network $ln$ when processing the image $x$ . The feature information $ln_{j}(x)$ is a feature map of shape $C_{j}\times H_{j}\times W_{j}$ where $j$ is a convolutional layer. $ln_{j}(x)$ can be used to define the content loss function and the style loss function of images. Real-Time Style Transfer[30] used a VGG-16 pretrained for image classification task to define perceptual loss functions, as shown in Fig. 2, the loss network remains fixed during the training process. In our work, we also use a similar pre-trained model as a fixed loss network to define the content loss function between images. The content loss is used to encourage adversarial examples to appear similar to the original data.

损失网络是用于定义感知损失函数[30]的预训练网络。神经网络的不同层次可以从不同层次的图像中获取特征信息。为了使特征信息形式化，对于一个损失网络，当处理图片 $x$ 让 $ln_{j}(x)$ 作为激活这个 $j th$ 层。当j是卷积层的时候，特征的信息 $ln_{j}(x)$ 匹配 $C_{j}\times H_{j}\times W_{j}$ 。 $ln_{j}(x)$ 可用于定义图像的内容丢失函数和样式丢失函数。[30]使用预先训练过的vga -16进行图像分类任务来定义感知丢失函数，如图2所示，丢失网络在训练过程中保持不变。在我们的工作中，我们也使用一个类似的预训练模型作为固定损失网络来定义图像之间的内容损失函数。内容丢失是为了鼓励相反的示例看起来类似于原始数据。

3.4 ResNet generator

The residual network (or ResNet) generator is one example of an excellent image transformation networks. For example,given an original image $x$ , the pre-trained residual network generatorGcan output a new image $x^{A}$ that is the needs of trainers. For this network, the input and output are both color images of identical shape. This network contains two stride-2 convolutions, some residual blocks, and two 1/2-strided convolutions. Since the residual network generator is fully convolutional, it can be applied to images of any resolution during experiments. The residual network generator was originallyused in image style transfer[30], where it was trained to solve the optimization problem proposed by Gatys et al. in realtime. Cycle-GAN[31] also used this residual network generator for image-to-image translation, in which the goal is to learnthe mapping between an input image and an output image using a training set of aligned image pairs. In this paper, we use the ResNet generator to convert random noise into universal adversarial perturbation, unlike[30], we use batch normalization rather than instance normalization.

残余网络(或ResNet)发生器是一个很好的图像变换网络的例子。例如，给定一幅原始图像 $x$ ，预训练的残差网络生成器可以输出一幅新的图像 $x^{A}$ ，这是训练器需要的。对于这个网络，输入和输出都是相同形状的彩色图像。该网络包含两个stride-2卷积、一些残块和两个半strided卷积。由于残差网络生成器是完全卷积的，因此在实验中可以应用于任何分辨率的图像。残差网络生成器最初用于图像风格传输[30]中，通过训练它来实时解决Gatys等人提出的优化问题。 Cycle-GAN[31] 也使用这个残差网络生成器进行图像到图像的转换，目标是使用一组对齐的图像对的训练集来学习输入图像和输出图像之间的映射。本文利用ResNet生成器将随机噪声转化为通用的对抗性扰动，不同于[30]，我们使用批处理归一化而不是实例归一化。

4. The proposed method

4.1 Design concept

Generative adversarial perturbations (GAP)[20]and Learning universal adversarial perturbations with generative models(UAN)[26]are state-of-the-art attack methods designed to create universal adversarial perturbations using trainable gener-ators. However, GAP[20]maintains the perturbation in a fixed norm by relying on a scaling factor before applying the per-turbation to an image. Although it reduces the amount of computation required during gradient back-propagation, it maynot be able to achieve the optimal attack effect. Moreover, the generator in UAN[26]is a simple network which consistsof some deconvolution layers and some fully connected layers, a shallow structure of this kind may not be as effective asa deep network structure. Furthermore, neither of these two attack methods pays attention to the similarity of the outputfeature map in the network’s middle layer. After considering the above-mentioned problems, we designed an attack methodto craft perturbation using a residual network and a loss network. Here, the residual network, which is composed of residualblocks, has a deeper structure and can thus learn more information. For its part, the loss network captures the feature mapoutput of the image in the middle and defines the content loss function. In practice, we additionally included a distance min-imization term within the objective function and gradually adjusted the perturbation norm from a smaller value to achieve abetter attack effect. In order to ensure content similarity between the adversarial image and the original image, we alsoadded the content loss term defined by the loss network to the objective function.

生成对抗摄动(GAP)[20]和学习通用对抗摄动与生成模型(UAN)[26]是最先进的攻击方法，旨在使用可训练的生成器创建通用对抗摄动。然而，GAP[20]在对图像应用每扰动之前，通过依赖一个缩放因子来保持扰动在一个固定的范数。虽然减少了梯度反向传播时的计算量，但可能无法达到最优攻击效果。此外，在UAN[26]中，发生器是由一些反褶积层和一些全连通层组成的简单网络，这种浅层结构可能不如深层结构有效。此外，这两种攻击方法都不注重网络中间层输出特征映射的相似性。在考虑了上述问题之后，我们设计了一种利用剩余网络和损失网络来制造扰动的攻击方法。这里，残差网络由残差块组成，具有更深的结构，可以了解到更多的信息。丢失网络则捕获中间图像的feature mapoutput并定义内容丢失函数。在实际操作中，我们在目标函数中增加了距离最小拟合项，并从一个较小的值逐步调整扰动范数，以达到更好的攻击效果。为了保证敌对图像与原始图像的内容相似度，我们还在目标函数中加入了由丢失网络定义的内容丢失项。

4.2 Problem definition

Let $x\in X$ represent original data (where $X$ is the dataset), and each data sample has a corresponding label $c\in\{1,...,C\}$ .And $f(\cdot)$ represents a classification network trained on the corresponding dataset. Given a data point $x$ , the network outputs a probability vector $f(x)$ . The attacker’s goal is to craft a corresponding adversarial example $x^{A}$ for the original data $x$ . Here, $f(x^{A})\neq y$ (y denotes the true label) or $f(x^{A})=t$ (t is the target class). $x^{A}$ should also be close to the original instance $x$ in terms of $l_{p}$ distance metric and image content. We expect obtain a function that converts an instance of random noise (sampledfrom a uniform distribution) into a universal adversarial perturbation. In practice, this involves designing an objective function, then training the residual network generator to approximate this function.

$x\in X$ 表示原始数据(其中 $X$ 为数据集)，每个数据样本都有相应的标签 $c\in\{1,...,C\}$ ， $f(\cdot)$ 表示在相应数据集上训练的分类网络。给定一个数据点 $x$ ，网络输出一个概率向量 $f(x)$ 。攻击者的目标是为原始数据 $x$ 创建一个对应的对抗性示例 $x^{A}$ 。这里， $f(x^{A})\neq y$ (y表示真实的标签)或 $f(x^{A})=t$ (t是目标类)。 $x^{A}$ 在距离度量和图像内容方面也应该接近 $l_{p}$ 原始实例 $x$ 。 $x^{A}$ 我们期望得到一个将随机噪声实例(从均匀分布取样)转换为通用对抗性扰动的函数。在实践中，这涉及到设计一个目标函数，然后训练残差网络生成器来近似这个函数。

4.3 UAP-RN framework

The attack framework primarily comprises a generator $G$ , a loss networklnand a target model $f$ , as shown in Fig. 3. First, we sample a fixed-shape random noisezfrom a uniform distribution. $G$ takes this as input and outputs a perturbation $\delta$ . The $\delta$ is then scaled by a factor $\omega\in(0,\frac{\in}{\parallel\delta\parallel_{p}}]$ , where is the maximum permitted perturbation. The factor $\omega$ takes a small value inthe initial training phase and is increased appropriately while the objective function loss value does not decrease.

攻击框架主要包括一个生成器 $G$ , 亏损网络和目标模型 $f$ , 如图3所示。首先，我们从均匀分布的随机噪声中采样一个固定形状的随机噪声。将 $G$ 作为输入和输出扰动 $\delta$ 。然后将 $\delta$ 其缩放一个因子 $\omega\in(0,\frac{\in}{\parallel\delta\parallel_{p}}]$ , 公式中为最大允许扰动。该因子在初始训练阶段取值较小，并在目标函数损失值不降低的情况下适当增大。

The scaled perturbation $\delta\prime=\omega\cdot\delta$ is added to the original data $x+\delta\prime=x^{A}$ , after which the $x^{A}$ is clipped to $[0,1]^{m}$ . We feed the clipped image $x^{A}$ into the loss network $l_{n}$ to obtain the feature map of the middle layer, where feature map is used to compute the content loss $L_{cont}^{ln}$ . During training, the network structure and parameter information of the target model are fully known. The target model $f$ takes $x^{A}$ as input and outputs probability vector $f(x^{A})$ , which is used to compute model loss $L_{adv}^{f}$ . Meanwhile, the original image is also fed into the target model and the loss network to obtain the corresponding probability vectorand feature map used to calculate the loss item.

缩放扰动 $\delta\prime=\omega\cdot\delta$ 被添加到原始数据 $x+\delta\prime=x^{A}$ 中，然后 $x^{A}$ 被剪切到原始数据 $[0,1]^{m}$ 中。将裁剪后的图像 $x^{A}$ 输入丢失网络 $l_{n}$ ，得到中间层的特征图，中间层的特征图用于计算内容丢失 $L_{cont}^{ln}$ 。在训练过程中，目标模型 $f$ 的网络结构 $x^{A}$ 和参数信息是完全已知的 $f(x^{A})$ 。目标模型以输入和输出概率向量作为输入和输出概率向量 $L_{adv}^{f}$ ，用于计算模型损失。同时，将原始图像输入目标模型和损失网络，得到相应的概率向量和特征图，用于计算损失项。

[30] has shown that the first few layers of neural networks learn about the lines and color information of the image. Asthe depth increases, the next few layers pay more attention to abstract information of the image. Since we hope to guaranteesimilarity in content between the two images during the adversarial attack, we therefore add the content loss term definedby the loss network, expressed as:

[30] 显示了神经网络的前几层可以学习图像的线条和颜色信息。随着深度的增加，接下来的几层将更多地关注图像的抽象信息。由于我们希望保证在对抗性攻击时两幅图像之间内容的连贯性，因此我们增加了由损失网络定义的内容损失项，表示为:

We use the pre-trained VGG-19 model as the loss network, input image, then output corresponding feature map, which has a size of $C_{j}H_{j}W_{j}$ ( $C, H ~and~ W$ represent the number of channels, height and width respectively).

我们使用预先训练好的vggg -19模型作为损失网络，输入图像，然后输出相应的特征图，其一个尺寸为 $C_{j}H_{j}W_{j}$ ( $C, H ~and~ W$ 分别表示通道数、高度和宽度)。

For non-targeted attacks, moreover, we want the forecast label to be different from label $c_{0}$ , which is the true label. As in UAN[26], we also use the model loss term in CW[11] to encourage the perturbed image to be misclassified as another class. In this case, the model loss term $L_{adv}^{f}$ can be expressed as:

此外，对于非目标攻击，我们希望预测标签不同于标签 $c_{0}$ ，这才是真正的标签。与在UAN[26]中一样，我们也使用CW[11]中的模型损失项，以促使扰动图像被误分类为另一类 $L_{adv}^{f}$ 。此时，模型损失项可表示为:

For a targeted attack, we compute a single perturbation which causes any images to be predicted as a target class $c$ . The model loss term $L_{adv}^{f}$ can be expressed as:

对于有针对性的攻击，我们计算一个单一的扰动，使任何图像被预测为目标类 $c$ 。模型损失测试 $L_{adv}^{f}$ 可以表示为:

The $l_{p}$ metric is typically used to accurately capture the perceptual differences between raw image and adversarial image, wealso adopt this metric. In order to achieve a better attack effect under the control of the perturbation norm, we add the per-turbation norm to the objective function. We therefore use the same distance metric as in CW[11] to quantify the distance between the original data and attack example, as follows:

该度量 $l_{p}$ 通常用于准确地捕捉原始图像和敌对图像之间的感知差异，我们也采用了该度量。为了在扰动范数的控制下取得更好的攻击效果，我们在目标函数中加入了单扰动范数。因此，我们使用与CW[11]中相同的距离度量来量化原始数据与攻击实例之间的距离，如下:

Finally, our full objective can be expressed as:

最后，我们的全部目标可以表示为：

Here $a$ and $b$ are used to control the relative importance of two target loss items respectively. $L_{adv}^{f}$ is used to optimize for thehigh attack success rate, while $L_{cont}^{ln}$ is used to encourage the perturbed image to appear similar to original image. Althoughmaking adjustments to these parameters may be difficult, the generator can be trained stably as long as the parameters areproperly initialized. Once the attack success rate reaches 100% or perturbation norm exceeds the preset threshold, we terminate training.

分别用于a和b控制两个目标损失项的相对重要性。 $L_{adv}^{f}$ 是用于优化高攻击成功率，而 $L_{cont}^{ln}$ 用于鼓励被扰动的图像看起来与原始图像相似。虽然调整这些参数可能是困难的，但只要参数被正确初始化，生成器可以被稳定地训练。一旦攻击成功率达到100%或扰动标准超过预设阈值，我们将取消训练。

5. Experimental results

All of our experiments are conducted using the CIFAR-10[32]dataset, which consists of 60,000, 32*32 RGB images in tencategories: airplane, car, bird, cat, deer, dog, frog, horses, ships, trucks. The training set contains 50,000 images, while the testset contains 10,000 images. The pre-trained target models used in the experiments are: VGG-19[33], ResNet-101[34], Den-seNet[35]. The accuracy of the three models on test set were: 93%, 95%, 94% respectively. All experiments use $l_{\infty}$ norm con-strain perturbations. A fixed-size random noise of 3*32*32 is used as the initial input, and original image is processed as $x\in [0,1]^{m}$ . The hyperparameters are as follows: Learning Rate 0.0002, Batch Size<100, Epochs 100. We set relative perturbation $\zeta_{p}=0.04$ as the upper bound of perturbation norm, which is equivalent to the infinite norm attack outlined inMoosavi-Dezfooli[13]at $\in=10$ . Relative perturbation is expressed as follows:

我们所有的实验都是使用CIFAR-10[32]数据集进行的，该数据集包含60000张3232的RGB图像，分为10个类别:飞机、汽车、鸟、猫、鹿、狗、青蛙、马、船、卡车。训练集包含50,000张图像，而testset包含10,000张图像。实验中使用的预训练目标模型为:VGG-19[33]、ResNet-101[34]、Den-seNet[35]。三种模型在测试集上的准确率分别为:93%，95%，94%。所有的实验都使用 $l_{\infty}$ 一般驱动扰动。使用固定大小的随机噪声3*32*32作为初始输入，原始图像为：

5.1 Non-targeted attack

In this case, our aim is to cause the target model to predict another arbitrary tag that deviates from the real tag. Accord-ingly, the loss term in(4) is brought into the final objective function. We used three pre-trained models as attack target, while 50,000 images were used as the training set, then tested the success rate of each trained generator on the test set of10,000 images. The universal adversarial perturbation generated by the generator for the three target models is shown inFig. 4. It can be seen that for different target models, the universal adversarial perturbation is different. Moreover, we com-pare our method of crafting universal adversarial perturbation with two similar, state-of-the-art attack methods, namelyGAP[20]and UAN[26], the results of these comparisons are presented inTable 1. It can be seen from the table that this kindof neural network learning-based attack has almost the same success rate on the training and test sets. Furthermore, ourmethod improves the attack success rate of the DenseNet model by 24% compared to the GAP[20]method and 14% relativeto the UAN[26]method.

Table 1

Comparison of error rates for UAP-RN against GAP[20]and UAN[26]based on thel1metric.

Attack

Dataset

VGG-19

DenseNet

ResNet-101

UAP-RN

Train

67%

89%

88%

UAP-RN

Val

68%

89%

UAN

Train

64%

75%

83%

UAN

Val

66%

75%

85%

GAP

Train

64%

65%

78%

GAP

Val

64%

65%

78%

5.1 一道攻击

在这种情况下，我们的目标是使目标模型预测另一个偏离真实标签的任意标签。因此，将(4)中的损失项带入最终目标函数中。我们以三种预先训练好的模型作为攻击目标，以50,000张图像作为训练集，然后在10,000张图像的测试集上测试每个训练好的生成器的成功率。对三种目标模型所产生的通用对抗性摄动如图所示。4. 可以看出，对于不同的目标模型，普遍的对敌摄动是不同的。此外，我们用两种相似的、先进的攻击方法，即gap[20]和UAN[26]，来对比我们制作通用对抗性扰动的方法，这些比较的结果在文章1中给出。从表中可以看出，这种基于神经网络学习的攻击在训练集和测试集上的成功率几乎相同。此外，与GAP[20]方法相比，DenseNet模型的攻击成功率提高了24%，与UAN[26]方法相比提高了14%。

表1

基于此度量的UAP-RN与GAP[20]和UAN[26]的错误率比较。

Attack

Dataset

VGG-19

DenseNet

ResNet-101

UAP-RN

Train

67%

89%

88%

UAP-RN

Val

68%

89%

UAN

Train

64%

75%

83%

UAN

Val

66%

75%

85%

GAP

Train

64%

65%

78%

GAP

Val

64%

65%

78%

5.2 Targeted attack

In the case of a target attack, the attacker’s goal is to craft an adversarial example that can mislead the model into pre-dicting a specific label. In our experiment, we used the same training parameters (batch size, learning rate, norm metric, etc.) as in the non-target attack. The main difference here is that the loss term in(5) is brought into final objective function. Wetested the attack effect for each tag on different values of relative perturbation (seeFig. 5). From the figure, it can be seen thatthe attack success rate for each tag can reach 70% when the relative perturbation value is 0.04. For almost every class, more-over, attacks against the VGG-19 model were found to be less effective than those against the other two models, as was alsothe case in the non-targeted attack setting. For the target tags ‘plane’, ‘dog’, ‘horse’, and ‘truck’, the attack effects for the threemodels are similar as the relative perturbation increases. The difference in target tags also does not have much influence onthe attack effect.Fig. 6presents the similar results for adversarial examples generated against DenseNet for all classes.

5.2 针对性攻击

在目标攻击的情况下，攻击者的目标是设计一个对抗的例子，它可以误导模型预测一个特定的标签。在我们的实验中，我们使用了与非目标攻击相同的训练参数(批量大小、学习率、范数度量等)。这里的主要区别是(5)中的损失项被带入最终目标函数。我们测试了每个标签对不同相对扰动值的攻击效果(seeFig)。5)从图中可以看出，在相对摄动值为0.04时，每个标签的攻击成功率都可以达到70%。此外，对于几乎每一个类，针对vgg19模型的攻击都比针对其他两种模型的攻击效率低，在非目标攻击设置中也是如此。对于目标标签飞机，狗，马，和卡车，三个阶段的攻击效果是相似的相对摄动增加。不同的目标标签对攻击效果也没有太大的影响。6给出了针对所有类使用DenseNet生成的对抗性示例的类似结果。

5.3 Transferability

Transferability is an important indicator for the robustness of adversarial examples, and is the main attribute used to per-form black-box attacks. A type of attack with strong transferability will have a better attack effect in a black-box setting,where the attackers have no knowledge of the model structure and parameters. Inspired by the data augmentation strategy,Xie et al.[17]designed an attack algorithm to improve the transferability of adversarial examples. For our part, we also testthe transferability of adversarial examples generated by our attack framework.

5.3 可转移性

可转移性是对抗实例健壮性的一个重要指标，也是每次黑箱攻击的主要属性。一种具有较强可转移性的攻击，在攻击者不知道模型结构和参数的黑箱设置下，攻击效果会更好。受数据扩充策略的启发，Xie等[17]设计了一种攻击算法来提高对抗性例子的可转移性。在我们的部分，我们也测试了由我们的攻击框架产生的对抗例子的可转移性。

Under thel1metric, we attacked the three target models separately and trained three different attack generators. In orderto measure transferability, we used each generator to create 10,000 adversarial images based on the test set, then tested theerror rate of each target model on these adversarial images, experimental results are presented inFig. 7. The abscissa ofFig. 7represents target model, while the ordinate represents error rate of the target model under each generator attack. For exam-ple, a generator based on VGG-19, its attack success rate against VGG-19 is 68%. Moreover, its obtained success rates are 88%,90% when it is evaluated on DenseNet and ResNet-101 respectively. The transferability of UAN[26]is further shown inFig. 8.For a UAN trained on VGG-19 and evaluated on DenseNet and ResNet-101, the error rate is 55%, 61% respectively. It cantherefore be seen that our attack method has lower dependence on the network structure and parameters of the targetmodel, and thus exhibits a stronger transferability effect.

在此条件下，分别对三种目标模型进行攻击，并训练了三种攻击生成器。为了度量可移植性，我们使用每个生成器在测试集的基础上创建了10000幅敌对图像，然后测试了每个目标模型对这些敌对图像的错误率，实验结果如图所示。7. 的横坐标ofFig。7表示目标模型，纵坐标表示目标模型在各发生器攻击下的错误率。例如，一个基于vgr -19的生成器，它对vgr -19的攻击成功率是68%。在DenseNet和ResNet-101上分别评估其获得成功率为88%和90%。图中进一步说明了UAN[26]的可转移性。8.对于在vga -19上训练并在DenseNet和ResNet-101上评估的UAN，错误率分别为55%和61%。由此可见，我们的攻击方法对目标模型的网络结构和参数的依赖性较小，具有较强的可转移性效果。

Previously, we conducted attack experiments while relying on the assumption that the attackers shared full access to anyimages used to train the target model. However, in a black-box setting, a hypothetical attacker can obtain only a partialrather than a complete dataset. Accordingly, we trained our generator on subsets of the CIFAR-10 training set that containedimages from all categories. The results of non-targeted attacks against ResNet-101 on the CIFAR-10 dataset are shown inFig. 9: here, the attack succeeds in 78% of cases when trained on only 0.4% of the training set. It can thus be seen that, pro-vided that the obtained dataset has the same structure as the original dataset, a good attack generator can be trained suc-cessfully even without access to the complete set of original data, moreover, our attack framework still achieves a significantattack effect in a black-box setting with limited dataset access.

在之前的攻击实验中，我们假设攻击者可以完全访问用于训练目标模型的任意图像。然而，在黑盒设置中，一个假设的攻击者只能获得部分数据集，而不是完整的数据集。因此，我们在包含来自所有类别的图像的CIFAR-10训练集的子集上训练生成器。对CIFAR-10数据集的ResNet-101进行非目标攻击的结果如图所示。9:在这里,攻击成功时78%的病例只有0.4%的训练集训练。因此可以看出,能提供获取数据集具有相同的结构与原始数据集,一个好的攻击发生器可以训练成功即使没有访问完整的原始数据,此外,我们的进攻框架仍达到significantattack效应在黑盒环境有限的数据集访问。

6. Conclusion

he attack framework designed in this paper uses a ResNet generator to carry out adversarial attacks. Once trained, uni-versal adversarial perturbation can be efficiently generated by using ResNet generator. We tested our method in a variety ofsettings. Experimental results demonstrate that our method can achieve a high success rate for non-targeted attacks, andalso achieves an excellent attack effect for targeted attacks in a white-box setting. Moreover, the adversarial examples pro-duced by our method exhibit remarkably good transferability between different models. That is, they can achieve superiorattack effects even in a black-box setting. In addition, we have shown that an excellent attack generator can be trained on asubset of the CIFAR-10 training set. In future work, the proposed framework could be applied to high-pixel datasets, as wellas other adversarial attack areas (e.g. semantic segmentation, etc.).

6.接论

本文设计的攻击框架采用ResNet生成器进行对抗性攻击。一旦训练，单对抗性扰动可以有效地产生使用ResNet发生器。我们在各种环境下测试了我们的方法。实验结果表明，该方法对非目标攻击具有较高的成功率，并且在白盒环境下对目标攻击具有良好的攻击效果。此外，该方法所给出的实例表明，不同模型之间具有很好的可转移性。也就是说，即使在黑盒环境下，它们也可以获得更强的攻击效果。此外，我们还证明了一个优秀的攻击生成器可以在CIFAR-10训练集的一个ubset上进行训练。在未来的工作中，该框架可以应用于高像素数据集，以及其他对敌攻击领域(如语义分割等)。

CRediT authorship contribution statement

Jian Xu:Conceptualization, Methodology, Formal analysis.Heng Liu:Data curation, Software, Writing - original draft.Dexin Wu:Data curation, Software, Validation.Fucai Zhou:Project administration, Investigation.Chong-zhi Gao:Supervi-sion, Formal analysis.Linzhi Jiang:Writing - review & editing.

信用作者贡献声明

徐建:概念化、方法论、形式分析。刘恒:数据管理，软件，写作——初稿。吴德新:数据管理，软件，验证。周富才:项目管理、考察。高崇智:上级，形式分析。林志江:写作-评论&编辑。

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could haveappeared to influence the work reported in this paper.

竞争利益声明

作者声明，他们没有已知的可能影响这篇论文所报道的工作的相互竞争的经济利益或个人关系。

Acknowledgment

This work is supported by the National Natural Science Foundation of China (61872069, U1936116), the FundamentalResearch Funds for the Central Universities (N2017012), and the Guangxi Key Laboratory of Cryptography and InformationSecurity (GCIS201807).

致谢

这项工作得到了国家自然科学基金(61872069,U1936116)、中央高校基础研究基金(N2017012)、广西密码与信息安全重点实验室(GCIS201807)的支持。

References

[1] K. He, X. Zhang, S. Ren, J. Sun, Identity Mappings in Deep Residual Networks, arXiv e-prints, p. arXiv:1603.05027, Mar 2016..

[2] T. Durand, N. Mehrasa, G. Mori, Learning a Deep ConvNet for Multi-label Classification with Partial Labels, arXiv e-prints, p. arXiv:1902.09720,Feb2019..

[3]Z. Mao, Y. Su, G. Xu, X. Wang, Y. Huang, W. Yue, L. Sun, N. Xiong, Spatio-temporal deep learning method for adhd fmri classification, Inf. Sci. 499 (2019)1–11.

[4]J. Islam, Y. Zhang, Early diagnosis of alzheimer’s disease: a neuroimaging study with deep learning architectures, in: Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition Workshops, 2018, pp. 1881–1883.

[5]J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE conference on computer visionand pattern recognition, 2015, pp. 3431–3440.

[6]Z. Liu, G. Lin, S. Yang, F. Liu, W. Lin, W.L. Goh, Towards robust curve text detection with conditional spatial expansion, in: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2019, pp. 7269–7278.

[7]X. Wang, J. Li, X. Kuang, Y. Tan, J. Li, The security of machine learning in an adversarial setting: a survey, J. Parallel Distributed Comput. 130 (2019)12–23.

[8]T. Li, C. Gao, L. Jiang, W. Pedrycz, J. Shen, Publicly verifiable privacy-preserving aggregation and its application in IoT, J. Netw. Computer Appl. 126(2019) 39–44.

[9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprintarXiv:1312.6199, 2013..

[10] I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572, 2014..

[11]N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp.39–57.

[12] K. Reddy Mopuri, U. Garg, R. Venkatesh Babu, Fast Feature Fool: A data independent approach to universal adversarial perturbations, arXiv e-prints, p.arXiv:1707.05572, Jul 2017..

[13]S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, P. Frossard, Universal adversarial perturbations, in: Proceedings of the IEEE conference on computer visionand pattern recognition, 2017, pp. 1765–1773.

[14]S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, Deepfool a simple and accurate method to fool deep neural networks, in: Proceedings of the IEEEconference on computer vision and pattern recognition, 2016, pp. 2574–2582.

[15]Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, J. Li, Boosting adversarial attacks with momentum, in: Proceedings of the IEEE conference on computervision and pattern recognition, 2018, pp. 9185–9193.

[16] J. Chen, M.I. Jordan, M.J. Wainwright, HopSkipJumpAttack: A Query-Efficient Decision-Based Attack, arXiv e-prints, p. arXiv:1904.02144, Apr2019..

[17]C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, A.L. Yuille, Improving transferability of adversarial examples with input diversity, in: Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2730–2739.

[18] J. Zhang, X. Jiang, Adversarial examples: Opportunities and challenges, arXiv preprint arXiv:1809.04790, 2018..

[19] A. Kurakin, I. Goodfellow, S. Bengio, Adversarial examples in the physical world, arXiv preprint arXiv:1607.02533, 2016..

[20]O. Poursaeed, I. Katsman, B. Gao, S. Belongie, Generative adversarial perturbations, in: Proceedings of the IEEE Conference on Computer Vision andPattern Recognition, 2018, pp. 4422–4431.

[21] C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, D. Song, Generating adversarial examples with adversarial networks, arXiv preprint arXiv:1801.02610, 2018..

[22]N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z.B. Celik, A. Swami, The limitations of deep learning in adversarial settings, in: 2016 IEEE EuropeanSymposium on Security and Privacy (EuroS&P), IEEE, 2016, pp. 372–387.

[23] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in neuralinformation processing systems, 2014, pp. 2672–2680..

[24] S. Shen, G. Jin, K. Gao, Y. Zhang, Ape-gan: Adversarial perturbation elimination with gan, arXiv preprint arXiv:1707.05474, 2017..

[25] P. Samangouei, M. Kabkab, R. Chellappa, Defense-gan: Protecting classifiers against adversarial attacks using generative models, arXiv preprintarXiv:1805.06605, 2018..

[26]J. Hayes, G. Danezis, Learning universal adversarial perturbations with generative models, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE,2018, pp. 43–49.

[27]N. Akhtar, J. Liu, A. Mian, Defense against universal adversarial perturbations, in: Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2018, pp. 3389–3398.

[28]X. Yuan, P. He, Q. Zhu, X. Li, Adversarial examples: attacks and defenses for deep learning, IEEE Trans. Neural Networks Learn. Syst. (2019).

[29] A.S. Suggala, A. Prasad, V. Nagarajan, P. Ravikumar, Revisiting adversarial risk, arXiv preprint arXiv:1806.02924, 2018..

[30]J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in: European conference on computer vision,Springer,2016, pp. 694–711.

[31]J.-Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of theIEEEinternational conference on computer vision, 2017, pp. 2223–2232.

[32] A. Krizhevsky, G. Hinton et al., Learning multiple layers of features from tiny images, Citeseer, Tech. Rep., 2009..

[33] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, 2014..

[34]K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and patternrecognition, 2016, pp. 770–778.

[35]G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE conference on computervision and pattern recognition, 2017, pp. 4700–4708

Previous抵御普遍的对抗性干扰

Last updated 4 years ago

Was this helpful?

原文连接：

abstract

摘要

1.Introduction

2. Related work

2.1. Image-dependent perturbations

2.2. Universal perturbations

3. Preliminaries

3.1. Adversarial perturbations

3.2. Adversarial transferability

3.3 Loss network

3.4 ResNet generator

4. The proposed method

4.1 Design concept

4.2 Problem definition

4.3 UAP-RN framework

5. Experimental results

5.1 Non-targeted attack

Table 1

5.1 一道攻击

表1

5.2 Targeted attack

5.2 针对性攻击

5.3 Transferability

5.3 可转移性

6. Conclusion

6.接论

CRediT authorship contribution statement

信用作者贡献声明

Declaration of Competing Interest

竞争利益声明

Acknowledgment

致谢

References