对抗样本:机遇与挑战

Adversarial Examples: Opportunities and Challenges

原文链接:

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8842604

GB/T 7714 Zhang J, Li C. Adversarial examples: Opportunities and challenges[J]. IEEE transactions on neural networks and learning systems, 2019.

MLA Zhang, Jiliang, and Chen Li. "Adversarial examples: Opportunities and challenges." IEEE transactions on neural networks and learning systems (2019).

APA Zhang, J., & Li, C. (2019). Adversarial examples: Opportunities and challenges. IEEE transactions on neural networks and learning systems.

Abstract

摘要

Deep neural networks (DNNs) have shown huge superiority over humans in image recognition, speech processing, autonomous vehicles, and medical diagnosis. However, recent studies indicate that DNNs are vulnerable to adversarial examples (AEs), which are designed by attackers to fool deep learning models. Different from real examples, AEs can mislead the model to predict incorrect outputs while hardly be distinguished by human eyes, therefore threaten security-critical deep-learning applications. In recent years, the generation and defense of AEs have become a research hotspot in the field of artificial intelligence (AI) security. This article reviews the latest research progress of AEs. First, we introduce the concept, cause, characteristics, and evaluation metrics of AEs, then give a survey on the state-of-the-art AE generation methods with the discussion of advantages and disadvantages. After that, we review the existing defenses and discuss their limitations. Finally, future research opportunities and challenges on AEs are prospected.

深度神经网络(DNNs)在图像识别、语音处理、自动驾驶汽车和医学诊断方面显示出了比人类更大的优势。然而,最近的研究表明,DNNs在对抗实例(AEs)面前很脆弱,AEs是攻击者设计用来欺骗深度学习模型的。与实际例子不同的是,AEs会误导模型预测错误的输出,人眼很难分辨,从而威胁到对安全性至关重要的深度学习应用。近年来,AEs的生成和防御已成为人工智能安全领域的一个研究热点。本文综述了AEs的最新研究进展。首先介绍了AEs的概念、产生原因、特点和评价指标,然后对目前最先进的AE产生方法进行了综述,并讨论了其优缺点。之后,我们回顾了现有的防御并讨论了它们的局限性。最后,对AEs未来的研究机会和挑战进行了展望。

Index Terms— Adversarial examples (AEs), artificial intelligence (AI), deep neural networks (DNNs).

索引术语——对抗样本(AEs)、人工智能(AI)、深度神经网络(DNNs)。

I. INTRODUCTION

I. 介绍

In recent years, deep neural networks (DNNs) have shown great advantages in autonomous vehicles, robotics, network security, image/speech recognition, and natural language processing. For example, in 2017, an intelligent robot with the superior face recognition ability, named XiaoDu developed by Baidu, defeated a representative from the team of humans strongest brain with the score of 3:2 [1]. On October 19, 2017, AlphaGo Zero released by the DeepMind team of Google shocked the world. Compared with the previous AlphaGo, AlphaGo Zero relies on reinforcement learning without any prior knowledge to grow chess skills and, finally, defeats every human competitor [2].

近年来,深度神经网络(DNNs)在自动驾驶汽车、机器人、网络安全、图像/语音识别和自然语言处理等领域显示出了巨大的优势。例如,2017年,百度研发的具有超强人脸识别能力的智能机器人“小度”以3:2的比分击败了人类“最强大脑”团队的一位代表[1]。2017年10月19日,谷歌DeepMind团队发布的AlphaGo Zero震惊了世界。与之前的AlphaGo相比,AlphaGo Zero不依赖任何先验知识,仅通过强化学习来提高棋艺,最终击败了所有人类对手[2]。

For artificial intelligence (AI) research, the United States received huge support from the government, such as the Federal Research Fund. In October 2016, the United States issued the projects of Preparing for the Future of Artificial Intelligence and the National Artificial Intelligence Research and Development Strategic Plan, which raised AI to the national strategic level and formulated ambitious blueprints [3], [4]. In 2017, China issued the New Generation Artificial Intelligence Development Plan, which mentioned that the scale of the AI core industries would exceed 150 billion CNY by 2020, promoting the development of related industries to enlarge their scale to more than 1 trillion CNY. In the same year, AI was written into the nineteenth National Congress report, which pushed the development of AI industries to a new height and filled the gap in the top-level strategy of AI development [5].

在人工智能(AI)研究方面,美国得到了政府的大力支持,比如联邦研究基金。2016年10月,美国发布了《为人工智能的未来做准备项目》和《国家人工智能研发战略规划》,将人工智能提升到国家战略水平,并制定了雄心勃勃的蓝图[3]、[4]。2017年,中国发布了《新一代人工智能发展规划》,提出到2020年,人工智能核心产业规模将超过1500亿元,推动相关产业发展规模扩大到1万亿元以上。同年,人工智能被写入第十九次全国代表大会报告,将人工智能产业的发展推向了一个新的高度,填补了人工智能发展[5]顶层战略的空白。

In the early stage of AI, people paid more attention to the basic theory and application research. With the rapid development of AI, security issues have attracted great attention. For example, at the Shenzhen Hi-tech Fair on November 16, 2016, a robot named Chubby suddenly broke down, hit the booth glass without any instructions, and injured a pedestrian, which was the world’s first robot injury incident [6]. In July 2016, a crime-fighting robot, Knightscope, manufactured by Silicon Valley Robotics, knocked down and injured a 16-month-old boy at the Silicon Valley shopping center [7]. At 10 P.M. on March 22, 2018, an Uber autonomous test vehicle hit a 49-year-old woman named Elaine Herzberg in the suburbs of Tempe, Arizona; she died after treatment at the hospital failed. This is the first fatal autonomous vehicle accident in the world [8].

在人工智能发展的早期,人们更加注重基础理论和应用研究。随着人工智能的快速发展,安全问题引起了人们的高度关注。例如,在2016年11月16日的深圳高交会上,一个名叫Chubby的机器人在没有任何指令的情况下突然发生故障,撞上展台玻璃并砸伤一名行人,这是世界上第一起机器人伤人事件[6]。2016年7月,硅谷机器人公司(Silicon Valley Robotics)制造的打击犯罪机器人Knightscope在硅谷购物中心撞倒并撞伤了一名16个月大的男孩[7]。2018年3月22日晚10点,一辆优步自动驾驶测试车在亚利桑那州坦佩市郊区撞上了49岁的伊莱恩·赫茨伯格,她被送往医院后经抢救无效死亡。这是世界上第一起自动驾驶汽车致死事故[8]。

In the past decade, various attacks on AI systems have emerged [9]–[14], [16]–[18]. At the training stage, poisoning attacks [12]–[14] can damage the original probability distribution of training data by injecting malicious examples to reduce the prediction accuracy of the model. At the test or inference stage, evasion attacks [15]–[18] can trick a target system by constructing a specific input example without changing the target machine learning (ML) system. Lowd and Meek [19] proposed the concept of adversarial learning, in which an adversary conducts an attack to minimize a cost function. Under this framework, they proposed an algorithm to reverse engineer linear classifiers. Barreno et al. [20] presented a taxonomy of different types of attacks on ML systems. In order to mitigate poisoning attacks and evasion attacks, a lot of defenses have been proposed [10], [11], [21]–[23]. Szegedy et al. [24] proposed the concept of adversarial example (AE). By adding a slight perturbation to the input, the model misclassifies the AE with high confidence, while human eyes cannot recognize the difference. Even though different models have different architectures and training data, the same set of AEs can be used to attack related models. AEs have shown a huge threat to DNNs. For example, the classifier may misclassify an adversarial image of the stop traffic sign as a speed limit sign of 45 km/h, resulting in a serious traffic accident [25]. In the image captioning system, an image is used as input to generate some captions to describe the image, which is perturbed by attackers to generate some image-independent, completely opposite or even malicious captions [26]. In the malware detection, the ML-based visualization malware detectors are vulnerable to AE attacks, where a malicious malware may be classified as a benign one by adding a slight perturbation on the transformed grayscale images [9].

在过去的十年里,出现了各种针对AI系统的攻击[9]–[14]、[16]–[18]。在训练阶段,投毒攻击[12]–[14]通过注入恶意样本破坏训练数据的原始概率分布,降低模型的预测精度。在测试(推理)阶段,逃避攻击[15]–[18]可以在不改变目标机器学习(ML)系统的情况下,通过构造特定的输入样本来欺骗目标系统。Lowd和Meek[19]提出了对抗学习的概念,即对手通过实施攻击使某个代价函数最小化。在此框架下,他们提出了一种对线性分类器进行逆向工程的算法。Barreno等人[20]给出了针对ML系统的不同攻击类型的分类。为了缓解投毒攻击和逃避攻击,人们提出了很多防御方法[10]、[11]、[21]–[23]。Szegedy等人[24]提出了对抗样本(AE)的概念:通过在输入中添加轻微的扰动,模型会以高置信度对AE进行误分类,而人眼无法识别其中的差异。即使不同的模型有不同的体系结构和训练数据,同一组AEs也可以用来攻击相关的模型。AEs已经对DNNs显示出巨大的威胁。例如,分类器可能将一个对抗性的停车交通标志图像误分类为45 km/h的限速标志,导致严重的交通事故[25]。在图像描述系统中,以一幅图像作为输入生成描述该图像的文字说明,攻击者对图像进行扰动后,会生成一些与图像无关、完全相反甚至恶意的描述[26]。在恶意软件检测中,基于ML的可视化恶意软件检测器容易受到AE攻击:通过对转换后的灰度图像添加轻微扰动,恶意软件可能被归类为良性软件[9]。

In recent years, many AE construction methods and defense techniques have been proposed. This survey elaborates on the related research and development status of AE on DNNs since it was proposed in [24]. The overall framework is shown in Fig. 1.

近年来,人们提出了许多AE构造方法和防御技术。本综述阐述了自文献[24]提出AE以来,AE在DNNs上的相关研究和发展现状。总体框架如图1所示。

II. PRELIMINARIES

II. 预备知识

A. Neural Network

A. 神经网络

Artificial neural network (ANN) simulates the human brain’s nervous system to process complex information and consists of many interconnected neurons. Each neuron represents a specific output function called the activation function. The connection between the two neurons represents the weight of the signal. The neural network connects many single neurons together by weights to simulate the human brain to process information.

人工神经网络是一种模拟人脑神经系统处理复杂信息的网络,由许多相互连接的神经元组成。每个神经元代表一个被称为激活函数的特定输出函数。两个神经元之间的连接代表了信号的权重。该神经网络通过权重将多个单个神经元连接在一起,模拟人脑处理信息的过程。

As shown in Fig. 2, a three-layer neural network is composed of an input layer L1, a hidden layer L2, and an output layer L3, where the circle represents the neuron of the neural network; the circle labeled “+1” represents the bias unit; the circles labeled “X1,” “X2,” and “X3” are the inputs. Neurons in different layers are connected by weights W. We use $a_{i}^{(l)}$ to represent the activation value (output value) of the $i$th unit in the $l$th layer; when $l = 1$, $a_{i}^{(1)} = X_{i}$. With the given inputs and weights, the function output $h_{(W,b)}(X)$ can be calculated. The specific steps are as follows:

如图2所示,三层神经网络由输入层L1、隐含层L2和输出层L3组成,其中圆形表示神经网络的神经元;标记为“+1”的圆表示偏置单元;标记为“X1”、“X2”、“X3”的圆是输入。不同层的神经元之间通过权重W连接。我们用 $a_{i}^{(l)}$ 表示第l层第i个单元的激活值(输出值),当l = 1时, $a_{i}^{(1)} = X_{i}$ 。给定输入和权重,可计算出函数输出 $h_{(W,b)}(X)$ 。具体步骤如下:

The above-mentioned calculation process is called forward propagation (FP), which is a transfer process of input information through the hidden layer to the output layer. The activation function rectified linear unit (ReLU): $f(X) = \max\{0, X\}$ is used to nonlinearize the neural network between different hidden layers. When the ML task is a binary classification, the final output layer uses the activation function sigmoid: $f(X)=1/(1 + e^{-X})$. When the ML task is a multi-class problem, the final output layer uses the activation function softmax: $f(X)_{k}=e^{X_{k}}/\sum_{i=1}^{N}e^{X_{i}},\ k=1,2,\ldots,N$. In the training process, the weights W and the bias b connecting the neurons in different layers are determined by back propagation.

上述计算过程称为前向传播(forward propagation, FP),即输入信息通过隐含层传递到输出层的过程。不同隐层之间使用激活函数修正线性单元(ReLU): $f(X) = max\{0, X\}$ 对神经网络进行非线性化。当ML任务是二分类时,最终输出层使用激活函数sigmoid: $f(X)=1/(1 + e^{-X})$ 。当ML任务是多分类问题时,最终输出层使用激活函数softmax: $f(X)_{k}=e^{X_{k}}/\sum_{i=1}^{N}e^{X_{i}},\ k=1,2,\ldots,N$ 。在训练过程中,通过反向传播来确定连接各层神经元的权重W和偏置b。
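下面用一段NumPy示意代码(非论文原文内容,权重与偏置均为随机假设值,函数名为自拟)实现上述ReLU、sigmoid、softmax激活函数,以及图2这类三层网络的一次前向传播,帮助读者把公式和计算过程对应起来。

```python
import numpy as np

def relu(x):             # f(X) = max{0, X}
    return np.maximum(0, x)

def sigmoid(x):          # f(X) = 1 / (1 + e^{-X})
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):          # f(X)_k = e^{X_k} / sum_i e^{X_i}
    e = np.exp(x - np.max(x))        # 减去最大值,保证数值稳定
    return e / e.sum()

def forward(X, W1, b1, W2, b2):
    """三层网络的前向传播:输入层 -> 隐藏层(ReLU) -> 输出层(sigmoid,二分类)。"""
    a2 = relu(W1 @ X + b1)           # 隐藏层激活值 a^{(2)}
    return sigmoid(W2 @ a2 + b2)     # 网络输出 h_{(W,b)}(X)

rng = np.random.default_rng(0)
X = rng.random(3)                                    # 输入 X1、X2、X3
W1, b1 = rng.random((3, 3)), rng.random(3)           # 第一层权重与偏置
W2, b2 = rng.random((1, 3)), rng.random(1)           # 第二层权重与偏置
print(forward(X, W1, b1, W2, b2))
```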

Neural networks belong to a cross-disciplinary research field combining computer, probability, statistics, and brain science. They focus on how to enable computers to simulate and implement human learning behaviors, so as to achieve better automatic knowledge acquisition. However, recent studies show that neural networks are particularly vulnerable to AEs, which are generated by adding small perturbations to the inputs. In what follows, we will discuss the AEs in detail.

神经网络属于计算机、概率、统计和脑科学相结合的交叉学科研究领域,其重点是如何让计算机模拟和实现人类的学习行为,从而实现更好的自动知识获取。然而,最近的研究表明,神经网络特别容易受到AEs的影响,而AEs是通过在输入中添加微小扰动生成的。接下来,我们将详细讨论AEs。

III. ADVERSARIAL EXAMPLES

III.对抗样本

Szegedy et al. [24] proposed AEs to fool DNNs. Adding a subtle perturbation to the input of the neural network will produce an erroneous output with high confidence, while human eyes cannot recognize the difference. Suppose that there are an ML model M and an original example C, which can be correctly classified by the model, i.e., $M(C) = y_{true}$, where $y_{true}$ is the true label of C. However, it is possible to construct an AE $C^{\prime}$ which is perceptually indistinguishable from C but is classified incorrectly, i.e., $M(C^{\prime}) \neq y_{true}$ [24]. A typical example is shown in Fig. 3: the model considers the original image to be a “panda” (57.7%). After adding a slight perturbation to the original image, it is classified as a “gibbon” by the same model with 99.3% confidence, while the human eyes cannot completely distinguish the differences between the original image and the adversarial image [27].

Szegedy等人[24]提出了AEs来愚弄DNNs:在神经网络的输入中加入一个细微的扰动,模型就会以高置信度产生错误输出,而人眼无法识别其中的差异。假设有一个ML模型M和一个原始样本C,模型能够对其正确分类,即 $M(C) = y_{true}$ ,其中 $y_{true}$ 是C的真实标签。然而,可以构造一个在感知上与C无法区分、却被错误分类的AE $C^{\prime}$ ,即 $M(C^{\prime}) \neq y_{true}$ [24]。典型例子如图3所示:模型认为原始图像是“熊猫”(置信度57.7%);在对原始图像加入轻微扰动后,同一模型以99.3%的置信度将其归类为“长臂猿”,而人眼无法完全区分原始图像和对抗图像之间的差异[27]。

In order to facilitate the reader to understand AEs intuitively, we use the neural network model in Fig. 2 as an example to show the change of the outputs by perturbing the inputs. As shown in Fig. 4, W(1) and W(2) are the weight matrices. After adding a small perturbation sign (0.5) to the original inputs, the adversarial inputs X1X_{1}^{\prime} , X2X_{2}^{\prime} , and X3X_{3}^{\prime} are equal to 1.5. Then, through the first layer’s weight matrix W(1) and the transform operation of the activation function ReLU, the output values a1(2)a_{1}^{\prime(2)} , a2(2)a_{2}^{\prime(2)} , and a3(2)a_{3}^{\prime(2)} are equal to 1.5. Finally, after passing the second layer’s weight matrix W(2) and the activation function sigmoid transform operation, the probability of the output is changed from 0.2689 to 0.8176, which makes the model misclassify the image with high confidence. With the increase of the model depth, the probability of the output changes more obviously.

为了便于读者直观地理解AEs,我们以图2中的神经网络模型为例,展示对输入加扰动后输出的变化情况。如图4所示,W(1)和W(2)为权重矩阵。在原始输入上加上一个小的扰动项(0.5)后,对抗输入 $X_{1}^{\prime}$ 、 $X_{2}^{\prime}$ 、 $X_{3}^{\prime}$ 等于1.5。然后,经过第一层的权重矩阵W(1)和激活函数ReLU的变换运算,输出值 $a_{1}^{\prime(2)}$ 、 $a_{2}^{\prime(2)}$ 和 $a_{3}^{\prime(2)}$ 等于1.5。最后,经过第二层的权重矩阵W(2)和激活函数sigmoid的变换运算后,输出概率从0.2689变为0.8176,使模型以高置信度对图像进行误分类。随着模型深度的增加,输出概率的变化会更加明显。
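下面的示意代码(权重为随机假设值,并非图4中的具体数值)粗略复现这一现象:沿输出对输入的梯度符号方向给输入加上0.5的扰动后,同一网络的sigmoid输出概率会发生明显变化。

```python
import numpy as np

def relu(x): return np.maximum(0, x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W1 = rng.uniform(-1, 1, (3, 3))           # 第一层权重(假设值)
W2 = rng.uniform(-1, 1, (1, 3))           # 第二层权重(假设值)

def forward(X):
    return sigmoid(W2 @ relu(W1 @ X))[0]  # 输出概率

X = rng.uniform(0, 1, 3)                  # 原始输入
eps = 0.5                                 # 对应文中0.5的扰动幅度
mask = (W1 @ X > 0).astype(float)         # ReLU的导数
grad = (W2 * mask) @ W1                   # 输出对输入的梯度方向(忽略sigmoid的正标量因子)
X_adv = X + eps * np.sign(grad.ravel())   # 沿梯度符号方向加扰动
print(forward(X), forward(X_adv))         # 扰动前后的输出概率对比
```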

A. Cause of Adversarial Examples

A.对抗性例子的原因

AE is a serious vulnerability in deep learning systems and cannot be ignored in security-critical AI applications. However, in current research, there are no well-recognized explanations on why the AEs can be constructed. Analyzing the cause of AEs can help researchers to fix the vulnerability effectively. The reason may be overfitting or insufficient regularization of the model, which leads to insufficient generalization ability that learning models predict unknown data. However, by adding perturbations to a regularized model, Goodfellow et al. [27] found that the effectiveness against AEs was not improved significantly. Other researchers [28] suspected that AEs arose from extreme nonlinearity of DNNs. However, if the input dimensions of a linear model are high enough, AEs can also be constructed successfully with high confidence by adding small perturbations to the inputs.

AE是深度学习系统中的一个严重漏洞,在安全关键型人工智能应用中不可忽视。然而,在目前的研究中,对于为什么能够构造出AEs还没有公认的解释。分析AEs产生的原因可以帮助研究者有效地修复该漏洞。原因可能是模型过拟合或正则化不足,导致学习模型预测未知数据的泛化能力不足。然而,Goodfellow等人[27]发现,对正则化后的模型添加扰动,其抵抗AEs的有效性并没有显著提高。其他研究者[28]怀疑AEs源于DNNs的极端非线性。然而,如果线性模型的输入维数足够高,通过在输入中添加小扰动,同样可以以高置信度成功构造AEs。

Goodfellow et al. [27] believed that the reason for generating AEs is the linear behavior in high-dimensional space. In the high-dimensional linear classifier, each individual input feature is normalized. For one dimension of each input, small perturbations will not change the overall prediction of the classifier. However, small perturbations to all dimensions of the inputs will lead to an effective change of the output.

Goodfellow等[27]认为产生AEs的原因是高维空间中的线性行为。在高维线性分类器中,每个单独的输入特征都被归一化。对于每个输入的一维,小扰动不会改变分类器的整体预测。然而,对输入的所有维度的小扰动将导致输出的有效变化。

As shown in Fig. 5, the score of class “1” is improved from 5% to 88% by adding or subtracting 0.5 to each dimension of the original example X in a particular direction. It demonstrates that linear models are vulnerable to AEs and refutes the hypothesis that the existence of AEs is due to the high nonlinearization of the model. Therefore, the existence of high-dimensional linear space may be the cause of AEs.

如图5所示,通过对原始样本X的每个维度沿特定方向加上或减去0.5,类别“1”的分数从5%提高到88%。这表明线性模型易受AEs的攻击,并驳斥了“AEs的存在是由于模型高度非线性化”的假设。因此,高维线性空间的存在可能才是AEs产生的原因。
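这一“高维线性”解释可以用下面的小实验(示意性质,权重与样本均为随机假设值)直观验证:对每一维仅加上 ε·sign(w),线性得分 w·x 的变化约为 ε·‖w‖₁,随维度线性累积,而单个维度的改动始终很小。

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.5
for d in (10, 100, 1000):             # 输入维度
    w = rng.uniform(-1, 1, d)         # 线性分类器权重
    x = rng.uniform(0, 1, d)          # 原始样本
    x_adv = x + eps * np.sign(w)      # 每一维只改变0.5
    # 线性得分的变化恰为 eps * ||w||_1,随维度增大而增大
    print(d, round(w @ x, 2), round(w @ x_adv, 2), round(eps * np.abs(w).sum(), 2))
```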

B. Characteristics of Adversarial Examples

B.对抗性例子的特点

In general, AEs have the following three characteristics.

一般来说,AEs有以下三个特点。

1) Transferability: AEs are not limited to attack a specific neural network. It is unnecessary to obtain architecture and parameters of the model when constructing AEs, as long as the model is trained to perform the same task. AEs generated from one model M1 can fool a different model M2 with a similar probability. Therefore, an attacker can use AEs to attack the models that perform the same task, which means that an attacker can construct AEs in the known ML model and then attack related unknown models [29].

1)可转移性:AEs并不局限于攻击特定的神经网络。在构造AEs时无需获取模型的架构和参数,只要训练模型执行相同的任务即可。从一个模型M1生成的AEs可以以类似的概率欺骗另一个模型M2。因此,攻击者可以使用AEs攻击执行相同任务的模型,这意味着攻击者可以在已知的ML模型中构造AEs,然后攻击相关的未知模型[29]。

2) Regularization Effect: Adversarial training [27] can reveal the defects of models and improve the robustness. However, compared to other regularization methods, the cost of constructing a large number of AEs for adversarial training is expensive. Unless researchers can find shortcuts for constructing AEs in the future, they are more likely to use dropout [30] or weight decay (L2 regularization).

2)正则化效果:对抗训练[27]可以揭示模型的缺陷,提高鲁棒性。然而,与其他正则化方法相比,构造大量AEs进行对抗训练的代价是高昂的。除非研究人员将来能够找到构造AEs的捷径,否则他们更可能使用dropout[30]或权重衰减(L2正则化)。

3) Adversarial Instability: In the physical world, it is easy to lose the adversarial ability for AEs after physical transformations such as translation, rotation, and lighting. In this case, AEs will be correctly classified by the model. This instability characteristic challenges attackers to construct robust AEs and creates the difficulty of deploying AEs in the real world.

3)对抗不稳定性:在物理世界中,AEs在经过平移、旋转、光照等物理变换后,很容易丧失对抗能力。在这种情况下,AEs会被模型正确分类。这种不稳定性使得攻击者很难构造鲁棒的AEs,也给在现实世界中部署AEs带来了困难。

C. Evaluation Metrics

C. 评价指标

1) Success Rate: When constructing AEs, the success rate is the most direct and effective evaluation criterion. In general, the success rate to generate AEs is inversely proportional to the magnitude of perturbations. For example, the fast gradient sign method (FGSM) [27] requires a large perturbation and is prone to label leaking [32] that the model correctly classifies the AE generated with the true label and misclassifies the AE created with the false label. Therefore, the success rate is much lower than the iterative method [33] with the lower perturbation and the Jacobian-based saliency map attack (JSMA) method [31] with the specific perturbation. Generally, it is difficult to construct AEs with 100% success rate.

1)成功率:在构造AEs时,成功率是最直接、最有效的评价标准。一般来说,生成AEs的成功率与扰动的大小成反比。例如,快速梯度符号法(FGSM)[27]需要较大的扰动,并且容易出现标签泄漏[32]:模型会正确分类用真实标签生成的AE,而误分类用错误标签生成的AE。因此,其成功率远低于扰动更小的迭代方法[33]和采用特定扰动的基于雅可比的显著性图攻击(JSMA)方法[31]。一般情况下,很难构造出成功率为100%的AEs。

2) Robustness: The robustness of ML models is related to the classification accuracy [34], [35]. Better ML models are less vulnerable to AEs. Robustness is a metric to evaluate the resilience of DNNs to AEs. In general, a robust DNN model has the following two features [36], [37].

2)鲁棒性:ML模型的鲁棒性与分类精度[34]、[35]有关。更好的ML模型不易受到AEs的攻击。鲁棒性是评价DNNs对AEs恢复能力的指标。一般来说,稳健的DNN模型具有以下两个特征[36],[37]。

Fig. 4. Perturbation to the original inputs for a three-layer neural network (Fig. 2). The value of the original inputs X1, X2 and X3 and the weights W(1) and W(2) are initialized randomly.

图4所示。对三层神经网络的原始输入进行扰动(图2)。对原始输入X1、X2和X3的值以及权重W(1)和W(2)进行随机初始化。

  1. The model has high accuracy both inside and outside of the data set.

    1. 该模型在数据集内外均具有较高的精度。

  2. The classifier of a smoothing model can classify inputs consistently near a given example.

    2. 平滑模型的分类器可以对接近给定实例的输入进行一致的分类。

We first define the robustness of a classifier $f(X)$ to adversarial perturbations in the input space $\mathbb{R}^{d}$. Given an input $X \in \mathbb{R}^{d}$ drawn from μ (the probability measure of the data points that we wish to classify), $\Delta_{adv}(X, F)$ denotes the norm of the smallest perturbation that makes the model misclassify

我们首先定义分类器 $f(X)$ 对输入空间 $\mathbb{R}^{d}$ 中对抗扰动的鲁棒性。给定一个从μ(我们希望分类的数据点的概率测度)中抽取的输入 $X \in \mathbb{R}^{d}$ , $\Delta_{adv}(X, F)$ 表示使模型发生误分类的最小扰动的范数

where the perturbation r aims to flip the label of X, corresponding to the minimal distance from X to the decision boundary of the classifier.

其中扰动r的目的是翻转X的标签,对应于从X到分类器决策边界的最小距离。

The robustness of a DNN model F to adversarial perturbations is defined as the average of Δadv(X,F)Δadv(X, F) over all X, and the corresponding expression is

DNN模型F对对抗扰动的鲁棒性定义为 $\Delta_{adv}(X, F)$ 在所有X上的平均值,相应的表达式为
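原文此处的两个公式在转载中丢失。根据上下文描述(“使模型误分类的最小扰动的范数”及其在所有X上的平均),可以示意性地重构如下(记号以原文为准):

$$\Delta_{adv}(X, F) = \min_{r} \lVert r \rVert_{2} \quad \text{s.t.}\ F(X + r) \neq F(X)$$

$$\rho_{adv}(F) = \mathbb{E}_{X \sim \mu}\big[\Delta_{adv}(X, F)\big]$$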

As shown in Fig. 6, the outputs of the classifier are constant inside the circle with a radius of Δadv(X,F)Δadv(X, F) . However, the classified results of all samples XX^{∗} outside the circle are different from X. Therefore, the magnitude of the perturbation Δadv(X,F)Δadv(X, F) is proportional to the robustness of the model, i.e., the higher the minimum perturbation needed to misclassify the example, the stronger the robustness of the DNN is.

如图6所示,在以 $\Delta_{adv}(X, F)$ 为半径的圆内,分类器的输出保持不变;而圆外所有样本 $X^{∗}$ 的分类结果都与X不同。因此,扰动幅度 $\Delta_{adv}(X, F)$ 与模型的鲁棒性成正比,即使样本被误分类所需的最小扰动越大,DNN的鲁棒性就越强。

3) Transferability: AEs generated for one ML model can be used to misclassify another model even if both models have different architectures and training data. This property is called transferability. AEs can be transferred among different models because a contiguous subspace with a large dimension in the adversarial space is shared among different models [38]. This transferability provides a tremendous advantage for AEs because attackers only need to train alternative models to construct AEs and deploy them to attack the target model.

3)可转移性:一个ML模型生成的AEs可以使另一个模型发生误分类,即使这两个模型有不同的架构和训练数据。这个特性称为可转移性。由于对抗空间中一个维数较大的连续子空间在不同模型之间是共享的[38],因此AEs可以在不同模型之间转移。这种可转移性为AEs提供了巨大的优势,因为攻击者只需要训练替代模型来构造AEs,并部署它们来攻击目标模型。

Fig. 6. Visualizing the metric of robustness. This 2-D representation illustrates the metric as the radius of the disk at the original example X and going through the closest AE X∗ constructed from X [63].

图6所示。鲁棒性度量的可视化。该二维示意图将该度量表示为以原始样本X为中心的圆盘半径,圆盘经过由X构造的最近的AE $X^{∗}$ [63]。

The transferability of AEs can be measured by the transfer rate, i.e., the ratio of the number of transferred AEs to the total number of AEs constructed by the original model. In the nontargeted attack, the percentage of the number of AEs generated by one model that are correctly classified by another model is used to measure the nontargeted transferability. It is called the accuracy rate. A lower accuracy rate means a better nontargeted transfer rate. In the targeted attack, the percentage of the AEs generated by one model that can be classified by another model as the target label is used to measure the targeted transferability. It is referred to as the matching rate, and a higher matching rate means a better targeted transfer rate [39].

AEs的可转移性可以通过传输速率来度量,即传输的AEs数量与原始模型所构建的AEs总数的比值。在非目标攻击中,一个模型生成的AEs数目的百分比被另一个模型正确地分类,用来度量非目标可转移性。它被称为准确率。较低的准确率意味着较好的非目标传输率。在目标攻击中,一个模型生成的AEs可以被另一个模型分类为目标标签,它所占的百分比用来度量目标可转移性。它被称为匹配率,而更高的匹配率意味着更好的目标传输率[39]。

The transfer rate of AEs depends on two factors. One is the model-related parameters, including the model architecture, model capacity, and test accuracy. The transfer rate of AEs is high among models with similar architecture, low model capacity (the number of model parameters), and high test accuracy [40]. Another factor is the magnitude of the adversarial perturbation. Within a certain perturbation range, the transfer rate of AEs is proportional to the magnitude of adversarial perturbations, i.e., the greater perturbations to the original example, the higher transfer rate of the constructed AE. The minimum perturbation required for different methods of constructing AEs is different.

AEs的转移率取决于两个因素。一个是与模型相关的参数,包括模型架构、模型容量和测试精度。在架构相似、模型容量(模型参数数量)较低、测试精度较高的模型之间,AEs的转移率较高[40]。另一个因素是对抗扰动的大小。在一定的扰动范围内,AEs的转移率与对抗扰动的大小成正比,即对原始样本的扰动越大,所构造AE的转移率就越高。不同的AE构造方法所需的最小扰动是不同的。

4) Perturbations: Too small perturbations on the original examples are difficult to construct AEs, while too large perturbations are easily distinguished by human eyes. Therefore, perturbations need to achieve a balance between constructing AEs and the human visual system. For example, it is difficult to control the perturbation for FGSM [27], which incurs label leaking easily. To address the issue, Kurakin and Goodfellow [33] proposed an optimized FGSM based on the iterative method, which can control the perturbation within a threshold range. Hence, the success rate of constructing AEs is improved significantly. However, the transfer rate of such AEs is low. Later on, a saliency map-based method [31] is proposed to improve the transfer rate. The key steps include: 1) direction sensitivity estimation: evaluate the sensitivity of each class for each input feature and 2) perturbation selection: use the sensitivity information to select a minimum perturbation δ among the input dimension, which is most likely to misclassify the model.

4)扰动:对原始样本的扰动太小很难构造AEs,而扰动太大则容易被人眼分辨。因此,扰动需要在构造AEs和人类视觉系统之间取得平衡。例如,FGSM[27]的扰动难以控制,容易造成标签泄漏。为了解决这一问题,Kurakin和Goodfellow[33]提出了一种基于迭代法的优化FGSM,可以将扰动控制在阈值范围内,从而大大提高了构造AEs的成功率。然而,这种AEs的转移率很低。随后,人们提出了一种基于显著性图的方法[31]来提高转移率。关键步骤包括:1)方向敏感性估计:评估每个输入特征对各个类别的敏感性;2)扰动选择:利用敏感性信息在输入维度中选择最可能使模型误分类的最小扰动δ。

In general, L2-norm is used to measure the perturbation of the AE and is defined as

通常,L2范数被用来度量AE的扰动,其定义为
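原文此处的公式缺失。按上下文描述(n维向量之间的L2范数距离),其形式应为(示意性重构):

$$d(X', X) = \lVert X' - X \rVert_{2} = \Big(\sum_{i=1}^{n}\big(X'_{i} - X_{i}\big)^{2}\Big)^{1/2}$$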

where the n-dimensional vectors $X$ and $X^{\prime}$ represent the original example and the AE, respectively, and $d(X^{\prime},X)$ is the distance metric between $X$ and $X^{\prime}$. The larger $d(X^{\prime},X)$ is, the greater the perturbation needed to construct the AE.

其中n维向量 $X$ 和 $X^{\prime}$ 分别代表原始样本和AE, $d(X^{\prime},X)$ 是 $X$ 和 $X^{\prime}$ 之间的距离度量。 $d(X^{\prime},X)$ 越大,构造AE所需的扰动就越大。

5) Perceptual Adversarial Similarity Score: AEs are visually recognized by humans as correct classes while being misclassified by the models. Since the human visual system is sensitive to structural changes, Wang et al. [41] proposed the structural similarity (SSIM) index as a metric to measure the similarity between two images. Luminance and contrast associated with the object structure are defined as the structure information of the image.

5)感知对抗相似度评分:AEs在视觉上被人类识别为正确的类别,却被模型误分类。由于人类视觉系统对结构变化非常敏感,Wang等人[41]提出了结构相似度(SSIM)指数,作为衡量两幅图像之间相似度的度量指标。与目标结构相关的亮度和对比度被定义为图像的结构信息。

The structure of the SSIM measurement system is shown in Fig. 7. For two different aligned images X1 and X2, the SSIM measurement system consists of three comparisons: luminance, contrast, and structure. First, the luminance of each image is compared. Second, the standard deviation (the square root of variance) is used as an estimate of the contrast of each image. Third, the image is normalized by its own standard deviation as an estimate of the structure comparison. Finally, the three components are combined to produce an overall similarity measure. Therefore, the SSIM between image signal X and image signal Y can be modeled as

SSIM测量系统结构如图7所示。对于两幅不同对齐的图像X1和X2, SSIM测量系统包括三个比较:亮度、对比度和结构。首先,比较各图像的亮度。其次,使用标准差(方差的平方根)来估计每幅图像的对比度。第三,利用图像自身的标准差对图像进行归一化,作为结构比较的估计。最后,这三个组成部分结合起来产生一个整体的相似性度量。因此,图像信号X与图像信号Y之间的SSIM可以建模为

where m is the number of pixels; L, C, and S are the luminance, contrast, and structure of the image, respectively; hyperparameters α, β, and γ are used to weight the relative importance of L, C, and S, respectively; the default setting is α = β = γ = 1.

其中m为像素的个数;L、C、S分别为图像的亮度、对比度和结构;超参数α、β和γ分别用于加权L、C、S的相对重要性;默认设置为α = β = γ = 1。

Based on the SSIM measurement system, the perceptual adversarial similarity score (PASS) is proposed to quantify human perception of AEs [42]. The PASS between X and $X^{\prime}$ is defined as

基于SSIM测量系统,人们提出了感知对抗相似度评分(PASS)来量化人类对AEs的感知[42]。X和 $X^{\prime}$ 之间的PASS定义为

where $\psi(X^{\prime}, X)$ represents the homography transform (a mapping from one plane to another) from the original image X to the adversarial image $X^{\prime}$. PASS can quantify the AEs by measuring the similarity of the original image and the adversarial image. An appropriate PASS threshold can be set to distinguish the AEs with excessive perturbations. Meanwhile, attackers can also use the PASS threshold to optimize the methods of constructing AEs. Therefore, constructing an AE should satisfy

其中 $\psi(X^{\prime}, X)$ 表示从原始图像X到对抗图像 $X^{\prime}$ 的单应变换(从一个平面到另一个平面的映射)。PASS可以通过测量原始图像和对抗图像的相似度来量化AEs。可以设置适当的PASS阈值来区分扰动过大的AEs。同时,攻击者也可以利用PASS阈值来优化AEs的构造方法。因此,构造AE应满足

where $d(X, X^{\prime})$ is some dissimilarity measure, $f(X^{\prime}) \neq y$ represents that the AE is misclassified by the model, $\theta$ is the PASS threshold set by the attacker, and $PASS(X, X^{\prime}) \geq \theta$ represents that the AE is not recognized by human eyes.

其中 $d(X, X^{\prime})$ 是某种不相似性度量, $f(X^{\prime}) \neq y$ 表示AE被模型误分类,θ是攻击者设定的PASS阈值, $PASS(X, X^{\prime}) \geq \theta$ 表示该AE不会被人眼识别出来。
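按文献[42]的思路,PASS大致可写为 $PASS(X, X') = SSIM(\psi(X', X), X)$ 。下面给出一个基于scikit-image的简化示意(属于假设性的近似:省略了单应变换对齐这一步,直接用SSIM作相似度,函数名 `looks_benign` 为自拟),用来模拟“低于阈值θ的对抗图像会被人眼察觉”这一筛选逻辑。

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def looks_benign(x, x_adv, theta=0.95):
    """粗略判断对抗图像是否"不易被人眼察觉":SSIM >= 阈值theta。
    x、x_adv为取值在[0,1]的灰度图像(二维数组)。"""
    score = ssim(x, x_adv, data_range=1.0)
    return score >= theta, score

# 用法示意:对一张随机"图像"加入小扰动后进行判断
rng = np.random.default_rng(0)
x = rng.random((64, 64))
x_adv = np.clip(x + 0.03 * np.sign(rng.standard_normal((64, 64))), 0, 1)
print(looks_benign(x, x_adv))
```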

D. Adversarial Abilities and Adversarial Goals

D. 对抗的能力和对抗的目标

Adversarial ability is determined by how well attackers understand the model. Threat models in deep learning system are classified into the following types according to the attacker’s abilities.

对抗能力是由攻击者对模型的理解程度决定的。根据攻击者的能力,将深度学习系统中的威胁模型分为以下几种类型。

1) White-Box Attack: Attackers know everything related to trained neural network models, including training data, model parameters, and model architectures.

1)白盒攻击(White-Box Attack):攻击者知道与训练过的神经网络模型有关的一切,包括训练数据、模型参数、模型架构。

2) Gray-Box Attack: Attackers know some model information such as model architectures, learning rate, training data, and training steps, except model parameters. This attack is a byproduct of black-box attack and is not common in practical applications.

2)灰盒攻击:攻击者知道模型架构、学习率、训练数据、训练步骤等模型信息,但不知道模型参数。这种攻击是黑盒攻击的副产品,在实际应用中并不常见。

3) Black-Box Attack: Attackers do not know the architecture and parameters of the ML model, but can interact with the ML system. For example, the outputs can be determined by classifying random test vectors. Attackers utilize the transferability of AE to train an alternative model to construct AEs first, and then use the generated AEs to attack the unknown target model.

3)黑盒攻击:攻击者不知道ML模型的架构和参数,但可以与ML系统交互。例如,可以通过对随机测试向量进行分类来确定输出。攻击者利用AE的可转移性,先训练一个备用模型来构造AE,然后利用生成的AE攻击未知目标模型。

The process of black-box attack is shown in Fig. 8. First, attackers use the known data set to train an alternative model. Then, attackers construct the corresponding AEs through the alternative model. Finally, these AEs can be used to attack the unknown target model due to the transferability. However, in some scenarios, such as ML as a service, it is difficult to obtain the structure and parameters of the target model and training data set. Papernot et al. [29] proposed a practical black-box attack method to generate AEs. The specific process is as follows.

黑盒攻击过程如图8所示。首先,攻击者使用已知的数据集训练替代模型。然后,攻击者通过备选模型构造相应的AEs。最后,这些AEs可用于攻击未知目标模型。但是在一些场景中,比如ML作为服务,很难获得目标模型和训练数据集的结构和参数。Papernot等[29]提出了一种实用的黑盒攻击方法来生成AEs。具体流程如下。

  1. Attackers use the target model as an oracle to construct a synthetic data set, where the inputs are synthetically generated and the outputs are labels observed from the oracle.

    1. 攻击者使用目标模型作为oracle来构建一个合成数据集,其中的输入是综合生成的,输出是通过oracle观察到的标签。

  2. Attackers select an ML algorithm randomly and use the synthetic data set to train a substitute model.

    2. 攻击者随机选择一种ML算法,利用合成的数据集训练替代模型。

  3. Attackers generate AEs through the substitute model.

    3. 攻击者通过替代模型生成AEs。

  4. Based on the transferability, attackers use the generated AEs to attack the unknown target model.

    4. 基于可转移性,攻击者使用生成的AEs攻击未知的目标模型(整个流程的示意代码见下)。
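下面用PyTorch给出上述四个步骤的一个极简示意(并非Papernot等人[29]的完整实现:省略了基于雅可比的数据扩充,模型结构、维度等均为假设值,函数名 `oracle_label` 为自拟):黑盒目标模型只能通过查询标签来访问;用查询结果训练替代模型,再在替代模型上生成AEs去攻击目标模型。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
D = 20                                               # 输入维度(假设)

# 黑盒目标模型:攻击者只能通过查询获得输出标签
target = nn.Sequential(nn.Linear(D, 32), nn.ReLU(), nn.Linear(32, 2))
def oracle_label(x):                                 # 步骤1:把目标模型当作oracle
    with torch.no_grad():
        return target(x).argmax(dim=1)

# 步骤2:用合成数据 + oracle标签训练替代模型
substitute = nn.Sequential(nn.Linear(D, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(substitute.parameters(), lr=1e-2)
x_syn = torch.rand(512, D)                           # 合成数据集
y_syn = oracle_label(x_syn)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(substitute(x_syn), y_syn).backward()
    opt.step()

# 步骤3:在替代模型上用FGSM生成AEs
x = torch.rand(64, D)
y = oracle_label(x)
x_adv = x.clone().requires_grad_(True)
F.cross_entropy(substitute(x_adv), y).backward()
x_adv = (x_adv + 0.3 * x_adv.grad.sign()).clamp(0, 1).detach()

# 步骤4:利用可转移性攻击黑盒目标模型
print("目标模型在AEs上的准确率:", (oracle_label(x_adv) == y).float().mean().item())
```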

The goal of adversarial deep learning is to misclassify the model. According to the different influence of the perturbation on the classifier, we classify the adversarial goals into four types.

对抗性深度学习的目的是对模型进行错误分类。根据扰动对分类器的不同影响,我们将敌对目标分为四类。

  1. Confidence Reduction: Reduce the confidence of output classification.

    1. Confidence Reduction:降低输出分类的置信度。

  2. Nontargeted Misclassification: Alter the output classification to any class which is different from the original class.

    2. 非定向错误分类:将输出分类更改为与原始类不同的任何类。

  3. Targeted Misclassification: Force the output classification to be the specific target class.

    3. 定向错分类:强制输出分类为特定的目标类。

  4. Source/Target Misclassification: Select a specific input to generate a specific target class.

    4. 源/目标错误分类:选择特定的输入来生成特定的目标类。

As shown in Fig. 9, the vertical axis represents the adversarial abilities that include architecture, training data, oracle (the adversary can obtain output classifications from provided inputs and observe the relationship between changes in inputs and outputs to adaptively construct AEs), and samples (the adversary has the ability to collect input and output pairs, but cannot modify these inputs to observe the difference in the output). The horizontal axis represents the adversarial goals, and the increasing complexity from left to right is confidence reduction, nontargeted misclassification, targeted misclassification, and source/target misclassification. In general, the weaker the adversarial ability or the higher the adversarial goal, the more difficult it is for the model to be attacked.

如图9所示,纵轴表示对抗能力,包括模型架构、训练数据、oracle(对手可以针对所提供的输入获得输出分类,并通过观察输入变化与输出变化之间的关系来自适应地构造AEs)和样本(对手有能力收集输入输出对,但不能通过修改这些输入来观察输出的差异)。横轴表示对抗目标,从左到右复杂度递增,依次为置信度降低、非目标误分类、目标误分类和源/目标误分类。一般情况下,对抗能力越弱或对抗目标越高,模型就越难被攻击。

IV. METHODS OF CONSTRUCTING AES

IV. 构造AE的方法

In what follows, several typical AE construction methods will be introduced in detail.

下面将详细介绍几种典型的AE构造方法。

A. Mainstream Attack Methods

A.主流攻击方法

1) L-BFGS: Szegedy et al. [24] proposed the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method to construct AEs. Given an image $X$, attackers construct an image $X^{\prime}$ similar to $X$ under the L2-norm such that $X^{\prime}$ is labeled as a different class. The optimization problem is

1) L-BFGS: Szegedy等人[24]提出用有限内存BFGS(L-BFGS)方法构造AEs。给定一幅图像 $X$ ,攻击者在L2范数意义下构造一幅与 $X$ 相似的图像 $X^{\prime}$ ,使 $X^{\prime}$ 被标记为不同的类别。优化问题为

where $\lVert X-X^{\prime} \rVert_{2}$ is the L2-norm. The goal of the attack is to make $f(X^{\prime}) = l$, $X^{\prime} \in [0, 1]^{n}$, where l is the target class. $f(X^{\prime}) = l$ is a nonlinear and nonconvex constraint, which is difficult to solve directly. Therefore, the box-constrained L-BFGS is used for approximately solving the following problem:

其中 $\lVert X-X^{\prime} \rVert_{2}$ 为L2范数。攻击的目标是使 $f(X^{\prime}) = l$ 且 $X^{\prime} \in [0,1]^{n}$ ,其中l为目标类别。 $f(X^{\prime}) = l$ 是难以直接求解的非线性非凸约束。因此,使用盒约束的L-BFGS来近似求解以下问题:

where c is a randomly initialized hyperparameter, which is determined by line search, and $loss_{F,l}(\cdot)$ is the loss function. $f(X^{\prime}) = l$ is approximated by minimizing the loss function. Although this method has high stability and effectiveness, the calculation is complicated.

其中c是随机初始化的超参数,通过线搜索确定; $loss_{F,l}(\cdot)$ 是损失函数。通过最小化损失函数来近似满足 $f(X^{\prime}) = l$ 。虽然该方法具有较高的稳定性和有效性,但计算较为复杂。
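原文中L-BFGS的优化问题及其盒约束近似形式的公式缺失。按Szegedy等人[24]的盒约束写法与上文描述,可示意性地重构为:

$$\min_{X'} \; c \cdot \lVert X - X' \rVert_{2} + loss_{F,l}(X') \qquad \text{s.t.}\ X' \in [0, 1]^{n}$$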

2) FGSM: Goodfellow et al. [27] proposed a simplest and fastest method to construct AEs, named FGSM. The generated images are misclassified by adding perturbations and linearizing the cost function in the gradient direction. Given an original image X, the problem can be solved with

2) FGSM: Goodfellow等[27]提出了一种最简单、最快的构建AEs的方法,称为FGSM。通过添加扰动和在梯度方向上线性化代价函数,生成的图像被错误地分类。给定一个原始图像X,这个问题可以用

where XadvX^{adv} represents an AE from X, ϵ\epsilon is a randomly initialized hyperparameter, sign()sign(∗) is a sign function, ytrue is the true label corresponding to X, and J(∗) is the cost function used to train the neural network, XJ()∇XJ(∗) represents the gradient of X.

其中 $X^{adv}$ 表示由X生成的AE, $\epsilon$ 是一个随机初始化的超参数, $sign(\cdot)$ 是符号函数, $y_{true}$ 是X对应的真实标签, $J(\cdot)$ 是用于训练神经网络的代价函数, $\nabla_{X}J(\cdot)$ 表示对X求梯度。
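FGSM的更新式即 $X^{adv} = X + \epsilon \cdot sign(\nabla_{X} J(X, y_{true}))$ 。下面给出一个PyTorch示意实现(函数名 `fgsm` 为自拟;原文实验使用TensorFlow,此处换用PyTorch只是为了简洁地说明算法本身):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps):
    """对一批输入x构造FGSM对抗样本:沿损失梯度的符号方向走一步。"""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)     # J(X, y_true)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()      # X + eps * sign(grad)
    return x_adv.clamp(0, 1).detach()                # 把像素值裁剪回[0, 1]
```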

There are two main differences between FGSM and L-BFGS. First, FGSM is optimized with the L∞-norm. Second, FGSM is a fast AE construction method because it does not require an iterative procedure to compute AEs. Hence, it has lower computation cost than other methods. However, FGSM is prone to label leaking. Therefore, Kurakin et al. [32] proposed FGSM-pred that uses the predicted label $y_{pred}$ instead of the true label $y_{true}$ . Kurakin et al. [32] also use the gradients with the L2- and L∞-norm, i.e., $sign(\nabla_{X} J(X, y_{true}))$ is changed to $(\nabla_{X} J(X, y_{true}))/\lVert\nabla_{X}J(X, y_{true})\rVert_{2}$ and $(\nabla_{X} J(X, y_{true}))/\lVert\nabla_{X}J(X, y_{true})\rVert_{\infty}$ , and these two methods are named Fast grad.L2 and Fast grad.L∞, respectively.

FGSM和L-BFGS有两个主要区别。首先,FGSM是在L∞范数下进行优化的。其次,FGSM是一种快速的AE构造方法,因为它不需要迭代过程来计算AEs,因此计算成本比其他方法低。然而,FGSM容易出现标签泄漏。因此,Kurakin等人[32]提出了使用预测标签 $y_{pred}$ 代替真实标签 $y_{true}$ 的FGSM-pred。Kurakin等人[32]还使用了L2和L∞范数下的梯度,即把 $sign(\nabla_{X} J(X, y_{true}))$ 分别替换为 $(\nabla_{X} J(X, y_{true}))/\lVert\nabla_{X}J(X, y_{true})\rVert_{2}$ 和 $(\nabla_{X} J(X, y_{true}))/\lVert\nabla_{X}J(X, y_{true})\rVert_{\infty}$ ,这两种方法分别称为Fast grad.L2和Fast grad.L∞。

3) IGSM: It is difficult for FGSM to control the perturbation in constructing AEs. Kurakin and Goodfellow [33] proposed an optimized FGSM, named iterative gradient sign method (IGSM), which applies perturbations to multiple smaller steps and clips the results after each iteration to ensure that the perturbations are within the neighborhood of the original image. For the Nth iteration, the update process is

3) IGSM: FGSM在构造AEs时难以控制扰动。Kurakin和Goodfellow[33]提出了一种优化的FGSM,名为迭代梯度符号法(IGSM),该方法将扰动拆分为多个较小的步骤,并在每次迭代后对结果进行裁剪,以确保扰动在原始图像的邻域内。第N次迭代的更新过程为

where ClipX,ϵ()Clip_{X,\epsilon}(∗) denotes [Xϵ,X+ϵ][X − \epsilon, X + \epsilon] .

IGSM is nonlinear in the gradient direction and requires multiple iterations, which is simpler than L-BFGS method in calculation, and the success rate of AE construction is higher than FGSM. IGSM can be further divided into two types: 1) reducing the confidence of the original prediction as the original class and 2) increasing the confidence of the prediction that originally belongs to the class with the smallest probability.

IGSM在梯度方向上是非线性的,需要多次迭代;其计算比L-BFGS方法简单,构造AE的成功率高于FGSM。IGSM可以进一步分为两种类型:1)降低原始预测为原类别的置信度;2)提高原本概率最小的类别的预测置信度。
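沿用上面假设的PyTorch写法,IGSM可以示意性地实现如下:每次只走一小步α,并在每次迭代后把结果裁剪回 $[X-\epsilon, X+\epsilon]$ 邻域(函数名 `igsm` 为自拟):

```python
import torch
import torch.nn.functional as F

def igsm(model, x, y_true, eps, alpha, steps):
    """迭代梯度符号法:多步小扰动,每步裁剪到 [X-eps, X+eps] 与 [0, 1]。"""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # Clip_{X,eps}
            x_adv = x_adv.clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```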

Recently, Dong et al. [50] proposed a momentum iterative method (MIM). The basic idea is to add momentum based on the IGSM. The weakness of previous iterative attacks is that the transferability (black-box attack) is weakened when the number of iterations increases, which can be addressed after adding momentum. MIM attack not only enhances the attack ability on the white-box model but also increases the success rate on the black-box model. The momentum IGSM for the targeted attack is given by

最近,Dong等人[50]提出了一种动量迭代法(MIM),其基本思想是在IGSM的基础上加入动量。以往迭代攻击的缺点是,当迭代次数增加时,可转移性(黑盒攻击能力)会减弱,加入动量后可以解决这一问题。MIM攻击不仅增强了对白盒模型的攻击能力,而且提高了对黑盒模型的攻击成功率。目标攻击的动量IGSM由下式给出
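原文此处的更新公式缺失。按Dong等人[50]的MIM写法,可示意性地重构为(此处给出非目标攻击的形式;目标攻击使用目标标签并把更新方向取反):

$$g_{t+1} = \mu \cdot g_{t} + \frac{\nabla_{X} J(X_{t}^{adv}, y_{true})}{\lVert \nabla_{X} J(X_{t}^{adv}, y_{true}) \rVert_{1}}, \qquad X_{t+1}^{adv} = Clip_{X,\epsilon}\big\{X_{t}^{adv} + \alpha \cdot sign(g_{t+1})\big\}$$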

where gt gathers the gradients of the first t iterations with a decay factor μ, and its initial value is 0.

其中 $g_{t}$ 以衰减因子μ累积前t次迭代的梯度,其初始值为0。

4) Iterl.l: FGSM and L-BFGS try to increase the probability of predicting wrong results but do not specify which wrong class should be selected by the model. These methods are sufficient for small data sets such as Mixed National Institute of Standards and Technology Database (MNIST) and CIFAR-10. On ImageNet, with a larger number of classes and varying degrees of significance between classes, FGSM and L-BFGS may construct uninteresting misclassifications, such as misclassifying one type of cat into another cat. To generate more meaningful AEs, a novel AE generation method is proposed by perturbing the target class with the lowest probability so that this least-likely class turns to become the correct class after the perturbation, which is called as iterative least-likely class method (iterl.l) [33]. To make the adversarial image Xadv be classified as yLL, we have the following procedure:

4) Iterl.l: FGSM和L-BFGS试图增加预测出错误结果的概率,但没有指定模型应该选择哪个错误类别。这些方法对于MNIST(美国国家标准与技术研究院混合数据库)和CIFAR-10等小数据集已经足够。在ImageNet上,类别数量更多且类别之间的显著性差异不同,FGSM和L-BFGS可能构造出无趣的误分类,例如把一种猫误分类为另一种猫。为了生成更有意义的AEs,人们提出了一种新的AE生成方法:向概率最低的目标类别进行扰动,使这个最不可能的类别在扰动后变成预测的类别,称为迭代最小可能类方法(iterl.l)[33]。为了将对抗图像 $X^{adv}$ 分类为 $y_{LL}$ ,有如下过程:

where yLL represents the least-likely (the lowest probability) target class. For a classifier with good performance, the least-likely class is usually quite different from the correct class. Therefore, this attack method can lead to some interesting errors, such as misclassifying a cat as an aircraft. It is also possible to use a random class as the target class, which is called as iteration random class method.

其中 $y_{LL}$ 表示最不可能(概率最低)的目标类别。对于性能良好的分类器,最不可能的类别通常与正确类别差别很大。因此,这种攻击方法会导致一些有趣的错误,比如将猫误分类为飞机。也可以使用随机类别作为目标类别,称为迭代随机类方法。

5) JSMA: Papernot et al. [31] proposed the JSMA, which is based on the L0 distance norm. The basic idea is to construct a saliency map with the gradients and model the gradients based on the impact of each pixel. The gradients are directly proportional to the probability that the image is classified as the target class, i.e., changing a pixel with a larger gradient will significantly increase the likelihood that the model classifies the image as the target class. JSMA allows us to select the most important pixel (the maximum gradient) based on the saliency map and then perturb the pixel to increase the likelihood of labeling the image as the target class. More specifically, JSMA includes the following steps.

5) JSMA: Papernot等[31]提出了基于L0距离范数的JSMA。其基本思想是构造一个带有梯度的显著性映射,并根据每个像素的影响对梯度进行建模。梯度与图像被分类为目标类的概率成正比,即改变一个梯度较大的像素将显著增加模型将图像分类为目标类的可能性。JSMA允许我们根据显著性映射选择最重要的像素(最大梯度),然后对像素进行扰动以增加将图像标记为目标类的可能性。更具体地说,JSMA包括以下步骤。

1) Compute forward derivative ∇F(X)

1)计算前向导数∇F(X)

2) Construct a saliency map S based on the forward derivative, as shown in Fig. 10.

2) 基于前向导数构造显著性图S,如图10所示。

3) Modify the most important pixel based on the saliency map, repeat this process until the output is the target class or the maximum perturbation is obtained.

3)根据显著性图修改最重要的像素,重复此过程,直到输出为目标类或获得最大扰动为止。

When the model is sensitive to the change of inputs, JSMA is easier to calculate the minimum perturbation to generate the AEs. JSMA has high computational complexity, while the generated AEs have a high success rate and transfer rate.

当模型对输入的变化敏感时,JSMA更容易计算生成AEs的最小扰动。JSMA具有较高的计算复杂度,而生成的AEs具有较高的成功率和传输率。
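下面给出一个高度简化的示意(并非JSMA的完整实现:未使用成对像素的显著性图,函数名 `jsma_step` 为自拟):用 `torch.autograd.functional.jacobian` 计算前向导数,以“目标类导数减去其余类导数之和”作为简化的显著性分数,每次只修改分数最大的那个像素。

```python
import torch
from torch.autograd.functional import jacobian

def jsma_step(model, x, target, theta=1.0):
    """简化版JSMA单步:选出对目标类最"有利"的像素,并将其取值增加theta。
    x: 形状为(D,)的展平输入;target: 目标类别索引。"""
    J = jacobian(lambda inp: model(inp.unsqueeze(0)).squeeze(0), x)  # 前向导数,形状(类别数, D)
    saliency = J[target] - (J.sum(dim=0) - J[target])   # 简化的显著性分数
    saliency[x >= 1.0] = -float("inf")                  # 已到上界的像素不再修改
    pixel = saliency.argmax()                           # 最重要的像素
    x_adv = x.clone()
    x_adv[pixel] = (x_adv[pixel] + theta).clamp(0, 1)
    return x_adv
```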

6) DeepFool: Moosavi-Dezfooli et al. [45] proposed a nontargeted attack method based on the L2-norm, called DeepFool. Assuming that the neural network is completely linear, there must be a hyperplane separating one class from another. Based on this assumption, we analyze the optimal solution to this problem and construct AEs. The corresponding optimization problem is

6) DeepFool: Moosavi-Dezfooli等人[45]提出了一种基于L2范数的非目标攻击方法,称为DeepFool。假设神经网络是完全线性的,那么一定存在一个将一类与另一类分开的超平面。基于这一假设,分析该问题的最优解并构造AEs。对应的优化问题为

subject to $sign(f(X_{0} + r)) \neq sign(f(X_{0}))$, where r indicates the perturbation.

约束条件为 $sign(f(X_{0} + r)) \neq sign(f(X_{0}))$ ,其中r表示扰动。

As shown in Fig. 11, X0 is the original example, f(X) is a linear binary classifier, the straight line WX + b = 0 is the decision boundary, and r∗(X) is the distance from the original example to the decision boundary, i.e., the distance from X0 to the straight line WX +b = 0. The distance is equivalent to the perturbation Δ(X0; f). Therefore, when Δ(X0; f) > r∗(X), the AE can be generated.

如图11所示, $X_{0}$ 是原始样本,f(X)是一个线性二分类器,直线WX + b = 0是决策边界, $r_{*}(X)$ 是原始样本到决策边界的距离,即 $X_{0}$ 到直线WX + b = 0的距离。该距离等价于扰动 $\Delta(X_{0}; f)$ 。因此,当 $\Delta(X_{0}; f) > r_{*}(X)$ 时,即可生成AE。

Compared with L-BFGS, DeepFool is more efficient and powerful. The basic idea is to find the decision boundary that is the closest to X in the image space, and then use the boundary to fool the classifier. It is difficult to solve this problem directly in neural networks with high dimension and nonlinear space. Therefore, a linearized approximation is used to iteratively solve this problem. The approximation is to linearize the intermediate X0 classifier in each iteration and obtain an optimal update direction on the linearized model. Then, X0 is iteratively updated in this direction by a small step α, repeating the linear update process until X0 crosses the decision boundary. Finally, AEs can be constructed with subtle perturbations.

与L-BFGS相比,DeepFool更高效、更强大。其基本思想是找到图像空间中距离X最近的决策边界,然后利用该边界欺骗分类器。在高维、非线性空间的神经网络中很难直接求解该问题,因此采用线性化近似来迭代求解:在每次迭代中对 $X_{0}$ 处的分类器进行线性化,得到线性化模型上的最优更新方向;然后让 $X_{0}$ 沿该方向以小步长α迭代更新,重复该线性化更新过程,直到 $X_{0}$ 越过决策边界。最终可以用细微的扰动构造出AEs。
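对线性二分类器 $f(X) = W^{T}X + b$ ,最小扰动有闭式解 $r_{*}(X) = -\dfrac{f(X)}{\lVert W \rVert_{2}^{2}} W$ ,即把样本正交投影到决策边界上;一般的DeepFool就是在每次迭代中对网络做线性化后重复这一步。下面是这一特殊情形的NumPy示意(数值均为假设):

```python
import numpy as np

def deepfool_linear(w, b, x, overshoot=0.02):
    """线性二分类器 f(x) = w·x + b 情形下使标签翻转的最小扰动。"""
    f = w @ x + b
    r = -(f / (w @ w)) * w           # 指向决策边界的最短向量 r_*(x)
    return x + (1 + overshoot) * r   # 轻微越过边界,保证符号翻转

w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 1.0])
x_adv = deepfool_linear(w, b, x)
print(np.sign(w @ x + b), np.sign(w @ x_adv + b))   # 扰动前后符号发生翻转
```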

7) Universal Adversarial Perturbations: FGSM, JSMA, and DeepFool can only generate adversarial perturbations to fool a network on a single image. Moosavi-Dezfooli et al. [46] proposed a universal image-agnostic perturbation attack method, which fools classifiers with a single adversarial perturbation applied to all images. The specific problem can be defined as finding a universal perturbation v such that $\hat{k}(X + v) \neq \hat{k}(X)$ for most examples in the data set subject to the distribution μ, which can be expressed as

7)通用对抗扰动:FGSM、JSMA和DeepFool只能针对单幅图像生成对抗扰动来欺骗网络。Moosavi-Dezfooli等人[46]提出了一种与图像无关的通用扰动攻击方法,用同一个对抗扰动作用于所有图像来欺骗分类器。该问题可定义为:寻找一个通用扰动v,使得对服从分布μ的数据集中的大多数样本有 $\hat{k}(X + v) \neq \hat{k}(X)$ ,可以表示为
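原文此处的公式缺失。按Moosavi-Dezfooli等人[46]的表述,该问题及其两条约束可示意性地重构为:寻找通用扰动v,使得

$$\mathbb{P}_{X \sim \mu}\big(\hat{k}(X + v) \neq \hat{k}(X)\big) \ge 1 - \delta \qquad \text{s.t.}\ \ \lVert v \rVert_{p} \le \xi$$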

where ˆ k denotes a classification function that labels each image. Such perturbation v is called as the universal perturbation. Attackers’ goal is to find v that satisfies the following two constraints:

其中 $\hat{k}$ 表示为每幅图像赋予标签的分类函数。这种扰动v被称为通用扰动。攻击者的目标是找到满足以下两个约束条件的v:

$\lVert \cdot \rVert_{p}$ denotes the p-norm, the parameter ξ controls the magnitude of the perturbation v, and δ quantifies the desired fooling rate for all images. The attack method has two characteristics: 1) the perturbation is related to the target model rather than the image and 2) the small perturbation will not change the structure of the image itself.

$\lVert \cdot \rVert_{p}$ 表示p范数,参数ξ控制扰动v的幅度,δ量化了对所有图像期望达到的欺骗率。该攻击方法有两个特点:1)扰动与目标模型有关,而与图像无关;2)小扰动不会改变图像本身的结构。

8) Carlini and Wagner (CW) Attack (C&W): Carlini and Wagner [47] proposed a powerful attack method based on L-BFGS. The attack with L0, L2, and L∞ distance norm can be targeted or nontargeted, and we take the nontargeted L2-norm as an example here. The corresponding optimization problem is

8) Carlini和Wagner (C&W)攻击:Carlini和Wagner[47]提出了一种基于L-BFGS的强大攻击方法。使用L0、L2和L∞距离范数的攻击可以是有目标的,也可以是非目标的,这里我们以非目标的L2范数攻击为例。对应的优化问题为

where $X + \delta \in [0, 1]^{n}$, c is a hyperparameter that can balance these two terms, and δ is a small perturbation. The objective function $f(X^{\prime})$ is defined as

其中 $X + \delta \in [0, 1]^{n}$ ,c是一个可以平衡这两项的超参数,δ是一个小扰动。目标函数 $f(X^{\prime})$ 定义为
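原文中C&W的优化目标及 $f(X')$ 的定义公式缺失。以L2范数、非目标攻击为例,按Carlini和Wagner[47]的常见写法可示意性地重构为(t为真实标签,Z(·)为softmax之前的输出):

$$\min_{\delta}\; \lVert \delta \rVert_{2}^{2} + c \cdot f(X + \delta), \qquad f(X') = \max\Big(Z(X')_{t} - \max_{i \neq t} Z(X')_{i},\; -l\Big)$$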

where $Z(X^{\prime})$ is the output of the last hidden layer, t is the true label, and l is a hyperparameter used to control the confidence level of the model misclassification; the AE $X^{\prime}$ can be misclassified with high confidence by adjusting the value of l. In general, high confidence attacks have large perturbations and high transfer rates, and the CW attack based on the L0, L2, and L∞ distance metrics can defeat defensive distillation [63]. There are three improvements to this attack based on L-BFGS as follows.

  1. Use the gradient of the actual output in the model instead of the gradient of softmax.

  2. Apply different distance metrics (L0, L2, and L∞).

  3. Apply different objective functions f(X)f(X^{\prime}) .

其中 $Z(X^{\prime})$ 是最后一个隐藏层的输出,t是真实标签,l是用于控制模型误分类置信度的超参数;通过调整l的值,可以使AE $X^{\prime}$ 以高置信度被误分类。一般来说,高置信度攻击具有较大的扰动和较高的转移率,而基于L0、L2和L∞距离度量的C&W攻击可以击败防御性蒸馏[63]。这种基于L-BFGS的攻击有如下三个改进。

  1. 使用模型中实际输出的梯度,而不是softmax的梯度。

  2. 应用不同的距离度量(L0, L2,和L∞)。

  3. 应用不同的目标函数 f(X)f(X^{\prime})

9) Ensemble Attack: Liu et al. [39] proposed an ensemble attack method combining multiple models to construct AEs. If an adversarial image remains adversarial for multiple models, it is likely to be transferred to other models. Formally, given k white-box models with softmax outputs being J1,...,Jk, an original image X and its true label y, the ensemble-based approach solves the following optimization problem (for targeted attack):

9)集成攻击:Liu等人[39]提出了一种结合多个模型构造AEs的集成攻击方法。如果一幅对抗图像对多个模型都保持对抗性,它就很可能转移到其他模型。形式上,给定k个softmax输出分别为J1,…,Jk的白盒模型、一幅原始图像X及其真实标签y,基于集成的方法求解以下优化问题(针对目标攻击):

where $y^{\prime}$ is the target label specified by the attacker, $\sum_{i=1}^{k}\alpha_{i}J_{i}(X^{\prime})$ is the ensemble model, $\alpha_{i}$ is the weight of the $i$th model with $\sum_{i=1}^{k}\alpha_{i}=1$, and λ is a randomly initialized parameter that is used to control the weight of the two terms. The goal is to ensure that the generated AEs are still adversarial for another black-box model $J_{k+1}$. Since the decision boundaries of different models are almost the same, the transferability of targeted AEs is improved significantly.

其中 $y^{\prime}$ 是攻击者指定的目标标签, $\sum_{i=1}^{k}\alpha_{i}J_{i}(X^{\prime})$ 是集成模型, $\alpha_{i}$ 是第i个模型的权重且 $\sum_{i=1}^{k}\alpha_{i}=1$ ,λ是一个随机初始化的参数,用于控制这两项的权重。目标是确保生成的AEs对另一个黑盒模型 $J_{k+1}$ 仍然具有对抗性。由于不同模型的决策边界几乎相同,目标AEs的可转移性得到了显著提高。

B. Other Attack Methods

B.其他攻击方式

From the perspective of attackers, the goal of the attack is to construct strong AEs with small perturbations and fool the model with high confidence without being recognized by human eyes. Recently, in addition to typical AE construction methods introduced above, a lot of other attack methods have been proposed [48], [49]. Xia and Liu [51] proposed AdvGAN to construct AEs. The basic idea is to use generative adversarial networks to construct targeted AEs, which not only learns and preserves the distribution of the original examples but also guarantees the diversity of perturbations and enhances generalization ability significantly. Tramèr et al. [52] proposed an ensemble attack RAND + FGSM. First, they added a small random perturbation RAND to “escape” the nonsmooth vicinity of the data point before computing the gradients. Then, they applied the FGSM to enhance the attack ability greatly. Compared with the FGSM, this method has a higher success rate and can effectively avoid label leaking. Su et al. [53] proposed a one-pixel attack method that only changes one pixel for each image to construct AEs to fool DNNs. However, such simple perturbation can be recognized by human eyes easily. Weng et al. [54] proposed a computationally feasible method called Cross Lipschitz Extreme Value for nEtwork Robustness (CLEVER), which applies extreme value theory to estimate a lower bound of the minimum adversarial perturbation required to misclassify the image. CLEVER is the first attack-independent method and can evaluate the intrinsic robustness of neural networks. Brown et al. [55] proposed adversarial patch that does not need to subtly transform an existing image into another and can be placed anywhere within the field of view of the classifier to cause the classifier to output a targeted class. Li et al. [56] studied the security of real-world cloud-based image detectors, including Amazon Web Services (AWS), Azure, Google Cloud, Baidu Cloud, and Alibaba Cloud. Specifically, they proposed four different attacks based on semantic segmentation, which generate semantics-aware AEs by interacting only with the black-box application programming interfaces (APIs).

从攻击者的角度来看,攻击的目的是构造具有小扰动的强AEs,在不被人眼识别的情况下以高置信度欺骗模型。近年来,除了上述典型的AE构造方法外,还有很多其他攻击方法被提出[48]、[49]。Xia和Liu[51]提出了AdvGAN来构造AEs,其基本思想是利用生成对抗网络构造有目标的AEs,既学习并保留了原始样本的分布,又保证了扰动的多样性,并显著增强了泛化能力。Tramèr等人[52]提出了一种集成攻击RAND + FGSM:首先,在计算梯度之前添加一个小的随机扰动RAND,以“逃离”数据点附近的非光滑区域;然后,应用FGSM,大大提高了攻击能力。与FGSM相比,该方法成功率更高,并能有效避免标签泄漏。Su等人[53]提出了一种单像素攻击方法,即每幅图像只改变一个像素来构造AEs以欺骗DNNs。然而,这种简单的扰动很容易被人眼识别出来。Weng等人[54]提出了一种计算上可行的方法,称为网络鲁棒性交叉Lipschitz极值方法(CLEVER),该方法利用极值理论估计使图像被误分类所需的最小对抗扰动的下界。CLEVER是第一种不依赖具体攻击的方法,可以评估神经网络的固有鲁棒性。Brown等人[55]提出了对抗补丁(adversarial patch),它不需要将现有图像巧妙地转换为另一幅图像,可以放置在分类器视野内的任何位置,使分类器输出一个目标类别。Li等人[56]研究了现实世界中基于云的图像检测器的安全性,包括Amazon Web Services (AWS)、Azure、谷歌云、百度云和阿里云。具体来说,他们提出了四种基于语义分割的不同攻击,仅通过与黑盒应用程序编程接口(API)交互来生成语义感知的AEs。

As discussed above, many AE attacks have been proposed in recent years. We summarized the advantages and disadvantages of several typical attack methods in Table I.

如上所述,近年来人们提出了许多AE攻击方法。我们在表I中总结了几种典型攻击方法的优缺点。

V. COMPARISON OF VARIOUS ATTACK METHODS

V.各种攻击方法的比较

In this section, we compare the attributes of different attack methods in terms of black/white-box, attack-type, targeted/nontargeted, and PASS. Then, we conduct a lot of experiments to compare the success rate and the transfer rate for different attack methods. The code to reproduce our experiments is available online at http://hardwaresecurity.cn/SurveyAEcode.zip.

在本节中,我们将从黑盒/白盒、攻击类型、目标/非目标和PASS等方面比较不同攻击方法的属性。然后,我们进行了大量实验来比较不同攻击方法的成功率和转移率。重现我们实验的代码可以在 http://hardwaresecurity.cn/SurveyAEcode.zip 获取。

A. Experimental Setup

A.实验装置

1) Platform: All the experiments are conducted on a machine equipped with an AMD Threadripper 1920X CPU, a NVIDIA GTX 1050Ti GPU, and 16G memory, and implemented in tensorflow-gpu 1.6.0 with Python 3.6.

1)平台:所有实验都在AMD Threadripper 1920X CPU, NVIDIA GTX 1050Ti GPU, 16G内存的机器上进行,使用Python 3.6实现tensorflow-gpu 1.6.0。

2) Data Set: All of the experiments described in this section are performed on the ILSVRC 2012 data set [57]. The ILSVRC 2012 data set is a subset of ImageNet containing 1000 categories. The examples are split among a training set of approximately 1.2 million examples, a validation set of 50 000, and a test set of 150 000. In our experiments, we randomly select 1000 images from the 1000 categories in the ILSVRC 2012 validation set. The size of each image is 299×299×3. The intensity values of pixels in all these images are scaled to a real number in [0, 1].

2)数据集:本节所有实验均在ILSVRC 2012数据集[57]上进行。ILSVRC 2012数据集是ImageNet的一个子集,包含1000个类别。样本被划分为约120万个样本的训练集、5万个样本的验证集和15万个样本的测试集。在我们的实验中,我们从ILSVRC 2012验证集的1000个类别中随机选择1000张图片,每张图片的大小为299×299×3。所有这些图像中像素的强度值都被缩放为[0, 1]内的实数。

3) DNN Model: For the data set, we evaluate the success rate and transfer rate of AEs on five popular deep network architectures: Inception V3 [58], AlexNet [59], ResNet34 [60], DenseNet20 [61], and VGG19 [62], which are pretrained by the ImageNet data set.

3) DNN模型:对于数据集,我们评估了AEs在五个流行的深度网络架构上的成功率和传输率:Inception V3[58]、AlexNet[59]、ResNet34[60]、DenseNet20[61]和VGG19[62],这五个架构都是由ImageNet数据集预先训练的。

B. Comparison of Different Attack Methods on Attributes

B.属性上不同攻击方法的比较

Table II summarizes the attribute information of mainstream attacks. We observed the following.

表二总结了主流攻击的属性信息。我们观察到以下情况。

1) Most of the attack methods are white-box attacks. In the black-box scenario, it is difficult for attackers to construct AE attacks on ML models.

1)大部分攻击方法为白盒攻击。在黑盒攻击场景中,攻击者很难对ML模型进行AE攻击。

2) Most of the mainstream attacks are based on the gradient. Later on, other attacks such as decision boundary, iterative optimization, and ensemble optimization are proposed.

2)大多数主流攻击都是基于梯度的。在此基础上,提出了决策边界、迭代优化和集成优化等攻击方法。

3) The more asterisks, the larger the value of PASS, i.e., the generated AEs are easier to be perceived by human eyes. For example, since FGSM is prone to the label leakage effect, the best perturbation to generate AEs is large (i.e., the PASS is large), and hence, the generated AEs are easier to be perceived.

3)星号越多,PASS值越大,即生成的AEs更容易被人眼感知。例如,由于FGSM容易产生标签泄漏效应,生成AEs的最佳扰动较大(即PASS值较大),因此生成的AEs更容易被察觉。

C. Success Rate

C. 成功率

The success rate of various AE construction methods for targeted and nontargeted attacks on the Inception V3 model is evaluated. As shown in Table III, the success rate of targeted attacks is lower than the nontargeted attacks for the same AE construction method, and MIM has the highest success rate for target and nontarget attacks. Note that JSMA needs to calculate the forward derivative of each pixel in the image to construct the Jacobian matrix. Therefore, JSMA is computationally expensive. In the experiment, we do not report the success rate and transfer rate for JSMA because it runs out of memory on the big data set.

我们评估了各种AE构造方法在Inception V3模型上进行目标攻击和非目标攻击的成功率。如表III所示,对于相同的AE构造方法,目标攻击的成功率低于非目标攻击,其中MIM对目标和非目标攻击的成功率都最高。注意,JSMA需要计算图像中每个像素的前向导数来构造雅可比矩阵,因此计算开销很大。在实验中,我们没有报告JSMA的成功率和转移率,因为它在大数据集上内存不足。

D. Transfer Rate

D. 传输速率

The transfer rates of typical AEs for targeted and nontargeted attacks among the five models [Inception V3 (A), AlexNet (B), ResNet (C), DenseNet (D), and VGG (E)] are shown in Table IV. Inception V3 is used as the source model and the other models are used as the target models. We collect 1000 adversarial images generated by each AE construction method in Inception V3 and apply them to other models to test the transfer rate of AEs. For each AE construction method, we define the average perturbation P as

5种模型[Inception V3 (A)、AlexNet (B)、ResNet (C)、DenseNet (D)、VGG (E)]中典型AEs针对有目标和非目标攻击的传输速率见表四。以Inception V3为源模型,其他模型为目标模型。我们在Inception V3中收集了每一种AE构建方法生成的敌对图像1000张,并将其应用于其他模型来测试AEs的传输率。对于每种声发射构造方法,我们将平均摄动P定义为
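原文中平均扰动P的公式缺失。按下文描述(对所有图像、所有像素位置上的绝对差取平均),其形式应为(示意性重构):

$$P = \frac{1}{N \cdot R \cdot C \cdot L} \sum_{i=1}^{N} \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{l=1}^{L} \Big| pixel^{ori}_{i,r,c,l} - pixel^{adv}_{i,r,c,l} \Big|$$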

where N, R, C, and L are the number of images, rows, columns, and layers, respectively; $pixel^{ori}_{i,r,c,l}$ ($pixel^{adv}_{i,r,c,l}$) is the pixel value at row r, column c, and layer l of the original (adversarial) image i; and $abs(pixel^{ori}_{i,r,c,l} - pixel^{adv}_{i,r,c,l})$ is the absolute value of $(pixel^{ori}_{i,r,c,l} - pixel^{adv}_{i,r,c,l})$. The magnitude of the perturbation is proportional to the transfer rate, i.e., the greater the perturbation, the higher the transfer rate. In addition, we find that the transfer rate of AE construction methods is low, i.e., the transfer rates of most targeted attacks are less than 0.5%, which means that it is difficult to conduct AE attacks in the real black-box scenario, and it is urgent to develop AE attack methods with a high transfer rate.

其中N、R、C、L分别为图像、行、列、层的数量,$\mathrm{pixel}^{ori}_{i,r,c,l}$($\mathrm{pixel}^{adv}_{i,r,c,l}$)为原始(对抗)图像i中第r行、第c列、第l层的像素值。扰动的大小与传输率成正比,即扰动越大,传输率越高。此外,我们发现AE构造方法的传输率很低,即大多数有目标攻击的传输率不到0.5%,这意味着在真实的黑盒场景中很难实施AE攻击,亟需开发具有高传输率的AE攻击方法。
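A minimal sketch of the two quantities used in this subsection, assuming image tensors in a common pixel range and a nontargeted notion of transfer; the function names are ours, chosen for illustration.

```python
# Sketch: average perturbation P and transfer rate of AEs on a target model.
import torch

def average_perturbation(x_ori, x_adv):
    """Mean absolute pixel difference over all images, rows, columns, and layers."""
    return (x_ori - x_adv).abs().mean().item()

@torch.no_grad()
def transfer_rate(target_model, x_adv, y_true):
    """Nontargeted case: fraction of AEs the target model also misclassifies.
    For targeted attacks, compare against the target label with == instead."""
    preds = target_model(x_adv).argmax(dim=1)
    return (preds != y_true).float().mean().item()
```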

VI. DEFENSES AGAINST ADVERSARIAL EXAMPLES

VI.针对AEs的防御

AEs pose a great threat to security-critical AI applications based on deep-learning image recognition, such as face payment [64], medical systems [65], and autonomous vehicles [66], [67]. Vulnerability to AEs is not unique to deep learning; all ML models are vulnerable to AEs [29]. Therefore, defending against AEs is urgent for ML security. In this section, we will briefly describe the basic goals of defending against AEs, then detail the current defense techniques and their limitations. Finally, some suggestions are presented for future research work on the problems of the current defense techniques.

AEs给人脸支付[64]、医疗系统[65]、深度学习中基于图像识别的自动驾驶汽车[66]、[67]等安全关键的人工智能应用带来巨大威胁。AEs的弱点并不是深度学习独有的。所有的ML模型都易受AEs[29]攻击。因此,防御AEs是ML安全的当务之急。在本节中,我们将简要描述防御AEs的基本目标,然后详细介绍当前的防御技术及其局限性。最后,对当前防御技术存在的问题提出了今后的研究建议。

A. Defense Goals

A.防御目标

Generally, there are four defense goals as follows.

一般来说,有四个防御目标如下。

  1. Low Impact on the Model Architecture: When constructing any defense against AEs, the primary consideration is to modify the model architecture as little as possible.

    1. 对模型体系结构的低影响:在构造任何针对AEs的防御时,主要考虑的是对模型体系结构的最小修改。

  2. Maintain Model Speed: Running time is very important for the availability of DNNs. It should not be affected during testing. With the deployment of defenses, DNNs should still maintain high performance on large data sets.

    2. 保持模型速度:运行时间对DNNs的可用性非常重要,在测试期间不应受到影响。部署防御后,DNNs仍应在大数据集上保持高性能。

  3. Maintain Accuracy: Defenses should have little impact on the classification accuracy of models.

    3. 保持精度:防御应该对模型的分类精度有很小的影响。

  4. Defenses Should be Targeted: Defenses should be effective for the examples that are relatively close to the training set. Since the examples that are far from the data set are relatively secure, the perturbations to these examples are easily detected by the classifier.

    4. 防御应该是有针对性的:防御应该对相对接近训练集的例子有效。因为远离数据集的例子是相对安全的,所以对这些例子的干扰很容易被分类器检测出来。

B. Current Defenses

B.当前防御

1) Adversarial Training: AEs have been used to improve the anti-interference ability of AI models. In 2015, Goodfellow et al. [27] proposed adversarial training to improve the robustness of the model. The basic idea is to add AEs to the training data and continuously generate new AEs at each step of the training. The number and relative weight of AEs in each batch are controlled independently through the loss function. The corresponding loss function [32] is

$$\mathrm{Loss} = \frac{1}{m - k + \lambda k}\left(\sum_{i \in \mathrm{CLEAN}} L\left(X_i \mid y_i\right) + \lambda \sum_{i \in \mathrm{ADV}} L\left(X_i^{adv} \mid y_i\right)\right)$$

where L(X | y) is the loss function of the example X with a true label y, m is the total number of training examples, k is the number of AEs, and λ is a hyperparameter used to control the relative weight of the AEs in the loss function. When k = m/2, i.e., when the number of AEs is the same as the number of original examples, the model achieves the best effect in adversarial training.

1)对抗训练:AEs已被用于提高AI模型的抗干扰能力。2015年,Goodfellow等[27]提出了对抗训练来提高模型的鲁棒性。其基本思想是将AEs加入训练数据,并在训练的每一步不断生成新的AEs。每批数据中AEs的数量和相对权重由损失函数独立控制。对应的损失函数[32]见上式,其中L(X|y)为真实标签为y的样本X的损失函数,m为训练样本总数,k为AEs的数量,λ为用于控制损失函数中AEs相对权重的超参数。当k = m/2,即AEs的数量与原始样本数量相同时,模型在对抗训练中效果最好。
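A hedged sketch of one adversarial-training step with the weighted loss above; it reuses the `fgsm` helper sketched earlier, and the optimizer, eps, and λ = 0.5 are illustrative assumptions rather than settings from the paper.

```python
# Sketch: one training step mixing clean and freshly generated adversarial examples.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255, lam=0.5):
    k = x.size(0) // 2                      # half of the batch becomes adversarial (k = m/2)
    x_adv = fgsm(model, x[:k], y[:k], eps)  # regenerate AEs at every training step
    optimizer.zero_grad()                   # clear gradients accumulated while crafting AEs
    clean_loss = F.cross_entropy(model(x[k:]), y[k:], reduction="sum")
    adv_loss = F.cross_entropy(model(x_adv), y[:k], reduction="sum")
    m = x.size(0)
    loss = (clean_loss + lam * adv_loss) / (m - k + lam * k)  # weighted loss from [32]
    loss.backward()
    optimizer.step()
    return loss.item()
```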

Adversarial training is not the same as data augmentation. The augmented data may appear in the test set, while AEs are usually not shown in the test set but can reveal the defects of the model. Adversarial training can be viewed as the process of minimizing classification error rates when the data are maliciously perturbed. In the following two situations, it is suggested to use adversarial training.

对抗性训练与数据增强不同。扩充的数据可能会出现在测试集中,而AEs通常不会出现在测试集中,但可以揭示模型的缺陷。对抗性训练可以看作是在数据被恶意干扰时最小化分类错误率的过程。在以下两种情况下,建议采用对抗性训练。

  1. Overfitting: When a model is overfitting, a regularization term is needed.

    1. 过拟合:模型过拟合时,需要一个正则化项。

  2. Security: When AEs refer to security problems, adversarial training is the most secure method among all known defenses with only a small loss of accuracy.

    2. 安全性:当AEs涉及到安全性问题时,对抗性训练是所有已知防御手段中最安全的方法,准确性只有很小的损失。

Although a model is robust to white-box attacks after adversarial training, it is still vulnerable to AEs generated from other models, i.e., the model is not robust to black-box attacks. Based on this observation, Tramèr et al. [52] proposed the concept of ensemble adversarial training. The main idea is to augment the training data with AEs constructed not only on the model being trained but also on other pretrained models, which increases the diversity of AEs and improves the generalization ability.

虽然经过对抗训练的模型对白盒攻击具有鲁棒性,但对其他模型生成的AEs仍然脆弱,即对黑盒攻击不具有鲁棒性。基于此,Tramèr等人[52]提出了集成对抗训练(ensemble adversarial training)的概念。其主要思想是用不仅在被训练模型上、也在其他预训练模型上构造的AEs来扩充训练数据,从而增加AEs的多样性,提高模型的泛化能力。
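Continuing the same hedged sketch, the data-augmentation idea behind ensemble adversarial training can be approximated by crafting AEs on a pool of static pretrained source models rather than only on the model being trained; the pool, the random sampling, and the reuse of the `fgsm` helper are assumptions for illustration.

```python
# Sketch: craft this batch's AEs against a randomly chosen pretrained source model.
import random

def ensemble_adv_batch(source_models, x, y, eps=8 / 255):
    src = random.choice(source_models)   # static, already-trained network from the pool
    for p in src.parameters():
        p.requires_grad_(False)          # only the input needs gradients here
    return fgsm(src, x, y, eps)          # reuse the single-step sketch above
```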

2) Defensive Distillation: Adversarial training needs AEs to train the model; thus, the defense is tied to the AE construction process, and for any defense, the effect varies considerably across attack methods. In 2016, Papernot et al. [63] proposed a universal defensive method for neural networks called defensive distillation. The distillation method uses a small model to mimic a large, computationally intensive model without degrading accuracy and can mitigate the problem of missing information. Different from the traditional distillation technique, defensive distillation aims to smooth the model during the training process by generalizing to examples outside the training data. The specific training steps are shown in Fig. 12 and sketched in code after the two steps below.

2)防御蒸馏:对抗训练需要AEs来训练模型,因此防御效果与AEs的构造过程相关;对于任何一种防御,不同攻击方法下的防御效果差别很大。2016年,Papernot等人[63]提出了一种通用的神经网络防御方法,称为防御蒸馏。蒸馏方法在不影响精度的情况下,利用小模型模拟计算量大的大模型,并可以缓解信息缺失的问题。与传统的蒸馏技术不同,防御蒸馏的目的是通过对训练数据之外的样本进行泛化,在训练过程中使模型平滑。具体的训练步骤如图12所示。

Fig. 12. Pipeline of defensive distillation. The initial network is trained at temperature T on the training set (X,Y (X)), and the distilled network is trained at the same temperature T on the new training set (X,F(X)) [63].

图12所示。防御蒸馏管道。初始网络在训练集(X,Y (X))上以温度T进行训练,蒸馏后的网络在新的训练集(X,F(X))上以相同的温度T进行训练[63]。

  1. The probability vectors produced by the first DNN are used to label the data set. These new labels are called soft labels as opposed to hard class labels.

    1. 第一个DNN产生的概率向量被用来标记数据集。这些新的标签被称为软标签,而不是硬类标签。

  2. The data set to train the second DNN model can be the newly labeled data or a combination of hard and soft labels. Since the second model combines the knowledge of the first model, it has smaller size, better robustness, and smaller computational complexity.

    2. 训练第二个DNN模型的数据集可以是新标记的数据,也可以是硬标签和软标签的组合。由于第二个模型结合了第一个模型的知识,它具有更小的规模,更好的鲁棒性和更小的计算复杂度。
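The two steps above can be sketched as follows, assuming the common temperature-softmax formulation of distillation; the temperature T = 20 and the cross-entropy-on-soft-labels loss are illustrative choices, not the exact hyperparameters of [63].

```python
# Sketch: soft labels from the teacher and the student's distillation loss, both at temperature T.
import torch
import torch.nn.functional as F

def soft_labels(teacher, x, T=20.0):
    """F(X) in Fig. 12: the teacher's temperature-softened class probabilities."""
    with torch.no_grad():
        return F.softmax(teacher(x) / T, dim=1)

def distillation_loss(student_logits, soft_targets, T=20.0):
    """Cross-entropy between the student's temperature-T distribution and the soft labels."""
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```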

The basic idea of defensive distillation is to generate smooth classifiers that are more resilient to AEs, reducing the sensitivity of the DNN to input perturbations. In addition, defensive distillation does not modify the neural network architecture, so it improves generalization ability and incurs low training overhead and no testing overhead. Although attack methods [36], [47] have demonstrated that defensive distillation does not significantly improve the robustness of neural networks, the following three directions are still promising for defending against AEs.

防御蒸馏的基本思想是生成平滑的分类器,它对AEs更有弹性,降低DNN对输入扰动的敏感性。此外,防御蒸馏不改变神经网络的结构,提高了泛化能力。因此,它的训练开销很低,而且没有测试开销。虽然攻击方法[36]、[47]已经证明了防御精馏并没有显著提高神经网络的鲁棒性,但以下三种方法仍是防御AEs的一个很好的研究方向。

  1. Consider defensive distillation under different types of the perturbation (FGSM, L-BFGS, etc.).

    1. 考虑不同类型扰动(FGSM、L-BFGS等)下的防御蒸馏。

  2. Investigate the effect of distillation on other DNN models and AE constructing algorithms.

    2. 研究蒸馏对其它DNN模型和AE构造算法的影响。

  3. Study various distance metrics such as L0, L2, and L∞ between the original examples and the AEs.

    3. 研究原始样本与AEs之间的各种距离度量,如L0、L2和L∞。

3) Detector: Adversarial training was proposed to enhance the robustness of the model; however, this method lacks generalization ability and is difficult to deploy widely. Defensive distillation was proposed to defend against AEs but is defeated by the strong CW attack. In 2017, Lu et al. [68] proposed a radial basis function (RBF)-support vector machine (SVM)-based detector to detect whether the input is normal or adversarial (as shown in Fig. 13). The detector inspects the internal state of some late layers in the original classification neural network. If the detector determines that an example is an AE, the input is rejected.

3)检测器:对抗训练被提出以增强模型的鲁棒性,但该方法缺乏泛化能力,难以推广。防御蒸馏被提出用于防御AEs,但被强大的CW攻击击败。2017年,Lu等人[68]提出了一种基于径向基函数(RBF)-支持向量机(SVM)的检测器,用于检测输入是正常样本还是对抗样本(如图13所示)。该检测器可以获取原始分类神经网络中某些后部层的内部状态。如果检测器判定该样本是AE,则将其拒绝。

We assume that the detector itself is difficult to attack and that the output of the ReLU activation function is quantized into a binary code. Since normal examples and AEs generate different binary codes, the detector can compare the codes at test time to determine whether the input is normal or adversarial.

我们假设检测器不易被攻击,ReLU激活函数的输出以二进制格式处理。由于普通示例和AEs生成不同的二进制代码,检测器可以在测试期间比较代码,以确定输入是正常的还是对抗性的。
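A hedged sketch of this detector idea: binarize post-ReLU activations taken from some late layers of the protected classifier and fit an RBF-kernel SVM to separate normal examples from AEs. The feature-extraction step, array shapes, and hyperparameters are assumptions for illustration.

```python
# Sketch: RBF-SVM detector trained on binarized ReLU activation codes.
import numpy as np
from sklearn.svm import SVC

def binary_codes(activations):
    """Quantize post-ReLU activations (n_samples, n_units) to a 0/1 code."""
    return (np.asarray(activations) > 0).astype(np.float32)

def fit_detector(acts_clean, acts_adv):
    """acts_clean / acts_adv: late-layer activations for normal examples and AEs."""
    X = np.vstack([binary_codes(acts_clean), binary_codes(acts_adv)])
    y = np.concatenate([np.zeros(len(acts_clean)), np.ones(len(acts_adv))])
    return SVC(kernel="rbf", gamma="scale").fit(X, y)

# Usage: detector = fit_detector(acts_clean, acts_adv)
#        reject = detector.predict(binary_codes(acts_test)) == 1
```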

At present, AE detectors are mainly divided into the following classes [69].

目前AE检测器主要分为以下几类[69]。

  1. Detection Based on Secondary Classification: Generally, there are two kinds of secondary-classification detection methods. One is the adversarial training detector [70], [71], which is similar to adversarial training. The main idea is to add a new classification label for AEs during training; if an AE is detected, the model classifies it into this new class. The other takes the features extracted from AEs and original examples at the convolutional layers as input and uses the labeled data to train a neural network detector. This method performs well, detecting over 85% of AEs.

    1. 基于二级分类的检测:一般有两种二级分类检测方法。一种是对抗性训练检测器[70],[71],它与对抗性训练相似。其主要思想是在训练过程中为AEs添加一个新的分类标签。如果检测到一个AE,模型将把它分类为一个新的类。另一种方法是在卷积层中提取AEs和原始样本的特征作为输入。然后,用带标记的输入数据训练神经网络检测器。该方法对85%以上的AEs具有良好的检测效果。

  2. Detection Based on Principal Component Analysis (PCA): The essence of PCA is to transform the original features linearly and map them to a low-dimensional space with the best possible representation of the original features. PCA-based detection methods are mainly divided into two types. The first one uses PCA in the input layer due to the greater weight of AEs processed by PCA than original examples [72]. The second one uses PCA in the hidden layer [73]. If the result of each hidden layer matches the feature of original examples, the detector will classify the input as original examples.

    2. 基于主成分分析(PCA)的检测:主成分分析的本质是对原始特征进行线性变换,并将其映射到一个低维空间中,使其尽可能得到原始特征的最佳表示。基于PCA的检测方法主要分为两类。第一种方法是在输入层使用PCA,因为PCA处理的AEs的权重比原始示例大[72]。第二种方法使用隐藏层中的主成分分析[73]。如果每个隐藏层的结果与原始样本的特征相匹配,检测器将输入分类为原始样本。

  3. Detection Based on Distribution: There are two main distribution-based detection methods. The first one uses the maximum mean discrepancy [70], which measures the distance between two different but related distributions. Assuming that there are two sets of images S1 and S2, S1 contains all the original examples and S2 contains either all AEs or all original examples. If S1 and S2 have the same distribution, then S2 has original examples; otherwise, S2 is full of AEs. The second one uses kernel density estimation [74]. Since AEs have a different density distribution from the original examples, they can be detected with high confidence by the estimation of the density ratio. If the density ratio of one example is close to 1, it belongs to the original example. If the density ratio is much larger than 1, it belongs to AEs.

    3. 基于分布的检测:主要有两种基于分布的检测方法。第一种方法使用最大平均差异[70],它测量两个不同但相关的分布之间的距离。假设有两组图像S1和S2,S1包含所有原始示例,S2包含所有AEs或所有原始示例。如果S1和S2有相同的分布,那么S2包含原始示例;否则,S2充满了AEs。第二种方法使用核密度估计[74]。由于AEs的密度分布与原始样本不同,通过估计密度比可以高置信度地检测出AEs。如果一个示例的密度比接近1,则属于原始示例。如果密度比远远大于1,则属于AEs。

  4. Other Detection Methods: Dropout randomization [30] applies dropout randomly during AE detection: original examples almost always yield the correct labels, whereas AEs are likely to yield labels different from those of the corresponding original examples (a small sketch follows this list). In addition, another method called Mean Blur [73] applies a filter to perform mean blurring on the input image and can effectively improve the robustness of models.

    4. 其他检测方法:Dropout随机化[30]是在AE检测中随机使用Dropout的一种方法。原始样本总是生成正确的标签,而AEs很可能产生与原始样本对应标签不同的结果。另外,另一种称为均值模糊(Mean Blur)[73]的方法使用滤波器对输入图像进行均值模糊,可以有效提高模型的鲁棒性。
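As referenced in item 4, a small sketch of the dropout-randomization check: keep dropout sampling active at test time, run several stochastic forward passes, and flag inputs whose predicted label is unstable. The number of passes and the agreement threshold are illustrative assumptions.

```python
# Sketch: flag inputs whose prediction flips across stochastic dropout passes.
import torch

def enable_dropout(model):
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()  # keep dropout sampling on while the rest of the model stays in eval

@torch.no_grad()
def looks_adversarial(model, x, passes=20, agreement=0.8):
    model.eval()
    enable_dropout(model)
    preds = torch.stack([model(x).argmax(dim=1) for _ in range(passes)])  # (passes, batch)
    majority_share = (preds == preds.mode(dim=0).values).float().mean(dim=0)
    return majority_share < agreement   # low agreement across passes -> likely an AE
```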

C. Other Defense Techniques

C.其他防御技术

The defenders' goal is to train a model for which no AEs exist or AEs cannot be easily generated. Recently, some novel research on defending against AEs has been proposed. Meng and Chen [43] proposed MagNet, a framework that includes one or more separate detector networks and one reformer network. The detector networks learn to distinguish normal examples from AEs by approximating normal examples, and the reformer network moves AEs toward the normal examples. Since MagNet is independent of the process of constructing AEs, it is effective against black-box and gray-box attacks. Dong et al. [50] proposed a high-level representation guided denoiser method, which requires fewer training images and consumes less training time than previous defense methods at the expense of a reduced success rate. Ma et al. [75] proposed local intrinsic dimensionality (LID) to describe the dimensional attributes of the adversarial subspace around AEs and showed that these features can distinguish normal examples from AEs effectively. Baluja and Fischer [81] proposed adversarial transformation networks (ATNs) to increase the diversity of perturbations and improve the effectiveness of adversarial training. However, ATNs may produce perturbations similar to those of iterative methods, which are not suitable for adversarial training. In addition, hardware security primitives such as physical unclonable functions [82]–[85] can be used to randomize the model to assist AE detection.

防御者的目标是训练不存在AEs或AEs不易生成的模型。近年来,针对AEs的防御提出了一些新的研究。Meng和Chen[43]提出了MagNet框架,包括一个或多个独立的检测器网络和一个重整(reformer)网络。检测器网络通过逼近正常样本来学习区分正常样本与AEs,重整网络则将AEs移向正常样本。由于MagNet与AEs的构造过程无关,因此在黑盒和灰盒攻击中都有效。Dong等人[50]提出了一种高层表示引导的去噪方法,与以往的防御方法相比需要更少的训练图像和更短的训练时间,但成功率有所降低。Ma等人[75]提出了局部内在维数(local intrinsic dimensionality, LID)来刻画AEs所在对抗子空间的维数属性,并证明这些特征可以有效区分正常样本与AEs。Baluja和Fischer[81]提出了对抗变换网络(ATNs)来增加扰动的多样性,以提高对抗训练的有效性。但是,ATNs可能产生与迭代方法类似的扰动,不适用于对抗训练。此外,物理不可克隆函数(PUF)等硬件安全原语[82]–[85]可以用来随机化模型,以辅助AE检测。

D. Limitations of Defenses

D.防御的限制

As discussed above, a lot of defenses have been proposed. In what follows, we summarize their advantages and disadvantages.

如上所述,已经提出了许多防御措施。在下面,我们总结了他们的优点和缺点。

As shown in Table V, adversarial training is simple and can significantly improve the robustness of models. However, AEs are required in the training process, which brings high overhead. In addition, it is difficult to explain theoretically which attack method should be used to construct AEs so that adversarial training achieves the best model robustness. Defensive distillation can greatly reduce the sensitivity to perturbations without modifying the neural network architecture; therefore, it incurs low overhead in training and testing. However, defensive distillation needs to introduce a distillation temperature and modify the objective function, which increases the complexity of designing defensive models. In addition, attackers can easily bypass defensive distillation with the following three strategies: 1) choose a more suitable objective function; 2) calculate the gradient of the final layer instead of the second-to-last layer; and 3) attack a fragile model and then transfer the AEs to the distilled model. Detectors do not need to modify the model architecture or parameters; hence, their complexity is low. However, their performance is highly correlated with the type of detector. In addition, detectors only detect the existence of AEs and do not improve the robustness of the model.

如表V所示,对抗性训练简单,可以显著提高模型的鲁棒性。然而,在培训过程中需要AEs,这带来了很高的开销。另外,构建AEs进行对抗性训练的攻击方法,很难从理论上解释哪种攻击方法能使模型具有最好的鲁棒性。防御蒸馏可以在不改变神经网络结构的情况下,大大降低对扰动的敏感性。因此,防御蒸馏在训练和测试中带来了较低的开销。但防御蒸馏需要增加蒸馏温度和修改目标函数,增加了防御模型设计的复杂性。此外,攻击者可以通过以下三种策略轻松绕过防御蒸馏:1)选择更合适的目标函数;2)计算梯度的最后一层,而不是倒数第二层;3)攻击脆性模型,然后转移到蒸馏模型。检测器不需要修改模型体系结构和参数;因此,复杂性很低。然而,它的性能与探测器的类型高度相关。此外,该方法只检测AEs的存在,并不提高模型的鲁棒性。

VII. RECENT CHALLENGES AND NEW OPPORTUNITIES

VII.最近的挑战和新的机遇

AE construction and defense are among the research hotspots in the AI security field. Although many AE construction methods and defense techniques have been proposed, various unresolved problems still exist. This section summarizes the challenges in this field and puts forward some future research directions.

AE的构造与防御是人工智能安全领域的研究热点之一。虽然已经提出了许多AE构造方法和防御技术,但仍存在各种尚未解决的问题。本节总结了该领域面临的挑战,并提出了一些未来的研究方向。

In terms of AE construction, there are three major challenges, as follows.

在AE构造方面,存在以下三大挑战。

  1. It Is Difficult to Build a Generalized AE Construction Method: In recent years, many AE construction methods have been proposed, such as the gradient-based FGSM and JSMA, the decision-boundary-based DeepFool, and ensemble attack methods that combine multiple models. It is difficult for these methods to construct generalized AEs, and they only achieve good performance on some evaluation metrics. Therefore, defenders can propose efficient defenses against these specific attacks. For example, the gradient can be hidden or obfuscated to defend against gradient-based AE construction methods.

    1. 建立一种通用的AE构造方法是困难的:近年来,人们提出了许多AE构造方法,如基于梯度的FGSM、JSMA,基于决策边界的DeepFool,以及结合多个模型的集成攻击方法。这些方法难以构造通用的AE,只能在某些评价指标上取得较好的效果。因此,防御者可以针对这些特定的攻击提出有效的防御。例如,梯度可以被隐藏或混淆,以防御基于梯度的AE构造方法。

  2. It Is Difficult to Control the Magnitude of Perturbation for Target Images: In the mainstream attack methods, attackers construct AEs by perturbing target images to fool neural network models. However, it is difficult to control the magnitude of perturbations because too small perturbations cannot generate AEs and too large perturbations can be perceived by human eyes easily.

    2. 目标图像的扰动程度难以控制:在主流的攻击方法中,攻击者通过扰动目标图像来构造AEs来欺骗神经网络模型。然而,扰动的大小很难控制,因为太小的扰动不能产生AEs,而太大的扰动人眼容易察觉。

  3. AEs Are Difficult to Maintain Adversarial Stability in Real-World Applications: An image perturbed at a specific distance and angle may cause the model to misclassify. However, many images perturbed at different distances and angles fail to fool the classifier [86]. Moreover, AEs may lose their adversarial nature under physical transformations such as blurring, rotation, scaling, and illumination changes [87]. In fact, it is hard for AEs to maintain stability in real-world applications.

    3. AEs在实际应用中难以保持对抗稳定性:在特定距离和角度下扰动的图像可能导致模型误分类。然而,许多在不同距离和角度下受扰动的图像无法欺骗分类器[86]。此外,AEs在模糊、旋转、缩放和光照等物理变换下可能失去对抗性[87]。事实上,AEs很难在实际应用中保持稳定性。

Therefore, to address these issues, we propose to improve AE quality in the following three directions.

因此,针对这些问题,我们提出从以下三个方向提高AE的质量。

  1. Construct AEs With a High Transfer Rate: With the diversification of neural network models, attacks that are only effective against a single model are not enough. Based on transferability, constructing AEs with a high transfer rate is a prerequisite for evaluating the effectiveness of black-box attacks and a key metric for evaluating generalized attacks.

    1. 构建传输率高的AEs:随着神经网络模型的多样化,对单一模型的攻击效率不够。基于可转移性,构造具有高传输率的AEs是评价黑盒攻击有效性的前提,也是评价广义攻击的关键指标。

  2. Construct AEs Without Perturbing the Target Image: When constructing AEs, the magnitude of perturbations to the target image is determined by experiments. Hence, the optimal perturbation will be different in various models. It increases the complexity of attacks and affects the success rate and the transfer rate. Therefore, constructing AEs without perturbing the target image is a novel and challenging research direction.

    2. 在不扰动目标图像的情况下构造AEs:在构造AEs时,通过实验确定对目标图像的扰动程度。因此,在不同的模型中,最优扰动是不同的。它增加了攻击的复杂性,影响了成功率和传输率。因此,构造不干扰目标图像的AEs是一个新颖而富有挑战性的研究方向。

  3. Model the Physical Transformation: In the physical world, attackers need to consider not only the magnitude of the perturbations but also physical transformations such as translation, rotation, brightness, and contrast. However, it is difficult for attackers to use traditional algorithms to generate real-world AEs with high adversarial stability. Therefore, modeling physical perturbations is an efficient way to improve the stability of real-world AEs (a small robustness-check sketch follows this list).

    3. 对物理转换建模:在物理世界中,攻击者不仅需要考虑扰动的大小,还需要考虑物理转换,如平移、旋转、亮度和对比度。然而,攻击者很难使用传统算法生成具有高对抗稳定性的真实AEs。因此,模拟物理扰动是提高实际AEs稳定性的有效途径。
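As referenced in item 3, a small sketch of checking adversarial stability under simple physical-style transformations (random rotation and brightness changes), in the spirit of [86], [87]; the transformation ranges and trial count are illustrative assumptions.

```python
# Sketch: does an AE survive mild random rotations and brightness changes?
import random
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def stays_adversarial(model, x_adv, y_true, trials=50):
    """x_adv: (1, C, H, W) adversarial image; y_true: its true integer label.
    Returns the fraction of randomly transformed copies still misclassified."""
    fooled = 0
    for _ in range(trials):
        t = TF.rotate(x_adv, angle=random.uniform(-15.0, 15.0))
        t = TF.adjust_brightness(t, brightness_factor=random.uniform(0.7, 1.3))
        fooled += int(model(t).argmax(dim=1).item() != y_true)
    return fooled / trials
```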

In terms of defending against AEs, there are two main challenges at present.

在防御AEs方面,目前存在两个主要的挑战。

  1. Defense is Highly Related to Model Architectures and Parameters: The black-box attack does not need to obtain model architecture and parameters to construct AEs. Therefore, it is difficult for defenders to resist the black-box attack by modifying the model architectures or parameters. For example, defensive distillation needs to modify and retrain the target classifier.

    1. 防御与模型体系结构和参数高度相关:黑盒攻击不需要获得模型体系结构和参数来构造AEs。因此,防御者很难通过修改模型架构或参数来抵抗黑箱攻击。例如,防御性蒸馏需要修改和重新训练目标分类器。

  2. Weak Generalization of Defense Models: Adversarial training and detectors are representative defense techniques. Adversarial training can improve the robustness of the model by adding AEs to the training set, and a detector can detect AEs among the examples in the data set. However, the defense effect differs greatly across AEs generated by different attack methods, i.e., the generalization ability of defense models is weak.

    2. 防御模型的泛化能力弱:对抗训练和检测器是代表性的防御技术。对抗训练可以通过在训练集中加入AEs来提高模型的鲁棒性,检测器可以检测数据集中基于AEs的样本。然而,对不同攻击方法生成的AEs,防御效果差别很大,即防御模型的泛化能力较弱。

VIII. CONCLUSION

VIII.结论

DNNs have recently achieved state-of-the-art performance on a variety of pattern recognition tasks. However, recent research shows that DNNs, like many other ML models, are vulnerable to AEs. Although many AE construction and defense methods have been proposed, some challenges remain to be solved. The state-of-the-art research is still in the adversarial development stage of "while the priest climbs a post, the devil climbs ten." In this survey, we review the state-of-the-art AE construction methods and the corresponding defense techniques, then summarize several challenges along with the future trends in this field. Although AEs have caused deep learning to be questioned, they also prompt both academia and industry to better understand the difference between AI and the human brain.

最近,DNNs在多种模式识别任务上取得了最先进的性能。然而,最近的研究表明,DNNs与许多其他ML模型一样,易受AEs攻击。虽然已经提出了许多AE构造和防御方法,但仍存在一些有待解决的挑战。目前最先进的研究仍处于"道高一尺,魔高一丈"的对抗式发展阶段。本文综述了目前最先进的AE构造方法和相应的防御技术,并总结了该领域面临的几个挑战以及未来的发展趋势。虽然AEs使深度学习受到质疑,但这也促使学术界和工业界更好地理解人工智能与人类大脑之间的区别。

REFERENCES

参考文献

[1] (2017). The Brain (Game Show). [Online]. Available: https://en.wikipedia.org/wiki/TheBrain(gameshow)

[2] (2017). AlphaGo Zero: Learning From Scratch. [Online]. Available: https://deepmind.com/blog/alphago-zero-learning-scratch/

[3] A. Bundy, “Preparing for the future of artificial intelligence,” AI Soc., vol. 32, no. 2, pp. 285–287, May 2017.

[4] L. E. Parker, “Creation of the national artificial intelligence research and development strategic plan,” AI Mag., vol. 39, no. 2, pp. 25–32, 2018.

[5] A Next Generation Artificial Intelligence Development Plan: China. Accessed: Mar. 5, 2018. [Online]. Available: https://chinacopyrightandmedia.wordpress.com/2017/07/20/a-next-generationartificial-intelligence-development-plan/

[6] (2016). Chubby Robot Goes Haywire, Injures Human at Trade Show. [Online]. Available: https://www.dailydot.com/debug/first-robot-humaninjury-china/

[7] (2017). Danger, Danger! 10 Alarming Examples of AI Gone Wild. [Online]. Available: https://www.infoworld.com/article/3184205/technology-business/danger-danger-10-alarming-examples-of-ai-gonewild.html

[8] Uber Self-Driving Car Kills Arizona Pedestrian, Realizing Worst Fears of the New Tech. Accessed: May 7, 2018. [Online]. Available: https://www.usatoday.com/story/tech/2018/03/19/uber-self-drivingcar-kills-arizona-woman/438473002/

[9] X. Liu, J. Zhang, Y. Lin, and H. Li, “ATMPA: Attacking machine learning-based malware visualization detection methods via adversarial examples,” in Proc. IEEE/ACM Int. Symp. Qual. Service, Phoenix, AZ, USA, Jun. 2019, Art. no. 38.

[10] B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of adversarial machine learning,” Pattern Recognit., vol. 84, pp. 317–331, Dec. 2018.

[11] M. Brundage et al., “The malicious use of artificial intelligence: Forecasting, prevention, and mitigation,” 2018, arXiv:1802.07228. [Online]. Available: https://arxiv.org/abs/1802.07228

[12] G. L. Wittel and S. F. Wu, “On attacking statistical spam filters,” in Proc. Conf. Email Anti-Spam, 2004.

[13] B. Biggio, G. Fumera, F. Roli, and L. Didaci, “Poisoning adaptive biometric systems,” Structural, Syntactic, and Statistical Pattern Recognition (Lecture Notes in Computer Science: Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7626. Berlin, Germany: Springer, 2012, pp. 417–425.

[14] B. Biggio, L. Didaci, G. Fumera, and F. Roli, “Poisoning attacks to compromise face templates,” in Proc. Int. Conf. Biometrics, Jun. 2013, pp. 1–7.

[15] S. R. Bulò, B. Biggio, I. Pillai, M. Pelillo, and F. Roli, “Randomized prediction games for adversarial machine learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 11, pp. 2466–2478, Nov. 2017.

[16] B. Biggio et al., “Evasion attacks against machine learning at test time,” in Proc. Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2013, pp. 387–402.

[17] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial examples for malware detection,” in Proc. Eur. Symp. Res. Comput. Secur., 2017, pp. 62–79.

[18] Z. Abaid, M. A. Kaafar, and S. Jha, “Quantifying the impact of adversarial evasion attacks on machine learning based Android malware classifiers,” in Proc. IEEE 16th Int. Symp. Netw. Comput. Appl. (NCA), Oct./Nov. 2017, pp. 1–10.

[19] D. Lowd and C. Meek, “Adversarial learning,” in Proc. 11th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2005, pp. 641–647.

[20] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, “Can machine learning be secure?” in Proc. ACM Symp. Inf., Comput. Commun. Secur., 2006, pp. 16–25.

[21] B. I. P. Rubinstein et al., “ANTIDOTE: Understanding and defending against poisoning of anomaly detectors,” in Proc. 9th ACM SIGCOMM Conf. Internet Meas. Conf., 2009, pp. 1–14.

[22] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, “Adversarial classification,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2004, pp. 99–108.

[23] F. Zhang, P. P. K. Chan, B. Biggio, D. S. Yeung, and F. Roli, “Adversarial feature selection against evasion attacks,” IEEE Trans. Cybern., vol. 46, no. 3, pp. 766–777, Mar. 2016.

[24] C. Szegedy et al., “Intriguing properties of neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2014, pp. 1–10.

[25] Even Artificial Neural Networks Can Have Exploitable ‘Backdoors’. Accessed: May 12, 2018. [Online]. Available: https://www.wired.com/story/machine-learning-backdoors/

[26] H. Chen, H. Zhang, P.-Y. Chen, J. Yi, and C.-J. Hsieh, “Attacking visual language grounding with adversarial examples: A case study on neural image captioning,” in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 2587–2597.

[27] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.

[28] G. Philipp and J. G. Carbonell, “The nonlinearity coefficient—Predicting generalization in deep neural networks,” 2018, arXiv:1806.00179. [Online]. Available: https://arxiv.org/abs/1806.00179

[29] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proc. ACM Asia Conf. Comput. Commun. Secur., 2017, pp. 506–519.

[30] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.

[31] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proc. IEEE Eur. Symp. Secur. Privacy (EuroS&P), Mar. 2016, pp. 372–387.

[32] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–17.

[33] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–14.

[34] D. Su, H. Zhang, H. Chen, J. Yi, P.-Y. Chen, and Y. Gao, “Is robustness the cost of accuracy?—A comprehensive study on the robustness of 18 deep image classification models,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 631–648.

[35] A. Rozsa, M. Günther, and T. E. Boult, “Are accuracy and robustness correlated,” in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), Dec. 2016, pp. 227–232.

[36] N. Carlini and D. Wagner, “Defensive distillation is not robust to adversarial examples,” 2016, arXiv:1607.04311. [Online]. Available: https://arxiv.org/abs/1607.04311

[37] A. Fawzi, O. Fawzi, and P. Frossard, “Analysis of classifiers’ robustness to adversarial perturbations,” Mach. Learn., vol. 107, no. 3, pp. 481–508, 2018.

[38] F. Tramèr, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, “The space of transferable adversarial examples,” 2017, arXiv:1704.03453. [Online]. Available: https://arxiv.org/abs/1704.03453

[39] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” 2016, arXiv:1611.02770. [Online]. Available: https://arxiv.org/abs/1611.02770

[40] L. Wu, Z. Zhu, C. Tai, and E. Weinan, “Understanding and enhancing the transferability of adversarial examples,” 2018, arXiv:1802.09707. [Online]. Available: https://arxiv.org/abs/1802.09707

[41] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.

[42] A. Rozsa, E. M. Rudd, and T. E. Boult, “Adversarial diversity and hard positive generation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun./Jul. 2016, pp. 25–32.

[43] D. Meng and H. Chen, “MagNet: A two-pronged defense against adversarial examples,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, 2017, pp. 135–147.

[44] N. Narodytska and S. Kasiviswanathan, “Simple black-box adversarial attacks on deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jul. 2017, pp. 1310–1318.

[45] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: A simple and accurate method to fool deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2574–2582.

[46] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 86–94.

[47] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Proc. IEEE Symp. Secur. Privacy (EuroS&P), May 2017, pp. 39–57.

[48] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Trans. Neural Netw. Learn. Syst., to be published.

[49] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung, “A survey on security threats and defensive techniques of machine learning: A data driven view,” IEEE Access, vol. 6, pp. 12103–12117, 2018.

[50] Y. Dong et al., “Boosting adversarial attacks with momentum,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 9185–9193.

[51] F. Xia and R. Liu, “Adversarial examples generation and defense based on generative adversarial network,” in Proc. 27th Int. Joint Conf. Artif. Intell., 2018, pp. 3905–3911.

[52] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” in Proc. Int. Conf. Learn. Represent., 2018, pp. 1–20.

[53] J. Su, D. V. Vargas, and K. Sakurai, “One pixel attack for fooling deep neural networks,” IEEE Trans. Evol. Comput., to be published.

[54] T.-W. Weng et al., “Evaluating the robustness of neural networks: An extreme value theory approach,” in Proc. Int. Conf. Learn. Represent., 2018, pp. 1–18.

[55] T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, “Adversarial patch,” in Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, 2017, pp. 1–6.

[56] X. Li et al., “Adversarial examples versus cloud-based detectors: A black-box empirical study,” 2019, arXiv:1901.01223. [Online]. Available: https://arxiv.org/abs/1901.01223

[57] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.

[58] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2818–2826.

[59] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.

[60] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.

[61] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.

[62] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. Learn. Represent., 2015, pp. 1–14.

[63] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in Proc. IEEE Symp. Secur. Privacy, May 2016, pp. 582–597.

[64] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter, “A general framework for adversarial examples with objectives,” 2017, arXiv:1801.00349. [Online]. Available: https://arxiv.org/abs/1801.00349

[65] S. G. Finlayson, H. W. Chung, I. S. Kohane, and A. L. Beam, “Adversarial attacks against medical deep learning systems,” 2018, arXiv:1804.05296. [Online]. Available: https://arxiv.org/abs/1804.05296

[66] Y. Huang and S.-H. Wang, “Adversarial manipulation of reinforcement learning policies in autonomous agents,” in Proc. Int. Joint Conf. Neural Netw., Jul. 2018, pp. 1–8.

[67] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, “Adversarial attacks on neural network policies,” 2017, arXiv:1702.02284. [Online]. Available: https://arxiv.org/abs/1702.02284

[68] J. Lu, T. Issaranon, and D. Forsyth, “SafetyNet: Detecting and rejecting adversarial examples robustly,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 446–454.

[69] N. Carlini and D. Wagner, “Adversarial examples are not easily detected: Bypassing ten detection methods,” in Proc. 10th ACM Workshop Artif. Intell. Secur., 2017, pp. 3–14.

[70] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel, “On the (statistical) detection of adversarial examples,” 2017, arXiv:1702.06280. [Online]. Available: https://arxiv.org/abs/1702.06280

[71] Z. Gong, W. Wang, and W.-S. Ku, “Adversarial and clean data are not twins,” 2017, arXiv:1704.04960. [Online]. Available: https://arxiv.org/abs/1704.04960

[72] D. Hendrycks and K. Gimpel, “Early methods for detecting adversarial images,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–9.

[73] X. Li and F. Li, “Adversarial examples detection in deep networks with convolutional filter statistics,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 5775–5783.

[74] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner, “Detecting adversarial samples from artifacts,” 2017, arXiv:1703.00410. [Online]. Available: https://arxiv.org/abs/1703.00410

[75] X. Ma et al., “Characterizing adversarial subspaces using local intrinsic dimensionality,” 2018, arXiv:1801.02613. [Online]. Available: https://arxiv.org/abs/1801.02613

[76] G. K. Dziugaite, Z. Ghahramani, and D. M. Roy, “A study of the effect of JPG compression on adversarial images,” 2016, arXiv:1608.00853. [Online]. Available: https://arxiv.org/abs/1608.00853

[77] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, “Adversarial examples for semantic segmentation and object detection,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1378–1387.

[78] S. Gu and L. Rigazio, “Towards deep neural network architectures robust to adversarial examples,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–9.

[79] A. S. Ross and F. Doshi-Velez, “Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients,” in Proc. AAAI Conf. Artif. Intell., 2018, pp. 1660–1669.

[80] H. Lee, S. Han, and J. Lee, “Generative adversarial trainer: Defense to adversarial perturbations with GAN,” 2017, arXiv:1705.03387. [Online]. Available: https://arxiv.org/abs/1705.03387

[81] S. Baluja and I. Fischer, “Adversarial transformation networks: Learning to generate adversarial examples,” 2017, arXiv:1703.09387. [Online]. Available: https://arxiv.org/abs/1703.09387

[82] J.-L. Zhang, G. Qu, Y.-Q. Lyu, and Q. Zhou, “A survey on silicon PUFs and recent advances in ring oscillator PUFs,” J. Comput. Sci. Technol., vol. 29, no. 4, pp. 664–678, Jul. 2014.

[83] J. Zhang, Y. Lin, Y. Lyu, and G. Qu, “A PUF-FSM binding scheme for FPGA IP protection and pay-per-device licensing,” IEEE Trans. Inf. Forensics Security, vol. 10, no. 6, pp. 1137–1150, Jun. 2015.

[84] J. Zhang, B. Qi, Z. Qin, and G. Qu, “HCIC: Hardware-assisted control-flow integrity checking,” IEEE Internet Things J., vol. 6, no. 1, pp. 458–471, Feb. 2019.

[85] J. Zhang and G. Qu, “Physical unclonable function-based key-sharing via machine learning for IoT security,” IEEE Trans. Ind. Electron., to be published. doi: 10.1109/TIE.2019.2938462.

[86] J. Lu, H. Sibai, E. Fabry, and D. Forsyth, “NO need to worry about adversarial examples in object detection in autonomous vehicles,” 2017, arXiv:1707.03501. [Online]. Available: https://arxiv.org/abs/1707.03501

[87] A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok, “Synthesizing robust adversarial examples,” 2017, arXiv:1707.07397. [Online]. Available: https://arxiv.org/abs/1707.07397
