Zhang H, Yu Y, Jiao J, et al. Theoretically Principled Trade-off between Robustness and Accuracy[J]. arXiv preprint arXiv:1901.08573, 2019.

@article{zhang2019theoretically,
  title={Theoretically Principled Trade-off between Robustness and Accuracy},
  author={Zhang, Hongyang and Yu, Yaodong and Jiao, Jiantao and Xing, Eric P and Ghaoui, Laurent El and Jordan, Michael I},
  journal={arXiv preprint arXiv:1901.08573},
  year={2019}
}

Starting from binary classification, the paper splits \(\mathcal{R}_{rob}\) into \(\mathcal{R}_{nat}\) and \(\mathcal{R}_{bdy}\), derives a loss function from an upper bound on \(\mathcal{R}_{rob}-\mathcal{R}_{nat}^*\), and generalizes this idea to general multi-class problems.

Main content

Notation

\(X, Y\): random variables;

\(x\in \mathcal{X}, y\): a sample and its label (\(+1, -1\));

\(f\): the classifier (e.g., a neural network);

\(\mathbb{B}(x, \epsilon)\): \(\{x'\in \mathcal{X}:\|x'-x\| \le \epsilon\}\);

\(\mathbb{B}(DB(f),\epsilon)\): \(\{x \in \mathcal{X}: \exists x'\in \mathbb{B}(x,\epsilon), \mathrm{s.t.} \: f(x)f(x')\le0\}\), the \(\epsilon\)-neighborhood of the decision boundary \(DB(f)=\{x\in\mathcal{X}: f(x)=0\}\);

\(\psi^*(v)\): \(\sup_u\{u^Tv-\psi(u)\}\), the conjugate function;

\(\phi\): surrogate loss.

Error

\[\tag{e.1}
\mathcal{R}_{rob}(f):= \mathbb{E}_{(X,Y)\sim \mathcal{D}}\mathbf{1}\{\exists X' \in \mathbb{B}(X, \epsilon), \mathrm{s.t.} \: f(X')Y \le 0\},
\]

where \(\mathbf{1}(\cdot)\) denotes the indicator function; clearly \(\mathcal{R}_{rob}(f)\) is the measure of the set of points that admit adversarial examples for the classifier \(f\).

\[\tag{e.2}
\mathcal{R}_{nat}(f) :=\mathbb{E}_{(X,Y)\sim \mathcal{D}}\mathbf{1}\{f(X)Y \le 0\},
\]

Clearly \(\mathcal{R}_{nat}(f)\) is the probability that \(f\) misclassifies a natural sample, and \(\mathcal{R}_{rob} \ge \mathcal{R}_{nat}\).

\[\tag{e.3}
\mathcal{R}_{bdy}(f) :=\mathbb{E}_{(X,Y)\sim \mathcal{D}}\mathbf{1}\{X \in \mathbb{B}(DB(f), \epsilon), \:f(X)Y > 0\},
\]

Clearly,

\[\tag{1}
\mathcal{R}_{rob}-\mathcal{R}_{nat}=\mathcal{R}_{bdy}.
\]
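As a quick numerical sanity check of (1) (a toy example of mine, not from the paper): take \(\mathcal{X}=\mathbb{R}\) and \(f(x)=x\), so \(DB(f)=\{0\}\), with labels that are a noisy sign of \(x\). An adversarial \(x'\) then exists exactly when the sample is already misclassified or lies within \(\epsilon\) of the boundary:

import torch

torch.manual_seed(0)
eps = 0.5
x = torch.randn(100000)                        # natural samples on the line
y = torch.sign(x + 0.5 * torch.randn(100000))  # noisy labels in {-1, +1}
fx = x                                         # classifier score f(x) = x

nat = (fx * y <= 0)            # f(X)Y <= 0: natural error (e.2)
near = (x.abs() <= eps)        # X in B(DB(f), eps)
rob = nat | near               # some x' in B(x, eps) has f(x')y <= 0 (e.1)
bdy = near & (fx * y > 0)      # boundary error (e.3)
# the two printed numbers coincide: R_rob - R_nat == R_bdy, matching (1)
print(rob.float().mean() - nat.float().mean(), bdy.float().mean())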

Because directly optimizing the \(0\text{-}1\) loss is intractable, one typically uses a surrogate loss \(\phi\); define:

\[\mathcal{R}_{\phi}(f):= \mathbb{E}_{(X, Y) \sim \mathcal{D}} \phi(f(X)Y), \qquad
\mathcal{R}^*_{\phi}:= \min_f \mathcal{R}_{\phi}(f).
\]
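For concreteness (my examples, not the paper's), two standard surrogates that also meet the condition \(\phi(0)\ge 1\) required by Theorem 3.1 below are the hinge and exponential losses:

\[
\phi_{\mathrm{hinge}}(u)=\max(0, 1-u), \qquad \phi_{\mathrm{exp}}(u)=e^{-u}, \qquad \phi_{\mathrm{hinge}}(0)=\phi_{\mathrm{exp}}(0)=1.
\]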

Classification-calibrated surrogate loss

This part is important but the exposition is brief; I have not fully worked through it and will come back to it after reading the cited papers.

Lemma 2.1

Theorem 3.1

Under Assumption 1 and \(\phi(0)\ge1\), for any measurable \(f:\mathcal{X} \rightarrow \mathbb{R}\), any probability distribution on \(\mathcal{X}\times \{\pm 1\}\), and any \(\lambda > 0\), we have

\[\begin{array}{ll}
& \mathcal{R}_{rob}(f) - \mathcal{R}_{nat}^* \\
\le & \psi^{-1}(\mathcal{R}_{\phi}(f)-\mathcal{R}_{\phi}^*) + \mathbf{Pr}[X \in \mathbb{B}(DB(f), \epsilon), f(X)Y >0] \\
\le & \psi^{-1}(\mathcal{R}_{\phi}(f)-\mathcal{R}_{\phi}^*) + \mathbb{E} \max_{X' \in \mathbb{B}(X, \epsilon)} \phi(f(X')f(X)/\lambda). \\
\end{array}
\]

The last inequality holds because, on the event \(\{X \in \mathbb{B}(DB(f), \epsilon), \, f(X)Y > 0\}\), there exists \(X' \in \mathbb{B}(X, \epsilon)\) with \(f(X)f(X') \le 0\); since \(\phi\) is nonincreasing and \(\phi(0)\ge1\), this gives \(\max_{X' \in \mathbb{B}(X,\epsilon)} \phi(f(X')f(X)/\lambda) \ge 1\), which dominates the indicator of that event.

Theorem 3.2

Combining Theorems 3.1 and 3.2 shows that this upper bound is tight.

The resulting TRADES algorithm

For the binary problem, we minimize the upper bound, i.e.:

\[
\min_f \mathbb{E} \Big\{ \phi(f(X)Y) + \max_{X' \in \mathbb{B}(X, \epsilon)} \phi(f(X)f(X')/\lambda) \Big\}.
\]

To extend this to multi-class problems, it suffices to replace \(\phi\) by a multi-class surrogate \(\mathcal{L}\) (e.g., cross-entropy):

\[
\min_f \mathbb{E} \Big\{ \mathcal{L}(f(X), Y) + \max_{X' \in \mathbb{B}(X, \epsilon)} \mathcal{L}(f(X), f(X'))/\lambda \Big\}.
\]

The algorithm (Algorithm 1 of the paper) roughly proceeds as follows: for each sample, initialize \(X' \leftarrow X + 0.001 \cdot \mathcal{N}(0, I)\), run \(K\) steps of projected gradient ascent on the inner surrogate within \(\mathbb{B}(X, \epsilon)\), then take a gradient step on the combined loss above.
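A minimal PyTorch sketch of one batch of this objective (the function name and hyperparameter values are mine; the KL-divergence form of the robust term follows the authors' released implementation):

import torch
import torch.nn.functional as F

def trades_loss(net, x, y, eps=0.3, eta=0.01, K=10, lam=1.0):
    # inner maximization: K steps of PGD on KL(f(X') || f(X)) within B(x, eps)
    p_nat = F.softmax(net(x), dim=1).detach()  # natural prediction, held fixed
    x_adv = x + 0.001 * torch.randn_like(x)    # small random start
    for _ in range(K):
        x_adv = x_adv.detach().requires_grad_(True)
        loss_kl = F.kl_div(F.log_softmax(net(x_adv), dim=1), p_nat,
                           reduction='batchmean')
        grad = torch.autograd.grad(loss_kl, x_adv)[0]
        x_adv = x_adv.detach() + eta * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto B(x, eps)
    # outer minimization: natural loss + robust regularization / lambda
    loss_nat = F.cross_entropy(net(x), y)
    loss_rob = F.kl_div(F.log_softmax(net(x_adv), dim=1), p_nat,
                        reduction='batchmean')
    return loss_nat + loss_rob / lam

Note that \(1/\lambda\) plays the role of the regularization weight: a large \(\lambda\) emphasizes natural accuracy, a small \(\lambda\) emphasizes robustness.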

Experiments overview

5.1: measures how large the gap in the theoretical upper bound is under this algorithm;

5.2: on MNIST and CIFAR10, studies the role of \(\lambda\): the larger \(\lambda\), the smaller \(\mathcal{R}_{nat}\) and the larger \(\mathcal{R}_{rob}\); the effect is more pronounced on CIFAR10;

5.3: compares different algorithms under various adversarial attacks;

5.4: results on the NIPS 2018 Adversarial Vision Challenge.

Code



import torch
import torch.nn as nn


def quireone(func):  # a decorator, to make optimizers easy to swap/configure
    def wrapper1(self, *args, **kwargs):
        def wrapper2(arg):
            result = func(self, arg, *args, **kwargs)
            return result
        wrapper2.__doc__ = func.__doc__
        wrapper2.__name__ = func.__name__
        return wrapper2
    return wrapper1


class AdvTrain:

    def __init__(self, eta, k, lam, net, lr=0.01, **kwargs):
        """
        :param eta: step size for adversarial attacks
        :param lr: learning rate
        :param k: number of iterations K in the inner optimization
        :param lam: lambda
        :param net: network
        :param kwargs: other configs for optim
        """
        kwargs.update({'lr': lr})
        self.net = net
        self.criterion = nn.CrossEntropyLoss()
        # self.optim(**kwargs) returns a builder; feeding it the parameters
        # constructs the optimizer (the decorator curries the configs)
        self.opti = self.optim(**kwargs)(self.net.parameters())
        self.eta = eta
        self.k = k
        self.lam = lam

    @quireone
    def optim(self, parameters, **kwargs):
        """
        quireone is the decorator defined above
        :param parameters: net.parameters()
        :param kwargs: other configs
        :return: an optimizer over the given parameters
        """
        return torch.optim.SGD(parameters, **kwargs)

    def normal_perturb(self, x, sigma=1.):
        # random start: X' = X + sigma * N(0, I)
        return x + sigma * torch.randn_like(x)

    @staticmethod
    def calc_jacobian(loss, inp):
        jacobian = torch.autograd.grad(loss, inp, retain_graph=True)[0]
        return jacobian

    @staticmethod
    def sgn(matrix):
        return torch.sign(matrix)

    def pgd(self, inp, y, perturb):
        # projected gradient ascent on the loss, kept inside B(inp, perturb)
        boundary_low = inp - perturb
        boundary_up = inp + perturb
        inp_new = inp.data
        for _ in range(self.k):
            inp_new = inp_new.detach().requires_grad_(True)
            out = self.net(inp_new)
            loss = self.criterion(out, y)
            # recompute the gradient at every step; computing it once outside
            # the loop would collapse the K iterations into a single step
            delta = self.sgn(self.calc_jacobian(loss, inp_new)) * self.eta
            inp_new = torch.clamp(
                inp_new.detach() + delta,
                boundary_low,
                boundary_up
            )
        return inp_new

    def ipgd(self, inps, ys, perturb):
        N = len(inps)
        adversarial_samples = []
        for i in range(N):
            inp_new = self.pgd(
                inps[[i]], ys[[i]],
                perturb
            )
            adversarial_samples.append(inp_new)
        return torch.cat(adversarial_samples)

    def train(self, trainloader, epoches=50, perturb=1, normal=1):
        for epoch in range(epoches):
            running_loss = 0.
            for i, data in enumerate(trainloader, 1):
                inps, labels = data
                adv_inps = self.ipgd(self.normal_perturb(inps, normal),
                                     labels, perturb)
                out1 = self.net(inps)
                out2 = self.net(adv_inps)
                loss1 = self.criterion(out1, labels)
                # note: the adversarial term here is plain cross-entropy on
                # the adversarial samples; self.lam is not used in this sum,
                # unlike the lambda-weighted TRADES objective above
                loss2 = self.criterion(out2, labels)
                loss = loss1 + loss2
                self.opti.zero_grad()
                loss.backward()
                self.opti.step()
                running_loss += loss.item()
                if i % 10 == 0:  # use ==, not "is", for integer comparison
                    strings = "epoch {0:<3} part {1:<5} loss: {2:<.7f}\n".format(
                        epoch, i, running_loss
                    )
                    print(strings)
                    running_loss = 0.
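As a quick, purely illustrative sanity run (random tensors standing in for a real dataset; the network and all hyperparameter values here are mine):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
dataset = TensorDataset(torch.randn(64, 1, 28, 28),
                        torch.randint(0, 10, (64,)))
trainloader = DataLoader(dataset, batch_size=16)

trainer = AdvTrain(eta=0.01, k=10, lam=1.0, net=net, lr=0.01)
trainer.train(trainloader, epoches=1, perturb=0.3, normal=0.001)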
