Learning Efficient Convolutional Networks through Network Slimming

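Introduction

This is the first paper on model compression that I have read, and it is probably one of the better-known ones. I have been interested in model compression for a long time, so I set aside some time to read it and implemented the code myself, which turned out to be fairly easy. The paper proposes a pruning method built around BN layers: the authors use the weights (scale factors) of each BN layer to score the input channels, filter out the channels whose score falls below a threshold, and leave the neurons of those low-scoring channels out of the connections. Pruning layer by layer in this way compresses the model.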
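To make the scoring idea concrete, here is a minimal sketch (not the authors' code) of why a BN scale works as a channel score: BN computes `weight * x_hat + bias` per channel, so a channel whose scale and shift are both zero outputs nothing. The scale values and the `0.1` threshold are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(4)
bn.eval()  # use running statistics so the output is deterministic

# Pretend training drove the scale of channel 2 to zero (hypothetical values).
with torch.no_grad():
    bn.weight.copy_(torch.tensor([0.9, 0.5, 0.0, 1.2]))
    bn.bias[2] = 0.0

x = torch.randn(1, 4, 8, 8)
with torch.no_grad():
    y = bn(x)

scores = bn.weight.detach().abs()   # one score per channel
keep = scores > 0.1                 # hypothetical threshold
print(keep.tolist())                # [True, True, False, True]
print(float(y[:, 2].abs().max()))   # 0.0, channel 2 carries no signal
```

Because channel 2 is identically zero, removing it (and the matching slices of the neighboring convolutions) should not change what the rest of the network sees.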

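Personally, I think the attention mechanisms that are popular now could also be used to score channels, which may be worth exploring, although it would certainly be task-specific. I plan to run my own experiments later on pruning models with attention.

Method

The method is illustrated in the figure, namely: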

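Given the proportion of channels to keep, sort the weights of all BN layers and record the ones that rank above the cutoff implied by that proportion.
Prune the BN layers first, i.e. discard the parameters whose weights fall below that cutoff.
Prune the convolutional layers. A convolution is usually followed by BN, so the BN layers before and after a convolution determine the size of its weight: drop the input and output channels of the convolution that correspond to the channels discarded in the neighboring BN layers.
Prune the FC layer.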
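The first two steps can be sketched roughly as follows. This is an illustrative sketch under my own assumptions, not the paper's reference code: `bn_keep_masks` is a hypothetical helper, `prune_ratio` is the fraction of channels to drop, and channels exactly at the threshold are kept.

```python
import torch
import torch.nn as nn


def bn_keep_masks(model, prune_ratio):
    """One boolean keep-mask per BatchNorm2d layer, from a single global threshold.

    prune_ratio is the fraction of channels to drop, with 0 <= prune_ratio < 1.
    """
    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    scales = torch.cat([m.weight.detach().abs() for m in bn_layers])
    sorted_scales, _ = torch.sort(scales)
    threshold = sorted_scales[int(len(sorted_scales) * prune_ratio)]
    # Assumption: channels exactly at the threshold are kept (>=).
    return [m.weight.detach().abs() >= threshold for m in bn_layers]


# Toy network with hand-set BN scales (hypothetical values).
model = nn.Sequential(
    nn.Conv2d(3, 2, 3), nn.BatchNorm2d(2), nn.ReLU(),
    nn.Conv2d(2, 4, 3), nn.BatchNorm2d(4), nn.ReLU(),
)
with torch.no_grad():
    model[1].weight.copy_(torch.tensor([0.9, 0.05]))
    model[4].weight.copy_(torch.tensor([0.8, 0.02, 0.7, 0.01]))

masks = bn_keep_masks(model, prune_ratio=0.5)
print([int(m.sum()) for m in masks])  # [1, 2]
```

Note that the threshold is global, so different layers can end up keeping very different numbers of channels.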

This is hard to explain clearly in words, but the code makes it all obvious.

Code

I implemented it myself.

import torch
import torch.nn as nn
from torchsummary import summary


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.convnet = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3),
            nn.BatchNorm2d(128),
            nn.ReLU()
        )
        # Four unpadded 3x3 convolutions shrink a 224x224 input to 216x216,
        # so this pooling acts as a global max pool.
        self.maxpool = nn.MaxPool2d(216)
        self.fc = nn.Linear(128, 3)

    def forward(self, x):
        x = self.convnet(x)
        x = self.maxpool(x)
        x = x.view(-1, x.size(1))
        return self.fc(x)


if __name__ == "__main__":
    net = Net()      # network whose BN weights decide what to keep
    net_new = Net()  # network that receives the pruned weights

    # net is untrained here, so every BN weight is 1 and the selection is
    # arbitrary; load trained weights into net for a meaningful result.
    # idxs[k] holds the channel indices kept after the k-th BN layer;
    # idxs[0] is the untouched RGB input.
    idxs = [torch.arange(3)]
    for module in net.modules():
        if isinstance(module, nn.BatchNorm2d):
            weight = module.weight.data
            n = int(0.8 * weight.size(0))  # keep 80% of the channels
            # Sort by magnitude, largest first, so the smallest scales are dropped.
            _, idx = torch.sort(weight.abs(), descending=True)
            idxs.append(torch.sort(idx[:n])[0])  # restore the channel order

    i = 1
    for src, dst in zip(net.modules(), net_new.modules()):
        if isinstance(src, nn.Conv2d):
            # Slice the output channels first, then the input channels.
            dst.weight.data = src.weight.data[idxs[i]][:, idxs[i - 1]].clone()
            dst.bias.data = src.bias.data[idxs[i]].clone()
        elif isinstance(src, nn.BatchNorm2d):
            dst.weight.data = src.weight.data[idxs[i]].clone()
            dst.bias.data = src.bias.data[idxs[i]].clone()
            dst.running_mean.data = src.running_mean.data[idxs[i]].clone()
            dst.running_var.data = src.running_var.data[idxs[i]].clone()
            i += 1
        elif isinstance(src, nn.Linear):
            # Only the input features of the FC layer shrink.
            dst.weight.data = src.weight.data[:, idxs[-1]].clone()
            dst.bias.data = src.bias.data.clone()

    summary(net_new, (3, 224, 224), device="cpu")
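One way to sanity-check this kind of channel slicing is to compare a "soft" pruned network (BN scale and shift zeroed) with the physically smaller one. The sketch below is my own illustration on a toy conv-BN-ReLU-conv block; the `block` helper and the `keep` indices are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def block(c_in, c_mid, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, bias=False), nn.BatchNorm2d(c_mid), nn.ReLU(),
        nn.Conv2d(c_mid, c_out, 3, bias=False))


big = block(3, 8, 4).eval()
keep = torch.tensor([0, 2, 3, 6])  # hypothetical channels that survive

# Step 1: "soft" prune, zero the BN scale and shift of the dropped channels.
drop = torch.ones(8, dtype=torch.bool)
drop[keep] = False
with torch.no_grad():
    big[1].weight[drop] = 0.0
    big[1].bias[drop] = 0.0

# Step 2: "hard" prune, copy the surviving slices into a narrower network.
small = block(3, len(keep), 4).eval()
with torch.no_grad():
    small[0].weight.copy_(big[0].weight[keep])      # output channels of conv 1
    small[1].weight.copy_(big[1].weight[keep])
    small[1].bias.copy_(big[1].bias[keep])
    small[1].running_mean.copy_(big[1].running_mean[keep])
    small[1].running_var.copy_(big[1].running_var[keep])
    small[3].weight.copy_(big[3].weight[:, keep])   # input channels of conv 2

x = torch.randn(2, 3, 16, 16)
with torch.no_grad():
    same = torch.allclose(big(x), small(x), atol=1e-5)
print(same)
```

The two networks should agree up to floating-point noise, because a zeroed BN channel stays zero after ReLU and contributes nothing to the next convolution.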

'''
This is a pruning example for VGG; the paper also describes slimming examples for other networks.
'''

import os

import argparse

import numpy as np

import torch

import torch.nn as nn

from torch.autograd import Variable

from torchvision import datasets, transforms

from torchvision.models import vgg19

from models import *

# Prune settings

parser = argparse.ArgumentParser(description='PyTorch Slimming CIFAR prune')

parser.add_argument('--dataset', type=str, default='cifar100',
                    help='training dataset (default: cifar100)')

parser.add_argument('--test-batch-size', type=int, default=256, metavar='N',

                    help='input batch size for testing (default: 256)')

parser.add_argument('--no-cuda', action='store_true', default=False,

                    help='disables CUDA training')

parser.add_argument('--depth', type=int, default=19,

                    help='depth of the vgg')

parser.add_argument('--percent', type=float, default=0.5,

                    help='scale sparse rate (default: 0.5)')

parser.add_argument('--model', default='', type=str, metavar='PATH',

                    help='path to the model (default: none)')

parser.add_argument('--save', default='', type=str, metavar='PATH',

                    help='path to save pruned model (default: none)')

args = parser.parse_args()

args.cuda = not args.no_cuda and torch.cuda.is_available()

# args.save defaults to '', and os.makedirs('') raises, so only create a real path
if args.save and not os.path.exists(args.save):
    os.makedirs(args.save)

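# torchvision's vgg19 takes no dataset/depth arguments; use the vgg builder from the local models package, as the pruned model below does
model = vgg(dataset=args.dataset, depth=args.depth)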

if args.cuda:

    model.cuda()

if args.model:

    if os.path.isfile(args.model):

        print("=> loading checkpoint '{}'".format(args.model))

        checkpoint = torch.load(args.model)

        args.start_epoch = checkpoint['epoch']

        best_prec1 = checkpoint['best_prec1']

        model.load_state_dict(checkpoint['state_dict'])

        print("=> loaded checkpoint '{}' (epoch {}) Prec1: {:f}"

              .format(args.model, checkpoint['epoch'], best_prec1))

    else:

        print("=> no checkpoint found at '{}'".format(args.model))

print(model)

total = 0
for m in model.modules():  # walk every module of the VGG
    if isinstance(m, nn.BatchNorm2d):  # found a BN layer
        total += m.weight.data.shape[0]  # channels in this BN layer; total sums them over all BN layers

bn = torch.zeros(total)
index = 0
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        size = m.weight.data.shape[0]
        bn[index:(index+size)] = m.weight.data.abs().clone()
        index += size  # collect the absolute weights of all BN layers into one vector

y, i = torch.sort(bn)  # sort the weights in ascending order
thre_index = int(total * args.percent)  # number of channels to prune
thre = y[thre_index]  # global threshold: channels whose |weight| is not above it are pruned

pruned = 0
cfg = []
cfg_mask = []
for k, m in enumerate(model.modules()):
    if isinstance(m, nn.BatchNorm2d):
        weight_copy = m.weight.data.abs().clone()
        # 1 where |weight| > thre, 0 otherwise; the mask stays on the weight's device,
        # so this also runs without CUDA
        mask = weight_copy.gt(thre).float()
        pruned = pruned + mask.shape[0] - torch.sum(mask)  # running count of pruned channels
        m.weight.data.mul_(mask)  # zero the scale of the pruned channels
        m.bias.data.mul_(mask)  # zero their shift as well
        cfg.append(int(torch.sum(mask)))  # number of channels kept in this BN layer
        cfg_mask.append(mask.clone())  # keep-mask of this BN layer
        print('layer index: {:d} \t total channel: {:d} \t remaining channel: {:d}'.
            format(k, mask.shape[0], int(torch.sum(mask))))
    elif isinstance(m, nn.MaxPool2d):
        cfg.append('M')

pruned_ratio = pruned/total  # fraction of channels pruned
print('Pre-processing Successful!')

# simple test model after Pre-processing prune (simple set BN scales to zeros)

def test(model):

    kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}

    if args.dataset == 'cifar10':

        test_loader = torch.utils.data.DataLoader(

            datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([

                transforms.ToTensor(),

                transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])),

            batch_size=args.test_batch_size, shuffle=True, **kwargs)

    elif args.dataset == 'cifar100':

        test_loader = torch.utils.data.DataLoader(

            datasets.CIFAR100('./data.cifar100', train=False, transform=transforms.Compose([

                transforms.ToTensor(),

                transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])),

            batch_size=args.test_batch_size, shuffle=True, **kwargs)

    else:

        raise ValueError("No valid dataset is given.")

    model.eval()
    correct = 0
    with torch.no_grad():  # replaces the deprecated Variable(data, volatile=True)
        for data, target in test_loader:
            if args.cuda:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    print('\nTest set: Accuracy: {}/{} ({:.1f}%)\n'.format(
        correct, len(test_loader.dataset), 100. * correct / len(test_loader.dataset)))
    return correct / float(len(test_loader.dataset))

acc = test(model)

# Make real prune

print(cfg)

newmodel = vgg(dataset=args.dataset, cfg=cfg)

if args.cuda:

    newmodel.cuda()

# Tensor.nelement() returns the number of elements in a tensor
num_parameters = sum([param.nelement() for param in newmodel.parameters()])  # e.g. a tensor of shape (20, 3, 3, 3) has 20*3*3*3 = 540 elements
# handy for counting the parameters of a model

savepath = os.path.join(args.save, "prune.txt")

with open(savepath, "w") as fp:

    fp.write("Configuration: \n"+str(cfg)+"\n")

    fp.write("Number of parameters: \n"+str(num_parameters)+"\n")

    fp.write("Test accuracy: \n"+str(acc))

layer_id_in_cfg = 0  # index of the current BN layer in cfg_mask
start_mask = torch.ones(3)  # the RGB input channels are all kept
end_mask = cfg_mask[layer_id_in_cfg]

for [m0, m1] in zip(model.modules(), newmodel.modules()):
    if isinstance(m0, nn.BatchNorm2d):
        # np.argwhere returns the indices of all non-zero mask entries, i.e. the kept channels;
        # squeeze turns the result into a 1-D vector
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        if idx1.size == 1:  # a single index squeezes to a scalar; make it a 1-element array again
            idx1 = np.resize(idx1, (1,))
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()  # copy the surviving channels into the pruned model
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1  # move on to the next BN layer
        start_mask = end_mask.clone()  # the output mask of this layer is the input mask of the next
        if layer_id_in_cfg < len(cfg_mask):  # do not change in Final FC
            end_mask = cfg_mask[layer_id_in_cfg]
    elif isinstance(m0, nn.Conv2d):  # prune the convolution
        # a convolution is followed by BN, so end_mask is that BN layer's keep-mask
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # keep only the surviving input channels
        w1 = w1[idx1.tolist(), :, :, :].clone()  # then the surviving output channels: the final weight
        m1.weight.data = w1.clone()
    elif isinstance(m0, nn.Linear):
        # only the input features of the FC layer shrink; its outputs are untouched
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        m1.weight.data = m0.weight.data[:, idx0.tolist()].clone()
        m1.bias.data = m0.bias.data.clone()

torch.save({'cfg': cfg, 'state_dict': newmodel.state_dict()}, os.path.join(args.save, 'pruned.pth.tar'))

print(newmodel)

model = newmodel

test(model)
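As a small aside on the parameter count written to prune.txt, the sketch below counts parameters the same way on a single convolution. `count_parameters` is a hypothetical helper and the layer sizes are chosen only to match the (20, 3, 3, 3) example in the comments above.

```python
import torch.nn as nn


def count_parameters(model):
    """Total number of elements across all parameters of a module."""
    return sum(p.numel() for p in model.parameters())


conv = nn.Conv2d(3, 20, kernel_size=3, bias=False)         # weight shape (20, 3, 3, 3)
print(count_parameters(conv))                               # 540

pruned_conv = nn.Conv2d(3, 12, kernel_size=3, bias=False)   # 8 of the 20 filters removed
print(count_parameters(pruned_conv))                        # 324
```

Each removed output channel saves 3*3*3 = 27 weights here, and in a deeper network it also shrinks the input side of the following convolution.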
