Introduction

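I read this paper over the past couple of days and want to share some notes here. I only record what I could not find on other people's blogs, plus my own thoughts (PS: it gets annoying when every blog post you flip through says the same thing = =). To avoid repeating earlier work, here is a blog post I happened to come across, 权值简化（1）：三值神经网络（Ternary Weight Networks） (Weight Simplification (1): Ternary Weight Networks). It covers the paper and its implementation thoroughly and is worth reading; I borrow from it as well.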

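The paper's main contributions fall into three areas:

Greater expressive ability of the network. On top of the ternary values {-1, 0, 1}, a scaling factor $\alpha$ is added;
Smaller model size, mainly through weight compression. Compared with an FPWN (full precision weight network) the model is 16~32x smaller, but it is 2x the size of a BPWN (binary precision weight network) (PS: in the TWN Caffe code the weights are still stored as float or double, because the saving has to be realized at the hardware level);
Lower computational requirements, mainly from the zeros that a TWN has and a BPWN lacks. This gain also depends on hardware support, which the Caffe code does not provide.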

Ternary Quantization

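As I understand it, the core of the paper is this: a constrained optimization problem in which two variables depend on each other is broken down step by step and finally solved approximately with a statistical method that relies on a prior.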

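The original optimization problem (equation (1) in the paper):

$$\alpha^*, W^{t*} = \arg\min_{\alpha, W^t} J(\alpha, W^t) = \|W - \alpha W^t\|_2^2 \quad \text{s.t.}\ \alpha \ge 0,\ W_i^t \in \{-1, 0, 1\},\ i = 1, \dots, n \tag{1}$$

The constraint on $W^{t}$ is made concrete as a threshold function:

$$W_i^t = f_t(W_i \mid \Delta) = \begin{cases} +1, & W_i > \Delta \\ 0, & |W_i| \le \Delta \\ -1, & W_i < -\Delta \end{cases} \tag{3}$$

Substituting this into equation (1) turns the optimization of $W^{t*}$ into an optimization of $\Delta^*$:

$$\alpha^*, \Delta^* = \arg\min_{\alpha \ge 0,\, \Delta > 0} \left( |I_\Delta|\, \alpha^2 - 2 \Big( \sum_{i \in I_\Delta} |W_i| \Big) \alpha + c_\Delta \right) \tag{4}$$

where $I_\Delta = \{ i : |W_i| > \Delta \}$, $|I_\Delta|$ is the number of elements in it, and $c_\Delta$ is a term that does not depend on $\alpha$.

Taking the partial derivative of equation (4) with respect to $\alpha$ and setting it to zero gives:

$$\alpha_\Delta^* = \frac{1}{|I_\Delta|} \sum_{i \in I_\Delta} |W_i| \tag{5}$$

Because $\alpha$ and $\Delta$ depend on each other, substitute (5) into (4) to eliminate $\alpha$:

$$\Delta^* = \arg\max_{\Delta > 0} \frac{1}{|I_\Delta|} \Big( \sum_{i \in I_\Delta} |W_i| \Big)^2 \tag{6}$$

Here is the catch: equation (6) still cannot be solved directly. The paper therefore falls back on prior knowledge: assuming the $W_i$ follow $N(0,\sigma^2)$, the approximate $\Delta^*$ is $0.6\sigma$ (and $0.6\sigma$ is about $0.75E(|W|)$). The authors then take a rough rule of thumb and set $\Delta^*\approx0.7E(|W|)\approx\frac{0.7}{n}\sum_{i=1}^n|W_i|$.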
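As a quick sanity check on the $0.75$ figure, here is a short worked calculation under the same Gaussian assumption. The mean of $|W|$ for a zero-mean normal variable is a standard result that I am adding here; it is not taken from the post.

```latex
E(|W|) = \int_{-\infty}^{\infty} |w| \, \frac{1}{\sqrt{2\pi}\,\sigma} e^{-w^2 / (2\sigma^2)} \, dw
       = \sigma \sqrt{\frac{2}{\pi}} \approx 0.798\,\sigma
\qquad\Longrightarrow\qquad
\frac{0.6\,\sigma}{E(|W|)} \approx \frac{0.6}{0.798} \approx 0.75,
\qquad
0.7\,E(|W|) \approx 0.56\,\sigma
```

So the rule of thumb $0.7E(|W|)$ lands slightly below the Gaussian estimate $0.6\sigma$.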

//caffe-twns
//blob.cpp
template <typename Dtype>
void Blob<Dtype>::set_delta(){
  float scale_factor = TERNARY_DELTA * 1.0 / 10; // scale factor, 0.7 per the original comment
  Dtype delta = (Dtype) scale_factor * this->asum_data() / this->count(); // 0.7 * sum|W_i| / num = 0.7 * E|W_i|
  delta = (delta <= 100) ? delta : 100;   // clamp delta to [-100, 100]
  delta = (delta >= -100) ? delta : -100;
  this->delta_ = delta;
}

template <typename Dtype>
void Blob<Dtype>::set_delta(Dtype delta){
  delta = (delta <= 100) ? delta : 100;   // clamp a user-supplied delta to [-100, 100]
  delta = (delta >= -100) ? delta : -100;
  this->delta_ = delta;
}
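To make the threshold rule easier to try outside Caffe, here is a minimal standalone sketch. The function name ternary_threshold and the use of std::vector<double> are my own assumptions for illustration; they are not part of caffe-twns, and the clamping to [-100, 100] is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from caffe-twns): delta = factor * mean(|w_i|),
// i.e. the rule of thumb delta ~ 0.7 * E(|W|).
double ternary_threshold(const std::vector<double>& w, double factor = 0.7) {
  if (w.empty()) return 0.0;  // avoid dividing by zero on an empty tensor
  double abs_sum = 0.0;
  for (double v : w) abs_sum += std::fabs(v);  // same role as asum_data()
  return factor * abs_sum / static_cast<double>(w.size());
}
```

For example, the weights {0.5, -1.0, 0.25, -0.25} have mean absolute value 0.5, so the threshold comes out as 0.35.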

Implementation

I borrow a figure here; the step numbers below refer to it.

Steps 3 to 5 (the code for step 5 is shown above):

template <typename Dtype>
void Blob<Dtype>::ternarize_data(Phase phase){
  if(phase == RUN){
    // if(DEBUG) print_head();
    //LOG(INFO) << "RUN phase...";
    // caffe_sleep(3);
    return; // do nothing for the running phase
  }else if(phase == TRAIN){
    //LOG(INFO) << "TRAIN phase ...";
    // caffe_sleep(3);
  }else{
    //LOG(INFO) << "TEST phase ...";
    // caffe_sleep(3);
  }

  // const Dtype delta = 0; // default value;
  // const Dtype delta = (Dtype) 0.8 * this->asum_data() / this->count();
  this->set_delta();  // default 0.7 * E|W_i|, or set by user
  const Dtype delta = this->get_delta();
  Dtype alpha = 1;

  if (!data_) { return; }
  switch (data_->head()) {
  case SyncedMemory::HEAD_AT_CPU:
    {
      caffe_cpu_ternary<Dtype>(this->count(), delta, this->cpu_data(), this->mutable_cpu_binary()); // quantize weights to ternary
      alpha = caffe_cpu_dot(this->count(), this->cpu_binary(), this->cpu_data());  // sum of |W_i| over i in I_delta
      alpha /= caffe_cpu_dot(this->count(), this->cpu_binary(), this->cpu_binary()); // divide by |I_delta| (no epsilon guard here, unlike the GPU path)
      caffe_cpu_scale(this->count(), alpha, this->cpu_binary(), this->mutable_cpu_binary()); // store alpha * W^t
      // this->set_alpha(alpha);
    }
    return;
  case SyncedMemory::HEAD_AT_GPU:
  case SyncedMemory::SYNCED:
#ifndef CPU_ONLY
    {
      caffe_gpu_ternary<Dtype>(this->count(), delta, this->gpu_data(), this->mutable_gpu_binary());
      Dtype* pa = new Dtype(0);  // note: pa and pb are never deleted in this excerpt
      caffe_gpu_dot(this->count(), this->gpu_binary(), this->gpu_data(), pa);
      Dtype* pb = new Dtype(0);
      caffe_gpu_dot(this->count(), this->gpu_binary(), this->gpu_binary(), pb);
      alpha = (*pa) / ((*pb) + 1e-6);  // epsilon guards against |I_delta| == 0
      this->set_alpha(alpha);
      caffe_gpu_scale(this->count(), alpha, this->gpu_binary(), this->mutable_gpu_binary());
      // this->set_alpha((Dtype)1);
      // LOG(INFO) << "alpha = " << alpha;
      // caffe_sleep(3);
    }
    return;
#else
    NO_GPU;
#endif
  case SyncedMemory::UNINITIALIZED:
    return;
  default:
    LOG(FATAL) << "Unknown SyncedMemory head state: " << data_->head();
  }
}
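The CPU branch above boils down to three things: threshold the weights, average the magnitudes that survive, and scale. Below is a minimal sketch of the first two on a plain vector. TernaryResult and ternarize are hypothetical names of mine, not caffe-twns API, and I return alpha as 0 when nothing survives the threshold instead of dividing by zero.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the result: ternary codes plus the scaling factor.
struct TernaryResult {
  std::vector<int> codes;  // each entry is -1, 0 or +1
  double alpha;            // mean of |w_i| over the entries with |w_i| > delta
};

// Threshold function (3) followed by the closed-form alpha of equation (5).
TernaryResult ternarize(const std::vector<double>& w, double delta) {
  TernaryResult r{std::vector<int>(w.size(), 0), 0.0};
  double abs_sum = 0.0;
  std::size_t kept = 0;  // |I_delta|
  for (std::size_t i = 0; i < w.size(); ++i) {
    if (w[i] > delta) r.codes[i] = 1;
    else if (w[i] < -delta) r.codes[i] = -1;
    if (r.codes[i] != 0) {
      abs_sum += std::fabs(w[i]);
      ++kept;
    }
  }
  if (kept > 0) r.alpha = abs_sum / static_cast<double>(kept);
  return r;
}
```

With w = {0.9, -0.8, 0.1, -0.05, 0.4} and delta = 0.3, the codes are {1, -1, 0, 0, 1} and alpha is (0.9 + 0.8 + 0.4) / 3 = 0.7.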

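Steps 6 to 7. For step 6, caffe-twns simply uses the conventional Caffe approach. The form $Z=XW\approx X(\alpha W^t)=(\alpha X)\oplus W^t$ is geared more toward hardware acceleration (in caffe-twns the ternary weights are stored as float or double anyway, and BLAS or cuDNN acceleration cannot directly skip the zero values):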

//conv_layer.cpp
template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  // const Dtype* weight = this->blobs_[0]->cpu_data();
  if(BINARY){
    this->blobs_[0]->binarize_data();
  }
  if(TERNARY){
    this->blobs_[0]->ternarize_data(this->phase_);  // quantize blobs_[0] to ternary, stored in cpu_binary()
    /*
    Dtype alpha = (Dtype) this->blobs_[0]->get_alpha();
    for(int i=0; i<bottom.size(); i++){
      Blob<Dtype>* blob = bottom[i];
      caffe_cpu_scale(blob->count(), alpha, blob->cpu_data(), blob->mutable_cpu_data());
    }
    */
  }
  // use the quantized weights when BINARY or TERNARY is enabled
  const Dtype* weight = (BINARY || TERNARY) ? this->blobs_[0]->cpu_binary() : this->blobs_[0]->cpu_data();
  ...
}

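Steps 11 to 19: the weight update is applied to the full-precision weights, while the gradients are computed with the ternary weights. (Note that in the excerpt as quoted here, weight is read from cpu_data() and not from cpu_binary().)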

//conv_layer.cpp
template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
  for (int i = 0; i < top.size(); ++i) {
    ...
    if (this->param_propagate_down_[0] || propagate_down[i]) {
      for (int n = 0; n < this->num_; ++n) {
        // gradient w.r.t. weight. Note that we will accumulate diffs.
        if (this->param_propagate_down_[0]) {
          this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
              top_diff + n * this->top_dim_, weight_diff);
        }
        // gradient w.r.t. bottom data, if necessary.
        if (propagate_down[i]) {
          this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
              bottom_diff + n * this->bottom_dim_);
        }
      }
    }
  }
}
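The split between "quantized forward, full-precision update" is easier to see on a toy problem. The sketch below runs one SGD step on a single linear neuron with a squared-error loss. Everything in it (the function twn_sgd_step, the loss, the learning rate) is a hypothetical setup of mine for illustration and not the caffe-twns training loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One SGD step for y = x . (alpha * W^t) with loss 0.5 * (y - target)^2.
// The forward pass uses the ternary weights; the update is applied to the
// full-precision weights w. Returns the loss before the update.
double twn_sgd_step(std::vector<double>& w, const std::vector<double>& x,
                    double target, double lr) {
  const std::size_t n = w.size() < x.size() ? w.size() : x.size();
  // Threshold: delta = 0.7 * mean(|w_i|).
  double abs_sum = 0.0;
  for (double v : w) abs_sum += std::fabs(v);
  const double delta = w.empty() ? 0.0 : 0.7 * abs_sum / static_cast<double>(w.size());
  // Ternarize and compute alpha as the mean magnitude of the kept weights.
  std::vector<double> wt(w.size(), 0.0);
  double kept_sum = 0.0;
  std::size_t kept = 0;
  for (std::size_t i = 0; i < w.size(); ++i) {
    if (std::fabs(w[i]) > delta) {
      wt[i] = (w[i] > 0.0) ? 1.0 : -1.0;
      kept_sum += std::fabs(w[i]);
      ++kept;
    }
  }
  const double alpha = kept > 0 ? kept_sum / static_cast<double>(kept) : 0.0;
  // Forward pass with the quantized weights alpha * W^t.
  double y = 0.0;
  for (std::size_t i = 0; i < n; ++i) y += x[i] * alpha * wt[i];
  const double err = y - target;
  // The gradient w.r.t. the weight used in the forward pass is err * x[i];
  // it is applied directly to the full-precision copy.
  for (std::size_t i = 0; i < n; ++i) w[i] -= lr * err * x[i];
  return 0.5 * err * err;
}
```

With w = {0.9, -0.8, 0.1}, x = {1, 0, 0}, target = 0 and lr = 0.1, the forward pass sees alpha = 0.85 and codes {1, -1, 0}, so y = 0.85 and only w[0] moves, to 0.815. The small full-precision change accumulates over steps even when it is not yet large enough to flip a ternary code.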

