WHAT I READ FOR DEEP-LEARNING

Today, I spent some time on two new papers: one proposing a new way of training very deep neural networks (Highway Networks) and one proposing a new activation function for Auto-Encoders (ZERO-BIAS AUTOENCODERS AND THE BENEFITS OF CO-ADAPTING FEATURES) that avoids the use of regularization methods such as Contraction or Denoising.

Let's start with the first one. Highway Networks proposes a new activation type similar to LSTM networks, and the authors claim that this peculiar activation is robust to the choice of initialization scheme and to the learning problems that occur in very deep NNs. It is also exciting to see that they trained models with more than 100 layers. The basic intuition here is to learn a gating function, attached to a real activation function, that decides whether to pass the activation or the input itself. Here is the formulation:
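In short, the layer output mixes the transformed input with the raw input according to the gate (this is the standard Highway formulation, written here in the paper's notation):

y = H(x, WH) * T(x, Wt) + x * (1 - T(x, Wt))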

T(x,Wt) is the gating function and H(x,WH) is the real activation. In the paper they use a Sigmoid activation for the gate and a Rectifier for the normal activation. I also implemented it with Lasagne and tried to replicate the results (I aim to release the code later). It is really impressive to see its ability to learn with 50 layers (the most my PC can handle).
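To make the idea concrete, here is a minimal NumPy sketch of a single Highway layer forward pass (just an illustration with made-up names and shapes, not my Lasagne code or the authors' implementation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def highway_forward(x, W_H, b_H, W_T, b_T):
    # One Highway layer: y = H(x) * T(x) + x * (1 - T(x)).
    # x: (batch, dim); W_H, W_T: (dim, dim); b_H, b_T: (dim,).
    # Input and output dims must match so the carry path x * (1 - T) works.
    H = relu(x @ W_H + b_H)       # the "real" activation
    T = sigmoid(x @ W_T + b_T)    # the gate, between 0 and 1
    return H * T + x * (1.0 - T)

# toy usage: a deep stack would just apply this layer repeatedly
rng = np.random.default_rng(0)
batch, dim = 4, 8
x = rng.standard_normal((batch, dim))
W_H = rng.standard_normal((dim, dim)) * 0.1
W_T = rng.standard_normal((dim, dim)) * 0.1
b_H = np.zeros(dim)
b_T = np.full(dim, -1.0)          # negative gate bias initially favors carrying the input through
y = highway_forward(x, W_H, b_H, W_T, b_T)

The negative initial gate bias is part of what makes very deep stacks trainable: early in training most layers simply carry the input forward, and they only gradually learn to transform it.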

The other paper, ZERO-BIAS AUTOENCODERS AND THE BENEFITS OF CO-ADAPTING FEATURES, suggests using bias-free rectifier units for the inference stage of AEs. You can train your model with a biased Rectifier Unit, but at inference (test) time you should extract features by ignoring the bias term. They show that doing so gives better recognition on the CIFAR dataset. They also devise a new activation function with an intuition similar to Highway Networks: again, there is a gating unit that thresholds the normal activation function.
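To make the inference trick concrete, here is a minimal sketch (my own illustration with made-up shapes, not code from the paper):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# W and b stand for the weights and biases of an already-trained rectifier autoencoder.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256)) * 0.05   # 64-dim input, 256 hidden units (made-up sizes)
b = rng.standard_normal(256) * 0.01
x = rng.standard_normal(64)

h_train = relu(x @ W + b)   # activation as used during training
h_test  = relu(x @ W)       # features extracted at test time: the bias term is simply dropped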

The first equation is the threshold function with a predefined threshold (they use 1 in their experiments). The second equation shows the reconstruction of the proposed model. Note that they use the square of a linear activation for thresholding and call this model TLin, but they also use the plain linear activation, which is called TRec. What this activation does is diminish small activations, so the model is implicitly regularized without any additional regularizer. This is actually good for learning an over-complete representation of the given data.
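Spelled out (my reconstruction from the description above, with a = Wx the linear activation and θ the predefined threshold, so the exact notation may differ from the paper's):

TRec:  h = a  if a > θ,    else 0
TLin:  h = a  if a^2 > θ,  else 0

and the reconstruction is the usual linear decoder applied to h without any bias term (with tied weights, roughly x_hat = W^T h), which is what the "zero-bias" name refers to.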

For more than this silly intro, please refer to the papers, and let me know about any mistakes.

These two papers show an emerging trend in the Deep Learning community: the use of more complex activation functions. We could call it controlling each unit's behavior in a smart way instead of letting units fire naively. I agree with this idea, and I believe we will need even more sophistication for smart units in our deep models, as in Spike and Slab networks.

 
