Convolutional Neural Networks (CNN): No Theory, Just an Implementation
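0. Notes:
All the code in this post can be found in DML; stars are welcome.
Note: this CNN code is very slow and has essentially no practical use, so I am posting it only to show that I have actually implemented one.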
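1. Introduction:
CNN is a model that has been around for quite a few years. With the recent rise of deep learning it has gotten a second wind and pushed accuracy on the ImageNet benchmark up considerably. One milestone was Alex's work, and more recently Zeiler seems to have made another breakthrough. Unfortunately I have read neither, mainly because the ImageNet data is so large that I could never run it myself, which dampens my motivation to study them. Enough of that; let's first implement the most basic CNN: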
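2. Implementation:
The code basically follows the structure of DeepLearnToolbox and the paper Notes on Convolutional Neural Networks. You may want to read this code walkthrough first: 【面向代码】学习 Deep Learning(三)Convolution Neural Network(CNN)
I originally intended to follow the Notes paper exactly, but it turned out that once a sigmoid is added to the subsampling layers the whole network stops converging. I checked with numeric_grad_check and the gradient computation is correct, so I do not know why. I uploaded that version too (old), but the code below had to fall back to a slightly simplified version. It seems that CNN pooling (subsampling) layers usually have no sigmoid anyway, so I will leave it at that for now. The idea is simple to understand, yet writing it still cost me two afternoons... ouch...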
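The gradient check mentioned above can be sketched as follows. This is a minimal illustration on a toy one-unit sigmoid model, not the numeric_grad_check from DML; the helper name numeric_grad, the toy loss and the step size eps are all assumptions made for this example. It compares a central-difference estimate of the gradient with the analytic one.

```python
import numpy as np

def numeric_grad(f, w, eps=1e-5):
    """Central-difference estimate of df/dw for a scalar function f (hypothetical helper)."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f(w)
        w[idx] = old - eps
        f_minus = f(w)
        w[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy model (assumption): one sigmoid unit with squared-error loss
rng = np.random.RandomState(0)
x = rng.rand(3)
t = 1.0
w = rng.rand(3)

def loss(w):
    o = sigmoid(np.dot(w, x))
    return 0.5 * (o - t) ** 2

# Analytic gradient: (o - t) * o * (1 - o) * x
o = sigmoid(np.dot(w, x))
analytic = (o - t) * o * (1 - o) * x
numeric = numeric_grad(loss, w)
max_diff = np.max(np.abs(analytic - numeric))
```

If max_diff is tiny, the backward pass agrees with the loss. As the experience above shows, a passing check only says the gradient matches the loss; it does not guarantee that training converges.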
Code: DML/CNN/cnn.py
from __future__ import division
import numpy as np
import scipy as sp
from scipy.signal import convolve as conv
from dml.tool import sigmoid,expand,showimage
from numpy import rot90
'''
This implementation refers to DeepLearnToolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox)
and to [1]: "Notes on Convolutional Neural Networks", Jake Bouvrie, 2006 - how to implement CNNs.
I wanted to implement the network as described in [1], where the subsampling layers have a sigmoid,
but that version does not converge, even though it passes the gradient check!
(That version is dml/CNN/cnn.py.old; if you can figure out what is wrong in the code, PLEASE LET ME KNOW.)
In the end I changed the code back to the simple version and removed the sigmoid from the 's' layers.
PS: this Python code is too slow! Do not use it for anything except reading.
'''
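class LayerC:
    # types: 'i' = input layer, 'c' = convolution layer, 's' = subsampling (pooling) layer
    def __init__(self,types='i',out=0,scale=0,kernelsize=0):
        self.types=types
        self.a=None   # activations (feature maps), dict keyed by map index
        self.b=None   # biases
        self.d=None   # deltas (sensitivities)
        if (types=='i'):
            pass
        elif (types=='c'):
            self.out=out
            self.kernelsize=kernelsize
            self.k=None   # kernels, indexed as k[input map][output map]
        elif (types=='s'):
            self.scale=scale
            self.Beta={}
            self.dBeta={}

class CNNC: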
    def __init__(self,X,y,layers,opts):
        # X: (samples, height, width); y: (classes, samples)
        self.X=np.array(X)
        self.y=np.array(y)
        self.layers=layers
        self.opts=opts
        inputmap = 1
        mapsize = np.array(self.X[0].shape)
        for i in range(len(self.layers)):
            if self.layers[i].types=='s':
                # the map size must be divisible by the pooling scale
                assert np.all(mapsize % self.layers[i].scale == 0)
                mapsize = mapsize // self.layers[i].scale
                self.layers[i].b={}
                self.layers[i].db={}
                for j in range(inputmap):
                    self.layers[i].b.setdefault(j,0)
                    self.layers[i].db.setdefault(j,0)
                    self.layers[i].Beta.setdefault(j,1)
                    self.layers[i].dBeta.setdefault(j,0.0)
            if self.layers[i].types=='c':
                mapsize = mapsize - self.layers[i].kernelsize + 1
                fan_out = self.layers[i].out*self.layers[i].kernelsize**2
                self.layers[i].k={}
                self.layers[i].dk={}
                self.layers[i].b={}
                self.layers[i].db={}
                for j in range(self.layers[i].out):
                    fan_in = inputmap*self.layers[i].kernelsize**2
                    for t in range(inputmap):
                        # uniform init in [-r, r] with r = sqrt(6/(fan_in+fan_out))
                        self.layers[i].k.setdefault(t,{})
                        self.layers[i].k[t].setdefault(j)
                        self.layers[i].k[t][j]=(np.random.rand(self.layers[i].kernelsize,self.layers[i].kernelsize)-0.5)*2*np.sqrt(6/(fan_out+fan_in))
                        self.layers[i].dk.setdefault(t,{})
                        self.layers[i].dk[t].setdefault(j)
                        self.layers[i].dk[t][j]=np.zeros(self.layers[i].k[t][j].shape)
                    self.layers[i].b.setdefault(j,0)
                    self.layers[i].db.setdefault(j,0)
                inputmap=self.layers[i].out
            if self.layers[i].types=='i':
                pass
        # fully connected output layer on top of the flattened feature maps
        fvnum = int(np.prod(mapsize))*inputmap
        onum = self.y.shape[0]
        self.ffb=np.zeros((onum,1))
        self.ffW=(np.random.rand(onum, fvnum)-0.5)*2*np.sqrt(6/(onum+fvnum))

    def cnnff(self,x):
        # forward pass; x has shape (batch, height, width)
        self.layers[0].a={}
        self.layers[0].a.setdefault(0)
        self.layers[0].a[0]=x.copy()
        inputmap=1
        n=len(self.layers)
        for l in range(1,n):
            if self.layers[l].types=='s':
                for j in range(inputmap):
                    # mean pooling: average each scale x scale window, then keep every scale-th entry
                    temp=np.ones((self.layers[l].scale,self.layers[l].scale))/(self.layers[l].scale**2)
                    z=conv(self.layers[l-1].a[j],np.array([temp]), 'valid')
                    z=np.array(z)[:,::self.layers[l].scale,::self.layers[l].scale]
                    if self.layers[l].a is None:
                        self.layers[l].a={}
                    self.layers[l].a.setdefault(j)
                    self.layers[l].a[j] =z
            if self.layers[l].types=='c':
                if self.layers[l].a is None:
                    self.layers[l].a={}
                for j in range(self.layers[l].out): # for each output map
                    z = np.zeros(self.layers[l-1].a[0].shape - np.array([0,self.layers[l].kernelsize-1,self.layers[l].kernelsize-1]))
                    for i in range(inputmap): # accumulate over the input maps
                        z+=conv(self.layers[l-1].a[i],np.array([self.layers[l].k[i][j]]),'valid')
                    self.layers[l].a.setdefault(j)
                    self.layers[l].a[j]=sigmoid(z+self.layers[l].b[j])
                inputmap = self.layers[l].out
        # flatten the maps of the last layer into one feature vector per sample
        self.fv=None
        for j in range(len(self.layers[n-1].a)):
            sa=self.layers[n-1].a[j].shape
            p=self.layers[n-1].a[j].reshape(sa[0],sa[1]*sa[2]).copy()
            if self.fv is None:
                self.fv=p
            else:
                self.fv=np.concatenate((self.fv,p),axis=1)
        self.fv=self.fv.transpose()
        self.o=sigmoid(np.dot(self.ffW,self.fv) + self.ffb)

    def cnnbp(self,y):
        n=len(self.layers)
        self.e=self.o-y
        self.L=0.5*np.sum(self.e**2)/self.e.shape[1]
        self.od=self.e*(self.o*(1-self.o))   # output delta
        self.fvd=np.dot(self.ffW.transpose(),self.od)   # feature vector delta
        if self.layers[n-1].types=='c':
            self.fvd=self.fvd*(self.fv*(1-self.fv))
        # reshape the feature vector delta back into per-map deltas
        sa=self.layers[n-1].a[0].shape
        fvnum=sa[1]*sa[2]
        for j in range(len(self.layers[n-1].a)):
            if self.layers[n-1].d is None:
                self.layers[n-1].d={}
            self.layers[n-1].d.setdefault(j)
            self.layers[n-1].d[j]=self.fvd[(j*fvnum):((j+1)*fvnum),:].transpose().reshape(sa[0],sa[1],sa[2])
        for l in range(n-2,-1,-1):
            if self.layers[l].types=='c':
                for j in range(len(self.layers[l].a)):
                    if self.layers[l].d is None:
                        self.layers[l].d={}
                    self.layers[l].d.setdefault(j)
                    # upsample the delta of the following 's' layer and multiply by the sigmoid derivative
                    self.layers[l].d[j]=(self.layers[l].a[j]*(1-self.layers[l].a[j])*
                        np.kron(self.layers[l+1].d[j],np.ones(( self.layers[l+1].scale,self.layers[l+1].scale))/(self.layers[l+1].scale**2)))
            elif self.layers[l].types=='s':
                for j in range(len(self.layers[l].a)):
                    if self.layers[l].d is None:
                        self.layers[l].d={}
                    self.layers[l].d.setdefault(j)
                    z=np.zeros(self.layers[l].a[0].shape)
                    for i in range(len(self.layers[l+1].a)):
                        rotated=np.array([rot90(self.layers[l+1].k[j][i],2)])
                        z=z+conv(self.layers[l+1].d[i],rotated,'full')
                    self.layers[l].d[j]=z
        # gradients of the kernels and biases, averaged over the batch
        for l in range(1,n):
            m=self.layers[l].d[0].shape[0]
            if self.layers[l].types=='c':
                for j in range(len(self.layers[l].a)):
                    for i in range(len(self.layers[l-1].a)):
                        #self.layers[l].dk[i][j]=rot90(conv(self.layers[l-1].a[i],rot90(self.layers[l].d[j],2),'valid'),2)
                        self.layers[l].dk[i][j]=self.layers[l].dk[i][j]*0
                        for t in range(self.layers[l].d[0].shape[0]):
                            self.layers[l].dk[i][j]+=rot90(conv(self.layers[l-1].a[i][t],rot90(self.layers[l].d[j][t],2),'valid'),2)
                        self.layers[l].dk[i][j]=self.layers[l].dk[i][j]/m
                    self.layers[l].db[j]=np.sum(self.layers[l].d[j])/m
        self.dffW=np.dot(self.od,self.fv.transpose())/self.od.shape[1]
        self.dffb = np.mean(self.od,1).reshape(self.ffb.shape)

    def cnnapplygrads(self,alpha=0.1):
        # plain gradient descent step with learning rate alpha
        n=len(self.layers)
        for l in range(1,n):
            if self.layers[l].types=='c':
                for j in range(len(self.layers[l].a)):
                    for i in range(len(self.layers[l-1].a)):
                        self.layers[l].k[i][j]-=alpha*self.layers[l].dk[i][j]
                    self.layers[l].b[j]-=alpha*self.layers[l].db[j]
        self.ffW-=alpha*self.dffW
        self.ffb-=alpha*self.dffb

    def train(self):
        m=self.X.shape[0]
        batchsize=self.opts['batchsize']
        numbatches = m//batchsize   # integer division so that range() accepts it
        print(numbatches)
        self.rL = []
        for i in range(self.opts['numepochs']):
            print('the %d -th epoch is running'% (i+1))
            kk=np.random.permutation(m)
            for j in range(numbatches):
                print('the %d -th batch is running , totally %d batchs'% ((j+1),numbatches))
                batch_x=self.X[kk[(j)*batchsize:(j+1)*batchsize],:,:].copy()
                batch_y=self.y[:,kk[(j)*batchsize:(j+1)*batchsize]].copy()
                self.cnnff(batch_x)
                self.cnnbp(batch_y)
                self.cnnapplygrads(alpha=self.opts['alpha'])
                # exponentially smoothed loss, weights 0.99/0.01 as in DeepLearnToolbox
                if len(self.rL)==0:
                    self.rL.append(self.L)
                else:
                    p=self.rL[len(self.rL)-1]
                    self.rL.append(p*0.99+0.01*self.L)
                print(self.L)

    def gradcheck(self,test_x,test_y):
        # this part of the code is available on GitHub
        pass

    def test(self,test_x,test_y):
        self.cnnff(np.array(test_x))
        p=self.o.argmax(axis=0)
        bad= np.sum(p!=np.array(test_y).argmax(axis=0))
        print('%s %s' % (p,np.array(test_y).argmax(axis=0)))
        print(bad)
        print(np.array(test_y).shape[1])
        er=bad/np.array(test_y).shape[1]   # error rate
        print(er)

    def pred(self,test_x):
        self.cnnff(np.array(test_x))
        p=self.o.argmax(axis=0)
        return p
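The 's' layers in the code above do mean pooling by convolving with a constant kernel and then striding, and the backward pass spreads each pooled delta back with np.kron. The following is a small standalone sketch of those two steps on a toy array, not part of DML; the array sizes and the reshape-based reference computation are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import convolve

scale = 2
# Toy activations with shape (batch, height, width)
a = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)

# Forward: average each scale x scale window by convolution, then keep every scale-th entry
kernel = np.ones((1, scale, scale)) / scale ** 2
pooled = convolve(a, kernel, 'valid')[:, ::scale, ::scale]

# Reference: the same block means computed by reshaping into non-overlapping blocks
pooled_ref = a.reshape(2, 2, scale, 2, scale).mean(axis=(2, 4))

# Backward: spread each pooled delta evenly over its scale x scale window
delta = np.ones_like(pooled)
upsampled = np.kron(delta, np.ones((scale, scale)) / scale ** 2)
```

Here pooled and pooled_ref agree, and upsampled has the same shape as a with the total delta preserved. The reshape variant avoids the convolution call entirely, which may help with speed, though that has not been measured here.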
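3. Testing
Python runs this really slowly. I think the main reason is that the convolution function (I use scipy.signal.convolve) is much slower than the one in MATLAB, so on MNIST one epoch of SGD with a batch size of 50 takes twenty to thirty minutes. I therefore recommend not using this code; even DeepLearnToolbox is faster than this...
Test it with: test/CNN_test/test_cnn.py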
# X, groundTruth, test_x and test_groundTruth are assumed to be already loaded (not shown here)
layers=[LayerC('i'),
        LayerC('c',out=6,kernelsize=5),
        LayerC('s',scale=2),
        LayerC('c',out=12,kernelsize=5),
        LayerC('s',scale=2)]
opts={}
opts['batchsize']=40
opts['numepochs']=1
opts['alpha']=1
a=CNNC(X,groundTruth,layers,opts)
#a.gradcheck(test_x[1:3,:,:],test_groundTruth[:,1:3])
a.train()
a.test(test_x,test_groundTruth)
This is the result after one epoch: 89.99% accuracy, which seems about right.
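The test script passes the labels as a matrix with one column per sample, since CNNC reads the number of outputs from y.shape[0] and slices batches as y[:, idx]. Below is a minimal sketch of building such a matrix from integer labels, followed by the feature-map size arithmetic for the layer list above. The helper name one_hot, the sample labels and the 28x28 input size (MNIST) are assumptions for illustration, not taken from test_cnn.py.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (num_classes, num_samples) 0/1 matrix (hypothetical helper)."""
    labels = np.asarray(labels)
    out = np.zeros((num_classes, labels.size))
    out[labels, np.arange(labels.size)] = 1.0
    return out

labels = np.array([3, 0, 9])           # made-up labels for three samples
groundTruth = one_hot(labels, 10)      # shape (10, 3)
recovered = groundTruth.argmax(axis=0) # what CNNC.test compares against

# Map size through the layers c(5) -> s(2) -> c(5) -> s(2), assuming 28x28 inputs
size = 28
for kind, arg in [('c', 5), ('s', 2), ('c', 5), ('s', 2)]:
    size = size - arg + 1 if kind == 'c' else size // arg
fvnum = size * size * 12               # 12 output maps in the last 'c' layer
```

Under these assumptions the maps shrink 28, 24, 12, 8, 4, so the fully connected layer sees 4*4*12 = 192 features per sample.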