Reference: https://groups.google.com/forum/#!topic/theano-users/teA-07wOFpE

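The cause of this problem was how I read the files: Train_X should be read as a matrix (rows * dimensions), and Train_Y should be read as a vector (because the label has only one dimension).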
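To make the matrix/vector distinction concrete, here is a small sketch with made-up sizes (4 rows, 3 feature dimensions). The names `X`, `y`, and `y_column` are illustrative and do not come from the original code. It only shows that the feature array has two axes while the label array has one.

```python
import numpy

# Hypothetical sizes, for illustration only
rows, dims = 4, 3

X = numpy.zeros((rows, dims), dtype=numpy.float64)  # matrix: rows * dimensions
y = numpy.zeros(rows, dtype=numpy.int64)            # vector: one label per row

print(X.ndim, X.shape)  # 2 (4, 3)
print(y.ndim, y.shape)  # 1 (4,)

# A column matrix of shape (rows, 1) is not the same thing as a vector
y_column = y.reshape(rows, 1)
print(y_column.ndim, y_column.shape)  # 2 (4, 1)
```

Code that expects a vector of labels generally wants the `(rows,)` shape, not `(rows, 1)`.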



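import numpy

train_X_matrix = numpy.empty((train_rows, n_ins), numpy.float64)  # preallocate as a matrix
train_Y_matrix = []  # start as a list; the reason is explained below

# Read the floats on each line of the file into one row of the matrix
with open(train_X_path) as f:
    for rownum, line in enumerate(f):
        train_X_matrix[rownum] = numpy.asarray(line.strip('\n ').split(' '), dtype=float)

# Read the int on each line into the list, then convert the list to a vector
with open(train_Y_path) as f:
    for line in f:
        train_Y_matrix.append(int(line.strip()))
train_Y_matrix = numpy.asarray(train_Y_matrix)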



“You have the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. If you want to add rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly to build an array.”

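What does this mean? NumPy arrays are stored contiguously in memory, so every append copies the whole array into a new block of memory, which is far too slow when done repeatedly.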
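The copying can be observed directly. The following is a minimal sketch (the arrays `a` and `b` are made up for illustration) showing that `numpy.append` does not grow an array in place but returns a new one:

```python
import numpy

a = numpy.zeros((2, 2))
b = numpy.append(a, [[1.0, 2.0]], axis=0)  # returns a new, larger array

print(a.shape)  # (2, 2): the original array is unchanged
print(b.shape)  # (3, 2)
print(numpy.shares_memory(a, b))  # False: the data was copied into a new block
```

Calling `numpy.append` once per input line therefore copies all previously read rows on every iteration.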


The alternative is to preallocate the array at its final size and assign rows by index:

>>> import numpy
>>> a = numpy.zeros(shape=(5,2))
>>> a
array([[ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])
>>> a[0] = [1,2]
>>> a[1] = [2,3]
>>> a
array([[ 1., 2.],
       [ 2., 3.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])

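Preallocating like this still felt inconvenient to me, though, so I initialize an empty list, keep appending to it, and convert the whole thing to a NumPy array at the end, which is what the code above does.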
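As a self-contained sketch of the list-then-convert pattern, the example below reads labels from an in-memory stand-in for the label file. The `io.StringIO` object, its contents, and the names `label_file`, `labels`, and `train_Y` are assumptions made for illustration, not part of the original code.

```python
import io
import numpy

# Stand-in for the label file: one integer label per line (made-up data)
label_file = io.StringIO("0\n1\n1\n0\n2\n")

labels = []                      # plain Python list: cheap to append to
for line in label_file:
    line = line.strip()
    if line:                     # skip blank lines
        labels.append(int(line))

train_Y = numpy.asarray(labels)  # single conversion at the end
print(train_Y)        # [0 1 1 0 2]
print(train_Y.shape)  # (5,)
```

This works well because a Python list over-allocates as it grows, so `append` is amortized constant time, and the data is copied into a contiguous NumPy array only once.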



