https://github.com/lijingpeng/kaggle/tree/master/competitions/image_recognize

Recognizing characters in Google Street View images

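The street-view-getting-started-with-julia competition asks us to identify characters in images taken from Google Street View. The competition is meant as a way to learn and use Julia, which combines the ease of use of Python and R with speed close to C. I am not very familiar with Julia, though, so I decided to try the task in Python instead.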

import cv2
import numpy as np
import sys
import pandas as pd

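We want all images stored in a single NumPy matrix in which each row holds the pixel values of one image. To get a uniform representation, we average the three colour channels and use the resulting grayscale image to represent each picture: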

# typeData is either "train" or "test"
# labelsInfo holds the ID of every image
# the images are stored in the trainResized and testResized folders
def read_data(typeData, labelsInfo, imageSize):
    labelsIndex = labelsInfo["ID"]
    x = np.zeros((np.size(labelsIndex), imageSize))
    for idx, idImage in enumerate(labelsIndex):
        # build the file name and read the image
        nameFile = typeData + "Resized/" + str(idImage) + ".Bmp"
        img = cv2.imread(nameFile)
        # convert to grayscale by averaging the colour channels
        temp = np.mean(img, 2)
        # flatten the image into a row vector
        x[idx, :] = np.reshape(temp, (1, imageSize))
    return x
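The channel averaging and flattening step can be checked in isolation. The following is a minimal sketch on a synthetic image, so it needs no files on disk; the helper name `to_gray_row` and the 20 x 20 size (chosen to match `imageSize = 400`) are assumptions made for illustration.

```python
import numpy as np

def to_gray_row(img):
    """Average the colour channels of an H x W x 3 image and flatten it to 1-D."""
    gray = img.mean(axis=2)      # H x W grayscale image
    return gray.reshape(-1)      # row vector of length H * W

# Synthetic 20 x 20 colour image standing in for one file from trainResized/
rng = np.random.RandomState(0)
fake_img = rng.randint(0, 256, size=(20, 20, 3)).astype(np.uint8)

row = to_gray_row(fake_img)
print(row.shape)
```

Because the mean is taken over all three channels, the result does not depend on whether the loader returns them in RGB or BGR order.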

Preprocess the training and test sets

imageSize = 400
trainlabels = pd.read_csv("trainLabels.csv")
testlabels = pd.read_csv("sampleSubmission.csv")
# build the training-set features
xTrain = read_data('train', trainlabels, imageSize)
# build the test-set features
xTest = read_data("test", testlabels, imageSize)

Preview the data:

print(trainlabels.head(2))
print(testlabels.head(2))

   ID Class
0   1     n
1   2     8
     ID Class
0  6284     A
1  6285     A

yTrain = trainlabels["Class"]
# encode each character label as its integer code point
yTrain = [ord(x) for x in yTrain]
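The labels are turned into integers with `ord` before training and turned back into characters with `chr` after prediction. A small sketch of that round trip follows; the three sample labels come from the preview above, while the assumption that the full label set is the digits plus upper- and lower-case letters is mine (it gives 62 symbols, which matches the 62 output units used in the neural network later in the post).

```python
import string

labels = ["n", "8", "A"]                 # sample labels from the preview
codes = [ord(c) for c in labels]         # integer class ids fed to the model
decoded = [chr(v) for v in codes]        # back to characters for the submission

# Assumed label alphabet: 10 digits + 52 letters
alphabet = string.digits + string.ascii_letters

print(codes)
print(decoded == labels)
print(len(alphabet))
```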

Model training

Random forest

We train a random forest; the number of trees and their depth need repeated tuning to find the best values.

from sklearn.ensemble import RandomForestClassifier
# note: %time here only times the constructor call, not the fit
%time rfc = RandomForestClassifier(n_estimators = 500, max_features = 50, max_depth=None)
rfc.fit(xTrain, yTrain)

CPU times: user 121 µs, sys: 367 µs, total: 488 µs
Wall time: 494 µs
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features=50, max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

Prediction

Apply the trained model to the test set and save the result:

predTest = rfc.predict(xTest)
predResult = [chr(x) for x in predTest]
testlabels["Class"] = predResult
testlabels.to_csv("rf_500_50_result.csv",index = None)

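Results

With 50 trees, the Kaggle submission scored an accuracy of about 0.40.

With 300 trees, the accuracy was 0.46695.

With 500 trees and a maximum depth of 10, the accuracy was 0.40. I suspected overfitting, although a depth cap limits model capacity, so underfitting is the more likely cause.

With 500 trees and no depth limit, the accuracy was 0.47480.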
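Each setting above was scored by submitting to Kaggle. A cheaper way to compare settings locally is the random forest's out-of-bag (OOB) estimate. The sketch below is not the code used for the results above: it runs on scikit-learn's bundled digits dataset as a stand-in for the street-view features, and the small grid of tree counts and depths is chosen only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: 8 x 8 digit images, already flattened to 64 features per row
X, y = load_digits(return_X_y=True)

results = {}
for n_trees in (50, 100):
    for depth in (5, None):
        rf = RandomForestClassifier(n_estimators=n_trees, max_depth=depth,
                                    oob_score=True, random_state=0)
        rf.fit(X, y)
        # accuracy on the samples each tree did not see during bootstrapping
        results[(n_trees, depth)] = rf.oob_score_

best = max(results, key=results.get)
for setting, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(setting, round(score, 4))
print("best setting:", best)
```

With the real data, `X` and `y` would be replaced by `xTrain` and `yTrain`; the OOB score is only an estimate and may differ from the leaderboard score.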

Naive Bayes

from sklearn.naive_bayes import GaussianNB as GNB
model_GNB = GNB()
model_GNB.fit(xTrain, yTrain)
predTest = model_GNB.predict(xTest)
predResult = [chr(x) for x in predTest]
testlabels["Class"] = predResult
testlabels.to_csv("gnb_result.csv",index = None)

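Naive Bayes trains very quickly, but the Kaggle submission scored only 0.02389, far below the random forest and barely above the roughly 1/62 ≈ 0.016 expected from random guessing over 62 classes.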

GBDT

from sklearn.ensemble import GradientBoostingClassifier
%time GBDT = GradientBoostingClassifier(loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, \
    min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, init=None, \
    random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='auto')
%time GBDT.fit(xTrain, yTrain)
%time predTest = GBDT.predict(xTest)
predResult = [chr(x) for x in predTest]
testlabels["Class"] = predResult
testlabels.to_csv("gbdt_result.csv",index = None)

CPU times: user 91 µs, sys: 738 µs, total: 829 µs
Wall time: 2.93 ms
CPU times: user 40min 16s, sys: 52.3 s, total: 41min 9s
Wall time: 2h 55min 22s
CPU times: user 1.75 s, sys: 44.5 ms, total: 1.8 s
Wall time: 1.79 s

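GBDT reached an accuracy of only 0.31937. I may not have tuned the default parameters well; the bigger problem is that GBDT takes very long to train (about 2 h 55 min of wall time for the fit above), which makes tuning expensive.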
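One way to lower the tuning cost is to fit once and then score every intermediate number of boosting stages with `staged_predict`, instead of refitting for each value of `n_estimators`. The sketch below uses scikit-learn's bundled digits dataset and a hold-out split as stand-ins; the split ratio and the choice of 40 stages are assumptions for illustration, not settings from this post.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in data and a hold-out split for local validation
X, y = load_digits(return_X_y=True)
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# A single fit with the largest number of stages we want to consider
gbdt = GradientBoostingClassifier(n_estimators=40, max_depth=3,
                                  learning_rate=0.1, random_state=0)
gbdt.fit(X_fit, y_fit)

# staged_predict yields the hold-out predictions after 1, 2, ..., 40 stages
val_acc = [float((pred == y_val).mean())
           for pred in gbdt.staged_predict(X_val)]
best_n = val_acc.index(max(val_acc)) + 1

print("best number of stages:", best_n)
print("hold-out accuracy:", round(max(val_acc), 4))
```

Other parameters such as `max_depth` or `learning_rate` still require a refit, so this only removes one dimension of the search.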

Neural network

import os
from skimage.io import imread
from lasagne import layers
from lasagne.nonlinearities import softmax
from nolearn.lasagne import NeuralNet, BatchIterator

# Define functions
def read_datax(typeData, labelsInfo, imageSize, path):
    x = np.zeros((labelsInfo.shape[0], imageSize))
    for (index, idImage) in enumerate(labelsInfo['ID']):
        # use specially created 32 x 32 images
        nameFile = '{0}/{1}Resized32/{2}.Bmp'.format(path,
                    typeData, idImage)
        # as_grey is the older scikit-image spelling (as_gray in newer releases)
        img = imread(nameFile, as_grey = True)
        x[index, :] = np.reshape(img, (1, imageSize))
    return x

def fit_model(reshaped_train_x, y, image_width,
              image_height, reshaped_test_x):
    net = NeuralNet(
        layers = [
            ('input', layers.InputLayer),
            ('conv1', layers.Conv2DLayer),
            ('pool1', layers.MaxPool2DLayer),
            ('dropout1', layers.DropoutLayer),
            ('conv2', layers.Conv2DLayer),
            ('pool2', layers.MaxPool2DLayer),
            ('dropout2', layers.DropoutLayer),
            ('conv3', layers.Conv2DLayer),
            ('hidden4', layers.DenseLayer),
            ('output', layers.DenseLayer),
        ],
        input_shape = (None, 1, 32, 32),
        conv1_num_filters=32, conv1_filter_size=(5, 5),
        pool1_pool_size=(2, 2),
        dropout1_p=0.2,
        conv2_num_filters=64, conv2_filter_size=(5, 5),
        pool2_pool_size=(2, 2),
        dropout2_p=0.2,
        conv3_num_filters = 128, conv3_filter_size = (5, 5),
        hidden4_num_units=500,
        output_num_units = 62, output_nonlinearity = softmax,
        update_learning_rate = 0.01,
        update_momentum = 0.9,
        batch_iterator_train = BatchIterator(batch_size = 100),
        batch_iterator_test = BatchIterator(batch_size = 100),
        use_label_encoder = True,
        regression = False,
        max_epochs = 100,
        verbose = 1,
    )
    net.fit(reshaped_train_x, y)
    prediction = net.predict(reshaped_test_x)
    return prediction

# Preprocess the data; the images must first be saved as small 32 x 32 pictures
imageSize = 1024 # 32 x 32
image_width = image_height = int(imageSize ** 0.5)

# the CSV files are read from the current directory
labelsInfoTrain = pd.read_csv('trainLabels.csv')
labelsInfoTest = pd.read_csv('sampleSubmission.csv')

# Load dataset
nnxTrain = read_datax('train', labelsInfoTrain, imageSize, '.')
nnxTest = read_datax('test', labelsInfoTest, imageSize, '.')

# encode the character labels as integer code points
nnyTrain = np.array([ord(c) for c in labelsInfoTrain['Class']])

# Normalize the data
nnxTrain /= nnxTrain.std(axis = None)
nnxTrain -= nnxTrain.mean()
nnxTest /= nnxTest.std(axis = None)
nnxTest -= nnxTest.mean()

# Reshape data
train_x_reshaped = nnxTrain.reshape(nnxTrain.shape[0], 1,
                   image_height, image_width).astype('float32')
test_x_reshaped = nnxTest.reshape(nnxTest.shape[0], 1,
                  image_height, image_width).astype('float32')

# Train and predict
predict = fit_model(train_x_reshaped, nnyTrain, image_width, image_height, test_x_reshaped)
# Neural Network with 352586 learnable parameters

## Layer information

  #  name      size
---  --------  --------
  0  input     1x32x32
  1  conv1     32x28x28
  2  pool1     32x14x14
  3  dropout1  32x14x14
  4  conv2     64x10x10
  5  pool2     64x5x5
  6  dropout2  64x5x5
  7  conv3     128x1x1
  8  hidden4   500
  9  output    62

  epoch    trn loss    val loss    trn/val    valid acc  dur
-------  ----------  ----------  ---------  -----------  ------
      1     4.08201     4.01012    1.01793      0.07254  16.55s
      2     3.87688     3.84326    1.00875      0.04836  17.72s
      3     3.82788     3.79976    1.00740      0.04914  16.58s
      4     3.78741     3.78872    0.99965      0.07254  16.14s
      5     3.78030     3.78600    0.99850      0.07254  16.37s
      6     3.77679     3.78520    0.99778      0.07254  16.56s
      7     3.77487     3.78537    0.99723      0.07254  16.30s
      8     3.77411     3.78468    0.99721      0.07254  16.51s
      9     3.77257     3.78518    0.99667      0.07254  15.92s
     10     3.77202     3.78459    0.99668      0.07254  16.55s
     11     3.76948     3.78458    0.99601      0.07254  16.25s
     12     3.76882     3.78414    0.99595      0.07254  16.31s
     13     3.76717     3.78411    0.99552      0.07254  15.70s
     14     3.76606     3.78469    0.99508      0.07254  16.04s
     15     3.76419     3.78671    0.99405      0.07176  15.70s
     16     3.76277     3.78392    0.99441      0.07176  16.05s
     17     3.76014     3.78821    0.99259      0.07176  15.71s
     18     3.78179     3.78606    0.99887      0.07254  16.11s
     19     3.76928     3.78321    0.99632      0.07254  15.75s
     20     3.76688     3.78358    0.99559      0.07254  16.05s
     21     3.76434     3.78255    0.99519      0.07254  17.36s
     22     3.76186     3.78174    0.99474      0.07254  18.12s
     23     3.75829     3.78184    0.99377      0.07878  17.90s
     24     3.75370     3.78545    0.99161      0.07488  18.19s
     25     3.74749     3.77908    0.99164      0.07098  17.81s
     26     3.73650     3.77806    0.98900      0.07020  18.08s
     27     3.71592     3.77626    0.98402      0.06474  18.03s
     28     3.67805     3.74531    0.98204      0.07176  18.04s
     29     3.59550     3.79802    0.94668      0.07566  18.12s
     30     3.44086     3.35483    1.02564      0.19111  18.06s
     31     3.14160     3.00021    1.04713      0.29251  17.41s
     32     2.73389     2.89130    0.94556      0.31903  16.19s
     33     2.61587     2.53098    1.03354      0.38144  15.73s
     34     2.25316     2.26086    0.99660      0.43994  16.14s
     35     1.95499     2.03661    0.95993      0.48206  15.76s
     36     1.75483     1.94987    0.89997      0.49610  16.01s
     37     1.60276     1.78637    0.89722      0.52106  15.60s
     38     1.47862     1.73524    0.85211      0.54524  15.98s
     39     1.35049     1.65705    0.81500      0.55694  15.62s
     40     1.27458     1.65253    0.77129      0.57254  16.01s
     41     1.18548     1.60550    0.73839      0.58112  15.61s
     42     1.11862     1.62259    0.68940      0.58268  16.51s
     43     1.05698     1.68044    0.62899      0.58112  16.24s
     44     1.01350     1.64642    0.61558      0.59126  16.50s
     45     0.93587     1.62059    0.57749      0.59906  15.81s
     46     0.87893     1.65983    0.52953      0.59984  16.54s
     47     0.83695     1.66309    0.50325      0.60452  16.42s
     48     1.72887     2.92194    0.59169      0.54446  16.31s
     49     3.85830     3.39520    1.13640      0.21373  15.84s
     50     2.26598     1.97743    1.14592      0.46724  18.41s
     51     2.11105     1.89927    1.11150      0.49298  18.02s
     52     1.66393     1.75705    0.94700      0.51794  17.99s
     53     1.48332     1.65795    0.89467      0.54212  17.94s
     54     1.38197     1.60296    0.86214      0.55928  17.73s
     55     1.28419     1.56050    0.82293      0.56318  17.94s
     56     1.21078     1.54983    0.78123      0.57176  17.70s
     57     1.13885     1.55330    0.73318      0.55616  17.93s
     58     1.10488     1.53462    0.71997      0.57956  17.71s
     59     1.03479     1.54234    0.67092      0.58502  17.70s
     60     0.98439     1.52492    0.64554      0.59984  17.95s
     61     0.93277     1.49128    0.62548      0.59204  17.67s
     62     1.03055     1.58280    0.65109      0.57878  18.01s
     63     0.89008     1.54904    0.57460      0.59750  17.69s
     64     0.83698     1.59463    0.52487      0.58346  17.92s
     65     0.79801     1.59534    0.50021      0.60452  17.80s
     66     0.77752     1.56702    0.49618      0.60842  17.91s
     67     0.73901     1.61821    0.45668      0.59594  17.81s
     68     0.71108     1.56703    0.45377      0.61154  17.98s
     69     0.67279     1.61497    0.41659      0.61154  17.81s
     70     0.64651     1.66452    0.38841      0.60530  17.97s
     71     0.61597     1.65828    0.37145      0.62012  17.84s
     72     0.59188     1.69796    0.34858      0.60296  17.92s
     73     0.57862     1.72392    0.33564      0.60686  17.73s
     74     0.56451     1.75449    0.32175      0.60062  17.56s
     75     0.53835     1.74351    0.30877      0.62090  17.77s
     76     0.53288     1.80642    0.29499      0.60842  18.08s
     77     0.49975     1.76941    0.28244      0.61700  17.76s
     78     0.48489     1.75930    0.27561      0.60998  17.92s
     79     0.45688     1.81943    0.25111      0.61622  17.78s
     80     0.46801     1.80187    0.25974      0.62480  17.96s
     81     0.45527     1.88136    0.24199      0.61310  17.84s
     82     0.43178     1.93961    0.22261      0.61622  18.56s
     83     0.41726     1.90341    0.21922      0.61856  16.52s
     84     0.38590     1.91029    0.20201      0.61778  15.59s
     85     0.38510     1.93524    0.19900      0.61778  16.00s
     86     0.37565     1.92514    0.19513      0.61466  15.56s
     87     0.36222     1.99870    0.18123      0.61544  15.88s
     88     0.38495     2.08839    0.18433      0.61466  15.55s
     89     0.34101     1.94872    0.17499      0.62559  15.97s
     90     0.33575     2.01506    0.16662      0.61856  15.63s
     91     0.32353     2.05956    0.15709      0.62090  16.03s
     92     0.30422     2.12548    0.14313      0.64041  15.66s
     93     0.29631     2.10645    0.14067      0.63495  16.02s
     94     0.32050     2.11861    0.15128      0.62168  15.73s
     95     0.30140     2.14516    0.14050      0.62871  15.99s
     96     0.28195     2.09292    0.13472      0.63339  15.67s
     97     0.30323     2.20744    0.13737      0.62246  16.07s
     98     0.27107     2.15645    0.12570      0.63729  16.32s
     99     0.27947     2.22565    0.12557      0.62637  16.51s
    100     0.26500     2.22825    0.11893      0.64431  16.52s
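The layer sizes and the parameter count printed at the top of the training output can be reproduced by hand. The sketch below assumes what the reported sizes imply: stride-1 convolutions without padding, non-overlapping 2 x 2 pooling, and one bias per filter or dense unit. The helper names are made up for illustration.

```python
def conv_out(size, kernel):
    """Side length after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool):
    """Side length after non-overlapping max pooling."""
    return size // pool

channels, size, n_params = 1, 32, 0
shapes = []
# (kind, number of filters, kernel or pool size); dropout layers add no parameters
stack = [("conv", 32, 5), ("pool", None, 2),
         ("conv", 64, 5), ("pool", None, 2),
         ("conv", 128, 5)]
for kind, filters, k in stack:
    if kind == "conv":
        n_params += channels * filters * k * k + filters   # weights + biases
        channels, size = filters, conv_out(size, k)
    else:
        size = pool_out(size, k)
    shapes.append((channels, size, size))

# Dense layers: hidden4 (500 units) and output (62 units)
n_inputs = channels * size * size
for units in (500, 62):
    n_params += n_inputs * units + units
    n_inputs = units

print(shapes)
print(n_params)
```

The final count agrees with the 352586 learnable parameters reported by the network.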
# save the result
yTest = [chr(v) for v in predict]
labelsInfoTest['Class'] = yTest
labelsInfoTest.to_csv('nnresult.csv', index = False)

Accuracy after submitting to Kaggle: 0.64562
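In the training log, the validation loss reaches its minimum at epoch 61 and rises afterwards while the training loss keeps falling, which is the usual sign of overfitting. The sketch below shows a patience-based early-stopping rule applied to the validation losses of epochs 55 to 70 copied from the log; the function name and the patience of 5 epochs are assumptions for illustration, and this rule was not part of the run above.

```python
def early_stop_epoch(val_log, patience):
    """Return (best_epoch, stop_epoch) for a list of (epoch, val_loss) pairs.

    Training stops once `patience` epochs pass without a new best loss;
    stop_epoch is None if that never happens.
    """
    best_loss, best_epoch = float("inf"), None
    for epoch, loss in val_log:
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None

# Validation losses for epochs 55..70, copied from the training log
val_log = list(zip(range(55, 71), [
    1.56050, 1.54983, 1.55330, 1.53462, 1.54234, 1.52492, 1.49128, 1.58280,
    1.54904, 1.59463, 1.59534, 1.56702, 1.61821, 1.56703, 1.61497, 1.66452]))

best_epoch, stop_epoch = early_stop_epoch(val_log, patience=5)
print(best_epoch, stop_epoch)
```

Note that the validation accuracy in the log keeps creeping upward after epoch 61 (0.59204 there versus 0.64431 at epoch 100), so stopping on loss would not necessarily have given a better leaderboard score; whether to monitor loss or accuracy is a judgment call.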
