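Python: how to locate symbols other than digits and letters in a CSV file
During data cleaning, it is sometimes not enough to drop dirty data; you also want to know where it is, for example the positions of the cells in a CSV file that contain something other than digits and letters. isdigit(), isalpha() and isalnum() cannot be used directly for this, because they return False for floating-point values: the decimal point makes every float look like a special symbol.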
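The limitation is easy to reproduce. The following is a minimal sketch; the sample strings are illustrative assumptions and are not taken from the original data file.

```python
# Sample cell values; these strings are made up for illustration.
samples = ["123", "ABC", "A1", "3.14", "#N/A"]

# str.isalnum() is True only when every character is a letter or a digit,
# so the "." in "3.14" makes it fail exactly like a real special symbol.
results = {s: s.isalnum() for s in samples}

for value, ok in results.items():
    print(value, ok)

# isdigit() and isalpha() reject the float for the same reason.
float_checks = ("3.14".isdigit(), "3.14".isalpha())
print(float_checks)
```

A check based only on these string methods therefore cannot tell "3.14" apart from "#N/A", which is why the code in this post uses float() or a regular expression instead.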
Below is the sample data; the goal is to locate the special symbols in it.
The implementation is as follows:
- # -*- coding: utf-8 -*-
- """
- Created on Tue Dec 6 14:37:12 2016
- @author: user
- """
- import csv
- import re
- # Header row of the file: 'FIELD_000' ... 'FIELD_326'. It is skipped while scanning.
- HEADER = ['FIELD_%03d' % i for i in range(327)]
- CSV_PATH = 'D:/工作文件夹/Pyhton/20081003.csv'
- # Method 1: try float(). As written it prints every non-numeric cell;
- # swap in the commented-out return values to keep only the numeric cells instead.
- '''
- def is_a_num(string):
-     try:
-         float(string)  # return float(string)
-     except ValueError:
-         return string  # return ''
- with open(CSV_PATH, encoding='utf-8', newline='') as f:
-     rows = 0
-     for row in csv.reader(f):
-         if row != HEADER:
-             rows += 1
-             # enumerate gives the real 1-based column index of each cell
-             for columns, Factor in enumerate(row, start=1):
-                 if is_a_num(Factor) and Factor != '':
-                     print(rows, columns, Factor)
- '''
- # Method 2: regular expression
- with open(CSV_PATH, encoding='utf-8', newline='') as f:
-     rows = 0
-     for row in csv.reader(f):
-         if row != HEADER:
-             rows += 1
-             for columns, Factor in enumerate(row, start=1):
-                 # Report non-empty cells containing anything other than '.', 0-9 or A-Z
-                 if re.match("[.0-9A-Z]+$", Factor) is None and Factor != '':
-                 # if not Factor.isalnum() and Factor != '':
-                     print(rows, columns, Factor)
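To try the regular-expression approach without the original file, the same check can be wrapped in a small function and run on in-memory text. This is only a sketch under stated assumptions: `locate_bad_cells`, `ALLOWED` and the sample text are hypothetical names and data introduced here, and the first line is simply skipped as a header rather than compared against the full field list.

```python
import csv
import io
import re

# Same character class as Method 2: '.', digits and upper-case letters only.
ALLOWED = re.compile(r"[.0-9A-Z]+$")


def locate_bad_cells(text, has_header=True):
    """Return (row, column, value) for each non-empty cell that fails ALLOWED.

    Rows and columns are 1-based; the header row, if present, is not counted.
    """
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the header line
    found = []
    for r, row in enumerate(reader, start=1):
        for c, cell in enumerate(row, start=1):
            if cell != "" and ALLOWED.match(cell) is None:
                found.append((r, c, cell))
    return found


# Made-up sample data, not the original 20081003.csv.
sample = "FIELD_000,FIELD_001,FIELD_002\n1.5,ABC,#\n,-,7\n"
bad_cells = locate_bad_cells(sample)
print(bad_cells)
```

Returning the positions as a list, instead of printing them inside the loop, makes it possible to write them to a report or use them to patch the data afterwards.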
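In Method 2, re.match tests each cell against a regular expression, starting at the beginning of the string.
The function prototype of re.match is: re.match(pattern, string, flags)
The first argument is the regular expression, here "[.0-9A-Z]+$", which matches one or more of the characters listed inside [] (a dot, a digit, or an upper-case letter) from the start of the string through to its end; if the match succeeds a Match object is returned, otherwise None;
The second argument is the string to be matched;
The third argument is the optional flags value (default 0), which controls how the pattern is matched, for example whether matching is case-sensitive, multi-line matching, and so on.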
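The three arguments can be seen in a short interactive-style sketch. The test strings are assumptions chosen for illustration; only the pattern comes from the code in this post.

```python
import re

pattern = "[.0-9A-Z]+$"

# Every character is in the class, so a Match object is returned.
float_ok = re.match(pattern, "3.14") is not None

# '#' is not in the class, so the result is None.
symbol_result = re.match(pattern, "12#4")

# Lower-case letters are not in [A-Z], so this is None as well ...
lower_result = re.match(pattern, "abc")

# ... unless the third argument asks for case-insensitive matching.
lower_ignorecase = re.match(pattern, "abc", re.IGNORECASE) is not None

# The pattern checks characters only, not number syntax: "1.2.3" also passes.
loose = re.match(pattern, "1.2.3") is not None

print(float_ok, symbol_result, lower_result, lower_ignorecase, loose)
```

Note that the pattern is a character filter, not a number validator. If lower-case letters or negative numbers can legitimately occur in the data, the character class or the flags would need to be adjusted accordingly.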