Python: some data-processing code

# -*- coding: utf-8 -*-
import os

import jieba.posseg as pseg

TARGET_DIR = '/home/xdj/target/'


def splitSentence(inputFile, name):
    """Segment and POS-tag each line of inputFile; write word/flag tokens to TARGET_DIR/name."""
    print(name)
    # Read the input as UTF-8 (undecodable bytes are ignored) and write the output as UTF-8.
    with open(inputFile, 'r', encoding='utf-8', errors='ignore') as fin, \
         open(os.path.join(TARGET_DIR, name), 'w', encoding='utf-8') as fout:
        for eachLine in fin:
            line = eachLine.strip()       # strip leading/trailing whitespace, including the newline
            wordList = pseg.cut(line)     # segment and POS-tag the line with jieba
            # One "word/flag" token per word, space-separated so tokens stay distinguishable.
            outStr = ' '.join(word.word + '/' + word.flag for word in wordList)
            fout.write(outStr + '\n')     # write the tagged line to the output file


path = '/media/软件/zhuomian/VARandLDA/xuejiesourse'
# Collect every file under path, recursively.
fns = [os.path.join(root, fn) for root, dirs, files in os.walk(path) for fn in files]

num = 0
for i, f in enumerate(fns):
    print(f)
    splitSentence(f, '%d' % i)    # output files are named 0, 1, 2, ...
    num += 1
print(num)    # number of files processed
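
The script writes each tagged word as a `word/flag` token. As a rough sketch of how such lines might be built and read back later, the two helpers below format and parse that token format. The names `format_tagged` and `parse_tagged` are hypothetical and not part of the original post, and the single-space separator between tokens is an assumption; any separator that cannot occur inside a token would do. No jieba install is needed to run this.

```python
# Hypothetical helpers for the "word/flag" token format; not from the original post.
# Assumption: tokens on a line are separated by a single space.


def format_tagged(pairs, sep=" "):
    """Join (word, flag) pairs into one 'word/flag' line."""
    return sep.join("{}/{}".format(word, flag) for word, flag in pairs)


def parse_tagged(line, sep=" "):
    """Split a 'word/flag' line back into a list of (word, flag) pairs."""
    pairs = []
    for token in line.strip().split(sep):
        if not token:
            continue  # skip empty pieces produced by repeated separators
        # rpartition splits on the LAST '/', so a word that itself
        # contains '/' still keeps its tag intact.
        word, _, flag = token.rpartition("/")
        pairs.append((word, flag))
    return pairs


sample = [("我", "r"), ("爱", "v"), ("北京", "ns")]
line = format_tagged(sample)
print(line)
print(parse_tagged(line))
```

Splitting on the last slash rather than the first is a deliberate choice here: part-of-speech flags never contain `/`, while segmented words occasionally do (dates, URLs).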

