I have a large file ( ~4G) to process in Python. I wonder whether it is OK to "read" such a large file. So I tried in the following several ways:

The original large file to deal with is not "./CentOS-6.5-i386.iso", I just take this file as an example here.

1:  Normal Method. (ignore try/except/finally)

def main():
f = open(r"./CentOS-6.5-i386.iso", "rb")
for line in f:
print(line, end="")
f.close() if __name__ == "__main__":
main()

2: "With" Method.

def main():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
print(line, end="") if __name__ == "__main__":
main()

3:  "readlines" Method. [Bad Idea]

#NO. readlines() is really bad for large files.
#Memory Error.
def main():
for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
print(line, end="") if __name__ == "__main__":
main()

4: "fileinput" Method.

import fileinput

def main():
for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
print(line, end="") if __name__ == "__main__":
main()

5: "Generator" Method.

def readFile():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
yield line def main():
for line in readFile():
print(line, end="") if __name__ == "__main__":
main()

The methods above, all work well for small files, but not always for large files(readlines Method). The readlines() function loads the entire file into memory as it runs.

When I run the readlines Method, I got the following error message:

When using the readlines Method, the Percentage of Used CPU and Used Memory rises rapidly(in the following figure). And when the percentage of Used Memory reaches over 50%, I got the "MemoryError" in Python.

The other methods (Normal Method, With Method, fileinput Method, Generator Method) works well for large files. And when using these methods, the workload for CPU and memory which is shown in the following figure does not get a distinct rise.

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.

Reference:

How to read large file, line by line in python

Read Large Files in Python的更多相关文章

  1. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  2. Working with Excel Files in Python

    Working with Excel Files in Python from: http://www.python-excel.org/ This site contains pointers to ...

  3. GitHub 上传文件过大报错:remote: error: GH001: Large files detected.

    1.查看哪个文件过大了 报错信息: remote: Resolving deltas: 100% (24/24), completed with 3 local objects. remote: wa ...

  4. Creating Excel files with Python and XlsxWriter(通过 Python和XlsxWriter来创建Excel文件(xlsx格式))

    以下所有内容翻译至: https://xlsxwriter.readthedocs.io/ #----------------------------------------------------- ...

  5. 【Selenium】【BugList4】执行pip报错:Fatal error in launcher: Unable to create process using '""D:\Program Files\Python36\python.exe"" "D:\Program Files\Python36\Scripts\pip.exe" '

    环境信息: python版本:V3.6.4 安装路径:D:\Program Files\python36 环境变量PATH:D:\Program Files\Python36;D:\Program F ...

  6. Read a large file with python

    python读取大文件 较pythonic的方法,使用with结构 文件可以自动关闭 异常可以在with块内处理 with open(filename, 'rb') as f: for line in ...

  7. reading/writing files in Python

    file types: plaintext files, such as .txt .py Binary files, such as .docx, .pdf, iamges, spreadsheet ...

  8. How to read and write multiple files in Python?

    Goal: I want to write a program for this: In a folder I have =n= number of files; first read one fil ...

  9. Creating Excel files with Python and XlsxWriter——Introduction

    XlsxWriter 是用来写Excel2007版本以上的xlsx文件的Python模块. XlsxWriter 在供选择的可以写Excel的Python模块中有自己的优缺点. #---------- ...

随机推荐

  1. android studio 中配置androidAnnotation 的新版正确配置

    apply ].processResources.manifestFile resourcePackageName 'com.peiandsky.firstandroidstudio' }}

  2. poj2987 Firing 最大权闭合子图 边权有正有负

    /** 题目:poj2987 Firing 最大权闭合子图 边权有正有负 链接:http://poj.org/problem?id=2987 题意:由于金融危机,公司要裁员,如果裁了员工x,那么x的下 ...

  3. js 数组取出最大值最小值和平均值的方法

    1.直接遍历数组 ,,,,,,,]; ]; ;i<arr.length;i++){ if(max<arr[i]) max=arr[i]; } 2.借用Math的方法 ,,,,,,,]; v ...

  4. 第二百零八节,jQuery EasyUI,SplitButton(分割按钮菜单)组件

    jQuery EasyUI,SplitButton(分割按钮)组件 学习要点: 1.加载方式 2.属性列表 3.方法列表 本节课重点了解 EasyUI 中 SplitButton(分割按钮)组件的使用 ...

  5. 数组有没有length()这个方法?String有没有length()这个方法?

    数组有没有length()这个方法?String有没有length()这个方法? 解答:数组没有length()方法 它有length属性 String有length()方法.

  6. js ie 6,7,8 使用不了 firstElementChild

    一. <div> <p>123</p> </div> 在上面这段代码中,如果使用以下js代码 var oDiv=document.getElementB ...

  7. 网页(aspx)与用户控件(ascx)交互逻辑处理实现

    几个页面(ASPX)都使用一些相同的控件,一个文本框,二个按钮(搜索和导出),为了以后好维护,把这相同的部分抽取放在一个用户控件(ASCX)上.现需要处理逻辑如下 搜索事件处理的逻辑在各个页面处理. ...

  8. jquery 复选框全选/全不选切换 普通DOM元素点击选中/取消选中切换

    1.要选中的复选框设置统一的name 用prop() prop() 方法设置或返回被选元素的属性和值. $("#selectAll").click(function(){ $(&q ...

  9. VC++ 判断你的窗口是否置顶TopMost

    大家可能已经知道,使你的窗口置顶(TopMost)或者总是最前(Always on Top)的方法:  C++ Code  12345   // Make topmost , SWP_NOMOVE | ...

  10. Eclipse下导入外部jar包的3种方式

    http://blog.csdn.net/mazhaojuan/article/details/21403717