I have a large file ( ~4G) to process in Python. I wonder whether it is OK to "read" such a large file. So I tried in the following several ways:

The original large file to deal with is not "./CentOS-6.5-i386.iso", I just take this file as an example here.

1:  Normal Method. (ignore try/except/finally)

def main():
f = open(r"./CentOS-6.5-i386.iso", "rb")
for line in f:
print(line, end="")
f.close() if __name__ == "__main__":
main()

2: "With" Method.

def main():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
print(line, end="") if __name__ == "__main__":
main()

3:  "readlines" Method. [Bad Idea]

#NO. readlines() is really bad for large files.
#Memory Error.
def main():
for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
print(line, end="") if __name__ == "__main__":
main()

4: "fileinput" Method.

import fileinput

def main():
for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
print(line, end="") if __name__ == "__main__":
main()

5: "Generator" Method.

def readFile():
with open(r"./CentOS-6.5-i386.iso", "rb") as f:
for line in f:
yield line def main():
for line in readFile():
print(line, end="") if __name__ == "__main__":
main()

The methods above, all work well for small files, but not always for large files(readlines Method). The readlines() function loads the entire file into memory as it runs.

When I run the readlines Method, I got the following error message:

When using the readlines Method, the Percentage of Used CPU and Used Memory rises rapidly(in the following figure). And when the percentage of Used Memory reaches over 50%, I got the "MemoryError" in Python.

The other methods (Normal Method, With Method, fileinput Method, Generator Method) works well for large files. And when using these methods, the workload for CPU and memory which is shown in the following figure does not get a distinct rise.

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.

Reference:

How to read large file, line by line in python

Read Large Files in Python的更多相关文章

  1. Huge CSV and XML Files in Python, Error: field larger than field limit (131072)

    Huge CSV and XML Files in Python January 22, 2009. Filed under python twitter facebook pinterest lin ...

  2. Working with Excel Files in Python

    Working with Excel Files in Python from: http://www.python-excel.org/ This site contains pointers to ...

  3. GitHub 上传文件过大报错:remote: error: GH001: Large files detected.

    1.查看哪个文件过大了 报错信息: remote: Resolving deltas: 100% (24/24), completed with 3 local objects. remote: wa ...

  4. Creating Excel files with Python and XlsxWriter(通过 Python和XlsxWriter来创建Excel文件(xlsx格式))

    以下所有内容翻译至: https://xlsxwriter.readthedocs.io/ #----------------------------------------------------- ...

  5. 【Selenium】【BugList4】执行pip报错:Fatal error in launcher: Unable to create process using '""D:\Program Files\Python36\python.exe"" "D:\Program Files\Python36\Scripts\pip.exe" '

    环境信息: python版本:V3.6.4 安装路径:D:\Program Files\python36 环境变量PATH:D:\Program Files\Python36;D:\Program F ...

  6. Read a large file with python

    python读取大文件 较pythonic的方法,使用with结构 文件可以自动关闭 异常可以在with块内处理 with open(filename, 'rb') as f: for line in ...

  7. reading/writing files in Python

    file types: plaintext files, such as .txt .py Binary files, such as .docx, .pdf, iamges, spreadsheet ...

  8. How to read and write multiple files in Python?

    Goal: I want to write a program for this: In a folder I have =n= number of files; first read one fil ...

  9. Creating Excel files with Python and XlsxWriter——Introduction

    XlsxWriter 是用来写Excel2007版本以上的xlsx文件的Python模块. XlsxWriter 在供选择的可以写Excel的Python模块中有自己的优缺点. #---------- ...

随机推荐

  1. 获取JQuery UI tabs中被选中的tabs的方法

    JQuery标签事件处理实例 如果你正在使用JQuery tabs而且想从基本的功能扩展到自定义的功能,这是你最好知道如何处理JQuery的点击事件. 在这篇文章中: 1.回顾如何添加当tab被点击时 ...

  2. 完整的jdbc查询结果集编码

    public static ArrayList<HashMap<String,Object>> query(Connection conn,String sql, Object ...

  3. Visual Studio 2017 RC使用初体验

    .NET Core新式,高效,特别适合用于大规模的Web应用:而传统的.NET Framework则非常适合用于开发Windows桌面应用程序. 一 安装 请下载Visual Studio 2017 ...

  4. [工具04]java实现获取鼠标的坐标

    本篇博客其实没什么难度可言,在这里分享给大家,是因为有时候我们需要这个工具,java作为跨平台语言的优势在这个软件就可以体现出来,不需修改就可以在windows.mac.linux上使用这个软件. 这 ...

  5. 重载(Overload)

    重载(Overload) 重载(overloading) 是在一个类里面,方法名字相同,而参数不同.返回类型可以相同也可以不同. 每个重载的方法(或者构造函数)都必须有一个独一无二的参数类型列表. 最 ...

  6. Android中TextView和EditView经常使用属性设置

    Android开发中最经常使用的几乎相同就是TextView和EditView了,在使用它时.我们也会设置它的一些属性,为了让我们设计的更好看,设置的更合理.这里记下它的经常使用属性,方便后期查阅. ...

  7. 【JavaEE】SSH+Spring Security整合及example

    到前文为止,SSH的基本框架都已经搭建出来了,现在,在这基础上再加上权限控制,也就是Spring Security框架,和前文的顺序一样,先看看需要加哪些库. 1. pom.xml Spring Se ...

  8. Ubuntu安装qBittorrent

    qBitTorrent是Ubuntu Linux中最受欢迎的P2P软件之中的一个. 出自一名法国大学生之手的qBitTorrent功能强大.界面精美.操作直观. qBitTorrent是Linux中最 ...

  9. Android无线测试之—UiAutomator UiObject API介绍五

    获取对象属性与属性的判断 1.获取对象属性相关API 返回值 API 说明 Rect getBounds() 获取对象矩形坐标,矩形左上角坐标与右下角坐标 int getChildCount() 获得 ...

  10. manacher算法处理最长的回文子串(一)

    引言 相信大家都玩过折叠纸张,如果把回文串相当于折叠一个A4纸,比如ABCDDCBA就是沿着中轴线(D与D之间)对折重合,那么这个就是一个回文串.或者是ABCDEDCBA的中轴线就是E,那么沿着中轴线 ...