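A module is a collection of code that implements a particular piece of functionality.

This is the same idea as in functional and procedural programming: a function implements one feature and other code simply calls it, which improves code reuse and reduces coupling between pieces of code. A complex feature may need several functions, and those functions can live in different .py files; a collection of code made up of n .py files is called a module.

For example, os is the module for operating-system operations, and shutil is the module for file operations.

There are three kinds of modules:

  • Custom modules
  • Built-in standard modules (also called the standard library)
  • Open-source modules

For how to use custom and open-source modules, see http://www.cnblogs.com/wupeiqi/articles/4963027.html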

1. The time and datetime modules

#_*_coding:utf-8_*_
__author__ = 'Alex Li'

import time

# print(time.clock())  # processor time; deprecated since 3.3 and removed in 3.8. Use time.process_time(), which measures CPU time and excludes time spent in sleep (unreliable; gave no usable reading on a Mac)
print(time.altzone)  # offset of the local DST time zone from UTC, in seconds
print(time.asctime())  # time as a string such as "Fri Aug 19 11:14:16 2016"
print(time.localtime())  # local time as a struct_time object
print(time.gmtime(time.time() - 800000))  # UTC time as a struct_time object

print(time.asctime(time.localtime()))  # time as a string such as "Fri Aug 19 11:14:16 2016"
print(time.ctime())  # "Fri Aug 19 12:38:29 2016", same format as above

# Convert a date string to a timestamp
string_2_struct = time.strptime("2016/05/22", "%Y/%m/%d")  # date string -> struct_time
print(string_2_struct)

struct_2_stamp = time.mktime(string_2_struct)  # struct_time (interpreted as local time) -> timestamp
print(struct_2_stamp)

# Convert a timestamp to a formatted string
print(time.gmtime(time.time() - 86640))  # timestamp -> struct_time in UTC
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime()))  # UTC struct_time -> string in the given format

# Date and time arithmetic
import datetime

print(datetime.datetime.now())  # e.g. 2016-08-19 12:47:03.941925
print(datetime.date.fromtimestamp(time.time()))  # timestamp -> date, e.g. 2016-08-19
print(datetime.datetime.now() + datetime.timedelta(3))  # now + 3 days
print(datetime.datetime.now() + datetime.timedelta(-3))  # now - 3 days
print(datetime.datetime.now() + datetime.timedelta(hours=3))  # now + 3 hours
print(datetime.datetime.now() + datetime.timedelta(minutes=30))  # now + 30 minutes

c_time = datetime.datetime.now()
print(c_time.replace(minute=3, hour=2))  # replace the hour and minute fields
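
The date arithmetic above depends on the current time, so its output changes on every run. As a minimal sketch, the same operations can be checked against a fixed reference time; the value of `base` is an arbitrary assumption chosen only to make the results reproducible.

```python
import datetime

# Fixed reference time (an arbitrary, hypothetical value) so results are reproducible.
base = datetime.datetime(2016, 8, 19, 12, 47, 3)

# timedelta's first positional argument is days; keywords select other units.
three_days_later = base + datetime.timedelta(days=3)
three_hours_later = base + datetime.timedelta(hours=3)
half_hour_earlier = base - datetime.timedelta(minutes=30)

# replace() returns a new datetime with only the named fields overwritten.
replaced = base.replace(hour=2, minute=3)

print(three_days_later)
print(three_hours_later)
print(half_hour_earlier)
print(replaced)
```

Note that `replace()` does not modify `base`; datetime objects are immutable, so every operation here produces a new object.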

Format directives for time.strftime() and time.strptime():

Directive  Meaning  (Notes)
%a         Locale’s abbreviated weekday name.
%A         Locale’s full weekday name.
%b         Locale’s abbreviated month name.
%B         Locale’s full month name.
%c         Locale’s appropriate date and time representation.
%d         Day of the month as a decimal number [01,31].
%H         Hour (24-hour clock) as a decimal number [00,23].
%I         Hour (12-hour clock) as a decimal number [01,12].
%j         Day of the year as a decimal number [001,366].
%m         Month as a decimal number [01,12].
%M         Minute as a decimal number [00,59].
%p         Locale’s equivalent of either AM or PM.  (1)
%S         Second as a decimal number [00,61].  (2)
%U         Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0.  (3)
%w         Weekday as a decimal number [0(Sunday),6].
%W         Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Monday are considered to be in week 0.  (3)
%x         Locale’s appropriate date representation.
%X         Locale’s appropriate time representation.
%y         Year without century as a decimal number [00,99].
%Y         Year with century as a decimal number.
%z         Time zone offset indicating a positive or negative time difference from UTC/GMT of the form +HHMM or -HHMM, where H represents decimal hour digits and M represents decimal minute digits [-23:59, +23:59].
%Z         Time zone name (no characters if no time zone exists).
%%         A literal '%' character.
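
As a rough illustration of how these directives are used, the sketch below parses a date string and formats it back. It swaps `time.mktime()` for `calendar.timegm()`, which is an assumption made here so that the timestamp does not depend on the machine's local time zone.

```python
import calendar
import time

date_text = "2016/05/22"

# %Y/%m/%d: four-digit year, zero-padded month and day, separated by slashes.
parsed = time.strptime(date_text, "%Y/%m/%d")

# calendar.timegm() interprets the struct_time as UTC, so the result does not
# depend on the local time zone (time.mktime() would assume local time).
stamp = calendar.timegm(parsed)

# Format the same instant back into a string, again in UTC.
formatted = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(stamp))

print(stamp, formatted)
```

Fields that the format string does not mention, such as the hour and minute here, are filled with default values by `strptime()`.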

2. The random module

Random numbers

import random
print(random.random())  # float in [0.0, 1.0)
print(random.randint(1, 2))  # integer in [1, 2], both ends included
print(random.randrange(1, 10))  # integer in [1, 10), upper bound excluded

Generating a random verification code

import random
checkcode = ''
for i in range(4):
    current = random.randrange(0, 4)
    if current != i:
        temp = chr(random.randint(65, 90))  # uppercase letter A-Z
    else:
        temp = random.randint(0, 9)  # digit 0-9
    checkcode += str(temp)
print(checkcode)
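
A shorter way to build the same kind of code is to draw each character from one combined alphabet. This is only a sketch of an alternative, not the method used above; `make_check_code` is a hypothetical helper name, and the character distribution differs slightly from the loop above because letters and digits are chosen from a single pool.

```python
import random
import string

def make_check_code(length=4):
    """Return a random code of uppercase letters and digits (hypothetical helper)."""
    alphabet = string.ascii_uppercase + string.digits
    # random.choice picks one element uniformly from the sequence.
    return "".join(random.choice(alphabet) for _ in range(length))

print(make_check_code())
```

The random module is not designed for security purposes; for codes that protect anything sensitive, the standard library's secrets module is the safer choice.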

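3. The os module

Provides an interface for calling into the operating system.

os.getcwd()  # get the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")  # change the script's working directory; like cd in a shell
os.curdir  # the string that denotes the current directory: '.'
os.pardir  # the string that denotes the parent directory: '..'
os.makedirs('dirname1/dirname2')  # create nested directories recursively
os.removedirs('dirname1')  # remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')  # create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')  # remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')  # return a list of all files and subdirectories in the directory, including hidden files
os.remove('path/filename')  # delete a file
os.rename("oldname","newname")  # rename a file or directory
os.stat('path/filename')  # get information about a file or directory
os.sep  # the platform's path separator: "\\" on Windows, "/" on Linux
os.linesep  # the platform's line terminator: "\r\n" on Windows, "\n" on Linux
os.pathsep  # the string that separates entries in a search path such as PATH: ';' on Windows, ':' on Linux
os.name  # a string naming the current platform: 'nt' on Windows, 'posix' on Linux
os.system("bash command")  # run a shell command; its output goes straight to the terminal
os.environ  # the system environment variables
os.path.abspath(path)  # return the normalized absolute version of path
os.path.split(path)  # split path into a (directory, filename) tuple
os.path.dirname(path)  # return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  # return the final filename of path, i.e. the second element of os.path.split(path); empty if path ends with / or \
os.path.exists(path)  # True if path exists, False otherwise
os.path.isabs(path)  # True if path is an absolute path
os.path.isfile(path)  # True if path is an existing file, False otherwise
os.path.isdir(path)  # True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  # join several path components; components before the last absolute path are discarded
os.path.getatime(path)  # return the last access time of the file or directory at path
os.path.getmtime(path)  # return the last modification time of the file or directory at path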
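4. The sys module

import sys
sys.argv  # list of command-line arguments; the first element is the path of the program itself
sys.exit(n)  # exit the program; use exit(0) for a normal exit
sys.version  # version information of the Python interpreter
sys.maxint  # largest int value (Python 2 only; Python 3 removed it, and sys.maxsize is the closest equivalent)
sys.path  # the module search path; initialized from the PYTHONPATH environment variable plus installation defaults
sys.platform  # name of the operating system platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]  # read one line from standard input and drop the trailing newline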
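
A small sketch of how these attributes are typically read together, written for Python 3 (so it uses `sys.maxsize` rather than `sys.maxint`). The function name `describe_runtime` and its optional `argv` parameter are assumptions added here so the example can be exercised without real command-line arguments.

```python
import sys

def describe_runtime(argv=None):
    """Summarize the interpreter environment (hypothetical helper)."""
    # Fall back to the real command line when no list is supplied.
    argv = sys.argv if argv is None else argv
    return {
        "script": argv[0] if argv else "",
        "args": list(argv[1:]),
        "platform": sys.platform,
        "version": sys.version_info[:3],
        "maxsize": sys.maxsize,
    }

print(describe_runtime(["demo.py", "--verbose"]))
```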

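5. shutil

A high-level module for handling files, directories, and archives.

shutil.copyfileobj(fsrc, fdst[, length])

Copies the contents of one file object to another. length is the buffer size used for each read, and copying starts at the source's current position, so a partial copy is possible.

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)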
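
To illustrate the partial copy, here is a minimal sketch using in-memory file objects; the data and the seek offset are arbitrary assumptions. Because `copyfileobj` reads from wherever the source is currently positioned, seeking first skips the beginning of the data.

```python
import io
import shutil

payload = b"0123456789" * 1000          # 10,000 bytes of sample data
source = io.BytesIO(payload)
target = io.BytesIO()

# Skip the first half; copyfileobj starts at the current position.
source.seek(5000)

# The third argument is only the read buffer size, not a byte limit.
shutil.copyfileobj(source, target, 1024)

copied = target.getvalue()
print(len(copied))
```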

shutil.copyfile(src, dst)
Copies a file.

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

shutil.copymode(src, dst)
Copies only the permission bits; the contents, group, and owner are unchanged.

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)

shutil.copystat(src, dst)
Copies status information: mode bits, atime, mtime, flags.

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

shutil.copy(src, dst)
Copies a file and its permission bits.

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy2(src, dst)
Copies a file and its status information.

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
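
The practical difference between copy and copy2 shows up in the timestamps. The sketch below is written for Python 3 (the listings above are Python 2 source) and works in a temporary directory; the file names, the backdated timestamp, and the helper name `compare_copy_functions` are all assumptions for illustration.

```python
import os
import shutil
import tempfile

def compare_copy_functions():
    """Return (original mtime, mtime after copy, mtime after copy2)."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "src.txt")
        with open(src, "w") as handle:
            handle.write("hello")

        # Backdate the source so a preserved timestamp is easy to spot.
        old_time = 1000000000
        os.utime(src, (old_time, old_time))

        plain = shutil.copy(src, os.path.join(workdir, "copy.txt"))
        full = shutil.copy2(src, os.path.join(workdir, "copy2.txt"))
        return old_time, os.stat(plain).st_mtime, os.stat(full).st_mtime

print(compare_copy_functions())
```

copy leaves the new file with a fresh modification time, while copy2 carries the source's modification time over through copystat.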

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copies a directory tree.

For example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.

    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error, err:
            errors.extend(err.args[0])
        except EnvironmentError, why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError, why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error, errors
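
The ignore callable is applied once per directory, so a pattern excludes matching names at every level of the tree. A minimal Python 3 sketch, using a throwaway directory layout that is entirely made up for the example (`main.py`, `main.pyc`, `pkg/tmp_cache`):

```python
import os
import shutil
import tempfile

def copy_without_bytecode():
    """Copy a sample tree, skipping *.pyc and tmp* names; return copied files."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "src")
        os.makedirs(os.path.join(src, "pkg"))
        for name in ("main.py", "main.pyc", os.path.join("pkg", "tmp_cache")):
            with open(os.path.join(src, name), "w") as handle:
                handle.write("x")

        dst = os.path.join(workdir, "dst")   # must not exist beforehand
        shutil.copytree(src, dst,
                        ignore=shutil.ignore_patterns("*.pyc", "tmp*"))

        # Collect the copied files as paths relative to the destination root.
        return sorted(
            os.path.relpath(os.path.join(root, filename), dst)
            for root, _dirs, files in os.walk(dst)
            for filename in files
        )

print(copy_without_bytecode())
```

Only `main.py` survives: `main.pyc` matches `*.pyc` at the top level and `tmp_cache` matches `tmp*` inside `pkg`.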

shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree.

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info(). If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

shutil.move(src, dst)
Recursively moves a file or directory.

def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here... A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error, "Destination path '%s' already exists" % real_dst
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)
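
The docstring's rule that a source is moved inside an existing destination directory can be seen in a short Python 3 sketch. The file and directory names (`report.txt`, `archive`) and the helper name are assumptions; in Python 3, `shutil.move` also returns the final destination path.

```python
import os
import shutil
import tempfile

def move_into_directory():
    """Move a file into an existing directory; report what happened."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "report.txt")
        with open(src, "w") as handle:
            handle.write("data")

        dst_dir = os.path.join(workdir, "archive")
        os.mkdir(dst_dir)

        # dst is an existing directory, so the file lands inside it.
        final_path = shutil.move(src, dst_dir)
        return os.path.exists(src), os.path.relpath(final_path, workdir)

print(move_into_directory())
```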

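shutil.make_archive(base_name, format,...)

Creates an archive (for example zip or tar) and returns its file path.

    • base_name: the file name of the archive, or a path to it. A bare file name is saved in the current directory; otherwise the archive is saved at the given path,
      e.g. www                        => saved in the current directory
      e.g. /Users/wupeiqi/www => saved in /Users/wupeiqi/
    • format: the archive type: "zip", "tar", "bztar", or "gztar"
    • root_dir: the directory to archive (defaults to the current directory)
    • owner: the owner, defaults to the current user
    • group: the group, defaults to the current group
    • logger: used for logging, usually a logging.Logger object

# Archive the files under /Users/wupeiqi/Downloads/test into the program's current directory

import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')


# Archive the files under /Users/wupeiqi/Downloads/test into /Users/wupeiqi/
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
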
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive. 'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive. 'root_dir' and 'base_dir' both default
    to the current directory. Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError, "unknown archive format '%s'" % format

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename
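
The examples above use paths specific to the author's machine. As a self-contained Python 3 sketch, the round trip below creates an archive in a temporary directory and unpacks it again with `shutil.unpack_archive`; the names `test`, `a.log`, `backup`, and `out` are placeholders chosen for illustration.

```python
import os
import shutil
import tempfile

def archive_round_trip():
    """Archive a sample directory as .tar.gz, unpack it, and list the result."""
    with tempfile.TemporaryDirectory() as workdir:
        root = os.path.join(workdir, "test")
        os.mkdir(root)
        with open(os.path.join(root, "a.log"), "w") as handle:
            handle.write("log line\n")

        # base_name carries no extension; make_archive appends .tar.gz for gztar.
        base_name = os.path.join(workdir, "backup")
        archive = shutil.make_archive(base_name, "gztar", root_dir=root)

        out = os.path.join(workdir, "out")
        shutil.unpack_archive(archive, out)
        return os.path.basename(archive), sorted(os.listdir(out))

print(archive_round_trip())
```

Giving base_name as a full path keeps the archive out of root_dir, which avoids archiving the archive itself.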

shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

zipfile: compressing and extracting
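
The same zipfile steps can be written with `with` blocks, which close the archive even if an error occurs. This is a Python 3 sketch under assumed file names (`a.log`, `laxi.zip`, `out`) inside a temporary directory; it also passes `zipfile.ZIP_DEFLATED`, since the default `ZIP_STORED` stores files without compressing them.

```python
import os
import tempfile
import zipfile

def zip_round_trip():
    """Write one file into a zip archive, extract it, and return its content."""
    with tempfile.TemporaryDirectory() as workdir:
        log_path = os.path.join(workdir, "a.log")
        with open(log_path, "w") as handle:
            handle.write("log line\n")

        archive = os.path.join(workdir, "laxi.zip")
        # arcname sets the name stored inside the archive.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(log_path, arcname="a.log")

        out = os.path.join(workdir, "out")
        with zipfile.ZipFile(archive) as zf:
            names = zf.namelist()
            zf.extractall(out)

        with open(os.path.join(out, "a.log")) as handle:
            return names, handle.read()

print(zip_round_trip())
```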

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # an extraction path can be passed
tar.close()

tarfile: compressing and extracting
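
A comparable Python 3 sketch for tarfile, again with placeholder names (`bbs2.zip` here is just a small sample file, and `your.tar` and `out` live in a temporary directory). Archives from untrusted sources should not be extracted blindly, because member names can point outside the target directory.

```python
import os
import tarfile
import tempfile

def tar_round_trip():
    """Add one file to a tar archive, extract it, and list the results."""
    with tempfile.TemporaryDirectory() as workdir:
        member = os.path.join(workdir, "bbs2.zip")
        with open(member, "wb") as handle:
            handle.write(b"sample bytes")

        archive = os.path.join(workdir, "your.tar")
        # arcname stores the file under a short name instead of its full path.
        with tarfile.open(archive, "w") as tar:
            tar.add(member, arcname="bbs2.zip")

        out = os.path.join(workdir, "out")
        with tarfile.open(archive, "r") as tar:
            names = tar.getnames()
            tar.extractall(out)      # extraction path passed explicitly

        return names, sorted(os.listdir(out))

print(tar_round_trip())
```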

17 class ZipFile(object):
18 """ Class with methods to open, read, write, close, list zip files.
19
20 z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)
21
22 file: Either the path to the file, or a file-like object.
23 If it is a path, the file will be opened and closed by ZipFile.
24 mode: The mode can be either read "r", write "w" or append "a".
25 compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
26 allowZip64: if True ZipFile will create files with ZIP64 extensions when
27 needed, otherwise it will raise an exception when this would
28 be necessary.
29
30 """
31
32 fp = None # Set here since __del__ checks it
33
34 def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
35 """Open the ZIP file with mode read "r", write "w" or append "a"."""
36 if mode not in ("r", "w", "a"):
37 raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
38
39 if compression == ZIP_STORED:
40 pass
41 elif compression == ZIP_DEFLATED:
42 if not zlib:
43 raise RuntimeError,\
44 "Compression requires the (missing) zlib module"
45 else:
46 raise RuntimeError, "That compression method is not supported"
47
48 self._allowZip64 = allowZip64
49 self._didModify = False
50 self.debug = 0 # Level of printing: 0 through 3
51 self.NameToInfo = {} # Find file info given name
52 self.filelist = [] # List of ZipInfo instances for archive
53 self.compression = compression # Method of compression
54 self.mode = key = mode.replace('b', '')[0]
55 self.pwd = None
56 self._comment = ''
57
58 # Check if we were passed a file-like object
59 if isinstance(file, basestring):
60 self._filePassed = 0
61 self.filename = file
62 modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
63 try:
64 self.fp = open(file, modeDict[mode])
65 except IOError:
66 if mode == 'a':
67 mode = key = 'w'
68 self.fp = open(file, modeDict[mode])
69 else:
70 raise
71 else:
72 self._filePassed = 1
73 self.fp = file
74 self.filename = getattr(file, 'name', None)
75
76 try:
77 if key == 'r':
78 self._RealGetContents()
79 elif key == 'w':
80 # set the modified flag so central directory gets written
81 # even if no files are added to the archive
82 self._didModify = True
83 elif key == 'a':
84 try:
85 # See if file is a zip file
86 self._RealGetContents()
87 # seek to start of directory and overwrite
88 self.fp.seek(self.start_dir, 0)
89 except BadZipfile:
90 # file is not a zip file, just append
91 self.fp.seek(0, 2)
92
93 # set the modified flag so central directory gets written
94 # even if no files are added to the archive
95 self._didModify = True
96 else:
97 raise RuntimeError('Mode must be "r", "w" or "a"')
98 except:
99 fp = self.fp
100 self.fp = None
101 if not self._filePassed:
102 fp.close()
103 raise
104
105 def __enter__(self):
106 return self
107
108 def __exit__(self, type, value, traceback):
109 self.close()
110
111 def _RealGetContents(self):
112 """Read in the table of contents for the ZIP file."""
113 fp = self.fp
114 try:
115 endrec = _EndRecData(fp)
116 except IOError:
117 raise BadZipfile("File is not a zip file")
118 if not endrec:
119 raise BadZipfile, "File is not a zip file"
120 if self.debug > 1:
121 print endrec
122 size_cd = endrec[_ECD_SIZE] # bytes in central directory
123 offset_cd = endrec[_ECD_OFFSET] # offset of central directory
124 self._comment = endrec[_ECD_COMMENT] # archive comment
125
126 # "concat" is zero, unless zip was concatenated to another file
127 concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
128 if endrec[_ECD_SIGNATURE] == stringEndArchive64:
129 # If Zip64 extension structures are present, account for them
130 concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
131
132 if self.debug > 2:
133 inferred = concat + offset_cd
134 print "given, inferred, offset", offset_cd, inferred, concat
135 # self.start_dir: Position of start of central directory
136 self.start_dir = offset_cd + concat
137 fp.seek(self.start_dir, 0)
138 data = fp.read(size_cd)
139 fp = cStringIO.StringIO(data)
140 total = 0
141 while total < size_cd:
142 centdir = fp.read(sizeCentralDir)
143 if len(centdir) != sizeCentralDir:
144 raise BadZipfile("Truncated central directory")
145 centdir = struct.unpack(structCentralDir, centdir)
146 if centdir[_CD_SIGNATURE] != stringCentralDir:
147 raise BadZipfile("Bad magic number for central directory")
148 if self.debug > 2:
149 print centdir
150 filename = fp.read(centdir[_CD_FILENAME_LENGTH])
151 # Create ZipInfo instance to store file information
152 x = ZipInfo(filename)
153 x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
154 x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
155 x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
156 (x.create_version, x.create_system, x.extract_version, x.reserved,
157 x.flag_bits, x.compress_type, t, d,
158 x.CRC, x.compress_size, x.file_size) = centdir[1:12]
159 x.volume, x.internal_attr, x.external_attr = centdir[15:18]
160 # Convert date/time code to (year, month, day, hour, min, sec)
161 x._raw_time = t
162 x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
163 t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
164
165 x._decodeExtra()
166 x.header_offset = x.header_offset + concat
167 x.filename = x._decodeFilename()
168 self.filelist.append(x)
169 self.NameToInfo[x.filename] = x
170
171 # update total bytes read from central directory
172 total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
173 + centdir[_CD_EXTRA_FIELD_LENGTH]
174 + centdir[_CD_COMMENT_LENGTH])
175
176 if self.debug > 2:
177 print "total", total
178
179
180 def namelist(self):
181 """Return a list of file names in the archive."""
182 l = []
183 for data in self.filelist:
184 l.append(data.filename)
185 return l
186
187 def infolist(self):
188 """Return a list of class ZipInfo instances for files in the
189 archive."""
190 return self.filelist
191
192 def printdir(self):
193 """Print a table of contents for the zip file."""
194 print "%-46s %19s %12s" % ("File Name", "Modified ", "Size")
195 for zinfo in self.filelist:
196 date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
197 print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)
198
199 def testzip(self):
200 """Read all the files and check the CRC."""
201 chunk_size = 2 ** 20
202 for zinfo in self.filelist:
203 try:
204 # Read by chunks, to avoid an OverflowError or a
205 # MemoryError with very large embedded files.
206 with self.open(zinfo.filename, "r") as f:
207 while f.read(chunk_size): # Check CRC-32
208 pass
209 except BadZipfile:
210 return zinfo.filename
211
212 def getinfo(self, name):
213 """Return the instance of ZipInfo given 'name'."""
214 info = self.NameToInfo.get(name)
215 if info is None:
216 raise KeyError(
217 'There is no item named %r in the archive' % name)
218
219 return info
220
221 def setpassword(self, pwd):
222 """Set default password for encrypted files."""
223 self.pwd = pwd
224
225 @property
226 def comment(self):
227 """The comment text associated with the ZIP file."""
228 return self._comment
229
230 @comment.setter
231 def comment(self, comment):
232 # check for valid comment length
233 if len(comment) > ZIP_MAX_COMMENT:
234 import warnings
235 warnings.warn('Archive comment is too long; truncating to %d bytes'
236 % ZIP_MAX_COMMENT, stacklevel=2)
237 comment = comment[:ZIP_MAX_COMMENT]
238 self._comment = comment
239 self._didModify = True
240
241 def read(self, name, pwd=None):
242 """Return file bytes (as a string) for name."""
243 return self.open(name, "r", pwd).read()
244
245 def open(self, name, mode="r", pwd=None):
246 """Return file-like object for 'name'."""
247 if mode not in ("r", "U", "rU"):
248 raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
249 if not self.fp:
250 raise RuntimeError, \
251 "Attempt to read ZIP archive that was already closed"
252
253 # Only open a new file for instances where we were not
254 # given a file object in the constructor
255 if self._filePassed:
256 zef_file = self.fp
257 should_close = False
258 else:
259 zef_file = open(self.filename, 'rb')
260 should_close = True
261
262 try:
263 # Make sure we have an info object
264 if isinstance(name, ZipInfo):
265 # 'name' is already an info object
266 zinfo = name
267 else:
268 # Get info object for name
269 zinfo = self.getinfo(name)
270
271 zef_file.seek(zinfo.header_offset, 0)
272
273 # Skip the file header:
274 fheader = zef_file.read(sizeFileHeader)
275 if len(fheader) != sizeFileHeader:
276 raise BadZipfile("Truncated file header")
277 fheader = struct.unpack(structFileHeader, fheader)
278 if fheader[_FH_SIGNATURE] != stringFileHeader:
279 raise BadZipfile("Bad magic number for file header")
280
281 fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
282 if fheader[_FH_EXTRA_FIELD_LENGTH]:
283 zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])
284
285 if fname != zinfo.orig_filename:
286 raise BadZipfile, \
287 'File name in directory "%s" and header "%s" differ.' % (
288 zinfo.orig_filename, fname)
289
290 # check for encrypted flag & handle password
291 is_encrypted = zinfo.flag_bits & 0x1
292 zd = None
293 if is_encrypted:
294 if not pwd:
295 pwd = self.pwd
296 if not pwd:
297 raise RuntimeError, "File %s is encrypted, " \
298 "password required for extraction" % name
299
300 zd = _ZipDecrypter(pwd)
301 # The first 12 bytes in the cypher stream is an encryption header
302 # used to strengthen the algorithm. The first 11 bytes are
303 # completely random, while the 12th contains the MSB of the CRC,
304 # or the MSB of the file time depending on the header type
305 # and is used to check the correctness of the password.
306 bytes = zef_file.read(12)
307 h = map(zd, bytes[0:12])
308 if zinfo.flag_bits & 0x8:
309 # compare against the file type from extended local headers
310 check_byte = (zinfo._raw_time >> 8) & 0xff
311 else:
312 # compare against the CRC otherwise
313 check_byte = (zinfo.CRC >> 24) & 0xff
314 if ord(h[11]) != check_byte:
315 raise RuntimeError("Bad password for file", name)
316
317 return ZipExtFile(zef_file, mode, zinfo, zd,
318 close_fileobj=should_close)
319 except:
320 if should_close:
321 zef_file.close()
322 raise
323
324 def extract(self, member, path=None, pwd=None):
325 """Extract a member from the archive to the current working directory,
326 using its full name. Its file information is extracted as accurately
327 as possible. `member' may be a filename or a ZipInfo object. You can
328 specify a different directory using `path'.
329 """
330 if not isinstance(member, ZipInfo):
331 member = self.getinfo(member)
332
333 if path is None:
334 path = os.getcwd()
335
336 return self._extract_member(member, path, pwd)
337
338 def extractall(self, path=None, members=None, pwd=None):
339 """Extract all members from the archive to the current working
340 directory. `path' specifies a different directory to extract to.
341 `members' is optional and must be a subset of the list returned
342 by namelist().
343 """
344 if members is None:
345 members = self.namelist()
346
347 for zipinfo in members:
348 self.extract(zipinfo, path, pwd)
349
350 def _extract_member(self, member, targetpath, pwd):
351 """Extract the ZipInfo object 'member' to a physical
352 file on the path targetpath.
353 """
354 # build the destination pathname, replacing
355 # forward slashes to platform specific separators.
356 arcname = member.filename.replace('/', os.path.sep)
357
358 if os.path.altsep:
359 arcname = arcname.replace(os.path.altsep, os.path.sep)
360 # interpret absolute pathname as relative, remove drive letter or
361 # UNC path, redundant separators, "." and ".." components.
362 arcname = os.path.splitdrive(arcname)[1]
363 arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
364 if x not in ('', os.path.curdir, os.path.pardir))
365 if os.path.sep == '\\':
366 # filter illegal characters on Windows
367 illegal = ':<>|"?*'
368 if isinstance(arcname, unicode):
369 table = {ord(c): ord('_') for c in illegal}
370 else:
371 table = string.maketrans(illegal, '_' * len(illegal))
372 arcname = arcname.translate(table)
373 # remove trailing dots
374 arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
375 arcname = os.path.sep.join(x for x in arcname if x)
376
377 targetpath = os.path.join(targetpath, arcname)
378 targetpath = os.path.normpath(targetpath)
379
380 # Create all upper directories if necessary.
381 upperdirs = os.path.dirname(targetpath)
382 if upperdirs and not os.path.exists(upperdirs):
383 os.makedirs(upperdirs)
384
385 if member.filename[-1] == '/':
386 if not os.path.isdir(targetpath):
387 os.mkdir(targetpath)
388 return targetpath
389
390 with self.open(member, pwd=pwd) as source, \
391 file(targetpath, "wb") as target:
392 shutil.copyfileobj(source, target)
393
394 return targetpath
395
396 def _writecheck(self, zinfo):
397 """Check for errors before writing a file to the archive."""
398 if zinfo.filename in self.NameToInfo:
399 import warnings
400 warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
401 if self.mode not in ("w", "a"):
402 raise RuntimeError, 'write() requires mode "w" or "a"'
403 if not self.fp:
404 raise RuntimeError, \
405 "Attempt to write ZIP archive that was already closed"
406 if zinfo.compress_type == ZIP_DEFLATED and not zlib:
407 raise RuntimeError, \
408 "Compression requires the (missing) zlib module"
409 if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
410 raise RuntimeError, \
411 "That compression method is not supported"
412 if not self._allowZip64:
413 requires_zip64 = None
414 if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
415 requires_zip64 = "Files count"
416 elif zinfo.file_size > ZIP64_LIMIT:
417 requires_zip64 = "Filesize"
418 elif zinfo.header_offset > ZIP64_LIMIT:
419 requires_zip64 = "Zipfile size"
420 if requires_zip64:
421 raise LargeZipFile(requires_zip64 +
422 " would require ZIP64 extensions")
423
    def write(self, filename, arcname=None, compress_type=None):
        """Put the bytes from filename into the archive under the name
        arcname."""
        if not self.fp:
            raise RuntimeError(
                "Attempt to write to ZIP archive that was already closed")

        st = os.stat(filename)
        isdir = stat.S_ISDIR(st.st_mode)
        mtime = time.localtime(st.st_mtime)
        date_time = mtime[0:6]
        # Create ZipInfo instance to store file information
        if arcname is None:
            arcname = filename
        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
        while arcname[0] in (os.sep, os.altsep):
            arcname = arcname[1:]
        if isdir:
            arcname += '/'
        zinfo = ZipInfo(arcname, date_time)
        zinfo.external_attr = (st[0] & 0xFFFF) << 16L  # Unix attributes
        if compress_type is None:
            zinfo.compress_type = self.compression
        else:
            zinfo.compress_type = compress_type

        zinfo.file_size = st.st_size
        zinfo.flag_bits = 0x00
        zinfo.header_offset = self.fp.tell()  # Start of header bytes

        self._writecheck(zinfo)
        self._didModify = True

        if isdir:
            zinfo.file_size = 0
            zinfo.compress_size = 0
            zinfo.CRC = 0
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
            self.filelist.append(zinfo)
            self.NameToInfo[zinfo.filename] = zinfo
            self.fp.write(zinfo.FileHeader(False))
            return

        with open(filename, "rb") as fp:
            # Must overwrite CRC and sizes with correct data later
            zinfo.CRC = CRC = 0
            zinfo.compress_size = compress_size = 0
            # Compressed size can be larger than uncompressed size
            zip64 = self._allowZip64 and \
                    zinfo.file_size * 1.05 > ZIP64_LIMIT
            self.fp.write(zinfo.FileHeader(zip64))
            if zinfo.compress_type == ZIP_DEFLATED:
                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                                        zlib.DEFLATED, -15)
            else:
                cmpr = None
            file_size = 0
            while 1:
                buf = fp.read(1024 * 8)
                if not buf:
                    break
                file_size = file_size + len(buf)
                CRC = crc32(buf, CRC) & 0xffffffff
                if cmpr:
                    buf = cmpr.compress(buf)
                    compress_size = compress_size + len(buf)
                self.fp.write(buf)
        if cmpr:
            buf = cmpr.flush()
            compress_size = compress_size + len(buf)
            self.fp.write(buf)
            zinfo.compress_size = compress_size
        else:
            zinfo.compress_size = file_size
        zinfo.CRC = CRC
        zinfo.file_size = file_size
        if not zip64 and self._allowZip64:
            if file_size > ZIP64_LIMIT:
                raise RuntimeError('File size has increased during compressing')
            if compress_size > ZIP64_LIMIT:
                raise RuntimeError('Compressed size larger than uncompressed size')
        # Seek backwards and write file header (which will now include
        # correct CRC and file sizes)
        position = self.fp.tell()  # Preserve current position in file
        self.fp.seek(zinfo.header_offset, 0)
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.seek(position, 0)
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
        """Write a file into the archive. The contents is the string
        'bytes'. 'zinfo_or_arcname' is either a ZipInfo instance or
        the name of the file in the archive."""
        if not isinstance(zinfo_or_arcname, ZipInfo):
            zinfo = ZipInfo(filename=zinfo_or_arcname,
                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression
            if zinfo.filename[-1] == '/':
                zinfo.external_attr = 0o40775 << 16  # drwxrwxr-x
                zinfo.external_attr |= 0x10  # MS-DOS directory flag
            else:
                zinfo.external_attr = 0o600 << 16  # ?rw-------
        else:
            zinfo = zinfo_or_arcname

        if not self.fp:
            raise RuntimeError(
                "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:
            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)  # Uncompressed size
        zinfo.header_offset = self.fp.tell()  # Start of header bytes
        self._writecheck(zinfo)
        self._didModify = True
        zinfo.CRC = crc32(bytes) & 0xffffffff  # CRC-32 checksum
        if zinfo.compress_type == ZIP_DEFLATED:
            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                                  zlib.DEFLATED, -15)
            bytes = co.compress(bytes) + co.flush()
            zinfo.compress_size = len(bytes)  # Compressed size
        else:
            zinfo.compress_size = zinfo.file_size
        zip64 = zinfo.file_size > ZIP64_LIMIT or \
                zinfo.compress_size > ZIP64_LIMIT
        if zip64 and not self._allowZip64:
            raise LargeZipFile("Filesize would require ZIP64 extensions")
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.write(bytes)
        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            fmt = '<LQQ' if zip64 else '<LLL'
            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
                                      zinfo.file_size))
        self.fp.flush()
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):
        """Call the "close()" method in case the user forgot."""
        self.close()

    def close(self):
        """Close the file, and for mode "w" and "a" write the ending
        records."""
        if self.fp is None:
            return

        try:
            if self.mode in ("w", "a") and self._didModify:  # write ending records
                pos1 = self.fp.tell()
                for zinfo in self.filelist:  # write central directory
                    dt = zinfo.date_time
                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
                    extra = []
                    if zinfo.file_size > ZIP64_LIMIT \
                            or zinfo.compress_size > ZIP64_LIMIT:
                        extra.append(zinfo.file_size)
                        extra.append(zinfo.compress_size)
                        file_size = 0xffffffff
                        compress_size = 0xffffffff
                    else:
                        file_size = zinfo.file_size
                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:
                        extra.append(zinfo.header_offset)
                        header_offset = 0xffffffffL
                    else:
                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra
                    if extra:
                        # Append a ZIP64 field to the extra's
                        extra_data = struct.pack(
                            '<HH' + 'Q'*len(extra),
                            1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)
                        create_version = max(45, zinfo.create_version)
                    else:
                        extract_version = zinfo.extract_version
                        create_version = zinfo.create_version

                    try:
                        filename, flag_bits = zinfo._encodeFilenameFlags()
                        centdir = struct.pack(structCentralDir,
                            stringCentralDir, create_version,
                            zinfo.create_system, extract_version, zinfo.reserved,
                            flag_bits, zinfo.compress_type, dostime, dosdate,
                            zinfo.CRC, compress_size, file_size,
                            len(filename), len(extra_data), len(zinfo.comment),
                            0, zinfo.internal_attr, zinfo.external_attr,
                            header_offset)
                    except DeprecationWarning:
                        print >>sys.stderr, (structCentralDir,
                            stringCentralDir, create_version,
                            zinfo.create_system, extract_version, zinfo.reserved,
                            zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
                            zinfo.CRC, compress_size, file_size,
                            len(zinfo.filename), len(extra_data), len(zinfo.comment),
                            0, zinfo.internal_attr, zinfo.external_attr,
                            header_offset)
                        raise
                    self.fp.write(centdir)
                    self.fp.write(filename)
                    self.fp.write(extra_data)
                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()
                # Write end-of-zip-archive record
                centDirCount = len(self.filelist)
                centDirSize = pos2 - pos1
                centDirOffset = pos1
                requires_zip64 = None
                if centDirCount > ZIP_FILECOUNT_LIMIT:
                    requires_zip64 = "Files count"
                elif centDirOffset > ZIP64_LIMIT:
                    requires_zip64 = "Central directory offset"
                elif centDirSize > ZIP64_LIMIT:
                    requires_zip64 = "Central directory size"
                if requires_zip64:
                    # Need to write the ZIP64 end-of-archive records
                    if not self._allowZip64:
                        raise LargeZipFile(requires_zip64 +
                                           " would require ZIP64 extensions")
                    zip64endrec = struct.pack(
                        structEndArchive64, stringEndArchive64,
                        44, 45, 45, 0, 0, centDirCount, centDirCount,
                        centDirSize, centDirOffset)
                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(
                        structEndArchive64Locator,
                        stringEndArchive64Locator, 0, pos2, 1)
                    self.fp.write(zip64locrec)
                    centDirCount = min(centDirCount, 0xFFFF)
                    centDirSize = min(centDirSize, 0xFFFFFFFF)
                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,
                                     0, 0, centDirCount, centDirCount,
                                     centDirSize, centDirOffset, len(self._comment))
                self.fp.write(endrec)
                self.fp.write(self._comment)
                self.fp.flush()
        finally:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()

zipfile
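
The listing above is the Python 2 source of `ZipFile.write()`, `writestr()` and `close()`. As a minimal sketch of how those methods are typically called from user code, the Python 3 example below stores one file from disk and one in-memory string, then reads both back. The file names (`notes.txt`, `demo.zip`, `generated/readme.txt`) and the helper `build_and_read_archive` are made up for illustration and are not part of the original post.

```python
import os
import tempfile
import zipfile


def build_and_read_archive(workdir):
    """Create a small archive with write() and writestr(), then read it back."""
    # Hypothetical source file, written in binary mode so the bytes are exact.
    source_path = os.path.join(workdir, "notes.txt")
    with open(source_path, "wb") as handle:
        handle.write(b"hello zip\n")

    # Hypothetical archive name.
    archive_path = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # write() copies the bytes of a file on disk; arcname sets the stored name.
        archive.write(source_path, arcname="notes.txt")
        # writestr() stores in-memory data under the given archive name.
        archive.writestr("generated/readme.txt", "created in memory\n")
    # Leaving the with-block calls close(), which writes the central directory.

    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        contents = {name: archive.read(name).decode("utf-8") for name in names}
    return names, contents


with tempfile.TemporaryDirectory() as workdir:
    names, contents = build_and_read_archive(workdir)
    print(names)
```

Using the archive as a context manager avoids relying on `__del__` to call `close()`; without that call the central directory written at the end of `close()` would be missing and the archive unreadable.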

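The `TarFile` listing that follows is also Python 2 source. As a rough sketch of the public calls it defines (`open()`, `add()`, `addfile()`, `getnames()`, `extractfile()`), the Python 3 example below writes a gzip-compressed archive and reads it back. The paths `data.txt`, `demo.tar.gz` and `extra/memo.txt`, and the helper `roundtrip_tar`, are assumptions made for illustration only.

```python
import io
import os
import tarfile
import tempfile


def roundtrip_tar(workdir):
    """Write a gzip-compressed tar archive, then list and read its members."""
    # Hypothetical source file.
    source_path = os.path.join(workdir, "data.txt")
    with open(source_path, "wb") as handle:
        handle.write(b"hello tar\n")

    # Hypothetical archive name.
    archive_path = os.path.join(workdir, "demo.tar.gz")
    # The "w:gz" mode string selects gzip compression for writing.
    with tarfile.open(archive_path, "w:gz") as archive:
        # add() builds the TarInfo from the file on disk.
        archive.add(source_path, arcname="data.txt")
        # addfile() takes a TarInfo plus a file object holding info.size bytes.
        payload = b"built in memory\n"
        info = tarfile.TarInfo(name="extra/memo.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

    # Plain "r" asks the library to detect the compression itself.
    with tarfile.open(archive_path, "r") as archive:
        member_names = archive.getnames()
        memo = archive.extractfile("extra/memo.txt")
        memo_text = memo.read().decode("utf-8")
    return member_names, memo_text


with tempfile.TemporaryDirectory() as workdir:
    tar_names, tar_text = roundtrip_tar(workdir)
    print(tar_names)
```

The example reads members with `extractfile()` rather than `extractall()`, because extracting an untrusted archive to disk can write outside the target directory unless member paths are checked first.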
class TarFile(object):
    """The TarFile Class provides an interface to tar archives.
    """

    debug = 0                   # May be set from 0 (no msgs) to 3 (all msgs)

    dereference = False         # If true, add content of linked file to the
                                # tar file, else the link.

    ignore_zeros = False        # If true, skips empty or invalid blocks and
                                # continues processing.

    errorlevel = 1              # If 0, fatal errors only appear in debug
                                # messages (if debug >= 0). If > 0, errors
                                # are passed to the caller as exceptions.

    format = DEFAULT_FORMAT     # The format to use when creating an archive.

    encoding = ENCODING         # Encoding for 8-bit character strings.

    errors = None               # Error handler for unicode conversion.

    tarinfo = TarInfo           # The default TarInfo class to use.

    fileobject = ExFileObject   # The default ExFileObject class to use.

    def __init__(self, name=None, mode="r", fileobj=None, format=None,
            tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
            errors=None, pax_headers=None, debug=None, errorlevel=None):
        """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
        read from an existing archive, 'a' to append data to an existing
        file or 'w' to create a new file overwriting an existing one. `mode'
        defaults to 'r'.
        If `fileobj' is given, it is used for reading or writing data. If it
        can be determined, `mode' is overridden by `fileobj's mode.
        `fileobj' is not closed, when TarFile is closed.
        """
        modes = {"r": "rb", "a": "r+b", "w": "wb"}
        if mode not in modes:
            raise ValueError("mode must be 'r', 'a' or 'w'")
        self.mode = mode
        self._mode = modes[mode]

        if not fileobj:
            if self.mode == "a" and not os.path.exists(name):
                # Create nonexistent files in append mode.
                self.mode = "w"
                self._mode = "wb"
            fileobj = bltn_open(name, self._mode)
            self._extfileobj = False
        else:
            if name is None and hasattr(fileobj, "name"):
                name = fileobj.name
            if hasattr(fileobj, "mode"):
                self._mode = fileobj.mode
            self._extfileobj = True
        self.name = os.path.abspath(name) if name else None
        self.fileobj = fileobj

        # Init attributes.
        if format is not None:
            self.format = format
        if tarinfo is not None:
            self.tarinfo = tarinfo
        if dereference is not None:
            self.dereference = dereference
        if ignore_zeros is not None:
            self.ignore_zeros = ignore_zeros
        if encoding is not None:
            self.encoding = encoding

        if errors is not None:
            self.errors = errors
        elif mode == "r":
            self.errors = "utf-8"
        else:
            self.errors = "strict"

        if pax_headers is not None and self.format == PAX_FORMAT:
            self.pax_headers = pax_headers
        else:
            self.pax_headers = {}

        if debug is not None:
            self.debug = debug
        if errorlevel is not None:
            self.errorlevel = errorlevel

        # Init datastructures.
        self.closed = False
        self.members = []       # list of members as TarInfo objects
        self._loaded = False    # flag if all members have been read
        self.offset = self.fileobj.tell()
                                # current position in the archive file
        self.inodes = {}        # dictionary caching the inodes of
                                # archive members already added

        try:
            if self.mode == "r":
                self.firstmember = None
                self.firstmember = self.next()

            if self.mode == "a":
                # Move to the end of the archive,
                # before the first empty block.
                while True:
                    self.fileobj.seek(self.offset)
                    try:
                        tarinfo = self.tarinfo.fromtarfile(self)
                        self.members.append(tarinfo)
                    except EOFHeaderError:
                        self.fileobj.seek(self.offset)
                        break
                    except HeaderError, e:
                        raise ReadError(str(e))

            if self.mode in "aw":
                self._loaded = True

                if self.pax_headers:
                    buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
                    self.fileobj.write(buf)
                    self.offset += len(buf)
        except:
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
            raise

    def _getposix(self):
        return self.format == USTAR_FORMAT
    def _setposix(self, value):
        import warnings
        warnings.warn("use the format attribute instead", DeprecationWarning,
                      2)
        if value:
            self.format = USTAR_FORMAT
        else:
            self.format = GNU_FORMAT
    posix = property(_getposix, _setposix)

    #--------------------------------------------------------------------------
    # Below are the classmethods which act as alternate constructors to the
    # TarFile class. The open() method is the only one that is needed for
    # public use; it is the "super"-constructor and is able to select an
    # adequate "sub"-constructor for a particular compression using the mapping
    # from OPEN_METH.
    #
    # This concept allows one to subclass TarFile without losing the comfort of
    # the super-constructor. A sub-constructor is registered and made available
    # by adding it to the mapping in OPEN_METH.

    @classmethod
    def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
        """Open a tar archive for reading, writing or appending. Return
        an appropriate TarFile class.

        mode:
        'r' or 'r:*' open for reading with transparent compression
        'r:'         open for reading exclusively uncompressed
        'r:gz'       open for reading with gzip compression
        'r:bz2'      open for reading with bzip2 compression
        'a' or 'a:'  open for appending, creating the file if necessary
        'w' or 'w:'  open for writing without compression
        'w:gz'       open for writing with gzip compression
        'w:bz2'      open for writing with bzip2 compression

        'r|*'        open a stream of tar blocks with transparent compression
        'r|'         open an uncompressed stream of tar blocks for reading
        'r|gz'       open a gzip compressed stream of tar blocks
        'r|bz2'      open a bzip2 compressed stream of tar blocks
        'w|'         open an uncompressed stream for writing
        'w|gz'       open a gzip compressed stream for writing
        'w|bz2'      open a bzip2 compressed stream for writing
        """

        if not name and not fileobj:
            raise ValueError("nothing to open")

        if mode in ("r", "r:*"):
            # Find out which *open() is appropriate for opening the file.
            for comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
                if fileobj is not None:
                    saved_pos = fileobj.tell()
                try:
                    return func(name, "r", fileobj, **kwargs)
                except (ReadError, CompressionError), e:
                    if fileobj is not None:
                        fileobj.seek(saved_pos)
                    continue
            raise ReadError("file could not be opened successfully")

        elif ":" in mode:
            filemode, comptype = mode.split(":", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            # Select the *open() function according to
            # given compression.
            if comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
            else:
                raise CompressionError("unknown compression type %r" % comptype)
            return func(name, filemode, fileobj, **kwargs)

        elif "|" in mode:
            filemode, comptype = mode.split("|", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            if filemode not in ("r", "w"):
                raise ValueError("mode must be 'r' or 'w'")

            stream = _Stream(name, filemode, comptype, fileobj, bufsize)
            try:
                t = cls(name, filemode, stream, **kwargs)
            except:
                stream.close()
                raise
            t._extfileobj = False
            return t

        elif mode in ("a", "w"):
            return cls.taropen(name, mode, fileobj, **kwargs)

        raise ValueError("undiscernible mode")

    @classmethod
    def taropen(cls, name, mode="r", fileobj=None, **kwargs):
        """Open uncompressed tar archive name for reading or writing.
        """
        if mode not in ("r", "a", "w"):
            raise ValueError("mode must be 'r', 'a' or 'w'")
        return cls(name, mode, fileobj, **kwargs)

    @classmethod
    def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open gzip compressed tar archive name for reading or writing.
        Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")

        try:
            import gzip
            gzip.GzipFile
        except (ImportError, AttributeError):
            raise CompressionError("gzip module is not available")

        try:
            fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
        except OSError:
            if fileobj is not None and mode == 'r':
                raise ReadError("not a gzip file")
            raise

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except IOError:
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a gzip file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    @classmethod
    def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open bzip2 compressed tar archive name for reading or writing.
        Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'.")

        try:
            import bz2
        except ImportError:
            raise CompressionError("bz2 module is not available")

        if fileobj is not None:
            fileobj = _BZ2Proxy(fileobj, mode)
        else:
            fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except (IOError, EOFError):
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a bzip2 file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    # All *open() methods are registered here.
    OPEN_METH = {
        "tar": "taropen",   # uncompressed tar
        "gz":  "gzopen",    # gzip compressed tar
        "bz2": "bz2open"    # bzip2 compressed tar
    }

    #--------------------------------------------------------------------------
    # The public methods which TarFile provides:

    def close(self):
        """Close the TarFile. In write-mode, two finishing zero blocks are
        appended to the archive.
        """
        if self.closed:
            return

        if self.mode in "aw":
            self.fileobj.write(NUL * (BLOCKSIZE * 2))
            self.offset += (BLOCKSIZE * 2)
            # fill up the end with zero-blocks
            # (like option -b20 for tar does)
            blocks, remainder = divmod(self.offset, RECORDSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (RECORDSIZE - remainder))

        if not self._extfileobj:
            self.fileobj.close()
        self.closed = True

    def getmember(self, name):
        """Return a TarInfo object for member `name'. If `name' can not be
        found in the archive, KeyError is raised. If a member occurs more
        than once in the archive, its last occurrence is assumed to be the
        most up-to-date version.
        """
        tarinfo = self._getmember(name)
        if tarinfo is None:
            raise KeyError("filename %r not found" % name)
        return tarinfo

    def getmembers(self):
        """Return the members of the archive as a list of TarInfo objects. The
        list has the same order as the members in the archive.
        """
        self._check()
        if not self._loaded:    # if we want to obtain a list of
            self._load()        # all members, we first have to
                                # scan the whole archive.
        return self.members

    def getnames(self):
        """Return the members of the archive as a list of their names. It has
        the same order as the list returned by getmembers().
        """
        return [tarinfo.name for tarinfo in self.getmembers()]

    def gettarinfo(self, name=None, arcname=None, fileobj=None):
        """Create a TarInfo object for either the file `name' or the file
        object `fileobj' (using os.fstat on its file descriptor). You can
        modify some of the TarInfo's attributes before you add it using
        addfile(). If given, `arcname' specifies an alternative name for the
        file in the archive.
        """
        self._check("aw")

        # When fileobj is given, replace name by
        # fileobj's real name.
        if fileobj is not None:
            name = fileobj.name

        # Building the name of the member in the archive.
        # Backward slashes are converted to forward slashes,
        # Absolute paths are turned to relative paths.
        if arcname is None:
            arcname = name
        drv, arcname = os.path.splitdrive(arcname)
        arcname = arcname.replace(os.sep, "/")
        arcname = arcname.lstrip("/")

        # Now, fill the TarInfo object with
        # information specific for the file.
        tarinfo = self.tarinfo()
        tarinfo.tarfile = self

        # Use os.stat or os.lstat, depending on platform
        # and if symlinks shall be resolved.
        if fileobj is None:
            if hasattr(os, "lstat") and not self.dereference:
                statres = os.lstat(name)
            else:
                statres = os.stat(name)
        else:
            statres = os.fstat(fileobj.fileno())
        linkname = ""

        stmd = statres.st_mode
        if stat.S_ISREG(stmd):
            inode = (statres.st_ino, statres.st_dev)
            if not self.dereference and statres.st_nlink > 1 and \
                    inode in self.inodes and arcname != self.inodes[inode]:
                # Is it a hardlink to an already
                # archived file?
                type = LNKTYPE
                linkname = self.inodes[inode]
            else:
                # The inode is added only if its valid.
                # For win32 it is always 0.
                type = REGTYPE
                if inode[0]:
                    self.inodes[inode] = arcname
        elif stat.S_ISDIR(stmd):
            type = DIRTYPE
        elif stat.S_ISFIFO(stmd):
            type = FIFOTYPE
        elif stat.S_ISLNK(stmd):
            type = SYMTYPE
            linkname = os.readlink(name)
        elif stat.S_ISCHR(stmd):
            type = CHRTYPE
        elif stat.S_ISBLK(stmd):
            type = BLKTYPE
        else:
            return None

        # Fill the TarInfo object with all
        # information we can get.
        tarinfo.name = arcname
        tarinfo.mode = stmd
        tarinfo.uid = statres.st_uid
        tarinfo.gid = statres.st_gid
        if type == REGTYPE:
            tarinfo.size = statres.st_size
        else:
            tarinfo.size = 0L
        tarinfo.mtime = statres.st_mtime
        tarinfo.type = type
        tarinfo.linkname = linkname
        if pwd:
            try:
                tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
            except KeyError:
                pass
        if grp:
            try:
                tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
            except KeyError:
                pass

        if type in (CHRTYPE, BLKTYPE):
            if hasattr(os, "major") and hasattr(os, "minor"):
                tarinfo.devmajor = os.major(statres.st_rdev)
                tarinfo.devminor = os.minor(statres.st_rdev)
        return tarinfo

    def list(self, verbose=True):
        """Print a table of contents to sys.stdout. If `verbose' is False, only
        the names of the members are printed. If it is True, an `ls -l'-like
        output is produced.
        """
        self._check()

        for tarinfo in self:
            if verbose:
                print filemode(tarinfo.mode),
                print "%s/%s" % (tarinfo.uname or tarinfo.uid,
                                 tarinfo.gname or tarinfo.gid),
                if tarinfo.ischr() or tarinfo.isblk():
                    print "%10s" % ("%d,%d" \
                                    % (tarinfo.devmajor, tarinfo.devminor)),
                else:
                    print "%10d" % tarinfo.size,
                print "%d-%02d-%02d %02d:%02d:%02d" \
                      % time.localtime(tarinfo.mtime)[:6],

            print tarinfo.name + ("/" if tarinfo.isdir() else ""),

            if verbose:
                if tarinfo.issym():
                    print "->", tarinfo.linkname,
                if tarinfo.islnk():
                    print "link to", tarinfo.linkname,
            print

    def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):
        """Add the file `name' to the archive. `name' may be any type of file
        (directory, fifo, symbolic link, etc.). If given, `arcname'
        specifies an alternative name for the file in the archive.
        Directories are added recursively by default. This can be avoided by
        setting `recursive' to False. `exclude' is a function that should
        return True for each filename to be excluded. `filter' is a function
        that expects a TarInfo object argument and returns the changed
        TarInfo object, if it returns None the TarInfo object will be
        excluded from the archive.
        """
        self._check("aw")

        if arcname is None:
            arcname = name

        # Exclude pathnames.
        if exclude is not None:
            import warnings
            warnings.warn("use the filter argument instead",
                          DeprecationWarning, 2)
            if exclude(name):
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Skip if somebody tries to archive the archive...
        if self.name is not None and os.path.abspath(name) == self.name:
            self._dbg(2, "tarfile: Skipped %r" % name)
            return

        self._dbg(1, name)

        # Create a TarInfo object from the file.
        tarinfo = self.gettarinfo(name, arcname)

        if tarinfo is None:
            self._dbg(1, "tarfile: Unsupported type %r" % name)
            return

        # Change or exclude the TarInfo object.
        if filter is not None:
            tarinfo = filter(tarinfo)
            if tarinfo is None:
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Append the tar header and data to the archive.
        if tarinfo.isreg():
            with bltn_open(name, "rb") as f:
                self.addfile(tarinfo, f)

        elif tarinfo.isdir():
            self.addfile(tarinfo)
            if recursive:
                for f in os.listdir(name):
                    self.add(os.path.join(name, f), os.path.join(arcname, f),
                             recursive, exclude, filter)

        else:
            self.addfile(tarinfo)

    def addfile(self, tarinfo, fileobj=None):
        """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
        given, tarinfo.size bytes are read from it and added to the archive.
        You can create TarInfo objects using gettarinfo().
        On Windows platforms, `fileobj' should always be opened with mode
        'rb' to avoid irritation about the file size.
        """
        self._check("aw")

        tarinfo = copy.copy(tarinfo)

        buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
        self.fileobj.write(buf)
        self.offset += len(buf)

        # If there's data to follow, append it.
        if fileobj is not None:
            copyfileobj(fileobj, self.fileobj, tarinfo.size)
            blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (BLOCKSIZE - remainder))
                blocks += 1
            self.offset += blocks * BLOCKSIZE

        self.members.append(tarinfo)

    def extractall(self, path=".", members=None):
        """Extract all members from the archive to the current working
        directory and set owner, modification time and permissions on
        directories afterwards. `path' specifies a different directory
        to extract to. `members' is optional and must be a subset of the
        list returned by getmembers().
        """
        directories = []

        if members is None:
            members = self

        for tarinfo in members:
            if tarinfo.isdir():
                # Extract directories with a safe mode.
                directories.append(tarinfo)
                tarinfo = copy.copy(tarinfo)
                tarinfo.mode = 0700
            self.extract(tarinfo, path)

        # Reverse sort directories.
        directories.sort(key=operator.attrgetter('name'))
        directories.reverse()

        # Set correct owner, mtime and filemode on directories.
        for tarinfo in directories:
            dirpath = os.path.join(path, tarinfo.name)
            try:
                self.chown(tarinfo, dirpath)
                self.utime(tarinfo, dirpath)
                self.chmod(tarinfo, dirpath)
            except ExtractError, e:
                if self.errorlevel > 1:
                    raise
                else:
                    self._dbg(1, "tarfile: %s" % e)

    def extract(self, member, path=""):
        """Extract a member from the archive to the current working directory,
        using its full name. Its file information is extracted as accurately
        as possible. `member' may be a filename or a TarInfo object. You can
        specify a different directory using `path'.
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        # Prepare the link target for makelink().
        if tarinfo.islnk():
            tarinfo._link_target = os.path.join(path, tarinfo.linkname)

        try:
            self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
        except EnvironmentError, e:
            if self.errorlevel > 0:
                raise
            else:
                if e.filename is None:
                    self._dbg(1, "tarfile: %s" % e.strerror)
                else:
                    self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
        except ExtractError, e:
            if self.errorlevel > 1:
                raise
            else:
                self._dbg(1, "tarfile: %s" % e)

    def extractfile(self, member):
        """Extract a member from the archive as a file object. `member' may be
        a filename or a TarInfo object. If `member' is a regular file, a
        file-like object is returned. If `member' is a link, a file-like
        object is constructed from the link's target. If `member' is none of
        the above, None is returned.
        The file-like object is read-only and provides the following
        methods: read(), readline(), readlines(), seek() and tell()
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        if tarinfo.isreg():
            return self.fileobject(self, tarinfo)

        elif tarinfo.type not in SUPPORTED_TYPES:
            # If a member's type is unknown, it is treated as a
            # regular file.
            return self.fileobject(self, tarinfo)

        elif tarinfo.islnk() or tarinfo.issym():
            if isinstance(self.fileobj, _Stream):
                # A small but ugly workaround for the case that someone tries
                # to extract a (sym)link as a file-object from a non-seekable
                # stream of tar blocks.
                raise StreamError("cannot extract (sym)link as file object")
            else:
                # A (sym)link's file object is its target's file object.
                return self.extractfile(self._find_link_target(tarinfo))
        else:
            # If there's no data associated with the member (directory, chrdev,
            # blkdev, etc.), return None instead of a file object.
            return None

    def _extract_member(self, tarinfo, targetpath):
        """Extract the TarInfo object tarinfo to a physical
        file called targetpath.
        """
        # Fetch the TarInfo object for the given name
        # and build the destination pathname, replacing
        # forward slashes to platform specific separators.
        targetpath = targetpath.rstrip("/")
        targetpath = targetpath.replace("/", os.sep)

        # Create all upper directories.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            # Create directories that are not part of the archive with
            # default permissions.
            os.makedirs(upperdirs)

        if tarinfo.islnk() or tarinfo.issym():
            self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
        else:
            self._dbg(1, tarinfo.name)

        if tarinfo.isreg():
            self.makefile(tarinfo, targetpath)
        elif tarinfo.isdir():
            self.makedir(tarinfo, targetpath)
        elif tarinfo.isfifo():
            self.makefifo(tarinfo, targetpath)
        elif tarinfo.ischr() or tarinfo.isblk():
            self.makedev(tarinfo, targetpath)
        elif tarinfo.islnk() or tarinfo.issym():
            self.makelink(tarinfo, targetpath)
        elif tarinfo.type not in SUPPORTED_TYPES:
            self.makeunknown(tarinfo, targetpath)
        else:
            self.makefile(tarinfo, targetpath)

        self.chown(tarinfo, targetpath)
        if not tarinfo.issym():
            self.chmod(tarinfo, targetpath)
            self.utime(tarinfo, targetpath)

    #--------------------------------------------------------------------------
    # Below are the different file methods. They are called via
    # _extract_member() when extract() is called. They can be replaced in a
    # subclass to implement other functionality.

    def makedir(self, tarinfo, targetpath):
        """Make a directory called targetpath.
        """
        try:
            # Use a safe mode for the directory, the real mode is set
            # later in _extract_member().
            os.mkdir(targetpath, 0700)
        except EnvironmentError, e:
            if e.errno != errno.EEXIST:
                raise

    def makefile(self, tarinfo, targetpath):
        """Make a file called targetpath.
        """
        source = self.extractfile(tarinfo)
        try:
            with bltn_open(targetpath, "wb") as target:
                copyfileobj(source, target)
        finally:
            source.close()

    def makeunknown(self, tarinfo, targetpath):
        """Make a file from a TarInfo object with an unknown type
        at targetpath.
        """
        self.makefile(tarinfo, targetpath)
        self._dbg(1, "tarfile: Unknown file type %r, " \
                     "extracted as regular file." % tarinfo.type)

    def makefifo(self, tarinfo, targetpath):
        """Make a fifo called targetpath.
        """
        if hasattr(os, "mkfifo"):
            os.mkfifo(targetpath)
        else:
            raise ExtractError("fifo not supported by system")

    def makedev(self, tarinfo, targetpath):
        """Make a character or block device called targetpath.
        """
        if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
            raise ExtractError("special devices not supported by system")

        mode = tarinfo.mode
        if tarinfo.isblk():
            mode |= stat.S_IFBLK
        else:
            mode |= stat.S_IFCHR

        os.mknod(targetpath, mode,
                 os.makedev(tarinfo.devmajor, tarinfo.devminor))

    def makelink(self, tarinfo, targetpath):
        """Make a (symbolic) link called targetpath. If it cannot be created
        (platform limitation), we try to make a copy of the referenced file
        instead of a link.
        """
        if hasattr(os, "symlink") and hasattr(os, "link"):
            # For systems that support symbolic and hard links.
            if tarinfo.issym():
                if os.path.lexists(targetpath):
                    os.unlink(targetpath)
                os.symlink(tarinfo.linkname, targetpath)
            else:
                # See extract().
                if os.path.exists(tarinfo._link_target):
                    if os.path.lexists(targetpath):
                        os.unlink(targetpath)
                    os.link(tarinfo._link_target, targetpath)
                else:
                    self._extract_member(self._find_link_target(tarinfo), targetpath)
        else:
            try:
                self._extract_member(self._find_link_target(tarinfo), targetpath)
            except KeyError:
                raise ExtractError("unable to resolve link inside archive")

805 def chown(self, tarinfo, targetpath):
806 """Set owner of targetpath according to tarinfo.
807 """
808 if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
809 # We have to be root to do so.
810 try:
811 g = grp.getgrnam(tarinfo.gname)[2]
812 except KeyError:
813 g = tarinfo.gid
814 try:
815 u = pwd.getpwnam(tarinfo.uname)[2]
816 except KeyError:
817 u = tarinfo.uid
818 try:
819 if tarinfo.issym() and hasattr(os, "lchown"):
820 os.lchown(targetpath, u, g)
821 else:
822 if sys.platform != "os2emx":
823 os.chown(targetpath, u, g)
824 except EnvironmentError, e:
825 raise ExtractError("could not change owner")
826
827 def chmod(self, tarinfo, targetpath):
828 """Set file permissions of targetpath according to tarinfo.
829 """
830 if hasattr(os, 'chmod'):
831 try:
832 os.chmod(targetpath, tarinfo.mode)
833 except EnvironmentError, e:
834 raise ExtractError("could not change mode")
835
836 def utime(self, tarinfo, targetpath):
837 """Set modification time of targetpath according to tarinfo.
838 """
839 if not hasattr(os, 'utime'):
840 return
841 try:
842 os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
843 except EnvironmentError, e:
844 raise ExtractError("could not change modification time")
845
846 #--------------------------------------------------------------------------
847 def next(self):
848 """Return the next member of the archive as a TarInfo object, when
849 TarFile is opened for reading. Return None if there is no more
850 available.
851 """
852 self._check("ra")
853 if self.firstmember is not None:
854 m = self.firstmember
855 self.firstmember = None
856 return m
857
858 # Read the next block.
859 self.fileobj.seek(self.offset)
860 tarinfo = None
861 while True:
862 try:
863 tarinfo = self.tarinfo.fromtarfile(self)
864 except EOFHeaderError, e:
865 if self.ignore_zeros:
866 self._dbg(2, "0x%X: %s" % (self.offset, e))
867 self.offset += BLOCKSIZE
868 continue
869 except InvalidHeaderError, e:
870 if self.ignore_zeros:
871 self._dbg(2, "0x%X: %s" % (self.offset, e))
872 self.offset += BLOCKSIZE
873 continue
874 elif self.offset == 0:
875 raise ReadError(str(e))
876 except EmptyHeaderError:
877 if self.offset == 0:
878 raise ReadError("empty file")
879 except TruncatedHeaderError, e:
880 if self.offset == 0:
881 raise ReadError(str(e))
882 except SubsequentHeaderError, e:
883 raise ReadError(str(e))
884 break
885
886 if tarinfo is not None:
887 self.members.append(tarinfo)
888 else:
889 self._loaded = True
890
891 return tarinfo
892
893 #--------------------------------------------------------------------------
894 # Little helper methods:
895
896 def _getmember(self, name, tarinfo=None, normalize=False):
897 """Find an archive member by name from bottom to top.
898 If tarinfo is given, it is used as the starting point.
899 """
900 # Ensure that all members have been loaded.
901 members = self.getmembers()
902
903 # Limit the member search list up to tarinfo.
904 if tarinfo is not None:
905 members = members[:members.index(tarinfo)]
906
907 if normalize:
908 name = os.path.normpath(name)
909
910 for member in reversed(members):
911 if normalize:
912 member_name = os.path.normpath(member.name)
913 else:
914 member_name = member.name
915
916 if name == member_name:
917 return member
918
919 def _load(self):
920 """Read through the entire archive file and look for readable
921 members.
922 """
923 while True:
924 tarinfo = self.next()
925 if tarinfo is None:
926 break
927 self._loaded = True
928
929 def _check(self, mode=None):
930 """Check if TarFile is still open, and if the operation's mode
931 corresponds to TarFile's mode.
932 """
933 if self.closed:
934 raise IOError("%s is closed" % self.__class__.__name__)
935 if mode is not None and self.mode not in mode:
936 raise IOError("bad operation for mode %r" % self.mode)
937
938 def _find_link_target(self, tarinfo):
939 """Find the target member of a symlink or hardlink member in the
940 archive.
941 """
942 if tarinfo.issym():
943 # Always search the entire archive.
944 linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
945 limit = None
946 else:
947 # Search the archive before the link, because a hard link is
948 # just a reference to an already archived file.
949 linkname = tarinfo.linkname
950 limit = tarinfo
951
952 member = self._getmember(linkname, tarinfo=limit, normalize=True)
953 if member is None:
954 raise KeyError("linkname %r not found" % linkname)
955 return member
956
957 def __iter__(self):
958 """Provide an iterator object.
959 """
960 if self._loaded:
961 return iter(self.members)
962 else:
963 return TarIter(self)
964
965 def _dbg(self, level, msg):
966 """Write debugging output to sys.stderr.
967 """
968 if level <= self.debug:
969 print >> sys.stderr, msg
970
971 def __enter__(self):
972 self._check()
973 return self
974
975 def __exit__(self, type, value, traceback):
976 if type is None:
977 self.close()
978 else:
979 # An exception occurred. We must not call close() because
980 # it would try to write end-of-archive blocks and padding.
981 if not self._extfileobj:
982 self.fileobj.close()
983 self.closed = True
984 # class TarFile

Tarfile

6. The shelve module

The shelve module is a simple key-value store that persists in-memory data to a file. It can persist any Python object that pickle supports.

import shelve

d = shelve.open('shelve_test')  # open (or create) the shelf file

class Test(object):
    def __init__(self, n):
        self.n = n


t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2

d.close()
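
Reading the data back works like reading a dictionary. The following is a minimal sketch, assuming a throwaway shelf in a temporary directory; the path and the key names are illustrative and not part of the example above.

```python
import os
import shelve
import tempfile

# Use a temporary directory so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "shelve_demo")

# Write: a shelf behaves like a dict whose values are pickled to disk.
with shelve.open(path) as d:
    d["names"] = ["alex", "rain", "test"]
    d["info"] = {"age": 22}

# Read: reopen the shelf and fetch values by key.
with shelve.open(path) as d:
    names = d["names"]
    keys = sorted(d.keys())

print(names, keys)
```

Opening the shelf in a with block closes it automatically, which is what flushes the data to disk.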

7. The xml module

XML is a format for exchanging data between different languages and programs. It serves much the same purpose as JSON, although JSON is simpler to use. Before JSON existed, XML was the only practical choice, and many systems in traditional industries such as finance still expose XML interfaces today.

An XML document looks like this; the data structure is expressed through nested <> nodes:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

XML is supported in virtually every language. In Python, the following module can be used to work with it:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# walk the whole document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)

# visit only the year nodes
for node in root.iter('year'):
    print(node.tag, node.text)
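
Besides iterating, ElementTree can look up specific nodes with find() and findall(). The sketch below assumes a shortened inline copy of the sample document, parsed with ET.fromstring so that no file is needed.

```python
import xml.etree.ElementTree as ET

# A shortened inline copy of the sample document (an assumption for this sketch).
xml_text = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(xml_text)

# findall() returns matching direct children; get() reads an attribute.
ranks = {c.get("name"): int(c.find("rank").text) for c in root.findall("country")}

# A limited XPath predicate selects an element by attribute value.
panama = root.find("./country[@name='Panama']")
panama_year = panama.find("year").text

print(ranks, panama_year)
```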

Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")

tree.write("xmltest.xml")


# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

Creating an XML document from scratch

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML

8. The ConfigParser module

Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser.

Here is a configuration file format that many programs use:

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

How would you generate a file like this with Python?

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

Once written, the file can be read back:

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

configparser syntax for create, read, update and delete

Given a file i.cfg with this content:

[section1]
k1 = v1
k2:v2

[section2]
k1 = v1

the operations look like this (uncomment the ones you want to try):

import configparser

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## read ##########
#secs = config.sections()
#print(secs)
#options = config.options('section2')
#print(options)

#item_list = config.items('section2')
#print(item_list)

#val = config.get('section1', 'k1')
#val = config.getint('section1', 'k1')  # only works if the value is an integer

# ########## modify ##########
#sec = config.remove_section('section1')
#config.write(open('i.cfg', "w"))

#sec = config.has_section('wupeiqi')
#sec = config.add_section('wupeiqi')
#config.write(open('i.cfg', "w"))


#config.set('section2', 'k1', '11111')  # values must be strings in Python 3
#config.write(open('i.cfg', "w"))

#config.remove_option('section2', 'age')
#config.write(open('i.cfg', "w"))
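
The create, read, update and delete calls can be tried without touching the file system. This is a minimal sketch, assuming the same two-section content loaded through read_string; the section3 and port names are made up for illustration.

```python
import configparser

config = configparser.ConfigParser()
# Load the sample content from a string instead of a file.
config.read_string("""
[section1]
k1 = v1
k2:v2

[section2]
k1 = v1
""")

sections = config.sections()            # read: section names
options = config.options("section1")    # read: option names in one section
value = config.get("section1", "k2")    # read: a single value

config.add_section("section3")          # create (hypothetical section name)
config.set("section3", "port", "8080")  # values must be strings
port = config.getint("section3", "port")

config.remove_option("section1", "k2")  # delete one option
config.remove_section("section2")       # delete a whole section

print(sections, options, value, port, config.sections())
```

To persist the changes, pass an open file object to config.write().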

The hashlib module

Used for hashing (message digest) operations. In 3.x it replaces the md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())          # the hash as raw bytes
print(len(m.hexdigest()))  # length of the hash as a hexadecimal string
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# update() only accepts bytes in Python 3, hence the b'' literals below

# ######## md5 ########

hash = hashlib.md5()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha1 ########

hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
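
Repeated update() calls hash the concatenation of everything fed in so far, which is why the first example prints a different digest after the third update. A small sketch of that behaviour, using SHA-256 (the input strings are only illustrative):

```python
import hashlib

# Feed the data in two pieces.
m = hashlib.sha256()
m.update(b"Hello")
m.update(b"It's me")

# Hash the same bytes in one call.
one_shot = hashlib.sha256(b"HelloIt's me").hexdigest()
same = (m.hexdigest() == one_shot)

# Text must be encoded to bytes before hashing.
str_digest = hashlib.sha256("admin".encode("utf-8")).hexdigest()

print(same, len(str_digest))
```

This is also the usual way to hash a large file: read it in chunks and call update() on each chunk.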

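Python also has an hmac module, which internally combines a key we supply with the message content and then computes a hash of the result.

A hash-based message authentication code (HMAC) is an authentication mechanism built on the message authentication code (MAC). When HMAC is used, the two communicating parties judge whether a message is genuine by checking a value computed from the message and a shared secret key K.

It is typically used to authenticate messages in network communication. The two sides must first agree on a key, much like a secret passphrase. The sender computes an HMAC of the message with the key and sends it along with the message. The receiver computes the HMAC again from the key and the plaintext message and compares the result with the sender's value. If the two are equal, the message is authentic and the sender is legitimate.

import hmac
# bytes literals can only hold ASCII characters, so encode the strings explicitly;
# digestmod is a required argument since Python 3.8
h = hmac.new('天王盖地虎'.encode('utf-8'), '宝塔镇河妖'.encode('utf-8'), digestmod='sha256')
print(h.hexdigest())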
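
The sender and receiver steps described above can be sketched as follows. The sign and verify helpers, the key and the messages are hypothetical names chosen for this illustration; hmac.new and hmac.compare_digest are the standard library calls.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Sender side: compute the HMAC-SHA256 of the message as a hex string."""
    return hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, received_mac: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(sign(key, message), received_mac)

key = b"shared-secret"            # agreed on in advance by both parties
msg = b"transfer 100 to bob"
mac = sign(key, msg)              # sent together with the message

ok = verify(key, msg, mac)                         # untouched message
tampered = verify(key, b"transfer 900 to bob", mac)  # modified in transit

print(ok, tampered)
```

compare_digest is preferred over == here because its running time does not reveal how many leading characters matched.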

See https://www.tbs-certificates.co.uk/FAQ/en/sha256.html

Python can also handle the YAML document format easily, although that requires installing an extra module. Reference documentation: http://pyyaml.org/wiki/PyYAMLDocumentation

9. The subprocess module

# run a command and return its exit status (0 or non-zero)
>>> retcode = subprocess.call(["ls", "-l"])

# run a command; return normally if the exit status is 0, otherwise raise an exception
>>> subprocess.check_call(["ls", "-l"])
0

# take a command string and return a tuple: the exit status first, the command output second
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')

# take a command string and return its output
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'

# run a command and return its output (returned, not printed); here it is assigned to res
>>> res = subprocess.check_output(['ls', '-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 alex staff 408 Nov 2 11:05 OldBoyCRM\n'

# all of the functions above are wrappers around subprocess.Popen
poll()
    Check if child process has terminated. Returns returncode

wait()
    Wait for child process to terminate. Returns returncode attribute.


terminate()      kill the process that was started
communicate()    interact with the process and wait for it to finish

stdin    standard input

stdout   standard output

stderr   standard error

pid
    The process ID of the child process.

# example
>>> p = subprocess.Popen("df -h|grep disk", stdin=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
>>> p.stdout.read()
b'/dev/disk1 465Gi 64Gi 400Gi 14% 16901472 104938142 14% /\n'

>>> subprocess.run(["ls", "-l"])  # doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)

>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1

>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')
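
On Python 3.7 and later, subprocess.run can also capture output as text directly. The sketch below is one way to do that; it runs the current interpreter (sys.executable) as the child so that it does not depend on ls being available, which is a choice made for this example only.

```python
import subprocess
import sys

# capture_output=True collects stdout/stderr; text=True decodes them to str.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True, text=True, check=True,
)
output = result.stdout.strip()

# With check=True a non-zero exit status raises CalledProcessError.
failed_code = None
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode

print(output, failed_code)
```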

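Calling subprocess.run(...) is the recommended approach and covers most needs. If you need more complex interaction with the system, you can use subprocess.Popen() instead. Its syntax is as follows:

# a raw string keeps the backslash in "\;" intact
p = subprocess.Popen(r"find / -size +1000000 -exec ls -shl {} \;", shell=True, stdout=subprocess.PIPE)
print(p.stdout.read())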

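Available parameters:

    • args: the command to run, either a string or a sequence (such as a list or tuple)
    • bufsize: buffering. 0 means unbuffered, 1 line buffered, any other positive value a buffer of that size, and a negative value the system default
    • stdin, stdout, stderr: the program's standard input, output and error handles
    • preexec_fn: Unix only. A callable object that is called in the child process just before the command runs
    • close_fds: on Windows, if close_fds is set to True, the new child process does not inherit the parent's input, output and error pipes.
      Before Python 3.7, close_fds therefore could not be set to True while also redirecting the child's standard input, output and error (stdin, stdout, stderr).
    • shell: if True, the command is executed through the shell
    • cwd: the working directory of the child process
    • env: the environment variables of the child process. If env = None, the child inherits the parent's environment.
    • universal_newlines: line endings differ between systems; True -> use \n uniformly
    • startupinfo and creationflags: Windows only.
      They are passed to the underlying CreateProcess() function to set properties of the child process, such as the appearance of the main window and the process priority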
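
A few of these parameters in action. This is a sketch only: the DEMO_VAR variable and the temporary directory are made up for the example, and the child process is the current Python interpreter.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
# Copy the parent's environment and add one (hypothetical) variable.
child_env = dict(os.environ, DEMO_VAR="42")

p = subprocess.Popen(
    [sys.executable, "-c",
     "import os; print(os.getcwd()); print(os.environ['DEMO_VAR'])"],
    cwd=workdir,              # child starts in this directory
    env=child_env,            # child sees this environment
    stdout=subprocess.PIPE,
    universal_newlines=True,  # decode output to str with \n line endings
)
out, _ = p.communicate()
lines = out.splitlines()

print(lines)
```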

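Commands entered in a terminal fall into two kinds:

  • those that produce output as soon as they are entered, such as ifconfig
  • those that enter an environment and then depend on further input, such as python

Example of a command that needs interaction

import subprocess
import sys

# sys.executable is the interpreter that is running this script
obj = subprocess.Popen([sys.executable], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# pipes carry bytes in Python 3, and print is a function
obj.stdin.write(b'print(1)\n')
obj.stdin.write(b'print(2)\n')
obj.stdin.write(b'print(3)\n')
obj.stdin.write(b'print(4)\n')

out_error_list = obj.communicate(timeout=10)
print(out_error_list)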
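
An alternative that avoids writing to stdin by hand is to pass all the input to communicate(), which sends it, closes the pipe and collects the output in one step. A minimal sketch, assuming Python 3.7+ for the text parameter:

```python
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-"],  # "-" tells the interpreter to read its program from stdin
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    text=True,              # exchange str instead of bytes
)

# communicate() writes the input, closes stdin and waits for the child to exit.
out, err = proc.communicate(input="print(1)\nprint(2)\n", timeout=10)

print(out.split())
```

Letting communicate() handle the pipes also avoids the deadlock that can occur when a child fills its output buffer while the parent is still writing.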

Using subprocess to supply the sudo password automatically

import subprocess

def mypass():
    mypass = '123'  # or get the password from anywhere
    return mypass

echo = subprocess.Popen(['echo', mypass()],
                        stdout=subprocess.PIPE,
                        )

sudo = subprocess.Popen(['sudo', '-S', 'iptables', '-L'],
                        stdin=echo.stdout,
                        stdout=subprocess.PIPE,
                        )

end_of_pipe = sudo.stdout

# read() returns bytes, so decode before formatting
print("Password ok \n Iptables Chains %s" % end_of_pipe.read().decode())

10. The logging module

Many programs need to write logs, and those logs contain not only normal access records but also errors, warnings and other output. Python's logging module provides a standard logging interface through which logs can be stored in various formats. Logging has five levels: debug(), info(), warning(), error() and critical(). Let's see how to use it.

Simplest usage

import logging

logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")

# output
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down

What each log level means:

Level      When it's used
DEBUG      Detailed information, typically of interest only when diagnosing problems.
INFO       Confirmation that things are working as expected.
WARNING    An indication that something unexpected happened, or indicative of some problem in the near future (e.g. 'disk space low'). The software is still working as expected.
ERROR      Due to a more serious problem, the software has not been able to perform some function.
CRITICAL   A serious error, indicating that the program itself may be unable to continue running.

Writing the log to a file is just as simple:

import logging

logging.basicConfig(filename='example.log', level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')

In the line below, level=logging.INFO sets the logging level to INFO, which means that only messages at INFO level or higher are written to the file. In this example the first message is therefore not recorded. To record debug messages as well, change the level to DEBUG.

logging.basicConfig(filename='example.log', level=logging.INFO)
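
The same threshold rule can be observed on a named logger. This sketch writes to an in-memory stream instead of a file so the result is easy to inspect; the logger name level-demo is made up for the example.

```python
import io
import logging

stream = io.StringIO()
demo = logging.getLogger("level-demo")   # hypothetical logger name
demo.addHandler(logging.StreamHandler(stream))
demo.propagate = False                   # keep the root logger out of it

demo.setLevel(logging.INFO)
demo.debug("dropped")     # below INFO: ignored
demo.info("kept")         # at INFO: recorded

demo.setLevel(logging.DEBUG)
demo.debug("kept too")    # now DEBUG passes as well

lines = stream.getvalue().splitlines()
print(lines)
```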

The log format above is missing a timestamp, and a log without times is not much use. Let's add one:

import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')

# output
12/12/2010 11:46:36 AM is when this event was logged.

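Log format attributes

%(name)s               Name of the logger
%(levelno)s            Log level as a number
%(levelname)s          Log level as text
%(pathname)s           Full path of the source file that issued the logging call, if available
%(filename)s           File name of the source file that issued the logging call
%(module)s             Name of the module that issued the logging call
%(funcName)s           Name of the function that issued the logging call
%(lineno)d             Source line number of the logging call
%(created)f            Time the record was created, as a UNIX timestamp (a floating-point number)
%(relativeCreated)d    Time in milliseconds at which the record was created, relative to when the logging module was loaded
%(asctime)s            Time as a string. The default format is "2003-07-08 16:49:45,896"; the digits after the comma are milliseconds
%(thread)d             Thread ID, if available
%(threadName)s         Thread name, if available
%(process)d            Process ID, if available
%(message)s            The message supplied by the user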
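
Several of these attributes can be combined in one format string. A small sketch, again writing to an in-memory stream; the logger name fmt-demo and the function do_work are illustrative only.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(name)s|%(levelname)s|%(levelno)s|%(funcName)s|%(message)s"))

log = logging.getLogger("fmt-demo")   # hypothetical logger name
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.DEBUG)

def do_work():
    # funcName in the record will be the name of this function.
    log.warning("disk space low")

do_work()
line = stream.getvalue().strip()
print(line)
```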

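Logging with Python's logging module involves four main classes. The summary in the official documentation describes them best:

Loggers expose the interface that application code uses directly;

Handlers send the log records (created by loggers) to the appropriate destination;

Filters provide a finer-grained facility for deciding which log records to output;

Formatters specify the final output format of log records.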

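logger

Every program obtains a Logger before it outputs anything. A Logger usually corresponds to the name of a program module. For example, the GUI module of a chat tool could get its Logger like this:
LOG = logging.getLogger("chat.gui")
and the core module like this:
LOG = logging.getLogger("chat.kernel")

Logger.setLevel(lel): sets the lowest level to log; levels below lel are ignored. debug is the lowest built-in level and critical the highest
Logger.addFilter(filt), Logger.removeFilter(filt): add or remove the given filter
Logger.addHandler(hdlr), Logger.removeHandler(hdlr): add or remove the given handler
Logger.debug(), Logger.info(), Logger.warning(), Logger.error(), Logger.critical(): log a message at the corresponding level

handler

A handler object is responsible for sending the relevant information to a given destination. Python's logging system offers many kinds of Handler. Some write to the console, some write to files, and others send information over the network. If those are not enough, you can write your own Handler. Several handlers can be attached with the addHandler() method.
Handler.setLevel(lel): sets the level of messages to handle; messages below lel are ignored
Handler.setFormatter(): chooses a format for this handler
Handler.addFilter(filt), Handler.removeFilter(filt): add or remove a filter object

Each Logger can have several Handlers attached. Some commonly used Handlers are described below.

1) logging.StreamHandler
This Handler writes to any file-like object such as sys.stdout or sys.stderr. Its constructor is:
StreamHandler([strm])
where strm is a file object. The default is sys.stderr.

2) logging.FileHandler
Similar to StreamHandler, but writes log output to a file, which FileHandler opens for you. Its constructor is:
FileHandler(filename[,mode])
filename is the file name and is required.
mode is the mode in which the file is opened; see the built-in open() function. The default is 'a', which appends to the end of the file.

3) logging.handlers.RotatingFileHandler
This Handler is similar to FileHandler, but it manages the file size. When the file reaches a certain size, it renames the current log file and creates a new file with the original name to continue logging. Suppose the log file is chat.log. When chat.log reaches the size limit, RotatingFileHandler renames it to chat.log.1. If chat.log.1 already exists, it is first renamed to chat.log.2, and so on. Finally chat.log is created again and logging continues. Its constructor is:
RotatingFileHandler( filename[, mode[, maxBytes[, backupCount]]])
The filename and mode parameters are the same as for FileHandler.
maxBytes is the maximum size of the log file. If maxBytes is 0, the log file can grow without limit and the renaming described above never happens.
backupCount is the number of backup files to keep. For example, if it is 2, then when the renaming described above happens, the existing chat.log.2 is deleted rather than renamed.

4) logging.handlers.TimedRotatingFileHandler
This Handler is similar to RotatingFileHandler, but instead of checking the file size to decide when to start a new log file, it starts one at fixed time intervals. The renaming works like that of RotatingFileHandler, except that the new file name gets a timestamp suffix rather than a number. Its constructor is:
TimedRotatingFileHandler( filename [,when [,interval [,backupCount]]])
The filename and backupCount parameters have the same meaning as for RotatingFileHandler.
interval is the time interval.
when is a string that gives the unit of the interval. It is case-insensitive and takes the following values:
S seconds
M minutes
H hours
D days
W0-W6 weekly on the given weekday (0 means Monday)
midnight every day at midnight
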
import logging

# create logger
logger = logging.getLogger('TEST-LOG')
logger.setLevel(logging.DEBUG)


# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create file handler and set level to warning
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)
# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# add formatter to ch and fh
ch.setFormatter(formatter)
fh.setFormatter(formatter)

# add ch and fh to logger
logger.addHandler(ch)
logger.addHandler(fh)

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')  # logger.warn() is a deprecated alias
logger.error('error message')
logger.critical('critical message')

Log file rotation example

import logging

from logging import handlers

logger = logging.getLogger(__name__)

log_file = "timelog.log"
#fh = handlers.RotatingFileHandler(filename=log_file,maxBytes=10,backupCount=3)
fh = handlers.TimedRotatingFileHandler(filename=log_file, when="S", interval=5, backupCount=3)


formatter = logging.Formatter('%(asctime)s %(module)s:%(lineno)d %(message)s')

fh.setFormatter(formatter)

logger.addHandler(fh)


logger.warning("test1")
logger.warning("test12")
logger.warning("test13")
logger.warning("test14")
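
The size-based variant is easier to observe in a short run than the time-based one. The sketch below assumes a temporary directory and a deliberately tiny maxBytes so that rotation happens after every few messages; the logger name rotate-demo is made up.

```python
import logging
import os
import tempfile
from logging import handlers

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "chat.log")

# Rotate once the file would reach 50 bytes; keep at most 2 backups.
rot = handlers.RotatingFileHandler(log_path, maxBytes=50, backupCount=2)
rot.setFormatter(logging.Formatter("%(message)s"))

log = logging.getLogger("rotate-demo")   # hypothetical logger name
log.addHandler(rot)
log.propagate = False
log.setLevel(logging.INFO)

for i in range(20):
    log.info("message number %d", i)

# Detach and close the handler so the files are flushed and released.
log.removeHandler(rot)
rot.close()

files = sorted(os.listdir(log_dir))
print(files)
```

Older backups beyond backupCount are deleted, so only the current file and two backups remain however many messages are written.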

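11. The re module

Common regular expression symbols

'.'     By default matches any single character except \n; with the DOTALL flag it matches any character, including a newline
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()
'*'     Matches the preceding character 0 or more times; re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     Matches the preceding character 1 or more times; re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?'     Matches the preceding character 1 or 0 times
'{m}'   Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times; re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     Matches the expression on the left or on the right of |; re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' Grouping; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'


'\A'    Matches only at the start of the string; re.search("\Aabc","alexabc") finds no match
'\Z'    Matches only at the end of the string (unlike $, not before a trailing newline)
'\d'    Matches a digit 0-9
'\D'    Matches a non-digit
'\w'    Matches [A-Za-z0-9_]
'\W'    Matches anything not in [A-Za-z0-9_]
'\s'    Matches whitespace such as space, \t, \n, \r; re.search("\s+","ab\tc1\n3").group() gives '\t'

'(?P<name>...)' Named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}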

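Most commonly used functions

re.match    match from the beginning of the string
re.search   match anywhere in the string
re.findall  return all matches as the elements of a list
re.split    split the string, using the matches as separators
re.sub      match and replace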
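
The difference between these five functions is easiest to see on one input. A short sketch with a made-up sample string:

```python
import re

text = "alex22 rain31 jack19"

m_match = re.match(r"\d+", text)      # None: the string does not START with digits
m_search = re.search(r"\d+", text)    # first run of digits anywhere in the string
all_nums = re.findall(r"\d+", text)   # every non-overlapping match, as a list
parts = re.split(r"\d+\s*", text)     # pieces between matches (digits plus spaces)
masked = re.sub(r"\d+", "#", text, count=2)  # replace only the first two matches

print(m_match, m_search.group(), all_nums, parts, masked)
```

Note that split returns a trailing empty string here, because the input ends with a match.
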
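The backslash plague
Like most programming languages, regular expressions use "\" as the escape character, and this can lead to a plague of backslashes. Suppose you need to match a literal "\" in some text. The regular expression written in the programming language then needs four backslashes, "\\\\": the first two and the last two each become one backslash after the language's own escaping, and the resulting two backslashes are then interpreted by the regular expression engine as one literal backslash. Python's raw strings solve this problem nicely: the regular expression in this example can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer have to worry about missing a backslash, and the expressions you write are easier to read.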
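
The equivalence described above can be checked directly. A minimal sketch; the sample path is made up:

```python
import re

path = r"C:\temp"                  # raw string: contains one real backslash

plain = re.search("\\\\", path)    # four backslashes in an ordinary literal
raw = re.search(r"\\", path)       # two backslashes in a raw literal

# Both literals produce the same two-character pattern string.
same_pattern = ("\\\\" == r"\\")

digits = re.findall(r"\d", "a1b22")  # r"\d" instead of "\\d"

print(plain.group(), raw.group(), same_pattern, digits)
```
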
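re.I (re.IGNORECASE): ignore case (the full name is given in parentheses, likewise below)
re.M (re.MULTILINE): multi-line mode, changes the behaviour of '^' and '$' (see the symbol table above)
re.S (re.DOTALL): dot-matches-all mode, changes the behaviour of '.'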
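
A short sketch of what each flag changes, using a made-up three-line string:

```python
import re

text = "Foo\nfoo bar\nBAR"

# re.I: letters match regardless of case.
ignore_case = re.findall(r"foo", text, flags=re.I)

# re.M: '^' matches at the start of every line, not just the string.
line_starts = re.findall(r"^\w+", text, flags=re.M)

# Without re.S, '.' does not match the newline between the two words.
no_dotall = re.search(r"Foo.foo", text)

# re.S: '.' matches any character, including the newline.
dotall = re.search(r"Foo.foo", text, flags=re.S)

print(ignore_case, line_starts, no_dotall, dotall.group())
```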
