Modules and Packages

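I. Definitions

Module: a unit for logically organizing Python code (variables, functions, classes, and logic that together implement some functionality). In essence, a module is a Python file ending in .py.
Package: a unit for logically organizing modules. In essence, a package is a directory, conventionally one containing an __init__.py file (since Python 3.3, a directory without __init__.py can also be imported as a namespace package).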

II. Import syntax

import module_name, module_name2, ...

from module_name import *

from module_name import m1, m2, m3

from module_name import logger as logger1

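III. What import actually does (locating the file and the search path)

Importing a module essentially means the interpreter executes that Python file once.
(import test: the name test is bound to a module object holding everything defined in test.py)
(from test import name: only name is bound in the current namespace)

import module_name -----> module_name.py -----> the path to module_name.py, found by searching sys.path

Importing a package essentially means the interpreter executes that package's __init__.py file.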
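As a rough illustration of the search step, the sketch below asks the import machinery where it would load the standard-library json package from, and then imports it. The choice of json is only an example; any importable name would do.

```python
import importlib.util
import sys

# Ask the import system where "json" would be loaded from, without importing it.
spec = importlib.util.find_spec("json")
print(spec.origin)        # for a package, this is the path of its __init__.py

# sys.path is the list of locations searched, in order.
for entry in sys.path[:3]:
    print(repr(entry))

import json
print(json.__name__)      # an imported module's __name__ is its own name
```

Because json is a package, the file that gets executed on import is its __init__.py, which matches the description above.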

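__name__

When the file is run as a script:
    __name__ equals '__main__'

When the file is imported as a module:
    __name__ equals the module's name

We can use this to make a .py file run different logic depending on how it is used.

For example: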

def say_hai(name):
    print('Hi, {}'.format(name))

# The code below does not run when this file is imported as a module
if __name__ == "__main__":
    print(__name__)
    input_name = input('your name:').strip()
    say_hai(input_name)

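IV. Optimizing imports

from module_test import test

Importing the name directly lets repeated calls use test() without looking up module_test.test each time.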

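V. Categories of modules

Modules loaded by import fall into four general categories:

1 Code written in Python (.py files)

2 C or C++ extensions compiled into shared libraries or DLLs

3 Packages containing a set of modules

4 Built-in modules written in C and linked into the Python interpreter

Commonly used standard-library modules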

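(1) The time module

Python usually represents time in one of these ways:

Timestamp: the number of seconds since January 1, 1970, e.g. time.time()
Formatted string: e.g. 2014-11-11 11:11, as produced by time.strftime('%Y-%m-%d %H:%M')
Structured time: a time.struct_time tuple holding year, month, day, weekday, and so on, e.g. time.localtime()

Because the time module is implemented mostly by calling the C library, behavior can differ between platforms.
UTC (Coordinated Universal Time) is the world time standard and the successor to Greenwich Mean Time.

China is at UTC+8. DST (Daylight Saving Time) is summer time.

Timestamp: generally, a timestamp is an offset in seconds counted from 00:00:00 UTC on January 1, 1970.

Running type(time.time()) returns float. time() returns a timestamp of this kind; older Python versions also offered clock(), which was removed in Python 3.8.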

Index	Attribute	Values
0	tm_year (year)	e.g. 2011
1	tm_mon (month)	1 - 12
2	tm_mday (day of the month)	1 - 31
3	tm_hour (hour)	0 - 23
4	tm_min (minute)	0 - 59
5	tm_sec (second)	0 - 61
6	tm_wday (weekday)	0 - 6 (0 is Monday)
7	tm_yday (day of the year)	1 - 366
8	tm_isdst (DST flag)	0, 1, or -1 (unknown)
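A small sketch of reading these fields, using time.gmtime(0) (the epoch itself) so that the values are the same on every machine:

```python
import time

# gmtime(0) is the epoch: 1970-01-01 00:00:00 UTC.
epoch = time.gmtime(0)

# Fields can be read by attribute name or by index.
print(epoch.tm_year, epoch[0])   # 1970 1970
print(epoch.tm_wday)             # 3: Monday is 0, so Thursday is 3
print(epoch.tm_yday)             # 1
```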

Commonly used functions in the time module:

1) time.localtime([secs]): converts a timestamp to a struct_time in the local time zone. If secs is not given, the current time is used.

>>> import time
>>> time.localtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=22, tm_min=57, tm_sec=42, tm_wday=3, tm_yday=298, tm_isdst=0)

2) time.time(): returns the timestamp of the current time.

>>> import time
>>> time.time()
1540479500.1852782

3) time.gmtime([secs]): like localtime(), but converts a timestamp to a struct_time in UTC (time zone 0).

>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=14, tm_min=59, tm_sec=23, tm_wday=3, tm_yday=298, tm_isdst=0)

4) time.mktime(t): converts a struct_time expressed in local time (UTC+8 in the session below) to a timestamp.

>>> import time
>>> x = time.localtime()
>>> time.mktime(x)
1540479626.0

5) time.sleep(secs): suspends the calling thread for the given number of seconds.

import time

# Sleep for 2 seconds, then print "Hello Python!"
time.sleep(2)
print("Hello Python!")

6) time.asctime([t]): converts a time tuple or struct_time to a string of the form 'Sun Jun 20 23:21:05 1993'. If no argument is given, time.localtime() is used.

>>> import time
>>> x = time.localtime()
>>> time.asctime(x)
'Thu Oct 25 23:00:26 2018'
>>>

7) time.ctime([secs]): converts a timestamp (a float counted in seconds) to the time.asctime() form. If the argument is omitted or None, time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).

>>> import time
>>> time.time()
1540459453.0845733
>>> time.ctime(time.time())
'Thu Oct 25 17:24:36 2018'
>>>

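8) time.strftime(format[, t]): converts a time tuple or struct_time (such as one returned by time.localtime() or time.gmtime()) to a formatted time string.

If t is not given, time.localtime() is used. If any field of the tuple is out of range, ValueError is raised.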

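Directive	Meaning	Note
%a	Locale's abbreviated weekday name
%A	Locale's full weekday name
%b	Locale's abbreviated month name
%B	Locale's full month name
%c	Locale's appropriate date and time representation
%d	Day of the month (01 - 31)
%H	Hour, 24-hour clock (00 - 23)
%I	Hour, 12-hour clock (01 - 12)
%j	Day of the year (001 - 366)
%m	Month (01 - 12)
%M	Minute (00 - 59)
%p	Locale's equivalent of AM or PM	1
%S	Second (00 - 61)	2
%U	Week number of the year (00 - 53), with Sunday as the first day of the week. All days before the first Sunday fall in week 0.	3
%w	Weekday as a number (0 - 6, 0 is Sunday)
%W	Same as %U, except that Monday is the first day of the week.	3
%x	Locale's appropriate date representation
%X	Locale's appropriate time representation
%y	Year without century (00 - 99)
%Y	Year with century
%Z	Time zone name (empty string if no time zone exists)
%%	A literal '%' character

Notes:

1. With strptime(), %p only affects the parsed hour when %I is used to parse the hour.
2. The documentation stresses that the range really is 0 - 61, not 59: 60 is valid for leap seconds, and 61 is supported for historical reasons.
3. With strptime(), %U and %W are only used in calculations when the day of the week and the year are specified.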

>>> import time
>>> time.strftime("%Y-%m-%d %A  %H:%M:%S ")
'2018-10-25 Thursday  17:33:29 '
>>> time.strftime(" %A  %H:%M:%S %Y-%m-%d ")
' Thursday  17:35:09 2018-10-25 '
>>>

9) time.strptime(string[, format]): parses a formatted time string into a struct_time. In effect it is the inverse of strftime().

>>> import time
>>> time.strptime(' Thursday  17:35:09 2018-10-25',' %A  %H:%M:%S %Y-%m-%d')
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=17, tm_min=35, tm_sec=9, tm_wday=3, tm_yday=298, tm_isdst=-1)
>>>
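To make the inverse relationship concrete, here is a minimal round trip through the three representations. The format string and the sample date are only illustrative.

```python
import time

fmt = "%Y-%m-%d %H:%M:%S"

# string -> struct_time
parsed = time.strptime("2018-10-25 17:35:09", fmt)

# struct_time -> string (should give back the original text)
text = time.strftime(fmt, parsed)
print(text)

# struct_time (local time) -> timestamp -> struct_time
stamp = time.mktime(parsed)
back = time.localtime(stamp)
print(back.tm_hour, back.tm_min, back.tm_sec)
```

strptime() also fills in tm_wday and tm_yday from the date, even though the format here contains neither.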

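10) time.clock(): use with care. It was deprecated in Python 3.3 and removed in Python 3.8, and its meaning differed between systems. On UNIX it returned the processor time used by the process, as a floating-point number of seconds.

On Windows, it returned the wall-clock time elapsed since the first call to the function, so the first call returned a value close to zero.

(This was based on the Win32 function QueryPerformanceCounter(), whose resolution is typically better than one microsecond.)

On current Python versions, time.perf_counter() and time.process_time() cover these two uses. The session below shows the Windows behavior on a Python version older than 3.8:

>>> import time
>>> if __name__ == '__main__':
...     time.sleep(1)
...     print("clock1:%s" % time.clock())
...     time.sleep(1)
...     print("clock2:%s" % time.clock())
...     time.sleep(1)
...     print("clock3:%s" % time.clock())
...
clock1:2.5e-06
clock2:1.0002382
clock3:2.0004314
>>>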
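Since time.clock() no longer exists on Python 3.8 and later, a comparable measurement can be sketched with time.perf_counter() (elapsed wall-clock time) and time.process_time() (CPU time of the process). The 0.2 second sleep is an arbitrary choice.

```python
import time

start = time.perf_counter()       # high-resolution wall-clock timer
cpu_start = time.process_time()   # CPU time used by this process

time.sleep(0.2)                   # sleeping uses wall-clock time but almost no CPU

elapsed = time.perf_counter() - start
cpu_used = time.process_time() - cpu_start
print("elapsed: %.3f s" % elapsed)
print("cpu:     %.3f s" % cpu_used)
```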

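Converting between time representations

timestamp -> struct_time: time.localtime(), time.gmtime()
struct_time -> timestamp: time.mktime()
struct_time -> formatted string: time.strftime()
formatted string -> struct_time: time.strptime()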

datetime

>>> import datetime

# current time
>>> datetime.datetime.now()
datetime.datetime(2018, 10, 25, 14, 58, 9, 526923)

# three days from now
>>> print(datetime.datetime.now() + datetime.timedelta(3))
2018-10-28 14:59:58.085724

# three days ago
>>> print(datetime.datetime.now() + datetime.timedelta(-3))
2018-10-22 15:01:00.604181

# current time plus 3 hours
>>> print(datetime.datetime.now() + datetime.timedelta(hours=3))
2018-10-25 18:02:36.695773

# current time plus 30 minutes
>>> print(datetime.datetime.now() + datetime.timedelta(minutes=30))
2018-10-25 15:33:21.053755

# replacing fields
>>> c_time = datetime.datetime.now()
>>> print(c_time.replace(minute=3, hour=2))
2018-10-25 02:03:39.820451
>>>

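datetime.date.today() returns the local date as a date object (str() gives its text form, e.g. 2014-03-24)
datetime.date.isoformat(obj) returns the date as a 'YYYY-MM-DD' string (2014-03-24)
datetime.date.fromtimestamp(timestamp) returns the local date object for a timestamp
datetime.date.weekday(obj) returns the day of the week of a date object, where Monday is 0
datetime.date.isoweekday(obj) returns the day of the week of a date object, where Monday is 1
datetime.date.isocalendar(obj) returns the date as (ISO year, ISO week number, ISO weekday)
datetime objects:
datetime.datetime.today() returns a datetime object with the local time, including microseconds: 2014-03-24 23:31:50.419000
datetime.datetime.now([tz]) returns a datetime object for the given time zone (local time if tz is omitted): 2014-03-24 23:31:50.419000
datetime.datetime.utcnow() returns a naive datetime object in UTC (deprecated since Python 3.12)
datetime.datetime.fromtimestamp(timestamp[, tz]) returns a datetime object for a timestamp, optionally in a given time zone; the result can be formatted with strftime
datetime.datetime.utcfromtimestamp(timestamp) returns a naive UTC datetime object for a timestamp (deprecated since Python 3.12)
datetime.datetime.strptime('2014-03-16 12:21:21', "%Y-%m-%d %H:%M:%S") parses a string into a datetime object
datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d %H%M%S') formats a datetime object as a str
datetime.date.today().timetuple() returns a time.struct_time, which can then be converted to a timestamp
datetime.datetime.now().timetuple() does the same for a datetime object
time.mktime(timetupleobj) converts that struct_time to a timestamp
time.time() returns the current timestamp
time.localtime() converts a timestamp to a local struct_time
time.gmtime() converts a timestamp to a UTC struct_time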
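A short sketch that chains several of the calls listed above on one fixed date, so the results can be checked by hand. The date is the one used in the strptime entry; the timedelta values are arbitrary.

```python
import datetime

d = datetime.datetime.strptime("2014-03-16 12:21:21", "%Y-%m-%d %H:%M:%S")

# Arithmetic with timedelta, then formatting.
later = d + datetime.timedelta(days=3, hours=3)
print(later.strftime("%Y%m%d %H%M%S"))   # 20140319 152121

# Date-level helpers.
print(d.date().isoformat())              # 2014-03-16
print(d.weekday(), d.isoweekday())       # 6 7 (a Sunday)
print(tuple(d.isocalendar()))            # (2014, 11, 7)

# datetime -> timestamp -> datetime (both in local time).
stamp = d.timestamp()
print(datetime.datetime.fromtimestamp(stamp) == d)
```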

(2) The random module

random.random() generates a random floating-point number n with 0 <= n < 1.0

>>> import random
>>> random.random()
0.8048731160537441
>>> random.random()
0.540423134210193
>>> random.random()
0.5877892352747521
>>>

random.randint(a, b) generates an integer in a given range. a is the lower bound and b is the upper bound; the result n satisfies a <= n <= b.

>>> import random
>>> random.randint(1,9)
3
>>> random.randint(1,9)
7
>>> random.randint(1,9)
5
>>> random.randint(1,9)
9

random.randrange([start], stop[, step]) returns a random number from the range that starts at start, stops before stop, and increases by step.

For example, random.randrange(10, 100, 2) is equivalent to picking a random number from the sequence [10, 12, 14, 16, ... 96, 98].

>>> import random
>>> random.randrange(1,10,2)
1
>>> random.randrange(1,10,2)
7
>>> random.randrange(1,10,2)
9
>>> random.randrange(1,10,2)
5
>>> random.randrange(1,10,2)
1
>>> random.randrange(1,10,2)
9
>>> random.randrange(1,10,2)
3

random.choice(sequence) returns a random element from a sequence. The parameter sequence is an ordered type.

In Python, sequence is not one specific type but a family of types: list, tuple, and str are all sequences.

>>> import random
>>> random.choice("I Love You")
'o'
>>> random.choice("I Love You")
'Y'
>>> random.choice("I Love You")
'v'
>>> random.choice("I Love You")
' '
>>> random.choice("I Love You")
' '
>>> random.choice("I Love You")
'L'
>>> random.choice("I Love You")
' '
>>>

Practical examples:

import random

# Random integer:
print(random.randint(0, 99))

# Random even number between 0 and 100:
print(random.randrange(0, 101, 2))

# Random floating-point numbers:
print(random.random())  # 0.2746445568079129
print(random.uniform(1, 10))  # 9.887001463194844

# Random character:
print(random.choice('abcdefg&#%^*f'))  # f

# A given number of characters picked from a string, without reusing a position:
print(random.sample('abcdefghij', 3))  # ['f', 'h', 'd']

# Random string from a list:
print(random.choice(['apple', 'pear', 'peach', 'orange', 'lemon']))  # apple

# Shuffling
items = [1, 2, 3, 4, 5, 6, 7]
print(items)  # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)
print(items)  # [1, 4, 7, 2, 5, 3, 6]

Generating a random verification code

import random

checkcode = ''
for i in range(4):
    current = random.randrange(0, 4)
    if i == current:
        # letter: code points 65-90 are 'A'-'Z'
        tmp = chr(random.randint(65, 90))
    else:
        # digit
        tmp = random.randint(0, 9)
    checkcode += str(tmp)
print(checkcode)
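The same idea can be written more compactly by drawing every character from one alphabet. This is only a sketch: make_code is a hypothetical helper name, and the alphabet (uppercase letters plus digits) is an assumption rather than something the loop above requires.

```python
import random
import string

def make_code(length=4, rng=random):
    """Return a code of uppercase letters and digits (hypothetical helper)."""
    alphabet = string.ascii_uppercase + string.digits
    return ''.join(rng.choice(alphabet) for _ in range(length))

print(make_code())

# Passing a seeded random.Random instance makes the result repeatable,
# which is convenient in tests.
print(make_code(6, random.Random(42)))
```

The random module is not designed for security-sensitive values; for codes that must be hard to guess, the standard-library secrets module (for example secrets.choice) is the usual alternative.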

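(3) The os module

Provides an interface for calling into the operating system.

os.getcwd()  returns the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")  changes the current working directory; like cd in a shell
os.curdir  the string for the current directory: ('.')
os.pardir  the string for the parent directory: ('..')
os.makedirs('dirname1/dirname2')  creates nested directories recursively
os.removedirs('dirname1')  removes the directory if it is empty, then moves up to the parent and removes it if it is also empty, and so on
os.mkdir('dirname')  creates a single directory; like mkdir dirname in a shell
os.rmdir('dirname')  removes a single empty directory and raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')  returns a list of all files and subdirectories in the given directory, including hidden files
os.remove()  deletes a file
os.rename("oldname","newname")  renames a file or directory
os.stat('path/filename')  returns information about a file or directory
os.sep  the path separator of the operating system: "\\" on Windows, "/" on Linux
os.linesep  the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep  the string that separates entries in a list of paths such as PATH: ";" on Windows, ":" on Linux
os.name  a string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  runs a shell command; its output goes directly to the terminal
os.environ  the environment variables of the system
os.path.abspath(path)  returns the normalized absolute version of path
os.path.split(path)  splits path into a (directory, file name) tuple
os.path.dirname(path)  returns the directory of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  returns the final file name of path, i.e. the second element of os.path.split(path); if path ends with a path separator, the result is an empty string
os.path.exists(path)  returns True if path exists, otherwise False
os.path.isabs(path)  returns True if path is an absolute path
os.path.isfile(path)  returns True if path is an existing file, otherwise False
os.path.isdir(path)  returns True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]])  joins several paths and returns the result; any arguments before the last absolute path are discarded
os.path.getatime(path)  returns the last access time of the file or directory at path
os.path.getmtime(path)  returns the last modification time of the file or directory at path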
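The sketch below exercises a few of these calls inside a temporary directory, so it leaves nothing behind. The directory and file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # makedirs creates both levels in one call.
    nested = os.path.join(root, "dirname1", "dirname2")
    os.makedirs(nested)

    file_path = os.path.join(nested, "note.txt")
    with open(file_path, "w") as f:
        f.write("hello")

    # split returns (directory, file name); basename/dirname are its two halves.
    directory, filename = os.path.split(file_path)
    print(filename)
    print(os.path.basename(directory))
    print(os.path.isfile(file_path), os.path.isdir(nested))

    os.rename(file_path, os.path.join(nested, "renamed.txt"))
    listing = os.listdir(nested)
    print(listing)
```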

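(4) The sys module

sys.argv           list of command-line arguments; the first element is the path of the script itself
sys.exit(n)        exits the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxsize        the largest value a Py_ssize_t can hold (Python 2's sys.maxint, the largest int, was removed in Python 3)
sys.path           the module search path, initialized from the script's directory, the PYTHONPATH environment variable, and installation defaults
sys.platform       name of the operating system platform

import sys

sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]  # drop the trailing newline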
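A common way to combine sys.argv and sys.exit is to have a main function return the exit status. The sketch below follows that pattern; main and the usage text are illustrative names, and the argument list is passed in explicitly so the example runs without a real command line.

```python
import sys

def main(argv):
    """Return an exit status: 0 on success, 1 when no arguments are given."""
    args = argv[1:]            # argv[0] is the script itself
    if not args:
        sys.stderr.write("usage: script.py NAME...\n")
        return 1
    for name in args:
        sys.stdout.write("Hi, %s\n" % name)
    return 0

status = main(["script.py", "Alice", "Bob"])
print(status)
```

In a real script the last lines would typically be replaced by sys.exit(main(sys.argv)) under an if __name__ == "__main__": guard.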

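(5) The shutil module

A module for high-level operations on files, directories, and archives.

shutil.copyfileobj(fsrc, fdst[, length])
Copies the contents of one file object to another, reading length bytes at a time.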

def copyfileobj(fsrc, fdst, length=16*1024):

    """copy data from file-like object fsrc to file-like object fdst"""

    while 1:

        buf = fsrc.read(length)

        if not buf:

            break

        fdst.write(buf)

import shutil

# Open both files in a with block so they are closed after the copy
with open("程序员必逛的网站.txt", encoding='gbk') as f1, \
     open("笔记本2", 'w', encoding='utf-8') as f2:
    shutil.copyfileobj(f1, f2)

shutil.copyfile(src, dst)
Copies the contents of a file.

def copyfile(src, dst):

    """Copy data from src to dst"""

    if _samefile(src, dst):

        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:

        try:

            st = os.stat(fn)

        except OSError:

            # File most likely does not exist

            pass

        else:

            # XXX What about other special files? (sockets, devices...)

            if stat.S_ISFIFO(st.st_mode):

                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:

        with open(dst, 'wb') as fdst:

            copyfileobj(fsrc, fdst)

import shutil
shutil.copyfile('笔记本2','笔记本3')

shutil.copymode(src, dst)
Copies only the permission bits. The file contents, group, and owner are left unchanged.

def copymode(src, dst):

    """Copy mode bits from src to dst"""

    if hasattr(os, 'chmod'):

        st = os.stat(src)

        mode = stat.S_IMODE(st.st_mode)

        os.chmod(dst, mode)

shutil.copystat(src, dst)
Copies the stat information: mode bits, atime, mtime, and flags.

def copystat(src, dst):

    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""

    st = os.stat(src)

    mode = stat.S_IMODE(st.st_mode)

    if hasattr(os, 'utime'):

        os.utime(dst, (st.st_atime, st.st_mtime))

    if hasattr(os, 'chmod'):

        os.chmod(dst, mode)

    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):

        try:

            os.chflags(dst, st.st_flags)

        except OSError as why:

            for err in 'EOPNOTSUPP', 'ENOTSUP':

                if hasattr(errno, err) and why.errno == getattr(errno, err):

                    break

            else:

                raise

shutil.copy(src, dst)
Copies the file contents and the permission bits.

def copy(src, dst):

    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """

    if os.path.isdir(dst):

        dst = os.path.join(dst, os.path.basename(src))

    copyfile(src, dst)

    copymode(src, dst)

shutil.copy2(src, dst)
Copies the file contents and the stat information.

def copy2(src, dst):

    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """

    if os.path.isdir(dst):

        dst = os.path.join(dst, os.path.basename(src))

    copyfile(src, dst)

    copystat(src, dst)

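shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copies a directory tree. ignore_patterns() builds a callable that can be passed as ignore to skip names matching the patterns.

import shutil
shutil.copytree('a','new_a')

shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree.

import shutil
shutil.rmtree('new_a')

shutil.move(src, dst)
Recursively moves a file or directory.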

def move(src, dst):

    """Recursively move a file or directory to another location. This is

    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source

    is moved inside the directory. The destination path must not already

    exist.

    If the destination already exists but is not a directory, it may be

    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.

    Otherwise, src is copied to the destination and then removed.

    A lot more could be done here...  A look at a mv.c shows a lot of

    the issues this implementation glosses over.

    """

    real_dst = dst

    if os.path.isdir(dst):

        if _samefile(src, dst):

            # We might be on a case insensitive filesystem,

            # perform the rename anyway.

            os.rename(src, dst)

            return

        real_dst = os.path.join(dst, _basename(src))

        if os.path.exists(real_dst):

            raise Error("Destination path '%s' already exists" % real_dst)

    try:

        os.rename(src, real_dst)

    except OSError:

        if os.path.isdir(src):

            if _destinsrc(src, dst):

                raise Error("Cannot move a directory '%s' into itself '%s'." % (src, dst))

            copytree(src, real_dst, symlinks=True)

            rmtree(src)

        else:

            copy2(src, real_dst)

            os.unlink(src)
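The copytree, move, and rmtree calls described above can be tried safely inside a temporary directory. In this sketch the file names and the "*.log" pattern are arbitrary choices for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small source tree: a/keep.txt and a/skip.log
    src = os.path.join(root, "a")
    os.makedirs(src)
    for name in ("keep.txt", "skip.log"):
        with open(os.path.join(src, name), "w") as f:
            f.write(name)

    # Copy the tree, skipping everything that matches *.log
    dst = os.path.join(root, "new_a")
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.log"))
    copied = sorted(os.listdir(dst))
    print(copied)

    # Move the copy, then delete it recursively
    moved = os.path.join(root, "moved_a")
    shutil.move(dst, moved)
    print(os.path.exists(dst), os.path.exists(moved))

    shutil.rmtree(moved)
    remaining = sorted(os.listdir(root))
    print(remaining)
```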

shutil.make_archive(base_name, format,...)

import shutil

# Raw string, so the backslashes in the Windows path are not read as escapes
shutil.make_archive('shutil_make_archive', 'zip', r'H:\Python3_study\jichu\day1')

 def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,

                  dry_run=0, owner=None, group=None, logger=None):

     """Create an archive file (eg. zip or tar).

     'base_name' is the name of the file to create, minus any format-specific

     extension; 'format' is the archive format: one of "zip", "tar", "bztar"

     or "gztar".

     'root_dir' is a directory that will be the root directory of the

     archive; ie. we typically chdir into 'root_dir' before creating the

     archive.  'base_dir' is the directory where we start archiving from;

     ie. 'base_dir' will be the common prefix of all files and

     directories in the archive.  'root_dir' and 'base_dir' both default

     to the current directory.  Returns the name of the archive file.

     'owner' and 'group' are used when creating a tar archive. By default,

     uses the current owner and group.

     """

     save_cwd = os.getcwd()

     if root_dir is not None:

         if logger is not None:

             logger.debug("changing into '%s'", root_dir)

         base_name = os.path.abspath(base_name)

         if not dry_run:

             os.chdir(root_dir)

     if base_dir is None:

         base_dir = os.curdir

     kwargs = {'dry_run': dry_run, 'logger': logger}

     try:

         format_info = _ARCHIVE_FORMATS[format]

     except KeyError:

         raise ValueError("unknown archive format '%s'" % format)

     func = format_info[0]

     for arg, val in format_info[1]:

         kwargs[arg] = val

     if format != 'zip':

         kwargs['owner'] = owner

         kwargs['group'] = group

     try:

         filename = func(base_name, base_dir, **kwargs)

     finally:

         if root_dir is not None:

             if logger is not None:

                 logger.debug("changing back to '%s'", save_cwd)

             os.chdir(save_cwd)

     return filename

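Creates an archive (e.g. zip or tar) and returns the path of the archive file.

base_name: the name of the archive file, without the format-specific extension; it may also include a path. With a bare file name the archive is saved in the current directory, otherwise in the given location, e.g.
www => saved in the current directory
/Users/wupeiqi/www => saved in /Users/wupeiqi/

format: the archive type: "zip", "tar", "bztar", or "gztar"

root_dir: the directory to archive (defaults to the current directory)

owner: the owner, defaults to the current user

group: the group, defaults to the current group

logger: used for logging, usually a logging.Logger object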
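To see how base_name and root_dir interact, the following sketch archives a temporary directory and then lists the archive's contents. The directory name day1 and the file a.log are placeholders.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    # The directory whose contents will be archived.
    data_dir = os.path.join(root, "day1")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "a.log"), "w") as f:
        f.write("log line\n")

    # base_name has no extension; make_archive appends ".zip" itself.
    base_name = os.path.join(root, "shutil_make_archive")
    archive_path = shutil.make_archive(base_name, "zip", root_dir=data_dir)
    print(os.path.basename(archive_path))

    # Paths inside the archive are relative to root_dir.
    with zipfile.ZipFile(archive_path) as z:
        names = z.namelist()
    print(names)
```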

shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

zipfile: compressing and extracting

import tarfile

# Compress
tar = tarfile.open('your.tar', 'w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar', 'r')
tar.extractall()  # a destination directory can be passed as path
tar.close()

tarfile: compressing and extracting
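Both ZipFile and TarFile objects also work as context managers, which closes the archive even if an error occurs. A minimal sketch, run in a temporary directory with a placeholder file a.log:

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "a.log")
    with open(source, "w") as f:
        f.write("log line\n")

    # zip: write one member, then read the member list back
    zip_path = os.path.join(root, "laxi.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(source, arcname="a.log")
    with zipfile.ZipFile(zip_path) as z:
        zip_names = z.namelist()

    # tar: same steps with the tarfile API
    tar_path = os.path.join(root, "your.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source, arcname="a.log")
    with tarfile.open(tar_path, "r") as tar:
        tar_names = tar.getnames()

    print(zip_names, tar_names)
```

arcname controls the name stored inside the archive, so the temporary directory's absolute path does not leak into it.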

class ZipFile(object):

    """ Class with methods to open, read, write, close, list zip files.

    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)

    file: Either the path to the file, or a file-like object.

          If it is a path, the file will be opened and closed by ZipFile.

    mode: The mode can be either read "r", write "w" or append "a".

    compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).

    allowZip64: if True ZipFile will create files with ZIP64 extensions when

                needed, otherwise it will raise an exception when this would

                be necessary.

    """

    fp = None                   # Set here since __del__ checks it

    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):

        """Open the ZIP file with mode read "r", write "w" or append "a"."""

        if mode not in ("r", "w", "a"):

            raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')

        if compression == ZIP_STORED:

            pass

        elif compression == ZIP_DEFLATED:

            if not zlib:

                raise RuntimeError,\

                      "Compression requires the (missing) zlib module"

        else:

            raise RuntimeError, "That compression method is not supported"

        self._allowZip64 = allowZip64

        self._didModify = False

        self.debug = 0  # Level of printing: 0 through 3

        self.NameToInfo = {}    # Find file info given name

        self.filelist = []      # List of ZipInfo instances for archive

        self.compression = compression  # Method of compression

        self.mode = key = mode.replace('b', '')[0]

        self.pwd = None

        self._comment = ''

        # Check if we were passed a file-like object

        if isinstance(file, basestring):

            self._filePassed = 0

            self.filename = file

            modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}

            try:

                self.fp = open(file, modeDict[mode])

            except IOError:

                if mode == 'a':

                    mode = key = 'w'

                    self.fp = open(file, modeDict[mode])

                else:

                    raise

        else:

            self._filePassed = 1

            self.fp = file

            self.filename = getattr(file, 'name', None)

        try:

            if key == 'r':

                self._RealGetContents()

            elif key == 'w':

                # set the modified flag so central directory gets written

                # even if no files are added to the archive

                self._didModify = True

            elif key == 'a':

                try:

                    # See if file is a zip file

                    self._RealGetContents()

                    # seek to start of directory and overwrite

                    self.fp.seek(self.start_dir, 0)

                except BadZipfile:

                    # file is not a zip file, just append

                    self.fp.seek(0, 2)

                    # set the modified flag so central directory gets written

                    # even if no files are added to the archive

                    self._didModify = True

            else:

                raise RuntimeError('Mode must be "r", "w" or "a"')

        except:

            fp = self.fp

            self.fp = None

            if not self._filePassed:

                fp.close()

            raise

    def __enter__(self):

        return self

    def __exit__(self, type, value, traceback):

        self.close()

    def _RealGetContents(self):

        """Read in the table of contents for the ZIP file."""

        fp = self.fp

        try:

            endrec = _EndRecData(fp)

        except IOError:

            raise BadZipfile("File is not a zip file")

        if not endrec:

            raise BadZipfile, "File is not a zip file"

        if self.debug > 1:

            print endrec

        size_cd = endrec[_ECD_SIZE]             # bytes in central directory

        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory

        self._comment = endrec[_ECD_COMMENT]    # archive comment

        # "concat" is zero, unless zip was concatenated to another file

        concat = endrec[_ECD_LOCATION] - size_cd - offset_cd

        if endrec[_ECD_SIGNATURE] == stringEndArchive64:

            # If Zip64 extension structures are present, account for them

            concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)

        if self.debug > 2:

            inferred = concat + offset_cd

            print "given, inferred, offset", offset_cd, inferred, concat

        # self.start_dir:  Position of start of central directory

        self.start_dir = offset_cd + concat

        fp.seek(self.start_dir, 0)

        data = fp.read(size_cd)

        fp = cStringIO.StringIO(data)

        total = 0

        while total < size_cd:

            centdir = fp.read(sizeCentralDir)

            if len(centdir) != sizeCentralDir:

                raise BadZipfile("Truncated central directory")

            centdir = struct.unpack(structCentralDir, centdir)

            if centdir[_CD_SIGNATURE] != stringCentralDir:

                raise BadZipfile("Bad magic number for central directory")

            if self.debug > 2:

                print centdir

            filename = fp.read(centdir[_CD_FILENAME_LENGTH])

            # Create ZipInfo instance to store file information

            x = ZipInfo(filename)

            x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])

            x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])

            x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]

            (x.create_version, x.create_system, x.extract_version, x.reserved,

                x.flag_bits, x.compress_type, t, d,

                x.CRC, x.compress_size, x.file_size) = centdir[1:12]

            x.volume, x.internal_attr, x.external_attr = centdir[15:18]

            # Convert date/time code to (year, month, day, hour, min, sec)

            x._raw_time = t

            x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,

                                     t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

            x._decodeExtra()

            x.header_offset = x.header_offset + concat

            x.filename = x._decodeFilename()

            self.filelist.append(x)

            self.NameToInfo[x.filename] = x

            # update total bytes read from central directory

            total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]

                     + centdir[_CD_EXTRA_FIELD_LENGTH]

                     + centdir[_CD_COMMENT_LENGTH])

            if self.debug > 2:

                print "total", total

    def namelist(self):

        """Return a list of file names in the archive."""

        l = []

        for data in self.filelist:

            l.append(data.filename)

        return l

    def infolist(self):

        """Return a list of class ZipInfo instances for files in the

        archive."""

        return self.filelist

    def printdir(self):

        """Print a table of contents for the zip file."""

        print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")

        for zinfo in self.filelist:

            date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]

            print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)

    def testzip(self):

        """Read all the files and check the CRC."""

        chunk_size = 2 ** 20

        for zinfo in self.filelist:

            try:

                # Read by chunks, to avoid an OverflowError or a

                # MemoryError with very large embedded files.

                with self.open(zinfo.filename, "r") as f:

                    while f.read(chunk_size):     # Check CRC-32

                        pass

            except BadZipfile:

                return zinfo.filename

    def getinfo(self, name):

        """Return the instance of ZipInfo given 'name'."""

        info = self.NameToInfo.get(name)

        if info is None:

            raise KeyError(

                'There is no item named %r in the archive' % name)

        return info

    def setpassword(self, pwd):

        """Set default password for encrypted files."""

        self.pwd = pwd

    @property

    def comment(self):

        """The comment text associated with the ZIP file."""

        return self._comment

    @comment.setter

    def comment(self, comment):

        # check for valid comment length

        if len(comment) > ZIP_MAX_COMMENT:

            import warnings

            warnings.warn('Archive comment is too long; truncating to %d bytes'

                          % ZIP_MAX_COMMENT, stacklevel=2)

            comment = comment[:ZIP_MAX_COMMENT]

        self._comment = comment

        self._didModify = True

    def read(self, name, pwd=None):

        """Return file bytes (as a string) for name."""

        return self.open(name, "r", pwd).read()

    def open(self, name, mode="r", pwd=None):

        """Return file-like object for 'name'."""

        if mode not in ("r", "U", "rU"):

            raise RuntimeError, 'open() requires mode "r", "U", or "rU"'

        if not self.fp:

            raise RuntimeError, \

                  "Attempt to read ZIP archive that was already closed"

        # Only open a new file for instances where we were not

        # given a file object in the constructor

        if self._filePassed:

            zef_file = self.fp

            should_close = False

        else:

            zef_file = open(self.filename, 'rb')

            should_close = True

        try:

            # Make sure we have an info object

            if isinstance(name, ZipInfo):

                # 'name' is already an info object

                zinfo = name

            else:

                # Get info object for name

                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:

            fheader = zef_file.read(sizeFileHeader)

            if len(fheader) != sizeFileHeader:

                raise BadZipfile("Truncated file header")

            fheader = struct.unpack(structFileHeader, fheader)

            if fheader[_FH_SIGNATURE] != stringFileHeader:

                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])

            if fheader[_FH_EXTRA_FIELD_LENGTH]:

                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:

                raise BadZipfile, \

                        'File name in directory "%s" and header "%s" differ.' % (

                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password

            is_encrypted = zinfo.flag_bits & 0x1

            zd = None

            if is_encrypted:

                if not pwd:

                    pwd = self.pwd

                if not pwd:

                    raise RuntimeError, "File %s is encrypted, " \

                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)

                # The first 12 bytes in the cypher stream is an encryption header

                #  used to strengthen the algorithm. The first 11 bytes are

                #  completely random, while the 12th contains the MSB of the CRC,

                #  or the MSB of the file time depending on the header type

                #  and is used to check the correctness of the password.

                bytes = zef_file.read(12)

                h = map(zd, bytes[0:12])

                if zinfo.flag_bits & 0x8:

                    # compare against the file type from extended local headers

                    check_byte = (zinfo._raw_time >> 8) & 0xff

                else:

                    # compare against the CRC otherwise

                    check_byte = (zinfo.CRC >> 24) & 0xff

                if ord(h[11]) != check_byte:

                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,

                    close_fileobj=should_close)

        except:

            if should_close:

                zef_file.close()

            raise

    def extract(self, member, path=None, pwd=None):

        """Extract a member from the archive to the current working directory,

           using its full name. Its file information is extracted as accurately

           as possible. `member' may be a filename or a ZipInfo object. You can

           specify a different directory using `path'.

        """

        if not isinstance(member, ZipInfo):

            member = self.getinfo(member)

        if path is None:

            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):

        """Extract all members from the archive to the current working

           directory. `path' specifies a different directory to extract to.

           `members' is optional and must be a subset of the list returned

           by namelist().

        """

        if members is None:

            members = self.namelist()

        for zipinfo in members:

            self.extract(zipinfo, path, pwd)

    def _extract_member(self, member, targetpath, pwd):

        """Extract the ZipInfo object 'member' to a physical

           file on the path targetpath.

        """

        # build the destination pathname, replacing

        # forward slashes to platform specific separators.

        arcname = member.filename.replace('/', os.path.sep)

        if os.path.altsep:

            arcname = arcname.replace(os.path.altsep, os.path.sep)

        # interpret absolute pathname as relative, remove drive letter or

        # UNC path, redundant separators, "." and ".." components.

        arcname = os.path.splitdrive(arcname)[1]

        arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)

                    if x not in ('', os.path.curdir, os.path.pardir))

        if os.path.sep == '\\':

            # filter illegal characters on Windows

            illegal = ':<>|"?*'

            if isinstance(arcname, unicode):

                table = {ord(c): ord('_') for c in illegal}

            else:

                table = string.maketrans(illegal, '_' * len(illegal))

            arcname = arcname.translate(table)

            # remove trailing dots

            arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))

            arcname = os.path.sep.join(x for x in arcname if x)

        targetpath = os.path.join(targetpath, arcname)

        targetpath = os.path.normpath(targetpath)

        # Create all upper directories if necessary.

        upperdirs = os.path.dirname(targetpath)

        if upperdirs and not os.path.exists(upperdirs):

            os.makedirs(upperdirs)

        if member.filename[-1] == '/':

            if not os.path.isdir(targetpath):

                os.mkdir(targetpath)

            return targetpath

        with self.open(member, pwd=pwd) as source, \

             open(targetpath, "wb") as target:

            shutil.copyfileobj(source, target)

        return targetpath

    def _writecheck(self, zinfo):

        """Check for errors before writing a file to the archive."""

        if zinfo.filename in self.NameToInfo:

            import warnings

            warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)

        if self.mode not in ("w", "a"):

            raise RuntimeError('write() requires mode "w" or "a"')

        if not self.fp:

            raise RuntimeError(

                  "Attempt to write ZIP archive that was already closed")

        if zinfo.compress_type == ZIP_DEFLATED and not zlib:

            raise RuntimeError(

                  "Compression requires the (missing) zlib module")

        if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):

            raise RuntimeError(

                  "That compression method is not supported")

        if not self._allowZip64:

            requires_zip64 = None

            if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:

                requires_zip64 = "Files count"

            elif zinfo.file_size > ZIP64_LIMIT:

                requires_zip64 = "Filesize"

            elif zinfo.header_offset > ZIP64_LIMIT:

                requires_zip64 = "Zipfile size"

            if requires_zip64:

                raise LargeZipFile(requires_zip64 +

                                   " would require ZIP64 extensions")

    def write(self, filename, arcname=None, compress_type=None):

        """Put the bytes from filename into the archive under the name

        arcname."""

        if not self.fp:

            raise RuntimeError(

                  "Attempt to write to ZIP archive that was already closed")

        st = os.stat(filename)

        isdir = stat.S_ISDIR(st.st_mode)

        mtime = time.localtime(st.st_mtime)

        date_time = mtime[0:6]

        # Create ZipInfo instance to store file information

        if arcname is None:

            arcname = filename

        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])

        while arcname[0] in (os.sep, os.altsep):

            arcname = arcname[1:]

        if isdir:

            arcname += '/'

        zinfo = ZipInfo(arcname, date_time)

        zinfo.external_attr = (st[0] & 0xFFFF) << 16      # Unix attributes

        if compress_type is None:

            zinfo.compress_type = self.compression

        else:

            zinfo.compress_type = compress_type

        zinfo.file_size = st.st_size

        zinfo.flag_bits = 0x00

        zinfo.header_offset = self.fp.tell()    # Start of header bytes

        self._writecheck(zinfo)

        self._didModify = True

        if isdir:

            zinfo.file_size = 0

            zinfo.compress_size = 0

            zinfo.CRC = 0

            zinfo.external_attr |= 0x10  # MS-DOS directory flag

            self.filelist.append(zinfo)

            self.NameToInfo[zinfo.filename] = zinfo

            self.fp.write(zinfo.FileHeader(False))

            return

        with open(filename, "rb") as fp:

            # Must overwrite CRC and sizes with correct data later

            zinfo.CRC = CRC = 0

            zinfo.compress_size = compress_size = 0

            # Compressed size can be larger than uncompressed size

            zip64 = self._allowZip64 and \

                    zinfo.file_size * 1.05 > ZIP64_LIMIT

            self.fp.write(zinfo.FileHeader(zip64))

            if zinfo.compress_type == ZIP_DEFLATED:

                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,

                     zlib.DEFLATED, -15)

            else:

                cmpr = None

            file_size = 0

            while 1:

                buf = fp.read(1024 * 8)

                if not buf:

                    break

                file_size = file_size + len(buf)

                CRC = crc32(buf, CRC) & 0xffffffff

                if cmpr:

                    buf = cmpr.compress(buf)

                    compress_size = compress_size + len(buf)

                self.fp.write(buf)

        if cmpr:

            buf = cmpr.flush()

            compress_size = compress_size + len(buf)

            self.fp.write(buf)

            zinfo.compress_size = compress_size

        else:

            zinfo.compress_size = file_size

        zinfo.CRC = CRC

        zinfo.file_size = file_size

        if not zip64 and self._allowZip64:

            if file_size > ZIP64_LIMIT:

                raise RuntimeError('File size has increased during compressing')

            if compress_size > ZIP64_LIMIT:

                raise RuntimeError('Compressed size larger than uncompressed size')

        # Seek backwards and write file header (which will now include

        # correct CRC and file sizes)

        position = self.fp.tell()       # Preserve current position in file

        self.fp.seek(zinfo.header_offset, 0)

        self.fp.write(zinfo.FileHeader(zip64))

        self.fp.seek(position, 0)

        self.filelist.append(zinfo)

        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):

        """Write a file into the archive.  The contents is the string

        'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or

        the name of the file in the archive."""

        if not isinstance(zinfo_or_arcname, ZipInfo):

            zinfo = ZipInfo(filename=zinfo_or_arcname,

                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression

            if zinfo.filename[-1] == '/':

                zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x

                zinfo.external_attr |= 0x10           # MS-DOS directory flag

            else:

                zinfo.external_attr = 0o600 << 16     # ?rw-------

        else:

            zinfo = zinfo_or_arcname

        if not self.fp:

            raise RuntimeError(

                  "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:

            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)            # Uncompressed size

        zinfo.header_offset = self.fp.tell()    # Start of header bytes

        self._writecheck(zinfo)

        self._didModify = True

        zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum

        if zinfo.compress_type == ZIP_DEFLATED:

            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,

                 zlib.DEFLATED, -15)

            bytes = co.compress(bytes) + co.flush()

            zinfo.compress_size = len(bytes)    # Compressed size

        else:

            zinfo.compress_size = zinfo.file_size

        zip64 = zinfo.file_size > ZIP64_LIMIT or \

                zinfo.compress_size > ZIP64_LIMIT

        if zip64 and not self._allowZip64:

            raise LargeZipFile("Filesize would require ZIP64 extensions")

        self.fp.write(zinfo.FileHeader(zip64))

        self.fp.write(bytes)

        if zinfo.flag_bits & 0x08:

            # Write CRC and file sizes after the file data

            fmt = '<LQQ' if zip64 else '<LLL'

            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,

                  zinfo.file_size))

        self.fp.flush()

        self.filelist.append(zinfo)

        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):

        """Call the "close()" method in case the user forgot."""

        self.close()

    def close(self):

        """Close the file, and for mode "w" and "a" write the ending

        records."""

        if self.fp is None:

            return

        try:

            if self.mode in ("w", "a") and self._didModify: # write ending records

                pos1 = self.fp.tell()

                for zinfo in self.filelist:         # write central directory

                    dt = zinfo.date_time

                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]

                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)

                    extra = []

                    if zinfo.file_size > ZIP64_LIMIT \

                            or zinfo.compress_size > ZIP64_LIMIT:

                        extra.append(zinfo.file_size)

                        extra.append(zinfo.compress_size)

                        file_size = 0xffffffff

                        compress_size = 0xffffffff

                    else:

                        file_size = zinfo.file_size

                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:

                        extra.append(zinfo.header_offset)

                        header_offset = 0xffffffff

                    else:

                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra

                    if extra:

                        # Append a ZIP64 field to the extra's

                        extra_data = struct.pack(

                                '<HH' + 'Q'*len(extra),

                                1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)

                        create_version = max(45, zinfo.create_version)

                    else:

                        extract_version = zinfo.extract_version

                        create_version = zinfo.create_version

                    try:

                        filename, flag_bits = zinfo._encodeFilenameFlags()

                        centdir = struct.pack(structCentralDir,

                        stringCentralDir, create_version,

                        zinfo.create_system, extract_version, zinfo.reserved,

                        flag_bits, zinfo.compress_type, dostime, dosdate,

                        zinfo.CRC, compress_size, file_size,

                        len(filename), len(extra_data), len(zinfo.comment),

                        0, zinfo.internal_attr, zinfo.external_attr,

                        header_offset)

                    except DeprecationWarning:

                        sys.stderr.write(repr((structCentralDir,

                        stringCentralDir, create_version,

                        zinfo.create_system, extract_version, zinfo.reserved,

                        zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,

                        zinfo.CRC, compress_size, file_size,

                        len(zinfo.filename), len(extra_data), len(zinfo.comment),

                        0, zinfo.internal_attr, zinfo.external_attr,

                        header_offset)) + "\n")

                        raise

                    self.fp.write(centdir)

                    self.fp.write(filename)

                    self.fp.write(extra_data)

                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()

                # Write end-of-zip-archive record

                centDirCount = len(self.filelist)

                centDirSize = pos2 - pos1

                centDirOffset = pos1

                requires_zip64 = None

                if centDirCount > ZIP_FILECOUNT_LIMIT:

                    requires_zip64 = "Files count"

                elif centDirOffset > ZIP64_LIMIT:

                    requires_zip64 = "Central directory offset"

                elif centDirSize > ZIP64_LIMIT:

                    requires_zip64 = "Central directory size"

                if requires_zip64:

                    # Need to write the ZIP64 end-of-archive records

                    if not self._allowZip64:

                        raise LargeZipFile(requires_zip64 +

                                           " would require ZIP64 extensions")

                    zip64endrec = struct.pack(

                            structEndArchive64, stringEndArchive64,

                            44, 45, 45, 0, 0, centDirCount, centDirCount,

                            centDirSize, centDirOffset)

                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(

                            structEndArchive64Locator,

                            stringEndArchive64Locator, 0, pos2, 1)

                    self.fp.write(zip64locrec)

                    centDirCount = min(centDirCount, 0xFFFF)

                    centDirSize = min(centDirSize, 0xFFFFFFFF)

                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,

                                    0, 0, centDirCount, centDirCount,

                                    centDirSize, centDirOffset, len(self._comment))

                self.fp.write(endrec)

                self.fp.write(self._comment)

                self.fp.flush()

        finally:

            fp = self.fp

            self.fp = None

            if not self._filePassed:

                fp.close()

ZipFile
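
The write(), writestr() and extractall() methods listed above can be tried out with a short script. This is only a minimal sketch, assuming Python 3's standard zipfile module (the listing above is the Python 2 version of the same class). The names hello.txt, notes/readme.txt and demo.zip are made up for illustration.

```python
import os
import tempfile
import zipfile

# Work inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as workdir:
    # A small file on disk to add with write().
    src = os.path.join(workdir, "hello.txt")
    with open(src, "w") as f:
        f.write("hello zip")

    archive = os.path.join(workdir, "demo.zip")

    # Mode "w" creates the archive; close() (called by the with block)
    # writes the central directory and end-of-archive record.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, arcname="hello.txt")                       # from a file on disk
        zf.writestr("notes/readme.txt", "written from memory")  # from a string

    # Reopen for reading, list the members, and extract everything.
    out_dir = os.path.join(workdir, "out")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(out_dir)

    # Read one extracted member back to confirm the round trip.
    with open(os.path.join(out_dir, "notes", "readme.txt")) as f:
        extracted = f.read()

print(names)
print(extracted)
```

As _extract_member() in the listing shows, extraction turns absolute member names into relative ones and drops ".." components, so members are written below the target directory.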


 class TarFile(object):

     """The TarFile Class provides an interface to tar archives.

     """

     debug = 0                   # May be set from 0 (no msgs) to 3 (all msgs)

     dereference = False         # If true, add content of linked file to the

                                 # tar file, else the link.

     ignore_zeros = False        # If true, skips empty or invalid blocks and

                                 # continues processing.

     errorlevel = 1              # If 0, fatal errors only appear in debug

                                 # messages (if debug >= 0). If > 0, errors

                                 # are passed to the caller as exceptions.

     format = DEFAULT_FORMAT     # The format to use when creating an archive.

     encoding = ENCODING         # Encoding for 8-bit character strings.

     errors = None               # Error handler for unicode conversion.

     tarinfo = TarInfo           # The default TarInfo class to use.

     fileobject = ExFileObject   # The default ExFileObject class to use.

     def __init__(self, name=None, mode="r", fileobj=None, format=None,

             tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,

             errors=None, pax_headers=None, debug=None, errorlevel=None):

         """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to

            read from an existing archive, 'a' to append data to an existing

            file or 'w' to create a new file overwriting an existing one. `mode'

            defaults to 'r'.

            If `fileobj' is given, it is used for reading or writing data. If it

            can be determined, `mode' is overridden by `fileobj's mode.

            `fileobj' is not closed, when TarFile is closed.

         """

         modes = {"r": "rb", "a": "r+b", "w": "wb"}

         if mode not in modes:

             raise ValueError("mode must be 'r', 'a' or 'w'")

         self.mode = mode

         self._mode = modes[mode]

         if not fileobj:

             if self.mode == "a" and not os.path.exists(name):

                 # Create nonexistent files in append mode.

                 self.mode = "w"

                 self._mode = "wb"

             fileobj = bltn_open(name, self._mode)

             self._extfileobj = False

         else:

             if name is None and hasattr(fileobj, "name"):

                 name = fileobj.name

             if hasattr(fileobj, "mode"):

                 self._mode = fileobj.mode

             self._extfileobj = True

         self.name = os.path.abspath(name) if name else None

         self.fileobj = fileobj

         # Init attributes.

         if format is not None:

             self.format = format

         if tarinfo is not None:

             self.tarinfo = tarinfo

         if dereference is not None:

             self.dereference = dereference

         if ignore_zeros is not None:

             self.ignore_zeros = ignore_zeros

         if encoding is not None:

             self.encoding = encoding

         if errors is not None:

             self.errors = errors

         elif mode == "r":

             self.errors = "utf-8"

         else:

             self.errors = "strict"

         if pax_headers is not None and self.format == PAX_FORMAT:

             self.pax_headers = pax_headers

         else:

             self.pax_headers = {}

         if debug is not None:

             self.debug = debug

         if errorlevel is not None:

             self.errorlevel = errorlevel

         # Init datastructures.

         self.closed = False

         self.members = []       # list of members as TarInfo objects

         self._loaded = False    # flag if all members have been read

         self.offset = self.fileobj.tell()

                                 # current position in the archive file

         self.inodes = {}        # dictionary caching the inodes of

                                 # archive members already added

         try:

             if self.mode == "r":

                 self.firstmember = None

                 self.firstmember = self.next()

             if self.mode == "a":

                 # Move to the end of the archive,

                 # before the first empty block.

                 while True:

                     self.fileobj.seek(self.offset)

                     try:

                         tarinfo = self.tarinfo.fromtarfile(self)

                         self.members.append(tarinfo)

                     except EOFHeaderError:

                         self.fileobj.seek(self.offset)

                         break

                     except HeaderError as e:

                         raise ReadError(str(e))

             if self.mode in "aw":

                 self._loaded = True

                 if self.pax_headers:

                     buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())

                     self.fileobj.write(buf)

                     self.offset += len(buf)

         except:

             if not self._extfileobj:

                 self.fileobj.close()

             self.closed = True

             raise

     def _getposix(self):

         return self.format == USTAR_FORMAT

     def _setposix(self, value):

         import warnings

         warnings.warn("use the format attribute instead", DeprecationWarning,

                       2)

         if value:

             self.format = USTAR_FORMAT

         else:

             self.format = GNU_FORMAT

     posix = property(_getposix, _setposix)

     #--------------------------------------------------------------------------

     # Below are the classmethods which act as alternate constructors to the

     # TarFile class. The open() method is the only one that is needed for

     # public use; it is the "super"-constructor and is able to select an

     # adequate "sub"-constructor for a particular compression using the mapping

     # from OPEN_METH.

     #

     # This concept allows one to subclass TarFile without losing the comfort of

     # the super-constructor. A sub-constructor is registered and made available

     # by adding it to the mapping in OPEN_METH.

     @classmethod

     def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):

         """Open a tar archive for reading, writing or appending. Return

            an appropriate TarFile class.

            mode:

            'r' or 'r:*' open for reading with transparent compression

            'r:'         open for reading exclusively uncompressed

            'r:gz'       open for reading with gzip compression

            'r:bz2'      open for reading with bzip2 compression

            'a' or 'a:'  open for appending, creating the file if necessary

            'w' or 'w:'  open for writing without compression

            'w:gz'       open for writing with gzip compression

            'w:bz2'      open for writing with bzip2 compression

            'r|*'        open a stream of tar blocks with transparent compression

            'r|'         open an uncompressed stream of tar blocks for reading

            'r|gz'       open a gzip compressed stream of tar blocks

            'r|bz2'      open a bzip2 compressed stream of tar blocks

            'w|'         open an uncompressed stream for writing

            'w|gz'       open a gzip compressed stream for writing

            'w|bz2'      open a bzip2 compressed stream for writing

         """

         if not name and not fileobj:

             raise ValueError("nothing to open")

         if mode in ("r", "r:*"):

             # Find out which *open() is appropriate for opening the file.

             for comptype in cls.OPEN_METH:

                 func = getattr(cls, cls.OPEN_METH[comptype])

                 if fileobj is not None:

                     saved_pos = fileobj.tell()

                 try:

                     return func(name, "r", fileobj, **kwargs)

                 except (ReadError, CompressionError) as e:

                     if fileobj is not None:

                         fileobj.seek(saved_pos)

                     continue

             raise ReadError("file could not be opened successfully")

         elif ":" in mode:

             filemode, comptype = mode.split(":", 1)

             filemode = filemode or "r"

             comptype = comptype or "tar"

             # Select the *open() function according to

             # given compression.

             if comptype in cls.OPEN_METH:

                 func = getattr(cls, cls.OPEN_METH[comptype])

             else:

                 raise CompressionError("unknown compression type %r" % comptype)

             return func(name, filemode, fileobj, **kwargs)

         elif "|" in mode:

             filemode, comptype = mode.split("|", 1)

             filemode = filemode or "r"

             comptype = comptype or "tar"

             if filemode not in ("r", "w"):

                 raise ValueError("mode must be 'r' or 'w'")

             stream = _Stream(name, filemode, comptype, fileobj, bufsize)

             try:

                 t = cls(name, filemode, stream, **kwargs)

             except:

                 stream.close()

                 raise

             t._extfileobj = False

             return t

         elif mode in ("a", "w"):

             return cls.taropen(name, mode, fileobj, **kwargs)

         raise ValueError("undiscernible mode")

     @classmethod

     def taropen(cls, name, mode="r", fileobj=None, **kwargs):

         """Open uncompressed tar archive name for reading or writing.

         """

         if mode not in ("r", "a", "w"):

             raise ValueError("mode must be 'r', 'a' or 'w'")

         return cls(name, mode, fileobj, **kwargs)

     @classmethod

     def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):

         """Open gzip compressed tar archive name for reading or writing.

            Appending is not allowed.

         """

         if mode not in ("r", "w"):

             raise ValueError("mode must be 'r' or 'w'")

         try:

             import gzip

             gzip.GzipFile

         except (ImportError, AttributeError):

             raise CompressionError("gzip module is not available")

         try:

             fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)

         except OSError:

             if fileobj is not None and mode == 'r':

                 raise ReadError("not a gzip file")

             raise

         try:

             t = cls.taropen(name, mode, fileobj, **kwargs)

         except IOError:

             fileobj.close()

             if mode == 'r':

                 raise ReadError("not a gzip file")

             raise

         except:

             fileobj.close()

             raise

         t._extfileobj = False

         return t

     @classmethod

     def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):

         """Open bzip2 compressed tar archive name for reading or writing.

            Appending is not allowed.

         """

         if mode not in ("r", "w"):

             raise ValueError("mode must be 'r' or 'w'.")

         try:

             import bz2

         except ImportError:

             raise CompressionError("bz2 module is not available")

         if fileobj is not None:

             fileobj = _BZ2Proxy(fileobj, mode)

         else:

             fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)

         try:

             t = cls.taropen(name, mode, fileobj, **kwargs)

         except (IOError, EOFError):

             fileobj.close()

             if mode == 'r':

                 raise ReadError("not a bzip2 file")

             raise

         except:

             fileobj.close()

             raise

         t._extfileobj = False

         return t

     # All *open() methods are registered here.

     OPEN_METH = {

         "tar": "taropen",   # uncompressed tar

         "gz":  "gzopen",    # gzip compressed tar

         "bz2": "bz2open"    # bzip2 compressed tar

     }

     #--------------------------------------------------------------------------

     # The public methods which TarFile provides:

     def close(self):

         """Close the TarFile. In write-mode, two finishing zero blocks are

            appended to the archive.

         """

         if self.closed:

             return

         if self.mode in "aw":

             self.fileobj.write(NUL * (BLOCKSIZE * 2))

             self.offset += (BLOCKSIZE * 2)

             # fill up the end with zero-blocks

             # (like option -b20 for tar does)

             blocks, remainder = divmod(self.offset, RECORDSIZE)

             if remainder > 0:

                 self.fileobj.write(NUL * (RECORDSIZE - remainder))

         if not self._extfileobj:

             self.fileobj.close()

         self.closed = True

     def getmember(self, name):

         """Return a TarInfo object for member `name'. If `name' can not be

            found in the archive, KeyError is raised. If a member occurs more

            than once in the archive, its last occurrence is assumed to be the

            most up-to-date version.

         """

         tarinfo = self._getmember(name)

         if tarinfo is None:

             raise KeyError("filename %r not found" % name)

         return tarinfo

     def getmembers(self):

         """Return the members of the archive as a list of TarInfo objects. The

            list has the same order as the members in the archive.

         """

         self._check()

         if not self._loaded:    # if we want to obtain a list of

             self._load()        # all members, we first have to

                                 # scan the whole archive.

         return self.members

     def getnames(self):

         """Return the members of the archive as a list of their names. It has

            the same order as the list returned by getmembers().

         """

         return [tarinfo.name for tarinfo in self.getmembers()]

     def gettarinfo(self, name=None, arcname=None, fileobj=None):

         """Create a TarInfo object for either the file `name' or the file

            object `fileobj' (using os.fstat on its file descriptor). You can

            modify some of the TarInfo's attributes before you add it using

            addfile(). If given, `arcname' specifies an alternative name for the

            file in the archive.

         """

         self._check("aw")

         # When fileobj is given, replace name by

         # fileobj's real name.

         if fileobj is not None:

             name = fileobj.name

         # Building the name of the member in the archive.

         # Backward slashes are converted to forward slashes,

         # Absolute paths are turned to relative paths.

         if arcname is None:

             arcname = name

         drv, arcname = os.path.splitdrive(arcname)

         arcname = arcname.replace(os.sep, "/")

         arcname = arcname.lstrip("/")

         # Now, fill the TarInfo object with

         # information specific for the file.

         tarinfo = self.tarinfo()

         tarinfo.tarfile = self

         # Use os.stat or os.lstat, depending on platform

         # and if symlinks shall be resolved.

         if fileobj is None:

             if hasattr(os, "lstat") and not self.dereference:

                 statres = os.lstat(name)

             else:

                 statres = os.stat(name)

         else:

             statres = os.fstat(fileobj.fileno())

         linkname = ""

         stmd = statres.st_mode

         if stat.S_ISREG(stmd):

             inode = (statres.st_ino, statres.st_dev)

             if not self.dereference and statres.st_nlink > 1 and \

                     inode in self.inodes and arcname != self.inodes[inode]:

                 # Is it a hardlink to an already

                 # archived file?

                 type = LNKTYPE

                 linkname = self.inodes[inode]

             else:

                 # The inode is added only if it's valid.

                 # For win32 it is always 0.

                 type = REGTYPE

                 if inode[0]:

                     self.inodes[inode] = arcname

         elif stat.S_ISDIR(stmd):

             type = DIRTYPE

         elif stat.S_ISFIFO(stmd):

             type = FIFOTYPE

         elif stat.S_ISLNK(stmd):

             type = SYMTYPE

             linkname = os.readlink(name)

         elif stat.S_ISCHR(stmd):

             type = CHRTYPE

         elif stat.S_ISBLK(stmd):

             type = BLKTYPE

         else:

             return None

         # Fill the TarInfo object with all

         # information we can get.

         tarinfo.name = arcname

         tarinfo.mode = stmd

         tarinfo.uid = statres.st_uid

         tarinfo.gid = statres.st_gid

         if type == REGTYPE:

             tarinfo.size = statres.st_size

         else:

             tarinfo.size = 0

         tarinfo.mtime = statres.st_mtime

         tarinfo.type = type

         tarinfo.linkname = linkname

         if pwd:

             try:

                 tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]

             except KeyError:

                 pass

         if grp:

             try:

                 tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]

             except KeyError:

                 pass

         if type in (CHRTYPE, BLKTYPE):

             if hasattr(os, "major") and hasattr(os, "minor"):

                 tarinfo.devmajor = os.major(statres.st_rdev)

                 tarinfo.devminor = os.minor(statres.st_rdev)

         return tarinfo

     def list(self, verbose=True):

         """Print a table of contents to sys.stdout. If `verbose' is False, only

            the names of the members are printed. If it is True, an `ls -l'-like

            output is produced.

         """

         self._check()

         for tarinfo in self:

             if verbose:

                 print filemode(tarinfo.mode),

                 print "%s/%s" % (tarinfo.uname or tarinfo.uid,

                                  tarinfo.gname or tarinfo.gid),

                 if tarinfo.ischr() or tarinfo.isblk():

                     print "%10s" % ("%d,%d" \

                                     % (tarinfo.devmajor, tarinfo.devminor)),

                 else:

                     print "%10d" % tarinfo.size,

                 print "%d-%02d-%02d %02d:%02d:%02d" \

                       % time.localtime(tarinfo.mtime)[:6],

             print tarinfo.name + ("/" if tarinfo.isdir() else ""),

             if verbose:

                 if tarinfo.issym():

                     print "->", tarinfo.linkname,

                 if tarinfo.islnk():

                     print "link to", tarinfo.linkname,

             print

     def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):

         """Add the file `name' to the archive. `name' may be any type of file

            (directory, fifo, symbolic link, etc.). If given, `arcname'

            specifies an alternative name for the file in the archive.

            Directories are added recursively by default. This can be avoided by

            setting `recursive' to False. `exclude' is a function that should

            return True for each filename to be excluded. `filter' is a function

            that expects a TarInfo object argument and returns the changed

            TarInfo object, if it returns None the TarInfo object will be

            excluded from the archive.

         """

         self._check("aw")

         if arcname is None:

             arcname = name

         # Exclude pathnames.

         if exclude is not None:

             import warnings

             warnings.warn("use the filter argument instead",

                     DeprecationWarning, 2)

             if exclude(name):

                 self._dbg(2, "tarfile: Excluded %r" % name)

                 return

         # Skip if somebody tries to archive the archive...

         if self.name is not None and os.path.abspath(name) == self.name:

             self._dbg(2, "tarfile: Skipped %r" % name)

             return

         self._dbg(1, name)

         # Create a TarInfo object from the file.

         tarinfo = self.gettarinfo(name, arcname)

         if tarinfo is None:

             self._dbg(1, "tarfile: Unsupported type %r" % name)

             return

         # Change or exclude the TarInfo object.

         if filter is not None:

             tarinfo = filter(tarinfo)

             if tarinfo is None:

                 self._dbg(2, "tarfile: Excluded %r" % name)

                 return

         # Append the tar header and data to the archive.

         if tarinfo.isreg():

             with bltn_open(name, "rb") as f:

                 self.addfile(tarinfo, f)

         elif tarinfo.isdir():

             self.addfile(tarinfo)

             if recursive:

                 for f in os.listdir(name):

                     self.add(os.path.join(name, f), os.path.join(arcname, f),

                             recursive, exclude, filter)

         else:

             self.addfile(tarinfo)

     def addfile(self, tarinfo, fileobj=None):

         """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is

            given, tarinfo.size bytes are read from it and added to the archive.

            You can create TarInfo objects using gettarinfo().

            On Windows platforms, `fileobj' should always be opened with mode

            'rb' to avoid irritation about the file size.

         """

         self._check("aw")

         tarinfo = copy.copy(tarinfo)

         buf = tarinfo.tobuf(self.format, self.encoding, self.errors)

         self.fileobj.write(buf)

         self.offset += len(buf)

         # If there's data to follow, append it.

         if fileobj is not None:

             copyfileobj(fileobj, self.fileobj, tarinfo.size)

             blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)

             if remainder > 0:

                 self.fileobj.write(NUL * (BLOCKSIZE - remainder))

                 blocks += 1

             self.offset += blocks * BLOCKSIZE

         self.members.append(tarinfo)

     def extractall(self, path=".", members=None):

         """Extract all members from the archive to the current working

            directory and set owner, modification time and permissions on

            directories afterwards. `path' specifies a different directory

            to extract to. `members' is optional and must be a subset of the

            list returned by getmembers().

         """

         directories = []

         if members is None:

             members = self

         for tarinfo in members:

             if tarinfo.isdir():

                 # Extract directories with a safe mode.

                 directories.append(tarinfo)

                 tarinfo = copy.copy(tarinfo)

                 tarinfo.mode = 0700

             self.extract(tarinfo, path)

         # Reverse sort directories.

         directories.sort(key=operator.attrgetter('name'))

         directories.reverse()

         # Set correct owner, mtime and filemode on directories.

         for tarinfo in directories:

             dirpath = os.path.join(path, tarinfo.name)

             try:

                 self.chown(tarinfo, dirpath)

                 self.utime(tarinfo, dirpath)

                 self.chmod(tarinfo, dirpath)

             except ExtractError, e:

                 if self.errorlevel > 1:

                     raise

                 else:

                     self._dbg(1, "tarfile: %s" % e)

     def extract(self, member, path=""):

         """Extract a member from the archive to the current working directory,

            using its full name. Its file information is extracted as accurately

            as possible. `member' may be a filename or a TarInfo object. You can

            specify a different directory using `path'.

         """

         self._check("r")

         if isinstance(member, basestring):

             tarinfo = self.getmember(member)

         else:

             tarinfo = member

         # Prepare the link target for makelink().

         if tarinfo.islnk():

             tarinfo._link_target = os.path.join(path, tarinfo.linkname)

         try:

             self._extract_member(tarinfo, os.path.join(path, tarinfo.name))

         except EnvironmentError, e:

             if self.errorlevel > 0:

                 raise

             else:

                 if e.filename is None:

                     self._dbg(1, "tarfile: %s" % e.strerror)

                 else:

                     self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))

         except ExtractError, e:

             if self.errorlevel > 1:

                 raise

             else:

                 self._dbg(1, "tarfile: %s" % e)

     def extractfile(self, member):

         """Extract a member from the archive as a file object. `member' may be

            a filename or a TarInfo object. If `member' is a regular file, a

            file-like object is returned. If `member' is a link, a file-like

            object is constructed from the link's target. If `member' is none of

            the above, None is returned.

            The file-like object is read-only and provides the following

            methods: read(), readline(), readlines(), seek() and tell()

         """

         self._check("r")

         if isinstance(member, basestring):

             tarinfo = self.getmember(member)

         else:

             tarinfo = member

         if tarinfo.isreg():

             return self.fileobject(self, tarinfo)

         elif tarinfo.type not in SUPPORTED_TYPES:

             # If a member's type is unknown, it is treated as a

             # regular file.

             return self.fileobject(self, tarinfo)

         elif tarinfo.islnk() or tarinfo.issym():

             if isinstance(self.fileobj, _Stream):

                 # A small but ugly workaround for the case that someone tries

                 # to extract a (sym)link as a file-object from a non-seekable

                 # stream of tar blocks.

                 raise StreamError("cannot extract (sym)link as file object")

             else:

                 # A (sym)link's file object is its target's file object.

                 return self.extractfile(self._find_link_target(tarinfo))

         else:

             # If there's no data associated with the member (directory, chrdev,

             # blkdev, etc.), return None instead of a file object.

             return None

     def _extract_member(self, tarinfo, targetpath):

         """Extract the TarInfo object tarinfo to a physical

            file called targetpath.

         """

         # Fetch the TarInfo object for the given name

         # and build the destination pathname, replacing

         # forward slashes to platform specific separators.

         targetpath = targetpath.rstrip("/")

         targetpath = targetpath.replace("/", os.sep)

         # Create all upper directories.

         upperdirs = os.path.dirname(targetpath)

         if upperdirs and not os.path.exists(upperdirs):

             # Create directories that are not part of the archive with

             # default permissions.

             os.makedirs(upperdirs)

         if tarinfo.islnk() or tarinfo.issym():

             self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))

         else:

             self._dbg(1, tarinfo.name)

         if tarinfo.isreg():

             self.makefile(tarinfo, targetpath)

         elif tarinfo.isdir():

             self.makedir(tarinfo, targetpath)

         elif tarinfo.isfifo():

             self.makefifo(tarinfo, targetpath)

         elif tarinfo.ischr() or tarinfo.isblk():

             self.makedev(tarinfo, targetpath)

         elif tarinfo.islnk() or tarinfo.issym():

             self.makelink(tarinfo, targetpath)

         elif tarinfo.type not in SUPPORTED_TYPES:

             self.makeunknown(tarinfo, targetpath)

         else:

             self.makefile(tarinfo, targetpath)

         self.chown(tarinfo, targetpath)

         if not tarinfo.issym():

             self.chmod(tarinfo, targetpath)

             self.utime(tarinfo, targetpath)

     #--------------------------------------------------------------------------

     # Below are the different file methods. They are called via

     # _extract_member() when extract() is called. They can be replaced in a

     # subclass to implement other functionality.

     def makedir(self, tarinfo, targetpath):

         """Make a directory called targetpath.

         """

         try:

             # Use a safe mode for the directory, the real mode is set

             # later in _extract_member().

             os.mkdir(targetpath, 0700)

         except EnvironmentError, e:

             if e.errno != errno.EEXIST:

                 raise

     def makefile(self, tarinfo, targetpath):

         """Make a file called targetpath.

         """

         source = self.extractfile(tarinfo)

         try:

             with bltn_open(targetpath, "wb") as target:

                 copyfileobj(source, target)

         finally:

             source.close()

     def makeunknown(self, tarinfo, targetpath):

         """Make a file from a TarInfo object with an unknown type

            at targetpath.

         """

         self.makefile(tarinfo, targetpath)

         self._dbg(1, "tarfile: Unknown file type %r, " \

                      "extracted as regular file." % tarinfo.type)

     def makefifo(self, tarinfo, targetpath):

         """Make a fifo called targetpath.

         """

         if hasattr(os, "mkfifo"):

             os.mkfifo(targetpath)

         else:

             raise ExtractError("fifo not supported by system")

     def makedev(self, tarinfo, targetpath):

         """Make a character or block device called targetpath.

         """

         if not hasattr(os, "mknod") or not hasattr(os, "makedev"):

             raise ExtractError("special devices not supported by system")

         mode = tarinfo.mode

         if tarinfo.isblk():

             mode |= stat.S_IFBLK

         else:

             mode |= stat.S_IFCHR

         os.mknod(targetpath, mode,

                  os.makedev(tarinfo.devmajor, tarinfo.devminor))

     def makelink(self, tarinfo, targetpath):

         """Make a (symbolic) link called targetpath. If it cannot be created

           (platform limitation), we try to make a copy of the referenced file

           instead of a link.

         """

         if hasattr(os, "symlink") and hasattr(os, "link"):

             # For systems that support symbolic and hard links.

             if tarinfo.issym():

                 if os.path.lexists(targetpath):

                     os.unlink(targetpath)

                 os.symlink(tarinfo.linkname, targetpath)

             else:

                 # See extract().

                 if os.path.exists(tarinfo._link_target):

                     if os.path.lexists(targetpath):

                         os.unlink(targetpath)

                     os.link(tarinfo._link_target, targetpath)

                 else:

                     self._extract_member(self._find_link_target(tarinfo), targetpath)

         else:

             try:

                 self._extract_member(self._find_link_target(tarinfo), targetpath)

             except KeyError:

                 raise ExtractError("unable to resolve link inside archive")

     def chown(self, tarinfo, targetpath):

         """Set owner of targetpath according to tarinfo.

         """

         if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:

             # We have to be root to do so.

             try:

                 g = grp.getgrnam(tarinfo.gname)[2]

             except KeyError:

                 g = tarinfo.gid

             try:

                 u = pwd.getpwnam(tarinfo.uname)[2]

             except KeyError:

                 u = tarinfo.uid

             try:

                 if tarinfo.issym() and hasattr(os, "lchown"):

                     os.lchown(targetpath, u, g)

                 else:

                     if sys.platform != "os2emx":

                         os.chown(targetpath, u, g)

             except EnvironmentError, e:

                 raise ExtractError("could not change owner")

     def chmod(self, tarinfo, targetpath):

         """Set file permissions of targetpath according to tarinfo.

         """

         if hasattr(os, 'chmod'):

             try:

                 os.chmod(targetpath, tarinfo.mode)

             except EnvironmentError, e:

                 raise ExtractError("could not change mode")

     def utime(self, tarinfo, targetpath):

         """Set modification time of targetpath according to tarinfo.

         """

         if not hasattr(os, 'utime'):

             return

         try:

             os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))

         except EnvironmentError, e:

             raise ExtractError("could not change modification time")

     #--------------------------------------------------------------------------

     def next(self):

         """Return the next member of the archive as a TarInfo object, when

            TarFile is opened for reading. Return None if there is no more

            available.

         """

         self._check("ra")

         if self.firstmember is not None:

             m = self.firstmember

             self.firstmember = None

             return m

         # Read the next block.

         self.fileobj.seek(self.offset)

         tarinfo = None

         while True:

             try:

                 tarinfo = self.tarinfo.fromtarfile(self)

             except EOFHeaderError, e:

                 if self.ignore_zeros:

                     self._dbg(2, "0x%X: %s" % (self.offset, e))

                     self.offset += BLOCKSIZE

                     continue

             except InvalidHeaderError, e:

                 if self.ignore_zeros:

                     self._dbg(2, "0x%X: %s" % (self.offset, e))

                     self.offset += BLOCKSIZE

                     continue

                 elif self.offset == 0:

                     raise ReadError(str(e))

             except EmptyHeaderError:

                 if self.offset == 0:

                     raise ReadError("empty file")

             except TruncatedHeaderError, e:

                 if self.offset == 0:

                     raise ReadError(str(e))

             except SubsequentHeaderError, e:

                 raise ReadError(str(e))

             break

         if tarinfo is not None:

             self.members.append(tarinfo)

         else:

             self._loaded = True

         return tarinfo

     #--------------------------------------------------------------------------

     # Little helper methods:

     def _getmember(self, name, tarinfo=None, normalize=False):

         """Find an archive member by name from bottom to top.

            If tarinfo is given, it is used as the starting point.

         """

         # Ensure that all members have been loaded.

         members = self.getmembers()

         # Limit the member search list up to tarinfo.

         if tarinfo is not None:

             members = members[:members.index(tarinfo)]

         if normalize:

             name = os.path.normpath(name)

         for member in reversed(members):

             if normalize:

                 member_name = os.path.normpath(member.name)

             else:

                 member_name = member.name

             if name == member_name:

                 return member

     def _load(self):

         """Read through the entire archive file and look for readable

            members.

         """

         while True:

             tarinfo = self.next()

             if tarinfo is None:

                 break

         self._loaded = True

     def _check(self, mode=None):

         """Check if TarFile is still open, and if the operation's mode

            corresponds to TarFile's mode.

         """

         if self.closed:

             raise IOError("%s is closed" % self.__class__.__name__)

         if mode is not None and self.mode not in mode:

             raise IOError("bad operation for mode %r" % self.mode)

     def _find_link_target(self, tarinfo):

         """Find the target member of a symlink or hardlink member in the

            archive.

         """

         if tarinfo.issym():

             # Always search the entire archive.

             linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))

             limit = None

         else:

             # Search the archive before the link, because a hard link is

             # just a reference to an already archived file.

             linkname = tarinfo.linkname

             limit = tarinfo

         member = self._getmember(linkname, tarinfo=limit, normalize=True)

         if member is None:

             raise KeyError("linkname %r not found" % linkname)

         return member

     def __iter__(self):

         """Provide an iterator object.

         """

         if self._loaded:

             return iter(self.members)

         else:

             return TarIter(self)

     def _dbg(self, level, msg):

         """Write debugging output to sys.stderr.

         """

         if level <= self.debug:

             print >> sys.stderr, msg

     def __enter__(self):

         self._check()

         return self

     def __exit__(self, type, value, traceback):

         if type is None:

             self.close()

         else:

             # An exception occurred. We must not call close() because

             # it would try to write end-of-archive blocks and padding.

             if not self._extfileobj:

                 self.fileobj.close()

             self.closed = True

 # class TarFile

 TarFile
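The add() docstring in the listing above describes a filter callback that receives each TarInfo object and either returns it (possibly modified) or returns None to exclude the member. The following is a minimal Python 3 sketch of that behaviour, not a definitive recipe. The file names (demo, hello.txt, scratch.tmp) and the anonymize function are hypothetical and exist only for this illustration.

```python
import os
import tarfile
import tempfile


def anonymize(tarinfo):
    """Filter for TarFile.add(): skip *.tmp files and drop owner details."""
    if tarinfo.name.endswith(".tmp"):
        return None  # returning None excludes the member from the archive
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    return tarinfo


with tempfile.TemporaryDirectory() as workdir:
    # Build a small directory tree to archive (hypothetical contents).
    src = os.path.join(workdir, "demo")
    os.mkdir(src)
    with open(os.path.join(src, "hello.txt"), "w") as f:
        f.write("hello tar")
    with open(os.path.join(src, "scratch.tmp"), "w") as f:
        f.write("temporary")

    archive = os.path.join(workdir, "demo.tar")

    # Directories are added recursively by default; the filter sees every member.
    with tarfile.open(archive, "w") as tar:
        tar.add(src, arcname="demo", filter=anonymize)

    # Read the archive back without unpacking anything to disk.
    with tarfile.open(archive, "r") as tar:
        names = sorted(tar.getnames())
        content = tar.extractfile("demo/hello.txt").read().decode()

print(names)
print(content)
```

The sketch reads the member through extractfile() rather than extractall(), so nothing from the archive is written to the file system.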


(6) The json and pickle modules

Two modules used for serialization.

json converts between strings and Python data types.

'''
Serialization
'''

import json

info = {
    'name': '鲁班',
    'age': 22
}

# json.dumps() turns the Python data into a string, which is then written to the file
with open('test.txt', 'w') as f:
    f.write(json.dumps(info))

'''
Deserialization
'''

import json

# JSON is used to exchange data between different languages
with open('test.txt', 'r') as f:
    data = json.loads(f.read())  # load the Python data back from the file

print(data['age'])

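pickle converts between Python objects and a Python-specific binary format.

'''
Serialization
'''

import pickle

def sayhi(name):
    print("hello  python", name)

info = {
    'name': '鲁班',
    'age': 22,
    'func': 'sayhi'
}

# pickle produces bytes, so the file must be opened for binary writing ('wb')
with open("pickle_test.txt", 'wb') as f:
    pickle.dump(info, f)  # same as f.write(pickle.dumps(info))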

'''
Deserialization
'''

import pickle

with open("pickle_test.txt", 'rb') as f:
    data = pickle.load(f)

print(data["age"])

The json module provides four functions: dumps, dump, loads, load

The pickle module provides four functions: dumps, dump, loads, load
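The four functions pair up the same way in both modules: dumps/loads work on in-memory values, while dump/load work on file objects. The sketch below is one way to see the difference side by side. It uses in-memory io buffers instead of real files, and the record and python_only dictionaries are made-up sample data.

```python
import io
import json
import pickle

record = {"name": "demo", "age": 22, "tags": ["a", "b"]}

# dumps/loads: json yields a str, pickle yields bytes.
json_text = json.dumps(record)
pickle_bytes = pickle.dumps(record)

# dump/load: json needs a text-mode file, pickle a binary-mode file.
text_file = io.StringIO()
json.dump(record, text_file)
text_file.seek(0)
from_json = json.load(text_file)

binary_file = io.BytesIO()
pickle.dump(record, binary_file)
binary_file.seek(0)
from_pickle = pickle.load(binary_file)

# pickle keeps Python-only types such as tuples and sets; json does not.
python_only = {"point": (1, 2), "seen": {1, 2, 3}}
restored = pickle.loads(pickle.dumps(python_only))
json_point = json.loads(json.dumps({"point": (1, 2)}))["point"]  # tuple comes back as a list

print(type(json_text).__name__, type(pickle_bytes).__name__)
print(from_json == record, from_pickle == record)
print(restored, json_point)
```

One design point worth remembering: unpickling can execute arbitrary code, so pickle.load() should only be used on data from a trusted source, whereas JSON is safe to parse from other programs.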

(7) The shelve module

The shelve module is a simple key-value store that persists in-memory data to a file. It can persist any Python object that pickle supports.

'''
Use the shelve module to write Python data to a file
'''

import shelve

d = shelve.open('shelve_test')  # open a file

t = '123'
t2 = '123334'
name = ["鲁班", "rain", "test"]

d["test"] = name  # persist a list
d["t1"] = t  # persist a string
d["t2"] = t2

d.close()

'''
Use the shelve module to read Python data back from the file
'''

import shelve

d = shelve.open('shelve_test')  # open a file

print(d.get("test"))
print(d.get("t1"))
print(d.get("t2"))

d.close()
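A shelf can also be used as a context manager so that it is closed automatically. The sketch below additionally shows a common pitfall: mutating a stored list in place is not saved unless the shelf is opened with writeback=True. The path shelve_demo and the keys are hypothetical, and a temporary directory is used so that no files are left behind.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "shelve_demo")

    # Write: the shelf behaves like a dict whose values are pickled to disk.
    with shelve.open(path) as db:
        db["names"] = ["rain", "test"]
        db["count"] = 3

    # Read back, including a default for a missing key.
    with shelve.open(path) as db:
        names = db["names"]
        missing = db.get("no_such_key", "default")
        keys = sorted(db.keys())

    # In-place mutation works on a temporary copy and is silently lost...
    with shelve.open(path) as db:
        db["names"].append("lost")
    with shelve.open(path) as db:
        after_plain = list(db["names"])

    # ...unless writeback=True caches entries and writes them back on close.
    with shelve.open(path, writeback=True) as db:
        db["names"].append("kept")
    with shelve.open(path) as db:
        after_writeback = list(db["names"])

print(names, missing, keys)
print(after_plain)
print(after_writeback)
```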

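(7) XML processing module

XML is a format for exchanging data between different languages or programs. It serves much the same purpose as JSON, although JSON is simpler to use. In the dark ages before JSON existed, XML was the only choice, and many systems in traditional industries such as finance still expose mainly XML interfaces.

The XML format looks like this; the data structure is expressed through <> nodes: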

<?xml version="1.0"?>

<data>

    <country name="Liechtenstein">

        <rank updated="yes">2</rank>

        <year>2008</year>

        <gdppc>141100</gdppc>

        <neighbor name="Austria" direction="E"/>

        <neighbor name="Switzerland" direction="W"/>

    </country>

    <country name="Singapore">

        <rank updated="yes">5</rank>

        <year>2011</year>

        <gdppc>59900</gdppc>

        <neighbor name="Malaysia" direction="N"/>

    </country>

    <country name="Panama">

        <rank updated="yes">69</rank>

        <year>2011</year>

        <gdppc>13600</gdppc>

        <neighbor name="Costa Rica" direction="W"/>

        <neighbor name="Colombia" direction="E"/>

    </country>

</data>

xml_hehe.xml

XML is supported in every major language. In Python, the following module can be used to work with XML.

import xml.etree.ElementTree as ET

tree = ET.parse("xml_hehe.xml")

root = tree.getroot()

print(root)

print(root.tag)

# Walk the whole XML document

for child in root:

    print(child.tag, child.attrib)

    for i in child:

        print(i.tag, i.text,i.attrib)

# Walk only the year nodes

for node in root.iter('year'):

    print(node.tag, node.text)

xml_handle.py

Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xml_hehe.xml")

root = tree.getroot()

# Modify

for node in root.iter('year'):

    new_year = int(node.text) + 1

    node.text = str(new_year)

    node.set("updated_by", "Yun")

tree.write("xmltest.xml")

# Delete nodes

for country in root.findall('country'):

    rank = int(country.find('rank').text)

    if rank > 50:

        root.remove(country)

tree.write('output.xml')

xml_change.py
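The same modify and delete steps can be tried without touching any files by parsing from a string. This is a small sketch under that assumption: XML_TEXT is a trimmed-down version of the sample document, and the attribute value "demo" is a placeholder.

```python
import xml.etree.ElementTree as ET

# A reduced copy of the sample document, kept in memory for the demo.
XML_TEXT = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Modify: bump every year by one and tag the node with an attribute.
for year in root.iter("year"):
    year.text = str(int(year.text) + 1)
    year.set("updated_by", "demo")

# Delete: findall() returns a list, so removing while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
years = [y.text for y in root.iter("year")]
result = ET.tostring(root, encoding="unicode")

print(remaining)
print(years)
print(result)
```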

Creating an XML document yourself

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")

Personal = ET.SubElement(new_xml, "Personal", attrib={"enrolled": "yes"})

name = ET.SubElement(Personal,"name")

name.text="鲁班大师"

age = ET.SubElement(Personal, "age", attrib={"checked": "no"})

sex = ET.SubElement(Personal, "sex")

age.text = '33'

sex.text='man'

Personal = ET.SubElement(new_xml, "Personal2", attrib={"enrolled": "no"})

name = ET.SubElement(Personal, "name")

name.text="安琪拉"

sex = ET.SubElement(Personal, "sex")

sex.text='men'

age = ET.SubElement(Personal, "age")

age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object

et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML

xml_myself.py

<?xml version='1.0' encoding='utf-8'?>

<namelist>

    <Personal enrolled="yes">

        <name>鲁班大师</name>

        <age checked="no">33</age>

        <sex>man</sex></Personal>

    <Personal2 enrolled="no">

        <name>安琪拉</name>

        <sex>men</sex>

        <age>19</age></Personal2>

</namelist>

test.xml

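(8) The PyYAML module

Python can also handle the YAML document format easily, but a third-party module has to be installed first. Reference documentation: http://pyyaml.org/wiki/PyYAMLDocumentation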

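(9) The ConfigParser module

Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser.

Here is a configuration file format that many programs use: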

[DEFAULT]

ServerAliveInterval = 45

Compression = yes

CompressionLevel = 9

ForwardX11 = yes

[bitbucket.org]

User = hg

[topsecret.server.com]

Port = 50022

ForwardX11 = no

How can a file like this be generated with Python?

import configparser

config = configparser.ConfigParser()

config["DEFAULT"] = {'ServerAliveInterval': '45',

                     'Compression': 'yes',

                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}

config['bitbucket.org']['User'] = 'hg'

config['topsecret.server.com'] = {}

topsecret = config['topsecret.server.com']

topsecret['Host Port'] = '50022'  # mutates the parser

topsecret['ForwardX11'] = 'no'  # same here

config['DEFAULT']['ForwardX11'] = 'yes'

with open('example.ini', 'w') as configfile:

    config.write(configfile)

Config_test.py

Reading the contents of the config file

import  configparser

conf = configparser.ConfigParser()

conf.read("example.ini")

print(conf.defaults())

print(conf.sections())

print(conf['bitbucket.org']['user'])

configparser syntax for adding, deleting, updating and querying

[section1]

k1 = v1

k2:v2

[section2]

k1 = v1

import configparser

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## Read ##########
# secs = config.sections()
# print(secs)
# options = config.options('group2')
# print(options)
# item_list = config.items('group2')
# print(item_list)
# val = config.get('group1', 'key')
# val = config.getint('group1', 'key')

# ########## Modify ##########
# sec = config.remove_section('group1')
# config.write(open('i.cfg', "w"))

# sec = config.has_section('wupeiqi')
# sec = config.add_section('wupeiqi')
# config.write(open('i.cfg', "w"))

# config.set('group2', 'k1', '11111')  # option values must be strings in Python 3
# config.write(open('i.cfg', "w"))

# config.remove_option('group2', 'age')
# config.write(open('i.cfg', "w"))
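The read, add, update and delete calls listed above can be exercised end to end without a file on disk. The sketch below assumes a hypothetical INI text modelled on the section1/section2 sample (with k2 changed to the number 42 so that getint() has something to parse) and writes the result into an in-memory buffer.

```python
import configparser
import io

# Hypothetical configuration text; both "=" and ":" are accepted as delimiters.
INI_TEXT = """
[section1]
k1 = v1
k2: 42

[section2]
k1 = v1
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Read
sections = config.sections()
options = config.options("section1")
k2_as_int = config.getint("section1", "k2")
fallback = config.get("section2", "missing", fallback="n/a")

# Add and update (values must be strings)
config.add_section("section3")
config.set("section3", "enabled", "yes")
config.set("section1", "k1", "changed")

# Delete
config.remove_option("section1", "k2")
config.remove_section("section2")

# Write the modified configuration out again.
buffer = io.StringIO()
config.write(buffer)
written = buffer.getvalue()

print(sections, options, k2_as_int, fallback)
print(written)
```

With a real file, the last step would be done inside `with open(path, "w") as f: config.write(f)` so that the file handle is closed promptly.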

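(10) The hashlib module

Used for hashing (message digest) operations. It replaces the older md5 and sha modules, which no longer exist in 3.x, and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.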

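import hashlib

m = hashlib.md5()
m.update(b'hello')
print(m.hexdigest())  # digest of b'hello'

m.update(b'world!')  # update() appends to the data already fed in
print(m.hexdigest())  # digest of b'helloworld!'

m2 = hashlib.md5()
m2.update(b'helloworld!')
print(m2.hexdigest())  # same value as the previous print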

#sha256()

hash=hashlib.sha256()

hash.update('微微一笑很倾城'.encode(encoding='utf-8'))

print(hash.hexdigest())

#sha384()

hash1 = hashlib.sha384()

hash1.update('微微一笑很倾城'.encode(encoding='utf-8'))

print(hash1.hexdigest())

#sha512()

hash2 = hashlib.sha512()

hash2.update('微微一笑很倾城'.encode(encoding='utf-8'))

print(hash2.hexdigest())
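Because update() accumulates data, a large input can be hashed in chunks without loading it into memory all at once. The sketch below illustrates this with an in-memory stream; the helper name sha256_of_stream and the chunk size are assumptions made for the example.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=4096):
    """Hash a binary stream piece by piece and return the hex digest."""
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


data = b"hello" * 10000
streamed = sha256_of_stream(io.BytesIO(data))
one_shot = hashlib.sha256(data).hexdigest()

print(streamed == one_shot)
print(len(streamed))
```

For a real file the stream would come from open(path, "rb"); the chunked and one-shot digests are identical either way.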

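'''
Python also has an hmac module, which internally combines the key and the
message we supply and then hashes them.

A hash-based message authentication code (HMAC) is an authentication
mechanism built on a message authentication code (MAC). With HMAC, the two
communicating parties verify a message's authenticity by means of a shared
secret key K.

It is typically used to authenticate messages in network communication. The
two sides must agree on a key beforehand, like a secret handshake. The sender
computes a digest of the message with the key; the receiver recomputes the
digest from the key plus the plaintext message and compares it with the value
the sender provided. If the two are equal, the message is authentic and the
sender is legitimate.
'''

import hmac

# digestmod is required since Python 3.8; md5 was the old default
h = hmac.new('鲁班大师'.encode(encoding='utf-8'),
             '智障二百五'.encode(encoding='utf-8'),
             digestmod='md5')

print(h.hexdigest())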
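The sign-then-verify exchange described above can be sketched as follows. The key, the messages and the helper names sign and verify are hypothetical; SHA-256 is chosen here as the digest, which is an assumption of this example rather than something the text prescribes.

```python
import hashlib
import hmac

SHARED_KEY = b"agreed-in-advance"  # hypothetical key both sides know


def sign(message: bytes, key: bytes = SHARED_KEY) -> str:
    """Sender side: compute the HMAC tag that travels with the message."""
    return hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str, key: bytes = SHARED_KEY) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    expected = sign(message, key)
    return hmac.compare_digest(expected, signature)


message = b"transfer 100 to alice"
tag = sign(message)

ok = verify(message, tag)
tampered = verify(b"transfer 900 to alice", tag)
wrong_key = verify(message, tag, key=b"guess")

print(ok, tampered, wrong_key)
```

hmac.compare_digest() is used instead of == because it avoids leaking, through timing, how many leading characters of the tag were correct.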

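(11) The re module

Common regular expression symbols

'.'     Matches any single character except \n by default; with the DOTALL flag it matches any character, including a newline

'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)

'$'     Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()

'*'     Matches the preceding character 0 or more times; re.findall("ab*","cabb3abcbbac") returns ['abb', 'ab', 'a']

'+'     Matches the preceding character 1 or more times; re.findall("ab+","ab+cd+abb+bba") returns ['ab', 'abb']

'?'     Matches the preceding character 0 or 1 times

'{m}'   Matches the preceding character exactly m times

'{n,m}' Matches the preceding character n to m times; re.findall("ab{1,3}","abb abc abbcbbb") returns ['abb', 'ab', 'abb']

'|'     Matches the expression on the left or on the right of |; re.search("abc|ABC","ABCBabcCD").group() returns 'ABC'

'(...)' Grouping; re.search("(abc){2}a(123|456)c", "abcabca456c").group() returns 'abcabca456c'

'\A'    Matches only at the start of the string; re.search(r"\Aabc","alexabc") finds no match

'\Z'    Matches only at the end of the string; similar to $, except that $ also matches just before a trailing newline

'\d'    Matches a digit 0-9

'\D'    Matches a non-digit

'\w'    Matches [A-Za-z0-9_]

'\W'    Matches anything not in [A-Za-z0-9_]

'\s'    Matches whitespace such as space, \t, \n and \r; re.search(r"\s+","ab\tc1\n3").group() returns '\t'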

'(?P<name>...)' Named grouping; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","110101198001017032").groupdict("city")
returns {'province': '1101', 'city': '01', 'birthday': '1980'}

Demonstration

>>> import re

>>> re.match('.','dsskdslds211')

<_sre.SRE_Match object; span=(0, 1), match='d'>

>>>import re

>>> re.match('^ds','dsdsdsdsadj1212')

<_sre.SRE_Match object; span=(0, 2), match='ds'>

>>>

>>>import re

>>> re.match(r'^ds\d','ds12123dsdsdsadj1212')

<_sre.SRE_Match object; span=(0, 3), match='ds1'>

>>> re.match(r'^ds\d+','ds12123dsdsdsadj1212')

<_sre.SRE_Match object; span=(0, 7), match='ds12123'>

>>>

>>>import re

>>>re.search('k[a-z]+a','sahsaj1212kaHEHEsha12sakasha')

<_sre.SRE_Match object; span=(23, 28), match='kasha'>

>>>import re

>>>re.search('k[a-zA-Z]+a','sahsaj1212kaHEHEsha12sakasha')

<_sre.SRE_Match object; span=(10, 19), match='kaHEHEsha'>

>>>import re

>>>re.search('#.+#','as#hello#ha')

<_sre.SRE_Match object; span=(2, 9), match='#hello#'>

>>>import re

>>>print(re.search('a?','asnksaaaha'))

>>>print(re.search('aa?','asnksaaaha'))

>>>print(re.search('aaa?','asnksaaaha'))

<_sre.SRE_Match object; span=(0, 1), match='a'>

<_sre.SRE_Match object; span=(0, 1), match='a'>

<_sre.SRE_Match object; span=(5, 8), match='aaa'>

import re

print(re.search('[0-9]{3}','asn1k2sa1213aaha'))

print(re.search('[0-9]{1,3}','asn1k2sa1213aaha'))

<_sre.SRE_Match object; span=(8, 11), match='121'>

<_sre.SRE_Match object; span=(3, 4), match='1'>

import re

print(re.findall('[0-9]{3}','asn1k2sa1213aaha'))

print(re.findall('[0-9]{1,3}','asn1k2sa1213aaha'))

['121']

['1', '2', '121', '3']

import re

print(re.findall('abc|ABC','asabcn1k2sABCa1213aaha'))

print(re.search('abc|ABC','asabcn1k2sABCa1213aaha').group())

['abc', 'ABC']

abc

import re

print(re.search('(abc){2}','asabcn1abcabcka'))

print(re.search(r'(abc){2}\|','asabcn1abcabc|ka'))

print(re.search(r'(abc){2}\|{2}','asabcn1abcabc||ka'))

print(re.search(r'(abc){2}\|\|=','asabcn1abcabc||=ka'))

print(re.search(r'(abc){2}(\|\|=){2}','asabcn1abcabc||=||=ka'))

<_sre.SRE_Match object; span=(7, 13), match='abcabc'>

<_sre.SRE_Match object; span=(7, 14), match='abcabc|'>

<_sre.SRE_Match object; span=(7, 15), match='abcabc||'>

<_sre.SRE_Match object; span=(7, 16), match='abcabc||='>

<_sre.SRE_Match object; span=(7, 19), match='abcabc||=||='>

import re

print(re.search(r'\A[0-9]+[a-z]\Z','1213a'))

<_sre.SRE_Match object; span=(0, 5), match='1213a'>

import re

print(re.search(r'\D+','1213asa |?$#@'))

print(re.search(r'\W+','1213asa |?$#@'))

print(re.search(r'\s+','1213asa \r\n\t'))

<_sre.SRE_Match object; span=(4, 13), match='asa |?$#@'>

<_sre.SRE_Match object; span=(7, 13), match=' |?$#@'>

<_sre.SRE_Match object; span=(7, 11), match=' \r\n\t'>

import re

re.search("(?P<province>[0-9]{2})(?P<city>[0-9]{2})"

          "(?P<local>[0-9]{2})(?P<birthday>[0-9]{8})",

          "110101198001017032").groupdict("city")

{'province': '11', 'city': '01', 'local': '01', 'birthday': '19800101'}

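Regular expressions

Online testing tool: http://tool.chinaz.com/regex/

Character class: [characters]

A character class is the set of characters that may appear at a single position; in a regular expression it is written with [].

Characters fall into many categories, such as digits, letters and punctuation.

If a position may hold only a single digit, the character at that position must be one of the ten digits 0, 1, 2 ... 9.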

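Characters:

Metacharacter	Matches
.	any character except a newline
\w	a letter, digit or underscore
\s	any whitespace character
\d	a digit
\n	a newline
\t	a tab
\b	a word boundary (the start or end of a word)
^	the start of the string
$	the end of the string
\W	anything other than a letter, digit or underscore
\D	a non-digit
\S	a non-whitespace character
a|b	the character a or the character b
()	the expression inside the parentheses; also defines a group
[...]	any character in the character class
[^...]	any character not in the character class

Quantifiers:

Quantifier	Meaning
*	repeat zero or more times
+	repeat one or more times
?	repeat zero or one time
{n}	repeat n times
{n,}	repeat n or more times
{n,m}	repeat n to m times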

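. ^ $

Regex	String to match	Match result	Description
海.	海燕海娇海东	海燕 海娇 海东	Matches every occurrence of "海" followed by one character
^海.	海燕海娇海东	海燕	Matches "海." only at the start of the string
海.$	海燕海娇海东	海东	Matches "海." only at the end of the string

* + ? { }

Regex	String to match	Match result	Description
李.?	李杰和李莲英和李二棍子	李杰 李莲 李二	? means zero or one repetition, so this matches "李" plus at most one following character
李.*	李杰和李莲英和李二棍子	李杰和李莲英和李二棍子	* means zero or more repetitions, so this matches "李" followed by zero or more of any character
李.+	李杰和李莲英和李二棍子	李杰和李莲英和李二棍子	+ means one or more repetitions, so this matches "李" followed by one or more of any character
李.{1,2}	李杰和李莲英和李二棍子	李杰和 李莲英 李二棍	{1,2} matches one to two of any character

Note: the quantifiers *, + and ? above are all greedy, meaning they match as much as possible. Appending ? to a quantifier makes it lazy.

Regex	String to match	Match result	Description
李.*?	李杰和李莲英和李二棍子	李 李 李	Lazy matching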

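Character sets [] [^]

Regex	Text to match	Result	Notes
李[杰莲英二棍子]*	李杰和李莲英和李二棍子	李杰 李莲英 李二棍子	matches "李" followed by any number of characters from [杰莲英二棍子]
李[^和]*	李杰和李莲英和李二棍子	李杰 李莲英 李二棍子	matches "李" followed by any number of characters other than "和"
[\d]	456bdha3	4 5 6 3	matches a single digit; four results
[\d]+	456bdha3	456 3	matches a run of digits; two results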
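
The rows of the tables above can be checked with re.findall. This is only a small sketch for verification; the variable names are made up for the illustration.

```python
import re

text = '李杰和李莲英和李二棍子'

# Greedy quantifiers take as many characters as they are allowed to.
optional_one = re.findall('李.?', text)      # ['李杰', '李莲', '李二']
one_or_two = re.findall('李.{1,2}', text)    # ['李杰和', '李莲英', '李二棍']

# A trailing ? makes the quantifier lazy, so .*? matches nothing after "李".
lazy = re.findall('李.*?', text)             # ['李', '李', '李']

# A negated character set stops at the first "和".
not_he = re.findall('李[^和]*', text)        # ['李杰', '李莲英', '李二棍子']

# A run of digits is returned as one result per run.
digits = re.findall(r'[\d]+', '456bdha3')    # ['456', '3']

print(optional_one, one_or_two, lazy, not_he, digits)
```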

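Grouping () and alternation |

An ID card number is a string of 15 or 18 characters. A 15-character number consists only of digits and does not start with 0. In an 18-character number the first 17 characters are digits and the last one is a digit or x. Let us try to express this as a regular expression:

Regex	Text to match	Result	Notes
^[1-9]\d{13,16}[0-9x]$	110101198001017032	110101198001017032	matches a valid ID number
^[1-9]\d{13,16}[0-9x]$	1101011980010170	1101011980010170	also matches this string, although it is not a valid ID number: it has 16 digits
^[1-9]\d{14}(\d{2}[0-9x])?$	1101011980010170	False	the invalid number no longer matches. () forms a group: \d{2}[0-9x] is grouped so that the whole group can be required to appear 0 or 1 times
^([1-9]\d{16}[0-9x]|[1-9]\d{14})$	110105199812067023	110105199812067023	tries [1-9]\d{16}[0-9x] first and falls back to [1-9]\d{14} if that does not match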
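
As a rough sketch, the last pattern of the table can be wrapped in a small helper. The function name looks_like_id is invented for this example, and the check covers only the shape of the number, not its checksum or date.

```python
import re

# 18 characters (17 digits plus a digit or x) or 15 digits, never starting with 0.
ID_PATTERN = re.compile(r'^([1-9]\d{16}[0-9x]|[1-9]\d{14})$')

def looks_like_id(value):
    """Return True if value has the shape of a 15- or 18-character ID number."""
    return ID_PATTERN.match(value) is not None

print(looks_like_id('110105199812067023'))  # 18 characters
print(looks_like_id('110101198001017'))     # 15 digits
print(looks_like_id('1101011980010170'))    # 16 digits, rejected
```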

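The escape character \

Regular expressions contain many sequences with a special meaning, such as \n and \s. To match the two literal characters "\n" rather than a newline, the "\" itself has to be escaped, which gives the regex \\n.

In Python both the pattern and the text to match are written as strings, and inside an ordinary string literal \ is special as well and has to be escaped again. To match a literal "\n", the text is written as the string literal '\\n' and the regex \\n becomes the string literal "\\\\n", which is tedious. A raw string avoids this: with the r prefix the text is r'\n' and the pattern is simply r'\\n'.

Regex	Text to match	Result	Notes
\n	\n	False	\ is special in a regex, so the expression \n means a newline and cannot match the literal text \n
\\n	\n	True	escaping \ as \\ makes the match succeed
"\\\\n"	'\\n'	True	in a Python string literal every \ has to be escaped once more
r'\\n'	r'\n'	True	the r prefix stops Python from processing escapes in the string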
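
The same table can be replayed in Python. A minimal sketch, with variable names chosen only for this illustration:

```python
import re

text = r'\n'              # two characters: a backslash followed by the letter n
assert len(text) == 2

# Ordinary string literal: four backslashes are needed to express the regex \\n.
m_plain = re.search('\\\\n', text)

# Raw string literal: the regex \\n is written as it is.
m_raw = re.search(r'\\n', text)

# The regex \n means "newline", which the text does not contain.
m_newline = re.search(r'\n', text)

print(m_plain.group() == m_raw.group(), m_newline)
```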

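Greedy matching

Greedy matching: among all possible matches, the longest one is taken. Matching is greedy by default.

Regex	Text to match	Result	Notes
<.*>	<script>...<script>	<script>...<script>	greedy by default, so the longest possible string is matched
<.*?>	<script>...<script>	<script> <script>	adding ? switches to non-greedy matching, so the shortest possible strings are matched

Commonly used non-greedy patterns

*? repeat any number of times, but as few times as possible

+? repeat one or more times, but as few times as possible

?? repeat zero or one time, but as few times as possible

{n,m}? repeat n to m times, but as few times as possible

{n,}? repeat n or more times, but as few times as possible

Using .*?

. is any character

* repeats it zero or more times

? makes the repetition non-greedy.

Together they mean "as few arbitrary characters as possible". The pattern is rarely written on its own; it is mostly used as:

.*?x

which takes characters of any length up to the first x.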
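
A short sketch of the difference between greedy and non-greedy matching, using the sample text from the table. The string 'abcxdefx' is made up for the .*?x case.

```python
import re

html = '<script>...<script>'

greedy = re.findall('<.*>', html)    # one long match
lazy = re.findall('<.*?>', html)     # two short matches

# .*?x stops at the first x, while .*x runs to the last one.
up_to_first_x = re.search('.*?x', 'abcxdefx').group()
up_to_last_x = re.search('.*x', 'abcxdefx').group()

print(greedy, lazy, up_to_first_x, up_to_last_x)
```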

Common functions in the re module

import re

ret = re.findall('a', 'eva egon yuan')  # returns every match in a list
print(ret)  # result: ['a', 'a']

# search scans the string and returns a match object for the first match,
# or None if nothing matches. Call group() on the object to get the matched text.
ret = re.search('a', 'eva egon yuan').group()
print(ret)  # result: 'a'

ret = re.match('a', 'abc').group()  # like search, but matches only at the start of the string
print(ret)  # result: 'a'

ret = re.split('[ab]', 'abcd')  # splitting on 'a' gives '' and 'bcd', splitting those on 'b' gives '', '' and 'cd'
print(ret)  # ['', '', 'cd']

ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', count=1)  # replace digits with 'H'; count=1 replaces only the first one
print(ret)  # evaHegon4yuan4

ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')  # replace digits with 'H'; returns a tuple (new string, number of replacements)
print(ret)  # ('evaHegonHyuanH', 3)

obj = re.compile(r'\d{3}')  # compile the regex into a pattern object; the rule here is three digits
ret = obj.search('abc123eeee')  # the pattern object has its own search; the argument is the text to match
print(ret.group())  # result: 123

import re

ret = re.finditer(r'\d', 'ds3sy4784a')  # finditer returns an iterator over the match objects
print(ret)  # <callable_iterator object at 0x10195f940>
print(next(ret).group())  # first result
print(next(ret).group())  # second result
print([i.group() for i in ret])  # all remaining results

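Notes:

1. Groups take priority in findall:

import re

ret = re.findall('www.(baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['oldboy']  when the pattern contains a group, findall returns the group's content; to get the whole match, make the group non-capturing with ?:

ret = re.findall('www.(?:baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']

2. Groups take priority in split:

ret = re.split(r"\d+", "eva3egon4yuan")
print(ret)  # result: ['eva', 'egon', 'yuan']

ret = re.split(r"(\d+)", "eva3egon4yuan")
print(ret)  # result: ['eva', '3', 'egon', '4', 'yuan']

# Wrapping the pattern in () changes the result:
# without the group the matched separators are dropped, with the group they are kept in the list.
# This matters whenever the matched part has to be preserved.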

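Regex	Text to match	Result	Notes
[0123456789]	8	True	a character group enumerates all allowed characters; the match succeeds if the text equals any one of them
[0123456789]	a	False	the character group does not contain "a", so there is no match
[0-9]	7	True	a range can be written with -, so [0-9] means the same as [0123456789]
[a-z]	s	True	likewise, [a-z] covers all lowercase letters
[A-Z]	B	True	[A-Z] covers all uppercase letters
[0-9a-fA-F]	e	True	matches digits and the letters a to f in either case, which is useful for validating hexadecimal characters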
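
As an illustration of the last row, a character group can validate a whole string when combined with + and re.fullmatch. The helper name is_hex is hypothetical.

```python
import re

def is_hex(value):
    """Return True if value is a non-empty string of hexadecimal characters."""
    return re.fullmatch(r'[0-9a-fA-F]+', value) is not None

print(is_hex('e'))        # a single hex character
print(is_hex('1aF9'))     # mixed case is accepted
print(is_hex('12g4'))     # 'g' is outside the character group
```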

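12. string module (methods of the built-in str type)

str.capitalize() returns a copy with the first character uppercased and the rest lowercased

str.center(width) returns the string centered in a new string of length width, padded with spaces

str.ljust(width) returns the string left-aligned in a new string of the given length, padded with spaces

str.rjust(width) returns the string right-aligned in a new string of the given length, padded with spaces

str.zfill(width) returns the string right-aligned, padded on the left with 0 to the given length

str.count(sub[, start[, end]]) returns how many times the substring occurs; start and end limit the range

bytes.decode(encoding[, errors]) decodes bytes into a str (in Python 3 decode belongs to bytes, not str); with the default errors='strict' a failure raises UnicodeDecodeError, a subclass of ValueError

str.encode(encoding[, errors]) encodes the string into bytes

str.endswith(suffix[, start[, end]]) tests whether the string ends with suffix; start and end limit the range

str.startswith(prefix[, start[, end]]) tests whether the string starts with prefix; start and end limit the range

str.expandtabs(tabsize=8) replaces tabs with spaces; tab stops are 8 columns apart by default

str.find(sub[, start[, end]]) returns the index of the first occurrence of the substring, or -1 if it is not found

str.index(sub[, start[, end]]) like find, but raises ValueError if the substring is not found

str.isalnum() returns True if the string is non-empty and consists only of letters and digits, otherwise False

str.isalpha() returns True if the string is non-empty and consists only of letters, otherwise False

str.isdecimal() returns True if the string consists only of decimal digits

str.isdigit() returns True if the string consists only of digits

str.islower() returns True if all cased characters are lowercase

str.isupper() returns True if all cased characters are uppercase

str.isnumeric() returns True if the string consists only of numeric characters

str.isspace() returns True if the string consists only of whitespace, otherwise False

str.title() returns a title-cased copy (the first letter of every word uppercased, the rest lowercased)

str.istitle() returns True if the string is title-cased (see title()), otherwise False

str.join(seq) joins the strings in a sequence, using str as the separator

str.split(sep=None, maxsplit=-1) splits the string on sep and returns a list; maxsplit is the maximum number of splits

str.splitlines([keepends]) splits at line boundaries and returns the lines as a list; line breaks are kept only if keepends is true

str.lower() converts uppercase to lowercase

str.upper() converts lowercase to uppercase

str.swapcase() swaps the case of every letter

str.lstrip() removes whitespace (spaces, tabs, line breaks) from the left

str.rstrip() removes whitespace (spaces, tabs, line breaks) from the right

str.strip() removes whitespace (spaces, tabs, line breaks) from both ends

str.partition(sep) splits the string at the first occurrence of sep into a 3-tuple (part before, sep, part after)

str.replace(old, new[, count]) replaces old with new; count is the maximum number of replacements

str.rfind(sub[, start[, end]]) like find, but searches from the right

str.rindex(sub[, start[, end]]) like index, but searches from the right

str.rpartition(sep) like partition, but searches from the right

str.translate(table) maps each character through table, which is usually built with str.maketrans(); the third argument of maketrans lists characters to delete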
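
A few of the methods above in action, as a quick sketch for Python 3. The sample strings are invented for the illustration.

```python
s = 'hello world'

capitalized = s.capitalize()                  # only the first character is uppercased
titled = s.title()                            # every word starts with an uppercase letter
padded = '42'.zfill(5)                        # left-padded with zeros
limited = 'a,b,c'.split(',', 1)               # at most one split
head = 'key=value=x'.partition('=')           # split at the first '='
tail = 'key=value=x'.rpartition('=')          # split at the last '='

# In Python 3, encode turns str into bytes and decode turns bytes back into str.
round_trip = 'héllo'.encode('utf-8').decode('utf-8')

# translate uses a table from str.maketrans: a->x, b->y, and 'c' is deleted.
table = str.maketrans('ab', 'xy', 'c')
translated = 'aabbcc'.translate(table)

print(capitalized, titled, padded, limited, head, tail, round_trip, translated)
```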

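13. math module

ceil: returns the smallest integer greater than or equal to x; if x is an integer, returns x

copysign: copysign(x, y) returns x with the sign of y; y may be 0

cos: returns the cosine of x, where x is in radians

degrees: converts x from radians to degrees

e: the mathematical constant e (2.71828...)

exp: returns e to the power x

expm1: returns e to the power x, minus 1

fabs: returns the absolute value of x

factorial: returns the factorial of x

floor: returns the largest integer less than or equal to x; if x is an integer, returns x

fmod: returns the remainder of x/y as a float

frexp: returns a tuple (m, e) such that x == m * 2**e, where 0.5 <= abs(m) < 1 (m is 0 when x is 0)

fsum: returns an accurate floating-point sum of the elements of an iterable

gcd: returns the greatest common divisor of x and y

hypot: returns the Euclidean norm sqrt(x*x + y*y)

isfinite: returns True if x is neither an infinity nor NaN, otherwise False

isinf: returns True if x is positive or negative infinity, otherwise False

isnan: returns True if x is NaN (not a number), otherwise False

ldexp: returns x*(2**i)

log: log(x[, base]) returns the natural logarithm of x (base e); when base is given, returns the logarithm of x to that base, computed as log(x)/log(base)

log10: returns the base-10 logarithm of x

log1p: returns the natural logarithm (base e) of 1+x

log2: returns the base-2 logarithm of x

modf: returns a tuple with the fractional and integer parts of x, both as floats

pi: the constant pi

pow: returns x to the power y, that is x**y, as a float

radians: converts x from degrees to radians

sin: returns the sine of x, where x is in radians

sqrt: returns the square root of x

tan: returns the tangent of x, where x is in radians

trunc: returns the integer part of x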
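
Some of the less obvious functions are easier to understand with numbers. A small sketch, with inputs picked only for illustration:

```python
import math

mantissa, exponent = math.frexp(8.0)       # 8.0 == 0.5 * 2**4
rebuilt = math.ldexp(mantissa, exponent)   # the inverse of frexp

distance = math.hypot(3, 4)                # sqrt(3*3 + 4*4)
signed = math.copysign(3, -1)              # magnitude of 3 with the sign of -1
frac, whole = math.modf(3.25)              # fractional part first, then integer part

naive_sum = sum([0.1] * 10)                # accumulates rounding error
exact_sum = math.fsum([0.1] * 10)          # tracks the lost digits

infinite_is_finite = math.isfinite(float('inf'))
nan_is_nan = math.isnan(float('nan'))

# trunc cuts towards zero, floor rounds down, ceil rounds up.
rounded = (math.trunc(-3.7), math.floor(-3.7), math.ceil(-3.7))

print(mantissa, exponent, rebuilt, distance, signed, frac, whole)
print(naive_sum, exact_sum, infinite_is_finite, nan_is_nan, rounded)
```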

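14. urllib module

In Python 3 the quoting functions live in urllib.parse and the download functions in urllib.request (in Python 2 they were all in urllib).

urllib.parse.quote(string[, safe]) percent-encodes a string; safe lists the characters that are left unencoded

urllib.parse.unquote(string) decodes a percent-encoded string

urllib.parse.quote_plus(string[, safe]) like quote, but replaces ' ' with '+' where quote uses '%20'

urllib.parse.unquote_plus(string) decodes a string, also turning '+' back into ' '

urllib.parse.urlencode(query[, doseq]) converts a dict or a list of two-element tuples into URL parameters.

For example, the dict {'name': 'wklken', 'pwd': '123'} is converted to "name=wklken&pwd=123"

urllib.request.pathname2url(path) converts a local path into a URL path

urllib.request.url2pathname(path) converts a URL path into a local path

urllib.request.urlretrieve(url[, filename[, reporthook[, data]]]) downloads remote data to the local machine

filename: the local path to save to (if omitted, urllib saves the data in a temporary file)

reporthook: a callback that is triggered when the connection to the server is established and after each block of data has been transferred

data: the data to POST to the server

urlrs = urllib.request.urlopen(url[, data[, timeout]]) fetches a page; data, if given, is POSTed to the URL. Proxies are configured with urllib.request.ProxyHandler instead of a proxies argument

urlrs.readline() works as on a file object

urlrs.readlines() works as on a file object

urlrs.fileno() works as on a file object

urlrs.close() works as on a file object

urlrs.info() returns an http.client.HTTPMessage object with the headers sent by the remote server

urlrs.getcode() returns the HTTP status code of the response

urlrs.geturl() returns the URL that was requested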
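
The quoting helpers can be tried without any network access. A minimal Python 3 sketch, where the sample string 'a b&c/d' is made up for the illustration:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

raw = 'a b&c/d'

quoted = quote(raw)              # space becomes %20; '/' is safe by default
quoted_plus = quote_plus(raw)    # space becomes '+', and '/' is encoded too

# Both decoders restore the original string.
restored = unquote(quoted)
restored_plus = unquote_plus(quoted_plus)

# The dict from the text above becomes a query string.
query = urlencode({'name': 'wklken', 'pwd': '123'})

print(quoted, quoted_plus, restored, restored_plus, query)
```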

15. logging module

Simple function-style configuration

import logging

logging.debug('debug message')

logging.info('info message')

logging.warning('warning message')

logging.error('error message')

logging.critical('critical message')

Output:

C:\Python3.6\python.exe H:/test/loggin模块/test1.py

WARNING:root:warning message

ERROR:root:error message

CRITICAL:root:critical message

Process finished with exit code 0

By default the logging module writes log records to standard error (sys.stderr) and shows only records at level WARNING or above. This shows that the default level is WARNING (the levels rank CRITICAL > ERROR > WARNING > INFO > DEBUG). The default format is level:logger name:message.

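Configuring the log level, format and destination:

Configuration parameters:

logging.basicConfig() accepts keyword arguments that change the default behaviour of the logging module. The available arguments include:

  filename: creates a FileHandler with the given file name, so that log records are stored in that file.

  filemode: the mode in which the file is opened when filename is given; the default is "a", and "w" can be used as well.

  format: the log format used by the handler.

  datefmt: the date/time format.

  level: the level of the root logger (the concept is explained later).

  stream: creates a StreamHandler with the given stream, for example sys.stderr, sys.stdout or an open file (f = open('test.log', 'w')); the default is sys.stderr.
  filename and stream are mutually exclusive: since Python 3.3, passing both raises ValueError (older versions ignored stream).

Format fields that can be used in the format argument:

  %(name)s name of the logger

  %(levelno)s numeric log level

  %(levelname)s textual log level

  %(pathname)s full path of the module that issued the logging call (if available)

  %(filename)s file name of the module that issued the logging call

  %(module)s name of the module that issued the logging call

  %(funcName)s name of the function that issued the logging call

  %(lineno)d source line number of the logging call

  %(created)f time at which the record was created, as a UNIX timestamp (float)

  %(relativeCreated)d time at which the record was created, in milliseconds since the logging module was loaded

  %(asctime)s the time as a string; the default format is "2003-07-08 16:49:45,896", where the digits after the comma are milliseconds

  %(thread)d thread ID (if available)

  %(threadName)s thread name (if available)

  %(process)d process ID (if available)

  %(message)s the message passed by the user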
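
A minimal sketch of a basicConfig call that uses several of these parameters. The in-memory StringIO target and the format string are choices made for this illustration only, and force=True (available from Python 3.8) is there so that the call takes effect even if the root logger was configured earlier.

```python
import io
import logging

# Illustration only: capture the log output in memory instead of a file.
buffer = io.StringIO()

logging.basicConfig(
    stream=buffer,                  # the StreamHandler writes here
    level=logging.DEBUG,            # level of the root logger
    format='%(asctime)s %(levelname)s %(name)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    force=True,                     # Python 3.8+: replace existing root handlers
)

logging.debug('debug message')      # shown now, because the level is DEBUG
logging.warning('warning message')

output = buffer.getvalue()
print(output)
```

To write to a file instead, stream would be replaced by filename (and optionally filemode); the two cannot be combined.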

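Configuring a logger object

import logging

logger = logging.getLogger()  # the root logger; its default level is WARNING

# Create a handler that writes to a log file
fh = logging.FileHandler('test.log', encoding='utf-8')

# Create another handler that writes to the console
ch = logging.StreamHandler()

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

fh.setLevel(logging.DEBUG)
fh.setFormatter(formatter)
ch.setFormatter(formatter)

logger.addHandler(fh)  # a logger can have several handlers such as fh and ch
logger.addHandler(ch)

# The logger level is still WARNING here, so the debug and info records are dropped
logger.debug('logger debug message')
logger.info('logger info message')
logger.warning('logger warning message')
logger.error('logger error message')
logger.critical('logger critical message')

The logging library provides several components: Logger, Handler, Filter and Formatter.

A Logger exposes the interface that application code uses directly, a Handler sends log records to the appropriate destination, a Filter provides a way to filter log records, and a Formatter specifies the output format.

The level can be set with logger.setLevel(logging.DEBUG), and each handler can be given its own level as well, for example fh.setLevel(logging.DEBUG).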
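
How the logger level and the handler levels interact can be sketched as follows. The logger name 'demo' and the two in-memory buffers are assumptions made for the illustration; in a real program the handlers would usually write to a file and to the console.

```python
import io
import logging

logger = logging.getLogger('demo')   # 'demo' is a hypothetical logger name
logger.setLevel(logging.DEBUG)       # the logger itself lets every record through
logger.propagate = False             # keep the records away from the root logger

all_buffer = io.StringIO()
error_buffer = io.StringIO()

all_handler = logging.StreamHandler(all_buffer)
all_handler.setLevel(logging.DEBUG)      # receives everything

error_handler = logging.StreamHandler(error_buffer)
error_handler.setLevel(logging.ERROR)    # receives ERROR and above only

formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
for handler in (all_handler, error_handler):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.debug('debug message')
logger.error('error message')

all_lines = all_buffer.getvalue().splitlines()
error_lines = error_buffer.getvalue().splitlines()
print(all_lines)
print(error_lines)
```

A record first has to pass the logger's level and is then checked against the level of each handler separately.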

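collections module

On top of the built-in data types (dict, list, set, tuple), the collections module provides several additional types: Counter, deque, defaultdict, namedtuple, OrderedDict and others.

1. namedtuple: creates a tuple whose elements can be accessed by name

2. deque: a double-ended queue that supports fast appends and pops at both ends

3. Counter: a counter, mainly used for counting

4. OrderedDict: an ordered dictionary

5. defaultdict: a dictionary with a default value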

namedtuple

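A tuple represents an immutable collection. For example, the two-dimensional coordinates of a point can be written as:

>>> p = (1, 2)

However, looking at (1, 2) it is hard to tell that the tuple represents a coordinate. A bare tuple is not very descriptive in such cases.

This is where namedtuple comes in: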

>>> from collections import namedtuple

>>> Point = namedtuple('Point', ['x', 'y'])

>>> p = Point(1, 2)

>>> p.x

1

>>> p.y

2

Similarly, a circle described by its coordinates and radius can be defined with namedtuple:

# namedtuple('name', [list of field names]):

Circle = namedtuple('Circle', ['x', 'y', 'r'])
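
A short usage sketch for the Circle type. The values and the area calculation are added for illustration only.

```python
import math
from collections import namedtuple

Circle = namedtuple('Circle', ['x', 'y', 'r'])

c = Circle(0, 0, 2)
area = math.pi * c.r ** 2        # fields are read by name

x, y, r = c                      # it still unpacks like a normal tuple
bigger = c._replace(r=3)         # namedtuples are immutable; _replace returns a new one
as_dict = c._asdict()            # field names mapped to values

print(c, area, bigger, as_dict)
```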

deque

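When data is stored in a list, access by index is fast, but inserting or deleting elements anywhere except at the end is slow: a list is stored contiguously, so the elements that follow have to be shifted, and this becomes inefficient for large lists.

deque is a double-ended queue designed for efficient insertion and deletion at both ends, which makes it suitable for queues and stacks: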

>>> from collections import deque

>>> q = deque(['a', 'b', 'c'])

>>> q.append('x')

>>> q.appendleft('y')

>>> q

deque(['y', 'a', 'b', 'c', 'x'])

Besides the append() and pop() methods of list, deque also supports appendleft() and popleft(), so elements can be added to or removed from the head very efficiently.
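
Continuing the example, removal from both ends can be sketched like this:

```python
from collections import deque

q = deque(['a', 'b', 'c'])
q.append('x')          # add on the right
q.appendleft('y')      # add on the left

left = q.popleft()     # remove from the left, as in a queue
right = q.pop()        # remove from the right, as in a stack

remaining = list(q)
print(left, right, remaining)
```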

OrderedDict

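* In CPython 3.6 the built-in dict already remembers the order in which keys were inserted, and Python 3.7 made this a language guarantee.

To state explicitly that the order of the keys matters, use OrderedDict:

>>> from collections import OrderedDict

>>> d = dict([('a', 1), ('b', 2), ('c', 3)])

>>> d # insertion order on Python 3.7+; older versions may show the keys in any order

{'a': 1, 'b': 2, 'c': 3}

>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

>>> od # the keys of an OrderedDict are ordered

OrderedDict([('a', 1), ('b', 2), ('c', 3)])

Note that the keys of an OrderedDict are kept in insertion order, not sorted by the keys themselves:

>>> od = OrderedDict()

>>> od['z'] = 1

>>> od['y'] = 2

>>> od['x'] = 3

>>> list(od.keys()) # keys in the order they were inserted

['z', 'y', 'x']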
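
Now that a plain dict keeps insertion order too, the remaining differences are mostly the order-aware operations. A brief sketch of three of them, with sample keys invented for the illustration:

```python
from collections import OrderedDict

od = OrderedDict([('z', 1), ('y', 2), ('x', 3)])

od.move_to_end('z')                  # move an existing key to the end
order_after_move = list(od)

first_item = od.popitem(last=False)  # pop from the front instead of the back

# Plain dicts ignore order when compared, OrderedDicts do not.
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)

print(order_after_move, first_item, plain_equal, ordered_equal)
```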

defaultdict

Given the values [11, 22, 33, 44, 55, 66, 77, 88, 99, 90 ...], store every value greater than 66 under the first key of a dictionary and all other values (66 or less) under the second key.

That is: {'k1': values greater than 66, 'k2': values of 66 or less}

Solution with a plain dict:

values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
my_dict = {}
for value in values:
    if value > 66:
        if 'k1' in my_dict:  # dict.has_key() no longer exists in Python 3
            my_dict['k1'].append(value)
        else:
            my_dict['k1'] = [value]
    else:
        if 'k2' in my_dict:
            my_dict['k2'].append(value)
        else:
            my_dict['k2'] = [value]

Solution with defaultdict:

from collections import defaultdict

values = [11, 22, 33,44,55,66,77,88,99,90]

my_dict = defaultdict(list)

for value in  values:

    if value>66:

        my_dict['k1'].append(value)

    else:

        my_dict['k2'].append(value)

With a plain dict, looking up a key that does not exist raises KeyError. If a default value should be returned instead, use defaultdict:

>>> from collections import defaultdict

>>> dd = defaultdict(lambda: 'N/A')

>>> dd['key1'] = 'abc'

>>> dd['key1'] # key1 exists

'abc'

>>> dd['key2'] # key2 does not exist, so the default value is returned

'N/A'
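
Two details are worth seeing in code: defaultdict(int) is a compact way to count, and a lookup with [] stores the default under the missing key, while get() does not. A small sketch with made-up data:

```python
from collections import defaultdict

# Counting with a default of 0.
counts = defaultdict(int)
for ch in 'abcab':
    counts[ch] += 1

dd = defaultdict(lambda: 'N/A')
value = dd['missing']               # the factory is called and the result is stored
stored = 'missing' in dd            # the key now exists

fallback = dd.get('other')          # get() does not call the factory
not_stored = 'other' in dd

print(dict(counts), value, stored, fallback, not_stored)
```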

Counter

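The Counter class keeps track of how many times each value occurs.

It is a dict subclass that stores its data as key/value pairs, with the element as the key and its count as the value.

Example: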

>>> from collections import Counter

>>> c = Counter('abcdeabcdabcaba')

>>> c

Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
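
A few more Counter operations on the same string, as a short sketch:

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')

top_two = c.most_common(2)   # the two most frequent elements with their counts
missing = c['z']             # a missing element counts as 0 instead of raising KeyError

c.update('zz')               # add more observations
after_update = c['z']

total = sum(c.values())      # number of elements counted so far
print(top_two, missing, after_update, total)
```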
