Python's Ordinary Road (5)

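I. Introduction to Modules

Definitions:

Module: a unit for logically organizing Python code (variables, functions, classes, and logic that together implement a feature). In essence it is a Python file ending in .py (file name test.py, module name test).

Package: a unit for logically organizing modules. In essence it is a directory (a regular package must contain an __init__.py file).

Import syntax (modules and packages):

import module_name # module in the same directory; its variables and functions are then reached through the module, e.g. module_alex.name, module_alex.logger

import module_name, module2_name

from module_alex import module_name, module2_name # module_alex is a package (directory)

from module_alex import logger as logger_alex

from module_alex import name # import the variable name from the module

The essence of import (path search and the search path):

Importing a module means running the Python file through the interpreter once.

Importing a package means executing that package's __init__.py file. Worth practicing: extend the search path with sys.path.append(), and run from . import test1 inside __init__.py, which imports test1.py from the package's own directory into __init__.py.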
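The two statements above (importing a module runs its file, importing a package runs its __init__.py) can be checked with a throwaway package. This is a minimal sketch: the package name demo_pkg, the temporary directory, and the variable loaded are all invented for illustration.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk (all names are illustrative only).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "test1.py"), "w") as f:
    f.write("name = 'alex'\n")

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    # Runs when the package is imported and pulls test1 into the package.
    f.write("from . import test1\nloaded = True\n")

# Make the parent directory importable, then import the package.
sys.path.append(root)
importlib.invalidate_caches()
import demo_pkg

print(demo_pkg.loaded)      # set by __init__.py
print(demo_pkg.test1.name)  # reachable because __init__.py imported test1
```

Without the from . import test1 line in __init__.py, demo_pkg.test1 would not exist until test1 was imported explicitly.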

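Import optimization:

from module_test import test # when another .py file imports it this way, test can be called directly, which skips the lookup on module_test at every call

For example:

module.py

#import module_test # not optimized

from module_test import test # optimized

def logger():
    #module_test.test() # not optimized: test is looked up on module_test at each call
    test() # optimized: no lookup through module_test
    print("1234")

def logger2():
    # module_test.test() # not optimized
    print("5678")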

module2.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author is wspikh
# Import the test function from module test5 in the module_import package
from module_import.test5 import test

def logger1():
    test()
    print("------oh!-------")
logger1()

def logger2():
    test()
    print("------yeah!------")
logger2()

Module categories:

a. Standard library

b. Open-source (third-party) modules

c. Custom modules

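II. The time & datetime Modules

Time representations:

A timestamp, a formatted time string, and a tuple of 9 elements (struct_time). The timestamp function is time(); clock() was removed in Python 3.8, and perf_counter() or process_time() take its place.

Example:

time_demo.py (do not name the script time.py, or import time would load the script itself instead of the standard module)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author is wspikh
import time
print(time.localtime())
print(time.time())
print(time.perf_counter()) # replaces time.clock(), removed in Python 3.8
time.sleep(2) # pauses for 2 seconds and returns None
print(time.gmtime()) # timestamp to time tuple (UTC)
print(time.localtime()) # timestamp to time tuple (local time zone)

# Timestamp to time tuple
x = time.localtime(123213123)
print(x)
print(x.tm_year)
print("day of the year 1973: %d" % x.tm_yday)
# Tuple to timestamp
x = time.localtime()
print(time.mktime(x))

# Tuple to formatted time string
x = time.localtime()
print(time.strftime("%Y-%m-%d %H:%M:%S", x))

# Formatted time string to tuple

print(time.strptime("2016-08-20 14:43:50", "%Y-%m-%d %H:%M:%S"))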
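The three representations can be chained into a round trip. The sketch below is one way to do it; it uses calendar.timegm, which is not part of the original notes, so that the result does not depend on the local time zone.

```python
import calendar
import time

fmt = "%Y-%m-%d %H:%M:%S"
text = "2016-08-20 14:43:50"

# String -> struct_time -> timestamp, interpreting the string as UTC.
parsed = time.strptime(text, fmt)
stamp = calendar.timegm(parsed)

# Timestamp -> struct_time (UTC) -> string gives back the original text.
restored = time.strftime(fmt, time.gmtime(stamp))

print(stamp)
print(restored)
```

time.mktime is the local-time counterpart of calendar.timegm and pairs with time.localtime in the same way.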

date.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author is wspikh
import datetime, time

print(datetime.datetime.now())
print(datetime.date.fromtimestamp(time.time()))
print(datetime.datetime.now() + datetime.timedelta(3)) # now + 3 days
print(datetime.datetime.now() + datetime.timedelta(-3)) # now - 3 days
print(datetime.datetime.now() + datetime.timedelta(hours=3)) # now + 3 hours
print(datetime.datetime.now() + datetime.timedelta(minutes=30)) # now + 30 minutes

III. The random Module

Purpose:

Generating random numbers.

Example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author is wspikh
import random

print(random.randint(1,3)) # 1, 2 or 3: both endpoints are included
print(random.randrange(1,3)) # 1 or 2: the upper bound is excluded
print(random.sample("hello",2)) # pick the given number of characters from a string
print(random.uniform(1,3)) # random float between 1 and 3
print(random.random()) # random float in [0, 1)

# Random verification code
checkcode = ''
for i in range(4):
    current = random.randrange(0,4)
    # letter
    if current == i:
        tmp = chr(random.randint(65,90)) # a random uppercase letter
    # digit
    else:
        tmp = random.randint(0,9) # a random digit
    checkcode += str(tmp)
print(checkcode)
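The same kind of verification code can be produced more compactly. This is a sketch of an alternative, not the author's method: make_code is a hypothetical helper, and it draws every position uniformly from letters and digits instead of reproducing the loop's letter/digit rule.

```python
import random
import string

def make_code(length=4):
    """Return a code of uppercase letters and digits (hypothetical helper)."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(random.choices(alphabet, k=length))

code = make_code()
print(code)
```

For codes that protect anything sensitive, the secrets module is the usual choice over random.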

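IV. The os Module

Definition:

Provides common operating-system functionality.

Common functions and attributes:

1. os.name

A string indicating the platform in use: 'nt' on Windows, 'posix' on Linux/Unix.

2. os.getcwd()

Returns the current working directory, that is, the directory the Python script is running in.

3. os.listdir()

Returns the names of the entries in the given directory.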

>>> os.listdir(os.getcwd())

['Django', 'DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'MySQL-python-wininst.log', 'NEWS.txt', 'PIL-wininst.log', 'python.exe', 'pythonw.exe', 'README.txt', 'RemoveMySQL-python.exe', 'RemovePIL.exe', 'Removesetuptools.exe', 'Scripts', 'setuptools-wininst.log', 'tcl', 'Tools', 'w9xpopen.exe']

>>>

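4. os.remove()

Deletes a file.

5. os.system()

Runs a shell command.

>>> os.system('dir')

0

>>> os.system('cmd') # start a command prompt

6. os.sep gives the platform-specific path separator, so it need not be hard-coded.

7. os.linesep gives the line terminator used on the current platform.

>>> os.linesep

'\r\n'            # Windows uses '\r\n'; Linux and macOS use '\n' (classic Mac OS used '\r')

>>> os.sep

'\\'              # Windows

>>>

8. os.path.split()

Returns the directory name and the file name of a path.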

>>> os.path.split('C:\\Python25\\abc.txt')

('C:\\Python25', 'abc.txt')

9. os.path.isfile() and os.path.isdir() check whether the given path is a file or a directory, respectively.

>>> os.path.isdir(os.getcwd())

True

>>> os.path.isfile('a.txt')

False

10. os.path.exists() checks whether the given path actually exists.

>>> os.path.exists('C:\\Python25\\abc.txt')

False

>>> os.path.exists('C:\\Python25')

True

>>>

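11. os.path.abspath(name): returns the absolute path

12. os.path.normpath(path): normalizes the path string

13. os.path.getsize(name): returns the file size in bytes; for a directory the result is platform-dependent (the original note reports 0L, a Python 2 result)

14. os.path.splitext(): splits the file name from its extension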

>>> os.path.splitext('a.txt')

('a', '.txt')

15. os.path.join(path, name): joins a directory with a file name or another directory

>>> os.path.join('c:\\Python','a.txt')

'c:\\Python\\a.txt'

>>> os.path.join('c:\\Python','f1')

'c:\\Python\\f1'

>>>

16. os.path.basename(path): returns the file name

>>> os.path.basename('a.txt')

'a.txt'

>>> os.path.basename('c:\\Python\\a.txt')

'a.txt'

>>>

17. os.path.dirname(path): returns the directory part of the path

>>> os.path.dirname('c:\\Python\\a.txt')
'c:\\Python'
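The os.path results above were recorded on Windows; on Linux or macOS, os.path treats backslashes as ordinary characters and the same calls return different values. The sketch below uses the standard ntpath and posixpath modules (the two implementations behind os.path) so that both behaviours can be reproduced on any platform. The POSIX path is made up for illustration.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path

win = 'C:\\Python25\\abc.txt'
win_split = ntpath.split(win)                  # ('C:\\Python25', 'abc.txt')
win_ext = ntpath.splitext('a.txt')             # ('a', '.txt')
win_join = ntpath.join('c:\\Python', 'a.txt')  # 'c:\\Python\\a.txt'

nix = '/usr/lib/python/abc.txt'
nix_dir = posixpath.dirname(nix)               # '/usr/lib/python'
nix_base = posixpath.basename(nix)             # 'abc.txt'

# The POSIX rules see no separator in the Windows path at all.
not_split = posixpath.split(win)

print(win_split, win_ext, win_join)
print(nix_dir, nix_base)
print(not_split)
```

In ordinary code, import os and use os.path, which picks the right implementation for the running platform.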

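V. The sys Module

Purpose:

The sys module offers a set of very practical services: functions and variables for working with Python's runtime configuration and resources, which let a program interact with the environment outside itself, such as the Python interpreter.

Commonly used variables:

1) sys.stdin: the standard input stream.

2) sys.stdout: the standard output stream.

3) sys.stderr: the standard error stream.

4) sys.path: the list of directory names searched for modules.

5) sys.argv: the command-line arguments, including the script name.

6) sys.platform: the current platform, such as win32 or linux.

Details:

Handling command-line arguments

Once the interpreter has started, the argv list holds all arguments passed to the script; its first element is the name of the script itself.

Getting the script's arguments with the sys module

import sys

print("script name is", sys.argv[0]) # sys.argv[0] holds the script name

if len(sys.argv) > 1:
    print("there are", len(sys.argv)-1, "arguments:") # the 1 subtracted is the script name at [0]
    for arg in sys.argv[1:]:            # print every argument except [0]
        print(arg)
else:
    print("there are no arguments!")

If the script is read from standard input (for example "python < sys-argv-example-1.py"), the script name is set to an empty string.

If the script is passed to python as a string (with the -c option), the script name is set to "-c".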

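Working with modules

The path list is a list of directory names in which Python looks for modules (Python source modules, compiled modules, or binary extensions).

When Python starts, this list is initialized from built-in rules, the contents of the PYTHONPATH environment variable, and, on Windows, the registry.

Because it is just an ordinary list, you can manipulate it from within your program.

Manipulating the module search path with the sys module

import sys

print("path has", len(sys.path), "members")

sys.path.insert(0, "samples") # insert the directory at position [0] of path
import sample

sys.path = [] # remove every directory from path
import random # fails unless random is already loaded, since nothing is left to search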

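Finding built-in modules with the sys module

The builtin_module_names tuple holds the names of all modules built into the Python interpreter.

import sys

def dump(module):
    print(module, "=>", end=" ")
    if module in sys.builtin_module_names: # is it a built-in module?
        print("<BUILTIN>")
    else:
        module = __import__(module)         # otherwise print the path of the module file
        print(module.__file__)

dump("os")
dump("sys")
dump("string")
dump("zlib")

Output of the original run (Python 2 on Windows, which also called dump("strop"), a module that Python 3 no longer has):

os => C:\python\lib\os.pyc
sys => <BUILTIN>
string => C:\python\lib\string.pyc
strop => <BUILTIN>
zlib => C:\python\zlib.pyd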

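Finding loaded modules with the sys module

The modules dictionary holds every loaded module. The import statement checks this dictionary before importing anything from disk.

Python has already imported many modules before it starts processing your script.

import sys

print(list(sys.modules.keys()))

Output of the original Python 2 run:

['os.path', 'os', 'exceptions', '__main__', 'ntpath', 'strop', 'nt',
'sys', '__builtin__', 'site', 'signal', 'UserDict', 'string', 'stat']

Getting the current platform with the sys module

sys.platform returns the current platform, for example "win32" or "linux" ("linux2" under Python 2)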

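Working with standard output and input

Standard output and standard error (usually abbreviated stdout and stderr) are pipes built into every UNIX system.

When you print something, the result goes to the stdout pipe;

when your program crashes and prints debugging information (such as a Python traceback), the information goes to the stderr pipe.

The session below is from Python 2. Under Python 3, write print('Dive in'); the interactive prompt there also echoes the character count that write() returns.

>>> for i in range(3):
...     print 'Dive in'

Dive in
Dive in
Dive in
>>> import sys
>>> for i in range(3):
...     sys.stdout.write('Dive in')

Dive inDive inDive in
>>> for i in range(3):
...     sys.stderr.write('Dive in')

Dive inDive inDive in

stdout is a file-like object; calling its write function prints whatever string you give it.

In fact, this is what print really does: it appends a newline to the string you are printing and calls sys.stdout.write.

In the simplest case, stdout and stderr send their output to the same place.

Like stdout, stderr does not add a newline for you; if you want one, add it yourself.

stdout and stderr are both file-like objects, but they are write-only.

Neither has a read method, only write. They are still file-like objects, though, so you can assign any other file (or file-like) object to them to redirect their output.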

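Redirecting output with sys

import sys

print('Dive in')        # standard output
saveout = sys.stdout        # always save stdout before redirecting, so you can restore it later
fsock = open('out.log', 'w')      # open a new file for writing; it is created if missing and overwritten if present
sys.stdout = fsock                 # all subsequent output is redirected to the file just opened

print('This message will be logged instead of displayed') # "printed" to the log file only; nothing shows on screen

sys.stdout = saveout # set stdout back to the way it was before we changed it

fsock.close() # close the log file

Redirecting error output

import sys

fsock = open('error.log', 'w')           # open the log file that will store the debugging information
sys.stderr = fsock                           # assign the file object to stderr to redirect standard error
raise Exception('this error will be logged')   # nothing is printed on screen; the usual traceback is written to error.log

Note that the log file is not explicitly closed, and stderr is not set back to its original value.

That is fine here: once the program crashes (because of the raised exception), Python cleans up and closes the file for us.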
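The save, redirect, restore sequence described above is easy to get wrong if an exception occurs in between. A minimal sketch of the same idea using contextlib.redirect_stdout from the standard library (not used in the original notes) follows; an in-memory io.StringIO stands in for the out.log file.

```python
import contextlib
import io

buffer = io.StringIO()  # an in-memory file-like object standing in for out.log

with contextlib.redirect_stdout(buffer):
    print('This message will be captured instead of displayed')

# Outside the with block sys.stdout is the original stream again,
# even if the block had raised an exception.
print('Dive in')

captured = buffer.getvalue()
```

contextlib.redirect_stderr does the same for sys.stderr.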

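Printing to stderr

Writing error messages to standard error is so common that there is a shorthand for sending a message there directly.

>>> print('entering function')
entering function
>>> import sys
>>> print('entering function', file=sys.stderr) # Python 2 spelled this: print >> sys.stderr, 'entering function'

entering function

The file argument of print (the >> form of the Python 2 print statement) can write to any open file or file-like object.

Here, a single print call is redirected to stderr without affecting later print calls.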

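Exiting the program with the sys module

import sys

sys.exit(1)

Note that sys.exit does not exit immediately; it raises a SystemExit exception. This means you can catch calls to sys.exit in the main program.

Catching a sys.exit call

import sys
print("hello")
try:
    sys.exit(1)
except SystemExit:   # catch the exit exception
    pass                    # and do nothing with it
print("there")

hello

there

If you want to clean something up yourself before exiting (deleting temporary files, for example), you can register an "exit handler", which is called automatically when the program exits.

Another way to hook into sys.exit

import atexit
import sys

def exitfunc():
    print("world")

atexit.register(exitfunc) # register the function to call at exit (Python 2 used sys.exitfunc = exitfunc, which Python 3 removed)

print("hello")
sys.exit(1) # exitfunc() is called automatically on exit, and the program still exits

print("there") # never printed

hello

world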

VI. The shutil Module

Purpose:

shutil (High-level file operations) is a high-level toolkit for working with files.

It works like a high-level API; its main strength is solid support for copying and deleting files. It handles files, directories, and archives.

Details:

shutil.copyfileobj(fsrc, fdst[, length])
Copies the contents of one file object into another.

import shutil

with open('old.xml', 'r') as fsrc, open('new.xml', 'w') as fdst:
    shutil.copyfileobj(fsrc, fdst)

shutil.copyfile(src, dst)
Copies a file's contents.

shutil.copyfile('f1.log', 'f2.log')

shutil.copymode(src, dst)
Copies only the permission bits; contents, group, and owner are unchanged.

shutil.copymode('f1.log', 'f2.log')

shutil.copystat(src, dst)
Copies only the status information: mode bits, atime, mtime, flags.

shutil.copystat('f1.log', 'f2.log')

shutil.copy(src, dst)
Copies the file and its permission bits.

shutil.copy('f1.log', 'f2.log')

shutil.copy2(src, dst)
Copies the file and its status information.

shutil.copy2('f1.log', 'f2.log')

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copies a directory tree.

shutil.copytree('folder1', 'folder2', ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

shutil.copytree('f1', 'f2', symlinks=True, ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree.

shutil.rmtree('folder1')

shutil.move(src, dst)
Recursively moves a file or directory, like the mv command; when source and destination are on the same filesystem it is simply a rename.

shutil.move('folder1', 'folder3')

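shutil.make_archive(base_name, format,...)

Creates an archive (zip or tar, for example) and returns its path.

base_name: the archive's file name, without the format extension; it may include a path. With a bare name the archive is saved in the current directory, otherwise in the given location,
e.g. www => saved in the current directory
e.g. /Users/wupeiqi/www => saved in /Users/wupeiqi/
format: the archive type: "zip", "tar", "bztar", "gztar"
root_dir: the directory to archive (defaults to the current directory)
owner: the owner, defaulting to the current user
group: the group, defaulting to the current group
logger: used for logging, usually a logging.Logger object

# Archive the files under /Users/wupeiqi/Downloads/test into the program's current directory

import shutil

ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

# Archive the files under /Users/wupeiqi/Downloads/test into /Users/wupeiqi/

import shutil

ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

shutil handles archives by calling the zipfile and tarfile modules.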

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.log', arcname='bbs2.log')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.log', arcname='cmdb.log')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall() # an extraction directory can be passed here
tar.close()
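make_archive has a counterpart, shutil.unpack_archive, which the notes above do not mention. The sketch below runs both against a temporary directory so it can be tried without touching real files; the names src, backup, and restored are invented for illustration.

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
src = os.path.join(work, "src")
os.makedirs(src)
with open(os.path.join(src, "a.log"), "w") as f:
    f.write("hello\n")

# make_archive appends the extension itself and returns the archive path.
archive = shutil.make_archive(os.path.join(work, "backup"), "zip", root_dir=src)

# unpack_archive picks the format from the file extension.
dest = os.path.join(work, "restored")
shutil.unpack_archive(archive, dest)
names = os.listdir(dest)

print(archive)
print(names)

shutil.rmtree(work)  # remove the temporary tree again
```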

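VII. The json Module

Definition:

Serialization: the process of converting an object's state into a form that can be stored or sent over a network; the format can be JSON, XML, and so on. Deserialization reads the serialized state back from storage (JSON, XML) and recreates the object. JSON (JavaScript Object Notation): a lightweight data-interchange format, simpler than XML, easy for people to read and write and for machines to parse and generate. JSON is based on a subset of JavaScript.

The json module was added in Python 2.6. In it, serialization and deserialization are called encoding and decoding.

encoding: converts a Python object into a JSON string.

decoding: converts a JSON string into a Python object.

Details:

json provides four functions: dumps, dump, loads, load

# dumps
# Converts data into a string that every programming language can understand
>>> import json
>>> data = ['aa', 'bb', 'cc']
>>> j_str = json.dumps(data)
>>> j_str
'["aa", "bb", "cc"]'

# loads
# Converts a JSON-encoded string back into a Python data structure
>>> j_str
'["aa", "bb", "cc"]'
>>> mes = json.loads(j_str)
>>> mes
['aa', 'bb', 'cc']

# dump
# Converts data into a string that every programming language can understand, and writes it to a file
with open('D:/tmp.json', 'w') as f:
    json.dump(data, f)

# load
# Reads from a data file and converts the JSON-encoded string into a Python data structure
with open('D:/tmp.json', 'r') as f:
    data = json.load(f)

JSON encoding supports these basic types: None, bool, int, float, str, list, tuple, dict.

For dictionaries, json assumes keys are strings (any non-string key is converted to a string during encoding). To conform to the JSON specification, encode only Python lists and dictionaries. In web applications, making the top-level object a dictionary is also standard practice.

JSON's encoded form is nearly identical to Python syntax, with a few differences: True maps to true, False to false, None to null, and a tuple () to a list [], because other languages have no tuple concept, only arrays, that is, lists.

>>> import json
>>> data = {'a':True, 'b':False, 'c':None, 'd':(1,2), 1:'abc'}
>>> j_str = json.dumps(data)
>>> j_str
'{"a": true, "b": false, "c": null, "d": [1, 2], "1": "abc"}'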
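Because of the mappings just described, a dumps/loads round trip does not always give back an equal object. A small sketch that makes the loss visible:

```python
import json

data = {'a': True, 'c': None, 'd': (1, 2), 1: 'abc'}

text = json.dumps(data)
restored = json.loads(text)

print(text)
print(restored)
# The round trip is lossy: the tuple came back as a list, the int key as a str.
print(restored == data)
```

When tuples or non-string keys must survive, pickle (next section) keeps them intact, at the cost of being Python-only.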

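VIII. The pickle Module

Definition:

Two modules are used for serialization:

json: converts between strings and Python data types

pickle: converts between a Python-specific byte format and Python data types

json provides four functions: dumps, dump, loads, load

pickle provides four functions: dumps, dump, loads, load

Types that can be stored:

1. All native types Python supports: booleans, integers, floats, complex numbers, strings, bytes, None.

2. Lists, tuples, dictionaries, and sets made up of any native types.

3. Functions, classes, and class instances (functions and classes are stored by reference to their name, so the defining module must be importable when loading).

Details:

# dumps
import pickle
data = ['aa', 'bb', 'cc']
# dumps converts the data into a bytes object in a format only Python understands
p_str = pickle.dumps(data)
print(p_str)
b'\x80\x03]q\x00(X\x02\x00\x00\x00aaq\x01X\x02\x00\x00\x00bbq\x02X\x02\x00\x00\x00ccq\x03e.'
# (output under pickle protocol 3; Python versions that default to a newer protocol print different bytes)

# loads
# loads converts pickle data into a Python data structure
mes = pickle.loads(p_str)
print(mes)
['aa', 'bb', 'cc']

# dump
# dump converts the data into the Python-only byte format and writes it to a file, which must be opened in binary mode
with open('D:/tmp.pk', 'wb') as f:
    pickle.dump(data, f)

# load
# load reads from a data file (opened in binary mode) and converts the contents into a Python data structure
with open('D:/tmp.pk', 'rb') as f:
    data = pickle.load(f)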
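To contrast with JSON, the sketch below pickles a structure containing a set, a tuple, bytes, and an integer key, none of which survive a JSON round trip unchanged. An in-memory io.BytesIO stands in for the tmp.pk file; the sample data is invented for illustration.

```python
import io
import pickle

data = {'ids': {1, 2, 3}, 'point': (1, 2), 'raw': b'\x00\x01', 1: 'int key'}

# Pickle data is bytes, so the file-like object must be binary.
buffer = io.BytesIO()
pickle.dump(data, buffer)

buffer.seek(0)
restored = pickle.load(buffer)

print(restored)
print(restored == data)         # equal contents
print(type(restored['point']))  # still a tuple
```

Only unpickle data from sources you trust: loading a pickle can execute arbitrary code.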

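IX. The shelve Module

Definition:

If all you need is to store some simple data, the shelve module is enough.

All you have to do is give it a file name. The most important function in shelve is open: called with a file name as its argument, it returns a shelf object that you can store things in.

Just treat it like an ordinary dictionary, except that the keys must be strings. When you are done, call its close method.

Details:

1. A potential pitfall:

It is important to realize that the object returned by shelve.open is not an ordinary mapping, as the following shows:

In [2]: s = shelve.open('test.db')
In [3]: s['x'] = ['a','b','c']
In [4]: s['x']
Out[4]: ['a', 'b', 'c']
In [5]: s['x'].append('d')
In [6]: s['x']
Out[6]: ['a', 'b', 'c']

(1) The list ['a', 'b', 'c'] is stored under the key x.

(2) The stored representation is retrieved and a new list is created from it; 'd' is appended to that copy, and the modified version is never saved.

(3) Finally, the original version is retrieved again, without 'd'.

To avoid this problem when modifying an object stored with shelve, bind a temporary variable to the retrieved copy, and store the copy again after modifying it.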
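The pitfall and the temporary-variable fix can be put side by side. This is a minimal sketch that writes its shelf into a temporary directory (an assumption made so the example leaves no test.db behind).

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.db')

with shelve.open(path) as s:
    s['x'] = ['a', 'b', 'c']

    s['x'].append('d')      # modifies a temporary copy, which is then lost
    lost = s['x']

    temp = s['x']           # bind the copy to a name
    temp.append('d')        # change it
    s['x'] = temp           # store it again
    kept = s['x']

print(lost)
print(kept)
```

shelve.open(path, writeback=True) is the other standard remedy: the shelf then caches every entry it hands out and writes them back on sync() or close(), at the cost of extra memory.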

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author is wspikh
import shelve

def store_person(db):
    pid = input("Enter unique ID number:")
    person = {}
    person['name'] = input('Enter name:')
    person['age'] = input('Enter age:')
    person['phone'] = input('Enter phone:')
    db[pid] = person

def lookup_person(db):
    pid = input("Enter ID numbers:")
    field = input("What would you like to know?(name,age,phone):")
    # capitalize() upper-cases the first letter of the string and lower-cases the rest
    print(field.capitalize()+":", db[pid][field])

def print_help():
    print("The avaliable commands are:")
    print("store: stores information about a person")
    print("lookup: looks up a person from ID number")
    print("quit: save changes and exit")
    print("?: prints this message")

def enter_command():
    cmd = input("Enter command(? for help):")
    cmd = cmd.strip().lower()
    return cmd

def main():
    database = shelve.open("test.db")
    try:
        while True:
            cmd = enter_command()
            if cmd == "store":
                store_person(database)
            elif cmd == "lookup":
                lookup_person(database)
            elif cmd == "?":
                print_help()
            elif cmd == "quit":
                return
            else:
                print("please sure input is right!")
                exit()
    finally:
        database.close()

if __name__ == "__main__": main()

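(1) Putting everything into functions makes the program more structured.

(2) The main program lives in main(), which is called only when the condition if __name__ == '__main__' holds. This means other programs can import this one as a module and then call main().

(3) The database is opened in main() and passed as an argument to the other functions that need it. In most cases it is best to avoid global variables.

(4) strip and lower are called on what is read to produce a normalized version. If you always apply strip and lower to user input, users are free to type in any letter case and add stray spaces.

(5) try/finally ensures the database is closed properly. We never know when something will go wrong, and if the program terminates without closing the database correctly, the database file may be corrupted.

The output of a run: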

/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 /Users/khwsp/PycharmProjects/a18/day5/shelve2.py
Enter command(? for help):?
The avaliable commands are:
store: stores information about a person
lookup: looks up a person from ID number
quit: save changes and exit
?: prints this message
Enter command(? for help):store
Enter unique ID number:789
Enter name:wspkh
Enter age:44
Enter phone:13877779999
Enter command(? for help):?
The avaliable commands are:
store: stores information about a person
lookup: looks up a person from ID number
quit: save changes and exit
?: prints this message
Enter command(? for help):lookup
Enter ID numbers:789
What would you like to know?(name,age,phone):age
Age: 44
Enter command(? for help):quit

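An ordinary dictionary could achieve the same effect, but shelve stores the dictionary in a database file on disk, so the data is not lost as long as the database file is not corrupted.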

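X. XML Processing

Definition:

XML is a protocol for exchanging data between different languages or programs. It stands for Extensible Markup Language, a subset of the Standard Generalized Markup Language (SGML): a markup language for marking up electronic documents to give them structure. XML distinguishes data structures through <> nodes.

Details:

There are two ways to parse XML:

1. Parsing from a string

2. Parsing from a file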
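The notes name the two parsing routes without showing them. A minimal sketch with the standard xml.etree.ElementTree module follows; the choice of that module and the sample document are assumptions made for illustration, and an io.StringIO stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Sample document invented for this illustration.
xml_text = """<data>
    <country name="Liechtenstein"><rank>2</rank></country>
    <country name="Singapore"><rank>5</rank></country>
</data>"""

# 1. Parse from a string: fromstring returns the root element directly.
root_from_string = ET.fromstring(xml_text)

# 2. Parse from a file: parse returns an ElementTree; getroot gives the root.
#    ET.parse also accepts a file name such as 'data.xml'.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

for country in root_from_string.iter('country'):
    print(country.get('name'), country.find('rank').text)
```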

XI. YAML Processing

(See the documentation.)

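XII. The configparser Module

Definition:

configparser parses configuration files in a specific format; most such files are named XXX.ini, and MySQL's configuration file is one example. In Python 3.x the module is named configparser.

Configuration file:

##### sample ini file ########

[section1]

name = wang

age = 18

[section2]

name:python

age = 19

#### notes on the file format #########

[XXX] marks a section

XX = XX or XX : XX is an option (key and value)

Details: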

import configparser # import the module

config = configparser.ConfigParser() # create the parser object

config.read("user.ini", encoding="utf-8") # read the configuration file; a missing file is silently skipped, not created

# Reading

secs = config.sections() # names of all sections
print(secs)
# ['section1', 'section2']

options = config.options('section1') # all keys of the given section
print(options)
# ['name', 'age']

item_list = config.items('section1') # key/value pairs of the given section
print(item_list)
# [('name', 'wang'), ('age', '18')]

val = config.get('section1', 'name') # value of the given key in the given section
print(val)
# wang

val = config.getint('section1', 'age') # age in section1 as an int; the value must be convertible to int, otherwise ValueError
print(val)
# 18

val = config.has_section('section1') # whether the section exists: True or False
print(val)
# True

val = config.has_option('section1', 'age') # whether the key exists in the section: True or False
print(val)
# True

# Adding, removing, changing

config.add_section("node") # add a section named node; at this point it exists only in memory

with open("user.ini", "w") as f:
    config.write(f) # write the added section to the configuration file

config.remove_section("node") # remove the section named node from memory

with open("user.ini", "w") as f:
    config.write(f) # write the contents without node back to the file

config.set("section1", "k1", "v1")
# Adds the pair k1 = v1 to an existing section; if the key already exists its value is changed; if the section does not exist an error is raised, e.g.
# configparser.NoSectionError: No section: 'section'

with open("user.ini", "w") as f:
    config.write(f)
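The calls above need a user.ini on disk. The sketch below exercises the same API entirely in memory with read_string and an io.StringIO, and adds the fallback argument of get, which the notes do not cover. The option name city is made up for illustration.

```python
import configparser
import io

ini_text = """
[section1]
name = wang
age = 18

[section2]
name: python
age = 19
"""

config = configparser.ConfigParser()
config.read_string(ini_text)          # parse from a string instead of a file

age = config.getint('section1', 'age')
# fallback is returned when the option (or section) is missing
city = config.get('section1', 'city', fallback='unknown')

config.set('section1', 'k1', 'v1')
out = io.StringIO()
config.write(out)                      # serialize back to INI text

print(age, city)
print(out.getvalue())
```

Note that write always emits the key = value form, so the name:python line comes back as name = python.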

XIII. The hashlib Module

Definition:

Used for hashing operations. It replaces the md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.

Python 3 dropped the md5 and sha modules, so only their hashlib equivalents are shown here. What is a digest algorithm? Also called a hash algorithm, it uses a function to turn data of any length into a fixed-length string (usually written in hexadecimal). A digest is one-way: it is not encryption, because the input cannot be recovered from it.

Details:

1. MD5

import hashlib

hash = hashlib.md5()
hash.update('admin'.encode('utf-8'))
print(hash.hexdigest())
21232f297a57a5a743894a0e4a801fc3

2. SHA1

hash = hashlib.sha1()
hash.update('admin'.encode('utf-8'))
print(hash.hexdigest())
d033e22ae348aeb5660fc2140aec35850c4da997

3. SHA256

hash = hashlib.sha256()
hash.update('admin'.encode('utf-8'))
print(hash.hexdigest())
8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918

4. SHA384

hash = hashlib.sha384()
hash.update('admin'.encode('utf-8'))
print(hash.hexdigest())
9ca694a90285c034432c9550421b7b9dbd5c0f4b6673f05f6dbce58052ba20e4248041956ee8c9a2ec9f10290cdc0782

5. SHA512

hash = hashlib.sha512()
hash.update('admin'.encode('utf-8'))
print(hash.hexdigest())
c7ad44cbad762a5da0a452f9e854fdc1e0e7a52a38015f23f3eab1d80b931dd472634dfac71cd34ebc35d16ab7fb8a90c81f975113d6c7538dc69dd8de9077ec

6. "Salted" hashing

Strong as these algorithms are, they have a weakness: digests of common inputs can be reversed by looking them up in precomputed tables. It is therefore necessary to mix a custom key (a salt) into the data before hashing.

###### md5 with a salt ############
hash = hashlib.md5('python'.encode('utf-8'))
hash.update('admin'.encode('utf-8'))
print(hash.hexdigest())
75b431c498b55557591f834af7856b9f

7. HMAC

hmac further processes the key and the message we supply internally and then hashes them.

import hmac
h = hmac.new('python'.encode('utf-8'), digestmod='md5') # digestmod is required since Python 3.8; md5 was the old default
h.update('helloworld'.encode('utf-8'))
print(h.hexdigest())
b3b867248bb4cace835b59562c39fd55
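For storing passwords, a single salted MD5 as in item 6 is generally considered too weak. The sketch below shows one common alternative from the standard library, hashlib.pbkdf2_hmac with a random salt, compared using hmac.compare_digest. It is an illustration under assumptions, not the author's method: hash_password and verify_password are hypothetical helpers, and the iteration count is an arbitrary example value.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) for a password (hypothetical helper)."""
    if salt is None:
        salt = os.urandom(16)   # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare (hypothetical helper)."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison

salt, stored = hash_password('admin')
print(verify_password('admin', salt, stored))
print(verify_password('guess', salt, stored))
```

The salt is stored next to the digest; it does not need to be secret, only unique per password.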

8. Getting a file's MD5

import hashlib
import os

def md5sum(filename):
    """
    Return the MD5 digest of a file.
    :param filename: file name
    :return: MD5 hex digest
    """
    if not os.path.isfile(filename): # return None if the path is not a regular file
        return
    myhash = hashlib.md5()
    with open(filename, 'rb') as f:
        while True:
            b = f.read(8096) # read in chunks so large files need not fit in memory
            if not b:
                break
            myhash.update(b)
    return myhash.hexdigest()

Example:

hmac_demo.py (do not name the script hmac.py, or import hmac would load the script itself instead of the standard module)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Author is wspikh
import hashlib
import hmac
h = hmac.new(b"wueiqi", digestmod="md5")
h.update(b"hellowo")
print(h.hexdigest())
h.update(b"wohello")
print(h.hexdigest())
# the final digest of h equals that of h2
h2 = hmac.new(b"wueiqi", digestmod="md5")
h2.update(b"hellowowohello")
print(h2.hexdigest())

# MD5 of a string
md5 = hashlib.md5('hello!world'.encode('utf-8')).hexdigest()
print(md5)

# MD5 of a file
file = '/tmp/sh/k.sh'
with open(file, 'rb') as md5file:
    md5 = hashlib.md5(md5file.read()).hexdigest()
print(md5)

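XIV. The subprocess Module

Definition:

subprocess was introduced in Python 2.4. It spawns child processes, connects to their input/output/error pipes, and obtains their return codes.

# subprocess replaces several older modules and functions

os.system

os.spawn*

os.popen*

popen2.*

commands.*

Whenever we run Python we create and run a process. On Linux a process can fork a child process and have that child exec another program. In Python, the standard library's subprocess package forks a child process and runs an external program in it. The package defines several functions that create child processes in different ways, so we can pick whichever suits the need. subprocess also provides tools for managing standard streams and pipes, which allow text-based communication between processes.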

Details:

1. call

Runs a command and returns its exit status. shell=True allows the command to be given as a single string.

subprocess.call(["ls", "-l"])
subprocess.call("exit 1", shell=True)

2. check_call

Runs a command; returns 0 if the exit status is 0, otherwise raises an exception (CalledProcessError).

subprocess.check_call(["ls", "-l"])
subprocess.check_call("exit 1", shell=True)

3. check_output

Runs a command; returns its output if the exit status is 0, otherwise raises an exception.

subprocess.check_output(["echo", "Hello World!"])
subprocess.check_output("exit 1", shell=True)

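4. subprocess.Popen(...)

Used to run more complex system commands.

Parameters:

args: a string or a sequence (such as a list or tuple). By default the program to execute is the first item of the sequence; if args is a single string, its interpretation is platform-dependent. On Unix, a string args is taken as the name or path of the program to execute, which only works for programs that need no arguments.
bufsize: the buffering. 0 means unbuffered, 1 means line-buffered, any other positive integer is a buffer size, and a negative value means the system default, which usually means fully buffered. The default was 0 (unbuffered) in Python 2 and is -1 in Python 3.
stdin, stdout, stderr: the program's standard input, output, and error handles
preexec_fn: Unix only; a callable that is invoked in the child process just before the program is executed
close_fds: on Windows, if close_fds is true, the child process does not inherit the parent's input, output, and error pipes. In older Python versions this meant close_fds could not be true while the child's standard input, output, or error was also being redirected.
shell: defaults to False; says whether the program is run through a shell. With shell=True, args is treated as one string rather than a sequence. On Unix, shell=True uses /bin/sh by default.
cwd: sets the child's current directory. If it is not None, the child's current directory is changed to cwd before the program runs. The source notes that this directory is not added to the executable search path, so the program's path should not be given relative to cwd.
env: the child's environment variables. If env=None, the child inherits the parent's environment. Otherwise it is a mapping that defines the new process's environment, replacing the current one.
universal_newlines: line endings differ between systems; when true, the stdout and stderr file objects are opened in text mode and every line ending is read as '\n'
startupinfo and creationflags: Windows only. They are passed to the underlying CreateProcess() function to set properties of the child process, such as the appearance of the main window and the process priority.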

Running ordinary commands:

import subprocess

ret1 = subprocess.Popen(["mkdir","t1"])

ret2 = subprocess.Popen("mkdir t2", shell=True)

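Commands typed at a terminal fall into two kinds:

Those that produce output as soon as they are entered, e.g. ifconfig
Those that enter an environment and depend on further input, e.g. python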

import subprocess

obj = subprocess.Popen("mkdir t3", shell=True, cwd='/home/dev',)

import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

obj.stdin.write("print(1)\n")

obj.stdin.write("print(2)")

obj.stdin.close()

cmd_out = obj.stdout.read()

obj.stdout.close()

cmd_error = obj.stderr.read()

obj.stderr.close()

print(cmd_out)

print(cmd_error)

import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

obj.stdin.write("print(1)\n")

obj.stdin.write("print(2)")

out_error_list = obj.communicate()

print(out_error_list)

import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

out_error_list = obj.communicate('print("hello")')

print(out_error_list)
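Since Python 3.5, subprocess.run covers most of what call, check_call, and check_output do, and wraps the Popen/communicate pattern shown above. A minimal sketch follows; it launches sys.executable (the running interpreter) so that no python command on PATH is assumed.

```python
import subprocess
import sys

# Run a child interpreter and capture its output as text.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,   # decode the pipes as text
)

print(result.returncode)
print(result.stdout.strip())

# check=True raises CalledProcessError on a non-zero exit status,
# which is the behaviour check_call provides.
try:
    subprocess.run([sys.executable, "-c", "raise SystemExit(1)"], check=True)
    failed = False
except subprocess.CalledProcessError as exc:
    failed = exc.returncode == 1
print(failed)
```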

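XV. The logging Module

Definition:

The module for emitting log messages.

Details:

1. Simply printing logs to the screen

import logging

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')

Printed on screen:

WARNING:root:This is warning message

By default, logging prints to the screen (standard error) at level WARNING. The levels are ordered CRITICAL > ERROR > WARNING > INFO > DEBUG > NOTSET, and you can also define levels of your own.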

2. Configuring the log output format and destination with logging.basicConfig

import logging

logging.basicConfig(level=logging.DEBUG,

                format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',

                datefmt='%a, %d %b %Y %H:%M:%S',

                filename='myapp.log',

                filemode='w')

logging.debug('This is debug message')

logging.info('This is info message')

logging.warning('This is warning message')

Contents of ./myapp.log:

Sun, 24 May 2009 21:48:54 demo2.py[line:11] DEBUG This is debug message

Sun, 24 May 2009 21:48:54 demo2.py[line:12] INFO This is info message

Sun, 24 May 2009 21:48:54 demo2.py[line:13] WARNING This is warning message

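Parameters of logging.basicConfig:
filename: the name of the log file
filemode: the mode used to open the log file, with the same meaning as in open(): 'w' or 'a'
format: the format and content of the output; format can include much useful information, as the example above shows:
%(levelno)s: the numeric value of the log level
%(levelname)s: the name of the log level
%(pathname)s: the full path of the source file that issued the logging call
%(filename)s: the file name part of that path
%(funcName)s: the function that issued the logging call
%(lineno)d: the line number of the logging call
%(asctime)s: the time of the log record
%(thread)d: the thread ID
%(threadName)s: the thread name
%(process)d: the process ID
%(message)s: the log message
datefmt: the time format, as used by time.strftime()
level: the log level, WARNING by default
stream: the output stream for the log; it can be sys.stderr, sys.stdout, or a file, and defaults to sys.stderr. stream and filename should not be given together: Python 2 ignored stream in that case, and Python 3 raises ValueError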

3. Sending logs to both a file and the screen

import logging

logging.basicConfig(level=logging.DEBUG,
                format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                datefmt='%a, %d %b %Y %H:%M:%S',
                filename='myapp.log',
                filemode='w')

#################################################################################################
# Define a StreamHandler that prints messages of level INFO or higher to standard error,
# and add it to the root logger
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)
#################################################################################################

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')

Printed on screen:

root        : INFO     This is info message

root        : WARNING  This is warning message

Contents of ./myapp.log:

Sun, 24 May 2009 21:48:54 demo2.py[line:11] DEBUG This is debug message

Sun, 24 May 2009 21:48:54 demo2.py[line:12] INFO This is info message

Sun, 24 May 2009 21:48:54 demo2.py[line:13] WARNING This is warning message

4. Log rotation with logging

import logging
from logging.handlers import RotatingFileHandler

#################################################################################################
# Define a RotatingFileHandler that keeps at most 5 backup log files of at most 10 MB each
Rthandler = RotatingFileHandler('myapp.log', maxBytes=10*1024*1024, backupCount=5)
Rthandler.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
Rthandler.setFormatter(formatter)
logging.getLogger('').addHandler(Rthandler)
################################################################################################

As this example and the previous one show, logging has one main object that handles logs, and every other way of handling them is attached to it with addHandler.

The handlers logging offers are:

logging.StreamHandler: writes logs to a stream, which can be sys.stderr, sys.stdout, or a file
logging.FileHandler: writes logs to a file

Log rotation; in practice use RotatingFileHandler and TimedRotatingFileHandler
logging.handlers.BaseRotatingHandler
logging.handlers.RotatingFileHandler
logging.handlers.TimedRotatingFileHandler

logging.handlers.SocketHandler: sends logs to remote TCP/IP sockets
logging.handlers.DatagramHandler: sends logs to remote UDP sockets
logging.handlers.SMTPHandler: sends logs to an email address
logging.handlers.SysLogHandler: writes logs to syslog
logging.handlers.NTEventLogHandler: writes logs to the Windows NT/2000/XP event log
logging.handlers.MemoryHandler: writes logs to a specified buffer in memory
logging.handlers.HTTPHandler: sends logs to an HTTP server using "GET" or "POST"

Because StreamHandler and FileHandler are the most common, they live directly in the logging module; the others live in logging.handlers. For how to use them, see the Python library manual (the source cites the 2.5 edition).
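The interplay of logger level and handler level from the examples above can be observed without touching the root logger or the screen. In this sketch an io.StringIO stands in for the console, and the logger name example_sketch is made up for illustration.

```python
import io
import logging

stream = io.StringIO()                      # stands in for the console
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)              # the handler drops DEBUG records
handler.setFormatter(logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s'))

logger = logging.getLogger('example_sketch')  # hypothetical logger name
logger.setLevel(logging.DEBUG)              # the logger itself lets everything through
logger.propagate = False                    # keep records away from the root logger
logger.addHandler(handler)

logger.debug('This is debug message')
logger.info('This is info message')
logger.warning('This is warning message')

lines = stream.getvalue().splitlines()
print(lines)
```

A record must pass both the logger's level and the handler's level, which is why the DEBUG message never reaches the stream here.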

5. Configuring logging through the logging.config module

#logger.conf

###############################################

[loggers]

keys=root,example01,example02

[logger_root]

level=DEBUG

handlers=hand01,hand02

[logger_example01]

handlers=hand01,hand02

qualname=example01

propagate=0

[logger_example02]

handlers=hand01,hand03

qualname=example02

propagate=0

###############################################

[handlers]

keys=hand01,hand02,hand03

[handler_hand01]

class=StreamHandler

level=INFO

formatter=form02

args=(sys.stderr,)

[handler_hand02]

class=FileHandler

level=DEBUG

formatter=form01

args=('myapp.log', 'a')

[handler_hand03]

class=handlers.RotatingFileHandler

level=INFO

formatter=form02

args=('myapp.log', 'a', 10*1024*1024, 5)

###############################################

[formatters]

keys=form01,form02

[formatter_form01]

format=%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s

datefmt=%a, %d %b %Y %H:%M:%S

[formatter_form02]

format=%(name)-12s: %(levelname)-8s %(message)s

datefmt=

The equivalent of example 3 above:

import logging

import logging.config

logging.config.fileConfig("logger.conf")

logger = logging.getLogger("example01")

logger.debug('This is debug message')

logger.info('This is info message')

logger.warning('This is warning message')

The equivalent of example 4 above:

import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger = logging.getLogger("example02")

logger.debug('This is debug message')
logger.info('This is info message')

logger.warning('This is warning message')

6. logging is thread-safe

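XVI. The re Regular-Expression Module

Definition:

Python's re module (Regular Expression) provides regular-expression matching operations. It is a very useful tool for text parsing, complex string analysis, and information extraction.

Common syntax:

Syntax	Meaning	Notes
"."	any character (except a newline by default)
"^"	start of string	'^hello' matches 'helloworld' but not 'aaaahellobbb'
"$"	end of string	same idea as above
"*"	0 or more repetitions of the preceding item (greedy)	<.*> matches all of <title>chinaunix</title>
"+"	1 or more repetitions of the preceding item (greedy)	same idea as above
"?"	0 or 1 repetition of the preceding item (greedy)	same idea as above
*?,+?,??	non-greedy forms of the three above: match as little as possible	<.*?> matches <title>
{m,n}	repeat the preceding item m to n times; {m} also works	a{6} matches 6 a's; a{2,4} matches 2 to 4 a's
{m,n}?	repeat the preceding item m to n times, as few as possible	in 'aaaaaa', a{2,4}? matches only 2
"\\"	escapes a special character or introduces a special sequence
[]	a character set	[0-9], [a-z], [A-Z], [^0]
"|"	or	A|B matches A or B
(...)	groups the expression inside the parentheses
(?#...)	a comment; ignored
(?=...)	Matches if ... matches next, but doesn't consume the string.	'hello(?=test)' matches hello in hellotest
(?!...)	Matches if ... doesn't match next.	'hello(?!test)' matches hello only when it is not followed by test
(?<=...)	Matches if preceded by ... (must be fixed length).	'(?<=hello)test' matches test in hellotest
(?<!...)	Matches if not preceded by ... (must be fixed length).	'(?<!hello)test' does not match test in hellotest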

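The special sequences are:

Sequence	Meaning
\A	matches only at the start of the string
\Z	matches only at the end of the string
\b	matches the empty string at the start or end of a word
\B	matches the empty string not at the start or end of a word
\d	equivalent to [0-9]
\D	equivalent to [^0-9]
\s	any whitespace character: [ \t\n\r\f\v]
\S	any non-whitespace character: [^ \t\n\r\f\v]
\w	any letter, digit, or underscore: [a-zA-Z0-9_]
\W	any character other than a letter, digit, or underscore: [^a-zA-Z0-9_]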

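Details:

The commonly used functions are compile, search, match, split, findall (finditer), and sub (subn).

compile
re.compile(pattern[, flags])
Purpose: turns a regular-expression string into a regular-expression object
The flags include:
re.I: ignore case
re.L: makes the special sequences \w, \W, \b, \B, \s, \S depend on the current locale
re.M: multi-line mode
re.S: makes '.' match any character including a newline (without this flag, '.' does not match a newline)
re.U: makes the special sequences \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
More usage can be found at http://www.devexception.com/sitemap_index.xml
search
re.search(pattern, string[, flags])
search (string[, pos[, endpos]])
Purpose: looks for a position in the string where the regular expression matches and returns a MatchObject instance, or None if no position matches.

match
re.match(pattern, string[, flags])
match(string[, pos[, endpos]])
Purpose: match() tries the regular expression only at the beginning of the string, so it reports only matches starting at position 0, whereas search() scans the whole string for a match. To look for a match anywhere in the string, use search().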

Here are a few examples:
Example: the most basic usage, calling through a compiled pattern object

#!/usr/bin/env python
import re
r1 = re.compile(r'world')
if r1.match('helloworld'):
    print('match succeeds')
else:
    print('match fails')
if r1.search('helloworld'):
    print('search succeeds')
else:
    print('search fails')

A note: r stands for raw. Ordinary strings use escape sequences, such as '\n' for a newline, so a literal backslash has to be written as '\\'. To express a backslash followed by n without the r prefix you write '\\n'; with it you write r'\n', which is much clearer.

Example: setting a flag

#r2 = re.compile(r'\n$', re.S)
#r2 = re.compile('\n$', re.S)
r2 = re.compile('World$', re.I)
if r2.search('helloworld\n'):
    print('search succeeds')
else:
    print('search fails')

Example: calling the module-level function directly

if re.search(r'abc', 'helloaaabcdworld\n'):
    print('search succeeds')
else:
    print('search fails')

split
re.split(pattern, string[, maxsplit=0, flags=0])
split(string[, maxsplit=0])
Purpose: splits the string at every part that matches the regular expression and returns a list
Example: a simple breakdown of an IP address

#!/usr/bin/env python
import re
r1 = re.compile(r'\W+')
print(r1.split('192.168.1.1'))
print(re.split(r'(\W+)', '192.168.1.1'))
print(re.split(r'(\W+)', '192.168.1.1', 1))

The result:
['192', '168', '1', '1']
['192', '.', '168', '.', '1', '.', '1']
['192', '.', '168.1.1']

findall
re.findall(pattern, string[, flags])
findall(string[, pos[, endpos]])
Purpose: finds every substring of the string that the regular expression matches and returns them as a list
Example: finding text enclosed in [] (greedy and non-greedy)

#!/usr/bin/env python
import re
r1 = re.compile(r'(\[.*\])')
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
r1 = re.compile(r'(\[.*?\])')
print(re.findall(r1, "hello[hi]heldfsdsf[iwonder]lo"))
print(re.findall('[0-9]{2}', "fdskfj1323jfkdj"))
print(re.findall('([0-9][a-z])', "fdskfj1323jfkdj"))
print(re.findall('(?=www)', "afdsfwwwfkdjfsdfsdwww"))
print(re.findall('(?<=www)', "afdsfwwwfkdjfsdfsdwww"))

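finditer
re.finditer(pattern, string[, flags])
finditer(string[, pos[, endpos]])
Note: like findall, it finds every substring of the string that the regular expression matches, but returns them as an iterator of match objects. The second form above is the corresponding method on a compiled pattern object.

sub
re.sub(pattern, repl, string[, count, flags])
sub(repl, string[, count=0])
Note: finds every substring of string that matches the regular expression pattern and replaces it with another string, repl. If nothing matches pattern, string is returned unchanged. repl can be either a string or a function.
Example:

#!/usr/bin/env python
import re
p = re.compile('(one|two|three)')
print(p.sub('num', 'one word two words three words apple', 2))

subn
re.subn(pattern, repl, string[, count, flags])
subn(repl, string[, count=0])

Note: this function does the same as sub(), but returns a tuple of the new string and the number of replacements made.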
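The greedy versus non-greedy behaviour of findall and the difference between sub and subn can be seen together in a short run. This sketch reuses the sample strings from the examples above.

```python
import re

text = "hello[hi]heldfsdsf[iwonder]lo"

greedy = re.findall(r'\[.*\]', text)    # one match from the first '[' to the last ']'
lazy = re.findall(r'\[.*?\]', text)     # each bracketed group separately

p = re.compile(r'(one|two|three)')
sentence = 'one word two words three words apple'
replaced = p.sub('num', sentence, 2)        # replace at most 2 matches
result, count = p.subn('num', sentence)     # replace all matches and count them

print(greedy)
print(lazy)
print(replaced)
print(result, count)
```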