1. Set types: definition and operations:

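A set is written with braces {}, with elements separated by commas. Its elements are unordered and unique, and each element must be immutable (hashable). Because {} on its own creates an empty dictionary, an empty set must be created with set().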

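Set operators:

S | T: union (elements in S or T)

S - T: difference (elements in S but not in T)

S & T: intersection (elements in both S and T)

S ^ T: symmetric difference, called "complement" in the course (elements in exactly one of S and T)

S <= T, S < T: test whether S is a subset (proper subset) of T

S >= T, S > T: test whether S is a superset (proper superset) of T

S |= T: update S to S | T

S -= T: update S to S - T

S &= T: update S to S & T

S ^= T: update S to S ^ T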
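A minimal sketch of the operators above. The example sets are arbitrary values chosen for illustration; set comparisons are used instead of printed output because sets have no fixed display order.

```python
A = {"p", "y", 123}
B = set("pypy123")        # characters of the string: {'p', 'y', '1', '2', '3'}

union = A | B             # {'p', 'y', 123, '1', '2', '3'}
difference = A - B        # {123}
intersection = A & B      # {'p', 'y'}
sym_diff = A ^ B          # {123, '1', '2', '3'}

print({"p", "y"} <= A)    # True: subset
print(A < A)              # False: a set is not a proper subset of itself
print(A >= {123})         # True: superset

# Augmented operators update the left-hand set in place
C = {1, 2, 3}
C |= {4}                  # {1, 2, 3, 4}
C -= {1}                  # {2, 3, 4}
C &= {2, 3, 9}            # {2, 3}
C ^= {3, 5}               # {2, 5}
print(C == {2, 5})        # True
```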

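Set methods:

S.add(x): add element x to S (no effect if x is already present)

S.discard(x): remove element x from S; no error if x is not in S

S.remove(x): remove element x from S; raises KeyError if x is not in S

S.clear(): remove all elements from S

S.pop(): remove and return an arbitrary element of S (not guaranteed to be random); raises KeyError if S is empty

S.copy(): return a shallow copy of S

len(S): return the number of elements in S

x in S: True if x is an element of S, otherwise False

x not in S: True if x is not an element of S, otherwise False

set(x): convert x (any iterable, such as a string or a list) into a set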
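A small sketch of the difference between discard() and remove(), and of draining a set with pop(). The element values are arbitrary.

```python
S = {"a", "b"}
S.add("c")              # S is now {"a", "b", "c"}
S.discard("zzz")        # absent element: silently ignored
try:
    S.remove("zzz")     # absent element: raises KeyError
except KeyError:
    print("remove() raised KeyError")

# Empty the set with pop(); the order in which elements come out is not guaranteed
drained = []
while S:
    drained.append(S.pop())
print(len(drained), S)  # 3 set()

try:
    S.pop()
except KeyError:
    print("pop() on an empty set raised KeyError")
```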

Application scenarios for sets:

Comparing containment relations

Removing duplicate elements
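A sketch of both scenarios on an arbitrary list. Converting to a set removes duplicates but loses the original order; if first-seen order matters, dict.fromkeys() keeps insertion order (Python 3.7+).

```python
ls = ["p", "p", "y", "y", 123]

unique = set(ls)                        # {'p', 'y', 123}: duplicates removed, order lost
back_to_list = list(unique)             # a list again, in arbitrary order

ordered_unique = list(dict.fromkeys(ls))
print(ordered_unique)                   # ['p', 'y', 123]

# Containment comparison
print("p" in unique, {"p", "y"} <= unique)   # True True
```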

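2. Sequence types and their operations:

Definition: a sequence is a one-dimensional vector of elements, and the elements may be of different types.

Examples include strings, as well as the tuples and lists described below.

Sequence operators: x in s, x not in s, s + t (concatenation), s * n (repeat n times), s[i] (indexing), s[i:j:k] (slicing with step k)

Functions and methods: len(s), min(s), max(s); s.index(x) or s.index(x, i, j) returns the position of the first occurrence of x in s, searching from index i up to (but not including) index j; s.count(x) returns the total number of occurrences of x in s.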
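A short sketch of these operators and functions on a string and a tuple; the sample values are arbitrary.

```python
s = "python123"        # a string is a sequence
t = ("p", "y", 1, 2)   # so is a tuple, here with mixed element types

print("y" in s, s[0], s[-1])          # True p 3
print(s[::2])                         # every second character: pto13
print(t + ("x",), t * 2)              # concatenation and repetition
print(len(s), min("cab"), max("cab")) # 9 a c
print(s.index("1"))                   # 6
print("banana".index("a", 2, 6))      # first 'a' in positions 2..5: 3
print("banana".count("a"))            # 3
```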

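Tuple type: a tuple cannot be modified once it is created; create one with parentheses () or tuple().

List type: a list can be modified freely; create one with square brackets [] or list().

ls[i] = x: replace the i-th element of ls with x

ls[i:j:k] = lt: replace the elements of the slice with the elements of list lt (when a step k other than 1 is given, lt must have the same length as the slice)

del ls[i], del ls[i:j:k]: delete the i-th element, or the elements of the slice

ls += lt: append the elements of lt to ls (ls is updated in place)

ls *= n: update ls so that its elements are repeated n times

ls.append(x): add element x at the end of ls

ls.clear(): remove all elements from ls

ls.copy(): return a new list containing all elements of ls (a shallow copy)

ls.insert(i, x): insert element x at position i

ls.pop(i): remove the element at position i and return it (the last element if i is omitted)

ls.remove(x): remove the first occurrence of x from ls

ls.reverse(): reverse the order of the elements of ls in place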
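A sketch that walks one list through the operations above; the starting values are arbitrary. The last lines show why ls.copy() matters: plain assignment only creates a second name for the same list.

```python
ls = ["cat", "dog", "tiger", 1024]
ls[1:2] = [1, 2, 3, 4]   # replace the one-element slice ['dog'] with four elements
print(ls)                # ['cat', 1, 2, 3, 4, 'tiger', 1024]
del ls[::3]              # delete the elements at indices 0, 3 and 6
print(ls)                # [1, 2, 4, 'tiger']
ls *= 2                  # repeat in place: [1, 2, 4, 'tiger', 1, 2, 4, 'tiger']
ls.append(8)             # add 8 at the end
ls.insert(0, "first")    # insert "first" at position 0
last = ls.pop()          # remove and return the last element, 8
ls.remove(2)             # remove only the first occurrence of 2
ls.reverse()             # reverse in place
print(ls)                # ['tiger', 4, 2, 1, 'tiger', 4, 1, 'first']

alias = ls               # a second name for the same list object
snapshot = ls.copy()     # an independent (shallow) copy
ls.clear()               # empties the list seen through both ls and alias
print(alias, snapshot)   # [] ['tiger', 4, 2, 1, 'tiger', 4, 1, 'first']
```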

3. Application scenarios for sequence types:

Iterating over elements: for item in ls: ...

4. Sorting with sorted()
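A short sketch of how sorted() differs from the list method sort(): sorted() returns a new list and leaves its argument unchanged, while sort() reorders the list in place and returns None. The median() function in the statistics example relies on this distinction.

```python
numbers = [3, 1, 2]
result = sorted(numbers)   # a new, sorted list
print(result, numbers)     # [1, 2, 3] [3, 1, 2]

print(numbers.sort())      # None: sort() works in place
print(numbers)             # [1, 2, 3]

words = ["bb", "a", "ccc"]
print(sorted(words, key=len, reverse=True))  # ['ccc', 'bb', 'a']
```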

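Example: basic statistics

```python
#CalStatisticsV1.py
def getNum():
    """Read numbers from the keyboard until an empty line is entered."""
    nums = []
    iNumStr = input("Enter a number (press Enter to finish): ")
    while iNumStr != "":
        nums.append(float(iNumStr))  # float() rather than eval(): only numeric input is accepted
        iNumStr = input("Enter a number (press Enter to finish): ")
    return nums

def mean(numbers):
    s = 0.0
    for num in numbers:
        s = s + num
    return s / len(numbers)

def dev(numbers, mean):
    """Sample standard deviation (divides by n - 1)."""
    sdev = 0.0
    for num in numbers:
        sdev = sdev + (num - mean)**2
    return pow(sdev / (len(numbers) - 1), 0.5)

def median(numbers):
    numbers = sorted(numbers)  # sorted() returns a new list, so keep the result
    size = len(numbers)
    if size % 2 == 0:
        med = (numbers[size//2 - 1] + numbers[size//2]) / 2
    else:
        med = numbers[size//2]
    return med

n = getNum()
m = mean(n)
print("Mean: {}, standard deviation: {:.2f}, median: {}".format(m, dev(n, m), median(n)))
```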
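One way to sanity-check CalStatisticsV1.py is to compare it with Python's standard statistics module on a fixed, deliberately unsorted list (the data values are arbitrary). statistics.stdev uses the same n - 1 denominator as dev(), and statistics.median sorts its input itself. The helper sample_dev is a hypothetical compact version of dev(), not part of the course code.

```python
import statistics

data = [7.0, 1.0, 4.0, 2.0]   # arbitrary sample, deliberately unsorted

def sample_dev(numbers):
    """Hypothetical helper mirroring dev(): sample standard deviation."""
    m = sum(numbers) / len(numbers)
    return (sum((x - m) ** 2 for x in numbers) / (len(numbers) - 1)) ** 0.5

print(statistics.mean(data))             # 3.5
print(statistics.median(data))           # 3.0: middle values 2.0 and 4.0 after sorting
print(round(statistics.stdev(data), 4))  # 2.6458
print(abs(sample_dev(data) - statistics.stdev(data)) < 1e-9)  # True

# Taking the middle elements of the UNSORTED list gives the wrong answer
naive = (data[1] + data[2]) / 2
print(naive)                             # 2.5
```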

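III. Dictionary types

A mapping is a correspondence between keys and values.

Dictionaries are created with braces {} or dict(); each key-value pair is written with a colon, as in key: value.

>>> d = {"C":"B","M":"H","F":"B"}
>>> d
{'C': 'B', 'M': 'H', 'F': 'B'}
>>> d["C"]
'B'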

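del d[k]: delete key k and its value from dictionary d

k in d: test whether key k is in dictionary d

d.keys(): return all keys of d

d.values(): return all values of d

d.items(): return all key-value pairs of d

These three methods return view objects (dict_keys, dict_values, dict_items) rather than lists; wrap them in list() when a list is needed.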

>>> d = {"中国":"北京","美国":"华盛顿","法国":"巴黎"}
>>> "中国" in d
True
>>> d.keys()
dict_keys(['中国', '美国', '法国'])
>>> d.values()
dict_values(['北京', '华盛顿', '巴黎'])
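A small sketch of del, in, and iterating with items(), using the same capitals with English names. It also shows that a keys view is live: it reflects later changes to the dictionary.

```python
d = {"China": "Beijing", "USA": "Washington", "France": "Paris"}
keys = d.keys()          # a view of the keys, not a copy
del d["USA"]             # delete a key together with its value
print("USA" in d)        # False
print(list(keys))        # ['China', 'France']: the view reflects the deletion
for k, v in d.items():   # iterate over (key, value) pairs
    print(k, v)
```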

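d.get(k, <default>): if key k exists, return its value; otherwise return <default>

d.pop(k, <default>): if key k exists, remove it and return its value; otherwise return <default> (without a default, a missing key raises KeyError)

d.popitem(): remove a key-value pair from d and return it as a tuple; since Python 3.7 this is the most recently inserted pair (last in, first out), and an empty dictionary raises KeyError

d.clear(): delete all key-value pairs

len(d): return the number of key-value pairs in d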

>>> d.get("中国","伊斯兰堡")
'北京'
>>> d.get("日本","伊斯兰堡")
'伊斯兰堡'
>>> d.popitem()
('法国', '巴黎')

Define an empty dictionary: d = {}

Add two key-value pairs to d: d["a"] = 1; d["b"] = 2
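A minimal sketch that builds a dictionary from empty and then exercises get(), pop() and popitem(); the keys and values are arbitrary.

```python
d = {}          # empty dictionary
d["a"] = 1      # add key-value pairs by assignment
d["b"] = 2
d["c"] = 3

print(d.get("a", 0), d.get("z", 0))  # 1 0
print(d.pop("b", None))              # 2, and "b" is removed
print(d.pop("z", None))              # None: missing key, so the default is returned
print(d.popitem())                   # ('c', 3): the last inserted pair (Python 3.7+)
print(d, len(d))                     # {'a': 1} 1
```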

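Application scenarios for dictionaries: expressing mappings, for example counting how often each item occurs, with the item as the key and its count as the value (the word-frequency examples below work this way).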

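IV. The jieba library

jieba is a third-party library for Chinese word segmentation, so it has to be installed with pip before use.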

Install it from the command line with pip install jieba. In the author's session (Windows 10, build 10.0.17134.648; 32-bit Python 3.7; pip 18.1), the first attempt failed partway through downloading jieba-0.39.zip (7.3MB) from files.pythonhosted.org. The long traceback ended with a network timeout:

pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', ...): Read timed out.

pip also reported that a newer version of itself was available. The author upgraded it with python -m pip install --upgrade pip (pip 18.1 was replaced by pip 19.0.3) and then ran pip install jieba again, which this time succeeded:

C:\Users\ASUS>pip install jieba
Collecting jieba
Downloading https://files.pythonhosted.org/packages/71/46/c6f9179f73b818d5827202ad1c4a94e371a29473b7f043b736b4dab6b8cd/jieba-0.39.zip (7.3MB)
Installing collected packages: jieba
Running setup.py install for jieba ... done
Successfully installed jieba-0.39

Since a read timeout is a network problem, simply retrying the install (or giving pip a longer --timeout) may be enough.

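Precise mode: splits the text into words exactly, with no redundant words.

Full mode: lists every word that can be formed from the text, so the result contains redundant words.

Search-engine mode: based on precise mode, but long words are split again.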

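Functions:

jieba.lcut(s): precise mode; returns the segmentation result as a list

jieba.lcut(s, cut_all=True): full mode; returns the segmentation result as a list that contains redundant words

jieba.lcut_for_search(s): search-engine mode; returns the segmentation result as a list that contains redundant words

jieba.add_word(w): add the new word w to the segmentation dictionary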

>>> import jieba
>>> jieba.lcut("中国是一个伟大的国家")
Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\ASUS\AppData\Local\Temp\jieba.cache
Loading model cost 1.139 seconds.
Prefix dict has been built succesfully.
['中国', '是', '一个', '伟大', '的', '国家']
>>> jieba.lcut("中国是一个伟大的国家")
['中国', '是', '一个', '伟大', '的', '国家']
>>> jieba.lcut("中国是一个伟大的国家",cut_all=True)
['中国', '国是', '一个', '伟大', '的', '国家']
>>> jieba.lcut_for_search("中华人民共和国是伟大的")
['中华', '华人', '人民', '共和', '共和国', '中华人民共和国', '是', '伟大', '的']

V. Example: text word-frequency counting

```python
#CalHamletV1.py
def getText():
    with open("hamlet.txt", "r") as f:   # read the text
        txt = f.read()
    txt = txt.lower()
    # replace punctuation with spaces so that split() separates the words cleanly
    for ch in '~!@#$%^&*()_+{}:"<>?[];,./-=':
        txt = txt.replace(ch, " ")
    return txt

hamletTxt = getText()
words = hamletTxt.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1     # count word frequencies
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)   # sort by count, descending
for word, count in items[:10]:                 # slicing avoids IndexError on short texts
    print("{0:<10}{1:>5}".format(word, count))
```
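For comparison, a sketch of the same counting step with the standard library's collections.Counter, applied to a short made-up sample string instead of hamlet.txt. Counter.most_common(n) returns the n most frequent words sorted by count, with ties kept in the order the words were first encountered.

```python
from collections import Counter

text = "To be, or not to be: that is the question."   # sample text for illustration
txt = text.lower()
for ch in '~!@#$%^&*()_+{}:"<>?[];,./-=':
    txt = txt.replace(ch, " ")

counts = Counter(txt.split())    # word -> number of occurrences
print(counts.most_common(2))     # [('to', 2), ('be', 2)]
```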
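Chinese text has no spaces between words, so the Three Kingdoms version segments the text with jieba.lcut() and skips single-character tokens:

```python
#CalThreeKingdomsV1.py
import jieba

with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)          # precise-mode segmentation
counts = {}
for word in words:
    if len(word) == 1:           # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))
```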
If you run into problems running the code, just leave a comment.
```python
#CalThreeKingdomsV2.py
import jieba

# frequent words that are not character names
excludes = {"将军","却说","荆州","二人","不可","不能","如此","商议","如何","军士"}
with open("threekingdoms.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":   # merge aliases of the same character
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相曰" or word == "丞相" or word == "主公":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)   # unlike del, no KeyError if the word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:15]:
    print("{0:<10}{1:>5}".format(word, count))
```
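The if/elif chain in CalThreeKingdomsV2.py maps several aliases onto one name. A sketch of the same idea with a lookup dictionary, run on a hypothetical, already-segmented token list so that it needs neither jieba nor the novel's text file. The excludes set here is a shortened version, and count_names is an illustrative helper name, not part of the course code.

```python
# Alias table built from the if/elif chain in CalThreeKingdomsV2.py
aliases = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相曰": "曹操", "丞相": "曹操", "主公": "曹操",
}
excludes = {"将军", "却说"}   # shortened list for the example

def count_names(tokens):
    """Hypothetical helper: count tokens, merging aliases and skipping excluded words."""
    counts = {}
    for word in tokens:
        if len(word) == 1 or word in excludes:
            continue
        name = aliases.get(word, word)   # fall back to the word itself
        counts[name] = counts.get(name, 0) + 1
    return counts

# Hypothetical tokens standing in for jieba.lcut(txt)
tokens = ["却说", "玄德", "与", "云长", "关公", "见", "孔明", "诸葛亮"]
result = count_names(tokens)
# result == {"刘备": 1, "关羽": 2, "孔明": 2}
```

Filtering excluded words before counting, rather than deleting them afterwards, avoids the KeyError that del raises for words that never occur.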

These notes cover Chapter 6 of the Python course by Song Tian et al. (Beijing Institute of Technology, national quality course).
