collections: high-performance container datatypes

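I have recently become interested in machine learning algorithms, and I have long known that Python's collections module provides container classes that go a step beyond built-ins such as dict and list. I took some time to study it in preparation for upcoming work.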

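The main reference is the official documentation at https://docs.python.org/2/library/collections.html#defaultdict-objects. These notes are my own translation and summary, written with limited English, so corrections are welcome. If you would like to repost this, please credit @瓜棚.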

The collections module mainly provides five data structures:

#############################################################################################
Type          *  Description                                              *  Added in
Counter       *  dict subclass for counting hashable objects              *  Python 2.7
deque         *  double-ended queue; fast append/pop at both ends         *  Python 2.4
namedtuple    *  factory for tuple subclasses with named fields           *  Python 2.6
OrderedDict   *  dict subclass that remembers key insertion order         *  Python 2.7
defaultdict   *  dict subclass; missing keys get a default, no KeyError   *  Python 2.5
#############################################################################################

The Counter class

example:

from collections import Counter

cnt = Counter()

for word in ['1', '1', '1', '2', '2', '3']:

    cnt[word] += 1

cnt

# Counter({'1': 3, '2': 2, '3': 1})

#####################################

# Find the 10 most frequent words in a text, with their counts

text = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'by', 'can','for', 'from', 'have', 'if', 'in', 'is', 'it', 'may','not', 'of', 'on', 'or', 'tbd', 'that', 'the', 'this','to', 'us', 'we', 'when', 'will', 'with', 'yet','you', 'your', '的', '了', '和','or', 'tbd', 'that', 'the', 'this','to', 'us', 'we', 'when', 'will','when']

Counter(text).most_common(10)

#[('when', 3), ('to', 2), ('we', 2), ('that', 2), ('tbd', 2), ('this', 2), ('us',2), ('will', 2), ('the', 2), ('or', 2)]

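Counter is a dict subclass. It accepts either an iterable or a mapping as its argument, and it is an unordered collection.

c = Counter()     # a new, empty Counter

c = Counter('kwejrkhdskf')    # a Counter built from the iterable 'kwejrkhdskf'

c = Counter({'red': 4, 'blue': 2})    # a Counter built from the mapping {'red': 4, 'blue': 2}

c = Counter(cats=4, dogs=8)    # a Counter built from keyword arguments

Indexing a Counter with a missing key does not raise a KeyError. Counter extends dict so that a missing key returns 0, while an existing key returns its count.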

>>> c = Counter(['eggs', 'ham','eggs'])

>>> c['bacon']

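0    # 'bacon' is not among c's keys, so 0 is returned

>>> c['eggs']

2    # 'eggs' is among c's keys, so its count is returned

Setting a key's count to 0 does not remove the key from the Counter; use del to remove it.

>>> c['sausage'] = 0  # set the count of 'sausage' to 0

>>> del c['sausage']    # remove 'sausage' from the Counter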
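
To make the difference between a zero count and a deleted key concrete, here is a minimal sketch. It assumes Python 3, and the variable names are only illustrative.

```python
from collections import Counter

c = Counter(['eggs', 'ham', 'eggs'])

c['sausage'] = 0                    # the key now exists with a count of 0
has_key_after_zero = 'sausage' in c

missing = c['bacon']                # looking up a missing key returns 0 ...
bacon_added = 'bacon' in c          # ... and does not add the key

del c['sausage']                    # del removes the entry entirely
has_key_after_del = 'sausage' in c

print(has_key_after_zero, bacon_added, has_key_after_del)  # True False False
```

Unlike defaultdict, a plain lookup on a Counter never stores the missing key.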

Commonly used Counter methods

 ##########

elements()

    Return an iterator over elements repeating each as many times as its count. Elements are returned in arbitrary order. If an element’s count is less than one, elements() will ignore it.
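
A small sketch of elements(), assuming Python 3. The counts are made up for illustration, and sorted() is used only to get a stable order for display.

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)

# Keys with a count below one ('c' and 'd') are skipped
expanded = sorted(c.elements())
print(expanded)  # ['a', 'a', 'a', 'a', 'b', 'b']
```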

##########

most_common([n])

Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily.
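
A short sketch of most_common(), assuming Python 3. The input string is just an illustration. The relative order of 'b' and 'r', which have equal counts, should not be relied on.

```python
from collections import Counter

letters = Counter('abracadabra')

top3 = letters.most_common(3)       # the three highest counts
everything = letters.most_common()  # n omitted: all elements, highest first

print(top3[0])          # ('a', 5)
print(len(everything))  # 5
```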

#########################

subtract([iterable-or-mapping])

Elements are subtracted from an iterable or from another mapping (or counter). Like dict.update() but subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative.
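
A sketch of subtract(), assuming Python 3, with made-up counts. It shows that the operation works in place and that zero and negative results are kept.

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)
d = Counter(a=1, b=2, c=3, d=4)

result = c.subtract(d)   # modifies c in place and returns None

print(result)            # None
print(dict(c) == {'a': 3, 'b': 0, 'c': -3, 'd': -6})  # True
```

This differs from the binary minus operator, which returns a new Counter and drops counts that are not positive.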

##########################

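sum(c.values())                 # total of all counts

c.clear()                       # reset all counts

list(c)                         # list the unique elements

set(c)                          # convert to a set

dict(c)                         # convert to a regular dict

c.items()                       # (element, count) pairs, like dict.items()

c += Counter()                  # remove zero and negative counts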
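
The last idiom is the least obvious, so here is a minimal sketch of it, assuming Python 3 and illustrative counts. Adding an empty Counter leaves only the entries whose count is positive.

```python
from collections import Counter

c = Counter(a=3, b=0, c=-2)

c += Counter()   # addition keeps only positive counts

print(c)         # Counter({'a': 3})
```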

Counter also supports addition, subtraction, intersection (&) and union (|).

>>> c = Counter(a=3, b=1)

>>> d = Counter(a=1, b=2)

>>> c + d                       # add:  c[x] + d[x]

Counter({'a': 4, 'b': 3})

>>> c - d                       # subtract (keeping only positive counts)

Counter({'a': 2})

>>> c & d                       # intersection:  min(c[x], d[x])

Counter({'a': 1, 'b': 1})

>>> c | d                       # union:  max(c[x], d[x])

Counter({'a': 3, 'b': 2})

deque

example:

>>> from collections import deque

>>> d = deque('ghi')                 # create a deque

>>> for elem in d:                   # iterate over d

...     print elem.upper()

G

H

I

>>> d.append('j')                    # append on the right

>>> d.appendleft('f')                # append on the left

>>> d                                # show

deque(['f', 'g', 'h', 'i', 'j'])

>>> d.pop()                          # pop from the right

'j'

>>> d.popleft()                      # pop from the left

'f'

>>> list(d)                          # convert to a list

['g', 'h', 'i']

>>> d[0]                             # access an element by index

'g'

>>> d[-1]                            # negative indices work as with list

'i'

>>> list(reversed(d))

['i', 'h', 'g']

>>> 'h' in d

True

>>> d.extend('jkl')                  # extend with an iterable

>>> d

deque(['g', 'h', 'i', 'j', 'k', 'l'])

>>> d.rotate(1)                      # rotate one step to the right

>>> d

deque(['l', 'g', 'h', 'i', 'j', 'k'])

>>> d.rotate(-1)                     # rotate one step to the left

>>> d

deque(['g', 'h', 'i', 'j', 'k', 'l'])

>>> deque(reversed(d))

deque(['l', 'k', 'j', 'i', 'h', 'g'])

>>> d.clear()                        # remove all elements

>>> d.pop()                          # cannot pop from an empty deque

Traceback (most recent call last):

  File "<pyshell#6>", line 1, in -toplevel-

    d.pop()

IndexError: pop from an empty deque

>>> d.extendleft('abc')              # reverse input

>>> d

deque(['c', 'b', 'a'])

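del d[n] is equivalent to:

def del_(d, n):

    d.rotate(-n)    # bring item n to the left end

    d.popleft()     # drop it

    d.rotate(n)     # rotate the remaining items back into place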
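
As a quick check of that equivalence, the sketch below (assuming Python 3; delete_nth is a hypothetical helper name) compares the rotate-based deletion with the built-in del on two copies of the same deque.

```python
from collections import deque

def delete_nth(d, n):
    """Remove the item at index n using only rotate() and popleft()."""
    d.rotate(-n)   # bring item n to the left end
    d.popleft()    # drop it
    d.rotate(n)    # restore the order of the remaining items

a = deque('abcdef')
b = deque('abcdef')

delete_nth(a, 2)
del b[2]

print(a)       # deque(['a', 'b', 'd', 'e', 'f'])
print(a == b)  # True
```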

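An interesting example is computing a moving average (MA):

#####################

Algorithm:

    Example (n = 3): [40, 30, 50, 46, 39, 44] --> 40.0 42.0 45.0 43.0

Formula: MA = (C1 + C2 + C3 + ... + Cn) / n, where C is the closing price and n is the number of periods in the moving average. For example, the 5-day moving average of the spot gold price is MA5 = (close four days ago + close three days ago + close two days ago + yesterday's close + today's close) / 5.

#####################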

import itertools

from collections import deque

def moving_average(iterable, n=3):

    # http://en.wikipedia.org/wiki/Moving_average

    it = iter(iterable)

    d = deque(itertools.islice(it, n-1))

    d.appendleft(0)

    s = sum(d)

    for elem in it:

        s += elem - d.popleft()

        d.append(elem)

        yield s / float(n)
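
For comparison, here is a variant sketch of my own that uses a bounded deque (the maxlen argument) instead of a running sum. It is not the version from the documentation, and moving_average_maxlen is a hypothetical name. It re-sums the window at every step, so it is simpler to read but does more work per element. Assumes Python 3.

```python
from collections import deque

def moving_average_maxlen(iterable, n=3):
    """Yield the mean of each full window of n consecutive values."""
    window = deque(maxlen=n)         # the oldest value falls off automatically
    for value in iterable:
        window.append(value)
        if len(window) == n:         # skip until the first full window
            yield sum(window) / float(n)

result = list(moving_average_maxlen([40, 30, 50, 46, 39, 44]))
print(result)  # [40.0, 42.0, 45.0, 43.0]
```

The output matches the worked example above: (40 + 30 + 50) / 3 = 40.0, (30 + 50 + 46) / 3 = 42.0, and so on.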

namedtuple()

example:

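>>> from collections import namedtuple

>>> Point = namedtuple('Point', ['x', 'y'], verbose=True)  # verbose=True prints the generated class (Python 2.x; removed in Python 3.7)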

class Point(tuple):

    'Point(x, y)'

    __slots__ = ()

    _fields = ('x', 'y')

    def __new__(_cls, x, y):

        'Create a new instance of Point(x, y)'

        return _tuple.__new__(_cls, (x, y))

    @classmethod

    def _make(cls, iterable, new=tuple.__new__, len=len):

        'Make a new Point object from a sequence or iterable'

        result = new(cls, iterable)

        if len(result) != 2:

            raise TypeError('Expected 2 arguments, got %d' % len(result))

        return result

    def __repr__(self):

        'Return a nicely formatted representation string'

        return 'Point(x=%r, y=%r)' % self

    def _asdict(self):

        'Return a new OrderedDict which maps field names to their values'

        return OrderedDict(zip(self._fields, self))

    def _replace(_self, **kwds):

        'Return a new Point object replacing specified fields with new values'

        result = _self._make(map(kwds.pop, ('x', 'y'), _self))

        if kwds:

            raise ValueError('Got unexpected field names: %r' % kwds.keys())

        return result

    def __getnewargs__(self):

        'Return self as a plain tuple.   Used by copy and pickle.'

        return tuple(self)

    __dict__ = _property(_asdict)

    def __getstate__(self):

        'Exclude the OrderedDict from pickling'

        pass

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')

>>> p = Point(11, y=22)     # instantiate a Point

>>> p[0] + p[1]             # add the fields by index

33

>>> x, y = p                # unpack like a regular tuple

>>> x, y

(11, 22)

>>> p.x + p.y               # add the fields by attribute name

33

>>> p                       # the __repr__ of the instance

Point(x=11, y=22)

namedtuple is especially convenient for data imported with csv or sqlite3. The following demo is from the official documentation:

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv

for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))):

    print emp.name, emp.title

import sqlite3

conn = sqlite3.connect('/companydata')

cursor = conn.cursor()

cursor.execute('SELECT name, age, title, department, paygrade FROM employees')

for emp in map(EmployeeRecord._make, cursor.fetchall()):

    print emp.name, emp.title
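
The demo above is Python 2 code and needs an employees.csv file. As a self-contained sketch for Python 3, the same idea can be tried with an in-memory CSV. The two sample rows are made up purely for illustration.

```python
import csv
import io
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# Hypothetical sample data standing in for employees.csv
sample = io.StringIO(
    "Alice,30,Engineer,R&D,5\n"
    "Bob,41,Manager,Sales,7\n"
)

# _make() builds one record from each row (a list of strings)
employees = [EmployeeRecord._make(row) for row in csv.reader(sample)]

for emp in employees:
    print(emp.name, emp.title)
```

Note that csv.reader yields strings, so emp.age is '30' here rather than the integer 30.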

Helper methods provided by namedtuple

>>> t = [11, 22]

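>>> Point._make(t)    # build an instance from an iterable with _make()

Point(x=11, y=22)

>>> p._asdict()    # returns an OrderedDict (changed in Python 2.7)

OrderedDict([('x', 11), ('y', 22)])

>>> p = Point(x=11, y=22)

>>> p._replace(x=33) # _replace() returns a new instance with the given fields replaced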

Point(x=33, y=22)

>>> for partnum, record in inventory.items():

        inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now())

>>> p._fields            # view the field names

('x', 'y')

>>> Color = namedtuple('Color', 'red green blue')

>>> Pixel = namedtuple('Pixel', Point._fields + Color._fields)

>>> Pixel(11, 22, 128, 255, 0)

Pixel(x=11, y=22, red=128, green=255, blue=0)

>>> getattr(p, 'x') # get the value of field x on instance p

11

>>> d = {'x': 11, 'y': 22}

>>> Point(**d)    # "**" unpacks the dict into keyword arguments

Point(x=11, y=22)

>>> class Point(namedtuple('Point', 'x y')):

        __slots__ = ()

        @property

        def hypot(self):

            return (self.x ** 2 + self.y ** 2) ** 0.5

        def __str__(self):

            return 'Point: x=%6.3f  y=%6.3f  hypot=%6.3f' % (self.x, self.y, self.hypot)

>>> Point3D = namedtuple('Point3D', Point._fields + ('z',))

>>> Account = namedtuple('Account', 'owner balance transaction_count')

>>> default_account = Account('<owner name>', 0.0, 0)

>>> johns_account = default_account._replace(owner='John')

>>> Status = namedtuple('Status', 'open pending closed')._make(range(3)) # define the type and build an instance in one expression

>>> Status.open, Status.pending, Status.closed

(0, 1, 2)

>>> class Status:

        open, pending, closed = range(3)

OrderedDict

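In Python 2 a regular dict does not keep its keys in any particular order, which is inconvenient for some applications. OrderedDict is a dict subclass that remembers the order in which keys were inserted.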

example:

>>> # a regular dict

>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}

>>> # sorted by key

>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))

OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

>>> # sorted by value

>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))

OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])

>>> # sorted by key length

>>> OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))

OrderedDict([('pear', 1), ('apple', 4), ('orange', 2), ('banana', 3)])
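
One point worth spelling out: an OrderedDict built from sorted items remembers insertion order but does not keep re-sorting itself. The sketch below (assuming Python 3; the key 'cherry' is made up for illustration) shows what happens on insert, delete and popitem().

```python
from collections import OrderedDict

d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
by_key = OrderedDict(sorted(d.items(), key=lambda t: t[0]))

by_key['cherry'] = 5        # a new key goes to the end, not into sorted position
del by_key['banana']        # deleting keeps the order of the remaining keys
keys_after = list(by_key)
print(keys_after)           # ['apple', 'orange', 'pear', 'cherry']

first = by_key.popitem(last=False)   # pop from the front (FIFO)
print(first)                # ('apple', 4)
```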

defaultdict

A dict subclass that supplies a default value when a key is missing instead of raising KeyError.

example

>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

>>> d = defaultdict(list) # store each key's values in a list

>>> for k, v in s:

...     d[k].append(v)

...

>>> d.items()

[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

>>> d = {}

>>> for k, v in s:

...     d.setdefault(k, []).append(v) # an alternative using dict.setdefault()

...

>>> d.items()

[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

>>> s = 'mississippi'

>>> d = defaultdict(int) # this style resembles Counter

>>> for k in s:

...     d[k] += 1

...

>>> d.items()

[('i', 4), ('p', 2), ('s', 4), ('m', 1)]

>>> def constant_factory(value):

...     return itertools.repeat(value).next

>>> d = defaultdict(constant_factory('<missing>'))

>>> d.update(name='John', action='ran')

>>> '%(name)s %(action)s to %(object)s' % d # the key 'object' is missing, so the default value is used

'John ran to <missing>'
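
The constant_factory above relies on itertools.repeat(value).next, which only exists in Python 2. A minimal Python 3 sketch of the same idea returns a lambda instead:

```python
from collections import defaultdict

def constant_factory(value):
    # Return a zero-argument callable that always gives back `value`
    return lambda: value

d = defaultdict(constant_factory('<missing>'))
d.update(name='John', action='ran')

sentence = '%(name)s %(action)s to %(object)s' % d
print(sentence)  # John ran to <missing>
```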

>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]

>>> d = defaultdict(set) # store each key's values in a set

>>> for k, v in s:

...     d[k].add(v)

...

>>> d.items()

[('blue', set([2, 4])), ('red', set([1, 3]))]
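
One behavior that is easy to overlook: the default is created and stored only when a missing key is indexed with d[key]. Other lookups such as get() do not trigger it. A small sketch, assuming Python 3 and illustrative keys:

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)              # missing key: default_factory() creates []

looked_up = d.get('b')        # get() does not call default_factory
has_b_after_get = 'b' in d

_ = d['b']                    # plain indexing does, and stores the new value
has_b_after_index = 'b' in d

print(looked_up, has_b_after_get, has_b_after_index)  # None False True
```

This is the opposite of Counter, where looking up a missing key returns 0 without storing it.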
