Suggestion 50: Advanced Data Structures in Python

# -*-  coding:utf-8 -*-

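'''

Collections, Array, Heapq, Bisect, Weakref, Copy, and Pprint

The collections module provides useful tools beyond the built-in types, such as Counter, defaultdict, OrderedDict, deque, and namedtuple. Of these, Counter, deque, and defaultdict are the most commonly used classes.

'''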

from collections import Counter

li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]

a = Counter(li)

print a # Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})

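# To count the number of distinct items in a list:

print len(set(li)) # 4

# To group the results, you can do this:

print "{0} : {1}".format(a.values(),a.keys())  # e.g. [3, 2, 1, 2] : ['Dog', 'Cat', 'Mouse', 42] (dict order is arbitrary in Python 2)

print(a.most_common(3)) # [('Dog', 3), ('Cat', 2), (42, 2)] (the two items tied at 2 may appear in either order)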
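
The introduction also names namedtuple and OrderedDict without showing them. The following is a minimal sketch of both, not part of the original article: Point and the dictionary keys are hypothetical names chosen for illustration, and print() is written with parentheses so the snippet also runs on Python 3.

```python
from collections import OrderedDict, namedtuple

# Point is a hypothetical record type used only for illustration.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
print(p.x + p.y)       # fields are readable by name...
print(p[0] + p[1])     # ...and by position, like a plain tuple

# OrderedDict remembers the order in which keys were first inserted.
d = OrderedDict()
d["first"] = 1
d["second"] = 2
d["third"] = 3
print(list(d.keys()))
```

A namedtuple is still a tuple, so it compares equal to (1, 2) and can be unpacked as usual.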

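'''

A deque is a double-ended queue, a generalization of the queue in which elements can be added or removed at both ends.

For this reason it is also called a head-tail linked list, although that name also refers to another specific data structure implementation.

Deque supports thread-safe, optimized append and pop operations, and the operations at either end run in approximately O(1) time.

A list supports similar operations, but it performs well only for fixed-length access; operations such as pop(0) and insert(0, v),

which change both the length of the list and the positions of its elements, cost O(n).

'''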

import time

from collections import deque

num = 100000

def append(c):

    for i in range(num):

        c.append(i)

def appendleft(c):

    if isinstance(c, deque):

        for i in range(num):

            c.appendleft(i)

    else:

        for i in range(num):

            c.insert(0, i)

def pop(c):

    for i in range(num):

        c.pop()

def popleft(c):

    if isinstance(c, deque):

        for i in range(num):

            c.popleft()

    else:

        for i in range(num):

            c.pop(0)

for container in [deque, list]:

    for operation in [append, appendleft, pop, popleft]:

        c = container(range(num))

        start = time.time()

        operation(c)

        elapsed = time.time() - start

        print "Completed {0}/{1} in {2} seconds: {3} ops/sec".format(

              container.__name__, operation.__name__, elapsed, num / elapsed)

# Completed deque/append in 0.0250000953674 seconds: 3999984.74127 ops/sec

# Completed deque/appendleft in 0.0199999809265 seconds: 5000004.76838 ops/sec

# Completed deque/pop in 0.0209999084473 seconds: 4761925.52225 ops/sec

# Completed deque/popleft in 0.0199999809265 seconds: 5000004.76838 ops/sec

# Completed list/append in 0.0220000743866 seconds: 4545439.17637 ops/sec

# Completed list/appendleft in 21.3209998608 seconds: 4690.21155917 ops/sec

# Completed list/pop in 0.0240001678467 seconds: 4166637.52682 ops/sec

# Completed list/popleft in 4.01799988747 seconds: 24888.0046791 ops/sec

# Another example performs basic queue operations:

from collections import deque

q = deque(range(5))

q.append(5)

q.appendleft(6)

print q

print q.pop()

print q.popleft()

print q.rotate(3)

print q

print q.rotate(-1)

print q

# deque([6, 0, 1, 2, 3, 4, 5])

# 5

# 6

# None

# deque([2, 3, 4, 0, 1])

# None

# deque([3, 4, 0, 1, 2])
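
Because both ends of a deque are cheap to modify, a bounded deque is a convenient way to keep only the most recent items. The sketch below relies on the maxlen argument of deque, which the article does not cover; tail is a hypothetical helper name. It is written to run on Python 3 as well.

```python
from collections import deque

def tail(iterable, n=3):
    """Return the last n items of iterable (tail is a hypothetical helper)."""
    return list(deque(iterable, maxlen=n))

recent = deque(maxlen=3)
for value in range(5):
    recent.append(value)   # once full, the oldest item is dropped from the left
print(recent)
print(tail("abcdefg"))
```

With a plain list the same effect would need an explicit pop(0) on every append, which is the O(n) operation measured in the benchmark above.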

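'''

Defaultdict

This type behaves exactly like an ordinary dictionary except when a missing key is looked up. When that happens,

its default_factory is called to supply a default value, and the key-value pair is stored. The remaining arguments are the same as for the regular dict() constructor,

and a defaultdict instance supports the same operations as a built-in dict.

A defaultdict is useful when you want to accumulate tracking data.

'''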

# Suppose you want to track the positions of each word in a string; you can do this:

from collections import defaultdict

s = "the quick brown fox jumps over the lazy dog"

words = s.split()

location = defaultdict(list)

for m, n in enumerate(words):

    location[n].append(m)

print location

# defaultdict(<type 'list'>, {'brown': [2], 'lazy': [7], 'over': [5], 'fox': [3],

# 'dog': [8], 'quick': [1], 'the': [0, 6], 'jumps': [4]})

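# Whether to pair defaultdict with lists or sets depends on your goal: a list preserves the order in which elements were inserted, whereas a set ignores insertion order and eliminates duplicates.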

from collections import defaultdict

s = "the quick brown fox jumps over the lazy dog"

words = s.split()

location = defaultdict(set)

for m, n in enumerate(words):

    location[n].add(m)

print location

# defaultdict(<type 'set'>, {'brown': set([2]), 'lazy': set([7]),

# 'over': set([5]), 'fox': set([3]), 'dog': set([8]), 'quick': set([1]),

# 'the': set([0, 6]), 'jumps': set([4])})
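
The point at which default_factory runs is easy to miss. The following sketch, which is an addition and uses int as the factory, suggests that only a lookup of a missing key triggers and stores the default, while a membership test does not. It is written to run on Python 3 as well.

```python
from collections import defaultdict

counts = defaultdict(int)      # int() returns 0, so missing keys start at 0
for word in "the quick brown fox jumps over the lazy dog".split():
    counts[word] += 1

the_count = counts["the"]
before = "cat" in counts       # membership tests do not call the factory
cat_count = counts["cat"]      # a lookup does, and stores the default
after = "cat" in counts

print(the_count, before, cat_count, after)
```

This side effect is worth remembering: reading a missing key from a defaultdict grows the dictionary.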

# Another way to create a multidict:

s = "the quick brown fox jumps over the lazy dog"

d = {}

words = s.split()

for key, value in enumerate(words):

    d.setdefault(key, []).append(value)

print d

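# {0: ['the'], 1: ['quick'], 2: ['brown'], 3: ['fox'], 4: ['jumps'], 5: ['over'], 6: ['the'], 7: ['lazy'], 8: ['dog']}

# Note: the keys here are word positions, so each list holds a single word.

# To group positions by word as in the defaultdict examples, use d.setdefault(value, []).append(key).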

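'''

The array module defines a new object type that looks much like a list, except that it can only hold elements of a single type.

The element type is fixed when the array is created.

If your program needs to optimize memory usage, and you are sure that all the data you would store in a list is of the same type, the array module is a good fit.

For example, storing ten million integers takes at least 160MB in a list, but only 40MB in an array.

Although arrays save space, almost no basic operation is faster on an array than on a list.

When computing with arrays, watch out for operations that create lists. A list comprehension, for example,

converts the whole array into a list and inflates memory use. A workable alternative is to build the new array from a generator expression.

'''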

import array

a = array.array("i", [1,2,3,4,5])

b = array.array(a.typecode, (2*x for x in a))

# Because the point of using an array is to save space, in-place operations are preferable. A more efficient approach is to use enumerate:

import array

a = array.array("i", [1,2,3,4,5])

for i, x in enumerate(a):

    a[i] = 2*x

import array

from timeit import Timer

def arraytest():

    a = array.array("i", [1, 2, 3, 4, 5])

    b = array.array(a.typecode, (2 * x for x in a))

def enumeratetest():

    a = array.array("i", [1, 2, 3, 4, 5])

    for i, x in enumerate(a):

        a[i] = 2 * x

if __name__=='__main__':

    m = Timer("arraytest()", "from __main__ import arraytest")

    n = Timer("enumeratetest()", "from __main__ import enumeratetest")

    print m.timeit() # 5.22479210582

    print n.timeit() # 4.34367196717
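
The memory claim made for arrays can be checked roughly with sys.getsizeof. The sketch below is an addition; the exact byte counts depend on the platform and Python version, so it prints them without asserting specific values. The estimate for the list adds the size of each int object to the size of the list's pointer table. It is written to run on Python 3 as well.

```python
import array
import sys

n = 100000
as_list = list(range(n))
as_array = array.array("i", range(n))

# Size of the array's data: itemsize bytes per element.
array_bytes = as_array.itemsize * len(as_array)
# Size of the list: its pointer table plus one int object per element.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)

print(array_bytes, list_bytes)
```

On a typical 64-bit CPython build the list figure is several times larger than the array figure, in line with the ratio the text describes.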

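'''

The heapq module implements a priority queue on top of a heap. A heap here is a plain list whose elements are arranged according to the heap rules.

A heap is a tree-shaped data structure in which child and parent nodes are ordered relative to each other. A binary heap can be represented by a suitably organized list or array,

in which the children of element N are at positions 2*N+1 and 2*N+2 (indices start at 0). In short, every function in this module assumes the sequence is already heap-ordered (not fully sorted),

so the first element (seq[0]) is the smallest and the rest of the sequence forms a binary tree in which the children of seq[i] are seq[2*i+1] and

seq[2*i+2]. Whenever the sequence is modified, the functions ensure that each child is greater than or equal to its parent.

'''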

import heapq

heap = []

for value in [20, 10, 30, 50, 40]:

    heapq.heappush(heap, value)

while heap:

    print heapq.heappop(heap)
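
The parent and child index rule described above can be checked directly. In this sketch, is_heap is a hypothetical helper (it is not part of heapq) that tests whether seq[i] <= seq[2*i+1] and seq[i] <= seq[2*i+2] hold for every index. It is written to run on Python 3 as well.

```python
import heapq

def is_heap(seq):
    """Check that every parent is <= its children (is_heap is a hypothetical helper)."""
    n = len(seq)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and seq[i] > seq[child]:
                return False
    return True

data = [20, 10, 30, 50, 40]
was_heap = is_heap(data)    # False: 20 sits above its smaller child 10
heapq.heapify(data)         # rearranges the list in place
print(was_heap, is_heap(data), data[0])
```

heapify only restores the heap rule; it does not promise a fully sorted list, which is why heappop is needed to read the elements in order.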

# The heapq module also has two functions, nlargest() and nsmallest():

import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

print(heapq.nlargest(3, nums)) # Prints [42, 37, 23]

print(heapq.nsmallest(3, nums)) # Prints [-4, 1, 2]

# Both functions also accept a key parameter for use with more complex data structures:

import heapq

portfolio = [

{'name': 'IBM', 'shares': 100, 'price': 91.1},

{'name': 'AAPL', 'shares': 50, 'price': 543.22},

{'name': 'FB', 'shares': 200, 'price': 21.09},

{'name': 'HPQ', 'shares': 35, 'price': 31.75},

{'name': 'YHOO', 'shares': 45, 'price': 16.35},

{'name': 'ACME', 'shares': 75, 'price': 115.65}

]

cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])

expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

print cheap

# [{'price': 16.35, 'name': 'YHOO', 'shares': 45},

# {'price': 21.09, 'name': 'FB', 'shares': 200}, {'price': 31.75, 'name': 'HPQ', 'shares': 35}]

print expensive

# [{'price': 543.22, 'name': 'AAPL', 'shares': 50}, {'price': 115.65, 'name': 'ACME',

# 'shares': 75}, {'price': 91.1, 'name': 'IBM', 'shares': 100}]

# Here is an example of a queue that orders items by a given priority and always returns the highest-priority item on pop:

import heapq

class Item:

    def __init__(self, name):

        self.name = name

    def __repr__(self):

        return 'Item({!r})'.format(self.name)

class PriorityQueue:

    def __init__(self):

        self._queue = []

        self._index = 0

    def push(self, item, priority):

        heapq.heappush(self._queue, (-priority, self._index, item))

        self._index += 1

    def pop(self):

        return heapq.heappop(self._queue)[-1]

q = PriorityQueue()

q.push(Item('foo'), 1)

q.push(Item('bar'), 5)

q.push(Item('spam'), 4)

q.push(Item('grok'), 1)

print q.pop() # Item('bar')

print q.pop() # Item('spam')

print q.pop() # Item('foo')

print q.pop() # Item('grok')

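'''

The bisect module helps keep the elements of a list in sorted order. It does most of its work with a bisection (binary search) algorithm.

It inserts an element into a list while keeping the list sorted. In some cases this is more efficient than repeatedly sorting the list,

and for a large list, keeping it sorted at every step is also more efficient than re-sorting it.

'''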

import bisect

a = [(0, 100), (150, 220), (500, 1000)]

bisect.insort_right(a, (250,400))

print a # [(0, 100), (150, 220), (250, 400), (500, 1000)]

# Use the bisect() function to find the insertion point:

import bisect

a = [(0, 100), (150, 220), (500, 1000)]

bisect.insort_right(a, (250,400))

bisect.insort_right(a, (399, 450))

print a # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000)]

print bisect.bisect(a, (550, 1200)) # 5

# bisect(sequence, item) => index returns the position where the item should be inserted, without modifying the sequence.

import bisect

a = [(0, 100), (150, 220), (500, 1000)]

bisect.insort_right(a, (250,400))

bisect.insort_right(a, (399, 450))

print a # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000)]

print bisect.bisect(a, (550, 1200)) # 5

bisect.insort_right(a, (550, 1200))

print a # [(0, 100), (150, 220), (250, 400), (399, 450), (500, 1000), (550, 1200)]
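
Bisection can also be used to look up an existing element in a sorted list. The sketch below is an addition; index_of is a hypothetical helper built on bisect.bisect_left, which returns the leftmost position at which the item could be inserted. It is written to run on Python 3 as well.

```python
import bisect

def index_of(sorted_list, x):
    """Return the index of x in sorted_list, or -1 if absent (hypothetical helper)."""
    i = bisect.bisect_left(sorted_list, x)
    if i != len(sorted_list) and sorted_list[i] == x:
        return i
    return -1

intervals = [(0, 100), (150, 220), (250, 400), (500, 1000)]
found = index_of(intervals, (250, 400))
missing = index_of(intervals, (300, 350))
left = bisect.bisect_left(intervals, (150, 220))    # position before an equal item
right = bisect.bisect_right(intervals, (150, 220))  # position after an equal item
print(found, missing, left, right)
```

The difference between bisect_left and bisect_right only shows up when the item is already present; bisect() is an alias for bisect_right.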

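'''

The weakref module lets us create references to Python objects that do not prevent those objects from being destroyed. This section covers the basic use of weak references

and introduces a proxy class.

Before starting, we need to understand what a strong reference is. A strong reference is a pointer that affects an object's reference count,

lifetime, and time of destruction. A strong reference is what you get whenever you assign an object to a variable.

'''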
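
The effect of strong references on an object's lifetime can be observed as follows. This is a sketch that assumes CPython's reference counting; Resource is a hypothetical class, and a weak reference is used only as a probe to see whether the object is still alive. It is written to run on Python 3 as well.

```python
import gc
import weakref

class Resource(object):     # hypothetical class used only for illustration
    pass

a = Resource()
b = a                       # a second strong reference to the same object
probe = weakref.ref(a)      # observes the object without keeping it alive

del a
gc.collect()
alive_with_one_ref = probe() is not None   # b still holds the object

del b
gc.collect()
alive_with_no_refs = probe() is not None   # no strong references remain

print(alive_with_one_ref, alive_with_no_refs)
```

The object survives as long as at least one of its strong references exists and is reclaimed once the last one is gone.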

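'''

A weak reference, by contrast, does not affect the object's reference count. The existence of weak references does not prevent an object from being destroyed; in other words, an object with only weak references left will be destroyed.

You can create a weak reference to an object with the weakref.ref function. The call takes a strong reference as its first argument and returns a weak reference.

Weak references are best used for expensive objects, or to avoid circular references (although the garbage collector usually takes care of those).

'''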

import weakref

import gc

class MyObject(object):

    def my_method(self):

        print 'my_method was called!'

obj = MyObject()

r = weakref.ref(obj)

gc.collect()

assert r() is obj #r() allows you to access the object referenced: it's there.

obj = 1 #Let's change what obj references to

gc.collect()

assert r() is None #There is no object left: it was gc'ed.
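
The text mentions a proxy class, but the example above only uses weakref.ref. A minimal sketch of weakref.proxy follows; it is an addition, Greeter is a hypothetical class, and the timing of destruction assumes CPython. It is written to run on Python 3 as well.

```python
import gc
import weakref

class Greeter(object):      # hypothetical class used only for illustration
    def greet(self):
        return "hello"

obj = Greeter()
p = weakref.proxy(obj)      # behaves like obj, with no need to call p() first
greeting = p.greet()

del obj
gc.collect()
try:
    p.greet()
    proxy_is_dead = False
except ReferenceError:      # raised when the referent has been destroyed
    proxy_is_dead = True

print(greeting, proxy_is_dead)
```

A proxy is more convenient than a ref when the object is used often, at the cost of a ReferenceError instead of a None check once the object is gone.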

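'''

The copy module provides functions that copy objects with shallow or deep copy semantics.

Shallow and deep copying differ only for compound objects (objects that contain other objects, such as lists or class instances).

•A shallow copy creates a new compound object and inserts into it references to the objects found in the original.

•A deep copy creates a new compound object and recursively copies the objects found in the original into it.

A plain assignment simply points the new variable at the source object.

'''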

import copy

a = [1,2,3]

b = [4,5]

c = [a,b]

# Normal Assignment

d = c

print id(c) == id(d)          # True - d is the same object as c

print id(c[0]) == id(d[0])    # True - d[0] is the same object as c[0]

# Shallow Copy

d = copy.copy(c)

print id(c) == id(d)          # False - d is now a new object

print id(c[0]) == id(d[0])    # True - d[0] is the same object as c[0]

# Deep Copy

d = copy.deepcopy(c)

print id(c) == id(d)          # False - d is now a new object

print id(c[0]) == id(d[0])    # False - d[0] is now a new object

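# A shallow copy (copy()) creates a new container whose references point to the objects in the original.

# A deep copy (deepcopy()) creates an object whose references point to newly copied objects.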
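
The practical consequence of the two copy modes appears when a nested object is mutated. The following sketch is an addition with hypothetical variable names, written to run on Python 3 as well.

```python
import copy

inner = [1, 2, 3]
original = [inner, [4, 5]]

shallow = copy.copy(original)
deep = copy.deepcopy(original)

inner.append(99)            # mutate an object nested inside original

print(shallow[0])           # shares inner, so it sees the change
print(deep[0])              # has its own copy, so it does not

original.append("new")      # mutate the outer container itself
print(len(shallow), len(deep))   # neither copy is affected
```

A shallow copy is enough when the nested objects are immutable or never modified; otherwise a deep copy avoids the shared state shown here.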

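'''

Suppose I have two classes, Manager and Graph. Each Graph holds a reference to its manager, and each Manager holds a collection of the Graphs it manages. We have two tasks:

1) Copy a graph instance with deepcopy, while its manager still points to the original graph's manager.

2) Copy a manager, creating an entirely new manager but copying all of the original graphs.

'''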

import weakref, copy

class Graph(object):

    def __init__(self, manager=None):

        self.manager = None if manager is None else weakref.ref(manager)

    def __deepcopy__(self, memodict):

        manager = self.manager()

        return Graph(memodict.get(id(manager), manager))

class Manager(object):

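    def __init__(self, graphs=None):

        # Use None instead of a mutable default list, which would be shared by all instances

        self.graphs = [] if graphs is None else graphs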

        for g in self.graphs:

            g.manager = weakref.ref(self)

a = Manager([Graph(), Graph()])

b = copy.deepcopy(a)

if all(g.manager() is b for g in b.graphs):  # all() is needed: a non-empty list is always truthy

    print True # True

if copy.deepcopy(a.graphs[0]).manager() is a:

    print True # True

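'''

The pprint module prints data structures in a more readable way. When you need to print a complex, deeply nested dictionary or JSON object, pprint gives a much better result.

Suppose you need to print a matrix: an ordinary print only produces a flat list, whereas pprint can print a nicely laid-out matrix.

'''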

import pprint

matrix = [ [1,2,3], [4,5,6], [7,8,9] ]

a = pprint.PrettyPrinter(width=20)

a.pprint(matrix)

# [[1, 2, 3],

#  [4, 5, 6],

#  [7, 8, 9]]
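
For the nested dictionary case mentioned in the text, pprint.pformat and the depth argument may be useful. This sketch is an addition; config is a hypothetical structure, and the exact line breaks depend on the width chosen, so no output is shown. It is written to run on Python 3 as well.

```python
import pprint

# config is a hypothetical nested structure used only for illustration.
config = {
    "name": "demo",
    "matrix": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "options": {"depth": 2, "verbose": True},
}

text = pprint.pformat(config, width=30)   # returns the string instead of printing
print(text)

# depth limits how many nesting levels are expanded; deeper ones become "..."
pprint.pprint(config, depth=1)
```

pformat is handy when the formatted text has to go to a log or a file rather than to standard output.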

#------------------------------------------------------------

#--- Singly linked list

#------------------------------------------------------------

class Node:

    def __init__(self):

        self.data = None

        self.nextNode = None

    def set_and_return_Next(self):

        self.nextNode = Node()

        return self.nextNode

    def getNext(self):

        return self.nextNode

    def getData(self):

        return self.data

    def setData(self, d):

        self.data = d

class LinkedList:

    def buildList(self, array):

        self.head = Node()

        self.head.setData(array[0])

        self.temp = self.head

        self.tail = self.head  # so a one-element list still has a tail

        for i in array[1:]:

            self.temp = self.temp.set_and_return_Next()

            self.temp.setData(i)

            self.tail = self.temp

        return self.head

    def printList(self):

        tempNode = self.head

        while(tempNode!=self.tail):

            print(tempNode.getData())

            tempNode = tempNode.getNext()

        print(self.tail.getData())

myArray = [3, 5, 4, 6, 2, 6, 7, 8, 9, 10, 21]

myList = LinkedList()

myList.buildList(myArray)

myList.printList()
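
An alternative formulation of the same singly linked list is sketched below. It is an addition, not the author's implementation: SimpleNode and SimpleLinkedList are hypothetical names, and the list supports Python's iteration protocol so that empty and one-element lists need no special handling when printing. It is written to run on Python 3 as well.

```python
class SimpleNode(object):
    """A minimal node; SimpleNode and SimpleLinkedList are hypothetical names."""
    def __init__(self, data):
        self.data = data
        self.next = None

class SimpleLinkedList(object):
    def __init__(self, values=()):
        self.head = None
        self.tail = None
        for value in values:
            self.append(value)

    def append(self, value):
        node = SimpleNode(value)
        if self.head is None:      # empty list: node is both head and tail
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def __iter__(self):
        node = self.head
        while node is not None:    # stops cleanly on an empty list
            yield node.data
            node = node.next

items = SimpleLinkedList([3, 5, 4, 6])
items.append(2)
print(list(items))
print(list(SimpleLinkedList()))
```

Keeping a tail pointer makes append an O(1) operation, and defining __iter__ lets the list work with for loops, list(), and sum().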

#------------------------------------------------------------

#--- Prim's algorithm implemented in Python

#------------------------------------------------------------

from collections import defaultdict

from heapq import heapify, heappop, heappush

def prim( nodes, edges ):

    conn = defaultdict( list )

    for n1,n2,c in edges:

        conn[ n1 ].append( (c, n1, n2) )

        conn[ n2 ].append( (c, n2, n1) )

    mst = []

    used = set( [ nodes[ 0 ] ] )  # wrap in a list so a multi-character node name is not split into letters

    usable_edges = conn[ nodes[0] ][:]

    heapify( usable_edges )

    while usable_edges:

        cost, n1, n2 = heappop( usable_edges )

        if n2 not in used:

            used.add( n2 )

            mst.append( ( n1, n2, cost ) )

            for e in conn[ n2 ]:

                if e[ 2 ] not in used:

                    heappush( usable_edges, e )

    return mst

#test

nodes = list("ABCDEFG")

edges = [ ("A", "B", 7), ("A", "D", 5),

          ("B", "C", 8), ("B", "D", 9), ("B", "E", 7),

          ("C", "E", 5),

          ("D", "E", 15), ("D", "F", 6),

          ("E", "F", 8), ("E", "G", 9),

          ("F", "G", 11)]

print "prim:", prim( nodes, edges )
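
One way to sanity-check the result of prim() is to confirm that it is a spanning tree and to add up its cost. The sketch below is an addition: is_spanning_tree is a hypothetical helper, and the mst list was obtained by tracing prim() by hand on the test graph above rather than copied from the article. It is written to run on Python 3 as well.

```python
def is_spanning_tree(nodes, tree_edges):
    """Return True if tree_edges connect all nodes without a cycle (hypothetical helper)."""
    if len(tree_edges) != len(nodes) - 1:
        return False
    neighbours = dict((n, []) for n in nodes)
    for n1, n2, _cost in tree_edges:
        neighbours[n1].append(n2)
        neighbours[n2].append(n1)
    seen = set([nodes[0]])
    stack = [nodes[0]]
    while stack:                      # depth-first walk from the first node
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)

nodes = list("ABCDEFG")
# Edge list obtained by tracing prim() by hand on the test graph above.
mst = [("A", "D", 5), ("D", "F", 6), ("A", "B", 7),
       ("B", "E", 7), ("E", "C", 5), ("E", "G", 9)]
spanning = is_spanning_tree(nodes, mst)
total_cost = sum(cost for _n1, _n2, cost in mst)
print(spanning, total_cost)
```

A connected graph with exactly len(nodes) - 1 edges cannot contain a cycle, which is why the edge count plus a reachability walk is enough for the check.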
