1. Generic Mapping Types

  • The collections.abc module provides the Mapping and MutableMapping ABCs to formalize the interfaces of dict and similar types (in Python 2.6 to 3.2, these classes are imported from the collections module, and not from collections.abc).
  • All mapping types in the standard library use the basic dict in their implementation, so they share the limitation that the keys must be hashable.

a = dict(one=1, two=2, three=3)
b = {'one': 1, 'two': 2, 'three': 3}
c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
d = dict([('two', 2), ('one', 1), ('three', 3)])
e = dict({'three': 3, 'one': 1, 'two': 2})
print(a == b == c == d == e) # True from collections import abc
print(isinstance(a, abc.Mapping)) # True
# 1. The main value of the ABCs is documenting and formalizing the
# minimal interfaces for mappings, and serving as criteria for isinstance
# tests in code that needs to support mappings in a broad sense. print(hash((1, 2, (3, 4)))) # -2725224101759650258
# 1. An object is hashable if it has a hash value which never changes
# during its lifetime (it needs a __hash__() method), and can be compared
# to other objects (it needs an __eq__() method).
# 2. Hashable objects which compare equal must have the same hash value.
# 3. (atomic immutable types / frozen set / tuple when all its items are hashable)
# 4. User-defined types are hashable by default because their hash value is their
# id() and they all compare not equal. If an object implements a custom __eq__
# that takes into account its internal state, it may be hashable only if all its
# attributes are immutable.

2. Dict Comprehensions

DIAL_CODES = [
(86, 'China'),
(91, 'India'),
(1, 'United States')
]
print {country.upper(): code for code, country in DIAL_CODES if code > 66}
# {'INDIA': 91, 'CHINA': 86}

3. Overview of Common Mapping Methods

4. Mappings with Flexible Key Lookup

my_dict.get(key, default)
# my_dict[key] # KeyError my_dict.setdefault(key, []).append(new_value)
# if key not in my_dict:
# my_dict[key] = []
# my_dict[key].append(new_value) # 1. setdefault returns the value, so it can
# be updated without requiring a second search. import collections
my_dict = collections.defaultdict(list)
my_dict[key].append(new_value)
print my_dict.default_factory # <type 'list'> # 1. When instantiating a defaultdict, you provide a callable
# that is used to produce a default value whenever __getitem__
# is passed a nonexistent key argument.
# 2. If no default_factory is provided, the usual KeyError
# is raised for missing keys.
# 3. The default_factory of a defaultdict is only invoked to
# provide default values for __getitem__ calls, and not for
# the other methods. (__missing__) class MyDict(dict):
def __missing__(self, key):
if isinstance(key, str):
raise KeyError(key)
return self[str(key)]
def get(self, key, default=None):
try:
return self[key]
except KeyError:
return default
def __contains__(self, key):
return key in self.keys() or str(key) in self.keys() d = MyDict([('1', 'one'), ('4', 'four')])
print d[1] # one
print d[2] # KeyError
print d.get(3) # None
print 4 in d # True # 1. If you subclass dict and provide a __missing__ method, the
# standard dict.__getitem__ will call it whenever a key is not found,
# instead of raising KeyError. The __missing__ method is just called
# by __getitem__. (i.e., for the d[k] operator).
# 2. A better way to create a user-defined mapping type is to
# sub‐class collections.UserDict instead of dict
# 3. A search like k in my_dict.keys() is efficient in Python 3 even
# for very large mappings because dict.keys() returns a view, which is
# similar to a set, and containment checks in sets are as fast as in
# dictionaries. In Python 2, dict.keys() returns a list, k in my_list
# must scan the list.

5. Variations of dict

  1. collections.OrderedDict

    • Maintains keys in insertion order. The popitem method of an OrderedDict pops the first item by default, but if called as my_odict.popitem(last=True), it pops the last item added.
  2. collections.ChainMap
    • Holds a list of mappings that can be searched as one. The lookup is performed on each mapping in order, and succeeds if the key is found in any of them. This is useful to interpreters for languages with nested scopes, where each mapping represents a scope context. The “ChainMap objects” section of the collections docs has several examples of ChainMap usage.
    • import builtins
      pylookup = ChainMap(locals(), globals(), vars(builtins))
  3. collections.Counter

    • A mapping that holds an integer count for each key. Updating an existing key adds to its count. This can be used to count instances of hashable objects (the keys) or as a multiset—a set that can hold several occurrences of each element.
    • import collections
      ct = collections.Counter('abracadabra')
      print ct # Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
      ct.update('aaaaazzz')
      print ct # Counter({'a': 10, 'z': 3, 'b': 2, 'r': 2, 'c': 1, 'd': 1})
      print ct.most_common(2) # [('a', 10), ('z', 3)]
  4. collections.UserDict
    • A pure Python implementation of a mapping that works like a standard dict.
    • import collections
      class MyDict2(collections.UserDict):
      def __missing__(self, key):
      if isinstance(key, str):
      raise KeyError(key)
      return self[str(key)]
      def __contains__(self, key):
      return str(key) in self.data
      def __setitem__(self, key, item):
      self.data[str(key)] = item # stores all keys as str d = MyDict2([('1', 'one'), ('4', 'four')])
      print(d[1]) # one
      print(d[2]) # KeyError
      print(d.get(3)) # None
      print(4 in d) # True
      d.update([('1', '111')])
      print(d.get(1)) # 111 # 1. The main reason why it’s preferable to subclass from UserDict
      # rather than from dict is that the built-in has some implementation
      # shortcuts that end up forcing us to override methods that we can
      # just inherit from UserDict with no problems.
      # 2. UserDict does not inherit from dict, but has an internal dict
      # instance, called data, which holds the actual items.

6. Immutable Mappings

from types import MappingProxyType
d = {1: 'A'}
d_proxy = MappingProxyType(d)
print(d_proxy) # mappingproxy({1: 'A'})
print(d_proxy[1]) # 'A'
# d_proxy[2] = 'B' # TypeError
d[2] = 'B'
print(d_proxy) # {1: 'A', 2: 'B'}
print(d) # {1: 'A', 2: 'B'} # 1. updates to the original mapping can be seen in the
# mappingproxy, but changes cannot be made through it.

7. Set Theory

l = ['spam', 'spam', 'eggs', 'spam']
s1 = set(l)
s2 = {'spam'}
print(hash(frozenset(range(4)))) # 7327027465595056883
print(s1) # {'eggs', 'spam'}
print(s1 & s2) # {'spam'}
print(s2.pop()) # spam
print(s2) # set()
s3 = {i for i in range(10) if i < 4} # Set Comprehensions
print(s3) # {0, 1, 2, 3} # 1. Create s1 is slower than s2 because, to evaluate it, Python
# has to look up the set name to fetch the constructor, then build
# a list, and finally pass it to the constructor.
# 2. Set elements must be hashable. The set type is not hashable,
# but frozenset is.
# 3. Given two sets a and b, a | b returns their union, a & b
# computes the intersection, and a - b the difference.
# 4. There’s no literal notation for the empty set, just use set().

8. dict and set Under the Hood

8.1 Hash Tables in Dictionaries

  • A hash table is a sparse array (i.e., an array that always has empty cells). In standard data structure texts, the cells in a hash table are often called “buckets.” In a dict hash table, there is a bucket for each item, and it contains two fields: a reference to the key and a reference to the value of the item. Because all buckets have the same size, access to an individual bucket is done by offset.

  • Python tries to keep at least 1/3 of the buckets empty; if the hash table becomes too crowded, it is copied to a new location with room for more buckets.
  • To put an item in a hash table, the first step is to calculate the hash value of the item key, which is done with the hash() built-in function.
  • Hashes and equality:
    • The hash() built-in function works directly with built-in types and falls back to calling __hash__ for user-defined types. If two objects compare equal, their hash values must also be equal. (hash(1) == hash(1.0))
    • Ideally, objects that are similar but not equal should have hash values that differ widely.
    • Starting with Python 3.3, a random salt value is added to the hashes of str, bytes, and datetime objects. The salt value is constant within a Python process but varies between interpreter runs. The random salt is a security measure to prevent a DOS attack.
  • The hash table algorithm:
    • To fetch the value at my_dict[search_key], Python calls hash(search_key) to obtain the hash value of search_key and uses the least significant bits of that number as an offset to look up a bucket in the hash table (the number of bits used depends on the current size of the table). If the found bucket is empty, KeyError is raised. Otherwise the found bucket has an item—a found_key:found_value pair—and then Python checks whether search_key == found_key. If they match, that was the item sought: found_value is returned.
    • However, if search_key and found_key do not match, this is a hash collision. This happens because a hash function maps arbitrary objects to a small number of bits, and—in addition—the hash table is indexed with a subset of those bits. In order to resolve the collision, the algorithm then takes different bits in the hash, massages them in a par‐ticular way, and uses the result as an offset to look up a different bucket. If that is empty, KeyError is raised; if not, either the keys match and the item value is returned, or the collision resolution process is repeated.
    • when inserting items, Python may determine that the hash table is too crowded and rebuild it to a new location with more room. As the hash table grows, so does the number of hash bits used as bucket offsets, and this keeps the rate of collisions low.
    • even with millions of items in a dict, many searches happen with no collisions, and the average number of collisions per search is between one and two.

8.2 Practical Consequences of How dict Works

  • Keys must be hashable objects:

    • Hashable:

      • It supports the hash() function via a hash() method that always returns the same value over the lifetime of the object.
      • It supports equality via an eq() method.
      • If a == b is True then hash(a) == hash(b) must also be True.
      • User-defined types are hashable by default because their hash value is their id() and they all compare not equal.
    • If you implement a class with a custom __eq__ method, you must also implement a suitable __hash__, because you must always make sure that if a == b is True then hash(a) == hash(b) is also True. Otherwise you are breaking an invariant of the hash table algo‐rithm, with the grave consequence that dicts and sets will not deal reliably with your objects. If a custom __eq__ depends on mutable state, then __hash__ must raise TypeError.

  • Dicts have significant memory overhead:
    • Because a dict uses a hash table internally, and hash tables must be sparse to work, they are not space efficient.
    • Replacing dicts with tuples reduces the memory usage in two ways: by removing the overhead of one hash table per record and by not storing the field names again with each record.
  • Key search is very fast:
    • trading space for time.
  • Key ordering depends on insertion order:
    • a dict built as dict([(key1, value1), (key2, value2)]) compares equal to dict([(key2, value2), (key1, value1)]), but their key ordering may not be the same if the hashes of key1 and key2 collide.
  • Adding items to a dict may change the order of existing keys:
    • In Python 3, the .keys(), .items(), and .values() methods return dictionary views, which behave more like sets than the lists returned by these methods in Python 2. Such views are also dynamic: they do not replicate the contents of the dict, and they immediately reflect any changes to the dict.

3. Dictionaries and Sets的更多相关文章

  1. C#中的Collection 1

    Collection定义 Collection是个关于一些变量的集合,按功能可分为Lists,Dictionaries,Sets三个大类. Lists:是一个有序的集合,支持index look up ...

  2. Python教学相关资料

    Python教学调查链接 一.专题 1.绘图 如何开始使用Python来画图 Python画图总结 2.科学计算与数据分析 3.可视化 4.网络爬虫 5. 做笔记 Python-Jupyter Not ...

  3. python快速上手教程

    python版本 python目前的版本分为2.7和3.5,两种版本的代码目前无法兼容,查看python版本号: python --version 基本数据类型 数字类型 整型和浮点型数据和其它编程语 ...

  4. 《流畅的Python》Data Structures--第2章序列array

    第二部分 Data Structure Chapter2 An Array of Sequences Chapter3 Dictionaries and Sets Chapter4 Text vers ...

  5. python学习之那些由print引起的困惑

    该文索所起之因:在练习列表的操作时,要输出一波操作后的列表,但是一直让本人耿耿于怀的时下边的这个现象: 红色框框里是字符串,黄色框框里是列表,同样是只对一个元素进行的操作,为啥输出时字符串是作为一个整 ...

  6. Arrays, Hashtables and Dictionaries

    Original article Built-in arrays Javascript Arrays(Javascript only) ArrayLists Hashtables Generic Li ...

  7. [Python] 03 - Lists, Dictionaries, Tuples, Set

    Lists 列表 一.基础知识 定义 >>> sList = list("hello") >>> sList ['h', 'e', 'l', ' ...

  8. Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition

    张宁 Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for O ...

  9. TSQL 分组集(Grouping Sets)

    分组集(Grouping Sets)是多个分组的并集,用于在一个查询中,按照不同的分组列对集合进行聚合运算,等价于对单个分组使用“union all”,计算多个结果集的并集.使用分组集的聚合查询,返回 ...

随机推荐

  1. CentOS 7下Cloudera Manager及CDH 6.0.1安装过程详解

    目录 一.概念介绍 1.CDH 概览 2.Cloudera Manager 概览 二.环境准备 1.软件版本选择 2.节点准备(四个节点) 3.配置主机名和hosts解析(所有节点) 4.关闭防火墙 ...

  2. 【POJ - 3046】Ant Counting(多重集组合数)

    Ant Counting 直接翻译了 Descriptions 贝西有T种蚂蚁共A只,每种蚂蚁有Ni只,同种蚂蚁不能区分,不同种蚂蚁可以区分,记Sum_i为i只蚂蚁构成不同的集合的方案数,问Sum_k ...

  3. thinkPHP5 类库包注册

    tp5的类库包注册分为自动注册和手动注册 自动注册 我们只需要把自己的类库包目录放入EXTEND_PATH目录(默认为extend,可配置),就可以自动注册对应的命名空间,例如: 我们在extend目 ...

  4. 用elasticsearchdump备份恢复数据

    1.安装elastic searchdump mkdir /data/nodejs cd /data/nodejs wget https://nodejs.org/dist/v10.16.2/node ...

  5. 使用Dockerfile创建ssh服务的镜像02

    使用Dockerfile创建ssh服务的镜像02 1:创建工作目录---一个镜像的所有文件都放这个目录下 ubuntu@ubuntu:~$ mkdir sshd_ubuntu ubuntu@ubunt ...

  6. 如何查看class文件的编译jdk版本号

    使用命令 javap -verbose 命 进入cmd中,使用如下命令 红色

  7. 【LOJ】#3096. 「SNOI2019」数论

    LOJ#3096. 「SNOI2019」数论 如果\(P > Q\)我们把\(P\)和\(Q\)换一下,现在默认\(P < Q\) 这个时候每个合法的\(a_i\)都可以直接落到\(Q\) ...

  8. XML文件介绍

    xml基础详解 1.概述: xml:即可扩展标记语言,xml是互联网数据传输的重要工具,它可以跨越互联网的任何平台,不受编程语言和操作系统的限制,可以说他是一个拥有互联网最高级别通行证的数据携带者.x ...

  9. Web文件上传靶场 - 通关笔记

    Web应用程序通常会提供一些上传功能,比如上传头像,图片资源等,只要与资源传输有关的地方就可能存在上传漏洞,上传漏洞归根结底是程序员在对用户文件上传时控制不足或者是处理的缺陷导致的,文件上传漏洞在渗透 ...

  10. vue之多页面的开发

    我们平常用vue开发的时候总觉得vue好像就是专门为了单页面应用而诞生的,其实不是.因为vue在工程化开发的时候很依赖webpack,而webpack是将所有的资源整合到一块,弄成一个单页面.但是vu ...