Reposted from: http://apfelmus.nfshost.com/articles/monoid-fingertree.html

This post grew out of the big monoid discussion on the haskell-cafe mailing list.

Introduction

A very powerful application of monoids is the 2-3 finger tree, first described by Ralf Hinze and Ross Patterson.

Basically, they allow you to write fast implementations for pretty much every abstract data type mentioned in Okasaki’s book on purely functional data structures. For example, you can do sequences, priority queues, search trees and priority search queues. Moreover, any fancy and custom data structures like interval trees or something for stock trading are likely to be implementable in this framework as well.

How can one tree be useful for so many different data structures? The answer: monoids! Namely, the finger tree works with elements that are related to a monoid, and all the different data structures mentioned above arise by different choices for this monoid.

Let me explain how this monoid magic works.

A list with random access

We begin with the simplest of all data structures, the linked list. As you well know, retrieving the head is fast but random access is much slower:

xs !! n

needs O(n) i.e. linear time to retrieve the n-th element of the list. We would like to create a faster list-like data structure that reduces this to O(log n) i.e. logarithmic time.

For that, we use a binary tree that stores the elements a at the leaves. Furthermore, every node is annotated with a value of type v

data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

In other words, our trees look like this

         v
       /   \
     v       v
    / \     / \
   v   v   v   v
   a   a   a  / \
             v   v
             a   a

The leaves store the elements of our list from left to right.

toList :: Tree v a -> [a]
toList (Leaf _ a) = [a]
toList (Branch _ x y) = toList x ++ toList y

Annotations are fetched by

tag :: Tree v a -> v
tag (Leaf v _) = v
tag (Branch v _ _) = v

We can also implement the head operation which retrieves the leftmost element

head :: Tree v a -> a
head (Leaf _ a) = a
head (Branch _ x _) = head x

Ok, so accessing the 1st leaf was easy, how about the 2nd, 3rd, the n-th leaf?

The solution is to annotate each subtree with its size.

type Size = Int

Our example tree has 5 leaves in total and the subtree on the right contains 3 leaves.

         5
       /   \
     2       3
    / \     / \
   1   1   1   2
   a   a   a  / \
             1   1
             a   a

Thus, we set v = Size and we want the annotations to fulfill

tag (Leaf  ..)       = 1
tag (Branch .. x y) = tag x + tag y

We can make sure that they are always correct by using smart constructors: instead of using Leaf and Branch to create a tree, we use custom functions

leaf :: a -> Tree Size a
leaf a = Leaf 1 a

branch :: Tree Size a -> Tree Size a -> Tree Size a
branch x y = Branch (tag x + tag y) x y

which automatically annotate the right sizes.
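To see the smart constructors at work, here is a minimal sketch that rebuilds the five-leaf example tree. The element values ('a' to 'e') and the name example are made up for illustration; everything else follows the definitions above.

```haskell
-- Size-annotated tree built only through smart constructors.
data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

type Size = Int

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

toList :: Tree v a -> [a]
toList (Leaf _ a)     = [a]
toList (Branch _ x y) = toList x ++ toList y

leaf :: a -> Tree Size a
leaf a = Leaf 1 a

branch :: Tree Size a -> Tree Size a -> Tree Size a
branch x y = Branch (tag x + tag y) x y

-- The example tree: 2 leaves on the left, 3 on the right.
-- (Hypothetical element values, chosen for illustration.)
example :: Tree Size Char
example = branch (branch (leaf 'a') (leaf 'b'))
                 (branch (leaf 'c') (branch (leaf 'd') (leaf 'e')))
```

Because the sizes are computed inside leaf and branch, there is no way to build a tree with a wrong annotation as long as the raw constructors are not used directly.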

Given size annotations, we can now find the n-th leaf:

(!!) :: Tree Size a -> Int -> a
(Leaf _ a)     !! 0 = a
(Branch _ x y) !! n
    | n < tag x     = x !! n
    | otherwise     = y !! (n - tag x)

And assuming that our tree is balanced, this will run in O(log n) time. But for now, let’s ignore balancing which would become relevant when implementing cons or tail.
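As a small experiment, the lookup can be tried on a tree built from a list. This is only a sketch: fromList is a hypothetical helper that splits the list in half to get a roughly balanced tree, and the extra error clause for out-of-range indices is my addition, not part of the definition above.

```haskell
import Prelude hiding ((!!))

data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

type Size = Int

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

leaf :: a -> Tree Size a
leaf a = Leaf 1 a

branch :: Tree Size a -> Tree Size a -> Tree Size a
branch x y = Branch (tag x + tag y) x y

-- Hypothetical helper: build a roughly balanced tree from a non-empty list.
fromList :: [a] -> Tree Size a
fromList []  = error "fromList: empty list"
fromList [a] = leaf a
fromList as  = branch (fromList l) (fromList r)
  where (l, r) = splitAt (length as `div` 2) as

-- Zero-based lookup guided by the size annotations.
(!!) :: Tree Size a -> Int -> a
(Leaf _ a)     !! 0 = a
(Branch _ x y) !! n
    | n < tag x     = x !! n
    | otherwise     = y !! (n - tag x)
_              !! _ = error "index out of range"   -- added for totality
```

For instance, fromList "abcde" !! 3 walks right, then right, then left, touching one node per level instead of scanning the elements one by one.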

A priority queue

Let’s consider a different data structure, the priority queue. It stores items that have different “priorities” and always returns the most urgent one first. We represent priorities as integers and imagine them as points in time so the smallest ones are more urgent.

type Priority = Int

Once again, we use a binary tree. This time, we imagine it as a tournament tree, so that every subtree is annotated with the smallest priority it contains

         2
       /   \
     4       2
    / \     / \
  16   4   2   8
   a   a   a  / \
            32   8
             a   a

In other words, our annotations are to fulfill

tag (Leaf .. a)     = priority a
tag (Branch .. x y) = tag x `min` tag y

with corresponding smart constructors. Given the tournament table, we can reconstruct the element that has the smallest priority in O(log n) time

winner :: Tree Priority a -> a
winner t = go t
    where
    go (Leaf _ a)     = a
    go (Branch _ x y)
        | tag x == tag t = go x   -- winner on left
        | tag y == tag t = go y   -- winner on right

Again, we forgo balancing and thus insertion or deletion.
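Here is a sketch of the tournament tree from the picture. Since the text leaves the priority function abstract, I assume smart constructors leafP and branchP (hypothetical names) that take the priority as an explicit argument; the string payloads are made up as well.

```haskell
data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

type Priority = Int

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

-- Assumed smart constructors: the priority is passed in explicitly.
leafP :: Priority -> a -> Tree Priority a
leafP p a = Leaf p a

branchP :: Tree Priority a -> Tree Priority a -> Tree Priority a
branchP x y = Branch (tag x `min` tag y) x y

-- Follow the subtree whose tag equals the overall minimum.
winner :: Tree Priority a -> a
winner t = go t
    where
    go (Leaf _ a)     = a
    go (Branch _ x y)
        | tag x == tag t = go x   -- winner on left
        | otherwise      = go y   -- winner must be on the right

-- The tournament from the picture, with made-up payloads.
tournament :: Tree Priority String
tournament =
    branchP (branchP (leafP 16 "sixteen") (leafP 4 "four"))
            (branchP (leafP 2 "two")
                     (branchP (leafP 32 "thirty-two") (leafP 8 "eight")))
```

With these definitions, tag tournament is 2 and winner tournament finds the leaf that carries priority 2.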

Monoids - the grand unifier

As we can see, one and the same tree structure can be used for two quite different purposes, just by using different annotations. And by recognizing that the tags form a monoid, we can completely unify both implementations. Moreover, the retrieval operations (!!) and winner are actually special cases of one and the same function!

For brevity, we will denote the associative operation of a monoid with <>

(<>) = mappend

Think of the <> as a small diamond symbol.

Annotations are monoids

The observation is that we obtain the tag of a branch by combining its children with the monoid operation

tag (Branch .. x y) = tag x <> tag y

of the following monoid instances

instance Monoid Size where
    mempty  = 0
    mappend = (+)

instance Monoid Priority where
    mempty  = maxBound
    mappend = min

Hence, a unified smart constructor reads

branch :: Monoid v => Tree v a -> Tree v a -> Tree v a
branch x y = Branch (tag x <> tag y) x y

For leaves, the tag is obtained from the element. We can capture this in a type class

class Monoid v => Measured a v where
    measure :: a -> v

so that the smart constructor reads

leaf :: Measured a v => a -> Tree v a
leaf a = Leaf (measure a) a

For our examples, the instances would be

instance Measured a Size where
    measure _ = 1             -- one element = size 1

instance Measured Foo Priority where
    measure a = priority a    -- urgency of the element
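The instances in the text are written against the type synonyms Size and Priority, which both stand for Int, so a compiler cannot tell them apart. One way to get a compilable version, as far as I can tell, is to wrap each in a newtype. The sketch below does that under a few assumptions of mine: the newtypes, the Task record (standing in for Foo) and the Semigroup instances required by current versions of base are not part of the original code.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

-- Newtypes keep the two monoids on Int apart (my assumption).
newtype Size     = Size Int     deriving (Eq, Ord, Show)
newtype Priority = Priority Int deriving (Eq, Ord, Show)

instance Semigroup Size where
    Size m <> Size n = Size (m + n)
instance Monoid Size where
    mempty = Size 0

instance Semigroup Priority where
    (<>) = min
instance Monoid Priority where
    mempty = Priority maxBound

class Monoid v => Measured a v where
    measure :: a -> v

-- Hypothetical element type playing the role of Foo.
data Task = Task { priority :: Int, name :: String }

instance Measured a Size where
    measure _ = Size 1               -- one element = size 1

instance Measured Task Priority where
    measure = Priority . priority    -- urgency of the element

leaf :: Measured a v => a -> Tree v a
leaf a = Leaf (measure a) a

branch :: Monoid v => Tree v a -> Tree v a -> Tree v a
branch x y = Branch (tag x <> tag y) x y
```

The annotation type is then picked by the type you ask for: the same leaf and branch build a Tree Size Task or a Tree Priority Task.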

How does the annotation at the top of a tree relate to the elements at the leaves? In our two examples, it was the total number of leaves and the least priority respectively. These values are independent of the actual shape of the tree. Thanks to the associativity of <>, this is true for any monoid. For instance, the two trees

(v1<>v2) <> (v3<>v4)         v1 <> (v2<>(v3<>v4))
      /    \                   /  \
     /      \                 v1  v2 <> (v3<>v4)
    /        \                a1    /  \
 v1 <> v2  v3 <> v4                v2  v3 <> v4
  /  \      /  \                   a2    /  \
 v1  v2    v3  v4                       v3  v4
 a1  a2    a3  a4                       a3  a4

have the same annotations

(v1<>v2) <> (v3<>v4) = v1 <> (v2<>(v3<>v4)) = v1 <> v2 <> v3 <> v4

as long as the sequences of leaves are the same. In general, the tag at the root of a tree with n elements is

measure a1 <> measure a2 <> measure a3 <> ... <> measure an

While independent of the shape of the branching, i.e. of the placement of parentheses, this may of course depend on the order of the elements.
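Both halves of this claim can be observed with a monoid that is associative but not commutative. In the sketch below I measure each leaf as a one-element list, so that <> is list concatenation; this choice of monoid and the three example trees are mine, picked only for illustration.

```haskell
data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

-- Each leaf is measured as a singleton list, so (<>) concatenates.
leaf :: a -> Tree [a] a
leaf a = Leaf [a] a

branch :: Semigroup v => Tree v a -> Tree v a -> Tree v a
branch x y = Branch (tag x <> tag y) x y

balanced, skewed, swapped :: Tree String Char
-- Same leaves, same order, different shapes:
balanced = branch (branch (leaf 'a') (leaf 'b')) (branch (leaf 'c') (leaf 'd'))
skewed   = branch (leaf 'a') (branch (leaf 'b') (branch (leaf 'c') (leaf 'd')))
-- Same shape as balanced, but the first two leaves are exchanged:
swapped  = branch (branch (leaf 'b') (leaf 'a')) (branch (leaf 'c') (leaf 'd'))
```

Here tag balanced and tag skewed are both "abcd", while tag swapped is "bacd": regrouping changes nothing, reordering does.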

It makes sense to refer to this combination of measures of all elements as the measure of the tree

instance Measured a v => Measured (Tree v a) v where
    measure = tag

Thus, every tree is annotated with its measure.

Search

Our efforts culminate in the unification of the two search algorithms (!!) and winner. They are certainly similar; at each node, they descend into one of the subtrees which is chosen depending on the annotations. But to see their exact equivalence, we have to ignore branches and grouping for now because this is exactly what associativity “abstracts away”.

In a sequence of elements

a1 , a2 , a3 , a4 , ... , an

how do we find, say, the element at index 3 (counting from zero)? Well, we scan the list from left to right and add 1 for each element encountered. As soon as the count exceeds 3, we have found it.

1                -- is not > 3
1 + 1            -- is not > 3
1 + 1 + 1        -- is not > 3
1 + 1 + 1 + 1    -- is > 3
...

Similarly, how do we find the element with the least priority, say v? Well, we can scan the list from left to right and keep track of the minimum priority so far. We have completed our search once it becomes equal to v.

v1                                -- still bigger than v
v1 `min` v2                       -- still bigger than v
v1 `min` v2 `min` v3              -- still bigger than v
v1 `min` v2 `min` v3 `min` v4     -- equal to v!
...

In general terms, we are looking for the position where a predicate p switches from False to True.

measure a1                                              -- not p
measure a1 <> measure a2                                -- not p
measure a1 <> measure a2 <> measure a3                  -- not p
measure a1 <> measure a2 <> measure a3 <> measure a4    -- p
...                                                     -- p

In other words, we are looking for the position k where

p (measure a1 <> ... <> measure ak)                    is  False
p (measure a1 <> ... <> measure ak <> measure a(k+1))  is  True

The key point is that p does not test single elements but combinations of them, and this allows us to do binary search! Namely, how to find the element where p flips? Answer: divide the total measure into two halves

x <> y

    x =       measure a1 <> ... <> measure a(n/2)
    y = measure a(n/2+1) <> ... <> measure an

If p is True on the first half, then we have to look there for the flip, otherwise we have to search the second half. In the latter case, we would have to split y = y1 <> y2 and test p (x <> y1).

In the case of our data structures, the tree shape determines how the measure is split into parts at each step. Here is the full procedure

search :: Measured a v => (v -> Bool) -> Tree v a -> Maybe a
search p t
    | p (measure t) = Just (go mempty p t)
    | otherwise     = Nothing
    where
    go i p (Leaf _ a) = a
    go i p (Branch _ l r)
        | p (i <> measure l) = go i p l
        | otherwise          = go (i <> measure l) p r

Since we have annotated each branch with its measure, testing p takes no time at all.
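To try the procedure on concrete trees, here is a self-contained sketch. It departs from the code above in a few places, all of them my assumptions: it reads annotations with tag instead of going through the Measured class, it uses newtypes Size and Priority so that both monoids can live on Int, and it builds trees with a hypothetical helper fromListWith that takes the measure function as an argument.

```haskell
data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

newtype Size     = Size Int     deriving (Eq, Ord, Show)
newtype Priority = Priority Int deriving (Eq, Ord, Show)

instance Semigroup Size     where Size m <> Size n = Size (m + n)
instance Monoid    Size     where mempty = Size 0
instance Semigroup Priority where (<>) = min
instance Monoid    Priority where mempty = Priority maxBound

-- Hypothetical helper: roughly balanced tree from a non-empty list,
-- with leaves annotated by the given measure function.
fromListWith :: Monoid v => (a -> v) -> [a] -> Tree v a
fromListWith _ []  = error "fromListWith: empty list"
fromListWith f [a] = Leaf (f a) a
fromListWith f as  = Branch (tag l <> tag r) l r
    where
    (xs, ys) = splitAt (length as `div` 2) as
    l = fromListWith f xs
    r = fromListWith f ys

-- Binary search for the leaf where p flips from False to True.
-- i accumulates the measure of everything to the left of the current subtree.
search :: Monoid v => (v -> Bool) -> Tree v a -> Maybe a
search p t
    | p (tag t) = Just (go mempty t)
    | otherwise = Nothing
    where
    go _ (Leaf _ a) = a
    go i (Branch _ l r)
        | p (i <> tag l) = go i l
        | otherwise      = go (i <> tag l) r

-- Made-up sample data: (priority, payload) pairs.
tasks :: [(Int, String)]
tasks = [(16, "a"), (4, "b"), (2, "c"), (32, "d"), (8, "e")]

bySize :: Tree Size (Int, String)
bySize = fromListWith (const (Size 1)) tasks

byPriority :: Tree Priority (Int, String)
byPriority = fromListWith (Priority . fst) tasks
```

With these, search (> Size 3) bySize returns the element at index 3, search (== tag byPriority) byPriority returns the element with priority 2, and search (> Size 5) bySize is Nothing because the predicate never becomes True.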

Of course, this algorithm only works if p really does flip from False to True exactly once. This is the case if p fulfills

p (x)  implies  p (x <> y)   for all y

and we say that p is a monotonic predicate. Our two examples (> 3) and (== minimum) have this property and thus, we can finally conclude with

t !! k   = search (> k) t
winner t = search (== measure t) t
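The monotonicity condition can at least be spot-checked on a concrete sequence of measures. The sketch below evaluates a predicate on all prefix measures and tests that it never goes from True back to False. This is only a check on one sequence, not a proof of the "for all y" property, and the helper names prefixMeasures and staysTrue are hypothetical. I use Sum and Min from base in place of the Size and Priority monoids.

```haskell
import Data.Monoid (Sum(..))
import Data.Semigroup (Min(..))

-- m1, m1 <> m2, m1 <> m2 <> m3, ...
prefixMeasures :: Semigroup v => [v] -> [v]
prefixMeasures = scanl1 (<>)

-- True if p, evaluated on successive prefix measures, never switches
-- from True back to False (False < True in the Ord instance for Bool).
staysTrue :: Semigroup v => (v -> Bool) -> [v] -> Bool
staysTrue p vs = and (zipWith (<=) bs (drop 1 bs))
    where bs = map p (prefixMeasures vs)

-- Five elements of size 1 each.
sizes :: [Sum Int]
sizes = replicate 5 (Sum 1)

-- Made-up priorities.
priorities :: [Min Int]
priorities = map Min [16, 4, 2, 32, 8]
```

On this data, staysTrue (> Sum 3) sizes and staysTrue (== Min 2) priorities both hold, while staysTrue (== Sum 3) sizes fails: the count passes through 3 and moves on, so (== 3) would not be a valid predicate for search.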

Where to go from here

I hope you have enjoyed this excursion into the land of trees and monoids. If you want to stay a bit longer, implement a data structure that can both look up the k-th element and retrieve the element with the least priority at the same time. This is also known as a priority search queue.
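One possible starting point for this exercise, in case you want a hint: pairs of monoids are again monoids, so a tree can carry both annotations at once. The sketch below is only one way to do it and rests on my own choices: the pair type Both, the Sum and Min wrappers from base, the helper fromListWith and the names indexPQ and winnerPQ are all hypothetical.

```haskell
import Data.Monoid (Sum(..))
import Data.Semigroup (Min(..))

data Tree v a = Leaf   v a
              | Branch v (Tree v a) (Tree v a)

tag :: Tree v a -> v
tag (Leaf v _)     = v
tag (Branch v _ _) = v

-- Combined annotation: (number of leaves, smallest priority).
type Both = (Sum Int, Min Int)

measureBoth :: (Int, a) -> Both
measureBoth (p, _) = (Sum 1, Min p)

-- Hypothetical helper: roughly balanced tree from a non-empty list.
fromListWith :: Monoid v => (a -> v) -> [a] -> Tree v a
fromListWith _ []  = error "fromListWith: empty list"
fromListWith f [a] = Leaf (f a) a
fromListWith f as  = Branch (tag l <> tag r) l r
    where
    (xs, ys) = splitAt (length as `div` 2) as
    l = fromListWith f xs
    r = fromListWith f ys

search :: Monoid v => (v -> Bool) -> Tree v a -> Maybe a
search p t
    | p (tag t) = Just (go mempty t)
    | otherwise = Nothing
    where
    go _ (Leaf _ a) = a
    go i (Branch _ l r)
        | p (i <> tag l) = go i l
        | otherwise      = go (i <> tag l) r

-- Look at the size component only.
indexPQ :: Tree Both a -> Int -> Maybe a
indexPQ t k = search (\(n, _) -> n > Sum k) t

-- Look at the priority component only.
winnerPQ :: Tree Both a -> Maybe a
winnerPQ t = search (\(_, m) -> m == snd (tag t)) t

-- Made-up sample data: (priority, payload) pairs.
queue :: Tree Both (Int, String)
queue = fromListWith measureBoth
            [(16, "a"), (4, "b"), (2, "c"), (32, "d"), (8, "e")]
```

Each predicate ignores one half of the pair and is monotonic in the other half, so the same search serves both lookups on the same tree.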

If you still long for more, the finger tree paper knows the way; I have tried to closely match their notation. In particular, they solve the balancing issue which turns the binary search on monoids into a truly powerful tool to construct about any fancy data structure with logarithmic access times you can imagine.

Heinrich Apfelmus
