Improving the AbiWord's Piece Table [Repost]

One of the most critical parts of any word processor is the backend used to store its text. It should be fast at lookups, fast at inserting and erasing text at random locations, undo-friendly, etc.

The AbiWord backend has all these virtues and some more. It's (IMHO) the most impressive piece of code of the whole AbiWord project, and one that has been exceptionally stable over the years. In short, Jeff's code rocks^TM.

However, there is still room for improvement. I will show a modified piece table that replaces the O(n) complexity of the current insertion and lookup operations with O(log(n)).

Nota Bene: In this discussion, “n” is the number of pieces, not the number of characters.

Current Piece Table

If you already know how the piece table works, you can skip this section.

TODO: Write this section

In the meantime, you can read several good descriptions in the article Data Structures for Text Sequences (by Charles Crowley) and in this Piece Table Description.

The piece table that AbiWord uses is like the one explained in these articles, except that it has a small cache (the last served piece and the one after it are cached), and that the first lookup after a change to the piece table builds a vector that mirrors the doubly linked list of pieces (obviously an O(n) operation).

This vector increases the speed at which pieces are served (as long as it remains valid) and looked up (the lookup operation becomes O(log(n)) once the vector is up to date).

Unfortunately, the vector comes at a price. It slows down the first lookup after an insert/erase operation, it takes more memory, and it complicates the code that uses the pf_Fragments class, which has to signal when the fragments become dirty (AbiWord's “fragments” are called “pieces” in this document).
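To make the current design concrete, here is a minimal C++ sketch of a doubly linked list with a lazily rebuilt lookup vector. It is an illustration under assumptions, not AbiWord's pf_Fragments code: the names Piece, PieceList and lookup are hypothetical, and the small last-served-piece cache is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical piece: only its length matters for lookups.
struct Piece {
    std::size_t length;
};

// Doubly linked list of pieces plus a vector that mirrors it. The mirror is
// rebuilt (O(n)) on the first lookup after a change; while it stays valid,
// lookups are a binary search (O(log n)).
class PieceList {
public:
    using Iter = std::list<Piece>::iterator;

    // O(1) once the position is known, but it invalidates the mirror.
    Iter insert(Iter before, Piece p) {
        dirty_ = true;
        return pieces_.insert(before, p);
    }

    Iter end() { return pieces_.end(); }

    // Returns the piece covering document position pos.
    Iter lookup(std::size_t pos) {
        if (dirty_) rebuild();
        assert(pos < total_ && "position past the end of the document");
        // The covering piece is the last one whose start offset is <= pos.
        auto it = std::upper_bound(starts_.begin(), starts_.end(), pos);
        return iters_[static_cast<std::size_t>(it - starts_.begin()) - 1];
    }

private:
    void rebuild() {
        starts_.clear();
        iters_.clear();
        total_ = 0;
        for (Iter it = pieces_.begin(); it != pieces_.end(); ++it) {
            starts_.push_back(total_);
            iters_.push_back(it);
            total_ += it->length;
        }
        dirty_ = false;
    }

    std::list<Piece> pieces_;
    std::vector<std::size_t> starts_;  // start offset of each piece
    std::vector<Iter> iters_;          // iterator to each piece
    std::size_t total_ = 0;            // document length
    bool dirty_ = true;
};
```

Every insert sets the dirty flag, so an edit-lookup-edit-lookup pattern pays the O(n) rebuild each time, which is exactly the cost described above.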

Red-Black trees

You should be able to find plenty of explanations of how red-black trees work on the net.

Red-black trees guarantee O(log(n)) worst-case complexity for insertion, erasure, lookup, next and previous. The next and previous operations have an average complexity of O(1) (on average, you just follow about two pointers to reach the next or previous node).
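The next and previous operations on a tree with parent pointers can be sketched as follows. This is the textbook in-order successor/predecessor, not code taken from the reference implementation; the Node type is a hypothetical minimal one with colour and sizes left out.

```cpp
#include <cassert>

// Hypothetical minimal node: colour and size fields omitted.
struct Node {
    explicit Node(int v) : value(v) {}
    int value;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// In-order successor: the leftmost node of the right subtree if there is
// one, otherwise the nearest ancestor that has n in its left subtree.
// Worst case O(height), i.e. O(log n) in a red-black tree.
Node* next_node(Node* n) {
    assert(n != nullptr);
    if (n->right) {
        n = n->right;
        while (n->left) n = n->left;
        return n;
    }
    while (n->parent && n == n->parent->right) n = n->parent;
    return n->parent;  // nullptr if n was the last node
}

// In-order predecessor: the mirror image of next_node().
Node* prev_node(Node* n) {
    assert(n != nullptr);
    if (n->left) {
        n = n->left;
        while (n->right) n = n->right;
        return n;
    }
    while (n->parent && n == n->parent->left) n = n->parent;
    return n->parent;  // nullptr if n was the first node
}
```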

TODO: Give some pointers to red-black trees descriptions.

Suggested modifications

The modification that I suggest is to replace the doubly linked list with a self-balancing tree. To make the change possible, we need to establish a key and a comparison operation.

As we want to make lookups (i.e., to pass from a document's position to a piece) in O(log(n)), it seems natural to choose as key something related to the document's position range that is covered by each piece.

If we choose as key the beginning position and the size of the piece, we'll have trees like the next one:

Lookup is now clearly O(log(n)), but if we insert in the middle of the document, we have to update the “beginning position” of every piece in the second half of the document (half the tree). Since we have to walk from each node to the next one and visit O(n) nodes in the worst case, insertion becomes O(n).
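A hypothetical sketch of why the “beginning position” key makes insertion O(n): pieces are shown in document order (the order an in-order walk of the tree would visit them), and after an insertion every later piece needs its key shifted. A std::vector is used only to keep the sketch short; a tree would avoid moving the elements, but not the key fix-up loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical piece keyed by its absolute beginning position.
struct Piece {
    std::size_t start;   // beginning position in the document (the key)
    std::size_t length;  // number of characters
};

// Inserts a piece of len characters before the piece at index i (in document
// order). All pieces after it must have their key shifted by len, so the
// cost is O(n) however quickly the insertion point itself was found.
void insert_piece(std::vector<Piece>& pieces, std::size_t i, std::size_t len) {
    assert(i <= pieces.size());
    const std::size_t start =
        (i == 0) ? 0 : pieces[i - 1].start + pieces[i - 1].length;
    pieces.insert(pieces.begin() + static_cast<std::ptrdiff_t>(i), Piece{start, len});
    for (std::size_t j = i + 1; j < pieces.size(); ++j)
        pieces[j].start += len;  // the O(n) key fix-up
}
```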

Nota Bene: It may seem that it should be O(n log(n)), because the worst case of a “go to the next node” operation is O(log(n)), but we will prove later that the average cost of this operation is just O(1) (TODO: write down the proof!).
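The proof is still a TODO, but the usual argument is easy to check numerically: a complete in-order walk with “next” goes down every edge once and up every edge once, so it follows exactly 2(n-1) pointers for n nodes, i.e. fewer than two per step on average (a single step can still cost O(log(n))). The sketch below uses hypothetical names and a perfectly balanced tree instead of a red-black one, and simply counts the hops.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal node with a parent pointer.
struct Node {
    int key = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Builds a perfectly balanced tree over pool[lo, hi) and returns its root.
Node* build(std::vector<Node>& pool, int lo, int hi, Node* parent) {
    if (lo >= hi) return nullptr;
    const int mid = lo + (hi - lo) / 2;
    Node* n = &pool[mid];
    n->parent = parent;
    n->left = build(pool, lo, mid, n);
    n->right = build(pool, mid + 1, hi, n);
    return n;
}

// In-order successor that adds the number of pointers followed to hops.
Node* next_counted(Node* n, long& hops) {
    if (n->right) {
        n = n->right;
        ++hops;
        while (n->left) { n = n->left; ++hops; }
        return n;
    }
    while (n->parent && n == n->parent->right) { n = n->parent; ++hops; }
    if (n->parent) ++hops;
    return n->parent;
}

// Walks the whole (non-empty) tree in order, checking the order, and returns
// the total number of pointer hops, including the descent to the first node.
long full_walk_hops(Node* root, int& visited) {
    long hops = 0;
    Node* n = root;
    while (n->left) { n = n->left; ++hops; }
    visited = 0;
    int last = -1;
    for (; n != nullptr; n = next_counted(n, hops)) {
        assert(n->key > last);
        last = n->key;
        ++visited;
    }
    return hops;
}
```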

To solve this problem, we will “distribute” the offset information among several nodes, so it will be harder to recover the offset of a piece (O(log(n)) instead of O(1)), but it will be faster to “fix” the offsets of all the nodes in the tree (O(log(n)) instead of O(n)). We will put in each node only the size of its left subtree, the size of its right subtree, and its own size.

With this change, lookup remains O(log(n)), and insertion also becomes O(log(n)), because we no longer have to update the whole tree, only the ancestors of the modified piece (and any leaf has O(log(n)) ancestors).
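A possible node layout for this scheme, as a hypothetical sketch with the colour field omitted: each node caches the total size of its left and right subtrees next to its own size. The O(n) recompute helper is only meant for tests, to check the cached fields against the real subtrees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout for the size-augmented tree (colour omitted).
struct Node {
    std::size_t size_left = 0;   // characters in the left subtree
    std::size_t size = 0;        // characters in this piece
    std::size_t size_right = 0;  // characters in the right subtree
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Total characters covered by the subtree rooted at n (0 for an empty tree).
std::size_t subtree_size(const Node* n) {
    return n ? n->size_left + n->size + n->size_right : 0;
}

// Recomputes the cached sizes of the whole subtree bottom-up in O(n).
// Useful for checking the O(log n) incremental updates in tests.
std::size_t recompute(Node* n) {
    if (!n) return 0;
    n->size_left = recompute(n->left);
    n->size_right = recompute(n->right);
    return subtree_size(n);
}
```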

With this strategy, the new tree will look like this one:

Now, when we insert or erase a node (say it is a left child), we just have to “fix” the size_left of its parent, then repeat the fix-up with the parent, and so on up to the root. This fix-up should be done before any rebalancing of the tree starts. And, of course, the sizes must also be updated after each rotation performed during rebalancing.

total = 0;
while (node != root)
{
        total = node->size_left + node->size + node->size_right;
        if (node->parent->left == node)
                node->parent->size_left = total;
        else
                node->parent->size_right = total;
        node = node->parent;    /* climb one level; without this the loop never ends */
}
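The same fix-up can also be written as a delta propagated upwards: after linking a new leaf of k characters (or growing or shrinking a piece by k), every ancestor whose left subtree contains the node gains k in size_left, and every other ancestor gains k in size_right. This is a hypothetical sketch, not the reference implementation; dropping the size_right branch gives a variant for a node without size_right.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout (colour omitted).
struct Node {
    std::size_t size_left = 0;
    std::size_t size = 0;
    std::size_t size_right = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Propagates a change of delta characters in the subtree of n (a new leaf,
// or a piece that grew or shrank) up to the root: O(height) = O(log n).
void propagate_size_change(Node* n, std::ptrdiff_t delta) {
    // Unsigned arithmetic wraps modulo 2^N, so adding a converted negative
    // delta gives the right result as long as no cached size drops below 0.
    const std::size_t d = static_cast<std::size_t>(delta);
    for (Node* p = n->parent; p != nullptr; n = p, p = p->parent) {
        if (p->left == n)
            p->size_left += d;
        else
            p->size_right += d;  // drop this branch if size_right is removed
    }
}
```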

As rebalancing is an O(log(n)) operation for a red-black tree (the kind of self-balancing tree that I've used here), and the “fix size” operation is also O(log(n)), the whole cost of inserting or erasing a node is O(log(n)).
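Updating the sizes after a rotation only touches the two nodes involved, because the total size of the rotated subtree does not change. A hypothetical sketch of both rotations, with colour handling and the rest of the red-black fix-up omitted:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node layout (colour omitted).
struct Node {
    std::size_t size_left = 0;
    std::size_t size = 0;
    std::size_t size_right = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Left rotation around x (x's right child y moves up). Only x and y change
// subtrees, and the rotated subtree keeps its total size, so the ancestors'
// cached sizes stay valid.
void rotate_left(Node*& root, Node* x) {
    Node* y = x->right;
    assert(y != nullptr);
    x->right = y->left;                 // y's left subtree (b) moves under x
    if (y->left) y->left->parent = x;
    y->parent = x->parent;              // y takes x's place
    if (!x->parent) root = y;
    else if (x == x->parent->left) x->parent->left = y;
    else x->parent->right = y;
    y->left = x;
    x->parent = y;
    x->size_right = y->size_left;                           // size of b
    y->size_left = x->size_left + x->size + x->size_right;  // x's new subtree
}

// Right rotation around y (y's left child x moves up): the mirror image.
void rotate_right(Node*& root, Node* y) {
    Node* x = y->left;
    assert(x != nullptr);
    y->left = x->right;                 // x's right subtree (b) moves under y
    if (x->right) x->right->parent = y;
    x->parent = y->parent;              // x takes y's place
    if (!y->parent) root = x;
    else if (y == y->parent->left) y->parent->left = x;
    else y->parent->right = x;
    x->right = y;
    y->parent = x;
    y->size_left = x->size_right;                            // size of b
    x->size_right = y->size_left + y->size + y->size_right;  // y's new subtree
}
```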

To calculate the offset of a node, we start with the size_left field of this node, and add size_left + size of every ancestor whose right subtree contains the node. For example, to calculate the offset of the node that has a size of 12, we start with its size_left (0), and jump to its parent (size 1). As we are the left child, we don't take the parent's contribution into account. We then jump to the grandparent (size 8), and this time we're in the right subtree of the grandparent, so we add the grandparent's size_left (9) and size (8) to the offset (0 + 9 + 8 = 17). We jump to the parent of the grandparent (the root of the tree), and as we're in its left subtree, we don't take the root's contribution into account. We're done: the offset of the node is 17.

offset = node->size_left;
while (node != root)
{
        if (node->parent->right == node)
                offset += node->parent->size_left + node->parent->size;
        node = node->parent;
}
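A self-contained version of the offset walk, run on the example from the previous paragraph, might look like this. It stops at the node without a parent instead of comparing against root; the Node type is a hypothetical minimal one, and the grandparent's 9-character left subtree is represented only by its cached size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal node: only the fields the offset walk reads.
struct Node {
    std::size_t size_left = 0;
    std::size_t size = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Document offset of the first character of node's piece: its own size_left
// plus size_left + size of every ancestor whose right subtree contains it.
std::size_t offset_of(const Node* node) {
    std::size_t offset = node->size_left;
    for (; node->parent != nullptr; node = node->parent)
        if (node->parent->right == node)
            offset += node->parent->size_left + node->parent->size;
    return offset;
}
```

Building the example tree (root, grandparent of size 8, parent of size 1, node of size 12) gives an offset of 17 for the node, 29 for its parent, and 9 for the grandparent.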

The lookup operation is trivially O(log(n)), thanks to the invariants of the red-black tree. (The cost of a lookup is linear in the height of the tree, and the height of a red-black tree with n nodes is never more than 2 log(n+1).)
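The lookup itself is not spelled out above; a plausible sketch, assuming the same size_left/size fields, descends from the root and subtracts the sizes it skips. Note that it never reads size_right. The function name find_piece and its out-parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal node (colour omitted).
struct Node {
    std::size_t size_left = 0;
    std::size_t size = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Finds the piece covering document position pos, starting at root. On
// success, *within receives pos relative to the start of that piece.
// Returns nullptr when pos is past the end of the document. O(height).
Node* find_piece(Node* n, std::size_t pos, std::size_t* within) {
    while (n != nullptr) {
        if (pos < n->size_left) {
            n = n->left;                      // pos is in the left subtree
        } else if (pos < n->size_left + n->size) {
            *within = pos - n->size_left;     // pos is inside this piece
            return n;
        } else {
            pos -= n->size_left + n->size;    // skip left subtree and piece
            n = n->right;
        }
    }
    return nullptr;
}
```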

Never assume, measure!

I've performed two performance tests. In the first one, I insert 1,000,000 characters into the piece table, each at a random position. The piece table ends up with roughly 1,000,000 pieces. That's the equivalent of a dense 30,000-page document (with a good deal of formatting changes).

The mean time per insertion goes from ~2 μs (2 × 10^-6 seconds) when the piece table is empty to ~10 μs when it has 1 million pieces (on a 750 MHz computer with 256 MB of memory). The experimental data are the blue squares, and the theoretical curve is the black line. I guess that the two points that lie visibly off the theoretical curve are due to a process switch between the start and the end of a measurement. (One of the ten other processes running on my computer probably got some CPU cycles while I was measuring.)

So far, so good. The delete operation, however, holds more surprises than its peer, the insert operation. To interpret the next figure, we should divide it into two parts. The first one is the lower branch, which starts at 0 pieces and between 2 and 3 μs, and ends at 250,000 pieces and between 7 and 8 μs. The second branch (the upper one) goes from 250,000 pieces back to 0 pieces.

When a deletion hits a piece table made of big pieces, it splits a piece in two. When it hits a piece table full of pieces that contain only one character, it deletes a whole piece.
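The two cases can be sketched on a plain vector of pieces (hypothetical names; a linear scan stands in for the tree lookup). Besides splitting and destroying, the sketch also handles a case the text does not mention: when the character is the first or last one of a longer piece, the piece just shrinks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical piece: a run of characters in an append-only buffer.
struct Piece {
    std::size_t buf_start;  // offset of the first character in the buffer
    std::size_t length;     // number of characters
};

// Erases the character at document position pos. A linear scan stands in
// for the tree lookup to keep the sketch short.
void erase_char(std::vector<Piece>& pieces, std::size_t pos) {
    std::size_t i = 0;
    for (; i < pieces.size() && pos >= pieces[i].length; ++i)
        pos -= pieces[i].length;
    assert(i < pieces.size() && "position past the end of the document");

    const auto at = pieces.begin() + static_cast<std::ptrdiff_t>(i);
    Piece& p = pieces[i];
    if (p.length == 1) {
        pieces.erase(at);                    // one-character piece: destroyed
    } else if (pos == 0) {
        ++p.buf_start;                       // first character: trim the front
        --p.length;
    } else if (pos == p.length - 1) {
        --p.length;                          // last character: trim the back
    } else {
        const Piece right{p.buf_start + pos + 1, p.length - pos - 1};
        p.length = pos;                      // interior: split in two
        pieces.insert(at + 1, right);
    }
}
```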

The delete operation starts by creating more and more pieces, until it reaches an equilibrium point at which destructions equal creations (in our figure, when the piece table has 250,000 pieces). After this point destructions dominate, and we eventually come back to 0 pieces.

Now, why does the delete operation show this hysteresis? My guess is that in the second branch the tree is extremely scattered in the computer's memory. The tree had 250,000 nodes, packed into several MB. As we delete them at random, the mean distance (in memory) between two nodes increases, and this distance induces more and more page misses (which become the dominant cost). But I'm just guessing.

We're not yet lost, as we can reduce the number of page misses. To reduce them I will focus on:

Reduce the memory size. The “color” of the node can be stored without adding a single bit to the size of the node structure. The size_right field can also be removed entirely without any ill effect (all the operations keep the same speed, maybe even get a bit faster) [DONE]. The node structure can be allocated from a memory pool, which eliminates the per-allocation bookkeeping memory that the allocator adds to each node.
Increase the spatial locality of the nodes. Using the memory pool (again), we can keep all the nodes together, and thus put all the information needed to walk the tree in the same page (or in a small set of pages).
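A minimal memory pool of the kind described above might look like the following hypothetical sketch: nodes are carved out of large contiguous blocks, and freed nodes are chained onto a free list through their parent pointer so they can be reused. Packing the colour bit into a pointer is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical tree node (colour bit not shown).
struct Node {
    std::size_t size_left = 0;
    std::size_t size = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Nodes are carved out of large contiguous blocks, so there is no per-node
// allocator header and nodes allocated one after another are adjacent in
// memory. Freed nodes are chained through their parent pointer and reused.
class NodePool {
public:
    explicit NodePool(std::size_t block_size = 4096) : block_size_(block_size) {
        assert(block_size_ > 0);
    }

    Node* allocate() {
        if (free_list_ != nullptr) {
            Node* n = free_list_;
            free_list_ = n->parent;
            *n = Node();                      // reset all fields
            return n;
        }
        if (blocks_.empty() || used_ == block_size_) {
            blocks_.push_back(std::unique_ptr<Node[]>(new Node[block_size_]));
            used_ = 0;
        }
        return &blocks_.back()[used_++];
    }

    void deallocate(Node* n) {
        n->parent = free_list_;
        free_list_ = n;
    }

private:
    std::size_t block_size_;
    std::size_t used_ = 0;
    Node* free_list_ = nullptr;
    std::vector<std::unique_ptr<Node[]>> blocks_;
};
```

Reusing freed slots first keeps the live nodes inside the blocks already allocated, which is the locality effect the second point is after.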

Conclusion

I've shown that it's possible to have a backend in which all the operations have a worst case of O(log(n)), while the usual cases (moving forward and backward) are still resolved in O(1) average time.

Is it a priority for the AbiWord project to switch to this kind of piece table? IMHO, no. It's not even close to a priority. As I said in the introduction, the current piece table is already a high-quality implementation, and it has received several useful performance improvements over time.

The current bottleneck in AbiWord is the layout code (TODO: give some figures. Assertions without facts suck.). Nevertheless, the same kind of structure that I propose here can also be used to solve the O(n) operations that AbiWord has in the layout code.

Anyway, let's say that I wanted to solve this problem, not because I considered it very important, but because it was the second time that I tried to solve it, and I knew that it was possible :-)

That said, once the performance problems of the layout code are fixed, the piece table will eventually show up in profilers, and I hope that these modifications will help at that time.

I've written a reference implementation of the piece table described here. You can download the code: PieceTable2.zip. In the zip you will find two different backends for the piece table, a red-black tree and a doubly linked list. It also contains an almost complete regression test. Update: this version contains a piece table without the size_right member.

The code has been tested with MSVC 6 and gcc 2.95.3.

TODO: Remove exceptions (AbiWord doesn't like C++ exceptions), fix the (2) functions that have sub-basic exception guarantees, complete the regression test (mostly done), fully comment the code, and profile it (done).
