Improving AbiWord's Piece Table

One of the most critical parts of any word processor is the backend used to store its text. It should be fast to look up, fast to insert and erase text at a random location, undo friendly, etc.

The AbiWord backend has all these virtues and more. It's (IMHO) the most impressive piece of code in the whole AbiWord project, and one that has been exceptionally stable over the years. In short, Jeff's code rocks™.

However, improvement is still possible. I will show a modified piece table that replaces the current O(n) complexity of the insertion and lookup operations with O(log(n)).

Nota Bene: In this discussion, “n” is the number of pieces, not the number of characters.

Current Piece Table

If you already know how the piece table works, you can skip this section.

TODO: Write this section

In the meantime, you can read several good descriptions in the article Data Structures for Text Sequences (by Charles Crowley) and in this Piece Table Description.

The piece table that AbiWord uses is like the one explained in these articles, except that it has a little cache (the last served piece and the next one in the piece table are cached), and that after a change to the piece table, the first lookup creates a vector that mirrors the doubly linked list of pieces (obviously, an O(n) operation).

This vector increases the speed at which pieces are served (as long as it remains valid) and looked up (the lookup operation becomes O(log(n)) once the vector is up to date).
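For illustration, that vector lookup amounts to a binary search over cached absolute offsets. The sketch below shows the idea only; the names are hypothetical and do not reflect the actual pf_Fragments API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch of the vector-based lookup: each entry caches a
 * piece pointer together with its absolute starting offset, and a
 * binary search finds the piece covering a given document position. */
struct frag_entry {
    size_t offset;   /* absolute document position where the piece starts */
    void  *piece;    /* the cached piece (opaque here)                     */
};

/* Return the index of the piece covering pos: the last entry whose
 * offset is <= pos. Assumes n > 0 and entries sorted by offset. */
static size_t frag_lookup(const struct frag_entry *v, size_t n, size_t pos)
{
    size_t lo = 0, hi = n;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid].offset <= pos)
            lo = mid;        /* the covering piece is at mid or later  */
        else
            hi = mid;        /* the covering piece is strictly before mid */
    }
    return lo;
}
```

Once the vector is dirty it must be rebuilt from the list before this search is valid again, which is exactly the O(n) cost discussed above.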

Unfortunately, the vector comes with a price. It slows down the first lookup after an insert/erase operation, it takes more memory, and it complicates the code that uses the pf_Fragments class, which has to signal when the frags become dirty (the AbiWord “fragments” are called “pieces” in this document).

Red-Black trees

You should be able to find plenty of explanations of how red-black trees work on the net.

The complexity guarantees of red-black trees are O(log(n)) for insertion, erase, lookup, next, and previous in the worst case. The next and previous operations have an average complexity of O(1) (on average, you just follow two pointers to reach the next or previous node).
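As a sketch of why next is cheap on average, here is a standard in-order successor for a parent-linked binary tree; over a full traversal every edge is crossed at most twice, so the amortized cost per step is O(1). The node layout is illustrative:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *parent, *left, *right;
    int key;
};

/* In-order successor: O(log n) in the worst case, O(1) amortized over
 * a full traversal, since every edge is walked at most twice. */
static struct node *tree_next(struct node *n)
{
    if (n->right) {                 /* leftmost node of the right subtree */
        n = n->right;
        while (n->left)
            n = n->left;
        return n;
    }
    while (n->parent && n->parent->right == n)
        n = n->parent;              /* climb while we are a right child   */
    return n->parent;               /* NULL when n was the last node      */
}
```

The symmetric `tree_prev` follows by swapping left and right.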

TODO: Give some pointers to red-black trees descriptions.

Suggested modifications

The modification that I suggest is to replace the doubly linked list with an auto-balanced tree. We need to establish a key and a comparison operation to make the change possible.

As we want to make lookups (i.e., to pass from a document position to a piece) in O(log(n)), it seems natural to choose as key something related to the range of document positions covered by each piece.

If we choose as key the beginning position and the size of the piece, we'll have trees like the next one:

It's obvious that lookup is now done in O(log(n)), but if we do an insertion in the middle of the document, we will have to update the “beginning position” of the upper half of the pieces of the document (half the tree). As we need to walk from one node to the next, visiting O(n) nodes in the worst case, the insertion operation will be O(n).

Nota Bene: It may seem that it should be O(n log(n)), because the worst case of a “go to the next node” operation is O(log(n)), but we will prove later that the average cost of this operation is just O(1) (TODO: write down the proof!).

To solve this problem, we will “distribute” the offset information among several nodes, so it will be harder to recover the offset of a piece (O(log(n)) instead of O(1)), but it will be faster to “fix” the offsets of all the nodes in the tree (O(log(n)) instead of O(n)). We will store in each node only the size of its left subtree, the size of its right subtree, and its own size.
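A minimal node layout for this scheme could look as follows. This is a sketch: the three size fields follow the text, everything else (field names, the trailing piece fields) is an assumption:

```c
#include <assert.h>
#include <stddef.h>

enum rb_color { RED, BLACK };

/* A piece-table node augmented with subtree sizes, as described above.
 * Only size_left, size and size_right are needed to recover offsets;
 * the color is required by the red-black balancing. */
struct piece_node {
    struct piece_node *parent, *left, *right;
    enum rb_color color;
    size_t size_left;   /* total character count of the left subtree  */
    size_t size;        /* character count of this piece              */
    size_t size_right;  /* total character count of the right subtree */
    /* ... the usual piece fields: buffer id, buffer offset, ...      */
};

/* Total number of characters stored in the subtree rooted at n. */
static size_t subtree_size(const struct piece_node *n)
{
    return n ? n->size_left + n->size + n->size_right : 0;
}
```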

With this change, the lookup operation remains O(log(n)), and the insertion operation also becomes O(log(n)), as we no longer have to update the whole tree, but just the ancestors of the modified piece (and any leaf has O(log(n)) ancestors).

With this strategy, the new tree will look like this one:

Now, if we insert/erase a node (say the node becomes a left son), we just “fix” the size_left of its parent, then repeat the fix-up with the grandparent, and so on up to the root. This fix-up should be done before the eventual rebalance of the tree starts. And, of course, the sizes should also be updated after each rotation during the rebalancing of the tree.

while (node != root)
{
    total = node->size_left + node->size + node->size_right;
    if (node->parent->left == node)
        node->parent->size_left = total;
    else
        node->parent->size_right = total;
    node = node->parent;
}

As the rebalance of the tree is an O(log(n)) operation for a red-black tree (the variant of auto-balanced tree that I've used here), and the “fix size” operation is also O(log(n)), the whole cost of the insertion/erase of a node is O(log(n)).
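As an example of keeping the sizes correct through the rebalance, here is a left rotation with the size fix-up folded in. This is a sketch: the color handling and the symmetric right rotation are omitted, and only the two rotated nodes need their sizes recomputed:

```c
#include <assert.h>
#include <stddef.h>

struct pnode {
    struct pnode *parent, *left, *right;
    size_t size_left, size, size_right;
};

/* Left rotation at x (y = x->right moves above x), maintaining the
 * size fields: x inherits y's old left subtree, and y's new left
 * subtree is the whole subtree now rooted at x. */
static void rotate_left(struct pnode **root, struct pnode *x)
{
    struct pnode *y = x->right;

    x->right = y->left;
    if (y->left)
        y->left->parent = x;
    x->size_right = y->size_left;        /* x's new right subtree       */

    y->size_left = x->size_left + x->size + x->size_right;
                                         /* everything now under y->left */
    y->parent = x->parent;
    if (!x->parent)
        *root = y;
    else if (x->parent->left == x)
        x->parent->left = y;
    else
        x->parent->right = y;
    y->left = x;
    x->parent = y;
}
```

Note that `y->size` and `y->size_right` are untouched: y's right subtree does not move during a left rotation.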

To calculate the offset of a node, we start with the size_left field of the node, and add the size_left + size of every ancestor for which the node lies in the right subtree. For example, to calculate the offset of the node that has a size of 12, we start with its size_left (0), and we jump to its parent (size 1). As we are the left son, we don't take into account the contribution of the parent. We then jump to the grandparent (size 8), and this time we're in the right subtree of the grandparent, so we add the size_left and the size of the grandparent to the previous offset (0 + 9 + 8 = 17). We jump to the parent of the grandparent (the root of the tree), and as we're in the left subtree, we don't take into account the contribution of the root. We're done: the offset of the node is 17.

offset = node->size_left;
while (node != root)
{
    if (node->parent->right == node)
        offset += node->parent->size_left + node->parent->size;
    node = node->parent;
}

The lookup operation is trivially O(log(n)), due to the invariants of the red-black tree. (The lookup operation is a linear function of the height of the tree, and the height of the tree is always less than 2 log(n) in an RB tree.)
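The lookup can be written as a simple descent that discards one subtree per step. A sketch, using the size fields described above (the function and parameter names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

struct pnode {
    struct pnode *left, *right;
    size_t size_left, size, size_right;
};

/* Find the piece covering document position pos, returning the offset
 * inside that piece through *inner. Each step discards one subtree,
 * so the cost is bounded by the tree height, i.e. O(log n). */
static struct pnode *lookup(struct pnode *n, size_t pos, size_t *inner)
{
    while (n) {
        if (pos < n->size_left) {
            n = n->left;                    /* pos is in the left subtree   */
        } else if (pos < n->size_left + n->size) {
            *inner = pos - n->size_left;    /* pos is inside this piece     */
            return n;
        } else {
            pos -= n->size_left + n->size;  /* skip left subtree and piece  */
            n = n->right;
        }
    }
    return NULL;                            /* pos is past the end of text  */
}
```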

Never assume, measure!

I've performed two performance tests. In the first one, I threw 1,000,000 characters at the piece table, each one of them at a random position. The piece table finishes with roughly 1,000,000 pieces. That's the equivalent of a dense document of 30,000 pages (with a good deal of format changes).

The mean time for the insertion operation goes from ~2 μs (that's 2 × 10⁻⁶ seconds) when the piece table is empty to ~10 μs when it has 1 million pieces (on a 750 MHz computer with 256 MB of memory). The experimental data are the blue squares, and the theoretical curve is the black line. I guess that the two dots that are visibly off the theoretical curve are just due to a process switch between the start and the end of the measurement. (One of the other ten processes that were running on my computer must have got several cycles while I was measuring.)

So far, so good. The delete operation, however, hides more surprises than its peer, the insert operation. To interpret the next figure, we should divide it in two parts. The first one is the lower branch, which starts between 2 and 3 μs at 0 pieces and ends between 7 and 8 μs at 250,000 pieces. The second branch (the upper one) goes from 250,000 pieces back down to 0 pieces.

When the delete operation is performed on a piece table with one big piece, it splits the piece in two. When it is performed on a piece table full of pieces that contain only one character, it deletes a piece.

The delete operation starts by creating more and more pieces, until it reaches a stability point at which the number of destructions equals the number of creations (in our figure, when the piece table has 250,000 pieces); after this point the number of destructions becomes the dominant factor, and we end up back at 0 pieces.

Now, why does the delete operation show this hysteresis? My guess is that the tree is extremely dispersed in the computer's memory in the second branch. The tree had 250,000 nodes, which were packed into several MB. When we start deleting them randomly, the mean distance (in the computer's memory) between two nodes increases, and this distance induces more and more page misses (which become the dominant factor). But I'm just guessing.

We're not yet lost, as we can reduce the number of page misses. To reduce them I will focus on:

  1. Reduce the memory size. The “color” of the node can be optimized to the point of not adding a single bit to the size of the node structure. The size_right field can also be suppressed entirely without any bad consequence (all the operations keep the same speed, maybe even a bit faster) [DONE]. The node structure can be allocated using a memory pool. That way the bookkeeping memory that the allocator uses to handle each allocation can be optimized away.
  2. Increase the spatial locality of the nodes. Using the memory pool (again), we can put all the nodes together, and thus keep all the information needed to walk through the tree in the same page (or in a reduced set of pages).
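Both points can be served by a very simple pool: one contiguous slab plus a free list. A minimal sketch, assuming a fixed capacity; growth, alignment of real node fields, and error handling are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A minimal fixed-capacity node pool. Allocating nodes from one
 * contiguous slab keeps them on few memory pages, and a free list
 * recycles erased nodes without touching the general allocator. */
struct pool_node {
    struct pool_node *next_free;   /* overlaps the payload while free */
    /* ... real node fields ...   */
};

struct node_pool {
    struct pool_node *slab;        /* one contiguous allocation */
    struct pool_node *free_list;   /* recycled slots            */
    size_t used, capacity;
};

static int pool_init(struct node_pool *p, size_t capacity)
{
    p->slab = malloc(capacity * sizeof *p->slab);
    p->free_list = NULL;
    p->used = 0;
    p->capacity = capacity;
    return p->slab != NULL;
}

static struct pool_node *pool_alloc(struct node_pool *p)
{
    if (p->free_list) {                   /* reuse a freed slot first */
        struct pool_node *n = p->free_list;
        p->free_list = n->next_free;
        return n;
    }
    if (p->used < p->capacity)
        return &p->slab[p->used++];       /* carve from the slab */
    return NULL;                          /* pool exhausted */
}

static void pool_free(struct node_pool *p, struct pool_node *n)
{
    n->next_free = p->free_list;          /* push onto the free list */
    p->free_list = n;
}
```

Because consecutive allocations are adjacent in the slab, a freshly built tree sits in a handful of pages instead of being scattered across the heap.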

Conclusion

I've shown that it's possible to have a backend in which all the operations have a worst case of O(log(n)), and in which the usual cases (forward and backward movement) can still be resolved in an average time of O(1).

Is it a priority for the AbiWord project to switch to this kind of piece table? IMHO, no. It's not even near a priority. As I said in the introduction, the current piece table is already a high-quality implementation, and it has received several useful performance improvements over time.

The current bottleneck in AbiWord is in the layout part (TODO: give some figures. Assertions without facts suck.). Nevertheless, the same kind of structure that I propose here can also be used to solve the O(n) operations that AbiWord has in the layout code.

Anyway, let's say that I wanted to solve this problem not because I considered it very important, but because it was the second time that I had tried to solve it, and I knew that it was possible :-)

That said, once the performance problems of the layout code are fixed, the piece table will eventually show its head in profilers, and I hope that these modifications will help at that time.

I've made a reference implementation of a piece table like the one I describe here. You can download the code: PieceTable2.zip. In the zip you will find two different backends for the piece table, a red-black tree and a doubly linked list. It also contains an almost complete regression test. Update: This version contains a piece table without the size_right member.

The code has been tested with MSVC 6 and gcc 2.95.3.

TODO: Remove exceptions (AbiWord doesn't like C++ exceptions), fix the (2) functions that have sub-basic exception guarantees, complete the regression test (mostly done), fully comment the code, and profile it (done).
