https://github.com/couchbaselabs/vellum

Building an FST

To build an FST, create a new builder using the New() method. This method takes an io.Writer as an argument. As the FST is being built, data will be streamed to the writer as soon as possible. With this builder you MUST insert keys in lexicographic order. Inserting keys out of order will result in an error. After inserting the last key into the builder, you MUST call Close() on the builder. This will flush all remaining data to the underlying writer.

In memory:

  var buf bytes.Buffer
builder, err := vellum.New(&buf, nil)
if err != nil {
log.Fatal(err)
}

To disk:

  f, err := os.Create("/tmp/vellum.fst")
if err != nil {
log.Fatal(err)
}
builder, err := vellum.New(f, nil)
if err != nil {
log.Fatal(err)
}

MUST insert keys in lexicographic order:

err = builder.Insert([]byte("cat"), 1)
if err != nil {
log.Fatal(err)
} err = builder.Insert([]byte("dog"), 2)
if err != nil {
log.Fatal(err)
} err = builder.Insert([]byte("fish"), 3)
if err != nil {
log.Fatal(err)
} err = builder.Close()
if err != nil {
log.Fatal(err)
}

Using an FST

After closing the builder, the data can be used to instantiate an FST. If the data was written to disk, you can use the Open()method to mmap the file. If the data is already in memory, or you wish to load/mmap the data yourself, you can instantiate the FST with the Load() method.

Load in memory:

  fst, err := vellum.Load(buf.Bytes())
if err != nil {
log.Fatal(err)
}

Open from disk:

  fst, err := vellum.Open("/tmp/vellum.fst")
if err != nil {
log.Fatal(err)
}

Get key/value:

  val, exists, err = fst.Get([]byte("dog"))
if err != nil {
log.Fatal(err)
}
if exists {
fmt.Printf("contains dog with val: %d\n", val)
} else {
fmt.Printf("does not contain dog")
}

Iterate key/values:

  itr, err := fst.Iterator(startKeyInclusive, endKeyExclusive)
for err == nil {
key, val := itr.Current()
fmt.Printf("contains key: %s val: %d", key, val)
err = itr.Next()
}
if err != nil {
log.Fatal(err)
}

How does the FST get built?

A full example of the implementation is beyond the scope of this README, but let's consider a small example where we want to insert 3 key/value pairs.

First we insert "are" with the value 4.

Next, we insert "ate" with the value 2.

Notice how the values associated with the transitions were adjusted so that by summing them while traversing we still get the expected value.

At this point, we see that state 5 looks like state 3, and state 4 looks like state 2. But, we cannot yet combine them because future inserts could change this.

Now, we insert "see" with value 3. Once it has been added, we now know that states 5 and 4 can longer change. Since they are identical to 3 and 2, we replace them.

Again, we see that states 7 and 8 appear to be identical to 2 and 3.

Having inserted our last key, we call Close() on the builder.

Now, states 7 and 8 can safely be replaced with 2 and 3.

For additional information, see the references at the bottom of this document.

A Go library implementing an FST (finite state transducer)——mark下的更多相关文章

  1. Finite State Transducers

    一, 简介 Finite State Transducers 简称 FST, 中文名:有穷状态转换器.在自然语言处理等领域有很大应用,其功能类似于字典的功能(STL 中的map,C# 中的Dictio ...

  2. Finite State Machine 是什么?

    状态机(Finite State Machine):状态机由状态寄存器和组合逻辑电路构成,能够根据控制信号按照预先设定的状态进行状态转移,是协调相关信号动       作.完成特定操作的控制中心. 类 ...

  3. Finite State Machine

    Contents [hide]  1 Description 2 Components 3 C# - FSMSystem.cs 4 Example Description This is a Dete ...

  4. 证明与计算(7): 有限状态机(Finite State Machine)

    什么是有限状态机(Finite State Machine)? 什么是确定性有限状态机(deterministic finite automaton, DFA )? 什么是非确定性有限状态机(nond ...

  5. paper:synthesizable finite state machine design techniques using the new systemverilog 3.0 enhancements 之 standard verilog FSM conding styles(二段式)

    1.Two always block style with combinational outputs(Good Style) 对应的代码如下: 2段式总结: (1)the combinational ...

  6. paper:synthesizable finite state machine design techniques using the new systemverilog 3.0 enhancements 之 FSM Coding Goals

    1.the fsm coding style should be easily modifiable to change state encoding and FSM styles. FSM 的的 状 ...

  7. FPGA学习笔记(七)——FSM(Finite State Machine,有限状态机)设计

    FPGA设计中,最重要的设计思想就是状态机的设计思想!状态机的本质就是对具有逻辑顺序和时序规律的事件的一种描述方法,它有三个要素:状态.输入.输出:状态也叫做状态变量(比如可以用电机的不同转速作为状态 ...

  8. paper:synthesizable finite state machine design techniques using the new systemverilog 3.0 enhancements 之 standard verilog FSM conding styles(三段式)

    Three always block style with registered outputs(Good style)

  9. TCP Operational Overview and the TCP Finite State Machine (FSM) http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF.htm

    http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF.htm   http://tcpipgu ...

随机推荐

  1. 性能测试培训day1

    测试本质: 1构造测试数据和期望结果 2执行 3验证 自动化测试: 写完代码,单元测试测代码逻辑,单元测试搞清楚代码逻辑就行了(白盒测试)先静态,运行前用工具扫描BUG例如(a==11写成a=11), ...

  2. python 连接sqlserver: pymssql

    停了一个月,终于还是把这个做了,工作需要!!!在装pymssql时,一直报错,确定了要先装freetds: 1. 安装freetds时报错,搜索到要先进行如下操作: brew unlink freet ...

  3. 编译时报错,找不到指定路径下的command,而路径是正确的。

    使用的Fedora 18 64位的系统kernel,内核为3.6.10.按照要求使用yum install *** 安装各项工具. path路径使用提供的toolchain,各种路径也安装正确,却发现 ...

  4. 关于 <customErrors> 标记的“mode”属性设置为“Off”的问题的解决方案

    用 权限问题 <customErrors> 标记的“mode”属性设置为“Off”. 权限问题标记的“mode”属性设置为“Off”.说明: 服务器上出现应用程序错误.此应用程序的当前自定 ...

  5. 动态规划之最长公共子序列(LCS)

             在字符串S中按照其先后顺序依次取出若干个字符,并讲它们排列成一个新的字符串,这个字符串就被称为原字符串的子串          有两个字符串S1和S2,求一个最长公共子串,即求字符串 ...

  6. Flask基础(3):session、flash、特殊装饰器、蓝图、路由正则匹配、上下文管理 & flask-session

    Session: Flask 默认将 session 以加密的形式放到了浏览器的 cookie 中 Flask 的 session 就是一个字典,字典有什么方法 session 就有什么方法 flas ...

  7. httpClient使用总结

    前记 最近有个需求,需要根据商品id获取商品详情: 首先想到的是在浏览器里输入url按回车就可以了:或者在linux中使用curl+url来发起一个http请求; 但如果是要在java程序中发出htt ...

  8. Linux下汇编语言学习笔记46 ---

    这是17年暑假学习Linux汇编语言的笔记记录,参考书目为清华大学出版社 Jeff Duntemann著 梁晓辉译<汇编语言基于Linux环境>的书,喜欢看原版书的同学可以看<Ass ...

  9. Ubuntu12.04之修改密码

    Ubuntu 12.04 默认root没有密码 修改密码方式如下: test@localhost:~$ sudo passwd root [sudo] password for test: 输入新的 ...

  10. POJ 3255_Roadblocks

    题意: 无向图,求单源次短路,每条边可以走多次. 分析: 对于每个点记录最短路和次短路,次短路可以是由最短路+边或者是次短路+边更新而来.在更新每个点的最短路时,相应更新次短路,如果最短路更新了,就令 ...