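Notes on Programming Pearls (1): Problems, Algorithms, Data Structures
1 Stating the problem precisely
Chapter 1 stresses stating the problem precisely, which is the first step of program development: "problem definition".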
1.1 Precise problem statement
1) input: a file containing at most 10^7 positive integers (each < 10^7); it is an error if any integer occurs twice; no other data is associated with the integer
2) output: a sorted list in increasing order
3) constraints: at most about 1MB available in main memory; ample disk storage; run time of at most several minutes (a run time of 10s need not be decreased)
1.2 Program design
1) mergesort with work files
read the file once from the input, sort it with the aid of work files that are read and written many times, and then write it once

2) 40-pass algorithm
if we store each number in 4 bytes (32-bit int), we can store 250,000 numbers in 1MB (1 megabyte / 4 bytes).
we use a program that makes 40 passes over the input file (10,000,000 / 250,000 = 40). The first pass reads into memory the integers in 0 ~ 249,999, sorts them and writes them to the output; the 40th pass handles 9,750,000 ~ 9,999,999
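The multi-pass idea can be sketched in C as follows. This is only an illustration under stated assumptions: the input is an in-memory array instead of a file, and the names `multipass_sort`, `cmp_int`, `max` and `chunk` are hypothetical, not from the book.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints, written to avoid overflow on subtraction */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort in[0..n-1] (distinct values in [0, max)) into out[] while keeping
   at most `chunk` values in memory at a time.
   Returns the number of passes over the input, or -1 if malloc fails. */
int multipass_sort(const int *in, int n, int max, int chunk, int *out)
{
    int *buf = malloc((size_t)chunk * sizeof *buf);
    int passes = 0, written = 0;
    if (buf == NULL)
        return -1;
    for (int lo = 0; lo < max; lo += chunk) {
        int hi = lo + chunk;            /* this pass keeps values in [lo, hi) */
        int k = 0;
        for (int j = 0; j < n; j++) {   /* one full pass over the input */
            if (in[j] >= lo && in[j] < hi) {
                assert(k < chunk);      /* holds when the values are distinct */
                buf[k++] = in[j];
            }
        }
        qsort(buf, (size_t)k, sizeof *buf, cmp_int);
        for (int j = 0; j < k; j++)
            out[written++] = buf[j];
        passes++;
    }
    free(buf);
    return passes;
}
```

With max = 10,000,000 and chunk = 250,000 the outer loop runs 40 times, which matches the count above.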

3) read once without intermediate files
only if we could represent all the integers in the input file in 1MB of main memory (even with the bitmap structure described below, 10^7 integers still need 1.25MB > 1MB)

1.3 Implementation sketch
we use a bitmap data structure to represent the file as a string of 10^7 bits in which the ith bit is on if and only if the integer i is in the file
E.g. store the set {1, 2, 3, 5, 8, 13} in a string of 20 bits (positions numbered from 0)
bit:    0  1  1  1  0  1  0  0  1  0  0  0  0  1  0  0  0  0  0  0
index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
// n is the number of bits in the vector (in this case 10,000,000)
// 1) initialize set to empty
for i = [0, n)
    bit[i] = 0
// 2) insert present elements into the set
for each i in the input file
    bit[i] = 1
// 3) write sorted output
for i = [0, n)
    if bit[i] == 1
        write i on the output file
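A minimal C sketch of the three phases, assuming the input is an in-memory array rather than a file. The helper names `set_bit`, `test_bit` and `bitmap_sort` are illustrative and do not come from the book.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned) * CHAR_BIT)

/* Turn on bit i of the bit vector. */
static void set_bit(unsigned *bits, int i)
{
    bits[i / BITS_PER_WORD] |= 1u << (i % BITS_PER_WORD);
}

/* Return 1 if bit i is on, 0 otherwise. */
static int test_bit(const unsigned *bits, int i)
{
    return (int)((bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1u);
}

/* Sort the distinct values in[0..count-1], each in [0, n), into out[].
   Returns the number of values written, or -1 if allocation fails. */
int bitmap_sort(const int *in, int count, int n, int *out)
{
    size_t words = ((size_t)n + BITS_PER_WORD - 1) / BITS_PER_WORD;
    unsigned *bits = calloc(words, sizeof *bits);   /* 1) empty set */
    int written = 0;
    if (bits == NULL)
        return -1;
    for (int j = 0; j < count; j++) {               /* 2) insert elements */
        assert(in[j] >= 0 && in[j] < n);
        set_bit(bits, in[j]);
    }
    for (int i = 0; i < n; i++)                     /* 3) sorted output */
        if (test_bit(bits, i))
            out[written++] = i;
    free(bits);
    return written;
}
```

Packing the bits into `unsigned` words is what brings the storage down to one bit per possible value; a plain `char bit[n]` array would use a whole byte per value.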
1.4 Principles
design = right problem + bitmap data structure + multiple-pass algorithm + time-space tradeoff
2 Three algorithms
Chapter 2 opens with three problems and then derives the algorithm that solves each of them.
2.1 Binary search
Given a sequential file that contains at most 4x10^9 32-bit integers in random order, find a 32-bit integer that is not in the file.
How would you solve it with ample main memory? -- bitmap (2^32 bits)
or using several external "scratch" files but only a few hundred bytes of main memory? -- binary search

the insight is that we can probe a range by counting the elements above
and below its midpoint: either the upper or the lower range has at most
half the elements in the total range. Because the total range has a
missing element, the smaller half must also have a missing element.
the only drawback of binary search on a table is that the entire table must be known and sorted in advance.
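The probing idea can be sketched in C on a scaled-down version of the problem. This is an illustration under assumptions, not the book's file-based program: the data sits in an array, the two halves of the array stand in for the scratch files, and the names `find_missing` and `bits` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return a value in [0, 2^bits) that does not occur in a[0..n-1].
   Precondition: 1 <= bits <= 31, every a[j] < 2^bits, and n < 2^bits,
   so at least one value must be missing. a[] is reordered in place. */
unsigned find_missing(unsigned *a, size_t n, int bits)
{
    unsigned prefix = 0;                 /* high bits chosen so far */
    assert(bits >= 1 && bits <= 31 && n < ((size_t)1 << bits));
    for (int b = bits - 1; b >= 0; b--) {
        unsigned mask = 1u << b;
        size_t zeros = 0;
        /* probe: move the elements whose bit b is 0 to the front */
        for (size_t j = 0; j < n; j++) {
            if ((a[j] & mask) == 0) {
                unsigned tmp = a[zeros];
                a[zeros] = a[j];
                a[j] = tmp;
                zeros++;
            }
        }
        size_t ones = n - zeros;
        if (zeros <= ones) {
            n = zeros;                   /* lower half has fewer elements */
        } else {
            a += zeros;                  /* upper half has fewer elements */
            n = ones;
            prefix |= mask;
        }
    }
    return prefix;                       /* here n == 0: prefix is absent */
}
```

Each step keeps the half with fewer elements, so the count stays strictly below the size of the remaining range; when the range shrinks to a single value, no element is left that equals it.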
2.2 Rotate -> reverse
Rotate a one-dimensional vector x of n elements left by i positions. For instance, with n=8 and i=3, the vector abcdefgh is rotated into defghabc.
Can you rotate the vector in time proportional to n using only a few dozen extra bytes of storage?
starting with ab, we reverse a to get a^r b, reverse b to get a^r b^r, and then reverse the whole thing to get (a^r b^r)^r, which is exactly ba
reverse(0, i-1) // cbadefgh
reverse(i, n-1) // cbahgfed
reverse(0, n-1) // defghabc
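The three reversals can be written out in C roughly like this. It is a sketch for a `char` vector; the wrapper name `rotate_left` is an assumption, and `reverse` is given an explicit array argument that the notes leave implicit.

```c
#include <assert.h>

/* Reverse x[lo..hi] in place (both ends inclusive). */
static void reverse(char *x, int lo, int hi)
{
    while (lo < hi) {
        char tmp = x[lo];
        x[lo++] = x[hi];
        x[hi--] = tmp;
    }
}

/* Rotate x[0..n-1] left by i positions, where 0 <= i <= n. */
void rotate_left(char *x, int n, int i)
{
    assert(0 <= i && i <= n);
    reverse(x, 0, i - 1);   /* a b     -> a^r b        */
    reverse(x, i, n - 1);   /* a^r b   -> a^r b^r      */
    reverse(x, 0, n - 1);   /* a^r b^r -> (a^r b^r)^r  */
}
```

Every element is swapped at most twice, so the time is proportional to n, and the only extra storage is the single `tmp` variable.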
2.3 Signatures
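Given a dictionary of English words, find all sets of anagrams. For instance, "pots", "stop" and "tops" are all anagrams of each other.
The idea is to give each word a signature, its letters in sorted order, so that all words in the same anagram class share one signature (the three words above all have the signature "opst").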
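One common way to compute such a signature is to sort the letters of the word, so that anagrams end up with identical strings. The C sketch below assumes that choice; the function names `signature` and `is_anagram` and the 64-byte buffers are illustrative assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for single characters */
static int cmp_char(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Write the signature of word (its letters in sorted order) into sig,
   which must have room for strlen(word) + 1 bytes. */
void signature(const char *word, char *sig)
{
    strcpy(sig, word);
    qsort(sig, strlen(sig), 1, cmp_char);
}

/* Two words are anagrams exactly when their signatures are equal.
   This sketch only handles words shorter than 64 characters. */
int is_anagram(const char *a, const char *b)
{
    char sa[64], sb[64];
    assert(strlen(a) < sizeof sa && strlen(b) < sizeof sb);
    signature(a, sa);
    signature(b, sb);
    return strcmp(sa, sb) == 0;
}
```

To find all anagram sets in a dictionary, one would compute the signature of every word, sort the words by signature, and read off each run of equal signatures as one set.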

3 Four principles
Chapter 3 gives four principles and focuses on one view of programming: "data does indeed structure programs"
1) rework repeated code into arrays
a long stretch of similar code is often best expressed as an array
2) encapsulate complex structures
define a sophisticated data structure in abstract terms, that is, by the operations on it, and express those operations as a class
3) use advanced tools when possible
Hypertext, name-value pairs, spreadsheets, databases, languages are powerful tools
4) let the data structure the program
before writing code, thoroughly understand the input, the output and the intermediate data structures
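As a small illustration of the first principle, here is a hypothetical example that is not taken from the book: twelve near-identical branches for the lengths of the months collapse into one table and one loop. The names `days_in_month` and `day_of_year` are made up for this sketch, and leap years are ignored.

```c
#include <assert.h>

/* The table replaces a chain of twelve near-identical if/else branches. */
static const int days_in_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Day of the year (1-based) for month 1..12 in a non-leap year. */
int day_of_year(int month, int day)
{
    int total = day;
    assert(month >= 1 && month <= 12);
    for (int m = 0; m < month - 1; m++)   /* add the full months before it */
        total += days_in_month[m];
    return total;
}
```

Changing the calendar rules now means editing the data in the table and leaving the control flow alone.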
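4 Verifying correctness
In integrated-circuit (IC) design there is a dedicated role called the chip verification engineer. One of the methods they commonly use is formal verification, which includes equivalence checking, model checking and theorem proving.
The program verification described in this chapter (which is not software testing) is very similar to formal verification in the chip industry. By analogy with that industry, as the division of labor becomes finer, the software field will also see more verification engineers.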
4.1 Binary search
determine whether the sorted array x[0..n-1] contains the target element t
mustbe(range): the key idea is that we always know that if t is anywhere in x[0..n-1], then it must be in a certain range of x
1) sketch
/* sketch */
initialize range to 0..n-1
loop
    { invariant: mustbe(range) }
    if range is empty,
        break and report that t is not in the array
    compute m, the middle of the range
    use m as a probe to shrink the range
    if t is found during the shrinking process,
        break and report its position
2) refine
/* refine */
lo = 0; hi = n-1
loop
    { mustbe(lo, hi) }
    if lo > hi
        p = -1; break
    mid = lo + (hi-lo)/2
    case
        x[mid] < t: lo = mid + 1
        x[mid] == t: p = mid; break
        x[mid] > t: hi = mid - 1
3) program
/* program */
{ mustbe(0, n-1) }
lo = 0; hi = n-1
{ mustbe(lo,hi) }
loop
    { mustbe(lo,hi) }
    if lo > hi
        { lo > hi && mustbe(lo,hi) }
        { t is not in the array }
        p = -1; break
    { mustbe(lo,hi) && lo <= hi }
    mid = lo + (hi-lo)/2
    { mustbe(lo,hi) && lo <= mid <= hi }
    case
        x[mid] < t:
            { mustbe(lo,hi) && cantbe(0,mid) }
            { mustbe(mid+1,hi) }
            lo = mid + 1
            { mustbe(lo,hi) }
        x[mid] == t:
            { mustbe(lo,hi) }
            p = mid; break
        x[mid] > t:
            { mustbe(lo,hi) && cantbe(mid, n-1) }
            { mustbe(lo,mid-1) }
            hi = mid - 1
            { mustbe(lo,hi) }
    { mustbe(lo,hi) }
4.2 Program verification
1) assertions (inputs, variables and outputs)
2) sequential control structures
"do this statement and then that statement" -- place assertions between them and analyze each step of the program's progress individually
3) selection control structures
"if", "case": one of many choices is selected -- consider each of the several choices individually
4) iteration control structures
initialization: the invariant is true when the loop is executed for the first time
preservation: if the invariant is true before an iteration of the loop, it is still true after that iteration
termination: the desired result is true whenever execution of the loop terminates

5) functions
precondition: the condition on the state (inputs, variables) that must be true before the function is called
postcondition: what the function will guarantee on termination
int bsearch( int t, int x[], int n )
/* precondition: x[0] <= x[1] <= ... <= x[n-1]
postcondition:
result == -1 => t not present in x
0 <= result < n => x[result] == t
*/
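One way to make such a contract executable is to turn it into `assert` calls. The sketch below is an assumption-laden illustration: the function is named `checked_bsearch` to avoid clashing with the standard library's `bsearch`, `is_sorted` is a hypothetical helper, and only the "found" half of the postcondition is checked, since verifying "t not present" would need a linear scan.

```c
#include <assert.h>

/* Check the precondition x[0] <= x[1] <= ... <= x[n-1]. */
static int is_sorted(const int x[], int n)
{
    for (int i = 1; i < n; i++)
        if (x[i - 1] > x[i])
            return 0;
    return 1;
}

/* Binary search with its contract checked at run time. */
int checked_bsearch(int t, const int x[], int n)
{
    int lo = 0, hi = n - 1, result = -1;
    assert(is_sorted(x, n));                    /* precondition */
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        assert(lo <= mid && mid <= hi);         /* probe stays in range */
        if (x[mid] < t)
            lo = mid + 1;
        else if (x[mid] > t)
            hi = mid - 1;
        else {
            result = mid;
            break;
        }
    }
    /* postcondition: either -1, or a valid index holding t */
    assert(result == -1 ||
           (0 <= result && result < n && x[result] == t));
    return result;
}
```

The `is_sorted` check costs O(n) and so defeats the point of binary search in production; compiling with `NDEBUG` defined removes all the assertions.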
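5 Implementation
This chapter follows directly from the previous one and keeps binary search as the example to show the whole process of implementing the program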
5.1 coding
/* return (any) position if t is in sorted x[0..n-1]
   or -1 if t is not present;
   x and n are assumed to be globals declared elsewhere */
int binarysearch(DataType t)
{
    int lo, hi, mid;
    lo = 0;
    hi = n - 1;
    while (lo <= hi)   /* <= so that a one-element range is still probed */
    {
        mid = lo + (hi - lo) / 2;
        if (x[mid] < t)
            lo = mid + 1;
        else if (x[mid] == t)
            return mid;
        else /* x[mid] > t */
            hi = mid - 1;
    }
    return -1;
}
5.2 testing
5.3 debugging
5.4 timing