
(1) Integers: two's complement representation

Suppose an integer w is represented with N bits. The leftmost bit is the sign bit: 0 means the number is non-negative, 1 means it is negative.


(2) Floating point: floating-point numbers use a representation similar to scientific notation


That is: x = S × M × 2^E, where S is the sign (+1 or -1), M is the significand, and E is the exponent.

For example: -1.10110 × 2^10


  Normally E = exp - bias; for float, bias = 2^(8-1) - 1 = 127, and M = 1 + frac.


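  A. When exp = 0xFF: if frac is all zeros, the value is ±∞; if frac is not all zeros, the value is NaN (Not a Number).

  B. When exp = 0x00, the value is denormalized. Here exp = 0, but E is not 0 - bias; it is defined as E = 1 - bias.

   Also, M is not 1 + frac but M = frac, so exp = 0 with frac = 0 represents ±0.

  C. When exp is neither 0xFF nor 0x00, the value is normalized; only in this case do E = exp - bias and M = 1 + frac hold.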



(For an example, see float_twice in the data lab below.)



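When a number that needs more precision than a float's frac field can hold is converted to float, it has to be rounded. Floating-point rounding generally follows two rules:

  1. Round to the nearest representable value: round up when the larger neighbor is closer, and round down when the smaller neighbor is closer.

  2. When both neighbors are equally close, round to even: choose the neighbor whose last kept bit is 0.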


(For an example, see float_i2f in the data lab below.)


    #include <stdio.h>

    int main(int argc, char *argv[]) {
        double dt = 0x0.0000008p+0;  /* 2^-25: half of d0's initial excess over 1 */
        double d0 = 0x1.0000010p+0;  /* 1 + 2^-24: halfway between two floats */
        for (int i = 0; i < 6; ++i) {
            printf("=======\n");
            printf("double: %a \n", d0);
            printf("float: %a \n", (float)d0);
            d0 += dt;
        }
        return 0;
    }


    =======
    double: 0x1.000001p+0
    float: 0x1p+0
    =======
    double: 0x1.0000018p+0
    float: 0x1.000002p+0
    =======
    double: 0x1.000002p+0
    float: 0x1.000002p+0
    =======
    double: 0x1.0000028p+0
    float: 0x1.000002p+0
    =======
    double: 0x1.000003p+0
    float: 0x1.000004p+0
    =======
    double: 0x1.0000038p+0
    float: 0x1.000004p+0

data lab

    /*
     * CS:APP Data Lab
     *
     * <Please put your name and userid here>
     *
     * bits.c - Source file with your solutions to the Lab.
     *          This is the file you will hand in to your instructor.
     *
     * WARNING: Do not include the <stdio.h> header; it confuses the dlc
     * compiler. You can still use printf for debugging without including
     * <stdio.h>, although you might get a compiler warning. In general,
     * it's not good practice to ignore compiler warnings, but in this
     * case it's OK.
     */

    #if 0
    /*
     * Instructions to Students:
     *
     * STEP 1: Read the following instructions carefully.
     */

    You will provide your solution to the Data Lab by
    editing the collection of functions in this source file.

      Replace the "return" statement in each function with one
      or more lines of C code that implements the function. Your code
      must conform to the following style:

      int Funct(arg1, arg2, ...) {
          /* brief description of how your implementation works */
          int var1 = Expr1;
          ...
          int varM = ExprM;

          varJ = ExprJ;
          ...
          varN = ExprN;
          return ExprR;
      }

      Each "Expr" is an expression using ONLY the following:
      1. Integer constants 0 through 255 (0xFF), inclusive. You are
         not allowed to use big constants such as 0xffffffff.
      2. Function arguments and local variables (no global variables).
      3. Unary integer operations ! ~
      4. Binary integer operations & ^ | + << >>

      Some of the problems restrict the set of allowed operators even further.
      Each "Expr" may consist of multiple operators. You are not restricted to
      one operator per line.

      You are expressly forbidden to:
      1. Use any control constructs such as if, do, while, for, switch, etc.
      2. Define or use any macros.
      3. Define any additional functions in this file.
      4. Call any functions.
      5. Use any other operations, such as &&, ||, -, or ?:
      6. Use any form of casting.
      7. Use any data type other than int. This implies that you
         cannot use arrays, structs, or unions.

      You may assume that your machine:
      1. Uses 2s complement, 32-bit representations of integers.
      2. Performs right shifts arithmetically.
      3. Has unpredictable behavior when shifting an integer by more
         than the word size.

      /*
       * pow2plus1 - returns 2^x + 1, where 0 <= x <= 31
       */
      int pow2plus1(int x) {
          /* exploit ability of shifts to compute powers of 2 */
          return (1 << x) + 1;
      }

      /*
       * pow2plus4 - returns 2^x + 4, where 0 <= x <= 31
       */
      int pow2plus4(int x) {
          /* exploit ability of shifts to compute powers of 2 */
          int result = (1 << x);
          result += 4;
          return result;
      }

      For the problems that require you to implement floating-point operations,
      the coding rules are less strict. You are allowed to use looping and
      conditional control. You are allowed to use both ints and unsigneds.
      You can use arbitrary integer and unsigned constants.

      You are expressly forbidden to:
      1. Define or use any macros.
      2. Define any additional functions in this file.
      3. Call any functions.
      4. Use any form of casting.
      5. Use any data type other than int or unsigned. This means that you
         cannot use arrays, structs, or unions.
      6. Use any floating point data types, operations, or constants.

    NOTES:
      1. Use the dlc (data lab checker) compiler (described in the handout) to
         check the legality of your solutions.
      2. Each function has a maximum number of operators (! ~ & ^ | + << >>)
         that you are allowed to use for your implementation of the function.
         The max operator count is checked by dlc. Note that '=' is not
         counted; you may use as many of these as you want without penalty.
      3. Use the btest test harness to check your functions for correctness.
      4. Use the BDD checker to formally verify your functions.
      5. The maximum number of ops for each function is given in the
         header comment for each function. If there are any inconsistencies
         between the maximum ops in the writeup and in this file, consider
         this file the authoritative source.

    /*
     * STEP 2: Modify the following functions according to the coding rules.
     *
     *   1. Use the dlc compiler to check that your solutions conform
     *      to the coding rules.
     *   2. Use the BDD checker to formally verify that your solutions produce
     *      the correct answers.
     */

    #endif

    /*
     * bitAnd - x&y using only ~ and |
     *   Example: bitAnd(6, 5) = 4
     *   Legal ops: ~ |
     *   Max ops: 8
     *   Rating: 1
     */
    int bitAnd(int x, int y) {
        /* De Morgan: x & y == ~(~x | ~y) */
        return ~((~x) | (~y));
    }

    /*
     * getByte - Extract byte n from word x
     *   Bytes numbered from 0 (LSB) to 3 (MSB)
     *   Examples: getByte(0x12345678,1) = 0x56
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 6
     *   Rating: 2
     */
    int getByte(int x, int n) {
        /* n << 3 is n * 8, the bit offset of byte n */
        int y = x >> (n << 3);
        return y & 0xFF;
    }

    /*
     * logicalShift - shift x to the right by n, using a logical shift
     *   Can assume that 0 <= n <= 31
     *   Examples: logicalShift(0x87654321,4) = 0x08765432
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 20
     *   Rating: 3
     */
    int logicalShift(int x, int n) {
        int y = x >> n;  /* arithmetic shift; may copy the sign bit */

        /* build a mask whose top n bits are 0 and whose other bits are 1 */
        int helper = (1 << 31) >> n;
        helper = ~(helper << 1);
        return y & helper;
    }

    /*
     * bitCount - returns count of number of 1's in word
     *   Examples: bitCount(5) = 2, bitCount(7) = 3
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 40
     *   Rating: 4
     */
    int bitCount(int x) {
        int mk1, mk2, mk3, mk4, mk5, result;
        mk5 = 0xff | (0xff << 8);   /* 0x0000ffff */
        mk4 = 0xff | (0xff << 16);  /* 0x00ff00ff */
        mk3 = 0x0f | (0x0f << 8);
        mk3 = mk3 | (mk3 << 16);    /* 0x0f0f0f0f */
        mk2 = 0x33 | (0x33 << 8);
        mk2 = mk2 | (mk2 << 16);    /* 0x33333333 */
        mk1 = 0x55 | (0x55 << 8);
        mk1 = mk1 | (mk1 << 16);    /* 0x55555555 */

        // First count the 1s in each of the 16 adjacent bit pairs and store
        // each count in its own two bits, then repeat on wider fields,
        // i.e.: 32->16, 16->8, 8->4, 4->2, 2->1
        result = (mk1 & x) + (mk1 & (x >> 1));
        result = (mk2 & result) + (mk2 & (result >> 2));
        result = mk3 & (result + (result >> 4));
        result = mk4 & (result + (result >> 8));
        result = mk5 & (result + (result >> 16));
        return result;
    }

    /*
     * bang - Compute !x without using !
     *   Examples: bang(3) = 0, bang(0) = 1
     *   Legal ops: ~ & ^ | + << >>
     *   Max ops: 12
     *   Rating: 4
     */
    int bang(int x) {
        /* x | -x has its sign bit set exactly when x != 0, so the
           arithmetic shift gives -1 or 0, and adding 1 gives 0 or 1 */
        return ((x | (~x + 1)) >> 31) + 1;
    }

    /*
     * tmin - return minimum two's complement integer
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 4
     *   Rating: 1
     */
    int tmin(void) {
        return 1 << 31;
    }

    /*
     * fitsBits - return 1 if x can be represented as an
     *   n-bit, two's complement integer.
     *   1 <= n <= 32
     *   Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 15
     *   Rating: 2
     */
    int fitsBits(int x, int n) {
        /*
         * An n-bit number has n-1 bits besides the sign bit. Viewed as a
         * 32-bit int, the top 32-(n-1) bits must all be 0 for a
         * non-negative number and all be 1 for a negative number.
         */
        int signX = x >> 31;
        int y = x >> (n + (~0));  /* n + ~0 is n - 1 */
        return !(signX ^ y);
    }

    /*
     * divpwr2 - Compute x/(2^n), for 0 <= n <= 30
     *   Round toward zero
     *   Examples: divpwr2(15,1) = 7, divpwr2(-33,4) = -2
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 15
     *   Rating: 2
     */
    int divpwr2(int x, int n) {
        int signX = x >> 31;
        int bias = (1 << n) + (~0);  /* 2^n - 1 */
        bias = signX & bias;         /* add the bias only when x is negative */
        return (x + bias) >> n;
    }

    /*
     * negate - return -x
     *   Example: negate(1) = -1.
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 5
     *   Rating: 2
     */
    int negate(int x) {
        return (~x) + 1;
    }

    /*
     * isPositive - return 1 if x > 0, return 0 otherwise
     *   Example: isPositive(-1) = 0.
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 8
     *   Rating: 3
     */
    int isPositive(int x) {
        /* x is positive unless it is negative (x >> 31 != 0) or zero (!x) */
        return !((x >> 31) | (!x));
    }

    /*
     * isLessOrEqual - if x <= y then return 1, else return 0
     *   Example: isLessOrEqual(4,5) = 1.
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 24
     *   Rating: 3
     */
    int isLessOrEqual(int x, int y) {
        int signX = x >> 31;
        int signY = y >> 31;
        int signSame = !(signX ^ signY);
        int diff = x + (~y) + 1;  /* x - y; cannot overflow when the signs match */
        int diffNegZero = (diff >> 31) | (!diff);
        /* same sign: check x - y <= 0; different signs: x <= y iff x is negative */
        return (signSame & diffNegZero) | ((!signSame) & signX);
    }

    /*
     * ilog2 - return floor(log base 2 of x), where x > 0
     *   Example: ilog2(16) = 4
     *   Legal ops: ! ~ & ^ | + << >>
     *   Max ops: 90
     *   Rating: 4
     */
    int ilog2(int x) {
        /* binary search for the highest set bit, building its index bit by bit */
        int bn = (!!(x >> 16)) << 4;
        bn = bn + ((!!(x >> (bn + 8))) << 3);
        bn = bn + ((!!(x >> (bn + 4))) << 2);
        bn = bn + ((!!(x >> (bn + 2))) << 1);
        bn = bn + (!!(x >> (bn + 1)));
        return bn;
    }

    /*
     * float_neg - Return bit-level equivalent of expression -f for
     *   floating point argument f.
     *   Both the argument and result are passed as unsigned int's, but
     *   they are to be interpreted as the bit-level representations of
     *   single-precision floating point values.
     *   When argument is NaN, return argument.
     *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
     *   Max ops: 10
     *   Rating: 2
     */
    unsigned float_neg(unsigned uf) {
        /*
         * s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx
         * s is the sign bit. When the x bits are all zero this represents
         * inf; when they are not all zero it is NaN.
         */
        unsigned fracMask, expMask;
        unsigned fracPart, expPart;
        fracMask = (1 << 23) - 1;
        expMask = 0xff << 23;
        fracPart = uf & fracMask;
        expPart = uf & expMask;
        if ((expMask == expPart) && fracPart) {
            return uf;
        }

        /* adding 0x80000000 modulo 2^32 flips only the sign bit */
        return (1 << 31) + uf;
    }

    /*
     * float_i2f - Return bit-level equivalent of expression (float) x
     *   Result is returned as unsigned int, but
     *   it is to be interpreted as the bit-level representation of a
     *   single-precision floating point values.
     *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
     *   Max ops: 30
     *   Rating: 4
     */
    unsigned float_i2f(int x) {
        unsigned signX, expPart, fracPart;
        unsigned absX;
        unsigned hp = 1 << 31;
        unsigned shiftLeft = 0;
        unsigned roundTail;
        unsigned result;
        if (0 == x) {
            return 0;
        }
        absX = x;
        signX = 0;
        if (x < 0) {
            absX = -x;
            signX = hp;
        }
        /* normalize: shift left until the leading 1 reaches bit 31 */
        while (0 == (hp & absX)) {
            absX = absX << 1;
            shiftLeft += 1;
        }
        expPart = 127 + 31 - shiftLeft;
        roundTail = absX & 0xff;                 /* the 8 bits that get dropped */
        fracPart = (~(hp >> 8)) & (absX >> 8);   /* 23 frac bits, leading 1 removed */
        result = signX | (expPart << 23) | fracPart;
        // Round up when closer to the larger value; round down when closer
        // to the smaller value.
        if (roundTail > 0x80) {
            result += 1;
        } else if (0x80 == roundTail) {
            // On a tie, round to even using the bit to the left:
            // round up if it is 1, round down if it is 0.
            if (fracPart & 1) {
                result += 1;
            }
        }
        return result;
    }

    /*
     * float_twice - Return bit-level equivalent of expression 2*f for
     *   floating point argument f.
     *   Both the argument and result are passed as unsigned int's, but
     *   they are to be interpreted as the bit-level representation of
     *   single-precision floating point values.
     *   When argument is NaN, return argument
     *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
     *   Max ops: 30
     *   Rating: 4
     */
    unsigned float_twice(unsigned uf) {
        unsigned signX, expPart, fracPart;
        unsigned helper = 1 << 31;
        unsigned fracMask = (1 << 23) - 1;
        if (0 == uf) { // positive 0
            return 0;
        }
        if (helper == uf) { // negative 0
            return helper;
        }
        signX = uf & helper;
        expPart = (uf >> 23) & 0xff;
        if (expPart == 0xff) { // inf or NaN: return unchanged
            return uf;
        }
        fracPart = uf & fracMask;
        if (0 == expPart) { // denormalized value
            fracPart = fracPart << 1;
            if (fracPart & (1 << 23)) { // doubled value is now normalized
                fracPart = fracPart & fracMask;
                expPart += 1;
            }
        } else {
            expPart += 1;
            if (expPart == 0xff) { // overflow: result must be inf, not NaN
                fracPart = 0;
            }
        }
        return signX | (expPart << 23) | fracPart;
    }

