An Analysis of Hamming Weight Algorithms (Repost)

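While reading some code I came across the problem of counting the 1 bits in a 32-bit binary number. The algorithm struck me as remarkably clever, so I am recording my notes here for future reference.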

The problem: count the number of 1 bits in a 64-bit binary number.

An algorithm that solves this problem is not hard to come up with, but finding the optimal solution is considerably harder.

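The first approach that usually comes to mind is repeated division by 2, taking the remainder each time. Since division is expensive and the divisor is 2, the natural refinement is to replace the division with a right shift and to read the lowest bit with & 0x1, which makes the algorithm more efficient. Even so, a 64-bit word still costs about 64 AND operations, 64 shifts, and 64 additions. If the word length is unbounded and denoted N, both versions run in O(N) time.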
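As a minimal sketch of the shift-and-mask approach just described, the loop below inspects one bit per iteration. The function name popcount_naive is my own choice for illustration and does not appear in the original code.

```c
#include <stdint.h>

/* Hypothetical name: counts 1 bits by testing the lowest bit and shifting. */
int popcount_naive(uint64_t x) {
    int count = 0;
    for (int i = 0; i < 64; i++) {
        count += (int)(x & 0x1); /* add the lowest bit to the running total */
        x >>= 1;                 /* move the next bit into the lowest position */
    }
    return count;
}
```

The loop always runs 64 times regardless of the input, which is the O(N) behavior the rest of this post tries to improve on.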

There are, of course, better algorithms.

This problem is really an application of the Hamming weight, also known as population count, popcount, or sideways sum. See http://en.wikipedia.org/wiki/Hamming_weight for details; parts of what follows are taken from Wikipedia.

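Wikipedia defines it as follows: "The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used." Applied to a sequence of binary symbols, the Hamming weight of a bit string is simply the number of 1s it contains, so the problem above becomes that of computing the Hamming weight of a string.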

The algorithms given on Wikipedia are analyzed below. They are implemented in C.

    //types and constants used in the functions below
    #include <stdint.h>

    typedef uint64_t uint64;  //exactly 64 bits wide

    const uint64 m1  = 0x5555555555555555; //binary: 0101...
    const uint64 m2  = 0x3333333333333333; //binary: 00110011..
    const uint64 m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
    const uint64 m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
    const uint64 m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
    const uint64 m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
    const uint64 hff = 0xffffffffffffffff; //binary: all ones
    const uint64 h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...

    //This is a naive implementation, shown for comparison,
    //and to help in understanding the better functions.
    //It uses 24 arithmetic operations (shift, add, and).
    int popcount_1(uint64 x) {
        x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits
        x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits
        x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits
        x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits
        x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
        x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits
        return (int)x;
    }

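Analysis: popcount_1 is the foundation for the algorithms that follow. Once its idea is understood, the later versions are just local optimizations of it.
First, note that a 64-bit string contains at most 64 ones, and any value from 0 to 64 fits in the low 8 bits of the string (2^8 > 64). Likewise, since 2^2 > 2, the number of 1s in a 2-bit string always fits in those same two bits.
To describe the basic idea, we simplify to an 8-bit string.
Let abcdefgh denote an 8-bit binary string, where a, b, c, d, e, f, g, h are each in {0, 1}.
The output we want is out = a+b+c+d+e+f+g+h.
The first step in the code above is x = (x & m1) + ((x >> 1) & m1):
x & m1 = 0b0d0f0h
(x >> 1) & m1 = 0a0c0e0g
Their sum is [a+b]2[c+d]2[e+f]2[g+h]2, where [v]2 denotes a 2-bit binary field whose value is v (v given in decimal). For a 64-bit string there are 32 such 2-bit groups: the 64 bits are paired off, and each pair is used to store the number of 1s it originally contained.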

Step two is x = (x & m2) + ((x >> 2) & m2). Again using the 8-bit string:
x & m2 = 00[c+d]2 00[g+h]2
(x >> 2) & m2 = 00[a+b]2 00[e+f]2
Their sum is [a+b+c+d]4[e+f+g+h]4.

Step three is x = (x & m4) + ((x >> 4) & m4):
x & m4 = 0000[e+f+g+h]4
(x >> 4) & m4 = 0000[a+b+c+d]4
Their sum is [a+b+c+d+e+f+g+h]8.
This solves the 8-bit problem. For a 64-bit string, three more steps are needed, as the code shows.
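The three 8-bit steps walked through above can be sketched as separate functions, so that each intermediate value can be inspected. The function names are hypothetical, and the masks 0x55, 0x33 and 0x0F are simply the 8-bit counterparts of m1, m2 and m4.

```c
#include <stdint.h>

/* Step one: each 2-bit field receives the number of 1s it contained. */
uint8_t step1_pairs(uint8_t x) {
    return (uint8_t)((x & 0x55) + ((x >> 1) & 0x55));
}

/* Step two: each 4-bit field receives the sum of its two 2-bit fields. */
uint8_t step2_nibbles(uint8_t x) {
    return (uint8_t)((x & 0x33) + ((x >> 2) & 0x33));
}

/* Step three: the byte receives the sum of its two 4-bit fields. */
uint8_t step3_byte(uint8_t x) {
    return (uint8_t)((x & 0x0F) + ((x >> 4) & 0x0F));
}

/* All three steps chained: the number of 1s in an 8-bit value. */
int popcount8(uint8_t x) {
    return step3_byte(step2_nibbles(step1_pairs(x)));
}
```

For x = 10110110 (0xB6), step one gives 01 10 01 01 (0x65), step two gives 0011 0010 (0x32), and step three gives 5, which matches the five 1s in the input.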

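At this point it is clear that the algorithm is divide and conquer: each step splits the problem into subproblems and merges their results to shrink the problem, so the computation looks like an inverted binary tree. The n bits are first grouped into adjacent pairs, and shifts and masks are used so that each pair stores its own sum in place. Seen from above, the problem has been reduced to one of size n/2 "bits" (the word bit is used loosely here; unit would be more accurate). The units are then paired and summed again in the same way, and after lg(n) steps the final answer is obtained.
From this analysis the algorithm runs in O(lg n) time.
For a 64-bit string it uses only 24 arithmetic operations, far fewer than the earlier algorithm.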

    //This uses fewer arithmetic operations than any other known
    //implementation on machines with slow multiplication.
    //It uses 17 arithmetic operations.
    int popcount_2(uint64 x) {
        x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
        x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
        x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
        x += x >>  8;  //put count of each 16 bits into their lowest 8 bits
        x += x >> 16;  //put count of each 32 bits into their lowest 8 bits
        x += x >> 32;  //put count of each 64 bits into their lowest 8 bits
        return (int)(x & 0x7f);
    }

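popcount_2 optimizes popcount_1.
The first step relies on this fact: for a 2-bit value ab, ab - 0a equals the number of 1s in ab.
A short proof: if a is 0, then 0a = 0, subtracting 0 changes nothing, and b is the result.
If a is 1, there are only two cases, 10 - 01 = 01 and 11 - 01 = 10, and both agree with the claim.
Because 0a is never larger than ab, no pair ever borrows from its neighbor, so all 32 pairs can be handled by a single 64-bit subtraction.
So x -= (x >> 1) & m1 gives the same result as x = (x & m1) + ((x >> 1) & m1) while saving one operation. (I had a question here: signed numbers in two's complement turn subtraction into addition, so the two cost the same, but x is unsigned. How is unsigned subtraction implemented at the machine level, and is one subtraction cheaper than an addition plus an AND? As far as I can tell, two's-complement hardware uses the same adder for signed and unsigned operands, so a subtraction costs about as much as an addition and the AND really is saved.)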
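To convince myself that the two forms of step one agree, here is a small check, offered as a sketch rather than a proof. The function names are hypothetical. The check runs every 16-bit pattern, repeated four times across the word, through both forms.

```c
#include <stdint.h>

/* Step one of popcount_1: two masks and an addition. */
uint64_t pair_counts_add(uint64_t x) {
    const uint64_t m1 = 0x5555555555555555ULL;
    return (x & m1) + ((x >> 1) & m1);
}

/* Step one of popcount_2: one mask and a subtraction. */
uint64_t pair_counts_sub(uint64_t x) {
    const uint64_t m1 = 0x5555555555555555ULL;
    return x - ((x >> 1) & m1);
}

/* Returns 1 if both forms agree for every 16-bit pattern repeated
   four times across the 64-bit word, and 0 otherwise. */
int pair_forms_agree(void) {
    for (uint64_t p = 0; p < 65536; p++) {
        uint64_t x = p * 0x0001000100010001ULL; /* copy p into all four 16-bit lanes */
        if (pair_counts_add(x) != pair_counts_sub(x))
            return 0;
    }
    return 1;
}
```

This covers every combination of eight adjacent pairs, which is enough to show that no borrow crosses a pair boundary.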

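Step two is the same as in popcount_1. Step three masks once after the addition instead of masking both operands, which is safe because each 4-bit sum is at most 8 and cannot carry into the neighboring field. After these steps x = [a]8[b]8[c]8[d]8[e]8[f]8[g]8[h]8, where a through h now denote the number of 1s in each of the eight bytes.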

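After step four, x = [H8|a+b]16[H8|c+d]16[H8|e+f]16[H8|g+h]16. Here H8 stands for the high 8 bits of each 16-bit group; we do not care about their value (although it is easy to work out), so they are simply written as H. Since the low 8 bits can represent any value from 0 to 64, there is no risk of the low byte overflowing.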

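Likewise, after step five x = [H24|a+b+c+d]32[H24|e+f+g+h]32.
After step six x = [H56|a+b+c+d+e+f+g+h]64.
Step seven uses the mask 0x7f to extract the count from the low 8 bits. (0xff gives the same result: the count is at most 64, so bit 7 of the low byte is always 0.)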
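To check the remark about the final mask, the sketch below is popcount_2 with the mask passed in as a parameter. The name popcount_2_masked and the extra parameter are my own additions for illustration, and the constants are repeated locally so the block stands alone.

```c
#include <stdint.h>

/* popcount_2 with a caller-supplied final mask (hypothetical variant). */
int popcount_2_masked(uint64_t x, uint64_t final_mask) {
    const uint64_t m1 = 0x5555555555555555ULL;
    const uint64_t m2 = 0x3333333333333333ULL;
    const uint64_t m4 = 0x0f0f0f0f0f0f0f0fULL;
    x -= (x >> 1) & m1;             /* count of each 2 bits */
    x = (x & m2) + ((x >> 2) & m2); /* count of each 4 bits */
    x = (x + (x >> 4)) & m4;        /* count of each 8 bits */
    x += x >> 8;                    /* low byte of each 16 bits holds its count */
    x += x >> 16;                   /* low byte of each 32 bits holds its count */
    x += x >> 32;                   /* low byte of the word holds the total */
    return (int)(x & final_mask);
}
```

Even for the all-ones input the low byte ends up as 0x40 (64), so masking with 0x7f or with 0xff returns the same value.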

    //This uses fewer arithmetic operations than any other known
    //implementation on machines with fast multiplication.
    //It uses 12 arithmetic operations, one of which is a multiply.
    int popcount_3(uint64 x) {
        x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
        x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
        x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits
        return (int)((x * h01) >> 56);  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
    }

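popcount_3 optimizes further. Only the last step is new: return (x * h01) >> 56;
Before this step, x = [a]8[b]8[c]8[d]8[e]8[f]8[g]8[h]8.
x * h01 = x * 0x0101010101010101 = x + (x<<8) + (x<<16) + ... + (x<<56)
so the product is [a+b+c+d+e+f+g+h|L56], where L56 denotes the low 56 bits.
Shifting right by 56 bits leaves a+b+c+d+e+f+g+h, which is the answer.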
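The multiplication step can be isolated as a byte-summing routine and compared against a plain loop. This is a sketch with hypothetical function names. The multiply version is only valid while the total of the eight bytes fits in 8 bits, which always holds inside popcount_3 because the total is at most 64.

```c
#include <stdint.h>

/* Reference version: add up the eight bytes of x one at a time. */
unsigned sum_bytes_loop(uint64_t x) {
    unsigned total = 0;
    for (int i = 0; i < 8; i++) {
        total += (unsigned)(x & 0xff);
        x >>= 8;
    }
    return total;
}

/* One multiply: the top byte of x * h01 accumulates all eight bytes.
   Assumes the total is at most 255, so no partial sum carries upward. */
unsigned sum_bytes_multiply(uint64_t x) {
    const uint64_t h01 = 0x0101010101010101ULL;
    return (unsigned)((x * h01) >> 56);
}
```

For x = 0x0102030405060708 both functions return 36, and for the per-byte counts of an all-ones word, 0x0808080808080808, both return 64.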

There are some other interesting algorithms as well:

    //This is better when most bits in x are 0
    //It uses 3 arithmetic operations and one comparison/branch per "1" bit in x.
    int popcount_4(uint64 x) {
        int count;
        for (count = 0; x; count++)
            x &= x - 1;  //clear the lowest 1 bit
        return count;
    }

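This algorithm is efficient when x is known to contain mostly 0s.
It relies on this fact: computing x-1 inverts every bit of x from the lowest position up to and including the lowest 1, turning those 0s into 1s and that 1 into 0. For example, 11 - 01 = 10, 10 - 01 = 01, 01 - 01 = 00, and 100 - 001 = 011. The & then clears every position that changed, which removes exactly one 1. The loop therefore performs one & per 1 bit, after which x is 0.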
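The single step that popcount_4 repeats can be sketched on its own. The function name is hypothetical.

```c
#include <stdint.h>

/* Clears the lowest 1 bit of x; returns 0 when x is already 0. */
uint64_t clear_lowest_one(uint64_t x) {
    return x & (x - 1); /* x - 1 flips the lowest 1 and the 0s below it; & drops them */
}
```

Starting from 1100, one call gives 1000 and a second call gives 0000, so a value with two 1s reaches zero in exactly two steps.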
The following version unrolls the loop of popcount_4:

    //This is better if most bits in x are 0.
    //It uses 2 arithmetic operations and one comparison/branch  per "1" bit in x.
    //It is the same as the previous function, but with the loop unrolled.
    #define f(y) if ((x &= x-1) == 0) return y;
    int popcount_5(uint64 x) {
        if (x == 0) return 0;
        f( 1) f( 2) f( 3) f( 4) f( 5) f( 6) f( 7) f( 8)
        f( 9) f(10) f(11) f(12) f(13) f(14) f(15) f(16)
        f(17) f(18) f(19) f(20) f(21) f(22) f(23) f(24)
        f(25) f(26) f(27) f(28) f(29) f(30) f(31) f(32)
        f(33) f(34) f(35) f(36) f(37) f(38) f(39) f(40)
        f(41) f(42) f(43) f(44) f(45) f(46) f(47) f(48)
        f(49) f(50) f(51) f(52) f(53) f(54) f(55) f(56)
        f(57) f(58) f(59) f(60) f(61) f(62) f(63)
        return 64;
    }

    //Use this instead if most bits in x are 1 instead of 0.
    //It replaces the definition of f above; the first check must then become
    //"if (x == hff) return 64;" and the final statement "return 0;".
    //#define f(y) if ((x |= x+1) == hff) return 64-y;

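The most interesting approach is a lookup table. Given enough memory we can trade space for time and obtain an O(1) algorithm, the best possible.
Taking 4-bit strings as an example, we can build the array int counts[16]={0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4}.
For a 4-bit x, the Hamming weight of x is then counts[x].
For a 32-bit string, the lookup can be split into two halves to save some memory: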

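    #include <stdint.h>

    static unsigned char wordbits[65536]; //bit counts of the integers 0..65535

    //Assumed helper, not in the original: fill the table once before calling popcount.
    static void init_wordbits(void) {
        for (uint32_t i = 1; i < 65536; i++)
            wordbits[i] = (unsigned char)(wordbits[i >> 1] + (i & 1));
    }

    static int popcount(uint32_t i) {
        return wordbits[i & 0xFFFF] + wordbits[i >> 16];
    }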
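The same idea extends to 64-bit words with a much smaller table. The sketch below uses a 256-entry table and eight lookups. The names bytebits, init_bytebits and popcount_table are hypothetical, and the choice of a byte-sized table is my own assumption about a reasonable space/time balance, not something the original prescribes.

```c
#include <stdint.h>

static unsigned char bytebits[256]; /* bytebits[i] = number of 1 bits in i */

/* Fill the table; call once before popcount_table. */
void init_bytebits(void) {
    bytebits[0] = 0;
    for (int i = 1; i < 256; i++)
        bytebits[i] = (unsigned char)(bytebits[i >> 1] + (i & 1));
}

/* Sum the table entries for each of the eight bytes of x. */
int popcount_table(uint64_t x) {
    int count = 0;
    for (int i = 0; i < 8; i++) {
        count += bytebits[x & 0xff];
        x >>= 8;
    }
    return count;
}
```

The table is built from the recurrence count(i) = count(i >> 1) + (i & 1), so every entry depends only on an earlier entry. A 256-byte table costs more lookups per word than the 65536-entry version but is far more likely to stay in cache.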

The Hamming weight has many other applications; this post only records how it is used to compute popcount.
