Website:http://srinvis.blogspot.ca/2006/07/hash-table-lengths-and-prime-numbers.html

This has been bugging me for some time now...

The first thing you do when inserting/retreiving from hash table is to calculate the hashCode for the given key and then find the correct bucket by trimming the hashCode to the size of the hashTable by doing hashCode % table_length. Here are 2 statements that
you most probably have read somewhere

  • If you use a power of 2 for table_length, finding (hashCode(key) % 2^n ) is as simple and quick as (hashCode(key) & (2^n -1)). But if your function to calculate hashCode for a given key isn't good,
    you will definitely suffer from clustering of many keys in a few hash buckets.
  • But if you use prime numbers for table_length, hashCodes calculated could map into the different hash buckets even if you have a slightly stupid hashCode function.

Ok, but now can someone tell me why it is so and give me the proof for the above statements...I couldn't find much on google. Most reasons given are again just statements and I cannot accept statements (And there are some stupid proofs also on the net, so beware).
And don't even try telling me that the proof is experimental.

Today i have finally been able to get rid of this thought from my head...below is the proof I came up with (Hope it doesn't fall into the stupid category, it doesn't seem atleast for now, if u think otherwise put in a comment, atleast for the sake of others).
I suggest you think about the solution on your own before reading further...

If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial
to verify/derive this). Now you can do one of the following to avoid clustering

  1. Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions
    of entries.
  2. Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length
    is a prime number.

If you need some more kick from hash maps take a look at

Doug lea's highly performant ConcurrentHashMap http://www-128.ibm.com/developerworks/java/library/j-jtp08223/

Google's space/time efficient hashmap implementations http://goog-sparsehash.sourceforge.net/doc/implementation.html

Hash table lengths and prime numbers的更多相关文章

  1. Hash table: why size should be prime?

    Question: Possible Duplicate:Why should hash functions use a prime number modulus? Why is it necessa ...

  2. Hash Map (Hash Table)

    Reference: Wiki  PrincetonAlgorithm What is Hash Table Hash table (hash map) is a data structure use ...

  3. [转载] 散列表(Hash Table)从理论到实用(上)

    转载自:白话算法(6) 散列表(Hash Table)从理论到实用(上) 处理实际问题的一般数学方法是,首先提炼出问题的本质元素,然后把它看作一个比现实无限宽广的可能性系统,这个系统中的实质关系可以通 ...

  4. [转载] 散列表(Hash Table)从理论到实用(中)

    转载自:白话算法(6) 散列表(Hash Table)从理论到实用(中) 不用链接法,还有别的方法能处理碰撞吗?扪心自问,我不敢问这个问题.链接法如此的自然.直接,以至于我不敢相信还有别的(甚至是更好 ...

  5. [转载] 散列表(Hash Table) 从理论到实用(下)

    转载自: 白话算法(6) 散列表(Hash Table) 从理论到实用(下) [澈丹,我想要个钻戒.][小北,等等吧,等我再修行两年,你把我烧了,舍利子比钻戒值钱.] ——自扯自蛋 无论开发一个程序还 ...

  6. algorithm@ Sieve of Eratosthenes (素数筛选算法) & Related Problem (Return two prime numbers )

    Sieve of Eratosthenes (素数筛选算法) Given a number n, print all primes smaller than or equal to n. It is ...

  7. POJ2739 Sum of Consecutive Prime Numbers(尺取法)

    POJ2739 Sum of Consecutive Prime Numbers 题目大意:给出一个整数,如果有一段连续的素数之和等于该数,即满足要求,求出这种连续的素数的个数 水题:艾氏筛法打表+尺 ...

  8. [LeetCode] 1. Two Sum_Easy tag: Hash Table

    Given an array of integers, return indices of the two numbers such that they add up to a specific ta ...

  9. [译]C语言实现一个简易的Hash table(1)

    说明 Hash table翻译过来就是Hash表,是一种提供了类似于关联数组的数据结构,可以通过key执行搜索.插入和删除操作.Hash表由一些列桶(buckets)组成,而每一个bucket都是由k ...

随机推荐

  1. windows安装mongodb及相关命令

      - 安装   解压: mongodb-win32-x86_64-2008plus-ssl-3.6.4.7z 将文件夹改名为mongodb 移动文件到指定目录下,如: C:\python\soft ...

  2. RabbitMQ 常用操作

    RabbitMQ简介 1.首先安装erlang rpm -Uvh https://www.rabbitmq.com/releases/erlang/erlang-19.0.4-1.el7.centos ...

  3. OSLab多线程

      日期:2019/3/26 内容:多线程. 一.基本知识 线程的定义 线程(thread)是操作系统能够进行运算调度的最小单位.它被包含在进程之中,是进程中的实际运作单位.一条线程指的是进程中一个单 ...

  4. idea集成tomcat

    1 Tomcat的使用 * Tomcat:web服务器软件 1. 下载:http://tomcat.apache.org/ 2. 安装:解压压缩包即可. * 注意:安装目录建议不要有中文和空格 3. ...

  5. 深圳scala-meetup-20180902(2)- Future vs Task and ReaderMonad依赖注入

    在对上一次3月份的scala-meetup里我曾分享了关于Future在函数组合中的问题及如何用Monix.Task来替代.具体分析可以查阅这篇博文.在上篇示范里我们使用了Future来实现某种non ...

  6. 从零开始学习渗透Node.js应用程序

    本文来源于i春秋学院,未经允许严禁转载 0x01 介绍 简单的说 Node.js 就是运行在服务端的 JavaScript.Node.js 是一个基于Chrome JavaScript 运行时建立的一 ...

  7. jQuery应用实例2:表格隔行换色

    这里是用JS实现的:http://www.cnblogs.com/xuyiqing/p/8376312.html 接下来利用上一篇提到的选择器利用jQuery实现: 发现原来多行代码这里只需要两行: ...

  8. Swift5 语言指南(四) 基础知识

    Swift是iOS,macOS,watchOS和tvOS应用程序开发的新编程语言.尽管如此,Swift的许多部分对您在C和Objective-C中的开发经验都很熟悉. 雨燕提供了自己的所有基本C和Ob ...

  9. linux中ftp的安装过程记录[运维篇]

    安装FTP的全过程记录,对于相同情况希望有所帮助.[centOS] 1.查询本机是否安装vsftpd: rpm -qa |grep vsftpd : 2.安装ftp服务 yum install vsf ...

  10. Typescript 学习笔记一:介绍、安装、编译

    前言 整理了一下 Typescript 的学习笔记,方便后期遗忘某个知识点的时候,快速回忆. 为了避免凌乱,用 gitbook 结合 marketdown 整理的. github地址是:ts-gitb ...