The Step-by-Step Approach

These steps help you break down a tricky problem and solve it using what you do know.

Step 1: Make Believe

Pretend that the data can all fit on one machine and there are no memory limitations. Provide the general outline for your solution.

Step 2: Get Real

Now figure out how to logically divide the data up, and how one machine would know where to look for a different piece of data.

Step 3: Solve Problems

  Dividing Up Lots of Data:

By Order of Appearance: put data on one machine until it fills up, then move on to the next machine; this requires a lookup table to track which machine holds which data.

By Hash Value: 1) pick some sort of key relating to the data; 2) hash the key; 3) mod the hash value by the number of machines; 4) store the data on the machine with that value (see the sketch after this list).

        Drawback: there is no relationship between what the data represents and which machine stores it.

By Actual Value: reduce system latency by using information about what the data represents (for example, keep data on a machine near the users it belongs to).

Arbitrarily: split the data with no particular scheme; this is often surprisingly effective, usually paired with a lookup table.
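The hash-and-mod routing described above is easy to sketch. Below is a minimal Python illustration, assuming a fixed pool of machines addressed by index; the key format, the MD5 choice, and the machine count are assumptions for the example, not part of the book's discussion.

    import hashlib

    NUM_MACHINES = 20  # assumed cluster size

    def machine_for(key: str) -> int:
        """Hash the key, then mod by the number of machines."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_MACHINES

    # Route a record by its key: store (and later look up) the data on
    # the machine whose index this function returns.
    print(machine_for("user:41792"))

Because the mapping depends on the number of machines, adding or removing a machine changes where most keys land, which is one reason real systems often prefer consistent hashing.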

Good Example: find all documents that contain a given list of words (a minimal sketch follows).
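As a starting point for that example, here is a minimal single-machine sketch (the Step 1 "make believe" version): build an inverted index from word to document ids and intersect the id sets for the query words. The sample documents and the function name are invented for illustration; scaling it up would mean dividing the index itself across machines.

    from collections import defaultdict

    # Toy corpus: document id -> text (placeholder data for illustration).
    docs = {
        1: "books about scalability and memory",
        2: "memory limits on a single machine",
        3: "scalability of distributed systems and memory",
    }

    # Inverted index: word -> set of document ids containing that word.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)

    def find_docs(words):
        """Return ids of documents that contain every word in the query."""
        id_sets = [index[w] for w in words]
        return set.intersection(*id_sets) if id_sets else set()

    print(find_docs(["scalability", "memory"]))  # -> {1, 3}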


10.1: Build some sort of service that will be called by up to 1,000 client applications to get simple end-of-day stock price information.

We want to start off by thinking about the different aspects we should consider for any given proposal:

1. Client Ease of Use: we want the service to be easy for the clients to implement and useful for them

2. Ease for Ourselves: consider not only the cost of implementation, but also the cost of maintenance

3. Flexibility for Future Demands: we don't want to lock ourselves into a design that can't adapt as requirements change

4. Scalability and Efficiency: we don't want the solution to overly burden our service.

Database vs. XML (JSON), p. 343
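To make that comparison concrete, here is a rough sketch of the same end-of-day quote under the two options: a SQL table a database-backed service might use, and the JSON payload an XML/JSON-style service might return. The table layout, field names, and numbers are placeholders, not a schema from the book.

    import json

    # Option A (assumed schema): a relational table the clients could query.
    CREATE_TABLE = """
    CREATE TABLE eod_prices (
        ticker      VARCHAR(10),
        trade_date  DATE,
        open_px     DECIMAL(10, 2),
        high_px     DECIMAL(10, 2),
        low_px      DECIMAL(10, 2),
        close_px    DECIMAL(10, 2),
        PRIMARY KEY (ticker, trade_date)
    );
    """

    # Option B: the same quote serialized as JSON for clients to fetch over HTTP.
    quote = {"ticker": "GOOG", "date": "2015-05-01",
             "open": 537.26, "high": 539.87, "low": 532.85, "close": 537.90}
    print(json.dumps(quote))

A database gives clients flexible querying but ties them to SQL and to our schema; flat XML/JSON files are trivial to distribute and parse but push any filtering or aggregation onto the client.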


10.2: good problem

10.7: LRU Cache
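A minimal LRU cache sketch using Python's OrderedDict, assuming the usual get/put interface with a fixed capacity (the book's own solution may differ, e.g. a hash map plus a doubly linked list):

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity: int):
            self.capacity = capacity
            self.data = OrderedDict()   # insertion order doubles as recency order

        def get(self, key):
            if key not in self.data:
                return None
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]

        def put(self, key, value):
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict the least recently used entry

    cache = LRUCache(2)
    cache.put("a", 1); cache.put("b", 2)
    cache.get("a")         # "a" becomes most recently used
    cache.put("c", 3)      # evicts "b"
    print(cache.get("b"))  # None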
