Facebook Chat, offered a nice set of software engineering challenges:

Real-time presence notification:

The most resource-intensive operation performed in a chat system is not sending messages. It is rather keeping each online user aware of the online-idle-offline states of their friends, so that conversations can begin.

The naive implementation of sending a notification to all friends whenever a user comes online or goes offline has a worst case cost of O(average friendlist size * peak users * churn rate) messages/second, where churn rate is the frequency with which users come online and go offline, in events/second. This is wildly inefficient to the point of being untenable, given that the average number of friends per user is measured in the hundreds, and the number of concurrent users during peak site usage is on the order of several millions.

Surfacing connected users' idleness greatly enhances the chat user experience but further compounds the problem of keeping presence information up-to-date. Each Facebook Chat user now needs to be notified whenever one of his/her friends
(a) takes an action such as sending a chat message or loads a Facebook page (if tracking idleness via a last-active timestamp) or
(b) transitions between idleness states (if representing idleness as a state machine with states like "idle-for-1-minute", "idle-for-2-minutes", "idle-for-5-minutes", "idle-for-10-minutes", etc.).
Note that approach (a) changes the sending a chat message / loading a Facebook page from a one-to-one communication into a multicast to all online friends, while approach (b) ensures that users who are neither chatting nor browsing Facebook are nonetheless generating server load.

Real-time messaging:

Another challenge is ensuring the timely delivery of the messages themselves. The method we chose to get text from one user to another involves loading an iframe on each Facebook page, and having that iframe's Javascript make an HTTP GET request over a persistent connection that doesn't return until the server has data for the client. The request gets reestablished if it's interrupted or times out. This isn't by any means a new technique: it's a variation of Comet, specifically XHR long polling, and/or BOSH.

Having a large-number of long-running concurrent requests makes the Apache part of the standard LAMP stack a dubious implementation choice. Even without accounting for the sizeable overhead of spawning an OS process that, on average, twiddles its thumbs for a minute before reporting that no one has sent the user a message, the waiting time could be spent servicing 60-some requests for regular Facebook pages. The result of running out of Apache processes over the entire Facebook web tier is not pretty, nor is the dynamic configuration of the Apache process limits enjoyable.

Distribution, Isolation, and Failover:

Fault tolerance is a desirable characteristic of any big system: if an error happens, the system should try its best to recover without human intervention before giving up and informing the user. The results of inevitable programming bugs, hardware failures, et al., should be hidden from the user as much as possible and isolated from the rest of the system.

The way this is typically accomplished in a web application is by separating the model and the view: data is persisted in a database (perhaps with a separate in-memory cache), with each short-lived request retrieving only the parts relevant to that request. Because the data is persisted, a failed read request can be re-attempted. Cache misses and database failure can be detected by the non-database layers and either reported to the user or worked around using replication.

While this architecture works pretty well in general, it isn't as successful in a chat application due to the high volume of long-lived requests, the non-relational nature of the data involved, and the statefulness of each request.

For Facebook Chat, we rolled our own subsystem for logging chat messages (in C++) as well as an epoll-driven web server (in Erlang) that holds online users' conversations in-memory and serves the long-polled HTTP requests. Both subsystems are clustered and partitioned for reliability and efficient failover. Why Erlang? In short, because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space "processes", share-nothing message-passing semantics, built-in distribution, and a "crash and recover" philosophy proven by two decades of deployment on large soft-realtime production systems.

Glueing with Thrift:

Despite those advantages, using Erlang for a component of Facebook Chat had a downside: that component needed to communicate with the other parts of the system. Glueing together PHP, Javascript, Erlang, and C++ is not a trivial matter. Fortunately, we have Thrift. Thrift translates a service description into the RPC glue code necessary for making cross-language calls (marshalling arguments and responses over the wire) and has templates for servers and clients. Since going open source a year ago (we had the gall to release it on April Fool's Day, 2007), the Thrift project has steadily grown and improved (with multiple iterations on the Erlang binding). Having Thrift available freed us to split up the problem of building a chat system and use the best available tool to approach each sub-problem.

facebook chat 【转】的更多相关文章

  1. facebook chat api 使用

    官方API文档: https://developers.facebook.com/docs/chat/ 下面是根据文档修改的类: <?php class Invite_Chat{ protect ...

  2. How to add Facebook’s Customer Chat Plugin to your website

    How to add Facebook’s Customer Chat Plugin to your website By Gerardo Salandra  Do you need a live c ...

  3. 【转发】揭秘Facebook 的系统架构

    揭底Facebook 的系统架构 www.MyException.Cn   发布于:2012-08-28 12:37:01   浏览:0次 0 揭秘Facebook 的系统架构 www.MyExcep ...

  4. Facebook的体系结构分析---外文转载

    Facebook的体系结构分析---外文转载 From various readings and conversations I had, my understanding of Facebook's ...

  5. Facebook 的系统架构(转)

    来源:http://www.quora.com/What-is-Facebooks-architecture(由Micha?l Figuière回答) 根据我现有的阅读和谈话,我所理解的今天Faceb ...

  6. facebook design question 总结

    http://blog.csdn.net/sigh1988/article/details/9790337 这里原帖地址: http://www.mitbbs.com/article_t/JobHun ...

  7. Facebook Architecture

    Facebook Architecture Quora article a relatively old presentation on facebook architecture another I ...

  8. [Erlang 0105] Erlang Resources 小站 2013年1月~6月资讯合集

    很多事情要做,一件一件来; Erlang Resources 小站 2013年1月~6月资讯合集,方便检索.      小站地址: http://site.douban.com/204209/     ...

  9. 通向高可扩展性之路(WhatsApp篇)---- 脸书花了190亿买来的WhatsApp的架构

    原文链接:http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-bill ...

随机推荐

  1. Diycode开源项目 MyTopicActivity分析

    1.总体浏览效果及布局分析 1.1.看一下我的帖子预览效果 1.2.然后看一下我的收藏预览效果 1.3.归纳一下 所以我的帖子和我的收藏可以共用同一个类. 布局效果几乎一样,只是用户的选择不同. 所以 ...

  2. sql中比较大小

    if object_id('tempdb..#dataOldNew1') is not null drop table #dataOldNew1 select distinct store_cd ,i ...

  3. tomcat缓存

    问题描述: 一个用到struts2框架的web项目,由于在struts.xml中少配置了一个action,导致项目运行时报异常.将原本好的代码复旧,重启tomcat服务,第一次加载程序没问题,再刷新时 ...

  4. linux学习(三) -- lnmp环境切换php版本,并安装相应redis扩展

    原创文章,转载请注明出处   我想配置的环境是ubuntu+nginx+mysql+php+redis,其中php装两个版本,php7和php56 ubuntu+nginx+mysql+php的环境配 ...

  5. python实现算法: 多边形游戏 数塔问题 0-1背包问题 快速排序

    去年的算法课挂了,本学期要重考,最近要在这方面下点功夫啦! 1.多边形游戏-动态规划 问题描述: 多边形游戏是一个单人玩的游戏,开始时有一个由n个顶点构成的多边形.每个顶点被赋予一个整数值, 每条边被 ...

  6. [oldboy-django][2深入django]django目录说明 + 路由系统

    django project目录说明 project - app01 -- admin.py #django自带后台管理 -- apps.py #app01配置文件 -- models.py #编写类 ...

  7. ubuntu下安装JDK(复制)

    ubuntu 安装jdk 的两种方式:(本来jdk应该安装到/usr/lib/jvm下,但我安装到了/usr/local/lib/jvm下了) 1:通过ppa(源) 方式安装. 2:通过官网下载安装包 ...

  8. web自动化测试:watir+minitest(三)

    本文,谢绝转载. 整体框架设计: 1.用例的解耦性.一个测试用例一个脚本.而并非minitest中的N个test写在一个文件中 2.单独调试与全量连跑或部分连跑 3.任意变量.参数配置.这点对后期维护 ...

  9. JVM(8):JVM知识点总览-高级Java工程师面试必备

    http://www.importnew.com/23792.html jvm 总体梳理 jvm体系总体分四大块: 类的加载机制 jvm内存结构 GC算法 垃圾回收 GC分析 命令调优 当然这些知识点 ...

  10. java _tomcat_mysql 部署

    项目做完了,要发布了,而Java的特长之一就是移植性好,面对着微软的XP的停止服务,Windows系统的“独裁”,越来越多的商家选择了开源的免费的linux系统作为服务器.因为linux系统也有图形界 ...