Scaling the Messages Application Back End (repost)
A blog post from 2011, reposted here.
Facebook Messages seamlessly integrates many communication channels: email, SMS, Facebook Chat, and the existing Facebook Inbox. Combining all this functionality and offering a powerful user experience involved building an entirely new infrastructure stack from the ground up.
Integrating and supporting all of these communication channels while keeping the product simple requires a number of services to run together and interact. The system needs to:
- Scale, as we need to support millions of users with existing message history.
- Operate in real time.
- Be highly available.
To overcome all these challenges, we started laying down a new architecture. At the heart of the application back end are the application servers, which are responsible for answering all queries and taking all writes into the system. They also interact with a number of services to achieve this.

Each application server comprises:
- API: The entry point for all get and set operations, which every client calls. An application server is the sole entry point for any given user into the system. Any data written to or read from the system needs to go through this API.
- Distributed logic: To understand the distributed logic we need to understand what a cell is. The entire system is divided into cells, and each cell contains only a subset of users; the structure of a cell is described in the next section.

Understanding Cells
Cells give us many advantages:
- They help us scale incrementally while limiting failure scenarios
- Easy upgrades
- Metadata store failures affect only a few users
- Easy rollout
- Flexibility to host cells in different data centers with multi-homing for disaster recovery
Each cell consists of a single cluster of application servers, and each application server cluster is controlled by a set of ZooKeeper machines.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
ZooKeeper is open source software that we use mainly for two purposes: as the controller for implementing sharding and failover of application servers, and as a store for our discovery service. Since ZooKeeper provides us with a highly available repository and notification mechanism, it goes a long way towards helping us build a highly available service.
Each application server registers itself in ZooKeeper by generating N tokens. The server uses these tokens to take N virtual positions on a consistent hash ring, which is used to shard users across the nodes. When a node fails, its neighboring nodes take over the load for its users, so the load stays evenly distributed. This also makes it easy to add nodes to and remove nodes from the application server cluster.
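As a rough illustration of the token scheme, here is a minimal consistent hash ring with virtual nodes, sketched in Java. The class and method names are invented for illustration; in the real system the tokens are registered in ZooKeeper rather than held in an in-memory map, and failover is driven by ZooKeeper notifications.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of sharding users across application servers with
// a consistent hash ring and N virtual positions (tokens) per server.
public class ConsistentHashRing {
    private final SortedMap<Long, String> ring = new TreeMap<>();
    private final int tokensPerServer;

    public ConsistentHashRing(int tokensPerServer) {
        this.tokensPerServer = tokensPerServer;
    }

    // Register a server by placing N tokens on the ring.
    public void addServer(String serverId) {
        for (int i = 0; i < tokensPerServer; i++) {
            ring.put(hash(serverId + "#" + i), serverId);
        }
    }

    // Remove a failed server; its users fall to the next tokens on the ring,
    // which belong to several different neighbors, spreading the load.
    public void removeServer(String serverId) {
        for (int i = 0; i < tokensPerServer; i++) {
            ring.remove(hash(serverId + "#" + i));
        }
    }

    // Map a user to the first token clockwise from the user's hash.
    public String serverFor(String userId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers registered");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(userId));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }
}
```

Because each server holds many scattered tokens, removing a server spreads its users over several neighbors instead of dumping them all on one machine, and adding a server only remaps the users adjacent to its new tokens.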
- Application business logic: This is where the magic happens. The business logic is responsible for making sense of all user data, storing and retrieving it, and applying all the complex product operations to perform various functions. It also has a dedicated write-through cache: since the application servers are the only entry points for reading and writing any given user's data, this cache can hold the entire recent image for the user and gives us a very high cache hit rate. The business logic also interacts with the Web servers to respect user privacy and apply any policies.
- Data access layer: The data access layer is the schema used to store the user’s metadata. It consists mainly of a time-sequenced log, which is the absolute source of truth for the user’s data and is used to back up, retrieve, and regenerate user data. The schema also includes snapshots that represent the serialized user objects understood by the business logic. This layer is designed to present a generic interface to the application servers while keeping the underlying store pluggable; a minimal sketch follows this list.
- Metadata store: Each cell also has a dedicated metadata store. We use HBase as our metadata store. The data access layer interacts with HBase to provide storage functionality. Late last year we talked about our Messages storage infrastructure, which is built on top of Apache HBase.
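To make the shape of the data access layer concrete, the sketch below pairs a pluggable store interface (an append-only action log plus snapshots) with an HBase-backed implementation using the modern HBase Java client. The interface, table name, and column families are assumptions for illustration only, not the actual Facebook schema or the 2011-era client API.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Generic store interface the application servers would code against;
// the backing implementation is pluggable.
interface MetadataStore {
    void appendAction(String userId, long seq, byte[] action) throws IOException;
    void writeSnapshot(String userId, byte[] snapshot) throws IOException;
    byte[] readSnapshot(String userId) throws IOException;
}

// Hypothetical HBase-backed implementation; table and family names are made up.
class HBaseMetadataStore implements MetadataStore {
    private static final byte[] LOG = Bytes.toBytes("log");
    private static final byte[] SNAP = Bytes.toBytes("snap");
    private final Table table;

    HBaseMetadataStore() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        Connection conn = ConnectionFactory.createConnection(conf);
        this.table = conn.getTable(TableName.valueOf("user_metadata"));
    }

    @Override
    public void appendAction(String userId, long seq, byte[] action) throws IOException {
        // One row per user; each action is a new column keyed by sequence number,
        // giving a time-sequenced log that can regenerate the user's state.
        Put put = new Put(Bytes.toBytes(userId));
        put.addColumn(LOG, Bytes.toBytes(seq), action);
        table.put(put);
    }

    @Override
    public void writeSnapshot(String userId, byte[] snapshot) throws IOException {
        // Snapshots are serialized user objects understood by the business logic.
        Put put = new Put(Bytes.toBytes(userId));
        put.addColumn(SNAP, Bytes.toBytes("current"), snapshot);
        table.put(put);
    }

    @Override
    public byte[] readSnapshot(String userId) throws IOException {
        Result result = table.get(new Get(Bytes.toBytes(userId)));
        return result.getValue(SNAP, Bytes.toBytes("current"));
    }
}
```

Keeping the interface generic is what lets the underlying store be swapped without touching the business logic above it.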
Finally, the whole system consists of a number of cells.

Other Messages Services
The Messages application back end needs to parse email messages and attachments, and also provide discovery of the right application servers for the given user. This is achieved with the following services:
- MTA proxy: This service receives all incoming email messages and is responsible for parsing the email RFCs, attachments, large bodies of email, and so forth. These parsed out values are stored in a dedicated Haystack cluster (which is the same key/value store that we use for photos). Once the proxy has created a lightweight email object, it talks to the appropriate application server and delivers the message. But talking to the appropriate application server involves figuring out the cell and machine a particular user resides on, which brings us to the discovery service.
- Discovery service: This maintains the user-to-cell mappings. Every client needs to talk to the discovery service before it can contact an application server for any request. Given the stringent requirements, this service needs to be highly available, scalable, and performant.
- Distributed logic client: These clients listen for ZooKeeper notifications and watch for any changes in the application server cluster state. Each application server cluster or cell has a dedicated client. These clients live in the discovery service process, and once the discovery service has mapped the user’s cell, it queries that cell’s client, which executes the consistent hash algorithm to figure out the correct application server node for the user.
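Putting the two pieces together, routing a request is a two-step lookup: the discovery service maps the user to a cell, and that cell's distributed logic client runs the consistent hash to pick the node. Below is a minimal sketch with invented class names, reusing the ConsistentHashRing sketch from above; in the real system the per-cell ring state is kept current via ZooKeeper watches rather than populated by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of request routing: user -> cell -> application server node.
public class DiscoveryService {
    // User-to-cell map; in practice this must be highly available and replicated.
    private final Map<String, String> userToCell = new ConcurrentHashMap<>();
    // One distributed logic client (hash ring view) per cell.
    private final Map<String, ConsistentHashRing> cellClients = new ConcurrentHashMap<>();

    public void assignUserToCell(String userId, String cellId) {
        userToCell.put(userId, cellId);
    }

    public void registerCell(String cellId, ConsistentHashRing ring) {
        cellClients.put(cellId, ring);
    }

    // Resolve the application server node that owns this user.
    public String applicationServerFor(String userId) {
        String cellId = userToCell.get(userId);
        if (cellId == null) {
            throw new IllegalStateException("user not assigned to a cell: " + userId);
        }
        return cellClients.get(cellId).serverFor(userId);
    }
}
```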
The Messages application back end also relies on the following services:
- Memcache dirty service: Message counts are queried very frequently to display the message notification jewels on the home page accurately. These counts are cached in memcache so that the home page can be displayed as quickly as possible. As new messages arrive, the application servers need to dirty these entries, so this dedicated service dirties the caches in every data center (a minimal sketch of the dirtying step follows this list).
- User index service: This provides the social information for each user, like friends, friends of friends, and so forth. This information is used to implement the social features of messaging. For example, for every message that is added to the system, the application server node queries this service to determine whether the message is from a friend or a friend of a friend, and directs it to the appropriate folder.
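For the dirty service, the essential operation is deleting the cached count keys when a new message lands. The sketch below uses the spymemcached client; the key format and the idea of one client per data center's memcache tier are assumptions for illustration, not the actual Facebook implementation.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import net.spy.memcached.MemcachedClient;

// Illustrative sketch: invalidate (dirty) the cached unread-message count
// for a user in every data center when a new message arrives.
public class CountDirtyService {
    private final List<MemcachedClient> dataCenterClients = new ArrayList<>();

    public CountDirtyService(List<InetSocketAddress> memcacheHosts) throws IOException {
        for (InetSocketAddress host : memcacheHosts) {
            dataCenterClients.add(new MemcachedClient(host));
        }
    }

    // Called after an application server accepts a new message for this user.
    public void dirtyUnreadCount(long userId) {
        String key = "msg_unread_count:" + userId;  // hypothetical key format
        for (MemcachedClient client : dataCenterClients) {
            client.delete(key);  // asynchronous delete; the future is ignored here
        }
    }
}
```

The next read of the count then misses the cache and repopulates it from the application server, which keeps the jewel accurate without recomputing it on every page load.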
The clients of the application back end system include MTAs for email traffic, IMAP, Web servers, SMS clients, and Web Chat clients. Apart from the MTAs, which talk to the MTA proxy, all other clients talk directly to the application servers.
Given that we built this services infrastructure from scratch, one of the most important things was to have the appropriate tools and monitoring in place to push this software on almost a daily basis without any service disruption. So we ended up building a number of useful tools that can give us a view of the various cells, enable/disable cells, manage addition and removal of hardware, do rolling deployments without disrupting service, and give us a view of the performance and bottlenecks in various parts of the system.
All these services need to work in tandem and be available and reliable for messaging to work. We are in the process of importing millions of users into this system every day. Very soon every Facebook user will have access to the new Messages product.
We spent a few weeks setting up a test framework to evaluate clusters of MySQL, Apache Cassandra, Apache HBase, and a couple of other systems. We ultimately chose HBase. MySQL proved to not handle the long tail of data well; as indexes and data sets grew large, performance suffered. We found Cassandra's eventual consistency model to be a difficult pattern to reconcile for our new Messages infrastructure.
HBase comes with very good scalability and performance for this workload and a simpler consistency model than Cassandra. While we’ve done a lot of work on HBase itself over the past year, when we started we also found it to be the most feature rich in terms of our requirements (auto load balancing and failover, compression support, multiple shards per server, etc.). HDFS, the underlying filesystem used by HBase, provides several nice features such as replication, end-to-end checksums, and automatic rebalancing. Additionally, our technical teams already had a lot of development and operational expertise in HDFS from data processing with Hadoop. Since we started working on HBase, we've been focused on committing our changes back to HBase itself and working closely with the community. The open source release of HBase is what we’re running today.