Facebook Architecture
Quora article
a relatively old presentation on facebook architecture
another InfoQ presentation on Facebook architecture / scale
Web frontend
- PHP
- HipHop
- HipHop Virtual Machine (HHVM)
- BigPipe to pipeline page rendering: the page is divided into pagelets, which are generated and delivered in pipelined stages.
- Varnish Cache for web caching
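The pagelet idea behind BigPipe can be sketched as below. This is a toy model, not Facebook's implementation: the pagelet names, the render callables and the `onPageletArrive` client-side hook are hypothetical, and the pagelets are rendered one after another rather than in parallel. The point is only that the skeleton is sent first and each pagelet follows as its own chunk.

```python
import json
from typing import Callable, Dict, Iterator


def render_page(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield HTML chunks: the page skeleton first, then one chunk per pagelet."""
    # The skeleton only contains empty placeholders, so it can be flushed at once.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"
    # Each pagelet is rendered and flushed independently of the others.
    for name, render in pagelets.items():
        payload = json.dumps({"id": name, "html": render()})
        # onPageletArrive is a hypothetical client-side hook that fills the placeholder.
        yield f"<script>onPageletArrive({payload});</script>"
    yield "</body></html>"


chunks = list(render_page({
    "nav": lambda: "<ul>menu</ul>",
    "feed": lambda: "<p>stories</p>",
}))
```

Because `render_page` is a generator, a web server could write each chunk to the socket as soon as it is produced instead of waiting for the whole page.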
Business Logic
- service-oriented, exposed as service
- Thrift API
- multiple language bindings
- no need to worry about serialization / connection handling / threading
- supports different server types: non-blocking, async, single-thread, multi-thread
- Java service uses a custom application server (not Tomcat or Jetty etc.)
Persistence
- MySQL, Memcached, Hadoop's HBase
- MySQL/Innodb used as key-value store, distributed / load-balanced to many instances
- global ID is assigned to user data (user info, wall posts, comments etc.)
- Blob data e.g. photos and videos, are handled separately
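A minimal sketch of using many database instances as one key-value store keyed by global ID. The modulo routing rule, the shard count and the dict-backed shards are assumptions for illustration only; the notes above do not say how Facebook maps IDs to MySQL instances.

```python
from typing import Dict, List, Optional


class ShardedKV:
    """Key-value store spread over several shards, routed by global ID."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one MySQL/InnoDB instance.
        self.shards: List[Dict[int, bytes]] = [{} for _ in range(num_shards)]

    def _shard_for(self, global_id: int) -> Dict[int, bytes]:
        # Assumed routing rule: global ID modulo the number of shards.
        return self.shards[global_id % len(self.shards)]

    def put(self, global_id: int, value: bytes) -> None:
        self._shard_for(global_id)[global_id] = value

    def get(self, global_id: int) -> Optional[bytes]:
        return self._shard_for(global_id).get(global_id)
```

With four shards, `put(10, b"wall post")` lands on shard 2, and any later `get(10)` goes straight to that shard without consulting the others.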
Logging
- Scribe, one instance on each host
- Scribe-HDFS for analytics
Photo
- first version is NFS-backed storage, served via HTTP
- Haystack, Facebook's object store for photos
- Haystack slides
- Massive CDN to cache/deliver data
- previously NFS-backed, but a traditional POSIX file system incurs overhead that is not necessary here: directory resolution, file metadata, inodes etc.
- Haystack Store: a server's 10 TB of storage is split into 100 "physical volumes"; physical volumes on different hosts are organized into "logical volumes", and data is replicated within a logical volume
- physical volume is simply a very large file (100 GB) mounted at /hay/haystack_/
- Haystack Cache: internal cache
- example of an image's URL:
http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>
- Haystack Directory: metadata / mapping
- mapping and URL construction
- load balance among logical volumes for write, and load balance among physical volumes (within a specific logical volume) for read.
- XFS works best with Haystack
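The "physical volume is one very large file" idea can be sketched as an append-only file plus a lookup table from photo id to byte range. The in-memory index and all names here are assumptions of this sketch, not details taken from the notes above; the point is that a read needs one seek and one read, with no per-photo directory or inode lookup.

```python
import io
from typing import BinaryIO, Dict, Tuple


class PhysicalVolume:
    """Append-only blob file with an in-memory index (hypothetical layout)."""

    def __init__(self, backing: BinaryIO) -> None:
        self.backing = backing
        # photo_id -> (offset, size) inside the single large file
        self.index: Dict[int, Tuple[int, int]] = {}

    def append(self, photo_id: int, data: bytes) -> None:
        offset = self.backing.seek(0, io.SEEK_END)  # always write at the end
        self.backing.write(data)
        self.index[photo_id] = (offset, len(data))

    def read(self, photo_id: int) -> bytes:
        offset, size = self.index[photo_id]
        self.backing.seek(offset)
        return self.backing.read(size)


volume = PhysicalVolume(io.BytesIO())  # BytesIO stands in for the 100 GB file
volume.append(1, b"jpeg-bytes-1")
volume.append(2, b"jpeg-bytes-22")
```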
News Feed
- the system is called multifeed in FB
- Facebook News Feed: Social Data at Scale, and slides
- recent (2015) redesign to News Feed
- What is News Feed
- fetch recent activity from all your friends
- gather it in a central place
- group into stories
- rank stories by relevance etc.
- send back results
- Scale
- 10 billion / day
- 60ms average latency
- Fan-out-on-write vs. Fan-out-on-read
- fan-out-on-write, i.e. push each write to your friends
- can cause so-called write amplification
- what Twitter originally did (later optimized for users with many followers, the "Justin Bieber problem")
- fan-out-on-read i.e. fetch and aggregate at read time - what Facebook does
- flexibility on read-time aggregation (like what content to generate, bound the data volume)
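The difference between the two strategies can be illustrated with a toy model. The class names, the `writes` counter and the in-memory dicts are hypothetical; real systems add ranking, persistence and limits on top of this.

```python
from collections import defaultdict
from typing import Dict, List


class FanOutOnWrite:
    """Push model: every post is copied into each follower's inbox."""

    def __init__(self, followers: Dict[str, List[str]]) -> None:
        self.followers = followers
        self.inboxes: Dict[str, List[str]] = defaultdict(list)
        self.writes = 0  # counts physical writes to show write amplification

    def post(self, author: str, item: str) -> None:
        for follower in self.followers.get(author, []):
            self.inboxes[follower].append(item)
            self.writes += 1

    def read(self, user: str) -> List[str]:
        return list(self.inboxes.get(user, []))


class FanOutOnRead:
    """Pull model: a post is stored once and aggregated at read time."""

    def __init__(self, following: Dict[str, List[str]]) -> None:
        self.following = following
        self.posts: Dict[str, List[str]] = defaultdict(list)
        self.writes = 0

    def post(self, author: str, item: str) -> None:
        self.posts[author].append(item)
        self.writes += 1

    def read(self, user: str, limit: int = 10) -> List[str]:
        items = [p for a in self.following.get(user, []) for p in self.posts.get(a, [])]
        return items[:limit]  # read-time aggregation can bound the data volume
```

One post by an author with two followers costs two writes in the push model and one in the pull model; the pull model pays instead at read time, when it has to visit every followed author.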
- How it works
- incoming requests are sent from the PHP layer to an "aggregator", which figures out which users to query (e.g. a request from me queries all my friends)
- a server called a leaf node holds all activities of a number of users
- there are many leaf nodes for this purpose, partitioned and possibly replicated
- data is loaded from the corresponding leaf nodes, then ranked and aggregated, and finally the stories are sent back
- PHP layer gets back a list of "action ids", and queries memcached/MySQL to load content of the action (like a video, a post)
- a "tailer": an input data pipeline that streams user actions and feedback to a leaf node in real time (e.g. when a user posts a new video)
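The aggregator/leaf flow described above might look roughly like this. Everything concrete here is an assumption for illustration: the CRC32 partitioning rule, the `(action_id, timestamp)` record shape, and ranking by recency as a stand-in for relevance.

```python
import zlib
from typing import Dict, List, Tuple

Action = Tuple[int, float]  # (action_id, timestamp); hypothetical record shape


class LeafNode:
    """Holds recent actions for a subset of users."""

    def __init__(self) -> None:
        self.actions: Dict[str, List[Action]] = {}

    def append(self, user: str, action: Action) -> None:
        self.actions.setdefault(user, []).append(action)

    def fetch(self, users: List[str]) -> List[Action]:
        return [a for u in users for a in self.actions.get(u, [])]


class Aggregator:
    def __init__(self, leaves: List[LeafNode], friends: Dict[str, List[str]]) -> None:
        self.leaves = leaves
        self.friends = friends

    def _leaf_index(self, user: str) -> int:
        # Assumed partitioning rule: stable hash of the user id.
        return zlib.crc32(user.encode("utf-8")) % len(self.leaves)

    def record(self, user: str, action: Action) -> None:
        # Plays the role of the "tailer": route a new action to its leaf.
        self.leaves[self._leaf_index(user)].append(user, action)

    def feed(self, user: str, limit: int = 10) -> List[int]:
        # 1. Work out whom to query and group them by leaf node.
        by_leaf: Dict[int, List[str]] = {}
        for friend in self.friends.get(user, []):
            by_leaf.setdefault(self._leaf_index(friend), []).append(friend)
        # 2. Load candidate actions from each leaf involved.
        candidates: List[Action] = []
        for idx, users in by_leaf.items():
            candidates.extend(self.leaves[idx].fetch(users))
        # 3. Rank (newest first stands in for relevance) and return ids only.
        candidates.sort(key=lambda a: a[1], reverse=True)
        return [action_id for action_id, _ in candidates[:limit]]
```

The caller receives only action ids, matching the step where the PHP layer then loads the actual content from memcached/MySQL.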
Facebook Chat
- Chat Stability and Scalability
- channel server: receive a user's message, and send to the user's browser, written in Erlang
- presence server: whether a user is online or not - channel server pushes active users to presence server - written in C++
- lexical_cast causes memory allocation; when the heap is fragmented, a new malloc() spends considerable CPU time finding memory
Facebook Search
- Intro to facebook search
- Role: find a specific name/page in Facebook, e.g. a guy named "Bob", a band named "Johny"
- Ranking (relevance indicators)
- personal context;
- social context;
- query itself;
- global popularity
- challenges
- no query cache can be used;
- no locality in index (i.e. no hot index)
- Life of a Typeahead Query
- initial try: preload user's friends, pages, groups, applications, upcoming events into browser cache - and try to serve the search here
- request sent to aggregator (similar to News Feed's aggregator), which delegates to several leaf services
- Graph Search on people
- Graph Search on objects
- global objects - an index on all pages and applications on Facebook, no personalization - could be cached
- each leaf service returns some data; the aggregator merges and ranks the results and sends them to the web tier
- results from the aggregator are ids of resources; the web tier loads the data and sends it back to the user's browser
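The first stage of a typeahead query, serving from data already preloaded for the user, can be sketched as a prefix lookup over a sorted list. The entries, the lowercase normalization and the whole-name prefix rule are assumptions of this sketch; a production typeahead would also match on individual words and rank the hits.

```python
import bisect
from typing import List, Tuple


class PrefixIndex:
    """Prefix match over preloaded (name, id) pairs, e.g. a user's friends."""

    def __init__(self, entries: List[Tuple[str, int]]) -> None:
        # Sort by lowercase name so all names sharing a prefix are adjacent.
        self.entries = sorted((name.lower(), eid) for name, eid in entries)
        self.keys = [name for name, _ in self.entries]

    def search(self, prefix: str, limit: int = 5) -> List[int]:
        p = prefix.lower()
        start = bisect.bisect_left(self.keys, p)  # first name >= prefix
        ids: List[int] = []
        for name, eid in self.entries[start:]:
            if not name.startswith(p) or len(ids) >= limit:
                break
            ids.append(eid)
        return ids


friends = PrefixIndex([("Bob Smith", 1), ("Bobby Lee", 2), ("Alice", 3)])
```

Like the aggregator, the lookup returns ids only; loading the display data is left to a later step.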
Graph Search
- Unicorn: A System for Searching the Social Graph
- Under the Hood: Building out the infrastructure for Graph Search
- Under the Hood: Indexing and ranking in Graph Search
- Under the Hood: The natural language interface of Graph Search
- Under the Hood: Building posts search
- history of Facebook search
- keyword based search
- typeahead search, prefix-matching
- Unicorn is an inverted index system for many-to-many mappings. Unlike a typical inverted index, it not only indexes "documents" or entities like users/pages/groups/applications, but also supports search based on the edges (edge types) between nodes
- graph search natural language interface example: employers of my friends who live in New York
- input node: ME
- ME --[friend-edge]--> my friends (who live in NY)
- load the list of nodes connected to the input nodes by a specific edge type; here the edge type is "friend-edge"
- [MY FRIENDS FROM NY] --[works-at-edge]--> employers
- "apply operator", i.e. follow the "works-at" edge from each node in the previous result
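The query "employers of my friends who live in New York" can be walked through on a toy edge index. The node names, the edge-type strings and the `term`/`apply_edge` helpers are hypothetical and only mimic the idea of posting lists keyed by edge type; this is not Unicorn's actual query language.

```python
from typing import Dict, Iterable, Set, Tuple

# Posting lists keyed by (edge type, node id); toy data for illustration.
INDEX: Dict[Tuple[str, str], Set[str]] = {
    ("friend", "me"): {"ann", "ben", "cat"},
    ("lives-in", "new_york"): {"ann", "ben", "dan"},
    ("works-at", "ann"): {"acme"},
    ("works-at", "ben"): {"globex"},
    ("works-at", "cat"): {"initech"},
}


def term(edge_type: str, node: str) -> Set[str]:
    """Nodes connected to `node` by `edge_type` (one posting list)."""
    return INDEX.get((edge_type, node), set())


def apply_edge(edge_type: str, nodes: Iterable[str]) -> Set[str]:
    """Follow `edge_type` from every input node and union the results."""
    result: Set[str] = set()
    for node in nodes:
        result |= term(edge_type, node)
    return result


# Step 1: my friends, intersected with people who live in New York.
friends_in_ny = term("friend", "me") & term("lives-in", "new_york")
# Step 2: apply the works-at edge to that intermediate result.
employers = apply_edge("works-at", friends_in_ny)
```

The intermediate set keeps "cat" out because she is a friend but not in New York, and keeps "dan" out because he lives there but is not a friend.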
- Indexing: performed as a combination of map-reduce jobs that collect data from Hive tables, process them and convert into inverted index data structures
- live updates are streamed into the index via a separate live update pipeline.
- Graph Search components (Unicorn) - essentially an in-memory database with a query language interface
- Vertical - a Unicorn instance - different entity types are kept in separate Unicorn verticals, e.g. USER Vertical, PAGES Vertical
- index server - part of a vertical, holds some of the index given the index is too large to fit into one single host
- Vertical Aggregator - broadcasts the query to all index servers in its vertical and ranks the results
- because there are multiple Unicorn instances (Verticals), there's a TOP AGGREGATOR on top of all vertical aggregators, which runs a blending algorithm to blend the results from each vertical
- Query Rewriting: parse the query into a structured Unicorn retrieval query, correct spelling, handle synonyms / segmentation etc.
- example: "restaurants liked by Facebook employees" gets converted to
273819889375819/places/20531316728/employees/places-liked/intersect
- Scoring to rank result (static ranking); then "Result set scoring" to score the result as a whole, and only return a subset (e.g. "photos of facebook employees" may contain too many photos from Mark Zuckerberg)
- Nested Queries: the structured query may be nested and need to be JOINed, e.g. "restaurants liked by Facebook employees"
- Query Suggestion: relies on an NLP module to identify what kind of entity a term may be ("sri" as in a name vs. "sri" as in "people who live in Sri..")
- Machine Learning is used to adjust the "scoring function"
- How to evaluate Search algorithm changes
- CTR - click through rate
- DCG (discounted cumulative gain) - measures the usefulness (gain) of a result set, by considering the gain of each result in the set and the position of the result
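DCG can be computed as below, using the common formulation in which the gain at rank i (starting from 1) is divided by log2(i + 1). The notes above do not say which DCG variant Facebook uses, so treat the exact discount as an assumption.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked result list.

    gains[0] is the gain of the top result; lower positions are discounted more.
    """
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))
```

The same three results score higher when the most useful one is ranked first: `dcg([3, 2, 1])` is larger than `dcg([1, 2, 3])`, which is what makes the metric sensitive to ordering and not just to which results were returned.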
- Natural Language Interface to Graph Search
- keywords as an interface is not good: nouns only, while connections in Facebook Graph data are verbs
- quite intensive content, see article
- Building Posts Search
- more than 1 billion posts added every day
- Wormhole listens for new posts from the MySQL store of posts
- much larger than other index types, so stored on SSD instead of in RAM
- trillions of posts; nobody can read all the results, so optional clauses are added dynamically to bias the results towards what we think is more valuable to the user
Facebook Messages
- presentation in Hadoop Summit 2011
- Scaling the Messages Application Back End
- Inside Facebook Messages' Application Server
- The Underlying Technology of Messages
- HBase as main storage
- Database Layer: Master / Backup Master / Region Server [1..n]
- Storage Layer: Name node / secondary name node / Data node [1..n]
- Coordination Service: Zookeeper peers
- A user is sticky to an application server
- Cell: application server + HBase node
- 5 or more racks per cell, 20 servers per rack => 100 or more machines per cell
- controllers (master nodes, zookeeper, name nodes) spread across racks
- User Directory Service: find cell for a given user
- A separate backup system - quick and dirty to me
- Use Scribe
- double logging to reduce loss - merge and dedup
- ability to restore
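The "double logging, then merge and dedup" step might be sketched as follows. The record shape with a unique sequence number is an assumption; the notes above do not describe how duplicate log entries are identified.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, str]  # (sequence number, payload); hypothetical shape


def merge_and_dedup(*streams: Iterable[Record]) -> List[Record]:
    """Merge redundant log streams, keeping one copy of each record."""
    seen: Dict[int, str] = {}
    for stream in streams:
        for seq, payload in stream:
            # The first copy wins; later duplicates of the same record are dropped.
            seen.setdefault(seq, payload)
    return sorted(seen.items())
```

A record lost from one stream is recovered as long as the other stream still has it, which is the point of logging twice.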
- quite some effort to make HBase more reliable, fail safe, and support real-time workload.
- action log - any update to a user's mailbox is recorded in the action log, which can be replayed for various purposes
- full text search - use Lucene to extract data and add to HBase, each keyword has its own column
- Testing via Dark Launch - mirror live traffic from Chat and Inbox into a test Messages cluster for about 10% of the users.
Configuration Management
- a 2015 paper on this topic