Facebook Architecture

Quora article
a relatively old presentation on Facebook architecture
another InfoQ presentation on Facebook architecture / scale

Web frontend

  • PHP
  • HipHop
  • HipHop Virtual Machine (HHVM)
  • BigPipe pipelines page rendering by dividing the page into pagelets that are generated and delivered in stages.
  • Varnish Cache for web caching
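
The pagelet idea behind BigPipe can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the pagelets are rendered one after another here rather than concurrently, and `render_pipelined` and the client-side `fillPagelet` function are hypothetical names, not Facebook's API.

```python
import json
from typing import Callable, Dict, Iterator


def render_pipelined(pagelets: Dict[str, Callable[[], str]]) -> Iterator[str]:
    """Yield the page in chunks: skeleton first, then one chunk per pagelet."""
    # The skeleton with empty placeholders can be flushed to the browser at once.
    placeholders = "".join(f'<div id="{name}"></div>' for name in pagelets)
    yield f"<html><body>{placeholders}"
    for name, render in pagelets.items():
        # Each pagelet is rendered independently and flushed as its own chunk;
        # fillPagelet is a hypothetical client-side function that fills the div.
        payload = json.dumps({"id": name, "html": render()})
        yield f"<script>fillPagelet({payload})</script>"
    yield "</body></html>"
```

Each yielded chunk would be flushed to the browser as soon as it is ready, so the user sees the page frame before the slower pagelets finish.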

Business Logic

  • service-oriented: business logic is exposed as services
  • Thrift API
  • multiple language bindings
  • no need to worry about serialization / connection handling / threading
  • supports different server types: non-blocking, async, single-threaded, multi-threaded
  • Java services use a custom application server (not Tomcat, Jetty, etc.)

Persistence

  • MySQL, Memcached, Hadoop's HBase
  • MySQL/InnoDB is used as a key-value store, distributed and load-balanced across many instances
  • a global ID is assigned to user data (user info, wall posts, comments, etc.)
  • Blob data, e.g. photos and videos, is handled separately
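
As a rough sketch of the "MySQL as a key-value store keyed by global ID" idea: the class below routes each global ID to one of several instances. The modulo placement rule and the `ShardedKV` name are assumptions for illustration (the notes do not say how IDs map to instances), and plain dictionaries stand in for MySQL instances.

```python
from typing import Dict, List, Optional


class ShardedKV:
    """Key-value store spread over several instances, addressed by global ID."""

    def __init__(self, num_shards: int) -> None:
        # Each dict stands in for one MySQL/InnoDB instance.
        self.shards: List[Dict[int, bytes]] = [dict() for _ in range(num_shards)]

    def shard_for(self, global_id: int) -> int:
        # Assumed placement rule: simple modulo on the global ID.
        return global_id % len(self.shards)

    def put(self, global_id: int, value: bytes) -> None:
        self.shards[self.shard_for(global_id)][global_id] = value

    def get(self, global_id: int) -> Optional[bytes]:
        return self.shards[self.shard_for(global_id)].get(global_id)
```

Because every object carries a global ID, any web server can compute the owning instance locally without a lookup table, which is what makes the key-value access pattern cheap in this sketch.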

Logging

  • Scribe, one instance on each host
  • Scribe-HDFS for analytics

Photo

  • the first version was NFS-backed storage, served via HTTP
  • Haystack, Facebook's object store for photos
  • Haystack slides
  • Massive CDN to cache and deliver data
  • previously NFS-backed, but a traditional POSIX file system incurs overhead that photo serving does not need: directory resolution, file metadata, inodes, etc.
  • Haystack Store: one server's 10 TB of storage is split into 100 "physical volumes"; physical volumes on different hosts are organized into "logical volumes", and data is replicated within a logical volume
  • physical volume is simply a very large file (100 GB) mounted at /hay/haystack_/
  • Haystack Cache: internal cache
  • example of an image's URL: http://<CDN>/<Cache>/<Machine id>/<Logical volume, Photo>
  • Haystack Directory: metadata / mapping
  • mapping and URL construction
  • load balancing among logical volumes for writes, and among physical volumes (within a specific logical volume) for reads.
  • XFS works best with Haystack
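
The physical/logical volume layout can be illustrated with a small sketch. It assumes each physical volume keeps an in-memory map from photo id to (offset, size) inside one large append-only file, so a read needs no directory or inode lookups; `io.BytesIO` stands in for the large volume file, and the class and method names are hypothetical.

```python
import io
import random
from typing import Dict, List, Tuple


class PhysicalVolume:
    """One large append-only file plus an in-memory (offset, size) index."""

    def __init__(self) -> None:
        self.data = io.BytesIO()  # stands in for the very large volume file
        self.index: Dict[int, Tuple[int, int]] = {}

    def append(self, photo_id: int, blob: bytes) -> None:
        offset = self.data.seek(0, io.SEEK_END)  # always write at the end
        self.data.write(blob)
        self.index[photo_id] = (offset, len(blob))

    def read(self, photo_id: int) -> bytes:
        offset, size = self.index[photo_id]  # one lookup, then one seek + read
        self.data.seek(offset)
        return self.data.read(size)


class LogicalVolume:
    """Group of physical volumes on different hosts holding the same data."""

    def __init__(self, replicas: List[PhysicalVolume]) -> None:
        self.replicas = replicas

    def write(self, photo_id: int, blob: bytes) -> None:
        for replica in self.replicas:  # replicate within the logical volume
            replica.append(photo_id, blob)

    def read(self, photo_id: int) -> bytes:
        # Reads are balanced across the physical volumes of this logical volume.
        return random.choice(self.replicas).read(photo_id)
```

In this sketch the Haystack Directory's job would be to pick a logical volume for each write and to remember which logical volume holds each photo.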

News Feed

  • the system is called Multifeed inside Facebook
  • Facebook News Feed: Social Data at Scale, and slides
  • recent (2015) redesign to News Feed
  • What is News Feed
  • fetch recent activity from all your friends
  • gather it in a central place
  • group into stories
  • rank stories by relevance etc.
  • send back results
  • Scale
  • 10 billion / day
  • 60ms average latency
  • Fan-out-on-write vs. Fan-out-on-read
  • fan-out-on-write, i.e. push each write to your friends
    • can cause so-called write amplification
    • what Twitter originally did (with some later optimization for users with many followers, the "Justin Bieber problem")
  • fan-out-on-read, i.e. fetch and aggregate at read time - what Facebook does
    • flexibility in read-time aggregation (e.g. deciding what content to generate, bounding the data volume)
  • How it works
  • an incoming request is sent from the PHP layer to an "aggregator", which figures out which users to query (e.g. a request from me will query for all my friends)
  • a server called a leaf node holds all activities of a number of users
  • there are many leaf nodes for this purpose, with partitioning and possibly replication
  • data is then loaded from the corresponding leaf nodes, ranked and aggregated, and the stories are finally sent back.
  • the PHP layer gets back a list of "action ids", and queries memcached/MySQL to load the content of each action (like a video or a post)
  • a "tailer" is the input data pipeline: it streams user actions and feedback to the leaf nodes in real time (e.g. when a user posts a new video)
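
The fan-out-on-read flow described above can be sketched as follows. This is a minimal illustration, not the real Multifeed: partitioning by `user_id % len(leaves)` is an assumption, ranking is replaced by plain recency, and all names (`LeafNode`, `tail`, `build_feed`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    action_id: int
    user_id: int
    timestamp: int


class LeafNode:
    """Holds all recent actions for a subset of users."""

    def __init__(self) -> None:
        self.by_user: Dict[int, List[Action]] = {}

    def add(self, action: Action) -> None:
        self.by_user.setdefault(action.user_id, []).append(action)

    def recent(self, user_id: int, limit: int) -> List[Action]:
        actions = sorted(self.by_user.get(user_id, []),
                         key=lambda a: a.timestamp, reverse=True)
        return actions[:limit]


def leaf_for(leaves: List[LeafNode], user_id: int) -> LeafNode:
    # Assumed partitioning rule: modulo on the user id.
    return leaves[user_id % len(leaves)]


def tail(leaves: List[LeafNode], action: Action) -> None:
    """Tailer path: push a new user action to the leaf that owns the user."""
    leaf_for(leaves, action.user_id).add(action)


def build_feed(leaves: List[LeafNode], friends: Dict[int, List[int]],
               viewer_id: int, limit: int = 10) -> List[int]:
    """Aggregator path: query the leaves of all friends, rank, return action ids."""
    candidates: List[Action] = []
    for friend_id in friends.get(viewer_id, []):
        candidates.extend(leaf_for(leaves, friend_id).recent(friend_id, limit))
    # Recency stands in for the real relevance ranking.
    candidates.sort(key=lambda a: a.timestamp, reverse=True)
    return [a.action_id for a in candidates[:limit]]
```

Note that a write touches exactly one leaf, while a read touches one leaf per friend; that is the trade-off fan-out-on-read makes to avoid write amplification. The returned action ids are what the PHP layer would then resolve against memcached/MySQL.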

Facebook Chat

  • Chat Stability and Scalability
  • channel server: receive a user's message, and send to the user's browser, written in Erlang
  • presence server: whether a user is online or not - channel server pushes active users to presence server - written in C++
  • lexical_cast causes memory allocation; when the heap is fragmented, a new malloc() spends considerable CPU time finding memory

Facebook Search

  • Intro to facebook search
  • Role: find a specific name/page in Facebook, e.g. a guy named "Bob", a band named "Johny"
  • Ranking (relevance indicators)
    • personal context;
    • social context;
    • query itself;
    • global popularity
  • challenges
    • no query cache can be used;
    • no locality in index (i.e. no hot index)
  • Life of a Typeahead Query
  • initial try: preload user's friends, pages, groups, applications, upcoming events into browser cache - and try to serve the search here
  • request sent to aggregator (similar to News Feed's aggregator), which delegates to several leaf services
    • Graph Search on people
    • Graph Search on objects
    • global objects - an index on all pages and applications on Facebook, no personalization - could be cached
  • each leaf service returns some data; the aggregator merges and ranks the results, and sends them to the web tier
  • results from the aggregator are ids of resources; the web tier loads the data and sends it back to the user's browser
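
The aggregator step of a typeahead query might look roughly like this. It is a sketch under stated assumptions: each leaf service is modelled as a function returning (resource id, score) pairs, the fixed per-leaf weight is a stand-in for real relevance signals, and `prefix_leaf` and `aggregate` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, float]  # (resource id, score)
LeafService = Callable[[str], List[Result]]


def prefix_leaf(index: Dict[str, int], weight: float) -> LeafService:
    """Build a toy leaf service that prefix-matches names in its own index."""
    def search(prefix: str) -> List[Result]:
        p = prefix.lower()
        return [(rid, weight) for name, rid in index.items()
                if name.lower().startswith(p)]
    return search


def aggregate(prefix: str, leaves: List[LeafService], limit: int = 5) -> List[int]:
    """Query every leaf, merge duplicates, rank, and return resource ids only."""
    best: Dict[int, float] = {}
    for leaf in leaves:
        for rid, score in leaf(prefix):
            best[rid] = max(score, best.get(rid, 0.0))
    # Highest score first; ties broken by id to keep the order deterministic.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [rid for rid, _ in ranked[:limit]]
```

Returning ids rather than full objects mirrors the notes: the web tier is the one that loads the actual data for display.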

Graph Search

  • Unicorn: A System for Searching the Social Graph
  • Under the Hood: Building out the infrastructure for Graph Search
  • Under the Hood: Indexing and ranking in Graph Search
  • Under the Hood: The natural language interface of Graph Search
  • Under the Hood: Building posts search
  • history of Facebook search
    • keyword based search
    • typeahead search, prefix-matching
  • Unicorn is an inverted index system for many-to-many mappings. Unlike a typical inverted index, it indexes not only "documents" (entities such as users, pages, groups, and applications) but also the edges (by edge type) between nodes, so searches can follow them
  • graph search natural language interface example: employers of my friends who live in New York
    • input node: ME
    • ME --[friend-edge]--> my friends (who live in NY) - load list of nodes connected by a specific edge-type to the input nodes, here edge-type is "friend-edge"
    • [MY FRIENDS FROM NY]--[works-at-edge]--> employers - "apply operator", i.e. follow the "works-at" edge
  • Indexing: performed as a combination of map-reduce jobs that collect data from Hive tables, process them and convert into inverted index data structures
  • live updates are streamed into the index via a separate live update pipeline.
  • Graph Search components (Unicorn) - essentially an in-memory database with a query language interface
    • Vertical - a Unicorn instance - different entity types are kept in separate Unicorn verticals, e.g. USER Vertical, PAGES Vertical
    • index server - part of a vertical; holds a portion of the index, since the index is too large to fit on a single host
    • Vertical Aggregator - broadcasts the query to all index servers in its vertical and ranks the results
    • because there are multiple Unicorn instances (verticals), a top aggregator sits on top of all vertical aggregators and runs a blending algorithm to blend the results from each vertical
    • Query Rewriting: parse the query into a structured Unicorn retrieval query, correct spelling, handle synonyms / segmentation, etc.
    • example: "restaurants liked by Facebook employees" gets converted to 273819889375819/places/20531316728/employees/places-liked/intersect
    • Scoring to rank result (static ranking); then "Result set scoring" to score the result as a whole, and only return a subset (e.g. "photos of facebook employees" may contain too many photos from Mark Zuckerberg)
    • Nested Queries: the structured query may be nested and need to be JOINed, e.g. "restaurants liked by Facebook employees"
    • Query Suggestion: relies on an NLP module to identify what kind of entity a term may be (sri as in a name vs. sri as in "people who live in Sri..")
  • Machine Learning is used to adjust the "scoring function"
  • How to evaluate Search algorithm changes
    • CTR - click through rate
    • DCG (discounted cumulative gain) - measures the usefulness (gain) of a result set by summing the gain of each result discounted by its position; in its common form, DCG = sum over positions i of gain_i / log2(i + 1)
  • Natural Language Interface to Graph Search
    • keywords as an interface is not good: nouns only, while connections in Facebook Graph data are verbs
    • quite intensive content, see article
  • Building Posts Search
    • more than 1 billion posts added every day
    • Wormhole is used to listen for new posts from the MySQL store of posts
    • much larger than other index types - stored on SSD instead of in RAM
    • trillions of posts, so nobody can read all the results - optional clauses are added dynamically to bias the results towards what is likely to be more valuable to the user
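
The "employers of my friends who live in New York" example can be sketched with a toy edge-typed inverted index. This is an assumption-laden illustration rather than Unicorn's actual data model or query language: posting lists are keyed by (edge type, node), and `EdgeIndex`, `term`, and `apply` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple


class EdgeIndex:
    """Toy inverted index: (edge type, node) -> set of connected nodes."""

    def __init__(self) -> None:
        self.postings: Dict[Tuple[str, str], Set[str]] = {}

    def add(self, edge_type: str, src: str, dst: str) -> None:
        self.postings.setdefault((edge_type, src), set()).add(dst)

    def term(self, edge_type: str, src: str) -> Set[str]:
        """Nodes connected to src by the given edge type."""
        return set(self.postings.get((edge_type, src), set()))

    def apply(self, edge_type: str, sources: Iterable[str]) -> Set[str]:
        """Follow one edge type from every node in an intermediate result."""
        out: Set[str] = set()
        for src in sources:
            out |= self.term(edge_type, src)
        return out


def employers_of_friends_in(idx: EdgeIndex, me: str, city: str) -> Set[str]:
    # Step 1: friends of the input node, intersected with residents of the city.
    friends_in_city = idx.term("friend", me) & idx.term("resident", city)
    # Step 2: apply the works-at edge to that intermediate set.
    return idx.apply("works-at", friends_in_city)
```

The second step is the nested part of the query: its input is itself the result of an index lookup, which is why such queries need a join-like "apply" rather than a single posting-list intersection.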

Facebook Messages

  • presentation in Hadoop Summit 2011
  • Scaling the Messages Application Back End
  • Inside Facebook Messages' Application Server
  • The Underlying Technology of Messages
  • HBase as main storage
    • Database Layer: Master / Backup Master / Region Server [1..n]
    • Storage Layer: Name node / secondary name node / Data node [1..n]
    • Coordination Service: Zookeeper peers
  • A user is sticky to an application server
  • Cell: application server + HBase node
    • 5 or more racks per cell, 20 servers per rack => at least 100 machines per cell
    • controllers (master nodes, zookeeper, name nodes) spread across racks
  • User Directory Service: find cell for a given user
  • A separate backup system - quick and dirty to me
    • Use Scribe
    • double logging to reduce loss - merge and dedup
    • ability to restore
  • considerable effort went into making HBase more reliable, fail-safe, and able to support real-time workloads.
  • action log - any update to a user's mailbox is recorded in the action log, which can be replayed for various purposes
  • full text search - use Lucene to extract data and add to HBase, each keyword has its own column
  • Testing via Dark Launch - mirror live traffic from Chat and Inbox into a test Messages cluster for about 10% of the users.
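
Two of the ideas above, double logging with merge/dedup and a replayable action log, can be illustrated together. This is a sketch only: the per-user sequence number, the "add"/"delete" operations, and the function names are assumptions made for the example, not details from the Messages system.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class LogEntry:
    seq: int         # hypothetical per-user sequence number
    op: str          # "add" or "delete" (assumed operation set)
    message_id: int


def merge_and_dedup(*streams: Iterable[LogEntry]) -> List[LogEntry]:
    """Merge doubly-logged streams, keeping one copy of each sequence number."""
    seen: Dict[int, LogEntry] = {}
    for stream in streams:
        for entry in stream:
            seen.setdefault(entry.seq, entry)
    return [seen[seq] for seq in sorted(seen)]


def replay(entries: Iterable[LogEntry]) -> Set[int]:
    """Rebuild the set of message ids in a mailbox by replaying the log in order."""
    mailbox: Set[int] = set()
    for entry in entries:
        if entry.op == "add":
            mailbox.add(entry.message_id)
        elif entry.op == "delete":
            mailbox.discard(entry.message_id)
    return mailbox
```

If one log stream loses an entry, the other still supplies it after the merge, and replaying the merged log restores the mailbox state.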

Configuration Management
