This is the 3rd course in big data specification courses. Data model reivew 1, data model 的特点: Structured, operations on it, constrains. 2. different types of data model Retrieving data (week 1/2) Querying data from ralational DB. query data from mon…
Week 4 Big Data Precessing Pipeline 上图可以generalize 成下图,也就是Big data pipeline some high level processing operations in big data pipeline 在一个pipeline里 有哪些data transformation 方法?课程上讲了一个类比data transformation的例子,把原木加工成家具. 基本的data transformation 操作有 : Map 是…
Week 5, Big Data Analytics using Spark Programing in Spark Spark Core: Programming in Spark using RDD in pipelines RDD 创建过后,会有两种操作,Transformation 和 Action. 只有到了Action 阶段才会验证Transformation 操作是否正确,所以经常看到Action阶段有很多报错. 叫 lazy 下图是一个具体的例子. 教程里提到了cac…
week4 streaming data format 下面讲 data lakes schema-on-read: 从数据源读取raw data 直接放到 data lake 里,然后再读到model里 schema-on-write: 传统模式,把raw data 经过处理后放到data warehouse里,此时已经是结构化的数据,然后直接load 出来 data lake summary week5 - big data management 针对大数据,传统DBMS 需要提高的地方 s…
Introduction to data management 整个coures 2 是讲data management and storage 的,主要内容就是分布式文件系统,HDFS, Redis 等 What is data management? Introduction to data model 什么是data model? 三个component - Structure, Operations, Constrants 四个基本 data operation - selection(…
http://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/ Overview In recent years, this idea got a lot of traction and a whole bunch of solutions like Twitter's Storm, Yahoo's S4, Cloudera's Impala, Apache Spark, and Apache Tez…