1. Hadoop It would be impossible to talk about open source data analytics without mentioning Hadoop. This Apache Foundation project has become nearly synonymous with big data, and it enables large-scale distributed processing of extremely large data…
ABSTRACT Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The…
http://www.infoq.com/articles/bigdata-analytics-for-security This article first appeared in the IEEE Security & Privacy magazine and is brought to you by InfoQ & IEEE Computer Society. Enterprises routinely collect terabytes of security-relevant da…
Assessment Task. IAB303 Data Analytics for Business Insight, Semester I 2019. Assessment 2 – Data Analytics Notebook. Name: Assessment 2 – Data Analytics Notebook. Due: Sun 28 Apr 11:59pm. Weight: 30% (indicative weighting). Submit: Jupyter Notebook via Blackboard. Ratio…
Each of the five Amazon Web Services (AWS) certifications brings in an average salary of more than $100,000. There are more than 685,000 Project Management Professionals (PMPs) worldwide, and their average annual salary is $116,094. Each of the five…
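Week 5, Big Data Analytics using Spark. Programming in Spark. Spark Core: Programming in Spark using RDD in pipelines. Once an RDD has been created, two kinds of operations can be applied to it: Transformations and Actions. Transformations are not executed until an Action is called, so a mistake in a Transformation often only surfaces as an error at the Action stage. This is called lazy evaluation. The figure below shows a concrete example. The tutorial mentions cac…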
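The lazy behavior described in the Week 5 notes can be illustrated without a Spark cluster. The sketch below is a toy, pure-Python stand-in and not Spark's implementation: `LazyDataset` is a hypothetical class invented here, whose `map` and `filter` only record a step (like a Transformation) and whose `collect` runs all recorded steps (like an Action). In PySpark the analogous calls would be `rdd.map(...)`, `rdd.filter(...)`, and `rdd.collect()`.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Step = Tuple[str, Callable]


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical class, not part of Spark)."""

    def __init__(self, source: Iterable, steps: Optional[List[Step]] = None):
        self._source = list(source)
        self._steps: List[Step] = steps or []  # recorded, not yet executed

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: record the step and return a new dataset.
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: also only recorded.
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    def collect(self) -> list:
        # Action: every recorded step actually runs here.
        items = self._source
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


numbers = LazyDataset([1, 2, 0, 4])

# This transformation is buggy for x == 0, but defining it raises nothing.
inverted = numbers.map(lambda x: 10 / x)

try:
    inverted.collect()  # the error only appears when the action runs
    error_seen_at_action = False
except ZeroDivisionError:
    error_seen_at_action = True

evens_times_ten = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * 10).collect()
```

Defining `inverted` succeeds even though it divides by zero for one element; the failure is raised inside `collect()`. This mirrors why errors in a Spark pipeline tend to be reported at the Action stage.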
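Week 4: Big Data Processing Pipeline. The figure above can be generalized into the figure below, which is the big data pipeline. Some high-level processing operations in a big data pipeline: what data transformation methods are available within a pipeline? The course gave an analogy for data transformation: turning raw logs into furniture. The basic data transformation operations are: Map is…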
This is the 3rd course in the big data specialization. Data model review: 1. Characteristics of a data model: structure, operations on it, constraints. 2. Different types of data model. Retrieving data (week 1/2): querying data from a relational DB, query data from mon…
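1. Abstract. Data is the backbone of every technology business, and this is especially true for Halodoc as a healthcare technology platform. Users can interact with Halodoc in the following ways: medicine delivery; talking to a doctor; lab tests; hospital appointments and medicines. All of these interactions generate highly sensitive, diverse, and often unstructured data. As the company grows, it therefore needs a robust data platform that meets the following requirements: ensure data privacy and security; be reliable, scalable, fast, and highly available when handling structured and semi-structured or unstructured data; facilitate generating reports and real-time dashboards for business and operations teams; give the data science team a platform to run experiments and models and to store results. 2.…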