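Week 4: streaming data format. Next come data lakes. Schema-on-read: raw data is read from the data source and placed directly in the data lake, then read into the model. Schema-on-write: the traditional approach, where raw data is processed before it is placed in the data warehouse; by then it is already structured and can be loaded directly. Data lake summary. Week 5 - big data management: where traditional DBMSs need to improve for big data s…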
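The contrast between schema-on-write and schema-on-read can be sketched in a few lines of Python. This is only an illustration: the `RAW` records, the field names, and both function names are hypothetical and do not come from the course material.

```python
import json

# Hypothetical raw records as they might arrive from a data source.
RAW = ['{"user": "ann", "age": "31"}', '{"user": "bob"}']

def schema_on_write(raw_lines):
    """Warehouse style: impose structure and types before storing."""
    table = []
    for line in raw_lines:
        rec = json.loads(line)
        table.append({
            "user": str(rec["user"]),
            # Missing values are made explicit at write time.
            "age": int(rec["age"]) if "age" in rec else None,
        })
    return table

def schema_on_read(stored_lines, fields):
    """Lake style: keep raw lines untouched and pick fields only when reading."""
    for line in stored_lines:
        rec = json.loads(line)
        yield {f: rec.get(f) for f in fields}

warehouse = schema_on_write(RAW)             # structured once, loaded directly later
lake_view = list(schema_on_read(RAW, ["user"]))  # schema chosen by the reader
print(warehouse)
print(lake_view)
```

The trade-off shown here is that the warehouse pays the structuring cost once up front, while the lake defers it to each reader.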
Introduction to data management. Course 2 as a whole covers data management and storage; the main topics are distributed file systems, HDFS, Redis, and so on. What is data management? Introduction to data model. What is a data model? Three components - Structure, Operations, Constraints. Four basic data operations - selection(…
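Week 4: Big Data Processing Pipeline. The figure above can be generalized into the figure below, that is, the big data pipeline. Some high-level processing operations in a big data pipeline: which data transformation methods are available in a pipeline? The course gave an analogy for data transformation: turning raw logs into furniture. The basic data transformation operations are: Map is…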
Week 1: Machine Learning with Big Data. KNIME - GUI based; Spark MLlib - inside Spark; CRISP-DM. Week 2, Data Exploration: there are generally two approaches, summary statistics and visualization. Summary statistics (mean, median, and mode, the most frequent value). High kurtosis suggests that outliers are present. visuali…
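A minimal sketch of these summary statistics in plain Python, using only the standard library. The `summarize` helper and the sample values are made up for illustration; kurtosis is computed here as population excess kurtosis, which is one common definition and an assumption on my part.

```python
import statistics

def summarize(values):
    """Return mean, median, mode, and excess kurtosis for a list of numbers.

    Assumes the values are not all identical (variance must be non-zero).
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Population variance (m2) and fourth central moment (m4).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Excess kurtosis: 0 for a normal distribution, larger with heavy tails.
        "kurtosis": m4 / m2 ** 2 - 3,
    }

clean = [10, 11, 11, 12, 13, 11, 12, 10]   # hypothetical sample
with_outlier = clean + [60]                # same sample plus one extreme value

print(summarize(clean))
print(summarize(with_outlier))
```

Adding the single extreme value raises the kurtosis noticeably, which is the signal the note above refers to.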
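Week 5, Big Data Analytics using Spark. Programming in Spark. Spark Core: Programming in Spark using RDD in pipelines. Once an RDD is created, two kinds of operations apply to it: transformations and actions. Transformations are not executed, and so not checked, until an action runs, which is why many errors show up at the action stage. This is called lazy evaluation. The figure below is a concrete example. The tutorial mentions cac…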
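The lazy behaviour described above can be imitated with plain Python generators. This is an analogy only and not the Spark API: `parse` stands in for a transformation and `collect` for an action, and both names are chosen here for illustration.

```python
def parse(lines):
    # "Transformation": returns a lazy generator; int() has not been called yet.
    return (int(line) for line in lines)

def collect(iterable):
    # "Action": forces every element to be computed.
    return list(iterable)

# Building the pipeline succeeds even though one record is malformed.
pipeline = parse(["1", "2", "oops", "4"])

try:
    collect(pipeline)
except ValueError as exc:
    # The mistake in the transformation only surfaces when the action runs.
    print("error surfaced at the action:", exc)
```

In Spark the same pattern appears when a faulty `map` function is defined without complaint and only fails once an action such as `collect` or `count` triggers the computation.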
This is the 3rd course in the big data specialization. Data model review: 1. Characteristics of a data model: structure, operations on it, constraints. 2. Different types of data model. Retrieving data (week 1/2): querying data from a relational DB. query data from mon…
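What is a distributed file system, and why is one needed? A file system that manages many storage units connected over a network is called a distributed file system. It provides data scalability, fault tolerance, and high concurrency, which traditional file systems lack. Hadoop getting started: why use Hadoop? Hadoop's 4 Whats and the How. Hadoop's main goals: 1. Scale out by adding nodes. 2. Fault tolerance: recover easily when a node goes down. 3. Read data in a variety of formats (structured, unst…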
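Status: week 2 done. Week 1 mainly covered the sources of big data: machine-generated data, people-generated data (such as updates on social apps, generally unstructured data), and organization-generated data (generally structured data). How is unstructured data turned into structured data? With Hadoop, Storm, Spark, and NoSQL. Hadoop handles large data volumes because it supports distributed computing. Storm and Spark can…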
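Week 3: Classification. KNN: the basic idea is that samples with similar input values probably belong to the same class. Decision Tree. Naive Bayes. Week 4: Evaluating models. Overfitting: how to avoid overfitting when training a decision tree: pre-pruning and post-pruning. Two stopping conditions for pre-pruning: 1. the number of records at a node falls below a threshold, e.g. <20; 2. purity reaches a given value, e.g.…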
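The two pre-pruning conditions can be sketched as a single check that a tree builder would call before splitting a node. The function name, the use of the majority-class fraction as the purity measure, and the 0.9 purity threshold are all assumptions made here for illustration; the note's own purity value is cut off. Only the 20-record threshold comes from the text.

```python
from collections import Counter

def should_stop(labels, min_records=20, purity_threshold=0.9):
    """Return True when a node should become a leaf instead of being split.

    labels: class labels of the training records that reached this node.
    min_records: condition 1, stop when fewer records than this remain.
    purity_threshold: condition 2, hypothetical value; stop when the
        majority class makes up at least this fraction of the node.
    """
    if len(labels) < min_records:
        return True
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels) >= purity_threshold

print(should_stop(["yes"] * 5))                # too few records
print(should_stop(["yes"] * 95 + ["no"] * 5))  # already nearly pure
print(should_stop(["yes"] * 50 + ["no"] * 50)) # large and mixed: keep splitting
```

Stopping early in either case keeps the tree from growing branches that fit only a handful of records, which is how pre-pruning limits overfitting.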
1. Keystonejs http://keystonejs.com/ 2. Apostrophe http://apostrophenow.org/…
https://en.wikipedia.org/wiki/List_of_content_management_systems Microsoft ASP.NET Name Platform Supported databases Latest stable release Licenses Latest release date C1 CMS ASP.NET (Web Forms, MVC) XML, SQL Server 6.1 Mozilla Public License 2017-04…
English text: The “Hype Cycle for Emerging Technologies” report is the longest-running annual Hype Cycle, providing a cross-industry perspective on the technologies and trends that IT managers should consider in developing emerging-technology portfolios (se…
ABSTRACT Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The…
Original article: http://www.javacodegeeks.com/2015/07/mysql-vs-mongodb.html 1. Introduction It would be fair to say that as IT professionals we are living in the golden age of data management. As our software systems become more complex and more distributed,…
An approach is provided in a hypervised computer system where a page table request is at an operating system running in the hypervised computer system. The operating system determines whether the page table request requires the hypervisor to process.…
Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. 2009. Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41, 3, Article 16 (July 2009), 52 pages. (gs:173) This paper is a survey of data quality methodologies, 52 pages in total (34 pages of main text and 18 pages of appendix), covering existing "d…
https://www.gartner.com/doc/reprints?id=1-4LC8PAW&ct=171130&st=sb Summary Security and risk management leaders are implementing and expanding SIEM to improve early targeted attack detection and response. Advanced users seek SIEM with advanced prof…
https://www.fdic.gov/regulations/examinations/credit_card/ch8.html Types of Scoring FICO Scores    VantageScore    Other Scores              Application Scoring              Attrition Scoring              Bankruptcy Scoring              Behavior Scor…
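http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying Main idea: every system can be viewed as two parts, the log system that holds the real data and the various query engines. All consistency is guaranteed by the log system; the query engines do not need to handle consistency or safety themselves and only have to keep syncing data from the log system. If data is lost or c…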
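The split between a log and its query engines can be sketched as follows. The class and method names are hypothetical, and the recovery step assumes, as the truncated sentence appears to suggest, that a damaged view is rebuilt by replaying the log from the start.

```python
class Log:
    """Append-only, ordered record of every change: the source of truth."""

    def __init__(self):
        self._entries = []

    def append(self, key, value):
        self._entries.append((key, value))
        return len(self._entries) - 1   # offset of the new entry

    def read_from(self, offset):
        return self._entries[offset:]


class KeyValueView:
    """A toy query engine: a derived view that only follows the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0    # next log position to apply
        self.state = {}

    def sync(self):
        for key, value in self.log.read_from(self.offset):
            self.state[key] = value
            self.offset += 1

    def reset(self):
        # Simulates losing the view; a later sync() replays the whole log.
        self.state.clear()
        self.offset = 0


log = Log()
log.append("user:1", "ann")
log.append("user:1", "anna")   # later entries override earlier ones
log.append("user:2", "bob")

view = KeyValueView(log)
view.sync()
print(view.state)

view.reset()   # view lost
view.sync()    # rebuilt purely from the log
print(view.state)
```

Because the view holds no state that the log cannot reproduce, ordering and durability only have to be solved once, in the log.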
Problems Classification Clustering Regression Anomaly detection Association rules Reinforcement learning Structured prediction Feature engineering Feature learning Online learning Semi-supervised learning Unsupervised learning Learning to rank…
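From a job-hunting perspective, I found the following courses very helpful. My top recommendation is Robert Sedgewick, the teacher who helped me the most; his strength is explaining complex algorithms clearly (typical examples: red-black trees and the KMP algorithm). He has four courses on Coursera that build on one another and become progressively more theoretical; the first three in particular are well worth taking. In my view, after the first two your theoretical foundation (combined with practice on coding problems, of course) is enough to breeze through interviews at most small companies and the majority of large ones. After the third you can take on top-tier companies such as Google, Facebook, and LinkedIn. The fourth has not opened yet, but judging from the course description, after finishing it you could go be a big company's…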
BACKGROUND The present invention relates to video processing systems. Advances in imaging technology have led to high resolution cameras for personal use as well as professional use. Personal uses include digital cameras and camcorders that can captu…
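Article title: One SQL to Rule Them All – an Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables. Literal rendering of the Chinese title: unifying everything with SQL, an efficient and syntactically idiomatic method for managing streams and tables. Vocabulary: syntactically: in terms of syntax; grammatically. idiomatic [ˌɪdiəˈmætɪk]: customary; conforming to natural usage; of idioms. approach [əˈproʊtʃ] v. (in distance or time…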
Project Management Process Description STAGE/STEP/TASK SUMMARY LIST…
1. Business Analytic Applications Data Analytics Also referred to as 'Business Analytics' or 'Business Intelligence' Although basic reporting capabilities have been built into ERP systems since their inception, there is increasing interest in making…
https://github.com/onurakpolat/awesome-bigdata A curated list of awesome big data frameworks, resources and other awesomeness. Inspired by awesome-php, awesome-python, awesome-ruby, hadoopecosystemtable & big-data. Your contributions are always welco…
January 22, 2019 Use Cases, Apache Flink Lasse Nedergaard Recently there has been significant discussion about edge computing as a major technology trend in 2019. Edge computing brings computing capabilities away from the cloud, and rather close t…
Original article: http://www.yourenterprisearchitect.com/2011/10/introducing-service-bus.html. Expert Systems. The enterprise is typically composed of expert systems that perform core business functions and perform them well. A retail system consists of points…
Open source software has become a fundamental building block for some of the biggest websites. And as those websites have grown, best practices and guiding principles around their architectures have emerged. This chapter seeks to cover some of the ke…
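Data: symbolic records that describe things are called data. Data is the basic object stored in a database. Beyond plain numbers, things such as a book's title, price, and author can all be called data. Multiple data records are arranged into a table, and data is managed through tables. Each row of data is called a record, and the content of each column is called a field. Each column has its own data type. Database (DB): a database is a warehouse for data; all data is kept on the computer's storage devices, and all of it is stored in a defined format. A database is a large, organized, shareable collection of data stored in a computer over the long term. Data…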