What’s the difference between data mining and data warehousing?
Data mining is the process of finding patterns in a given data set. These patterns can often provide meaningful and insightful data to whoever is interested in that data. Data mining is used today in a wide variety of contexts – in fraud detection, as an aid in marketing campaigns, and even supermarkets use it to study their consumers.
Data warehousing can be said to be the process of centralizing or aggregating data from multiple sources into one common repository.
Example of data mining
|
If you’ve ever used a credit card, then you may know that credit card companies will alert you when they think that your credit card is being fraudulently used by someone other than you. This is a perfect example of data mining – credit card companies have a history of your purchases from the past and know geographically where those purchases have been made. If all of a sudden some purchases are made in a city far from where you live, the credit card companies are put on alert to a possible fraud since their data mining shows that you don’t normally make purchases in that city. Then, the credit card company can disable your card for that transaction or just put a flag on your card for suspicious activity.
Another interesting example of data mining is how one grocery store in the USA used the data it collected on it’s shoppers to find patterns in their shopping habits.
They found that when men bought diapers on Thursdays and Saturdays, they also had a strong tendency to buy beer.
The grocery store could have used this valuable information to increase their profits. One thing they could have done – odd as it sounds – is move the beer display closer to the diapers. Or, they could have simply made sure not to give any discounts on beer on Thursdays and Saturdays. This is data mining in action – extracting meaningful data from a huge data set.
Subscribe to our newsletter for more free interview questions.
Example of data warehousing – Facebook
A great example of data warehousing that everyone can relate to is what Facebook does. Facebook basically gathers all of your data – your friends, your likes, who you stalk, etc – and then stores that data into one central repository. Even though Facebook most likely stores your friends, your likes, etc, in separate databases, they do want to take the most relevant and important information and put it into one central aggregated database. Why would they want to do this? For many reasons – they want to make sure that you see the most relevant ads that you’re most likely to click on, they want to make sure that the friends that they suggest are the most relevant to you, etc – keep in mind that this is the data mining phase, in which meaningful data and patterns are extracted from the aggregated data. But, underlying all these motives is the main motive: to make more money – after all, Facebook is a business.
We can say that data warehousing is basically a process in which data from multiple sources/databases is combined into one comprehensive and easily accessible database. Then this data is readily available to any business professionals, managers, etc. who need to use the data to create forecasts – and who basically use the data for data mining.
Datawarehousing vs Datamining
Remember that data warehousing is a process that must occur before any data mining can take place. In other words, data warehousing is the process of compiling and organizing data into one common database, and data mining is the process of extracting meaningful data from that database. The data mining process relies on the data compiled in the datawarehousing phase in order to detect meaningful patterns.
In the Facebook example that we gave, the data mining will typically be done by business users who are not engineers, but who will most likely receive assistance from engineers when they are trying to manipulate their data. The data warehousing phase is a strictly engineering phase, where no business users are involved. And this gives us another way of defining the 2 terms: data mining is typically done by business users with the assistance of engineers, and data warehousing is typically a process done exclusively by engineers.
What’s the difference between data mining and data warehousing?的更多相关文章
- Datasets for Data Mining and Data Science
https://github.com/mattbane/RecommenderSystem http://grouplens.org/datasets/movielens/ KDDCUP-2012官网 ...
- Machine Learning and Data Mining(机器学习与数据挖掘)
Problems[show] Classification Clustering Regression Anomaly detection Association rules Reinforcemen ...
- Distributed Databases and Data Mining: Class timetable
Course textbooks Text 1: M. T. Oszu and P. Valduriez, Principles of Distributed Database Systems, 2n ...
- What is the most common software of data mining? (整理中)
What is the most common software of data mining? 1 Orange? 2 Weka? 3 Apache mahout? 4 Rapidminer? 5 ...
- A web crawler design for data mining
Abstract The content of the web has increasingly become a focus for academic research. Computer prog ...
- cluster analysis in data mining
https://en.wikipedia.org/wiki/K-means_clustering k-means clustering is a method of vector quantizati ...
- Weka 3: Data Mining Software in Java
官方网站: Weka 3: Data Mining Software in Java 相关使用方法博客 WEKA使用教程(经典教程转载) (实例数据:bank-data.csv) Weka初步一.二. ...
- data mining,machine learning,AI,data science,data science,business analytics
数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)的区别是什么? 数据科学(data science)和商业分析(business analytics ...
- 数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)的区别是什么? 数据科学(data science)和商业分析(business analytics)之间有什么关系?
本来我以为不需要解释这个问题的,到底数据挖掘(data mining),机器学习(machine learning),和人工智能(AI)有什么区别,但是前几天因为有个学弟问我,我想了想发现我竟然也回答 ...
随机推荐
- 487-3279[POJ1002]
487-3279 Time Limit: 2000MS Memory Limit: 65536K Total Submissions: 284182 Accepted: 51001 Descr ...
- BZOJ2706 : [SDOI2012]棋盘覆盖
A类数据: 将棋盘黑白染色,相邻的点之间连边,求出二分图最大匹配即可. B类数据: 答案为$\lfloor\frac{n^2-1}{3}\rfloor$,用FFT加速计算即可,时间复杂度$O(L\lo ...
- 【转】推荐UML插件AmaterasUML
基于Green UML在使用过程中的问题(对于大工程,点击生成类图后不响应),自己只能再次寻找其他的插件.在无意中,发现AmaterasUML. 官方网站:http://amateras.source ...
- sublime注释插件与javascript注释规范
前言 代码中注释是不可少的,即使是自己写的代码,过了一段时间之后再重看,如果没有注释记录的话,可能会想不到当初是这样实现的,尤其是在业务逻辑比较复杂的项目,注释变得尤为重要.怎么优雅的写有用的注释呢? ...
- odeforces Beta Round #77 (Div. 2 Only)
A. Football time limit per test 2 seconds memory limit per test 256 megabytes input standard input o ...
- BZOJ 1015 题解
1015: [JSOI2008]星球大战starwar Description 很久以前,在一个遥远的星系,一个黑暗的帝国靠着它的超级武器统治者整个星系.某一天,凭着一个偶然的机遇,一支反抗军摧毁了帝 ...
- Oracle数据库合并行记录,WMSYS.WM_CONCAT 函數的用法
Sql代码 select t.rank, t.Name from t_menu_item t; 10 CLARK 10 KING 10 MILLER 20 ADAMS 20 F ...
- zabbix3.2.0beta2 监控模版
Zabbix监控中用到了一系列模版,nginx后端检测状态 微信告警等一系列常规的服务应用监控 memcached监控模版,可以自己重新定义memcached的端口 http://files.cnbl ...
- [LintCode] Ugly Number 丑陋数
Write a program to check whether a given number is an ugly number`. Ugly numbers are positive number ...
- Java I/O Basic
/* 记住每个类相应的用法*/流的分类: io包内定义了所有的流 分类: 方向:输入流.输出流 处理数据单位:字节流.字符流 功能不同:节点流.处理流 所有流类型,位于java.io包内,分别继承以下 ...