Database

https://en.wikipedia.org/wiki/Database

A database is an organized collection of data.[1] A relational database, more restrictively, is a collection of schemas, tables, queries, reports, views, and other elements. Database designers typically organize the data to model aspects of reality in a way that supports processes requiring information, such as (for example) modelling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.

https://searchsqlserver.techtarget.com/definition/database

Definition

database (DB)

 

A database is a collection of information that is organized so that it can be easily accessed, managed and updated.

Data is organized into rows, columns and tables, and it is indexed to make it easier to find relevant information. Data gets updated, expanded and deleted as new information is added. Databases process workloads to create and update themselves, querying the data they contain and running applications against it.

Data Warehouse

https://en.wikipedia.org/wiki/Data_warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.[1] DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place[2] that are used for creating analytical reports for workers throughout the enterprise.[3]

两者区别

数据库面向现实世界的建模, 用于数据业务处理。

数据仓库,用于归纳数据于一处, 以便输出分析报告, 做数据挖掘等使用。

https://www.healthcatalyst.com/database-vs-data-warehouse-a-comparative-review

What I will refer to as a “database” in this post is one designed to make transactional systems run efficiently. Typically, this type of database is an OLTP (online transaction processing) database.

The important fact is that a transactional database doesn’t lend itself to analytics. To effectively perform analytics, you need a data warehouse. A data warehouse is a database of a different kind: an OLAP (online analytical processing) database. A data warehouse exists as a layer on top of another database or databases (usually OLTP databases). The data warehouse takes the data from all these databases and creates a layer optimized for and dedicated to analytics.

So the short answer to the question I posed above is this: A database designed to handle transactions isn’t designed to handle analytics. It isn’t structured to do analytics well. A data warehouse, on the other hand, is structured to make analytics fast and easy.

  Database Data Warehouse
Definition Any collection of data organized for storage, accessibility, and retrieval. A type of database that integrates copies of transaction data from disparate source systems and provisions them for analytical use.
Types There are different types of databases, but the term usually applies to an OLTP application database, which we’ll focus on throughout this table.Other types of databases include OLAP (used for data warehouses), XML, CSV files, flat text, and even Excel spreadsheets. We’ve actually found that many healthcare organizations use Excel spreadsheets to perform analytics (a solution that is not scalable). A data warehouse is an OLAP database. An OLAP database layers on top of OLTPs or other databases to perform analytics.An important side note about this type of database: Not all OLAPs are created equal. They differ according to how the data is modeled. Most data warehouses employ either an enterprise or dimensional data model, but at Health Catalyst, we advocate a unique, adaptive Late- Binding™ approach. You can learn more about why the Late-Binding™ approach is so important in healthcare analytics in Late-Binding vs. Models: A Comparison of Healthcare Data Warehouse Methodologies.
Similarities Both OLTP and OLAP systems store and manage data in the form of tables, columns, indexes, keys, views, and data types. Both use SQL to query the data.
How used Typically constrained to a single application: one application equals one database. An EHR is a prime example of a healthcare application that runs on an OLTP database. OLTP allows for quick real-time transactional processing. It is built for speed and to quickly record one targeted process (ex: patient admission date and time). Accommodates data storage for any number of applications: one data warehouse equals infinite applications and infinite databases.OLAP allows for one source of truth for an organization’s data. This source of truth is used to guide analysis and decision-making within an organization (ex: total patients over age 18 who have been readmitted, by department and by month). Interestingly enough, complex queries like the one just described are much more difficult to handle in an OLTP database.
Service Level Agreement (SLA) OLTP databases must typically meet 99.99% uptime. System failure can result in chaos and lawsuits. The database is directly linked to the front end application.Data is available in real time to serve the here-and-now needs of the organization. In healthcare, this data contributes to clinicians delivering precise, timely bedside care. With OLAP databases, SLAs are more flexible because occasional downtime for data loads is expected. The OLAP database is separated from frontend applications, which allows it to be scalable.Data is refreshed from source systems as needed (typically this refresh occurs every 24 hours). It serves historical trend analysis and business decisions.
Optimization Optimized for performing read-write operations of single point transactions. An OLTP database should deliver sub-second response times.Performing large analytical queries on such a database is a bad practice, because it impacts the performance of the system for clinicians trying to use it for their day-to-day work. An analytical query could take several minutes to run, locking all clinicians out in the meantime. Optimized for efficiently reading/retrieving large data sets and for aggregating data. Because it works with such large data sets, an OLAP database is heavy on CPU and disk bandwidth.A data warehouse is designed to handle large analytical queries. This eliminates the performance strain that analytics would place on a transactional system.
Data Organization An OLTP database structure features very complex tables and joins because the data is normalized (it is structured in such a way that no data is duplicated). Making data relational in this way is what delivers storage and processing efficiencies—and allows those sub-second response times. In an OLAP database structure, data is organized specifically to facilitate reporting and analysis, not for quick-hitting transactional needs. The data is denormalized to enhance analytical query response times and provide ease of use for business users. Fewer tables and a simpler structure result in easier reporting and analysis.
Reporting/Analysis Because of the number of table joins, performing analytical queries is very complex. They usually require the expertise of a developer or database administrator familiar with the application.Reporting is typically limited to more static, siloed needs. You can actually get quite a bit of reporting out of today’s EHRs (which run on an OLTP database), but these reports are static,one-time lists in PDF format. For example, you might generate a monthly report of heart failure readmissions or a list of all patients with a central line inserted. These reports are helpful— particularly for real-time reporting for bedside care—but they don’t allow in-depth analysis. With fewer table joins, analytical queries are much easier to perform. This means that semi-technical users (anyone who can write a basic SQL query) can fill their own needs.The possibilities for reporting and analysis are endless. When it comes to analyzing data, a static list is insufficient. There’s an intrinsic need for aggregating, summarizing, and drilling down into the data. A data warehouse enables you to perform many types of analysis:

  • Descriptive (what has happened)
  • Diagnostic (why it happened)
  • Predictive (what will happen)
  • Prescriptive (what to do about it)

This is the level of analytics required to drive real quality and cost improvement in he

Hadoop生态类比

http://hadoop.apache.org/

  • HBase™: A scalable, distributed database that supports structured data storage for large tables.
  • Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.

HIVE

http://hive.apache.org/

The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

HBASE

http://hbase.apache.org/

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

 
 

DataBase vs Data Warehouse的更多相关文章

  1. 场景4 Data Warehouse Management 数据仓库

    场景4 Data Warehouse Management 数据仓库 parallel 4 100% —> 必须获得指定的4个并行度,如果获得的进程个数小于设置的并行度个数,则操作失败 para ...

  2. 对数据集“dsArea”执行查询失败。 (rsErrorExecutingCommand),Query execution failed for dataset 'dsArea'. (rsErrorExecutingCommand),Manually process the TFS data warehouse and analysis services cube

    错误提示: 处理报表时出错. (rsProcessingAborted)对数据集“dsArea”执行查询失败. (rsErrorExecutingCommand)Team System 多维数据集或者 ...

  3. 使用PowerShell在Azure China创建Data Warehouse

    微软的Azure Data Warehouse是基于MPP架构的分布式系统: Control Node负责管理系统和接受用户的请求,Compute Node负责计算. 目前在国内Azure Data ...

  4. 混合 Data Warehouse 和 Big Data 倉庫的新架構

    (讀書筆記)許多公司,儘管想導入 Big Data,仍必須繼續用 Data Warehouse 來管理結構化的營運數據.系統記錄.而 Big Data 的出現,為 Data Warehouse 提供了 ...

  5. Azure SQL Data Warehouse

    Azure SQL Data Warehouse & AWS Redshift Amazon Redshift Amazon Redshift 是一种快速.完全托管的 PB 级数据仓库,可方便 ...

  6. 浅析基于微软SQL Server 2012 Parallel Data Warehouse的大数据解决方案

    作者 王枫发布于2014年2月19日 综述 随着越来越多的组织的数据从GB.TB级迈向PB级,标志着整个社会的信息化水平正在迈入新的时代 – 大数据时代.对海量数据的处理.分析能力,日益成为组织在这个 ...

  7. 转:浅析基于微软SQL Server 2012 Parallel Data Warehouse的大数据解决方案

    综述 随着越来越多的组织的数据从GB.TB级迈向PB级,标志着整个社会的信息化水平正在迈入新的时代 – 大数据时代.对海量数据的处理.分析能力,日益成为组织在这个时代决胜未来的关键因素,而基于大数据的 ...

  8. Data Warehouse

    Knowledge Discovery Process OLTP & OLAP 联机事务处理(OLTP, online transactional processing)系统:涵盖组织机构大部 ...

  9. data warehouse 1.0 vs 2.0

    data warehouse 1.01. EDW goal, separate data marts reqlity2. batch oriented etl3. IT driven BI - das ...

随机推荐

  1. 与非java语言使用RSA加解密遇到的问题:algid parse error, not a sequence

    遇到的问题 在一个与Ruby语言对接的项目中,决定使用RSA算法来作为数据传输的加密与签名算法.但是,在使用Ruby生成后给我的私钥时,却发生了异常:IOException: algid parse ...

  2. vi命令下常用命令

    dd:删除游标所在的一整行(常用)ndd:n为数字.删除光标所在的向下n行,例如20dd则是删除光标所在的向下20行d1G:删除光标所在到第一行的所有数据dG:删除光标所在到最后一行的所有数据d$:删 ...

  3. 【Teradata】使用arcmain进行不落地数据迁移(管道)

    1.备份脚本准备 //脚本bak_ds.arc .logon 192.168.253.222/sysdba,learning1510; archive data tables(DS) ,release ...

  4. 解决connect() failed (111: Connection refused) while connecting to upstream

    使用nginx时, 有可能遇到connect() failed (111: Connection refused) while connecting to upstream的问题. 如果upstrea ...

  5. C#深度学习のLINQ

    一.LINQ的由来 LINQ是Language Integrated Query的缩写,意思是语言扩展查询 查询是一种从数据源检索数据的表达式. 查询通常用专门的查询语言来表示. 随着时间的推移,人们 ...

  6. 洛谷P2243 电路维修

    题目地址 转化为图论问题 对于每个交叉点(X,Y)抽象成节点.与它相邻的四个点中,可以直接连线的边权为0,否则边权为1. 用死了的SPFA解决图论问题. #include <cstring> ...

  7. day15-面向对象基础(二)

    今天整理类的组合以及类的三大特性 1.类的组合 2.类的继承 3.类的封装 4.类的多态 开始今日份整理 1.类的组合 类与类之间,并不是独立的,很多的时候在正常使用的时候都是类与类之间互相调用,所以 ...

  8. C# PDF转Image图片

    概述 PDF是常用的文件格式之一,通常情况下,我们可以使用itextsharp生产PDF文件:可是如何将PDF文件转换成图片那?目前常用的: 思路1.根据PDF绘画轨迹重新绘制图片: 思路2.是将PD ...

  9. ASP.NET Core 配置跨域(CORS)

    1.安装程序CORS程序包 Install-Package Microsoft.AspNetCore.Mvc.Cors 一般默认都带了此程序包的 2.配置CORS服务 在 Startup类,Confi ...

  10. cmd执行超大sql文件

    osql -S 127.0.0.1 -U sa -P 123456 -i d:\test.sql osql为SQL Server的命令,要在cmd中执行该命令,一般安装完SQL Server后该命令对 ...