I. Introduction to Accumulating Snapshots

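An accumulating snapshot fact table describes a business process through its distinguishable milestone events: the start, the end, and the steps in between. Such a table usually carries one date foreign key for each key step of the process, together with the measures for each step, and those measures typically arrive later than the row itself is created. One row in an accumulating snapshot fact table corresponds to the multiple states of one specific business entity. For example, a row is inserted when an order is created, and when the order's status changes, that row is revisited and modified. This in-place updating of existing rows is unique among the three types of fact tables (transaction, periodic snapshot, accumulating snapshot); the first two types only append data and never update rows that already exist. Besides the date foreign keys associated with each key process step, an accumulating snapshot fact table can also contain foreign keys to other dimensions as well as optional degenerate dimensions.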
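Accumulating snapshot fact tables are widely used in inventory, procurement, sales, e-commerce, and similar business domains. In an e-commerce order, for instance, only the order time is known when the order is placed; a payment time appears when the order is paid, and a shipping time and a completion time follow in the same way. The rest of this article uses the sales order data warehouse as an example to discuss how to implement an accumulating snapshot fact table.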
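Suppose we want to track five sales order milestones: order placed, allocated to a warehouse, packed, shipped, and received, denoted by the statuses N, A, P, S, and R respectively. The dates of these five milestones and their respective quantities come from the sales order table in the source database. The complete life cycle of an order is described by five rows: a sales order record is created when the order is placed; when the ordered product is allocated to a warehouse, a new record stores the allocation time and allocated quantity; when the product is packed, another new record stores the packing time and quantity; similarly, shipping and customer receipt each add a record with their own timestamp and quantity. To keep the example simple, we ignore the case where one status has several records (for example, the products of one order might leave the warehouse in several batches at different times), and we assume that the five milestones occur in strict chronological order.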
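Adding a record for each order status is only one of several possible designs for this scenario. If the milestones are well defined and unlikely to change, you can instead add columns for each status to the source order transaction table, for example eight new columns holding the timestamp and quantity of the four statuses after N (the existing order date and order quantity columns already cover N). The advantage of adding columns is that the order number stays unique and the number of rows stays relatively small. However, this design also needs an extra last_modified column that records when the order was last changed, to drive the Sqoop incremental extraction. Because every order is updated whenever its status changes, the order number column can no longer serve as the comparison basis for change data capture.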
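
As a minimal sketch of the row-per-status design described above, the following Python snippet folds one-row-per-status records into a single snapshot per order. The sample rows, dates, and quantities are hypothetical, and the function name accumulate is an illustration only, not part of the article's implementation.

```python
from datetime import date

# Hypothetical source rows: (order_number, order_status, status_date, quantity).
STATUS_ROWS = [
    (141, "N", date(2017, 6, 2), 50),
    (141, "A", date(2017, 6, 3), 50),
    (141, "P", date(2017, 6, 3), 50),
    (142, "N", date(2017, 6, 2), 20),
]

# The five milestones tracked in this article, in chronological order.
MILESTONES = ["N", "A", "P", "S", "R"]


def accumulate(rows):
    """Fold one-row-per-status records into one snapshot per order."""
    snapshots = {}
    for order_number, status, status_date, quantity in rows:
        # Every order starts with all milestones unset (None).
        snap = snapshots.setdefault(order_number, {s: None for s in MILESTONES})
        snap[status] = (status_date, quantity)
    return snapshots


snapshots = accumulate(STATUS_ROWS)
for order_number, snap in sorted(snapshots.items()):
    print(order_number, snap)
```

Milestones that have not happened yet stay None, which mirrors the null date columns an accumulating snapshot row carries until the corresponding step occurs.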

II. Building the Accumulating Snapshot Tables

1. Modify the source database table structure

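Run the following script to change the structure of the sales order transaction table in the source database so that it can handle the five statuses.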

use source;
-- Modify the sales order transaction table
alter table sales_order
       change order_date status_date datetime,
       add order_status varchar(1) after status_date,
       change order_quantity quantity int;

-- Remove the auto-increment attribute from order_number, then drop the primary key of sales_order
alter table sales_order change order_number order_number int not null;
alter table sales_order drop primary key;

-- Create the new primary key
alter table sales_order add id int unsigned not null auto_increment primary key comment 'primary key' first;

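Notes:

The order_date column is renamed to status_date, because the date no longer means only the order date but the date on which the order entered a given status.
The order_quantity column is renamed to quantity, because the quantity now belongs to a given status.
The order_status column is added after status_date and stores one of the order statuses N, A, P, S, or R. It tells which status the status_date column refers to: if a record's status is N, status_date is the order date; if the status is R, status_date is the receipt date.
Every status produces its own order record, and these records share the same order number, so the order number can no longer be the primary key of the transaction table. The auto-increment attribute and the primary key constraint on order_number therefore have to be removed.
A new id column is added as the primary key of the sales order table; it is the first column of the table.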
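
The effect of these changes can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL source database, with a simplified table that keeps only a few of the real columns; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for source.sales_order after the change (SQLite, not MySQL).
conn.execute(
    """
    create table sales_order (
        id integer primary key autoincrement,
        order_number int not null,
        status_date text,
        order_status text,
        quantity int
    )
    """
)
# Two status rows of the same order: order_number repeats, id does not.
rows = [
    (141, "2017-06-02 08:00:00", "N", 50),
    (141, "2017-06-03 09:00:00", "A", 50),
]
conn.executemany(
    "insert into sales_order (order_number, status_date, order_status, quantity) "
    "values (?, ?, ?, ?)",
    rows,
)
ids = [r[0] for r in conn.execute("select id from sales_order order by id")]
order_numbers = [
    r[0] for r in conn.execute("select order_number from sales_order order by id")
]
print(ids, order_numbers)
```

The surrogate id keeps increasing with every inserted status row, while order_number repeats, which is why id rather than order_number is later used as the incremental check column.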

2. Rebuild the sales order external table

Run the following statements to rebuild the sales order external table so that it matches the structure of the source table.

set search_path=ext;
drop external table sales_order;
create external table sales_order
(
  id bigint,
  order_number int,
  customer_number int,
  product_code int,
  verification_ind char(1),
  credit_check_flag char(1),
  new_customer_ind char(1),
  web_order_flag char(1),
  status_date timestamp,
  order_status char(1),
  request_delivery_date timestamp,
  entry_date timestamp,
  order_amount decimal(10 , 2 ),
  quantity int
)
location ('pxf://mycluster/data/ext/sales_order?profile=hdfstextsimple')
  format 'text' (delimiter=e',', null='null');       

comment on table sales_order is 'Sales order external table';
comment on column sales_order.id is 'Business primary key';
comment on column sales_order.order_number is 'Order number';
comment on column sales_order.customer_number is 'Customer number';
comment on column sales_order.product_code is 'Product code';
comment on column sales_order.verification_ind is 'Verification flag';
comment on column sales_order.credit_check_flag is 'Credit check flag';
comment on column sales_order.new_customer_ind is 'Customer first order flag';
comment on column sales_order.web_order_flag is 'Online order flag';
comment on column sales_order.status_date is 'Status date';
comment on column sales_order.order_status is 'Order status';
comment on column sales_order.request_delivery_date is 'Requested delivery date';
comment on column sales_order.entry_date is 'Entry date';
comment on column sales_order.order_amount is 'Sales amount';
comment on column sales_order.quantity is 'Quantity';
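
To make the text format options of the external table concrete (comma delimiter, the literal string null standing for SQL NULL), here is a minimal Python sketch that parses one line into the column layout defined above. The sample line is hypothetical, and parse_line is an illustrative helper, not something HAWQ or PXF provides.

```python
# Column order assumed to match the ext.sales_order definition above.
COLUMNS = [
    "id", "order_number", "customer_number", "product_code",
    "verification_ind", "credit_check_flag", "new_customer_ind",
    "web_order_flag", "status_date", "order_status",
    "request_delivery_date", "entry_date", "order_amount", "quantity",
]


def parse_line(line, delimiter=",", null_marker="null"):
    """Split one text line and map the null marker to None."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return {
        name: (None if value == null_marker else value)
        for name, value in zip(COLUMNS, values)
    }


# Hypothetical line of the kind an import might write to /data/ext/sales_order.
sample = ("143,141,1,1,y,y,y,y,2017-06-03 09:15:00,A,"
          "2017-06-08 10:00:00,2017-06-03 09:15:00,5000.00,null")
record = parse_line(sample)
print(record["order_status"], record["quantity"])
```

The sketch does no type conversion; it only shows how the position of a field determines its column and how the null marker is interpreted.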

3. Modify the sales order raw data store table

set search_path=rds;
alter table sales_order rename order_date to status_date;
alter table sales_order rename order_quantity to quantity;
alter table sales_order add column order_status char(1) default null;

comment on column sales_order.status_date is 'Status date';
comment on column sales_order.quantity is 'Quantity';
comment on column sales_order.order_status is 'Order status';

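Notes:

The order_date and order_quantity columns of the sales order raw data table are renamed to match the source table.
An order status column is added.
No id column is added to rds.sales_order, for two reasons: the column serves only as the incremental check column and does not need to be stored in the raw data table, and the existing data does not have to be re-imported.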

4. Modify the sales order fact table

set search_path=tds;
alter table sales_order_fact rename order_date_sk to status_date_sk;
alter table sales_order_fact rename order_quantity to quantity;
alter table sales_order_fact add column order_status char(1) default null;

comment on column sales_order_fact.status_date_sk is 'Status date foreign key';
comment on column sales_order_fact.quantity is 'Quantity';
comment on column sales_order_fact.order_status is 'Order status';

create view v_sales_order_fact as
select order_number,
       customer_sk,
       product_sk,
       year_month,
       order_amount,
       request_delivery_date_sk,
       sales_order_attribute_sk,
       customer_zip_code_sk,
       shipping_zip_code_sk,
       max(case order_status when 'N' then status_date_sk else null end) nd,
       max(case order_status when 'N' then quantity else null end) nq,
       max(case order_status when 'A' then status_date_sk else null end) ad,
       max(case order_status when 'A' then quantity else null end) aq,
       max(case order_status when 'P' then status_date_sk else null end) pd,
       max(case order_status when 'P' then quantity else null end) pq,
       max(case order_status when 'S' then status_date_sk else null end) sd,
       max(case order_status when 'S' then quantity else null end) sq,
       max(case order_status when 'R' then status_date_sk else null end) rd,
       max(case order_status when 'R' then quantity else null end) rq
  from sales_order_fact
 group by order_number,
          customer_sk,
          product_sk,
          year_month,
          order_amount,
          request_delivery_date_sk,
          sales_order_attribute_sk,
          customer_zip_code_sk,
          shipping_zip_code_sk;

-- Create four date dimension views
create view v_allocate_date_dim
(allocate_date_sk, allocate_date, month, month_name, quarter, year)
as
select * from date_dim ;  

create view v_packing_date_dim
(packing_date_sk, packing_date, month, month_name, quarter, year)
as
select * from date_dim ;  

create view v_ship_date_dim
(ship_date_sk, ship_date, month, month_name, quarter, year)
as
select * from date_dim ;  

create view v_receive_date_dim
(receive_date_sk, receive_date, month, month_name, quarter, year)
as
select * from date_dim ;

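Notes:

The structural changes to the sales order fact table are similar to those made to rds.sales_order.
A new view, v_sales_order_fact, pivots the five statuses and their quantities from rows into columns.
Four role-playing date dimension views are created to look up the dates behind the date surrogate keys of the corresponding statuses.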
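
The row-to-column pivot performed by v_sales_order_fact can be tried out in isolation. The sketch below runs the same max(case ...) pattern against an in-memory SQLite table from Python; the table is reduced to four columns and two statuses, and the surrogate key values and quantities are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales_order_fact "
    "(order_number int, order_status text, status_date_sk int, quantity int)"
)
# Hypothetical surrogate keys and quantities.
conn.executemany(
    "insert into sales_order_fact values (?, ?, ?, ?)",
    [
        (141, "N", 6363, 50),
        (141, "A", 6364, 50),
        (142, "N", 6363, 20),
    ],
)
# Same pivot idea as the view: one output row per order, one column pair per status.
pivot_sql = """
select order_number,
       max(case order_status when 'N' then status_date_sk end) as nd,
       max(case order_status when 'N' then quantity end)       as nq,
       max(case order_status when 'A' then status_date_sk end) as ad,
       max(case order_status when 'A' then quantity end)       as aq
  from sales_order_fact
 group by order_number
 order by order_number
"""
pivoted = conn.execute(pivot_sql).fetchall()
for row in pivoted:
    print(row)
```

Because max ignores nulls, each column picks up the single non-null value from the row with the matching status, and statuses that have not occurred yet come out as null.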

III. Rebuilding the Sqoop Incremental Extraction Job

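Use the following script to rebuild the Sqoop job. Because the source table now holds several rows with the same order_number, that column can no longer serve as the check column, so the check column is changed to id.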

last_value=`sqoop job --show myjob_incremental_import | grep incremental.last.value | awk '{print $3}'`
sqoop job --delete myjob_incremental_import
sqoop job --create myjob_incremental_import -- import --connect "jdbc:mysql://172.16.1.127:3306/source?useSSL=false&user=dwtest&password=123456" --table sales_order --target-dir /data/ext/sales_order --compress --where "entry_date < current_date()" --incremental append --check-column id --last-value $last_value
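
The reason for switching the check column can be sketched in a few lines of Python. This is only a model of append-mode incremental extraction (take rows whose check column is greater than the saved last value), not Sqoop itself; the function name and sample rows are hypothetical.

```python
def incremental_append(source_rows, last_value, check_column="id"):
    """Return rows whose check column exceeds last_value, plus the new last value."""
    new_rows = [row for row in source_rows if row[check_column] > last_value]
    if new_rows:
        last_value = max(row[check_column] for row in new_rows)
    return new_rows, last_value


# Hypothetical source rows: the third row is a new status of an existing order.
source = [
    {"id": 141, "order_number": 141, "order_status": "N"},
    {"id": 142, "order_number": 142, "order_status": "N"},
    {"id": 143, "order_number": 141, "order_status": "A"},
]

# Checking on id picks up the new status row.
batch, last_value = incremental_append(source, last_value=142)
print(batch, last_value)

# Checking on order_number would miss it, because 141 is not greater than 142.
missed, _ = incremental_append(source, last_value=142, check_column="order_number")
print(missed)
```

With order_number as the check column, later status rows of older orders would never be extracted, whereas the monotonically increasing id captures every new row.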

IV. Modifying the Regular Data Load Function

create or replace function fn_regular_load ()
returns void as
$$
declare
    -- Set the SCD effective date
    v_cur_date date := current_date;
    v_pre_date date := current_date - 1;
    v_last_load date;
begin
    -- Analyze the external tables
    analyze ext.customer;
    analyze ext.product;
    analyze ext.sales_order;              

    -- Load the external table data into the raw data tables
    truncate table rds.customer;
    truncate table rds.product;               

    insert into rds.customer select * from ext.customer;
    insert into rds.product select * from ext.product;
    insert into rds.sales_order
    select order_number,
           customer_number,
           product_code,
           status_date,
           entry_date,
           order_amount,
           quantity,
           request_delivery_date,
           verification_ind,
           credit_check_flag,
           new_customer_ind,
           web_order_flag,
           order_status
      from ext.sales_order;              

    -- Analyze the tables in the rds schema
    analyze rds.customer;
    analyze rds.product;
    analyze rds.sales_order;              

    -- Set the upper bound of the CDC time window
    select last_load into v_last_load from rds.cdc_time;
    truncate table rds.cdc_time;
    insert into rds.cdc_time select v_last_load, v_cur_date;              

    -- Load the customer dimension
    insert into tds.customer_dim
    (customer_number,
     customer_name,
     customer_street_address,
     shipping_address,
     isdelete,
     version,
     effective_date)
    select case flag
                when 'D' then a_customer_number
                else b_customer_number
            end customer_number,
           case flag
                when 'D' then a_customer_name
                else b_customer_name
            end customer_name,
           case flag
                when 'D' then a_customer_street_address
                else b_customer_street_address
            end customer_street_address,
           case flag
                when 'D' then a_shipping_address
                else b_shipping_address
            end shipping_address,
           case flag
                when 'D' then true
                else false
            end isdelete,
           case flag
                when 'D' then a_version
                when 'I' then 1
                else a_version + 1
            end v,
           v_pre_date
      from (select a.customer_number a_customer_number,
                   a.customer_name a_customer_name,
                   a.customer_street_address a_customer_street_address,
                   a.shipping_address a_shipping_address,
                   a.version a_version,
                   b.customer_number b_customer_number,
                   b.customer_name b_customer_name,
                   b.customer_street_address b_customer_street_address,
                   b.shipping_address b_shipping_address,
                   case when a.customer_number is null then 'I'
                        when b.customer_number is null then 'D'
                        else 'U'
                    end flag
              from v_customer_dim_latest a
              full join rds.customer b on a.customer_number = b.customer_number
             where a.customer_number is null -- inserted
                or b.customer_number is null -- deleted
                or (a.customer_number = b.customer_number
                    and not
                           (coalesce(a.customer_name,'') = coalesce(b.customer_name,'')
                        and coalesce(a.customer_street_address,'') = coalesce(b.customer_street_address,'')
                        and coalesce(a.shipping_address,'') = coalesce(b.shipping_address,'')
                        ))) t
             order by coalesce(a_customer_number, 999999999999), b_customer_number limit 999999999999;              

    -- Load the product dimension
    insert into tds.product_dim
    (product_code,
     product_name,
     product_category,
     isdelete,
     version,
     effective_date)
    select case flag
                when 'D' then a_product_code
                else b_product_code
            end product_code,
           case flag
                when 'D' then a_product_name
                else b_product_name
            end product_name,
           case flag
                when 'D' then a_product_category
                else b_product_category
            end product_category,
           case flag
                when 'D' then true
                else false
            end isdelete,
           case flag
                when 'D' then a_version
                when 'I' then 1
                else a_version + 1
            end v,
           v_pre_date
      from (select a.product_code a_product_code,
                   a.product_name a_product_name,
                   a.product_category a_product_category,
                   a.version a_version,
                   b.product_code b_product_code,
                   b.product_name b_product_name,
                   b.product_category b_product_category,
                   case when a.product_code is null then 'I'
                        when b.product_code is null then 'D'
                        else 'U'
                    end flag
              from v_product_dim_latest a
              full join rds.product b on a.product_code = b.product_code
             where a.product_code is null -- inserted
                or b.product_code is null -- deleted
                or (a.product_code = b.product_code
                    and not
                           (a.product_name = b.product_name
                        and a.product_category = b.product_category))) t
             order by coalesce(a_product_code, 999999999999), b_product_code limit 999999999999;              

    -- Load the sales order fact table
    insert into sales_order_fact
    select a.order_number,
           customer_sk,
           product_sk,
           e.date_sk,
           e.year * 100 + e.month,
           order_amount,
           quantity,
           f.date_sk,
           g.sales_order_attribute_sk,
           h.customer_zip_code_sk,
           i.shipping_zip_code_sk,
           a.order_status
      from rds.sales_order a,
           v_customer_dim_his c,
           v_product_dim_his d,
           date_dim e,
           date_dim f,
           sales_order_attribute_dim g,
           v_customer_zip_code_dim h,
           v_shipping_zip_code_dim i,
           rds.customer j,
           rds.cdc_time k
     where a.customer_number = c.customer_number
       and a.status_date >= c.effective_date
       and a.status_date < c.expiry_date
       and a.product_code = d.product_code
       and a.status_date >= d.effective_date
       and a.status_date < d.expiry_date
       and date(a.status_date) = e.date
       and date(a.request_delivery_date) = f.date
       and a.verification_ind = g.verification_ind
       and a.credit_check_flag = g.credit_check_flag
       and a.new_customer_ind = g.new_customer_ind
       and a.web_order_flag = g.web_order_flag
       and a.customer_number = j.customer_number
       and j.customer_zip_code = h.customer_zip_code
       and j.shipping_zip_code = i.shipping_zip_code
       and a.entry_date >= k.last_load and a.entry_date < k.current_load;                            

    -- Reload the PA customer dimension
    truncate table pa_customer_dim;
    insert into pa_customer_dim
    select distinct a.*
      from customer_dim a,
           sales_order_fact b,
           v_customer_zip_code_dim c
     where c.customer_state = 'pa'
       and b.customer_zip_code_sk = c.customer_zip_code_sk
       and a.customer_sk = b.customer_sk;    

    -- Analyze the tables in the tds schema
    analyze customer_dim;
    analyze product_dim;
    analyze sales_order_fact;
    analyze pa_customer_dim;      

    -- Update the last_load column of the timestamp table
    truncate table rds.cdc_time;
    insert into rds.cdc_time select v_cur_date, v_cur_date;              

end;
$$
language plpgsql;

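The column names used in the regular data load have to be changed accordingly. When loading the transaction fact table, the condition entry_date >= last_load and entry_date < current_load alone is enough to select all newly entered order rows, whatever their status, because the row for each status has its own entry time.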
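HAWQ cannot update existing table data, so the load only appends rows, and a view then turns them into the fixed status-column layout. Note that the accumulating snapshot view in this example still uses the order number as its logical primary key.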
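
The time window filter is a half-open interval, which a short Python sketch can make explicit. The function name in_window and the sample rows are hypothetical; the comparison mirrors the entry_date condition described above, with the date bounds treated as midnight timestamps.

```python
from datetime import date, datetime


def in_window(entry_date, last_load, current_load):
    """Half-open window: last_load <= entry_date < current_load."""
    start = datetime.combine(last_load, datetime.min.time())
    end = datetime.combine(current_load, datetime.min.time())
    return start <= entry_date < end


# Hypothetical status rows of one order: (order_status, entry_date).
rows = [
    ("N", datetime(2017, 6, 2, 9, 30)),
    ("A", datetime(2017, 6, 3, 8, 0)),
    ("P", datetime(2017, 6, 4, 0, 0)),
]
selected = [
    status
    for status, entry in rows
    if in_window(entry, date(2017, 6, 3), date(2017, 6, 4))
]
print(selected)
```

A row entered exactly at the upper bound falls into the next load, so consecutive windows neither overlap nor leave gaps.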

V. Testing

Add two new sales order records to the sales order transaction table in the source database.

use source;

set @order_date := from_unixtime(unix_timestamp('2017-06-02 00:00:01') + rand() * (unix_timestamp('2017-06-02 12:00:00') - unix_timestamp('2017-06-02 00:00:01')));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);      

insert into source.sales_order values
  (null, 141, 1, 1, 'y', 'y', 'y', 'y',  @order_date, 'N', @request_delivery_date,
        @order_date, @amount, @quantity);    

set @order_date := from_unixtime(unix_timestamp('2017-06-02 12:00:00') + rand() * (unix_timestamp('2017-06-03 00:00:00') - unix_timestamp('2017-06-02 12:00:00')));
set @request_delivery_date := from_unixtime(unix_timestamp(date_add(current_date, interval 5 day)) + rand() * 86400);
set @amount := floor(1000 + rand() * 9000);
set @quantity := floor(10 + rand() * 90);      

insert into source.sales_order values
  (null, 142, 2, 2, 'y', 'y', 'y', 'y', @order_date, 'N', @request_delivery_date,
       @order_date, @amount, @quantity);    

commit;

Set the time window.

truncate table rds.cdc_time;
insert into rds.cdc_time select date '2017-06-02', date '2017-06-02';

Run the regular load script.

~/regular_etl.sh

Query the two sales orders in v_sales_order_fact to confirm that the regular load succeeded.

select a.order_number,
       c.order_date,
       d.allocate_date,
       e.packing_date,
       f.ship_date,
       g.receive_date
  from v_sales_order_fact a
  left join v_order_date_dim c on a.nd = c.order_date_sk
  left join v_allocate_date_dim d on a.ad = d.allocate_date_sk
  left join v_packing_date_dim e on a.pd = e.packing_date_sk
  left join v_ship_date_dim f on a.sd = f.ship_date_sk
  left join v_receive_date_dim g on a.rd = g.receive_date_sk
 where a.order_number > 140
 order by order_number;

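The query result is shown in Figure 1. Only the order_date column has a value and all other dates are null, because these two orders are new and have not yet been allocated, packed, shipped, or received.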

Figure 1

Add sales order rows that serve as the warehouse allocation and/or packing milestones of these two orders.

use source;    

set @order_date := from_unixtime(unix_timestamp('2017-06-03 00:00:00') + rand() * (unix_timestamp('2017-06-03 12:00:00') - unix_timestamp('2017-06-03 00:00:00')));
insert into sales_order
select null,
       order_number,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       @order_date,
       'A',
       request_delivery_date,
       @order_date,
       order_amount,
       quantity
  from sales_order
 where order_number = 141;

set @order_date := from_unixtime(unix_timestamp('2017-06-03 12:00:00') + rand() * (unix_timestamp('2017-06-04 00:00:00') - unix_timestamp('2017-06-03 12:00:00')));
insert into sales_order
select null,
       order_number,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       @order_date,
       'P',
       request_delivery_date,
       @order_date,
       order_amount,
       quantity
  from sales_order
 where order_number = 141
 order by id desc
 limit 1;

set @order_date := from_unixtime(unix_timestamp('2017-06-03 12:00:00') + rand() * (unix_timestamp('2017-06-04 00:00:00') - unix_timestamp('2017-06-03 12:00:00')));
insert into sales_order
select null,
       order_number,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       @order_date,
       'A',
       request_delivery_date,
       @order_date,
       order_amount,
       quantity
  from sales_order
 where order_number = 142;

commit;

Set the time window.

truncate table rds.cdc_time;
insert into rds.cdc_time select date '2017-06-03', date '2017-06-03';

Run the regular load script.

~/regular_etl.sh

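Query the two sales orders in the v_sales_order_fact view to confirm that the regular load succeeded. The query result is shown in Figure 2. The first order now has an allocate_date and a packing_date, while the second has only an allocate_date.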

Figure 2

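Add sales order rows for the later milestones of these two orders: packing, shipping, and/or receipt. Note that the four dates may be the same.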

use source;    

set @order_date := from_unixtime(unix_timestamp('2017-06-04 00:00:00') + rand() * (unix_timestamp('2017-06-04 12:00:00') - unix_timestamp('2017-06-04 00:00:00')));
insert into sales_order
select null,
       order_number,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       @order_date,
       'S',
       request_delivery_date,
       @order_date,
       order_amount,
       quantity
  from sales_order
 where order_number = 141
 order by id desc
 limit 1;

set @order_date := from_unixtime(unix_timestamp('2017-06-04 12:00:00') + rand() * (unix_timestamp('2017-06-05 00:00:00') - unix_timestamp('2017-06-04 12:00:00')));
insert into sales_order
select null,
       order_number,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       @order_date,
       'R',
       request_delivery_date,
       @order_date,
       order_amount,
       quantity
  from sales_order
 where order_number = 141
 order by id desc
 limit 1;

set @order_date := from_unixtime(unix_timestamp('2017-06-04 12:00:00') + rand() * (unix_timestamp('2017-06-05 00:00:00') - unix_timestamp('2017-06-04 12:00:00')));
insert into sales_order
select null,
       order_number,
       customer_number,
       product_code,
       verification_ind,
       credit_check_flag,
       new_customer_ind,
       web_order_flag,
       @order_date,
       'P',
       request_delivery_date,
       @order_date,
       order_amount,
       quantity
  from sales_order
 where order_number = 142
 order by id desc
 limit 1;

commit;

Set the time window.

truncate table rds.cdc_time;
insert into rds.cdc_time select date '2017-06-04', date '2017-06-04';

Run the regular load script.

~/regular_etl.sh

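Query the two sales orders in the v_sales_order_fact view to confirm that the regular load succeeded. The query result is shown in Figure 3. The first order, number 141, now has all of its dates, which means the order is complete (the customer has received it). The second order has been packed but not yet shipped.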

Figure 3
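
Reading an order's progress off a pivoted row, as done for the two test orders above, can be sketched as follows. The column names nd, ad, pd, sd, and rd come from v_sales_order_fact; the helper functions and the surrogate key values are hypothetical, and the sketch relies on the article's assumption that milestones occur in strict order.

```python
# View column for each milestone, in chronological order.
MILESTONES = [("nd", "N"), ("ad", "A"), ("pd", "P"), ("sd", "S"), ("rd", "R")]


def latest_status(snapshot_row):
    """Return the code of the last milestone whose date key is filled in."""
    status = None
    for column, code in MILESTONES:
        if snapshot_row.get(column) is not None:
            status = code
    return status


def is_complete(snapshot_row):
    """An order is complete once the receipt date key is present."""
    return snapshot_row.get("rd") is not None


# Hypothetical date surrogate keys; None stands for SQL NULL.
order_141 = {"nd": 6363, "ad": 6364, "pd": 6364, "sd": 6365, "rd": 6365}
order_142 = {"nd": 6363, "ad": 6364, "pd": 6365, "sd": None, "rd": None}
print(latest_status(order_141), is_complete(order_141))
print(latest_status(order_142), is_complete(order_142))
```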

VI. Modifying the Periodic Snapshot Load Function

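The accumulating snapshot turns the former single quantity, order_quantity, into one quantity per status, so the periodic snapshot load function fn_month_sum has to be modified. The function summarizes order amounts and quantities at month end, and the quantity has to be redefined; here we assume that the quantity of newly placed orders is what should be counted. The modified function is as follows.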

create or replace function tds.fn_month_sum(p_year_month int)   
returns void as   
$$  
declare      
    sqlstring varchar(1000);     
begin  
    -- Idempotent operation: first delete the previous month's data
    sqlstring := 'truncate table month_end_sales_order_fact_1_prt_p' || cast(p_year_month as varchar);  
    execute sqlstring;  
  
    -- Insert the previous month's sales summary data
    insert into month_end_sales_order_fact    
    select t1.year_month,   
           t2.product_sk,   
           coalesce(t2.month_order_amount,0),   
           coalesce(t2.month_order_quantity,0)   
      from (select p_year_month year_month) t1   
      left join (select year_month, product_sk, sum(order_amount) month_order_amount, sum(quantity) month_order_quantity  
                   from sales_order_fact   
                  where year_month = p_year_month and coalesce(order_status,'N') = 'N' 
                  group by year_month,product_sk) t2   
           on t1.year_month = t2.year_month;  
   
end;  
$$      
language plpgsql;
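
Why the status filter matters can be checked with a small experiment. The sketch below uses an in-memory SQLite table from Python with a reduced set of columns; all rows are hypothetical, including one row with a null order_status that stands for data loaded before the column was added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales_order_fact "
    "(order_number int, product_sk int, year_month int, "
    " order_amount real, quantity int, order_status text)"
)
# Hypothetical rows: order 141 appears once per status with the same amount.
conn.executemany(
    "insert into sales_order_fact values (?, ?, ?, ?, ?, ?)",
    [
        (141, 1, 201706, 5000.0, 50, "N"),
        (141, 1, 201706, 5000.0, 50, "A"),
        (141, 1, 201706, 5000.0, 50, "P"),
        (100, 1, 201706, 1000.0, 10, None),  # loaded before order_status existed
    ],
)
# Summing every status row counts order 141 three times.
naive = conn.execute(
    "select sum(order_amount), sum(quantity) from sales_order_fact "
    "where year_month = 201706"
).fetchone()
# Restricting to status N (treating null as N) counts each order once.
new_only = conn.execute(
    "select sum(order_amount), sum(quantity) from sales_order_fact "
    "where year_month = 201706 and coalesce(order_status, 'N') = 'N'"
).fetchone()
print(naive, new_only)
```

The coalesce keeps rows that were loaded before the order_status column existed, which otherwise would drop out of the monthly totals.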
