SLIQ/SPRINT

*/-->

SLIQ/SPRINT

Before SLIQ, most classification alogrithms have the problem that they do not scale. Because these alogrithms have the limit that the traning data should fit in memory. That's why SLIQ was raised.

1 Generic Decision-Tree Classification

Most decision-tree classifiers perform classification in two phases: Tree Building and Tree Pruning.

1.1 Tree Building

MakeTree(Training Data T)
Partition(T); Partition(Data S)
if(all points in S are in the same class) then return;
Evaluate splits for each attribute A
Use the best split found to partition S into S1 and S2
Partition(S1);
Partition(S2);

1.2 Tree Pruning

As we have known, no matter how your preprocess works, there always exist "noise" data or other bad data. So, when we use the traning data to build the decision-tree classification, it also create branches for thos bad data. These branches can lead to errors when classifying test data. Tree pruning is aimed at removing these braches from decision tree by selecting the subtree with the least estimated error rate.

2 Scalability Issues

2.1 Tree Building

As I mentioned, ID3/C4.5/Gini1 is used to evaluate the "goodness" of the alternative splits for an attribute.

2.1.1 Splits for Numeric Attribute

The cost of evaluating splits for a numeric attribute is dominated by the cost of sorting the values. Therefore, an important scalability issue is the reduction of sorting costs for numeric attributes.

2.1.2 Splits for Categorical Attribute

2.2 Tree Pruning

3 SLIQ Classifier

To achieve this pre-sorting, we use the following data structures. We create a separate list for each attribute of the training data. Additionally, a separate list,called class list , is created for the class labels attached to the examples. An entry in an attribute list has two fields: one contains an attribute value, the other anindex into the class list. An entry of the class list also has two fields: one contains a class label, the other a reference to a leaf node of the decision tree. The i th entry of the class list corresponds to the i th example in the training data. Each leaf node of the decision tree represents a partition of the training data, the partition being defined by the conjunction of the predicates on the path from the node to the root. Thus, the class list can at any time identify the partition to which an example belongs. We assume that there is enough memory to keep the class list memory-resident. Attribute lists are written to disk if necessary.

Author: mlhy

Created: 2015-10-08 四 21:29

Emacs 24.5.1 (Org mode 8.2.10)

SLIQ/SPRINT的更多相关文章

  1. TFS 2015 敏捷开发实践 – 在Kanban上运行一个Sprint

    前言:在 上一篇 TFS2015敏捷开发实践 中,我们给大家介绍了TFS2015中看板的基本使用和功能,这一篇中我们来看一个具体的场景,如何使用看板来运行一个sprint.Sprint是Scrum对迭 ...

  2. Sprint计划

    团队: 郭志豪:http://www.cnblogs.com/gzh13692021053/ 杨子健:http://www.cnblogs.com/yzj666/ 刘森松:http://www.cnb ...

  3. 计应152第六组Sprint计划会议

    Sprint计划会议 会议时间:2016年12月8下午16:00 会议地点:宿舍 会议进程 • 首先我们讨论了排球计分规则程序完成需要做的一些工作:程序的初期设计,数据分析,典型用户,场景,代码的编写 ...

  4. HOW TO RUN A SPRINT PLANNING MEETING (THE WAY I LIKE IT)

    This is a sample agenda for a sprint planning meeting. Depending on your context you will have to ch ...

  5. Sprint

    Sprint冲刺 1.选题 <寿司点餐系统> 2.app名 <Sushi> 3.团名 ZEG 4.目标 制作一个成型的人性化的寿司点餐系统,介绍各种寿司的材料做法吃法以及价格, ...

  6. sprint 3 总结

    1.要求: 演示可参考毕业设计答辩,包含两部分内容: 项目陈述,可综述项目.团队.开发过程等. 运行演示,实现的功能.业务.用户反馈等. 希望各组认真准备,拿出最好的阵容最好的状态,展示一学期的学习与 ...

  7. [课程设计]Sprint Three 回顾与总结&发表评论&团队贡献分

    Sprint Three 回顾与总结&发表评论&团队贡献分 ● 一.回顾与总结 (1)回顾 燃尽图: Sprint计划-流程图: milestones完成情况如下: (2)总结 本次冲 ...

  8. TFS二次开发系列:八、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(二)

    上一篇文章我们编写了此例的DTO层,本文将数据访问层封装为逻辑层,提供给界面使用. 1.获取TFS Dto实例,并且可以获取项目集合,以及单独获取某个项目实体 public static TFSSer ...

  9. TFS二次开发系列:七、TFS二次开发的数据统计以PBI、Bug、Sprint等为例(一)

    在TFS二次开发中,我们可能会根据某一些情况对各个项目的PBI.BUG等工作项进行统计.在本文中将大略讲解如果进行这些数据统计. 一:连接TFS服务器,并且得到之后需要使用到的类方法. /// < ...

随机推荐

  1. DLL对应的导入库一定会生成的

    测试代码: #pragma once #define TESTDEPEND_EXPORTS #ifdef TESTDEPEND_EXPORTS #define TESTDEPEND_API __dec ...

  2. js获取指定日期n天之后的日期

    function addDays(date, days,seperator='-') { let oDate = new Date(date).valueOf(); let nDate = oDate ...

  3. VUE 引用公共样式

    首先创建公共样式文件 在main.js中引用样式 浏览器效果图

  4. Python 中如何自动导入缺失的库?

    在写 Python 项目的时候,我们可能经常会遇到导入模块失败的错误:ImportError: No module named 'xxx'或者ModuleNotFoundError: No modul ...

  5. [Algo] 115. Array Deduplication I

    Given a sorted integer array, remove duplicate elements. For each group of elements with the same va ...

  6. 吴裕雄--天生自然 PYTHON3开发学习:元组

    tup1 = ('Google', 'Runoob', 1997, 2000) tup2 = (1, 2, 3, 4, 5, 6, 7 ) print ("tup1[0]: ", ...

  7. android测量的三种模式

    测量模式有三种引用官方的解释如下 UNSPECIFIED The parent has not imposed any constraint on the child. It can be whate ...

  8. win10 python 3.7 pip install tensorflow

    环境: ide:pyCharm 2018.3.2 pyhton3.7 os:win10 64bit 步骤: 1.确认你的python有没有装pip,有则直接跳2.无则cmd到python安装目录下ea ...

  9. AtCoder Beginner Contest 129

    ABCD 签到(A.B.C过水已隐藏) #include<bits/stdc++.h> using namespace std; ; int n,m,ans,f1[N][N],f2[N][ ...

  10. [AC自动机]玄武密码

    题目描述 一个长度为\(N\)的母串,有四个元素分别是:N,S,W,N. 有M个长度为100的模式串. 现在要求每个模式串的前缀与母串匹配最长长度. 输入样例 7 3 SNNSSNS NNSS NNN ...