SLIQ/SPRINT
Before SLIQ, most classification algorithms did not scale, because they required the training data to fit in memory. SLIQ was proposed to remove this limitation.
1 Generic Decision-Tree Classification
Most decision-tree classifiers perform classification in two phases: Tree Building and Tree Pruning.
1.1 Tree Building
- MakeTree(Training Data T)
  - Partition(T);
- Partition(Data S)
  - if (all points in S are in the same class) then return;
  - Evaluate splits for each attribute A
  - Use the best split found to partition S into S1 and S2
  - Partition(S1);
  - Partition(S2);
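The pseudocode above can be sketched in Python roughly as follows. This is only a minimal in-memory illustration, not the SLIQ implementation: the names `errors`, `best_split`, `partition` and `make_tree` are hypothetical, all attributes are assumed to be numeric, and the split measure (number of points a majority-class prediction gets wrong) is a placeholder for whatever "goodness" measure is actually used.

```python
from collections import Counter


def errors(labels):
    # Placeholder split measure (an assumption): points that a
    # majority-class prediction on this subset would misclassify.
    return len(labels) - Counter(labels).most_common(1)[0][1]


def best_split(rows, labels):
    """Evaluate splits of the form `attribute <= threshold` for each attribute."""
    best, best_score = None, float("inf")
    for a in range(len(rows[0])):
        # The largest value is skipped so that both sides are non-empty.
        for t in sorted({r[a] for r in rows})[:-1]:
            s1 = [y for r, y in zip(rows, labels) if r[a] <= t]
            s2 = [y for r, y in zip(rows, labels) if r[a] > t]
            score = errors(s1) + errors(s2)
            if score < best_score:
                best, best_score = (a, t), score
    return best


def partition(rows, labels):
    # All points in S are in the same class: stop.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    split = best_split(rows, labels)
    if split is None:  # identical rows with different labels
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    a, t = split
    s1 = [(r, y) for r, y in zip(rows, labels) if r[a] <= t]
    s2 = [(r, y) for r, y in zip(rows, labels) if r[a] > t]
    return {
        "attr": a,
        "threshold": t,
        "left": partition([r for r, _ in s1], [y for _, y in s1]),
        "right": partition([r for r, _ in s2], [y for _, y in s2]),
    }


def make_tree(rows, labels):
    return partition(rows, labels)


tree = make_tree([(1,), (2,), (3,), (4,)], ["a", "a", "b", "b"])
```

Note that this sketch keeps all of S in memory at every step, which is exactly the limitation that SLIQ addresses.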
1.2 Tree Pruning
No matter how carefully the data is preprocessed, some noisy or otherwise bad data always remains. When the training data is used to build the decision tree, the tree also grows branches for this bad data, and these branches can lead to errors when classifying test data. Tree pruning aims to remove these branches from the decision tree by selecting the subtree with the least estimated error rate.
2 Scalability Issues
2.1 Tree Building
A split-selection measure, such as the information gain of ID3, the gain ratio of C4.5, or the Gini index, is used to evaluate the "goodness" of the alternative splits for an attribute.
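As an illustration of one such measure, here is a minimal sketch of the Gini index. For a set S with class frequencies p_j, gini(S) = 1 - sum of p_j^2, and a binary split of S into S1 and S2 is scored by the size-weighted average of the two parts; lower is better. The function names `gini` and `gini_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of class labels: 1 - sum(p_j ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini index of a binary split; lower means a purer split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A split that separates the classes perfectly scores 0, while a split that leaves both sides as mixed as the original set brings no improvement.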
2.1.1 Splits for Numeric Attribute
The cost of evaluating splits for a numeric attribute is dominated by the cost of sorting the values. Therefore, an important scalability issue is the reduction of sorting costs for numeric attributes.
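To see why sorting dominates, consider this sketch of evaluating every split of the form `value <= threshold` for one numeric attribute. After a single sort, one linear scan with running class counts is enough to score all candidate thresholds. The function name `best_numeric_split`, the use of the Gini index as the measure, and the choice of midpoints between adjacent distinct values as thresholds are assumptions made for the example.

```python
from collections import Counter


def best_numeric_split(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    n = len(values)
    # Sorting is the O(n log n) step; everything after it is a linear scan.
    order = sorted(range(n), key=lambda i: values[i])
    below = Counter()        # class counts for value <= threshold
    above = Counter(labels)  # class counts for value > threshold
    best_threshold, best_score = None, float("inf")
    for pos in range(n - 1):
        i = order[pos]
        below[labels[i]] += 1
        above[labels[i]] -= 1
        current, following = values[i], values[order[pos + 1]]
        if current == following:
            continue  # no threshold can separate equal values
        n_below, n_above = pos + 1, n - pos - 1
        g_below = 1.0 - sum((c / n_below) ** 2 for c in below.values())
        g_above = 1.0 - sum((c / n_above) ** 2 for c in above.values())
        score = (n_below * g_below + n_above * g_above) / n
        if score < best_score:
            best_threshold, best_score = (current + following) / 2, score
    return best_threshold, best_score


result = best_numeric_split([4, 1, 3, 2], ["b", "a", "b", "a"])
```

If the sort had to be repeated at every node of the tree, its cost would be paid again for each partition, which is what makes reducing sorting costs worthwhile.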
2.1.2 Splits for Categorical Attribute
2.2 Tree Pruning
3 SLIQ Classifier
SLIQ sorts each numeric attribute once, at the beginning of tree building, instead of at every node. To achieve this pre-sorting, we use the following data structures. We create a separate list for each attribute of the training data. Additionally, a separate list, called the class list, is created for the class labels attached to the examples. An entry in an attribute list has two fields: one contains an attribute value, the other an index into the class list. An entry of the class list also has two fields: one contains a class label, the other a reference to a leaf node of the decision tree. The i-th entry of the class list corresponds to the i-th example in the training data. Each leaf node of the decision tree represents a partition of the training data, the partition being defined by the conjunction of the predicates on the path from the node to the root. Thus, the class list can at any time identify the partition to which an example belongs. We assume that there is enough memory to keep the class list memory-resident. Attribute lists are written to disk if necessary.
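The data structures described above could be laid out as in the following sketch. It keeps everything in memory for simplicity, whereas in SLIQ the attribute lists may live on disk. The function name `build_lists`, the leaf name "N1", and the sample age/salary values are hypothetical.

```python
def build_lists(rows, labels, root="N1"):
    """Build one pre-sorted attribute list per attribute, plus the class list."""
    # Class list: entry i holds [class label, leaf node] for training example i.
    # Every example starts out in the root, which is the only leaf at first.
    class_list = [[label, root] for label in labels]
    attribute_lists = []
    for a in range(len(rows[0])):
        # Attribute list entry: (attribute value, index into the class list).
        entries = [(row[a], i) for i, row in enumerate(rows)]
        entries.sort(key=lambda entry: entry[0])  # sorted once, up front
        attribute_lists.append(entries)
    return attribute_lists, class_list


# Three examples with attributes (age, salary) and a class label each.
rows = [(30, 65), (23, 15), (40, 75)]
labels = ["G", "B", "G"]
attribute_lists, class_list = build_lists(rows, labels)

# Following the index of the smallest age back to its class label and leaf.
value, index = attribute_lists[0][0]
label, leaf = class_list[index]
```

Because each attribute list entry carries its class-list index, the lists can be sorted independently while still letting a scan recover the label and current leaf of every value.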