数据挖掘中ID3算法实现zz
id3 function D = ID3(train_features, train_targets, params, region) % Classify using Quinlan's ID3 algorithm
% Inputs:
% features - Train features
% targets - Train targets
% params - [Number of bins for the data, Percentage of incorrectly assigned samples at a node]
% region - Decision region vector: [-x x -y y number_of_points]
%
% Outputs
% D - Decision sufrace [Ni, M] = size(train_features); %Get parameters
[Nbins, inc_node] = process_params(params);
inc_node = inc_node*M/100; %For the decision region
N = region(5);
mx = ones(N,1) * linspace (region(1),region(2),N);
my = linspace (region(3),region(4),N)' * ones(1,N);
flatxy = [mx(:), my(:)]'; %Preprocessing
[f, t, UW, m] = PCA(train_features, train_targets, Ni, region);
train_features = UW * (train_features - m*ones(1,M));;
flatxy = UW * (flatxy - m*ones(1,N^2));; %First, bin the data and the decision region data
[H, binned_features]= high_histogram(train_features, Nbins, region);
[H, binned_xy] = high_histogram(flatxy, Nbins, region); %Build the tree recursively
disp('Building tree')
tree = make_tree(binned_features, train_targets, inc_node, Nbins); %Make the decision region according to the tree
disp('Building decision surface using the tree')
targets = use_tree(binned_xy, 1:N^2, tree, Nbins, unique(train_targets)); D = reshape(targets,N,N);
%END function targets = use_tree(features, indices, tree, Nbins, Uc)
%Classify recursively using a tree targets = zeros(1, size(features,2)); if (size(features,1) == 1),
%Only one dimension left, so work on it
for i = 1:Nbins,
in = indices(find(features(indices) == i));
if ~isempty(in),
if isfinite(tree.child(i)),
targets(in) = tree.child(i);
else
%No data was found in the training set for this bin, so choose it randomally
n = 1 + floor(rand(1)*length(Uc));
targets(in) = Uc(n);
end
end
end
break
end %This is not the last level of the tree, so:
%First, find the dimension we are to work on
dim = tree.split_dim;
dims= find(~ismember(1:size(features,1), dim)); %And classify according to it
for i = 1:Nbins,
in = indices(find(features(dim, indices) == i));
targets = targets + use_tree(features(dims, :), in, tree.child(i), Nbins, Uc);
end %END use_tree function tree = make_tree(features, targets, inc_node, Nbins)
%Build a tree recursively [Ni, L] = size(features);
Uc = unique(targets); %When to stop: If the dimension is one or the number of examples is small
if ((Ni == 1) | (inc_node > L)),
%Compute the children non-recursively
for i = 1:Nbins,
tree.split_dim = 0;
indices = find(features == i);
if ~isempty(indices),
if (length(unique(targets(indices))) == 1),
tree.child(i) = targets(indices(1));
else
H = hist(targets(indices), Uc);
[m, T] = max(H);
tree.child(i) = Uc(T);
end
else
tree.child(i) = inf;
end
end
break
end %Compute the node's I
for i = 1:Ni,
Pnode(i) = length(find(targets == Uc(i))) / L;
end
Inode = -sum(Pnode.*log(Pnode)/log(2)); %For each dimension, compute the gain ratio impurity
delta_Ib = zeros(1, Ni);
P = zeros(length(Uc), Nbins);
for i = 1:Ni,
for j = 1:length(Uc),
for k = 1:Nbins,
indices = find((targets == Uc(j)) & (features(i,:) == k));
P(j,k) = length(indices);
end
end
Pk = sum(P);
P = P/L;
Pk = Pk/sum(Pk);
info = sum(-P.*log(eps+P)/log(2));
delta_Ib(i) = (Inode-sum(Pk.*info))/-sum(Pk.*log(eps+Pk)/log(2));
end %Find the dimension minimizing delta_Ib
[m, dim] = max(delta_Ib); %Split along the 'dim' dimension
tree.split_dim = dim;
dims = find(~ismember(1:Ni, dim));
for i = 1:Nbins,
indices = find(features(dim, :) == i);
tree.child(i) = make_tree(features(dims, indices), targets(indices), inc_node, Nbins);
end
数据挖掘中ID3算法实现zz的更多相关文章
- 数据挖掘之决策树ID3算法(C#实现)
决策树是一种非常经典的分类器,它的作用原理有点类似于我们玩的猜谜游戏.比如猜一个动物: 问:这个动物是陆生动物吗? 答:是的. 问:这个动物有鳃吗? 答:没有. 这样的两个问题顺序就有些颠倒,因为一般 ...
- 决策树-预测隐形眼镜类型 (ID3算法,C4.5算法,CART算法,GINI指数,剪枝,随机森林)
1. 1.问题的引入 2.一个实例 3.基本概念 4.ID3 5.C4.5 6.CART 7.随机森林 2. 我们应该设计什么的算法,使得计算机对贷款申请人员的申请信息自动进行分类,以决定能否贷款? ...
- Python实现ID3算法
自己用Python写的数据挖掘中的ID3算法,现在觉得Python是实现算法的最好工具: 先贴出ID3算法的介绍地址http://wenku.baidu.com/view/cddddaed0975f4 ...
- 机器学习之决策树(ID3)算法与Python实现
机器学习之决策树(ID3)算法与Python实现 机器学习中,决策树是一个预测模型:他代表的是对象属性与对象值之间的一种映射关系.树中每个节点表示某个对象,而每个分叉路径则代表的某个可能的属性值,而每 ...
- 决策树 -- ID3算法小结
ID3算法(Iterative Dichotomiser 3 迭代二叉树3代),是一个由Ross Quinlan发明的用于决策树的算法:简单理论是越是小型的决策树越优于大的决策树. 算法归 ...
- 机器学习笔记----- ID3算法的python实战
本文申明:本文原创,如有转载请申明.数据代码来自实验数据都是来自[美]Peter Harrington 写的<Machine Learning in Action>这本书,侵删. Hell ...
- 决策树笔记:使用ID3算法
决策树笔记:使用ID3算法 决策树笔记:使用ID3算法 机器学习 先说一个偶然的想法:同样的一堆节点构成的二叉树,平衡树和非平衡树的区别,可以认为是"是否按照重要度逐渐降低"的顺序 ...
- paper 56 :机器学习中的算法:决策树模型组合之随机森林(Random Forest)
周五的组会如约而至,讨论了一个比较感兴趣的话题,就是使用SVM和随机森林来训练图像,这样的目的就是 在图像特征之间建立内在的联系,这个model的训练,着实需要好好的研究一下,下面是我们需要准备的入门 ...
- ID3算法 决策树的生成(2)
# coding:utf-8 import matplotlib.pyplot as plt import numpy as np import pylab def createDataSet(): ...
随机推荐
- Packer 基本试用
安装 使用mac 系统 https://www.packer.io/downloads.html 配置环境变量 可选 sudo nano ~/.bash_profile export PATH=$PA ...
- Install LAMP Server (Apache, MariaDB, PHP) On CentOS/RHEL/Scientific Linux 7
Install LAMP Server (Apache, MariaDB, PHP) On CentOS/RHEL/Scientific Linux 7 By SK - August 12, 201 ...
- [LeetCode系列] 最长回文子串问题
给定字符串S, 找到其子串中最长的回文字符串. 反转法: 反转S为S', 找到其中的最长公共子串s, 并确认子串s在S中的下标iS与在S'中的下标iS'是否满足式: length(S) = iS ...
- Spring AOP 不同配置方式产生的冲突问题
Spring AOP的原理是 JDK 动态代理和CGLIB字节码增强技术,前者需要被代理类实现相应接口,也只有接口中的方法可以被JDK动态代理技术所处理:后者实际上是生成一个子类,来覆盖被代理类,那么 ...
- 第十届蓝桥杯 试题 E: 迷宫
试题 E: 迷宫 本题总分:15 分 [问题描述] 下图给出了一个迷宫的平面图,其中标记为 1 的为障碍,标记为 0 的为可 以通行的地方. 010000 000100 001001 110000 迷 ...
- ThinkJava-File类
1.1目录列表器: package com.java.io; import java.io.File; import java.io.FilenameFilter; import java.util. ...
- C++ 构造函数_内存分区_对象初始化
内存分区 栈区:int x = 0:int *p = NULL; 定义一个变量,定义一个指针时,会在栈区进行分配内存.分配的内存系统分配收回的,我们不用管. 堆区:int *p = new i ...
- 管理Linux服务器的用户和组
管理Linux服务器的用户和组 Linux操作系统是一个多用户多任务的操作系统,允许多个用户同时登录到系统,使用系统资源. 为了使所有用户的工作顺利进行,保护每个用户的文件和进程,规范每个用户的权限, ...
- [maven] 实战笔记 - 构建、打包和安装maven
① 手工构建自己的maven项目 Maven 项目的核心是 pom.xml.POM (Project Object Model,项目对象模型)定义了项目的基本信息,用于描述项目如何构建,声明项目依赖等 ...
- AWS安装
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py" sudo python get-pip. ...