Naive Bayes是比较常用的分类器,因为思想比较简单。之所以说是naive,是因为他假设用于分类的特征在类确定的条件下是条件独立的,这个假设使得分类变得很简单,但会损失一定的精度。
具体推导可以看《统计学习方法》
经过推导我们可知y=argMaxP(Y=ck)*P(X=x|Y=ck)。那么我们需要求先验概率也就是P(Y=ck)和求条件概率p(X=x|Y=ck).
具体的例子以:http://blog.163.com/jiayouweijiewj@126/blog/static/1712321772010102802635243/来说明。
我这里一共用了4个mapreduce,因为采用了多项式模型,先验概率P(c)= 类c下单词总数/整个训练样本的单词总数。类条件概率P(tk|c)=(类c下单词tk在各个文档中出现过的次数之和+1)/(类c下单词总数+|V|)(|V|是单词种类数)。输入是:
1:Chinese Beijing Chinese
1:Chinese Chinese Shanghai
1:Chinese Macao
0:Tokyo Japan Chinese
1 一个mapreduce是用于求在各个类别下的单词数,这个是为了后面求先验概率用的。
输出为:
0              3
1              8
 
2 一个mapreduce用于求条件概率,输出为:
0:Chinese               0.2222222222222222
0:Japan   0.2222222222222222
0:Tokyo 0.2222222222222222
1:Beijing 0.14285714285714285
1:Chinese               0.42857142857142855
1:Macao 0.14285714285714285
1:Shanghai            0.14285714285714285
 
3 一个mapreduce用于计算单词种类数,输出为:
num is 6
4 最后一个mapreduce是用于预测的。
 
下面说下各个mapreduce的实现:
1 求各个类别下的单词数,这个比较简单,就是以类别为key,然后进行单词统计就好。
 附上代码:
 package hadoop.MachineLearning.Bayes.Pro;

 import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class PriorProbability {//用于求各个类别下的单词数,为后面求先验概率 public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String input="hdfs://10.107.8.110:9000/Bayes/Bayes_input/";
String output="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro/";
Job job = Job.getInstance(conf, "ProirProbability");
job.setJarByClass(hadoop.MachineLearning.Bayes.Pro.PriorProbability.class);
// TODO: specify a mapper
job.setMapperClass(MyMapper.class);
//job.setMapInputKeyClass(LongWritable.class);
// TODO: specify a reducer
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setReducerClass(MyReducer.class); // TODO: specify output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class); // TODO: specify input and output DIRECTORIES (not files)
FileInputFormat.setInputPaths(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output)); if (!job.waitForCompletion(true))
return;
} } package hadoop.MachineLearning.Bayes.Pro; import java.io.IOException; import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context; public class MyMapper extends Mapper<LongWritable, Text, Text, Text> { public void map(LongWritable ikey, Text ivalue, Context context)
throws IOException, InterruptedException {
String[] line=ivalue.toString().split(":| ");
int size=line.length-1;
context.write(new Text(line[0]),new Text(String.valueOf(size)));
} } package hadoop.MachineLearning.Bayes.Pro; import java.io.IOException; import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context; public class MyReducer extends Reducer<Text, Text, Text, IntWritable> { public void reduce(Text _key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// process values
int sum=0;
for (Text val : values) {
sum+=Integer.parseInt(val.toString());
}
context.write(_key,new IntWritable(sum));
} }
 
 
2 求文档中的单词种类数,自己实现的方法不太好,思路是,对每一行的输入都以相同的key输出,然后在combiner中先利用set求得该节点上的不重复的单词,接着在reduce中再利用set,将所有单词求种类数。感觉好一点的话是先按照单词进行规约,最后再利用一个mapreduce对单词种类数进行统计。但是考虑到刚学会mapreduce不久还不会写链式,而且一个bayes已经写了4个mapreduce就不考虑再复杂化了。

package hadoop.MachineLearning.Bayes.Count;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class Count {//计算文档中的单词种类数目 public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Count");
String input="hdfs://10.107.8.110:9000/Bayes/Bayes_input";
String output="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count";
job.setJarByClass(hadoop.MachineLearning.Bayes.Count.Count.class);
// TODO: specify a mapper
job.setMapperClass(MyMapper.class);
// TODO: specify a reducer
job.setCombinerClass(MyCombiner.class);
job.setReducerClass(MyReducer.class); // TODO: specify output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class); // TODO: specify input and output DIRECTORIES (not files)
FileInputFormat.setInputPaths(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output)); if (!job.waitForCompletion(true))
return;
} } package hadoop.MachineLearning.Bayes.Count; import java.io.IOException; import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper; public class MyMapper extends Mapper<LongWritable, Text, Text, Text> { public void map(LongWritable ikey, Text ivalue, Context context)
throws IOException, InterruptedException {
String[] line=ivalue.toString().split(":| ");
String key="1";
System.out.println(" ");
System.out.println(" ");
System.out.println(" ");
for(int i=1;i<line.length;i++){ System.out.println(line[i]);
context.write(new Text(key),new Text(line[i]));//以相同的key进行输出,使得能最后输出到一个reduce中
}
} } package hadoop.MachineLearning.Bayes.Count; import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set; import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer; public class MyCombiner extends Reducer<Text, Text, Text, Text> {//先在本地的节点上利用set删去重复的单词 public void reduce(Text _key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// process values
Set set=new HashSet();
for (Text val : values) {
set.add(val.toString());
}
for(Iterator it=set.iterator();it.hasNext();){
context.write(new Text("1"),new Text(it.next().toString()));
}
} } package hadoop.MachineLearning.Bayes.Count; import java.io.IOException;
import java.util.HashSet;
import java.util.Set; import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer; public class MyReducer extends Reducer<Text, Text, Text, Text> {//通过combiner后,再利用set对单词进行去重,最后得到种类数 public void reduce(Text _key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// process values
Set set=new HashSet();
for (Text val : values) {
set.add(val.toString());
}
context.write(new Text("num is "),new Text(String.valueOf(set.size())));
} }
 
 
3 求条件概率.这里需要用到该类别下该单词的数目sum,该类别下的单词总数,文档中的单词种类数。这些都可以在之前的输出文件中获得,我这里都用map去接受这些数据。由于有些单词没有出现在该类别下,例如P(Japan | yes)=P(Tokyo | yes),如果将他们当作0处理,那么导致该条件概率会是0,所以这里用了平滑的方法可以参考上述的链接。这里有个细节,就是条件概率生成的会比较多,需要一种高效的存储和查找方式,我这里因为水平不够,就直接用map来存放了,如果对于大的数据,这个会很低效。

package hadoop.MachineLearning.Bayes.Cond;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class CondiPro {//用于求条件概率 public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String input="hdfs://10.107.8.110:9000/Bayes/Bayes_input";
String output="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Con";
String proPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";//这是之前求各个类别下单词数目的输出
String countPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count";//这是之前求的单词种类数
conf.set("propath",proPath);
conf.set("countPath",countPath);
Job job = Job.getInstance(conf, "ConditionPro"); job.setJarByClass(hadoop.MachineLearning.Bayes.Cond.CondiPro.class);
// TODO: specify a mapper
job.setMapperClass(MyMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// TODO: specify a reducer
job.setReducerClass(MyReducer.class); // TODO: specify output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class); // TODO: specify input and output DIRECTORIES (not files)
FileInputFormat.setInputPaths(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output)); if (!job.waitForCompletion(true))
return;
} } package hadoop.MachineLearning.Bayes.Cond; import java.io.IOException;
import java.util.Map; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper; public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> { public void map(LongWritable ikey, Text ivalue, Context context)
throws IOException, InterruptedException {
String[] line=ivalue.toString().split(":| ");
for(int i=1;i<line.length;i++){
String key=line[0]+":"+line[i];
context.write(new Text(key),new IntWritable(1));
}
} } package hadoop.MachineLearning.Bayes.Cond; import java.io.IOException;
import java.util.Map; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer; public class MyReducer extends Reducer<Text, IntWritable, Text, DoubleWritable> {
public Map<String,Integer> map;
public int count=0;
public void setup(Context context) throws IOException{
Configuration conf=context.getConfiguration(); String proPath=conf.get("propath");
String countPath=conf.get("countPath");//
map=Utils.getMapFormHDFS(proPath);//获得各个类别下的单词数
count=Utils.getCountFromHDFS(countPath);//获得单词种类数
}
public void reduce(Text _key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
// process values
int sum=0;
for (IntWritable val : values) {
sum+=val.get();
}
int type=Integer.parseInt(_key.toString().split(":")[0]);
double probability=0.0;
for(Map.Entry<String,Integer> entry:map.entrySet()){
if(type==Integer.parseInt(entry.getKey())){
probability=(sum+1)*1.0/(entry.getValue()+count);//条件概率的计算
}
}
context.write(_key,new DoubleWritable(probability));
} } package hadoop.MachineLearning.Bayes.Cond; import java.io.IOException;
import java.util.HashMap;
import java.util.Map; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader; public class Utils { /**
* @param args
* @throws IOException
*/ public static Map<String,Integer> getMapFormHDFS(String input) throws IOException{
Configuration conf=new Configuration();
Path path=new Path(input);
FileSystem fs=path.getFileSystem(conf); FileStatus[] stats=fs.listStatus(path);
Map<String,Integer> map=new HashMap();
for(int i=0;i<stats.length;i++){
if(stats[i].isFile()){
FSDataInputStream infs=fs.open(stats[i].getPath());
LineReader reader=new LineReader(infs,conf);
Text line=new Text();
while(reader.readLine(line)>0){
String[] temp=line.toString().split(" ");
//System.out.println(temp.length);
map.put(temp[0],Integer.parseInt(temp[1]));
}
reader.close();
}
} return map; } public static Map<String,Double> getMapFormHDFS(String input,boolean j) throws IOException{
Configuration conf=new Configuration();
Path path=new Path(input);
FileSystem fs=path.getFileSystem(conf); FileStatus[] stats=fs.listStatus(path);
Map<String,Double> map=new HashMap();
for(int i=0;i<stats.length;i++){
if(stats[i].isFile()){
FSDataInputStream infs=fs.open(stats[i].getPath());
LineReader reader=new LineReader(infs,conf);
Text line=new Text();
while(reader.readLine(line)>0){
String[] temp=line.toString().split(" ");
//System.out.println(temp.length);
map.put(temp[0],Double.parseDouble(temp[1]));
}
reader.close();
}
} return map; } public static int getCountFromHDFS(String input) throws IOException{
Configuration conf=new Configuration();
Path path=new Path(input);
FileSystem fs=path.getFileSystem(conf); FileStatus[] stats=fs.listStatus(path); int count=0;
for(int i=0;i<stats.length;i++){
if(stats[i].isFile()){
FSDataInputStream infs=fs.open(stats[i].getPath());
LineReader reader=new LineReader(infs,conf);
Text line=new Text();
while(reader.readLine(line)>0){
String[] temp=line.toString().split(" ");
//System.out.println(temp.length);
count=Integer.parseInt(temp[1]);
}
reader.close();
}
}
return count;
} public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
String proPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";
String countPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count/";
Map<String,Integer> map=Utils.getMapFormHDFS(proPath);
for(Map.Entry<String,Integer> entry:map.entrySet()){
System.out.println(entry.getKey()+"->"+entry.getValue());
} int count=Utils.getCountFromHDFS(countPath);
System.out.println("count is "+count);
} }
 
4 预测,例如输入Chinese, Chinese, Chinese, Tokyo, Japan。那就分别对每个单词以0,1的类别进行输出,输出为type:words,接着就是在条件概率中查找,进行简单的累乘即可。

package hadoop.MachineLearning.Bayes.Predict;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class Predict { public static void main(String[] args) throws Exception {//预测
Configuration conf = new Configuration();
String input="hdfs://10.107.8.110:9000/Bayes/Predict_input";
String output="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Predict";
String condiProPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Con";
String proPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";
String countPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count";
conf.set("condiProPath",condiProPath);
conf.set("proPath",proPath);
conf.set("countPath",countPath);
Job job = Job.getInstance(conf, "Predict");
job.setJarByClass(hadoop.MachineLearning.Bayes.Predict.Predict.class);
// TODO: specify a mapper
job.setMapperClass(MyMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
// TODO: specify a reducer
job.setReducerClass(MyReducer.class); // TODO: specify output types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(DoubleWritable.class); // TODO: specify input and output DIRECTORIES (not files)
FileInputFormat.setInputPaths(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output)); if (!job.waitForCompletion(true))
return;
} } package hadoop.MachineLearning.Bayes.Predict; import java.io.IOException;
import java.util.HashMap;
import java.util.Map; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper; public class MyMapper extends Mapper<LongWritable, Text, Text, Text> { public Map<String,Integer> map=new HashMap(); public void setup(Context context) throws IOException{
Configuration conf=context.getConfiguration();
String proPath=conf.get("proPath");
map=Utils.getMapFormHDFS(proPath);
} public void map(LongWritable ikey, Text ivalue, Context context)
throws IOException, InterruptedException {
for(Map.Entry<String,Integer> entry:map.entrySet()){
context.write(new Text(entry.getKey()),ivalue);//对每一行数据,打上所有类别,方便后续的求条件概率
}
} } package hadoop.MachineLearning.Bayes.Predict; import java.io.IOException;
import java.util.HashMap;
import java.util.Map; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer; public class MyReducer extends Reducer<Text, Text, Text, DoubleWritable> { public Map<String,Double> mapDouble=new HashMap();//存放条件概率 public Map<String,Integer> mapInteger=new HashMap();//存放各个类别下的单词数 public Map<String,Double> noFind=new HashMap();//用于那些单词没有出现在某个类别中的 public Map<String,Double> prePro=new HashMap();//求的后的先验概率 public void setup(Context context) throws IOException{
Configuration conf=context.getConfiguration(); String condiProPath=conf.get("condiProPath");
String proPath=conf.get("proPath");
String countPath=conf.get("countPath");
mapDouble=Utils.getMapFormHDFS(condiProPath,true);
mapInteger=Utils.getMapFormHDFS(proPath);
int count=Utils.getCountFromHDFS(countPath);
for(Map.Entry<String,Integer> entry:mapInteger.entrySet()){
double pro=0.0;
noFind.put(entry.getKey(),(1.0/(count+entry.getValue())));
}
int sum=0;
for(Map.Entry<String,Integer> entry:mapInteger.entrySet()){
sum+=entry.getValue();
} for(Map.Entry<String,Integer> entry:mapInteger.entrySet()){
prePro.put(entry.getKey(),(entry.getValue()*1.0/sum));
} } public void reduce(Text _key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// process values
String type=_key.toString();
double pro=1.0;
for (Text val : values) {
String[] words=val.toString().split(" ");
for(int i=0;i<words.length;i++){
String condi=type+":"+words[i];
if(mapDouble.get(condi)!=null){//如果该单词出现在该类别中,说明有条件概率
pro=pro*mapDouble.get(condi);
}else{//如果该单词不在该类别中,就采用默认的条件概率
pro=pro*noFind.get(type);
}
}
}
pro=pro*prePro.get(type);
context.write(new Text(type),new DoubleWritable(pro));
} } package hadoop.MachineLearning.Bayes.Predict; import java.io.IOException;
import java.util.HashMap;
import java.util.Map; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader; public class Utils { /**
* @param args
* @throws IOException
*/ public static Map<String,Integer> getMapFormHDFS(String input) throws IOException{
Configuration conf=new Configuration();
Path path=new Path(input);
FileSystem fs=path.getFileSystem(conf); FileStatus[] stats=fs.listStatus(path);
Map<String,Integer> map=new HashMap();
for(int i=0;i<stats.length;i++){
if(stats[i].isFile()){
FSDataInputStream infs=fs.open(stats[i].getPath());
LineReader reader=new LineReader(infs,conf);
Text line=new Text();
while(reader.readLine(line)>0){
String[] temp=line.toString().split(" ");
//System.out.println(temp.length);
map.put(temp[0],Integer.parseInt(temp[1]));
}
reader.close();
}
} return map; } public static Map<String,Double> getMapFormHDFS(String input,boolean j) throws IOException{
Configuration conf=new Configuration();
Path path=new Path(input);
FileSystem fs=path.getFileSystem(conf); FileStatus[] stats=fs.listStatus(path);
Map<String,Double> map=new HashMap();
for(int i=0;i<stats.length;i++){
if(stats[i].isFile()){
FSDataInputStream infs=fs.open(stats[i].getPath());
LineReader reader=new LineReader(infs,conf);
Text line=new Text();
while(reader.readLine(line)>0){
String[] temp=line.toString().split(" ");
//System.out.println(temp.length);
map.put(temp[0],Double.parseDouble(temp[1]));
}
reader.close();
}
} return map; } public static int getCountFromHDFS(String input) throws IOException{
Configuration conf=new Configuration();
Path path=new Path(input);
FileSystem fs=path.getFileSystem(conf); FileStatus[] stats=fs.listStatus(path); int count=0;
for(int i=0;i<stats.length;i++){
if(stats[i].isFile()){
FSDataInputStream infs=fs.open(stats[i].getPath());
LineReader reader=new LineReader(infs,conf);
Text line=new Text();
while(reader.readLine(line)>0){
String[] temp=line.toString().split(" ");
//System.out.println(temp.length);
count=Integer.parseInt(temp[1]);
}
reader.close();
}
}
return count;
} public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
String proPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";
String countPath="hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count/";
Map<String,Integer> map=Utils.getMapFormHDFS(proPath);
for(Map.Entry<String,Integer> entry:map.entrySet()){
System.out.println(entry.getKey()+"->"+entry.getValue());
} int count=Utils.getCountFromHDFS(countPath);
System.out.println("count is "+count);
} }
 
 
 
 
 
 
 

Naive Bayes在mapreduce上的实现的更多相关文章

  1. Naive Bayes在mapreduce上的实现(转)

    Naive Bayes在mapreduce上的实现 原文地址 http://www.cnblogs.com/sunrye/p/4553732.html Naive Bayes是比较常用的分类器,因为思 ...

  2. (转载)微软数据挖掘算法:Microsoft Naive Bayes 算法(3)

    介绍: Microsoft Naive Bayes 算法是一种基于贝叶斯定理的分类算法,可用于探索性和预测性建模. Naïve Bayes 名称中的 Naïve 一词派生自这样一个事实:该算法使用贝叶 ...

  3. [Machine Learning & Algorithm] 朴素贝叶斯算法(Naive Bayes)

    生活中很多场合需要用到分类,比如新闻分类.病人分类等等. 本文介绍朴素贝叶斯分类器(Naive Bayes classifier),它是一种简单有效的常用分类算法. 一.病人分类的例子 让我从一个例子 ...

  4. Spark MLlib 之 Naive Bayes

    1.前言: Naive Bayes(朴素贝叶斯)是一个简单的多类分类算法,该算法的前提是假设各特征之间是相互独立的.Naive Bayes 训练主要是为每一个特征,在给定的标签的条件下,计算每个特征在 ...

  5. Naive Bayes理论与实践

    Naive Bayes: 简单有效的常用分类算法,典型用途:垃圾邮件分类 假设:给定目标值时属性之间相互条件独立 同样,先验概率的贝叶斯估计是 优点: 1. 无监督学习的一种,实现简单,没有迭代,学习 ...

  6. [ML] Naive Bayes for Text Classification

    TF-IDF Algorithm From http://www.ruanyifeng.com/blog/2013/03/tf-idf.html Chapter 1, 知道了"词频" ...

  7. 朴素贝叶斯方法(Naive Bayes Method)

        朴素贝叶斯是一种很简单的分类方法,之所以称之为朴素,是因为它有着非常强的前提条件-其所有特征都是相互独立的,是一种典型的生成学习算法.所谓生成学习算法,是指由训练数据学习联合概率分布P(X,Y ...

  8. 朴素贝叶斯算法(Naive Bayes)

    朴素贝叶斯算法(Naive Bayes) 阅读目录 一.病人分类的例子 二.朴素贝叶斯分类器的公式 三.账号分类的例子 四.性别分类的例子 生活中很多场合需要用到分类,比如新闻分类.病人分类等等. 本 ...

  9. Naive Bayes (NB Model) 初识

    1,Bayes定理 P(A,B)=P(A|B)P(B); P(A,B)=P(B|A)P(A); P(A|B)=P(B|A)P(A)/P(B);    贝叶斯定理变形 2,概率图模型 2.1  定义 概 ...

随机推荐

  1. Ansible1:简介与基本安装【转】

    Ansible是一个综合的强大的管理工具,他可以对多台主机安装操作系统,并为这些主机安装不同的应用程序,也可以通知指挥这些主机完成不同的任务.查看多台主机的各种信息的状态等,ansible都可以通过模 ...

  2. redis学习一

    一.简介: 在过去的几年中,NoSQL数据库一度成为高并发.海量数据存储解决方案的代名词,与之相应的产品也呈现出雨后春笋般的生机.然而在众多产品中能够脱颖而出的却屈指可数,如Redis.MongoDB ...

  3. zf-删除重复数据只保留一条(转)

    http://blog.csdn.net/anya/article/details/6407280

  4. 实现jsp页面显示用户登录信息,利用session保存。

    这是后台代码 这是jsp代码,上面是声明,下面是获得值.

  5. ignite客户端找不到服务端的时候如何设置退出

    ignite启动客户端时需要有服务端支持: Ignition.setClientMode(true); Ignition.start("ignite.xml"); 这里有个问题,当 ...

  6. OpenGL.Vertex Array Object (VAO).

    OpenGL抛弃glEnable(),glColor(),glVertex(),glEnable()这一套流程的函数和管线以后,就需要一种新的方法来传递数据到Graphics Card来渲染几何体,我 ...

  7. UIImage图片拉伸方法

    纵观移动市场,一款移动app,要想长期在移动市场立足,最起码要包含以下几个要素:实用的功能.极强的用户体验.华丽简洁的外观.华丽外观的背后,少不了美工的辛苦设计,但如果开发人员不懂得怎么合理展示这些设 ...

  8. 4位开锁<dfs>

    题意: 有一个四位密码的锁,每一位是1~9的密码,1跟9相连.并且相邻的连个密码位可以交换.每改变一位耗时1s,给出锁的当前状态和密码,求最少解锁时间. 思路: 用bfs枚举出所有相邻交换的情况,并记 ...

  9. 使用URL工具类调用webservice接口(soap)与http接口的实现方式

    import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import ...

  10. 更换arm-linux-gcc 4.3.2编译器

    先创建一个临时目录:mcx@mcx-virtual-machine:/home/work/tools$ mkdir tmp 解压到根目录:mcx@mcx-virtual-machine:/home/w ...