[Javascript] Classify JSON text data with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier - basic machine learning algorithms - on JSON text data, and classify it into categories.
Although this dataset is still small (only a couple hundred data points), it is large enough that we'll start to get better results.
The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.
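Before training, it can help to check how the examples are spread across labels, since a couple hundred points divided unevenly can leave some categories with very few examples. The sketch below assumes the data is a JSON array of `{text, label}` objects, as in the script that follows. `countLabels` is a hypothetical helper, not part of Natural, and the sample labels are made up for illustration.

```javascript
// Hypothetical helper: count how many examples each label has.
// Assumes items look like [{text: '...', label: 'space'}, ...].
function countLabels(items){
    var counts = {};
    items.forEach(function(item){
        if (typeof item.text !== 'string' || typeof item.label !== 'string'){
            throw new Error('Each item needs a string "text" and "label"');
        }
        counts[item.label] = (counts[item.label] || 0) + 1;
    });
    return counts;
}

// Small inline sample; in practice this would be the parsed training file.
var sample = [
    {text: 'the shuttle reached orbit', label: 'space'},
    {text: 'the moon landing', label: 'space'},
    {text: 'new graphics card drivers', label: 'graphics'}
];
console.log(countLabels(sample));
```

If one label dominates the counts, a high overall accuracy may say little about how well the rarer categories are classified.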
// Training data format: [{text: 'xxxxxx', label: 'space'}]
// Load the training data from file and train the classifier
var natural = require('natural');
var fs = require('fs');
var classifier = new natural.BayesClassifier();

fs.readFile('training_data.json', 'utf-8', function(err, data){
    if (err){
        console.log(err);
    } else {
        var trainingData = JSON.parse(data);
        train(trainingData);
    }
});

// Add every labelled document, then train and report how long training took
function train(trainingData){
    console.log("Training");
    trainingData.forEach(function(item){
        classifier.addDocument(item.text, item.label);
    });
    var startTime = new Date();
    classifier.train();
    var endTime = new Date();
    var trainingTime = (endTime-startTime)/1000.0;
    console.log("Training time:", trainingTime, "seconds");
    loadTestData();
}

function loadTestData(){
    console.log("Loading test data");
    fs.readFile('test_data.json', 'utf-8', function(err, data){
        if (err){
            console.log(err);
        } else {
            var testData = JSON.parse(data);
            testClassifier(testData);
        }
    });
}

// Compare the classifier's guess with the known label for each test item
function testClassifier(testData){
    console.log("Testing classifier");
    var numCorrect = 0;
    testData.forEach(function(item){
        var labelGuess = classifier.classify(item.text);
        if (labelGuess === item.label){
            numCorrect++;
        }
    });
    console.log("Percent correct:", numCorrect/testData.length * 100);
    saveClassifier(classifier);
}

// Persist the trained classifier so it can be reused without retraining
function saveClassifier(classifier){
    classifier.save('classifier.json', function(err){
        if (err){
            console.log(err);
        } else {
            console.log("Classifier saved!");
        }
    });
}
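To compare Naive Bayes and Logistic Regression on the same data, the training and testing steps above can be folded into one function that returns both the accuracy and the training time. This is only a sketch: `evaluate` and `MajorityClassifier` are hypothetical names, not part of Natural. The stand-in classifier always guesses the most frequent training label, so the example runs without Natural installed and gives a baseline to measure against.

```javascript
// Hypothetical helper: train a classifier, then measure accuracy and training time.
// Works with any object exposing addDocument(text, label), train() and
// classify(text), which is the interface the script above relies on.
function evaluate(classifier, trainingData, testData){
    trainingData.forEach(function(item){
        classifier.addDocument(item.text, item.label);
    });
    var startTime = Date.now();
    classifier.train();
    var trainingSeconds = (Date.now() - startTime) / 1000.0;

    var numCorrect = 0;
    testData.forEach(function(item){
        if (classifier.classify(item.text) === item.label){
            numCorrect++;
        }
    });
    return {
        trainingSeconds: trainingSeconds,
        accuracy: testData.length ? numCorrect / testData.length : 0
    };
}

// Stand-in classifier (an assumption for this demo only):
// it always predicts the most frequent training label.
function MajorityClassifier(){
    this.counts = {};
    this.best = null;
}
MajorityClassifier.prototype.addDocument = function(text, label){
    this.counts[label] = (this.counts[label] || 0) + 1;
};
MajorityClassifier.prototype.train = function(){
    var counts = this.counts;
    var labels = Object.keys(counts);
    this.best = labels.reduce(function(a, b){
        return counts[b] > counts[a] ? b : a;
    }, labels[0]);
};
MajorityClassifier.prototype.classify = function(text){
    return this.best;
};

// Tiny made-up data set, just to exercise the helper.
var demoTrain = [
    {text: 'a', label: 'space'},
    {text: 'b', label: 'space'},
    {text: 'c', label: 'graphics'}
];
var demoTest = [
    {text: 'd', label: 'space'},
    {text: 'e', label: 'graphics'}
];
var result = evaluate(new MajorityClassifier(), demoTrain, demoTest);
console.log("Accuracy:", result.accuracy, "Training time:", result.trainingSeconds, "seconds");
```

With Natural installed, the same function could be called once with `new natural.BayesClassifier()` and once with `new natural.LogisticRegressionClassifier()`, passing the parsed training and test files, to see which performs better on this dataset and how their training times differ.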
In a new project, we can load the saved classifier and try it on new text:
// Load with the same classifier type that was trained and saved (BayesClassifier here)
var natural = require('natural');

natural.BayesClassifier.load('classifier.json', null, function(err, classifier){
    if (err){
        console.log(err);
    } else {
        var testComment = "is this about the sun and moon?";
        console.log(classifier.classify(testComment));
    }
});