Seven Steps to Success: Machine Learning in Practice
Project failures in IT are all too common, and the risks are higher when you adopt a technology that is unfamiliar to your organisation. Machine learning has been around for a long time in academia, but awareness and development of the technology have only recently reached the point at which its benefits are attractive to business. There is huge potential to reduce costs and find new revenue by applying this technology correctly, but there are also pitfalls.
This guide will help you apply machine learning effectively to solve practical problems within your organisation. I’ll talk about issues that I’ve encountered applying machine learning in industry. My experience is in applying machine learning to the analysis of text; however, I believe the lessons I have learnt are generally applicable. I have been able to deliver significant and measurable benefits through applying machine learning, and I hope that I can enable you to do the same.
I will assume that you know the basics of machine learning, and that you have a real-world problem that you want to apply it to. This is not an introduction to machine learning (there are already plenty of those), but I don’t assume that you’re a machine learning expert either. A lot of the advice is non-technical and would be just as useful to a product manager wanting to understand the technology as to a software developer creating a solution.
Clearly understand the business need
Understanding the business need is important for any project, but it is easy to get blinded by technological possibilities. Is machine learning really going to benefit the company, or is it possible to achieve the same goals (or most of them) with some simple rules? The goal is to build a solution, not to do machine learning for the sake of it.
Try to identify all the metrics that are important to the business. The metrics we optimise for have a profound effect on the solution we choose, so it is important to identify them early on. They also affect what alternatives there are to machine learning.
In the case of classification problems, potential metrics to consider are:
accuracy: the proportion of all instances classified correctly. Note that this can be very misleading if the data is biased towards one class (if 90% of the data is from class 1, we can get 90% accuracy by simply classifying everything as being from that class). Real data is normally biased in some way. For this reason, you may want to consider an average of the accuracy on each class, or some other measure.
precision: the proportion of instances assigned to a class that really belong to it. High precision is needed when the results need to look good, for example if they are being presented to customers without any manual filtering after the machine learning phase.
recall: the proportion of instances of a class that the system actually finds. High recall is important when combining machine learning with manual analysis to produce a combined system with high overall accuracy.
F1 score, or more generally Fβ score: useful when a trade-off between precision and recall is needed. β can be adjusted to prefer one over the other (β > 1 favours recall, β < 1 favours precision).
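To make these metrics concrete, here is a minimal sketch in plain Python. The function names and the toy data are my own assumptions for illustration, not part of any particular library. It reproduces the 90% example from the accuracy discussion and shows how averaging the per-class accuracy exposes the problem.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Proportion of all instances classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive):
    """Of the instances assigned to `positive`, how many really belong to it."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred, positive):
    """Of the instances that really are `positive`, how many were found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def balanced_accuracy(y_true, y_pred):
    """Average of the accuracy on each class (i.e. mean per-class recall)."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


def f_beta(y_true, y_pred, positive, beta=1.0):
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    p = precision(y_true, y_pred, positive)
    r = recall(y_true, y_pred, positive)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# Made-up data: 90% of instances are class 1, and the "classifier"
# simply assigns everything to class 1.
y_true = [1] * 90 + [2] * 10
always_one = [1] * 100

print(accuracy(y_true, always_one))            # 0.9
print(balanced_accuracy(y_true, always_one))   # 0.5
print(recall(y_true, always_one, positive=2))  # 0.0
```

The headline accuracy looks respectable, while the per-class average and the recall on the minority class show that the classifier never finds a single instance of class 2.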
Customer Service at Direct Electric
Direct Electric are a large electricity company based in the south of England. Dave, the head of customer service, is concerned about response times for upset customers who contact the company online. He wants to ensure that if a customer sends an angry email, a representative will get back to them quickly.
“At the moment, it takes about two days to respond, and I’d like to get that down to half a day,” he explains to Samantha, the resident machine learning expert on the software development team. Dave has heard about automated sentiment analysis, and wonders if that could be used to quickly identify the emails of interest, so that they can be prioritised by the customer service team.
“What we could do,” suggests Samantha, “is try and identify the emails that are likely to carry negative sentiment automatically, and send those to your team to look at first.”
“That sounds good!”
“The thing is,” says Samantha, “a machine-learning-based system isn’t going to get everything right. Would it matter if we missed some of the negative sentiment emails?” Samantha thinks a high-precision system may be what they are looking for. In that case, she would most likely have to sacrifice recall, and miss some of the emails of interest.
“Well, not really,” says Dave, “it’s only really useful to us if it finds them all.”
“Well, if you want to guarantee you find all of them,” says Samantha, “the only way to do that is to examine them manually.” Dave looks crestfallen. “But,” she continues, “we could probably get nearly all of them. Would it matter if we accidentally prioritised some emails that aren’t really negative?” She is thinking of trying to build a system with high recall, which will probably mean lower precision.
“That would be fine,” says Dave. “After all, at the moment, we’re reading them all.”
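The trade-off Samantha and Dave are negotiating can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I assume a hypothetical sentiment classifier that gives each email a score between 0 and 1 (higher meaning more likely to be negative), and the scores and labels below are invented. The helper names are mine, too. The idea is to pick the highest score threshold that still reaches a target recall, and then see what precision that leaves.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall if we prioritise every email scoring >= threshold.

    labels: 1 = genuinely negative email (the class of interest), 0 = other.
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold whose recall meets the target, with its precision."""
    for threshold in sorted(set(scores), reverse=True):
        precision, recall = precision_recall_at(scores, labels, threshold)
        if recall >= target_recall:
            return threshold, precision, recall
    return None  # only reached if there are no positive examples at all


# Invented scores from a hypothetical classifier, with hand-assigned labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    1,    0,    0,    0,    0]

# A strict threshold: everything flagged is negative, but half are missed.
print(precision_recall_at(scores, labels, 0.90))   # (1.0, 0.5)

# Dave's preference: find every negative email, accept some false alarms.
print(threshold_for_recall(scores, labels, 1.0))
```

On this toy data, reaching full recall means lowering the threshold to 0.40, at which point six emails are prioritised and four of them are genuinely negative. The customer service team reads six emails first instead of all ten, which is the kind of outcome Dave says he would be happy with.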
The full guide has seven chapters:
1. Clearly understand the business need
2. Know what’s possible
3. Know the data
4. Plan for change
5. Avoid premature optimisation
6. Mitigate risks
7. Use common sense