Build a Machine Learning Portfolio
Complete Small Focused Projects and Demonstrate Your Skills
A portfolio is typically used by designers and artists to show examples of prior work to prospective clients and employers.
Design, art and photography are fields where the work product is creative and empirical, and where telling someone you can do the work is not valued the same as showing them.
In this post, I will convince you that building a machine learning portfolio has value to you, others and the community.
You will discover what exactly a machine learning portfolio is, the types of projects that can be included and how to make your portfolio really work for you.
Your Portfolio
- Pick a theme. This is the type of project that you want to work on. A no-brainer would be reports on customer data (high-value customers, predictions of which prospects will convert, etc.).
- Find open datasets. You need to locate datasets that you can practice on that are close to or on your theme. Look on competition websites like Kaggle and KDDCup as a starting point. There are a lot of public access datasets these days that you can practice on!
- Complete projects. Treat each dataset like a project with a client and apply your process to it in order to deliver a result. This may require you to assume the role of the client and take an educated guess as to the outcome they are looking for (a model or a report on a specific question, etc.).
- Write up. Write up your findings as a semi-formal work product and host it publicly online.
Benefits of a Machine Learning Portfolio
Whether you are a beginner just starting out in machine learning or a hardened veteran, a machine learning portfolio can keep you on track and demonstrate your skills. Creating a machine learning portfolio is a valuable exercise for you and for others.
Benefits for You
Building up a collection of completed machine learning projects can keep you focused and motivated, and the projects can be leveraged on future work.
- Focus: Each project has a well-defined purpose and end point. Small projects constrained in effort and resources can keep velocity high.
- Knowledge Base: The corpus of completed projects provides a knowledge base for you to reflect on and leverage as you push into projects further from your comfort zone.
- Trajectory: There are so many shiny things to investigate. Reminding yourself that you are building a consistent collection of projects can be used as a lever to keep you on track.
Benefits to Others
A portfolio of completed projects can be used by others as an indicator of specific skills, ability to communicate and a demonstration of drive.
- Skills: A project can demonstrate your capability with regard to a specific problem domain, tool, library, technology stack or algorithm.
- Communication: A project must be understood at least in terms of its purpose and its findings. Curating a good portfolio requires excellent communication skills, so the portfolio itself demonstrates your ability to communicate technical subjects well.
- Motivation: Working on and completing side projects, regardless of their size or scope, takes a certain level of self-discipline. The fact that you managed to put together a portfolio is a testament to your interest in the subject and your ability to manage your time.
Benefits to the Community
Sharing your projects in public extends the benefits to the broader machine learning community.
- Engagement: A public project can elicit feedback from third parties, which may provide extensions and improvements that both you and the community can learn from.
- Starting Point: A public portfolio project can provide a jumping-off point that others can learn from and build upon, perhaps for their own small project or for something serious.
- Case Study: A public project can provide a point of study, perhaps for a unique or interesting algorithm behavior or problem decomposition, the very source of innovation.
Hopefully, I’ve convinced you that building a machine learning portfolio has some benefits that interest you. Next, we will look at what exactly a machine learning portfolio is.
Build a Machine Learning Portfolio
A machine learning portfolio is a collection of completed independent projects, each of which uses machine learning in some way. The portfolio presents the collection as a whole and allows each project to be reviewed individually.
Five properties of an effective machine learning portfolio include:
- Accessible: I advocate making the portfolio public in the form of a publicly accessible webpage or collection of public code repositories. You want people to find, read, comment on, and use your work if possible.
- Small: Each project should be small in scope in terms of effort, resources, and most importantly, your time (10-20 hours). You’re busy and it’s hard to keep focus. See my Small Projects Methodology.
- Completed: Small projects are easier to finish. Set a modest project objective and achieve it. Like mini-experiments, you present the findings of both your successes and your failures; they are all useful lessons.
- Independent: Each project should be independent so that it can be understood in isolation. This does not mean you can’t leverage prior work, it means that the project makes sense on its own as a standalone piece of work.
- Understandable: Each project must clearly and effectively communicate its purpose and findings (at the very least). Spend some time making sure that a fresh set of eyes can understand what you did and why it matters.
Four types of small project ideas that may inspire you include:
- Investigate a property of a machine learning tool or library.
- Investigate the behavior of a machine learning algorithm.
- Investigate and characterize a data set or machine learning problem.
- Implement a machine learning algorithm in your favorite programming language.
Some ideas for projects that you probably didn’t think were portfolio pieces include:
- Coursework: Your clear presentation of your notes and homework for a machine learning related course (such as a MOOC).
- Book Review: Your clear presentation of your notes from reading and reviewing a machine learning book.
- Software Review: Your clear presentation and worked examples for using a machine learning related software tool or library.
- Competition Participation: Your clearly presented notes and results from participating in a machine learning competition, such as one on Kaggle.
- Commentary: An essay in response to a machine learning themed blog post or your detailed response to a machine learning related question on a Q&A site like Quora, Reddit Machine Learning or CrossValidated.
Now that you know what a machine learning portfolio is and have some ideas of projects, let’s look at how to turn up the awesome on your portfolio.
Making Your Portfolio Great
To make your portfolio shine, you need to do some light marketing. Don’t worry, it’s none of that slimy stuff, just good old fashioned getting the word out.
Code Repository
Consider using a public source code repository such as GitHub or BitBucket, which naturally lists your public projects. These sites encourage you to provide a readme file in the root of each project that describes what the project is all about. Use this feature to clearly describe the purpose and findings of each project. Don’t be afraid to include images, graphs, videos and links.
Provide unambiguous instructions for downloading the project and recreating the results (if there is code or experimentation involved). You want people to re-run your work, so make it as easy as possible (i.e. type this to download, then type this to build and run it).
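One way to make results easy to recreate is to give each project a single entry-point script with a fixed random seed, so that anyone who runs it gets the same numbers you report. The following is only a minimal sketch under assumptions: the function name `run_experiment`, the default seed and the toy task (estimating the mean of simulated noisy data) are all hypothetical placeholders for whatever your project actually does.

```python
import json
import random


def run_experiment(seed: int = 42, n_samples: int = 1000) -> dict:
    """Toy stand-in for a project's experiment (hypothetical task):
    estimate the mean of simulated noisy measurements."""
    # A local, seeded generator makes every rerun produce identical samples.
    rng = random.Random(seed)
    samples = [rng.gauss(5.0, 2.0) for _ in range(n_samples)]
    mean = sum(samples) / len(samples)
    # Return the settings alongside the result so a reader can see
    # exactly which configuration produced the reported number.
    return {"seed": seed, "n_samples": n_samples, "mean": round(mean, 4)}


def main() -> None:
    # Print the results as JSON so they are easy to paste into a readme.
    print(json.dumps(run_experiment()))


if __name__ == "__main__":
    main()
```

If this were saved as `run.py` (an assumed file name), the readme instructions could then shrink to two steps: download the repository, then run `python run.py`.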
Curate Projects
You can slap together any old project on GitHub, but only include your best, clearest and most interesting work in your machine learning portfolio.
Curate your projects like a gallery. Choose those that best demonstrate your skills, interests and capabilities. Show off what you can do and what you have done. These ideas of self-promotion can feed back into the projects you might want to tackle. Be clear in your vision, where you want to be and what projects you want to tackle that will help you get there. Own the process.
Present Findings
Spend a lot of time writing up results. Explain how they relate to the aims of the project. Explain the impact they have in the domain or could have. List off opportunities for extensions that you would or could explore if you had another month or year to deep dive on the project.
Create tables, graphs and any other pretty pictures that help you tell your story. Write up your findings as a blog post. For bonus points, create a short screencast showing how you got the results, along with a small PowerPoint presentation explaining what they mean, and put it up on YouTube. This video can be embedded in your blog post and linked to from your project repository readme file.
Depending on the findings you have and how important they are to you (such as doing well in a Kaggle competition), you can consider creating a technical report and uploading it to Scribd, and uploading your slides to SlideShare.
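For the tables mentioned above, a small helper can turn a list of results into a Markdown table that you paste into a readme or blog post. This is a rough sketch, not a prescribed tool: the helper name `to_markdown_table` is hypothetical, it assumes your readme is rendered as Markdown, and the rows in the usage example are made-up placeholder values rather than real results.

```python
def to_markdown_table(rows, headers):
    """Render a list of dicts as a Markdown table, one row per dict.

    `headers` selects which keys to show and in what order.
    """
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only, not real experimental results.
    example_rows = [
        {"model": "baseline", "accuracy": 0.71},
        {"model": "tuned", "accuracy": 0.78},
    ]
    print(to_markdown_table(example_rows, ["model", "accuracy"]))
```

Generating the table from the same data structure your experiment produces helps keep the write-up in sync with the results when you re-run the project.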
Promote Your Work
You can share the details of each project as you finish it. You may be completing one per week, depending on the number of free hours you can find around study and/or work. Sharing links on social media such as Twitter, Facebook and Google+ is a good start.
I would urge you to add each project (or just your best projects) as “projects” on LinkedIn. It supports the idea of projects, although you may have to create a job for them to be listed against. Consider using the name of your blog or your sole trader company, or invent a relevant job and title such as “Machine Learning Mastery” (wink) or “Self Education”.
Now that we have some ideas on how to make our portfolio shine and how to get the word out, we can look at some examples of machine learning portfolios.
The Machine Learning Portfolio Trend
The idea of a code portfolio is not new; it was baked into GitHub. What is interesting is that in recent interviews with data scientists and managers, portfolios are being requested, even desired, along with participation in machine learning competitions and completion of online training.
Like sample code in programming interviews, machine learning portfolios are becoming a serious part of hiring.
Look for examples of good (or at least filled out) machine learning portfolios. Look for people doing well in machine learning competitions. They typically have an amazing collection of projects described on their blogs and in their public code repositories.
Look for contributors to open source machine learning projects. They can have amazing tutorials, applications and extensions to the software on their blogs and in their public code repositories.
Get started now. Dig up your projects and put them together in a story that explains your knowledge, interest or skills in machine learning.