Build a Machine Learning Portfolio

Complete Small Focused Projects and Demonstrate Your Skills

A portfolio is typically used by designers and artists to show examples of prior work to prospective clients and employers.

Design, art and photography are examples of fields where the work product is creative and empirical, and where telling someone you can do the work is valued less than showing them.

In this post, I will convince you that building a machine learning portfolio has value to you, others and the community.

You will discover what exactly a machine learning portfolio is, the types of projects that can be included and how to make your portfolio really work for you.

Your Portfolio

Pick a theme. This is the type of project that you want to work on. A no-brainer would be reports on customer data (high-value customers, predictions of prospects that convert, etc.).
Find open datasets. You need to locate datasets to practice on that are close to or on your theme. Look on competition websites like Kaggle and KDD Cup as a starting point. There are a lot of publicly accessible datasets these days that you can practice on!
Complete projects. Treat each dataset like a project with a client and apply your process to it in order to deliver a result. This may require you to assume the role of the client and take an educated guess as to the outcome they are looking for (a model or a report on a specific question, etc.).
Write up. Write up your findings as a semi-formal work product and host it publicly online.

Benefits of a Machine Learning Portfolio

Whether you are a beginner in machine learning or a hardened veteran, a machine learning portfolio can keep you on track and demonstrate your skills. Creating a machine learning portfolio is a valuable exercise for you and for others.

Benefits for You

Building up a collection of completed machine learning projects can keep you focused and motivated, and the collection can be leveraged on future projects.

Focus: Each project has a well-defined purpose and end point. Small projects constrained in effort and resources can keep velocity high.
Knowledge Base: The corpus of completed projects provides a knowledge base for you to reflect on and leverage as you push into projects further from your comfort zone.
Trajectory: There are so many shiny things to investigate. Reminding yourself that you are building a consistent collection of projects can be used as a lever to keep you on track.

Benefits to Others

A portfolio of completed projects can be used by others as an indicator of specific skills, ability to communicate and a demonstration of drive.

Skills: A project can demonstrate your capability with regard to a specific problem domain, tool, library, technology stack or algorithm.
Communication: A project must be understood at least in terms of its purpose and its findings. Curating a good portfolio requires excellent communication skills, so the portfolio itself demonstrates your ability to communicate technical subjects well.
Motivation: Working on and completing side projects, regardless of their size or scope, takes a certain level of self-discipline. The fact that you managed to put together a portfolio is a testament to your interest in the subject and your ability to manage your time.

Benefits to the Community

Sharing your projects in public extends the benefits to the broader machine learning community.

Engagement: A public project can elicit feedback from third parties, who may provide extensions and improvements from which both you and the community can learn.
Starting Point: A public portfolio project can provide a jumping-off point that others can learn from and build upon, perhaps for their own small project or for something serious.
Case Study: A public project can provide a point of study, perhaps for a unique or interesting algorithm behavior or problem decomposition, the very source of innovation.

Hopefully, I’ve convinced you that building a machine learning portfolio has some benefits that interest you. Next, we will look at what exactly a machine learning portfolio is.

Build a Machine Learning Portfolio

A machine learning portfolio is a collection of completed independent projects, each of which uses machine learning in some way. The portfolio presents the collection of projects and allows review of individual projects.

Five properties of an effective machine learning portfolio are:

Accessible: I advocate making the portfolio public in the form of a publicly accessible webpage or collection of public code repositories. You want people to find, read, comment on, and use your work if possible.
Small: Each project should be small in scope in terms of effort, resources, and most importantly, your time (10-20 hours). You’re busy and it’s hard to keep focus. See my Small Projects Methodology.
Completed: Small projects are easier to finish. Set a modest project objective and achieve it. Like mini-experiments, you present the findings of both your successes and your failures; they are all useful learnings.
Independent: Each project should be independent so that it can be understood in isolation. This does not mean you can’t leverage prior work, it means that the project makes sense on its own as a standalone piece of work.
Understandable: Each project must clearly and effectively communicate its purpose and findings (at the very least). Spend some time making sure that a fresh set of eyes would understand what you did and why it matters.

Four types of small project ideas that may inspire you include:

Investigate a property of a machine learning tool or library.
Investigate the behavior of a machine learning algorithm.
Investigate and characterize a data set or machine learning problem.
Implement a machine learning algorithm in your favorite programming language.

Some ideas for projects that you probably didn’t think were portfolio pieces include:

Coursework: Your clear presentation of your notes and homework for a machine learning related course (such as a MOOC).
Book Review: Your clear presentation of your notes from reading and reviewing a machine learning book.
Software Review: Your clear presentation and worked examples for using a machine learning related software tool or library.
Competition Participation: Your clearly presented notes and results from participating in a machine learning competition, such as one on Kaggle.
Commentary: An essay in response to a machine learning themed blog post, or your detailed response to a machine learning related question on a Q&A site like Quora, Reddit Machine Learning or Cross Validated.

Now that you know what a machine learning portfolio is and have some ideas of projects, let’s look at how to turn up the awesome on your portfolio.

Making Your Portfolio Great

To make your portfolio shine, you need to do some light marketing. Don’t worry, it’s none of that slimy stuff, just good old fashioned getting the word out.

Code Repository

Consider using a public source code repository such as GitHub or Bitbucket that naturally lists your public projects. These sites encourage you to provide a readme file in the root of each project that describes what the project is all about. Use this feature to clearly describe the purpose and findings of each project. Don’t be afraid to include images, graphs, videos and links.

Provide unambiguous instructions for downloading the project and recreating the results (if there is code or experimentation involved). You want people to re-run your work, so make it as easy as possible (i.e. type this to download, then type this to build and run it).
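
As one way to make results easy to recreate, a project could expose a single entry-point script that fixes its random seed and writes its results to a file. The sketch below is a minimal illustration only: the file names run.py and results.json, the function run_experiment and the placeholder scores are all hypothetical, and a real project would replace the placeholder with its actual training and evaluation code.

```python
# run.py (hypothetical name): a single entry point for recreating results.
import json
import random


def run_experiment(seed: int = 42, n_trials: int = 5) -> dict:
    """Placeholder for a real experiment: returns seeded pseudo-random scores."""
    # A local generator keeps the results independent of global random state.
    rng = random.Random(seed)
    scores = [round(rng.uniform(0.7, 0.9), 4) for _ in range(n_trials)]
    mean_score = round(sum(scores) / len(scores), 4)
    return {"seed": seed, "scores": scores, "mean_score": mean_score}


def main(path: str = "results.json") -> None:
    """Run the experiment and save the results where readers can inspect them."""
    results = run_experiment()
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    print(f"Wrote {path}")


if __name__ == "__main__":
    main()
```

With a script like this, the readme can tell readers to download the repository and run python run.py. Because the seed is fixed, repeated runs on the same Python version should write the same results.json.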

Curate Projects

You can slap together any old project on GitHub, but only include your best, clearest and most interesting work in your machine learning portfolio.

Curate your projects like a gallery. Choose those that best demonstrate your skills, interests and capabilities. Show off what you can do and what you have done. These ideas of self-promotion can feed back into the projects you might want to tackle. Be clear in your vision, where you want to be and what projects you want to tackle that will help you get there. Own the process.

Present Findings

Spend a lot of time writing up results. Explain how they relate to the aims of the project. Explain the impact they have in the domain or could have. List off opportunities for extensions that you would or could explore if you had another month or year to deep dive on the project.

Create tables, graphs and any other pretty pictures that help you tell your story. Write up your findings as a blog post. For bonus points, create a short screencast showing how you got the results, along with a small PowerPoint presentation explaining what they mean, and put it up on YouTube. This video can be embedded in your blog post and linked to from your project repository readme file.

Depending on the findings you have and how important they are to you (such as doing well in a Kaggle competition), you can consider creating a technical report and uploading it to Scribd, and uploading your slides to SlideShare.

Promote Your Work

You can share the details of each project as you finish it. You may be completing one per week, depending on the number of free hours you can find around study and/or work. Sharing links on social media such as Twitter, Facebook and Google+ is a good start.

I would urge you to add each project (or just your best projects) as “projects” on LinkedIn. LinkedIn supports the idea of projects, but you may have to create a job for them to be listed against. Consider using the name of your blog or your sole trader company, or invent a relevant job and title such as “Machine Learning Mastery” (wink) or “Self Education”.

Now that we have some ideas on how to make our portfolio shine and how to get the word out, we can look at where to find examples of machine learning portfolios.

The Trend Toward Machine Learning Portfolios

The idea of a code portfolio is not new; it was baked into GitHub. What is interesting is that in recent interviews with data scientists and managers, portfolios are being requested, even desired, along with participation in machine learning competitions and completion of online training.

Like sample code in programming interviews, machine learning portfolios are becoming a serious part of hiring.

Look for examples of good (or at least filled out) machine learning portfolios. Look for people doing well in machine learning competitions. They typically have an amazing collection of projects described on their blogs and in their public code repositories.

Look for contributors to open source machine learning projects. They can have amazing tutorials, applications and extensions to the software on their blogs and in their public code repositories.

Get started now. Dig up your projects and put them together in a story that explains your knowledge, interest or skills in machine learning.
