5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics

Where does theory fit into a top-down approach to studying machine learning?

In the traditional approach to teaching machine learning, theory comes first, and you need an extensive background in mathematics just to understand it. In my approach to teaching machine learning, I start by teaching you how to work problems end-to-end and deliver results.

So where does the theory fit?

In this post you will discover what we really mean when we talk about “theory” in machine learning. Hint: it’s all about the algorithms.

You will discover that once you get skilled at working through problems and delivering results, you will develop a compulsion to dive deeper in order to gain better understanding and better results. Nobody will be able to hold you back.

Finally, you will discover 5 techniques that you can use when you are practicing machine learning on standard datasets to incrementally build up your understanding of machine learning algorithms.

How To Learn Machine Learning without the Math
Photo by Ed Brambley, some rights reserved

Learn Theory Last, Not First

The way machine learning is taught to developers is crap.

It is taught bottom-up. This is crap if you are a developer who is primarily interested in using machine learning as a tool to solve problems rather than being a researcher in the field.

The traditional approach requires that you learn all of the prerequisite mathematics like linear algebra, probability and statistics before learning the theory of algorithms. You’re lucky if you ever go near a working implementation of an algorithm or discuss how to work a problem end-to-end and deliver a working, reliable and accurate predictive model.

I teach a top-down approach to learning machine learning. In this approach we 1) learn a systematic process for working through problems end-to-end, 2) map the process onto “best of breed” machine learning tools and platforms, then 3) complete targeted practice on test datasets.

You can learn more about my approach to teaching top-down machine learning in the post “Machine Learning for Programmers: Leap from developer to machine learning practitioner”.

So where does theory fit into this process?

If the model is flipped, then theory is taught later. But what theory are we talking about and how exactly do you learn that theory when you are practicing on test datasets?

The Theory is Really All About Algorithms

The field of machine learning is theory-dense.

It’s dense because there is a tradition of describing and explaining concepts mathematically.

This is useful because mathematical descriptions can be very concise, cutting down on the ambiguity. They also lend themselves to analysis by leveraging the techniques from the context in which they are described (e.g. a probabilistic understanding of a process).

A lot of these tangential mathematical techniques are often bundled in with the description of machine learning algorithms. For someone who just wants to build a superficial understanding of a method to be able to configure and apply it, this feels overwhelming. Frustratingly so.

It is frustrating if you do not have the grounding to be able to parse and understand the description of an algorithm. It’s frustrating because in a field like computer science, algorithms are described all the time, but those descriptions are intended for fast comprehension (e.g. for desk checking) and implementation.

We know, for example, that when learning what a hash table is and how to use it, we almost never need to know the specifics of the hashing function in our day-to-day work. But we also know what a hashing function is, and where to go to learn more about hashing function specifics and how to write our own. Why can’t machine learning work like that?

The bulk of the “theory” one encounters in machine learning is related to machine learning algorithms. If you ask any beginner about why they are frustrated with the theory, you will learn that it is in relation to learning how to understand or use a specific machine learning algorithm.

Here, “algorithms” means more than a process for creating a predictive model. It also refers to algorithms for selecting features, engineering new features, transforming data and estimating the accuracy of a model on unseen data (e.g. cross validation).

So learning theory last really means learning about machine learning algorithms.

A Compulsion To Dive Into Theory

I generally advise targeted practice on well known machine learning datasets.

This is because well known machine learning datasets, like those on the UCI Machine Learning Repository, are easy to work with. They are small, so they fit into memory and can be processed on your workstation. They are also well studied and understood, so you have a baseline for comparison.

You can learn more about targeted practice of machine learning datasets in the post “Practice Machine Learning with Small In-Memory Datasets from the UCI Machine Learning Repository”.

Understanding machine learning algorithms fits into this process. The reason is that in the pursuit of getting results with standard machine learning algorithms, you are going to run into limitations. You are going to want to know how to get more out of a given algorithm, how best to configure it, or how it actually works.

This curiosity and need to know more will drive you into studying the theory of machine learning algorithms. You will be compelled to piece together an understanding of the algorithms in order to achieve better results.

We see this same effect in young developers from varied backgrounds who eventually end up studying the code of open source projects, textbooks and even research papers in order to hone their craft. The need to be a better, more capable programmer drives them to it.

If you are curious and motivated to succeed, you cannot resist studying the theory.

5 Techniques To Understand Machine Learning Algorithms

The time will come to dive into machine learning algorithms as part of your targeted practice.

When that time comes, there are a number of techniques and templates that you can use to shortcut the process.

In this section you will discover 5 techniques that you can use to understand the theory of machine learning algorithms, fast.

1) Create Lists of Machine Learning Algorithms

When you are just starting out you may feel overwhelmed by the large number of algorithms available.

Even when spot testing algorithms, you may be unsure of which algorithms to include in your mix (hint, be diverse).

An excellent trick you can use when starting out is to keep track of the algorithms you read about. These lists can be as simple as the name of the algorithm, and can increase in complexity as your interest and curiosity build.

Capture details like the problem type to which they are suited (classification or regression), related algorithms, and taxonomic class (decision tree, kernel, etc.). When you see the name of an algorithm that is new to you, add it to your list. When you start a new problem, try some algorithms you have never used before. Mark a check next to algorithms you have used before. And so on.

Controlling the names of algorithms in lists gives you power. This ridiculously simple tactic can help you get on top of the overwhelm. Examples of where your simple algorithm lists can save you a lot of time and frustration are:

  • Ideas of algorithms to try on new and different problem types (time series, rating systems, etc.)
  • Algorithms that you can investigate to learn more about how to apply.
  • Get a handle on algorithm types by category (trees, kernels, etc.).
  • Avoid the problem of fixating on a favorite algorithm.

Start by creating lists of algorithms. Open a spreadsheet and get started.
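If you would rather keep your list in code than in a spreadsheet, here is a minimal sketch of what that might look like. The `AlgorithmEntry` record, its field names and the two helper functions are my own hypothetical choices for illustration, and the category labels are just one possible taxonomy.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmEntry:
    """One row of a personal algorithm list (hypothetical layout)."""
    name: str
    problem_types: list          # e.g. ["classification", "regression"]
    category: str                # taxonomic class, e.g. "decision tree"
    related: list = field(default_factory=list)
    used: bool = False           # the "check mark" for algorithms you have tried


algorithm_list = [
    AlgorithmEntry("Random Forest", ["classification", "regression"],
                   "decision tree", related=["Bagging", "CART"], used=True),
    AlgorithmEntry("Support Vector Machine", ["classification", "regression"],
                   "kernel"),
    AlgorithmEntry("k-Nearest Neighbors", ["classification", "regression"],
                   "instance-based", used=True),
    AlgorithmEntry("Naive Bayes", ["classification"], "bayesian"),
]


def untried(entries, problem_type):
    """Names of algorithms suited to a problem type that you have not used yet."""
    return [e.name for e in entries
            if problem_type in e.problem_types and not e.used]


def by_category(entries):
    """Group algorithm names by taxonomic class."""
    groups = {}
    for e in entries:
        groups.setdefault(e.category, []).append(e.name)
    return groups


print(untried(algorithm_list, "classification"))
print(by_category(algorithm_list))
```

Asking the list for untried algorithms at the start of each new problem is one simple way to avoid fixating on a favorite.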

See the post “Take Control By Creating Targeted Lists of Machine Learning Algorithms” for more information on this tactic.

2) Research Machine Learning Algorithms

When you want to know more about a machine learning algorithm you need to research it.

The main reasons you will want to research an algorithm are to learn how to configure it and to learn how it works.

Research is not just for academics. A few simple tips can take you a long way in gathering information on a given machine learning algorithm.

The key is diversity of information sources. The following is a short list of the types of sources you can consult for information on an algorithm you are researching.

  1. Authoritative sources like textbooks, lecture notes, slides and overview papers.
  2. Seminal sources like the papers and articles in which the algorithm was first described.
  3. Leading-edge sources that describe state-of-the-art extensions and experiments on the algorithm.
  4. Heuristic sources like those that come out of machine learning competitions, posts on Q&A websites and conference papers.
  5. Implementation sources such as open source code for tools and libraries, blog posts and technical reports.

You do not need to be a PhD researcher nor a machine learning algorithm expert.

Take your time and pick over many sources collecting facts on a machine learning algorithm you are trying to figure out. Focus on the practical details you can apply or understand and leave the rest.

For more information on researching machine learning algorithms see the post “How to Research a Machine Learning Algorithm”.

3) Create Your Own Algorithm Descriptions

Machine learning algorithm descriptions you will discover in your research will be incomplete and inconsistent.

An approach that you can use is to put together your own mini algorithm descriptions. This is another very simple and very powerful tactic.

You can design a standard algorithm description template with only those details that are useful to you in getting the most from algorithms, like algorithm usage heuristics, pseudo-code listings, parameter ranges and resource lists.

You can then use the same algorithm description template across a number of key algorithms and start to build up your own little algorithm encyclopedia that you can refer to on future projects.

Some questions you might like to use in your own algorithm description template include:

  • What are the standard abbreviations used for the algorithm?
  • What is the objective or goal for the algorithm?
  • What is the pseudo-code or flowchart description of the algorithm?
  • What are the heuristics or rules of thumb for using the algorithm?
  • What are useful resources for learning more about the algorithm?

You will be surprised at how useful and practical these descriptions can be. For example, I used this approach to write a book of nature-inspired algorithm descriptions that I still refer back to years later.
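As a rough sketch, the template questions above could be captured in a small record type that renders to plain text. The `AlgorithmDescription` class and its field names are hypothetical, and the k-nearest neighbors entry is only an abbreviated example of how you might fill one in. I have left the resources empty on purpose: record the sources you actually consult.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmDescription:
    """A personal algorithm description template (hypothetical layout)."""
    name: str
    abbreviations: list
    objective: str
    pseudocode: list             # one string per step
    heuristics: list             # rules of thumb for using the algorithm
    resources: list = field(default_factory=list)

    def render(self):
        """Return the description as plain text."""
        lines = [self.name, "=" * len(self.name)]
        lines.append("Abbreviations: " + ", ".join(self.abbreviations))
        lines.append("Objective: " + self.objective)
        lines.append("Pseudo-code:")
        lines += [f"  {i}. {step}" for i, step in enumerate(self.pseudocode, 1)]
        lines.append("Heuristics:")
        lines += [f"  - {h}" for h in self.heuristics]
        lines.append("Resources:")
        lines += [f"  - {r}" for r in self.resources] or ["  (none recorded yet)"]
        return "\n".join(lines)


knn = AlgorithmDescription(
    name="k-Nearest Neighbors",
    abbreviations=["kNN"],
    objective="Predict the label of a new point from the labels of the "
              "k most similar training points.",
    pseudocode=[
        "Compute the distance from the new point to every training point.",
        "Select the k training points with the smallest distance.",
        "Return the most common label among those k points.",
    ],
    heuristics=[
        "Rescale features so that no single feature dominates the distance.",
        "Try an odd k on binary problems to avoid tied votes.",
    ],
)

print(knn.render())
```

Reusing the same template for each algorithm is what makes the entries comparable later.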

For more on how to create effective algorithm description templates, see the post “How to Learn a Machine Learning Algorithm”.

For more information on my book of algorithms described using a standard algorithm description template, see “Clever Algorithms: Nature-Inspired Programming Recipes”.

4) Investigate Algorithm Behavior

Machine learning algorithms are complex systems that are sometimes best understood by their behaviors on actual datasets.

By designing small experiments on machine learning algorithms using small datasets, you can learn a lot about how an algorithm works, its limitations, and how to configure it in ways that may transfer to exceptional results on other problems.

A simple procedure that you can use to investigate a machine learning algorithm is as follows:

  1. Select an algorithm that you would like to know more about (e.g. random forests).
  2. Identify a question about that algorithm you would like answered (e.g. the effect of the number of trees).
  3. Design an experiment to find an answer to that question (e.g. try different numbers of trees on a few binary classification problems and chart the relationship with classification accuracy).
  4. Execute the experiment and write up your results so that you can make use of them in the future.
  5. Repeat the process.

This is one of the truly exciting aspects of applied machine learning, that through your own simple investigations you can achieve surprising and state of the art results.
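The procedure above can be sketched in a few lines of code. To keep the example self-contained I have swapped in a different algorithm and question than the random forest example: the algorithm is k-nearest neighbors, the question is the effect of “k” on accuracy, and the data is a synthetic two-class 2D problem that I made up for illustration. All function names, the train/test split sizes and the chosen values of k are assumptions, not a recommended protocol.

```python
import random
from collections import Counter


def make_blobs(n_per_class, rng):
    """Two overlapping 2D Gaussian classes (synthetic, for illustration only)."""
    data = []
    for label, center in ((0, (0.0, 0.0)), (1, (1.5, 1.5))):
        for _ in range(n_per_class):
            point = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k training points closest to `point`."""
    def squared_distance(row):
        (x, y), _ = row
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    correct = sum(knn_predict(train, p, k) == y for p, y in test)
    return correct / len(test)


def run_experiment(ks=(1, 3, 5, 11, 21), repeats=5, seed=1):
    """Mean hold-out accuracy for each k over several freshly drawn problems."""
    rng = random.Random(seed)    # fixed seed so the experiment is repeatable
    results = {k: [] for k in ks}
    for _ in range(repeats):
        data = make_blobs(100, rng)
        rng.shuffle(data)
        train, test = data[:140], data[140:]
        for k in ks:
            results[k].append(accuracy(train, test, k))
    return {k: sum(scores) / len(scores) for k, scores in results.items()}


results = run_experiment()
for k, acc in results.items():
    print(f"k={k:2d}  mean accuracy={acc:.3f}")
```

Chart or tabulate the printed numbers, write down what you see, then change one thing (the overlap of the classes, the training set size) and run it again.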

For more information on how to study algorithms from their behavior, see the post “How To Investigate Machine Learning Algorithm Behavior”.

5) Implement Machine Learning Algorithms

You cannot get more intimate with a machine learning algorithm than by implementing it.

In implementing a machine learning algorithm from scratch you will be confronted with the myriad micro-decisions that go into a given implementation. You may decide to cover some up with rules of thumb or expose them all as parameters to the user.

Below is a repeatable process that you can use to implement machine learning algorithms from scratch.

  1. Select a programming language, one that you are most familiar with is probably best.
  2. Select an algorithm to implement, start with something easy (see below for a list).
  3. Select a problem to test your implementation on as you develop, 2D data is good for visualizing (even in Excel).
  4. Research the algorithm and leverage many and diverse sources of information (e.g. read tutorials, papers, other implementations, and so on).
  5. Unit test the algorithm to confirm your understanding and validate the implementation.

Start small and build confidence.
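To give a feel for how small a first implementation can be, here is a minimal sketch of simple linear regression fit with stochastic gradient descent, followed by a unit-test style check. The function names, the learning rate, the number of epochs and the tiny noiseless dataset are all my own assumptions for illustration. A real implementation would confront many more of those micro-decisions.

```python
import random


def fit_sgd(xs, ys, learning_rate=0.02, epochs=2000, seed=0):
    """Fit y = b0 + b1 * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # visit the samples in a new order each epoch
        for i in order:
            error = (b0 + b1 * xs[i]) - ys[i]
            b0 -= learning_rate * error          # gradient with respect to b0
            b1 -= learning_rate * error * xs[i]  # gradient with respect to b1
    return b0, b1


def predict(coefs, x):
    """Predict y for a single x using fitted coefficients (b0, b1)."""
    b0, b1 = coefs
    return b0 + b1 * x


# Unit test on a problem with a known answer: noiseless data from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
coefs = fit_sgd(xs, ys)
assert abs(coefs[0] - 1.0) < 1e-3, "intercept should be close to 1"
assert abs(coefs[1] - 2.0) < 1e-3, "slope should be close to 2"
print("fitted intercept and slope:", coefs)
```

Testing against data where you already know the right coefficients is a cheap way to confirm both your understanding and your code before moving to a real dataset.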

For example, 3 algorithms that you could select for your first machine learning algorithm implementation from scratch are:

For more information on how to implement machine learning algorithms, see the post “How to Implement a Machine Learning Algorithm”.

Also see the posts:

Theory is Not Just For the Mathematicians

Machine learning is not just for the mathematical elite. You can learn how machine learning algorithms work and how to get the most from them without diving deep into multivariate statistics.

You do not need to be good at math.

As we saw in the techniques section, you can start with algorithm lists and transition deeper into algorithm research, descriptions and algorithm behavior.

You can go very far with these methods without diving much at all into the math.

You do not need to be an academic researcher.

Research is not just for academics. Anyone can read books and papers and compile their own understanding of a topic like a specific machine learning algorithm.

Your biggest breakthroughs will come when you take on the persona of “the scientist” and start experimenting on machine learning algorithms as though they were complex systems in need of study. You will discover all kinds of interesting quirks in behavior that may not even be documented.

Take Action

Pick one of the techniques listed above and get started.

I mean today, now.

Unsure where to start?

Here are 5 great ideas for where you could start:

  1. Make a list of 10 machine learning algorithms for classification.
  2. Find five books that give detailed descriptions of Random Forests.
  3. Create a five-slide presentation on Naive Bayes using your own algorithm description template.
  4. Open Weka and see how the “k” parameter affects accuracy of k-nearest neighbor on the iris flowers data set.
  5. Implement linear regression using stochastic gradient descent.

Did you take action? Enjoy this post? Leave a comment below.
