A Gentle Introduction to Transfer Learning for Deep Learning | 迁移学习
by Jason Brownlee on December 20, 2017 in Better Deep Learning
Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.
It is a popular approach in deep learning where pre-trained models are used as the starting point on computer vision and natural language processing tasks given the vast compute and time resources required to develop neural network models on these problems and from the huge jumps in skill that they provide on related problems.
In this post, you will discover how you can use transfer learning to speed up training and improve the performance of your deep learning model.
After reading this post, you will know:
- What transfer learning is and how to use it.
- Common examples of transfer learning in deep learning.
- When to use transfer learning on your own predictive modeling problems.
Let’s get started.
What is Transfer Learning?
Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a second related task.
Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting
— Page 526, Deep Learning, 2016.
Transfer learning is an optimization that allows rapid progress or improved performance when modeling the second task.
Transfer learning is the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.
— Chapter 11: Transfer Learning, Handbook of Research on Machine Learning Applications, 2009.
Transfer learning is related to problems such as multi-task learning and concept drift and is not exclusively an area of study for deep learning.
Nevertheless, transfer learning is popular in deep learning given the enormous resources required to train deep learning models or the large and challenging datasets on which deep learning models are trained.
Transfer learning only works in deep learning if the model features learned from the first task are general.
In transfer learning, we first train a base network on a base dataset and task, and then we repurpose the learned features, or transfer them, to a second target network to be trained on a target dataset and task. This process will tend to work if the features are general, meaning suitable to both base and target tasks, instead of specific to the base task.
— How transferable are features in deep neural networks?
This form of transfer learning used in deep learning is called inductive transfer. This is where the scope of possible models (model bias) is narrowed in a beneficial way by using a model fit on a different but related task.
Depiction of Inductive Transfer
Taken from “Transfer Learning”
How to Use Transfer Learning?
You can use transfer learning on your own predictive modeling problems.
Two common approaches are as follows:
- Develop Model Approach
- Pre-trained Model Approach
Develop Model Approach
- Select Source Task. You must select a related predictive modeling problem with an abundance of data where there is some relationship in the input data, output data, and/or concepts learned during the mapping from input to output data.
- Develop Source Model. Next, you must develop a skillful model for this first task. The model must be better than a naive model to ensure that some feature learning has been performed.
- Reuse Model. The model fit on the source task can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
- Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
Pre-trained Model Approach
- Select Source Model. A pre-trained source model is chosen from available models. Many research institutions release models on large and challenging datasets that may be included in the pool of candidate models from which to choose from.
- Reuse Model. The model pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the modeling technique used.
- Tune Model. Optionally, the model may need to be adapted or refined on the input-output pair data available for the task of interest.
This second type of transfer learning is common in the field of deep learning.
Examples of Transfer Learning with Deep Learning
Let’s make this concrete with two common examples of transfer learning with deep learning models.
Transfer Learning with Image Data
It is common to perform transfer learning with predictive modeling problems that use image data as input.
This may be a prediction task that takes photographs or video data as input.
For these types of problems, it is common to use a deep learning model pre-trained for a large and challenging image classification task such as the ImageNet 1000-class photograph classification competition.
The research organizations that develop models for this competition and do well often release their final model under a permissive license for reuse. These models can take days or weeks to train on modern hardware.
These models can be downloaded and incorporated directly into new models that expect image data as input.
Three examples of models of this type include:
For more examples, see the Caffe Model Zoo where more pre-trained models are shared.
This approach is effective because the images were trained on a large corpus of photographs and require the model to make predictions on a relatively large number of classes, in turn, requiring that the model efficiently learn to extract features from photographs in order to perform well on the problem.
In their Stanford course on Convolutional Neural Networks for Visual Recognition, the authors caution to carefully choose how much of the pre-trained model to use in your new model.
[Convolutional Neural Networks] features are more generic in early layers and more original-dataset-specific in later layers
— Transfer Learning, CS231n Convolutional Neural Networks for Visual Recognition
Transfer Learning with Language Data
It is common to perform transfer learning with natural language processing problems that use text as input or output.
For these types of problems, a word embedding is used that is a mapping of words to a high-dimensional continuous vector space where different words with a similar meaning have a similar vector representation.
Efficient algorithms exist to learn these distributed word representations and it is common for research organizations to release pre-trained models trained on very large corpa of text documents under a permissive license.
Two examples of models of this type include:
These distributed word representation models can be downloaded and incorporated into deep learning language models in either the interpretation of words as input or the generation of words as output from the model.
In his book on Deep Learning for Natural Language Processing, Yoav Goldberg cautions:
… one can download pre-trained word vectors that were trained on very large quantities of text […] differences in training regimes and underlying corpora have a strong influence on the resulting representations, and that the available pre-trained representations may not be the best choice for [your] particular use case.
— Page 135, Neural Network Methods in Natural Language Processing, 2017.
When to Use Transfer Learning?
Transfer learning is an optimization, a shortcut to saving time or getting better performance.
In general, it is not obvious that there will be a benefit to using transfer learning in the domain until after the model has been developed and evaluated.
Lisa Torrey and Jude Shavlik in their chapter on transfer learning describe three possible benefits to look for when using transfer learning:
- Higher start. The initial skill (before refining the model) on the source model is higher than it otherwise would be.
- Higher slope. The rate of improvement of skill during training of the source model is steeper than it otherwise would be.
- Higher asymptote. The converged skill of the trained model is better than it otherwise would be.
Three ways in which transfer might improve learning.
Taken from “Transfer Learning”.
Ideally, you would see all three benefits from a successful application of transfer learning.
It is an approach to try if you can identify a related task with abundant data and you have the resources to develop a model for that task and reuse it on your own problem, or there is a pre-trained model available that you can use as a starting point for your own model.
On some problems where you may not have very much data, transfer learning can enable you to develop skillful models that you simply could not develop in the absence of transfer learning.
The choice of source data or source model is an open problem and may require domain expertise and/or intuition developed via experience.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Books
Papers
- A survey on transfer learning, 2010.
- Chapter 11: Transfer Learning, Handbook of Research on Machine Learning Applications, 2009.
- How transferable are features in deep neural networks?
Pre-trained Models
- Oxford VGG Model
- Google Inception Model
- Microsoft ResNet Model
- Google’s word2vec Model
- Stanford’s GloVe Model
- Caffe Model Zoo
Articles
- Transfer learning on Wikipedia
- Transfer Learning – Machine Learning’s Next Frontier, 2017.
- Transfer Learning, CS231n Convolutional Neural Networks for Visual Recognition
- How does transfer learning work? on Quora
Summary
In this post, you discovered how you can use transfer learning to speed up training and improve the performance of your deep learning model.
Specifically, you learned:
- What transfer learning is and how it is used in deep learning.
- When to use transfer learning.
- Examples of transfer learning used on computer vision and natural language processing tasks.
A Gentle Introduction to Transfer Learning for Deep Learning | 迁移学习的更多相关文章
- paper 124:【转载】无监督特征学习——Unsupervised feature learning and deep learning
来源:http://blog.csdn.net/abcjennifer/article/details/7804962 无监督学习近年来很热,先后应用于computer vision, audio c ...
- 转:无监督特征学习——Unsupervised feature learning and deep learning
http://blog.csdn.net/abcjennifer/article/details/7804962 无监督学习近年来很热,先后应用于computer vision, audio clas ...
- [转] 无监督特征学习——Unsupervised feature learning and deep learning
from:http://blog.csdn.net/abcjennifer/article/details/7804962 无监督学习近年来很热,先后应用于computer vision, audio ...
- UFLDL(Unsupervised Feature Learning and Deep Learning)
UFLDL(Unsupervised Feature Learning and Deep Learning)Tutorial 是由 Stanford 大学的 Andrew Ng 教授及其团队编写的一套 ...
- Unsupervised Feature Learning and Deep Learning(UFLDL) Exercise 总结
7.27 暑假开始后,稍有时间,“搞完”金融项目,便开始跑跑 Deep Learning的程序 Hinton 在Nature上文章的代码 跑了3天 也没跑完 后来Debug 把batch 从200改到 ...
- deep learning 以及deep learning 常用模型和方法
首先为什么会有Deep learning,我们得到一个结论就是Deep learning需要多层来获得更抽象的特征表达. 1.Deep learning与Neural Network 深度学习是机器学 ...
- 使用深度学习的超分辨率介绍 An Introduction to Super Resolution using Deep Learning
使用深度学习的超分辨率介绍 关于使用深度学习进行超分辨率的各种组件,损失函数和度量的详细讨论. 介绍 超分辨率是从给定的低分辨率(LR)图像恢复高分辨率(HR)图像的过程.由于较小的空间分辨率(即尺寸 ...
- A beginner’s introduction to Deep Learning
A beginner’s introduction to Deep Learning I am Samvita from the Business Team of HyperVerge. I join ...
- What are some good books/papers for learning deep learning?
What's the most effective way to get started with deep learning? 29 Answers Yoshua Bengio, ...
随机推荐
- 十、cent OS开启APR模式报错:configure: error: Found APR 1.3.9. You need version 1.4.3 or newer installed
错误内容显示APR的版本过低,需要新版本 到http://apr.apache.org/download.cgi#apr1这个地址下载所需要的包apr-1.4.5.tar.gz apr-iconv-1 ...
- Java 基础 内部类
Java 基础 内部类 内部类(嵌套类) nested class 目的为外围类enclosing class提供服务. 四种: 静态成员类 static member class 非静态成员类 no ...
- react+javascript前端进阶
组合1: react技术栈(react(阮一峰react入门,官网教程).redux(阮一峰redux入门,官网教程).saga)+JS(ES6)+antd+you don`t know JS(上中下 ...
- Django—自定义分页
分页功能在每个网站都是必要的,对于分页来说,其实就是根据用户的输入计算出应该显示在页面上的数据在数据库表中的起始位置. 确定分页需求: 1. 每页显示的数据条数 2. 每页显示页号链接数 3. 上一页 ...
- Android DB类,支持MDB,SQLITE,SQLSERVER,支持查询、事务,对象直接插入和更新操作等
直做数据库,最近花了点时间把自己常用的东西封装在一起. DBHelper using System; using System.Collections.Generic; using System.Te ...
- Android PopupWindow显示位置设置
当点击某个按钮并弹出PopupWindow时,PopupWindow左下角默认与按钮对齐,但是如果PopupWindow是下图的那样,会发 生错位的情况,尤其是不同尺寸的平板上,那错位错的不是一般的不 ...
- ES6入门——函数的扩展
1.函数参数的默认值 在ES6之前,不能直接为函数的参数指定默认值,只能采用变通的方法.现在ES6可以为函数的参数添加默认值,简洁了许多. ES5 function show(a,b){ b = b ...
- Hadoop自定义JobTracker和NameNode管理页面
为了可以方便地在hadoop的管理界面(namenode和jobtracker)中自定义展示项,使用代理servlet的方式实现了hadoop的管理界面. 首先, 在org.apache.hadoop ...
- 新开篇关于vue
参考链接:http://cn.vuejs.org/v2/guide/instance.html 了解vue组件的生命周期: 1.beforeCreate 即将创建 2.created 创建 3.bef ...
- GIT团队合作探讨之三--使用分支
这篇文章是一个作为对git branch的综合介绍.首先,我们会看看创建branch,这有点像是请求一个新的项目历史.然后,我们看看git checkout是如何能够被用来选择一个branch,最后看 ...