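Transfer Learning
- Two types of transfer learning:
- ConvNet as fixed feature extractor: take a ConvNet (e.g. AlexNet, VGGNet) pretrained on a large dataset (e.g. ImageNet), remove the last few layers (usually the final classifier), and use the rest as a fixed feature extractor for the new dataset. The output features are called CNN codes. If the activations were ReLUd in the pretrained network, the codes must be ReLUd as well (important for performance). After extracting all CNN codes, train a linear classifier (Linear SVM or Softmax classifier) on the new dataset.
- Fine-tuning the ConvNet: Step 1: replace the classifier on top of the pretrained ConvNet and retrain it on the new dataset. Step 2: continue backpropagation with a small learning rate to fine-tune the pretrained weights. There are two options: fine-tune all layers of the ConvNet, or keep some earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network.
- Rationale: the early layers of a CNN (close to the input image) are generally thought to extract basic features such as texture and color; the later the layer, the more high-level, abstract, and task-specific its features. A more common fine-tuning recipe is therefore: replace the last few layers of the pretrained network, train only those layers on the new dataset while all earlier parameters stay fixed (acting as a feature extractor), and then train the whole network with a smaller learning rate.
- Some open-source pretrained models: Model Zoo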
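- When and how to fine-tune?
- Four major scenarios:
- New dataset is small and similar to the original dataset: overfitting on the small dataset is a concern, so train a linear classifier on the CNN codes.
- New dataset is large and similar to the original dataset: overfitting is much less of a concern, so try fine-tuning the whole network.
- New dataset is small and very different from the original dataset: train only a linear classifier; because the dataset is so different, it might work better to train the SVM classifier from activations somewhere earlier in the network.
- New dataset is large and very different from the original dataset: initialize with the pretrained model's parameters and fine-tune the whole network on the new dataset.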
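- Practical advice
- A few additional things to keep in mind when performing Transfer Learning:
- Constraints from pretrained models: using a pretrained network constrains the architecture you can use for the new dataset; for example, you cannot arbitrarily take out Conv layers from the pretrained network.
- Learning rates: the ConvNet weights being fine-tuned are already relatively good, so their learning rate should be smaller than that of the new linear classifier, whose weights are randomly initialized.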
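The learning-rate advice in the last bullet can be sketched as a plain SGD step with one learning rate per parameter group. This is a minimal illustration, not any library's API: the parameter names, the gradient values, and the two learning rates (1e-4 and 1e-2) are all assumptions chosen for the example. Real frameworks offer the same idea as per-parameter-group options on their optimizers.

```python
def sgd_step(param_groups):
    """Apply one plain SGD update, using each group's own learning rate."""
    for group in param_groups:
        for name, grad in group["grads"].items():
            group["params"][name] -= group["lr"] * grad

# Hypothetical pretrained weights: already "relatively good", so use a small learning rate.
backbone = {"params": {"conv1.w": 0.80, "conv2.w": -0.35}, "grads": {}, "lr": 1e-4}
# Hypothetical new classifier: randomly initialized, so use a larger learning rate.
head = {"params": {"fc_new.w": 0.01}, "grads": {}, "lr": 1e-2}

# Pretend backpropagation produced a gradient of 1.0 for every weight.
backbone["grads"] = {name: 1.0 for name in backbone["params"]}
head["grads"] = {name: 1.0 for name in head["params"]}

before = {**backbone["params"], **head["params"]}
sgd_step([backbone, head])
after = {**backbone["params"], **head["params"]}

for name in before:
    print(f"{name}: moved by {abs(after[name] - before[name]):.6f}")
```

With identical gradients, the pretrained weights move 100 times less than the new classifier weight, which is the intent of the advice: the good weights are not distorted while the classifier above them is still far from converged.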
Original English text:
(These notes are currently in draft form and under development)
Table of Contents:
● Transfer Learning
● Additional References
Transfer Learning
In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:
● ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.
● Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.
● Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.
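The fixed-feature-extractor strategy from the list above can be sketched in plain Python. This is a toy illustration under stated assumptions: `FROZEN_W` is a hypothetical stand-in for the body of a pretrained ConvNet (a fixed random linear map followed by ReLU), the dataset is synthetic, and the linear classifier is a small logistic regression trained by batch gradient descent. Only the classifier weights are updated; the codes are extracted once and reused.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for a pretrained ConvNet body: a fixed linear map + ReLU.
# In a real setting these weights would come from a released checkpoint.
IN_DIM, CODE_DIM = 4, 8
FROZEN_W = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(CODE_DIM)]
frozen_snapshot = [row[:] for row in FROZEN_W]  # kept only to show nothing changes

def extract_code(x):
    """Return the ReLU'd 'CNN code' for one input; FROZEN_W is never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def make_toy_dataset(n):
    """Synthetic two-class data: the label is the sign of the first coordinate."""
    data = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in range(IN_DIM)]
        data.append((x, 1 if x[0] > 0 else 0))
    return data

def predict_proba(head_w, head_b, code):
    z = sum(w * c for w, c in zip(head_w, code)) + head_b
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(head_w, head_b, codes, labels):
    eps = 1e-12
    total = 0.0
    for c, y in zip(codes, labels):
        p = predict_proba(head_w, head_b, c)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(codes)

def train_head(codes, labels, epochs=200, lr=0.05):
    """Batch gradient descent on the linear classifier only."""
    head_w, head_b = [0.0] * CODE_DIM, 0.0
    n = len(codes)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * CODE_DIM, 0.0
        for c, y in zip(codes, labels):
            err = predict_proba(head_w, head_b, c) - y
            for j in range(CODE_DIM):
                grad_w[j] += err * c[j] / n
            grad_b += err / n
        head_w = [w - lr * g for w, g in zip(head_w, grad_w)]
        head_b -= lr * grad_b
    return head_w, head_b

dataset = make_toy_dataset(200)
codes = [extract_code(x) for x, _ in dataset]  # extracted once, then cached
labels = [y for _, y in dataset]

loss_before = mean_log_loss([0.0] * CODE_DIM, 0.0, codes, labels)
head_w, head_b = train_head(codes, labels)
loss_after = mean_log_loss(head_w, head_b, codes, labels)
print(f"log loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Fine-tuning would differ in one respect: after the classifier has been trained, gradients would also flow into the extractor weights, typically with a smaller learning rate.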
When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:
- New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
- New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
- New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
- New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
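The four rules of thumb above can be written down as a small lookup helper. This is only a restatement of the list, not a tuned policy: the function name is hypothetical, and deciding what counts as "large" or "similar" for a given dataset is left to the caller.

```python
def choose_transfer_strategy(dataset_is_large: bool, similar_to_original: bool) -> str:
    """Map the two factors (dataset size, similarity) to the suggested strategy."""
    if not dataset_is_large and similar_to_original:
        return "train a linear classifier on the CNN codes"
    if dataset_is_large and similar_to_original:
        return "fine-tune through the full network"
    if not dataset_is_large and not similar_to_original:
        return "train a linear classifier on activations from an earlier layer"
    return "initialize from pretrained weights and fine-tune the entire network"

for large in (False, True):
    for similar in (True, False):
        size = "large" if large else "small"
        sim = "similar" if similar else "very different"
        print(f"{size}, {sim}: {choose_transfer_strategy(large, similar)}")
```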
Practical advice. There are a few additional things to keep in mind when performing Transfer Learning:
● Constraints from pretrained models. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can’t arbitrarily take out Conv layers from the pretrained network. However, some changes are straightforward: due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: for example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x256]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.
● Learning rates. It’s common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don’t wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).
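The FC-to-convolution equivalence described under "Constraints from pretrained models" can be checked numerically. The sketch below uses deliberately tiny, assumed sizes (a 3x3x2 volume and 4 output units in place of the real pooled volume and 4096 units) and hypothetical names. It reshapes each FC weight vector into a filter of the same spatial size as the volume, then applies it with stride 1 and padding 0.

```python
import random

random.seed(0)

H = W = 3  # stands in for the spatial extent of the final pooling volume
C = 2      # stands in for its channel depth
K = 4      # stands in for the number of FC units

def random_volume(h, w, c):
    return [[[random.uniform(-1, 1) for _ in range(c)] for _ in range(w)] for _ in range(h)]

# FC weights: one flat weight vector of length H*W*C per output unit.
fc_weights = [[random.uniform(-1, 1) for _ in range(H * W * C)] for _ in range(K)]

def fc_forward(volume):
    """Flatten the H x W x C volume (row, column, channel order) and apply the FC layer."""
    flat = [v for row in volume for cell in row for v in cell]
    return [sum(w * x for w, x in zip(unit, flat)) for unit in fc_weights]

# The same numbers, reshaped into K filters of shape H x W x C.
conv_filters = [
    [[[unit[(i * W + j) * C + k] for k in range(C)] for j in range(W)] for i in range(H)]
    for unit in fc_weights
]

def conv_forward(volume):
    """Slide every filter over the volume with stride 1 and padding 0."""
    in_h, in_w = len(volume), len(volume[0])
    out = []
    for f in conv_filters:
        fmap = []
        for top in range(in_h - H + 1):
            row = []
            for left in range(in_w - W + 1):
                s = 0.0
                for i in range(H):
                    for j in range(W):
                        for k in range(C):
                            s += f[i][j][k] * volume[top + i][left + j][k]
                row.append(s)
            fmap.append(row)
        out.append(fmap)
    return out  # shape: K x out_h x out_w

# On an input of exactly H x W, the conv output is 1x1 per filter and matches the FC output.
x = random_volume(H, W, C)
fc_out = fc_forward(x)
conv_out = [fmap[0][0] for fmap in conv_forward(x)]
same = all(abs(a - b) < 1e-9 for a, b in zip(fc_out, conv_out))
print("FC and conv outputs match:", same)

# On a larger input, the converted layer yields a spatial map of FC outputs.
big = random_volume(5, 5, C)
big_out = conv_forward(big)
print("output map size per filter:", len(big_out[0]), "x", len(big_out[0][0]))
```

The second part is what makes the conversion useful: the same weights evaluate the former FC layer at every spatial position of a larger input in one pass.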
Additional References
● CNN Features off-the-shelf: an Astounding Baseline for Recognition trains SVMs on features from ImageNet-pretrained ConvNet and reports several state of the art results.
● DeCAF reported similar findings in 2013. The framework in this paper (DeCAF) was a Python-based precursor to the C++ Caffe library.
● How transferable are features in deep neural networks? studies the transfer learning performance in detail, including some unintuitive findings about layer co-adaptations.
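- Transfer learning
- Transfer learning reference: the CS231n course notes
- Implementing transfer learning in TensorFlow: How to Retrain Inception's Final Layer for New Categories
- In practice, people usually do not train large neural networks themselves. Many models that were trained for weeks on large datasets (such as ImageNet) are already available.
- Learn to classify flower images with the pretrained VGGNet model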
- VGGNet is great because it's simple and has great performance, coming in second in the ImageNet competition. The idea here is that we keep all the convolutional layers, but replace the final fully connected layers with our own classifier. This way we can use VGGNet as a feature extractor for our images then easily train a simple classifier on top of that. What we'll do is take the first fully connected layer with 4096 units, including thresholding with ReLUs. We can use those values as a code for each image, then build a classifier on top of those codes.
- using VGGNet to classify images of flowers:
- GitHub: tensorflow-vgg (TensorFlow VGG16 and VGG19)