Transfer Learning

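Two types of transfer learning:
- ConvNet as fixed feature extractor: take a ConvNet (e.g. AlexNet, VGGNet) pretrained on a large dataset (e.g. ImageNet), remove the last few layers (usually the final classifier), and use the rest of the ConvNet as a fixed feature extractor for the new dataset. The output features are called CNN codes. If the activations were ReLUd in the pretrained network, the codes must be ReLUd too (important for performance). Once the CNN codes have been extracted for all images, train a linear classifier (Linear SVM or Softmax classifier) on the new dataset.
- Fine-tuning the ConvNet: step 1, replace the classifier on top of the pretrained ConvNet and retrain it on the new dataset; step 2, continue backpropagation with a small learning rate to fine-tune the weights of the pretrained network. There are two options: fine-tune all layers of the ConvNet, or keep some earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network.
- Rationale: it is generally held that the early layers of a CNN (close to the input image) extract basic features such as texture and color, and that the later a layer sits, the more high-level, abstract, and task-specific its features become. A more common fine-tuning recipe is therefore: replace the last few layers of the pretrained network, retrain only those layers on the new dataset while the earlier layers stay fixed and act as a feature extractor, then train the whole network with a smaller learning rate.
- Some open-source pretrained models: the Caffe Model Zoo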
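The fixed-feature-extractor recipe above can be sketched with NumPy alone. This is a toy illustration, not a real pipeline: `W_frozen` is a random matrix standing in for a pretrained ConvNet body, and `extract_codes`, `train_softmax`, and the two-blob dataset are hypothetical names and data made up for the example. The point is only that the extractor weights never change while a linear softmax classifier is trained on the ReLUd codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a single frozen layer stands in for the pretrained ConvNet body.
# In practice these weights would be loaded from a real pretrained checkpoint.
W_frozen = rng.normal(size=(20, 64)) / np.sqrt(20)

def extract_codes(x):
    """Frozen forward pass. The ReLU mirrors the thresholding used in 'pretraining'."""
    return np.maximum(x @ W_frozen, 0.0)

def train_softmax(codes, labels, num_classes, lr=0.02, epochs=300):
    """Train only a linear softmax classifier on top of the fixed codes."""
    n, d = codes.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = codes @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # gradient of mean cross-entropy
        W -= lr * (codes.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy "new dataset": two Gaussian blobs in the input space.
x = np.vstack([rng.normal(-1.0, 1.0, size=(100, 20)),
               rng.normal(+1.0, 1.0, size=(100, 20))])
y = np.array([0] * 100 + [1] * 100)

frozen_before = W_frozen.copy()
codes = extract_codes(x)                              # extract codes once, up front
W, b = train_softmax(codes, y, num_classes=2)
accuracy = float(((codes @ W + b).argmax(axis=1) == y).mean())
```

Because the extractor is fixed, the codes can be computed once and cached, which is why this approach is cheap compared with fine-tuning.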

When and how to fine-tune?
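- Four main scenarios:
- New dataset is small and similar to the original dataset: overfitting on the small dataset is a concern, so train a linear classifier on the CNN codes.
- New dataset is large and similar to the original dataset: overfitting is less of a concern, so try fine-tuning the whole network.
- New dataset is small and very different from the original dataset: train only a linear classifier; because the data differs so much, it might work better to train the SVM classifier from activations somewhere earlier in the network.
- New dataset is large and very different from the original dataset: initialize with the pretrained weights and fine-tune the whole network on the new dataset.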
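The four scenarios amount to a small decision table. The helper below is a hypothetical illustration (the function name and the returned strings are made up for this note); it encodes the rules of thumb as stated and nothing more.

```python
def choose_strategy(dataset_is_large: bool, similar_to_original: bool) -> str:
    """Map the two factors (size, similarity) to the rule of thumb from the notes."""
    if not dataset_is_large and similar_to_original:
        return "train a linear classifier on the top-layer CNN codes"
    if dataset_is_large and similar_to_original:
        return "fine-tune through the full network"
    if not dataset_is_large and not similar_to_original:
        return "train a linear classifier on activations from an earlier layer"
    # Large and very different: training from scratch is affordable, but
    # initializing from pretrained weights is often still beneficial.
    return "initialize from pretrained weights and fine-tune the full network"

# Example: a small set of microscope images, very unlike ImageNet.
suggestion = choose_strategy(dataset_is_large=False, similar_to_original=False)
```

These are heuristics, so real projects usually still compare a couple of options on a validation set.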

Practical advice
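- A few additional things to keep in mind when performing Transfer Learning:
- Constraints from pretrained models: using a pretrained network constrains the architecture you can use for the new dataset; for example, you can't arbitrarily take out Conv layers from the pretrained network.
- Learning rates: when fine-tuning the ConvNet weights (which are already relatively good), use a smaller learning rate than for the new linear classifier, whose weights are randomly initialized.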
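The learning-rate advice can be sketched as SGD with one learning rate per parameter group. Everything here is an assumption for illustration: the group names `backbone` and `classifier`, the shapes, the all-ones gradients, and the values `1e-4` and `1e-2` are placeholders, not recommended settings.

```python
import numpy as np

def sgd_step(params, grads, lrs):
    """One SGD step in which each named parameter group has its own learning rate."""
    return {name: params[name] - lrs[name] * grads[name] for name in params}

rng = np.random.default_rng(0)
params = {
    "backbone": rng.normal(size=(4, 4)),           # stands in for pretrained ConvNet weights
    "classifier": rng.normal(size=(4, 3)) * 0.01,  # new head, randomly initialized
}
# Identical dummy gradients, so any difference in the update comes from the rates.
grads = {name: np.ones_like(value) for name, value in params.items()}
lrs = {"backbone": 1e-4, "classifier": 1e-2}       # illustrative values only

updated = sgd_step(params, grads, lrs)
backbone_change = np.abs(updated["backbone"] - params["backbone"]).max()
classifier_change = np.abs(updated["classifier"] - params["classifier"]).max()
```

With the same gradient, the pretrained weights move 100 times less than the new classifier, which is the intended effect: the good weights are not distorted while the head is still far from converged.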

Original English text:

(These notes are currently in draft form and under development)

Table of Contents:

● Transfer Learning

● Additional References

Transfer Learning

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:

● ConvNet as fixed feature extractor. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer’s outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features CNN codes. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.

● Fine-tuning the ConvNet. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it’s possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier layers of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In the case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.

● Pretrained models. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.
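As a rough sketch of the fine-tuning strategy with some earlier layers kept fixed, the toy NumPy network below freezes its first layer and updates only the later layer and a new classifier head. The layer names `early`, `late`, and `head`, the random stand-in "pretrained" weights, the toy data, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained" weights; a real network would load these from a checkpoint.
params = {
    "early": rng.normal(size=(10, 16)) * 0.3,   # generic features: kept fixed
    "late": rng.normal(size=(16, 16)) * 0.3,    # higher-level features: fine-tuned
    "head": np.zeros((16, 2)),                  # new classifier for the new dataset
}
trainable = {"early": False, "late": True, "head": True}

def loss_and_grads(x, y, p):
    """Forward pass, softmax cross-entropy loss, and gradients for every layer."""
    h1 = np.maximum(x @ p["early"], 0.0)
    h2 = np.maximum(h1 @ p["late"], 0.0)
    logits = h2 @ p["head"]
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), y]).mean()
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dh2 = (dlogits @ p["head"].T) * (h2 > 0)    # backprop through the second ReLU
    dh1 = (dh2 @ p["late"].T) * (h1 > 0)        # backprop through the first ReLU
    grads = {"head": h2.T @ dlogits, "late": h1.T @ dh2, "early": x.T @ dh1}
    return loss, grads

# Toy "new dataset": two Gaussian blobs.
x = np.vstack([rng.normal(-1.0, 1.0, size=(50, 10)),
               rng.normal(+1.0, 1.0, size=(50, 10))])
y = np.array([0] * 50 + [1] * 50)

early_before = params["early"].copy()
late_before = params["late"].copy()
initial_loss, _ = loss_and_grads(x, y, params)
for _ in range(100):
    _, grads = loss_and_grads(x, y, params)
    for name in params:
        if trainable[name]:                     # frozen layers are simply skipped
            params[name] = params[name] - 0.05 * grads[name]
final_loss, _ = loss_and_grads(x, y, params)
```

Flipping `trainable["early"]` to `True` turns this into fine-tuning of all layers, the other option described above.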

When and how to fine-tune? How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:

● New dataset is small and similar to original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.

● New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.

● New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.

● New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.

Practical advice. There are a few additional things to keep in mind when performing Transfer Learning:

● Constraints from pretrained models. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can’t arbitrarily take out Conv layers from the pretrained network. However, some changes are straight-forward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: For example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.

● Learning rates. It’s common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don’t wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).
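The claim under "Constraints from pretrained models" that an FC layer is equivalent to a Convolutional Layer with a 6x6 receptive field and zero padding can be checked numerically. The sketch below is only a demonstration: `conv_valid` is a deliberately naive helper written for this note, and the channel count `C = 8` and output size `K = 5` are small illustrative values, not AlexNet's real dimensions.

```python
import numpy as np

def conv_valid(x, filters):
    """Naive convolution with no padding and stride 1.
    x: (H, W, C); filters: (K, fh, fw, C); returns (H-fh+1, W-fw+1, K)."""
    H, W, C = x.shape
    K, fh, fw, _ = filters.shape
    out = np.empty((H - fh + 1, W - fw + 1, K))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + fh, j:j + fw, :]
            out[i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
C, K = 8, 5                                # small illustrative sizes
W_fc = rng.normal(size=(K, 6 * 6 * C))     # FC layer over a flattened 6x6xC volume
filters = W_fc.reshape(K, 6, 6, C)         # the same weights viewed as K filters of size 6x6

x = rng.normal(size=(6, 6, C))
fc_out = W_fc @ x.reshape(-1)              # ordinary fully-connected forward pass
conv_out = conv_valid(x, filters)          # one spatial position: shape (1, 1, K)

# On a larger input the converted layer yields a spatial map of FC outputs.
x_big = rng.normal(size=(8, 8, C))
conv_big = conv_valid(x_big, filters)      # shape (3, 3, K)
```

On the 6x6 input the two outputs agree, and on the 8x8 input each position of `conv_big` equals the FC layer applied to the corresponding 6x6 crop, which is what allows a pretrained network to run on images of a different spatial size.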

Additional References

● CNN Features off-the-shelf: an Astounding Baseline for Recognition trains SVMs on features from ImageNet-pretrained ConvNet and reports several state of the art results.

● DeCAF reported similar findings in 2013. The framework in this paper (DeCAF) was a Python-based precursor to the C++ Caffe library.

● How transferable are features in deep neural networks? studies the transfer learning performance in detail, including some unintuitive findings about layer co-adaptations.

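Transfer learning
- Transfer learning reference: the CS231n course notes
- Transfer learning in TensorFlow: How to Retrain Inception's Final Layer for New Categories
- In practice, you usually don't train a huge neural network yourself. Many models have already been trained for weeks on large datasets such as ImageNet.
- Learn to use the pretrained VGGNet model to classify images of flowers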
- VGGNet is great because it's simple and has great performance, coming in second in the ImageNet competition. The idea here is that we keep all the convolutional layers, but replace the final fully connected layers with our own classifier. This way we can use VGGNet as a feature extractor for our images then easily train a simple classifier on top of that. What we'll do is take the first fully connected layer with 4096 units, including thresholding with ReLUs. We can use those values as a code for each image, then build a classifier on top of those codes.
- Using VGGNet to classify images of flowers: tensorflow-vgg on GitHub (TensorFlow VGG16 and VGG19)
