Matching Networks for One Shot Learning
1. Introduction
In this work, inspired by metric learning based on deep neural features and memory augment neural networks, authors propose matching networks that map a small labelled support set and an unlabelled example to its label. Then they define one-shot learning problems on vision and language tasks and obtain an improving one-shot accuracy on ImageNet and Omnight. The novelty of their work is twofold: at the modeling level, and at the training procedure.
2. Model
Their non-parametric approach to solving one-shot is based on two components. First, the model architecture follows recent advances in neural networks augmented with memory. Given a support set $S$, the model difines a function $c_S$(or classifier) for each $S$ Sencond, we employ a training strategy which is tailored for one-shot learning from the support set $S$

2.1 Model Architecture
Matching Networks are able to produce sensible test labels for unobserved classes without any changes to the network. We wish to map from a support set of $k$ examples of images-label pairs $S={(x_i,y_i)}_{i=1}^k$ to a classfier $c_S(\hat{x})$ which,given a test example $\hat{x}$, defines a probability distribution over outputs $\hat{y}$. Furthmore, difine the mapping $S\rightarrow c_S(\hat{x})$ to be $P(\hat{y} \mid \hat{x},S)$ where $P$ is parameterised by a neural network. Thus, When given a new support set of examples $S'$ from which to one-shot learn, we simply use the parametric neural network defined by $P$ to make predictions about the appropriate label $\hat{y}$ for each test example $\hat{x}$: $P(\hat{y} \mid \hat{x},S')$. In general, our predicted output class for a given input unseen example $\hat{x}$ and a support set $S$ becomes $arg \max_y P(y\mid \hat{x},S)$. The model in its simplest form computes $\hat{y}$ as follows:
$$ \hat{y}=\sum_{i=1}^k a(\hat{x},x_i)y_i $$
where $x_i,y_i$ are the samples and labels from the support set $S=\{(x_i,y_i)\}_{i=1}^k$, and $a$ is an attention mechanism. Here,the attention kernel function is the softmax over the cosine distance. $$ a(\hat{x},x_i)=\frac{e^{c(f(\hat{x}),g(x_i))}}{\sum_{j=1}^k e^{c(f(\hat{x}),g(x_j))}} $$ where embeding functions $f$ and $g$ are, actually, appropriate neural networks to embed $\hat{x}$ and $x_i$
2.2 Training Strategy
Let us define a tast $T$ as distribution over possible label sets $L$. To form an “episode” to compute gradients and update our model, we first sample $L$ from $T$(e.g.,$L$ could be the label set {cats; dogs}). We then use $L$ to sample the support set $S$ and a batch $B$ (i.e., both $S$ and $B$ are labelled examples of cats and dogs). The Matching Net is then trained to minimise the error predicting the labels in the batch B conditioned on the support set $S$. This is a form of meta-learning since the training procedure explicitly learns to learn from a given support set to minimise a loss over a batch. More precisely, the Matching Nets training objective is as follows:
$$ \theta = arg\max_{\theta}E_{L\sim T}\Big[E_{S\sim L,B\sim L}\Big[\sum_{(x,y)\in B}\log P_{\theta}(y\mid x,S)\Big]\Big] $$
Training $\theta$ with this objective function yields a model which works well when sampling $S'\sim T'$ from a different distribution of novel labels
3. Appendix
3.1 The Fully Conditional Embedding $f$
The embedding function for an example $\hat{x}$ in the batch $B$ is as follows:
$$ f(\hat{x},S)=attLSTM(f'(\hat{x}),g(S),K) $$
where $f'$ is a neural network. $K$ is the number of "processing" steps following work. $g(S)$ represents the embedding function $g$ applied to each element $x_i$ from the set $S$. Thus, the state after $k$ processing steps is as follows:
$$ \hat{h}_k,c_k = LSTM(f'(\hat{x}),[h_{k-1},r_{k-1}],c_{k-1}) $$
$$ h_k = \hat{h}_k+f'(\hat{x}) $$
$$ r_{k-1}=\sum_{i=1}^{|S|}a(h_{k-1},g(x_i))g(x_i) $$
$$ a(h_{k-1},g(x_i))=softmax(h_{k-1}^Tg(x_i)) $$
3.2 The Fully Conditional Embedding $g$
The encoding function for the elements in the support set $S$, $g(x_i,S)$ as a bidirectional LSTM. Let g'(x_i) be a neural network, then we difine $g(x_i,S)=\vec{h}_i+h_i^{\leftarrow}+g'(x_i)$ with:
$$ \vec{h}_i,\vec{c}_i=LSTM(g'(x_i),\vec{h}_{i-1},\vec{c}_{i-1}) $$
$$ h_i^{\leftarrow},c_i^{\leftarrow}=LSTM(g'(x_i),h_{i+1}^{\leftarrow},c_{i+1}^{\leftarrow}) $$
Reference: https://arxiv.org/abs/1606.04080
Matching Networks for One Shot Learning的更多相关文章
- (转)Paper list of Meta Learning/ Learning to Learn/ One Shot Learning/ Lifelong Learning
Meta Learning/ Learning to Learn/ One Shot Learning/ Lifelong Learning 2018-08-03 19:16:56 本文转自:http ...
- Multi-attention Network for One Shot Learning
Multi-attention Network for One Shot Learning 2018-05-15 22:35:50 本文的贡献点在于: 1. 表明类别标签信息对 one shot l ...
- (六)6.11 Neurons Networks implements of self-taught learning
在machine learning领域,更多的数据往往强于更优秀的算法,然而现实中的情况是一般人无法获取大量的已标注数据,这时候可以通过无监督方法获取大量的未标注数据,自学习( self-taught ...
- 论文笔记系列-Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves
I. 背景介绍 1. 学习曲线(Learning Curve) 我们都知道在手工调试模型的参数的时候,我们并不会每次都等到模型迭代完后再修改超参数,而是待模型训练了一定的epoch次数后,通过观察学习 ...
- CS229 6.11 Neurons Networks implements of self-taught learning
在machine learning领域,更多的数据往往强于更优秀的算法,然而现实中的情况是一般人无法获取大量的已标注数据,这时候可以通过无监督方法获取大量的未标注数据,自学习( self-taught ...
- 零样本学习 - (Zero shot learning,ZSL)
https://zhuanlan.zhihu.com/p/41846072 https://zhuanlan.zhihu.com/p/38418698 https://zhuanlan.zhihu.c ...
- 18 Issues in Current Deep Reinforcement Learning from ZhiHu
深度强化学习的18个关键问题 from: https://zhuanlan.zhihu.com/p/32153603 85 人赞了该文章 深度强化学习的问题在哪里?未来怎么走?哪些方面可以突破? 这两 ...
- (zhuan) Where can I start with Deep Learning?
Where can I start with Deep Learning? By Rotek Song, Deep Reinforcement Learning/Robotics/Computer V ...
- Few-Shot/One-Shot Learning
Few-Shot/One-Shot Learning指的是小样本学习,目的是克服机器学习中训练模型需要海量数据的问题,期望通过少量数据即可获得足够的知识. Matching Networks for ...
随机推荐
- Java String对象的问题 String s="a"+"b"+"c"+"d"
1, String s="a"+"b"+"c"+"d"创建了几个对象(假设之前串池是空的) 2,StringBuilde ...
- 微信小程序精品demo
http://www.jianshu.com/p/0ecf5aba79e1 感谢笔者的分享!
- 查询linux计算机的出口ip
执行以下命令即可: [root@tkafka ~]# curl http://members.3322.org/dyndns/getip 123.103.9.7 碰到的场景: 微信公众号需要配置ip白 ...
- [SQL]T-Sql 递归查询(给定节点查所有父节点、所有子节点的方法)
T-Sql 递归查询(给定节点查所有父节点.所有子节点的方法) -- 查找所有父节点with tab as( select Type_Id,ParentId,Type_Name from Sys_ ...
- base64 压缩上传上传图片
@{ ViewBag.Title = "dddddddd"; Layout = "~/Areas/Wap/Views/Shared/_Head.cshtml"; ...
- Android 开发 框架系列 Android-Universal-Image-Loader 图片加载使用demo
Android-Universal-Image-Loader github地址:https://github.com/nostra13/Android-Universal-Image-Loader 加 ...
- RMI(远程方法调用)入门
这两篇可以入门 http://www.cnblogs.com/ninahan0419/archive/2009/06/25/javarmi.html http://www.cnblogs.com/wx ...
- PHP获取手机型号
<?php $user_agent = $_SERVER['HTTP_USER_AGENT']; if (stripos($user_agent, "iPhone") ...
- 系统计划 Cron作业 为什么/etc/crontab /etc/cron.d /etc/cron.* 那么多的定义方式????
当我们要增加全局性的计划任务时,一种方式是直接修改/etc/crontab.但是,一般不建议这样做,/etc/cron.d目录就是为了解决这种问题而创建的.例如,增加一项定时的备份任务,我们可以这样处 ...
- HttpWebResponse远程服务器返回错误: (500) 内部服务器错误 的解决办法
在工作中用C#开发了一个小程序,不断访问去请求一个网站的页面,在循环过程中有时会报“远程服务器返回错误: (500) 内部服务器错误”,有时不会,出现的时机也不太一样.开始以为是网站的问题,后来网站是 ...