Goals for the lecture:

Introduction & overview of the key methods and developments.

[A good starting point for reading and understanding papers!]


Probabilistic Graphical Models | Elements of Meta-Learning

01 Intro to Meta-Learning

Motivation and some examples

When is standard machine learning not enough?

Standard ML finally works for well-defined, stationary tasks.



But what about the complex, dynamic world, with heterogeneous data from people and interactive robotic systems?

General formulation and probabilistic view

What is meta-learning?

Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:
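One common way to write this objective is the expected-risk form below. The symbols (\(f_\theta\) for the model, \(\ell\) for the loss, \(\mathcal{D}\) for the distribution over examples) are notation assumed here for illustration:

\[
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ \ell\big(f_\theta(x), y\big) \big]
\]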



Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.

A Toy Example: Few-shot Image Classification



Other (practical) Examples of Few-shot Learning







Gradient-based and other types of meta-learning

Model-agnostic Meta-learning (MAML)

  • Start with a common model initialization \(\theta\)
  • Given a new task \(T_i\), adapt the model using a gradient step:

  • Meta-training is learning a shared initialization for all tasks:
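In the standard MAML formulation (Finn et al., 2017), the adaptation step is \(\phi_i = \theta - \alpha \nabla_\theta L_{T_i}(\theta)\), and meta-training minimizes \(\sum_i L_{T_i}(\phi_i)\) with respect to \(\theta\). The following is a minimal sketch of these two steps, assuming toy linear-regression tasks so that the meta-gradient can be written in closed form. The task sampler, step sizes, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def loss_and_grad(theta, X, y):
    """Loss 0.5 * mean((X @ theta - y)^2) and its gradient w.r.t. theta."""
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def adapt(theta, X, y, alpha):
    """Inner step: phi = theta - alpha * grad L_train(theta)."""
    _, g = loss_and_grad(theta, X, y)
    return theta - alpha * g

def maml_meta_grad(theta, task, alpha):
    """Post-adaptation test loss and its exact gradient w.r.t. theta.

    For linear regression the Hessian of the training loss is constant,
    so d/dtheta L_test(phi) = (I - alpha * H_train) @ grad L_test(phi).
    """
    X_tr, y_tr, X_te, y_te = task
    phi = adapt(theta, X_tr, y_tr, alpha)
    test_loss, g_test = loss_and_grad(phi, X_te, y_te)
    H_train = X_tr.T @ X_tr / len(y_tr)
    return test_loss, (np.eye(len(theta)) - alpha * H_train) @ g_test

def sample_task(rng, dim=3, n_train=5, n_test=20):
    """Hypothetical task distribution: linear functions with nearby weights."""
    w = rng.normal(loc=2.0, scale=0.5, size=dim)
    X_tr = rng.normal(size=(n_train, dim))
    X_te = rng.normal(size=(n_test, dim))
    return X_tr, X_tr @ w, X_te, X_te @ w

def meta_train(steps=300, tasks_per_step=8, alpha=0.1, beta=0.05, seed=0):
    """Learn a shared initialization theta by descending the meta-objective."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    history = []
    for _ in range(steps):
        batch = [sample_task(rng) for _ in range(tasks_per_step)]
        results = [maml_meta_grad(theta, task, alpha) for task in batch]
        history.append(np.mean([loss for loss, _ in results]))
        theta -= beta * np.mean([grad for _, grad in results], axis=0)
    return theta, history
```

Calling `meta_train()` returns the learned initialization and the per-step average post-adaptation loss. With neural networks the Hessian term is obtained by automatic differentiation rather than in closed form.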



Does MAML Work?

MAML from a Probabilistic Standpoint

Training points:

Testing points:

MAML with log-likelihood loss:
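One way to write this probabilistic view, following the reading of MAML in Grant et al. (2018), is sketched below. The notation is assumed for illustration: \(\mathcal{D}_i^{\text{train}}\) and \(\mathcal{D}_i^{\text{test}}\) are the training and testing points of task \(T_i\), and \(\log p(\mathcal{D} \mid \theta)\) is shorthand for \(\sum_{(x, y) \in \mathcal{D}} \log p(y \mid x, \theta)\).

\[
\mathcal{D}_i^{\text{train}} = \{(x_j, y_j)\}_{j=1}^{N}, \qquad
\mathcal{D}_i^{\text{test}} = \{(x^*_m, y^*_m)\}_{m=1}^{M}
\]

\[
\max_{\theta} \; \sum_{i} \log p\big(\mathcal{D}_i^{\text{test}} \mid \phi_i\big),
\qquad
\phi_i = \theta + \alpha \nabla_\theta \log p\big(\mathcal{D}_i^{\text{train}} \mid \theta\big)
\]

With the loss defined as the negative log-likelihood, this is the same gradient step and meta-objective as in the MAML description above.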



One More Example: One-shot Imitation Learning

Prototype-based Meta-learning



Prototypes:



Predictive distribution:
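In prototypical networks (Snell et al., 2017), each class prototype is the mean embedding of that class's support examples, and the predictive distribution is a softmax over negative squared Euclidean distances to the prototypes. The sketch below assumes the embeddings have already been computed by some embedding network, which is not shown; the function names are hypothetical.

```python
import numpy as np

def prototypes(support_emb, support_labels, num_classes):
    """Class prototype = mean of the support embeddings with that label."""
    return np.stack([
        support_emb[support_labels == k].mean(axis=0)
        for k in range(num_classes)
    ])

def predictive_distribution(query_emb, protos):
    """p(y = k | x) proportional to exp(-||embedding(x) - prototype_k||^2)."""
    diffs = query_emb[:, None, :] - protos[None, :, :]
    logits = -np.sum(diffs ** 2, axis=-1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Each row of the returned array is a distribution over classes for one query point. Adapting to a new task only requires recomputing the prototypes, with no gradient step.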



Does Prototype-based Meta-learning Work?

Rapid Learning or Feature Reuse







Neural processes and relation of meta-learning to GPs

Drawing parallels between meta-learning and GPs

In few-shot learning:

  • Learn to identify functions that generated the data from just a few examples.
  • The function class and the adaptation rule encapsulate our prior knowledge.

Recall Gaussian Processes (GPs):

  • Given a few (x, y) pairs, we can compute the predictive mean and variance.
  • Our prior knowledge is encapsulated in the kernel function.
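The GP side of this parallel can be sketched with standard GP regression: given a few (x, y) pairs, the predictive mean and variance follow in closed form from the kernel. The RBF kernel, its length scale, and the noise level below are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-6, length_scale=1.0):
    """Posterior predictive mean and variance of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed with two triangular solves.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # The prior variance k(x, x) equals 1 for this kernel.
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)
    return mean, var
```

Near the observed points the variance shrinks toward zero; far from them the prediction falls back to the prior (mean 0, variance 1). Changing the kernel changes what "similar" means, which is the sense in which it encodes prior knowledge.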

Conditional Neural Processes







On software packages for meta-learning

There are many research code releases, but the code is often fragile and sometimes broken.

A few notable libraries that implement a few specific methods:



Takeaways

  • Many real-world scenarios require building adaptive systems and cannot be solved using the standard “learn-once” ML approach.
  • Learning-to-learn (or meta-learning) attempts to extend ML to rich multitask scenarios: instead of learning a function, learn a learning algorithm.
  • Two families of widely popular methods:
    • Gradient-based meta-learning (MAML and such)
    • Prototype-based meta-learning (Protonets, Neural Processes, ...)
    • Many hybrids, extensions, improvements (CAVIA, MetaSGD, ...)
  • Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
  • Meta-learning can be used as a mechanism for causal discovery (see Bengio et al., 2019).

02 Elements of Meta-RL

What is meta-RL and why does it make sense?

Recall the definition of learning-to-learn

Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:



Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description.



Meta reinforcement learning (RL): Given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.

Meta-learning for RL

On-policy and off-policy meta-RL

On-policy RL: Quick Recap



REINFORCE algorithm:
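REINFORCE estimates the policy gradient from sampled returns as \(\nabla_\theta J(\theta) \approx R \, \nabla_\theta \log \pi_\theta(a)\) and takes a gradient ascent step. The sketch below assumes the simplest possible setting, a multi-armed bandit with a softmax policy and a running-average baseline; the environment, learning rate, and baseline update are illustrative choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over arms with the REINFORCE update."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))  # policy logits
    baseline = 0.0                       # running estimate of average reward
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)
        reward = mean_rewards[action] + rng.normal(scale=0.1)
        # Gradient of log softmax: one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)
```

Because the data must be sampled from the current policy at every update, this is an on-policy method. The baseline does not change the expected gradient; it only reduces variance.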

On-policy Meta-RL: MAML (again!)

  • Start with a common policy initialization \(\theta\)
  • Given a new task \(T_i\), collect data using the initial policy, then adapt using a gradient step:

  • Meta-training is learning a shared initialization for all tasks:





Adaptation as Inference

Treat policy parameters, tasks, and all trajectories as random variables.



Meta-learning = learning a prior, and adaptation = inference.



Off-policy meta-RL: PEARL



Key points:

  • Infer latent representations z of each task from the trajectory data.
  • The inference network q is decoupled from the policy, which enables off-policy learning.
  • All objectives involve the inference and policy networks.
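As an illustration of the first point, the PEARL paper (Rakelly et al., 2019) models the posterior over the latent task variable z as a product of independent Gaussian factors, one per context transition. The sketch below assumes that the per-transition means and variances have already been produced by the inference network, which is not shown; the function name is hypothetical.

```python
import numpy as np

def product_of_gaussians(mus, sigmas_sq):
    """Combine per-transition Gaussian factors into one Gaussian over z.

    mus, sigmas_sq: arrays of shape (num_transitions, latent_dim).
    Precisions add, and the mean is the precision-weighted average.
    """
    precisions = 1.0 / sigmas_sq
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (mus * precisions).sum(axis=0)
    return mu, var
```

The result does not depend on the order of the transitions, so the context can be any set of off-policy data, and the posterior sharpens as more transitions are observed.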

Adaptation in nonstationary environments

Classical few-shot learning setup:

  • The tasks are i.i.d. samples from some underlying distribution.
  • Given a new task, we get to interact with it before adapting.
  • What if the environment is nonstationary (i.e., it changes over time)? Can we still use meta-learning?



Example: adaptation to a learning opponent

Each new round is a new task; a nonstationary environment is a sequence of tasks.

Continuous adaptation setup:

  • The tasks are sequentially dependent.
  • Meta-learn to exploit these dependencies.

Continuous adaptation

Treat policy parameters, tasks, and all trajectories as random variables.

RoboSumo: a multi-agent competitive environment

An agent competes against an opponent whose behavior changes over time.

Takeaways

  • The learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning.
  • Both on-policy and off-policy RL can be “upgraded” to meta-RL:
    • On-policy meta-RL is directly enabled by MAML.
    • Decoupling task inference and policy learning enables off-policy methods.
  • Is it about fast adaptation or learning good multitask representations? (See the discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
  • The probabilistic view of meta-learning makes it possible to use meta-learning ideas beyond distributions of i.i.d. tasks, e.g., for continuous adaptation.
  • Very active area of research.
