A Summary of Multi-task Learning

Author: Yubo Feng.

Intro

The paper [0] introduces multi-task learning (MTL) as a response to data hunger, the most common problem of deep learning [1].

Basic assumption: tasks are related.

MTL mimics human learning, since people transfer knowledge from one task to another.

MTL is similar to transfer learning, multi-label learning, and multi-output regression. In my opinion, Dual Learning[2] is a subset of Multi-Task Learning.

Multi-task Learning

Definition 1 (Multi-task learning). Given \(m\) learning tasks \(\{T_i\}_{i=1}^m\), where the tasks are related but not identical, multi-task learning aims to improve the model for each \(T_i\) by using the knowledge contained in all \(m\) tasks.

Two foundations for MTL:

  1. tasks are related;
  2. the definition of a task determines the learning setting (supervised, unsupervised, and so on).

1. Multi-task Supervised Learning (MTSL)

Model Description:

  • Task \(T_i\) for \(i=1,...,m\)
  • Dataset \(D_i = \{(x_j^i, y_j^i)\}^{n_i}_{j=1}\)
  • \(x_j^i\) is a \(d\)-dimensional feature vector
  • \(y_j^i\) is the label for \(x_j^i\)
  • \(f_i(x_j^i)\) should be a good approximation of \(y_j^i\); \(f_i\) is used for prediction on the \(i\)-th task
  • Goal: learn \(\{f_i(x)\}^m_{i=1}\) for the \(m\) tasks
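
As a minimal sketch of the notation above, the snippet below stores one dataset per task and fits an independent linear predictor to each with least squares. The sizes, the random data, and the helper name `fit_linear` are illustrative assumptions; this is only the single-task baseline that MTSL methods try to improve on by sharing information across tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 3, 5                  # number of tasks, feature dimension
n = [20, 30, 25]             # n_i: number of examples in task i (made up)

# D_i = {(x_j^i, y_j^i)}: one (X_i, y_i) pair per task
datasets = []
for n_i in n:
    X_i = rng.normal(size=(n_i, d))   # rows are the feature vectors x_j^i
    y_i = rng.normal(size=n_i)        # labels y_j^i (regression targets here)
    datasets.append((X_i, y_i))

def fit_linear(X, y):
    """Least-squares fit of f(x) = w^T x + b for one task."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta[:-1], theta[-1]                    # (w, b)

# One predictor f_i per task, learned without any sharing
models = [fit_linear(X_i, y_i) for X_i, y_i in datasets]
```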

Task relatedness in three aspects:

  1. feature --> feature-based MTSL
  2. parameter --> parameter-based MTSL
  3. instance --> instance-based MTSL

1.1 Feature-based MTSL

This approach assumes that tasks share a feature representation, which can be a subset or a transformation of the original features.

It can learn a common feature representation for different tasks.

It is more suitable for applications whose original feature representation is not so informative and discriminative.

But it can easily be affected by outlier tasks.

It is complementary to parameter-based MTSL.

1.1.1 Feature Transformation Approach

There are two types of transformation, linear and nonlinear.

Feedforward neural network

A feedforward neural network performs a nonlinear transformation in general, and a linear one if the activation function is linear.

Linear Fitting

Two specific methods are multi-task feature learning (MTFL) and multi-task sparse coding (MTSC).

Generally speaking, both transform each data instance as \(\hat{x}_j^i = U^T x_j^i\) and then learn \(f_i(x_j^i) = (a^i)^T \hat{x}_j^i + b_i\). In other words, two successive linear transformations are applied to the features.
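
The two-step prediction can be sketched as follows. This is a minimal illustration, assuming a square orthogonal \(U\) (the MTFL case) built with a QR decomposition and random values for \(A\) and \(b\); nothing is learned here. It only shows that the two linear steps are equivalent to a single parameter matrix \(W = UA\).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 6, 4                    # feature dimension, number of tasks

# Shared transformation U: an orthogonal matrix obtained from a QR decomposition
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
A = rng.normal(size=(d, m))    # column a^i holds the weights of task i
b = rng.normal(size=m)         # per-task bias b_i

def predict(x, i):
    x_hat = U.T @ x                  # step 1: shared feature transformation
    return A[:, i] @ x_hat + b[i]    # step 2: task-specific linear model

# The two linear steps collapse into one matrix W = U A
W = U @ A
x = rng.normal(size=d)
same = np.isclose(predict(x, 2), W[:, 2] @ x + b[2])
```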

Differences between MTFL and MTSC:

  • MTFL
    • \(U\) is orthogonal
    • \(A\) is row-sparse via \(l_{2,1}\) regularization, where \(A=(a^1, ..., a^m)\)
  • MTSC
    • \(U\) is overcomplete
    • \(A\) is sparse via \(l_1\) regularization

1.1.2 Feature Selection Approach

This approach aims to learn \(f_i(x) = (w^i)^T x + b_i\), where \(w^i\) is the \(i\)-th column of \(W\). Two kinds of techniques are applied to \(W\): regularization and sparse probabilistic priors.

Regularization

Minimizing \(||W||_{p,q}\), that is, \(l_{p,q}\)-norm regularization, is the most widely used technique for constraining \(W\).

The effect of \(l_{p,q}\) regularization is to make \(W\) row-sparse and hence some unimportant features can be filtered.
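
To make the row-sparsity effect concrete, here is a small sketch that computes the \(l_{p,q}\) norm (the \(l_q\) norm of the row-wise \(l_p\) norms) and applies one shrinkage step for the \(l_{2,1}\) case. The shrinkage function is the standard proximal operator of the \(l_{2,1}\) norm; it is used here as an assumption for illustration, not necessarily the optimizer used by the methods in the survey. The matrix values and the names `lpq_norm` and `prox_l21` are made up.

```python
import numpy as np

def lpq_norm(W, p=2, q=1):
    """l_{p,q} norm: l_q norm of the vector of row-wise l_p norms."""
    row_norms = np.linalg.norm(W, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

def prox_l21(W, t):
    """Shrink each row of W toward zero; rows with l_2 norm below t vanish."""
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(row_norms, 1e-12))
    return scale * W

# Rows are features, columns are tasks
W = np.array([[3.0, 4.0],     # feature that matters for both tasks
              [0.1, -0.1],    # weak feature
              [0.0, 2.0]])
W_sparse = prox_l21(W, t=0.5)  # the weak feature's whole row becomes zero
```

Because a whole row is zeroed at once, the corresponding feature is dropped for every task simultaneously, which is what distinguishes this from ordinary element-wise sparsity.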

Sparse Probabilistic Priors

\(w_{ji} \sim GN(0, \rho_j, p)\), where \(GN(\cdot, \cdot, \cdot )\) denotes the generalized normal distribution.
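
For reference, the density of the generalized normal distribution can be evaluated as below. This sketch assumes the common parameterization with location `mu`, scale `rho`, and shape `p`; the function name `gn_pdf` is hypothetical. With \(p=2\) it reduces to a Gaussian, and with \(p=1\) to a Laplace distribution, whose negative log-density is an \(l_1\) penalty on the weight.

```python
import math
import numpy as np

def gn_pdf(w, mu=0.0, rho=1.0, p=2.0):
    """Density of GN(mu, rho, p): p / (2 rho Gamma(1/p)) * exp(-(|w - mu| / rho)^p)."""
    norm_const = p / (2.0 * rho * math.gamma(1.0 / p))
    return norm_const * np.exp(-(np.abs(w - mu) / rho) ** p)

# p = 2 with rho = sqrt(2) is the standard normal density
gauss_at_zero = gn_pdf(0.0, rho=math.sqrt(2.0), p=2.0)
# p = 1 with rho = 1 is the Laplace density 0.5 * exp(-|w|)
laplace_at_zero = gn_pdf(0.0, rho=1.0, p=1.0)
```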

1.1.3 Deep Learning Approach

  • tens or hundreds of hidden layers
  • treat the output of one hidden layer as the shared feature representation

BERT [3], one of the most influential recent advances in NLP, also uses the output of hidden layers as the shared feature representation to handle 11 NLP tasks.
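
A minimal sketch of this idea, often called hard parameter sharing, is a forward pass with one shared hidden layer and one small output head per task. The layer sizes, the ReLU activation, and the random weights are assumptions for illustration; no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, m = 8, 16, 3       # input dim, shared hidden units, number of tasks

# Shared hidden layer, used by every task
W_shared = rng.normal(scale=0.1, size=(d, h))
b_shared = np.zeros(h)
# Task-specific output heads (one column per task, scalar outputs here)
W_heads = rng.normal(scale=0.1, size=(h, m))
b_heads = np.zeros(m)

def shared_features(X):
    """Output of the shared hidden layer: the common feature representation."""
    return np.maximum(0.0, X @ W_shared + b_shared)   # ReLU activation

def predict(X, task):
    """Prediction for one task from the shared representation."""
    return shared_features(X) @ W_heads[:, task] + b_heads[task]

X = rng.normal(size=(5, d))
outputs = [predict(X, i) for i in range(m)]
```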

1.2 Parameter-based MTSL

It uses model parameters to relate the learning of different tasks.

It can learn more accurate model parameters.

It is more robust to outlier tasks than the feature-based model.

1.2.1 Low-rank Approach

Similar tasks usually have similar model parameters, so the parameter matrix \(W\) is likely to be low-rank. The lower the rank of \(W\), the stronger the linear dependence among the parameter vectors of the tasks.
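
The link between relatedness and rank can be checked numerically. In this sketch, which assumes that related tasks mix a small number of hypothetical "basis" parameter vectors, the resulting \(W\) has rank equal to the number of basis vectors, while unrelated random parameters give a full-rank matrix. The trace (nuclear) norm shown at the end is a commonly used convex surrogate for the rank; mentioning it here is an addition, not something stated in the notes above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, r = 10, 6, 2       # features, tasks, number of latent basis models

basis = rng.normal(size=(d, r))     # r underlying parameter vectors
coeff = rng.normal(size=(r, m))     # how each task mixes the basis vectors
W_related = basis @ coeff           # column i = parameters of task i
W_unrelated = rng.normal(size=(d, m))

rank_related = np.linalg.matrix_rank(W_related)
rank_unrelated = np.linalg.matrix_rank(W_unrelated)

# Trace (nuclear) norm: the sum of singular values
trace_norm = np.linalg.norm(W_related, ord="nuc")
```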

1.2.2 Task-Clustering Approach

It aims to divide tasks into several clusters and all the tasks in a cluster are assumed to share identical or similar model parameters.

There are several concrete ways to implement this approach.

TC algorithm

The TC algorithm has three steps:

  1. learn a model for each task separately under the single-task setting
  2. cluster the tasks based on the learned model parameters
  3. pool the training data of all the tasks in a task cluster to learn a more accurate model for that cluster
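
The three steps can be sketched as below. This is a toy illustration under stated assumptions: linear least-squares models, synthetic data from two hypothetical task groups, and a simple greedy distance-threshold clustering standing in for whatever clustering method the original algorithm uses.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 50
# Two hypothetical groups of tasks with different true parameters
true_w = [np.array([1.0, 0.0, 0.0, 0.0])] * 2 + [np.array([0.0, 0.0, 0.0, -1.0])] * 2

tasks = []
for w in true_w:
    X = rng.normal(size=(n, d))
    y = X @ w + 0.05 * rng.normal(size=n)
    tasks.append((X, y))

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: learn each task separately
W = np.stack([fit(X, y) for X, y in tasks])     # shape (m, d)

# Step 2: cluster tasks by their parameters (greedy threshold, an assumption)
def cluster(W, threshold=0.5):
    labels, centers = [], []
    for w in W:
        dists = [np.linalg.norm(w - c) for c in centers]
        if dists and min(dists) < threshold:
            labels.append(int(np.argmin(dists)))
        else:
            centers.append(w)
            labels.append(len(centers) - 1)
    return labels

labels = cluster(W)

# Step 3: pool the data inside each cluster and refit one model per cluster
cluster_models = {}
for c in set(labels):
    members = [i for i in range(len(tasks)) if labels[i] == c]
    X_pool = np.vstack([tasks[i][0] for i in members])
    y_pool = np.concatenate([tasks[i][1] for i in members])
    cluster_models[c] = fit(X_pool, y_pool)
```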

Bayesian Neural Network

The multi-task Bayesian neural network (BNN) has a structure similar to a multi-layer neural network.

It clusters tasks by placing a Gaussian mixture model on the model parameters.

The Dirichlet process is widely used in Bayesian learning to do data clustering.

Regularization

This family of methods decomposes the model parameters \(W\) and then regularizes the decomposed components.

1.2.3 Task-relation Learning Approach

It directly learns the pairwise task relations from data. Task relations reflect the task relatedness, so the approach relies on quantitative measures such as task similarities and task covariances.

There are serious practical difficulties, however: in real-world applications the task relations are hard to verify and prior information is difficult to obtain.
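
As a simple illustration of what a pairwise task-relation matrix looks like, the sketch below computes the cosine similarity between the parameter vectors of every pair of tasks. This is only a post-hoc proxy computed from already-known parameters, not the learning procedure itself, and the values in `W` are made up.

```python
import numpy as np

# Columns are hypothetical per-task parameter vectors (3 features, 3 tasks)
W = np.array([[1.0, 2.0, -1.0],
              [0.5, 1.0, -0.5],
              [-1.0, -2.0, 1.0]])

# Cosine similarity between the parameter vectors of every pair of tasks
norms = np.linalg.norm(W, axis=0, keepdims=True)   # shape (1, m)
similarity = (W.T @ W) / (norms.T @ norms)         # shape (m, m)
```

Here tasks 0 and 1 point in the same direction (similarity 1), while task 2 points the opposite way (similarity -1), which is the kind of negative relation that a task-covariance formulation can also express.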

1.2.4 Dirty Approach

This approach assumes that the parameter matrix \(W\) decomposes into two component matrices, each regularized by a different type of sparsity.

For example, it decomposes the parameter matrix as \(W=U+V\). The objective is then to minimize the training loss of \(U+V\) together with the two regularizers, \(L(U+V) + g(U) + h(V)\).

\(U\) mainly captures the relatedness among tasks, while \(V\) captures noise or outliers.
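
One possible instance of such an objective is sketched below, assuming a squared training loss on \(W = U + V\), an \(l_{2,1}\) penalty for \(g(U)\) (row-sparse shared structure), and an \(l_1\) penalty for \(h(V)\) (element-wise outliers). The specific penalties, the weights `lam_u` and `lam_v`, and the toy data are assumptions; the notes above do not fix them.

```python
import numpy as np

def dirty_objective(U, V, tasks, lam_u=0.1, lam_v=0.1):
    """Training loss of W = U + V plus one sparsity regularizer per component."""
    W = U + V                                   # column i = parameters of task i
    loss = sum(np.mean((X @ W[:, i] - y) ** 2) for i, (X, y) in enumerate(tasks))
    g_U = np.linalg.norm(U, axis=1).sum()       # l_{2,1}: row-sparse shared part
    h_V = np.abs(V).sum()                       # l_1: element-sparse outlier part
    return loss + lam_u * g_U + lam_v * h_V

# Toy example: both tasks use feature 0 (shared, in U);
# only task 1 also uses feature 1 (an outlier entry, in V)
U = np.array([[1.0, 1.0],
              [0.0, 0.0]])
V = np.array([[0.0, 0.0],
              [0.0, 0.5]])
tasks = [(np.eye(2), np.array([1.0, 0.0])),
         (np.eye(2), np.array([1.0, 0.5]))]
value = dirty_objective(U, V, tasks)
```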

1.2.5 Multi-level Approach

This approach generalizes the dirty approach: it decomposes the parameter matrix \(W\) into \(h\) component matrices \(\{W_i\}_{i=1}^{h}\), with \(h\) greater than 2, so that \(W = \sum_{i=1}^{h} W_i\).

It is capable of modeling more complex task structures than the dirty approach.

1.3 Instance-based MTSL

There are few works in this category.

It seems parallel to the other two categories.

2. Multi-task Unsupervised Learning (MTUL)

MTUL mainly focuses on multi-task clustering, but few studies on multi-task clustering exist, so it is an open opportunity for research.

My research domain is word representation learning, and word2vec is one of my baselines, so I keep thinking about how to enhance it by incorporating more knowledge.

3. Multi-task Semi-supervised Learning (MTSSL)

The core idea of semi-supervised learning is to use unlabeled data to improve the performance of supervised learning. MTSSL applies the same idea in the multi-task setting.

There are two types of MTSSL: classification and regression.

4. Multi-task Active Learning (MTAL)

Comparison of single-task active learning and MTAL:

  • Identical: both settings have only a small amount of labeled data
  • Difference: MTAL selects unlabeled instances that are informative for all the tasks instead of only one task
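
The selection rule can be sketched as follows, assuming binary classifiers, a pool of unlabeled instances shared by all tasks, and predictive entropy as the informativeness measure. These are illustrative choices; the probabilities and the function names are made up.

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli prediction; highest when p is near 0.5."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_instance(probs):
    """probs[i, j]: task i's predicted P(y=1) for unlabeled instance j."""
    total_uncertainty = binary_entropy(probs).sum(axis=0)   # sum over tasks
    return int(np.argmax(total_uncertainty))

probs = np.array([[0.50, 0.90, 0.60],     # task 0
                  [0.99, 0.90, 0.55]])    # task 1

chosen_multi = select_instance(probs)                        # all tasks together
chosen_single = int(np.argmax(binary_entropy(probs[0])))     # task 0 alone
```

In this toy example task 0 alone would query instance 0, but instance 2 is uncertain for both tasks, so the multi-task rule queries it instead.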

5. Multi-task Reinforcement Learning

Motivation: when environments are similar, different reinforcement learning tasks can use similar policies to make decisions.

6. Multi-task Online Learning

Multi-task online learning models handle a case that traditional MTL models cannot: training data that arrive sequentially.
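
A minimal sketch of the sequential setting is given below. It assumes linear models whose parameters split into a shared part plus a task-specific part, updated by one gradient step per incoming example. The decomposition, the learning rate, and the synthetic data stream are all assumptions for illustration, not a specific algorithm from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, lr = 3, 2, 0.02
w_shared = np.zeros(d)        # component shared by all tasks
V = np.zeros((m, d))          # task-specific components

# Hypothetical ground truth: two related tasks that differ in one coordinate
true_w = np.array([[1.0, -1.0, 0.5],
                   [1.0, -1.0, 0.0]])

def online_step(task, x, y):
    """Update the model with a single example, then discard the example."""
    err = (w_shared + V[task]) @ x - y
    step = lr * err * x
    w_shared[:] -= step       # every example updates the shared part
    V[task] -= step           # and the part belonging to its own task
    return err ** 2

losses = []
for t in range(2000):
    task = t % m              # examples arrive one at a time
    x = rng.normal(size=d)
    losses.append(online_step(task, x, true_w[task] @ x))
```

No example is stored or revisited, which is the property that separates this setting from batch MTL.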

7. Multi-task Multi-view Learning

Each data point can be described by several feature representations; each feature representation is called a view.

Each multi-view data point is usually associated with a label.

Applications in Natural Language Processing

Major NLP tasks include part-of-speech tagging, chunking, named entity recognition, semantic role labeling, language modeling, and finding semantically related words.

Conclusions

Almost all deep models simply share hidden layers across tasks, which is very useful when all the tasks are very similar.

Future work could focus on designing more flexible architectures that can tolerate dissimilar tasks.

Bibliography

[0] Zhang Y, Yang Q. An overview of multi-task learning[J]. National Science Review, 2018, 5(1): 30-43.

[1] Li H. Deep learning for natural language processing: advantages and challenges[J]. National Science Review, 2018, 5(1): 24-26.

[2] Xia Yingce. Theoretical and experimental research on dual learning[D]. University of Science and Technology of China, 2018.

[3] Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
