Macro-Micro Adversarial Network for Human Parsing
ECCV-2018 2018-10-27 15:15:07
Paper: https://arxiv.org/pdf/1807.08260.pdf
Code: https://github.com/RoyalVane/MMAN
Motivation-1: Why use an adversarial loss?
With CNN architectures, a pixel-wise classification loss is commonly used [19,34,10], which penalizes the classification error at each pixel. Although it provides an effective baseline, this loss, designed for per-pixel category prediction, has two drawbacks.
First, the pixel-wise classification loss may lead to local inconsistency, such as holes and blur. The reason is that it merely penalizes the false prediction at each pixel without explicitly considering the correlation among adjacent pixels.
Second, the pixel-wise classification loss may lead to semantic inconsistency in the overall segmentation map, such as unreasonable human poses and incorrect spatial relationships between body parts. Compared to local inconsistency, semantic inconsistency originates in deeper layers. Looking only at local regions, the learned model has no overall sense of the topology of body parts.
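To make the first drawback concrete, here is a toy sketch in plain Python (not the authors' code). It assumes a 2-class 4 x 4 label map and a fixed prediction confidence of 0.9; the helper names `pixelwise_ce` and `confident` are hypothetical. Two predictions with the same number of errors receive exactly the same per-pixel cross-entropy, whether the errors form a coherent shifted boundary or scattered holes.

```python
import math

def pixelwise_ce(probs, labels):
    """Mean per-pixel cross-entropy over an H x W map.

    probs[i][j] is a list of class probabilities for pixel (i, j);
    labels[i][j] is its ground-truth class id. Every pixel contributes
    -log p(true class) on its own; neighbouring pixels never interact.
    """
    terms = [-math.log(p[y])
             for prob_row, label_row in zip(probs, labels)
             for p, y in zip(prob_row, label_row)]
    # fsum is correctly rounded, so the result does not depend on summation order.
    return math.fsum(terms) / len(terms)

def confident(pred_map, conf=0.9):
    """Hypothetical helper: turn a 2-class predicted label map into probabilities."""
    return [[[conf, 1 - conf] if c == 0 else [1 - conf, conf] for c in row]
            for row in pred_map]

gt = [[0, 0, 1, 1]] * 4  # left half class 0, right half class 1

# A: boundary shifted by one column (4 errors, spatially coherent).
pred_a = [[0, 0, 0, 1]] * 4
# B: 4 errors scattered as isolated "holes" (locally inconsistent).
pred_b = [[1, 0, 1, 1],
          [0, 0, 1, 0],
          [0, 1, 1, 1],
          [0, 0, 1, 0]]

loss_a = pixelwise_ce(confident(pred_a), gt)
loss_b = pixelwise_ce(confident(pred_b), gt)
print(loss_a == loss_b)  # True: the loss ignores where the errors sit
```

Because the loss is a sum of independent per-pixel terms, it has no way to prefer the coherent error pattern over the hole-ridden one.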
To address these inconsistency problems, conditional random fields (CRFs) [17] can be employed as a post-processing step. However, because they rely on pairwise potentials, CRFs usually handle inconsistency only within a very limited (local) scope, and they may even produce worse label maps when the initial segmentation is poor. As an alternative to CRFs, a recent work proposes using an adversarial network [24]. Since the adversarial loss judges whether a label map is real or fake from the joint configuration of many label variables, it can enforce higher-level consistency that neither pairwise terms nor the per-pixel classification loss can achieve. An increasing number of works now combine the cross-entropy loss with an adversarial loss to produce label maps closer to the ground truth [5,27,12].
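The "cross-entropy plus adversarial loss" routine can be sketched as follows. This is a minimal illustration assuming the standard binary cross-entropy GAN objective with the non-saturating generator term; the weight `lam` and all function names are hypothetical, and the discriminator itself is abstracted away as the probability it assigns to a label map being real.

```python
import math

EPS = 1e-12

def bce(prob, target):
    """Binary cross-entropy of a discriminator probability against a 0/1 target."""
    prob = min(max(prob, EPS), 1.0 - EPS)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    # D should call ground-truth label maps real (1) and generated maps fake (0).
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(ce_loss, d_fake, lam=1.0):
    # Segmentation cross-entropy plus a non-saturating adversarial term that
    # rewards label maps the discriminator mistakes for ground truth.
    return ce_loss + lam * bce(d_fake, 1)

print(generator_loss(0.4, d_fake=0.5))   # 0.4 + log 2
print(generator_loss(0.4, d_fake=0.01))  # 0.4 + log 100
```

The second call shows why an overly confident discriminator is a problem: as `d_fake` approaches 0, the adversarial term the generator receives becomes large regardless of how good the cross-entropy term already is.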
Motivation-2: Why use two discriminators?
Nevertheless, previous adversarial networks also have limitations.
First, a single discriminator back-propagates only one adversarial loss to the generator. However, local inconsistency is generated in the top layers, while semantic inconsistency is generated in the deep layers; with only one adversarial loss, these two groups of layers cannot be trained separately.
Second, a single discriminator has to look at the whole high-resolution image (or a large part of it) to supervise global consistency. As noted in several works [7,14], it is very difficult for a generator to fool a discriminator on a high-resolution image. As a result, the single discriminator invariably back-propagates a maximal adversarial loss, which unbalances training. We call this the poor convergence problem, as shown in Fig. 2.
Our Proposed Approach:
In this paper, the basic objective is to improve the local and semantic consistency of label maps in human parsing. We adopt the idea of adversarial training while addressing its limitations, i.e., the limited ability of a single adversarial loss to improve parsing consistency, and the poor convergence problem. Specifically, we introduce the Macro-Micro Adversarial Nets (MMAN). MMAN consists of a dual-output generator (G) and two discriminators (D), named Macro D and Micro D. The three modules constitute two adversarial networks (Macro AN and Micro AN), which address semantic consistency and local consistency, respectively.
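The division of labour between the two adversarial networks can be sketched as a generator objective with one cross-entropy term per generator output and one adversarial term per discriminator. This is an assumption-laden sketch, not the paper's exact formulation: the weights `alpha` and `beta`, the per-patch averaging for Micro D, the non-saturating adversarial form and the function name `mman_generator_loss` are all illustrative choices.

```python
import math

EPS = 1e-12

def bce(prob, target):
    """Binary cross-entropy of a discriminator probability against a 0/1 target."""
    prob = min(max(prob, EPS), 1.0 - EPS)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def mman_generator_loss(ce_low, ce_high, d_macro_fake, d_micro_fake,
                        alpha=1.0, beta=1.0):
    """Hypothetical generator objective for a dual-output, two-discriminator setup.

    ce_low, ce_high: cross-entropy of the low- and high-resolution outputs.
    d_macro_fake:    Macro D's probability that the low-res map is real.
    d_micro_fake:    Micro D's per-patch probabilities on the high-res map.
    alpha, beta:     assumed weights of the two adversarial terms.
    """
    macro_adv = bce(d_macro_fake, 1)  # global / semantic consistency signal
    micro_adv = sum(bce(p, 1) for p in d_micro_fake) / len(d_micro_fake)  # local signal
    return ce_low + ce_high + alpha * macro_adv + beta * micro_adv

total = mman_generator_loss(0.5, 0.7, d_macro_fake=0.5, d_micro_fake=[0.5, 0.5])
print(total)  # 1.2 + 2 * log 2
```

Keeping the two adversarial terms separate is what lets each one carry its own signal, instead of a single adversarial loss that must cover both kinds of inconsistency at once.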
Differences from Previous Works:
A brief pipeline of the proposed framework is shown in Fig. 3. MMAN departs from previous works in two critical aspects.
First, our method explicitly handles the local and semantic inconsistency problems separately, using two task-specific adversarial networks.
Second, our method does not apply discriminators with large fields of view (FOVs) to high-resolution images, so it avoids the poor convergence problem. A more detailed description of the merits of the proposed network is provided in Section 3.5.
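The small-FOV idea can be illustrated by what each discriminator sees. In this sketch the sizes (a 64 x 64 label map, a downsampling factor of 8, and 16 x 16 patches) and the helper names are hypothetical. The macro input keeps the whole layout at low resolution, while each micro input keeps full resolution over a small window, so neither discriminator has to judge a large high-resolution image. In MMAN the low-resolution map comes from the generator's second output; nearest-neighbour downsampling is only a stand-in here, and explicit patches stand in for patch-level judgments.

```python
def downsample(label_map, factor):
    """Nearest-neighbour downsampling: keep every `factor`-th row and column."""
    return [row[::factor] for row in label_map[::factor]]

def split_patches(label_map, size):
    """Cut a label map into non-overlapping size x size patches (ragged edges dropped)."""
    h, w = len(label_map), len(label_map[0])
    return [[row[left:left + size] for row in label_map[top:top + size]]
            for top in range(0, h - size + 1, size)
            for left in range(0, w - size + 1, size)]

# Hypothetical 64 x 64 label map with a simple two-part layout.
label_map = [[0 if c < 32 else 1 for c in range(64)] for _ in range(64)]

macro_input = downsample(label_map, 8)       # whole layout, only 8 x 8 cells
micro_inputs = split_patches(label_map, 16)  # full resolution, 16 x 16 windows
print(len(macro_input), len(macro_input[0]))    # 8 8
print(len(micro_inputs), len(micro_inputs[0]))  # 16 16
```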
Our Contributions:
– We propose a new framework called Macro-Micro Adversarial Network (MMAN) for human parsing. The Macro AN and Micro AN focus on semantic and local inconsistency respectively, and work in a complementary way to improve parsing quality.
– The two discriminators in our framework provide local and global supervision on the label maps with small fields of view (FOVs), which avoids the poor convergence problem caused by high-resolution images.
– The proposed adversarial network achieves very competitive mIoU on the LIP and PASCAL-Person-Part datasets, and generalizes well to the relatively small PPSS dataset.