Source: arXiv (Artificial Intelligence), 2016 (a year on and still not accepted anywhere?)

Motivation

Use a GAN built from RNNs to model continuous sequential data, and train it to generate classical music.

Introduction

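"In this work, we investigate the feasibility of using adversarial training for a sequential model with continuous data, and evaluate it using classical music in freely available midi files." In other words, a GAN built from RNNs is used to handle the continuous data in MIDI files. RNNs are mostly used for sequential data such as natural language, and they have also been brought into music generation [1,2,3], "but to our knowledge they always use a symbolic representation. In contrast, our work demonstrates how one can train a highly flexible and expressive model with fully continuous sequence data for tone lengths, frequencies, intensities, and timing."

The authors also make a point of mentioning LapGAN, which generates images coarse-to-fine. (Personal thought: this is quite suggestive for music generation. So is the two-stage GAN for generating images from captions, where one stage produces a low-resolution image with rough shapes and colors and the other fills in the details. These ideas should carry over to music generation.)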

Model

Both G and D in the adversarial network are RNN models. The loss functions are defined as:
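
For reference, the losses in the C-RNN-GAN paper take the standard GAN form, as best I recall. Written out in LaTeX, with m the minibatch size, z^{(i)} the random input sequences and x^{(i)} the real training sequences (check the exact notation against the paper before relying on it):

```latex
L_G = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)

L_D = \frac{1}{m} \sum_{i=1}^{m} \left[ -\log D\left(x^{(i)}\right) - \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]
```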

The input to each cell in G is a random vector, concatenated with the output of the previous cell. D is a bidirectional recurrent network (LSTM). Each data point is a 4-tuple of tone length, frequency, intensity, and time, which lets the data represent polyphonous chords.

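Both G and D use 2 LSTM layers. The baseline is a single RNN generator with the adversarial part removed. The training set consists of classical music files in standard MIDI format downloaded from the web. All "note on" events are read and recorded, together with the other attributes of each note (time delay, tone, intensity, and so on). Code: https://github.com/olofmogren/c-rnn-gan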

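Several small tricks are used during training:

L2 regularization is applied to the weights of both G and D.
The model was pretrained for 6 epochs with a squared error loss for predicting the next event in the training sequence.
The input to each LSTM cell is a random vector v, concatenated with the output at the previous time step. v is uniformly distributed in [0, 1]^k, and k was chosen to be the number of features in each tone, 4.
During pretraining the length of the sampled sequences is scheduled: training starts with short sequences and gradually moves to long ones.
The freezing trick from [4] is used: when D or G becomes so strong that the other side's gradient vanishes and training stalls, the overly strong side is frozen. Here, A is frozen whenever A's training loss is less than 70% of the training loss of B.
The feature matching trick from [4] is used: G's objective is replaced with minimizing the difference between the features of real and generated samples, where the features R are the output of D's last layer, before the logistic activation.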
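
The freezing rule and the feature matching objective can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names and the example numbers are made up, and the loss is written as the batch mean of the squared distance between R(x) and R(G(z)), which is my reading of the description above rather than code from the repository.

```python
def should_freeze(own_loss, opponent_loss, ratio=0.7):
    """Return True when a network is 'too strong' and its update should be skipped.

    Rule from the notes: freeze A when A's loss is below 70% of B's loss.
    """
    return own_loss < ratio * opponent_loss


def feature_matching_loss(real_features, fake_features):
    """Batch mean of the squared distance between D's features for real and fake samples."""
    assert len(real_features) == len(fake_features)
    total = 0.0
    for real, fake in zip(real_features, fake_features):
        total += sum((r - f) ** 2 for r, f in zip(real, fake))
    return total / len(real_features)


# One hypothetical training step: decide which networks get updated.
d_loss, g_loss = 0.30, 0.90
update_d = not should_freeze(d_loss, g_loss)  # D is far ahead, so D is frozen
update_g = not should_freeze(g_loss, d_loss)  # G keeps training

# Two real and two generated feature vectors (made-up numbers).
real = [[1.0, 2.0], [0.0, 1.0]]
fake = [[0.0, 2.0], [0.0, 3.0]]
fm = feature_matching_loss(real, fake)
```

In a real training loop the two flags would gate the optimizer steps for D and G, and `fm` would replace the usual generator loss.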

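Evaluation metrics

Polyphony: how often two or more tones start at exactly the same point in time.

Scale consistency was computed by counting the fraction of tones that were part of a standard scale, and reporting the number for the best matching such scale. (What exactly counts as a standard scale here?)

Repetitions: the number of repeated bars.

Tone span: the interval between the highest and the lowest tone.

The code for the evaluation tools is also on GitHub.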
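
As a rough illustration of two of these metrics, here is a minimal sketch of scale consistency and tone span over MIDI pitch numbers. It assumes that "standard scale" means the major and natural minor scales on all 12 roots; the scale set actually used by the authors' evaluation tool may differ, and the function names are hypothetical.

```python
# Scale shapes as semitone offsets from the root (assumed set of "standard" scales).
MAJOR = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR = (0, 2, 3, 5, 7, 8, 10)


def scale_consistency(pitches, scale_shapes=(MAJOR, NATURAL_MINOR)):
    """Fraction of tones that belong to the best-matching scale."""
    if not pitches:
        return 0.0
    best_hits = 0
    for shape in scale_shapes:
        for root in range(12):
            members = {(root + step) % 12 for step in shape}
            hits = sum(1 for p in pitches if p % 12 in members)
            best_hits = max(best_hits, hits)
    return best_hits / len(pitches)


def tone_span(pitches):
    """Distance in semitones between the highest and the lowest tone."""
    return max(pitches) - min(pitches) if pitches else 0


c_major_tune = [60, 62, 64, 65, 67, 69, 71, 72]   # C major scale, one octave
chromatic_run = [60, 61, 62, 63, 64, 65, 66, 67]  # no single scale fits all of it

consistency_major = scale_consistency(c_major_tune)
consistency_chromatic = scale_consistency(chromatic_run)
span = tone_span(c_major_tune)
```

A melody that stays in one key scores 1.0, while a chromatic run scores lower because no single scale contains all of its tones.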

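Conclusion

This is the first paper to generate music through GAN adversarial training. Judged by ear, the music generated by C-RNN-GAN is nowhere near the real samples, probably because it relies on adversarial training alone, uses a single track, and incorporates no prior music-theory knowledge.

Listen to samples: http://mogren.one/publications/2016/c-rnn-gan/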

[1] Douglas Eck and Juergen Schmidhuber. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, pages 747–756. IEEE, 2002.

[2] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1159–1166, 2012.

[3] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. SeqGAN: Sequence generative adversarial nets with policy gradient. arXiv preprint arXiv:1609.05473, 2016.

[4] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.

Code analysis

Saved parameters that are restored:

'num_layers_g': number of layers in G's RNN cell

'num_layers_d': number of layers in D's RNN cell

'meta_layer_size':

'hidden_size_g':

'hidden_size_d':

'biscale_slow_layer_ticks':

'multiscale':

'disable_feed_previous':

'pace_events':

'minibatch_d':

'unidirectional_d':

'feature_matching':

'composer': which composer's style from the training set to train on, e.g. Bach or Beethoven

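If do-not-redownload.txt exists, no new MIDI files are downloaded.

The read_data function returns entries in the format [genre, composer, song_data].

A sources table is organized here, keyed by genre and composer.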

After python-midi reads a file into midi_pattern, every event of every track is visited, and the NoteOnEvent and NoteOffEvent events are used to record four values for each note:

TICKS_FROM_PREV_START = 0

LENGTH = 1

FREQ = 2

VELOCITY = 3

Finally, all the notes of a song are collected into a song_data list. Each [genre, composer, song_data] entry holds the feature data of one song, and these entries are appended to loader.songs['validation'], loader.songs['test'], and loader.songs['train'].
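
The conversion from note events to the four-value representation can be sketched without python-midi. In the sketch below, events are hypothetical (abs_tick, kind, pitch, velocity) tuples rather than python-midi objects, and the frequency uses the standard equal-temperament formula with MIDI pitch 69 at 440 Hz. Whether the repository converts pitch to frequency in exactly this way is an assumption.

```python
TICKS_FROM_PREV_START, LENGTH, FREQ, VELOCITY = 0, 1, 2, 3


def pitch_to_freq(pitch):
    """Equal-temperament conversion: MIDI pitch 69 is 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def events_to_song_data(events):
    """Turn (abs_tick, kind, pitch, velocity) events, kind 'on' or 'off', into notes."""
    open_notes = {}  # pitch -> index in song_data of the note still sounding
    starts = []      # absolute start tick of each note
    song_data = []
    for abs_tick, kind, pitch, velocity in sorted(events, key=lambda e: e[0]):
        if kind == 'on' and velocity > 0:
            open_notes[pitch] = len(song_data)
            starts.append(abs_tick)
            song_data.append([0, 0, pitch_to_freq(pitch), velocity])
        elif pitch in open_notes:
            # A note-off (or a note-on with velocity 0) closes the open note.
            index = open_notes.pop(pitch)
            song_data[index][LENGTH] = abs_tick - starts[index]
    previous_start = 0
    for note, start in zip(song_data, starts):
        note[TICKS_FROM_PREV_START] = start - previous_start
        previous_start = start
    return song_data


# A two-note chord followed by a single A4 (made-up example).
events = [
    (0, 'on', 60, 90), (0, 'on', 64, 90),
    (480, 'off', 60, 0), (480, 'off', 64, 0),
    (480, 'on', 69, 100), (960, 'off', 69, 0),
]
song_data = events_to_song_data(events)
```

Notes of a chord share a start time, so the second note of the chord gets TICKS_FROM_PREV_START equal to 0, which is how this representation expresses polyphony.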

The placeholders for the training data are:

self._input_songdata = tf.placeholder(shape=[batch_size, songlength, num_song_features], dtype=data_type())

self._input_metadata = tf.placeholder(shape=[batch_size, num_meta_features], dtype=data_type())

songdata_inputs turns _input_songdata into a list of songlength tensors, each of shape [batch_size, num_song_features] (tf.unstack would probably be more convenient here; still to be tested):

songdata_inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(self._input_songdata, songlength, 1)]
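
What this split-and-squeeze does to the shapes can be shown with nested Python lists standing in for tensors. This is a sketch only: `unstack_time_axis` is a hypothetical helper, and it mirrors what `tf.unstack(self._input_songdata, axis=1)` should return, which I have not verified against the repository.

```python
def unstack_time_axis(batch):
    """[batch_size][songlength][num_features] -> list of songlength items,
    each of shape [batch_size][num_features]."""
    songlength = len(batch[0])
    return [[song[t] for song in batch] for t in range(songlength)]


# batch_size = 2, songlength = 3, num_song_features = 2 (made-up values)
batch = [
    [[1, 10], [2, 20], [3, 30]],  # song 0
    [[4, 40], [5, 50], [6, 60]],  # song 1
]
steps = unstack_time_axis(batch)
```

Each element of `steps` holds one time step for the whole batch, which is the form the per-step RNN loop consumes.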

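When the model is built for training, an L2 regularizer is used to avoid overfitting: scope.set_regularizer(tf.contrib.layers.l2_regularizer(scale=FLAGS.reg_scale))

Building G's LSTM network (a multi-layer LSTM):

The input noise random_rnninputs has shape [batch_size, songlength, int(FLAGS.random_input_scale*num_song_features)] and is then converted to a list (unstack?).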

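G's RNN is then run step by step, one time step per loop iteration. The input at each step is the noise random_rnninput together with the previous step's output generated_point. The two are concatenated into one tensor of shape [batch_size, 2*num_song_features] (when FLAGS.random_input_scale is 1), and the "previous output" for the first step is sampled from a uniform distribution.

G also has a pretraining pass, whose input at step i is the noise random_rnninputs[i] and the real sample songdata_inputs[i].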
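
The two ways of driving G can be sketched with a toy cell in place of the LSTM stack. Everything here is a stand-in: `toy_cell` just averages its two input halves so the shapes and the data flow are visible, and the sketch handles a single sequence rather than a batch.

```python
import random

NUM_FEATURES = 4


def toy_cell(cell_input, state):
    """Hypothetical stand-in for the LSTM stack plus output projection."""
    assert len(cell_input) == 2 * NUM_FEATURES
    noise, previous = cell_input[:NUM_FEATURES], cell_input[NUM_FEATURES:]
    output = [0.5 * (n + p) for n, p in zip(noise, previous)]
    return output, state + 1


def run_generator(noise_steps, first_input, real_steps=None):
    """Free-running when real_steps is None; otherwise the real sample is fed at each step."""
    state = 0
    fed = first_input
    outputs = []
    for i, noise in enumerate(noise_steps):
        if real_steps is not None:
            fed = real_steps[i]  # pretraining: feed the real sample
        output, state = toy_cell(noise + fed, state)  # list concatenation
        outputs.append(output)
        fed = output  # free-running: feed back the generated point
    return outputs


rng = random.Random(0)
songlength = 3
noise_steps = [[rng.random() for _ in range(NUM_FEATURES)] for _ in range(songlength)]
first_input = [rng.random() for _ in range(NUM_FEATURES)]  # uniform initial "previous output"
real_steps = [[float(i)] * NUM_FEATURES for i in range(songlength)]

free_running = run_generator(noise_steps, first_input)
pretraining = run_generator(noise_steps, first_input, real_steps=real_steps)
```

The only difference between the two modes is what gets concatenated with the noise: the generator's own previous output, or the real sample.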

The pretraining loss for G is the L2 distance (mean squared difference). Note the stacking of the list and the [1, 0, 2] transpose:

self.rnn_pretraining_loss = tf.reduce_mean(tf.squared_difference(x=tf.transpose(tf.stack(self._generated_features_pretraining), perm=[1, 0, 2]), y=self._input_songdata))

A regularization term is added to prevent overfitting:

self.rnn_pretraining_loss = self.rnn_pretraining_loss+reg_loss
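
Why the transpose is needed becomes clear with nested lists in place of tensors: the per-step outputs are time-major, while the input placeholder is batch-major. The sketch below uses hypothetical helper names and a made-up reg_loss value, and only mirrors the shape handling of the TensorFlow expression above.

```python
def stack_and_transpose(per_step_outputs):
    """List of songlength items [batch][features] -> [batch][songlength][features],
    the nested-list equivalent of tf.stack followed by perm=[1, 0, 2]."""
    batch_size = len(per_step_outputs[0])
    return [[step[b] for step in per_step_outputs] for b in range(batch_size)]


def mean_squared_difference(a, b):
    """Mean of the element-wise squared differences of two equally shaped nested lists."""
    diffs = [(x - y) ** 2
             for song_a, song_b in zip(a, b)
             for step_a, step_b in zip(song_a, song_b)
             for x, y in zip(step_a, step_b)]
    return sum(diffs) / len(diffs)


generated_steps = [   # songlength = 2, batch_size = 2, features = 1
    [[1.0], [3.0]],
    [[2.0], [4.0]],
]
input_songdata = [    # batch_size = 2, songlength = 2, features = 1
    [[1.0], [2.0]],
    [[3.0], [6.0]],
]
reg_loss = 0.5        # hypothetical regularization term

generated = stack_and_transpose(generated_steps)
pretraining_loss = mean_squared_difference(generated, input_songdata) + reg_loss
```

Without the transpose, step t of song b would be compared with step b of song t, and the loss would be meaningless.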

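D uses a multi-layer (two-layer) bidirectional LSTM. Because of TensorFlow version issues, I rewrote the multi-layer LSTM interface:

Two things to watch out for. (1) Each call to bidirectional_dynamic_rnn automatically increments the index in the name scope, so the layer number is used to pin down the scope. (This cost me a whole day. Am I that bad, or is TF that full of pitfalls?)

(2) The input _inputs to each layer is the concatenation of the output tuple that holds the fw and bw tensors. Each tensor has shape [batch_size, song_length, output_dim], where output_dim equals the number of LSTM hidden units (the state size). After concatenation the shape is [batch_size, song_length, 2×output_dim].

D then passes the bidirectional LSTM output through a fully connected layer (output num = 1) and a sigmoid to obtain the real/fake probability. It also exposes the output as features, which take part in the feature loss computation.

Loss computation: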

Paper notes: "C-RNN-GAN: Continuous recurrent neural networks with adversarial training"
