Paper notes on "SONG FROM PI: A MUSICALLY PLAUSIBLE NETWORK FOR POP MUSIC GENERATION"

Venue: ICLR 2017

Motivation

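The paper proposes a general RNN-based model for pop music generation that encodes prior music theory knowledge in a hierarchical structure (prior knowledge about how pop music is composed). The bottom layers generate the melody, while the higher levels generate the drums, chords, and so on. In human listening tests, its output was preferred over that of a model proposed by Google. The author also builds two small applications on top of the model: neural dancing and karaoke, as well as neural story singing.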

Introduction

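The author opens with machine learning's spread into the arts: progress has already been made in imitating Van Gogh's painting style, generating stories, writing Shakespeare-style text, and so on, and music is one of these sub-fields. RNNs have their own strengths in natural-language text processing, so building music generation on top of them is feasible, as in [1,2,3,4]. Most of this earlier work, however, generates single-track notes; multi-track generation is studied in [5] (polyphonic music). The author wants to generate the melody, chords, drums, and other instrument tracks together so that they form a complete pop song. The idea was inspired by a YouTube video of a piano piece played from the digit sequence of $\pi$ (https://youtu.be/OMq9he-5HUU), in which a set of generation rules turns a random, non-repeating digit sequence into music (shows both the randomness and the regularity of music. On one hand, since any possible digit sequence is a subset of the $\pi$ digit sequence, this implies that pleasing music can be created even from a totally random base signal. On the other hand, the composer uses specific rules such as A Harmonic Minor scale and harmonies to convert the digit sequence into a music sheet. It is these rules that play the key role in converting randomness into music.)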

Related work

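Broadly, automated composition has moved from early machine learning combined with music theory [6], to neural network learning [1,2,3], and then to deep learning (RNN) [4,7,8] with less emphasis on music theory.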

Music basics

What is a note? A note defines the basic unit that music is composed of.

12-tone equal temperament: Music follows the 12-tone system, i.e., 12 is the cycle length of all notes. The 12 tones are: $C$, $C^\#=D^b$, $D$, $D^\#=E^b$, $E$, $F$, $F^\#=G^b$, $G$, $G^\#=A^b$, $A$, $A^\#=B^b$, $B$.
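A minimal sketch of the 12-tone cycle described here. The helper names `pitch_class` and `transpose`, and the choice to store sharps as the canonical spelling, are assumptions made for illustration, not something the paper prescribes.

```python
# The 12 tones, spelled with sharps; index = pitch class (0..11).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Enharmonic equivalents: each flat names the same tone as a sharp.
FLAT_TO_SHARP = {"Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#", "Bb": "A#"}


def pitch_class(name):
    """Map a note name (sharp or flat spelling) to its pitch class 0..11."""
    return NOTE_NAMES.index(FLAT_TO_SHARP.get(name, name))


def transpose(name, semitones):
    """Move a note by a number of semitones, wrapping around after 12."""
    return NOTE_NAMES[(pitch_class(name) + semitones) % 12]
```

Because the cycle length is 12, `transpose("C", 12)` lands back on `"C"`, and `pitch_class("Db")` equals `pitch_class("C#")`.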

A bar is a short segment of time that corresponds to a specific number of beats (notes).

A scale is a subset of notes. The four most common scale types are Major (Minor), Harmonic Minor, Melodic Minor and Blues. For example, the C Major scale starts from C: the subset of notes specified by C Major is thus C, D, E, F, G, A, and B (a subset of seven notes). All scale types have a subset of seven notes except for Blues, which has six. In total we have 48 unique scales, i.e. 4 scale types and 12 possible starting notes. We treat Major and Minor as one type, as for a Major scale there is always a Minor that has exactly the same set of notes. In music theory, this is referred to as the Relative Minor.
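The scale counts can be checked with a short sketch. The semitone patterns below are the standard textbook ones (with the ascending form used for Melodic Minor and a six-note Blues scale); they are assumptions here, since these notes do not list the exact intervals used in the paper.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the starting note for each scale type (assumed patterns).
SCALE_INTERVALS = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "harmonic_minor": [0, 2, 3, 5, 7, 8, 11],
    "melodic_minor": [0, 2, 3, 5, 7, 9, 11],
    "blues": [0, 3, 5, 6, 7, 10],
}

# Natural minor, used only to illustrate the Relative Minor relationship.
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]


def notes_from(root, intervals):
    """Return the note names reached from `root` by the given semitone offsets."""
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in intervals]


# 4 scale types x 12 starting notes.
all_scales = {
    (root, kind): frozenset(notes_from(root, steps))
    for root in NOTE_NAMES
    for kind, steps in SCALE_INTERVALS.items()
}

c_major = notes_from("C", SCALE_INTERVALS["major"])
a_minor = notes_from("A", NATURAL_MINOR)
```

Here `c_major` is `["C", "D", "E", "F", "G", "A", "B"]`, and `a_minor` contains the same seven notes in a different order, which is why Major and Minor can be treated as one type.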

Chord

The Circle of Fifths

The circle of fifths makes it easy to judge which way a chord tends to move (strong chord progression), which keeps the whole piece harmonious.
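As a rough illustration, the circle can be built by stepping up a perfect fifth (7 semitones) twelve times. Using the number of steps between two roots on the circle as a proxy for how closely related they are is an assumption of this sketch, and `circle_distance` is a hypothetical helper, not the paper's formulation.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_fifths(start="C"):
    """List the 12 tones in circle-of-fifths order, beginning at `start`."""
    first = NOTE_NAMES.index(start)
    return [NOTE_NAMES[(first + 7 * k) % 12] for k in range(12)]


def circle_distance(root_a, root_b):
    """Number of steps between two roots, going the short way around the circle."""
    circle = circle_of_fifths("C")
    gap = abs(circle.index(root_a) - circle.index(root_b))
    return min(gap, 12 - gap)
```

Neighbouring roots such as C and G, or C and F, are one step apart, while C and F# sit on opposite sides of the circle at six steps.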

Model architecture

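When generating music, the scale is used as a condition so that the model can choose notes. At each time step the melody is encoded as two random variables, the key layer and the press layer, which represent the key that is pressed and its duration respectively. For chords and drums, the author assumes they are independent of each other given the melody: at each time step, the chord layer and the drum layer are generated conditioned on the melody.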
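One way to read this layering is as a factorization for a single time step $t$. This is a sketch based on these notes only: the symbols $s$ (scale), $h^t$ (everything generated before step $t$), $y_{chd}^t$ and $y_{drm}^t$ are notation assumed here, and treating chord and drum as conditionally independent given the melody is an interpretation rather than a quoted formula.

$$
p\left(y_{key}^t, y_{prs}^t, y_{chd}^t, y_{drm}^t \mid s, h^t\right)
= p\left(y_{key}^t \mid s, h^t\right)\,
  p\left(y_{prs}^t \mid y_{key}^t, h^t\right)\,
  p\left(y_{chd}^t \mid y_{key}^t, y_{prs}^t, h^t\right)\,
  p\left(y_{drm}^t \mid y_{key}^t, y_{prs}^t, h^t\right)
$$

The first two factors are the melody (key, then press); the last two say that chord and drum each look at the melody but not at each other.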

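In the experiments, the author preprocesses the data for the scale condition. A scale type normally uses only 7 of the 12 tones, or 6 for Blues. After sampling 100 hours of pop songs from the midi_man dataset, the author normalizes all notes by shifting the starting note to C (with the remaining notes shifted by the same amount), so that every song can be assigned to one of the 4 scale types.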
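A minimal sketch of this normalization on MIDI pitch numbers, assuming the starting note's pitch class is already known. The function name `shift_to_c` and the choice to move by the smaller of the two possible shifts (at most 6 semitones up or down) are assumptions; the notes do not say which direction the paper shifts.

```python
def shift_to_c(midi_pitches, root_pitch_class):
    """Transpose a song so that `root_pitch_class` (0..11) lands on C.

    Every pitch moves by the same amount, so intervals are preserved.
    """
    if not 0 <= root_pitch_class < 12:
        raise ValueError("root_pitch_class must be in 0..11")
    # Move down for roots up to F#, up otherwise, to keep the shift small.
    shift = -root_pitch_class if root_pitch_class <= 6 else 12 - root_pitch_class
    return [pitch + shift for pitch in midi_pitches]
```

For example, a D major triad `[62, 66, 69]` with root pitch class 2 becomes the C major triad `[60, 64, 67]`.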

Melody generation uses a two-layer RNN (LSTM) that generates notes conditioned on the chosen scale. The first layer is the key layer and the second is the press layer.

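Open question: since there are different scales, do the parameters differ per scale, so that the model has to be retrained for each one? The input and output range of notes is restricted to C3 to C6. Output notes are encouraged, but not forced, to stay within the scale. This gives three octaves (12 notes each) plus silence, for 37 possible output values. The output of the press layer uses a softmax (why?).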
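The 37 output classes can be sketched as a one-hot encoding. The MIDI number 48 for C3 (the convention where C4 is 60), the use of `None` for silence, and placing silence at the last index are all assumptions of this sketch.

```python
LOWEST_PITCH = 48        # assumed MIDI number for C3 (convention with C4 = 60)
NUM_PITCHES = 36         # three octaves of 12 notes each
SILENCE_INDEX = 36       # assumed position of the extra "silence" class
NUM_KEY_CLASSES = NUM_PITCHES + 1  # 37 output values in total


def key_to_index(midi_pitch):
    """Map a MIDI pitch (or None for silence) to a class index 0..36."""
    if midi_pitch is None:
        return SILENCE_INDEX
    offset = midi_pitch - LOWEST_PITCH
    if not 0 <= offset < NUM_PITCHES:
        raise ValueError("pitch outside the three-octave range")
    return offset


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector
```

With these assumptions, `one_hot(key_to_index(60), NUM_KEY_CLASSES)` is a 37-dimensional vector with its single 1 at position 12.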

The LSTM input consists of: the note output of the previous time step, one-hot encoded; the Lookback features (proposed by Google Magenta; they make it easier for the model to remember what it generated recently and repeat it later. There are some data-structure details here, such as recording how the outputs one bar and two bars ago relate to the current output, which need a close reading of the code to understand); and the melody profile (which captures the high-level music flow. To get the profile for each song, we compute the local note histogram at each time step with width of two bars, and cluster all local histograms within the song into 10 clusters via k-means. We order the 10 clusters with mean note ordered from low to high as cluster 1 to 10, and apply moving averages on the cluster id sequence to encourage local smoothness. This results in a 10-dimensional one-hot vector representation of the cluster id for each time step. This additional information allows the user to set the melody's ups and downs of the song. My understanding is that the profile defines whether the melody moves upward or downward).

The duration of a key press is represented by an increasing sequence 1, 2, 3, ... The author points out that this has an advantage over Magenta's single note-on message: This is important, as Waite et al. has extremely unbalanced output distributions dominated by the repeat-of-holding event. We represent press $y_{prs}^t$ as a 8-dimensional one-hot vector. The input to our LSTM is $y_{prs}^{t-1}$, concatenated with the 37-dimensional one-hot encoding of the melody key $y_{key}^t$.
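A small sketch of the press representation and the press-layer input. It assumes that counter value k maps to index k-1 of the 8-dimensional vector, that counters above 8 are capped, and that the very first step (no previous press) uses an all-zero vector; none of these details are stated in these notes, and `expand_notes` and `press_layer_input` are hypothetical helper names.

```python
PRESS_DIM = 8   # size of the one-hot press vector
KEY_DIM = 37    # size of the one-hot melody key vector


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def expand_notes(notes):
    """Turn (key_index, duration_in_steps) notes into per-step (key, press) pairs.

    The press value counts 1, 2, 3, ... while the same note is held.
    """
    steps = []
    for key_index, duration in notes:
        for count in range(1, duration + 1):
            # Assumption: cap the counter so it fits the 8-dim one-hot vector.
            steps.append((key_index, min(count, PRESS_DIM)))
    return steps


def press_layer_input(previous_press, key_index):
    """Concatenate the previous press one-hot with the current key one-hot."""
    if previous_press is None:
        # Assumption: no previous press at the first step, so use zeros.
        press_part = [0] * PRESS_DIM
    else:
        press_part = one_hot(previous_press - 1, PRESS_DIM)
    return press_part + one_hot(key_index, KEY_DIM)
```

A note held for three steps becomes press values 1, 2, 3 rather than one onset followed by repeated "hold" events, and each press-layer input has 8 + 37 = 45 dimensions.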

Chord layer

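The author finds that 99.19% of the chords belong to one of 72 chord classes (6 types × 12 start notes), and that the chord is strongly correlated with the melody. A chart of the correspondence between the first note and the chord followed here (figure not reproduced).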

[1] Jamshed J. Bharucha and Peter M. Todd. Modeling the perception of tonal structure with neural nets. Computer Music Journal, 13(4):44–53, 1989.

[2] Michael C. Mozer. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing. Connection Science, 6(2-3), 1996.

[3] Chun-Chi J. Chen and Risto Miikkulainen. Creating melodies with evolving recurrent neural networks. In International Joint Conference on Neural Networks, 2001.

[4] Douglas Eck and Juergen Schmidhuber. A first look at music composition using LSTM recurrent neural networks. 2002.

[5] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In ICML, 2012.

[6] Michael Chan, John Potter, and Emery Schubert. Improving algorithmic music composition with machine learning. In 9th International Conference on Music Perception and Cognition, 2006.

[7] Semin Kang, Soo-Yol Ok, and Young-Min Kang. Automatic Music Generation and Machine Learning Based Evaluation, pp. 436–443. Springer Berlin Heidelberg, 2012. (polyphonic, but scale type is enforced)

[8] Allen Huang and Raymond Wu. Deep learning for music. arXiv preprint arXiv:1606.04930, 2016. (2-layer LSTM, able to create chords)
