Paper notes: "SONG FROM PI: A MUSICALLY PLAUSIBLE NETWORK FOR POP MUSIC GENERATION"
Source: ICLR 2017
Motivation
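The paper proposes a general RNN-based model for pop music generation that encodes prior knowledge about how pop music is composed in a hierarchical structure: the bottom layers generate the melody, and the higher levels generate the drums and chords. In human listening tests, the model was preferred over the one proposed by Google. The authors also build two applications on top of the model: neural dancing and karaoke, as well as neural story singing.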
Introduction
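The authors open with the spread of machine learning into the arts: there has already been progress in imitating Van Gogh's painting style, generating stories, producing Shakespeare-style text and so on, and music is one such area. RNNs are well suited to natural-language text, so building music generation on top of them is feasible, as in [1,2,3,4]. Most of this earlier work generates single-track notes; multi-track generation has been studied in [5] (polyphonic music). The authors want to generate the melody, chords, drums and other instrument tracks together, so that the result is a complete pop song. The idea was inspired by a YouTube video of a piano piece played from the digit sequence of $\pi$ (https://youtu.be/OMq9he-5HUU), in which a set of composition rules turns a random, non-repeating digit sequence into music (shows both the randomness and the regularity of music. On one hand, since any possible digit sequence is a subset of the $\pi$ digit sequence, this implies that pleasing music can be created even from a totally random base signal. On the other hand, the composer uses specific rules such as A Harmonic Minor scale and harmonies to convert the digit sequence into a music sheet. It is these rules that play the key role in converting randomness into music.)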
Related work
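Broadly, automatic composition has moved from early machine learning combined with music theory [6], to neural-network learning [1,2,3], and then to deep learning (RNNs) [4,7,8] with less reliance on music theory.
Music basics
What is a note? It defines the basic unit that music is composed of.
12-tone equal temperament: Music follows the 12-tone system, i.e., 12 is the cycle length of all notes. The 12 tones are: $C$, $C^\sharp = D^\flat$, $D$, $D^\sharp = E^\flat$, $E$, $F$, $F^\sharp = G^\flat$, $G$, $G^\sharp = A^\flat$, $A$, $A^\sharp = B^\flat$, $B$.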
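As a minimal sketch of the 12-tone cycle, the snippet below maps note numbers to tone names. It assumes MIDI note numbering with C4 = 60, which these notes do not mention; `NOTE_NAMES`, `pitch_class` and `octave` are illustrative names only.

```python
# The 12 tones, written with sharps for the five enharmonic pairs.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(midi_note: int) -> str:
    """Map a MIDI note number to its tone name (cycle length 12)."""
    return NOTE_NAMES[midi_note % 12]


def octave(midi_note: int) -> int:
    """Octave number, assuming the convention that MIDI note 60 is C4."""
    return midi_note // 12 - 1


# Notes that are 12 semitones apart share the same name.
print(pitch_class(60), octave(60))  # C 4
print(pitch_class(72), octave(72))  # C 5
print(pitch_class(61))              # C#
```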
A bar is a short segment of time that corresponds to a specific number of beats (notes).
Scale is a subset of notes. The four most common scale types are Major (Minor), Harmonic Minor, Melodic Minor and Blues. For example, the C Major scale starts from C; the subset of notes specified by C Major is thus C, D, E, F, G, A, and B (a subset of seven notes). All scale types have a subset of seven notes except for Blues, which has six. In total we have 48 unique scales, i.e. 4 scale types and 12 possible starting notes. We treat Major and Minor as one type, as for a Major scale there is always a Minor that has exactly the same set of notes. In music theory, this is referred to as the Relative Minor.
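The counting above can be checked with a short sketch. The semitone patterns in `SCALE_STEPS` are the usual textbook ones (with a six-note blues scale) and are an assumption here; the notes do not say which exact definitions the paper uses. All names are illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the starting note (assumed textbook definitions).
SCALE_STEPS = {
    "major": [0, 2, 4, 5, 7, 9, 11],
    "harmonic_minor": [0, 2, 3, 5, 7, 8, 11],
    "melodic_minor": [0, 2, 3, 5, 7, 9, 11],
    "blues": [0, 3, 5, 6, 7, 10],
}
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]


def scale_notes(root: int, steps: list) -> list:
    """Names of the notes in the scale that starts at pitch class `root`."""
    return [NOTE_NAMES[(root + s) % 12] for s in steps]


c_major = scale_notes(0, SCALE_STEPS["major"])
print(c_major)  # ['C', 'D', 'E', 'F', 'G', 'A', 'B']

# Relative minor: A minor (root 9) uses exactly the same notes as C major,
# which is why Major and Minor can be treated as one scale type.
a_minor = scale_notes(9, NATURAL_MINOR)
print(set(a_minor) == set(c_major))  # True

# 4 scale types x 12 starting notes = 48 scales.
all_scales = [(kind, NOTE_NAMES[root]) for kind in SCALE_STEPS for root in range(12)]
print(len(all_scales))  # 48
```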
Chord
The Circle of Fifths
With the circle of fifths it is easy to judge where a chord tends to move next (strong chord progression), which keeps the whole piece harmonious.
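A small sketch of how the circle of fifths is constructed: step up by a perfect fifth (7 semitones) twelve times. How the paper actually uses the circle is not described in these notes, so `fifths_distance` is only a hypothetical helper that measures how far apart two roots sit on the circle.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_of_fifths(start: int = 0) -> list:
    """Walk up by perfect fifths (7 semitones) through all 12 tones."""
    return [NOTE_NAMES[(start + 7 * i) % 12] for i in range(12)]


def fifths_distance(a: str, b: str) -> int:
    """Number of steps between two tones around the circle (0 to 6)."""
    circle = circle_of_fifths()
    d = abs(circle.index(a) - circle.index(b))
    return min(d, 12 - d)


circle = circle_of_fifths()
print(circle)
# ['C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#', 'A#', 'F']

# Neighbours on the circle are closely related; opposite tones are not.
print(fifths_distance("C", "G"))   # 1
print(fifths_distance("C", "F"))   # 1
print(fifths_distance("C", "F#"))  # 6
```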
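Model architecture
When generating music, the scale is given as a condition so that the model can choose notes. At each time step the melody is represented by two random variables: the key layer and the press layer, which give the key being pressed and its duration, respectively. For chords and drums, the authors assume the two are conditionally independent given the melody: at each time step, the chord layer and the drum layer are each generated conditioned on the melody.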
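One way to read this hierarchy is as a factorization of the joint distribution. The formula below is a sketch of that reading, not taken from the paper: $s$ (the scale), $y_{chd}$ (chord) and $y_{drm}$ (drum) are notation introduced here, and whether the press, chord and drum layers also see the scale directly is not stated in these notes, so $s$ is attached only to the key layer.

$$
p(y_{key}, y_{prs}, y_{chd}, y_{drm} \mid s) = p(y_{key} \mid s)\; p(y_{prs} \mid y_{key})\; p(y_{chd} \mid y_{key}, y_{prs})\; p(y_{drm} \mid y_{key}, y_{prs})
$$

The last two factors share the same conditioning set (the melody, i.e. key and press), and neither depends on the other.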
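For the experiments, the authors preprocess the data around the scale condition. A scale type normally uses only 7 of the 12 tones, or 6 for Blues. After sampling 100 hours of pop songs from the midi_man dataset, they normalize all notes by shifting the starting note to C (and shifting the remaining notes by the same amount), which makes it easy to group all songs into the 4 scale types.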
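The normalization step can be sketched as a plain transposition. This assumes the starting note of each song's scale is already known and passed in as `tonic_pitch_class`; how the authors detect it is not described in these notes, and `transpose_to_c` is a hypothetical name.

```python
def transpose_to_c(midi_notes: list, tonic_pitch_class: int) -> list:
    """Shift every note by the same interval so the starting note lands on C.

    Always shifting down (by at most 11 semitones) is a simplification;
    the direction of the shift is an assumption of this sketch.
    """
    shift = tonic_pitch_class % 12
    return [n - shift for n in midi_notes]


# A melody in D major (starting note D, pitch class 2): D4 E4 F#4 A4.
melody = [62, 64, 66, 69]
normalized = transpose_to_c(melody, 2)
print(normalized)  # [60, 62, 64, 67], i.e. C4 D4 E4 G4
```

Because every note moves by the same amount, the intervals between notes, and therefore the scale type, are unchanged.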
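Melody generation uses a two-layer RNN (LSTM) that generates notes conditioned on the chosen scale: the first layer is the key layer and the second is the press layer.
Since the scales differ, the parameters presumably differ per scale, so does the model have to be retrained for each one? The input and output notes are restricted to the range C3 to C6; the model is encouraged, but not forced, to output notes within the scale. This gives 3 octaves (12 notes each) plus silence, i.e. 37 possible output values. The output of the press layer uses a softmax (why?).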
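A minimal sketch of the 37-way note encoding, assuming MIDI numbering with C3 = 48. The notes give "3 octaves of 12 notes plus silence", so indices 0 to 35 are taken here to cover MIDI 48 to 83 and index 36 is silence; whether C6 itself is included is not clear from the notes. All names are illustrative.

```python
LOW = 48                  # C3, assuming the C4 = 60 MIDI convention
NUM_NOTES = 36            # 3 octaves x 12 notes
SILENCE = NUM_NOTES       # index 36 is reserved for "no note"
KEY_DIM = NUM_NOTES + 1   # 37 output classes in total


def encode_key(midi_note) -> list:
    """One-hot encode a note, or None for silence, as a 37-dim vector."""
    vec = [0] * KEY_DIM
    if midi_note is None:
        vec[SILENCE] = 1
        return vec
    idx = midi_note - LOW
    if not 0 <= idx < NUM_NOTES:
        raise ValueError(f"note {midi_note} is outside the supported range")
    vec[idx] = 1
    return vec


print(len(encode_key(60)))         # 37
print(encode_key(60).index(1))     # 12 (C4 is 12 semitones above C3)
print(encode_key(None).index(1))   # 36 (silence)
```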
The LSTM input consists of: the note output at the previous time step, one-hot encoded; lookback features (proposed by Google Magenta; they make it easier for the model to remember what it generated recently and repeat it later. There are some data-structure details here, such as recording how the outputs one bar and two bars earlier relate to the current output, which require reading the code to understand properly); and the melody profile (which captures the high-level music flow: To get the profile for each song, we compute the local note histogram at each time step with width of two bars, and cluster all local histograms within the song into 10 clusters via k-means. We order the 10 clusters with mean note ordered from low to high as cluster 1 to 10, and apply moving averages on the cluster id sequence to encourage local smoothness. This results in a 10-dimensional one-hot vector representation of the cluster id for each time step. This additional information allows the user to set the melody's ups and downs of the song. My understanding is that the profile defines whether the melody is rising or falling).
An increasing sequence 1, 2, 3, ... is used to represent how long a key has been held. The authors point out that this has an advantage over Magenta's single note-on message: This is important, as Waite et al. has extremely unbalanced output distributions dominated by the repeat-of-holding event. We represent press $y_{prs}^{t}$ as an 8-dimensional one-hot vector. The input to our LSTM is $y_{prs}^{t-1}$, concatenated with the 37-dimensional one-hot encoding of the melody key $y_{key}^{t}$.
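The press-layer input described above can be sketched as follows. The assumptions are loud ones: the hold counter restarts at 1 whenever the key changes, it is capped at 8 to fit the 8-dimensional one-hot vector, and two consecutive identical keys are treated as one held note. None of these details are confirmed by the notes, and all names are hypothetical.

```python
PRESS_DIM = 8   # size of the press one-hot vector
KEY_DIM = 37    # size of the key one-hot vector (36 notes + silence)


def one_hot(index: int, size: int) -> list:
    vec = [0] * size
    vec[index] = 1
    return vec


def press_counters(keys: list) -> list:
    """Count how long the current key has been held: 1, 2, 3, ... (capped)."""
    counters = []
    for t, key in enumerate(keys):
        if t > 0 and key == keys[t - 1]:
            counters.append(min(counters[-1] + 1, PRESS_DIM))
        else:
            counters.append(1)  # assumed: a new key restarts the count
    return counters


def press_lstm_input(prev_counter: int, key_index: int) -> list:
    """Concatenate the previous press one-hot with the current key one-hot."""
    return one_hot(prev_counter - 1, PRESS_DIM) + one_hot(key_index, KEY_DIM)


keys = [12, 12, 12, 14, 14, 36]      # key indices per time step (36 = silence)
counters = press_counters(keys)
print(counters)                       # [1, 2, 3, 1, 2, 1]

x = press_lstm_input(counters[2], keys[3])   # input at t = 3
print(len(x))                         # 45 = 8 + 37
```

With this representation the counter value changes at every step of a held note, so the targets are spread over several classes rather than being dominated by a single "keep holding" event.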
Chord layer
The authors find that 99.19% of the chords belong to one of 72 chord classes (6 types × 12 start notes), and that chord is strongly correlated with melody.
[Figure: statistics of the correspondence between the first note and the chord]
References
[1] Jamshed J. Bharucha and Peter M. Todd. Modeling the perception of tonal structure with neural nets. Computer Music Journal, 13(4):44–53, 1989.
[2] Michael C. Mozer. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing. Connection Science, 6(2-3), 1996.
[3] Chun-Chi J. Chen and Risto Miikkulainen. Creating melodies with evolving recurrent neural networks. In International Joint Conference on Neural Networks, 2001.
[4] Douglas Eck and Juergen Schmidhuber. A first look at music composition using LSTM recurrent neural networks. 2002.
[5] Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In ICML, 2012.
[6] Michael Chan, John Potter, and Emery Schubert. Improving algorithmic music composition with machine learning. In 9th International Conference on Music Perception and Cognition, 2006.
[7] Semin Kang, Soo-Yol Ok, and Young-Min Kang. Automatic Music Generation and Machine Learning Based Evaluation, pp. 436–443. Springer Berlin Heidelberg, 2012. (Polyphonic, but the scale type is enforced.)
[8] Allen Huang and Raymond Wu. Deep learning for music. arXiv preprint arXiv:1606.04930, 2016. (2-layer LSTM, able to create chords.)