2011:Audio Classification (Train/Test) Tasks - MIREX Wiki
Contents[hide]
|
Audio Classification (Test/Train) tasks
Description
Many tasks in music classification can be characterized into a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2011, five classification sub-tasks are included:
- Audio Classical Composer Identification
- Audio US Pop Music Genre Classification
- Audio Latin Music Genre Classification
- Audio Mood Classification
All five classification tasks were conducted in previous MIREX runs (please see ). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.
Task specific mailing list
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX "EvalFest" list. If you have an question or comment, simply include the task name in the subject heading.
Data
Audio Classical Composer Identification
This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.
Collection statistics:
- 2772 30-second 22.05 kHz mono wav clips
- 11 "classical" composers (252 clips per composer), including:
- Bach
- Beethoven
- Brahms
- Chopin
- Dvorak
- Handel
- Haydn
- Mendelssohn
- Mozart
- Schubert
- Vivaldi
Audio US Pop Music Genre Classification
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.
Collection statistics:
- 7000 30-second audio clips in 22.05kHz mono WAV format
- 10 genres (700 clips from each genre), including:
- Blues
- Jazz
- Country/Western
- Baroque
- Classical
- Romantic
- Electronica
- Hip-Hop
- Rock
- HardRock/Metal
Audio Latin Music Genre Classification
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [1] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.
Collection statistics:
- 3,227 audio files in 22.05kHz mono WAV format
- 10 Latin music genres, including:
- Axe
- Bachata
- Bolero
- Forro
- Gaucha
- Merengue
- Pagode
- Sertaneja
- Tango
Audio Mood Classification
This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production msuic sourced from the APM collection [2]). The MIREX 2007 Mood Classification dataset [3] will be re-used.
Collection statistics:
- 600 30 second audio clips in 22.05kHz mono WAV format selected from the APM collection [4], and labeled by human judges using the Evalutron6000 system.
- 5 mood categories [5] each of which contains 120 clips:
- Cluster_1: passionate, rousing, confident,boisterous, rowdy
- Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
- Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
- Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
- Cluster_5: aggressive, fiery,tense/anxious, intense, volatile,visceral
2014/5/15 11:54:45
Audio Formats
For all datasets, participating algorithms will have to read audio in the following format:
- Sample rate: 22 KHz
- Sample size: 16 bit
- Number of channels: 1 (mono)
- Encoding: WAV
Evaluation
This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.
Participating algorithms will be evaluated with 3-fold cross validation. For Artist Identification and Classical Composer Classification, album filtering (保证每张专辑的在训练和测试数据中都有)will be used the test and training splits, i.e. training and test sets will contain tracks from different albums; for US Pop Genre Classification(应该是对应Mixed genre classification) and Latin Genre Classification, artist filtering will be used the test and training splits, i.e. training and test sets will contain different artists.
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
Classification accuracies will be tested for statistically significant differences using Friedman's Anova with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.
In addition computation times for feature extraction and training/classification will be measured.
Submission Format
File I/O Format
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:
Feature extraction list file
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.I.e.
- <example path and filename>
E.g.
- /path/to/track1.wav/path/to/track2.wav...
Training list file
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.
I.e.
- <example path and filename>\t<class label>
E.g.
- /path/to/track1.wav rock/path/to/track2.wav blues...
Test (classification) list file
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
I.e.
- <example path and filename>
E.g.
- /path/to/track1.wav/path/to/track2.wav...
Classification output file
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.
I.e.
- <example path and filename>\t<class label>
E.g.
- /path/to/track1.wav classical/path/to/track2.wav blues...
Submission calling formats
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.
Executables will have to accept the paths to the aforementioned list files as command line parameters.
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
Example submission calling formats
- extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
- extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt Train.sh /path/to/scratch/folder /path/to/trainListFile.txt Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
- myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
Multi-processor compute nodes will be used to run this task, however, we ask that submissions use no more than 4 cores (as we will be running a lot of submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.
- extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
- myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
Packaging submissions
- All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
- Be sure to follow the Best Coding Practices for MIREX
- Be sure to follow the MIREX 2011 Submission Instructions
All submissions should include a README file including the following the information:
- Command line calling format for all executables including examples
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Approximately how much scratch disk space will the submission need to store any feature/cache files?
- Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
- Any special notice regarding to running your algorithm
Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.
Time and hardware limits
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
A hard limit of 24 hours will be imposed on feature extraction times.
A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.
Submission opening date
Friday August 5th 2011
Submission closing date
Friday August 26th 2011
Potential Participants
name / email
Participation in previous years and Links to Results
2011:Audio Classification (Train/Test) Tasks - MIREX Wiki的更多相关文章
- 2013:Audio Tag Classification - MIREX Wiki
Contents [hide] 1 Description 1.1 Task specific mailing list 2 Data 2.1 MajorMiner Tag Dataset 2.2 M ...
- pointnet++之classification/train.py
1.数据集加载 if FLAGS.normal: assert(NUM_POINT<=10000) DATA_PATH = os.path.join(ROOT_DIR, 'data/modeln ...
- [MIREX] MIREX评测介绍
MIREX作为国际最权威音频检索评测大赛,竟然在百度上找不到任何介绍,只有几个与什么搜狗.腾讯获得什么成绩相关的检索内容,相比而言,TRECVID的内容收到重视多了...由于研究生阶段主要研究音频领域 ...
- PH_Pooled Featrues Classification MIREX 2011 Submission
Abstract Principal Mel-Spectrum Components (Feature) Temporal Pooling Functions (Model) Single Hidde ...
- #论文阅读# Universial language model fine-tuing for text classification
论文链接:https://aclweb.org/anthology/P18-1031 对文章内容的总结 文章研究了一些在general corous上pretrain LM,然后把得到的model t ...
- 提高神经网络的学习方式Improving the way neural networks learn
When a golf player is first learning to play golf, they usually spend most of their time developing ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- ### Paper about Event Detection
Paper about Event Detection. #@author: gr #@date: 2014-03-15 #@email: forgerui@gmail.com 看一些相关的论文. 1 ...
- VGGNet论文翻译-Very Deep Convolutional Networks for Large-Scale Image Recognition
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan[‡] & Andrew Zi ...
随机推荐
- C#委托与事件的关系(转载)
1.C#中的事件和委托的作用?事件代表一个组件能够被关注的一种信号,比如你的大肠会向你发出想拉屎的信号,你就可以接收到上厕所.委托是可以把一个过程封装成变量进行传递并且执行的对象,比如你上蹲坑和上坐马 ...
- Java集合(三)--Collection、Collections和Arrays
Collection: Collection是集合类的顶级接口,提供了对集合对象进行基本操作的通用接口方法.Collection接口的意义是为各种具体的集合提供了最大化 的统一操作方式,其直接继承接口 ...
- 20181225模拟赛 T1 color (转化思想,分拆思想)
题目: 有⼀块有 n 段的栅栏,要求第 i 段栅栏最终被刷成颜色 ci .每⼀次可以选择 l, r 把第l . . . r 都刷成某种颜色,后刷的颜⾊会覆盖之前的.⼀共有 m 种颜色,雇主知道只需要用 ...
- 运用 node + express + http-proxy-middleware 实现前端代理跨域的 详细实例哦
一.你需要准备的知识储备 运用node的包管理工具npm 安装插件.中间件的基本知识: 2.express框架的一些基础知识,知道如何建立一个小的服务器:晓得如何快速的搭建一个express框架小应用 ...
- 简述HTTP报文请求方法和状态响应码
1. Method 请求方法,表明客户端希望服务器对资源执行的动作: 1.1 GET 向服务器请求资源. 1.2 HEAD 和GET方法的行为类似,但服务器在响应中只返回首部,不会返回实体的主体部分. ...
- windows事件查看器
如果一个软件发生异常,软件本身没有提示异常信息, 需要从事件查看器中查看产生的错误事件 运行输入eventvwr或者win + X
- LeetCode(54)Spiral Matrix
题目 Given a matrix of m x n elements (m rows, n columns), return all elements of the matrix in spiral ...
- NOI模拟(3.3)螺旋序列(出题人一定是月厨)
Description S也想寻求真正的智慧,然而由于“抑制力”的存在,她必须先解决一系列询问.有一个长度为n的序列a,一个长度为m序列b被称为螺旋序列当且仅当b1=bm且对于1<=i<= ...
- python标准库笔记
1.python互联网数据处理模块 base64数据编码 二进制数据 encode ASCII字符 ASCll字符 decode 二进制数据 json数据交换格式 轻量的数据交换格式,json暴露的A ...
- 《C语言程序设计(第四版)》阅读心得(三)
第八章 指针 1.一个变量的地址称为该变量的指针 一个专门用来存放另一变量的地址(即指针),称它为指针变量.指针变量的值是指针(即地址) 如上图2000是变量的指针,pointer是指针变量, 赋值 ...