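TIMIT Speech Corpus

The TIMIT speech corpus comes with accurate phoneme-level labels, so it can be used to evaluate speech segmentation performance. It also contains speech from several hundred speakers (630), which makes it a widely used, authoritative corpus for evaluating speaker recognition. Commercial use of the corpus, however, requires purchasing it. The copy linked below comes from MIT course materials used for teaching labs and is roughly 430 MB.

Download: http://web.mit.edu/course/6/6.863/share/nltk_lite/

There is no need to download the files one at a time; the download tool below can fetch them in bulk.

Download tool: http://www.onlinedown.net/soft/53010.htm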
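If the download tool is unavailable, a short Python script can do a similar job. This is only a sketch under assumptions: it assumes the URL above still serves a plain HTML directory index (it may be offline or restructured), it does not recurse into subdirectories, and the helper names `file_links` and `download_all` are made up for this example.

```python
# Sketch: batch-download every file linked from a directory index page.
# Assumes a plain HTML index whose <a href> links point at files; links to
# subdirectories (ending in "/"), the parent directory, and sort queries are skipped.
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def file_links(base_url, html):
    """Return absolute URLs of file links found in an index page."""
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.links:
        # Skip column-sort queries, absolute parent links, and subdirectories.
        if href.startswith("?") or href.startswith("/") or href.endswith("/"):
            continue
        urls.append(urljoin(base_url, href))
    return urls


def download_all(base_url, out_dir="."):
    """Fetch the index page and save every linked file into out_dir."""
    with urllib.request.urlopen(base_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    os.makedirs(out_dir, exist_ok=True)
    for url in file_links(base_url, html):
        target = os.path.join(out_dir, os.path.basename(url))
        urllib.request.urlretrieve(url, target)
```

Usage would be something like `download_all("http://web.mit.edu/course/6/6.863/share/nltk_lite/", "timit")`, provided the page still exists in that form.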

The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus
(TIMIT)

Training and Test Data
NIST Speech Disc CD1-1.1

The TIMIT corpus of read speech has been designed to provide speech data for
the acquisition of acoustic-phonetic knowledge and for the development and
evaluation of automatic speech recognition systems. TIMIT has resulted from
the joint efforts of several sites under sponsorship from the Defense Advanced
Research Projects Agency - Information Science and Technology Office
(DARPA-ISTO). Text corpus design was a joint effort among the Massachusetts
Institute of Technology (MIT), Stanford Research Institute (SRI), and Texas
Instruments (TI). The speech was recorded at TI, transcribed at MIT, and has
been maintained, verified, and prepared for CD-ROM production by the National
Institute of Standards and Technology (NIST). This file contains a brief
description of the TIMIT Speech Corpus. Additional information including the
referenced material and some relevant reprints of articles may be found in the
printed documentation which is also available from NTIS (NTIS# PB91-100354).

1. Corpus Speaker Distribution
-- ---------------------------

TIMIT contains a total of 6300 sentences, 10 sentences spoken by each of 630
speakers from 8 major dialect regions of the United States. Table 1 shows the
number of speakers for the 8 dialect regions, broken down by sex. The
percentages are given in parentheses. A speaker's dialect region is the
geographical area of the U.S. where they lived during their childhood years.
The geographical areas correspond with recognized dialect regions in the U.S.
(Language Files, Ohio State University Linguistics Dept., 1982), with the
exception of the Western region (dr7) in which dialect boundaries are not
known with any confidence and dialect region 8 where the speakers moved around
a lot during their childhood.

Table 1: Dialect distribution of speakers

      Dialect
      Region(dr)    #Male     #Female     Total
      ----------  ---------  ---------  ----------
          1        31 (63%)   18 (37%)     49 (8%)
          2        71 (70%)   31 (30%)   102 (16%)
          3        79 (77%)   23 (23%)   102 (16%)
          4        69 (69%)   31 (31%)   100 (16%)
          5        62 (63%)   36 (37%)    98 (16%)
          6        30 (65%)   16 (35%)     46 (7%)
          7        74 (74%)   26 (26%)   100 (16%)
          8        22 (67%)   11 (33%)     33 (5%)
      ----------  ---------  ---------  ----------
      Total       438 (70%)  192 (30%)  630 (100%)

The dialect regions are:
     dr1: New England
     dr2: Northern
     dr3: North Midland
     dr4: South Midland
     dr5: Southern
     dr6: New York City
     dr7: Western
     dr8: Army Brat (moved around)
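Because Table 1 is built from simple counts, its percentages are easy to recompute. The Python sketch below (the `DIALECTS` table and `dialect_summary` helper are illustrative names, not part of the corpus) derives each row's male/female split and its share of all 630 speakers from the raw counts.

```python
# Table 1 counts: dialect code -> (region name, #male, #female).
DIALECTS = {
    "dr1": ("New England", 31, 18),
    "dr2": ("Northern", 71, 31),
    "dr3": ("North Midland", 79, 23),
    "dr4": ("South Midland", 69, 31),
    "dr5": ("Southern", 62, 36),
    "dr6": ("New York City", 30, 16),
    "dr7": ("Western", 74, 26),
    "dr8": ("Army Brat (moved around)", 22, 11),
}


def dialect_summary():
    """Return (code, name, %male, %female, total, %of all speakers) per region."""
    all_speakers = sum(m + f for _, m, f in DIALECTS.values())
    rows = []
    for code, (name, male, female) in DIALECTS.items():
        total = male + female
        rows.append((
            code,
            name,
            round(100 * male / total),
            round(100 * female / total),
            total,
            round(100 * total / all_speakers),
        ))
    return rows


for row in dialect_summary():
    print("{:4} {:25} {:3}% {:3}% {:4} {:3}%".format(*row))
```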

2. Corpus Text Material
-- --------------------

The text material in the TIMIT prompts (found in the file "prompts.txt")
consists of 2 dialect "shibboleth" sentences designed at SRI, 450
phonetically-compact sentences designed at MIT, and 1890 phonetically-diverse
sentences selected at TI. The dialect sentences (the SA sentences) were meant
to expose the dialectal variants of the speakers and were read by all 630
speakers. The phonetically-compact sentences were designed to provide good
coverage of pairs of phones, with extra occurrences of phonetic contexts
thought to be either difficult or of particular interest. Each speaker read 5
of these sentences (the SX sentences) and each text was spoken by 7 different
speakers. The phonetically-diverse sentences (the SI sentences) were selected
from existing text sources - the Brown Corpus (Kucera and Francis, 1967) and
the Playwrights Dialog (Hultzen et al., 1964) - so as to add diversity in
sentence types and phonetic contexts. The selection criteria maximized the
variety of allophonic contexts found in the texts. Each speaker read 3 of
these sentences, with each sentence being read only by a single speaker.
Table 2 summarizes the speech material in TIMIT.

Table 2: TIMIT speech material

Sentence Type   #Sentences   #Speakers/Sentence   #Utterances   #Sentences/Speaker
-------------   ----------   ------------------   -----------   ------------------
Dialect (SA)             2                  630          1260                    2
Compact (SX)           450                    7          3150                    5
Diverse (SI)          1890                    1          1890                    3
-------------   ----------   ------------------   -----------   ------------------
Total                 2342                               6300                   10
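Each row of Table 2 can be computed two ways: distinct texts times readers per text, or 630 speakers times sentences per speaker. The small Python check below (names are illustrative) confirms both give the same utterance counts and that they add up to 6300.

```python
# Table 2: text type -> (distinct sentence texts, speakers per text, texts per speaker)
MATERIAL = {
    "SA": (2, 630, 2),
    "SX": (450, 7, 5),
    "SI": (1890, 1, 3),
}
NUM_SPEAKERS = 630


def utterance_counts():
    """Count utterances per text type, checking both ways of computing them agree."""
    counts = {}
    for kind, (texts, speakers_per_text, texts_per_speaker) in MATERIAL.items():
        by_text = texts * speakers_per_text              # e.g. 450 SX texts x 7 readers
        by_speaker = NUM_SPEAKERS * texts_per_speaker    # e.g. 630 speakers x 5 SX each
        assert by_text == by_speaker, kind
        counts[kind] = by_text
    return counts


counts = utterance_counts()
print(counts, sum(counts.values()))
```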

3. Suggested Training/Test Subdivision
-- -----------------------------------

The speech material has been subdivided into portions for training and
testing. The criteria for the subdivision are described in the file
"testset.doc". THIS SUBDIVISION HAS NO RELATION TO THE DATA DISTRIBUTED ON
THE PROTOTYPE VERSION OF THE CDROM.

Core Test Set:

The test data has a core portion containing 24 speakers, 2 male and 1 female
from each dialect region. The core test speakers are shown in Table 3. Each
speaker read a different set of SX sentences. Thus the core test material
contains 192 sentences, 5 SX and 3 SI for each speaker, each having a distinct
text prompt.

Table 3: The core test set of 24 speakers

     Dialect        Male        Female
     -------     ----------     ------
        1        DAB0, WBT0      ELC0
        2        TAS1, WEW0      PAS0
        3        JMP0, LNT0      PKT0
        4        LLL0, TLS0      JLM0
        5        BPM0, KLT0      NLP0
        6        CMJ0, JDH0      MGD0
        7        GRT0, NJM0      DHC0
        8        JLN0, PAM0      MLD0
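To select the core test set programmatically, the Table 3 IDs can be combined with the sex prefix used in the directory names of Section 4. A minimal Python sketch, assuming lower-case directory names as in the Section 4 examples (some copies of the corpus use upper case); `CORE_TEST`, `core_test_dirs`, and `is_core_test_utterance` are hypothetical names.

```python
# Table 3: dialect region -> (male speaker IDs, female speaker IDs), without sex prefix.
CORE_TEST = {
    1: (("dab0", "wbt0"), ("elc0",)),
    2: (("tas1", "wew0"), ("pas0",)),
    3: (("jmp0", "lnt0"), ("pkt0",)),
    4: (("lll0", "tls0"), ("jlm0",)),
    5: (("bpm0", "klt0"), ("nlp0",)),
    6: (("cmj0", "jdh0"), ("mgd0",)),
    7: (("grt0", "njm0"), ("dhc0",)),
    8: (("jln0", "pam0"), ("mld0",)),
}


def core_test_dirs():
    """Speaker directories under timit/test, e.g. 'dr1/mdab0' (lower case assumed)."""
    dirs = []
    for dr, (males, females) in sorted(CORE_TEST.items()):
        dirs.extend(f"dr{dr}/m{spk}" for spk in males)
        dirs.extend(f"dr{dr}/f{spk}" for spk in females)
    return dirs


def is_core_test_utterance(sentence_id):
    """Core test material uses only SX and SI sentences (5 + 3 per speaker)."""
    return sentence_id.lower().startswith(("sx", "si"))
```

With 24 speakers and 8 SX/SI sentences each, this selects the 192 core test utterances.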

Complete Test Set:

A more extensive test set was obtained by including the sentences from all
speakers that read any of the SX texts included in the core test set. In
doing so, no sentence text appears in both the training and test sets. This
complete test set contains a total of 168 speakers and 1344 utterances,
accounting for about 27% of the total speech material. The resulting dialect
distribution of the 168 speaker test set is given in Table 4. The complete
test material contains 624 distinct texts.
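The rule for the complete test set is a set operation: collect the SX texts read by the core speakers, then pull in every speaker who read any of them. SA sentences are left out because every speaker reads them. The sketch below runs on made-up toy data (the speaker and sentence IDs other than "mbpm0" are hypothetical); on the real corpus the speaker-to-sentence mapping would come from spkrsent.txt, whose parsing is not shown here.

```python
def complete_test_set(core_speakers, sx_by_speaker):
    """Speakers who read at least one SX text that a core-test speaker read."""
    core_texts = set()
    for spk in core_speakers:
        core_texts |= sx_by_speaker[spk]
    return {spk for spk, texts in sx_by_speaker.items() if texts & core_texts}


def shared_sx_texts(test_speakers, sx_by_speaker):
    """SX texts that appear on both sides of the split (should be empty)."""
    test_texts = set().union(*(sx_by_speaker[s] for s in test_speakers))
    train_texts = set().union(
        *(texts for s, texts in sx_by_speaker.items() if s not in test_speakers)
    )
    return test_texts & train_texts


# Hypothetical toy data: speaker -> SX sentence IDs read by that speaker.
toy = {
    "mbpm0": {"sx47", "sx137"},   # core test speaker
    "faaa0": {"sx47", "sx227"},   # shares sx47 with the core speaker
    "mbbb0": {"sx300", "sx310"},  # no overlap, so stays in training
}
test_set = complete_test_set({"mbpm0"}, toy)
print(sorted(test_set), shared_sx_texts(test_set, toy))
```

The `shared_sx_texts` check is the property stated above: no sentence text appears in both the training and test sets.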

Table 4: Dialect distribution for complete test set

      Dialect    #Male   #Female   Total
      -------    -----   -------   -----
         1           7         4      11
         2          18         8      26
         3          23         3      26
         4          16        16      32
         5          17        11      28
         6           8         3      11
         7          15         8      23
         8           8         3      11
      -------    -----   -------   -----
      Total        112        56     168

4. CDROM TIMIT Directory and File Structure
-- ----------------------------------------

The speech and associated data is organized on the CD-ROM according to the
following hierarchy:

/<CORPUS>/<USAGE>/<DIALECT>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>

where,

     CORPUS :== timit
     USAGE :== train | test
     DIALECT :== dr1 | dr2 | dr3 | dr4 | dr5 | dr6 | dr7 | dr8
                 (see Table 1 for dialect code description)
     SEX :== m | f
     SPEAKER_ID :== <INITIALS><DIGIT>

          where,
          INITIALS :== speaker initials, 3 letters
          DIGIT :== number 0-9 to differentiate speakers with identical
                    initials

     SENTENCE_ID :== <TEXT_TYPE><SENTENCE_NUMBER>

          where,

          TEXT_TYPE :== sa | si | sx
                        (see Section 2 for sentence text type description)
          SENTENCE_NUMBER :== 1 ... 2342

     FILE_TYPE :== wav | txt | wrd | phn
                   (see Table 5 for file type description)

Examples:
     /timit/train/dr1/fcjf0/sa1.wav

     (TIMIT corpus, training set, dialect region 1, female speaker,
      speaker-ID "cjf0", sentence text "sa1", speech waveform file)

     /timit/test/dr5/mbpm0/sx407.phn

     (TIMIT corpus, test set, dialect region 5, male speaker, speaker-ID
      "bpm0", sentence text "sx407", phonetic transcription file)


Online documentation and tables are located in the directory "timit/doc".
A brief description of each file in this directory can be found in Section 6.
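The path grammar above maps directly onto a regular expression. The Python sketch below (the `TIMIT_PATH` pattern and `parse_timit_path` function are illustrative, not part of the corpus tools) splits an utterance path into its components. It matches case-insensitively and accepts Windows separators, on the assumption that copies of the corpus may use upper-case names or be mounted under another prefix.

```python
import re

# Mirrors /<CORPUS>/<USAGE>/<DIALECT>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>
TIMIT_PATH = re.compile(
    r"(?:^|/)(?P<corpus>timit)/(?P<usage>train|test)/dr(?P<dialect>[1-8])/"
    r"(?P<sex>[mf])(?P<speaker>[a-z]{3}[0-9])/"
    r"(?P<text_type>sa|si|sx)(?P<number>[0-9]+)\.(?P<file_type>wav|txt|wrd|phn)$",
    re.IGNORECASE,
)


def parse_timit_path(path):
    """Split a TIMIT utterance path into its components (case-insensitive)."""
    match = TIMIT_PATH.search(path.replace("\\", "/"))
    if match is None:
        raise ValueError(f"not a TIMIT utterance path: {path!r}")
    info = {key: value.lower() for key, value in match.groupdict().items()}
    info["dialect"] = int(info["dialect"])
    info["number"] = int(info["number"])
    return info


print(parse_timit_path("/timit/train/dr1/fcjf0/sa1.wav"))
```

Because the pattern is anchored on the `timit/` component, any mount-point prefix in front of it is ignored.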

5. File Types
-- ----------

The TIMIT corpus includes several files associated with each utterance. In
addition to a speech waveform file (.wav), three associated transcription
files (.txt, .wrd, .phn) exist. These associated files have the form:

        <BEGIN_SAMPLE> <END_SAMPLE> <TEXT><new-line>
        .
        .
        .
        <BEGIN_SAMPLE> <END_SAMPLE> <TEXT><new-line>

where,

     BEGIN_SAMPLE :== The beginning integer sample number for the
                      segment (Note: The first BEGIN_SAMPLE of each
                      .txt and .phn file is always 0; a .wrd file starts
                      at its first word, after the leading silence)

     END_SAMPLE :== The ending integer sample number for the segment
                    (Note: Because of the transcription method used,
                    the last END_SAMPLE in each transcription file
                    may be less than the actual last sample in the
                    corresponding .wav file)

     TEXT :== <ORTHOGRAPHY> | <WORD_LABEL> | <PHONETIC_LABEL>

          where,

          ORTHOGRAPHY :== Complete orthographic text transcription
          WORD_LABEL :== Single word from the orthography
          PHONETIC_LABEL :== Single phonetic transcription code
                             (See "phoncode.doc" for description
                             of codes)
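All three transcription formats share the same line layout, so one parser covers them. A minimal Python sketch (the `Segment`, `parse_transcription`, and `read_transcription` names are illustrative): split each non-blank line into at most three fields so that the orthography in a .txt file keeps its spaces.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    begin: int  # BEGIN_SAMPLE
    end: int    # END_SAMPLE
    text: str   # orthography (.txt), word (.wrd), or phone code (.phn)


def parse_transcription(content: str) -> List[Segment]:
    """Parse '<BEGIN_SAMPLE> <END_SAMPLE> <TEXT>' lines; TEXT may contain spaces."""
    segments = []
    for line in content.splitlines():
        if not line.strip():
            continue
        begin, end, text = line.split(maxsplit=2)
        segments.append(Segment(int(begin), int(end), text.strip()))
    return segments


def read_transcription(path: str) -> List[Segment]:
    """Read a .txt, .wrd, or .phn file from disk."""
    with open(path, encoding="ascii", errors="replace") as f:
        return parse_transcription(f.read())
```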

Table 5: Utterance-associated file types

File Type                     Description
--------- ------------------------------------------------------

     .wav - SPHERE-headered speech waveform file. (See the "/sphere"
            directory for speech file manipulation utilities.)

     .txt - Associated orthographic transcription of the words the
            person said. (Usually this is the same as the prompt, but
            in a few cases the orthography and prompt disagree.)

     .wrd - Time-aligned word transcription. The word boundaries
            were aligned with the phonetic segments using a dynamic
            string alignment program (see the printed documentation
            section "Notes on the Word Alignments" and the lexical
            pronunciations given in "timitdic.txt".)

     .phn - Time-aligned phonetic transcription. (See the reprint
            of the article by Seneff and Zue (1988), in the printed
            documentation, and the section "Notes on Checking the
            Phonetic Transcriptions" for more details on the phonetic
            transcription protocols.)
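Because the .wav files carry a SPHERE header rather than a RIFF header, generic WAV readers often reject them. The Python sketch below reads the header and the samples directly. It is an illustration under assumptions, not a replacement for the "/sphere" utilities: it assumes the usual SPHERE layout (a "NIST_1A" line, a line giving the header size in bytes, then "name -type value" fields ending at "end_head"), standard field names such as sample_count, sample_n_bytes, and sample_byte_format, and uncompressed 16-bit PCM data. The function names are hypothetical.

```python
import array
import sys


def parse_sphere_header(data: bytes):
    """Return (fields, header_size) from a NIST SPHERE header."""
    magic, size_line, _ = data.split(b"\n", 2)
    if magic.strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(size_line)  # e.g. b"   1024"
    fields = {}
    lines = data[:header_size].decode("ascii", errors="replace").splitlines()
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue
        name, kind, value = line.split(maxsplit=2)
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN": string value of N characters
            fields[name] = value
    return fields, header_size


def read_sphere(data: bytes):
    """Decode 16-bit linear PCM samples that follow the header (assumed format)."""
    fields, header_size = parse_sphere_header(data)
    if "shorten" in str(fields.get("sample_coding", "")):
        raise ValueError("compressed SPHERE data is not handled in this sketch")
    if fields.get("sample_n_bytes", 2) != 2:
        raise ValueError("only 16-bit samples are handled in this sketch")
    body = data[header_size:]
    if "sample_count" in fields:
        body = body[: 2 * fields["sample_count"] * fields.get("channel_count", 1)]
    samples = array.array("h")  # signed 16-bit on mainstream platforms
    samples.frombytes(body[: len(body) // 2 * 2])
    # "01" = little-endian, "10" = big-endian; swap if it differs from this machine.
    file_is_little = fields.get("sample_byte_format", "01") == "01"
    if file_is_little != (sys.byteorder == "little"):
        samples.byteswap()
    return fields, samples
```

Usage: `with open("/timit/test/dr5/fnlp0/sa1.wav", "rb") as f: fields, samples = read_sphere(f.read())`.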


Example transcriptions from the utterance in "/timit/test/dr5/fnlp0/sa1.wav":

Orthography (.txt):
        0 61748 She had your dark suit in greasy wash water all year.

Word label (.wrd):
        7470 11362 she
        11362 16000 had
        15420 17503 your
        17503 23360 dark
        23360 28360 suit
        28360 30960 in
        30960 36971 greasy
        36971 42290 wash
        43120 47480 water
        49021 52184 all
        52184 58840 year

Phonetic label (.phn):
(Note: beginning and ending silence regions are marked with h#)
        0 7470 h#
        7470 9840 sh
        9840 11362 iy
        11362 12908 hv
        12908 14760 ae
        14760 15420 dcl
        15420 16000 jh
        16000 17503 axr
        17503 18540 dcl
        18540 18950 d
        18950 21053 aa
        21053 22200 r
        22200 22740 kcl
        22740 23360 k
        23360 25315 s
        25315 27643 ux
        27643 28360 tcl
        28360 29272 q
        29272 29932 ih
        29932 30960 n
        30960 31870 gcl
        31870 32550 g
        32550 33253 r
        33253 34660 iy
        34660 35890 z
        35890 36971 iy
        36971 38391 w
        38391 40690 ao
        40690 42290 sh
        42290 43120 epi
        43120 43906 w
        43906 45480 ao
        45480 46040 dx
        46040 47480 axr
        47480 49021 q
        49021 51348 ao
        51348 52184 l
        52184 54147 y
        54147 56654 ih
        56654 58840 axr
        58840 61680 h#
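Since all three files share the same sample axis, each word can be linked to its phones by interval containment, and sample numbers convert to seconds by dividing by the sampling rate (TIMIT waveforms are sampled at 16 kHz). The minimal Python sketch below uses the first three words of the example above; the helper names are illustrative. Note that "jh" lands in both "had" and "your", because the .wrd boundaries of those two words overlap (11362-16000 and 15420-17503). "she" spans 7470-11362, i.e. 3892 samples or about 0.24 s.

```python
SAMPLE_RATE = 16000  # TIMIT waveforms are sampled at 16 kHz

# First three words and the phones that cover them, copied from the listings above.
WRD = """7470 11362 she
11362 16000 had
15420 17503 your"""

PHN = """0 7470 h#
7470 9840 sh
9840 11362 iy
11362 12908 hv
12908 14760 ae
14760 15420 dcl
15420 16000 jh
16000 17503 axr"""


def segments(text):
    """Parse '<begin> <end> <label>' lines into (begin, end, label) tuples."""
    result = []
    for line in text.strip().splitlines():
        begin, end, label = line.split(maxsplit=2)
        result.append((int(begin), int(end), label))
    return result


def phones_in_words(words, phones):
    """Pair each word with the phones lying entirely within its sample range."""
    return [
        (word, [p for pb, pe, p in phones if pb >= wb and pe <= we])
        for wb, we, word in words
    ]


words = segments(WRD)
for (wb, we, word), (_, phs) in zip(words, phones_in_words(words, segments(PHN))):
    print(f"{word:5} {wb / SAMPLE_RATE:.3f}-{we / SAMPLE_RATE:.3f} s  {' '.join(phs)}")
```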

6. Online Documentation
-- --------------------

Compact documentation is located in the "/timit/doc" directory. Files in this
directory with a ".doc" extension contain freeform descriptive text and files
with a ".txt" extension contain tables of formatted text which can be searched
programmatically. Lines in the ".txt" files beginning with a semicolon are
comments and should be ignored on searches. The following is a brief
description of their contents:

     phoncode.doc - Table of phone symbols used in phonemic dictionary and
                    phonetic transcriptions
      prompts.txt - Table of sentence prompts and sentence-ID numbers
     spkrinfo.txt - Table of speaker attributes
     spkrsent.txt - Table of sentence-ID numbers for each speaker
      testset.doc - Description of suggested train/test subdivision
     timitdic.doc - Description of phonemic lexicon
     timitdic.txt - Phonemic dictionary of all orthographic words in prompts

A more extensive description of corpus design, collection, and transcription
can be found in the printed documentation.
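When searching the ".txt" tables, lines starting with a semicolon must be skipped. The Python sketch below shows a generic row reader plus a lexicon loader. The loader assumes timitdic.txt entries look like `word  /phone phone .../`; treat that layout as an assumption to verify against the actual file. The sample lines in the usage are made up, and the function names are illustrative.

```python
def table_lines(text):
    """Yield whitespace-split fields from a /timit/doc .txt table, skipping ';' comments."""
    for line in text.splitlines():
        if line.startswith(";") or not line.strip():
            continue
        yield line.split()


def load_lexicon(text):
    """Map word -> list of phone symbols, assuming 'word  /ph ph .../' entries."""
    lexicon = {}
    for line in text.splitlines():
        if line.startswith(";") or not line.strip():
            continue
        word, pron = line.split(maxsplit=1)
        lexicon[word] = pron.strip().strip("/").split()
    return lexicon


# Made-up sample lines in the assumed format.
sample = "; comment line\n\nwash  /w aa1 sh/\nwater  /w ao1 t axr/\n"
print(load_lexicon(sample))
```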
