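Paper: GMM-based narrowband-to-wideband speech conversion Blog author: 凌逆战 Blog URL: https://www.cnblogs.com/LXP-Never/p/12151027.html Abstract: Reconstructing wideband speech from narrowband speech without changing the existing communication network is an attractive problem. This paper proposes a new method for recovering wideband speech from narrowband speech. The method uses a Gaussian mixture model (GMM) to transform the narrowband spectral envelope of the input speech into a wideband spectral envelope, with the model parameters computed by a joint density estimation technique. The low-band and high-band speech signals are then reconstructed from the reconstructed spectral envelope with an LPC synthesizer. The paper also proposes…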
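Title: A deep neural network approach to speech bandwidth expansion Authors: Kehuang Li, Chin-Hui Lee Published in 2015 Abstract: This paper proposes a deep neural network (DNN) based approach to speech bandwidth expansion (BWE). Using log-spectral power as the input and output features for the required nonlinear transformation, a neural network is trained to realize this high-dimensional mapping function. When the method is evaluated on a large 10-hour test set, the DNN-expanded speech signals show good objective quality, measured by signal-to-noise ratio and log-spectral distortion, compared with conventional BWE based on Gaussian mixture models (GMMs). Assuming the phase information is known, subjective listening tests on the DNN-ex…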
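Blog author: 凌逆战 Blog URL: https://www.cnblogs.com/LXP-Never/p/10874993.html Paper authors: Sefik Emre Eskimez, Kazuhito Koishida Abstract: The goal of speech super-resolution (SSR), or speech bandwidth expansion, is to generate the missing high-frequency components of a given low-resolution speech signal. It has the potential to improve telecommunication quality. We propose a new SSR method that uses generative adversarial networks (GANs) together with a regularization method that stabilizes GAN training. The generator network is a convolutional autoencoder with one-dimensional convolution kernels,…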
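Paper: DNN-based speech bandwidth expansion and its application to adding missing high-frequency features for automatic recognition of narrowband speech Code: github Blog author: 凌逆战 Blog URL: https://www.cnblogs.com/LXP-Never/p/12361112.html Abstract: We propose several enhancement techniques to improve speech quality in narrowband-to-wideband bandwidth expansion (BWE), addressing three issues that can be critical in practical applications: (1) the discontinuity between the narrowband spectrum and the estimated high-frequency spectrum, (2) the energy mismatch between test and training utterances, and (3) expanding the bandwidth of out-of-domain speech signals. Through the missing high-frequency features in bandwidth-expanded speech…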
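Blog author: 凌逆战 Paper URL: https://ieeexplore.ieee.xilesou.top/abstract/document/8683611/ Blog URL: https://www.cnblogs.com/LXP-Never/p/10714401.html Latent representation learning for artificial bandwidth extension using a conditional variational auto-encoder Authors: Pramod Bachhav, Massimiliano Todisco and Nicholas Evans Abstract: When wideband devices are used together with narrowband devices or infrastructure, artificial bandwidth extension (ABE…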
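Paper: Artificial bandwidth extension with memory inclusion using semi-supervised stacked auto-encoders Authors: Pramod Bachhav, Massimiliano Todisco and Nicholas Evans Blog author: 凌逆战 Blog URL: https://www.cnblogs.com/LXP-Never/p/10889975.html Abstract: Artificial bandwidth extension (ABE) algorithms have been developed to improve the quality of speech signals that wideband devices receive from narrowband devices or infrastructure. The use of contextual information, in the form of dynamic features or explicit memory captured from neighbouring frames, in A…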
Instruments User Guide (Chinese translation) http://cdn.cocimg.com/bbs/attachment/Fid_6/6_24457_90eabb4ed5b3863.pdf Original author: Apple Inc. Translator: 謝業蘭 [老狼] Contact: xyl.layne@gmail.com Thanks to: 有米移动广告平台, CocoaChina community Contents: Instruments User Guide…
LAST UPDATE: Dec 15, 2016 APPLIES TO: Oracle Database - Enterprise Edition - Version 7.0.16.0 and later Oracle Database - Standard Edition - Version 7.0.16.0 and later Oracle Database - Personal Edition - Version 7.1.4.0 and later I…
Attention in Long Short-Term Memory Recurrent Neural Networks by Jason Brownlee on June 30, 2017 in Deep Learning   The Encoder-Decoder architecture is popular because it has demonstrated state-of-the-art results across a range of domains. A limitati…
In this post, I will give a list of all undocumented parameters in Oracle 12.1.0.1c. Here is a query to see all the parameters (documented and undocumented) which contain the string you enter when prompted: -- Enter name of the parameter when prompted…
In this post, I will give a list of all undocumented parameters in Oracle 11g. Here is a query to see all the parameters (documented and undocumented) which contain the string you enter when prompted: -- Enter name of the parameter when prompted SET l…
The new Converter Standalone 5 lacks the Converter Boot CD. Fortunately you can still use the old version 4.1 Converter Boot CD, which is also compatible with vSphere 5! The Converter Boot CD is available for download on the VMware website, although…
In the last chapter we learned that deep neural networks are often much harder to train than shallow neural networks. That's unfortunate, since we have good reason to believe that if we could train deep nets they'd be much more powerful than shallow…
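Paper: Wideband reconstruction of narrowband speech based on codebook mapping Blog author: 凌逆战 Blog URL: https://www.cnblogs.com/LXP-Never/p/12144324.html Abstract: This paper proposes a new algorithm for reconstructing wideband speech from narrowband speech, with two new features. The first is spectral envelope reconstruction based on codebook mapping. The second is speech signal reconstruction using the reconstructed spectral envelope. Because the algorithm generates high-quality speech without using any additional transmitted information (that is, blindly), it is applicable to any network, such as the existing telephone network or networks that support analog and ISDN services. The algorithm was applied to 20 speakers. Through aco…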
Figure 3-7 shows a block diagram of a DSP system, as the sampling theorem dictates it should be. Before encountering the analog-to-digital converter, the input signal is processed with an electronic low-pass filter to remove all frequencies above the…
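Paper: Bandwidth extension of telephone speech based on hidden Markov models Blog author: 凌逆战 Blog URL: https://www.cnblogs.com/LXP-Never/p/12151866.html Abstract: This paper proposes an algorithm for recovering wideband speech from lowpass-bandlimited speech. The narrowband input signal is classified into a limited number of speech sounds, and information about the wideband spectral envelope is taken from a pre-trained codebook. The codebook search algorithm uses a statistical approach based on a hidden Markov model, which takes different features of the bandlimited speech into account and minimizes a mean squared error criterion. The new algorithm needs only a single wideband codebook and inherently guarantees…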
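Motivation: Automatic speech recognition (ASR) only converts speech content into text, but a conversation carries other important information besides the text, such as intonation, emotion, and loudness. This information also matters for understanding speech. This article focuses on one aspect: how to recognize the emotion in speech, that is, speech emotion recognition (SER). Three difficulties of speech emotion recognition: 1. Emotion is subjective: different people perceive different emotions in the same speech segment, and there are cultural differences as well. 2. Emotion in…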
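Contents 1. gmm-init-mono: model initialization 2. compile-train-graphs: training graph initialization 3. align-equal-compiled: uniform segmentation of feature files 4. gmm-acc-stats-ali: accumulating the statistics needed for model re-estimation 5. gmm-sum-accs: merging parallel accumulators 6. gmm-est: re-estimating acoustic model parameters 7. gmm-boost-silence: model smoothing 8. gmm-align-compiled: feature re-alignment 9. train_mono.sh: the overall workflow explained When reposting, please indicate…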
draft-spittka-payload-rtp-opus (RFC 7587) Network Working Group J. Spittka Internet-Draft Intended status: Standards Track K. Vos Expires: Ja…
There are several libraries for this kind of conversion - I host two of those on GitHub: libsprec (this uses the Google speech recognition APIs, so it supports multiple languages) and VocalKit which uses the high-quality opensource PocketSphinx libra…
Reposted from: http://ganeshtiwaridotcomdotnp.blogspot.com/2010/12/text-prompted-remote-speaker.html Biometrics is, in the simplest definition, something you are. It is a physical characteristic unique to each individual such as fingerprint, retina, iris, speec…
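Speech bandwidth extension with WaveNet Authors: Archit Gupta, Brendan Shillingford, Yannis Assael, Thomas C. Walters Blog URL: https://www.cnblogs.com/LXP-Never/p/12090929.html Blog author: 凌逆战 Abstract: Large-scale mobile communication systems tend to contain legacy transmission channels with narrowband bottlenecks, which results in telephone-quality audio. Although high-quality codecs exist, because of the scale and heterogeneity of the networks, transmitting high-sample-rate audio with modern high-quality audio codecs in…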
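Paper: A perceptually motivated approach for low-complexity, real-time enhancement of fullband speech Paper code Citation: A Perceptually Motivated Approach for Low-complexity, Real-time Enhancement of Fullband Speech Abstract: In recent years, deep-learning-based speech enhancement methods have greatly surpassed traditional methods based on spectral subtraction and spectral estimation. Many of the new techniques operate directly in the short-time Fourier transform (STFT) domain, which leads to high computational complexity. In this work, we propose PercepNet, an efficient approach that…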
Paper: DeepFilterNet: a low-complexity speech enhancement framework for full-band audio based on deep filtering Code: https://github.com/Rikorose/DeepFilterNet Citation: Schröter H, Rosenkranz T, Maier A. DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering[J]. arXiv preprin…
Paper: Time-frequency attention for monaural speech enhancement Citation: Zhang Q, Song Q, Ni Z, et al. Time-Frequency Attention for Monaural Speech Enhancement[C]//ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022: 7852-7856. Abstract: Most speech enhancement studies usually…
Conversion to Dalvik format failed: Unable to execute dex: Multiple dex files define ... This error occurs because the same JAR file is included twice; delete one of the copies and the project runs normally.…
By Daniel Du In the View and Data client-side API, the assets in the Autodesk Viewer have an object tree, a tree structure that represents the model hierarchy. Each element in the model can be represented as a node of the model tree. Each node has a dbId, this…
Problem summary: The string "PAYPALISHIRING" is written in a zigzag pattern on a given number of rows like this: (you may want to display this pattern in a fixed font for better legibility) P A H N A P L S I I G Y I R And then read line by line: "PAHNA…
Conversion Operators in OpenCascade eryar@163.com Abstract. C++ lets us redefine the meaning of the operators when applied to objects. It also lets us define conversion operations for class types. Class-type conversions are used like the built-in con…
In almost 26 years, even an ordinary boy like me has given over 100 speeches and listened to countless more, most of which were boring and tedious. Some professionals may have great intelligence but cannot put it into words; some of them can put it into words bu…