VALL-E
United States
Voice conversion, speech synthesis

Simulates anyone's voice from 3 seconds of audio

Microsoft's new text-to-speech model can preserve a speaker's emotional tone and acoustic environment.

VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.

Microsoft calls VALL-E a "neural codec language model," and it builds off of a technology called EnCodec, which Meta announced in October 2022. Unlike other text-to-speech methods that typically synthesize speech by manipulating waveforms, VALL-E generates discrete audio codec codes from text and acoustic prompts. It basically analyzes how a person sounds, breaks that information into discrete components (called "tokens") thanks to EnCodec, and uses training data to match what it "knows" about how that voice would sound if it spoke other phrases outside of the three-second sample.
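To make the idea of "discrete audio codec codes" concrete, here is a minimal sketch of turning a waveform into tokens by nearest-neighbor lookup in a codebook. It is a toy illustration only: it is not EnCodec and not VALL-E, and the codebook values, the two-sample frame size, and the function names are all assumptions made up for this example. A real neural codec learns its codebooks from data and works on learned features rather than raw samples.

```python
# Hypothetical toy codebook: 4 entries, each a 2-sample "frame".
# These values are invented for illustration; a real codec learns them.
CODEBOOK = [
    (0.0, 0.0),
    (0.5, 0.5),
    (-0.5, -0.5),
    (0.5, -0.5),
]


def frame_signal(samples, frame_size):
    """Split samples into consecutive frames, dropping any incomplete tail."""
    usable = len(samples) - len(samples) % frame_size
    return [tuple(samples[i:i + frame_size]) for i in range(0, usable, frame_size)]


def nearest_code(frame, codebook):
    """Index of the codebook entry closest to the frame (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
    )


def encode(samples, codebook=CODEBOOK):
    """Map a waveform to a list of discrete tokens (codebook indices)."""
    frame_size = len(codebook[0])
    return [nearest_code(f, codebook) for f in frame_signal(samples, frame_size)]


def decode(tokens, codebook=CODEBOOK):
    """Map tokens back to an approximate waveform by concatenating codebook entries."""
    return [sample for t in tokens for sample in codebook[t]]


# A made-up 9-sample "acoustic prompt"; the last sample is dropped as an incomplete frame.
prompt = [0.1, -0.1, 0.4, 0.6, -0.6, -0.4, 0.45, -0.55, 0.9]
tokens = encode(prompt)
reconstruction = decode(tokens)
print(tokens)          # [0, 1, 2, 3]
print(reconstruction)  # [0.0, 0.0, 0.5, 0.5, -0.5, -0.5, 0.5, -0.5]
```

Once audio is a sequence of integers like this, a language model can in principle be trained to predict the next token from text and a short acoustic prompt, which is the framing the quoted description attributes to VALL-E.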

For the paper's conclusion, they write:

"Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models."

Source: https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/

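Special notice about VALL-E

The VALL-E listing provided by this site, GPT 案例导航, comes from the web. The accuracy and completeness of external links are not guaranteed, and GPT 案例导航 does not control where those links point. When this page was indexed on March 9, 2023 at 10:44 PM, its content was compliant and legal. If the page's content later violates any rules, contact the site administrator directly to have it removed; GPT 案例导航 assumes no responsibility.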
