Meta AI's SeamlessM4T v2: Ushering in a New Era of Global Speech Communication

Meta AI's SeamlessM4T v2 model marks a major breakthrough in speech and language translation technology, bringing the dream of real-time multilingual conversion closer to reality. This article introduces the model's expressive and streaming capabilities, highlighting the naturalness and accuracy of translated conversations. It also discusses how these systems were built responsibly, including toxicity mitigation and audio watermarking, to keep the technology safe and reliable. By releasing its code, models, and data, Meta AI invites the global community to take part in this advance.

Introducing a suite of AI language translation models that preserve expression and improve streaming

In our increasingly interconnected world, where language differences may present a barrier to communication, translation systems can enable people from different linguistic backgrounds to share knowledge and experiences more seamlessly. However, many of these systems today do not preserve key elements of speech that make human communication human. More specifically, it’s not just the words we choose that convey what we want to say—it’s also how we speak them. Tone of voice, pauses, and emphasis carry important signals that help us communicate emotions and intent. Moreover, human speech and translation are sensitive to nuances such as turn-taking and timing controls. Picture, for example, how human interpreters work: they find just the right balance between low-latency and accurate translations. Waiting too long stifles the flow of communication, while going too fast compromises the overall quality of a translation. Translation systems that enable authentic conversations should deliver across all of these elements of communication.

Today, we are excited to share Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real time. To build Seamless, we developed SeamlessExpressive, a model for preserving expression in speech-to-speech translation, and SeamlessStreaming, a streaming translation model that delivers state-of-the-art results with around two seconds of latency. All of the models are built on SeamlessM4T v2, the latest version of the foundational model we released in August. SeamlessM4T v2 demonstrates performance improvements for automatic speech recognition, speech-to-speech, speech-to-text, and text-to-speech capabilities. Compared to previous efforts in expressive speech research, SeamlessExpressive addresses certain underexplored aspects of prosody, such as speech rate and pauses for rhythm, while also preserving emotion and style. The model currently preserves these elements in speech-to-speech translation between English, Spanish, German, French, Italian, and Chinese.

SeamlessStreaming unlocks real-time conversations with someone who speaks a different language by generating the translation while the speaker is still talking. In contrast to conventional systems, which wait until the speaker has finished their sentence before translating, SeamlessStreaming begins translating mid-utterance. This means the listener hears the translation in closer to real time, with a delay of only a few seconds, rather than waiting until the speaker has finished their sentence. SeamlessStreaming supports automatic speech recognition and speech-to-text translation for nearly 100 input and output languages, and speech-to-speech translation for nearly 100 input languages and 36 output languages. In keeping with our approach to open science, we're publicly releasing all four models to allow researchers to build on this work.

Introducing metadata, data and data alignment tools

Today, alongside our models, we are releasing metadata, data and data alignment tools to assist the research community, including:

  • Metadata of an extension of SeamlessAlign corresponding to an additional 115,000 hours of speech and text alignments on top of the existing 470,000 hours. In addition to more hours, the latest version of SeamlessAlign covers a broader range of languages (from 37 previously to 76 with the extension). This corpus is the largest public speech/speech and speech/text parallel corpus in terms of total volume and language coverage to date.
  • Metadata of SeamlessAlignExpressive, an expressivity-focused version of the dataset above. In this dataset, the pairs are parallel from both a semantic and prosodic perspective. SeamlessAlignExpressive is released as a benchmark to validate our expressive alignment approach. In order to train our expressive models, we applied our alignment method to a proprietary dataset.
  • Translated text data for mExpresso, a multilingual, parallel extension of the read-speech portion of Expresso, a high-quality expressive speech dataset that includes both read speech and improvised dialogues rendered in different styles. This text benchmark enables benchmarking of expressive translation systems from English into other languages.
  • Tools to assist the research community in collecting more datasets for translation.

In particular, we are updating our stopes library and SONAR encoders. With these tools, anyone can automatically create multimodal translation pairs from their own speech and/or text monolingual data through parallel data alignment methods.
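
To make the alignment idea concrete, here is a minimal sketch of margin-based bitext mining, the general technique such pipelines rely on: segments from two monolingual corpora are embedded into a shared space, and pairs whose similarity stands out against their nearest neighbours are kept as candidate translations. The random vectors below stand in for real SONAR text or speech encoder outputs, and the neighbourhood size and threshold are illustrative values, not taken from the released tooling.

```python
import numpy as np

def mine_parallel_pairs(src_emb, tgt_emb, k=4, threshold=1.06):
    """Margin-based mining sketch: score each source/target pair by its cosine
    similarity normalized by the average similarity to the k nearest
    neighbours on both sides, then keep pairs above a threshold."""
    # Normalize so dot products are cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                                      # (n_src, n_tgt) cosine matrix

    # Average similarity to the k nearest neighbours, per source row / target column.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)

    margin = sim / (0.5 * (knn_src[:, None] + knn_tgt[None, :]))
    best_tgt = margin.argmax(axis=1)
    return [(i, j, float(margin[i, j]))
            for i, j in enumerate(best_tgt) if margin[i, j] >= threshold]

# Toy usage with random "embeddings" standing in for SONAR encoder outputs.
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(100, 64))
tgt_emb = rng.normal(size=(120, 64))
print(mine_parallel_pairs(src_emb, tgt_emb)[:5])
```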

Our approach

All our models run on fairseq2, the latest update of our sequence modeling toolkit. Similar to our previous work on SeamlessM4T, fairseq2 offers an ideal framework for building our streaming and expressivity updates because it is lightweight, easily composable with other PyTorch ecosystem libraries, and has more efficient modeling and data loader APIs.

UnitY2, a new architecture that has a non-autoregressive text-to-unit decoder, is also instrumental to our work. In SeamlessM4T v2, we used multitask-UnitY2 to enable text input (updated from v1’s multitask-UnitY). We also used the architecture for SeamlessStreaming and SeamlessExpressive. As our next generation multitask model, UnitY2 has superior speech generation capabilities through its improved text-to-unit model. This implementation leads to improved consistency between text output and speech output, compared to the SeamlessM4T v1 model.

Instead of using an autoregressive text-to-unit model as in UnitY, we used a non-autoregressive model. Autoregressive models predict the next token based on the previously generated tokens. While autoregressive models model speech naturally, they scale poorly as sequence length increases. They are also more likely to exhibit repetitive degeneration. Non-autoregressive models predict the duration of each segment, which enables each segment to be decoded in parallel. This makes them robust to long sequences, and we see improvements over the initial iteration of UnitY. Since the model inherently predicts duration, it is much more easily adaptable to the streaming use case, because we know exactly how much speech is needed to be generated for each piece of text, which is not the case for autoregressive models.
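
To illustrate the duration mechanism described above, here is a minimal PyTorch sketch of how predicted durations let a non-autoregressive decoder expand text states into a unit-length sequence in one shot. The function and shapes are illustrative only, not the UnitY2 implementation.

```python
import torch

def upsample_by_duration(text_states, durations):
    """Repeat each encoder state according to its predicted duration so the
    expanded sequence can be decoded into discrete units in parallel.
    text_states: (T, D) hidden states; durations: (T,) integer frame counts."""
    return torch.repeat_interleave(text_states, durations, dim=0)

# Toy example: 4 text positions, hidden size 8.
text_states = torch.randn(4, 8)
durations = torch.tensor([2, 1, 3, 2])         # would come from a duration predictor
expanded = upsample_by_duration(text_states, durations)
print(expanded.shape)                           # torch.Size([8, 8]) -> 2+1+3+2 frames

# Because the total output length is known up front (durations.sum()), all unit
# positions can be predicted in one parallel pass instead of token by token, and
# a streaming system knows exactly how much speech a given text chunk will produce.
```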

Streaming

EMMA is our core streaming algorithm, which allows us to intelligently decide when we have enough information to generate the next speech segment or target text. It improves upon previous state-of-the-art algorithms, especially for long input sequences, which is the case for speech-to-text and speech-to-speech translation. Further, this algorithm allows us to fine-tune from offline models, so we can reap the benefits of the SeamlessM4T v2 foundation model. Finally, we show empirically that this algorithm generalizes well across many different language pairs, which is particularly challenging for streaming models because language pairs may be structured differently.
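
The sketch below shows the general read/write loop that simultaneous translation policies such as EMMA plug into: after each new chunk of source, the policy decides how many target words it is confident enough to commit. The decision function here is a crude hand-written heuristic standing in for EMMA's learned monotonic attention, and the names (`translate_prefix`, `confident`) are hypothetical.

```python
from typing import Callable, Iterable, List

def simultaneous_decode(
    source_chunks: Iterable[str],
    translate_prefix: Callable[[List[str]], List[str]],
    confident: Callable[[List[str], List[str]], bool],
) -> List[str]:
    """Simplified READ/WRITE loop for simultaneous translation: after each new
    source chunk, commit only the target words the policy is already confident
    about, then flush the remainder once the source is finished."""
    read: List[str] = []
    emitted: List[str] = []
    for chunk in source_chunks:
        read.append(chunk)                            # READ: consume more source
        hypothesis = translate_prefix(read)           # re-translate the current prefix
        while len(emitted) < len(hypothesis) and confident(read, hypothesis[: len(emitted) + 1]):
            emitted.append(hypothesis[len(emitted)])  # WRITE: commit one target word
    emitted.extend(translate_prefix(read)[len(emitted):])  # source ended: flush the rest
    return emitted

# Toy usage: "translation" just uppercases words, and the policy keeps a one-chunk
# lag behind the source -- a crude latency/quality trade-off.
chunks = ["hallo", "wie", "geht", "es", "dir"]
print(simultaneous_decode(
    chunks,
    translate_prefix=lambda src: [w.upper() for w in src],
    confident=lambda src, prefix: len(src) >= len(prefix) + 1,
))
```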

Expressivity

Preserving expression also requires a new approach. We replaced the unit HiFi-GAN vocoder in SeamlessM4T v2 with PRETSSEL, an expressive unit-to-speech generator. PRETSSEL is conditioned on the source speech for waveform generation to transfer tones, emotional expression, and vocal style qualities. We initialize our model from SeamlessM4T v2 in order to achieve high translation quality, which is the most fundamental need for a speech-to-speech translation system. We also developed Prosody UnitY2, integrating an expressivity encoder in SeamlessM4T v2 to guide unit generation with proper rhythm, speaking rate, and pauses. In addition, we release a suite of evaluation tools to capture the preservation of these aspects of expressivity.
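
As a rough illustration of expressivity conditioning (not the actual PRETSSEL or Prosody UnitY2 architecture), the sketch below pools an expressivity embedding from source-speech features and uses it to modulate the unit decoder's states with FiLM-style scaling and shifting. All module names and dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ExpressivityConditionedUnitDecoder(nn.Module):
    """Illustrative conditioning scheme: pool an expressivity embedding from
    source-speech features and use it to scale/shift the unit decoder states,
    so rhythm and vocal style from the source influence the generated units."""

    def __init__(self, feat_dim=80, hidden_dim=256, n_units=1000):
        super().__init__()
        self.expressivity_encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.film = nn.Linear(hidden_dim, 2 * hidden_dim)   # -> (scale, shift)
        self.unit_proj = nn.Linear(hidden_dim, n_units)

    def forward(self, decoder_states, source_features):
        # decoder_states: (B, T, hidden_dim) semantic states from the translation model
        # source_features: (B, S, feat_dim), e.g. mel filterbanks of the source speech
        _, expr = self.expressivity_encoder(source_features)   # (1, B, hidden_dim)
        scale, shift = self.film(expr[-1]).chunk(2, dim=-1)    # (B, hidden_dim) each
        conditioned = decoder_states * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return self.unit_proj(conditioned)                     # (B, T, n_units) unit logits

# Toy usage with random tensors.
model = ExpressivityConditionedUnitDecoder()
logits = model(torch.randn(2, 50, 256), torch.randn(2, 400, 80))
print(logits.shape)   # torch.Size([2, 50, 1000])
```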

Results

The updates to UnitY2 have resulted in improved translation quality across a variety of tasks. SeamlessM4T v2 achieves state-of-the-art speech-to-speech and speech-to-text translation results in 100 languages. Within the same model, it also beats Whisper v3 for automatic speech recognition on average, and in particular for lower-resource languages.

For speech-to-text translation, SeamlessM4T v2 improves by 10% compared to the model we released in August and by more than 17% over the strongest cascaded models when translating into English. For speech-to-speech translation, SeamlessM4T v2 improves over SeamlessM4T (v1) by more than 15% when translating into English, and by 25% when translating from English.

In other tasks, SeamlessM4T v2 is on par with No Language Left Behind (NLLB) in text-to-text translation. It is also on par on average with MMS in automatic speech recognition (ASR), with better performance on mid- and high-resource languages while MMS performs better on low-resource languages, and it improves over the recently released Whisper-Large-v3 by more than 25%. In the zero-shot task of text-to-speech translation, SeamlessM4T v2 is on par with strong cascaded models into English, and improves over these baselines by 16 percent in English.

We compared SeamlessExpressive against a cascaded speech-to-text and text-to-speech pipeline, where speech-to-text is from SeamlessM4T v2 and text-to-speech is from a strong open-source cross-lingual text-to-speech system that supports vocal style and emotion transfer. Results show that SeamlessExpressive is more stable with respect to noise in the source speech, so the output speech maintains high content translation quality while better preserving style and speech rate. SeamlessStreaming achieves state-of-the-art low-latency quality for speech-to-speech translation.

How we built AI translation systems responsibly: Toxicity mitigation

Accuracy is paramount in translation systems. Translation errors or unintended toxicity can cause misunderstandings between two people who don’t speak the same language.

In keeping with our commitment to building responsible AI, we explored the problem of hallucinated toxicity further. We focused our efforts on SeamlessM4T v2, which serves as the foundation for SeamlessStreaming, SeamlessExpressive, and our unified Seamless model.

The primary root cause for hallucinated toxicity often lies in the training data. Training samples can be noisy and contain unbalanced toxicity. For example, the input language side and target language side can contain different amounts of toxic words by mistake. Prior to training, we discarded any sample that showed signs of this imbalance.
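
A minimal sketch of such an imbalance filter, assuming a simple word-list based toxicity detector (real pipelines use stronger, per-language classifiers), might look like this:

```python
def toxicity_count(words, toxic_lexicon):
    """Count words that appear in a per-language toxicity word list."""
    return sum(1 for w in words if w.lower() in toxic_lexicon)

def is_balanced(src_text, tgt_text, src_lexicon, tgt_lexicon, max_delta=0):
    """Keep a training pair only if source and target contain (roughly) the
    same number of toxic terms; a mismatch suggests misalignment or noise."""
    delta = abs(
        toxicity_count(src_text.split(), src_lexicon)
        - toxicity_count(tgt_text.split(), tgt_lexicon)
    )
    return delta <= max_delta

# Toy usage with made-up lexicons: the second pair would be discarded because
# the target adds a toxic word the source does not contain.
src_lex, tgt_lex = {"stupid"}, {"stupide"}
pairs = [("the idea is great", "l'idée est géniale"),
         ("the idea is great", "l'idée est stupide")]
print([p for p in pairs if is_balanced(p[0], p[1], src_lex, tgt_lex)])
```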

However, filtering is only a passive technique and does not fully prevent hallucinated toxicity. We went one step further this time, and implemented a novel approach that actively mitigates this phenomenon. During the translation generation process, our model automatically detects generated toxic words. When there are misaligned levels of toxicity, we automatically re-adjust the generation process and use a different choice of words. This works at inference time and does not require any fine-tuning of the translation model. By doing so, we significantly reduce added toxicity while preserving translation quality.
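
The sketch below illustrates the general shape of such inference-time mitigation, with a hypothetical `generate` function that accepts a set of banned words and a toy word-list detector; it is not the actual decoding integration used in Seamless.

```python
def mitigate_added_toxicity(source, generate, count_toxicity, max_retries=3):
    """If the candidate translation contains more toxic terms than the source
    ("added toxicity"), re-run generation with the offending words banned,
    without touching the model weights."""
    banned = set()
    candidate = generate(source, banned)
    for _ in range(max_retries):
        src_tox = count_toxicity(source)
        cand_tox = count_toxicity(candidate)
        if len(cand_tox) <= len(src_tox):        # no added toxicity: accept
            return candidate
        banned |= cand_tox - src_tox             # ban the hallucinated toxic words
        candidate = generate(source, banned)
    return candidate

# Toy usage: a fake generator that simply drops banned words, plus a word-list detector.
toxic_lexicon = {"idiot"}
count = lambda text: {w for w in text.split() if w in toxic_lexicon}
fake_generate = lambda src, banned: " ".join(
    w for w in "you idiot are welcome".split() if w not in banned)
print(mitigate_added_toxicity("you are welcome", fake_generate, count))
```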

Finally, building upon our past work on toxicity and bias evaluation, we’ve extended our evaluation framework with a new hallucinated toxicity detection tool. While our previous approach relied on an intermediate transcription model (ASR), we are now capable of detecting toxicity directly in the speech signal. This is useful in cases where toxicity is not conveyed by individual words, but rather in tone or general style. This allows us to get a more precise picture of the potential toxicity profile of our model. Additional research needs to be done on responsible AI for machine translation; however, we believe these measures bring us closer to realizing safer and more human-centric translation systems.

Audio watermarking

While AI tools can help bring the world closer together, it’s just as important that we include measures to prevent the risk of imitation and other forms of misuse. Our watermarking method offers a better level of reliability compared to passive discriminators, which are becoming less effective at differentiating synthetic voices from human ones as voice preservation technology advances. Watermarking actively embeds a signal that is imperceptible to the human ear, but still detectable within the audio using a detector model. Through this watermark, the origin of the audio can be accurately traced. This helps promote the responsible use of voice preservation technology by establishing a verifiable audio provenance and helps prevent potential abuses.

Beyond sheer detection accuracy, our watermarking solution needs to be robust to various attacks. For example, bad actors can try to modify the audio by adding noise, echo, or filtering some frequencies to dilute the watermark and bypass detection. We tested our watermarking method against a broad range of attack types and the results show that it is more robust than the current state-of-the-art. Our method can also pinpoint AI-generated segments in audio down to the frame level, surpassing the previous state-of-the-art (which only provides a one second resolution).
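
To make the embed-and-detect principle concrete, here is a toy correlation-based watermark, not the neural watermarking model described above: a keyed pseudorandom pattern is added at low amplitude, and the detector flags frames whose correlation with that pattern exceeds a threshold, which gives frame-level localization. All constants are illustrative.

```python
import numpy as np

FRAME = 1600          # 0.1 s at 16 kHz in this toy setup
STRENGTH = 0.001      # watermark amplitude, kept well below the signal level here

def keyed_pattern(key):
    """Pseudorandom frame pattern derived from a secret key; the detector needs the key."""
    return np.random.default_rng(key).standard_normal(FRAME)

def embed_watermark(audio, key):
    """Add the low-amplitude keyed pattern to every frame of the audio."""
    pattern = keyed_pattern(key)
    out = audio.copy()
    for start in range(0, len(out) - FRAME + 1, FRAME):
        out[start:start + FRAME] += STRENGTH * pattern
    return out

def detect_watermark(audio, key, threshold=0.05):
    """Correlate each frame with the keyed pattern; frames above the threshold
    are flagged as watermarked, giving frame-level localization."""
    pattern = keyed_pattern(key)
    flags = []
    for start in range(0, len(audio) - FRAME + 1, FRAME):
        frame = audio[start:start + FRAME]
        corr = np.dot(frame, pattern) / (np.linalg.norm(frame) * np.linalg.norm(pattern) + 1e-9)
        flags.append(bool(corr > threshold))
    return flags

# Toy usage: watermark only the second half of a 2 s clip, then localize it.
rng = np.random.default_rng(0)
clean = 0.01 * rng.standard_normal(32000)                 # 2 s of "speech-like" noise
mixed = np.concatenate([clean[:16000], embed_watermark(clean[16000:], key=42)])
print(detect_watermark(mixed, key=42))                    # mostly False, then mostly True
```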

As with any neural-network-based safety mechanism, the watermarking model could be fine-tuned in isolation to forget its core properties. However, fine-tuning SeamlessExpressive and Seamless for translation purposes would not involve any update to the watermarking model itself, which plays no role in translation quality.

Providing access to our technology

The breakthroughs we've achieved with Seamless show that the dream of a universal, real-time translator isn't science fiction: it's becoming a technical reality. We invite everyone to try our expressive translation demo. We're also making our code, models, and data available to the research community.

Try the expressive translation demo

https://seamless.metademolab.com/expressive?utm_source=metaai&utm_medium=web&utm_campaign=fair10&utm_content=blog

Try the Hugging Face demo

https://huggingface.co/collections/facebook/seamless-communication-6568d486ef451c6ba62c7724

Download the code, model, and data

https://github.com/facebookresearch/seamless_communication

Read the paper

https://ai.meta.com/research/publications/seamless-multilingual-expressive-and-streaming-speech-translation/

Visit the Seamless website

https://ai.meta.com/research/seamless-communication

This blog post was made possible by the work of Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-jussà, Maha Elbayad, Hongyu Gong, Francisco Guzmán, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, and Mary Williamson.
