Meta Launches AI Model for Translating Conversations in 100 Languages

On Tuesday, Meta Platforms, the parent company of Facebook, launched an AI model that can translate and transcribe speech in dozens of languages, laying the groundwork for tools that could enable real-time communication across language barriers. In a blog post, the company said its SeamlessM4T model supports speech-to-text and text-to-speech translation in nearly 100 languages, as well as full speech-to-speech translation in more than 35 languages, combining technologies that were previously available only in separate models.
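Meta's blog post does not include usage code, but the company released the model weights for non-commercial research, and SeamlessM4T was later integrated into the Hugging Face transformers library. The sketch below shows roughly how speech-to-text translation might look through that integration; the checkpoint name, the library version, and the input file "example_speech.wav" are illustrative assumptions, not details from the article.

```python
# A minimal sketch of speech-to-text translation with SeamlessM4T via the
# Hugging Face `transformers` integration (assumed >= 4.35). The checkpoint
# name and the input file "example_speech.wav" are illustrative assumptions.
import torchaudio
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Load a local clip and resample it to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("example_speech.wav")
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(audios=waveform.squeeze().numpy(),
                   sampling_rate=16_000, return_tensors="pt")

# generate_speech=False requests text output (speech-to-text translation);
# tgt_lang selects the target language by a three-letter code ("fra" = French).
output_tokens = model.generate(**inputs, tgt_lang="fra", generate_speech=False)
print(processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True))
```

Omitting generate_speech=False would instead return a synthesized waveform, which corresponds to the speech-to-speech path the article describes.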

Meta CEO Mark Zuckerberg envisions such tools enabling users around the world to interact in the metaverse, the set of interconnected virtual worlds on which the company is betting its future. The blog post said Meta will make the model available to the public for non-commercial use.

This year, the social media giant has released a series of free AI models, including a large language model known as LLaMA, which poses a significant challenge to the proprietary models sold by Microsoft-backed OpenAI and Alphabet's Google. Zuckerberg argues that an open AI ecosystem works to Meta's advantage: the company has more to gain by effectively crowdsourcing the creation of consumer-facing tools for its social platforms than by charging for access to its models.

However, like others in the sector, Meta faces legal questions over the data used to train its models. Comedian Sarah Silverman and two other authors filed lawsuits in July against Meta and OpenAI, accusing the companies of using their books as training data without permission. For the SeamlessM4T model, Meta researchers said in a research paper that they gathered audio training data from 4 million hours of "raw audio originating from a publicly available repository of crawled web data," without naming the source. The paper said the text data came from datasets created last year that drew on content from Wikipedia and associated websites.