OpenAI Previews ‘Voice Engine’ Audio Tool That Can Clone Human Voices With 15 Seconds of Audio


OpenAI is sharing early results from a test of a feature that can read words aloud in a convincing human voice, highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks. The company released early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which has gone out to about 10 developers so far, a spokesperson said. OpenAI decided against a wider rollout of the feature, which it briefed reporters on earlier this month.

A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators and creatives. The company had initially planned to release the tool to as many as 100 developers through an application process, according to the earlier press briefing.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company wrote in a blog post Friday. “We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Other AI technology has already been used to fake voices in some contexts. In January, a bogus but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primaries — an event that stoked AI fears ahead of critical global elections.

Unlike OpenAI’s previous efforts at generating audio content, Voice Engine can create speech that sounds like individual people, complete with their specific cadence and intonations. All the software needs is 15 seconds of recorded audio of a person speaking to recreate their voice.

During a demonstration of the tool, Bloomberg listened to a clip of OpenAI Chief Executive Officer Sam Altman briefly explaining the technology in a voice that sounded indistinguishable from his actual speech, but was entirely AI-generated.

“If you have the right audio setup, it’s basically a human-caliber voice,” said Jeff Harris, a product lead at OpenAI. “It’s a pretty impressive technical quality.” However, Harris said, “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”

One of OpenAI’s current developer partners, the Norman Prince Neurosciences Institute at the not-for-profit health system Lifespan, is using the technology to help patients recover their voices. For example, the tool restored the voice of a young patient who had lost the ability to speak clearly because of a brain tumor, replicating her speech from an earlier recording she had made for a school project, the company said in the blog post.

OpenAI’s custom speech model can also translate the audio it generates into different languages. That makes it useful for companies in the audio business, like Spotify Technology SA. Spotify has already used the technology in its own pilot program to translate the podcasts of popular hosts like Lex Fridman. OpenAI also touted other beneficial applications of the technology, such as creating a wider range of voices for educational content for children.

In the testing program, OpenAI requires its partners to agree to its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they’re hearing are AI-generated. The company is also embedding an inaudible watermark in the audio so it can determine whether a given clip was created by its tool.

Before deciding whether to release the feature more broadly, OpenAI said it’s soliciting feedback from outside experts. “It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company said in the blog post.

OpenAI also wrote that it hopes the preview of its software “motivates the need to bolster societal resilience” against the challenges brought about by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It’s also seeking public education about deceptive AI content and more development of techniques for detecting whether audio content is real or AI-generated.

© 2024 Bloomberg L.P.

