Microsoft Develops AI Tool for Perfect Human Voice Cloning

Mon Jul 15 2024

NEW YORK: Microsoft has developed an artificial intelligence tool, VALL-E 2, capable of replicating human speech with extraordinary accuracy, sparking both awe and concern over potential misuse.

VALL-E 2, the successor to the original VALL-E announced in January 2023, employs advanced text-to-speech technology that can mimic a voice from mere seconds of audio input. Microsoft claims the tool achieves “human parity,” meeting or surpassing benchmarks for naturalness and speaker similarity, a feat unmatched by previous systems.

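Key innovations include Repetition Aware Sampling, which makes speech more fluid by tracking recently generated audio tokens and steering away from repetitive sounds, and Grouped Code Modeling, which bundles audio codes into groups to shorten the sequence the model must process, speeding up synthesis without compromising quality.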
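For readers curious how these two ideas might look in practice, the following is a minimal Python sketch, not Microsoft's implementation. It assumes the model emits a probability for each possible audio token at every step. The function names, the window size, the repetition threshold and the fallback to plain random sampling are all illustrative assumptions rather than details confirmed by the company.

```python
import random
from typing import List, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of most likely tokens whose total mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    rng: random.Random,
    top_p: float = 0.8,      # illustrative value
    window: int = 10,        # illustrative value
    max_ratio: float = 0.1,  # illustrative value
) -> int:
    """Pick the next token, falling back to full random sampling when it repeats too often."""
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history[-window:])
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > max_ratio:
            # Assumed fallback: draw from the whole distribution to break the loop.
            token = rng.choices(range(len(probs)), weights=list(probs), k=1)[0]
    return token


def group_codes(codes: Sequence[int], group_size: int) -> List[List[int]]:
    """Bundle consecutive codes so the model handles one group per step."""
    return [list(codes[i:i + group_size]) for i in range(0, len(codes), group_size)]
```

With a group size of 2, for example, `group_codes([1, 2, 3, 4, 5, 6], 2)` turns six steps into three, which is the sense in which grouping shortens the sequence.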

In a statement, Microsoft Research highlighted VALL-E 2’s superiority in speech robustness and naturalness over earlier systems on benchmarks like LibriSpeech and VCTK, using the ELLA-V evaluation framework for zero-shot text-to-speech synthesis.

Despite its groundbreaking capabilities, Microsoft has opted not to release VALL-E 2 to the public, citing significant concerns about potential misuse. The company emphasized that allowing unrestricted access could facilitate malicious activities such as voice identification spoofing or impersonation, posing risks to security and privacy.

“We have no plans to incorporate VALL-E 2 into a product or expand public access,” Microsoft stated on its website. The tech giant has set up reporting mechanisms to address suspected misuse.

Schemes such as “vishing,” in which AI-generated voices are used in phishing attacks, highlight the urgent need for responsible AI deployment. Earlier this year, an individual used voice spoofing in an attempt to influence voter behavior in New Hampshire.

Beyond security concerns, Microsoft faces scrutiny over its broader AI initiatives, including antitrust implications and data privacy controversies stemming from partnerships like its $13 billion venture with OpenAI. The company has also faced blowback from its users.

Recall, an “AI assistant” that takes screen captures of a device every few seconds, saw its release indefinitely postponed last month.

Microsoft faced a deluge of criticism from consumers, data privacy experts and regulators such as the Information Commissioner’s Office in the UK.

In a statement to The US Sun, a company spokesperson said Recall would shift “from a preview experience broadly available for Copilot+ PCs…to a preview available first in the Windows Insider Program.”
