A startup has created an artificial intelligence system capable of mimicking a person's voice with unprecedented accuracy.
In a video from Dessa, an AI company staffed by former employees of Google, IBM, and Microsoft, multiple audio clips demonstrate machine-learning software that parrots the voice of popular podcaster Joe Rogan to a degree almost indiscernible from the real thing.
In the clips, the computer-generated Rogan muses on topics like chimpanzees that can play hockey, pulls off some adept tongue-twisters, and even expounds on theories about how we're all living in a simulation, which, as noted by The Verge, are some of Rogan's favorite topics.
Joe Rogan is one of the most popular podcasters in the world, giving the AI plenty of data to draw from when trying to mimic the host's voice
In response, even Rogan himself called the demonstration 'terrifyingly accurate,' reports CNET.
What makes the demonstration more intriguing, or perhaps scarier, according to Dessa, is that software like the one that channeled Rogan could soon be commonplace.
'Unlike The Singularity, which is this theoretical thing that could happen in 40, 100 years, speech synthesis is soon going to be a reality everywhere,' said Alex Krizhevsky, Principal Machine Learning Architect at Dessa, in a blog post.
Dessa’s software, called Real Talk, had a trove of data to train itself on. Rogan has almost 1,300 episodes of his show available online with each installment lasting about one to five hours.
And while current AI systems like Dessa's require what the company describes as a mixture of 'expertise, ingenuity, computing power and data,' rapid advancements in machine learning and adjacent fields all but ensure voice-mimicking technology will soon be let out of the cage.
‘In the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet,’ reads the post.
The implications of AI-powered voice fabrication are vast, running the gamut from helpful to harmful.
Though the tools would likely have positive applications, like making voice assistants more realistic and improving accessibility through text-to-speech devices, they could also help bad actors impersonate people to commit identity fraud, or even be used to doctor voice recordings of politicians in an effort to sway elections, says Dessa.
Concerns over the use of AI voice-mimicry parallel those surrounding a related phenomenon called 'deepfakes.'
Using similar technology, researchers have found ways of manipulating other media, like pictures and video, that many consider to be unalterable.
Deepfakes are also used in photos and video. Those applications could endanger political campaigns, say skeptics
One recent demonstration by a Japanese company, DataGrid, shows how its system is capable of generating entirely fake, yet convincing, human models.
Echoing Dessa's concerns about AI-manipulated voices, skeptics say such videos could be used to spread misinformation and influence political campaigns.
Because of those concerns, Dessa says it will not release its data sets, research, or models publicly, or at least not yet. A future post about its program will give what the company calls a 'technological overview.'
In the meantime, the company hopes to foster a serious, productive, and, most importantly, unaltered dialogue.
‘Everyone should know what kinds of things are possible with the development of speech synthesis technologies,’ reads a blog post.
‘Public awareness and dialogue also pushes governments, policymakers and lawmakers to take action and develop countermeasures swiftly.’
WHAT IS DEEP LEARNING?
Deep learning is a form of machine learning built on artificial neural networks, with a wide range of applications.
The field was inspired by the human brain and focuses on building networks of simple artificial 'neurons' that learn by adjusting the connections between them.
It originally grew out of brain simulations and out of efforts to make learning algorithms more powerful and easier to use.
Processing vast amounts of complex data then becomes much easier, allowing researchers to trust the algorithms to draw accurate conclusions within the parameters they have set.
Task-specific algorithms can outperform deep learning on narrow, well-defined problems, but deep learning can be applied to a much wider range of data.
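The idea described above can be made concrete with a toy sketch. This is purely illustrative (it is not Dessa's Real Talk system, which remains unreleased): a tiny two-layer neural network that learns the XOR function from four examples by repeatedly nudging its connection weights to reduce its prediction error. All sizes and settings here are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: the XOR function (inputs and target outputs)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

# Two layers of learnable weights; "deep" networks simply stack more of these
W1 = rng.normal(size=(2, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1)); b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Predictions before any training, for comparison
init_out = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

lr = 0.5  # learning rate: how big each weight adjustment is
for _ in range(5000):
    # Forward pass: compute the network's current predictions
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradient of the squared error w.r.t. each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Adjust the connections to reduce the error
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out).flatten())  # rounded predictions after training
```

Real speech-synthesis systems follow the same learn-from-examples principle, only with millions of weights and hours of audio rather than four data points.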