AI Voice Acting

Let’s go into the uncanny VALL-E

She’s helped me temper my Stanford-born enthusiasm for all things in emerging technology with a humanistic understanding of how AI seems like a serious existential danger to an entire profession and people’s livelihoods.

Given my background in art & design, and that I come from a family where my mother is a playwright and theatre professor, and my father is an actor and director —

My perspective is that it doesn’t help to turn a blind eye to emerging technology. We’ve got to understand the state-of-the-art, imagine where it is going, and then find how we need to evolve, and hopefully take advantage of the new opportunities.

Let me pre-empt my conclusion by saying:

However, the emphasis is on craft. The danger to voice actors, and creative professionals as a whole is that:

But if you take your craft seriously as an actor — that is something that won’t readily be reproduced by AI, and I’ll explain why in my actual conclusion.

TEXT-TO-VOICE GENERATION

As the video above shows, you can get a passable read of a piece of text with voice generation tools.

Right now, if you plug a piece of text into a voice generator, you may get some controls or the ability to specify a general emotion like ‘Angry’ — but fine-grained control is not possible.

VOICE CLONING

This is maybe a more worrying topic for voice actors. Recently, James Earl Jones approved the use of AI to generate his voice as Darth Vader:

While Jones was treated with respect and kept in the loop, it’s not difficult to imagine a less ethical production studio using archived audio from previous voiceover performances to create new material.

And then we have VALL-E

The demo (which now seems to be offline) let you play with examples in which a 3-second voice capture was used to generate a reading of any length in that voice.

Where before you had to read very specific scripts to capture a voice, now a very small snippet can capture the essence of your voice.

There’s a lot of potential for abuse here, and while most services post an ‘ethical statement’ — the cat’s out of the bag. You can clone existing voices pretty easily, and as an actor I’m not sure you can easily prove that your own voice was the source material.

Every rapidly emerging, industry-disrupting technology has its watershed moment where that industry realizes it has to respond, and usually the initial response is: Stop this thing!

We’ve seen this with MP3s and the music industry, with Uber/Lyft and the taxi industry, and currently with streaming services and the movie industry’s theatrical release windows.

Once the general public has a taste of the new technology, you can try to restrict it, but I firmly believe that if it truly meets a need, that need will win out. Now — just because something is popular doesn’t mean it’s right, but it’s a lot harder to put the genie back in the bottle unless you make it completely illegal, and even then it still won’t go away.

Rather than swimming upstream, I think it’s better to accept the most likely outcome, plan for it, and even work towards it. My belief is that this all can play out in a way that values original creativity more than ever before. So with that in mind — where’s it all going in the future?

Here’s the fear: a producer or director puts the script into the AI tool, chooses whatever character voices they want, and the AI performs the script perfectly. The job is done, and everyone goes home early.

But — if you’ve witnessed a voice recording session, there’s no such thing as a perfect read, even with a human voice actor. The director will want different readings, to convey the right voice and tone of the work.

Let’s say that the director can instruct the AI precisely, adjusting intonation, emphasis, and emotion until they get exactly the reading that they want.

There’s a lot of literature on why that’s a bad idea — it essentially boils down to the director stepping into the shoes of the actor. So, in effect, the director is acting all of the roles, which is as problematic as a single actor acting all of the roles, which leads us to the next scenario.

So if we’ve decided that we’ll never get an actor’s performance from the AI, then the next natural assumption might be to let an actor perform all of the parts, and then replace their voice with whatever character voice you need. The actor is essentially puppeteering a digital voice.

Now it’s worth thinking more deeply about an actor’s craft. What is an actor bringing to a role, to their performance and to the reading of their lines?

A good actor will be thinking about their character deeply. Their backstory, the context of the scene and their character’s relationship to the other characters. That informs the nuanced reading of their lines.

If you’re asking a single actor to embody multiple characters, you’re multiplying their ‘homework’ (and cognitive load) with every additional character.

Outcome: performances will degrade with every additional character voiced by a single actor

Presumably, the goal in using a single actor would be to cut down on production time, so if you’re not giving your actor additional time, it stands to reason they’ll be able to do less of their ‘homework’ and bring less understanding to each of their roles.

The idea of the Fine-Crafted Future came from a discussion with a colleague about why e-books will never completely replace physical books. It’s why I go back and buy ‘heirloom’ editions of the books I love.

This is a concept I’ve toyed with for well over 10 years now. It boils down to this:

As we value craft, and understand craft more deeply, we begin to understand the layers of expertise that a voice actor brings to their work.

But this is all better said in the words of my wife, Helen, as she reviews the latest in voice generation technology:

In my opinion, we need to accept that we’re heading towards a great ‘Creative Divergence’.

Some people like to eat candy. Some people eat lots and lots of candy. But candy doesn’t put the fine-dining restaurant industry out of business. AI-driven media is the equivalent of candy-on-tap for everyone 24 hours a day. So if that’s the case, you want to be on the fine-dining side of the food industry.

Here are some further discussions and thoughtful investigations on the topic of voice cloning & AI:
