How Machine Learning and AI Can Amplify Your Creativity

We all know that the only constant is change.

And this is particularly true for the tools and workflows we use to produce film and television.

Many of us will have fond memories of shooting standard definition footage on tape. But now, just a few short years later, we’ve got 12K cameras, virtual sets, and cloud-based workflows.

And things aren’t slowing down. Artificial Intelligence (AI) and Machine Learning (ML) tools are poised to move us forward, faster and further than ever.

If that last paragraph makes you feel anxious, you’re not alone. For as long as computers have been around, there have been headlines telling you that they’re after your jobs.

I prefer to look at it in a different way.

In this article, I’ll get you up to speed on the state of AI tools and the impact they’re already having on the creative industry. If you want to dive deeper down the rabbit hole, check out my technical breakdown of even more AI tools, and what they mean for filmmakers.

Amped up

If someone took away all of the things you use to do your work, would you still consider yourself a creative person? Of course you would.

Machine Learning and AI tools won’t change that. They’ll amplify your creativity. The super-smart assistants coming our way are just that—assistants—and the people who adopt these tools will become more creative, not less.

And there will be more of us, too.

At the dawn of cinema, very few people had the opportunity to express their creativity through motion imagery. The hand-cranked cameras and nitrate film of the late 1800s were just too expensive or complex to be accessible. And throughout the film era, motion imagery remained an exclusive club.

But every digital leap has brought with it an explosion in personal and professional creativity, to the extent that video is now considered an essential form of communication.

There’s a huge opportunity here. We can combine the best of what we do with the best that machines can offer. With machines handling the mundane, we’re left with more time for creativity and experimentation.

And when so many can attain a high standard of visual quality, it raises the bar for the rest of us. I think the world will be even more beautiful and entertaining as a result.

The robots are already here

As Steve Jobs said, “Technology is nothing. What’s important is that you have a faith in people, and if you give them tools, they’ll do wonderful things with them.”

Let’s be very clear: Tools are not competition.

We are so much more than the tools we use. Technology is for us.

Machine Learning is already in play and the industry is reshaping itself like it always does. But ML is here to play a supporting role—think less SKYNET, more WALL-E.

It’s still up to us to pick and choose what works and ditch what doesn’t.

But before we take a look at the innovations that are likely to reach tomorrow’s mainstream, let’s get something straight.

Machine Learning is not intelligent, even though marketers seem to want to slap the term Artificial Intelligence on everything these days.

Almost everything that is reported as being “Artificial Intelligence” is Machine Learning, where a “machine”—mostly some variation on neural networks, but not exclusively—is trained to perform a task.

ML, AI, or whatever you call it, is effectively rote learning.

It’s a system built to analyze a data set to produce a desirable result—like object recognition or upscaling in photos, noise reduction or rotoscoping in video, transcript generation from audio recordings, etc. It doesn’t know what the right answer is until it’s told.
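
To make that concrete, here’s a minimal sketch of supervised learning in Python, using scikit-learn and a tiny, invented two-feature “data set” of my own. The point is simply that the model only produces useful answers because a human supplies the correct labels up front.

```python
# pip install scikit-learn
from sklearn.neural_network import MLPClassifier

# An invented data set: each clip is described by two hand-picked features
# (say, average brightness and amount of motion), and a human has already
# labeled each one as "interview" (0) or "b-roll" (1).
X = [[0.80, 0.10], [0.75, 0.15], [0.20, 0.90], [0.30, 0.85]]
y = [0, 0, 1, 1]

# A small neural network is trained to imitate those human-supplied labels.
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)

# It can now guess at new examples, but only because it was "told" the answers.
print(model.predict([[0.78, 0.12]]))  # -> [0], i.e. "interview"
```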

A GAN won’t suddenly stop crunching numbers, shout “I’ve cracked the third act!” and rush to the nearest keyboard. Inspiration and intuition are not ML’s strong points.

Drudgery, on the other hand? ML is great at drudgery.

So let’s take a look at some examples of how Machine Learning is being used in video creation already, and think about the tedious work it frees us from.

Storytelling 

It should come as no surprise that storytelling, a craft driven by creativity and humanity, is poorly served by machines.

For example, Beyond the Fence was the first stage musical “written by computers,” but in reality, writers Benjamin Till and Nathan Taylor worked with a series of Machine Learning models to speed up the creation process. (And wrote the music. And some of the lyrics.)

Reviews like these don’t paint the results as a success.

Examine the project from a productivity perspective, however, and it looks different: Till reports that his previous, unassisted project took thirteen times longer to complete.

That’s a huge amount of time saved—which perhaps could have gone into making a better musical. But that wasn’t really the point of the exercise.

(On a side note, the machine used for the content generation was dubbed Android Lloyd Webber. So there’s that, at least.)

A similar conclusion can be drawn from Sunspring, which demonstrates what happens when you leave scriptwriting entirely to the machines. Kudos to all the actors for almost making sense of it all, because their art shines. The machine’s writing? Not so much.

But there are other experiments where machines show at least some potential benefit in the writing process.

For example, popular YouTuber Tom Scott used GPT-3, a powerful machine-learning language model, to generate thousands of topics/video titles. In this case, a human would still actually write the script, but the machine’s ability to make plausible-sounding video titles could help inspire creators to pursue stories they would not have thought of otherwise.

Of course, as Scott is quick to point out, most of GPT-3’s suggestions are nonsensical or ridiculous. But he also acknowledges that many others would make for fascinating fiction. Though, as he points out in a follow-up experiment, GPT-3’s lack of technical accuracy is a significant hurdle for most other types of content production.
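
Scott used OpenAI’s hosted GPT-3, but if you want to play with the same idea locally, here’s a rough sketch using the Hugging Face transformers library and GPT-2, a much smaller, freely downloadable relative of GPT-3. The prompt and settings here are my own guesses, not his.

```python
# pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Interesting ideas for short documentary videos:\n1."
results = generator(prompt, max_length=60, num_return_sequences=3, do_sample=True)

# Most of what comes back will be nonsense; the occasional gem is the point.
for result in results:
    print(result["generated_text"])
    print("---")
```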

So while we’re still a long way off from fully robo-written scripts (at least any good ones), don’t be too surprised if you have an ML-based writing assistant in the future.

Pre-production 

Perhaps a better example of how Machine Learning can benefit storytellers would be RivetAI.

This company uses ML as the foundation for their Agile Producer software, which can automatically break down your script into storyboards, generate shot lists, optimize schedules, and create ballpark budgets. These are all tasks that most of us would be happy to hand off to a machine.

Similarly, Disney—who created storyboarding in the first place—has an in-house AI that they use to generate simple storyboard animations from scripts.

Read their paper on it here, and you’ll see their stated intent is “not to replace writers and artists, but to make their work more efficient and less tedious.”

While that’s a proprietary tool, it exists and is already in use. So don’t be surprised when the technology filters down into our everyday toolkit (these things often happen faster than you might expect). 

Production 

It’s hard not to be awestruck by technology at the top end of town.

For example, The Volume, Industrial Light & Magic’s colossal virtual set, is an absolute game-changer. But Machine Learning is bringing practical benefits to much smaller crews and solo performers.

Take Apple’s Center Stage for the iPad Pro, for example.

This uses Machine Learning to identify the subjects within view of the camera, then crops and reframes to make sure they’re kept in view.
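
Apple hasn’t published how Center Stage works, but the underlying detect-then-crop idea can be sketched with off-the-shelf tools. Here’s a toy version using OpenCV’s classic face detector, a stand-in for Apple’s ML rather than the real thing, and a hypothetical source clip:

```python
# pip install opencv-python
import cv2

# A classic (pre-deep-learning) face detector stands in for Apple's ML model.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("talking_head.mp4")  # hypothetical source clip
while True:
    ok, frame = cap.read()
    if not ok:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    if len(faces) > 0:
        # Crop around the first detected face, padded and clamped to the frame.
        x, y, w, h = faces[0]
        pad = w
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, frame.shape[1]), min(y + h + pad, frame.shape[0])
        frame = frame[y0:y1, x0:x1]

    cv2.imshow("reframed", frame)
    if cv2.waitKey(1) == 27:  # press Esc to stop
        break

cap.release()
cv2.destroyAllWindows()
```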

Similarly, devices like the Pivo and Obsbot Me employ machine learning to operate PTZ (pan, tilt, and zoom) camera mounts that keep the subject framed when they move around—a process that previously required the subject to wear a radio transmitter.

And if you’re a drone operator, DJI’s MasterShots/Active Track function uses the same kind of object identification to automate drone flight, producing cinematic aerial passes without any pilot involvement. The results are impressive.

Digital Actors and Sets 

From the scarabs in The Mummy to the hordes in World War Z and the Battle of the Five Armies in The Hobbit, modern filmmakers have frequently used simulated actors for crowd scenes.

But when it comes to individual performances, production companies still rely on actors to breathe life into their synthetic counterparts—even when the simulations are as real as those generated by Unreal Engine’s MetaHuman Creator (which itself is the product of Machine Learning processes).

This presents exciting new possibilities for actors, who can now portray characters that are completely dissimilar to their own appearance (past, present, or future), even from locations closer to home, assuming that a suitable mocap rig is available.

In the following clip, you’ll see BBC newsreader Matthew Amroliwala delivering a sample report in English, Spanish, Mandarin, and Hindi. He only speaks English.

While this isn’t taking place in real time—and we’re still a world away from being able to instantly synthesize genuine emotional performances—it’s easy to see the benefits of being able to quickly create and distribute content in multiple languages for the increasingly multicultural societies in which we live.

Sure, there are some extremely disturbing and unethical trends arising from this particular technology. And we’re still working out where the line can and should be drawn. But this doesn’t mean it can’t be used to achieve positive outcomes, like this anti-malaria campaign featuring David Beckham, among others.

Audio production and post

Similar technology can be found to assist in audio production and post, like automated voiceovers (text to speech), voice cloning, and music composition. 

Just like the voices in our phones and digital assistants, services like Talkia, Speechelo and Lyrebird (now part of Descript) can generate speech from text with not-horrible results.

Like our Chinese AI newsreader, the end results still live in Uncanny Valley, but if you’re tasked with producing training videos or product demos for corporate use, they’ll do the job. Especially if you’re targeting markets in different languages.

To my ears, Lyrebird comes closest to a convincing performance. But don’t take my word for it; listen for yourself on Descript’s Lyrebird demo page. The sample below was generated using the voice known as “Don”—which I’m guessing is a tribute to legendary voice actor Don LaFontaine and modeled by EpicVoiceGuy Jon Bailey.

And it doesn’t stop there. While Talkia and Speechelo are text-to-speech tools, Descript allows you to clone your own voice, which can then be used to overdub errors or alterations in your recordings. It’s powerful stuff and can get you out of a scrape without the need to record pickups.
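
None of these services documents its internals, but if you just want to experiment with programmatic text-to-speech, the open-source pyttsx3 library (my choice here, unrelated to the tools above) shows how little code the basic task takes:

```python
# pip install pyttsx3  (uses the operating system's built-in voices)
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 160)  # speaking speed in words per minute

# Render a hypothetical line of corporate narration straight to an audio file.
engine.save_to_file("Welcome to the quarterly product update.", "voiceover.wav")
engine.runAndWait()
```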

If you need a music bed to go with your ML-synthesized voice, then you should take AIVA for a spin and generate some ML-synthesized instrumentals based on parameters that you control.

As you can hear, the out-of-the-box results are impressive, but it also lets you pop the hood on the composition and tinker with almost everything in a piano roll view. I’ve definitely heard worse in some library collections!

VFX

ML is already being used for upscaling, noise reduction, frame rate conversion, colorization (of monochrome footage), intelligent reframing, rotoscoping and object fill, color grading, aging and de-aging people, digital makeup, and facial emotion manipulation, and I’m sure there’s more that I’ve missed.

An excellent example of these tools coming together is The Flying Train, footage shot in Germany in 1902. Compare the original footage from the MoMA Film Vault with the 60fps colorized and cleaned version.

Adobe, Apple, and DaVinci Resolve users have been using Optical Flow for frame rate conversion to create frames that were never shot, but DAIN (Depth-Aware video frame INterpolation) takes it to another level. In the demonstration, they take 15 fps stop-motion animation up to a very smooth 60 fps. It’s an Open Source project, and you can download it for a Patreon donation. (The author is also working on an improved algorithm called RIFE, which is much faster.)
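
DAIN and RIFE use deep networks trained specifically for this, but the core idea (estimate motion between two frames, then synthesize a new frame partway along that motion) can be crudely approximated with classical optical flow. Here’s a very rough sketch using OpenCV, and in no way DAIN’s actual method:

```python
# pip install opencv-python numpy
import cv2
import numpy as np

# Two consecutive frames from a hypothetical low-frame-rate clip.
prev_frame = cv2.imread("frame_0001.png")
next_frame = cv2.imread("frame_0002.png")

prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

# Dense optical flow: how far each pixel appears to move between the frames.
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# Warp the first frame halfway along the motion vectors to fake an in-between
# frame. Real interpolators handle occlusions and edges far more gracefully.
h, w = prev_gray.shape
xs, ys = np.meshgrid(np.arange(w), np.arange(h))
map_x = (xs - 0.5 * flow[..., 0]).astype(np.float32)
map_y = (ys - 0.5 * flow[..., 1]).astype(np.float32)
mid_frame = cv2.remap(prev_frame, map_x, map_y, cv2.INTER_LINEAR)

cv2.imwrite("frame_0001_5.png", mid_frame)
```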

Intelligent Reframing comes into play when we need to convert between video formats. Framings that work well for widescreen 16:9 aren’t always going to work in a square or vertical format. We’ve seen ML-driven reframing in Premiere Pro (Sensei), DaVinci Resolve, and Final Cut Pro, but if you’d rather roll your own, Google published an Open Source project—AutoFlip: Saliency-aware Video Cropping.
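
AutoFlip is a full pipeline, but the basic notion of saliency-aware cropping (finding the part of the frame that draws the eye and cropping around it) can be sketched with OpenCV’s contrib saliency module. This is far simpler than AutoFlip’s approach and only illustrates the idea:

```python
# pip install opencv-contrib-python numpy
import cv2
import numpy as np

frame = cv2.imread("wide_frame.png")  # hypothetical 16:9 still

# Estimate a saliency map: brighter pixels are more likely to attract attention.
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = saliency.computeSaliency(frame)

# Slide a full-height 9:16 window across the frame and keep the position
# that captures the most saliency.
h, w = saliency_map.shape
crop_w = int(h * 9 / 16)
column_energy = saliency_map.sum(axis=0)
window_energy = np.convolve(column_energy, np.ones(crop_w), mode="valid")
x0 = int(window_energy.argmax())

vertical = frame[:, x0:x0 + crop_w]
cv2.imwrite("vertical_frame.png", vertical)
```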

The existence of an Open Source project strongly indicates that the use of ML for reframing is a known and mature technology.

And if compositing is more your thing, here’s a demo of a new rotoscoping tool found in RunwayML’s NLE, Sequel, which is even faster and more accurate than Adobe’s Rotobrush 2. And who doesn’t want that?

Fast and easy rotoscoping will mean more people can take advantage of the expanded creative universe that this technique allows. With applications like these, it’s not hard to see how AI isn’t just a time-saver for creatives, but a powerful, virtual creative agent.

You would think that color grading requires a human eye, and it does. Yet the developers of Colourlab Ai, an ML-powered grading tool, echo the amplified-creativity philosophy:

Colourlab Ai enables content creators to achieve stunning results in less time and focus on the creative. Less tweaking, more creating. 

Metadata 

As we’ve already mentioned, ML is great at drudgery.

So, it’s natural that we would start using it for asset management, where it provides significant productivity benefits.

For example, being able to search images or footage by their content is an enormous advantage when hunting for b-roll. It’s going to take a while before you find it in your daily driver NLE, but it’s already available in third-party tools.

Most major Media Asset Management systems include some form of visual indexing and searching. AxelAI, for example, performs all the analysis on the local network, while Ulti.media’s FCP Video Tag uses a number of different analysis engines in a standalone app to create Keyword Ranges for FCP.
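
How these products work internally isn’t public, but visual indexing generally boils down to running stills or frames through a pretrained recognition model and storing the resulting labels somewhere searchable. Here’s a hypothetical sketch with torchvision, not how any of the vendors above actually do it:

```python
# pip install torch torchvision pillow
import torch
from torchvision import models
from PIL import Image

# A general-purpose, pretrained classifier stands in for a vendor's tagger.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("broll_frame.jpg")  # hypothetical exported still
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    scores = model(batch).squeeze(0).softmax(dim=0)

top = scores.topk(5)
keywords = [weights.meta["categories"][i] for i in top.indices]
print(keywords)  # labels ready to be written into a searchable asset index
```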

Transcription is now such a mature technology that in most cases it is at least as accurate as human transcription, with both failing equally at specialized jargon. With inexpensive transcripts and text-based video editing in tools like Lumberjack System’s Builder NLE, there’s a new, ML-powered way to get to the radio cut.
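
The vendors use their own engines, but to see how little code basic machine transcription takes these days, here’s a sketch using the Python SpeechRecognition package with its free Google web-speech backend (my choice for illustration only):

```python
# pip install SpeechRecognition
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a hypothetical interview recording and transcribe it.
with sr.AudioFile("interview.wav") as source:
    audio = recognizer.record(source)

# Sends the audio to Google's free web speech API; expect it, like a human,
# to stumble over specialized jargon.
print(recognizer.recognize_google(audio))
```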

Unfortunately, we’re not yet at a point where we can extract useful keywords from interviews. But it’s probably coming soon.

Editing 

As far back as April 2016, I wrote about automated editing tools already in use. They used a variety of approaches built around some form of template.

A little later, Magisto added image recognition that analyzed your source footage to generate a decent edit of event-driven content.

And now, in the ML era, you may already have experienced automatic editing for yourself. Richter Studios’ blog post AI and the Next Decade of Video Production—now four years old—talks about Apple’s “Memories” feature, which is ML editing at work.

There is definitely going to be a trend toward smarter content recognition and automatically assembled programming—particularly if you’re creating shows like House Hunters where the template is obvious. All that’s needed is a little real-time logging and the right ML models, and at least the first assembly edit is done, ready for final shot selection and trimming.
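
As a toy illustration (entirely my own, not any shipping product), a template-driven first assembly is little more than a filter and a sort over logged metadata:

```python
# Hypothetical clips logged during the shoot, with keywords and a quality rating.
clips = [
    {"name": "A012_C001", "keywords": ["house 1", "exterior"], "rating": 5},
    {"name": "A012_C003", "keywords": ["house 2", "kitchen"],  "rating": 4},
    {"name": "A012_C005", "keywords": ["house 2", "exterior"], "rating": 5},
    {"name": "A012_C007", "keywords": ["reveal"],              "rating": 3},
]

# The show's template: the order in which segments always appear.
template = ["house 1", "house 2", "reveal"]

assembly, used = [], set()
for segment in template:
    # Best-rated unused clips first within each segment of the template.
    matches = [c for c in clips
               if segment in c["keywords"] and c["name"] not in used]
    for clip in sorted(matches, key=lambda c: c["rating"], reverse=True):
        assembly.append(clip)
        used.add(clip["name"])

for clip in assembly:
    print(clip["name"], clip["keywords"])
```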

It’s already happening with individualized sports recap packages, but we’re still a very long way from ML-edited narrative or creativity. The bottom line is that ML can’t be original. Even when it looks like it can, you’ll usually find a human pulling the levers somewhere.

Amplify Your Own Creativity 

Yes, we’re facing a massive wave of new technology.

Every major leap in technology can be uncomfortable. And changing established workflows and patterns can be difficult or expensive. The innovations that are coming are going to be disruptive.

How quickly and directly this will affect you depends on where you sit in the broad spectrum of film, television, corporate, education, and other production.

To paraphrase a line often attributed to Charles Darwin: it’s not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change.

Your workflows and tools are going to change. They always have.

And we’ll adapt to these changes. Like we always have.


Lead image courtesy ColourLab AI.

Philip Hodgetts

Philip Hodgetts is a technologist, editor, and industry pundit with over 30 years’ experience in production and post. He is a recognized authority on the changing nature of the digital landscape and has an enviable record of accurately predicting the implications of changing technology over the last decade. He’s now focused on using metadata to make workflows more efficient with Lumberjack System and Intelligent Assistance Software, Inc. Lumberjack System is an integrated suite of tools for acquiring and managing content metadata from shoot to suite, including Builder NLE, the only NLE developed for editing video and audio with text. Intelligent Assistance is a technology developer specializing in metadata-enabled, XML-integrated workflows. He is the author of several books and has presented his work on the application of Artificial Intelligence at numerous conferences, including the Hollywood Post Alliance.
