Looking at Virtual Production Through a Cinematographer’s Lens
These days, there are more parallels between the production of movies and video games than ever—including budget, in some cases. But in many ways, video games often have the upper hand. Unbound by the limitations of the real world, they can feature vast, spectacular environments that are incredibly hard to match with physical sets, even at the highest of the high end of filmmaking.
The solution, of course, is to combine physical and virtual worlds, which filmmakers have been doing to some degree since as far back as 1930 with rear projection, and more recently with blue and green screens.
But neither of these approaches is ideal. The illusion of rear projection fails the moment you move the camera, and actors can find it challenging to perform when there’s nothing but a hundred square meters of green cloth to interact with.
And it’s not just actors who have to deal with the limitations of green screen. Cinematographers live in a world that’s still mostly defined by a nearly century-long tradition of photochemical film and lenses that flare and glow. Creating the longed-for look of the world’s best movies in a world made of violently green paint and fabric is hard.
Working with a real-world backdrop that reacts naturally to subjects like smoke, hair, transparency and reflection is a whole lot easier. And that’s where virtual production scores, and it keeps scoring in many other ways.
Most of us already know that virtual production uses walls of LED screens to display a background that moves around to compensate for camera position. It’s a remarkable feat of geometry, which relies on a combination of three main technologies that were developed for completely unconnected applications. You can skip ahead if you’re already very familiar with these. If not, here’s a quick primer.
LED video wall
The most visible piece of hardware is the LED video wall itself—and in filmmaking, these can be enormous. For example, the Pixomondo stage in Toronto is 208’ wide and 23’ tall. If you wanted to build something similar, and were using ROE Visual’s BP2 LED tiles to do it, this would take nearly 1800 of them to make. So it’s easy to see how the display can become the most expensive line item on a shoot.
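As a rough check on that tile count, here is a minimal sketch that divides the wall area by the area of one tile. It assumes a flat rectangular wall and 500 mm square tiles; the tile size is an assumption, not something stated above, and `estimate_tile_count` is just an illustrative helper name. A real curved stage, a ceiling, and spare tiles would all change the figure.

```python
FEET_TO_M = 0.3048

def estimate_tile_count(width_ft, height_ft, tile_w_m=0.5, tile_h_m=0.5):
    """Estimate how many tiles cover a wall by dividing areas.

    The wall is treated as a flat rectangle, and the default tile size
    (500 mm square) is an assumption made for illustration.
    """
    wall_area_m2 = (width_ft * FEET_TO_M) * (height_ft * FEET_TO_M)
    tile_area_m2 = tile_w_m * tile_h_m
    return round(wall_area_m2 / tile_area_m2)

tiles = estimate_tile_count(208, 23)
print(tiles)  # a little under 1,800 with the assumed tile size
```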
And while these extremely specialized panels have very little in common with their predecessors, it’s worth noting that they’re not new. The first LED screen appeared back in 1977, though the difficulty of developing blue LEDs meant that commercial color displays weren’t possible until the late 90s.
Motion tracking has also become very familiar, thanks to behind-the-scenes photography of people running around in spandex outfits covered in reflective balls. Recognizable forerunners, based on film, go back to the early twentieth century. Electronic motion capture also dates back to the 1970s, although the idea of using it to detect the motion of a camera, rather than the camera’s subject, probably wasn’t what the pioneers had in mind.
3D rendering hardware
The graphics rendering hardware is, arguably, the most contemporary of the three technologies at play in virtual production, though even it had its origins back in the 70s. Today’s graphics processors (GPUs) are among our most advanced microelectronics, but they’re also mass-market devices. They’re often related to the chips found in gaming consoles, and are sometimes the exact same devices found in gaming PCs.
So when you consider how the technology for virtual production has been around in some form for some time, and how our desire to mix virtual and physical production has existed for even longer, it’s clear that virtual production was inevitable. It was just a question of when. To which the answer is clearly now.
So, now that we’ve got it, how do we work with it?
Building rendering servers for virtual production often means putting together some of the highest-performance single computers used anywhere, at least if we consider sheer number crunching. It’s common to select the best graphics processors, CPUs, and huge amounts of memory in order to meet the minimum requirement of showing complex scenes at a frame rate that matches that of the cameras being used.
As mentioned earlier, the cost of the video wall can be eye-watering. So, the cost of putting the very best of everything in the rendering servers pales in comparison. Most facilities spend big on their machines.
It should be pointed out that it’s actually possible to put together a virtual production facility with the sort of gaming PCs that have skulls on the front and LEDs on the cooling fans. It’s just not very likely we’ll see that happening behind the scenes on the next Disney production.
No matter how much money anyone spends, though, there will still be limits on the complexity of the scenes which can be shown. Performance is often capped by the GPUs themselves, so many facilities use two per server to even out the load between them and the rest of the system. Where the pictures go after that depends slightly on the exact specification of the video wall.
There’ll inevitably be a rack-mounted processor which takes a video signal from the server and splits it up into regions to be fed to each panel. Often the connections between those processors and the panels will be Ethernet or fiber to minimize the sheer bulk of cabling between the processors and the hundreds of panels in the video wall.
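To make the splitting step concrete, here is a minimal sketch of how one large canvas could be cut into per-panel rectangles. It is not how any particular processor works; `Region`, `split_canvas`, and the 176-pixel panel size are all hypothetical, chosen only to show the bookkeeping involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    panel_col: int  # panel position in the wall grid
    panel_row: int
    x: int          # top-left corner of the region in the source canvas
    y: int
    width: int
    height: int

def split_canvas(cols, rows, panel_px_w, panel_px_h):
    """Return one canvas region per panel, row by row, left to right."""
    regions = []
    for row in range(rows):
        for col in range(cols):
            regions.append(
                Region(col, row, col * panel_px_w, row * panel_px_h,
                       panel_px_w, panel_px_h)
            )
    return regions

# A toy wall of 4 x 2 panels, each assumed to be 176 x 176 pixels.
regions = split_canvas(cols=4, rows=2, panel_px_w=176, panel_px_h=176)
for region in regions:
    print(region)
```

A real wall has hundreds of panels rather than eight, which is part of why the cabling between processors and panels matters so much.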
Let’s consider those technologies in more detail.
For years, graphics hardware has been capable of rendering a very satisfactory video game in real time. Just not necessarily realistic to the extent that virtual production requires. A lot of our perception of realism is driven by lighting, reflections, and shadows, which are enormously hard to compute. But when GPUs like NVIDIA’s Quadro RTX range first introduced hardware-accelerated ray tracing in 2018, things started moving much faster.
Given the beautiful environments we’ve all seen in behind-the-scenes stills, it’s easy to get the idea that virtual production inevitably involves a long winded and expensive process of building an elaborate virtual world. But it depends on exactly what sort of virtual environment a production needs. Regardless of the scale—whether a single room or a vast city—virtual set creation is a skilled art requiring the right people, just as much as when physical models are used to do the same thing.
Tools of the trade
As such, a virtual production invariably needs a virtual backlot and a virtual art department. That might mean one person to assemble some pre-existing assets purchased from a library, or a whole team to build half a planet in fine detail.
Real-world locations can be scanned using photogrammetry, where both shape and surface detail are calculated from a series of photographs. There are many online libraries of assets suitable for the various real-time 3D rendering engines, although the level of realism may vary.
The work required to create the virtual world has much in common with the work which might once have been done to create the equally virtual worlds of computer-generated visual effects. The crucial difference is that VFX can, release schedules allowing, overrun. Prep for virtual production must be complete before the scene can be shot.
This is not new. For example, the model unit for Aliens worked right alongside the main unit. What that should illustrate, though, is that it’s deeply ill-advised to show up at the virtual production stage unprepared on the first day of photography.
The big-screen TV
Some people like to think of rear projection as an early example of virtual production, in which case virtual production is far from a new technique. As mentioned earlier, rear projection goes back as far as 1930—almost as far back as synchronized sound.
And, similar to modern synchronized sound, rear projection and virtual production both demand that the background and the production camera be genlocked, so that each frame recorded by the camera sees exactly one frame displayed on the background.
Brilliant examples of film rear projection include Cameron’s Aliens, which gave us a spectacular elevator ride down through a vast industrial complex, and the car rides in The Matrix. Front projection was famously used on the early Superman movies, but found a new prominence on Oblivion at a time when in-camera compositing of that sort was often seen as outdated.
The benefits are the same as virtual production using an LED wall: perfect integration of difficult foreground subjects which might be reflective, transparent, out of focus, or with very fine detail.
Cameron knew how to make back projection look good, with interactive lighting, smoke and steam, and the use of a handheld camera. All of those techniques sell just as well with an LED video wall, whether it’s displaying a full three-dimensional virtual environment or pre-recorded live-action footage of the real world. This sort of thing can be used even on a tiny budget, given some quite basic modern video projectors and a bit of know-how.
Real lighting from virtual data
Any kind of projection will almost certainly lack the sheer brightness and contrast of a video wall, though. Sections of video walls have been used as simple light sources, rather than imagery displays, since the early 2010s, on shows including Tron: Legacy and Gravity. In practice, lighting scenes with an LED wall has some shortcomings, not least that a wall calibrated for proper brightness on camera typically won’t actually be emitting that much light.
The other issue is color quality. It’s possible to calibrate an LED video wall to match a huge variety of cameras, and the images displayed on it look fantastic. However, the wall is made solely out of red, green and blue LEDs, and the color quality of emitted light, while it looks pretty, is generally somewhere between poor and awful.
By comparison, high-quality lighting built for film and TV work tends to use white-emitting LEDs constructed from a blue LED and a yellow-emitting phosphor. As the industry in general is painfully aware, the color quality of white LEDs hasn’t always been the best, so it’s no surprise to discover that the color quality of red, green and blue LEDs is even less likely to make people look wonderful.
Manufacturers began to address this issue at the 2023 NAB show. Processor manufacturer Brompton and panel manufacturer ROE showed an LED video wall panel and processor including white chips for much better color quality.
In the past, we might have used various improvised mechanisms to (for instance) move lights past cars to suggest movement. Virtual production can simulate that kind of motion very nicely, although the limited light output of LED panels restricts what’s possible. If we’re particularly keen to see the moving shadows of passing street lights flash past our speeding driver, we might need more, and that’s where image-based lighting (IBL) comes in.
Brightness and contrast
At the most basic level, IBL allows us to control real-world production lighting using data from a video image. A canonical example would be to control an array of overhead tube lights using information from the sky of the virtual world, as seen in this demo of Quasar Science tubes on the NAB show floor.
Here, details of the overhead view cast passing light onto the live action foreground—and with color quality that flatters. Image-based lighting is a huge subject, but in general any light that has remote control capabilities can, in principle, be connected to the video data of a virtual scene.
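The basic idea of turning video data into lighting levels can be sketched in a few lines. This is a minimal illustration, not any vendor’s system: it assumes the frame is a list of rows of (r, g, b) values between 0 and 1, splits it into one vertical strip per fixture, and averages each strip into an 8-bit level. The function names are hypothetical, and actually sending the values to fixtures (and handling gamma or color calibration) is left out.

```python
def average_rgb(frame, x0, x1, y0, y1):
    """Mean (r, g, b) of frame[y][x] over the half-open pixel box."""
    total = [0.0, 0.0, 0.0]
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            for c in range(3):
                total[c] += frame[y][x][c]
            count += 1
    return tuple(t / count for t in total)

def frame_to_light_levels(frame, num_fixtures):
    """Map a frame to one 8-bit (r, g, b) level per fixture.

    The frame is cut into num_fixtures vertical strips, left to right,
    and each strip's average color becomes that fixture's level.
    """
    height = len(frame)
    width = len(frame[0])
    if num_fixtures < 1 or num_fixtures > width:
        raise ValueError("need between 1 and `width` fixtures")
    levels = []
    for i in range(num_fixtures):
        x0 = i * width // num_fixtures
        x1 = (i + 1) * width // num_fixtures
        rgb = average_rgb(frame, x0, x1, 0, height)
        levels.append(tuple(min(255, max(0, round(v * 255))) for v in rgb))
    return levels

# Toy "sky": bright on the left, dark on the right.
bright, dark = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
sky = [[bright, bright, dark, dark] for _ in range(2)]
print(frame_to_light_levels(sky, 2))
```

Run once per video frame, a loop like this is what lets a passing cloud or street light in the virtual world sweep across the physical fixtures.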
Perhaps just as important as brightness is contrast. By comparison, a rear- or front-projection screen is a large white object that can only produce black to the extent that no other light is falling on it.
It has been said that LED video walls, which are mainly black, aren’t subject to the same problems, but that’s not entirely true. The wall doesn’t have zero reflectance, since some of it is made of shiny LEDs. So it’s still necessary to keep extraneous light off the wall to the greatest extent possible. Pack lots of black flags. Still, it’s a lot less reflective than a projection screen.
Adequate resolution is important to virtual production: the tighter the pixels on the video wall, the more in-focus the wall can be before we start to see individual pixels and risk moiré patterning.
In that context, the value of “adequate” depends on the size of the wall, how far away the camera is, and what lens is in use. So a higher resolution wall can be more flexible in terms of camera position and framing. As a result, facilities tend to target high resolution and may have walls tens of thousands of pixels across, which is why there is usually a whole rack full of servers, each driving a small section of the display.
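One way to get a feel for how wall size, camera distance, and lens interact is to estimate how many wall pixels land on each camera pixel. The sketch below is purely geometric and uses a simple pinhole approximation; the function name and the example numbers are assumptions for illustration, and it says nothing definitive about where moiré actually begins, which also depends on focus, the sensor, and the panel.

```python
def wall_pixels_per_camera_pixel(distance_m, focal_length_mm,
                                 sensor_width_mm, camera_h_pixels,
                                 pixel_pitch_mm):
    """Estimate wall pixels per camera pixel across the frame width.

    Pinhole approximation: the width of wall seen by the camera is
    distance * sensor_width / focal_length. Values well below 1 mean
    each wall pixel covers several camera pixels, so a sharply focused
    wall is more likely to show its pixel structure.
    """
    view_width_mm = distance_m * 1000.0 * sensor_width_mm / focal_length_mm
    wall_pixels_in_view = view_width_mm / pixel_pitch_mm
    return wall_pixels_in_view / camera_h_pixels

# Hypothetical setup: 35 mm lens, 24.9 mm wide sensor, 4096 pixels across.
coarse = wall_pixels_per_camera_pixel(5.0, 35.0, 24.9, 4096, 2.8)
fine = wall_pixels_per_camera_pixel(5.0, 35.0, 24.9, 4096, 1.5)
farther = wall_pixels_per_camera_pixel(10.0, 35.0, 24.9, 4096, 2.8)
print(round(coarse, 2), round(fine, 2), round(farther, 2))
```

A tighter pixel pitch or a greater camera distance both push the ratio up, which matches the intuition that a finer wall tolerates closer, sharper framing.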
Capturing the camera
If 3D rendered virtual production has a party trick, it’s the way a rectangle of useful imagery (called a frustum) is kept in front of the camera at all times, in just the right way for things to always look correct—no matter how you move the camera. Meanwhile, the rest of the video wall displays a largely static perspective of the surrounding scene, creating proper reflections and interactive lighting.
Really violent camera motion can require some special attention to ensure we don’t see past the edge of the frustum. Still, the option to use any of our favorite grip techniques to move the camera is one of the best features of virtual production.
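The geometry behind that party trick can be sketched fairly simply. The example below assumes a flat wall lying in the plane z = 0, with the camera in front of it at positive z and looking toward the wall; it casts a ray through each corner of the camera’s view and finds where those rays hit the wall. The function name, the coordinate convention, and the pan and tilt parameters are assumptions for illustration, and real stages are usually curved and handled by the rendering engine.

```python
import math

def frustum_on_wall(cam_pos, pan_deg, tilt_deg, hfov_deg, vfov_deg):
    """Return the four wall-plane (x, y) corners of the camera's view.

    Corners are ordered top-left, top-right, bottom-right, bottom-left.
    Positive pan turns the camera toward +x, positive tilt toward +y.
    """
    cx, cy, cz = cam_pos
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    tan_h = math.tan(math.radians(hfov_deg) / 2)
    tan_v = math.tan(math.radians(vfov_deg) / 2)
    corners = []
    for sx, sy in ((-1, 1), (1, 1), (1, -1), (-1, -1)):
        # Corner ray in camera space; the camera looks down -z.
        x, y, z = sx * tan_h, sy * tan_v, -1.0
        # Tilt: rotate about the horizontal (x) axis.
        y, z = (y * math.cos(tilt) - z * math.sin(tilt),
                y * math.sin(tilt) + z * math.cos(tilt))
        # Pan: rotate about the vertical (y) axis.
        x, z = (x * math.cos(pan) - z * math.sin(pan),
                x * math.sin(pan) + z * math.cos(pan))
        if z >= 0:
            raise ValueError("corner ray never reaches the wall")
        s = -cz / z  # distance along the ray to the plane z = 0
        corners.append((cx + s * x, cy + s * y))
    return corners

# Camera 4 m from the wall, 2 m up, looking straight at it.
vfov = math.degrees(2 * math.atan(0.5))
print(frustum_on_wall((0.0, 2.0, 4.0), 0.0, 0.0, 90.0, vfov))
```

Feeding this function a fresh camera position and orientation every frame is, in spirit, what keeps the high-quality region of the wall locked in front of the lens.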
To put the frustum in the right place you need to know where the camera is and where it’s pointing. And this is typically done using the same technology as performance capture, but with two different options.
Outside-in capture involves “witness” cameras distributed around the studio, watching physical markers on the taking camera. Inside-out capture places a witness camera on the taking camera which observes markers around the studio. The latter is sometimes used in broadcast studios, but either works if there’s compatibility between the motion tracking system and the graphics rendering system.
Whatever method you use, the system also needs to understand what the taking camera is seeing, including focus, iris and (as appropriate) zoom settings, as well as some basic information about the lens.
Some of the information is similar to what might be required before a lens is used for a visual effects shot. Among other things, the precise field of view and any distortion will be measured at a selection of focus distances. This is reasonably straightforward for spherical prime lenses; it’s a bit less straightforward for anamorphics, which tend to have odd distortion behavior.
It’s even less straightforward for zooms, which must often be measured at a variety of focus distances and a variety of zoom settings.
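Because a lens is only measured at a selection of focus distances, the system has to estimate values in between. Here is a minimal sketch of that lookup using straight linear interpolation; the function name and the sample table are made up for illustration and do not describe any real lens. Real calibration tools may interpolate differently and also store distortion terms, and a zoom would need a second dimension for focal length.

```python
from bisect import bisect_left

def interpolate_fov(samples, focus_m):
    """Linearly interpolate field of view from calibration samples.

    `samples` is a list of (focus_distance_m, fov_deg) pairs sorted by
    focus distance. Requests outside the measured range are clamped to
    the nearest measured value.
    """
    distances = [d for d, _ in samples]
    if focus_m <= distances[0]:
        return samples[0][1]
    if focus_m >= distances[-1]:
        return samples[-1][1]
    i = bisect_left(distances, focus_m)
    (d0, f0), (d1, f1) = samples[i - 1], samples[i]
    t = (focus_m - d0) / (d1 - d0)
    return f0 + t * (f1 - f0)

# Made-up measurements for a prime lens whose view changes with focus.
calibration = [(0.5, 38.0), (1.0, 39.0), (3.0, 39.6), (10.0, 39.8)]
print(interpolate_fov(calibration, 2.0))
```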
With lenses in mind, virtual production is an especially attractive idea for enthusiasts of pocket-money glass picked up on eBay. Some of those ancient greats may be long on history but short on resolving power. Flare, distortion and softness might be a real nuisance to a greenscreen compositor. On a virtual production stage, though, those things are almost a benefit, helping to tie together real foregrounds and virtual backgrounds with flare, softness and glow which look real because they are real.
Diffusion filters, which we might hesitate to use on a green screen stage, enjoy similar benefits. As we saw on Aliens, smoke and steam techniques which would make greenscreen difficult are actively helpful in both rear projection and virtual production.
If the limit is the capability of the rendering servers to draw more and more complex scenes, the future looks bright. The enormous capability of modern GPUs is now being leveraged by a lot more than games consoles, and AI-based optimizations like DLSS 3 have the potential to provide higher frame rates even in the most complex rendered environments.
The capability of virtual production to depict more and more intricate worlds seems destined to hit the same limiting factor as video games themselves: the ability of humans to spend enough time designing that world.
For the time being, the real limit on the big, spectacular stages is financial: a seven-figure installation means five- or even six-figure day rates. Where a name actor with limited availability needs to appear in a series of widely distributed locations, that absolutely makes sense.
At the same time, hybrid approaches using rear- or front-projection, live-action, or even miniature footage work just as well as they always have, meaning that there are routes to virtual production, or perhaps we could just call it in-camera compositing, on every show from the tiniest short to the next superhero blockbuster.