Congratulations! Your project is almost finished and you’ve found a distributor. All that is left is to put the final touches on the mix and you can ship that puppy out. Next stop… fame and fortune!
As a smart filmmaker, you’ve asked your distributor for their required audio and video deliverables beforehand because you want to make sure you give them exactly what they need. The last thing you want to do is deal with costly QC revisions. Then, the distributor sends you the audio spec sheet, a list of technical specifications required for the broadcasting, promotion, and international distribution of your project.
Your heart sinks and your eyes glaze over. What does it all mean?
There are so many standards and different types of deliverables! LKFS. True Peak. Dipped. Undipped. LT/RT. How do you know if you’re getting what you need from your mixer? How can you adhere to these standards and avoid time-consuming and costly QC kickbacks? My goal with this article is to demystify many of the terms you will find on a typical audio deliverable spec sheet and give you the practical knowledge you need to deliver the right files every time. So pull out your spec sheet and let’s dig in!
Spec #1: Loudness
The CALM Act (H.R. 1084/S. 2847) was passed by the U.S. Congress in 2010. It requires the FCC (Federal Communications Commission) to establish rules that govern television commercial loudness, and it specifically states that commercials can’t be louder than the programs they accompany. The measurement behind those rules is an algorithm published by the International Telecommunication Union, ITU-R BS.1770-3, which measures the perceived loudness of program material. This algorithm underpins the technical standards known as EBU R128 (in Europe) and ATSC A/85 (in the United States), so you should check the standards of your particular market when delivering. Most of the time, you’ll see the measurement units of this algorithm referred to as LKFS (Loudness, K-weighted, relative to Full Scale) or LUFS (Loudness Units relative to Full Scale); the two names describe the same scale. Over time, the standard began to apply to pretty much all broadcast programming and spread to Netflix and other online distribution platforms. In fact, you’ll now find it on essentially every audio spec sheet out there, including those for other media such as broadcast radio, podcasts, and online music services.
All of this is to say that when you send your project deliverables to the distributor, it is important that your audio adheres to their specific loudness standards. Generally speaking, your audio must measure between -23 and -25 LKFS when measured across the entire length of the program, although this can vary depending on the distributor. This loudness standard applies to both stereo and surround mixes and can often apply to other mixes. There are many ways to measure LKFS, usually via a software plugin, although hardware meters are available as well. I personally use either the Waves WLM or iZotope’s Insight metering suite, but there are many other excellent loudness meters out there.
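To make the arithmetic concrete, here is a minimal Python sketch of the check you do after reading your loudness meter. It does not measure loudness itself (your meter does that); it only compares a reading against a spec window and works out how far off you are. The function names and the -24 LKFS default target are my own assumptions for illustration, so swap in whatever numbers your spec sheet lists.

```python
def within_spec(measured_lkfs, low=-25.0, high=-23.0):
    """True if a program-length loudness reading falls inside the spec window.

    The -25 to -23 window mirrors the general range mentioned above;
    it is an assumption, so use your distributor's actual limits.
    """
    return low <= measured_lkfs <= high


def loudness_offset_db(measured_lkfs, target_lkfs=-24.0):
    """Gain change in dB that would move the reading to the target.

    A static gain change of x dB shifts the loudness reading by
    about x LU, so the offset is simply target minus measured.
    """
    return target_lkfs - measured_lkfs


measured = -20.5  # hypothetical meter reading for the full program
print(within_spec(measured))         # is the mix inside the window?
print(loudness_offset_db(measured))  # dB of gain change needed
```

A negative offset means the mix is too loud and needs to come down. In practice your mixer will re-balance rather than just pull the master fader, but the offset tells you how big the problem is.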
Spec #2: Maximum or True Peak Level
The maximum (or true peak) level refers only to the very loudest moments of your audio, i.e., the peaks. When you see it on a spec sheet, it’s usually expressed as an amount below the loudest possible level (0 dBFS is the loudest you can get in the digital realm). For example, if your spec sheet says -2 maximum level, the audio must never peak higher than 2 dB below clipping. Most mixers place an audio limiter on their output bus to prevent the audio from going over this value, so it generally isn’t a problem. However, I have seen editors have issues when attempting to combine stems they’ve received from the mixer, so if you’re going to attempt to export stems from your NLE, always make sure you have a limiter on the tracks.
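If you want to see what "2 dB below clipping" means in numbers, here is a rough Python sketch. It assumes your audio is a list of floating-point samples where 1.0 is full scale, and it checks the plain sample peak only. A real true-peak meter also oversamples the signal to catch peaks that fall between samples, so treat this as an illustration rather than a substitute for your meter. The function names are hypothetical.

```python
import math


def sample_peak_dbfs(samples):
    """Highest absolute sample value expressed in dB relative to full scale.

    Assumes float samples where +/-1.0 is full scale (0 dBFS).
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no measurable peak
    return 20 * math.log10(peak)


def exceeds_ceiling(samples, ceiling_db=-2.0):
    """True if the sample peak is above the spec's maximum level."""
    return sample_peak_dbfs(samples) > ceiling_db


quiet = [0.5, -0.25, 0.1]  # peaks at half of full scale
hot = [0.9, -0.95, 0.3]    # peaks very close to full scale
print(exceeds_ceiling(quiet))
print(exceeds_ceiling(hot))
```

This is also why summing stems in your NLE can get you in trouble: each stem may sit under the ceiling on its own, but the combined output can land above it.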
Spec #3: Stereo vs Surround Sound
At first glance, this is pretty straightforward, as most of us know what the difference is between stereo and surround sound. However, once we start looking a little deeper, things can start to get complicated.
A surround sound mix, as it is typically delivered to television broadcasters, is a discrete 5.1 mix which has three channels across the front, two channels in the rear, and one sub-woofer (.1) channel dedicated to special low-frequency effects (LFE) like thunder or explosions. Most broadcasters prefer discrete 5.1 mixes, meaning you must deliver each of the six channels as a separate file or track. However, sometimes you might be required to deliver something more specialized. An in-depth explanation of the multitude of surround audio delivery formats is beyond the scope of this article, and you should always talk to your audio-post professional about your deliverables. That said, I’ll briefly touch on a couple of the more common surround and stereo deliverables you’re likely to see on your network spec sheet.
Spec #4: Dolby Pro Logic II, LT/RT, and Lo/Ro
A Dolby Pro Logic II deliverable is a 5-channel (5.0) mix that has been matrix-encoded into a 2-channel stereo mix called an LT/RT (Left Total/Right Total). This LT/RT downmix can then be used like any other conventional stereo mix in distribution, transmission, and playback, with the exception that it can also be magically (ok, it’s not magic, but it’s very cool) decoded back into a 5.0 surround mix (without the LFE) using decoders that can be found on most home theater systems. This 5-2-5 matrixing was devised by Dolby using “steering logic” and a system of phase and delay relationships in the audio channels. While LT/RT is still quite common in the US television market, it is slowly being phased out in the film world in favor of a Digital Cinema Package (DCP), which is a collection of video, audio, and data files that allows filmmakers to standardize their deliverables to distributors. With DCPs, you are generally required to deliver a discrete 5.1 mix with no encoding.
Additionally, you will often see network specs requesting a Lo/Ro stereo mix. This refers to a simple 5.1-to-stereo downmix made without the Dolby Pro Logic II encoding technology; done properly, it should be indistinguishable from a regular stereo (2.0) mix, in my opinion. However, some spec sheets (like Netflix’s) will specifically request a 2.0 that is not a downmix but a whole separate stereo mix, which I (and others) see as redundant and unnecessary. It will be up to you to navigate networks that require this, since requesting two separate mixes from your audio-post team will increase the money and time needed to provide your audio deliverables.
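For the curious, a Lo/Ro fold-down can be sketched in a few lines of Python. The version below assumes one commonly used set of coefficients: center and surrounds are folded in at about -3 dB (a gain of roughly 0.707) and the LFE channel is left out. Those coefficients, the dictionary keys, and the function name are my assumptions for illustration. Your spec sheet or your mixer may call for different values, so check before relying on them.

```python
import math

# About -3 dB as a linear gain (roughly 0.707); an assumed, commonly used value.
MINUS_3_DB = 1 / math.sqrt(2)


def lo_ro_downmix(frame, center_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Fold one 5.1 sample frame down to a (Lo, Ro) stereo pair.

    frame is a dict with keys "L", "R", "C", "LFE", "Ls", "Rs".
    The LFE channel is deliberately ignored in this sketch.
    """
    lo = frame["L"] + center_gain * frame["C"] + surround_gain * frame["Ls"]
    ro = frame["R"] + center_gain * frame["C"] + surround_gain * frame["Rs"]
    return lo, ro


# One hypothetical sample frame: dialog in the center, a little music left and right.
frame = {"L": 0.2, "R": 0.1, "C": 0.4, "LFE": 0.9, "Ls": 0.0, "Rs": 0.0}
lo, ro = lo_ro_downmix(frame)
print(lo, ro)
```

Because several channels are being added together, the stereo result can peak higher than any single 5.1 channel did, so the downmix needs its own peak check.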
Spec #5: Stems
Stems, simply put, are the separated elements of your full soundtrack, commonly used for promotional or reversioning purposes. They can be single elements (such as voiceover, dialog, music, and sound effects) or several elements mixed together, such as an M&E (music & effects) or a mix minus, also called an MED (music, effects, dialog). In film this is often called the DME, a combination of dialog, music, and effects stems, which can be in channel configurations of mono, stereo, or wider.
Let’s look at a typical audio deliverable spec sheet for your run-of-the-mill reality show:
Typically the dialog, VO, music, and sound effects stems are used by the networks for promotional purposes, whereas the M&E and the MED are used for international reversioning. Let’s take a closer look at what exactly an M&E and MED are.
The M&E and the Mix Minus
The M&E contains only music and effects. No dialog or voiceover should be included. However, it should also include any production sound that is not dialog. This includes any on-set audio like a door opening/closing or a car passing by. Additionally, some film and higher-end television productions that are receiving international distribution will require an M&E that is “fully filled”, meaning that any sound effects that are located on the production dialog track and occur under dialog must be totally re-created for the M&E. This allows international distributors to simply add their own dialog and VO, and it usually (although not always) necessitates additional sound work by the audio-post team after the final mix. Typical low-budget and reality TV programs don’t have the budget or time for a “fully filled” M&E, so be aware of this.
An MED, or mix minus VO/narration, is for programs where the dialog will be subtitled, but the voiceover will be revoiced for international distribution. This is just another option to allow distributors maximum flexibility when conforming their programs to different languages and markets.
Dipped and Undipped Stems
It’s common for the volume of the music to dip down during dialog to make sure that the dialog is audible. That’s called dipping.
When a network asks for dipped stems, they just want the individual audio stems to be at the levels they sit at in the full mix. This means that if you simply mix the dialog, narration, sound effects, and music stems together at unity gain (0 dB, no level change), the result should sound exactly like your full mix.
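You can think of that as a simple test: add the stems together and compare the result to the full mix. Here is a minimal Python sketch of the idea, assuming each stem and the full mix are equal-length lists of float samples. The function names and the tolerance value are hypothetical, and the sample values are made up for illustration.

```python
def sum_stems(stems):
    """Add equal-length stems together sample by sample at unity gain."""
    return [sum(samples) for samples in zip(*stems)]


def stems_match_mix(stems, full_mix, tolerance=1e-6):
    """True if the summed stems reproduce the full mix within a small tolerance."""
    summed = sum_stems(stems)
    if len(summed) != len(full_mix):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(summed, full_mix))


# Tiny made-up example: three stems and the mix they should add up to.
dialog = [0.10, 0.20, 0.00]
music = [0.05, 0.05, 0.30]
effects = [0.00, 0.10, 0.10]
full_mix = [0.15, 0.35, 0.40]

print(stems_match_mix([dialog, music, effects], full_mix))
```

In a real session the match may not be perfect, for example if a limiter on the master bus shaped the full mix, so a small difference is not automatically a sign that the stems are wrong.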
Since it takes longer to say things in some languages than in others, most networks will also ask for undipped deliverables for their international distributors. Undipped means that the requested stem stays at the same volume level throughout the program, regardless of what the dialog or VO is doing. This allows the international versioning mixer to re-dip the stems to account for the differences in the length of different languages.
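Here is a rough Python sketch of what "re-dipping" amounts to: take an undipped music stem and turn it down only where the new dialog is present. The -12 dB dip amount, the function names, and the per-sample on/off dialog flags are all assumptions for illustration. A real mixer rides the level with smooth ramps rather than switching it abruptly.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear gain multiplier."""
    return 10 ** (db / 20)


def dip_music(music, dialog_active, dip_db=-12.0):
    """Attenuate the music by dip_db wherever dialog is active.

    music is a list of float samples; dialog_active is a list of
    booleans of the same length marking where dialog is present.
    """
    dip = db_to_gain(dip_db)
    return [m * dip if active else m for m, active in zip(music, dialog_active)]


undipped = [0.5, 0.5, 0.5, 0.5]
# Hypothetical translated line that runs longer than the original.
dialog_active = [False, True, True, False]
print(dip_music(undipped, dialog_active))
```

Because the undipped stem never changes level on its own, the versioning mixer is free to place the dips wherever the translated dialog actually falls.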
These are just a few of the most common things you’ll find on your average audio deliverables spec sheet. As always, you should talk to your audio-post professional before you deliver to make sure everyone is getting exactly what they need. After you get your deliverables out and they pass QC, the only thing you’ll have to worry about is what to wear on the red carpet.
Also, special thanks to Steve “Major” Giammaria and Alan Saunders for their input into this article.