Media Hash Lists 101: How They Work and Why They Matter So Much
If you’re a producer, DP, editor, post supervisor, or assistant—or involved at all in the process of getting footage from set to post—the chances are you’ve been knee-deep in news about cloud-based workflows in the last few years.
We’re now in a world where, as you shoot, your footage can automatically flow through the internet directly to post production in proxy form with practically no effort needed on your part, which is a revolution. However, there’s a quieter shift happening: how do we keep track of the big camera original files in those cloud workflows?
Making a hash of things
Enter the Media Hash List (MHL), which has been growing as part of download workflows for the last decade or so. It’s a format for collecting metadata and keeping it traveling alongside the original camera files, and was recently adopted by the American Society of Cinematographers (ASC). So it’s time to get up to speed on the power that it offers.
As you move your editing proxies from set to post via the cloud, it’s important to consider the workflow from start to finish. Cloud workflows are already very robust, with some cameras passing along the original filename and other metadata with an upload-friendly proxy file that editors can work on immediately. Models from RED, ARRI and Sony currently do, and others are likely to follow.
However, at the end of the process, you are going to want to reconnect that proxy with the large original camera files (OCFs) for final finishing, and this is where the MHL comes into play. Typically, there will be far more OCF material than ends up in the final cut. As a result, the OCFs may be spread across several LTO tapes, cloud storage buckets, hard drives, or any combination of those. MHL can help you locate the OCF you need within the sea of archived media.
What is MHL?
The media hash list is an XML file that stores information about your file transfers to make your life easier in post. (XML stands for “extensible markup language,” and is similar to the HTML used for web pages.) You may already be familiar with XML formats from working with XML roundtrips between DaVinci Resolve and Premiere Pro, among others.
XML is designed to be somewhat human readable. So if you’re having issues with an XML file but can’t find a piece of software to open it—which can be due to file corruption or shifting software requirements—you should always be able to open an XML with a text editor and repair the corrupted area. It might not be easy, but it is doable.
Once you get used to the basics of XML, including the use of brackets to delineate formatting and sections, it’s relatively simple to sort through an MHL or any XML for the data you might need.
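To make that concrete, here is a minimal Python sketch of reading an MHL-style XML file with the standard library. The sample document and its element names (hashlist, hash, file, size, md5) are a simplified illustration and an assumption on my part, not a complete or authoritative copy of any MHL schema, so check them against the files your own tools write.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative MHL-style document. The element names are an
# assumption for this sketch, not an authoritative schema.
SAMPLE_MHL = """<hashlist version="1.1">
  <hash>
    <file>A001C001.mov</file>
    <size>1024</size>
    <md5>9e107d9d372bb6826bd81d3542a419d6</md5>
  </hash>
  <hash>
    <file>A001C002.mov</file>
    <size>2048</size>
    <md5>e4d909c290d0fb1ca068ffaddf22cbd0</md5>
  </hash>
</hashlist>"""


def list_entries(xml_text):
    """Return (file name, size in bytes, md5) for each <hash> element."""
    root = ET.fromstring(xml_text)
    return [
        (entry.findtext("file"), int(entry.findtext("size")), entry.findtext("md5"))
        for entry in root.iter("hash")
    ]


for name, size, md5 in list_entries(SAMPLE_MHL):
    print(f"{name}  {size} bytes  md5={md5}")
```

Because the file is plain text, the same few lines work whether or not you have the software that originally wrote it.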
The data that the MHL contains is designed to make it easier to add metadata to shots in post production without altering the original camera source file.
For instance, say you want to document that a shot was filmed on Day 4. Some cameras let you add that in camera, but many don’t.
A common workaround is to create a folder called “Day 4” and put the file in there. But this isn’t ideal, since the shot might be copied or moved somewhere else, leading to arcane folders and filenames that are hard to navigate.
The MHL solves this by adding an individual sidecar file alongside your camera original files. Take a look in your transfer folders and you’ll see your OCFs (typically .r3d or .ari or .braw files). Beside these you’ll find files with the same name, but an .mhl file extension. That’s your media hash list. And it can be used to save all sorts of things.
It can include the hash from the download. The latest version can also include a history of every time a shot is moved. This data includes the time and date of each move, along with where the file came from and where it went.
Not only that, some MHL tools let you add metadata to the original camera files that travels with the shot—so you can add information like the day it was shot, or a description without relying on filenames or folders.
But perhaps the biggest advantage of MHLs is that they’re searchable. There are even MHL search plugins for this. Let me give you an example: say you send out a DP to shoot b-roll in all fifty US states and they add descriptions of their shots to the MHL. You could then search through the MHLs for “Mississippi” to pull just the shots tagged with Mississippi.
Since it’s just a text field, the tagging is unlimited. You could also tag it “cloudy day” and “farmland,” and it’ll show up in searches for all of those terms. It’s a much more robust workflow if you properly train your shooters and DITs in how you want them to organize footage, but the extra effort is well worth it.
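The search described above could be sketched roughly like this in Python. The description element, the file names, and the tag text are all hypothetical, invented for illustration. Real MHL tools may store descriptive metadata differently, so treat this as the idea rather than the format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <description> element is an assumption made
# for this sketch, not a documented MHL field.
TAGGED_SAMPLE = """<hashlist>
  <hash><file>B012C003.mov</file><description>Mississippi, farmland, cloudy day</description></hash>
  <hash><file>B012C004.mov</file><description>Louisiana, bayou</description></hash>
  <hash><file>B013C001.mov</file><description>mississippi river at dusk</description></hash>
</hashlist>"""


def search_descriptions(xml_text, term):
    """Return file names whose description contains the term, ignoring case."""
    root = ET.fromstring(xml_text)
    needle = term.casefold()
    matches = []
    for entry in root.iter("hash"):
        description = entry.findtext("description", default="")
        if needle in description.casefold():
            matches.append(entry.findtext("file"))
    return matches


print(search_descriptions(TAGGED_SAMPLE, "Mississippi"))
print(search_descriptions(TAGGED_SAMPLE, "farmland"))
```

A plain substring match like this is why consistent tagging matters: “cloudy day” and “overcast” will not find each other unless your shooters agree on the vocabulary.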
It’s worth noting that MHLs don’t require a specific media asset management platform or non-linear editor to open them. So even if your software can’t open the footage, the associated MHL files will still be searchable with all their metadata using MHL-compatible tools—like the ones you’ll find on the mediahashlist.org and Netflix’s Production Technology Alliance lists.
MHL files are also continually editable. With the current version, every time you make a file copy with a tool that supports MHL, whether it’s from camera to post, or post to finishing, or finishing to archiving, the MHL files will be updated to track that history.
What is a hash, exactly?
One word you might be wondering about now is “hash.” If you haven’t run into that word before, it might help to think of hashing as the process of creating a fingerprint that you can use to identify and verify a file.
By running a file through a one-way hash function, you create a string of characters that, for practical purposes, uniquely identifies that file based on the bytes contained within it. If you change anything about that original file and run it through the hash function again, it will return a different hash. This is how hashes are used to verify that your copies match the original bit for bit.
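If you want to see the fingerprint idea in action, here is a small Python sketch using MD5 from the standard library. The byte strings are stand-ins for real footage, and the function name is just for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the MD5 hash of the given bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


original = b"camera original bytes"
altered = b"camera original bytez"  # a single byte differs

# The same bytes always produce the same hash.
print(fingerprint(original) == fingerprint(bytes(original)))

# Changing even one byte produces a different hash.
print(fingerprint(original) == fingerprint(altered))
```

The first comparison prints True and the second prints False, which is the whole basis for using hashes to confirm a copy.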
You might also hear the term “checksum” being used. A checksum is the stored hash value, and verifying it means running the file through the hashing algorithm again and checking the result against that stored value. It’s “checking the sum,” and the process is typically much faster than doing a bit-by-bit comparison of the entire file, though the speed depends on the hash function being used (MD5 is slower than xxHash).
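Here is a rough Python sketch of that check: hash the source once, store the result, then hash the copy and compare. It reads files in chunks so large clips never need to fit in memory. The file names and the demo function are made up for illustration, and MD5 is used only because it ships with Python.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file in chunks and return the MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(path, expected_hash):
    """Return True if the file's current hash matches the stored one."""
    return file_md5(path) == expected_hash


def demo():
    """Copy a stand-in clip, verify it, then damage the copy and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "A001C001.mov"
        source.write_bytes(b"pretend this is footage" * 1000)
        stored_hash = file_md5(source)  # what a hash list would record

        copy = Path(tmp) / "A001C001_copy.mov"
        shutil.copyfile(source, copy)
        intact = verify_copy(copy, stored_hash)

        with open(copy, "ab") as handle:
            handle.write(b"\x00")  # simulate a corrupted copy
        damaged = verify_copy(copy, stored_hash)
        return intact, damaged


print(demo())
```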
MHLs store hashes for every file that is copied, and can use a variety of checksum algorithms, with MD5 and xxHash being the most common. (The Frame.io Transfer app and our media pipeline use big-endian xxHash64.) In fact, the ASC format supports the following checksum methods: xxHash (64-bit, and latest XXH3 with 64-bit and 128-bit), MD5, SHA1, SHA256 and C4. This is very useful since different facilities and DITs will have their own workflows you can’t always dictate.
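Since one facility might ask for MD5 and another for SHA256, a tool can compute several hashes while reading the file only once. Below is a minimal Python sketch of that idea. It covers only the algorithms in Python’s standard library (MD5, SHA1, SHA256). xxHash and C4 need third-party code, so they are left out here, and the function name is my own.

```python
import hashlib

# Algorithms available in the standard library. xxHash and C4 are not
# included because they require third-party packages.
ALGORITHMS = ("md5", "sha1", "sha256")


def multi_hash(chunks, algorithms=ALGORITHMS):
    """Feed each chunk to every algorithm and return {name: hex digest}."""
    digests = {name: hashlib.new(name) for name in algorithms}
    for chunk in chunks:
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


# The chunks stand in for blocks read from a file during a single copy pass.
for name, value in multi_hash([b"ab", b"c"]).items():
    print(f"{name}: {value}")
```

Reading the media is usually the slow part, so updating several digests per chunk costs far less than re-reading the file for each algorithm.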
On that note, it was previously considered best practice to use the same tools to download your media as were used on set to preserve file integrity throughout post. But MHL is robust enough to allow different tools for every step of the process as each hash will be individually recorded as part of the MHL.
You should be aware that the first time you make a hashed copy will be slower as the algorithm runs and generates the hash. Copies after that will be faster since they can continue to reference the original hash file, assuming that you’re sticking to the same checksum method throughout. Changing the checksum method will require the hash to be rebuilt from scratch.
What does MHL actually fix?
One of the many issues solved here is our dependence on overly complicated folder structures. When downloading media, we used to worry quite a bit about preserving the folder structure of the original camera card. This is something that we should still pay attention to, but no longer need to obsess over.
With the MHL, changes in file structure are no longer a problem since the MHL itself can record information about folder structure. So it can include any changes to it if the tool you’re using is updating the MHL (or making a new one). The sidecar data for each shot contains metadata that tells you what camera card it was on originally, how it was laid out, and if changes were made. So you don’t have to maintain the file structure, or worry if it’s broken.
This lets you organize your footage in new ways. For instance, let’s say on set you put all the shots from Day 4 in a “Day 4” folder, but your assistant editor makes a mistake and moves them to another folder. Since the MHL for those files records every location the files have ever occupied, you should be able to recreate that original “Day 4” folder with relative ease. This can be a lifesaver if you are doing an online and conform session and the files have gotten moved after the original edit.
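As a rough illustration of how a recorded history makes that recovery possible, here is a Python sketch. The MoveRecord structure, the paths, and the timestamps are all hypothetical and do not mirror the actual MHL history format. The point is only that once every move is logged, finding the clips that once lived in “Day 4” is a simple lookup.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MoveRecord:
    """One logged move of a clip (hypothetical structure for this sketch)."""
    timestamp: str
    source_folder: str
    destination_folder: str


# Hypothetical history: clip name -> moves in chronological order.
HISTORY = {
    "A004C001.mov": [
        MoveRecord("2023-05-04T18:02:00", "/Volumes/CARD_A004/CLIPS", "/Volumes/Shuttle/Day 4"),
        MoveRecord("2023-05-06T09:15:00", "/Volumes/Shuttle/Day 4", "/Volumes/Edit/Misc"),
    ],
    "A004C002.mov": [
        MoveRecord("2023-05-04T18:03:00", "/Volumes/CARD_A004/CLIPS", "/Volumes/Shuttle/Day 4"),
    ],
    "A005C001.mov": [
        MoveRecord("2023-05-05T18:10:00", "/Volumes/CARD_A005/CLIPS", "/Volumes/Shuttle/Day 5"),
    ],
}


def clips_ever_in(history, folder_name):
    """Return clips that were ever moved into or out of a folder with this name."""
    found = []
    for clip, moves in history.items():
        folders = {PurePosixPath(move.source_folder).name for move in moves}
        folders |= {PurePosixPath(move.destination_folder).name for move in moves}
        if folder_name in folders:
            found.append(clip)
    return sorted(found)


def current_folder(history, clip):
    """Return the destination of the most recent move for a clip."""
    return history[clip][-1].destination_folder


print(clips_ever_in(HISTORY, "Day 4"))
print(current_folder(HISTORY, "A004C001.mov"))
```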
What about the cloud?
This becomes especially helpful when dealing with our new cloud-first workflows. With media flowing directly from camera to post through cloud-based tools, things could quickly get out of hand if it were up to you to recreate the original card file structure. By using MHL workflows, you don’t have to worry about the difficulty of creating those file structures, and can leave it up to automation.
Who supports MHL now?
MHL was launched by the team over at Pomfort, who built the wonderful software tools LiveGrade and Silverstack. A few years ago, Hedge started putting a lot of effort into the format as well. Imagine Products, maker of ShotPut Pro, supports MHL in its TrueCheck software. Cubix and the now-gone GNARBOX support MHL as well.
Hedge is a Mac/Windows backup solution that supports MHL.
But of course the biggest news in the MHL space is the adoption of MHL as a format by the ASC. This means that wider industry adoption is likely, and we’ll probably see more MHL tools going forward.
While you might not be in charge of implementing MHL workflows yourself, a handle on how MHL works will help you as you interface with various teams. For many, a surface level understanding will be enough. For a DP, just knowing what data is being passed on to post is helpful for your confidence about what is and isn’t possible to send along the image pipeline.
If you need to know more, then I’d strongly recommend you visit the Media Hash List site (mediahashlist.org) and the ASC’s MHL page. You’ll find a ton of custom tools available, as well as open source resources for building your own implementations.