How Premiere Pro’s Text-Based Editing Transformed My Filmmaking

Documentary editors face a daunting task. We’re responsible for creating a narrative from hours and hours of footage, audio, and archival material.

Because there’s no prewritten script, we rely on interview transcripts and notes to create one. It’s a lot like writing a story. And even though this portion of the editing process demands the most endurance, it’s my favorite part.

Historically, popular editing software has prioritized visual editing over text editing, leaving documentary editors to develop tedious, hacky workarounds to cull interview content. For example, to create sound bite selects, I’d typically create timecoded transcripts from my interviews, print them out for manual highlighting and annotation, and then go back into Premiere Pro to locate the corresponding phrases in the footage. If it sounds labor intensive, it’s because it was.

Thankfully, Adobe’s new Text-Based Editing feature in Premiere Pro has completely transformed this process. With Text-Based Editing, I can use the transcribed text as the primary representation of audio or video content. Not only can I see the transcript right inside of Premiere Pro, I can edit, rearrange, add, or remove sentences in the transcript—with my text edits automatically synchronized with the corresponding audio or video.

While this new approach will benefit film and video editors of all kinds, it’s particularly useful for documentary editors who prefer to work with interview transcripts to create a radio edit before diving into visual storytelling. Editors like me. So in this article, I’ll share my experience using Premiere Pro’s new Text-Based Editing workflow.

The old way

When I start work on a documentary, my first goal is to find the story. Sifting through hours of interviews, I look for four main elements: character, plot, conflict, and resolution.

Back then, I would typically start by creating multicam sequences for each interview. I’d use Temi—a transcription service that provides read-along tracking—to generate timecoded transcripts and .srt files, which I’d import into my multicam sequences to create synced captions.

I would place the captions inside each multicam sequence because, in older versions of Premiere Pro (pre-2022), edits would cause them to fall out of sync if the captions were all in the top-level sequence. It was a clunky workaround, and it made the captions difficult to navigate. But they’d come in handy later, when searching for keywords.

Finding sound bites

Once the captions were made, my next step would be to find the key sound bites. So I’d print out the transcripts and highlight meaningful sentences that aligned with the director’s vision for the project. I’d note things like tone and delivery, draw parallels between different interviewees’ stories, and mark the aforementioned elements: character, plot, conflict, and resolution.

Then I’d head back into Premiere Pro to search through my multicam captions to find the highlighted phrases. I’d assemble these phrases into new sequences to create sound bite selects for each interview.

To create the outline for the story, I would then re-transcribe each sound bite sequence, print those out, and start rearranging the phrases on paper to create an actual “paper cut,” a time-honored technique among documentary filmmakers. When I was happy with my narrative foundation, I’d arrange the sound bites in a new sequence in Premiere Pro to mirror my written outline. Only then would I begin to add visuals and sound design.

Getting to that point was a tedious, time-consuming process. It could take several days, depending on the length and number of interviews recorded. But back then, there was really no way around it.

My new Text-Based Editing workflow

I was very excited when Premiere Pro released Text-Based Editing. As a staff editor at Frame.io, I always like to use the latest versions of our products to put them to the test in real-world scenarios.

The perfect project presented itself. Titled “A Snapshot of Cloud Photography,” it was a behind-the-scenes story about the new Frame.io in-camera integrations with the FUJIFILM X-H2 and X-H2S cameras. It leaned heavily on interviews and, as is often the case in a busy marketing department, it also needed a fast turnaround.

What I discovered is that Text-Based Editing not only allowed me to navigate the interviews and restructure story elements with greater speed and efficiency, it actually helped me to work more creatively and imaginatively.

For each interview, I went to the Text panel to initiate Premiere Pro’s Speech-to-Text transcription. Once a transcript was generated, it appeared in a document-style format within the Transcript panel.

The transcript synchronized the text with the corresponding audio and video in the sequence, so when I played an interview, there was read-along tracking—the words being spoken were highlighted in real time in the transcript.

Streamlined content navigation

If I skimmed to find a phrase in the Transcript panel and clicked on it, the playhead would jump to the corresponding moment in the sequence. Unlike with captions, the structure of the Transcript panel allows for streamlined navigation of the content, making it convenient to locate specific sections. I could easily find when one speaker introduced his work history, or when another talked about his creative process, or another talked about reimagining workflows.

For each interview, I created a new sequence to be filled with sound bite selects (e.g. Kurt’s sound bites, John’s sound bites, etc.). Then, in the Transcript panel, I used Text-Based Editing to find, select, and insert phrases into my new sequences. The selected phrases were added as multicam excerpts. Voilà! All my sound bites were pulled.

For each sound bites sequence, Premiere Pro auto-generated a new transcript. This allowed me to reference an updated transcript exclusively for the selected sound bites, and I was able to read my selections in a new context.

My story elements quickly became clear:

Character – Kurt, seasoned photographer embracing new cloud technology.

Supporting characters – John, Digital Technician exploring cloud technology with Kurt; Victor, Fujifilm VP confident in the Fujifilm and Frame.io integration’s impact on photography; Michael, Campaign Director excited about the integration; Luis, Lead Art Director enjoying the streamlined workflow.

Plot – Kurt and John use the Frame.io integration with the FUJIFILM X-H2 for a sports drink campaign.

Conflict – Kurt often loses creative control with the traditional photography workflow in which he has to turn over camera cards to clients before he’s had a chance to fully evaluate what he’s captured.

Resolution – Frame.io empowers Kurt with creative control and enhanced collaboration, transforming his process.

The radio edit

At this stage, I began the radio edit, the process in which the editor tells the story by laying out all the audio first. I purposefully add more sound bites than would be in the final edit, because I know some will eventually be omitted in favor of visual storytelling. But including them in the radio edit helps me establish the trajectory of the narrative.

Sometimes, to maintain the narrative’s flow and enhance the clarity of the characters’ dialogue, I have to cut in specific words to modify the tense in which the character is speaking or to reshape the intonation of a word. In my new workflow, I can use the search function in the Transcript panel to find alternatives for the word I want to change. For example, if I need to change Michael’s phrase from “It is amazing!” to “It was amazing!” a quick word search will reveal every instance in which Michael says “was” in his interview. It might also find instances where he says “was” followed by a word beginning with “a,” such as “was absolutely…” or “was able to…” That way, splicing “was” in between “It” and “amazing” sounds fluid.

If Michael concludes a thought with “I believe this will be the future,” but his delivery of the word “future” sounds hesitant, I can search the transcript for all the other moments where he says it with more conviction. Then, I’ll cut in a couple of options, making sure the waveforms for “future” line up, until one sounds convincing. Swapping in alternative versions of a word with the desired intonation helps me craft dialogue that better aligns with the character’s intended message.

Cutting the junk

As my radio edit evolved, the transcript in the Text panel updated, and I continued to use it to rearrange bites, phrases, and words. I ruthlessly cut out any unnecessary sound bites or words by highlighting them in the Transcript panel and pressing Delete. It’s incredibly gratifying to uncover how much stronger a story becomes with fewer words.

I quickly found a harmony between my characters’ answers. It’s great when interview subjects can finish each other’s thoughts. I used Text-Based Editing to find and add keywords that would transition their sound bites into one another smoothly.

The ability to listen to and read the radio edit as it evolved improved my comprehension of the content. I was able to get extremely precise with the syntax and could quickly determine if one sound bite conveyed an idea more concisely—and could replace it.

After that, I moved to another sequence and worked on rhythm, adding music, sound effects, B-roll, and graphics. When I arrived at version one of the edit, I uploaded it to Frame.io for feedback. Throughout 12 rounds of review, Premiere Pro’s transcripts continued to play a crucial role.

Of course, feedback about specific lines of dialogue could easily be addressed by directly modifying the transcript in Premiere Pro. But what was more exciting was that with Text-Based Editing I was able to address feedback that questioned the nuances of dialogue: clarity, impact, and emotional resonance. With Frame.io’s timecode-specific comments and Premiere Pro’s transcripts with read-along tracking, intonation and cadence were no longer as elusive.

The visual representation of the dialogue, offered by the transcript, helped me spot speech patterns, address pacing issues, insert explanatory phrases, and maintain consistent tone.

Text-Based Editing is transformative

I’m the kind of editor who loves trying new technology. Even if it initially slows me down, if I can envision it streamlining my workflow in the future, I’m ready to embrace it.

But my experience with Text-Based Editing was decidedly not that; it didn’t slow me down at all. The beauty of the feature is in its simplicity.

Text-Based Editing is not just a convenient feature; it’s a transformative tool that empowers editors to elevate the quality and impact of our work. By focusing my initial edits on the text instead of the video, I can make tons of quick edits to the content of a documentary in seconds, without having to play interviews in real time. This means I have more time to be creative with my storytelling, ultimately resulting in a stronger end product.

Having completed my first project using Text-Based Editing, I’m excited to see how it will continue to impact my post-production workflow. I can imagine that any project involving lengthy interviews or voiceovers would benefit from this new feature in Premiere Pro.

And given that it’s easy to adopt and immediately yielded tangible results, I hope you’ll discover the same for yourself.

Sandra Lucille

Sandra is a film editor with a background in indie film, branded doc, and behind-the-scenes content. She loves experimenting with new tools that streamline post workflows and make digital art creation more approachable. Film and video are Sandra's chosen instruments for change, and she hopes her work inspires optimism, introspection, and transformation.