An image showing a three-step process involving transcription.

See It, Say It – How to Easily Create Transcripts

The other day, I was deep in Camtasia, crafting a tutorial on how to incorporate accessibility into our everyday work with Word and Docs. Camtasia is a versatile tool, but it falls short when it comes to creating captions and transcripts unless you purchase an add-on. Lacking this add-on and hesitant to upload to Panopto, I embarked on a quest for a solution that was fast, accurate, and affordable. Below are some of the tools I experimented with.

Audio/Video Transcription Tools

There are many reasons to include captions and transcripts with audio/video content.  These include:

  • Accessibility
  • Enhanced comprehension and learning
  • Increased search engine optimization
  • Flexibility in how people access the content
  • The ability to provide captions and transcripts in different languages

My criteria for evaluating these tools were speed, accuracy, and affordability.

A quick note: I prefer transcripts with timestamps, primarily because they make it so much easier to pinpoint specific comments and review the context before and after any given section. This feature is particularly useful with tools that don’t sync as you listen to the media.

Whisper AI

Whisper AI quickly became my favorite transcription tool. Initially, the unfamiliar code seemed daunting, but once I overcame the initial intimidation, I discovered it to be incredibly fast and accurate. Plus, it’s completely free at the moment. 

I highly recommend starting with Kevin Stratvert’s “Best FREE Speech to Text AI – Whisper AI” video and his corresponding webpage.

Other features:

  • Part of the OpenAI family, which also brought you ChatGPT and DALL-E 2.
  • Can be used in the “cloud” or installed on your computer.
  • Supports the many file types that ffmpeg can read, including MP3, MP4, and WAV.
  • Offers translation for about 96 languages
  • No file size restrictions that I found.
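
For anyone curious what the "unfamiliar code" can look like, here is a minimal sketch of a local Whisper run that produces a timestamped transcript. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed; the file name `lecture.mp4`, the `base` model size, and the helper function names are placeholders of my own, not something taken from Kevin Stratvert's walkthrough.

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into an HH:MM:SS string."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_transcript(segments) -> str:
    """Build one '[HH:MM:SS] text' line per transcribed segment."""
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio/video file and return a timestamped transcript."""
    # Imported here so the helpers above work even without Whisper installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)  # larger models are slower but tend to be more accurate
    result = model.transcribe(path)         # ffmpeg decodes the media file behind the scenes
    return segments_to_transcript(result["segments"])


if __name__ == "__main__":
    # Hypothetical file name; replace it with your own recording.
    print(transcribe_file("lecture.mp4"))
```

Each segment Whisper returns carries its own start time, which is what makes the timestamped format possible without any extra tooling.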

Office 365 (O365)

I was pleasantly surprised to discover that Office 365 offers transcription capability.  

  • Important note: Use the online version to have access to the “Transcribe” button – it is not available on the Desktop version.
  • The transcript initially appears to the right of the Word document, which confused me the first time. I think it will make sense once you try it.
  • Cons
    • Import file limit of 300MB (most videos are larger than this)
    • Slow: took about 4 minutes to transcribe an hour of audio (file size was 54MB)
    • Didn’t seem as accurate as Whisper AI
  • Pros
    • Offers the choice to include just text, just timestamps, just speakers, or a combination.
    • Supports both audio and video file formats.
    • Useful for interviews, such as podcast studio sessions.
    • Language support is growing: O365 dictation currently supports 15 languages, and I presume the transcription tool follows suit.

Listen Monster

Of the tools I tried, this was my least favorite. It is quite limited in the number of minutes and the file size it will accept.

  • Cons
    • Not primarily a transcription tool, but you can create a timestamped transcript and, if needed, strip out the timestamps with another tool.
    • Free versions have file size limits from 10MB to 50MB, while the pro version offers 1GB. Each version also has its own limit on the number of minutes you can process per month.
  • Pros
    • Offers translations in various languages (full list unavailable).
    • Free and paid versions available ($37 for a lifetime subscription).
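
If you do need to strip timestamps from a transcript, a few lines of Python can stand in for "another tool." This is only a sketch: it assumes each line begins with a stamp such as `[00:01:05]`, `00:01:05`, or `01:05`, which may not match what Listen Monster actually exports, and the function name is my own.

```python
import re

# Assumed stamp shapes at the start of a line: [HH:MM:SS], HH:MM:SS, MM:SS,
# optionally followed by fractional seconds such as .250 or ,250.
LEADING_TIMESTAMP = re.compile(
    r"^[ \t]*\[?\d{1,2}:\d{2}(?::\d{2})?(?:[.,]\d{1,3})?\]?[ \t]*"
)


def strip_timestamps(transcript: str) -> str:
    """Remove one leading timestamp from every line of a transcript."""
    cleaned = [LEADING_TIMESTAMP.sub("", line, count=1) for line in transcript.splitlines()]
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = "[00:00:05] Welcome to the tutorial.\n00:01:10 Open the Review tab."
    print(strip_timestamps(sample))
```

Lines without a leading stamp pass through unchanged, so it is safe to run on a transcript that only has timestamps in some places.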

Bonus tools

YouTube-only tools

In my exploration, I also discovered some very useful tools when working with YouTube videos.

Campus tools

I’d be remiss if I didn’t also mention additional tools you can use with a campus login:

Panopto 

  • Very good auto transcription engine.
  • Perceived to be faster than Office 365.
  • Handles large files and integrates easily with Moodle.
  • Can struggle with discipline-specific jargon, but caption editing is straightforward.

REV

  • Costs $1.25 per minute.
  • Integrated with Panopto.
  • Suitable for sensitive or private media files.
  • Good for content with jargon that automated tools struggle with.
  • Turnaround time between 24 and 48 hours.
  • Reach out to Accessibility to have an account created.

Finding the right transcription tool to help integrate accessibility into your audio/video creations can be challenging. Whisper AI stands out as a top choice due to its impressive accuracy and cost-effectiveness, despite an initial intimidation factor. Office 365 offers convenience and decent functionality but falls short on speed and on the file size limit for imported media. Listen Monster, while affordable, is less ideal due to its restrictive limits. For YouTube-specific needs, tools like YouTube to Text and Video To Be provide valuable options. Additionally, campus tools such as Panopto and REV offer robust solutions for handling large files and sensitive content, though they come with their own set of pros and cons. Ultimately, the choice of tool depends on the specific needs of your project, but learning how to include captions and transcripts in your multimedia projects helps ensure that as many people as possible can participate.

Credits:

  • Some of the components and ideas for the masthead come from “evidence transcription” by bsd studio from the Noun Project (CC BY 3.0).
  • I also used ChatGPT and fellow ITS colleagues as editors.