Videos & Accessibility

Creating accessible videos can drastically broaden their reach and usability. Accessibility is often overlooked in video production, but it doesn't have to add significant time or cost, especially when it is considered from the beginning.
An accessible video usually includes captions; a transcript; and careful use of color, text, and flashes or animation. A video should also be delivered in an accessible format with an accessible media player, and may include additional audio description when the default audio track isn't sufficient.

Videos & Captions

WCAG Success Criterion 1.2.2 Captions (Prerecorded) (Level A) says captions should be “provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such.” This means every prerecorded video with audio should have captions. Captions are text alternatives for the audio content, synchronized with the video. Popular video hosting sites such as YouTube and Facebook have specific captioning options available.

Closed Captions versus Open Captions

Closed captions appear only when the viewer turns them on. Open captions are on continuously throughout the video and cannot be turned off.

Whether to use open or closed captions depends on the platform on which the video is shared. For example, closed captions are appropriate for videos posted on YouTube, since YouTube lets viewers turn captions on and off. Instagram does not support closed captions, so open captions are the better choice there for equitable access to videos.

Creating Open Captions

Open captions are typically encoded into the video file itself. You can use video editing software to create captions or text overlays for videos, but be sure that the text is easy to read against the video background.
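
As one possible illustration of encoding captions into the video file, the sketch below builds an ffmpeg command that draws an existing caption file onto the video frames. It assumes ffmpeg is installed and was built with its `subtitles` filter; the file names are hypothetical, and a graphical video editor can achieve the same result.

```python
def burn_in_command(video_path: str, srt_path: str, output_path: str) -> list[str]:
    """Build an ffmpeg command that renders captions permanently into the picture.

    Assumption: ffmpeg is on the PATH and includes the `subtitles` filter.
    Paths containing special characters (colons, quotes) would need extra
    escaping for the filter argument; this sketch does not handle that.
    """
    return [
        "ffmpeg",
        "-i", video_path,               # source video
        "-vf", f"subtitles={srt_path}",  # draw the captions onto each frame
        "-c:a", "copy",                 # keep the audio track unchanged
        output_path,
    ]


# Hypothetical file names, for illustration only.
command = burn_in_command("lecture.mp4", "lecture.srt", "lecture_open_captions.mp4")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

After exporting, watch the result to confirm the caption text remains readable against the video background.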

Open Captioning Software

Essential Captions Elements

Once you create your caption file or transcript, you should review it for quality. A short summary of quality issues to check for is listed below. The Described and Captioned Media Program (DCMP) has built an in-depth guide on creating captions. All standards below come from that guide.

  • Identify all changes in the speaker.
  • Add any meaningful non-speech sounds in brackets.
  • Ensure all spoken content is transcribed exactly, not paraphrased.
  • Do not include more than 2 lines of text per caption block.
  • Ensure the caption blocks appear long enough to be easily read; generally they should appear for at least 1 second.
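
The last two checks in this list are mechanical, so they can be automated. The following is a minimal sketch, assuming each caption block is represented by a hypothetical `CaptionBlock` record with start and end times in seconds; the remaining checks still need a human reviewer.

```python
from dataclasses import dataclass

MAX_LINES = 2               # no more than 2 lines of text per caption block
MIN_DURATION_SECONDS = 1.0  # blocks should generally stay up at least 1 second


@dataclass
class CaptionBlock:
    """Hypothetical in-memory form of one caption block."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # caption text; lines separated by "\n"


def check_block(block: CaptionBlock) -> list[str]:
    """Return a list of problems found in one caption block (empty if none)."""
    problems = []
    if len(block.text.splitlines()) > MAX_LINES:
        problems.append("more than 2 lines")
    if block.end - block.start < MIN_DURATION_SECONDS:
        problems.append("shorter than 1 second")
    return problems


print(check_block(CaptionBlock(0.0, 0.5, "(Tom)\nHey, you wanna go\nto the store with me?")))
```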

Speaker Identification

  • All speakers in the video should be identified in the captions either by their name or by labelling them numerically based upon appearance in the video.
    • (Speaker 1):
    • (Name):
  • Do not identify the speaker by name until the speaker is introduced in the audio or by an onscreen text/graphic.
  • Ideally, the speaker name should be in parentheses in a separate line from the text.
    • (Tom)
    • Hey, you wanna go to the store with me?
  • Identify the speakers or narrator every time the person speaking changes. For example, during a conversation, there may be two or three speakers. Identify each speaker as they add to the conversation.
  • Identify the narrator in the video in the same manner as a speaker. Include their gender, if possible, when identifying the narrator.
    • If there are multiple speakers and only one narrator, identify as (female narrator) or (male narrator) at the beginning of the media. It is not necessary to identify gender for each caption thereafter.

Identify background noise/sound effects

  • Caption the background noise if the noise is necessary for the understanding and/or enjoyment of the media.
    • For example, if there is an explosion off screen, and all of the actors run off stage, how is the caption user supposed to know why the actors all ran off?
  • A description of sound effects, in brackets, should include the source of the sound.
    • [dog barking]
    • [person screaming]
  • Identify if the noise or sound effect is off screen
    • [dog barking offscreen]
  • For offscreen sound effects, it is not necessary to repeat the source of the sound if it is making the same sound a few captions later.
    • [dog baying]
    • [baying continues]

Language Mechanics

Language mechanics incorporate the proper use of spelling, capitalization, punctuation, grammar, and other factors deemed necessary for high-quality captioned media.

  • All-capital letters should be used only to indicate screaming or shouting.
  • Spelling throughout the video should be consistent.
  • Use serial commas when captioning a list.
    • I saw a tiger, a rhino, and a bear.
  • When captioning a word that is spelled out, separate capital letters with hyphens.
    • C-A-R
  • Be sure to correct spelling errors by listening to the sentence in context.
    • For example, “there,” “their,” and “they’re” all sound the same.
    • Listen to the audio to make sure the right “there” is captioned.
  • Use proper punctuation throughout the captions.
    • Do not end every caption block with a period but rather end the full sentence with the period.
  • Use an ellipsis for a significant pause in the audio, but only if nothing of importance is being displayed visually.
    • Do not use an ellipsis to show that the sentence continues into the next caption.
  • Caption filler words for at least the first minute or two of the video.
    • If captioned throughout the entire video, the excess characters could cause the captions to fall behind the audio.

Accuracy & Timing

  • A live transcriber has an accuracy rate of 99%.
    • Closed captions should have the same accuracy rate.
    • The accuracy rate covers the three main elements listed above and the timing/syncing of the captions with the audio.
  • The captions should appear on the screen in real-time as the audio is played.
    • The caption blocks are assigned start and end times so that they appear at the correct part of the video. Many subtitle programs require you to do this manually.
    • Be sure to adjust the timing within YouTube or other platforms so everything happens in real-time.
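
To get a rough sense of how close a caption file is to the 99% figure, a word-level comparison against a hand-checked reference transcript can help. The sketch below is only an approximation under stated assumptions: it counts matching words with Python's `difflib`, ignores punctuation quality, and does not measure timing or syncing, all of which the accuracy rate above also covers. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, captions: str) -> float:
    """Fraction of reference words that also appear, in order, in the captions.

    A rough estimate only: extra words in the captions are not penalized,
    and timing is not considered.
    """
    ref_words = reference.lower().split()
    cap_words = captions.lower().split()
    if not ref_words:
        return 1.0
    matcher = SequenceMatcher(None, ref_words, cap_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


reference = "their dog is over there"
captions = "there dog is over there"  # one homophone error
print(f"{word_accuracy(reference, captions):.0%}")
```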

Editing of Captions

  • The audio can be edited in the captions so that there are fewer words going across the screen.
    • For example: original “Will you get out of here!”, edited “Will you get out!”
  • When editing occurs, each caption should maintain the meaning, content, and essential vocabulary of the original narration.
  • When the phrase is long, consider breaking it into two lines of text.
    • Original Narration
      And what we're saying is that people with dyslexia tend to get distracted by the words on either side
    • Edited
      Words on either side
      distract people with dyslexia
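
Whether a phrase needs editing can be estimated by wrapping it to a maximum line length and counting the lines. In the sketch below, the 32-character limit is an assumption chosen for illustration, not a figure stated on this page. The wrapping is purely mechanical, so line breaks should still be reviewed by hand to fall at natural points in the sentence.

```python
import textwrap

MAX_CHARS_PER_LINE = 32  # assumed limit for illustration; adjust for your player
MAX_LINES = 2            # no more than 2 lines per caption block


def split_caption(text: str, width: int = MAX_CHARS_PER_LINE) -> list[str]:
    """Wrap caption text into lines no longer than `width` characters."""
    return textwrap.wrap(text, width=width)


def needs_editing(text: str) -> bool:
    """True when the text will not fit in one caption block as written."""
    return len(split_caption(text)) > MAX_LINES


original = ("And what we're saying is that people with dyslexia "
            "tend to get distracted by the words on either side")
edited = "Words on either side distract people with dyslexia"

print(needs_editing(original))  # the original narration is too long for one block
print(split_caption(edited))    # the edited version fits on two lines
```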

Save or Export Your File

If you create your captions in software other than the media player that will display them, you will need to save or export the captions from the caption editor so that they can be uploaded to the destination media repository.

Captioning files are typically saved with one of the following extensions: .srt, .vtt, .sbv, .dfxp, .sami, or .ttml. SRT files are the simplest format and can easily be edited by anyone using a text editor. If your video is staying on YouTube after you edit the captions, you shouldn’t need to export the captions.
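
To show why SRT is easy to edit by hand, here is a minimal sketch that writes caption blocks in the SRT layout: a sequence number, a timing line in `HH:MM:SS,mmm --> HH:MM:SS,mmm` form, the caption text, and a blank line between blocks. The function names and the tuple layout are assumptions made for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(blocks: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT text."""
    parts = []
    for number, (start, end, text) in enumerate(blocks, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        parts.append(f"{number}\n{timing}\n{text}\n")
    return "\n".join(parts)  # blank line between consecutive blocks


captions = [
    (0.0, 2.0, "(Tom)\nHey, you wanna go to the store with me?"),
    (2.5, 4.0, "[dog barking offscreen]"),
]
print(to_srt(captions))
```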

If you create your captions in a captioning editor like YouTube or Amara, you can export your file to a variety of caption formats which can then be uploaded to Panopto, YouTube, or any other player that accepts standard caption formats.
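
If a player accepts only WebVTT and your editor exported SRT, a simple caption file can often be converted directly. The sketch below assumes a plain SRT file with no styling tags: it adds the `WEBVTT` header and changes the comma in each timing line to a period. Files with positioning or formatting may need a dedicated converter.

```python
import re

# Matches an SRT timestamp such as 00:01:02,500
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert plain SRT caption text to WebVTT text."""
    converted = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are changed; caption text is left untouched.
            line = SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


sample = "1\n00:00:01,000 --> 00:00:03,500\n[dog barking]\n"
print(srt_to_vtt(sample))
```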


Transcripts

Transcripts can be thought of as text versions of your video. A transcript should include not only what is spoken in the video, but also descriptions of actions or important information on-screen. Usually, a fully accessible video should include both captions and a transcript.

A transcript can be used by a student who is blind, has another disability, or prefers not to watch the video. They can use the transcript to get all of the information from the video with relative ease.

  • You can create a transcript from the captions made for your video.
  • You could use automatic speech recognition software to create a transcript.
    • The transcript will need to be reviewed and edited as there may be errors.
  • You can manually type the video content.
    • This can be time-consuming, but it is practical if you only make videos occasionally.
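
The first option, creating a transcript from existing captions, can be sketched as follows, assuming the captions are in SRT format. The function name is hypothetical. This only recovers the caption text; descriptions of actions or on-screen information still need to be added by hand.

```python
import re


def srt_to_transcript(srt_text: str) -> str:
    """Strip sequence numbers and timing lines from SRT text, keeping the captions."""
    pieces = []
    # Caption blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timing line is caption text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                if text:
                    pieces.append(text)
                break
    return "\n".join(pieces)


sample = (
    "1\n00:00:00,000 --> 00:00:02,000\n(Tom)\nHey, you wanna go to the store with me?\n"
    "\n"
    "2\n00:00:02,500 --> 00:00:04,000\n[dog barking offscreen]\n"
)
print(srt_to_transcript(sample))
```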

Transcript to Captions

When creating videos, staff should consider either using software that generates automatic captions that can be edited or creating captions manually. YouTube and Panopto will create caption files that can be edited. Other programs, listed below, will create a transcript that must be edited and then converted to captions. The conversion can easily be done by uploading your video to YouTube and copying your transcript over; YouTube will do the rest. By the end, you will have a caption file that can be uploaded to other platforms along with your video.
Note: No software will be consistently accurate enough to transcribe audio on its own. You will need to review and correct the output of any machine-aided transcription.

Software and Programs for Auto-generated Transcription

Note: If you are using any of the above software except YouTube to transcribe pre-recorded audio, you may need to route your computer’s audio output back into the computer as audio input for better transcription quality. Suggestions are available on how to do so for Mac and Windows.

Software and Programs for Manual Transcription

Accessible Video Content

The visual content of the video can impact accessibility for viewers.

  • Use colors thoughtfully and with good contrast
    • Do not use color alone to distinguish important information
    • When a color change signals importance, pair it with bold or italic text, or start the statement with the word “Important”
    • Use a color contrast tester for your material or video content
  • Use text that is easy to read
    • The font should be large enough to read
    • A sans-serif font is typically the most accessible for viewers
  • Avoid fast-flashing content
    • Avoid flashing content in videos. If it must be included, verify that it stays at or below the three-flash threshold: do not use videos that flash more than three times within any 1-second period, as this can provoke seizures in viewers with seizure disorders.
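
Two of the checks above can be expressed numerically. The sketch below computes a color contrast ratio using the WCAG 2.x relative-luminance formula, which is what color contrast testers typically report, and checks a list of flash times against the three-flashes-per-second limit. The function names are hypothetical, and the flash timestamps are assumed to be known already; detecting flashes in actual video frames is beyond this sketch.

```python
def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 values."""
    def linearize(value: int) -> float:
        c = value / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple[int, int, int],
                   background: tuple[int, int, int]) -> float:
    """Contrast ratio from 1.0 (identical colors) to about 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def exceeds_flash_limit(flash_times: list[float],
                        max_flashes: int = 3,
                        window_seconds: float = 1.0) -> bool:
    """True if more than `max_flashes` flashes fall within any one window."""
    times = sorted(flash_times)
    first = 0  # index of the earliest flash still inside the current window
    for last, current in enumerate(times):
        while current - times[first] >= window_seconds:
            first += 1
        if last - first + 1 > max_flashes:
            return True
    return False


print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))
print(exceeds_flash_limit([0.0, 0.2, 0.4, 0.6]))
```

For reference, WCAG level AA asks for a contrast ratio of at least 4.5:1 for normal-size text.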

Audio Description

Audio description is a narration track added to the soundtrack of a video to describe important visual details that cannot be understood from the main soundtrack alone. Audio description should be used when there is information being portrayed solely through visual means that is required for understanding. Audio description is typically used by individuals who cannot see the visual output of a video or movie. The audio description should include information about:

  • Actions
  • Characters
  • Scene changes
  • On-screen text
  • Other visual content

If videos are created with accessibility in mind, and the onscreen information is adequately described in the main soundtrack, audio descriptions probably are not necessary.

Page last updated 3:40 PM, June 6, 2024