May 10, 2026 · 11 min read · MDisBetter

How to Transcribe a Video for Free (8 Methods Compared)

Transcribing video for free is genuinely possible in 2026: the tools have caught up. The catch, as always, is that "free" hides a wide range of tradeoffs: monthly minute caps, file-size limits, watermarks, mandatory signups, technical setup costs, or quietly degraded models. Here are the eight methods that actually work, with an honest accounting of what each one gives you and what it costs.

Method 1: Web tools with free tiers

The fastest no-setup option. Drop a video file (or paste a YouTube/Vimeo URL) into a browser-based tool and get a transcript back.

How to use

Open the mdisbetter video-to-Markdown converter (/convert/video-to-markdown), NoteGPT, YouTubeToTranscript, or one of the dozen-plus general transcription tools.
Upload the video file or paste a public URL.
Wait for processing (60-180 seconds for a 30-minute video on most cloud tools).
Copy the transcript or download as TXT/SRT/Markdown.

Free-tier reality

mdisbetter — free tier (no signup), monthly minute cap. Output: structured Markdown with speakers + sections + timestamps. Best for AI-pipeline use.
NoteGPT free — monthly cap on YouTube transcripts, plain text + AI summary.
YouTubeToTranscript free — YouTube only, plain text from caption track.
TurboScribe free — 3 files/day, 30 min each. SRT/TXT output.
Otter free — 600 min/month, 40-min per-file cap. Plain text.

Best for

Casual users who want to drop a file and get text back. Pick mdisbetter for AI-input use (structured Markdown), TurboScribe for daily SRT volume, Otter for meetings.
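Before uploading, it helps to check whether a file fits a tool's per-file cap at all. The sketch below encodes only the two per-file caps quoted above (TurboScribe at 30 minutes, Otter at 40 minutes); the `FreeTier` class and `tiers_that_fit` helper are hypothetical names for illustration, and the other tools are left out because their caps are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeTier:
    name: str
    per_file_cap_min: int  # longest single file the free tier accepts
    other_limit: str       # the second limit quoted in this guide


# Only tiers whose per-file cap is stated in the list above.
TIERS = [
    FreeTier("TurboScribe", 30, "3 files/day"),
    FreeTier("Otter", 40, "600 min/month"),
]


def tiers_that_fit(duration_min: float) -> list[str]:
    """Return the tiers whose per-file cap covers the whole video."""
    return [t.name for t in TIERS if duration_min <= t.per_file_cap_min]


print(tiers_that_fit(25))  # fits both caps
print(tiers_that_fit(35))  # too long for TurboScribe's 30-minute cap
print(tiers_that_fit(60))  # fits neither without splitting
```

Free-tier limits change often, so treat the numbers as a snapshot and check each tool's pricing page before relying on them.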

Method 2: Whisper locally (the unbeatable free option for volume)

OpenAI's Whisper is open source. Run it on your own machine and it is truly unlimited.

How to use

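# Install (run in a shell, not in Python)
pip install -U faster-whisper

# Transcribe (Python).
# No separate audio-extraction step is needed: faster-whisper decodes media with
# PyAV, which bundles the FFmpeg libraries, so a video file can be passed in directly.

from faster_whisper import WhisperModel

# device="cuda" needs an NVIDIA GPU; on a CPU-only machine use
# device="cpu" with compute_type="int8" and expect much slower runs.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "video.mp4",
    beam_size=5,
    vad_filter=True,  # drop non-speech stretches before decoding
)

# segments is a lazy generator: transcription runs as the loop consumes it.
with open("transcript.md", "w", encoding="utf-8") as f:
    for seg in segments:
        f.write(f"[{seg.start:.0f}s] {seg.text.strip()}\n\n")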
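If you need subtitles rather than a Markdown file, the same segments can be written as SRT. The sketch below is a minimal formatter, assuming each segment is reduced to a `(start, end, text)` tuple (faster-whisper segments expose `start`, `end`, and `text` attributes); `srt_timestamp` and `to_srt` are hypothetical helper names, not part of any library.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


demo = [(0.0, 2.5, " Hello. "), (2.5, 4.0, "World.")]
print(to_srt(demo))
```

With real output, the tuples could be built as `[(s.start, s.end, s.text) for s in segments]` before calling `to_srt`.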

Pros

Truly free at any volume. Best handling of noisy audio. Total privacy. 100+ languages. Choice of model size (tiny, base, small, medium, large-v3).

Cons

Requires Python and ideally a GPU. CPU works but is slow: expect roughly 3-5 times the video's running time on a modern laptop. No diarization out of the box (use WhisperX for that). No structural post-processing: you get plain timestamped text.
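The missing structure can be partly recovered with a little post-processing. The sketch below merges consecutive segments into paragraphs and starts a new paragraph whenever the speaker pauses; the 2-second `max_gap` threshold and both helper names are assumptions chosen for illustration, not something Whisper provides. Segments are again assumed to be `(start, end, text)` tuples.

```python
def group_into_paragraphs(segments, max_gap=2.0):
    """Merge segments into (start, text) paragraphs, splitting on pauses longer than max_gap seconds."""
    paragraphs = []
    current_start, current_end, parts = None, None, []
    for start, end, text in segments:
        # A long silence since the previous segment closes the current paragraph.
        if parts and start - current_end > max_gap:
            paragraphs.append((current_start, " ".join(parts)))
            parts = []
        if not parts:
            current_start = start
        parts.append(text.strip())
        current_end = end
    if parts:
        paragraphs.append((current_start, " ".join(parts)))
    return paragraphs


def to_markdown(paragraphs):
    """Render paragraphs as Markdown with a bold [MM:SS] marker in front of each one."""
    lines = []
    for start, text in paragraphs:
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"**[{minutes:02d}:{seconds:02d}]** {text}")
    return "\n\n".join(lines) + "\n"


demo = [
    (0.0, 3.0, " Welcome back."),
    (3.2, 6.0, " Today we cover setup."),
    (10.5, 13.0, " First, install Python."),
]
print(to_markdown(group_into_paragraphs(demo)))
```

Pause-based splitting is a rough heuristic: it produces readable paragraphs but does not identify speakers or topics.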

Best for

Developers, researchers, anyone with privacy constraints, anyone transcribing many hours per month.

Method 3: macOS Live Captions / Voice Memos

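Recent macOS releases include Live Captions and Voice Memos transcription, both of which run entirely on-device on Apple Silicon. Live Captions arrived with macOS Ventura; Voice Memos transcription requires macOS Sequoia or later (iOS 18 on iPhone).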

How to use (live captions for any video playing on the Mac)

System Settings → Accessibility → Live Captions → toggle on.
Play the video in any app (QuickTime, browser, VLC).
Captions appear in a floating window in real time.
To save the transcript, copy from the captions window or use a screen recording.

How to use (Voice Memos transcription)

Extract the audio from your video file: ffmpeg -i video.mp4 -vn audio.m4a
Open Voice Memos on macOS or iOS, import the audio.
Tap the transcript icon — Apple's on-device model produces a transcript.
Copy or share the text.

Pros

Truly free, on-device (no cloud), private, works on any audio.

Cons

Live Captions don't save by default — you have to copy or screen-record. Voice Memos transcription is hidden behind UI tap-throughs and not designed for batch. Apple Silicon required for the offline mode.

Best for

Mac users live-captioning a single video they are watching, or one-off voice memo conversion.

Method 4: Google Live Caption

Google's accessibility features include Live Caption (Android, ChromeOS, Chrome browser on Mac/Windows) which runs on-device.

How to use

Enable Live Caption in Chrome: Settings → Accessibility → Live Caption → toggle on.
Play any video in Chrome (YouTube, embedded videos, local file via drag-into-tab).
Captions render in a floating box.
To save: select and copy the running text, or use a screen recording.

Pros

Free, on-device, available on any platform with Chrome.

Cons

Same limitation as macOS Live Captions: not designed for export. English-heavy support; other languages limited.

Method 5: Otter video upload

Otter accepts video file uploads in addition to its meeting bot.

How to use

Sign up for Otter free (600 min/month, 40-min per-file cap).
In the dashboard, click Import → upload the video file.
Otter extracts the audio and transcribes.
Edit, share, or export the transcript from the Otter dashboard.

Pros

Strong diarization. Searchable archive across uploads. Action-item extraction.

Cons

40-minute per-file cap on free tier (a 60-min lecture won't fit). Plain text output (no Markdown). Aggressive upgrade prompts.
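One workaround for the per-file cap is to split a long recording into parts that each fit. The sketch below only plans the split points; `plan_chunks` is a hypothetical helper, and the 60-minute lecture against a 40-minute cap is the example from the text above.

```python
def plan_chunks(total_min: float, cap_min: float) -> list[tuple[float, float]]:
    """Return (start, length) pairs in minutes, each no longer than cap_min."""
    if total_min < 0 or cap_min <= 0:
        raise ValueError("total_min must be >= 0 and cap_min must be > 0")
    chunks = []
    start = 0.0
    while start < total_min:
        length = min(float(cap_min), total_min - start)
        chunks.append((start, length))
        start += length
    return chunks


# A 60-minute lecture against a 40-minute per-file cap: one 40-minute part, one 20-minute part.
print(plan_chunks(60, 40))
```

Splitting does not save quota: both parts still count against the 600 monthly minutes, and a cut that lands mid-sentence can cost a few words at the boundary.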

Method 6: VLC + audio extraction + free transcription tool

If your video is in a format that some web tools choke on (large MKV, niche codecs), the workflow is to extract audio first.

How to use

Open the video in VLC.
Media → Convert/Save → click Add and select your video → click the Convert/Save button.
Pick "Audio - MP3" profile and a destination filename.
Click Start. VLC writes an MP3 of the audio track.
Upload the MP3 to any audio transcription tool (the audio-to-markdown tools work the same way).

Pros

Sidesteps video-format issues. Smaller uploads (an audio-only MP3 is much smaller than the video file it came from).

Cons

Two-step workflow. The transcription tool you choose still has its own free-tier limits.

Method 7: yt-dlp + Whisper (for online video URLs)

For YouTube, Vimeo, X, TikTok, Twitch, and 1000+ other supported sites, yt-dlp downloads the audio and Whisper transcribes it.

How to use

# Install both (shell). yt-dlp's -x audio extraction also requires ffmpeg on your PATH.
pip install -U yt-dlp faster-whisper

# Download audio only from a URL (shell)
yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" \
  "https://www.youtube.com/watch?v=VIDEO_ID"

# Transcribe (Python)
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")
for s in segments:
    print(f"[{s.start:.0f}s] {s.text.strip()}")

Pros

Works on virtually any online video. Local processing. Highest accuracy. Unlimited use.

Cons

Setup cost. Same Whisper limitations on speaker diarization and structure.

Best for

Researchers and developers transcribing many videos from many platforms with full local control.
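For many URLs, the download step can be scripted. The sketch below builds the same yt-dlp command shown above for each URL, naming files by video ID through yt-dlp's `%(id)s` output-template field so downloads do not overwrite each other. The function names and the `audio` output directory are assumptions for illustration, and `download_all` expects yt-dlp and ffmpeg to be installed.

```python
import shlex
import subprocess


def ytdlp_audio_command(url: str, out_dir: str = "audio") -> list[str]:
    """Build the yt-dlp command that extracts one URL's audio as MP3."""
    return [
        "yt-dlp",
        "-x",                      # extract audio only
        "--audio-format", "mp3",
        "-o", f"{out_dir}/%(id)s.%(ext)s",  # one file per video ID
        url,
    ]


def download_all(urls: list[str], out_dir: str = "audio") -> None:
    """Run yt-dlp once per URL; raises CalledProcessError if a download fails."""
    for url in urls:
        subprocess.run(ytdlp_audio_command(url, out_dir), check=True)


# Print the command instead of running it, so the sketch works without network access.
print(shlex.join(ytdlp_audio_command("https://www.youtube.com/watch?v=VIDEO_ID")))
```

The resulting MP3 files can then be passed one by one to `model.transcribe` as in the snippet above.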

Method 8: ffmpeg + manual

The lowest-level option. ffmpeg extracts audio, then any free transcription tool processes the audio.

How to use

# Extract audio from video as MP3
ffmpeg -i video.mp4 -vn -c:a libmp3lame -b:a 128k audio.mp3

# Or copy the original audio stream without re-encoding (faster).
# The .aac extension assumes the source audio track is AAC, as in most MP4 files.
ffmpeg -i video.mp4 -vn -c:a copy audio.aac

Then upload audio.mp3 to mdisbetter, Otter, TurboScribe, or feed to local Whisper.

When to use this

When the original video file is in a problematic format, when you need to chop the video first (ffmpeg can clip with -ss and -t flags), or when you want maximum control.
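Clipping with `-ss` and `-t` can be scripted when you need several parts from one file. The sketch below builds an ffmpeg command that seeks to a start offset, keeps a fixed duration, drops the video stream, and encodes MP3. `ffmpeg_clip_command` is a hypothetical helper, and the 128k bitrate simply mirrors the extraction command above.

```python
import shlex


def ffmpeg_clip_command(src: str, dst: str, start_s: int, duration_s: int) -> list[str]:
    """Build an ffmpeg command that extracts duration_s seconds of audio starting at start_s."""
    return [
        "ffmpeg",
        "-ss", str(start_s),      # seek to the start offset (placed before -i for fast input seeking)
        "-i", src,
        "-t", str(duration_s),    # keep this many seconds
        "-vn",                    # drop the video stream
        "-c:a", "libmp3lame",
        "-b:a", "128k",
        dst,
    ]


# First 40 minutes of a recording, as a shell-ready string.
print(shlex.join(ffmpeg_clip_command("video.mp4", "part1.mp3", 0, 40 * 60)))
```

Pass the returned list to `subprocess.run(..., check=True)` to execute it once ffmpeg is installed.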

Quick comparison table

Method	Free quota	Setup	Privacy	Output
mdisbetter web	Monthly cap	None	Cloud	Markdown
Whisper local	Unlimited	Python+GPU	Local	Plain + timestamps
macOS Live	Unlimited	Built-in	On-device	Live only
Google Live Caption	Unlimited	Chrome	On-device	Live only
Otter video upload	600 min/month	Signup	Cloud	TXT
VLC + tool	Tool-dependent	VLC + tool	Cloud	Tool-dependent
yt-dlp + Whisper	Unlimited	CLI + GPU	Local	Plain + timestamps
ffmpeg + tool	Tool-dependent	ffmpeg + tool	Tool-dependent	Tool-dependent

Decision tree

Want Markdown for AI tools? mdisbetter.
Have a GPU and Python comfort? Whisper local. Best free option, period.
Mac user, just need to read what's said in one video? macOS Live Captions or Voice Memos.
Browser-based, English content? Google Live Caption in Chrome.
Many short meetings? Otter free tier.
Problematic video format? VLC or ffmpeg to extract audio first, then any tool.
Online video from any platform? yt-dlp + Whisper.
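The same questions can be written as a small function, which makes the order of precedence explicit. This is a sketch only: the flag names are hypothetical, checking the rules top to bottom with the first match winning is an assumption, and the fallback to a cloud free tier is borrowed from this guide's advice for casual users.

```python
def pick_method(
    *,
    wants_markdown: bool = False,
    has_gpu_and_python: bool = False,
    mac_single_video: bool = False,
    english_in_browser: bool = False,
    many_short_meetings: bool = False,
    problematic_format: bool = False,
    online_url: bool = False,
) -> str:
    """Return the first recommendation whose condition holds, in the order listed above."""
    rules = [
        (wants_markdown, "mdisbetter"),
        (has_gpu_and_python, "Whisper local"),
        (mac_single_video, "macOS Live Captions or Voice Memos"),
        (english_in_browser, "Google Live Caption in Chrome"),
        (many_short_meetings, "Otter free tier"),
        (problematic_format, "VLC or ffmpeg to extract audio, then any tool"),
        (online_url, "yt-dlp + Whisper"),
    ]
    for applies, method in rules:
        if applies:
            return method
    return "Any cloud free tier"  # assumed default when no rule matches


print(pick_method(online_url=True))
print(pick_method(has_gpu_and_python=True, online_url=True))
```

In practice several answers can apply at once (a GPU owner with an online URL would combine yt-dlp with local Whisper), so read the result as a starting point.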

What you typically don't get for free

The common upgrades behind paid tiers: longer per-file caps, more monthly minutes, additional languages, additional output formats, advanced editor features, team collaboration, real-time captioning, and sometimes accuracy (some vendors route free-tier audio through faster, slightly less accurate models).

The format limitation often hurts most. If your transcript feeds an LLM, structured Markdown beats plain text by a meaningful margin. For AI-pipeline use, mdisbetter is the only free Markdown-output option in this list. We cover the broader format question in "Your YouTube videos are invisible to AI."

The honest summary

For most casual users: pick one cloud free tier and use it (mdisbetter for AI use, TurboScribe for SRT volume, Otter for meetings). For serious volume or privacy: Whisper local. For one-off live transcription on what you're watching: macOS or Google Live Captions. The mistake to avoid is paying before you have tested the free tiers — every option above gives you enough free runway to evaluate.

For the YouTube-specific transcript-download patterns, see how to download a YouTube transcript. For Vimeo specifically, see how to get a transcript from Vimeo. For Zoom recordings, see how to transcribe a Zoom meeting for free. For the audio-only equivalent of this guide, see the parallel /convert/audio-to-markdown-for-podcasters.

The honest cost of "free"

Every option in this guide costs something even when it costs no money. The cloud free tiers cost time waiting in queues and effort working around per-file caps. The local options cost setup time, GPU electricity, and the maintenance burden of keeping a Python environment working. The OS built-ins cost flexibility — they are convenient for one-off use but not designed for archival or batch workflows. The right pick depends on which of these costs is cheapest for you. For most knowledge workers transcribing 5-20 videos per month, the cloud free tiers are the lowest-friction answer. For developers or researchers transcribing 50+ hours per month, the setup cost of local Whisper amortizes quickly. The mistake is treating "free" as a single category instead of recognizing the tradeoffs.

Frequently asked questions

How long does transcribing a 30-minute video take on the free tier?

Cloud tools: typically 60-180 seconds wall-clock. Whisper local on a modern GPU: about 6-10 minutes for large-v3. Whisper on a CPU-only laptop: 90-150 minutes. Most people overestimate the cloud processing time and underestimate the local CPU time — if you're going local, a GPU makes a massive difference.
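To scale the local figures to other video lengths, a rough estimate can be computed from the 30-minute numbers above. The sketch assumes processing time grows linearly with video length, which is a simplification; `estimate_minutes` and the `MINUTES_PER_30` table are hypothetical names, and cloud tools are left out because their 60-180 second figure is quoted only for the 30-minute case.

```python
# (low, high) processing minutes for a 30-minute video, taken from the answer above.
MINUTES_PER_30 = {
    "gpu": (6, 10),     # Whisper large-v3 on a modern GPU
    "cpu": (90, 150),   # Whisper on a CPU-only laptop
}


def estimate_minutes(video_min: float, hardware: str) -> tuple[float, float]:
    """Scale the 30-minute figures linearly to a video of video_min minutes."""
    low, high = MINUTES_PER_30[hardware]
    return (video_min * low / 30, video_min * high / 30)


print(estimate_minutes(30, "cpu"))   # the 90-150 minute range quoted above
print(estimate_minutes(120, "gpu"))  # a two-hour recording on a GPU
```

Actual speed depends heavily on the model size, the GPU, and how much of the audio is silence, so use the result for planning rather than as a guarantee.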

Can I transcribe a video without uploading it to anyone's server?

Yes — the local options (Whisper, yt-dlp + Whisper, macOS/Google Live Captions, ffmpeg + Whisper) all process the file entirely on your machine. Nothing leaves your computer. Whisper local is the most accurate of these and the right pick for any privacy-sensitive content.

Which free method handles the most languages?

Whisper large-v3 supports 100+ languages with strong accuracy on European and major Asian languages. Cloud tools vary: Otter is English-first, mdisbetter and TurboScribe support major world languages, NoteGPT is broad. macOS and Google Live Captions are English-heavy with limited additional language support depending on OS version. For non-English content, Whisper local is usually the best free option.