How to Transcribe a Video for Free (8 Methods Compared)
Transcribing video for free is genuinely possible in 2026 — the tools have caught up. The catch, as always, is that "free" hides a wide range of tradeoffs: monthly minute caps, file-size limits, watermarks, mandatory signups, technical setup costs, or quietly degraded models. Here are the eight methods that actually work, with an honest accounting of what each one gives you and what it costs.
Method 1: Web tools with free tiers
The fastest no-setup option. Drop a video file (or paste a YouTube/Vimeo URL) into a browser-based tool and get a transcript back.
How to use
- Open mdisbetter's /convert/video-to-markdown, NoteGPT, YouTubeToTranscript, or one of the dozen-plus general transcription tools.
- Upload the video file or paste a public URL.
- Wait for processing (60-180 seconds for a 30-minute video on most cloud tools).
- Copy the transcript or download as TXT/SRT/Markdown.
Free-tier reality
- mdisbetter — free tier (no signup), monthly minute cap. Output: structured Markdown with speakers + sections + timestamps. Best for AI-pipeline use.
- NoteGPT free — monthly cap on YouTube transcripts, plain text + AI summary.
- YouTubeToTranscript free — YouTube only, plain text from caption track.
- TurboScribe free — 3 files/day, 30 min each. SRT/TXT output.
- Otter free — 600 min/month, 40-min per-file cap. Plain text.
Best for
Casual users who want to drop a file and get text back. Pick mdisbetter for AI-input use (structured Markdown), TurboScribe for daily SRT volume, Otter for meetings.
Method 2: Whisper locally (the unbeatable free option for volume)
OpenAI's Whisper is open source. Run it on your own machine and it is truly unlimited.
How to use
```shell
# Install
pip install -U faster-whisper
```

```python
# If your video file isn't already an audio file, you can extract audio first
# (though faster-whisper accepts video files directly via FFmpeg)
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "video.mp4",
    beam_size=5,
    vad_filter=True,
)
with open("transcript.md", "w") as f:
    for seg in segments:
        f.write(f"[{seg.start:.0f}s] {seg.text.strip()}\n\n")
```

Pros
Truly free at any volume. Best handling of noisy audio. Total privacy. 100+ languages. Choice of model size (tiny, base, small, medium, large-v3).
Cons
Requires Python and ideally a GPU. CPU works but slowly (3-5x real time on a modern laptop). No diarization out of the box (use WhisperX for that). No structure post-processing — you get plain timestamped text.
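That last gap is easy to close with a few lines of your own. A minimal post-processing sketch, assuming objects with `start`, `end`, and `text` attributes like faster-whisper's segments, that groups timestamped text into Markdown paragraphs at silence gaps (the 2-second threshold is an arbitrary choice):

```python
# Group Whisper-style segments into Markdown paragraphs, starting a new
# paragraph whenever the silence between segments exceeds `gap` seconds.
# Segment is a stand-in for faster-whisper's Segment type; any object
# with .start, .end, and .text works.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float
    end: float
    text: str

def segments_to_markdown(segments, gap=2.0):
    paragraphs, current = [], []
    prev_end = None
    for seg in segments:
        if prev_end is not None and seg.start - prev_end > gap:
            paragraphs.append(current)
            current = []
        current.append(seg)
        prev_end = seg.end
    if current:
        paragraphs.append(current)
    lines = []
    for para in paragraphs:
        stamp = f"**[{para[0].start:.0f}s]**"
        text = " ".join(s.text.strip() for s in para)
        lines.append(f"{stamp} {text}")
    return "\n\n".join(lines)

demo = [Segment(0.0, 2.5, " Hello and welcome."),
        Segment(2.6, 4.0, " Today we cover Whisper."),
        Segment(9.0, 11.0, " First, installation.")]
print(segments_to_markdown(demo))
```

The same function slots directly after the Whisper loop above: pass it the segments instead of writing them line by line.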
Best for
Developers, researchers, anyone with privacy constraints, anyone transcribing many hours per month.
Method 3: macOS Live Captions / Voice Memos
macOS Sonoma and newer ship Live Captions and Voice Memos transcription that run entirely on-device on Apple Silicon.
How to use (live captions for any video playing on the Mac)
- System Settings → Accessibility → Live Captions → toggle on.
- Play the video in any app (QuickTime, browser, VLC).
- Captions appear in a floating window in real time.
- To save the transcript, copy from the captions window or use a screen recording.
How to use (Voice Memos transcription)
- Extract the audio from your video file: `ffmpeg -i video.mp4 -vn audio.m4a`
- Open Voice Memos on macOS or iOS, import the audio.
- Tap the transcript icon — Apple's on-device model produces a transcript.
- Copy or share the text.
Pros
Truly free, on-device (no cloud), private, works on any audio.
Cons
Live Captions don't save by default — you have to copy or screen-record. Voice Memos transcription is hidden behind UI tap-throughs and not designed for batch. Apple Silicon required for the offline mode.
Best for
Mac users live-captioning a single video they are watching, or one-off voice memo conversion.
Method 4: Google Live Caption
Google's accessibility features include Live Caption (Android, ChromeOS, and the Chrome browser on Mac/Windows), which runs on-device.
How to use
- Enable Live Caption in Chrome: Settings → Accessibility → Live Caption → toggle on.
- Play any video in Chrome (YouTube, embedded videos, local file via drag-into-tab).
- Captions render in a floating box.
- To save: select and copy the running text, or use a screen recording.
Pros
Free, on-device, available on any platform with Chrome.
Cons
Same limitation as macOS Live Captions: not designed for export. Support is English-heavy; other languages are limited.
Method 5: Otter video upload
Otter accepts video file uploads in addition to its meeting bot.
How to use
- Sign up for Otter free (600 min/month, 40-min per-file cap).
- In the dashboard, click Import → upload the video file.
- Otter extracts the audio and transcribes.
- Edit, share, or export the transcript from the Otter dashboard.
Pros
Strong diarization. Searchable archive across uploads. Action-item extraction.
Cons
40-minute per-file cap on free tier (a 60-min lecture won't fit). Plain text output (no Markdown). Aggressive upgrade prompts.
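If a long recording has to fit under the cap, you can split it with ffmpeg before uploading. A sketch that builds the split commands, assuming you already know the file's duration; the filenames and the 38-minute chunk length (chosen to stay under the 40-minute cap) are illustrative:

```python
# Build ffmpeg commands that cut an audio file into chunks that fit
# under a per-file cap (e.g. Otter's 40 minutes). This is purely a
# command builder -- run the returned commands with subprocess or a shell.
def chunk_commands(src, duration_s, chunk_s=38 * 60):
    cmds = []
    start, part = 0, 1
    while start < duration_s:
        out = f"part{part:02d}.mp3"
        cmds.append(["ffmpeg", "-ss", str(start), "-i", src,
                     "-t", str(chunk_s), "-vn", "-acodec", "libmp3lame", out])
        start += chunk_s
        part += 1
    return cmds

# A 60-minute lecture becomes two sub-40-minute chunks:
cmds = chunk_commands("lecture.mp3", duration_s=60 * 60)
print(len(cmds))  # 2
```

Each command seeks to the chunk start (`-ss`) and cuts `chunk_s` seconds (`-t`), so the pieces upload independently.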
Method 6: VLC + audio extraction + free transcription tool
If your video is in a format that some web tools choke on (large MKV, niche codecs), the workflow is to extract audio first.
How to use
- Open the video in VLC.
- Media → Convert/Save → select your video → Convert/Save button.
- Pick "Audio - MP3" profile and a destination filename.
- Click Start. VLC writes an MP3 of the audio track.
- Upload the MP3 to any audio transcription tool (the audio-to-markdown tools work the same way).
Pros
Sidesteps video-format issues. Smaller file uploads (MP3 is much smaller than MP4).
Cons
Two-step workflow. The transcription tool you choose still has its own free-tier limits.
Method 7: yt-dlp + Whisper (for online video URLs)
For YouTube, Vimeo, X, TikTok, Twitch, and 1000+ other supported sites, yt-dlp downloads the audio and Whisper transcribes it.
How to use
```shell
# Install both
pip install -U yt-dlp faster-whisper

# Download audio only from a URL
yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" \
  "https://www.youtube.com/watch?v=VIDEO_ID"
```

```python
# Transcribe the downloaded audio
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3")
for s in segments:
    print(f"[{s.start:.0f}s] {s.text}")
```

Pros
Works on virtually any online video. Local processing. Highest accuracy. Unlimited use.
Cons
Setup cost. Same Whisper limitations on speaker diarization and structure.
Best for
Researchers and developers transcribing many videos from many platforms with full local control.
Method 8: ffmpeg + any transcription tool
The lowest-level option. ffmpeg extracts audio, then any free transcription tool processes the audio.
How to use
```shell
# Extract audio from video as MP3
ffmpeg -i video.mp4 -vn -acodec libmp3lame -ab 128k audio.mp3

# Or copy the original audio track without re-encoding (faster;
# the output extension must match the source codec, here AAC)
ffmpeg -i video.mp4 -vn -acodec copy audio.aac
```

Then upload audio.mp3 to mdisbetter, Otter, TurboScribe, or feed it to local Whisper.
When to use this
When the original video file is in a problematic format, when you need to chop the video first (ffmpeg can clip with -ss and -t flags), or when you want maximum control.
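Those clipping flags accept `HH:MM:SS` timestamps; a small helper to build them, with the resulting clip command (filenames are illustrative):

```python
# ffmpeg's -ss (start) and -t (duration) flags accept HH:MM:SS
# timestamps; a tiny helper to format them from seconds.
def ts(seconds):
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

start, length = 600, 300  # clip minutes 10:00-15:00
cmd = f"ffmpeg -ss {ts(start)} -i video.mp4 -t {ts(length)} -vn audio.mp3"
print(cmd)
```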
Quick comparison table
| Method | Free quota | Setup | Privacy | Output |
|---|---|---|---|---|
| mdisbetter web | Monthly cap | None | Cloud | Markdown |
| Whisper local | Unlimited | Python+GPU | Local | Plain + timestamps |
| macOS Live | Unlimited | Built-in | On-device | Live only |
| Google Live Caption | Unlimited | Chrome | On-device | Live only |
| Otter video upload | 600 min | Signup | Cloud | TXT |
| VLC + tool | Tool-dependent | VLC + tool | Cloud | Tool-dependent |
| yt-dlp + Whisper | Unlimited | CLI + GPU | Local | Plain + timestamps |
| ffmpeg + tool | Tool-dependent | ffmpeg + tool | Cloud | Tool-dependent |
Decision tree
- Want Markdown for AI tools? mdisbetter.
- Have a GPU and Python comfort? Whisper local. Best free option, period.
- Mac user, just need to read what's said in one video? macOS Live Captions or Voice Memos.
- Browser-based, English content? Google Live Caption in Chrome.
- Many short meetings? Otter free tier.
- Problematic video format? VLC or ffmpeg to extract audio first, then any tool.
- Online video from any platform? yt-dlp + Whisper.
What you typically don't get for free
The common upgrades behind paid tiers: longer per-file caps, more monthly minutes, additional languages, additional output formats, advanced editor features, team collaboration, real-time captioning, and sometimes accuracy (some vendors route free-tier audio through faster, slightly-less-accurate models).
The format limitation often hurts most. If your transcript feeds an LLM, structured Markdown beats plain text by a meaningful margin. For AI-pipeline use, mdisbetter is the only free Markdown-output option in this list. We cover the broader format question in "your YouTube videos are invisible to AI".
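If the free tool you pick only exports SRT, converting it to timestamped Markdown takes only a few lines. A minimal sketch; the regex assumes well-formed SRT cues:

```python
# Convert an SRT caption file (the format TurboScribe and YouTube export)
# into timestamped Markdown, one list item per cue.
import re

CUE = re.compile(
    r"\d+\s*\n(\d{2}):(\d{2}):(\d{2})[,.]\d{3}\s*-->\s*[\d:,.]+\s*\n(.*?)(?:\n\n|\Z)",
    re.S,
)

def srt_to_markdown(srt_text):
    lines = []
    for h, m, s, text in CUE.findall(srt_text):
        secs = int(h) * 3600 + int(m) * 60 + int(s)
        flat = " ".join(text.split())  # collapse multi-line cues
        lines.append(f"- **[{secs}s]** {flat}")
    return "\n".join(lines)

demo = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:01:05,000 --> 00:01:08,000
Let's get started.
"""
print(srt_to_markdown(demo))
```

This keeps the timestamps an LLM can cite while adding the list structure that plain text lacks.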
The honest summary
For most casual users: pick one cloud free tier and use it (mdisbetter for AI use, TurboScribe for SRT volume, Otter for meetings). For serious volume or privacy: Whisper local. For one-off live transcription on what you're watching: macOS or Google Live Captions. The mistake to avoid is paying before you have tested the free tiers — every option above gives you enough free runway to evaluate.
For the YouTube-specific transcript-download patterns, see how to download a YouTube transcript. For Vimeo specifically, see how to get a transcript from Vimeo. For Zoom recordings, see how to transcribe a Zoom meeting for free. For the audio-only equivalent of this guide, see the parallel /convert/audio-to-markdown-for-podcasters.
The honest cost of "free"
Every option in this guide costs something even when it costs no money. The cloud free tiers cost time waiting in queues and effort working around per-file caps. The local options cost setup time, GPU electricity, and the maintenance burden of keeping a Python environment working. The OS built-ins cost flexibility — they are convenient for one-off use but not designed for archival or batch workflows. The right pick depends on which of these costs is cheapest for you. For most knowledge workers transcribing 5-20 videos per month, the cloud free tiers are the lowest-friction answer. For developers or researchers transcribing 50+ hours per month, the setup cost of local Whisper amortizes quickly. The mistake is treating "free" as a single category instead of recognizing the tradeoffs.