How to Transcribe a Zoom Meeting for Free (Without Otter)
Zoom's built-in transcription is hidden behind paid plans. Otter is the de facto third-party meeting bot, and it is also paid past a generous free tier. For teams and individuals who want to transcribe their Zoom recordings without paying, the workflow is straightforward: record the meeting locally, then run the file through a free transcription tool. Here are the methods that work, with an honest comparison to the paid alternatives.
What Zoom does and doesn't include for free
Zoom's transcription features in 2026 are split across plans:
- Free / Basic plan: No live transcription. No automated post-meeting transcripts. Local recording is supported (saves an MP4 to your computer). Cloud recording is not included.
- Pro plan ($15/user/month): Cloud recording included. Live transcription / automated post-meeting transcripts depend on the specific Pro tier and add-ons.
- Business plan and up: Includes Zoom AI Companion features with meeting summaries, automated transcripts, and chat-based Q&A on past meetings.
For a free Zoom user, or anyone on a basic Pro plan without the AI Companion add-on, transcription is something you handle yourself after the meeting. The good news: this is straightforward and the result is usually better than Zoom's built-in transcript anyway.
Step 1: Record the meeting locally
On the free Zoom plan, the only recording option is local recording (the file saves to your computer's hard drive at the end of the meeting).
- Start the Zoom meeting as host.
- Click Record in the meeting toolbar → choose Record on this Computer.
- Run the meeting normally. Zoom shows a recording indicator to all participants.
- End the meeting. Zoom processes the recording — this takes 10-30% of the meeting duration (a 60-min meeting takes 6-18 min to process).
- Zoom opens a folder containing the meeting files: zoom_0.mp4 (full video), audio_only.m4a (audio track), and a chat log if there was meeting chat.
For non-host participants on the free plan, recording requires the host's permission and is granted/revoked through the meeting controls. If you are a participant and need a recording, ask the host to either record locally and share, or to grant you recording permission during the meeting.
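If you script any later step, the first job is finding the right file in the recording folder. The following is a minimal sketch, assuming the file names mentioned above (zoom_0.mp4 and audio_only.m4a); the function name `pick_recording` and the glob patterns are illustrative assumptions, and your Zoom version may name its files differently.

```python
from pathlib import Path

# Patterns are assumptions based on the file names mentioned above;
# adjust them if your Zoom client names local recordings differently.
AUDIO_PATTERNS = ("audio_only*.m4a",)
VIDEO_PATTERNS = ("zoom_*.mp4", "*.mp4")


def pick_recording(folder):
    """Return the best file to transcribe from a Zoom recording folder.

    Prefers the audio-only track (smaller file, same speech content) and
    falls back to the video file. Returns None if nothing suitable exists.
    """
    folder = Path(folder)
    for pattern in AUDIO_PATTERNS + VIDEO_PATTERNS:
        matches = sorted(folder.glob(pattern))
        if matches:
            return matches[0]
    return None
```

Calling `pick_recording` on the folder Zoom opened returns a `Path` you can hand to whichever transcription option you choose in Step 2.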
Consent
Zoom's recording indicator notifies all participants visually and audibly. Many jurisdictions require all-party consent for recording. The polite and safe practice is to announce at the start of the meeting that you are recording and ask whether anyone objects. Most people are fine with it. Respect any "please don't record this part" requests.
Step 2: Transcribe the recording
You now have the zoom_0.mp4 (or audio_only.m4a) file. There are three free options for converting it to a transcript.
Option A: mdisbetter (recommended for structured Markdown)
- Open /convert/video-to-markdown.
- Click upload, select the Zoom MP4 (or M4A — both work).
- Click Convert.
- Wait 60-180 seconds for a 60-minute meeting.
- Download the structured Markdown.
What you get: H2 sections at topic shifts, speaker labels (Zoom recordings often have multi-speaker audio that diarizes well), timestamp anchors, cleaned punctuation. The structure makes the meeting transcript skim-friendly and AI-ready out of the box.
Option B: Local Whisper (best for sensitive meetings)
For client calls under NDA, internal HR conversations, legal discussions — anything you cannot route through a cloud service:
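    # Install
    pip install -U faster-whisper

Then transcribe from Python:

    from faster_whisper import WhisperModel

    # Use audio_only.m4a if available: smaller file, same speech content.
    # device="cuda" assumes an NVIDIA GPU; on a CPU-only machine use
    # device="cpu" with compute_type="int8" instead.
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, info = model.transcribe(
        "audio_only.m4a",
        beam_size=5,
        vad_filter=True,  # skip long silent stretches
    )

    # segments is a lazy generator; transcription runs as it is consumed.
    with open("meeting.md", "w", encoding="utf-8") as f:
        for s in segments:
            f.write(f"[{s.start:.0f}s] {s.text.strip()}\n\n")

For speaker diarization (who said what), add WhisperX: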
    pip install whisperx

Then transcribe and diarize:

    import os
    import whisperx

    # Hugging Face token for the diarization models, read from the
    # environment (the variable name HF_TOKEN is an assumption).
    HF_TOKEN = os.environ["HF_TOKEN"]

    model = whisperx.load_model("large-v3", device="cuda")
    audio = whisperx.load_audio("audio_only.m4a")
    result = model.transcribe(audio, batch_size=16)

    # Diarize. Newer WhisperX releases may expose this class as
    # whisperx.diarize.DiarizationPipeline; check your installed version.
    diarize_model = whisperx.DiarizationPipeline(
        use_auth_token=HF_TOKEN, device="cuda"
    )
    diarize_segments = diarize_model(audio)
    result = whisperx.assign_word_speakers(diarize_segments, result)

What you get: Total privacy (the audio never leaves your machine), highest available accuracy, full control. You will need to write a small post-processing step to convert the WhisperX output into structured Markdown if you want H2 sections, because Whisper's native output is plain text plus timestamps.
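A post-processing step of that kind could look roughly like the sketch below. It assumes the diarized result holds a list of segment dictionaries with `start`, `text`, and (where diarization succeeded) `speaker` keys. The function names are illustrative, not part of WhisperX. It merges consecutive segments from the same speaker into one paragraph; it does not attempt topic detection, so H2 sections would still need a separate step.

```python
def format_timestamp(seconds):
    """Format a number of seconds as H:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def segments_to_markdown(segments, title="Meeting transcript"):
    """Turn diarized segments into Markdown, one paragraph per speaker turn.

    Each segment is assumed to be a dict with "start" and "text" keys and
    an optional "speaker" key (missing when diarization found no match).
    """
    turns = []  # each turn: [speaker, start_seconds, [texts]]
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue
        speaker = seg.get("speaker", "UNKNOWN")
        if turns and turns[-1][0] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1][2].append(text)
        else:
            turns.append([speaker, seg["start"], [text]])

    lines = [f"# {title}", ""]
    for speaker, start, texts in turns:
        stamp = format_timestamp(start)
        lines.append(f"**{speaker}** [{stamp}]: {' '.join(texts)}")
        lines.append("")
    return "\n".join(lines)
```

With the WhisperX snippet above, you would pass `result["segments"]` to `segments_to_markdown` and write the returned string to a `.md` file.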
Option C: Other free web tools
TurboScribe (3 files/day, 30 min each on free), Otter (600 min/month, 40-min per-file cap), VOMO, NoteGPT — all accept Zoom audio/video uploads. Each has its own free-tier limits and output format. For meeting transcription specifically, Otter's strong diarization is notable; the cap on per-file length and free monthly minutes is the limiting factor for hour-long calls. We cover all the free options at how to transcribe a video for free.
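To see how those caps play out for a given meeting length, a little arithmetic helps. This is a rough sketch using only the free-tier figures quoted above, which may change; the function names are illustrative.

```python
import math


def files_needed(duration_min, per_file_cap_min):
    """How many pieces a recording must be split into to fit a per-file cap."""
    if duration_min <= 0:
        return 0
    return math.ceil(duration_min / per_file_cap_min)


def meetings_per_month(duration_min, monthly_cap_min):
    """How many meetings of this length fit inside a monthly minute allowance."""
    return monthly_cap_min // duration_min


# Figures quoted in the text above (subject to change by each vendor).
OTTER_PER_FILE_MIN = 40
OTTER_MONTHLY_MIN = 600
TURBOSCRIBE_PER_FILE_MIN = 30
```

For a 60-minute call, `files_needed(60, OTTER_PER_FILE_MIN)` is 2 and `meetings_per_month(60, OTTER_MONTHLY_MIN)` is 10; under the 30-minute cap the same call also needs 2 pieces, which uses two of the three daily uploads.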
Step 3: Run the 5-minute extraction pass
Once you have the structured Markdown transcript, immediately do a focused 5-minute pass to pull out three things:
1. Decisions

    ## Decisions
    - **Punt feature Y to Q4** so engineering can focus on X. (Owner: Tom)
    - **Hold mid-tier pricing at $49** despite Priya's concern; revisit in 60 days based on conversion data. (Owner: Sarah)

2. Action items

    ## Action items
    - [ ] Tom: updated Q3 timeline by Friday
    - [ ] Priya: pull conversion data on mid-tier for next planning
    - [ ] Sarah: send out the Q3 commitment to the team Monday

3. Open questions

    ## Open questions
    - Whether to communicate the Y→Q4 punt to the customer who asked
    - Long-term direction on the mid-tier (raise, hold, restructure)

Total time: 4-7 minutes on a 47-min meeting transcript. Output: a permanent, scannable, queryable record of the meeting that anyone on the team can read in 60-90 seconds. The full pattern is in nobody rewatches meeting recordings.
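Because the action items follow a fixed checkbox format, they are easy to pull out programmatically for a task system. The sketch below assumes the `- [ ] Owner: task` convention shown in the example; the function name and the returned dictionary shape are illustrative choices.

```python
import re

# Matches lines such as "- [ ] Tom: updated Q3 timeline by Friday".
# A checked box ("[x]") is treated as a completed item.
ACTION_RE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<owner>[^:]+): (?P<task>.+)$")


def parse_action_items(markdown):
    """Extract owner/task pairs from Markdown checkbox lines."""
    items = []
    for line in markdown.splitlines():
        match = ACTION_RE.match(line.strip())
        if match:
            items.append(
                {
                    "owner": match.group("owner").strip(),
                    "task": match.group("task").strip(),
                    "done": match.group("mark").lower() == "x",
                }
            )
    return items
```

Lines that do not follow the convention are skipped, so headings, decisions, and open questions in the same notes file are left alone.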
Comparing free workarounds with Otter
The honest tradeoff analysis:
| Aspect | Self-record + mdisbetter | Self-record + local Whisper | Otter live bot (free tier) |
|---|---|---|---|
| Cost | Free (cap on minutes) | Free (unlimited) | Free up to 600 min/mo |
| Setup | None | Python + GPU | Otter signup + calendar integration |
| In-meeting friction | None (you record) | None (you record) | Bot joins call (consent dynamics) |
| Per-file cap | Generous | Unlimited | 40 min on free tier (brutal) |
| Output structure | Markdown + speakers + sections | Plain text + timestamps | Plain text + speakers |
| Privacy | Cloud (mdisbetter) | Local only | Cloud (Otter) |
| Live captions during meeting | No | No | Yes |
Otter is the right pick if you specifically want live captions during the meeting and you can live with the 40-min per-file cap on the free tier. The self-record-and-transcribe-after workflow is the right pick for everything else: better structure, no 40-minute per-file cap, no bot consent dynamics, and full control over the recording.
What about Zoom AI Companion?
For Zoom Business plans and up, AI Companion provides automated meeting summaries, post-meeting transcripts, and a chat interface for asking questions about past meetings. The summaries are reasonable; the underlying transcripts are equivalent to what you would get from any modern ASR. The advantage: zero workflow — it just happens.
The disadvantages: paid (factors into the per-user Zoom subscription cost), cloud-locked (transcripts live in Zoom's system, not in your knowledge base), and the structured Markdown output most teams want for AI-pipeline integration is not a native format. For teams already paying for Zoom Business, AI Companion is convenient. For everyone else, the self-record-and-convert workflow is honestly competitive.
For recurring meetings: automate the workflow
If you have weekly standups, monthly all-hands, or other recurring meetings you record consistently, build the transcribe-and-extract workflow into a routine:
- Record meeting (Zoom local recording).
- Drag the resulting MP4 into a designated folder on your machine.
- Run a script that uploads to mdisbetter (or runs local Whisper) and saves the Markdown to your team's meeting notes folder.
- Open the Markdown, do the 5-minute extraction pass, push action items to the task system.
The total cycle time is under 15 minutes per meeting after the meeting itself. For a team of 20 that records 5-10 meetings per week in total, the cumulative cost is at most roughly 1 to 2.5 hours per week, and the cumulative benefit is a permanent, searchable archive of every meeting your team has had. We cover the search-across-many-recordings pattern at you can't search inside videos.
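The script in that routine can be quite small. Here is a minimal sketch, assuming a designated inbox folder and a notes folder; `process_inbox` and its `transcribe` parameter are hypothetical names. The transcription backend is passed in as a function (for example, a wrapper around the local Whisper code above), so the sketch makes no assumptions about any upload API.

```python
from pathlib import Path

MEDIA_SUFFIXES = {".mp4", ".m4a"}


def process_inbox(inbox, notes_dir, transcribe):
    """Transcribe every new recording in `inbox` into `notes_dir`.

    `transcribe` is any callable that takes a Path to a media file and
    returns Markdown text. Recordings that already have a matching .md
    file are skipped, so the script is safe to run repeatedly.
    Returns the list of Markdown files written on this run.
    """
    inbox, notes_dir = Path(inbox), Path(notes_dir)
    notes_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for media in sorted(inbox.iterdir()):
        if media.suffix.lower() not in MEDIA_SUFFIXES:
            continue
        target = notes_dir / f"{media.stem}.md"
        if target.exists():
            continue  # already transcribed on an earlier run
        target.write_text(transcribe(media), encoding="utf-8")
        written.append(target)
    return written
```

Since the output name comes from the recording's file name, rename each recording to something unique (a date plus meeting name, say) before dropping it into the inbox; otherwise two files both called zoom_0.mp4 would map to the same notes file.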
Privacy escalation: what to do for sensitive meetings
Three tiers of sensitivity, three appropriate transcription approaches:
- Routine internal team meetings: mdisbetter cloud workflow. Fast, structured, good enough.
- Customer or client calls: local Whisper. The customer's name, financial details, or strategic discussion does not leave your machine.
- Strictly confidential (legal, HR, M&A): local Whisper on an air-gapped or VPN-secured machine. Treat the recording itself as the sensitive artifact and apply the same controls you would for any other confidential file.
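If the transcription step is scripted, the three tiers can be encoded as a small policy table so that a sensitive recording is never uploaded by accident. This is an illustrative sketch only; the enum, the policy keys, and the backend labels are assumptions, not part of any tool.

```python
from enum import Enum


class Sensitivity(Enum):
    ROUTINE = "routine"            # internal team meetings
    CLIENT = "client"              # customer or client calls
    CONFIDENTIAL = "confidential"  # legal, HR, M&A


# Hypothetical policy table mirroring the three tiers above.
POLICY = {
    Sensitivity.ROUTINE: {"backend": "cloud", "restricted_machine": False},
    Sensitivity.CLIENT: {"backend": "local-whisper", "restricted_machine": False},
    Sensitivity.CONFIDENTIAL: {"backend": "local-whisper", "restricted_machine": True},
}


def may_upload(sensitivity):
    """Return True only when the tier allows a cloud transcription service."""
    return POLICY[sensitivity]["backend"] == "cloud"
```

A batch script would call `may_upload` before any network step and fall back to local Whisper whenever it returns False.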
The bigger picture
The pattern across all these methods: stop trying to do transcription "during" the meeting. Record the meeting (you would anyway), then convert to structured Markdown after. The post-meeting workflow is faster, more accurate, more flexible on privacy, and produces better artifacts than any in-meeting bot or live transcription feature. For the meeting-culture implications of treating recordings as searchable artifacts, see why your meeting notes are always incomplete.
Compatibility notes for non-Zoom meeting platforms
The same workflow applies almost identically to recordings from other platforms:
- Microsoft Teams: save the meeting recording from the Stream/SharePoint location, upload the MP4 to mdisbetter.
- Google Meet: recordings save to Google Drive; download and upload.
- Webex: recordings download as MP4; same workflow.
- Slack Huddles / Discord stage: if you screen-recorded the conversation locally, the resulting file converts identically.
The video-to-markdown pipeline does not care which platform produced the recording — it processes the audio track from any video or audio file. For teams using a mix of meeting platforms, this is what makes the workflow durable: one transcription pipeline, every meeting source.
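For the local route, one way to get the same platform independence is to normalize every recording to a plain audio file before transcription. The sketch below assumes ffmpeg is installed and on your PATH, which the workflow above does not otherwise require; the function names are illustrative. It converts any input to 16 kHz mono WAV, the format Whisper resamples audio to internally.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(source, target=None):
    """Build an ffmpeg command that extracts the audio track as 16 kHz mono WAV."""
    source = Path(source)
    target = Path(target) if target else source.with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it exists
        "-i", str(source),
        "-vn",            # drop the video stream
        "-ac", "1",       # downmix to mono
        "-ar", "16000",   # resample to 16 kHz
        str(target),
    ]


def extract_audio(source, target=None):
    """Run ffmpeg and return the path of the extracted audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_extract_command(source, target)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Whether the file came from Teams, Meet, Webex, or a screen recorder, `extract_audio` produces the same kind of input for the transcription step.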