May 10, 2026 · 10 min read · MDisBetter

How to Transcribe a Zoom Meeting for Free (Without Otter)

Zoom's built-in transcription is hidden behind paid plans. Otter is the de facto third-party meeting bot, also paid past a generous free tier. For teams and individuals who want to transcribe their Zoom recordings without paying, the workflow is straightforward: download the recording, then run it through a free transcription tool. Here are the methods that work, with an honest comparison to the paid alternatives.

What Zoom does and doesn't include for free

Zoom's transcription features in 2026 are split across plans:

Free / Basic plan: No live transcription. No automated post-meeting transcripts. Local recording is supported (saves an MP4 to your computer). Cloud recording is not included.
Pro plan ($15/user/month): Cloud recording included. Live transcription / automated post-meeting transcripts depend on the specific Pro tier and add-ons.
Business plan and up: Includes Zoom AI Companion features with meeting summaries, automated transcripts, and chat-based Q&A on past meetings.

For a free Zoom user, or anyone on a basic Pro plan without the AI Companion add-on, transcription is something you handle yourself after the meeting. The good news: this is straightforward and the result is usually better than Zoom's built-in transcript anyway.

Step 1: Record the meeting locally

On the free Zoom plan, the only recording option is local recording (the file saves to your computer's hard drive at the end of the meeting).

Start the Zoom meeting as host.
Click Record in the meeting toolbar → choose Record on this Computer.
Run the meeting normally. Zoom shows a recording indicator to all participants.
End the meeting. Zoom then processes the recording, which typically takes 10-30% of the meeting duration (a 60-min meeting takes 6-18 min to process).
Zoom opens a folder containing the meeting files: zoom_0.mp4 (full video), audio_only.m4a (audio track), and a chat log if there was meeting chat.

For non-host participants on the free plan, recording requires the host's permission and is granted/revoked through the meeting controls. If you are a participant and need a recording, ask the host to either record locally and share, or to grant you recording permission during the meeting.
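
Once the folder is open, the next step needs one file out of it. A minimal sketch of that choice, assuming the file layout described above (an audio-only .m4a next to the full .mp4); the function name `pick_recording` is ours and purely illustrative:

```python
from pathlib import Path
from typing import Optional


def pick_recording(folder: Path) -> Optional[Path]:
    """Return the file to transcribe from a Zoom recording folder.

    Prefers an audio-only .m4a (smaller upload, same speech content)
    and falls back to the first .mp4 if no audio-only file exists.
    """
    audio_files = sorted(folder.glob("*.m4a"))
    if audio_files:
        return audio_files[0]
    video_files = sorted(folder.glob("*.mp4"))
    return video_files[0] if video_files else None
```

Point it at the folder Zoom opened after the meeting; it returns `None` when the folder holds no recording yet, for example while Zoom is still processing.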

Consent

Zoom's recording indicator notifies all participants visually and audibly. Many jurisdictions require all-party consent for recording. The polite and safe practice: announce at the start of the meeting that you are recording and ask if anyone objects. Most people are fine with it. Respect any "please don't record this part" requests.

Step 2: Transcribe the recording

You now have the zoom_0.mp4 (or audio_only.m4a) file. Here are three free options for converting it to a transcript.

Option A: mdisbetter (recommended for structured Markdown)

Open /convert/video-to-markdown.
Click upload, select the Zoom MP4 (or M4A — both work).
Click Convert.
Wait 60-180 seconds for a 60-minute meeting.
Download the structured Markdown.

What you get: H2 sections at topic shifts, speaker labels (Zoom recordings often have multi-speaker audio that diarizes well), timestamp anchors, cleaned punctuation. The structure makes the meeting transcript skim-friendly and AI-ready out of the box.

Option B: Local Whisper (best for sensitive meetings)

For client calls under NDA, internal HR conversations, legal discussions — anything you cannot route through a cloud service:

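# Install (run in a shell)
pip install -U faster-whisper

# Transcribe (run in Python)
from faster_whisper import WhisperModel

# Use the audio_only.m4a if available: smaller file, same content.
# device="cuda" needs an NVIDIA GPU; on CPU, try device="cpu", compute_type="int8".
model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio_only.m4a",
    beam_size=5,
    vad_filter=True,  # skip stretches with no speech
)

# segments is a lazy generator: transcription runs as the loop consumes it
with open("meeting.md", "w", encoding="utf-8") as f:
    for s in segments:
        f.write(f"[{s.start:.0f}s] {s.text.strip()}\n\n")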
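
The `[{s.start:.0f}s]` prefix in the loop above writes raw seconds, which gets hard to read past the first few minutes of a meeting. A small sketch of a clock-style alternative; `format_timestamp` is a hypothetical helper of ours, not part of faster-whisper:

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

To use it, write `f"[{format_timestamp(s.start)}] {s.text.strip()}\n\n"` in place of the original f-string.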

For speaker diarization (who said what), add WhisperX:

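# Install (run in a shell)
pip install whisperx

# Transcribe and diarize (run in Python)
import os
import whisperx

# Hugging Face access token for the pyannote diarization models, read here
# from an environment variable called HF_TOKEN (the variable name is our choice)
HF_TOKEN = os.environ["HF_TOKEN"]

model = whisperx.load_model("large-v3", device="cuda")
audio = whisperx.load_audio("audio_only.m4a")
result = model.transcribe(audio, batch_size=16)

# Diarize
# (depending on your WhisperX version, this class may instead be
# exposed as whisperx.diarize.DiarizationPipeline)
diarize_model = whisperx.DiarizationPipeline(
    use_auth_token=HF_TOKEN, device="cuda"
)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)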

What you get: Total privacy (nothing leaves your machine), highest available accuracy, full control. You will need to write a small post-processing step to convert the WhisperX output into structured Markdown if you want H2 sections — Whisper's native output is plain text + timestamps.
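
As a rough illustration of that post-processing step, the sketch below merges consecutive segments from the same speaker into one Markdown paragraph per turn. It assumes each segment is a dict with `start`, `text`, and (usually) `speaker` keys, which is the shape we expect in `result["segments"]` after speaker assignment; check it against your WhisperX version. It does not detect topic shifts, so it produces no H2 sections, and the function name is ours:

```python
from typing import Dict, List


def segments_to_markdown(segments: List[Dict], title: str = "Meeting transcript") -> str:
    """Turn diarized segments into Markdown, one paragraph per speaker turn."""
    turns = []  # each turn is [speaker, start_seconds, list_of_texts]
    for seg in segments:
        # Segments without an assigned speaker are labeled UNKNOWN
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if turns and turns[-1][0] == speaker:
            turns[-1][2].append(text)  # same speaker keeps talking
        else:
            turns.append([speaker, seg["start"], [text]])

    lines = [f"# {title}", ""]
    for speaker, start, texts in turns:
        minutes, secs = divmod(int(start), 60)
        lines.append(f"**{speaker}** [{minutes:02d}:{secs:02d}] {' '.join(texts)}")
        lines.append("")
    return "\n".join(lines)
```

Called as `segments_to_markdown(result["segments"])`, it returns a string you can write to `meeting.md`. Renaming `SPEAKER_00`-style labels to real names is a manual find-and-replace afterwards.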

Option C: Other free web tools

TurboScribe (3 files/day, 30 min each on free), Otter (600 min/month, 40-min per-file cap), VOMO, NoteGPT — all accept Zoom audio/video uploads. Each has its own free-tier limits and output format. For meeting transcription specifically, Otter's strong diarization is notable; the cap on per-file length and free monthly minutes is the limiting factor for hour-long calls. We cover all the free options at how to transcribe a video for free.
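
To make the per-file and monthly limits concrete, here is a small sketch that checks whether a meeting of a given length fits a free tier. The numbers are the ones quoted above; the `FreeTier` class and `fits` function are illustrative names of ours, and TurboScribe's 3-files-per-day limit is deliberately not modeled:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeTier:
    name: str
    per_file_cap_min: Optional[int]  # None means no per-file cap is modeled
    monthly_min: Optional[int]       # None means no monthly allowance is modeled


# Limits as quoted in this article; verify against each tool's current pricing page
OTTER = FreeTier("Otter", per_file_cap_min=40, monthly_min=600)
TURBOSCRIBE = FreeTier("TurboScribe", per_file_cap_min=30, monthly_min=None)


def fits(tier: FreeTier, meeting_min: int, used_this_month: int = 0) -> bool:
    """Return True if a meeting of meeting_min minutes fits the tier's limits."""
    if tier.per_file_cap_min is not None and meeting_min > tier.per_file_cap_min:
        return False
    if tier.monthly_min is not None and used_this_month + meeting_min > tier.monthly_min:
        return False
    return True
```

With these figures, `fits(OTTER, 60)` is false because of the 40-minute cap, which is the hour-long-call problem described above.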

Step 3: Run the 5-minute extraction pass

Once you have the structured Markdown transcript, immediately do a focused 5-minute pass to pull out three things:

1. Decisions

## Decisions
- **Punt feature Y to Q4** so engineering can focus on X. (Owner: Tom)
- **Hold mid-tier pricing at $49** despite Priya's concern; revisit in 60 days based on conversion data. (Owner: Sarah)

2. Action items

## Action items
- [ ] Tom: updated Q3 timeline by Friday
- [ ] Priya: pull conversion data on mid-tier for next planning
- [ ] Sarah: send out the Q3 commitment to the team Monday

3. Open questions

## Open questions
- Whether to communicate the Y→Q4 punt to the customer who asked
- Long-term direction on the mid-tier (raise, hold, restructure)

Total time: 4-7 minutes on a 47-min meeting transcript. Output: a permanent, scannable, queryable record of the meeting that anyone on the team can read in 60-90 seconds. The full pattern is in nobody rewatches meeting recordings.
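
If you keep the action items in the checkbox format shown above, they are easy to pull back out of the notes file later. A minimal sketch, assuming every item follows the `- [ ] Owner: task` pattern; the function name and the tuple layout are our own choices:

```python
import re
from typing import List, Tuple

# Matches "- [ ] Owner: task" and "- [x] Owner: task"
ACTION_RE = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<owner>[^:]+): (?P<task>.+)$")


def parse_action_items(markdown: str) -> List[Tuple[str, str, bool]]:
    """Return (owner, task, done) for each checkbox line in the Markdown."""
    items = []
    for line in markdown.splitlines():
        match = ACTION_RE.match(line.strip())
        if match:
            done = match.group("mark") != " "
            items.append((match.group("owner").strip(), match.group("task").strip(), done))
    return items
```

Lines in the Decisions and Open questions sections are ignored because they have no checkbox, so the function can be run over the whole notes file.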

Comparing free workarounds with Otter

The honest tradeoff analysis:

Aspect	Self-record + mdisbetter	Self-record + local Whisper	Otter live bot (free tier)
Cost	Free (cap on minutes)	Free (unlimited)	Free up to 600 min/mo
Setup	None	Python + GPU	Otter signup + calendar integration
In-meeting friction	None (you record)	None (you record)	Bot joins call (consent dynamics)
Per-file cap	Generous	Unlimited	40 min on free tier (brutal)
Output structure	Markdown + speakers + sections	Plain text + timestamps	Plain text + speakers
Privacy	Cloud (mdisbetter)	Local only	Cloud (Otter)
Live captions during meeting	No	No	Yes

Otter is the right pick if you specifically want live captions during the meeting and can live with the 40-min per-file cap on the free tier. The self-record-and-transcribe-after workflow is the right pick for everything else: better structure, a generous per-file cap (or none at all with local Whisper), no bot consent dynamics, and full control over the recording.

What about Zoom AI Companion?

For Zoom Business plans and up, AI Companion provides automated meeting summaries, post-meeting transcripts, and a chat interface for asking questions about past meetings. The summaries are reasonable; the underlying transcripts are equivalent to what you would get from any modern ASR. The advantage: zero workflow — it just happens.

The disadvantages: paid (factors into the per-user Zoom subscription cost), cloud-locked (transcripts live in Zoom's system, not in your knowledge base), and the structured Markdown output most teams want for AI-pipeline integration is not a native format. For teams already paying for Zoom Business, AI Companion is convenient. For everyone else, the self-record-and-convert workflow is honestly competitive.

For recurring meetings: automate the workflow

If you have weekly standups, monthly all-hands, or other recurring meetings you record consistently, build the transcribe-and-extract workflow into a routine:

Record meeting (Zoom local recording).
Drag the resulting MP4 into a designated folder on your machine.
Run a script that uploads to mdisbetter (or runs local Whisper) and saves the Markdown to your team's meeting notes folder.
Open the Markdown, do the 5-minute extraction pass, push action items to the task system.

The total cycle time is under 15 minutes per meeting after the meeting itself. Across a team of 20 with 5-10 meetings per week, the cumulative cost is a couple of hours per week, and the cumulative benefit is a permanent, searchable archive of every meeting your team has had. We cover the search-across-many-recordings pattern at you can't search inside videos.
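
The "run a script" step can be as small as the sketch below. It scans a drop folder, skips anything that already has a matching Markdown file, and hands each new recording to a `transcribe` function that you supply. The folder layout, the function names, and the `transcribe` callback are all assumptions of ours; we do not show an upload call here because the right one depends on the tool you use:

```python
from pathlib import Path
from typing import Callable, List

MEDIA_SUFFIXES = {".mp4", ".m4a"}


def process_inbox(inbox: Path, notes: Path, transcribe: Callable[[Path], str]) -> List[Path]:
    """Transcribe every new recording in inbox and save Markdown into notes.

    transcribe takes a media path and returns the transcript as Markdown text.
    Returns the list of Markdown files written on this run.
    """
    notes.mkdir(parents=True, exist_ok=True)
    written = []
    for media in sorted(inbox.iterdir()):
        if media.suffix.lower() not in MEDIA_SUFFIXES:
            continue  # not a recording
        target = notes / f"{media.stem}.md"
        if target.exists():
            continue  # already transcribed on an earlier run
        target.write_text(transcribe(media), encoding="utf-8")
        written.append(target)
    return written
```

For the local route, `transcribe` can wrap the faster-whisper snippet from Option B. Because finished files are skipped, the script is safe to run on a schedule or by hand after each meeting.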

Privacy escalation: what to do for sensitive meetings

Three tiers of sensitivity, three appropriate transcription approaches:

Routine internal team meetings: mdisbetter cloud workflow. Fast, structured, good enough.
Customer or client calls: local Whisper. The customer's name, financial details, or strategic discussion does not leave your machine.
Strictly confidential (legal, HR, M&A): local Whisper on an air-gapped or VPN-secured machine. Treat the recording itself as the sensitive artifact and apply the same controls you would for any other confidential file.

The bigger picture

The pattern across all these methods: stop trying to do transcription "during" the meeting. Record the meeting (you would anyway), then convert to structured Markdown after. The post-meeting workflow is faster, more accurate, more flexible on privacy, and produces better artifacts than any in-meeting bot or live transcription feature. For the meeting-culture implications of treating recordings as searchable artifacts, see why your meeting notes are always incomplete.

Compatibility notes for non-Zoom meeting platforms

The same workflow applies almost identically to recordings from other platforms:

Microsoft Teams: save the meeting recording from the Stream/SharePoint location, upload the MP4 to mdisbetter.
Google Meet: recordings save to Google Drive; download and upload.
Webex: recordings download as MP4; same workflow.
Slack Huddles / Discord stage: if you screen-recorded the conversation locally, the resulting file converts identically.

The video-to-markdown pipeline does not care which platform produced the recording — it processes the audio track from any video or audio file. For teams using a mix of meeting platforms, this is what makes the workflow durable: one transcription pipeline, every meeting source.
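
If you want to normalize recordings from different platforms yourself before transcribing, extracting the audio track with ffmpeg is one option. The sketch below only builds the command; it assumes ffmpeg is installed and on your PATH, and the function name is ours. The 16 kHz mono target matches what Whisper-family models resample to internally, so nothing useful is lost:

```python
from pathlib import Path
from typing import List


def ffmpeg_extract_audio_cmd(source: Path, target: Path) -> List[str]:
    """Build an ffmpeg command that extracts a mono 16 kHz audio track."""
    return [
        "ffmpeg",
        "-y",             # overwrite the target file if it already exists
        "-i", str(source),
        "-vn",            # drop the video stream
        "-ac", "1",       # downmix to one channel
        "-ar", "16000",   # resample to 16 kHz
        str(target),
    ]
```

Run it with `subprocess.run(ffmpeg_extract_audio_cmd(Path("teams_call.mp4"), Path("teams_call.wav")), check=True)`; the file names here are placeholders.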

Frequently asked questions

Can I transcribe a Zoom recording I joined as a participant, not as the host?

Only if the host shared the recording with you (cloud recording link, downloaded MP4 they sent you, etc.) or if the host gave you recording permission during the meeting and you recorded locally yourself. Without one of those, you do not have the recording to transcribe. The host's recording is the canonical source.

How do I handle a meeting where multiple people talked over each other?

Modern ASR with diarization (the engines mdisbetter and WhisperX run) handles overlapping speech reasonably well: speakers are still separated, but accuracy on the overlapped moments drops to 75-85% from the typical 95%+. The structured Markdown will flag those moments with multiple speaker labels in close succession. For meetings with constant cross-talk (heated debates, large group discussions), the transcript will have rougher patches there; everywhere else it will be clean.
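
If you run the local WhisperX route, you can locate those rough patches yourself by looking for clusters of speaker changes. This is a heuristic sketch of ours, not something either tool does for you: it assumes segments are dicts with `start` and `speaker` keys, and the `window` and `min_changes` thresholds are arbitrary starting points to tune:

```python
from typing import Dict, List, Tuple


def find_crosstalk(
    segments: List[Dict], window: float = 5.0, min_changes: int = 3
) -> List[Tuple[float, float]]:
    """Return (start, end) spans where speakers change rapidly.

    A span is flagged when at least min_changes speaker changes fall
    within window seconds; overlapping spans are merged.
    """
    # Start times of every segment whose speaker differs from the previous one
    changes = [
        cur["start"]
        for prev, cur in zip(segments, segments[1:])
        if cur.get("speaker") != prev.get("speaker")
    ]
    spans: List[Tuple[float, float]] = []
    for i in range(len(changes) - min_changes + 1):
        start, end = changes[i], changes[i + min_changes - 1]
        if end - start <= window:
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start, end))
    return spans
```

The returned spans are the stretches worth re-listening to before you trust the transcript.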

What about transcribing a Zoom meeting in real time so the transcript is ready when the meeting ends?

For real-time transcription you need a live tool that joins the call (Otter Pilot, Fireflies, Zoom AI Companion). The post-meeting transcribe-from-recording workflow we describe here adds 1-3 minutes after the meeting ends instead of being instant. For most use cases the small wait is worth the better structure, accuracy, and lack of in-meeting bot friction. For use cases where instant transcript matters (live notes for absent stakeholders, accessibility), use a live tool.