May 10, 2026 · 9 min read · MDisBetter

How to Get Structured Meeting Notes from Any Recording

Native meeting bots (Otter, Fireflies, etc.) work well when they fit your stack. They don't always fit: bots can't join in-person meetings, customers often won't consent, and the action items end up locked in a vendor's dashboard rather than your task system. Here's the no-bot workflow that produces equivalent (often better) structured Markdown notes from any recording, on any platform, with no third party joining your call.

The end-to-end workflow

Five steps total:

Record the meeting on whatever you already use.
Save the audio file to your machine.
Upload to audio-to-markdown.
Download the structured Markdown.
Optionally feed to AI to extract action items.

Total time per meeting: 5-15 minutes after the call ends, depending on length and how much cleanup you want.

Step 1: Record

Use whatever recording method fits the meeting type:

Zoom / Google Meet / Teams calls: start recording when the meeting begins. Zoom's local recording writes an MP4 (with audio track) or M4A audio file to a designated folder when the meeting ends; Google Meet and Teams save the recording to Drive and OneDrive respectively.
In-person meetings: phone voice memo app on the table, or a dedicated recorder if audio quality matters more (Zoom H1, Sony portable, etc.).
Phone calls: depends on platform and jurisdiction. iPhone has limited native call recording; Android varies by maker; some VoIP platforms have built-in call recording.
Hybrid (some in-person, some remote): the platform's recording captures everything if all attendees join via the platform — including in-person attendees on the conference room speaker. If in-person attendees are off-mic, supplement with a phone recording in the room.

Consent matters. Recording laws vary by jurisdiction, and many jurisdictions require all-party consent. As a default: announce that you're recording at the start of every recorded meeting, document the consent in the meeting notes, and respect any attendee's request not to record. The workflow doesn't change ethics or law; it just removes the bot-joining friction that often makes the consent conversation harder.

Step 2: Save the audio

Get the audio file onto your machine:

Zoom local recording: ~/Documents/Zoom/[meeting-name]/audio_only.m4a
Google Meet: download the MP4 from Drive (audio gets extracted in step 3 automatically)
Teams: likewise, download the MP4 from OneDrive
Phone voice memo: AirDrop to Mac, USB cable to PC, or sync via cloud
Dedicated recorder: copy via SD card

If you have a video file but only need the audio, the converter handles MP4/MOV directly — no manual extraction required. If you want to extract for any reason: ffmpeg -i video.mp4 -vn -acodec copy audio.m4a.
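If you script the extraction, a thin Python wrapper around the same ffmpeg call might look like the sketch below. The function names and the default output name are illustrative assumptions. It also assumes ffmpeg is installed and that the video's audio track uses a codec an .m4a container accepts (typically AAC), because -acodec copy copies the stream without re-encoding.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_extract_command(video: Path, audio: Optional[Path] = None) -> List[str]:
    """Build the ffmpeg call that copies the audio stream out of a video file.

    -vn drops the video stream; -acodec copy keeps the audio bytes as-is.
    """
    if audio is None:
        # Assumption: write the audio next to the video, same name, .m4a suffix.
        audio = video.with_suffix(".m4a")
    return ["ffmpeg", "-i", str(video), "-vn", "-acodec", "copy", str(audio)]


def extract_audio(video: Path, audio: Optional[Path] = None) -> None:
    """Run the extraction; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_command(video, audio), check=True)
```

Calling extract_audio(Path("video.mp4")) would run the same command shown above and write video.m4a beside the source file.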

Step 3: Upload to audio-to-markdown

Open /convert/audio-to-markdown. Drag the file onto the upload zone. Click Convert. Wait 1-3 minutes for a typical 60-minute meeting.

Step 4: Download the structured Markdown

The output has three things that distinguish it from plain-text transcription:

Speaker labels at every turn ("Speaker 1", "Speaker 2", etc.)
H2 section headings at natural topic shifts
Optional timestamps for cross-referencing back to the audio

Download the .md file. Open it in your editor of choice (VS Code, Obsidian, Typora, plain text editor — any will do).

Cleanup pass (3-5 minutes)

Before saving to your team's notes system:

Find-and-replace speaker labels with actual names. **Speaker 1:** → **Sarah (PM):**, etc.
Skim the H2 headings. The AI's section breaks are usually right but occasionally need adjusting (a heading that bundles two topics, or a heading that splits one).
Quick scan for obvious mistranscriptions of proper nouns — company names, product names, technical terms specific to your domain. Fix them.
Add a short YAML frontmatter block at the top with metadata: date, type, project, attendees, duration, and the source audio file.

The frontmatter template:

---
date: 2026-05-10
type: meeting
project: q3-pricing
attendees: ["Sarah (PM)", "Marcus (Eng)", "Dana (Design)"]
duration_minutes: 47
audio_source: zoom-2026-05-10-1400.m4a
---

This makes the notes filterable and queryable in tools like Obsidian (with the Dataview plugin) or any custom search.
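The cleanup pass can be partly scripted. The sketch below assumes the transcript uses bold labels of the form **Speaker N:** (as in the find-and-replace example above) and that frontmatter values are plain strings, numbers, or lists of strings. The function names are hypothetical, not part of any tool.

```python
import json
import re


def rename_speakers(markdown: str, names: dict) -> str:
    """Replace labels such as **Speaker 1:** with real names in one pass.

    Labels missing from `names` are left untouched, so unknown speakers
    stay visible for manual review.
    """
    def swap(match):
        label = match.group(1)
        return "**{}:**".format(names.get(label, label))

    return re.sub(r"\*\*(Speaker \d+):\*\*", swap, markdown)


def add_frontmatter(markdown: str, meta: dict) -> str:
    """Prepend a YAML frontmatter block built from `meta`."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            # A JSON array of strings is also a valid YAML flow sequence.
            value = json.dumps(value, ensure_ascii=False)
        lines.append("{}: {}".format(key, value))
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown
```

Doing the rename in a single regex pass avoids chained replacements, where a name inserted by one substitution is accidentally matched by the next.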

Step 5: Optional action item extraction

The structured Markdown is a great input for an LLM extraction pass. Open the transcript in Claude or ChatGPT and run a structured prompt:

Read this meeting transcript carefully.

Extract every action item. An action item is:
- An explicit commitment (someone said they will do something), OR
- An assignment (someone was clearly asked and didn't decline)

Output as a Markdown table with columns:
- Owner (the person committing or assigned)
- Action (what they will do)
- Deadline (date or timeframe mentioned, or "unspecified")
- Conditions (any prerequisites, or "none")
- Source quote (verbatim, 1-2 sentences from the transcript)

Do not invent action items. Only include items grounded in clear transcript evidence.

The output is a clean table you can paste directly into Linear, Asana, Notion, or whatever task system you use. The verbatim source quote becomes the task description, so the assignee can verify exactly what was committed. We cover the action item extraction pattern in depth in "losing meeting action items".
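If you would rather push items into a task system programmatically than paste them, the table first has to become structured data. Here is a minimal sketch, assuming the LLM returns a standard pipe-delimited Markdown table with the column names from the prompt. The function name and the sample row are hypothetical, and cells that contain a literal | character would need extra handling.

```python
def parse_action_items(table_md: str) -> list:
    """Turn a pipe-delimited Markdown table into a list of dicts.

    Keys are the lower-cased header cells, e.g. "owner", "source quote".
    """
    header = None
    rows = []
    for raw in table_md.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip prose the model may have added around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = [cell.lower() for cell in cells]
            continue
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # the |---|---| separator row
        rows.append(dict(zip(header, cells)))
    return rows


example = """| Owner | Action | Deadline | Conditions | Source quote |
|---|---|---|---|---|
| Marcus | Draft pricing page copy | Friday | none | "I'll have the copy drafted by Friday." |
"""
items = parse_action_items(example)
```

Each resulting dict maps naturally onto a task: owner to assignee, action to title, and the source quote to the description.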

Comparison with native meeting bots

Native bots (Otter, Fireflies, Read, etc.) automate the recording-and-transcription steps. The tradeoffs:

Aspect	Bot	Manual + audio-to-markdown
Setup	Authorize per platform, calendar integration	Zero setup; works on any audio
Per-meeting friction	Bot joins automatically	You record + upload (5 min after)
Attendee consent	Bot's presence is visible; some attendees decline	You announce recording; same legal need
Output location	Vendor dashboard	Your file system / vault
Per-user cost	$10-30/month/user	Web tool, no per-seat fee
In-person meetings	Doesn't work	Phone recording + same workflow
Action items	Auto-extracted, vendor-locked	LLM extraction, in your tools

Bots win on per-meeting friction. The manual workflow wins on cost, control, and applicability to non-platform meetings. For teams that hold a large share of their meetings in person, or with external customers who won't consent to a bot, the manual path is the only one of the two that works.

Slides shared during the meeting

The audio captures the spoken discussion, but slides usually contain the structured artifacts of the meeting (numbers, named entities, decisions on screen). For complete records, run any shared decks through pdf-to-markdown as a separate step and concatenate both Markdown files. The combined document is the full meeting record — spoken and visual — in one searchable file.

Building a meeting library

Per-meeting notes are useful. The compounding value comes from a structured library:

meetings/
  2026/
    05/
      2026-05-08-pricing-review.md
      2026-05-08-pricing-review.m4a
      2026-05-09-customer-acme.md
      2026-05-09-customer-acme.m4a
  2025/
    ...

Naming convention: YYYY-MM-DD-topic.ext. The audio file lives next to the Markdown for verification when needed. Frontmatter on each Markdown file enables Dataview queries ("all customer meetings in Q2 mentioning pricing") that no calendar app can answer.
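Outside Obsidian, the naming convention alone is enough for simple queries. The sketch below is one possible plain-Python stand-in for a Dataview query, not how Dataview works: it reads the date from the YYYY-MM-DD prefix of each filename and does a case-insensitive keyword match on the note text. The function name and parameters are illustrative.

```python
import re
from pathlib import Path

# Matches the YYYY-MM-DD-topic naming convention (without extension).
NAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)$")


def find_meetings(root, year: int, quarter: int, keyword: str) -> list:
    """Return notes from the given quarter whose text mentions `keyword`."""
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        match = NAME_RE.match(path.stem)
        if not match:
            continue  # ignore files that don't follow the convention
        file_year, month = int(match.group(1)), int(match.group(2))
        file_quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        if file_year != year or file_quarter != quarter:
            continue
        if keyword.lower() in path.read_text(encoding="utf-8").lower():
            hits.append(path)
    return hits
```

For example, find_meetings("meetings", 2026, 2, "pricing") would list every Q2 2026 note that mentions pricing. Filtering on frontmatter fields such as type or project would need a YAML parser on top of this.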

For the search-on-the-library workflow, see "you can't search audio recordings".

For Notion users

Build a Meetings database with properties: Date, Type (multi-select: customer, internal, planning, retro), Project (relation), Attendees (multi-select), Status (raw, processed, archived). Each meeting page contains the Markdown transcript pasted as content. The Notion AI integration can query across the database ("summarize the last 5 customer calls") with reasonable results. See "audio to Notion workflow".

For Obsidian users

The transcripts plug into your existing PKM workflow. A typical setup has a meetings/ folder in the vault, daily notes that backlink to relevant meetings, and a Dataview-powered "meetings this week" panel on the home page. The graph view shows the cross-meeting connections automatically.

Privacy considerations

Two real concerns worth addressing.

Cloud upload of meeting audio. The web tool processes uploads and returns Markdown; review the privacy policy for current handling. For sensitive content (legal, HR, customer-data heavy), the conservative choice is local Whisper. The workflow is conceptually the same, but the audio never leaves your machine. Setup is pip install openai-whisper, plus ffmpeg on your PATH and modest hardware. Whisper on its own produces a transcript without speaker labels, so expect more manual cleanup.
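A local run with the openai-whisper package might look like the sketch below. It uses the package's load_model and transcribe calls and formats the returned segments as timestamped paragraphs. The helper names and the "base" model choice are assumptions. Whisper itself does not add speaker labels or topic headings, so the output is plainer than the web tool's.

```python
def format_timestamp(seconds: float) -> str:
    """Render a segment offset in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, secs)


def segments_to_markdown(segments) -> str:
    """One paragraph per segment, each prefixed with its start time."""
    paragraphs = [
        "[{}] {}".format(format_timestamp(seg["start"]), seg["text"].strip())
        for seg in segments
    ]
    return "\n\n".join(paragraphs) + "\n"


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe on this machine; nothing is uploaded."""
    # Imported here so the formatting helpers work without whisper installed.
    import whisper  # pip install openai-whisper (also needs ffmpeg on PATH)

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_markdown(result["segments"])
```

Larger models generally trade speed for accuracy, so it is worth trying a short clip from a real meeting before committing to one.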

Storage of meeting recordings. The audio file itself contains everything anyone said in the meeting. Treat it like any other sensitive artifact: encrypted local storage, retention policy, access control. The Markdown transcript is somewhat less sensitive (no tone, no audio fingerprint) but contains all the substantive content — apply the same handling.

Common failure modes

Conference room audio with one ceiling mic and 6 attendees. Speaker diarization struggles when voices are similar in pitch or far from the mic. Workaround: have each attendee state their name once at the start ("Sarah from product, Marcus from engineering..."), then add a manual mapping in cleanup.

Meetings that switch language mid-call. Mixed-language conversations confuse the transcription. For genuinely multi-language meetings (international customer calls, etc.), accept that the dominant-language portion will be cleaner; the secondary-language portions may need cleanup.

Long meetings with poor structure. A 3-hour meeting with no clear agenda produces a long transcript with weak H2 boundaries. The fix happens upstream: better meeting agendas with clear topic transitions improve both the meeting itself and the resulting transcript.

The shared-responsibility variant

For recurring team meetings (weekly standups, planning sessions, retros), the rotating-owner pattern works well. One person per meeting owns the post-meeting workflow: download the recording, upload to the converter, do the cleanup pass, share the resulting transcript and action items with the team. Rotate weekly so no single person carries the load. Total per-meeting cost stays at 10-15 minutes; per-person cost drops to once every N weeks.

The shared-responsibility model also distributes knowledge of the workflow itself. Everyone learns to clean up speaker labels, run the action item extraction prompt, and push items to the task system. The workflow stops depending on any single person's discipline, and that single-person dependency is what most often kills these systems.
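The rotation itself is simple enough to compute rather than track by hand. This is a minimal sketch, assuming a fixed team list and a weekly cadence counted from an agreed start date; the names and dates are placeholders.

```python
from datetime import date


def owner_for(meeting_day: date, team: list, rotation_start: date) -> str:
    """Pick the post-meeting owner by counting whole weeks since the start."""
    weeks_elapsed = (meeting_day - rotation_start).days // 7
    return team[weeks_elapsed % len(team)]


team = ["Sarah", "Marcus", "Dana"]
start = date(2026, 5, 4)  # hypothetical first week of the rotation
this_week = owner_for(date(2026, 5, 11), team, start)
```

Counting from a fixed start date, rather than using the calendar week number, keeps the rotation fair across year boundaries.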

Recommendation

For most teams, the workflow is best treated as a standard end-of-meeting ritual: record, upload within an hour, run the action item extraction pass, push to the task system. The total cost is 10-15 minutes per meeting, in exchange for complete, searchable, structured records of every meeting. After 6 months you have an institutional memory that no individual brain could maintain — and you've spent less than a single bot's per-user subscription would have cost.

Frequently asked questions

How do I handle very quiet voices or off-mic attendees?

If an attendee is consistently too quiet for the recorder, transcription accuracy on their contributions drops. Workarounds: ensure every speaker is on a working mic when possible, or have someone restate quiet contributions out loud. For routine meeting setups, invest once in a decent USB conference mic; the accuracy gains pay back immediately.

Should I share the meeting transcript with all attendees afterward?

Yes for internal meetings — sharing catches misattributions immediately, holds owners accountable to commitments, and reduces the 'I didn't know I was supposed to do that' pattern. For sensitive or external meetings (customer calls, HR conversations), apply judgment about what to share with whom. Default-share for internal recurring meetings.

What if my organization requires structured meeting minutes in a specific format?

The Markdown transcript becomes input to the formal minutes, not a replacement. Use the LLM extraction pass with a custom prompt that produces minutes in your required format (decisions, action items, attendees, agenda items). The structured Markdown source makes templating reliable; the same transcript can produce multiple output formats for different audiences.