How to Get Structured Meeting Notes from Any Recording
Native meeting bots (Otter, Fireflies, etc.) work well when they fit your stack. They don't always — bots can't join in-person meetings, customers often won't consent, and the action items end up locked in a vendor's dashboard rather than your task system. Here's the no-bot workflow that produces equivalent (often better) structured Markdown notes from any recording, on any platform, with no third-party joining your call.
The end-to-end workflow
Five steps total:
- Record the meeting on whatever you already use.
- Save the audio file to your machine.
- Upload to audio-to-markdown.
- Download the structured Markdown.
- Optionally run the transcript through an LLM to extract action items.
Total time per meeting: 5-15 minutes after the call ends, depending on length and how much cleanup you want.
Step 1: Record
Use whatever recording method fits the meeting type:
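- Zoom / Google Meet / Teams calls: start recording at the beginning of the call. Zoom's local recording writes an MP4 (with audio track) and an M4A audio-only file to a local folder when the meeting ends; Google Meet and Teams save a cloud recording (MP4) that you download afterwards.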
- In-person meetings: phone voice memo app on the table, or a dedicated recorder if audio quality matters more (Zoom H1, Sony portable, etc.).
- Phone calls: depends on platform and jurisdiction. iPhone has limited native call recording; Android varies by maker; some VoIP platforms have built-in call recording.
- Hybrid (some in-person, some remote): the platform's recording captures everything if all attendees join via the platform — including in-person attendees on the conference room speaker. If in-person attendees are off-mic, supplement with a phone recording in the room.
Consent matters. Recording laws vary by jurisdiction. Many require all-party consent. As default behavior: announce that you're recording at the start of every recorded meeting, document the consent in the meeting notes, and respect any attendee's request not to record. The workflow doesn't change ethics or law — it just removes the bot-joining friction that often makes the consent conversation harder.
Step 2: Save the audio
Get the audio file onto your machine:
- Zoom local recording: `~/Documents/Zoom/[meeting-name]/audio_only.m4a`
- Google Meet: download the MP4 from Drive (audio gets extracted in step 3 automatically)
- Teams: download the MP4 from OneDrive (recordings of channel meetings are stored in SharePoint instead)
- Phone voice memo: AirDrop to Mac, USB cable to PC, or sync via cloud
- Dedicated recorder: copy via SD card
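If you have a video file but only need the audio, the converter handles MP4/MOV directly, so no manual extraction is required. If you want to extract the audio anyway: `ffmpeg -i video.mp4 -vn -acodec copy audio.m4a`. The `-acodec copy` stream copy only works when the audio track is already AAC (typical for these recordings); if ffmpeg refuses, drop `-acodec copy` and let it re-encode.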
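If you extract audio regularly, the same ffmpeg command can be run over a whole folder. A minimal sketch, assuming ffmpeg is on your PATH and the videos carry AAC audio (which stream copy into .m4a requires); `extract_audio.py` and its function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_extract_cmd(video: Path, out_dir: Path) -> list[str]:
    """Build the same command as `ffmpeg -i video.mp4 -vn -acodec copy audio.m4a`.

    -vn drops the video stream; -acodec copy keeps the audio bytes untouched,
    which assumes the track is AAC so it fits an .m4a container.
    """
    out_file = out_dir / (video.stem + ".m4a")
    return ["ffmpeg", "-i", str(video), "-vn", "-acodec", "copy", str(out_file)]


def extract_all(src_dir: Path, out_dir: Path) -> None:
    """Extract audio from every .mp4/.mov in src_dir, skipping files already done."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for video in sorted(src_dir.iterdir()):
        if video.suffix.lower() not in {".mp4", ".mov"}:
            continue
        cmd = build_extract_cmd(video, out_dir)
        if Path(cmd[-1]).exists():
            continue  # avoid ffmpeg's interactive overwrite prompt
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_audio.py <folder-with-videos> <output-folder>
    extract_all(Path(sys.argv[1]), Path(sys.argv[2]))
```

Skipping existing outputs keeps reruns cheap: you can point it at the same recordings folder after every meeting.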
Step 3: Upload to audio-to-markdown
Open /convert/audio-to-markdown. Drag the file onto the upload zone. Click Convert. Wait 1-3 minutes for a typical 60-minute meeting.
Step 4: Download the structured Markdown
The output has three things that distinguish it from plain-text transcription:
- Speaker labels at every turn ("Speaker 1", "Speaker 2", etc.)
- H2 section headings at natural topic shifts
- Optional timestamps for cross-referencing back to the audio
Download the .md file. Open it in your editor of choice (VS Code, Obsidian, Typora, plain text editor — any will do).
Cleanup pass (3-5 minutes)
Before saving to your team's notes system:
- Find-and-replace speaker labels with actual names: `**Speaker 1:**` becomes `**Sarah (PM):**`, and so on.
- Skim the H2 headings. The AI's section breaks are usually right but occasionally need adjusting (a heading that bundles two topics, or two headings that split a single topic).
- Quick scan for obvious mistranscriptions of proper nouns — company names, product names, technical terms specific to your domain. Fix them.
- Add a short YAML frontmatter block at the top with metadata: date, attendees, project, duration, and the source audio file.
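The label swap in the first bullet is easy to script once you know who is who. A minimal sketch, assuming the converter writes labels exactly as `**Speaker N:**`; the `SPEAKERS` mapping and the script name `relabel.py` are hypothetical examples you would fill in per meeting. A regex handles every label in one pass and makes it easy to report any label you forgot to map.

```python
import re
import sys
from pathlib import Path

# Hypothetical mapping, filled in by hand after listening to the first minute.
SPEAKERS = {
    "Speaker 1": "Sarah (PM)",
    "Speaker 2": "Marcus (Eng)",
    "Speaker 3": "Dana (Design)",
}

# Matches a bold label such as **Speaker 12:** and captures "Speaker 12".
LABEL = re.compile(r"\*\*(Speaker \d+):\*\*")


def relabel(markdown: str, mapping: dict[str, str]) -> str:
    """Replace every **Speaker N:** label that has an entry in `mapping`.

    Labels without an entry are left as-is so they stay easy to spot.
    """
    return LABEL.sub(lambda m: f"**{mapping.get(m.group(1), m.group(1))}:**", markdown)


def unmapped(markdown: str, mapping: dict[str, str]) -> set[str]:
    """Return the generic labels still present in the text."""
    return {label for label in LABEL.findall(markdown) if label not in mapping}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python relabel.py 2026-05-08-pricing-review.md
    path = Path(sys.argv[1])
    text = relabel(path.read_text(encoding="utf-8"), SPEAKERS)
    path.write_text(text, encoding="utf-8")
    leftover = unmapped(text, SPEAKERS)
    if leftover:
        print("Still unmapped:", ", ".join(sorted(leftover)))
```

Any leftover generic labels are printed, so you can extend the mapping and rerun.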
The frontmatter template:
```yaml
---
date: 2026-05-10
type: meeting
project: q3-pricing
attendees: ["Sarah (PM)", "Marcus (Eng)", "Dana (Design)"]
duration_minutes: 47
audio_source: zoom-2026-05-10-1400.m4a
---
```

This makes the notes filterable and queryable in tools like Obsidian (with the Dataview plugin) or any custom search.
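When you process several meetings at once, generating the block avoids typos in the field names. A minimal sketch that reproduces the template above; the function names are hypothetical, and it assumes simple values (no colons or leading quotes in `project` or `audio_source`) so no YAML escaping is needed.

```python
import json
from datetime import date
from pathlib import Path


def frontmatter(meeting_date: date, project: str, attendees: list[str],
                duration_minutes: int, audio_source: str) -> str:
    """Build the YAML frontmatter block in the template's field order.

    json.dumps renders attendees as ["Sarah (PM)", ...], a flow sequence
    that YAML parsers (and Dataview) accept.
    """
    lines = [
        "---",
        f"date: {meeting_date.isoformat()}",
        "type: meeting",
        f"project: {project}",
        f"attendees: {json.dumps(attendees, ensure_ascii=False)}",
        f"duration_minutes: {duration_minutes}",
        f"audio_source: {audio_source}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def add_frontmatter(md_path: Path, block: str) -> None:
    """Prepend `block` unless the file already starts with frontmatter."""
    text = md_path.read_text(encoding="utf-8")
    if not text.startswith("---\n"):
        md_path.write_text(block + "\n" + text, encoding="utf-8")
```

The "already has frontmatter" check makes `add_frontmatter` safe to run twice on the same file.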
Step 5: Optional action item extraction
The structured Markdown is a great input for an LLM extraction pass. Open the transcript in Claude or ChatGPT and run a structured prompt:
```text
Read this meeting transcript carefully.
Extract every action item. An action item is:
- An explicit commitment (someone said they will do something), OR
- An assignment (someone was clearly asked and didn't decline)
Output as a Markdown table with columns:
- Owner (the person committing or assigned)
- Action (what they will do)
- Deadline (date or timeframe mentioned, or "unspecified")
- Conditions (any prerequisites, or "none")
- Source quote (verbatim, 1-2 sentences from the transcript)
Do not invent action items. Only include items grounded in clear transcript evidence.
```

The output is a clean table you can paste directly into Linear, Asana, Notion, or whatever task system you use. The verbatim source quote becomes the task description, so the assignee can verify exactly what was committed. We cover the action item extraction pattern in depth in our post on losing meeting action items.
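The "Do not invent action items" instruction can also be checked mechanically before anything reaches the task system. A minimal sketch, assuming the model returns a single pipe-delimited Markdown table with the five columns from the prompt and no `|` characters inside cells; the script name and the CSV step are our additions (useful if your task tool imports CSV). Rows whose quote can't be found verbatim in the transcript are flagged for a human to check, not deleted.

```python
import csv
import io
import re
import sys
from pathlib import Path

COLUMNS = ["Owner", "Action", "Deadline", "Conditions", "Source quote"]
SEPARATOR_CELL = re.compile(r":?-+:?")


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first pipe-delimited Markdown table into row dicts."""
    header = None
    rows = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # first non-table line after the table ends it
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(SEPARATOR_CELL.fullmatch(c) for c in cells):
            continue  # the |---|---| row under the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    for curly, straight in (("\u201c", '"'), ("\u201d", '"'), ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, straight)
    return " ".join(text.split()).lower()


def ungrounded(rows: list[dict[str, str]], transcript: str) -> list[dict[str, str]]:
    """Return rows whose source quote is missing or not found verbatim in the transcript."""
    haystack = normalize(transcript)
    flagged = []
    for row in rows:
        quote = normalize(row.get("Source quote", "")).strip('"').strip()
        if not quote or quote not in haystack:
            flagged.append(row)
    return flagged


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows as CSV in the prompt's column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_actions.py llm_output.md transcript.md > actions.csv
    table_rows = parse_table(Path(sys.argv[1]).read_text(encoding="utf-8"))
    for row in ungrounded(table_rows, Path(sys.argv[2]).read_text(encoding="utf-8")):
        print("Quote not found, review:", row.get("Action", "?"), file=sys.stderr)
    sys.stdout.write(to_csv(table_rows))
```

Normalizing whitespace matters because a quote the model copied on one line may be wrapped across two lines in the transcript.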
Comparison with native meeting bots
Native bots (Otter, Fireflies, Read, etc.) automate the recording-and-transcription steps. The tradeoffs:
| Aspect | Bot | Manual + audio-to-markdown |
|---|---|---|
| Setup | Authorize per platform, calendar integration | Zero setup; works on any audio |
| Per-meeting friction | Bot joins automatically | You record + upload (5 min after) |
| Attendee consent | Bot's presence is visible; some attendees decline | You announce recording; same legal need |
| Output location | Vendor dashboard | Your file system / vault |
| Per-user cost | $10-30/month/user | Web tool, no per-seat pricing |
| In-person meetings | Doesn't work | Phone recording + same workflow |
| Action items | Auto-extracted, vendor-locked | LLM extraction, in your tools |
Bots win on per-meeting friction. The manual workflow wins on cost, control, and applicability to non-platform meetings. For teams that hold half their meetings in person, or meet with external customers who won't consent to a bot, the manual path is the only option that actually works.
Slides shared during the meeting
The audio captures the spoken discussion, but slides usually contain the structured artifacts of the meeting (numbers, named entities, decisions on screen). For complete records, run any shared decks through pdf-to-markdown as a separate step and concatenate both Markdown files. The combined document is the full meeting record — spoken and visual — in one searchable file.
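Concatenation is simple, but a small helper keeps the combined file tidy. A minimal sketch, assuming both inputs are UTF-8 Markdown and that the pdf-to-markdown output may carry its own frontmatter; the `## Slides` heading and the script name are our choices, not something either converter produces.

```python
import sys
from pathlib import Path


def strip_frontmatter(text: str) -> str:
    """Remove a leading ---/--- frontmatter block, if there is one."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            return text[end + len("\n---\n"):]
    return text


def combine(transcript: str, slides: str) -> str:
    """Append the slide deck's Markdown under its own H2 after the transcript.

    The transcript keeps its frontmatter; any frontmatter on the slides is
    dropped so the combined file has exactly one metadata block.
    """
    body = strip_frontmatter(slides).strip()
    return transcript.rstrip() + "\n\n## Slides\n\n" + body + "\n"


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python combine.py transcript.md slides.md combined.md
    transcript_md = Path(sys.argv[1]).read_text(encoding="utf-8")
    slides_md = Path(sys.argv[2]).read_text(encoding="utf-8")
    Path(sys.argv[3]).write_text(combine(transcript_md, slides_md), encoding="utf-8")
```

Keeping a single frontmatter block at the top means queries over the library still see one date, project, and attendee list per meeting.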
Building a meeting library
Per-meeting notes are useful. The compounding value comes from a structured library:
```text
meetings/
  2026/
    05/
      2026-05-08-pricing-review.md
      2026-05-08-pricing-review.m4a
      2026-05-09-customer-acme.md
      2026-05-09-customer-acme.m4a
  2025/
    ...
```

Naming convention: `YYYY-MM-DD-topic.ext`. The audio file lives next to the Markdown for verification when needed. Frontmatter on each Markdown file enables Dataview queries ("all customer meetings in Q2 mentioning pricing") that no calendar app can answer.
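Outside Obsidian, the same kind of question can be answered with a short script over the folder. A minimal sketch, assuming the flat frontmatter format from the template in the cleanup section and that you tag customer calls yourself (for example `type: customer`; the template uses `type: meeting`). It is a rough stand-in for a Dataview query, not a reimplementation of Dataview, and the function names are hypothetical.

```python
import json
from datetime import date
from pathlib import Path
from typing import List, Optional


def read_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter like the template's.

    Deliberately minimal: scalars stay strings and JSON-style lists are decoded.
    Use a real YAML parser (such as PyYAML) for anything richer.
    """
    if not text.startswith("---\n"):
        return {}
    end = text.find("\n---", 3)
    if end == -1:
        return {}
    meta = {}
    for line in text[4:end].splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("["):
            try:
                value = json.loads(value)
            except ValueError:
                pass  # not JSON (e.g. unquoted YAML list): keep the raw string
        meta[key.strip()] = value
    return meta


def find_meetings(root: Path, start: date, end: date,
                  keyword: Optional[str] = None, **fields: str) -> List[Path]:
    """Return transcripts dated in [start, end] whose frontmatter matches
    `fields` and whose text contains `keyword` (case-insensitive)."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        meta = read_frontmatter(text)
        try:
            day = date.fromisoformat(str(meta.get("date", "")))
        except ValueError:
            continue  # no parseable date: skip instead of guessing
        if not start <= day <= end:
            continue
        if any(meta.get(k) != v for k, v in fields.items()):
            continue
        if keyword and keyword.lower() not in text.lower():
            continue
        hits.append(path)
    return hits
```

For example, `find_meetings(Path("meetings"), date(2026, 4, 1), date(2026, 6, 30), keyword="pricing", type="customer")` lists Q2 customer meetings that mention pricing, provided those files carry `type: customer`.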
For the search-on-the-library workflow, see our post on why you can't search audio recordings.
For Notion users
Build a Meetings database with properties: Date, Type (multi-select: customer, internal, planning, retro), Project (relation), Attendees (multi-select), Status (raw, processed, archived). Each meeting page contains the Markdown transcript pasted as content. The Notion AI integration can query across the database ("summarize the last 5 customer calls") with reasonable results. See our audio-to-Notion workflow post.
For Obsidian users
The transcripts plug into your existing PKM workflow: a meetings/ folder in the vault, daily notes that backlink to relevant meetings, a Dataview-powered "meetings this week" panel on the home page. Once meetings link to each other and to daily notes, the graph view shows those cross-meeting connections automatically.
Privacy considerations
Two real concerns worth addressing.
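Cloud upload of meeting audio. The web tool processes uploads and returns Markdown; review the privacy policy for current handling. For sensitive content (legal, HR, customer-data heavy), the conservative choice is local Whisper. Same workflow conceptually, but the audio never leaves your machine. Setup is `pip install openai-whisper`, ffmpeg installed on your system, and modest hardware (the larger models are much faster on a GPU). Whisper on its own returns timestamped text without speaker labels or H2 headings, so expect a longer cleanup pass.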
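For the local route, a short script can turn Whisper's output into a Markdown file next to the audio. A sketch, assuming the `openai-whisper` package and ffmpeg are installed; `load_model` and `transcribe` are the package's documented entry points, while the `[H:MM:SS]` timestamp format, the `base` model default, and the file naming are our own choices. The `whisper` import sits inside `transcribe` so the formatting helpers work without it.

```python
import sys
from pathlib import Path


def timestamp(seconds: float) -> str:
    """Format seconds as H:MM:SS for cross-referencing the audio."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


def segments_to_markdown(segments: list[dict]) -> str:
    """Turn Whisper's segment list into one timestamped paragraph per segment.

    Whisper does not label speakers, so no speaker names appear here.
    """
    body = [f"[{timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments]
    return "\n\n".join(body) + "\n"


def transcribe(audio: Path, model_name: str = "base") -> str:
    """Run Whisper locally on `audio` and return Markdown."""
    import whisper  # pip install openai-whisper; also needs ffmpeg on PATH

    model = whisper.load_model(model_name)
    result = model.transcribe(str(audio))
    return segments_to_markdown(result["segments"])


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python local_transcribe.py 2026-05-08-pricing-review.m4a
    src = Path(sys.argv[1])
    src.with_suffix(".md").write_text(transcribe(src), encoding="utf-8")
```

Writing `<stem>.md` beside `<stem>.m4a` matches the library layout, so the rest of the workflow (cleanup, frontmatter, action items) is unchanged.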
Storage of meeting recordings. The audio file itself contains everything anyone said in the meeting. Treat it like any other sensitive artifact: encrypted local storage, retention policy, access control. The Markdown transcript is somewhat less sensitive (no tone, no audio fingerprint) but contains all the substantive content — apply the same handling.
Common failure modes
Conference room audio with one ceiling mic and 6 attendees. Speaker diarization struggles when voices are similar in pitch or far from the mic. Workaround: have each attendee state their name once at the start ("Sarah from product, Marcus from engineering..."), then add a manual mapping in cleanup.
Meetings that switch language mid-call. Mixed-language conversations confuse the transcription. For genuinely multi-language meetings (international customer calls, etc.), accept that the dominant-language portion will be cleaner; the secondary-language portions may need cleanup.
Long meetings with poor structure. A 3-hour meeting with no clear agenda produces a long transcript with weak H2 boundaries. The fix happens upstream: better meeting agendas with clear topic transitions improve both the meeting itself and the resulting transcript.
The shared-responsibility variant
For recurring team meetings (weekly standups, planning sessions, retros), the rotating-owner pattern works well. One person per meeting owns the post-meeting workflow: download the recording, upload to the converter, do the cleanup pass, share the resulting transcript and action items with the team. Rotate weekly so no single person carries the load. Total per-meeting cost stays at 10-15 minutes; with N people in the rotation, each person's share drops to one meeting every N weeks.
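The rotation itself is plain round-robin. A small sketch with hypothetical names and a hypothetical start date, assuming one meeting per week:

```python
from datetime import date, timedelta


def rotation(team: list[str], first_meeting: date, weeks: int) -> list[tuple[date, str]]:
    """Assign the post-meeting workflow round-robin for a weekly meeting.

    The owner for week i is team[i % len(team)], so with N people each
    person takes one meeting every N weeks.
    """
    return [(first_meeting + timedelta(weeks=i), team[i % len(team)]) for i in range(weeks)]


if __name__ == "__main__":
    for day, owner in rotation(["Sarah", "Marcus", "Dana"], date(2026, 5, 11), 6):
        print(day.isoformat(), owner)
```

Posting the printed schedule where the team can see it removes the "whose turn is it?" question, which is usually where rotations quietly lapse.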
The shared-responsibility model also distributes knowledge of the workflow itself. Everyone learns to clean up speaker labels, run the action item extraction prompt, and push items to the task system. The workflow stops depending on any single person's discipline, which is what kills these systems most often.
Recommendation
For most teams, the workflow is best treated as a standard end-of-meeting ritual: record, upload within an hour, run the action item extraction pass, push to the task system. The total cost is 10-15 minutes per meeting, in exchange for complete, searchable, structured records of every meeting. After 6 months you have an institutional memory that no individual brain could maintain — and you've spent less than a single bot's per-user subscription would have cost.