
Audio to Markdown for Journalists — Transcribe Sources Fast

A 60-minute interview means 6-10 hours of manual transcription. Most reporters skip the full transcript and work from notes — losing exact quotes, missing the off-hand remark that turned out to matter, scrambling at deadline to find that one phrase. Upload the audio to mdisbetter.com and the structured Markdown is back in minutes: each speaker labelled, every quote timestamped to the recording for verification, the whole thing greppable across your source archive.

Why this is hard without the right tool

  • Hours of manual transcription per interview
  • Need verbatim quotes for accuracy
  • Speaker attribution required for each quote
  • Tight deadline pressure on every story

Recommended workflow

  1. Record the interview as you normally would (phone, Zoom recording, in-person mic, whatever)
  2. Open /convert/audio-to-markdown and upload the audio file
  3. Convert — minutes per hour of audio, not hours per hour
  4. Download the Markdown: speakers labelled, quotes timestamped, structured as **Reporter:** / **Source:** exchanges
  5. Use Ctrl+F to find quotes by keyword; jump to the timestamp in your audio player to verify the verbatim wording before publication
  6. Build a personal source-archive folder of .md transcripts — searchable across every interview you've ever done with that source or on that beat
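Steps 5 and 6 can also be scripted. A minimal sketch, assuming the transcript lines follow the `[MM:SS] **Speaker:** text` shape described above (adjust the regex if your actual output differs):

```python
import re

# Assumed transcript line format (check against your actual download):
#   [12:34] **Reporter:** How did the contract get approved?
TURN_RE = re.compile(r"\[(\d{1,2}):(\d{2})\]\s+\*\*(.+?):\*\*\s+(.*)")

def parse_transcript(markdown: str):
    """Parse Markdown transcript lines into (seconds, speaker, text) turns."""
    turns = []
    for line in markdown.splitlines():
        m = TURN_RE.match(line.strip())
        if m:
            mins, secs, speaker, text = m.groups()
            turns.append((int(mins) * 60 + int(secs), speaker, text))
    return turns

def find_quotes(turns, keyword: str):
    """Return every turn whose text mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [t for t in turns if kw in t[2].lower()]

sample = """\
[00:12] **Reporter:** When did you first see the memo?
[00:19] **Source:** The memo landed in March, before the audit.
"""
turns = parse_transcript(sample)
hits = find_quotes(turns, "memo")
```

Each hit carries its offset in seconds, so jumping to the recording for verification is immediate.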

Verification workflow: never publish a quote you can't play back

The timestamps in the Markdown output ([12:34] next to each speaker turn) map back to the original recording. Before any quote ships, jump to the timestamp in your audio player and confirm the wording verbatim. The transcript is a draft; the recording is the source of truth. Treat the Markdown as a fast index into your audio, not a replacement for it. This is the same discipline pre-AI tools required, just faster — you can verify 20 quotes in the time it used to take to transcribe one.
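Mapping a `[MM:SS]` marker back to a playback position is one line of arithmetic. A small helper sketch — the `lead_in` buffer is our own addition, so the playback starts a few seconds before the quote and you hear it in context:

```python
def seek_offset(timestamp: str, lead_in: int = 5) -> int:
    """Convert a '[12:34]' transcript timestamp to a playback offset in
    seconds, backing up lead_in seconds so the quote starts in context."""
    mins, secs = timestamp.strip("[]").split(":")
    return max(0, int(mins) * 60 + int(secs) - lead_in)
```

Most command-line players accept a start offset — e.g. mpv's `--start` flag, as in `mpv --start=749 interview.mp3` for a quote marked `[12:34]`.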

Multi-source story workflow

When a story pulls from 8 interviews across 6 weeks, an Obsidian vault of .md transcripts becomes a research workspace. Cross-reference quotes from different sources by topic. Build a timeline of who said what, when. Use Obsidian's graph view to see which sources spoke to which themes. None of this is possible with audio files in a folder; all of it falls out for free once the transcripts are Markdown.
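The cross-referencing itself takes only a few lines. A sketch under the same assumed `[MM:SS] **Name:**` line shape as above; it takes a plain dict of transcripts for illustration, and swapping in `pathlib` to read a real vault folder is a small change:

```python
import re

# Captures the [MM:SS] marker and the quote text after a **Name:** label.
QUOTE_RE = re.compile(r"(\[\d{1,2}:\d{2}\]).*?:\*\*\s*(.*)")

def cross_reference(transcripts: dict, topic: str):
    """Find every timestamped line mentioning a topic across a set of
    transcripts (source name -> Markdown text). Returns (source, ts, text)."""
    hits = []
    for source, md in transcripts.items():
        for line in md.splitlines():
            m = QUOTE_RE.search(line)
            if m and topic.lower() in m.group(2).lower():
                hits.append((source, m.group(1), m.group(2)))
    return hits

# Hypothetical two-interview vault for illustration:
vault = {
    "2024-03-smith.md": "[04:10] **Smith:** The supply chain broke in May.",
    "2024-05-jones.md": "[12:02] **Jones:** Nobody flagged the supply chain risk.",
}
hits = cross_reference(vault, "supply chain")
```

Sort the hits by filename (date-prefixed, as here) and the who-said-what-when timeline falls out directly.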

Cross-link to PDF source documents

Most investigative stories also pull from PDFs — court filings, leaked memos, regulatory submissions. Convert those with /convert/pdf-to-markdown and store alongside the interview transcripts. Same vault, same searchable corpus, audio quotes and document quotes side by side, all greppable. For source webpages (press releases, archived blog posts), /convert/url-to-markdown finishes the trio.

Privacy note for sensitive sources

mdisbetter processes audio in memory and deletes after conversion (no retention on free/pro tiers). For genuinely sensitive sources — whistleblowers, off-the-record interviews where any cloud upload is a risk — run whisper or faster-whisper entirely offline on your laptop. Same accuracy, zero network egress. The web tool is the right speed/convenience tradeoff for the 90% of interviews where the source isn't at risk.
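For the offline path, a minimal sketch using the faster-whisper Python API — the `WhisperModel` call and segment fields reflect that library's documented interface, but verify against the version you install:

```python
def format_timestamp(seconds: float) -> str:
    """Render a segment start time as the [MM:SS] marker used above."""
    m, s = divmod(int(seconds), 60)
    return f"[{m:02d}:{s:02d}]"

def transcribe_offline(audio_path: str, model_size: str = "small") -> str:
    """Transcribe an interview entirely on-device. Nothing leaves the
    laptop: the model is downloaded once and runs locally on CPU."""
    from faster_whisper import WhisperModel  # pip install faster-whisper
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    lines = [f"{format_timestamp(seg.start)} {seg.text.strip()}"
             for seg in segments]
    return "\n".join(lines)
```

Speaker labels are the one thing this sketch omits — base whisper does not diarise, so for sensitive two-party interviews expect to label the turns by hand.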

Frequently asked questions

How accurate are the verbatim quotes?
Word-error-rate is typically 3-8% on clean recordings (single mic, quiet room, native English speaker). A 60-minute interview runs roughly 7,000-9,000 spoken words, so that's a few hundred words across the whole transcript that may need a tweak. Always verify against the audio at the timestamp before publishing — the transcript is a fast index, the recording is the source of truth. For accented English, multi-language conversations, or noisy phone-call audio, expect 5-12% WER and budget more verification time.
How do speaker labels work for two-party phone interviews?
Diarisation auto-detects speaker count and labels Speaker 1 / Speaker 2. After download, find-and-replace those generic labels with actual names (Reporter / Smith, etc.). For two-party phone calls where you and the source have different mic levels, accuracy is usually 95%+ on the labels. Round-table or panel interviews with 4+ voices need more cleanup.
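The find-and-replace step is safer as a targeted regex, so mentions of "Speaker 2" inside quote text stay untouched. A sketch, assuming the `**Speaker N:**` label format described above:

```python
import re

def rename_speakers(markdown: str, mapping: dict) -> str:
    """Swap generic diarisation labels for real names, touching only the
    bold **Speaker N:** markers, never mentions inside the quote text."""
    def swap(m):
        return f"**{mapping.get(m.group(1), m.group(1))}:**"
    return re.sub(r"\*\*(Speaker \d+):\*\*", swap, markdown)

before = ("[00:05] **Speaker 1:** Speaker 2 said the filing was late.\n"
          "[00:11] **Speaker 2:** That's not what I told him.")
after = rename_speakers(before, {"Speaker 1": "Reporter",
                                 "Speaker 2": "Smith"})
```

Here "Speaker 2 said the filing was late" stays verbatim in the quote while both bold labels become real names.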
Can I search across years of past interviews?
Yes — once each interview is a <code>.md</code> file, ripgrep / Obsidian search / Notion search all work across the whole archive. Search "supply chain" across 3 years of beat interviews and you get every source who ever mentioned the topic, with timestamps to play back the audio. This kind of cross-source recall was effectively impossible when transcripts didn't exist or lived in proprietary apps.
Is this OK for off-the-record or sensitive sources?
For genuinely sensitive material (whistleblowers, sources whose identity would be at risk), run <a href="https://github.com/openai/whisper">whisper</a> locally on your laptop — same model class as the web tool, runs entirely offline, MIT-licensed. mdisbetter's web tool is appropriate for routine interviews where the cloud-upload risk is acceptable; for the cases where it isn't, the OSS path keeps everything on your own hardware.
What about court filings and PDF source documents alongside interviews?
Convert those with <a href="/convert/pdf-to-markdown">/convert/pdf-to-markdown</a> and drop in the same vault as your interview transcripts. The investigative-story workflow is one corpus of mixed source types — interviews, filings, press releases (via <a href="/convert/url-to-markdown">URL to Markdown</a>) — all in <code>.md</code>, all searchable, all cross-referenceable. The format consistency is the unlock.

Try the tool free →