9 min read · MDisBetter

Audio to Markdown for Journalists: Transcribe Sources Fast

You finished a 90-minute interview at 4 PM. Your editor wants the piece by morning. The quote you need is somewhere around the 47-minute mark — at least, you think so. The classic journalism workflow at this point is to scrub the audio file, scribble notes, and hope you find the line before midnight. Converting the recording to a speaker-labeled Markdown transcript turns that ninety-minute hunt into a Ctrl-F. The transcript becomes the working document — searchable, quotable, and durable enough to defend the quote in fact-checking three weeks later.

Why audio is journalism's slowest workflow

Reporting on a single source interview involves several distinct steps (recording, listening back, transcribing, quoting, verifying), each of which consumes real time.

For a reporter doing 3-5 source interviews per week, the listening and transcribing time is the bottleneck. A 90-minute interview takes a fast typist 4-6 hours to transcribe by hand. Most reporters skip this and instead listen-and-quote, which works but loses the searchability that fact-checking and follow-up reporting both rely on.

An AI-generated Markdown transcript collapses listen+transcribe into a single step measured in minutes, and produces an artifact that survives the article's lifecycle. The quote you used in last month's piece is still searchable when a corrections request comes in this month.

The interview workflow

The reliable pipeline:

  1. Record the interview (your phone's voice memo app, a portable recorder, or Zoom's local recording feature)
  2. Get consent on the record — every state has its own law, but on-the-record consent at the start of the recording is universally good practice and increasingly an editorial requirement at major outlets
  3. Upload the recording to audio-to-markdown
  4. Download the .md file with speaker labels and timestamp anchors
  5. Read and quote directly from the transcript — searchable, scannable, copyable
  6. Archive the original audio and the transcript together in your source folder, indexed by date and source name

The Markdown is your working document. The audio remains the source of record — for fact-checking, for legal review, for any subsequent dispute about what was actually said.
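The date-and-source naming in the archive step can be scripted so every file sorts chronologically. A minimal sketch, assuming the `YYYY-MM-DD-source-name` convention; the `transcript_name` helper is hypothetical:

```python
import re
from datetime import date

def transcript_name(interview_date: date, source: str, ext: str = "md") -> str:
    """Build a sortable 'YYYY-MM-DD-source-name' filename for the archive."""
    # Lowercase the source name and collapse anything non-alphanumeric to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", source.lower()).strip("-")
    return f"{interview_date.isoformat()}-{slug}.{ext}"

print(transcript_name(date(2026, 4, 28), "Mayor Johnson"))
# 2026-04-28-mayor-johnson.md
```

Name the audio and the transcript identically apart from the extension, and the pair stays adjacent in every directory listing.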

Quote extraction and verification

The structured transcript makes quote workflow fast. A typical source folder for a single story looks like:

Stories/
  2026-05-housing-crisis/
    interviews/
      2026-04-28-mayor-johnson.mp3
      2026-04-28-mayor-johnson.md
      2026-04-30-developer-smith.mp3
      2026-04-30-developer-smith.md
      2026-05-02-tenant-rivera.mp3
      2026-05-02-tenant-rivera.md
    background/
    drafts/

When you sit down to draft, every quote is one Ctrl-F away in the transcript folder. Need every time the mayor mentioned "affordable" across all three interviews you did with her? Grep across the .md files. Need to confirm a specific phrase before sending to fact-checking? The timestamp in the Markdown points you straight to the relevant 30-second window in the audio for verification.
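The cross-interview search needs nothing beyond the standard library. A sketch, assuming the transcripts sit as .md files in one folder; `search_transcripts` and the folder path are illustrative:

```python
from pathlib import Path

def search_transcripts(folder: str, phrase: str) -> list[tuple[str, str]]:
    """Return (filename, matching line) pairs across every .md transcript."""
    hits = []
    for md in sorted(Path(folder).glob("*.md")):
        for line in md.read_text(encoding="utf-8").splitlines():
            # Case-insensitive match; the line carries its timestamp anchor along.
            if phrase.lower() in line.lower():
                hits.append((md.name, line.strip()))
    return hits

# Every mention of "affordable" across all interviews, timestamps intact:
for name, line in search_transcripts("Stories/2026-05-housing-crisis/interviews", "affordable"):
    print(f"{name}: {line}")
```

Because the matching lines keep their timestamp anchors, each hit doubles as a pointer back into the audio for verification.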

The working pattern: read the transcript, copy the quote into your draft, paste the timestamp anchor next to it as a comment for fact-checking. The fact-checker pulls up the audio, jumps to the timestamp, confirms the wording matches, and signs off. What used to take 20 minutes per quote takes 2.
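The anchor-as-comment pattern fits in one small function. A sketch: `quote_with_anchor` is a hypothetical helper, and the HTML-comment format is one convention among several (most Markdown renderers leave HTML comments out of the published page, so the fact-checker sees the anchor in the draft source but readers never do):

```python
def quote_with_anchor(quote: str, audio_file: str, seconds: int) -> str:
    """Format a quote for a draft with a hidden verification anchor."""
    # Convert the raw second count into the mm:ss a fact-checker can jump to.
    m, s = divmod(seconds, 60)
    return f'"{quote}"\n<!-- verify: {audio_file} @ {m:02d}:{s:02d} -->'

print(quote_with_anchor("We never approved that variance.",
                        "2026-04-28-mayor-johnson.mp3", 2832))
```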

Comparing with HappyScribe and TurboScribe

The honest competitive landscape: for most working reporters, mdisbetter or TurboScribe handle the daily volume, while HappyScribe earns its price on long, high-stakes recordings where you want a human in the loop.

Protected sources: the privacy question

Any cloud-based transcription service involves uploading audio to a third party. For routine interviews — public officials, on-record sources, expert commentary — this is fine. For protected sources, sensitive whistleblowers, or audio that could endanger someone if leaked, it is not.

The defensible answer for sensitive sources is to run transcription locally using open-source Whisper. The audio never leaves your machine. The model is OpenAI's open-weights Whisper (or one of its derivatives — faster-whisper, WhisperX) running on your laptop's CPU or GPU. Setup is straightforward:

import whisper

# Load the open-weights model locally; nothing is uploaded anywhere.
model = whisper.load_model("large-v3")
result = model.transcribe("sensitive-interview.mp3")

# Write one paragraph per segment, anchored with a coarse [Ns] timestamp.
with open("sensitive-interview.md", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        ts = f'[{seg["start"]:.0f}s]'
        f.write(f'{ts} {seg["text"].strip()}\n\n')

The large-v3 model is roughly 3 GB on disk and runs on a modern Mac (M1 or later) at near real-time on CPU, or 5-10x real-time on a consumer GPU. For sensitive material, this is the right tool. For everyday interviews, the cloud workflow is faster.

The technical background on how Whisper works under the hood is in how AI transcription actually works. The accuracy expectations by audio quality are in audio quality vs transcription accuracy.

Recording for transcription quality

The accuracy of any transcription engine — cloud or local — depends on the audio it gets. The reporting habits that help most: record in a quiet room when you can, keep the recorder close to the source rather than in the middle of the table, avoid speakerphone for remote interviews, and make a ten-second test recording before the questions start.

Cross-feature: when the source is a document, not audio

Beat reporting often involves source documents alongside interviews — leaked memos, government PDFs, court filings, web articles you're tracking. Convert these to Markdown too and store them next to your audio transcripts. See URL to Markdown for academic research for the parallel pattern on web sources, which works equivalently for journalism. The end result: a single source folder per story, with everything readable in plaintext, searchable from the command line, and ready to feed to AI for cross-source synthesis.

AI-assisted background research

Once your interview transcripts are Markdown, you can put them to work in ways the audio file never could. Drop a folder of three transcripts into Claude with prompts like:

This is the work that used to take a full day of re-reading and that now takes ten minutes. The catch: the AI can summarize and compare, but the final fact-checking — verifying that the quote you publish exactly matches what the source said — has to happen against the original audio. The transcript is the working tool; the recording is the source of record.

Fact-checking and the corrections drawer

Months after publication, a corrections request comes in. Did the source actually say X? Without a transcript, the answer is "let me find the audio file from May, scrub through 90 minutes, and call you back tomorrow." With a Markdown transcript: grep the file for X, find the timestamp, verify against audio, respond same day.

Newsrooms that have moved to systematic transcription as a workflow — not just for individual stories but as a default discipline — describe the same compounding benefit: every interview becomes a permanently searchable record. Three years in, the corpus is an institutional asset. A reporter joining the team can grep for prior coverage of any source, any topic, any specific phrase. The newsroom remembers more than any individual reporter does.

The investigative beat: long-form corpus building

For investigative reporters working a beat over months, the value of transcripts compounds. A reporter covering a single sprawling story — corporate misconduct, political corruption, institutional failure — typically conducts 30-100 interviews across the life of the project. Without transcripts, each interview is a discrete event whose substance lives in the reporter's head and in scrawled notebook pages. With transcripts, the entire interview corpus is a queryable archive that surfaces patterns no individual conversation reveals.

The corpus enables workflows that notebooks never could: searching every interview at once for a name or phrase, spotting where sources contradict one another, and reconstructing a timeline from statements scattered across months of conversations.

For the investigative reporter, the transcript corpus is a research tool that didn't exist before. The technical underpinnings of the search workflows are at building a searchable audio archive.
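One of the search workflows in miniature: counting mentions per transcript to see which sources keep returning to a topic. A stdlib sketch; `mentions_per_interview`, the folder name, and the search term are all illustrative:

```python
from collections import Counter
from pathlib import Path

def mentions_per_interview(folder: str, term: str) -> Counter:
    """Count occurrences of a term in each .md transcript in a folder."""
    counts = Counter()
    for md in Path(folder).glob("*.md"):
        # Case-insensitive count of non-overlapping occurrences per file.
        counts[md.name] = md.read_text(encoding="utf-8").lower().count(term.lower())
    return counts

# Which of the 30+ interviews mention the shell company most often?
for name, n in mentions_per_interview("interviews", "holdings llc").most_common(5):
    print(f"{n:3d}  {name}")
```

Over a 30-100 interview corpus, a skew in these counts is exactly the kind of pattern no single conversation reveals.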

The bottom line for daily reporting

Record interview → upload to audio-to-markdown → download structured .md → quote and verify → archive with original audio. For sensitive sources, swap the upload step for OSS Whisper running locally, so the audio never leaves your machine. For mass review of large interview corpora, see building a searchable audio archive. Total time saved per week of source interviews: typically 8-15 hours, depending on volume. Time recovered to do actual reporting.

Frequently asked questions

Can I trust the transcript to be accurate enough to quote from directly?
Trust the transcript for finding the quote and seeing approximately what was said; verify the exact wording against the original audio before publication. Modern AI transcription on clean audio is in the 95-99% accuracy range, but the 1-5% errors tend to cluster on proper nouns, technical terms, and overlapping speech — exactly the words you most need to get right. The workflow is: transcript for discovery and drafting, audio for final verification on every direct quote.
What's the legal risk of uploading interview audio to a cloud service?
For on-the-record interviews where consent was obtained on tape, the legal risk is low — the conversation isn't confidential. For off-the-record sources, anonymous tips, or recordings from jurisdictions with strict two-party-consent laws, the legal and ethical considerations point toward local-only processing. Run OSS Whisper on your own machine for any audio where the source's safety, your source-confidentiality obligation, or your legal exposure depends on the audio not leaving your device. For a public official's on-record statement, the cloud workflow is fine.
How do I handle interviews in a language other than English?
Modern AI transcription engines (including OSS Whisper) support roughly 100 languages with varying quality — strong on European and major Asian languages, weaker on lower-resource languages. For multilingual interviews where the source switches between languages, the transcription handles code-switching reasonably well at the segment level. For best results, transcribe with the model set to the dominant language and manually clean the second-language segments. For long-form coverage of non-English-speaking communities, consider pairing transcription with a separate Markdown-translation pass to produce a bilingual archive.