9 min read · MDisBetter

Audio to Markdown for Journalists: Transcribe Sources Fast

You finished a 90-minute interview at 4 PM. Your editor wants the piece by morning. The quote you need is somewhere around the 47-minute mark — at least, you think so. The classic journalism workflow at this point is to scrub the audio file, scribble notes, and hope you find the line before midnight. Converting the recording to a speaker-labeled Markdown transcript turns that ninety-minute hunt into a Ctrl-F. The transcript becomes the working document — searchable, quotable, and durable enough to defend the quote in fact-checking three weeks later.

Why audio is journalism's slowest workflow

Reporting on a single source interview involves several distinct steps (recording, listening back, transcribing, quoting, verifying), each of which consumes real time.

For a reporter doing 3-5 source interviews per week, the listening and transcribing time is the bottleneck. A 90-minute interview takes a fast typist 4-6 hours to transcribe by hand. Most reporters skip this and instead listen-and-quote, which works but loses the searchability that fact-checking and follow-up reporting both rely on.

An AI-generated Markdown transcript collapses listen+transcribe into a single step measured in minutes, and produces an artifact that survives the article's lifecycle. The quote you used in last month's piece is still searchable when a corrections request comes in this month.

The interview workflow

The reliable pipeline:

  1. Record the interview (your phone's voice memo app, a portable recorder, or Zoom's local recording feature)
  2. Get consent on the record — every state has its own law, but on-the-record consent at the start of the recording is universally good practice and increasingly an editorial requirement at major outlets
  3. Upload the recording to audio-to-markdown
  4. Download the .md file with speaker labels and timestamp anchors
  5. Read and quote directly from the transcript — searchable, scannable, copyable
  6. Archive the original audio and the transcript together in your source folder, indexed by date and source name

The Markdown is your working document. The audio remains the source of record — for fact-checking, for legal review, for any subsequent dispute about what was actually said.
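The date-and-source naming in the archive step can be scripted so every file sorts chronologically. A minimal sketch, assuming the `YYYY-MM-DD-source-name` convention; the `transcript_name` helper is hypothetical:

```python
import re
from datetime import date

def transcript_name(interview_date: date, source: str, ext: str = "md") -> str:
    """Build a sortable 'YYYY-MM-DD-source-name' filename for the archive."""
    # Lowercase the source name and collapse anything non-alphanumeric to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", source.lower()).strip("-")
    return f"{interview_date.isoformat()}-{slug}.{ext}"

print(transcript_name(date(2026, 4, 28), "Mayor Johnson"))
# 2026-04-28-mayor-johnson.md
```

Name the audio and the transcript identically apart from the extension, and the pair stays adjacent in every directory listing.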

Quote extraction and verification

The structured transcript makes quote workflow fast. A typical source folder for a single story looks like:

Stories/
  2026-05-housing-crisis/
    interviews/
      2026-04-28-mayor-johnson.mp3
      2026-04-28-mayor-johnson.md
      2026-04-30-developer-smith.mp3
      2026-04-30-developer-smith.md
      2026-05-02-tenant-rivera.mp3
      2026-05-02-tenant-rivera.md
    background/
    drafts/

When you sit down to draft, every quote is one Ctrl-F away in the transcript folder. Need every time the mayor mentioned "affordable" across all three interviews you did with her? Grep across the .md files. Need to confirm a specific phrase before sending to fact-checking? The timestamp in the Markdown points you straight to the relevant 30-second window in the audio for verification.
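The cross-interview search needs nothing beyond the standard library. A sketch, assuming the transcripts sit as .md files in one folder; `search_transcripts` and the folder path are illustrative:

```python
from pathlib import Path

def search_transcripts(folder: str, phrase: str) -> list[tuple[str, str]]:
    """Return (filename, matching line) pairs across every .md transcript."""
    hits = []
    for md in sorted(Path(folder).glob("*.md")):
        for line in md.read_text(encoding="utf-8").splitlines():
            # Case-insensitive match; the line carries its timestamp anchor along.
            if phrase.lower() in line.lower():
                hits.append((md.name, line.strip()))
    return hits

# Every mention of "affordable" across all interviews, timestamps intact:
for name, line in search_transcripts("Stories/2026-05-housing-crisis/interviews", "affordable"):
    print(f"{name}: {line}")
```

Because the matching lines keep their timestamp anchors, each hit doubles as a pointer back into the audio for verification.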

The working pattern: read the transcript, copy the quote into your draft, paste the timestamp anchor next to it as a comment for fact-checking. The fact-checker pulls up the audio, jumps to the timestamp, confirms the wording matches, and signs off. What used to take 20 minutes per quote takes 2.
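The anchor-as-comment pattern fits in one small function. A sketch: `quote_with_anchor` is a hypothetical helper, and the HTML-comment format is one convention among several (most Markdown renderers leave HTML comments out of the published page, so the fact-checker sees the anchor in the draft source but readers never do):

```python
def quote_with_anchor(quote: str, audio_file: str, seconds: int) -> str:
    """Format a quote for a draft with a hidden verification anchor."""
    # Convert the raw second count into the mm:ss a fact-checker can jump to.
    m, s = divmod(seconds, 60)
    return f'"{quote}"\n<!-- verify: {audio_file} @ {m:02d}:{s:02d} -->'

print(quote_with_anchor("We never approved that variance.",
                        "2026-04-28-mayor-johnson.mp3", 2832))
```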

Comparing with HappyScribe and TurboScribe

The honest competitive landscape: for most working reporters, mdisbetter or TurboScribe handle the daily volume, while HappyScribe earns its price on long, high-stakes recordings where you want a human in the loop.

Protected sources: the privacy question

Any cloud-based transcription service involves uploading audio to a third party. For routine interviews — public officials, on-record sources, expert commentary — this is fine. For protected sources, sensitive whistleblowers, or audio that could endanger someone if leaked, it is not.

The defensible answer for sensitive sources is to run transcription locally using open-source Whisper. The audio never leaves your machine. The model is OpenAI's open-weights Whisper (or one of its derivatives — faster-whisper, WhisperX) running on your laptop's CPU or GPU. Setup is straightforward:

import whisper

# Load the open-weights model locally; nothing is uploaded anywhere.
model = whisper.load_model("large-v3")
result = model.transcribe("sensitive-interview.mp3")

# Write one paragraph per segment, anchored with a coarse [Ns] timestamp.
with open("sensitive-interview.md", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        ts = f'[{seg["start"]:.0f}s]'
        f.write(f'{ts} {seg["text"].strip()}\n\n')

The large-v3 model is roughly 3 GB on disk and runs on a modern Mac (M1 or later) at near real-time on CPU, or 5-10x real-time on a consumer GPU. For sensitive material, this is the right tool. For everyday interviews, the cloud workflow is faster.

The technical background on how Whisper works under the hood is in how AI transcription actually works. The accuracy expectations by audio quality are in audio quality vs transcription accuracy.

Recording for transcription quality

The accuracy of any transcription engine — cloud or local — depends on the audio it gets. The reporting habits that help most: record in a quiet room when you can, keep the recorder close to the source rather than in the middle of the table, avoid speakerphone for remote interviews, and make a ten-second test recording before the questions start.

Cross-feature: when the source is a document, not audio

Beat reporting often involves source documents alongside interviews — leaked memos, government PDFs, court filings, web articles you're tracking. Convert these to Markdown too and store them next to your audio transcripts. See URL to Markdown for academic research for the parallel pattern on web sources, which works equivalently for journalism. The end result: a single source folder per story, with everything readable in plaintext, searchable from the command line, and ready to feed to AI for cross-source synthesis.

AI-assisted background research

Once your interview transcripts are Markdown, you can put them to work in ways the audio file never could. Drop a folder of three transcripts into Claude with prompts like:

This is the work that used to take a full day of re-reading and that now takes ten minutes. The catch: the AI can summarize and compare, but the final fact-checking — verifying that the quote you publish exactly matches what the source said — has to happen against the original audio. The transcript is the working tool; the recording is the source of record.

Fact-checking and the corrections drawer

Months after publication, a corrections request comes in. Did the source actually say X? Without a transcript, the answer is "let me find the audio file from May, scrub through 90 minutes, and call you back tomorrow." With a Markdown transcript: grep the file for X, find the timestamp, verify against audio, respond same day.

Newsrooms that have moved to systematic transcription as a workflow — not just for individual stories but as a default discipline — describe the same compounding benefit: every interview becomes a permanently searchable record. Three years in, the corpus is an institutional asset. A reporter joining the team can grep for prior coverage of any source, any topic, any specific phrase. The newsroom remembers more than any individual reporter does.

The investigative beat: long-form corpus building

For investigative reporters working a beat over months, the value of transcripts compounds. A reporter covering a single sprawling story — corporate misconduct, political corruption, institutional failure — typically conducts 30-100 interviews across the life of the project. Without transcripts, each interview is a discrete event whose substance lives in the reporter's head and in scrawled notebook pages. With transcripts, the entire interview corpus is a queryable archive that surfaces patterns no individual conversation reveals.

The corpus enables workflows that notebooks never could: searching every interview at once for a name or phrase, spotting where sources contradict one another, and reconstructing a timeline from statements scattered across months of conversations.

For the investigative reporter, the transcript corpus is a research tool that didn't exist before. The technical underpinnings of the search workflows are at building a searchable audio archive.
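One of the search workflows in miniature: counting mentions per transcript to see which sources keep returning to a topic. A stdlib sketch; `mentions_per_interview`, the folder name, and the search term are all illustrative:

```python
from collections import Counter
from pathlib import Path

def mentions_per_interview(folder: str, term: str) -> Counter:
    """Count occurrences of a term in each .md transcript in a folder."""
    counts = Counter()
    for md in Path(folder).glob("*.md"):
        # Case-insensitive count of non-overlapping occurrences per file.
        counts[md.name] = md.read_text(encoding="utf-8").lower().count(term.lower())
    return counts

# Which of the 30+ interviews mention the shell company most often?
for name, n in mentions_per_interview("interviews", "holdings llc").most_common(5):
    print(f"{n:3d}  {name}")
```

Over a 30-100 interview corpus, a skew in these counts is exactly the kind of pattern no single conversation reveals.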

The bottom line for daily reporting

Record interview → upload to audio-to-markdown → download structured .md → quote and verify → archive with original audio. For sensitive sources, swap the upload step for OSS Whisper running locally, so the audio never leaves your machine. For mass review of large interview corpora, see building a searchable audio archive. Total time saved per week of source interviews: typically 8-15 hours, depending on volume. Time recovered to do actual reporting.

Frequently asked questions

Can I trust the transcript to be accurate enough to quote from directly?
Trust the transcript for finding the quote and seeing approximately what was said; verify the exact wording against the original audio before publication. Modern AI transcription on clean audio is in the 95-99% accuracy range, but the 1-5% errors tend to cluster on proper nouns, technical terms, and overlapping speech — exactly the words you most need to get right. The workflow is: transcript for discovery and drafting, audio for final verification on every direct quote.
What's the legal risk of uploading interview audio to a cloud service?
For on-the-record interviews where consent was obtained on tape, the legal risk is low — the conversation isn't confidential. For off-the-record sources, anonymous tips, or recordings from jurisdictions with strict two-party-consent laws, the legal and ethical considerations point toward local-only processing. Run OSS Whisper on your own machine for any audio where the source's safety, your source-confidentiality obligation, or your legal exposure depends on the audio not leaving your device. For a public official's on-record statement, the cloud workflow is fine.
How do I handle interviews in a language other than English?
Modern AI transcription engines (including OSS Whisper) support roughly 100 languages with varying quality — strong on European and major Asian languages, weaker on lower-resource languages. For multilingual interviews where the source switches between languages, the transcription handles code-switching reasonably well at the segment level. For best results, transcribe with the model set to the dominant language and manually clean the second-language segments. For long-form coverage of non-English-speaking communities, consider pairing transcription with a separate Markdown-translation pass to produce a bilingual archive.