Audio to Markdown for Journalists: Transcribe Sources Fast
You finished a 90-minute interview at 4 PM. Your editor wants the piece by morning. The quote you need is somewhere around the 47-minute mark — at least, you think so. The classic journalism move at this point is to scrub through the audio file, scribble notes, and hope you find the line before midnight. Converting the recording to a speaker-labeled Markdown transcript turns that ninety-minute hunt into a Ctrl-F. The transcript becomes the working document — searchable, quotable, and durable enough to defend the quote in fact-checking three weeks later.
Why audio is journalism's slowest workflow
Reporting from a single source interview involves four distinct steps, each of which consumes real time:
- Recording: 30-120 minutes of audio captured
- Listening: replaying the audio to find quotes, usually at 1.5x or 2x speed
- Transcribing: typing what was said, either fully or just the quotes you'll use
- Verifying: confirming the quote against the recording before publication, then again during legal review
For a reporter doing 3-5 source interviews per week, the listening and transcribing time is the bottleneck. A 90-minute interview takes a fast typist 4-6 hours to transcribe by hand. Most reporters skip this and instead listen-and-quote, which works but loses the searchability that fact-checking and follow-up reporting both rely on.
An AI-generated Markdown transcript collapses listen+transcribe into a single step measured in minutes, and produces an artifact that survives the article's lifecycle. The quote you used in last month's piece is still searchable when a corrections request comes in this month.
The interview workflow
The reliable pipeline:
- Record the interview (your phone's voice memo app, a portable recorder, or Zoom's local recording feature)
- Get consent on the record — every state has its own law, but on-the-record consent at the start of the recording is universally good practice and increasingly an editorial requirement at major outlets
- Upload the recording to audio-to-markdown
- Download the .md file with speaker labels and timestamp anchors
- Read and quote directly from the transcript — searchable, scannable, copyable
- Archive the original audio and the transcript together in your source folder, indexed by date and source name
The Markdown is your working document. The audio remains the source of record — for fact-checking, for legal review, for any subsequent dispute about what was actually said.
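The archive step (step 6) can be as simple as a couple of shell commands. The story path and filenames below are hypothetical stand-ins, and `touch` creates empty placeholder files so the example runs as-is:

```shell
# Hypothetical story folder and placeholder files for the archive step
STORY="Stories/2026-05-housing-crisis/interviews"
mkdir -p "$STORY"
touch 2026-04-28-mayor-johnson.mp3 2026-04-28-mayor-johnson.md

# Keep audio and transcript together, indexed by date and source name
mv 2026-04-28-mayor-johnson.mp3 2026-04-28-mayor-johnson.md "$STORY"/
ls "$STORY"
```

The point of the naming convention is that the `.mp3` and `.md` sort next to each other and next to every other interview for the same story.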
Quote extraction and verification
The structured transcript makes the quote workflow fast. A typical source folder for a single story looks like:
```
Stories/
  2026-05-housing-crisis/
    interviews/
      2026-04-28-mayor-johnson.mp3
      2026-04-28-mayor-johnson.md
      2026-04-30-developer-smith.mp3
      2026-04-30-developer-smith.md
      2026-05-02-tenant-rivera.mp3
      2026-05-02-tenant-rivera.md
    background/
    drafts/
```

When you sit down to draft, every quote is one Ctrl-F away in the transcript folder. Need every time the mayor mentioned "affordable" across all three interviews you did with her? Grep across the .md files. Need to confirm a specific phrase before sending to fact-checking? The timestamp in the Markdown points you straight to the relevant 30-second window in the audio for verification.
The working pattern: read the transcript, copy the quote into your draft, paste the timestamp anchor next to it as a comment for fact-checking. The fact-checker pulls up the audio, jumps to the timestamp, confirms the wording matches, and signs off. What used to take 20 minutes per quote takes 2.
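The grep step itself is one command. In this sketch the transcript line is a hypothetical stand-in, created inline so the commands run as-is:

```shell
# Create a hypothetical transcript line so the example is self-contained
mkdir -p interviews
printf '[2820s] **Mayor Johnson:** We need more affordable units downtown.\n' \
  > interviews/2026-04-28-mayor-johnson.md

# -i: case-insensitive; -n: line numbers, so the hit is easy to relocate
grep -in "affordable" interviews/*.md
```

Every hit comes back with its timestamp anchor attached, so the jump from search result to audio verification is immediate.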
Comparing with HappyScribe and TurboScribe
The honest competitive landscape:
- HappyScribe: paid service with human-quality transcription tiers (machine + human-corrected). Best accuracy, slower turnaround, costs $0.20-2.00 per audio minute depending on tier. Right choice when accuracy is critical and budget allows — investigative pieces with multi-hour interviews where errors are unacceptable.
- TurboScribe: AI-only transcription, fast, has a free tier. Comparable accuracy to mdisbetter.com on most audio. Output is plain text with optional speaker labels and timestamps.
- mdisbetter.com: free tier, AI-only, output is structured Markdown (speaker labels + H2 sections + timestamps). Differentiator is the structure — the .md file is more useful as input to downstream AI workflows than a plain-text transcript.
- Otter.ai: real-time transcription during the interview, with collaborative review tools. Useful if you need the transcript live; less useful if you record first and transcribe later.
For most working reporters, mdisbetter or TurboScribe handle the daily volume. HappyScribe earns its price on long, high-stakes recordings where you want a human in the loop.
Protected sources: the privacy question
Any cloud-based transcription service involves uploading audio to a third party. For routine interviews — public officials, on-record sources, expert commentary — this is fine. For protected sources, sensitive whistleblowers, or audio that could endanger someone if leaked, it is not.
The defensible answer for sensitive sources is to run transcription locally using open-source Whisper. The audio never leaves your machine. The model is OpenAI's open-weights Whisper (or one of its derivatives — faster-whisper, WhisperX) running on your laptop's CPU or GPU. Setup is straightforward:
```python
# Transcribe locally so the audio never leaves your machine
import whisper

model = whisper.load_model("large-v3")  # downloads the model on first run
result = model.transcribe("sensitive-interview.mp3")

# Write a minimal Markdown transcript with per-segment timestamp anchors
with open("sensitive-interview.md", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        ts = f'[{seg["start"]:.0f}s]'
        f.write(f'{ts} {seg["text"].strip()}\n\n')
```

The large-v3 model is roughly 3 GB on disk and runs on a modern Mac (M1 or later) at near real-time on CPU, or 5-10x real-time on a consumer GPU. For sensitive material, this is the right tool. For everyday interviews, the cloud workflow is faster.
The technical background on how Whisper works under the hood is in how AI transcription actually works. The accuracy expectations by audio quality are in audio quality vs transcription accuracy.
Recording for transcription quality
The accuracy of any transcription engine — cloud or local — depends on the audio it gets. A few practical reporting tips:
- Use a real microphone when you can. A $30 USB lavalier outperforms a phone speaker mic by 10-20% on word error rate.
- Record to two devices for important interviews. Phone in pocket as backup, dedicated recorder on the table.
- Quiet rooms beat coffee shops. Background music and HVAC are the two biggest accuracy killers.
- Speak in clear turns. Crosstalk and interruption confuse speaker diarization. Polite "sorry, you go ahead" beats simultaneous talking.
- State the date and source name on the recording at the start. The transcript will start with this and you'll never lose track of which file is which.
Cross-feature: when the source is a document, not audio
Beat reporting often involves source documents alongside interviews — leaked memos, government PDFs, court filings, web articles you're tracking. Convert these to Markdown too and store them next to your audio transcripts. See URL to Markdown for academic research for the parallel pattern on web sources; it applies equally to journalism. The end result: a single source folder per story, with everything readable in plaintext, searchable from the command line, and ready to feed to AI for cross-source synthesis.
AI-assisted background research
Once your interview transcripts are Markdown, you can put them to work in ways the audio file never could. Drop a folder of three transcripts into Claude with prompts like:
- "Across these three interviews on the same topic, where do the sources agree? Where do they contradict each other?"
- "Identify every factual claim in this transcript and rate how verifiable each one is."
- "Pull every quote from Source X relevant to the question of [specific issue]."
This is the work that used to take a full day of re-reading and that now takes ten minutes. The catch: the AI can summarize and compare, but the final fact-checking — verifying that the quote you publish exactly matches what the source said — has to happen against the original audio. The transcript is the working tool; the recording is the source of record.
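Mechanically, "drop a folder of transcripts" is just prompt assembly. The helper below is a hypothetical sketch; it builds the combined prompt and leaves the actual model call to whichever client you use:

```python
from pathlib import Path

def build_synthesis_prompt(folder, question):
    """Concatenate every transcript in a story folder into one prompt.

    Hypothetical helper: the question goes first, then each .md transcript
    under a filename header. Paste the result into your AI tool of choice.
    """
    parts = [question, ""]
    for md in sorted(Path(folder).glob("*.md")):
        parts.append(f"--- {md.name} ---")
        parts.append(md.read_text(encoding="utf-8"))
    return "\n".join(parts)
```

The filename headers matter: they let the model attribute claims back to a specific interview, which is what makes the cross-source comparison prompts work.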
Fact-checking and the corrections drawer
Months after publication, a corrections request comes in. Did the source actually say X? Without a transcript, the answer is "let me find the audio file from May, scrub through 90 minutes, and call you back tomorrow." With a Markdown transcript: grep the file for X, find the timestamp, verify against audio, respond same day.
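That grep-then-timestamp step can be sketched in a few lines of Python. This is a hypothetical helper, assuming the `[NNNNs]` anchor format used in the transcripts above:

```python
import re

def find_quote(transcript, phrase):
    """Return (seconds, line) pairs for every line containing the phrase.

    Hypothetical helper for the corrections workflow: assumes each line
    starts with a timestamp anchor like '[2820s]'. Seconds is None if a
    matching line has no anchor.
    """
    hits = []
    for line in transcript.splitlines():
        if phrase.lower() in line.lower():
            m = re.match(r"\[(\d+)s\]", line.strip())
            hits.append((int(m.group(1)) if m else None, line.strip()))
    return hits
```

The returned seconds value is exactly where to seek in the original audio to verify the wording.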
Newsrooms that have moved to systematic transcription as a workflow — not just for individual stories but as a default discipline — describe the same compounding benefit: every interview becomes a permanently searchable record. Three years in, the corpus is an institutional asset. A reporter joining the team can grep for prior coverage of any source, any topic, any specific phrase. The newsroom remembers more than any individual reporter does.
The investigative beat: long-form corpus building
For investigative reporters working a beat over months, the value of transcripts compounds. A reporter covering a single sprawling story — corporate misconduct, political corruption, institutional failure — typically conducts 30-100 interviews across the life of the project. Without transcripts, each interview is a discrete event whose substance lives in the reporter's head and in scrawled notebook pages. With transcripts, the entire interview corpus is a queryable archive that surfaces patterns no individual conversation reveals.
Three workflows the corpus enables:
- Cross-source verification: source A made a specific claim about an event; sources B and C independently described the same event. Grep across all transcripts for the date or detail in question, find every source's account, compare for consistency.
- Inconsistency detection: a source said one thing in March and a different thing in August. Searchable transcripts make this trivial to spot; without them, the reporter would have to remember, months later, that the same source had addressed the same topic differently.
- Source mapping: who told you about whom? An archive of transcripts surfaces the social graph of the story — which sources mentioned which other people, what relationships exist that weren't obvious from any one interview.
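The source-mapping workflow can be sketched as a small script. This is a hypothetical helper: `corpus_dir` holds the story's .md transcripts, and `names` is the list of people you are tracking:

```python
from pathlib import Path

def source_mentions(corpus_dir, names):
    """Map each name to the transcripts that mention it.

    A sketch of the source-mapping workflow: a crude substring match
    across every .md file in the corpus, recursively.
    """
    mentions = {name: [] for name in names}
    for md in sorted(Path(corpus_dir).rglob("*.md")):
        text = md.read_text(encoding="utf-8").lower()
        for name in names:
            if name.lower() in text:
                mentions[name].append(md.name)
    return mentions
```

Even this crude version surfaces the story's social graph: which interviews mention which people, and which names only ever appear secondhand.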
For the investigative reporter, the transcript corpus is a research tool that didn't exist before. The technical underpinnings of the search workflows are at building a searchable audio archive.
The bottom line for daily reporting
Record interview → upload to audio-to-markdown → download structured .md → quote and verify → archive with original audio. For sensitive sources, swap step 2 for OSS Whisper locally so the audio never leaves your machine. For mass review of large interview corpora, see building a searchable audio archive. Total time saved per week of source interviews: typically 8-15 hours, depending on volume. Time recovered to do actual reporting.