
Video to Markdown for Journalists — Video Sources as Text

Press conferences livestream on YouTube and then disappear into the algorithm. Video interviews on Zoom take 6-10 hours to transcribe manually. Broadcast footage you need to quote sits as an MP4 with no searchable text. None of this works on a deadline. Paste the URL or upload the video to mdisbetter and structured Markdown comes back in minutes: each speaker labelled, every quote timestamped to the video for verification, the whole thing greppable across your source archive.

Why this is hard without the right tool

  • Press conferences need fast transcription
  • Video interviews need verbatim quotes
  • Broadcast footage needs documentation
  • Deadlines leave no time to do any of it manually

Recommended workflow

  1. For press conferences livestreamed on YouTube / Twitch / Twitter: paste the URL into /convert/video-to-markdown as soon as the stream ends
  2. For Zoom interviews you recorded: download the recording, upload the MP4
  3. For broadcast footage, leaked video, social-media video evidence: upload the file directly
  4. Convert — minutes per hour of video, not hours per hour
  5. Download the Markdown: speakers labelled, quotes timestamped, structured as **Reporter:** / **Source:** exchanges
  6. Use ctrl-F to find quotes by keyword; jump to the timestamp in your video player to verify the verbatim wording AND the speaker's non-verbal cues before publication
  7. Build a personal source-archive folder of .md transcripts — searchable across every video source you've ever logged

Deadline workflow: from livestream end to filed story

Press conference ends at 3pm, deadline is 6pm. Old workflow: re-watch the recording at 1.5x for two hours pulling quotes manually, hit deadline with maybe four usable quotes from a 90-minute event. New workflow: paste the YouTube livestream URL into mdisbetter as the conference ends, get the structured transcript back in 5-10 minutes, ctrl-F for the topics relevant to your beat, pull twenty quotes with timestamps, verify each against the video before filing, hit deadline with depth. The transcription speed-up is the difference between covering one angle and covering five.

Verification discipline: never publish a quote you can't play back

The timestamps in the Markdown output ([12:34] next to each speaker turn) map back to the original video. Before any quote ships, jump to the timestamp in your video player and confirm both the verbatim wording AND the speaker's tone, expression, context. The transcript is a draft; the video is the source of truth. Treat the Markdown as a fast index into your video, not a replacement for it. This is the same discipline pre-AI tools required, just faster — you can verify 30 quotes in the time it used to take to transcribe one.
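The playback step can be sketched in shell. A minimal sketch, assuming mpv as the player; the `12:34` timestamp and `source.mp4` are hypothetical names, not output from the tool:

```shell
# Turn a [mm:ss] timestamp from the Markdown into a player seek.
ts="12:34"                                              # copied from "[12:34]" in the transcript
secs=$(echo "$ts" | awk -F: '{ print $1 * 60 + $2 }')   # mm:ss -> seconds
echo "$secs"                                            # prints 754
# mpv --start="$secs" source.mp4    # assumes mpv is installed; it also accepts --start=12:34
# For [hh:mm:ss] stamps, extend the awk to $1*3600 + $2*60 + $3.
```

The same seek works in VLC with `--start-time="$secs"`; the point is that a plain-text timestamp is enough to put your eyes back on the source before a quote ships.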

Multi-source video story workflow

When a story pulls from a press conference, three video interviews, and broadcast footage — six video sources across two weeks — an Obsidian vault of .md transcripts becomes a research workspace. Cross-reference quotes from different sources by topic. Build a timeline of who said what when. Use ripgrep across the whole vault to find every video source that mentioned a particular policy. None of this is possible with video files in a folder; all of it falls out for free once the transcripts are Markdown.
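The vault search falls out of plain-text tooling. A minimal sketch using `grep`, which is available everywhere (ripgrep's `rg` is a faster drop-in); the folder layout and file names are hypothetical examples:

```shell
# Build a toy vault of transcript files to search across.
mkdir -p vault
printf '[04:10] **Source:** the supply chain is the bottleneck\n' > vault/2024-03-01-presser.md
printf '[18:22] **Source:** no comment on the supply chain\n'     > vault/2024-03-12-interview.md
printf '[02:05] **Source:** budget questions only today\n'        > vault/2024-03-15-presser.md

grep -ril 'supply chain' vault    # -l: list every transcript that mentions the phrase
grep -rin 'supply chain' vault    # -n: show the timestamped lines for playback
```

Date-prefixed filenames mean the `-l` output doubles as a rough timeline of who said what when; the `-n` output gives you the `[mm:ss]` stamps to jump to in the player.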

Privacy note for protected sources

For genuinely sensitive video sources — whistleblower video, off-the-record video interviews, leaked footage where any cloud upload is a serious risk — DO NOT use the mdisbetter web tool. Run whisper or faster-whisper entirely offline on your laptop after extracting audio from the video with ffmpeg locally. Same accuracy, zero network egress, no cloud-side processing. The web tool is the right speed/convenience tradeoff for the 90% of video sources where the source isn't at risk; for the 10% where it is, the OSS path keeps everything on your own hardware. This matters more for video than audio because video can identify a source's appearance, location, surroundings — much harder to anonymise than voice.

Cross-link to PDF source documents and webpages

Most investigative stories pull from PDFs (court filings, leaked memos) and webpages (press releases, archived posts) alongside video sources. Convert PDFs with /convert/pdf-to-markdown and store them alongside your video transcripts. Same vault, same searchable corpus, video quotes and document quotes side by side, all greppable. Format consistency across source types is what makes long investigations actually navigable.

Frequently asked questions

How fast can I get a press conference transcript on deadline?
For a 90-minute press conference livestream on YouTube: paste the URL into <a href="/convert/video-to-markdown">/convert/video-to-markdown</a> as the stream ends, conversion typically completes in 5-15 minutes (depends on current load). Total elapsed time from livestream end to ready-to-grep transcript: under 20 minutes. For uploaded MP4 files, similar timing — minutes per hour of video, not hours per hour. The deadline math is dramatically different from manual transcription where 90 minutes of conference = 6-10 hours of transcription work.
How accurate are verbatim quotes from press conferences and interviews?
Word-error-rate is typically 3-8% on clean recordings (single mic, native English speaker, quiet room). For political press conferences with podium audio, expect 4-7% WER. For Zoom interviews with decent connection, 3-6% WER. For broadcast footage from professional production, 2-5% WER. For phone-quality video calls or noisy press scrums, expect 8-15% WER. ALWAYS verify the verbatim wording against the video at the timestamp before publishing — the transcript is a fast index, the video is the source of truth.
Can I transcribe a YouTube livestream that's still live?
No — wait until the livestream has ended and YouTube has processed it into a regular video. Once the URL points to a finished video (rather than a live stream), paste it into <a href="/convert/video-to-markdown">/convert/video-to-markdown</a> for conversion. Most major news organisations' press conferences become regular YouTube videos within minutes of the livestream ending. For real-time live transcription during the stream, mdisbetter is the wrong tool — use a live captioning service (Otter Live, AssemblyAI streaming) for that.
How do I handle protected sources where cloud processing is a risk?
For genuinely sensitive video sources (whistleblowers, off-the-record interviews, leaked footage where cloud upload is a serious risk), DO NOT use the mdisbetter web tool. Run <a href="https://github.com/openai/whisper">whisper</a> or <a href="https://github.com/SYSTRAN/faster-whisper">faster-whisper</a> entirely offline on your own laptop. Workflow: extract audio from video with ffmpeg locally (<code>ffmpeg -i source.mp4 -vn -acodec mp3 audio.mp3</code>), then run whisper on the audio file offline. Same accuracy as the web tool, zero network egress. Use the web tool for the 90% of routine video sources where this isn't a concern; use the OSS local path for the 10% where it is.
Can I search across years of past video interviews and press conferences?
Yes — once each video source is a <code>.md</code> file, ripgrep / Obsidian search / Notion search all work across the whole archive. Search a politician's name across 3 years of press conferences and you get every appearance, with timestamps to play back the video. Search "supply chain" across your beat's video sources and find every time a source touched the topic. This kind of cross-source recall was effectively impossible when video transcripts didn't exist or lived in proprietary apps.

Try the tool free →