Browse all technical articles on the MDisBetter blog.
Signal-to-noise ratio, microphone choice (USB headsets to SM7B), room treatment, and pre-processing — what actually moves transcription accuracy from 70% to 99%.
Architecture for migrating thousands of Word documents to Markdown at enterprise scale. Audit, categorise, prioritise, batch-convert with Pandoc CLI, quality-check, organise, publish. Real bash and Python snippets, realistic timelines.
Engineering retrospective: the architecture decisions, the failure modes we hit, the accuracy improvements that actually moved the needle.
Decades of voicemails, meetings, podcasts, and interviews, unindexed and unsearchable. Convert everything to Markdown, organize by date and speaker, search with ripgrep or Obsidian, and optionally embed for semantic retrieval. Includes local Whisper batch script.
Practical guide: identify video sources, transcribe (web tool for one-offs, local yt-dlp + Whisper for batch), organize with frontmatter metadata, full-text search with ripgrep or Obsidian, optional semantic search.
End-to-end architecture for converting web sources into a queryable AI knowledge base. Source identification, conversion, chunking, embedding, vector storage, and update strategy, with code and tool recommendations.
Static fetch vs headless browser, Playwright/Puppeteer mechanics, wait conditions, performance and cost tradeoffs. How modern URL-to-Markdown tools handle JS-rendered SPAs.
Technical deep dive: from HMM-era speech recognition through encoder-decoder transformers and Whisper's 680k-hour training set, with notes on why structured Markdown output matters for downstream LLM use.
Technical deep dive: a .docx file is a ZIP archive of XML files. Walk through document.xml, styles.xml, and the OOXML structure, and see why naive text extraction loses heading semantics and why styles.xml is the secret to good Word-to-Markdown conversion.
A technical deep dive into the PDF file format: content streams, glyph positioning, why extraction is lossy, and what this means for AI workflows.
Technical deep dive into YouTube's caption system: auto-generated ASR vs creator-uploaded tracks, why auto-captions are unreliable, and why fresh AI re-transcription beats them on accuracy.
Technical deep dive: DOM parsing, tree-walking, element-by-element conversion rules, and why naive html2text falls short on modern web pages.
Technical comparison of the three approaches to Word-to-Markdown conversion: Mammoth.js (semantic, JS library), Pandoc (structural, multi-format CLI), and AI-powered (context-aware). When to use each, with realistic accuracy and tradeoff numbers.
We measured token counts for HTML and Markdown versions of 5 representative web pages with tiktoken. Markdown saves 60-85% of tokens. GPT-4o cost math included.
Side-by-side: what plain-text transcripts lose, what Markdown preserves (speakers, sections, timestamps, emphasis), and the measurable LLM-extraction quality difference between the two formats.
Side-by-side comparison: SRT and VTT are subtitle formats for video player display; plain text is unstructured; Markdown gives you structure plus readability plus AI-readiness. When to use each.
Three chunking strategies for RAG pipelines: header-based, token-based, paragraph-based. When each wins, with code examples and evaluation metrics.
Diarization explained: pyannote.audio vs proprietary engines, accuracy by speaker count, when it fails, and how Markdown represents multiple speakers cleanly.
Methodology and results from a 20-document benchmark measuring token usage on raw PDF vs Markdown for ChatGPT, Claude, and Gemini. With cost implications.
Technical guide: integrate video content into RAG. Pipeline = video → transcript → Markdown → chunk by H2/H3 → embed → vector DB → retrieve. Multi-hour content handling, parent-document linking, real Python with sentence-transformers + ChromaDB.
Technical deep dive: how diarization combines visual cues (face tracking, lip detection) with audio signals to label speakers in video. Realistic accuracy by speaker count and failure modes.
Technical deep dive on the main-content extraction problem. Mozilla Readability, Trafilatura, and LLM-based extraction compared: strengths, weaknesses, and when to use each.
Technical deep dive on table conversion: Word's table model supports nested tables, merged cells, multi-row headers, and complex spans. Markdown's table model is flat rows-and-columns. What's possible, what breaks, and the best-effort strategies to bridge the gap.