Why MarkdownNodeParser changes the math on video
Flat caption parsing on video loses everything that makes a long talk usable: where chapters begin and end, who is speaking on multi-host content, and what time range a given idea spans. MarkdownNodeParser reads the structure the converter emits (## chapter or speaker headings, ### subtopic headings) and builds a node tree that mirrors the video's real shape.
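To make the "node tree" concrete, here is a minimal standard-library sketch of heading-based nesting. It is not LlamaIndex's MarkdownNodeParser implementation. The SectionNode class, the build_tree and path_to helpers, and the sample transcript (including the time ranges in its chapter headings) are all hypothetical, chosen only to show how ## and ### headings turn into parent and child sections.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# ATX-style heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class SectionNode:  # hypothetical node type, not a LlamaIndex class
    title: str
    level: int  # 0 for the synthetic root, otherwise the number of '#'
    text: str = ""
    children: list[SectionNode] = field(default_factory=list)


def build_tree(markdown: str, root_title: str = "video") -> SectionNode:
    """Nest each heading section under the nearest shallower heading."""
    root = SectionNode(root_title, 0)
    stack = [root]  # path from the root to the section being filled
    body: list[str] = []

    def flush() -> None:
        # Body lines between two headings belong to the section on top of the stack.
        stack[-1].text = "\n".join(body).strip()
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        flush()
        level = len(match.group(1))
        # Close sections at the same or a deeper level before opening this one.
        while stack[-1].level >= level:
            stack.pop()
        node = SectionNode(match.group(2), level)
        stack[-1].children.append(node)
        stack.append(node)
    flush()
    return root


def path_to(node: SectionNode, needle: str, trail: tuple[str, ...] = ()) -> list[str] | None:
    """Heading path from the root to the first section whose body contains `needle`."""
    trail = trail + (node.title,)
    if needle in node.text:
        return list(trail)
    for child in node.children:
        found = path_to(child, needle, trail)
        if found is not None:
            return found
    return None


# Hypothetical converter output for a two-host talk.
sample = """# Keynote
## Chapter 1: Intro (00:00-04:10)
Host A welcomes everyone.
### Why retrieval matters
Flat captions lose structure.
## Chapter 2: Demo (04:10-12:30)
Host B runs the live demo.
"""

tree = build_tree(sample)
print(path_to(tree, "Flat captions"))
# → ['video', 'Keynote', 'Chapter 1: Intro (00:00-04:10)', 'Why retrieval matters']
```

The path printed for a single quote is exactly the context a flat caption chunk throws away: which talk, which chapter and time range, and which subtopic it sits under.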
Retrieval over that tree gets two superpowers immediately. An auto-merging retriever can climb from a specific quote to the full content of its chapter, and from there to the video's top-level structure. Hierarchical summary indexes can summarise per chapter, per speaker, or per time window without re-chunking.
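The climb can be sketched as a ratio rule: when more than half of a parent's children are retrieved, replace them with the parent, and repeat, so a hit can move from quote to chapter to the whole video. This is a simplified standard-library illustration, not LlamaIndex's AutoMergingRetriever; the node ids, the PARENT map, and the 0.5 threshold are assumptions. AutoMergingRetriever itself works from parent/child node relationships (the kind HierarchicalNodeParser records), so heading-based nodes may need those links supplied before the climb works.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical parent links: quote-level node ids -> chapter ids -> video root.
PARENT = {
    "ch1.q1": "ch1", "ch1.q2": "ch1", "ch1.q3": "ch1",
    "ch2.q1": "ch2", "ch2.q2": "ch2",
    "ch1": "video", "ch2": "video",
}


def children_of(parent_map: dict[str, str]) -> dict[str, set[str]]:
    """Invert child -> parent links into parent -> set of children."""
    kids: dict[str, set[str]] = defaultdict(set)
    for child, parent in parent_map.items():
        kids[parent].add(child)
    return kids


def auto_merge(retrieved: set[str], parent_map: dict[str, str], ratio: float = 0.5) -> set[str]:
    """Swap retrieved children for their parent while more than `ratio` of
    that parent's children are present; assumes the links form a tree."""
    kids = children_of(parent_map)
    current = set(retrieved)
    changed = True
    while changed:
        changed = False
        for parent, members in kids.items():
            hit = current & members
            if hit and len(hit) / len(members) > ratio:
                current -= hit
                current.add(parent)
                changed = True
    return current


# Two of chapter 1's three quotes merge into the chapter; one of two
# chapter 2 quotes is exactly 0.5, so it stays as a single quote.
print(sorted(auto_merge({"ch1.q1", "ch1.q2", "ch2.q1"}, PARENT)))
# → ['ch1', 'ch2.q1']
```

Requiring a strict majority keeps one stray quote from dragging in a whole chapter, while a dense cluster of hits across both chapters climbs all the way to "video".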
The workflow
Convert each video on Video to Markdown (from a YouTube URL or an uploaded file), save the resulting .md files into an ingestion directory, load them with SimpleDirectoryReader filtered to .md files, and parse them with MarkdownNodeParser. The same pattern works for podcast back-catalogues, conference archives, and course corpora.
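In LlamaIndex the loading step is SimpleDirectoryReader(input_dir=..., required_exts=[".md"]).load_data(), with the resulting documents passed to MarkdownNodeParser().get_nodes_from_documents(). The standard-library sketch below only mimics the filtering half of that pipeline so it runs without LlamaIndex installed. The load_markdown_dir helper, its record layout, and the file names are hypothetical, and it reads the top-level directory only.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def load_markdown_dir(input_dir: str | Path) -> list[dict[str, str]]:
    """Hypothetical helper: one {'file_name', 'text'} record per .md file in
    the top level of `input_dir`, sorted by name; other files are skipped."""
    docs = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix == ".md":
            docs.append({"file_name": path.name, "text": path.read_text(encoding="utf-8")})
    return docs


# Hypothetical ingestion directory: two converted talks plus a raw caption file.
with tempfile.TemporaryDirectory() as ingest:
    Path(ingest, "talk-01.md").write_text("# Talk 1\n## Chapter 1\nHello\n", encoding="utf-8")
    Path(ingest, "talk-02.md").write_text("# Talk 2\n## Speaker: Host B\nHi\n", encoding="utf-8")
    Path(ingest, "raw-captions.srt").write_text("1\n00:00:01,000 --> 00:00:02,000\nHello\n", encoding="utf-8")
    docs = load_markdown_dir(ingest)

print([d["file_name"] for d in docs])
# → ['talk-01.md', 'talk-02.md']
```

Filtering at load time keeps unstructured caption files out of the node set, so every document that reaches the parser carries the headings it needs.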