Native video input is convenient but opaque
Gemini's native video path is excellent for one-off "what is this video about" questions. It is the wrong tool when you want to: (a) keep a reusable transcript, (b) hand-correct extraction errors, (c) feed the same video to multiple downstream prompts without paying the video-processing tokens again each time, or (d) cross-reference the video against text documents in the same conversation. For all of those, Markdown is the right primitive.
Convert once on Video to Markdown (paste a YouTube URL or upload your MP4), download the .md, and feed that file to Gemini. The 1M-token window means you can fit several hours of structured transcript alongside the related slide deck (PDF) and the project brief (URL), each converted to Markdown, so Gemini can cross-reference the whole source set as one unified context.
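Before attaching several files, it can help to check roughly whether they fit in the window. The sketch below is a minimal estimate, not Gemini's tokenizer: the 4-characters-per-token ratio and the 50,000-token reserve are assumptions chosen for illustration, and the function names are hypothetical. For an exact figure, use the token-counting facility of the API you call.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # window size quoted in the text
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not Gemini's real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate derived from character count (heuristic only)."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_window(texts: list[str], reserve_tokens: int = 50_000) -> bool:
    """Return True if all texts together still leave `reserve_tokens`
    free for the question and the model's reply."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_tokens <= CONTEXT_WINDOW_TOKENS
```

In practice you would read each downloaded .md file into a string and pass the list to `fits_in_window` before starting the conversation.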
AI Studio and Vertex workflow
Both AI Studio and Vertex accept multiple .md attachments per conversation. Pattern: convert the conference talk video, attach the transcript, also attach the PDF version of the slides (PDF to Markdown for Gemini) and the speaker's blog post (URL to Markdown for Gemini). Ask Gemini to verify whether the spoken claims match the slides and the blog. The 1M window is finally doing useful work.
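If you assemble the context in code rather than through attachments, the same pattern can be sketched as one prompt string with each Markdown source clearly labeled. This is an illustrative sketch only: `build_prompt`, the `<source>` wrapper, and the labels are hypothetical conventions, not anything AI Studio or Vertex requires, and the actual model call is left out.

```python
def build_prompt(sources: dict[str, str], question: str) -> str:
    """Join labeled Markdown sources and a question into one prompt string.

    `sources` maps a human-readable label (e.g. "transcript") to the
    Markdown text of that document. The <source> wrapper is a hypothetical
    convention that makes each document easy to refer to by name.
    """
    parts = []
    for label, markdown in sources.items():
        parts.append(f'<source name="{label}">\n{markdown.strip()}\n</source>')
    # The question goes last, after all of the reference material.
    parts.append(question.strip())
    return "\n\n".join(parts)


prompt = build_prompt(
    {
        "transcript": "# Talk transcript\n...",  # placeholder content
        "slides": "# Slide deck\n...",           # placeholder content
        "blog": "# Blog post\n...",              # placeholder content
    },
    "Do the spoken claims in the transcript match the slides and the blog post?",
)
```

Labeling each source lets the question name documents explicitly, which makes it easier to ask where the transcript and the slides disagree.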