Why MP4 specifically
MP4 is the most-searched-for video format because it's what most modern recording tools produce. Android camera apps record MP4, Zoom Cloud Recording exports MP4, and most screen recorders default to MP4. iPhone camera recordings are saved as MOV, a QuickTime container closely related to MP4, and edits exported from Final Cut, Premiere, and DaVinci Resolve are commonly MP4 as well. "MP4 to text" is the dominant search query for video transcription, so this page makes the workflow explicit for that format. Other formats (MOV, MKV, WebM) transcribe identically through the same pipeline.
How it works
Upload your MP4 file. The audio track is extracted automatically; the video stream is ignored, because speech recognition operates only on audio. The audio then runs through Whisper-class speech recognition, and the output is plain UTF-8 text with paragraph breaks. The Free tier handles MP4s up to about 60 minutes; Pro handles multi-hour MP4s in a single pass.
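As a rough illustration of those steps, here is a minimal Python sketch, assuming an ffmpeg-based extraction step and a recognizer that returns timestamped segments. The function names, the 16 kHz mono WAV target (the rate open-source Whisper models resample to internally), and the 2-second pause threshold used for paragraph breaks are all assumptions for illustration, not this service's actual pipeline.

```python
# Hypothetical sketch of an MP4 -> text pipeline; not the service's real code.

def audio_extract_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes
    16 kHz mono 16-bit PCM WAV (assumed target format)."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,             # input container (MP4, MOV, MKV, ...)
        "-vn",                 # ignore the video stream entirely
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # uncompressed 16-bit PCM audio
        dst,
    ]


def segments_to_paragraphs(segments, gap_s: float = 2.0) -> str:
    """Join recognized (start_s, end_s, text) segments into plain text,
    starting a new paragraph after a pause longer than gap_s seconds
    (hypothetical paragraphing rule)."""
    paragraphs: list[list[str]] = []
    prev_end = None
    for start, end, text in segments:
        # First segment, or a long enough silence, opens a new paragraph.
        if prev_end is None or start - prev_end > gap_s:
            paragraphs.append([])
        paragraphs[-1].append(text.strip())
        prev_end = end
    # Blank line between paragraphs, single spaces within them.
    return "\n\n".join(" ".join(p) for p in paragraphs)
```

To run the extraction step, pass the command to `subprocess.run(audio_extract_cmd("talk.mp4", "talk.wav"), check=True)`; the joined text can then be saved with `Path("talk.txt").write_text(text, encoding="utf-8")`.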
For other video formats
MOV (QuickTime), MKV (Matroska), WebM, AVI, WMV, and FLV all work the same way through the Video to Text tool. If your file isn't an MP4, use that tool instead: same engine, same output, no difference in quality. For unusual formats, convert to MP4 first with an ffmpeg one-liner: ffmpeg -i input.weirdformat output.mp4
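If you have a folder of mixed recordings, a small Python sketch like the following can decide which files go straight to Video to Text and build the ffmpeg conversion command for the rest. The `SUPPORTED` set mirrors the formats listed above; matching on file extension alone and the helper names are assumptions for illustration.

```python
from pathlib import Path

# Containers listed above as working directly with Video to Text.
SUPPORTED = {".mp4", ".mov", ".mkv", ".webm", ".avi", ".wmv", ".flv"}


def needs_conversion(path: str) -> bool:
    """True if the extension is not a supported container
    (assumes the extension reflects the real format)."""
    return Path(path).suffix.lower() not in SUPPORTED


def to_mp4_cmd(src: str) -> list[str]:
    """Build the one-liner `ffmpeg -i <src> <same name>.mp4`."""
    dst = Path(src).with_suffix(".mp4")
    return ["ffmpeg", "-i", src, str(dst)]


def plan(paths: list[str]) -> list[list[str]]:
    """Return conversion commands only for files that need them."""
    return [to_mp4_cmd(p) for p in paths if needs_conversion(p)]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`; files that don't need conversion can be uploaded as-is.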