How "Audio to Text" actually works here
Every audio file goes through the same Whisper-class speech recognition pipeline as our Markdown converter, but the structural formatting (speaker labels, H2 sections, timestamps) is stripped at the end. The output is flat plain text: paragraphs separated by blank lines, with no other structural markers. It is copy-paste-ready, search-indexable, and usable by any tool that accepts a UTF-8 string as input.
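To make the "stripped at the end" step concrete, here is a minimal sketch of that kind of flattening pass. The input conventions it assumes (Markdown `##` headings, `[hh:mm:ss]` timestamps, bold `**Speaker:**` labels) and the name `flatten_transcript` are illustrative guesses, not the tool's documented internals.

```python
import re

# Assumed (hypothetical) markers in the structured transcript:
#   "## Section title"          Markdown heading line
#   "[00:01:23]" or "[01:23]"   timestamp
#   "**Speaker 1:** text"       speaker label at the start of a line
HEADING = re.compile(r"^#{1,6}\s+.*$")
TIMESTAMP = re.compile(r"\[\d{1,2}:\d{2}(?::\d{2})?\]\s*")
SPEAKER = re.compile(r"^\*\*[^*]+:\*\*\s*")


def flatten_transcript(markdown: str) -> str:
    """Return plain text: paragraphs separated by one blank line."""
    paragraphs = []
    current = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if HEADING.match(line):
            # Assumption: heading lines are dropped entirely and act
            # as a paragraph break.
            line = ""
        else:
            line = TIMESTAMP.sub("", line)
            line = SPEAKER.sub("", line).strip()
        if line:
            current.append(line)
        elif current:
            # A blank (or removed) line closes the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Consecutive lines are joined into one paragraph, so the only structure left in the result is the blank line between paragraphs.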
Format support
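MP3, WAV, M4A, FLAC, OGG, AAC, WebM, and AMR are all supported, which covers basically every common audio format. The file size limit on the free tier is generous enough for typical podcast episodes and interviews; the Pro tier handles multi-hour recordings without splitting them. Recording quality matters more than format or bitrate: for speech, a 64 kbps MP3 typically transcribes about as accurately as a 320 kbps version of the same recording, whereas background noise, crosstalk, and distant microphones tend to hurt accuracy far more.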
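If you want to check a file before uploading it, a small helper along these lines would do. The extension set simply mirrors the formats listed above; the function name and the extension-only check are assumptions for illustration, since the tool may well inspect the file contents instead.

```python
from pathlib import Path

# Extensions corresponding to the formats listed above (assumed mapping).
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".webm", ".amr",
}


def is_supported_audio(filename: str) -> bool:
    """Case-insensitive check of the file extension only."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```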
Single file workflow
Paste the file in, click convert, and get the text back. For batch transcription of a large back-catalogue (hundreds of files), run openai-whisper or faster-whisper locally instead: both use the same model class, are MIT-licensed, run on CPU or GPU, and with a capable GPU can work through hundreds of hours of audio overnight. mdisbetter's web tool is the right choice for one-at-a-time conversions, where a per-file workflow is acceptable.
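For the local batch route, a minimal sketch might look like the following. It assumes openai-whisper is installed (`pip install openai-whisper`, plus ffmpeg on the PATH); the helper names, the `.txt`-next-to-the-audio layout, and the default `base` model are illustrative choices, not anything mdisbetter prescribes.

```python
from pathlib import Path
from typing import Callable, List

# Extensions to pick up from the folder (illustrative subset).
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def make_whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Load an openai-whisper model once and return a path -> text function."""
    import whisper  # imported lazily so the batch loop has no hard dependency

    model = whisper.load_model(model_name)

    def transcribe(path: Path) -> str:
        return model.transcribe(str(path))["text"].strip()

    return transcribe


def transcribe_folder(folder, transcribe: Callable[[Path], str]) -> List[Path]:
    """Write <name>.txt next to each audio file; return the files written."""
    written = []
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        out = audio.with_suffix(".txt")
        if out.exists():
            continue  # already done, so an interrupted run can be resumed
        out.write_text(transcribe(audio), encoding="utf-8")
        written.append(out)
    return written
```

A run over a folder would then be `transcribe_folder("back_catalogue", make_whisper_transcriber())`, where `back_catalogue` is a placeholder path. Passing the transcriber in as a function keeps the loop independent of the engine, so a faster-whisper wrapper could be swapped in without touching it.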