The TikTok-specific gotcha: download first, then transcribe
TikTok's native captions (the auto-generated text overlays you see when watching) are sometimes available as caption files via TikTok's creator tools, but availability is inconsistent across regions and account types. For reliable transcription of TikTok videos, the simplest workflow is to download the TikTok video, then upload it to mdisbetter for AI transcription. mdisbetter doesn't accept TikTok URLs directly today, because TikTok's download API has stricter terms than apply to YouTube's public video URLs. This two-step workflow gives you a transcript regardless of whether TikTok's native captions exist for the video.
Downloading TikTok videos
For your own uploads, TikTok's creator tools include a download option. For other people's TikToks that you have rights to use (Creative Commons, explicit permission, fair-use commentary), several free URL-based downloaders handle the job, such as SnapTik, ssstik.io and SaveTik. The downloaded file is typically an MP4; upload it to mdisbetter for transcription.
Use cases for TikTok transcription
Creators repurposing their own TikToks into blog posts, newsletter sections, or longer-form content. Researchers analysing TikTok content for academic studies (with appropriate IRB and ethics review). Marketers tracking competitors' TikTok content for trend analysis and positioning intelligence. Educators using TikTok content as teaching material, with written companion transcripts. Accessibility for users who prefer reading to watching short-form video.
Honest about TikTok limitations
TikTok videos are typically short (15-60 seconds, occasionally up to 10 minutes for longer formats). Transcription works the same way as for longer-form video, but a typical TikTok yields only a few sentences of output. For a single TikTok the transcription value is modest; for batch analysis of many TikToks (a creator's back-catalogue, a research corpus, a competitor-monitoring archive) the workflow becomes more valuable. For batch processing, the open-source local workflow (download many videos with yt-dlp, transcribe them locally with faster-whisper) is faster than converting each video through a web tool.
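The batch workflow from the previous paragraph can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a supported tool: it assumes the yt-dlp command-line program is on your PATH, the faster-whisper package is installed, your TikTok URLs sit one per line in a text file, and downloads arrive as MP4 files. The function names (`build_download_command`, `segments_to_text`, `transcribe_folder`, `main`) are hypothetical, and the `small` model size is an arbitrary choice.

```python
"""Hypothetical batch sketch: yt-dlp for downloading, faster-whisper for local transcription."""
import subprocess
import sys
from pathlib import Path


def build_download_command(url_list: Path, out_dir: Path, archive: Path) -> list[str]:
    """Build a yt-dlp CLI call that downloads every URL listed (one per line) in url_list."""
    return [
        "yt-dlp",
        "--batch-file", str(url_list),          # read URLs from a text file
        "--download-archive", str(archive),     # record finished IDs so reruns skip them
        "--output", str(out_dir / "%(id)s.%(ext)s"),  # name each file by its video ID
    ]


def segments_to_text(segments) -> str:
    """Join transcript segments (objects with a .text attribute) into one string."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_folder(video_dir: Path, model_size: str = "small") -> dict[str, str]:
    """Transcribe every MP4 in video_dir and return {video_id: transcript}."""
    # Imported here so the helpers above work even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    transcripts = {}
    for video in sorted(video_dir.glob("*.mp4")):  # assumes yt-dlp saved MP4 files
        segments, _info = model.transcribe(str(video), beam_size=5)
        # segments is a lazy generator; joining it runs the actual transcription.
        transcripts[video.stem] = segments_to_text(segments)
    return transcripts


def main(url_list: str, out_dir: str = "tiktoks") -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(Path(url_list), out, out / "archive.txt"), check=True)
    for video_id, text in transcribe_folder(out).items():
        (out / f"{video_id}.txt").write_text(text + "\n", encoding="utf-8")


if __name__ == "__main__":
    main(*sys.argv[1:])
```

You would run it as `python batch_transcribe.py urls.txt` (the script filename is hypothetical). The `--download-archive` file lets you rerun the script on a growing URL list without re-downloading videos you already have. Larger Whisper model sizes are more accurate but slower on CPU.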
Music-heavy TikTok caveat
Many TikTok videos use background music that's significantly louder than the spoken voice, or have no spoken voice at all, just music and visuals. Music-only TikToks have no spoken words to transcribe. For voice-over-loud-music TikToks, transcription accuracy drops compared to clean speech (typically 75-90%, depending on the speech-to-music balance). For best results, prefer TikToks with a clear voiceover and the music in the background rather than the foreground.
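In a batch run, it can help to flag music-only and music-heavy videos automatically rather than reading every transcript. The sketch below is a heuristic of our own, not mdisbetter's or Whisper's logic. It borrows the threshold values Whisper uses for its own no-speech and log-probability checks (0.6 and -1.0) and applies them to the `no_speech_prob` and `avg_logprob` fields that faster-whisper reports on each segment. `SegmentStats` and `classify_transcript` are hypothetical names. In real use you would pass `list(segments)` from faster-whisper's `transcribe()` instead of the stand-in objects.

```python
"""Hypothetical heuristic: flag transcripts that are likely music-only or music-dominated."""
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Stand-in for the per-segment fields faster-whisper reports (hypothetical helper)."""
    text: str
    no_speech_prob: float
    avg_logprob: float


# Values borrowed from Whisper's default decoding thresholds; the rule below is our own.
NO_SPEECH_THRESHOLD = 0.6
LOGPROB_THRESHOLD = -1.0


def classify_transcript(segments) -> str:
    """Return 'no_speech', 'low_confidence', or 'ok' for one video's segments."""
    segments = list(segments)
    if not segments:
        return "no_speech"  # nothing decoded at all, e.g. a music-only clip
    speechy = [s for s in segments if s.no_speech_prob < NO_SPEECH_THRESHOLD]
    if not speechy:
        return "no_speech"  # every segment looks like non-speech audio
    weak = [s for s in speechy if s.avg_logprob < LOGPROB_THRESHOLD]
    # If most of the remaining speech decoded with low confidence, review it by hand.
    if len(weak) > len(speechy) / 2:
        return "low_confidence"
    return "ok"
```

A `no_speech` result likely points to a music-only clip, while `low_confidence` marks voice-over-loud-music videos worth checking manually. Treat the thresholds as starting points to tune on your own corpus.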