The DOCX format under the hood
A .docx file is literally a ZIP archive containing XML parts for the document body, styles, and metadata, plus any embedded media. Renaming a .docx to .zip and unzipping it gives you the raw OOXML, but the text is scattered across nested <w:p> (paragraph) and <w:r> (run) elements, interleaved with formatting properties and style references that point into a separate styles part. mdisbetter parses that structure server-side and returns clean plain text in seconds.
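To make the "it's just a ZIP of XML" point concrete, here is a minimal sketch using only the Python standard library. It is not how mdisbetter works internally. It assumes the main document part sits at word/document.xml (the usual location; a robust tool would resolve it through _rels/.rels) and it ignores edge cases such as text boxes, whose nested paragraphs would be counted twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by the w:p, w:r and w:t elements
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_to_text(docx):
    """Return the text of a .docx, one line per <w:p> paragraph.

    `docx` may be a file path or a binary file-like object.
    Assumes the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    # iter() walks the tree in document order, so paragraphs come out
    # in the same sequence they appear in the XML.
    for p in root.iter(W + "p"):
        # The characters live in <w:t> nodes inside each run; join them.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

Calling docx_to_text("report.docx") (a hypothetical file name) prints each paragraph on its own line. Everything beyond this, such as deciding which parts and runs count as visible body text, is where a real extractor earns its keep.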
What gets extracted
All visible body text in reading order. Headings, paragraphs, list items, and table cells all flatten to plain paragraphs. Comments, footnotes, and tracked-changes anchors are stripped. Hidden text and content controls are dropped. Headers and footers are excluded by default (most users don't want page-number boilerplate in their text). For full structure (headings as ##, lists as -, tables as Markdown tables), use Word to Markdown.
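Several of these rules fall out of the file layout itself. The sketch below, again standard-library Python and not mdisbetter's actual code, shows three of them: hidden runs (direct <w:vanish/> formatting) are skipped, table cells flatten to ordinary paragraphs through document-order traversal, and headers, footers, comments, and footnotes are excluded simply because they live in separate parts. Its handling of tracked changes (keeping insertions, dropping deletions) and its skipping of empty paragraphs are choices made for this sketch only. It does not handle content controls or style-inherited hidden formatting, and it assumes the main part is at word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_hidden(run):
    """True if the run's direct formatting sets <w:vanish/> (hidden text).

    Only direct run properties are checked; hidden formatting inherited
    from a style is not resolved in this sketch.
    """
    vanish = run.find(f"{W}rPr/{W}vanish")
    if vanish is None:
        return False
    # A bare <w:vanish/> means "on"; w:val="false"/"0"/"off" switches it off.
    return vanish.get(f"{W}val", "true") not in ("false", "0", "off")


def visible_body_paragraphs(docx):
    with zipfile.ZipFile(docx) as zf:
        # Only the main document part is read. Headers, footers, comments and
        # footnotes are stored in other parts (typically word/header1.xml,
        # word/comments.xml, word/footnotes.xml), so they never appear here.
        root = ET.fromstring(zf.read("word/document.xml"))

    body = root.find(f"{W}body")
    out = []
    # Document-order traversal reaches w:tbl/w:tr/w:tc/w:p like any other
    # paragraph, so table cells come out as plain paragraphs in reading order.
    for p in body.iter(f"{W}p"):
        parts = []
        for run in p.iter(f"{W}r"):
            if _is_hidden(run):
                continue
            # Deleted tracked-change text is stored in <w:delText>, not <w:t>,
            # so collecting only <w:t> keeps insertions and drops deletions.
            parts.extend(t.text or "" for t in run.findall(f"{W}t"))
        text = "".join(parts)
        if text:  # sketch choice: skip paragraphs with no visible text
            out.append(text)
    return out
```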
Browser-only or server-side?
Server-side. DOCX extraction needs a working OOXML parser, which is too heavy to ship to the browser on every page load. Your file is processed in memory and deleted immediately after conversion: no storage, no logs, no training data.