11 min read · MDisBetter

Word Formatting Preservation: Accuracy Test Across 5 Converters

Most Word-to-Markdown benchmarks test many tools across a variety of documents. This one inverts the approach: one carefully constructed test document, scored across 5 converters, feature by feature. The document includes every Word feature that commonly fails in conversion: H1-H4 headings, bulleted and numbered lists with deep nesting, tables (simple and merged-cell), images with captions, code blocks with syntax highlighting, footnotes, in-text citations, equations, and cross-references. Per-feature scoring shows exactly where each converter wins and loses, which is more useful than aggregate scores when you are picking a tool for your specific feature needs.

The test document

A 24-page synthetic document constructed to exercise every common Word feature listed above: four heading levels, deeply nested lists, simple and merged-cell tables, captioned images, code blocks, footnotes, citations, equations, and cross-references.

Tools tested

  1. Pandoc 3.5
  2. MDisBetter (web tool, May 2026 build)
  3. Word2MD.net (paid tier with AI alt text)
  4. Mammoth.js 1.8
  5. Word native HTML export → html2md (the "do it yourself" baseline)

Scoring

Each feature is scored 0-5: 5 = preserved exactly; 3 = preserved with minor degradation; 1 = preserved with significant loss; 0 = dropped or broken.
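The arithmetic behind the headline figures is simple; a minimal sketch (the helper names are ours, not any tool's API):

```python
# Minimal sketch of the scoring arithmetic used in this benchmark.
RUBRIC = {5: "preserved exactly", 3: "minor degradation",
          1: "significant loss", 0: "dropped or broken"}

def pct(total, max_total=130):
    """Convert a per-tool column total (26 features x 5 points = 130) to a rounded percent."""
    return round(100 * total / max_total)

print(pct(110))  # Pandoc's 110/130 -> 85
```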

Per-feature results

| Feature | Pandoc | MDisBetter | Word2MD | Mammoth | Word HTML |
| --- | --- | --- | --- | --- | --- |
| H1 headings | 5 | 5 | 5 | 5 | 3 |
| H2 headings | 5 | 5 | 5 | 5 | 3 |
| H3 headings | 5 | 5 | 5 | 4 | 2 |
| H4 headings | 5 | 4 | 4 | 3 | 2 |
| Bold / italic / underline | 5 | 5 | 5 | 5 | 4 |
| Bulleted list (1 level) | 5 | 5 | 5 | 5 | 3 |
| Bulleted list (3-level nesting) | 5 | 4 | 4 | 4 | 2 |
| Numbered list (1 level) | 5 | 5 | 5 | 5 | 3 |
| Numbered list (3-level nesting) | 5 | 4 | 4 | 3 | 2 |
| Mixed list (numbered + bulleted) | 4 | 3 | 3 | 3 | 1 |
| Simple table | 5 | 5 | 5 | 5 | 3 |
| Table with merged horizontal cells | 3 | 3 | 4 | 2 | 2 |
| Table with multi-row headers + merged vertical | 2 | 2 | 4 | 1 | 2 |
| Inline images | 5 | 5 | 5 | 5 | 3 |
| Image captions | 4 | 3 | 4 | 2 | 2 |
| Side-by-side images | 2 | 2 | 3 | 2 | 2 |
| Image alt text (AI-generated) | 0 | 0 | 5 | 0 | 0 |
| Code blocks (with language) | 5 | 4 | 3 | 2 | 1 |
| Code blocks (without language) | 5 | 5 | 5 | 4 | 2 |
| Footnotes (as GFM footnotes) | 5 | 2 | 2 | 1 | 0 |
| In-text citations | 5 | 1 | 2 | 1 | 1 |
| Cross-references | 3 | 2 | 2 | 2 | 1 |
| Inline equations | 5 | 3 | 3 | 3 | 2 |
| Display equations | 5 | 3 | 3 | 2 | 2 |
| Bibliography section | 5 | 3 | 3 | 3 | 3 |
| Table of contents | 4 | 3 | 3 | 3 | 3 |
| Total /130 | 110 | 91 | 97 | 79 | 56 |

Pandoc — 110/130 (85%)

Wins or ties on 19 of 26 features. Dominates on footnotes, citations, equations, and bibliography — the academic/legal/technical features. The only feature where it scores 0 is AI image alt text, because Pandoc has no AI integration.

Where Pandoc scores below 5: complex tables (no GFM equivalent for merged vertical cells), side-by-side images (Markdown doesn't support image floating), AI alt text (not a Pandoc concern). On every other feature, Pandoc is best-in-class.
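For reference, the kind of Pandoc invocation this conversion implies can be sketched as follows. The wrapper function and file names are illustrative; the flags themselves are standard Pandoc options:

```python
# Sketch of a typical Pandoc invocation for docx -> GFM conversion.
def pandoc_cmd(src, dst, media_dir="media"):
    return [
        "pandoc", src,
        "-f", "docx",                    # read Word OOXML
        "-t", "gfm",                     # write GitHub-Flavored Markdown (keeps footnote syntax)
        f"--extract-media={media_dir}",  # write embedded images out to media_dir/
        "--wrap=none",                   # no hard wrapping: diff-friendly source
        "-o", dst,
    ]

# e.g. subprocess.run(pandoc_cmd("report.docx", "report.md"), check=True)
print(" ".join(pandoc_cmd("report.docx", "report.md")))
```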

Word2MD — 97/130 (75%)

Strong all-around with one specific advantage: AI image alt text scores 5 where every other tool scores 0. Also wins on complex tables thanks to the HTML <table> fallback that preserves rowspan/colspan. Loses to Pandoc on footnotes, citations, equations.

For image-heavy AI workflows, Word2MD's AI alt text is uniquely valuable — adds 5 points that no other tool provides.

MDisBetter — 91/130 (70%)

Strong on the basics: heading preservation, list nesting, simple tables, code blocks, image extraction. Loses ground on the academic features: footnote handling is inline (loses GFM footnote structure), citations are plain text (no Word bibliography integration), equations are preserved as images rather than LaTeX.

Where MDisBetter wins: the basics are clean, output is portable, no install or signup required, multi-format breadth across Word + PDF + URL + audio.

Mammoth.js — 79/130 (61%)

Solid for a free JavaScript library. Clean semantic output (paragraphs, lists, simple tables, headings). Weak on academic features and complex tables. Best used as a building block in custom tooling, not as an end-user tool.

Word HTML export → html2md — 56/130 (43%)

The "do it yourself" baseline. Word's native HTML export is famously bloated, with inline styles, MSO-specific tags, and font declarations everywhere. Even after running through html2md, formatting is lost: heading levels degrade, lists flatten, tables become bloated.

This path is only worth using if no other tool is available. Even MDisBetter's free tier is dramatically better.

Per-category subtotals

| Category | Pandoc | MDisBetter | Word2MD | Mammoth | Word HTML |
| --- | --- | --- | --- | --- | --- |
| Headings (4 features) | 20 | 19 | 19 | 17 | 10 |
| Lists (5 features) | 24 | 21 | 21 | 20 | 11 |
| Tables (3 features) | 10 | 10 | 13 | 8 | 7 |
| Images (4 features) | 11 | 10 | 17 | 9 | 7 |
| Code (2 features) | 10 | 9 | 8 | 6 | 3 |
| Academic (5 features) | 23 | 11 | 12 | 10 | 7 |
| Document structure (3 features) | 12 | 11 | 7 | 9 | 11 |

Picking by feature you care about

If headings + lists + simple tables matter most

Almost every tool handles these well except the Word HTML path. Pick on convenience: MDisBetter for zero install, Pandoc for CLI/batch.

If footnotes, citations, equations matter most

Pandoc, no contest. Other tools degrade significantly on academic features.

If image alt text matters most

Word2MD's AI alt text is the differentiator. Hyperleap AI also offers this (not tested in this 5-tool comparison).

If complex tables matter most

Word2MD's HTML <table> fallback preserves merged cells. Less portable but most accurate. For Pandoc and others, plan to manually fix complex tables.
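To see what the HTML fallback buys you, here is a minimal sketch (table contents invented by us) of a merged-cell table expressed as inline HTML, which GFM pipe tables cannot represent:

```python
# Sketch: the HTML fallback for merged cells that GFM pipe tables cannot express.
# "Region" spans both header rows (rowspan), "Sales" spans two columns (colspan).
def merged_header_table():
    return (
        '<table>\n'
        '  <tr><th rowspan="2">Region</th><th colspan="2">Sales</th></tr>\n'
        '  <tr><th>Q1</th><th>Q2</th></tr>\n'
        '  <tr><td>EMEA</td><td>120</td><td>140</td></tr>\n'
        '</table>'
    )

# Embed the string directly in the .md file; renderers that pass HTML
# through (GitHub, most static site generators) show the merged cells.
print(merged_header_table())
```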

If code blocks matter most

Pandoc preserves language hints reliably; MDisBetter is a close second. For docs with lots of language-tagged code, use Pandoc.

What survives the worst across the board

Three features that no tool handles well in current Markdown:

  1. Side-by-side images — Markdown has no image floating syntax. All tools flatten to sequential.
  2. Cross-references — Word's "see Section 3.2" feature loses its dynamic link in Markdown. Most tools convert to plain text.
  3. Multi-row headers + merged vertical cells — GFM doesn't support these. Even the HTML fallback (Word2MD) only partially helps because not every Markdown renderer respects rowspan.

For docs heavy in any of these features, plan for manual cleanup regardless of which tool you use.

What changed since 2024

The biggest shift is AI alt text becoming a real feature. In 2024 no tool offered it; today Word2MD and Hyperleap have it shipping. Expect every major paid tool to ship AI alt text by end of 2026, so treat it as a shrinking differentiator in any comparison.

Pandoc has continued steady improvement on edge cases (better handling of Word's legacy WordArt, better OOXML conformance) but the broad ranking has been stable for 5+ years.

How to test on your own documents

The honest answer: this benchmark uses one synthetic doc. Your docs are different. To replicate:

  1. Pick one of your real, representative Word docs — ideally one that includes the features you actually care about
  2. Convert it through 2-3 candidate tools (don't bother with the Word HTML path)
  3. Open the .md outputs side-by-side with the Word original
  4. Score each on the features that matter to your workflow (skip features you don't use)
  5. Pick the tool that wins on your weighted score
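Steps 4 and 5 can be sketched as a weighted score; all feature names, weights, and score profiles below are illustrative, not measured values:

```python
# Sketch: weight each feature by how much your workflow cares about it,
# then pick the tool with the best weighted score.
def weighted_score(scores, weights):
    """scores and weights are dicts keyed by feature name; unweighted features count 0."""
    return sum(scores[f] * w for f, w in weights.items())

weights = {"footnotes": 3, "tables": 2, "headings": 1}  # your priorities
tool_a = {"footnotes": 5, "tables": 3, "headings": 5}   # e.g. a Pandoc-like profile
tool_b = {"footnotes": 2, "tables": 4, "headings": 5}   # e.g. a web-tool profile

print(weighted_score(tool_a, weights))  # 5*3 + 3*2 + 5*1 = 26
print(weighted_score(tool_b, weights))  # 2*3 + 4*2 + 5*1 = 19
```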

Within 30 minutes you have data on your own corpus. Generic benchmarks (this one included) point in a direction; your own corpus gives you the answer for your use case.

What about other source formats?

Most workflows mix Word with PDF and URL. Same logic applies: pick the right converter per format. See best free PDF to Markdown converters for PDF and URL to Markdown ranked review for web. Cross-format pipelines often end up with a mix: Pandoc for Word, marker for PDF, Trafilatura for URL — or one platform like MDisBetter for all four formats.

Output cleanliness comparison

Beyond raw feature preservation, the cleanliness of the resulting Markdown source matters for ongoing editing. Score each tool on source readability:

| Tool | Line wrapping | Trailing whitespace | Empty paragraphs | Stray HTML | Cleanliness /20 |
| --- | --- | --- | --- | --- | --- |
| Pandoc | Configurable | None | Rare | Minimal | 17 |
| MDisBetter | Configurable | None | Rare | Minimal | 16 |
| Word2MD | Default wrap | None | Occasional | HTML for complex tables | 14 |
| Mammoth.js | Default wrap | Some | Common | None | 13 |
| Word HTML path | Erratic | Common | Common | Heavy MSO HTML | 5 |

Pandoc edges ahead on cleanliness because --wrap=none produces diff-friendly source. The web tools default to wrapping at standard widths, which is fine for reading but produces noisy diffs in Git.

Performance comparison

Conversion speed isn't usually a concern for one-off use, but matters for batch jobs. Approximate speeds for our 24-page test document:

| Tool | Time per doc | Throughput per minute |
| --- | --- | --- |
| Pandoc (CLI) | 0.4-0.8 sec | ~100 docs |
| Mammoth.js (in-process) | 0.2-0.5 sec | ~150 docs |
| MDisBetter (web upload) | 3-6 sec | ~12 docs |
| Word2MD (web upload) | 4-8 sec | ~10 docs |

Web tools include upload + processing + download time. Local tools are dramatically faster but require setup. For batch processing speed, Pandoc and Mammoth dominate.
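The throughput column is just arithmetic on the per-doc time; a sketch using the midpoints of the measured ranges:

```python
# Sketch: derive the throughput column from per-doc conversion time.
def docs_per_minute(seconds_per_doc):
    return round(60 / seconds_per_doc)

print(docs_per_minute(0.6))  # Pandoc range midpoint  -> 100
print(docs_per_minute(6.0))  # Word2MD range midpoint -> 10
```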

Recommendation

For complex docs (academic, legal, technical with footnotes/citations/equations): Pandoc. For image-heavy docs going into AI workflows: Word2MD. For convenience and multi-format breadth: MDisBetter. For developer integration: Mammoth.js. Skip Word's native HTML export. See also 8-tool benchmark, 2026 ranked review, and Word tables to Markdown guide.

Frequently asked questions

Why do all tools score the same low number on side-by-side images?
Markdown has no syntax for image positioning — there's no float left/right or column layout. Every tool has to flatten side-by-side images to sequential ones, which is technically correct (the image content is preserved) but visually different from the original. To preserve side-by-side layout, you'd need to use raw HTML in the Markdown, which is the standard escape hatch for layout features GFM doesn't support natively.
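If you do need the side-by-side layout back, the usual escape hatch can be sketched like this; the helper function, file names, and width choice are ours, not any converter's output:

```python
# Sketch: raw-HTML escape hatch for side-by-side images in Markdown.
# Two images at 49% width sit next to each other in most HTML-passing renderers.
def side_by_side(img_a, img_b, alt_a="", alt_b=""):
    return (
        '<p>\n'
        f'  <img src="{img_a}" alt="{alt_a}" width="49%">\n'
        f'  <img src="{img_b}" alt="{alt_b}" width="49%">\n'
        '</p>'
    )

print(side_by_side("before.png", "after.png", "Before", "After"))
```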
How is the 'AI image alt text' score relevant if I'm not using AI?
It's only relevant if your downstream workflow includes LLMs (RAG, document Q&A, ChatGPT-as-reviewer) or accessibility requirements (screen readers). For simple human reading or static doc sites, the original Word alt text (often empty or generic) is fine. The score line shows the differentiator for the use case where it matters; ignore it for the use case where it doesn't.
Can I see the actual converted output for the test document?
We don't publish the converted .md files because the test document includes synthetic-but-realistic content (fake citations, fake company names) that could be misleading if extracted. The methodology — feature list, scoring rubric, tool versions — is fully reproducible. Build a similar test doc with your real-world feature mix and run the same comparison; that's the more useful data anyway since it reflects your actual content.