10 min read · MDisBetter

Migrate Word Documentation to GitHub (Docs-as-Code Guide)

Engineering teams that ship documentation in Word are doing it wrong, and most of them know it. Word is fine for one-shot deliverables but actively hostile to documentation that needs to evolve: no diff, no merge, no version history that anyone wants to read, no reviewable pull requests, no CI checks, no public-facing render. Docs-as-code — Markdown in Git — fixes all of this. Here's the playbook to migrate a Word documentation library to GitHub progressively, without disrupting your team.

Why docs-as-code wins

Step 1: Set up your repository

Either create a new repo or add a docs/ folder to an existing one. A standard layout:

my-project/
├── README.md              (entry point — what is this thing)
├── CONTRIBUTING.md        (how to contribute, including how to write docs)
├── docs/
│   ├── index.md           (docs landing page)
│   ├── getting-started/
│   │   ├── installation.md
│   │   └── quickstart.md
│   ├── guides/
│   │   ├── authentication.md
│   │   └── deployment.md
│   ├── reference/
│   │   └── api.md
│   └── images/
├── mkdocs.yml             (or docusaurus.config.js, or _config.yml for Jekyll)
└── .github/workflows/
    └── docs.yml           (CI: build docs, deploy to Pages)

Step 2: Convert your Word documents

Two paths depending on volume.

Small library (under ~30 docs): web tool, file by file

For 5-30 documents, the MDisBetter Word to Markdown converter is the fastest setup-free path. For each Word doc:

  1. Drop the .docx in the converter
  2. Click Convert, download the .md
  3. Place it in the right docs/ subfolder
  4. Rename to a kebab-case slug (installing-the-cli.md not Installing The CLI v3.md)
  5. Add YAML frontmatter (see Step 4)
  6. Commit

This is also the right path for progressive migration — convert one doc at a time as you touch each section, rather than a big-bang migration.
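The rename in step 4 can be scripted. The helper below is a minimal sketch, not part of the converter: `slugify` is a hypothetical name, and it assumes ASCII titles (accented characters would be replaced by hyphens).

```shell
#!/usr/bin/env bash
# slugify NAME: print a kebab-case slug for a document title or filename stem.
# Lowercases the input, turns every run of non-alphanumeric characters into a
# single hyphen, and trims hyphens from both ends.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Example with a hypothetical filename:
#   mv "Installing The CLI v3.md" "$(slugify "Installing The CLI v3").md"
```

The helper keeps version suffixes such as v3, so dropping them (as in installing-the-cli.md) is still a manual decision.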

Larger library (30+ docs): Pandoc batch

Install Pandoc, then bulk-convert from your Word folder to docs/:

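cd /path/to/word-docs
mkdir -p ~/repo/docs/imported

for f in *.docx; do
  # kebab-case the filename: lowercase, spaces to hyphens
  base=$(echo "${f%.docx}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
  # one media folder per document, so images from different docs cannot collide
  pandoc -f docx -t gfm \
    --extract-media="$HOME/repo/docs/images/$base" \
    "$f" -o "$HOME/repo/docs/imported/$base.md"
done

This kebab-cases filenames and gives each document its own folder under docs/images/. Word files typically store their pictures under the same internal names (image1.png, image2.png, and so on), so extracting every document into one folder would let later documents overwrite the images of earlier ones. For more on bulk strategy see convert multiple Word documents.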
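Because the loop lowercases names and replaces spaces, two Word files can map to the same .md name, and the second conversion would silently overwrite the first. A quick pre-flight check, sketched here with a hypothetical `slug_collisions` helper that mirrors the loop's mapping:

```shell
#!/usr/bin/env bash
# slug_collisions: read filenames on stdin (one per line), apply the same
# lowercase + space-to-hyphen mapping as the conversion loop, and print
# every resulting name that occurs more than once.
slug_collisions() {
  tr '[:upper:]' '[:lower:]' | tr ' ' '-' | sort | uniq -d
}

# Usage, run from the Word folder before converting:
#   ls *.docx | slug_collisions
```

Empty output means every document gets a unique target name.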

Step 3: Restructure

The Word folder structure usually doesn't match the docs structure you want. After the bulk conversion, spend an hour reorganising. Move files into getting-started/, guides/, reference/, tutorials/. Drop docs that are obsolete. Merge docs that overlap. Rename for clarity.

This is the highest-leverage hour of the whole migration. The Word library is probably an organic mess of historical docs; the new docs/ folder should be intentional.

Step 4: Add YAML frontmatter

Most static site generators (MkDocs, Docusaurus, Hugo, Jekyll) want frontmatter at the top of each .md:

---
title: "Authentication Guide"
description: "How to authenticate against the API using OAuth 2.0 and API keys."
weight: 20
tags: [auth, security, api]
updated: 2026-05-10
---

Bulk-add a stub frontmatter to every imported file with a script:

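for f in docs/imported/*.md; do
  # skip files that already start with frontmatter, so the script is safe to re-run
  [ "$(head -n 1 "$f")" = "---" ] && continue
  # use the first line as the title, assuming it is the "# Heading" Pandoc emitted
  title=$(head -n 1 "$f" | sed 's/^# //' | sed 's/"/\\"/g')
  cat > "$f.tmp" <<EOF
---
title: "$title"
updated: $(date +%F)
---

EOF
  cat "$f" >> "$f.tmp"
  mv "$f.tmp" "$f"
done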

Then refine the frontmatter manually as you reorganise.
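As files move out of docs/imported/ during the restructure, some can end up without frontmatter. The check below is a small sketch (`missing_frontmatter` is a hypothetical helper) that only looks at whether the first line is the `---` opener; it does not validate the YAML itself.

```shell
#!/usr/bin/env bash
# missing_frontmatter DIR: print every .md file under DIR whose first line
# is not the YAML frontmatter opener '---'.
missing_frontmatter() {
  find "$1" -type f -name '*.md' | sort | while IFS= read -r f; do
    first=$(head -n 1 "$f")
    [ "$first" = "---" ] || printf '%s\n' "$f"
  done
}

# Usage from the repo root:
#   missing_frontmatter docs
```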

Step 5: Add MkDocs (or Docusaurus)

MkDocs is the simplest path: pip install, write a YAML config, get a beautiful docs site for free. Install:

pip install mkdocs mkdocs-material

Create mkdocs.yml at the repo root:

site_name: My Project Docs
site_url: https://myproject.github.io
theme:
  name: material
  features:
    - navigation.tabs
    - search.suggest
    - content.code.copy
nav:
  - Home: index.md
  - Getting Started:
    - Installation: getting-started/installation.md
    - Quickstart: getting-started/quickstart.md
  - Guides:
    - Authentication: guides/authentication.md
    - Deployment: guides/deployment.md
  - Reference:
    - API: reference/api.md
markdown_extensions:
  - admonition
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true   # the tab style the Material theme supports
  - tables

Preview locally:

mkdocs serve

Open http://127.0.0.1:8000. For a deeper MkDocs walkthrough see build a MkDocs site from Word documents.
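After files are moved or renamed, the nav in mkdocs.yml easily drifts from what is on disk. The sketch below lists nav targets that no longer exist. `missing_nav_targets` is a hypothetical helper, and it assumes nav entries are plain `Title: path/to/file.md` lines like those in the config above; it is a text match, not a YAML parser.

```shell
#!/usr/bin/env bash
# missing_nav_targets CONFIG DOCS_DIR: print each .md path mentioned in the
# config file that does not exist under DOCS_DIR.
missing_nav_targets() {
  grep -oE '[A-Za-z0-9_./-]+\.md' "$1" | sort -u | while IFS= read -r p; do
    [ -f "$2/$p" ] || printf '%s\n' "$p"
  done
}

# Usage from the repo root:
#   missing_nav_targets mkdocs.yml docs
```

MkDocs can also enforce this itself: `mkdocs build --strict` turns build warnings into errors, which should cover missing nav files as well.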

If you want React-flavored docs with versioning and i18n, use Docusaurus instead — same Markdown source files, different generator.

Step 6: Set up CI to deploy on push

Add a GitHub Actions workflow at .github/workflows/docs.yml:

name: Deploy docs
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - '.github/workflows/docs.yml'
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.x' }
      - run: pip install mkdocs-material
      - run: mkdocs gh-deploy --force

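Push to main and GitHub Pages serves the rendered site at https://<org>.github.io/<repo>. If the site does not appear, check that Pages is set to publish from the gh-pages branch, which is where mkdocs gh-deploy pushes the build. Every doc change becomes a code-reviewed PR and deploys automatically.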

Step 7: Add docs lint to CI

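Catch broken links and bad Markdown in PRs before merge. Add this job under jobs: in a workflow that also triggers on pull_request; the deploy workflow above only runs on pushes to main: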

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: DavidAnson/markdownlint-cli2-action@v16
        with:
          globs: 'docs/**/*.md'
      - uses: lycheeverse/lychee-action@v1
        with:
          args: --no-progress 'docs/**/*.md'

Markdownlint catches structural issues (inconsistent heading levels, bad list nesting). Lychee checks that every link in your docs is reachable.

Step 8: Update the team workflow

The hardest part isn't conversion — it's behaviour change. To make docs-as-code stick:

Handling images and diagrams

Word docs often have embedded screenshots and diagrams. Pandoc with --extract-media dumps them in a folder. Reference them in your .md as ![Alt text](../images/screenshot.png).
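Image paths are the part most likely to break when files move between folders. As a rough check, the sketch below prints local image targets that do not resolve relative to the Markdown file. `missing_images` is a hypothetical helper, and it only understands the inline `![alt](path)` form shown above.

```shell
#!/usr/bin/env bash
# missing_images FILE: print each local image target in a Markdown file that
# does not resolve to an existing file, relative to the file's own folder.
# Remote http(s) images are skipped.
missing_images() {
  local dir
  dir=$(dirname "$1")
  grep -oE '!\[[^]]*]\([^) ]+' "$1" | sed -E 's/^!\[[^]]*]\(//' \
    | while IFS= read -r target; do
        case "$target" in
          http://*|https://*) continue ;;
        esac
        [ -f "$dir/$target" ] || printf '%s\n' "$target"
      done
}

# Usage:
#   missing_images docs/guides/authentication.md
```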

For diagrams, consider switching to Mermaid (text-based diagrams that render natively on GitHub and, usually with a plugin or a few lines of config, in most static site generators):

```mermaid
sequenceDiagram
  Client->>Server: POST /auth
  Server-->>Client: 200 + token
```

Mermaid diagrams version-control as text — no binary blobs, no "who edited the diagram" mystery.

Pandoc tips for technical docs

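For docs with code samples and complex tables, a few Pandoc flags are worth knowing. --wrap=none stops Pandoc from hard-wrapping paragraphs, so each paragraph stays on one line. --strip-comments drops HTML comments instead of passing them through. --extract-media writes embedded images to the given folder: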

pandoc -f docx -t gfm --wrap=none --strip-comments \
  --extract-media=images guide.docx -o guide.md

Other source formats

Most documentation projects mix Word docs with PDFs (vendor manuals), web pages (RFCs, blog posts), and audio (recorded knowledge transfer sessions). Same destination, different converters: PDF to Markdown, URL to Markdown, Audio to Markdown. All produce GFM-compatible .md you can drop straight into docs/.

What about the Word originals?

Keep them in a docs-archive/ folder for the first 6 months in case you need to verify a conversion. After that, archive offline and treat the Markdown as the source of truth. Don't try to keep Word and Markdown in sync — that's the worst of both worlds.

Versioning your docs

If you ship multiple product versions, you'll want versioned docs. Two common patterns:

Branch-based versioning

Create a docs branch per major version (docs/v1, docs/v2, docs/v3). Each version's docs live on its own branch. Build each branch separately and host at versioned URLs (/v1/, /v2/, /v3/). Simple but maintenance-heavy.

Folder-based versioning

Keep all versions in a single branch under docs/v1/, docs/v2/, docs/v3/. Easier to cross-reference between versions. Plays well with the mike plugin for MkDocs or with Docusaurus versioning.

For most teams, folder-based versioning with the latest as docs/latest/ is the right starting point. Branch-based versioning adds value when versions diverge significantly.

Search and analytics

Once your docs are public, instrument them:

Analytics tells you which docs to invest in. The bottom 20% of pages by traffic often deserve archiving; the top 20% often deserve deeper coverage.

Onboarding new contributors

Once docs-as-code is set up, the harder problem is getting new team members to actually use it. Tactics that work:

Recommendation

Convert progressively: as your team touches each Word doc for an update, convert it to Markdown via the web tool, place it in docs/, PR the change, deprecate the Word original. For docs that are stable but referenced often (API reference, architecture overview), do those first via Pandoc bulk conversion. The migration is done when no one is editing Word anymore — usually 2-3 months for most teams. See also the Obsidian migration guide for a parallel workflow on personal knowledge bases.

Frequently asked questions

Should I convert all Word docs upfront or progressively?

Progressively, almost always. Big-bang migrations sound efficient but produce 200 stale Markdown files no one looks at. Progressive migration ties the conversion to actual editing intent: when someone updates a doc, that's the moment it earns conversion. After 3-6 months, 80% of the high-value docs are converted; the rest were dead anyway.

What's the best static site generator for docs migrated from Word?

MkDocs with the Material theme is the lowest-friction starting point: pip install, write a YAML config, get a polished site in 30 minutes. For React-heavy projects with versioning and translation needs, Docusaurus. For general-purpose docs in a Jekyll/Ruby ecosystem, the GitHub-native Jekyll. All accept the same GFM output from Word conversion.

How do I keep my docs in sync with code changes?

Two patterns. (1) Co-locate docs with code: docs for module X live in src/X/README.md, and PR review naturally catches doc/code drift. (2) Add a CI check that requires every PR touching public APIs to also touch docs/api.md, otherwise fail. Both work; the first is more popular for libraries, the second for products with dedicated technical writers.
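
Pattern (2) can be sketched as a small shell check over the list of changed files. The function name `docs_updated_with_api` and the `src/api/` prefix are placeholders for this sketch, not names from any tool. The docs path defaults to docs/api.md as in the answer above; in the Step 1 layout it would be docs/reference/api.md.

```shell
#!/usr/bin/env bash
# docs_updated_with_api [API_PREFIX] [DOCS_FILE]
# Reads changed file paths on stdin, one per line. Returns 1 when a file under
# API_PREFIX changed but DOCS_FILE did not; returns 0 otherwise.
# Both defaults are assumptions for illustration.
docs_updated_with_api() {
  local changed api_prefix="${1:-src/api/}" docs_file="${2:-docs/api.md}"
  changed=$(cat)
  if printf '%s\n' "$changed" | grep -q "^$api_prefix" \
     && ! printf '%s\n' "$changed" | grep -qxF "$docs_file"; then
    echo "API files changed without an update to $docs_file" >&2
    return 1
  fi
  return 0
}

# In CI (assumes the base branch is origin/main and has been fetched):
#   git diff --name-only origin/main...HEAD | docs_updated_with_api
```

A failing status from the function fails the CI step, which blocks the merge until the docs file is part of the PR.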