Migrate Word Documentation to GitHub (Docs-as-Code Guide)
Engineering teams that ship documentation in Word are doing it wrong, and most of them know it. Word is fine for one-shot deliverables but actively hostile to documentation that needs to evolve: no diff, no merge, no version history that anyone wants to read, no reviewable pull requests, no CI checks, no public-facing render. Docs-as-code — Markdown in Git — fixes all of this. Here's the playbook to migrate a Word documentation library to GitHub progressively, without disrupting your team.
Why docs-as-code wins
- Diffs that mean something. A change to `auth.md` in a PR shows the actual semantic delta. Word's revision marks are noise.
- Code review for docs. Pull requests, line comments, suggestions, approval gates: same flow as code.
- Version history. `git log auth.md` shows every change, every author, every reason. Word's "Restore previous version" is a UX disaster.
- CI for docs. Lint Markdown, validate links, check spelling, run examples, all on every PR.
- Free public hosting. GitHub Pages, Read the Docs, and Cloudflare Pages all have free tiers for hosting sites built from Markdown.
- AI-friendly. Markdown is plain text with explicit structure, so LLMs parse it reliably. RAG over Markdown skips the text-extraction step that .docx requires.
Step 1: Set up your repository
Either create a new repo or add a docs/ folder to an existing one. A standard layout:
```
my-project/
├── README.md            (entry point: what is this thing)
├── CONTRIBUTING.md      (how to contribute, including how to write docs)
├── docs/
│   ├── index.md         (docs landing page)
│   ├── getting-started/
│   │   ├── installation.md
│   │   └── quickstart.md
│   ├── guides/
│   │   ├── authentication.md
│   │   └── deployment.md
│   ├── reference/
│   │   └── api.md
│   └── images/
├── mkdocs.yml           (or docusaurus.config.js, or _config.yml for Jekyll)
└── .github/workflows/
    └── docs.yml         (CI: build docs, deploy to Pages)
```
Step 2: Convert your Word documents
Two paths depending on volume.
Small library (under ~30 docs): web tool, file by file
For 5-30 documents, the MDisBetter Word to Markdown converter is the fastest setup-free path. For each Word doc:
- Drop the .docx in the converter
- Click Convert, download the .md
- Place it in the right `docs/` subfolder
- Rename to a kebab-case slug (`installing-the-cli.md` not `Installing The CLI v3.md`)
- Add YAML frontmatter (see Step 4)
- Commit
This is also the right path for progressive migration — convert one doc at a time as you touch each section, rather than a big-bang migration.
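The renaming step is easy to apply inconsistently by hand. A minimal Python sketch of one possible slug rule follows. The function name `kebab_slug` and the choice to drop a trailing version marker such as `v3` are assumptions made for illustration; they are not part of any converter.

```python
import re
from pathlib import Path


def kebab_slug(filename: str, drop_version: bool = True) -> str:
    """Turn a Word filename into a kebab-case Markdown filename."""
    stem = Path(filename).stem  # strips the final extension, e.g. ".docx"
    if drop_version:
        # Assumption: a trailing " v3" or "_v2.1" is a version marker
        # that Git history makes redundant.
        stem = re.sub(r"[\s_-]+v\d+(\.\d+)*$", "", stem, flags=re.IGNORECASE)
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    return f"{slug}.md"


print(kebab_slug("Installing The CLI v3.docx"))  # prints installing-the-cli.md
```

Non-ASCII letters are replaced by hyphens under this rule, so titles in other scripts would need a transliteration step first.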
Larger library (30+ docs): Pandoc batch
Install Pandoc, then bulk-convert from your Word folder to docs/:
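```bash
# Run from the output folder so the image links Pandoc writes are relative to the .md files.
mkdir -p ~/repo/docs/imported
cd ~/repo/docs/imported
for f in /path/to/word-docs/*.docx; do
  # Kebab-case the filename: lowercase, spaces to hyphens.
  base=$(basename "${f%.docx}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
  # One media folder per document: Word tends to reuse names like image1.png,
  # so a single shared folder lets documents overwrite each other's images.
  pandoc -f docx -t gfm \
    --extract-media="../images/$base" \
    "$f" -o "$base.md"
done
```
This kebab-cases filenames and extracts each document's images into its own subfolder of docs/images/. For more on bulk strategy see convert multiple Word documents.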
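One thing the loop above does not guard against is two Word files reducing to the same slug, in which case the second conversion silently overwrites the first. A small pre-flight check, sketched in Python under the assumption that the slug rule is exactly the loop's (lowercase, spaces to hyphens), could look like this. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def slug(name: str) -> str:
    """Same rule as the shell loop: lowercase, spaces become hyphens."""
    return Path(name).stem.lower().replace(" ", "-")


def plan_conversions(docx_names: list[str]) -> dict[str, list[str]]:
    """Group source files by target .md name so collisions are visible."""
    targets: dict[str, list[str]] = defaultdict(list)
    for name in docx_names:
        targets[f"{slug(name)}.md"].append(name)
    return dict(targets)


plan = plan_conversions(["Install Guide.docx", "install guide.docx", "FAQ.docx"])
# Any target with more than one source would be overwritten during conversion.
collisions = {target: sources for target, sources in plan.items() if len(sources) > 1}
print(collisions)
```

Renaming one of the colliding sources before running the batch is usually simpler than teaching the loop to disambiguate.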
Step 3: Restructure
The Word folder structure usually doesn't match the docs structure you want. After the bulk conversion, spend an hour reorganising. Move files into getting-started/, guides/, reference/, tutorials/. Drop docs that are obsolete. Merge docs that overlap. Rename for clarity.
This is the highest-leverage hour of the whole migration. The Word library is probably an organic mess of historical docs; the new docs/ folder should be intentional.
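If the reorganisation is large, it can help to write the target structure down as data before moving anything. The sketch below assumes a hand-written mapping from imported filename to section; `SECTION_MAP`, `plan_moves`, and `apply_moves` are hypothetical names, and the filenames are taken from the example layout in Step 1.

```python
from pathlib import Path

# Hypothetical mapping from imported filename to target section.
SECTION_MAP = {
    "installation.md": "getting-started",
    "quickstart.md": "getting-started",
    "authentication.md": "guides",
    "api.md": "reference",
}


def plan_moves(imported: list[str], section_map: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return (moves, unassigned): targets for mapped files, and files still to triage."""
    moves: dict[str, str] = {}
    unassigned: list[str] = []
    for name in sorted(imported):
        section = section_map.get(name)
        if section is None:
            unassigned.append(name)
        else:
            moves[f"docs/imported/{name}"] = f"docs/{section}/{name}"
    return moves, unassigned


def apply_moves(moves: dict[str, str], root: Path) -> None:
    """Move files on disk, creating section folders as needed."""
    for src, dst in moves.items():
        target = root / dst
        target.parent.mkdir(parents=True, exist_ok=True)
        (root / src).rename(target)
```

Reviewing `unassigned` before calling `apply_moves` keeps the decision about obsolete or overlapping documents a human one.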
Step 4: Add YAML frontmatter
Most static site generators (MkDocs, Docusaurus, Hugo, Jekyll) want frontmatter at the top of each .md:
```yaml
---
title: "Authentication Guide"
description: "How to authenticate against the API using OAuth 2.0 and API keys."
weight: 20
tags: [auth, security, api]
updated: 2026-05-10
---
```
Bulk-add a stub frontmatter to every imported file with a script:
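```bash
for f in docs/imported/*.md; do
  # Use the first line as the title: strip a leading "# " and escape double quotes.
  title=$(head -n 1 "$f" | sed 's/^# //' | sed 's/"/\\"/g')
  {
    # date +%F prints YYYY-MM-DD and works with both GNU and BSD date.
    printf -- '---\ntitle: "%s"\nupdated: %s\n---\n\n' "$title" "$(date +%F)"
    cat "$f"
  } > "$f.tmp"
  mv "$f.tmp" "$f"
done
```
Then refine the frontmatter manually as you reorganise.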
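The shell script adds a second frontmatter block if it is run twice on the same file. A Python sketch of the same stub logic with an idempotency guard follows; the function name `add_frontmatter` is hypothetical, and it assumes, like the script, that the first line of each imported file is its title.

```python
import json
from datetime import date


def add_frontmatter(markdown: str, updated: date) -> str:
    """Prepend stub frontmatter unless the text already starts with a frontmatter fence."""
    if markdown.startswith("---\n"):
        return markdown  # already has frontmatter; leave it alone
    first_line = markdown.split("\n", 1)[0]
    title = first_line.removeprefix("# ").strip() or "Untitled"
    # json.dumps produces a double-quoted string with quotes and backslashes
    # escaped, which is also a valid YAML double-quoted scalar.
    quoted = json.dumps(title, ensure_ascii=False)
    header = f"---\ntitle: {quoted}\nupdated: {updated.isoformat()}\n---\n\n"
    return header + markdown


print(add_frontmatter("# Authentication Guide\n\nBody text.\n", date(2026, 5, 10)))
```

Because the guard only looks at the first line, a file that legitimately opens with a horizontal rule would be skipped; that trade-off seems acceptable for a one-off import.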
Step 5: Add MkDocs (or Docusaurus)
MkDocs is the simplest path: pip install, write a YAML config, get a beautiful docs site for free. Install:
```bash
pip install mkdocs mkdocs-material
```
Create mkdocs.yml at the repo root:
```yaml
site_name: My Project Docs
# Must match the URL GitHub Pages serves the site from.
site_url: https://<org>.github.io/<repo>/
theme:
  name: material
  features:
    - navigation.tabs
    - search.suggest
    - content.code.copy
nav:
  - Home: index.md
  - Getting Started:
      - Installation: getting-started/installation.md
      - Quickstart: getting-started/quickstart.md
  - Guides:
      - Authentication: guides/authentication.md
      - Deployment: guides/deployment.md
  - Reference:
      - API: reference/api.md
markdown_extensions:
  - admonition
  - pymdownx.superfences
  - pymdownx.tabbed
  - tables
```
Preview locally:
```bash
mkdocs serve
```
Open http://127.0.0.1:8000. For a deeper MkDocs walkthrough see build a MkDocs site from Word documents.
If you want React-flavored docs with versioning and i18n, use Docusaurus instead — same Markdown source files, different generator.
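Writing the `nav` block by hand gets tedious after a bulk import. As a rough sketch, a first draft can be generated from the file paths under docs/. The helper below is hypothetical, handles only one level of folders, and derives titles from filenames, so acronyms come out as "Api" and need a manual fix.

```python
def title_from(slug: str) -> str:
    """'getting-started' -> 'Getting Started'."""
    return slug.replace("-", " ").title()


def build_nav(pages: list[str]) -> str:
    """Render a mkdocs.yml nav block from paths relative to docs/."""
    top_level: list[str] = []
    sections: dict[str, list[str]] = {}
    for page in pages:
        folder, _, rest = page.partition("/")
        if rest:
            sections.setdefault(folder, []).append(page)
        else:
            top_level.append(page)

    lines = ["nav:"]
    for page in top_level:
        name = "Home" if page == "index.md" else title_from(page.removesuffix(".md"))
        lines.append(f"  - {name}: {page}")
    for folder, members in sections.items():
        lines.append(f"  - {title_from(folder)}:")
        for page in members:
            stem = page.rsplit("/", 1)[-1].removesuffix(".md")
            lines.append(f"      - {title_from(stem)}: {page}")
    return "\n".join(lines)


print(build_nav([
    "index.md",
    "getting-started/installation.md",
    "getting-started/quickstart.md",
    "reference/api.md",
]))
```

The output is meant to be pasted into mkdocs.yml and then edited, not regenerated on every build.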
Step 6: Set up CI to deploy on push
Add a GitHub Actions workflow at .github/workflows/docs.yml:
```yaml
name: Deploy docs
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - '.github/workflows/docs.yml'
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.x' }
      - run: pip install mkdocs-material
      - run: mkdocs gh-deploy --force
```
Push to main and `mkdocs gh-deploy` builds the site and pushes it to the gh-pages branch. With Pages configured to publish from that branch, GitHub serves the rendered site at https://<org>.github.io/<repo>. Every doc change becomes a code-reviewed PR and deploys automatically.
Step 7: Add docs lint to CI
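Catch broken links and bad Markdown in PRs before merge. The deploy workflow above only runs on pushes to main, so put the lint job in its own workflow that triggers on pull requests (the filename docs-lint.yml is just an example):
```yaml
# .github/workflows/docs-lint.yml
name: Lint docs
on:
  pull_request:
    paths:
      - 'docs/**'
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: DavidAnson/markdownlint-cli2-action@v16
        with:
          globs: 'docs/**/*.md'
      - uses: lycheeverse/lychee-action@v1
        with:
          args: --no-progress 'docs/**/*.md'
```
Markdownlint catches structural issues (inconsistent heading levels, bad list nesting). Lychee checks that every link in your docs is reachable.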
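Restructuring tends to break relative links between pages, and it is quicker to catch those locally than to wait for CI. The following Python sketch is a complement to the CI checks, not a replacement. It only understands inline `[text](target)` and `![alt](target)` syntax, ignores external URLs, and does not skip links inside code fences. The function name is hypothetical.

```python
import re
from pathlib import Path

# Inline Markdown links and images: [text](target) or ![alt](target "title")
LINK_RE = re.compile(r'!?\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')
# Anything starting with a URL scheme (https:, mailto:, ...) is not a local file.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def broken_relative_links(docs_dir: Path) -> list[tuple[str, str]]:
    """Return (file, target) pairs whose relative target does not exist on disk."""
    broken = []
    for md_file in sorted(docs_dir.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # external URL or same-page anchor
            path_part = target.split("#", 1)[0]  # drop any #fragment
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(docs_dir).as_posix(), target))
    return broken
```

Calling `broken_relative_links(Path("docs"))` from the repo root returns an empty list when every relative link and image resolves.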
Step 8: Update the team workflow
The hardest part isn't conversion — it's behaviour change. To make docs-as-code stick:
- Update CONTRIBUTING.md with the docs flow (clone, branch, edit .md, PR, merge, auto-deploy)
- Add a docs review template to PRs (covers structure, accuracy, links, examples)
- Make doc updates part of the definition of done for features
- Train non-engineers on the basic flow — for many teams this is just "edit the .md file in the GitHub web UI and click Commit"
Handling images and diagrams
Word docs often have embedded screenshots and diagrams. Pandoc with --extract-media dumps them in a folder. Reference them in your .md with standard image syntax: `![alt text](images/filename.png)`.
For diagrams, consider switching to Mermaid (text-based diagrams that render natively on GitHub and, with a plugin or a small config change, in most static site generators):
```mermaid
sequenceDiagram
Client->>Server: POST /auth
Server-->>Client: 200 + token
```
Mermaid diagrams version-control as text: no binary blobs, no "who edited the diagram" mystery.
Pandoc tips for technical docs
For docs with code samples and complex tables, Pandoc has flags worth knowing:
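- `--wrap=none`: don't insert hard line breaks (cleaner diffs in Git)
- `--track-changes=accept`: accept Word's tracked changes and drop review comments (this is the default for docx input; `--strip-comments` only strips HTML comments from Markdown or Textile sources, so it does nothing here)
- `--reference-links`: convert inline links to reference-style for cleaner Markdown source
- `--standalone --metadata title="My Doc"`: include a title and other metadata
```bash
pandoc -f docx -t gfm --wrap=none --track-changes=accept \
  --extract-media=images guide.docx -o guide.md
```
Other source formats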
Most documentation projects mix Word docs with PDFs (vendor manuals), web pages (RFCs, blog posts), and audio (recorded knowledge transfer sessions). Same destination, different converters: PDF to Markdown, URL to Markdown, Audio to Markdown. All produce GFM-compatible .md you can drop straight into docs/.
What about the Word originals?
Keep them in a docs-archive/ folder for the first 6 months in case you need to verify a conversion. After that, archive offline and treat the Markdown as the source of truth. Don't try to keep Word and Markdown in sync — that's the worst of both worlds.
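Verifying a conversion by eye does not scale past a few documents. One rough automated signal is to compare word counts between the archived .docx and its Markdown counterpart. The sketch below relies on the fact that a .docx is a zip archive with its body text in `word/document.xml`; the function names and the 10% tolerance are assumptions, and nested structures such as text boxes may be counted twice.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def count_words(text: str) -> int:
    """Crude word count: runs of word characters."""
    return len(re.findall(r"\w+", text))


def docx_word_count(path: str) -> int:
    """Count words in the main body of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    total = 0
    for paragraph in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; join them before counting.
        text = "".join(node.text or "" for node in paragraph.iter(f"{W}t"))
        total += count_words(text)
    return total


def looks_truncated(docx_words: int, md_words: int, tolerance: float = 0.1) -> bool:
    """Flag a conversion whose Markdown has noticeably fewer words than the source."""
    return md_words < docx_words * (1 - tolerance)
```

A flagged file is only a prompt to open both versions side by side; tables and frontmatter shift the counts enough that small differences are expected.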
Versioning your docs
If you ship multiple product versions, you'll want versioned docs. Two common patterns:
Branch-based versioning
Create a docs branch per major version (docs/v1, docs/v2, docs/v3). Each version's docs live on its own branch. Build each branch separately and host at versioned URLs (/v1/, /v2/, /v3/). Simple but maintenance-heavy.
Folder-based versioning
Keep all versions in a single branch under docs/v1/, docs/v2/, docs/v3/. Easier to cross-reference between versions. Docusaurus versioning uses this model natively; for MkDocs, the mike tool gives a similar per-version folder layout in the published site.
For most teams, folder-based versioning with the latest as docs/latest/ is the right starting point. Branch-based versioning adds value when versions diverge significantly.
Search and analytics
Once your docs are public, instrument them:
- Search: the built-in MkDocs search (or a local-search plugin for Docusaurus) up to a few hundred pages, then Algolia DocSearch (free for OSS) or self-hosted Meilisearch
- Analytics: Plausible (privacy-friendly) or Cloudflare Web Analytics (free), to see which docs are read most, which trigger searches, which have high bounce rates
- Feedback: "Was this helpful? Yes/No" widget at the bottom of each page (Material for MkDocs has a built-in feedback widget; other generators usually need a plugin or a custom component)
Analytics tells you which docs to invest in. The bottom 20% of pages by traffic often deserve archiving; the top 20% often deserve deeper coverage.
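The 20% rule of thumb is simple to apply to an analytics export. A minimal sketch, assuming the export has been reduced to a mapping from page path to view count (the function name and the sample data are hypothetical):

```python
def traffic_tiers(pageviews: dict[str, int], fraction: float = 0.2) -> tuple[list[str], list[str]]:
    """Return (top, bottom): the most and least viewed `fraction` of pages."""
    if not pageviews:
        return [], []
    # Sort by views descending; break ties by path so the result is stable.
    ranked = sorted(pageviews, key=lambda page: (-pageviews[page], page))
    size = max(1, round(len(ranked) * fraction))
    return ranked[:size], ranked[-size:]


views = {f"p{i}.md": i * 10 for i in range(1, 11)}  # p1.md: 10 views ... p10.md: 100 views
top, bottom = traffic_tiers(views)
print(top)     # ['p10.md', 'p9.md']
print(bottom)  # ['p2.md', 'p1.md']
```

Low traffic alone is not proof a page is dead weight; reference pages that are rarely read but critical when needed are worth excluding by hand.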
Onboarding new contributors
Once docs-as-code is set up, the harder problem is getting new team members to actually use it. Tactics that work:
- Make it part of onboarding: every new hire's first PR is a typo fix in the docs
- Show, don't tell: pair-program a doc edit on Day 1 so they know the flow
- Lower the barrier: GitHub's web UI lets non-technical people edit Markdown without ever cloning the repo
- Reward doc PRs: shout-out doc improvements in standups; treat them as first-class engineering work
Recommendation
Convert progressively: as your team touches each Word doc for an update, convert it to Markdown via the web tool, place it in docs/, PR the change, deprecate the Word original. For docs that are stable but referenced often (API reference, architecture overview), do those first via Pandoc bulk conversion. The migration is done when no one is editing Word anymore — usually 2-3 months for most teams. See also the Obsidian migration guide for a parallel workflow on personal knowledge bases.