Word to Markdown for Compliance: Version-Controlled Regulatory Docs
Compliance teams carry two obligations at once, and both are easy to fail. They have to maintain policies and procedures that satisfy SOC 2, ISO 27001, HIPAA, PCI-DSS, GDPR, or whatever combination of frameworks their company is subject to. And they have to prove during audits that those policies were in force at any given historical point: that the data-retention policy on June 14, 2024 said what it said, that the access-review procedure was followed, that the change-management workflow approved each system change. Word-based policies stored in SharePoint with file-naming conventions like 'Information-Security-Policy-v3.2-FINAL-FINAL.docx' are not an audit trail. Markdown stored in Git, with every change a commit signed by an identified author and approved through a pull request, is. This article is the playbook for moving compliance documentation from Word to a Git-backed Markdown workflow, with honest notes on what the web tool does (the conversion step) and what you have to set up around it (the audit-trail infrastructure).
The audit-trail gap in most compliance documentation programs
The standard compliance documentation problem in most mid-sized companies:
- Policies and procedures live as Word documents in a SharePoint folder
- Changes happen by someone editing the file, sometimes saving over the previous version, sometimes saving a new version with an incremented filename suffix
- The historical record of "what did the policy say on this date" is at best the SharePoint version history (which tracks the file but not who approved the change), at worst lost entirely
- Auditors asking for the version of the policy in force on a specific date get a best-guess answer with caveats
SOC 2 Type 2 auditors care about this specifically. The Type 2 attestation requires evidence that controls operated effectively over a period of time, which means the policy that defined the control must have been in force throughout the period and followed by the people subject to it. "Here's the current Word document, here's the SharePoint version history, trust us on the dates" is not a strong response.
Markdown plus Git plus pull-request review fixes the audit-trail problem cleanly. Every change is a Git commit with author, timestamp, message, and the diff of what changed. Every commit is associated with the pull request that approved it, which has reviewers identified by name and approval timestamps. The complete history of any policy is reproducible from the Git log at any historical point. This is the level of evidence auditors actually appreciate.
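As a rough illustration of "reproducible from the Git log", the sketch below prints a policy file as it stood on a given date. The function name policy_as_of and the default branch name main are assumptions for this example, and it is meant to be run from the repository root. Note that --before filters on the committer date recorded in each commit.

```bash
#!/usr/bin/env bash
# Print a policy file exactly as it stood at the end of a given day.
# Usage: policy_as_of <path> <YYYY-MM-DD> [branch]
# (policy_as_of is a hypothetical helper name, not part of Git.)
policy_as_of() {
  local path="$1" date="$2" branch="${3:-main}" commit
  # Newest commit on the branch that touched the file on or before that day.
  commit="$(git rev-list -1 --before="${date} 23:59:59" "$branch" -- "$path")"
  if [ -z "$commit" ]; then
    echo "no version of $path on or before $date" >&2
    return 1
  fi
  # Report the commit, author, and author date on stderr, then the content on stdout.
  git log -1 --format='commit %H%nauthor %an <%ae>%ndate   %aI' "$commit" >&2
  git show "${commit}:${path}"
}
```

For example, `policy_as_of soc2/security-policy.md 2024-06-14` would print the text of that policy as of June 14, 2024, along with the commit that last changed it.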
The honest scope: what the web tool does and does not do
Before the architecture, the disclaimer that compliance teams need to internalize: the web tool at word-to-markdown is the conversion step. Upload a .docx, get a .md, download. The web tool is not an audit-trail system, not a version-control system, not a regulated-workflow tool. It does one thing — converts the document format — and it does it as a one-file-at-a-time browser tool.
For a compliance program that needs an actual audit trail of policy changes, the architecture is: Pandoc running on a corporate machine for the conversion, plus Git running on your enterprise Git platform (GitHub Enterprise, GitLab, Bitbucket, Azure DevOps) for the version control and approval workflow. The web tool is appropriate for ad-hoc individual conversions of non-sensitive documents during the migration. The repeatable, audit-bearing workflow runs on infrastructure your compliance team controls.
For SOC 2, ISO 27001, and most other framework audits, the auditor doesn't care what tool produced the conversion as long as the resulting Markdown is properly version-controlled with attributed approvals. The conversion is plumbing; the audit trail is the deliverable.
The Git-backed compliance documentation architecture
The reference architecture for a compliance team going Markdown + Git:
- Repository: a private Git repository on your enterprise Git platform, dedicated to compliance documentation. Often named something like compliance-policies or governance-docs.
- Folder structure: organized by framework and policy type, e.g.:

    compliance-policies/
      iso-27001/
        isms-scope.md
        information-security-policy.md
        risk-management-policy.md
        access-control-policy.md
      soc2/
        security-policy.md
        availability-policy.md
        confidentiality-policy.md
      hipaa/
        privacy-policy.md
        breach-notification-procedure.md
      pci-dss/
      gdpr/

- Branch protection: the main branch is protected, with no direct pushes, all changes via pull request, and required reviewers configured per the framework's approval requirements
- CODEOWNERS file: maps each policy file to its required approver(s). Changes to the access-control policy require the CISO's approval; changes to the privacy policy require the DPO's approval; etc.
- Commit signing: GPG-signed commits required, so the Git log proves not just who pushed but who actually authored each change
- CI checks: automated linting (Vale or markdownlint) verifies style consistency; an internal CI step runs framework-specific completeness checks ("this policy needs sections X, Y, Z")
- Deployment: merged changes auto-deploy to a static site (MkDocs, Docusaurus) hosted internally where employees read the policies. The static site is the consumption layer; the Git repo is the source of truth.
This setup takes a compliance engineer or DevOps engineer about a week to scaffold initially. Once running, every policy change leaves a complete audit trail by virtue of how the workflow operates — no extra documentation overhead.
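To make the CODEOWNERS piece concrete, here is a minimal scaffold sketch. It assumes GitHub's CODEOWNERS location and syntax (other platforms differ), and the team handles under @example-org are hypothetical placeholders, as is the helper name write_codeowners.

```bash
#!/usr/bin/env bash
# Write a starter CODEOWNERS file into a policy repository.
# Usage: write_codeowners [repo_root]
# Team handles (@example-org/...) are hypothetical; substitute your own.
write_codeowners() {
  local repo_root="${1:-.}"
  mkdir -p "$repo_root/.github"
  cat > "$repo_root/.github/CODEOWNERS" <<'EOF'
# Fallback owner for anything not matched below.
*                                     @example-org/compliance-team
# Later, more specific patterns take precedence over the fallback.
/iso-27001/access-control-policy.md   @example-org/ciso
/hipaa/privacy-policy.md              @example-org/dpo
EOF
}
```

On GitHub, the file only gates merges when the branch protection rule also requires review from code owners. Commit signing can be switched on per clone with `git config commit.gpgsign true`, assuming each author already has a signing key configured.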
Step-by-step migration of an existing Word policy library
- Inventory existing policies: list every Word policy currently in force, owner, last review date, framework reference, version number
- Triage: confirm-current vs needs-update vs retire. Migration is the right moment to clean up.
- Bulk-convert with Pandoc on a corporate machine rather than the web tool, which is not suited to an enterprise corpus. The bash script:

    #!/bin/bash
    # Convert every .docx under SRC to GitHub-flavored Markdown under OUT,
    # preserving the folder structure. Handles spaces in file names.
    SRC=~/SharePoint/Compliance/Policies
    OUT=~/compliance-policies-staging
    find "$SRC" -name '*.docx' -print0 | while IFS= read -r -d '' f; do
      rel="${f#"$SRC"/}"
      out="$OUT/${rel%.docx}.md"
      mkdir -p "$(dirname "$out")"
      pandoc "$f" -f docx -t gfm --wrap=preserve -o "$out"
    done

- Editorial pass per policy: open each .md, fix heading hierarchy, add frontmatter (framework reference, owner, version, effective date, next review date), establish cross-references to related policies as Markdown links
- Initial commit batch: commit the cleaned policies to the Git repo in a single "Migrated from Word library" PR with the migration date documented in commit messages
- Switch over: announce the cutover date, freeze the Word library (read-only), make the Git repo + static site the canonical source from that date forward
- Establish workflow discipline: every subsequent change goes through PR + CODEOWNERS approval; no out-of-band edits
For a typical mid-sized compliance program with 30-80 policies, this migration runs 4-8 weeks. The audit-trail benefit kicks in immediately on cutover; subsequent audits get progressively easier as the Git history accumulates.
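The editorial pass can be given a head start by stamping each converted file with an empty frontmatter block to fill in by hand. The sketch below is one possible approach, not a required step: the helper name add_frontmatter_stub, the TODO placeholder values, and the draft status are all assumptions, and the field names follow the frontmatter example later in this article.

```bash
#!/usr/bin/env bash
# Prepend a frontmatter stub to a converted policy file if it has none.
# Usage: add_frontmatter_stub <file.md>
# TODO values and "status: draft" are placeholders for the editorial pass.
add_frontmatter_stub() {
  local file="$1" tmp
  # Leave files alone if they already start with a frontmatter delimiter.
  if [ "$(head -n 1 "$file")" = "---" ]; then
    return 0
  fi
  tmp="$(mktemp)"
  {
    printf '%s\n' '---' 'title: TODO' 'framework_refs: []' 'owner: TODO' \
      'version: TODO' 'effective_date: TODO' 'next_review_date: TODO' \
      'status: draft' '---'
    cat "$file"
  } > "$tmp"
  # Overwrite in place so the original file permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Running it over the staging folder might look like `find ~/compliance-policies-staging -name '*.md' -print0 | while IFS= read -r -d '' f; do add_frontmatter_stub "$f"; done`. The function is safe to re-run because it skips files that already have frontmatter.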
The pull-request workflow as evidence of approval
The pull-request workflow itself becomes audit evidence. When a control reviewer asks "how did you approve the change to the access-control policy on June 14?", the answer is:
- Open the Git log for access-control-policy.md
- Find the commit dated June 14
- Click through to the associated PR
- The PR shows: who proposed the change, what they changed (the diff), who reviewed, when each reviewer approved, when it merged
This is reproducible, time-stamped, attributed, and complete. SOC 2 auditors love it. ISO 27001 auditors love it. Internal-audit functions love it. Compliance leadership stops dreading audit weeks.
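The first two lookup steps above can be sketched on the command line. The helper name commits_on is an assumption for this example; the format placeholders are standard git log ones, and the date window filters on committer date in the local time zone.

```bash
#!/usr/bin/env bash
# List every commit that touched a policy file on a given day.
# Usage: commits_on <path> <YYYY-MM-DD>
# (commits_on is a hypothetical helper name.)
commits_on() {
  local path="$1" day="$2"
  # Tab-separated columns: hash, author name, committer date (ISO 8601),
  # signature status (%G? prints G for a good signature, N for none), subject.
  git log --since="${day} 00:00:00" --until="${day} 23:59:59" \
    --format='%H%x09%an%x09%cI%x09%G?%x09%s' -- "$path"
}
```

With a commit hash in hand, the associated pull request can be found in the platform's web UI. On GitHub, assuming the gh CLI is installed and authenticated, `gh pr list --search "<SHA>" --state merged` is one way to locate it.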
For policies that require formal sign-off beyond peer review (board-level approval, executive committee approval), document the off-Git approval in the PR ("Approved by Information Security Committee 2026-06-12, minutes attached as ISC-2026-06-12.pdf") and store the supporting artifact in your evidence repository. The Git PR is the technical audit trail; the off-Git approval document is the governance audit trail; together they cover the full review chain.
Frontmatter for compliance metadata
The frontmatter on each policy file carries the metadata auditors and compliance staff need:
---
title: Access Control Policy
framework_refs:
- ISO 27001:2022 A.5.15
- SOC 2 CC6.1
- HIPAA § 164.308(a)(4)
owner: CISO
approver: Information Security Committee
version: 4.2
effective_date: 2026-04-15
next_review_date: 2027-04-15
last_audited: 2026-02-10
status: active
classification: internal
---
# Access Control Policy
## Purpose
Defines requirements for granting, modifying, and revoking access to
company information systems and data.
[... rest of policy ...]
The static site renders the frontmatter as a metadata box at the top of each policy page so employees can see at a glance which framework controls the policy supports and when it was last reviewed. The Git repo's CI can run automated checks against the frontmatter ("flag any policy where next_review_date is in the past") to surface overdue reviews proactively.
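A minimal sketch of the overdue-review check follows. It assumes the next_review_date key appears only in frontmatter and uses ISO dates as in the example above; the function name overdue_reviews is hypothetical.

```bash
#!/usr/bin/env bash
# Print "date<TAB>file" for every policy whose next_review_date is before today.
# Usage: overdue_reviews <repo_dir> [today as YYYY-MM-DD]
# (overdue_reviews is a hypothetical helper name.)
overdue_reviews() {
  local root="$1" today="${2:-$(date +%F)}" file due
  while IFS= read -r -d '' file; do
    # First next_review_date line in the file; assumed to be in the frontmatter.
    due="$(sed -n 's/^next_review_date:[[:space:]]*//p' "$file" | head -n 1)"
    # ISO 8601 dates sort lexicographically, so a string comparison is enough.
    if [ -n "$due" ] && [[ "$due" < "$today" ]]; then
      printf '%s\t%s\n' "$due" "$file"
    fi
  done < <(find "$root" -name '*.md' -print0 | sort -z)
}
```

A CI job could fail the build when the function prints anything, for example with `[ -z "$(overdue_reviews .)" ] || exit 1`, or simply post the list to the compliance team as a reminder.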
Mapping policies to framework controls
One of the highest-value side effects of moving compliance docs to Git: the policy-to-control mapping becomes machine-readable. The frontmatter's framework_refs field is a structured list of control identifiers; a CI script can crawl the repo and produce:
- Per-framework crosswalk reports: "these are all the policies satisfying ISO 27001 A.5.15"
- Coverage gap reports: "these ISO 27001 controls have no policy mapped to them"
- Reverse-lookup reports: "this policy supports these controls across these frameworks"
For multi-framework compliance programs (SOC 2 + ISO 27001 + HIPAA, common for SaaS companies serving regulated industries), this crosswalk is a substantial productivity win compared to maintaining the same mapping in spreadsheets that drift from the actual policies.
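As a sketch of how such a crawl might work, the function below emits one line per control reference found in frontmatter. It assumes framework_refs is written as a block list of "- item" lines as in the example above; the name crosswalk is hypothetical, and a real CI step would likely use a proper YAML parser.

```bash
#!/usr/bin/env bash
# Emit one "control<TAB>policy file" line per framework_refs entry, sorted by control.
# Usage: crosswalk <repo_dir>
# (crosswalk is a hypothetical helper name.)
crosswalk() {
  local root="$1"
  find "$root" -name '*.md' -print0 | xargs -0 awk '
    FNR == 1            { fm = 0; refs = 0 }      # reset state for each file
    /^---[ \t]*$/       { fm++; refs = 0; next }  # count frontmatter delimiters
    fm != 1             { next }                  # only look inside frontmatter
    /^framework_refs:/  { refs = 1; next }        # start of the list
    refs && /^[ \t]*-[ \t]+/ {
      sub(/^[ \t]*-[ \t]+/, "")                   # strip the list marker
      printf "%s\t%s\n", $0, FILENAME
      next
    }
    { refs = 0 }                                  # any other key ends the list
  ' | sort
}
```

Piping the output through `cut -f1 | sort -u` gives the set of controls with at least one policy mapped. Comparing that set against a list of the controls a framework requires would yield the coverage gap report.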
Evidence collection alongside policy management
Compliance is policies + evidence. Policies say what should happen; evidence proves it actually happened. The Git repo holds policies; a separate evidence repository (or a /evidence/ subfolder) holds the artifacts.
Evidence artifacts that often live alongside the policies:
- Quarterly access-review CSV exports (file-named with date and reviewer)
- Annual penetration test reports (PDF, indexed by date)
- Vendor security-questionnaire responses (PDF or Markdown, indexed by vendor)
- Training-completion reports per quarter
- Incident-response post-mortems (Markdown, indexed by date)
For evidence that arrives as PDF (third-party audit reports, vendor SOC 2 reports, certificates), a useful workflow: convert the key sections to Markdown via PDF to Markdown for searchability while preserving the original PDF as the authoritative artifact. The Markdown lives in /evidence/ alongside the original; auditors get the PDF for authentication; compliance staff search the Markdown for cross-evidence pattern detection.
Cross-feature: documentation across the compliance program
A mature compliance documentation program has multiple input streams beyond policies:
- Recorded compliance training sessions: audio to Markdown for transcripts that become reference material and accessible alternatives
- Vendor and partner documentation: URL to Markdown for capturing publicly-published vendor security pages, attestation summaries, etc.
- Legacy PDF policy archives: PDF to Markdown for migrating historical policies that exist only as PDF
The unifying principle: every input format becomes Markdown, the Markdown lives in Git, every change is approved through PR. The compliance team's documentation operations become engineering-grade in their rigor without requiring the compliance team to write code.
The role of the audit-trail honesty disclaimer
Repeating the honest scope because compliance teams especially need to internalize it: the web tool at mdisbetter.com is a one-file-at-a-time browser converter. It is not an enterprise audit-trail system. It does not version-control your output, does not require approval workflow on changes, does not retain evidence of the conversion. For an audit-bearing compliance program, the converter is one step in the chain — the controlled steps (Pandoc on corporate infrastructure for bulk, Git for version control, PR workflow for approval, CI for automated checks, CODEOWNERS for required reviewers) are what make the program compliant. Use the web tool for individual non-sensitive ad-hoc conversions during migration; build the controlled architecture around it for the going-forward operation.
For related architecture, see Word to Markdown for enterprise knowledge bases (the broader corpus pattern), Word to Markdown for SOPs (the operational-procedure analog), and building an enterprise document migration pipeline (the deep dive on bulk migration).