· 7 min read · MDisBetter

PDF to Markdown for Financial Reports: Tables & Numbers Preserved

Financial reports are the test case where PDF conversion either works or doesn't — they're dense with tables, footnotes, multi-column layouts, and numerical data where one wrong digit changes meaning. Markdown conversion preserves the structure that matters for analysis (tables as tables, footnotes attached to data, sections labeled) and strips the noise that doesn't (page furniture, repeated headers).

Why financial PDFs are hard

A typical 10-K has 200+ pages with:

Generic PDF extractors fail on at least three of these. Tables flatten, columns scramble, footnote attachments get lost. Our converter is built specifically for layout-aware extraction; financial reports are one of the categories we test most heavily.

What converts well

Tables with numerical precision

Revenue tables, balance sheet tables, segment performance tables — all come through as GFM tables with cell-level fidelity. Decimal points, parenthetical negatives, currency markers preserved exactly. The output pastes cleanly into Excel or Google Sheets.

For complex tables (merged cells, multi-row headers), the converter flattens merges and uses / to separate compound headers — see our PDF tables guide for details.
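If the flattened compound headers need to become hierarchical again for analysis, a small pandas helper can do it. The sketch below assumes each compound label has the form "Parent / Child" with a single separator; the function name `split_compound_headers` and the sample figures are made up for illustration and are not part of the converter.

```python
import pandas as pd

def split_compound_headers(df: pd.DataFrame, sep: str = "/") -> pd.DataFrame:
    """Turn 'Parent / Child' column labels into a two-level MultiIndex."""
    pairs = []
    for label in df.columns:
        # partition() splits on the first separator only
        parent, found, child = str(label).partition(sep)
        # Labels without the separator keep their name on the top level
        pairs.append((parent.strip(), child.strip() if found else ""))
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(pairs)
    return out

# Made-up figures, for illustration only
table = pd.DataFrame(
    [["Americas", "100", "90"]],
    columns=["Segment", "Net sales / 2024", "Net sales / 2023"],
)
wide = split_compound_headers(table)
print(wide["Net sales"])  # the two year columns grouped under one parent
```

Headers with more than two levels would keep everything after the first separator in the child label, so deeper hierarchies need an extra pass.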

MD&A and narrative sections

Multi-column narrative comes through in correct reading order, with section headings ("Liquidity and Capital Resources", "Critical Accounting Estimates") preserved as ## markers. Inline references to tables ("see Table 12") stay attached to their sentence.
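Because section headings arrive as ## markers, a converted filing can be sliced by section with a few lines of Python. This is a minimal sketch that assumes second-level headings only and ignores edge cases such as `## ` lines inside code blocks; `split_sections` and the sample text are hypothetical.

```python
import re

def split_sections(md: str) -> dict[str, str]:
    """Map each '## ' heading to the text that follows it, up to the next '## ' heading."""
    sections: dict[str, str] = {}
    # With one capture group, re.split returns
    # [preamble, title1, body1, title2, body2, ...]
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", md, flags=re.MULTILINE)
    for title, body in zip(parts[1::2], parts[2::2]):
        sections[title] = body.strip()  # a repeated title overwrites the earlier one
    return sections

# Made-up sample text, for illustration only
sample = """## Liquidity and Capital Resources
Cash was sufficient.

## Critical Accounting Estimates
See Table 12.
"""
sections = split_sections(sample)
print(list(sections))
```

Deeper headings (### and below) stay inside the body of their parent section, which is usually what a section-level lookup wants.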

Footnotes and disclosures

Footnotes attached to specific data points (e.g., "Revenue includes $X from acquired entity") stay attached to that data in the converted output, either as inline annotations or as footnote-style references at the end of the section.

Use cases that get unblocked

Quantitative pipeline ingestion

Hedge funds and quant teams routinely need to pull specific numbers from earnings reports — revenue, EPS, segment breakdowns. With Markdown tables, this becomes a few lines of Pandas:

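import re
import pandas as pd

with open('AAPL_10K.md', encoding='utf-8') as f:
    md = f.read()

# Find every GFM table block: consecutive lines that start and end with a pipe
blocks = re.findall(r'((?:^\|.*\|[ \t]*(?:\n|\Z))+)', md, re.MULTILINE)

dfs = []
for block in blocks:
    # Split cells on pipes, not commas: figures such as 1,234 contain commas
    rows = [[cell.strip() for cell in line.strip()[1:-1].split('|')]
            for line in block.splitlines()]
    if len(rows) < 3:
        continue  # need a header row, the |---| delimiter row, and data
    # rows[1] is the |---|---| delimiter row; cells stay as strings
    dfs.append(pd.DataFrame(rows[2:], columns=rows[0]))
print(f'Extracted {len(dfs)} tables')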

For pure table-only extraction (skip the narrative), our PDF to CSV tool emits one CSV per detected table — even simpler for spreadsheet pipelines.

AI-assisted equity research

Paste a converted earnings report into Claude or ChatGPT and ask:

Markdown input gives the model structured access to tables and section navigation. Raw PDF input gives garbled tables and approximate section references — much less useful for serious research.

Cross-document comparison

Compare two consecutive 10-Ks to surface what changed:

git diff --no-index --word-diff AAPL_2024.md AAPL_2025.md

Word-level diff highlights every changed sentence, every shifted number, every new disclosure. With PDF, this comparison requires manual review or expensive specialized tools (Workiva, MyCloseUp). With Markdown, it's a one-line bash command.
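When the comparison has to run inside a Python pipeline rather than a shell, the standard library's difflib gives a similar result at line granularity. The following is a sketch, not a replacement for a word-level diff; `changed_lines` and the sample rows are illustrative, with made-up figures.

```python
import difflib

def changed_lines(old_md: str, new_md: str) -> list[str]:
    """Return removed ('- ') and added ('+ ') lines between two Markdown documents."""
    old, new = old_md.splitlines(), new_md.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    out: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        out.extend("- " + line for line in old[i1:i2])
        out.extend("+ " + line for line in new[j1:j2])
    return out

# Made-up table rows, for illustration only
old_doc = "| Revenue | 100 |\n| Cost | (40) |"
new_doc = "| Revenue | 120 |\n| Cost | (40) |"
for line in changed_lines(old_doc, new_doc):
    print(line)
```

Since each table row sits on its own line in Markdown, a line-level diff already isolates a changed figure to a single row.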

Workflow for an earnings season

For an analyst covering 30 companies through earnings season:

  1. Set up an inbox folder where new earnings PDFs land (manually downloaded or auto-fetched from EDGAR)
  2. CLI watch mode converts each new PDF to Markdown automatically
  3. Markdown files land in a Git repo, organized by company and quarter
  4. For each new filing, run a comparison script against the prior period to surface changes
  5. For new filings, paste the Markdown into an AI assistant for a first-pass summary, then drill into specific tables manually

Saves hours per company per quarter. Most of the time savings come from comparison and search — opening 30 PDFs sequentially to find one number is the bottleneck that conversion eliminates.
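Steps 3 and 4 depend on being able to locate the prior period's file for any new filing. One way to do that, assuming a repo layout of `<TICKER>/<YYYY>Q<n>.md` (a hypothetical naming scheme, not something the converter enforces), is a helper along these lines:

```python
import re
from pathlib import Path

def prior_period_path(path: Path) -> Path:
    """For .../AAPL/2025Q1.md return .../AAPL/2024Q4.md (hypothetical naming scheme)."""
    match = re.fullmatch(r"(\d{4})Q([1-4])", path.stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    year, quarter = int(match.group(1)), int(match.group(2))
    if quarter == 1:
        # Q1 rolls back to Q4 of the previous year
        year, quarter = year - 1, 4
    else:
        quarter -= 1
    return path.with_name(f"{year}Q{quarter}{path.suffix}")

print(prior_period_path(Path("filings/AAPL/2025Q1.md")))
```

The comparison script can then diff each new file against `prior_period_path(new_file)` and skip the comparison when that file does not exist yet.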

Caveats specific to financial work

Footnoted cells

Tables with footnote markers in cells (e.g., "Revenue: $1,234 (a)") preserve the footnote markers; the footnote text appears immediately after the table. Spot-check that your downstream parser handles the markers correctly.
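A downstream parser can separate the marker from the value before any numeric conversion. The sketch below assumes markers are a single lowercase letter in parentheses at the end of the cell, as in the example above; numeric markers such as "(1)" are deliberately not matched because they are indistinguishable from parenthetical negatives. `split_marker` is an illustrative name.

```python
import re
from typing import Optional

# Assumption: a footnote marker is one lowercase letter in parentheses at the end of the cell
MARKER = re.compile(r"\s*\(([a-z])\)\s*$")

def split_marker(cell: str) -> tuple[str, Optional[str]]:
    """Split 'value (a)' into ('value', 'a'); return (value, None) when no marker is present."""
    match = MARKER.search(cell)
    if match is None:
        return cell.strip(), None
    return cell[: match.start()].strip(), match.group(1)

print(split_marker("Revenue: $1,234 (a)"))
print(split_marker("(40)"))  # parenthetical negative, not a footnote marker
```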

Charts and graphs

Bar charts, line charts, pie charts: extracted as image placeholders with their caption text. The data behind the chart isn't recovered (the chart was rendered as an image, not as data). For chart data, you'll need to reference the original PDF or, if available, the underlying spreadsheet.

Currency and unit notation

The converter preserves currency markers ($, €, ¥), parenthetical negatives, decimal precision. It does not normalize units — a table in millions stays in millions, a table in thousands stays in thousands. Note this when aggregating across tables.
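Since the notation is preserved as text, aggregation code has to convert it to numbers and apply each table's unit explicitly. A minimal sketch, assuming US-style formatting (comma thousands separators, period decimals) and a per-table `scale` supplied by the caller; `parse_amount` is a hypothetical helper and the amounts are made up:

```python
from decimal import Decimal

def parse_amount(cell: str, scale: int = 1) -> Decimal:
    """Convert '$1,234.5' or '(1,234.5)' to a Decimal, multiplied by `scale`."""
    text = cell.strip()
    # Drop currency markers, thousands separators, and spaces
    for ch in "$€¥, ":
        text = text.replace(ch, "")
    # Parentheses denote a negative amount
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    value = Decimal(text)  # raises decimal.InvalidOperation on non-numeric cells
    return (-value if negative else value) * scale

# A table stated in millions and one stated in thousands, brought to the same unit
in_millions = parse_amount("(1,234)", scale=1_000_000)
in_thousands = parse_amount("$1,234.50", scale=1_000)
print(in_millions, in_thousands)
```

Decimal is used instead of float so that the digits in the filing survive the conversion exactly.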

Auditor's opinion and signature blocks

Auditor signatures and company executive signatures appear as text where readable, with a placeholder for any rendered signature image. The opinion text comes through cleanly.

Compliance considerations

For analysts working with non-public information (NPI) or material non-public information (MNPI): treat conversion the same as any other tool that touches the document. SaaS conversion services may have data-handling implications under your firm's compliance policy. Self-hosted (Marker, Docling) avoids the question. Our Enterprise tier with signed DPA covers most institutional use.

For published research and earnings work on public filings, no compliance issues — public PDFs go through any conversion path freely.

Frequently asked questions

Will my converted earnings report tables paste cleanly into Excel?
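Mostly. A rendered GFM table (for example, copied from a Markdown preview) pastes into Excel, Google Sheets, and Numbers as a native table. Raw pipe-delimited text may land in a single column, in which case split it on the | delimiter (in Excel, Text to Columns). Numerical formatting (decimals, parentheses for negatives, currency markers) is preserved.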
Can I batch-convert all 10-K filings from EDGAR?
Yes. Combine our API with the EDGAR API: download new filings, convert via our endpoint, save Markdown to your archive. Scales to thousands of filings per day comfortably. See the batch conversion guide (/blog/batch-convert-100-pdfs-to-markdown).
What about XBRL data in 10-K filings?
XBRL is a separate, structured data feed published by EDGAR — use that for machine-readable financials when available. Our converter is for the prose and table portions of the filing PDF, complementary to XBRL.