What "embedding quality" actually means
Two practical metrics: cosine similarity between semantically related chunks (should be high), and cosine similarity between semantically unrelated chunks (should be low). Raw PDF text fails on both — repeated headers and footers create false similarity between chunks that share nothing else, while column-break artefacts create false dissimilarity between chunks that should cluster.
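The two metrics can be sketched in a few lines. This is a minimal illustration with toy 3-dimensional vectors standing in for real embeddings; the vector values are invented for demonstration, not output from any actual model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (range -1..1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": two related chunks point in similar directions,
# an unrelated chunk points elsewhere.
related_a = [0.9, 0.1, 0.0]
related_b = [0.8, 0.2, 0.1]
unrelated = [0.0, 0.1, 0.9]

print(cosine_similarity(related_a, related_b))  # high — should cluster
print(cosine_similarity(related_a, unrelated))  # low — should not
```

In practice you would run the same check over pairs sampled from a real corpus: embed each chunk, then compare the mean similarity of known-related pairs against known-unrelated pairs before and after cleanup.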
Markdown removes both effects. We typically observe related-chunk cosine similarity rising by 0.05–0.10 (on a 0–1 scale) and unrelated-chunk similarity falling by a similar amount — which translates into noticeably sharper top-K retrieval and fewer false positives during re-ranking.
Choosing an embedding model
For most production workloads in 2026, OpenAI text-embedding-3-large, Cohere embed-v3, and Voyage voyage-3-large all perform comparably on Markdown input. The differences between them are dwarfed by input quality — in our internal tests, a weaker model on Markdown beats a stronger model on raw PDF.