Per-platform handling
Reddit: the OP becomes # Title with the post body underneath; each comment is a Markdown section with author, vote count, and timestamp as a small header line, then the comment body, then nested replies indented as nested blockquotes. Stack Overflow: the question becomes # Question with the body, then each answer becomes ## Answer (N votes) with the answer body and any comments as sub-blockquotes; the accepted answer is marked. Hacker News: threading is preserved with indentation depth matching the comment tree. Discourse: first post is the topic, subsequent posts are sections with author attribution.
Why this matters for AI workflows
An LLM asked to summarise a Stack Overflow thread from raw HTML loses track of which voice is which, which answer is accepted, which votes are high. The same LLM given the Markdown conversion answers correctly: the structure is explicit, the attribution is preserved, the accepted answer is labelled. Useful for knowledge extraction, RAG indexing, and offline reference building.