Audio to Markdown for Lawyers: Deposition & Court Transcripts
Before any workflow, the disclaimer that every responsible vendor in this space should put first: AI transcription is not a substitute for a certified court reporter. For trial-admissible deposition transcripts, witness statements presented as evidence, sworn proceedings, and any record that requires authentication under FRE 901 or analogous state rules, you hire a Certified Shorthand Reporter (CSR). AI transcription is content extraction; a CSR-certified transcript is a sworn record. The two serve different purposes and the line is firm. Where AI transcription as structured Markdown does pay off is upstream of the certified record: pre-trial review of recorded conversations, internal witness-prep workflow, your own attorney work-product notes from voice memos, and rapid review of the audio recordings that inevitably accompany discovery production. This article covers that scope honestly.
What AI transcription is, and is not
The honest scope, repeated because the disclaimer matters and because your malpractice carrier will care:
Use a CSR (Certified Shorthand Reporter) for: depositions intended to be used as testimony, court hearings, sworn statements, any audio you intend to introduce as evidence, anything where chain of custody and authentication will be questioned. CSR transcripts come with a certificate of accuracy, the reporter's signature, and a recognized professional standard the court will accept. Cost: typically $3-7 per page, plus appearance fees, plus expedite fees if you need fast turnaround.
Use AI transcription for: your own voice memos and case notes, recorded client meetings (with consent), witness-prep sessions you're conducting privately, pre-trial review of recordings produced in discovery, internal review of body-cam footage or 911 calls before a CSR transcribes the official version, audio attached to e-discovery productions you need to triage. Cost: free to a few dollars per hour of audio.
The two workflows complement each other. Spend CSR money on the small fraction of audio that will become trial evidence; use AI transcription on the much larger volume of audio that lives in case files, prep work, and discovery review.
Pre-trial review of audio in discovery
Modern discovery productions routinely include hundreds of hours of audio — recorded business calls, voicemails, customer-service recordings, body-cam footage, surveillance audio, intercepted communications. The first-pass review problem is the same as for documents: most of the audio is irrelevant, a small fraction matters, and the team needs to identify the relevant material without spending weeks listening to everything in real time.
The AI-assisted review workflow:
- Run every audio file in the production through audio-to-markdown, getting back a structured .md transcript per file
- Index the transcripts in a folder structure mirroring the production's Bates-numbered organization
- Review by reading, not by listening — far faster, far more searchable, and the timestamps in the transcript point straight to the relevant audio segment for confirmation
- Flag any file containing a relevant utterance for full attorney review (and, if the file may become evidence, for CSR-certified transcription)
The mass review pass that would have taken three associates two weeks of dedicated listening compresses to two days of reading. The CSR engagement happens only on the 5-10% of audio files that survive triage as evidence-relevant.
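The triage steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a finished review tool: the function names `mirrored_transcript_path` and `flag_hits` are hypothetical, the Bates-style folder name in the usage note is invented, and transcripts are assumed to use one `[MM:SS] text` line per utterance. Matching is plain case-insensitive substring search, so it will over-flag rather than under-flag.

```python
import re
from pathlib import PurePosixPath

# Assumed transcript line format: "[MM:SS] utterance text"
TIMESTAMP_LINE = re.compile(r"^\[(\d+):(\d{2})\]\s*(.+)$")

def mirrored_transcript_path(audio_path, production_root, transcript_root):
    """Map an audio file to a .md path that mirrors the production's folder layout."""
    relative = PurePosixPath(audio_path).relative_to(production_root)
    return PurePosixPath(transcript_root) / relative.with_suffix(".md")

def flag_hits(transcript_text, terms):
    """Return (timestamp, utterance) pairs for lines mentioning any search term."""
    lowered = [term.lower() for term in terms]
    hits = []
    for line in transcript_text.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match and any(term in match.group(3).lower() for term in lowered):
            hits.append((f"{match.group(1)}:{match.group(2)}", match.group(3)))
    return hits
```

For example, `mirrored_transcript_path("production/ACME000123/call.mp3", "production", "transcripts")` yields `transcripts/ACME000123/call.md`, and any file for which `flag_hits` returns a non-empty list goes to an attorney for full review, with the timestamps pointing at the audio to confirm.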
Witness prep and recorded interviews
Witness preparation often involves recording the prep session itself — for the attorney's own review, to track the witness's demeanor and consistency over multiple sessions, and to refine the prep approach. These recordings are attorney work product, not evidence. AI transcription is the right tool: fast, cheap, easy to review.
The workflow per witness:
```
Cases/
  Smith-v-Acme-2026/
    witnesses/
      Johnson-Sarah/
        2026-04-12-prep-session-1.mp3
        2026-04-12-prep-session-1.md
        2026-04-19-prep-session-2.mp3
        2026-04-19-prep-session-2.md
        deposition-2026-05-10.mp3 (pending CSR transcript)
        notes.md
      Smith-Robert/
      Williams-Diane/
```

Each prep session is recorded, transcribed to Markdown, and stored alongside the audio. Searchable across the witness's full prep history. Cross-referenceable with prior deposition testimony. Useful for refreshing recollection between sessions and for building the impeachment binder when prior inconsistent statements need to be tracked.
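Keeping every recording paired with its transcript is easy to check mechanically. The following is a small sketch, assuming the naming convention in the folder layout above (a transcript shares its audio file's name with a `.md` extension); the function name `audio_missing_transcripts` and the set of audio extensions are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumed audio extensions; adjust to whatever your recorder produces
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a"}

def audio_missing_transcripts(filenames):
    """Return audio file names that have no same-named .md transcript, sorted."""
    paths = {PurePosixPath(name) for name in filenames}
    transcribed = {p.stem for p in paths if p.suffix == ".md"}
    return sorted(
        str(p) for p in paths
        if p.suffix.lower() in AUDIO_SUFFIXES and p.stem not in transcribed
    )
```

Run against a witness folder, for instance with `[p.name for p in Path(folder).iterdir()]`, it lists recordings still waiting on a transcript. In the layout above that would be the deposition audio, which is correctly left for the CSR.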
Always confirm with the witness on tape that they consent to being recorded for prep purposes — standard practice but easy to forget.
Cost comparison: real numbers
For a typical complex civil matter with 40 hours of recorded audio across deposition prep, recorded client meetings, voice-memo case notes, and discovery audio:
| Workflow | Cost (40 hours of audio) | Turnaround | Use case |
|---|---|---|---|
| CSR-certified transcription | $120-280/hour audio (~$5,000-$11,000 total) | 5-15 business days | Trial-admissible record |
| Paid human (legal-grade) | $60-120/hour audio (~$2,400-$4,800) | 3-7 business days | Pre-trial draft |
| AI cloud transcription | $0-50 total | Hours, not days | Internal review and prep |
The principle: pay CSR rates only for the audio that will be CSR-grade evidence; use AI for everything upstream. Most matters have a 10:1 ratio of internal/prep audio to evidentiary audio. The savings on the larger pool fund the necessary spending on the smaller pool.
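The arithmetic behind the table and the 10:1 principle can be worked through directly. This is an illustrative sketch: the per-hour rates are the ranges quoted in the table, while the function names and the idea of expressing the ratio as a parameter are assumptions made for the example.

```python
# Per-hour rate ranges (USD per hour of audio), as quoted in the table above
RATES = {
    "csr": (120, 280),
    "human": (60, 120),
}

def cost_range(hours, rate_range):
    """Return the (low, high) cost for a number of audio hours."""
    low, high = rate_range
    return hours * low, hours * high

def evidentiary_hours(total_hours, internal_to_evidentiary=10):
    """Hours likely to need certification, given an internal:evidentiary ratio."""
    return total_hours / (internal_to_evidentiary + 1)

def csr_cost_after_triage(total_hours, internal_to_evidentiary=10):
    """CSR cost when only the evidentiary share is sent for certification."""
    hours = evidentiary_hours(total_hours, internal_to_evidentiary)
    return cost_range(hours, RATES["csr"])
```

Sending all 40 hours to a CSR costs $4,800 to $11,200. At a 10:1 ratio only about 3.6 of those hours are evidentiary, so the CSR spend after triage is roughly $440 to $1,020.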
Quote extraction and the Markdown advantage
The reason structured Markdown beats plain-text transcripts for legal work: search and quotability scale with structure. A 90-minute deposition transcript with no internal structure is a wall of text. The same content with H2 sections by topic, bold speaker labels, and timestamp anchors is genuinely usable.
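To make that structure concrete, here is a minimal sketch of a renderer that produces H2 topic sections, bold speaker labels, and timestamps. The function names and the input shape (a list of topics, each with utterances carrying `start`, `speaker`, and `text`) are assumptions for illustration. Grouping by topic is assumed to come from a reviewer or a separate tool, since a raw transcript does not carry it.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def render_transcript(title, sections):
    """Render (topic, utterances) pairs as structured Markdown.

    Each utterance is a dict with 'start' (seconds), 'speaker', and 'text'.
    """
    lines = [f"# {title}", ""]
    for topic, utterances in sections:
        lines += [f"## {topic}", ""]
        for utterance in utterances:
            stamp = format_timestamp(utterance["start"])
            lines.append(f"**{utterance['speaker']}** [{stamp}] {utterance['text']}")
            lines.append("")
    return "\n".join(lines)
```

Each utterance becomes one self-contained line, so a copied quote carries its speaker and timestamp with it.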
For the impeachment binder workflow:
- Read the prior testimony transcript (or the prior recorded prep session, etc.)
- Search for every utterance on the topic in question
- Copy the relevant passage with its timestamp
- Cross-reference against the new statement
- Build the binder entry: "Witness previously stated [quote with timestamp]; today states [contradicting quote]"
Done from a folder of structured Markdown transcripts, this is a same-day exercise. Done from raw audio files or from PDF transcripts produced by a court reporter (no semantic search, broken across page boundaries), it's a multi-day exercise that often gets skipped under tight deadlines.
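The binder steps listed above might be sketched as follows. This is an assumption-laden illustration rather than a prescribed method: `find_passages` and `binder_entry` are hypothetical names, the transcript is assumed to use `[MM:SS] text` lines, and the quotes in the usage note are invented sample data.

```python
import re

# Assumed transcript line format: "[MM:SS] utterance text"
TIMESTAMP_LINE = re.compile(r"^\[(\d+:\d{2})\]\s*(.+)$")

def find_passages(transcript_text, topic):
    """Return (timestamp, quote) pairs for every utterance mentioning the topic."""
    passages = []
    for line in transcript_text.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match and topic.lower() in match.group(2).lower():
            passages.append((match.group(1), match.group(2)))
    return passages

def binder_entry(witness, prior_source, prior_passage, current_quote):
    """Format one impeachment-binder entry from a prior passage and a new statement."""
    timestamp, quote = prior_passage
    return (
        f'{witness} previously stated "{quote}" ({prior_source}, {timestamp}); '
        f'today states "{current_quote}"'
    )
```

Given a prior transcript containing `[12:04] I signed the form on Monday.`, searching for `signed` returns that passage with its timestamp, and `binder_entry` pairs it with the contradicting statement. The attorney still decides whether the two statements actually conflict.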
The privilege and confidentiality question
Any cloud transcription service involves uploading audio to a third party. For audio that contains privileged communications, attorney work product, or client confidences, this is a real consideration. Two approaches:
- Cloud transcription with vendor diligence: review the vendor's terms of service for data retention and use rights. Use the cloud workflow for audio that doesn't contain privileged content (publicly available recordings, recorded depositions where the deponent is not your client, body-cam footage produced in discovery).
- Local-only transcription: run OpenAI's open-weights Whisper model on your own machine for audio containing privileged communications. The audio never leaves your network. Setup:
```python
import whisper
from pathlib import Path

# Load the model once and reuse it; large-v3 is the most accurate Whisper model
model = whisper.load_model("large-v3")

def transcribe_locally(audio_path):
    """Transcribe one audio file and write a timestamped .md file next to it."""
    result = model.transcribe(str(audio_path))
    md = Path(audio_path).with_suffix(".md")
    with open(md, "w", encoding="utf-8") as f:
        f.write(f"# {Path(audio_path).stem}\n\n")
        for seg in result["segments"]:
            # Segment start time is in seconds; render it as [MM:SS]
            mins = int(seg["start"] // 60)
            secs = int(seg["start"] % 60)
            f.write(f"[{mins:02d}:{secs:02d}] {seg['text'].strip()}\n\n")
    return md

# Transcribe every MP3 in the witness-prep folder
for audio in Path("witness-prep/").glob("*.mp3"):
    transcribe_locally(audio)
```

The Whisper large-v3 model runs at roughly real-time on a modern CPU and roughly 5-10x real-time on a consumer GPU, though actual throughput depends heavily on the hardware. As a rough guide, a 90-minute prep session transcribes in about 90 minutes on a MacBook (as a background task) or 10-15 minutes on a desktop with a GPU. For privileged material, this is the right tool.
For speaker diarization (separating the attorney's voice from the witness's), pair Whisper with pyannote.audio, or use WhisperX, which bundles both. The technical details are in speaker identification: how it works.
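Once a diarization tool has produced speaker turns, combining them with transcript segments is a matter of matching time ranges. The sketch below shows one common approach, assigning each segment to the speaker whose turn overlaps it most. It deliberately calls no diarization library: the `(start, end, speaker)` tuple format for turns and the function names are assumptions, and only the `start`, `end`, and `text` keys on segments come from Whisper's output.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time ranges (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    segments: dicts with 'start', 'end', 'text' (seconds).
    turns: (start, end, speaker) tuples from a diarization step (assumed format).
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Segments that overlap no turn are labeled `UNKNOWN` rather than guessed, which keeps misattributed quotes out of the transcript.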
Cross-feature: the documentary side of the case file
Most cases combine recorded audio with substantial documentary evidence — pleadings, exhibits, contracts, medical records, financial documents. A unified Markdown corpus across audio transcripts and converted documents is the substrate for AI-assisted review.
For the documentary side, see URL to Markdown for legal evidence for web-based sources (corporate disclosures, social posts, marketing pages) and the standard PDF-to-Markdown workflow for case documents. The same Bates-numbered folder structure holds audio transcripts and document conversions; the same AI assistant can search across both.
Useful prompts when the case file is fully Markdownized:
- "Find every reference across these depositions and documents to [specific factual claim]."
- "Identify every inconsistency between these three witnesses' accounts of [event]."
- "Pull every document or transcript passage that supports our theory of [issue]."
The AI is doing first-pass associate work. Final review and judgment remain human; the time saved is the speedup on the mechanical search-and-flag stage.
Voice-memo case notes
Many trial lawyers dictate notes throughout the day — observations from court, thoughts on strategy, instructions for staff, draft language for filings. These voice memos are typically scattered across phone notes apps, never transcribed, and lost when the matter closes.
Running the day's voice memos through transcription at end of day produces a daily case-notes Markdown file. Stored in the matter folder, indexed by date, searchable across the life of the case. Three years into a long-running matter, this corpus of contemporaneous attorney impressions is genuinely valuable for trial prep, settlement positioning, and (in the worst case) responding to malpractice claims that depend on what was thought when.
For a busy litigator generating an hour of voice memos per week, this workflow takes ten minutes per week to maintain and accumulates a case-history archive that didn't exist before.
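The end-of-day roll-up can be a very small script. Here is a minimal sketch, assuming each memo has already been transcribed and is available as a recording time plus its text; the function name `daily_notes` and the `HH:MM` time format are assumptions for illustration.

```python
def daily_notes(date, memos):
    """Combine one day's memo transcripts into a single Markdown document.

    memos: list of (recorded_at, text) tuples, with recorded_at as 'HH:MM'.
    """
    lines = [f"# Case notes {date}", ""]
    # Sorting on the zero-padded time string puts memos in chronological order
    for recorded_at, text in sorted(memos):
        lines += [f"## {recorded_at}", "", text.strip(), ""]
    return "\n".join(lines)
```

Writing the result to a date-named file in the matter folder gives the date index for free, and a plain text search then covers the whole life of the case.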
The summary, with the disclaimer repeated
For trial-admissible records: hire a court reporter. The CSR-certified transcript is the standard the court accepts; AI transcription is not a substitute, and presenting an AI transcript as the official record can have serious professional consequences. For pre-trial review, witness prep, internal case notes, mass triage of audio in discovery, and any audio you're processing for your own attorney work product: AI transcription as structured Markdown saves substantial time and money. The two pipelines are complementary; using both well is the modern litigator's audio workflow.
Pre-trial audio → upload to audio-to-markdown (or local Whisper for privileged material) → review in Markdown → flag evidentiary segments for CSR transcription → integrate with documentary case file via URL to Markdown for legal evidence → search the unified corpus. For sales-call workflows that share the structured-transcript pattern (without the legal stakes), see audio to Markdown for sales.