A converter built for the model, not the reader

MarkItDown is a Python utility from Microsoft’s AutoGen team that turns files into Markdown: PDF, Word, PowerPoint, Excel, images, audio, HTML, CSV, JSON, even YouTube URLs and EPUBs. The README is candid about who the output is for, and you should internalize that before adopting it. This is conversion for large language models and text-analysis pipelines, not for humans who want a faithful reproduction of the original document.

That distinction is the entire reason to pick it or skip it. Markdown is close to plain text, carries just enough structure (headings, lists, tables, links), and is something mainstream models already read fluently and token-efficiently. MarkItDown optimizes for that. If you need a pixel-faithful or print-ready rendering of a complex PDF for a person to read, the project itself points you elsewhere.

What it converts, and the catch with extras

Format coverage is broad, but it is gated behind optional dependencies. A bare install will not handle PDFs or Office files until you pull the matching extras. The simplest path is everything:

pip install 'markitdown[all]'

For a leaner footprint, install only what you touch:

pip install 'markitdown[pdf, docx, pptx]'

This is the most common first-week stumble: install the base package, feed it a PDF, and wonder why support seems missing. It is not missing; it is unselected. The available extras include [pdf], [docx], [pptx], [xlsx], [outlook], [audio-transcription], and [youtube-transcription], among others.
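A quick way to see which extras are actually present is to check whether their underlying modules can be imported. The sketch below does that with the standard library only. The mapping from extra name to module name is an assumption made for illustration, not something taken from MarkItDown's documentation, so verify it against the package metadata of the version you install.

```python
from importlib.util import find_spec

# ASSUMPTION: illustrative mapping from extra name to top-level import names.
# The real dependency list lives in MarkItDown's package metadata and may differ.
ASSUMED_EXTRA_MODULES = {
    "pdf": ["pdfminer"],
    "docx": ["mammoth"],
    "pptx": ["pptx"],
    "xlsx": ["pandas", "openpyxl"],
}


def missing_extras(extra_modules):
    """Return {extra: [modules that cannot be imported]} for incomplete extras."""
    missing = {}
    for extra, modules in extra_modules.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[extra] = absent
    return missing


if __name__ == "__main__":
    for extra, modules in missing_extras(ASSUMED_EXTRA_MODULES).items():
        print(f"markitdown[{extra}] looks uninstalled (missing: {', '.join(modules)})")
```

Running this before a batch job turns a confusing "unsupported format" failure into an explicit reminder of which extra to install.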

Usage

From the command line, write to a file with -o (redirecting standard output to a file also works):

markitdown path-to-file.pdf -o document.md

From Python, the API is one class and one call:

from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.xlsx")
print(result.text_content)
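In a pipeline you rarely convert one file. The following is a minimal sketch of a batch helper built on the same convert() call and text_content attribute shown above. The function name, the default suffix list, and the "name.ext.md" output naming are choices made for this example, not part of MarkItDown.

```python
from pathlib import Path


def convert_tree(converter, src_dir, out_dir, suffixes=(".pdf", ".docx", ".xlsx")):
    """Convert matching files under src_dir into Markdown files under out_dir.

    `converter` is any object with a convert(path) method whose result has a
    text_content attribute, as MarkItDown() does in the snippet above.
    Returns (written, failed): output paths and (source_path, error) pairs.
    """
    src, out = Path(src_dir), Path(out_dir)
    written, failed = [], []
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Keep the original extension in the name so a.pdf and a.docx do not collide.
        target = out / path.relative_to(src).parent / (path.name + ".md")
        try:
            text = converter.convert(str(path)).text_content
        except Exception as exc:  # one bad file should not stop the batch
            failed.append((path, exc))
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written, failed
```

With the package installed, a call would look like convert_tree(MarkItDown(), "inbox", "markdown"). Collecting failures instead of raising makes it easy to see which formats need an extra or a different converter.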

For image descriptions on pictures and slides, you pass an LLM client and model rather than relying on a built-in vision step:

from markitdown import MarkItDown
from openai import OpenAI

md = MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o")
result = md.convert("slide.jpg")
print(result.text_content)

The three conversion tiers most people miss

MarkItDown is not one converter but three quality tiers, and choosing the wrong one wastes either money or fidelity:

  • Built-in converters run offline, format by format, on local compute. Free, private, and the right default for clean digital documents.
  • Azure Document Intelligence adds cloud layout extraction for scanned PDFs and complex tables. Billable, and worth it when local extraction garbles structure.
  • Azure Content Understanding is the only tier that handles video, does higher-quality audio, and can emit structured fields (invoice amounts, dates) as YAML front matter. Also billable per convert() call, so scope it with cu_file_types rather than routing everything through it.

The practical advice the docs bury: start with built-in, escalate to a cloud tier only for the formats that actually fail locally.
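That escalation rule can be sketched as a small routing function. Everything here is illustrative: the function name, the min_chars threshold used as a stand-in for "local extraction failed", and the cloud_suffixes allow-list are assumptions for this example. The two converters are passed in, so the sketch makes no claim about how a cloud tier is configured.

```python
from pathlib import Path


def convert_with_escalation(path, local, cloud, cloud_suffixes=(".pdf",), min_chars=200):
    """Try the local converter first; escalate only when the result looks unusable.

    `local` and `cloud` are objects with a convert(path) method returning
    something with a text_content attribute. `min_chars` is a crude,
    hypothetical heuristic (a scanned PDF often yields almost no text).
    Returns (tier_name, markdown_text).
    """
    try:
        text = local.convert(str(path)).text_content or ""
    except Exception:
        text = ""  # treat a local failure like an empty result
    if len(text.strip()) >= min_chars:
        return "built-in", text
    if Path(path).suffix.lower() not in cloud_suffixes:
        return "built-in", text  # not a format we agreed to pay for
    return "cloud", cloud.convert(str(path)).text_content
```

The allow-list is the cost control: only formats you have seen fail locally are ever sent to a billable tier, and everything else keeps the free result even when it is short.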

The security note you should not skim

MarkItDown performs I/O with the privileges of the current process, the same way open() or requests.get() would. In an untrusted environment, that means a malicious input can reach whatever the process can reach. The maintainers’ guidance is to sanitize inputs and call the narrowest function for the job, such as convert_stream() or convert_local(), rather than a broad convenience entry point. If you are converting user-uploaded files on a server, read that section before shipping.
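For the user-upload case, one common form of input sanitizing is to confirm that a requested name resolves to a real file inside the upload directory before anything is converted. The helper below is a minimal sketch of that check, not the maintainers' recommended implementation. Its name and error messages are invented for this example, and it covers only path traversal, not malicious file contents.

```python
from pathlib import Path


def resolve_upload(base_dir, user_supplied_name):
    """Return an absolute path inside base_dir, or raise ValueError.

    Rejects names that escape base_dir through "..", absolute paths, or
    symlinks, and anything that is not a regular file.
    """
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_supplied_name).resolve()
    if base not in candidate.parents:
        raise ValueError(f"path escapes upload directory: {user_supplied_name!r}")
    if not candidate.is_file():
        raise ValueError(f"not a regular file: {user_supplied_name!r}")
    return candidate
```

The validated path can then go to the narrow entry point named above, for example md.convert_local(str(resolve_upload(upload_dir, name))), so a string that happens to look like a URL is never fetched.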

markitdown versus the other document-to-text tools

| | markitdown | docling | unstructured | textract |
| --- | --- | --- | --- | --- |
| Stars | 149,717 | 61,296 | 14,876 | 4,609 |
| Output | Markdown for LLMs | structured doc model | elements for RAG | plain text |
| License | MIT | MIT | Apache-2.0 | MIT |
| Sweet spot | broad formats, fast LLM prep | rich layout fidelity | enterprise ingestion pipelines | legacy text extraction |

Counts are from GitHub as of June 2026. Docling goes deeper on layout and structure, which matters when document fidelity is the point. Unstructured is oriented around chunked elements for retrieval pipelines. Textract is the older, plain-text ancestor MarkItDown explicitly compares itself to. MarkItDown’s edge is breadth and speed of getting reasonable Markdown out of almost anything, including audio and YouTube, with minimal setup.

What the version and tracker tell you

MarkItDown is still pre-1.0, with v0.1.6 tagged in May 2026, and carries a large open-issue count (826 as of 2026-06). That is the cost of supporting so many formats: every format is its own long tail of edge cases. The most-discussed open thread is about deeper LLM integration. Treat it as a fast-moving, broadly useful tool rather than a frozen standard, and pin your version if you depend on exact output.
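Pinning normally happens in your dependency file (an exact requirement such as markitdown==0.1.6, using the version quoted above). If exact output matters, a pipeline can also refuse to run against an unexpected version. The check below is a small sketch using only the standard library. The function name and the optional installed parameter are inventions for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(package, expected, installed=None):
    """Return True when the installed version of `package` equals `expected`.

    `installed` can be passed explicitly (handy in tests); otherwise it is
    looked up from the current environment's package metadata.
    """
    if installed is None:
        try:
            installed = version(package)
        except PackageNotFoundError:
            return False  # package is not installed at all
    return installed == expected


if __name__ == "__main__":
    if not check_pin("markitdown", "0.1.6"):
        raise SystemExit("markitdown is missing or not the pinned version")
```

Failing fast at startup is cheaper than discovering after a silent upgrade that the generated Markdown has shifted under a downstream prompt or parser.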

For turning the Markdown you extract into a local model’s input, see Ollama.

FAQ

Is MarkItDown good for high-fidelity document conversion? No, and it says so. It targets Markdown for LLMs and text analysis, not faithful human-facing reproduction. Use a layout-focused tool like Docling for that.

Why does my install not handle PDFs? You likely installed the base package without extras. Install markitdown[all] or the specific extras such as markitdown[pdf].

Do I need an Azure subscription? Only for the Document Intelligence and Content Understanding tiers. The built-in converters run locally and free.

Is it safe to convert untrusted files? Only with care. It runs I/O at the process’s privilege level; sanitize inputs and use the narrowest convert_* function in untrusted environments.