Pandoc is one of the most capable document conversion tools available. It converts between dozens of formats, including .docx to Markdown, and gives you fine-grained control over the output. This guide covers what you need to use Pandoc for Word to Markdown conversion.
What Is Pandoc?
Pandoc is a free, open-source command-line tool that converts documents between formats. It supports over 40 input and output formats. For Word to Markdown conversion, Pandoc is the gold standard — more configurable than any browser-based tool, including WordToMD.
When to use Pandoc vs. WordToMD:
| Scenario | Pandoc | WordToMD |
|---|---|---|
| One-off conversion | ❌ Requires install | ✅ Instant, browser-based |
| Batch conversion | ✅ Script it | ❌ One file at a time |
| Custom output format | ✅ Extensive options | ❌ Standard GFM only |
| Privacy-sensitive docs | ✅ Local processing | ✅ Browser-only, no upload |
| No terminal experience | ❌ CLI knowledge needed | ✅ Drag and drop |
| CI/CD pipeline | ✅ Automatable | ❌ Not automatable |
Installing Pandoc
Windows
Download the installer from pandoc.org/installing.html, or use a package manager:
# Winget
winget install pandoc
# Chocolatey
choco install pandoc
# Scoop
scoop install pandoc
macOS
# Homebrew
brew install pandoc
# MacPorts
sudo port install pandoc
Linux
# Ubuntu/Debian
sudo apt-get install pandoc
# Fedora
sudo dnf install pandoc
# Arch
sudo pacman -S pandoc
Verify installation:
pandoc --version
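In scripts, it can help to check for Pandoc up front rather than fail halfway through a run. A minimal sketch, assuming a POSIX shell; the helper name require_cmd is illustrative and not part of Pandoc:

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
# require_cmd is a hypothetical helper name used only for this example.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Error: '$1' not found on PATH. Install it and retry." >&2
  return 1
}

# Usage: stop early if Pandoc is missing, otherwise show the first version line.
# require_cmd pandoc && pandoc --version | head -n 1
```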
Basic Conversion Command
pandoc document.docx -o output.md
That’s it. Pandoc infers the input format from .docx and the output format from .md.
More explicit version:
pandoc -f docx -t markdown document.docx -o output.md
Output Format Options
Pandoc offers several Markdown variants as output:
# GitHub Flavored Markdown (GFM)
pandoc document.docx -t gfm -o output.md
# CommonMark
pandoc document.docx -t commonmark -o output.md
# Pandoc's extended Markdown (most features)
pandoc document.docx -t markdown -o output.md
# Original Markdown (no Pandoc extensions)
pandoc document.docx -t markdown_strict -o output.md
GFM is recommended for most use cases (GitHub, Obsidian, GitBook, Notion).
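If you are unsure which variant suits your target platform, converting the same file to each one and comparing the results is a quick way to decide. The sketch below assumes a POSIX shell; the function name compare_variants and the PANDOC variable are illustrative assumptions, not Pandoc features:

```shell
# Convert one .docx to each Markdown variant so the results can be diffed.
# PANDOC is an assumption of this sketch: it defaults to "pandoc" and only
# exists so a different binary can be substituted.
compare_variants() {
  input=$1
  base=${input%.docx}
  for variant in gfm commonmark markdown markdown_strict; do
    "${PANDOC:-pandoc}" "$input" -t "$variant" -o "$base.$variant.md"
  done
}

# Usage:
# compare_variants document.docx
# diff document.gfm.md document.markdown.md
```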
Useful Conversion Options
Extract Media Files
Images embedded in .docx are extracted to a directory:
pandoc document.docx --extract-media=./media -o output.md
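This saves the images under ./media/ and adds Markdown image references to the output. Pandoc keeps the .docx archive's internal media/ folder, so the files typically land in ./media/media/.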
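When several documents share one media directory, their images can overwrite each other, because Word tends to name embedded files image1.png, image2.png, and so on. One way around this is a separate media folder per document. A minimal sketch, assuming a POSIX shell; convert_with_media and the PANDOC variable are illustrative names:

```shell
# Convert a .docx to GFM, extracting its images into "<name>_media/".
# PANDOC defaults to "pandoc"; the variable is an assumption of this sketch.
convert_with_media() {
  input=$1
  base=${input%.docx}
  "${PANDOC:-pandoc}" "$input" -t gfm --wrap=none \
    --extract-media="${base}_media" -o "$base.md"
}

# Usage:
# convert_with_media report.docx   # writes report.md and report_media/
```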
Wrap Lines
Control line wrapping in the output:
# No wrapping (one paragraph = one line)
pandoc document.docx --wrap=none -o output.md
# Wrap at 80 characters
pandoc document.docx --wrap=auto --columns=80 -o output.md
--wrap=none is recommended for Markdown that will be version-controlled — it produces cleaner Git diffs.
Table of Contents
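pandoc document.docx -s --toc -o output.md
The --toc flag only takes effect together with -s (standalone); without it, no table of contents is written.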
Standalone Document with Metadata
pandoc document.docx -s -o output.md
The -s (standalone) flag generates YAML frontmatter from the document’s metadata.
Batch Conversion
Convert all .docx files in a directory:
Windows PowerShell
Get-ChildItem -Filter "*.docx" | ForEach-Object {
    $output = [System.IO.Path]::ChangeExtension($_.Name, ".md")
    pandoc $_.FullName -t gfm --wrap=none -o $output
    Write-Host "Converted: $($_.Name) → $output"
}
Bash (macOS/Linux)
for file in *.docx; do
    output="${file%.docx}.md"
    pandoc "$file" -t gfm --wrap=none -o "$output"
    echo "Converted: $file → $output"
done
For more batch conversion options, see Batch Convert Word to Markdown.
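The loops above handle a single directory. For a nested folder tree, a find-based variant is one option. The sketch below assumes a POSIX shell, skips the "~$" lock files Word creates while a document is open, and does not handle file names containing newlines; convert_tree and the PANDOC variable are illustrative names:

```shell
# Recursively convert every .docx under a directory, next to its source file.
# Skips Word's "~$" lock files. PANDOC defaults to "pandoc"; the variable is
# an assumption of this sketch.
convert_tree() {
  root=${1:-.}
  find "$root" -type f -name '*.docx' ! -name '~$*' | while IFS= read -r file; do
    output="${file%.docx}.md"
    if "${PANDOC:-pandoc}" "$file" -t gfm --wrap=none -o "$output" </dev/null; then
      echo "Converted: $file"
    else
      echo "Failed: $file" >&2
    fi
  done
}

# Usage:
# convert_tree ./docs
```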
Customizing Output with Lua Filters
Pandoc supports Lua filters that transform the document AST during conversion:
pandoc document.docx --lua-filter=my-filter.lua -o output.md
Example Lua filter that sets metadata fields (they appear in the YAML front matter when you convert with -s):
-- add-frontmatter.lua
function Meta(meta)
  meta.draft = false
  meta.author = "WordToMD Team"  -- replaces any author set in the document
  return meta
end
This is the kind of customization that browser tools can’t match.
Comparing Pandoc and mammoth.js Output
WordToMD uses mammoth.js under the hood. Pandoc uses its own parser. Key differences:
| Feature | Pandoc | mammoth.js (WordToMD) |
|---|---|---|
| Image extraction | ✅ --extract-media | ⚠️ Noted in Conversion Notes |
| Custom styles | ✅ Via -f docx+styles | ⚠️ Logged as warnings |
| Math equations | ✅ OMML → LaTeX | ❌ Not supported |
| Track changes | ✅ Configurable | ❌ Stripped |
| Footnotes | ✅ Preserved | ⚠️ Inline only |
| Comments | ✅ Optional | ❌ Stripped |
For complex documents, Pandoc gives more complete output. For simple documents, WordToMD is faster with zero setup.
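The track-changes, comments, and custom-style rows in the table correspond to Pandoc's --track-changes option and the docx reader's styles extension. A minimal sketch of the three modes, assuming a POSIX shell; convert_review_modes, the output file names, and the PANDOC variable are illustrative:

```shell
# Three ways to treat revision marks and styles when reading a .docx.
# PANDOC defaults to "pandoc"; the variable is an assumption of this sketch.
convert_review_modes() {
  input=$1
  base=${input%.docx}
  # Accept all tracked changes (Pandoc's default behavior).
  "${PANDOC:-pandoc}" "$input" --track-changes=accept -t gfm -o "$base.accepted.md"
  # Keep insertions, deletions, and comments as marked-up spans.
  "${PANDOC:-pandoc}" "$input" --track-changes=all -t markdown -o "$base.review.md"
  # Preserve Word custom style names as custom-style attributes.
  "${PANDOC:-pandoc}" "$input" -f docx+styles -t markdown -o "$base.styled.md"
}

# Usage:
# convert_review_modes contract.docx
```

The second and third commands target Pandoc's own Markdown because it can represent spans and attributes that plainer variants cannot.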
Pandoc Defaults Files
For repeated conversions with the same settings, create a defaults file:
# my-defaults.yaml
from: docx
to: gfm
wrap: none
extract-media: ./media
standalone: true
Then run:
pandoc document.docx -d my-defaults.yaml -o output.md
Using Pandoc in CI/CD
For automated pipelines (GitHub Actions, etc.):
# .github/workflows/convert-docs.yml (steps excerpt)
- name: Install Pandoc
  run: sudo apt-get update && sudo apt-get install -y pandoc
- name: Convert Word docs to Markdown
  run: |
    for f in docs/*.docx; do
      pandoc "$f" -t gfm --wrap=none -o "${f%.docx}.md"
    done
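In a pipeline it is often useful to attempt every file, list each one that fails, and still end with a non-zero exit code so the job is marked as failed. A minimal sketch, assuming a POSIX shell and a docs directory as in the workflow above; convert_docs_strict and the PANDOC variable are illustrative names:

```shell
# Convert every .docx in a directory and return non-zero if any file fails,
# so a CI step that calls this function fails visibly. PANDOC defaults to
# "pandoc"; the variable is an assumption of this sketch.
convert_docs_strict() {
  dir=${1:-docs}
  status=0
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if ! "${PANDOC:-pandoc}" "$f" -t gfm --wrap=none -o "${f%.docx}.md"; then
      echo "Failed: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage inside a workflow "run" step:
# convert_docs_strict docs
```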
FAQ
Pandoc outputs \ line continuations in my Markdown. How do I remove them?
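Those trailing backslashes are hard line breaks (usually Shift+Enter in Word), not line wrapping, so --wrap=none does not remove them. With Pandoc's own Markdown, -t markdown-escaped_line_breaks writes them as two trailing spaces instead. To drop them entirely, use a Lua filter that replaces LineBreak elements with spaces.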
My tables look garbled in Pandoc output.
Try -t gfm for GFM table syntax instead of the default Pandoc Markdown tables.
Images aren’t showing up in the output.
Add --extract-media=./images to extract embedded images. Then reference them correctly in your target environment.
Pandoc converts smart quotes to " characters — how do I keep them?
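With Pandoc's own Markdown (-t markdown), the smart extension is on by default, so curly quotes are written as straight quotes. Use -t markdown-smart to keep the curly characters. The gfm and commonmark writers do not enable smart by default, so they should preserve curly quotes as-is.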
How do Pandoc and WordToMD compare for DOCX to Markdown?
Both work well for standard documents. WordToMD requires zero setup and produces clean GFM output. Pandoc requires installation but handles complex documents (images, math, comments, footnotes) more completely.
Conclusion
Pandoc is the most capable tool for Word to Markdown conversion, especially for batch processing, custom output formats, and CI/CD pipelines. For quick, one-off conversions without installation, WordToMD gets the job done instantly. The two tools complement each other well.