Convert MediaWiki to DocBook
MediaWiki vs DocBook Format Comparison
| Aspect | MediaWiki (Source Format) | DocBook (Target Format) |
|---|---|---|
| Format Overview |
MediaWiki
MediaWiki Markup Language
Wiki markup language created by Magnus Manske and Lee Daniel Crocker for Wikipedia in 2002. Uses syntax like == headings ==, '''bold''', ''italic'', [[links]], and template transclusion. Powers Wikipedia, Wikimedia Commons, Fandom, and thousands of other wikis with collaborative editing and versioning. |
DocBook
DocBook XML Schema
Semantic XML vocabulary for technical documentation and publishing, maintained by OASIS. Originally developed at HaL Computer Systems and O'Reilly Media in 1991. Provides a comprehensive set of XML elements for books, articles, reference pages, and technical manuals with rich semantic markup for professional publishing. |
| Technical Specifications |
Type: Wiki markup language
Encoding: UTF-8
MIME Type: text/x-wiki
Extensions: .mediawiki, .wiki, .txt
Parser: MediaWiki parser, Parsoid
Extensibility: Lua modules, extensions |
Type: XML vocabulary/schema
Encoding: UTF-8 (XML default)
MIME Type: application/docbook+xml
Extensions: .xml, .dbk, .docbook
Schema: RELAX NG, DTD, XSD
Standard: OASIS DocBook 5.1 |
| Syntax Examples |
MediaWiki lightweight markup:
== Chapter One ==
This is '''bold''' and ''italic''.
=== Section 1.1 ===
A [[link]] and a list:
* Item one
* Item two
{| class="wikitable"
|-
! Header !! Value
|-
| Data || 42
|}
|
DocBook XML semantic markup:
<chapter>
<title>Chapter One</title>
<para>This is <emphasis role="bold">bold</emphasis>
and <emphasis>italic</emphasis>.</para>
<section>
<title>Section 1.1</title>
<itemizedlist>
<listitem><para>Item one</para></listitem>
<listitem><para>Item two</para></listitem>
</itemizedlist>
</section>
</chapter>
|
| Content Support |
|
|
| Advantages |
|
|
| Disadvantages |
|
|
| Common Uses |
|
|
| Best For |
|
|
| Version History |
Introduced: 2002 (for Wikipedia)
Creators: Magnus Manske, Lee Daniel Crocker
Status: Actively maintained
Evolution: Parsoid, VisualEditor, Lua |
Introduced: 1991 (HaL/O'Reilly)
Current Version: DocBook 5.1 (OASIS, 2016)
Status: Stable OASIS standard
Evolution: SGML to XML, DTD to RELAX NG |
| Software Support |
MediaWiki: Native rendering
Pandoc: Read/write support
Editors: VisualEditor, WikiEditor
Other: Parsoid, wiki tools |
XSLT Stylesheets: DocBook XSL (Bob Stayton)
Pandoc: Full read/write support
Editors: Oxygen XML, XMLmind, VS Code
Other: FOP, xsltproc, Saxon |
Why Convert MediaWiki to DocBook?
Converting MediaWiki markup to DocBook XML transforms collaborative wiki content into a semantically rich document format designed for technical publishing. DocBook is a long-established standard for authoring technical books, manuals, and reference documentation, used by O'Reilly Media, the Linux Documentation Project, and many software vendors. This conversion bridges the gap between wiki-based content creation and professional publishing workflows.
MediaWiki's lightweight markup with == headings ==, '''bold''', ''italic'', [[links]], and table syntax is powerful for wiki environments but lacks the semantic precision that professional publishing requires. DocBook XML provides elements like <chapter>, <section>, <figure>, <example>, <note>, <warning>, and <glossentry> that carry explicit meaning about the content's role and purpose, enabling sophisticated processing and output generation.
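As a rough illustration of how the inline markup above lines up with DocBook elements, here is a minimal sketch. The function name `convert_inline` is hypothetical, and the choice to keep only the visible text of a link is an assumption based on the examples on this page, not a description of the converter's internals. Combined bold-italic (five apostrophes) and nested templates are not handled.

```python
import re
from xml.sax.saxutils import escape


def convert_inline(text):
    """Convert bold, italic, and link markup in one line of wiki text."""
    # Escape &, < and > first so the wiki text cannot inject markup.
    text = escape(text)
    # Bold must be handled before italic: ''' would otherwise match ''.
    text = re.sub(r"'''(.+?)'''", r'<emphasis role="bold">\1</emphasis>', text)
    text = re.sub(r"''(.+?)''", r"<emphasis>\1</emphasis>", text)
    # [[target]] or [[target|label]]: keep only the visible text (assumption).
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]", r"\1", text)
    return text


print(convert_inline("This is '''bold''' and ''italic''."))
print(convert_inline("* [[Python]] 3.8 or later"))
```

The ordering of the two emphasis substitutions is the main design point: running the italic rule first would split every bold span incorrectly.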
One of DocBook's greatest strengths is its multi-format output capability through XSLT stylesheets. From a single DocBook XML source, you can generate HTML (chunked or single-page), PDF (via XSL-FO and a formatter such as Apache FOP), EPUB, man pages, HTML Help, and other formats. The DocBook XSL stylesheets provide extensive customization options for controlling the appearance of each output format.
This conversion is particularly valuable for organizations that started with wiki-based documentation and need to evolve to a more rigorous publishing pipeline. DocBook XML files can be validated against the official OASIS schema, ensuring structural correctness. They integrate well with XML tools and workflows, support modular document assembly through XInclude, and enable features like automatic index generation, cross-reference resolution, and bibliography management.
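To make the XInclude-based assembly concrete, the sketch below resolves `xi:include` elements with Python's standard `xml.etree.ElementInclude` module. The file names (`intro.xml`, `setup.xml`) and the in-memory `PARTS` dictionary are hypothetical stand-ins for real chapter files; a production pipeline would more likely let xsltproc or another XInclude-aware processor do this step.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DB_NS = "http://docbook.org/ns/docbook"

# Hypothetical in-memory "files"; a real project would read these from disk.
PARTS = {
    "intro.xml": f'<chapter xmlns="{DB_NS}"><title>Introduction</title>'
                 f'<para>Overview.</para></chapter>',
    "setup.xml": f'<chapter xmlns="{DB_NS}"><title>Server Setup</title>'
                 f'<para>Steps.</para></chapter>',
}

MASTER = f"""<book xmlns="{DB_NS}"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <title>Administrator's Guide</title>
  <xi:include href="intro.xml"/>
  <xi:include href="setup.xml"/>
</book>"""


def load_part(href, parse, encoding=None):
    """Loader callback: return the parsed element for an included href."""
    return ET.fromstring(PARTS[href])


def assemble(master_xml):
    """Resolve xi:include elements and return the assembled root element."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_part)
    return root


book = assemble(MASTER)
chapter_titles = [
    chapter.findtext(f"{{{DB_NS}}}title")
    for chapter in book.findall(f"{{{DB_NS}}}chapter")
]
print(chapter_titles)
```

Keeping each chapter in its own file this way lets several authors edit in parallel while the master document only lists the parts.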
Key Benefits of Converting MediaWiki to DocBook:
- Professional Publishing: Industry-standard format for technical books and manuals
- Semantic Richness: Explicit content meaning with dedicated XML elements
- Multi-Format Output: Generate PDF, HTML, EPUB, man pages from one source
- Schema Validation: Verify document structure against OASIS standard
- Modular Assembly: Build large documents from reusable XML components via XInclude
- Professional Indexes: Automatic index, glossary, and bibliography generation
- Toolchain Integration: Works with industry XML tools (Oxygen, Saxon, FOP)
Practical Examples
Example 1: Wiki Article to DocBook Chapter
Input MediaWiki file (guide.mediawiki):
== Installation ==
'''Prerequisites:'''
* [[Python]] 3.8 or later
* [[pip]] package manager
* 2 GB free disk space
=== Linux Installation ===
Run the following command:
 sudo apt install mediawiki
{{Note|Root access is required}}
Output DocBook file (guide.xml):
<chapter xml:id="installation">
<title>Installation</title>
<para><emphasis role="bold">Prerequisites:</emphasis></para>
<itemizedlist>
<listitem><para>Python 3.8 or later</para></listitem>
<listitem><para>pip package manager</para></listitem>
<listitem><para>2 GB free disk space</para></listitem>
</itemizedlist>
<section xml:id="linux-installation">
<title>Linux Installation</title>
<para>Run the following command:</para>
<programlisting language="bash">sudo apt install mediawiki</programlisting>
<note><para>Root access is required</para></note>
</section>
</chapter>
Example 2: Wiki Reference Page to DocBook Reference
Input MediaWiki file (api_ref.mediawiki):
== API Reference ==
=== getUserById ===
Returns user data by ID.
{| class="wikitable"
|-
! Parameter !! Type !! Description
|-
| id || ''integer'' || User ID (required)
|-
| fields || ''string'' || Comma-separated field list
|}
{{Warning|Requires authentication token}}
Output DocBook file (api_ref.xml):
<chapter xml:id="api-reference">
<title>API Reference</title>
<refentry xml:id="getuserbyid">
<refnamediv>
<refname>getUserById</refname>
<refpurpose>Returns user data by ID</refpurpose>
</refnamediv>
<refsection>
<title>Description</title>
<table>
<title>Parameters</title>
<tgroup cols="3">
<thead><row>
<entry>Parameter</entry>
<entry>Type</entry>
<entry>Description</entry>
</row></thead>
<tbody>...</tbody>
</tgroup>
</table>
<warning><para>Requires authentication token</para></warning>
</refsection>
</refentry>
</chapter>
Example 3: Wiki Content to DocBook Book Structure
Input MediaWiki file (book.mediawiki):
= MediaWiki Administrator's Guide =
== Introduction ==
This guide covers '''MediaWiki administration''' for system administrators.
== Server Setup ==
=== Hardware Requirements ===
* CPU: 2+ cores recommended
* RAM: ''4 GB minimum''
* Storage: 50 GB for wiki + database
=== Software Stack ===
# Install [[Apache]] or [[Nginx]]
# Install [[PHP]] 8.0+
# Install [[MySQL]] or [[PostgreSQL]]
[[Category:Administration]]
[[Category:Server Setup]]
Output DocBook file (book.xml):
<book xml:id="mediawiki-admin-guide">
<info>
<title>MediaWiki Administrator's Guide</title>
<subjectset>
<subject><subjectterm>Administration</subjectterm></subject>
<subject><subjectterm>Server Setup</subjectterm></subject>
</subjectset>
</info>
<chapter xml:id="introduction">
<title>Introduction</title>
<para>This guide covers
<emphasis role="bold">MediaWiki administration</emphasis>...</para>
</chapter>
<chapter xml:id="server-setup">
<title>Server Setup</title>
<section>...</section>
</chapter>
</book>
Frequently Asked Questions (FAQ)
Q: What is DocBook?
A: DocBook is a semantic XML vocabulary for technical documentation, maintained by OASIS (Organization for the Advancement of Structured Information Standards). Originally developed in 1991 for computer documentation, it provides roughly 400 XML elements for structuring books, articles, and reference materials. DocBook is widely used for Linux documentation, O'Reilly technical books, and enterprise documentation systems.
Q: What output formats can I generate from DocBook?
A: DocBook XML can be transformed into HTML (single-page or chunked), PDF (via XSL-FO and FOP or via dblatex), EPUB, man pages, HTML Help (CHM), JavaHelp, plain text, RTF, and more. The DocBook XSL stylesheets provide the transformation rules, and tools like xsltproc, Saxon, or Apache FOP perform the actual conversion. Each output format can be extensively customized.
Q: How are MediaWiki headings mapped to DocBook elements?
A: MediaWiki heading levels are mapped to DocBook's hierarchical structure. Top-level headings (= ... =) become <book> or <article> titles, level 2 headings (== ... ==) become <chapter> elements, level 3 (=== ... ===) become <section> elements, and deeper levels become nested sections. This creates a proper document hierarchy with xml:id attributes for cross-referencing.
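The mapping described above can be sketched as follows. The helper names (`slugify`, `element_for_level`, `parse_heading`) are hypothetical, and the slug rule for xml:id values is an assumption inferred from the examples on this page (for instance "Linux Installation" becoming `linux-installation`); the converter's actual rule may differ. Level 1 is mapped to `book` here, although an `article` root is equally possible.

```python
import re

# One or more "=", the title, then the same run of "=" again.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def element_for_level(level):
    """Map a MediaWiki heading level to a DocBook element name."""
    if level == 1:
        return "book"
    if level == 2:
        return "chapter"
    return "section"  # level 3 and deeper become (nested) sections


def parse_heading(line):
    """Return (element, xml_id, title) for a heading line, else None."""
    match = HEADING.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    title = match.group(2)
    return element_for_level(level), slugify(title), title


for line in ["== Installation ==", "=== Linux Installation ===", "Plain text"]:
    print(line, "->", parse_heading(line))
```

Nesting depth for levels 4 and beyond would be tracked by the caller, since `section` elements in DocBook nest recursively rather than carrying a level number.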
Q: What happens to MediaWiki templates in DocBook?
A: MediaWiki templates are mapped to semantic DocBook elements. Note templates become <note>, warning templates become <warning>, tip templates become <tip>, and important templates become <important>. Infobox templates are converted to structured elements or tables. Complex templates with logic are expanded to their rendered content during conversion.
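A minimal sketch of the admonition mapping, assuming templates of the simple form `{{Name|text}}`. The `ADMONITIONS` table and the function name are hypothetical; infoboxes, named parameters, and templates containing logic would need a real template expander and are deliberately left untouched here.

```python
import re
from xml.sax.saxutils import escape

# Template name (lowercased) -> DocBook admonition element.
ADMONITIONS = {
    "note": "note",
    "warning": "warning",
    "tip": "tip",
    "important": "important",
}

# Matches {{Name|body}} with a single unnamed parameter.
TEMPLATE = re.compile(r"\{\{\s*(\w+)\s*\|\s*(.*?)\s*\}\}", re.DOTALL)


def convert_admonitions(text):
    """Replace known admonition templates with DocBook elements."""
    def replace(match):
        element = ADMONITIONS.get(match.group(1).lower())
        if element is None:
            return match.group(0)  # leave unknown templates untouched
        body = escape(match.group(2))  # escape only the template body
        return f"<{element}><para>{body}</para></{element}>"

    return TEMPLATE.sub(replace, text)


print(convert_admonitions("{{Note|Root access is required}}"))
print(convert_admonitions("{{Warning|Requires authentication token}}"))
```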
Q: Can I validate the DocBook output?
A: Yes. DocBook 5.x is defined normatively by a RELAX NG schema with additional Schematron rules; DTD and XSD versions are also published for tools that need them. Tools like xmllint, Jing, or Oxygen XML Editor can validate the output. Validating before the document enters the publishing toolchain catches structural errors early. Our converter generates valid DocBook 5.x output that passes schema validation.
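Before running a full schema validator, a quick pre-flight check can catch the most common problems. The sketch below uses only the Python standard library and is not schema validation: it checks well-formedness, the DocBook 5 namespace, and the presence of a `version` attribute on the root element. The function name `preflight` is hypothetical.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"


def preflight(xml_text):
    """Return a list of problems found before real schema validation."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed: {error}"]
    problems = []
    if not root.tag.startswith(f"{{{DB_NS}}}"):
        problems.append("root element is not in the DocBook 5 namespace")
    if root.get("version") is None:
        problems.append("root element has no version attribute")
    return problems


good = (
    f'<article xmlns="{DB_NS}" version="5.1">'
    "<title>T</title><para>Text</para></article>"
)
print(preflight(good))
print(preflight("<chapter><title>T</title></chapter>"))
```

A document that passes this check can still be invalid, so a RELAX NG validator such as Jing remains the authoritative step.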
Q: Are wiki tables converted to DocBook tables?
A: Yes, MediaWiki tables are converted to DocBook's CALS table model, which uses <table>, <tgroup>, <thead>, <tbody>, <row>, and <entry> elements. Header rows are placed in <thead>, column specifications are defined in <colspec>, and cell spanning is preserved with namest/nameend and morerows attributes. The CALS model provides precise control over table formatting.
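The table mapping can be sketched for the simplest case: one header row, plain data rows, no spans. The function names are hypothetical, and `colspec`, `namest`/`nameend`, `morerows`, and inline markup inside cells are left out to keep the example short.

```python
import xml.etree.ElementTree as ET

SAMPLE = """{| class="wikitable"
|-
! Header !! Value
|-
| Data || 42
|}"""


def parse_wikitable(markup):
    """Split a simple wikitable into (header_cells, data_rows)."""
    header, rows = [], []
    for raw in markup.splitlines():
        line = raw.strip()
        # Table delimiters, captions, and row separators carry no cells.
        if not line or line.startswith(("{|", "|}", "|-", "|+")):
            continue
        if line.startswith("!"):
            header = [cell.strip() for cell in line[1:].split("!!")]
        elif line.startswith("|"):
            rows.append([cell.strip() for cell in line[1:].split("||")])
    return header, rows


def add_row(parent, cells):
    """Append a CALS <row> with one <entry> per cell."""
    row = ET.SubElement(parent, "row")
    for cell in cells:
        ET.SubElement(row, "entry").text = cell


def to_cals(markup, title):
    """Build a CALS <table> element from simple wikitable markup."""
    header, rows = parse_wikitable(markup)
    cols = max([len(header)] + [len(cells) for cells in rows])
    table = ET.Element("table")
    ET.SubElement(table, "title").text = title
    tgroup = ET.SubElement(table, "tgroup", cols=str(cols))
    if header:
        add_row(ET.SubElement(tgroup, "thead"), header)
    tbody = ET.SubElement(tgroup, "tbody")
    for cells in rows:
        add_row(tbody, cells)
    return table


table = to_cals(SAMPLE, "Sample")
print(ET.tostring(table, encoding="unicode"))
```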
Q: Is DocBook still relevant compared to Markdown or AsciiDoc?
A: DocBook remains the gold standard for complex technical documentation that requires precise structural control, formal validation, and professional publishing output. While Markdown and AsciiDoc are easier to write, DocBook's semantic richness is unmatched for books, API references, and enterprise documentation. Many organizations use lighter formats for authoring and convert to DocBook for publishing.
Q: What tools do I need to process DocBook files?
A: For HTML output, you need an XSLT processor (xsltproc or Saxon) with the DocBook XSL stylesheets. For PDF, add Apache FOP or dblatex. For EPUB, use the DocBook XSL EPUB3 stylesheets. Pandoc can also process DocBook files. XML editors like Oxygen XML Editor and XMLmind provide WYSIWYG editing with live preview. Many Linux distributions include DocBook tools in their package repositories.
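As a sketch of how these tools chain together, the helpers below build the command lines for HTML and PDF output without running them. The function names are hypothetical, and `stylesheet_dir` is an assumption: the location of the DocBook XSL stylesheets differs between distributions, so substitute the path used on your system. The resulting lists can be passed to `subprocess.run`.

```python
from pathlib import PurePosixPath


def html_command(source, output, stylesheet_dir):
    """Build the xsltproc call that renders DocBook to single-page HTML."""
    stylesheet = PurePosixPath(stylesheet_dir) / "html" / "docbook.xsl"
    return ["xsltproc", "--output", output, str(stylesheet), source]


def pdf_commands(source, output, stylesheet_dir):
    """Build the two-step pipeline: DocBook -> XSL-FO -> PDF."""
    stylesheet = PurePosixPath(stylesheet_dir) / "fo" / "docbook.xsl"
    fo_file = str(PurePosixPath(output).with_suffix(".fo"))
    return [
        ["xsltproc", "--output", fo_file, str(stylesheet), source],
        ["fop", "-fo", fo_file, "-pdf", output],
    ]


# Placeholder stylesheet location; adjust for your installation.
XSL_DIR = "/path/to/docbook-xsl"

print(html_command("guide.xml", "guide.html", XSL_DIR))
for command in pdf_commands("guide.xml", "guide.pdf", XSL_DIR):
    print(command)
```

The PDF path is split into two commands because the stylesheet only produces the intermediate XSL-FO file; the formatter then turns that file into pages.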